From llvm-commits at lists.llvm.org Mon Jul 6 00:22:33 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:22:33 +0000 (UTC) Subject: [PATCH] D83160: [InstCombine] Lower infinite combine loop detection thresholds In-Reply-To: References: Message-ID: <858113abf5b9728bf94d6d8152572090@localhost.localdomain> nikic accepted this revision. nikic added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83160/new/ https://reviews.llvm.org/D83160 From llvm-commits at lists.llvm.org Mon Jul 6 00:22:58 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:22:58 +0000 (UTC) Subject: [PATCH] D82799: [IndVars] Delay forgetValue() call In-Reply-To: References: Message-ID: nikic added a comment. Ping Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82799/new/ https://reviews.llvm.org/D82799 From llvm-commits at lists.llvm.org Mon Jul 6 00:33:44 2020 From: llvm-commits at lists.llvm.org (Simon Moll via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:33:44 +0000 (UTC) Subject: [PATCH] D83170: [VE] Support symbol with offset in assembly In-Reply-To: References: Message-ID: <12de711948ddca071b7ef6962d273141@localhost.localdomain> simoll added a comment. Please consider the coding style suggestions (in particular for lower-casing function names). Otw, LGTM. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83170/new/ https://reviews.llvm.org/D83170 From llvm-commits at lists.llvm.org Mon Jul 6 00:36:20 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:36:20 +0000 (UTC) Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass In-Reply-To: References: Message-ID: <65153ee0cd4d0fdc535d607db76fe9cf@localhost.localdomain> djtodoro added inline comments. ================ Comment at: llvm/lib/CodeGen/CMakeLists.txt:184 + LiveDebugValues/LiveDebugValues.cpp LiveDebugValues/VarLocBasedImpl.cpp ---------------- jmorse wrote: > djtodoro wrote: > > We can add new subdirectory here as: > > `add_subdirectory(LiveDebugValues)` > > > > and then, add another `CMakeLists.txt` with in `LiveDebugValues/` and play with all of this within that `CMake` file locally? wdyt? > I've shied away from this due to being a CMake novice, but my understanding is we'd need to either: > * Append the sources to the "LLVMCodeGen" library sources list from inside that new CMakeLists.txt, or > * Define a new "llvm_component_library" for LiveDebugValues, which seems like overkill. > > Referring to the source files from this CMakeList avoided me having to think about that; I can implement the first item if it's preferable for each directory to have a CMakeLists.txt. I am OK with either way. If other folks have no strong opinion on this, this part is OK with me as well. >Referring to the source files from this CMakeList avoided me having to think about that; I can implement the first item if it's preferable for each directory to have a CMakeLists.txt. I am not sure we need that if we are not considering item 2. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83046/new/ https://reviews.llvm.org/D83046 From llvm-commits at lists.llvm.org Mon Jul 6 00:38:10 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:38:10 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: <25cd6850520aeb85d7ae895507c25cba@localhost.localdomain> jhenderson marked an inline comment as done. jhenderson added inline comments. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:37 + if (Err) { + // Drop the lastly added set because it does not contain anything useful + // to dump. ---------------- Perhaps "newly" instead of "lastly". ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:57 + if (!C) { + // Preserve the lastly added set because at least some field of the header + // are read and can be dumped. ---------------- Same "lastly" -> "newly" maybe. I feel like it reads a little better. Also "field" -> "fields" ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:27-28 Sets.clear(); - DataExtractor::Cursor C(0); - while (C && Data.isValidOffset(C.tell())) { + uint64_t Offset = 0; + while (Data.isValidOffset(Offset)) { + uint64_t SetOffset = Offset; ---------------- ikudrin wrote: > jhenderson wrote: > > What's behind the reasoning for no longer using the `Cursor` throughout? > The method now reports all encountered errors through `RecoverableErrorHandler` and does not return `Error`. The `Cursor` requires its error state to be checked in any case. While the former code could simply return the error state, now this checking is a bit inconvenient, and, moreover, useless. I'm not sure I follow. 
As far as my understanding of `Cursor` goes, you can have: ``` DataExtractor::Cursor C(0); while (C && Data.isValidOffset(C.tell())) { // Parse the length if (!C) { /* report invalid length, using C.takeError() */ return; } // Parse the header while (C) { /* parse entries */ } if (C && C.tell() != Offset) { /* report bad terminator */ } } if (!C) { /* report parsing error using C.takeError() */ ``` The `Cursor` is checked by either the final error check outside the loop in most cases, or by the invalid length report, so we're good (note that `C.takeError()` does not need calling if the `Cursor` is in a success state, much like `Expected`). The only case where it might be different is if `Cursor` is in an error state due to some error other than a running-off-the-end error, in which case it would abort early. If you want to continue instead, you could do almost the same as you've got: ``` while (Offset) { DataExtractor::Cursor C(Offset); ... = Data.getInitialLength(C); if (!C) { /* report invalid length, using C.takeError() */ return; } // Parse the header while (C) { /* parse entries */ } if (C && C.tell() != Offset) { /* report bad terminator */ } if (!C) { /* report parsing error using C.takeError() */ } ``` I'm not sure I see how the latter is any more complex or inconvenient than instantiating a different Error variable and passing pointers around? ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:46 + Offset += Set.Length; + DWARFDataExtractor SetData(Data, Offset); + const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(Set.Format); ---------------- ikudrin wrote: > jhenderson wrote: > > You probably want to include the expected length of the table in this data extractor too, to stop reading into the next table under any circumstance (e.g. the length would partially truncate the final terminator). > The second parameter, `Offset`, is the limiter. 
Note that it is just updated to point to the start of the next table which is the same as the end of the current one. Thanks, I misread. ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s:1 # RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t # RUN: not llvm-dwarfdump -debug-pubnames %t 2>&1 | FileCheck %s ---------------- I'd probably fold this test case into the other file now. I don't think there's any benefit in having them separate. Alternatively, keep this one separate and move the other test case into the library testing. The idea is that we test the code in detail with the library tests, and at a high level in the tool tests (i.e. showing we handle the reported output). I don't mind either approach. ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s:8 + +## This set does not contain all required fields in the header. +# CHECK-NEXT: length = 0x00000002, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000000 ---------------- Strictly speaking, we should have an error check for each individual field, not just the header in general. This is because we could be using the non-checking version of the `get*` functions. This check currently only checks parsing of the offset field, but there's also the version and size fields. Similar comment applies for the individual entries. ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s:30 + +## This set contains a string which is not preperly terminated. +# CHECK-NEXT: length = 0x00000011, format = DWARF32, version = 0x0002, unit_offset = 0x00000064, unit_size = 0x000000f0 ---------------- preperly -> properly ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s:61 + +## The remaining space in the section is too short to even contein a unit length +## field. 
---------------- contein -> contain CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Mon Jul 6 00:47:34 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:47:34 +0000 (UTC) Subject: [PATCH] D83152: llvm-nm: add flag to suppress no symbols warning In-Reply-To: References: Message-ID: jhenderson added reviewers: smeenai, arichardson, grimar, MaskRay, rupprecht. jhenderson added a subscriber: grimar. jhenderson added a comment. Added a couple more people with experience with Darwin, to see if they have any thoughts on the option name/whether it would be helpful to be identical etc. Also added @grimar/@MaskRay/@rupprecht who have a lot of experience with the binutils too. Please update the llvm-nm documentation found in the CommandGuide directory to include the new switch as part of this change. I'm happy to support adding the option, if there's a desire for it. It's trivial enough that it seems reasonable. I don't know about the name of it, and would prefer a shorter name if anybody can come up with one (my only suggestion is `--no-no-symbols-warning` which is almost as verbose and possibly more confusing), but if matching the libtool name is useful, I'm not actively opposed to it. ================ Comment at: llvm/test/tools/llvm-nm/X86/nm-no-symbols.test:14 +# RUN: llvm-nm -no-warning-for-no-symbols %t.o 2>&1 | FileCheck %s -DFILE=%t.o --check-prefix NO-WARNING --allow-empty +# NO-WARNING-NOT: no symbols ---------------- If I'm right, llvm-nm should not print any output at all in this case? If that's the case, I'd change this line to: `# NO-WARNING-NOT: {{.}}` to make it a little more robust against possible further changes/typos etc. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83152/new/ https://reviews.llvm.org/D83152 From llvm-commits at lists.llvm.org Mon Jul 6 00:50:41 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:50:41 +0000 (UTC) Subject: [PATCH] D83131: [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. In-Reply-To: References: Message-ID: <3996da7a66de5a6de82e59f14d6f911c@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/linker-options.test:12 # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 4 is broken: the content is not null-terminated +# CHECK-NEXT: warning: '[[FILE]]': unable to read the content of the SHT_LLVM_LINKER_OPTIONS section at index 5: section [index 5] has a sh_offset (0xffffffff) + sh_size (0x8) that is greater than the file size (0x370) # CHECK-NEXT: option 3: value 3 ---------------- Repeating the "index 5" bit in the warning seems sub-optimal. I think it's only necessary if we don't trust the warning produced by the Object library to include the index? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83131/new/ https://reviews.llvm.org/D83131 From llvm-commits at lists.llvm.org Mon Jul 6 00:55:08 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 07:55:08 +0000 (UTC) Subject: [PATCH] D83129: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections. In-Reply-To: References: Message-ID: <70dba3069b424057b0e28369e5ea9c71@localhost.localdomain> jhenderson added inline comments. 
================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:6559-6566 + if (Expected NameOrErr = + this->dumper()->getStaticSymbolName(Index)) + return *NameOrErr; + else + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(NameOrErr.takeError()))); ---------------- This seems like a pattern we're likely to have in several different parts of the ELFDumper. Is there any code we could share to avoid duplication? Maybe it just makes sense to change `getStaticSymbolName` to report the warning/return the `` itself? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83129/new/ https://reviews.llvm.org/D83129 From llvm-commits at lists.llvm.org Mon Jul 6 01:03:09 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 08:03:09 +0000 (UTC) Subject: [PATCH] D82868: [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related `CFIProgram::parse` code. In-Reply-To: References: Message-ID: <08fbcc7cbee73e7e1cfa16365f710702@localhost.localdomain> grimar added a comment. Ping. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82868/new/ https://reviews.llvm.org/D82868 From llvm-commits at lists.llvm.org Mon Jul 6 01:25:43 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via llvm-commits) Date: Mon, 06 Jul 2020 01:25:43 -0700 (PDT) Subject: [llvm] df3bda0 - [VE] Correct stack alignment Message-ID: <5f02e007.1c69fb81.fd1f5.14b7@mx.google.com> Author: Kazushi (Jam) Marukawa Date: 2020-07-06T17:25:29+09:00 New Revision: df3bda047d5abe9190bdd0422270328140556bd4 URL: https://github.com/llvm/llvm-project/commit/df3bda047d5abe9190bdd0422270328140556bd4 DIFF: https://github.com/llvm/llvm-project/commit/df3bda047d5abe9190bdd0422270328140556bd4.diff LOG: [VE] Correct stack alignment Summary: Change stack alignment from 64 bits to 128 bits to follow ABI correctly. And add a regression test for datalayout. 
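For readers unfamiliar with the datalayout string, the `S<bits>` component is what this change touches. The snippet below is a minimal sketch, not LLVM code: `stackAlignBits` is a hypothetical helper written only for illustration, and it assumes the components are separated by `-` and that the stack alignment is spelled as an upper-case `S` followed by a decimal bit count.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of LLVM): returns the stack alignment in bits
// encoded by the "S<bits>" component of a datalayout string, or 0 if the
// string has no such component.
unsigned stackAlignBits(const std::string &Layout) {
  std::size_t Pos = 0;
  while (Pos <= Layout.size()) {
    // Components are separated by '-'; the last one runs to the end.
    std::size_t End = Layout.find('-', Pos);
    if (End == std::string::npos)
      End = Layout.size();
    std::string Spec = Layout.substr(Pos, End - Pos);
    // Assumption: the digits after 'S' are well-formed; std::stoul throws
    // otherwise.
    if (Spec.size() > 1 && Spec[0] == 'S')
      return static_cast<unsigned>(std::stoul(Spec.substr(1)));
    Pos = End + 1;
  }
  return 0;
}
```

With the strings from this commit, the old layout `e-m:e-i64:64-n32:64-S64` yields 64 bits and the new `e-m:e-i64:64-n32:64-S128` yields 128 bits, i.e. 16 bytes.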
Reviewers: simoll, k-ishizaka Reviewed By: simoll Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #llvm, #ve, #clang Differential Revision: https://reviews.llvm.org/D83173 Added: Modified: clang/lib/Basic/Targets/VE.h clang/test/CodeGen/target-data.c llvm/lib/Target/VE/VETargetMachine.cpp Removed: ################################################################################ diff --git a/clang/lib/Basic/Targets/VE.h b/clang/lib/Basic/Targets/VE.h index 7e50e7daeb90..f863a0af0acb 100644 --- a/clang/lib/Basic/Targets/VE.h +++ b/clang/lib/Basic/Targets/VE.h @@ -45,7 +45,7 @@ class LLVM_LIBRARY_VISIBILITY VETargetInfo : public TargetInfo { WCharType = UnsignedInt; WIntType = UnsignedInt; UseZeroLengthBitfieldAlignment = true; - resetDataLayout("e-m:e-i64:64-n32:64-S64"); + resetDataLayout("e-m:e-i64:64-n32:64-S128"); } void getTargetDefines(const LangOptions &Opts, diff --git a/clang/test/CodeGen/target-data.c b/clang/test/CodeGen/target-data.c index e619843f4bdb..8c740119cd1b 100644 --- a/clang/test/CodeGen/target-data.c +++ b/clang/test/CodeGen/target-data.c @@ -250,3 +250,7 @@ // RUN: %clang_cc1 -triple bpfeb -o - -emit-llvm %s | \ // RUN: FileCheck %s -check-prefix=BPFEB // BPFEB: target datalayout = "E-m:e-p:64:64-i64:64-i128:128-n32:64-S128" + +// RUN: %clang_cc1 -triple ve -o - -emit-llvm %s | \ +// RUN: FileCheck %s -check-prefix=VE +// VE: target datalayout = "e-m:e-i64:64-n32:64-S128" diff --git a/llvm/lib/Target/VE/VETargetMachine.cpp b/llvm/lib/Target/VE/VETargetMachine.cpp index a0c8ae0c82d7..08b55eebbc98 100644 --- a/llvm/lib/Target/VE/VETargetMachine.cpp +++ b/llvm/lib/Target/VE/VETargetMachine.cpp @@ -41,8 +41,8 @@ static std::string computeDataLayout(const Triple &T) { // VE supports 32 bit and 64 bits integer on registers Ret += "-n32:64"; - // Stack alignment is 64 bits - Ret += "-S64"; + // Stack alignment is 128 bits + Ret += "-S128"; return Ret; } From llvm-commits at lists.llvm.org Mon Jul 6 01:45:21 2020 From: llvm-commits at 
lists.llvm.org (Guillaume Chatelet via llvm-commits) Date: Mon, 06 Jul 2020 01:45:21 -0700 (PDT) Subject: [llvm] 04288e9 - Fix 46594 - Alignment assertion failure in instcombine Message-ID: <5f02e4a1.1c69fb81.b6ec8.e631@mx.google.com> Author: Guillaume Chatelet Date: 2020-07-06T08:45:05Z New Revision: 04288e93be7bbcdca5707d84149e864923f9ed25 URL: https://github.com/llvm/llvm-project/commit/04288e93be7bbcdca5707d84149e864923f9ed25 DIFF: https://github.com/llvm/llvm-project/commit/04288e93be7bbcdca5707d84149e864923f9ed25.diff LOG: Fix 46594 - Alignment assertion failure in instcombine Added: Modified: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp index 966b3246f4f5..836af6234ad5 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp @@ -4540,10 +4540,11 @@ static void annotateAnyAllocSite(CallBase &Call, const TargetLibraryInfo *TLI) { Call.getContext(), Op1C->getZExtValue())); // Add alignment attribute if alignment is a power of two constant. 
if (Op0C && Op0C->getValue().ult(llvm::Value::MaximumAlignment)) { - if (MaybeAlign AlignmentVal = Op0C->getMaybeAlignValue()) - Call.addAttribute( - AttributeList::ReturnIndex, - Attribute::getWithAlignment(Call.getContext(), *AlignmentVal)); + uint64_t AlignmentVal = Op0C->getZExtValue(); + if (llvm::isPowerOf2_64(AlignmentVal)) + Call.addAttribute(AttributeList::ReturnIndex, + Attribute::getWithAlignment(Call.getContext(), + Align(AlignmentVal))); } } else if (isReallocLikeFn(&Call, TLI) && Op1C) { Call.addAttribute(AttributeList::ReturnIndex, From llvm-commits at lists.llvm.org Mon Jul 6 01:52:14 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via llvm-commits) Date: Mon, 06 Jul 2020 01:52:14 -0700 (PDT) Subject: [llvm] 4c0a965 - Fix off by one error in Bitfields Message-ID: <5f02e63e.1c69fb81.784cd.15b0@mx.google.com> Author: Guillaume Chatelet Date: 2020-07-06T08:47:58Z New Revision: 4c0a965c0926d5d6aa786a7de60f7939239099e3 URL: https://github.com/llvm/llvm-project/commit/4c0a965c0926d5d6aa786a7de60f7939239099e3 DIFF: https://github.com/llvm/llvm-project/commit/4c0a965c0926d5d6aa786a7de60f7939239099e3.diff LOG: Fix off by one error in Bitfields Differential Revision: https://reviews.llvm.org/D83192 Added: Modified: llvm/include/llvm/ADT/Bitfields.h Removed: ################################################################################ diff --git a/llvm/include/llvm/ADT/Bitfields.h b/llvm/include/llvm/ADT/Bitfields.h index 2e38cca673ea..68b1549a0ac5 100644 --- a/llvm/include/llvm/ADT/Bitfields.h +++ b/llvm/include/llvm/ADT/Bitfields.h @@ -227,7 +227,7 @@ struct Bitfield { static constexpr unsigned Shift = Offset; static constexpr unsigned Bits = Size; static constexpr unsigned FirstBit = Offset; - static constexpr unsigned LastBit = Shift + Bits; + static constexpr unsigned LastBit = Shift + Bits - 1; private: template friend struct bitfields_details::Impl; @@ -273,7 +273,7 @@ struct Bitfield { /// Returns whether the two bitfields share common 
bits. template <typename A, typename B> static constexpr bool isOverlapping() { - return A::LastBit > B::FirstBit && B::LastBit > A::FirstBit; + return A::LastBit >= B::FirstBit && B::LastBit >= A::FirstBit; } }; From llvm-commits at lists.llvm.org Mon Jul 6 02:58:59 2020 From: llvm-commits at lists.llvm.org (David Green via llvm-commits) Date: Mon, 06 Jul 2020 02:58:59 -0700 (PDT) Subject: [llvm] 55227f8 - [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost Message-ID: <5f02f5e3.1c69fb81.c7255.694e@mx.google.com> Author: David Green Date: 2020-07-06T10:58:40+01:00 New Revision: 55227f85d09c5d26b1484a5aaa9676068b21b6bd URL: https://github.com/llvm/llvm-project/commit/55227f85d09c5d26b1484a5aaa9676068b21b6bd DIFF: https://github.com/llvm/llvm-project/commit/55227f85d09c5d26b1484a5aaa9676068b21b6bd.diff LOG: [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost This alters getMemoryOpCost to use the Base TargetTransformInfo version that includes some additional checks for whether extending loads are legal. This will generally have the effect of making <2 x ..> and some <4 x ..> loads/stores more expensive, which in turn should help favour larger vector factors. Notably it alters the cost of a <4 x half>, which with the current codegen will be expensive if it is not extended. 
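The shape of the new computation can be pictured with a toy model. This is only a sketch under stated assumptions, not the `ARMTTIImpl` code: `scaledMemoryOpCost` and all of its parameters are hypothetical names, and the generic base cost is passed in as a plain number rather than computed from a type.

```cpp
// Toy model (hypothetical names): the target no longer derives the cost from
// type legalization alone; it takes whatever the generic cost model reports
// and scales it by a subtarget factor for MVE vector accesses.
int scaledMemoryOpCost(bool HasMVE, bool IsVector, int MVEFactor,
                       int GenericCost) {
  // Scalars and non-MVE subtargets keep the generic cost unchanged.
  int Factor = (HasMVE && IsVector) ? MVEFactor : 1;
  return Factor * GenericCost;
}
```

The point of the sketch is only that a higher generic cost for narrow vectors feeds straight through the multiplication, which is why the `<2 x ..>` entries in the test updates rise on every subtarget.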
Differential Revision: https://reviews.llvm.org/D82456 Added: Modified: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll llvm/test/Analysis/CostModel/ARM/load_store.ll llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index c852dbb8b596..fa71a20d64f5 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -892,23 +892,24 @@ int ARMTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, return 1; // Type legalization can't handle structs - if (TLI->getValueType(DL, Src, true) == MVT::Other) + if (TLI->getValueType(DL, Src, true) == MVT::Other) return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, CostKind); - std::pair LT = TLI->getTypeLegalizationCost(DL, Src); - if (ST->hasNEON() && Src->isVectorTy() && (Alignment && *Alignment != Align(16)) && cast(Src)->getElementType()->isDoubleTy()) { // Unaligned loads/stores are extremely inefficient. // We need 4 uops for vst.1/vld.1 vs 1uop for vldr/vstr. + std::pair LT = TLI->getTypeLegalizationCost(DL, Src); return LT.first * 4; } + int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy() ? 
ST->getMVEVectorCostFactor() : 1; - return BaseCost * LT.first; + return BaseCost * BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, + CostKind, I); } int ARMTTIImpl::getInterleavedMemoryOpCost( diff --git a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll index 98570d32352c..b22f8ed9e543 100644 --- a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll +++ b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll @@ -81,14 +81,14 @@ define i32 @load_extends() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadi8 = load i8, i8* undef, align 1 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadi16 = load i16, i16* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadi32 = load i32, i32* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv2i8 = load <2 x i8>, <2 x i8>* undef, align 2 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %loadv2i8 = load <2 x i8>, <2 x i8>* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4i8 = load <4 x i8>, <4 x i8>* undef, align 4 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv8i8 = load <8 x i8>, <8 x i8>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv16i8 = load <16 x i8>, <16 x i8>* undef, align 16 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv2i16 = load <2 x i16>, <2 x i16>* undef, align 4 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %loadv2i16 = load <2 x i16>, <2 x i16>* undef, align 4 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4i16 = load <4 x i16>, <4 x i16>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: 
Found an estimated cost of 2 for instruction: %loadv8i16 = load <8 x i16>, <8 x i16>* undef, align 16 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv2i32 = load <2 x i32>, <2 x i32>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %loadv2i32 = load <2 x i32>, <2 x i32>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4i32 = load <4 x i32>, <4 x i32>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r0 = sext i8 %loadi8 to i16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r1 = zext i8 %loadi8 to i16 @@ -719,20 +719,20 @@ define i32 @store_trunc() { ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 %i1632, i16* undef, align 2 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 %i1664, i16* undef, align 2 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i32 %i3264, i32* undef, align 4 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i8> %v2816, <2 x i8>* undef, align 2 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i8> %v2832, <2 x i8>* undef, align 2 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i8> %v2864, <2 x i8>* undef, align 2 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i8> %v4816, <4 x i8>* undef, align 4 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i8> %v4832, <4 x i8>* undef, align 4 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i8> %v4864, <4 x i8>* undef, align 4 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an 
estimated cost of 7 for instruction: store <2 x i8> %v2816, <2 x i8>* undef, align 2 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i8> %v2832, <2 x i8>* undef, align 2 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i8> %v2864, <2 x i8>* undef, align 2 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 13 for instruction: store <4 x i8> %v4816, <4 x i8>* undef, align 4 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 13 for instruction: store <4 x i8> %v4832, <4 x i8>* undef, align 4 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 13 for instruction: store <4 x i8> %v4864, <4 x i8>* undef, align 4 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i8> %v8816, <8 x i8>* undef, align 8 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i8> %v8832, <8 x i8>* undef, align 8 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i8> %v8864, <8 x i8>* undef, align 8 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <16 x i8> %v16816, <16 x i8>* undef, align 16 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <16 x i8> %v16832, <16 x i8>* undef, align 16 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <16 x i8> %v16864, <16 x i8>* undef, align 16 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i16> %v21632, <2 x i16>* undef, align 4 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i16> %v21664, <2 x i16>* undef, align 4 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i16> %v21632, <2 x i16>* undef, align 4 +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an 
estimated cost of 7 for instruction: store <2 x i16> %v21664, <2 x i16>* undef, align 4 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> %v41632, <4 x i16>* undef, align 8 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> %v41664, <4 x i16>* undef, align 8 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i16> %v81632, <8 x i16>* undef, align 16 @@ -774,9 +774,9 @@ define i32 @store_trunc() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 %i1632, i16* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 %i1664, i16* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i32 %i3264, i32* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i8> %v2816, <2 x i8>* undef, align 2 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i8> %v2832, <2 x i8>* undef, align 2 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i8> %v2864, <2 x i8>* undef, align 2 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i8> %v2816, <2 x i8>* undef, align 2 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i8> %v2832, <2 x i8>* undef, align 2 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i8> %v2864, <2 x i8>* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i8> %v4816, <4 x i8>* undef, align 4 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i8> %v4832, <4 x i8>* undef, align 4 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an 
estimated cost of 2 for instruction: store <4 x i8> %v4864, <4 x i8>* undef, align 4 @@ -786,13 +786,13 @@ define i32 @store_trunc() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <16 x i8> %v16816, <16 x i8>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <16 x i8> %v16832, <16 x i8>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <16 x i8> %v16864, <16 x i8>* undef, align 16 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i16> %v21632, <2 x i16>* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i16> %v21664, <2 x i16>* undef, align 4 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i16> %v21632, <2 x i16>* undef, align 4 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i16> %v21664, <2 x i16>* undef, align 4 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i16> %v41632, <4 x i16>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i16> %v41664, <4 x i16>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x i16> %v81632, <8 x i16>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x i16> %v81664, <8 x i16>* undef, align 16 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i32> %v23264, <2 x i32>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i32> %v23264, <2 x i32>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i32> %v43264, <4 x i32>* 
undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; @@ -939,20 +939,20 @@ define i32 @store_trunc() { ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 %i1632, i16* undef, align 2 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 %i1664, i16* undef, align 2 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i32 %i3264, i32* undef, align 4 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i8> %v2816, <2 x i8>* undef, align 2 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i8> %v2832, <2 x i8>* undef, align 2 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i8> %v2864, <2 x i8>* undef, align 2 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i8> %v4816, <4 x i8>* undef, align 4 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i8> %v4832, <4 x i8>* undef, align 4 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i8> %v4864, <4 x i8>* undef, align 4 +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i8> %v2816, <2 x i8>* undef, align 2 +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i8> %v2832, <2 x i8>* undef, align 2 +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i8> %v2864, <2 x i8>* undef, align 2 +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 13 for instruction: store <4 x i8> %v4816, <4 x i8>* undef, align 4 +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 13 for instruction: store <4 x i8> %v4832, <4 x i8>* undef, align 4 +; CHECK-V8R-RECIP-NEXT: Cost 
Model: Found an estimated cost of 13 for instruction: store <4 x i8> %v4864, <4 x i8>* undef, align 4 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i8> %v8816, <8 x i8>* undef, align 8 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i8> %v8832, <8 x i8>* undef, align 8 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i8> %v8864, <8 x i8>* undef, align 8 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <16 x i8> %v16816, <16 x i8>* undef, align 16 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <16 x i8> %v16832, <16 x i8>* undef, align 16 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <16 x i8> %v16864, <16 x i8>* undef, align 16 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i16> %v21632, <2 x i16>* undef, align 4 -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i16> %v21664, <2 x i16>* undef, align 4 +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i16> %v21632, <2 x i16>* undef, align 4 +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i16> %v21664, <2 x i16>* undef, align 4 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> %v41632, <4 x i16>* undef, align 8 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> %v41664, <4 x i16>* undef, align 8 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i16> %v81632, <8 x i16>* undef, align 16 @@ -1273,11 +1273,11 @@ define i32 @load_fpextends() { ; CHECK-MVE-RECIP-LABEL: 'load_fpextends' ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for 
instruction: %loadf16 = load half, half* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadf32 = load float, float* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv2f16 = load <2 x half>, <2 x half>* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4f16 = load <4 x half>, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %loadv2f16 = load <2 x half>, <2 x half>* undef, align 4 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %loadv4f16 = load <4 x half>, <4 x half>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv8f16 = load <8 x half>, <8 x half>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv16f16 = load <16 x half>, <16 x half>* undef, align 32 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float @@ -1294,7 +1294,7 @@ define i32 @load_fpextends() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %v10 = fpext <4 
x float> %loadv4f32 to <4 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 328 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; @@ -1567,13 +1567,13 @@ define i32 @load_fptrunc() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1664, half* undef, align 2 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x half> %v21632, <2 x half>* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x half> %v21664, <2 x half>* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x half> %v41632, <4 x half>* undef, align 8 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x half> %v41664, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x half> %v21632, <2 x half>* undef, align 4 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x half> %v21664, <2 x half>* undef, align 4 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for 
instruction: store <4 x half> %v41632, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: store <4 x half> %v41664, <4 x half>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x half> %v81632, <8 x half>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x half> %v81664, <8 x half>* undef, align 16 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x float> %v23264, <2 x float>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x float> %v23264, <2 x float>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x float> %v43264, <4 x float>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; @@ -2784,7 +2784,7 @@ define i32 @maskedload_fpextends() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 328 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; diff --git 
a/llvm/test/Analysis/CostModel/ARM/load_store.ll b/llvm/test/Analysis/CostModel/ARM/load_store.ll index 18a094990b56..8d346ea1c807 100644 --- a/llvm/test/Analysis/CostModel/ARM/load_store.ll +++ b/llvm/test/Analysis/CostModel/ARM/load_store.ll @@ -69,16 +69,16 @@ define void @stores() { ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store i128 undef, i128* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float undef, float* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store double undef, double* undef, align 4 -; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i8> undef, <2 x i8>* undef, align 1 -; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i16> undef, <2 x i16>* undef, align 2 -; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i32> undef, <2 x i32>* undef, align 4 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i8> undef, <2 x i8>* undef, align 1 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i16> undef, <2 x i16>* undef, align 2 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x i32> undef, <2 x i32>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i64> undef, <2 x i64>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i32> undef, <4 x i32>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x i16> undef, <8 x i16>* undef, align 2 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <16 x i8> undef, <16 x i8>* undef, align 1 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x float> undef, <4 x 
float>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <4 x double> undef, <4 x double>* undef, align 4 -; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x float> undef, <2 x float>* undef, align 4 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x float> undef, <2 x float>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x double> undef, <2 x double>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <2 x i64> undef, <2 x i64>* undef, align 1 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i32> undef, <4 x i32>* undef, align 1 @@ -95,8 +95,8 @@ define void @stores() { ; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store i128 undef, i128* undef, align 4 ; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float undef, float* undef, align 4 ; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store double undef, double* undef, align 4 -; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i8> undef, <2 x i8>* undef, align 1 -; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i16> undef, <2 x i16>* undef, align 2 +; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i8> undef, <2 x i8>* undef, align 1 +; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <2 x i16> undef, <2 x i16>* undef, align 2 ; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i32> undef, <2 x i32>* undef, align 4 ; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <2 x i64> undef, <2 x i64>* undef, align 4 ; CHECK-NEON-NEXT: Cost Model: Found an estimated 
cost of 1 for instruction: store <4 x i32> undef, <4 x i32>* undef, align 4 @@ -256,16 +256,16 @@ define void @loads() { ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %5 = load i128, i128* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = load float, float* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = load double, double* undef, align 4 -; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = load <2 x i8>, <2 x i8>* undef, align 1 -; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = load <2 x i16>, <2 x i16>* undef, align 2 -; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = load <2 x i32>, <2 x i32>* undef, align 4 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %8 = load <2 x i8>, <2 x i8>* undef, align 1 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %9 = load <2 x i16>, <2 x i16>* undef, align 2 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %10 = load <2 x i32>, <2 x i32>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = load <2 x i64>, <2 x i64>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = load <4 x i32>, <4 x i32>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = load <8 x i16>, <8 x i16>* undef, align 2 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = load <16 x i8>, <16 x i8>* undef, align 1 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = load <4 x float>, <4 x float>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %16 = load <4 x double>, <4 x double>* undef, align 4 -; CHECK-MVE-NEXT: Cost Model: 
Found an estimated cost of 2 for instruction: %17 = load <2 x float>, <2 x float>* undef, align 4 +; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %17 = load <2 x float>, <2 x float>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %18 = load <2 x double>, <2 x double>* undef, align 4 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = load <2 x i64>, <2 x i64>* undef, align 1 ; CHECK-MVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %20 = load <4 x i32>, <4 x i32>* undef, align 1 diff --git a/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll b/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll index 0c2502df6f0a..747bac801f61 100644 --- a/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll +++ b/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll @@ -566,10 +566,10 @@ for.body: ; so reject this for now. define void @fpext_allowed(float* noalias nocapture %A, half* noalias nocapture readonly %B, float* noalias nocapture readonly %C) #0 { ; CHECK-LABEL: fpext_allowed( -; PREFER-FOLDING: vector.body: +; PREFER-FOLDING-NOT: vector.body: ; PREFER-FOLDING-NOT: llvm.masked.load ; PREFER-FOLDING-NOT: llvm.masked.store -; PREFER-FOLDING: br i1 %{{.*}}, label %{{.*}}, label %vector.body +; PREFER-FOLDING-NOT: br i1 %{{.*}}, label %{{.*}}, label %vector.body entry: br label %for.body @@ -595,10 +595,10 @@ for.body: ; so reject this for now. 
define void @fptrunc_allowed(half* noalias nocapture %A, float* noalias nocapture readonly %B, float* noalias nocapture readonly %C) #0 { ; CHECK-LABEL: fptrunc_allowed( -; PREFER-FOLDING: vector.body: +; PREFER-FOLDING-NOT: vector.body: ; PREFER-FOLDING-NOT: llvm.masked.load ; PREFER-FOLDING-NOT: llvm.masked.store -; PREFER-FOLDING: br i1 %{{.*}}, label %{{.*}}, label %vector.body +; PREFER-FOLDING-NOT: br i1 %{{.*}}, label %{{.*}}, label %vector.body entry: br label %for.body @@ -622,10 +622,10 @@ for.body: define void @fptrunc_not_allowed(float* noalias nocapture %A, float* noalias nocapture readonly %B, float* noalias nocapture readonly %C, half* noalias nocapture %D) #0 { ; CHECK-LABEL: fptrunc_not_allowed( -; PREFER-FOLDING: vector.body: +; PREFER-FOLDING-NOT: vector.body: ; PREFER-FOLDING-NOT: llvm.masked.load ; PREFER-FOLDING-NOT: llvm.masked.store -; PREFER-FOLDING: br i1 %{{.*}}, label %{{.*}}, label %vector.body +; PREFER-FOLDING: br i1 %{{.*}}, label %{{.*}}, label %for.body entry: br label %for.body From llvm-commits at lists.llvm.org Mon Jul 6 03:20:05 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 03:20:05 -0700 (PDT) Subject: [llvm] cd7f805 - [InstCombine] Lower infinite combine loop detection thresholds Message-ID: <5f02fad5.1c69fb81.3da74.04b5@mx.google.com> Author: Roman Lebedev Date: 2020-07-06T13:19:31+03:00 New Revision: cd7f8051ac7b6f08734102446482c1e5d951bfcc URL: https://github.com/llvm/llvm-project/commit/cd7f8051ac7b6f08734102446482c1e5d951bfcc DIFF: https://github.com/llvm/llvm-project/commit/cd7f8051ac7b6f08734102446482c1e5d951bfcc.diff LOG: [InstCombine] Lower infinite combine loop detection thresholds Summary: 1000 iterations is still kind of a lot. Would it make sense to lower it iteratively until it becomes `2`, with some delay in between so that users actually have a chance to encounter it?
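For readers skimming the log: this commit keeps the threshold at 1000 when NDEBUG is defined and uses 100 otherwise. A minimal standalone sketch of that pattern follows. The names `kInfiniteLoopThreshold` and `runToFixpoint` are hypothetical and exist only for this illustration; they are not InstCombine's API, and the real pass reports the condition differently.

```cpp
#include <cassert>

// Hypothetical stand-in for the constant touched by the commit.
#ifndef NDEBUG
static constexpr unsigned kInfiniteLoopThreshold = 100;  // assertions enabled
#else
static constexpr unsigned kInfiniteLoopThreshold = 1000; // release build
#endif

// Calls `step` until it reports that nothing changed. Returns false when
// `step` has reported a change more than `threshold` times in a row,
// i.e. the fixpoint iteration looks like an infinite combine loop.
template <typename Step>
bool runToFixpoint(Step step, unsigned threshold = kInfiniteLoopThreshold) {
  for (unsigned iteration = 1; step(); ++iteration)
    if (iteration > threshold)
      return false;
  return true;
}
```

A caller would pass a callable that performs one round of combining and returns whether anything changed. The lower debug-build value means a looping transform is flagged sooner where assertions are on.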
Reviewers: spatel, nikic, kuhar Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83160 Added: Modified: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp index d1c1e5418825..e810b3de25bc 100644 --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -123,8 +123,13 @@ STATISTIC(NumReassoc , "Number of reassociations"); DEBUG_COUNTER(VisitCounter, "instcombine-visit", "Controls which instructions are visited"); +// FIXME: these limits eventually should be as low as 2. static constexpr unsigned InstCombineDefaultMaxIterations = 1000; +#ifndef NDEBUG +static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 100; +#else static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 1000; +#endif static cl::opt<bool> EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"), From llvm-commits at lists.llvm.org Mon Jul 6 03:20:07 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 03:20:07 -0700 (PDT) Subject: [llvm] f62c8db - [Scalarizer] InsertElement handling w/ constant insert index Message-ID: <5f02fad7.1c69fb81.f911f.0096@mx.google.com> Author: Roman Lebedev Date: 2020-07-06T13:19:32+03:00 New Revision: f62c8dbc99eaaac35506f655fdf4d7b1cc21c81c URL: https://github.com/llvm/llvm-project/commit/f62c8dbc99eaaac35506f655fdf4d7b1cc21c81c DIFF: https://github.com/llvm/llvm-project/commit/f62c8dbc99eaaac35506f655fdf4d7b1cc21c81c.diff LOG: [Scalarizer] InsertElement handling w/ constant insert index Summary: As can be clearly seen from the diff, this results in nicer IR.
Reviewers: jdoerfert, arsenm, bjope, cameron.mcinally Reviewed By: jdoerfert Subscribers: arphaman, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83102 Added: Modified: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/basic.ll llvm/test/Transforms/Scalarizer/constant-insertelement.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index 2e414f78271f..6802a9101882 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -192,6 +192,7 @@ class ScalarizerVisitor : public InstVisitor<ScalarizerVisitor, bool> { bool visitGetElementPtrInst(GetElementPtrInst &GEPI); bool visitCastInst(CastInst &CI); bool visitBitCastInst(BitCastInst &BCI); + bool visitInsertElementInst(InsertElementInst &IEI); bool visitShuffleVectorInst(ShuffleVectorInst &SVI); bool visitPHINode(PHINode &PHI); bool visitLoadInst(LoadInst &LI); @@ -389,7 +390,7 @@ void ScalarizerVisitor::gather(Instruction *Op, const ValueVector &CV) { if (!SV.empty()) { for (unsigned I = 0, E = SV.size(); I != E; ++I) { Value *V = SV[I]; - if (V == nullptr) + if (V == nullptr || SV[I] == CV[I]) continue; Instruction *Old = cast<Instruction>(V); @@ -740,6 +741,31 @@ bool ScalarizerVisitor::visitBitCastInst(BitCastInst &BCI) { return true; } +bool ScalarizerVisitor::visitInsertElementInst(InsertElementInst &IEI) { + VectorType *VT = dyn_cast<VectorType>(IEI.getType()); + if (!VT) + return false; + + unsigned NumElems = VT->getNumElements(); + IRBuilder<> Builder(&IEI); + Scatterer Op0 = scatter(&IEI, IEI.getOperand(0)); + Value *NewElt = IEI.getOperand(1); + Value *InsIdx = IEI.getOperand(2); + + ValueVector Res; + Res.resize(NumElems); + + if (auto *CI = dyn_cast<ConstantInt>(InsIdx)) { + for (unsigned I = 0; I < NumElems; ++I) + Res[I] = CI->getValue().getZExtValue() == I ? 
NewElt : Op0[I]; + } else { + return false; + } + + gather(&IEI, Res); + return true; +} + bool ScalarizerVisitor::visitShuffleVectorInst(ShuffleVectorInst &SVI) { VectorType *VT = dyn_cast<VectorType>(SVI.getType()); if (!VT) diff --git a/llvm/test/Transforms/Scalarizer/basic.ll b/llvm/test/Transforms/Scalarizer/basic.ll index 2c82fd9cc3a5..2c7b6a6b588f 100644 --- a/llvm/test/Transforms/Scalarizer/basic.ll +++ b/llvm/test/Transforms/Scalarizer/basic.ll @@ -276,14 +276,14 @@ define void @f8(<4 x float *> *%dest, <4 x float *> %ptr0, <4 x i32> %i0, ; CHECK: %dest.i1 = getelementptr float*, float** %dest.i0, i32 1 ; CHECK: %dest.i2 = getelementptr float*, float** %dest.i0, i32 2 ; CHECK: %dest.i3 = getelementptr float*, float** %dest.i0, i32 3 +; CHECK: %ptr0.i0 = extractelement <4 x float*> %ptr0, i32 0 +; CHECK: %ptr0.i2 = extractelement <4 x float*> %ptr0, i32 2 +; CHECK: %ptr0.i3 = extractelement <4 x float*> %ptr0, i32 3 ; CHECK: %i0.i1 = extractelement <4 x i32> %i0, i32 1 ; CHECK: %i0.i3 = extractelement <4 x i32> %i0, i32 3 -; CHECK: %ptr0.i0 = extractelement <4 x float*> %ptr0, i32 0 ; CHECK: %val.i0 = getelementptr float, float* %ptr0.i0, i32 100 ; CHECK: %val.i1 = getelementptr float, float* %other, i32 %i0.i1 -; CHECK: %ptr0.i2 = extractelement <4 x float*> %ptr0, i32 2 ; CHECK: %val.i2 = getelementptr float, float* %ptr0.i2, i32 100 -; CHECK: %ptr0.i3 = extractelement <4 x float*> %ptr0, i32 3 ; CHECK: %val.i3 = getelementptr float, float* %ptr0.i3, i32 %i0.i3 ; CHECK: store float* %val.i0, float** %dest.i0, align 32 ; CHECK: store float* %val.i1, float** %dest.i1, align 8 diff --git a/llvm/test/Transforms/Scalarizer/constant-insertelement.ll b/llvm/test/Transforms/Scalarizer/constant-insertelement.ll index 8e8b640e9577..4ddde3598334 100644 --- a/llvm/test/Transforms/Scalarizer/constant-insertelement.ll +++ b/llvm/test/Transforms/Scalarizer/constant-insertelement.ll @@ -12,18 +12,9 @@ define <4 x i32> @f1(<4 x i32> *%src, i32 %repl, i32 %index) { ; ALL-NEXT: 
[[VAL0_I1:%.*]] = load i32, i32* [[SRC_I1]], align 4 ; ALL-NEXT: [[SRC_I2:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 2 ; ALL-NEXT: [[VAL0_I2:%.*]] = load i32, i32* [[SRC_I2]], align 8 -; ALL-NEXT: [[SRC_I3:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 3 -; ALL-NEXT: [[VAL0_I3:%.*]] = load i32, i32* [[SRC_I3]], align 4 -; ALL-NEXT: [[VAL0_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL0_I0]], i32 0 -; ALL-NEXT: [[VAL0_UPTO1:%.*]] = insertelement <4 x i32> [[VAL0_UPTO0]], i32 [[VAL0_I1]], i32 1 -; ALL-NEXT: [[VAL0_UPTO2:%.*]] = insertelement <4 x i32> [[VAL0_UPTO1]], i32 [[VAL0_I2]], i32 2 -; ALL-NEXT: [[VAL0:%.*]] = insertelement <4 x i32> [[VAL0_UPTO2]], i32 [[VAL0_I3]], i32 3 -; ALL-NEXT: [[VAL0_I01:%.*]] = extractelement <4 x i32> [[VAL0]], i32 0 -; ALL-NEXT: [[VAL2_I0:%.*]] = shl i32 1, [[VAL0_I01]] -; ALL-NEXT: [[VAL0_I12:%.*]] = extractelement <4 x i32> [[VAL0]], i32 1 -; ALL-NEXT: [[VAL2_I1:%.*]] = shl i32 2, [[VAL0_I12]] -; ALL-NEXT: [[VAL0_I23:%.*]] = extractelement <4 x i32> [[VAL0]], i32 2 -; ALL-NEXT: [[VAL2_I2:%.*]] = shl i32 3, [[VAL0_I23]] +; ALL-NEXT: [[VAL2_I0:%.*]] = shl i32 1, [[VAL0_I0]] +; ALL-NEXT: [[VAL2_I1:%.*]] = shl i32 2, [[VAL0_I1]] +; ALL-NEXT: [[VAL2_I2:%.*]] = shl i32 3, [[VAL0_I2]] ; ALL-NEXT: [[VAL2_I3:%.*]] = shl i32 4, [[REPL:%.*]] ; ALL-NEXT: [[VAL2_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL2_I0]], i32 0 ; ALL-NEXT: [[VAL2_UPTO1:%.*]] = insertelement <4 x i32> [[VAL2_UPTO0]], i32 [[VAL2_I1]], i32 1 From llvm-commits at lists.llvm.org Mon Jul 6 03:20:09 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 03:20:09 -0700 (PDT) Subject: [llvm] 28b7816 - [Scalarizer] ExtractElement handling w/ constant extract index Message-ID: <5f02fad9.1c69fb81.6d191.217b@mx.google.com> Author: Roman Lebedev Date: 2020-07-06T13:19:32+03:00 New Revision: 28b7816b782bdeca509218b53edfbca6512c33d5 URL: 
https://github.com/llvm/llvm-project/commit/28b7816b782bdeca509218b53edfbca6512c33d5 DIFF: https://github.com/llvm/llvm-project/commit/28b7816b782bdeca509218b53edfbca6512c33d5.diff LOG: [Scalarizer] ExtractElement handling w/ constant extract index Summary: It appears to be better IR-wise to aggressively scalarize it, rather than relying on gathering it, and leaving it as-is. Reviewers: jdoerfert, bjope, arsenm, cameron.mcinally Reviewed By: jdoerfert Subscribers: arphaman, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83101 Added: Modified: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/constant-extractelement.ll llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index 6802a9101882..5cc4d795d767 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -193,6 +193,7 @@ class ScalarizerVisitor : public InstVisitor<ScalarizerVisitor, bool> { bool visitCastInst(CastInst &CI); bool visitBitCastInst(BitCastInst &BCI); bool visitInsertElementInst(InsertElementInst &IEI); + bool visitExtractElementInst(ExtractElementInst &EEI); bool visitShuffleVectorInst(ShuffleVectorInst &SVI); bool visitPHINode(PHINode &PHI); bool visitLoadInst(LoadInst &LI); @@ -766,6 +767,24 @@ bool ScalarizerVisitor::visitInsertElementInst(InsertElementInst &IEI) { return true; } +bool ScalarizerVisitor::visitExtractElementInst(ExtractElementInst &EEI) { + VectorType *VT = dyn_cast<VectorType>(EEI.getOperand(0)->getType()); + if (!VT) + return false; + + IRBuilder<> Builder(&EEI); + Scatterer Op0 = scatter(&EEI, EEI.getOperand(0)); + Value *ExtIdx = EEI.getOperand(1); + + if (auto *CI = dyn_cast<ConstantInt>(ExtIdx)) { + Value *Res = Op0[CI->getValue().getZExtValue()]; + gather(&EEI, {Res}); + return true; + } + + return false; +} + bool 
ScalarizerVisitor::visitShuffleVectorInst(ShuffleVectorInst &SVI) { VectorType *VT = dyn_cast<VectorType>(SVI.getType()); if (!VT) @@ -885,16 +904,20 @@ bool ScalarizerVisitor::finish() { if (!Op->use_empty()) { // The value is still needed, so recreate it using a series of // InsertElements. - auto *Ty = cast<VectorType>(Op->getType()); - Value *Res = UndefValue::get(Ty); - BasicBlock *BB = Op->getParent(); - unsigned Count = Ty->getNumElements(); - IRBuilder<> Builder(Op); - if (isa<PHINode>(Op)) - Builder.SetInsertPoint(BB, BB->getFirstInsertionPt()); - for (unsigned I = 0; I < Count; ++I) - Res = Builder.CreateInsertElement(Res, CV[I], Builder.getInt32(I), - Op->getName() + ".upto" + Twine(I)); + Value *Res = UndefValue::get(Op->getType()); + if (auto *Ty = dyn_cast<VectorType>(Op->getType())) { + BasicBlock *BB = Op->getParent(); + unsigned Count = Ty->getNumElements(); + IRBuilder<> Builder(Op); + if (isa<PHINode>(Op)) + Builder.SetInsertPoint(BB, BB->getFirstInsertionPt()); + for (unsigned I = 0; I < Count; ++I) + Res = Builder.CreateInsertElement(Res, CV[I], Builder.getInt32(I), + Op->getName() + ".upto" + Twine(I)); + } else { + assert(CV.size() == 1 && Op->getType() == CV[0]->getType()); + Res = CV[0]; + } Res->takeName(Op); Op->replaceAllUsesWith(Res); } diff --git a/llvm/test/Transforms/Scalarizer/constant-extractelement.ll b/llvm/test/Transforms/Scalarizer/constant-extractelement.ll index e5d935d186b7..f5bb2edac4e6 100644 --- a/llvm/test/Transforms/Scalarizer/constant-extractelement.ll +++ b/llvm/test/Transforms/Scalarizer/constant-extractelement.ll @@ -7,22 +7,9 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3 define i32 @f1(<4 x i32> *%src, i32 %index) { ; ALL-LABEL: @f1( ; ALL-NEXT: [[SRC_I0:%.*]] = bitcast <4 x i32>* [[SRC:%.*]] to i32* -; ALL-NEXT: [[VAL0_I0:%.*]] = load i32, i32* [[SRC_I0]], align 16 -; ALL-NEXT: [[SRC_I1:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 1 -; ALL-NEXT: [[VAL0_I1:%.*]] = load i32, i32* [[SRC_I1]], align 4 -; ALL-NEXT: [[SRC_I2:%.*]] = 
getelementptr i32, i32* [[SRC_I0]], i32 2 -; ALL-NEXT: [[VAL0_I2:%.*]] = load i32, i32* [[SRC_I2]], align 8 ; ALL-NEXT: [[SRC_I3:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 3 ; ALL-NEXT: [[VAL0_I3:%.*]] = load i32, i32* [[SRC_I3]], align 4 -; ALL-NEXT: [[VAL1_I0:%.*]] = shl i32 1, [[VAL0_I0]] -; ALL-NEXT: [[VAL1_I1:%.*]] = shl i32 2, [[VAL0_I1]] -; ALL-NEXT: [[VAL1_I2:%.*]] = shl i32 3, [[VAL0_I2]] -; ALL-NEXT: [[VAL1_I3:%.*]] = shl i32 4, [[VAL0_I3]] -; ALL-NEXT: [[VAL1_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL1_I0]], i32 0 -; ALL-NEXT: [[VAL1_UPTO1:%.*]] = insertelement <4 x i32> [[VAL1_UPTO0]], i32 [[VAL1_I1]], i32 1 -; ALL-NEXT: [[VAL1_UPTO2:%.*]] = insertelement <4 x i32> [[VAL1_UPTO1]], i32 [[VAL1_I2]], i32 2 -; ALL-NEXT: [[VAL1:%.*]] = insertelement <4 x i32> [[VAL1_UPTO2]], i32 [[VAL1_I3]], i32 3 -; ALL-NEXT: [[VAL2:%.*]] = extractelement <4 x i32> [[VAL1]], i32 3 +; ALL-NEXT: [[VAL2:%.*]] = shl i32 4, [[VAL0_I3]] ; ALL-NEXT: ret i32 [[VAL2]] ; %val0 = load <4 x i32> , <4 x i32> *%src diff --git a/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll b/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll index 1de1f6509666..8e89efb5d31f 100644 --- a/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll +++ b/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll @@ -15,12 +15,7 @@ define i16 @f1() { ; CHECK-NEXT: [[PHI_I1:%.*]] = phi i16 [ 1, [[ENTRY]] ], [ undef, [[FOR_COND]] ] ; CHECK-NEXT: [[PHI_I2:%.*]] = phi i16 [ 1, [[ENTRY]] ], [ undef, [[FOR_COND]] ] ; CHECK-NEXT: [[PHI_I3:%.*]] = phi i16 [ 1, [[ENTRY]] ], [ undef, [[FOR_COND]] ] -; CHECK-NEXT: [[PHI_UPTO0:%.*]] = insertelement <4 x i16> undef, i16 [[PHI_I0]], i32 0 -; CHECK-NEXT: [[PHI_UPTO1:%.*]] = insertelement <4 x i16> [[PHI_UPTO0]], i16 [[PHI_I1]], i32 1 -; CHECK-NEXT: [[PHI_UPTO2:%.*]] = insertelement <4 x i16> [[PHI_UPTO1]], i16 [[PHI_I2]], i32 2 -; CHECK-NEXT: [[PHI:%.*]] = insertelement <4 x i16> [[PHI_UPTO2]], i16 [[PHI_I3]], i32 3 -; CHECK-NEXT: [[EXTRACT:%.*]] = 
extractelement <4 x i16> [[PHI]], i32 0
-; CHECK-NEXT: ret i16 [[EXTRACT]]
+; CHECK-NEXT: ret i16 [[PHI_I0]]
 ;
 entry:
   br label %for.end

From llvm-commits at lists.llvm.org Mon Jul 6 03:20:11 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits)
Date: Mon, 06 Jul 2020 03:20:11 -0700 (PDT)
Subject: [llvm] 6e50474 - [Scalarizer] InsertElement handling w/ variable insert index (PR46524)
Message-ID: <5f02fadb.1c69fb81.1e041.192e@mx.google.com>

Author: Roman Lebedev
Date: 2020-07-06T13:19:32+03:00
New Revision: 6e504745813259067f5b0ad696bec3a3d22ab044

URL: https://github.com/llvm/llvm-project/commit/6e504745813259067f5b0ad696bec3a3d22ab044
DIFF: https://github.com/llvm/llvm-project/commit/6e504745813259067f5b0ad696bec3a3d22ab044.diff

LOG: [Scalarizer] InsertElement handling w/ variable insert index (PR46524)

Summary:
I'm interested in taking the original C++ input, for which we are currently
stuck with an alloca, and producing roughly the lower IR, with neither an
alloca nor vector ops: https://godbolt.org/z/cRRWaJ

For that, as an intermediate step, I'd like to somehow perform scalarization.
As per @arsenm's suggestion, I'm trying to see if the scalarizer can help me
avoid reinventing the wheel.

I'm not sure if it's really intentional that variable inserts are not handled
currently. If it really is, and is supposed to stay that way (?), I guess I
could guard it.

See [[ https://bugs.llvm.org/show_bug.cgi?id=46524 | PR46524 ]].
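As a rough illustration of the lowering this patch enables, the sketch below models it in plain C++ rather than with LLVM's IRBuilder. The names Lanes4 and insertWithVariableIndex are hypothetical and exist only for this example; the point is that each result lane is picked by one compare and one select, which is the shape the new DEFAULT check lines in the test expect.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> value after it has been scattered
// into scalar lanes. This is not an LLVM type.
using Lanes4 = std::array<std::int32_t, 4>;

// Scalar model of `insertelement <4 x i32> %src, i32 %val, i32 %index` with a
// run-time index: every lane is chosen independently, so no vector operation
// and no round-trip through memory is needed.
inline Lanes4 insertWithVariableIndex(const Lanes4 &Src, std::int32_t Val,
                                      std::uint32_t Index) {
  Lanes4 Res{};
  for (std::size_t I = 0; I < Res.size(); ++I) {
    bool IndexIsI = (Index == I);     // models `icmp eq i32 %index, I`
    Res[I] = IndexIsI ? Val : Src[I]; // models `select i1 %index.is.I, ...`
  }
  // Assumption of this sketch: an out-of-range Index simply leaves Src
  // unchanged. In IR such an insert does not produce a defined value.
  return Res;
}
```

The cost is one compare and one select per lane, so the model only looks attractive for short vectors or when later passes can fold the selects away.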
Reviewers: bjope, cameron.mcinally, arsenm, jdoerfert Reviewed By: jdoerfert Subscribers: arphaman, uabelho, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82961 Added: Modified: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/variable-insertelement.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index 5cc4d795d767..0327d3932135 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -51,6 +51,11 @@ using namespace llvm; #define DEBUG_TYPE "scalarizer" +static cl::opt ScalarizeVariableInsertExtract( + "scalarize-variable-insert-extract", cl::init(true), cl::Hidden, + cl::desc("Allow the scalarizer pass to scalarize " + "insertelement/extractelement with variable index")); + // This is disabled by default because having separate loads and stores // makes it more likely that the -combiner-alias-analysis limits will be // reached. @@ -760,7 +765,15 @@ bool ScalarizerVisitor::visitInsertElementInst(InsertElementInst &IEI) { for (unsigned I = 0; I < NumElems; ++I) Res[I] = CI->getValue().getZExtValue() == I ? NewElt : Op0[I]; } else { - return false; + if (!ScalarizeVariableInsertExtract) + return false; + + for (unsigned I = 0; I < NumElems; ++I) { + Res[I] = Builder.CreateSelect( + Builder.CreateICmpEQ(InsIdx, ConstantInt::get(InsIdx->getType(), I), + InsIdx->getName() + ".is." 
+ Twine(I)), + NewElt, Op0[I], IEI.getName() + ".i" + Twine(I)); + } } gather(&IEI, Res); diff --git a/llvm/test/Transforms/Scalarizer/variable-insertelement.ll b/llvm/test/Transforms/Scalarizer/variable-insertelement.ll index fc2955fc1ae4..aeec2ddea2ea 100644 --- a/llvm/test/Transforms/Scalarizer/variable-insertelement.ll +++ b/llvm/test/Transforms/Scalarizer/variable-insertelement.ll @@ -1,50 +1,82 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt %s -scalarizer -scalarize-load-store -dce -S | FileCheck --check-prefixes=ALL %s +; RUN: opt %s -scalarizer -dce -S | FileCheck --check-prefixes=ALL,DEFAULT %s +; RUN: opt %s -scalarizer -scalarize-variable-insert-extract=false -dce -S | FileCheck --check-prefixes=ALL,OFF %s +; RUN: opt %s -scalarizer -scalarize-variable-insert-extract=true -dce -S | FileCheck --check-prefixes=ALL,DEFAULT,ON %s target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" ; Test that variable inserts are scalarized. 
define <4 x i32> @f1(<4 x i32> %src, i32 %val, i32 %index) { -; ALL-LABEL: @f1( -; ALL-NEXT: [[RES:%.*]] = insertelement <4 x i32> [[SRC:%.*]], i32 [[VAL:%.*]], i32 [[INDEX:%.*]] -; ALL-NEXT: ret <4 x i32> [[RES]] +; DEFAULT-LABEL: @f1( +; DEFAULT-NEXT: [[INDEX_IS_0:%.*]] = icmp eq i32 [[INDEX:%.*]], 0 +; DEFAULT-NEXT: [[SRC_I0:%.*]] = extractelement <4 x i32> [[SRC:%.*]], i32 0 +; DEFAULT-NEXT: [[RES_I0:%.*]] = select i1 [[INDEX_IS_0]], i32 [[VAL:%.*]], i32 [[SRC_I0]] +; DEFAULT-NEXT: [[INDEX_IS_1:%.*]] = icmp eq i32 [[INDEX]], 1 +; DEFAULT-NEXT: [[SRC_I1:%.*]] = extractelement <4 x i32> [[SRC]], i32 1 +; DEFAULT-NEXT: [[RES_I1:%.*]] = select i1 [[INDEX_IS_1]], i32 [[VAL]], i32 [[SRC_I1]] +; DEFAULT-NEXT: [[INDEX_IS_2:%.*]] = icmp eq i32 [[INDEX]], 2 +; DEFAULT-NEXT: [[SRC_I2:%.*]] = extractelement <4 x i32> [[SRC]], i32 2 +; DEFAULT-NEXT: [[RES_I2:%.*]] = select i1 [[INDEX_IS_2]], i32 [[VAL]], i32 [[SRC_I2]] +; DEFAULT-NEXT: [[INDEX_IS_3:%.*]] = icmp eq i32 [[INDEX]], 3 +; DEFAULT-NEXT: [[SRC_I3:%.*]] = extractelement <4 x i32> [[SRC]], i32 3 +; DEFAULT-NEXT: [[RES_I3:%.*]] = select i1 [[INDEX_IS_3]], i32 [[VAL]], i32 [[SRC_I3]] +; DEFAULT-NEXT: [[RES_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[RES_I0]], i32 0 +; DEFAULT-NEXT: [[RES_UPTO1:%.*]] = insertelement <4 x i32> [[RES_UPTO0]], i32 [[RES_I1]], i32 1 +; DEFAULT-NEXT: [[RES_UPTO2:%.*]] = insertelement <4 x i32> [[RES_UPTO1]], i32 [[RES_I2]], i32 2 +; DEFAULT-NEXT: [[RES:%.*]] = insertelement <4 x i32> [[RES_UPTO2]], i32 [[RES_I3]], i32 3 +; DEFAULT-NEXT: ret <4 x i32> [[RES]] +; +; OFF-LABEL: @f1( +; OFF-NEXT: [[RES:%.*]] = insertelement <4 x i32> [[SRC:%.*]], i32 [[VAL:%.*]], i32 [[INDEX:%.*]] +; OFF-NEXT: ret <4 x i32> [[RES]] ; %res = insertelement <4 x i32> %src, i32 %val, i32 %index ret <4 x i32> %res } define void @f2(<4 x i32> *%dest, <4 x i32> *%src, i32 %index) { -; ALL-LABEL: @f2( -; ALL-NEXT: [[DEST_I0:%.*]] = bitcast <4 x i32>* [[DEST:%.*]] to i32* -; ALL-NEXT: [[DEST_I1:%.*]] = 
getelementptr i32, i32* [[DEST_I0]], i32 1 -; ALL-NEXT: [[DEST_I2:%.*]] = getelementptr i32, i32* [[DEST_I0]], i32 2 -; ALL-NEXT: [[DEST_I3:%.*]] = getelementptr i32, i32* [[DEST_I0]], i32 3 -; ALL-NEXT: [[SRC_I0:%.*]] = bitcast <4 x i32>* [[SRC:%.*]] to i32* -; ALL-NEXT: [[VAL0_I0:%.*]] = load i32, i32* [[SRC_I0]], align 16 -; ALL-NEXT: [[SRC_I1:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 1 -; ALL-NEXT: [[VAL0_I1:%.*]] = load i32, i32* [[SRC_I1]], align 4 -; ALL-NEXT: [[SRC_I2:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 2 -; ALL-NEXT: [[VAL0_I2:%.*]] = load i32, i32* [[SRC_I2]], align 8 -; ALL-NEXT: [[SRC_I3:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 3 -; ALL-NEXT: [[VAL0_I3:%.*]] = load i32, i32* [[SRC_I3]], align 4 -; ALL-NEXT: [[VAL0_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL0_I0]], i32 0 -; ALL-NEXT: [[VAL0_UPTO1:%.*]] = insertelement <4 x i32> [[VAL0_UPTO0]], i32 [[VAL0_I1]], i32 1 -; ALL-NEXT: [[VAL0_UPTO2:%.*]] = insertelement <4 x i32> [[VAL0_UPTO1]], i32 [[VAL0_I2]], i32 2 -; ALL-NEXT: [[VAL0:%.*]] = insertelement <4 x i32> [[VAL0_UPTO2]], i32 [[VAL0_I3]], i32 3 -; ALL-NEXT: [[VAL1:%.*]] = insertelement <4 x i32> [[VAL0]], i32 1, i32 [[INDEX:%.*]] -; ALL-NEXT: [[VAL1_I0:%.*]] = extractelement <4 x i32> [[VAL1]], i32 0 -; ALL-NEXT: [[VAL2_I0:%.*]] = shl i32 1, [[VAL1_I0]] -; ALL-NEXT: [[VAL1_I1:%.*]] = extractelement <4 x i32> [[VAL1]], i32 1 -; ALL-NEXT: [[VAL2_I1:%.*]] = shl i32 2, [[VAL1_I1]] -; ALL-NEXT: [[VAL1_I2:%.*]] = extractelement <4 x i32> [[VAL1]], i32 2 -; ALL-NEXT: [[VAL2_I2:%.*]] = shl i32 3, [[VAL1_I2]] -; ALL-NEXT: [[VAL1_I3:%.*]] = extractelement <4 x i32> [[VAL1]], i32 3 -; ALL-NEXT: [[VAL2_I3:%.*]] = shl i32 4, [[VAL1_I3]] -; ALL-NEXT: store i32 [[VAL2_I0]], i32* [[DEST_I0]], align 16 -; ALL-NEXT: store i32 [[VAL2_I1]], i32* [[DEST_I1]], align 4 -; ALL-NEXT: store i32 [[VAL2_I2]], i32* [[DEST_I2]], align 8 -; ALL-NEXT: store i32 [[VAL2_I3]], i32* [[DEST_I3]], align 4 -; ALL-NEXT: ret void +; DEFAULT-LABEL: @f2( 
+; DEFAULT-NEXT: [[VAL0:%.*]] = load <4 x i32>, <4 x i32>* [[SRC:%.*]], align 16 +; DEFAULT-NEXT: [[INDEX_IS_0:%.*]] = icmp eq i32 [[INDEX:%.*]], 0 +; DEFAULT-NEXT: [[VAL0_I0:%.*]] = extractelement <4 x i32> [[VAL0]], i32 0 +; DEFAULT-NEXT: [[VAL1_I0:%.*]] = select i1 [[INDEX_IS_0]], i32 1, i32 [[VAL0_I0]] +; DEFAULT-NEXT: [[INDEX_IS_1:%.*]] = icmp eq i32 [[INDEX]], 1 +; DEFAULT-NEXT: [[VAL0_I1:%.*]] = extractelement <4 x i32> [[VAL0]], i32 1 +; DEFAULT-NEXT: [[VAL1_I1:%.*]] = select i1 [[INDEX_IS_1]], i32 1, i32 [[VAL0_I1]] +; DEFAULT-NEXT: [[INDEX_IS_2:%.*]] = icmp eq i32 [[INDEX]], 2 +; DEFAULT-NEXT: [[VAL0_I2:%.*]] = extractelement <4 x i32> [[VAL0]], i32 2 +; DEFAULT-NEXT: [[VAL1_I2:%.*]] = select i1 [[INDEX_IS_2]], i32 1, i32 [[VAL0_I2]] +; DEFAULT-NEXT: [[INDEX_IS_3:%.*]] = icmp eq i32 [[INDEX]], 3 +; DEFAULT-NEXT: [[VAL0_I3:%.*]] = extractelement <4 x i32> [[VAL0]], i32 3 +; DEFAULT-NEXT: [[VAL1_I3:%.*]] = select i1 [[INDEX_IS_3]], i32 1, i32 [[VAL0_I3]] +; DEFAULT-NEXT: [[VAL2_I0:%.*]] = shl i32 1, [[VAL1_I0]] +; DEFAULT-NEXT: [[VAL2_I1:%.*]] = shl i32 2, [[VAL1_I1]] +; DEFAULT-NEXT: [[VAL2_I2:%.*]] = shl i32 3, [[VAL1_I2]] +; DEFAULT-NEXT: [[VAL2_I3:%.*]] = shl i32 4, [[VAL1_I3]] +; DEFAULT-NEXT: [[VAL2_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL2_I0]], i32 0 +; DEFAULT-NEXT: [[VAL2_UPTO1:%.*]] = insertelement <4 x i32> [[VAL2_UPTO0]], i32 [[VAL2_I1]], i32 1 +; DEFAULT-NEXT: [[VAL2_UPTO2:%.*]] = insertelement <4 x i32> [[VAL2_UPTO1]], i32 [[VAL2_I2]], i32 2 +; DEFAULT-NEXT: [[VAL2:%.*]] = insertelement <4 x i32> [[VAL2_UPTO2]], i32 [[VAL2_I3]], i32 3 +; DEFAULT-NEXT: store <4 x i32> [[VAL2]], <4 x i32>* [[DEST:%.*]], align 16 +; DEFAULT-NEXT: ret void +; +; OFF-LABEL: @f2( +; OFF-NEXT: [[VAL0:%.*]] = load <4 x i32>, <4 x i32>* [[SRC:%.*]], align 16 +; OFF-NEXT: [[VAL1:%.*]] = insertelement <4 x i32> [[VAL0]], i32 1, i32 [[INDEX:%.*]] +; OFF-NEXT: [[VAL1_I0:%.*]] = extractelement <4 x i32> [[VAL1]], i32 0 +; OFF-NEXT: [[VAL2_I0:%.*]] = shl i32 
1, [[VAL1_I0]] +; OFF-NEXT: [[VAL1_I1:%.*]] = extractelement <4 x i32> [[VAL1]], i32 1 +; OFF-NEXT: [[VAL2_I1:%.*]] = shl i32 2, [[VAL1_I1]] +; OFF-NEXT: [[VAL1_I2:%.*]] = extractelement <4 x i32> [[VAL1]], i32 2 +; OFF-NEXT: [[VAL2_I2:%.*]] = shl i32 3, [[VAL1_I2]] +; OFF-NEXT: [[VAL1_I3:%.*]] = extractelement <4 x i32> [[VAL1]], i32 3 +; OFF-NEXT: [[VAL2_I3:%.*]] = shl i32 4, [[VAL1_I3]] +; OFF-NEXT: [[VAL2_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL2_I0]], i32 0 +; OFF-NEXT: [[VAL2_UPTO1:%.*]] = insertelement <4 x i32> [[VAL2_UPTO0]], i32 [[VAL2_I1]], i32 1 +; OFF-NEXT: [[VAL2_UPTO2:%.*]] = insertelement <4 x i32> [[VAL2_UPTO1]], i32 [[VAL2_I2]], i32 2 +; OFF-NEXT: [[VAL2:%.*]] = insertelement <4 x i32> [[VAL2_UPTO2]], i32 [[VAL2_I3]], i32 3 +; OFF-NEXT: store <4 x i32> [[VAL2]], <4 x i32>* [[DEST:%.*]], align 16 +; OFF-NEXT: ret void ; %val0 = load <4 x i32> , <4 x i32> *%src %val1 = insertelement <4 x i32> %val0, i32 1, i32 %index From llvm-commits at lists.llvm.org Mon Jul 6 03:20:13 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 03:20:13 -0700 (PDT) Subject: [llvm] 51f9310 - [Scalarizer] ExtractElement handling w/ variable insert index (PR46524) Message-ID: <5f02fadd.1c69fb81.a6e53.000d@mx.google.com> Author: Roman Lebedev Date: 2020-07-06T13:19:33+03:00 New Revision: 51f9310ff2e3a615e43b87acc84dab0400b5854e URL: https://github.com/llvm/llvm-project/commit/51f9310ff2e3a615e43b87acc84dab0400b5854e DIFF: https://github.com/llvm/llvm-project/commit/51f9310ff2e3a615e43b87acc84dab0400b5854e.diff LOG: [Scalarizer] ExtractElement handling w/ variable insert index (PR46524) Summary: Similar to D82961. 
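For the extract side, a comparable plain C++ sketch (again not LLVM API; Vec4, extractWithVariableIndex and the Placeholder parameter are assumptions made for this example) shows the select chain: the result starts as an arbitrary value standing in for undef, and each lane overrides it when the index matches.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a scattered <4 x i32> value. Not an LLVM type.
using Vec4 = std::array<std::int32_t, 4>;

// Scalar model of `extractelement <4 x i32> %src, i32 %index` with a run-time
// index. Placeholder plays the role of the initial undef value in the chain.
inline std::int32_t extractWithVariableIndex(const Vec4 &Src,
                                             std::uint32_t Index,
                                             std::int32_t Placeholder = 0) {
  std::int32_t Res = Placeholder;
  for (std::size_t I = 0; I < Src.size(); ++I) {
    bool IndexIsI = (Index == I);  // models `icmp eq i32 %index, I`
    Res = IndexIsI ? Src[I] : Res; // models `select ..., %res.upto(I-1)`
  }
  return Res;
}
```

If no lane matches, the placeholder falls through unchanged, which mirrors the fact that the IR chain bottoms out in undef.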
Reviewers: bjope, cameron.mcinally, arsenm, jdoerfert Reviewed By: jdoerfert Subscribers: arphaman, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82970 Added: Modified: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/variable-extractelement.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index 0327d3932135..a775be6ef7b8 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -785,6 +785,7 @@ bool ScalarizerVisitor::visitExtractElementInst(ExtractElementInst &EEI) { if (!VT) return false; + unsigned NumSrcElems = VT->getNumElements(); IRBuilder<> Builder(&EEI); Scatterer Op0 = scatter(&EEI, EEI.getOperand(0)); Value *ExtIdx = EEI.getOperand(1); @@ -795,7 +796,18 @@ bool ScalarizerVisitor::visitExtractElementInst(ExtractElementInst &EEI) { return true; } - return false; + if (!ScalarizeVariableInsertExtract) + return false; + + Value *Res = UndefValue::get(VT->getElementType()); + for (unsigned I = 0; I < NumSrcElems; ++I) { + Res = Builder.CreateSelect( + Builder.CreateICmpEQ(ExtIdx, ConstantInt::get(ExtIdx->getType(), I), + ExtIdx->getName() + ".is." 
+ Twine(I)), + Op0[I], Res, EEI.getName() + ".upto" + Twine(I)); + } + gather(&EEI, {Res}); + return true; } bool ScalarizerVisitor::visitShuffleVectorInst(ShuffleVectorInst &SVI) { diff --git a/llvm/test/Transforms/Scalarizer/variable-extractelement.ll b/llvm/test/Transforms/Scalarizer/variable-extractelement.ll index 2f1c24878de0..50666562af32 100644 --- a/llvm/test/Transforms/Scalarizer/variable-extractelement.ll +++ b/llvm/test/Transforms/Scalarizer/variable-extractelement.ll @@ -1,38 +1,72 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt %s -scalarizer -scalarize-load-store -dce -S | FileCheck --check-prefixes=ALL %s +; RUN: opt %s -scalarizer -dce -S | FileCheck --check-prefixes=ALL,DEFAULT %s +; RUN: opt %s -scalarizer -scalarize-variable-insert-extract=false -dce -S | FileCheck --check-prefixes=ALL,OFF %s +; RUN: opt %s -scalarizer -scalarize-variable-insert-extract=true -dce -S | FileCheck --check-prefixes=ALL,DEFAULT,ON %s target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" ; Test that variable extracts scalarized. 
define i32 @f1(<4 x i32> %src, i32 %index) { -; ALL-LABEL: @f1( -; ALL-NEXT: [[RES:%.*]] = extractelement <4 x i32> [[SRC:%.*]], i32 [[INDEX:%.*]] -; ALL-NEXT: ret i32 [[RES]] +; DEFAULT-LABEL: @f1( +; DEFAULT-NEXT: [[INDEX_IS_0:%.*]] = icmp eq i32 [[INDEX:%.*]], 0 +; DEFAULT-NEXT: [[SRC_I0:%.*]] = extractelement <4 x i32> [[SRC:%.*]], i32 0 +; DEFAULT-NEXT: [[RES_UPTO0:%.*]] = select i1 [[INDEX_IS_0]], i32 [[SRC_I0]], i32 undef +; DEFAULT-NEXT: [[INDEX_IS_1:%.*]] = icmp eq i32 [[INDEX]], 1 +; DEFAULT-NEXT: [[SRC_I1:%.*]] = extractelement <4 x i32> [[SRC]], i32 1 +; DEFAULT-NEXT: [[RES_UPTO1:%.*]] = select i1 [[INDEX_IS_1]], i32 [[SRC_I1]], i32 [[RES_UPTO0]] +; DEFAULT-NEXT: [[INDEX_IS_2:%.*]] = icmp eq i32 [[INDEX]], 2 +; DEFAULT-NEXT: [[SRC_I2:%.*]] = extractelement <4 x i32> [[SRC]], i32 2 +; DEFAULT-NEXT: [[RES_UPTO2:%.*]] = select i1 [[INDEX_IS_2]], i32 [[SRC_I2]], i32 [[RES_UPTO1]] +; DEFAULT-NEXT: [[INDEX_IS_3:%.*]] = icmp eq i32 [[INDEX]], 3 +; DEFAULT-NEXT: [[SRC_I3:%.*]] = extractelement <4 x i32> [[SRC]], i32 3 +; DEFAULT-NEXT: [[RES:%.*]] = select i1 [[INDEX_IS_3]], i32 [[SRC_I3]], i32 [[RES_UPTO2]] +; DEFAULT-NEXT: ret i32 [[RES]] +; +; OFF-LABEL: @f1( +; OFF-NEXT: [[RES:%.*]] = extractelement <4 x i32> [[SRC:%.*]], i32 [[INDEX:%.*]] +; OFF-NEXT: ret i32 [[RES]] ; %res = extractelement <4 x i32> %src, i32 %index ret i32 %res } define i32 @f2(<4 x i32> *%src, i32 %index) { -; ALL-LABEL: @f2( -; ALL-NEXT: [[SRC_I0:%.*]] = bitcast <4 x i32>* [[SRC:%.*]] to i32* -; ALL-NEXT: [[VAL0_I0:%.*]] = load i32, i32* [[SRC_I0]], align 16 -; ALL-NEXT: [[SRC_I1:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 1 -; ALL-NEXT: [[VAL0_I1:%.*]] = load i32, i32* [[SRC_I1]], align 4 -; ALL-NEXT: [[SRC_I2:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 2 -; ALL-NEXT: [[VAL0_I2:%.*]] = load i32, i32* [[SRC_I2]], align 8 -; ALL-NEXT: [[SRC_I3:%.*]] = getelementptr i32, i32* [[SRC_I0]], i32 3 -; ALL-NEXT: [[VAL0_I3:%.*]] = load i32, i32* [[SRC_I3]], align 4 -; ALL-NEXT: 
[[VAL1_I0:%.*]] = shl i32 1, [[VAL0_I0]] -; ALL-NEXT: [[VAL1_I1:%.*]] = shl i32 2, [[VAL0_I1]] -; ALL-NEXT: [[VAL1_I2:%.*]] = shl i32 3, [[VAL0_I2]] -; ALL-NEXT: [[VAL1_I3:%.*]] = shl i32 4, [[VAL0_I3]] -; ALL-NEXT: [[VAL1_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL1_I0]], i32 0 -; ALL-NEXT: [[VAL1_UPTO1:%.*]] = insertelement <4 x i32> [[VAL1_UPTO0]], i32 [[VAL1_I1]], i32 1 -; ALL-NEXT: [[VAL1_UPTO2:%.*]] = insertelement <4 x i32> [[VAL1_UPTO1]], i32 [[VAL1_I2]], i32 2 -; ALL-NEXT: [[VAL1:%.*]] = insertelement <4 x i32> [[VAL1_UPTO2]], i32 [[VAL1_I3]], i32 3 -; ALL-NEXT: [[VAL2:%.*]] = extractelement <4 x i32> [[VAL1]], i32 [[INDEX:%.*]] -; ALL-NEXT: ret i32 [[VAL2]] +; DEFAULT-LABEL: @f2( +; DEFAULT-NEXT: [[VAL0:%.*]] = load <4 x i32>, <4 x i32>* [[SRC:%.*]], align 16 +; DEFAULT-NEXT: [[VAL0_I0:%.*]] = extractelement <4 x i32> [[VAL0]], i32 0 +; DEFAULT-NEXT: [[VAL1_I0:%.*]] = shl i32 1, [[VAL0_I0]] +; DEFAULT-NEXT: [[VAL0_I1:%.*]] = extractelement <4 x i32> [[VAL0]], i32 1 +; DEFAULT-NEXT: [[VAL1_I1:%.*]] = shl i32 2, [[VAL0_I1]] +; DEFAULT-NEXT: [[VAL0_I2:%.*]] = extractelement <4 x i32> [[VAL0]], i32 2 +; DEFAULT-NEXT: [[VAL1_I2:%.*]] = shl i32 3, [[VAL0_I2]] +; DEFAULT-NEXT: [[VAL0_I3:%.*]] = extractelement <4 x i32> [[VAL0]], i32 3 +; DEFAULT-NEXT: [[VAL1_I3:%.*]] = shl i32 4, [[VAL0_I3]] +; DEFAULT-NEXT: [[INDEX_IS_0:%.*]] = icmp eq i32 [[INDEX:%.*]], 0 +; DEFAULT-NEXT: [[VAL2_UPTO0:%.*]] = select i1 [[INDEX_IS_0]], i32 [[VAL1_I0]], i32 undef +; DEFAULT-NEXT: [[INDEX_IS_1:%.*]] = icmp eq i32 [[INDEX]], 1 +; DEFAULT-NEXT: [[VAL2_UPTO1:%.*]] = select i1 [[INDEX_IS_1]], i32 [[VAL1_I1]], i32 [[VAL2_UPTO0]] +; DEFAULT-NEXT: [[INDEX_IS_2:%.*]] = icmp eq i32 [[INDEX]], 2 +; DEFAULT-NEXT: [[VAL2_UPTO2:%.*]] = select i1 [[INDEX_IS_2]], i32 [[VAL1_I2]], i32 [[VAL2_UPTO1]] +; DEFAULT-NEXT: [[INDEX_IS_3:%.*]] = icmp eq i32 [[INDEX]], 3 +; DEFAULT-NEXT: [[VAL2:%.*]] = select i1 [[INDEX_IS_3]], i32 [[VAL1_I3]], i32 [[VAL2_UPTO2]] +; DEFAULT-NEXT: ret i32 
[[VAL2]] +; +; OFF-LABEL: @f2( +; OFF-NEXT: [[VAL0:%.*]] = load <4 x i32>, <4 x i32>* [[SRC:%.*]], align 16 +; OFF-NEXT: [[VAL0_I0:%.*]] = extractelement <4 x i32> [[VAL0]], i32 0 +; OFF-NEXT: [[VAL1_I0:%.*]] = shl i32 1, [[VAL0_I0]] +; OFF-NEXT: [[VAL0_I1:%.*]] = extractelement <4 x i32> [[VAL0]], i32 1 +; OFF-NEXT: [[VAL1_I1:%.*]] = shl i32 2, [[VAL0_I1]] +; OFF-NEXT: [[VAL0_I2:%.*]] = extractelement <4 x i32> [[VAL0]], i32 2 +; OFF-NEXT: [[VAL1_I2:%.*]] = shl i32 3, [[VAL0_I2]] +; OFF-NEXT: [[VAL0_I3:%.*]] = extractelement <4 x i32> [[VAL0]], i32 3 +; OFF-NEXT: [[VAL1_I3:%.*]] = shl i32 4, [[VAL0_I3]] +; OFF-NEXT: [[VAL1_UPTO0:%.*]] = insertelement <4 x i32> undef, i32 [[VAL1_I0]], i32 0 +; OFF-NEXT: [[VAL1_UPTO1:%.*]] = insertelement <4 x i32> [[VAL1_UPTO0]], i32 [[VAL1_I1]], i32 1 +; OFF-NEXT: [[VAL1_UPTO2:%.*]] = insertelement <4 x i32> [[VAL1_UPTO1]], i32 [[VAL1_I2]], i32 2 +; OFF-NEXT: [[VAL1:%.*]] = insertelement <4 x i32> [[VAL1_UPTO2]], i32 [[VAL1_I3]], i32 3 +; OFF-NEXT: [[VAL2:%.*]] = extractelement <4 x i32> [[VAL1]], i32 [[INDEX:%.*]] +; OFF-NEXT: ret i32 [[VAL2]] ; %val0 = load <4 x i32> , <4 x i32> *%src %val1 = shl <4 x i32> , %val0 From llvm-commits at lists.llvm.org Mon Jul 6 03:21:02 2020 From: llvm-commits at lists.llvm.org (Sam McCall via llvm-commits) Date: Mon, 06 Jul 2020 03:21:02 -0700 (PDT) Subject: [llvm] cd209f1 - [Support] Add path::user_config_directory for $XDG_CONFIG_HOME etc Message-ID: <5f02fb0e.1c69fb81.461e0.f658@mx.google.com> Author: Sam McCall Date: 2020-07-06T12:20:55+02:00 New Revision: cd209f1a3790af774b75213d7914c844a6140b4b URL: https://github.com/llvm/llvm-project/commit/cd209f1a3790af774b75213d7914c844a6140b4b DIFF: https://github.com/llvm/llvm-project/commit/cd209f1a3790af774b75213d7914c844a6140b4b.diff LOG: [Support] Add path::user_config_directory for $XDG_CONFIG_HOME etc Reviewers: hokein Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83128 Added: Modified: 
llvm/include/llvm/Support/Path.h llvm/lib/Support/Unix/Path.inc llvm/lib/Support/Windows/Path.inc llvm/unittests/Support/Path.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/Path.h b/llvm/include/llvm/Support/Path.h index cdfff2aa7a51..83bca5b70bc2 100644 --- a/llvm/include/llvm/Support/Path.h +++ b/llvm/include/llvm/Support/Path.h @@ -371,6 +371,13 @@ void system_temp_directory(bool erasedOnReboot, SmallVectorImpl &result); /// @result True if a home directory is set, false otherwise. bool home_directory(SmallVectorImpl &result); +/// Get the directory where packages should read user-specific configurations. +/// e.g. $XDG_CONFIG_HOME. +/// +/// @param result Holds the resulting path name. +/// @result True if the appropriate path was determined, it need not exist. +bool user_config_directory(SmallVectorImpl &result); + /// Get the directory where installed packages should put their /// machine-local cache, e.g. $XDG_CACHE_HOME. 
/// diff --git a/llvm/lib/Support/Unix/Path.inc b/llvm/lib/Support/Unix/Path.inc index 2576da4506a0..824e8b0ca899 100644 --- a/llvm/lib/Support/Unix/Path.inc +++ b/llvm/lib/Support/Unix/Path.inc @@ -1158,6 +1158,30 @@ static bool getDarwinConfDir(bool TempDir, SmallVectorImpl &Result) { return false; } +bool user_config_directory(SmallVectorImpl &result) { +#ifdef __APPLE__ + // Mac: ~/Library/Preferences/ + if (home_directory(result)) { + append("Library", "Preferences"); + return true; + } +#else + // XDG_CONFIG_HOME as defined in the XDG Base Directory Specification: + // http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html + if (const char *RequestedDir = getenv("XDG_CONFIG_HOME")) { + result.clear(); + result.append(RequestedDir, RequestedDir + strlen(RequestedDir)); + return true; + } +#endif + // Fallback: ~/.config + if (!home_directory(result)) { + return false; + } + append(result, ".config"); + return true; +} + bool cache_directory(SmallVectorImpl &result) { #ifdef __APPLE__ if (getDarwinConfDir(false/*tempDir*/, result)) { diff --git a/llvm/lib/Support/Windows/Path.inc b/llvm/lib/Support/Windows/Path.inc index 96677e23b660..399a0cc7a25c 100644 --- a/llvm/lib/Support/Windows/Path.inc +++ b/llvm/lib/Support/Windows/Path.inc @@ -1372,6 +1372,12 @@ bool home_directory(SmallVectorImpl &result) { return getKnownFolderPath(FOLDERID_Profile, result); } +bool user_config_directory(SmallVectorImpl &result) { + // Either local or roaming appdata may be suitable in some cases, depending + // on the data. Local is more conservative, Roaming may not always be correct. 
+ return getKnownFolderPath(FOLDERID_LocalAppData, result); +} + bool cache_directory(SmallVectorImpl &result) { return getKnownFolderPath(FOLDERID_LocalAppData, result); } diff --git a/llvm/unittests/Support/Path.cpp b/llvm/unittests/Support/Path.cpp index 906bf80a1850..6a228f093987 100644 --- a/llvm/unittests/Support/Path.cpp +++ b/llvm/unittests/Support/Path.cpp @@ -439,6 +439,26 @@ TEST(Support, HomeDirectoryWithNoEnv) { EXPECT_EQ(PwDir, HomeDir); } +TEST(Support, ConfigDirectoryWithEnv) { + WithEnv Env("XDG_CONFIG_HOME", "/xdg/config"); + + SmallString<128> ConfigDir; + EXPECT_TRUE(path::user_config_directory(ConfigDir)); + EXPECT_EQ("/xdg/config", ConfigDir); +} + +TEST(Support, ConfigDirectoryNoEnv) { + WithEnv Env("XDG_CONFIG_HOME", nullptr); + + SmallString<128> Fallback; + ASSERT_TRUE(path::home_directory(Fallback)); + path::append(Fallback, ".config"); + + SmallString<128> CacheDir; + EXPECT_TRUE(path::user_config_directory(CacheDir)); + EXPECT_EQ(Fallback, CacheDir); +} + TEST(Support, CacheDirectoryWithEnv) { WithEnv Env("XDG_CACHE_HOME", "/xdg/cache"); @@ -460,7 +480,29 @@ TEST(Support, CacheDirectoryNoEnv) { } #endif +#ifdef __APPLE__ +TEST(Support, ConfigDirectory) { + SmallString<128> Fallback; + ASSERT_TRUE(path::home_directory(Fallback)); + path::append(Fallback, "Library/Preferences"); + + SmallString<128> ConfigDir; + EXPECT_TRUE(path::user_config_directory(ConfigDir)); + EXPECT_EQ(Fallback, ConfigDir); +} +#endif + #ifdef _WIN32 +TEST(Support, ConfigDirectory) { + std::string Expected = getEnvWin(L"LOCALAPPDATA"); + // Do not try to test it if we don't know what to expect. + if (!Expected.empty()) { + SmallString<128> CacheDir; + EXPECT_TRUE(path::user_config_directory(CacheDir)); + EXPECT_EQ(Expected, CacheDir); + } +} + TEST(Support, CacheDirectory) { std::string Expected = getEnvWin(L"LOCALAPPDATA"); // Do not try to test it if we don't know what to expect. 
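The lookup order on non-Apple Unix hosts can be sketched without any LLVM dependency. The helper below is hypothetical (sketchUserConfigDir is not part of llvm::sys::path) and takes the environment values as parameters so that it stays testable. It mirrors the patch in accepting any non-null XDG_CONFIG_HOME, falling back to .config under the home directory, and reporting failure when neither is known.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, assuming the non-Apple Unix branch of the patch:
//   1. use $XDG_CONFIG_HOME when it is set (even if empty, as in the patch),
//   2. otherwise fall back to <home>/.config,
//   3. otherwise report failure.
// The bool-plus-out-parameter shape follows the convention of the real API.
inline bool sketchUserConfigDir(const char *XdgConfigHome, const char *HomeDir,
                                std::string &Result) {
  if (XdgConfigHome) {
    Result = XdgConfigHome;
    return true;
  }
  if (!HomeDir)
    return false;
  Result = HomeDir;
  // Add a separator only when one is needed.
  if (!Result.empty() && Result.back() != '/')
    Result += '/';
  Result += ".config";
  return true;
}
```

A caller could pass std::getenv("XDG_CONFIG_HOME") and std::getenv("HOME"). The sketch does not model any further fallback that home_directory itself may perform when HOME is unset.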
From llvm-commits at lists.llvm.org Mon Jul 6 03:33:19 2020
From: llvm-commits at lists.llvm.org (David Green via llvm-commits)
Date: Mon, 06 Jul 2020 03:33:19 -0700 (PDT)
Subject: [llvm] 60b8b2b - [ARM] Add extra extend and trunc costs for cast instructions
Message-ID: <5f02fdef.1c69fb81.b80e3.eb18@mx.google.com>

Author: David Green
Date: 2020-07-06T11:33:05+01:00
New Revision: 60b8b2beeab9b6a994108da6ea3ab225a9e7bd9a

URL: https://github.com/llvm/llvm-project/commit/60b8b2beeab9b6a994108da6ea3ab225a9e7bd9a
DIFF: https://github.com/llvm/llvm-project/commit/60b8b2beeab9b6a994108da6ea3ab225a9e7bd9a.diff

LOG: [ARM] Add extra extend and trunc costs for cast instructions

This expands the existing extend costs with a few extras for types larger
than legal, which will usually be split under MVE. It also adds trunc
support for the same thing. These should not have a large effect on many
things, but they make the costs explicit and keep a certain balance between
the truncs and extends.

Differential Revision: https://reviews.llvm.org/D82457

Added:

Modified:
    llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
    llvm/test/Analysis/CostModel/ARM/cast_ldst.ll

Removed:

################################################################################
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index fa71a20d64f5..f3e1b5887bc0 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -228,12 +228,39 @@ int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
         {ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 0},
         {ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i8, 0},
         {ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i8, 0},
+        // The following extend from a legal type to an illegal type, so need to
+        // split the load. This introduced an extra load operation, but the
+        // extend is still "free".
+ {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 1}, + {ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 1}, + {ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 3}, + {ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 3}, + {ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 1}, + {ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 1}, }; if (SrcTy.isVector() && ST->hasMVEIntegerOps()) { if (const auto *Entry = ConvertCostTableLookup(MVELoadConversionTbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT())) - return AdjustCost(Entry->Cost); + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); + } + } + + // The truncate of a store is free. This is the mirror of extends above. + if (I && I->hasOneUse() && isa(*I->user_begin())) { + static const TypeConversionCostTblEntry MVELoadConversionTbl[] = { + {ISD::TRUNCATE, MVT::v4i32, MVT::v4i16, 0}, + {ISD::TRUNCATE, MVT::v4i32, MVT::v4i8, 0}, + {ISD::TRUNCATE, MVT::v8i16, MVT::v8i8, 0}, + {ISD::TRUNCATE, MVT::v8i32, MVT::v8i16, 1}, + {ISD::TRUNCATE, MVT::v16i32, MVT::v16i8, 3}, + {ISD::TRUNCATE, MVT::v16i16, MVT::v16i8, 1}, + }; + if (SrcTy.isVector() && ST->hasMVEIntegerOps()) { + if (const auto *Entry = + ConvertCostTableLookup(MVELoadConversionTbl, ISD, SrcTy.getSimpleVT(), + DstTy.getSimpleVT())) + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); } } diff --git a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll index b22f8ed9e543..b9dc8a10a3c4 100644 --- a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll +++ b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll @@ -122,8 +122,8 @@ define i32 @load_extends() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 74 for instruction: %v8864u = zext <8 x i8> %loadv8i8 to <8 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816s = sext <16 x i8> %loadv16i8 to <16 x i16> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816u = zext <16 x i8> %loadv16i8 to <16 x i16> -; 
CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832s = sext <16 x i8> %loadv16i8 to <16 x i32> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832u = zext <16 x i8> %loadv16i8 to <16 x i32> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832s = sext <16 x i8> %loadv16i8 to <16 x i32> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832u = zext <16 x i8> %loadv16i8 to <16 x i32> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1322 for instruction: %v16864s = sext <16 x i8> %loadv16i8 to <16 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 298 for instruction: %v16864u = zext <16 x i8> %loadv16i8 to <16 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v21632s = sext <2 x i16> %loadv2i16 to <2 x i32> @@ -758,7 +758,7 @@ define i32 @store_trunc() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8832 = trunc <8 x i32> undef to <8 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v8864 = trunc <8 x i64> undef to <8 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816 = trunc <16 x i16> undef to <16 x i8> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832 = trunc <16 x i32> undef to <16 x i8> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832 = trunc <16 x i32> undef to <16 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v16864 = trunc <16 x i64> undef to <16 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v21632 = trunc <2 x i32> undef to <2 x i16> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v21664 = trunc <2 x i64> undef to <2 
x i16> From llvm-commits at lists.llvm.org Mon Jul 6 03:42:57 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 03:42:57 -0700 (PDT) Subject: [llvm] 5d7afe2 - [Scalarizer] visit{Insert,Extract}ElementInst(): avoid call arg evaluation order deps Message-ID: <5f030031.1c69fb81.72481.1cb7@mx.google.com> Author: Roman Lebedev Date: 2020-07-06T13:42:35+03:00 New Revision: 5d7afe2d2e3c1a4715d022bfdb0c35df153e5430 URL: https://github.com/llvm/llvm-project/commit/5d7afe2d2e3c1a4715d022bfdb0c35df153e5430 DIFF: https://github.com/llvm/llvm-project/commit/5d7afe2d2e3c1a4715d022bfdb0c35df153e5430.diff LOG: [Scalarizer] visit{Insert,Extract}ElementInst(): avoid call arg evaluation order deps Compilers may evaluate call arguments in different order, which would result in different order of IR, which would break the tests. Spotted thanks to Dmitri Gribenko! Added: Modified: llvm/lib/Transforms/Scalar/Scalarizer.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index a775be6ef7b8..5cac4dca8cf8 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -769,10 +769,12 @@ bool ScalarizerVisitor::visitInsertElementInst(InsertElementInst &IEI) { return false; for (unsigned I = 0; I < NumElems; ++I) { - Res[I] = Builder.CreateSelect( + Value *ShouldReplace = Builder.CreateICmpEQ(InsIdx, ConstantInt::get(InsIdx->getType(), I), - InsIdx->getName() + ".is." + Twine(I)), - NewElt, Op0[I], IEI.getName() + ".i" + Twine(I)); + InsIdx->getName() + ".is." 
+ Twine(I)); + Value *OldElt = Op0[I]; + Res[I] = Builder.CreateSelect(ShouldReplace, NewElt, OldElt, + IEI.getName() + ".i" + Twine(I)); } } @@ -801,10 +803,12 @@ bool ScalarizerVisitor::visitExtractElementInst(ExtractElementInst &EEI) { Value *Res = UndefValue::get(VT->getElementType()); for (unsigned I = 0; I < NumSrcElems; ++I) { - Res = Builder.CreateSelect( + Value *ShouldExtract = Builder.CreateICmpEQ(ExtIdx, ConstantInt::get(ExtIdx->getType(), I), - ExtIdx->getName() + ".is." + Twine(I)), - Op0[I], Res, EEI.getName() + ".upto" + Twine(I)); + ExtIdx->getName() + ".is." + Twine(I)); + Value *Elt = Op0[I]; + Res = Builder.CreateSelect(ShouldExtract, Elt, Res, + EEI.getName() + ".upto" + Twine(I)); } gather(&EEI, {Res}); return true; From llvm-commits at lists.llvm.org Mon Jul 6 03:49:32 2020 From: llvm-commits at lists.llvm.org (Kai Nacke via llvm-commits) Date: Mon, 06 Jul 2020 03:49:32 -0700 (PDT) Subject: [llvm] 0663844 - [SystemZ/ZOS] Define Endian constants for z/OS. Message-ID: <5f0301bc.1c69fb81.25532.0c16@mx.google.com> Author: Kai Nacke Date: 2020-07-06T06:48:16-04:00 New Revision: 0663844b064dca074cbf12e868b9b3214cf52848 URL: https://github.com/llvm/llvm-project/commit/0663844b064dca074cbf12e868b9b3214cf52848 DIFF: https://github.com/llvm/llvm-project/commit/0663844b064dca074cbf12e868b9b3214cf52848.diff LOG: [SystemZ/ZOS] Define Endian constants for z/OS. This is needed to build LLVM on z/OS, as there is no header file which provides these constants. 
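For readers unfamiliar with these constants, here is a minimal sketch of what they encode. It is not part of the patch: the names kBigEndian, kLittleEndian and hostByteOrder are hypothetical, and only the numeric values 4321 and 1234 are taken from the diff below. The probe assumes the host is either big- or little-endian.

```cpp
#include <cstdint>
#include <cstring>

// Same numeric values as BIG_ENDIAN / LITTLE_ENDIAN in the patch. The names
// are hypothetical, chosen so they cannot clash with system macros.
constexpr int kBigEndian = 4321;
constexpr int kLittleEndian = 1234;

// Hypothetical helper: detect the host byte order at run time by looking at
// the first byte in memory of a known 32-bit value.
inline int hostByteOrder() {
  const std::uint32_t Probe = 0x01020304u;
  unsigned char First = 0;
  std::memcpy(&First, &Probe, 1);
  // A big-endian host stores the most significant byte (0x01) first.
  return First == 0x01 ? kBigEndian : kLittleEndian;
}
```

On z/OS the patch hard-codes BYTE_ORDER to BIG_ENDIAN, so a run-time probe like this one would be expected to agree with it there.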
Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D82368 Added: Modified: llvm/include/llvm/Support/SwapByteOrder.h Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/SwapByteOrder.h b/llvm/include/llvm/Support/SwapByteOrder.h index 500df2355307..0e544fc7e71e 100644 --- a/llvm/include/llvm/Support/SwapByteOrder.h +++ b/llvm/include/llvm/Support/SwapByteOrder.h @@ -36,6 +36,10 @@ #else #define BYTE_ORDER LITTLE_ENDIAN #endif +#elif defined(__MVS__) +#define BIG_ENDIAN 4321 +#define LITTLE_ENDIAN 1234 +#define BYTE_ORDER BIG_ENDIAN #else #if !defined(BYTE_ORDER) && !defined(_WIN32) #include From llvm-commits at lists.llvm.org Mon Jul 6 03:49:34 2020 From: llvm-commits at lists.llvm.org (Kai Nacke via llvm-commits) Date: Mon, 06 Jul 2020 03:49:34 -0700 (PDT) Subject: [llvm] bfd84b1 - [SystemZ/ZOS] Implement getMainExecutable() and is_local_impl() Message-ID: <5f0301be.1c69fb81.f4dfe.efca@mx.google.com> Author: Kai Nacke Date: 2020-07-06T06:48:16-04:00 New Revision: bfd84b1c034d6b0413c293772662e1a5619d6b40 URL: https://github.com/llvm/llvm-project/commit/bfd84b1c034d6b0413c293772662e1a5619d6b40 DIFF: https://github.com/llvm/llvm-project/commit/bfd84b1c034d6b0413c293772662e1a5619d6b40.diff LOG: [SystemZ/ZOS] Implement getMainExecutable() and is_local_impl() Adds implementation of getMainExecutable() and is_local_impl() to Support/Unix/Path.inc. Both are needed to compile LLVM for z/OS. 
Reviewed By: hubert.reinterpretcast, emaste Differential Revision: https://reviews.llvm.org/D82544 Added: Modified: llvm/lib/Support/Unix/Path.inc Removed: ################################################################################ diff --git a/llvm/lib/Support/Unix/Path.inc b/llvm/lib/Support/Unix/Path.inc index 824e8b0ca899..c35db79cbd8a 100644 --- a/llvm/lib/Support/Unix/Path.inc +++ b/llvm/lib/Support/Unix/Path.inc @@ -48,6 +48,8 @@ extern char **environ; #endif #elif defined(__DragonFly__) #include +#elif defined(__MVS__) +#include #endif // Both stdio.h and cstdio are included via different paths and @@ -56,9 +58,13 @@ extern char **environ; #undef ferror #undef feof +#if !defined(PATH_MAX) // For GNU Hurd -#if defined(__GNU__) && !defined(PATH_MAX) -# define PATH_MAX 4096 +#if defined(__GNU__) +#define PATH_MAX 4096 +#elif defined(__MVS__) +#define PATH_MAX _XOPEN_PATH_MAX +#endif #endif #include @@ -100,7 +106,8 @@ typedef uint_t uint; #define STATVFS_F_FRSIZE(vfs) static_cast(vfs.f_bsize) #endif -#if defined(__NetBSD__) || defined(__DragonFly__) || defined(__GNU__) +#if defined(__NetBSD__) || defined(__DragonFly__) || defined(__GNU__) || \ + defined(__MVS__) #define STATVFS_F_FLAG(vfs) (vfs).f_flag #else #define STATVFS_F_FLAG(vfs) (vfs).f_flags @@ -265,6 +272,26 @@ std::string getMainExecutable(const char *argv0, void *MainAddr) { // Fall back to the classical detection. if (getprogpath(exe_path, argv0)) return exe_path; +#elif defined(__MVS__) + int token = 0; + W_PSPROC buf; + char exe_path[PS_PATHBLEN]; + pid_t pid = getpid(); + + memset(&buf, 0, sizeof(buf)); + buf.ps_pathptr = exe_path; + buf.ps_pathlen = sizeof(exe_path); + + while (true) { + if ((token = w_getpsent(token, &buf, sizeof(buf))) <= 0) + break; + if (buf.ps_pid != pid) + continue; + char real_path[PATH_MAX]; + if (realpath(exe_path, real_path)) + return std::string(real_path); + break; // Found entry, but realpath failed. 
+ } #elif defined(HAVE_DLFCN_H) && defined(HAVE_DLADDR) // Use dladdr to get executable path if available. Dl_info DLInfo; @@ -493,6 +520,10 @@ static bool is_local_impl(struct STATVFS &Vfs) { // vmount entry not found; "remote" is the conservative answer. return false; +#elif defined(__MVS__) + // The file system can have an arbitrary structure on z/OS; must go with the + // conservative answer. + return false; #else return !!(STATVFS_F_FLAG(Vfs) & MNT_LOCAL); #endif From llvm-commits at lists.llvm.org Mon Jul 6 03:54:20 2020 From: llvm-commits at lists.llvm.org (Sam McCall via llvm-commits) Date: Mon, 06 Jul 2020 03:54:20 -0700 (PDT) Subject: [llvm] d7ea6ce - [Support] fix user_cache_directory on mac Message-ID: <5f0302dc.1c69fb81.983ca.9e7d@mx.google.com> Author: Sam McCall Date: 2020-07-06T12:54:11+02:00 New Revision: d7ea6ce809a4413afb1edafa17ba291b39129f52 URL: https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52 DIFF: https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52.diff LOG: [Support] fix user_cache_directory on mac Added: Modified: llvm/lib/Support/Unix/Path.inc Removed: ################################################################################ diff --git a/llvm/lib/Support/Unix/Path.inc b/llvm/lib/Support/Unix/Path.inc index c35db79cbd8a..d91b269cc6d3 100644 --- a/llvm/lib/Support/Unix/Path.inc +++ b/llvm/lib/Support/Unix/Path.inc @@ -1193,7 +1193,7 @@ bool user_config_directory(SmallVectorImpl &result) { #ifdef __APPLE__ // Mac: ~/Library/Preferences/ if (home_directory(result)) { - append("Library", "Preferences"); + append(result, "Library", "Preferences"); return true; } #else From llvm-commits at lists.llvm.org Mon Jul 6 04:26:18 2020 From: llvm-commits at lists.llvm.org (Jay Foad via llvm-commits) Date: Mon, 06 Jul 2020 04:26:18 -0700 (PDT) Subject: [llvm] e7a4a24 - [TargetLowering] Improve expansion of ROTL/ROTR Message-ID: <5f030a5a.1c69fb81.3da74.098f@mx.google.com> Author: 
Jay Foad Date: 2020-07-06T12:07:14+01:00 New Revision: e7a4a24dc50ab115cd0838bbf47d61251530a0cd URL: https://github.com/llvm/llvm-project/commit/e7a4a24dc50ab115cd0838bbf47d61251530a0cd DIFF: https://github.com/llvm/llvm-project/commit/e7a4a24dc50ab115cd0838bbf47d61251530a0cd.diff LOG: [TargetLowering] Improve expansion of ROTL/ROTR Using a negation instead of a subtraction from a constant can save an instruction on some targets. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82539 Added: Modified: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index d81582d4dd04..71c5e7b51610 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -6177,12 +6177,15 @@ bool TargetLowering::expandROT(SDNode *Node, SDValue &Result, SDLoc DL(SDValue(Node, 0)); EVT ShVT = Op1.getValueType(); - SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT); + SDValue Zero = DAG.getConstant(0, DL, ShVT); - // If a rotate in the other direction is legal, use it. + assert(isPowerOf2_32(EltSizeInBits) && EltSizeInBits > 1 && + "Expecting the type bitwidth to be a power of 2"); + + // If a rotate in the other direction is supported, use it. unsigned RevRot = IsLeft ? 
ISD::ROTR : ISD::ROTL; - if (isOperationLegal(RevRot, VT)) { - SDValue Sub = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, Op1); + if (isOperationLegalOrCustom(RevRot, VT)) { + SDValue Sub = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); Result = DAG.getNode(RevRot, DL, VT, Op0, Sub); return true; } @@ -6195,15 +6198,13 @@ bool TargetLowering::expandROT(SDNode *Node, SDValue &Result, return false; // Otherwise, - // (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and w-c, w-1))) - // (rotr x, c) -> (or (srl x, (and c, w-1)), (shl x, (and w-c, w-1))) + // (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and -c, w-1))) + // (rotr x, c) -> (or (srl x, (and c, w-1)), (shl x, (and -c, w-1))) // - assert(isPowerOf2_32(EltSizeInBits) && EltSizeInBits > 1 && - "Expecting the type bitwidth to be a power of 2"); unsigned ShOpc = IsLeft ? ISD::SHL : ISD::SRL; unsigned HsOpc = IsLeft ? ISD::SRL : ISD::SHL; SDValue BitWidthMinusOneC = DAG.getConstant(EltSizeInBits - 1, DL, ShVT); - SDValue NegOp1 = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, Op1); + SDValue NegOp1 = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); SDValue And0 = DAG.getNode(ISD::AND, DL, ShVT, Op1, BitWidthMinusOneC); SDValue And1 = DAG.getNode(ISD::AND, DL, ShVT, NegOp1, BitWidthMinusOneC); Result = DAG.getNode(ISD::OR, DL, VT, DAG.getNode(ShOpc, DL, VT, Op0, And0), From llvm-commits at lists.llvm.org Mon Jul 6 04:30:16 2020 From: llvm-commits at lists.llvm.org (Jay Foad via llvm-commits) Date: Mon, 06 Jul 2020 04:30:16 -0700 (PDT) Subject: [llvm] babbeaf - [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount Message-ID: <5f030b48.1c69fb81.4c12d.18ba@mx.google.com> Author: Jay Foad Date: 2020-07-06T12:07:14+01:00 New Revision: babbeafa006f5317ed2162d1e64917422bfb58e7 URL: https://github.com/llvm/llvm-project/commit/babbeafa006f5317ed2162d1e64917422bfb58e7 DIFF: https://github.com/llvm/llvm-project/commit/babbeafa006f5317ed2162d1e64917422bfb58e7.diff LOG: [TargetLowering] Improve expansion of FSHL/FSHR 
by non-zero amount Use a simpler code sequence when the shift amount is known not to be zero modulo the bit width. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82540 Added: Modified: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index 71c5e7b51610..96df20039b15 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -6117,6 +6117,14 @@ bool TargetLowering::expandMUL(SDNode *N, SDValue &Lo, SDValue &Hi, EVT HiLoVT, return Ok; } +// Check that (every element of) Z is undef or not an exact multiple of BW. +static bool isNonZeroModBitWidth(SDValue Z, unsigned BW) { + return ISD::matchUnaryPredicate( + Z, + [=](ConstantSDNode *C) { return !C || C->getAPIntValue().urem(BW) != 0; }, + true); +} + bool TargetLowering::expandFunnelShift(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { EVT VT = Node->getValueType(0); @@ -6127,40 +6135,52 @@ bool TargetLowering::expandFunnelShift(SDNode *Node, SDValue &Result, !isOperationLegalOrCustomOrPromote(ISD::OR, VT))) return false; - // fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW)) - // fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW) SDValue X = Node->getOperand(0); SDValue Y = Node->getOperand(1); SDValue Z = Node->getOperand(2); - unsigned EltSizeInBits = VT.getScalarSizeInBits(); + unsigned BW = VT.getScalarSizeInBits(); bool IsFSHL = Node->getOpcode() == ISD::FSHL; SDLoc DL(SDValue(Node, 0)); EVT ShVT = Z.getValueType(); - SDValue Mask = DAG.getConstant(EltSizeInBits - 1, DL, ShVT); - SDValue ShAmt, InvShAmt; - if (isPowerOf2_32(EltSizeInBits)) { - // Z % BW -> Z & (BW - 1) - ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask); - // (BW - 1) - (Z % BW) -> ~Z 
& (BW - 1) - InvShAmt = DAG.getNode(ISD::AND, DL, ShVT, DAG.getNOT(DL, Z, ShVT), Mask); - } else { - SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT); - ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); - InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt); - } - SDValue One = DAG.getConstant(1, DL, ShVT); SDValue ShX, ShY; - if (IsFSHL) { - ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt); - SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One); - ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt); + SDValue ShAmt, InvShAmt; + if (isNonZeroModBitWidth(Z, BW)) { + // fshl: X << C | Y >> (BW - C) + // fshr: X << (BW - C) | Y >> C + // where C = Z % BW is not zero + SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); + ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); + InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, ShAmt); + ShX = DAG.getNode(ISD::SHL, DL, VT, X, IsFSHL ? ShAmt : InvShAmt); + ShY = DAG.getNode(ISD::SRL, DL, VT, Y, IsFSHL ? InvShAmt : ShAmt); } else { - SDValue ShX1 = DAG.getNode(ISD::SHL, DL, VT, X, One); - ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt); - ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt); + // fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW)) + // fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW) + SDValue Mask = DAG.getConstant(BW - 1, DL, ShVT); + if (isPowerOf2_32(BW)) { + // Z % BW -> Z & (BW - 1) + ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask); + // (BW - 1) - (Z % BW) -> ~Z & (BW - 1) + InvShAmt = DAG.getNode(ISD::AND, DL, ShVT, DAG.getNOT(DL, Z, ShVT), Mask); + } else { + SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); + ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); + InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt); + } + + SDValue One = DAG.getConstant(1, DL, ShVT); + if (IsFSHL) { + ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt); + SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One); + ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt); + } else { + SDValue ShX1 
= DAG.getNode(ISD::SHL, DL, VT, X, One); + ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt); + ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt); + } } Result = DAG.getNode(ISD::OR, DL, VT, ShX, ShY); return true; From llvm-commits at lists.llvm.org Mon Jul 6 04:48:00 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Mon, 06 Jul 2020 04:48:00 -0700 (PDT) Subject: [llvm] 0607c8d - [PowerPC] Legalize SREM/UREM directly on P9. Message-ID: <5f030f70.1c69fb81.ee221.1302@mx.google.com> Author: Esme-Yi Date: 2020-07-06T11:47:31Z New Revision: 0607c8df7faf71bd726e9d18bafd2f7566984e35 URL: https://github.com/llvm/llvm-project/commit/0607c8df7faf71bd726e9d18bafd2f7566984e35 DIFF: https://github.com/llvm/llvm-project/commit/0607c8df7faf71bd726e9d18bafd2f7566984e35.diff LOG: [PowerPC] Legalize SREM/UREM directly on P9. Summary: As Bugzilla-35090 reported, the rationale for custom lowering of SREM/UREM should no longer hold. At the IR level, the div-rem-pairs pass performs the transformation where the remainder is computed from the result of the division when both are required. We should now be able to lower these directly on P9. The pass also fixes the problem of the divide being in a different block than the remainder. This is a patch to remove redundant code and make SREM/UREM legal directly on P9.
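The identity the summary relies on (the remainder recomputed from an already available quotient) can be sketched in plain C++. This is only an illustration under stated assumptions, not the DivRemPairs implementation: the DivRem struct and divRemFromDiv are hypothetical names, and the helper assumes a non-zero divisor and excludes the INT32_MIN / -1 case.

```cpp
#include <cstdint>

// Hypothetical aggregate for illustration; not an LLVM type.
struct DivRem {
  std::int32_t Quot;
  std::int32_t Rem;
};

// Hypothetical helper: compute the remainder from the quotient, i.e.
//   rem = a - (a / b) * b
// which is the div + mul + sub sequence the CHECK-DRP lines in the test
// diff look for. Precondition: B != 0 and not (A == INT32_MIN && B == -1).
constexpr DivRem divRemFromDiv(std::int32_t A, std::int32_t B) {
  return {A / B, A - (A / B) * B};
}

// The derived remainder matches the built-in % operator, including for
// negative dividends (C++ division truncates toward zero).
static_assert(divRemFromDiv(17, 5).Rem == 17 % 5, "positive dividend");
static_assert(divRemFromDiv(-17, 5).Rem == -17 % 5, "negative dividend");
```

When only the remainder is needed there is no quotient to reuse, so a single hardware remainder instruction is the cheaper option. That matches the updated CHECK lines in the test diff, which expect modsw/moduw.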
Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D82145 Added: Modified: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/test/CodeGen/PowerPC/ppc64-P9-mod.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp index 532e2659eae1..a31b3fef2aba 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp @@ -261,15 +261,16 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM, // PowerPC has no SREM/UREM instructions unless we are on P9 // On P9 we may use a hardware instruction to compute the remainder. - // The instructions are not legalized directly because in the cases where the - // result of both the remainder and the division is required it is more - // efficient to compute the remainder from the result of the division rather - // than use the remainder instruction. + // When the result of both the remainder and the division is required it is + // more efficient to compute the remainder from the result of the division + // rather than use the remainder instruction. The instructions are legalized + // directly because the DivRemPairsPass performs the transformation at the IR + // level. 
if (Subtarget.isISA3_0()) { - setOperationAction(ISD::SREM, MVT::i32, Custom); - setOperationAction(ISD::UREM, MVT::i32, Custom); - setOperationAction(ISD::SREM, MVT::i64, Custom); - setOperationAction(ISD::UREM, MVT::i64, Custom); + setOperationAction(ISD::SREM, MVT::i32, Legal); + setOperationAction(ISD::UREM, MVT::i32, Legal); + setOperationAction(ISD::SREM, MVT::i64, Legal); + setOperationAction(ISD::UREM, MVT::i64, Legal); } else { setOperationAction(ISD::SREM, MVT::i32, Expand); setOperationAction(ISD::UREM, MVT::i32, Expand); @@ -10492,18 +10493,6 @@ SDValue PPCTargetLowering::LowerINTRINSIC_VOID(SDValue Op, return SDValue(); } -SDValue PPCTargetLowering::LowerREM(SDValue Op, SelectionDAG &DAG) const { - // Check for a DIV with the same operands as this REM. - for (auto UI : Op.getOperand(1)->uses()) { - if ((Op.getOpcode() == ISD::SREM && UI->getOpcode() == ISD::SDIV) || - (Op.getOpcode() == ISD::UREM && UI->getOpcode() == ISD::UDIV)) - if (UI->getOperand(0) == Op.getOperand(0) && - UI->getOperand(1) == Op.getOperand(1)) - return SDValue(); - } - return Op; -} - // Lower scalar BSWAP64 to xxbrd. 
SDValue PPCTargetLowering::LowerBSWAP(SDValue Op, SelectionDAG &DAG) const { SDLoc dl(Op); @@ -11121,9 +11110,6 @@ SDValue PPCTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { case ISD::INTRINSIC_VOID: return LowerINTRINSIC_VOID(Op, DAG); - case ISD::SREM: - case ISD::UREM: - return LowerREM(Op, DAG); case ISD::BSWAP: return LowerBSWAP(Op, DAG); case ISD::ATOMIC_CMP_SWAP: diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h index b3f309693e16..98256ae0c359 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.h +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h @@ -1119,7 +1119,6 @@ namespace llvm { SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const; SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const; SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const; - SDValue LowerREM(SDValue Op, SelectionDAG &DAG) const; SDValue LowerBSWAP(SDValue Op, SelectionDAG &DAG) const; SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const; SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const; diff --git a/llvm/test/CodeGen/PowerPC/ppc64-P9-mod.ll b/llvm/test/CodeGen/PowerPC/ppc64-P9-mod.ll index e3dcf8e5491e..e99074e7f90f 100644 --- a/llvm/test/CodeGen/PowerPC/ppc64-P9-mod.ll +++ b/llvm/test/CodeGen/PowerPC/ppc64-P9-mod.ll @@ -88,13 +88,16 @@ entry: store i32 %div, i32* @div_resultsw, align 4 ret void ; CHECK-LABEL: modulo_div_sw -; CHECK-NOT: modsw -; CHECK: div -; CHECK-NOT: modsw -; CHECK: mull -; CHECK-NOT: modsw -; CHECK: sub +; CHECK: modsw {{[0-9]+}}, 3, 4 ; CHECK: blr +; CHECK-DRP-LABEL: modulo_div_sw +; CHECK-DRP-NOT: modsw +; CHECK-DRP: div +; CHECK-DRP-NOT: modsw +; CHECK-DRP: mull +; CHECK-DRP-NOT: modsw +; CHECK-DRP: sub +; CHECK-DRP: blr ; CHECK-PWR8-LABEL: modulo_div_sw ; CHECK-PWR8: div ; CHECK-PWR8: mull @@ -129,13 +132,16 @@ entry: store i32 %div, i32* @div_resultuw, align 4 ret void ; CHECK-LABEL: modulo_div_uw -; CHECK-NOT: modsw 
-; CHECK: div -; CHECK-NOT: modsw -; CHECK: mull -; CHECK-NOT: modsw -; CHECK: sub +; CHECK: moduw {{[0-9]+}}, 3, 4 ; CHECK: blr +; CHECK-DRP-LABEL: modulo_div_uw +; CHECK-DRP-NOT: moduw +; CHECK-DRP: div +; CHECK-DRP-NOT: moduw +; CHECK-DRP: mull +; CHECK-DRP-NOT: moduw +; CHECK-DRP: sub +; CHECK-DRP: blr ; CHECK-PWR8-LABEL: modulo_div_uw ; CHECK-PWR8: div ; CHECK-PWR8: mull From llvm-commits at lists.llvm.org Mon Jul 6 06:01:06 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Mon, 06 Jul 2020 06:01:06 -0700 (PDT) Subject: [llvm] f6bd1bd - Regenerate neon copy tests. NFC. Message-ID: <5f032092.1c69fb81.8a151.19ac@mx.google.com> Author: Simon Pilgrim Date: 2020-07-06T13:58:25+01:00 New Revision: f6bd1bd8558f6d1c36f342e0f696c379ce98b549 URL: https://github.com/llvm/llvm-project/commit/f6bd1bd8558f6d1c36f342e0f696c379ce98b549 DIFF: https://github.com/llvm/llvm-project/commit/f6bd1bd8558f6d1c36f342e0f696c379ce98b549.diff LOG: Regenerate neon copy tests. NFC. To simplify the diffs in a patch in development. 
Added: Modified: llvm/test/CodeGen/AArch64/arm64-neon-copy.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll b/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll index 05a273f5f2d9..2878811b063a 100644 --- a/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll +++ b/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll @@ -211,7 +211,7 @@ define <2 x double> @ins1f2(<1 x double> %tmp1, <2 x double> %tmp2) { define <2 x double> @ins1f2_args_flipped(<2 x double> %tmp2, <1 x double> %tmp1) { ; CHECK-LABEL: ins1f2_args_flipped: ; CHECK: // %bb.0: -; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1 +; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1 ; CHECK-NEXT: mov v0.d[1], v1.d[0] ; CHECK-NEXT: ret %tmp3 = extractelement <1 x double> %tmp1, i32 0 @@ -1088,56 +1088,73 @@ define <2 x float> @test_bitcastv1f64tov2f32(<1 x i64> %a) #0 { ; Test insert element into an undef vector define <8 x i8> @scalar_to_vector.v8i8(i8 %a) { ; CHECK-LABEL: scalar_to_vector.v8i8: -; CHECK: fmov {{s[0-9]+}}, {{w[0-9]+}} +; CHECK: // %bb.0: +; CHECK-NEXT: fmov s0, w0 +; CHECK-NEXT: ret %b = insertelement <8 x i8> undef, i8 %a, i32 0 ret <8 x i8> %b } define <16 x i8> @scalar_to_vector.v16i8(i8 %a) { ; CHECK-LABEL: scalar_to_vector.v16i8: -; CHECK: fmov {{s[0-9]+}}, {{w[0-9]+}} +; CHECK: // %bb.0: +; CHECK-NEXT: fmov s0, w0 +; CHECK-NEXT: ret %b = insertelement <16 x i8> undef, i8 %a, i32 0 ret <16 x i8> %b } define <4 x i16> @scalar_to_vector.v4i16(i16 %a) { ; CHECK-LABEL: scalar_to_vector.v4i16: -; CHECK: fmov {{s[0-9]+}}, {{w[0-9]+}} +; CHECK: // %bb.0: +; CHECK-NEXT: fmov s0, w0 +; CHECK-NEXT: ret %b = insertelement <4 x i16> undef, i16 %a, i32 0 ret <4 x i16> %b } define <8 x i16> @scalar_to_vector.v8i16(i16 %a) { ; CHECK-LABEL: scalar_to_vector.v8i16: -; CHECK: fmov {{s[0-9]+}}, {{w[0-9]+}} +; CHECK: // %bb.0: +; CHECK-NEXT: fmov s0, w0 +; CHECK-NEXT: ret %b = insertelement <8 x i16> undef, i16 %a, i32 0 ret 
<8 x i16> %b } define <2 x i32> @scalar_to_vector.v2i32(i32 %a) { ; CHECK-LABEL: scalar_to_vector.v2i32: -; CHECK: fmov {{s[0-9]+}}, {{w[0-9]+}} +; CHECK: // %bb.0: +; CHECK-NEXT: fmov s0, w0 +; CHECK-NEXT: ret %b = insertelement <2 x i32> undef, i32 %a, i32 0 ret <2 x i32> %b } define <4 x i32> @scalar_to_vector.v4i32(i32 %a) { ; CHECK-LABEL: scalar_to_vector.v4i32: -; CHECK: fmov {{s[0-9]+}}, {{w[0-9]+}} +; CHECK: // %bb.0: +; CHECK-NEXT: fmov s0, w0 +; CHECK-NEXT: ret %b = insertelement <4 x i32> undef, i32 %a, i32 0 ret <4 x i32> %b } define <2 x i64> @scalar_to_vector.v2i64(i64 %a) { ; CHECK-LABEL: scalar_to_vector.v2i64: -; CHECK: fmov {{d[0-9]+}}, {{x[0-9]+}} +; CHECK: // %bb.0: +; CHECK-NEXT: fmov d0, x0 +; CHECK-NEXT: ret %b = insertelement <2 x i64> undef, i64 %a, i32 0 ret <2 x i64> %b } define <8 x i8> @testDUP.v1i8(<1 x i8> %a) { ; CHECK-LABEL: testDUP.v1i8: -; CHECK: dup v0.8b, v0.b[0] +; CHECK: // %bb.0: +; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0 +; CHECK-NEXT: dup v0.8b, v0.b[0] +; CHECK-NEXT: ret %b = extractelement <1 x i8> %a, i32 0 %c = insertelement <8 x i8> undef, i8 %b, i32 0 %d = insertelement <8 x i8> %c, i8 %b, i32 1 @@ -1152,7 +1169,10 @@ define <8 x i8> @testDUP.v1i8(<1 x i8> %a) { define <8 x i16> @testDUP.v1i16(<1 x i16> %a) { ; CHECK-LABEL: testDUP.v1i16: -; CHECK: dup v0.8h, v0.h[0] +; CHECK: // %bb.0: +; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0 +; CHECK-NEXT: dup v0.8h, v0.h[0] +; CHECK-NEXT: ret %b = extractelement <1 x i16> %a, i32 0 %c = insertelement <8 x i16> undef, i16 %b, i32 0 %d = insertelement <8 x i16> %c, i16 %b, i32 1 @@ -1167,7 +1187,10 @@ define <8 x i16> @testDUP.v1i16(<1 x i16> %a) { define <4 x i32> @testDUP.v1i32(<1 x i32> %a) { ; CHECK-LABEL: testDUP.v1i32: -; CHECK: dup v0.4s, v0.s[0] +; CHECK: // %bb.0: +; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0 +; CHECK-NEXT: dup v0.4s, v0.s[0] +; CHECK-NEXT: ret %b = extractelement <1 x i32> %a, i32 0 %c = insertelement <4 x i32> undef, i32 %b, i32 0 %d 
= insertelement <4 x i32> %c, i32 %b, i32 1 From llvm-commits at lists.llvm.org Mon Jul 6 06:01:10 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Mon, 06 Jul 2020 06:01:10 -0700 (PDT) Subject: [llvm] c37400f - Regenerate subreg liverange tests. NFC. Message-ID: <5f032096.1c69fb81.cb7c7.42db@mx.google.com> Author: Simon Pilgrim Date: 2020-07-06T13:58:25+01:00 New Revision: c37400f6e78d21fad36fcd8618047e66ad9b3ffc URL: https://github.com/llvm/llvm-project/commit/c37400f6e78d21fad36fcd8618047e66ad9b3ffc DIFF: https://github.com/llvm/llvm-project/commit/c37400f6e78d21fad36fcd8618047e66ad9b3ffc.diff LOG: Regenerate subreg liverange tests. NFC. To simplify the diffs in a patch in development. Added: Modified: llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll b/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll index 0d6bb6617977..0b0389edf46b 100644 --- a/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll +++ b/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll @@ -1,10 +1,21 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck %s + ; We may have subregister live ranges that are undefined on some paths. The ; verifier should not complain about this. 
- -; CHECK-LABEL: {{^}}func: define amdgpu_kernel void @func() #0 { +; CHECK-LABEL: func: +; CHECK: ; %bb.0: ; %B0 +; CHECK-NEXT: s_mov_b32 s0, 0 +; CHECK-NEXT: s_cbranch_scc1 BB0_2 +; CHECK-NEXT: ; %bb.1: ; %B30.1 +; CHECK-NEXT: s_mov_b32 s0, 0x7fc00000 +; CHECK-NEXT: BB0_2: ; %B30.2 +; CHECK-NEXT: v_mov_b32_e32 v0, s0 +; CHECK-NEXT: s_mov_b32 m0, -1 +; CHECK-NEXT: ds_write_b32 v0, v0 +; CHECK-NEXT: s_endpgm B0: br i1 undef, label %B1, label %B2 @@ -28,8 +39,28 @@ B30.2: ; FIXME: Extra undef subregister copy should be removed before ; overwritten with defined copy -; CHECK-LABEL: {{^}}valley_partially_undef_copy: define amdgpu_ps float @valley_partially_undef_copy() #0 { +; CHECK-LABEL: valley_partially_undef_copy: +; CHECK: ; %bb.0: ; %bb +; CHECK-NEXT: s_mov_b32 s3, 0xf000 +; CHECK-NEXT: s_mov_b32 s2, -1 +; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], 0 +; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], 0 +; CHECK-NEXT: v_mov_b32_e32 v2, 0x7fc00000 +; CHECK-NEXT: s_waitcnt vmcnt(0) +; CHECK-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 +; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], 0 +; CHECK-NEXT: v_cmp_ne_u32_e64 s[0:1], 0, v1 +; CHECK-NEXT: BB1_1: ; %bb9 +; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1 +; CHECK-NEXT: s_andn2_b64 vcc, exec, s[0:1] +; CHECK-NEXT: s_cbranch_vccnz BB1_1 +; CHECK-NEXT: ; %bb.2: ; %bb11 +; CHECK-NEXT: s_mov_b32 s3, 0xf000 +; CHECK-NEXT: s_mov_b32 s2, -1 +; CHECK-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 +; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) +; CHECK-NEXT: ; return to shader part epilog bb: %tmp = load volatile i32, i32 addrspace(1)* undef, align 4 %tmp1 = load volatile i32, i32 addrspace(1)* undef, align 4 @@ -54,24 +85,27 @@ bb11: ; preds = %bb9 } ; FIXME: Should be able to remove the undef copies - -; CHECK-LABEL: {{^}}partially_undef_copy: -; CHECK: v_mov_b32_e32 v5, 5 -; CHECK-DAG: v_mov_b32_e32 v6, 6 - -; CHECK-DAG: v_mov_b32_e32 v[[OUTPUT_LO:[0-9]+]], v5 - -; Undef copy -; CHECK-DAG: v_mov_b32_e32 v1, 
v6 - -; undef copy -; CHECK-DAG: v_mov_b32_e32 v2, v7 - -; CHECK-DAG: v_mov_b32_e32 v[[OUTPUT_HI:[0-9]+]], v8 -; CHECK-DAG: v_mov_b32_e32 v[[OUTPUT_LO]], v6 - -; CHECK: buffer_store_dwordx4 v{{\[}}[[OUTPUT_LO]]:[[OUTPUT_HI]]{{\]}} define amdgpu_kernel void @partially_undef_copy() #0 { +; CHECK-LABEL: partially_undef_copy: +; CHECK: ; %bb.0: +; CHECK-NEXT: ;;#ASMSTART +; CHECK-NEXT: v_mov_b32_e32 v5, 5 +; CHECK-NEXT: ;;#ASMEND +; CHECK-NEXT: ;;#ASMSTART +; CHECK-NEXT: v_mov_b32_e32 v6, 6 +; CHECK-NEXT: ;;#ASMEND +; CHECK-NEXT: v_mov_b32_e32 v0, v5 +; CHECK-NEXT: v_mov_b32_e32 v1, v6 +; CHECK-NEXT: v_mov_b32_e32 v2, v7 +; CHECK-NEXT: v_mov_b32_e32 v3, v8 +; CHECK-NEXT: s_mov_b32 s3, 0xf000 +; CHECK-NEXT: s_mov_b32 s2, -1 +; CHECK-NEXT: v_mov_b32_e32 v0, v6 +; CHECK-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 +; CHECK-NEXT: ;;#ASMSTART +; CHECK-NEXT: v_nop +; CHECK-NEXT: ;;#ASMEND +; CHECK-NEXT: s_endpgm %tmp0 = call i32 asm sideeffect "v_mov_b32_e32 v5, 5", "={v5}"() %tmp1 = call i32 asm sideeffect "v_mov_b32_e32 v6, 6", "={v6}"() From llvm-commits at lists.llvm.org Mon Jul 6 06:01:12 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Mon, 06 Jul 2020 06:01:12 -0700 (PDT) Subject: [llvm] d6c72bd - [X86][XOP] Add XOP target vselect-pcmp tests Message-ID: <5f032098.1c69fb81.793e8.1999@mx.google.com> Author: Simon Pilgrim Date: 2020-07-06T13:58:26+01:00 New Revision: d6c72bdca2f20e724a755186e5c578b70b96b192 URL: https://github.com/llvm/llvm-project/commit/d6c72bdca2f20e724a755186e5c578b70b96b192 DIFF: https://github.com/llvm/llvm-project/commit/d6c72bdca2f20e724a755186e5c578b70b96b192.diff LOG: [X86][XOP] Add XOP target vselect-pcmp tests Noticed in the D83181 that XOP can probably do a lot more than other targets due to its vector shifts and vpcmov instructions Added: Modified: llvm/test/CodeGen/X86/vselect-pcmp.ll Removed: ################################################################################ diff --git 
a/llvm/test/CodeGen/X86/vselect-pcmp.ll b/llvm/test/CodeGen/X86/vselect-pcmp.ll index bc6dc30a9658..c393955e2088 100644 --- a/llvm/test/CodeGen/X86/vselect-pcmp.ll +++ b/llvm/test/CodeGen/X86/vselect-pcmp.ll @@ -1,8 +1,9 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=AVX --check-prefix=AVX12F --check-prefix=AVX12 --check-prefix=AVX1 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 | FileCheck %s --check-prefix=AVX --check-prefix=AVX12F --check-prefix=AVX12 --check-prefix=AVX2 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512f | FileCheck %s --check-prefix=AVX --check-prefix=AVX12F --check-prefix=AVX512 --check-prefix=AVX512F -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512vl | FileCheck %s --check-prefix=AVX --check-prefix=AVX512 --check-prefix=AVX512VL +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX12F,AVX12,AVX1 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX12F,AVX12,AVX2 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512f | FileCheck %s --check-prefixes=CHECK,AVX,AVX12F,AVX512,AVX512F +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512vl | FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512VL +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=xop | FileCheck %s --check-prefixes=CHECK,XOP ; The condition vector for BLENDV* only cares about the sign bit of each element. ; So in these tests, if we generate BLENDV*, we should be able to remove the redundant cmp op. @@ -10,10 +11,10 @@ ; Test 128-bit vectors for all legal element types. 
define <16 x i8> @signbit_sel_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x i8> %mask) { -; AVX-LABEL: signbit_sel_v16i8: -; AVX: # %bb.0: -; AVX-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0 -; AVX-NEXT: retq +; CHECK-LABEL: signbit_sel_v16i8: +; CHECK: # %bb.0: +; CHECK-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0 +; CHECK-NEXT: retq %tr = icmp slt <16 x i8> %mask, zeroinitializer %z = select <16 x i1> %tr, <16 x i8> %x, <16 x i8> %y ret <16 x i8> %z @@ -28,6 +29,13 @@ define <8 x i16> @signbit_sel_v8i16(<8 x i16> %x, <8 x i16> %y, <8 x i16> %mask) ; AVX-NEXT: vpcmpgtw %xmm2, %xmm3, %xmm2 ; AVX-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0 ; AVX-NEXT: retq +; +; XOP-LABEL: signbit_sel_v8i16: +; XOP: # %bb.0: +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomltw %xmm3, %xmm2, %xmm2 +; XOP-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0 +; XOP-NEXT: retq %tr = icmp slt <8 x i16> %mask, zeroinitializer %z = select <8 x i1> %tr, <8 x i16> %x, <8 x i16> %y ret <8 x i16> %z @@ -57,6 +65,11 @@ define <4 x i32> @signbit_sel_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask) ; AVX512VL-NEXT: vpcmpgtd %xmm2, %xmm3, %k1 ; AVX512VL-NEXT: vpblendmd %xmm0, %xmm1, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v4i32: +; XOP: # %bb.0: +; XOP-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0 +; XOP-NEXT: retq %tr = icmp slt <4 x i32> %mask, zeroinitializer %z = select <4 x i1> %tr, <4 x i32> %x, <4 x i32> %y ret <4 x i32> %z @@ -86,6 +99,11 @@ define <2 x i64> @signbit_sel_v2i64(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) ; AVX512VL-NEXT: vpcmpgtq %xmm2, %xmm3, %k1 ; AVX512VL-NEXT: vpblendmq %xmm0, %xmm1, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v2i64: +; XOP: # %bb.0: +; XOP-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0 +; XOP-NEXT: retq %tr = icmp slt <2 x i64> %mask, zeroinitializer %z = select <2 x i1> %tr, <2 x i64> %x, <2 x i64> %y ret <2 x i64> %z @@ -115,6 +133,11 @@ define <4 x float> @signbit_sel_v4f32(<4 x float> %x, <4 x float> %y, <4 x i32> ; 
AVX512VL-NEXT: vpcmpgtd %xmm2, %xmm3, %k1 ; AVX512VL-NEXT: vblendmps %xmm0, %xmm1, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v4f32: +; XOP: # %bb.0: +; XOP-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0 +; XOP-NEXT: retq %tr = icmp slt <4 x i32> %mask, zeroinitializer %z = select <4 x i1> %tr, <4 x float> %x, <4 x float> %y ret <4 x float> %z @@ -144,6 +167,11 @@ define <2 x double> @signbit_sel_v2f64(<2 x double> %x, <2 x double> %y, <2 x i6 ; AVX512VL-NEXT: vpcmpgtq %xmm2, %xmm3, %k1 ; AVX512VL-NEXT: vblendmpd %xmm0, %xmm1, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v2f64: +; XOP: # %bb.0: +; XOP-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0 +; XOP-NEXT: retq %tr = icmp slt <2 x i64> %mask, zeroinitializer %z = select <2 x i1> %tr, <2 x double> %x, <2 x double> %y ret <2 x double> %z @@ -173,6 +201,16 @@ define <32 x i8> @signbit_sel_v32i8(<32 x i8> %x, <32 x i8> %y, <32 x i8> %mask) ; AVX512: # %bb.0: ; AVX512-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: signbit_sel_v32i8: +; XOP: # %bb.0: +; XOP-NEXT: vextractf128 $1, %ymm2, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcomltb %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpcomltb %xmm4, %xmm2, %xmm2 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm2, %ymm2 +; XOP-NEXT: vpcmov %ymm2, %ymm1, %ymm0, %ymm0 +; XOP-NEXT: retq %tr = icmp slt <32 x i8> %mask, zeroinitializer %z = select <32 x i1> %tr, <32 x i8> %x, <32 x i8> %y ret <32 x i8> %z @@ -206,6 +244,16 @@ define <16 x i16> @signbit_sel_v16i16(<16 x i16> %x, <16 x i16> %y, <16 x i16> % ; AVX512-NEXT: vpcmpgtw %ymm2, %ymm3, %ymm2 ; AVX512-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: signbit_sel_v16i16: +; XOP: # %bb.0: +; XOP-NEXT: vextractf128 $1, %ymm2, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcomltw %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpcomltw %xmm4, %xmm2, %xmm2 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm2, %ymm2 +; XOP-NEXT: vpcmov %ymm2, %ymm1, 
%ymm0, %ymm0 +; XOP-NEXT: retq %tr = icmp slt <16 x i16> %mask, zeroinitializer %z = select <16 x i1> %tr, <16 x i16> %x, <16 x i16> %y ret <16 x i16> %z @@ -234,6 +282,11 @@ define <8 x i32> @signbit_sel_v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) ; AVX512VL-NEXT: vpcmpgtd %ymm2, %ymm3, %k1 ; AVX512VL-NEXT: vpblendmd %ymm0, %ymm1, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v8i32: +; XOP: # %bb.0: +; XOP-NEXT: vblendvps %ymm2, %ymm0, %ymm1, %ymm0 +; XOP-NEXT: retq %tr = icmp slt <8 x i32> %mask, zeroinitializer %z = select <8 x i1> %tr, <8 x i32> %x, <8 x i32> %y ret <8 x i32> %z @@ -262,6 +315,11 @@ define <4 x i64> @signbit_sel_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) ; AVX512VL-NEXT: vpcmpgtq %ymm2, %ymm3, %k1 ; AVX512VL-NEXT: vpblendmq %ymm0, %ymm1, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v4i64: +; XOP: # %bb.0: +; XOP-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0 +; XOP-NEXT: retq %tr = icmp slt <4 x i64> %mask, zeroinitializer %z = select <4 x i1> %tr, <4 x i64> %x, <4 x i64> %y ret <4 x i64> %z @@ -290,6 +348,11 @@ define <4 x double> @signbit_sel_v4f64(<4 x double> %x, <4 x double> %y, <4 x i6 ; AVX512VL-NEXT: vpcmpgtq %ymm2, %ymm3, %k1 ; AVX512VL-NEXT: vblendmpd %ymm0, %ymm1, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v4f64: +; XOP: # %bb.0: +; XOP-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0 +; XOP-NEXT: retq %tr = icmp slt <4 x i64> %mask, zeroinitializer %z = select <4 x i1> %tr, <4 x double> %x, <4 x double> %y ret <4 x double> %z @@ -330,6 +393,15 @@ define <4 x double> @signbit_sel_v4f64_small_mask(<4 x double> %x, <4 x double> ; AVX512VL-NEXT: vpcmpgtd %xmm2, %xmm3, %k1 ; AVX512VL-NEXT: vblendmpd %ymm0, %ymm1, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v4f64_small_mask: +; XOP: # %bb.0: +; XOP-NEXT: vpmovsxdq %xmm2, %xmm3 +; XOP-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[2,3,0,1] +; XOP-NEXT: vpmovsxdq %xmm2, %xmm2 +; XOP-NEXT: vinsertf128 $1, %xmm2, %ymm3, %ymm2 
+; XOP-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0 +; XOP-NEXT: retq %tr = icmp slt <4 x i32> %mask, zeroinitializer %z = select <4 x i1> %tr, <4 x double> %x, <4 x double> %y ret <4 x double> %z @@ -350,6 +422,12 @@ define <8 x double> @signbit_sel_v8f64(<8 x double> %x, <8 x double> %y, <8 x i6 ; AVX512-NEXT: vpcmpgtq %zmm2, %zmm3, %k1 ; AVX512-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1} ; AVX512-NEXT: retq +; +; XOP-LABEL: signbit_sel_v8f64: +; XOP: # %bb.0: +; XOP-NEXT: vblendvpd %ymm4, %ymm0, %ymm2, %ymm0 +; XOP-NEXT: vblendvpd %ymm5, %ymm1, %ymm3, %ymm1 +; XOP-NEXT: retq %tr = icmp slt <8 x i64> %mask, zeroinitializer %z = select <8 x i1> %tr, <8 x double> %x, <8 x double> %y ret <8 x double> %z @@ -384,6 +462,13 @@ define <4 x float> @signbit_sel_v4f32_fcmp(<4 x float> %x, <4 x float> %y, <4 x ; AVX512VL-NEXT: vcmpltps %xmm2, %xmm0, %k1 ; AVX512VL-NEXT: vblendmps %xmm0, %xmm1, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: signbit_sel_v4f32_fcmp: +; XOP: # %bb.0: +; XOP-NEXT: vxorps %xmm2, %xmm2, %xmm2 +; XOP-NEXT: vcmpltps %xmm2, %xmm0, %xmm2 +; XOP-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0 +; XOP-NEXT: retq %cmp = fcmp olt <4 x float> %x, zeroinitializer %sel = select <4 x i1> %cmp, <4 x float> %x, <4 x float> %y ret <4 x float> %sel @@ -420,6 +505,18 @@ define <4 x i64> @blend_splat1_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x ; AVX512VL-NEXT: vptestnmq {{.*}}(%rip){1to4}, %ymm0, %k1 ; AVX512VL-NEXT: vpblendmq %ymm1, %ymm2, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_splat1_mask_cond_v4i64: +; XOP: # %bb.0: +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpsllq $63, %xmm3, %xmm3 +; XOP-NEXT: vmovdqa {{.*#+}} xmm4 = [18446744073709551553,18446744073709551553] +; XOP-NEXT: vpshaq %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpsllq $63, %xmm0, %xmm0 +; XOP-NEXT: vpshaq %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; XOP-NEXT: vblendvpd %ymm0, %ymm2, %ymm1, %ymm0 +; XOP-NEXT: retq %a = and <4 x i64> %x, %c = icmp eq <4 
x i64> %a, zeroinitializer %r = select <4 x i1> %c, <4 x i64> %y, <4 x i64> %z @@ -449,6 +546,14 @@ define <4 x i32> @blend_splat1_mask_cond_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x ; AVX512VL-NEXT: vptestnmd {{.*}}(%rip){1to4}, %xmm0, %k1 ; AVX512VL-NEXT: vpblendmd %xmm1, %xmm2, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_splat1_mask_cond_v4i32: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomneqd %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0 +; XOP-NEXT: retq %a = and <4 x i32> %x, %c = icmp eq <4 x i32> %a, zeroinitializer %r = select <4 x i1> %c, <4 x i32> %y, <4 x i32> %z @@ -483,6 +588,17 @@ define <16 x i16> @blend_splat1_mask_cond_v16i16(<16 x i16> %x, <16 x i16> %y, < ; AVX512-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0 ; AVX512-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: blend_splat1_mask_cond_v16i16: +; XOP: # %bb.0: +; XOP-NEXT: vpsllw $15, %xmm0, %xmm3 +; XOP-NEXT: vpsraw $15, %xmm3, %xmm3 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm0 +; XOP-NEXT: vpsllw $15, %xmm0, %xmm0 +; XOP-NEXT: vpsraw $15, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm0, %ymm3, %ymm0 +; XOP-NEXT: vpcmov %ymm0, %ymm1, %ymm2, %ymm0 +; XOP-NEXT: retq %a = and <16 x i16> %x, %c = icmp eq <16 x i16> %a, zeroinitializer %r = select <16 x i1> %c, <16 x i16> %y, <16 x i16> %z @@ -503,6 +619,14 @@ define <16 x i8> @blend_splat1_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x ; AVX512-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0 ; AVX512-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: blend_splat1_mask_cond_v16i8: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomneqb %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm2, %xmm1, %xmm0 +; XOP-NEXT: retq %a = and <16 x i8> %x, %c = icmp eq <16 x i8> %a, zeroinitializer %r = select <16 x i1> %c, <16 x i8> %y, <16 x i8> 
%z @@ -532,6 +656,14 @@ define <2 x i64> @blend_splatmax_mask_cond_v2i64(<2 x i64> %x, <2 x i64> %y, <2 ; AVX512VL-NEXT: vptestnmq {{.*}}(%rip), %xmm0, %k1 ; AVX512VL-NEXT: vpblendmq %xmm1, %xmm2, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_splatmax_mask_cond_v2i64: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomneqq %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vblendvpd %xmm0, %xmm2, %xmm1, %xmm0 +; XOP-NEXT: retq %a = and <2 x i64> %x, %c = icmp eq <2 x i64> %a, zeroinitializer %r = select <2 x i1> %c, <2 x i64> %y, <2 x i64> %z @@ -559,6 +691,11 @@ define <8 x i32> @blend_splatmax_mask_cond_v8i32(<8 x i32> %x, <8 x i32> %y, <8 ; AVX512VL-NEXT: vptestnmd {{.*}}(%rip){1to8}, %ymm0, %k1 ; AVX512VL-NEXT: vpblendmd %ymm1, %ymm2, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_splatmax_mask_cond_v8i32: +; XOP: # %bb.0: +; XOP-NEXT: vblendvps %ymm0, %ymm2, %ymm1, %ymm0 +; XOP-NEXT: retq %a = and <8 x i32> %x, %c = icmp eq <8 x i32> %a, zeroinitializer %r = select <8 x i1> %c, <8 x i32> %y, <8 x i32> %z @@ -579,6 +716,14 @@ define <8 x i16> @blend_splatmax_mask_cond_v8i16(<8 x i16> %x, <8 x i16> %y, <8 ; AVX512-NEXT: vpcmpeqw %xmm3, %xmm0, %xmm0 ; AVX512-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: blend_splatmax_mask_cond_v8i16: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomneqw %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm2, %xmm1, %xmm0 +; XOP-NEXT: retq %a = and <8 x i16> %x, %c = icmp eq <8 x i16> %a, zeroinitializer %r = select <8 x i1> %c, <8 x i16> %y, <8 x i16> %z @@ -610,6 +755,16 @@ define <32 x i8> @blend_splatmax_mask_cond_v32i8(<32 x i8> %x, <32 x i8> %y, <32 ; AVX512-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm0 ; AVX512-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: blend_splatmax_mask_cond_v32i8: +; XOP: # %bb.0: +; XOP-NEXT: 
vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcmpgtb %xmm3, %xmm4, %xmm3 +; XOP-NEXT: vpcmpgtb %xmm0, %xmm4, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; XOP-NEXT: vpcmov %ymm0, %ymm1, %ymm2, %ymm0 +; XOP-NEXT: retq %a = and <32 x i8> %x, %c = icmp eq <32 x i8> %a, zeroinitializer %r = select <32 x i1> %c, <32 x i8> %y, <32 x i8> %z @@ -647,6 +802,18 @@ define <4 x i64> @blend_splat_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i ; AVX512VL-NEXT: vptestnmq {{.*}}(%rip){1to4}, %ymm0, %k1 ; AVX512VL-NEXT: vpblendmq %ymm1, %ymm2, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_splat_mask_cond_v4i64: +; XOP: # %bb.0: +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpsllq $62, %xmm3, %xmm3 +; XOP-NEXT: vmovdqa {{.*#+}} xmm4 = [18446744073709551553,18446744073709551553] +; XOP-NEXT: vpshaq %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpsllq $62, %xmm0, %xmm0 +; XOP-NEXT: vpshaq %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; XOP-NEXT: vblendvpd %ymm0, %ymm2, %ymm1, %ymm0 +; XOP-NEXT: retq %a = and <4 x i64> %x, %c = icmp eq <4 x i64> %a, zeroinitializer %r = select <4 x i1> %c, <4 x i64> %y, <4 x i64> %z @@ -676,6 +843,14 @@ define <4 x i32> @blend_splat_mask_cond_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i ; AVX512VL-NEXT: vptestnmd {{.*}}(%rip){1to4}, %xmm0, %k1 ; AVX512VL-NEXT: vpblendmd %xmm1, %xmm2, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_splat_mask_cond_v4i32: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomneqd %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0 +; XOP-NEXT: retq %a = and <4 x i32> %x, %c = icmp eq <4 x i32> %a, zeroinitializer %r = select <4 x i1> %c, <4 x i32> %y, <4 x i32> %z @@ -710,6 +885,17 @@ define <16 x i16> @blend_splat_mask_cond_v16i16(<16 x i16> %x, <16 x i16> %y, <1 ; AVX512-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0 ; AVX512-NEXT: vpblendvb %ymm0, %ymm1, 
%ymm2, %ymm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: blend_splat_mask_cond_v16i16: +; XOP: # %bb.0: +; XOP-NEXT: vpsllw $5, %xmm0, %xmm3 +; XOP-NEXT: vpsraw $15, %xmm3, %xmm3 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm0 +; XOP-NEXT: vpsllw $5, %xmm0, %xmm0 +; XOP-NEXT: vpsraw $15, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm0, %ymm3, %ymm0 +; XOP-NEXT: vpcmov %ymm0, %ymm1, %ymm2, %ymm0 +; XOP-NEXT: retq %a = and <16 x i16> %x, %c = icmp eq <16 x i16> %a, zeroinitializer %r = select <16 x i1> %c, <16 x i16> %y, <16 x i16> %z @@ -730,6 +916,14 @@ define <16 x i8> @blend_splat_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x ; AVX512-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0 ; AVX512-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: blend_splat_mask_cond_v16i8: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomneqb %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm2, %xmm1, %xmm0 +; XOP-NEXT: retq %a = and <16 x i8> %x, %c = icmp eq <16 x i8> %a, zeroinitializer %r = select <16 x i1> %c, <16 x i8> %y, <16 x i8> %z @@ -772,6 +966,17 @@ define <4 x i64> @blend_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %z ; AVX512VL-NEXT: vptestnmq {{.*}}(%rip), %ymm0, %k1 ; AVX512VL-NEXT: vpblendmq %ymm1, %ymm2, %ymm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v4i64: +; XOP: # %bb.0: +; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcomeqq %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqq %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; XOP-NEXT: vblendvpd %ymm0, %ymm1, %ymm2, %ymm0 +; XOP-NEXT: retq %a = and <4 x i64> %x, %c = icmp eq <4 x i64> %a, zeroinitializer %r = select <4 x i1> %c, <4 x i64> %y, <4 x i64> %z @@ -804,6 +1009,14 @@ define <4 x i32> @blend_mask_cond_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z ; AVX512VL-NEXT: vptestnmd {{.*}}(%rip), 
%xmm0, %k1 ; AVX512VL-NEXT: vpblendmd %xmm1, %xmm2, %xmm0 {%k1} ; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v4i32: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqd %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: retq %a = and <4 x i32> %x, %c = icmp eq <4 x i32> %a, zeroinitializer %r = select <4 x i1> %c, <4 x i32> %y, <4 x i32> %z @@ -839,6 +1052,17 @@ define <16 x i16> @blend_mask_cond_v16i16(<16 x i16> %x, <16 x i16> %y, <16 x i1 ; AVX512-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0 ; AVX512-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0 ; AVX512-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v16i16: +; XOP: # %bb.0: +; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcomeqw %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqw %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; XOP-NEXT: vpcmov %ymm0, %ymm2, %ymm1, %ymm0 +; XOP-NEXT: retq %a = and <16 x i16> %x, %c = icmp eq <16 x i16> %a, zeroinitializer %r = select <16 x i1> %c, <16 x i16> %y, <16 x i16> %z @@ -853,6 +1077,14 @@ define <16 x i8> @blend_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x i8> %z ; AVX-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0 ; AVX-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 ; AVX-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v16i8: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqb %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: retq %a = and <16 x i8> %x, %c = icmp eq <16 x i8> %a, zeroinitializer %r = select <16 x i1> %c, <16 x i8> %y, <16 x i8> %z @@ -892,6 +1124,19 @@ define void @PR46531(i32* %x, i32* %y, i32* %z) { ; AVX512VL-NEXT: vpord %xmm0, %xmm1, %xmm2 {%k1} ; AVX512VL-NEXT: vmovdqu %xmm2, (%rdi) ; AVX512VL-NEXT: retq +; +; XOP-LABEL: PR46531: +; XOP: # %bb.0: +; 
XOP-NEXT: vmovdqu (%rsi), %xmm0 +; XOP-NEXT: vmovdqu (%rdx), %xmm1 +; XOP-NEXT: vpor %xmm0, %xmm1, %xmm2 +; XOP-NEXT: vpand {{.*}}(%rip), %xmm1, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcomneqd %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpxor %xmm0, %xmm1, %xmm0 +; XOP-NEXT: vblendvps %xmm3, %xmm0, %xmm2, %xmm0 +; XOP-NEXT: vmovups %xmm0, (%rdi) +; XOP-NEXT: retq %vy = bitcast i32* %y to <4 x i32>* %a = load <4 x i32>, <4 x i32>* %vy, align 4 %vz = bitcast i32* %z to <4 x i32>* From llvm-commits at lists.llvm.org Mon Jul 6 06:01:25 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Mon, 06 Jul 2020 06:01:25 -0700 (PDT) Subject: [llvm] bcff3de - AMDGPU/GlobalISel: Add some missing return tests Message-ID: <5f0320a5.1c69fb81.fe2b0.065b@mx.google.com> Author: Matt Arsenault Date: 2020-07-06T09:01:18-04:00 New Revision: bcff3deaa12794edec9fdd1f12cecd6f41995225 URL: https://github.com/llvm/llvm-project/commit/bcff3deaa12794edec9fdd1f12cecd6f41995225 DIFF: https://github.com/llvm/llvm-project/commit/bcff3deaa12794edec9fdd1f12cecd6f41995225.diff LOG: AMDGPU/GlobalISel: Add some missing return tests Added: Modified: llvm/test/CodeGen/AMDGPU/GlobalISel/function-returns.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/function-returns.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/function-returns.ll index 82ecb616aa11..acd71947aeee 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/function-returns.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/function-returns.ll @@ -181,6 +181,21 @@ define signext i16 @i16_signext_func_void() #0 { ret i16 %val } +define half @f16_func_void() #0 { + ; CHECK-LABEL: name: f16_func_void + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr30_sgpr31 + ; CHECK: [[COPY:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 + ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; CHECK: [[LOAD:%[0-9]+]]:_(s16) = G_LOAD [[DEF]](p1) :: (load 
2 from `half addrspace(1)* undef`, addrspace 1) + ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[LOAD]](s16) + ; CHECK: $vgpr0 = COPY [[ANYEXT]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY]] + ; CHECK: S_SETPC_B64_return [[COPY1]], implicit $vgpr0 + %val = load half, half addrspace(1)* undef + ret half %val +} + define i32 @i32_func_void() #0 { ; CHECK-LABEL: name: i32_func_void ; CHECK: bb.1 (%ir-block.0): @@ -726,6 +741,24 @@ define <2 x i16> @v2i16_func_void() #0 { ret <2 x i16> %val } +define <2 x half> @v2f16_func_void() #0 { + ; CHECK-LABEL: name: v2f16_func_void + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr30_sgpr31 + ; CHECK: [[COPY:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 + ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; CHECK: [[LOAD:%[0-9]+]]:_(<2 x s16>) = G_LOAD [[DEF]](p1) :: (load 4 from `<2 x half> addrspace(1)* undef`, addrspace 1) + ; CHECK: [[UV:%[0-9]+]]:_(s16), [[UV1:%[0-9]+]]:_(s16) = G_UNMERGE_VALUES [[LOAD]](<2 x s16>) + ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[UV]](s16) + ; CHECK: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[UV1]](s16) + ; CHECK: $vgpr0 = COPY [[ANYEXT]](s32) + ; CHECK: $vgpr1 = COPY [[ANYEXT1]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY]] + ; CHECK: S_SETPC_B64_return [[COPY1]], implicit $vgpr0, implicit $vgpr1 + %val = load <2 x half>, <2 x half> addrspace(1)* undef + ret <2 x half> %val +} + define <3 x i16> @v3i16_func_void() #0 { ; CHECK-LABEL: name: v3i16_func_void ; CHECK: bb.1 (%ir-block.0): From llvm-commits at lists.llvm.org Mon Jul 6 06:01:27 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Mon, 06 Jul 2020 06:01:27 -0700 (PDT) Subject: [llvm] 7b76a5c - AMDGPU: Fix fixed ABI SGPR arguments Message-ID: <5f0320a7.1c69fb81.ef94e.6282@mx.google.com> Author: Matt Arsenault Date: 2020-07-06T09:01:18-04:00 New Revision: 7b76a5c8a2a66684bffb19b37e851ebd39519541 URL: 
https://github.com/llvm/llvm-project/commit/7b76a5c8a2a66684bffb19b37e851ebd39519541
DIFF: https://github.com/llvm/llvm-project/commit/7b76a5c8a2a66684bffb19b37e851ebd39519541.diff

LOG: AMDGPU: Fix fixed ABI SGPR arguments

The default constructor wasn't setting isSet on the ArgDescriptor, so
while these had the value set, they were treated as missing. This only
ended up mattering in the indirect call case (and for regular calls in
GlobalISel, which currently doesn't have a way to support the variable
ABI).

Added:

Modified:
    llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
    llvm/test/CodeGen/AMDGPU/indirect-call.ll

Removed:

################################################################################
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
index 69e48227e732..f41e774b34b4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
@@ -142,19 +142,20 @@ AMDGPUFunctionArgInfo::getPreloadedValue(
 constexpr AMDGPUFunctionArgInfo AMDGPUFunctionArgInfo::fixedABILayout() {
 AMDGPUFunctionArgInfo AI;
- AI.PrivateSegmentBuffer = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;
- AI.DispatchPtr = AMDGPU::SGPR4_SGPR5;
- AI.QueuePtr = AMDGPU::SGPR6_SGPR7;
+ AI.PrivateSegmentBuffer
+ = ArgDescriptor::createRegister(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3);
+ AI.DispatchPtr = ArgDescriptor::createRegister(AMDGPU::SGPR4_SGPR5);
+ AI.QueuePtr = ArgDescriptor::createRegister(AMDGPU::SGPR6_SGPR7);
 // Do not pass kernarg segment pointer, only pass increment version in its
 // place.
- AI.ImplicitArgPtr = AMDGPU::SGPR8_SGPR9;
- AI.DispatchID = AMDGPU::SGPR10_SGPR11;
+ AI.ImplicitArgPtr = ArgDescriptor::createRegister(AMDGPU::SGPR8_SGPR9);
+ AI.DispatchID = ArgDescriptor::createRegister(AMDGPU::SGPR10_SGPR11);
 // Skip FlatScratchInit/PrivateSegmentSize
- AI.WorkGroupIDX = AMDGPU::SGPR12;
- AI.WorkGroupIDY = AMDGPU::SGPR13;
- AI.WorkGroupIDZ = AMDGPU::SGPR14;
+ AI.WorkGroupIDX = ArgDescriptor::createRegister(AMDGPU::SGPR12);
+ AI.WorkGroupIDY = ArgDescriptor::createRegister(AMDGPU::SGPR13);
+ AI.WorkGroupIDZ = ArgDescriptor::createRegister(AMDGPU::SGPR14);
 const unsigned Mask = 0x3ff;
 AI.WorkItemIDX = ArgDescriptor::createRegister(AMDGPU::VGPR31, Mask);
diff --git a/llvm/test/CodeGen/AMDGPU/indirect-call.ll b/llvm/test/CodeGen/AMDGPU/indirect-call.ll
index 8432d2961f04..dacc77b49992 100644
--- a/llvm/test/CodeGen/AMDGPU/indirect-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/indirect-call.ll
@@ -81,16 +81,19 @@ define amdgpu_kernel void @test_indirect_call_sgpr_ptr() {
 ; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
 ; GCN-NEXT: s_add_u32 s0, s0, s17
 ; GCN-NEXT: s_addc_u32 s1, s1, 0
-; GCN-NEXT: s_getpc_b64 s[4:5]
-; GCN-NEXT: s_add_u32 s4, s4, gv.fptr0@rel32@lo+4
-; GCN-NEXT: s_addc_u32 s5, s5, gv.fptr0@rel32@hi+4
-; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
+; GCN-NEXT: s_mov_b32 s13, s15
+; GCN-NEXT: s_mov_b32 s12, s14
+; GCN-NEXT: s_getpc_b64 s[14:15]
+; GCN-NEXT: s_add_u32 s14, s14, gv.fptr0@rel32@lo+4
+; GCN-NEXT: s_addc_u32 s15, s15, gv.fptr0@rel32@hi+4
+; GCN-NEXT: s_load_dwordx2 s[18:19], s[14:15], 0x0
 ; GCN-NEXT: v_lshlrev_b32_e32 v2, 20, v2
 ; GCN-NEXT: v_lshlrev_b32_e32 v1, 10, v1
 ; GCN-NEXT: v_or_b32_e32 v0, v0, v1
 ; GCN-NEXT: v_or_b32_e32 v31, v0, v2
+; GCN-NEXT: s_mov_b32 s14, s16
 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
-; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
+; GCN-NEXT: s_swappc_b64 s[30:31], s[18:19]
 ; GCN-NEXT: s_endpgm
 %fptr = load void()*, void()* addrspace(4)* @gv.fptr0
 call void %fptr()
@@ -174,17 +177,20 @@ define amdgpu_kernel void @test_indirect_call_sgpr_ptr_arg() {
 ; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
 ; GCN-NEXT: s_add_u32 s0, s0, s17
 ; GCN-NEXT: s_addc_u32 s1, s1, 0
-; GCN-NEXT: s_getpc_b64 s[4:5]
-; GCN-NEXT: s_add_u32 s4, s4, gv.fptr1@rel32@lo+4
-; GCN-NEXT: s_addc_u32 s5, s5, gv.fptr1@rel32@hi+4
+; GCN-NEXT: s_mov_b32 s13, s15
+; GCN-NEXT: s_mov_b32 s12, s14
+; GCN-NEXT: s_getpc_b64 s[14:15]
+; GCN-NEXT: s_add_u32 s14, s14, gv.fptr1@rel32@lo+4
+; GCN-NEXT: s_addc_u32 s15, s15, gv.fptr1@rel32@hi+4
 ; GCN-NEXT: v_lshlrev_b32_e32 v2, 20, v2
-; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
+; GCN-NEXT: s_load_dwordx2 s[18:19], s[14:15], 0x0
 ; GCN-NEXT: v_lshlrev_b32_e32 v1, 10, v1
 ; GCN-NEXT: v_or_b32_e32 v0, v0, v1
 ; GCN-NEXT: v_or_b32_e32 v31, v0, v2
 ; GCN-NEXT: v_mov_b32_e32 v0, 0x7b
+; GCN-NEXT: s_mov_b32 s14, s16
 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
-; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
+; GCN-NEXT: s_swappc_b64 s[30:31], s[18:19]
 ; GCN-NEXT: s_endpgm
 %fptr = load void(i32)*, void(i32)* addrspace(4)* @gv.fptr1
 call void %fptr(i32 123)

From llvm-commits at lists.llvm.org Mon Jul 6 06:11:37 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits)
Date: Mon, 06 Jul 2020 06:11:37 -0700 (PDT)
Subject: [llvm] 581f182 - AMDGPU/GlobalISel: Fix hardcoded register number checks in test
Message-ID: <5f032309.1c69fb81.f6844.50ca@mx.google.com>

Author: Matt Arsenault
Date: 2020-07-06T09:01:59-04:00
New Revision: 581f1823cdba50093f9eda2478de1207427032e4

URL: https://github.com/llvm/llvm-project/commit/581f1823cdba50093f9eda2478de1207427032e4
DIFF: https://github.com/llvm/llvm-project/commit/581f1823cdba50093f9eda2478de1207427032e4.diff

LOG: AMDGPU/GlobalISel: Fix hardcoded register number checks in test

Added:

Modified:
    llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll

Removed:

################################################################################
diff --git
a/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll
index bfec0b93f33b..df536962e1b2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll
@@ -1,10 +1,10 @@
 ; RUN: not llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -global-isel-abort=2 -pass-remarks-missed="gisel.*" -o /dev/null 2>&1 %s | FileCheck -check-prefix=ERR %s
-; ERR: remark: <unknown>:0:0: cannot select: %24:sreg_32(p5) = G_DYN_STACKALLOC %23:vgpr(s32), 1 (in function: kernel_dynamic_stackalloc_vgpr_align4)
+; ERR: remark: <unknown>:0:0: cannot select: %{{[0-9]+}}:sreg_32(p5) = G_DYN_STACKALLOC %{{[0-9]+}}:vgpr(s32), 1 (in function: kernel_dynamic_stackalloc_vgpr_align4)
 ; ERR-NEXT: warning: Instruction selection used fallback path for kernel_dynamic_stackalloc_vgpr_align4
 ; ERR-NEXT: error: <unknown>:0:0: in function kernel_dynamic_stackalloc_vgpr_align4 void (i32 addrspace(1)*): unsupported dynamic alloca
-; ERR: remark: <unknown>:0:0: cannot select: %8:sreg_32(p5) = G_DYN_STACKALLOC %7:vgpr(s32), 1 (in function: func_dynamic_stackalloc_vgpr_align4)
+; ERR: remark: <unknown>:0:0: cannot select: %{{[0-9]+}}:sreg_32(p5) = G_DYN_STACKALLOC %{{[0-9]+}}:vgpr(s32), 1 (in function: func_dynamic_stackalloc_vgpr_align4)
 ; ERR-NEXT: warning: Instruction selection used fallback path for func_dynamic_stackalloc_vgpr_align4
 ; ERR-NEXT: error: <unknown>:0:0: in function func_dynamic_stackalloc_vgpr_align4 void (i32): unsupported dynamic alloca

From llvm-commits at lists.llvm.org Mon Jul 6 06:11:39 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits)
Date: Mon, 06 Jul 2020 06:11:39 -0700 (PDT)
Subject: [llvm] a5b9ad7 - AMDGPU/GlobalISel: Don't emit code for unused kernel arguments
Message-ID: <5f03230b.1c69fb81.c87a9.187b@mx.google.com>

Author: Matt Arsenault
Date: 2020-07-06T09:04:06-04:00
New Revision: a5b9ad7e9aca1329ba310e638dafa58c47468a58

URL:
https://github.com/llvm/llvm-project/commit/a5b9ad7e9aca1329ba310e638dafa58c47468a58
DIFF: https://github.com/llvm/llvm-project/commit/a5b9ad7e9aca1329ba310e638dafa58c47468a58.diff

LOG: AMDGPU/GlobalISel: Don't emit code for unused kernel arguments

Added:

Modified:
    llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
    llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll

Removed:

################################################################################
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
index 7a3a502113df..83e5fcef7d7b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -523,6 +523,9 @@ bool AMDGPUCallLowering::lowerFormalArgumentsKernel(
 uint64_t ArgOffset = alignTo(ExplicitArgOffset, ABIAlign) + BaseOffset;
 ExplicitArgOffset = alignTo(ExplicitArgOffset, ABIAlign) + AllocSize;
+ if (Arg.use_empty())
+ continue;
+
 ArrayRef<Register> OrigArgRegs = VRegs[i];
 Register ArgReg = OrigArgRegs.size() == 1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll
index d5e7003a1561..4c48f9cc49fe 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll
@@ -1092,9 +1092,9 @@ define amdgpu_kernel void @empty_struct_arg({} %in) nounwind {
 ; With the SelectionDAG argument lowering, the alignments for the
 ; struct members is not properly considered, making these wrong.
-define amdgpu_kernel void @struct_argument_alignment({i32, i64} %arg0, i8, {i32, i64} %arg1) { +define amdgpu_kernel void @struct_argument_alignment({i32, i64} %arg0, i8 %pad, {i32, i64} %arg1) { ; HSA-VI-LABEL: name: struct_argument_alignment - ; HSA-VI: bb.1 (%ir-block.1): + ; HSA-VI: bb.1 (%ir-block.0): ; HSA-VI: liveins: $sgpr4_sgpr5 ; HSA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr4_sgpr5 ; HSA-VI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0 @@ -1112,13 +1112,15 @@ define amdgpu_kernel void @struct_argument_alignment({i32, i64} %arg0, i8, {i32, ; HSA-VI: [[EXTRACT3:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD2]](s128), 64 ; HSA-VI: [[C3:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 ; HSA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C3]](p1) + ; HSA-VI: [[COPY2:%[0-9]+]]:_(p1) = COPY [[C3]](p1) ; HSA-VI: G_STORE [[EXTRACT]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; HSA-VI: G_STORE [[EXTRACT1]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) + ; HSA-VI: G_STORE [[LOAD1]](s8), [[COPY2]](p1) :: (volatile store 1 into `i8 addrspace(1)* null`, addrspace 1) ; HSA-VI: G_STORE [[EXTRACT2]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; HSA-VI: G_STORE [[EXTRACT3]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) ; HSA-VI: S_ENDPGM 0 ; LEGACY-MESA-VI-LABEL: name: struct_argument_alignment - ; LEGACY-MESA-VI: bb.1 (%ir-block.1): + ; LEGACY-MESA-VI: bb.1 (%ir-block.0): ; LEGACY-MESA-VI: liveins: $sgpr0_sgpr1 ; LEGACY-MESA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr0_sgpr1 ; LEGACY-MESA-VI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 36 @@ -1136,8 +1138,10 @@ define amdgpu_kernel void @struct_argument_alignment({i32, i64} %arg0, i8, {i32, ; LEGACY-MESA-VI: [[EXTRACT3:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD2]](s128), 64 ; LEGACY-MESA-VI: [[C3:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 ; LEGACY-MESA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C3]](p1) + ; LEGACY-MESA-VI: 
[[COPY2:%[0-9]+]]:_(p1) = COPY [[C3]](p1) ; LEGACY-MESA-VI: G_STORE [[EXTRACT]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: G_STORE [[EXTRACT1]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) + ; LEGACY-MESA-VI: G_STORE [[LOAD1]](s8), [[COPY2]](p1) :: (volatile store 1 into `i8 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: G_STORE [[EXTRACT2]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: G_STORE [[EXTRACT3]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: S_ENDPGM 0 @@ -1147,6 +1151,7 @@ define amdgpu_kernel void @struct_argument_alignment({i32, i64} %arg0, i8, {i32, %val3 = extractvalue {i32, i64} %arg1, 1 store volatile i32 %val0, i32 addrspace(1)* null store volatile i64 %val1, i64 addrspace(1)* null + store volatile i8 %pad, i8 addrspace(1)* null store volatile i32 %val2, i32 addrspace(1)* null store volatile i64 %val3, i64 addrspace(1)* null ret void @@ -1164,20 +1169,15 @@ define amdgpu_kernel void @packed_struct_argument_alignment(<{i32, i64}> %arg0, ; HSA-VI: [[LOAD:%[0-9]+]]:_(s96) = G_LOAD [[PTR_ADD]](p4) :: (dereferenceable invariant load 12, align 16, addrspace 4) ; HSA-VI: [[EXTRACT:%[0-9]+]]:_(s32) = G_EXTRACT [[LOAD]](s96), 0 ; HSA-VI: [[EXTRACT1:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD]](s96), 32 - ; HSA-VI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 12 + ; HSA-VI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 13 ; HSA-VI: [[PTR_ADD1:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C1]](s64) - ; HSA-VI: [[LOAD1:%[0-9]+]]:_(s8) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 1, align 4, addrspace 4) - ; HSA-VI: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 13 - ; HSA-VI: [[PTR_ADD2:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C2]](s64) - ; HSA-VI: [[LOAD2:%[0-9]+]]:_(s96) = G_LOAD [[PTR_ADD2]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) - ; 
HSA-VI: [[EXTRACT2:%[0-9]+]]:_(s32) = G_EXTRACT [[LOAD2]](s96), 0 - ; HSA-VI: [[EXTRACT3:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD2]](s96), 32 - ; HSA-VI: [[C3:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 - ; HSA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C3]](p1) - ; HSA-VI: G_STORE [[EXTRACT]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) + ; HSA-VI: [[LOAD1:%[0-9]+]]:_(s8) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) + ; HSA-VI: [[C2:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 + ; HSA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C2]](p1) + ; HSA-VI: G_STORE [[EXTRACT]](s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; HSA-VI: G_STORE [[EXTRACT1]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) - ; HSA-VI: G_STORE [[EXTRACT2]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) - ; HSA-VI: G_STORE [[EXTRACT3]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) + ; HSA-VI: G_STORE %3:_(s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) + ; HSA-VI: G_STORE %4:_(s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) ; HSA-VI: S_ENDPGM 0 ; LEGACY-MESA-VI-LABEL: name: packed_struct_argument_alignment ; LEGACY-MESA-VI: bb.1 (%ir-block.1): @@ -1188,20 +1188,15 @@ define amdgpu_kernel void @packed_struct_argument_alignment(<{i32, i64}> %arg0, ; LEGACY-MESA-VI: [[LOAD:%[0-9]+]]:_(s96) = G_LOAD [[PTR_ADD]](p4) :: (dereferenceable invariant load 12, align 4, addrspace 4) ; LEGACY-MESA-VI: [[EXTRACT:%[0-9]+]]:_(s32) = G_EXTRACT [[LOAD]](s96), 0 ; LEGACY-MESA-VI: [[EXTRACT1:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD]](s96), 32 - ; LEGACY-MESA-VI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 48 + ; LEGACY-MESA-VI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 49 ; LEGACY-MESA-VI: [[PTR_ADD1:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C1]](s64) - ; LEGACY-MESA-VI: 
[[LOAD1:%[0-9]+]]:_(s8) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 1, align 16, addrspace 4) - ; LEGACY-MESA-VI: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 49 - ; LEGACY-MESA-VI: [[PTR_ADD2:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C2]](s64) - ; LEGACY-MESA-VI: [[LOAD2:%[0-9]+]]:_(s96) = G_LOAD [[PTR_ADD2]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) - ; LEGACY-MESA-VI: [[EXTRACT2:%[0-9]+]]:_(s32) = G_EXTRACT [[LOAD2]](s96), 0 - ; LEGACY-MESA-VI: [[EXTRACT3:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD2]](s96), 32 - ; LEGACY-MESA-VI: [[C3:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 - ; LEGACY-MESA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C3]](p1) - ; LEGACY-MESA-VI: G_STORE [[EXTRACT]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) + ; LEGACY-MESA-VI: [[LOAD1:%[0-9]+]]:_(s8) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) + ; LEGACY-MESA-VI: [[C2:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 + ; LEGACY-MESA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C2]](p1) + ; LEGACY-MESA-VI: G_STORE [[EXTRACT]](s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: G_STORE [[EXTRACT1]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) - ; LEGACY-MESA-VI: G_STORE [[EXTRACT2]](s32), [[C3]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) - ; LEGACY-MESA-VI: G_STORE [[EXTRACT3]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) + ; LEGACY-MESA-VI: G_STORE %3:_(s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) + ; LEGACY-MESA-VI: G_STORE %4:_(s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: S_ENDPGM 0 %val0 = extractvalue <{i32, i64}> %arg0, 0 %val1 = extractvalue <{i32, i64}> %arg0, 1 @@ -1213,3 +1208,18 @@ define amdgpu_kernel void @packed_struct_argument_alignment(<{i32, i64}> %arg0, store 
volatile i64 %val3, i64 addrspace(1)* null ret void } + +define amdgpu_kernel void @unused_i32_arg(i32 addrspace(1)* nocapture %out, i32 %unused, i32 %in) nounwind { + ; HSA-VI-LABEL: name: unused_i32_arg + ; HSA-VI: bb.1.entry: + ; HSA-VI: liveins: $sgpr4_sgpr5 + ; HSA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr4_sgpr5 + ; HSA-VI: S_ENDPGM 0 + ; LEGACY-MESA-VI-LABEL: name: unused_i32_arg + ; LEGACY-MESA-VI: bb.1.entry: + ; LEGACY-MESA-VI: liveins: $sgpr0_sgpr1 + ; LEGACY-MESA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr0_sgpr1 + ; LEGACY-MESA-VI: S_ENDPGM 0 +entry: + ret void +} From llvm-commits at lists.llvm.org Mon Jul 6 06:22:58 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Mon, 06 Jul 2020 06:22:58 -0700 (PDT) Subject: [llvm] 521ebc1 - GlobalISel: Move finalizeLowering call later Message-ID: <5f0325b2.1c69fb81.6bd98.5640@mx.google.com> Author: Matt Arsenault Date: 2020-07-06T09:19:40-04:00 New Revision: 521ebc168152ab72047e2e7c81c8c6724b3e7623 URL: https://github.com/llvm/llvm-project/commit/521ebc168152ab72047e2e7c81c8c6724b3e7623 DIFF: https://github.com/llvm/llvm-project/commit/521ebc168152ab72047e2e7c81c8c6724b3e7623.diff LOG: GlobalISel: Move finalizeLowering call later This matches the DAG behavior where this is called after the loop checking for calls. The AMDGPU implementation depends on knowing if there are calls in the function or not, so move this later. Another problem is finalizeLowering is actually called twice; I was seeing weird inconsistencies since the first call would produce unexpected results and the second run would correct them in some contexts. Since this requires disabling the verifier, and it's useful to serialize the MIR immediately after selection, FinalizeISel should probably not be a real pass. 
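The ordering issue described in the log message above can be sketched in isolation. This is a toy model only, not LLVM code: ToyInstr, ToyFunction, toyScanForCalls, toyFinalizeLowering, selectHookFirst and selectScanFirst are all hypothetical names, and the hook body is an assumption standing in for "a target hook that depends on knowing if there are calls in the function". It shows why such a hook has to run after the loop that checks for calls.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins (hypothetical, not LLVM types).
struct ToyInstr {
  bool IsCall;
};

struct ToyFunction {
  std::vector<ToyInstr> Instrs;
  bool HasCalls = false;          // stands in for a "function has calls" flag
  bool ReservedCallState = false; // decision made by the finalize hook
};

// Loop checking for calls: sets HasCalls if any instruction is a call.
inline void toyScanForCalls(ToyFunction &F) {
  for (const ToyInstr &I : F.Instrs) {
    if (I.IsCall) {
      F.HasCalls = true;
      break;
    }
  }
}

// Hook that relies on HasCalls already being accurate (assumed behavior).
inline void toyFinalizeLowering(ToyFunction &F) {
  F.ReservedCallState = F.HasCalls;
}

// Order before the change: hook first, scan second.
// The hook sees a stale HasCalls == false.
inline ToyFunction selectHookFirst(ToyFunction F) {
  toyFinalizeLowering(F);
  toyScanForCalls(F);
  return F;
}

// Order after the change: scan first, hook second.
inline ToyFunction selectScanFirst(ToyFunction F) {
  toyScanForCalls(F);
  toyFinalizeLowering(F);
  return F;
}
```

In this sketch a second hook invocation after the scan would also correct the stale result, which is consistent with the "second run would correct them" observation in the log, though the toy model says nothing about what the real hook does.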
Added: Modified: llvm/lib/CodeGen/GlobalISel/InstructionSelect.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/GlobalISel/InstructionSelect.cpp b/llvm/lib/CodeGen/GlobalISel/InstructionSelect.cpp index fc114c6edc0d..f32278d07052 100644 --- a/llvm/lib/CodeGen/GlobalISel/InstructionSelect.cpp +++ b/llvm/lib/CodeGen/GlobalISel/InstructionSelect.cpp @@ -223,9 +223,6 @@ bool InstructionSelect::runOnMachineFunction(MachineFunction &MF) { return false; } #endif - auto &TLI = *MF.getSubtarget().getTargetLowering(); - TLI.finalizeLowering(MF); - // Determine if there are any calls in this machine function. Ported from // SelectionDAG. MachineFrameInfo &MFI = MF.getFrameInfo(); @@ -241,6 +238,9 @@ bool InstructionSelect::runOnMachineFunction(MachineFunction &MF) { } } + // FIXME: FinalizeISel pass calls finalizeLowering, so it's called twice. + auto &TLI = *MF.getSubtarget().getTargetLowering(); + TLI.finalizeLowering(MF); LLVM_DEBUG({ dbgs() << "Rules covered by selecting function: " << MF.getName() << ":"; @@ -249,11 +249,7 @@ bool InstructionSelect::runOnMachineFunction(MachineFunction &MF) { dbgs() << "\n\n"; }); CoverageInfo.emit(CoveragePrefix, - MF.getSubtarget() - .getTargetLowering() - ->getTargetMachine() - .getTarget() - .getBackendName()); + TLI.getTargetMachine().getTarget().getBackendName()); // If we successfully selected the function nothing is going to use the vreg // types after us (otherwise MIRPrinter would need them). 
Make sure the types From llvm-commits at lists.llvm.org Mon Jul 6 06:23:29 2020 From: llvm-commits at lists.llvm.org (David Green via llvm-commits) Date: Mon, 06 Jul 2020 06:23:29 -0700 (PDT) Subject: [llvm] afdb2ef - [ARM] Adjust default fp extend and trunc costs Message-ID: <5f0325d1.1c69fb81.b9fbf.1cb3@mx.google.com> Author: David Green Date: 2020-07-06T14:23:17+01:00 New Revision: afdb2ef2ed9debd419a29b78c23e4b84ce67ab0c URL: https://github.com/llvm/llvm-project/commit/afdb2ef2ed9debd419a29b78c23e4b84ce67ab0c DIFF: https://github.com/llvm/llvm-project/commit/afdb2ef2ed9debd419a29b78c23e4b84ce67ab0c.diff LOG: [ARM] Adjust default fp extend and trunc costs This adds some default costs for fp extends and truncates, generally costing them as 1 per lane. If the type is not legal then the cost will include a call to an __aeabi_ function. Some NEON code is also adjusted to make sure it applies to the expected types, now that fp16 is a more common thing. Differential Revision: https://reviews.llvm.org/D82458 Added: Modified: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast.ll llvm/test/Analysis/CostModel/ARM/cast_ldst.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index f3e1b5887bc0..04a259657321 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -180,21 +180,6 @@ int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src, return Cost; }; - // Single to/from double precision conversions. - static const CostTblEntry NEONFltDblTbl[] = { - // Vector fptrunc/fpext conversions. 
- { ISD::FP_ROUND, MVT::v2f64, 2 }, - { ISD::FP_EXTEND, MVT::v2f32, 2 }, - { ISD::FP_EXTEND, MVT::v4f32, 4 } - }; - - if (Src->isVectorTy() && ST->hasNEON() && (ISD == ISD::FP_ROUND || - ISD == ISD::FP_EXTEND)) { - std::pair LT = TLI->getTypeLegalizationCost(DL, Src); - if (const auto *Entry = CostTableLookup(NEONFltDblTbl, ISD, LT.second)) - return AdjustCost(LT.first * Entry->Cost); - } - EVT SrcTy = TLI->getValueType(DL, Src); EVT DstTy = TLI->getValueType(DL, Dst); @@ -291,6 +276,23 @@ int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src, } } + // Single to/from double precision conversions. + if (Src->isVectorTy() && ST->hasNEON() && + ((ISD == ISD::FP_ROUND && SrcTy.getScalarType() == MVT::f64 && + DstTy.getScalarType() == MVT::f32) || + (ISD == ISD::FP_EXTEND && SrcTy.getScalarType() == MVT::f32 && + DstTy.getScalarType() == MVT::f64))) { + static const CostTblEntry NEONFltDblTbl[] = { + // Vector fptrunc/fpext conversions. + {ISD::FP_ROUND, MVT::v2f64, 2}, + {ISD::FP_EXTEND, MVT::v2f32, 2}, + {ISD::FP_EXTEND, MVT::v4f32, 4}}; + + std::pair LT = TLI->getTypeLegalizationCost(DL, Src); + if (const auto *Entry = CostTableLookup(NEONFltDblTbl, ISD, LT.second)) + return AdjustCost(LT.first * Entry->Cost); + } + // Some arithmetic, load and store operations have specific instructions // to cast up/down their types automatically at no extra cost. // TODO: Get these tables to know at least what the related operations are. @@ -470,6 +472,27 @@ int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src, return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); } + if (ISD == ISD::FP_ROUND || ISD == ISD::FP_EXTEND) { + // As general rule, fp converts that were not matched above are scalarized + // and cost 1 vcvt for each lane, so long as the instruction is available. + // If not it will become a series of function calls. 
+ const int CallCost = getCallInstrCost(nullptr, Dst, {Src}, CostKind); + int Lanes = 1; + if (SrcTy.isFixedLengthVector()) + Lanes = SrcTy.getVectorNumElements(); + auto IsLegal = [this](EVT VT) { + EVT EltVT = VT.getScalarType(); + return (EltVT == MVT::f32 && ST->hasVFP2Base()) || + (EltVT == MVT::f64 && ST->hasFP64()) || + (EltVT == MVT::f16 && ST->hasFullFP16()); + }; + + if (IsLegal(SrcTy) && IsLegal(DstTy)) + return Lanes; + else + return Lanes * CallCost; + } + // Scalar integer conversion costs. static const TypeConversionCostTblEntry ARMIntegerConversionTbl[] = { // i16 -> i64 requires two dependent operations. diff --git a/llvm/test/Analysis/CostModel/ARM/cast.ll b/llvm/test/Analysis/CostModel/ARM/cast.ll index 26403a044fbb..28f7c6cfcf36 100644 --- a/llvm/test/Analysis/CostModel/ARM/cast.ll +++ b/llvm/test/Analysis/CostModel/ARM/cast.ll @@ -131,31 +131,31 @@ define i32 @casts() { ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> -; CHECK-NEON-RECIP-NEXT: 
Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80fh = fptrunc float undef to half +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> ; CHECK-NEON-RECIP-NEXT: Cost Model: 
Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: 
%r85hd = fpext half undef to double +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hf = fpext half undef to float +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8> @@ -513,36 +513,36 @@ define i32 @casts() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rext_b = zext <2 x i32> undef to <2 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r74 = trunc <8 x i32> undef to <8 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r75 = trunc <16 x 
i32> undef to <16 x i8> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80df = fptrunc double undef to float -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80df = fptrunc double undef to float +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; 
CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85fd = fpext float undef to double ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: 
Found an estimated cost of 82 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 328 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1312 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 330 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1320 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost 
of 1 for instruction: %r85hf = fpext half undef to float ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8> @@ -900,36 +900,36 @@ define i32 @casts() { ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %rext_b = zext <2 x i32> undef to <2 x i64> ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r74 = trunc <8 x i32> undef to <8 x i8> ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r75 = trunc <16 x i32> undef to <16 x i8> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80df = fptrunc double undef to float -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for 
instruction: %r81df = fptrunc <2 x double> undef to <2 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double -; CHECK-V8M-MAIN-RECIP-NEXT: 
Cost Model: Found an estimated cost of 10 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80df = fptrunc double undef to float +; 
CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80fh = fptrunc float undef to half +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 
for instruction: %r85fd = fpext float undef to double +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hf = fpext half undef to float +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: 
Found an estimated cost of 2 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8> @@ -1287,36 +1287,36 @@ define i32 @casts() { ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %rext_b = zext <2 x i32> undef to <2 x i64> ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r74 = trunc <8 x i32> undef to <8 x i8> ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r75 = trunc <16 x i32> undef to <16 x i8> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80df = fptrunc double undef to float -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %r83dh = fptrunc <8 x double> 
undef to <8 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for 
instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80df = fptrunc double undef to float +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost 
Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80fh = fptrunc float undef to half +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85fd = fpext float undef to double +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x 
double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hf = fpext half undef to float +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8> @@ -1679,31 +1679,31 @@ define i32 @casts() { ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for 
instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80fh = fptrunc float undef to half +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81fh = fptrunc 
<2 x float> undef to <2 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-V8R-RECIP-NEXT: 
Cost Model: Found an estimated cost of 7 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hf = fpext half undef to float +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: 
%r92 = fptoui <2 x float> undef to <2 x i8> @@ -2061,36 +2061,36 @@ define i32 @casts() { ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rext_b = zext <2 x i32> undef to <2 x i64> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r74 = trunc <8 x i32> undef to <8 x i8> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r75 = trunc <16 x i32> undef to <16 x i8> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80df = fptrunc double undef to float -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80df = fptrunc double undef to float +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81df = fptrunc <2 x 
double> undef to <2 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> -; 
CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85fd = fpext float undef to double +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated 
cost of 80 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r89hf = fpext <16 x half> undef to 
<16 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8> @@ -2448,36 +2448,36 @@ define i32 @casts() { ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rext_b = zext <2 x i32> undef to <2 x i64> ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r74 = trunc <8 x i32> undef to <8 x i8> ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r75 = trunc <16 x i32> undef to <16 x i8> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80df = fptrunc double undef to float -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83dh = 
fptrunc <8 x double> undef to <8 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: 
%r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80df = fptrunc double undef to float +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost 
of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80fh = fptrunc float undef to half +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85fd = fpext float undef to double +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost 
Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hf = fpext half undef to float +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8> @@ -2835,36 +2835,36 @@ define i32 @casts() { ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rext_b = zext <2 x i32> undef to <2 x i64> ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r74 = trunc <8 x i32> undef to <8 x i8> ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %r75 = trunc <16 x i32> undef to <16 x i8> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80df = fptrunc double undef to float -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81df = fptrunc <2 x double> undef to <2 x float> -; 
CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86fd = fpext <2 x float> 
undef to <2 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext half undef to double -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80df = fptrunc double undef to float +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81df = fptrunc <2 x 
double> undef to <2 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80fh = fptrunc float undef to half +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85fd = fpext float undef to double +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 
20 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hf = fpext half undef to float +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1> ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an 
estimated cost of 1 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1> ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8> @@ -3227,31 +3227,31 @@ define i32 @casts() { ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82df = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83df = fptrunc <8 x double> undef to <8 x float> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an 
estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r80fh = fptrunc float undef to half +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85fd = fpext float undef to double ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86fd = fpext <2 x float> undef to <2 x double> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87fd = fpext <4 x float> undef to <4 x double> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88fd = fpext <8 x float> undef to <8 x double> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89fd = fpext <16 x float> undef to <16 x double> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hd = fpext 
half undef to double -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r85hf = fpext half undef to float -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r86hf = fpext <2 x half> undef to <2 x float> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r87hf = fpext <4 x half> undef to <4 x float> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r88hf = fpext <8 x half> undef to <8 x float> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r89hf = fpext <16 x half> undef to <16 x float> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hd = fpext half undef to double +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r86hd = fpext <2 x half> undef to <2 x double> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hd = fpext <4 x half> undef to <4 x double> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hd = fpext <8 x half> undef to <8 x double> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hd = fpext <16 x half> undef to <16 x double> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r85hf = fpext half undef to float +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated 
cost of 20 for instruction: %r86hf = fpext <2 x half> undef to <2 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r87hf = fpext <4 x half> undef to <4 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %r88hf = fpext <8 x half> undef to <8 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %r89hf = fpext <16 x half> undef to <16 x float>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8>
diff --git a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
index b9dc8a10a3c4..7628f09fc646 100644
--- a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
@@ -1252,22 +1252,22 @@ define i32 @load_fpextends() {
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r1 = fpext half %loadf16 to float
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost
of 10 for instruction: %r2 = fpext half %loadf16 to double ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 
40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-MVE-RECIP-LABEL: 'load_fpextends' @@ -1281,21 +1281,21 @@ define i32 @load_fpextends() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to 
double +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 330 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1320 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> +; 
CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 328 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-RECIP-LABEL: 'load_fpextends' @@ -1308,22 +1308,22 @@ define i32 @load_fpextends() { ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32 -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float -; 
CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r1 = fpext half %loadf16 to float +; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = 
fpext half %loadf16 to double
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-BASE-RECIP-LABEL: 'load_fpextends'
@@ -1336,22 +1336,22 @@ define i32 @load_fpextends() {
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r1 = fpext half %loadf16 to float
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8R-RECIP-LABEL: 'load_fpextends'
@@ -1364,22 +1364,22 @@ define i32 @load_fpextends() {
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r1 = fpext half %loadf16 to float
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-MVE-SIZE-LABEL: 'load_fpextends'
@@ -1393,21 +1393,21 @@ define i32 @load_fpextends() {
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-MAIN-SIZE-LABEL: 'load_fpextends'
@@ -1420,22 +1420,22 @@ define i32 @load_fpextends() {
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r1 = fpext half %loadf16 to float
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-BASE-SIZE-LABEL: 'load_fpextends'
@@ -1448,22 +1448,22 @@ define i32 @load_fpextends() {
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r1 = fpext half %loadf16 to float
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8R-SIZE-LABEL: 'load_fpextends'
@@ -1476,22 +1476,22 @@ define i32 @load_fpextends() {
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = load <2 x float>, <2 x float>* undef, align 8
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = load <4 x float>, <4 x float>* undef, align 16
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = load <8 x float>, <8 x float>* undef, align 32
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r1 = fpext half %loadf16 to float
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r2 = fpext half %loadf16 to double
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r1 = fpext half %loadf16 to float
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r3 = fpext float %loadf32 to double
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
   %loadf16 = load half, half* undef
@@ -1528,15 +1528,15 @@ define i32 @load_fpextends() {
 define i32 @load_fptrunc() {
 ; CHECK-NEON-RECIP-LABEL: 'load_fptrunc'
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1632 = fptrunc float undef to half
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2
@@ -1554,16 +1554,16 @@ define i32 @load_fptrunc() {
 ;
 ; CHECK-MVE-RECIP-LABEL: 'load_fptrunc'
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1664, half* undef, align 2
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4
@@ -1578,17 +1578,17 @@ define i32 @load_fptrunc() {
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-MAIN-RECIP-LABEL: 'load_fptrunc'
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1632 = fptrunc float undef to half
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1664, half* undef, align 2
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4
@@ -1603,17 +1603,17 @@ define i32 @load_fptrunc() {
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-BASE-RECIP-LABEL: 'load_fptrunc'
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1632 = fptrunc float undef to half
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x
half> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1664, half* undef, align 2 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4 @@ -1628,15 +1628,15 @@ define i32 @load_fptrunc() { ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8R-RECIP-LABEL: 'load_fptrunc' -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1632 = fptrunc float undef to half +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x 
half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2 @@ -1654,16 +1654,16 @@ define i32 @load_fptrunc() { ; ; CHECK-MVE-SIZE-LABEL: 'load_fptrunc' ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for 
instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = 
fptrunc <8 x double> undef to <8 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1664, half* undef, align 2 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4 @@ -1678,17 +1678,17 @@ define i32 @load_fptrunc() { ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-SIZE-LABEL: 'load_fptrunc' -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 
1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1632 = fptrunc float undef to half +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost 
Model: Found an estimated cost of 1 for instruction: store half %i1664, half* undef, align 2 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4 @@ -1703,17 +1703,17 @@ define i32 @load_fptrunc() { ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-BASE-SIZE-LABEL: 'load_fptrunc' -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1632 = 
fptrunc float undef to half +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1664, half* undef, align 2 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4 @@ -1728,15 +1728,15 @@ define i32 @load_fptrunc() { ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8R-SIZE-LABEL: 'load_fptrunc' -; CHECK-V8R-SIZE-NEXT: Cost 
Model: Found an estimated cost of 1 for instruction: %i1632 = fptrunc float undef to half -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i1664 = fptrunc double undef to half +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1632 = fptrunc float undef to half +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i1664 = fptrunc double undef to half ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %i3264 = fptrunc double undef to float -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = 
fptrunc <8 x float> undef to <8 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store half %i1632, half* undef, align 2 @@ -2750,19 +2750,19 @@ define i32 @maskedload_fpextends() { ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef) ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef) ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef) -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: 
%v6 = fpext <4 x half> %loadv4f16 to <4 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-NEON-RECIP-NEXT: Cost Model: Found an 
estimated cost of 7 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-MVE-RECIP-LABEL: 'maskedload_fpextends' @@ -2774,18 +2774,18 @@ define i32 @maskedload_fpextends() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef) ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef) ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 82 for instruction: 
%v6 = fpext <4 x half> %loadv4f16 to <4 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 330 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1320 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 328 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-RECIP-LABEL: 'maskedload_fpextends' @@ -2796,19 +2796,19 @@ 
define i32 @maskedload_fpextends() { ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef) ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef) ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef) -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> -; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for 
instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-BASE-RECIP-LABEL: 'maskedload_fpextends'
@@ -2819,19 +2819,19 @@ define i32 @maskedload_fpextends() {
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef)
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef)
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef)
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8R-RECIP-LABEL: 'maskedload_fpextends'
@@ -2842,19 +2842,19 @@ define i32 @maskedload_fpextends() {
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef)
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef)
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef)
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-MVE-SIZE-LABEL: 'maskedload_fpextends'
@@ -2865,19 +2865,19 @@ define i32 @maskedload_fpextends() {
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef)
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef)
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef)
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-MAIN-SIZE-LABEL: 'maskedload_fpextends'
@@ -2888,19 +2888,19 @@ define i32 @maskedload_fpextends() {
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef)
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef)
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef)
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-BASE-SIZE-LABEL: 'maskedload_fpextends'
@@ -2911,19 +2911,19 @@ define i32 @maskedload_fpextends() {
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef)
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef)
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef)
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8R-SIZE-LABEL: 'maskedload_fpextends'
@@ -2934,19 +2934,19 @@ define i32 @maskedload_fpextends() {
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv2f32 = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* undef, i32 4, <2 x i1> undef, <2 x float> undef)
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f32 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* undef, i32 4, <4 x i1> undef, <4 x float> undef)
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv8f32 = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* undef, i32 4, <8 x i1> undef, <8 x float> undef)
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v7 = fpext <8 x half> %loadv8f16 to <8 x double>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %v8 = fpext <16 x half> %loadv16f16 to <16 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8
-; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
+; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float>
 ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
   %loadv2f16 = call <2 x half> @llvm.masked.load.v2f16.p0v2f16(<2 x half>* undef, i32 2, <2 x i1> undef, <2 x half> undef)
@@ -2977,12 +2977,12 @@ define i32 @maskedload_fpextends() {
 define i32 @maskedload_fptrunc() {
 ; CHECK-NEON-RECIP-LABEL: 'maskedload_fptrunc'
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-NEON-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef)
@@ -2997,13 +2997,13 @@ define i32 @maskedload_fptrunc() {
 ;
 ; CHECK-MVE-RECIP-LABEL: 'maskedload_fptrunc'
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef)
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21664, <2 x half>* undef, i32 2, <2 x i1> undef)
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4f16.p0v4f16(<4 x half> %v41632, <4 x half>* undef, i32 2, <4 x i1> undef)
@@ -3015,14 +3015,14 @@ define i32 @maskedload_fptrunc() {
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-MAIN-RECIP-LABEL: 'maskedload_fptrunc'
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
-; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
+; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef)
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21664, <2 x half>* undef, i32 2, <2 x i1> undef)
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4f16.p0v4f16(<4 x half> %v41632, <4 x half>* undef, i32 2, <4 x i1> undef)
@@ -3034,14 +3034,14 @@ define i32 @maskedload_fptrunc() {
 ; CHECK-V8M-MAIN-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-BASE-RECIP-LABEL: 'maskedload_fptrunc'
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
-; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
+; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef)
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21664, <2 x half>* undef, i32 2, <2 x i1> undef)
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4f16.p0v4f16(<4 x half> %v41632, <4 x half>* undef, i32 2, <4 x i1> undef)
@@ -3053,12 +3053,12 @@ define i32 @maskedload_fptrunc() {
 ; CHECK-V8M-BASE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-V8R-RECIP-LABEL: 'maskedload_fptrunc'
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
+; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef)
@@ -3072,14 +3072,14 @@ define i32 @maskedload_fptrunc() {
 ; CHECK-V8R-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
 ;
 ; CHECK-MVE-SIZE-LABEL: 'maskedload_fptrunc'
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float>
-; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half>
+; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 =
fptrunc <4 x float> undef to <4 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef) ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21664, <2 x half>* undef, i32 2, <2 x i1> undef) ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4f16.p0v4f16(<4 x half> %v41632, <4 x half>* undef, i32 2, <4 x i1> undef) @@ -3091,14 +3091,14 @@ define i32 @maskedload_fptrunc() { ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-SIZE-LABEL: 'maskedload_fptrunc' -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x 
double> undef to <4 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> -; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef) ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void 
@llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21664, <2 x half>* undef, i32 2, <2 x i1> undef) ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4f16.p0v4f16(<4 x half> %v41632, <4 x half>* undef, i32 2, <4 x i1> undef) @@ -3110,14 +3110,14 @@ define i32 @maskedload_fptrunc() { ; CHECK-V8M-MAIN-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-BASE-SIZE-LABEL: 'maskedload_fptrunc' -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> -; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> 
undef to <4 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> +; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef) ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21664, <2 x half>* undef, i32 2, <2 x i1> undef) ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4f16.p0v4f16(<4 x half> %v41632, <4 x half>* undef, i32 2, <4 x i1> undef) @@ -3129,12 +3129,12 @@ define i32 @maskedload_fptrunc() { ; CHECK-V8M-BASE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8R-SIZE-LABEL: 'maskedload_fptrunc' -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41664 = fptrunc <4 x 
double> undef to <4 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> -; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> ; CHECK-V8R-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v2f16.p0v2f16(<2 x half> %v21632, <2 x half>* undef, i32 2, <2 x i1> undef) From llvm-commits at lists.llvm.org Mon Jul 6 06:53:08 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Mon, 06 Jul 2020 06:53:08 -0700 (PDT) Subject: [llvm] dbfcf6e - [x86] add tests for vector select with non-splat bit-test condition; NFC Message-ID: <5f032cc4.1c69fb81.5e218.304c@mx.google.com> Author: Sanjay Patel Date: 2020-07-06T09:50:47-04:00 New Revision: dbfcf6eb721a4ff5e5f7b0ca61885796e2996ded URL: 
https://github.com/llvm/llvm-project/commit/dbfcf6eb721a4ff5e5f7b0ca61885796e2996ded DIFF: https://github.com/llvm/llvm-project/commit/dbfcf6eb721a4ff5e5f7b0ca61885796e2996ded.diff LOG: [x86] add tests for vector select with non-splat bit-test condition; NFC Goes with D83181. Added: Modified: llvm/test/CodeGen/X86/vselect-pcmp.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/X86/vselect-pcmp.ll b/llvm/test/CodeGen/X86/vselect-pcmp.ll index c393955e2088..b7065c69b83b 100644 --- a/llvm/test/CodeGen/X86/vselect-pcmp.ll +++ b/llvm/test/CodeGen/X86/vselect-pcmp.ll @@ -930,6 +930,130 @@ define <16 x i8> @blend_splat_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x ret <16 x i8> %r } +define <2 x i64> @blend_mask_cond_v2i64(<2 x i64> %x, <2 x i64> %y, <2 x i64> %z) { +; AVX12-LABEL: blend_mask_cond_v2i64: +; AVX12: # %bb.0: +; AVX12-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; AVX12-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX12-NEXT: vpcmpeqq %xmm3, %xmm0, %xmm0 +; AVX12-NEXT: vblendvpd %xmm0, %xmm1, %xmm2, %xmm0 +; AVX12-NEXT: retq +; +; AVX512F-LABEL: blend_mask_cond_v2i64: +; AVX512F: # %bb.0: +; AVX512F-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2 +; AVX512F-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1 +; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0 +; AVX512F-NEXT: vmovdqa {{.*#+}} xmm3 = [1,4] +; AVX512F-NEXT: vptestnmq %zmm3, %zmm0, %k1 +; AVX512F-NEXT: vpblendmq %zmm1, %zmm2, %zmm0 {%k1} +; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0 +; AVX512F-NEXT: vzeroupper +; AVX512F-NEXT: retq +; +; AVX512VL-LABEL: blend_mask_cond_v2i64: +; AVX512VL: # %bb.0: +; AVX512VL-NEXT: vptestnmq {{.*}}(%rip), %xmm0, %k1 +; AVX512VL-NEXT: vpblendmq %xmm1, %xmm2, %xmm0 {%k1} +; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v2i64: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqq %xmm3, %xmm0, %xmm0 +; XOP-NEXT: 
vblendvpd %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: retq + %a = and <2 x i64> %x, + %c = icmp eq <2 x i64> %a, zeroinitializer + %r = select <2 x i1> %c, <2 x i64> %y, <2 x i64> %z + ret <2 x i64> %r +} + +define <4 x i32> @blend_mask_cond_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) { +; AVX12-LABEL: blend_mask_cond_v4i32: +; AVX12: # %bb.0: +; AVX12-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; AVX12-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX12-NEXT: vpcmpeqd %xmm3, %xmm0, %xmm0 +; AVX12-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0 +; AVX12-NEXT: retq +; +; AVX512F-LABEL: blend_mask_cond_v4i32: +; AVX512F: # %bb.0: +; AVX512F-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2 +; AVX512F-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1 +; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0 +; AVX512F-NEXT: vmovdqa {{.*#+}} xmm3 = [65536,512,2,1] +; AVX512F-NEXT: vptestnmd %zmm3, %zmm0, %k1 +; AVX512F-NEXT: vpblendmd %zmm1, %zmm2, %zmm0 {%k1} +; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0 +; AVX512F-NEXT: vzeroupper +; AVX512F-NEXT: retq +; +; AVX512VL-LABEL: blend_mask_cond_v4i32: +; AVX512VL: # %bb.0: +; AVX512VL-NEXT: vptestnmd {{.*}}(%rip), %xmm0, %k1 +; AVX512VL-NEXT: vpblendmd %xmm1, %xmm2, %xmm0 {%k1} +; AVX512VL-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v4i32: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqd %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: retq + %a = and <4 x i32> %x, + %c = icmp eq <4 x i32> %a, zeroinitializer + %r = select <4 x i1> %c, <4 x i32> %y, <4 x i32> %z + ret <4 x i32> %r +} + +define <8 x i16> @blend_mask_cond_v8i16(<8 x i16> %x, <8 x i16> %y, <8 x i16> %z) { +; AVX-LABEL: blend_mask_cond_v8i16: +; AVX: # %bb.0: +; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; AVX-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX-NEXT: vpcmpeqw %xmm3, %xmm0, %xmm0 +; AVX-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; AVX-NEXT: retq +; +; 
XOP-LABEL: blend_mask_cond_v8i16: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqw %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: retq + %a = and <8 x i16> %x, + %c = icmp eq <8 x i16> %a, zeroinitializer + %r = select <8 x i1> %c, <8 x i16> %y, <8 x i16> %z + ret <8 x i16> %r +} + +define <16 x i8> @blend_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x i8> %z) { +; AVX-LABEL: blend_mask_cond_v16i8: +; AVX: # %bb.0: +; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; AVX-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0 +; AVX-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; AVX-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v16i8: +; XOP: # %bb.0: +; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqb %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: retq + %a = and <16 x i8> %x, + %c = icmp eq <16 x i8> %a, zeroinitializer + %r = select <16 x i1> %c, <16 x i8> %y, <16 x i8> %z + ret <16 x i8> %r +} + define <4 x i64> @blend_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %z) { ; AVX1-LABEL: blend_mask_cond_v4i64: ; AVX1: # %bb.0: @@ -955,7 +1079,7 @@ define <4 x i64> @blend_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %z ; AVX512F-NEXT: # kill: def $ymm2 killed $ymm2 def $zmm2 ; AVX512F-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1 ; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0 -; AVX512F-NEXT: vmovdqa {{.*#+}} ymm3 = [2,4,8,16] +; AVX512F-NEXT: vmovdqa {{.*#+}} ymm3 = [2,4,32768,1] ; AVX512F-NEXT: vptestnmq %zmm3, %zmm0, %k1 ; AVX512F-NEXT: vpblendmq %zmm1, %zmm2, %zmm0 {%k1} ; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0 @@ -977,50 +1101,63 @@ define <4 x i64> @blend_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %z ; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 ; XOP-NEXT: vblendvpd %ymm0, %ymm1, %ymm2, 
%ymm0 ; XOP-NEXT: retq - %a = and <4 x i64> %x, + %a = and <4 x i64> %x, %c = icmp eq <4 x i64> %a, zeroinitializer %r = select <4 x i1> %c, <4 x i64> %y, <4 x i64> %z ret <4 x i64> %r } -define <4 x i32> @blend_mask_cond_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) { -; AVX12-LABEL: blend_mask_cond_v4i32: -; AVX12: # %bb.0: -; AVX12-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; AVX12-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; AVX12-NEXT: vpcmpeqd %xmm3, %xmm0, %xmm0 -; AVX12-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0 -; AVX12-NEXT: retq +define <8 x i32> @blend_mask_cond_v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i32> %z) { +; AVX1-LABEL: blend_mask_cond_v8i32: +; AVX1: # %bb.0: +; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3 +; AVX1-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; AVX1-NEXT: vpcmpeqd %xmm4, %xmm3, %xmm3 +; AVX1-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0 +; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; AVX1-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0 +; AVX1-NEXT: retq ; -; AVX512F-LABEL: blend_mask_cond_v4i32: +; AVX2-LABEL: blend_mask_cond_v8i32: +; AVX2: # %bb.0: +; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0 +; AVX2-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX2-NEXT: vpcmpeqd %ymm3, %ymm0, %ymm0 +; AVX2-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0 +; AVX2-NEXT: retq +; +; AVX512F-LABEL: blend_mask_cond_v8i32: ; AVX512F: # %bb.0: -; AVX512F-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2 -; AVX512F-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1 -; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0 -; AVX512F-NEXT: vmovdqa {{.*#+}} xmm3 = [65536,512,2,1] +; AVX512F-NEXT: # kill: def $ymm2 killed $ymm2 def $zmm2 +; AVX512F-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1 +; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0 +; AVX512F-NEXT: vmovdqa {{.*#+}} ymm3 = [1,2,8,4,8,1024,2,4096] ; AVX512F-NEXT: vptestnmd %zmm3, %zmm0, %k1 ; AVX512F-NEXT: vpblendmd %zmm1, %zmm2, %zmm0 {%k1} -; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed 
$zmm0 -; AVX512F-NEXT: vzeroupper +; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0 ; AVX512F-NEXT: retq ; -; AVX512VL-LABEL: blend_mask_cond_v4i32: +; AVX512VL-LABEL: blend_mask_cond_v8i32: ; AVX512VL: # %bb.0: -; AVX512VL-NEXT: vptestnmd {{.*}}(%rip), %xmm0, %k1 -; AVX512VL-NEXT: vpblendmd %xmm1, %xmm2, %xmm0 {%k1} +; AVX512VL-NEXT: vptestnmd {{.*}}(%rip), %ymm0, %k1 +; AVX512VL-NEXT: vpblendmd %ymm1, %ymm2, %ymm0 {%k1} ; AVX512VL-NEXT: retq ; -; XOP-LABEL: blend_mask_cond_v4i32: +; XOP-LABEL: blend_mask_cond_v8i32: ; XOP: # %bb.0: -; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqd %xmm3, %xmm0, %xmm0 -; XOP-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcomeqd %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqd %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; XOP-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0 ; XOP-NEXT: retq - %a = and <4 x i32> %x, - %c = icmp eq <4 x i32> %a, zeroinitializer - %r = select <4 x i1> %c, <4 x i32> %y, <4 x i32> %z - ret <4 x i32> %r + %a = and <8 x i32> %x, + %c = icmp eq <8 x i32> %a, zeroinitializer + %r = select <8 x i1> %c, <8 x i32> %y, <8 x i32> %z + ret <8 x i32> %r } define <16 x i16> @blend_mask_cond_v16i16(<16 x i16> %x, <16 x i16> %y, <16 x i16> %z) { @@ -1069,26 +1206,50 @@ define <16 x i16> @blend_mask_cond_v16i16(<16 x i16> %x, <16 x i16> %y, <16 x i1 ret <16 x i16> %r } -define <16 x i8> @blend_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x i8> %z) { -; AVX-LABEL: blend_mask_cond_v16i8: -; AVX: # %bb.0: -; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; AVX-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; AVX-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0 -; AVX-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 -; AVX-NEXT: retq +define <32 x i8> @blend_mask_cond_v32i8(<32 x i8> %x, <32 x i8> %y, <32 x i8> %z) { +; AVX1-LABEL: 
blend_mask_cond_v32i8: +; AVX1: # %bb.0: +; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3 +; AVX1-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; AVX1-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm3 +; AVX1-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0 +; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; AVX1-NEXT: vandnps %ymm2, %ymm0, %ymm2 +; AVX1-NEXT: vandps %ymm0, %ymm1, %ymm0 +; AVX1-NEXT: vorps %ymm2, %ymm0, %ymm0 +; AVX1-NEXT: retq ; -; XOP-LABEL: blend_mask_cond_v16i8: +; AVX2-LABEL: blend_mask_cond_v32i8: +; AVX2: # %bb.0: +; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0 +; AVX2-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX2-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm0 +; AVX2-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0 +; AVX2-NEXT: retq +; +; AVX512-LABEL: blend_mask_cond_v32i8: +; AVX512: # %bb.0: +; AVX512-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0 +; AVX512-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX512-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm0 +; AVX512-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0 +; AVX512-NEXT: retq +; +; XOP-LABEL: blend_mask_cond_v32i8: ; XOP: # %bb.0: -; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqb %xmm3, %xmm0, %xmm0 -; XOP-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 +; XOP-NEXT: vpcomeqb %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpcomeqb %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 +; XOP-NEXT: vpcmov %ymm0, %ymm2, %ymm1, %ymm0 ; XOP-NEXT: retq - %a = and <16 x i8> %x, - %c = icmp eq <16 x i8> %a, zeroinitializer - %r = select <16 x i1> %c, <16 x i8> %y, <16 x i8> %z - ret <16 x i8> %r + %a = and <32 x i8> %x, + %c = icmp eq <32 x i8> %a, zeroinitializer + %r = select <32 x i1> %c, <32 x i8> %y, <32 x i8> %z + ret <32 x i8> %r } define void @PR46531(i32* %x, i32* %y, i32* %z) { From llvm-commits at lists.llvm.org Mon Jul 6 07:10:25 2020 From: llvm-commits at 
lists.llvm.org (Florian Hahn via llvm-commits) Date: Mon, 06 Jul 2020 07:10:25 -0700 (PDT) Subject: [llvm] cff5739 - [LV] Pass dbgs() to verifyFunction call. Message-ID: <5f0330d1.1c69fb81.fd82b.1dab@mx.google.com> Author: Florian Hahn Date: 2020-07-06T15:09:20+01:00 New Revision: cff57391575da6bcc6f31e196bd73fa928b3abcb URL: https://github.com/llvm/llvm-project/commit/cff57391575da6bcc6f31e196bd73fa928b3abcb DIFF: https://github.com/llvm/llvm-project/commit/cff57391575da6bcc6f31e196bd73fa928b3abcb.diff LOG: [LV] Pass dbgs() to verifyFunction call. This is done in other places of the pass already and improves the output on verification failure. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 0c96842480c1..e3e0727b6b38 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -8026,7 +8026,7 @@ bool LoopVectorizePass::processLoop(Loop *L) { Hints.setAlreadyVectorized(); } - assert(!verifyFunction(*L->getHeader()->getParent())); + assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs())); return true; } From llvm-commits at lists.llvm.org Mon Jul 6 07:42:32 2020 From: llvm-commits at lists.llvm.org (Kadir Cetinkaya via llvm-commits) Date: Mon, 06 Jul 2020 07:42:32 -0700 (PDT) Subject: [llvm] d3e3f36 - Revert "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`" Message-ID: <5f033858.1c69fb81.fcc81.07d4@mx.google.com> Author: Mikhail Goncharov Date: 2020-07-06T16:41:59+02:00 New Revision: d3e3f36ff1151f565730977ac4f663a2ccee48ae URL: https://github.com/llvm/llvm-project/commit/d3e3f36ff1151f565730977ac4f663a2ccee48ae DIFF: https://github.com/llvm/llvm-project/commit/d3e3f36ff1151f565730977ac4f663a2ccee48ae.diff LOG: Revert "[ScalarEvolution] 
createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`" Summary: This reverts commit 2c16100e6f72075564ea1f67fa5a82c269dafcd3. ninja check-polly fails: Polly :: Isl/CodeGen/MemAccess/generate-all.ll Polly :: ScopInfo/multidim_srem.ll Reviewers: kadircet, bollu Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83230 Added: Modified: llvm/lib/Analysis/ScalarEvolution.cpp llvm/test/Analysis/ScalarEvolution/sdiv.ll llvm/test/Analysis/ScalarEvolution/srem.ll Removed: ################################################################################ diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 75926aa3a960..609ad639d9c0 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6303,20 +6303,6 @@ const SCEV *ScalarEvolution::createSCEV(Value *V) { return getSCEV(U->getOperand(0)); break; - case Instruction::SDiv: - // If both operands are non-negative, this is just an udiv. - if (isKnownNonNegative(getSCEV(U->getOperand(0))) && - isKnownNonNegative(getSCEV(U->getOperand(1)))) - return getUDivExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); - break; - - case Instruction::SRem: - // If both operands are non-negative, this is just an urem. 
- if (isKnownNonNegative(getSCEV(U->getOperand(0))) && - isKnownNonNegative(getSCEV(U->getOperand(1)))) - return getURemExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); - break; - // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can // lead to pointer expressions which cannot safely be expanded to GEPs, // because ScalarEvolution doesn't respect the GEP aliasing rules when diff --git a/llvm/test/Analysis/ScalarEvolution/sdiv.ll b/llvm/test/Analysis/ScalarEvolution/sdiv.ll index 106cda1b7f0f..b4895855ab71 100644 --- a/llvm/test/Analysis/ScalarEvolution/sdiv.ll +++ b/llvm/test/Analysis/ScalarEvolution/sdiv.ll @@ -14,11 +14,11 @@ define dso_local void @_Z4loopi(i32 %width) local_unnamed_addr #0 { ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = sdiv i32 %i.0, 2 -; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,1073741824) S: [0,1073741824) Exits: (%width /u 2) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> %rem U: full-set S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,2147483648) S: [0,2147483648) Exits: ((zext i32 %width to i64) /u 2) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> (sext i32 %rem to i64) U: [-2147483648,2147483648) S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * ({0,+,1}<%for.cond> /u 2)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * ((zext i32 %width to i64) /u 2)) + %storage) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: 
[-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %1 = load i32, i32* %arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) diff --git a/llvm/test/Analysis/ScalarEvolution/srem.ll b/llvm/test/Analysis/ScalarEvolution/srem.ll index 6debab34e3b3..365b40d88e24 100644 --- a/llvm/test/Analysis/ScalarEvolution/srem.ll +++ b/llvm/test/Analysis/ScalarEvolution/srem.ll @@ -14,11 +14,11 @@ define dso_local void @_Z4loopi(i32 %width) local_unnamed_addr #0 { ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = srem i32 %i.0, 2 -; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i32) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i32) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> %rem U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i64) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i64) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> (sext i32 %rem to i64) U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * (zext i1 {false,+,true}<%for.cond> to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * (zext i1 (trunc i32 %width to i1) to i64)) + %storage) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %1 = load i32, 
i32* %arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) From llvm-commits at lists.llvm.org Mon Jul 6 07:47:54 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 6 Jul 2020 17:47:54 +0300 Subject: [llvm] d3e3f36 - Revert "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`" In-Reply-To: <5f033858.1c69fb81.fcc81.07d4@mx.google.com> References: <5f033858.1c69fb81.fcc81.07d4@mx.google.com> Message-ID: Hm, on which bots was that failing? On Mon, Jul 6, 2020 at 5:42 PM Kadir Cetinkaya via llvm-commits wrote: > > > Author: Mikhail Goncharov > Date: 2020-07-06T16:41:59+02:00 > New Revision: d3e3f36ff1151f565730977ac4f663a2ccee48ae > > URL: https://github.com/llvm/llvm-project/commit/d3e3f36ff1151f565730977ac4f663a2ccee48ae > DIFF: https://github.com/llvm/llvm-project/commit/d3e3f36ff1151f565730977ac4f663a2ccee48ae.diff > > LOG: Revert "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`" > > Summary: > This reverts commit 2c16100e6f72075564ea1f67fa5a82c269dafcd3. 
>
> ninja check-polly fails:
> Polly :: Isl/CodeGen/MemAccess/generate-all.ll
> Polly :: ScopInfo/multidim_srem.ll
>
> Reviewers: kadircet, bollu
>
> Subscribers: hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D83230
>
> Added:
>
>
> Modified:
> llvm/lib/Analysis/ScalarEvolution.cpp
> llvm/test/Analysis/ScalarEvolution/sdiv.ll
> llvm/test/Analysis/ScalarEvolution/srem.ll
>
> Removed:

From llvm-commits at lists.llvm.org Mon Jul 6 07:58:38 2020
From: llvm-commits at lists.llvm.org (David Green via llvm-commits)
Date: Mon, 06 Jul 2020 07:58:38 -0700 (PDT)
Subject: [llvm] 146dad0 - [ARM] MVE FP16 cost adjustments
Message-ID: <5f033c1e.1c69fb81.cf123.222e@mx.google.com>

Author: David Green
Date: 2020-07-06T15:57:51+01:00
New Revision: 146dad0077b46a0fb8e158c10490c1774db5a762

URL: https://github.com/llvm/llvm-project/commit/146dad0077b46a0fb8e158c10490c1774db5a762
DIFF: https://github.com/llvm/llvm-project/commit/146dad0077b46a0fb8e158c10490c1774db5a762.diff

LOG: [ARM] MVE FP16 cost adjustments

This adjusts the MVE fp16 cost model, similar to how we already do for integer casts. It uses the base cost of 1 per cvt for most fp extend / truncates, but adjusts it for loads and stores where we know that an extending load has been used to get the load into the correct lane, and only an MVE VCVTB is then needed.
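As a rough illustration of the table-driven costing described above, here is a minimal standalone sketch. It does not use LLVM's own types: Op, VecTy, CostEntry, kLoadExtendCosts and lookupCost are hypothetical stand-ins, and the cost factor of 2 is an assumption chosen because it reproduces the 2 and 6 seen in the CHECK-MVE-RECIP expectations of this patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the opcode and vector-type enums; these are
// not LLVM's ISD or MVT definitions.
enum class Op { FpExtend, FpRound };
enum class VecTy { v4f16, v8f16, v4f32, v8f32 };

struct CostEntry {
  Op op;
  VecTy wide;   // the f32 side of the conversion
  VecTy narrow; // the f16 side of the conversion
  int cost;     // base cost before scaling
};

// Mirrors the two fpext rows the patch adds for values coming from a load.
constexpr CostEntry kLoadExtendCosts[] = {
    {Op::FpExtend, VecTy::v4f32, VecTy::v4f16, 1},
    {Op::FpExtend, VecTy::v8f32, VecTy::v8f16, 3},
};

// Returns the base cost scaled by costFactor, or -1 when the table has no
// matching row and the caller would fall back to a generic estimate.
int lookupCost(Op op, VecTy wide, VecTy narrow, int costFactor) {
  for (const CostEntry &e : kLoadExtendCosts)
    if (e.op == op && e.wide == wide && e.narrow == narrow)
      return e.cost * costFactor;
  return -1;
}

// Small self-check; the factor of 2 is assumed, not taken from the patch.
void checkLookupExamples() {
  assert(lookupCost(Op::FpExtend, VecTy::v4f32, VecTy::v4f16, 2) == 2);
  assert(lookupCost(Op::FpExtend, VecTy::v8f32, VecTy::v8f16, 2) == 6);
  assert(lookupCost(Op::FpRound, VecTy::v4f32, VecTy::v4f16, 2) == -1);
}
```

The sketch covers only the scaled lookup; how code-size costs are derived is not modelled here.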
Differential Revision: https://reviews.llvm.org/D81813 Added: Modified: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index 04a259657321..44dfb9e8c129 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -229,6 +229,18 @@ int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src, DstTy.getSimpleVT(), SrcTy.getSimpleVT())) return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); } + + static const TypeConversionCostTblEntry MVEFLoadConversionTbl[] = { + // FPExtends are similar but also require the VCVT instructions. + {ISD::FP_EXTEND, MVT::v4f32, MVT::v4f16, 1}, + {ISD::FP_EXTEND, MVT::v8f32, MVT::v8f16, 3}, + }; + if (SrcTy.isVector() && ST->hasMVEFloatOps()) { + if (const auto *Entry = + ConvertCostTableLookup(MVEFLoadConversionTbl, ISD, + DstTy.getSimpleVT(), SrcTy.getSimpleVT())) + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); + } } // The truncate of a store is free. This is the mirror of extends above. @@ -247,6 +259,17 @@ int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src, DstTy.getSimpleVT())) return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); } + + static const TypeConversionCostTblEntry MVEFLoadConversionTbl[] = { + {ISD::FP_ROUND, MVT::v4f32, MVT::v4f16, 1}, + {ISD::FP_ROUND, MVT::v8f32, MVT::v8f16, 3}, + }; + if (SrcTy.isVector() && ST->hasMVEFloatOps()) { + if (const auto *Entry = + ConvertCostTableLookup(MVEFLoadConversionTbl, ISD, SrcTy.getSimpleVT(), + DstTy.getSimpleVT())) + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); + } } // NEON vector operations that can extend their inputs. 
@@ -955,6 +978,22 @@ int ARMTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
     return LT.first * 4;
   }
 
+  // MVE can optimize a fpext(load(4xhalf)) using an extending integer load.
+  // Same for stores.
+  if (ST->hasMVEFloatOps() && isa<FixedVectorType>(Src) && I &&
+      ((Opcode == Instruction::Load && I->hasOneUse() &&
+        isa<FPExtInst>(*I->user_begin())) ||
+       (Opcode == Instruction::Store && isa<FPTruncInst>(I->getOperand(0))))) {
+    FixedVectorType *SrcVTy = cast<FixedVectorType>(Src);
+    Type *DstTy =
+        Opcode == Instruction::Load
+            ? (*I->user_begin())->getType()
+            : cast<Instruction>(I->getOperand(0))->getOperand(0)->getType();
+    if (SrcVTy->getNumElements() == 4 && SrcVTy->getScalarType()->isHalfTy() &&
+        DstTy->getScalarType()->isFloatTy())
+      return ST->getMVEVectorCostFactor();
+  }
+
   int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy()
                      ? ST->getMVEVectorCostFactor()
                      : 1;
diff --git a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
index 7628f09fc646..491e0900e08a 100644
--- a/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cast_ldst.ll
@@ -1284,8 +1284,8 @@ define i32 @load_fpextends() {
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double
 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x float>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
-; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float>
+; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v3 = fpext <8 x half>
%loadv8f16 to <8 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> @@ -1294,8 +1294,8 @@ define i32 @load_fpextends() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-RECIP-LABEL: 'load_fpextends' @@ -1396,8 +1396,8 @@ define i32 @load_fpextends() { ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r2 = fpext half %loadf16 to double ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r3 = fpext float %loadf32 to double ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = fpext <2 x half> %loadv2f16 to <2 x 
float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = fpext <4 x half> %loadv4f16 to <4 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3 = fpext <8 x half> %loadv8f16 to <8 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4 = fpext <16 x half> %loadv16f16 to <16 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v5 = fpext <2 x half> %loadv2f16 to <2 x double> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v6 = fpext <4 x half> %loadv4f16 to <4 x double> @@ -1407,7 +1407,7 @@ define i32 @load_fpextends() { ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-SIZE-LABEL: 'load_fpextends' @@ -1558,9 +1558,9 @@ define i32 @load_fptrunc() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float ; CHECK-MVE-RECIP-NEXT: 
Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> @@ -1569,7 +1569,7 @@ define i32 @load_fptrunc() { ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store float %i3264, float* undef, align 4 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x half> %v21632, <2 x half>* undef, align 4 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <2 x half> %v21664, <2 x half>* undef, align 4 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: store <4 x half> %v41632, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x half> 
%v41632, <4 x half>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: store <4 x half> %v41664, <4 x half>* undef, align 8 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x half> %v81632, <8 x half>* undef, align 16 ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x half> %v81664, <8 x half>* undef, align 16 @@ -1658,9 +1658,9 @@ define i32 @load_fptrunc() { ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %i3264 = fptrunc double undef to float ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v21632 = fptrunc <2 x float> undef to <2 x half> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v21664 = fptrunc <2 x double> undef to <2 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v41632 = fptrunc <4 x float> undef to <4 x half> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v41664 = fptrunc <4 x double> undef to <4 x half> -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v81632 = fptrunc <8 x float> undef to <8 x half> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v81664 = fptrunc <8 x double> undef to <8 x half> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v23264 = fptrunc <2 x double> undef to <2 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v43264 = fptrunc <4 x double> undef to <4 x float> @@ -2784,8 +2784,8 @@ define i32 @maskedload_fpextends() 
{ ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %v9 = fpext <2 x float> %loadv2f32 to <2 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-RECIP-LABEL: 'maskedload_fpextends' @@ -2877,7 +2877,7 @@ define i32 @maskedload_fpextends() { ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %v10 = fpext <4 x float> %loadv4f32 to <4 x double> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v11 = fpext <8 x float> %loadv8f32 to <8 x double> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %loadv4f16ou = load <4 x half>, <4 x half>* undef, align 8 -; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> +; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2ou = fpext <4 x half> %loadv4f16ou to <4 x float> ; CHECK-MVE-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef ; ; CHECK-V8M-MAIN-SIZE-LABEL: 'maskedload_fpextends' 
diff --git a/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll b/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll index 747bac801f61..ff3e03c7bad4 100644 --- a/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll +++ b/llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll @@ -562,14 +562,12 @@ for.body: br i1 %exitcond, label %for.cond.cleanup, label %for.body } -; TODO: this fpext could be allowed, but we don't lower it very efficiently yet, -; so reject this for now. define void @fpext_allowed(float* noalias nocapture %A, half* noalias nocapture readonly %B, float* noalias nocapture readonly %C) #0 { ; CHECK-LABEL: fpext_allowed( -; PREFER-FOLDING-NOT: vector.body: +; PREFER-FOLDING: vector.body: ; PREFER-FOLDING-NOT: llvm.masked.load ; PREFER-FOLDING-NOT: llvm.masked.store -; PREFER-FOLDING-NOT: br i1 %{{.*}}, label %{{.*}}, label %vector.body +; PREFER-FOLDING: br i1 %{{.*}}, label %{{.*}}, label %vector.body entry: br label %for.body @@ -591,14 +589,12 @@ for.body: br i1 %exitcond, label %for.cond.cleanup, label %for.body } -; TODO: this fptrunc could be allowed, but we don't lower it very efficiently yet, -; so reject this for now. 
 define void @fptrunc_allowed(half* noalias nocapture %A, float* noalias nocapture readonly %B, float* noalias nocapture readonly %C) #0 {
 ; CHECK-LABEL: fptrunc_allowed(
-; PREFER-FOLDING-NOT: vector.body:
+; PREFER-FOLDING: vector.body:
 ; PREFER-FOLDING-NOT: llvm.masked.load
 ; PREFER-FOLDING-NOT: llvm.masked.store
-; PREFER-FOLDING-NOT: br i1 %{{.*}}, label %{{.*}}, label %vector.body
+; PREFER-FOLDING: br i1 %{{.*}}, label %{{.*}}, label %vector.body
 entry:
   br label %for.body

From llvm-commits at lists.llvm.org Mon Jul 6 08:00:40 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits)
Date: Mon, 06 Jul 2020 08:00:40 -0700 (PDT)
Subject: [polly] a2619a6 - Reland "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`"
Message-ID: <5f033c98.1c69fb81.ab470.be8f@mx.google.com>

Author: Roman Lebedev
Date: 2020-07-06T18:00:22+03:00
New Revision: a2619a60e4601c445e9ca6e16c76052e00d907ff

URL: https://github.com/llvm/llvm-project/commit/a2619a60e4601c445e9ca6e16c76052e00d907ff
DIFF: https://github.com/llvm/llvm-project/commit/a2619a60e4601c445e9ca6e16c76052e00d907ff.diff

LOG: Reland "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`"

This reverts commit d3e3f36ff1151f565730977ac4f663a2ccee48ae, which reverted the original commit 2c16100e6f72075564ea1f67fa5a82c269dafcd3, but with polly tests now actually passing.
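The relanded change rests on the identity stated in the patch's own comments: when both operands are non-negative, a signed division or remainder yields the same value as the unsigned one. A small standalone sketch of that identity on 32-bit integers follows; signedMatchesUnsigned is a hypothetical helper written for this illustration and is not part of ScalarEvolution.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when signed and unsigned division and remainder produce the
// same bit pattern for (a, b). Hypothetical helper, for illustration only.
bool signedMatchesUnsigned(int32_t a, int32_t b) {
  if (b == 0)
    return false; // division by zero is undefined either way
  if (a == INT32_MIN && b == -1)
    return false; // signed overflow, also undefined
  const uint32_t ua = static_cast<uint32_t>(a);
  const uint32_t ub = static_cast<uint32_t>(b);
  const bool divSame = static_cast<uint32_t>(a / b) == ua / ub;
  const bool remSame = static_cast<uint32_t>(a % b) == ua % ub;
  return divSame && remSame;
}

// Small self-check covering non-negative and negative operands.
void checkDivisionExamples() {
  assert(signedMatchesUnsigned(7, 2));           // both non-negative
  assert(signedMatchesUnsigned(2147483647, 3));  // still non-negative
  assert(!signedMatchesUnsigned(-7, 2));         // negative dividend differs
  assert(!signedMatchesUnsigned(7, -2));         // negative divisor differs
}
```

The negative cases show why the patch guards the rewrite with isKnownNonNegative on both operands rather than applying it unconditionally.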
Added: Modified: llvm/lib/Analysis/ScalarEvolution.cpp llvm/test/Analysis/ScalarEvolution/sdiv.ll llvm/test/Analysis/ScalarEvolution/srem.ll polly/test/Isl/CodeGen/MemAccess/generate-all.ll polly/test/ScopInfo/multidim_srem.ll Removed: ################################################################################ diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 609ad639d9c0..75926aa3a960 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6303,6 +6303,20 @@ const SCEV *ScalarEvolution::createSCEV(Value *V) { return getSCEV(U->getOperand(0)); break; + case Instruction::SDiv: + // If both operands are non-negative, this is just an udiv. + if (isKnownNonNegative(getSCEV(U->getOperand(0))) && + isKnownNonNegative(getSCEV(U->getOperand(1)))) + return getUDivExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); + break; + + case Instruction::SRem: + // If both operands are non-negative, this is just an urem. 
+ if (isKnownNonNegative(getSCEV(U->getOperand(0))) && + isKnownNonNegative(getSCEV(U->getOperand(1)))) + return getURemExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); + break; + // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can // lead to pointer expressions which cannot safely be expanded to GEPs, // because ScalarEvolution doesn't respect the GEP aliasing rules when diff --git a/llvm/test/Analysis/ScalarEvolution/sdiv.ll b/llvm/test/Analysis/ScalarEvolution/sdiv.ll index b4895855ab71..106cda1b7f0f 100644 --- a/llvm/test/Analysis/ScalarEvolution/sdiv.ll +++ b/llvm/test/Analysis/ScalarEvolution/sdiv.ll @@ -14,11 +14,11 @@ define dso_local void @_Z4loopi(i32 %width) local_unnamed_addr #0 { ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = sdiv i32 %i.0, 2 -; CHECK-NEXT: --> %rem U: full-set S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } +; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,1073741824) S: [0,1073741824) Exits: (%width /u 2) LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> (sext i32 %rem to i64) U: [-2147483648,2147483648) S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } +; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,2147483648) S: [0,2147483648) Exits: ((zext i32 %width to i64) /u 2) LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } +; CHECK-NEXT: --> ((4 * ({0,+,1}<%for.cond> /u 2)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * 
((zext i32 %width to i64) /u 2)) + %storage) LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %1 = load i32, i32* %arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) diff --git a/llvm/test/Analysis/ScalarEvolution/srem.ll b/llvm/test/Analysis/ScalarEvolution/srem.ll index 365b40d88e24..6debab34e3b3 100644 --- a/llvm/test/Analysis/ScalarEvolution/srem.ll +++ b/llvm/test/Analysis/ScalarEvolution/srem.ll @@ -14,11 +14,11 @@ define dso_local void @_Z4loopi(i32 %width) local_unnamed_addr #0 { ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = srem i32 %i.0, 2 -; CHECK-NEXT: --> %rem U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } +; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i32) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i32) LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> (sext i32 %rem to i64) U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } +; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i64) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i64) LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } +; CHECK-NEXT: --> ((4 * (zext i1 {false,+,true}<%for.cond> to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * (zext i1 (trunc i32 %width to i1) to i64)) + %storage) LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %1 = load i32, i32* 
%arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) diff --git a/polly/test/Isl/CodeGen/MemAccess/generate-all.ll b/polly/test/Isl/CodeGen/MemAccess/generate-all.ll index a3253ef2d168..f9b07a947c6a 100644 --- a/polly/test/Isl/CodeGen/MemAccess/generate-all.ll +++ b/polly/test/Isl/CodeGen/MemAccess/generate-all.ll @@ -9,11 +9,12 @@ ; } ; SCEV: polly.stmt.bb2: ; preds = %polly.loop_header -; SCEV-NEXT: %p_tmp = srem i64 %polly.indvar, 4 -; SCEV-NEXT: %p_tmp3 = getelementptr inbounds float, float* %A, i64 %p_tmp -; SCEV-NEXT: %tmp4_p_scalar_ = load float, float* %p_tmp3, align 4, !alias.scope !0, !noalias !2 +; SCEV-NEXT: %0 = trunc i64 %polly.indvar to i2 +; SCEV-NEXT: %1 = zext i2 %0 to i64 +; SCEV-NEXT: %scevgep = getelementptr float, float* %A, i64 %1 +; SCEV-NEXT: %tmp4_p_scalar_ = load float, float* %scevgep, align 4, !alias.scope !0, !noalias !2 ; SCEV-NEXT: %p_tmp5 = fadd float %tmp4_p_scalar_, 1.000000e+01 -; SCEV-NEXT: store float %p_tmp5, float* %p_tmp3, align 4, !alias.scope !0, !noalias !2 +; SCEV-NEXT: store float %p_tmp5, float* %scevgep, align 4, !alias.scope !0, !noalias !2 ; SCEV-NEXT: %polly.indvar_next = add nsw i64 %polly.indvar, 1 ; SCEV-NEXT: %polly.loop_cond = icmp sle i64 %polly.indvar_next, 99 ; SCEV-NEXT: br i1 %polly.loop_cond, label %polly.loop_header, label %polly.loop_exit diff --git a/polly/test/ScopInfo/multidim_srem.ll b/polly/test/ScopInfo/multidim_srem.ll index 10673f41aaee..f7f9616a9df3 100644 --- a/polly/test/ScopInfo/multidim_srem.ll +++ b/polly/test/ScopInfo/multidim_srem.ll @@ -14,11 +14,12 @@ ; CHECK-NEXT: Schedule := ; CHECK-NEXT: [n] -> { Stmt_for_body_8[i0, i1, i2] -> [i0, i1, i2] }; ; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0] -; CHECK-NEXT: [n] -> { Stmt_for_body_8[i0, i1, i2] -> MemRef_A[o0, i1, i2] : (i0 + o0) mod 2 = 0 and 0 <= o0 <= 1 } +; CHECK-NEXT: [n] -> { Stmt_for_body_8[i0, i1, i2] -> 
MemRef_A[1, i1, i2] : (1 + i0) mod 2 = 0; Stmt_for_body_8[i0, i1, i2] -> MemRef_A[0, i1, i2] : (i0) mod 2 = 0 };
 ; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
-; CHECK-NEXT: [n] -> { Stmt_for_body_8[i0, i1, i2] -> MemRef_A[o0, i1, i2] : (i0 + o0) mod 2 = 0 and 0 <= o0 <= 1 };
+; CHECK-NEXT: [n] -> { Stmt_for_body_8[i0, i1, i2] -> MemRef_A[1, i1, i2] : (1 + i0) mod 2 = 0; Stmt_for_body_8[i0, i1, i2] -> MemRef_A[0, i1, i2] : (i0) mod 2 = 0 };
 ; CHECK-NEXT: }
+
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

From llvm-commits at lists.llvm.org Mon Jul 6 08:29:00 2020
From: llvm-commits at lists.llvm.org (Oliver Stannard via llvm-commits)
Date: Mon, 06 Jul 2020 08:29:00 -0700 (PDT)
Subject: [llvm] e80b81d - [Support] Fix formatted_raw_ostream for UTF-8
Message-ID: <5f03433c.1c69fb81.eaad8.2f2f@mx.google.com>

Author: Oliver Stannard
Date: 2020-07-06T16:18:15+01:00
New Revision: e80b81d1cbf85dcd427759369978afdb48f0998f

URL: https://github.com/llvm/llvm-project/commit/e80b81d1cbf85dcd427759369978afdb48f0998f
DIFF: https://github.com/llvm/llvm-project/commit/e80b81d1cbf85dcd427759369978afdb48f0998f.diff

LOG: [Support] Fix formatted_raw_ostream for UTF-8

* The getLine and getColumn functions need to update the position, or they will return stale data for buffered streams. This fixes a bug in the clang -analyzer-checker-option-help option, which was not wrapping the help text correctly when stdout is not a TTY.
* If the stream contains multi-byte UTF-8 sequences, then the whole sequence needs to be considered to be a single character. This has the edge case that the buffer might fill up and be flushed part way through a character.
* If the stream contains East Asian wide characters, these will be rendered twice as wide as other characters, so we need to increase the column count to match.
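The second point, including its flush edge case, can be sketched outside LLVM as follows. This is a simplified illustration, not the code in FormattedStream.cpp: ScanResult, utf8SequenceLength and scanColumns are hypothetical names, every complete sequence is counted as one column (so East Asian wide characters, tabs and newlines are deliberately ignored), and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence introduced by lead byte c. ASCII and
// any byte that cannot start a multi-byte sequence count as length 1.
std::size_t utf8SequenceLength(unsigned char c) {
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 1;
}

struct ScanResult {
  unsigned columns;    // complete characters seen in this scan
  std::string partial; // bytes of a trailing, incomplete sequence
};

// Scans carry + chunk, counting one column per complete UTF-8 sequence. If
// the data ends part way through a sequence, the leftover bytes are returned
// so the caller can prepend them to the next chunk.
ScanResult scanColumns(const std::string &carry, const std::string &chunk) {
  const std::string buf = carry + chunk;
  ScanResult result{0, ""};
  std::size_t i = 0;
  while (i < buf.size()) {
    const std::size_t len =
        utf8SequenceLength(static_cast<unsigned char>(buf[i]));
    if (i + len > buf.size()) {
      result.partial = buf.substr(i); // flushed mid-character: keep the prefix
      break;
    }
    ++result.columns;
    i += len;
  }
  return result;
}

// Small self-check: U+00E9 is encoded as the two bytes C3 A9.
void checkScanExamples() {
  ScanResult first = scanColumns("", "a\xC3");
  assert(first.columns == 1 && first.partial == "\xC3");
  ScanResult second = scanColumns(first.partial, "\xA9" "b");
  assert(second.columns == 2 && second.partial.empty());
}
```

Carrying the incomplete prefix forward plays the same role as the PartialUTF8Char member introduced in the header diff of this commit.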
This doesn't attempt to handle everything unicode can do (combining characters, right-to-left markers, ...), but hopefully covers most things likely to be common in messages and source code we might want to print. Differential revision: https://reviews.llvm.org/D76291 Added: Modified: clang/test/Analysis/checker-plugins.c llvm/include/llvm/Support/FormattedStream.h llvm/lib/Support/FormattedStream.cpp llvm/test/MC/ARM/lsl-zero.s llvm/unittests/Support/formatted_raw_ostream_test.cpp Removed: ################################################################################ diff --git a/clang/test/Analysis/checker-plugins.c b/clang/test/Analysis/checker-plugins.c index fbc9c9bd1c22..69fab8fa6eed 100644 --- a/clang/test/Analysis/checker-plugins.c +++ b/clang/test/Analysis/checker-plugins.c @@ -116,4 +116,5 @@ void caller() { // RUN: 2>&1 | FileCheck %s -check-prefix=CHECK-CHECKER-OPTION-HELP // CHECK-CHECKER-OPTION-HELP: example.MyChecker:ExampleOption (bool) This is an -// CHECK-CHECKER-OPTION-HELP-SAME: example checker opt. (default: false) +// CHECK-CHECKER-OPTION-HELP-SAME: example checker opt. (default: +// CHECK-CHECKER-OPTION-HELP-NEXT: false) diff --git a/llvm/include/llvm/Support/FormattedStream.h b/llvm/include/llvm/Support/FormattedStream.h index b49c8d86531d..5f937cfa7984 100644 --- a/llvm/include/llvm/Support/FormattedStream.h +++ b/llvm/include/llvm/Support/FormattedStream.h @@ -14,6 +14,7 @@ #ifndef LLVM_SUPPORT_FORMATTEDSTREAM_H #define LLVM_SUPPORT_FORMATTEDSTREAM_H +#include "llvm/ADT/SmallString.h" #include "llvm/Support/raw_ostream.h" #include @@ -21,8 +22,11 @@ namespace llvm { /// formatted_raw_ostream - A raw_ostream that wraps another one and keeps track /// of line and column position, allowing padding out to specific column -/// boundaries and querying the number of lines written to the stream. -/// +/// boundaries and querying the number of lines written to the stream. 
This +/// assumes that the contents of the stream is valid UTF-8 encoded text. This +/// doesn't attempt to handle everything Unicode can do (combining characters, +/// right-to-left markers, etc), but should cover the cases likely to appear in +/// source code or diagnostic messages. class formatted_raw_ostream : public raw_ostream { /// TheStream - The real stream we output to. We set it to be /// unbuffered, since we're already doing our own buffering. @@ -40,6 +44,14 @@ class formatted_raw_ostream : public raw_ostream { /// const char *Scanned; + /// PartialUTF8Char - Either empty or a prefix of a UTF-8 code unit sequence + /// for a Unicode scalar value which should be prepended to the buffer for the + /// next call to ComputePosition. This is needed when the buffer is flushed + /// when it ends part-way through the UTF-8 encoding of a Unicode scalar + /// value, so that we can compute the display width of the character once we + /// have the rest of it. + SmallString<4> PartialUTF8Char; + void write_impl(const char *Ptr, size_t Size) override; /// current_pos - Return the current position within the stream, @@ -52,10 +64,16 @@ class formatted_raw_ostream : public raw_ostream { } /// ComputePosition - Examine the given output buffer and figure out the new - /// position after output. - /// + /// position after output. This is safe to call multiple times on the same + /// buffer, as it records the most recently scanned character and resumes from + /// there when the buffer has not been flushed. void ComputePosition(const char *Ptr, size_t size); + /// UpdatePosition - scan the characters in [Ptr, Ptr+Size), and update the + /// line and column numbers. Unlike ComputePosition, this must be called + /// exactly once on each region of the buffer. 
+ void UpdatePosition(const char *Ptr, size_t Size); + void setStream(raw_ostream &Stream) { releaseStream(); @@ -105,11 +123,17 @@ class formatted_raw_ostream : public raw_ostream { /// \param NewCol - The column to move to. formatted_raw_ostream &PadToColumn(unsigned NewCol); - /// getColumn - Return the column number - unsigned getColumn() { return Position.first; } + unsigned getColumn() { + // Calculate current position, taking buffer contents into account. + ComputePosition(getBufferStart(), GetNumBytesInBuffer()); + return Position.first; + } - /// getLine - Return the line number - unsigned getLine() { return Position.second; } + unsigned getLine() { + // Calculate current position, taking buffer contents into account. + ComputePosition(getBufferStart(), GetNumBytesInBuffer()); + return Position.second; + } raw_ostream &resetColor() override { TheStream->resetColor(); diff --git a/llvm/lib/Support/FormattedStream.cpp b/llvm/lib/Support/FormattedStream.cpp index 4eb747038bb9..081b8bf2cc19 100644 --- a/llvm/lib/Support/FormattedStream.cpp +++ b/llvm/lib/Support/FormattedStream.cpp @@ -11,7 +11,9 @@ //===----------------------------------------------------------------------===// #include "llvm/Support/FormattedStream.h" +#include "llvm/Support/ConvertUTF.h" #include "llvm/Support/Debug.h" +#include "llvm/Support/Unicode.h" #include "llvm/Support/raw_ostream.h" #include @@ -19,16 +21,22 @@ using namespace llvm; /// UpdatePosition - Examine the given char sequence and figure out which /// column we end up in after output, and how many line breaks are contained. -/// -static void UpdatePosition(std::pair &Position, const char *Ptr, size_t Size) { +/// This assumes that the input string is well-formed UTF-8, and takes into +/// account Unicode characters which render as multiple columns wide. 
+void formatted_raw_ostream::UpdatePosition(const char *Ptr, size_t Size) { unsigned &Column = Position.first; unsigned &Line = Position.second; - // Keep track of the current column and line by scanning the string for - // special characters - for (const char *End = Ptr + Size; Ptr != End; ++Ptr) { - ++Column; - switch (*Ptr) { + auto ProcessUTF8CodePoint = [&Line, &Column](StringRef CP) { + int Width = sys::unicode::columnWidthUTF8(CP); + if (Width != sys::unicode::ErrorNonPrintableCharacter) + Column += Width; + + // The only special whitespace characters we care about are single-byte. + if (CP.size() > 1) + return; + + switch (CP[0]) { case '\n': Line += 1; LLVM_FALLTHROUGH; @@ -40,6 +48,46 @@ static void UpdatePosition(std::pair &Position, const char * Column += (8 - (Column & 0x7)) & 0x7; break; } + }; + + // If we have a partial UTF-8 sequence from the previous buffer, check that + // first. + if (PartialUTF8Char.size()) { + size_t BytesFromBuffer = + getNumBytesForUTF8(PartialUTF8Char[0]) - PartialUTF8Char.size(); + if (Size < BytesFromBuffer) { + // If we still don't have enough bytes for a complete code point, just + // append what we have. + PartialUTF8Char.append(StringRef(Ptr, Size)); + return; + } else { + // The first few bytes from the buffer will complete the code point. + // Concatenate them and process their effect on the line and column + // numbers. + PartialUTF8Char.append(StringRef(Ptr, BytesFromBuffer)); + ProcessUTF8CodePoint(PartialUTF8Char); + PartialUTF8Char.clear(); + Ptr += BytesFromBuffer; + Size -= BytesFromBuffer; + } + } + + // Now scan the rest of the buffer. + unsigned NumBytes; + for (const char *End = Ptr + Size; Ptr < End; Ptr += NumBytes) { + NumBytes = getNumBytesForUTF8(*Ptr); + + // The buffer might end part way through a UTF-8 code unit sequence for a + // Unicode scalar value if it got flushed. If this happens, we can't know + // the display width until we see the rest of the code point. 
Stash the + // bytes we do have, so that we can reconstruct the whole code point later, + // even if the buffer is being flushed. + if ((End - Ptr) < NumBytes) { + PartialUTF8Char = StringRef(Ptr, End - Ptr); + return; + } + + ProcessUTF8CodePoint(StringRef(Ptr, NumBytes)); } } @@ -52,9 +100,9 @@ void formatted_raw_ostream::ComputePosition(const char *Ptr, size_t Size) { if (Ptr <= Scanned && Scanned <= Ptr + Size) // Scan all characters added since our last scan to determine the new // column. - UpdatePosition(Position, Scanned, Size - (Scanned - Ptr)); + UpdatePosition(Scanned, Size - (Scanned - Ptr)); else - UpdatePosition(Position, Ptr, Size); + UpdatePosition(Ptr, Size); // Update the scanning pointer. Scanned = Ptr + Size; diff --git a/llvm/test/MC/ARM/lsl-zero.s b/llvm/test/MC/ARM/lsl-zero.s index 5d097115448f..6e64e0012362 100644 --- a/llvm/test/MC/ARM/lsl-zero.s +++ b/llvm/test/MC/ARM/lsl-zero.s @@ -1,6 +1,6 @@ -// RUN: llvm-mc -triple=thumbv7 -show-encoding < %s 2>&1 | FileCheck --check-prefix=CHECK --check-prefix=CHECK-NONARM --check-prefix=CHECK-THUMBV7 %s -// RUN: llvm-mc -triple=thumbv8 -show-encoding < %s 2>&1 | FileCheck --check-prefix=CHECK --check-prefix=CHECK-NONARM --check-prefix=CHECK-THUMBV8 %s -// RUN: llvm-mc -triple=armv7 -show-encoding < %s 2>&1 | FileCheck --check-prefix=CHECK --check-prefix=CHECK-ARM %s +// RUN: llvm-mc -triple=thumbv7 -show-encoding < %s 2>/dev/null | FileCheck --check-prefix=CHECK --check-prefix=CHECK-NONARM --check-prefix=CHECK-THUMBV7 %s +// RUN: llvm-mc -triple=thumbv8 -show-encoding < %s 2>/dev/null | FileCheck --check-prefix=CHECK --check-prefix=CHECK-NONARM --check-prefix=CHECK-THUMBV8 %s +// RUN: llvm-mc -triple=armv7 -show-encoding < %s 2>/dev/null | FileCheck --check-prefix=CHECK --check-prefix=CHECK-ARM %s // lsl #0 is actually mov, so here we check that it behaves the same as // mov with regards to the permitted registers and how it behaves in an diff --git 
a/llvm/unittests/Support/formatted_raw_ostream_test.cpp b/llvm/unittests/Support/formatted_raw_ostream_test.cpp index 0fe0869922a1..5c57f1f18700 100644 --- a/llvm/unittests/Support/formatted_raw_ostream_test.cpp +++ b/llvm/unittests/Support/formatted_raw_ostream_test.cpp @@ -29,4 +29,143 @@ TEST(formatted_raw_ostreamTest, Test_Tell) { } } +TEST(formatted_raw_ostreamTest, Test_LineColumn) { + // Test tracking of line and column numbers in a stream. + SmallString<128> A; + raw_svector_ostream B(A); + formatted_raw_ostream C(B); + + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(0U, C.getColumn()); + + C << "a"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(1U, C.getColumn()); + + C << "bcdef"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(6U, C.getColumn()); + + // '\n' increments line number, sets column to zero. + C << "\n"; + EXPECT_EQ(1U, C.getLine()); + EXPECT_EQ(0U, C.getColumn()); + + // '\r sets column to zero without changing line number + C << "foo\r"; + EXPECT_EQ(1U, C.getLine()); + EXPECT_EQ(0U, C.getColumn()); + + // '\t' advances column to the next multiple of 8. + // FIXME: If the column number is already a multiple of 8 this will do + // nothing, is this behaviour correct? + C << "1\t"; + EXPECT_EQ(8U, C.getColumn()); + C << "\t"; + EXPECT_EQ(8U, C.getColumn()); + C << "1234567\t"; + EXPECT_EQ(16U, C.getColumn()); + EXPECT_EQ(1U, C.getLine()); +} + +TEST(formatted_raw_ostreamTest, Test_Flush) { + // Flushing the buffer causes the characters in the buffer to be scanned + // before the buffer is emptied, so line and column numbers will still be + // tracked properly. 
+ SmallString<128> A; + raw_svector_ostream B(A); + B.SetBufferSize(32); + formatted_raw_ostream C(B); + + C << "\nabc"; + EXPECT_EQ(4U, C.GetNumBytesInBuffer()); + C.flush(); + EXPECT_EQ(1U, C.getLine()); + EXPECT_EQ(3U, C.getColumn()); + EXPECT_EQ(0U, C.GetNumBytesInBuffer()); +} + +TEST(formatted_raw_ostreamTest, Test_UTF8) { + SmallString<128> A; + raw_svector_ostream B(A); + B.SetBufferSize(32); + formatted_raw_ostream C(B); + + // U+00A0 Non-breaking space: encoded as two bytes, but only one column wide. + C << u8"\u00a0"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(1U, C.getColumn()); + EXPECT_EQ(2U, C.GetNumBytesInBuffer()); + + // U+2468 CIRCLED DIGIT NINE: encoded as three bytes, but only one column + // wide. + C << u8"\u2468"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(2U, C.getColumn()); + EXPECT_EQ(5U, C.GetNumBytesInBuffer()); + + // U+00010000 LINEAR B SYLLABLE B008 A: encoded as four bytes, but only one + // column wide. + C << u8"\U00010000"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(3U, C.getColumn()); + EXPECT_EQ(9U, C.GetNumBytesInBuffer()); + + // U+55B5, CJK character, encodes as three bytes, takes up two columns. + C << u8"\u55b5"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(5U, C.getColumn()); + EXPECT_EQ(12U, C.GetNumBytesInBuffer()); + + // U+200B, zero-width space, encoded as three bytes but has no effect on the + // column or line number. + C << u8"\u200b"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(5U, C.getColumn()); + EXPECT_EQ(15U, C.GetNumBytesInBuffer()); +} + +TEST(formatted_raw_ostreamTest, Test_UTF8Buffered) { + SmallString<128> A; + raw_svector_ostream B(A); + B.SetBufferSize(4); + formatted_raw_ostream C(B); + + // U+2468 encodes as three bytes, so will cause the buffer to be flushed after + // the first byte (4 byte buffer, 3 bytes already written). 
We need to save + // the first part of the UTF-8 encoding until after the buffer is cleared and + // the remaining two bytes are written, at which point we can check the + // display width. In this case the display width is 1, so we end at column 4, + // with 6 bytes written into total, 2 of which are in the buffer. + C << u8"123\u2468"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(4U, C.getColumn()); + EXPECT_EQ(2U, C.GetNumBytesInBuffer()); + C.flush(); + EXPECT_EQ(6U, A.size()); + + // Same as above, but with a CJK character which displays as two columns. + C << u8"123\u55b5"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(9U, C.getColumn()); + EXPECT_EQ(2U, C.GetNumBytesInBuffer()); + C.flush(); + EXPECT_EQ(12U, A.size()); +} + +TEST(formatted_raw_ostreamTest, Test_UTF8TinyBuffer) { + SmallString<128> A; + raw_svector_ostream B(A); + B.SetBufferSize(1); + formatted_raw_ostream C(B); + + // The stream has a one-byte buffer, so it gets flushed multiple times while + // printing a single Unicode character. 
+ C << u8"\u2468"; + EXPECT_EQ(0U, C.getLine()); + EXPECT_EQ(1U, C.getColumn()); + EXPECT_EQ(0U, C.GetNumBytesInBuffer()); + C.flush(); + EXPECT_EQ(3U, A.size()); +} } From llvm-commits at lists.llvm.org Mon Jul 6 08:49:47 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Mon, 06 Jul 2020 08:49:47 -0700 (PDT) Subject: [llvm] 6d3ae36 - [XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s) Message-ID: <5f03481b.1c69fb81.7f1c2.4e27@mx.google.com> Author: jasonliu Date: 2020-07-06T15:49:15Z New Revision: 6d3ae365bdfc846b93d1033cfe625403a3b85ac5 URL: https://github.com/llvm/llvm-project/commit/6d3ae365bdfc846b93d1033cfe625403a3b85ac5 DIFF: https://github.com/llvm/llvm-project/commit/6d3ae365bdfc846b93d1033cfe625403a3b85ac5.diff LOG: [XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s) Summary: When a desired symbol name contains invalid character that the system assembler could not process, we need to emit .rename directive in assembly path in order for that desired symbol name to appear in the symbol table. 
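As a rough illustration of the renaming, the sketch below follows the scheme used by the `MCContext::createXCOFFSymbolImpl` change in this patch: the internal name starts with `_Renamed..` (or `._Renamed..` for an entry point symbol), followed by the hex values of every `_` and every invalid character, followed by the original name with the invalid characters replaced by `_`. The helper names `isAcceptable`, `appendHex` and `internalNameFor` are hypothetical, the sketch assumes a non-empty ASCII name, and it omits the `UsedNames` bookkeeping and the rejection of source names that already start with `_Renamed..`.

```cpp
#include <cassert>
#include <string>

// Characters the AIX assembler accepts in a symbol, per the patch:
// letters, digits, '_' and '.', plus '[' and ']' for the SMC suffix.
static bool isAcceptable(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '_' || C == '.' ||
         C == '[' || C == ']';
}

// Append the lowercase hex value of C without a leading zero.
static void appendHex(std::string &Out, unsigned char C) {
  static const char Digits[] = "0123456789abcdef";
  if (C >= 16)
    Out.push_back(Digits[C >> 4]);
  Out.push_back(Digits[C & 15]);
}

// Returns the name to use inside the assembly file. The original name is
// what a .rename directive would then restore in the symbol table.
static std::string internalNameFor(std::string Name) {
  bool Valid = true;
  for (char C : Name)
    Valid = Valid && isAcceptable(C);
  if (Valid)
    return Name; // Nothing to rename.

  const bool IsEntryPoint = Name[0] == '.';
  std::string Result = IsEntryPoint ? "._Renamed.." : "_Renamed..";
  for (char &C : Name) {
    // '_' is recorded as well, so the mangling stays unambiguous.
    if (!isAcceptable(C) || C == '_') {
      appendHex(Result, static_cast<unsigned char>(C));
      C = '_';
    }
  }
  // The entry point's leading '.' is already part of the prefix.
  Result.append(Name, IsEntryPoint ? 1 : 0, std::string::npos);
  return Result;
}
```

Under this sketch a symbol such as `f$o` would be referred to as `_Renamed..24f_o` in the assembly, with a `.rename` directive mapping it back to `f$o`.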
Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L Differential Revision: https://reviews.llvm.org/D82481 Added: llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll Modified: llvm/include/llvm/MC/MCAsmInfo.h llvm/include/llvm/MC/MCContext.h llvm/include/llvm/MC/MCSectionXCOFF.h llvm/include/llvm/MC/MCStreamer.h llvm/include/llvm/MC/MCSymbolXCOFF.h llvm/include/llvm/MC/MCXCOFFStreamer.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/MC/MCAsmInfoXCOFF.cpp llvm/lib/MC/MCAsmStreamer.cpp llvm/lib/MC/MCContext.cpp llvm/lib/MC/MCStreamer.cpp llvm/lib/MC/MCSymbolXCOFF.cpp llvm/lib/MC/XCOFFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/test_func_desc.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/MC/MCAsmInfo.h b/llvm/include/llvm/MC/MCAsmInfo.h index 974155ff3319..46c5a111c891 100644 --- a/llvm/include/llvm/MC/MCAsmInfo.h +++ b/llvm/include/llvm/MC/MCAsmInfo.h @@ -343,10 +343,6 @@ class MCAsmInfo { /// protected visibility. Defaults to MCSA_Protected MCSymbolAttr ProtectedVisibilityAttr = MCSA_Protected; - // This attribute is used to indicate symbols such as commons on AIX may have - // a storage mapping class embedded in the name. - bool SymbolsHaveSMC = false; - //===--- Dwarf Emission Directives -----------------------------------===// /// True if target supports emission of debugging information. 
Defaults to @@ -606,8 +602,6 @@ class MCAsmInfo { return ProtectedVisibilityAttr; } - bool getSymbolsHaveSMC() const { return SymbolsHaveSMC; } - bool doesSupportDebugInformation() const { return SupportsDebugInformation; } bool doesSupportExceptionHandling() const { diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h index 3f106c697b01..45be9bb3d225 100644 --- a/llvm/include/llvm/MC/MCContext.h +++ b/llvm/include/llvm/MC/MCContext.h @@ -57,6 +57,7 @@ namespace llvm { class MCSymbol; class MCSymbolELF; class MCSymbolWasm; + class MCSymbolXCOFF; class SMLoc; class SourceMgr; @@ -308,6 +309,9 @@ namespace llvm { unsigned UniqueID, const MCSymbolELF *LinkedToSym); + MCSymbolXCOFF *createXCOFFSymbolImpl(const StringMapEntry *Name, + bool IsTemporary); + /// Map of currently defined macros. StringMap MacroMap; diff --git a/llvm/include/llvm/MC/MCSectionXCOFF.h b/llvm/include/llvm/MC/MCSectionXCOFF.h index f6e7b0933d11..eed6b9c2609c 100644 --- a/llvm/include/llvm/MC/MCSectionXCOFF.h +++ b/llvm/include/llvm/MC/MCSectionXCOFF.h @@ -36,13 +36,15 @@ class MCSectionXCOFF final : public MCSection { XCOFF::SymbolType Type; XCOFF::StorageClass StorageClass; MCSymbolXCOFF *const QualName; + StringRef SymbolTableName; static constexpr unsigned DefaultAlignVal = 4; MCSectionXCOFF(StringRef Name, XCOFF::StorageMappingClass SMC, XCOFF::SymbolType ST, XCOFF::StorageClass SC, SectionKind K, - MCSymbolXCOFF *QualName, MCSymbol *Begin) + MCSymbolXCOFF *QualName, MCSymbol *Begin, + StringRef SymbolTableName) : MCSection(SV_XCOFF, Name, K, Begin), MappingClass(SMC), Type(ST), - StorageClass(SC), QualName(QualName) { + StorageClass(SC), QualName(QualName), SymbolTableName(SymbolTableName) { assert((ST == XCOFF::XTY_SD || ST == XCOFF::XTY_CM || ST == XCOFF::XTY_ER) && "Invalid or unhandled type for csect."); assert(QualName != nullptr && "QualName is needed."); @@ -72,6 +74,7 @@ class MCSectionXCOFF final : public MCSection { const MCExpr *Subsection) const 
override; bool UseCodeAlign() const override; bool isVirtualSection() const override; + StringRef getSymbolTableName() const { return SymbolTableName; } }; } // end namespace llvm diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h index f77ecb5af50c..d7255a22e941 100644 --- a/llvm/include/llvm/MC/MCStreamer.h +++ b/llvm/include/llvm/MC/MCStreamer.h @@ -574,6 +574,16 @@ class MCStreamer { virtual void emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol, MCSymbolAttr Linkage, MCSymbolAttr Visibility); + + /// Emit a XCOFF .rename directive which creates a synonym for an illegal or + /// undesirable name. + /// + /// \param Name - The name used internally in the assembly for references to + /// the symbol. + /// \param Rename - The value to which the Name parameter is + /// changed at the end of assembly. + virtual void emitXCOFFRenameDirective(const MCSymbol *Name, StringRef Rename); + /// Emit an ELF .size directive. /// /// This corresponds to an assembler statement such as: diff --git a/llvm/include/llvm/MC/MCSymbolXCOFF.h b/llvm/include/llvm/MC/MCSymbolXCOFF.h index e5bea10fd5d5..d0379ec08b7d 100644 --- a/llvm/include/llvm/MC/MCSymbolXCOFF.h +++ b/llvm/include/llvm/MC/MCSymbolXCOFF.h @@ -24,6 +24,16 @@ class MCSymbolXCOFF : public MCSymbol { static bool classof(const MCSymbol *S) { return S->isXCOFF(); } + static StringRef getUnqualifiedName(StringRef Name) { + if (Name.back() == ']') { + StringRef Lhs, Rhs; + std::tie(Lhs, Rhs) = Name.rsplit('['); + assert(!Rhs.empty() && "Invalid SMC format in XCOFF symbol."); + return Lhs; + } + return Name; + } + void setStorageClass(XCOFF::StorageClass SC) { assert((!StorageClass.hasValue() || StorageClass.getValue() == SC) && "Redefining StorageClass of XCOFF MCSymbol."); @@ -36,16 +46,7 @@ class MCSymbolXCOFF : public MCSymbol { return StorageClass.getValue(); } - StringRef getUnqualifiedName() const { - const StringRef name = getName(); - if (name.back() == ']') { - StringRef lhs, rhs; - 
std::tie(lhs, rhs) = name.rsplit('['); - assert(!rhs.empty() && "Invalid SMC format in XCOFF symbol."); - return lhs; - } - return name; - } + StringRef getUnqualifiedName() const { return getUnqualifiedName(getName()); } bool hasRepresentedCsectSet() const { return RepresentedCsect != nullptr; } @@ -57,10 +58,21 @@ class MCSymbolXCOFF : public MCSymbol { XCOFF::VisibilityType getVisibilityType() const { return VisibilityType; } + bool hasRename() const { return !SymbolTableName.empty(); } + + void setSymbolTableName(StringRef STN) { SymbolTableName = STN; } + + StringRef getSymbolTableName() const { + if (hasRename()) + return SymbolTableName; + return getUnqualifiedName(); + } + private: Optional StorageClass; MCSectionXCOFF *RepresentedCsect = nullptr; XCOFF::VisibilityType VisibilityType = XCOFF::SYM_V_UNSPECIFIED; + StringRef SymbolTableName; }; } // end namespace llvm diff --git a/llvm/include/llvm/MC/MCXCOFFStreamer.h b/llvm/include/llvm/MC/MCXCOFFStreamer.h index 416a55f2c8fc..5fc2efbe5284 100644 --- a/llvm/include/llvm/MC/MCXCOFFStreamer.h +++ b/llvm/include/llvm/MC/MCXCOFFStreamer.h @@ -32,6 +32,11 @@ class MCXCOFFStreamer : public MCObjectStreamer { void emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol, MCSymbolAttr Linkage, MCSymbolAttr Visibility) override; + void emitXCOFFRenameDirective(const MCSymbol *Name, + StringRef Rename) override { + report_fatal_error("emitXCOFFRenameDirective is not implemented yet on " + "object generation path"); + } }; } // end namespace llvm diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp index 98b047d74eba..27e9ffe9ea07 100644 --- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp @@ -682,8 +682,7 @@ void AsmPrinter::emitFunctionHeader() { if (!MAI->hasVisibilityOnlyWithLinkage()) emitVisibility(CurrentFnSym, F.getVisibility()); - if (MAI->needsFunctionDescriptors() && - F.getLinkage() != GlobalValue::InternalLinkage) + 
if (MAI->needsFunctionDescriptors()) emitLinkage(&F, CurrentFnDescSym); emitLinkage(&F, CurrentFnSym); diff --git a/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp b/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp index 8d670300933f..eef5c1463fde 100644 --- a/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp +++ b/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp @@ -2168,6 +2168,6 @@ MCSection *TargetLoweringObjectFileXCOFF::getSectionForFunctionDescriptor( MCSection *TargetLoweringObjectFileXCOFF::getSectionForTOCEntry( const MCSymbol *Sym) const { return getContext().getXCOFFSection( - cast(Sym)->getUnqualifiedName(), XCOFF::XMC_TC, + cast(Sym)->getSymbolTableName(), XCOFF::XMC_TC, XCOFF::XTY_SD, XCOFF::C_HIDEXT, SectionKind::getData()); } diff --git a/llvm/lib/MC/MCAsmInfoXCOFF.cpp b/llvm/lib/MC/MCAsmInfoXCOFF.cpp index 1531f61da95e..b5c5bb3ace8e 100644 --- a/llvm/lib/MC/MCAsmInfoXCOFF.cpp +++ b/llvm/lib/MC/MCAsmInfoXCOFF.cpp @@ -7,6 +7,7 @@ //===----------------------------------------------------------------------===// #include "llvm/MC/MCAsmInfoXCOFF.h" +#include "llvm/ADT/StringExtras.h" using namespace llvm; @@ -32,7 +33,6 @@ MCAsmInfoXCOFF::MCAsmInfoXCOFF() { COMMDirectiveAlignmentIsInBytes = false; LCOMMDirectiveAlignmentType = LCOMM::Log2Alignment; HasDotTypeDotSizeDirective = false; - SymbolsHaveSMC = true; UseIntegratedAssembler = false; NeedsFunctionDescriptors = true; } @@ -43,5 +43,8 @@ bool MCAsmInfoXCOFF::isAcceptableChar(char C) const { if (C == '[' || C == ']') return true; - return MCAsmInfo::isAcceptableChar(C); + // For AIX assembler, symbols may consist of numeric digits, + // underscores, periods, uppercase or lowercase letters, or + // any combination of these. 
+ return isAlnum(C) || C == '_' || C == '.'; } diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp index 9a86895a2fe1..0747ab2372ab 100644 --- a/llvm/lib/MC/MCAsmStreamer.cpp +++ b/llvm/lib/MC/MCAsmStreamer.cpp @@ -28,6 +28,7 @@ #include "llvm/MC/MCRegisterInfo.h" #include "llvm/MC/MCSectionMachO.h" #include "llvm/MC/MCStreamer.h" +#include "llvm/MC/MCSymbolXCOFF.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/Format.h" #include "llvm/Support/FormattedStream.h" @@ -174,6 +175,8 @@ class MCAsmStreamer final : public MCStreamer { void emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol, MCSymbolAttr Linakge, MCSymbolAttr Visibility) override; + void emitXCOFFRenameDirective(const MCSymbol *Name, + StringRef Rename) override; void emitELFSize(MCSymbol *Symbol, const MCExpr *Value) override; void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size, @@ -797,6 +800,11 @@ void MCAsmStreamer::emitXCOFFLocalCommonSymbol(MCSymbol *LabelSym, void MCAsmStreamer::emitXCOFFSymbolLinkageWithVisibility( MCSymbol *Symbol, MCSymbolAttr Linkage, MCSymbolAttr Visibility) { + // Print symbol's rename (original name contains invalid character(s)) if + // there is one. + if (cast(Symbol)->hasRename()) + emitXCOFFRenameDirective(Symbol, + cast(Symbol)->getSymbolTableName()); switch (Linkage) { case MCSA_Global: @@ -808,6 +816,9 @@ void MCAsmStreamer::emitXCOFFSymbolLinkageWithVisibility( case MCSA_Extern: OS << "\t.extern\t"; break; + case MCSA_LGlobal: + OS << "\t.lglobl\t"; + break; default: report_fatal_error("unhandled linkage type"); } @@ -830,6 +841,22 @@ void MCAsmStreamer::emitXCOFFSymbolLinkageWithVisibility( EmitEOL(); } +void MCAsmStreamer::emitXCOFFRenameDirective(const MCSymbol *Name, + StringRef Rename) { + OS << "\t.rename\t"; + Name->print(OS, MAI); + const char DQ = '"'; + OS << ',' << DQ; + for (char C : Rename) { + // To escape a double quote character, the character should be doubled. 
+ if (C == DQ) + OS << DQ; + OS << C; + } + OS << DQ; + EmitEOL(); +} + void MCAsmStreamer::emitELFSize(MCSymbol *Symbol, const MCExpr *Value) { assert(MAI->hasDotTypeDotSizeDirective()); OS << "\t.size\t"; @@ -841,6 +868,12 @@ void MCAsmStreamer::emitELFSize(MCSymbol *Symbol, const MCExpr *Value) { void MCAsmStreamer::emitCommonSymbol(MCSymbol *Symbol, uint64_t Size, unsigned ByteAlignment) { + // Print symbol's rename (original name contains invalid character(s)) if + // there is one. + MCSymbolXCOFF *XSym = dyn_cast(Symbol); + if (XSym && XSym->hasRename()) + emitXCOFFRenameDirective(XSym, XSym->getSymbolTableName()); + OS << "\t.comm\t"; Symbol->print(OS, MAI); OS << ',' << Size; diff --git a/llvm/lib/MC/MCContext.cpp b/llvm/lib/MC/MCContext.cpp index 92d99ec577b5..a0f9212f3b14 100644 --- a/llvm/lib/MC/MCContext.cpp +++ b/llvm/lib/MC/MCContext.cpp @@ -182,7 +182,7 @@ MCSymbol *MCContext::createSymbolImpl(const StringMapEntry *Name, case MCObjectFileInfo::IsWasm: return new (Name, *this) MCSymbolWasm(Name, IsTemporary); case MCObjectFileInfo::IsXCOFF: - return new (Name, *this) MCSymbolXCOFF(Name, IsTemporary); + return createXCOFFSymbolImpl(Name, IsTemporary); } } return new (Name, *this) MCSymbol(MCSymbol::SymbolKindUnset, Name, @@ -292,6 +292,61 @@ void MCContext::registerInlineAsmLabel(MCSymbol *Sym) { InlineAsmUsedLabelNames[Sym->getName()] = Sym; } +MCSymbolXCOFF * +MCContext::createXCOFFSymbolImpl(const StringMapEntry *Name, + bool IsTemporary) { + if (!Name) + return new (nullptr, *this) MCSymbolXCOFF(nullptr, IsTemporary); + + StringRef OriginalName = Name->first(); + if (OriginalName.startswith("._Renamed..") || + OriginalName.startswith("_Renamed..")) + reportError(SMLoc(), "invalid symbol name from source"); + + if (MAI->isValidUnquotedName(OriginalName)) + return new (Name, *this) MCSymbolXCOFF(Name, IsTemporary); + + // Now we have a name that contains invalid character(s) for XCOFF symbol. 
+ // Let's replace with something valid, but save the original name so that + // we could still use the original name in the symbol table. + SmallString<128> InvalidName(OriginalName); + + // If it's an entry point symbol, we will keep the '.' + // in front for the convention purpose. Otherwise, add "_Renamed.." + // as prefix to signal this is an renamed symbol. + const bool IsEntryPoint = !InvalidName.empty() && InvalidName[0] == '.'; + SmallString<128> ValidName = + StringRef(IsEntryPoint ? "._Renamed.." : "_Renamed.."); + + // Append the hex values of '_' and invalid characters with "_Renamed.."; + // at the same time replace invalid characters with '_'. + for (size_t I = 0; I < InvalidName.size(); ++I) { + if (!MAI->isAcceptableChar(InvalidName[I]) || InvalidName[I] == '_') { + raw_svector_ostream(ValidName).write_hex(InvalidName[I]); + InvalidName[I] = '_'; + } + } + + // Skip entry point symbol's '.' as we already have a '.' in front of + // "_Renamed". + if (IsEntryPoint) + ValidName.append(InvalidName.substr(1, InvalidName.size() - 1)); + else + ValidName.append(InvalidName); + + auto NameEntry = UsedNames.insert(std::make_pair(ValidName, true)); + assert((NameEntry.second || !NameEntry.first->second) && + "This name is used somewhere else."); + // Mark the name as used for a non-section symbol. + NameEntry.first->second = true; + // Have the MCSymbol object itself refer to the copy of the string + // that is embedded in the UsedNames entry. + MCSymbolXCOFF *XSym = new (&*NameEntry.first, *this) + MCSymbolXCOFF(&*NameEntry.first, IsTemporary); + XSym->setSymbolTableName(MCSymbolXCOFF::getUnqualifiedName(OriginalName)); + return XSym; +} + //===----------------------------------------------------------------------===// // Section Management //===----------------------------------------------------------------------===// @@ -610,15 +665,18 @@ MCSectionXCOFF *MCContext::getXCOFFSection(StringRef Section, // Otherwise, return a new section. 
StringRef CachedName = Entry.first.SectionName; - MCSymbol *QualName = getOrCreateSymbol( - CachedName + "[" + XCOFF::getMappingClassString(SMC) + "]"); + MCSymbolXCOFF *QualName = cast(getOrCreateSymbol( + CachedName + "[" + XCOFF::getMappingClassString(SMC) + "]")); MCSymbol *Begin = nullptr; if (BeginSymName) Begin = createTempSymbol(BeginSymName, false); - MCSectionXCOFF *Result = new (XCOFFAllocator.Allocate()) MCSectionXCOFF( - CachedName, SMC, Type, SC, Kind, cast(QualName), Begin); + // QualName->getUnqualifiedName() and CachedName are the same except when + // CachedName contains invalid character(s) such as '$' for an XCOFF symbol. + MCSectionXCOFF *Result = new (XCOFFAllocator.Allocate()) + MCSectionXCOFF(QualName->getUnqualifiedName(), SMC, Type, SC, Kind, + QualName, Begin, CachedName); Entry.second = Result; auto *F = new MCDataFragment(); diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp index 0893835caf26..6d3a933c96a3 100644 --- a/llvm/lib/MC/MCStreamer.cpp +++ b/llvm/lib/MC/MCStreamer.cpp @@ -1071,6 +1071,12 @@ void MCStreamer::emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol, "XCOFF targets"); } +void MCStreamer::emitXCOFFRenameDirective(const MCSymbol *Name, + StringRef Rename) { + llvm_unreachable("emitXCOFFRenameDirective is only supported on " + "XCOFF targets"); +} + void MCStreamer::emitELFSize(MCSymbol *Symbol, const MCExpr *Value) {} void MCStreamer::emitELFSymverDirective(StringRef AliasName, const MCSymbol *Aliasee) {} diff --git a/llvm/lib/MC/MCSymbolXCOFF.cpp b/llvm/lib/MC/MCSymbolXCOFF.cpp index 53aec88c8978..536153e5518b 100644 --- a/llvm/lib/MC/MCSymbolXCOFF.cpp +++ b/llvm/lib/MC/MCSymbolXCOFF.cpp @@ -17,6 +17,9 @@ MCSectionXCOFF *MCSymbolXCOFF::getRepresentedCsect() const { RepresentedCsect->getCSectType() == XCOFF::XTY_ER) && "Symbol does not represent a csect; MCSectionXCOFF that represents " "the symbol should not be (but is) set."); + 
assert(getSymbolTableName().equals(RepresentedCsect->getSymbolTableName()) && + "SymbolTableNames need to be the same for this symbol and its csect " + "representation."); return RepresentedCsect; } @@ -29,5 +32,8 @@ void MCSymbolXCOFF::setRepresentedCsect(MCSectionXCOFF *C) { C->getCSectType() == XCOFF::XTY_ER) && "Symbol does not represent a csect; can only set a MCSectionXCOFF " "representation for a csect."); + assert(getSymbolTableName().equals(C->getSymbolTableName()) && + "SymbolTableNames need to be the same for this symbol and its csect " + "representation."); RepresentedCsect = C; } diff --git a/llvm/lib/MC/XCOFFObjectWriter.cpp b/llvm/lib/MC/XCOFFObjectWriter.cpp index 70146315e120..0dabdc9777d6 100644 --- a/llvm/lib/MC/XCOFFObjectWriter.cpp +++ b/llvm/lib/MC/XCOFFObjectWriter.cpp @@ -68,7 +68,7 @@ struct Symbol { XCOFF::StorageClass getStorageClass() const { return MCSym->getStorageClass(); } - StringRef getName() const { return MCSym->getName(); } + StringRef getSymbolTableName() const { return MCSym->getSymbolTableName(); } Symbol(const MCSymbolXCOFF *MCSym) : MCSym(MCSym), SymbolTableIndex(-1) {} }; @@ -81,7 +81,7 @@ struct ControlSection { SmallVector Syms; SmallVector Relocations; - StringRef getName() const { return MCCsect->getName(); } + StringRef getSymbolTableName() const { return MCCsect->getSymbolTableName(); } ControlSection(const MCSectionXCOFF *MCSec) : MCCsect(MCSec), SymbolTableIndex(-1), Address(-1), Size(0) {} }; @@ -334,8 +334,8 @@ void XCOFFObjectWriter::executePostLayoutBinding(MCAssembler &Asm, // If the name does not fit in the storage provided in the symbol table // entry, add it to the string table. 
- if (nameShouldBeInStringTable(MCSec->getName())) - Strings.add(MCSec->getName()); + if (nameShouldBeInStringTable(MCSec->getSymbolTableName())) + Strings.add(MCSec->getSymbolTableName()); CsectGroup &Group = getCsectGroup(MCSec); Group.emplace_back(MCSec); @@ -354,8 +354,8 @@ void XCOFFObjectWriter::executePostLayoutBinding(MCAssembler &Asm, // Handle undefined symbol. UndefinedCsects.emplace_back(ContainingCsect); SectionMap[ContainingCsect] = &UndefinedCsects.back(); - if (nameShouldBeInStringTable(ContainingCsect->getName())) - Strings.add(ContainingCsect->getName()); + if (nameShouldBeInStringTable(ContainingCsect->getSymbolTableName())) + Strings.add(ContainingCsect->getSymbolTableName()); continue; } @@ -375,8 +375,8 @@ void XCOFFObjectWriter::executePostLayoutBinding(MCAssembler &Asm, // If the name does not fit in the storage provided in the symbol table // entry, add it to the string table. - if (nameShouldBeInStringTable(XSym->getName())) - Strings.add(XSym->getName()); + if (nameShouldBeInStringTable(XSym->getSymbolTableName())) + Strings.add(XSym->getSymbolTableName()); } Strings.finalize(); @@ -555,7 +555,7 @@ void XCOFFObjectWriter::writeSymbolTableEntryForCsectMemberLabel( const Symbol &SymbolRef, const ControlSection &CSectionRef, int16_t SectionIndex, uint64_t SymbolOffset) { // Name or Zeros and string table offset - writeSymbolName(SymbolRef.getName()); + writeSymbolName(SymbolRef.getSymbolTableName()); assert(SymbolOffset <= UINT32_MAX - CSectionRef.Address && "Symbol address overflows."); W.write(CSectionRef.Address + SymbolOffset); @@ -592,7 +592,7 @@ void XCOFFObjectWriter::writeSymbolTableEntryForControlSection( const ControlSection &CSectionRef, int16_t SectionIndex, XCOFF::StorageClass StorageClass) { // n_name, n_zeros, n_offset - writeSymbolName(CSectionRef.getName()); + writeSymbolName(CSectionRef.getSymbolTableName()); // n_value W.write(CSectionRef.Address); // n_scnum diff --git 
a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp index d492fd848c7e..3092d56da1c5 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp @@ -120,14 +120,17 @@ class PPCTargetAsmStreamer : public PPCTargetStreamer { : PPCTargetStreamer(S), OS(OS) {} void emitTCEntry(const MCSymbol &S) override { - const MCAsmInfo *MAI = Streamer.getContext().getAsmInfo(); - OS << "\t.tc "; - OS << (MAI->getSymbolsHaveSMC() - ? cast<MCSymbolXCOFF>(S).getUnqualifiedName() - : S.getName()); - OS << "[TC],"; - OS << S.getName(); - OS << '\n'; + if (const MCSymbolXCOFF *XSym = dyn_cast<MCSymbolXCOFF>(&S)) { + MCSymbolXCOFF *TCSym = + cast<MCSymbolXCOFF>(Streamer.getContext().getOrCreateSymbol( + XSym->getSymbolTableName() + "[TC]")); + if (TCSym->hasRename()) + Streamer.emitXCOFFRenameDirective(TCSym, TCSym->getSymbolTableName()); + OS << "\t.tc " << TCSym->getName() << "," << XSym->getName() << '\n'; + return; + } + + OS << "\t.tc " << S.getName() << "[TC]," << S.getName() << '\n'; } void emitMachine(StringRef CPU) override { diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp index b6c3086169e9..bf5fe741bac8 100644 --- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp +++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp @@ -1608,8 +1608,10 @@ void PPCAIXAsmPrinter::emitLinkage(const GlobalValue *GV, case GlobalValue::PrivateLinkage: return; case GlobalValue::InternalLinkage: - OutStreamer->emitSymbolAttribute(GVSym, MCSA_LGlobal); - return; + assert(GV->getVisibility() == GlobalValue::DefaultVisibility && + "InternalLinkage should not have other visibility setting."); + LinkageAttr = MCSA_LGlobal; + break; case GlobalValue::AppendingLinkage: llvm_unreachable("Should never emit this"); case GlobalValue::CommonLinkage: @@ -1621,8 +1623,7 @@ void PPCAIXAsmPrinter::emitLinkage(const GlobalValue *GV, MCSymbolAttr VisibilityAttr = MCSA_Invalid; switch
(GV->getVisibility()) { - // TODO: "exported" and "internal" Visibility needs to go here. - + // TODO: "exported" and "internal" Visibility needs to go here. case GlobalValue::DefaultVisibility: break; case GlobalValue::HiddenVisibility: diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp index a31b3fef2aba..40619519664f 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp @@ -5345,7 +5345,7 @@ static SDValue transformCallee(const SDValue &Callee, SelectionDAG &DAG, // MCSectionXCOFF to get the correct storage mapping class. // In this case, XCOFF::XMC_PR. MCSectionXCOFF *Sec = Context.getXCOFFSection( - S->getName(), XCOFF::XMC_PR, XCOFF::XTY_ER, SC, + S->getSymbolTableName(), XCOFF::XMC_PR, XCOFF::XTY_ER, SC, SectionKind::getMetadata()); S->setRepresentedCsect(Sec); } diff --git a/llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll b/llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll new file mode 100644 index 000000000000..ee3f9da68dad --- /dev/null +++ b/llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll @@ -0,0 +1,161 @@ +;; This file tests how llc handles symbols containing invalid characters on an +;; XCOFF platform. +;; Since symbol name resolution is the same between 32-bit and 64-bit, +;; tests for 64-bit mode are omitted. 
+ +; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff -mcpu=pwr4 \ +; RUN: -mattr=-altivec < %s | \ +; RUN: FileCheck --check-prefix=ASM %s + +; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff -mcpu=pwr4 \ +; RUN: -mattr=-altivec -filetype=obj -o %t.o < %s +; RUN: llvm-objdump -D -r --symbol-description %t.o | \ +; RUN: FileCheck --check-prefix=OBJ %s + +; This is f`o +@"f\60o" = global i32 10, align 4 + +; This is f"o" +@"f\22o\22" = common global i32 0, align 4 + +define internal i32 @f$o() { +entry: + %call = call i32 bitcast (i32 (...)* @"f\40o" to i32 ()*)() + ret i32 %call +} + +; This is f&o +define i32 @"f\26o"() { +entry: + %call = call i32 @f$o() + ret i32 %call +} + +; This is f&_o +define i32 (...)* @"f\26_o"() { +entry: + ret i32 (...)* @"f\40o" +} + +; This is f@o +declare i32 @"f\40o"(...) + +; ASM: .rename _Renamed..24f_o[DS],"f$o" # -- Begin function f$o +; ASM-NEXT: .lglobl _Renamed..24f_o[DS] +; ASM-NEXT: .rename ._Renamed..24f_o,".f$o" +; ASM-NEXT: .lglobl ._Renamed..24f_o +; ASM-NEXT: .align 4 +; ASM-NEXT: .csect _Renamed..24f_o[DS],2 +; ASM-NEXT: .vbyte 4, ._Renamed..24f_o # @"f$o" +; ASM-NEXT: .vbyte 4, TOC[TC0] +; ASM-NEXT: .vbyte 4, 0 +; ASM-NEXT: .csect .text[PR],2 +; ASM-NEXT: ._Renamed..24f_o: +; ASM: bl ._Renamed..40f_o +; ASM-NEXT: nop +; ASM: .rename _Renamed..26f_o[DS],"f&o" # -- Begin function f&o +; ASM-NEXT: .globl _Renamed..26f_o[DS] +; ASM-NEXT: .rename ._Renamed..26f_o,".f&o" +; ASM-NEXT: .globl ._Renamed..26f_o +; ASM-NEXT: .align 4 +; ASM-NEXT: .csect _Renamed..26f_o[DS],2 +; ASM-NEXT: .vbyte 4, ._Renamed..26f_o # @"f&o" +; ASM-NEXT: .vbyte 4, TOC[TC0] +; ASM-NEXT: .vbyte 4, 0 +; ASM-NEXT: .csect .text[PR],2 +; ASM-NEXT: ._Renamed..26f_o: +; ASM: bl ._Renamed..24f_o +; ASM: .rename _Renamed..265ff__o[DS],"f&_o" # -- Begin function f&_o +; ASM-NEXT: .globl _Renamed..265ff__o[DS] +; ASM-NEXT: .rename ._Renamed..265ff__o,".f&_o" +; ASM-NEXT: .globl ._Renamed..265ff__o +; ASM-NEXT: .align 4 +; 
ASM-NEXT: .csect _Renamed..265ff__o[DS],2 +; ASM-NEXT: .vbyte 4, ._Renamed..265ff__o # @"f&_o" +; ASM-NEXT: .vbyte 4, TOC[TC0] +; ASM-NEXT: .vbyte 4, 0 +; ASM-NEXT: .csect .text[PR],2 +; ASM-NEXT: ._Renamed..265ff__o: +; ASM: .csect .data[RW],2 +; ASM-NEXT: .rename _Renamed..60f_o,"f`o" +; ASM-NEXT: .globl _Renamed..60f_o +; ASM-NEXT: .align 2 +; ASM-NEXT: _Renamed..60f_o: +; ASM-NEXT: .vbyte 4, 10 # 0xa +; ASM-NEXT: .rename _Renamed..2222f_o_[RW],"f""o""" +; ASM-NEXT: .comm _Renamed..2222f_o_[RW],4,2 +; ASM-NEXT: .rename ._Renamed..40f_o,".f@o" +; ASM-NEXT: .extern ._Renamed..40f_o +; ASM-NEXT: .rename _Renamed..40f_o[DS],"f@o" +; ASM-NEXT: .extern _Renamed..40f_o[DS] +; ASM-NEXT: .toc +; ASM-NEXT: L..C0: +; ASM-NEXT: .rename _Renamed..40f_o[TC],"f@o" +; ASM-NEXT: .tc _Renamed..40f_o[TC],_Renamed..40f_o[DS] + +; OBJ: Disassembly of section .text: +; OBJ-EMPTY: +; OBJ-NEXT: 00000000 (idx: 6) .f$o: +; OBJ-NEXT: 0: 7c 08 02 a6 mflr 0 +; OBJ-NEXT: 4: 90 01 00 08 stw 0, 8(1) +; OBJ-NEXT: 8: 94 21 ff c0 stwu 1, -64(1) +; OBJ-NEXT: c: 4b ff ff f5 bl 0x0 +; OBJ-NEXT: 0000000c: R_RBR (idx: 0) .f@o[PR] +; OBJ-NEXT: 10: 60 00 00 00 nop +; OBJ-NEXT: 14: 38 21 00 40 addi 1, 1, 64 +; OBJ-NEXT: 18: 80 01 00 08 lwz 0, 8(1) +; OBJ-NEXT: 1c: 7c 08 03 a6 mtlr 0 +; OBJ-NEXT: 20: 4e 80 00 20 blr +; OBJ-NEXT: 24: 60 00 00 00 nop +; OBJ-NEXT: 28: 60 00 00 00 nop +; OBJ-NEXT: 2c: 60 00 00 00 nop +; OBJ-EMPTY: +; OBJ-NEXT: 00000030 (idx: 8) .f&o: +; OBJ-NEXT: 30: 7c 08 02 a6 mflr 0 +; OBJ-NEXT: 34: 90 01 00 08 stw 0, 8(1) +; OBJ-NEXT: 38: 94 21 ff c0 stwu 1, -64(1) +; OBJ-NEXT: 3c: 4b ff ff c5 bl 0x0 +; OBJ-NEXT: 40: 38 21 00 40 addi 1, 1, 64 +; OBJ-NEXT: 44: 80 01 00 08 lwz 0, 8(1) +; OBJ-NEXT: 48: 7c 08 03 a6 mtlr 0 +; OBJ-NEXT: 4c: 4e 80 00 20 blr +; OBJ-EMPTY: +; OBJ-NEXT: 00000050 (idx: 10) .f&_o: +; OBJ-NEXT: 50: 80 62 00 00 lwz 3, 0(2) +; OBJ-NEXT: 00000052: R_TOC (idx: 24) f@o[TC] +; OBJ-NEXT: 54: 4e 80 00 20 blr +; OBJ-EMPTY: +; OBJ-NEXT: Disassembly of section 
.data: +; OBJ-EMPTY: +; OBJ-NEXT: 00000058 (idx: 14) f`o: +; OBJ-NEXT: 58: 00 00 00 0a +; OBJ-EMPTY: +; OBJ-NEXT: 0000005c (idx: 16) f$o[DS]: +; OBJ-NEXT: 5c: 00 00 00 00 +; OBJ-NEXT: 0000005c: R_POS (idx: 6) .f$o +; OBJ-NEXT: 60: 00 00 00 80 +; OBJ-NEXT: 00000060: R_POS (idx: 22) TOC[TC0] +; OBJ-NEXT: 64: 00 00 00 00 +; OBJ-EMPTY: +; OBJ-NEXT: 00000068 (idx: 18) f&o[DS]: +; OBJ-NEXT: 68: 00 00 00 30 +; OBJ-NEXT: 00000068: R_POS (idx: 8) .f&o +; OBJ-NEXT: 6c: 00 00 00 80 +; OBJ-NEXT: 0000006c: R_POS (idx: 22) TOC[TC0] +; OBJ-NEXT: 70: 00 00 00 00 +; OBJ-EMPTY: +; OBJ-NEXT: 00000074 (idx: 20) f&_o[DS]: +; OBJ-NEXT: 74: 00 00 00 50 +; OBJ-NEXT: 00000074: R_POS (idx: 10) .f&_o +; OBJ-NEXT: 78: 00 00 00 80 +; OBJ-NEXT: 00000078: R_POS (idx: 22) TOC[TC0] +; OBJ-NEXT: 7c: 00 00 00 00 +; OBJ-EMPTY: +; OBJ-NEXT: 00000080 (idx: 24) f@o[TC]: +; OBJ-NEXT: 80: 00 00 00 00 +; OBJ-NEXT: 00000080: R_POS (idx: 2) f@o[DS] +; OBJ-EMPTY: +; OBJ-NEXT: Disassembly of section .bss: +; OBJ-EMPTY: +; OBJ-NEXT: 00000084 (idx: 26) f"o"[RW]: +; OBJ-NEXT: ... 
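The renamed symbols checked in the test above follow the scheme added to MCContext in this patch: prefix "_Renamed..", then the hex value of every invalid character and every '_', then the original name with those characters replaced by '_'. Below is a minimal standalone sketch of that scheme, not the LLVM code itself. The names renameForXCOFF and isAcceptableChar are hypothetical, and the accepted character set (alphanumerics, '_', '.', '[' and ']') is an assumption standing in for MCAsmInfo::isAcceptableChar on XCOFF.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical stand-in for MAI->isAcceptableChar(); the accepted set is an
// assumption, not taken from the patch.
static bool isAcceptableChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_' || C == '.' ||
         C == '[' || C == ']';
}

// Hypothetical helper mirroring the renaming loop in the MCContext change.
static std::string renameForXCOFF(std::string Name) {
  // Entry point symbols keep their leading '.' in front of the prefix.
  const bool IsEntryPoint = !Name.empty() && Name[0] == '.';
  std::string Valid = IsEntryPoint ? "._Renamed.." : "_Renamed..";

  // Append the hex value of each '_' and each invalid character to the
  // prefix, and replace the invalid character in the name with '_'.
  for (char &C : Name) {
    if (!isAcceptableChar(C) || C == '_') {
      char Hex[3];
      std::snprintf(Hex, sizeof(Hex), "%x",
                    static_cast<unsigned>(static_cast<unsigned char>(C)));
      Valid += Hex;
      C = '_';
    }
  }

  // Skip the entry point's '.', since the prefix already carries one.
  Valid += IsEntryPoint ? Name.substr(1) : Name;
  return Valid;
}
```

Under these assumptions, renameForXCOFF("f&_o") yields "_Renamed..265ff__o" and renameForXCOFF(".f$o") yields "._Renamed..24f_o", which agrees with the .rename directives in the ASM checks. Encoding '_' as well as the invalid characters keeps names such as f&o and f_o from colliding after renaming.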
diff --git a/llvm/test/CodeGen/PowerPC/test_func_desc.ll b/llvm/test/CodeGen/PowerPC/test_func_desc.ll index 500be4b63d2d..4c79bafeb035 100644 --- a/llvm/test/CodeGen/PowerPC/test_func_desc.ll +++ b/llvm/test/CodeGen/PowerPC/test_func_desc.ll @@ -56,6 +56,7 @@ entry: ; CHECK: bl .extern_foo ; CHECK: bl .static_foo +; CHECK: .lglobl static_foo[DS] ; CHECK: .lglobl .static_foo ; 32BIT: .csect static_foo[DS],2 ; 32BIT-NEXT: .vbyte 4, .static_foo From llvm-commits at lists.llvm.org Mon Jul 6 09:35:02 2020 From: llvm-commits at lists.llvm.org (Luís Marques via llvm-commits) Date: Mon, 06 Jul 2020 09:35:02 -0700 (PDT) Subject: [llvm] 61c2a0b - [RISCV] Fold ADDIs into load/stores with nonzero offsets Message-ID: <5f0352b6.1c69fb81.b1b17.3227@mx.google.com> Author: Luís Marques Date: 2020-07-06T17:32:57+01:00 New Revision: 61c2a0bb823677ce0e604b92e5dae65d9bd32b6e URL: https://github.com/llvm/llvm-project/commit/61c2a0bb823677ce0e604b92e5dae65d9bd32b6e DIFF: https://github.com/llvm/llvm-project/commit/61c2a0bb823677ce0e604b92e5dae65d9bd32b6e.diff LOG: [RISCV] Fold ADDIs into load/stores with nonzero offsets We can often fold an ADDI into the offset of load/store instructions: (load (addi base, off1), off2) -> (load base, off1+off2) (store val, (addi base, off1), off2) -> (store val, base, off1+off2) This is possible when off1+off2 continues to fit the 12-bit immediate. We remove the previous restriction where we would never fold the ADDIs if the load/stores had nonzero offsets. We now do the fold if the resulting constant still fits a 12-bit immediate, or if off1 is a variable's address and we know based on that variable's alignment that off1+off2 won't overflow. 
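The two fold conditions described above can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions, not the code in RISCVISelDAGToDAG.cpp: fitsSImm12, foldConstantOffsets and canFoldIntoSymbolOffset are hypothetical names, and alignments are modelled as plain byte counts.

```cpp
#include <cstdint>

// True if V fits a signed 12-bit immediate, i.e. lies in [-2048, 2047].
static bool fitsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Constant case: (load (addi base, Off1), Off2) -> (load base, Off1+Off2).
// Returns true and sets Combined only if the sum still fits 12 bits.
static bool foldConstantOffsets(int64_t Off1, int64_t Off2,
                                int64_t &Combined) {
  int64_t Sum = Off1 + Off2;
  if (!fitsSImm12(Sum))
    return false;
  Combined = Sum;
  return true;
}

// Symbol case: off1 is the low part of the address of a variable aligned to
// Alignment bytes. The fold is allowed when Off2 is zero or strictly smaller
// than the alignment, since the alignment gives that much margin before the
// low part can overflow.
static bool canFoldIntoSymbolOffset(uint64_t Alignment, uint64_t Off2) {
  return Off2 == 0 || Off2 < Alignment;
}
```

For example, canFoldIntoSymbolOffset(16, 12) is true and canFoldIntoSymbolOffset(16, 16) is false. That is consistent with the updated callee-saved test checks in this commit, where offsets 4, 8 and 12 into var are folded into %lo(var+N) while offsets 16 and above still go through the addi.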
Differential Revision: https://reviews.llvm.org/D79690 Added: Modified: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll llvm/test/CodeGen/RISCV/callee-saved-gprs.ll llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll llvm/test/CodeGen/RISCV/fp128.ll llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll llvm/test/CodeGen/RISCV/wide-mem.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp index 7a86d5e80bce..e7584e4f60ea 100644 --- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp @@ -14,6 +14,7 @@ #include "MCTargetDesc/RISCVMCTargetDesc.h" #include "Utils/RISCVMatInt.h" #include "llvm/CodeGen/MachineFrameInfo.h" +#include "llvm/Support/Alignment.h" #include "llvm/Support/Debug.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" @@ -160,8 +161,9 @@ bool RISCVDAGToDAGISel::SelectAddrFI(SDValue Addr, SDValue &Base) { } // Merge an ADDI into the offset of a load/store instruction where possible. -// (load (add base, off), 0) -> (load base, off) -// (store val, (add base, off)) -> (store val, base, off) +// (load (addi base, off1), off2) -> (load base, off1+off2) +// (store val, (addi base, off1), off2) -> (store val, base, off1+off2) +// This is possible when off1+off2 fits a 12-bit immediate. void RISCVDAGToDAGISel::doPeepholeLoadStoreADDI() { SelectionDAG::allnodes_iterator Position(CurDAG->getRoot().getNode()); ++Position; @@ -202,10 +204,7 @@ void RISCVDAGToDAGISel::doPeepholeLoadStoreADDI() { break; } - // Currently, the load/store offset must be 0 to be considered for this - // peephole optimisation. 
- if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx)) || - N->getConstantOperandVal(OffsetOpIdx) != 0) + if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx))) continue; SDValue Base = N->getOperand(BaseOpIdx); @@ -215,18 +214,39 @@ void RISCVDAGToDAGISel::doPeepholeLoadStoreADDI() { continue; SDValue ImmOperand = Base.getOperand(1); + uint64_t Offset2 = N->getConstantOperandVal(OffsetOpIdx); if (auto Const = dyn_cast<ConstantSDNode>(ImmOperand)) { - ImmOperand = CurDAG->getTargetConstant( - Const->getSExtValue(), SDLoc(ImmOperand), ImmOperand.getValueType()); + int64_t Offset1 = Const->getSExtValue(); + int64_t CombinedOffset = Offset1 + Offset2; + if (!isInt<12>(CombinedOffset)) + continue; + ImmOperand = CurDAG->getTargetConstant(CombinedOffset, SDLoc(ImmOperand), + ImmOperand.getValueType()); } else if (auto GA = dyn_cast<GlobalAddressSDNode>(ImmOperand)) { + // If the off1 in (addi base, off1) is a global variable's address (its + // low part, really), then we can rely on the alignment of that variable + // to provide a margin of safety before off1 can overflow the 12 bits. + // Check if off2 falls within that margin; if so off1+off2 can't overflow. + const DataLayout &DL = CurDAG->getDataLayout(); + Align Alignment = GA->getGlobal()->getPointerAlignment(DL); + if (Offset2 != 0 && Alignment <= Offset2) + continue; + int64_t Offset1 = GA->getOffset(); + int64_t CombinedOffset = Offset1 + Offset2; ImmOperand = CurDAG->getTargetGlobalAddress( GA->getGlobal(), SDLoc(ImmOperand), ImmOperand.getValueType(), - GA->getOffset(), GA->getTargetFlags()); + CombinedOffset, GA->getTargetFlags()); } else if (auto CP = dyn_cast<ConstantPoolSDNode>(ImmOperand)) { + // Ditto. 
+ Align Alignment = CP->getAlign(); + if (Offset2 != 0 && Alignment <= Offset2) + continue; + int64_t Offset1 = CP->getOffset(); + int64_t CombinedOffset = Offset1 + Offset2; ImmOperand = CurDAG->getTargetConstantPool( CP->getConstVal(), ImmOperand.getValueType(), CP->getAlign(), - CP->getOffset(), CP->getTargetFlags()); + CombinedOffset, CP->getTargetFlags()); } else { continue; } diff --git a/llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll b/llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll index 56d3ff04d163..2c5206d57c72 100644 --- a/llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll +++ b/llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll @@ -1,15 +1,16 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -mtriple=riscv32 -mattr=+f -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32-LP64 +; RUN: | FileCheck %s -check-prefix=ILP32 ; RUN: llc -mtriple=riscv64 -mattr=+f -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32-LP64 +; RUN: | FileCheck %s -check-prefix=LP64 ; RUN: llc -mtriple=riscv32 -mattr=+f -target-abi ilp32f -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32F-LP64F +; RUN: | FileCheck %s -check-prefix=ILP32F ; RUN: llc -mtriple=riscv64 -mattr=+f -target-abi lp64f -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32F-LP64F +; RUN: | FileCheck %s -check-prefix=LP64F ; RUN: llc -mtriple=riscv32 -mattr=+d -target-abi ilp32d -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32D-LP64D +; RUN: | FileCheck %s -check-prefix=ILP32D ; RUN: llc -mtriple=riscv64 -mattr=+d -target-abi lp64d -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32D-LP64D +; RUN: | FileCheck %s -check-prefix=LP64D @var = global [32 x float] zeroinitializer @@ -20,113 +21,529 @@ ; something appropriate. 
define void @callee() nounwind { -; ILP32-LP64-LABEL: callee: -; ILP32-LP64: # %bb.0: -; ILP32-LP64-NEXT: lui a0, %hi(var) -; ILP32-LP64-NEXT: flw ft0, %lo(var)(a0) -; ILP32-LP64-NEXT: addi a1, a0, %lo(var) -; ILP32-LP64-NEXT: flw ft1, 4(a1) -; ILP32-LP64-NEXT: flw ft2, 8(a1) -; ILP32-LP64-NEXT: flw ft3, 12(a1) -; ILP32-LP64-NEXT: flw ft4, 16(a1) -; ILP32-LP64-NEXT: flw ft5, 20(a1) -; ILP32-LP64-NEXT: flw ft6, 24(a1) -; ILP32-LP64-NEXT: flw ft7, 28(a1) -; ILP32-LP64-NEXT: flw fa0, 32(a1) -; ILP32-LP64-NEXT: flw fa1, 36(a1) -; ILP32-LP64-NEXT: flw fa2, 40(a1) -; ILP32-LP64-NEXT: flw fa3, 44(a1) -; ILP32-LP64-NEXT: flw fa4, 48(a1) -; ILP32-LP64-NEXT: flw fa5, 52(a1) -; ILP32-LP64-NEXT: flw fa6, 56(a1) -; ILP32-LP64-NEXT: flw fa7, 60(a1) -; ILP32-LP64-NEXT: flw ft8, 64(a1) -; ILP32-LP64-NEXT: flw ft9, 68(a1) -; ILP32-LP64-NEXT: flw ft10, 72(a1) -; ILP32-LP64-NEXT: flw ft11, 76(a1) -; ILP32-LP64-NEXT: flw fs0, 80(a1) -; ILP32-LP64-NEXT: flw fs1, 84(a1) -; ILP32-LP64-NEXT: flw fs2, 88(a1) -; ILP32-LP64-NEXT: flw fs3, 92(a1) -; ILP32-LP64-NEXT: flw fs4, 96(a1) -; ILP32-LP64-NEXT: flw fs5, 100(a1) -; ILP32-LP64-NEXT: flw fs6, 104(a1) -; ILP32-LP64-NEXT: flw fs7, 108(a1) -; ILP32-LP64-NEXT: flw fs8, 124(a1) -; ILP32-LP64-NEXT: flw fs9, 120(a1) -; ILP32-LP64-NEXT: flw fs10, 116(a1) -; ILP32-LP64-NEXT: flw fs11, 112(a1) -; ILP32-LP64-NEXT: fsw fs8, 124(a1) -; ILP32-LP64-NEXT: fsw fs9, 120(a1) -; ILP32-LP64-NEXT: fsw fs10, 116(a1) -; ILP32-LP64-NEXT: fsw fs11, 112(a1) -; ILP32-LP64-NEXT: fsw fs7, 108(a1) -; ILP32-LP64-NEXT: fsw fs6, 104(a1) -; ILP32-LP64-NEXT: fsw fs5, 100(a1) -; ILP32-LP64-NEXT: fsw fs4, 96(a1) -; ILP32-LP64-NEXT: fsw fs3, 92(a1) -; ILP32-LP64-NEXT: fsw fs2, 88(a1) -; ILP32-LP64-NEXT: fsw fs1, 84(a1) -; ILP32-LP64-NEXT: fsw fs0, 80(a1) -; ILP32-LP64-NEXT: fsw ft11, 76(a1) -; ILP32-LP64-NEXT: fsw ft10, 72(a1) -; ILP32-LP64-NEXT: fsw ft9, 68(a1) -; ILP32-LP64-NEXT: fsw ft8, 64(a1) -; ILP32-LP64-NEXT: fsw fa7, 60(a1) -; ILP32-LP64-NEXT: fsw fa6, 56(a1) -; 
ILP32-LP64-NEXT: fsw fa5, 52(a1) -; ILP32-LP64-NEXT: fsw fa4, 48(a1) -; ILP32-LP64-NEXT: fsw fa3, 44(a1) -; ILP32-LP64-NEXT: fsw fa2, 40(a1) -; ILP32-LP64-NEXT: fsw fa1, 36(a1) -; ILP32-LP64-NEXT: fsw fa0, 32(a1) -; ILP32-LP64-NEXT: fsw ft7, 28(a1) -; ILP32-LP64-NEXT: fsw ft6, 24(a1) -; ILP32-LP64-NEXT: fsw ft5, 20(a1) -; ILP32-LP64-NEXT: fsw ft4, 16(a1) -; ILP32-LP64-NEXT: fsw ft3, 12(a1) -; ILP32-LP64-NEXT: fsw ft2, 8(a1) -; ILP32-LP64-NEXT: fsw ft1, 4(a1) -; ILP32-LP64-NEXT: fsw ft0, %lo(var)(a0) -; ILP32-LP64-NEXT: ret +; ILP32-LABEL: callee: +; ILP32: # %bb.0: +; ILP32-NEXT: lui a0, %hi(var) +; ILP32-NEXT: flw ft0, %lo(var)(a0) +; ILP32-NEXT: flw ft1, %lo(var+4)(a0) +; ILP32-NEXT: flw ft2, %lo(var+8)(a0) +; ILP32-NEXT: flw ft3, %lo(var+12)(a0) +; ILP32-NEXT: addi a1, a0, %lo(var) +; ILP32-NEXT: flw ft4, 16(a1) +; ILP32-NEXT: flw ft5, 20(a1) +; ILP32-NEXT: flw ft6, 24(a1) +; ILP32-NEXT: flw ft7, 28(a1) +; ILP32-NEXT: flw fa0, 32(a1) +; ILP32-NEXT: flw fa1, 36(a1) +; ILP32-NEXT: flw fa2, 40(a1) +; ILP32-NEXT: flw fa3, 44(a1) +; ILP32-NEXT: flw fa4, 48(a1) +; ILP32-NEXT: flw fa5, 52(a1) +; ILP32-NEXT: flw fa6, 56(a1) +; ILP32-NEXT: flw fa7, 60(a1) +; ILP32-NEXT: flw ft8, 64(a1) +; ILP32-NEXT: flw ft9, 68(a1) +; ILP32-NEXT: flw ft10, 72(a1) +; ILP32-NEXT: flw ft11, 76(a1) +; ILP32-NEXT: flw fs0, 80(a1) +; ILP32-NEXT: flw fs1, 84(a1) +; ILP32-NEXT: flw fs2, 88(a1) +; ILP32-NEXT: flw fs3, 92(a1) +; ILP32-NEXT: flw fs4, 96(a1) +; ILP32-NEXT: flw fs5, 100(a1) +; ILP32-NEXT: flw fs6, 104(a1) +; ILP32-NEXT: flw fs7, 108(a1) +; ILP32-NEXT: flw fs8, 124(a1) +; ILP32-NEXT: flw fs9, 120(a1) +; ILP32-NEXT: flw fs10, 116(a1) +; ILP32-NEXT: flw fs11, 112(a1) +; ILP32-NEXT: fsw fs8, 124(a1) +; ILP32-NEXT: fsw fs9, 120(a1) +; ILP32-NEXT: fsw fs10, 116(a1) +; ILP32-NEXT: fsw fs11, 112(a1) +; ILP32-NEXT: fsw fs7, 108(a1) +; ILP32-NEXT: fsw fs6, 104(a1) +; ILP32-NEXT: fsw fs5, 100(a1) +; ILP32-NEXT: fsw fs4, 96(a1) +; ILP32-NEXT: fsw fs3, 92(a1) +; ILP32-NEXT: fsw fs2, 88(a1) +; 
ILP32-NEXT: fsw fs1, 84(a1) +; ILP32-NEXT: fsw fs0, 80(a1) +; ILP32-NEXT: fsw ft11, 76(a1) +; ILP32-NEXT: fsw ft10, 72(a1) +; ILP32-NEXT: fsw ft9, 68(a1) +; ILP32-NEXT: fsw ft8, 64(a1) +; ILP32-NEXT: fsw fa7, 60(a1) +; ILP32-NEXT: fsw fa6, 56(a1) +; ILP32-NEXT: fsw fa5, 52(a1) +; ILP32-NEXT: fsw fa4, 48(a1) +; ILP32-NEXT: fsw fa3, 44(a1) +; ILP32-NEXT: fsw fa2, 40(a1) +; ILP32-NEXT: fsw fa1, 36(a1) +; ILP32-NEXT: fsw fa0, 32(a1) +; ILP32-NEXT: fsw ft7, 28(a1) +; ILP32-NEXT: fsw ft6, 24(a1) +; ILP32-NEXT: fsw ft5, 20(a1) +; ILP32-NEXT: fsw ft4, 16(a1) +; ILP32-NEXT: fsw ft3, %lo(var+12)(a0) +; ILP32-NEXT: fsw ft2, %lo(var+8)(a0) +; ILP32-NEXT: fsw ft1, %lo(var+4)(a0) +; ILP32-NEXT: fsw ft0, %lo(var)(a0) +; ILP32-NEXT: ret ; -; ILP32F-LP64F-LABEL: callee: -; ILP32F-LP64F: # %bb.0: -; ILP32F-LP64F-NEXT: addi sp, sp, -48 -; ILP32F-LP64F-NEXT: fsw fs0, 44(sp) -; ILP32F-LP64F-NEXT: fsw fs1, 40(sp) -; ILP32F-LP64F-NEXT: fsw fs2, 36(sp) -; ILP32F-LP64F-NEXT: fsw fs3, 32(sp) -; ILP32F-LP64F-NEXT: fsw fs4, 28(sp) -; ILP32F-LP64F-NEXT: fsw fs5, 24(sp) -; ILP32F-LP64F-NEXT: fsw fs6, 20(sp) -; ILP32F-LP64F-NEXT: fsw fs7, 16(sp) -; ILP32F-LP64F-NEXT: fsw fs8, 12(sp) -; ILP32F-LP64F-NEXT: fsw fs9, 8(sp) -; ILP32F-LP64F-NEXT: fsw fs10, 4(sp) -; ILP32F-LP64F-NEXT: fsw fs11, 0(sp) -; ILP32F-LP64F-NEXT: lui a0, %hi(var) -; ILP32F-LP64F-NEXT: flw ft0, %lo(var)(a0) -; ILP32F-LP64F-NEXT: addi a1, a0, %lo(var) +; LP64-LABEL: callee: +; LP64: # %bb.0: +; LP64-NEXT: lui a0, %hi(var) +; LP64-NEXT: flw ft0, %lo(var)(a0) +; LP64-NEXT: flw ft1, %lo(var+4)(a0) +; LP64-NEXT: flw ft2, %lo(var+8)(a0) +; LP64-NEXT: flw ft3, %lo(var+12)(a0) +; LP64-NEXT: addi a1, a0, %lo(var) +; LP64-NEXT: flw ft4, 16(a1) +; LP64-NEXT: flw ft5, 20(a1) +; LP64-NEXT: flw ft6, 24(a1) +; LP64-NEXT: flw ft7, 28(a1) +; LP64-NEXT: flw fa0, 32(a1) +; LP64-NEXT: flw fa1, 36(a1) +; LP64-NEXT: flw fa2, 40(a1) +; LP64-NEXT: flw fa3, 44(a1) +; LP64-NEXT: flw fa4, 48(a1) +; LP64-NEXT: flw fa5, 52(a1) +; LP64-NEXT: flw fa6, 56(a1) 
+; LP64-NEXT: flw fa7, 60(a1) +; LP64-NEXT: flw ft8, 64(a1) +; LP64-NEXT: flw ft9, 68(a1) +; LP64-NEXT: flw ft10, 72(a1) +; LP64-NEXT: flw ft11, 76(a1) +; LP64-NEXT: flw fs0, 80(a1) +; LP64-NEXT: flw fs1, 84(a1) +; LP64-NEXT: flw fs2, 88(a1) +; LP64-NEXT: flw fs3, 92(a1) +; LP64-NEXT: flw fs4, 96(a1) +; LP64-NEXT: flw fs5, 100(a1) +; LP64-NEXT: flw fs6, 104(a1) +; LP64-NEXT: flw fs7, 108(a1) +; LP64-NEXT: flw fs8, 124(a1) +; LP64-NEXT: flw fs9, 120(a1) +; LP64-NEXT: flw fs10, 116(a1) +; LP64-NEXT: flw fs11, 112(a1) +; LP64-NEXT: fsw fs8, 124(a1) +; LP64-NEXT: fsw fs9, 120(a1) +; LP64-NEXT: fsw fs10, 116(a1) +; LP64-NEXT: fsw fs11, 112(a1) +; LP64-NEXT: fsw fs7, 108(a1) +; LP64-NEXT: fsw fs6, 104(a1) +; LP64-NEXT: fsw fs5, 100(a1) +; LP64-NEXT: fsw fs4, 96(a1) +; LP64-NEXT: fsw fs3, 92(a1) +; LP64-NEXT: fsw fs2, 88(a1) +; LP64-NEXT: fsw fs1, 84(a1) +; LP64-NEXT: fsw fs0, 80(a1) +; LP64-NEXT: fsw ft11, 76(a1) +; LP64-NEXT: fsw ft10, 72(a1) +; LP64-NEXT: fsw ft9, 68(a1) +; LP64-NEXT: fsw ft8, 64(a1) +; LP64-NEXT: fsw fa7, 60(a1) +; LP64-NEXT: fsw fa6, 56(a1) +; LP64-NEXT: fsw fa5, 52(a1) +; LP64-NEXT: fsw fa4, 48(a1) +; LP64-NEXT: fsw fa3, 44(a1) +; LP64-NEXT: fsw fa2, 40(a1) +; LP64-NEXT: fsw fa1, 36(a1) +; LP64-NEXT: fsw fa0, 32(a1) +; LP64-NEXT: fsw ft7, 28(a1) +; LP64-NEXT: fsw ft6, 24(a1) +; LP64-NEXT: fsw ft5, 20(a1) +; LP64-NEXT: fsw ft4, 16(a1) +; LP64-NEXT: fsw ft3, %lo(var+12)(a0) +; LP64-NEXT: fsw ft2, %lo(var+8)(a0) +; LP64-NEXT: fsw ft1, %lo(var+4)(a0) +; LP64-NEXT: fsw ft0, %lo(var)(a0) +; LP64-NEXT: ret ; -; ILP32D-LP64D-LABEL: callee: -; ILP32D-LP64D: # %bb.0: -; ILP32D-LP64D-NEXT: addi sp, sp, -96 -; ILP32D-LP64D-NEXT: fsd fs0, 88(sp) -; ILP32D-LP64D-NEXT: fsd fs1, 80(sp) -; ILP32D-LP64D-NEXT: fsd fs2, 72(sp) -; ILP32D-LP64D-NEXT: fsd fs3, 64(sp) -; ILP32D-LP64D-NEXT: fsd fs4, 56(sp) -; ILP32D-LP64D-NEXT: fsd fs5, 48(sp) -; ILP32D-LP64D-NEXT: fsd fs6, 40(sp) -; ILP32D-LP64D-NEXT: fsd fs7, 32(sp) -; ILP32D-LP64D-NEXT: fsd fs8, 24(sp) -; 
ILP32D-LP64D-NEXT: fsd fs9, 16(sp) -; ILP32D-LP64D-NEXT: fsd fs10, 8(sp) -; ILP32D-LP64D-NEXT: fsd fs11, 0(sp) -; ILP32D-LP64D-NEXT: lui a0, %hi(var) -; ILP32D-LP64D-NEXT: flw ft0, %lo(var)(a0) -; ILP32D-LP64D-NEXT: addi a1, a0, %lo(var) +; ILP32F-LABEL: callee: +; ILP32F: # %bb.0: +; ILP32F-NEXT: addi sp, sp, -48 +; ILP32F-NEXT: fsw fs0, 44(sp) +; ILP32F-NEXT: fsw fs1, 40(sp) +; ILP32F-NEXT: fsw fs2, 36(sp) +; ILP32F-NEXT: fsw fs3, 32(sp) +; ILP32F-NEXT: fsw fs4, 28(sp) +; ILP32F-NEXT: fsw fs5, 24(sp) +; ILP32F-NEXT: fsw fs6, 20(sp) +; ILP32F-NEXT: fsw fs7, 16(sp) +; ILP32F-NEXT: fsw fs8, 12(sp) +; ILP32F-NEXT: fsw fs9, 8(sp) +; ILP32F-NEXT: fsw fs10, 4(sp) +; ILP32F-NEXT: fsw fs11, 0(sp) +; ILP32F-NEXT: lui a0, %hi(var) +; ILP32F-NEXT: flw ft0, %lo(var)(a0) +; ILP32F-NEXT: flw ft1, %lo(var+4)(a0) +; ILP32F-NEXT: flw ft2, %lo(var+8)(a0) +; ILP32F-NEXT: flw ft3, %lo(var+12)(a0) +; ILP32F-NEXT: addi a1, a0, %lo(var) +; ILP32F-NEXT: flw ft4, 16(a1) +; ILP32F-NEXT: flw ft5, 20(a1) +; ILP32F-NEXT: flw ft6, 24(a1) +; ILP32F-NEXT: flw ft7, 28(a1) +; ILP32F-NEXT: flw fa0, 32(a1) +; ILP32F-NEXT: flw fa1, 36(a1) +; ILP32F-NEXT: flw fa2, 40(a1) +; ILP32F-NEXT: flw fa3, 44(a1) +; ILP32F-NEXT: flw fa4, 48(a1) +; ILP32F-NEXT: flw fa5, 52(a1) +; ILP32F-NEXT: flw fa6, 56(a1) +; ILP32F-NEXT: flw fa7, 60(a1) +; ILP32F-NEXT: flw ft8, 64(a1) +; ILP32F-NEXT: flw ft9, 68(a1) +; ILP32F-NEXT: flw ft10, 72(a1) +; ILP32F-NEXT: flw ft11, 76(a1) +; ILP32F-NEXT: flw fs0, 80(a1) +; ILP32F-NEXT: flw fs1, 84(a1) +; ILP32F-NEXT: flw fs2, 88(a1) +; ILP32F-NEXT: flw fs3, 92(a1) +; ILP32F-NEXT: flw fs4, 96(a1) +; ILP32F-NEXT: flw fs5, 100(a1) +; ILP32F-NEXT: flw fs6, 104(a1) +; ILP32F-NEXT: flw fs7, 108(a1) +; ILP32F-NEXT: flw fs8, 124(a1) +; ILP32F-NEXT: flw fs9, 120(a1) +; ILP32F-NEXT: flw fs10, 116(a1) +; ILP32F-NEXT: flw fs11, 112(a1) +; ILP32F-NEXT: fsw fs8, 124(a1) +; ILP32F-NEXT: fsw fs9, 120(a1) +; ILP32F-NEXT: fsw fs10, 116(a1) +; ILP32F-NEXT: fsw fs11, 112(a1) +; ILP32F-NEXT: fsw fs7, 
108(a1) +; ILP32F-NEXT: fsw fs6, 104(a1) +; ILP32F-NEXT: fsw fs5, 100(a1) +; ILP32F-NEXT: fsw fs4, 96(a1) +; ILP32F-NEXT: fsw fs3, 92(a1) +; ILP32F-NEXT: fsw fs2, 88(a1) +; ILP32F-NEXT: fsw fs1, 84(a1) +; ILP32F-NEXT: fsw fs0, 80(a1) +; ILP32F-NEXT: fsw ft11, 76(a1) +; ILP32F-NEXT: fsw ft10, 72(a1) +; ILP32F-NEXT: fsw ft9, 68(a1) +; ILP32F-NEXT: fsw ft8, 64(a1) +; ILP32F-NEXT: fsw fa7, 60(a1) +; ILP32F-NEXT: fsw fa6, 56(a1) +; ILP32F-NEXT: fsw fa5, 52(a1) +; ILP32F-NEXT: fsw fa4, 48(a1) +; ILP32F-NEXT: fsw fa3, 44(a1) +; ILP32F-NEXT: fsw fa2, 40(a1) +; ILP32F-NEXT: fsw fa1, 36(a1) +; ILP32F-NEXT: fsw fa0, 32(a1) +; ILP32F-NEXT: fsw ft7, 28(a1) +; ILP32F-NEXT: fsw ft6, 24(a1) +; ILP32F-NEXT: fsw ft5, 20(a1) +; ILP32F-NEXT: fsw ft4, 16(a1) +; ILP32F-NEXT: fsw ft3, %lo(var+12)(a0) +; ILP32F-NEXT: fsw ft2, %lo(var+8)(a0) +; ILP32F-NEXT: fsw ft1, %lo(var+4)(a0) +; ILP32F-NEXT: fsw ft0, %lo(var)(a0) +; ILP32F-NEXT: flw fs11, 0(sp) +; ILP32F-NEXT: flw fs10, 4(sp) +; ILP32F-NEXT: flw fs9, 8(sp) +; ILP32F-NEXT: flw fs8, 12(sp) +; ILP32F-NEXT: flw fs7, 16(sp) +; ILP32F-NEXT: flw fs6, 20(sp) +; ILP32F-NEXT: flw fs5, 24(sp) +; ILP32F-NEXT: flw fs4, 28(sp) +; ILP32F-NEXT: flw fs3, 32(sp) +; ILP32F-NEXT: flw fs2, 36(sp) +; ILP32F-NEXT: flw fs1, 40(sp) +; ILP32F-NEXT: flw fs0, 44(sp) +; ILP32F-NEXT: addi sp, sp, 48 +; ILP32F-NEXT: ret +; +; LP64F-LABEL: callee: +; LP64F: # %bb.0: +; LP64F-NEXT: addi sp, sp, -48 +; LP64F-NEXT: fsw fs0, 44(sp) +; LP64F-NEXT: fsw fs1, 40(sp) +; LP64F-NEXT: fsw fs2, 36(sp) +; LP64F-NEXT: fsw fs3, 32(sp) +; LP64F-NEXT: fsw fs4, 28(sp) +; LP64F-NEXT: fsw fs5, 24(sp) +; LP64F-NEXT: fsw fs6, 20(sp) +; LP64F-NEXT: fsw fs7, 16(sp) +; LP64F-NEXT: fsw fs8, 12(sp) +; LP64F-NEXT: fsw fs9, 8(sp) +; LP64F-NEXT: fsw fs10, 4(sp) +; LP64F-NEXT: fsw fs11, 0(sp) +; LP64F-NEXT: lui a0, %hi(var) +; LP64F-NEXT: flw ft0, %lo(var)(a0) +; LP64F-NEXT: flw ft1, %lo(var+4)(a0) +; LP64F-NEXT: flw ft2, %lo(var+8)(a0) +; LP64F-NEXT: flw ft3, %lo(var+12)(a0) +; LP64F-NEXT: addi 
a1, a0, %lo(var) +; LP64F-NEXT: flw ft4, 16(a1) +; LP64F-NEXT: flw ft5, 20(a1) +; LP64F-NEXT: flw ft6, 24(a1) +; LP64F-NEXT: flw ft7, 28(a1) +; LP64F-NEXT: flw fa0, 32(a1) +; LP64F-NEXT: flw fa1, 36(a1) +; LP64F-NEXT: flw fa2, 40(a1) +; LP64F-NEXT: flw fa3, 44(a1) +; LP64F-NEXT: flw fa4, 48(a1) +; LP64F-NEXT: flw fa5, 52(a1) +; LP64F-NEXT: flw fa6, 56(a1) +; LP64F-NEXT: flw fa7, 60(a1) +; LP64F-NEXT: flw ft8, 64(a1) +; LP64F-NEXT: flw ft9, 68(a1) +; LP64F-NEXT: flw ft10, 72(a1) +; LP64F-NEXT: flw ft11, 76(a1) +; LP64F-NEXT: flw fs0, 80(a1) +; LP64F-NEXT: flw fs1, 84(a1) +; LP64F-NEXT: flw fs2, 88(a1) +; LP64F-NEXT: flw fs3, 92(a1) +; LP64F-NEXT: flw fs4, 96(a1) +; LP64F-NEXT: flw fs5, 100(a1) +; LP64F-NEXT: flw fs6, 104(a1) +; LP64F-NEXT: flw fs7, 108(a1) +; LP64F-NEXT: flw fs8, 124(a1) +; LP64F-NEXT: flw fs9, 120(a1) +; LP64F-NEXT: flw fs10, 116(a1) +; LP64F-NEXT: flw fs11, 112(a1) +; LP64F-NEXT: fsw fs8, 124(a1) +; LP64F-NEXT: fsw fs9, 120(a1) +; LP64F-NEXT: fsw fs10, 116(a1) +; LP64F-NEXT: fsw fs11, 112(a1) +; LP64F-NEXT: fsw fs7, 108(a1) +; LP64F-NEXT: fsw fs6, 104(a1) +; LP64F-NEXT: fsw fs5, 100(a1) +; LP64F-NEXT: fsw fs4, 96(a1) +; LP64F-NEXT: fsw fs3, 92(a1) +; LP64F-NEXT: fsw fs2, 88(a1) +; LP64F-NEXT: fsw fs1, 84(a1) +; LP64F-NEXT: fsw fs0, 80(a1) +; LP64F-NEXT: fsw ft11, 76(a1) +; LP64F-NEXT: fsw ft10, 72(a1) +; LP64F-NEXT: fsw ft9, 68(a1) +; LP64F-NEXT: fsw ft8, 64(a1) +; LP64F-NEXT: fsw fa7, 60(a1) +; LP64F-NEXT: fsw fa6, 56(a1) +; LP64F-NEXT: fsw fa5, 52(a1) +; LP64F-NEXT: fsw fa4, 48(a1) +; LP64F-NEXT: fsw fa3, 44(a1) +; LP64F-NEXT: fsw fa2, 40(a1) +; LP64F-NEXT: fsw fa1, 36(a1) +; LP64F-NEXT: fsw fa0, 32(a1) +; LP64F-NEXT: fsw ft7, 28(a1) +; LP64F-NEXT: fsw ft6, 24(a1) +; LP64F-NEXT: fsw ft5, 20(a1) +; LP64F-NEXT: fsw ft4, 16(a1) +; LP64F-NEXT: fsw ft3, %lo(var+12)(a0) +; LP64F-NEXT: fsw ft2, %lo(var+8)(a0) +; LP64F-NEXT: fsw ft1, %lo(var+4)(a0) +; LP64F-NEXT: fsw ft0, %lo(var)(a0) +; LP64F-NEXT: flw fs11, 0(sp) +; LP64F-NEXT: flw fs10, 4(sp) +; 
LP64F-NEXT: flw fs9, 8(sp) +; LP64F-NEXT: flw fs8, 12(sp) +; LP64F-NEXT: flw fs7, 16(sp) +; LP64F-NEXT: flw fs6, 20(sp) +; LP64F-NEXT: flw fs5, 24(sp) +; LP64F-NEXT: flw fs4, 28(sp) +; LP64F-NEXT: flw fs3, 32(sp) +; LP64F-NEXT: flw fs2, 36(sp) +; LP64F-NEXT: flw fs1, 40(sp) +; LP64F-NEXT: flw fs0, 44(sp) +; LP64F-NEXT: addi sp, sp, 48 +; LP64F-NEXT: ret +; +; ILP32D-LABEL: callee: +; ILP32D: # %bb.0: +; ILP32D-NEXT: addi sp, sp, -96 +; ILP32D-NEXT: fsd fs0, 88(sp) +; ILP32D-NEXT: fsd fs1, 80(sp) +; ILP32D-NEXT: fsd fs2, 72(sp) +; ILP32D-NEXT: fsd fs3, 64(sp) +; ILP32D-NEXT: fsd fs4, 56(sp) +; ILP32D-NEXT: fsd fs5, 48(sp) +; ILP32D-NEXT: fsd fs6, 40(sp) +; ILP32D-NEXT: fsd fs7, 32(sp) +; ILP32D-NEXT: fsd fs8, 24(sp) +; ILP32D-NEXT: fsd fs9, 16(sp) +; ILP32D-NEXT: fsd fs10, 8(sp) +; ILP32D-NEXT: fsd fs11, 0(sp) +; ILP32D-NEXT: lui a0, %hi(var) +; ILP32D-NEXT: flw ft0, %lo(var)(a0) +; ILP32D-NEXT: flw ft1, %lo(var+4)(a0) +; ILP32D-NEXT: flw ft2, %lo(var+8)(a0) +; ILP32D-NEXT: flw ft3, %lo(var+12)(a0) +; ILP32D-NEXT: addi a1, a0, %lo(var) +; ILP32D-NEXT: flw ft4, 16(a1) +; ILP32D-NEXT: flw ft5, 20(a1) +; ILP32D-NEXT: flw ft6, 24(a1) +; ILP32D-NEXT: flw ft7, 28(a1) +; ILP32D-NEXT: flw fa0, 32(a1) +; ILP32D-NEXT: flw fa1, 36(a1) +; ILP32D-NEXT: flw fa2, 40(a1) +; ILP32D-NEXT: flw fa3, 44(a1) +; ILP32D-NEXT: flw fa4, 48(a1) +; ILP32D-NEXT: flw fa5, 52(a1) +; ILP32D-NEXT: flw fa6, 56(a1) +; ILP32D-NEXT: flw fa7, 60(a1) +; ILP32D-NEXT: flw ft8, 64(a1) +; ILP32D-NEXT: flw ft9, 68(a1) +; ILP32D-NEXT: flw ft10, 72(a1) +; ILP32D-NEXT: flw ft11, 76(a1) +; ILP32D-NEXT: flw fs0, 80(a1) +; ILP32D-NEXT: flw fs1, 84(a1) +; ILP32D-NEXT: flw fs2, 88(a1) +; ILP32D-NEXT: flw fs3, 92(a1) +; ILP32D-NEXT: flw fs4, 96(a1) +; ILP32D-NEXT: flw fs5, 100(a1) +; ILP32D-NEXT: flw fs6, 104(a1) +; ILP32D-NEXT: flw fs7, 108(a1) +; ILP32D-NEXT: flw fs8, 124(a1) +; ILP32D-NEXT: flw fs9, 120(a1) +; ILP32D-NEXT: flw fs10, 116(a1) +; ILP32D-NEXT: flw fs11, 112(a1) +; ILP32D-NEXT: fsw fs8, 124(a1) +; 
ILP32D-NEXT: fsw fs9, 120(a1) +; ILP32D-NEXT: fsw fs10, 116(a1) +; ILP32D-NEXT: fsw fs11, 112(a1) +; ILP32D-NEXT: fsw fs7, 108(a1) +; ILP32D-NEXT: fsw fs6, 104(a1) +; ILP32D-NEXT: fsw fs5, 100(a1) +; ILP32D-NEXT: fsw fs4, 96(a1) +; ILP32D-NEXT: fsw fs3, 92(a1) +; ILP32D-NEXT: fsw fs2, 88(a1) +; ILP32D-NEXT: fsw fs1, 84(a1) +; ILP32D-NEXT: fsw fs0, 80(a1) +; ILP32D-NEXT: fsw ft11, 76(a1) +; ILP32D-NEXT: fsw ft10, 72(a1) +; ILP32D-NEXT: fsw ft9, 68(a1) +; ILP32D-NEXT: fsw ft8, 64(a1) +; ILP32D-NEXT: fsw fa7, 60(a1) +; ILP32D-NEXT: fsw fa6, 56(a1) +; ILP32D-NEXT: fsw fa5, 52(a1) +; ILP32D-NEXT: fsw fa4, 48(a1) +; ILP32D-NEXT: fsw fa3, 44(a1) +; ILP32D-NEXT: fsw fa2, 40(a1) +; ILP32D-NEXT: fsw fa1, 36(a1) +; ILP32D-NEXT: fsw fa0, 32(a1) +; ILP32D-NEXT: fsw ft7, 28(a1) +; ILP32D-NEXT: fsw ft6, 24(a1) +; ILP32D-NEXT: fsw ft5, 20(a1) +; ILP32D-NEXT: fsw ft4, 16(a1) +; ILP32D-NEXT: fsw ft3, %lo(var+12)(a0) +; ILP32D-NEXT: fsw ft2, %lo(var+8)(a0) +; ILP32D-NEXT: fsw ft1, %lo(var+4)(a0) +; ILP32D-NEXT: fsw ft0, %lo(var)(a0) +; ILP32D-NEXT: fld fs11, 0(sp) +; ILP32D-NEXT: fld fs10, 8(sp) +; ILP32D-NEXT: fld fs9, 16(sp) +; ILP32D-NEXT: fld fs8, 24(sp) +; ILP32D-NEXT: fld fs7, 32(sp) +; ILP32D-NEXT: fld fs6, 40(sp) +; ILP32D-NEXT: fld fs5, 48(sp) +; ILP32D-NEXT: fld fs4, 56(sp) +; ILP32D-NEXT: fld fs3, 64(sp) +; ILP32D-NEXT: fld fs2, 72(sp) +; ILP32D-NEXT: fld fs1, 80(sp) +; ILP32D-NEXT: fld fs0, 88(sp) +; ILP32D-NEXT: addi sp, sp, 96 +; ILP32D-NEXT: ret +; +; LP64D-LABEL: callee: +; LP64D: # %bb.0: +; LP64D-NEXT: addi sp, sp, -96 +; LP64D-NEXT: fsd fs0, 88(sp) +; LP64D-NEXT: fsd fs1, 80(sp) +; LP64D-NEXT: fsd fs2, 72(sp) +; LP64D-NEXT: fsd fs3, 64(sp) +; LP64D-NEXT: fsd fs4, 56(sp) +; LP64D-NEXT: fsd fs5, 48(sp) +; LP64D-NEXT: fsd fs6, 40(sp) +; LP64D-NEXT: fsd fs7, 32(sp) +; LP64D-NEXT: fsd fs8, 24(sp) +; LP64D-NEXT: fsd fs9, 16(sp) +; LP64D-NEXT: fsd fs10, 8(sp) +; LP64D-NEXT: fsd fs11, 0(sp) +; LP64D-NEXT: lui a0, %hi(var) +; LP64D-NEXT: flw ft0, %lo(var)(a0) +; LP64D-NEXT: 
flw ft1, %lo(var+4)(a0) +; LP64D-NEXT: flw ft2, %lo(var+8)(a0) +; LP64D-NEXT: flw ft3, %lo(var+12)(a0) +; LP64D-NEXT: addi a1, a0, %lo(var) +; LP64D-NEXT: flw ft4, 16(a1) +; LP64D-NEXT: flw ft5, 20(a1) +; LP64D-NEXT: flw ft6, 24(a1) +; LP64D-NEXT: flw ft7, 28(a1) +; LP64D-NEXT: flw fa0, 32(a1) +; LP64D-NEXT: flw fa1, 36(a1) +; LP64D-NEXT: flw fa2, 40(a1) +; LP64D-NEXT: flw fa3, 44(a1) +; LP64D-NEXT: flw fa4, 48(a1) +; LP64D-NEXT: flw fa5, 52(a1) +; LP64D-NEXT: flw fa6, 56(a1) +; LP64D-NEXT: flw fa7, 60(a1) +; LP64D-NEXT: flw ft8, 64(a1) +; LP64D-NEXT: flw ft9, 68(a1) +; LP64D-NEXT: flw ft10, 72(a1) +; LP64D-NEXT: flw ft11, 76(a1) +; LP64D-NEXT: flw fs0, 80(a1) +; LP64D-NEXT: flw fs1, 84(a1) +; LP64D-NEXT: flw fs2, 88(a1) +; LP64D-NEXT: flw fs3, 92(a1) +; LP64D-NEXT: flw fs4, 96(a1) +; LP64D-NEXT: flw fs5, 100(a1) +; LP64D-NEXT: flw fs6, 104(a1) +; LP64D-NEXT: flw fs7, 108(a1) +; LP64D-NEXT: flw fs8, 124(a1) +; LP64D-NEXT: flw fs9, 120(a1) +; LP64D-NEXT: flw fs10, 116(a1) +; LP64D-NEXT: flw fs11, 112(a1) +; LP64D-NEXT: fsw fs8, 124(a1) +; LP64D-NEXT: fsw fs9, 120(a1) +; LP64D-NEXT: fsw fs10, 116(a1) +; LP64D-NEXT: fsw fs11, 112(a1) +; LP64D-NEXT: fsw fs7, 108(a1) +; LP64D-NEXT: fsw fs6, 104(a1) +; LP64D-NEXT: fsw fs5, 100(a1) +; LP64D-NEXT: fsw fs4, 96(a1) +; LP64D-NEXT: fsw fs3, 92(a1) +; LP64D-NEXT: fsw fs2, 88(a1) +; LP64D-NEXT: fsw fs1, 84(a1) +; LP64D-NEXT: fsw fs0, 80(a1) +; LP64D-NEXT: fsw ft11, 76(a1) +; LP64D-NEXT: fsw ft10, 72(a1) +; LP64D-NEXT: fsw ft9, 68(a1) +; LP64D-NEXT: fsw ft8, 64(a1) +; LP64D-NEXT: fsw fa7, 60(a1) +; LP64D-NEXT: fsw fa6, 56(a1) +; LP64D-NEXT: fsw fa5, 52(a1) +; LP64D-NEXT: fsw fa4, 48(a1) +; LP64D-NEXT: fsw fa3, 44(a1) +; LP64D-NEXT: fsw fa2, 40(a1) +; LP64D-NEXT: fsw fa1, 36(a1) +; LP64D-NEXT: fsw fa0, 32(a1) +; LP64D-NEXT: fsw ft7, 28(a1) +; LP64D-NEXT: fsw ft6, 24(a1) +; LP64D-NEXT: fsw ft5, 20(a1) +; LP64D-NEXT: fsw ft4, 16(a1) +; LP64D-NEXT: fsw ft3, %lo(var+12)(a0) +; LP64D-NEXT: fsw ft2, %lo(var+8)(a0) +; LP64D-NEXT: fsw 
ft1, %lo(var+4)(a0) +; LP64D-NEXT: fsw ft0, %lo(var)(a0) +; LP64D-NEXT: fld fs11, 0(sp) +; LP64D-NEXT: fld fs10, 8(sp) +; LP64D-NEXT: fld fs9, 16(sp) +; LP64D-NEXT: fld fs8, 24(sp) +; LP64D-NEXT: fld fs7, 32(sp) +; LP64D-NEXT: fld fs6, 40(sp) +; LP64D-NEXT: fld fs5, 48(sp) +; LP64D-NEXT: fld fs4, 56(sp) +; LP64D-NEXT: fld fs3, 64(sp) +; LP64D-NEXT: fld fs2, 72(sp) +; LP64D-NEXT: fld fs1, 80(sp) +; LP64D-NEXT: fld fs0, 88(sp) +; LP64D-NEXT: addi sp, sp, 96 +; LP64D-NEXT: ret %val = load [32 x float], [32 x float]* @var store volatile [32 x float] %val, [32 x float]* @var ret void @@ -140,71 +557,863 @@ define void @callee() nounwind { ; fs0-fs11 are preserved across calls. define void @caller() nounwind { -; ILP32-LP64-LABEL: caller: -; ILP32-LP64-NOT: ft{{[1-9][0-9]*}} -; ILP32-LP64-NOT: fs{{[0-9]+}} -; ILP32-LP64-NOT: fa{{[0-9]+}} -; ILP32-LP64: call callee -; ILP32-LP64-NOT: ft{{[1-9][0-9]*}} -; ILP32-LP64-NOT: fs{{[0-9]+}} -; ILP32-LP64-NOT: fa{{[0-9]+}} -; ILP32-LP64: ret +; ILP32-LABEL: caller: +; ILP32: # %bb.0: +; ILP32-NEXT: addi sp, sp, -144 +; ILP32-NEXT: sw ra, 140(sp) +; ILP32-NEXT: sw s0, 136(sp) +; ILP32-NEXT: sw s1, 132(sp) +; ILP32-NEXT: lui s0, %hi(var) +; ILP32-NEXT: flw ft0, %lo(var)(s0) +; ILP32-NEXT: fsw ft0, 128(sp) +; ILP32-NEXT: flw ft0, %lo(var+4)(s0) +; ILP32-NEXT: fsw ft0, 124(sp) +; ILP32-NEXT: flw ft0, %lo(var+8)(s0) +; ILP32-NEXT: fsw ft0, 120(sp) +; ILP32-NEXT: flw ft0, %lo(var+12)(s0) +; ILP32-NEXT: fsw ft0, 116(sp) +; ILP32-NEXT: addi s1, s0, %lo(var) +; ILP32-NEXT: flw ft0, 16(s1) +; ILP32-NEXT: fsw ft0, 112(sp) +; ILP32-NEXT: flw ft0, 20(s1) +; ILP32-NEXT: fsw ft0, 108(sp) +; ILP32-NEXT: flw ft0, 24(s1) +; ILP32-NEXT: fsw ft0, 104(sp) +; ILP32-NEXT: flw ft0, 28(s1) +; ILP32-NEXT: fsw ft0, 100(sp) +; ILP32-NEXT: flw ft0, 32(s1) +; ILP32-NEXT: fsw ft0, 96(sp) +; ILP32-NEXT: flw ft0, 36(s1) +; ILP32-NEXT: fsw ft0, 92(sp) +; ILP32-NEXT: flw ft0, 40(s1) +; ILP32-NEXT: fsw ft0, 88(sp) +; ILP32-NEXT: flw ft0, 44(s1) +; ILP32-NEXT: fsw 
ft0, 84(sp) +; ILP32-NEXT: flw ft0, 48(s1) +; ILP32-NEXT: fsw ft0, 80(sp) +; ILP32-NEXT: flw ft0, 52(s1) +; ILP32-NEXT: fsw ft0, 76(sp) +; ILP32-NEXT: flw ft0, 56(s1) +; ILP32-NEXT: fsw ft0, 72(sp) +; ILP32-NEXT: flw ft0, 60(s1) +; ILP32-NEXT: fsw ft0, 68(sp) +; ILP32-NEXT: flw ft0, 64(s1) +; ILP32-NEXT: fsw ft0, 64(sp) +; ILP32-NEXT: flw ft0, 68(s1) +; ILP32-NEXT: fsw ft0, 60(sp) +; ILP32-NEXT: flw ft0, 72(s1) +; ILP32-NEXT: fsw ft0, 56(sp) +; ILP32-NEXT: flw ft0, 76(s1) +; ILP32-NEXT: fsw ft0, 52(sp) +; ILP32-NEXT: flw ft0, 80(s1) +; ILP32-NEXT: fsw ft0, 48(sp) +; ILP32-NEXT: flw ft0, 84(s1) +; ILP32-NEXT: fsw ft0, 44(sp) +; ILP32-NEXT: flw ft0, 88(s1) +; ILP32-NEXT: fsw ft0, 40(sp) +; ILP32-NEXT: flw ft0, 92(s1) +; ILP32-NEXT: fsw ft0, 36(sp) +; ILP32-NEXT: flw ft0, 96(s1) +; ILP32-NEXT: fsw ft0, 32(sp) +; ILP32-NEXT: flw ft0, 100(s1) +; ILP32-NEXT: fsw ft0, 28(sp) +; ILP32-NEXT: flw ft0, 104(s1) +; ILP32-NEXT: fsw ft0, 24(sp) +; ILP32-NEXT: flw ft0, 108(s1) +; ILP32-NEXT: fsw ft0, 20(sp) +; ILP32-NEXT: flw ft0, 112(s1) +; ILP32-NEXT: fsw ft0, 16(sp) +; ILP32-NEXT: flw ft0, 116(s1) +; ILP32-NEXT: fsw ft0, 12(sp) +; ILP32-NEXT: flw ft0, 120(s1) +; ILP32-NEXT: fsw ft0, 8(sp) +; ILP32-NEXT: flw ft0, 124(s1) +; ILP32-NEXT: fsw ft0, 4(sp) +; ILP32-NEXT: call callee +; ILP32-NEXT: flw ft0, 4(sp) +; ILP32-NEXT: fsw ft0, 124(s1) +; ILP32-NEXT: flw ft0, 8(sp) +; ILP32-NEXT: fsw ft0, 120(s1) +; ILP32-NEXT: flw ft0, 12(sp) +; ILP32-NEXT: fsw ft0, 116(s1) +; ILP32-NEXT: flw ft0, 16(sp) +; ILP32-NEXT: fsw ft0, 112(s1) +; ILP32-NEXT: flw ft0, 20(sp) +; ILP32-NEXT: fsw ft0, 108(s1) +; ILP32-NEXT: flw ft0, 24(sp) +; ILP32-NEXT: fsw ft0, 104(s1) +; ILP32-NEXT: flw ft0, 28(sp) +; ILP32-NEXT: fsw ft0, 100(s1) +; ILP32-NEXT: flw ft0, 32(sp) +; ILP32-NEXT: fsw ft0, 96(s1) +; ILP32-NEXT: flw ft0, 36(sp) +; ILP32-NEXT: fsw ft0, 92(s1) +; ILP32-NEXT: flw ft0, 40(sp) +; ILP32-NEXT: fsw ft0, 88(s1) +; ILP32-NEXT: flw ft0, 44(sp) +; ILP32-NEXT: fsw ft0, 84(s1) +; ILP32-NEXT: flw ft0, 
48(sp) +; ILP32-NEXT: fsw ft0, 80(s1) +; ILP32-NEXT: flw ft0, 52(sp) +; ILP32-NEXT: fsw ft0, 76(s1) +; ILP32-NEXT: flw ft0, 56(sp) +; ILP32-NEXT: fsw ft0, 72(s1) +; ILP32-NEXT: flw ft0, 60(sp) +; ILP32-NEXT: fsw ft0, 68(s1) +; ILP32-NEXT: flw ft0, 64(sp) +; ILP32-NEXT: fsw ft0, 64(s1) +; ILP32-NEXT: flw ft0, 68(sp) +; ILP32-NEXT: fsw ft0, 60(s1) +; ILP32-NEXT: flw ft0, 72(sp) +; ILP32-NEXT: fsw ft0, 56(s1) +; ILP32-NEXT: flw ft0, 76(sp) +; ILP32-NEXT: fsw ft0, 52(s1) +; ILP32-NEXT: flw ft0, 80(sp) +; ILP32-NEXT: fsw ft0, 48(s1) +; ILP32-NEXT: flw ft0, 84(sp) +; ILP32-NEXT: fsw ft0, 44(s1) +; ILP32-NEXT: flw ft0, 88(sp) +; ILP32-NEXT: fsw ft0, 40(s1) +; ILP32-NEXT: flw ft0, 92(sp) +; ILP32-NEXT: fsw ft0, 36(s1) +; ILP32-NEXT: flw ft0, 96(sp) +; ILP32-NEXT: fsw ft0, 32(s1) +; ILP32-NEXT: flw ft0, 100(sp) +; ILP32-NEXT: fsw ft0, 28(s1) +; ILP32-NEXT: flw ft0, 104(sp) +; ILP32-NEXT: fsw ft0, 24(s1) +; ILP32-NEXT: flw ft0, 108(sp) +; ILP32-NEXT: fsw ft0, 20(s1) +; ILP32-NEXT: flw ft0, 112(sp) +; ILP32-NEXT: fsw ft0, 16(s1) +; ILP32-NEXT: flw ft0, 116(sp) +; ILP32-NEXT: fsw ft0, %lo(var+12)(s0) +; ILP32-NEXT: flw ft0, 120(sp) +; ILP32-NEXT: fsw ft0, %lo(var+8)(s0) +; ILP32-NEXT: flw ft0, 124(sp) +; ILP32-NEXT: fsw ft0, %lo(var+4)(s0) +; ILP32-NEXT: flw ft0, 128(sp) +; ILP32-NEXT: fsw ft0, %lo(var)(s0) +; ILP32-NEXT: lw s1, 132(sp) +; ILP32-NEXT: lw s0, 136(sp) +; ILP32-NEXT: lw ra, 140(sp) +; ILP32-NEXT: addi sp, sp, 144 +; ILP32-NEXT: ret +; +; LP64-LABEL: caller: +; LP64: # %bb.0: +; LP64-NEXT: addi sp, sp, -160 +; LP64-NEXT: sd ra, 152(sp) +; LP64-NEXT: sd s0, 144(sp) +; LP64-NEXT: sd s1, 136(sp) +; LP64-NEXT: lui s0, %hi(var) +; LP64-NEXT: flw ft0, %lo(var)(s0) +; LP64-NEXT: fsw ft0, 132(sp) +; LP64-NEXT: flw ft0, %lo(var+4)(s0) +; LP64-NEXT: fsw ft0, 128(sp) +; LP64-NEXT: flw ft0, %lo(var+8)(s0) +; LP64-NEXT: fsw ft0, 124(sp) +; LP64-NEXT: flw ft0, %lo(var+12)(s0) +; LP64-NEXT: fsw ft0, 120(sp) +; LP64-NEXT: addi s1, s0, %lo(var) +; LP64-NEXT: flw ft0, 16(s1) +; 
LP64-NEXT: fsw ft0, 116(sp) +; LP64-NEXT: flw ft0, 20(s1) +; LP64-NEXT: fsw ft0, 112(sp) +; LP64-NEXT: flw ft0, 24(s1) +; LP64-NEXT: fsw ft0, 108(sp) +; LP64-NEXT: flw ft0, 28(s1) +; LP64-NEXT: fsw ft0, 104(sp) +; LP64-NEXT: flw ft0, 32(s1) +; LP64-NEXT: fsw ft0, 100(sp) +; LP64-NEXT: flw ft0, 36(s1) +; LP64-NEXT: fsw ft0, 96(sp) +; LP64-NEXT: flw ft0, 40(s1) +; LP64-NEXT: fsw ft0, 92(sp) +; LP64-NEXT: flw ft0, 44(s1) +; LP64-NEXT: fsw ft0, 88(sp) +; LP64-NEXT: flw ft0, 48(s1) +; LP64-NEXT: fsw ft0, 84(sp) +; LP64-NEXT: flw ft0, 52(s1) +; LP64-NEXT: fsw ft0, 80(sp) +; LP64-NEXT: flw ft0, 56(s1) +; LP64-NEXT: fsw ft0, 76(sp) +; LP64-NEXT: flw ft0, 60(s1) +; LP64-NEXT: fsw ft0, 72(sp) +; LP64-NEXT: flw ft0, 64(s1) +; LP64-NEXT: fsw ft0, 68(sp) +; LP64-NEXT: flw ft0, 68(s1) +; LP64-NEXT: fsw ft0, 64(sp) +; LP64-NEXT: flw ft0, 72(s1) +; LP64-NEXT: fsw ft0, 60(sp) +; LP64-NEXT: flw ft0, 76(s1) +; LP64-NEXT: fsw ft0, 56(sp) +; LP64-NEXT: flw ft0, 80(s1) +; LP64-NEXT: fsw ft0, 52(sp) +; LP64-NEXT: flw ft0, 84(s1) +; LP64-NEXT: fsw ft0, 48(sp) +; LP64-NEXT: flw ft0, 88(s1) +; LP64-NEXT: fsw ft0, 44(sp) +; LP64-NEXT: flw ft0, 92(s1) +; LP64-NEXT: fsw ft0, 40(sp) +; LP64-NEXT: flw ft0, 96(s1) +; LP64-NEXT: fsw ft0, 36(sp) +; LP64-NEXT: flw ft0, 100(s1) +; LP64-NEXT: fsw ft0, 32(sp) +; LP64-NEXT: flw ft0, 104(s1) +; LP64-NEXT: fsw ft0, 28(sp) +; LP64-NEXT: flw ft0, 108(s1) +; LP64-NEXT: fsw ft0, 24(sp) +; LP64-NEXT: flw ft0, 112(s1) +; LP64-NEXT: fsw ft0, 20(sp) +; LP64-NEXT: flw ft0, 116(s1) +; LP64-NEXT: fsw ft0, 16(sp) +; LP64-NEXT: flw ft0, 120(s1) +; LP64-NEXT: fsw ft0, 12(sp) +; LP64-NEXT: flw ft0, 124(s1) +; LP64-NEXT: fsw ft0, 8(sp) +; LP64-NEXT: call callee +; LP64-NEXT: flw ft0, 8(sp) +; LP64-NEXT: fsw ft0, 124(s1) +; LP64-NEXT: flw ft0, 12(sp) +; LP64-NEXT: fsw ft0, 120(s1) +; LP64-NEXT: flw ft0, 16(sp) +; LP64-NEXT: fsw ft0, 116(s1) +; LP64-NEXT: flw ft0, 20(sp) +; LP64-NEXT: fsw ft0, 112(s1) +; LP64-NEXT: flw ft0, 24(sp) +; LP64-NEXT: fsw ft0, 108(s1) +; 
LP64-NEXT: flw ft0, 28(sp) +; LP64-NEXT: fsw ft0, 104(s1) +; LP64-NEXT: flw ft0, 32(sp) +; LP64-NEXT: fsw ft0, 100(s1) +; LP64-NEXT: flw ft0, 36(sp) +; LP64-NEXT: fsw ft0, 96(s1) +; LP64-NEXT: flw ft0, 40(sp) +; LP64-NEXT: fsw ft0, 92(s1) +; LP64-NEXT: flw ft0, 44(sp) +; LP64-NEXT: fsw ft0, 88(s1) +; LP64-NEXT: flw ft0, 48(sp) +; LP64-NEXT: fsw ft0, 84(s1) +; LP64-NEXT: flw ft0, 52(sp) +; LP64-NEXT: fsw ft0, 80(s1) +; LP64-NEXT: flw ft0, 56(sp) +; LP64-NEXT: fsw ft0, 76(s1) +; LP64-NEXT: flw ft0, 60(sp) +; LP64-NEXT: fsw ft0, 72(s1) +; LP64-NEXT: flw ft0, 64(sp) +; LP64-NEXT: fsw ft0, 68(s1) +; LP64-NEXT: flw ft0, 68(sp) +; LP64-NEXT: fsw ft0, 64(s1) +; LP64-NEXT: flw ft0, 72(sp) +; LP64-NEXT: fsw ft0, 60(s1) +; LP64-NEXT: flw ft0, 76(sp) +; LP64-NEXT: fsw ft0, 56(s1) +; LP64-NEXT: flw ft0, 80(sp) +; LP64-NEXT: fsw ft0, 52(s1) +; LP64-NEXT: flw ft0, 84(sp) +; LP64-NEXT: fsw ft0, 48(s1) +; LP64-NEXT: flw ft0, 88(sp) +; LP64-NEXT: fsw ft0, 44(s1) +; LP64-NEXT: flw ft0, 92(sp) +; LP64-NEXT: fsw ft0, 40(s1) +; LP64-NEXT: flw ft0, 96(sp) +; LP64-NEXT: fsw ft0, 36(s1) +; LP64-NEXT: flw ft0, 100(sp) +; LP64-NEXT: fsw ft0, 32(s1) +; LP64-NEXT: flw ft0, 104(sp) +; LP64-NEXT: fsw ft0, 28(s1) +; LP64-NEXT: flw ft0, 108(sp) +; LP64-NEXT: fsw ft0, 24(s1) +; LP64-NEXT: flw ft0, 112(sp) +; LP64-NEXT: fsw ft0, 20(s1) +; LP64-NEXT: flw ft0, 116(sp) +; LP64-NEXT: fsw ft0, 16(s1) +; LP64-NEXT: flw ft0, 120(sp) +; LP64-NEXT: fsw ft0, %lo(var+12)(s0) +; LP64-NEXT: flw ft0, 124(sp) +; LP64-NEXT: fsw ft0, %lo(var+8)(s0) +; LP64-NEXT: flw ft0, 128(sp) +; LP64-NEXT: fsw ft0, %lo(var+4)(s0) +; LP64-NEXT: flw ft0, 132(sp) +; LP64-NEXT: fsw ft0, %lo(var)(s0) +; LP64-NEXT: ld s1, 136(sp) +; LP64-NEXT: ld s0, 144(sp) +; LP64-NEXT: ld ra, 152(sp) +; LP64-NEXT: addi sp, sp, 160 +; LP64-NEXT: ret +; +; ILP32F-LABEL: caller: +; ILP32F: # %bb.0: +; ILP32F-NEXT: addi sp, sp, -144 +; ILP32F-NEXT: sw ra, 140(sp) +; ILP32F-NEXT: sw s0, 136(sp) +; ILP32F-NEXT: sw s1, 132(sp) +; ILP32F-NEXT: fsw fs0, 
128(sp) +; ILP32F-NEXT: fsw fs1, 124(sp) +; ILP32F-NEXT: fsw fs2, 120(sp) +; ILP32F-NEXT: fsw fs3, 116(sp) +; ILP32F-NEXT: fsw fs4, 112(sp) +; ILP32F-NEXT: fsw fs5, 108(sp) +; ILP32F-NEXT: fsw fs6, 104(sp) +; ILP32F-NEXT: fsw fs7, 100(sp) +; ILP32F-NEXT: fsw fs8, 96(sp) +; ILP32F-NEXT: fsw fs9, 92(sp) +; ILP32F-NEXT: fsw fs10, 88(sp) +; ILP32F-NEXT: fsw fs11, 84(sp) +; ILP32F-NEXT: lui s0, %hi(var) +; ILP32F-NEXT: flw ft0, %lo(var)(s0) +; ILP32F-NEXT: fsw ft0, 80(sp) +; ILP32F-NEXT: flw ft0, %lo(var+4)(s0) +; ILP32F-NEXT: fsw ft0, 76(sp) +; ILP32F-NEXT: flw ft0, %lo(var+8)(s0) +; ILP32F-NEXT: fsw ft0, 72(sp) +; ILP32F-NEXT: flw ft0, %lo(var+12)(s0) +; ILP32F-NEXT: fsw ft0, 68(sp) +; ILP32F-NEXT: addi s1, s0, %lo(var) +; ILP32F-NEXT: flw ft0, 16(s1) +; ILP32F-NEXT: fsw ft0, 64(sp) +; ILP32F-NEXT: flw ft0, 20(s1) +; ILP32F-NEXT: fsw ft0, 60(sp) +; ILP32F-NEXT: flw ft0, 24(s1) +; ILP32F-NEXT: fsw ft0, 56(sp) +; ILP32F-NEXT: flw ft0, 28(s1) +; ILP32F-NEXT: fsw ft0, 52(sp) +; ILP32F-NEXT: flw ft0, 32(s1) +; ILP32F-NEXT: fsw ft0, 48(sp) +; ILP32F-NEXT: flw ft0, 36(s1) +; ILP32F-NEXT: fsw ft0, 44(sp) +; ILP32F-NEXT: flw ft0, 40(s1) +; ILP32F-NEXT: fsw ft0, 40(sp) +; ILP32F-NEXT: flw ft0, 44(s1) +; ILP32F-NEXT: fsw ft0, 36(sp) +; ILP32F-NEXT: flw ft0, 48(s1) +; ILP32F-NEXT: fsw ft0, 32(sp) +; ILP32F-NEXT: flw ft0, 52(s1) +; ILP32F-NEXT: fsw ft0, 28(sp) +; ILP32F-NEXT: flw ft0, 56(s1) +; ILP32F-NEXT: fsw ft0, 24(sp) +; ILP32F-NEXT: flw ft0, 60(s1) +; ILP32F-NEXT: fsw ft0, 20(sp) +; ILP32F-NEXT: flw ft0, 64(s1) +; ILP32F-NEXT: fsw ft0, 16(sp) +; ILP32F-NEXT: flw ft0, 68(s1) +; ILP32F-NEXT: fsw ft0, 12(sp) +; ILP32F-NEXT: flw ft0, 72(s1) +; ILP32F-NEXT: fsw ft0, 8(sp) +; ILP32F-NEXT: flw ft0, 76(s1) +; ILP32F-NEXT: fsw ft0, 4(sp) +; ILP32F-NEXT: flw fs8, 80(s1) +; ILP32F-NEXT: flw fs9, 84(s1) +; ILP32F-NEXT: flw fs10, 88(s1) +; ILP32F-NEXT: flw fs11, 92(s1) +; ILP32F-NEXT: flw fs0, 96(s1) +; ILP32F-NEXT: flw fs1, 100(s1) +; ILP32F-NEXT: flw fs2, 104(s1) +; ILP32F-NEXT: flw 
fs3, 108(s1) +; ILP32F-NEXT: flw fs4, 112(s1) +; ILP32F-NEXT: flw fs5, 116(s1) +; ILP32F-NEXT: flw fs6, 120(s1) +; ILP32F-NEXT: flw fs7, 124(s1) +; ILP32F-NEXT: call callee +; ILP32F-NEXT: fsw fs7, 124(s1) +; ILP32F-NEXT: fsw fs6, 120(s1) +; ILP32F-NEXT: fsw fs5, 116(s1) +; ILP32F-NEXT: fsw fs4, 112(s1) +; ILP32F-NEXT: fsw fs3, 108(s1) +; ILP32F-NEXT: fsw fs2, 104(s1) +; ILP32F-NEXT: fsw fs1, 100(s1) +; ILP32F-NEXT: fsw fs0, 96(s1) +; ILP32F-NEXT: fsw fs11, 92(s1) +; ILP32F-NEXT: fsw fs10, 88(s1) +; ILP32F-NEXT: fsw fs9, 84(s1) +; ILP32F-NEXT: fsw fs8, 80(s1) +; ILP32F-NEXT: flw ft0, 4(sp) +; ILP32F-NEXT: fsw ft0, 76(s1) +; ILP32F-NEXT: flw ft0, 8(sp) +; ILP32F-NEXT: fsw ft0, 72(s1) +; ILP32F-NEXT: flw ft0, 12(sp) +; ILP32F-NEXT: fsw ft0, 68(s1) +; ILP32F-NEXT: flw ft0, 16(sp) +; ILP32F-NEXT: fsw ft0, 64(s1) +; ILP32F-NEXT: flw ft0, 20(sp) +; ILP32F-NEXT: fsw ft0, 60(s1) +; ILP32F-NEXT: flw ft0, 24(sp) +; ILP32F-NEXT: fsw ft0, 56(s1) +; ILP32F-NEXT: flw ft0, 28(sp) +; ILP32F-NEXT: fsw ft0, 52(s1) +; ILP32F-NEXT: flw ft0, 32(sp) +; ILP32F-NEXT: fsw ft0, 48(s1) +; ILP32F-NEXT: flw ft0, 36(sp) +; ILP32F-NEXT: fsw ft0, 44(s1) +; ILP32F-NEXT: flw ft0, 40(sp) +; ILP32F-NEXT: fsw ft0, 40(s1) +; ILP32F-NEXT: flw ft0, 44(sp) +; ILP32F-NEXT: fsw ft0, 36(s1) +; ILP32F-NEXT: flw ft0, 48(sp) +; ILP32F-NEXT: fsw ft0, 32(s1) +; ILP32F-NEXT: flw ft0, 52(sp) +; ILP32F-NEXT: fsw ft0, 28(s1) +; ILP32F-NEXT: flw ft0, 56(sp) +; ILP32F-NEXT: fsw ft0, 24(s1) +; ILP32F-NEXT: flw ft0, 60(sp) +; ILP32F-NEXT: fsw ft0, 20(s1) +; ILP32F-NEXT: flw ft0, 64(sp) +; ILP32F-NEXT: fsw ft0, 16(s1) +; ILP32F-NEXT: flw ft0, 68(sp) +; ILP32F-NEXT: fsw ft0, %lo(var+12)(s0) +; ILP32F-NEXT: flw ft0, 72(sp) +; ILP32F-NEXT: fsw ft0, %lo(var+8)(s0) +; ILP32F-NEXT: flw ft0, 76(sp) +; ILP32F-NEXT: fsw ft0, %lo(var+4)(s0) +; ILP32F-NEXT: flw ft0, 80(sp) +; ILP32F-NEXT: fsw ft0, %lo(var)(s0) +; ILP32F-NEXT: flw fs11, 84(sp) +; ILP32F-NEXT: flw fs10, 88(sp) +; ILP32F-NEXT: flw fs9, 92(sp) +; ILP32F-NEXT: flw fs8, 
96(sp) +; ILP32F-NEXT: flw fs7, 100(sp) +; ILP32F-NEXT: flw fs6, 104(sp) +; ILP32F-NEXT: flw fs5, 108(sp) +; ILP32F-NEXT: flw fs4, 112(sp) +; ILP32F-NEXT: flw fs3, 116(sp) +; ILP32F-NEXT: flw fs2, 120(sp) +; ILP32F-NEXT: flw fs1, 124(sp) +; ILP32F-NEXT: flw fs0, 128(sp) +; ILP32F-NEXT: lw s1, 132(sp) +; ILP32F-NEXT: lw s0, 136(sp) +; ILP32F-NEXT: lw ra, 140(sp) +; ILP32F-NEXT: addi sp, sp, 144 +; ILP32F-NEXT: ret +; +; LP64F-LABEL: caller: +; LP64F: # %bb.0: +; LP64F-NEXT: addi sp, sp, -160 +; LP64F-NEXT: sd ra, 152(sp) +; LP64F-NEXT: sd s0, 144(sp) +; LP64F-NEXT: sd s1, 136(sp) +; LP64F-NEXT: fsw fs0, 132(sp) +; LP64F-NEXT: fsw fs1, 128(sp) +; LP64F-NEXT: fsw fs2, 124(sp) +; LP64F-NEXT: fsw fs3, 120(sp) +; LP64F-NEXT: fsw fs4, 116(sp) +; LP64F-NEXT: fsw fs5, 112(sp) +; LP64F-NEXT: fsw fs6, 108(sp) +; LP64F-NEXT: fsw fs7, 104(sp) +; LP64F-NEXT: fsw fs8, 100(sp) +; LP64F-NEXT: fsw fs9, 96(sp) +; LP64F-NEXT: fsw fs10, 92(sp) +; LP64F-NEXT: fsw fs11, 88(sp) +; LP64F-NEXT: lui s0, %hi(var) +; LP64F-NEXT: flw ft0, %lo(var)(s0) +; LP64F-NEXT: fsw ft0, 84(sp) +; LP64F-NEXT: flw ft0, %lo(var+4)(s0) +; LP64F-NEXT: fsw ft0, 80(sp) +; LP64F-NEXT: flw ft0, %lo(var+8)(s0) +; LP64F-NEXT: fsw ft0, 76(sp) +; LP64F-NEXT: flw ft0, %lo(var+12)(s0) +; LP64F-NEXT: fsw ft0, 72(sp) +; LP64F-NEXT: addi s1, s0, %lo(var) +; LP64F-NEXT: flw ft0, 16(s1) +; LP64F-NEXT: fsw ft0, 68(sp) +; LP64F-NEXT: flw ft0, 20(s1) +; LP64F-NEXT: fsw ft0, 64(sp) +; LP64F-NEXT: flw ft0, 24(s1) +; LP64F-NEXT: fsw ft0, 60(sp) +; LP64F-NEXT: flw ft0, 28(s1) +; LP64F-NEXT: fsw ft0, 56(sp) +; LP64F-NEXT: flw ft0, 32(s1) +; LP64F-NEXT: fsw ft0, 52(sp) +; LP64F-NEXT: flw ft0, 36(s1) +; LP64F-NEXT: fsw ft0, 48(sp) +; LP64F-NEXT: flw ft0, 40(s1) +; LP64F-NEXT: fsw ft0, 44(sp) +; LP64F-NEXT: flw ft0, 44(s1) +; LP64F-NEXT: fsw ft0, 40(sp) +; LP64F-NEXT: flw ft0, 48(s1) +; LP64F-NEXT: fsw ft0, 36(sp) +; LP64F-NEXT: flw ft0, 52(s1) +; LP64F-NEXT: fsw ft0, 32(sp) +; LP64F-NEXT: flw ft0, 56(s1) +; LP64F-NEXT: fsw ft0, 28(sp) 
+; LP64F-NEXT: flw ft0, 60(s1) +; LP64F-NEXT: fsw ft0, 24(sp) +; LP64F-NEXT: flw ft0, 64(s1) +; LP64F-NEXT: fsw ft0, 20(sp) +; LP64F-NEXT: flw ft0, 68(s1) +; LP64F-NEXT: fsw ft0, 16(sp) +; LP64F-NEXT: flw ft0, 72(s1) +; LP64F-NEXT: fsw ft0, 12(sp) +; LP64F-NEXT: flw ft0, 76(s1) +; LP64F-NEXT: fsw ft0, 8(sp) +; LP64F-NEXT: flw fs8, 80(s1) +; LP64F-NEXT: flw fs9, 84(s1) +; LP64F-NEXT: flw fs10, 88(s1) +; LP64F-NEXT: flw fs11, 92(s1) +; LP64F-NEXT: flw fs0, 96(s1) +; LP64F-NEXT: flw fs1, 100(s1) +; LP64F-NEXT: flw fs2, 104(s1) +; LP64F-NEXT: flw fs3, 108(s1) +; LP64F-NEXT: flw fs4, 112(s1) +; LP64F-NEXT: flw fs5, 116(s1) +; LP64F-NEXT: flw fs6, 120(s1) +; LP64F-NEXT: flw fs7, 124(s1) +; LP64F-NEXT: call callee +; LP64F-NEXT: fsw fs7, 124(s1) +; LP64F-NEXT: fsw fs6, 120(s1) +; LP64F-NEXT: fsw fs5, 116(s1) +; LP64F-NEXT: fsw fs4, 112(s1) +; LP64F-NEXT: fsw fs3, 108(s1) +; LP64F-NEXT: fsw fs2, 104(s1) +; LP64F-NEXT: fsw fs1, 100(s1) +; LP64F-NEXT: fsw fs0, 96(s1) +; LP64F-NEXT: fsw fs11, 92(s1) +; LP64F-NEXT: fsw fs10, 88(s1) +; LP64F-NEXT: fsw fs9, 84(s1) +; LP64F-NEXT: fsw fs8, 80(s1) +; LP64F-NEXT: flw ft0, 8(sp) +; LP64F-NEXT: fsw ft0, 76(s1) +; LP64F-NEXT: flw ft0, 12(sp) +; LP64F-NEXT: fsw ft0, 72(s1) +; LP64F-NEXT: flw ft0, 16(sp) +; LP64F-NEXT: fsw ft0, 68(s1) +; LP64F-NEXT: flw ft0, 20(sp) +; LP64F-NEXT: fsw ft0, 64(s1) +; LP64F-NEXT: flw ft0, 24(sp) +; LP64F-NEXT: fsw ft0, 60(s1) +; LP64F-NEXT: flw ft0, 28(sp) +; LP64F-NEXT: fsw ft0, 56(s1) +; LP64F-NEXT: flw ft0, 32(sp) +; LP64F-NEXT: fsw ft0, 52(s1) +; LP64F-NEXT: flw ft0, 36(sp) +; LP64F-NEXT: fsw ft0, 48(s1) +; LP64F-NEXT: flw ft0, 40(sp) +; LP64F-NEXT: fsw ft0, 44(s1) +; LP64F-NEXT: flw ft0, 44(sp) +; LP64F-NEXT: fsw ft0, 40(s1) +; LP64F-NEXT: flw ft0, 48(sp) +; LP64F-NEXT: fsw ft0, 36(s1) +; LP64F-NEXT: flw ft0, 52(sp) +; LP64F-NEXT: fsw ft0, 32(s1) +; LP64F-NEXT: flw ft0, 56(sp) +; LP64F-NEXT: fsw ft0, 28(s1) +; LP64F-NEXT: flw ft0, 60(sp) +; LP64F-NEXT: fsw ft0, 24(s1) +; LP64F-NEXT: flw ft0, 64(sp) +; 
LP64F-NEXT: fsw ft0, 20(s1) +; LP64F-NEXT: flw ft0, 68(sp) +; LP64F-NEXT: fsw ft0, 16(s1) +; LP64F-NEXT: flw ft0, 72(sp) +; LP64F-NEXT: fsw ft0, %lo(var+12)(s0) +; LP64F-NEXT: flw ft0, 76(sp) +; LP64F-NEXT: fsw ft0, %lo(var+8)(s0) +; LP64F-NEXT: flw ft0, 80(sp) +; LP64F-NEXT: fsw ft0, %lo(var+4)(s0) +; LP64F-NEXT: flw ft0, 84(sp) +; LP64F-NEXT: fsw ft0, %lo(var)(s0) +; LP64F-NEXT: flw fs11, 88(sp) +; LP64F-NEXT: flw fs10, 92(sp) +; LP64F-NEXT: flw fs9, 96(sp) +; LP64F-NEXT: flw fs8, 100(sp) +; LP64F-NEXT: flw fs7, 104(sp) +; LP64F-NEXT: flw fs6, 108(sp) +; LP64F-NEXT: flw fs5, 112(sp) +; LP64F-NEXT: flw fs4, 116(sp) +; LP64F-NEXT: flw fs3, 120(sp) +; LP64F-NEXT: flw fs2, 124(sp) +; LP64F-NEXT: flw fs1, 128(sp) +; LP64F-NEXT: flw fs0, 132(sp) +; LP64F-NEXT: ld s1, 136(sp) +; LP64F-NEXT: ld s0, 144(sp) +; LP64F-NEXT: ld ra, 152(sp) +; LP64F-NEXT: addi sp, sp, 160 +; LP64F-NEXT: ret ; -; ILP32F-LP64F-LABEL: caller: -; ILP32F-LP64F: flw fs8, 80(s1) -; ILP32F-LP64F-NEXT: flw fs9, 84(s1) -; ILP32F-LP64F-NEXT: flw fs10, 88(s1) -; ILP32F-LP64F-NEXT: flw fs11, 92(s1) -; ILP32F-LP64F-NEXT: flw fs0, 96(s1) -; ILP32F-LP64F-NEXT: flw fs1, 100(s1) -; ILP32F-LP64F-NEXT: flw fs2, 104(s1) -; ILP32F-LP64F-NEXT: flw fs3, 108(s1) -; ILP32F-LP64F-NEXT: flw fs4, 112(s1) -; ILP32F-LP64F-NEXT: flw fs5, 116(s1) -; ILP32F-LP64F-NEXT: flw fs6, 120(s1) -; ILP32F-LP64F-NEXT: flw fs7, 124(s1) -; ILP32F-LP64F-NEXT: call callee -; ILP32F-LP64F-NEXT: fsw fs7, 124(s1) -; ILP32F-LP64F-NEXT: fsw fs6, 120(s1) -; ILP32F-LP64F-NEXT: fsw fs5, 116(s1) -; ILP32F-LP64F-NEXT: fsw fs4, 112(s1) -; ILP32F-LP64F-NEXT: fsw fs3, 108(s1) -; ILP32F-LP64F-NEXT: fsw fs2, 104(s1) -; ILP32F-LP64F-NEXT: fsw fs1, 100(s1) -; ILP32F-LP64F-NEXT: fsw fs0, 96(s1) -; ILP32F-LP64F-NEXT: fsw fs11, 92(s1) -; ILP32F-LP64F-NEXT: fsw fs10, 88(s1) -; ILP32F-LP64F-NEXT: fsw fs9, 84(s1) -; ILP32F-LP64F-NEXT: fsw fs8, 80(s1) -; ILP32F-LP64F-NEXT: lw ft0, {{[0-9]+}}(sp) +; ILP32D-LABEL: caller: +; ILP32D: # %bb.0: +; ILP32D-NEXT: addi sp, 
sp, -192 +; ILP32D-NEXT: sw ra, 188(sp) +; ILP32D-NEXT: sw s0, 184(sp) +; ILP32D-NEXT: sw s1, 180(sp) +; ILP32D-NEXT: fsd fs0, 168(sp) +; ILP32D-NEXT: fsd fs1, 160(sp) +; ILP32D-NEXT: fsd fs2, 152(sp) +; ILP32D-NEXT: fsd fs3, 144(sp) +; ILP32D-NEXT: fsd fs4, 136(sp) +; ILP32D-NEXT: fsd fs5, 128(sp) +; ILP32D-NEXT: fsd fs6, 120(sp) +; ILP32D-NEXT: fsd fs7, 112(sp) +; ILP32D-NEXT: fsd fs8, 104(sp) +; ILP32D-NEXT: fsd fs9, 96(sp) +; ILP32D-NEXT: fsd fs10, 88(sp) +; ILP32D-NEXT: fsd fs11, 80(sp) +; ILP32D-NEXT: lui s0, %hi(var) +; ILP32D-NEXT: flw ft0, %lo(var)(s0) +; ILP32D-NEXT: fsw ft0, 76(sp) +; ILP32D-NEXT: flw ft0, %lo(var+4)(s0) +; ILP32D-NEXT: fsw ft0, 72(sp) +; ILP32D-NEXT: flw ft0, %lo(var+8)(s0) +; ILP32D-NEXT: fsw ft0, 68(sp) +; ILP32D-NEXT: flw ft0, %lo(var+12)(s0) +; ILP32D-NEXT: fsw ft0, 64(sp) +; ILP32D-NEXT: addi s1, s0, %lo(var) +; ILP32D-NEXT: flw ft0, 16(s1) +; ILP32D-NEXT: fsw ft0, 60(sp) +; ILP32D-NEXT: flw ft0, 20(s1) +; ILP32D-NEXT: fsw ft0, 56(sp) +; ILP32D-NEXT: flw ft0, 24(s1) +; ILP32D-NEXT: fsw ft0, 52(sp) +; ILP32D-NEXT: flw ft0, 28(s1) +; ILP32D-NEXT: fsw ft0, 48(sp) +; ILP32D-NEXT: flw ft0, 32(s1) +; ILP32D-NEXT: fsw ft0, 44(sp) +; ILP32D-NEXT: flw ft0, 36(s1) +; ILP32D-NEXT: fsw ft0, 40(sp) +; ILP32D-NEXT: flw ft0, 40(s1) +; ILP32D-NEXT: fsw ft0, 36(sp) +; ILP32D-NEXT: flw ft0, 44(s1) +; ILP32D-NEXT: fsw ft0, 32(sp) +; ILP32D-NEXT: flw ft0, 48(s1) +; ILP32D-NEXT: fsw ft0, 28(sp) +; ILP32D-NEXT: flw ft0, 52(s1) +; ILP32D-NEXT: fsw ft0, 24(sp) +; ILP32D-NEXT: flw ft0, 56(s1) +; ILP32D-NEXT: fsw ft0, 20(sp) +; ILP32D-NEXT: flw ft0, 60(s1) +; ILP32D-NEXT: fsw ft0, 16(sp) +; ILP32D-NEXT: flw ft0, 64(s1) +; ILP32D-NEXT: fsw ft0, 12(sp) +; ILP32D-NEXT: flw ft0, 68(s1) +; ILP32D-NEXT: fsw ft0, 8(sp) +; ILP32D-NEXT: flw ft0, 72(s1) +; ILP32D-NEXT: fsw ft0, 4(sp) +; ILP32D-NEXT: flw ft0, 76(s1) +; ILP32D-NEXT: fsw ft0, 0(sp) +; ILP32D-NEXT: flw fs8, 80(s1) +; ILP32D-NEXT: flw fs9, 84(s1) +; ILP32D-NEXT: flw fs10, 88(s1) +; ILP32D-NEXT: flw fs11, 
92(s1) +; ILP32D-NEXT: flw fs0, 96(s1) +; ILP32D-NEXT: flw fs1, 100(s1) +; ILP32D-NEXT: flw fs2, 104(s1) +; ILP32D-NEXT: flw fs3, 108(s1) +; ILP32D-NEXT: flw fs4, 112(s1) +; ILP32D-NEXT: flw fs5, 116(s1) +; ILP32D-NEXT: flw fs6, 120(s1) +; ILP32D-NEXT: flw fs7, 124(s1) +; ILP32D-NEXT: call callee +; ILP32D-NEXT: fsw fs7, 124(s1) +; ILP32D-NEXT: fsw fs6, 120(s1) +; ILP32D-NEXT: fsw fs5, 116(s1) +; ILP32D-NEXT: fsw fs4, 112(s1) +; ILP32D-NEXT: fsw fs3, 108(s1) +; ILP32D-NEXT: fsw fs2, 104(s1) +; ILP32D-NEXT: fsw fs1, 100(s1) +; ILP32D-NEXT: fsw fs0, 96(s1) +; ILP32D-NEXT: fsw fs11, 92(s1) +; ILP32D-NEXT: fsw fs10, 88(s1) +; ILP32D-NEXT: fsw fs9, 84(s1) +; ILP32D-NEXT: fsw fs8, 80(s1) +; ILP32D-NEXT: flw ft0, 0(sp) +; ILP32D-NEXT: fsw ft0, 76(s1) +; ILP32D-NEXT: flw ft0, 4(sp) +; ILP32D-NEXT: fsw ft0, 72(s1) +; ILP32D-NEXT: flw ft0, 8(sp) +; ILP32D-NEXT: fsw ft0, 68(s1) +; ILP32D-NEXT: flw ft0, 12(sp) +; ILP32D-NEXT: fsw ft0, 64(s1) +; ILP32D-NEXT: flw ft0, 16(sp) +; ILP32D-NEXT: fsw ft0, 60(s1) +; ILP32D-NEXT: flw ft0, 20(sp) +; ILP32D-NEXT: fsw ft0, 56(s1) +; ILP32D-NEXT: flw ft0, 24(sp) +; ILP32D-NEXT: fsw ft0, 52(s1) +; ILP32D-NEXT: flw ft0, 28(sp) +; ILP32D-NEXT: fsw ft0, 48(s1) +; ILP32D-NEXT: flw ft0, 32(sp) +; ILP32D-NEXT: fsw ft0, 44(s1) +; ILP32D-NEXT: flw ft0, 36(sp) +; ILP32D-NEXT: fsw ft0, 40(s1) +; ILP32D-NEXT: flw ft0, 40(sp) +; ILP32D-NEXT: fsw ft0, 36(s1) +; ILP32D-NEXT: flw ft0, 44(sp) +; ILP32D-NEXT: fsw ft0, 32(s1) +; ILP32D-NEXT: flw ft0, 48(sp) +; ILP32D-NEXT: fsw ft0, 28(s1) +; ILP32D-NEXT: flw ft0, 52(sp) +; ILP32D-NEXT: fsw ft0, 24(s1) +; ILP32D-NEXT: flw ft0, 56(sp) +; ILP32D-NEXT: fsw ft0, 20(s1) +; ILP32D-NEXT: flw ft0, 60(sp) +; ILP32D-NEXT: fsw ft0, 16(s1) +; ILP32D-NEXT: flw ft0, 64(sp) +; ILP32D-NEXT: fsw ft0, %lo(var+12)(s0) +; ILP32D-NEXT: flw ft0, 68(sp) +; ILP32D-NEXT: fsw ft0, %lo(var+8)(s0) +; ILP32D-NEXT: flw ft0, 72(sp) +; ILP32D-NEXT: fsw ft0, %lo(var+4)(s0) +; ILP32D-NEXT: flw ft0, 76(sp) +; ILP32D-NEXT: fsw ft0, %lo(var)(s0) 
+; ILP32D-NEXT: fld fs11, 80(sp) +; ILP32D-NEXT: fld fs10, 88(sp) +; ILP32D-NEXT: fld fs9, 96(sp) +; ILP32D-NEXT: fld fs8, 104(sp) +; ILP32D-NEXT: fld fs7, 112(sp) +; ILP32D-NEXT: fld fs6, 120(sp) +; ILP32D-NEXT: fld fs5, 128(sp) +; ILP32D-NEXT: fld fs4, 136(sp) +; ILP32D-NEXT: fld fs3, 144(sp) +; ILP32D-NEXT: fld fs2, 152(sp) +; ILP32D-NEXT: fld fs1, 160(sp) +; ILP32D-NEXT: fld fs0, 168(sp) +; ILP32D-NEXT: lw s1, 180(sp) +; ILP32D-NEXT: lw s0, 184(sp) +; ILP32D-NEXT: lw ra, 188(sp) +; ILP32D-NEXT: addi sp, sp, 192 +; ILP32D-NEXT: ret ; -; ILP32D-LP64D-LABEL: caller: -; ILP32D-LP64D: flw fs8, 80(s1) -; ILP32D-LP64D-NEXT: flw fs9, 84(s1) -; ILP32D-LP64D-NEXT: flw fs10, 88(s1) -; ILP32D-LP64D-NEXT: flw fs11, 92(s1) -; ILP32D-LP64D-NEXT: flw fs0, 96(s1) -; ILP32D-LP64D-NEXT: flw fs1, 100(s1) -; ILP32D-LP64D-NEXT: flw fs2, 104(s1) -; ILP32D-LP64D-NEXT: flw fs3, 108(s1) -; ILP32D-LP64D-NEXT: flw fs4, 112(s1) -; ILP32D-LP64D-NEXT: flw fs5, 116(s1) -; ILP32D-LP64D-NEXT: flw fs6, 120(s1) -; ILP32D-LP64D-NEXT: flw fs7, 124(s1) -; ILP32D-LP64D-NEXT: call callee -; ILP32D-LP64D-NEXT: fsw fs7, 124(s1) -; ILP32D-LP64D-NEXT: fsw fs6, 120(s1) -; ILP32D-LP64D-NEXT: fsw fs5, 116(s1) -; ILP32D-LP64D-NEXT: fsw fs4, 112(s1) -; ILP32D-LP64D-NEXT: fsw fs3, 108(s1) -; ILP32D-LP64D-NEXT: fsw fs2, 104(s1) -; ILP32D-LP64D-NEXT: fsw fs1, 100(s1) -; ILP32D-LP64D-NEXT: fsw fs0, 96(s1) -; ILP32D-LP64D-NEXT: fsw fs11, 92(s1) -; ILP32D-LP64D-NEXT: fsw fs10, 88(s1) -; ILP32D-LP64D-NEXT: fsw fs9, 84(s1) -; ILP32D-LP64D-NEXT: fsw fs8, 80(s1) -; ILP32D-LP64D-NEXT: flw ft0, {{[0-9]+}}(sp) +; LP64D-LABEL: caller: +; LP64D: # %bb.0: +; LP64D-NEXT: addi sp, sp, -208 +; LP64D-NEXT: sd ra, 200(sp) +; LP64D-NEXT: sd s0, 192(sp) +; LP64D-NEXT: sd s1, 184(sp) +; LP64D-NEXT: fsd fs0, 176(sp) +; LP64D-NEXT: fsd fs1, 168(sp) +; LP64D-NEXT: fsd fs2, 160(sp) +; LP64D-NEXT: fsd fs3, 152(sp) +; LP64D-NEXT: fsd fs4, 144(sp) +; LP64D-NEXT: fsd fs5, 136(sp) +; LP64D-NEXT: fsd fs6, 128(sp) +; LP64D-NEXT: fsd fs7, 
120(sp) +; LP64D-NEXT: fsd fs8, 112(sp) +; LP64D-NEXT: fsd fs9, 104(sp) +; LP64D-NEXT: fsd fs10, 96(sp) +; LP64D-NEXT: fsd fs11, 88(sp) +; LP64D-NEXT: lui s0, %hi(var) +; LP64D-NEXT: flw ft0, %lo(var)(s0) +; LP64D-NEXT: fsw ft0, 84(sp) +; LP64D-NEXT: flw ft0, %lo(var+4)(s0) +; LP64D-NEXT: fsw ft0, 80(sp) +; LP64D-NEXT: flw ft0, %lo(var+8)(s0) +; LP64D-NEXT: fsw ft0, 76(sp) +; LP64D-NEXT: flw ft0, %lo(var+12)(s0) +; LP64D-NEXT: fsw ft0, 72(sp) +; LP64D-NEXT: addi s1, s0, %lo(var) +; LP64D-NEXT: flw ft0, 16(s1) +; LP64D-NEXT: fsw ft0, 68(sp) +; LP64D-NEXT: flw ft0, 20(s1) +; LP64D-NEXT: fsw ft0, 64(sp) +; LP64D-NEXT: flw ft0, 24(s1) +; LP64D-NEXT: fsw ft0, 60(sp) +; LP64D-NEXT: flw ft0, 28(s1) +; LP64D-NEXT: fsw ft0, 56(sp) +; LP64D-NEXT: flw ft0, 32(s1) +; LP64D-NEXT: fsw ft0, 52(sp) +; LP64D-NEXT: flw ft0, 36(s1) +; LP64D-NEXT: fsw ft0, 48(sp) +; LP64D-NEXT: flw ft0, 40(s1) +; LP64D-NEXT: fsw ft0, 44(sp) +; LP64D-NEXT: flw ft0, 44(s1) +; LP64D-NEXT: fsw ft0, 40(sp) +; LP64D-NEXT: flw ft0, 48(s1) +; LP64D-NEXT: fsw ft0, 36(sp) +; LP64D-NEXT: flw ft0, 52(s1) +; LP64D-NEXT: fsw ft0, 32(sp) +; LP64D-NEXT: flw ft0, 56(s1) +; LP64D-NEXT: fsw ft0, 28(sp) +; LP64D-NEXT: flw ft0, 60(s1) +; LP64D-NEXT: fsw ft0, 24(sp) +; LP64D-NEXT: flw ft0, 64(s1) +; LP64D-NEXT: fsw ft0, 20(sp) +; LP64D-NEXT: flw ft0, 68(s1) +; LP64D-NEXT: fsw ft0, 16(sp) +; LP64D-NEXT: flw ft0, 72(s1) +; LP64D-NEXT: fsw ft0, 12(sp) +; LP64D-NEXT: flw ft0, 76(s1) +; LP64D-NEXT: fsw ft0, 8(sp) +; LP64D-NEXT: flw fs8, 80(s1) +; LP64D-NEXT: flw fs9, 84(s1) +; LP64D-NEXT: flw fs10, 88(s1) +; LP64D-NEXT: flw fs11, 92(s1) +; LP64D-NEXT: flw fs0, 96(s1) +; LP64D-NEXT: flw fs1, 100(s1) +; LP64D-NEXT: flw fs2, 104(s1) +; LP64D-NEXT: flw fs3, 108(s1) +; LP64D-NEXT: flw fs4, 112(s1) +; LP64D-NEXT: flw fs5, 116(s1) +; LP64D-NEXT: flw fs6, 120(s1) +; LP64D-NEXT: flw fs7, 124(s1) +; LP64D-NEXT: call callee +; LP64D-NEXT: fsw fs7, 124(s1) +; LP64D-NEXT: fsw fs6, 120(s1) +; LP64D-NEXT: fsw fs5, 116(s1) +; LP64D-NEXT: fsw 
fs4, 112(s1) +; LP64D-NEXT: fsw fs3, 108(s1) +; LP64D-NEXT: fsw fs2, 104(s1) +; LP64D-NEXT: fsw fs1, 100(s1) +; LP64D-NEXT: fsw fs0, 96(s1) +; LP64D-NEXT: fsw fs11, 92(s1) +; LP64D-NEXT: fsw fs10, 88(s1) +; LP64D-NEXT: fsw fs9, 84(s1) +; LP64D-NEXT: fsw fs8, 80(s1) +; LP64D-NEXT: flw ft0, 8(sp) +; LP64D-NEXT: fsw ft0, 76(s1) +; LP64D-NEXT: flw ft0, 12(sp) +; LP64D-NEXT: fsw ft0, 72(s1) +; LP64D-NEXT: flw ft0, 16(sp) +; LP64D-NEXT: fsw ft0, 68(s1) +; LP64D-NEXT: flw ft0, 20(sp) +; LP64D-NEXT: fsw ft0, 64(s1) +; LP64D-NEXT: flw ft0, 24(sp) +; LP64D-NEXT: fsw ft0, 60(s1) +; LP64D-NEXT: flw ft0, 28(sp) +; LP64D-NEXT: fsw ft0, 56(s1) +; LP64D-NEXT: flw ft0, 32(sp) +; LP64D-NEXT: fsw ft0, 52(s1) +; LP64D-NEXT: flw ft0, 36(sp) +; LP64D-NEXT: fsw ft0, 48(s1) +; LP64D-NEXT: flw ft0, 40(sp) +; LP64D-NEXT: fsw ft0, 44(s1) +; LP64D-NEXT: flw ft0, 44(sp) +; LP64D-NEXT: fsw ft0, 40(s1) +; LP64D-NEXT: flw ft0, 48(sp) +; LP64D-NEXT: fsw ft0, 36(s1) +; LP64D-NEXT: flw ft0, 52(sp) +; LP64D-NEXT: fsw ft0, 32(s1) +; LP64D-NEXT: flw ft0, 56(sp) +; LP64D-NEXT: fsw ft0, 28(s1) +; LP64D-NEXT: flw ft0, 60(sp) +; LP64D-NEXT: fsw ft0, 24(s1) +; LP64D-NEXT: flw ft0, 64(sp) +; LP64D-NEXT: fsw ft0, 20(s1) +; LP64D-NEXT: flw ft0, 68(sp) +; LP64D-NEXT: fsw ft0, 16(s1) +; LP64D-NEXT: flw ft0, 72(sp) +; LP64D-NEXT: fsw ft0, %lo(var+12)(s0) +; LP64D-NEXT: flw ft0, 76(sp) +; LP64D-NEXT: fsw ft0, %lo(var+8)(s0) +; LP64D-NEXT: flw ft0, 80(sp) +; LP64D-NEXT: fsw ft0, %lo(var+4)(s0) +; LP64D-NEXT: flw ft0, 84(sp) +; LP64D-NEXT: fsw ft0, %lo(var)(s0) +; LP64D-NEXT: fld fs11, 88(sp) +; LP64D-NEXT: fld fs10, 96(sp) +; LP64D-NEXT: fld fs9, 104(sp) +; LP64D-NEXT: fld fs8, 112(sp) +; LP64D-NEXT: fld fs7, 120(sp) +; LP64D-NEXT: fld fs6, 128(sp) +; LP64D-NEXT: fld fs5, 136(sp) +; LP64D-NEXT: fld fs4, 144(sp) +; LP64D-NEXT: fld fs3, 152(sp) +; LP64D-NEXT: fld fs2, 160(sp) +; LP64D-NEXT: fld fs1, 168(sp) +; LP64D-NEXT: fld fs0, 176(sp) +; LP64D-NEXT: ld s1, 184(sp) +; LP64D-NEXT: ld s0, 192(sp) +; LP64D-NEXT: ld 
ra, 200(sp) +; LP64D-NEXT: addi sp, sp, 208 +; LP64D-NEXT: ret %val = load [32 x float], [32 x float]* @var call void @callee() store volatile [32 x float] %val, [32 x float]* @var diff --git a/llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll b/llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll index f95bc45736af..d5c67fb46203 100644 --- a/llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll +++ b/llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll @@ -1,11 +1,12 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -mtriple=riscv32 -mattr=+d -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32-LP64 +; RUN: | FileCheck %s -check-prefix=ILP32 ; RUN: llc -mtriple=riscv64 -mattr=+d -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32-LP64 +; RUN: | FileCheck %s -check-prefix=LP64 ; RUN: llc -mtriple=riscv32 -mattr=+d -target-abi ilp32d -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32D-LP64D +; RUN: | FileCheck %s -check-prefix=ILP32D ; RUN: llc -mtriple=riscv64 -mattr=+d -target-abi lp64d -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ILP32D-LP64D +; RUN: | FileCheck %s -check-prefix=LP64D @var = global [32 x double] zeroinitializer @@ -16,94 +17,337 @@ ; something appropriate. 
define void @callee() nounwind { -; ILP32-LP64-LABEL: callee: -; ILP32-LP64: # %bb.0: -; ILP32-LP64-NEXT: lui a0, %hi(var) -; ILP32-LP64-NEXT: fld ft0, %lo(var)(a0) -; ILP32-LP64-NEXT: addi a1, a0, %lo(var) -; ILP32-LP64-NEXT: fld ft1, 8(a1) -; ILP32-LP64-NEXT: fld ft2, 16(a1) -; ILP32-LP64-NEXT: fld ft3, 24(a1) -; ILP32-LP64-NEXT: fld ft4, 32(a1) -; ILP32-LP64-NEXT: fld ft5, 40(a1) -; ILP32-LP64-NEXT: fld ft6, 48(a1) -; ILP32-LP64-NEXT: fld ft7, 56(a1) -; ILP32-LP64-NEXT: fld fa0, 64(a1) -; ILP32-LP64-NEXT: fld fa1, 72(a1) -; ILP32-LP64-NEXT: fld fa2, 80(a1) -; ILP32-LP64-NEXT: fld fa3, 88(a1) -; ILP32-LP64-NEXT: fld fa4, 96(a1) -; ILP32-LP64-NEXT: fld fa5, 104(a1) -; ILP32-LP64-NEXT: fld fa6, 112(a1) -; ILP32-LP64-NEXT: fld fa7, 120(a1) -; ILP32-LP64-NEXT: fld ft8, 128(a1) -; ILP32-LP64-NEXT: fld ft9, 136(a1) -; ILP32-LP64-NEXT: fld ft10, 144(a1) -; ILP32-LP64-NEXT: fld ft11, 152(a1) -; ILP32-LP64-NEXT: fld fs0, 160(a1) -; ILP32-LP64-NEXT: fld fs1, 168(a1) -; ILP32-LP64-NEXT: fld fs2, 176(a1) -; ILP32-LP64-NEXT: fld fs3, 184(a1) -; ILP32-LP64-NEXT: fld fs4, 192(a1) -; ILP32-LP64-NEXT: fld fs5, 200(a1) -; ILP32-LP64-NEXT: fld fs6, 208(a1) -; ILP32-LP64-NEXT: fld fs7, 216(a1) -; ILP32-LP64-NEXT: fld fs8, 248(a1) -; ILP32-LP64-NEXT: fld fs9, 240(a1) -; ILP32-LP64-NEXT: fld fs10, 232(a1) -; ILP32-LP64-NEXT: fld fs11, 224(a1) -; ILP32-LP64-NEXT: fsd fs8, 248(a1) -; ILP32-LP64-NEXT: fsd fs9, 240(a1) -; ILP32-LP64-NEXT: fsd fs10, 232(a1) -; ILP32-LP64-NEXT: fsd fs11, 224(a1) -; ILP32-LP64-NEXT: fsd fs7, 216(a1) -; ILP32-LP64-NEXT: fsd fs6, 208(a1) -; ILP32-LP64-NEXT: fsd fs5, 200(a1) -; ILP32-LP64-NEXT: fsd fs4, 192(a1) -; ILP32-LP64-NEXT: fsd fs3, 184(a1) -; ILP32-LP64-NEXT: fsd fs2, 176(a1) -; ILP32-LP64-NEXT: fsd fs1, 168(a1) -; ILP32-LP64-NEXT: fsd fs0, 160(a1) -; ILP32-LP64-NEXT: fsd ft11, 152(a1) -; ILP32-LP64-NEXT: fsd ft10, 144(a1) -; ILP32-LP64-NEXT: fsd ft9, 136(a1) -; ILP32-LP64-NEXT: fsd ft8, 128(a1) -; ILP32-LP64-NEXT: fsd fa7, 120(a1) -; ILP32-LP64-NEXT: 
fsd fa6, 112(a1) -; ILP32-LP64-NEXT: fsd fa5, 104(a1) -; ILP32-LP64-NEXT: fsd fa4, 96(a1) -; ILP32-LP64-NEXT: fsd fa3, 88(a1) -; ILP32-LP64-NEXT: fsd fa2, 80(a1) -; ILP32-LP64-NEXT: fsd fa1, 72(a1) -; ILP32-LP64-NEXT: fsd fa0, 64(a1) -; ILP32-LP64-NEXT: fsd ft7, 56(a1) -; ILP32-LP64-NEXT: fsd ft6, 48(a1) -; ILP32-LP64-NEXT: fsd ft5, 40(a1) -; ILP32-LP64-NEXT: fsd ft4, 32(a1) -; ILP32-LP64-NEXT: fsd ft3, 24(a1) -; ILP32-LP64-NEXT: fsd ft2, 16(a1) -; ILP32-LP64-NEXT: fsd ft1, 8(a1) -; ILP32-LP64-NEXT: fsd ft0, %lo(var)(a0) -; ILP32-LP64-NEXT: ret +; ILP32-LABEL: callee: +; ILP32: # %bb.0: +; ILP32-NEXT: lui a0, %hi(var) +; ILP32-NEXT: fld ft0, %lo(var)(a0) +; ILP32-NEXT: fld ft1, %lo(var+8)(a0) +; ILP32-NEXT: addi a1, a0, %lo(var) +; ILP32-NEXT: fld ft2, 16(a1) +; ILP32-NEXT: fld ft3, 24(a1) +; ILP32-NEXT: fld ft4, 32(a1) +; ILP32-NEXT: fld ft5, 40(a1) +; ILP32-NEXT: fld ft6, 48(a1) +; ILP32-NEXT: fld ft7, 56(a1) +; ILP32-NEXT: fld fa0, 64(a1) +; ILP32-NEXT: fld fa1, 72(a1) +; ILP32-NEXT: fld fa2, 80(a1) +; ILP32-NEXT: fld fa3, 88(a1) +; ILP32-NEXT: fld fa4, 96(a1) +; ILP32-NEXT: fld fa5, 104(a1) +; ILP32-NEXT: fld fa6, 112(a1) +; ILP32-NEXT: fld fa7, 120(a1) +; ILP32-NEXT: fld ft8, 128(a1) +; ILP32-NEXT: fld ft9, 136(a1) +; ILP32-NEXT: fld ft10, 144(a1) +; ILP32-NEXT: fld ft11, 152(a1) +; ILP32-NEXT: fld fs0, 160(a1) +; ILP32-NEXT: fld fs1, 168(a1) +; ILP32-NEXT: fld fs2, 176(a1) +; ILP32-NEXT: fld fs3, 184(a1) +; ILP32-NEXT: fld fs4, 192(a1) +; ILP32-NEXT: fld fs5, 200(a1) +; ILP32-NEXT: fld fs6, 208(a1) +; ILP32-NEXT: fld fs7, 216(a1) +; ILP32-NEXT: fld fs8, 248(a1) +; ILP32-NEXT: fld fs9, 240(a1) +; ILP32-NEXT: fld fs10, 232(a1) +; ILP32-NEXT: fld fs11, 224(a1) +; ILP32-NEXT: fsd fs8, 248(a1) +; ILP32-NEXT: fsd fs9, 240(a1) +; ILP32-NEXT: fsd fs10, 232(a1) +; ILP32-NEXT: fsd fs11, 224(a1) +; ILP32-NEXT: fsd fs7, 216(a1) +; ILP32-NEXT: fsd fs6, 208(a1) +; ILP32-NEXT: fsd fs5, 200(a1) +; ILP32-NEXT: fsd fs4, 192(a1) +; ILP32-NEXT: fsd fs3, 184(a1) +; ILP32-NEXT: 
fsd fs2, 176(a1) +; ILP32-NEXT: fsd fs1, 168(a1) +; ILP32-NEXT: fsd fs0, 160(a1) +; ILP32-NEXT: fsd ft11, 152(a1) +; ILP32-NEXT: fsd ft10, 144(a1) +; ILP32-NEXT: fsd ft9, 136(a1) +; ILP32-NEXT: fsd ft8, 128(a1) +; ILP32-NEXT: fsd fa7, 120(a1) +; ILP32-NEXT: fsd fa6, 112(a1) +; ILP32-NEXT: fsd fa5, 104(a1) +; ILP32-NEXT: fsd fa4, 96(a1) +; ILP32-NEXT: fsd fa3, 88(a1) +; ILP32-NEXT: fsd fa2, 80(a1) +; ILP32-NEXT: fsd fa1, 72(a1) +; ILP32-NEXT: fsd fa0, 64(a1) +; ILP32-NEXT: fsd ft7, 56(a1) +; ILP32-NEXT: fsd ft6, 48(a1) +; ILP32-NEXT: fsd ft5, 40(a1) +; ILP32-NEXT: fsd ft4, 32(a1) +; ILP32-NEXT: fsd ft3, 24(a1) +; ILP32-NEXT: fsd ft2, 16(a1) +; ILP32-NEXT: fsd ft1, %lo(var+8)(a0) +; ILP32-NEXT: fsd ft0, %lo(var)(a0) +; ILP32-NEXT: ret ; -; ILP32D-LP64D-LABEL: callee: -; ILP32D-LP64D: # %bb.0: -; ILP32D-LP64D-NEXT: addi sp, sp, -96 -; ILP32D-LP64D-NEXT: fsd fs0, 88(sp) -; ILP32D-LP64D-NEXT: fsd fs1, 80(sp) -; ILP32D-LP64D-NEXT: fsd fs2, 72(sp) -; ILP32D-LP64D-NEXT: fsd fs3, 64(sp) -; ILP32D-LP64D-NEXT: fsd fs4, 56(sp) -; ILP32D-LP64D-NEXT: fsd fs5, 48(sp) -; ILP32D-LP64D-NEXT: fsd fs6, 40(sp) -; ILP32D-LP64D-NEXT: fsd fs7, 32(sp) -; ILP32D-LP64D-NEXT: fsd fs8, 24(sp) -; ILP32D-LP64D-NEXT: fsd fs9, 16(sp) -; ILP32D-LP64D-NEXT: fsd fs10, 8(sp) -; ILP32D-LP64D-NEXT: fsd fs11, 0(sp) -; ILP32D-LP64D-NEXT: lui a0, %hi(var) -; ILP32D-LP64D-NEXT: fld ft0, %lo(var)(a0) -; ILP32D-LP64D-NEXT: addi a1, a0, %lo(var) +; LP64-LABEL: callee: +; LP64: # %bb.0: +; LP64-NEXT: lui a0, %hi(var) +; LP64-NEXT: fld ft0, %lo(var)(a0) +; LP64-NEXT: fld ft1, %lo(var+8)(a0) +; LP64-NEXT: addi a1, a0, %lo(var) +; LP64-NEXT: fld ft2, 16(a1) +; LP64-NEXT: fld ft3, 24(a1) +; LP64-NEXT: fld ft4, 32(a1) +; LP64-NEXT: fld ft5, 40(a1) +; LP64-NEXT: fld ft6, 48(a1) +; LP64-NEXT: fld ft7, 56(a1) +; LP64-NEXT: fld fa0, 64(a1) +; LP64-NEXT: fld fa1, 72(a1) +; LP64-NEXT: fld fa2, 80(a1) +; LP64-NEXT: fld fa3, 88(a1) +; LP64-NEXT: fld fa4, 96(a1) +; LP64-NEXT: fld fa5, 104(a1) +; LP64-NEXT: fld fa6, 112(a1) 
+; LP64-NEXT: fld fa7, 120(a1) +; LP64-NEXT: fld ft8, 128(a1) +; LP64-NEXT: fld ft9, 136(a1) +; LP64-NEXT: fld ft10, 144(a1) +; LP64-NEXT: fld ft11, 152(a1) +; LP64-NEXT: fld fs0, 160(a1) +; LP64-NEXT: fld fs1, 168(a1) +; LP64-NEXT: fld fs2, 176(a1) +; LP64-NEXT: fld fs3, 184(a1) +; LP64-NEXT: fld fs4, 192(a1) +; LP64-NEXT: fld fs5, 200(a1) +; LP64-NEXT: fld fs6, 208(a1) +; LP64-NEXT: fld fs7, 216(a1) +; LP64-NEXT: fld fs8, 248(a1) +; LP64-NEXT: fld fs9, 240(a1) +; LP64-NEXT: fld fs10, 232(a1) +; LP64-NEXT: fld fs11, 224(a1) +; LP64-NEXT: fsd fs8, 248(a1) +; LP64-NEXT: fsd fs9, 240(a1) +; LP64-NEXT: fsd fs10, 232(a1) +; LP64-NEXT: fsd fs11, 224(a1) +; LP64-NEXT: fsd fs7, 216(a1) +; LP64-NEXT: fsd fs6, 208(a1) +; LP64-NEXT: fsd fs5, 200(a1) +; LP64-NEXT: fsd fs4, 192(a1) +; LP64-NEXT: fsd fs3, 184(a1) +; LP64-NEXT: fsd fs2, 176(a1) +; LP64-NEXT: fsd fs1, 168(a1) +; LP64-NEXT: fsd fs0, 160(a1) +; LP64-NEXT: fsd ft11, 152(a1) +; LP64-NEXT: fsd ft10, 144(a1) +; LP64-NEXT: fsd ft9, 136(a1) +; LP64-NEXT: fsd ft8, 128(a1) +; LP64-NEXT: fsd fa7, 120(a1) +; LP64-NEXT: fsd fa6, 112(a1) +; LP64-NEXT: fsd fa5, 104(a1) +; LP64-NEXT: fsd fa4, 96(a1) +; LP64-NEXT: fsd fa3, 88(a1) +; LP64-NEXT: fsd fa2, 80(a1) +; LP64-NEXT: fsd fa1, 72(a1) +; LP64-NEXT: fsd fa0, 64(a1) +; LP64-NEXT: fsd ft7, 56(a1) +; LP64-NEXT: fsd ft6, 48(a1) +; LP64-NEXT: fsd ft5, 40(a1) +; LP64-NEXT: fsd ft4, 32(a1) +; LP64-NEXT: fsd ft3, 24(a1) +; LP64-NEXT: fsd ft2, 16(a1) +; LP64-NEXT: fsd ft1, %lo(var+8)(a0) +; LP64-NEXT: fsd ft0, %lo(var)(a0) +; LP64-NEXT: ret +; +; ILP32D-LABEL: callee: +; ILP32D: # %bb.0: +; ILP32D-NEXT: addi sp, sp, -96 +; ILP32D-NEXT: fsd fs0, 88(sp) +; ILP32D-NEXT: fsd fs1, 80(sp) +; ILP32D-NEXT: fsd fs2, 72(sp) +; ILP32D-NEXT: fsd fs3, 64(sp) +; ILP32D-NEXT: fsd fs4, 56(sp) +; ILP32D-NEXT: fsd fs5, 48(sp) +; ILP32D-NEXT: fsd fs6, 40(sp) +; ILP32D-NEXT: fsd fs7, 32(sp) +; ILP32D-NEXT: fsd fs8, 24(sp) +; ILP32D-NEXT: fsd fs9, 16(sp) +; ILP32D-NEXT: fsd fs10, 8(sp) +; ILP32D-NEXT: fsd 
fs11, 0(sp) +; ILP32D-NEXT: lui a0, %hi(var) +; ILP32D-NEXT: fld ft0, %lo(var)(a0) +; ILP32D-NEXT: fld ft1, %lo(var+8)(a0) +; ILP32D-NEXT: addi a1, a0, %lo(var) +; ILP32D-NEXT: fld ft2, 16(a1) +; ILP32D-NEXT: fld ft3, 24(a1) +; ILP32D-NEXT: fld ft4, 32(a1) +; ILP32D-NEXT: fld ft5, 40(a1) +; ILP32D-NEXT: fld ft6, 48(a1) +; ILP32D-NEXT: fld ft7, 56(a1) +; ILP32D-NEXT: fld fa0, 64(a1) +; ILP32D-NEXT: fld fa1, 72(a1) +; ILP32D-NEXT: fld fa2, 80(a1) +; ILP32D-NEXT: fld fa3, 88(a1) +; ILP32D-NEXT: fld fa4, 96(a1) +; ILP32D-NEXT: fld fa5, 104(a1) +; ILP32D-NEXT: fld fa6, 112(a1) +; ILP32D-NEXT: fld fa7, 120(a1) +; ILP32D-NEXT: fld ft8, 128(a1) +; ILP32D-NEXT: fld ft9, 136(a1) +; ILP32D-NEXT: fld ft10, 144(a1) +; ILP32D-NEXT: fld ft11, 152(a1) +; ILP32D-NEXT: fld fs0, 160(a1) +; ILP32D-NEXT: fld fs1, 168(a1) +; ILP32D-NEXT: fld fs2, 176(a1) +; ILP32D-NEXT: fld fs3, 184(a1) +; ILP32D-NEXT: fld fs4, 192(a1) +; ILP32D-NEXT: fld fs5, 200(a1) +; ILP32D-NEXT: fld fs6, 208(a1) +; ILP32D-NEXT: fld fs7, 216(a1) +; ILP32D-NEXT: fld fs8, 248(a1) +; ILP32D-NEXT: fld fs9, 240(a1) +; ILP32D-NEXT: fld fs10, 232(a1) +; ILP32D-NEXT: fld fs11, 224(a1) +; ILP32D-NEXT: fsd fs8, 248(a1) +; ILP32D-NEXT: fsd fs9, 240(a1) +; ILP32D-NEXT: fsd fs10, 232(a1) +; ILP32D-NEXT: fsd fs11, 224(a1) +; ILP32D-NEXT: fsd fs7, 216(a1) +; ILP32D-NEXT: fsd fs6, 208(a1) +; ILP32D-NEXT: fsd fs5, 200(a1) +; ILP32D-NEXT: fsd fs4, 192(a1) +; ILP32D-NEXT: fsd fs3, 184(a1) +; ILP32D-NEXT: fsd fs2, 176(a1) +; ILP32D-NEXT: fsd fs1, 168(a1) +; ILP32D-NEXT: fsd fs0, 160(a1) +; ILP32D-NEXT: fsd ft11, 152(a1) +; ILP32D-NEXT: fsd ft10, 144(a1) +; ILP32D-NEXT: fsd ft9, 136(a1) +; ILP32D-NEXT: fsd ft8, 128(a1) +; ILP32D-NEXT: fsd fa7, 120(a1) +; ILP32D-NEXT: fsd fa6, 112(a1) +; ILP32D-NEXT: fsd fa5, 104(a1) +; ILP32D-NEXT: fsd fa4, 96(a1) +; ILP32D-NEXT: fsd fa3, 88(a1) +; ILP32D-NEXT: fsd fa2, 80(a1) +; ILP32D-NEXT: fsd fa1, 72(a1) +; ILP32D-NEXT: fsd fa0, 64(a1) +; ILP32D-NEXT: fsd ft7, 56(a1) +; ILP32D-NEXT: fsd ft6, 48(a1) 
+; ILP32D-NEXT: fsd ft5, 40(a1) +; ILP32D-NEXT: fsd ft4, 32(a1) +; ILP32D-NEXT: fsd ft3, 24(a1) +; ILP32D-NEXT: fsd ft2, 16(a1) +; ILP32D-NEXT: fsd ft1, %lo(var+8)(a0) +; ILP32D-NEXT: fsd ft0, %lo(var)(a0) +; ILP32D-NEXT: fld fs11, 0(sp) +; ILP32D-NEXT: fld fs10, 8(sp) +; ILP32D-NEXT: fld fs9, 16(sp) +; ILP32D-NEXT: fld fs8, 24(sp) +; ILP32D-NEXT: fld fs7, 32(sp) +; ILP32D-NEXT: fld fs6, 40(sp) +; ILP32D-NEXT: fld fs5, 48(sp) +; ILP32D-NEXT: fld fs4, 56(sp) +; ILP32D-NEXT: fld fs3, 64(sp) +; ILP32D-NEXT: fld fs2, 72(sp) +; ILP32D-NEXT: fld fs1, 80(sp) +; ILP32D-NEXT: fld fs0, 88(sp) +; ILP32D-NEXT: addi sp, sp, 96 +; ILP32D-NEXT: ret +; +; LP64D-LABEL: callee: +; LP64D: # %bb.0: +; LP64D-NEXT: addi sp, sp, -96 +; LP64D-NEXT: fsd fs0, 88(sp) +; LP64D-NEXT: fsd fs1, 80(sp) +; LP64D-NEXT: fsd fs2, 72(sp) +; LP64D-NEXT: fsd fs3, 64(sp) +; LP64D-NEXT: fsd fs4, 56(sp) +; LP64D-NEXT: fsd fs5, 48(sp) +; LP64D-NEXT: fsd fs6, 40(sp) +; LP64D-NEXT: fsd fs7, 32(sp) +; LP64D-NEXT: fsd fs8, 24(sp) +; LP64D-NEXT: fsd fs9, 16(sp) +; LP64D-NEXT: fsd fs10, 8(sp) +; LP64D-NEXT: fsd fs11, 0(sp) +; LP64D-NEXT: lui a0, %hi(var) +; LP64D-NEXT: fld ft0, %lo(var)(a0) +; LP64D-NEXT: fld ft1, %lo(var+8)(a0) +; LP64D-NEXT: addi a1, a0, %lo(var) +; LP64D-NEXT: fld ft2, 16(a1) +; LP64D-NEXT: fld ft3, 24(a1) +; LP64D-NEXT: fld ft4, 32(a1) +; LP64D-NEXT: fld ft5, 40(a1) +; LP64D-NEXT: fld ft6, 48(a1) +; LP64D-NEXT: fld ft7, 56(a1) +; LP64D-NEXT: fld fa0, 64(a1) +; LP64D-NEXT: fld fa1, 72(a1) +; LP64D-NEXT: fld fa2, 80(a1) +; LP64D-NEXT: fld fa3, 88(a1) +; LP64D-NEXT: fld fa4, 96(a1) +; LP64D-NEXT: fld fa5, 104(a1) +; LP64D-NEXT: fld fa6, 112(a1) +; LP64D-NEXT: fld fa7, 120(a1) +; LP64D-NEXT: fld ft8, 128(a1) +; LP64D-NEXT: fld ft9, 136(a1) +; LP64D-NEXT: fld ft10, 144(a1) +; LP64D-NEXT: fld ft11, 152(a1) +; LP64D-NEXT: fld fs0, 160(a1) +; LP64D-NEXT: fld fs1, 168(a1) +; LP64D-NEXT: fld fs2, 176(a1) +; LP64D-NEXT: fld fs3, 184(a1) +; LP64D-NEXT: fld fs4, 192(a1) +; LP64D-NEXT: fld fs5, 200(a1) +; 
LP64D-NEXT: fld fs6, 208(a1) +; LP64D-NEXT: fld fs7, 216(a1) +; LP64D-NEXT: fld fs8, 248(a1) +; LP64D-NEXT: fld fs9, 240(a1) +; LP64D-NEXT: fld fs10, 232(a1) +; LP64D-NEXT: fld fs11, 224(a1) +; LP64D-NEXT: fsd fs8, 248(a1) +; LP64D-NEXT: fsd fs9, 240(a1) +; LP64D-NEXT: fsd fs10, 232(a1) +; LP64D-NEXT: fsd fs11, 224(a1) +; LP64D-NEXT: fsd fs7, 216(a1) +; LP64D-NEXT: fsd fs6, 208(a1) +; LP64D-NEXT: fsd fs5, 200(a1) +; LP64D-NEXT: fsd fs4, 192(a1) +; LP64D-NEXT: fsd fs3, 184(a1) +; LP64D-NEXT: fsd fs2, 176(a1) +; LP64D-NEXT: fsd fs1, 168(a1) +; LP64D-NEXT: fsd fs0, 160(a1) +; LP64D-NEXT: fsd ft11, 152(a1) +; LP64D-NEXT: fsd ft10, 144(a1) +; LP64D-NEXT: fsd ft9, 136(a1) +; LP64D-NEXT: fsd ft8, 128(a1) +; LP64D-NEXT: fsd fa7, 120(a1) +; LP64D-NEXT: fsd fa6, 112(a1) +; LP64D-NEXT: fsd fa5, 104(a1) +; LP64D-NEXT: fsd fa4, 96(a1) +; LP64D-NEXT: fsd fa3, 88(a1) +; LP64D-NEXT: fsd fa2, 80(a1) +; LP64D-NEXT: fsd fa1, 72(a1) +; LP64D-NEXT: fsd fa0, 64(a1) +; LP64D-NEXT: fsd ft7, 56(a1) +; LP64D-NEXT: fsd ft6, 48(a1) +; LP64D-NEXT: fsd ft5, 40(a1) +; LP64D-NEXT: fsd ft4, 32(a1) +; LP64D-NEXT: fsd ft3, 24(a1) +; LP64D-NEXT: fsd ft2, 16(a1) +; LP64D-NEXT: fsd ft1, %lo(var+8)(a0) +; LP64D-NEXT: fsd ft0, %lo(var)(a0) +; LP64D-NEXT: fld fs11, 0(sp) +; LP64D-NEXT: fld fs10, 8(sp) +; LP64D-NEXT: fld fs9, 16(sp) +; LP64D-NEXT: fld fs8, 24(sp) +; LP64D-NEXT: fld fs7, 32(sp) +; LP64D-NEXT: fld fs6, 40(sp) +; LP64D-NEXT: fld fs5, 48(sp) +; LP64D-NEXT: fld fs4, 56(sp) +; LP64D-NEXT: fld fs3, 64(sp) +; LP64D-NEXT: fld fs2, 72(sp) +; LP64D-NEXT: fld fs1, 80(sp) +; LP64D-NEXT: fld fs0, 88(sp) +; LP64D-NEXT: addi sp, sp, 96 +; LP64D-NEXT: ret %val = load [32 x double], [32 x double]* @var store volatile [32 x double] %val, [32 x double]* @var ret void @@ -117,43 +361,577 @@ define void @callee() nounwind { ; fs0-fs11 are preserved across calls. 
define void @caller() nounwind { -; ILP32-LP64-LABEL: caller: -; ILP32-LP64-NOT: ft{{[1-9][0-9]*}} -; ILP32-LP64-NOT: fs{{[0-9]+}} -; ILP32-LP64-NOT: fa{{[0-9]+}} -; ILP32-LP64: call callee -; ILP32-LP64-NOT: ft{{[1-9][0-9]*}} -; ILP32-LP64-NOT: fs{{[0-9]+}} -; ILP32-LP64-NOT: fa{{[0-9]+}} -; ILP32-LP64: ret +; ILP32-LABEL: caller: +; ILP32: # %bb.0: +; ILP32-NEXT: addi sp, sp, -272 +; ILP32-NEXT: sw ra, 268(sp) +; ILP32-NEXT: sw s0, 264(sp) +; ILP32-NEXT: sw s1, 260(sp) +; ILP32-NEXT: lui s0, %hi(var) +; ILP32-NEXT: fld ft0, %lo(var)(s0) +; ILP32-NEXT: fsd ft0, 248(sp) +; ILP32-NEXT: fld ft0, %lo(var+8)(s0) +; ILP32-NEXT: fsd ft0, 240(sp) +; ILP32-NEXT: addi s1, s0, %lo(var) +; ILP32-NEXT: fld ft0, 16(s1) +; ILP32-NEXT: fsd ft0, 232(sp) +; ILP32-NEXT: fld ft0, 24(s1) +; ILP32-NEXT: fsd ft0, 224(sp) +; ILP32-NEXT: fld ft0, 32(s1) +; ILP32-NEXT: fsd ft0, 216(sp) +; ILP32-NEXT: fld ft0, 40(s1) +; ILP32-NEXT: fsd ft0, 208(sp) +; ILP32-NEXT: fld ft0, 48(s1) +; ILP32-NEXT: fsd ft0, 200(sp) +; ILP32-NEXT: fld ft0, 56(s1) +; ILP32-NEXT: fsd ft0, 192(sp) +; ILP32-NEXT: fld ft0, 64(s1) +; ILP32-NEXT: fsd ft0, 184(sp) +; ILP32-NEXT: fld ft0, 72(s1) +; ILP32-NEXT: fsd ft0, 176(sp) +; ILP32-NEXT: fld ft0, 80(s1) +; ILP32-NEXT: fsd ft0, 168(sp) +; ILP32-NEXT: fld ft0, 88(s1) +; ILP32-NEXT: fsd ft0, 160(sp) +; ILP32-NEXT: fld ft0, 96(s1) +; ILP32-NEXT: fsd ft0, 152(sp) +; ILP32-NEXT: fld ft0, 104(s1) +; ILP32-NEXT: fsd ft0, 144(sp) +; ILP32-NEXT: fld ft0, 112(s1) +; ILP32-NEXT: fsd ft0, 136(sp) +; ILP32-NEXT: fld ft0, 120(s1) +; ILP32-NEXT: fsd ft0, 128(sp) +; ILP32-NEXT: fld ft0, 128(s1) +; ILP32-NEXT: fsd ft0, 120(sp) +; ILP32-NEXT: fld ft0, 136(s1) +; ILP32-NEXT: fsd ft0, 112(sp) +; ILP32-NEXT: fld ft0, 144(s1) +; ILP32-NEXT: fsd ft0, 104(sp) +; ILP32-NEXT: fld ft0, 152(s1) +; ILP32-NEXT: fsd ft0, 96(sp) +; ILP32-NEXT: fld ft0, 160(s1) +; ILP32-NEXT: fsd ft0, 88(sp) +; ILP32-NEXT: fld ft0, 168(s1) +; ILP32-NEXT: fsd ft0, 80(sp) +; ILP32-NEXT: fld ft0, 176(s1) +; ILP32-NEXT: 
fsd ft0, 72(sp) +; ILP32-NEXT: fld ft0, 184(s1) +; ILP32-NEXT: fsd ft0, 64(sp) +; ILP32-NEXT: fld ft0, 192(s1) +; ILP32-NEXT: fsd ft0, 56(sp) +; ILP32-NEXT: fld ft0, 200(s1) +; ILP32-NEXT: fsd ft0, 48(sp) +; ILP32-NEXT: fld ft0, 208(s1) +; ILP32-NEXT: fsd ft0, 40(sp) +; ILP32-NEXT: fld ft0, 216(s1) +; ILP32-NEXT: fsd ft0, 32(sp) +; ILP32-NEXT: fld ft0, 224(s1) +; ILP32-NEXT: fsd ft0, 24(sp) +; ILP32-NEXT: fld ft0, 232(s1) +; ILP32-NEXT: fsd ft0, 16(sp) +; ILP32-NEXT: fld ft0, 240(s1) +; ILP32-NEXT: fsd ft0, 8(sp) +; ILP32-NEXT: fld ft0, 248(s1) +; ILP32-NEXT: fsd ft0, 0(sp) +; ILP32-NEXT: call callee +; ILP32-NEXT: fld ft0, 0(sp) +; ILP32-NEXT: fsd ft0, 248(s1) +; ILP32-NEXT: fld ft0, 8(sp) +; ILP32-NEXT: fsd ft0, 240(s1) +; ILP32-NEXT: fld ft0, 16(sp) +; ILP32-NEXT: fsd ft0, 232(s1) +; ILP32-NEXT: fld ft0, 24(sp) +; ILP32-NEXT: fsd ft0, 224(s1) +; ILP32-NEXT: fld ft0, 32(sp) +; ILP32-NEXT: fsd ft0, 216(s1) +; ILP32-NEXT: fld ft0, 40(sp) +; ILP32-NEXT: fsd ft0, 208(s1) +; ILP32-NEXT: fld ft0, 48(sp) +; ILP32-NEXT: fsd ft0, 200(s1) +; ILP32-NEXT: fld ft0, 56(sp) +; ILP32-NEXT: fsd ft0, 192(s1) +; ILP32-NEXT: fld ft0, 64(sp) +; ILP32-NEXT: fsd ft0, 184(s1) +; ILP32-NEXT: fld ft0, 72(sp) +; ILP32-NEXT: fsd ft0, 176(s1) +; ILP32-NEXT: fld ft0, 80(sp) +; ILP32-NEXT: fsd ft0, 168(s1) +; ILP32-NEXT: fld ft0, 88(sp) +; ILP32-NEXT: fsd ft0, 160(s1) +; ILP32-NEXT: fld ft0, 96(sp) +; ILP32-NEXT: fsd ft0, 152(s1) +; ILP32-NEXT: fld ft0, 104(sp) +; ILP32-NEXT: fsd ft0, 144(s1) +; ILP32-NEXT: fld ft0, 112(sp) +; ILP32-NEXT: fsd ft0, 136(s1) +; ILP32-NEXT: fld ft0, 120(sp) +; ILP32-NEXT: fsd ft0, 128(s1) +; ILP32-NEXT: fld ft0, 128(sp) +; ILP32-NEXT: fsd ft0, 120(s1) +; ILP32-NEXT: fld ft0, 136(sp) +; ILP32-NEXT: fsd ft0, 112(s1) +; ILP32-NEXT: fld ft0, 144(sp) +; ILP32-NEXT: fsd ft0, 104(s1) +; ILP32-NEXT: fld ft0, 152(sp) +; ILP32-NEXT: fsd ft0, 96(s1) +; ILP32-NEXT: fld ft0, 160(sp) +; ILP32-NEXT: fsd ft0, 88(s1) +; ILP32-NEXT: fld ft0, 168(sp) +; ILP32-NEXT: fsd ft0, 80(s1) 
+; ILP32-NEXT: fld ft0, 176(sp) +; ILP32-NEXT: fsd ft0, 72(s1) +; ILP32-NEXT: fld ft0, 184(sp) +; ILP32-NEXT: fsd ft0, 64(s1) +; ILP32-NEXT: fld ft0, 192(sp) +; ILP32-NEXT: fsd ft0, 56(s1) +; ILP32-NEXT: fld ft0, 200(sp) +; ILP32-NEXT: fsd ft0, 48(s1) +; ILP32-NEXT: fld ft0, 208(sp) +; ILP32-NEXT: fsd ft0, 40(s1) +; ILP32-NEXT: fld ft0, 216(sp) +; ILP32-NEXT: fsd ft0, 32(s1) +; ILP32-NEXT: fld ft0, 224(sp) +; ILP32-NEXT: fsd ft0, 24(s1) +; ILP32-NEXT: fld ft0, 232(sp) +; ILP32-NEXT: fsd ft0, 16(s1) +; ILP32-NEXT: fld ft0, 240(sp) +; ILP32-NEXT: fsd ft0, %lo(var+8)(s0) +; ILP32-NEXT: fld ft0, 248(sp) +; ILP32-NEXT: fsd ft0, %lo(var)(s0) +; ILP32-NEXT: lw s1, 260(sp) +; ILP32-NEXT: lw s0, 264(sp) +; ILP32-NEXT: lw ra, 268(sp) +; ILP32-NEXT: addi sp, sp, 272 +; ILP32-NEXT: ret +; +; LP64-LABEL: caller: +; LP64: # %bb.0: +; LP64-NEXT: addi sp, sp, -288 +; LP64-NEXT: sd ra, 280(sp) +; LP64-NEXT: sd s0, 272(sp) +; LP64-NEXT: sd s1, 264(sp) +; LP64-NEXT: lui s0, %hi(var) +; LP64-NEXT: fld ft0, %lo(var)(s0) +; LP64-NEXT: fsd ft0, 256(sp) +; LP64-NEXT: fld ft0, %lo(var+8)(s0) +; LP64-NEXT: fsd ft0, 248(sp) +; LP64-NEXT: addi s1, s0, %lo(var) +; LP64-NEXT: fld ft0, 16(s1) +; LP64-NEXT: fsd ft0, 240(sp) +; LP64-NEXT: fld ft0, 24(s1) +; LP64-NEXT: fsd ft0, 232(sp) +; LP64-NEXT: fld ft0, 32(s1) +; LP64-NEXT: fsd ft0, 224(sp) +; LP64-NEXT: fld ft0, 40(s1) +; LP64-NEXT: fsd ft0, 216(sp) +; LP64-NEXT: fld ft0, 48(s1) +; LP64-NEXT: fsd ft0, 208(sp) +; LP64-NEXT: fld ft0, 56(s1) +; LP64-NEXT: fsd ft0, 200(sp) +; LP64-NEXT: fld ft0, 64(s1) +; LP64-NEXT: fsd ft0, 192(sp) +; LP64-NEXT: fld ft0, 72(s1) +; LP64-NEXT: fsd ft0, 184(sp) +; LP64-NEXT: fld ft0, 80(s1) +; LP64-NEXT: fsd ft0, 176(sp) +; LP64-NEXT: fld ft0, 88(s1) +; LP64-NEXT: fsd ft0, 168(sp) +; LP64-NEXT: fld ft0, 96(s1) +; LP64-NEXT: fsd ft0, 160(sp) +; LP64-NEXT: fld ft0, 104(s1) +; LP64-NEXT: fsd ft0, 152(sp) +; LP64-NEXT: fld ft0, 112(s1) +; LP64-NEXT: fsd ft0, 144(sp) +; LP64-NEXT: fld ft0, 120(s1) +; LP64-NEXT: fsd ft0, 
136(sp) +; LP64-NEXT: fld ft0, 128(s1) +; LP64-NEXT: fsd ft0, 128(sp) +; LP64-NEXT: fld ft0, 136(s1) +; LP64-NEXT: fsd ft0, 120(sp) +; LP64-NEXT: fld ft0, 144(s1) +; LP64-NEXT: fsd ft0, 112(sp) +; LP64-NEXT: fld ft0, 152(s1) +; LP64-NEXT: fsd ft0, 104(sp) +; LP64-NEXT: fld ft0, 160(s1) +; LP64-NEXT: fsd ft0, 96(sp) +; LP64-NEXT: fld ft0, 168(s1) +; LP64-NEXT: fsd ft0, 88(sp) +; LP64-NEXT: fld ft0, 176(s1) +; LP64-NEXT: fsd ft0, 80(sp) +; LP64-NEXT: fld ft0, 184(s1) +; LP64-NEXT: fsd ft0, 72(sp) +; LP64-NEXT: fld ft0, 192(s1) +; LP64-NEXT: fsd ft0, 64(sp) +; LP64-NEXT: fld ft0, 200(s1) +; LP64-NEXT: fsd ft0, 56(sp) +; LP64-NEXT: fld ft0, 208(s1) +; LP64-NEXT: fsd ft0, 48(sp) +; LP64-NEXT: fld ft0, 216(s1) +; LP64-NEXT: fsd ft0, 40(sp) +; LP64-NEXT: fld ft0, 224(s1) +; LP64-NEXT: fsd ft0, 32(sp) +; LP64-NEXT: fld ft0, 232(s1) +; LP64-NEXT: fsd ft0, 24(sp) +; LP64-NEXT: fld ft0, 240(s1) +; LP64-NEXT: fsd ft0, 16(sp) +; LP64-NEXT: fld ft0, 248(s1) +; LP64-NEXT: fsd ft0, 8(sp) +; LP64-NEXT: call callee +; LP64-NEXT: fld ft0, 8(sp) +; LP64-NEXT: fsd ft0, 248(s1) +; LP64-NEXT: fld ft0, 16(sp) +; LP64-NEXT: fsd ft0, 240(s1) +; LP64-NEXT: fld ft0, 24(sp) +; LP64-NEXT: fsd ft0, 232(s1) +; LP64-NEXT: fld ft0, 32(sp) +; LP64-NEXT: fsd ft0, 224(s1) +; LP64-NEXT: fld ft0, 40(sp) +; LP64-NEXT: fsd ft0, 216(s1) +; LP64-NEXT: fld ft0, 48(sp) +; LP64-NEXT: fsd ft0, 208(s1) +; LP64-NEXT: fld ft0, 56(sp) +; LP64-NEXT: fsd ft0, 200(s1) +; LP64-NEXT: fld ft0, 64(sp) +; LP64-NEXT: fsd ft0, 192(s1) +; LP64-NEXT: fld ft0, 72(sp) +; LP64-NEXT: fsd ft0, 184(s1) +; LP64-NEXT: fld ft0, 80(sp) +; LP64-NEXT: fsd ft0, 176(s1) +; LP64-NEXT: fld ft0, 88(sp) +; LP64-NEXT: fsd ft0, 168(s1) +; LP64-NEXT: fld ft0, 96(sp) +; LP64-NEXT: fsd ft0, 160(s1) +; LP64-NEXT: fld ft0, 104(sp) +; LP64-NEXT: fsd ft0, 152(s1) +; LP64-NEXT: fld ft0, 112(sp) +; LP64-NEXT: fsd ft0, 144(s1) +; LP64-NEXT: fld ft0, 120(sp) +; LP64-NEXT: fsd ft0, 136(s1) +; LP64-NEXT: fld ft0, 128(sp) +; LP64-NEXT: fsd ft0, 128(s1) +; 
LP64-NEXT: fld ft0, 136(sp) +; LP64-NEXT: fsd ft0, 120(s1) +; LP64-NEXT: fld ft0, 144(sp) +; LP64-NEXT: fsd ft0, 112(s1) +; LP64-NEXT: fld ft0, 152(sp) +; LP64-NEXT: fsd ft0, 104(s1) +; LP64-NEXT: fld ft0, 160(sp) +; LP64-NEXT: fsd ft0, 96(s1) +; LP64-NEXT: fld ft0, 168(sp) +; LP64-NEXT: fsd ft0, 88(s1) +; LP64-NEXT: fld ft0, 176(sp) +; LP64-NEXT: fsd ft0, 80(s1) +; LP64-NEXT: fld ft0, 184(sp) +; LP64-NEXT: fsd ft0, 72(s1) +; LP64-NEXT: fld ft0, 192(sp) +; LP64-NEXT: fsd ft0, 64(s1) +; LP64-NEXT: fld ft0, 200(sp) +; LP64-NEXT: fsd ft0, 56(s1) +; LP64-NEXT: fld ft0, 208(sp) +; LP64-NEXT: fsd ft0, 48(s1) +; LP64-NEXT: fld ft0, 216(sp) +; LP64-NEXT: fsd ft0, 40(s1) +; LP64-NEXT: fld ft0, 224(sp) +; LP64-NEXT: fsd ft0, 32(s1) +; LP64-NEXT: fld ft0, 232(sp) +; LP64-NEXT: fsd ft0, 24(s1) +; LP64-NEXT: fld ft0, 240(sp) +; LP64-NEXT: fsd ft0, 16(s1) +; LP64-NEXT: fld ft0, 248(sp) +; LP64-NEXT: fsd ft0, %lo(var+8)(s0) +; LP64-NEXT: fld ft0, 256(sp) +; LP64-NEXT: fsd ft0, %lo(var)(s0) +; LP64-NEXT: ld s1, 264(sp) +; LP64-NEXT: ld s0, 272(sp) +; LP64-NEXT: ld ra, 280(sp) +; LP64-NEXT: addi sp, sp, 288 +; LP64-NEXT: ret +; +; ILP32D-LABEL: caller: +; ILP32D: # %bb.0: +; ILP32D-NEXT: addi sp, sp, -272 +; ILP32D-NEXT: sw ra, 268(sp) +; ILP32D-NEXT: sw s0, 264(sp) +; ILP32D-NEXT: sw s1, 260(sp) +; ILP32D-NEXT: fsd fs0, 248(sp) +; ILP32D-NEXT: fsd fs1, 240(sp) +; ILP32D-NEXT: fsd fs2, 232(sp) +; ILP32D-NEXT: fsd fs3, 224(sp) +; ILP32D-NEXT: fsd fs4, 216(sp) +; ILP32D-NEXT: fsd fs5, 208(sp) +; ILP32D-NEXT: fsd fs6, 200(sp) +; ILP32D-NEXT: fsd fs7, 192(sp) +; ILP32D-NEXT: fsd fs8, 184(sp) +; ILP32D-NEXT: fsd fs9, 176(sp) +; ILP32D-NEXT: fsd fs10, 168(sp) +; ILP32D-NEXT: fsd fs11, 160(sp) +; ILP32D-NEXT: lui s0, %hi(var) +; ILP32D-NEXT: fld ft0, %lo(var)(s0) +; ILP32D-NEXT: fsd ft0, 152(sp) +; ILP32D-NEXT: fld ft0, %lo(var+8)(s0) +; ILP32D-NEXT: fsd ft0, 144(sp) +; ILP32D-NEXT: addi s1, s0, %lo(var) +; ILP32D-NEXT: fld ft0, 16(s1) +; ILP32D-NEXT: fsd ft0, 136(sp) +; ILP32D-NEXT: fld 
ft0, 24(s1) +; ILP32D-NEXT: fsd ft0, 128(sp) +; ILP32D-NEXT: fld ft0, 32(s1) +; ILP32D-NEXT: fsd ft0, 120(sp) +; ILP32D-NEXT: fld ft0, 40(s1) +; ILP32D-NEXT: fsd ft0, 112(sp) +; ILP32D-NEXT: fld ft0, 48(s1) +; ILP32D-NEXT: fsd ft0, 104(sp) +; ILP32D-NEXT: fld ft0, 56(s1) +; ILP32D-NEXT: fsd ft0, 96(sp) +; ILP32D-NEXT: fld ft0, 64(s1) +; ILP32D-NEXT: fsd ft0, 88(sp) +; ILP32D-NEXT: fld ft0, 72(s1) +; ILP32D-NEXT: fsd ft0, 80(sp) +; ILP32D-NEXT: fld ft0, 80(s1) +; ILP32D-NEXT: fsd ft0, 72(sp) +; ILP32D-NEXT: fld ft0, 88(s1) +; ILP32D-NEXT: fsd ft0, 64(sp) +; ILP32D-NEXT: fld ft0, 96(s1) +; ILP32D-NEXT: fsd ft0, 56(sp) +; ILP32D-NEXT: fld ft0, 104(s1) +; ILP32D-NEXT: fsd ft0, 48(sp) +; ILP32D-NEXT: fld ft0, 112(s1) +; ILP32D-NEXT: fsd ft0, 40(sp) +; ILP32D-NEXT: fld ft0, 120(s1) +; ILP32D-NEXT: fsd ft0, 32(sp) +; ILP32D-NEXT: fld ft0, 128(s1) +; ILP32D-NEXT: fsd ft0, 24(sp) +; ILP32D-NEXT: fld ft0, 136(s1) +; ILP32D-NEXT: fsd ft0, 16(sp) +; ILP32D-NEXT: fld ft0, 144(s1) +; ILP32D-NEXT: fsd ft0, 8(sp) +; ILP32D-NEXT: fld ft0, 152(s1) +; ILP32D-NEXT: fsd ft0, 0(sp) +; ILP32D-NEXT: fld fs8, 160(s1) +; ILP32D-NEXT: fld fs9, 168(s1) +; ILP32D-NEXT: fld fs10, 176(s1) +; ILP32D-NEXT: fld fs11, 184(s1) +; ILP32D-NEXT: fld fs0, 192(s1) +; ILP32D-NEXT: fld fs1, 200(s1) +; ILP32D-NEXT: fld fs2, 208(s1) +; ILP32D-NEXT: fld fs3, 216(s1) +; ILP32D-NEXT: fld fs4, 224(s1) +; ILP32D-NEXT: fld fs5, 232(s1) +; ILP32D-NEXT: fld fs6, 240(s1) +; ILP32D-NEXT: fld fs7, 248(s1) +; ILP32D-NEXT: call callee +; ILP32D-NEXT: fsd fs7, 248(s1) +; ILP32D-NEXT: fsd fs6, 240(s1) +; ILP32D-NEXT: fsd fs5, 232(s1) +; ILP32D-NEXT: fsd fs4, 224(s1) +; ILP32D-NEXT: fsd fs3, 216(s1) +; ILP32D-NEXT: fsd fs2, 208(s1) +; ILP32D-NEXT: fsd fs1, 200(s1) +; ILP32D-NEXT: fsd fs0, 192(s1) +; ILP32D-NEXT: fsd fs11, 184(s1) +; ILP32D-NEXT: fsd fs10, 176(s1) +; ILP32D-NEXT: fsd fs9, 168(s1) +; ILP32D-NEXT: fsd fs8, 160(s1) +; ILP32D-NEXT: fld ft0, 0(sp) +; ILP32D-NEXT: fsd ft0, 152(s1) +; ILP32D-NEXT: fld ft0, 8(sp) +; 
ILP32D-NEXT: fsd ft0, 144(s1) +; ILP32D-NEXT: fld ft0, 16(sp) +; ILP32D-NEXT: fsd ft0, 136(s1) +; ILP32D-NEXT: fld ft0, 24(sp) +; ILP32D-NEXT: fsd ft0, 128(s1) +; ILP32D-NEXT: fld ft0, 32(sp) +; ILP32D-NEXT: fsd ft0, 120(s1) +; ILP32D-NEXT: fld ft0, 40(sp) +; ILP32D-NEXT: fsd ft0, 112(s1) +; ILP32D-NEXT: fld ft0, 48(sp) +; ILP32D-NEXT: fsd ft0, 104(s1) +; ILP32D-NEXT: fld ft0, 56(sp) +; ILP32D-NEXT: fsd ft0, 96(s1) +; ILP32D-NEXT: fld ft0, 64(sp) +; ILP32D-NEXT: fsd ft0, 88(s1) +; ILP32D-NEXT: fld ft0, 72(sp) +; ILP32D-NEXT: fsd ft0, 80(s1) +; ILP32D-NEXT: fld ft0, 80(sp) +; ILP32D-NEXT: fsd ft0, 72(s1) +; ILP32D-NEXT: fld ft0, 88(sp) +; ILP32D-NEXT: fsd ft0, 64(s1) +; ILP32D-NEXT: fld ft0, 96(sp) +; ILP32D-NEXT: fsd ft0, 56(s1) +; ILP32D-NEXT: fld ft0, 104(sp) +; ILP32D-NEXT: fsd ft0, 48(s1) +; ILP32D-NEXT: fld ft0, 112(sp) +; ILP32D-NEXT: fsd ft0, 40(s1) +; ILP32D-NEXT: fld ft0, 120(sp) +; ILP32D-NEXT: fsd ft0, 32(s1) +; ILP32D-NEXT: fld ft0, 128(sp) +; ILP32D-NEXT: fsd ft0, 24(s1) +; ILP32D-NEXT: fld ft0, 136(sp) +; ILP32D-NEXT: fsd ft0, 16(s1) +; ILP32D-NEXT: fld ft0, 144(sp) +; ILP32D-NEXT: fsd ft0, %lo(var+8)(s0) +; ILP32D-NEXT: fld ft0, 152(sp) +; ILP32D-NEXT: fsd ft0, %lo(var)(s0) +; ILP32D-NEXT: fld fs11, 160(sp) +; ILP32D-NEXT: fld fs10, 168(sp) +; ILP32D-NEXT: fld fs9, 176(sp) +; ILP32D-NEXT: fld fs8, 184(sp) +; ILP32D-NEXT: fld fs7, 192(sp) +; ILP32D-NEXT: fld fs6, 200(sp) +; ILP32D-NEXT: fld fs5, 208(sp) +; ILP32D-NEXT: fld fs4, 216(sp) +; ILP32D-NEXT: fld fs3, 224(sp) +; ILP32D-NEXT: fld fs2, 232(sp) +; ILP32D-NEXT: fld fs1, 240(sp) +; ILP32D-NEXT: fld fs0, 248(sp) +; ILP32D-NEXT: lw s1, 260(sp) +; ILP32D-NEXT: lw s0, 264(sp) +; ILP32D-NEXT: lw ra, 268(sp) +; ILP32D-NEXT: addi sp, sp, 272 +; ILP32D-NEXT: ret ; -; ILP32F-LP64D-LABEL: caller: -; ILP32D-LP64D: fld fs8, 160(s1) -; ILP32D-LP64D-NEXT: fld fs9, 168(s1) -; ILP32D-LP64D-NEXT: fld fs10, 176(s1) -; ILP32D-LP64D-NEXT: fld fs11, 184(s1) -; ILP32D-LP64D-NEXT: fld fs0, 192(s1) -; ILP32D-LP64D-NEXT: 
fld fs1, 200(s1) -; ILP32D-LP64D-NEXT: fld fs2, 208(s1) -; ILP32D-LP64D-NEXT: fld fs3, 216(s1) -; ILP32D-LP64D-NEXT: fld fs4, 224(s1) -; ILP32D-LP64D-NEXT: fld fs5, 232(s1) -; ILP32D-LP64D-NEXT: fld fs6, 240(s1) -; ILP32D-LP64D-NEXT: fld fs7, 248(s1) -; ILP32D-LP64D-NEXT: call callee -; ILP32D-LP64D-NEXT: fsd fs7, 248(s1) -; ILP32D-LP64D-NEXT: fsd fs6, 240(s1) -; ILP32D-LP64D-NEXT: fsd fs5, 232(s1) -; ILP32D-LP64D-NEXT: fsd fs4, 224(s1) -; ILP32D-LP64D-NEXT: fsd fs3, 216(s1) -; ILP32D-LP64D-NEXT: fsd fs2, 208(s1) -; ILP32D-LP64D-NEXT: fsd fs1, 200(s1) -; ILP32D-LP64D-NEXT: fsd fs0, 192(s1) -; ILP32D-LP64D-NEXT: fsd fs11, 184(s1) -; ILP32D-LP64D-NEXT: fsd fs10, 176(s1) -; ILP32D-LP64D-NEXT: fsd fs9, 168(s1) -; ILP32D-LP64D-NEXT: fsd fs8, 160(s1) -; ILP32D-LP64D-NEXT: fld ft0, {{[0-9]+}}(sp) +; LP64D-LABEL: caller: +; LP64D: # %bb.0: +; LP64D-NEXT: addi sp, sp, -288 +; LP64D-NEXT: sd ra, 280(sp) +; LP64D-NEXT: sd s0, 272(sp) +; LP64D-NEXT: sd s1, 264(sp) +; LP64D-NEXT: fsd fs0, 256(sp) +; LP64D-NEXT: fsd fs1, 248(sp) +; LP64D-NEXT: fsd fs2, 240(sp) +; LP64D-NEXT: fsd fs3, 232(sp) +; LP64D-NEXT: fsd fs4, 224(sp) +; LP64D-NEXT: fsd fs5, 216(sp) +; LP64D-NEXT: fsd fs6, 208(sp) +; LP64D-NEXT: fsd fs7, 200(sp) +; LP64D-NEXT: fsd fs8, 192(sp) +; LP64D-NEXT: fsd fs9, 184(sp) +; LP64D-NEXT: fsd fs10, 176(sp) +; LP64D-NEXT: fsd fs11, 168(sp) +; LP64D-NEXT: lui s0, %hi(var) +; LP64D-NEXT: fld ft0, %lo(var)(s0) +; LP64D-NEXT: fsd ft0, 160(sp) +; LP64D-NEXT: fld ft0, %lo(var+8)(s0) +; LP64D-NEXT: fsd ft0, 152(sp) +; LP64D-NEXT: addi s1, s0, %lo(var) +; LP64D-NEXT: fld ft0, 16(s1) +; LP64D-NEXT: fsd ft0, 144(sp) +; LP64D-NEXT: fld ft0, 24(s1) +; LP64D-NEXT: fsd ft0, 136(sp) +; LP64D-NEXT: fld ft0, 32(s1) +; LP64D-NEXT: fsd ft0, 128(sp) +; LP64D-NEXT: fld ft0, 40(s1) +; LP64D-NEXT: fsd ft0, 120(sp) +; LP64D-NEXT: fld ft0, 48(s1) +; LP64D-NEXT: fsd ft0, 112(sp) +; LP64D-NEXT: fld ft0, 56(s1) +; LP64D-NEXT: fsd ft0, 104(sp) +; LP64D-NEXT: fld ft0, 64(s1) +; LP64D-NEXT: fsd ft0, 
96(sp) +; LP64D-NEXT: fld ft0, 72(s1) +; LP64D-NEXT: fsd ft0, 88(sp) +; LP64D-NEXT: fld ft0, 80(s1) +; LP64D-NEXT: fsd ft0, 80(sp) +; LP64D-NEXT: fld ft0, 88(s1) +; LP64D-NEXT: fsd ft0, 72(sp) +; LP64D-NEXT: fld ft0, 96(s1) +; LP64D-NEXT: fsd ft0, 64(sp) +; LP64D-NEXT: fld ft0, 104(s1) +; LP64D-NEXT: fsd ft0, 56(sp) +; LP64D-NEXT: fld ft0, 112(s1) +; LP64D-NEXT: fsd ft0, 48(sp) +; LP64D-NEXT: fld ft0, 120(s1) +; LP64D-NEXT: fsd ft0, 40(sp) +; LP64D-NEXT: fld ft0, 128(s1) +; LP64D-NEXT: fsd ft0, 32(sp) +; LP64D-NEXT: fld ft0, 136(s1) +; LP64D-NEXT: fsd ft0, 24(sp) +; LP64D-NEXT: fld ft0, 144(s1) +; LP64D-NEXT: fsd ft0, 16(sp) +; LP64D-NEXT: fld ft0, 152(s1) +; LP64D-NEXT: fsd ft0, 8(sp) +; LP64D-NEXT: fld fs8, 160(s1) +; LP64D-NEXT: fld fs9, 168(s1) +; LP64D-NEXT: fld fs10, 176(s1) +; LP64D-NEXT: fld fs11, 184(s1) +; LP64D-NEXT: fld fs0, 192(s1) +; LP64D-NEXT: fld fs1, 200(s1) +; LP64D-NEXT: fld fs2, 208(s1) +; LP64D-NEXT: fld fs3, 216(s1) +; LP64D-NEXT: fld fs4, 224(s1) +; LP64D-NEXT: fld fs5, 232(s1) +; LP64D-NEXT: fld fs6, 240(s1) +; LP64D-NEXT: fld fs7, 248(s1) +; LP64D-NEXT: call callee +; LP64D-NEXT: fsd fs7, 248(s1) +; LP64D-NEXT: fsd fs6, 240(s1) +; LP64D-NEXT: fsd fs5, 232(s1) +; LP64D-NEXT: fsd fs4, 224(s1) +; LP64D-NEXT: fsd fs3, 216(s1) +; LP64D-NEXT: fsd fs2, 208(s1) +; LP64D-NEXT: fsd fs1, 200(s1) +; LP64D-NEXT: fsd fs0, 192(s1) +; LP64D-NEXT: fsd fs11, 184(s1) +; LP64D-NEXT: fsd fs10, 176(s1) +; LP64D-NEXT: fsd fs9, 168(s1) +; LP64D-NEXT: fsd fs8, 160(s1) +; LP64D-NEXT: fld ft0, 8(sp) +; LP64D-NEXT: fsd ft0, 152(s1) +; LP64D-NEXT: fld ft0, 16(sp) +; LP64D-NEXT: fsd ft0, 144(s1) +; LP64D-NEXT: fld ft0, 24(sp) +; LP64D-NEXT: fsd ft0, 136(s1) +; LP64D-NEXT: fld ft0, 32(sp) +; LP64D-NEXT: fsd ft0, 128(s1) +; LP64D-NEXT: fld ft0, 40(sp) +; LP64D-NEXT: fsd ft0, 120(s1) +; LP64D-NEXT: fld ft0, 48(sp) +; LP64D-NEXT: fsd ft0, 112(s1) +; LP64D-NEXT: fld ft0, 56(sp) +; LP64D-NEXT: fsd ft0, 104(s1) +; LP64D-NEXT: fld ft0, 64(sp) +; LP64D-NEXT: fsd ft0, 96(s1) +; 
LP64D-NEXT: fld ft0, 72(sp) +; LP64D-NEXT: fsd ft0, 88(s1) +; LP64D-NEXT: fld ft0, 80(sp) +; LP64D-NEXT: fsd ft0, 80(s1) +; LP64D-NEXT: fld ft0, 88(sp) +; LP64D-NEXT: fsd ft0, 72(s1) +; LP64D-NEXT: fld ft0, 96(sp) +; LP64D-NEXT: fsd ft0, 64(s1) +; LP64D-NEXT: fld ft0, 104(sp) +; LP64D-NEXT: fsd ft0, 56(s1) +; LP64D-NEXT: fld ft0, 112(sp) +; LP64D-NEXT: fsd ft0, 48(s1) +; LP64D-NEXT: fld ft0, 120(sp) +; LP64D-NEXT: fsd ft0, 40(s1) +; LP64D-NEXT: fld ft0, 128(sp) +; LP64D-NEXT: fsd ft0, 32(s1) +; LP64D-NEXT: fld ft0, 136(sp) +; LP64D-NEXT: fsd ft0, 24(s1) +; LP64D-NEXT: fld ft0, 144(sp) +; LP64D-NEXT: fsd ft0, 16(s1) +; LP64D-NEXT: fld ft0, 152(sp) +; LP64D-NEXT: fsd ft0, %lo(var+8)(s0) +; LP64D-NEXT: fld ft0, 160(sp) +; LP64D-NEXT: fsd ft0, %lo(var)(s0) +; LP64D-NEXT: fld fs11, 168(sp) +; LP64D-NEXT: fld fs10, 176(sp) +; LP64D-NEXT: fld fs9, 184(sp) +; LP64D-NEXT: fld fs8, 192(sp) +; LP64D-NEXT: fld fs7, 200(sp) +; LP64D-NEXT: fld fs6, 208(sp) +; LP64D-NEXT: fld fs5, 216(sp) +; LP64D-NEXT: fld fs4, 224(sp) +; LP64D-NEXT: fld fs3, 232(sp) +; LP64D-NEXT: fld fs2, 240(sp) +; LP64D-NEXT: fld fs1, 248(sp) +; LP64D-NEXT: fld fs0, 256(sp) +; LP64D-NEXT: ld s1, 264(sp) +; LP64D-NEXT: ld s0, 272(sp) +; LP64D-NEXT: ld ra, 280(sp) +; LP64D-NEXT: addi sp, sp, 288 +; LP64D-NEXT: ret %val = load [32 x double], [32 x double]* @var call void @callee() store volatile [32 x double] %val, [32 x double]* @var diff --git a/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll b/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll index eb3a4468bb9d..99c07a35226c 100644 --- a/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll +++ b/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll @@ -1,3 +1,4 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \ ; RUN: | FileCheck %s -check-prefix=RV32I ; RUN: llc -mtriple=riscv32 -mattr=+f -target-abi ilp32f -verify-machineinstrs < %s \ @@ -41,10 +42,99 @@ define void @callee() nounwind { ; 
RV32I-NEXT: sw s9, 36(sp) ; RV32I-NEXT: sw s10, 32(sp) ; RV32I-NEXT: sw s11, 28(sp) -; RV32I-NEXT: lui a0, %hi(var) -; RV32I-NEXT: lw a1, %lo(var)(a0) -; RV32I-NEXT: sw a1, 24(sp) -; RV32I-NEXT: addi a2, a0, %lo(var) +; RV32I-NEXT: lui a7, %hi(var) +; RV32I-NEXT: lw a0, %lo(var)(a7) +; RV32I-NEXT: sw a0, 24(sp) +; RV32I-NEXT: lw a0, %lo(var+4)(a7) +; RV32I-NEXT: sw a0, 20(sp) +; RV32I-NEXT: lw a0, %lo(var+8)(a7) +; RV32I-NEXT: sw a0, 16(sp) +; RV32I-NEXT: lw a0, %lo(var+12)(a7) +; RV32I-NEXT: sw a0, 12(sp) +; RV32I-NEXT: addi a5, a7, %lo(var) +; RV32I-NEXT: lw a0, 16(a5) +; RV32I-NEXT: sw a0, 8(sp) +; RV32I-NEXT: lw a0, 20(a5) +; RV32I-NEXT: sw a0, 4(sp) +; RV32I-NEXT: lw t4, 24(a5) +; RV32I-NEXT: lw t5, 28(a5) +; RV32I-NEXT: lw t6, 32(a5) +; RV32I-NEXT: lw s2, 36(a5) +; RV32I-NEXT: lw s3, 40(a5) +; RV32I-NEXT: lw s4, 44(a5) +; RV32I-NEXT: lw s5, 48(a5) +; RV32I-NEXT: lw s6, 52(a5) +; RV32I-NEXT: lw s7, 56(a5) +; RV32I-NEXT: lw s8, 60(a5) +; RV32I-NEXT: lw s9, 64(a5) +; RV32I-NEXT: lw s10, 68(a5) +; RV32I-NEXT: lw s11, 72(a5) +; RV32I-NEXT: lw ra, 76(a5) +; RV32I-NEXT: lw s1, 80(a5) +; RV32I-NEXT: lw t3, 84(a5) +; RV32I-NEXT: lw t2, 88(a5) +; RV32I-NEXT: lw t1, 92(a5) +; RV32I-NEXT: lw t0, 96(a5) +; RV32I-NEXT: lw s0, 100(a5) +; RV32I-NEXT: lw a6, 104(a5) +; RV32I-NEXT: lw a4, 108(a5) +; RV32I-NEXT: lw a0, 124(a5) +; RV32I-NEXT: lw a1, 120(a5) +; RV32I-NEXT: lw a2, 116(a5) +; RV32I-NEXT: lw a3, 112(a5) +; RV32I-NEXT: sw a0, 124(a5) +; RV32I-NEXT: sw a1, 120(a5) +; RV32I-NEXT: sw a2, 116(a5) +; RV32I-NEXT: sw a3, 112(a5) +; RV32I-NEXT: sw a4, 108(a5) +; RV32I-NEXT: sw a6, 104(a5) +; RV32I-NEXT: sw s0, 100(a5) +; RV32I-NEXT: sw t0, 96(a5) +; RV32I-NEXT: sw t1, 92(a5) +; RV32I-NEXT: sw t2, 88(a5) +; RV32I-NEXT: sw t3, 84(a5) +; RV32I-NEXT: sw s1, 80(a5) +; RV32I-NEXT: sw ra, 76(a5) +; RV32I-NEXT: sw s11, 72(a5) +; RV32I-NEXT: sw s10, 68(a5) +; RV32I-NEXT: sw s9, 64(a5) +; RV32I-NEXT: sw s8, 60(a5) +; RV32I-NEXT: sw s7, 56(a5) +; RV32I-NEXT: sw s6, 52(a5) +; 
RV32I-NEXT: sw s5, 48(a5) +; RV32I-NEXT: sw s4, 44(a5) +; RV32I-NEXT: sw s3, 40(a5) +; RV32I-NEXT: sw s2, 36(a5) +; RV32I-NEXT: sw t6, 32(a5) +; RV32I-NEXT: sw t5, 28(a5) +; RV32I-NEXT: sw t4, 24(a5) +; RV32I-NEXT: lw a0, 4(sp) +; RV32I-NEXT: sw a0, 20(a5) +; RV32I-NEXT: lw a0, 8(sp) +; RV32I-NEXT: sw a0, 16(a5) +; RV32I-NEXT: lw a0, 12(sp) +; RV32I-NEXT: sw a0, %lo(var+12)(a7) +; RV32I-NEXT: lw a0, 16(sp) +; RV32I-NEXT: sw a0, %lo(var+8)(a7) +; RV32I-NEXT: lw a0, 20(sp) +; RV32I-NEXT: sw a0, %lo(var+4)(a7) +; RV32I-NEXT: lw a0, 24(sp) +; RV32I-NEXT: sw a0, %lo(var)(a7) +; RV32I-NEXT: lw s11, 28(sp) +; RV32I-NEXT: lw s10, 32(sp) +; RV32I-NEXT: lw s9, 36(sp) +; RV32I-NEXT: lw s8, 40(sp) +; RV32I-NEXT: lw s7, 44(sp) +; RV32I-NEXT: lw s6, 48(sp) +; RV32I-NEXT: lw s5, 52(sp) +; RV32I-NEXT: lw s4, 56(sp) +; RV32I-NEXT: lw s3, 60(sp) +; RV32I-NEXT: lw s2, 64(sp) +; RV32I-NEXT: lw s1, 68(sp) +; RV32I-NEXT: lw s0, 72(sp) +; RV32I-NEXT: lw ra, 76(sp) +; RV32I-NEXT: addi sp, sp, 80 +; RV32I-NEXT: ret ; ; RV32I-WITH-FP-LABEL: callee: ; RV32I-WITH-FP: # %bb.0: @@ -63,31 +153,211 @@ define void @callee() nounwind { ; RV32I-WITH-FP-NEXT: sw s10, 32(sp) ; RV32I-WITH-FP-NEXT: sw s11, 28(sp) ; RV32I-WITH-FP-NEXT: addi s0, sp, 80 -; RV32I-WITH-FP-NEXT: lui a0, %hi(var) -; RV32I-WITH-FP-NEXT: lw a1, %lo(var)(a0) -; RV32I-WITH-FP-NEXT: sw a1, -56(s0) -; RV32I-WITH-FP-NEXT: addi a2, a0, %lo(var) +; RV32I-WITH-FP-NEXT: lui a7, %hi(var) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var)(a7) +; RV32I-WITH-FP-NEXT: sw a0, -56(s0) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var+4)(a7) +; RV32I-WITH-FP-NEXT: sw a0, -60(s0) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var+8)(a7) +; RV32I-WITH-FP-NEXT: sw a0, -64(s0) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var+12)(a7) +; RV32I-WITH-FP-NEXT: sw a0, -68(s0) +; RV32I-WITH-FP-NEXT: addi a5, a7, %lo(var) +; RV32I-WITH-FP-NEXT: lw a0, 16(a5) +; RV32I-WITH-FP-NEXT: sw a0, -72(s0) +; RV32I-WITH-FP-NEXT: lw a0, 20(a5) +; RV32I-WITH-FP-NEXT: sw a0, -76(s0) +; RV32I-WITH-FP-NEXT: lw a0, 
24(a5) +; RV32I-WITH-FP-NEXT: sw a0, -80(s0) +; RV32I-WITH-FP-NEXT: lw t5, 28(a5) +; RV32I-WITH-FP-NEXT: lw t6, 32(a5) +; RV32I-WITH-FP-NEXT: lw s2, 36(a5) +; RV32I-WITH-FP-NEXT: lw s3, 40(a5) +; RV32I-WITH-FP-NEXT: lw s4, 44(a5) +; RV32I-WITH-FP-NEXT: lw s5, 48(a5) +; RV32I-WITH-FP-NEXT: lw s6, 52(a5) +; RV32I-WITH-FP-NEXT: lw s7, 56(a5) +; RV32I-WITH-FP-NEXT: lw s8, 60(a5) +; RV32I-WITH-FP-NEXT: lw s9, 64(a5) +; RV32I-WITH-FP-NEXT: lw s10, 68(a5) +; RV32I-WITH-FP-NEXT: lw s11, 72(a5) +; RV32I-WITH-FP-NEXT: lw ra, 76(a5) +; RV32I-WITH-FP-NEXT: lw t4, 80(a5) +; RV32I-WITH-FP-NEXT: lw t3, 84(a5) +; RV32I-WITH-FP-NEXT: lw t2, 88(a5) +; RV32I-WITH-FP-NEXT: lw s1, 92(a5) +; RV32I-WITH-FP-NEXT: lw t1, 96(a5) +; RV32I-WITH-FP-NEXT: lw t0, 100(a5) +; RV32I-WITH-FP-NEXT: lw a6, 104(a5) +; RV32I-WITH-FP-NEXT: lw a4, 108(a5) +; RV32I-WITH-FP-NEXT: lw a0, 124(a5) +; RV32I-WITH-FP-NEXT: lw a1, 120(a5) +; RV32I-WITH-FP-NEXT: lw a2, 116(a5) +; RV32I-WITH-FP-NEXT: lw a3, 112(a5) +; RV32I-WITH-FP-NEXT: sw a0, 124(a5) +; RV32I-WITH-FP-NEXT: sw a1, 120(a5) +; RV32I-WITH-FP-NEXT: sw a2, 116(a5) +; RV32I-WITH-FP-NEXT: sw a3, 112(a5) +; RV32I-WITH-FP-NEXT: sw a4, 108(a5) +; RV32I-WITH-FP-NEXT: sw a6, 104(a5) +; RV32I-WITH-FP-NEXT: sw t0, 100(a5) +; RV32I-WITH-FP-NEXT: sw t1, 96(a5) +; RV32I-WITH-FP-NEXT: sw s1, 92(a5) +; RV32I-WITH-FP-NEXT: sw t2, 88(a5) +; RV32I-WITH-FP-NEXT: sw t3, 84(a5) +; RV32I-WITH-FP-NEXT: sw t4, 80(a5) +; RV32I-WITH-FP-NEXT: sw ra, 76(a5) +; RV32I-WITH-FP-NEXT: sw s11, 72(a5) +; RV32I-WITH-FP-NEXT: sw s10, 68(a5) +; RV32I-WITH-FP-NEXT: sw s9, 64(a5) +; RV32I-WITH-FP-NEXT: sw s8, 60(a5) +; RV32I-WITH-FP-NEXT: sw s7, 56(a5) +; RV32I-WITH-FP-NEXT: sw s6, 52(a5) +; RV32I-WITH-FP-NEXT: sw s5, 48(a5) +; RV32I-WITH-FP-NEXT: sw s4, 44(a5) +; RV32I-WITH-FP-NEXT: sw s3, 40(a5) +; RV32I-WITH-FP-NEXT: sw s2, 36(a5) +; RV32I-WITH-FP-NEXT: sw t6, 32(a5) +; RV32I-WITH-FP-NEXT: sw t5, 28(a5) +; RV32I-WITH-FP-NEXT: lw a0, -80(s0) +; RV32I-WITH-FP-NEXT: sw a0, 24(a5) +; 
RV32I-WITH-FP-NEXT: lw a0, -76(s0) +; RV32I-WITH-FP-NEXT: sw a0, 20(a5) +; RV32I-WITH-FP-NEXT: lw a0, -72(s0) +; RV32I-WITH-FP-NEXT: sw a0, 16(a5) +; RV32I-WITH-FP-NEXT: lw a0, -68(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var+12)(a7) +; RV32I-WITH-FP-NEXT: lw a0, -64(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var+8)(a7) +; RV32I-WITH-FP-NEXT: lw a0, -60(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var+4)(a7) +; RV32I-WITH-FP-NEXT: lw a0, -56(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var)(a7) +; RV32I-WITH-FP-NEXT: lw s11, 28(sp) +; RV32I-WITH-FP-NEXT: lw s10, 32(sp) +; RV32I-WITH-FP-NEXT: lw s9, 36(sp) +; RV32I-WITH-FP-NEXT: lw s8, 40(sp) +; RV32I-WITH-FP-NEXT: lw s7, 44(sp) +; RV32I-WITH-FP-NEXT: lw s6, 48(sp) +; RV32I-WITH-FP-NEXT: lw s5, 52(sp) +; RV32I-WITH-FP-NEXT: lw s4, 56(sp) +; RV32I-WITH-FP-NEXT: lw s3, 60(sp) +; RV32I-WITH-FP-NEXT: lw s2, 64(sp) +; RV32I-WITH-FP-NEXT: lw s1, 68(sp) +; RV32I-WITH-FP-NEXT: lw s0, 72(sp) +; RV32I-WITH-FP-NEXT: lw ra, 76(sp) +; RV32I-WITH-FP-NEXT: addi sp, sp, 80 +; RV32I-WITH-FP-NEXT: ret ; ; RV64I-LABEL: callee: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -144 -; RV64I-NEXT: sd ra, 136(sp) -; RV64I-NEXT: sd s0, 128(sp) -; RV64I-NEXT: sd s1, 120(sp) -; RV64I-NEXT: sd s2, 112(sp) -; RV64I-NEXT: sd s3, 104(sp) -; RV64I-NEXT: sd s4, 96(sp) -; RV64I-NEXT: sd s5, 88(sp) -; RV64I-NEXT: sd s6, 80(sp) -; RV64I-NEXT: sd s7, 72(sp) -; RV64I-NEXT: sd s8, 64(sp) -; RV64I-NEXT: sd s9, 56(sp) -; RV64I-NEXT: sd s10, 48(sp) -; RV64I-NEXT: sd s11, 40(sp) -; RV64I-NEXT: lui a0, %hi(var) -; RV64I-NEXT: lw a1, %lo(var)(a0) -; RV64I-NEXT: sd a1, 32(sp) -; RV64I-NEXT: addi a2, a0, %lo(var) +; RV64I-NEXT: addi sp, sp, -160 +; RV64I-NEXT: sd ra, 152(sp) +; RV64I-NEXT: sd s0, 144(sp) +; RV64I-NEXT: sd s1, 136(sp) +; RV64I-NEXT: sd s2, 128(sp) +; RV64I-NEXT: sd s3, 120(sp) +; RV64I-NEXT: sd s4, 112(sp) +; RV64I-NEXT: sd s5, 104(sp) +; RV64I-NEXT: sd s6, 96(sp) +; RV64I-NEXT: sd s7, 88(sp) +; RV64I-NEXT: sd s8, 80(sp) +; RV64I-NEXT: sd s9, 72(sp) +; RV64I-NEXT: sd 
s10, 64(sp) +; RV64I-NEXT: sd s11, 56(sp) +; RV64I-NEXT: lui a7, %hi(var) +; RV64I-NEXT: lw a0, %lo(var)(a7) +; RV64I-NEXT: sd a0, 48(sp) +; RV64I-NEXT: lw a0, %lo(var+4)(a7) +; RV64I-NEXT: sd a0, 40(sp) +; RV64I-NEXT: lw a0, %lo(var+8)(a7) +; RV64I-NEXT: sd a0, 32(sp) +; RV64I-NEXT: lw a0, %lo(var+12)(a7) +; RV64I-NEXT: sd a0, 24(sp) +; RV64I-NEXT: addi a5, a7, %lo(var) +; RV64I-NEXT: lw a0, 16(a5) +; RV64I-NEXT: sd a0, 16(sp) +; RV64I-NEXT: lw a0, 20(a5) +; RV64I-NEXT: sd a0, 8(sp) +; RV64I-NEXT: lw t4, 24(a5) +; RV64I-NEXT: lw t5, 28(a5) +; RV64I-NEXT: lw t6, 32(a5) +; RV64I-NEXT: lw s2, 36(a5) +; RV64I-NEXT: lw s3, 40(a5) +; RV64I-NEXT: lw s4, 44(a5) +; RV64I-NEXT: lw s5, 48(a5) +; RV64I-NEXT: lw s6, 52(a5) +; RV64I-NEXT: lw s7, 56(a5) +; RV64I-NEXT: lw s8, 60(a5) +; RV64I-NEXT: lw s9, 64(a5) +; RV64I-NEXT: lw s10, 68(a5) +; RV64I-NEXT: lw s11, 72(a5) +; RV64I-NEXT: lw ra, 76(a5) +; RV64I-NEXT: lw s1, 80(a5) +; RV64I-NEXT: lw t3, 84(a5) +; RV64I-NEXT: lw t2, 88(a5) +; RV64I-NEXT: lw t1, 92(a5) +; RV64I-NEXT: lw t0, 96(a5) +; RV64I-NEXT: lw s0, 100(a5) +; RV64I-NEXT: lw a6, 104(a5) +; RV64I-NEXT: lw a4, 108(a5) +; RV64I-NEXT: lw a0, 124(a5) +; RV64I-NEXT: lw a1, 120(a5) +; RV64I-NEXT: lw a2, 116(a5) +; RV64I-NEXT: lw a3, 112(a5) +; RV64I-NEXT: sw a0, 124(a5) +; RV64I-NEXT: sw a1, 120(a5) +; RV64I-NEXT: sw a2, 116(a5) +; RV64I-NEXT: sw a3, 112(a5) +; RV64I-NEXT: sw a4, 108(a5) +; RV64I-NEXT: sw a6, 104(a5) +; RV64I-NEXT: sw s0, 100(a5) +; RV64I-NEXT: sw t0, 96(a5) +; RV64I-NEXT: sw t1, 92(a5) +; RV64I-NEXT: sw t2, 88(a5) +; RV64I-NEXT: sw t3, 84(a5) +; RV64I-NEXT: sw s1, 80(a5) +; RV64I-NEXT: sw ra, 76(a5) +; RV64I-NEXT: sw s11, 72(a5) +; RV64I-NEXT: sw s10, 68(a5) +; RV64I-NEXT: sw s9, 64(a5) +; RV64I-NEXT: sw s8, 60(a5) +; RV64I-NEXT: sw s7, 56(a5) +; RV64I-NEXT: sw s6, 52(a5) +; RV64I-NEXT: sw s5, 48(a5) +; RV64I-NEXT: sw s4, 44(a5) +; RV64I-NEXT: sw s3, 40(a5) +; RV64I-NEXT: sw s2, 36(a5) +; RV64I-NEXT: sw t6, 32(a5) +; RV64I-NEXT: sw t5, 28(a5) +; 
RV64I-NEXT: sw t4, 24(a5) +; RV64I-NEXT: ld a0, 8(sp) +; RV64I-NEXT: sw a0, 20(a5) +; RV64I-NEXT: ld a0, 16(sp) +; RV64I-NEXT: sw a0, 16(a5) +; RV64I-NEXT: ld a0, 24(sp) +; RV64I-NEXT: sw a0, %lo(var+12)(a7) +; RV64I-NEXT: ld a0, 32(sp) +; RV64I-NEXT: sw a0, %lo(var+8)(a7) +; RV64I-NEXT: ld a0, 40(sp) +; RV64I-NEXT: sw a0, %lo(var+4)(a7) +; RV64I-NEXT: ld a0, 48(sp) +; RV64I-NEXT: sw a0, %lo(var)(a7) +; RV64I-NEXT: ld s11, 56(sp) +; RV64I-NEXT: ld s10, 64(sp) +; RV64I-NEXT: ld s9, 72(sp) +; RV64I-NEXT: ld s8, 80(sp) +; RV64I-NEXT: ld s7, 88(sp) +; RV64I-NEXT: ld s6, 96(sp) +; RV64I-NEXT: ld s5, 104(sp) +; RV64I-NEXT: ld s4, 112(sp) +; RV64I-NEXT: ld s3, 120(sp) +; RV64I-NEXT: ld s2, 128(sp) +; RV64I-NEXT: ld s1, 136(sp) +; RV64I-NEXT: ld s0, 144(sp) +; RV64I-NEXT: ld ra, 152(sp) +; RV64I-NEXT: addi sp, sp, 160 +; RV64I-NEXT: ret ; ; RV64I-WITH-FP-LABEL: callee: ; RV64I-WITH-FP: # %bb.0: @@ -106,10 +376,101 @@ define void @callee() nounwind { ; RV64I-WITH-FP-NEXT: sd s10, 64(sp) ; RV64I-WITH-FP-NEXT: sd s11, 56(sp) ; RV64I-WITH-FP-NEXT: addi s0, sp, 160 -; RV64I-WITH-FP-NEXT: lui a0, %hi(var) -; RV64I-WITH-FP-NEXT: lw a1, %lo(var)(a0) -; RV64I-WITH-FP-NEXT: sd a1, -112(s0) -; RV64I-WITH-FP-NEXT: addi a2, a0, %lo(var) +; RV64I-WITH-FP-NEXT: lui a7, %hi(var) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var)(a7) +; RV64I-WITH-FP-NEXT: sd a0, -112(s0) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var+4)(a7) +; RV64I-WITH-FP-NEXT: sd a0, -120(s0) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var+8)(a7) +; RV64I-WITH-FP-NEXT: sd a0, -128(s0) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var+12)(a7) +; RV64I-WITH-FP-NEXT: sd a0, -136(s0) +; RV64I-WITH-FP-NEXT: addi a5, a7, %lo(var) +; RV64I-WITH-FP-NEXT: lw a0, 16(a5) +; RV64I-WITH-FP-NEXT: sd a0, -144(s0) +; RV64I-WITH-FP-NEXT: lw a0, 20(a5) +; RV64I-WITH-FP-NEXT: sd a0, -152(s0) +; RV64I-WITH-FP-NEXT: lw a0, 24(a5) +; RV64I-WITH-FP-NEXT: sd a0, -160(s0) +; RV64I-WITH-FP-NEXT: lw t5, 28(a5) +; RV64I-WITH-FP-NEXT: lw t6, 32(a5) +; RV64I-WITH-FP-NEXT: lw s2, 36(a5) +; 
RV64I-WITH-FP-NEXT: lw s3, 40(a5) +; RV64I-WITH-FP-NEXT: lw s4, 44(a5) +; RV64I-WITH-FP-NEXT: lw s5, 48(a5) +; RV64I-WITH-FP-NEXT: lw s6, 52(a5) +; RV64I-WITH-FP-NEXT: lw s7, 56(a5) +; RV64I-WITH-FP-NEXT: lw s8, 60(a5) +; RV64I-WITH-FP-NEXT: lw s9, 64(a5) +; RV64I-WITH-FP-NEXT: lw s10, 68(a5) +; RV64I-WITH-FP-NEXT: lw s11, 72(a5) +; RV64I-WITH-FP-NEXT: lw ra, 76(a5) +; RV64I-WITH-FP-NEXT: lw t4, 80(a5) +; RV64I-WITH-FP-NEXT: lw t3, 84(a5) +; RV64I-WITH-FP-NEXT: lw t2, 88(a5) +; RV64I-WITH-FP-NEXT: lw s1, 92(a5) +; RV64I-WITH-FP-NEXT: lw t1, 96(a5) +; RV64I-WITH-FP-NEXT: lw t0, 100(a5) +; RV64I-WITH-FP-NEXT: lw a6, 104(a5) +; RV64I-WITH-FP-NEXT: lw a4, 108(a5) +; RV64I-WITH-FP-NEXT: lw a0, 124(a5) +; RV64I-WITH-FP-NEXT: lw a1, 120(a5) +; RV64I-WITH-FP-NEXT: lw a2, 116(a5) +; RV64I-WITH-FP-NEXT: lw a3, 112(a5) +; RV64I-WITH-FP-NEXT: sw a0, 124(a5) +; RV64I-WITH-FP-NEXT: sw a1, 120(a5) +; RV64I-WITH-FP-NEXT: sw a2, 116(a5) +; RV64I-WITH-FP-NEXT: sw a3, 112(a5) +; RV64I-WITH-FP-NEXT: sw a4, 108(a5) +; RV64I-WITH-FP-NEXT: sw a6, 104(a5) +; RV64I-WITH-FP-NEXT: sw t0, 100(a5) +; RV64I-WITH-FP-NEXT: sw t1, 96(a5) +; RV64I-WITH-FP-NEXT: sw s1, 92(a5) +; RV64I-WITH-FP-NEXT: sw t2, 88(a5) +; RV64I-WITH-FP-NEXT: sw t3, 84(a5) +; RV64I-WITH-FP-NEXT: sw t4, 80(a5) +; RV64I-WITH-FP-NEXT: sw ra, 76(a5) +; RV64I-WITH-FP-NEXT: sw s11, 72(a5) +; RV64I-WITH-FP-NEXT: sw s10, 68(a5) +; RV64I-WITH-FP-NEXT: sw s9, 64(a5) +; RV64I-WITH-FP-NEXT: sw s8, 60(a5) +; RV64I-WITH-FP-NEXT: sw s7, 56(a5) +; RV64I-WITH-FP-NEXT: sw s6, 52(a5) +; RV64I-WITH-FP-NEXT: sw s5, 48(a5) +; RV64I-WITH-FP-NEXT: sw s4, 44(a5) +; RV64I-WITH-FP-NEXT: sw s3, 40(a5) +; RV64I-WITH-FP-NEXT: sw s2, 36(a5) +; RV64I-WITH-FP-NEXT: sw t6, 32(a5) +; RV64I-WITH-FP-NEXT: sw t5, 28(a5) +; RV64I-WITH-FP-NEXT: ld a0, -160(s0) +; RV64I-WITH-FP-NEXT: sw a0, 24(a5) +; RV64I-WITH-FP-NEXT: ld a0, -152(s0) +; RV64I-WITH-FP-NEXT: sw a0, 20(a5) +; RV64I-WITH-FP-NEXT: ld a0, -144(s0) +; RV64I-WITH-FP-NEXT: sw a0, 16(a5) +; 
RV64I-WITH-FP-NEXT: ld a0, -136(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var+12)(a7) +; RV64I-WITH-FP-NEXT: ld a0, -128(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var+8)(a7) +; RV64I-WITH-FP-NEXT: ld a0, -120(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var+4)(a7) +; RV64I-WITH-FP-NEXT: ld a0, -112(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var)(a7) +; RV64I-WITH-FP-NEXT: ld s11, 56(sp) +; RV64I-WITH-FP-NEXT: ld s10, 64(sp) +; RV64I-WITH-FP-NEXT: ld s9, 72(sp) +; RV64I-WITH-FP-NEXT: ld s8, 80(sp) +; RV64I-WITH-FP-NEXT: ld s7, 88(sp) +; RV64I-WITH-FP-NEXT: ld s6, 96(sp) +; RV64I-WITH-FP-NEXT: ld s5, 104(sp) +; RV64I-WITH-FP-NEXT: ld s4, 112(sp) +; RV64I-WITH-FP-NEXT: ld s3, 120(sp) +; RV64I-WITH-FP-NEXT: ld s2, 128(sp) +; RV64I-WITH-FP-NEXT: ld s1, 136(sp) +; RV64I-WITH-FP-NEXT: ld s0, 144(sp) +; RV64I-WITH-FP-NEXT: ld ra, 152(sp) +; RV64I-WITH-FP-NEXT: addi sp, sp, 160 +; RV64I-WITH-FP-NEXT: ret %val = load [32 x i32], [32 x i32]* @var store volatile [32 x i32] %val, [32 x i32]* @var ret void @@ -120,127 +481,583 @@ define void @callee() nounwind { define void @caller() nounwind { ; RV32I-LABEL: caller: -; RV32I: lui a0, %hi(var) -; RV32I-NEXT: lw a1, %lo(var)(a0) -; RV32I-NEXT: sw a1, 88(sp) -; RV32I-NEXT: addi s0, a0, %lo(var) - -; RV32I: sw a0, 8(sp) -; RV32I-NEXT: lw s2, 84(s0) -; RV32I-NEXT: lw s3, 88(s0) -; RV32I-NEXT: lw s4, 92(s0) -; RV32I-NEXT: lw s5, 96(s0) -; RV32I-NEXT: lw s6, 100(s0) -; RV32I-NEXT: lw s7, 104(s0) -; RV32I-NEXT: lw s8, 108(s0) -; RV32I-NEXT: lw s9, 112(s0) -; RV32I-NEXT: lw s10, 116(s0) -; RV32I-NEXT: lw s11, 120(s0) -; RV32I-NEXT: lw s1, 124(s0) +; RV32I: # %bb.0: +; RV32I-NEXT: addi sp, sp, -144 +; RV32I-NEXT: sw ra, 140(sp) +; RV32I-NEXT: sw s0, 136(sp) +; RV32I-NEXT: sw s1, 132(sp) +; RV32I-NEXT: sw s2, 128(sp) +; RV32I-NEXT: sw s3, 124(sp) +; RV32I-NEXT: sw s4, 120(sp) +; RV32I-NEXT: sw s5, 116(sp) +; RV32I-NEXT: sw s6, 112(sp) +; RV32I-NEXT: sw s7, 108(sp) +; RV32I-NEXT: sw s8, 104(sp) +; RV32I-NEXT: sw s9, 100(sp) +; RV32I-NEXT: sw s10, 96(sp) +; 
RV32I-NEXT: sw s11, 92(sp) +; RV32I-NEXT: lui s0, %hi(var) +; RV32I-NEXT: lw a0, %lo(var)(s0) +; RV32I-NEXT: sw a0, 88(sp) +; RV32I-NEXT: lw a0, %lo(var+4)(s0) +; RV32I-NEXT: sw a0, 84(sp) +; RV32I-NEXT: lw a0, %lo(var+8)(s0) +; RV32I-NEXT: sw a0, 80(sp) +; RV32I-NEXT: lw a0, %lo(var+12)(s0) +; RV32I-NEXT: sw a0, 76(sp) +; RV32I-NEXT: addi s1, s0, %lo(var) +; RV32I-NEXT: lw a0, 16(s1) +; RV32I-NEXT: sw a0, 72(sp) +; RV32I-NEXT: lw a0, 20(s1) +; RV32I-NEXT: sw a0, 68(sp) +; RV32I-NEXT: lw a0, 24(s1) +; RV32I-NEXT: sw a0, 64(sp) +; RV32I-NEXT: lw a0, 28(s1) +; RV32I-NEXT: sw a0, 60(sp) +; RV32I-NEXT: lw a0, 32(s1) +; RV32I-NEXT: sw a0, 56(sp) +; RV32I-NEXT: lw a0, 36(s1) +; RV32I-NEXT: sw a0, 52(sp) +; RV32I-NEXT: lw a0, 40(s1) +; RV32I-NEXT: sw a0, 48(sp) +; RV32I-NEXT: lw a0, 44(s1) +; RV32I-NEXT: sw a0, 44(sp) +; RV32I-NEXT: lw a0, 48(s1) +; RV32I-NEXT: sw a0, 40(sp) +; RV32I-NEXT: lw a0, 52(s1) +; RV32I-NEXT: sw a0, 36(sp) +; RV32I-NEXT: lw a0, 56(s1) +; RV32I-NEXT: sw a0, 32(sp) +; RV32I-NEXT: lw a0, 60(s1) +; RV32I-NEXT: sw a0, 28(sp) +; RV32I-NEXT: lw a0, 64(s1) +; RV32I-NEXT: sw a0, 24(sp) +; RV32I-NEXT: lw a0, 68(s1) +; RV32I-NEXT: sw a0, 20(sp) +; RV32I-NEXT: lw a0, 72(s1) +; RV32I-NEXT: sw a0, 16(sp) +; RV32I-NEXT: lw a0, 76(s1) +; RV32I-NEXT: sw a0, 12(sp) +; RV32I-NEXT: lw a0, 80(s1) +; RV32I-NEXT: sw a0, 8(sp) +; RV32I-NEXT: lw a0, 84(s1) +; RV32I-NEXT: sw a0, 4(sp) +; RV32I-NEXT: lw s4, 88(s1) +; RV32I-NEXT: lw s5, 92(s1) +; RV32I-NEXT: lw s6, 96(s1) +; RV32I-NEXT: lw s7, 100(s1) +; RV32I-NEXT: lw s8, 104(s1) +; RV32I-NEXT: lw s9, 108(s1) +; RV32I-NEXT: lw s10, 112(s1) +; RV32I-NEXT: lw s11, 116(s1) +; RV32I-NEXT: lw s2, 120(s1) +; RV32I-NEXT: lw s3, 124(s1) ; RV32I-NEXT: call callee -; RV32I-NEXT: sw s1, 124(s0) -; RV32I-NEXT: sw s11, 120(s0) -; RV32I-NEXT: sw s10, 116(s0) -; RV32I-NEXT: sw s9, 112(s0) -; RV32I-NEXT: sw s8, 108(s0) -; RV32I-NEXT: sw s7, 104(s0) -; RV32I-NEXT: sw s6, 100(s0) -; RV32I-NEXT: sw s5, 96(s0) -; RV32I-NEXT: sw s4, 92(s0) -; 
RV32I-NEXT: sw s3, 88(s0) -; RV32I-NEXT: sw s2, 84(s0) +; RV32I-NEXT: sw s3, 124(s1) +; RV32I-NEXT: sw s2, 120(s1) +; RV32I-NEXT: sw s11, 116(s1) +; RV32I-NEXT: sw s10, 112(s1) +; RV32I-NEXT: sw s9, 108(s1) +; RV32I-NEXT: sw s8, 104(s1) +; RV32I-NEXT: sw s7, 100(s1) +; RV32I-NEXT: sw s6, 96(s1) +; RV32I-NEXT: sw s5, 92(s1) +; RV32I-NEXT: sw s4, 88(s1) +; RV32I-NEXT: lw a0, 4(sp) +; RV32I-NEXT: sw a0, 84(s1) ; RV32I-NEXT: lw a0, 8(sp) +; RV32I-NEXT: sw a0, 80(s1) +; RV32I-NEXT: lw a0, 12(sp) +; RV32I-NEXT: sw a0, 76(s1) +; RV32I-NEXT: lw a0, 16(sp) +; RV32I-NEXT: sw a0, 72(s1) +; RV32I-NEXT: lw a0, 20(sp) +; RV32I-NEXT: sw a0, 68(s1) +; RV32I-NEXT: lw a0, 24(sp) +; RV32I-NEXT: sw a0, 64(s1) +; RV32I-NEXT: lw a0, 28(sp) +; RV32I-NEXT: sw a0, 60(s1) +; RV32I-NEXT: lw a0, 32(sp) +; RV32I-NEXT: sw a0, 56(s1) +; RV32I-NEXT: lw a0, 36(sp) +; RV32I-NEXT: sw a0, 52(s1) +; RV32I-NEXT: lw a0, 40(sp) +; RV32I-NEXT: sw a0, 48(s1) +; RV32I-NEXT: lw a0, 44(sp) +; RV32I-NEXT: sw a0, 44(s1) +; RV32I-NEXT: lw a0, 48(sp) +; RV32I-NEXT: sw a0, 40(s1) +; RV32I-NEXT: lw a0, 52(sp) +; RV32I-NEXT: sw a0, 36(s1) +; RV32I-NEXT: lw a0, 56(sp) +; RV32I-NEXT: sw a0, 32(s1) +; RV32I-NEXT: lw a0, 60(sp) +; RV32I-NEXT: sw a0, 28(s1) +; RV32I-NEXT: lw a0, 64(sp) +; RV32I-NEXT: sw a0, 24(s1) +; RV32I-NEXT: lw a0, 68(sp) +; RV32I-NEXT: sw a0, 20(s1) +; RV32I-NEXT: lw a0, 72(sp) +; RV32I-NEXT: sw a0, 16(s1) +; RV32I-NEXT: lw a0, 76(sp) +; RV32I-NEXT: sw a0, %lo(var+12)(s0) +; RV32I-NEXT: lw a0, 80(sp) +; RV32I-NEXT: sw a0, %lo(var+8)(s0) +; RV32I-NEXT: lw a0, 84(sp) +; RV32I-NEXT: sw a0, %lo(var+4)(s0) +; RV32I-NEXT: lw a0, 88(sp) +; RV32I-NEXT: sw a0, %lo(var)(s0) +; RV32I-NEXT: lw s11, 92(sp) +; RV32I-NEXT: lw s10, 96(sp) +; RV32I-NEXT: lw s9, 100(sp) +; RV32I-NEXT: lw s8, 104(sp) +; RV32I-NEXT: lw s7, 108(sp) +; RV32I-NEXT: lw s6, 112(sp) +; RV32I-NEXT: lw s5, 116(sp) +; RV32I-NEXT: lw s4, 120(sp) +; RV32I-NEXT: lw s3, 124(sp) +; RV32I-NEXT: lw s2, 128(sp) +; RV32I-NEXT: lw s1, 132(sp) +; 
RV32I-NEXT: lw s0, 136(sp) +; RV32I-NEXT: lw ra, 140(sp) +; RV32I-NEXT: addi sp, sp, 144 +; RV32I-NEXT: ret ; ; RV32I-WITH-FP-LABEL: caller: -; RV32I-WITH-FP: addi s0, sp, 144 -; RV32I-WITH-FP-NEXT: lui a0, %hi(var) -; RV32I-WITH-FP-NEXT: lw a1, %lo(var)(a0) -; RV32I-WITH-FP-NEXT: sw a1, -56(s0) -; RV32I-WITH-FP-NEXT: addi s1, a0, %lo(var) -; RV32I-WITH-FP: sw a0, -140(s0) -; RV32I-WITH-FP-NEXT: lw s5, 88(s1) -; RV32I-WITH-FP-NEXT: lw s6, 92(s1) -; RV32I-WITH-FP-NEXT: lw s7, 96(s1) -; RV32I-WITH-FP-NEXT: lw s8, 100(s1) -; RV32I-WITH-FP-NEXT: lw s9, 104(s1) -; RV32I-WITH-FP-NEXT: lw s10, 108(s1) -; RV32I-WITH-FP-NEXT: lw s11, 112(s1) -; RV32I-WITH-FP-NEXT: lw s2, 116(s1) -; RV32I-WITH-FP-NEXT: lw s3, 120(s1) -; RV32I-WITH-FP-NEXT: lw s4, 124(s1) +; RV32I-WITH-FP: # %bb.0: +; RV32I-WITH-FP-NEXT: addi sp, sp, -144 +; RV32I-WITH-FP-NEXT: sw ra, 140(sp) +; RV32I-WITH-FP-NEXT: sw s0, 136(sp) +; RV32I-WITH-FP-NEXT: sw s1, 132(sp) +; RV32I-WITH-FP-NEXT: sw s2, 128(sp) +; RV32I-WITH-FP-NEXT: sw s3, 124(sp) +; RV32I-WITH-FP-NEXT: sw s4, 120(sp) +; RV32I-WITH-FP-NEXT: sw s5, 116(sp) +; RV32I-WITH-FP-NEXT: sw s6, 112(sp) +; RV32I-WITH-FP-NEXT: sw s7, 108(sp) +; RV32I-WITH-FP-NEXT: sw s8, 104(sp) +; RV32I-WITH-FP-NEXT: sw s9, 100(sp) +; RV32I-WITH-FP-NEXT: sw s10, 96(sp) +; RV32I-WITH-FP-NEXT: sw s11, 92(sp) +; RV32I-WITH-FP-NEXT: addi s0, sp, 144 +; RV32I-WITH-FP-NEXT: lui s6, %hi(var) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var)(s6) +; RV32I-WITH-FP-NEXT: sw a0, -56(s0) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var+4)(s6) +; RV32I-WITH-FP-NEXT: sw a0, -60(s0) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var+8)(s6) +; RV32I-WITH-FP-NEXT: sw a0, -64(s0) +; RV32I-WITH-FP-NEXT: lw a0, %lo(var+12)(s6) +; RV32I-WITH-FP-NEXT: sw a0, -68(s0) +; RV32I-WITH-FP-NEXT: addi s1, s6, %lo(var) +; RV32I-WITH-FP-NEXT: lw a0, 16(s1) +; RV32I-WITH-FP-NEXT: sw a0, -72(s0) +; RV32I-WITH-FP-NEXT: lw a0, 20(s1) +; RV32I-WITH-FP-NEXT: sw a0, -76(s0) +; RV32I-WITH-FP-NEXT: lw a0, 24(s1) +; RV32I-WITH-FP-NEXT: sw a0, -80(s0) 
+; RV32I-WITH-FP-NEXT: lw a0, 28(s1) +; RV32I-WITH-FP-NEXT: sw a0, -84(s0) +; RV32I-WITH-FP-NEXT: lw a0, 32(s1) +; RV32I-WITH-FP-NEXT: sw a0, -88(s0) +; RV32I-WITH-FP-NEXT: lw a0, 36(s1) +; RV32I-WITH-FP-NEXT: sw a0, -92(s0) +; RV32I-WITH-FP-NEXT: lw a0, 40(s1) +; RV32I-WITH-FP-NEXT: sw a0, -96(s0) +; RV32I-WITH-FP-NEXT: lw a0, 44(s1) +; RV32I-WITH-FP-NEXT: sw a0, -100(s0) +; RV32I-WITH-FP-NEXT: lw a0, 48(s1) +; RV32I-WITH-FP-NEXT: sw a0, -104(s0) +; RV32I-WITH-FP-NEXT: lw a0, 52(s1) +; RV32I-WITH-FP-NEXT: sw a0, -108(s0) +; RV32I-WITH-FP-NEXT: lw a0, 56(s1) +; RV32I-WITH-FP-NEXT: sw a0, -112(s0) +; RV32I-WITH-FP-NEXT: lw a0, 60(s1) +; RV32I-WITH-FP-NEXT: sw a0, -116(s0) +; RV32I-WITH-FP-NEXT: lw a0, 64(s1) +; RV32I-WITH-FP-NEXT: sw a0, -120(s0) +; RV32I-WITH-FP-NEXT: lw a0, 68(s1) +; RV32I-WITH-FP-NEXT: sw a0, -124(s0) +; RV32I-WITH-FP-NEXT: lw a0, 72(s1) +; RV32I-WITH-FP-NEXT: sw a0, -128(s0) +; RV32I-WITH-FP-NEXT: lw a0, 76(s1) +; RV32I-WITH-FP-NEXT: sw a0, -132(s0) +; RV32I-WITH-FP-NEXT: lw a0, 80(s1) +; RV32I-WITH-FP-NEXT: sw a0, -136(s0) +; RV32I-WITH-FP-NEXT: lw a0, 84(s1) +; RV32I-WITH-FP-NEXT: sw a0, -140(s0) +; RV32I-WITH-FP-NEXT: lw a0, 88(s1) +; RV32I-WITH-FP-NEXT: sw a0, -144(s0) +; RV32I-WITH-FP-NEXT: lw s8, 92(s1) +; RV32I-WITH-FP-NEXT: lw s9, 96(s1) +; RV32I-WITH-FP-NEXT: lw s10, 100(s1) +; RV32I-WITH-FP-NEXT: lw s11, 104(s1) +; RV32I-WITH-FP-NEXT: lw s2, 108(s1) +; RV32I-WITH-FP-NEXT: lw s3, 112(s1) +; RV32I-WITH-FP-NEXT: lw s4, 116(s1) +; RV32I-WITH-FP-NEXT: lw s5, 120(s1) +; RV32I-WITH-FP-NEXT: lw s7, 124(s1) ; RV32I-WITH-FP-NEXT: call callee -; RV32I-WITH-FP-NEXT: sw s4, 124(s1) -; RV32I-WITH-FP-NEXT: sw s3, 120(s1) -; RV32I-WITH-FP-NEXT: sw s2, 116(s1) -; RV32I-WITH-FP-NEXT: sw s11, 112(s1) -; RV32I-WITH-FP-NEXT: sw s10, 108(s1) -; RV32I-WITH-FP-NEXT: sw s9, 104(s1) -; RV32I-WITH-FP-NEXT: sw s8, 100(s1) -; RV32I-WITH-FP-NEXT: sw s7, 96(s1) -; RV32I-WITH-FP-NEXT: sw s6, 92(s1) -; RV32I-WITH-FP-NEXT: sw s5, 88(s1) +; RV32I-WITH-FP-NEXT: sw s7, 
124(s1) +; RV32I-WITH-FP-NEXT: sw s5, 120(s1) +; RV32I-WITH-FP-NEXT: sw s4, 116(s1) +; RV32I-WITH-FP-NEXT: sw s3, 112(s1) +; RV32I-WITH-FP-NEXT: sw s2, 108(s1) +; RV32I-WITH-FP-NEXT: sw s11, 104(s1) +; RV32I-WITH-FP-NEXT: sw s10, 100(s1) +; RV32I-WITH-FP-NEXT: sw s9, 96(s1) +; RV32I-WITH-FP-NEXT: sw s8, 92(s1) +; RV32I-WITH-FP-NEXT: lw a0, -144(s0) +; RV32I-WITH-FP-NEXT: sw a0, 88(s1) ; RV32I-WITH-FP-NEXT: lw a0, -140(s0) +; RV32I-WITH-FP-NEXT: sw a0, 84(s1) +; RV32I-WITH-FP-NEXT: lw a0, -136(s0) +; RV32I-WITH-FP-NEXT: sw a0, 80(s1) +; RV32I-WITH-FP-NEXT: lw a0, -132(s0) +; RV32I-WITH-FP-NEXT: sw a0, 76(s1) +; RV32I-WITH-FP-NEXT: lw a0, -128(s0) +; RV32I-WITH-FP-NEXT: sw a0, 72(s1) +; RV32I-WITH-FP-NEXT: lw a0, -124(s0) +; RV32I-WITH-FP-NEXT: sw a0, 68(s1) +; RV32I-WITH-FP-NEXT: lw a0, -120(s0) +; RV32I-WITH-FP-NEXT: sw a0, 64(s1) +; RV32I-WITH-FP-NEXT: lw a0, -116(s0) +; RV32I-WITH-FP-NEXT: sw a0, 60(s1) +; RV32I-WITH-FP-NEXT: lw a0, -112(s0) +; RV32I-WITH-FP-NEXT: sw a0, 56(s1) +; RV32I-WITH-FP-NEXT: lw a0, -108(s0) +; RV32I-WITH-FP-NEXT: sw a0, 52(s1) +; RV32I-WITH-FP-NEXT: lw a0, -104(s0) +; RV32I-WITH-FP-NEXT: sw a0, 48(s1) +; RV32I-WITH-FP-NEXT: lw a0, -100(s0) +; RV32I-WITH-FP-NEXT: sw a0, 44(s1) +; RV32I-WITH-FP-NEXT: lw a0, -96(s0) +; RV32I-WITH-FP-NEXT: sw a0, 40(s1) +; RV32I-WITH-FP-NEXT: lw a0, -92(s0) +; RV32I-WITH-FP-NEXT: sw a0, 36(s1) +; RV32I-WITH-FP-NEXT: lw a0, -88(s0) +; RV32I-WITH-FP-NEXT: sw a0, 32(s1) +; RV32I-WITH-FP-NEXT: lw a0, -84(s0) +; RV32I-WITH-FP-NEXT: sw a0, 28(s1) +; RV32I-WITH-FP-NEXT: lw a0, -80(s0) +; RV32I-WITH-FP-NEXT: sw a0, 24(s1) +; RV32I-WITH-FP-NEXT: lw a0, -76(s0) +; RV32I-WITH-FP-NEXT: sw a0, 20(s1) +; RV32I-WITH-FP-NEXT: lw a0, -72(s0) +; RV32I-WITH-FP-NEXT: sw a0, 16(s1) +; RV32I-WITH-FP-NEXT: lw a0, -68(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var+12)(s6) +; RV32I-WITH-FP-NEXT: lw a0, -64(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var+8)(s6) +; RV32I-WITH-FP-NEXT: lw a0, -60(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var+4)(s6) 
+; RV32I-WITH-FP-NEXT: lw a0, -56(s0) +; RV32I-WITH-FP-NEXT: sw a0, %lo(var)(s6) +; RV32I-WITH-FP-NEXT: lw s11, 92(sp) +; RV32I-WITH-FP-NEXT: lw s10, 96(sp) +; RV32I-WITH-FP-NEXT: lw s9, 100(sp) +; RV32I-WITH-FP-NEXT: lw s8, 104(sp) +; RV32I-WITH-FP-NEXT: lw s7, 108(sp) +; RV32I-WITH-FP-NEXT: lw s6, 112(sp) +; RV32I-WITH-FP-NEXT: lw s5, 116(sp) +; RV32I-WITH-FP-NEXT: lw s4, 120(sp) +; RV32I-WITH-FP-NEXT: lw s3, 124(sp) +; RV32I-WITH-FP-NEXT: lw s2, 128(sp) +; RV32I-WITH-FP-NEXT: lw s1, 132(sp) +; RV32I-WITH-FP-NEXT: lw s0, 136(sp) +; RV32I-WITH-FP-NEXT: lw ra, 140(sp) +; RV32I-WITH-FP-NEXT: addi sp, sp, 144 +; RV32I-WITH-FP-NEXT: ret ; ; RV64I-LABEL: caller: -; RV64I: lui a0, %hi(var) -; RV64I-NEXT: lw a1, %lo(var)(a0) -; RV64I-NEXT: sd a1, 160(sp) -; RV64I-NEXT: addi s0, a0, %lo(var) -; RV64I: sd a0, 0(sp) -; RV64I-NEXT: lw s2, 84(s0) -; RV64I-NEXT: lw s3, 88(s0) -; RV64I-NEXT: lw s4, 92(s0) -; RV64I-NEXT: lw s5, 96(s0) -; RV64I-NEXT: lw s6, 100(s0) -; RV64I-NEXT: lw s7, 104(s0) -; RV64I-NEXT: lw s8, 108(s0) -; RV64I-NEXT: lw s9, 112(s0) -; RV64I-NEXT: lw s10, 116(s0) -; RV64I-NEXT: lw s11, 120(s0) -; RV64I-NEXT: lw s1, 124(s0) +; RV64I: # %bb.0: +; RV64I-NEXT: addi sp, sp, -288 +; RV64I-NEXT: sd ra, 280(sp) +; RV64I-NEXT: sd s0, 272(sp) +; RV64I-NEXT: sd s1, 264(sp) +; RV64I-NEXT: sd s2, 256(sp) +; RV64I-NEXT: sd s3, 248(sp) +; RV64I-NEXT: sd s4, 240(sp) +; RV64I-NEXT: sd s5, 232(sp) +; RV64I-NEXT: sd s6, 224(sp) +; RV64I-NEXT: sd s7, 216(sp) +; RV64I-NEXT: sd s8, 208(sp) +; RV64I-NEXT: sd s9, 200(sp) +; RV64I-NEXT: sd s10, 192(sp) +; RV64I-NEXT: sd s11, 184(sp) +; RV64I-NEXT: lui s0, %hi(var) +; RV64I-NEXT: lw a0, %lo(var)(s0) +; RV64I-NEXT: sd a0, 176(sp) +; RV64I-NEXT: lw a0, %lo(var+4)(s0) +; RV64I-NEXT: sd a0, 168(sp) +; RV64I-NEXT: lw a0, %lo(var+8)(s0) +; RV64I-NEXT: sd a0, 160(sp) +; RV64I-NEXT: lw a0, %lo(var+12)(s0) +; RV64I-NEXT: sd a0, 152(sp) +; RV64I-NEXT: addi s1, s0, %lo(var) +; RV64I-NEXT: lw a0, 16(s1) +; RV64I-NEXT: sd a0, 144(sp) +; 
RV64I-NEXT: lw a0, 20(s1) +; RV64I-NEXT: sd a0, 136(sp) +; RV64I-NEXT: lw a0, 24(s1) +; RV64I-NEXT: sd a0, 128(sp) +; RV64I-NEXT: lw a0, 28(s1) +; RV64I-NEXT: sd a0, 120(sp) +; RV64I-NEXT: lw a0, 32(s1) +; RV64I-NEXT: sd a0, 112(sp) +; RV64I-NEXT: lw a0, 36(s1) +; RV64I-NEXT: sd a0, 104(sp) +; RV64I-NEXT: lw a0, 40(s1) +; RV64I-NEXT: sd a0, 96(sp) +; RV64I-NEXT: lw a0, 44(s1) +; RV64I-NEXT: sd a0, 88(sp) +; RV64I-NEXT: lw a0, 48(s1) +; RV64I-NEXT: sd a0, 80(sp) +; RV64I-NEXT: lw a0, 52(s1) +; RV64I-NEXT: sd a0, 72(sp) +; RV64I-NEXT: lw a0, 56(s1) +; RV64I-NEXT: sd a0, 64(sp) +; RV64I-NEXT: lw a0, 60(s1) +; RV64I-NEXT: sd a0, 56(sp) +; RV64I-NEXT: lw a0, 64(s1) +; RV64I-NEXT: sd a0, 48(sp) +; RV64I-NEXT: lw a0, 68(s1) +; RV64I-NEXT: sd a0, 40(sp) +; RV64I-NEXT: lw a0, 72(s1) +; RV64I-NEXT: sd a0, 32(sp) +; RV64I-NEXT: lw a0, 76(s1) +; RV64I-NEXT: sd a0, 24(sp) +; RV64I-NEXT: lw a0, 80(s1) +; RV64I-NEXT: sd a0, 16(sp) +; RV64I-NEXT: lw a0, 84(s1) +; RV64I-NEXT: sd a0, 8(sp) +; RV64I-NEXT: lw s4, 88(s1) +; RV64I-NEXT: lw s5, 92(s1) +; RV64I-NEXT: lw s6, 96(s1) +; RV64I-NEXT: lw s7, 100(s1) +; RV64I-NEXT: lw s8, 104(s1) +; RV64I-NEXT: lw s9, 108(s1) +; RV64I-NEXT: lw s10, 112(s1) +; RV64I-NEXT: lw s11, 116(s1) +; RV64I-NEXT: lw s2, 120(s1) +; RV64I-NEXT: lw s3, 124(s1) ; RV64I-NEXT: call callee -; RV64I-NEXT: sw s1, 124(s0) -; RV64I-NEXT: sw s11, 120(s0) -; RV64I-NEXT: sw s10, 116(s0) -; RV64I-NEXT: sw s9, 112(s0) -; RV64I-NEXT: sw s8, 108(s0) -; RV64I-NEXT: sw s7, 104(s0) -; RV64I-NEXT: sw s6, 100(s0) -; RV64I-NEXT: sw s5, 96(s0) -; RV64I-NEXT: sw s4, 92(s0) -; RV64I-NEXT: sw s3, 88(s0) -; RV64I-NEXT: sw s2, 84(s0) -; RV64I-NEXT: ld a0, 0(sp) +; RV64I-NEXT: sw s3, 124(s1) +; RV64I-NEXT: sw s2, 120(s1) +; RV64I-NEXT: sw s11, 116(s1) +; RV64I-NEXT: sw s10, 112(s1) +; RV64I-NEXT: sw s9, 108(s1) +; RV64I-NEXT: sw s8, 104(s1) +; RV64I-NEXT: sw s7, 100(s1) +; RV64I-NEXT: sw s6, 96(s1) +; RV64I-NEXT: sw s5, 92(s1) +; RV64I-NEXT: sw s4, 88(s1) +; RV64I-NEXT: ld a0, 8(sp) +; 
RV64I-NEXT: sw a0, 84(s1) +; RV64I-NEXT: ld a0, 16(sp) +; RV64I-NEXT: sw a0, 80(s1) +; RV64I-NEXT: ld a0, 24(sp) +; RV64I-NEXT: sw a0, 76(s1) +; RV64I-NEXT: ld a0, 32(sp) +; RV64I-NEXT: sw a0, 72(s1) +; RV64I-NEXT: ld a0, 40(sp) +; RV64I-NEXT: sw a0, 68(s1) +; RV64I-NEXT: ld a0, 48(sp) +; RV64I-NEXT: sw a0, 64(s1) +; RV64I-NEXT: ld a0, 56(sp) +; RV64I-NEXT: sw a0, 60(s1) +; RV64I-NEXT: ld a0, 64(sp) +; RV64I-NEXT: sw a0, 56(s1) +; RV64I-NEXT: ld a0, 72(sp) +; RV64I-NEXT: sw a0, 52(s1) +; RV64I-NEXT: ld a0, 80(sp) +; RV64I-NEXT: sw a0, 48(s1) +; RV64I-NEXT: ld a0, 88(sp) +; RV64I-NEXT: sw a0, 44(s1) +; RV64I-NEXT: ld a0, 96(sp) +; RV64I-NEXT: sw a0, 40(s1) +; RV64I-NEXT: ld a0, 104(sp) +; RV64I-NEXT: sw a0, 36(s1) +; RV64I-NEXT: ld a0, 112(sp) +; RV64I-NEXT: sw a0, 32(s1) +; RV64I-NEXT: ld a0, 120(sp) +; RV64I-NEXT: sw a0, 28(s1) +; RV64I-NEXT: ld a0, 128(sp) +; RV64I-NEXT: sw a0, 24(s1) +; RV64I-NEXT: ld a0, 136(sp) +; RV64I-NEXT: sw a0, 20(s1) +; RV64I-NEXT: ld a0, 144(sp) +; RV64I-NEXT: sw a0, 16(s1) +; RV64I-NEXT: ld a0, 152(sp) +; RV64I-NEXT: sw a0, %lo(var+12)(s0) +; RV64I-NEXT: ld a0, 160(sp) +; RV64I-NEXT: sw a0, %lo(var+8)(s0) +; RV64I-NEXT: ld a0, 168(sp) +; RV64I-NEXT: sw a0, %lo(var+4)(s0) +; RV64I-NEXT: ld a0, 176(sp) +; RV64I-NEXT: sw a0, %lo(var)(s0) +; RV64I-NEXT: ld s11, 184(sp) +; RV64I-NEXT: ld s10, 192(sp) +; RV64I-NEXT: ld s9, 200(sp) +; RV64I-NEXT: ld s8, 208(sp) +; RV64I-NEXT: ld s7, 216(sp) +; RV64I-NEXT: ld s6, 224(sp) +; RV64I-NEXT: ld s5, 232(sp) +; RV64I-NEXT: ld s4, 240(sp) +; RV64I-NEXT: ld s3, 248(sp) +; RV64I-NEXT: ld s2, 256(sp) +; RV64I-NEXT: ld s1, 264(sp) +; RV64I-NEXT: ld s0, 272(sp) +; RV64I-NEXT: ld ra, 280(sp) +; RV64I-NEXT: addi sp, sp, 288 +; RV64I-NEXT: ret ; ; RV64I-WITH-FP-LABEL: caller: -; RV64I-WITH-FP: addi s0, sp, 288 -; RV64I-WITH-FP-NEXT: lui a0, %hi(var) -; RV64I-WITH-FP-NEXT: lw a1, %lo(var)(a0) -; RV64I-WITH-FP-NEXT: sd a1, -112(s0) -; RV64I-WITH-FP-NEXT: addi s1, a0, %lo(var) -; RV64I-WITH-FP: sd a0, -280(s0) -; 
RV64I-WITH-FP-NEXT: lw s5, 88(s1) -; RV64I-WITH-FP-NEXT: lw s6, 92(s1) -; RV64I-WITH-FP-NEXT: lw s7, 96(s1) -; RV64I-WITH-FP-NEXT: lw s8, 100(s1) -; RV64I-WITH-FP-NEXT: lw s9, 104(s1) -; RV64I-WITH-FP-NEXT: lw s10, 108(s1) -; RV64I-WITH-FP-NEXT: lw s11, 112(s1) -; RV64I-WITH-FP-NEXT: lw s2, 116(s1) -; RV64I-WITH-FP-NEXT: lw s3, 120(s1) -; RV64I-WITH-FP-NEXT: lw s4, 124(s1) +; RV64I-WITH-FP: # %bb.0: +; RV64I-WITH-FP-NEXT: addi sp, sp, -288 +; RV64I-WITH-FP-NEXT: sd ra, 280(sp) +; RV64I-WITH-FP-NEXT: sd s0, 272(sp) +; RV64I-WITH-FP-NEXT: sd s1, 264(sp) +; RV64I-WITH-FP-NEXT: sd s2, 256(sp) +; RV64I-WITH-FP-NEXT: sd s3, 248(sp) +; RV64I-WITH-FP-NEXT: sd s4, 240(sp) +; RV64I-WITH-FP-NEXT: sd s5, 232(sp) +; RV64I-WITH-FP-NEXT: sd s6, 224(sp) +; RV64I-WITH-FP-NEXT: sd s7, 216(sp) +; RV64I-WITH-FP-NEXT: sd s8, 208(sp) +; RV64I-WITH-FP-NEXT: sd s9, 200(sp) +; RV64I-WITH-FP-NEXT: sd s10, 192(sp) +; RV64I-WITH-FP-NEXT: sd s11, 184(sp) +; RV64I-WITH-FP-NEXT: addi s0, sp, 288 +; RV64I-WITH-FP-NEXT: lui s6, %hi(var) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var)(s6) +; RV64I-WITH-FP-NEXT: sd a0, -112(s0) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var+4)(s6) +; RV64I-WITH-FP-NEXT: sd a0, -120(s0) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var+8)(s6) +; RV64I-WITH-FP-NEXT: sd a0, -128(s0) +; RV64I-WITH-FP-NEXT: lw a0, %lo(var+12)(s6) +; RV64I-WITH-FP-NEXT: sd a0, -136(s0) +; RV64I-WITH-FP-NEXT: addi s1, s6, %lo(var) +; RV64I-WITH-FP-NEXT: lw a0, 16(s1) +; RV64I-WITH-FP-NEXT: sd a0, -144(s0) +; RV64I-WITH-FP-NEXT: lw a0, 20(s1) +; RV64I-WITH-FP-NEXT: sd a0, -152(s0) +; RV64I-WITH-FP-NEXT: lw a0, 24(s1) +; RV64I-WITH-FP-NEXT: sd a0, -160(s0) +; RV64I-WITH-FP-NEXT: lw a0, 28(s1) +; RV64I-WITH-FP-NEXT: sd a0, -168(s0) +; RV64I-WITH-FP-NEXT: lw a0, 32(s1) +; RV64I-WITH-FP-NEXT: sd a0, -176(s0) +; RV64I-WITH-FP-NEXT: lw a0, 36(s1) +; RV64I-WITH-FP-NEXT: sd a0, -184(s0) +; RV64I-WITH-FP-NEXT: lw a0, 40(s1) +; RV64I-WITH-FP-NEXT: sd a0, -192(s0) +; RV64I-WITH-FP-NEXT: lw a0, 44(s1) +; RV64I-WITH-FP-NEXT: sd a0, 
-200(s0) +; RV64I-WITH-FP-NEXT: lw a0, 48(s1) +; RV64I-WITH-FP-NEXT: sd a0, -208(s0) +; RV64I-WITH-FP-NEXT: lw a0, 52(s1) +; RV64I-WITH-FP-NEXT: sd a0, -216(s0) +; RV64I-WITH-FP-NEXT: lw a0, 56(s1) +; RV64I-WITH-FP-NEXT: sd a0, -224(s0) +; RV64I-WITH-FP-NEXT: lw a0, 60(s1) +; RV64I-WITH-FP-NEXT: sd a0, -232(s0) +; RV64I-WITH-FP-NEXT: lw a0, 64(s1) +; RV64I-WITH-FP-NEXT: sd a0, -240(s0) +; RV64I-WITH-FP-NEXT: lw a0, 68(s1) +; RV64I-WITH-FP-NEXT: sd a0, -248(s0) +; RV64I-WITH-FP-NEXT: lw a0, 72(s1) +; RV64I-WITH-FP-NEXT: sd a0, -256(s0) +; RV64I-WITH-FP-NEXT: lw a0, 76(s1) +; RV64I-WITH-FP-NEXT: sd a0, -264(s0) +; RV64I-WITH-FP-NEXT: lw a0, 80(s1) +; RV64I-WITH-FP-NEXT: sd a0, -272(s0) +; RV64I-WITH-FP-NEXT: lw a0, 84(s1) +; RV64I-WITH-FP-NEXT: sd a0, -280(s0) +; RV64I-WITH-FP-NEXT: lw a0, 88(s1) +; RV64I-WITH-FP-NEXT: sd a0, -288(s0) +; RV64I-WITH-FP-NEXT: lw s8, 92(s1) +; RV64I-WITH-FP-NEXT: lw s9, 96(s1) +; RV64I-WITH-FP-NEXT: lw s10, 100(s1) +; RV64I-WITH-FP-NEXT: lw s11, 104(s1) +; RV64I-WITH-FP-NEXT: lw s2, 108(s1) +; RV64I-WITH-FP-NEXT: lw s3, 112(s1) +; RV64I-WITH-FP-NEXT: lw s4, 116(s1) +; RV64I-WITH-FP-NEXT: lw s5, 120(s1) +; RV64I-WITH-FP-NEXT: lw s7, 124(s1) ; RV64I-WITH-FP-NEXT: call callee -; RV64I-WITH-FP-NEXT: sw s4, 124(s1) -; RV64I-WITH-FP-NEXT: sw s3, 120(s1) -; RV64I-WITH-FP-NEXT: sw s2, 116(s1) -; RV64I-WITH-FP-NEXT: sw s11, 112(s1) -; RV64I-WITH-FP-NEXT: sw s10, 108(s1) -; RV64I-WITH-FP-NEXT: sw s9, 104(s1) -; RV64I-WITH-FP-NEXT: sw s8, 100(s1) -; RV64I-WITH-FP-NEXT: sw s7, 96(s1) -; RV64I-WITH-FP-NEXT: sw s6, 92(s1) -; RV64I-WITH-FP-NEXT: sw s5, 88(s1) +; RV64I-WITH-FP-NEXT: sw s7, 124(s1) +; RV64I-WITH-FP-NEXT: sw s5, 120(s1) +; RV64I-WITH-FP-NEXT: sw s4, 116(s1) +; RV64I-WITH-FP-NEXT: sw s3, 112(s1) +; RV64I-WITH-FP-NEXT: sw s2, 108(s1) +; RV64I-WITH-FP-NEXT: sw s11, 104(s1) +; RV64I-WITH-FP-NEXT: sw s10, 100(s1) +; RV64I-WITH-FP-NEXT: sw s9, 96(s1) +; RV64I-WITH-FP-NEXT: sw s8, 92(s1) +; RV64I-WITH-FP-NEXT: ld a0, -288(s0) +; 
RV64I-WITH-FP-NEXT: sw a0, 88(s1) ; RV64I-WITH-FP-NEXT: ld a0, -280(s0) +; RV64I-WITH-FP-NEXT: sw a0, 84(s1) +; RV64I-WITH-FP-NEXT: ld a0, -272(s0) +; RV64I-WITH-FP-NEXT: sw a0, 80(s1) +; RV64I-WITH-FP-NEXT: ld a0, -264(s0) +; RV64I-WITH-FP-NEXT: sw a0, 76(s1) +; RV64I-WITH-FP-NEXT: ld a0, -256(s0) +; RV64I-WITH-FP-NEXT: sw a0, 72(s1) +; RV64I-WITH-FP-NEXT: ld a0, -248(s0) +; RV64I-WITH-FP-NEXT: sw a0, 68(s1) +; RV64I-WITH-FP-NEXT: ld a0, -240(s0) +; RV64I-WITH-FP-NEXT: sw a0, 64(s1) +; RV64I-WITH-FP-NEXT: ld a0, -232(s0) +; RV64I-WITH-FP-NEXT: sw a0, 60(s1) +; RV64I-WITH-FP-NEXT: ld a0, -224(s0) +; RV64I-WITH-FP-NEXT: sw a0, 56(s1) +; RV64I-WITH-FP-NEXT: ld a0, -216(s0) +; RV64I-WITH-FP-NEXT: sw a0, 52(s1) +; RV64I-WITH-FP-NEXT: ld a0, -208(s0) +; RV64I-WITH-FP-NEXT: sw a0, 48(s1) +; RV64I-WITH-FP-NEXT: ld a0, -200(s0) +; RV64I-WITH-FP-NEXT: sw a0, 44(s1) +; RV64I-WITH-FP-NEXT: ld a0, -192(s0) +; RV64I-WITH-FP-NEXT: sw a0, 40(s1) +; RV64I-WITH-FP-NEXT: ld a0, -184(s0) +; RV64I-WITH-FP-NEXT: sw a0, 36(s1) +; RV64I-WITH-FP-NEXT: ld a0, -176(s0) +; RV64I-WITH-FP-NEXT: sw a0, 32(s1) +; RV64I-WITH-FP-NEXT: ld a0, -168(s0) +; RV64I-WITH-FP-NEXT: sw a0, 28(s1) +; RV64I-WITH-FP-NEXT: ld a0, -160(s0) +; RV64I-WITH-FP-NEXT: sw a0, 24(s1) +; RV64I-WITH-FP-NEXT: ld a0, -152(s0) +; RV64I-WITH-FP-NEXT: sw a0, 20(s1) +; RV64I-WITH-FP-NEXT: ld a0, -144(s0) +; RV64I-WITH-FP-NEXT: sw a0, 16(s1) +; RV64I-WITH-FP-NEXT: ld a0, -136(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var+12)(s6) +; RV64I-WITH-FP-NEXT: ld a0, -128(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var+8)(s6) +; RV64I-WITH-FP-NEXT: ld a0, -120(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var+4)(s6) +; RV64I-WITH-FP-NEXT: ld a0, -112(s0) +; RV64I-WITH-FP-NEXT: sw a0, %lo(var)(s6) +; RV64I-WITH-FP-NEXT: ld s11, 184(sp) +; RV64I-WITH-FP-NEXT: ld s10, 192(sp) +; RV64I-WITH-FP-NEXT: ld s9, 200(sp) +; RV64I-WITH-FP-NEXT: ld s8, 208(sp) +; RV64I-WITH-FP-NEXT: ld s7, 216(sp) +; RV64I-WITH-FP-NEXT: ld s6, 224(sp) +; RV64I-WITH-FP-NEXT: ld s5, 
232(sp) +; RV64I-WITH-FP-NEXT: ld s4, 240(sp) +; RV64I-WITH-FP-NEXT: ld s3, 248(sp) +; RV64I-WITH-FP-NEXT: ld s2, 256(sp) +; RV64I-WITH-FP-NEXT: ld s1, 264(sp) +; RV64I-WITH-FP-NEXT: ld s0, 272(sp) +; RV64I-WITH-FP-NEXT: ld ra, 280(sp) +; RV64I-WITH-FP-NEXT: addi sp, sp, 288 +; RV64I-WITH-FP-NEXT: ret + %val = load [32 x i32], [32 x i32]* @var call void @callee() store volatile [32 x i32] %val, [32 x i32]* @var diff --git a/llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll b/llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll index 4c98bafdfb8a..803778de1365 100644 --- a/llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll +++ b/llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll @@ -23,8 +23,7 @@ define i64 @load_g_0() nounwind { ; RV32I: # %bb.0: # %entry ; RV32I-NEXT: lui a1, %hi(g_0) ; RV32I-NEXT: lw a0, %lo(g_0)(a1) -; RV32I-NEXT: addi a1, a1, %lo(g_0) -; RV32I-NEXT: lw a1, 4(a1) +; RV32I-NEXT: lw a1, %lo(g_0+4)(a1) ; RV32I-NEXT: ret ; ; RV64I-LABEL: load_g_0: @@ -99,8 +98,7 @@ define i64 @load_g_8() nounwind { ; RV32I: # %bb.0: # %entry ; RV32I-NEXT: lui a1, %hi(g_8) ; RV32I-NEXT: lw a0, %lo(g_8)(a1) -; RV32I-NEXT: addi a1, a1, %lo(g_8) -; RV32I-NEXT: lw a1, 4(a1) +; RV32I-NEXT: lw a1, %lo(g_8+4)(a1) ; RV32I-NEXT: ret ; ; RV64I-LABEL: load_g_8: @@ -118,8 +116,7 @@ define i64 @load_g_16() nounwind { ; RV32I: # %bb.0: # %entry ; RV32I-NEXT: lui a1, %hi(g_16) ; RV32I-NEXT: lw a0, %lo(g_16)(a1) -; RV32I-NEXT: addi a1, a1, %lo(g_16) -; RV32I-NEXT: lw a1, 4(a1) +; RV32I-NEXT: lw a1, %lo(g_16+4)(a1) ; RV32I-NEXT: ret ; ; RV64I-LABEL: load_g_16: @@ -155,9 +152,8 @@ define void @store_g_8() nounwind { ; RV32I-LABEL: store_g_8: ; RV32I: # %bb.0: # %entry ; RV32I-NEXT: lui a0, %hi(g_8) +; RV32I-NEXT: sw zero, %lo(g_8+4)(a0) ; RV32I-NEXT: sw zero, %lo(g_8)(a0) -; RV32I-NEXT: addi a0, a0, %lo(g_8) -; RV32I-NEXT: sw zero, 4(a0) ; RV32I-NEXT: ret ; ; RV64I-LABEL: store_g_8: @@ -197,15 +193,14 @@ entry: define i64 @load_ga_16() nounwind { ; RV32I-LABEL: load_ga_16: ; RV32I: # %bb.0: # %entry 
-; RV32I-NEXT: lui a0, %hi(ga_16) -; RV32I-NEXT: addi a1, a0, %lo(ga_16) -; RV32I-NEXT: lw a0, 8(a1) -; RV32I-NEXT: lw a1, 12(a1) +; RV32I-NEXT: lui a1, %hi(ga_16) +; RV32I-NEXT: lw a0, %lo(ga_16+8)(a1) +; RV32I-NEXT: lw a1, %lo(ga_16+12)(a1) ; RV32I-NEXT: ret ; ; RV64I-LABEL: load_ga_16: ; RV64I: # %bb.0: # %entry -; RV64I-NEXT: lui a0, %hi(ga_16+8) +; RV64I-NEXT: lui a0, %hi(ga_16) ; RV64I-NEXT: ld a0, %lo(ga_16+8)(a0) ; RV64I-NEXT: ret entry: @@ -245,8 +240,7 @@ define i64 @load_tl_8() nounwind { ; RV32I-NEXT: lui a0, %tprel_hi(tl_8) ; RV32I-NEXT: add a1, a0, tp, %tprel_add(tl_8) ; RV32I-NEXT: lw a0, %tprel_lo(tl_8)(a1) -; RV32I-NEXT: addi a1, a1, %tprel_lo(tl_8) -; RV32I-NEXT: lw a1, 4(a1) +; RV32I-NEXT: lw a1, %tprel_lo(tl_8+4)(a1) ; RV32I-NEXT: ret ; ; RV64I-LABEL: load_tl_8: diff --git a/llvm/test/CodeGen/RISCV/fp128.ll b/llvm/test/CodeGen/RISCV/fp128.ll index 91b1702911af..81a19d065ac5 100644 --- a/llvm/test/CodeGen/RISCV/fp128.ll +++ b/llvm/test/CodeGen/RISCV/fp128.ll @@ -14,27 +14,25 @@ define i32 @test_load_and_cmp() nounwind { ; RV32I-NEXT: addi sp, sp, -48 ; RV32I-NEXT: sw ra, 44(sp) ; RV32I-NEXT: lui a0, %hi(x) -; RV32I-NEXT: addi a1, a0, %lo(x) -; RV32I-NEXT: lw a6, 4(a1) -; RV32I-NEXT: lw a7, 8(a1) -; RV32I-NEXT: lw a1, 12(a1) -; RV32I-NEXT: lw a0, %lo(x)(a0) +; RV32I-NEXT: lw a6, %lo(x)(a0) +; RV32I-NEXT: lw a7, %lo(x+4)(a0) +; RV32I-NEXT: lw a3, %lo(x+8)(a0) +; RV32I-NEXT: lw a0, %lo(x+12)(a0) ; RV32I-NEXT: lui a4, %hi(y) -; RV32I-NEXT: addi a5, a4, %lo(y) -; RV32I-NEXT: lw a2, 4(a5) -; RV32I-NEXT: lw a3, 8(a5) -; RV32I-NEXT: lw a5, 12(a5) -; RV32I-NEXT: lw a4, %lo(y)(a4) -; RV32I-NEXT: sw a4, 8(sp) -; RV32I-NEXT: sw a0, 24(sp) -; RV32I-NEXT: sw a5, 20(sp) -; RV32I-NEXT: sw a3, 16(sp) +; RV32I-NEXT: lw a5, %lo(y)(a4) +; RV32I-NEXT: lw a2, %lo(y+4)(a4) +; RV32I-NEXT: lw a1, %lo(y+8)(a4) +; RV32I-NEXT: lw a4, %lo(y+12)(a4) +; RV32I-NEXT: sw a4, 20(sp) +; RV32I-NEXT: sw a1, 16(sp) ; RV32I-NEXT: sw a2, 12(sp) -; RV32I-NEXT: sw a1, 36(sp) -; 
RV32I-NEXT: sw a7, 32(sp) +; RV32I-NEXT: sw a5, 8(sp) +; RV32I-NEXT: sw a0, 36(sp) +; RV32I-NEXT: sw a3, 32(sp) +; RV32I-NEXT: sw a7, 28(sp) ; RV32I-NEXT: addi a0, sp, 24 ; RV32I-NEXT: addi a1, sp, 8 -; RV32I-NEXT: sw a6, 28(sp) +; RV32I-NEXT: sw a6, 24(sp) ; RV32I-NEXT: call __netf2 ; RV32I-NEXT: snez a0, a0 ; RV32I-NEXT: lw ra, 44(sp) @@ -53,28 +51,26 @@ define i32 @test_add_and_fptosi() nounwind { ; RV32I-NEXT: addi sp, sp, -80 ; RV32I-NEXT: sw ra, 76(sp) ; RV32I-NEXT: lui a0, %hi(x) -; RV32I-NEXT: addi a1, a0, %lo(x) -; RV32I-NEXT: lw a6, 4(a1) -; RV32I-NEXT: lw a7, 8(a1) -; RV32I-NEXT: lw a1, 12(a1) -; RV32I-NEXT: lw a0, %lo(x)(a0) +; RV32I-NEXT: lw a6, %lo(x)(a0) +; RV32I-NEXT: lw a7, %lo(x+4)(a0) +; RV32I-NEXT: lw a2, %lo(x+8)(a0) +; RV32I-NEXT: lw a0, %lo(x+12)(a0) ; RV32I-NEXT: lui a4, %hi(y) -; RV32I-NEXT: addi a5, a4, %lo(y) -; RV32I-NEXT: lw a3, 4(a5) -; RV32I-NEXT: lw a2, 8(a5) -; RV32I-NEXT: lw a5, 12(a5) -; RV32I-NEXT: lw a4, %lo(y)(a4) -; RV32I-NEXT: sw a4, 24(sp) -; RV32I-NEXT: sw a0, 40(sp) -; RV32I-NEXT: sw a5, 36(sp) -; RV32I-NEXT: sw a2, 32(sp) +; RV32I-NEXT: lw a5, %lo(y)(a4) +; RV32I-NEXT: lw a3, %lo(y+4)(a4) +; RV32I-NEXT: lw a1, %lo(y+8)(a4) +; RV32I-NEXT: lw a4, %lo(y+12)(a4) +; RV32I-NEXT: sw a4, 36(sp) +; RV32I-NEXT: sw a1, 32(sp) ; RV32I-NEXT: sw a3, 28(sp) -; RV32I-NEXT: sw a1, 52(sp) -; RV32I-NEXT: sw a7, 48(sp) +; RV32I-NEXT: sw a5, 24(sp) +; RV32I-NEXT: sw a0, 52(sp) +; RV32I-NEXT: sw a2, 48(sp) +; RV32I-NEXT: sw a7, 44(sp) ; RV32I-NEXT: addi a0, sp, 56 ; RV32I-NEXT: addi a1, sp, 40 ; RV32I-NEXT: addi a2, sp, 24 -; RV32I-NEXT: sw a6, 44(sp) +; RV32I-NEXT: sw a6, 40(sp) ; RV32I-NEXT: call __addtf3 ; RV32I-NEXT: lw a1, 56(sp) ; RV32I-NEXT: lw a0, 60(sp) diff --git a/llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll b/llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll index c36a01ca3d98..025f92c3de96 100644 --- a/llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll +++ b/llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll @@ -383,16 +383,13 @@ 
define void @foo_double() nounwind #0 { ; CHECK-RV32-NEXT: sw t6, 0(sp) ; CHECK-RV32-NEXT: lui a1, %hi(h) ; CHECK-RV32-NEXT: lw a0, %lo(h)(a1) -; CHECK-RV32-NEXT: addi a1, a1, %lo(h) -; CHECK-RV32-NEXT: lw a1, 4(a1) +; CHECK-RV32-NEXT: lw a1, %lo(h+4)(a1) ; CHECK-RV32-NEXT: lui a3, %hi(i) ; CHECK-RV32-NEXT: lw a2, %lo(i)(a3) -; CHECK-RV32-NEXT: addi a3, a3, %lo(i) -; CHECK-RV32-NEXT: lw a3, 4(a3) +; CHECK-RV32-NEXT: lw a3, %lo(i+4)(a3) ; CHECK-RV32-NEXT: call __adddf3 ; CHECK-RV32-NEXT: lui a2, %hi(g) -; CHECK-RV32-NEXT: addi a3, a2, %lo(g) -; CHECK-RV32-NEXT: sw a1, 4(a3) +; CHECK-RV32-NEXT: sw a1, %lo(g+4)(a2) ; CHECK-RV32-NEXT: sw a0, %lo(g)(a2) ; CHECK-RV32-NEXT: lw t6, 0(sp) ; CHECK-RV32-NEXT: lw t5, 4(sp) @@ -466,16 +463,13 @@ define void @foo_double() nounwind #0 { ; CHECK-RV32IF-NEXT: fsw fs11, 0(sp) ; CHECK-RV32IF-NEXT: lui a1, %hi(h) ; CHECK-RV32IF-NEXT: lw a0, %lo(h)(a1) -; CHECK-RV32IF-NEXT: addi a1, a1, %lo(h) -; CHECK-RV32IF-NEXT: lw a1, 4(a1) +; CHECK-RV32IF-NEXT: lw a1, %lo(h+4)(a1) ; CHECK-RV32IF-NEXT: lui a3, %hi(i) ; CHECK-RV32IF-NEXT: lw a2, %lo(i)(a3) -; CHECK-RV32IF-NEXT: addi a3, a3, %lo(i) -; CHECK-RV32IF-NEXT: lw a3, 4(a3) +; CHECK-RV32IF-NEXT: lw a3, %lo(i+4)(a3) ; CHECK-RV32IF-NEXT: call __adddf3 ; CHECK-RV32IF-NEXT: lui a2, %hi(g) -; CHECK-RV32IF-NEXT: addi a3, a2, %lo(g) -; CHECK-RV32IF-NEXT: sw a1, 4(a3) +; CHECK-RV32IF-NEXT: sw a1, %lo(g+4)(a2) ; CHECK-RV32IF-NEXT: sw a0, %lo(g)(a2) ; CHECK-RV32IF-NEXT: flw fs11, 0(sp) ; CHECK-RV32IF-NEXT: flw fs10, 4(sp) @@ -580,16 +574,13 @@ define void @foo_fp_double() nounwind #1 { ; CHECK-RV32-NEXT: addi s0, sp, 80 ; CHECK-RV32-NEXT: lui a1, %hi(h) ; CHECK-RV32-NEXT: lw a0, %lo(h)(a1) -; CHECK-RV32-NEXT: addi a1, a1, %lo(h) -; CHECK-RV32-NEXT: lw a1, 4(a1) +; CHECK-RV32-NEXT: lw a1, %lo(h+4)(a1) ; CHECK-RV32-NEXT: lui a3, %hi(i) ; CHECK-RV32-NEXT: lw a2, %lo(i)(a3) -; CHECK-RV32-NEXT: addi a3, a3, %lo(i) -; CHECK-RV32-NEXT: lw a3, 4(a3) +; CHECK-RV32-NEXT: lw a3, %lo(i+4)(a3) ; CHECK-RV32-NEXT: 
call __adddf3 ; CHECK-RV32-NEXT: lui a2, %hi(g) -; CHECK-RV32-NEXT: addi a3, a2, %lo(g) -; CHECK-RV32-NEXT: sw a1, 4(a3) +; CHECK-RV32-NEXT: sw a1, %lo(g+4)(a2) ; CHECK-RV32-NEXT: sw a0, %lo(g)(a2) ; CHECK-RV32-NEXT: lw t6, 12(sp) ; CHECK-RV32-NEXT: lw t5, 16(sp) @@ -666,16 +657,13 @@ define void @foo_fp_double() nounwind #1 { ; CHECK-RV32IF-NEXT: addi s0, sp, 208 ; CHECK-RV32IF-NEXT: lui a1, %hi(h) ; CHECK-RV32IF-NEXT: lw a0, %lo(h)(a1) -; CHECK-RV32IF-NEXT: addi a1, a1, %lo(h) -; CHECK-RV32IF-NEXT: lw a1, 4(a1) +; CHECK-RV32IF-NEXT: lw a1, %lo(h+4)(a1) ; CHECK-RV32IF-NEXT: lui a3, %hi(i) ; CHECK-RV32IF-NEXT: lw a2, %lo(i)(a3) -; CHECK-RV32IF-NEXT: addi a3, a3, %lo(i) -; CHECK-RV32IF-NEXT: lw a3, 4(a3) +; CHECK-RV32IF-NEXT: lw a3, %lo(i+4)(a3) ; CHECK-RV32IF-NEXT: call __adddf3 ; CHECK-RV32IF-NEXT: lui a2, %hi(g) -; CHECK-RV32IF-NEXT: addi a3, a2, %lo(g) -; CHECK-RV32IF-NEXT: sw a1, 4(a3) +; CHECK-RV32IF-NEXT: sw a1, %lo(g+4)(a2) ; CHECK-RV32IF-NEXT: sw a0, %lo(g)(a2) ; CHECK-RV32IF-NEXT: flw fs11, 12(sp) ; CHECK-RV32IF-NEXT: flw fs10, 16(sp) diff --git a/llvm/test/CodeGen/RISCV/wide-mem.ll b/llvm/test/CodeGen/RISCV/wide-mem.ll index 02aae215fcec..40a074bd8768 100644 --- a/llvm/test/CodeGen/RISCV/wide-mem.ll +++ b/llvm/test/CodeGen/RISCV/wide-mem.ll @@ -22,8 +22,7 @@ define i64 @load_i64_global() nounwind { ; RV32I: # %bb.0: ; RV32I-NEXT: lui a1, %hi(val64) ; RV32I-NEXT: lw a0, %lo(val64)(a1) -; RV32I-NEXT: addi a1, a1, %lo(val64) -; RV32I-NEXT: lw a1, 4(a1) +; RV32I-NEXT: lw a1, %lo(val64+4)(a1) ; RV32I-NEXT: ret %1 = load i64, i64* @val64 ret i64 %1 From llvm-commits at lists.llvm.org Mon Jul 6 09:50:54 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Mon, 06 Jul 2020 09:50:54 -0700 (PDT) Subject: [lld] c1a5f73 - [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS Message-ID: <5f03566e.1c69fb81.f9552.007d@mx.google.com> Author: Fangrui Song Date: 2020-07-06T09:47:53-07:00 New Revision: 
c1a5f73a4ae70d8f808c1bac091c3f4b683815b4

URL: https://github.com/llvm/llvm-project/commit/c1a5f73a4ae70d8f808c1bac091c3f4b683815b4
DIFF: https://github.com/llvm/llvm-project/commit/c1a5f73a4ae70d8f808c1bac091c3f4b683815b4.diff

LOG: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS

Follow-up to D82899. Note, we need to disable R_DTPREL relaxation because
ARM psABI does not define TLS relaxation.

Reviewed By: grimar, psmith

Differential Revision: https://reviews.llvm.org/D83138

Added:

Modified:
    lld/ELF/Arch/ARM.cpp
    lld/ELF/Relocations.cpp
    lld/test/ELF/debug-dead-reloc-tls-arm.s

Removed:

################################################################################
diff --git a/lld/ELF/Arch/ARM.cpp b/lld/ELF/Arch/ARM.cpp
index 0dfdbf3d01e2..fd90557cc4f6 100644
--- a/lld/ELF/Arch/ARM.cpp
+++ b/lld/ELF/Arch/ARM.cpp
@@ -121,6 +121,8 @@ RelExpr ARM::getRelExpr(RelType type, const Symbol &s,
     return R_TLSGD_PC;
   case R_ARM_TLS_LDM32:
     return R_TLSLD_PC;
+  case R_ARM_TLS_LDO32:
+    return R_DTPREL;
   case R_ARM_BASE_PREL:
     // B(S) + A - P
     // FIXME: currently B(S) assumed to be .got, this may not hold for all
diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index dfae234fd60c..42341f67afee 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -238,7 +238,7 @@ handleTlsRelocation(RelType type, Symbol &sym, InputSectionBase &c,
   }
 
   // Local-Dynamic relocs can be relaxed to Local-Exec.
-  if (expr == R_DTPREL && !config->shared) {
+  if (expr == R_DTPREL && canRelax && !config->shared) {
     c.relocations.push_back(
         {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type,
          offset, addend, &sym});
diff --git a/lld/test/ELF/debug-dead-reloc-tls-arm.s b/lld/test/ELF/debug-dead-reloc-tls-arm.s
index 146133a5c8c0..7fa5bcaae19e 100644
--- a/lld/test/ELF/debug-dead-reloc-tls-arm.s
+++ b/lld/test/ELF/debug-dead-reloc-tls-arm.s
@@ -7,8 +7,7 @@
 # RUN: llvm-objdump -s %t | FileCheck %s
 
 # CHECK: Contents of section .debug_info:
-## FIXME: Use ffffffff
-# CHECK-NEXT: 0000 00000000
+# CHECK-NEXT: 0000 ffffffff
 
 .globl _start
 _start:

From llvm-commits at lists.llvm.org Mon Jul 6 09:54:19 2020
From: llvm-commits at lists.llvm.org (David Tenty via llvm-commits)
Date: Mon, 06 Jul 2020 09:54:19 -0700 (PDT)
Subject: [llvm] 2402f93 - [AIX] Add system-aix to lit config file
Message-ID: <5f03573b.1c69fb81.97585.1682@mx.google.com>

Author: Shuhong Liu
Date: 2020-07-06T12:54:12-04:00
New Revision: 2402f9385e850a1434a4d2ee00d76ca01e44a40b

URL: https://github.com/llvm/llvm-project/commit/2402f9385e850a1434a4d2ee00d76ca01e44a40b
DIFF: https://github.com/llvm/llvm-project/commit/2402f9385e850a1434a4d2ee00d76ca01e44a40b.diff

LOG: [AIX] Add system-aix to lit config file

Summary: This is a complementary patch to D82100 since the aix builbot is still running the unsupported test shtest-format-argv0. Add system-aix to the sub llvm-lit config.

Reviewers: daltenty, hubert.reinterpretcast

Reviewed By: hubert.reinterpretcast

Subscribers: delcypher, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82905

Added:

Modified:
    llvm/utils/lit/lit/llvm/config.py
    llvm/utils/lit/tests/lit.cfg
    llvm/utils/lit/tests/shtest-format-argv0.py

Removed:

################################################################################
diff --git a/llvm/utils/lit/lit/llvm/config.py b/llvm/utils/lit/lit/llvm/config.py
index fe0f1f4441ac..db557a7b1fef 100644
--- a/llvm/utils/lit/lit/llvm/config.py
+++ b/llvm/utils/lit/lit/llvm/config.py
@@ -51,12 +51,14 @@ def __init__(self, lit_config, config):
         elif platform.system() == 'Windows':
             # For tests that require Windows to run.
             features.add('system-windows')
-        elif platform.system() == "Linux":
+        elif platform.system() == 'Linux':
             features.add('system-linux')
         elif platform.system() in ['FreeBSD']:
             features.add('system-freebsd')
-        elif platform.system() == "NetBSD":
+        elif platform.system() == 'NetBSD':
             features.add('system-netbsd')
+        elif platform.system() == 'AIX':
+            features.add('system-aix')
 
         # Native compilation: host arch == default triple arch
         # Both of these values should probably be in every site config (e.g. as
diff --git a/llvm/utils/lit/tests/lit.cfg b/llvm/utils/lit/tests/lit.cfg
index 85bdbf180b13..3c49f076a66e 100644
--- a/llvm/utils/lit/tests/lit.cfg
+++ b/llvm/utils/lit/tests/lit.cfg
@@ -87,7 +87,7 @@ if not llvm_config:
     if sys.platform.startswith('win') or sys.platform.startswith('cygwin'):
         config.available_features.add('system-windows')
     if platform.system() == 'AIX':
-        config.available_features.add('aix')
+        config.available_features.add('system-aix')
 
 # For each of lit's internal shell commands ('env', 'cd', 'diff', etc.), put
 # a fake command that always fails at the start of PATH. This helps us check
diff --git a/llvm/utils/lit/tests/shtest-format-argv0.py b/llvm/utils/lit/tests/shtest-format-argv0.py
index 063fa80a678a..28f9acaa3322 100644
--- a/llvm/utils/lit/tests/shtest-format-argv0.py
+++ b/llvm/utils/lit/tests/shtest-format-argv0.py
@@ -5,7 +5,7 @@
 #
 # This test is not supported on AIX since `[` is only available as a shell builtin
 # and is not installed under PATH by default.
-# UNSUPPORTED: aix
+# UNSUPPORTED: system-aix
 #
 # RUN: %{lit} -j 1 -v %{inputs}/shtest-format-argv0 | FileCheck %s
 

From llvm-commits at lists.llvm.org Mon Jul 6 10:34:08 2020
From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:34:08 +0000 (UTC)
Subject: [PATCH] D82499: [DAGCombiner] tighten constraints for fma fold
In-Reply-To:
References:
Message-ID: <7071b30bb7748eab5b369b139556abbb@localhost.localdomain>

cameron.mcinally accepted this revision.
cameron.mcinally added a comment.
This revision is now accepted and ready to land.

LGTM, but I encourage others to review too.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11991
      E = N0;
    }
    if (FMA && E) {
----------------
Do we need to check `Aggressive` here? For a hypothetical target with 2 FMUL/FADD ports and 1 FMA port, assuming slow FMAs, this could be a performance loss.

It shouldn't be a problem for modern chips that I care about, so just picking nits.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82499/new/

https://reviews.llvm.org/D82499

From llvm-commits at lists.llvm.org Mon Jul 6 10:36:19 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:36:19 +0000 (UTC)
Subject: [PATCH] D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads.
In-Reply-To:
References:
Message-ID: <8aa3a0d225606d45f8371d00639d31e1@localhost.localdomain>

arsenm added a comment.

Is this still necessary?
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80364/new/

https://reviews.llvm.org/D80364

From llvm-commits at lists.llvm.org Mon Jul 6 10:37:07 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:37:07 +0000 (UTC)
Subject: [PATCH] D82248: AMDGPU: Don't ignore carry out user when expanding add_co_pseudo
In-Reply-To:
References:
Message-ID:

arsenm added a comment.

ping

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82248/new/

https://reviews.llvm.org/D82248

From llvm-commits at lists.llvm.org Mon Jul 6 10:38:10 2020
From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:38:10 +0000 (UTC)
Subject: [PATCH] D83164: [flang] Basic tests of external I/O runtime (part 9/9)
In-Reply-To:
References:
Message-ID: <6d30a87b68901168b93acb6f877cd892@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rGa39e9cf6bec4: [flang] Basic tests of external I/O runtime (part 9/9) (authored by klausler).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83164/new/

https://reviews.llvm.org/D83164

Files:
  flang/runtime/terminator.cpp
  flang/runtime/terminator.h
  flang/unittests/Runtime/CMakeLists.txt
  flang/unittests/Runtime/external-hello.cpp
  flang/unittests/Runtime/external-io.cpp
  flang/unittests/Runtime/testing.cpp
  flang/unittests/Runtime/testing.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83164.275761.patch
Type: text/x-patch
Size: 21538 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 10:45:34 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:45:34 +0000 (UTC)
Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure
In-Reply-To:
References:
Message-ID:

arsenm added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstructionSelector.cpp:39
+  bool select(MachineInstr &I) override;
+  static const char *getName() { return DEBUG_TYPE; }
+
----------------
kbarton wrote:
> arsenm wrote:
> > I'm pretty sure you don't need these and all the other places that override this are dead code
> I don't follow this.
> Both select and getName seem to be required - getName is needed by the base InstructionSelector implementation in GlobalISel; select is needed by the PPCGenGlobalISel.inc file generated below.
>
> It is entirely possible I'm doing something incorrect though. Could you explain some more?
Oh right, this isn't the direct pass. I think manual getName overrides are dead code on passes and set by the INITIALIZE_PASS* macros

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83100/new/

https://reviews.llvm.org/D83100

From llvm-commits at lists.llvm.org Mon Jul 6 10:45:53 2020
From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:45:53 +0000 (UTC)
Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs.
In-Reply-To:
References:
Message-ID:

dantrushin updated this revision to Diff 275762.
dantrushin added a comment.
Address review comments and fix clang-format errors

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81647/new/

https://reviews.llvm.org/D81647

Files:
  llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp
  llvm/test/CodeGen/X86/statepoint-vreg.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D81647.275762.patch
Type: text/x-patch
Size: 19116 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 10:47:17 2020
From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:47:17 +0000 (UTC)
Subject: [PATCH] D82566: [CodeMoverUtils] Make specific analysis dependent checks optional
In-Reply-To:
References:
Message-ID: <4041195d34e03e86a346663da4820c51@localhost.localdomain>

RithikSharma updated this revision to Diff 275758.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82566/new/

https://reviews.llvm.org/D82566

Files:
  llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h
  llvm/lib/Transforms/Scalar/LoopFuse.cpp
  llvm/lib/Transforms/Utils/CodeMoverUtils.cpp
  llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82566.275758.patch
Type: text/x-patch
Size: 10548 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 10:48:07 2020
From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:48:07 +0000 (UTC)
Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM
In-Reply-To:
References:
Message-ID: <74856a1ce5b12936fcfef2b385dfdf84@localhost.localdomain>

zequanwu updated this revision to Diff 275763.
zequanwu added a comment.

Delete `enable-npm-call-graph-profile` option for NPM, using `enable-call-graph-profile` for both LPM and NPM.
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83013/new/

https://reviews.llvm.org/D83013

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  llvm/include/llvm/InitializePasses.h
  llvm/include/llvm/Transforms/IPO.h
  llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h
  llvm/include/llvm/Transforms/Instrumentation/CGProfile.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/lib/Transforms/Instrumentation/CGProfile.cpp
  llvm/lib/Transforms/Instrumentation/Instrumentation.cpp
  llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
  llvm/test/Instrumentation/cgprofile.ll
  llvm/test/Other/new-pm-cgprofile.ll
  llvm/test/Other/opt-O2-pipeline.ll
  llvm/test/Other/opt-O3-pipeline.ll
  llvm/test/Other/opt-Os-pipeline.ll
  llvm/tools/opt/NewPMDriver.cpp
  llvm/tools/opt/NewPMDriver.h
  llvm/tools/opt/opt.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83013.275763.patch
Type: text/x-patch
Size: 18513 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 10:49:02 2020
From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:49:02 +0000 (UTC)
Subject: [PATCH] D82566: [CodeMoverUtils] Make specific analysis dependent checks optional
In-Reply-To:
References:
Message-ID:

RithikSharma added a comment.

Many thanks, I have updated the diff.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82566/new/

https://reviews.llvm.org/D82566

From llvm-commits at lists.llvm.org Mon Jul 6 10:50:05 2020
From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 17:50:05 +0000 (UTC)
Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields.
In-Reply-To:
References:
Message-ID: <6144c07fec2efc7688e1fd256f58f5eb@localhost.localdomain>

ABataev added a comment.
In D82881#2133511 , @aprantl wrote: > And conversely, with this patch applied, do GDB and LLDB still produce the expected result? GDB works correctly. Did not check with lldb, but it also should work. The result is similar to the debug info, produced for the next code: struct { short : 3; short : 6; } a; But the code, produced by the compiler, is also the same. So, I think, the debug info also should be the same. > Also, what happens to the next bit field or variable right after the bit-filed with the now larger container? Is that affected by the patch? It does not affect the next fields. We point exactly to the bytes, allocated for this particular bitfield only. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Mon Jul 6 10:50:33 2020 From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 17:50:33 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <631f6eb6712a09b345cdcd55f0effab6@localhost.localdomain> biplmish added inline comments. ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:651 + // CHECK-NEXT: [[T3:%.+]] = shufflevector <2 x double> [[T2:%.+]], <2 x double> undef, <2 x i32> zeroinitialize + // CHECK-NEXT: ret <2 x double> [[T3:%.+]] + return vec_splatid(1.0); ---------------- lei wrote: > missing CHECK-BE? There will 1 less case for BE as one splat location will be the same as index. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 10:51:53 2020 From: llvm-commits at lists.llvm.org (Joel E. 
Denny via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 17:51:53 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: jdenny added a comment. Thanks for working on this. ================ Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:64 +// Hold information about clause validity by version +class VersionedClause { ---------------- Please add punctuation to all comments. ================ Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:65 +// Hold information about clause validity by version +class VersionedClause { + // Actual clause ---------------- Why not unsigned as in the original? ================ Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:69 + + // Mininum version number where this clause is valid in the list. + int minVersion = min; ---------------- What does "the list" refer to? ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:278 + VersionedClause, + VersionedClause]; } ---------------- The closing `]` is inconsistently placed here and in some other cases. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def:1881 -__OMP_DIRECTIVE_CLAUSE(flush, 50, ~0, release) -// TODO This should ne `none` instead -__OMP_DIRECTIVE_CLAUSE(flush, 1, ~0, flush) ---------------- This patch loses this TODO and the next one. I'm not sure what they mean. Do we need to keep them? ================ Comment at: llvm/test/TableGen/directive1.td:109 +// IMPL-NEXT: assert(unsigned(C) <= llvm::tdl::Clause_enumSize); +// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clausea && 1 <= Version && 2147483647 >= Version) +// IMPL-NEXT: return true; ---------------- I know the original code used a giant if-else block, but shouldn't this be a switch? 
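For illustration only, a nested-switch version of the generated check might look roughly like the sketch below. The enum names, clause set, and most version ranges are hypothetical stand-ins, not what the patch or the emitter produces; only the upper bound 2147483647 is taken from the test quoted above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the TableGen-generated enums; these names are
// illustrative and do not come from the patch.
enum class Directive { DirA, DirB };
enum class Clause { ClauseA, ClauseB, ClauseC };
constexpr unsigned ClauseEnumSize = 3;

// Upper bound used when a clause has no maximum version (same value as in
// the quoted test output).
constexpr unsigned MaxVersion = 2147483647;

// Outer switch selects the directive, inner switch selects the clause, and
// each leaf checks the version range in which that clause is valid.
bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) {
  assert(static_cast<unsigned>(C) < ClauseEnumSize && "clause out of range");
  switch (D) {
  case Directive::DirA:
    switch (C) {
    case Clause::ClauseA:
      return 1 <= Version && Version <= MaxVersion;
    case Clause::ClauseB:
      return 45 <= Version && Version <= 50; // made-up range
    default:
      return false;
    }
  case Directive::DirB:
    switch (C) {
    case Clause::ClauseC:
      return 50 <= Version && Version <= MaxVersion; // made-up range
    default:
      return false;
    }
  }
  return false; // unreachable for valid enum values
}
```

Compared with a flat if-else chain, this groups all clauses of one directive together, which may also make the generated file easier to read.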
================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:56 for (const auto &R : Records) { const auto Name = R->getValueAsString("name"); + OS << "constexpr auto " << Prefix << getFormattedName(Name) << " = " ---------------- Any reason not to call `getFormattedName` here instead of twice below? ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:183 const auto DefaultName = (*DefaultIt)->getValueAsString("name"); ---------------- Call `getFormattedName(DefaultName)` once here? ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:214 + + OS << " if (D == " << DirectivePrefix << getFormattedName(DirectiveName) + << " && C == " << ClausePrefix << getFormattedName(ClauseName) << " && " ---------------- Hoist `getFormattedName(DirectiveName)` out of the loop? ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:223 +// for the moment a copy of what was done in the OMPKinds.def. It can be +// update in the future since we have more flexibility to generate code. +void GenerateIsAllowedClause(const std::vector &Directives, ---------------- I'd drop this comment's last two sentences, which don't seem like they will be meaningful after pushing. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:296 + + OS << "\n"; // Empty line at end of file } ---------------- Why is an empty line needed? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Mon Jul 6 10:55:26 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 17:55:26 +0000 (UTC) Subject: [PATCH] D82499: [DAGCombiner] tighten constraints for fma fold In-Reply-To: References: Message-ID: <34c20ee807be34537de7423b9f116ccc@localhost.localdomain> spatel marked 2 inline comments as done. spatel added inline comments. 
================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11991 E = N0; } if (FMA && E) { ---------------- cameron.mcinally wrote: > Do we need to check `Aggressive` here? For a hypothetical target with 2 FMUL/FADD ports and 1 FMA port, assuming slow FMAs, this could be a performance loss. > > It shouldn't be a problem for modern chips that I care about, so just picking nits. Removing the 'Aggressive' clause was the previous patch. :) D80801 The reason for not requiring 'Aggressive' is that using FMA on this case is what we should assume is the best case for a default target that supports FMA. As discussed in the earlier patch, we know that this is difficult to get right for all code sequences/targets, so there is already an opt-out to bypass this in SDAG and use MachineCombiner instead. Potentially, we could also transform patterns like this after they have been fused to FMA. That would again be in MachineCombiner (where we have the detailed scheduler info). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82499/new/ https://reviews.llvm.org/D82499 From llvm-commits at lists.llvm.org Mon Jul 6 11:00:44 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:00:44 +0000 (UTC) Subject: [PATCH] D82716: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division In-Reply-To: References: Message-ID: spatel added a comment. Ping. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82716/new/ https://reviews.llvm.org/D82716 From llvm-commits at lists.llvm.org Mon Jul 6 11:01:38 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:01:38 +0000 (UTC) Subject: [PATCH] D83031: AMDGPU/GlobalISel: Select G_FREEZE In-Reply-To: References: Message-ID: <2ccd2d8265c55921c1cf3d6b3bbd0a21@localhost.localdomain> arsenm requested changes to this revision. arsenm added inline comments. 
This revision now requires changes to proceed. ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-freeze.mir:7 +--- +name: test_freeze_s1 +legalized: true ---------------- These are all VGPR cases; should also test AGPR, SGPR, and vcc cases CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83031/new/ https://reviews.llvm.org/D83031 From llvm-commits at lists.llvm.org Mon Jul 6 11:03:54 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:03:54 +0000 (UTC) Subject: [PATCH] D82878: AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic In-Reply-To: References: Message-ID: <32b3abc1e5b666f3bfffb51e8e6852e1@localhost.localdomain> arsenm added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82878/new/ https://reviews.llvm.org/D82878 From llvm-commits at lists.llvm.org Mon Jul 6 11:05:42 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:05:42 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <90b09d02e0feded15e73403d8e5db3cd@localhost.localdomain> sameerarora101 marked 18 inline comments as done. sameerarora101 added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:15 +# CHECK-NAMES-NEXT: input2.o +# CHECK-NAMES-NOT: {{.}} + ---------------- jhenderson wrote: > Watch out, here and in other cases, this only shows that there is no output AFTER `input2.o`. It's possible that there's input before `input1.o`. Also, you'll not catching name corruptions resulting in a prefix/suffix of the name, since FileCheck only does sub-string matching by default, not full line matching. 
For example, this would fail if the following was emitted: > > ``` > input-I-really-shouldn't-be-here > input1.osuffix > prefixinput2.o > ``` > > You probably want to add `--implicit-check-not={{.}}` to the FileCheck command line, rather than the final `CHECK-NAMES-NOT`. Thanks, I added `--implicit-check-not`. Furthermore, I also added `-DPREFIX=create-static-lib.test.tmp` as the file names are represented by `[[PREFIX]]-input1.o` and `[[PREFIX]]-input2.o` ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:19 +# RUN: llvm-nm --print-armap %t.lib | \ +# RUN: FileCheck %s --check-prefix=CHECK-SYMBOLS + ---------------- jhenderson wrote: > It would be best to check the symbol is in the right member. You can do this by using FileCheck's -D option, combined with the `%t_basename` (NB: check the exact name for correctness, but there should be other examples): > > ``` > # RUN: ... FileCheck %s -D FILE1=%t_basename ... > > # CHECK-SYMBOLS: _symbol1 in [[FILE1]] > ``` > > and so on. This defines a string variable that matches the specified input string, and can be used using the `[[VAR]]` syntax as shown. Take a look at the FileCheck documentation for details. > > Also, you should check the `Archive map` string to ensure there's no symbol before the first. Yup, I now check that the symbol is in the right member using the following: ``` ## Check that symbols are present: # RUN: llvm-nm --print-armap %t.lib | \ # RUN: FileCheck %s --check-prefix=CHECK-SYMBOLS -DPREFIX=create-static-lib.test.tmp # CHECK-SYMBOLS: Archive map # CHECK-SYMBOLS-NEXT: _symbol1 in [[PREFIX]]-input1.o # CHECK-SYMBOLS-NEXT: _symbol2 in [[PREFIX]]-input2.o # CHECK-SYMBOLS-EMPTY: ``` Would this be ok? ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:7 + +# MISSING-OPERATION: Library Type: option: must be specified at least once! + ---------------- jhenderson wrote: > Does the double space match the actual error message? 
Yes, the actual error msg also has the double space: ``` Library Type: option: must be specified at least once! ``` ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:9-18 +## Input file not found: +# RUN: not llvm-libtool-darwin -static -o %t.lib %t.missing 2>&1 | \ +# RUN: FileCheck %s --check-prefix=NO-FILE -DFILE=%t.missing + +# NO-FILE: '[[FILE]]': {{[nN]}}o such file or directory + +## Input file is not an object file: ---------------- jhenderson wrote: > These two checks plus the ELF one below probably belong in the invalid input/output arguments test. Ok, I have placed all 4 tests into `invalid-input-output-args.test` now. Please lemme know in case we needed a separate test file for the first test above `## Missing -static option:` ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:20 + +# NOT-OBJECT: The file was not recognized as a valid object file + ---------------- jhenderson wrote: > Does this message use `error:` as a prefix? yup, I added `error:` in the error message too now, thanks! ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:34-38 +static cl::opt LibraryOperation( + cl::desc("Library Type: "), + cl::values( + clEnumValN(Static, "static", + "Produce a statically linked library from the input files")), ---------------- jhenderson wrote: > I'm not really familiar with the `Operation` type. What does it look like in the help text? 
this is what help text looks like: ``` OVERVIEW: llvm-libtool-darwin USAGE: llvm-libtool-darwin [options] OPTIONS: Color Options: --color - Use colors in output (default=autodetect) Generic Options: --help - Display available options (--help-hidden for more) --help-list - Display list of available options (--help-list-hidden for more) --version - Display the version of this program llvm-libtool-darwin options: -o - Alias for -output --output= - Specify output filename Library Type: --static - Produce a statically linked library from the input files ``` I created an `enum Operation` here so that in future we can add support for `dynamic` operation easily. I can very well make the `-static` option a boolean flag as well. What do you think? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Mon Jul 6 11:06:41 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:06:41 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: sameerarora101 updated this revision to Diff 275767. sameerarora101 marked 6 inline comments as done. sameerarora101 added a comment. Updating tests Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 Files: llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/create-static-lib.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83002.275767.patch Type: text/x-patch Size: 10038 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:09:00 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:09:00 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: asbirlea added a comment. In D83013#2132088, @aeubanks wrote: > In D83013#2132070, @echristo wrote: > > > Adding Chandler and Alina here as well. > > > > In general, I don't think that this is such a great idea. Being able to have this sort of thing work more reliably is one of the reasons for the new pass manager. I think I'd like to see this split out into an old versus new pass manager pass to avoid the difficulty of cleaning this up after we finish migrating llvm to the new pass manager. This also seems to add some technical debt around options and other enablement which is also less than ideal. Is this compelling to add right now versus finishing work migrating llvm completely to the new pass manager and removing the old one? From speaking with Alina I think that work should be done in a short while. > > > > Thanks. > > > > -eric > > > I don't think we're that close yet, probably at least a couple months out, there are lots of loose ends to be tied up. I'll make a post soon in llvm-dev (maybe first we can sync up again) about what I think needs to be done before the NPM switch. +1 to sync up again and make progress towards the NPM switch. I don't want to block this patch, but I do agree with Eric's point. We *really* want to focus more on the switch than invest in more LPM infra. Short-term resolutions to unblock folks, with the best split possible, sure, keeping in mind they'll need to be cleaned up. But I don't want us to lose focus on tying up the remaining loose ends for the switch.
I think it's critical for LLVM's codebase health to focus on the NPM switch in the next couple of months. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Mon Jul 6 11:10:57 2020 From: llvm-commits at lists.llvm.org (Hafiz Abid Qadeer via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:10:57 +0000 (UTC) Subject: [PATCH] D83244: [lld] Don't error out on relocations in .gcc_except_table to discarded sections. Message-ID: abidh created this revision. abidh added reviewers: MaskRay, ruiu. Herald added subscribers: llvm-commits, arichardson, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. Such relocations don't generate errors for -r/--emit-relocs in InputSection::copyRelocations. This patch allows similar behaviour in general and makes lld more consistent. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83244 Files: lld/ELF/Relocations.cpp lld/test/ELF/comdat-discarded-no-error.s Index: lld/test/ELF/comdat-discarded-no-error.s =================================================================== --- /dev/null +++ lld/test/ELF/comdat-discarded-no-error.s @@ -0,0 +1,21 @@ +# REQUIRES: x86 +# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o +# RUN: echo '.section .text.foo,"axG", at progbits,foo,comdat; .globl foo; foo:' |\ +# RUN: llvm-mc -filetype=obj -triple=x86_64 - -o %t2.o +# RUN: echo '.section .text.foo,"axG", at progbits,foo,comdat; .globl bar; bar:' |\ +# RUN: llvm-mc -filetype=obj -triple=x86_64 - -o %t3.o + +# RUN: ld.lld %t2.o %t3.o %t1.o -o /dev/null 2>&1 + +.globl _start +_start: + nop + +.section .text.foo,"axG", at progbits,foo,comdat + nop + +.section .eh_frame,"a", at unwind + .quad .text.foo + +.section .gcc_except_table,"a" + .quad .text.foo Index: lld/ELF/Relocations.cpp =================================================================== --- lld/ELF/Relocations.cpp +++ 
lld/ELF/Relocations.cpp @@ -955,6 +955,12 @@ (sec.name == ".got2" || sec.name == ".toc")) return false; + // The "gcc_except_table" and ".eh_frame" can have relocations to discarded + // sections. Don't error out. + if (cast(sym).discardedSecIdx != 0 && + (sec.name == ".gcc_except_table" || sec.name == ".eh_frame")) + return false; + bool isWarning = (config->unresolvedSymbols == UnresolvedPolicy::Warn && canBeExternal) || config->noinhibitExec; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83244.275764.patch Type: text/x-patch Size: 1428 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:15:36 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:15:36 +0000 (UTC) Subject: [PATCH] D82980: [NFC] Run clang-format on llvm-objcopy. In-Reply-To: References: Message-ID: <303b5260066d7fb0b20038754ad21bbe@localhost.localdomain> rupprecht updated this revision to Diff 275772. rupprecht marked 6 inline comments as done. rupprecht added a comment. Reformat with clang-format-10 to avoid clang-format version skew leading to premerge clang-format warnings Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82980/new/ https://reviews.llvm.org/D82980 Files: llvm/tools/llvm-objcopy/COFF/Object.h llvm/tools/llvm-objcopy/CommonOpts.td llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp llvm/tools/llvm-objcopy/ELF/Object.cpp llvm/tools/llvm-objcopy/ELF/Object.h llvm/tools/llvm-objcopy/InstallNameToolOpts.td llvm/tools/llvm-objcopy/ObjcopyOpts.td llvm/tools/llvm-objcopy/StripOpts.td llvm/tools/llvm-objcopy/llvm-objcopy.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82980.275772.patch Type: text/x-patch Size: 24276 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:15:44 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:15:44 +0000 (UTC) Subject: [PATCH] D82980: [NFC] Run clang-format on llvm-objcopy. In-Reply-To: References: Message-ID: <0a54fb0e07bbaa785fad0c329c7b67b8@localhost.localdomain> rupprecht added a comment. In D82980#2125967 , @MaskRay wrote: > It is subjective but it seems to me that clang-format is making bad decisions for .td files. Many people don't format .td files (various lib/Target/*/*.td and include/clang/Driver/Options.td) The td formatting is less robust, but works well for simple CLI option files. I'm not particularly tied to the format it chooses, but I do find the options files easier to read when consistently formatted. The other td files you mentioned are much more complex, and I've tried using clang-format on them. It totally botches them and I don't think we should bother with complex files. > Formatting C++ files is fine but it seems that we run into some discrepancy between clang-format versions. Formatting with latest clang-format is probably fine. Yep In D82980#2127279 , @jhenderson wrote: > No issue in principle with this, but we do need to figure out the canonical version of clang-format we want to use for this if we are going to do it. I have no personal opinion on it, but suspect my installed clang-format version is out-of-date, and that if I were to do the same thing you did I'd get different results. What does `clang-format --version` look like on your machine? The one on my machine seems to be updated on a rolling basis (built from trunk on a regular basis), but `sudo apt install clang-format-10` is available to use a more stable version. 
================ Comment at: llvm/tools/llvm-objcopy/CopyConfig.cpp:210 if (Split.second.getAsInteger(0, NewAlign)) - return createStringError(errc::invalid_argument, - "invalid alignment for --set-section-alignment: '%s'", - Split.second.str().c_str()); + return createStringError( + errc::invalid_argument, ---------------- jhenderson wrote: > MaskRay wrote: > > I suspect the difference is due to changed heuristics of clang-format (of different versions). > Actually, I suspect it just wasn't formatted before - the "invalid argument for..." string below in the original is over the 80 character limit. Yep, this is the reason why. FWIW I also tried an old version (clang-format-7) and it still formats this block. ================ Comment at: llvm/tools/llvm-objcopy/CopyConfig.cpp:490 + MatchStyle SymbolMatchStyle = + InputArgs.hasArg(OBJCOPY_regex) ? MatchStyle::Regex + : InputArgs.hasArg(OBJCOPY_wildcard) ? MatchStyle::Wildcard ---------------- MyDeveloperDay wrote: > MaskRay wrote: > > Pre-merge checks may suggest that this is due to different versions of clang-format. > > > > I wonder whether we want to format the block. > I do believe there were some changes made recently in this area @krasimir, @Typz might like to comment This is using clang-format from head (7d9518c8000bcd742b364a390bc79560f736dc96 at the time). Using `clang-format-10` restores it. I agree the version of clang-format to use is an important choice. I'll revert these portions for now and punt the question for later. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82980/new/ https://reviews.llvm.org/D82980 From llvm-commits at lists.llvm.org Mon Jul 6 11:16:09 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:16:09 +0000 (UTC) Subject: [PATCH] D82499: [DAGCombiner] tighten constraints for fma fold In-Reply-To: References: Message-ID: cameron.mcinally added a comment. Oops, missed that. 
Sorry. Still LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82499/new/ https://reviews.llvm.org/D82499 From llvm-commits at lists.llvm.org Mon Jul 6 11:16:19 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:16:19 +0000 (UTC) Subject: [PATCH] D83106: [WebAssembly] 64-bit memory limits In-Reply-To: References: Message-ID: <2ac338f90644ad63f6c7237cae47457e@localhost.localdomain> aardappel updated this revision to Diff 275773. aardappel added a comment. Added part of LLD data-layout.ll test CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83106/new/ https://reviews.llvm.org/D83106 Files: lld/test/wasm/data-layout.ll lld/wasm/SyntheticSections.cpp lld/wasm/SyntheticSections.h lld/wasm/Writer.cpp llvm/include/llvm/BinaryFormat/Wasm.h llvm/lib/MC/WasmObjectWriter.cpp llvm/lib/Object/WasmObjectFile.cpp llvm/lib/ObjectYAML/WasmYAML.cpp llvm/test/MC/WebAssembly/wasm64.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83106.275773.patch Type: text/x-patch Size: 12165 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:16:43 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:16:43 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <181a40406fcfb92145691ff840d19551@localhost.localdomain> lei added inline comments. 
================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:633 +vector signed int test_vec_vec_splati_si(void) { + // CHECK: ret <4 x i32> + return vec_splati(-17); ---------------- missing CHECK-BE ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:637 + +vector unsigned int test_vec_vec_splati_ui(void) { + // CHECK: ret <4 x i32> ---------------- same ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:643 +vector float test_vec_vec_splati_f(void) { + // CHECK: ret <4 x float> + return vec_splati(1.0f); ---------------- same ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:651 + // CHECK-NEXT: [[T3:%.+]] = shufflevector <2 x double> [[T2:%.+]], <2 x double> undef, <2 x i32> zeroinitialize + // CHECK-NEXT: ret <2 x double> [[T3:%.+]] + return vec_splatid(1.0); ---------------- biplmish wrote: > lei wrote: > > missing CHECK-BE? > There will 1 less case for BE as one splat location will be the same as index. Not sure what you mean. There are no checks for BE here at all. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 11:17:47 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:17:47 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. Message-ID: amyk created this revision. amyk added reviewers: power-llvm-team, PowerPC, nemanjai, hfinkel. amyk added a project: LLVM. Herald added subscribers: shchenz, hiraditya. This patch aims to exploit the `xxsplti32dx XT, IX, IMM32` instruction when lowering `VECTOR_SHUFFLE`s.
We implement `lowerToXXSPLTI32DX` when lowering vector shuffles to check if: - Element size is 4 bytes - The RHS is a constant vector (and constant splat of 4-bytes) - The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks: <0, 4-7, 2, 4-7> <4-7, 1, 4-7, 3> Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83245 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/p10-splatImm32.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83245.275769.patch Type: text/x-patch Size: 10966 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:19:00 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:19:00 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <7d6d066a21a694b9e0048bdaa309b681@localhost.localdomain> nickdesaulniers added inline comments. ================ Comment at: llvm/lib/IR/Instructions.cpp:251 +CallBase *CallBase::Create(CallBase *CB, ArrayRef Bundles, + Instruction *InsertPt) { + switch (CB->getOpcode()) { ---------------- unused param? ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:34 + +using namespace llvm; + ---------------- if you move this up, then you don't need to wrap the forward declaration of `class Module` above. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:78 + + OperandBundleRemapper(ArrayRef ChunksToKeep) : O(ChunksToKeep) {} + ---------------- `explicit` for one operand constructors? 
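As a sketch of what `explicit` buys here: without it, a single-argument constructor doubles as an implicit conversion. The `Chunk` and remapper types below are simplified, hypothetical stand-ins and are not the classes from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a range of indices to keep.
struct Chunk {
  int Begin;
  int End;
};

// Without `explicit`, any std::vector<Chunk> converts to this type silently,
// for example when passed to a function that expects an ImplicitRemapper.
struct ImplicitRemapper {
  ImplicitRemapper(std::vector<Chunk> ChunksToKeep)
      : Chunks(std::move(ChunksToKeep)) {}
  std::vector<Chunk> Chunks;
};

// With `explicit`, callers have to spell out the construction.
struct ExplicitRemapper {
  explicit ExplicitRemapper(std::vector<Chunk> ChunksToKeep)
      : Chunks(std::move(ChunksToKeep)) {
    for (const Chunk &C : Chunks)
      assert(C.Begin <= C.End && "malformed chunk");
  }
  std::size_t size() const { return Chunks.size(); }
  std::vector<Chunk> Chunks;
};

static_assert(std::is_convertible<std::vector<Chunk>, ImplicitRemapper>::value,
              "implicit conversion is allowed without explicit");
static_assert(!std::is_convertible<std::vector<Chunk>, ExplicitRemapper>::value,
              "implicit conversion is rejected with explicit");
static_assert(std::is_constructible<ExplicitRemapper, std::vector<Chunk>>::value,
              "direct construction still works");
```

The `static_assert`s document the difference at compile time: only the accidental conversion path is removed, direct construction is unaffected.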
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 11:23:11 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:23:11 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: amyk added inline comments. ================ Comment at: clang/lib/Headers/altivec.h:17210 + +static __inline__ vector double __ATTRS_o_ai vec_splatid(const float __a) { + return ((vector double)((double)__a)); ---------------- Move function name on next line for consistency. ================ Comment at: clang/lib/Headers/altivec.h:17216 + +static __inline__ vector signed int __ATTRS_o_ai vec_splati_ins( + vector signed int __a, const unsigned int __b, const signed int __c) { ---------------- Also move function name to next line. ================ Comment at: clang/lib/Headers/altivec.h:17229 + +static __inline__ vector unsigned int __ATTRS_o_ai vec_splati_ins( + vector unsigned int __a, const unsigned int __b, const unsigned int __c) { ---------------- Also move function name to next line. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 11:27:37 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:27:37 +0000 (UTC) Subject: [PATCH] D82903: [flang] Bug fix for ambiguous references to data and functions In-Reply-To: References: Message-ID: <74e09360ea214863dc7c40e972027a13@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf9e24a563c36: [flang] Bug fix for ambiguous references to data and functions (authored by PeteSteinfeld). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82903/new/ https://reviews.llvm.org/D82903 Files: flang/lib/Semantics/expression.cpp flang/lib/Semantics/resolve-names.cpp flang/test/Semantics/resolve93.f90 Index: flang/test/Semantics/resolve93.f90 =================================================================== --- /dev/null +++ flang/test/Semantics/resolve93.f90 @@ -0,0 +1,44 @@ +! RUN: %S/test_errors.sh %s %t %f18 +subroutine s1() + character(10) str + character(10) str1 + !ERROR: Cannot reference function 'str' as data + print *, str(1:9), str(7) + block + character(10) str2 + character(10) str3 + !ERROR: Cannot reference function 'str1' as data + print *, str1(1:9), str1(7) + !ERROR: 'str2' is not an array + print *, str2(1:9), str2(7) + !ERROR: Cannot reference function 'str3' as data + print *, str3(7), str3(1:9) + end block +end subroutine s1 + +subroutine s2() + character(10) func + !ERROR: Cannot reference function 'func' as data + print *, func(7), func(1:9) +end subroutine s2 + +subroutine s3() + real(8) :: func + !ERROR: Cannot reference function 'func' as data + print *, func(7), func(1:6) +end subroutine s3 + +subroutine s4() + real(8) :: local + real(8) :: local1 + !ERROR: Cannot reference function 'local' as data + print *, local(1:6), local(7) + !ERROR: Cannot reference function 'local1' as data + print *, local1(7), local1(1:6) +end subroutine s4 + +subroutine s5(arg) + integer :: iVar + external :: arg + iVar = loc(arg) +end subroutine s5 Index: flang/lib/Semantics/resolve-names.cpp =================================================================== --- flang/lib/Semantics/resolve-names.cpp +++ flang/lib/Semantics/resolve-names.cpp @@ -5505,7 +5505,15 @@ }, [&](const Indirection &y) { Walk(y.value().subscripts); - return ResolveDataRef(y.value().base); + const parser::Name *name{ResolveDataRef(y.value().base)}; + if (!name) { + } else if (!name->symbol->has()) { + ConvertToObjectEntity(*name->symbol); + 
} else if (!context().HasError(*name->symbol)) { + SayWithDecl(*name, *name->symbol, + "Cannot reference function '%s' as data"_err_en_US); + } + return name; }, [&](const Indirection &y) { Walk(y.value().imageSelector); Index: flang/lib/Semantics/expression.cpp =================================================================== --- flang/lib/Semantics/expression.cpp +++ flang/lib/Semantics/expression.cpp @@ -909,7 +909,10 @@ return std::nullopt; } else if (baseExpr->Rank() == 0) { if (const Symbol * symbol{GetLastSymbol(*baseExpr)}) { - Say("'%s' is not an array"_err_en_US, symbol->name()); + if (!context_.HasError(symbol)) { + Say("'%s' is not an array"_err_en_US, symbol->name()); + context_.SetError(const_cast(*symbol)); + } } } else if (std::optional dataRef{ ExtractDataRef(std::move(*baseExpr))}) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D82903.275774.patch Type: text/x-patch Size: 2983 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:28:10 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Mon, 06 Jul 2020 11:28:10 -0700 (PDT) Subject: [llvm] c19c153 - AMDGPU: Don't ignore carry out user when expanding add_co_pseudo Message-ID: <5f036d3a.1c69fb81.9b58b.07d6@mx.google.com> Author: Matt Arsenault Date: 2020-07-06T14:28:01-04:00 New Revision: c19c153e7415807f043edc42843bc491232b717e URL: https://github.com/llvm/llvm-project/commit/c19c153e7415807f043edc42843bc491232b717e DIFF: https://github.com/llvm/llvm-project/commit/c19c153e7415807f043edc42843bc491232b717e.diff LOG: AMDGPU: Don't ignore carry out user when expanding add_co_pseudo This was resulting in a missing vreg def in the use select instruction. The output of the pseudo doesn't make sense, since it really shouldn't have the vreg output in the first place, and instead an implicit scc def to match the real scalar behavior. 
We could have easier to understand tests if we selected scalar versions of the [us]{add|sub}.with.overflow intrinsics. This does still end up producing vector code in the end, since it gets moved later. Added: llvm/test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll Modified: llvm/lib/Target/AMDGPU/SIISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index f23762b183a8..3ee48c1ffdff 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -3880,6 +3880,7 @@ MachineBasicBlock *SITargetLowering::EmitInstrWithCustomInserter( MachineBasicBlock::iterator MII = MI; const DebugLoc &DL = MI.getDebugLoc(); MachineOperand &Dest = MI.getOperand(0); + MachineOperand &CarryDest = MI.getOperand(1); MachineOperand &Src0 = MI.getOperand(2); MachineOperand &Src1 = MI.getOperand(3); MachineOperand &Src2 = MI.getOperand(4); @@ -3916,6 +3917,9 @@ MachineBasicBlock *SITargetLowering::EmitInstrWithCustomInserter( } BuildMI(*BB, MII, DL, TII->get(Opc), Dest.getReg()).add(Src0).add(Src1); + + BuildMI(*BB, MII, DL, TII->get(AMDGPU::COPY), CarryDest.getReg()) + .addReg(AMDGPU::SCC); MI.eraseFromParent(); return BB; } diff --git a/llvm/test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll b/llvm/test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll new file mode 100644 index 000000000000..aad3ea52ab81 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll @@ -0,0 +1,121 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9 %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck -check-prefixes=GCN,GFX10 %s + +define i32 @s_add_co_select_user() { +; GFX9-LABEL: s_add_co_select_user: +; GFX9: ; %bb.0: ; %bb +; 
GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT: s_mov_b64 s[4:5], 0 +; GFX9-NEXT: s_load_dword s6, s[4:5], 0x0 +; GFX9-NEXT: s_waitcnt lgkmcnt(0) +; GFX9-NEXT: v_add_co_u32_e64 v0, s[4:5], s6, s6 +; GFX9-NEXT: s_cmp_lg_u64 s[4:5], 0 +; GFX9-NEXT: s_addc_u32 s4, s6, 0 +; GFX9-NEXT: s_cselect_b64 vcc, 1, 0 +; GFX9-NEXT: v_mov_b32_e32 v1, s4 +; GFX9-NEXT: s_cmp_gt_u32 s6, 31 +; GFX9-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; GFX9-NEXT: s_cselect_b64 vcc, -1, 0 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc +; GFX9-NEXT: s_setpc_b64 s[30:31] +; +; GFX10-LABEL: s_add_co_select_user: +; GFX10: ; %bb.0: ; %bb +; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT: s_waitcnt_vscnt null, 0x0 +; GFX10-NEXT: s_mov_b64 s[4:5], 0 +; GFX10-NEXT: ; implicit-def: $vcc_hi +; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0 +; GFX10-NEXT: s_waitcnt lgkmcnt(0) +; GFX10-NEXT: v_add_co_u32_e64 v0, s5, s4, s4 +; GFX10-NEXT: s_cmpk_lg_u32 s5, 0x0 +; GFX10-NEXT: s_addc_u32 s5, s4, 0 +; GFX10-NEXT: s_cselect_b32 s6, 1, 0 +; GFX10-NEXT: s_cmp_gt_u32 s4, 31 +; GFX10-NEXT: v_cndmask_b32_e64 v1, 0, s5, s6 +; GFX10-NEXT: s_cselect_b32 vcc_lo, -1, 0 +; GFX10-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc_lo +; GFX10-NEXT: s_setpc_b64 s[30:31] +bb: + %i = load volatile i32, i32 addrspace(4)* null, align 8 + %i1 = add i32 %i, %i + %i2 = icmp ult i32 %i1, %i + %i3 = zext i1 %i2 to i32 + %i4 = add nuw nsw i32 %i3, 0 + %i5 = add i32 %i4, %i + %i6 = icmp ult i32 %i5, %i4 + %i7 = select i1 %i6, i32 %i5, i32 0 + %i8 = icmp ugt i32 %i, 31 + %i9 = select i1 %i8, i32 %i1, i32 %i7 + ret i32 %i9 +} + +define amdgpu_kernel void @s_add_co_br_user(i32 %i) { +; GFX9-LABEL: s_add_co_br_user: +; GFX9: ; %bb.0: ; %bb +; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0 +; GFX9-NEXT: s_waitcnt lgkmcnt(0) +; GFX9-NEXT: s_add_i32 s1, s0, s0 +; GFX9-NEXT: v_mov_b32_e32 v0, s0 +; GFX9-NEXT: v_cmp_lt_u32_e32 vcc, s1, v0 +; GFX9-NEXT: s_cmp_lg_u64 vcc, 0 +; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc +; GFX9-NEXT: 
s_addc_u32 s0, s0, 0 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, s0, v0 +; GFX9-NEXT: s_and_b64 vcc, exec, vcc +; GFX9-NEXT: s_cbranch_vccnz BB1_2 +; GFX9-NEXT: ; %bb.1: ; %bb0 +; GFX9-NEXT: v_mov_b32_e32 v0, 0 +; GFX9-NEXT: v_mov_b32_e32 v2, 9 +; GFX9-NEXT: v_mov_b32_e32 v1, 0 +; GFX9-NEXT: global_store_dword v[0:1], v2, off +; GFX9-NEXT: BB1_2: ; %bb1 +; GFX9-NEXT: v_mov_b32_e32 v0, 0 +; GFX9-NEXT: v_mov_b32_e32 v2, 10 +; GFX9-NEXT: v_mov_b32_e32 v1, 0 +; GFX9-NEXT: global_store_dword v[0:1], v2, off +; GFX9-NEXT: s_endpgm +; +; GFX10-LABEL: s_add_co_br_user: +; GFX10: ; %bb.0: ; %bb +; GFX10-NEXT: s_load_dword s0, s[4:5], 0x0 +; GFX10-NEXT: ; implicit-def: $vcc_hi +; GFX10-NEXT: s_waitcnt lgkmcnt(0) +; GFX10-NEXT: s_add_i32 s1, s0, s0 +; GFX10-NEXT: v_cmp_lt_u32_e64 s1, s1, s0 +; GFX10-NEXT: v_cndmask_b32_e64 v0, 0, 1, s1 +; GFX10-NEXT: s_cmpk_lg_u32 s1, 0x0 +; GFX10-NEXT: s_addc_u32 s0, s0, 0 +; GFX10-NEXT: v_cmp_ge_u32_e32 vcc_lo, s0, v0 +; GFX10-NEXT: s_and_b32 vcc_lo, exec_lo, vcc_lo +; GFX10-NEXT: s_cbranch_vccnz BB1_2 +; GFX10-NEXT: ; %bb.1: ; %bb0 +; GFX10-NEXT: v_mov_b32_e32 v0, 0 +; GFX10-NEXT: v_mov_b32_e32 v2, 9 +; GFX10-NEXT: v_mov_b32_e32 v1, 0 +; GFX10-NEXT: global_store_dword v[0:1], v2, off +; GFX10-NEXT: BB1_2: ; %bb1 +; GFX10-NEXT: v_mov_b32_e32 v0, 0 +; GFX10-NEXT: v_mov_b32_e32 v2, 10 +; GFX10-NEXT: v_mov_b32_e32 v1, 0 +; GFX10-NEXT: global_store_dword v[0:1], v2, off +; GFX10-NEXT: s_endpgm +bb: + %i1 = add i32 %i, %i + %i2 = icmp ult i32 %i1, %i + %i3 = zext i1 %i2 to i32 + %i4 = add nuw nsw i32 %i3, 0 + %i5 = add i32 %i4, %i + %i6 = icmp ult i32 %i5, %i4 + %i7 = select i1 %i6, i32 %i5, i32 0 + br i1 %i6, label %bb0, label %bb1 + +bb0: + store volatile i32 9, i32 addrspace(1)* null + br label %bb1 + +bb1: + store volatile i32 10, i32 addrspace(1)* null + ret void +} From llvm-commits at lists.llvm.org Mon Jul 6 11:28:17 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:28:17 +0000 
(UTC) Subject: [PATCH] D82248: AMDGPU: Don't ignore carry out user when expanding add_co_pseudo In-Reply-To: References: Message-ID: <82d064d717317949be169b36d3c05ad0@localhost.localdomain> arsenm closed this revision. arsenm added a comment. c19c153e7415807f043edc42843bc491232b717e CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82248/new/ https://reviews.llvm.org/D82248 From llvm-commits at lists.llvm.org Mon Jul 6 11:28:22 2020 From: llvm-commits at lists.llvm.org (Katherine Rasmussen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:28:22 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: <65bbae39686983c5db7471ce8748ae4c@localhost.localdomain> ktras marked 2 inline comments as done. ktras added inline comments. ================ Comment at: flang/test/Semantics/num_images.f90:14 + !ERROR: too many actual arguments for intrinsic 'num_images' + print *, num_images(3.4) + ---------------- tskeith wrote: > Why is the error "too many actual arguments" rather than incorrect type? I believe it is because 'num_images()' is overloaded with 3 variants and one of these has no arguments. If an argument is found that doesn't fully match the versions of 'num_images()' that do have arguments, then it seems to be interpreting those incorrect calls as an call to the version with no arguments. Thus the error being "too many actual arguments" if the argument is of an unexpected type or "unknown keyword argument" if a correct keyword argument is used, but with an incorrect type. I haven't looked into if there is a way to change the logic of the errors being produced in these cases. ================ Comment at: flang/test/Semantics/num_images.f90:22 + !ERROR: unknown keyword argument to intrinsic 'num_images' + print *, num_images(team_number=3.4) + ---------------- tskeith wrote: > Similar question here: `team_number` isn't an unknown keyword argument. The value has the wrong type. 
> > Are these bad error messages found with other intrinsics or unique to `num_images? Replied in comment above. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 From llvm-commits at lists.llvm.org Mon Jul 6 11:29:09 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:29:09 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: lebedev.ri updated this revision to Diff 275775. lebedev.ri marked 3 inline comments as done. lebedev.ri added a comment. @nickdesaulniers thank you for taking a look! Addressing review nits. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 Files: llvm/include/llvm/IR/InstrTypes.h llvm/lib/IR/Instructions.cpp llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83177.275775.patch Type: text/x-patch Size: 12302 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:30:32 2020 From: llvm-commits at lists.llvm.org (Sam Clegg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:30:32 +0000 (UTC) Subject: [PATCH] D83106: [WebAssembly] 64-bit memory limits In-Reply-To: References: Message-ID: <6cadf63f4ce77f1f25cc153ebc943c2b@localhost.localdomain> sbc100 accepted this revision. sbc100 added inline comments. 
================ Comment at: lld/test/wasm/data-layout.ll:6 -target triple = "wasm32-unknown-unknown" +; RUN: llvm-mc -filetype=obj -triple=wasm64-unknown-unknown %p/Inputs/hello.s -o %t.hello.o +; RUN: llc -mtriple=wasm64-unknown-unknown -filetype=obj %s -o %t.o ---------------- I would give each object a unique name e.g. `hello64.o` ? Than you can avoid rebuilding it below. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83106/new/ https://reviews.llvm.org/D83106 From llvm-commits at lists.llvm.org Mon Jul 6 11:36:23 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Mon, 6 Jul 2020 11:36:23 -0700 Subject: [llvm] d7ea6ce - [Support] fix user_cache_directory on mac In-Reply-To: <5f0302dc.1c69fb81.983ca.9e7d@mx.google.com> References: <5f0302dc.1c69fb81.983ca.9e7d@mx.google.com> Message-ID: Any chance of testing this? (maybe even a unit test? (even if it's uninteresting on other platforms, perhaps)?) On Mon, Jul 6, 2020 at 3:54 AM Sam McCall via llvm-commits wrote: > > > Author: Sam McCall > Date: 2020-07-06T12:54:11+02:00 > New Revision: d7ea6ce809a4413afb1edafa17ba291b39129f52 > > URL: https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52 > DIFF: https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52.diff > > LOG: [Support] fix user_cache_directory on mac > > Added: > > > Modified: > llvm/lib/Support/Unix/Path.inc > > Removed: > > > > ################################################################################ > diff --git a/llvm/lib/Support/Unix/Path.inc b/llvm/lib/Support/Unix/Path.inc > index c35db79cbd8a..d91b269cc6d3 100644 > --- a/llvm/lib/Support/Unix/Path.inc > +++ b/llvm/lib/Support/Unix/Path.inc > @@ -1193,7 +1193,7 @@ bool user_config_directory(SmallVectorImpl &result) { > #ifdef __APPLE__ > // Mac: ~/Library/Preferences/ > if (home_directory(result)) { > - append("Library", "Preferences"); > + append(result, "Library", "Preferences"); > return true; > } 
> #else > > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Mon Jul 6 11:38:58 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:38:58 +0000 (UTC) Subject: [PATCH] D82716: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division In-Reply-To: References: Message-ID: <4435d9b6fa866355081221b2b7476070@localhost.localdomain> craig.topper added inline comments. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13242 + // Estimate creation failed. Clean up speculatively created nodes. + if (AAZ->use_empty()) + DAG.RemoveDeadNode(AAZ.getNode()); ---------------- Can we just call recursivelyDeleteUnusedNodes(AAZ) if AAZ is unused and avoid the AA handling? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82716/new/ https://reviews.llvm.org/D82716 From llvm-commits at lists.llvm.org Mon Jul 6 11:40:12 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:40:12 +0000 (UTC) Subject: [PATCH] D83244: [lld] Don't error out on relocations in .gcc_except_table to discarded sections. In-Reply-To: References: Message-ID: MaskRay added a comment. The `.eh_frame` test case is invalid. LLD handles .eh_frame input sections differently. It parses .eh_frame and deduplicates them. See `eh-frame-merge.s`, an input .eh_frame referencing a non-prevailing COMDAT group is dropped (EhFrameSection::isFdeLive) Do you have a realistic case where LLD erroneously errors? If so, can you get a minimal reproduce, use `LLD_REPRODUCE=/tmp/rep.tar` or `-Wl,--reproduce=/tmp/rep.tar` to get a reproduce file and upload it somewhere? 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83244/new/

https://reviews.llvm.org/D83244

From llvm-commits at lists.llvm.org Mon Jul 6 11:41:04 2020
From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 18:41:04 +0000 (UTC)
Subject: [PATCH] D83067: [BasicAA] Remove -basicaa alias
In-Reply-To:
References:
Message-ID:

ychen accepted this revision.
ychen added a comment.
This revision is now accepted and ready to land.

LGTM

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83067/new/

https://reviews.llvm.org/D83067

From llvm-commits at lists.llvm.org Mon Jul 6 11:43:03 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 18:43:03 +0000 (UTC)
Subject: [PATCH] D83184: Avoid using globals in ELF Symbol Table
In-Reply-To:
References:
Message-ID: <7cbe9a324dfb5d348187ab3269606ed0@localhost.localdomain>

MaskRay added a comment.

I can commit this for you if you don't have commit access. Please provide `Name ` so that you can get proper credit.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83184/new/

https://reviews.llvm.org/D83184

From llvm-commits at lists.llvm.org Mon Jul 6 11:44:50 2020
From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 18:44:50 +0000 (UTC)
Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE.
In-Reply-To:
References:
Message-ID: <93de257dc329e45832d32ac96e44ddea@localhost.localdomain>

lei added inline comments.
================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9791 + SDValue VecShuffle(SVN, 0); + SDLoc dl(SVN); + ---------------- `dl`->`DL` ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9824 + unsigned SplatSize = SplatBitSize / 8; + if (SplatSize > 4) + return SDValue(); ---------------- no need for the tmp `SplatSize` ``` if ((SplatBitSize / 8) > 4) ``` ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9838 + (((ShuffleMask[4] % 4 == 0) && (ShuffleMask[12] % 4 == 0)) && + (ShuffleMask[4] > 15 && ShuffleMask[12] > 15))) // Case 1. + Index = DAG.getTargetConstant(IsLE ? 1 : 0, dl, MVT::i1); ---------------- There see to be alot of extra, unnecessary `()` here... since all these are `&&` I think alot of these can be removed. ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9840 + Index = DAG.getTargetConstant(IsLE ? 1 : 0, dl, MVT::i1); + else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && // Case 2. + (((ShuffleMask[0] % 4 == 0) && (ShuffleMask[8] % 4 == 0)) && ---------------- same. ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9843 + (ShuffleMask[0] > 15 && ShuffleMask[8] > 15))) + Index = DAG.getTargetConstant(IsLE ? 0 : 1, dl, MVT::i1); + ---------------- I think you are missing: ``` else return SDValue(); ``` ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:49 + SDTCisVec<1>, SDTCisInt<2>, SDTCisInt<3> +]>; + ---------------- nit: indentation? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 From llvm-commits at lists.llvm.org Mon Jul 6 11:46:05 2020 From: llvm-commits at lists.llvm.org (William Moses via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:46:05 +0000 (UTC) Subject: [PATCH] D83184: Avoid using globals in ELF Symbol Table In-Reply-To: References: Message-ID: <58142f0cbfc9f35ab7efff5938bbcb11@localhost.localdomain> wsmoses added a comment. In D83184#2133678 , @MaskRay wrote: > I can commit the for you if you don't have commit access. Please provide `Name ` so that you can get proper credit. I do not presently have commit access, for my git commits I use: `William S. Moses ` Thanks! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83184/new/ https://reviews.llvm.org/D83184 From llvm-commits at lists.llvm.org Mon Jul 6 11:48:57 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via llvm-commits) Date: Mon, 06 Jul 2020 11:48:57 -0700 (PDT) Subject: [llvm] af8389e - [VE] Change to use isa Message-ID: <5f037219.1c69fb81.5a6f2.024a@mx.google.com> Author: Kazushi (Jam) Marukawa Date: 2020-07-07T03:48:49+09:00 New Revision: af8389e1315a1d4fa2bb5116f40cfc0704891a58 URL: https://github.com/llvm/llvm-project/commit/af8389e1315a1d4fa2bb5116f40cfc0704891a58 DIFF: https://github.com/llvm/llvm-project/commit/af8389e1315a1d4fa2bb5116f40cfc0704891a58.diff LOG: [VE] Change to use isa Summary: Change to use isa instead of dyn_cast to avoid a warning. 
Reviewers: simoll, k-ishizaka Reviewed By: simoll Subscribers: hiraditya, llvm-commits Tags: #llvm, #ve Differential Revision: https://reviews.llvm.org/D83200 Added: Modified: llvm/lib/Target/VE/VEISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/VE/VEISelLowering.cpp b/llvm/lib/Target/VE/VEISelLowering.cpp index 9abffae413d4..ab720545dd83 100644 --- a/llvm/lib/Target/VE/VEISelLowering.cpp +++ b/llvm/lib/Target/VE/VEISelLowering.cpp @@ -548,7 +548,7 @@ bool VETargetLowering::hasAndNot(SDValue Y) const { // for all immediate values now. // FIXME: Change hasAndNot function to have two operands to make it work // correctly with Aurora VE. - if (auto *C = dyn_cast(Y)) + if (isa(Y)) return false; // It's ok for generic registers. From llvm-commits at lists.llvm.org Mon Jul 6 11:48:59 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:48:59 +0000 (UTC) Subject: [PATCH] D83200: [VE] Change to use isa In-Reply-To: References: Message-ID: <22f94463556196fd40d79bc5f86881cb@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGaf8389e1315a: [VE] Change to use isa (authored by kaz7). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83200/new/ https://reviews.llvm.org/D83200 Files: llvm/lib/Target/VE/VEISelLowering.cpp Index: llvm/lib/Target/VE/VEISelLowering.cpp =================================================================== --- llvm/lib/Target/VE/VEISelLowering.cpp +++ llvm/lib/Target/VE/VEISelLowering.cpp @@ -548,7 +548,7 @@ // for all immediate values now. // FIXME: Change hasAndNot function to have two operands to make it work // correctly with Aurora VE. - if (auto *C = dyn_cast(Y)) + if (isa(Y)) return false; // It's ok for generic registers. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: D83200.275778.patch Type: text/x-patch Size: 503 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:49:39 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:49:39 +0000 (UTC) Subject: [PATCH] D83084: DomTree: Remove the releaseMemory() method In-Reply-To: References: Message-ID: <07ed16895943b4b98582cda732968fae@localhost.localdomain> dblaikie added a comment. @arsenm - if you can, please include some text whenever approving a patch via phabricator, otherwise no email indicating approval is sent to the mailing lists (which makes auditing reviews difficult) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83084/new/ https://reviews.llvm.org/D83084 From llvm-commits at lists.llvm.org Mon Jul 6 11:50:11 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:50:11 +0000 (UTC) Subject: [PATCH] D83246: [Attributor] use liveness information from AAIsDead in AAReachability and cache query results Message-ID: okura created this revision. okura added a reviewer: jdoerfert. Herald added subscribers: llvm-commits, bbn, kuter, uenoku, hiraditya. Herald added a reviewer: sstefan1. Herald added a reviewer: uenoku. Herald added a reviewer: homerdin. Herald added a reviewer: baziotis. Herald added a project: LLVM. This is the next patch of D76210 . This patch contains two changes. 1. Use assumed liveness from AAIsDead ... If From instruction is assumed to be dead, return false immediately 2. Cache results ... Made a map in `InformationCache` for caching results. 
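To illustrate the second point, here is a minimal sketch of memoizing a pairwise query. It is not the patch's code: `Inst`, `ReachabilityCache` and the `Fn` callback are hypothetical stand-ins (the callback plays the role of `isPotentiallyReachable`), and it uses only the standard library instead of `DenseMap`.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for an instruction; only its address is used as a key.
struct Inst {};

// Memoizes a (From, To) -> bool query. The callback stands in for the
// expensive reachability check and runs at most once per distinct pair.
class ReachabilityCache {
public:
  using Query = std::function<bool(const Inst *, const Inst *)>;
  using Key = std::pair<const Inst *, const Inst *>;

  explicit ReachabilityCache(Query Fn) : Compute(std::move(Fn)) {}

  bool isReachable(const Inst *From, const Inst *To) {
    Key K(From, To);
    auto It = Cache.find(K);
    if (It != Cache.end())
      return It->second; // Cache hit: skip the expensive query.
    bool Result = Compute(From, To);
    Cache.emplace(K, Result);
    return Result;
  }

private:
  // std::less on pointers gives a total order even for unrelated objects.
  struct KeyLess {
    bool operator()(const Key &A, const Key &B) const {
      std::less<const Inst *> Less;
      if (Less(A.first, B.first))
        return true;
      if (Less(B.first, A.first))
        return false;
      return Less(A.second, B.second);
    }
  };

  Query Compute;
  std::map<Key, bool, KeyLess> Cache;
};
```

Note that the key is the ordered pair, so a query for (A, B) does not answer (B, A). A cache like this is only valid while the underlying CFG is unchanged.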
https://reviews.llvm.org/D83246 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/AttributorAttributes.cpp Index: llvm/lib/Transforms/IPO/AttributorAttributes.cpp =================================================================== --- llvm/lib/Transforms/IPO/AttributorAttributes.cpp +++ llvm/lib/Transforms/IPO/AttributorAttributes.cpp @@ -2469,7 +2469,7 @@ const auto &ReachabilityAA = A.getAAFor(*this, IRPosition::function(*ScopeFn)); - if (!ReachabilityAA.isAssumedReachable(UserI, getCtxI())) + if (!ReachabilityAA.isAssumedReachable(UserI, getCtxI(), A)) return true; if (auto *CB = dyn_cast(UserI)) { Index: llvm/include/llvm/Transforms/IPO/Attributor.h =================================================================== --- llvm/include/llvm/Transforms/IPO/Attributor.h +++ llvm/include/llvm/Transforms/IPO/Attributor.h @@ -715,6 +715,19 @@ /// Return the map conaining all the knowledge we have from `llvm.assume`s. const RetainedKnowledgeMap &getKnowledgeMap() const { return KnowledgeMap; } + bool getPotentiallyReachable(const Instruction *From, const Instruction *To) { + auto KeyPair = std::make_pair(From, To); + auto Iter = PotentiallyReachableMap.find(KeyPair); + const Function &F = *From->getFunction(); + if (Iter != PotentiallyReachableMap.end()) + return Iter->second; + bool Result = isPotentiallyReachable( + From, To, nullptr, AG.getAnalysis(F), + AG.getAnalysis(F)); + PotentiallyReachableMap.insert(std::make_pair(KeyPair, Result)); + return Result; + } + private: struct FunctionInfo { ~FunctionInfo(); @@ -774,6 +787,9 @@ /// Set of inlineable functions SmallPtrSet InlineableFunctions; + DenseMap, bool> + PotentiallyReachableMap; + /// Give the Attributor access to the members so /// Attributor::identifyDefaultAbstractAttributes(...) can initialize them. friend struct Attributor; @@ -2291,16 +2307,21 @@ /// Returns true if 'From' instruction is assumed to reach, 'To' instruction. 
/// Users should provide two positions they are interested in, and the class /// determines (and caches) reachability. - bool isAssumedReachable(const Instruction *From, - const Instruction *To) const { - return isPotentiallyReachable(From, To); + bool isAssumedReachable(const Instruction *From, const Instruction *To, + Attributor &A) const { + const auto &LivenessAA = + A.getAAFor(*this, IRPosition::value(*From)); + if (A.isAssumedDead(*From, this, &LivenessAA)) + return false; + return A.getInfoCache().getPotentiallyReachable(From, To); } /// Returns true if 'From' instruction is known to reach, 'To' instruction. /// Users should provide two positions they are interested in, and the class /// determines (and caches) reachability. - bool isKnownReachable(const Instruction *From, const Instruction *To) const { - return isPotentiallyReachable(From, To); + bool isKnownReachable(const Instruction *From, const Instruction *To, + Attributor &A) const { + return A.getInfoCache().getPotentiallyReachable(From, To); } /// Create an abstract attribute view for the position \p IRP. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83246.275776.patch Type: text/x-patch Size: 3325 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:50:52 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:50:52 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <8dc5be13e8e902c0e213892cb3d4d80f@localhost.localdomain> nickdesaulniers added inline comments. ================ Comment at: llvm/include/llvm/IR/InstrTypes.h:1145 + static CallBase *Create(CallBase *CB, ArrayRef Bundles, + Instruction *InsertPt = nullptr); + ---------------- If we're going to create a new interface to `CallBase`, I kind of want to use it in more than just one place. 
In particular, at least `InlineFunction()` in `llvm/lib/Transforms/Utils/InlineFunction.cpp` looks like a perfect candidate to use this. There may be more, if you grep for the `cast`s. In that way, this change might help DRY up and also remove a repetitious pattern. WDYT? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 11:52:24 2020 From: llvm-commits at lists.llvm.org (Vedant Kumar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:52:24 +0000 (UTC) Subject: [PATCH] D83047: [LiveDebugValues] 2/4 Add instruction-referencing LiveDebugValues implementation In-Reply-To: References: Message-ID: <2464500e29d9f8f954ffbcf63ddc74ed@localhost.localdomain> vsk added a comment. Thanks for this, Jeremy. It'll take me multiple passes to page all of this in. I hope to get to the core algorithm changes in my next review. In the interest of getting some feedback to you sooner rather than later, I've included some minor suggestions and questions inline. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:241 +public: + uint64_t BlockNo : 20; /// The block where the def happens. + uint64_t InstNo : 20; /// The Instruction where the def happens. ---------------- You might find it convenient to use the new bitfield utilities from https://reviews.llvm.org/D82454. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:245 + LocIdx LocNo : NUM_LOC_BITS; /// The machine location where the def happens. + // (No idea why this can work as a LocIdx, it probably shouldn't) + ---------------- I don't follow this caveat, could you rephrase this? ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:313 + static bool isEqual(LocIdx A, LocIdx B) { return A == B; } +}; + ---------------- Wdyt of getting rid of these DenseMapInfo specializations? 
Having special reserved values complicates things a bit. If profiling demonstrates that std::map is a bottleneck, they could be added back. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:321 +/// the the value, and Boolean of whether or not it's indirect. +typedef std::pair MetaVal; + ---------------- Seems worthwhile to make this a proper class, with a constructor that accepts a MachineInstr and fills out the structure. I also find the name somewhat non-specific. Wdyt of "DbgValueProperties"? ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:360 + /// as the number of registers on the target -- if the value in the register + /// is not being tracked, then the LocIdx value will be zero. New entries are + /// appended if a new spill slot begins being tracked. ---------------- Why does there need to be a LocIdx reserved for the case where the value in a register isn't tracked? It doesn't look like this is done for stack slots. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:393 + LocIdxToIDNum.push_back({0, 0, LocIdx(0)}); + LocIDToLocIdx.resize(NumRegs); + memset(&LocIDToLocIdx[0], 0, NumRegs * sizeof(LocIdx)); ---------------- Could just be `LocIDToLocIdx.assign(NumRegs, LocIdx(0))`? ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:669 +/// (DebugVariable specific) dataflow analysis. +class ValueRec { +public: ---------------- nit -- "Rec" is suggestive of "recurrence". Wdyt of naming this "DbgValue"? ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:674 + /// If Kind is Const, the MachineOperand defining this value. + Optional MO; + /// Qualifiers for the ValueIDNum above. ---------------- Wdyt of grouping 'ID' and 'MO' in a union? This would make it clear that they cannot both be in use at the same time. 
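A rough sketch of what grouping the two payloads could look like. It uses `std::variant` (which requires C++17) rather than a raw union, so the active member is tracked automatically. `ValueID`, `ConstOperand` and `DbgValueSketch` are hypothetical stand-ins, not the types in the patch, and only two kinds are modelled here.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for the value number and the constant operand.
struct ValueID { std::uint64_t Raw; };
struct ConstOperand { std::int64_t Imm; };

// Holds exactly one of the two payloads; they can never both be in use.
class DbgValueSketch {
public:
  explicit DbgValueSketch(ValueID ID) : Payload(ID) {}
  explicit DbgValueSketch(ConstOperand MO) : Payload(MO) {}

  bool isDef() const { return std::holds_alternative<ValueID>(Payload); }
  bool isConst() const { return std::holds_alternative<ConstOperand>(Payload); }

  // Each accessor returns nullptr when the other alternative is active.
  const ValueID *getID() const { return std::get_if<ValueID>(&Payload); }
  const ConstOperand *getMO() const {
    return std::get_if<ConstOperand>(&Payload);
  }

private:
  std::variant<ValueID, ConstOperand> Payload;
};
```

With this shape the separate Kind field and the payload cannot drift out of sync, since the kind is derived from which alternative is stored.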
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83047/new/ https://reviews.llvm.org/D83047 From llvm-commits at lists.llvm.org Mon Jul 6 11:54:59 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:54:59 +0000 (UTC) Subject: [PATCH] D83246: [Attributor] use liveness information from AAIsDead in AAReachability and cache query results In-Reply-To: References: Message-ID: <4c8c1b22ddbb4999c5d991b41aad1272@localhost.localdomain> okura updated this revision to Diff 275779. okura added a comment. Added a comment CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83246/new/ https://reviews.llvm.org/D83246 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/AttributorAttributes.cpp Index: llvm/lib/Transforms/IPO/AttributorAttributes.cpp =================================================================== --- llvm/lib/Transforms/IPO/AttributorAttributes.cpp +++ llvm/lib/Transforms/IPO/AttributorAttributes.cpp @@ -2469,7 +2469,7 @@ const auto &ReachabilityAA = A.getAAFor(*this, IRPosition::function(*ScopeFn)); - if (!ReachabilityAA.isAssumedReachable(UserI, getCtxI())) + if (!ReachabilityAA.isAssumedReachable(UserI, getCtxI(), A)) return true; if (auto *CB = dyn_cast(UserI)) { Index: llvm/include/llvm/Transforms/IPO/Attributor.h =================================================================== --- llvm/include/llvm/Transforms/IPO/Attributor.h +++ llvm/include/llvm/Transforms/IPO/Attributor.h @@ -715,6 +715,19 @@ /// Return the map conaining all the knowledge we have from `llvm.assume`s. 
const RetainedKnowledgeMap &getKnowledgeMap() const { return KnowledgeMap; } + bool getPotentiallyReachable(const Instruction *From, const Instruction *To) { + auto KeyPair = std::make_pair(From, To); + auto Iter = PotentiallyReachableMap.find(KeyPair); + const Function &F = *From->getFunction(); + if (Iter != PotentiallyReachableMap.end()) + return Iter->second; + bool Result = isPotentiallyReachable( + From, To, nullptr, AG.getAnalysis(F), + AG.getAnalysis(F)); + PotentiallyReachableMap.insert(std::make_pair(KeyPair, Result)); + return Result; + } + private: struct FunctionInfo { ~FunctionInfo(); @@ -774,6 +787,10 @@ /// Set of inlineable functions SmallPtrSet InlineableFunctions; + /// A map for caching results of queries for isPotentiallyReachable + DenseMap, bool> + PotentiallyReachableMap; + /// Give the Attributor access to the members so /// Attributor::identifyDefaultAbstractAttributes(...) can initialize them. friend struct Attributor; @@ -2291,16 +2308,21 @@ /// Returns true if 'From' instruction is assumed to reach, 'To' instruction. /// Users should provide two positions they are interested in, and the class /// determines (and caches) reachability. - bool isAssumedReachable(const Instruction *From, - const Instruction *To) const { - return isPotentiallyReachable(From, To); + bool isAssumedReachable(const Instruction *From, const Instruction *To, + Attributor &A) const { + const auto &LivenessAA = + A.getAAFor(*this, IRPosition::value(*From)); + if (A.isAssumedDead(*From, this, &LivenessAA)) + return false; + return A.getInfoCache().getPotentiallyReachable(From, To); } /// Returns true if 'From' instruction is known to reach, 'To' instruction. /// Users should provide two positions they are interested in, and the class /// determines (and caches) reachability. 
- bool isKnownReachable(const Instruction *From, const Instruction *To) const { - return isPotentiallyReachable(From, To); + bool isKnownReachable(const Instruction *From, const Instruction *To, + Attributor &A) const { + return A.getInfoCache().getPotentiallyReachable(From, To); } /// Create an abstract attribute view for the position \p IRP. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83246.275779.patch Type: text/x-patch Size: 3397 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 11:57:32 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 18:57:32 +0000 (UTC) Subject: [PATCH] D83247: [compiler-rt][asan][hwasan] Refactor shadow setup into sanitizer_common (NFCI) Message-ID: tejohnson created this revision. tejohnson added reviewers: vitalybuka, kcc, eugenis. Herald added subscribers: Sanitizers, dberris. Herald added a project: Sanitizers. This refactors some common support related to shadow memory setup from asan and hwasan into sanitizer_common. This should not only reduce code duplication but also make these facilities available for new compiler-rt uses (e.g. heap profiling). In most cases the separate copies of the code were either identical, or at least functionally identical. A few notes: In ProtectGap, the asan version checked the address against an upper bound (kZeroBaseMaxShadowStart, which is (2^18). I have created a copy of kZeroBaseMaxShadowStart in hwasan_mapping.h, with the same value, as it isn't clear why that code should not do the same check. If it shouldn't, I can remove this and guard this check so that it only happens for asan. In asan's InitializeShadowMemory, in the dynamic shadow case it was setting __asan_shadow_memory_dynamic_address to 0 (which then sets both macro SHADOW_OFFSET as well as macro kLowShadowBeg to 0) before calling FindDynamicShadowStart(). 
AFAICT this is only needed because FindDynamicShadowStart utilizes kHighShadowEnd to get the shadow size, and kHighShadowEnd is a macro invoking MEM_TO_SHADOW(kHighMemEnd) which in turn invokes:

  (((kHighMemEnd) >> SHADOW_SCALE) + (SHADOW_OFFSET))

I.e. it computes the shadow space needed by kHighMemEnd (the shift), and adds the offset. Since we only want the shadow space here, the earlier setting of SHADOW_OFFSET to 0 via __asan_shadow_memory_dynamic_address accomplishes this. In the hwasan version, it simply gets the shadow space via "MemToShadowSize(kHighMemEnd)", where MemToShadowSize just does the shift. I've simplified the asan handling to do the same thing, and therefore was able to remove the setting of the SHADOW_OFFSET via __asan_shadow_memory_dynamic_address to 0.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83247

Files:
  compiler-rt/lib/asan/asan_internal.h
  compiler-rt/lib/asan/asan_linux.cpp
  compiler-rt/lib/asan/asan_mac.cpp
  compiler-rt/lib/asan/asan_premap_shadow.cpp
  compiler-rt/lib/asan/asan_rtl.cpp
  compiler-rt/lib/asan/asan_shadow_setup.cpp
  compiler-rt/lib/asan/asan_win.cpp
  compiler-rt/lib/hwasan/hwasan.cpp
  compiler-rt/lib/hwasan/hwasan.h
  compiler-rt/lib/hwasan/hwasan_dynamic_shadow.cpp
  compiler-rt/lib/hwasan/hwasan_linux.cpp
  compiler-rt/lib/hwasan/hwasan_mapping.h
  compiler-rt/lib/sanitizer_common/sanitizer_common.cpp
  compiler-rt/lib/sanitizer_common/sanitizer_common.h
  compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cpp
  compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp
  compiler-rt/lib/sanitizer_common/sanitizer_win.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83247.275781.patch
Type: text/x-patch
Size: 27597 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 11:57:38 2020
From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 18:57:38 +0000 (UTC)
Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang
In-Reply-To:
References:
Message-ID: <4aeea9fb6322acddf9e87d19d4d397a7@localhost.localdomain>

biplmish marked an inline comment as done.
biplmish added inline comments.

================
Comment at: clang/lib/Headers/altivec.h:17210
+
+static __inline__ vector double __ATTRS_o_ai vec_splatid(const float __a) {
+  return ((vector double)((double)__a));
----------------
amyk wrote:
> Move function name on next line for consistency.
These changes break clang-format.

================
Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:633
+vector signed int test_vec_vec_splati_si(void) {
+  // CHECK: ret <4 x i32>
+  return vec_splati(-17);
----------------
lei wrote:
> missing CHECK-BE
vec_splati and vec_splati do not have an endianness-specific implementation.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82520/new/

https://reviews.llvm.org/D82520

From llvm-commits at lists.llvm.org Mon Jul 6 11:59:06 2020
From: llvm-commits at lists.llvm.org (Sam McCall via llvm-commits)
Date: Mon, 6 Jul 2020 20:59:06 +0200
Subject: [llvm] d7ea6ce - [Support] fix user_cache_directory on mac
In-Reply-To:
References: <5f0302dc.1c69fb81.983ca.9e7d@mx.google.com>
Message-ID:

This is a compile fix for cd209f1a3790, which is tested including on Mac. The #ifdef __APPLE__ part didn't compile, and if it did the test wouldn't have passed. I didn't have a Mac to test on.

(Typo in commit message: should be user_config_directory)

On Mon, Jul 6, 2020, 8:36 PM David Blaikie wrote:

> Any chance of testing this? (maybe even a unit test? (even if it's
> uninteresting on other platforms, perhaps)?)
> > On Mon, Jul 6, 2020 at 3:54 AM Sam McCall via llvm-commits > wrote: > > > > > > Author: Sam McCall > > Date: 2020-07-06T12:54:11+02:00 > > New Revision: d7ea6ce809a4413afb1edafa17ba291b39129f52 > > > > URL: > https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52 > > DIFF: > https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52.diff > > > > LOG: [Support] fix user_cache_directory on mac > > > > Added: > > > > > > Modified: > > llvm/lib/Support/Unix/Path.inc > > > > Removed: > > > > > > > > > ################################################################################ > > diff --git a/llvm/lib/Support/Unix/Path.inc > b/llvm/lib/Support/Unix/Path.inc > > index c35db79cbd8a..d91b269cc6d3 100644 > > --- a/llvm/lib/Support/Unix/Path.inc > > +++ b/llvm/lib/Support/Unix/Path.inc > > @@ -1193,7 +1193,7 @@ bool user_config_directory(SmallVectorImpl > &result) { > > #ifdef __APPLE__ > > // Mac: ~/Library/Preferences/ > > if (home_directory(result)) { > > - append("Library", "Preferences"); > > + append(result, "Library", "Preferences"); > > return true; > > } > > #else > > > > > > > > _______________________________________________ > > llvm-commits mailing list > > llvm-commits at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:00:11 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:00:11 +0000 (UTC) Subject: [PATCH] D82791: [lit] Improve lit's output with default settings and --verbose. In-Reply-To: References: Message-ID: <9143896c53f9810308f40130c5aa0893@localhost.localdomain> yln added a comment. The changes here are for text printed for failing tests, right? And `showAllOutput` is orthogonal and just shows everything even for non-failing tests? 

================
Comment at: llvm/utils/lit/lit/OutputSettings.py:34
+ONLY_FAILING_COMMAND = CommandOutputStyle("OnlyFailing")
+UP_TO_AND_INCLUDING_FAILING_COMMAND = CommandOutputStyle("UpToAndIncluding")
+
----------------
I really like the "communicating intent" part of this infrastructure. However, this is a lot of code and `if`s considering we really only have to distinguish two cases. From your commit message:

- default (no flags): no script, show only failing line in command output
- `-v`: full script, adds 'set +x', shows command output

Am I missing something, or could everything be keyed off a single verbose flag (reading from the code below, `showAllOutput` implies verbose)? Do we anticipate other combinations being useful? (In general I would argue for striving for the simplest implementation that gives us what we currently want and not try to anticipate future extensions.)

Have you considered (or started out with) just using a single verbose flag to base your decisions on in the implementation functions?

================
Comment at: llvm/utils/lit/lit/TestRunner.py:1499
+def locate_last_run_line(output_str):
+    """
----------------
I feel like this function is the most complex part of this patch. I don't fully follow it, but you experimented and believe this is the best approach and wrote a whole test suite so I am happy with it.

================
Comment at: llvm/utils/lit/lit/TestRunner.py:1503-1508
+    Returns a pair of:
+    - The index in ``output_str`` pointing to immediately after the preceding
+      newline, i.e. the start of the RUN line, before any shell-specific
+      prefix.
+    - The matched substring itself, including the number at the end,
+      starting with 'RUN', skipping the shell-specific prefix.
----------------
Why use a complicated parser-like return value? Our only caller below could just receive the potentially found RUN line.

================
Comment at: llvm/utils/lit/lit/TestRunner.py:1593
+    assert(lit_config.script_output_style == OutputSettings.FULL_SCRIPT)
+    return default_output()
+
----------------
Why are we using two local functions here?

The whole thing could be (already assuming just one verbose flag):

```
def make_script_output(lit_config, script_lines, exit_code):
    if not lit_config.verbose:
        return ""
    return ...
```

================
Comment at: llvm/utils/lit/lit/TestRunner.py:1614-1617
+    line_start, run_line_str = locate_last_run_line(cmd_output)
+    # maybe there was an error, or maybe we are not truncating anything
+    if run_line_str is None or line_start == 0:
+        return default_output()
----------------
This block should be pushed into the `lit_config.command_output_style == OutputSettings.ONLY_FAILING_COMMAND` case, otherwise we are always searching for the last line, even if we don't really use the result of the computation.

Also: if we fail to find the RUN line, we might print the note to use `--verbose` even if the user already specified it.

================
Comment at: llvm/utils/lit/lit/TestRunner.py:1624
+           == OutputSettings.UP_TO_AND_INCLUDING_FAILING_COMMAND)
+    return make_output(cmd_output, is_truncated=False)
----------------
Please try to simplify this a bit, maybe the following?

```
def make_command_output():
    if not lit_config.quiet:
        return ""

    verbose = lit_config.verbose
    output = "header {} ...".format(", truncated" if verbose else "")
    if verbose:
        output += cmd_output
    else:
        run_line = locate_last_run_line()
        # then deal with "not found" case
        output += ...
        output += "Note: try to use --verbose"
    return output
```

================
Comment at: llvm/utils/lit/lit/cl_arguments.py:56
             action="store_true")
+    # TODO(python3): Use aliases kwarg for add_argument above.
     format_group.add_argument("-vv", "--echo-all-commands",
----------------
I think the aliases kwarg is something else (it seems to be related to subparsers). You could just add `"-vv", "--echo-all-commands"` after `"--verbose"` to allow for additional names for the flag, but I think I prefer to have it separate (makes it easier to remove it if we ever decide to drop the alias in the future). So long comment short: just drop this comment please.

================
Comment at: llvm/utils/lit/lit/cl_arguments.py:178
+        opts.command_output_style = OutputSettings.ONLY_FAILING_COMMAND
+        opts.echo_all_commands = True
+
----------------
Unconditionally overwritten below

================
Comment at: llvm/utils/lit/lit/cl_arguments.py:195
+        cmd_output_style == OutputSettings.UP_TO_AND_INCLUDING_FAILING_COMMAND
+        or cmd_output_style == OutputSettings.ONLY_FAILING_COMMAND)
----------------
`opts.echo_all_commands = opts.command_output_style in {OutputSettings.UP_TO_AND_INCLUDING_FAILING_COMMAND, ...}`

================
Comment at: llvm/utils/lit/tests/shtest-run-at-line.py:70
+
+################################################################################
+# Checking lines for verbose output
----------------
I really like the extension to this test. Thanks!

================
Comment at: llvm/utils/lit/tests/unittest-failing-locator.py:122
+
+unittest.main()
----------------
Thanks for being diligent and adding these tests. This will also make it easy to add additional tests for corner cases if we need them in the future!
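
For illustration only, the single-flag shape suggested in the TestRunner.py:1624 comment might look roughly like the sketch below. Everything in it is hypothetical rather than taken from the patch: `LitConfig` is a stand-in dataclass with just `quiet` and `verbose` fields, `find_last_run_line` is a simplified placeholder for the patch's `locate_last_run_line` (it assumes RUN lines are echoed as `RUN: at line N`), and the "Command Output" header text is made up. The sketch also assumes that `quiet` is meant to suppress the section and that the truncation marker belongs on the non-verbose path.

```python
import re
from dataclasses import dataclass


@dataclass
class LitConfig:
    """Hypothetical stand-in for lit's config; only the two flags used here."""
    quiet: bool = False
    verbose: bool = False


# Assumption: the shell echoes each RUN line as "RUN: at line N".
_RUN_LINE = re.compile(r"^.*RUN: at line \d+.*$", re.MULTILINE)


def find_last_run_line(cmd_output):
    """Return the offset where the last RUN line starts, or None if absent."""
    matches = list(_RUN_LINE.finditer(cmd_output))
    return matches[-1].start() if matches else None


def make_command_output(lit_config, cmd_output):
    """Build the command-output section for a failing test from one flag."""
    if lit_config.quiet:
        return ""
    if lit_config.verbose:
        return "Command Output:\n" + cmd_output
    start = find_last_run_line(cmd_output)
    if start is None or start == 0:
        # Nothing was cut (no RUN line found, or it is the first line),
        # so do not claim truncation and do not suggest --verbose.
        return "Command Output:\n" + cmd_output
    return ("Command Output (truncated):\n" + cmd_output[start:]
            + "\nNote: try --verbose to see the full output\n")
```

Searching for the RUN line only on the non-verbose path also covers the TestRunner.py:1614-1617 remark: the `--verbose` hint is emitted only when something was actually cut.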

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82791/new/

https://reviews.llvm.org/D82791

From llvm-commits at lists.llvm.org Mon Jul 6 12:05:50 2020
From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits)
Date: Mon, 6 Jul 2020 12:05:50 -0700
Subject: [llvm] 8c288db - Reland [ADT] Support const-qualified unique_functions
In-Reply-To: <5efa43ac.1c69fb81.93a5c.176a@mx.google.com>
References: <5efa43ac.1c69fb81.93a5c.176a@mx.google.com>
Message-ID:

When recommitting a patch it'd be handy if you could include the full original commit message (since this is the patch folks are likely going to look at when going back to understand these changes via commit history, etc) - and I like to include the original commit hash as well as the revert hash, as well as some details about what was changed between the original commit and the recommit (so those changes can be post-commit reviewed more easily, if the original commit had already been reviewed by that reviewer) & whatever extra testing was done to ensure the class of issues that caused the revert had been addressed (eg: if this failed to compile with a certain compiler - explaining that LLVM was rebuilt with that compiler (validating both that you could reproduce the reason for the revert, and that the fixed version of the patch no longer produced that problem nor any others from that source/compiler)).

Thanks!

On Mon, Jun 29, 2020 at 3:54 PM Sam McCall via llvm-commits wrote:
>
>
> Author: Sam McCall
> Date: 2020-06-29T21:40:16+02:00
> New Revision: 8c288db2c69a2a9c75bfa629d54f80cf8953b11b
>
> URL: https://github.com/llvm/llvm-project/commit/8c288db2c69a2a9c75bfa629d54f80cf8953b11b
> DIFF: https://github.com/llvm/llvm-project/commit/8c288db2c69a2a9c75bfa629d54f80cf8953b11b.diff
>
> LOG: Reland [ADT] Support const-qualified unique_functions
>
> This reverts commit 09b6dffb8ed19d624fddc7a57ce886f8be3c45b2.
>
> Now compiles with GCC!
> > Added: > > > Modified: > llvm/include/llvm/ADT/FunctionExtras.h > llvm/unittests/ADT/FunctionExtrasTest.cpp > > Removed: > > > > ################################################################################ > diff --git a/llvm/include/llvm/ADT/FunctionExtras.h b/llvm/include/llvm/ADT/FunctionExtras.h > index ad84bbc35b78..b675889bce33 100644 > --- a/llvm/include/llvm/ADT/FunctionExtras.h > +++ b/llvm/include/llvm/ADT/FunctionExtras.h > @@ -11,11 +11,11 @@ > /// in ``. > /// > /// It provides `unique_function`, which works like `std::function` but supports > -/// move-only callable objects. > +/// move-only callable objects and const-qualification. > /// > /// Future plans: > -/// - Add a `function` that provides const, volatile, and ref-qualified support, > -/// which doesn't work with `std::function`. > +/// - Add a `function` that provides ref-qualified support, which doesn't work > +/// with `std::function`. > /// - Provide support for specifying multiple signatures to type erase callable > /// objects with an overload set, such as those produced by generic lambdas. > /// - Expand to include a copyable utility that directly replaces std::function > @@ -37,13 +37,31 @@ > #include "llvm/Support/MemAlloc.h" > #include "llvm/Support/type_traits.h" > #include > +#include > > namespace llvm { > > +/// unique_function is a type-erasing functor similar to std::function. > +/// > +/// It can hold move-only function objects, like lambdas capturing unique_ptrs. > +/// Accordingly, it is movable but not copyable. > +/// > +/// It supports const-qualification: > +/// - unique_function has a const operator(). > +/// It can only hold functions which themselves have a const operator(). > +/// - unique_function has a non-const operator(). > +/// It can hold functions with a non-const operator(), like mutable lambdas. 
> template class unique_function; > > -template > -class unique_function { > +namespace detail { > + > +template > +using EnableIfTrivial = > + std::enable_if_t::value && > + std::is_trivially_destructible::value>; > + > +template class UniqueFunctionBase { > +protected: > static constexpr size_t InlineStorageSize = sizeof(void *) * 3; > > // MSVC has a bug and ICEs if we give it a particular dependent value > @@ -113,8 +131,11 @@ class unique_function { > > // For in-line storage, we just provide an aligned character buffer. We > // provide three pointers worth of storage here. > - typename std::aligned_storage::type > - InlineStorage; > + // This is mutable as an inlined `const unique_function` may > + // still modify its own mutable members. > + mutable > + typename std::aligned_storage::type > + InlineStorage; > } StorageUnion; > > // A compressed pointer to either our dispatching callback or our table of > @@ -137,11 +158,25 @@ class unique_function { > .template get(); > } > > - void *getInlineStorage() { return &StorageUnion.InlineStorage; } > + CallPtrT getCallPtr() const { > + return isTrivialCallback() ? getTrivialCallback() > + : getNonTrivialCallbacks()->CallPtr; > + } > > - void *getOutOfLineStorage() { > + // These three functions are only const in the narrow sense. They return > + // mutable pointers to function state. > + // This allows unique_function::operator() to be const, even if the > + // underlying functor may be internally mutable. > + // > + // const callers must ensure they're only used in const-correct ways. > + void *getCalleePtr() const { > + return isInlineStorage() ? 
getInlineStorage() : getOutOfLineStorage(); > + } > + void *getInlineStorage() const { return &StorageUnion.InlineStorage; } > + void *getOutOfLineStorage() const { > return StorageUnion.OutOfLineStorage.StoragePtr; > } > + > size_t getOutOfLineStorageSize() const { > return StorageUnion.OutOfLineStorage.Size; > } > @@ -153,10 +188,11 @@ class unique_function { > StorageUnion.OutOfLineStorage = {Ptr, Size, Alignment}; > } > > - template > - static ReturnT CallImpl(void *CallableAddr, AdjustedParamT... Params) { > - return (*reinterpret_cast(CallableAddr))( > - std::forward(Params)...); > + template > + static ReturnT CallImpl(void *CallableAddr, > + AdjustedParamT... Params) { > + auto &Func = *reinterpret_cast(CallableAddr); > + return Func(std::forward(Params)...); > } > > template > @@ -170,11 +206,54 @@ class unique_function { > reinterpret_cast(CallableAddr)->~CallableT(); > } > > -public: > - unique_function() = default; > - unique_function(std::nullptr_t /*null_callable*/) {} > + // The pointers to call/move/destroy functions are determined for each > + // callable type (and called-as type, which determines the overload chosen). > + // (definitions are out-of-line). > + > + // By default, we need an object that contains all the different > + // type erased behaviors needed. Create a static instance of the struct type > + // here and each instance will contain a pointer to it. > + // Wrap in a struct to avoid https://gcc.gnu.org/PR71954 > + template > + struct CallbacksHolder { > + static NonTrivialCallbacks Callbacks; > + }; > + // See if we can create a trivial callback. We need the callable to be > + // trivially moved and trivially destroyed so that we don't have to store > + // type erased callbacks for those operations. > + template > + struct CallbacksHolder> { > + static TrivialCallback Callbacks; > + }; > + > + // A simple tag type so the call-as type to be passed to the constructor.
> + template struct CalledAs {}; > + > + // Essentially the "main" unique_function constructor, but subclasses > + // provide the qualified type to be used for the call. > + // (We always store a T, even if the call will use a pointer to const T). > + template > + UniqueFunctionBase(CallableT Callable, CalledAs) { > + bool IsInlineStorage = true; > + void *CallableAddr = getInlineStorage(); > + if (sizeof(CallableT) > InlineStorageSize || > + alignof(CallableT) > alignof(decltype(StorageUnion.InlineStorage))) { > + IsInlineStorage = false; > + // Allocate out-of-line storage. FIXME: Use an explicit alignment > + // parameter in C++17 mode. > + auto Size = sizeof(CallableT); > + auto Alignment = alignof(CallableT); > + CallableAddr = allocate_buffer(Size, Alignment); > + setOutOfLineStorage(CallableAddr, Size, Alignment); > + } > + > + // Now move into the storage. > + new (CallableAddr) CallableT(std::move(Callable)); > + CallbackAndInlineFlag = {&CallbacksHolder::Callbacks, > + IsInlineStorage}; > + } > > - ~unique_function() { > + ~UniqueFunctionBase() { > if (!CallbackAndInlineFlag.getPointer()) > return; > > @@ -190,7 +269,7 @@ class unique_function { > getOutOfLineStorageAlignment()); > } > > - unique_function(unique_function &&RHS) noexcept { > + UniqueFunctionBase(UniqueFunctionBase &&RHS) noexcept { > // Copy the callback and inline flag. > CallbackAndInlineFlag = RHS.CallbackAndInlineFlag; > > @@ -219,72 +298,83 @@ class unique_function { > #endif > } > > - unique_function &operator=(unique_function &&RHS) noexcept { > + UniqueFunctionBase &operator=(UniqueFunctionBase &&RHS) noexcept { > if (this == &RHS) > return *this; > > // Because we don't try to provide any exception safety guarantees we can > // implement move assignment very simply by first destroying the current > // object and then move-constructing over top of it. 
> - this->~unique_function(); > - new (this) unique_function(std::move(RHS)); > + this->~UniqueFunctionBase(); > + new (this) UniqueFunctionBase(std::move(RHS)); > return *this; > } > > - template unique_function(CallableT Callable) { > - bool IsInlineStorage = true; > - void *CallableAddr = getInlineStorage(); > - if (sizeof(CallableT) > InlineStorageSize || > - alignof(CallableT) > alignof(decltype(StorageUnion.InlineStorage))) { > - IsInlineStorage = false; > - // Allocate out-of-line storage. FIXME: Use an explicit alignment > - // parameter in C++17 mode. > - auto Size = sizeof(CallableT); > - auto Alignment = alignof(CallableT); > - CallableAddr = allocate_buffer(Size, Alignment); > - setOutOfLineStorage(CallableAddr, Size, Alignment); > - } > + UniqueFunctionBase() = default; > > - // Now move into the storage. > - new (CallableAddr) CallableT(std::move(Callable)); > +public: > + explicit operator bool() const { > + return (bool)CallbackAndInlineFlag.getPointer(); > + } > +}; > > - // See if we can create a trivial callback. We need the callable to be > - // trivially moved and trivially destroyed so that we don't have to store > - // type erased callbacks for those operations. > - // > - // FIXME: We should use constexpr if here and below to avoid instantiating > - // the non-trivial static objects when unnecessary. While the linker should > - // remove them, it is still wasteful. > - if (llvm::is_trivially_move_constructible::value && > - std::is_trivially_destructible::value) { > - // We need to create a nicely aligned object. We use a static variable > - // for this because it is a trivial struct. 
> - static TrivialCallback Callback = { &CallImpl }; > - > - CallbackAndInlineFlag = {&Callback, IsInlineStorage}; > - return; > - } > +template > +template > +typename UniqueFunctionBase::NonTrivialCallbacks UniqueFunctionBase< > + R, P...>::CallbacksHolder::Callbacks = { > + &CallImpl, &MoveImpl, &DestroyImpl}; > > - // Otherwise, we need to point at an object that contains all the different > - // type erased behaviors needed. Create a static instance of the struct type > - // here and then use a pointer to that. > - static NonTrivialCallbacks Callbacks = { > - &CallImpl, &MoveImpl, &DestroyImpl}; > +template > +template > +typename UniqueFunctionBase::TrivialCallback > + UniqueFunctionBase::CallbacksHolder< > + CallableT, CalledAsT, EnableIfTrivial>::Callbacks{ > + &CallImpl}; > > - CallbackAndInlineFlag = {&Callbacks, IsInlineStorage}; > - } > +} // namespace detail > + > +template > +class unique_function : public detail::UniqueFunctionBase { > + using Base = detail::UniqueFunctionBase; > + > +public: > + unique_function() = default; > + unique_function(std::nullptr_t) {} > + unique_function(unique_function &&) = default; > + unique_function(const unique_function &) = delete; > + unique_function &operator=(unique_function &&) = default; > + unique_function &operator=(const unique_function &) = delete; > > - ReturnT operator()(ParamTs... Params) { > - void *CallableAddr = > - isInlineStorage() ? getInlineStorage() : getOutOfLineStorage(); > + template > + unique_function(CallableT Callable) > + : Base(std::forward(Callable), > + typename Base::template CalledAs{}) {} > > - return (isTrivialCallback() > - ? getTrivialCallback() > - : getNonTrivialCallbacks()->CallPtr)(CallableAddr, Params...); > + R operator()(P... 
Params) { > + return this->getCallPtr()(this->getCalleePtr(), Params...); > } > +}; > > - explicit operator bool() const { > - return (bool)CallbackAndInlineFlag.getPointer(); > +template > +class unique_function > + : public detail::UniqueFunctionBase { > + using Base = detail::UniqueFunctionBase; > + > +public: > + unique_function() = default; > + unique_function(std::nullptr_t) {} > + unique_function(unique_function &&) = default; > + unique_function(const unique_function &) = delete; > + unique_function &operator=(unique_function &&) = default; > + unique_function &operator=(const unique_function &) = delete; > + > + template > + unique_function(CallableT Callable) > + : Base(std::forward(Callable), > + typename Base::template CalledAs{}) {} > + > + R operator()(P... Params) const { > + return this->getCallPtr()(this->getCalleePtr(), Params...); > } > }; > > > diff --git a/llvm/unittests/ADT/FunctionExtrasTest.cpp b/llvm/unittests/ADT/FunctionExtrasTest.cpp > index bbbb045cb14a..2ae0d1813858 100644 > --- a/llvm/unittests/ADT/FunctionExtrasTest.cpp > +++ b/llvm/unittests/ADT/FunctionExtrasTest.cpp > @@ -10,6 +10,7 @@ > #include "gtest/gtest.h" > > #include > +#include > > using namespace llvm; > > @@ -224,4 +225,41 @@ TEST(UniqueFunctionTest, CountForwardingMoves) { > UnmovableF(X); > } > > +TEST(UniqueFunctionTest, Const) { > + // Can assign from const lambda. > + unique_function Plus2 = [X(std::make_unique(2))](int Y) { > + return *X + Y; > + }; > + EXPECT_EQ(5, Plus2(3)); > + > + // Can call through a const ref. > + const auto &Plus2Ref = Plus2; > + EXPECT_EQ(5, Plus2Ref(3)); > + > + // Can move-construct and assign. > + unique_function Plus2A = std::move(Plus2); > + EXPECT_EQ(5, Plus2A(3)); > + unique_function Plus2B; > + Plus2B = std::move(Plus2A); > + EXPECT_EQ(5, Plus2B(3)); > + > + // Can convert to non-const function type, but not back. 
> + unique_function Plus2C = std::move(Plus2B); > + EXPECT_EQ(5, Plus2C(3)); > + > + // Overloaded call operator correctly resolved. > + struct ChooseCorrectOverload { > + StringRef operator()() { return "non-const"; } > + StringRef operator()() const { return "const"; } > + }; > + unique_function ChooseMutable = ChooseCorrectOverload(); > + ChooseCorrectOverload A; > + EXPECT_EQ("non-const", ChooseMutable()); > + EXPECT_EQ("non-const", A()); > + unique_function ChooseConst = ChooseCorrectOverload(); > + const ChooseCorrectOverload &X = A; > + EXPECT_EQ("const", ChooseConst()); > + EXPECT_EQ("const", X()); > +} > + > } // anonymous namespace > > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Mon Jul 6 12:06:02 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:06:02 +0000 (UTC) Subject: [PATCH] D83248: [NFCI][IR] Introduce CallBase::Create() wrapper Message-ID: lebedev.ri created this revision. lebedev.ri added reviewers: chandlerc, jdoerfert, dblaikie, nickdesaulniers. lebedev.ri added a project: LLVM. Herald added a subscriber: hiraditya. It is reasonably common to want to clone some call with different bundles. Let's actually provide an interface to do that. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83248 Files: llvm/include/llvm/IR/InstrTypes.h llvm/lib/IR/Instructions.cpp llvm/lib/Transforms/CFGuard/CFGuard.cpp llvm/lib/Transforms/IPO/GlobalOpt.cpp llvm/lib/Transforms/Utils/InlineFunction.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83248.275782.patch Type: text/x-patch Size: 5210 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:06:06 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:06:06 +0000 (UTC) Subject: [PATCH] D83247: [compiler-rt][asan][hwasan] Refactor shadow setup into sanitizer_common (NFCI) In-Reply-To: References: Message-ID: <5c8c2c5d9ccae3b28b073a1754e3fcae@localhost.localdomain> tejohnson marked 3 inline comments as done. tejohnson added a comment. A couple comments and a question below. Also, what is the best way to test this for mac and win? I tried to find some directions for doing some cross-compiler builds of compiler-rt at least, but couldn't find something clear and definitive. ================ Comment at: compiler-rt/lib/asan/asan_linux.cpp:122 - uptr granularity = GetMmapGranularity(); - uptr alignment = granularity * 8; - uptr left_padding = granularity; ---------------- The code in asan is multiplying the mmap granularity by 8, whereas the hwasan version shifts it by kShadowScale. I wasn't sure if the 8 here is supposed to be equivalent to a left shift by the shadow scale (which is typically 3 in asan), or is specifically hardcoded separately not using SHADOW_SCALE since it could be something other than 3 in some cases (e.g. 5 for myriad, or user set via ASAN_SHADOW_SCALE). Depending on what was intended here, I would keep the hardcoding of "3" passed to my refactored MapDynamicShadow, or change that to SHADOW_SCALE. ================ Comment at: compiler-rt/lib/asan/asan_shadow_setup.cpp:102 if (shadow_start == kDefaultShadowSentinel) { - __asan_shadow_memory_dynamic_address = 0; - CHECK_EQ(0, kLowShadowBeg); ---------------- See patch description on why I removed this code. ================ Comment at: compiler-rt/lib/hwasan/hwasan_mapping.h:40 +// With the zero shadow base we can not actually map pages starting from 0. +// This constant is somewhat arbitrary. 
---------------- Copied from the same values in asan. See the patch description for more info. Would it be better to keep a single copy of these values in sanitizer_common? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83247/new/ https://reviews.llvm.org/D83247 From llvm-commits at lists.llvm.org Mon Jul 6 12:06:10 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Mon, 06 Jul 2020 12:06:10 -0700 (PDT) Subject: [lld] dc6b3f0 - [ELF] Drop an unneeded reference to `symtab` from SymbolTable::addSymbol Message-ID: <5f037622.1c69fb81.14908.4d9b@mx.google.com> Author: William S. Moses Date: 2020-07-06T12:05:54-07:00 New Revision: dc6b3f03a872a1c551613e49db1d07bbdd8bfebb URL: https://github.com/llvm/llvm-project/commit/dc6b3f03a872a1c551613e49db1d07bbdd8bfebb DIFF: https://github.com/llvm/llvm-project/commit/dc6b3f03a872a1c551613e49db1d07bbdd8bfebb.diff LOG: [ELF] Drop an unneeded reference to `symtab` from SymbolTable::addSymbol The Symbol Table in LLD references the global object to add a symbol rather than adding it to itself. 
Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D83184 Added: Modified: lld/ELF/SymbolTable.cpp Removed: ################################################################################ diff --git a/lld/ELF/SymbolTable.cpp b/lld/ELF/SymbolTable.cpp index f0a6af437c5f..afc8b05f8767 100644 --- a/lld/ELF/SymbolTable.cpp +++ b/lld/ELF/SymbolTable.cpp @@ -94,7 +94,7 @@ Symbol *SymbolTable::insert(StringRef name) { } Symbol *SymbolTable::addSymbol(const Symbol &newSym) { - Symbol *sym = symtab->insert(newSym.getName()); + Symbol *sym = insert(newSym.getName()); sym->resolve(newSym); return sym; } From llvm-commits at lists.llvm.org Mon Jul 6 12:06:12 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:06:12 +0000 (UTC) Subject: [PATCH] D83184: Avoid using globals in ELF Symbol Table In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGdc6b3f03a872: [ELF] Drop an unneeded reference to `symtab` from SymbolTable::addSymbol (authored by William S. Moses <gh at wsmoses.com>, committed by MaskRay). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83184/new/ https://reviews.llvm.org/D83184 Files: lld/ELF/SymbolTable.cpp Index: lld/ELF/SymbolTable.cpp =================================================================== --- lld/ELF/SymbolTable.cpp +++ lld/ELF/SymbolTable.cpp @@ -94,7 +94,7 @@ } Symbol *SymbolTable::addSymbol(const Symbol &newSym) { - Symbol *sym = symtab->insert(newSym.getName()); + Symbol *sym = insert(newSym.getName()); sym->resolve(newSym); return sym; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83184.275784.patch Type: text/x-patch Size: 371 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:06:46 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:06:46 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: lebedev.ri updated this revision to Diff 275786. lebedev.ri marked 2 inline comments as done. lebedev.ri added a comment. Split `CallBase::Create()` into a separate patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 Files: llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83177.275786.patch Type: text/x-patch Size: 10454 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:06:56 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:06:56 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: lebedev.ri added inline comments. ================ Comment at: llvm/include/llvm/IR/InstrTypes.h:1145 + static CallBase *Create(CallBase *CB, ArrayRef Bundles, + Instruction *InsertPt = nullptr); + ---------------- nickdesaulniers wrote: > If we're going to create a new interface to `CallBase`, I kind of want to use it in more than just one place. In particular, at least `InlineFunction()` in `llvm/lib/Transforms/Utils/InlineFunction.cpp` looks like a perfect candidate to use this. There may be more, if you grep for the `cast`s. 
In that way, this change might help DRY up and also remove a repetitious pattern. WDYT? Doesn't really sound interesting to me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 12:07:03 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:07:03 +0000 (UTC) Subject: [PATCH] D82821: [WebAssembly] Added 64-bit memory.grow/size/init/copy/fill In-Reply-To: References: Message-ID: <955e97d9062d22551e863144bb2f1d98@localhost.localdomain> aardappel updated this revision to Diff 275785. aardappel added a comment. Reinstated init/drop defs + test CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82821/new/ https://reviews.llvm.org/D82821 Files: clang/include/clang/Basic/BuiltinsWebAssembly.def clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGen/builtins-wasm.c llvm/include/llvm/IR/IntrinsicsWebAssembly.td llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp llvm/test/CodeGen/WebAssembly/bulk-memory-intrinsics.ll llvm/test/CodeGen/WebAssembly/bulk-memory64.ll llvm/test/CodeGen/WebAssembly/memory-addr64.ll llvm/test/MC/WebAssembly/bulk-memory-encodings.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82821.275785.patch Type: text/x-patch Size: 22114 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:07:35 2020 From: llvm-commits at lists.llvm.org (Vedant Kumar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:07:35 +0000 (UTC) Subject: [PATCH] D83236: [DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour In-Reply-To: References: Message-ID: vsk accepted this revision. vsk added a comment. This revision is now accepted and ready to land. Thanks, lgtm. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83236/new/ https://reviews.llvm.org/D83236 From llvm-commits at lists.llvm.org Mon Jul 6 12:08:52 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:08:52 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: <5baef0076737e7a609be66857ec28435@localhost.localdomain> clementval updated this revision to Diff 275788. clementval marked 19 inline comments as done. clementval added a comment. Address review comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82982.275788.patch Type: text/x-patch Size: 95973 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:09:39 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:09:39 +0000 (UTC) Subject: [PATCH] D82716: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division In-Reply-To: References: Message-ID: spatel updated this revision to Diff 275787. spatel marked 2 inline comments as done. spatel added a comment. Patch updated: Use recursivelyDeleteUnusedNodes() to reduce code. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82716/new/ https://reviews.llvm.org/D82716 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/X86/sqrt-fastmath.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82716.275787.patch Type: text/x-patch Size: 8428 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:09:59 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:09:59 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: clementval added a comment. Thanks for the review @jdenny. The patch is updated to address your comments, and I have added some replies inline as well. ================ Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:65 +// Hold information about clause validity by version +class VersionedClause { + // Actual clause ---------------- jdenny wrote: > Why not unsigned as in the original? TableGen does not have an `unsigned` type, so I went with the closest one. ================ Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:69 + + // Mininum version number where this clause is valid in the list. + int minVersion = min; ---------------- jdenny wrote: > What does "the list" refer to? I updated the comment and dropped the mention of the list. I think it is clearer now. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:278 + VersionedClause, + VersionedClause]; } ---------------- jdenny wrote: > The closing `]` is inconsistently placed here and in some other cases. Good catch. I updated the file as well.
================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def:1881 -__OMP_DIRECTIVE_CLAUSE(flush, 50, ~0, release) -// TODO This should ne `none` instead -__OMP_DIRECTIVE_CLAUSE(flush, 1, ~0, flush) ---------------- jdenny wrote: > This patch loses this TODO and the next one. I'm not sure what they mean. Do we need to keep them? I updated the patch to keep it. Same for `depobj`. ================ Comment at: llvm/test/TableGen/directive1.td:109 +// IMPL-NEXT: assert(unsigned(C) <= llvm::tdl::Clause_enumSize); +// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clausea && 1 <= Version && 2147483647 >= Version) +// IMPL-NEXT: return true; ---------------- jdenny wrote: > I know the original code used a giant if-else block, but shouldn't this be a switch? The idea was to migrate to TableGen with the exact same code generated and to update this in a follow-up patch. It would be replaced with at least a switch on the `Directive`, and probably either an inner switch on the `Clause` or the existing `if`s kept inside the Directive switch. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:56 for (const auto &R : Records) { const auto Name = R->getValueAsString("name"); + OS << "constexpr auto " << Prefix << getFormattedName(Name) << " = " ---------------- jdenny wrote: > Any reason not to call `getFormattedName` here instead of twice below? Updated in the patch. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:183 const auto DefaultName = (*DefaultIt)->getValueAsString("name"); ---------------- jdenny wrote: > Call `getFormattedName(DefaultName)` once here? Updated as well. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:296 + + OS << "\n"; // Empty line at end of file } ---------------- jdenny wrote: > Why is an empty line needed? Just to be consistent with clang-format in the generated file.
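For reference, the switch-based shape discussed in the directive1.td comment above could look roughly like the sketch below. This is only an illustration under assumptions: the `Directive` and `Clause` enums, the `kNoMaxVersion` constant, and the version ranges are made up for the example and are not what DirectiveEmitter generates today.

```cpp
#include <cassert>

// Hypothetical stand-ins for the enums the emitter would generate.
enum class Directive { dira, dirb };
enum class Clause { clausea, clauseb };

// Upper bound used when a clause has no maximum version (assumed value,
// matching the 2147483647 seen in the test expectations).
constexpr unsigned kNoMaxVersion = 2147483647u;

// Outer switch on the directive, inner switch on the clause. Each leaf
// keeps the same min/max version comparison as the current if-chain.
bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) {
  switch (D) {
  case Directive::dira:
    switch (C) {
    case Clause::clausea:
      return 1 <= Version && Version <= kNoMaxVersion;
    case Clause::clauseb:
      return 50 <= Version && Version <= kNoMaxVersion;
    }
    break;
  case Directive::dirb:
    switch (C) {
    case Clause::clauseb:
      return 1 <= Version && Version <= 45;
    default:
      return false;
    }
  }
  // Reached only for values outside the enums.
  return false;
}
```

The nested form evaluates one directive dispatch and one clause dispatch per query, instead of walking every directive/clause pair in turn as the if-chain does.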
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Mon Jul 6 12:10:21 2020 From: llvm-commits at lists.llvm.org (Konstantin Zhuravlyov via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:10:21 +0000 (UTC) Subject: [PATCH] D83249: AMDGPU: Handle llvm.amdgcn.buffer.{load|store}.v2i16 intrinsics Message-ID: kzhuravl created this revision. kzhuravl added reviewers: arsenm, rampitec. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely. Herald added a project: LLVM. https://reviews.llvm.org/D83249 Files: llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load-store.v2i16.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83249.275789.patch Type: text/x-patch Size: 5609 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:10:57 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:10:57 +0000 (UTC) Subject: [PATCH] D83248: [NFCI][IR] Introduce CallBase::Create() wrapper In-Reply-To: References: Message-ID: nickdesaulniers added inline comments. ================ Comment at: llvm/include/llvm/IR/InstrTypes.h:1145 + static CallBase *Create(CallBase *CB, ArrayRef Bundles, + Instruction *InsertPt = nullptr); + ---------------- No call sites make use of this default value (`nullptr`). Is it the right value? I'm not sure if you can default it to `CB`, but if not, should it just be a required parameter with no default value? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83248/new/ https://reviews.llvm.org/D83248 From llvm-commits at lists.llvm.org Mon Jul 6 12:10:46 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Mon, 6 Jul 2020 12:10:46 -0700 Subject: [llvm] 9649c20 - [InstCombine] Drop debug loc in TryToSinkInstruction (reland) In-Reply-To: <5ef69050.1c69fb81.6981b.4b43@mx.google.com> References: <5ef69050.1c69fb81.6981b.4b43@mx.google.com> Message-ID: On Fri, Jun 26, 2020 at 5:18 PM Vedant Kumar via llvm-commits wrote: > > > Author: Vedant Kumar > Date: 2020-06-26T17:18:15-07:00 > New Revision: 9649c2095f07a392bc2b2a93b5bd6c4c9bf5ba34 > > URL: https://github.com/llvm/llvm-project/commit/9649c2095f07a392bc2b2a93b5bd6c4c9bf5ba34 > DIFF: https://github.com/llvm/llvm-project/commit/9649c2095f07a392bc2b2a93b5bd6c4c9bf5ba34.diff > > LOG: [InstCombine] Drop debug loc in TryToSinkInstruction (reland) > > Summary: > The advice in HowToUpdateDebugInfo.rst is to "... preserve the debug > location of an instruction if the instruction either remains in its > basic block, or if its basic block is folded into a predecessor that > branches unconditionally". > > TryToSinkInstruction doesn't seem to satisfy the criteria as it's > sinking an instruction to some successor block. Preserving the debug loc > can make single-stepping appear to go backwards, or make a breakpoint > hit on that location happen "too late" (since single-stepping from that > breakpoint can cause the function to return unexpectedly). > > So, drop the debug location. > > This was reverted in ee3620643dfc because it removed source locations > from inlinable calls, breaking a verifier rule. I've added an exception > for calls because the alternative (setting a line 0 location) is not > better. What do you mean by "is not better"? My understanding is that the other reason for not moving such locations is profile accuracy. 
If the line is preserved when an instruction (including a call) is sunk into a codepath that doesn't necessarily pass through the original location, a sample based profile could incorrectly conclude that a conditional was taken that would've reached the original location of the instruction. > I tested the updated patch by completing a stage2 RelWithDebInfo > build. > > Reviewers: aprantl, davide > > Subscribers: hiraditya, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D82487 > > Added: > llvm/test/Transforms/InstCombine/sink_to_unreachable_dbg.ll > > Modified: > llvm/lib/Transforms/InstCombine/InstructionCombining.cpp > > Removed: > > > > ################################################################################ > diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp > index 1f97f0c1ac99..6a406314f63c 100644 > --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp > +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp > @@ -3355,6 +3355,12 @@ static bool TryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { > I->moveBefore(&*InsertPos); > ++NumSunkInst; > > + // Drop the debug loc of non-inlinable instructions. This prevents > + // single-stepping from going backwards. See HowToUpdateDebugInfo.rst for > + // the full rationale. > + if (!isa(I)) > + I->setDebugLoc(DebugLoc()); > + > // Also sink all related debug uses from the source basic block. Otherwise we > // get debug use before the def. Attempt to salvage debug uses first, to > // maximise the range variables have location for. 
If we cannot salvage, then > > diff --git a/llvm/test/Transforms/InstCombine/sink_to_unreachable_dbg.ll b/llvm/test/Transforms/InstCombine/sink_to_unreachable_dbg.ll > new file mode 100644 > index 000000000000..e642276224b8 > --- /dev/null > +++ b/llvm/test/Transforms/InstCombine/sink_to_unreachable_dbg.ll > @@ -0,0 +1,46 @@ > +; RUN: opt -debugify -debugify-level=locations -instcombine -S < %s | FileCheck %s > + > +; CHECK-LABEL: @test1( > +; CHECK: [[phi:%.*]] = phi i32 > +; CHECK-NEXT: [[add:%.*]] = add i32 {{.*}}, 1{{$}} > +; CHECK-NEXT: add i32 [[phi]], [[add]], !dbg > +define i32 @test1(i32 %0, i1 %1) { > + %3 = add i32 %0, 1 > + br i1 %1, label %4, label %5 > + > +4: ; preds = %2 > + br label %6 > + > +5: ; preds = %2 > + br label %6 > + > +6: ; preds = %5, %4 > + %7 = phi i32 [ 0, %4 ], [ 1, %5 ] > + %8 = add i32 %7, %3 > + ret i32 %8 > +} > + > +; Function Attrs: nounwind readnone > +declare i32 @external(i32) #0 > + > +; CHECK-LABEL: @test2( > +; CHECK: [[phi:%.*]] = phi i32 > +; CHECK-NEXT: [[add:%.*]] = call i32 @external(i32 {{.*}}), !dbg > +; CHECK-NEXT: add i32 [[phi]], [[add]], !dbg > +define i32 @test2(i32 %0, i1 %1) { > + %3 = call i32 @external(i32 %0) > + br i1 %1, label %4, label %5 > + > +4: ; preds = %2 > + br label %6 > + > +5: ; preds = %2 > + br label %6 > + > +6: ; preds = %5, %4 > + %7 = phi i32 [ 0, %4 ], [ 1, %5 ] > + %8 = add i32 %7, %3 > + ret i32 %8 > +} > + > +attributes #0 = { nounwind readnone } > > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Mon Jul 6 12:11:04 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:11:04 +0000 (UTC) Subject: [PATCH] D82716: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division In-Reply-To: References: Message-ID: spatel added inline comments. 
================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13242 + // Estimate creation failed. Clean up speculatively created nodes. + if (AAZ->use_empty()) + DAG.RemoveDeadNode(AAZ.getNode()); ---------------- craig.topper wrote: > Can we just call recursivelyDeleteUnusedNodes(AAZ) if AAZ is unused and avoid the AA handling? Yes, that should do the same thing and save a few lines of code. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82716/new/ https://reviews.llvm.org/D82716 From llvm-commits at lists.llvm.org Mon Jul 6 12:12:56 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:12:56 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <40518df8518e66c16a502c81af625572@localhost.localdomain> nickdesaulniers accepted this revision. nickdesaulniers added a comment. This revision is now accepted and ready to land. > Doesn't really sound interesting to me. LOL, then why did you split it? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 12:12:59 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:12:59 +0000 (UTC) Subject: [PATCH] D83248: [NFCI][IR] Introduce CallBase::Create() wrapper In-Reply-To: References: Message-ID: lebedev.ri marked 2 inline comments as done. lebedev.ri added inline comments. ================ Comment at: llvm/include/llvm/IR/InstrTypes.h:1145 + static CallBase *Create(CallBase *CB, ArrayRef Bundles, + Instruction *InsertPt = nullptr); + ---------------- nickdesaulniers wrote: > No call sites make use of this default value (`nullptr`). Is it the right value? I'm not sure if you can default it to `CB`, but if not, should it just be a required parameter with no default value? 
This is a direct wrapper over all the subclasses `Create()` function with the same signature. They all follow this pattern. I'm not sure why we should deviate here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83248/new/ https://reviews.llvm.org/D83248 From llvm-commits at lists.llvm.org Mon Jul 6 12:13:38 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:13:38 +0000 (UTC) Subject: [PATCH] D83249: AMDGPU: Handle llvm.amdgcn.buffer.{load|store}.v2i16 intrinsics In-Reply-To: References: Message-ID: arsenm requested changes to this revision. arsenm added a comment. This revision now requires changes to proceed. Can you also make sure this works with globalisel? I think it should already, but it would be good to double check the tests are there ================ Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load-store.v2i16.ll:34-36 + %4 = ptrtoint i16* %0 to i64 + %5 = bitcast i64 %4 to <2 x i32> + %6 = shufflevector <2 x i32> %5, <2 x i32> undef, <4 x i32> ---------------- You can remove most of this setup code (also tests should use named values). ================ Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load-store.v2i16.ll:40 + %9 = shl i32 %8, 1 + %10 = tail call <2 x i16> @llvm.amdgcn.buffer.load.v2i16(<4 x i32> %7, i32 0, i32 %9, i1 zeroext false, i1 zeroext false) + ret <2 x i16> %10 ---------------- These are the legacy intrinsics. Only the .raw/.struct versions matter now. 
This should also add tests with <2 x half> CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83249/new/ https://reviews.llvm.org/D83249 From llvm-commits at lists.llvm.org Mon Jul 6 12:14:21 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:14:21 +0000 (UTC) Subject: [PATCH] D83248: [NFCI][IR] Introduce CallBase::Create() wrapper In-Reply-To: References: Message-ID: <6d72b8b4fcff485e65c09d90bb6cac6b@localhost.localdomain> nickdesaulniers accepted this revision. nickdesaulniers added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/include/llvm/IR/InstrTypes.h:1145 + static CallBase *Create(CallBase *CB, ArrayRef Bundles, + Instruction *InsertPt = nullptr); + ---------------- lebedev.ri wrote: > nickdesaulniers wrote: > > No call sites make use of this default value (`nullptr`). Is it the right value? I'm not sure if you can default it to `CB`, but if not, should it just be a required parameter with no default value? > This is a direct wrapper over all the subclasses `Create()` function with the same signature. > They all follow this pattern. I'm not sure why we should deviate here. Ah, I missed the use in the child patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83248/new/ https://reviews.llvm.org/D83248 From llvm-commits at lists.llvm.org Mon Jul 6 12:14:42 2020 From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:14:42 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: biplmish updated this revision to Diff 275790. biplmish added a comment. Update tests to add checks for BE. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 Files: clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/test/CodeGen/PowerPC/p10-splatImm.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82520.275790.patch Type: text/x-patch Size: 6281 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:15:05 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via llvm-commits) Date: Mon, 06 Jul 2020 12:15:05 -0700 (PDT) Subject: [compiler-rt] 71a6a41 - [Sanitizer] Fix demangling for Swift symbol names Message-ID: <5f037839.1c69fb81.f7515.2ce6@mx.google.com> Author: Julian Lettner Date: 2020-07-06T12:12:22-07:00 New Revision: 71a6a41f1c55c43c07942e49ef8ecdabd95f8b61 URL: https://github.com/llvm/llvm-project/commit/71a6a41f1c55c43c07942e49ef8ecdabd95f8b61 DIFF: https://github.com/llvm/llvm-project/commit/71a6a41f1c55c43c07942e49ef8ecdabd95f8b61.diff LOG: [Sanitizer] Fix demangling for Swift symbol names The Swift symbol name prefix has changed from `_T0` to `_$s` as documented here [1]. This prevents Swift names from properly being symbolicated when using the in-process LLVM symbolizer. The best way to fix this seems to be to avoid the duplication of "Is this a Swift symbol name?" here. We can simply remove this check as `swift_demangle` already returns null for non-Swift names [2,3]. The check was included in the initial support for Swift name demangling to avoid superfluous calls to `dlsym()` [4]. A subsequent commit changed this logic to retrieve the `swift_demangle` function pointer eagerly during sanitizer initialization, but did not remove the check [5]. 
[1] https://github.com/apple/swift/blob/master/docs/ABI/Mangling.rst [2] https://github.com/apple/swift/blob/b5a8b518eae54cea997f3b0954760fc7858829f6/include/swift/Demangling/Demangle.h#L643 [3] https://github.com/apple/swift/blob/b5a8b518eae54cea997f3b0954760fc7858829f6/stdlib/public/runtime/Demangle.cpp#L656 [4] https://reviews.llvm.org/D19135 [5] https://reviews.llvm.org/D20015 rdar://62753845 Reviewers: kubamracek, delcypher, dcoughlin, samsonov, thakis Reviewed By: kubamracek Differential Revision: https://reviews.llvm.org/D81705 Added: Modified: compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp Removed: ################################################################################ diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp index cbe5b5336b32..3c379a848025 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp @@ -78,13 +78,6 @@ static void InitializeSwiftDemangler() { // Attempts to demangle a Swift name. The demangler will return nullptr if a // non-Swift name is passed in. const char *DemangleSwift(const char *name) { - if (!name) return nullptr; - - // Check if we are dealing with a Swift mangled name first. - if (name[0] != '_' || name[1] != 'T') { - return nullptr; - } - if (swift_demangle_f) return swift_demangle_f(name, internal_strlen(name), 0, 0, 0); From llvm-commits at lists.llvm.org Mon Jul 6 12:16:43 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:16:43 +0000 (UTC) Subject: [PATCH] D83249: AMDGPU: Handle llvm.amdgcn.buffer.{load|store}.v2i16 intrinsics In-Reply-To: References: Message-ID: <7f345011694aa68cb37723ba424fa43f@localhost.localdomain> arsenm added inline comments. 
================ Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load-store.v2i16.ll:85 + ret void +} ---------------- Test should also go in the same file with the other cases CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83249/new/ https://reviews.llvm.org/D83249 From llvm-commits at lists.llvm.org Mon Jul 6 12:16:59 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via llvm-commits) Date: Mon, 06 Jul 2020 12:16:59 -0700 (PDT) Subject: [llvm] fa1fecc - [VE] Support symbol with offset in assembly Message-ID: <5f0378ab.1c69fb81.40ef6.0a88@mx.google.com> Author: Kazushi (Jam) Marukawa Date: 2020-07-07T04:16:51+09:00 New Revision: fa1fecc73d4d3884ae8eb887ac06c0f7f7492053 URL: https://github.com/llvm/llvm-project/commit/fa1fecc73d4d3884ae8eb887ac06c0f7f7492053 DIFF: https://github.com/llvm/llvm-project/commit/fa1fecc73d4d3884ae8eb887ac06c0f7f7492053.diff LOG: [VE] Support symbol with offset in assembly Summary: Change MCExpr to support Aurora VE's modifiers. Change asmparser to use existing MCExpr parser (parseExpression) to parse an expression containing symbols with modifiers and offsets. Also add several regression tests for the MC layer.
Reviewers: simoll, k-ishizaka Reviewed By: simoll Subscribers: hiraditya, llvm-commits Tags: #llvm, #ve Differential Revision: https://reviews.llvm.org/D83170 Added: llvm/test/MC/VE/sym-br.s Modified: llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp llvm/test/MC/VE/symbols.s Removed: ################################################################################ diff --git a/llvm/include/llvm/MC/MCExpr.h b/llvm/include/llvm/MC/MCExpr.h index 776e116a6e16..803c0d443bee 100644 --- a/llvm/include/llvm/MC/MCExpr.h +++ b/llvm/include/llvm/MC/MCExpr.h @@ -327,6 +327,21 @@ class MCSymbolRefExpr : public MCExpr { VK_AMDGPU_ABS32_LO, // symbol@abs32@lo VK_AMDGPU_ABS32_HI, // symbol@abs32@hi + VK_VE_HI32, // symbol@hi + VK_VE_LO32, // symbol@lo + VK_VE_PC_HI32, // symbol@pc_hi + VK_VE_PC_LO32, // symbol@pc_lo + VK_VE_GOT_HI32, // symbol@got_hi + VK_VE_GOT_LO32, // symbol@got_lo + VK_VE_GOTOFF_HI32, // symbol@gotoff_hi + VK_VE_GOTOFF_LO32, // symbol@gotoff_lo + VK_VE_PLT_HI32, // symbol@plt_hi + VK_VE_PLT_LO32, // symbol@plt_lo + VK_VE_TLS_GD_HI32, // symbol@tls_gd_hi + VK_VE_TLS_GD_LO32, // symbol@tls_gd_lo + VK_VE_TPOFF_HI32, // symbol@tpoff_hi + VK_VE_TPOFF_LO32, // symbol@tpoff_lo + VK_TPREL, VK_DTPREL }; diff --git a/llvm/lib/MC/MCExpr.cpp b/llvm/lib/MC/MCExpr.cpp index fdb83aeef5eb..ecf63b10f73f 100644 --- a/llvm/lib/MC/MCExpr.cpp +++ b/llvm/lib/MC/MCExpr.cpp @@ -342,6 +342,20 @@ StringRef MCSymbolRefExpr::getVariantKindName(VariantKind Kind) { case VK_AMDGPU_REL64: return "rel64"; case VK_AMDGPU_ABS32_LO: return "abs32@lo"; case VK_AMDGPU_ABS32_HI: return "abs32@hi"; + case VK_VE_HI32: return "hi"; + case VK_VE_LO32: return "lo"; + case VK_VE_PC_HI32: return "pc_hi"; + case VK_VE_PC_LO32: return "pc_lo"; + case VK_VE_GOT_HI32: return "got_hi"; + case VK_VE_GOT_LO32: return "got_lo"; + case VK_VE_GOTOFF_HI32: return "gotoff_hi"; + case VK_VE_GOTOFF_LO32: return
"gotoff_lo"; + case VK_VE_PLT_HI32: return "plt_hi"; + case VK_VE_PLT_LO32: return "plt_lo"; + case VK_VE_TLS_GD_HI32: return "tls_gd_hi"; + case VK_VE_TLS_GD_LO32: return "tls_gd_lo"; + case VK_VE_TPOFF_HI32: return "tpoff_hi"; + case VK_VE_TPOFF_LO32: return "tpoff_lo"; } llvm_unreachable("Invalid variant kind"); } @@ -463,6 +477,20 @@ MCSymbolRefExpr::getVariantKindForName(StringRef Name) { .Case("rel64", VK_AMDGPU_REL64) .Case("abs32 at lo", VK_AMDGPU_ABS32_LO) .Case("abs32 at hi", VK_AMDGPU_ABS32_HI) + .Case("hi", VK_VE_HI32) + .Case("lo", VK_VE_LO32) + .Case("pc_hi", VK_VE_PC_HI32) + .Case("pc_lo", VK_VE_PC_LO32) + .Case("got_hi", VK_VE_GOT_HI32) + .Case("got_lo", VK_VE_GOT_LO32) + .Case("gotoff_hi", VK_VE_GOTOFF_HI32) + .Case("gotoff_lo", VK_VE_GOTOFF_LO32) + .Case("plt_hi", VK_VE_PLT_HI32) + .Case("plt_lo", VK_VE_PLT_LO32) + .Case("tls_gd_hi", VK_VE_TLS_GD_HI32) + .Case("tls_gd_lo", VK_VE_TLS_GD_LO32) + .Case("tpoff_hi", VK_VE_TPOFF_HI32) + .Case("tpoff_lo", VK_VE_TPOFF_LO32) .Default(VK_Invalid); } diff --git a/llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp b/llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp index 214adf4c3c5b..7a899b4b38e2 100644 --- a/llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp +++ b/llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp @@ -65,8 +65,6 @@ class VEAsmParser : public MCTargetAsmParser { unsigned validateTargetOperandClass(MCParsedAsmOperand &Op, unsigned Kind) override; - // Helper function to parse and generate identifier with relocation. - const MCExpr *parseIdentifier(StringRef Identifier); // Custom parse functions for VE specific operands. 
OperandMatchResultTy parseMEMOperand(OperandVector &Operands); OperandMatchResultTy parseMEMAsOperand(OperandVector &Operands); @@ -75,6 +73,13 @@ class VEAsmParser : public MCTargetAsmParser { OperandMatchResultTy parseMImmOperand(OperandVector &Operands); OperandMatchResultTy parseOperand(OperandVector &Operands, StringRef Name); OperandMatchResultTy parseVEAsmOperand(std::unique_ptr &Operand); + + // Helper function to parse expression with a symbol. + const MCExpr *extractModifierFromExpr(const MCExpr *E, + VEMCExpr::VariantKind &Variant); + const MCExpr *fixupVariantKind(const MCExpr *E); + bool parseExpression(const MCExpr *&EVal); + // Split the mnemonic stripping conditional code and quantifiers StringRef splitMnemonic(StringRef Name, SMLoc NameLoc, OperandVector *Operands); @@ -948,27 +953,163 @@ bool VEAsmParser::ParseDirective(AsmToken DirectiveID) { return true; } -const MCExpr *VEAsmParser::parseIdentifier(StringRef Identifier) { - StringRef Modifier; - // Search @modifiers like "symbol at hi". - size_t at = Identifier.rfind('@'); - if (at != 0 || at != StringRef::npos) { - std::pair Pair = Identifier.rsplit("@"); - if (!Pair.first.empty() && !Pair.second.empty()) { - Identifier = Pair.first; - Modifier = Pair.second; +/// Extract \code @lo32/@hi32/etc \endcode modifier from expression. +/// Recursively scan the expression and check for VK_VE_HI32/LO32/etc +/// symbol variants. If all symbols with modifier use the same +/// variant, return the corresponding VEMCExpr::VariantKind, +/// and a modified expression using the default symbol variant. +/// Otherwise, return NULL. 
+const MCExpr * +VEAsmParser::extractModifierFromExpr(const MCExpr *E, + VEMCExpr::VariantKind &Variant) { + MCContext &Context = getParser().getContext(); + Variant = VEMCExpr::VK_VE_None; + + switch (E->getKind()) { + case MCExpr::Target: + case MCExpr::Constant: + return nullptr; + + case MCExpr::SymbolRef: { + const MCSymbolRefExpr *SRE = cast(E); + + switch (SRE->getKind()) { + case MCSymbolRefExpr::VK_None: + // Use VK_VE_REFLONG to a symbol without modifiers. + Variant = VEMCExpr::VK_VE_REFLONG; + break; + case MCSymbolRefExpr::VK_VE_HI32: + Variant = VEMCExpr::VK_VE_HI32; + break; + case MCSymbolRefExpr::VK_VE_LO32: + Variant = VEMCExpr::VK_VE_LO32; + break; + case MCSymbolRefExpr::VK_VE_PC_HI32: + Variant = VEMCExpr::VK_VE_PC_HI32; + break; + case MCSymbolRefExpr::VK_VE_PC_LO32: + Variant = VEMCExpr::VK_VE_PC_LO32; + break; + case MCSymbolRefExpr::VK_VE_GOT_HI32: + Variant = VEMCExpr::VK_VE_GOT_HI32; + break; + case MCSymbolRefExpr::VK_VE_GOT_LO32: + Variant = VEMCExpr::VK_VE_GOT_LO32; + break; + case MCSymbolRefExpr::VK_VE_GOTOFF_HI32: + Variant = VEMCExpr::VK_VE_GOTOFF_HI32; + break; + case MCSymbolRefExpr::VK_VE_GOTOFF_LO32: + Variant = VEMCExpr::VK_VE_GOTOFF_LO32; + break; + case MCSymbolRefExpr::VK_VE_PLT_HI32: + Variant = VEMCExpr::VK_VE_PLT_HI32; + break; + case MCSymbolRefExpr::VK_VE_PLT_LO32: + Variant = VEMCExpr::VK_VE_PLT_LO32; + break; + case MCSymbolRefExpr::VK_VE_TLS_GD_HI32: + Variant = VEMCExpr::VK_VE_TLS_GD_HI32; + break; + case MCSymbolRefExpr::VK_VE_TLS_GD_LO32: + Variant = VEMCExpr::VK_VE_TLS_GD_LO32; + break; + case MCSymbolRefExpr::VK_VE_TPOFF_HI32: + Variant = VEMCExpr::VK_VE_TPOFF_HI32; + break; + case MCSymbolRefExpr::VK_VE_TPOFF_LO32: + Variant = VEMCExpr::VK_VE_TPOFF_LO32; + break; + default: + return nullptr; } + + return MCSymbolRefExpr::create(&SRE->getSymbol(), Context); } - MCSymbol *Sym = getContext().getOrCreateSymbol(Identifier); - const MCExpr *Res = - MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, getContext()); 
- VEMCExpr::VariantKind VK = VEMCExpr::parseVariantKind(Modifier); - if (VK == VEMCExpr::VK_VE_None) { - // Create identifier using default variant kind - VEMCExpr::VariantKind Kind = VEMCExpr::VK_VE_REFLONG; - return VEMCExpr::create(Kind, Res, getContext()); + + case MCExpr::Unary: { + const MCUnaryExpr *UE = cast(E); + const MCExpr *Sub = extractModifierFromExpr(UE->getSubExpr(), Variant); + if (!Sub) + return nullptr; + return MCUnaryExpr::create(UE->getOpcode(), Sub, Context); } - return VEMCExpr::create(VK, Res, getContext()); + + case MCExpr::Binary: { + const MCBinaryExpr *BE = cast(E); + VEMCExpr::VariantKind LHSVariant, RHSVariant; + const MCExpr *LHS = extractModifierFromExpr(BE->getLHS(), LHSVariant); + const MCExpr *RHS = extractModifierFromExpr(BE->getRHS(), RHSVariant); + + if (!LHS && !RHS) + return nullptr; + + if (!LHS) + LHS = BE->getLHS(); + if (!RHS) + RHS = BE->getRHS(); + + if (LHSVariant == VEMCExpr::VK_VE_None) + Variant = RHSVariant; + else if (RHSVariant == VEMCExpr::VK_VE_None) + Variant = LHSVariant; + else if (LHSVariant == RHSVariant) + Variant = LHSVariant; + else + return nullptr; + + return MCBinaryExpr::create(BE->getOpcode(), LHS, RHS, Context); + } + } + + llvm_unreachable("Invalid expression kind!"); +} + +const MCExpr *VEAsmParser::fixupVariantKind(const MCExpr *E) { + MCContext &Context = getParser().getContext(); + + switch (E->getKind()) { + case MCExpr::Target: + case MCExpr::Constant: + case MCExpr::SymbolRef: + return E; + + case MCExpr::Unary: { + const MCUnaryExpr *UE = cast(E); + const MCExpr *Sub = fixupVariantKind(UE->getSubExpr()); + if (Sub == UE->getSubExpr()) + return E; + return MCUnaryExpr::create(UE->getOpcode(), Sub, Context); + } + + case MCExpr::Binary: { + const MCBinaryExpr *BE = cast(E); + const MCExpr *LHS = fixupVariantKind(BE->getLHS()); + const MCExpr *RHS = fixupVariantKind(BE->getRHS()); + if (LHS == BE->getLHS() && RHS == BE->getRHS()) + return E; + return MCBinaryExpr::create(BE->getOpcode(), 
LHS, RHS, Context); + } + } + + llvm_unreachable("Invalid expression kind!"); +} + +/// ParseExpression. This differs from the default "parseExpression" in that +/// it handles modifiers. +bool VEAsmParser::parseExpression(const MCExpr *&EVal) { + // Handle \code symbol @lo32/@hi32/etc \endcode. + if (getParser().parseExpression(EVal)) + return true; + + // Convert MCSymbolRefExpr with VK_* to MCExpr with VK_*. + EVal = fixupVariantKind(EVal); + VEMCExpr::VariantKind Variant; + const MCExpr *E = extractModifierFromExpr(EVal, Variant); + if (E) + EVal = VEMCExpr::create(Variant, E, getParser().getContext()); + + return false; } OperandMatchResultTy VEAsmParser::parseMEMOperand(OperandVector &Operands) { @@ -992,27 +1133,16 @@ OperandMatchResultTy VEAsmParser::parseMEMOperand(OperandVector &Operands) { case AsmToken::Minus: case AsmToken::Integer: - case AsmToken::Dot: { + case AsmToken::Dot: + case AsmToken::Identifier: { const MCExpr *EVal; - if (!getParser().parseExpression(EVal, E)) + if (!parseExpression(EVal)) Offset = VEOperand::CreateImm(EVal, S, E); else return MatchOperand_NoMatch; break; } - case AsmToken::Identifier: { - StringRef Identifier; - if (!getParser().parseIdentifier(Identifier)) { - E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1); - const MCExpr *EVal = parseIdentifier(Identifier); - - E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1); - Offset = VEOperand::CreateImm(EVal, S, E); - } - break; - } - case AsmToken::LParen: // empty disp (= 0) Offset = @@ -1110,9 +1240,10 @@ OperandMatchResultTy VEAsmParser::parseMEMAsOperand(OperandVector &Operands) { case AsmToken::Minus: case AsmToken::Integer: - case AsmToken::Dot: { + case AsmToken::Dot: + case AsmToken::Identifier: { const MCExpr *EVal; - if (!getParser().parseExpression(EVal, E)) + if (!parseExpression(EVal)) Offset = VEOperand::CreateImm(EVal, S, E); else return MatchOperand_NoMatch; @@ -1275,22 +1406,10 @@
VEAsmParser::parseVEAsmOperand(std::unique_ptr &Op) { case AsmToken::Minus: case AsmToken::Integer: case AsmToken::Dot: - if (!getParser().parseExpression(EVal, E)) + case AsmToken::Identifier: + if (!parseExpression(EVal)) Op = VEOperand::CreateImm(EVal, S, E); break; - - case AsmToken::Identifier: { - StringRef Identifier; - if (!getParser().parseIdentifier(Identifier)) { - E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1); - MCSymbol *Sym = getContext().getOrCreateSymbol(Identifier); - - const MCExpr *Res = - MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, getContext()); - Op = VEOperand::CreateImm(Res, S, E); - } - break; - } } return (Op) ? MatchOperand_Success : MatchOperand_ParseFail; } diff --git a/llvm/test/MC/VE/sym-br.s b/llvm/test/MC/VE/sym-br.s new file mode 100644 index 000000000000..87800b518b06 --- /dev/null +++ b/llvm/test/MC/VE/sym-br.s @@ -0,0 +1,10 @@ +# RUN: llvm-mc -triple=ve %s -o - | FileCheck %s +# RUN: llvm-mc -triple=ve -filetype=obj %s -o - | llvm-objdump -r - | FileCheck %s --check-prefix=CHECK-OBJ + + b.l.t tgt(, %s1) + b.l.t tgt+24(, %s1) +# CHECK: b.l.t tgt(, %s1) +# CHECK-NEXT: b.l.t tgt+24(, %s1) + +# CHECK-OBJ: 0 R_VE_REFLONG tgt +# CHECK-OBJ-NEXT: 8 R_VE_REFLONG tgt+0x18 diff --git a/llvm/test/MC/VE/symbols.s b/llvm/test/MC/VE/symbols.s index 55702f89258c..1f1d9a341af6 100644 --- a/llvm/test/MC/VE/symbols.s +++ b/llvm/test/MC/VE/symbols.s @@ -5,11 +5,19 @@ lea %s1, var at lo and %s1, %s1, (32)0 lea.sl %s1, var at hi(, %s1) + lea %s1, var+8 at lo + and %s1, %s1, (32)0 + lea.sl %s1, var+8 at hi(, %s1) # CHECK: lea %s0, var # CHECK-NEXT: lea %s1, var at lo # CHECK-NEXT: and %s1, %s1, (32)0 # CHECK-NEXT: lea.sl %s1, var at hi(, %s1) +# CHECK-NEXT: lea %s1, var+8 at lo +# CHECK-NEXT: and %s1, %s1, (32)0 +# CHECK-NEXT: lea.sl %s1, var+8 at hi(, %s1) # CHECK-OBJ: 0 R_VE_REFLONG var # CHECK-OBJ-NEXT: 8 R_VE_LO32 var # CHECK-OBJ-NEXT: 18 R_VE_HI32 var +# CHECK-OBJ-NEXT: 20 R_VE_LO32 var+0x8 +# CHECK-OBJ-NEXT: 30 
R_VE_HI32 var+0x8 From llvm-commits at lists.llvm.org Mon Jul 6 12:17:03 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:17:03 +0000 (UTC) Subject: [PATCH] D83170: [VE] Support symbol with offset in assembly In-Reply-To: References: Message-ID: <93481f228b0053bd638986d804cba67d@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGfa1fecc73d4d: [VE] Support symbol with offset in assembly (authored by kaz7). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83170/new/ https://reviews.llvm.org/D83170 Files: llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp llvm/test/MC/VE/sym-br.s llvm/test/MC/VE/symbols.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83170.275792.patch Type: text/x-patch Size: 13288 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:17:17 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Mon, 6 Jul 2020 12:17:17 -0700 Subject: [llvm] d7ea6ce - [Support] fix user_cache_directory on mac In-Reply-To: References: <5f0302dc.1c69fb81.983ca.9e7d@mx.google.com> Message-ID: Cool, thanks for the context! On Mon, Jul 6, 2020 at 11:59 AM Sam McCall wrote: > > This is a compile fix for cd209f1a3790, which is tested including on Mac. > The #ifdef __APPLE__ part didn't compile, and if it did the test wouldn't have passed. I didn't have a Mac to test on. > (Typo in commit message: should be user_config_directory) > > On Mon, Jul 6, 2020, 8:36 PM David Blaikie wrote: >> >> Any chance of testing this? (maybe even a unit test? (even if it's >> uninteresting on other platforms, perhaps)?) 
>> >> On Mon, Jul 6, 2020 at 3:54 AM Sam McCall via llvm-commits >> wrote: >> > >> > >> > Author: Sam McCall >> > Date: 2020-07-06T12:54:11+02:00 >> > New Revision: d7ea6ce809a4413afb1edafa17ba291b39129f52 >> > >> > URL: https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52 >> > DIFF: https://github.com/llvm/llvm-project/commit/d7ea6ce809a4413afb1edafa17ba291b39129f52.diff >> > >> > LOG: [Support] fix user_cache_directory on mac >> > >> > Added: >> > >> > >> > Modified: >> > llvm/lib/Support/Unix/Path.inc >> > >> > Removed: >> > >> > >> > >> > ################################################################################ >> > diff --git a/llvm/lib/Support/Unix/Path.inc b/llvm/lib/Support/Unix/Path.inc >> > index c35db79cbd8a..d91b269cc6d3 100644 >> > --- a/llvm/lib/Support/Unix/Path.inc >> > +++ b/llvm/lib/Support/Unix/Path.inc >> > @@ -1193,7 +1193,7 @@ bool user_config_directory(SmallVectorImpl &result) { >> > #ifdef __APPLE__ >> > // Mac: ~/Library/Preferences/ >> > if (home_directory(result)) { >> > - append("Library", "Preferences"); >> > + append(result, "Library", "Preferences"); >> > return true; >> > } >> > #else >> > >> > >> > >> > _______________________________________________ >> > llvm-commits mailing list >> > llvm-commits at lists.llvm.org >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Mon Jul 6 12:18:32 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:18:32 +0000 (UTC) Subject: [PATCH] D83152: llvm-nm: add flag to suppress no symbols warning In-Reply-To: References: Message-ID: <5f9c8cd6ab501a537c9f77a4d92f664f@localhost.localdomain> MaskRay added a comment. I cannot find any search result about `no-warning-for-no-symbols`. Is `-no-warning-for-no-symbols` really an existing option? libtool is an `ar` like tool. 
Second, I wonder how you are going to plug `-no-warning-for-no-symbols` into a build system. If you only parse stdout, you can ignore stderr. Even if you do, you can probably use `grep -v '^no symbols'`. This will have better portability (supported on older nm, supported on other binary formats). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83152/new/ https://reviews.llvm.org/D83152 From llvm-commits at lists.llvm.org Mon Jul 6 12:21:07 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:21:07 +0000 (UTC) Subject: [PATCH] D83134: [asan] Disable fast unwinder on arm-linux-gnueabi with thumb In-Reply-To: References: Message-ID: <18215bc88b0ab78c7ed636a26c48258f@localhost.localdomain> eugenis added a comment. Is unwinding actually broken on an all-clang, all-thumb system? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83134/new/ https://reviews.llvm.org/D83134 From llvm-commits at lists.llvm.org Mon Jul 6 12:22:50 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Mon, 6 Jul 2020 12:22:50 -0700 Subject: [llvm] 613f12d - [AArch64][GlobalISel] Set the current debug loc when missing in some cases. In-Reply-To: References: <5ea15350.1c69fb81.e6a9c.69b0@mx.google.com> Message-ID: Ping On Mon, Apr 27, 2020 at 9:17 PM David Blaikie wrote: > > Could you add more precise tests that validate the specific debug locations that are present on the resulting instructions are the desired ones? 
> > On Thu, Apr 23, 2020 at 1:36 AM Amara Emerson via llvm-commits wrote: >> >> >> Author: Amara Emerson >> Date: 2020-04-23T01:34:57-07:00 >> New Revision: 613f12dd8e2403f5630ab299d2a1bb2cb111ead1 >> >> URL: https://github.com/llvm/llvm-project/commit/613f12dd8e2403f5630ab299d2a1bb2cb111ead1 >> DIFF: https://github.com/llvm/llvm-project/commit/613f12dd8e2403f5630ab299d2a1bb2cb111ead1.diff >> >> LOG: [AArch64][GlobalISel] Set the current debug loc when missing in some cases. >> >> Added: >> >> >> Modified: >> llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp >> llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp >> llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir >> llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir >> llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir >> >> Removed: >> >> >> >> ################################################################################ >> diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp >> index 47c723cbf5a3..09e303eadd49 100644 >> --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp >> +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp >> @@ -570,7 +570,7 @@ llvm::createMemLibcall(MachineIRBuilder &MIRBuilder, MachineRegisterInfo &MRI, >> } >> const char *Name = TLI.getLibcallName(RTLibcall); >> >> - MIRBuilder.setInstr(MI); >> + MIRBuilder.setInstrAndDebugLoc(MI); >> >> CallLowering::CallLoweringInfo Info; >> Info.CallConv = TLI.getLibcallCallingConv(RTLibcall); >> @@ -3610,7 +3610,7 @@ LegalizerHelper::moreElementsVectorPhi(MachineInstr &MI, unsigned TypeIdx, >> LegalizerHelper::LegalizeResult >> LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx, >> LLT MoreTy) { >> - MIRBuilder.setInstr(MI); >> + MIRBuilder.setInstrAndDebugLoc(MI); >> unsigned Opc = MI.getOpcode(); >> switch (Opc) { >> case TargetOpcode::G_IMPLICIT_DEF: >> >> diff --git a/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp >> index 60ccb3621a2e..cae5028f1925 100644 >> --- a/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp >> +++ b/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp >> @@ -675,7 +675,7 @@ bool AArch64LegalizerInfo::legalizeShlAshrLshr( >> if (Amount > 31) >> return true; // This will have to remain a register variant. >> assert(MRI.getType(AmtReg).getSizeInBits() == 32); >> - MIRBuilder.setInstr(MI); >> + MIRBuilder.setInstrAndDebugLoc(MI); >> auto ExtCst = MIRBuilder.buildZExt(LLT::scalar(64), AmtReg); >> MI.getOperand(2).setReg(ExtCst.getReg(0)); >> return true; >> @@ -704,7 +704,7 @@ bool AArch64LegalizerInfo::legalizeLoadStore( >> return false; >> } >> >> - MIRBuilder.setInstr(MI); >> + MIRBuilder.setInstrAndDebugLoc(MI); >> unsigned PtrSize = ValTy.getElementType().getSizeInBits(); >> const LLT NewTy = LLT::vector(ValTy.getNumElements(), PtrSize); >> auto &MMO = **MI.memoperands_begin(); >> @@ -722,7 +722,7 @@ bool AArch64LegalizerInfo::legalizeLoadStore( >> bool AArch64LegalizerInfo::legalizeVaArg(MachineInstr &MI, >> MachineRegisterInfo &MRI, >> MachineIRBuilder &MIRBuilder) const { >> - MIRBuilder.setInstr(MI); >> + MIRBuilder.setInstrAndDebugLoc(MI); >> MachineFunction &MF = MIRBuilder.getMF(); >> Align Alignment(MI.getOperand(2).getImm()); >> Register Dst = MI.getOperand(0).getReg(); >> >> diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir >> index 6d50898117cd..5b32fd51f58c 100644 >> --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir >> +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir >> @@ -1,5 +1,6 @@ >> # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py >> # RUN: llc -O0 -march=aarch64 -run-pass=legalizer %s -o - | FileCheck %s >> +# RUN: llc -O0 -debugify-and-strip-all-safe -march=aarch64 
-run-pass=legalizer -verify-machineinstrs %s -o - | FileCheck %s >> --- | >> target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" >> target triple = "aarch64" >> >> diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir >> index 7ccb5166e4a7..dc42d603d737 100644 >> --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir >> +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir >> @@ -1,5 +1,6 @@ >> # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py >> # RUN: llc -O0 -march=aarch64 -run-pass=legalizer %s -o - | FileCheck %s >> +# RUN: llc -O0 -debugify-and-strip-all-safe -march=aarch64 -run-pass=legalizer -verify-machineinstrs %s -o - | FileCheck %s >> --- >> name: test_shift >> body: | >> >> diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir >> index fe1d5a5002c9..7446fde7ba08 100644 >> --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir >> +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir >> @@ -1,5 +1,5 @@ >> # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py >> -# RUN: llc -O0 -run-pass=legalizer %s -o - | FileCheck %s >> +# RUN: llc -O0 -run-pass=legalizer --debugify-and-strip-all-safe --debugify-level=locations %s -o - | FileCheck %s >> >> --- | >> target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" >> >> >> >> _______________________________________________ >> llvm-commits mailing list >> llvm-commits at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Mon Jul 6 12:23:09 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Mon, 6 Jul 2020 12:23:09 -0700 Subject: [llvm] 2fa656c - [Debugify] Do not require named metadata to be present when stripping In-Reply-To: References: 
<5ea0dbae.1c69fb81.ef94a.17c7@mx.google.com> Message-ID: Ping On Mon, Apr 27, 2020 at 9:15 PM David Blaikie wrote: > > Should/could this have a test case? > > On Wed, Apr 22, 2020 at 5:05 PM Vedant Kumar via llvm-commits wrote: >> >> >> Author: Vedant Kumar >> Date: 2020-04-22T17:03:39-07:00 >> New Revision: 2fa656cdfd836d5d3959466f05e44ae51bcded4e >> >> URL: https://github.com/llvm/llvm-project/commit/2fa656cdfd836d5d3959466f05e44ae51bcded4e >> DIFF: https://github.com/llvm/llvm-project/commit/2fa656cdfd836d5d3959466f05e44ae51bcded4e.diff >> >> LOG: [Debugify] Do not require named metadata to be present when stripping >> >> This allows -mir-strip-debug to be run without -debugify having run >> before. >> >> Added: >> >> >> Modified: >> llvm/lib/Transforms/Utils/Debugify.cpp >> >> Removed: >> >> >> >> ################################################################################ >> diff --git a/llvm/lib/Transforms/Utils/Debugify.cpp b/llvm/lib/Transforms/Utils/Debugify.cpp >> index f2739a8257a2..19c73f3840fc 100644 >> --- a/llvm/lib/Transforms/Utils/Debugify.cpp >> +++ b/llvm/lib/Transforms/Utils/Debugify.cpp >> @@ -224,7 +224,8 @@ bool llvm::stripDebugifyMetadata(Module &M) { >> // Strip out the module-level Debug Info Version metadata. >> // FIXME: There must be an easier way to remove an operand from a NamedMDNode. 
>> NamedMDNode *NMD = M.getModuleFlagsMetadata(); >> - assert(NMD && "debugify metadata present without Debug Info Version set?"); >> + if (!NMD) >> + return Changed; >> SmallVector Flags; >> for (MDNode *Flag : NMD->operands()) >> Flags.push_back(Flag); >> >> >> >> _______________________________________________ >> llvm-commits mailing list >> llvm-commits at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Mon Jul 6 12:23:52 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:23:52 +0000 (UTC) Subject: [PATCH] D82821: [WebAssembly] Added 64-bit memory.grow/size/init/copy/fill In-Reply-To: References: Message-ID: tlively accepted this revision. tlively added a comment. This revision is now accepted and ready to land. LGTM, thanks! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82821/new/ https://reviews.llvm.org/D82821 From llvm-commits at lists.llvm.org Mon Jul 6 12:24:10 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:24:10 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: lebedev.ri updated this revision to Diff 275795. lebedev.ri marked an inline comment as done. lebedev.ri added a comment. Actually upload up-to-date patch now. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 Files: llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83177.275795.patch Type: text/x-patch Size: 10493 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:24:17 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:24:17 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <1292d48864b74b10303e46d203a6ec72@localhost.localdomain> lebedev.ri added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:34 + +using namespace llvm; + ---------------- nickdesaulniers wrote: > if you move this up, then you don't need to wrap the forward declaration of `class Module` above. Hold up, that's actually bogus. https://godbolt.org/z/0qTwwz Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 12:24:46 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:24:46 +0000 (UTC) Subject: [PATCH] D83248: [NFCI][IR] Introduce CallBase::Create() wrapper In-Reply-To: References: Message-ID: lebedev.ri marked 3 inline comments as done. lebedev.ri added inline comments. ================ Comment at: llvm/include/llvm/IR/InstrTypes.h:1145 + static CallBase *Create(CallBase *CB, ArrayRef Bundles, + Instruction *InsertPt = nullptr); + ---------------- nickdesaulniers wrote: > lebedev.ri wrote: > > nickdesaulniers wrote: > > > No call sites make use of this default value (`nullptr`). Is it the right value? I'm not sure if you can default it to `CB`, but if not, should it just be a required parameter with no default value? > > This is a direct wrapper over all the subclasses `Create()` function with the same signature. > > They all follow this pattern. I'm not sure why we should deviate here. > Ah, I missed the use in the child patch. 
Hm, that use wasn't supposed to be there. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83248/new/ https://reviews.llvm.org/D83248 From llvm-commits at lists.llvm.org Mon Jul 6 12:26:54 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:26:54 +0000 (UTC) Subject: [PATCH] D73853: [llvm-reduce] add ReduceAttribute delta pass In-Reply-To: References: Message-ID: <1cc31e7dbea423f7db9d4b7756591614@localhost.localdomain> lebedev.ri added a comment. @nickdesaulniers are you planning on picking this up? If not, just so we're done (or are we) with attributes, i can take over? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73853/new/ https://reviews.llvm.org/D73853 From llvm-commits at lists.llvm.org Mon Jul 6 12:28:18 2020 From: llvm-commits at lists.llvm.org (Anirudh Prasad via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:28:18 +0000 (UTC) Subject: [PATCH] D83251: [SystemZ] Allow specifying integer registers as part of the address calculation Message-ID: anirudhp created this revision. anirudhp added reviewers: uweigand, Kai. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. - Previously, this patch (https://reviews.llvm.org/rGe1de2773a534957305d7a559c6d88c4b5ac354e2) provided support for accepting integer registers in inline asm, i.e. __asm("lhi %r0, 5") -> lhi %r0, 5 __asm("lhi 0, 5") -> lhi 0,5 - This patch aims to extend this support to instructions which compute addresses as well.
(i.e., instructions of type BDMem and BD[X|R|V|L]Mem) Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83251 Files: llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/test/MC/SystemZ/insn-bad.s llvm/test/MC/SystemZ/insn-good-z13.s llvm/test/MC/SystemZ/insn-good-z14.s llvm/test/MC/SystemZ/insn-good-z15.s llvm/test/MC/SystemZ/insn-good.s llvm/test/MC/SystemZ/regs-good.s llvm/test/MC/SystemZ/tokens.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83251.275794.patch Type: text/x-patch Size: 40224 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:29:15 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:29:15 +0000 (UTC) Subject: [PATCH] D73853: [llvm-reduce] add ReduceAttribute delta pass In-Reply-To: References: Message-ID: <95ecc2a1af7096f83aa731d07440d3f8@localhost.localdomain> nickdesaulniers added a comment. In D73853#2133880, @lebedev.ri wrote: > @nickdesaulniers are you planning on picking this up? > If not, just so we're done (or are we) with attributes, i can take over? All yours, I didn't/don't have the time to sort out what the issue was on OSX, but if you find/fix/land it, I'll owe you a beer! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73853/new/ https://reviews.llvm.org/D73853 From llvm-commits at lists.llvm.org Mon Jul 6 12:29:58 2020 From: llvm-commits at lists.llvm.org (Varun Gandhi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:29:58 +0000 (UTC) Subject: [PATCH] D82791: [lit] Improve lit's output with default settings and --verbose. In-Reply-To: References: Message-ID: varungandhi-apple marked 2 inline comments as done. varungandhi-apple added a comment. Thanks for the review, it's a big patch. 😅 I'm a bit busy at the moment; I will respond to the other comments later this week or sometime next week.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82791/new/ https://reviews.llvm.org/D82791 From llvm-commits at lists.llvm.org Mon Jul 6 12:30:27 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:30:27 +0000 (UTC) Subject: [PATCH] D82716: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division In-Reply-To: References: Message-ID: <72fa9f0802faf36f5c5daf4e3735da96@localhost.localdomain> craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82716/new/ https://reviews.llvm.org/D82716 From llvm-commits at lists.llvm.org Mon Jul 6 12:31:56 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:31:56 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: nickdesaulniers added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:34 + +using namespace llvm; + ---------------- lebedev.ri wrote: > nickdesaulniers wrote: > > if you move this up, then you don't need to wrap the forward declaration of `class Module` above. > Hold up, that's actually bogus. > https://godbolt.org/z/0qTwwz or just include the appropriate header for `llvm::Module` and forget forward declaration? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 12:32:21 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:32:21 +0000 (UTC) Subject: [PATCH] D83236: [DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour In-Reply-To: References: Message-ID: <5b01f9b374efe5279b15a04027694e44@localhost.localdomain> dblaikie added a comment. Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83236/new/ https://reviews.llvm.org/D83236 From llvm-commits at lists.llvm.org Mon Jul 6 12:35:33 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:35:33 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <53c5377190b099ef9d9ebc597ef779f2@localhost.localdomain> lebedev.ri marked 2 inline comments as done. lebedev.ri added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:34 + +using namespace llvm; + ---------------- nickdesaulniers wrote: > lebedev.ri wrote: > > nickdesaulniers wrote: > > > if you move this up, then you don't need to wrap the forward declaration of `class Module` above. > > Hold up, that's actually bogus. > > https://godbolt.org/z/0qTwwz > or just include the appropriate header for `llvm::Module` and forget forward declaration? But here we don't need to know what `Module` actually is though, just a pointer to it. The current include list is what IWYU suggests, and i believe cleanup elsewhere in codebase is ongoing in the same direction. 
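The forward-declaration point in this sub-thread can be sketched in isolation. This is only a minimal illustration under stated assumptions, not a reproduction of the linked godbolt example: the `pitfall` namespace and the `hasModule` helper are hypothetical names introduced here, and the `llvm::Module` below is a stand-in forward declaration rather than the real LLVM header. It shows that a using-directive does not make an unwrapped `class Module;` refer to `llvm::Module`, and that a function taking only a pointer compiles against the incomplete type.

```cpp
#include <type_traits>

// Stand-in for what the .cpp file does: only a forward declaration,
// wrapped in the namespace so that it names llvm::Module.
namespace llvm {
class Module; // declares llvm::Module as an incomplete type
} // namespace llvm

// Hypothetical namespace showing the pitfall: a using-directive followed
// by an unwrapped forward declaration.
namespace pitfall {
using namespace llvm;
class Module; // declares a new type pitfall::Module, not llvm::Module
} // namespace pitfall

// Hypothetical helper: it only passes a pointer around, so it never needs
// the definition of llvm::Module; the forward declaration is enough.
constexpr bool hasModule(const llvm::Module *M) { return M != nullptr; }

// The two names refer to unrelated types.
static_assert(!std::is_same<pitfall::Module, llvm::Module>::value,
              "the unwrapped declaration introduces a different type");
```

Under these assumptions, wrapping the forward declaration in `namespace llvm { ... }` is what ties it to the real class, which seems consistent with keeping the wrapper rather than moving the using-directive up.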
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 12:37:11 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:37:11 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: <4a719ff132fcff4b67e687081dc40109@localhost.localdomain> sameerarora101 marked 3 inline comments as done. sameerarora101 added inline comments. ================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:217 + + // Add new RPaths. + for (StringRef RPath : Config.RPathToAdd) { ---------------- jhenderson wrote: > sameerarora101 wrote: > > alexshap wrote: > > > sameerarora101 wrote: > > > > alexshap wrote: > > > > > smeenai wrote: > > > > > > Nit: do we want to be adding new load commands in a function called `updateLoadCommands`? At least to me that function name seems like it should only be updating existing load commands, since we have a separate `removeLoadCommands` to handle removal. I'll leave it to the more experienced llvm-objcopy reviewers (@alexshap, @jhenderson) to decide if this is okay as-is or if we want a separate `addLoadCommands` function. > > > > > so basically the idea was to group together logical pieces of handleArgs (to some reasonable extent). > > > > > Besides error-reporting removeLoadCommands is ~10-12 lines of code, so I'd probably inline it into > > > > > updateLoadCommands for consistency. > > > > @alexshap Instead of inlining the whole `removeLoadCommands` inside `updateLoadCommands` I think it would be cleaner if I just call > > > > > > > > ``` > > > > // Remove LCs. > > > > if (Error E = removeLoadCommands(Config, Obj)) > > > > return E; > > > > ``` > > > > from inside `updateLoadCommands`. This can allow for independent development of `removeLoadCommands` in future as well. What do you think? 
(I have updated the current diff with this change; however, I can update it again in case we want something else) > > > I'm not sure that removeLoadCommands is realistically independent from updateLoadCommands, e.g. the order in which you modify the list of load commands appears to be important. Since it's small (~10 lines) it seems preferable to avoid creating this weird asymmetry between removing/adding. > > I see. Ok, I have inlined `removeLoadCommands` into `updateLoadCommands` then. Thanks 😊 > There are two options. Either a) rename `updateLoadCommands` to something more generic (e.g. `processLoadCommands`, in which case I'd ensure all load command processing is done in that function), or b) think of it purely in conceptual terms where the load commands in the function name refers to the set of load commands, rather than each individual load command, if that makes sense. Thus you update that set by adding/removing elements, and also changing the existing elements. I like `processLoadCommands` too. Updated (I can change it to something else too?) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 From llvm-commits at lists.llvm.org Mon Jul 6 12:37:45 2020 From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:37:45 +0000 (UTC) Subject: [PATCH] D83252: [XCOFF] Enable symbol alias for AIX Message-ID: jasonliu created this revision. jasonliu added reviewers: hubert.reinterpretcast, DiggerLin, daltenty, cebowleratibm. Herald added subscribers: llvm-commits, kbarton, hiraditya, nemanjai. Herald added a project: LLVM. AIX assembly's .set directive is not usable for aliasing purposes. We need to use the extra-label-at-definition strategy to generate symbol aliasing on AIX. Follow-up items after this patch would be: 1. Investigate .set directive to see if it's needed for other purposes. 2. 
Use llvm-readobj to dump the relocation table and symbol table for the symbols to verify it on the integrated-as path. https://reviews.llvm.org/D83252 Files: llvm/include/llvm/CodeGen/TargetLoweringObjectFileImpl.h llvm/include/llvm/Target/TargetLoweringObjectFile.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/aix-alias-unsupported.ll llvm/test/CodeGen/PowerPC/aix-alias.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83252.275797.patch Type: text/x-patch Size: 12842 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:37:56 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:37:56 +0000 (UTC) Subject: [PATCH] D83106: [WebAssembly] 64-bit memory limits In-Reply-To: References: Message-ID: aardappel updated this revision to Diff 275799. aardappel added a comment. Test uses 32/64-bit specific .o files CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83106/new/ https://reviews.llvm.org/D83106 Files: lld/test/wasm/data-layout.ll lld/wasm/SyntheticSections.cpp lld/wasm/SyntheticSections.h lld/wasm/Writer.cpp llvm/include/llvm/BinaryFormat/Wasm.h llvm/lib/MC/WasmObjectWriter.cpp llvm/lib/Object/WasmObjectFile.cpp llvm/lib/ObjectYAML/WasmYAML.cpp llvm/test/MC/WebAssembly/wasm64.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83106.275799.patch Type: text/x-patch Size: 13135 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:40:59 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:40:59 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. 
In-Reply-To: References: Message-ID: <5922ed0f7bfc87c4475dd73ed79015d0@localhost.localdomain> lei added inline comments. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:46 +// PowerPC specific type constraints. +def SDT_PPCSplat32 : SDTypeProfile<1, 3, [ SDTCisVT<0, v2i64>, ---------------- nit: Maybe ``` PowerPC ISA 3.1 specific type constraints. ``` ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:49 + SDTCisVec<1>, SDTCisInt<2>, SDTCisInt<3> +]>; + ---------------- lei wrote: > nit: indentation? nvm. This is how it's been done elsewhere. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:51 + +// P10 specific PPCISD nodes. +def PPCxxsplti32dx : SDNode<"PPCISD::XXSPLTI32DX", SDT_PPCSplat32, []>; ---------------- nit: ``` // ISA 3.1 specific PPCISD nodes. ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 From llvm-commits at lists.llvm.org Mon Jul 6 12:41:23 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via llvm-commits) Date: Mon, 06 Jul 2020 12:41:23 -0700 (PDT) Subject: [llvm] 4d135b0 - [WebAssembly] 64-bit memory limits Message-ID: <5f037e63.1c69fb81.89cbd.de23@mx.google.com> Author: Wouter van Oortmerssen Date: 2020-07-06T12:40:45-07:00 New Revision: 4d135b0446dc34885811bf103ba2c8b69fe6793b URL: https://github.com/llvm/llvm-project/commit/4d135b0446dc34885811bf103ba2c8b69fe6793b DIFF: https://github.com/llvm/llvm-project/commit/4d135b0446dc34885811bf103ba2c8b69fe6793b.diff LOG: [WebAssembly] 64-bit memory limits Added: Modified: lld/test/wasm/data-layout.ll lld/wasm/SyntheticSections.cpp lld/wasm/SyntheticSections.h lld/wasm/Writer.cpp llvm/include/llvm/BinaryFormat/Wasm.h llvm/lib/MC/WasmObjectWriter.cpp llvm/lib/Object/WasmObjectFile.cpp llvm/lib/ObjectYAML/WasmYAML.cpp llvm/test/MC/WebAssembly/wasm64.s Removed: 
################################################################################ diff --git a/lld/test/wasm/data-layout.ll b/lld/test/wasm/data-layout.ll index 759c5440fe99..863447cfd26b 100644 --- a/lld/test/wasm/data-layout.ll +++ b/lld/test/wasm/data-layout.ll @@ -1,7 +1,12 @@ -; RUN: llvm-mc -filetype=obj -triple=wasm32-unknown-unknown %p/Inputs/hello.s -o %t.hello.o -; RUN: llc -filetype=obj %s -o %t.o +; RUN: llvm-mc -filetype=obj -triple=wasm32-unknown-unknown %p/Inputs/hello.s -o %t.hello32.o +; RUN: llc -mtriple=wasm32-unknown-unknown -filetype=obj %s -o %t32.o +; RUN: wasm-ld -m wasm32 -no-gc-sections --export=__data_end --export=__heap_base --allow-undefined --no-entry -o %t32.wasm %t32.o %t.hello32.o +; RUN: obj2yaml %t32.wasm | FileCheck --check-prefixes CHECK,CHK32 %s -target triple = "wasm32-unknown-unknown" +; RUN: llvm-mc -filetype=obj -triple=wasm64-unknown-unknown %p/Inputs/hello.s -o %t.hello64.o +; RUN: llc -mtriple=wasm64-unknown-unknown -filetype=obj %s -o %t64.o +; RUN: wasm-ld -m wasm64 -no-gc-sections --export=__data_end --export=__heap_base --allow-undefined --no-entry -o %t64.wasm %t64.o %t.hello64.o +; RUN: obj2yaml %t64.wasm | FileCheck --check-prefixes CHECK,CHK64 %s @foo = hidden global i32 1, align 4 @aligned_bar = hidden global i32 3, align 16 @@ -13,26 +18,28 @@ target triple = "wasm32-unknown-unknown" @local_struct = hidden global %struct.s zeroinitializer, align 4 @local_struct_internal_ptr = hidden local_unnamed_addr global i32* getelementptr inbounds (%struct.s, %struct.s* @local_struct, i32 0, i32 1), align 4 -; RUN: wasm-ld -no-gc-sections --export=__data_end --export=__heap_base --allow-undefined --no-entry -o %t.wasm %t.o %t.hello.o -; RUN: obj2yaml %t.wasm | FileCheck %s - ; CHECK: - Type: MEMORY ; CHECK-NEXT: Memories: -; CHECK-NEXT: - Initial: 0x00000002 +; CHK32-NEXT: - Initial: 0x00000002 +; CHK64-NEXT: - Flags: [ IS_64 ] +; CHK64-NEXT: Initial: 0x00000002 ; CHECK-NEXT: - Type: GLOBAL ; CHECK-NEXT: Globals: ; 
CHECK-NEXT: - Index: 0 -; CHECK-NEXT: Type: I32 +; CHK32-NEXT: Type: I32 +; CHK64-NEXT: Type: I64 ; CHECK-NEXT: Mutable: true ; CHECK-NEXT: InitExpr: -; CHECK-NEXT: Opcode: I32_CONST +; CHK32-NEXT: Opcode: I32_CONST +; CHK64-NEXT: Opcode: I64_CONST ; CHECK-NEXT: Value: 66624 ; CHECK-NEXT: - Index: 1 ; CHECK-NEXT: Type: I32 ; CHECK-NEXT: Mutable: false ; CHECK-NEXT: InitExpr: ; CHECK-NEXT: Opcode: I32_CONST -; CHECK-NEXT: Value: 1080 +; CHK32-NEXT: Value: 1080 +; CHK64-NEXT: Value: 1088 ; CHECK-NEXT: - Index: 2 ; CHECK-NEXT: Type: I32 ; CHECK-NEXT: Mutable: false @@ -53,13 +60,11 @@ target triple = "wasm32-unknown-unknown" ; CHECK-NEXT: Offset: ; CHECK-NEXT: Opcode: I32_CONST ; CHECK-NEXT: Value: 1040 -; CHECK-NEXT: Content: '0100000000000000000000000000000003000000000000000004000034040000' -; CHECK-NEXT: - Type: CUSTOM ; RUN: wasm-ld -no-gc-sections --allow-undefined --no-entry \ -; RUN: --initial-memory=131072 --max-memory=131072 -o %t_max.wasm %t.o \ -; RUN: %t.hello.o +; RUN: --initial-memory=131072 --max-memory=131072 -o %t_max.wasm %t32.o \ +; RUN: %t.hello32.o ; RUN: obj2yaml %t_max.wasm | FileCheck %s -check-prefix=CHECK-MAX ; CHECK-MAX: - Type: MEMORY @@ -70,7 +75,7 @@ target triple = "wasm32-unknown-unknown" ; RUN: wasm-ld -no-gc-sections --allow-undefined --no-entry --shared-memory \ ; RUN: --features=atomics,bulk-memory --initial-memory=131072 \ -; RUN: --max-memory=131072 -o %t_max.wasm %t.o %t.hello.o +; RUN: --max-memory=131072 -o %t_max.wasm %t32.o %t.hello32.o ; RUN: obj2yaml %t_max.wasm | FileCheck %s -check-prefix=CHECK-SHARED ; CHECK-SHARED: - Type: MEMORY @@ -79,7 +84,7 @@ target triple = "wasm32-unknown-unknown" ; CHECK-SHARED-NEXT: Initial: 0x00000002 ; CHECK-SHARED-NEXT: Maximum: 0x00000002 -; RUN: wasm-ld --relocatable -o %t_reloc.wasm %t.o %t.hello.o +; RUN: wasm-ld --relocatable -o %t_reloc.wasm %t32.o %t.hello32.o ; RUN: obj2yaml %t_reloc.wasm | FileCheck %s -check-prefix=RELOC ; RELOC: - Type: DATA diff --git 
a/lld/wasm/SyntheticSections.cpp b/lld/wasm/SyntheticSections.cpp index 0ed3ea273d70..70d6a10200c6 100644 --- a/lld/wasm/SyntheticSections.cpp +++ b/lld/wasm/SyntheticSections.cpp @@ -139,6 +139,8 @@ void ImportSection::writeBody() { } if (config->sharedMemory) import.Memory.Flags |= WASM_LIMITS_FLAG_IS_SHARED; + if (config->is64) + import.Memory.Flags |= WASM_LIMITS_FLAG_IS_64; writeImport(os, import); } @@ -234,6 +236,8 @@ void MemorySection::writeBody() { flags |= WASM_LIMITS_FLAG_HAS_MAX; if (config->sharedMemory) flags |= WASM_LIMITS_FLAG_IS_SHARED; + if (config->is64) + flags |= WASM_LIMITS_FLAG_IS_64; writeUleb128(os, flags, "memory limits flags"); writeUleb128(os, numMemoryPages, "initial pages"); if (hasMax) diff --git a/lld/wasm/SyntheticSections.h b/lld/wasm/SyntheticSections.h index 6cf593cf016c..3e125ca84e40 100644 --- a/lld/wasm/SyntheticSections.h +++ b/lld/wasm/SyntheticSections.h @@ -167,8 +167,8 @@ class MemorySection : public SyntheticSection { bool isNeeded() const override { return !config->importMemory; } void writeBody() override; - uint32_t numMemoryPages = 0; - uint32_t maxMemoryPages = 0; + uint64_t numMemoryPages = 0; + uint64_t maxMemoryPages = 0; }; // The event section contains a list of declared wasm events associated with the diff --git a/lld/wasm/Writer.cpp b/lld/wasm/Writer.cpp index 0434a4cf0293..1401dc50931b 100644 --- a/lld/wasm/Writer.cpp +++ b/lld/wasm/Writer.cpp @@ -224,8 +224,16 @@ void Writer::layoutMemory() { log("mem: stack base = " + Twine(memoryPtr)); memoryPtr += config->zStackSize; auto *sp = cast(WasmSym::stackPointer); - assert(sp->global->global.InitExpr.Opcode == WASM_OPCODE_I32_CONST); - sp->global->global.InitExpr.Value.Int32 = memoryPtr; + switch (sp->global->global.InitExpr.Opcode) { + case WASM_OPCODE_I32_CONST: + sp->global->global.InitExpr.Value.Int32 = memoryPtr; + break; + case WASM_OPCODE_I64_CONST: + sp->global->global.InitExpr.Value.Int64 = memoryPtr; + break; + default: + llvm_unreachable("init expr 
must be i32/i64.const"); + } log("mem: stack top = " + Twine(memoryPtr)); }; @@ -296,13 +304,16 @@ void Writer::layoutMemory() { if (WasmSym::heapBase) WasmSym::heapBase->setVirtualAddress(memoryPtr); + uint64_t maxMemorySetting = 1ULL << (config->is64 ? 48 : 32); + if (config->initialMemory != 0) { if (config->initialMemory != alignTo(config->initialMemory, WasmPageSize)) error("initial memory must be " + Twine(WasmPageSize) + "-byte aligned"); if (memoryPtr > config->initialMemory) error("initial memory too small, " + Twine(memoryPtr) + " bytes needed"); - if (config->initialMemory > (1ULL << 32)) - error("initial memory too large, cannot be greater than 4294967296"); + if (config->initialMemory > maxMemorySetting) + error("initial memory too large, cannot be greater than " + + Twine(maxMemorySetting)); memoryPtr = config->initialMemory; } out.dylinkSec->memSize = memoryPtr; @@ -316,8 +327,9 @@ void Writer::layoutMemory() { error("maximum memory must be " + Twine(WasmPageSize) + "-byte aligned"); if (memoryPtr > config->maxMemory) error("maximum memory too small, " + Twine(memoryPtr) + " bytes needed"); - if (config->maxMemory > (1ULL << 32)) - error("maximum memory too large, cannot be greater than 4294967296"); + if (config->maxMemory > maxMemorySetting) + error("maximum memory too large, cannot be greater than " + + Twine(maxMemorySetting)); out.memorySec->maxMemoryPages = config->maxMemory / WasmPageSize; log("mem: max pages = " + Twine(out.memorySec->maxMemoryPages)); } diff --git a/llvm/include/llvm/BinaryFormat/Wasm.h b/llvm/include/llvm/BinaryFormat/Wasm.h index b8d3b3f4d66f..d8d72cacf226 100644 --- a/llvm/include/llvm/BinaryFormat/Wasm.h +++ b/llvm/include/llvm/BinaryFormat/Wasm.h @@ -63,8 +63,8 @@ struct WasmExport { struct WasmLimits { uint8_t Flags; - uint32_t Initial; - uint32_t Maximum; + uint64_t Initial; + uint64_t Maximum; }; struct WasmTable { @@ -282,6 +282,7 @@ enum : unsigned { enum : unsigned { WASM_LIMITS_FLAG_HAS_MAX = 0x1, 
WASM_LIMITS_FLAG_IS_SHARED = 0x2, + WASM_LIMITS_FLAG_IS_64 = 0x4, }; enum : unsigned { diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp index c6029b66a388..d1290b050ef2 100644 --- a/llvm/lib/MC/WasmObjectWriter.cpp +++ b/llvm/lib/MC/WasmObjectWriter.cpp @@ -108,7 +108,7 @@ struct WasmDataSegment { MCSectionWasm *Section; StringRef Name; uint32_t InitFlags; - uint32_t Offset; + uint64_t Offset; uint32_t Alignment; uint32_t LinkerFlags; SmallVector Data; @@ -326,7 +326,7 @@ class WasmObjectWriter : public MCObjectWriter { void writeValueType(wasm::ValType Ty) { W.OS << static_cast(Ty); } void writeTypeSection(ArrayRef Signatures); - void writeImportSection(ArrayRef Imports, uint32_t DataSize, + void writeImportSection(ArrayRef Imports, uint64_t DataSize, uint32_t NumElements); void writeFunctionSection(ArrayRef Functions); void writeExportSection(ArrayRef Exports); @@ -730,12 +730,12 @@ void WasmObjectWriter::writeTypeSection(ArrayRef Signatures) { } void WasmObjectWriter::writeImportSection(ArrayRef Imports, - uint32_t DataSize, + uint64_t DataSize, uint32_t NumElements) { if (Imports.empty()) return; - uint32_t NumPages = (DataSize + wasm::WasmPageSize - 1) / wasm::WasmPageSize; + uint64_t NumPages = (DataSize + wasm::WasmPageSize - 1) / wasm::WasmPageSize; SectionBookkeeping Section; startSection(Section, wasm::WASM_SEC_IMPORT); @@ -755,8 +755,8 @@ void WasmObjectWriter::writeImportSection(ArrayRef Imports, W.OS << char(Import.Global.Mutable ? 
1 : 0); break; case wasm::WASM_EXTERNAL_MEMORY: - encodeULEB128(0, W.OS); // flags - encodeULEB128(NumPages, W.OS); // initial + encodeULEB128(Import.Memory.Flags, W.OS); + encodeULEB128(NumPages, W.OS); // initial break; case wasm::WASM_EXTERNAL_TABLE: W.OS << char(Import.Table.ElemType); @@ -935,7 +935,9 @@ uint32_t WasmObjectWriter::writeDataSection(const MCAsmLayout &Layout) { if (Segment.InitFlags & wasm::WASM_SEGMENT_HAS_MEMINDEX) encodeULEB128(0, W.OS); // memory index if ((Segment.InitFlags & wasm::WASM_SEGMENT_IS_PASSIVE) == 0) { - W.OS << char(wasm::WASM_OPCODE_I32_CONST); + W.OS << char(Segment.Offset > std::numeric_limits().max() + ? wasm::WASM_OPCODE_I64_CONST + : wasm::WASM_OPCODE_I32_CONST); encodeSLEB128(Segment.Offset, W.OS); // offset W.OS << char(wasm::WASM_OPCODE_END); } @@ -1187,7 +1189,7 @@ uint64_t WasmObjectWriter::writeObject(MCAssembler &Asm, SmallVector SymbolInfos; SmallVector, 2> InitFuncs; std::map> Comdats; - uint32_t DataSize = 0; + uint64_t DataSize = 0; // For now, always emit the memory import, since loads and stores are not // valid without it. In the future, we could perhaps be more clever and omit @@ -1196,6 +1198,7 @@ uint64_t WasmObjectWriter::writeObject(MCAssembler &Asm, MemImport.Module = "env"; MemImport.Field = "__linear_memory"; MemImport.Kind = wasm::WASM_EXTERNAL_MEMORY; + MemImport.Memory.Flags = is64Bit() ? 
wasm::WASM_LIMITS_FLAG_IS_64 : 0; Imports.push_back(MemImport); // For now, always emit the table section, since indirect calls are not diff --git a/llvm/lib/Object/WasmObjectFile.cpp b/llvm/lib/Object/WasmObjectFile.cpp index 25990b012118..bb2e81d64047 100644 --- a/llvm/lib/Object/WasmObjectFile.cpp +++ b/llvm/lib/Object/WasmObjectFile.cpp @@ -208,9 +208,9 @@ static Error readInitExpr(wasm::WasmInitExpr &Expr, static wasm::WasmLimits readLimits(WasmObjectFile::ReadContext &Ctx) { wasm::WasmLimits Result; Result.Flags = readVaruint32(Ctx); - Result.Initial = readVaruint32(Ctx); + Result.Initial = readVaruint64(Ctx); if (Result.Flags & wasm::WASM_LIMITS_FLAG_HAS_MAX) - Result.Maximum = readVaruint32(Ctx); + Result.Maximum = readVaruint64(Ctx); return Result; } diff --git a/llvm/lib/ObjectYAML/WasmYAML.cpp b/llvm/lib/ObjectYAML/WasmYAML.cpp index b12fd448de5a..d1aa1181a344 100644 --- a/llvm/lib/ObjectYAML/WasmYAML.cpp +++ b/llvm/lib/ObjectYAML/WasmYAML.cpp @@ -522,6 +522,7 @@ void ScalarBitSetTraits::bitset( #define BCase(X) IO.bitSetCase(Value, #X, wasm::WASM_LIMITS_FLAG_##X) BCase(HAS_MAX); BCase(IS_SHARED); + BCase(IS_64); #undef BCase } diff --git a/llvm/test/MC/WebAssembly/wasm64.s b/llvm/test/MC/WebAssembly/wasm64.s index 2ec331f751d6..b89718816a9f 100644 --- a/llvm/test/MC/WebAssembly/wasm64.s +++ b/llvm/test/MC/WebAssembly/wasm64.s @@ -147,6 +147,7 @@ test: # BIN-NEXT: Field: __linear_memory # BIN-NEXT: Kind: MEMORY # BIN-NEXT: Memory: +# BIN-NEXT: Flags: [ IS_64 ] # BIN-NEXT: Initial: 0x00000001 # BIN-NEXT: - Module: env # BIN-NEXT: Field: __indirect_function_table From llvm-commits at lists.llvm.org Mon Jul 6 12:41:25 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:41:25 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: sameerarora101 marked an inline comment as done. sameerarora101 added inline comments. 
================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:85 + std::vector NewMembers; + for (const StringRef &Member : InputFiles) + if (Error E = addMember(NewMembers, Member)) ---------------- @jhenderson Should I replace the type of `Member` back to `auto`. clang-tidy raises a warning with `StringRef`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Mon Jul 6 12:43:03 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:43:03 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: <517acdfecb1effff28886868326c4a05@localhost.localdomain> sameerarora101 updated this revision to Diff 275801. sameerarora101 marked an inline comment as done. sameerarora101 added a comment. `updateLoadCommands` -> `processLoadCommands` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 Files: llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/CopyConfig.h llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82812.275801.patch Type: text/x-patch Size: 12613 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:46:54 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:46:54 +0000 (UTC) Subject: [PATCH] D83152: llvm-nm: add flag to suppress no symbols warning In-Reply-To: References: Message-ID: rupprecht added a comment. In D83152#2133855 , @MaskRay wrote: > I cannot find any search result about `no-warning-for-no-symbols`. Is `-no-warning-for-no-symbols` really an existing option? 
libtool is an `ar`-like tool. I found it by looking for underscores instead of hyphens: `-no_warning_for_no_symbols`. However, the flag is an ar/ranlib/libtool flag, not nm, AFAICT. > Second, I wonder how you are going to plug `-no-warning-for-no-symbols` into a build system. If you only parse stdout, you can ignore stderr. Even if you do, you can probably use `grep -v '^no symbols'`. This will have better portability (supported on older nm, supported on other binary formats). I agree this is likely the simpler option (just add `2> /dev/null` to the build script using `nm`) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83152/new/ https://reviews.llvm.org/D83152 From llvm-commits at lists.llvm.org Mon Jul 6 12:50:07 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via llvm-commits) Date: Mon, 06 Jul 2020 12:50:07 -0700 (PDT) Subject: [llvm] 16d83c3 - [WebAssembly] Added 64-bit memory.grow/size/copy/fill Message-ID: <5f03806f.1c69fb81.3da74.2ce3@mx.google.com> Author: Wouter van Oortmerssen Date: 2020-07-06T12:49:50-07:00 New Revision: 16d83c395a1f8660fc583a66e1927a5c433fbbe1 URL: https://github.com/llvm/llvm-project/commit/16d83c395a1f8660fc583a66e1927a5c433fbbe1 DIFF: https://github.com/llvm/llvm-project/commit/16d83c395a1f8660fc583a66e1927a5c433fbbe1.diff LOG: [WebAssembly] Added 64-bit memory.grow/size/copy/fill This covers both the existing memory functions as well as the new bulk memory proposal. Added new test files since changes were also required in the inputs. Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit. 
Differential Revision: https://reviews.llvm.org/D82821 Added: llvm/test/CodeGen/WebAssembly/bulk-memory64.ll llvm/test/CodeGen/WebAssembly/memory-addr64.ll Modified: clang/include/clang/Basic/BuiltinsWebAssembly.def clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGen/builtins-wasm.c llvm/include/llvm/IR/IntrinsicsWebAssembly.td llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp llvm/test/MC/WebAssembly/bulk-memory-encodings.s Removed: llvm/test/CodeGen/WebAssembly/bulk-memory-intrinsics.ll ################################################################################ diff --git a/clang/include/clang/Basic/BuiltinsWebAssembly.def b/clang/include/clang/Basic/BuiltinsWebAssembly.def index 5e6f0d90ab46..ecee7782920f 100644 --- a/clang/include/clang/Basic/BuiltinsWebAssembly.def +++ b/clang/include/clang/Basic/BuiltinsWebAssembly.def @@ -25,10 +25,6 @@ BUILTIN(__builtin_wasm_memory_size, "zIi", "n") BUILTIN(__builtin_wasm_memory_grow, "zIiz", "n") -// Bulk memory builtins -TARGET_BUILTIN(__builtin_wasm_memory_init, "vIUiIUiv*UiUi", "", "bulk-memory") -TARGET_BUILTIN(__builtin_wasm_data_drop, "vIUi", "", "bulk-memory") - // Thread-local storage TARGET_BUILTIN(__builtin_wasm_tls_size, "z", "nc", "bulk-memory") TARGET_BUILTIN(__builtin_wasm_tls_align, "z", "nc", "bulk-memory") diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 265fee392a82..91969267cdb9 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -16101,30 +16101,6 @@ Value *CodeGenFunction::EmitWebAssemblyBuiltinExpr(unsigned BuiltinID, Function *Callee = CGM.getIntrinsic(Intrinsic::wasm_memory_grow, ResultType); return Builder.CreateCall(Callee, Args); } - case WebAssembly::BI__builtin_wasm_memory_init: { - llvm::APSInt SegConst; - if (!E->getArg(0)->isIntegerConstantExpr(SegConst, getContext())) - llvm_unreachable("Constant 
arg isn't actually constant?"); - llvm::APSInt MemConst; - if (!E->getArg(1)->isIntegerConstantExpr(MemConst, getContext())) - llvm_unreachable("Constant arg isn't actually constant?"); - if (!MemConst.isNullValue()) - ErrorUnsupported(E, "non-zero memory index"); - Value *Args[] = {llvm::ConstantInt::get(getLLVMContext(), SegConst), - llvm::ConstantInt::get(getLLVMContext(), MemConst), - EmitScalarExpr(E->getArg(2)), EmitScalarExpr(E->getArg(3)), - EmitScalarExpr(E->getArg(4))}; - Function *Callee = CGM.getIntrinsic(Intrinsic::wasm_memory_init); - return Builder.CreateCall(Callee, Args); - } - case WebAssembly::BI__builtin_wasm_data_drop: { - llvm::APSInt SegConst; - if (!E->getArg(0)->isIntegerConstantExpr(SegConst, getContext())) - llvm_unreachable("Constant arg isn't actually constant?"); - Value *Arg = llvm::ConstantInt::get(getLLVMContext(), SegConst); - Function *Callee = CGM.getIntrinsic(Intrinsic::wasm_data_drop); - return Builder.CreateCall(Callee, {Arg}); - } case WebAssembly::BI__builtin_wasm_tls_size: { llvm::Type *ResultType = ConvertType(E->getType()); Function *Callee = CGM.getIntrinsic(Intrinsic::wasm_tls_size, ResultType); diff --git a/clang/test/CodeGen/builtins-wasm.c b/clang/test/CodeGen/builtins-wasm.c index 36d259f7405d..f7e3dc1ea5e7 100644 --- a/clang/test/CodeGen/builtins-wasm.c +++ b/clang/test/CodeGen/builtins-wasm.c @@ -26,18 +26,6 @@ __SIZE_TYPE__ memory_grow(__SIZE_TYPE__ delta) { // WEBASSEMBLY64: call i64 @llvm.wasm.memory.grow.i64(i32 0, i64 %{{.*}}) } -void memory_init(void *dest, int offset, int size) { - __builtin_wasm_memory_init(3, 0, dest, offset, size); - // WEBASSEMBLY32: call void @llvm.wasm.memory.init(i32 3, i32 0, i8* %{{.*}}, i32 %{{.*}}, i32 %{{.*}}) - // WEBASSEMBLY64: call void @llvm.wasm.memory.init(i32 3, i32 0, i8* %{{.*}}, i32 %{{.*}}, i32 %{{.*}}) -} - -void data_drop() { - __builtin_wasm_data_drop(3); - // WEBASSEMBLY32: call void @llvm.wasm.data.drop(i32 3) - // WEBASSEMBLY64: call void 
@llvm.wasm.data.drop(i32 3) -} - __SIZE_TYPE__ tls_size() { return __builtin_wasm_tls_size(); // WEBASSEMBLY32: call i32 @llvm.wasm.tls.size.i32() diff --git a/llvm/include/llvm/IR/IntrinsicsWebAssembly.td b/llvm/include/llvm/IR/IntrinsicsWebAssembly.td index 0c08aad22362..7c9ceb148a47 100644 --- a/llvm/include/llvm/IR/IntrinsicsWebAssembly.td +++ b/llvm/include/llvm/IR/IntrinsicsWebAssembly.td @@ -206,20 +206,6 @@ def int_wasm_nearest : [LLVMMatchType<0>], [IntrNoMem, IntrSpeculatable]>; -//===----------------------------------------------------------------------===// -// Bulk memory intrinsics -//===----------------------------------------------------------------------===// - -def int_wasm_memory_init : - Intrinsic<[], - [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty, llvm_i32_ty, llvm_i32_ty], - [IntrWriteMem, IntrInaccessibleMemOrArgMemOnly, WriteOnly>, - IntrHasSideEffects, ImmArg>, ImmArg>]>; -def int_wasm_data_drop : - Intrinsic<[], - [llvm_i32_ty], - [IntrNoDuplicate, IntrHasSideEffects, ImmArg>]>; - //===----------------------------------------------------------------------===// // Thread-local storage intrinsics //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td b/llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td index 05735cf6d31f..3e9ef6fbc7ea 100644 --- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td +++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td @@ -33,39 +33,43 @@ def wasm_memset_t : SDTypeProfile<0, 4, def wasm_memset : SDNode<"WebAssemblyISD::MEMORY_FILL", wasm_memset_t, [SDNPHasChain, SDNPMayStore]>; +multiclass BulkMemoryOps { + let mayStore = 1, hasSideEffects = 1 in -defm MEMORY_INIT : +defm MEMORY_INIT_A#B : BULK_I<(outs), - (ins i32imm_op:$seg, i32imm_op:$idx, I32:$dest, - I32:$offset, I32:$size), + (ins i32imm_op:$seg, i32imm_op:$idx, rc:$dest, + rc:$offset, rc:$size), (outs), (ins i32imm_op:$seg, i32imm_op:$idx), 
- [(int_wasm_memory_init (i32 timm:$seg), (i32 timm:$idx), I32:$dest, - I32:$offset, I32:$size - )], + [], "memory.init\t$seg, $idx, $dest, $offset, $size", "memory.init\t$seg, $idx", 0x08>; let hasSideEffects = 1 in defm DATA_DROP : BULK_I<(outs), (ins i32imm_op:$seg), (outs), (ins i32imm_op:$seg), - [(int_wasm_data_drop (i32 timm:$seg))], + [], "data.drop\t$seg", "data.drop\t$seg", 0x09>; let mayLoad = 1, mayStore = 1 in -defm MEMORY_COPY : +defm MEMORY_COPY_A#B : BULK_I<(outs), (ins i32imm_op:$src_idx, i32imm_op:$dst_idx, - I32:$dst, I32:$src, I32:$len), + rc:$dst, rc:$src, rc:$len), (outs), (ins i32imm_op:$src_idx, i32imm_op:$dst_idx), [(wasm_memcpy (i32 imm:$src_idx), (i32 imm:$dst_idx), - I32:$dst, I32:$src, I32:$len + rc:$dst, rc:$src, rc:$len )], "memory.copy\t$src_idx, $dst_idx, $dst, $src, $len", "memory.copy\t$src_idx, $dst_idx", 0x0a>; let mayStore = 1 in -defm MEMORY_FILL : - BULK_I<(outs), (ins i32imm_op:$idx, I32:$dst, I32:$value, I32:$size), +defm MEMORY_FILL_A#B : + BULK_I<(outs), (ins i32imm_op:$idx, rc:$dst, I32:$value, rc:$size), (outs), (ins i32imm_op:$idx), - [(wasm_memset (i32 imm:$idx), I32:$dst, I32:$value, I32:$size)], + [(wasm_memset (i32 imm:$idx), rc:$dst, I32:$value, rc:$size)], "memory.fill\t$idx, $dst, $value, $size", "memory.fill\t$idx", 0x0b>; +} + +defm : BulkMemoryOps; +defm : BulkMemoryOps; diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td b/llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td index 686c017593c8..b3c63cc1f884 100644 --- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td +++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td @@ -365,19 +365,24 @@ defm : StorePatGlobalAddrOffOnly; defm : StorePatGlobalAddrOffOnly; defm : StorePatGlobalAddrOffOnly; +multiclass MemoryOps { // Current memory size. 
-defm MEMORY_SIZE_I32 : I<(outs I32:$dst), (ins i32imm:$flags), +defm MEMORY_SIZE_A#B : I<(outs rc:$dst), (ins i32imm:$flags), (outs), (ins i32imm:$flags), - [(set I32:$dst, + [(set rc:$dst, (int_wasm_memory_size (i32 imm:$flags)))], "memory.size\t$dst, $flags", "memory.size\t$flags", 0x3f>; // Grow memory. -defm MEMORY_GROW_I32 : I<(outs I32:$dst), (ins i32imm:$flags, I32:$delta), +defm MEMORY_GROW_A#B : I<(outs rc:$dst), (ins i32imm:$flags, rc:$delta), (outs), (ins i32imm:$flags), - [(set I32:$dst, + [(set rc:$dst, (int_wasm_memory_grow (i32 imm:$flags), - I32:$delta))], + rc:$delta))], "memory.grow\t$dst, $flags, $delta", "memory.grow\t$flags", 0x40>; +} + +defm : MemoryOps; +defm : MemoryOps; diff --git a/llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp index b2b2b7a9d705..16e05150c64e 100644 --- a/llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp +++ b/llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp @@ -22,15 +22,15 @@ SDValue WebAssemblySelectionDAGInfo::EmitTargetCodeForMemcpy( SelectionDAG &DAG, const SDLoc &DL, SDValue Chain, SDValue Dst, SDValue Src, SDValue Size, Align Alignment, bool IsVolatile, bool AlwaysInline, MachinePointerInfo DstPtrInfo, MachinePointerInfo SrcPtrInfo) const { - if (!DAG.getMachineFunction() - .getSubtarget() - .hasBulkMemory()) + auto &ST = DAG.getMachineFunction().getSubtarget(); + if (!ST.hasBulkMemory()) return SDValue(); SDValue MemIdx = DAG.getConstant(0, DL, MVT::i32); + auto LenMVT = ST.hasAddr64() ? 
MVT::i64 : MVT::i32; return DAG.getNode(WebAssemblyISD::MEMORY_COPY, DL, MVT::Other, {Chain, MemIdx, MemIdx, Dst, Src, - DAG.getZExtOrTrunc(Size, DL, MVT::i32)}); + DAG.getZExtOrTrunc(Size, DL, LenMVT)}); } SDValue WebAssemblySelectionDAGInfo::EmitTargetCodeForMemmove( @@ -46,14 +46,14 @@ SDValue WebAssemblySelectionDAGInfo::EmitTargetCodeForMemset( SelectionDAG &DAG, const SDLoc &DL, SDValue Chain, SDValue Dst, SDValue Val, SDValue Size, Align Alignment, bool IsVolatile, MachinePointerInfo DstPtrInfo) const { - if (!DAG.getMachineFunction() - .getSubtarget() - .hasBulkMemory()) + auto &ST = DAG.getMachineFunction().getSubtarget(); + if (!ST.hasBulkMemory()) return SDValue(); SDValue MemIdx = DAG.getConstant(0, DL, MVT::i32); + auto LenMVT = ST.hasAddr64() ? MVT::i64 : MVT::i32; // Only low byte matters for val argument, so anyext the i8 return DAG.getNode(WebAssemblyISD::MEMORY_FILL, DL, MVT::Other, Chain, MemIdx, Dst, DAG.getAnyExtOrTrunc(Val, DL, MVT::i32), - DAG.getZExtOrTrunc(Size, DL, MVT::i32)); + DAG.getZExtOrTrunc(Size, DL, LenMVT)); } diff --git a/llvm/test/CodeGen/WebAssembly/bulk-memory-intrinsics.ll b/llvm/test/CodeGen/WebAssembly/bulk-memory-intrinsics.ll deleted file mode 100644 index dfb74b78f64f..000000000000 --- a/llvm/test/CodeGen/WebAssembly/bulk-memory-intrinsics.ll +++ /dev/null @@ -1,28 +0,0 @@ -; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+bulk-memory | FileCheck %s - -; Test that bulk memory intrinsics lower correctly - -target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128" -target triple = "wasm32-unknown-unknown" - -; CHECK-LABEL: memory_init: -; CHECK-NEXT: .functype memory_init (i32, i32, i32) -> () -; CHECK-NEXT: memory.init 3, 0, $0, $1, $2 -; CHECK-NEXT: return -declare void @llvm.wasm.memory.init(i32, i32, i8*, i32, i32) -define void @memory_init(i8* %dest, i32 %offset, i32 %size) { - call void @llvm.wasm.memory.init( - i32 3, 
i32 0, i8* %dest, i32 %offset, i32 %size - ) - ret void -} - -; CHECK-LABEL: data_drop: -; CHECK-NEXT: .functype data_drop () -> () -; CHECK-NEXT: data.drop 3 -; CHECK-NEXT: return -declare void @llvm.wasm.data.drop(i32) -define void @data_drop() { - call void @llvm.wasm.data.drop(i32 3) - ret void -} diff --git a/llvm/test/CodeGen/WebAssembly/bulk-memory64.ll b/llvm/test/CodeGen/WebAssembly/bulk-memory64.ll new file mode 100644 index 000000000000..6c450f5b5398 --- /dev/null +++ b/llvm/test/CodeGen/WebAssembly/bulk-memory64.ll @@ -0,0 +1,210 @@ +; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+bulk-memory | FileCheck %s --check-prefixes CHECK,BULK-MEM +; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=-bulk-memory | FileCheck %s --check-prefixes CHECK,NO-BULK-MEM + +; Test that basic bulk memory codegen works correctly + +target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128" +target triple = "wasm64-unknown-unknown" + +declare void @llvm.memcpy.p0i8.p0i8.i8(i8*, i8*, i8, i1) +declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1) +declare void @llvm.memcpy.p0i32.p0i32.i64(i32*, i32*, i64, i1) + +declare void @llvm.memmove.p0i8.p0i8.i8(i8*, i8*, i8, i1) +declare void @llvm.memmove.p0i8.p0i8.i64(i8*, i8*, i64, i1) +declare void @llvm.memmove.p0i32.p0i32.i64(i32*, i32*, i64, i1) + +declare void @llvm.memset.p0i8.i8(i8*, i8, i8, i1) +declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1) +declare void @llvm.memset.p0i32.i64(i32*, i8, i64, i1) + +; CHECK-LABEL: memcpy_i8: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memcpy_i8 (i64, i64, i32) -> () +; BULK-MEM-NEXT: i64.extend_i32_u $push0=, $2 +; BULK-MEM-NEXT: memory.copy 0, 0, $0, $1, $pop0 +; BULK-MEM-NEXT: return +define void @memcpy_i8(i8* %dest, i8* %src, i8 zeroext %len) { + call void 
@llvm.memcpy.p0i8.p0i8.i8(i8* %dest, i8* %src, i8 %len, i1 0) + ret void +} + +; CHECK-LABEL: memmove_i8: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memmove_i8 (i64, i64, i32) -> () +; BULK-MEM-NEXT: i64.extend_i32_u $push0=, $2 +; BULK-MEM-NEXT: memory.copy 0, 0, $0, $1, $pop0 +; BULK-MEM-NEXT: return +define void @memmove_i8(i8* %dest, i8* %src, i8 zeroext %len) { + call void @llvm.memmove.p0i8.p0i8.i8(i8* %dest, i8* %src, i8 %len, i1 0) + ret void +} + +; CHECK-LABEL: memset_i8: +; NO-BULK-MEM-NOT: memory.fill +; BULK-MEM-NEXT: .functype memset_i8 (i64, i32, i32) -> () +; BULK-MEM-NEXT: i64.extend_i32_u $push0=, $2 +; BULK-MEM-NEXT: memory.fill 0, $0, $1, $pop0 +; BULK-MEM-NEXT: return +define void @memset_i8(i8* %dest, i8 %val, i8 zeroext %len) { + call void @llvm.memset.p0i8.i8(i8* %dest, i8 %val, i8 %len, i1 0) + ret void +} + +; CHECK-LABEL: memcpy_i32: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memcpy_i32 (i64, i64, i64) -> () +; BULK-MEM-NEXT: memory.copy 0, 0, $0, $1, $2 +; BULK-MEM-NEXT: return +define void @memcpy_i32(i32* %dest, i32* %src, i64 %len) { + call void @llvm.memcpy.p0i32.p0i32.i64(i32* %dest, i32* %src, i64 %len, i1 0) + ret void +} + +; CHECK-LABEL: memmove_i32: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memmove_i32 (i64, i64, i64) -> () +; BULK-MEM-NEXT: memory.copy 0, 0, $0, $1, $2 +; BULK-MEM-NEXT: return +define void @memmove_i32(i32* %dest, i32* %src, i64 %len) { + call void @llvm.memmove.p0i32.p0i32.i64(i32* %dest, i32* %src, i64 %len, i1 0) + ret void +} + +; CHECK-LABEL: memset_i32: +; NO-BULK-MEM-NOT: memory.fill +; BULK-MEM-NEXT: .functype memset_i32 (i64, i32, i64) -> () +; BULK-MEM-NEXT: memory.fill 0, $0, $1, $2 +; BULK-MEM-NEXT: return +define void @memset_i32(i32* %dest, i8 %val, i64 %len) { + call void @llvm.memset.p0i32.i64(i32* %dest, i8 %val, i64 %len, i1 0) + ret void +} + +; CHECK-LABEL: memcpy_1: +; CHECK-NEXT: .functype memcpy_1 (i64, i64) -> () +; CHECK-NEXT: 
i32.load8_u $push[[L0:[0-9]+]]=, 0($1) +; CHECK-NEXT: i32.store8 0($0), $pop[[L0]] +; CHECK-NEXT: return +define void @memcpy_1(i8* %dest, i8* %src) { + call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dest, i8* %src, i64 1, i1 0) + ret void +} + +; CHECK-LABEL: memmove_1: +; CHECK-NEXT: .functype memmove_1 (i64, i64) -> () +; CHECK-NEXT: i32.load8_u $push[[L0:[0-9]+]]=, 0($1) +; CHECK-NEXT: i32.store8 0($0), $pop[[L0]] +; CHECK-NEXT: return +define void @memmove_1(i8* %dest, i8* %src) { + call void @llvm.memmove.p0i8.p0i8.i64(i8* %dest, i8* %src, i64 1, i1 0) + ret void +} + +; CHECK-LABEL: memset_1: +; NO-BULK-MEM-NOT: memory.fill +; BULK-MEM-NEXT: .functype memset_1 (i64, i32) -> () +; BULK-MEM-NEXT: i32.store8 0($0), $1 +; BULK-MEM-NEXT: return +define void @memset_1(i8* %dest, i8 %val) { + call void @llvm.memset.p0i8.i64(i8* %dest, i8 %val, i64 1, i1 0) + ret void +} + +; CHECK-LABEL: memcpy_1024: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memcpy_1024 (i64, i64) -> () +; BULK-MEM-NEXT: i64.const $push[[L0:[0-9]+]]=, 1024 +; BULK-MEM-NEXT: memory.copy 0, 0, $0, $1, $pop[[L0]] +; BULK-MEM-NEXT: return +define void @memcpy_1024(i8* %dest, i8* %src) { + call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dest, i8* %src, i64 1024, i1 0) + ret void +} + +; CHECK-LABEL: memmove_1024: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memmove_1024 (i64, i64) -> () +; BULK-MEM-NEXT: i64.const $push[[L0:[0-9]+]]=, 1024 +; BULK-MEM-NEXT: memory.copy 0, 0, $0, $1, $pop[[L0]] +; BULK-MEM-NEXT: return +define void @memmove_1024(i8* %dest, i8* %src) { + call void @llvm.memmove.p0i8.p0i8.i64(i8* %dest, i8* %src, i64 1024, i1 0) + ret void +} + +; CHECK-LABEL: memset_1024: +; NO-BULK-MEM-NOT: memory.fill +; BULK-MEM-NEXT: .functype memset_1024 (i64, i32) -> () +; BULK-MEM-NEXT: i64.const $push[[L0:[0-9]+]]=, 1024 +; BULK-MEM-NEXT: memory.fill 0, $0, $1, $pop[[L0]] +; BULK-MEM-NEXT: return +define void @memset_1024(i8* %dest, i8 %val) { + call void 
@llvm.memset.p0i8.i64(i8* %dest, i8 %val, i64 1024, i1 0) + ret void +} + +; The following tests check that frame index elimination works for +; bulk memory instructions. The stack pointer is bumped by 112 instead +; of 100 because the stack pointer in WebAssembly is currently always +; 16-byte aligned, even in leaf functions, although it is not written +; back to the global in this case. + +; TODO: Change TransientStackAlignment to 1 to avoid this extra +; arithmetic. This will require forcing the use of StackAlignment in +; PrologEpilogEmitter.cpp when +; WebAssemblyFrameLowering::needsSPWriteback would be true. + +; CHECK-LABEL: memcpy_alloca_src: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memcpy_alloca_src (i64) -> () +; BULK-MEM-NEXT: global.get $push[[L0:[0-9]+]]=, __stack_pointer +; BULK-MEM-NEXT: i64.const $push[[L1:[0-9]+]]=, 112 +; BULK-MEM-NEXT: i64.sub $push[[L2:[0-9]+]]=, $pop[[L0]], $pop[[L1]] +; BULK-MEM-NEXT: i64.const $push[[L3:[0-9]+]]=, 12 +; BULK-MEM-NEXT: i64.add $push[[L4:[0-9]+]]=, $pop[[L2]], $pop[[L3]] +; BULK-MEM-NEXT: i64.const $push[[L5:[0-9]+]]=, 100 +; BULK-MEM-NEXT: memory.copy 0, 0, $0, $pop[[L4]], $pop[[L5]] +; BULK-MEM-NEXT: return +define void @memcpy_alloca_src(i8* %dst) { + %a = alloca [100 x i8] + %p = bitcast [100 x i8]* %a to i8* + call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %p, i64 100, i1 false) + ret void +} + +; CHECK-LABEL: memcpy_alloca_dst: +; NO-BULK-MEM-NOT: memory.copy +; BULK-MEM-NEXT: .functype memcpy_alloca_dst (i64) -> () +; BULK-MEM-NEXT: global.get $push[[L0:[0-9]+]]=, __stack_pointer +; BULK-MEM-NEXT: i64.const $push[[L1:[0-9]+]]=, 112 +; BULK-MEM-NEXT: i64.sub $push[[L2:[0-9]+]]=, $pop[[L0]], $pop[[L1]] +; BULK-MEM-NEXT: i64.const $push[[L3:[0-9]+]]=, 12 +; BULK-MEM-NEXT: i64.add $push[[L4:[0-9]+]]=, $pop[[L2]], $pop[[L3]] +; BULK-MEM-NEXT: i64.const $push[[L5:[0-9]+]]=, 100 +; BULK-MEM-NEXT: memory.copy 0, 0, $pop[[L4]], $0, $pop[[L5]] +; BULK-MEM-NEXT: return +define void 
@memcpy_alloca_dst(i8* %src) { + %a = alloca [100 x i8] + %p = bitcast [100 x i8]* %a to i8* + call void @llvm.memcpy.p0i8.p0i8.i64(i8* %p, i8* %src, i64 100, i1 false) + ret void +} + +; CHECK-LABEL: memset_alloca: +; NO-BULK-MEM-NOT: memory.fill +; BULK-MEM-NEXT: .functype memset_alloca (i32) -> () +; BULK-MEM-NEXT: global.get $push[[L0:[0-9]+]]=, __stack_pointer +; BULK-MEM-NEXT: i64.const $push[[L1:[0-9]+]]=, 112 +; BULK-MEM-NEXT: i64.sub $push[[L2:[0-9]+]]=, $pop[[L0]], $pop[[L1]] +; BULK-MEM-NEXT: i64.const $push[[L3:[0-9]+]]=, 12 +; BULK-MEM-NEXT: i64.add $push[[L4:[0-9]+]]=, $pop[[L2]], $pop[[L3]] +; BULK-MEM-NEXT: i64.const $push[[L5:[0-9]+]]=, 100 +; BULK-MEM-NEXT: memory.fill 0, $pop[[L4]], $0, $pop[[L5]] +; BULK-MEM-NEXT: return +define void @memset_alloca(i8 %val) { + %a = alloca [100 x i8] + %p = bitcast [100 x i8]* %a to i8* + call void @llvm.memset.p0i8.i64(i8* %p, i8 %val, i64 100, i1 false) + ret void +} diff --git a/llvm/test/CodeGen/WebAssembly/memory-addr64.ll b/llvm/test/CodeGen/WebAssembly/memory-addr64.ll new file mode 100644 index 000000000000..7268d166783a --- /dev/null +++ b/llvm/test/CodeGen/WebAssembly/memory-addr64.ll @@ -0,0 +1,27 @@ +; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s + +; Test that basic memory operations assemble as expected with 64-bit addresses. 
+ +target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128" +target triple = "wasm64-unknown-unknown" + +declare i64 @llvm.wasm.memory.size.i64(i32) nounwind readonly +declare i64 @llvm.wasm.memory.grow.i64(i32, i64) nounwind + +; CHECK-LABEL: memory_size: +; CHECK-NEXT: .functype memory_size () -> (i64){{$}} +; CHECK-NEXT: memory.size $push0=, 0{{$}} +; CHECK-NEXT: return $pop0{{$}} +define i64 @memory_size() { + %a = call i64 @llvm.wasm.memory.size.i64(i32 0) + ret i64 %a +} + +; CHECK-LABEL: memory_grow: +; CHECK-NEXT: .functype memory_grow (i64) -> (i64){{$}} +; CHECK: memory.grow $push0=, 0, $0{{$}} +; CHECK-NEXT: return $pop0{{$}} +define i64 @memory_grow(i64 %n) { + %a = call i64 @llvm.wasm.memory.grow.i64(i32 0, i64 %n) + ret i64 %a +} diff --git a/llvm/test/MC/WebAssembly/bulk-memory-encodings.s b/llvm/test/MC/WebAssembly/bulk-memory-encodings.s index d661932d2c8d..0863527c8b36 100644 --- a/llvm/test/MC/WebAssembly/bulk-memory-encodings.s +++ b/llvm/test/MC/WebAssembly/bulk-memory-encodings.s @@ -1,4 +1,5 @@ # RUN: llvm-mc -show-encoding -triple=wasm32-unknown-unknown -mattr=+bulk-memory < %s | FileCheck %s +# RUN: llvm-mc -show-encoding -triple=wasm64-unknown-unknown -mattr=+bulk-memory < %s | FileCheck %s main: .functype main () -> () From llvm-commits at lists.llvm.org Mon Jul 6 12:50:15 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:50:15 +0000 (UTC) Subject: [PATCH] D82821: [WebAssembly] Added 64-bit memory.grow/size/init/copy/fill In-Reply-To: References: Message-ID: <918ed8ed43290316ec9599625ba2f4e6@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG16d83c395a1f: [WebAssembly] Added 64-bit memory.grow/size/copy/fill (authored by aardappel). Herald added a project: clang. Herald added a subscriber: cfe-commits. 
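For readers skimming the committed diff above, here is a minimal standalone sketch of how the patch appears to pick the width of the length operand for `memory.copy` and `memory.fill`: i64 when `hasAddr64()` holds, i32 otherwise, with the size zero-extended or truncated to fit. `LenType`, `pickLenType` and `zextOrTrunc` are hypothetical stand-ins introduced for illustration; they are not LLVM APIs (the real code uses `MVT` and `DAG.getZExtOrTrunc`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MVT::i32 / MVT::i64; not an LLVM type.
enum class LenType { I32, I64 };

// Mirrors `ST.hasAddr64() ? MVT::i64 : MVT::i32` from the patch.
LenType pickLenType(bool HasAddr64) {
  return HasAddr64 ? LenType::I64 : LenType::I32;
}

// Models zero-extend-or-truncate of a size value that is SrcBits wide
// to the chosen length type. Bits of Size above SrcBits are ignored.
uint64_t zextOrTrunc(uint64_t Size, unsigned SrcBits, LenType Ty) {
  assert(SrcBits >= 1 && SrcBits <= 64 && "unsupported source width");
  // Zero extension: keep only the bits that exist in the source type.
  uint64_t Mask =
      SrcBits == 64 ? ~uint64_t{0} : ((uint64_t{1} << SrcBits) - 1);
  uint64_t Src = Size & Mask;
  // Truncation: drop the upper half when the target length type is i32.
  return Ty == LenType::I32 ? (Src & 0xFFFFFFFFu) : Src;
}
```

Under this model an i8 length on wasm64 is widened to 64 bits, which is consistent with the `i64.extend_i32_u` the `memcpy_i8` test expects before `memory.copy`.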
Changed prior to commit: https://reviews.llvm.org/D82821?vs=275785&id=275804#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82821/new/ https://reviews.llvm.org/D82821 Files: clang/include/clang/Basic/BuiltinsWebAssembly.def clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGen/builtins-wasm.c llvm/include/llvm/IR/IntrinsicsWebAssembly.td llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp llvm/test/CodeGen/WebAssembly/bulk-memory-intrinsics.ll llvm/test/CodeGen/WebAssembly/bulk-memory64.ll llvm/test/CodeGen/WebAssembly/memory-addr64.ll llvm/test/MC/WebAssembly/bulk-memory-encodings.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82821.275804.patch Type: text/x-patch Size: 22118 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:52:35 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:52:35 +0000 (UTC) Subject: [PATCH] D83240: [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping In-Reply-To: References: Message-ID: <2a470f05a11e8e216e850ad072e65bdb@localhost.localdomain> arsenm requested changes to this revision. arsenm added inline comments. This revision now requires changes to proceed. 
================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.d16.ll:1-4 +; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs -global-isel | FileCheck -enable-var-scope -check-prefixes=GCN,UNPACKED,PREGFX10,PREGFX10-UNPACKED %s +; RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs -global-isel | FileCheck -enable-var-scope -check-prefixes=GCN,PACKED,PREGFX10,PREGFX10-PACKED %s +; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel | FileCheck -enable-var-scope -check-prefixes=GCN,PACKED,PREGFX10,PREGFX10-PACKED %s +; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -global-isel | FileCheck -enable-var-scope -check-prefixes=GCN,PACKED,GFX10,GFX10-PACKED %s ---------------- Can you move the -global-isel argument to the beginning? ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll:1-3 +;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs -global-isel | FileCheck -check-prefixes=GCN,VERDE,PREGFX10 %s +;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs -global-isel | FileCheck -check-prefixes=GCN,PREGFX10 %s +;RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -global-isel | FileCheck -check-prefixes=GCN,GFX10 %s ---------------- Ditto (also space between ; and RUN) ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll:31 +main_body: + %in1 = bitcast <4 x float> %1 to <4 x i32> + call void @llvm.amdgcn.raw.tbuffer.store.v4i32(<4 x i32> %in1, <4 x i32> %0, i32 42, i32 0, i32 117, i32 0) ---------------- You can just make the argument types the result instead of having the bitcast ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll:75 +} + +declare void @llvm.amdgcn.raw.tbuffer.store.i32(i32, <4 x i32>, i32, i32, i32, i32) #0 ---------------- Can you also add cases that require waterfall loops? 
I think the part to handle them is missing Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83240/new/ https://reviews.llvm.org/D83240 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:21 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:21 +0000 (UTC) Subject: [PATCH] D83192: Fix off by one error in Bitfields Message-ID: gchatelet created this revision. gchatelet added a reviewer: courbet. Herald added subscribers: llvm-commits, dexonsmith. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83192 Files: llvm/include/llvm/ADT/Bitfields.h Index: llvm/include/llvm/ADT/Bitfields.h =================================================================== --- llvm/include/llvm/ADT/Bitfields.h +++ llvm/include/llvm/ADT/Bitfields.h @@ -227,7 +227,7 @@ static constexpr unsigned Shift = Offset; static constexpr unsigned Bits = Size; static constexpr unsigned FirstBit = Offset; - static constexpr unsigned LastBit = Shift + Bits; + static constexpr unsigned LastBit = Shift + Bits - 1; private: template friend struct bitfields_details::Impl; @@ -273,7 +273,7 @@ /// Returns whether the two bitfields share common bits. template static constexpr bool isOverlapping() { - return A::LastBit > B::FirstBit && B::LastBit > A::FirstBit; + return A::LastBit >= B::FirstBit && B::LastBit >= A::FirstBit; } }; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83192.275597.patch Type: text/x-patch Size: 852 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:21 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:21 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: <21bb26f47baa441c19d172378e492509@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:132 +static Error updateLoadCommands(const CopyConfig &Config, Object &Obj) { + + // Remove RPaths. ---------------- Nit: remove blank line at start of function. ================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:217 + + // Add new RPaths. + for (StringRef RPath : Config.RPathToAdd) { ---------------- sameerarora101 wrote: > alexshap wrote: > > sameerarora101 wrote: > > > alexshap wrote: > > > > smeenai wrote: > > > > > Nit: do we want to be adding new load commands in a function called `updateLoadCommands`? At least to me that function name seems like it should only be updating existing load commands, since we have a separate `removeLoadCommands` to handle removal. I'll leave it to the more experienced llvm-objcopy reviewers (@alexshap, @jhenderson) to decide if this is okay as-is or if we want a separate `addLoadCommands` function. > > > > so basically the idea was to group together logical pieces of handleArgs (to some reasonable extent). > > > > Besides error-reporting removeLoadCommands is ~10-12 lines of code, so I'd probably inline it into > > > > updateLoadCommands for consistency. > > > @alexshap Instead of inlining the whole `removeLoadCommands` inside `updateLoadCommands` I think it would be cleaner if I just call > > > > > > ``` > > > // Remove LCs. > > > if (Error E = removeLoadCommands(Config, Obj)) > > > return E; > > > ``` > > > from inside `updateLoadCommands`. 
This can allow for independent development of `removeLoadCommands` in future as well. What do you think? (I have updated the current diff with this change, however, I can update it again in case we want something else) > > I'm not sure that removeLoadCommands is realistically independent from updateLoadCommands, e.g. the order in which you modify the list of load commands appears to be important. Since it's small (~10 lines) it seems preferable to avoid creating this weird asymmetry between removing/adding. > I see. Ok, I have inlined `removeLoadCommands` into `updateLoadCommands` then. Thanks 😊 There are two options. Either a) rename `updateLoadCommands` to something more generic (e.g. `processLoadCommands`, in which case I'd ensure all load command processing is done in that function), or b) think of it purely in conceptual terms where the load commands in the function name refers to the set of load commands, rather than each individual load command, if that makes sense. Thus you update that set by adding/removing elements, and also changing the existing elements. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:21 2020 From: llvm-commits at lists.llvm.org (Nathan James via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:21 +0000 (UTC) Subject: [PATCH] D82159: Add a cmake warning when someone tries to configure clang-tools-extra without clang In-Reply-To: References: Message-ID: njames93 added a comment. 
Ping Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82159/new/ https://reviews.llvm.org/D82159 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:22 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:22 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: jhenderson added inline comments. ================ Comment at: llvm/docs/CommandGuide/llvm-libtool-darwin.rst:24-26 +.. option:: -help, -h + + Display usage information and exit. ---------------- sameerarora101 wrote: > jhenderson wrote: > > I suspect if you do `-help` you'll see one or two other options too. You'll probably want to include `-help-list` here at least, and might need to hide some other options. Could you post (out-of-line) what the full help text is, please? > here is the full help text from cmd line: > > ``` > OVERVIEW: llvm-libtool-darwin > > USAGE: llvm-libtool-darwin [options] > > OPTIONS: > > Color Options: > > --color - Use colors in output (default=autodetect) > > General options: > > -o - Alias for --output > --output= - Specify output filename > > Generic Options: > > --help - Display available options (--help-hidden for more) > --help-list - Display list of available options (--help-list-hidden for more) > --version - Display the version of this program > ``` > > I have added description for `--help-list` and `--color` now as well Thanks. Some more comments: 1) As this is a Darwin-inspired tool, we should use the standard option naming throughout. If I understand it correctly, this means single dashes rather than double. 2) You probably want to use the documentation for the various common options (help, version, color etc) used in the other tool documentation, for consistency. Take a look at e.g. llvm-objcopy or llvm-dwarfdump. 
In particular, I wouldn't report the "hidden" versions of the help options (they're hidden for a reason...). 3) Documentation should use full English grammar rules with leading caps and trailing full stops, like comments in the code. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/basic.test:1 +# This test checks that main exits normally (EC 0) for correct input/output args. + ---------------- Nit: use '##' for comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/help-message.test:10 + +# UNKNOWN-ARG: Unknown command line argument '{{-+}}abcabc' ---------------- sameerarora101 wrote: > jhenderson wrote: > > I believe this should include `error:` as a prefix? Please add it, if so. Same applies in the other test. > No, the output doesn't have `error:` as a prefix. Here is the output for passing `-abcabc`: > ``` > llvm-libtool-darwin: Unknown command line argument '-abcabc'. Try: './bin/llvm-libtool-darwin --help' > ``` > > `error:` is also not there as a prefix for the other test. Yuck, okay. Maybe something to look at another time, I guess. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:25 + +# DOUBLE-OUTPUT: for the --output option: must occur exactly one time! ---------------- sameerarora101 wrote: > jhenderson wrote: > > You probably want a test that shows that the expected passing cases also work (i.e. exit with code 0 for now). This could probably be a test file called `basic.test` for now. You can replace or expand the test as you add useful functionality. I'd expect the test to have cases for both one input and multiple inputs. > Ok, I added `basic.test` where I simply run > ``` > # RUN: yaml2obj %S/Inputs/input1.yaml -o %t-input1.o > # RUN: yaml2obj %S/Inputs/input2.yaml -o %t-input2.o > > ## Pass single input: > # RUN: llvm-libtool-darwin -o %t.lib %t-input1.o > > ## Pass multiple inputs: > # RUN: llvm-libtool-darwin -o %t.lib %t-input1.o %t-input2.o > ``` > Is this sufficient? 
Or is there some other way I need to verify that the exit code was indeed 0? That's sufficient, thanks. If a tool returns a non-zero exit code, lit will automatically fail unless `not` is prepended. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:1 +## This test checks that an error is thrown in case of invalid input/output args. + ---------------- On further reflection, perhaps it makes sense to combine this and basic.test into a single test. What do you think? ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:22 + cl::Required); +static cl::alias OutputFileShort("o", cl::desc("Alias for --output"), + cl::aliasopt(OutputFile), cl::NotHidden); ---------------- Similar to my documentation comment, I'm okay with this using single-dash if it's more common on Darwin. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:22 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:22 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <4f39ca679f1a022e84a883e847f6b9f8@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:1 +## This test checks that an error is thrown in case of invalid input/output args. + ---------------- jhenderson wrote: > On further reflection, perhaps it makes sense to combine this and basic.test into a single test. What do you think? Actually, ignore my previous comment, since basic.test is only short-term. You'll probably want to add --static to these tests when you add support to that option to avoid any potential confusion. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:22 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:22 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <1a168d902ff156f8f54d911b6be4d24f@localhost.localdomain> hans added a comment. In D83013#2132070 , @echristo wrote: > Adding Chandler and Alina here as well. > > In general, I don't think that this is such a great idea. Being able to have this sort of thing work more reliably is one of the reasons for the new pass manager. I think I'd like to see this split out into an old versus new pass manager pass to avoid the difficulty of cleaning this up after we finish migrating llvm to the new pass manager. This also seems to add some technical debt around options and other enablement which is also less than ideal. Is this compelling to add right now versus finishing work migrating llvm completely to the new pass manager and removing the old one? From speaking with Alina I think that work should be done in a short while. Given how long the new pass manager has been in progress, we definitely don't want to block on enabling it. So yes, porting this pass to the current pass manager is compelling to do right now. I also don't see why it should be a big deal. As for splitting it into separate passes, this patch technically does that, although it extracts and changes the core code a bit so it can be shared between the passes. I think that's how most passes have been adapted to work with both pass managers, no? 
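The approach described above (core logic pulled out so that a wrapper for each pass manager can share it) might look roughly like the following toy sketch. Every name here (`ToyModule`, `addAnnotations`, `ToyLegacyPass`, `ToyNewPass`, `ToyPreserved`) is a hypothetical stand-in and not an LLVM class; in LLVM itself the legacy wrapper would implement `runOnModule` and the new-pass-manager wrapper would return `PreservedAnalyses`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Module; not an LLVM type.
struct ToyModule {
  std::vector<std::string> Functions;
  int Annotations = 0;
};

// Shared core logic, independent of either pass manager.
// Returns true if the module was changed.
bool addAnnotations(ToyModule &M) {
  if (M.Functions.empty())
    return false;
  M.Annotations = static_cast<int>(M.Functions.size());
  return true;
}

// Legacy-style wrapper: reports "changed" as a bool.
struct ToyLegacyPass {
  bool runOnModule(ToyModule &M) { return addAnnotations(M); }
};

// New-style wrapper: reports whether analyses are preserved.
enum class ToyPreserved { All, None };

struct ToyNewPass {
  ToyPreserved run(ToyModule &M) {
    // A change invalidates analyses in this toy model; no change keeps them.
    return addAnnotations(M) ? ToyPreserved::None : ToyPreserved::All;
  }
};
```

The point of the shape is that only the thin wrappers differ, so removing the legacy one later would not touch `addAnnotations`.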
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:22 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:22 +0000 (UTC) Subject: [PATCH] D83173: [VE] Correct stack alignment In-Reply-To: References: Message-ID: <9dc16acd156047fd51bdda8a4065e441@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGdf3bda047d5a: [VE] Correct stack alignment (authored by kaz7). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83173/new/ https://reviews.llvm.org/D83173 Files: clang/lib/Basic/Targets/VE.h clang/test/CodeGen/target-data.c llvm/lib/Target/VE/VETargetMachine.cpp Index: llvm/lib/Target/VE/VETargetMachine.cpp =================================================================== --- llvm/lib/Target/VE/VETargetMachine.cpp +++ llvm/lib/Target/VE/VETargetMachine.cpp @@ -41,8 +41,8 @@ // VE supports 32 bit and 64 bits integer on registers Ret += "-n32:64"; - // Stack alignment is 64 bits - Ret += "-S64"; + // Stack alignment is 128 bits + Ret += "-S128"; return Ret; } Index: clang/test/CodeGen/target-data.c =================================================================== --- clang/test/CodeGen/target-data.c +++ clang/test/CodeGen/target-data.c @@ -250,3 +250,7 @@ // RUN: %clang_cc1 -triple bpfeb -o - -emit-llvm %s | \ // RUN: FileCheck %s -check-prefix=BPFEB // BPFEB: target datalayout = "E-m:e-p:64:64-i64:64-i128:128-n32:64-S128" + +// RUN: %clang_cc1 -triple ve -o - -emit-llvm %s | \ +// RUN: FileCheck %s -check-prefix=VE +// VE: target datalayout = "e-m:e-i64:64-n32:64-S128" Index: clang/lib/Basic/Targets/VE.h =================================================================== --- clang/lib/Basic/Targets/VE.h +++ clang/lib/Basic/Targets/VE.h @@ -45,7 +45,7 @@ WCharType = 
UnsignedInt; WIntType = UnsignedInt; UseZeroLengthBitfieldAlignment = true; - resetDataLayout("e-m:e-i64:64-n32:64-S64"); + resetDataLayout("e-m:e-i64:64-n32:64-S128"); } void getTargetDefines(const LangOptions &Opts, -------------- next part -------------- A non-text attachment was scrubbed... Name: D83173.275598.patch Type: text/x-patch Size: 1396 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:23 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:23 +0000 (UTC) Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable In-Reply-To: References: Message-ID: <99cf3ba1746bbf3647e707f00fc60fe6@localhost.localdomain> benshi001 marked an inline comment as done. benshi001 added inline comments. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15750 + // are too large. + if (ConstNode.getScalarValueSizeInBits() > 8 * sizeof(int64_t)) + return true; ---------------- MaskRay wrote: > This should be `>=` This should be `>`, not `>=`; otherwise riscv64/aarch64 will still regress for i64 add-mul. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83153/new/ https://reviews.llvm.org/D83153 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:23 2020 From: llvm-commits at lists.llvm.org (Luofan Chen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:23 +0000 (UTC) Subject: [PATCH] D83172: [Attributor] Create getter function for the ID of the abstract attribute In-Reply-To: References: Message-ID: <009b1111a212d87dabc0b9841f58807c@localhost.localdomain> bbn updated this revision to Diff 275599. bbn added a comment. Herald added a subscriber: hiraditya. Rebased. I think we can create a separate patch for the macro stuff. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83172/new/ https://reviews.llvm.org/D83172 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83172.275599.patch Type: text/x-patch Size: 7685 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:24 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:24 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:15 +# CHECK-NAMES-NEXT: input2.o +# CHECK-NAMES-NOT: {{.}} + ---------------- Watch out, here and in other cases, this only shows that there is no output AFTER `input2.o`. It's possible that there's output before `input1.o`. Also, you're not catching name corruptions resulting in a prefix/suffix of the name, since FileCheck only does sub-string matching by default, not full line matching. For example, this would fail if the following was emitted: ``` input-I-really-shouldn't-be-here input1.osuffix prefixinput2.o ``` You probably want to add `--implicit-check-not={{.}}` to the FileCheck command line, rather than the final `CHECK-NAMES-NOT`. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:19 +# RUN: llvm-nm --print-armap %t.lib | \ +# RUN: FileCheck %s --check-prefix=CHECK-SYMBOLS + ---------------- It would be best to check the symbol is in the right member. You can do this by using FileCheck's -D option, combined with the `%t_basename` (NB: check the exact name for correctness, but there should be other examples): ``` # RUN: ... FileCheck %s -D FILE1=%t_basename ... # CHECK-SYMBOLS: _symbol1 in [[FILE1]] ``` and so on. 
This defines a string variable that matches the specified input string, and can be used using the `[[VAR]]` syntax as shown. Take a look at the FileCheck documentation for details. Also, you should check the `Archive map` string to ensure there's no symbol before the first. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:22 +# CHECK-SYMBOLS: _symbol1 in +# CHECK-SYMBOLS: _symbol2 in +# CHECK-SYMBOLS-EMPTY: ---------------- Use `CHECK-SYMBOLS-NEXT` here to show there's no symbol in between the two symbols. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:41-42 + +# OVERWRITE-NAMES: input2.o +# OVERWRITE-NAMES-NOT: {{.}} + ---------------- This has the same issue as the earlier comment re. `CHECK-NOT`/`--implicit-check-not` In fact, if you used OVERWRITE-NAMES and OVERWRITE-SYMBOLS instead of CHECK-NAMES/CHECK-SYMBOLS above, the test will still pass. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:44-45 + +# OVERWRITE-SYMBOLS: _symbol2 in +# OVERWRITE-SYMBOLS-EMPTY: + ---------------- Same as above. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:54-62 +# DUPLICATE-NAMES: input1.o +# DUPLICATE-NAMES: input2.o +# DUPLICATE-NAMES: input1.o +# DUPLICATE-NAMES-NOT: {{.}} + +# DUPLICATE-SYMBOLS: _symbol1 in +# DUPLICATE-SYMBOLS: _symbol2 in ---------------- Same comments here as above. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:7 + +# MISSING-OPERATION: Library Type: option: must be specified at least once! + ---------------- Does the double space match the actual error message? 
================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:9-18 +## Input file not found: +# RUN: not llvm-libtool-darwin -static -o %t.lib %t.missing 2>&1 | \ +# RUN: FileCheck %s --check-prefix=NO-FILE -DFILE=%t.missing + +# NO-FILE: '[[FILE]]': {{[nN]}}o such file or directory + +## Input file is not an object file: ---------------- These two checks plus the ELF one below probably belong in the invalid input/output arguments test. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:20 + +# NOT-OBJECT: The file was not recognized as a valid object file + ---------------- Does this message use `error:` as a prefix? ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:24 +# RUN: yaml2obj %s -o %t +# RUN: not llvm-libtool-darwin -static -o $t.lib %t 2>&1 | \ +# RUN: FileCheck %s --check-prefix=NOT-MACHO ---------------- Did you mean to use `$t.lib`? (probably not) ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:29-34 +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_REL + Machine: EM_X86_64 ---------------- Only these lines are required to emit a minimal ELF object. Delete everything else. Also, we prefer to use minimal amounts of padding: ``` --- !ELF FileHeader: Class: ELFCLASS64 Data: ELFDATA2LSB Type: ET_REL Machine: EM_X86_64 ``` ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:34-38 +static cl::opt LibraryOperation( + cl::desc("Library Type: "), + cl::values( + clEnumValN(Static, "static", + "Produce a statically linked library from the input files")), ---------------- I'm not really familiar with the `Operation` type. What does it look like in the help text? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:24 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:24 +0000 (UTC) Subject: [PATCH] D83082: [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode In-Reply-To: References: Message-ID: <6b4c510f73e4050430548a40057fa5cd@localhost.localdomain> gchatelet added a comment. Thank you for reporting the issue and providing a reduced test case. I'm working on a fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83082/new/ https://reviews.llvm.org/D83082 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:24 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:24 +0000 (UTC) Subject: [PATCH] D83195: [CodeGen] Fix a warning in DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR Message-ID: david-arm created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a reviewer: rengolin. Herald added a project: LLVM. 
Fixes up one warning in this test: sve-sext-zext.ll Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83195 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp @@ -4334,7 +4334,6 @@ EVT OutVT = N->getValueType(0); EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT); assert(NOutVT.isVector() && "This type must be promoted to a vector type"); - unsigned OutNumElems = OutVT.getVectorNumElements(); EVT NOutVTElem = NOutVT.getVectorElementType(); SDLoc dl(N); @@ -4371,6 +4370,7 @@ EVT InVT = InOp0.getValueType(); + unsigned OutNumElems = OutVT.getVectorNumElements(); SmallVector Ops; Ops.reserve(OutNumElems); for (unsigned i = 0; i != OutNumElems; ++i) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83195.275602.patch Type: text/x-patch Size: 805 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:24 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:24 +0000 (UTC) Subject: [PATCH] D83196: [CodeGen] Fix a warning in DAGTypeLegalizer::SetSplitVector Message-ID: david-arm created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a reviewer: rengolin. Herald added a project: LLVM. 
This fixes up one warning in the test: sve-sext-zext.ll Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83196 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp @@ -836,8 +836,8 @@ SDValue Hi) { assert(Lo.getValueType().getVectorElementType() == Op.getValueType().getVectorElementType() && - 2*Lo.getValueType().getVectorNumElements() == - Op.getValueType().getVectorNumElements() && + Lo.getValueType().getVectorElementCount() * 2 == + Op.getValueType().getVectorElementCount() && Hi.getValueType() == Lo.getValueType() && "Invalid type for split vector"); // Lo/Hi may have been newly allocated, if so, add nodeid's as relevant. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83196.275603.patch Type: text/x-patch Size: 804 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:25 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:25 +0000 (UTC) Subject: [PATCH] D83192: Fix off by one error in Bitfields In-Reply-To: References: Message-ID: <86f84c823ebade836793c3431ee39f99@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG4c0a965c0926: Fix off by one error in Bitfields (authored by gchatelet). 
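As a reader's aid for D83192: the revision makes `LastBit` inclusive (`Shift + Bits - 1`) and switches the overlap comparisons to `>=` to match. The small standalone sketch below shows how those two changes fit together. `Field` is a hypothetical stand-in written for this note, not the real `llvm::Bitfield` API; only the `FirstBit`/`LastBit` names and the `isOverlapping` expression are taken from the diff.

```cpp
// Hypothetical stand-in for a bitfield description; NOT llvm::Bitfield.
template <unsigned Offset, unsigned Size> struct Field {
  static_assert(Size > 0, "a field needs at least one bit");
  static constexpr unsigned FirstBit = Offset;
  // Inclusive index of the highest bit, as in the patched header.
  static constexpr unsigned LastBit = Offset + Size - 1;
};

// With inclusive ends, two ranges overlap when each one ends at or after
// the point where the other begins.
template <typename A, typename B> constexpr bool isOverlapping() {
  return A::LastBit >= B::FirstBit && B::LastBit >= A::FirstBit;
}

using Low = Field<0, 4>;  // bits 0..3
using High = Field<4, 4>; // bits 4..7
using Mid = Field<3, 2>;  // bits 3..4

static_assert(Low::LastBit == 3, "inclusive end of a 4-bit field at offset 0");
static_assert(!isOverlapping<Low, High>(), "adjacent fields do not overlap");
static_assert(isOverlapping<Low, Mid>(), "bit 3 is shared");
static_assert(isOverlapping<Mid, High>(), "bit 4 is shared");
```

The checks are `static_assert`s, so simply compiling the snippet exercises them.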
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83192/new/ https://reviews.llvm.org/D83192 Files: llvm/include/llvm/ADT/Bitfields.h Index: llvm/include/llvm/ADT/Bitfields.h =================================================================== --- llvm/include/llvm/ADT/Bitfields.h +++ llvm/include/llvm/ADT/Bitfields.h @@ -227,7 +227,7 @@ static constexpr unsigned Shift = Offset; static constexpr unsigned Bits = Size; static constexpr unsigned FirstBit = Offset; - static constexpr unsigned LastBit = Shift + Bits; + static constexpr unsigned LastBit = Shift + Bits - 1; private: template friend struct bitfields_details::Impl; @@ -273,7 +273,7 @@ /// Returns whether the two bitfields share common bits. template static constexpr bool isOverlapping() { - return A::LastBit > B::FirstBit && B::LastBit > A::FirstBit; + return A::LastBit >= B::FirstBit && B::LastBit >= A::FirstBit; } }; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83192.275604.patch Type: text/x-patch Size: 852 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:25 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:25 +0000 (UTC) Subject: [PATCH] D83082: [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode In-Reply-To: References: Message-ID: <0c0af9d851c121272c63e6abf435f811@localhost.localdomain> gchatelet added a comment. 
Fixed by rG04288e93be7bbcdca5707d84149e864923f9ed25 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83082/new/ https://reviews.llvm.org/D83082 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:25 2020 From: llvm-commits at lists.llvm.org (Orlando Cazalet-Hyams via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:25 +0000 (UTC) Subject: [PATCH] D82129: [DebugInfo] Drop location ranges for variables which exist entirely outside the variable's scope In-Reply-To: References: Message-ID: <04a4a0794cc24b19dd2c1b2b57d59580@localhost.localdomain> Orlando added a comment. Does anyone have any concerns with this patch that they feel have not been addressed? I've slightly reworded the patch description following the discussion on the nature of the changes to the register-variables.ll test. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82129/new/ https://reviews.llvm.org/D82129 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:25 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:25 +0000 (UTC) Subject: [PATCH] D83131: [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. In-Reply-To: References: Message-ID: grimar marked an inline comment as done. grimar added a comment. In D83131#2131493 , @MaskRay wrote: > > 2. replace reportWarning with reportUniqueWarning calls. In this method it is no-op, because it is not possible to have a duplicated warnings anyways, but since we probably want to switch to reportUniqueWarning globally, this is a good thing to do. > > My understanding of the inline comments in D69671 is that we will change `reportWarning` call sites to use `reportUniqueWarning` and then rename `reportUniqueWarning` to `reportWarning`. Is that the case? Yes. 
================ Comment at: llvm/test/tools/llvm-readobj/ELF/linker-options.test:12 # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 4 is broken: the content is not null-terminated +# CHECK-NEXT: warning: '[[FILE]]': unable to read the content of the SHT_LLVM_LINKER_OPTIONS section at index 5: section [index 5] has a sh_offset (0xffffffff) + sh_size (0x8) that is greater than the file size (0x370) # CHECK-NEXT: option 3: value 3 ---------------- jhenderson wrote: > Repeating the "index 5" bit in the warning seems sub-optimal. I think it's only necessary if we don't trust the warning produced by the Object library to include the index? That is why I've introduced it, but now I see that `getSectionContents()` always includes the index and it is probably unlikely that one day it will get to some another case when it will not do it. I'll update this place. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83131/new/ https://reviews.llvm.org/D83131 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:25 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:25 +0000 (UTC) Subject: [PATCH] D83197: [CodeGen] Fix warning in DAGTypeLegalizer::SplitVecRes_ExtendOp Message-ID: david-arm created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a reviewer: rengolin. Herald added a project: LLVM. 
This fixes up a warning in the following test: sve-sext-zext.ll Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83197 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1767,8 +1767,7 @@ // more effectively move in the right direction and prevent falling down // to scalarization in many cases due to the input vector being split too // far. - unsigned NumElements = SrcVT.getVectorNumElements(); - if ((NumElements & 1) == 0 && + if ((SrcVT.getVectorMinNumElements() & 1) == 0 && SrcVT.getSizeInBits() * 2 < DestVT.getSizeInBits()) { LLVMContext &Ctx = *DAG.getContext(); EVT NewSrcVT = SrcVT.widenIntegerVectorElementType(Ctx); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83197.275607.patch Type: text/x-patch Size: 739 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:26 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:26 +0000 (UTC) Subject: [PATCH] D83198: [CodeGen] Fix warnings in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR Message-ID: david-arm created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a reviewer: rengolin. Herald added a project: LLVM. 
Fixes warnings in this test: sve-sext-zext.ll Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83198 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -2179,13 +2179,18 @@ SDValue Idx = N->getOperand(1); SDLoc dl(N); SDValue Lo, Hi; + + assert(SubVT.isScalableVector() == + N->getOperand(0).getValueType().isScalableVector() && + "We only support extracting fixed length vectors from legal scalable " + "vector types"); GetSplitVector(N->getOperand(0), Lo, Hi); - uint64_t LoElts = Lo.getValueType().getVectorNumElements(); + uint64_t LoElts = Lo.getValueType().getVectorMinNumElements(); uint64_t IdxVal = cast(Idx)->getZExtValue(); if (IdxVal < LoElts) { - assert(IdxVal + SubVT.getVectorNumElements() <= LoElts && + assert(IdxVal + SubVT.getVectorMinNumElements() <= LoElts && "Extracted subvector crosses vector split!"); return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, SubVT, Lo, Idx); } else { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83198.275609.patch Type: text/x-patch Size: 1091 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:27 2020 From: llvm-commits at lists.llvm.org (Peter Smith via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:27 +0000 (UTC) Subject: [PATCH] D83138: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS In-Reply-To: References: Message-ID: psmith accepted this revision. psmith added a comment. This revision is now accepted and ready to land. LGTM, it looks like it is difficult to generate local dynamic from clang, although it is possible with GCC. I was able to make a test application, that also had the advantage of working for shared libraries and PIE as R_ABS does not in that case. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83138/new/ https://reviews.llvm.org/D83138 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:27 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:27 +0000 (UTC) Subject: [PATCH] D83170: [VE] Support symbol with offset in assembly In-Reply-To: References: Message-ID: <5a168d5b4b1d974d5438e0094c608b93@localhost.localdomain> kaz7 updated this revision to Diff 275610. kaz7 added a comment. Change function names following suggestions. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83170/new/ https://reviews.llvm.org/D83170 Files: llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp llvm/test/MC/VE/sym-br.s llvm/test/MC/VE/symbols.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83170.275610.patch Type: text/x-patch Size: 13288 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:28 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:28 +0000 (UTC) Subject: [PATCH] D83200: [VE] Change to use isa Message-ID: kaz7 created this revision. kaz7 added reviewers: simoll, k-ishizaka. kaz7 added projects: LLVM, VE. Herald added subscribers: llvm-commits, hiraditya. Change to use isa instead of dyn_cast to avoid a warning. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83200 Files: llvm/lib/Target/VE/VEISelLowering.cpp Index: llvm/lib/Target/VE/VEISelLowering.cpp =================================================================== --- llvm/lib/Target/VE/VEISelLowering.cpp +++ llvm/lib/Target/VE/VEISelLowering.cpp @@ -548,7 +548,7 @@ // for all immediate values now. // FIXME: Change hasAndNot function to have two operands to make it work // correctly with Aurora VE. 
- if (auto *C = dyn_cast(Y)) + if (isa(Y)) return false; // It's ok for generic registers. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83200.275614.patch Type: text/x-patch Size: 503 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:28 2020 From: llvm-commits at lists.llvm.org (Kristof Beyls via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:28 +0000 (UTC) Subject: [PATCH] D76291: [Support] Fix formatted_raw_ostream for UTF-8 In-Reply-To: References: Message-ID: <363fa3891d7d9c5a6912a394e58709f0@localhost.localdomain> kristof.beyls accepted this revision. kristof.beyls added a comment. This revision is now accepted and ready to land. LGTM, thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76291/new/ https://reviews.llvm.org/D76291 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:28 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:28 +0000 (UTC) Subject: [PATCH] D83152: llvm-nm: add flag to suppress no symbols warning In-Reply-To: References: Message-ID: <2bb6aab1cfeda2547ba72c79970390b3@localhost.localdomain> grimar added inline comments. ================ Comment at: llvm/test/tools/llvm-nm/X86/nm-no-symbols.test:13 + +# RUN: llvm-nm -no-warning-for-no-symbols %t.o 2>&1 | FileCheck %s -DFILE=%t.o --check-prefix NO-WARNING --allow-empty +# NO-WARNING-NOT: no symbols ---------------- `-no-warning-for-no-symbols` -> `--no-warning-for-no-symbols` (there is an agreement to use double dash for long versions of options in tests. E.g: use `--reverse-sort` instead of `-reverse-sort` or just use `-r`. I think one of the reasons was to make the difference between things like `--reverse-sort` and `-r everse-sort` more obvious). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83152/new/ https://reviews.llvm.org/D83152 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:29 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:29 +0000 (UTC) Subject: [PATCH] D83202: [Bitfields][NFC] Make sure bitfields are contiguous Message-ID: gchatelet created this revision. gchatelet added a reviewer: courbet. Herald added subscribers: llvm-commits, jfb, dexonsmith. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83202 Files: llvm/include/llvm/ADT/Bitfields.h llvm/include/llvm/IR/InstrTypes.h llvm/include/llvm/IR/Instruction.h llvm/include/llvm/IR/Instructions.h llvm/unittests/ADT/BitFieldsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83202.275618.patch Type: text/x-patch Size: 9161 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:29 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:29 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: <15822fcb953a5eac72f0b248c1c8af27@localhost.localdomain> sstefan1 added a comment. Herald added a subscriber: bbn. Can you add a test using this option? ================ Comment at: llvm/lib/Transforms/IPO/Attributor.cpp:1460 + return true; + return std::count(SeedAllowList.begin(), SeedAllowList.end(), AA.getName()); +} ---------------- would it make sense to make this always check lower case names, to avoid mistakes? 
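One way to read the lower-casing suggestion above is sketched here in plain standard C++. This is only an illustration under assumptions: `toLowerAscii` and `isSeedAllowed` are hypothetical helpers, the list is modelled as a `std::vector<std::string>`, and the "empty list allows everything" behaviour is inferred from the `return true;` visible in the quoted diff rather than confirmed by it.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: lower-case an ASCII string.
std::string toLowerAscii(std::string S) {
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return S;
}

// Hypothetical stand-in for the allow-list check. Assumption: an empty
// list means "no filtering", so every name is allowed.
bool isSeedAllowed(const std::vector<std::string> &AllowList,
                   const std::string &Name) {
  if (AllowList.empty())
    return true;
  const std::string Wanted = toLowerAscii(Name);
  // Compare case-insensitively so "AANoUnwind" matches "aanounwind".
  return std::any_of(AllowList.begin(), AllowList.end(),
                     [&](const std::string &Entry) {
                       return toLowerAscii(Entry) == Wanted;
                     });
}
```

Normalising both sides at comparison time means a user's spelling of the option value does not have to match the attribute name's capitalisation.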
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:29 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:29 +0000 (UTC) Subject: [PATCH] D83138: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS In-Reply-To: References: Message-ID: <725133d80c1ed2eab14bf7de5e109ce3@localhost.localdomain> grimar added inline comments. ================ Comment at: lld/ELF/Relocations.cpp:241 // Local-Dynamic relocs can be relaxed to Local-Exec. - if (expr == R_DTPREL && !config->shared) { + if (expr == R_DTPREL && canRelax && !config->shared) { c.relocations.push_back( ---------------- I've noticed that `canRelax` is always used with `&& !config->shared` now. So can it be: ``` bool canRelax = !config->shared && config->emachine != EM_ARM && config->emachine != EM_HEXAGON && config->emachine != EM_RISCV; ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83138/new/ https://reviews.llvm.org/D83138 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:29 2020 From: llvm-commits at lists.llvm.org (LiuChen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:29 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: LiuChen3 updated this revision to Diff 275619. LiuChen3 added a comment. 
Remove redundant variables CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86ISelLowering.h llvm/test/CodeGen/X86/win64-byval.ll Index: llvm/test/CodeGen/X86/win64-byval.ll =================================================================== --- llvm/test/CodeGen/X86/win64-byval.ll +++ llvm/test/CodeGen/X86/win64-byval.ll @@ -32,3 +32,31 @@ call void @foo({ float, double }* byval %arg) ret void } + +declare void @foo2({ float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, i64 %f) + at data = external constant { float, double } + +define void @test() { +; CHECK-LABEL: @test +; CHECK: movq (%rax), %rcx +; CHECK-NEXT: movq 8(%rax), %rax +; CHECK-NEXT: movq %rax, 120(%rsp) +; CHECK-NEXT: movq %rcx, 112(%rsp) +; CHECK-NEXT: movq %rcx, 96(%rsp) +; CHECK-NEXT: movq %rax, 104(%rsp) +; CHECK-NEXT: movq %rcx, 80(%rsp) +; CHECK-NEXT: movq %rax, 88(%rsp) +; CHECK-NEXT: movq %rcx, 64(%rsp) +; CHECK-NEXT: movq %rax, 72(%rsp) +; CHECK-NEXT: movq %rax, 56(%rsp) +; CHECK-NEXT: movq %rcx, 48(%rsp) +; CHECK-NEXT: leaq 48(%rsp), %rax +; CHECK-NEXT: movq %rax, 32(%rsp) +; CHECK-NEXT: movq $10, 40(%rsp) +; CHECK-NEXT: leaq 112(%rsp), %rcx +; CHECK-NEXT: leaq 96(%rsp), %rdx +; CHECK-NEXT: leaq 80(%rsp), %r8 +; CHECK-NEXT: leaq 64(%rsp), %r9 + call void @foo2({ float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, i64 10) + ret void +} Index: llvm/lib/Target/X86/X86ISelLowering.h =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.h +++ llvm/lib/Target/X86/X86ISelLowering.h @@ -1436,7 +1436,7 @@ SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const; + 
ISD::ArgFlagsTy Flags, bool hasCopy) const; // Call lowering helpers. Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -3763,12 +3763,15 @@ SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const { + ISD::ArgFlagsTy Flags, + bool hasCopy) const { unsigned LocMemOffset = VA.getLocMemOffset(); SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl); PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()), StackPtr, PtrOff); - if (Flags.isByVal()) + // If the argument already has a copy on the stack, we do not need to + // creating a temporary stack slot, again. + if (Flags.isByVal() && !hasCopy) return CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl); return DAG.getStore( @@ -4080,7 +4083,7 @@ StackPtr = DAG.getCopyFromReg(Chain, dl, RegInfo->getStackRegister(), getPointerTy(DAG.getDataLayout())); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, Arg, - dl, DAG, VA, Flags)); + dl, DAG, VA, Flags, !isByVal)); } } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83175.275619.patch Type: text/x-patch Size: 3578 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:30 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:30 +0000 (UTC) Subject: [PATCH] D83138: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS In-Reply-To: References: Message-ID: <2da5ea62435cf82f7d97dde229558e69@localhost.localdomain> grimar accepted this revision. grimar added a comment. LGTM too (perhaps, with the nit I've mentioned. Up to you). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83138/new/ https://reviews.llvm.org/D83138 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:30 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:30 +0000 (UTC) Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable In-Reply-To: References: Message-ID: <5b5adb034e18f0443a1b2c095e115f9f@localhost.localdomain> benshi001 updated this revision to Diff 275616. benshi001 edited the summary of this revision. benshi001 added a comment. One more change: add a check for whether c1*c2 overflows. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83153/new/ https://reviews.llvm.org/D83153 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll llvm/test/CodeGen/X86/urem-seteq-nonzero.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83153.275616.patch Type: text/x-patch Size: 9469 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:30 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:30 +0000 (UTC) Subject: [PATCH] D83203: [CodeGen] Fix warnings in SelectionDAG::SplitVector Message-ID: david-arm created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a reviewer: rengolin. Herald added a project: LLVM. 
Fixes warnings in this test: sve-sext-zext.ll Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83203 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -9614,14 +9614,18 @@ std::pair SelectionDAG::SplitVector(const SDValue &N, const SDLoc &DL, const EVT &LoVT, const EVT &HiVT) { - assert(LoVT.getVectorNumElements() + HiVT.getVectorNumElements() <= - N.getValueType().getVectorNumElements() && + assert(LoVT.isScalableVector() == HiVT.isScalableVector() && + LoVT.isScalableVector() == N.getValueType().isScalableVector() && + "Splitting vector with an invalid mixture of fixed and scalable " + "vector types"); + assert(LoVT.getVectorMinNumElements() + HiVT.getVectorMinNumElements() <= + N.getValueType().getVectorMinNumElements() && "More vector elements requested than available!"); SDValue Lo, Hi; Lo = getNode(ISD::EXTRACT_SUBVECTOR, DL, LoVT, N, getVectorIdxConstant(0, DL)); Hi = getNode(ISD::EXTRACT_SUBVECTOR, DL, HiVT, N, - getVectorIdxConstant(LoVT.getVectorNumElements(), DL)); + getVectorIdxConstant(LoVT.getVectorMinNumElements(), DL)); return std::make_pair(Lo, Hi); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83203.275623.patch Type: text/x-patch Size: 1315 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:31 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:31 +0000 (UTC) Subject: [PATCH] D83129: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections. In-Reply-To: References: Message-ID: <2e09aba12f948e058b6b0bafbb659aa8@localhost.localdomain> grimar marked an inline comment as done. grimar added inline comments. 
================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:6559-6566 + if (Expected NameOrErr = + this->dumper()->getStaticSymbolName(Index)) + return *NameOrErr; + else + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(NameOrErr.takeError()))); ---------------- jhenderson wrote: > This seems like a pattern we're likely to have in several different parts of the ELFDumper. Is there any code we could share to avoid duplication? Maybe it just makes sense to change `getStaticSymbolName` to report the warning/return the `` itself? >From what I see, the `getStaticSymbolName` is used in one more place: ``` template void LLVMStyle::printAddrsig(const ELFFile *Obj) { ... for (uint64_t Sym : *V) { Expected NameOrErr = this->dumper()->getStaticSymbolName(Sym); if (NameOrErr) { W.printNumber("Sym", *NameOrErr, Sym); continue; } reportWarning(NameOrErr.takeError(), this->FileName); W.printNumber("Sym", "", Sym); } } ``` And it looks like it should be reasonable and possible to do what you suggest. Follow-up? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83129/new/ https://reviews.llvm.org/D83129 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:31 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:31 +0000 (UTC) Subject: [PATCH] D83204: [ARM Message-ID: dmgreen created this revision. Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls. Herald added a project: LLVM. 
https://reviews.llvm.org/D83204 Files: llvm/lib/Target/ARM/ARMInstrVFP.td llvm/unittests/Target/ARM/MachineInstrTest.cpp Index: llvm/unittests/Target/ARM/MachineInstrTest.cpp =================================================================== --- llvm/unittests/Target/ARM/MachineInstrTest.cpp +++ llvm/unittests/Target/ARM/MachineInstrTest.cpp @@ -1103,8 +1103,8 @@ for (unsigned Op = 0; Op < ARM::INSTRUCTION_LIST_END; ++Op) { const MCInstrDesc &Desc = TII->get(Op); - if ((Desc.TSFlags & ARMII::DomainMask) != ARMII::DomainMVE && - (Desc.TSFlags & ARMII::DomainMask) != ARMII::DomainVFP) + if (((Desc.TSFlags & ARMII::DomainMask) & + (ARMII::DomainMVE | ARMII::DomainVFP | ARMII::DomainNEONA8)) == 0) continue; if (UnpredictableOpcodes.count(Op)) continue; Index: llvm/lib/Target/ARM/ARMInstrVFP.td =================================================================== --- llvm/lib/Target/ARM/ARMInstrVFP.td +++ llvm/lib/Target/ARM/ARMInstrVFP.td @@ -1600,6 +1600,7 @@ // Some single precision VFP instructions may be executed on both NEON and // VFP pipelines on A8. let D = VFPNeonA8Domain; + let hasSideEffects = 0; } def : VFPNoNEONPat<(i32 (fp_to_sint SPR:$a)), @@ -1647,6 +1648,7 @@ // Some single precision VFP instructions may be executed on both NEON and // VFP pipelines on A8. let D = VFPNeonA8Domain; + let hasSideEffects = 0; } def : VFPNoNEONPat<(i32 (fp_to_uint SPR:$a)), -------------- next part -------------- A non-text attachment was scrubbed... Name: D83204.275624.patch Type: text/x-patch Size: 1345 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:31 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:31 +0000 (UTC) Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable In-Reply-To: References: Message-ID: benshi001 added a comment. Change list according to all your comments. 1. Separate the test cases to show improvement in another patch. 
Done. https://reviews.llvm.org/D83159 2. Make sure c1 and c2 do not exceed int64, to avoid an assert failure. Done. One more if-statement is added to check that. (The condition should be >, not >=; otherwise riscv64 cannot be optimized.) 3. Check whether c1*c2 overflows. Done. One more if-statement is added for that. 4. Make an inverse transform if "opt -instcombine" has been performed. Shall we separate this inverse transform into another patch? At least this patch improves the test case urem-seteq-nonzero.ll, and the case in https://reviews.llvm.org/D83159 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83153/new/ https://reviews.llvm.org/D83153 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:32 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:32 +0000 (UTC) Subject: [PATCH] D83204: [ARM] More unpredicatable VCVT instructions. In-Reply-To: References: Message-ID: <72045c50c7e95c5cf35d554c9e921e67@localhost.localdomain> dmgreen updated this revision to Diff 275626. dmgreen added a comment. Update test. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83204/new/ https://reviews.llvm.org/D83204 Files: llvm/lib/Target/ARM/ARMInstrVFP.td llvm/test/CodeGen/Thumb2/mve-vcvt.ll llvm/unittests/Target/ARM/MachineInstrTest.cpp Index: llvm/unittests/Target/ARM/MachineInstrTest.cpp =================================================================== --- llvm/unittests/Target/ARM/MachineInstrTest.cpp +++ llvm/unittests/Target/ARM/MachineInstrTest.cpp @@ -1103,8 +1103,8 @@ for (unsigned Op = 0; Op < ARM::INSTRUCTION_LIST_END; ++Op) { const MCInstrDesc &Desc = TII->get(Op); - if ((Desc.TSFlags & ARMII::DomainMask) != ARMII::DomainMVE && - (Desc.TSFlags & ARMII::DomainMask) != ARMII::DomainVFP) + if (((Desc.TSFlags & ARMII::DomainMask) & + (ARMII::DomainMVE | ARMII::DomainVFP | ARMII::DomainNEONA8)) == 0) continue; if (UnpredictableOpcodes.count(Op)) continue; Index: llvm/test/CodeGen/Thumb2/mve-vcvt.ll =================================================================== --- llvm/test/CodeGen/Thumb2/mve-vcvt.ll +++ llvm/test/CodeGen/Thumb2/mve-vcvt.ll @@ -45,8 +45,8 @@ ; CHECK-MVE: @ %bb.0: @ %entry ; CHECK-MVE-NEXT: vcvt.s32.f32 s4, s0 ; CHECK-MVE-NEXT: vcvt.s32.f32 s6, s1 -; CHECK-MVE-NEXT: vcvt.s32.f32 s8, s3 ; CHECK-MVE-NEXT: vcvt.s32.f32 s10, s2 +; CHECK-MVE-NEXT: vcvt.s32.f32 s8, s3 ; CHECK-MVE-NEXT: vmov r0, s4 ; CHECK-MVE-NEXT: vmov.32 q0[0], r0 ; CHECK-MVE-NEXT: vmov r0, s6 @@ -71,8 +71,8 @@ ; CHECK-MVE: @ %bb.0: @ %entry ; CHECK-MVE-NEXT: vcvt.u32.f32 s4, s0 ; CHECK-MVE-NEXT: vcvt.u32.f32 s6, s1 -; CHECK-MVE-NEXT: vcvt.u32.f32 s8, s3 ; CHECK-MVE-NEXT: vcvt.u32.f32 s10, s2 +; CHECK-MVE-NEXT: vcvt.u32.f32 s8, s3 ; CHECK-MVE-NEXT: vmov r0, s4 ; CHECK-MVE-NEXT: vmov.32 q0[0], r0 ; CHECK-MVE-NEXT: vmov r0, s6 Index: llvm/lib/Target/ARM/ARMInstrVFP.td =================================================================== --- llvm/lib/Target/ARM/ARMInstrVFP.td +++ llvm/lib/Target/ARM/ARMInstrVFP.td @@ -1600,6 +1600,7 @@ // Some single precision VFP instructions may be executed 
on both NEON and // VFP pipelines on A8. let D = VFPNeonA8Domain; + let hasSideEffects = 0; } def : VFPNoNEONPat<(i32 (fp_to_sint SPR:$a)), @@ -1647,6 +1648,7 @@ // Some single precision VFP instructions may be executed on both NEON and // VFP pipelines on A8. let D = VFPNeonA8Domain; + let hasSideEffects = 0; } def : VFPNoNEONPat<(i32 (fp_to_uint SPR:$a)), -------------- next part -------------- A non-text attachment was scrubbed... Name: D83204.275626.patch Type: text/x-patch Size: 2289 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:32 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:32 +0000 (UTC) Subject: [PATCH] D83205: [SVE] Add checks for no warnings in CodeGen/AArch64/sve-sext-zext.ll Message-ID: david-arm created this revision. Herald added subscribers: llvm-commits, danielkiss, psnobl, kristof.beyls, tschuett. Herald added a reviewer: rengolin. Herald added a reviewer: efriedma. Herald added a project: LLVM. Previous patches fixed up all the warnings in this test: llvm/test/CodeGen/AArch64/sve-sext-zext.ll and this change simply checks that no new warnings are added in future. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83205 Files: llvm/test/CodeGen/AArch64/sve-sext-zext.ll Index: llvm/test/CodeGen/AArch64/sve-sext-zext.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-sext-zext.ll +++ llvm/test/CodeGen/AArch64/sve-sext-zext.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning define @sext_i1_i8( %a) { ; CHECK-LABEL: sext_i1_i8: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83205.275628.patch Type: text/x-patch Size: 632 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:33 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:33 +0000 (UTC) Subject: [PATCH] D82876: [Alignment][NFC] Migrate TargetTransformInfo::allowsMisalignedMemoryAccesses to Align In-Reply-To: References: Message-ID: <8d1e7f7f72a09bdc5b27d14287044117@localhost.localdomain> gchatelet marked an inline comment as done. gchatelet added inline comments. ================ Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:16150 if ((Ty == MVT::v4i8 || Ty == MVT::v8i8 || Ty == MVT::v4i16) && Alignment >= VT.getScalarSizeInBits() / 8) { if (Fast) ---------------- gchatelet wrote: > courbet wrote: > > gchatelet wrote: > > > courbet wrote: > > > > When `alignment` was `0`, and `v8i8`, this is not an NFC. > > > `Ty == MVT::v4i8 || Ty == MVT::v8i8 || Ty == MVT::v4i16` so `Ty` is either 32 or 64 bits (`v4i8` is 32, `v8i8` and `v4i16` are 64) > > > Since `VT` is a `SimpleVT` => `VT.getScalarSizeInBits()` can only be 32 or 64 as well. > > > Then `VT.getScalarSizeInBits() / 8` can only be 4 or 8 so it doesn't matter whether Alignment is 0 or 1. > > > > > > Or am I missing something? > > `isSimple` means native to some processor (as opposed to extended). But e.g. `MVT::v8i8` is both a //simple// and //vector// `EVT`. the scalar type for `MVT::v8i8` is `MVT::i8`, so `VT.getScalarSizeInBits()==8`, i.e. `VT.getScalarSizeInBits() / 8 == 1` > Ha I see thx for the explanation and good catch! > > @dmgreen @samparker what do you think? > I fail to see whether you considered `Alignment` being `0` when writing D65580. > For context see [this line](https://reviews.llvm.org/D82876#inline-761892) and [this line](https://reviews.llvm.org/D82876#inline-761893). > > As-is, the whole test suite still passes. 
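To make the arithmetic in the quoted D82876 exchange concrete, here is a minimal sketch of why a zero alignment matters for `v8i8`. It is not the LLVM code: `meetsScalarAlignment` and `toAlignBytes` are hypothetical helpers, and the mapping of an alignment of 0 to 1 byte during the migration to `Align` is an assumption taken from the discussion rather than from the patch itself.

```cpp
#include <cstdint>

// Hypothetical stand-in for the condition quoted from ARMISelLowering.cpp:
//   Alignment >= VT.getScalarSizeInBits() / 8
// ScalarBits is the element width of the vector type
// (8 for v4i8 and v8i8, 16 for v4i16).
constexpr bool meetsScalarAlignment(uint64_t AlignmentBytes,
                                    unsigned ScalarBits) {
  return AlignmentBytes >= ScalarBits / 8;
}

// Assumption: after the migration, an alignment of 0 is treated as 1 byte.
constexpr uint64_t toAlignBytes(uint64_t RawAlignment) {
  return RawAlignment == 0 ? 1 : RawAlignment;
}

// v8i8: the threshold is 8 / 8 == 1 byte, so 0 and 1 give different answers.
static_assert(!meetsScalarAlignment(0, 8), "raw 0 fails the v8i8 check");
static_assert(meetsScalarAlignment(toAlignBytes(0), 8),
              "0 mapped to 1 passes the v8i8 check");

// v4i16: the threshold is 16 / 8 == 2 bytes, so 0 and 1 both fail.
static_assert(!meetsScalarAlignment(0, 16), "raw 0 fails the v4i16 check");
static_assert(!meetsScalarAlignment(toAlignBytes(0), 16),
              "0 mapped to 1 still fails the v4i16 check");
```

Under these assumptions only the 8-bit element types change behaviour, which matches the observation that the change is not strictly NFC for `v8i8`.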
A gentle ping @dmgreen @samparker Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82876/new/ https://reviews.llvm.org/D82876 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:33 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:33 +0000 (UTC) Subject: [PATCH] D82931: [flang][OpenMP] Enhance parser support for atomic construct to OpenMP 5.0 In-Reply-To: References: Message-ID: <51550420df2f8702064ad30d7dbac0d4@localhost.localdomain> DavidTruby accepted this revision. DavidTruby added a comment. This revision is now accepted and ready to land. Looks good to me! As a nit, perhaps you could add some tests that shouldn't parse correctly as well? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82931/new/ https://reviews.llvm.org/D82931 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:34 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:34 +0000 (UTC) Subject: [PATCH] D82456: [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG55227f85d09c: [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost (authored by dmgreen). Changed prior to commit: https://reviews.llvm.org/D82456?vs=272986&id=275631#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82456/new/ https://reviews.llvm.org/D82456 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll llvm/test/Analysis/CostModel/ARM/load_store.ll llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82456.275631.patch Type: text/x-patch Size: 29559 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:34 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:34 +0000 (UTC) Subject: [PATCH] D82876: [Alignment][NFC] Migrate TargetTransformInfo::allowsMisalignedMemoryAccesses to Align In-Reply-To: References: Message-ID: gchatelet updated this revision to Diff 275630. gchatelet added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82876/new/ https://reviews.llvm.org/D82876 Files: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/lib/Target/AMDGPU/R600ISelLowering.cpp llvm/lib/Target/AMDGPU/R600ISelLowering.h llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIISelLowering.h llvm/lib/Target/ARM/ARMISelLowering.cpp llvm/lib/Target/ARM/ARMISelLowering.h llvm/lib/Target/Hexagon/HexagonISelLowering.cpp llvm/lib/Target/Hexagon/HexagonISelLowering.h llvm/lib/Target/Mips/Mips16ISelLowering.cpp llvm/lib/Target/Mips/Mips16ISelLowering.h llvm/lib/Target/Mips/MipsSEISelLowering.cpp llvm/lib/Target/Mips/MipsSEISelLowering.h llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h llvm/lib/Target/VE/VEISelLowering.cpp 
llvm/lib/Target/VE/VEISelLowering.h llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86ISelLowering.h llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82876.275630.patch Type: text/x-patch Size: 32960 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:35 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:35 +0000 (UTC) Subject: [PATCH] D83131: [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. In-Reply-To: References: Message-ID: <48bc439f5fc09081c08308e3483a5c31@localhost.localdomain> grimar updated this revision to Diff 275632. grimar marked an inline comment as done. grimar added a comment. - Addressed review comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83131/new/ https://reviews.llvm.org/D83131 Files: llvm/test/tools/llvm-readobj/ELF/linker-options.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6753,30 +6753,33 @@ if (Shdr.sh_type != ELF::SHT_LLVM_LINKER_OPTIONS) continue; - ArrayRef Contents = - unwrapOrError(this->FileName, Obj->getSectionContents(&Shdr)); - if (Contents.empty()) + Expected> ContentsOrErr = Obj->getSectionContents(&Shdr); + if (!ContentsOrErr) { + this->reportUniqueWarning( + createError("unable to read the content of the " + "SHT_LLVM_LINKER_OPTIONS section: " + + toString(ContentsOrErr.takeError()))); + continue; + } + if (ContentsOrErr->empty()) continue; - if (Contents.back() != 0) { - reportWarning(createError("SHT_LLVM_LINKER_OPTIONS section at index " + - Twine(I) + - " is broken: the " - 
"content is not null-terminated"), - this->FileName); + if (ContentsOrErr->back() != 0) { + this->reportUniqueWarning( + createError("SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: the " + "content is not null-terminated")); continue; } SmallVector Strings; - toStringRef(Contents.drop_back()).split(Strings, '\0'); + toStringRef(ContentsOrErr->drop_back()).split(Strings, '\0'); if (Strings.size() % 2 != 0) { - reportWarning( - createError( - "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + - " is broken: an incomplete " - "key-value pair was found. The last possible key was: \"" + - Strings.back() + "\""), - this->FileName); + this->reportUniqueWarning(createError( + "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: an incomplete " + "key-value pair was found. The last possible key was: \"" + + Strings.back() + "\"")); continue; } Index: llvm/test/tools/llvm-readobj/ELF/linker-options.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/linker-options.test +++ llvm/test/tools/llvm-readobj/ELF/linker-options.test @@ -2,13 +2,14 @@ ## to dump SHT_LLVM_LINKER_OPTIONS sections. # RUN: yaml2obj --docnum=1 %s -o %t1 -# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s --check-prefix=CHECK -DFILE=%t1 +# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s -DFILE=%t1 # CHECK: LinkerOptions [ # CHECK: option 0: value 0 # CHECK: option 1: value 1 # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 2 is broken: an incomplete key-value pair was found. 
The last possible key was: "c" # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 4 is broken: the content is not null-terminated +# CHECK-NEXT: warning: '[[FILE]]': unable to read the content of the SHT_LLVM_LINKER_OPTIONS section: section [index 5] has a sh_offset (0xffffffff) + sh_size (0x8) that is greater than the file size (0x370) # CHECK-NEXT: option 3: value 3 # CHECK-NEXT: ] @@ -44,7 +45,15 @@ - Name: .linker-options.nonul Type: SHT_LLVM_LINKER_OPTIONS Content: "61" -## Case 5: another correct case to show we do not stop dumping after reporting a warning. +## Case 5: check we report a warning when it is not possible to read +## the content of the SHT_LLVM_LINKER_OPTIONS section. + - Name: .linker-options.broken.content + Type: SHT_LLVM_LINKER_OPTIONS + ShOffset: 0xffffffff + Options: + - Name: foo + Value: bar +## Case 6: another correct case to show we do not stop dumping after reporting a warning. - Name: .linker-options.valid2 Type: SHT_LLVM_LINKER_OPTIONS Options: -------------- next part -------------- A non-text attachment was scrubbed... Name: D83131.275632.patch Type: text/x-patch Size: 4074 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:35 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:35 +0000 (UTC) Subject: [PATCH] D80301: [yaml][clang-tidy] Fix multiline YAML serialization In-Reply-To: References: Message-ID: <443d16fe00f9a4eac3a789d70d9f70b2@localhost.localdomain> DmitryPolukhin added a comment. @njames93 and @aaron.ballman - please take a look at this diff. Multiline replacements in YAML are broken and cannot be applied correctly. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80301/new/ https://reviews.llvm.org/D80301 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:35 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:35 +0000 (UTC) Subject: [PATCH] D83102: [Scalarizer] InsertElement handling w/ constant insert index In-Reply-To: References: Message-ID: <1796210b98244f0ee84e730e9f4c84cb@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf62c8dbc99ea: [Scalarizer] InsertElement handling w/ constant insert index (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83102/new/ https://reviews.llvm.org/D83102 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/basic.ll llvm/test/Transforms/Scalarizer/constant-insertelement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83102.275635.patch Type: text/x-patch Size: 4797 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:35 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:35 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: <02043ccca4daf91c0f6ea5e464c67d9f@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG28b7816b782b: [Scalarizer] ExtractElement handling w/ constant extract index (authored by lebedev.ri). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/constant-extractelement.ll llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83101.275637.patch Type: text/x-patch Size: 5424 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:35 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:35 +0000 (UTC) Subject: [PATCH] D83160: [InstCombine] Lower infinite combine loop detection thresholds In-Reply-To: References: Message-ID: <4b93c7adba9c9e855acbac7d793b24ea@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGcd7f8051ac7b: [InstCombine] Lower infinite combine loop detection thresholds (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83160/new/ https://reviews.llvm.org/D83160 Files: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Index: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp =================================================================== --- llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -123,8 +123,13 @@ DEBUG_COUNTER(VisitCounter, "instcombine-visit", "Controls which instructions are visited"); +// FIXME: these limits eventually should be as low as 2. 
static constexpr unsigned InstCombineDefaultMaxIterations = 1000; +#ifndef NDEBUG +static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 100; +#else static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 1000; +#endif static cl::opt EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"), -------------- next part -------------- A non-text attachment was scrubbed... Name: D83160.275634.patch Type: text/x-patch Size: 795 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:35 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:35 +0000 (UTC) Subject: [PATCH] D82961: [Scalarizer] InsertElement handling w/ variable insert index (PR46524) In-Reply-To: References: Message-ID: <88abd1c31e163bf76d335df11069a149@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG6e5047458132: [Scalarizer] InsertElement handling w/ variable insert index (PR46524) (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82961/new/ https://reviews.llvm.org/D82961 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/variable-insertelement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82961.275636.patch Type: text/x-patch Size: 9456 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:35 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:35 +0000 (UTC) Subject: [PATCH] D82970: [Scalarizer] ExtractElement handling w/ variable insert index (PR46524) In-Reply-To: References: Message-ID: <559e3ba1235874afcb9d99c92b270ee9@localhost.localdomain> This revision was automatically updated to reflect the committed changes. 
Closed by commit rG51f9310ff2e3: [Scalarizer] ExtractElement handling w/ variable insert index (PR46524) (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82970/new/ https://reviews.llvm.org/D82970 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/variable-extractelement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82970.275638.patch Type: text/x-patch Size: 7404 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:36 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:36 +0000 (UTC) Subject: [PATCH] D83128: [Support] Add path::user_config_directory for $XDG_CONFIG_HOME etc In-Reply-To: References: Message-ID: <1e4ae169a491fecb0900e1994ea72c0d@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGcd209f1a3790: [Support] Add path::user_config_directory for $XDG_CONFIG_HOME etc (authored by sammccall). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83128/new/ https://reviews.llvm.org/D83128 Files: llvm/include/llvm/Support/Path.h llvm/lib/Support/Unix/Path.inc llvm/lib/Support/Windows/Path.inc llvm/unittests/Support/Path.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83128.275639.patch Type: text/x-patch Size: 4154 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:37 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:37 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM Message-ID: LukeGeeson created this revision. LukeGeeson added a reviewer: t.p.northover. 
Herald added subscribers: llvm-commits, cfe-commits, danielkiss, hiraditya, kristof.beyls. Herald added projects: clang, LLVM. This patch adds support for the Arm-v8 Cortex-A78 and Cortex-X1 processors for AArch64 and ARM. In detail: - Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang - Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm details of the CPU can be found here: https://www.arm.com/products/cortex-x https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78 The following people contributed to this patch: - Luke Geeson - Mikhail Maltsev Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83206.275633.patch Type: text/x-patch Size: 14081 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:37 2020 From: llvm-commits at lists.llvm.org (Sanne Wouda via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:37 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: <25da1a35373741c2ba53c33228100e1a@localhost.localdomain> sanwou01 marked 2 inline comments as done. sanwou01 added a comment. Comments inline. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3030 + + auto Shape = + VFShape::get(*CI, {static_cast(VL.size()), false}, ---------------- ABataev wrote: > Is this a pointer? Then `auto *`. Surprisingly no! 
I'll remove auto to avoid the confusion. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5121 + if (I->mayReadOrWriteMemory() && !isSideeffectIntrinsic(I) && + !isVectorizableLibFunctionCall(I)) { // Update the linked list of memory accessing instructions. ---------------- ABataev wrote: > Why do you need to exclude vectorizable library functions here? For "normal" function calls, we have to assume that the functions may read or write any location in memory, which may alias memory read or written by another instruction in the same bundle. For functions with vector variants, we should be able to assume that they are pure: they won't write to memory (except when the function takes pointer arguments, which I'm not handling correctly now that I think about it; I'll fix that up). Actually, the calculateDependencies function below might be the less surprising place to handle this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:37 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:37 +0000 (UTC) Subject: [PATCH] D82457: [ARM] Add extra extend and trunc costs for cast instructions In-Reply-To: References: Message-ID: <8c4359222703d650463a743f8f5aa2c2@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG60b8b2beeab9: [ARM] Add extra extend and trunc costs for cast instructions (authored by dmgreen). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82457/new/ https://reviews.llvm.org/D82457 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll Index: llvm/test/Analysis/CostModel/ARM/cast_ldst.ll =================================================================== --- llvm/test/Analysis/CostModel/ARM/cast_ldst.ll +++ llvm/test/Analysis/CostModel/ARM/cast_ldst.ll @@ -122,8 +122,8 @@ ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 74 for instruction: %v8864u = zext <8 x i8> %loadv8i8 to <8 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816s = sext <16 x i8> %loadv16i8 to <16 x i16> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816u = zext <16 x i8> %loadv16i8 to <16 x i16> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832s = sext <16 x i8> %loadv16i8 to <16 x i32> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832u = zext <16 x i8> %loadv16i8 to <16 x i32> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832s = sext <16 x i8> %loadv16i8 to <16 x i32> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832u = zext <16 x i8> %loadv16i8 to <16 x i32> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1322 for instruction: %v16864s = sext <16 x i8> %loadv16i8 to <16 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 298 for instruction: %v16864u = zext <16 x i8> %loadv16i8 to <16 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v21632s = sext <2 x i16> %loadv2i16 to <2 x i32> @@ -758,7 +758,7 @@ ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8832 = trunc <8 x i32> undef to <8 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an 
estimated cost of 10 for instruction: %v8864 = trunc <8 x i64> undef to <8 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816 = trunc <16 x i16> undef to <16 x i8> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832 = trunc <16 x i32> undef to <16 x i8> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832 = trunc <16 x i32> undef to <16 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v16864 = trunc <16 x i64> undef to <16 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v21632 = trunc <2 x i32> undef to <2 x i16> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v21664 = trunc <2 x i64> undef to <2 x i16> Index: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp =================================================================== --- llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -228,12 +228,39 @@ {ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 0}, {ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i8, 0}, {ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i8, 0}, + // The following extend from a legal type to an illegal type, so need to + // split the load. This introduced an extra load operation, but the + // extend is still "free". 
+ {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 1}, + {ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 1}, + {ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 3}, + {ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 3}, + {ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 1}, + {ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 1}, }; if (SrcTy.isVector() && ST->hasMVEIntegerOps()) { if (const auto *Entry = ConvertCostTableLookup(MVELoadConversionTbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT())) - return AdjustCost(Entry->Cost); + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); + } + } + + // The truncate of a store is free. This is the mirror of extends above. + if (I && I->hasOneUse() && isa(*I->user_begin())) { + static const TypeConversionCostTblEntry MVELoadConversionTbl[] = { + {ISD::TRUNCATE, MVT::v4i32, MVT::v4i16, 0}, + {ISD::TRUNCATE, MVT::v4i32, MVT::v4i8, 0}, + {ISD::TRUNCATE, MVT::v8i16, MVT::v8i8, 0}, + {ISD::TRUNCATE, MVT::v8i32, MVT::v8i16, 1}, + {ISD::TRUNCATE, MVT::v16i32, MVT::v16i8, 3}, + {ISD::TRUNCATE, MVT::v16i16, MVT::v16i8, 1}, + }; + if (SrcTy.isVector() && ST->hasMVEIntegerOps()) { + if (const auto *Entry = + ConvertCostTableLookup(MVELoadConversionTbl, ISD, SrcTy.getSimpleVT(), + DstTy.getSimpleVT())) + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); } } -------------- next part -------------- A non-text attachment was scrubbed... Name: D82457.275641.patch Type: text/x-patch Size: 4820 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:38 2020 From: llvm-commits at lists.llvm.org (Carl Ritson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:38 +0000 (UTC) Subject: [PATCH] D83207: [AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions Message-ID: critson created this revision. critson added reviewers: rampitec, arsenm. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM. 
MAD/MAC is no longer always available. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83207 Files: llvm/lib/Target/AMDGPU/SIISelLowering.cpp Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -4267,10 +4267,13 @@ switch (VT.getSimpleVT().SimpleTy) { case MVT::f32: { - // This is as fast on some subtargets. However, we always have full rate f32 - // mad available which returns the same result as the separate operations - // which we should prefer over fma. We can't use this if we want to support - // denormals, so only report this in these cases. + // If mad is not available this depends only on if f32 fma is full rate. + if (!Subtarget->hasMadMacF32Insts()) + return Subtarget->hasFastFMAF32(); + + // Otherwise f32 mad is always full rate and returns the same result as + // the separate operations so should be preferred over fma. + // However does not support denomals. if (hasFP32Denormals(MF)) return Subtarget->hasFastFMAF32() || Subtarget->hasDLInsts(); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83207.275642.patch Type: text/x-patch Size: 1043 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:38 2020 From: llvm-commits at lists.llvm.org (Sam Elliott via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:38 +0000 (UTC) Subject: [PATCH] D83159: [RISCV] Add a new codegen test In-Reply-To: References: Message-ID: lenary accepted this revision. lenary added a comment. This revision is now accepted and ready to land. LGTM! Please ask for commit access and land this yourself.
There's more about this, and what that entails, in this document: https://llvm.org/docs/DeveloperPolicy.html Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83159/new/ https://reviews.llvm.org/D83159 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:38 2020 From: llvm-commits at lists.llvm.org (Carl Ritson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:38 +0000 (UTC) Subject: [PATCH] D83207: [AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions In-Reply-To: References: Message-ID: <2103034358f32471ded9de901d05fa29@localhost.localdomain> critson added a comment. Technically this is NFC with current hardware configurations, but I would like to be able to decouple this from MAD/MAC. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83207/new/ https://reviews.llvm.org/D83207 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:39 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:39 +0000 (UTC) Subject: [PATCH] D81256: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare In-Reply-To: References: Message-ID: <8dc99107ef15d159fbb2bdab9e3b5066@localhost.localdomain> foad added inline comments. ================ Comment at: llvm/lib/CodeGen/CodeGenPrepare.cpp:2943 +bool TypePromotionTransaction::changed( + TypePromotionTransaction::ConstRestorationPt Point) { ---------------- Maybe `changedSince` would be a better name? ================ Comment at: llvm/lib/CodeGen/CodeGenPrepare.cpp:4955 } - TPT.commit(); ---------------- I still don't really understand whether this is an intentional change in behaviour, or just fixing the return value from optimizeMemoryInst. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81256/new/ https://reviews.llvm.org/D81256 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:39 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:39 +0000 (UTC) Subject: [PATCH] D82943: [SVE] Add more warnings checks to clang and LLVM SVE tests In-Reply-To: References: Message-ID: <901dd0d50031b96c16d7e213f0a20df9@localhost.localdomain> kmclaughlin accepted this revision. kmclaughlin added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82943/new/ https://reviews.llvm.org/D82943 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:39 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:39 +0000 (UTC) Subject: [PATCH] D83099: Revert "[clangd] Store index in '.clangd/index' instead of '.clangd-index'" In-Reply-To: References: Message-ID: <3acd294d313ab3a6d841c6345e8ff7b0@localhost.localdomain> sammccall updated this revision to Diff 275643. sammccall added a comment. Slightly different layout after getting input on discourse and elsewhere Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83099/new/ https://reviews.llvm.org/D83099 Files: .gitignore clang-tools-extra/clangd/index/Background.h clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp clang-tools-extra/clangd/test/background-index.test llvm/.gitignore Index: llvm/.gitignore =================================================================== --- llvm/.gitignore +++ llvm/.gitignore @@ -59,8 +59,6 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd #==============================================================================# # Files created in tree by the Go bindings. 
Index: clang-tools-extra/clangd/test/background-index.test =================================================================== --- clang-tools-extra/clangd/test/background-index.test +++ clang-tools-extra/clangd/test/background-index.test @@ -15,8 +15,8 @@ # RUN: clangd -background-index -lit-test < %t/definition.jsonrpc | FileCheck %t/definition.jsonrpc --check-prefixes=CHECK,BUILD # Test that the index is writing files in the expected location. -# RUN: ls %t/.clangd/index/foo.cpp.*.idx -# RUN: ls %t/sub_dir/.clangd/index/foo.h.*.idx +# RUN: ls %t/.cache/clangd/index/foo.cpp.*.idx +# RUN: ls %t/sub_dir/.cache/clangd/index/foo.h.*.idx # Test the index is read from disk: delete code and restart clangd. # RUN: rm %t/foo.cpp Index: clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp =================================================================== --- clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp +++ clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp @@ -95,8 +95,8 @@ }; // Creates and owns IndexStorages for multiple CDBs. -// When a CDB root is found, shards are stored in $ROOT/.clangd/index. -// When no root is found, the fallback path is ~/.cache/clangd/index. +// When a CDB root is found, shards are stored in $ROOT/.cache/clangd/index/. +// When no root is found, the fallback path is ~/.cache/clangd/index/. 
class DiskBackedIndexStorageManager { public: DiskBackedIndexStorageManager( @@ -115,7 +115,7 @@ llvm::SmallString<128> StorageDir(FallbackDir); if (auto PI = GetProjectInfo(File)) { StorageDir = PI->SourceRoot; - llvm::sys::path::append(StorageDir, ".clangd", "index"); + llvm::sys::path::append(StorageDir, ".cache", "clangd", "index"); } auto &IndexStorage = IndexStorageMap[StorageDir]; if (!IndexStorage) Index: clang-tools-extra/clangd/index/Background.h =================================================================== --- clang-tools-extra/clangd/index/Background.h +++ clang-tools-extra/clangd/index/Background.h @@ -56,9 +56,9 @@ using Factory = llvm::unique_function; // Creates an Index Storage that saves shards into disk. Index storage uses - // CDBDirectory + ".clangd/index/" as the folder to save shards. CDBDirectory - // is the first directory containing a CDB in parent directories of a file, or - // user's home directory if none was found, e.g. standard library headers. + // CDBDirectory + ".cache/clangd/index/" as the folder to save shards. + // CDBDirectory is the first directory containing a CDB in parent directories + // of a file, or user's home directory if none was found, e.g. stdlib headers. static Factory createDiskBackedStorageFactory( std::function(PathRef)> GetProjectInfo); }; Index: .gitignore =================================================================== --- .gitignore +++ .gitignore @@ -53,10 +53,11 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd +# clangd index. 
(".clangd" is a config file now, thus trailing slash) +.clangd/ +.cache # static analyzer regression testing project files /clang/utils/analyzer/projects/*/CachedSource /clang/utils/analyzer/projects/*/PatchedSource /clang/utils/analyzer/projects/*/ScanBuildResults -/clang/utils/analyzer/projects/*/RefScanBuildResults \ No newline at end of file +/clang/utils/analyzer/projects/*/RefScanBuildResults -------------- next part -------------- A non-text attachment was scrubbed... Name: D83099.275643.patch Type: text/x-patch Size: 3810 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:40 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:40 +0000 (UTC) Subject: [PATCH] D83208: [llvm-readobj] - Refactor ELFDumper::getStaticSymbolName. Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay. Herald added subscribers: rupprecht, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. This is a follow-up to D83129. It is possible to make `getStaticSymbolName` report warnings itself and return "" on an error. This encapsulates the error handling and slightly simplifies the logic in the callers' code.
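The refactoring described in this summary can be illustrated with a small stand-alone sketch. It does not use LLVM: `NameOrError`, `ToyDumper`, `lookupSymbol` and the "<?>" placeholder are hypothetical names chosen for illustration, and only the warning wording is taken from the patch. The idea shown is that the lookup reports its own (deduplicated) warning and hands back a plain string, so callers no longer unwrap an error themselves.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Expected-like result: either a symbol name or
// an error message. This is not the LLVM type.
struct NameOrError {
  std::optional<std::string> Name;
  std::string Error;
};

// Toy dumper for illustration only; the real ELFDumper reads an ELF symbol
// table rather than a vector of strings.
class ToyDumper {
public:
  explicit ToyDumper(std::vector<std::string> Syms)
      : Symbols(std::move(Syms)) {}

  // Refactored shape: returns a plain string. On failure it records a warning
  // and returns a placeholder (the placeholder text is an assumption).
  std::string getStaticSymbolName(uint32_t Index) {
    NameOrError Result = lookupSymbol(Index);
    if (Result.Name)
      return *Result.Name;
    reportUniqueWarning("unable to read the name of symbol with index " +
                        std::to_string(Index) + ": " + Result.Error);
    return "<?>";
  }

  const std::vector<std::string> &warnings() const { return Warnings; }

private:
  // Fallible lookup that callers previously had to unwrap themselves.
  NameOrError lookupSymbol(uint32_t Index) const {
    if (Index >= Symbols.size())
      return {std::nullopt,
              "invalid symbol index (" + std::to_string(Index) + ")"};
    return {Symbols[Index], ""};
  }

  // Records each distinct warning message only once.
  void reportUniqueWarning(const std::string &Msg) {
    if (Seen.insert(Msg).second)
      Warnings.push_back(Msg);
  }

  std::vector<std::string> Symbols;
  std::set<std::string> Seen;
  std::vector<std::string> Warnings;
};
```

With this shape a caller can pass `getStaticSymbolName(Index)` straight to its printing code; asking twice for the same bad index records the warning once in this sketch.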
https://reviews.llvm.org/D83208 Files: llvm/test/tools/llvm-readobj/ELF/addrsig.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -343,7 +343,7 @@ const Elf_Sym *FirstSym) const; Expected getSymbolSectionName(const Elf_Sym *Symbol, unsigned SectionIndex) const; - Expected getStaticSymbolName(uint32_t Index) const; + std::string getStaticSymbolName(uint32_t Index) const; StringRef getDynamicString(uint64_t Value) const; Expected getSymbolVersionByIndex(uint32_t VersionSymbolIndex, bool &IsDefault) const; @@ -1131,21 +1131,27 @@ } template -Expected -ELFDumper::getStaticSymbolName(uint32_t Index) const { +std::string ELFDumper::getStaticSymbolName(uint32_t Index) const { + auto ReportWarn = [&](Error E) -> std::string { + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(std::move(E)))); + return ""; + }; + const ELFFile *Obj = ObjF->getELFFile(); Expected SymOrErr = Obj->getSymbol(DotSymtabSec, Index); if (!SymOrErr) - return SymOrErr.takeError(); + return ReportWarn(SymOrErr.takeError()); Expected StrTabOrErr = Obj->getStringTableForSymtab(*DotSymtabSec); if (!StrTabOrErr) - return StrTabOrErr.takeError(); + return ReportWarn(StrTabOrErr.takeError()); Expected NameOrErr = (*SymOrErr)->getName(*StrTabOrErr); if (!NameOrErr) - return NameOrErr.takeError(); + return ReportWarn(NameOrErr.takeError()); return maybeDemangle(*NameOrErr); } @@ -6555,21 +6561,12 @@ return; } - auto GetSymName = [&](uint32_t Index) -> std::string { - if (Expected NameOrErr = - this->dumper()->getStaticSymbolName(Index)) - return *NameOrErr; - else - this->reportUniqueWarning( - createError("unable to read the name of symbol with index " + - Twine(Index) + ": " + toString(NameOrErr.takeError()))); - return ""; - }; - for (const 
Elf_CGProfile &CGPE : *CGProfileOrErr) { DictScope D(W, "CGProfileEntry"); - W.printNumber("From", GetSymName(CGPE.cgp_from), CGPE.cgp_from); - W.printNumber("To", GetSymName(CGPE.cgp_to), CGPE.cgp_to); + W.printNumber("From", this->dumper()->getStaticSymbolName(CGPE.cgp_from), + CGPE.cgp_from); + W.printNumber("To", this->dumper()->getStaticSymbolName(CGPE.cgp_to), + CGPE.cgp_to); W.printNumber("Weight", CGPE.cgp_weight); } } Index: llvm/test/tools/llvm-readobj/ELF/addrsig.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/addrsig.test +++ llvm/test/tools/llvm-readobj/ELF/addrsig.test @@ -58,7 +58,7 @@ # INVALID-INDEX: Addrsig [ # INVALID-INDEX-NEXT: Sym: foo (1) -# INVALID-INDEX-NEXT: warning: '[[FILE]]': unable to get symbol from section [index 2]: invalid symbol index (255) +# INVALID-INDEX-NEXT: warning: '[[FILE]]': unable to read the name of symbol with index 255: unable to get symbol from section [index 2]: invalid symbol index (255) # INVALID-INDEX-NEXT: Sym: (255) # INVALID-INDEX-NEXT: Sym: bar (2) # INVALID-INDEX-NEXT: ] -------------- next part -------------- A non-text attachment was scrubbed... Name: D83208.275644.patch Type: text/x-patch Size: 3532 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:40 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:40 +0000 (UTC) Subject: [PATCH] D83209: Factor out call to EXTRACTOR in generateCC1CommandLine Message-ID: dang created this revision. dang added a reviewer: Bigcheese. Herald added subscribers: llvm-commits, cfe-commits, dexonsmith. Herald added projects: clang, LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83209 Files: clang/include/clang/Driver/Options.td clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/Option/OptParser.td -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83209.275646.patch Type: text/x-patch Size: 8323 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:40 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:40 +0000 (UTC) Subject: [PATCH] D83129: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections. In-Reply-To: References: Message-ID: <10ea32e0166199180304f74667d46036@localhost.localdomain> grimar marked an inline comment as done. grimar added inline comments. ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:6559-6566 + if (Expected NameOrErr = + this->dumper()->getStaticSymbolName(Index)) + return *NameOrErr; + else + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(NameOrErr.takeError()))); ---------------- grimar wrote: > jhenderson wrote: > > This seems like a pattern we're likely to have in several different parts of the ELFDumper. Is there any code we could share to avoid duplication? Maybe it just makes sense to change `getStaticSymbolName` to report the warning/return the `` itself? > From what I see, the `getStaticSymbolName` is used in one more place: > > ``` > template > void LLVMStyle::printAddrsig(const ELFFile *Obj) { > ... > for (uint64_t Sym : *V) { > Expected NameOrErr = this->dumper()->getStaticSymbolName(Sym); > if (NameOrErr) { > W.printNumber("Sym", *NameOrErr, Sym); > continue; > } > reportWarning(NameOrErr.takeError(), this->FileName); > W.printNumber("Sym", "", Sym); > } > } > ``` > > And it looks like it should be reasonable and possible to do what you suggest. Follow-up? 
Follow-up: D83208 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83129/new/ https://reviews.llvm.org/D83129 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:40 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:40 +0000 (UTC) Subject: [PATCH] D83099: [clangd] Store index in '.cache/clangd/index' instead of '.clangd/index' In-Reply-To: References: Message-ID: <7a779ee224c6b80b70177e82e1221715@localhost.localdomain> sammccall updated this revision to Diff 275645. sammccall retitled this revision from "Revert "[clangd] Store index in '.clangd/index' instead of '.clangd-index'"" to "[clangd] Store index in '.cache/clangd/index' instead of '.clangd/index'". sammccall edited the summary of this revision. sammccall added a comment. Updating description Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83099/new/ https://reviews.llvm.org/D83099 Files: .gitignore clang-tools-extra/clangd/index/Background.h clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp clang-tools-extra/clangd/test/background-index.test llvm/.gitignore Index: llvm/.gitignore =================================================================== --- llvm/.gitignore +++ llvm/.gitignore @@ -59,8 +59,6 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd #==============================================================================# # Files created in tree by the Go bindings. Index: clang-tools-extra/clangd/test/background-index.test =================================================================== --- clang-tools-extra/clangd/test/background-index.test +++ clang-tools-extra/clangd/test/background-index.test @@ -15,8 +15,8 @@ # RUN: clangd -background-index -lit-test < %t/definition.jsonrpc | FileCheck %t/definition.jsonrpc --check-prefixes=CHECK,BUILD # Test that the index is writing files in the expected location. 
-# RUN: ls %t/.clangd/index/foo.cpp.*.idx -# RUN: ls %t/sub_dir/.clangd/index/foo.h.*.idx +# RUN: ls %t/.cache/clangd/index/foo.cpp.*.idx +# RUN: ls %t/sub_dir/.cache/clangd/index/foo.h.*.idx # Test the index is read from disk: delete code and restart clangd. # RUN: rm %t/foo.cpp Index: clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp =================================================================== --- clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp +++ clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp @@ -95,8 +95,8 @@ }; // Creates and owns IndexStorages for multiple CDBs. -// When a CDB root is found, shards are stored in $ROOT/.clangd/index. -// When no root is found, the fallback path is ~/.cache/clangd/index. +// When a CDB root is found, shards are stored in $ROOT/.cache/clangd/index/. +// When no root is found, the fallback path is ~/.cache/clangd/index/. class DiskBackedIndexStorageManager { public: DiskBackedIndexStorageManager( @@ -115,7 +115,7 @@ llvm::SmallString<128> StorageDir(FallbackDir); if (auto PI = GetProjectInfo(File)) { StorageDir = PI->SourceRoot; - llvm::sys::path::append(StorageDir, ".clangd", "index"); + llvm::sys::path::append(StorageDir, ".cache", "clangd", "index"); } auto &IndexStorage = IndexStorageMap[StorageDir]; if (!IndexStorage) Index: clang-tools-extra/clangd/index/Background.h =================================================================== --- clang-tools-extra/clangd/index/Background.h +++ clang-tools-extra/clangd/index/Background.h @@ -56,9 +56,9 @@ using Factory = llvm::unique_function; // Creates an Index Storage that saves shards into disk. Index storage uses - // CDBDirectory + ".clangd/index/" as the folder to save shards. CDBDirectory - // is the first directory containing a CDB in parent directories of a file, or - // user's home directory if none was found, e.g. standard library headers. + // CDBDirectory + ".cache/clangd/index/" as the folder to save shards. 
+ // CDBDirectory is the first directory containing a CDB in parent directories + // of a file, or user's home directory if none was found, e.g. stdlib headers. static Factory createDiskBackedStorageFactory( std::function(PathRef)> GetProjectInfo); }; Index: .gitignore =================================================================== --- .gitignore +++ .gitignore @@ -53,10 +53,11 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd +# clangd index. (".clangd" is a config file now, thus trailing slash) +.clangd/ +.cache # static analyzer regression testing project files /clang/utils/analyzer/projects/*/CachedSource /clang/utils/analyzer/projects/*/PatchedSource /clang/utils/analyzer/projects/*/ScanBuildResults -/clang/utils/analyzer/projects/*/RefScanBuildResults \ No newline at end of file +/clang/utils/analyzer/projects/*/RefScanBuildResults -------------- next part -------------- A non-text attachment was scrubbed... Name: D83099.275645.patch Type: text/x-patch Size: 3810 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:41 2020 From: llvm-commits at lists.llvm.org (Kai Nacke via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:41 +0000 (UTC) Subject: [PATCH] D82368: [SystemZ/zos] Define Endian constants for z/OS. In-Reply-To: References: Message-ID: <5cf43c63260d1ac9089bb08948c79043@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG0663844b064d: [SystemZ/ZOS] Define Endian constants for z/OS. (authored by Kai). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82368/new/ https://reviews.llvm.org/D82368 Files: llvm/include/llvm/Support/SwapByteOrder.h Index: llvm/include/llvm/Support/SwapByteOrder.h =================================================================== --- llvm/include/llvm/Support/SwapByteOrder.h +++ llvm/include/llvm/Support/SwapByteOrder.h @@ -36,6 +36,10 @@ #else #define BYTE_ORDER LITTLE_ENDIAN #endif +#elif defined(__MVS__) +#define BIG_ENDIAN 4321 +#define LITTLE_ENDIAN 1234 +#define BYTE_ORDER BIG_ENDIAN #else #if !defined(BYTE_ORDER) && !defined(_WIN32) #include -------------- next part -------------- A non-text attachment was scrubbed... Name: D82368.275648.patch Type: text/x-patch Size: 467 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:41 2020 From: llvm-commits at lists.llvm.org (Kai Nacke via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:41 +0000 (UTC) Subject: [PATCH] D82544: [SystemZ][ZOS] Implement getMainExecutable() and is_local_impl() In-Reply-To: References: Message-ID: <05c2896dc6056e3a72448074f54c8d8f@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbfd84b1c034d: [SystemZ/ZOS] Implement getMainExecutable() and is_local_impl() (authored by Kai). 
Changed prior to commit: https://reviews.llvm.org/D82544?vs=274075&id=275649#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82544/new/ https://reviews.llvm.org/D82544 Files: llvm/lib/Support/Unix/Path.inc Index: llvm/lib/Support/Unix/Path.inc =================================================================== --- llvm/lib/Support/Unix/Path.inc +++ llvm/lib/Support/Unix/Path.inc @@ -48,6 +48,8 @@ #endif #elif defined(__DragonFly__) #include +#elif defined(__MVS__) +#include #endif // Both stdio.h and cstdio are included via different paths and @@ -56,9 +58,13 @@ #undef ferror #undef feof +#if !defined(PATH_MAX) // For GNU Hurd -#if defined(__GNU__) && !defined(PATH_MAX) -# define PATH_MAX 4096 +#if defined(__GNU__) +#define PATH_MAX 4096 +#elif defined(__MVS__) +#define PATH_MAX _XOPEN_PATH_MAX +#endif #endif #include @@ -100,7 +106,8 @@ #define STATVFS_F_FRSIZE(vfs) static_cast(vfs.f_bsize) #endif -#if defined(__NetBSD__) || defined(__DragonFly__) || defined(__GNU__) +#if defined(__NetBSD__) || defined(__DragonFly__) || defined(__GNU__) || \ + defined(__MVS__) #define STATVFS_F_FLAG(vfs) (vfs).f_flag #else #define STATVFS_F_FLAG(vfs) (vfs).f_flags @@ -265,6 +272,26 @@ // Fall back to the classical detection. if (getprogpath(exe_path, argv0)) return exe_path; +#elif defined(__MVS__) + int token = 0; + W_PSPROC buf; + char exe_path[PS_PATHBLEN]; + pid_t pid = getpid(); + + memset(&buf, 0, sizeof(buf)); + buf.ps_pathptr = exe_path; + buf.ps_pathlen = sizeof(exe_path); + + while (true) { + if ((token = w_getpsent(token, &buf, sizeof(buf))) <= 0) + break; + if (buf.ps_pid != pid) + continue; + char real_path[PATH_MAX]; + if (realpath(exe_path, real_path)) + return std::string(real_path); + break; // Found entry, but realpath failed. + } #elif defined(HAVE_DLFCN_H) && defined(HAVE_DLADDR) // Use dladdr to get executable path if available. 
Dl_info DLInfo; @@ -493,6 +520,10 @@ // vmount entry not found; "remote" is the conservative answer. return false; +#elif defined(__MVS__) + // The file system can have an arbitrary structure on z/OS; must go with the + // conservative answer. + return false; #else return !!(STATVFS_F_FLAG(Vfs) & MNT_LOCAL); #endif -------------- next part -------------- A non-text attachment was scrubbed... Name: D82544.275649.patch Type: text/x-patch Size: 2126 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:42 2020 From: llvm-commits at lists.llvm.org (Mikhail Maltsev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:42 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <0bb2cccc86a8348e7c373f67c7c751c9@localhost.localdomain> miyuki added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64Subtarget.h:72 + ThunderX3T110, + CortexX1, + CortexA78 ---------------- I think the new CPUs should be grouped with other Cortex-A CPUs. ================ Comment at: llvm/lib/Target/ARM/ARMSubtarget.h:78 + Swift, + CortexX1, + CortexA78 ---------------- Please keep the list sorted in alphabetical order. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:42 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:42 +0000 (UTC) Subject: [PATCH] D83125: [AArch64][SVE] Remove erroneous assert in resolveFrameOffsetReference In-Reply-To: References: Message-ID: david-arm accepted this revision. david-arm added a comment. This revision is now accepted and ready to land. LGTM! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83125/new/ https://reviews.llvm.org/D83125 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:43 2020 From: llvm-commits at lists.llvm.org (Sam Elliott via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:43 +0000 (UTC) Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable In-Reply-To: References: Message-ID: lenary added a comment. If it's not too much trouble, I've added D83159 as a parent patch, which are the tests for this change in the RISC-V backend. I realise this shouldn't block changes to the other targets, but given the transform is target-independent, it makes sense to see how it affects more targets if possible. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83153/new/ https://reviews.llvm.org/D83153 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:44 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:44 +0000 (UTC) Subject: [PATCH] D82539: [TargetLowering] Improve expansion of ROTL/ROTR In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGe7a4a24dc50a: [TargetLowering] Improve expansion of ROTL/ROTR (authored by foad). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82539/new/ https://reviews.llvm.org/D82539 Files: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Index: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -6177,12 +6177,15 @@ SDLoc DL(SDValue(Node, 0)); EVT ShVT = Op1.getValueType(); - SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT); + SDValue Zero = DAG.getConstant(0, DL, ShVT); - // If a rotate in the other direction is legal, use it. 
+ assert(isPowerOf2_32(EltSizeInBits) && EltSizeInBits > 1 && + "Expecting the type bitwidth to be a power of 2"); + + // If a rotate in the other direction is supported, use it. unsigned RevRot = IsLeft ? ISD::ROTR : ISD::ROTL; - if (isOperationLegal(RevRot, VT)) { - SDValue Sub = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, Op1); + if (isOperationLegalOrCustom(RevRot, VT)) { + SDValue Sub = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); Result = DAG.getNode(RevRot, DL, VT, Op0, Sub); return true; } @@ -6195,15 +6198,13 @@ return false; // Otherwise, - // (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and w-c, w-1))) - // (rotr x, c) -> (or (srl x, (and c, w-1)), (shl x, (and w-c, w-1))) + // (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and -c, w-1))) + // (rotr x, c) -> (or (srl x, (and c, w-1)), (shl x, (and -c, w-1))) // - assert(isPowerOf2_32(EltSizeInBits) && EltSizeInBits > 1 && - "Expecting the type bitwidth to be a power of 2"); unsigned ShOpc = IsLeft ? ISD::SHL : ISD::SRL; unsigned HsOpc = IsLeft ? ISD::SRL : ISD::SHL; SDValue BitWidthMinusOneC = DAG.getConstant(EltSizeInBits - 1, DL, ShVT); - SDValue NegOp1 = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, Op1); + SDValue NegOp1 = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); SDValue And0 = DAG.getNode(ISD::AND, DL, ShVT, Op1, BitWidthMinusOneC); SDValue And1 = DAG.getNode(ISD::AND, DL, ShVT, NegOp1, BitWidthMinusOneC); Result = DAG.getNode(ISD::OR, DL, VT, DAG.getNode(ShOpc, DL, VT, Op0, And0), -------------- next part -------------- A non-text attachment was scrubbed... Name: D82539.275656.patch Type: text/x-patch Size: 2063 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:45 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:45 +0000 (UTC) Subject: [PATCH] D82540: [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount In-Reply-To: References: Message-ID: foad added inline comments. 
================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:6105 +// Check that (every element of) Z is undef or not an exact multiple of BW. +static bool isNonZeroModBitWidth(SDValue Z, unsigned BW) { + return ISD::matchUnaryPredicate( ---------------- arsenm wrote: > !isDivisibleByBitWidth()? I don't think that's a good idea because it's a tri-state question: is Z known to be divisible by BW, or known to be non-zero mod BW, or unknown? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82540/new/ https://reviews.llvm.org/D82540 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:45 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:53:45 +0000 (UTC) Subject: [PATCH] D83210: [RISCV][NFC] Add more tests for 32-bit constant materialization Message-ID: luismarques created this revision. luismarques added reviewers: asb, lenary. Herald added subscribers: llvm-commits, evandro, apazos, sameer.abuasal, pzheng, s.egerton, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, MaskRay, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, simoncook, johnrusso, rbar. Herald added a project: LLVM. The existing tests were mostly for 64-bit constants. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83210 Files: llvm/test/CodeGen/RISCV/imm.ll Index: llvm/test/CodeGen/RISCV/imm.ll =================================================================== --- llvm/test/CodeGen/RISCV/imm.ll +++ llvm/test/CodeGen/RISCV/imm.ll @@ -105,6 +105,57 @@ ret i32 -65536 ; -0x10000 } +; This can be materialized with ADDI+SLLI, improving compressibility. 
+ +define signext i32 @imm_left_shifted_addi() nounwind { +; RV32I-LABEL: imm_left_shifted_addi: +; RV32I: # %bb.0: +; RV32I-NEXT: lui a0, 32 +; RV32I-NEXT: addi a0, a0, -64 +; RV32I-NEXT: ret +; +; RV64I-LABEL: imm_left_shifted_addi: +; RV64I: # %bb.0: +; RV64I-NEXT: lui a0, 32 +; RV64I-NEXT: addiw a0, a0, -64 +; RV64I-NEXT: ret + ret i32 131008 ; 0x1FFC0 +} + +; This can be materialized with ADDI+SRLI, improving compressibility. + +define signext i32 @imm_right_shifted_addi() nounwind { +; RV32I-LABEL: imm_right_shifted_addi: +; RV32I: # %bb.0: +; RV32I-NEXT: lui a0, 524288 +; RV32I-NEXT: addi a0, a0, -1 +; RV32I-NEXT: ret +; +; RV64I-LABEL: imm_right_shifted_addi: +; RV64I: # %bb.0: +; RV64I-NEXT: lui a0, 524288 +; RV64I-NEXT: addiw a0, a0, -1 +; RV64I-NEXT: ret + ret i32 2147483647 ; 0x7FFFFFFF +} + +; This can be materialized with LUI+SRLI, improving compressibility. + +define signext i32 @imm_right_shifted_lui() nounwind { +; RV32I-LABEL: imm_right_shifted_lui: +; RV32I: # %bb.0: +; RV32I-NEXT: lui a0, 56 +; RV32I-NEXT: addi a0, a0, 580 +; RV32I-NEXT: ret +; +; RV64I-LABEL: imm_right_shifted_lui: +; RV64I: # %bb.0: +; RV64I-NEXT: lui a0, 56 +; RV64I-NEXT: addiw a0, a0, 580 +; RV64I-NEXT: ret + ret i32 229956 ; 0x38244 +} + define i64 @imm64_1() nounwind { ; RV32I-LABEL: imm64_1: ; RV32I: # %bb.0: -------------- next part -------------- A non-text attachment was scrubbed... Name: D83210.275653.patch Type: text/x-patch Size: 1728 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:45 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:45 +0000 (UTC) Subject: [PATCH] D82540: [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount In-Reply-To: References: Message-ID: <0d7005b3fa3c27d623eb78d5b4de8a6c@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". 
This revision was automatically updated to reflect the committed changes. foad marked an inline comment as done. Closed by commit rGbabbeafa006f: [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount (authored by foad). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82540/new/ https://reviews.llvm.org/D82540 Files: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Index: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -6117,6 +6117,14 @@ return Ok; } +// Check that (every element of) Z is undef or not an exact multiple of BW. +static bool isNonZeroModBitWidth(SDValue Z, unsigned BW) { + return ISD::matchUnaryPredicate( + Z, + [=](ConstantSDNode *C) { return !C || C->getAPIntValue().urem(BW) != 0; }, + true); +} + bool TargetLowering::expandFunnelShift(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { EVT VT = Node->getValueType(0); @@ -6127,40 +6135,52 @@ !isOperationLegalOrCustomOrPromote(ISD::OR, VT))) return false; - // fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW)) - // fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW) SDValue X = Node->getOperand(0); SDValue Y = Node->getOperand(1); SDValue Z = Node->getOperand(2); - unsigned EltSizeInBits = VT.getScalarSizeInBits(); + unsigned BW = VT.getScalarSizeInBits(); bool IsFSHL = Node->getOpcode() == ISD::FSHL; SDLoc DL(SDValue(Node, 0)); EVT ShVT = Z.getValueType(); - SDValue Mask = DAG.getConstant(EltSizeInBits - 1, DL, ShVT); - SDValue ShAmt, InvShAmt; - if (isPowerOf2_32(EltSizeInBits)) { - // Z % BW -> Z & (BW - 1) - ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask); - // (BW - 1) - (Z % BW) -> ~Z & (BW - 1) - InvShAmt = DAG.getNode(ISD::AND, DL, ShVT, DAG.getNOT(DL, Z, ShVT), Mask); - } else { - SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT); - ShAmt = 
DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); - InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt); - } - SDValue One = DAG.getConstant(1, DL, ShVT); SDValue ShX, ShY; - if (IsFSHL) { - ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt); - SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One); - ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt); + SDValue ShAmt, InvShAmt; + if (isNonZeroModBitWidth(Z, BW)) { + // fshl: X << C | Y >> (BW - C) + // fshr: X << (BW - C) | Y >> C + // where C = Z % BW is not zero + SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); + ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); + InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, ShAmt); + ShX = DAG.getNode(ISD::SHL, DL, VT, X, IsFSHL ? ShAmt : InvShAmt); + ShY = DAG.getNode(ISD::SRL, DL, VT, Y, IsFSHL ? InvShAmt : ShAmt); } else { - SDValue ShX1 = DAG.getNode(ISD::SHL, DL, VT, X, One); - ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt); - ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt); + // fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW)) + // fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW) + SDValue Mask = DAG.getConstant(BW - 1, DL, ShVT); + if (isPowerOf2_32(BW)) { + // Z % BW -> Z & (BW - 1) + ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask); + // (BW - 1) - (Z % BW) -> ~Z & (BW - 1) + InvShAmt = DAG.getNode(ISD::AND, DL, ShVT, DAG.getNOT(DL, Z, ShVT), Mask); + } else { + SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); + ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); + InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt); + } + + SDValue One = DAG.getConstant(1, DL, ShVT); + if (IsFSHL) { + ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt); + SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One); + ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt); + } else { + SDValue ShX1 = DAG.getNode(ISD::SHL, DL, VT, X, One); + ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt); + ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt); + } } Result = 
DAG.getNode(ISD::OR, DL, VT, ShX, ShY); return true; -------------- next part -------------- A non-text attachment was scrubbed... Name: D82540.275658.patch Type: text/x-patch Size: 3921 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:47 2020 From: llvm-commits at lists.llvm.org (Prathamesh via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:47 +0000 (UTC) Subject: [PATCH] D79785: [ARM] Register pressure with -mthumb forces register reload before each call In-Reply-To: References: Message-ID: <0e5cc1325991f0732064e2584776990e@localhost.localdomain> prathamesh updated this revision to Diff 275661. prathamesh added a comment. Hi, Sorry for the late response. In the attached patch, I added a couple more constraints for when we're compiling for Thumb1: 1. Number of args passed + caller's num of args < total number of available regs 2. Each arg to the callee is either a "pass thru" arg, OR an 8-bit imm, OR a constant load. The intent is to allow only those args that need a single register for computation. The motivation is to allow the transform for simple cases that satisfy the above constraints, and to use a direct call otherwise. Does this look reasonable? The patch does not regress ARM tests and converts all calls to bl in the test attached to the patch.
Thanks, Prathamesh Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79785/new/ https://reviews.llvm.org/D79785 Files: llvm/lib/Target/ARM/ARMISelLowering.cpp llvm/test/CodeGen/ARM/minsize-call-cse-2.ll Index: llvm/test/CodeGen/ARM/minsize-call-cse-2.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/ARM/minsize-call-cse-2.ll @@ -0,0 +1,20 @@ +; RUN: llc < %s | FileCheck %s + +target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64" +target triple = "thumbv6m-arm-none-eabi" + +; CHECK-LABEL: f: +; CHECK: bl g +; CHECK: bl g +; CHECK: bl g +; CHECK: bl g +define void @f(i32* %p, i32 %x, i32 %y, i32 %z) minsize optsize { +entry: + call void @g(i32* %p, i32 %x, i32 %y, i32 %z) + call void @g(i32* %p, i32 %x, i32 %y, i32 %z) + call void @g(i32* %p, i32 %x, i32 %y, i32 %z) + call void @g(i32* %p, i32 %x, i32 %y, i32 %z) + ret void +} + +declare void @g(i32*,i32,i32,i32) Index: llvm/lib/Target/ARM/ARMISelLowering.cpp =================================================================== --- llvm/lib/Target/ARM/ARMISelLowering.cpp +++ llvm/lib/Target/ARM/ARMISelLowering.cpp @@ -2228,6 +2228,35 @@ return isa(U) && cast(U)->getParent() == BB; }) > 2; + + // For Thumb1, we impose additional constraints + // due to low number of registers: + // 1. Number of args passed + caller's num of args < + // Total number of available regs + // 2. Each arg to callee, is either a "pass thru" arg, OR 8-bit imm + // OR a constant load. The intent is to allow only those args, + // that need a single register for computation. + + if (PreferIndirect && Subtarget->isThumb1Only()) { + const Instruction *I = cast(*GV->users().begin()); + Function &F = MF.getFunction(); + PreferIndirect = false; + // FIXME: What API to use to get number of available regs + // instead of hardcoding 7 ? 
+ if (F.arg_size() + I->getNumOperands() < 7) { + unsigned i; + for (i = 0; i < I->getNumOperands() - 1; i++) { + Value *O = I->getOperand(i); + if (!(isa(O) || + (i < F.arg_size() && O == F.getArg(i)) || + (isa(O) && + cast(O)->getZExtValue() < 256))) + break; + } + if (i == I->getNumOperands() - 1) + PreferIndirect = true; + } + } } } if (isTailCall) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D79785.275661.patch Type: text/x-patch Size: 2418 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:48 2020 From: llvm-commits at lists.llvm.org (EsmeYi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:48 +0000 (UTC) Subject: [PATCH] D82145: [PowerPC] Legalize SREM/UREM directly on P9. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG0607c8df7faf: [PowerPC] Legalize SREM/UREM directly on P9. (authored by Esme). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82145/new/ https://reviews.llvm.org/D82145 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/test/CodeGen/PowerPC/ppc64-P9-mod.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82145.275668.patch Type: text/x-patch Size: 4426 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:48 2020 From: llvm-commits at lists.llvm.org (Mirko Brkusanin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:48 +0000 (UTC) Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot Message-ID: mbrkusanin created this revision. mbrkusanin added reviewers: foad, arsenm. mbrkusanin added a project: LLVM. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, rovka, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Select ballot intrinsic for GlobalISel. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83214 Files: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83214.275666.patch Type: text/x-patch Size: 9673 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:49 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:49 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Message-ID: ebevhan created this revision. ebevhan added reviewers: leonardchan, craig.topper, bjope. Herald added subscribers: llvm-commits, jdoerfert, hiraditya. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are needed for implementing the Embedded-C fixed point support in Clang. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83216 Files: llvm/docs/LangRef.rst llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/include/llvm/IR/Intrinsics.td llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/Verifier.cpp llvm/test/CodeGen/X86/sshl_sat.ll llvm/test/CodeGen/X86/ushl_sat.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83216.275676.patch Type: text/x-patch Size: 37395 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:49 2020 From: llvm-commits at lists.llvm.org (Ronak Chauhan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:49 +0000 (UTC) Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors In-Reply-To: References: Message-ID: rochauha updated this revision to Diff 275674. rochauha added a comment. - Compute .amdhsa_next_free_vgpr based on the inverse of what the assembler does to compute GRANULATED_WORKITEM_VGPR_COUNT. - Some changes to accommodate differences between GFX9 and GFX10 - Updated test case for GFX10 as well Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80713/new/ https://reviews.llvm.org/D80713 Files: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h llvm/test/tools/llvm-objdump/ELF/AMDGPU/code-object-v3.ll llvm/tools/llvm-objdump/llvm-objdump.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D80713.275674.patch Type: text/x-patch Size: 22236 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:50 2020 From: llvm-commits at lists.llvm.org (Ronak Chauhan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:50 +0000 (UTC) Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors In-Reply-To: References: Message-ID: rochauha marked an inline comment as done. rochauha added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1245 + // + // To to get the exact same bytes in re-assembled binary, we disassemble + // aamdhsa_next_free_sgpr as the amdgcn.next_free_sgpr assembler symbol and ---------------- rochauha wrote: > scott.linder wrote: > > For this and the above case we should have tests to prove this out. I.e.
assemble sources to a binary, disassemble and reassemble it, and then compare the two binaries. Ideally we would do this for some edge cases around VGPR/SGPR allocation granularity. > > > > There may need to be some fixup between disassembly and reassembly to account for the remaining non-reassembleable bits produced by llvm-objdump, but they should be pretty minor for a trivial kernel, and I would expect you could handle them with just `sed` which seems to be available to LIT tests. > Right now we can't really re-assemble in the lit-test. This needs to be tested 'informally' by: > > - Manually writing a small test case. Make a copy of it too. > - Assembling it into the binary : Binary-1. > - Disassembling it. > - Replace the original kernel descriptor with the disassembled kernel descriptor in the copy. > - Assemble the copy : Binary-2. > - Compare Binary-1 and Binary-2. Went this route to check whether re-assembled binaries match or not. Turns out that both binaries match, in size (overall size as well as size of sections) and also in terms of all the disassembled content. But a `diff object1 object2` says that binary files differ. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80713/new/ https://reviews.llvm.org/D80713 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:51 2020 From: llvm-commits at lists.llvm.org (Simon Moll via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:51 +0000 (UTC) Subject: [PATCH] D83200: [VE] Change to use isa In-Reply-To: References: Message-ID: <5891f5c4c188c5fce338db0b8a74b4ce@localhost.localdomain> simoll accepted this revision. simoll added a comment. This revision is now accepted and ready to land. I'd say you can commit a minor NFC like this one (in particular on a code base you have authored for the most part) without review. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83200/new/ https://reviews.llvm.org/D83200 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:51 2020 From: llvm-commits at lists.llvm.org (Simon Moll via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:51 +0000 (UTC) Subject: [PATCH] D83170: [VE] Support symbol with offset in assembly In-Reply-To: References: Message-ID: <5d98f8ce3a1c5fcd405e16406df12f09@localhost.localdomain> simoll accepted this revision. simoll added a comment. This revision is now accepted and ready to land. Thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83170/new/ https://reviews.llvm.org/D83170 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:52 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:52 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: ABataev added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5121 + if (I->mayReadOrWriteMemory() && !isSideeffectIntrinsic(I) && + !isVectorizableLibFunctionCall(I)) { // Update the linked list of memory accessing instructions. ---------------- sanwou01 wrote: > ABataev wrote: > > Why do you need to exclude vectorizable library functions here? > For "normal" function calls, we have to assume that the functions may read or write memory any location in memory, which may alias memory read or written by another instruction in the same bundle. For functions with vector variants, we should be able to assume that they are pure: they won't write to memory (except when the function takes pointer arguments, which I'm not handling correctly now that I think about it; I'll fix that up). > > Actually, the calculateDependencies function below might be the less-surprising place to handle this. 
Then it is better to add correct attributes to such functions. Special processing may lead to problems later. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:52 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:52 +0000 (UTC) Subject: [PATCH] D83219: [ARM] Add MVE_TwoOpPattern. NFC Message-ID: dmgreen created this revision. dmgreen added reviewers: SjoerdMeijer, simon_tatham, efriedma, samparker, ostannard. Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Herald added a project: LLVM. This commons out a chunk of the different two operand MVE patterns into a single helper multidef. Or technically two multidef patterns so that the Dup qr patterns can also get the same treatment. This is most of the two address instructions that we have some codegen pattern for (not ones that we select purely from intrinsics). It does not include shifts, which are more spread out and will need some extra work to be given the same treatment. https://reviews.llvm.org/D83219 Files: llvm/lib/Target/ARM/ARMInstrMVE.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D83219.275684.patch Type: text/x-patch Size: 28243 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:53 2020 From: llvm-commits at lists.llvm.org (Daniel Kiss via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:53 +0000 (UTC) Subject: [PATCH] D77786: [AArch64] Add v8.5 Branch Target Identification support. In-Reply-To: References: Message-ID: <939ffc641f6bda7f7138d1da6bc59273@localhost.localdomain> danielkiss added a comment. Ping. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77786/new/ https://reviews.llvm.org/D77786 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:53 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:53 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: <69c9ea41e09a4e28d53349c9cda9cabb@localhost.localdomain> kuter updated this revision to Diff 275688. kuter added a comment. Introduce a test for the command-line option. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/Attributor.cpp llvm/test/Transforms/Attributor/allow_list.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83185.275688.patch Type: text/x-patch Size: 5537 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:53 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:53 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: <48d3e577d4bbded29adef93d7b621bd8@localhost.localdomain> kuter marked an inline comment as done. kuter added inline comments. ================ Comment at: llvm/lib/Transforms/IPO/Attributor.cpp:1460 + return true; + return std::count(SeedAllowList.begin(), SeedAllowList.end(), AA.getName()); +} ---------------- sstefan1 wrote: > would it make sense to make this always check lower case names, to avoid mistakes? Since the name is usually the name of the class that defines the attribute, I think it makes more sense to use the name as is.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:53 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:53 +0000 (UTC) Subject: [PATCH] D83220: [DWARFYAML][unittest] Refactor parseDWARFYAML(). Message-ID: Higuoxing created this revision. Higuoxing added reviewers: jhenderson, grimar, MaskRay. Herald added subscribers: llvm-commits, aprantl. Herald added a project: LLVM. In this change, `parseDWARFYAML()` is refactored to be able to parse a YAML description into different data structures. In the future, we won't have to craft the whole DWARF structure for a small test. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83220 Files: llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83220.275689.patch Type: text/x-patch Size: 4695 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:53 2020 From: llvm-commits at lists.llvm.org (Sander de Smalen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:53 +0000 (UTC) Subject: [PATCH] D82524: [SVE][CodeGen] Fix bug when falling back to DAG ISel In-Reply-To: References: Message-ID: sdesmalen accepted this revision. sdesmalen added a comment. This revision is now accepted and ready to land. LGTM! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82524/new/ https://reviews.llvm.org/D82524 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:54 2020 From: llvm-commits at lists.llvm.org (LuoYuanke via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:54 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: LuoYuanke added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:3774 + // creating a temporary stack slot, again.
+ if (Flags.isByVal() && !hasCopy) return CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl); ---------------- Let me check whether the following two rules are right. 1. On Linux, when the byval attribute is set, it indicates that the value pointed to by the pointer is copied to the parameter stack. 2. On Windows, when the byval attribute is set, it indicates that a temporary object is allocated in the caller, the value is copied to the temporary, and the pointer to the temporary object is stored to the parameter stack. On Linux, VA.getLocInfo() is CCValAssign::Full, and on Windows VA.getLocInfo() is CCValAssign::Indirect. So I think we can just check VA.getLocInfo(). If VA.getLocInfo() is CCValAssign::Indirect, we must NOT copy the object. Instead, we just store the pointer. `(Flags.isByVal() && VA.getLocInfo() != CCValAssign::Indirect)` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:54 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:54 +0000 (UTC) Subject: [PATCH] D83220: [DWARFYAML][unittest] Refactor parseDWARFYAML(). In-Reply-To: References: Message-ID: Higuoxing updated this revision to Diff 275692. Higuoxing added a comment. Update. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83220/new/ https://reviews.llvm.org/D83220 Files: llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83220.275692.patch Type: text/x-patch Size: 4898 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:55 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:55 +0000 (UTC) Subject: [PATCH] D83181: [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) In-Reply-To: References: Message-ID: <10015045cda8a0098a04d425d383bb30@localhost.localdomain> RKSimon added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:40260 + TLI.isTypeLegal(VT) && ((Subtarget.hasAVX() && EltBitWidth == 32) || + (Subtarget.hasAVX2() && EltBitWidth == 64)); + if (CanShiftBlend && ---------------- XOP has more vector shifts and vpcmov which should allow 8/16-bit cases as well - I added testing at rGd6c72bdca2f2 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83181/new/ https://reviews.llvm.org/D83181 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:56 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:53:56 +0000 (UTC) Subject: [PATCH] D83034: [GlobalISel] Don't skip adding predicate matcher In-Reply-To: References: Message-ID: nhaehnle added a comment. Could you provide a more concrete example that goes wrong without this change? I believe that could help understanding, would be good to add to the commit message for this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83034/new/ https://reviews.llvm.org/D83034 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:55 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:55 +0000 (UTC) Subject: [PATCH] D83222: [ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z) Message-ID: dmgreen created this revision. 
dmgreen added reviewers: SjoerdMeijer, simon_tatham, efriedma, samparker, ostannard. Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Herald added a project: LLVM. Most MVE instructions can be predicated to fold a select into the instruction, using the predicate and the select "else" as a passthrough. This adds tablegen patterns for most two-operand instructions using the newly added TwoOpPattern from D83219. It could probably be done differently, perhaps after ISel as a peephole optimisation, but doing it in tblgen gives a good excuse to clear up some of the existing patterns. https://reviews.llvm.org/D83222 Files: llvm/lib/Target/ARM/ARMInstrMVE.td llvm/test/CodeGen/Thumb2/LowOverheadLoops/cond-vector-reduce-mve-codegen.ll llvm/test/CodeGen/Thumb2/mve-pred-selectop.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83222.275690.patch Type: text/x-patch Size: 41712 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:57 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:57 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: <8b77836c9b4a3ff85816e8920c32511e@localhost.localdomain> djtodoro marked an inline comment as done. djtodoro added inline comments. ================ Comment at: llvm/test/Transforms/DeadArgElim/multdeadretval.ll:7 +; RUN: opt < %s -enable-debugify=synthetic -deadargelim -S 2>&1 \ +; RUN: | FileCheck %s -check-prefix=DEBUG ---------------- vsk wrote: > I recommend copying this test, modifying it to include debug info, and dropping the -enable-debugify=synthetic part. This bugfix doesn't need to depend on the debugify original mode patchset.
Also, the hardcoded checks for DILocation line numbers will make this test hard to modify, so if we want to check specific synthetic line numbers I think we'd be better served by a dedicated test. It makes sense to me! Thanks! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:57 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:57 +0000 (UTC) Subject: [PATCH] D82458: [ARM] Adjust default fp extend and trunc costs In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGafdb2ef2ed9d: [ARM] Adjust default fp extend and trunc costs (authored by dmgreen). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82458/new/ https://reviews.llvm.org/D82458 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast.ll llvm/test/Analysis/CostModel/ARM/cast_ldst.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82458.275694.patch Type: text/x-patch Size: 205917 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:57 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:57 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: <62216e156cc787af066b5d1440a2b997@localhost.localdomain> djtodoro marked an inline comment as done. djtodoro added inline comments. 
================ Comment at: llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp:973 // return value V = ExtractValueInst::Create(NewCB, NewRetIdxs[Ri], "newret", InsertPt); ---------------- vsk wrote: > aprantl wrote: > > @vsk: would it be better style to ad-hoc create an IRBuilder with the correct debug location here? > Yes, that seems to be the common idiom. I recommend using `IRBuilder` to avoid spurious test changes. Oh.. The `IRBuilder<>` will generate a debug loc by default (via `insert()` method). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:57 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:57 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: <1b80cd4dd0d8fb6385d8fd1c065fb72b@localhost.localdomain> djtodoro updated this revision to Diff 275695. djtodoro added a comment. - Use `IRBuilder<>` since it generates dbg loc by default - Create new test CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 Files: llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81939.275695.patch Type: text/x-patch Size: 18000 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:53:58 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:58 +0000 (UTC) Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot In-Reply-To: References: Message-ID: <5ee09162d960264de67f2a5267a401a8@localhost.localdomain> arsenm requested changes to this revision. arsenm added inline comments. This revision now requires changes to proceed. 
================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1054-1059 + MachineInstr *Value = MRI->getVRegDef(I.getOperand(2).getReg()); + if (Value->getOpcode() == AMDGPU::COPY) + Value = MRI->getVRegDef(Value->getOperand(1).getReg()); + + if (Value->getOpcode() == AMDGPU::G_CONSTANT) { + const APInt &Val = Value->getOperand(1).getCImm()->getValue(); ---------------- You want getConstantVRegVal instead of looking through a copy and checking for G_CONSTANT ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1062 + if (Val.isNullValue()) { + unsigned Opcode = Is64 ? AMDGPU::V_MOV_B32_e64 : AMDGPU::V_MOV_B32_e32; + BuildMI(*BB, &I, DL, TII.get(Opcode), DstReg).addImm(0); ---------------- This doesn't make any sense; there's no reason to ever use the VOP3 encoded form of v_mov_b32. It's not a 64-bit move. This also returns a scalar value ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1065 + } else if (Val.isAllOnesValue()) { + unsigned SrcReg = Is64 ? AMDGPU::EXEC : AMDGPU::EXEC_LO; + BuildMI(*BB, &I, DL, TII.get(AMDGPU::COPY), DstReg).addReg(SrcReg); ---------------- Register ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:4165 + unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits(); + OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize); + OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize); ---------------- This returns an SGPR value Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83214/new/ https://reviews.llvm.org/D83214 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:58 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:58 +0000 (UTC) Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot In-Reply-To: References: Message-ID: arsenm added inline comments.
================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:8 + +define i64 @test0() { +; CHECK-LABEL: test0: ---------------- Can you give the tests more descriptive names, like constant_false, constant_true? Also the function returns should use SGPRs, so switch to shader calling conventions? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83214/new/ https://reviews.llvm.org/D83214 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:59 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:59 +0000 (UTC) Subject: [PATCH] D82547: [Debugify] Expose debugify (original mode) as CC1 option In-Reply-To: References: Message-ID: <6ef427e063b1c6fe04d7922af3858fa7@localhost.localdomain> djtodoro marked 2 inline comments as done. djtodoro added inline comments. ================ Comment at: clang/lib/CodeGen/BackendUtil.cpp:855 +class ClangCustomPassManager : public legacy::PassManager { +public: ---------------- vsk wrote: > Please factor out OptCustomPassManager from opt and generalize it so it can be used by both opt and clang. That should help ensure that extensions and bug fixes are only made to one custom 'debugify' pass manager. I'll try that with the latest code. I remember I've tried it once, but I ended up moving it into the IR library (since we need to link it within legacy pass manager). ================ Comment at: clang/lib/CodeGen/BackendUtil.cpp:893 + + void enableDebugifyEachOriginal() { DebugifyEachOriginalEnabled = true; } + ---------------- vsk wrote: > I don't think the discussion from 'RFC: Introduce LLVM DI Checker utility' is complete, and I'd ask that you split off changes for 'original mode' from this patch until there's some consensus about what that mode should look like.
> > There are open questions about to what extent a new mode is needed (e.g., it may be that the interesting questions compiler developers need to answer about debug info loss are simpler to determine some other way (which is not to say that that's true -- just that we haven't explored the space much yet)). Or what its output should look like. OK, I'll split off this and notify you/resend a message on the RFC. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82547/new/ https://reviews.llvm.org/D82547 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:00 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:00 +0000 (UTC) Subject: [PATCH] D82359: [Power10] Implement Vector Replace Builtins in LLVM/Clang In-Reply-To: References: Message-ID: lei added inline comments. ================ Comment at: clang/lib/Headers/altivec.h:17037 + })(x)) \ + .ui +#define DP2LL(x) \ ---------------- This looks just like a cast to `unsigned int`, can you explain why it needs to be cast to a union to extract the unsigned int instead of just directly casting it to an `unsigned int`? ================ Comment at: clang/lib/Headers/altivec.h:17044 + .ull + + ---------------- nit: remove extra empty line. ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:519 + // CHECK-BE-NEXT: ret <4 x i32> + // CHECK-LE: @llvm.ppc.altivec.vinsw(<4 x i32> %{{.+}}, i64 %{{.+}}, i32 12 + // CHECK-LE-NEXT: ret <4 x i32> ---------------- I don't see why you have 2 different CHECK prefix for the same run line. Seems redundant to me. Please update to just use `CHECK`. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82359/new/ https://reviews.llvm.org/D82359 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:00 2020 From: llvm-commits at lists.llvm.org (Madhur Amilkanthwar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:00 +0000 (UTC) Subject: [PATCH] D83034: [GlobalISel] Don't skip adding predicate matcher In-Reply-To: References: Message-ID: <95de76b491a79e99f4fcb3b23cee3a4f@localhost.localdomain> madhur13490 added a subscriber: arsenm. madhur13490 added a comment. Ping! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83034/new/ https://reviews.llvm.org/D83034 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:01 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:01 +0000 (UTC) Subject: [PATCH] D82387: [flang] add RTBuilder In-Reply-To: References: Message-ID: <7291fbdeedbc2408b140944fa7971cfe@localhost.localdomain> DavidTruby added a comment. Is there a reason to use `float _Complex` here at all? The C++ standard (29.5.4 of C++17) guarantees that `std::complex` and `float _Complex` are layout compatible and can be reinterpret_casted to each other so even if these functions are intended to be callable from C/interoperable with _Complex in C code, it'd be better to use std::complex on the C++ side. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82387/new/ https://reviews.llvm.org/D82387 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:01 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:01 +0000 (UTC) Subject: [PATCH] D82995: [UpdateTestChecks] Allow $ in function names In-Reply-To: References: Message-ID: <06fa7c087e8de0fa14c3bcb6e0c9250d@localhost.localdomain> greened added a comment. 
In D82995#2129326 , @spatel wrote: > It would be good to check in a test example alongside this change, so we know it works. (And we'll know what we are losing if this has to be reverted for some reason.) Will do. I was just recently pointed to where tests live. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82995/new/ https://reviews.llvm.org/D82995 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:01 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:01 +0000 (UTC) Subject: [PATCH] D82995: [UpdateTestChecks] Allow $ in function names In-Reply-To: References: Message-ID: <4e61a59b2ff4e572cb20d6df88718cc9@localhost.localdomain> greened marked an inline comment as done. greened added inline comments. ================ Comment at: llvm/utils/UpdateTestChecks/asm.py:122 ASM_FUNCTION_ARM_MACHO_RE = re.compile( r'^_(?P<func>[^:]+):[ \t]*\n' ---------------- spatel wrote: > Do these ARM regexes not need the extra '?' like the others? I'm not entirely certain. `<func>` will match `"` because `[^:]` matches `"`. I originally developed this against x86_64 and the way the asm printer works for that target, the function symbol itself is printed without quotes (`$` is fine in symbol names) but the *comment* following the label includes the quotes. I did the same for every other target though I should verify that is correct (I will add tests for every target). Since these AArch64 patterns don't include a comment after the labels I didn't have anywhere to put quotes.
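To illustrate the point about `[^:]` also matching `"`, here is a minimal sketch of the same pattern written for C++ `std::regex`. This is an illustration only: `std::regex` has no named groups, so the `func` group becomes capture group 1, the helper name `matchMachOLabel` is hypothetical, and the input strings in the usage note are invented rather than real asm printer output.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of UpdateTestChecks.
// Returns the text captured as the function name, or an empty string
// when the input does not start with a "_name:" label line.
inline std::string matchMachOLabel(const std::string &Asm) {
  // Same shape as the Python pattern; group 1 stands in for (?P<func>...).
  static const std::regex LabelRE(R"(^_([^:]+):[ \t]*\n)");
  std::smatch M;
  if (std::regex_search(Asm, M, LabelRE))
    return M[1].str();
  return "";
}
```

Under these assumptions, calling `matchMachOLabel` on an invented line such as `_"foo$bar":` followed by a newline captures the name with its quotes, since the character class excludes only `:`.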
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82995/new/ https://reviews.llvm.org/D82995 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:02 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:02 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay, atanasyan. Herald added subscribers: rupprecht, arichardson, sdardis, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. `MipsGOTParser` is a helper class that is used to dump MIPS GOT and PLT. There is a problem with it: it might call `report_fatal_error()` on invalid input. When this happens, the tool reports a crash: # command stderr: LLVM ERROR: Cannot find PLTGOT dynamic table tag. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Stack dump: ... Such errors were not tested. In this patch I've refactored `MipsGOTParser`: I've split the handling of GOT and PLT into separate methods. This allows propagating any possible errors to the caller and should allow dumping the PLT when something is wrong with the GOT, and vice versa, in the future. I've added tests for each `report_fatal_error()` and am now calling `reportError` instead. In the future we might want to switch to reporting warnings, but that requires additional testing and should be performed independently. I've kept `unwrapOrError` calls untouched for now as I'd like to focus on eliminating `report_fatal_error` calls in this patch only (to stop crashing on invalid inputs when fuzzing inputs). https://reviews.llvm.org/D83225 Files: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83225.275700.patch Type: text/x-patch Size: 53073 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:02 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:02 +0000 (UTC) Subject: [PATCH] D82365: [Power10] Implement Vector Insert Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <2d9fefb10b1931b7c5d17a8164b71ddc@localhost.localdomain> lei added inline comments. ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:12 +// RUN: -target-cpu pwr10 -triple powerpc64le-unknown-unknown -emit-llvm %s \ +// RUN: -o - | FileCheck %s -check-prefix=CHECK-LE + ---------------- biplmish wrote: > lei wrote: > > I just noticed this. There is no need to add this RUN line since it's the same as the one on line 2. Please post a patch to remove this and update tests to use the default `CHECK`. > Sure. However there are also tests in Line 125,133 etc which would need modification. > > Can we also do "RUN: -o - | FileCheck %s -check-prefixes=CHECK,CHECK-LE" in the test1 and remove the test3 so that the tests work in the current format. Please make all the necessary modification all affected testcases to use `CHECK` instead of `CHECK-LE`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82365/new/ https://reviews.llvm.org/D82365 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:02 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:02 +0000 (UTC) Subject: [PATCH] D83020: [AMDGPU] Avoid using s_cmpk when src0 is not register In-Reply-To: References: Message-ID: arsenm accepted this revision. arsenm added inline comments. This revision is now accepted and ready to land. 
================ Comment at: llvm/test/CodeGen/AMDGPU/cmp_shrink.mir:5 +# GCN: bb.0: +# GCN-NOT: S_CMPK_GT_I32 +--- ---------------- ruiling wrote: > arsenm wrote: > > positive checks are more useful. Also you can just generate these checks. Can you reproduce this with an IR test too? > will try positive check, how to generate the checks? could you give a little bit more info? The original test case that hit the issue is over-complex I think. Normally, a constant expression at IR level is easy to be optimized off by the middle-end. so I think a .mir test is enough for this issue. So what is the context this appears? Why wasn't it optimized out? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83020/new/ https://reviews.llvm.org/D83020 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:02 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:02 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: <39759edd1d08036bc8afcfe62c1ac67b@localhost.localdomain> dantrushin marked 5 inline comments as done. dantrushin added inline comments. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:125 +// Return statepoint GC args as a set +static SmallSet collectGCRegs(MachineInstr &MI) { + StatepointOpers SO(&MI); ---------------- skatkov wrote: > Do I understand correctly that with your changes ALL GC pointers must be defs? > So why do you need these iterations instead of just taking all defs? Strictly speaking, no. Only derived pointers passed in registers. Are we guaranteed that all base pointers will appear as derived ones too? 
If yes, then it is good catch, taking them from defs is simpler (but taking them from operand list instead of def list sounds a bit more natural, IMHO) ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:401 + + // To insert reload at the end of MBB, insert it before last instruction + // and then swap them. ---------------- skatkov wrote: > what is the reason for this magic? The reason is that `TTI.loadRegFromStackSlot` can insert load only **before** some existing instruction. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:131 + + // Skip deopt args + while (NumDeoptArgs--) ---------------- skatkov wrote: > dantrushin wrote: > > skatkov wrote: > > > What if deopt args contains gc pointer? > > At this point, we can not know. > > And your code handles all deopt args uniformly already ;) > > Mine adds some more restrictions > This is because I'm sure that GC pointer cannot be on register until your changes. So nothing has changed. At this point there is no way to detect deopt pointer which is not in gc list. ISEL determines what pointers to pass where. If implementation cannot handle pointer deopt value, not present in gc list, it should not enable it at all or spill **all** deopt values Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:02 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:02 +0000 (UTC) Subject: [PATCH] D83034: [GlobalISel] Don't skip adding predicate matcher In-Reply-To: References: Message-ID: <21eef3226f3d519767b0ca9c80aaff07@localhost.localdomain> arsenm added a comment. I would expect the existing tablegen tests to break from this and need updating? 
================ Comment at: llvm/utils/TableGen/GlobalISelEmitter.cpp:3575 + HasPredicateCode = true; + InsnMatcher.addPredicate(Predicate); + } ---------------- I think this should remain as the last predicate type added to the matcher, after everything else here. The ordering does matter (see D82331) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83034/new/ https://reviews.llvm.org/D83034 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:03 2020 From: llvm-commits at lists.llvm.org (Ties Stuij via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:03 +0000 (UTC) Subject: [PATCH] D79709: [AArch64][BFloat] basic AArch64 bfloat support In-Reply-To: References: Message-ID: <0347701d80fe0f2bba8bbc420ae8b760@localhost.localdomain> stuij marked an inline comment as done. stuij added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:135 addRegisterClass(MVT::f16, &AArch64::FPR16RegClass); + addRegisterClass(MVT::bf16, &AArch64::FPR16RegClass); addRegisterClass(MVT::f32, &AArch64::FPR32RegClass); ---------------- c-rhodes wrote: > Shouldn't this and the types below be predicated on `Subtarget->hasBF16()`? > > We've been fixing up cases in SVE for bfloat intrinsics where we missed predicating intrinsics / patterns on `+bf16`. I fixed this for the sizeless bfloat types added here in D82494 and it revealed the places we'd forgot to add the guard. Sorry, I missed this comment. Yes, you're right, we should clean this up. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79709/new/ https://reviews.llvm.org/D79709 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:02 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:02 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). 
In-Reply-To: References: Message-ID: grimar updated this revision to Diff 275701. grimar added a comment. - Remove unused YAML line. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 Files: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83225.275701.patch Type: text/x-patch Size: 53053 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:03 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:03 +0000 (UTC) Subject: [PATCH] D82331: TableGen/GlobalISel: Partially fix nontrivial, custom predicates In-Reply-To: References: Message-ID: <7eb1e78cd50c02451085c95fc9db0467@localhost.localdomain> arsenm added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82331/new/ https://reviews.llvm.org/D82331 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:04 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:54:04 +0000 (UTC) Subject: [PATCH] D83092: DomTree: Add findSiblingOfUncle helper In-Reply-To: References: Message-ID: <05a6ee920ea54a230add109d6917d432@localhost.localdomain> nhaehnle marked an inline comment as done. nhaehnle added inline comments. ================ Comment at: llvm/lib/Support/GenericDomTree.cpp:220 +/// the degenerate case where \p A itself is a sibling of \p Uncle. +const GenericDomTreeNodeBase *GenericDominatorTreeBase::findSiblingOfUncle( + const GenericDomTreeNodeBase *A, ---------------- arsenm wrote: > I'm not sure these are the right family analogies. This could also find a great uncle, or the same parent. Fair enough, do you have a suggestion for a better name? `findSiblingOfNthUncle` perhaps? 
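For readers following the naming question, here is a rough sketch of one possible reading of the helper, assuming it walks up from `A` until it reaches a node that shares a parent with `Uncle`. The `Node` type and the function name `findSiblingOfUncleSketch` are hypothetical stand-ins, not the classes or code from D83092, and the real helper may behave differently at the edges.

```cpp
#include <cassert>

// Hypothetical stand-in for a dominator tree node; not LLVM's class.
struct Node {
  const Node *IDom; // immediate dominator (tree parent), null for the root
  explicit Node(const Node *Parent = nullptr) : IDom(Parent) {}
};

// Walk up from A until reaching a node whose parent is also Uncle's parent.
// Returns A itself when A is already a sibling of Uncle (the degenerate
// case), and nullptr when no ancestor of A hangs off Uncle's parent.
inline const Node *findSiblingOfUncleSketch(const Node *A, const Node *Uncle) {
  for (const Node *N = A; N; N = N->IDom)
    if (N->IDom == Uncle->IDom)
      return N;
  return nullptr;
}
```

Under this reading the walk can climb any number of levels, so `Uncle` may just as well be a great-uncle of `A`, which is the concern raised about the family analogy.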
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83092/new/ https://reviews.llvm.org/D83092 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:04 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:04 +0000 (UTC) Subject: [PATCH] D83181: [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) In-Reply-To: References: Message-ID: <7c2f45d68cc6fadcdd27423b537cf0de@localhost.localdomain> spatel updated this revision to Diff 275705. spatel marked an inline comment as done. spatel added a comment. Patch updated: Enable transform for XOP targets. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83181/new/ https://reviews.llvm.org/D83181 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vselect-pcmp.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83181.275705.patch Type: text/x-patch Size: 12114 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:04 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:04 +0000 (UTC) Subject: [PATCH] D83181: [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) In-Reply-To: References: Message-ID: <1b110ee97c5eea3372b192382701d7e1@localhost.localdomain> spatel marked an inline comment as done. spatel added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:40260 + TLI.isTypeLegal(VT) && ((Subtarget.hasAVX() && EltBitWidth == 32) || + (Subtarget.hasAVX2() && EltBitWidth == 64)); + if (CanShiftBlend && ---------------- RKSimon wrote: > XOP has more vector shifts and vpcmov which should allow 8/16-bit cases as well - I added testing at rGd6c72bdca2f2 Ok - I'll enable XOP for all legal types, and we can decide if we need to exclude any types based on those diffs. 
I don't have a good sense of what's good/bad/possible with those instructions. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83181/new/ https://reviews.llvm.org/D83181 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:04 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:04 +0000 (UTC) Subject: [PATCH] D83227: [flang] Add algorithm include to runtime/file.cpp for std::min Message-ID: DavidTruby created this revision. DavidTruby added reviewers: sscalpone, klausler. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. std::min is only guaranteed to exist in the <algorithm> header. Currently this builds with libstdc++ because the algorithm include is coming transitively from elsewhere, however it doesn't build with libc++. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83227 Files: flang/runtime/file.cpp Index: flang/runtime/file.cpp =================================================================== --- flang/runtime/file.cpp +++ flang/runtime/file.cpp @@ -9,6 +9,7 @@ #include "file.h" #include "magic-numbers.h" #include "memory.h" +#include <algorithm> #include #include #include -------------- next part -------------- A non-text attachment was scrubbed... Name: D83227.275706.patch Type: text/x-patch Size: 317 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:04 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:04 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: <51334d13a34463d269ea0ac1c6c8ef54@localhost.localdomain> foad added a comment. @lebedev.ri this is causing assertion failures and verification failures in some of our downstream tests.
Here's a test case: $ cat reduced.ll define void @main(<3 x i32> inreg %w) { entry: %a = extractelement <3 x i32> undef, i32 0 %b = extractelement <3 x i32> undef, i32 1 %x = extractelement <3 x i32> %w, i32 2 %y = insertelement <4 x i32> undef, i32 %x, i32 2 %z = insertelement <4 x i32> %y, i32 undef, i32 3 store <4 x i32> %z, <4 x i32> addrspace(7)* undef, align 16 ret void } $ ~/llvm-debug/bin/opt -scalarizer -o /dev/null reduced.ll Instruction does not dominate all uses! = extractelement [145938144 x half] , i32 undef %z.upto2 = insertelement <4 x i32> undef, i32 , i32 2 in function main LLVM ERROR: Broken function found, compilation aborted! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:05 2020 From: llvm-commits at lists.llvm.org (Sergej Jaskiewicz via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:05 +0000 (UTC) Subject: [PATCH] D83228: [llvm] [unittests] Remove some temporary files after they're not needed Message-ID: broadwaylamb created this revision. broadwaylamb added reviewers: sammccall, chandlerc, thakis. Herald added subscribers: llvm-commits, jfb, atanasyan, jrtc27, mgorny, sdardis. Herald added a project: LLVM. Some LLVM unit tests forget to clean up the temporary files and directories. Use existing RAII classes for cleaning them up, remove duplicated code (`ScopedDir`, `ScopedFile`, `ScopedLink` classes) by extracting it into a common header file. 
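As a rough illustration of the RAII cleanup idea described above, here is a minimal sketch built on C++17 `std::filesystem`. The class name `ScopedTempFile` is hypothetical and this is not the code from the patch, which uses the existing LLVM test helpers; it only shows the general shape of a guard that removes its file when it goes out of scope.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical RAII guard, not one of the classes from D83228.
class ScopedTempFile {
public:
  // Creates (or truncates) a file with the given name in the temp directory.
  explicit ScopedTempFile(const std::string &Name)
      : Path(fs::temp_directory_path() / Name) {
    std::ofstream Out(Path);
  }

  // Removes the file; the error_code overload keeps the destructor from
  // throwing if the file is already gone.
  ~ScopedTempFile() {
    std::error_code EC;
    fs::remove(Path, EC);
  }

  // Non-copyable so that exactly one object owns the cleanup.
  ScopedTempFile(const ScopedTempFile &) = delete;
  ScopedTempFile &operator=(const ScopedTempFile &) = delete;

  const fs::path &path() const { return Path; }

private:
  fs::path Path;
};
```

A test body would declare the guard at the top of a scope and use `path()` for the duration; the file disappears on every exit path, including early returns from failed assertions. A fixed file name can collide between concurrently running tests, so a real helper would want a unique name.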
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83228 Files: llvm/unittests/CMakeLists.txt llvm/unittests/ProfileData/SampleProfTest.cpp llvm/unittests/Support/FileCollectorTest.cpp llvm/unittests/Support/FileUtilitiesTest.cpp llvm/unittests/Support/TarWriterTest.cpp llvm/unittests/Support/VirtualFileSystemTest.cpp llvm/unittests/tools/llvm-exegesis/Mips/BenchmarkResultTest.cpp llvm/unittests/tools/llvm-exegesis/X86/SnippetFileTest.cpp llvm/utils/unittest/support/llvm-test-helper.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83228.275704.patch Type: text/x-patch Size: 13194 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:05 2020 From: llvm-commits at lists.llvm.org (Mikhail Goncharov via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:05 +0000 (UTC) Subject: [PATCH] D83230: Revert "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`" Message-ID: goncharov created this revision. goncharov added a reviewer: kadircet. Herald added a reviewer: bollu. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. This reverts commit 2c16100e6f72075564ea1f67fa5a82c269dafcd3 . 
ninja check-polly fails: Polly :: Isl/CodeGen/MemAccess/generate-all.ll Polly :: ScopInfo/multidim_srem.ll Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83230 Files: llvm/lib/Analysis/ScalarEvolution.cpp llvm/test/Analysis/ScalarEvolution/sdiv.ll llvm/test/Analysis/ScalarEvolution/srem.ll Index: llvm/test/Analysis/ScalarEvolution/srem.ll =================================================================== --- llvm/test/Analysis/ScalarEvolution/srem.ll +++ llvm/test/Analysis/ScalarEvolution/srem.ll @@ -14,11 +14,11 @@ ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = srem i32 %i.0, 2 -; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i32) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i32) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> %rem U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i64) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i64) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> (sext i32 %rem to i64) U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * (zext i1 {false,+,true}<%for.cond> to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * (zext i1 (trunc i32 %width to i1) to i64)) + %storage) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %1 = load i32, i32* %arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> 
LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) Index: llvm/test/Analysis/ScalarEvolution/sdiv.ll =================================================================== --- llvm/test/Analysis/ScalarEvolution/sdiv.ll +++ llvm/test/Analysis/ScalarEvolution/sdiv.ll @@ -14,11 +14,11 @@ ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = sdiv i32 %i.0, 2 -; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,1073741824) S: [0,1073741824) Exits: (%width /u 2) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> %rem U: full-set S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,2147483648) S: [0,2147483648) Exits: ((zext i32 %width to i64) /u 2) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> (sext i32 %rem to i64) U: [-2147483648,2147483648) S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * ({0,+,1}<%for.cond> /u 2)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * ((zext i32 %width to i64) /u 2)) + %storage) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %1 = load i32, i32* %arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) Index: llvm/lib/Analysis/ScalarEvolution.cpp =================================================================== --- 
llvm/lib/Analysis/ScalarEvolution.cpp +++ llvm/lib/Analysis/ScalarEvolution.cpp @@ -6303,20 +6303,6 @@ return getSCEV(U->getOperand(0)); break; - case Instruction::SDiv: - // If both operands are non-negative, this is just an udiv. - if (isKnownNonNegative(getSCEV(U->getOperand(0))) && - isKnownNonNegative(getSCEV(U->getOperand(1)))) - return getUDivExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); - break; - - case Instruction::SRem: - // If both operands are non-negative, this is just an urem. - if (isKnownNonNegative(getSCEV(U->getOperand(0))) && - isKnownNonNegative(getSCEV(U->getOperand(1)))) - return getURemExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); - break; - // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can // lead to pointer expressions which cannot safely be expanded to GEPs, // because ScalarEvolution doesn't respect the GEP aliasing rules when -------------- next part -------------- A non-text attachment was scrubbed... Name: D83230.275707.patch Type: text/x-patch Size: 4968 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:06 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:54:06 +0000 (UTC) Subject: [PATCH] D83229: [RISCV][WIP] Improve RV32 constant materialization Message-ID: luismarques created this revision. luismarques added reviewers: asb, lenary. Herald added subscribers: llvm-commits, evandro, apazos, sameer.abuasal, pzheng, s.egerton, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, MaskRay, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, simoncook, johnrusso, rbar, hiraditya. Herald added a project: LLVM. D79492 improves the materialization of constants for RV64. For ease of review, I'm splitting out several changes and improvements into separate patches. This patch addresses RV32. 
For RV32, `LUI/ADDI/LUI+ADDI` is already optimal regarding the number of instructions, but we can use alternative sequences for improved compressibility. Since we don't need the recursive approach of D79492, it's clearer and more efficient to split the two implementations. For ease of review, this patch contributes only the RV32 improvements to constant materialization, with a split implementation approach.

I'm marking this patch/review as WIP due to the following code quality issues. In isolation, these constant materialization changes should always be reasonable. Unfortunately, when multiple constants need to be materialized these optimizations can result in worse outcomes, due to how we optimize each constant in isolation. For instance:

  x = 0x80000000
  y = 0x7FFFFFFF

  Before:
    x = (LUI 0x80000)
    y = (ADDI (LUI 0x80000) -1) = (ADDI x -1)
    => LUI+ADDI

  After:
    x = (LUI 0x80000)
    y = (SRLI (ADDI zero, -1) 1)
    => LUI+ADDI+SRLI

An example of this scenario occurs in `copysign-casts.ll`.

Another issue is that removing the ADDI from the materialization screws up folding the ADDIs into the load/stores (see `fold-addi-loadstore.ll`) and the computation of base+offset addresses (see `hoist-global-addr-base.ll`). While these issues can be avoided for RV32I, by gating the materialization optimizations to +C, the problem remains for RVC.
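To make the x/y example concrete, here is a minimal sketch that models the three instructions involved with plain 32-bit arithmetic and checks that both sequences really produce y = 0x7FFFFFFF. It assumes standard RV32I semantics (LUI places a 20-bit immediate in the upper bits, ADDI adds a sign-extended 12-bit immediate, SRLI is a logical right shift). The helper names `lui`, `addi`, `srli` and `checkBothSequences` are hypothetical, chosen for illustration only, and are not part of the patch or of any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the RV32I instructions used in the example.
// LUI: load a 20-bit immediate into bits 31..12, low 12 bits are zero.
constexpr uint32_t lui(uint32_t imm20) { return imm20 << 12; }

// ADDI: add a sign-extended 12-bit immediate, wrapping modulo 2^32.
constexpr uint32_t addi(uint32_t rs1, int32_t imm12) {
  return rs1 + static_cast<uint32_t>(imm12);
}

// SRLI: logical (zero-filling) right shift by a 5-bit shift amount.
constexpr uint32_t srli(uint32_t rs1, unsigned shamt) {
  return rs1 >> (shamt & 31u);
}

// x = 0x80000000 is a single LUI in both variants.
static_assert(lui(0x80000) == 0x80000000u, "x via LUI");

// "Before": y reuses x, so only one extra ADDI is needed.
static_assert(addi(lui(0x80000), -1) == 0x7FFFFFFFu, "y via LUI+ADDI");

// "After": y is built independently from the zero register.
static_assert(srli(addi(0u, -1), 1) == 0x7FFFFFFFu, "y via ADDI+SRLI");

// Runtime version of the same checks, for use from a test driver.
inline void checkBothSequences() {
  const uint32_t x = lui(0x80000);
  const uint32_t yBefore = addi(x, -1);        // shares x's LUI
  const uint32_t yAfter = srli(addi(0u, -1), 1); // cannot share x's LUI
  assert(x == 0x80000000u);
  assert(yBefore == yAfter);
}
```

Both variants compute the same value. The cost difference described above only appears when `x` is also live: the first variant then needs LUI+ADDI in total for both constants, while the second needs LUI+ADDI+SRLI because nothing is shared.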
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83229 Files: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/lib/Target/RISCV/Utils/RISCVMatInt.cpp llvm/test/CodeGen/RISCV/alu16.ll llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll llvm/test/CodeGen/RISCV/copysign-casts.ll llvm/test/CodeGen/RISCV/double-bitmanip-dagcombines.ll llvm/test/CodeGen/RISCV/double-intrinsics.ll llvm/test/CodeGen/RISCV/float-bit-preserving-dagcombines.ll llvm/test/CodeGen/RISCV/float-bitmanip-dagcombines.ll llvm/test/CodeGen/RISCV/float-intrinsics.ll llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll llvm/test/CodeGen/RISCV/hoist-global-addr-base.ll llvm/test/CodeGen/RISCV/imm.ll llvm/test/CodeGen/RISCV/large-stack.ll llvm/test/CodeGen/RISCV/split-offsets.ll llvm/test/CodeGen/RISCV/srem-vector-lkk.ll llvm/test/CodeGen/RISCV/stack-realignment.ll llvm/test/CodeGen/RISCV/urem-vector-lkk.ll llvm/test/CodeGen/RISCV/zext-with-load-is-free.ll llvm/test/MC/RISCV/rv32c-aliases-valid.s llvm/test/MC/RISCV/rv32i-aliases-valid.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83229.275659.patch Type: text/x-patch Size: 28730 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:07 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:07 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: <463fe1ee8e527c2950e735b4da7a302f@localhost.localdomain> ikudrin updated this revision to Diff 275708. ikudrin marked 10 inline comments as done. ikudrin added a comment. Thanks, @jhenderson! - Use `Cursor` in the loop from the beginning. - Fix typos. - Extended tests. - Removed the old test; Moved the checks to the new test file. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 Files: lld/ELF/SyntheticSections.cpp lld/test/ELF/Inputs/gdb-index.s lld/test/ELF/gdb-index-invalid-pubnames.s lld/test/ELF/gdb-index.s llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83050.275708.patch Type: text/x-patch Size: 15184 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:07 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:07 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: <88facc9c0c854cf360a98ff43f52f594@localhost.localdomain> ikudrin added inline comments. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:27-28 Sets.clear(); - DataExtractor::Cursor C(0); - while (C && Data.isValidOffset(C.tell())) { + uint64_t Offset = 0; + while (Data.isValidOffset(Offset)) { + uint64_t SetOffset = Offset; ---------------- jhenderson wrote: > ikudrin wrote: > > jhenderson wrote: > > > What's behind the reasoning for no longer using the `Cursor` throughout? > > The method now reports all encountered errors through `RecoverableErrorHandler` and does not return `Error`. The `Cursor` requires its error state to be checked in any case. While the former code could simply return the error state, now this checking is a bit inconvenient, and, moreover, useless. > I'm not sure I follow. 
>
> As far as my understanding of `Cursor` goes, you can have:
>
> ```
> DataExtractor::Cursor C(0);
> while (C && Data.isValidOffset(C.tell())) {
>   // Parse the length
>   if (!C) { /* report invalid length, using C.takeError() */ return; }
>   // Parse the header
>   while (C) { /* parse entries */ }
>   if (C && C.tell() != Offset) { /* report bad terminator */ }
> }
> if (!C) { /* report parsing error using C.takeError() */ }
> ```
>
> The `Cursor` is checked by either the final error check outside the loop in most cases, or by the invalid length report, so we're good (note that `C.takeError()` does not need calling if the `Cursor` is in a success state, much like `Expected`). The only case where it might be different is if `Cursor` is in an error state due to some error other than a running-off-the-end error, in which case it would abort early. If you want to continue instead, you could do almost the same as you've got:
>
> ```
> while (Offset) {
>   DataExtractor::Cursor C(Offset);
>   ... = Data.getInitialLength(C);
>   if (!C) { /* report invalid length, using C.takeError() */ return; }
>   // Parse the header
>   while (C) { /* parse entries */ }
>   if (C && C.tell() != Offset) { /* report bad terminator */ }
>   if (!C) { /* report parsing error using C.takeError() */ }
> }
> ```
>
> I'm not sure I see how the latter is any more complex or inconvenient than instantiating a different Error variable and passing pointers around?

I'll take the second one, thanks!

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s:1
 # RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t
 # RUN: not llvm-dwarfdump -debug-pubnames %t 2>&1 | FileCheck %s
----------------
jhenderson wrote:
> I'd probably fold in this test case now into the other file. I don't think there's any benefit having them separate. Alternatively, this lives separately, and move the other test case into the library testing. The idea is that we test the code in detail with the library tests, and at a high level in the tool tests (i.e. showing we handle the reported output). I don't mind either approach.

OK, I'll move that into the new test. I find using gtest unit tests for things like dumping and error reporting clumsy because they require lots of boilerplate code.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83050/new/

https://reviews.llvm.org/D83050

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:07 2020
From: llvm-commits at lists.llvm.org (Kadir Cetinkaya via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:07 +0000 (UTC)
Subject: [PATCH] D83230: Revert "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`"
In-Reply-To:
References:
Message-ID: <2aafac7ad51b19007ce6dcd8fcb29b06@localhost.localdomain>

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd3e3f36ff115: Revert "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an… (authored by goncharov, committed by kadircet).
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83230/new/ https://reviews.llvm.org/D83230 Files: llvm/lib/Analysis/ScalarEvolution.cpp llvm/test/Analysis/ScalarEvolution/sdiv.ll llvm/test/Analysis/ScalarEvolution/srem.ll Index: llvm/test/Analysis/ScalarEvolution/srem.ll =================================================================== --- llvm/test/Analysis/ScalarEvolution/srem.ll +++ llvm/test/Analysis/ScalarEvolution/srem.ll @@ -14,11 +14,11 @@ ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = srem i32 %i.0, 2 -; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i32) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i32) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> %rem U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> (zext i1 {false,+,true}<%for.cond> to i64) U: [0,2) S: [0,2) Exits: (zext i1 (trunc i32 %width to i1) to i64) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> (sext i32 %rem to i64) U: [0,2) S: [-2,2) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * (zext i1 {false,+,true}<%for.cond> to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * (zext i1 (trunc i32 %width to i1) to i64)) + %storage) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %1 = load i32, i32* %arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> LoopDispositions: { %for.cond: Variant } ; 
CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) Index: llvm/test/Analysis/ScalarEvolution/sdiv.ll =================================================================== --- llvm/test/Analysis/ScalarEvolution/sdiv.ll +++ llvm/test/Analysis/ScalarEvolution/sdiv.ll @@ -14,11 +14,11 @@ ; CHECK-NEXT: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; CHECK-NEXT: --> {0,+,1}<%for.cond> U: [0,-2147483648) S: [0,-2147483648) Exits: %width LoopDispositions: { %for.cond: Computable } ; CHECK-NEXT: %rem = sdiv i32 %i.0, 2 -; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,1073741824) S: [0,1073741824) Exits: (%width /u 2) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> %rem U: full-set S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %idxprom = sext i32 %rem to i64 -; CHECK-NEXT: --> ({0,+,1}<%for.cond> /u 2) U: [0,2147483648) S: [0,2147483648) Exits: ((zext i32 %width to i64) /u 2) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> (sext i32 %rem to i64) U: [-2147483648,2147483648) S: [-1073741824,1073741824) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %storage, i64 0, i64 %idxprom -; CHECK-NEXT: --> ((4 * ({0,+,1}<%for.cond> /u 2)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: ((4 * ((zext i32 %width to i64) /u 2)) + %storage) LoopDispositions: { %for.cond: Computable } +; CHECK-NEXT: --> ((4 * (sext i32 %rem to i64)) + %storage) U: [0,-3) S: [-9223372036854775808,9223372036854775805) Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %1 = load i32, i32* %arrayidx, align 4 ; CHECK-NEXT: --> %1 U: full-set S: full-set Exits: <> LoopDispositions: { %for.cond: Variant } ; CHECK-NEXT: %call = call i32 @_Z3adji(i32 %1) Index: llvm/lib/Analysis/ScalarEvolution.cpp =================================================================== --- llvm/lib/Analysis/ScalarEvolution.cpp +++ 
llvm/lib/Analysis/ScalarEvolution.cpp @@ -6303,20 +6303,6 @@ return getSCEV(U->getOperand(0)); break; - case Instruction::SDiv: - // If both operands are non-negative, this is just an udiv. - if (isKnownNonNegative(getSCEV(U->getOperand(0))) && - isKnownNonNegative(getSCEV(U->getOperand(1)))) - return getUDivExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); - break; - - case Instruction::SRem: - // If both operands are non-negative, this is just an urem. - if (isKnownNonNegative(getSCEV(U->getOperand(0))) && - isKnownNonNegative(getSCEV(U->getOperand(1)))) - return getURemExpr(getSCEV(U->getOperand(0)), getSCEV(U->getOperand(1))); - break; - // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can // lead to pointer expressions which cannot safely be expanded to GEPs, // because ScalarEvolution doesn't respect the GEP aliasing rules when -------------- next part -------------- A non-text attachment was scrubbed... Name: D83230.275710.patch Type: text/x-patch Size: 4968 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:08 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:08 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <47bc6247aec62938356300cd4bc82a9b@localhost.localdomain> sameerarora101 marked 7 inline comments as done. sameerarora101 added inline comments. ================ Comment at: llvm/docs/CommandGuide/llvm-libtool-darwin.rst:24-26 +.. option:: -help, -h + + Display usage information and exit. ---------------- jhenderson wrote: > sameerarora101 wrote: > > jhenderson wrote: > > > I suspect if you do `-help` you'll see one or two other options too. You'll probably want to include `-help-list` here at least, and might need to hide some other options. Could you post (out-of-line) what the full help text is, please? 
> > here is the full help text from cmd line: > > > > ``` > > OVERVIEW: llvm-libtool-darwin > > > > USAGE: llvm-libtool-darwin [options] > > > > OPTIONS: > > > > Color Options: > > > > --color - Use colors in output (default=autodetect) > > > > General options: > > > > -o - Alias for --output > > --output= - Specify output filename > > > > Generic Options: > > > > --help - Display available options (--help-hidden for more) > > --help-list - Display list of available options (--help-list-hidden for more) > > --version - Display the version of this program > > ``` > > > > I have added description for `--help-list` and `--color` now as well > Thanks. Some more comments: > > 1) As this is a Darwin-inspired tool, we should use the standard option naming throughout. If I understand it correctly, this means single dashes rather than double. > 2) You probably want to use the documentation for the various common options (help, version, color etc) used in the other tool documentation, for consistency. Take a look at e.g. llvm-objcopy or llvm-dwarfdump. In particular, I wouldn't report the "hidden" versions of the help options (they're hidden for a reason...). > 3) Documentation should use full English grammar rules with leading caps and trailing full stops, like comments in the code. Ok, I have replaced `--` with `-` and took help from `llvm-dwarfdump` for the common options. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:1 +## This test checks that an error is thrown in case of invalid input/output args. + ---------------- jhenderson wrote: > jhenderson wrote: > > On further reflection, perhaps it makes sense to combine this and basic.test into a single test. What do you think? > Actually, ignore my previous comment, since basic.test is only short-term. > > You'll probably want to add --static to these tests when you add support to that option to avoid any potential confusion. 
Ya I agree, `basic.test` is temporary and I replace it with `create-static-lib.test` when adding the `-static` option. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:08 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:54:08 +0000 (UTC) Subject: [PATCH] D82788: AMDGPU: Fix alignment requirements for 96bit and 128bit local loads and stores In-Reply-To: References: Message-ID: <31d53b975c7dff24cfb7cf8df06475f5@localhost.localdomain> nhaehnle added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1377 + // ds_read/write_b96 require 16-byte alignment. gfx9 and onward has + // unaligned access support but windows likes to keep it dword aligned. + bool IsGFX9Plus = Subtarget->getGeneration() >= AMDGPUSubtarget::GFX9; ---------------- mbrkusanin wrote: > nhaehnle wrote: > > arsenm wrote: > > > nhaehnle wrote: > > > > arsenm wrote: > > > > > This is a windows driver bug and doesn't deserve mentioning here. We do not know what the host OS is > > > > I agree that it's a bug, but I find it reasonable to mention it here. I'd change the comment though to specifically call out that this should be considered a bug in the Windows KMD. > > > It's not specific to this instance though, this belongs with the place where we assume unaligned access for amdhsa > > Okay, that's fair. > Is the new wording ok? No, it makes an incorrect statement. The alignment requirement only exists up to gfx8, so it should say "... require 16-byte alignment on gfx8 and older". I also don't think we enforce dword alignment on gfx9+ anywhere for any other kind of memory access, so we shouldn't do it here. The mess on Windows is a problem, but if we really need to work around it, it should be a target "feature" (misfeature, really, but oh well). 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82788/new/ https://reviews.llvm.org/D82788 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:08 2020 From: llvm-commits at lists.llvm.org (Sanne Wouda via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:08 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: sanwou01 updated this revision to Diff 275712. sanwou01 marked an inline comment as done. sanwou01 added a comment. Addressed comments. - Moved handling of vector lib memory dependencies to calculateDependencies - Fixed memory dependencies of library functions that take pointers as arguments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 Files: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82550.275712.patch Type: text/x-patch Size: 49719 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:09 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:09 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: <61e6bb65bac38d33cff6fabd93f54edb@localhost.localdomain> ABataev added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5208 + bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory() && + !isVectorizableLibFunctionCall(BundleMember->Inst); unsigned numAliased = 0; ---------------- Still, this is a bad solution. Add proper attributes to the vector variants of the functions, so all memory access interfaces could properly handle such functions. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:09 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:09 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: dantrushin marked 2 inline comments as done. dantrushin added inline comments. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:506 + + for (Register Reg : RegsToReload) + insertReloads(Reg); ---------------- skatkov wrote: > Don't you want to separate reload loads into separate function? > So we'll have: > spill registers > rewrite statepoint > insertReloads/unspill registers `insertReloads` uses local vector `RegsToReload` and `MI` (statepoint instruction). To call `insertReloads` outside of `rewriteStatepoint` I will have to make that local vector and new statepoint instruction available to `insertReloads()`. I don't think that making `RegsToReload` member variable or something like that: ``` SmallVector RegsToReload; SS.spillRegisters(); MachineInstr *NewStatepoint = SS.rewriteStatepoint(RegsToReload); // out parameter SS.insertReloads(RegsToReload, NewStatepoint); ``` will be much cleaner. But I can do that if you want. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:09 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:09 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <08367d951a537b0dce1ba00bcfa71a36@localhost.localdomain> sameerarora101 updated this revision to Diff 275715. sameerarora101 marked an inline comment as done. 
sameerarora101 added a comment. Updating docs Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 Files: llvm/docs/CommandGuide/index.rst llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/CMakeLists.txt llvm/test/tools/llvm-libtool-darwin/Inputs/input1.yaml llvm/test/tools/llvm-libtool-darwin/Inputs/input2.yaml llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/help-message.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82923.275715.patch Type: text/x-patch Size: 11205 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:09 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:54:09 +0000 (UTC) Subject: [PATCH] D83020: [AMDGPU] Avoid using s_cmpk when src0 is not register In-Reply-To: References: Message-ID: <3dfcc74b09c17d708486f46b97b58c77@localhost.localdomain> nhaehnle added a comment. Did you see @arsenm's comment? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83020/new/ https://reviews.llvm.org/D83020 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:10 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:10 +0000 (UTC) Subject: [PATCH] D82972: [DebugInfo] Introduce GNU macro extension entry encodings In-Reply-To: References: Message-ID: <598b2a97c2307ebcdb0b57600ee265c9@localhost.localdomain> dstenb added a comment. 
In D82972#2127215 , @SouraVX wrote: > Do we really need this ?, Please have a look at https://sourceware.org/legacy-ml/gdb-patches/2017-02/msg00528.html > IMO, it would be unwise to commit it till the discussion thread in D82975 converges ? As I write in the commit message this patch is not strictly needed, but I think it makes the code more clear. I'll wait with this patch until we decide in D82975 . Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82972/new/ https://reviews.llvm.org/D82972 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:10 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:10 +0000 (UTC) Subject: [PATCH] D81813: [ARM] MVE FP16 cost adjustments In-Reply-To: References: Message-ID: <4d8bd659ac2ff927da9fd215902cd3cf@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG146dad0077b4: [ARM] MVE FP16 cost adjustments (authored by dmgreen). Changed prior to commit: https://reviews.llvm.org/D81813?vs=273015&id=275716#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81813/new/ https://reviews.llvm.org/D81813 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81813.275716.patch Type: text/x-patch Size: 15432 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:10 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:10 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: amyk added inline comments. 
================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:188 + // CHECK-LE: ret <4 x i32> + return vec_splati_ins(vsia, 0, -17); +} ---------------- Would also be good to add tests where the second argument is 1. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:10 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:10 +0000 (UTC) Subject: [PATCH] D82892: Added comparision for all types in haveSameSpecialState() of Instruction.cpp In-Reply-To: References: Message-ID: tejohnson added a comment. Is this a bug fix? If it is NFC (No Functional Change) please specify that in the title, otherwise needs a test case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82892/new/ https://reviews.llvm.org/D82892 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:11 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:11 +0000 (UTC) Subject: [PATCH] D83122: Fix crash when getVFABIMappings is called with an indirect call instruction In-Reply-To: References: Message-ID: <6a6683a82b6f6efced7cd990820fb647@localhost.localdomain> fpetrogalli requested changes to this revision. fpetrogalli added inline comments. This revision now requires changes to proceed. ================ Comment at: llvm/unittests/Analysis/VectorFunctionABITest.cpp:12 #include "llvm/IR/InstIterator.h" +#include "llvm/IRReader/IRReader.h" #include "gtest/gtest.h" ---------------- Is this needed? ================ Comment at: llvm/unittests/Analysis/VectorFunctionABITest.cpp:634 + LLVMContext C; + std::unique_ptr M = parseIR(C, R"IR( +define void @call(void () * %f) { ---------------- Very elegant, but is this `unique_ptr` needed? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83122/new/ https://reviews.llvm.org/D83122 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:11 2020 From: llvm-commits at lists.llvm.org (Ties Stuij via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:11 +0000 (UTC) Subject: [PATCH] D83231: [CodeGen] Don't combine extract + concat vectors with non-legal types Message-ID: stuij created this revision. Herald added subscribers: llvm-commits, steven.zhang, hiraditya. Herald added a project: LLVM. The following combine currently breaks in the DAGCombiner: extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x -> extract_vector_elt a, x This happens because after we have combined these nodes we have inserted nodes that use individual instances of the vector type. In the above example i16. However this isn't a legal type on all backends. The type legalizer has already been run, and running it again would make a mess of the nodes. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83231 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll Index: llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll @@ -0,0 +1,17 @@ +; RUN: llc -asm-verbose=0 -mtriple aarch64-arm-none-eabi < %s | FileCheck %s + +; The following code previously broke in the DAGCombiner. 
Specifically, trying to combine: +; extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x +; -> extract_vector_elt a, x + +define half @test_combine_extract_concat_vectors(<4 x i16> %a) nounwind { +entry: + %0 = shufflevector <4 x i16> %a, <4 x i16> undef, <8 x i32> + %1 = bitcast <8 x i16> %0 to <8 x half> + %2 = extractelement <8 x half> %1, i32 3 + ret half %2 +} + +; CHECK-LABEL: test_combine_extract_concat_vectors: +; CHECK-NEXT: mov h0, v0.h[3] +; CHECK-NEXT: ret Index: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -17812,8 +17812,10 @@ Elt = (Idx < (int)NumElts) ? Idx : Idx - (int)NumElts; Index = DAG.getConstant(Elt, DL, Index.getValueType()); } - } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && - !BCNumEltsChanged && VecVT.getVectorElementType() == ScalarVT) { + } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && !BCNumEltsChanged && + VecVT.getVectorElementType() == ScalarVT && + TLI.isTypeLegal( + VecOp.getOperand(0).getValueType().getVectorElementType())) { // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 0 // -> extract_vector_elt a, 0 // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 1 -------------- next part -------------- A non-text attachment was scrubbed... Name: D83231.275720.patch Type: text/x-patch Size: 1841 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:11 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:11 +0000 (UTC) Subject: [PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support In-Reply-To: References: Message-ID: epastor updated this revision to Diff 275719. epastor added a comment. Improve error handling and add tests Also fix comment formatting, and address other feedback. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75306/new/ https://reviews.llvm.org/D75306 Files: llvm/include/llvm/MC/MCParser/MCAsmParser.h llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h llvm/lib/MC/MCParser/MasmParser.cpp llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/test/tools/llvm-ml/struct.test llvm/test/tools/llvm-ml/struct_errors.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D75306.275719.patch Type: text/x-patch Size: 84236 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:12 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:12 +0000 (UTC) Subject: [PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support In-Reply-To: References: Message-ID: epastor marked an inline comment as done. epastor added a comment. Thanks, Nico! I've added test coverage for many (most?) of the error cases as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75306/new/ https://reviews.llvm.org/D75306 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:12 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:12 +0000 (UTC) Subject: [PATCH] D73051: [GlobalISel][AMDGPU] Legalize saturating add/subtract In-Reply-To: References: Message-ID: arsenm added a comment. ping. Are you going to get back to this soon, or should I adopt this? 
This is on the shortlist of remaining operations falling back in the OpenCL conformance tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73051/new/ https://reviews.llvm.org/D73051 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:12 2020 From: llvm-commits at lists.llvm.org (Owen Reynolds via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:12 +0000 (UTC) Subject: [PATCH] D82479: [llvm-size] Output REL, RELA and STRTAB sections in some cases In-Reply-To: References: Message-ID: gbreynoo added a comment. Apologies Maskray, I misunderstood your previous comment and should not have committed before you had also accepted the change. If you would prefer this change to be reverted while discussion continues, I would be happy to do so. I think that this change better matches what users expect from llvm-size and, given its small size and impact, that it is worth including. I agree it is useful to investigate the current GNU size behaviour and work out what the intended behaviour is; from what I've seen, however, it looks overly complex and inconsistent. A wider discussion may be required about what llvm-size should output to fit user expectations; as the bug I linked above shows, there is some confusion. In response to your previous comment suggesting to always output SHT_REL and SHT_RELA sections, and potentially string tables: I don't think these should be output unless they are allocatable. Otherwise, these sections would be added to the size total in cases in which they do not occupy memory.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82479/new/ https://reviews.llvm.org/D82479 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:13 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:13 +0000 (UTC) Subject: [PATCH] D83232: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay. Herald added subscribers: rupprecht, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. Currently, llvm-readobj calls `report_fatal_error` when an object has both REL and RELA dynamic relocations. llvm-readelf is able to handle this case properly. This patch adds such a test case and adjusts the llvm-readobj code to follow (and be consistent with its own RELR and PLTREL cases). https://reviews.llvm.org/D83232 Files: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6362,14 +6362,14 @@ const DynRegionInfo &DynRelaRegion = this->dumper()->getDynRelaRegion(); const DynRegionInfo &DynRelrRegion = this->dumper()->getDynRelrRegion(); const DynRegionInfo &DynPLTRelRegion = this->dumper()->getDynPLTRelRegion(); - if (DynRelRegion.Size && DynRelaRegion.Size) - report_fatal_error("There are both REL and RELA dynamic relocations"); + W.startLine() << "Dynamic Relocations {\n"; W.indent(); - if (DynRelaRegion.Size > 0) + if (DynRelaRegion.Size > 0) { for (const Elf_Rela &Rela : this->dumper()->dyn_relas()) printDynamicRelocation(Obj, Rela); - else + } + if (DynRelRegion.Size > 0) { for (const Elf_Rel &Rel : this->dumper()->dyn_rels()) { Elf_Rela Rela; Rela.r_offset = Rel.r_offset; @@ 
-6377,6 +6377,8 @@ Rela.r_addend = 0; printDynamicRelocation(Obj, Rela); } + } + if (DynRelrRegion.Size > 0) { Elf_Relr_Range Relrs = this->dumper()->dyn_relrs(); std::vector RelrRelas = Index: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test +++ llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test @@ -463,3 +463,56 @@ Sections: - Section: .rela.dyn - Section: .dynamic + +## Show that when we have both REL and RELA relocations, we dump both sets. +# RUN: yaml2obj --docnum=13 %s -o %t13 +# RUN: llvm-readobj --dyn-relocations %t13 2>&1 | FileCheck %s -DFILE=%t13 --check-prefix=BOTH-RELA-REL-LLVM +# RUN: llvm-readelf --dyn-relocations %t13 2>&1 | FileCheck %s -DFILE=%t13 --check-prefix=BOTH-RELA-REL-GNU + +# BOTH-RELA-REL-LLVM: Dynamic Relocations { +# BOTH-RELA-REL-LLVM-NEXT: 0x0 R_X86_64_NONE - 0x0 +# BOTH-RELA-REL-LLVM-NEXT: 0x0 R_X86_64_NONE - 0x0 +# BOTH-RELA-REL-LLVM-NEXT: } + +# BOTH-RELA-REL-GNU: 'RELA' relocation section at offset 0x78 contains 24 bytes: +# BOTH-RELA-REL-GNU-NEXT: Offset Info Type Symbol's Value Symbol's Name + Addend +# BOTH-RELA-REL-GNU-NEXT: 0000000000000000 0000000000000000 R_X86_64_NONE 0 +# BOTH-RELA-REL-GNU-EMPTY: +# BOTH-RELA-REL-GNU: 'REL' relocation section at offset 0x78 contains 16 bytes: +# BOTH-RELA-REL-GNU-NEXT: Offset Info Type Symbol's Value Symbol's Name +# BOTH-RELA-REL-GNU-NEXT: 0000000000000000 0000000000000000 R_X86_64_NONE + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_DYN + Machine: EM_X86_64 +Sections: + - Name: .rela.dyn + Type: SHT_RELA + Relocations: + - Type: R_X86_64_NONE + - Name: .dynamic + Type: SHT_DYNAMIC + Entries: + - Tag: DT_RELA + Value: 0x0 + - Tag: DT_RELASZ + Value: 0x18 + - Tag: DT_RELAENT + Value: 0x18 + - Tag: DT_REL + Value: 0x0 + - Tag: DT_RELSZ + Value: 0x10 + - Tag: DT_RELENT + Value: 0x10 + - Tag: DT_NULL + Value: 0x0 
+DynamicSymbols: [] +ProgramHeaders: + - Type: PT_LOAD + Sections: + - Section: .rela.dyn + - Section: .dynamic -------------- next part -------------- A non-text attachment was scrubbed... Name: D83232.275722.patch Type: text/x-patch Size: 3421 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:13 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:13 +0000 (UTC) Subject: [PATCH] D81166: [Matrix] Add matrix_nuw/matrix_nsw operand bundles for matrix.multiply. In-Reply-To: References: Message-ID: <609b45ac7fb0433185c26b0dc01550e8@localhost.localdomain> fhahn added a comment. ping Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81166/new/ https://reviews.llvm.org/D81166 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:13 2020 From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:13 +0000 (UTC) Subject: [PATCH] D82359: [Power10] Implement Vector Replace Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <01496384e9f365a8a2ccdfa077c4dfab@localhost.localdomain> biplmish added inline comments. ================ Comment at: clang/lib/Headers/altivec.h:17037 + })(x)) \ + .ui +#define DP2LL(x) \ ---------------- lei wrote: > This looks just like a cast to `unsigned int`, can you explain why it needs to be cast to a union to extract the unsigned int instead of just directly casting it to an `unsigned int`? The intrinsic implements the input2 argument as i64. When the input to the function is of type float/double it generates fptoui which can modify the user input value. Cast to union is done to preserve the float value and the same reaches the hw instruction. 
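For readers following the union discussion above, here is a minimal sketch of the difference between converting a float's value and reinterpreting its bits. It is an illustration only, not the code from altivec.h: the function names `convertValue`, `reinterpretBits` and `checkExamples` are hypothetical, a 32-bit IEEE-754 `float` is assumed, and `std::memcpy` stands in for the union used in the C header.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value conversion: roughly what a plain cast (lowered to fptoui) does.
// The fractional part is dropped, so the original bit pattern is lost.
inline std::uint32_t convertValue(float F) {
  return static_cast<std::uint32_t>(F);
}

// Bit reinterpretation: copies the 32 bits of the float unchanged.
// std::memcpy is used here as the portable C++ spelling of the union trick.
inline std::uint32_t reinterpretBits(float F) {
  static_assert(sizeof(std::uint32_t) == sizeof(float),
                "this sketch assumes a 32-bit float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Small sanity check, assuming IEEE-754 single precision.
inline void checkExamples() {
  assert(convertValue(1.5f) == 1u);             // value truncated to 1
  assert(reinterpretBits(1.5f) == 0x3FC00000u); // raw encoding of 1.5f
}
```

With a plain cast the instruction would receive the integer 1 rather than the encoding of 1.5f, which appears to be the situation the reply describes.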
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82359/new/ https://reviews.llvm.org/D82359 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:14 2020 From: llvm-commits at lists.llvm.org (Oliver Stannard (Linaro) via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:14 +0000 (UTC) Subject: [PATCH] D76291: [Support] Fix formatted_raw_ostream for UTF-8 In-Reply-To: References: Message-ID: <570a5b2ceba6aa935ef11ec7c6a5ce7b@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGe80b81d1cbf8: [Support] Fix formatted_raw_ostream for UTF-8 (authored by ostannard). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76291/new/ https://reviews.llvm.org/D76291 Files: clang/test/Analysis/checker-plugins.c llvm/include/llvm/Support/FormattedStream.h llvm/lib/Support/FormattedStream.cpp llvm/test/MC/ARM/lsl-zero.s llvm/unittests/Support/formatted_raw_ostream_test.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D76291.275726.patch Type: text/x-patch Size: 13508 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:14 2020 From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:14 +0000 (UTC) Subject: [PATCH] D82359: [Power10] Implement Vector Replace Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <48606cc3f83809c37e1135031e36266c@localhost.localdomain> biplmish updated this revision to Diff 275723. biplmish added a comment. Modify the clang test to remove the redundant CHECK-LE's. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82359/new/ https://reviews.llvm.org/D82359 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/include/llvm/IR/IntrinsicsPowerPC.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82359.275723.patch Type: text/x-patch Size: 17864 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:14 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:14 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: <8a3353a46d56626e0fb19b375121d6ce@localhost.localdomain> fpetrogalli added a comment. Hi @sanwou01, thank you for working on this! I left a couple of comments. Additional question: you seem to be testing only math functions, but it seems that your code would work with any function that has the `vector-function-abi-variant` attribute attached. It might be worth adding a test for a generic function? Kind regards, Francesco ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3031 + VFShape Shape = + VFShape::get(*CI, {static_cast(VL.size()), false}, + false /*HasGlobalPred*/); ---------------- Please describe the `false` parameter like you have done for `/*HasGlobalPred*/`. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4545 + } else { + Module *M = F->getParent(); + Type *Tys[] = {FixedVectorType::get(CI->getType(), E->Scalars.size())}; ---------------- `M` is used only inside `getDeclaration`, so there is no need to declare a variable for it.
================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5110 + + return true; +} ---------------- I suspect there are many things that may fail here, other than not having a mapping in `VFDatabase` or having pointer arguments. I think it would be safer to reverse the logic: have the function return false by default, and return true if `VFDatabase` is not empty and there are no pointer arguments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5208 + bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory() && + !isVectorizableLibFunctionCall(BundleMember->Inst); unsigned numAliased = 0; ---------------- ABataev wrote: > Still, this is a bad solution. Add proper attributes to the vector variants of the functions, so all memory access interfaces could properly handle such functions. Hi @ABataev, if I understood correctly, you are asking to add a new attribute that guarantees that the function does not have side effects? If that's the case, that is already guaranteed by the `vector-function-abi-variant` attribute.
================ Comment at: llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll:66 ; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16 -; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 -; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @ceilf(float [[VECEXT]]) #1 -; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 -; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 -; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @ceilf(float [[VECEXT_1]]) #1 -; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 -; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 -; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @ceilf(float [[VECEXT_2]]) #1 -; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 -; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 -; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @ceilf(float [[VECEXT_3]]) #1 -; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: [[TMP1:%.*]] = call fast <4 x float> @vceilf(<4 x float> [[TMP0]]) +; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[TMP1]], i32 0 ---------------- Nice! :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:15 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:15 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: <6901b31bbef3f90dd72c186536438bea@localhost.localdomain> ABataev added inline comments. 
================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5208 + bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory() && + !isVectorizableLibFunctionCall(BundleMember->Inst); unsigned numAliased = 0; ---------------- fpetrogalli wrote: > ABataev wrote: > > Still, this is a bad solution. Add proper attributes to the vector variants of the functions, so all memory access interfaces could properly handle such functions. > HI @ABataev, if I understood correctly, you are asking to add a new attribute that guaranteed that the function does not have side effects? If that's the case, that is already guaranteed by the `vector-function-abi-variant` attribute. Hi. No, what I'm asking is to mark the vectorized function versions with attributes that they don't write to the memory (what this change implies), so all attribute interfaces properly handle such functions as no-write functions. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:15 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:15 +0000 (UTC) Subject: [PATCH] D83045: [AArch64][SVE] Add FP unpredicated to predicated two-op codegen In-Reply-To: References: Message-ID: <81e190483e2911142632149a74a0cead@localhost.localdomain> cameron.mcinally added a comment. > I initially created an unpredicated intrinsic for this, because I didn't find any other way of generating the unpredicated fmin, fmax, etc. So I would appreciate any suggestions as to how to generate those unpredicated instructions without having to add more intrinsics. I might be missing the motivation for the unpredicated intrinsic, but in the non-scalable world an unpredicated operation would be canonicalized as a predicated intrinsic with all-1s mask. That would then be lowered during ISel. 
Something like: %mask = shufflevector 1, undef, zeroinitializer %res = call @llvm.aarch64.sve.fmin.nxv4f32( %mask %a, %b) X86 does something similar in `X86ISelLowering.cpp:getVectorMaskingNode(...)`, although it's not really apples-to-apples since X86's masks are represented differently. ================ Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:1749 + + def int_aarch64_sve_fmaxnm : AdvSIMD_Pred2VectorArg_Intrinsic; ---------------- Unnecessary whitespace change? There are a couple of these in this patch. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1579 } -#undef MAKE_CASE return nullptr; } ---------------- Was this #undef intentionally removed? ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11909 + return combineToSVEPred(N, DAG, AArch64ISD::FMINNM); } + ---------------- This looks good, but it may be table-worthy, if this is the way forward. X86 does something similar in `llvm/lib/Target/X86/X86IntrinsicsInfo.h`. Just a heads up. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83045/new/ https://reviews.llvm.org/D83045 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:16 2020 From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:16 +0000 (UTC) Subject: [PATCH] D82481: [XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG6d3ae365bdfc: [XCOFF][AIX] Give symbol an internal name when desired symbol name contains… (authored by jasonliu). 
Changed prior to commit: https://reviews.llvm.org/D82481?vs=275463&id=275729#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82481/new/ https://reviews.llvm.org/D82481 Files: llvm/include/llvm/MC/MCAsmInfo.h llvm/include/llvm/MC/MCContext.h llvm/include/llvm/MC/MCSectionXCOFF.h llvm/include/llvm/MC/MCStreamer.h llvm/include/llvm/MC/MCSymbolXCOFF.h llvm/include/llvm/MC/MCXCOFFStreamer.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/MC/MCAsmInfoXCOFF.cpp llvm/lib/MC/MCAsmStreamer.cpp llvm/lib/MC/MCContext.cpp llvm/lib/MC/MCStreamer.cpp llvm/lib/MC/MCSymbolXCOFF.cpp llvm/lib/MC/XCOFFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll llvm/test/CodeGen/PowerPC/test_func_desc.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82481.275729.patch Type: text/x-patch Size: 29755 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:17 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:17 +0000 (UTC) Subject: [PATCH] D82315: [PowerPC][PCRelative] Thread Local Storage Support for General Dynamic In-Reply-To: References: Message-ID: <675c344ec86f71edf8ebb8071642dd53@localhost.localdomain> kamaub updated this revision to Diff 275727. kamaub added a comment. - Rebasing branch to more recent version of master. - Adding asserts to ensure values are always what they are expected to be. - Code formating and comments added. - Conditional logic for register operands rearranged to avoid potential bugs in PPCTLSDynamicCall.cpp. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82315/new/ https://reviews.llvm.org/D82315 Files: llvm/include/llvm/BinaryFormat/ELFRelocs/PowerPC64.def llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp llvm/lib/Target/PowerPC/PPC.h llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstr64Bit.td llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/lib/Target/PowerPC/PPCMCInstLower.cpp llvm/lib/Target/PowerPC/PPCTLSDynamicCall.cpp llvm/test/CodeGen/PowerPC/pcrel-tls-general-dynamic.ll llvm/test/MC/PowerPC/pcrel-tls-general-dynamic-address-load-reloc.s llvm/test/MC/PowerPC/pcrel-tls-general-dynamic-value-load-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82315.275727.patch Type: text/x-patch Size: 31259 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:17 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:17 +0000 (UTC) Subject: [PATCH] D83182: Expand the LLVM Developer Policy to include new sections on adding a project to the LLVM Monorepo, and a second about the LLVM Incubator projects. In-Reply-To: References: Message-ID: <5d1b3bcd2098e2b3720f1536c7fafe8c@localhost.localdomain> stephenneuendorffer marked an inline comment as done. stephenneuendorffer added inline comments. ================ Comment at: llvm/docs/DeveloperPolicy.rst:534 + + * Generally, try to support LLVM and GCC versions from the last 3 years at a + minimum. 
This time-based guideline is not strict: we may support much older ---------------- lattner wrote: > stephenneuendorffer wrote: > > complete sentence? > Oh I see what you are doing here. I just move a section from one place to the other and now I have to make it actually make sense. That could never have just recently happened to you ;-) > > No good deed goes unpunished. :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83182/new/ https://reviews.llvm.org/D83182 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:17 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:17 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: tskeith added inline comments. ================ Comment at: flang/test/Semantics/num_images.f90:14 + !ERROR: too many actual arguments for intrinsic 'num_images' + print *, num_images(3.4) + ---------------- Why is the error "too many actual arguments" rather than an incorrect type? ================ Comment at: flang/test/Semantics/num_images.f90:22 + !ERROR: unknown keyword argument to intrinsic 'num_images' + print *, num_images(team_number=3.4) + ---------------- Similar question here: `team_number` isn't an unknown keyword argument. The value has the wrong type. Are these bad error messages found with other intrinsics or unique to `num_images`? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:18 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:18 +0000 (UTC) Subject: [PATCH] D82315: [PowerPC][PCRelative] Thread Local Storage Support for General Dynamic In-Reply-To: References: Message-ID: <8babd928d62c40a225fb325d73f0b1ba@localhost.localdomain> kamaub updated this revision to Diff 275731.
kamaub marked 7 inline comments as done. kamaub added a comment. - Renaming test function names as per review comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82315/new/ https://reviews.llvm.org/D82315 Files: llvm/include/llvm/BinaryFormat/ELFRelocs/PowerPC64.def llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp llvm/lib/Target/PowerPC/PPC.h llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstr64Bit.td llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/lib/Target/PowerPC/PPCMCInstLower.cpp llvm/lib/Target/PowerPC/PPCTLSDynamicCall.cpp llvm/test/CodeGen/PowerPC/pcrel-tls-general-dynamic.ll llvm/test/MC/PowerPC/pcrel-tls-general-dynamic-address-load-reloc.s llvm/test/MC/PowerPC/pcrel-tls-general-dynamic-value-load-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82315.275731.patch Type: text/x-patch Size: 31355 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:18 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:18 +0000 (UTC) Subject: [PATCH] D83235: [GlobalISel][InlineAsm] Fix matching input constraints to mem operand In-Reply-To: References: Message-ID: Petar.Avramovic added a comment. This fixes the assert and makes the test mentioned in D82651 fall back for AArch64. This seems to be handled differently on different targets.
I don't know what the best way to handle this is, since it looks to me like custom handling has to be done in different places. Just adding the same flag and register as MatchedOperand works for this example, but according to this comment in AArch64DAGToDAGISel::SelectInlineAsmMemoryOperand: // We need to make sure that this one operand does not end up in XZR, thus // require the address to be in a PointerRegClass register. an extra copy to a register with PointerRegClass needs to be made. Something like: --- a/llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ b/llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -356,7 +356,11 @@ bool InlineAsmLowering::lowerInlineAsm( assert( SourceRegs.size() == 1 && "Expected the memory output to fit into a single virtual register"); - Inst.addReg(SourceRegs[0]); + const TargetRegisterClass *TRC = + MF.getSubtarget().getRegisterInfo()->getPointerRegClass(MF); + Register Tmp = MRI->createVirtualRegister(TRC); + MIRBuilder.buildCopy(Tmp, SourceRegs[0]); + Inst.addReg(Tmp); } else { // Otherwise, this outputs to a register (directly for C_Register / // C_RegisterClass. Find a register that we can use. @@ -406,6 +410,19 @@ bool InlineAsmLowering::lowerInlineAsm( InstFlagIdx += getNumOpRegs(*Inst, InstFlagIdx) + 1; assert(getNumOpRegs(*Inst, InstFlagIdx) == 1 && "Wrong flag"); + unsigned MatchedOperandFlag = Inst->getOperand(InstFlagIdx).getImm(); + if (InlineAsm::isMemKind(MatchedOperandFlag)) { + Inst.addImm(MatchedOperandFlag); + Inst.addReg(Inst->getOperand(InstFlagIdx + 1).getReg()); + break; + } + + if (!InlineAsm::isRegDefKind(MatchedOperandFlag) && + !InlineAsm::isRegDefEarlyClobberKind(MatchedOperandFlag)) { + LLVM_DEBUG(dbgs() << "Unknown matching constraint\n"); + return false; + } + // We want to tie input to register in next operand. unsigned DefRegIdx = InstFlagIdx + 1; Register Def = Inst->getOperand(DefRegIdx).getReg(); but the target should be the one to create the copy and add the immediate and register to Inst.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83235/new/ https://reviews.llvm.org/D83235 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:18 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:18 +0000 (UTC) Subject: [PATCH] D83235: [GlobalISel][InlineAsm] Fix matching input constraints to mem operand Message-ID: Petar.Avramovic created this revision. Petar.Avramovic added reviewers: john.brawn, foad, arsenm. Herald added subscribers: llvm-commits, hiraditya, rovka, wdng. Herald added a project: LLVM. Mark matching input constraint to mem operand as not supported. https://reviews.llvm.org/D83235 Files: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll Index: llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll =================================================================== --- llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll +++ llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll @@ -235,6 +235,16 @@ ret %res } +; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to translate instruction{{.*}}asm_indirect_output +; FALLBACK-WITH-REPORT-OUT-LABEL: asm_indirect_output +define void @asm_indirect_output() { +entry: + %ap = alloca i8*, align 8 + %0 = load i8*, i8** %ap, align 8 + call void asm sideeffect "", "=*r|m,0,~{memory}"(i8** %ap, i8* %0) + ret void +} + attributes #1 = { "target-features"="+sve" } declare @llvm.aarch64.sve.ptrue.nxv16i1(i32 %pattern) Index: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp =================================================================== --- llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -406,6 +406,18 @@ InstFlagIdx += getNumOpRegs(*Inst, InstFlagIdx) + 1; assert(getNumOpRegs(*Inst, InstFlagIdx) == 1 && "Wrong flag"); + unsigned MatchedOperandFlag = Inst->getOperand(InstFlagIdx).getImm(); + if 
(InlineAsm::isMemKind(MatchedOperandFlag)) { + LLVM_DEBUG(dbgs() << "Matching input constraint to mem operand not " + "supported. This should be target specific.\n"); + return false; + } + if (!InlineAsm::isRegDefKind(MatchedOperandFlag) && + !InlineAsm::isRegDefEarlyClobberKind(MatchedOperandFlag)) { + LLVM_DEBUG(dbgs() << "Unknown matching constraint\n"); + return false; + } + // We want to tie input to register in next operand. unsigned DefRegIdx = InstFlagIdx + 1; Register Def = Inst->getOperand(DefRegIdx).getReg(); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83235.275730.patch Type: text/x-patch Size: 1923 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:18 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:18 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: <1f7f7167469dffd2ea4e384a5ff84d46@localhost.localdomain> fpetrogalli added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5208 + bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory() && + !isVectorizableLibFunctionCall(BundleMember->Inst); unsigned numAliased = 0; ---------------- ABataev wrote: > fpetrogalli wrote: > > ABataev wrote: > > > Still, this is a bad solution. Add proper attributes to the vector variants of the functions, so all memory access interfaces could properly handle such functions. > > HI @ABataev, if I understood correctly, you are asking to add a new attribute that guaranteed that the function does not have side effects? If that's the case, that is already guaranteed by the `vector-function-abi-variant` attribute. > Hi. 
No, what I'm asking is to mark the vectorized function versions with attributes that they don't write to the memory (what this change implies), so all attribute interfaces properly handle such functions as no-write functions. I just had a chat with @sanwou01 , we agreed that it is better to use the VFABI variant attribute just to describe the mapping, and have other attributes (like `readonly`) attached to the function to guarantee the fact that there is not side effects. @sanwou01 is coming up with a unit test in which a function is marked with both the `readonly` and `vector-function-abi-variant`. Is there any other attribute he should add to guarantee that the memory accesses are compatible with a concurrent invocation of the function? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:18 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:18 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: ABataev added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5208 + bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory() && + !isVectorizableLibFunctionCall(BundleMember->Inst); unsigned numAliased = 0; ---------------- fpetrogalli wrote: > ABataev wrote: > > fpetrogalli wrote: > > > ABataev wrote: > > > > Still, this is a bad solution. Add proper attributes to the vector variants of the functions, so all memory access interfaces could properly handle such functions. > > > HI @ABataev, if I understood correctly, you are asking to add a new attribute that guaranteed that the function does not have side effects? If that's the case, that is already guaranteed by the `vector-function-abi-variant` attribute. > > Hi. 
No, what I'm asking is to mark the vectorized function versions with attributes that they don't write to the memory (what this change implies), so all attribute interfaces properly handle such functions as no-write functions. > I just had a chat with @sanwou01 , we agreed that it is better to use the VFABI variant attribute just to describe the mapping, and have other attributes (like `readonly`) attached to the function to guarantee the fact that there is not side effects. > > @sanwou01 is coming up with a unit test in which a function is marked with both the `readonly` and `vector-function-abi-variant`. Is there any other attribute he should add to guarantee that the memory accesses are compatible with a concurrent invocation of the function? Maybe, `nosync` attribute? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:18 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:18 +0000 (UTC) Subject: [PATCH] D82801: IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref In-Reply-To: References: Message-ID: <066643bdc15657541687cfec61b3f48a@localhost.localdomain> arsenm added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82801/new/ https://reviews.llvm.org/D82801 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:18 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:18 +0000 (UTC) Subject: [PATCH] D82679: OpaquePtr: Don't check pointee type for byval/preallocated In-Reply-To: References: Message-ID: arsenm added a comment. 
ping

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82679/new/

https://reviews.llvm.org/D82679

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:19 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:19 +0000 (UTC)
Subject: [PATCH] D81485: GlobalISel: Verify G_BITCAST changes the type
In-Reply-To:
References:
Message-ID:

arsenm added a comment.

ping

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81485/new/

https://reviews.llvm.org/D81485

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:19 2020
From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:19 +0000 (UTC)
Subject: [PATCH] D83024: [PGO] Instrument function entry BB by default in IR PGO
In-Reply-To:
References:
Message-ID: <5619e5dd93dcf87f2b7d149f12b28770@localhost.localdomain>

wmi added a comment.

In D83024#2132133, @davidxl wrote:

> Here version is overloaded with different meanings: 1) to indicate format change; 2) to indicate instrumentation compiler version that is capable of producing this format. These two purposes can be contradicting to each other as the user can use option to force the old format with the new compiler.

For the case of a new compiler plus -pgo-instrument-entry=false: if -pgo-instrument-entry=false is used explicitly in both profile-gen and profile-use, we won't have a problem. Do we have a scenario in which it doesn't work?

> I am not sure how compiler can relax the check at the profile use time.

I just checked IndexedInstrProfReader::readHeader, and the version check already allows a profile version smaller than the compiler's current format version. So the relaxation I am talking about is already there, and Rong's patch already covers my proposed solution.
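As a rough illustration of the relaxed check described above, here is a minimal sketch. It is not the code in IndexedInstrProfReader::readHeader; `canReadProfile` and its parameters are hypothetical names used only to show the direction of the comparison.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a reader-side version check: a profile is accepted
// when its format version is not newer than the version the reader knows.
// This only illustrates the comparison; the real reader does more work.
bool canReadProfile(std::uint64_t ProfileVersion, std::uint64_t ReaderVersion) {
  return ProfileVersion <= ReaderVersion;
}
```

Under this model an older profile is still usable by a newer compiler, while a profile written in a newer format than the reader understands is refused.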
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83024/new/ https://reviews.llvm.org/D83024 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:19 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:19 +0000 (UTC) Subject: [PATCH] D83164: [flang] Basic tests of external I/O runtime (part 9/9) In-Reply-To: References: Message-ID: klausler updated this revision to Diff 275733. klausler added a comment. Add missing test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83164/new/ https://reviews.llvm.org/D83164 Files: flang/runtime/terminator.cpp flang/runtime/terminator.h flang/unittests/Runtime/CMakeLists.txt flang/unittests/Runtime/external-hello.cpp flang/unittests/Runtime/external-io.cpp flang/unittests/Runtime/testing.cpp flang/unittests/Runtime/testing.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83164.275733.patch Type: text/x-patch Size: 21538 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:19 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:19 +0000 (UTC) Subject: [PATCH] D81901: GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT In-Reply-To: References: Message-ID: <7cdd6b96fa04ab23c7b6d0403efe91d0@localhost.localdomain> arsenm added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81901/new/ https://reviews.llvm.org/D81901 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:19 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:19 +0000 (UTC) Subject: [PATCH] D73940: GlobalISel: Reimplement moreElementsVectorDst In-Reply-To: References: Message-ID: <98337c1a60adba5f6c6f426517d1965d@localhost.localdomain> arsenm added a comment. 
ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73940/new/ https://reviews.llvm.org/D73940 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:19 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:19 +0000 (UTC) Subject: [PATCH] D83237: [flang] Add missing include for std::min Message-ID: tskeith created this revision. tskeith added a reviewer: klausler. tskeith added a project: Flang. Herald added a reviewer: jdoerfert. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. This was causing the build to fail on macos. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83237 Files: flang/runtime/file.cpp Index: flang/runtime/file.cpp =================================================================== --- flang/runtime/file.cpp +++ flang/runtime/file.cpp @@ -9,6 +9,7 @@ #include "file.h" #include "magic-numbers.h" #include "memory.h" +#include #include #include #include -------------- next part -------------- A non-text attachment was scrubbed... Name: D83237.275734.patch Type: text/x-patch Size: 317 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:20 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:20 +0000 (UTC) Subject: [PATCH] D83236: [DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour Message-ID: jmorse created this revision. jmorse added reviewers: aprantl, probinson, dblaikie, Orlando, vsk. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Occasionally we see absolutely massive basic blocks, typically in global constructors that are vulnerable to heavy inlining. When these blocks are dense with DBG_VALUE instructions, we can hit near quadratic complexity in DwarfDebug's validThroughout function. 
The problem is caused by:

- validThroughout having to step through all instructions in the block to examine their lexical scope, and
- a high proportion of instructions in that block being DBG_VALUEs for a unique variable fragment,

leading to us stepping through every instruction in the block, for (nearly) each instruction in the block.

In the particular sample I'm looking at, there's a block with 120K instructions and maybe two-thirds of them are DBG_VALUEs. Not running validThroughout for this block cuts time in DWARF emission in half (which is many tens of seconds).

By adding this guard, we force variables in this block to use a location list rather than a single-location expression, as shown in the added test. This shouldn't change the meaning of the output DWARF at all: instead we use a less efficient DWARF encoding to avoid a poor-performance code path. In the long term this could be fixed by Orlando's D82129 providing enough instruction ordering information to make validThroughout's checks less complex, but we're not there yet.

The testing technique is shamelessly ripped off from D80662. I've used a set of very-large block pointers rather than calling size() each time, because size() isn't constant-time with ilists.

The default setting of blocks that are over 30,000 instructions long being considered too large isn't determined scientifically; rather, it solves the problem in front of me, and doesn't trigger on a stage2 clang build. Suggestions on a good mechanism to pick this number most welcome.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83236

Files:
  llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
  llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
  llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83236.275714.patch
Type: text/x-patch
Size: 7137 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:20 2020
From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:20 +0000 (UTC)
Subject: [PATCH] D83237: [flang] Add missing include for std::min
In-Reply-To:
References:
Message-ID:

DavidTruby accepted this revision.
DavidTruby added a comment.
This revision is now accepted and ready to land.

We must have overlapped here somehow; I actually created a revision for this a couple of hours ago: https://reviews.llvm.org/D83227

I'm happy to LGTM this as it is identical to my patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83237/new/

https://reviews.llvm.org/D83237

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:20 2020
From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:20 +0000 (UTC)
Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d
In-Reply-To:
References:
Message-ID:

DiggerLin updated this revision to Diff 275735.
DiggerLin added a comment.

When hasVecInfo() is true, the parameter type should be parsed as follows:

- if has_vec is set, encoded as follows:
  - '00' ==> fixed parameter
  - '01' ==> vector parameter
  - '10' ==> single-precision float parameter
  - '11' ==> double-precision float parameter

This is not included in the patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81585/new/

https://reviews.llvm.org/D81585

Files:
  llvm/include/llvm/BinaryFormat/XCOFF.h
  llvm/include/llvm/Object/XCOFFObjectFile.h
  llvm/lib/Object/XCOFFObjectFile.cpp
  llvm/unittests/Object/CMakeLists.txt
  llvm/unittests/Object/XCOFFObjectFileTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D81585.275735.patch Type: text/x-patch Size: 14552 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:20 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:20 +0000 (UTC) Subject: [PATCH] D83238: AMDGPU/GlobalISel: Add types to special inputs Message-ID: arsenm created this revision. arsenm added reviewers: rampitec, kerbowa. Herald added subscribers: hiraditya, t-tye, tpr, dstuttard, rovka, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM. When passing special ABI inputs, we have no existing context for the type to use. https://reviews.llvm.org/D83238 Files: llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83238.275736.patch Type: text/x-patch Size: 10711 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:21 2020 From: llvm-commits at lists.llvm.org (Sanne Wouda via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:21 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: sanwou01 added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5208 + bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory() && + !isVectorizableLibFunctionCall(BundleMember->Inst); unsigned numAliased = 0; ---------------- ABataev wrote: > fpetrogalli wrote: > > ABataev wrote: > > > fpetrogalli wrote: > > > > ABataev wrote: > > > > > Still, this is a bad solution. Add proper attributes to the vector variants of the functions, so all memory access interfaces could properly handle such functions. 
> > > > HI @ABataev, if I understood correctly, you are asking to add a new attribute that guaranteed that the function does not have side effects? If that's the case, that is already guaranteed by the `vector-function-abi-variant` attribute. > > > Hi. No, what I'm asking is to mark the vectorized function versions with attributes that they don't write to the memory (what this change implies), so all attribute interfaces properly handle such functions as no-write functions. > > I just had a chat with @sanwou01 , we agreed that it is better to use the VFABI variant attribute just to describe the mapping, and have other attributes (like `readonly`) attached to the function to guarantee the fact that there is not side effects. > > > > @sanwou01 is coming up with a unit test in which a function is marked with both the `readonly` and `vector-function-abi-variant`. Is there any other attribute he should add to guarantee that the memory accesses are compatible with a concurrent invocation of the function? > Maybe, `nosync` attribute? Agreed that relying on the proper attributes is much cleaner. `nosync` is too weak, it only guarantees no communication between threads e.g. through the absence of volatile, not between subsequent calls on the main thread. `readonly` would be sufficient for most math functions. `argmemonly` would be the right attribute for functions that take pointers (possibly with `readonly`), but we'd have to check aliasing in that case. Also note that the attribute has to be set on the original scalar function (that's what's being checked here); the vector variant, by being an implementation of the scalar function, should also conform to the attribute. 
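To make the separation of concerns concrete, here is a toy model of the decision being argued for. It is only a sketch under stated assumptions: `CalleeAttrs` and `mayWriteToMemory` are hypothetical names, not LLVM's API, and the model ignores `argmemonly` and aliasing entirely.

```cpp
#include <cassert>

// Hypothetical summary of the attributes on the original scalar callee.
struct CalleeAttrs {
  bool ReadNone = false;         // accesses no memory at all
  bool ReadOnly = false;         // may read memory, never writes it
  bool HasVectorVariant = false; // carries a vector-function-abi-variant mapping
};

// The mapping attribute only says which vector function can replace the
// call. Whether the call may write memory is answered by the memory
// attributes alone, so HasVectorVariant is deliberately not consulted.
bool mayWriteToMemory(const CalleeAttrs &A) {
  return !(A.ReadNone || A.ReadOnly);
}
```

With this split, a function that has a vector variant but no memory attribute is still treated conservatively, which is the behaviour the attribute-based approach is meant to give.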
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:22 2020 From: llvm-commits at lists.llvm.org (Daniel Sanders via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:22 +0000 (UTC) Subject: [PATCH] D83034: [GlobalISel] Don't skip adding predicate matcher In-Reply-To: References: Message-ID: <2fde50fb17b99b1548498fb36e64ae0d@localhost.localdomain> dsanders added a comment. Could you add a test case or point us at a example of an existing test that goes wrong? The description in the commit message is unlikely to be the full story as we already cover loads with non-null MemoryVT's. My best guess is you are attempting to use builtin predicates and custom predicates together. I don't see a reason why that shouldn't be allowed but it's not something that was intended as the goal was aiming to fully remove the custom C++ from the tablegen headers so that tablegen could do some transformations on sextload/zextload and similar to fix DAGISel vs GlobalISel modelling mismatches. ================ Comment at: llvm/utils/TableGen/GlobalISelEmitter.cpp:3595-3605 // G_LOAD is used for both non-extending and any-extending loads. if (Predicate.isLoad() && Predicate.isNonExtLoad()) { InsnMatcher.addPredicate( 0, MemoryVsLLTSizePredicateMatcher::EqualTo, 0); continue; } if (Predicate.isLoad() && Predicate.isAnyExtLoad()) { ---------------- These two cases covered all of the loads with predicates that were moved to built-in predicates. 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83034/new/

https://reviews.llvm.org/D83034

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:22 2020
From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:22 +0000 (UTC)
Subject: [PATCH] D82359: [Power10] Implement Vector Replace Builtins in LLVM/Clang
In-Reply-To:
References:
Message-ID: <486bf7201cc03af2b4349b5907223e05@localhost.localdomain>

lei added inline comments.

================
Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:10
-// RUN: %clang_cc1 -target-feature +vsx -target-feature +altivec \
-// RUN: -target-cpu pwr10 -triple powerpc64le-unknown-unknown -emit-llvm %s \
----------------
This should not be removed, since there are tests in here that depend on this check prefix.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82359/new/

https://reviews.llvm.org/D82359

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:22 2020
From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:22 +0000 (UTC)
Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner
In-Reply-To:
References:
Message-ID:

mtrofin updated this revision to Diff 275743.
mtrofin marked 23 inline comments as done.
mtrofin added a comment.
feedback Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 Files: llvm/CMakeLists.txt llvm/include/llvm/Analysis/InlineSizeEstimatorAnalysis.h llvm/include/llvm/Analysis/Utils/TFUtils.h llvm/lib/Analysis/CMakeLists.txt llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp llvm/lib/Analysis/TFUtils.cpp llvm/test/CMakeLists.txt llvm/test/lit.site.cfg.py.in llvm/unittests/Analysis/CMakeLists.txt llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/saved_model.pb llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.data-00000-of-00001 llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.index llvm/unittests/Analysis/TFUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82817.275743.patch Type: text/x-patch Size: 34451 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:22 2020 From: llvm-commits at lists.llvm.org (Mirko Brkusanin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:22 +0000 (UTC) Subject: [PATCH] D83240: [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping Message-ID: mbrkusanin created this revision. mbrkusanin added reviewers: foad, arsenm. mbrkusanin added a project: LLVM. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, rovka, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Add missing mappings and tablegen definitions for TBUFFER_STORE_FORMAT. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83240 Files: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/BUFInstructions.td llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.d16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83240.275740.patch Type: text/x-patch Size: 10563 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:22 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:54:22 +0000 (UTC) Subject: [PATCH] D79690: [RISCV] Fold ADDIs into load/stores with nonzero offsets In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG61c2a0bb8236: [RISCV] Fold ADDIs into load/stores with nonzero offsets (authored by luismarques). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79690/new/ https://reviews.llvm.org/D79690 Files: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll llvm/test/CodeGen/RISCV/callee-saved-gprs.ll llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll llvm/test/CodeGen/RISCV/fp128.ll llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll llvm/test/CodeGen/RISCV/wide-mem.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79690.275739.patch Type: text/x-patch Size: 148485 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:22 2020 From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:22 +0000 (UTC) Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner In-Reply-To: References: Message-ID: <0a882442d2720569b8669587134ff6ca@localhost.localdomain> mtrofin added inline comments. ================ Comment at: llvm/include/llvm/Analysis/InlineSizeEstimatorAnalysis.h:36 +#endif // LLVM_ANALYSIS_INLINESIZEESTIMATORANALYSIS_H \ No newline at end of file ---------------- jdoerfert wrote: > I would prefer to remove the `TF` prefix (in the generic code parts). 
Since we only have a single model evaluator class we could add `using ModelEvaluator = TFModelEvaluator` here. We then document it saying that new evaluators might be added in which case ModelEvaluator becomes an interface implemented by different classes.

Happy to rename to ModelEvaluator, but I would shy away from documenting how this may pan out if we wanted to support other evaluators. The issue would be with the API typing. Currently it is tied to TF. If we wanted to generalize later over a few other evaluators, we'd need to figure out common abstractions and so on, at which point the name of the base type would be the least disruptive of the changes, I think.

Note also that this code isn't that generic, in that its compilation is tied to the presence of the tensorflow C API library. The only generic part of it is that it's reused by both the size estimator in this patch and the development mode InlineAdvisor (next patch).

Back to the original feedback: happy to rename to ModelEvaluator (i.e. not even do "using", just rename); or we can keep it "TF" because it's tied to tensorflow anyway (APIs and implementation). wdyt?

================
Comment at: llvm/include/llvm/Analysis/Utils/TFUtils.h:40
+ const std::vector &OutputNames,
+ const char *Tags = "serve");
+ ~TFModelEvaluator();
----------------
jdoerfert wrote:
> Is "serve" a keyword for TF? If not I would recommend changing it. Could you please use llvm types if possible? The strings don't seem to be necessary, `StringRef` should do fine. For sure for the const char. `const SmallVectorImpl` should be preferred over `std::vector`. If you think a vector of strings is safer for the future, that is fine with me.

"serve" is the typical tag used in tensorflow for the graph that's ready to be used ("served").

Re. strings: the use case we'll have in the 'development' mode InlineAdvisor will require string concatenation, and I think at that point StringRef would lose its benefit.
The vectors will be larger (the development mode currently has 11 features, and we're looking at more) - would SmallVectorImpl still make sense? ================ Comment at: llvm/include/llvm/Analysis/Utils/TFUtils.h:69 + void deleteSession(); + bool checkReportAndReset(const TF_Output &Output, StringRef Name); +}; ---------------- jdoerfert wrote: > SmallVector's above if possible. ack - addressing in the previous comment ================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:44 +class TFModelEvaluator {}; +} // namespace llvm +InlineSizeEstimatorAnalysis::InlineSizeEstimatorAnalysis() {} ---------------- jdoerfert wrote: > Now that I see this, we might want to consider adding ModelEvaluator right away with some default implementation and deriving from it. See my previous observation about the API design. ================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:59 + "ml-inliner-ir2native-model", cl::Hidden, + cl::desc("Path to saved model evaluating native size from IR.")); + ---------------- jdoerfert wrote: > Can we move this to the top, just below the `DEBUG_TYPE`. That is where people (=I) look for command line options. OK - I flipped the ifdef, so this bubbled up, does that work? ================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:65 +#include "llvm/IR/Instruction.def" +} + ---------------- jdoerfert wrote: > We do not have an existing function for this? Surprising. Not that I can tell - see also TargetLoweringBase.cpp InstructionOpcodes enum, and Instruction.h - the OtherOps enum. I would hesitate using OtherOps because it may evolve differently. 
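For context on the `#include "llvm/IR/Instruction.def"` snippet above: it appears to rely on the usual X-macro pattern, where a list of entries is expanded with a locally defined macro. The following is a minimal, self-contained sketch of that pattern. `TOY_OPCODES`, its entries and `getMaxToyOpcode` are made up for illustration and are not taken from Instruction.def.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opcode table standing in for a .def file. The names and
// numbers are invented for this example.
#define TOY_OPCODES(X) \
  X(1, Ret)            \
  X(2, Br)             \
  X(13, Add)           \
  X(34, Call)

// Returns the larger of two values; used while expanding the table.
constexpr std::size_t maxOf(std::size_t A, std::size_t B) {
  return A > B ? A : B;
}

// Expands the table once with a macro that folds each opcode number into
// a running maximum, so the result tracks the table automatically.
std::size_t getMaxToyOpcode() {
  std::size_t Max = 0;
#define X(NUM, NAME) Max = maxOf(Max, NUM);
  TOY_OPCODES(X)
#undef X
  return Max;
}
```

The appeal of this approach is that the helper does not depend on a separately maintained enum, which matches the hesitation about reusing OtherOps expressed above.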
================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:101 + IRToNativeSizeLearning::FunctionFeatures::ImportantInstructionSuccessions = + {{1, 34}, {15, 27}, {53, 53}, {53, 34}, {1, 11}, {32, 2}, {2, 48}, + {28, 48}, {1, 45}, {49, 32}, {57, 56}, {55, 53}, {1, 28}, {57, 34}, ---------------- davidxl wrote: > Is it possible to avoid using hardcoded opcodes? Not easily, and the benefit would be unclear. The pairs were generated through analysis of the data sets used for training this model. *If* we ended up down this path (i.e. feature pairs), they would be maintained in the same way - starting from analysis of data sets, not starting from IR. Using indices is just simpler in that case. ================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:120 + {40, 34}, {1, 13}, {38, 34}, {29, 2}, {34, 2}, {1, 39}, {1, 22}, + {1, 27}, {49, 1}, {1, 8}, {56, 2}}; + ---------------- jdoerfert wrote: > I guess this is basically fixed? maybe a `std::array` would do too. > Yes, but (unless I'm missing something) I'd have to specify the "N" (the size). This way, I let the vector's size calculate other values (like FeatureCount) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:23 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:23 +0000 (UTC) Subject: [PATCH] D83138: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS In-Reply-To: References: Message-ID: <5d2c28ab6f8100e867a98f37f00bf156@localhost.localdomain> MaskRay marked an inline comment as done. MaskRay added inline comments. ================ Comment at: lld/ELF/Relocations.cpp:241 // Local-Dynamic relocs can be relaxed to Local-Exec. 
-  if (expr == R_DTPREL && !config->shared) {
+  if (expr == R_DTPREL && canRelax && !config->shared) {
     c.relocations.push_back(
----------------
grimar wrote:
> I've noticed that `canRelax` is always used with `&& !config->shared` now.
> So can it be:
>
> ```
> bool canRelax = !config->shared && config->emachine != EM_ARM &&
>                 config->emachine != EM_HEXAGON &&
>                 config->emachine != EM_RISCV;
> ```
>
>
I agree. `canRelax` does not capture the meaning precisely now. It actually means whether we can transition from a TLS model for shared objects (general/local dynamic) to a TLS model for executables (initial/local exec).

If we are going to rename the variable, that does not belong in this change. I'll do that separately.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83138/new/

https://reviews.llvm.org/D83138

From llvm-commits at lists.llvm.org Mon Jul 6 12:54:23 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 19:54:23 +0000 (UTC)
Subject: [PATCH] D83138: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS
In-Reply-To:
References:
Message-ID: <98308840b50e5c5fc9de54973d1ce4e5@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rGc1a5f73a4ae7: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS (authored by MaskRay).
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83138/new/ https://reviews.llvm.org/D83138 Files: lld/ELF/Arch/ARM.cpp lld/ELF/Relocations.cpp lld/test/ELF/debug-dead-reloc-tls-arm.s Index: lld/test/ELF/debug-dead-reloc-tls-arm.s =================================================================== --- lld/test/ELF/debug-dead-reloc-tls-arm.s +++ lld/test/ELF/debug-dead-reloc-tls-arm.s @@ -7,8 +7,7 @@ # RUN: llvm-objdump -s %t | FileCheck %s # CHECK: Contents of section .debug_info: -## FIXME: Use ffffffff -# CHECK-NEXT: 0000 00000000 +# CHECK-NEXT: 0000 ffffffff .globl _start _start: Index: lld/ELF/Relocations.cpp =================================================================== --- lld/ELF/Relocations.cpp +++ lld/ELF/Relocations.cpp @@ -238,7 +238,7 @@ } // Local-Dynamic relocs can be relaxed to Local-Exec. - if (expr == R_DTPREL && !config->shared) { + if (expr == R_DTPREL && canRelax && !config->shared) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); Index: lld/ELF/Arch/ARM.cpp =================================================================== --- lld/ELF/Arch/ARM.cpp +++ lld/ELF/Arch/ARM.cpp @@ -121,6 +121,8 @@ return R_TLSGD_PC; case R_ARM_TLS_LDM32: return R_TLSLD_PC; + case R_ARM_TLS_LDO32: + return R_DTPREL; case R_ARM_BASE_PREL: // B(S) + A - P // FIXME: currently B(S) assumed to be .got, this may not hold for all -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83138.275745.patch Type: text/x-patch Size: 1314 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:24 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:54:24 +0000 (UTC) Subject: [PATCH] D83059: [RISCV] Use Generated Instruction Uncompresser In-Reply-To: References: Message-ID: <712690110711c3f0c7d184133b2bd31e@localhost.localdomain> luismarques accepted this revision. luismarques added a comment. This revision is now accepted and ready to land. LGTM. ================ Comment at: llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp:148-152 + bool Res; + MCInst UncompressedMI; + Res = uncompressInst(UncompressedMI, Inst, MRI, STI); + if (Res) + Inst = std::move(UncompressedMI); ---------------- Nit: declare `Res` at the point of initialization or, probably better yet, just do the `if` on the result of `uncompressInst`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83059/new/ https://reviews.llvm.org/D83059 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:24 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:24 +0000 (UTC) Subject: [PATCH] D83242: [RFC][BPF] support expr with typedef type for FIELD_EXISTENCE reloc Message-ID: yonghong-song created this revision. yonghong-song added reviewers: anakryiko, ast. Herald added subscribers: llvm-commits, cfe-commits, hiraditya. Herald added projects: clang, LLVM. [The patch needs more work e.g. to create proper test, to be agreed on interface, etc.] This patch added support of expression with typedef type for FIELD_EXISTENCE relocation. 
This tries to address the following use case: enum { FIELD_EXISTENCE = 2, }; typedef unsigned long u64; typedef unsigned int u32; struct bpf_perf_event_data_kern; typedef u64 (*btf_bpf_read_branch_records)(struct bpf_perf_event_data_kern *, void *, u32, u64); int test() { btf_bpf_read_branch_records a; return __builtin_preserve_field_info(a, FIELD_EXISTENCE); } A relocation with a type of typedef 'btf_bpf_read_branch_records' will be recorded. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83242 Files: clang/lib/CodeGen/CGBuiltin.cpp clang/lib/Sema/SemaChecking.cpp llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp Index: llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp =================================================================== --- llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp +++ llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp @@ -277,7 +277,7 @@ } if (GV->getName().startswith("llvm.bpf.preserve.field.info")) { CInfo.Kind = BPFPreserveFieldInfoAI; - CInfo.Metadata = nullptr; + CInfo.Metadata = Call->getMetadata(LLVMContext::MD_preserve_access_index); // Check validity of info_kind as clang did not check this. uint64_t InfoKind = getConstant(Call->getArgOperand(1)); if (InfoKind >= BPFCoreSharedInfo::MAX_FIELD_RELOC_KIND) @@ -742,6 +742,13 @@ break; } + if (CInfo.Kind == BPFPreserveFieldInfoAI) { + // typedef type. + TypeName = std::string(PossibleTypeDef->getName()); + TypeMeta = PossibleTypeDef; + break; + } + assert(CInfo.Kind == BPFPreserveArrayAI); // Array entries will always be consumed for accumulative initial index. 
Index: clang/lib/Sema/SemaChecking.cpp =================================================================== --- clang/lib/Sema/SemaChecking.cpp +++ clang/lib/Sema/SemaChecking.cpp @@ -2582,6 +2582,15 @@ return false; } + // The second argument needs to be a constant int + Arg = TheCall->getArg(1); + llvm::APSInt Value; + if (!Arg->isIntegerConstantExpr(Value, Context)) { + Diag(Arg->getBeginLoc(), diag::err_preserve_field_info_not_const) + << 2 << Arg->getSourceRange(); + return true; + } + // The first argument needs to be a record field access. // If it is an array element access, we delay decision // to BPF backend to check whether the access is a @@ -2590,21 +2599,14 @@ if (Arg->getType()->getAsPlaceholderType() || (Arg->IgnoreParens()->getObjectKind() != OK_BitField && !dyn_cast(Arg->IgnoreParens()) && - !dyn_cast(Arg->IgnoreParens()))) { + !dyn_cast(Arg->IgnoreParens()) && + // Value 2 here represents reloc kind FIELD_EXISTENCE. + (Value != 2 || !Arg->getType()->getAs()))) { Diag(Arg->getBeginLoc(), diag::err_preserve_field_info_not_field) << 1 << Arg->getSourceRange(); return true; } - // The second argument needs to be a constant int - Arg = TheCall->getArg(1); - llvm::APSInt Value; - if (!Arg->isIntegerConstantExpr(Value, Context)) { - Diag(Arg->getBeginLoc(), diag::err_preserve_field_info_not_const) - << 2 << Arg->getSourceRange(); - return true; - } - TheCall->setType(Context.UnsignedIntTy); return false; } Index: clang/lib/CodeGen/CGBuiltin.cpp =================================================================== --- clang/lib/CodeGen/CGBuiltin.cpp +++ clang/lib/CodeGen/CGBuiltin.cpp @@ -10966,11 +10966,17 @@ ConstantInt *C = cast(EmitScalarExpr(E->getArg(1))); Value *InfoKind = ConstantInt::get(Int64Ty, C->getSExtValue()); + llvm::DIType *DbgInfo = + getDebugInfo()->getOrCreateStandaloneType(E->getArg(0)->getType(), + E->getArg(0)->getExprLoc()); + // Built the IR for the preserve_field_info intrinsic. 
llvm::Function *FnGetFieldInfo = llvm::Intrinsic::getDeclaration( &CGM.getModule(), llvm::Intrinsic::bpf_preserve_field_info, {FieldAddr->getType()}); - return Builder.CreateCall(FnGetFieldInfo, {FieldAddr, InfoKind}); + CallInst *Fn = Builder.CreateCall(FnGetFieldInfo, {FieldAddr, InfoKind}); + Fn->setMetadata(LLVMContext::MD_preserve_access_index, DbgInfo); + return Fn; } case BPF::BI__builtin_btf_type_id: { Value *FieldVal = nullptr; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83242.275744.patch Type: text/x-patch Size: 3737 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:24 2020 From: llvm-commits at lists.llvm.org (David Tenty via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:24 +0000 (UTC) Subject: [PATCH] D82905: [AIX] Add system-aix to lit config file In-Reply-To: References: Message-ID: <23a23a3592e36011b8242240be01aae9@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG2402f9385e85: [AIX] Add system-aix to lit config file (authored by ShuhongL, committed by daltenty). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82905/new/ https://reviews.llvm.org/D82905 Files: llvm/utils/lit/lit/llvm/config.py llvm/utils/lit/tests/lit.cfg llvm/utils/lit/tests/shtest-format-argv0.py Index: llvm/utils/lit/tests/shtest-format-argv0.py =================================================================== --- llvm/utils/lit/tests/shtest-format-argv0.py +++ llvm/utils/lit/tests/shtest-format-argv0.py @@ -5,7 +5,7 @@ # # This test is not supported on AIX since `[` is only available as a shell builtin # and is not installed under PATH by default. 
-# UNSUPPORTED: aix +# UNSUPPORTED: system-aix # # RUN: %{lit} -j 1 -v %{inputs}/shtest-format-argv0 | FileCheck %s Index: llvm/utils/lit/tests/lit.cfg =================================================================== --- llvm/utils/lit/tests/lit.cfg +++ llvm/utils/lit/tests/lit.cfg @@ -87,7 +87,7 @@ if sys.platform.startswith('win') or sys.platform.startswith('cygwin'): config.available_features.add('system-windows') if platform.system() == 'AIX': - config.available_features.add('aix') + config.available_features.add('system-aix') # For each of lit's internal shell commands ('env', 'cd', 'diff', etc.), put # a fake command that always fails at the start of PATH. This helps us check Index: llvm/utils/lit/lit/llvm/config.py =================================================================== --- llvm/utils/lit/lit/llvm/config.py +++ llvm/utils/lit/lit/llvm/config.py @@ -51,12 +51,14 @@ elif platform.system() == 'Windows': # For tests that require Windows to run. features.add('system-windows') - elif platform.system() == "Linux": + elif platform.system() == 'Linux': features.add('system-linux') elif platform.system() in ['FreeBSD']: features.add('system-freebsd') - elif platform.system() == "NetBSD": + elif platform.system() == 'NetBSD': features.add('system-netbsd') + elif platform.system() == 'AIX': + features.add('system-aix') # Native compilation: host arch == default triple arch # Both of these values should probably be in every site config (e.g. as -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82905.275747.patch Type: text/x-patch Size: 2018 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:26 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:26 +0000 (UTC) Subject: [PATCH] D82974: [DebugInfo] Allow GNU macro extension to be read In-Reply-To: References: Message-ID: <9ea30b5a8b1afc61a3041b24cd17c705@localhost.localdomain> dstenb updated this revision to Diff 275749. dstenb marked 3 inline comments as done. dstenb added a comment. Address comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82974/new/ https://reviews.llvm.org/D82974 Files: llvm/include/llvm/BinaryFormat/Dwarf.h llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h llvm/lib/BinaryFormat/Dwarf.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp llvm/test/DebugInfo/X86/debug-macro-gnu.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82974.275749.patch Type: text/x-patch Size: 6611 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:26 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:26 +0000 (UTC) Subject: [PATCH] D82974: [DebugInfo] Allow GNU macro extension to be read In-Reply-To: References: Message-ID: dstenb added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3055-3059 + auto FormToString = [](unsigned Form) { + return dwarf::MacroString(Form, /*GNU=*/false); + }; emitMacroFileImpl(F, U, dwarf::DW_MACRO_start_file, + dwarf::DW_MACRO_end_file, FormToString); ---------------- dblaikie wrote: > I'd probably inline the lambda into the call expression - it's short/simple enough. This change is now removed. I'll inline the expression in the follow-up patch. 
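For readers following the lambda discussion, here is a minimal sketch of what "inlining the lambda into the call expression" looks like. It is not the DwarfDebug.cpp code: `macroString`, `emitMacroFile`, and `emitStartFile` are hypothetical stand-ins invented for illustration, and `std::function` is used only so the sketch compiles without LLVM headers.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a form-to-string lookup (not the LLVM function).
std::string macroString(unsigned form) {
  return form == 3 ? "DW_MACRO_start_file" : "DW_MACRO_unknown";
}

// Hypothetical emitter that receives the lookup as a callback.
std::string emitMacroFile(
    unsigned startFile,
    const std::function<std::string(unsigned)> &formToString) {
  return formToString(startFile);
}

// The lambda is written directly in the call expression instead of being
// bound to a named local first.
std::string emitStartFile() {
  return emitMacroFile(3, [](unsigned form) { return macroString(form); });
}
```

Whether the inline form reads better is a judgment call; it mainly helps when the lambda is a one-liner, as in the reviewed code.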
================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h:535 + unsigned EndFile, + std::function MacroFormToString); void handleMacroNodes(DIMacroNodeArray Nodes, DwarfCompileUnit &U); ---------------- dblaikie wrote: > Probably llvm::function_ref here, since the functor doesn't escape the callee Yes, thanks! Since I moved out the GNU defines to a separate GnuMacroString() function this change is done in the patch which introduces emission of the extension. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp:57 + WithColor(OS, HighlightColor::Macro).get() + << MacroString(E.Type, /*GNU=*/Macros.Header.Version < 5); else ---------------- ikudrin wrote: > If this is written like > ``` > << (Macros.Header.Version < 5 ? GnuMacroString(E.type) : MacroString(E.type)) ; > ``` > > then `llvm::dwarf::MacroString()` can be left as is and most of the changes are not needed. Yes, thanks! Since there are so few uses where we conditionally want to select either the DWARF or GNU defines that seems cleaner. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82974/new/ https://reviews.llvm.org/D82974 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:26 2020 From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:26 +0000 (UTC) Subject: [PATCH] D82359: [Power10] Implement Vector Replace Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <2c9fa8b09e3e2fc20d45276f5a7d3cf3@localhost.localdomain> biplmish updated this revision to Diff 275748. biplmish added a comment. Add the CHECK-LE run because there are dependent tests. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82359/new/ https://reviews.llvm.org/D82359 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/include/llvm/IR/IntrinsicsPowerPC.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82359.275748.patch Type: text/x-patch Size: 17451 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:26 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:26 +0000 (UTC) Subject: [PATCH] D80358: [MLIR] Add RegionKindInterface In-Reply-To: References: Message-ID: <6cb48508221db8ca7106bd68170c64f2@localhost.localdomain> stephenneuendorffer marked 18 inline comments as done. stephenneuendorffer added inline comments. ================ Comment at: mlir/docs/LangRef.md:45 +left unspecified, relying on each transformation on operations to be +designed with the semantics in mind. Preferably, the semantic +properties are described abstractly using [Traits](Traits.md) and ---------------- mehdi_amini wrote: > > each transformation on operations to be designed with the semantics in mind > > I don't understand what this means here. I've rewritten this section. ================ Comment at: mlir/docs/LangRef.md:211 +Value identifiers are only in scope for the region in which they are +defined. Argument identifiers in mapping functions are in scope for +the mapping body. Particular operations may further limit the valid ---------------- mehdi_amini wrote: > > Argument identifiers in mapping functions > > Not sure what "mapping function" refers to here? It is in the existing doc, but this is the only place it is used. I wonder if it isn't something remaining from before functions were regular Operations? > > Is this about block argument? 
Can you just write "Block argument" instead? > > Edit: I wonder if this is about affine maps? > (we can leave this out of this revision, but we should probably revisit this) Yes, I believe this is talking about affine maps. ================ Comment at: mlir/docs/LangRef.md:213 +the mapping body. Particular operations may further limit the valid +scope of identifiers. For instance, the scope of values in a region +with [SSA control flow semantics](#control-flow-and-ssacfg-regions) is ---------------- mehdi_amini wrote: > > Particular operations may further limit the valid scope of identifiers > > I think this is not specific enough: the scope is only limited by the region kind and I'd rather spell it out this way. > With how you wrote it, one could think that we could have an operation restricting the scope of the value it produces (for example to be block-local). > You're right, that could be misleading. I've rewritten it. ================ Comment at: mlir/docs/LangRef.md:339 An MLIR Module represents a top-level container operation. It contains -a single region containing a single block which can contain any -operations. Operations within this region must not implicitly capture -values defined outside the module. Modules have an optional symbol -name that can be used to refer to them in operations. +a single [SSACFG region)[#control-flow-and-ssacfg-regions] +containing a single block which can contain any operations. Operations ---------------- mehdi_amini wrote: > It isn't clear to me why a module wouldn't be a graph region? > Do we put any meaning to the order of operations in a module? I suppose it could be, although it's somewhat of a degenerate kind of region anyway, since modules don't define any values and don't contain blocks. Maybe this should be another kind of region? ================ Comment at: mlir/docs/LangRef.md:478 [[more rationale](Rationale/Rationale.md#block-arguments-vs-phi-nodes)]. 
## Regions ---------------- mehdi_amini wrote: > I feel the section here isn't entirely complete with respect to non CFG region with multiple blocks: are these still having terminator? What's the contract on block arguments? > > An example would be worthwhile as well. I added an additional note about the Graph Region rationale, but I'm really not sure how to provide an example here of something that I don't have a good example of :). Let's chat offline about this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80358/new/ https://reviews.llvm.org/D80358 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:27 2020 From: llvm-commits at lists.llvm.org (Arlo Siemsen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:27 +0000 (UTC) Subject: [PATCH] D83022: Add option LLVM_NM to allow specifying the location of the llvm-nm tool. In-Reply-To: References: Message-ID: <5219b1eaedb5a4e14a14e78bd8f08773@localhost.localdomain> arlosi updated this revision to Diff 275754. arlosi added a comment. Set llvm_nm_target to empty string as requested by smeenai. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83022/new/ https://reviews.llvm.org/D83022 Files: llvm/tools/llvm-shlib/CMakeLists.txt Index: llvm/tools/llvm-shlib/CMakeLists.txt =================================================================== --- llvm/tools/llvm-shlib/CMakeLists.txt +++ llvm/tools/llvm-shlib/CMakeLists.txt @@ -154,13 +154,17 @@ set(GEN_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/gen-msvc-exports.py) set(LLVM_EXPORTED_SYMBOL_FILE ${CMAKE_BINARY_DIR}/${CMAKE_CFG_INTDIR}/libllvm-c.exports) - - if(CMAKE_CROSSCOMPILING) - build_native_tool(llvm-nm llvm_nm) - set(llvm_nm_target "${llvm_nm}") + if(NOT LLVM_NM) + if(CMAKE_CROSSCOMPILING) + build_native_tool(llvm-nm llvm_nm) + set(llvm_nm_target "${llvm_nm}") + else() + set(llvm_nm $) + set(llvm_nm_target llvm-nm) + endif() else() - set(llvm_nm $) - set(llvm_nm_target llvm-nm) + set(llvm_nm ${LLVM_NM}) + set(llvm_nm_target "") endif() add_custom_command(OUTPUT ${LLVM_EXPORTED_SYMBOL_FILE} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83022.275754.patch Type: text/x-patch Size: 931 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:27 2020 From: llvm-commits at lists.llvm.org (Arlo Siemsen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:27 +0000 (UTC) Subject: [PATCH] D83022: Add option LLVM_NM to allow specifying the location of the llvm-nm tool. In-Reply-To: References: Message-ID: arlosi marked an inline comment as done. arlosi added a comment. I do not have commit access. Could someone land this for me? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83022/new/ https://reviews.llvm.org/D83022 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:27 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:27 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: dstenb updated this revision to Diff 275752. dstenb marked an inline comment as done. dstenb added a comment. Rebase and address comment. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 Files: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h llvm/test/DebugInfo/X86/debug-macro-gnu.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82975.275752.patch Type: text/x-patch Size: 10527 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:28 2020 From: llvm-commits at lists.llvm.org (Luofan Chen via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:28 +0000 (UTC) Subject: [PATCH] D78861: [Attributor] [WIP] Track AA dependency using dependency graph In-Reply-To: References: Message-ID: bbn added a comment. In D78861#2131573 , @jdoerfert wrote: > This looks pretty good :). Nice active review :) > > I have some minor comments below. We also should add a test for the print and dot output. I need some help here: Is there a way to test the dot output? 
I checked the .dot file and found it hard to write CHECK lines (see below) because we are interested in the link between different graph nodes (line 3 and line 4) Node0x55be15e4f7d0 [shape=record,label="{[AAValueSimplify] for CtxI ' %2 = load i32, i32* %0, align 4' at position \{arg: [@0]\} with state simplified\n}"]; Node0x55be15e4f810 [shape=record,label="{[AANoUnwind] for CtxI ' %2 = load i32, i32* %0, align 4' at position \{fn:checkAndAdvance [checkAndAdvance at -1]\} with state nounwind\n}"]; Node0x55be15e4f810 -> Node0x55be15e500b0; Node0x55be15e4f810 -> Node0x55be15e500b0; I have looked at some other similar tests like `*cfg_deopt_unreach.ll`, but none of them shows how to write CHECK lines for such test cases. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78861/new/ https://reviews.llvm.org/D78861 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:28 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:28 +0000 (UTC) Subject: [PATCH] D80358: [MLIR] Add RegionKindInterface In-Reply-To: References: Message-ID: stephenneuendorffer updated this revision to Diff 275753. stephenneuendorffer marked 4 inline comments as done.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80358/new/ https://reviews.llvm.org/D80358 Files: mlir/docs/Interfaces.md mlir/docs/LangRef.md mlir/include/mlir/IR/CMakeLists.txt mlir/include/mlir/IR/Dominance.h mlir/include/mlir/IR/RegionKindInterface.h mlir/include/mlir/IR/RegionKindInterface.td mlir/lib/IR/CMakeLists.txt mlir/lib/IR/Dominance.cpp mlir/lib/IR/RegionKindInterface.cpp mlir/lib/IR/Verifier.cpp mlir/lib/Transforms/CSE.cpp mlir/test/CMakeLists.txt mlir/test/IR/invalid.mlir mlir/test/IR/parser.mlir mlir/test/IR/traits.mlir mlir/test/lib/Dialect/Test/TestDialect.cpp mlir/test/lib/Dialect/Test/TestDialect.h mlir/test/lib/Dialect/Test/TestOps.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D80358.275753.patch Type: text/x-patch Size: 61697 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:28 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:28 +0000 (UTC) Subject: [PATCH] D83243: [ELF] Rename canRelax to sharedToExecRelax. NFC Message-ID: MaskRay created this revision. MaskRay added reviewers: grimar, psmith, ruiu. Herald added subscribers: llvm-commits, s.egerton, simoncook, arichardson, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. In the absence of TLS relaxation (rewrite of code sequences), there is still an applicable optimization: [gd]: General Dynamic: resolve DTPMOD to 1 and/or resolve DTPOFF statically All the other relaxations are only performed when !config->shared. Since [gd] is handled differently, we can fold !config->shared into canRelax and simplify its use sites. Rename it to reflect the new semantics.
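The folding described above can be sketched in isolation. This is a minimal illustration, not the lld code: `Machine`, `Config`, and the two free functions are hypothetical stand-ins for `config->emachine`, `config->shared`, and the local boolean in Relocations.cpp.

```cpp
#include <cassert>

// Hypothetical stand-ins for lld's config->emachine and config->shared.
enum class Machine { X86_64, ARM, Hexagon, RISCV };

struct Config {
  Machine emachine;
  bool shared;
};

// Old shape: the flag only excludes targets that do not relax TLS code
// sequences, so each use site has to test !shared separately.
bool canRelax(const Config &c) {
  return c.emachine != Machine::ARM && c.emachine != Machine::Hexagon &&
         c.emachine != Machine::RISCV;
}

// New shape: !shared is folded into the flag, so use sites test one name.
bool sharedToExecRelax(const Config &c) {
  return !c.shared && canRelax(c);
}
```

Under these assumptions, a use site that previously read `canRelax && !shared` becomes `sharedToExecRelax`, and one that read `!canRelax || shared` becomes `!sharedToExecRelax`, which is the same truth table by De Morgan's law.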
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83243 Files: lld/ELF/Relocations.cpp Index: lld/ELF/Relocations.cpp =================================================================== --- lld/ELF/Relocations.cpp +++ lld/ELF/Relocations.cpp @@ -197,9 +197,9 @@ return 1; } - bool canRelax = config->emachine != EM_ARM && - config->emachine != EM_HEXAGON && - config->emachine != EM_RISCV; + bool sharedToExecRelax = !config->shared && config->emachine != EM_ARM && + config->emachine != EM_HEXAGON && + config->emachine != EM_RISCV; // If we are producing an executable and the symbol is non-preemptable, it // must be defined and the code sequence can be relaxed to use Local-Exec. @@ -217,7 +217,7 @@ if (oneof( expr)) { // Local-Dynamic relocs can be relaxed to Local-Exec. - if (canRelax && !config->shared) { + if (sharedToExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -238,7 +238,7 @@ } // Local-Dynamic relocs can be relaxed to Local-Exec. - if (expr == R_DTPREL && canRelax && !config->shared) { + if (expr == R_DTPREL && sharedToExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -260,7 +260,7 @@ if (oneof(expr)) { - if (!canRelax || config->shared) { + if (!sharedToExecRelax) { if (in.got->addDynTlsEntry(sym)) { uint64_t off = in.got->getGlobalDynOffset(sym); @@ -308,7 +308,7 @@ // defined. if (oneof(expr) && - canRelax && isLocalInExecutable) { + sharedToExecRelax && isLocalInExecutable) { c.relocations.push_back({R_RELAX_TLS_IE_TO_LE, type, offset, addend, &sym}); return 1; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83243.275756.patch Type: text/x-patch Size: 2096 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:28 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:28 +0000 (UTC) Subject: [PATCH] D82871: [SVE] Custom ISel for fixed length extract/insert_subvector. In-Reply-To: References: Message-ID: cameron.mcinally marked an inline comment as done. cameron.mcinally added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:3244 +// NOTE: When targeting fixed length vectors at SVE the range of MVTs is runtime +// variable, hence this manual selection. +static SDNode *extractSubReg(SelectionDAG *DAG, EVT VT, SDValue V) { ---------------- paulwalker-arm wrote: > efriedma wrote: > > paulwalker-arm wrote: > > > paulwalker-arm wrote: > > > > efriedma wrote: > > > > > paulwalker-arm wrote: > > > > > > efriedma wrote: > > > > > > > I'm not sure I understand the issue here. Is the problem just that for a pattern, you need to write the type of the result? I don't think there's any problem with writing a pattern involving a type that isn't always legal; if the type isn't legal, it just won't match. > > > > > > I believe the issue is the runtime nature of the >128bit fixed length vectors means they are not mapped to any register class, which prevents pattern based matching. > > > > > > > > > > > > We did investigate using hwmodes but it didn't prove to be a viable solution. > > > > > > I believe the issue is the runtime nature of the >128bit fixed length vectors means they are not mapped to any register class > > > > > > > > > > You could change that in AArch64RegisterInfo.td, if you wanted to. I don't think that would cause any issues; it's okay if some of the types in the list aren't legal for all subtargets. > > > > Thanks for the tip, I'll take look and see how it works out. 
> > > I tried adding the extra fixed length vector MVTs to ZPRClass but it didn't really work out. > > > > > > Firstly I needed to fix up a bunch of patterns due to "Could not infer all types in pattern" build errors. This might be down to ZPRClass being sized as 128, which looks especially weird once it starts to contain a lot of MVTs that are known bigger than 128. Ultimately though the failing patterns are easily fixed but I guess potentially burdensome since 99.9% of the patterns shouldn't need to care about fixed length SVE. > > > > > > After this the extract_subvector patterns still do not match. I've tracked this down to SDTSubVecExtract's usage of SDTCisSubVecOfVec which is not strictly true given the newly expanded definition of extract_subvector. I cannot just update SDTCisSubVecOfVec because it's used by other operations where we don't want to allow a mixture of fixed length and scalable vectors (e.g. concat_vector). > > > > > > I can add a new variant of SDTCisSubVecOfVec for use by extract_subvector and insert_subvector, but given the minimal usage I'm not sure it's worth it. What do you think? > > I guess it's fine to leave as-is, in that case. > Thanks. I'll circle back round to this after we've enough functionality for fixed length to be usable. No objections, but I do like Eli's idea for the long term. This would be a step in the right direction for passing fixed-width arguments. Having a restricted IR for fixed-width isn't ideal. ================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll:25 +; how fixed length operation are lowered to scalable ones, with multiple blocks +; ensuring insert/extract sequences are not folded away. + ---------------- Nit: could probably mark the loads/stores volatile to avoid the branch. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82871/new/ https://reviews.llvm.org/D82871 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:28 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:28 +0000 (UTC) Subject: [PATCH] D83071: Add support for options with two flags for controlling the same field. In-Reply-To: References: Message-ID: <380a4d6ead9d3f7d4fede9dc4bee38b4@localhost.localdomain> dang marked an inline comment as done. dang added inline comments. ================ Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3931-3932 if (((FLAGS)&options::CC1Option) && \ (ALWAYS_EMIT || EXTRACTOR(this->KEYPATH) != DEFAULT_VALUE)) { \ - if (Option::KIND##Class == Option::FlagClass) { \ - Args.push_back(SPELLING); \ - } \ - if (Option::KIND##Class == Option::SeparateClass) { \ - Args.push_back(SPELLING); \ - DENORMALIZER(Args, SA, TABLE_INDEX, EXTRACTOR(this->KEYPATH)); \ - } \ + DENORMALIZER(Args, SPELLING, SA, TABLE_INDEX, EXTRACTOR(this->KEYPATH)); \ } ---------------- dang wrote: > dexonsmith wrote: > > I realize this commit doesn't introduce it, but it seems unfortunate to call `EXTRACTOR` twice. Maybe in a follow-up or prep commit you can fix that... maybe something like this? > > ``` > > if ((FLAGS)&options::CC1Option) { > > const auto &Extracted = EXTRACTOR(this->KEYPATH); > > if (ALWAYS_EMIT || Extracted != DEFAULT_VALUE) > > DENORMALIZER(Args, SPELLING, SA, TABLE_INDEX, EXTRACTOR(this->KEYPATH)); > > } > > ``` > Yes I can do that of course. 
Although EXTRACTOR is meant to be very cheap and in most cases it expands to just `this->KEYPATH` See D83211 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83071/new/ https://reviews.llvm.org/D83071 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:29 2020 From: llvm-commits at lists.llvm.org (George Rokos via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:29 +0000 (UTC) Subject: [PATCH] D83130: [OPENMP]Fix test for ARM, NFC. In-Reply-To: References: Message-ID: grokos accepted this revision. grokos added a comment. This revision is now accepted and ready to land. Looks good. Do we know why the test doesn't run without `-fPIC`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83130/new/ https://reviews.llvm.org/D83130 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:29 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:54:29 +0000 (UTC) Subject: [PATCH] D82660: [RISCV] Optimize multiplication by specific immediates In-Reply-To: References: Message-ID: <871ce1c8e6342abcc2bcc298ea648168@localhost.localdomain> luismarques accepted this revision. luismarques added a comment. This revision is now accepted and ready to land. LGTM. I'm not overly concerned about the occasional code size increases from doing the optimization for RV32IM, so the loosening of the condition is OK IMO. Everything else seems to be in order now. Maybe wait a couple of days more for @lenary's OK. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82660/new/ https://reviews.llvm.org/D82660 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:29 2020 From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:29 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <2b417e5815ad694449e40eb3a23d6094@localhost.localdomain> biplmish updated this revision to Diff 275757. biplmish added a comment. Code changes to bring the LE and BE block close, add additional tests. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 Files: clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/test/CodeGen/PowerPC/p10-splatImm.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82520.275757.patch Type: text/x-patch Size: 5723 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:54:30 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:30 +0000 (UTC) Subject: [PATCH] D83130: [OPENMP]Fix test for ARM, NFC. In-Reply-To: References: Message-ID: <31dff599b06aaa4adce4f571459ab6b6@localhost.localdomain> ABataev added a comment. In D83130#2133492 , @grokos wrote: > Looks good. Do we know why the test doesn't run without `-fPIC`? Looks like specific behavior for ARM. `-shared` requires `-fPIC` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83130/new/ https://reviews.llvm.org/D83130 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:30 2020 From: llvm-commits at lists.llvm.org (Adrian Prantl via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:30 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: aprantl added a comment. 
And conversely, with this patch applied, do GDB and LLDB still produce the expected result? Also, what happens to the next bit field or variable right after the bit-field with the now larger container? Is that affected by the patch? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Mon Jul 6 12:54:31 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:54:31 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <43a96b54992f819dc5184115a880d043@localhost.localdomain> lei added inline comments. ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:651 + // CHECK-NEXT: [[T3:%.+]] = shufflevector <2 x double> [[T2:%.+]], <2 x double> undef, <2 x i32> zeroinitialize + // CHECK-NEXT: ret <2 x double> [[T3:%.+]] + return vec_splatid(1.0); ---------------- missing CHECK-BE? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 12:55:21 2020 From: llvm-commits at lists.llvm.org (James Y Knight via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:55:21 +0000 (UTC) Subject: [PATCH] D83136: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst In-Reply-To: References: Message-ID: <598146d2a7a8724b4be3079162895ed2@localhost.localdomain> jyknight accepted this revision. jyknight added a comment. This revision is now accepted and ready to land. It's odd to have CreateAtomicCmpXchg and CreateAtomicRMW not have Align as an argument when the constructors do, but as long as the migration is finished in an upcoming commit, this seems fine as a step on the path.
================ Comment at: llvm/include/llvm/IR/IRBuilder.h:1746 + return Insert(new AtomicCmpXchgInst( + Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID)); } ---------------- jfb wrote: > Are types always rounded to a power of two at this point? > > i.e. what does this do: `struct { char c[3]; };` ? > > Also, I think this is wrong for non-lock-free types. The alignment requirement is lower on those. This is just encoding the pre-existing behavior -- you cannot currently create a cmpxchg instruction with alignment other than the size of the type. Right now, you _also_ cannot create a cmpxchg instruction with other than integral or pointer types, which -- in any _current_ llvm backend, afaik -- never have non-power-of-2 sizes. Upcoming changes will plumb through a required alignment argument everywhere, and then we'll be rid of this weird hardcoded special alignment behavior here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83136/new/ https://reviews.llvm.org/D83136 From llvm-commits at lists.llvm.org Mon Jul 6 12:58:06 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:58:06 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: amyk updated this revision to Diff 275803. amyk added a comment.
Address the following review comments: - formatting in PPCInstrPrefix.td - Remove extra `( )` - Indentation in PPCISelLowering - Remove unnecessary variables - Add extra case to return `SDValue()` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/p10-splatImm32.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83245.275803.patch Type: text/x-patch Size: 11044 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:58:21 2020 From: llvm-commits at lists.llvm.org (=?UTF-8?Q?Nicolai_H=C3=A4hnle?= via llvm-commits) Date: Mon, 06 Jul 2020 12:58:21 -0700 (PDT) Subject: [llvm] 76c5cb0 - DomTree: Remove getChildren() accessor Message-ID: <5f03825d.1c69fb81.9852f.31d9@mx.google.com> Author: Nicolai Hähnle Date: 2020-07-06T21:58:11+02:00 New Revision: 76c5cb05a3a67340cc7950eb8fb5c2d2a0ac4554 URL: https://github.com/llvm/llvm-project/commit/76c5cb05a3a67340cc7950eb8fb5c2d2a0ac4554 DIFF: https://github.com/llvm/llvm-project/commit/76c5cb05a3a67340cc7950eb8fb5c2d2a0ac4554.diff LOG: DomTree: Remove getChildren() accessor Summary: Avoid exposing details about how children are stored. This will enable subsequent type-erasure changes. New methods are introduced to cover common access patterns. 
Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f Reviewers: arsenm, RKSimon, mehdi_amini, courbet Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83083 Added: Modified: llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/CodeGen/EarlyIfConversion.cpp llvm/lib/CodeGen/InlineSpiller.cpp llvm/lib/CodeGen/MachineCSE.cpp llvm/lib/CodeGen/MachineLICM.cpp llvm/lib/CodeGen/MachineSink.cpp llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp llvm/lib/Target/Mips/MipsOptimizePICCall.cpp llvm/lib/Transforms/Scalar/ConstantHoisting.cpp llvm/lib/Transforms/Scalar/NewGVN.cpp llvm/lib/Transforms/Utils/LoopSimplify.cpp llvm/lib/Transforms/Utils/LoopUnroll.cpp llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/GenericDomTree.h b/llvm/include/llvm/Support/GenericDomTree.h index f90fe0ba7dae..d93293864a65 100644 --- a/llvm/include/llvm/Support/GenericDomTree.h +++ b/llvm/include/llvm/Support/GenericDomTree.h @@ -77,18 +77,25 @@ template class DomTreeNodeBase { const_iterator begin() const { return Children.begin(); } const_iterator end() const { return Children.end(); } + DomTreeNodeBase *const &back() const { return Children.back(); } + DomTreeNodeBase *&back() { return Children.back(); } + + iterator_range children() { return make_range(begin(), end()); } + iterator_range children() const { + return make_range(begin(), end()); + } + NodeT *getBlock() const { return TheBB; } DomTreeNodeBase *getIDom() const { return IDom; } unsigned getLevel() const { return Level; } - const std::vector &getChildren() const { return Children; } - std::unique_ptr addChild( std::unique_ptr C) { Children.push_back(C.get()); return C; } + bool isLeaf() const { return 
Children.empty(); } size_t getNumChildren() const { return Children.size(); } void clearAllChildren() { Children.clear(); } @@ -619,7 +626,7 @@ class DominatorTreeBase { void eraseNode(NodeT *BB) { DomTreeNodeBase *Node = getNode(BB); assert(Node && "Removing node that isn't in dominator tree."); - assert(Node->getChildren().empty() && "Node is not a leaf node."); + assert(Node->isLeaf() && "Node is not a leaf node."); DFSInfoValid = false; diff --git a/llvm/include/llvm/Support/GenericDomTreeConstruction.h b/llvm/include/llvm/Support/GenericDomTreeConstruction.h index 7c94a26705a8..1e9b0f23c144 100644 --- a/llvm/include/llvm/Support/GenericDomTreeConstruction.h +++ b/llvm/include/llvm/Support/GenericDomTreeConstruction.h @@ -1396,7 +1396,7 @@ struct SemiNCAInfo { const TreeNodePtr Node = NodeToTN.second.get(); // Handle tree leaves. - if (Node->getChildren().empty()) { + if (Node->isLeaf()) { if (Node->getDFSNumIn() + 1 != Node->getDFSNumOut()) { errs() << "Tree leaf should have DFSOut = DFSIn + 1:\n\t"; PrintNodeAndDFSNums(Node); @@ -1508,7 +1508,8 @@ struct SemiNCAInfo { for (auto &NodeToTN : DT.DomTreeNodes) { const TreeNodePtr TN = NodeToTN.second.get(); const NodePtr BB = TN->getBlock(); - if (!BB || TN->getChildren().empty()) continue; + if (!BB || TN->isLeaf()) + continue; LLVM_DEBUG(dbgs() << "Verifying parent property of node " << BlockNamePrinter(TN) << "\n"); @@ -1517,7 +1518,7 @@ struct SemiNCAInfo { return From != BB && To != BB; }); - for (TreeNodePtr Child : TN->getChildren()) + for (TreeNodePtr Child : TN->children()) if (NodeToInfo.count(Child->getBlock()) != 0) { errs() << "Child " << BlockNamePrinter(Child) << " reachable after its parent " << BlockNamePrinter(BB) @@ -1541,17 +1542,17 @@ struct SemiNCAInfo { for (auto &NodeToTN : DT.DomTreeNodes) { const TreeNodePtr TN = NodeToTN.second.get(); const NodePtr BB = TN->getBlock(); - if (!BB || TN->getChildren().empty()) continue; + if (!BB || TN->isLeaf()) + continue; - const auto &Siblings = 
TN->getChildren(); - for (const TreeNodePtr N : Siblings) { + for (const TreeNodePtr N : TN->children()) { clear(); NodePtr BBN = N->getBlock(); doFullDFSWalk(DT, [BBN](NodePtr From, NodePtr To) { return From != BBN && To != BBN; }); - for (const TreeNodePtr S : Siblings) { + for (const TreeNodePtr S : TN->children()) { if (S == N) continue; if (NodeToInfo.count(S->getBlock()) == 0) { diff --git a/llvm/lib/CodeGen/EarlyIfConversion.cpp b/llvm/lib/CodeGen/EarlyIfConversion.cpp index a67072ce340e..96d4efb856c1 100644 --- a/llvm/lib/CodeGen/EarlyIfConversion.cpp +++ b/llvm/lib/CodeGen/EarlyIfConversion.cpp @@ -759,7 +759,7 @@ void updateDomTree(MachineDominatorTree *DomTree, const SSAIfConv &IfConv, assert(Node != HeadNode && "Cannot erase the head node"); while (Node->getNumChildren()) { assert(Node->getBlock() == IfConv.Tail && "Unexpected children"); - DomTree->changeImmediateDominator(Node->getChildren().back(), HeadNode); + DomTree->changeImmediateDominator(Node->back(), HeadNode); } DomTree->eraseNode(B); } diff --git a/llvm/lib/CodeGen/InlineSpiller.cpp b/llvm/lib/CodeGen/InlineSpiller.cpp index 7e4a89c8ec47..41eef2fed840 100644 --- a/llvm/lib/CodeGen/InlineSpiller.cpp +++ b/llvm/lib/CodeGen/InlineSpiller.cpp @@ -1306,10 +1306,7 @@ void HoistSpillHelper::getVisitOrders( Orders.push_back(MDT.getBase().getNode(Root)); do { MachineDomTreeNode *Node = Orders[idx++]; - const std::vector &Children = Node->getChildren(); - unsigned NumChildren = Children.size(); - for (unsigned i = 0; i != NumChildren; ++i) { - MachineDomTreeNode *Child = Children[i]; + for (MachineDomTreeNode *Child : Node->children()) { if (WorkSet.count(Child)) Orders.push_back(Child); } @@ -1377,10 +1374,7 @@ void HoistSpillHelper::runHoistSpills( // Collect spills in subtree of current node (*RIt) to // SpillsInSubTreeMap[*RIt].first. 
- const std::vector &Children = (*RIt)->getChildren(); - unsigned NumChildren = Children.size(); - for (unsigned i = 0; i != NumChildren; ++i) { - MachineDomTreeNode *Child = Children[i]; + for (MachineDomTreeNode *Child : (*RIt)->children()) { if (SpillsInSubTreeMap.find(Child) == SpillsInSubTreeMap.end()) continue; // The stmt "SpillsInSubTree = SpillsInSubTreeMap[*RIt].first" below diff --git a/llvm/lib/CodeGen/MachineCSE.cpp b/llvm/lib/CodeGen/MachineCSE.cpp index 8c195adb444d..09531276bc10 100644 --- a/llvm/lib/CodeGen/MachineCSE.cpp +++ b/llvm/lib/CodeGen/MachineCSE.cpp @@ -747,9 +747,8 @@ bool MachineCSE::PerformCSE(MachineDomTreeNode *Node) { do { Node = WorkList.pop_back_val(); Scopes.push_back(Node); - const std::vector &Children = Node->getChildren(); - OpenChildren[Node] = Children.size(); - for (MachineDomTreeNode *Child : Children) + OpenChildren[Node] = Node->getNumChildren(); + for (MachineDomTreeNode *Child : Node->children()) WorkList.push_back(Child); } while (!WorkList.empty()); @@ -862,8 +861,7 @@ bool MachineCSE::PerformSimplePRE(MachineDominatorTree *DT) { BBs.push_back(DT->getRootNode()); do { auto Node = BBs.pop_back_val(); - const std::vector &Children = Node->getChildren(); - for (MachineDomTreeNode *Child : Children) + for (MachineDomTreeNode *Child : Node->children()) BBs.push_back(Child); MachineBasicBlock *MBB = Node->getBlock(); diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp index 98638b9fa737..5e8a916b3b3b 100644 --- a/llvm/lib/CodeGen/MachineLICM.cpp +++ b/llvm/lib/CodeGen/MachineLICM.cpp @@ -737,8 +737,7 @@ void MachineLICMBase::HoistOutOfLoop(MachineDomTreeNode *HeaderN) { continue; Scopes.push_back(Node); - const std::vector &Children = Node->getChildren(); - unsigned NumChildren = Children.size(); + unsigned NumChildren = Node->getNumChildren(); // Don't hoist things out of a large switch statement. 
This often causes // code to be hoisted that wasn't going to be executed, and increases @@ -747,13 +746,14 @@ void MachineLICMBase::HoistOutOfLoop(MachineDomTreeNode *HeaderN) { NumChildren = 0; OpenChildren[Node] = NumChildren; - // Add children in reverse order as then the next popped worklist node is - // the first child of this node. This means we ultimately traverse the - // DOM tree in exactly the same order as if we'd recursed. - for (int i = (int)NumChildren-1; i >= 0; --i) { - MachineDomTreeNode *Child = Children[i]; - ParentMap[Child] = Node; - WorkList.push_back(Child); + if (NumChildren) { + // Add children in reverse order as then the next popped worklist node is + // the first child of this node. This means we ultimately traverse the + // DOM tree in exactly the same order as if we'd recursed. + for (MachineDomTreeNode *Child : reverse(Node->children())) { + ParentMap[Child] = Node; + WorkList.push_back(Child); + } } } diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index 1d253a60b558..5f958bbc31b7 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -623,14 +623,13 @@ MachineSinking::GetAllSortedSuccessors(MachineInstr &MI, MachineBasicBlock *MBB, // if () {} else {} // use x // - const std::vector &Children = - DT->getNode(MBB)->getChildren(); - for (const auto &DTChild : Children) + for (MachineDomTreeNode *DTChild : DT->getNode(MBB)->children()) { // DomTree children of MBB that have MBB as immediate dominator are added. if (DTChild->getIDom()->getBlock() == MI.getParent() && // Skip MBBs already added to the AllSuccs vector above. !MBB->isSuccessor(DTChild->getBlock())) AllSuccs.push_back(DTChild->getBlock()); + } // Sort Successors according to their loop depth or block frequency info. 
llvm::stable_sort( diff --git a/llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp b/llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp index 4613002de5ff..82e8df3b73f9 100644 --- a/llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp +++ b/llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp @@ -828,7 +828,7 @@ void AArch64ConditionalCompares::updateDomTree( assert(Node != HeadNode && "Cannot erase the head node"); assert(Node->getIDom() == HeadNode && "CmpBB should be dominated by Head"); while (Node->getNumChildren()) - DomTree->changeImmediateDominator(Node->getChildren().back(), HeadNode); + DomTree->changeImmediateDominator(Node->back(), HeadNode); DomTree->eraseNode(RemovedMBB); } } diff --git a/llvm/lib/Target/Mips/MipsOptimizePICCall.cpp b/llvm/lib/Target/Mips/MipsOptimizePICCall.cpp index 8bd64ff6cb27..2823d300dc6e 100644 --- a/llvm/lib/Target/Mips/MipsOptimizePICCall.cpp +++ b/llvm/lib/Target/Mips/MipsOptimizePICCall.cpp @@ -218,8 +218,7 @@ bool OptimizePICCall::runOnMachineFunction(MachineFunction &F) { MBBI.preVisit(ScopedHT); Changed |= visitNode(MBBI); const MachineDomTreeNode *Node = MBBI.getNode(); - const std::vector &Children = Node->getChildren(); - WorkList.append(Children.begin(), Children.end()); + WorkList.append(Node->begin(), Node->end()); } return Changed; diff --git a/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp b/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp index fa3bc5f82a3d..7c14b69d658d 100644 --- a/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp +++ b/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp @@ -250,7 +250,7 @@ static void findBestInsertionSet(DominatorTree &DT, BlockFrequencyInfo &BFI, Orders.push_back(Entry); while (Idx != Orders.size()) { BasicBlock *Node = Orders[Idx++]; - for (auto ChildDomNode : DT.getNode(Node)->getChildren()) { + for (auto ChildDomNode : DT.getNode(Node)->children()) { if (Candidates.count(ChildDomNode->getBlock())) Orders.push_back(ChildDomNode->getBlock()); } diff --git 
a/llvm/lib/Transforms/Scalar/NewGVN.cpp b/llvm/lib/Transforms/Scalar/NewGVN.cpp index 576d40f0e0ff..0ed1773373a7 100644 --- a/llvm/lib/Transforms/Scalar/NewGVN.cpp +++ b/llvm/lib/Transforms/Scalar/NewGVN.cpp @@ -3436,7 +3436,7 @@ bool NewGVN::runGVN() { // Sort dominator tree children arrays into RPO. for (auto &B : RPOT) { auto *Node = DT->getNode(B); - if (Node->getChildren().size() > 1) + if (Node->getNumChildren() > 1) llvm::sort(Node->begin(), Node->end(), [&](const DomTreeNode *A, const DomTreeNode *B) { return RPOOrdering[A] < RPOOrdering[B]; diff --git a/llvm/lib/Transforms/Utils/LoopSimplify.cpp b/llvm/lib/Transforms/Utils/LoopSimplify.cpp index c33f36f70a88..a8445e94e55a 100644 --- a/llvm/lib/Transforms/Utils/LoopSimplify.cpp +++ b/llvm/lib/Transforms/Utils/LoopSimplify.cpp @@ -696,10 +696,8 @@ static bool simplifyOneLoop(Loop *L, SmallVectorImpl &Worklist, LI->removeBlock(ExitingBlock); DomTreeNode *Node = DT->getNode(ExitingBlock); - const std::vector *> &Children = - Node->getChildren(); - while (!Children.empty()) { - DomTreeNode *Child = Children.front(); + while (!Node->isLeaf()) { + DomTreeNode *Child = Node->back(); DT->changeImmediateDominator(Child, Node->getIDom()); } DT->eraseNode(ExitingBlock); diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp index b86c62a78ce8..3875c631f839 100644 --- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp @@ -809,7 +809,7 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI, for (auto *BB : OriginalLoopBlocks) { auto *BBDomNode = DT->getNode(BB); SmallVector ChildrenToUpdate; - for (auto *ChildDomNode : BBDomNode->getChildren()) { + for (auto *ChildDomNode : BBDomNode->children()) { auto *ChildBB = ChildDomNode->getBlock(); if (!L->contains(ChildBB)) ChildrenToUpdate.push_back(ChildBB); diff --git a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp 
index c0d97cf7eeca..2515b1676cb9 100644 --- a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp @@ -848,7 +848,7 @@ bool llvm::UnrollRuntimeLoopRemainder( // dominator of the exit blocks. for (auto *BB : L->blocks()) { auto *DomNodeBB = DT->getNode(BB); - for (auto *DomChild : DomNodeBB->getChildren()) { + for (auto *DomChild : DomNodeBB->children()) { auto *DomChildBB = DomChild->getBlock(); if (!L->contains(LI->getLoopFor(DomChildBB))) ChildrenToUpdate.push_back(DomChildBB); diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp index 9241377012a4..43363736684e 100644 --- a/llvm/lib/Transforms/Utils/LoopUtils.cpp +++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp @@ -512,9 +512,10 @@ llvm::collectChildrenInLoop(DomTreeNode *N, const Loop *CurLoop) { AddRegionToWorklist(N); - for (size_t I = 0; I < Worklist.size(); I++) - for (DomTreeNode *Child : Worklist[I]->getChildren()) + for (size_t I = 0; I < Worklist.size(); I++) { + for (DomTreeNode *Child : Worklist[I]->children()) AddRegionToWorklist(Child); + } return Worklist; } From llvm-commits at lists.llvm.org Mon Jul 6 12:58:23 2020 From: llvm-commits at lists.llvm.org (=?UTF-8?Q?Nicolai_H=C3=A4hnle?= via llvm-commits) Date: Mon, 06 Jul 2020 12:58:23 -0700 (PDT) Subject: [llvm] 723a44c - DomTree: Remove the releaseMemory() method Message-ID: <5f03825f.1c69fb81.d479d.7e1f@mx.google.com> Author: Nicolai Hähnle Date: 2020-07-06T21:58:11+02:00 New Revision: 723a44c9b5d654ec791720fc450757ae00f9e631 URL: https://github.com/llvm/llvm-project/commit/723a44c9b5d654ec791720fc450757ae00f9e631 DIFF: https://github.com/llvm/llvm-project/commit/723a44c9b5d654ec791720fc450757ae00f9e631.diff LOG: DomTree: Remove the releaseMemory() method Summary: It is fully redundant with reset(). 
Change-Id: I25850b9f08eace757cf03cbb8780e970aca7f51a Reviewers: arsenm, RKSimon, mehdi_amini, courbet Subscribers: wdng, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D83084 Added: Modified: clang/include/clang/Analysis/Analyses/Dominators.h llvm/include/llvm/Analysis/PostDominators.h llvm/include/llvm/IR/Dominators.h llvm/include/llvm/Support/GenericDomTree.h Removed: ################################################################################ diff --git a/clang/include/clang/Analysis/Analyses/Dominators.h b/clang/include/clang/Analysis/Analyses/Dominators.h index 061c98137da2..51d86f6e4540 100644 --- a/clang/include/clang/Analysis/Analyses/Dominators.h +++ b/clang/include/clang/Analysis/Analyses/Dominators.h @@ -167,9 +167,7 @@ class CFGDominatorTreeImpl : public ManagedAnalysis { } /// Releases the memory held by the dominator tree. - virtual void releaseMemory() { - DT.releaseMemory(); - } + virtual void releaseMemory() { DT.reset(); } /// Converts the dominator tree to human readable form. 
virtual void print(raw_ostream &OS, const llvm::Module* M= nullptr) const { diff --git a/llvm/include/llvm/Analysis/PostDominators.h b/llvm/include/llvm/Analysis/PostDominators.h index 801eb3d59673..296110d8d03b 100644 --- a/llvm/include/llvm/Analysis/PostDominators.h +++ b/llvm/include/llvm/Analysis/PostDominators.h @@ -88,9 +88,7 @@ struct PostDominatorTreeWrapperPass : public FunctionPass { AU.setPreservesAll(); } - void releaseMemory() override { - DT.releaseMemory(); - } + void releaseMemory() override { DT.reset(); } void print(raw_ostream &OS, const Module*) const override; }; diff --git a/llvm/include/llvm/IR/Dominators.h b/llvm/include/llvm/IR/Dominators.h index a79be8779b7e..0084ac0b655a 100644 --- a/llvm/include/llvm/IR/Dominators.h +++ b/llvm/include/llvm/IR/Dominators.h @@ -277,7 +277,7 @@ class DominatorTreeWrapperPass : public FunctionPass { AU.setPreservesAll(); } - void releaseMemory() override { DT.releaseMemory(); } + void releaseMemory() override { DT.reset(); } void print(raw_ostream &OS, const Module *M = nullptr) const override; }; diff --git a/llvm/include/llvm/Support/GenericDomTree.h b/llvm/include/llvm/Support/GenericDomTree.h index d93293864a65..e83e7aa39e7a 100644 --- a/llvm/include/llvm/Support/GenericDomTree.h +++ b/llvm/include/llvm/Support/GenericDomTree.h @@ -325,8 +325,6 @@ class DominatorTreeBase { return false; } - void releaseMemory() { reset(); } - /// getNode - return the (Post)DominatorTree node for the specified basic /// block. This is the same as using operator[] on this class. 
The result /// may (but is not required to) be null for a forward (backwards) @@ -760,9 +758,6 @@ class DominatorTreeBase { return DomTreeBuilder::Verify(*this, VL); } -protected: - void addRoot(NodeT *BB) { this->Roots.push_back(BB); } - void reset() { DomTreeNodes.clear(); Roots.clear(); @@ -772,6 +767,9 @@ class DominatorTreeBase { SlowQueries = 0; } +protected: + void addRoot(NodeT *BB) { this->Roots.push_back(BB); } + // NewBB is split and now it has one successor. Update dominator tree to // reflect this change. template From llvm-commits at lists.llvm.org Mon Jul 6 12:58:27 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:58:27 +0000 (UTC) Subject: [PATCH] D83085: DomTree: Remove getRoots() accessor In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. nhaehnle marked an inline comment as done. Closed by commit rGdfcc68c52826: DomTree: Remove getRoots() accessor (authored by nhaehnle). Changed prior to commit: https://reviews.llvm.org/D83085?vs=275228&id=275807#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83085/new/ https://reviews.llvm.org/D83085 Files: llvm/include/llvm/Analysis/DominanceFrontier.h llvm/include/llvm/CodeGen/MachineDominators.h llvm/include/llvm/CodeGen/MachinePostDominators.h llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/unittests/IR/DominatorTreeTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83085.275807.patch Type: text/x-patch Size: 6146 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:58:27 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:58:27 +0000 (UTC) Subject: [PATCH] D83084: DomTree: Remove the releaseMemory() method In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG723a44c9b5d6: DomTree: Remove the releaseMemory() method (authored by nhaehnle). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83084/new/ https://reviews.llvm.org/D83084 Files: clang/include/clang/Analysis/Analyses/Dominators.h llvm/include/llvm/Analysis/PostDominators.h llvm/include/llvm/IR/Dominators.h llvm/include/llvm/Support/GenericDomTree.h Index: llvm/include/llvm/Support/GenericDomTree.h =================================================================== --- llvm/include/llvm/Support/GenericDomTree.h +++ llvm/include/llvm/Support/GenericDomTree.h @@ -325,8 +325,6 @@ return false; } - void releaseMemory() { reset(); } - /// getNode - return the (Post)DominatorTree node for the specified basic /// block. This is the same as using operator[] on this class. The result /// may (but is not required to) be null for a forward (backwards) @@ -760,9 +758,6 @@ return DomTreeBuilder::Verify(*this, VL); } -protected: - void addRoot(NodeT *BB) { this->Roots.push_back(BB); } - void reset() { DomTreeNodes.clear(); Roots.clear(); @@ -772,6 +767,9 @@ SlowQueries = 0; } +protected: + void addRoot(NodeT *BB) { this->Roots.push_back(BB); } + // NewBB is split and now it has one successor. Update dominator tree to // reflect this change. 
template Index: llvm/include/llvm/IR/Dominators.h =================================================================== --- llvm/include/llvm/IR/Dominators.h +++ llvm/include/llvm/IR/Dominators.h @@ -277,7 +277,7 @@ AU.setPreservesAll(); } - void releaseMemory() override { DT.releaseMemory(); } + void releaseMemory() override { DT.reset(); } void print(raw_ostream &OS, const Module *M = nullptr) const override; }; Index: llvm/include/llvm/Analysis/PostDominators.h =================================================================== --- llvm/include/llvm/Analysis/PostDominators.h +++ llvm/include/llvm/Analysis/PostDominators.h @@ -88,9 +88,7 @@ AU.setPreservesAll(); } - void releaseMemory() override { - DT.releaseMemory(); - } + void releaseMemory() override { DT.reset(); } void print(raw_ostream &OS, const Module*) const override; }; Index: clang/include/clang/Analysis/Analyses/Dominators.h =================================================================== --- clang/include/clang/Analysis/Analyses/Dominators.h +++ clang/include/clang/Analysis/Analyses/Dominators.h @@ -167,9 +167,7 @@ } /// Releases the memory held by the dominator tree. - virtual void releaseMemory() { - DT.releaseMemory(); - } + virtual void releaseMemory() { DT.reset(); } /// Converts the dominator tree to human readable form. virtual void print(raw_ostream &OS, const llvm::Module* M= nullptr) const { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83084.275805.patch Type: text/x-patch Size: 2424 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:58:27 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:58:27 +0000 (UTC) Subject: [PATCH] D83086: DomTree: add private create{Child,Node} helpers In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. 
Closed by commit rGf987ba3cf9af: DomTree: add private create{Child,Node} helpers (authored by nhaehnle). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83086/new/ https://reviews.llvm.org/D83086 Files: llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h Index: llvm/include/llvm/Support/GenericDomTreeConstruction.h =================================================================== --- llvm/include/llvm/Support/GenericDomTreeConstruction.h +++ llvm/include/llvm/Support/GenericDomTreeConstruction.h @@ -187,9 +187,7 @@ // Add a new tree node for this NodeT, and link it as a child of // IDomNode - return (DT.DomTreeNodes[BB] = IDomNode->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(BB, IDomNode))) - .get(); + return DT.createChild(BB, IDomNode); } static bool AlwaysDescend(NodePtr, NodePtr) { return true; } @@ -587,9 +585,7 @@ // all real exits (including multiple exit blocks, infinite loops). NodePtr Root = IsPostDom ? nullptr : DT.Roots[0]; - DT.RootNode = (DT.DomTreeNodes[Root] = - std::make_unique<DomTreeNodeBase<NodeT>>(Root, nullptr)) - .get(); + DT.RootNode = DT.createNode(Root); SNCA.attachNewSubtree(DT, DT.RootNode); } @@ -610,8 +606,7 @@ // Add a new tree node for this BasicBlock, and link it as a child of // IDomNode. - DT.DomTreeNodes[W] = IDomNode->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(W, IDomNode)); + DT.createChild(W, IDomNode); } } @@ -661,10 +656,7 @@ // The unreachable node becomes a new root -- a tree node for it.
TreeNodePtr VirtualRoot = DT.getNode(nullptr); - FromTN = - (DT.DomTreeNodes[From] = VirtualRoot->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(From, VirtualRoot))) - .get(); + FromTN = DT.createChild(From, VirtualRoot); DT.Roots.push_back(From); } Index: llvm/include/llvm/Support/GenericDomTree.h =================================================================== --- llvm/include/llvm/Support/GenericDomTree.h +++ llvm/include/llvm/Support/GenericDomTree.h @@ -590,8 +590,7 @@ DomTreeNodeBase<NodeT> *IDomNode = getNode(DomBB); assert(IDomNode && "Not immediate dominator specified for block!"); DFSInfoValid = false; - return (DomTreeNodes[BB] = IDomNode->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(BB, IDomNode))).get(); + return createChild(BB, IDomNode); } /// Add a new node to the forward dominator tree and make it a new root. @@ -604,8 +603,7 @@ assert(!this->isPostDominator() && "Cannot change root of post-dominator tree"); DFSInfoValid = false; - DomTreeNodeBase<NodeT> *NewNode = (DomTreeNodes[BB] = - std::make_unique<DomTreeNodeBase<NodeT>>(BB, nullptr)).get(); + DomTreeNodeBase<NodeT> *NewNode = createNode(BB); if (Roots.empty()) { addRoot(BB); } else { @@ -786,6 +784,18 @@ protected: void addRoot(NodeT *BB) { this->Roots.push_back(BB); } + DomTreeNodeBase<NodeT> *createChild(NodeT *BB, DomTreeNodeBase<NodeT> *IDom) { + return (DomTreeNodes[BB] = IDom->addChild( + std::make_unique<DomTreeNodeBase<NodeT>>(BB, IDom))) + .get(); + } + + DomTreeNodeBase<NodeT> *createNode(NodeT *BB) { + return (DomTreeNodes[BB] = + std::make_unique<DomTreeNodeBase<NodeT>>(BB, nullptr)) + .get(); + } + // NewBB is split and now it has one successor. Update dominator tree to // reflect this change. template <class N> -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83086.275806.patch Type: text/x-patch Size: 3347 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:58:27 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 19:58:27 +0000 (UTC) Subject: [PATCH] D83083: DomTree: Remove getChildren() accessor In-Reply-To: References: Message-ID: <2e439c1650b6ec106a96ede3662f67f2@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG76c5cb05a3a6: DomTree: Remove getChildren() accessor (authored by nhaehnle). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83083/new/ https://reviews.llvm.org/D83083 Files: llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/CodeGen/EarlyIfConversion.cpp llvm/lib/CodeGen/InlineSpiller.cpp llvm/lib/CodeGen/MachineCSE.cpp llvm/lib/CodeGen/MachineLICM.cpp llvm/lib/CodeGen/MachineSink.cpp llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp llvm/lib/Target/Mips/MipsOptimizePICCall.cpp llvm/lib/Transforms/Scalar/ConstantHoisting.cpp llvm/lib/Transforms/Scalar/NewGVN.cpp llvm/lib/Transforms/Utils/LoopSimplify.cpp llvm/lib/Transforms/Utils/LoopUnroll.cpp llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83083.275808.patch Type: text/x-patch Size: 13659 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 12:58:27 2020 From: llvm-commits at lists.llvm.org (=?UTF-8?Q?Nicolai_H=C3=A4hnle?= via llvm-commits) Date: Mon, 06 Jul 2020 12:58:27 -0700 (PDT) Subject: [llvm] dfcc68c - DomTree: Remove getRoots() accessor Message-ID: <5f038263.1c69fb81.d3a50.0cfc@mx.google.com> Author: Nicolai Hähnle Date: 2020-07-06T21:58:11+02:00 New Revision: dfcc68c528269a3e0b1cbe7ef22cc92cdfdf7eba URL: https://github.com/llvm/llvm-project/commit/dfcc68c528269a3e0b1cbe7ef22cc92cdfdf7eba DIFF: https://github.com/llvm/llvm-project/commit/dfcc68c528269a3e0b1cbe7ef22cc92cdfdf7eba.diff LOG: DomTree: Remove getRoots() accessor Summary: Avoid exposing details about how roots are stored. This enables subsequent type-erasure changes. v5: - cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d Reviewers: arsenm, RKSimon, mehdi_amini, courbet Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83085 Added: Modified: llvm/include/llvm/Analysis/DominanceFrontier.h llvm/include/llvm/CodeGen/MachineDominators.h llvm/include/llvm/CodeGen/MachinePostDominators.h llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/unittests/IR/DominatorTreeTest.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Analysis/DominanceFrontier.h b/llvm/include/llvm/Analysis/DominanceFrontier.h index c0bf30e162dd..f67929c997f9 100644 --- a/llvm/include/llvm/Analysis/DominanceFrontier.h +++ b/llvm/include/llvm/Analysis/DominanceFrontier.h @@ -130,7 +130,7 @@ class ForwardDominanceFrontierBase using DomSetType = typename 
DominanceFrontierBase::DomSetType; void analyze(DomTreeT &DT) { - assert(DT.getRoots().size() == 1 && + assert(DT.root_size() == 1 && "Only one entry block for forward domfronts!"); this->Roots = {DT.getRoot()}; calculate(DT, DT[this->Roots[0]]); diff --git a/llvm/include/llvm/CodeGen/MachineDominators.h b/llvm/include/llvm/CodeGen/MachineDominators.h index 9d31232c9b95..2d26163a76aa 100644 --- a/llvm/include/llvm/CodeGen/MachineDominators.h +++ b/llvm/include/llvm/CodeGen/MachineDominators.h @@ -93,15 +93,6 @@ class MachineDominatorTree : public MachineFunctionPass { void getAnalysisUsage(AnalysisUsage &AU) const override; - /// getRoots - Return the root blocks of the current CFG. This may include - /// multiple blocks if we are computing post dominators. For forward - /// dominators, this will always be a single block (the entry node). - /// - const SmallVectorImpl &getRoots() const { - applySplitCriticalEdges(); - return DT->getRoots(); - } - MachineBasicBlock *getRoot() const { applySplitCriticalEdges(); return DT->getRoot(); diff --git a/llvm/include/llvm/CodeGen/MachinePostDominators.h b/llvm/include/llvm/CodeGen/MachinePostDominators.h index 597bb401a7fa..cee4294f6317 100644 --- a/llvm/include/llvm/CodeGen/MachinePostDominators.h +++ b/llvm/include/llvm/CodeGen/MachinePostDominators.h @@ -41,10 +41,6 @@ class MachinePostDominatorTree : public MachineFunctionPass { FunctionPass *createMachinePostDominatorTreePass(); - const SmallVectorImpl &getRoots() const { - return PDT->getRoots(); - } - MachineDomTreeNode *getRootNode() const { return PDT->getRootNode(); } MachineDomTreeNode *operator[](MachineBasicBlock *BB) const { diff --git a/llvm/include/llvm/Support/GenericDomTree.h b/llvm/include/llvm/Support/GenericDomTree.h index e83e7aa39e7a..407a06043cba 100644 --- a/llvm/include/llvm/Support/GenericDomTree.h +++ b/llvm/include/llvm/Support/GenericDomTree.h @@ -283,11 +283,27 @@ class DominatorTreeBase { DominatorTreeBase(const DominatorTreeBase &) = delete; 
DominatorTreeBase &operator=(const DominatorTreeBase &) = delete; - /// getRoots - Return the root blocks of the current CFG. This may include - /// multiple blocks if we are computing post dominators. For forward - /// dominators, this will always be a single block (the entry node). + /// Iteration over roots. /// - const SmallVectorImpl<NodeT *> &getRoots() const { return Roots; } + /// This may include multiple blocks if we are computing post dominators. + /// For forward dominators, this will always be a single block (the entry + /// block). + using root_iterator = typename SmallVectorImpl<NodeT *>::iterator; + using const_root_iterator = typename SmallVectorImpl<NodeT *>::const_iterator; + + root_iterator root_begin() { return Roots.begin(); } + const_root_iterator root_begin() const { return Roots.begin(); } + root_iterator root_end() { return Roots.end(); } + const_root_iterator root_end() const { return Roots.end(); } + + size_t root_size() const { return Roots.size(); } + + iterator_range<root_iterator> roots() { + return make_range(root_begin(), root_end()); + } + iterator_range<const_root_iterator> roots() const { + return make_range(root_begin(), root_end()); + } /// isPostDominator - Returns true if analysis based of postdoms /// diff --git a/llvm/include/llvm/Support/GenericDomTreeConstruction.h b/llvm/include/llvm/Support/GenericDomTreeConstruction.h index 1e9b0f23c144..bde59ff8c276 100644 --- a/llvm/include/llvm/Support/GenericDomTreeConstruction.h +++ b/llvm/include/llvm/Support/GenericDomTreeConstruction.h @@ -1372,7 +1372,7 @@ struct SemiNCAInfo { if (!DT.DFSInfoValid || !DT.Parent) return true; - const NodePtr RootBB = IsPostDom ? nullptr : DT.getRoots()[0]; + const NodePtr RootBB = IsPostDom ? 
nullptr : *DT.root_begin(); const TreeNodePtr Root = DT.getNode(RootBB); auto PrintNodeAndDFSNums = [](const TreeNodePtr TN) { diff --git a/llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp b/llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp index ab0d2169e390..418296684d76 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp @@ -199,8 +199,7 @@ bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) { // If there's only one exit, we don't need to do anything, unless this is a // pixel shader and that exit is an infinite loop, since we still have to // insert an export in that case. - if (PDT.getRoots().size() <= 1 && - F.getCallingConv() != CallingConv::AMDGPU_PS) + if (PDT.root_size() <= 1 && F.getCallingConv() != CallingConv::AMDGPU_PS) return false; LegacyDivergenceAnalysis &DA = getAnalysis(); @@ -217,7 +216,7 @@ bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) { bool InsertExport = false; bool Changed = false; - for (BasicBlock *BB : PDT.getRoots()) { + for (BasicBlock *BB : PDT.roots()) { if (isa(BB->getTerminator())) { if (!isUniformlyReached(DA, *BB)) ReturningBlocks.push_back(BB); diff --git a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp index 7e172544595a..dd8dc84d9589 100644 --- a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp +++ b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp @@ -1853,7 +1853,7 @@ struct DSEState { if (CommonPred) WorkList.insert(CommonPred); else - for (BasicBlock *R : PDT.getRoots()) + for (BasicBlock *R : PDT.roots()) WorkList.insert(R); NumCFGTries++; diff --git a/llvm/unittests/IR/DominatorTreeTest.cpp b/llvm/unittests/IR/DominatorTreeTest.cpp index 66e122760ef3..16c12b2102a9 100644 --- a/llvm/unittests/IR/DominatorTreeTest.cpp +++ b/llvm/unittests/IR/DominatorTreeTest.cpp @@ -805,7 +805,7 @@ TEST(DominatorTree, InsertFromUnreachable) { 
BasicBlock *To = B.getOrAddBlock(LastUpdate->Edge.To); PDT.insertEdge(From, To); EXPECT_TRUE(PDT.verify()); - EXPECT_TRUE(PDT.getRoots().size() == 2); + EXPECT_EQ(PDT.root_size(), 2); // Make sure we can use a const pointer with getNode. const BasicBlock *BB5 = B.getOrAddBlock("5"); EXPECT_NE(PDT.getNode(BB5), nullptr); From llvm-commits at lists.llvm.org Mon Jul 6 12:58:28 2020 From: llvm-commits at lists.llvm.org (=?UTF-8?Q?Nicolai_H=C3=A4hnle?= via llvm-commits) Date: Mon, 06 Jul 2020 12:58:28 -0700 (PDT) Subject: [llvm] f987ba3 - DomTree: add private create{Child,Node} helpers Message-ID: <5f038264.1c69fb81.150c9.28bd@mx.google.com> Author: Nicolai Hähnle Date: 2020-07-06T21:58:11+02:00 New Revision: f987ba3cf9af1a2fa168c5a707863b28efd61d73 URL: https://github.com/llvm/llvm-project/commit/f987ba3cf9af1a2fa168c5a707863b28efd61d73 DIFF: https://github.com/llvm/llvm-project/commit/f987ba3cf9af1a2fa168c5a707863b28efd61d73.diff LOG: DomTree: add private create{Child,Node} helpers Summary: Aside from unifying the code a bit, this change smooths the transition to use of future "opaque generic block references" in the type-erased dominator tree base class. 
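For readers following along without an LLVM checkout, here is a minimal standalone sketch of the pattern these helpers factor out. It is only an illustration under stated assumptions: ToyNode and ToyTree are hypothetical stand-ins (not the LLVM classes), and std::map replaces the real node table. The point it shows is that the tree owns every node through one map, while createChild and createNode return a plain pointer to the freshly stored node.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a dominator tree node; not the LLVM class.
template <class NodeT> class ToyNode {
  NodeT *Block;
  ToyNode *IDom;
  std::vector<ToyNode *> Children;

public:
  ToyNode(NodeT *BB, ToyNode *IDom) : Block(BB), IDom(IDom) {}
  NodeT *getBlock() const { return Block; }
  ToyNode *getIDom() const { return IDom; }
  std::size_t getNumChildren() const { return Children.size(); }

  // Records C as a child and hands ownership back to the caller,
  // so the tree (not the parent node) ends up owning it.
  std::unique_ptr<ToyNode> addChild(std::unique_ptr<ToyNode> C) {
    Children.push_back(C.get());
    return C;
  }
};

// Hypothetical stand-in for the tree; std::map is an assumption made
// to keep the sketch self-contained.
template <class NodeT> class ToyTree {
  std::map<NodeT *, std::unique_ptr<ToyNode<NodeT>>> Nodes;

public:
  // Create a node for BB, link it under IDom, and store it in the table.
  ToyNode<NodeT> *createChild(NodeT *BB, ToyNode<NodeT> *IDom) {
    return (Nodes[BB] = IDom->addChild(
                std::make_unique<ToyNode<NodeT>>(BB, IDom)))
        .get();
  }

  // Create a parentless node for BB (for example a root).
  ToyNode<NodeT> *createNode(NodeT *BB) {
    return (Nodes[BB] = std::make_unique<ToyNode<NodeT>>(BB, nullptr)).get();
  }

  ToyNode<NodeT> *getNode(NodeT *BB) const {
    auto It = Nodes.find(BB);
    return It == Nodes.end() ? nullptr : It->second.get();
  }
};
```

With something like this, a caller writes `T.createChild(&B, Root)` instead of repeating the make_unique/addChild/get() chain at every site, which is the unification the summary refers to.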
Change-Id: If924b092cc8561c4b6a7450fe79bc96df0e12472 Reviewers: arsenm, RKSimon, mehdi_amini, courbet Subscribers: wdng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83086 Added: Modified: llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/GenericDomTree.h b/llvm/include/llvm/Support/GenericDomTree.h index 407a06043cba..10e591a69d36 100644 --- a/llvm/include/llvm/Support/GenericDomTree.h +++ b/llvm/include/llvm/Support/GenericDomTree.h @@ -590,8 +590,7 @@ class DominatorTreeBase { DomTreeNodeBase<NodeT> *IDomNode = getNode(DomBB); assert(IDomNode && "Not immediate dominator specified for block!"); DFSInfoValid = false; - return (DomTreeNodes[BB] = IDomNode->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(BB, IDomNode))).get(); + return createChild(BB, IDomNode); } /// Add a new node to the forward dominator tree and make it a new root. @@ -604,8 +603,7 @@ class DominatorTreeBase { assert(!this->isPostDominator() && "Cannot change root of post-dominator tree"); DFSInfoValid = false; - DomTreeNodeBase<NodeT> *NewNode = (DomTreeNodes[BB] = - std::make_unique<DomTreeNodeBase<NodeT>>(BB, nullptr)).get(); + DomTreeNodeBase<NodeT> *NewNode = createNode(BB); if (Roots.empty()) { addRoot(BB); } else { @@ -786,6 +784,18 @@ class DominatorTreeBase { protected: void addRoot(NodeT *BB) { this->Roots.push_back(BB); } + DomTreeNodeBase<NodeT> *createChild(NodeT *BB, DomTreeNodeBase<NodeT> *IDom) { + return (DomTreeNodes[BB] = IDom->addChild( + std::make_unique<DomTreeNodeBase<NodeT>>(BB, IDom))) + .get(); + } + + DomTreeNodeBase<NodeT> *createNode(NodeT *BB) { + return (DomTreeNodes[BB] = + std::make_unique<DomTreeNodeBase<NodeT>>(BB, nullptr)) + .get(); + } + // NewBB is split and now it has one successor. Update dominator tree to // reflect this change. 
template <class N> diff --git a/llvm/include/llvm/Support/GenericDomTreeConstruction.h b/llvm/include/llvm/Support/GenericDomTreeConstruction.h index bde59ff8c276..464de4e2b3ba 100644 --- a/llvm/include/llvm/Support/GenericDomTreeConstruction.h +++ b/llvm/include/llvm/Support/GenericDomTreeConstruction.h @@ -187,9 +187,7 @@ struct SemiNCAInfo { // Add a new tree node for this NodeT, and link it as a child of // IDomNode - return (DT.DomTreeNodes[BB] = IDomNode->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(BB, IDomNode))) - .get(); + return DT.createChild(BB, IDomNode); } static bool AlwaysDescend(NodePtr, NodePtr) { return true; } @@ -587,9 +585,7 @@ struct SemiNCAInfo { // all real exits (including multiple exit blocks, infinite loops). NodePtr Root = IsPostDom ? nullptr : DT.Roots[0]; - DT.RootNode = (DT.DomTreeNodes[Root] = - std::make_unique<DomTreeNodeBase<NodeT>>(Root, nullptr)) - .get(); + DT.RootNode = DT.createNode(Root); SNCA.attachNewSubtree(DT, DT.RootNode); } @@ -610,8 +606,7 @@ struct SemiNCAInfo { // Add a new tree node for this BasicBlock, and link it as a child of // IDomNode. - DT.DomTreeNodes[W] = IDomNode->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(W, IDomNode)); + DT.createChild(W, IDomNode); } } @@ -661,10 +656,7 @@ struct SemiNCAInfo { // The unreachable node becomes a new root -- a tree node for it. TreeNodePtr VirtualRoot = DT.getNode(nullptr); - FromTN = - (DT.DomTreeNodes[From] = VirtualRoot->addChild( - std::make_unique<DomTreeNodeBase<NodeT>>(From, VirtualRoot))) - .get(); + FromTN = DT.createChild(From, VirtualRoot); DT.Roots.push_back(From); } From llvm-commits at lists.llvm.org Mon Jul 6 12:59:32 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:59:32 +0000 (UTC) Subject: [PATCH] D83034: [GlobalISel] Don't skip adding predicate matcher In-Reply-To: References: Message-ID: <7ec61842c5c74883f38b2237dc9cb3cd@localhost.localdomain> arsenm added a comment. 
In D83034#2133375 , @dsanders wrote: > Could you add a test case or point us at a example of an existing test that goes wrong? The description in the commit message is unlikely to be the full story as we already cover loads with non-null MemoryVT's. My best guess is you are attempting to use builtin predicates and custom predicates together. I don't see a reason why that shouldn't be allowed but it's not something that was intended as the goal was aiming to fully remove the custom C++ from the tablegen headers so that tablegen could do some transformations on sextload/zextload and similar to fix DAGISel vs GlobalISel modelling mismatches. I think the problem is broader than just combining custom predicates and builtin. The emitter here implicitly assumes all of these builtin PatFrag predicates are used exactly as the hierarchy used to define the default/generic load/store patterns. However, the PatFrags are much more free form and you can define a patfrag that combines multiple of these predicates in the same "layer" of the load/store hierarchy. The AMDGPU patterns redefine the entire set of load/store patterns, and in some cases it makes sense to combine all the predicates at once. I had to somewhat artificially add new layers of PatFrags to only apply a single predicate at a time. This also isn't consistently applied (for example, it does work to combine the AddressSpaces predicate and alignment) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83034/new/ https://reviews.llvm.org/D83034 From llvm-commits at lists.llvm.org Mon Jul 6 13:00:00 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:00:00 +0000 (UTC) Subject: [PATCH] D82159: Add a cmake warning when someone tries to configure clang-tools-extra without clang In-Reply-To: References: Message-ID: <8f8a55bd1dc9deaeceb1250fc732dfa7@localhost.localdomain> stephenneuendorffer added a comment. 
I'd advocate for issuing a message (along the lines of "clang-tools-extra is enabled, which depends on 'clang'. Automatically enabling 'clang'." and 'doing the right thing' by enabling clang. In fact, this seems to be a common enough paradigm that it should be handled in the cmake infrastructure. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82159/new/ https://reviews.llvm.org/D82159 From llvm-commits at lists.llvm.org Mon Jul 6 13:00:45 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 20:00:45 +0000 (UTC) Subject: [PATCH] D83253: MachineBasicBlock: add printName method Message-ID: nhaehnle created this revision. nhaehnle added a reviewer: arsenm. Herald added subscribers: hiraditya, wdng. Herald added a project: LLVM. Common up some existing MBB name printing logic into a single place. Note that basic block dumping now prints the same set of attributes as the MIRPrinter. Change-Id: I8f022bbd922e831bc96d63143d7472c03282530b Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83253 Files: llvm/include/llvm/CodeGen/MachineBasicBlock.h llvm/lib/CodeGen/MIRPrinter.cpp llvm/lib/CodeGen/MachineBasicBlock.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83253.275810.patch Type: text/x-patch Size: 7202 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:01:17 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:01:17 +0000 (UTC) Subject: [PATCH] D82788: AMDGPU: Fix alignment requirements for 96bit and 128bit local loads and stores In-Reply-To: References: Message-ID: <52ebd2074f5990acf1cf9a00c41ffcb9@localhost.localdomain> arsenm added inline comments. 
================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h:700 + bool hasUnalignedDSAccess() const { + return UnalignedDSAccess; + } ---------------- I believe this is actually the same control as UnalignedBufferAccess, so a new feature isn't needed (but this needs double checking) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82788/new/ https://reviews.llvm.org/D82788 From llvm-commits at lists.llvm.org Mon Jul 6 13:01:40 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:01:40 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: <835b1cf56d1034438f843331bd5d70c6@localhost.localdomain> amyk updated this revision to Diff 275809. amyk added a comment. Address comment of adding more specific ISA 3.1 comments to `PPCInstrPrefix.td`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/p10-splatImm32.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83245.275809.patch Type: text/x-patch Size: 11056 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:01:47 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:01:47 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: Tyker added a comment. Thank you for reducing rG7ea46aee3670981827c04df89b2c3a1cbdc7561b this seems quite useful. 
================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. and the key depends on a pointer value that will change when the Module gets cloned. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 13:02:46 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 20:02:46 +0000 (UTC) Subject: [PATCH] D83088: Introduce CfgTraits abstraction In-Reply-To: References: Message-ID: <65bacd0bc0f6fb64125cc2fe2e029694@localhost.localdomain> nhaehnle updated this revision to Diff 275811. nhaehnle marked 4 inline comments as done. nhaehnle added a comment. - fix MachineCfgTraits::blockdef_iterator and allow it to iterate over the instructions in a bundle - use MachineBasicBlock::printName Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83088/new/ https://reviews.llvm.org/D83088 Files: clang/include/clang/Analysis/Analyses/Dominators.h llvm/include/llvm/CodeGen/MachineCfgTraits.h llvm/include/llvm/IR/CFG.h llvm/include/llvm/Support/CfgTraits.h llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/MachineCfgTraits.cpp llvm/lib/IR/CFG.cpp llvm/lib/IR/CMakeLists.txt llvm/lib/Support/CMakeLists.txt llvm/lib/Support/CfgTraits.cpp llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h mlir/include/mlir/IR/Dominance.h -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83088.275811.patch Type: text/x-patch Size: 34804 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:03:25 2020 From: llvm-commits at lists.llvm.org (Nicolai Hähnle via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:03:25 +0000 (UTC) Subject: [PATCH] D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes In-Reply-To: References: Message-ID: <5f87fe4630965a4a802bc2d01fab1286@localhost.localdomain> nhaehnle updated this revision to Diff 275812. nhaehnle marked an inline comment as done. nhaehnle added a comment. - rename generic_{begin,end,children} back without the generic_ prefix and refer explicitly to base class methods in NewGVN, which wants to mutate the order of dominator tree node children directly Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83089/new/ https://reviews.llvm.org/D83089 Files: llvm/include/llvm/CodeGen/MachineDominators.h llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/Support/CMakeLists.txt llvm/lib/Support/GenericDomTree.cpp llvm/lib/Transforms/Scalar/ADCE.cpp llvm/lib/Transforms/Scalar/NewGVN.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83089.275812.patch Type: text/x-patch Size: 43287 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:03:28 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:03:28 +0000 (UTC) Subject: [PATCH] D83237: [flang] Add missing include for std::min In-Reply-To: References: Message-ID: <5e6ba592c9630dcc34449aa15bfcbf74@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG1b183918184e: [flang] Add missing include for std::min (authored by tskeith). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83237/new/ https://reviews.llvm.org/D83237 Files: flang/runtime/file.cpp Index: flang/runtime/file.cpp =================================================================== --- flang/runtime/file.cpp +++ flang/runtime/file.cpp @@ -9,6 +9,7 @@ #include "file.h" #include "magic-numbers.h" #include "memory.h" +#include <algorithm> #include #include #include -------------- next part -------------- A non-text attachment was scrubbed... Name: D83237.275813.patch Type: text/x-patch Size: 317 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:04:19 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:04:19 +0000 (UTC) Subject: [PATCH] D83253: MachineBasicBlock: add printName method In-Reply-To: References: Message-ID: arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. Another step towards fixing the unfortunate split between MIR and debug printing Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83253/new/ https://reviews.llvm.org/D83253 From llvm-commits at lists.llvm.org Mon Jul 6 13:04:23 2020 From: llvm-commits at lists.llvm.org (Kit Barton via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:04:23 +0000 (UTC) Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure In-Reply-To: References: Message-ID: kbarton updated this revision to Diff 275814. kbarton marked 6 inline comments as done. kbarton added a comment. - Address review comments - minor clean in PPC::lowerFormalArguments and update comments. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83100/new/ https://reviews.llvm.org/D83100 Files: llvm/lib/Target/PowerPC/CMakeLists.txt llvm/lib/Target/PowerPC/LLVMBuild.txt llvm/lib/Target/PowerPC/PPC.h llvm/lib/Target/PowerPC/PPC.td llvm/lib/Target/PowerPC/PPCCallLowering.cpp llvm/lib/Target/PowerPC/PPCCallLowering.h llvm/lib/Target/PowerPC/PPCInstructionSelector.cpp llvm/lib/Target/PowerPC/PPCLegalizerInfo.cpp llvm/lib/Target/PowerPC/PPCLegalizerInfo.h llvm/lib/Target/PowerPC/PPCRegisterBankInfo.cpp llvm/lib/Target/PowerPC/PPCRegisterBankInfo.h llvm/lib/Target/PowerPC/PPCRegisterBanks.td llvm/lib/Target/PowerPC/PPCSubtarget.cpp llvm/lib/Target/PowerPC/PPCSubtarget.h llvm/lib/Target/PowerPC/PPCTargetMachine.cpp llvm/test/CodeGen/PowerPC/GlobalISel/irtranslator-ret.ll llvm/test/CodeGen/PowerPC/GlobalISel/legalize-ret.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D83100.275814.patch Type: text/x-patch Size: 22380 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:04:32 2020 From: llvm-commits at lists.llvm.org (Kit Barton via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:04:32 +0000 (UTC) Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure In-Reply-To: References: Message-ID: <49c5159660e00b78c848eae20e45a4e1@localhost.localdomain> kbarton added a comment. In D83100#2130726 , @tschuett wrote: > AArch64 has a sub-directory for GlobalISel related things: > https://reviews.llvm.org/D81116 I like this idea. I will do the refactoring in a subsequent commit and update the patch, so if others do not like it it's easy to undo. 
================ Comment at: llvm/lib/Target/PowerPC/PPCInstructionSelector.cpp:39 + bool select(MachineInstr &I) override; + static const char *getName() { return DEBUG_TYPE; } + ---------------- arsenm wrote: > kbarton wrote: > > arsenm wrote: > > > I'm pretty sure you don't need these and all the other places that override this are dead code > > I don't follow this. > > Both select and getName seem to be required - getName is needed by the base InstructionSelector implementation in GlobalISel; select is needed by the PPCGenGlobalISel.inc file generated below. > > > > It is entirely possible I'm doing something incorrect though. Could you explain some more? > Oh right, this isn't the direct pass. I think manual getName overrides are dead code on passes and set by the INITIALIZE_PASS* macros Yes, that could be. I haven't looked at the implementation for how it works on passes. Are you OK marking this as done? ================ Comment at: llvm/lib/Target/PowerPC/PPCInstructionSelector.cpp:23 + +#define DEBUG_TYPE "ppc-isel" + ---------------- madhur13490 wrote: > May be ppc-gisel better? Yes, this is a better idea. Will change it. ================ Comment at: llvm/lib/Target/PowerPC/PPCLegalizerInfo.h:21 + +/// This class provides the information for the target register banks. +class PPCLegalizerInfo : public LegalizerInfo { ---------------- tschuett wrote: > Legalizer or register banks? Good catch. Updated to legalizer. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83100/new/ https://reviews.llvm.org/D83100 From llvm-commits at lists.llvm.org Mon Jul 6 13:04:38 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:04:38 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: <26b8fb11015d2afab539156dd0ede4f9@localhost.localdomain> craig.topper added inline comments. 
================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:3774 + // creating a temporary stack slot, again. + if (Flags.isByVal() && !hasCopy) return CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl); ---------------- LuoYuanke wrote: > Let me check below 2 rule is right or not. > 1. On linux when the byval attribute is set, it indicate copy the value that point by the pointer to the parameter stack. > 2. On window when the byval attribute is set, it indicate that allocate temporary object in caller, copy the value to the temporary, and store the temporary pointer (which point to the temporary object) to the parameter stack. > > On linux, the VA.getLocInfo() is CCValAssign::Full, and on windows is the VA.getLocInfo() is CCValAssign::Indirect. > > So I think we can just check the VA.getLocInfo(). If VA.getLocInfo() is CCValAssign::Indirect, we can NOT copy object. Instead we just restore the pointer. > `(Flags.isByVal() && VA.getLocInfo() != CCValAssign::Indirect)` The hasCopy should just be isByVal with no inversion and the Flags.isByVal() should be removed and replaced with isByVal. isByVal replaced the version in Flags. Or what Yuanke said might work. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 From llvm-commits at lists.llvm.org Mon Jul 6 13:04:56 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:04:56 +0000 (UTC) Subject: [PATCH] D83237: [flang] Add missing include for std::min In-Reply-To: References: Message-ID: <6895be5c5c08f95aa7b7c1bddaef6f16@localhost.localdomain> tskeith added a comment. In D83237#2133321 , @DavidTruby wrote: > We must have overlapped here somehow, I actually created a revision for this a couple of hours ago > https://reviews.llvm.org/D83227 Thanks. Phabricator seems very erratic with email. I only received yours a few minutes ago. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83237/new/ https://reviews.llvm.org/D83237 From llvm-commits at lists.llvm.org Mon Jul 6 13:04:58 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 20:04:58 +0000 (UTC) Subject: [PATCH] D83094: Analysis: Add a GenericCycleInfo analysis In-Reply-To: References: Message-ID: nhaehnle updated this revision to Diff 275815. nhaehnle added a comment. - cleanup #includes - use is_contained instead of `llvm::find(range, ...) != range.end()` pattern - mark some variables as unused in release builds - address additional review comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83094/new/ https://reviews.llvm.org/D83094 Files: llvm/include/llvm/Analysis/CycleInfo.h llvm/include/llvm/Analysis/GenericCycleInfo.h llvm/include/llvm/InitializePasses.h llvm/lib/Analysis/Analysis.cpp llvm/lib/Analysis/CMakeLists.txt llvm/lib/Analysis/CycleInfo.cpp llvm/lib/Analysis/GenericCycleInfo.cpp llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassRegistry.def llvm/test/Analysis/CycleInfo/basic.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83094.275815.patch Type: text/x-patch Size: 52498 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:05:53 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 20:05:53 +0000 (UTC) Subject: [PATCH] D83088: Introduce CfgTraits abstraction In-Reply-To: References: Message-ID: <0c5020d437ce50bbdbe58b13d108c85e@localhost.localdomain> nhaehnle added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/MachineCfgTraits.h:133 + ++m_def; + if (m_def == m_instr->defs().end()) { + ++m_instr; ---------------- arsenm wrote: > != return early? 
The logic is actually subtly broken in the presence of instructions without defs, I just didn't notice it because it currently affects only debug printing logic. Going to fix it. ================ Comment at: llvm/include/llvm/CodeGen/MachineCfgTraits.h:136-138 + // Prefer to avoid support for bundled instructions as long as we + // don't really need it. + assert(!m_instr->isBundle()); ---------------- arsenm wrote: > I've been thinking about more aggressively using bundles around call sites to handle waterfall looping around divergent calls with SGPR arguments Hmm, so what's the correct iteration behavior in the presence of bundles? Iterate over all instructions in the bundle (which is that MachineBasicBlock::instr_iterator does) and only iterate over explicit defs? I think that's what makes the most sense, and what I'm going with for now... ================ Comment at: llvm/lib/CodeGen/MachineCfgTraits.cpp:27-29 +void MachineCfgTraits::Printer::printBlockName(raw_ostream &out, + MachineBasicBlock *block) const { + out << "bb." << block->getNumber(); ---------------- arsenm wrote: > I think this should be added to MachineBasicBlock. The same logic is already repeated in MIRPrinter (and the MBB dump function uses a different prefix) D83253 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83088/new/ https://reviews.llvm.org/D83088 From llvm-commits at lists.llvm.org Mon Jul 6 13:06:27 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 20:06:27 +0000 (UTC) Subject: [PATCH] D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes In-Reply-To: References: Message-ID: <3a3a31cb6b534b1638298b3357586a90@localhost.localdomain> nhaehnle added inline comments. 
================ Comment at: llvm/include/llvm/Support/GenericDomTree.h:81-82 - iterator begin() { return Children.begin(); } - iterator end() { return Children.end(); } + iterator generic_begin() { return Children.begin(); } + iterator generic_end() { return Children.end(); } const_iterator begin() const { return Children.begin(); } ---------------- arsenm wrote: > Iterating over "generic" seems like a strange naming choice? The generic_children range is a bit better, but this should probably match Okay, so the reason why I originally did this was that NewGVN wants to change the order of children (for what seems like dubious reasons, but I didn't want to touch *that* too), and providing non-const iteration from DomTreeNodeBase is problematic. A cleaner way is to have NewGVN explicitly refer to the base class begin/end methods. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83089/new/ https://reviews.llvm.org/D83089 From llvm-commits at lists.llvm.org Mon Jul 6 13:06:32 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:06:32 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <6dd0a39622d87338239a69c15751abd5@localhost.localdomain> lei accepted this revision as: lei. lei added a comment. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 13:07:00 2020 From: llvm-commits at lists.llvm.org (Scott Linder via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:07:00 +0000 (UTC) Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors In-Reply-To: References: Message-ID: <6c65a0e32876a6a588f21f620ca1455b@localhost.localdomain> scott.linder added inline comments. 
================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1245 + // + // To to get the exact same bytes in re-assembled binary, we disassemble + // aamdhsa_next_free_sgpr as the amdgcn.next_free_sgpr assembler symbol and ---------------- rochauha wrote: > rochauha wrote: > > scott.linder wrote: > > > For this and the above case we should have tests to prove this out. I.e. assemble sources to a binary, disassemble and reassemble it, and then compare the two binaries. Ideally we would do this for some edge cases around VGPR/SGPR allocation granularity. > > > > > > There may need to be some fixup between disassembly and reassembly to account for the remaining non-reassembleable bits produced by llvm-objdump, but they should be pretty minor for a trivial kernel, and I would expect you could handle them with just `sed` which seems to be available to LIT tests. > > Right now we can't really re-assemble in the lit-test. This needs to be tested 'informally' by: > > > > - Manually writing a small test case. Make a copy of it too. > > - Assembling it into the binary : Binary-1. > > - Disassembling it. > > - Replace the original kernel descriptor with the disassembled kernel descriptor in the copy. > > - Assemble the copy : Binary-2. > > - Compare Binary-1 and Binary-2. > Went this route to check whether re-assembled binaries match or not. Turns out that both binaries match, in size (overall size as well as size of sections) and also in terms of all the disassembled content. But a `diff object1 object2` says that binary files differ. I'm not sure I follow what you are describing; my thought was to start with just an asm source file containing only the kernel descriptor directive in the default section, and compare the output of the following (with, e.g. 
diff, as you mention):

* Assemble it to an object file with llvm-mc
* Assemble it to an object file with llvm-mc | disassemble the kernel descriptor symbol | trim any human-readable prologue | assemble it to an object file with llvm-mc

As a trivial example, diff doesn't find any difference for the following:

```
$ printf '.amdhsa_kernel my_kernel\n.amdhsa_next_free_vgpr 0\n.amdhsa_next_free_sgpr 0\n.end_amdhsa_kernel' >a.s
$ release/bin/llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj a.s >a.o
$ diff a.o \
    <(release/bin/llvm-objdump --triple=amdgcn-amd-amdhsa --mcpu=gfx908 --disassemble-symbols=my_kernel.kd a.o \
    | tail -n +8 \
    | release/bin/llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj)
```

I don't think you need to use `FileCheck` for these tests at all, you can just rely on ending the RUN pipeline with `diff`, which seems to be supported by lit. You can then just copy-paste the test and edit fields in the input to validate edge cases for things like the SGPR/VGPR allocation directives. I think more comprehensive testing, including for other sections and executables/DSOs, would be good eventually but for now we should at least have some tests that explicitly confirm the KD disassembly round-trips. ================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1345 + return MCDisassembler::Success; +} // decodeCOMPUTE_PGM_RSRC1() + ---------------- rochauha wrote: > scott.linder wrote: > > I don't know the general conventions here, but I don't think I have seen a comment for the end of a function elsewhere in LLVM. I do know that it is required for namespaces, so maybe it is permitted for long functions? > I'm not sure. I added those comments because these functions were getting quite long. I would lean towards omitting these, especially with the functions becoming shorter. For example, `decodeCOMPUTE_PGM_RSRC2()` is now <50 lines long and the entire definition now fits on one screen for me.
It seems like there are other examples of this in the codebase, though, so I'm OK with it for the longer functions. ================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1349 + uint32_t FourByteBuffer, raw_string_ostream &KdStream) const { + // Decode as directives that handle COMPUTE_PGM_RSRC2. + StringRef Indent = "\t"; ---------------- Can you expand this comment a little and move it to a Doxygen comment for the function? ================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1624 + } +} // decodeKernelDescriptorDirective() + ---------------- Need to handle the "default" case here: ``` llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1510:1: warning: control may reach end of non-void function [-Wreturn-type] ``` ================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1630 + // CP microcode requires the kernel descriptor to be 64 aligned. + if (Bytes.size() != 64 || KdAddress % 64 != 0) + return MCDisassembler::Fail; ---------------- rochauha wrote: > scott.linder wrote: > > The `!= 0` here is redundant. > I know, but I thought that it is more readable this way. Fair enough, in a type-safe language it would be required anyway, so it seems reasonable. ================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1665 + Size = 256; + return MCDisassembler::SoftFail; + } ---------------- rochauha wrote: > scott.linder wrote: > > I'm still not sure what we landed on for the semantics of `SoftFail` here? > It should be Success / Fail based on what the bytes are for code object v2. But there's nothing we are 'doing' at the moment for v2, I returned SoftFail. If `SoftFail` isn't applicable I don't think we should return it, even if it is just because we haven't implemented something yet. It existing doesn't mean it needs to be used, I think it has a very narrow definition that doesn't apply here. 
Maybe just emit a diagnostic and return `Fail` so we get the "decode as .byte" behavior? What exactly happens now with the current patch as-is? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80713/new/ https://reviews.llvm.org/D80713 From llvm-commits at lists.llvm.org Mon Jul 6 13:07:34 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:07:34 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <09601fbc20eb675dfca6d59e5db785f8@localhost.localdomain> lebedev.ri marked 3 inline comments as done. lebedev.ri added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- Tyker wrote: > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > and the key depends on a pointer value that will change when the Module gets cloned. No, the current logic is correct, that map doesn't live that long. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 13:09:38 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:09:38 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: <2c3df74f1cb78d68cb5195224092cef5@localhost.localdomain> lebedev.ri added a comment. In D83101#2133062 , @foad wrote: > @lebedev.ri this is causing assertion failures and verification failures in some of our downstream tests. 
Here's a test case:
>
> $ cat reduced.ll
> define void @main(<3 x i32> inreg %w) {
> entry:
>   %a = extractelement <3 x i32> undef, i32 0
>   %b = extractelement <3 x i32> undef, i32 1
>   %x = extractelement <3 x i32> %w, i32 2
>   %y = insertelement <4 x i32> undef, i32 %x, i32 2
>   %z = insertelement <4 x i32> %y, i32 undef, i32 3
>   store <4 x i32> %z, <4 x i32> addrspace(7)* undef, align 16
>   ret void
> }
> $ ~/llvm-debug/bin/opt -scalarizer -o /dev/null reduced.ll
> Instruction does not dominate all uses!
> = extractelement [145938144 x half] , i32 undef
> %z.upto2 = insertelement <4 x i32> undef, i32 , i32 2
> in function main
> LLVM ERROR: Broken function found, compilation aborted!
>

Thanks for the test case, looking. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Mon Jul 6 13:10:59 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:10:59 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: amyk accepted this revision as: amyk. amyk added a comment. Unless Lei and Nemanja have any additional comments, LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 13:11:09 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:11:09 +0000 (UTC) Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure In-Reply-To: References: Message-ID: <40d94a84ab1b783da3210ad7bda6fead@localhost.localdomain> arsenm added a comment. In D83100#2134028 , @kbarton wrote: > In D83100#2130726 , @tschuett wrote: > > > AArch64 has a sub-directory for GlobalISel related things: > > https://reviews.llvm.org/D81116 > > > I like this idea.
I will do the refactoring in a subsequent commit and update the patch, so if others do not like it it's easy to undo. I don't particularly like the AArch64 split. There isn't an entirely clean break between globalisel parts and the rest of the backend. The other target subdirectories are used for places where there's a separate library Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83100/new/ https://reviews.llvm.org/D83100 From llvm-commits at lists.llvm.org Mon Jul 6 13:14:12 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:14:12 +0000 (UTC) Subject: [PATCH] D82786: [llvm-ar] Unsupport test on FreeBSD In-Reply-To: References: Message-ID: <9bdba1ac8f4b75255425e6b9448e6aa2@localhost.localdomain> smeenai added a comment. Thanks @arichardson and @adalava for testing, and thanks @MaskRay for committing! Sorry about the breakage. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82786/new/ https://reviews.llvm.org/D82786 From llvm-commits at lists.llvm.org Mon Jul 6 13:14:35 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 20:14:35 +0000 (UTC) Subject: [PATCH] D81805: [RISCV] Fix isStoreToStackSlot In-Reply-To: References: Message-ID: <90f4fd8e4dcd6126216f49a9db9b6f94@localhost.localdomain> luismarques accepted this revision. luismarques added a comment. This revision is now accepted and ready to land. LGTM. Good catch Roger! (I have verified that the code change makes sense based both on tablegen definitions and the sanity test that Roger indicated on the comment). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81805/new/ https://reviews.llvm.org/D81805 From llvm-commits at lists.llvm.org Mon Jul 6 13:15:58 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:15:58 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: <835a0ac090b77a05fb63a88f405bd974@localhost.localdomain> Ayal added inline comments. ================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:439 // model could then apply the recurrence type to these instructions, // without needing a white list of instructions to ignore. collectCastsToIgnore(TheLoop, ExitInstruction, RecurrenceType, CastInsts); ---------------- dmgreen wrote: > Ayal wrote: > > Perhaps the above "better way" would also help recognize and record horizontal reductions? > Hmm. The reason I kept them separate was that this method is already pretty complex. I was trying to keep thing simpler. Adding the ability to detect a single chain of operations from Phi to LoopExitValue that can be used for horizontal reductions looks.. difficult. And error prone. :) If you think it's worth it then I can certainly give it a go! I like the separation of concerns in keeping them separate though. > > The extra things that AddReductionVar will detect that getReductionOpChain will not are: > - Phi/select predicated reductions like in if-conversion-reductions.ll and if-reduction.ll. These would need some form of predicated reduction intrinsic. > - Narrow bitwidths. This one I could add. > - Subs/FSubs are treated like Adds/FAdds for vertical reductions. Would be good to see if above TODO can be addressed - providing the set of all instructions that take part in the reduction. This set could then be used for checking in-loop reductions. Hopefully this could help simplify both, and keep them in some sync. 
But could be done later, possibly with another TODO.. ================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:848 + + if (!isCorrectOpcode(Cur)) + return {}; ---------------- dmgreen wrote: > Ayal wrote: > > `|| !Cur->hasNUses(ExpectedUses)` ? > > > > > > nit: can alternatively let getNextInstruction check its result and return only valid ones, e.g.: > > > > ``` > > bool RedOpIsCmp = (RedOp == Instruction::ICmp || RedOp == Instruction::FCmp); > > unsigned ExpectedUses = RedOpIsCmp ? 2 : 1; > > > > auto getNextInstruction = [&](Instruction *Cur) { > > if (!Cur->hasNUses(ExpectedUses)) > > return nullptr; > > auto *FirstUser = Cur->user_begin(); > > if (!RedOpIsCmp) > > return FirstUser->getOpcode() == RedOp ? FirstUser : nullptr; > > // Handle cmp/select pair: > > auto *Sel = dyn_cast(*FirstUser) || > > dyn_cast(*std::next(FirstUser)); > > if (SelectPatternResult::isMinOrMax(matchSelectPattern(Sel, LHS, RHS).Flavor)) > > return Sel; > > return nullptr; > > } > > > > for (auto *Cur = getNextInstruction(Phi); Cur && Cur != LoopExitInstr; Cur = getNextInstruction(Cur)) > > ReductionOperations.push_back(Cur); > > ``` > This is the loop exit instr, so can have as many uses as it likes I believe. Ahh, ok. (It should have ExpectedUses+1 users being in lcssa.) "instruciton" ================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:856 + // The loop exit instruction also needs to be the same opcode. We dont allow + // them to be Subs. + if (!isCorrectOpcode(Cur)) ---------------- Is Subs the only issue?. Can check this earlier, before traversing the chain, although it is pushed back last, here. 
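The iterative walk discussed in the comments above (start at the phi, follow the single user at each step, stop at the loop exit instruction) can be modelled outside LLVM. The following is only a minimal sketch under stated assumptions: `Node` and `collectChain` are hypothetical names, not LLVM API, opcodes are plain strings, and the cmp/select pairing for min/max is not modelled.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction: an opcode name plus its users.
struct Node {
  std::string Opcode;
  std::vector<Node *> Users;
};

// Walk downwards from Phi to Exit, following the unique user at each step.
// Returns the chain (excluding Phi, including Exit), or an empty vector if
// a node before Exit has other than one user, if any visited node
// (including Exit) has the wrong opcode, or if the walk revisits a node.
// The users of Exit itself are deliberately not checked.
std::vector<Node *> collectChain(Node *Phi, Node *Exit,
                                 const std::string &RedOp) {
  std::vector<Node *> Chain;
  std::unordered_set<Node *> Seen{Phi};
  for (Node *Cur = Phi; Cur != Exit;) {
    if (Cur->Users.size() != 1)
      return {};
    Cur = Cur->Users.front();
    if (Cur->Opcode != RedOp || !Seen.insert(Cur).second)
      return {};
    Chain.push_back(Cur);
  }
  return Chain;
}
```

In this model the exit node may have any number of users (for example the phi on the back edge plus an LCSSA phi), which mirrors the point made above that the loop exit instruction is not subject to the single-use check.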
================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:812 + if (LHS->getOpcode() == Opcode && L->contains(LHS->getParent()) && + LHS->hasOneUse() && + findPathToPhi(LHS, ReductionOperations, Opcode, Phi, L)) { ---------------- dmgreen wrote: > fhahn wrote: > > Ayal wrote: > > > Looking for a chain of hasOneUse op's would be easier starting from the Phi and going downwards, until reaching LoopExitInstr? > > > > > > Note that when extended to handle reductions with conditional bumps, some ops will have more than one use. > > Instead of doing a recursive traversal, would it be simpler to just do the traversal iteratively, at least as long as we are only using at a single use chain? > Yeah, that direction makes it a lot simpler. Thanks. Is treating sub as an add reduction something in-loop reduction could support as a future extension? ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7516 + VPlanPtr &Plan, VPRecipeBuilder &RecipeBuilder) { + for (auto &Reduction : Legal->getReductionVars()) { + PHINode *Phi = Reduction.first; ---------------- This is the other potential use of for (auto &Reduction : CM.getInloopReductions()). ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1041 + /// outside. + void categorizeReductions(); + ---------------- Ayal wrote: > collectInLoopReductions()? Perhaps worth holding a map of in loop reduction phi's with their chains. Thanks. Worth updating the comment. ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3769 // MinMax reduction have the start value as their identify. - if (VF == 1) { + if (VF == 1 || UseInloopReductions) { VectorStart = Identity = ReductionStartValue; ---------------- dmgreen wrote: > Ayal wrote: > > dmgreen wrote: > > > Ayal wrote: > > > > This is dead code if cmp/select chains are not recognized yet, as noted above. > > > I've added the code to handle minmax too (but not tested it a lot yet. I will try that now). 
> > > > > > MVE has instructions for integer min/max reductions, but they can be slow enough to make them not worth using over a normal vmin/vmax. Adds are always not-slower-enough to warrant the inloop reduction (and have other advantages like handling higher type sizes and folding in more instructions.) > > > > > > My point is that min/max, like some of the other fadd/mul/and/etc might not be used by MVE yet. If you think the code is more hassle than it deserves, then we could take them out for the time being. I'd like to leave them in for consistency though, even if it's not used straight away. > > Would be good to make sure code is being exercised and tested. Could inloop min/max (and/or other reductions) help reduce code size, and be applied when vectorizing under optsize? > -Os sounds like a good plan. It will take some backend work to make it efficient enough first though. And predicated reductions? Hoisting the horizontal reduction from the middle block into the loop could potentially eliminate the middle block (as in tests below), so could presumably lead to code of smaller size? At-least for in-loop chains of a single link. > And predicated reductions? These are yet to be handled in-loop, right? ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6537 + // want to record it as such. + if (!ForceInloopReductions) + continue; ---------------- dmgreen wrote: > Ayal wrote: > > Move this invariant check out as another early-exit? > This does look a little strange here on it's own. The followup patch to add the TTI hook makes it look like: > if (!PreferInloopReductions && > !TTI.preferInloopReduction(Opcode, Phi->getType(), > TargetTransformInfo::ReductionFlags())) > continue; Then better placed above right after defining Phi? 
================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7286 + RecurrenceDescriptor &RdxDesc = Reduction.second; + if (CM.useInloopReductions(Reduction.first)) { + PHINode *Phi = Reduction.first; ---------------- dmgreen wrote: > Ayal wrote: > > dmgreen wrote: > > > Ayal wrote: > > > > Iterate over in loop reductions? > > > Do you mean adding an iterator for iterating over reductions, stepping over the ones not inloop? > > > > > > It would seem like it's similar to the existing code, but as a new iterator class. My gut says the current code is simpler and clearer what is going on? > > Suggestion was to iterate over the PHIs/elements of InloopReductionChains, rather than over all reduction PHIs of Legal->getReductionVars(). > > > > (Better early-exit via "if (!CM.isInLoopReduction(Reduction.first)) continue;") > I believe that InloopReductionChains would not iterate in a deterministic order, which is why I avoided it. > > Perhaps that would not matter here? The reductions should be independent anyway. Seems safer to try and use deterministic ordering anyway if we can. Agreed it would be better to use deterministic ordering. How about letting InloopReductionChains be a MapVector and iterate over for (auto &Reduction : CM.getInloopReductions())? The number of reductions is expected to be small, w/o removals. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Mon Jul 6 13:17:29 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:17:29 +0000 (UTC) Subject: [PATCH] D83092: DomTree: Add findSiblingOfUncle helper In-Reply-To: References: Message-ID: <7b5d68281cf856b9deb215ecd02335dc@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/Support/GenericDomTree.cpp:220 +/// the degenerate case where \p A itself is a sibling of \p Uncle. 
+const GenericDomTreeNodeBase *GenericDominatorTreeBase::findSiblingOfUncle( + const GenericDomTreeNodeBase *A, ---------------- nhaehnle wrote: > arsenm wrote: > > I'm not sure these are the right family analogies. This could also find a great uncle, or the same parent. > Fair enough, do you have a suggestion for a better name? `findSiblingOfNthUncle` perhaps? I don't really have a better idea. The comment could maybe explain more of the cases it can encounter? "some ancestor" is a bit vague. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83092/new/ https://reviews.llvm.org/D83092 From llvm-commits at lists.llvm.org Mon Jul 6 13:18:26 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:18:26 +0000 (UTC) Subject: [PATCH] D83022: Add option LLVM_NM to allow specifying the location of the llvm-nm tool. In-Reply-To: References: Message-ID: <67efbe4894ce909a8c07c7225aab6405@localhost.localdomain> smeenai added a comment. Sure, I'll land it for you. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83022/new/ https://reviews.llvm.org/D83022 From llvm-commits at lists.llvm.org Mon Jul 6 13:18:42 2020 From: llvm-commits at lists.llvm.org (Simon Atanasyan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:18:42 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: atanasyan accepted this revision. atanasyan added a comment. This revision is now accepted and ready to land. LGTM. 
Thanks CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 From llvm-commits at lists.llvm.org Mon Jul 6 13:20:31 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:20:31 +0000 (UTC) Subject: [PATCH] D75741: AMDGPU: Add check to recompute merge-able instructions In-Reply-To: References: Message-ID: arsenm requested changes to this revision. arsenm added a comment. This revision now requires changes to proceed. Is this still necessary? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75741/new/ https://reviews.llvm.org/D75741 From llvm-commits at lists.llvm.org Mon Jul 6 13:22:07 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:22:07 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types Message-ID: stefanp created this revision. stefanp added reviewers: nemanjai, sfertile, lei, NeHuang, hfinkel. Herald added subscribers: shchenz, kbarton, hiraditya. Herald added a project: LLVM. Currently the instruction paddi always takes s34imm as the type for the 34 bit immediate. However, the PC Relative form of the instruction should not produce the same fixup as the non PC Relative form. This patch splits the s34imm type into s34imm and s34imm_pcrel so that two different fixups can be emitted. 
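As an illustration of the split described above, the sketch below maps two distinct operand types to two distinct fixup kinds and checks that a value fits in a signed 34-bit field. It is a self-contained model only: the enumerator names (`S34Imm`, `S34ImmPCRel`, `Imm34`, `PCRel34`) and both functions are hypothetical and are not taken from the patch or from the PowerPC backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand classes mirroring the s34imm / s34imm_pcrel split.
enum class ImmOperand { S34Imm, S34ImmPCRel };

// Hypothetical fixup kinds; the real backend defines its own enumerators.
enum class Fixup { Imm34, PCRel34 };

// Select a fixup from the operand class, so that the PC-relative form and
// the absolute form of the instruction no longer share one fixup.
Fixup fixupFor(ImmOperand Op) {
  return Op == ImmOperand::S34ImmPCRel ? Fixup::PCRel34 : Fixup::Imm34;
}

// A signed 34-bit immediate covers the range [-2^33, 2^33 - 1].
bool fitsSigned34(int64_t V) {
  const int64_t Limit = int64_t{1} << 33;
  return V >= -Limit && V < Limit;
}
```

Keying the choice on the operand type rather than on the instruction means the encoder does not need to inspect which form of `paddi` it is emitting.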
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83255 Files: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83255.275820.patch Type: text/x-patch Size: 10867 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:24:15 2020 From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:24:15 +0000 (UTC) Subject: [PATCH] D83024: [PGO] Instrument function entry BB by default in IR PGO In-Reply-To: References: Message-ID: <7b19659e22b47a897913c2879a05cd25@localhost.localdomain> davidxl added a comment. If we require profile-use also to use the option, it will work, but I think it is better and more convenient to change variant bit (I believe there are plenty). Bumping version can potentially complicate things in the future. We should probably also add a new directive in text format for the variant, something like :entry_first When the directive is lacking, it means the old variant. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83024/new/ https://reviews.llvm.org/D83024 From llvm-commits at lists.llvm.org Mon Jul 6 13:25:26 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:25:26 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <2fbfc53b41f424d4cce626df834d95f7@localhost.localdomain> lebedev.ri added a comment. Was there an RFC for this? 
While i agree it likely makes sense to have these for consistency, i'm not sure why they are *needed* for implementing the Embedded-C fixed point support in Clang. ================ Comment at: llvm/lib/IR/Verifier.cpp:4948-4951 + Assert(Op1->getType()->isIntegerTy(), + "first operand of [us]shl_sat must be an int type"); + Assert(Op2->getType()->isIntegerTy(), + "second operand of [us]shl_sat must be an int type"); ---------------- I don't think it makes sense to limit these to scalars. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Mon Jul 6 13:26:13 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:26:13 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: <86eaa51cff393e4bd4b6b9f32d668974@localhost.localdomain> dblaikie added subscribers: JDevlieghere, labath. dblaikie added a comment. In D82975#2128353 , @dstenb wrote: > In D82975#2127201 , @SouraVX wrote: > > > > When you say 'by default' - do you mean by default when the user requests macro debug info (via -fdebug-macro) or by default without any extra flag? > > > & what does GCC do? Does it have a way to emit the standard debug_macinfo in v4 and below? Or does it always emit the debug_macro GNU extension? > > > > I'm not particularly sure of this(introduction of GNU encodings). Behavior of GCC trunk(11.0.0) is as follows: > > > > `gcc -g3 test.c -c`, after dumping using `objdump(2.32)`, GCC will create `.debug_macro`(sort of it's default, until you specify `-gstrict-dwarf` in which case GCC will generate `.debug_macinfo`). > > > As Sourabh says this is default when not emitting strict DWARF in GCC. For Clang, my intention was for it to be enabled by default for `-fdebug-macro` when tuning for GDB. Maybe it would also be interesting when tuning for LLDB? 
Sounds alright. Not sure if the LLDB folks (@aprantl @JDevlieghere @labath) would be interested in that - a separate patch in any case. >> The only difference b/w `-g3` and `-gdwarf-5 -g3` GCC generated `.debug_macro` section is the `version` information in `macro header`, `4` and `5` respectively. However there's no difference in encoding used i.e both uses (DWARFv5 encodings) there is no `DW_MACRO_GNU*` -- observed using binutils `objdump` version info mentioned above. > > I personally think that the binutils tools printing the GNU extension using DWARF 5 entry names is confusing, but if people prefer to have it like that to avoid the larger code changes that this patch series introduce, I can align with that. Agreed. >> And lastly The reason why current `llvm-dwarfdump` is not able to dump/parse GCC generated(`gcc -g3 test.c -c`) `.debug_macro` section is because it uses `version` information in the header to parse it correctly(In this case it is `4`). However if you generate the macro info as(specifying version) `gcc -gdwarf-5 -g3 test.c -c` llvm-dwarfdump can parse/dump it correctly. >> >> I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the `.debug_macro` section with `version` 4 and teach the `llvm-dwarfdump` to parse it correctly. > > One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4? Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc. >> `CLANG/llvm` AFAIK doesn't have `-gstrict-dwarf`. So if you want analogous behavior like GCC(have `.debug_macro` section even at version `4`) you may need to introduce extra flag/switch. 
So that if end-user still `.debug_macinfo` for whatever reasons CLANG/llvm should generate it. >> I'm not the right person these sort of decision, I'll leave it to @dblaikie and @probinson . > > I just want to add that one downside with emitting `.debug_macro` that we have noticed downstream is that size of archives can grow quite a bit, since you then both pay for the uncoalesced strings in the different object files (same cost as for `.debug_macinfo`), plus all of the relocations. Got a rough %? Is it easy to disable this functionality if someone were trying to optimize for object size? (is there an easy way to disable gdb tuning on platforms that default to it, for instance?) > Other than that I am personally not aware of any other major reasons for wanting to use `.debug_macinfo` over `.debug_macro`, given that the rest of the toolchain supports the latter format, of course. > >> I've done some initial work(in llvm) around that D78866 and related. This is still broken from emission perspective(Fix in progress). `llvm-dwarfdump` works great. > > Okay, thanks for the information! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Mon Jul 6 13:29:00 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via llvm-commits) Date: Mon, 06 Jul 2020 13:29:00 -0700 (PDT) Subject: [llvm] 1d8cb09 - Add option LLVM_NM to allow specifying the location of the llvm-nm tool Message-ID: <5f03898c.1c69fb81.b1c2a.4a08@mx.google.com> Author: Arlo Siemsen Date: 2020-07-06T13:27:56-07:00 New Revision: 1d8cb099231a79b6ad96e745c2d17cf307bea857 URL: https://github.com/llvm/llvm-project/commit/1d8cb099231a79b6ad96e745c2d17cf307bea857 DIFF: https://github.com/llvm/llvm-project/commit/1d8cb099231a79b6ad96e745c2d17cf307bea857.diff LOG: Add option LLVM_NM to allow specifying the location of the llvm-nm tool The new option works like the existing LLVM_TABLEGEN, and LLVM_CONFIG_PATH options. 
Instead of building llvm-nm, the build uses the executable defined by LLVM_NM.

This is useful for cross-compilation scenarios where the host cannot run the
cross-compiled tool, and recursing into another cmake build is not an option
(due to required DEFINE's, for example).

Reviewed By: smeenai

Differential Revision: https://reviews.llvm.org/D83022

Added:

Modified:
    llvm/tools/llvm-shlib/CMakeLists.txt

Removed:

################################################################################
diff --git a/llvm/tools/llvm-shlib/CMakeLists.txt b/llvm/tools/llvm-shlib/CMakeLists.txt
index 25803da0c252..f3a2056f80d3 100644
--- a/llvm/tools/llvm-shlib/CMakeLists.txt
+++ b/llvm/tools/llvm-shlib/CMakeLists.txt
@@ -154,13 +154,17 @@ if(LLVM_BUILD_LLVM_C_DYLIB AND MSVC)
   set(GEN_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/gen-msvc-exports.py)

   set(LLVM_EXPORTED_SYMBOL_FILE ${CMAKE_BINARY_DIR}/${CMAKE_CFG_INTDIR}/libllvm-c.exports)
-
-  if(CMAKE_CROSSCOMPILING)
-    build_native_tool(llvm-nm llvm_nm)
-    set(llvm_nm_target "${llvm_nm}")
+  if(NOT LLVM_NM)
+    if(CMAKE_CROSSCOMPILING)
+      build_native_tool(llvm-nm llvm_nm)
+      set(llvm_nm_target "${llvm_nm}")
+    else()
+      set(llvm_nm $<TARGET_FILE:llvm-nm>)
+      set(llvm_nm_target llvm-nm)
+    endif()
   else()
-    set(llvm_nm $<TARGET_FILE:llvm-nm>)
-    set(llvm_nm_target llvm-nm)
+    set(llvm_nm ${LLVM_NM})
+    set(llvm_nm_target "")
   endif()

   add_custom_command(OUTPUT ${LLVM_EXPORTED_SYMBOL_FILE}

From llvm-commits at lists.llvm.org Mon Jul 6 13:29:09 2020
From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 20:29:09 +0000 (UTC)
Subject: [PATCH] D83022: Add option LLVM_NM to allow specifying the location of the llvm-nm tool.
In-Reply-To:
References:
Message-ID: <0b35da4df8aaff663e960c7a0f6a2753@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG1d8cb099231a: Add option LLVM_NM to allow specifying the location of the llvm-nm tool (authored by arlosi, committed by smeenai).
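
As a rough illustration of how the new option might be used, the sketch below is an initial-cache file for a cross build that points at prebuilt host tools. The file name and the C:/host-llvm paths are hypothetical placeholders; only the variable names LLVM_NM and LLVM_TABLEGEN come from the commit message, and per the diff the LLVM_NM override is only consulted in the LLVM_BUILD_LLVM_C_DYLIB AND MSVC branch of llvm-shlib.

```cmake
# cross-host-tools.cmake (hypothetical file name)
# Point the build at prebuilt host tools instead of building them natively.
# Both paths are placeholders; adjust them to wherever the host binaries live.
set(LLVM_TABLEGEN "C:/host-llvm/bin/llvm-tblgen.exe" CACHE FILEPATH "Prebuilt host llvm-tblgen")
set(LLVM_NM "C:/host-llvm/bin/llvm-nm.exe" CACHE FILEPATH "Prebuilt host llvm-nm")
```

A file like this would typically be loaded with `cmake -C cross-host-tools.cmake` ahead of the other configure arguments; passing `-DLLVM_NM=...` directly on the command line should have the same effect.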

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83022/new/

https://reviews.llvm.org/D83022

Files:
  llvm/tools/llvm-shlib/CMakeLists.txt

Index: llvm/tools/llvm-shlib/CMakeLists.txt
===================================================================
--- llvm/tools/llvm-shlib/CMakeLists.txt
+++ llvm/tools/llvm-shlib/CMakeLists.txt
@@ -154,13 +154,17 @@
   set(GEN_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/gen-msvc-exports.py)

   set(LLVM_EXPORTED_SYMBOL_FILE ${CMAKE_BINARY_DIR}/${CMAKE_CFG_INTDIR}/libllvm-c.exports)
-
-  if(CMAKE_CROSSCOMPILING)
-    build_native_tool(llvm-nm llvm_nm)
-    set(llvm_nm_target "${llvm_nm}")
+  if(NOT LLVM_NM)
+    if(CMAKE_CROSSCOMPILING)
+      build_native_tool(llvm-nm llvm_nm)
+      set(llvm_nm_target "${llvm_nm}")
+    else()
+      set(llvm_nm $<TARGET_FILE:llvm-nm>)
+      set(llvm_nm_target llvm-nm)
+    endif()
   else()
-    set(llvm_nm $<TARGET_FILE:llvm-nm>)
-    set(llvm_nm_target llvm-nm)
+    set(llvm_nm ${LLVM_NM})
+    set(llvm_nm_target "")
   endif()

   add_custom_command(OUTPUT ${LLVM_EXPORTED_SYMBOL_FILE}

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83022.275822.patch
Type: text/x-patch
Size: 931 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 13:34:44 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 20:34:44 +0000 (UTC)
Subject: [PATCH] D83235: [GlobalISel][InlineAsm] Fix matching input constraints to mem operand
In-Reply-To:
References:
Message-ID: <62d79faf648b2df30d2331595780ed1c@localhost.localdomain>

arsenm added a comment.

getPointerRegClass is terrible, but also shouldn't be necessary in GlobalISel since pointer types aren't lost

================
Comment at: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp:417
+          !InlineAsm::isRegDefEarlyClobberKind(MatchedOperandFlag)) {
+        LLVM_DEBUG(dbgs() << "Unknown matching constraint\n");
+        return false;
----------------
I don't think this case is covered in the test?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83235/new/

https://reviews.llvm.org/D83235

From llvm-commits at lists.llvm.org Mon Jul 6 13:36:18 2020
From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 20:36:18 +0000 (UTC)
Subject: [PATCH] D83257: [SCCP] Handle assume predictes
Message-ID:

nikic created this revision.
nikic added reviewers: fhahn, efriedma.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a project: LLVM.

Take assume predicates into account when visiting ssa.copy. The handling is the same as for branch predicates, with the difference that we're always on the true edge.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83257

Files:
  llvm/lib/Transforms/Scalar/SCCP.cpp
  llvm/test/Transforms/SCCP/assume.ll

Index: llvm/test/Transforms/SCCP/assume.ll
===================================================================
--- /dev/null
+++ llvm/test/Transforms/SCCP/assume.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -ipsccp -S | FileCheck %s
+
+declare void @use(i1)
+declare void @llvm.assume(i1)
+
+define void @basic(i32 %v) {
+; CHECK-LABEL: @basic(
+; CHECK-NEXT:    [[A1:%.*]] = icmp ult i32 [[V:%.*]], 10
+; CHECK-NEXT:    call void @llvm.assume(i1 [[A1]])
+; CHECK-NEXT:    [[A2:%.*]] = icmp ugt i32 [[V]], 5
+; CHECK-NEXT:    call void @llvm.assume(i1 [[A2]])
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    [[C2:%.*]] = icmp ult i32 [[V]], 9
+; CHECK-NEXT:    call void @use(i1 [[C2]])
+; CHECK-NEXT:    call void @use(i1 false)
+; CHECK-NEXT:    [[C4:%.*]] = icmp ugt i32 [[V]], 8
+; CHECK-NEXT:    call void @use(i1 [[C4]])
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    [[C6:%.*]] = icmp ugt i32 [[V]], 6
+; CHECK-NEXT:    call void @use(i1 [[C6]])
+; CHECK-NEXT:    call void @use(i1 false)
+; CHECK-NEXT:    [[C8:%.*]] = icmp ult i32 [[V]], 7
+; CHECK-NEXT:    call void @use(i1 [[C8]])
+; CHECK-NEXT:    ret void
+;
+  %a1 = icmp ult i32 %v, 10
+  call void @llvm.assume(i1 %a1)
+  %a2 = icmp ugt i32 %v, 5
+  call void @llvm.assume(i1 %a2)
+  %c1 = icmp ult i32 %v, 10
+  call void @use(i1 %c1)
+  %c2 = icmp ult i32 %v, 9
+  call void @use(i1 %c2)
+  %c3 = icmp ugt i32 %v, 9
+  call void @use(i1 %c3)
+  %c4 = icmp ugt i32 %v, 8
+  call void @use(i1 %c4)
+  %c5 = icmp ugt i32 %v, 5
+  call void @use(i1 %c5)
+  %c6 = icmp ugt i32 %v, 6
+  call void @use(i1 %c6)
+  %c7 = icmp ult i32 %v, 6
+  call void @use(i1 %c7)
+  %c8 = icmp ult i32 %v, 7
+  call void @use(i1 %c8)
+  ret void
+}
Index: llvm/lib/Transforms/Scalar/SCCP.cpp
===================================================================
--- llvm/lib/Transforms/Scalar/SCCP.cpp
+++ llvm/lib/Transforms/Scalar/SCCP.cpp
@@ -1258,16 +1258,24 @@
     return;

   Value *CopyOf = CB.getOperand(0);
-  auto *PI = getPredicateInfoFor(&CB);
-  auto *PBranch = dyn_cast_or_null<PredicateBranch>(PI);
   ValueLatticeElement OriginalVal = getValueState(CopyOf);
-  if (!PI || !PBranch) {
+  auto *PI = getPredicateInfoFor(&CB);
+  assert(PI && "Missing predicate info for ssa.copy");
+
+  CmpInst *Cmp;
+  bool TrueEdge;
+  if (auto *PBranch = dyn_cast<PredicateBranch>(PI)) {
+    Cmp = dyn_cast<CmpInst>(PBranch->Condition);
+    TrueEdge = PBranch->TrueEdge;
+  } else if (auto *PAssume = dyn_cast<PredicateAssume>(PI)) {
+    Cmp = dyn_cast<CmpInst>(PAssume->Condition);
+    TrueEdge = true;
+  } else {
     mergeInValue(ValueState[&CB], &CB, OriginalVal);
     return;
   }

   // Everything below relies on the condition being a comparison.
-  auto *Cmp = dyn_cast<CmpInst>(PBranch->Condition);
   if (!Cmp) {
     mergeInValue(ValueState[&CB], &CB, OriginalVal);
     return;
@@ -1292,7 +1300,7 @@
     return;
   }

-  if (!PBranch->TrueEdge)
+  if (!TrueEdge)
     Pred = CmpInst::getInversePredicate(Pred);

   ValueLatticeElement CondVal = getValueState(CmpOp1);

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83257.275818.patch
Type: text/x-patch
Size: 3254 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 13:37:28 2020
From: llvm-commits at lists.llvm.org (Andrii Nakryiko via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 20:37:28 +0000 (UTC)
Subject: [PATCH] D83242: [RFC][BPF] support expr with typedef type for FIELD_EXISTENCE reloc
In-Reply-To:
References:
Message-ID:

anakryiko added a comment.

Awesome, that's exactly what we need for BPF helper availability checks!

Can you please also add test that this pattern works:

  return __builtin_preserve_field_info((btf_bpf_read_branch_records)0, FIELD_EXISTENCE);

Also for non-func pointer typedefs, something like this would work as well, right?

  return __builtin_preserve_field_info(*(T *)0, FIELD_EXISTENCE);

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83242/new/

https://reviews.llvm.org/D83242

From llvm-commits at lists.llvm.org Mon Jul 6 13:39:42 2020
From: llvm-commits at lists.llvm.org (Kit Barton via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 20:39:42 +0000 (UTC)
Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure
In-Reply-To:
References:
Message-ID: <36badf43e721b813952265f1ac279a0d@localhost.localdomain>

kbarton updated this revision to Diff 275823.
kbarton added a comment.

- Put all new GlobalISel related files into new GISel subdirectory.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83100/new/

https://reviews.llvm.org/D83100

Files:
  llvm/lib/Target/PowerPC/CMakeLists.txt
  llvm/lib/Target/PowerPC/GISel/PPCCallLowering.cpp
  llvm/lib/Target/PowerPC/GISel/PPCCallLowering.h
  llvm/lib/Target/PowerPC/GISel/PPCInstructionSelector.cpp
  llvm/lib/Target/PowerPC/GISel/PPCLegalizerInfo.cpp
  llvm/lib/Target/PowerPC/GISel/PPCLegalizerInfo.h
  llvm/lib/Target/PowerPC/GISel/PPCRegisterBankInfo.cpp
  llvm/lib/Target/PowerPC/GISel/PPCRegisterBankInfo.h
  llvm/lib/Target/PowerPC/GISel/PPCRegisterBanks.td
  llvm/lib/Target/PowerPC/LLVMBuild.txt
  llvm/lib/Target/PowerPC/PPC.h
  llvm/lib/Target/PowerPC/PPC.td
  llvm/lib/Target/PowerPC/PPCSubtarget.cpp
  llvm/lib/Target/PowerPC/PPCSubtarget.h
  llvm/lib/Target/PowerPC/PPCTargetMachine.cpp
  llvm/test/CodeGen/PowerPC/GlobalISel/irtranslator-ret.ll
  llvm/test/CodeGen/PowerPC/GlobalISel/legalize-ret.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83100.275823.patch
Type: text/x-patch
Size: 22524 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 13:41:27 2020
From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 20:41:27 +0000 (UTC)
Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles
In-Reply-To:
References:
Message-ID: <80c347fe7d51f645a612eb8f9742dde6@localhost.localdomain>

Tyker added inline comments.

================
Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138
+
+  for_each(R.CallsToRefine, [](const auto &P) {
+    return maybeRewriteCallWithDifferentBundles(P.first, P.second);
----------------
lebedev.ri wrote:
> Tyker wrote:
> > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map.
> > and the key depends on a pointer value that will change when the Module gets cloned.
> No, the current logic is correct, that map doesn't live that long.
Here is the situation I think fails.

extractOperandBundesFromModule is called a first time and generates a reduction.
The reduction isn't considered interesting.
extractOperandBundesFromModule is called a second time with different chunks to keep.
But the association from an index to a Bundle is different from the first time because the module isn't the same, so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundle by the end of the pass.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83177/new/

https://reviews.llvm.org/D83177

From llvm-commits at lists.llvm.org Mon Jul 6 13:44:58 2020
From: llvm-commits at lists.llvm.org (Robinson, Paul via llvm-commits)
Date: Mon, 6 Jul 2020 20:44:58 +0000
Subject: [llvm] 0ae989a - Pass DebugLoc::appendInlinedAt DebugLoc arg by const reference not value.
In-Reply-To:
References: <5efcae30.1c69fb81.784cd.0a18@mx.google.com>
Message-ID:

> -----Original Message-----
> From: Simon Pilgrim
> Sent: Thursday, July 2, 2020 5:03 AM
> To: David Blaikie; Duncan Exon Smith
> Cc: Robinson, Paul; Adrian Prantl; Jonas Devlieghere; LLVM Commits
> Subject: Re: [llvm] 0ae989a - Pass DebugLoc::appendInlinedAt DebugLoc arg
> by const reference not value.
> > > On 01/07/2020 21:30, David Blaikie wrote: > > On Wed, Jul 1, 2020 at 11:12 AM Duncan Exon Smith > wrote: > >> > >> > >>> On 2020-Jul-01, at 10:01, David Blaikie wrote: > >>> > >>> On Wed, Jul 1, 2020 at 9:11 AM Robinson, Paul > wrote: > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: llvm-commits On Behalf > Of > >>>>> Simon Pilgrim via llvm-commits > >>>>> Sent: Wednesday, July 1, 2020 11:39 AM > >>>>> To: llvm-commits at lists.llvm.org > >>>>> Subject: [llvm] 0ae989a - Pass DebugLoc::appendInlinedAt DebugLoc > arg by > >>>>> const reference not value. > >>>>> > >>>>> > >>>>> Author: Simon Pilgrim > >>>>> Date: 2020-07-01T16:38:51+01:00 > >>>>> New Revision: 0ae989a1fede0e512e2bfd57b328aad6c1920329 > >>>>> > >>>>> URL: https://urldefense.com/v3/__https://github.com/llvm/llvm- > __;!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgMjlz124N > 6khP6tgRUuwnKQ$ > >>>>> project/commit/0ae989a1fede0e512e2bfd57b328aad6c1920329 > >>>>> DIFF: https://urldefense.com/v3/__https://github.com/llvm/llvm- > __;!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgMjlz124N > 6khP6tgRUuwnKQ$ > >>>>> project/commit/0ae989a1fede0e512e2bfd57b328aad6c1920329.diff > >>>>> > >>>>> LOG: Pass DebugLoc::appendInlinedAt DebugLoc arg by const reference > not > >>>>> value. > >>>>> > >>>>> Noticed by clang-tidy performance-unnecessary-value-param warning. > >>>> Is it really a performance thing? Somehow I had it in my head that > >>>> DebugLoc was deliberately lightweight to make it easy to pass by > value; > >>>> perhaps I'm thinking of something else, as I don't see a comment to > >>>> that effect in DebugLoc.h, but still it doesn't look that heavy. > >>> That's my understanding too... though, looking at it. > >> As David discovered, I changed that at some point. A simpler way to > make the argument lightweight is to pass an `MDLocation*` or > `MDLocation&`. > > Looks about the same amount as simple as passing the DebugLoc by const > > ref? 
I guess MDLocation isn't copyable, perhaps, so it's more obvious > > there's no choice but to handle it by pointer or ref? > > > >>> It seems this boils down to > >>> > https://urldefense.com/v3/__https://llvm.org/doxygen/Metadata_8cpp_source. > html*l00153__;Iw!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1 > VQeKgMjlz124N6khP6tj7znUhxw$ - so it's > >>> essentially like a std::shared_ptr, doing ref counting. > >> More like a TrackingVH. IIRC, there's no refcounting, it's just needed > for RAUW when parsing IR. > > Oh, looked at the code again - I see it's only ref counted for > > replaceable metadata (then there's an "addRef" call) & something else > > interesting that happens for "distinct" placeholder nodes too. > > > >>> Seems this has been this way for a while (since 2014: > >>> https://urldefense.com/v3/__https://github.com/llvm/llvm- > project/commit/5bf8fef58013e2c97180236fa6973faa40435d5f__;!!JmoZiZGBv3RvKR > Sx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgMjlz124N6khP6tgjciS1Uw$ > >>> ) > >>> > >>> Duncan - any thoughts on this? I haven't looked more closely at the > >>> patch yet to better understand the motivation for this particular part > >>> of the change you made back then - but perhaps you remember/it's > >>> obvious enough to you. > >> [ I'm not sure I've been able to fully page this back in, but here's > what I have... ] > >> > >> I think the main point of this part was to replace the table in the > LLVMContext (see the change in DebugLoc.cpp) that was hard to reason about > with the `MDLocation` class. IIRC, `DebugLoc` is used in places (like > `Instruction`) where we need a tracking handle (for parsing IR), so it > became heavy, but where you don't need a tracking handle you can use > `MDLocation&` (or `MDLocation*`) directly. > >> > >> IMO, the `DebugLoc` class should just be deleted in favour of using > `MDLocation` directly. I probably intended to circle back and do that. 
The > generator APIs can be moved to `MDLocation`, which they're mostly just > wrappers around anyway. The few places a tracking handle is needed should > use the equivalent `TrackingMDRef` and it's more obvious > they're expensive to copy. The other sites can just use an `MDLocation&` > or `MDLocation*`. > > Sounds plausible/reasonable - perhaps you/someone could add a quick > > FIXME to DebugLoc.h describing this deprecation/migration strategy? > > Urgh, I didn't expect this patch to snowball, I'm sorry for pushing > outstanding work up people's priority list... > > Currently most places that a DebugLoc is passed by value is in > constructors where its std::move()'d into place, as one could probably > expect for a wrapped ptr style type. > > There are a few other pass by value uses that seem to have snuck in > though - AArch64InstrInfo::copyGPRRegTuple for instance (even though an > equivalent use immediately below in AArch64InstrInfo::copyPhysReg uses > const ref...) - and basically anywhere else where its only used > immediately in BuildMI (etc.) calls. Don't fret, Simon. It's good for some of us to be internalizing this "DebugLoc isn't for pass-by-value anymore" idea. Like I said up front, it used to be cheap, and now we're getting the idea that Duncan's valuable work actually changed that. Totally worthwhile. You should definitely keep going with these tidy-up changes. --paulr > > > > >>> Certainly is something to keep in mind - I wouldn't copy a > >>> std::shared_ptr if I didn't need to, and I probably will make a point > >>> of not copying DebugLocs needlessly either if this is the > >>> correct/ongoing implementation. 
> >>> > >>> - Dave > >>> > >>>> --paulr > >>>> > >>>>> Added: > >>>>> > >>>>> > >>>>> Modified: > >>>>> llvm/include/llvm/IR/DebugLoc.h > >>>>> llvm/lib/IR/DebugLoc.cpp > >>>>> > >>>>> Removed: > >>>>> > >>>>> > >>>>> > >>>>> > ########################################################################## > >>>>> ###### > >>>>> diff --git a/llvm/include/llvm/IR/DebugLoc.h > >>>>> b/llvm/include/llvm/IR/DebugLoc.h > >>>>> index 780d17a33661..4914d733fe0d 100644 > >>>>> --- a/llvm/include/llvm/IR/DebugLoc.h > >>>>> +++ b/llvm/include/llvm/IR/DebugLoc.h > >>>>> @@ -85,7 +85,7 @@ namespace llvm { > >>>>> /// the chain now is inlined-at the new call site. > >>>>> /// \param InlinedAt The new outermost inlined-at in the > chain. > >>>>> /// \param ReplaceLast Replace the last location in the > inlined-at > >>>>> chain. > >>>>> - static DebugLoc appendInlinedAt(DebugLoc DL, DILocation > *InlinedAt, > >>>>> + static DebugLoc appendInlinedAt(const DebugLoc &DL, DILocation > >>>>> *InlinedAt, > >>>>> LLVMContext &Ctx, > >>>>> DenseMap *> > >>>>> &Cache, > >>>>> bool ReplaceLast = false); > >>>>> > >>>>> diff --git a/llvm/lib/IR/DebugLoc.cpp b/llvm/lib/IR/DebugLoc.cpp > >>>>> index 14d1396f1543..e945cbcba782 100644 > >>>>> --- a/llvm/lib/IR/DebugLoc.cpp > >>>>> +++ b/llvm/lib/IR/DebugLoc.cpp > >>>>> @@ -79,7 +79,7 @@ DebugLoc DebugLoc::get(unsigned Line, unsigned > Col, > >>>>> const MDNode *Scope, > >>>>> const_cast(InlinedAt), > ImplicitCode); > >>>>> } > >>>>> > >>>>> -DebugLoc DebugLoc::appendInlinedAt(DebugLoc DL, DILocation > *InlinedAt, > >>>>> +DebugLoc DebugLoc::appendInlinedAt(const DebugLoc &DL, DILocation > >>>>> *InlinedAt, > >>>>> LLVMContext &Ctx, > >>>>> DenseMap *> > >>>>> &Cache, > >>>>> bool ReplaceLast) { > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> llvm-commits mailing list > >>>>> llvm-commits at lists.llvm.org > >>>>> https://urldefense.com/v3/__https://lists.llvm.org/cgi- > bin/mailman/listinfo/llvm- > 
commits__;!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgM > jlz124N6khP6tiWGkvNjQ$ From llvm-commits at lists.llvm.org Mon Jul 6 13:45:56 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Mon, 6 Jul 2020 13:45:56 -0700 Subject: [llvm] 0ae989a - Pass DebugLoc::appendInlinedAt DebugLoc arg by const reference not value. In-Reply-To: References: <5efcae30.1c69fb81.784cd.0a18@mx.google.com> Message-ID: On Mon, Jul 6, 2020 at 1:45 PM Robinson, Paul wrote: > > > > > -----Original Message----- > > From: Simon Pilgrim > > Sent: Thursday, July 2, 2020 5:03 AM > > To: David Blaikie ; Duncan Exon Smith > > > > Cc: Robinson, Paul ; Adrian Prantl > > ; Jonas Devlieghere ; LLVM > > Commits > > Subject: Re: [llvm] 0ae989a - Pass DebugLoc::appendInlinedAt DebugLoc arg > > by const reference not value. > > > > > > On 01/07/2020 21:30, David Blaikie wrote: > > > On Wed, Jul 1, 2020 at 11:12 AM Duncan Exon Smith > > wrote: > > >> > > >> > > >>> On 2020-Jul-01, at 10:01, David Blaikie wrote: > > >>> > > >>> On Wed, Jul 1, 2020 at 9:11 AM Robinson, Paul > > wrote: > > >>>> > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: llvm-commits On Behalf > > Of > > >>>>> Simon Pilgrim via llvm-commits > > >>>>> Sent: Wednesday, July 1, 2020 11:39 AM > > >>>>> To: llvm-commits at lists.llvm.org > > >>>>> Subject: [llvm] 0ae989a - Pass DebugLoc::appendInlinedAt DebugLoc > > arg by > > >>>>> const reference not value. 
> > >>>>> > > >>>>> > > >>>>> Author: Simon Pilgrim > > >>>>> Date: 2020-07-01T16:38:51+01:00 > > >>>>> New Revision: 0ae989a1fede0e512e2bfd57b328aad6c1920329 > > >>>>> > > >>>>> URL: https://urldefense.com/v3/__https://github.com/llvm/llvm- > > __;!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgMjlz124N > > 6khP6tgRUuwnKQ$ > > >>>>> project/commit/0ae989a1fede0e512e2bfd57b328aad6c1920329 > > >>>>> DIFF: https://urldefense.com/v3/__https://github.com/llvm/llvm- > > __;!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgMjlz124N > > 6khP6tgRUuwnKQ$ > > >>>>> project/commit/0ae989a1fede0e512e2bfd57b328aad6c1920329.diff > > >>>>> > > >>>>> LOG: Pass DebugLoc::appendInlinedAt DebugLoc arg by const reference > > not > > >>>>> value. > > >>>>> > > >>>>> Noticed by clang-tidy performance-unnecessary-value-param warning. > > >>>> Is it really a performance thing? Somehow I had it in my head that > > >>>> DebugLoc was deliberately lightweight to make it easy to pass by > > value; > > >>>> perhaps I'm thinking of something else, as I don't see a comment to > > >>>> that effect in DebugLoc.h, but still it doesn't look that heavy. > > >>> That's my understanding too... though, looking at it. > > >> As David discovered, I changed that at some point. A simpler way to > > make the argument lightweight is to pass an `MDLocation*` or > > `MDLocation&`. > > > Looks about the same amount as simple as passing the DebugLoc by const > > > ref? I guess MDLocation isn't copyable, perhaps, so it's more obvious > > > there's no choice but to handle it by pointer or ref? > > > > > >>> It seems this boils down to > > >>> > > https://urldefense.com/v3/__https://llvm.org/doxygen/Metadata_8cpp_source. > > html*l00153__;Iw!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1 > > VQeKgMjlz124N6khP6tj7znUhxw$ - so it's > > >>> essentially like a std::shared_ptr, doing ref counting. > > >> More like a TrackingVH. 
IIRC, there's no refcounting, it's just needed > > for RAUW when parsing IR. > > > Oh, looked at the code again - I see it's only ref counted for > > > replaceable metadata (then there's an "addRef" call) & something else > > > interesting that happens for "distinct" placeholder nodes too. > > > > > >>> Seems this has been this way for a while (since 2014: > > >>> https://urldefense.com/v3/__https://github.com/llvm/llvm- > > project/commit/5bf8fef58013e2c97180236fa6973faa40435d5f__;!!JmoZiZGBv3RvKR > > Sx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgMjlz124N6khP6tgjciS1Uw$ > > >>> ) > > >>> > > >>> Duncan - any thoughts on this? I haven't looked more closely at the > > >>> patch yet to better understand the motivation for this particular part > > >>> of the change you made back then - but perhaps you remember/it's > > >>> obvious enough to you. > > >> [ I'm not sure I've been able to fully page this back in, but here's > > what I have... ] > > >> > > >> I think the main point of this part was to replace the table in the > > LLVMContext (see the change in DebugLoc.cpp) that was hard to reason about > > with the `MDLocation` class. IIRC, `DebugLoc` is used in places (like > > `Instruction`) where we need a tracking handle (for parsing IR), so it > > became heavy, but where you don't need a tracking handle you can use > > `MDLocation&` (or `MDLocation*`) directly. > > >> > > >> IMO, the `DebugLoc` class should just be deleted in favour of using > > `MDLocation` directly. I probably intended to circle back and do that. The > > generator APIs can be moved to `MDLocation`, which they're mostly just > > wrappers around anyway. The few places a tracking handle is needed should > > use the equivalent `TrackingMDRef` and it's more obvious > > they're expensive to copy. The other sites can just use an `MDLocation&` > > or `MDLocation*`. 
> > > Sounds plausible/reasonable - perhaps you/someone could add a quick > > > FIXME to DebugLoc.h describing this deprecation/migration strategy? > > > > Urgh, I didn't expect this patch to snowball, I'm sorry for pushing > > outstanding work up people's priority list... > > > > Currently most places that a DebugLoc is passed by value is in > > constructors where its std::move()'d into place, as one could probably > > expect for a wrapped ptr style type. > > > > There are a few other pass by value uses that seem to have snuck in > > though - AArch64InstrInfo::copyGPRRegTuple for instance (even though an > > equivalent use immediately below in AArch64InstrInfo::copyPhysReg uses > > const ref...) - and basically anywhere else where its only used > > immediately in BuildMI (etc.) calls. > > Don't fret, Simon. It's good for some of us to be internalizing this > "DebugLoc isn't for pass-by-value anymore" idea. Like I said up front, > it used to be cheap, and now we're getting the idea that Duncan's > valuable work actually changed that. Totally worthwhile. > You should definitely keep going with these tidy-up changes. yep, +1 to all that > --paulr > > > > > > > > >>> Certainly is something to keep in mind - I wouldn't copy a > > >>> std::shared_ptr if I didn't need to, and I probably will make a point > > >>> of not copying DebugLocs needlessly either if this is the > > >>> correct/ongoing implementation. 
> > >>> > > >>> - Dave > > >>> > > >>>> --paulr > > >>>> > > >>>>> Added: > > >>>>> > > >>>>> > > >>>>> Modified: > > >>>>> llvm/include/llvm/IR/DebugLoc.h > > >>>>> llvm/lib/IR/DebugLoc.cpp > > >>>>> > > >>>>> Removed: > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > ########################################################################## > > >>>>> ###### > > >>>>> diff --git a/llvm/include/llvm/IR/DebugLoc.h > > >>>>> b/llvm/include/llvm/IR/DebugLoc.h > > >>>>> index 780d17a33661..4914d733fe0d 100644 > > >>>>> --- a/llvm/include/llvm/IR/DebugLoc.h > > >>>>> +++ b/llvm/include/llvm/IR/DebugLoc.h > > >>>>> @@ -85,7 +85,7 @@ namespace llvm { > > >>>>> /// the chain now is inlined-at the new call site. > > >>>>> /// \param InlinedAt The new outermost inlined-at in the > > chain. > > >>>>> /// \param ReplaceLast Replace the last location in the > > inlined-at > > >>>>> chain. > > >>>>> - static DebugLoc appendInlinedAt(DebugLoc DL, DILocation > > *InlinedAt, > > >>>>> + static DebugLoc appendInlinedAt(const DebugLoc &DL, DILocation > > >>>>> *InlinedAt, > > >>>>> LLVMContext &Ctx, > > >>>>> DenseMap > *> > > >>>>> &Cache, > > >>>>> bool ReplaceLast = false); > > >>>>> > > >>>>> diff --git a/llvm/lib/IR/DebugLoc.cpp b/llvm/lib/IR/DebugLoc.cpp > > >>>>> index 14d1396f1543..e945cbcba782 100644 > > >>>>> --- a/llvm/lib/IR/DebugLoc.cpp > > >>>>> +++ b/llvm/lib/IR/DebugLoc.cpp > > >>>>> @@ -79,7 +79,7 @@ DebugLoc DebugLoc::get(unsigned Line, unsigned > > Col, > > >>>>> const MDNode *Scope, > > >>>>> const_cast(InlinedAt), > > ImplicitCode); > > >>>>> } > > >>>>> > > >>>>> -DebugLoc DebugLoc::appendInlinedAt(DebugLoc DL, DILocation > > *InlinedAt, > > >>>>> +DebugLoc DebugLoc::appendInlinedAt(const DebugLoc &DL, DILocation > > >>>>> *InlinedAt, > > >>>>> LLVMContext &Ctx, > > >>>>> DenseMap > *> > > >>>>> &Cache, > > >>>>> bool ReplaceLast) { > > >>>>> > > >>>>> > > >>>>> > > >>>>> _______________________________________________ > > >>>>> llvm-commits mailing list > > >>>>> 
llvm-commits at lists.llvm.org > > >>>>> https://urldefense.com/v3/__https://lists.llvm.org/cgi- > > bin/mailman/listinfo/llvm- > > commits__;!!JmoZiZGBv3RvKRSx!vlKxX5OIOFP4SG6loHuSEPZzRS2Bl0QDjizVpE1VQeKgM > > jlz124N6khP6tiWGkvNjQ$ From llvm-commits at lists.llvm.org Mon Jul 6 13:47:28 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:47:28 +0000 (UTC) Subject: [PATCH] D83257: [SCCP] Handle assume predictes In-Reply-To: References: Message-ID: nikic marked an inline comment as done. nikic added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1337 IV, &CB, ValueLatticeElement::getRange(NewCR, /*MayIncludeUndef=*/true)); return; ---------------- We could set MayIncludeUndef=false for assumes (as undef/poison has always been UB there), but I didn't think it worthwhile to make the distinction, as we plan to flip this for branches in the future anyway. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83257/new/ https://reviews.llvm.org/D83257 From llvm-commits at lists.llvm.org Mon Jul 6 13:51:23 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:51:23 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: lei accepted this revision as: lei. lei added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 From llvm-commits at lists.llvm.org Mon Jul 6 13:51:41 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:51:41 +0000 (UTC) Subject: [PATCH] D83088: Introduce CfgTraits abstraction In-Reply-To: References: Message-ID: <677855f6a5f690f56731b38971d4d80c@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/MachineCfgTraits.h:136-138 + // Prefer to avoid support for bundled instructions as long as we + // don't really need it. + assert(!m_instr->isBundle()); ---------------- nhaehnle wrote: > arsenm wrote: > > I've been thinking about more aggressively using bundles around call sites to handle waterfall looping around divergent calls with SGPR arguments > Hmm, so what's the correct iteration behavior in the presence of bundles? Iterate over all instructions in the bundle (which is that MachineBasicBlock::instr_iterator does) and only iterate over explicit defs? I think that's what makes the most sense, and what I'm going with for now... I don't think this actually needs to specially consider bundles. The BUNDLE itself is supposed to have the uses/defs that cover all the uses/defs inside the bundle. You shouldn't need to worry about the individual instructions Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83088/new/ https://reviews.llvm.org/D83088 From llvm-commits at lists.llvm.org Mon Jul 6 13:54:00 2020 From: llvm-commits at lists.llvm.org (Anil Mahmud via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:54:00 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: <021961ff680ca3dea90fc69cf47e8a90@localhost.localdomain> anil9 added inline comments. 
================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9827 + + // Check that the shuffle mask matches the semantics the XXSPLTI32DX. + // XXSPLTI32DX can insert 4 byte chunks from the constant splat C into: ---------------- semantics the -> semantics of the ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.h:105 + /// XXSPLTI32DX - The PPC XXSPLTI32DX instruction. + XXSPLTI32DX, ---------------- nit : the other ones seem to have a extra line with /// ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.h:1278 + /// handled by the XXSPLTI32DX instruction introduced in ISA 3.1. + SDValue lowerToXXSPLTI32DX(ShuffleVectorSDNode *N, SelectionDAG &DAG) const; + ---------------- nit : /// otherwise return the default SDValue. ??? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 From llvm-commits at lists.llvm.org Mon Jul 6 13:56:22 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:56:22 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: lebedev.ri marked 2 inline comments as done. lebedev.ri added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- Tyker wrote: > lebedev.ri wrote: > > Tyker wrote: > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > and the key depends on a pointer value that will change when the Module gets cloned. > > No, the current logic is correct, that map doesn't live that long. > here is the situation i think fails. > > extractOperandBundesFromModule is called a first time and generate a reduction. 
> the reduction isn't considered interesting. > extractOperandBundesFromModule is called a second time with different chunks to keep. > but the association from an index to a Bundle is different from the first time because the module isn't the same > the association from a index to a Bundle isn't the same, > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) we have in this module `M0`. It then comes with different chunks to keep and tells us to mutate the module `Mc` (which is a perfect clone of `M0`). It is up to us to actually enumerate the features (here: bundles). As long as the mapping is stable, i.e. we get the same result when calling `extractOperandBundesFromModule()` on the same input, we're good. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 14:00:22 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:00:22 +0000 (UTC) Subject: [PATCH] D82580: [RegisterCoalescer] Dumper for JoinVals In-Reply-To: References: Message-ID: arsenm requested changes to this revision. arsenm added inline comments. This revision now requires changes to proceed. ================ Comment at: llvm/lib/CodeGen/RegisterCoalescer.cpp:2343 + } + if (OtherVNI) O << " Other:", OtherVNI->print(O); + if (RedefVNI) O << " Redef:", RedefVNI->print(O); ---------------- Comma operator here is definitely weird. Braces and separate lines? ================ Comment at: llvm/lib/CodeGen/RegisterCoalescer.cpp:2344-2347 + if (RedefVNI) O << " Redef:", RedefVNI->print(O); + if (ErasableImplicitDef) O << " ImpDef"; + if (Pruned || PrunedComputed) O << ' ' << (Pruned ? 
"Pruned" : "NonPruned"); + if (PrunedComputed) O << 'C'; ---------------- Separate lines Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82580/new/ https://reviews.llvm.org/D82580 From llvm-commits at lists.llvm.org Mon Jul 6 14:01:02 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Mon, 06 Jul 2020 14:01:02 -0700 (PDT) Subject: [llvm] f25d020 - AMDGPU/GlobalISel: Add types to special inputs Message-ID: <5f03910e.1c69fb81.89a6e.72a3@mx.google.com> Author: Matt Arsenault Date: 2020-07-06T17:00:55-04:00 New Revision: f25d020c2ec7cb1971fa56b99381d416799d8145 URL: https://github.com/llvm/llvm-project/commit/f25d020c2ec7cb1971fa56b99381d416799d8145 DIFF: https://github.com/llvm/llvm-project/commit/f25d020c2ec7cb1971fa56b99381d416799d8145.diff LOG: AMDGPU/GlobalISel: Add types to special inputs When passing special ABI inputs, we have no existing context for the type to use. Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp index f41e774b34b4..d078fc147a36 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp @@ -83,59 +83,63 @@ void AMDGPUArgumentUsageInfo::print(raw_ostream &OS, const Module *M) const { } } -std::pair +std::tuple AMDGPUFunctionArgInfo::getPreloadedValue( - AMDGPUFunctionArgInfo::PreloadedValue Value) const { + AMDGPUFunctionArgInfo::PreloadedValue Value) const { switch (Value) { case AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_BUFFER: { - return std::make_pair( - PrivateSegmentBuffer ? 
&PrivateSegmentBuffer : nullptr, - &AMDGPU::SGPR_128RegClass); + return std::make_tuple(PrivateSegmentBuffer ? &PrivateSegmentBuffer + : nullptr, + &AMDGPU::SGPR_128RegClass, LLT::vector(4, 32)); } case AMDGPUFunctionArgInfo::IMPLICIT_BUFFER_PTR: - return std::make_pair(ImplicitBufferPtr ? &ImplicitBufferPtr : nullptr, - &AMDGPU::SGPR_64RegClass); + return std::make_tuple(ImplicitBufferPtr ? &ImplicitBufferPtr : nullptr, + &AMDGPU::SGPR_64RegClass, + LLT::pointer(AMDGPUAS::CONSTANT_ADDRESS, 64)); case AMDGPUFunctionArgInfo::WORKGROUP_ID_X: - return std::make_pair(WorkGroupIDX ? &WorkGroupIDX : nullptr, - &AMDGPU::SGPR_32RegClass); - + return std::make_tuple(WorkGroupIDX ? &WorkGroupIDX : nullptr, + &AMDGPU::SGPR_32RegClass, LLT::scalar(32)); case AMDGPUFunctionArgInfo::WORKGROUP_ID_Y: - return std::make_pair(WorkGroupIDY ? &WorkGroupIDY : nullptr, - &AMDGPU::SGPR_32RegClass); + return std::make_tuple(WorkGroupIDY ? &WorkGroupIDY : nullptr, + &AMDGPU::SGPR_32RegClass, LLT::scalar(32)); case AMDGPUFunctionArgInfo::WORKGROUP_ID_Z: - return std::make_pair(WorkGroupIDZ ? &WorkGroupIDZ : nullptr, - &AMDGPU::SGPR_32RegClass); + return std::make_tuple(WorkGroupIDZ ? &WorkGroupIDZ : nullptr, + &AMDGPU::SGPR_32RegClass, LLT::scalar(32)); case AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET: - return std::make_pair( - PrivateSegmentWaveByteOffset ? &PrivateSegmentWaveByteOffset : nullptr, - &AMDGPU::SGPR_32RegClass); + return std::make_tuple( + PrivateSegmentWaveByteOffset ? &PrivateSegmentWaveByteOffset : nullptr, + &AMDGPU::SGPR_32RegClass, LLT::scalar(32)); case AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR: - return std::make_pair(KernargSegmentPtr ? &KernargSegmentPtr : nullptr, - &AMDGPU::SGPR_64RegClass); + return std::make_tuple(KernargSegmentPtr ? &KernargSegmentPtr : nullptr, + &AMDGPU::SGPR_64RegClass, + LLT::pointer(AMDGPUAS::CONSTANT_ADDRESS, 64)); case AMDGPUFunctionArgInfo::IMPLICIT_ARG_PTR: - return std::make_pair(ImplicitArgPtr ? 
&ImplicitArgPtr : nullptr, - &AMDGPU::SGPR_64RegClass); + return std::make_tuple(ImplicitArgPtr ? &ImplicitArgPtr : nullptr, + &AMDGPU::SGPR_64RegClass, + LLT::pointer(AMDGPUAS::CONSTANT_ADDRESS, 64)); case AMDGPUFunctionArgInfo::DISPATCH_ID: - return std::make_pair(DispatchID ? &DispatchID : nullptr, - &AMDGPU::SGPR_64RegClass); + return std::make_tuple(DispatchID ? &DispatchID : nullptr, + &AMDGPU::SGPR_64RegClass, LLT::scalar(64)); case AMDGPUFunctionArgInfo::FLAT_SCRATCH_INIT: - return std::make_pair(FlatScratchInit ? &FlatScratchInit : nullptr, - &AMDGPU::SGPR_64RegClass); + return std::make_tuple(FlatScratchInit ? &FlatScratchInit : nullptr, + &AMDGPU::SGPR_64RegClass, LLT::scalar(64)); case AMDGPUFunctionArgInfo::DISPATCH_PTR: - return std::make_pair(DispatchPtr ? &DispatchPtr : nullptr, - &AMDGPU::SGPR_64RegClass); + return std::make_tuple(DispatchPtr ? &DispatchPtr : nullptr, + &AMDGPU::SGPR_64RegClass, + LLT::pointer(AMDGPUAS::CONSTANT_ADDRESS, 64)); case AMDGPUFunctionArgInfo::QUEUE_PTR: - return std::make_pair(QueuePtr ? &QueuePtr : nullptr, - &AMDGPU::SGPR_64RegClass); + return std::make_tuple(QueuePtr ? &QueuePtr : nullptr, + &AMDGPU::SGPR_64RegClass, + LLT::pointer(AMDGPUAS::CONSTANT_ADDRESS, 64)); case AMDGPUFunctionArgInfo::WORKITEM_ID_X: - return std::make_pair(WorkItemIDX ? &WorkItemIDX : nullptr, - &AMDGPU::VGPR_32RegClass); + return std::make_tuple(WorkItemIDX ? &WorkItemIDX : nullptr, + &AMDGPU::VGPR_32RegClass, LLT::scalar(32)); case AMDGPUFunctionArgInfo::WORKITEM_ID_Y: - return std::make_pair(WorkItemIDY ? &WorkItemIDY : nullptr, - &AMDGPU::VGPR_32RegClass); + return std::make_tuple(WorkItemIDY ? &WorkItemIDY : nullptr, + &AMDGPU::VGPR_32RegClass, LLT::scalar(32)); case AMDGPUFunctionArgInfo::WORKITEM_ID_Z: - return std::make_pair(WorkItemIDZ ? &WorkItemIDZ : nullptr, - &AMDGPU::VGPR_32RegClass); + return std::make_tuple(WorkItemIDZ ? 
&WorkItemIDZ : nullptr, + &AMDGPU::VGPR_32RegClass, LLT::scalar(32)); } llvm_unreachable("unexpected preloaded value type"); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h index b4ef0c5533df..576e6cfe929e 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h @@ -12,6 +12,7 @@ #include "llvm/ADT/DenseMap.h" #include "llvm/CodeGen/Register.h" #include "llvm/Pass.h" +#include "llvm/Support/LowLevelTypeImpl.h" namespace llvm { @@ -148,7 +149,7 @@ struct AMDGPUFunctionArgInfo { ArgDescriptor WorkItemIDY; ArgDescriptor WorkItemIDZ; - std::pair + std::tuple getPreloadedValue(PreloadedValue Value) const; static constexpr AMDGPUFunctionArgInfo fixedABILayout(); diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index a00bd8822da9..bcad30a117e6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2442,7 +2442,8 @@ const ArgDescriptor *AMDGPULegalizerInfo::getArgDescriptor( const SIMachineFunctionInfo *MFI = B.getMF().getInfo(); const ArgDescriptor *Arg; const TargetRegisterClass *RC; - std::tie(Arg, RC) = MFI->getPreloadedValue(ArgType); + LLT ArgTy; + std::tie(Arg, RC, ArgTy) = MFI->getPreloadedValue(ArgType); if (!Arg) { LLVM_DEBUG(dbgs() << "Required arg register missing\n"); return nullptr; @@ -3178,8 +3179,9 @@ bool AMDGPULegalizerInfo::legalizeImplicitArgPtr(MachineInstr &MI, const ArgDescriptor *Arg; const TargetRegisterClass *RC; - std::tie(Arg, RC) - = MFI->getPreloadedValue(AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR); + LLT ArgTy; + std::tie(Arg, RC, ArgTy) = + MFI->getPreloadedValue(AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR); if (!Arg) return false; diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 3ee48c1ffdff..a3135a787639 100644 --- 
a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1527,9 +1527,10 @@ SDValue SITargetLowering::lowerKernArgParameterPtr(SelectionDAG &DAG, const ArgDescriptor *InputPtrReg; const TargetRegisterClass *RC; + LLT ArgTy; - std::tie(InputPtrReg, RC) - = Info->getPreloadedValue(AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR); + std::tie(InputPtrReg, RC, ArgTy) = + Info->getPreloadedValue(AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR); MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo(); MVT PtrVT = getPointerTy(DL, AMDGPUAS::CONSTANT_ADDRESS); @@ -1675,8 +1676,9 @@ SDValue SITargetLowering::getPreloadedValue(SelectionDAG &DAG, AMDGPUFunctionArgInfo::PreloadedValue PVID) const { const ArgDescriptor *Reg; const TargetRegisterClass *RC; + LLT Ty; - std::tie(Reg, RC) = MFI.getPreloadedValue(PVID); + std::tie(Reg, RC, Ty) = MFI.getPreloadedValue(PVID); return CreateLiveInRegister(DAG, RC, Reg->getRegister(), VT); } @@ -2580,15 +2582,18 @@ void SITargetLowering::passSpecialInputs( for (auto InputID : InputRegs) { const ArgDescriptor *OutgoingArg; const TargetRegisterClass *ArgRC; + LLT ArgTy; - std::tie(OutgoingArg, ArgRC) = CalleeArgInfo->getPreloadedValue(InputID); + std::tie(OutgoingArg, ArgRC, ArgTy) = + CalleeArgInfo->getPreloadedValue(InputID); if (!OutgoingArg) continue; const ArgDescriptor *IncomingArg; const TargetRegisterClass *IncomingArgRC; - std::tie(IncomingArg, IncomingArgRC) - = CallerArgInfo.getPreloadedValue(InputID); + LLT Ty; + std::tie(IncomingArg, IncomingArgRC, Ty) = + CallerArgInfo.getPreloadedValue(InputID); assert(IncomingArgRC == ArgRC); // All special arguments are ints for now. @@ -2621,24 +2626,25 @@ void SITargetLowering::passSpecialInputs( // packed. 
const ArgDescriptor *OutgoingArg; const TargetRegisterClass *ArgRC; + LLT Ty; - std::tie(OutgoingArg, ArgRC) = - CalleeArgInfo->getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_X); + std::tie(OutgoingArg, ArgRC, Ty) = + CalleeArgInfo->getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_X); if (!OutgoingArg) - std::tie(OutgoingArg, ArgRC) = - CalleeArgInfo->getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Y); + std::tie(OutgoingArg, ArgRC, Ty) = + CalleeArgInfo->getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Y); if (!OutgoingArg) - std::tie(OutgoingArg, ArgRC) = - CalleeArgInfo->getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Z); + std::tie(OutgoingArg, ArgRC, Ty) = + CalleeArgInfo->getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Z); if (!OutgoingArg) return; - const ArgDescriptor *IncomingArgX - = CallerArgInfo.getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_X).first; - const ArgDescriptor *IncomingArgY - = CallerArgInfo.getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Y).first; - const ArgDescriptor *IncomingArgZ - = CallerArgInfo.getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Z).first; + const ArgDescriptor *IncomingArgX = std::get<0>( + CallerArgInfo.getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_X)); + const ArgDescriptor *IncomingArgY = std::get<0>( + CallerArgInfo.getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Y)); + const ArgDescriptor *IncomingArgZ = std::get<0>( + CallerArgInfo.getPreloadedValue(AMDGPUFunctionArgInfo::WORKITEM_ID_Z)); SDValue InputReg; SDLoc SL; diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h index 7221e0157522..cf1629fda0af 100644 --- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h +++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h @@ -679,13 +679,13 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction { return ArgInfo; } - std::pair + std::tuple getPreloadedValue(AMDGPUFunctionArgInfo::PreloadedValue Value) 
const { return ArgInfo.getPreloadedValue(Value); } Register getPreloadedReg(AMDGPUFunctionArgInfo::PreloadedValue Value) const { - auto Arg = ArgInfo.getPreloadedValue(Value).first; + auto Arg = std::get<0>(ArgInfo.getPreloadedValue(Value)); return Arg ? Arg->getRegister() : Register(); } From llvm-commits at lists.llvm.org Mon Jul 6 14:01:21 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:01:21 +0000 (UTC) Subject: [PATCH] D83238: AMDGPU/GlobalISel: Add types to special inputs In-Reply-To: References: Message-ID: <468e6dfc276f21df6ce6e6e1b0a9a379@localhost.localdomain> arsenm closed this revision. arsenm added a comment. f25d020c2ec7cb1971fa56b99381d416799d8145 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83238/new/ https://reviews.llvm.org/D83238 From llvm-commits at lists.llvm.org Mon Jul 6 14:03:14 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:03:14 +0000 (UTC) Subject: [PATCH] D82828: [ELF] Don't resolve a relocation in .debug_line referencing an ICF folded symbol to the tombstone value In-Reply-To: References: Message-ID: thakis added a comment. Hello, this seems to increase runtime of some dwarf processing tools for us by several orders of magnitude (from "terminate in a few minutes" to "don't know how long they terminate; not before our timeouts"). https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5 has repro steps. Can we revert this and analyze async? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82828/new/ https://reviews.llvm.org/D82828 From llvm-commits at lists.llvm.org Mon Jul 6 14:05:22 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:05:22 +0000 (UTC) Subject: [PATCH] D83258: AMDGPU/GlobalISel: Initial Implementation of calls Message-ID: arsenm created this revision. 
arsenm added reviewers: nhaehnle, kerbowa, foad, Petar.Avramovic, mbrkusanin.
Herald added subscribers: hiraditya, t-tye, tpr, dstuttard, rovka, yaxunl, wdng, jvesely, kzhuravl.
Herald added a project: LLVM.

Return values and tail calls are not yet handled.

https://reviews.llvm.org/D83258

Files:
  llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUCallLowering.h
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83258.275827.patch
Type: text/x-patch
Size: 381430 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 14:06:55 2020
From: llvm-commits at lists.llvm.org (Paul Robinson via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:06:55 +0000 (UTC)
Subject: [PATCH] D82129: [DebugInfo] Drop location ranges for variables which exist entirely outside the variable's scope
In-Reply-To:
References:
Message-ID: <6fb7468c5bd1a48cfcde689894a4d4cf@localhost.localdomain>

probinson added a comment.

I think I didn't fully grasp that the blocks were being (tail-)merged, which makes the scope ambiguous, and all the rest. So I withdraw the objection on that basis.

DWARF is fine with multiple variables pointing to the same location, but it's less forgiving about scopes IIRC, much like it can't describe multiple source attributions for an instruction. This all makes me sad, but that's how DWARF is at the moment.

Is there still an open question about whether this wants to be a cleanup pass or a verifier check? I apologize for losing track.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82129/new/

https://reviews.llvm.org/D82129

From llvm-commits at lists.llvm.org Mon Jul 6 14:07:34 2020
From: llvm-commits at lists.llvm.org (Evandro Menezes via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:07:34 +0000 (UTC)
Subject: [PATCH] D82713: Improve stack object printing.
In-Reply-To:
References:
Message-ID:

evandro added inline comments.

================
Comment at: llvm/lib/CodeGen/MachineFrameInfo.cpp:240
     if (i < NumFixedObjects)
-      OS << ", fixed";
+      OS << ", fixed:";
     if (i < NumFixedObjects || SO.SPOffset != -1) {
----------------
madhur13490 wrote:
> evandro wrote:
> > s/fixed:/fixed/
> As I said earlier, there is inconsistency; at some place "=" and other ":". It depends on personal choice but we need to agree on one.
>
Since the word `fixed` is followed by a comma, it should be followed by neither `:` nor `=`, as originally.

================
Comment at: llvm/test/CodeGen/AArch64/tailcall_misched_graph.ll:26
+; COMMON: Frame Objects:
+; COMMON: fi#-4: {{.*}} fixed:, at location [SP+8], Split Slot: No
+; COMMON: fi#-3: {{.*}} fixed:, at location [SP], Split Slot: No
----------------
E.g., `fixed` followed by `:` and `,` makes no sense.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82713/new/ https://reviews.llvm.org/D82713 From llvm-commits at lists.llvm.org Mon Jul 6 14:07:57 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via llvm-commits) Date: Mon, 06 Jul 2020 14:07:57 -0700 (PDT) Subject: [llvm] f7a7efb - [AMDGPU] Tweak getTypeLegalizationCost() Message-ID: <5f0392ad.1c69fb81.1ed69.2cf2@mx.google.com> Author: Stanislav Mekhanoshin Date: 2020-07-06T14:07:48-07:00 New Revision: f7a7efbf88b72b4aa6bd95a1ded6dacd2237f2f8 URL: https://github.com/llvm/llvm-project/commit/f7a7efbf88b72b4aa6bd95a1ded6dacd2237f2f8 DIFF: https://github.com/llvm/llvm-project/commit/f7a7efbf88b72b4aa6bd95a1ded6dacd2237f2f8.diff LOG: [AMDGPU] Tweak getTypeLegalizationCost() Even though wide vectors are legal they still cost more as we will have to eventually split them. Not all operations can be uniformly done on vector types. Conservatively add the cost of splitting at least to 8 dwords, which is our widest possible load. We are more or less lying to cost mode with this change but this can prevent vectorizer from creation of wide vectors which results in RA problems for us. 
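The rounding described in the log message above can be illustrated with a small standalone sketch. This is not code from the commit: `splitFactor` and `checkSplitFactorExamples` are hypothetical names, and the helper returns 1 for types of up to 256 bits, whereas the patch leaves the base-class cost untouched in that case. It only shows how many 8-dword (256-bit) pieces a wide type is assumed to need.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API: number of 256-bit (8-dword) pieces
// needed to cover a type of SizeInBits bits, using the same rounding as the
// `(Size + 255) / 256` expression in the patch. Sizes up to 256 bits get a
// factor of 1 here purely for illustration.
constexpr uint64_t splitFactor(uint64_t SizeInBits) {
  return SizeInBits <= 256 ? 1 : (SizeInBits + 255) / 256;
}

// A few worked values, checked at run time.
void checkSplitFactorExamples() {
  assert(splitFactor(256) == 1);  // <8 x i32>: fits in 8 dwords
  assert(splitFactor(512) == 2);  // <8 x i64>: two 8-dword pieces
  assert(splitFactor(1024) == 4); // <16 x i64>: four 8-dword pieces
}
```

The updated cost-model tests in this commit change the estimate for `add <16 x i64>` from 32 to 128 and for `mul <8 x i64>` from 128 to 256, which appears consistent with factors of 4 and 2 respectively.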
Differential Revision: https://reviews.llvm.org/D83078 Added: Modified: llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIISelLowering.h llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll llvm/test/Analysis/CostModel/AMDGPU/mul.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a3135a787639..d90272848500 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11690,3 +11690,18 @@ bool SITargetLowering::requiresUniformRegister(MachineFunction &MF, SmallPtrSet Visited; return hasCFUser(V, Visited, Subtarget->getWavefrontSize()); } + +std::pair +SITargetLowering::getTypeLegalizationCost(const DataLayout &DL, + Type *Ty) const { + auto Cost = TargetLoweringBase::getTypeLegalizationCost(DL, Ty); + auto Size = DL.getTypeSizeInBits(Ty); + // Maximum load or store can handle 8 dwords for scalar and 4 for + // vector ALU. Let's assume anything above 8 dwords is expensive + // even if legal. 
+ if (Size <= 256) + return Cost; + + Cost.first = (Size + 255) / 256; + return Cost; +} diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h index ffe9140d3d07..f4c076464057 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.h +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h @@ -464,6 +464,9 @@ class SITargetLowering final : public AMDGPUTargetLowering { MachineFunction &MF, const SIRegisterInfo &TRI, SIMachineFunctionInfo &Info) const; + + std::pair getTypeLegalizationCost(const DataLayout &DL, + Type *Ty) const; }; } // End namespace llvm diff --git a/llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll b/llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll index 9a2c01058b28..609769fd5148 100644 --- a/llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll +++ b/llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll @@ -90,7 +90,7 @@ define amdgpu_kernel void @add_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> add } ; ALL: 'add_v16i64' -; ALL: estimated cost of 32 for {{.*}} add <16 x i64> +; ALL: estimated cost of 128 for {{.*}} add <16 x i64> define amdgpu_kernel void @add_v16i64(<16 x i64> addrspace(1)* %out, <16 x i64> addrspace(1)* %vaddr, <16 x i64> %b) #0 { %vec = load <16 x i64>, <16 x i64> addrspace(1)* %vaddr %add = add <16 x i64> %vec, %b diff --git a/llvm/test/Analysis/CostModel/AMDGPU/mul.ll b/llvm/test/Analysis/CostModel/AMDGPU/mul.ll index 4d8a66ecd429..fa36d391f9c3 100644 --- a/llvm/test/Analysis/CostModel/AMDGPU/mul.ll +++ b/llvm/test/Analysis/CostModel/AMDGPU/mul.ll @@ -90,7 +90,7 @@ define amdgpu_kernel void @mul_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> add ; ALL: 'mul_v8i64' -; ALL: estimated cost of 128 for {{.*}} mul <8 x i64> +; ALL: estimated cost of 256 for {{.*}} mul <8 x i64> define amdgpu_kernel void @mul_v8i64(<8 x i64> addrspace(1)* %out, <8 x i64> addrspace(1)* %vaddr, <8 x i64> %b) #0 { %vec = load <8 x i64>, <8 x i64> addrspace(1)* %vaddr %mul = mul <8 x i64> %vec, %b From llvm-commits at lists.llvm.org Mon Jul 6 
14:09:17 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:09:17 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: Tyker added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- lebedev.ri wrote: > Tyker wrote: > > lebedev.ri wrote: > > > Tyker wrote: > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > No, the current logic is correct, that map doesn't live that long. > > here is the situation i think fails. > > > > extractOperandBundesFromModule is called a first time and generate a reduction. > > the reduction isn't considered interesting. > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > the association from a index to a Bundle isn't the same, > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > we have in this module `M0`. It then comes with different chunks to keep and tells us > to mutate the module `Mc` (which is a perfect clone of `M0`). > > It is up to us to actually enumerate the features (here: bundles). > As long as the mapping is stable, i.e. we get the same result > when calling `extractOperandBundesFromModule()` on the same input, > we're good. 
> when calling extractOperandBundesFromModule() on the same input, we're good.

I agree, but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify, so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83177/new/

https://reviews.llvm.org/D83177

From llvm-commits at lists.llvm.org Mon Jul 6 14:09:24 2020
From: llvm-commits at lists.llvm.org (Thorsten via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:09:24 +0000 (UTC)
Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure
In-Reply-To:
References:
Message-ID:

tschuett added a comment.

Pardon my cmake, but don't you need a CMakeLists.txt in the GISel sub-directory?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83100/new/

https://reviews.llvm.org/D83100

From llvm-commits at lists.llvm.org Mon Jul 6 14:09:39 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:09:39 +0000 (UTC)
Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata
In-Reply-To:
References:
Message-ID: <30b61448ce0592105034a5c75f35668e@localhost.localdomain>

arsenm added a comment.

Patch committed to remove the use from rocclr

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82818/new/

https://reviews.llvm.org/D82818

From llvm-commits at lists.llvm.org Mon Jul 6 14:10:21 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:10:21 +0000 (UTC)
Subject: [PATCH] D62911: WIP: AMDGPU: Use fixup for local linkage functions
In-Reply-To:
References:
Message-ID:

arsenm marked an inline comment as done.
arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:156
+    if (MO.getTargetFlags() == SIInstrInfo::MO_PCREL32_HI)
+      Expr = AMDGPUMCExpr::create(AMDGPUMCExpr::VK_AMDGPU_PCREL_HI32, Expr, Ctx);
+
----------------
hliao wrote:
> Why not use `MCBinaryExpr::createAShr` to shift the high bits into the low bits directly? We don't need to invent a new target fixup.
That might work. Do we need an explicit truncate from 64-bit to 32-bit operator though?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62911/new/

https://reviews.llvm.org/D62911

From llvm-commits at lists.llvm.org Mon Jul 6 14:14:43 2020
From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:14:43 +0000 (UTC)
Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td
In-Reply-To:
References:
Message-ID: <09107e65470dd5e0439e3c6085a68966@localhost.localdomain>

jdenny marked 2 inline comments as done.
jdenny added inline comments.

================
Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:65
+// Hold information about clause validity by version
+class VersionedClause {
+  // Actual clause
----------------
clementval wrote:
> jdenny wrote:
> > Why not unsigned as in the original?
> TableGen does not have an `unsigned` type so I went with the closest one.
Ah, I thought I had seen it used elsewhere, but it looks like I'm mistaken.

================
Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:69
+
+  // Mininum version number where this clause is valid in the list.
+  int minVersion = min;
----------------
clementval wrote:
> jdenny wrote:
> > What does "the list" refer to?
> I updated the comment and dropped the list. I think it is clearer now.
I don't see the change.
================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def:1881 -__OMP_DIRECTIVE_CLAUSE(flush, 50, ~0, release) -// TODO This should ne `none` instead -__OMP_DIRECTIVE_CLAUSE(flush, 1, ~0, flush) ---------------- clementval wrote: > jdenny wrote: > > This patch loses this TODO and the next one. I'm not sure what they mean. Do we need to keep them? > I updated the patch to keep it. Same for `depobj`. @jdoerfert : If I read git blame correctly, you wrote this TODO originally. Can you help us to understand whether it's worth preserving? ================ Comment at: llvm/test/TableGen/directive1.td:109 +// IMPL-NEXT: assert(unsigned(C) <= llvm::tdl::Clause_enumSize); +// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clausea && 1 <= Version && 2147483647 >= Version) +// IMPL-NEXT: return true; ---------------- clementval wrote: > jdenny wrote: > > I know the original code used a giant if-else block, but shouldn't this be a switch? > The idea was to migrate to TableGen with the exact same code generated and to update this in a follow up patch. It would be replace with at least a switch for the `Directive` and probably an inner switch for the `Clause` or we keep the `if`s inside the Directives' switch. Sure, makes sense to do it in a separate patch. It seems like both switches makes sense, but the inner switches might benefit from default cases. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:296 + + OS << "\n"; // Empty line at end of file } ---------------- clementval wrote: > jdenny wrote: > > Why is an empty line needed? > Just to be consistent with clang-format in the generated file. It's surprising that clang-format would require an empty line at the end of the file. Any idea why? 
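For readers following the if-else versus switch discussion on directive1.td above, here is a minimal sketch of what a switch-based form might look like. It is not the output of DirectiveEmitter.cpp: the enumerators `TDLD_dirb` and `TDLC_clauseb`, the constants `Clause_enumSize` and `MaxVersion`, and the exact shape of the function are assumptions made for illustration. Only `TDLD_dira`, `TDLC_clausea`, and the inclusive version bounds come from the quoted test.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generated enums. Only TDLD_dira and
// TDLC_clausea appear in the quoted test; the other names are made up.
enum Directive { TDLD_dira, TDLD_dirb };
enum Clause { TDLC_clausea, TDLC_clauseb };
constexpr unsigned Clause_enumSize = 2;      // assumed enum size
constexpr unsigned MaxVersion = 2147483647;  // upper bound seen in the test

// Outer switch on the directive, inner switch on the clause. Each inner
// switch has a default case so that unlisted clauses are rejected.
bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) {
  assert(unsigned(C) <= Clause_enumSize);
  switch (D) {
  case TDLD_dira:
    switch (C) {
    case TDLC_clausea:
      // Same inclusive range check as the generated if-chain.
      return 1 <= Version && MaxVersion >= Version;
    default:
      return false;
    }
  case TDLD_dirb:
    // No clauses are allowed on this made-up directive.
    return false;
  }
  return false;
}
```

With the default cases in place, adding a directive without listing any clauses fails closed, which seems to be the benefit suggested in the review.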
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Mon Jul 6 14:19:36 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:19:36 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <807801fe3c79d81eeac4ec23538dc9a3@localhost.localdomain> lebedev.ri marked 2 inline comments as done. lebedev.ri added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- Tyker wrote: > lebedev.ri wrote: > > Tyker wrote: > > > lebedev.ri wrote: > > > > Tyker wrote: > > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > > No, the current logic is correct, that map doesn't live that long. > > > here is the situation i think fails. > > > > > > extractOperandBundesFromModule is called a first time and generate a reduction. > > > the reduction isn't considered interesting. > > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > > the association from a index to a Bundle isn't the same, > > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > > we have in this module `M0`. 
It then comes with different chunks to keep and tells us > > to mutate the module `Mc` (which is a perfect clone of `M0`). > > > > It is up to us to actually enumerate the features (here: bundles). > > As long as the mapping is stable, i.e. we get the same result > > when calling `extractOperandBundesFromModule()` on the same input, > > we're good. > > when calling extractOperandBundesFromModule() on the same input, we're good. > > i agree but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify. > so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical. Sure, but can you point me at the spot where you believe that would matter? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 14:21:38 2020 From: llvm-commits at lists.llvm.org (Alexander Shaposhnikov via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:21:38 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: alexshap added inline comments. ================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:217 + + // Add new RPaths. + for (StringRef RPath : Config.RPathToAdd) { ---------------- sameerarora101 wrote: > jhenderson wrote: > > sameerarora101 wrote: > > > alexshap wrote: > > > > sameerarora101 wrote: > > > > > alexshap wrote: > > > > > > smeenai wrote: > > > > > > > Nit: do we want to be adding new load commands in a function called `updateLoadCommands`? At least to me that function name seems like it should only be updating existing load commands, since we have a separate `removeLoadCommands` to handle removal. 
I'll leave it to the more experienced llvm-objcopy reviewers (@alexshap, @jhenderson) to decide if this is okay as-is or if we want a separate `addLoadCommands` function. > > > > > > so basically the idea was to group together logical pieces of handleArgs (to some reasonable extent). > > > > > > Besides error-reporting removeLoadCommands is ~10-12 lines of code, so I'd probably inline it into > > > > > > updateLoadCommands for consistency. > > > > > @alexshap Instead of inlining the whole `removeLoadCommands` inside `updateLoadCommands` I think it would cleaner if I just call > > > > > > > > > > ``` > > > > > // Remove LCs. > > > > > if (Error E = removeLoadCommands(Config, Obj)) > > > > > return E; > > > > > ``` > > > > > from inside `updateLoadCommands`. This can allow for independent development of `removeLoadCommands` in future as well. What do you think? (I have updated the current diff with this change, however, I can update it again in case we want something else) > > > > I'm not sure that removeLoadCommands is realistically independent from updateLoadCommands, e.g. the order in which you modify the list of load commands appears to be important. Since it's small (~10 lines) it seems preferable to avoid creating this weird asymmetry between removing/adding. > > > I see. Ok, I have inlined `removeLoadCommands` into `updateLoadCommands` then. Thanks 😊 > > There are two options. Either a) rename `updateLoadCommands` to something more generic (e.g. `processLoadCommands`, in which case I'd ensure all load command processing is done in that function), or b) think of it purely in conceptual terms where the load commands in the function name refers to the set of load commands, rather than each individual load command, if that makes sense. Thus you update that set by adding/removing elements, and also changing the existing elements. > I like `processLoadCommands` too. Updated (I can change it to something else too?) 
+1 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 From llvm-commits at lists.llvm.org Mon Jul 6 14:21:39 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:21:39 +0000 (UTC) Subject: [PATCH] D82828: [ELF] Don't resolve a relocation in .debug_line referencing an ICF folded symbol to the tombstone value In-Reply-To: References: Message-ID: MaskRay added a comment. In D82828#2134226 , @thakis wrote: > Hello, this seems to increase runtime of some dwarf processing tools for us by several orders of magnitude (from "terminate in a few minutes" to "don't know how long they terminate; not before our timeouts"). https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5 has repro steps. > > Can we revert this and analyze async? I don't think this is LLD's problem. I'd rather add an option `--dead-reloc-addend='.debug_*=0xffffffffffffffff'`, probably temporarily. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82828/new/ https://reviews.llvm.org/D82828 From llvm-commits at lists.llvm.org Mon Jul 6 14:21:49 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:21:49 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: Tyker added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- lebedev.ri wrote: > Tyker wrote: > > lebedev.ri wrote: > > > Tyker wrote: > > > > lebedev.ri wrote: > > > > > Tyker wrote: > > > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. 
> > > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > > > No, the current logic is correct, that map doesn't live that long. > > > > here is the situation i think fails. > > > > > > > > extractOperandBundesFromModule is called a first time and generate a reduction. > > > > the reduction isn't considered interesting. > > > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > > > the association from a index to a Bundle isn't the same, > > > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > > > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > > > we have in this module `M0`. It then comes with different chunks to keep and tells us > > > to mutate the module `Mc` (which is a perfect clone of `M0`). > > > > > > It is up to us to actually enumerate the features (here: bundles). > > > As long as the mapping is stable, i.e. we get the same result > > > when calling `extractOperandBundesFromModule()` on the same input, > > > we're good. > > > when calling extractOperandBundesFromModule() on the same input, we're good. > > > > i agree but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify. > > so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical. > Sure, but can you point me at the spot where you believe that would matter? 
the key of the Map depends on pointer values Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 14:26:50 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 21:26:50 +0000 (UTC) Subject: [PATCH] D82988: [RISCV] Avoid Splitting MBB in RISCVExpandPseudo In-Reply-To: References: Message-ID: luismarques accepted this revision. luismarques added a comment. This revision is now accepted and ready to land. LGTM. Nice! ================ Comment at: llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:74-76 MachineBasicBlock::iterator NMBBI = std::next(MBBI); - Modified |= expandMI(MBB, MBBI, NMBBI); + Modified |= expandMI(MBB, MBBI); MBBI = NMBBI; ---------------- lenary wrote: > Oh this loop can be simplifed - though I'm not sure I should be incrementing `MBBI` if we've inserted new instructions using `BuildMI`, which I think also increments the iterator automatically. Guidance here would be helpful. Maybe I misunderstood your issue, but I think the increment is correct: 1) It is what AArch64 does. 2) If you have two consecutive pseudo-instructions this code expands them both, so you aren't skipping over the second by performing the explicit increment. 3) You don't want to iterate over the expanded instructions, to recursively expand them, since we don't emit other pseudo-instructions in the expansions. BTW, what simplification were you considering for this loop? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82988/new/ https://reviews.llvm.org/D82988 From llvm-commits at lists.llvm.org Mon Jul 6 14:28:08 2020 From: llvm-commits at lists.llvm.org (Hideto Ueno via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:28:08 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. 
In-Reply-To: References: Message-ID: <9488e55acff4dc624927892b5087a347@localhost.localdomain> uenoku added a comment. Herald added a subscriber: okura. Either is fine but I think it is more natural to forbid an empty list. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Mon Jul 6 14:31:10 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:31:10 +0000 (UTC) Subject: [PATCH] D83179: [SCCP] Use range metadata for loads and calls In-Reply-To: References: Message-ID: <06fcd057224004e24cd0d041badce77b@localhost.localdomain> fhahn accepted this revision. fhahn added a comment. This revision is now accepted and ready to land. LGTM, thanks. > It should also be possible to use !nonnull using ValueLattice::getNot/markNotConstant should possibly just work? For icmps we use ValueLattice's helpers, they should be able to deal with it, unless we bail out for pointers somewhere in SCCP. ================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1107 +static ValueLatticeElement getValueFromMetadata(const Instruction *I) { + if (MDNode *Ranges = I->getMetadata(LLVMContext::MD_range)) ---------------- I think we have something similar in LVI. Might be good to move the common logic to ValueLattice ================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1121 + // as overdefined. + if (I.getType()->isStructTy() || I.isVolatile()) return (void)markOverdefined(&I); ---------------- For volatile loads, I think we could still use the range info if present? We are just not allowed to remove the volatile operation, right? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83179/new/ https://reviews.llvm.org/D83179 From llvm-commits at lists.llvm.org Mon Jul 6 14:33:34 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:33:34 +0000 (UTC) Subject: [PATCH] D78133: [PredicateInfo] Optionally set OriginalOp to renamed value it refers to. In-Reply-To: References: Message-ID: fhahn added a comment. In D78133#2131838 , @nikic wrote: > Do we need to have this behind an option? That is, are there any PredicateInfo consumers who would //not// want this behavior? Currently NewGVN relies on the current behavior. It uses it to easily look up the original instruction and uses it to merge the metadata of replacement instructions (which is not an issue for SCCP because we don't replace the predicates with other equivalent instructions which could have metadata). I guess we could try to keep track of the original instruction in NewGVN (or traverse the chain there), but I don't think it's worth blocking the patch on the change to NewGVN. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78133/new/ https://reviews.llvm.org/D78133 From llvm-commits at lists.llvm.org Mon Jul 6 14:33:46 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:33:46 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: <75f2fb3edd7d2bc9ac29fdffc03cb624@localhost.localdomain> nemanjai requested changes to this revision. nemanjai added a comment. This revision now requires changes to proceed. Pretty close to the way it should be but the indices do need to flip so I have to request changes. 
================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9798 + + // The LHS and RHS may be bitcasts to v8i16 as we canonicalize shuffles + // to v8i16. Peek through the bitcasts to get the actual operands, ---------------- This comment is incorrect. The canonical type is `v16i8`. ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9801 + // and canonicalize the RHS being a BUILD_VECTOR when lowering to xxsplti32dx. + LHS = peekThroughBitcasts(LHS); + RHS = peekThroughBitcasts(RHS); ---------------- Is there a reason we don't just define these this way above? i.e. `SDValue LHS = peekThroughBitcasts(SVN->getOperand(0));` ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9817 + bool HasAnyUndefs; + bool IsBVNConstSplat = + BVN->isConstantSplat(APSplatValue, APSplatUndef, SplatBitSize, ---------------- You do not use `IsBVNConstSplat` anywhere except in the condition. You can just put the call in the condition i.e. `if (!BVN->isConstantSplat(...) || SplatBitSize > 32)` ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9824 + // Check if RHS is a splat of 4-bytes (or smaller). + if ((SplatBitSize / 8) > 4) + return SDValue(); ---------------- This can be folded into the above condition. I think it is reasonable to expect the reader to understand that 32 bits is 4 bytes (on PPC) so we don't need to divide by 8. ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9827 + + // Check that the shuffle mask matches the semantics the XXSPLTI32DX. + // XXSPLTI32DX can insert 4 byte chunks from the constant splat C into: ---------------- anil9 wrote: > semantics the -> semantics of the ``` // Check that the shuffle mask matches the semantics of XXSPLTI32DX. // The instruction splats a constant C into two words of the source vector // producing { C, Unchanged, C, Unchanged } or { Unchanged, C, Unchanged, C }. 
// Thus we check that the shuffle mask is the equivalent of // <0, [4-7], 2, [4-7]> or <[4-7], 1, [4-7], 3> respectively. // Note: the check above of isNByteElemShuffleMask() ensures that the bytes // within each word are consecutive, so we only need to check the first byte. ``` ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9839 + ShuffleMask[4] > 15 && ShuffleMask[12] > 15)) // Case 1. + Index = DAG.getTargetConstant(IsLE ? 1 : 0, DL, MVT::i1); + else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && // Case 2. ---------------- We are after type legalization here, can you please use legal types (i.e. no `MVT::i1`). ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9839 + ShuffleMask[4] > 15 && ShuffleMask[12] > 15)) // Case 1. + Index = DAG.getTargetConstant(IsLE ? 1 : 0, DL, MVT::i1); + else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && // Case 2. ---------------- nemanjai wrote: > We are after type legalization here, can you please use legal types (i.e. no `MVT::i1`). This is backwards. On LE, the rightmost element is element zero. In this path, the constant goes into the most significant word of each doubleword. So your `Index` needs to flip in both places. ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.h:1278 + /// handled by the XXSPLTI32DX instruction introduced in ISA 3.1. + SDValue lowerToXXSPLTI32DX(ShuffleVectorSDNode *N, SelectionDAG &DAG) const; + ---------------- anil9 wrote: > nit : > /// otherwise return the default SDValue. ??? All lowering and combine functions return a default constructed SDValue when unsuccessful. There is no reason to call that out specifically. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:754 + [(set v2i64:$XT, + (PPCxxsplti32dx v2i64:$XTi, i1:$IX, + i32:$IMM32))]>, ---------------- Can we please use `i32` rather than `i1` as the latter could lead to issues (with using CRBIT registers which we really don't want to do). 
================ Comment at: llvm/test/CodeGen/PowerPC/p10-splatImm32.ll:21 +entry: + %vecins1 = shufflevector <4 x i32> %a, <4 x i32> , <4 x i32> + ret <4 x i32> %vecins1 ---------------- The result of this shuffle is: `{ a[0], 566, a[2], 566 }` Which produces a vector register: ``` LE: [ 566 | a[2] | 566 | a[0] ] => xxsplti32dx vs34, 0, 566 BE: [ a[0] | 566 | a[2] | 566 ] => xxsplti32dx vs34, 1, 566 ``` So it is backwards - similarly to all the test cases. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 From llvm-commits at lists.llvm.org Mon Jul 6 14:33:47 2020 From: llvm-commits at lists.llvm.org (Atmn Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:33:47 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: atmnpatel added a comment. Herald added a subscriber: aaron.ballman. Ping @ABataev. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 From llvm-commits at lists.llvm.org Mon Jul 6 14:34:28 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:34:28 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: <06a046e1b31833e12a467f082760e9f0@localhost.localdomain> greened updated this revision to Diff 275831. greened added a comment. Herald added a project: clang. Herald added a subscriber: cfe-commits. Fixed various bugs, added tests. This now has two modes because generated function output can't be ordered with respect to source file functions. When clang generates functions it can sometimes output original source functions in a different (non-source) order so checks can't be placed next to their definitions in the source file. 
I don't particularly like this mode dichotomy but unifying it would necessitate updating a whole lot of clang tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 Files: clang/test/utils/update_cc_test_checks/Inputs/generated-funcs.c clang/test/utils/update_cc_test_checks/Inputs/generated-funcs.c.generated.expected clang/test/utils/update_cc_test_checks/Inputs/generated-funcs.c.no-generated.expected clang/test/utils/update_cc_test_checks/generated-funcs.test llvm/utils/UpdateTestChecks/asm.py llvm/utils/UpdateTestChecks/common.py llvm/utils/update_cc_test_checks.py llvm/utils/update_llc_test_checks.py llvm/utils/update_test_checks.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D83004.275831.patch Type: text/x-patch Size: 29805 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 14:35:35 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:35:35 +0000 (UTC) Subject: [PATCH] D83260: [PGO][PGSO] Add profile guided size optimizations to some new sites. Message-ID: yamauchi created this revision. yamauchi added a reviewer: davidxl. Herald added a subscriber: hiraditya. Herald added a project: LLVM. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83260 Files: llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/Target/X86/X86FixupLEAs.cpp llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/CodeGen/AArch64/arm64-fp-imm-size.ll llvm/test/CodeGen/X86/fixup-lea.ll llvm/test/CodeGen/X86/opt-pipeline.ll llvm/test/CodeGen/X86/phaddsub-extract.ll llvm/test/CodeGen/X86/popcnt.ll llvm/test/CodeGen/X86/pr27202.ll llvm/test/CodeGen/X86/splat-for-size.ll llvm/test/Transforms/LoopVectorize/optsize.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83260.275833.patch Type: text/x-patch Size: 40932 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 14:37:17 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:37:17 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: lebedev.ri marked 3 inline comments as done. lebedev.ri added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- Tyker wrote: > lebedev.ri wrote: > > Tyker wrote: > > > lebedev.ri wrote: > > > > Tyker wrote: > > > > > lebedev.ri wrote: > > > > > > Tyker wrote: > > > > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > > > > No, the current logic is correct, that map doesn't live that long. 
> > > > > here is the situation i think fails. > > > > > > > > > > extractOperandBundesFromModule is called a first time and generate a reduction. > > > > > the reduction isn't considered interesting. > > > > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > > > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > > > > the association from a index to a Bundle isn't the same, > > > > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > > > > > > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > > > > we have in this module `M0`. It then comes with different chunks to keep and tells us > > > > to mutate the module `Mc` (which is a perfect clone of `M0`). > > > > > > > > It is up to us to actually enumerate the features (here: bundles). > > > > As long as the mapping is stable, i.e. we get the same result > > > > when calling `extractOperandBundesFromModule()` on the same input, > > > > we're good. > > > > when calling extractOperandBundesFromModule() on the same input, we're good. > > > > > > i agree but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify. > > > so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical. > > Sure, but can you point me at the spot where you believe that would matter? > the key of the Map depends on pointer values `R.CallsToRefine` is indeed a map, with key being `CallBase *`. We *just* built that map, for this very `Mc` we were provided. I'm still not seeing where pointer order would matter.. 
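The question in this exchange is whether any index-to-bundle numbering is ever derived from the iteration order of a pointer-keyed map. The sketch below illustrates the distinction with standard containers rather than LLVM's. The `Call` struct, the `Module` alias, and both numbering functions are hypothetical and are not the code in D83177.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a call instruction carrying operand bundles.
struct Call {
  std::vector<std::string> Bundles;
};

// A "module" is an ordered list of owned calls. Cloning creates new
// objects at new addresses with identical contents.
using Module = std::vector<std::unique_ptr<Call>>;

Module cloneModule(const Module &M) {
  Module Clone;
  for (const auto &C : M)
    Clone.push_back(std::make_unique<Call>(*C));
  return Clone;
}

// Stable numbering: index i names the i-th bundle in program order,
// so two identical modules always produce the same numbering.
std::vector<std::string> numberByTraversal(const Module &M) {
  std::vector<std::string> Numbered;
  for (const auto &C : M)
    for (const std::string &B : C->Bundles)
      Numbered.push_back(B);
  return Numbered;
}

// Address-dependent numbering: bundles are first grouped in a map keyed
// by pointer, then numbered by iterating that map. std::map orders keys
// by address, so the numbering may differ from one clone to the next.
std::vector<std::string> numberByMapIteration(const Module &M) {
  std::map<const Call *, std::vector<std::string>> ByCall;
  for (const auto &C : M)
    ByCall[C.get()] = C->Bundles;
  std::vector<std::string> Numbered;
  for (const auto &Entry : ByCall)
    for (const std::string &B : Entry.second)
      Numbered.push_back(B);
  return Numbered;
}
```

If the numbering comes from a traversal like the first function and the map is only built and consumed within a single call on a single clone, pointer order should not affect which bundle an index refers to. A numbering like the second function would be the case where an insertion-ordered container such as MapVector matters.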
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 14:38:24 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:38:24 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: greened marked an inline comment as done. greened added inline comments. ================ Comment at: llvm/utils/update_cc_test_checks.py:133 + parser.add_argument('--include-generated-funcs', action='store_true', + help='Output checks for functions not in source') parser.add_argument('tests', nargs='+') ---------------- jdoerfert wrote: > I think this should go into common.py (after D78618). I would also make this the default but OK. Yes I suppose it should in case `opt` and friends generate functions. I hadn't considered that use-case. While I would like to make it default unfortunately it would require updating a bunch of the existing clang tests which doesn't seem too friendly. See the patch update comment for details. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 From llvm-commits at lists.llvm.org Mon Jul 6 14:40:09 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Mon, 6 Jul 2020 14:40:09 -0700 Subject: [PATCH] D82387: [flang] add RTBuilder In-Reply-To: <7291fbdeedbc2408b140944fa7971cfe@localhost.localdomain> References: <7291fbdeedbc2408b140944fa7971cfe@localhost.localdomain> Message-ID: Agreed. This should be fixed :) -eric On Mon, Jul 6, 2020 at 12:54 PM David Truby via Phabricator < reviews at reviews.llvm.org> wrote: > DavidTruby added a comment. > > Is there a reason to use `float _Complex` here at all? 
The C++ standard > (29.5.4 of C++17) guarantees that `std::complex` and `float > _Complex` are layout compatible and can be reinterpret_casted to each other > so even if these functions are intended to be callable from C/interoperable > with _Complex in C code, it'd be better to use std::complex on the > C++ side. > > > Repository: > rG LLVM Github Monorepo > > CHANGES SINCE LAST ACTION > https://reviews.llvm.org/D82387/new/ > > https://reviews.llvm.org/D82387 > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Mon Jul 6 14:42:39 2020 From: llvm-commits at lists.llvm.org (Bruno Ricci via llvm-commits) Date: Mon, 06 Jul 2020 14:42:39 -0700 (PDT) Subject: [llvm] 02946de - [Support][NFC] Fix Wdocumentation warning in ADT/Bitfields.h Message-ID: <5f039acf.1c69fb81.84441.49e7@mx.google.com> Author: Bruno Ricci Date: 2020-07-06T22:41:40+01:00 New Revision: 02946de3802d3bc65bc9f2eb9b8d4969b5a7add8 URL: https://github.com/llvm/llvm-project/commit/02946de3802d3bc65bc9f2eb9b8d4969b5a7add8 DIFF: https://github.com/llvm/llvm-project/commit/02946de3802d3bc65bc9f2eb9b8d4969b5a7add8.diff LOG: [Support][NFC] Fix Wdocumentation warning in ADT/Bitfields.h \tparam is used for template parameters instead of \param. Added: Modified: llvm/include/llvm/ADT/Bitfields.h Removed: ################################################################################ diff --git a/llvm/include/llvm/ADT/Bitfields.h b/llvm/include/llvm/ADT/Bitfields.h index 68b1549a0ac5..9891b4692e80 100644 --- a/llvm/include/llvm/ADT/Bitfields.h +++ b/llvm/include/llvm/ADT/Bitfields.h @@ -212,10 +212,10 @@ template <> struct ResolveUnderlyingType { struct Bitfield { /// Describes an element of a Bitfield. This type is then used with the /// Bitfield static member functions. 
- /// \param T, the type of the field once in unpacked form, - /// \param Offset, the position of the first bit, - /// \param Size, the size of the field, - /// \param MaxValue, For enums the maximum enum allowed. + /// \tparam T The type of the field once in unpacked form. + /// \tparam Offset The position of the first bit. + /// \tparam Size The size of the field. + /// \tparam MaxValue For enums the maximum enum allowed. template ::value ? T(0) // coupled with static_assert below From llvm-commits at lists.llvm.org Mon Jul 6 14:42:48 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:42:48 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: <6b82c598f4c7f79a2646d2aa3eb04af6@localhost.localdomain> greened added a comment. In D83004#2129362 , @jdoerfert wrote: > This is great! Just a few days ago I added a TODO in one of the tests I created asking for this: D82722 :) Glad to help! > Will this work for all test scripts? Obviously right now it's only enabled for clang but it should be straightforward to add to the other test scripts. All the infrastructure is there, the various drivers just have to use it. > Will this make the `prefix_exclusions` logic obsolete? Yeah, I think it might. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 From llvm-commits at lists.llvm.org Mon Jul 6 14:45:32 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:45:32 +0000 (UTC) Subject: [PATCH] D82919: [SampleFDO] Enable sample-profile-top-down-load by default. In-Reply-To: References: Message-ID: wmi updated this revision to Diff 275836. wmi added a comment. Enable -sample-profile-merge-inlinee by default together with -sample-profile-top-down-load.
Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82919/new/ https://reviews.llvm.org/D82919 Files: llvm/lib/Transforms/IPO/SampleProfile.cpp llvm/test/Transforms/SampleProfile/entry_counts_cold.ll llvm/test/Transforms/SampleProfile/inline-mergeprof.ll llvm/test/Transforms/SampleProfile/inline-topdown.ll Index: llvm/test/Transforms/SampleProfile/inline-topdown.ll =================================================================== --- llvm/test/Transforms/SampleProfile/inline-topdown.ll +++ llvm/test/Transforms/SampleProfile/inline-topdown.ll @@ -1,10 +1,10 @@ ; Note that this needs new pass manager for now. Passing `-sample-profile-top-down-load` to legacy pass manager is a no-op. ; Test we aren't doing specialization for inlining with default source order -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -S | FileCheck -check-prefix=DEFAULT %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-top-down-load=false -S | FileCheck -check-prefix=DEFAULT %s ; Test we specialize based on call path with context-sensitive profile while inlining with '-sample-profile-top-down-load' -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load -S | FileCheck -check-prefix=TOPDOWN %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load=true -S | FileCheck -check-prefix=TOPDOWN %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 Index: llvm/test/Transforms/SampleProfile/inline-mergeprof.ll =================================================================== --- llvm/test/Transforms/SampleProfile/inline-mergeprof.ll +++ llvm/test/Transforms/SampleProfile/inline-mergeprof.ll @@ -1,10 +1,10 @@ ; Test we lose details of not inlined profile without 
'-sample-profile-merge-inlinee' -; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s ; Test we properly merge not inlined profile properly with '-sample-profile-merge-inlinee' -; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 Index: llvm/test/Transforms/SampleProfile/entry_counts_cold.ll =================================================================== --- llvm/test/Transforms/SampleProfile/entry_counts_cold.ll +++ llvm/test/Transforms/SampleProfile/entry_counts_cold.ll @@ -108,7 +108,7 @@ ; CHECK: [[TOP]] = !{!"function_entry_count", i64 101} ; CHECK: [[FOO]] = !{!"function_entry_count", i64 151} ; CHECK: [[BAR]] = !{!"function_entry_count", i64 303} -; CHECK: [[BAZ]] = !{!"branch_weights", i64 303} +; CHECK: [[BAZ]] = !{!"branch_weights", i32 303} !0 = distinct !DICompileUnit(language: 
DW_LANG_C99, file: !1, producer: "clang version 8.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: GNU) !1 = !DIFile(filename: "temp.c", directory: "llvm/test/Transforms/SampleProfile") Index: llvm/lib/Transforms/IPO/SampleProfile.cpp =================================================================== --- llvm/lib/Transforms/IPO/SampleProfile.cpp +++ llvm/lib/Transforms/IPO/SampleProfile.cpp @@ -148,12 +148,12 @@ "be accurate. It may be overriden by profile-sample-accurate. ")); static cl::opt ProfileMergeInlinee( - "sample-profile-merge-inlinee", cl::Hidden, cl::init(false), + "sample-profile-merge-inlinee", cl::Hidden, cl::init(true), cl::desc("Merge past inlinee's profile to outline version if sample " "profile loader decided not to inline a call site.")); static cl::opt ProfileTopDownLoad( - "sample-profile-top-down-load", cl::Hidden, cl::init(false), + "sample-profile-top-down-load", cl::Hidden, cl::init(true), cl::desc("Do profile annotation and inlining for functions in top-down " "order of call graph during sample profile loading.")); -------------- next part -------------- A non-text attachment was scrubbed... Name: D82919.275836.patch Type: text/x-patch Size: 4882 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 14:45:39 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:45:39 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: clementval updated this revision to Diff 275837. clementval marked 4 inline comments as done. clementval added a comment. 
Address more comment from @jdenny Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82982.275837.patch Type: text/x-patch Size: 95949 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 14:45:43 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:45:43 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: <74676066a41c7b6660ddf3c9ab89990f@localhost.localdomain> clementval added inline comments. ================ Comment at: llvm/include/llvm/Frontend/Directive/DirectiveBase.td:69 + + // Mininum version number where this clause is valid in the list. + int minVersion = min; ---------------- jdenny wrote: > clementval wrote: > > jdenny wrote: > > > What does "the list" refer to? > > I updated the comment and dropped the list. I think it is clearer now. > I don't see the change. Forgot this one. Sorry. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:296 + + OS << "\n"; // Empty line at end of file } ---------------- jdenny wrote: > clementval wrote: > > jdenny wrote: > > > Why is an empty line needed? > > Just to be consistent with clang-format in the generated file. > It's surprising that clang-format would require an empty line at the end of the file. Any idea why? 
Sorry, it is not clang-format but Phabricator which signals a missing empty line at end of file. This is not mandatory and can be removed. The file is generated. I just wanted to be consistent with the style.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82982/new/

https://reviews.llvm.org/D82982

From llvm-commits at lists.llvm.org Mon Jul 6 14:48:01 2020
From: llvm-commits at lists.llvm.org (Paul Robinson via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:48:01 +0000 (UTC)
Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted
In-Reply-To:
References:
Message-ID:

probinson added a comment.

In D82975#2134132, @dblaikie wrote:

> In D82975#2128353, @dstenb wrote:
>
> > In D82975#2127201, @SouraVX wrote:
> >
> > > I think if it's about compatibility (analogous behavior with GCC), existing infra is Okay/Fine (since same encodings are used). We just need to emit the `.debug_macro` section with `version` 4 and teach the `llvm-dwarfdump` to parse it correctly.
> >
> >
> > One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?
>
>
> Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's `-fdebug-macro` with v4 is emitting a .debug_str_offsets section.
LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82975/new/

https://reviews.llvm.org/D82975

From llvm-commits at lists.llvm.org Mon Jul 6 14:48:26 2020
From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:48:26 +0000 (UTC)
Subject: [PATCH] D78861: [Attributor] [WIP] Track AA dependency using dependency graph
In-Reply-To:
References:
Message-ID:

kuter added a comment.
Herald added a subscriber: okura.

In D78861#2133485, @bbn wrote:

> In D78861#2131573, @jdoerfert wrote:
>
> > This looks pretty good :). Nice active review :)
> >
> > I have some minor comments below. We also should add a test for the print and dot output.
>
>
> I need some help here:
> Is there a way to test the dot output? I checked the .dot file and found it hard to write CHECK lines (see below) because we are interested in the link between different graph nodes (line 3 and line 4)
>
>   Node0x55be15e4f7d0 [shape=record,label="{[AAValueSimplify] for CtxI ' %2 = load i32, i32* %0, align 4' at position \{arg: [@0]\} with state simplified\n}"];
>   Node0x55be15e4f810 [shape=record,label="{[AANoUnwind] for CtxI ' %2 = load i32, i32* %0, align 4' at position \{fn:checkAndAdvance [checkAndAdvance at -1]\} with state nounwind\n}"];
>   Node0x55be15e4f810 -> Node0x55be15e500b0;
>   Node0x55be15e4f810 -> Node0x55be15e500b0;
>
>
> I have referred to some other similar tests like the *cfg_deopt_unreach.ll*, but none of them shows how to write check lines for such testcases.

I think something like this might work.

  // CHECK-DAG: [[NODE1:Node0x[0-9a-f]+]] ->[[NODE2]];
  ....
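
The property bbn wants to test is which node links to which, independent of the addresses printed in the node names. The same idea can be shown outside FileCheck: match node names by shape and capture both ends of each edge. A minimal C++ sketch, assuming the dot output looks like the lines quoted above; `findEdges` is a hypothetical helper for illustration, not part of the Attributor or of FileCheck:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects every "NodeA -> NodeB;" edge from dot text.
// Node names are matched by shape (Node0x<hex>), never by literal address,
// because the addresses change from run to run.
std::vector<std::pair<std::string, std::string>>
findEdges(const std::string &Dot) {
  static const std::regex EdgeRe(R"((Node0x[0-9a-f]+) -> (Node0x[0-9a-f]+);)");
  std::vector<std::pair<std::string, std::string>> Edges;
  for (std::sregex_iterator It(Dot.begin(), Dot.end(), EdgeRe), End;
       It != End; ++It)
    Edges.emplace_back((*It)[1].str(), (*It)[2].str());
  return Edges;
}
```

In a lit test the capture would be a FileCheck pattern variable instead of a regex group, but the matching strategy is the same.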


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78861/new/

https://reviews.llvm.org/D78861

From llvm-commits at lists.llvm.org Mon Jul 6 14:48:48 2020
From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 21:48:48 +0000 (UTC)
Subject: [PATCH] D83136: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst
In-Reply-To:
References:
Message-ID: <591395fa0b613d874d8c8a0f21413f87@localhost.localdomain>

jfb added inline comments.


================
Comment at: llvm/include/llvm/IR/IRBuilder.h:1746
+    return Insert(new AtomicCmpXchgInst(
+        Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID));
   }

----------------
jyknight wrote:
> jfb wrote:
> > Are types always rounded to a power of two at this point?
> >
> > i.e. what does this do: `struct { char c[3]; };` ?
> >
> > Also, I think this is wrong for non-lock-free types. The alignment requirement is lower on those.
> This is just encoding the pre-existing behavior -- you cannot currently create a cmpxchg instruction with alignment other than the size of the type.
>
> Right now, you _also_ cannot create a cmpxchg instruction with other than integral or pointer types, which -- in any _current_ llvm backend, afaik -- have non-power-of-2 sizes.
>
> Upcoming changes will plumb through a required alignment argument everywhere, and then we'll be rid of this weird hardcoded special alignment behavior here.
That sounds good to me.

FWIW I checked and we get the following today:
```
  %3 = bitcast %"struct.std::__1::atomic"* %0 to i32*
  %4 = zext i24 %1 to i32
  %5 = cmpxchg i32* %3, i32 %4, i32 %4 seq_cst seq_cst
  ret void
```

That being said, if we're going to allow other things to come into a cmpxchg in the future (i.e. remove the need to bitcast) then I want to make sure we encode the right requirements here, right now. I agree that they're enforced later in the code when the instruction is created, but that won't always be the case.
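
To make the `struct { char c[3]; }` case concrete: in the IR above the 3-byte value is widened from `i24` to `i32` before the cmpxchg, so in that one example the operand size is the next power of two. A minimal C++ sketch of that rounding follows; `roundUpToPowerOf2` is a hypothetical helper for illustration, not the IRBuilder code under review, and it says nothing about what alignment non-lock-free types should get:

```cpp
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= Bytes (Bytes > 0).
constexpr std::uint64_t roundUpToPowerOf2(std::uint64_t Bytes) {
  std::uint64_t P = 1;
  while (P < Bytes)
    P <<= 1;
  return P;
}

// The 3-byte struct from the question is handled as a 4-byte integer.
static_assert(roundUpToPowerOf2(3) == 4, "i24 payload is widened to i32");
static_assert(roundUpToPowerOf2(4) == 4, "already a power of two");
static_assert(roundUpToPowerOf2(16) == 16, "16 bytes stays 16 bytes");
```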
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83136/new/ https://reviews.llvm.org/D83136 From llvm-commits at lists.llvm.org Mon Jul 6 14:49:23 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:49:23 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: <988bbfae255c59ffdb023ca080ab8838@localhost.localdomain> ABataev added a comment. In D75591#2134301 , @atmnpatel wrote: > Ping @ABataev. Rebase, please Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 From llvm-commits at lists.llvm.org Mon Jul 6 14:50:25 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Mon, 06 Jul 2020 14:50:25 -0700 (PDT) Subject: [llvm] 7c63804 - Fix [-Werror, -Wsign-compare] in dominator unit test. Message-ID: <5f039ca1.1c69fb81.5380.1414@mx.google.com> Author: Eric Christopher Date: 2020-07-06T14:50:13-07:00 New Revision: 7c63804383f6baed3d934b3569f406c078869567 URL: https://github.com/llvm/llvm-project/commit/7c63804383f6baed3d934b3569f406c078869567 DIFF: https://github.com/llvm/llvm-project/commit/7c63804383f6baed3d934b3569f406c078869567.diff LOG: Fix [-Werror,-Wsign-compare] in dominator unit test. 
Added: Modified: llvm/unittests/IR/DominatorTreeTest.cpp Removed: ################################################################################ diff --git a/llvm/unittests/IR/DominatorTreeTest.cpp b/llvm/unittests/IR/DominatorTreeTest.cpp index 16c12b2102a9..afb620f338e6 100644 --- a/llvm/unittests/IR/DominatorTreeTest.cpp +++ b/llvm/unittests/IR/DominatorTreeTest.cpp @@ -805,7 +805,7 @@ TEST(DominatorTree, InsertFromUnreachable) { BasicBlock *To = B.getOrAddBlock(LastUpdate->Edge.To); PDT.insertEdge(From, To); EXPECT_TRUE(PDT.verify()); - EXPECT_EQ(PDT.root_size(), 2); + EXPECT_EQ(PDT.root_size(), 2UL); // Make sure we can use a const pointer with getNode. const BasicBlock *BB5 = B.getOrAddBlock("5"); EXPECT_NE(PDT.getNode(BB5), nullptr); From llvm-commits at lists.llvm.org Mon Jul 6 14:50:41 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:50:41 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: Tyker added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- lebedev.ri wrote: > Tyker wrote: > > lebedev.ri wrote: > > > Tyker wrote: > > > > lebedev.ri wrote: > > > > > Tyker wrote: > > > > > > lebedev.ri wrote: > > > > > > > Tyker wrote: > > > > > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > > > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > > > > > No, the current logic is correct, that map doesn't live that long. > > > > > > here is the situation i think fails. > > > > > > > > > > > > extractOperandBundesFromModule is called a first time and generate a reduction. 
> > > > > > the reduction isn't considered interesting. > > > > > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > > > > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > > > > > the association from a index to a Bundle isn't the same, > > > > > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > > > > > > > > > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > > > > > we have in this module `M0`. It then comes with different chunks to keep and tells us > > > > > to mutate the module `Mc` (which is a perfect clone of `M0`). > > > > > > > > > > It is up to us to actually enumerate the features (here: bundles). > > > > > As long as the mapping is stable, i.e. we get the same result > > > > > when calling `extractOperandBundesFromModule()` on the same input, > > > > > we're good. > > > > > when calling extractOperandBundesFromModule() on the same input, we're good. > > > > > > > > i agree but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify. > > > > so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical. > > > Sure, but can you point me at the spot where you believe that would matter? > > the key of the Map depends on pointer values > `R.CallsToRefine` is indeed a map, with key being `CallBase *`. > We *just* built that map, for this very `Mc` we were provided. > I'm still not seeing where pointer order would matter.. the map depends on pointer values so its ordering is different on each run. the map is iterated upon and the iteration order affects the mapping from Index to Feature(Bundles) so every run of has a different mapping. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 14:52:34 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:52:34 +0000 (UTC) Subject: [PATCH] D82552: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand In-Reply-To: References: Message-ID: <3028bcbdc75d5afcb27593bc4924ea3f@localhost.localdomain> plotfi added a comment. Ping? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82552/new/ https://reviews.llvm.org/D82552 From llvm-commits at lists.llvm.org Mon Jul 6 14:54:32 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:54:32 +0000 (UTC) Subject: [PATCH] D82919: [SampleFDO] Enable sample-profile-top-down-load by default. In-Reply-To: References: Message-ID: <5fa4100cb7ec9aa19c0007426a5f7806@localhost.localdomain> wmi added a comment. I tested the performance with sample-profile-top-down-load and sample-profile-merge-inlinee both enabled. In different compiler versions I got different result. In one version about three weeks older, I got 0.4% improvement for one benchmark steadily in multiple runs and neutral for another. In the head llvm version, I saw neutral result for both benchmarks. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82919/new/ https://reviews.llvm.org/D82919 From llvm-commits at lists.llvm.org Mon Jul 6 14:54:58 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:54:58 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <65045e4c4512a6c35a58ecd5c649dc68@localhost.localdomain> lebedev.ri marked 3 inline comments as done. lebedev.ri added inline comments. 
================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- Tyker wrote: > lebedev.ri wrote: > > Tyker wrote: > > > lebedev.ri wrote: > > > > Tyker wrote: > > > > > lebedev.ri wrote: > > > > > > Tyker wrote: > > > > > > > lebedev.ri wrote: > > > > > > > > Tyker wrote: > > > > > > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > > > > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > > > > > > No, the current logic is correct, that map doesn't live that long. > > > > > > > here is the situation i think fails. > > > > > > > > > > > > > > extractOperandBundesFromModule is called a first time and generate a reduction. > > > > > > > the reduction isn't considered interesting. > > > > > > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > > > > > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > > > > > > the association from a index to a Bundle isn't the same, > > > > > > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > > > > > > > > > > > > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > > > > > > we have in this module `M0`. It then comes with different chunks to keep and tells us > > > > > > to mutate the module `Mc` (which is a perfect clone of `M0`). > > > > > > > > > > > > It is up to us to actually enumerate the features (here: bundles). > > > > > > As long as the mapping is stable, i.e. 
we get the same result > > > > > > when calling `extractOperandBundesFromModule()` on the same input, > > > > > > we're good. > > > > > > when calling extractOperandBundesFromModule() on the same input, we're good. > > > > > > > > > > i agree but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify. > > > > > so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical. > > > > Sure, but can you point me at the spot where you believe that would matter? > > > the key of the Map depends on pointer values > > `R.CallsToRefine` is indeed a map, with key being `CallBase *`. > > We *just* built that map, for this very `Mc` we were provided. > > I'm still not seeing where pointer order would matter.. > the map depends on pointer values so its ordering is different on each run. > the map is iterated upon and the iteration order affects the mapping from Index to Feature(Bundles) > so every run of has a different mapping. > > Ahh, that is what you mean. No, it's the other way around. Do you agree that the only place we iterate over the map is: ``` for_each(R.CallsToRefine, [](const auto &P) { return maybeRewriteCallWithDifferentBundles(P.first, P.second); }); ``` ? But as you can see in `maybeRewriteCallWithDifferentBundles()`, it's second param is the indexes of bundle to preserve. So we have already performed the mapping before iterating over the map. 
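
A minimal C++ sketch of the argument above, assuming bundle indexes are assigned while walking the module in instruction order. `Call`, `Module`, `collectKept` and `rewrite` are hypothetical stand-ins for illustration, not the code in ReduceOperandBundles.cpp:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a call instruction and a module.
struct Call {
  std::vector<std::string> Bundles;
};
using Module = std::vector<Call>;

// Walk the module in instruction order and number every bundle globally.
// Whether a bundle is kept depends only on that walk-order number.
std::map<Call *, std::vector<std::size_t>>
collectKept(Module &M, const std::set<std::size_t> &KeepGlobal) {
  std::map<Call *, std::vector<std::size_t>> CallsToRefine;
  std::size_t Global = 0;
  for (Call &C : M)
    for (std::size_t Local = 0; Local < C.Bundles.size(); ++Local, ++Global)
      if (KeepGlobal.count(Global))
        CallsToRefine[&C].push_back(Local);
      else
        CallsToRefine[&C]; // remember the call even if nothing is kept
  return CallsToRefine;
}

// Apply the per-call decisions. The map may be visited in any order: each
// entry already carries the indexes to preserve for its own call.
void rewrite(Module &M, const std::set<std::size_t> &KeepGlobal) {
  for (auto &P : collectKept(M, KeepGlobal)) {
    std::vector<std::string> Kept;
    for (std::size_t I : P.second)
      Kept.push_back(P.first->Bundles[I]);
    P.first->Bundles = std::move(Kept);
  }
}
```

Running `rewrite` on a module and on a copy of it (same contents, different addresses) with the same set of kept indexes leaves both with the same bundles, which is the stability property being discussed. If the numbering were instead derived from the map's iteration order, a pointer-keyed map would not give that guarantee.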
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 14:57:51 2020 From: llvm-commits at lists.llvm.org (Atmn Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:57:51 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: <2099c64073e7902ff53b23de924e7f1f@localhost.localdomain> atmnpatel updated this revision to Diff 275841. atmnpatel added a comment. Herald added a subscriber: jfb. Rebased Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 Files: clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp clang/docs/LibASTMatchersReference.html clang/include/clang/ASTMatchers/ASTMatchers.h clang/include/clang/Basic/DiagnosticParseKinds.td clang/lib/ASTMatchers/Dynamic/Registry.cpp clang/lib/Parse/ParseOpenMP.cpp clang/lib/Sema/SemaOpenMP.cpp clang/test/OpenMP/distribute_parallel_for_default_messages.cpp clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp clang/test/OpenMP/driver.c clang/test/OpenMP/parallel_default_messages.cpp clang/test/OpenMP/parallel_for_default_messages.cpp clang/test/OpenMP/parallel_for_simd_default_messages.cpp clang/test/OpenMP/parallel_master_codegen.cpp clang/test/OpenMP/parallel_master_default_messages.cpp clang/test/OpenMP/parallel_sections_default_messages.cpp clang/test/OpenMP/target_parallel_default_messages.cpp clang/test/OpenMP/target_parallel_for_default_messages.cpp clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp clang/test/OpenMP/target_teams_default_messages.cpp clang/test/OpenMP/target_teams_distribute_default_messages.cpp clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp 
clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp clang/test/OpenMP/task_default_messages.cpp clang/test/OpenMP/task_messages.cpp clang/test/OpenMP/teams_default_messages.cpp clang/test/OpenMP/teams_distribute_default_messages.cpp clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp clang/test/OpenMP/teams_distribute_simd_default_messages.cpp clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp clang/unittests/ASTMatchers/ASTMatchersTest.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def -------------- next part -------------- A non-text attachment was scrubbed... Name: D75591.275841.patch Type: text/x-patch Size: 283711 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 15:00:55 2020 From: llvm-commits at lists.llvm.org (Dan Zimmerman via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:00:55 +0000 (UTC) Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining Message-ID: danzimm created this revision. danzimm added a reviewer: hiraditya. Herald added subscribers: llvm-commits, rupprecht, MaskRay. Herald added a reviewer: jhenderson. Herald added a project: LLVM. Currently there is no way to disable source file output from `llvm-symbolizer`. Similarly there's no way to disable symbolicating inlined functions with `llvm-symbolzer` (this option is automatically disabled when `llvm-addr2line` is invoked instead of `llvm-symbolizer`). This diff introduces flags to further customize `llvm-symbolizer`'s output to support these two usecases: - `--no-inlining`: This disables symbolicating inlined frames. 
When paired with `--output-style=LLVM` and `--use-symbol-table` this results in a list of functions that appear in the final binary
- `--no-source-file`: This disables printing source file information when symbolicating an address
- `--source-file`: This enables printing source file information (the default). This option was added to balance `--no-source-file`. The last of `--no-source-file` and `--source-file` passed will determine whether source file information is printed or not.

The behavior of `llvm-symbolizer` before this diff should be identical to the behavior after this diff when none of the new options are specified.

Together `--functions=linkage --demangle --output-style=LLVM --no-source-file --no-inlining` results in a list of symbol names which appear in the resulting binary. This is useful when symbolicating a list of addresses e.g. for link order files.

N.B. The same data can be extracted with a processor on top of `--functions=linkage --demangle --output-style=LLVM`, however with large lists of symbols I've found that this takes quite a long time (my processors were in perl/python; in theory I could've written a C/C++ one, but I figured it best to just add these as formatting options to `llvm-symbolizer` instead).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83262

Files:
  llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
  llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
  llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83262.275844.patch
Type: text/x-patch
Size: 4571 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 15:02:19 2020
From: llvm-commits at lists.llvm.org (Dan Zimmerman via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 22:02:19 +0000 (UTC)
Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining
In-Reply-To:
References:
Message-ID:

danzimm added a comment.

If there are tests somewhere that I can add to, please point me to them! I'd love to add some tests, just couldn't quite find any (I'm guessing I'm just not looking in the right place... 😅)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83262/new/

https://reviews.llvm.org/D83262

From llvm-commits at lists.llvm.org Mon Jul 6 15:02:28 2020
From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 22:02:28 +0000 (UTC)
Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list.
In-Reply-To:
References:
Message-ID: <4bbb03e86672222c2701d2b51adcab82@localhost.localdomain>

kuter added a comment.

In D83185#2134275, @uenoku wrote:

> Either is fine but I think it is more natural to forbid an empty list.


Do you mean returning an error if an empty `--attributor-seed-allow-list` option is present?
Currently the size of the list is being used to tell if a list is present or not.
I think I can use `getNumOccurrences()` to replace this behaviour.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83185/new/

https://reviews.llvm.org/D83185

From llvm-commits at lists.llvm.org Mon Jul 6 15:04:13 2020
From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 22:04:13 +0000 (UTC)
Subject: [PATCH] D82698: [NewPM] make parsePassPipeline parse adaptor-wrapped user passes
In-Reply-To:
References:
Message-ID: <4d2a000cf7a303ae64b0fa1eed2435d5@localhost.localdomain>

ychen added a comment.

ping ?
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82698/new/ https://reviews.llvm.org/D82698 From llvm-commits at lists.llvm.org Mon Jul 6 15:07:20 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Mon, 06 Jul 2020 22:07:20 +0000 (UTC) Subject: [PATCH] D77443: [RISCV] Fix RISCVInstrInfo::getInstSizeInBytes for atomics pseudos In-Reply-To: References: Message-ID: <1c29752ad85f1bab5aa4dd9ce09b5d4d@localhost.localdomain> luismarques accepted this revision. luismarques added a comment. This revision is now accepted and ready to land. Alright, let's get this landed! Please just add comments in the expansion functions, noting the need to update these values if the expansion templates ever change, as Lewis says. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77443/new/ https://reviews.llvm.org/D77443 From llvm-commits at lists.llvm.org Mon Jul 6 15:08:39 2020 From: llvm-commits at lists.llvm.org (Keith Smiley via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:08:39 +0000 (UTC) Subject: [PATCH] D83152: llvm-nm: add flag to suppress no symbols warning In-Reply-To: References: Message-ID: <2077026559917ead1e42306c42410ce4@localhost.localdomain> keith added a comment. In D83152#2133950 , @rupprecht wrote: > In D83152#2133855 , @MaskRay wrote: > > > I cannot find any search result about `no-warning-for-no-symbols`. Is `-no-warning-for-no-symbols` really an existing option? libtool is an `ar` like tool. > > > I found it by looking for underscores instead of hyphens: `-no_warning_for_no_symbols`. > However, the flag is an ar/ranlib/libtool flag, not nm, AFAICT. Yea sorry I should have been more clear, it's not the _exact_ same spelling because of the conventions used in nm with `-` instead of `_`. >> Second, I wonder how you are going to plug `-no-warning-for-no-symbols` into a build system. 
If you only parse stdout, you can ignore stderr. Even if you do, you can probably use `grep -v '^no symbols'`. This will have better portability (supported on older nm, supported on other binary formats). > > I agree this is likely the simpler option (just add `2> /dev/null` to the build script using `nm`) If folks feel strongly about this that would definitely work, this felt like a safer way to silence this for the future for me, but if you all think it's not worth adding an option for that's fine. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83152/new/ https://reviews.llvm.org/D83152 From llvm-commits at lists.llvm.org Mon Jul 6 15:09:05 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:09:05 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <282a0c7fa3fc76e691554b49115bffcc@localhost.localdomain> Tyker added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- lebedev.ri wrote: > Tyker wrote: > > lebedev.ri wrote: > > > Tyker wrote: > > > > lebedev.ri wrote: > > > > > Tyker wrote: > > > > > > lebedev.ri wrote: > > > > > > > Tyker wrote: > > > > > > > > lebedev.ri wrote: > > > > > > > > > Tyker wrote: > > > > > > > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > > > > > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > > > > > > > No, the current logic is correct, that map doesn't live that long. > > > > > > > > here is the situation i think fails. > > > > > > > > > > > > > > > > extractOperandBundesFromModule is called a first time and generate a reduction. 
> > > > > > > > the reduction isn't considered interesting. > > > > > > > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > > > > > > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > > > > > > > the association from a index to a Bundle isn't the same, > > > > > > > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > > > > > > > > > > > > > > > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > > > > > > > we have in this module `M0`. It then comes with different chunks to keep and tells us > > > > > > > to mutate the module `Mc` (which is a perfect clone of `M0`). > > > > > > > > > > > > > > It is up to us to actually enumerate the features (here: bundles). > > > > > > > As long as the mapping is stable, i.e. we get the same result > > > > > > > when calling `extractOperandBundesFromModule()` on the same input, > > > > > > > we're good. > > > > > > > when calling extractOperandBundesFromModule() on the same input, we're good. > > > > > > > > > > > > i agree but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify. > > > > > > so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical. > > > > > Sure, but can you point me at the spot where you believe that would matter? > > > > the key of the Map depends on pointer values > > > `R.CallsToRefine` is indeed a map, with key being `CallBase *`. > > > We *just* built that map, for this very `Mc` we were provided. > > > I'm still not seeing where pointer order would matter.. > > the map depends on pointer values so its ordering is different on each run. 
> > the map is iterated upon and the iteration order affects the mapping from Index to Feature(Bundles) > > so every run of has a different mapping. > > > > > Ahh, that is what you mean. No, it's the other way around. > Do you agree that the only place we iterate over the map is: > ``` > for_each(R.CallsToRefine, [](const auto &P) { > return maybeRewriteCallWithDifferentBundles(P.first, P.second); > }); > ``` > ? > > But as you can see in `maybeRewriteCallWithDifferentBundles()`, > it's second param is the indexes of bundle to preserve. > So we have already performed the mapping before iterating over the map. you are right, sorry for my confusion. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 15:13:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 15:13:27 -0700 (PDT) Subject: [llvm] db05f2e - [Scalarizer] Centralize instruction DCE Message-ID: <5f03a207.1c69fb81.5ab7b.b42f@mx.google.com> Author: Roman Lebedev Date: 2020-07-07T01:12:51+03:00 New Revision: db05f2e34a5e9380ddcc199d6687531108d795e4 URL: https://github.com/llvm/llvm-project/commit/db05f2e34a5e9380ddcc199d6687531108d795e4 DIFF: https://github.com/llvm/llvm-project/commit/db05f2e34a5e9380ddcc199d6687531108d795e4.diff LOG: [Scalarizer] Centralize instruction DCE As reported in https://reviews.llvm.org/D83101#2133062 the new visitInsertElementInst()/visitExtractElementInst() functionality is causing miscompiles (previously-crashing test added) It is due to the fact how the infra of Scalarizer is dealing with DCE, it was not updated or was it ready for such scalar value forwarding. It always assumed that the moment we "scalarized" something, it can go away, and did so with prejudice. But that is no longer safe/okay to do. 
Instead, let's prevent it from ever shooting itself into foot, and let's just accumulate the instructions-to-be-deleted in a vector, and collectively cleanup (those that are *actually* dead) them all at the end. All existing tests are not reporting any new garbage leftovers, but maybe it's test coverage issue. Added: Modified: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/basic.ll llvm/test/Transforms/Scalarizer/crash-bug.ll llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index 5cac4dca8cf8..e8fea501b1d8 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -22,8 +22,8 @@ #include "llvm/IR/BasicBlock.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DataLayout.h" -#include "llvm/IR/Dominators.h" #include "llvm/IR/DerivedTypes.h" +#include "llvm/IR/Dominators.h" #include "llvm/IR/Function.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/InstVisitor.h" @@ -41,6 +41,7 @@ #include "llvm/Support/CommandLine.h" #include "llvm/Support/MathExtras.h" #include "llvm/Transforms/Scalar.h" +#include "llvm/Transforms/Utils/Local.h" #include #include #include @@ -222,6 +223,8 @@ class ScalarizerVisitor : public InstVisitor { ScatterMap Scattered; GatherList Gathered; + SmallVector PotentiallyDeadInstrs; + unsigned ParallelLoopAccessMDKind; DominatorTree *DT; @@ -383,11 +386,6 @@ Scatterer ScalarizerVisitor::scatter(Instruction *Point, Value *V) { // so that we can avoid creating the gathered form if all uses of Op are // replaced with uses of CV. void ScalarizerVisitor::gather(Instruction *Op, const ValueVector &CV) { - // Since we're not deleting Op yet, stub out its operands, so that it - // doesn't make anything live unnecessarily. 
- for (unsigned I = 0, E = Op->getNumOperands(); I != E; ++I) - Op->setOperand(I, UndefValue::get(Op->getOperand(I)->getType())); - transferMetadataAndIRFlags(Op, CV); // If we already have a scattered form of Op (created from ExtractElements @@ -402,7 +400,7 @@ void ScalarizerVisitor::gather(Instruction *Op, const ValueVector &CV) { Instruction *Old = cast(V); CV[I]->takeName(Old); Old->replaceAllUsesWith(CV[I]); - Old->eraseFromParent(); + PotentiallyDeadInstrs.emplace_back(Old); } } SV = CV; @@ -950,10 +948,13 @@ bool ScalarizerVisitor::finish() { Res->takeName(Op); Op->replaceAllUsesWith(Res); } - Op->eraseFromParent(); + PotentiallyDeadInstrs.emplace_back(Op); } Gathered.clear(); Scattered.clear(); + + RecursivelyDeleteTriviallyDeadInstructionsPermissive(PotentiallyDeadInstrs); + return true; } diff --git a/llvm/test/Transforms/Scalarizer/basic.ll b/llvm/test/Transforms/Scalarizer/basic.ll index 2c7b6a6b588f..239cdd660a33 100644 --- a/llvm/test/Transforms/Scalarizer/basic.ll +++ b/llvm/test/Transforms/Scalarizer/basic.ll @@ -539,6 +539,19 @@ define <2 x float> @f22(<2 x float> %x, <2 x float> %y, <2 x float> %z) { ret <2 x float> %res } +; See https://reviews.llvm.org/D83101#2133062 +define <2 x i32> @f23_crash(<2 x i32> %srcvec, i32 %v1) { +; CHECK-LABEL: @f23_crash( +; CHECK: %1 = extractelement <2 x i32> %srcvec, i32 0 +; CHECK: %t1.upto0 = insertelement <2 x i32> undef, i32 %1, i32 0 +; CHECK: %t1 = insertelement <2 x i32> %t1.upto0, i32 %v1, i32 1 +; CHECK: ret <2 x i32> %t1 + %v0 = extractelement <2 x i32> %srcvec, i32 0 + %t0 = insertelement <2 x i32> undef, i32 %v0, i32 0 + %t1 = insertelement <2 x i32> %t0, i32 %v1, i32 1 + ret <2 x i32> %t1 +} + !0 = !{ !"root" } !1 = !{ !"set1", !0 } !2 = !{ !"set2", !0 } diff --git a/llvm/test/Transforms/Scalarizer/crash-bug.ll b/llvm/test/Transforms/Scalarizer/crash-bug.ll index d0d019564977..d581707971e7 100644 --- a/llvm/test/Transforms/Scalarizer/crash-bug.ll +++ b/llvm/test/Transforms/Scalarizer/crash-bug.ll 
@@ -15,7 +15,6 @@ bb2: ; preds = %bb1 bb1: ; preds = %bb2, %0 %bb1_vec = phi <2 x i16> [ , %0 ], [ %bb2_vec, %bb2 ] ;CHECK: bb1: -;CHECK: %bb1_vec.i0 = phi i16 [ 100, %0 ], [ 0, %bb2 ] ;CHECK: %bb2_vec.i1 = phi i16 [ 200, %0 ], [ %bb2_vec.i1, %bb2 ] br i1 undef, label %bb3, label %bb2 diff --git a/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll b/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll index 8e89efb5d31f..9cfffe3b977f 100644 --- a/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll +++ b/llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll @@ -11,11 +11,8 @@ define i16 @f1() { ; CHECK: for.cond: ; CHECK-NEXT: br i1 undef, label [[FOR_BODY:%.*]], label [[FOR_END]] ; CHECK: for.end: -; CHECK-NEXT: [[PHI_I0:%.*]] = phi i16 [ 1, [[ENTRY:%.*]] ], [ undef, [[FOR_COND]] ] -; CHECK-NEXT: [[PHI_I1:%.*]] = phi i16 [ 1, [[ENTRY]] ], [ undef, [[FOR_COND]] ] -; CHECK-NEXT: [[PHI_I2:%.*]] = phi i16 [ 1, [[ENTRY]] ], [ undef, [[FOR_COND]] ] -; CHECK-NEXT: [[PHI_I3:%.*]] = phi i16 [ 1, [[ENTRY]] ], [ undef, [[FOR_COND]] ] -; CHECK-NEXT: ret i16 [[PHI_I0]] +; CHECK-NEXT: [[EXTRACT:%.*]] = phi i16 [ 1, [[ENTRY:%.*]] ], [ undef, [[FOR_COND]] ] +; CHECK-NEXT: ret i16 [[EXTRACT]] ; entry: br label %for.end From llvm-commits at lists.llvm.org Mon Jul 6 15:13:48 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:13:48 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: <7f0d7a315a713863bc69f6ff29e6d401@localhost.localdomain> lebedev.ri added a comment. In D83101#2134056 , @lebedev.ri wrote: > In D83101#2133062 , @foad wrote: > > > @lebedev.ri this is causing assertion failures and verification failures in some of our downstream tests. 
Here's a test case: > > > > $ cat reduced.ll > > define void @main(<3 x i32> inreg %w) { > > entry: > > %a = extractelement <3 x i32> undef, i32 0 > > %b = extractelement <3 x i32> undef, i32 1 > > %x = extractelement <3 x i32> %w, i32 2 > > %y = insertelement <4 x i32> undef, i32 %x, i32 2 > > %z = insertelement <4 x i32> %y, i32 undef, i32 3 > > store <4 x i32> %z, <4 x i32> addrspace(7)* undef, align 16 > > ret void > > } > > $ ~/llvm-debug/bin/opt -scalarizer -o /dev/null reduced.ll > > Instruction does not dominate all uses! > > = extractelement [145938144 x half] , i32 undef > > %z.upto2 = insertelement <4 x i32> undef, i32 , i32 2 > > in function main > > LLVM ERROR: Broken function found, compilation aborted! > > > > > Thanks for test case, looking. Thank you for a great reproducer! Fixed in rGdb05f2e34a5e9380ddcc199d6687531108d795e4 . Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Mon Jul 6 15:14:30 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:14:30 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <7a359da6fa9fb3b5b3c791218217e78d@localhost.localdomain> lebedev.ri marked 4 inline comments as done. lebedev.ri added inline comments. 
================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp:138 + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); ---------------- Tyker wrote: > lebedev.ri wrote: > > Tyker wrote: > > > lebedev.ri wrote: > > > > Tyker wrote: > > > > > lebedev.ri wrote: > > > > > > Tyker wrote: > > > > > > > lebedev.ri wrote: > > > > > > > > Tyker wrote: > > > > > > > > > lebedev.ri wrote: > > > > > > > > > > Tyker wrote: > > > > > > > > > > > Maybe we should make CallsToRefine a MapVector since the association from a index to a Bundle depends on its order in the map. > > > > > > > > > > > and the key depends on a pointer value that will change when the Module gets cloned. > > > > > > > > > > No, the current logic is correct, that map doesn't live that long. > > > > > > > > > here is the situation i think fails. > > > > > > > > > > > > > > > > > > extractOperandBundesFromModule is called a first time and generate a reduction. > > > > > > > > > the reduction isn't considered interesting. > > > > > > > > > extractOperandBundesFromModule is called a second time with different chunks to keep. > > > > > > > > > but the association from an index to a Bundle is different from the first time because the module isn't the same > > > > > > > > > the association from a index to a Bundle isn't the same, > > > > > > > > > so extractOperandBundesFromModule can remove the same operand bundle a second time and not have tried every operand bundles at the end of the passe. > > > > > > > > > > > > > > > > > > > > > > > > > > In `reduceOperandBundesDeltaPass()`, we tell `runDeltaPass()` how many features (here: bundles) > > > > > > > > we have in this module `M0`. It then comes with different chunks to keep and tells us > > > > > > > > to mutate the module `Mc` (which is a perfect clone of `M0`). > > > > > > > > > > > > > > > > It is up to us to actually enumerate the features (here: bundles). 
> > > > > > > > As long as the mapping is stable, i.e. we get the same result > > > > > > > > when calling `extractOperandBundesFromModule()` on the same input, > > > > > > > > we're good. > > > > > > > > when calling extractOperandBundesFromModule() on the same input, we're good. > > > > > > > > > > > > > > i agree but runDeltaPass makes clones of modules before giving them to `extractOperandBundesFromModule` to modify. > > > > > > > so the pointer values of Modules will not be the same across runs of `extractOperandBundesFromModule` even if the Modules are identical. > > > > > > Sure, but can you point me at the spot where you believe that would matter? > > > > > the key of the Map depends on pointer values > > > > `R.CallsToRefine` is indeed a map, with key being `CallBase *`. > > > > We *just* built that map, for this very `Mc` we were provided. > > > > I'm still not seeing where pointer order would matter.. > > > the map depends on pointer values so its ordering is different on each run. > > > the map is iterated upon and the iteration order affects the mapping from Index to Feature(Bundles) > > > so every run of has a different mapping. > > > > > > > > Ahh, that is what you mean. No, it's the other way around. > > Do you agree that the only place we iterate over the map is: > > ``` > > for_each(R.CallsToRefine, [](const auto &P) { > > return maybeRewriteCallWithDifferentBundles(P.first, P.second); > > }); > > ``` > > ? > > > > But as you can see in `maybeRewriteCallWithDifferentBundles()`, > > it's second param is the indexes of bundle to preserve. > > So we have already performed the mapping before iterating over the map. > you are right, sorry for my confusion. NP, thank you for taking a look! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 From llvm-commits at lists.llvm.org Mon Jul 6 15:15:35 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via llvm-commits) Date: Mon, 06 Jul 2020 15:15:35 -0700 (PDT) Subject: [llvm] c143900 - [llvm-install-name-tool] Merge install-name options Message-ID: <5f03a287.1c69fb81.59363.554f@mx.google.com> Author: Sameer Arora Date: 2020-07-06T15:15:20-07:00 New Revision: c143900a0851b2c7b7d52e4825c7f073b3474cf6 URL: https://github.com/llvm/llvm-project/commit/c143900a0851b2c7b7d52e4825c7f073b3474cf6 DIFF: https://github.com/llvm/llvm-project/commit/c143900a0851b2c7b7d52e4825c7f073b3474cf6.diff LOG: [llvm-install-name-tool] Merge install-name options This diff merges all options for llvm-install-name-tool under a single function processLoadCommands. Also adds another test case for -add_rpath option. Test plan: make check-all Reviewed by: jhenderson, alexshap, smeenai, Ktwu Differential Revision: https://reviews.llvm.org/D82812 Added: Modified: llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/CopyConfig.h llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test index 1435c6b744c8..7b21fdc2e03c 100644 --- a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test +++ b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test @@ -22,6 +22,13 @@ # NO-INPUT: no input file specified +## Add same RPATH twice: +# RUN: not llvm-install-name-tool -add_rpath @executable_X \ +# RUN: -add_rpath @executable_X %t.i386 2>&1 \ +# RUN: | FileCheck --check-prefix=DOUBLE %s + +# DOUBLE: duplicate load command + ## Check that 
cmdsize accounts for NULL terminator. # RUN: yaml2obj %p/Inputs/x86_64.yaml -o %t.x86_64 # RUN: llvm-install-name-tool -add_rpath abcd %t.x86_64 diff --git a/llvm/tools/llvm-objcopy/CopyConfig.cpp b/llvm/tools/llvm-objcopy/CopyConfig.cpp index f93406f371d0..1fde54dd290a 100644 --- a/llvm/tools/llvm-objcopy/CopyConfig.cpp +++ b/llvm/tools/llvm-objcopy/CopyConfig.cpp @@ -874,42 +874,39 @@ parseInstallNameToolOptions(ArrayRef ArgsArr) { auto Match = [=](StringRef RPath) { return RPath == Old || RPath == New; }; // Cannot specify duplicate -rpath entries - auto It1 = find_if(Config.RPathsToUpdate, - [&Match](const std::pair &OldNew) { - return Match(OldNew.first) || Match(OldNew.second); - }); + auto It1 = find_if( + Config.RPathsToUpdate, + [&Match](const DenseMap::value_type &OldNew) { + return Match(OldNew.getFirst()) || Match(OldNew.getSecond()); + }); if (It1 != Config.RPathsToUpdate.end()) - return createStringError( - errc::invalid_argument, - "cannot specify both -rpath %s %s and -rpath %s %s", - It1->first.str().c_str(), It1->second.str().c_str(), - Old.str().c_str(), New.str().c_str()); + return createStringError(errc::invalid_argument, + "cannot specify both -rpath " + It1->getFirst() + + " " + It1->getSecond() + " and -rpath " + + Old + " " + New); // Cannot specify the same rpath under both -delete_rpath and -rpath auto It2 = find_if(Config.RPathsToRemove, Match); if (It2 != Config.RPathsToRemove.end()) - return createStringError( - errc::invalid_argument, - "cannot specify both -delete_rpath %s and -rpath %s %s", - It2->str().c_str(), Old.str().c_str(), New.str().c_str()); + return createStringError(errc::invalid_argument, + "cannot specify both -delete_rpath " + *It2 + + " and -rpath " + Old + " " + New); // Cannot specify the same rpath under both -add_rpath and -rpath auto It3 = find_if(Config.RPathToAdd, Match); if (It3 != Config.RPathToAdd.end()) - return createStringError( - errc::invalid_argument, - "cannot specify both -add_rpath %s and -rpath %s 
%s", - It3->str().c_str(), Old.str().c_str(), New.str().c_str()); + return createStringError(errc::invalid_argument, + "cannot specify both -add_rpath " + *It3 + + " and -rpath " + Old + " " + New); - Config.RPathsToUpdate.emplace_back(Old, New); + Config.RPathsToUpdate.insert({Old, New}); } if (auto *Arg = InputArgs.getLastArg(INSTALL_NAME_TOOL_id)) Config.SharedLibId = Arg->getValue(); for (auto *Arg : InputArgs.filtered(INSTALL_NAME_TOOL_change)) { - Config.InstallNamesToUpdate.emplace_back(Arg->getValue(0), - Arg->getValue(1)); + Config.InstallNamesToUpdate.insert({Arg->getValue(0), Arg->getValue(1)}); } SmallVector Positional; diff --git a/llvm/tools/llvm-objcopy/CopyConfig.h b/llvm/tools/llvm-objcopy/CopyConfig.h index ce119dee5bff..1341dd674c7b 100644 --- a/llvm/tools/llvm-objcopy/CopyConfig.h +++ b/llvm/tools/llvm-objcopy/CopyConfig.h @@ -178,8 +178,8 @@ struct CopyConfig { std::vector DumpSection; std::vector SymbolsToAdd; std::vector RPathToAdd; - std::vector> RPathsToUpdate; - std::vector> InstallNamesToUpdate; + DenseMap RPathsToUpdate; + DenseMap InstallNamesToUpdate; DenseSet RPathsToRemove; // install-name-tool's id option diff --git a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp index 3844b6f62de6..9d0c36630258 100644 --- a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp +++ b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp @@ -42,35 +42,6 @@ static StringRef getPayloadString(const LoadCommand &LC) { .rtrim('\0'); } -static Error removeLoadCommands(const CopyConfig &Config, Object &Obj) { - DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), - Config.RPathsToRemove.end()); - - LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { - StringRef RPath = getPayloadString(LC); - if (RPathsToRemove.count(RPath)) { - RPathsToRemove.erase(RPath); - return true; - } - } - return false; - }; - - if (Error E = 
Obj.removeLoadCommands(RemovePred)) - return E; - - // Emit an error if the Mach-O binary does not contain an rpath path name - // specified in -delete_rpath. - for (StringRef RPath : Config.RPathsToRemove) { - if (RPathsToRemove.count(RPath)) - return createStringError(errc::invalid_argument, - "no LC_RPATH load command with path: %s", - RPath.str().c_str()); - } - return Error::success(); -} - static Error removeSections(const CopyConfig &Config, Object &Obj) { SectionPred RemovePred = [](const std::unique_ptr
&) { return false; @@ -157,6 +128,103 @@ static LoadCommand buildRPathLoadCommand(StringRef Path) { return LC; } +static Error processLoadCommands(const CopyConfig &Config, Object &Obj) { + // Remove RPaths. + DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), + Config.RPathsToRemove.end()); + + LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { + StringRef RPath = getPayloadString(LC); + if (RPathsToRemove.count(RPath)) { + RPathsToRemove.erase(RPath); + return true; + } + } + return false; + }; + + if (Error E = Obj.removeLoadCommands(RemovePred)) + return E; + + // Emit an error if the Mach-O binary does not contain an rpath path name + // specified in -delete_rpath. + for (StringRef RPath : Config.RPathsToRemove) { + if (RPathsToRemove.count(RPath)) + return createStringError(errc::invalid_argument, + "no LC_RPATH load command with path: %s", + RPath.str().c_str()); + } + + DenseSet RPaths; + + // Get all existing RPaths. + for (LoadCommand &LC : Obj.LoadCommands) { + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) + RPaths.insert(getPayloadString(LC)); + } + + // Throw errors for invalid RPaths. + for (const auto &OldNew : Config.RPathsToUpdate) { + StringRef Old, New; + std::tie(Old, New) = OldNew; + if (RPaths.count(Old) == 0) + return createStringError(errc::invalid_argument, + "no LC_RPATH load command with path: " + Old); + if (RPaths.count(New) != 0) + return createStringError(errc::invalid_argument, + "rpath " + New + + " would create a duplicate load command"); + } + + // Update load commands. 
+ for (LoadCommand &LC : Obj.LoadCommands) { + switch (LC.MachOLoadCommand.load_command_data.cmd) { + case MachO::LC_ID_DYLIB: + if (Config.SharedLibId) { + StringRef Id = Config.SharedLibId.getValue(); + if (Id.empty()) + return createStringError(errc::invalid_argument, + "cannot specify an empty id"); + updateLoadCommandPayloadString(LC, Id); + } + break; + + case MachO::LC_RPATH: { + StringRef RPath = getPayloadString(LC); + StringRef NewRPath = Config.RPathsToUpdate.lookup(RPath); + if (!NewRPath.empty()) + updateLoadCommandPayloadString(LC, NewRPath); + break; + } + + // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and LC_LOAD_UPWARD_DYLIB + // here once llvm-objcopy supports them. + case MachO::LC_LOAD_DYLIB: + case MachO::LC_LOAD_WEAK_DYLIB: + StringRef InstallName = getPayloadString(LC); + StringRef NewInstallName = + Config.InstallNamesToUpdate.lookup(InstallName); + if (!NewInstallName.empty()) + updateLoadCommandPayloadString(LC, + NewInstallName); + break; + } + } + + // Add new RPaths. + for (StringRef RPath : Config.RPathToAdd) { + if (RPaths.count(RPath) != 0) + return createStringError(errc::invalid_argument, + "rpath " + RPath + + " would create a duplicate load command"); + RPaths.insert(RPath); + Obj.addLoadCommand(buildRPathLoadCommand(RPath)); + } + + return Error::success(); +} + static Error dumpSectionToFile(StringRef SecName, StringRef Filename, Object &Obj) { for (LoadCommand &LC : Obj.LoadCommands) @@ -273,34 +341,6 @@ static Error handleArgs(const CopyConfig &Config, Object &Obj) { for (std::unique_ptr
&Sec : LC.Sections) Sec->Relocations.clear(); - for (LoadCommand &LC : Obj.LoadCommands) { - switch (LC.MachOLoadCommand.load_command_data.cmd) { - case MachO::LC_ID_DYLIB: - if (Config.SharedLibId) { - StringRef Id = Config.SharedLibId.getValue(); - if (Id.empty()) - return createStringError(errc::invalid_argument, - "cannot specify an empty id"); - updateLoadCommandPayloadString(LC, Id); - } - break; - - // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and LC_LOAD_UPWARD_DYLIB - // here once llvm-objcopy supports them. - case MachO::LC_LOAD_DYLIB: - case MachO::LC_LOAD_WEAK_DYLIB: - StringRef Old, New; - StringRef CurrentInstallName = getPayloadString(LC); - for (const auto &InstallNamePair : Config.InstallNamesToUpdate) { - std::tie(Old, New) = InstallNamePair; - if (CurrentInstallName == Old) { - updateLoadCommandPayloadString(LC, New); - break; - } - } - } - } - for (const auto &Flag : Config.AddSection) { std::pair SecPair = Flag.split("="); StringRef SecName = SecPair.first; @@ -311,45 +351,9 @@ static Error handleArgs(const CopyConfig &Config, Object &Obj) { return E; } - if (Error E = removeLoadCommands(Config, Obj)) + if (Error E = processLoadCommands(Config, Obj)) return E; - StringRef Old, New; - for (const auto &OldNew : Config.RPathsToUpdate) { - std::tie(Old, New) = OldNew; - - auto FindRPathLC = [&Obj](StringRef RPath) { - return find_if(Obj.LoadCommands, [=](const LoadCommand &LC) { - return LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH && - getPayloadString(LC) == RPath; - }); - }; - - auto NewIt = FindRPathLC(New); - if (NewIt != Obj.LoadCommands.end()) - return createStringError(errc::invalid_argument, - "rpath " + New + - " would create a duplicate load command"); - - auto OldIt = FindRPathLC(Old); - if (OldIt == Obj.LoadCommands.end()) - return createStringError(errc::invalid_argument, - "no LC_RPATH load command with path: " + Old); - - updateLoadCommandPayloadString(*OldIt, New); - } - - for (StringRef RPath : 
Config.RPathToAdd) { - for (LoadCommand &LC : Obj.LoadCommands) { - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH && - RPath == getPayloadString(LC)) { - return createStringError(errc::invalid_argument, - "rpath " + RPath + - " would create a duplicate load command"); - } - } - Obj.addLoadCommand(buildRPathLoadCommand(RPath)); - } return Error::success(); } From llvm-commits at lists.llvm.org Mon Jul 6 15:15:41 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:15:41 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGc143900a0851: [llvm-install-name-tool] Merge install-name options (authored by sameerarora101). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 Files: llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/CopyConfig.h llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82812.275846.patch Type: text/x-patch Size: 12613 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 15:15:50 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:15:50 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: jdenny added inline comments. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:296 + + OS << "\n"; // Empty line at end of file } ---------------- clementval wrote: > jdenny wrote: > > clementval wrote: > > > jdenny wrote: > > > > Why is an empty line needed? 
> > > Just to be consistent with clang-format in the generated file. > > It's surprising that clang-format would require an empty line at the end of the file. Any idea why? > Sorry, it is not clang-format but Phabricator which signals a missing empty line at the end of the file. This is not mandatory and can be removed. The file is generated. I just wanted to be consistent with the style. Are you referring to, for example, the version of `directive2.td` on the left side of this diff? Here, Phabricator points out that the final line in the file is not newline-terminated. I don't recall seeing Phabricator suggest an additional empty line. Likewise, in `EmitDirectivesImpl`, the final `GenerateIsAllowedClause` call already newline-terminates the final line of the generated file, so I don't see a need for an additional empty line. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Mon Jul 6 15:18:15 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 15:18:15 -0700 (PDT) Subject: [llvm] 69dca6e - [NFCI][IR] Introduce CallBase::Create() wrapper Message-ID: <5f03a327.1c69fb81.d5e8c.36f8@mx.google.com> Author: Roman Lebedev Date: 2020-07-07T01:16:36+03:00 New Revision: 69dca6efc60a40a939ca5025a8c716e891c2847a URL: https://github.com/llvm/llvm-project/commit/69dca6efc60a40a939ca5025a8c716e891c2847a DIFF: https://github.com/llvm/llvm-project/commit/69dca6efc60a40a939ca5025a8c716e891c2847a.diff LOG: [NFCI][IR] Introduce CallBase::Create() wrapper Summary: It is reasonably common to want to clone some call with different bundles. Let's actually provide an interface to do that.
Reviewers: chandlerc, jdoerfert, dblaikie, nickdesaulniers Reviewed By: nickdesaulniers Subscribers: llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D83248 Added: Modified: llvm/include/llvm/IR/InstrTypes.h llvm/lib/IR/Instructions.cpp llvm/lib/Transforms/CFGuard/CFGuard.cpp llvm/lib/Transforms/IPO/GlobalOpt.cpp llvm/lib/Transforms/Utils/InlineFunction.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h index 06119f32aedc..770d3183a909 100644 --- a/llvm/include/llvm/IR/InstrTypes.h +++ b/llvm/include/llvm/IR/InstrTypes.h @@ -1135,6 +1135,15 @@ class CallBase : public Instruction { public: using Instruction::getContext; + /// Create a clone of \p CB with a different set of operand bundles and + /// insert it before \p InsertPt. + /// + /// The returned call instruction is identical \p CB in every way except that + /// the operand bundles for the new instruction are set to the operand bundles + /// in \p Bundles. 
+ static CallBase *Create(CallBase *CB, ArrayRef<OperandBundleDef> Bundles, + Instruction *InsertPt = nullptr); + static bool classof(const Instruction *I) { return I->getOpcode() == Instruction::Call || I->getOpcode() == Instruction::Invoke || diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp index 78887a63b726..e22f609b1885 100644 --- a/llvm/lib/IR/Instructions.cpp +++ b/llvm/lib/IR/Instructions.cpp @@ -247,6 +247,20 @@ void LandingPadInst::addClause(Constant *Val) { // CallBase Implementation //===----------------------------------------------------------------------===// +CallBase *CallBase::Create(CallBase *CB, ArrayRef<OperandBundleDef> Bundles, + Instruction *InsertPt) { + switch (CB->getOpcode()) { + case Instruction::Call: + return CallInst::Create(cast<CallInst>(CB), Bundles, InsertPt); + case Instruction::Invoke: + return InvokeInst::Create(cast<InvokeInst>(CB), Bundles, InsertPt); + case Instruction::CallBr: + return CallBrInst::Create(cast<CallBrInst>(CB), Bundles, InsertPt); + default: + llvm_unreachable("Unknown CallBase sub-class!"); + } +} + Function *CallBase::getCaller() { return getParent()->getParent(); } unsigned CallBase::getNumSubclassExtraOperandsDynamic() const { diff --git a/llvm/lib/Transforms/CFGuard/CFGuard.cpp b/llvm/lib/Transforms/CFGuard/CFGuard.cpp index e4f338b2d9e9..96c083a144b2 100644 --- a/llvm/lib/Transforms/CFGuard/CFGuard.cpp +++ b/llvm/lib/Transforms/CFGuard/CFGuard.cpp @@ -204,14 +204,9 @@ void CFGuard::insertCFGuardDispatch(CallBase *CB) { Bundles.emplace_back("cfguardtarget", CalledOperand); // Create a copy of the call/invoke instruction and add the new bundle. 
- CallBase *NewCB; - if (CallInst *CI = dyn_cast<CallInst>(CB)) { - NewCB = CallInst::Create(CI, Bundles, CB); - } else { - assert(isa<InvokeInst>(CB) && "Unknown indirect call type"); - InvokeInst *II = cast<InvokeInst>(CB); - NewCB = llvm::InvokeInst::Create(II, Bundles, CB); - } + assert((isa<CallInst>(CB) || isa<InvokeInst>(CB)) && + "Unknown indirect call type"); + CallBase *NewCB = CallBase::Create(CB, Bundles, CB); // Change the target of the call to be the guard dispatch function. NewCB->setCalledOperand(GuardDispatchLoad); diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp index 437451b206e8..ed989529d1ae 100644 --- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp +++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp @@ -2321,13 +2321,10 @@ static void RemovePreallocated(Function *F) { assert(PreallocatedSetup && "Did not find preallocated bundle"); uint64_t ArgCount = cast<ConstantInt>(PreallocatedSetup->getArgOperand(0))->getZExtValue(); - CallBase *NewCB = nullptr; - if (InvokeInst *II = dyn_cast<InvokeInst>(CB)) { - NewCB = InvokeInst::Create(II, OpBundles, CB); - } else { - CallInst *CI = cast<CallInst>(CB); - NewCB = CallInst::Create(CI, OpBundles, CB); - } + + assert((isa<CallInst>(CB) || isa<InvokeInst>(CB)) && + "Unknown indirect call type"); + CallBase *NewCB = CallBase::Create(CB, OpBundles, CB); CB->replaceAllUsesWith(NewCB); NewCB->takeName(CB); CB->eraseFromParent(); diff --git a/llvm/lib/Transforms/Utils/InlineFunction.cpp b/llvm/lib/Transforms/Utils/InlineFunction.cpp index 203e812cd4a2..b0b7ca484798 100644 --- a/llvm/lib/Transforms/Utils/InlineFunction.cpp +++ b/llvm/lib/Transforms/Utils/InlineFunction.cpp @@ -1864,13 +1864,7 @@ llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI, OpDefs.emplace_back("deopt", std::move(MergedDeoptArgs)); } - Instruction *NewI = nullptr; - if (isa<CallInst>(ICS)) - NewI = CallInst::Create(cast<CallInst>(ICS), OpDefs, ICS); - else if (isa<CallBrInst>(ICS)) - NewI = CallBrInst::Create(cast<CallBrInst>(ICS), OpDefs, ICS); - else - NewI = InvokeInst::Create(cast<InvokeInst>(ICS), OpDefs, ICS); + Instruction *NewI = 
CallBase::Create(ICS, OpDefs, ICS); // Note: the RAUW does the appropriate fixup in VMap, so we need to do // this even if the call returns void. @@ -2166,13 +2160,7 @@ llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI, I->getOperandBundlesAsDefs(OpBundles); OpBundles.emplace_back("funclet", CallSiteEHPad); - Instruction *NewInst; - if (auto *CallI = dyn_cast(I)) - NewInst = CallInst::Create(CallI, OpBundles, CallI); - else if (auto *CallBrI = dyn_cast(I)) - NewInst = CallBrInst::Create(CallBrI, OpBundles, CallBrI); - else - NewInst = InvokeInst::Create(cast(I), OpBundles, I); + Instruction *NewInst = CallBase::Create(I, OpBundles, I); NewInst->takeName(I); I->replaceAllUsesWith(NewInst); I->eraseFromParent(); From llvm-commits at lists.llvm.org Mon Jul 6 15:18:17 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 15:18:17 -0700 (PDT) Subject: [llvm] 05f2b5c - [llvm-reduce] Reducing call operand bundles Message-ID: <5f03a329.1c69fb81.34e84.32cb@mx.google.com> Author: Roman Lebedev Date: 2020-07-07T01:16:37+03:00 New Revision: 05f2b5ccfc5d8b1f182b00fc80dfbe804fd0357a URL: https://github.com/llvm/llvm-project/commit/05f2b5ccfc5d8b1f182b00fc80dfbe804fd0357a DIFF: https://github.com/llvm/llvm-project/commit/05f2b5ccfc5d8b1f182b00fc80dfbe804fd0357a.diff LOG: [llvm-reduce] Reducing call operand bundles Summary: This would have been marginally useful to me during/for rG7ea46aee3670981827c04df89b2c3a1cbdc7561b. With ongoing migration to representing assumes via operand bundles on the assume, this will be gradually more useful. 
Reviewers: nickdesaulniers, diegotf, dblaikie, george.burgess.iv, jdoerfert, Tyker Reviewed By: nickdesaulniers Subscribers: hiraditya, mgorny, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83177 Added: llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h Modified: llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn Removed: ################################################################################ diff --git a/llvm/test/Reduce/remove-operand-bundles.ll b/llvm/test/Reduce/remove-operand-bundles.ll new file mode 100644 index 000000000000..39c0af6c9ae5 --- /dev/null +++ b/llvm/test/Reduce/remove-operand-bundles.ll @@ -0,0 +1,41 @@ +; Test that llvm-reduce can remove uninteresting operand bundles from calls. +; +; RUN: rm -rf %t +; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t +; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s + +; CHECK-ALL: declare void @f1() +; CHECK-ALL: declare void @f2() +; CHECK-ALL: declare void @f3() +declare void @f1() +declare void @f2() +declare void @f3() + +; CHECK-FINAL-LABEL: define void @interesting(i32 %arg0, i32 %arg1, i32 %arg2) { +; CHECK-FINAL-NEXT: entry: +; CHECK-FINAL-NEXT: call void @f1() [ "bundle0"(), "align"(i32 %arg0), "whatever0"() ] +; CHECK-FINAL-NEXT: call void @f2() +; CHECK-FINAL-NEXT: call void @f3() [ "align"(i32 %arg2) ] +; CHECK-FINAL-NEXT: ret void +; CHECK-FINAL-NEXT: } +define void @interesting(i32 %arg0, i32 %arg1, i32 %arg2) { +entry: +; CHECK-INTERESTINGNESS-LABEL: @interesting( + +; CHECK-INTERESTINGNESS: call void @f1() +; CHECK-INTERESTINGNESS: "bundle0"() +; CHECK-INTERESTINGNESS: "align"(i32 %arg0) +; CHECK-INTERESTINGNESS: "whatever0"() + +; CHECK-INTERESTINGNESS: call void @f2() + 
+; CHECK-INTERESTINGNESS: call void @f3() +; CHECK-INTERESTINGNESS: "align"(i32 %arg2) + +; CHECK-INTERESTINGNESS: ret + + call void @f1() [ "bundle0"(), "align"(i32 %arg0), "whatever0"() ] + call void @f2() [ "align"(i32 %arg1), "whatever1"(), "bundle1"() ] + call void @f3() [ "whatever2"(), "bundle2"(), "align"(i32 %arg2) ] + ret void +} diff --git a/llvm/tools/llvm-reduce/CMakeLists.txt b/llvm/tools/llvm-reduce/CMakeLists.txt index 48de0ffa78a1..24eedac613f5 100644 --- a/llvm/tools/llvm-reduce/CMakeLists.txt +++ b/llvm/tools/llvm-reduce/CMakeLists.txt @@ -11,15 +11,16 @@ set(LLVM_LINK_COMPONENTS ) add_llvm_tool(llvm-reduce - llvm-reduce.cpp TestRunner.cpp deltas/Delta.cpp - deltas/ReduceFunctions.cpp - deltas/ReduceGlobalVars.cpp - deltas/ReduceMetadata.cpp deltas/ReduceArguments.cpp deltas/ReduceBasicBlocks.cpp + deltas/ReduceFunctions.cpp + deltas/ReduceGlobalVars.cpp deltas/ReduceInstructions.cpp + deltas/ReduceMetadata.cpp + deltas/ReduceOperandBundles.cpp + llvm-reduce.cpp DEPENDS intrinsics_gen diff --git a/llvm/tools/llvm-reduce/DeltaManager.h b/llvm/tools/llvm-reduce/DeltaManager.h index 2309c3adf4e6..5635352b43d8 100644 --- a/llvm/tools/llvm-reduce/DeltaManager.h +++ b/llvm/tools/llvm-reduce/DeltaManager.h @@ -17,8 +17,9 @@ #include "deltas/ReduceBasicBlocks.h" #include "deltas/ReduceFunctions.h" #include "deltas/ReduceGlobalVars.h" -#include "deltas/ReduceMetadata.h" #include "deltas/ReduceInstructions.h" +#include "deltas/ReduceMetadata.h" +#include "deltas/ReduceOperandBundles.h" namespace llvm { @@ -30,6 +31,7 @@ inline void runDeltaPasses(TestRunner &Tester) { reduceMetadataDeltaPass(Tester); reduceArgumentsDeltaPass(Tester); reduceInstructionsDeltaPass(Tester); + reduceOperandBundesDeltaPass(Tester); // TODO: Implement the remaining Delta Passes } diff --git a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp new file mode 100644 index 000000000000..c6de6e9d567c --- /dev/null +++ 
b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -0,0 +1,161 @@ +//===- ReduceOperandBundes.cpp - Specialized Delta Pass -------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This file implements a function which calls the Generic Delta pass in order +// to reduce uninteresting operand bundes from calls. +// +//===----------------------------------------------------------------------===// + +#include "ReduceOperandBundles.h" +#include "Delta.h" +#include "TestRunner.h" +#include "llvm/ADT/ArrayRef.h" +#include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/ScopeExit.h" +#include "llvm/ADT/Sequence.h" +#include "llvm/ADT/iterator_range.h" +#include "llvm/IR/InstVisitor.h" +#include "llvm/IR/InstrTypes.h" +#include "llvm/Support/raw_ostream.h" +#include +#include +#include + +namespace { +class Module; +} // namespace + +using namespace llvm; + +namespace { + +/// Provides opaque interface for querying into ChunksToKeep without having to +/// actually understand what is going on. +struct Oracle { + /// Out of all the features that we promised to be, + /// how many have we already processed? 1-based! + int Index = 1; + + /// The actual workhorse, contains the knowledge whether or not + /// some particular feature should be preserved this time. + ArrayRef ChunksToKeep; + +public: + Oracle(ArrayRef ChunksToKeep_) : ChunksToKeep(ChunksToKeep_) {} + + /// Should be called for each feature on which we are operating. + /// Name is self-explanatory - if returns true, then it should be preserved. + bool shouldKeep() { + if (ChunksToKeep.empty()) + return false; // All further features are to be discarded. + + // Does the current (front) chunk contain such a feature? 
+ bool ShouldKeep = ChunksToKeep.front().contains(Index); + auto _ = make_scope_exit([&]() { ++Index; }); // Next time - next feature. + + // Is this the last feature in the chunk? + if (ChunksToKeep.front().end == Index) + ChunksToKeep = ChunksToKeep.drop_front(); // Onto next chunk. + + return ShouldKeep; + } +}; + +/// Given ChunksToKeep, produce a map of calls and indexes of operand bundles +/// to be preserved for each call. +class OperandBundleRemapper : public InstVisitor { + Oracle O; + +public: + DenseMap> CallsToRefine; + + explicit OperandBundleRemapper(ArrayRef ChunksToKeep) + : O(ChunksToKeep) {} + + /// So far only CallBase sub-classes can have operand bundles. + /// Let's see which of the operand bundles of this call are to be kept. + void visitCallBase(CallBase &Call) { + if (!Call.hasOperandBundles()) + return; // No bundles to begin with. + + // Insert this call into map, we will likely want to rebuild it. + auto &OperandBundlesToKeepIndexes = CallsToRefine[&Call]; + OperandBundlesToKeepIndexes.reserve(Call.getNumOperandBundles()); + + // Enumerate every operand bundle on this call. + for_each(seq(0U, Call.getNumOperandBundles()), [&](unsigned BundleIndex) { + if (O.shouldKeep()) // Should we keep this one? + OperandBundlesToKeepIndexes.emplace_back(BundleIndex); + }); + } +}; + +struct OperandBundleCounter : public InstVisitor { + /// How many features (in this case, operand bundles) did we count, total? + int OperandBundeCount = 0; + + OperandBundleCounter() {} + + /// So far only CallBase sub-classes can have operand bundles. + void visitCallBase(CallBase &Call) { + // Just accumulate the total number of operand bundles. 
+ OperandBundeCount += Call.getNumOperandBundles(); + } +}; + +} // namespace + +static void maybeRewriteCallWithDifferentBundles( + CallBase *OrigCall, ArrayRef OperandBundlesToKeepIndexes) { + if (OperandBundlesToKeepIndexes.size() == OrigCall->getNumOperandBundles()) + return; // Not modifying operand bundles of this call after all. + + std::vector NewBundles; + NewBundles.reserve(OperandBundlesToKeepIndexes.size()); + + // Actually copy over the bundles that we want to keep. + transform(OperandBundlesToKeepIndexes, std::back_inserter(NewBundles), + [OrigCall](unsigned Index) { + return OperandBundleDef(OrigCall->getOperandBundleAt(Index)); + }); + + // Finally actually replace the bundles on the call. + CallBase *NewCall = CallBase::Create(OrigCall, NewBundles, OrigCall); + OrigCall->replaceAllUsesWith(NewCall); + OrigCall->eraseFromParent(); +} + +/// Removes out-of-chunk operand bundles from calls. +static void extractOperandBundesFromModule(std::vector ChunksToKeep, + Module *Program) { + OperandBundleRemapper R(ChunksToKeep); + R.visit(Program); + + for_each(R.CallsToRefine, [](const auto &P) { + return maybeRewriteCallWithDifferentBundles(P.first, P.second); + }); +} + +/// Counts the amount of operand bundles. 
+static int countOperandBundes(Module *Program) { + OperandBundleCounter C; + + // TODO: Silence index with --quiet flag + outs() << "----------------------------\n"; + C.visit(Program); + outs() << "Number of operand bundles: " << C.OperandBundeCount << "\n"; + + return C.OperandBundeCount; +} + +void llvm::reduceOperandBundesDeltaPass(TestRunner &Test) { + outs() << "*** Reducing OperandBundes...\n"; + int OperandBundeCount = countOperandBundes(Test.getProgram()); + runDeltaPass(Test, OperandBundeCount, extractOperandBundesFromModule); +} diff --git a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h new file mode 100644 index 000000000000..382c5cb5691d --- /dev/null +++ b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h @@ -0,0 +1,20 @@ +//===- ReduceOperandBundes.h - Specialized Delta Pass ---------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This file implements a function which calls the Generic Delta pass in order +// to reduce uninteresting operand bundes from calls. 
+// +//===----------------------------------------------------------------------===// + +namespace llvm { + +class TestRunner; + +void reduceOperandBundesDeltaPass(TestRunner &Test); + +} // namespace llvm diff --git a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn index efa80c1b86d8..34e99f4fe32a 100644 --- a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn @@ -17,6 +17,7 @@ executable("llvm-reduce") { "deltas/ReduceGlobalVars.cpp", "deltas/ReduceInstructions.cpp", "deltas/ReduceMetadata.cpp", + "deltas/ReduceOperandBundes.cpp", "llvm-reduce.cpp", ] } From llvm-commits at lists.llvm.org Mon Jul 6 15:18:20 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:18:20 +0000 (UTC) Subject: [PATCH] D83248: [NFCI][IR] Introduce CallBase::Create() wrapper In-Reply-To: References: Message-ID: <1324aea7feb2dfd7a20087cb445e20a7@localhost.localdomain> This revision was automatically updated to reflect the committed changes. lebedev.ri marked an inline comment as done. Closed by commit rG69dca6efc60a: [NFCI][IR] Introduce CallBase::Create() wrapper (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83248/new/ https://reviews.llvm.org/D83248 Files: llvm/include/llvm/IR/InstrTypes.h llvm/lib/IR/Instructions.cpp llvm/lib/Transforms/CFGuard/CFGuard.cpp llvm/lib/Transforms/IPO/GlobalOpt.cpp llvm/lib/Transforms/Utils/InlineFunction.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83248.275847.patch Type: text/x-patch Size: 5210 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 15:18:25 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:18:25 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: <79aaf798c224b5bcab372d95d0a0dc88@localhost.localdomain> This revision was automatically updated to reflect the committed changes. lebedev.ri marked an inline comment as done. Closed by commit rG05f2b5ccfc5d: [llvm-reduce] Reducing call operand bundles (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 Files: llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83177.275848.patch Type: text/x-patch Size: 10493 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 15:20:30 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:20:30 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: <76676dd77b749f3b9a4696b2c01a4d01@localhost.localdomain> clementval marked 2 inline comments as done. clementval added inline comments. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:296 + + OS << "\n"; // Empty line at end of file } ---------------- jdenny wrote: > clementval wrote: > > jdenny wrote: > > > clementval wrote: > > > > jdenny wrote: > > > > > Why is an empty line needed? 
> > > > Just to be consistent with clang-format in the generated file. > > > It's surprising that clang-format would require an empty line at the end of the file. Any idea why? > > Sorry, it is not clang-format but Phabricator which signal a missing empty line at end of file. This is not mandatory and can be removed. The file is generated. I just wanted to be consistent with the style. > Are you referring to, for example, the version of `directive2.td` on the left side of this diff? Here, phabricator points out that the final line in the file is not newline-terminated. I don't recall seeing phabricator suggest an additional empty line. > > Likewise, in `EmitDirectivesImpl`, the final `GenerateIsAllowedClause` call already newline-terminates the final line of the generated file, so I don't see a need for an additional empty line. Your are correct. I removed this since it was a confusion on my side. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Mon Jul 6 15:21:06 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:21:06 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: clementval updated this revision to Diff 275850. clementval marked an inline comment as done. clementval added a comment. 
Rebase + remove useless new line generation Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82982.275850.patch Type: text/x-patch Size: 98053 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 15:22:19 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:22:19 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <3781945200147de3a2a6d9f0ea485c98@localhost.localdomain> nemanjai accepted this revision. nemanjai added a comment. This revision is now accepted and ready to land. Other than removing the assert, this LGTM. ================ Comment at: clang/lib/Headers/altivec.h:17116 + vector signed int __a, const unsigned int __b, const signed int __c) { + assert((__b == 0 || __b == 1) && "The second argument must be 0 or 1"); +#ifdef __LITTLE_ENDIAN__ ---------------- Sorry that I didn't really pay close enough attention here previously, but please no asserts in the header. 1. The user of the header may not include `<assert.h>`. 2. The compiler should not be injecting asserts into user code. 
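One way to act on the two points above, sketched here only as an illustration: a header-provided helper can fold its index argument into the valid range instead of asserting on it, so it needs no `<assert.h>` and injects no aborts into user code. The function name `splat_into_pair`, its signature, and the choice of masking are assumptions made for this example; this is not the `altivec.h` code under review and not necessarily the change the patch author adopted.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a header-provided helper (not the altivec.h code).
// It writes `value` into element `idx` of each two-element half of `v`.
inline std::array<std::int32_t, 4>
splat_into_pair(std::array<std::int32_t, 4> v, unsigned idx,
                std::int32_t value) {
  // Fold any index onto 0 or 1 rather than asserting that it already is one.
  const unsigned i = idx & 1u;
  v[i] = value;     // element i of the first half
  v[2 + i] = value; // element i of the second half
  return v;
}
```

With this shape an out-of-range index such as 2 is silently treated as 0; whether that is acceptable, or whether a compile-time diagnostic would be preferable, is a design decision for the header's maintainers.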
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 From llvm-commits at lists.llvm.org Mon Jul 6 15:22:58 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:22:58 +0000 (UTC) Subject: [PATCH] D83195: [CodeGen] Fix a warning in DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR In-Reply-To: References: Message-ID: <434dd1521838ef49dd2695ae794ca15e@localhost.localdomain> fpetrogalli added a comment. @david-arm , can you please update the patch adding context? Thanks! Francesco Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83195/new/ https://reviews.llvm.org/D83195 From llvm-commits at lists.llvm.org Mon Jul 6 15:23:43 2020 From: llvm-commits at lists.llvm.org (Hideto Ueno via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:23:43 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: <7ec121f4864f25c14c6410d3be0fddc9@localhost.localdomain> uenoku added a comment. In D83185#2134395 , @kuter wrote: > In D83185#2134275 , @uenoku wrote: > > > Either is fine but I think it is more natural to forbid an empty list. > > > Do you mean returning a error if a empty `--attributor-seed-allow-list` option is present ? > Currently the size of list is being used to tell if a list is present or not. > I think I can use `getNumOccurrences()` to replace this behaviour . 
Yes, I think replacing ZeroOrMore with OneOrMore is enough. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Mon Jul 6 15:27:29 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:27:29 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: <967620f7232ddc494322f81f6e2ea01a@localhost.localdomain> smeenai added inline comments. ================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:65-66 // specified in -delete_rpath. for (StringRef RPath : Config.RPathsToRemove) { if (RPathsToRemove.count(RPath)) return createStringError(errc::invalid_argument, ---------------- sameerarora101 wrote: > smeenai wrote: > > Not related to this diff, but something I noticed: given that `RPathsToRemove` starts out as a copy of `Config.RPathsToRemove` and you're just removing elements from it, can't you just error out for any remaining entries in `RPathsToRemove` instead? > So the idea here was to raise an error for the **first** specified RPath that doesn't exist. For that purpose, we iterate through `Config.RPathsToRemove` and raise an error for the **first** RPath not deleted. Got it. `RPathsToRemove` is a DenseSet though, and I don't believe that guarantees that iteration order will be the same as insertion order (or even that iteration order will be deterministic); you'd need a [SetVector](https://llvm.org/docs/ProgrammersManual.html#llvm-adt-setvector-h) for that. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 From llvm-commits at lists.llvm.org Mon Jul 6 15:27:43 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:27:43 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. 
In-Reply-To: References: Message-ID: kuter added a comment. In D83185#2134444 , @uenoku wrote: > In D83185#2134395 , @kuter wrote: > > > In D83185#2134275 , @uenoku wrote: > > > > > Either is fine but I think it is more natural to forbid an empty list. > > > > > > Do you mean returning a error if a empty `--attributor-seed-allow-list` option is present ? > > Currently the size of list is being used to tell if a list is present or not. > > I think I can use `getNumOccurrences()` to replace this behaviour . > > > Yes, I think replacing ZeroOrMore with OneOrMore is enough I think `cl::OneOrMore` causes an error to be generated if the option is not specified. That would make it mandatory to have at least one `--attributor-seed-allow-list` parameter for every invocation of opt. https://llvm.org/docs/CommandLine.html > The cl::OneOrMore modifier indicates that the option must be specified at least one time. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Mon Jul 6 15:31:31 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:31:31 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: <71ab6b8caddeb0838270a77bb9451e8b@localhost.localdomain> jdenny accepted this revision. jdenny added a comment. This revision is now accepted and ready to land. LGTM. Thanks for addressing my concerns. The TODOs we talked about can be handled later, once @jdoerfert replies. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Mon Jul 6 15:32:22 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:32:22 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: <1e34615cd947f0dce44a716ee4591a7d@localhost.localdomain> amyk updated this revision to Diff 275845. amyk added a comment. Addressed comments from Nemanja: - various variable changes - update comment/documentation - corrected the index for the `xxsplti32dx` instruction for LE/BE - Updated the instruction to use `u1imm` instead of `i1imm` so the index in assembly can be `0/1`, and this allows us the index to be `i32` in the pattern. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/p10-splatImm32.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83245.275845.patch Type: text/x-patch Size: 11108 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 15:32:38 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Mon, 06 Jul 2020 15:32:38 -0700 (PDT) Subject: [llvm] fc4f5d6 - [NFCI][llvm-reduce] ReduceOperandBundles: actually put Module forward-declaration back into llvm namespace Message-ID: <5f03a686.1c69fb81.70bb1.8c08@mx.google.com> Author: Roman Lebedev Date: 2020-07-07T01:32:26+03:00 New Revision: fc4f5d65848015217dd227d15da04b8395166407 URL: https://github.com/llvm/llvm-project/commit/fc4f5d65848015217dd227d15da04b8395166407 DIFF: https://github.com/llvm/llvm-project/commit/fc4f5d65848015217dd227d15da04b8395166407.diff LOG: [NFCI][llvm-reduce] ReduceOperandBundles: actually put Module forward-declaration back into llvm namespace Added: Modified: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Removed: ################################################################################ diff --git a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp index c6de6e9d567c..23a1ae3909b1 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -27,9 +27,9 @@ #include #include -namespace { +namespace llvm { class Module; -} // namespace +} // namespace llvm using namespace llvm; From llvm-commits at lists.llvm.org Mon Jul 6 15:39:49 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via llvm-commits) Date: Mon, 06 Jul 2020 15:39:49 -0700 (PDT) Subject: [llvm] 1e495e1 - [NFC] change getLimitedCodeGenPipelineReason to static function Message-ID: <5f03a835.1c69fb81.a0773.35b3@mx.google.com> Author: Yuanfang Chen Date: 2020-07-06T15:39:27-07:00 New Revision: 1e495e10e6c87c2e7dd9ee7cac9352223b72006b URL: https://github.com/llvm/llvm-project/commit/1e495e10e6c87c2e7dd9ee7cac9352223b72006b DIFF: 
https://github.com/llvm/llvm-project/commit/1e495e10e6c87c2e7dd9ee7cac9352223b72006b.diff LOG: [NFC] change getLimitedCodeGenPipelineReason to static function Added: Modified: llvm/include/llvm/CodeGen/TargetPassConfig.h llvm/lib/CodeGen/TargetPassConfig.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/CodeGen/TargetPassConfig.h b/llvm/include/llvm/CodeGen/TargetPassConfig.h index 3a4c09163f32..a18c8b16bf1c 100644 --- a/llvm/include/llvm/CodeGen/TargetPassConfig.h +++ b/llvm/include/llvm/CodeGen/TargetPassConfig.h @@ -167,8 +167,8 @@ class TargetPassConfig : public ImmutablePass { /// If hasLimitedCodeGenPipeline is true, this method /// returns a string with the name of the options, separated /// by \p Separator that caused this pipeline to be limited. - std::string - getLimitedCodeGenPipelineReason(const char *Separator = "/") const; + static std::string + getLimitedCodeGenPipelineReason(const char *Separator = "/"); void setDisableVerify(bool Disable) { setOpt(DisableVerify, Disable); } diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp index 241357be5394..e0fdb0cefcb8 100644 --- a/llvm/lib/CodeGen/TargetPassConfig.cpp +++ b/llvm/lib/CodeGen/TargetPassConfig.cpp @@ -472,7 +472,7 @@ bool TargetPassConfig::hasLimitedCodeGenPipeline() { } std::string -TargetPassConfig::getLimitedCodeGenPipelineReason(const char *Separator) const { +TargetPassConfig::getLimitedCodeGenPipelineReason(const char *Separator) { if (!hasLimitedCodeGenPipeline()) return std::string(); std::string Res; From llvm-commits at lists.llvm.org Mon Jul 6 15:40:26 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Mon, 06 Jul 2020 15:40:26 -0700 (PDT) Subject: [llvm] 4029f8e - Temporarily Revert "[llvm-install-name-tool] Merge install-name options" as it breaks the objcopy build. 
Message-ID: <5f03a85a.1c69fb81.17c50.4e9f@mx.google.com> Author: Eric Christopher Date: 2020-07-06T15:40:14-07:00 New Revision: 4029f8ede42f69f5fb5affb3eb008e03d448f407 URL: https://github.com/llvm/llvm-project/commit/4029f8ede42f69f5fb5affb3eb008e03d448f407 DIFF: https://github.com/llvm/llvm-project/commit/4029f8ede42f69f5fb5affb3eb008e03d448f407.diff LOG: Temporarily Revert "[llvm-install-name-tool] Merge install-name options" as it breaks the objcopy build. This reverts commit c143900a0851b2c7b7d52e4825c7f073b3474cf6. Added: Modified: llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/CopyConfig.h llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test index 7b21fdc2e03c..1435c6b744c8 100644 --- a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test +++ b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test @@ -22,13 +22,6 @@ # NO-INPUT: no input file specified -## Add same RPATH twice: -# RUN: not llvm-install-name-tool -add_rpath @executable_X \ -# RUN: -add_rpath @executable_X %t.i386 2>&1 \ -# RUN: | FileCheck --check-prefix=DOUBLE %s - -# DOUBLE: duplicate load command - ## Check that cmdsize accounts for NULL terminator. 
# RUN: yaml2obj %p/Inputs/x86_64.yaml -o %t.x86_64 # RUN: llvm-install-name-tool -add_rpath abcd %t.x86_64 diff --git a/llvm/tools/llvm-objcopy/CopyConfig.cpp b/llvm/tools/llvm-objcopy/CopyConfig.cpp index 1fde54dd290a..f93406f371d0 100644 --- a/llvm/tools/llvm-objcopy/CopyConfig.cpp +++ b/llvm/tools/llvm-objcopy/CopyConfig.cpp @@ -874,39 +874,42 @@ parseInstallNameToolOptions(ArrayRef ArgsArr) { auto Match = [=](StringRef RPath) { return RPath == Old || RPath == New; }; // Cannot specify duplicate -rpath entries - auto It1 = find_if( - Config.RPathsToUpdate, - [&Match](const DenseMap::value_type &OldNew) { - return Match(OldNew.getFirst()) || Match(OldNew.getSecond()); - }); + auto It1 = find_if(Config.RPathsToUpdate, + [&Match](const std::pair &OldNew) { + return Match(OldNew.first) || Match(OldNew.second); + }); if (It1 != Config.RPathsToUpdate.end()) - return createStringError(errc::invalid_argument, - "cannot specify both -rpath " + It1->getFirst() + - " " + It1->getSecond() + " and -rpath " + - Old + " " + New); + return createStringError( + errc::invalid_argument, + "cannot specify both -rpath %s %s and -rpath %s %s", + It1->first.str().c_str(), It1->second.str().c_str(), + Old.str().c_str(), New.str().c_str()); // Cannot specify the same rpath under both -delete_rpath and -rpath auto It2 = find_if(Config.RPathsToRemove, Match); if (It2 != Config.RPathsToRemove.end()) - return createStringError(errc::invalid_argument, - "cannot specify both -delete_rpath " + *It2 + - " and -rpath " + Old + " " + New); + return createStringError( + errc::invalid_argument, + "cannot specify both -delete_rpath %s and -rpath %s %s", + It2->str().c_str(), Old.str().c_str(), New.str().c_str()); // Cannot specify the same rpath under both -add_rpath and -rpath auto It3 = find_if(Config.RPathToAdd, Match); if (It3 != Config.RPathToAdd.end()) - return createStringError(errc::invalid_argument, - "cannot specify both -add_rpath " + *It3 + - " and -rpath " + Old + " " + New); + return 
createStringError( + errc::invalid_argument, + "cannot specify both -add_rpath %s and -rpath %s %s", + It3->str().c_str(), Old.str().c_str(), New.str().c_str()); - Config.RPathsToUpdate.insert({Old, New}); + Config.RPathsToUpdate.emplace_back(Old, New); } if (auto *Arg = InputArgs.getLastArg(INSTALL_NAME_TOOL_id)) Config.SharedLibId = Arg->getValue(); for (auto *Arg : InputArgs.filtered(INSTALL_NAME_TOOL_change)) { - Config.InstallNamesToUpdate.insert({Arg->getValue(0), Arg->getValue(1)}); + Config.InstallNamesToUpdate.emplace_back(Arg->getValue(0), + Arg->getValue(1)); } SmallVector Positional; diff --git a/llvm/tools/llvm-objcopy/CopyConfig.h b/llvm/tools/llvm-objcopy/CopyConfig.h index 1341dd674c7b..ce119dee5bff 100644 --- a/llvm/tools/llvm-objcopy/CopyConfig.h +++ b/llvm/tools/llvm-objcopy/CopyConfig.h @@ -178,8 +178,8 @@ struct CopyConfig { std::vector DumpSection; std::vector SymbolsToAdd; std::vector RPathToAdd; - DenseMap RPathsToUpdate; - DenseMap InstallNamesToUpdate; + std::vector> RPathsToUpdate; + std::vector> InstallNamesToUpdate; DenseSet RPathsToRemove; // install-name-tool's id option diff --git a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp index 9d0c36630258..3844b6f62de6 100644 --- a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp +++ b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp @@ -42,6 +42,35 @@ static StringRef getPayloadString(const LoadCommand &LC) { .rtrim('\0'); } +static Error removeLoadCommands(const CopyConfig &Config, Object &Obj) { + DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), + Config.RPathsToRemove.end()); + + LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { + StringRef RPath = getPayloadString(LC); + if (RPathsToRemove.count(RPath)) { + RPathsToRemove.erase(RPath); + return true; + } + } + return false; + }; + + if (Error E = Obj.removeLoadCommands(RemovePred)) + return E; + 
+ // Emit an error if the Mach-O binary does not contain an rpath path name + // specified in -delete_rpath. + for (StringRef RPath : Config.RPathsToRemove) { + if (RPathsToRemove.count(RPath)) + return createStringError(errc::invalid_argument, + "no LC_RPATH load command with path: %s", + RPath.str().c_str()); + } + return Error::success(); +} + static Error removeSections(const CopyConfig &Config, Object &Obj) { SectionPred RemovePred = [](const std::unique_ptr
&) { return false; @@ -128,103 +157,6 @@ static LoadCommand buildRPathLoadCommand(StringRef Path) { return LC; } -static Error processLoadCommands(const CopyConfig &Config, Object &Obj) { - // Remove RPaths. - DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), - Config.RPathsToRemove.end()); - - LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { - StringRef RPath = getPayloadString(LC); - if (RPathsToRemove.count(RPath)) { - RPathsToRemove.erase(RPath); - return true; - } - } - return false; - }; - - if (Error E = Obj.removeLoadCommands(RemovePred)) - return E; - - // Emit an error if the Mach-O binary does not contain an rpath path name - // specified in -delete_rpath. - for (StringRef RPath : Config.RPathsToRemove) { - if (RPathsToRemove.count(RPath)) - return createStringError(errc::invalid_argument, - "no LC_RPATH load command with path: %s", - RPath.str().c_str()); - } - - DenseSet RPaths; - - // Get all existing RPaths. - for (LoadCommand &LC : Obj.LoadCommands) { - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) - RPaths.insert(getPayloadString(LC)); - } - - // Throw errors for invalid RPaths. - for (const auto &OldNew : Config.RPathsToUpdate) { - StringRef Old, New; - std::tie(Old, New) = OldNew; - if (RPaths.count(Old) == 0) - return createStringError(errc::invalid_argument, - "no LC_RPATH load command with path: " + Old); - if (RPaths.count(New) != 0) - return createStringError(errc::invalid_argument, - "rpath " + New + - " would create a duplicate load command"); - } - - // Update load commands. 
- for (LoadCommand &LC : Obj.LoadCommands) { - switch (LC.MachOLoadCommand.load_command_data.cmd) { - case MachO::LC_ID_DYLIB: - if (Config.SharedLibId) { - StringRef Id = Config.SharedLibId.getValue(); - if (Id.empty()) - return createStringError(errc::invalid_argument, - "cannot specify an empty id"); - updateLoadCommandPayloadString(LC, Id); - } - break; - - case MachO::LC_RPATH: { - StringRef RPath = getPayloadString(LC); - StringRef NewRPath = Config.RPathsToUpdate.lookup(RPath); - if (!NewRPath.empty()) - updateLoadCommandPayloadString(LC, NewRPath); - break; - } - - // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and LC_LOAD_UPWARD_DYLIB - // here once llvm-objcopy supports them. - case MachO::LC_LOAD_DYLIB: - case MachO::LC_LOAD_WEAK_DYLIB: - StringRef InstallName = getPayloadString(LC); - StringRef NewInstallName = - Config.InstallNamesToUpdate.lookup(InstallName); - if (!NewInstallName.empty()) - updateLoadCommandPayloadString(LC, - NewInstallName); - break; - } - } - - // Add new RPaths. - for (StringRef RPath : Config.RPathToAdd) { - if (RPaths.count(RPath) != 0) - return createStringError(errc::invalid_argument, - "rpath " + RPath + - " would create a duplicate load command"); - RPaths.insert(RPath); - Obj.addLoadCommand(buildRPathLoadCommand(RPath)); - } - - return Error::success(); -} - static Error dumpSectionToFile(StringRef SecName, StringRef Filename, Object &Obj) { for (LoadCommand &LC : Obj.LoadCommands) @@ -341,6 +273,34 @@ static Error handleArgs(const CopyConfig &Config, Object &Obj) { for (std::unique_ptr
&Sec : LC.Sections) Sec->Relocations.clear(); + for (LoadCommand &LC : Obj.LoadCommands) { + switch (LC.MachOLoadCommand.load_command_data.cmd) { + case MachO::LC_ID_DYLIB: + if (Config.SharedLibId) { + StringRef Id = Config.SharedLibId.getValue(); + if (Id.empty()) + return createStringError(errc::invalid_argument, + "cannot specify an empty id"); + updateLoadCommandPayloadString(LC, Id); + } + break; + + // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and LC_LOAD_UPWARD_DYLIB + // here once llvm-objcopy supports them. + case MachO::LC_LOAD_DYLIB: + case MachO::LC_LOAD_WEAK_DYLIB: + StringRef Old, New; + StringRef CurrentInstallName = getPayloadString(LC); + for (const auto &InstallNamePair : Config.InstallNamesToUpdate) { + std::tie(Old, New) = InstallNamePair; + if (CurrentInstallName == Old) { + updateLoadCommandPayloadString(LC, New); + break; + } + } + } + } + for (const auto &Flag : Config.AddSection) { std::pair SecPair = Flag.split("="); StringRef SecName = SecPair.first; @@ -351,9 +311,45 @@ static Error handleArgs(const CopyConfig &Config, Object &Obj) { return E; } - if (Error E = processLoadCommands(Config, Obj)) + if (Error E = removeLoadCommands(Config, Obj)) return E; + StringRef Old, New; + for (const auto &OldNew : Config.RPathsToUpdate) { + std::tie(Old, New) = OldNew; + + auto FindRPathLC = [&Obj](StringRef RPath) { + return find_if(Obj.LoadCommands, [=](const LoadCommand &LC) { + return LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH && + getPayloadString(LC) == RPath; + }); + }; + + auto NewIt = FindRPathLC(New); + if (NewIt != Obj.LoadCommands.end()) + return createStringError(errc::invalid_argument, + "rpath " + New + + " would create a duplicate load command"); + + auto OldIt = FindRPathLC(Old); + if (OldIt == Obj.LoadCommands.end()) + return createStringError(errc::invalid_argument, + "no LC_RPATH load command with path: " + Old); + + updateLoadCommandPayloadString(*OldIt, New); + } + + for (StringRef RPath : 
Config.RPathToAdd) { + for (LoadCommand &LC : Obj.LoadCommands) { + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH && + RPath == getPayloadString(LC)) { + return createStringError(errc::invalid_argument, + "rpath " + RPath + + " would create a duplicate load command"); + } + } + Obj.addLoadCommand(buildRPathLoadCommand(RPath)); + } return Error::success(); } From llvm-commits at lists.llvm.org Mon Jul 6 15:41:55 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Mon, 6 Jul 2020 15:41:55 -0700 Subject: [llvm] c143900 - [llvm-install-name-tool] Merge install-name options In-Reply-To: <5f03a287.1c69fb81.59363.554f@mx.google.com> References: <5f03a287.1c69fb81.59363.554f@mx.google.com> Message-ID: Hi Sameer, This broke the build :) There are a lot of complaining build bots you can take a look at. I've temporarily reverted it here: echristo at athyra ~/s/llvm-project> git push To github.com:llvm/llvm-project.git 1e495e10e6c..4029f8ede42 master -> master Thanks! and sorry for the inconvenience -eric On Mon, Jul 6, 2020 at 3:15 PM Sameer Arora via llvm-commits < llvm-commits at lists.llvm.org> wrote: > > Author: Sameer Arora > Date: 2020-07-06T15:15:20-07:00 > New Revision: c143900a0851b2c7b7d52e4825c7f073b3474cf6 > > URL: > https://github.com/llvm/llvm-project/commit/c143900a0851b2c7b7d52e4825c7f073b3474cf6 > DIFF: > https://github.com/llvm/llvm-project/commit/c143900a0851b2c7b7d52e4825c7f073b3474cf6.diff > > LOG: [llvm-install-name-tool] Merge install-name options > > This diff merges all options for llvm-install-name-tool under a single > function processLoadCommands. Also adds another test case for -add_rpath > option. 
> > Test plan: make check-all > > Reviewed by: jhenderson, alexshap, smeenai, Ktwu > > Differential Revision: https://reviews.llvm.org/D82812 > > Added: > > > Modified: > llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test > llvm/tools/llvm-objcopy/CopyConfig.cpp > llvm/tools/llvm-objcopy/CopyConfig.h > llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp > > Removed: > > > > > ################################################################################ > diff --git > a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test > b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test > index 1435c6b744c8..7b21fdc2e03c 100644 > --- a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test > +++ b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test > @@ -22,6 +22,13 @@ > > # NO-INPUT: no input file specified > > +## Add same RPATH twice: > +# RUN: not llvm-install-name-tool -add_rpath @executable_X \ > +# RUN: -add_rpath @executable_X %t.i386 2>&1 \ > +# RUN: | FileCheck --check-prefix=DOUBLE %s > + > +# DOUBLE: duplicate load command > + > ## Check that cmdsize accounts for NULL terminator. 
> # RUN: yaml2obj %p/Inputs/x86_64.yaml -o %t.x86_64 > # RUN: llvm-install-name-tool -add_rpath abcd %t.x86_64 > > diff --git a/llvm/tools/llvm-objcopy/CopyConfig.cpp > b/llvm/tools/llvm-objcopy/CopyConfig.cpp > index f93406f371d0..1fde54dd290a 100644 > --- a/llvm/tools/llvm-objcopy/CopyConfig.cpp > +++ b/llvm/tools/llvm-objcopy/CopyConfig.cpp > @@ -874,42 +874,39 @@ parseInstallNameToolOptions(ArrayRef > ArgsArr) { > auto Match = [=](StringRef RPath) { return RPath == Old || RPath == > New; }; > > // Cannot specify duplicate -rpath entries > - auto It1 = find_if(Config.RPathsToUpdate, > - [&Match](const std::pair > &OldNew) { > - return Match(OldNew.first) || > Match(OldNew.second); > - }); > + auto It1 = find_if( > + Config.RPathsToUpdate, > + [&Match](const DenseMap::value_type > &OldNew) { > + return Match(OldNew.getFirst()) || Match(OldNew.getSecond()); > + }); > if (It1 != Config.RPathsToUpdate.end()) > - return createStringError( > - errc::invalid_argument, > - "cannot specify both -rpath %s %s and -rpath %s %s", > - It1->first.str().c_str(), It1->second.str().c_str(), > - Old.str().c_str(), New.str().c_str()); > + return createStringError(errc::invalid_argument, > + "cannot specify both -rpath " + > It1->getFirst() + > + " " + It1->getSecond() + " and -rpath > " + > + Old + " " + New); > > // Cannot specify the same rpath under both -delete_rpath and -rpath > auto It2 = find_if(Config.RPathsToRemove, Match); > if (It2 != Config.RPathsToRemove.end()) > - return createStringError( > - errc::invalid_argument, > - "cannot specify both -delete_rpath %s and -rpath %s %s", > - It2->str().c_str(), Old.str().c_str(), New.str().c_str()); > + return createStringError(errc::invalid_argument, > + "cannot specify both -delete_rpath " + > *It2 + > + " and -rpath " + Old + " " + New); > > // Cannot specify the same rpath under both -add_rpath and -rpath > auto It3 = find_if(Config.RPathToAdd, Match); > if (It3 != Config.RPathToAdd.end()) > - return createStringError( > - 
errc::invalid_argument, > - "cannot specify both -add_rpath %s and -rpath %s %s", > - It3->str().c_str(), Old.str().c_str(), New.str().c_str()); > + return createStringError(errc::invalid_argument, > + "cannot specify both -add_rpath " + *It3 + > + " and -rpath " + Old + " " + New); > > - Config.RPathsToUpdate.emplace_back(Old, New); > + Config.RPathsToUpdate.insert({Old, New}); > } > > if (auto *Arg = InputArgs.getLastArg(INSTALL_NAME_TOOL_id)) > Config.SharedLibId = Arg->getValue(); > > for (auto *Arg : InputArgs.filtered(INSTALL_NAME_TOOL_change)) { > - Config.InstallNamesToUpdate.emplace_back(Arg->getValue(0), > - Arg->getValue(1)); > + Config.InstallNamesToUpdate.insert({Arg->getValue(0), > Arg->getValue(1)}); > } > > SmallVector Positional; > > diff --git a/llvm/tools/llvm-objcopy/CopyConfig.h > b/llvm/tools/llvm-objcopy/CopyConfig.h > index ce119dee5bff..1341dd674c7b 100644 > --- a/llvm/tools/llvm-objcopy/CopyConfig.h > +++ b/llvm/tools/llvm-objcopy/CopyConfig.h > @@ -178,8 +178,8 @@ struct CopyConfig { > std::vector DumpSection; > std::vector SymbolsToAdd; > std::vector RPathToAdd; > - std::vector> RPathsToUpdate; > - std::vector> InstallNamesToUpdate; > + DenseMap RPathsToUpdate; > + DenseMap InstallNamesToUpdate; > DenseSet RPathsToRemove; > > // install-name-tool's id option > > diff --git a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp > b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp > index 3844b6f62de6..9d0c36630258 100644 > --- a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp > +++ b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp > @@ -42,35 +42,6 @@ static StringRef getPayloadString(const LoadCommand > &LC) { > .rtrim('\0'); > } > > -static Error removeLoadCommands(const CopyConfig &Config, Object &Obj) { > - DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), > - Config.RPathsToRemove.end()); > - > - LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { > - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { 
> - StringRef RPath = getPayloadString(LC); > - if (RPathsToRemove.count(RPath)) { > - RPathsToRemove.erase(RPath); > - return true; > - } > - } > - return false; > - }; > - > - if (Error E = Obj.removeLoadCommands(RemovePred)) > - return E; > - > - // Emit an error if the Mach-O binary does not contain an rpath path > name > - // specified in -delete_rpath. > - for (StringRef RPath : Config.RPathsToRemove) { > - if (RPathsToRemove.count(RPath)) > - return createStringError(errc::invalid_argument, > - "no LC_RPATH load command with path: %s", > - RPath.str().c_str()); > - } > - return Error::success(); > -} > - > static Error removeSections(const CopyConfig &Config, Object &Obj) { > SectionPred RemovePred = [](const std::unique_ptr
&) { > return false; > @@ -157,6 +128,103 @@ static LoadCommand buildRPathLoadCommand(StringRef > Path) { > return LC; > } > > +static Error processLoadCommands(const CopyConfig &Config, Object &Obj) { > + // Remove RPaths. > + DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), > + Config.RPathsToRemove.end()); > + > + LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { > + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { > + StringRef RPath = getPayloadString(LC); > + if (RPathsToRemove.count(RPath)) { > + RPathsToRemove.erase(RPath); > + return true; > + } > + } > + return false; > + }; > + > + if (Error E = Obj.removeLoadCommands(RemovePred)) > + return E; > + > + // Emit an error if the Mach-O binary does not contain an rpath path > name > + // specified in -delete_rpath. > + for (StringRef RPath : Config.RPathsToRemove) { > + if (RPathsToRemove.count(RPath)) > + return createStringError(errc::invalid_argument, > + "no LC_RPATH load command with path: %s", > + RPath.str().c_str()); > + } > + > + DenseSet RPaths; > + > + // Get all existing RPaths. > + for (LoadCommand &LC : Obj.LoadCommands) { > + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) > + RPaths.insert(getPayloadString(LC)); > + } > + > + // Throw errors for invalid RPaths. > + for (const auto &OldNew : Config.RPathsToUpdate) { > + StringRef Old, New; > + std::tie(Old, New) = OldNew; > + if (RPaths.count(Old) == 0) > + return createStringError(errc::invalid_argument, > + "no LC_RPATH load command with path: " + > Old); > + if (RPaths.count(New) != 0) > + return createStringError(errc::invalid_argument, > + "rpath " + New + > + " would create a duplicate load > command"); > + } > + > + // Update load commands. 
> + for (LoadCommand &LC : Obj.LoadCommands) { > + switch (LC.MachOLoadCommand.load_command_data.cmd) { > + case MachO::LC_ID_DYLIB: > + if (Config.SharedLibId) { > + StringRef Id = Config.SharedLibId.getValue(); > + if (Id.empty()) > + return createStringError(errc::invalid_argument, > + "cannot specify an empty id"); > + updateLoadCommandPayloadString(LC, Id); > + } > + break; > + > + case MachO::LC_RPATH: { > + StringRef RPath = getPayloadString(LC); > + StringRef NewRPath = Config.RPathsToUpdate.lookup(RPath); > + if (!NewRPath.empty()) > + updateLoadCommandPayloadString(LC, > NewRPath); > + break; > + } > + > + // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and > LC_LOAD_UPWARD_DYLIB > + // here once llvm-objcopy supports them. > + case MachO::LC_LOAD_DYLIB: > + case MachO::LC_LOAD_WEAK_DYLIB: > + StringRef InstallName = getPayloadString(LC); > + StringRef NewInstallName = > + Config.InstallNamesToUpdate.lookup(InstallName); > + if (!NewInstallName.empty()) > + updateLoadCommandPayloadString(LC, > + > NewInstallName); > + break; > + } > + } > + > + // Add new RPaths. > + for (StringRef RPath : Config.RPathToAdd) { > + if (RPaths.count(RPath) != 0) > + return createStringError(errc::invalid_argument, > + "rpath " + RPath + > + " would create a duplicate load > command"); > + RPaths.insert(RPath); > + Obj.addLoadCommand(buildRPathLoadCommand(RPath)); > + } > + > + return Error::success(); > +} > + > static Error dumpSectionToFile(StringRef SecName, StringRef Filename, > Object &Obj) { > for (LoadCommand &LC : Obj.LoadCommands) > @@ -273,34 +341,6 @@ static Error handleArgs(const CopyConfig &Config, > Object &Obj) { > for (std::unique_ptr
&Sec : LC.Sections) > Sec->Relocations.clear(); > > - for (LoadCommand &LC : Obj.LoadCommands) { > - switch (LC.MachOLoadCommand.load_command_data.cmd) { > - case MachO::LC_ID_DYLIB: > - if (Config.SharedLibId) { > - StringRef Id = Config.SharedLibId.getValue(); > - if (Id.empty()) > - return createStringError(errc::invalid_argument, > - "cannot specify an empty id"); > - updateLoadCommandPayloadString(LC, Id); > - } > - break; > - > - // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and > LC_LOAD_UPWARD_DYLIB > - // here once llvm-objcopy supports them. > - case MachO::LC_LOAD_DYLIB: > - case MachO::LC_LOAD_WEAK_DYLIB: > - StringRef Old, New; > - StringRef CurrentInstallName = getPayloadString(LC); > - for (const auto &InstallNamePair : Config.InstallNamesToUpdate) { > - std::tie(Old, New) = InstallNamePair; > - if (CurrentInstallName == Old) { > - updateLoadCommandPayloadString(LC, New); > - break; > - } > - } > - } > - } > - > for (const auto &Flag : Config.AddSection) { > std::pair SecPair = Flag.split("="); > StringRef SecName = SecPair.first; > @@ -311,45 +351,9 @@ static Error handleArgs(const CopyConfig &Config, > Object &Obj) { > return E; > } > > - if (Error E = removeLoadCommands(Config, Obj)) > + if (Error E = processLoadCommands(Config, Obj)) > return E; > > - StringRef Old, New; > - for (const auto &OldNew : Config.RPathsToUpdate) { > - std::tie(Old, New) = OldNew; > - > - auto FindRPathLC = [&Obj](StringRef RPath) { > - return find_if(Obj.LoadCommands, [=](const LoadCommand &LC) { > - return LC.MachOLoadCommand.load_command_data.cmd == > MachO::LC_RPATH && > - getPayloadString(LC) == RPath; > - }); > - }; > - > - auto NewIt = FindRPathLC(New); > - if (NewIt != Obj.LoadCommands.end()) > - return createStringError(errc::invalid_argument, > - "rpath " + New + > - " would create a duplicate load > command"); > - > - auto OldIt = FindRPathLC(Old); > - if (OldIt == Obj.LoadCommands.end()) > - return createStringError(errc::invalid_argument, > - "no 
LC_RPATH load command with path: " + > Old); > - > - updateLoadCommandPayloadString(*OldIt, New); > - } > - > - for (StringRef RPath : Config.RPathToAdd) { > - for (LoadCommand &LC : Obj.LoadCommands) { > - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH && > - RPath == getPayloadString(LC)) { > - return createStringError(errc::invalid_argument, > - "rpath " + RPath + > - " would create a duplicate load > command"); > - } > - } > - Obj.addLoadCommand(buildRPathLoadCommand(RPath)); > - } > return Error::success(); > } > > > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Mon Jul 6 15:45:32 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:45:32 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: <0d42e2e361c543c3eef9f70c58b090f1@localhost.localdomain> jdenny added a comment. In D82982#2134458 , @jdenny wrote: > The TODOs we talked about can be handled later, once @jdoerfert replies. To be clear, by "later", I meant another patch. I think this is ready to land. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Mon Jul 6 15:47:20 2020 From: llvm-commits at lists.llvm.org (Jian Cai via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 22:47:20 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: jcai19 updated this revision to Diff 275853. jcai19 added a comment. Issuing multiple-byte NOP for 32-bit mode broke many tests and probably worth a separate patch itself. 
So this patch will keep the behavior in 32-bit mode unchanged for now. This built and passed all the tests.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82826/new/

https://reviews.llvm.org/D82826

Files:
  llvm/include/llvm/MC/MCAsmBackend.h
  llvm/include/llvm/MC/MCFragment.h
  llvm/include/llvm/MC/MCObjectStreamer.h
  llvm/include/llvm/MC/MCStreamer.h
  llvm/lib/MC/MCAssembler.cpp
  llvm/lib/MC/MCFragment.cpp
  llvm/lib/MC/MCObjectStreamer.cpp
  llvm/lib/MC/MCStreamer.cpp
  llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
  llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
  llvm/test/MC/X86/align-branch-bundle.s
  llvm/test/MC/X86/align-branch-pad-max-prefix.s
  llvm/test/MC/X86/x86-directive-nops-errors.s
  llvm/test/MC/X86/x86-directive-nops.s
  llvm/test/MC/X86/x86_64-directive-nops.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82826.275853.patch
Type: text/x-patch
Size: 14517 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 15:47:45 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 22:47:45 +0000 (UTC)
Subject: [PATCH] D83264: [ELF] Add -z dead-nonalloc-reloc==
Message-ID:

MaskRay created this revision.
Herald added subscribers: llvm-commits, arichardson, emaste.
Herald added a reviewer: espindola.
Herald added a project: LLVM.

-z dead-nonalloc-reloc==

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83264

Files:
  lld/ELF/Config.h
  lld/ELF/Driver.cpp
  lld/ELF/InputSection.cpp
  lld/ELF/Options.td
  lld/test/ELF/debug-dead-reloc-icf.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83264.275854.patch
Type: text/x-patch
Size: 5179 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 15:56:53 2020
From: llvm-commits at lists.llvm.org (Atmn Patel via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 22:56:53 +0000 (UTC)
Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang
In-Reply-To:
References:
Message-ID: <826d1e0b02aa6e18b4ec8fa704b54f3d@localhost.localdomain>

atmnpatel updated this revision to Diff 275855.
atmnpatel added a comment.

Fixed tests.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75591/new/

https://reviews.llvm.org/D75591

Files:
  clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst
  clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp
  clang/docs/LibASTMatchersReference.html
  clang/include/clang/ASTMatchers/ASTMatchers.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/ASTMatchers/Dynamic/Registry.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/distribute_parallel_for_default_messages.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/driver.c
  clang/test/OpenMP/parallel_default_messages.cpp
  clang/test/OpenMP/parallel_for_default_messages.cpp
  clang/test/OpenMP/parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/parallel_master_codegen.cpp
  clang/test/OpenMP/parallel_master_default_messages.cpp
  clang/test/OpenMP/parallel_sections_default_messages.cpp
  clang/test/OpenMP/target_parallel_default_messages.cpp
  clang/test/OpenMP/target_parallel_for_default_messages.cpp
  clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/target_teams_default_messages.cpp
  clang/test/OpenMP/target_teams_distribute_default_messages.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/task_default_messages.cpp
  clang/test/OpenMP/task_messages.cpp
  clang/test/OpenMP/teams_default_messages.cpp
  clang/test/OpenMP/teams_distribute_default_messages.cpp
  clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp
  clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/teams_distribute_simd_default_messages.cpp
  clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp
  clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
  clang/unittests/ASTMatchers/ASTMatchersTest.h
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D75591.275855.patch
Type: text/x-patch
Size: 274251 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 16:12:39 2020
From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits)
Date: Mon, 06 Jul 2020 23:12:39 +0000 (UTC)
Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change
In-Reply-To:
References:
Message-ID:

sameerarora101 updated this revision to Diff 275858.
sameerarora101 added a comment.

Updated `std::tie(Old, New) = OldNew;`, which was breaking the build on Darwin.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82812/new/

https://reviews.llvm.org/D82812

Files:
  llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test
  llvm/tools/llvm-objcopy/CopyConfig.cpp
  llvm/tools/llvm-objcopy/CopyConfig.h
  llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82812.275858.patch Type: text/x-patch Size: 12635 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 16:12:55 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Mon, 06 Jul 2020 16:12:55 -0700 (PDT) Subject: [llvm] ea71ba1 - [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division Message-ID: <5f03aff7.1c69fb81.d32d9.e91a@mx.google.com> Author: Sanjay Patel Date: 2020-07-06T19:12:21-04:00 New Revision: ea71ba11ab1187af03a790dc20967ddd62f68bfe URL: https://github.com/llvm/llvm-project/commit/ea71ba11ab1187af03a790dc20967ddd62f68bfe DIFF: https://github.com/llvm/llvm-project/commit/ea71ba11ab1187af03a790dc20967ddd62f68bfe.diff LOG: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division X / (fabs(A) * sqrt(Z)) --> X / sqrt(A*A*Z) --> X * rsqrt(A*A*Z) In the motivating case from PR46406: https://bugs.llvm.org/show_bug.cgi?id=46406 ...this is restoring the sequence that was originally in the source code. We extracted a term from within the sqrt because we do not know in instcombine whether a target will expand a sqrt call. Note: we could say that the transform in IR should be restricted, but that would not solve the problem if the source was originally in the pattern shown here. This is a gray area for fast-math-flag requirements. I think we should at least check fast-math-flags on the fdiv and fmul because I view this transform as 2 pieces: reassociate the fmul operands and form reciprocal from the fdiv (as with the existing transform). We could argue that the sqrt also needs FMF, but that was not required before, so we should change that in a follow-up patch if that seems better. We don't currently have a way to check that the target will produce a sqrt or recip estimate without actually creating nodes (the APIs are SDValue getSqrtEstimate() and SDValue getRecipEstimate()), so we clean up speculatively created nodes if we are not able to create an estimate. 
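As a rough illustration of the reassociation described in this log message, the two forms can be compared numerically. This is a sketch only and not code from the patch: the names divForm, rsqrtForm, nearlyEqual, and checkForms are made up for the example, doubles are used for simplicity, and an exact 1.0 / sqrt stands in for the rsqrt estimate a target would actually emit.

```cpp
#include <cassert>
#include <cmath>

// Original form: X / (fabs(A) * sqrt(Z)), which keeps the FP division.
double divForm(double X, double A, double Z) {
  return X / (std::fabs(A) * std::sqrt(Z));
}

// Reassociated form: X * rsqrt(A*A*Z). rsqrt is spelled here as an exact
// 1.0 / sqrt(...); this is a stand-in for a hardware reciprocal-sqrt
// estimate, which is what would actually remove the division.
double rsqrtForm(double X, double A, double Z) {
  double AAZ = A * A * Z; // fabs(A) is pulled into the sqrt as A*A
  return X * (1.0 / std::sqrt(AAZ));
}

// Relative comparison, since the two forms round differently.
bool nearlyEqual(double L, double R, double RelTol) {
  return std::fabs(L - R) <= RelTol * std::fmax(std::fabs(L), std::fabs(R));
}

// Self-check on ordinary inputs; the negative A exercises the fabs.
void checkForms() {
  assert(nearlyEqual(divForm(12.0, -3.0, 4.0), rsqrtForm(12.0, -3.0, 4.0),
                     1e-12));
}
```

The two forms agree only up to rounding, and A*A*Z can overflow or underflow for inputs where fabs(A) * sqrt(Z) would not, which is presumably part of why the fold is gated on the reassociation fast-math flags.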
The x86 test with doubles verifies that we are not changing a test with no estimate sequence. Differential Revision: https://reviews.llvm.org/D82716 Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/X86/sqrt-fastmath.ll Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 015d78a3e868..c94bbeb60716 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -13232,6 +13232,24 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) { Y = N1.getOperand(0); } if (Sqrt.getNode()) { + // If the other multiply operand is known positive, pull it into the + // sqrt. That will eliminate the division if we convert to an estimate: + // X / (fabs(A) * sqrt(Z)) --> X / sqrt(A*A*Z) --> X * rsqrt(A*A*Z) + // TODO: Also fold the case where A == Z (fabs is missing). + if (Flags.hasAllowReassociation() && N1.hasOneUse() && + N1->getFlags().hasAllowReassociation() && Sqrt.hasOneUse() && + Y.getOpcode() == ISD::FABS && Y.hasOneUse()) { + SDValue AA = DAG.getNode(ISD::FMUL, DL, VT, Y.getOperand(0), + Y.getOperand(0), Flags); + SDValue AAZ = + DAG.getNode(ISD::FMUL, DL, VT, AA, Sqrt.getOperand(0), Flags); + if (SDValue Rsqrt = buildRsqrtEstimate(AAZ, Flags)) + return DAG.getNode(ISD::FMUL, DL, VT, N0, Rsqrt, Flags); + + // Estimate creation failed. Clean up speculatively created nodes. 
+ recursivelyDeleteUnusedNodes(AAZ.getNode()); + } + // We found a FSQRT, so try to make this fold: // X / (Y * sqrt(Z)) -> X * (rsqrt(Z) / Y) if (SDValue Rsqrt = buildRsqrtEstimate(Sqrt.getOperand(0), Flags)) { diff --git a/llvm/test/CodeGen/X86/sqrt-fastmath.ll b/llvm/test/CodeGen/X86/sqrt-fastmath.ll index b1582d7288c9..29d21f6927bd 100644 --- a/llvm/test/CodeGen/X86/sqrt-fastmath.ll +++ b/llvm/test/CodeGen/X86/sqrt-fastmath.ll @@ -618,46 +618,47 @@ define <16 x float> @v16f32_estimate(<16 x float> %x) #1 { ret <16 x float> %div } -; x / (fabs(y) * sqrt(z)) +; x / (fabs(y) * sqrt(z)) --> x * rsqrt(y*y*z) define float @div_sqrt_fabs_f32(float %x, float %y, float %z) { ; SSE-LABEL: div_sqrt_fabs_f32: ; SSE: # %bb.0: -; SSE-NEXT: rsqrtss %xmm2, %xmm3 -; SSE-NEXT: mulss %xmm3, %xmm2 -; SSE-NEXT: mulss %xmm3, %xmm2 -; SSE-NEXT: addss {{.*}}(%rip), %xmm2 -; SSE-NEXT: mulss {{.*}}(%rip), %xmm3 -; SSE-NEXT: mulss %xmm2, %xmm3 -; SSE-NEXT: andps {{.*}}(%rip), %xmm1 -; SSE-NEXT: divss %xmm1, %xmm3 -; SSE-NEXT: mulss %xmm3, %xmm0 +; SSE-NEXT: mulss %xmm1, %xmm1 +; SSE-NEXT: mulss %xmm2, %xmm1 +; SSE-NEXT: xorps %xmm2, %xmm2 +; SSE-NEXT: rsqrtss %xmm1, %xmm2 +; SSE-NEXT: mulss %xmm2, %xmm1 +; SSE-NEXT: mulss %xmm2, %xmm1 +; SSE-NEXT: addss {{.*}}(%rip), %xmm1 +; SSE-NEXT: mulss {{.*}}(%rip), %xmm2 +; SSE-NEXT: mulss %xmm0, %xmm2 +; SSE-NEXT: mulss %xmm1, %xmm2 +; SSE-NEXT: movaps %xmm2, %xmm0 ; SSE-NEXT: retq ; ; AVX1-LABEL: div_sqrt_fabs_f32: ; AVX1: # %bb.0: -; AVX1-NEXT: vrsqrtss %xmm2, %xmm2, %xmm3 -; AVX1-NEXT: vmulss %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vmulss %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vaddss {{.*}}(%rip), %xmm2, %xmm2 -; AVX1-NEXT: vmulss {{.*}}(%rip), %xmm3, %xmm3 -; AVX1-NEXT: vmulss %xmm2, %xmm3, %xmm2 -; AVX1-NEXT: vandps {{.*}}(%rip), %xmm1, %xmm1 -; AVX1-NEXT: vdivss %xmm1, %xmm2, %xmm1 -; AVX1-NEXT: vmulss %xmm1, %xmm0, %xmm0 +; AVX1-NEXT: vmulss %xmm1, %xmm1, %xmm1 +; AVX1-NEXT: vmulss %xmm2, %xmm1, %xmm1 +; AVX1-NEXT: vrsqrtss %xmm1, %xmm1, %xmm2 +; 
AVX1-NEXT: vmulss %xmm2, %xmm1, %xmm1 +; AVX1-NEXT: vmulss %xmm2, %xmm1, %xmm1 +; AVX1-NEXT: vaddss {{.*}}(%rip), %xmm1, %xmm1 +; AVX1-NEXT: vmulss {{.*}}(%rip), %xmm2, %xmm2 +; AVX1-NEXT: vmulss %xmm0, %xmm2, %xmm0 +; AVX1-NEXT: vmulss %xmm0, %xmm1, %xmm0 ; AVX1-NEXT: retq ; ; AVX512-LABEL: div_sqrt_fabs_f32: ; AVX512: # %bb.0: -; AVX512-NEXT: vrsqrtss %xmm2, %xmm2, %xmm3 -; AVX512-NEXT: vmulss %xmm3, %xmm2, %xmm2 -; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm2 = (xmm3 * xmm2) + mem -; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm3, %xmm3 -; AVX512-NEXT: vbroadcastss {{.*#+}} xmm4 = [NaN,NaN,NaN,NaN] -; AVX512-NEXT: vmulss %xmm2, %xmm3, %xmm2 -; AVX512-NEXT: vandps %xmm4, %xmm1, %xmm1 -; AVX512-NEXT: vdivss %xmm1, %xmm2, %xmm1 -; AVX512-NEXT: vmulss %xmm1, %xmm0, %xmm0 +; AVX512-NEXT: vmulss %xmm1, %xmm1, %xmm1 +; AVX512-NEXT: vmulss %xmm2, %xmm1, %xmm1 +; AVX512-NEXT: vrsqrtss %xmm1, %xmm1, %xmm2 +; AVX512-NEXT: vmulss %xmm2, %xmm1, %xmm1 +; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm1 = (xmm2 * xmm1) + mem +; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm2, %xmm2 +; AVX512-NEXT: vmulss %xmm0, %xmm2, %xmm0 +; AVX512-NEXT: vmulss %xmm0, %xmm1, %xmm0 ; AVX512-NEXT: retq %s = call fast float @llvm.sqrt.f32(float %z) %a = call fast float @llvm.fabs.f32(float %y) @@ -666,47 +667,46 @@ define float @div_sqrt_fabs_f32(float %x, float %y, float %z) { ret float %d } -; x / (fabs(y) * sqrt(z)) +; x / (fabs(y) * sqrt(z)) --> x * rsqrt(y*y*z) define <4 x float> @div_sqrt_fabs_v4f32(<4 x float> %x, <4 x float> %y, <4 x float> %z) { ; SSE-LABEL: div_sqrt_fabs_v4f32: ; SSE: # %bb.0: -; SSE-NEXT: rsqrtps %xmm2, %xmm3 -; SSE-NEXT: mulps %xmm3, %xmm2 -; SSE-NEXT: mulps %xmm3, %xmm2 -; SSE-NEXT: addps {{.*}}(%rip), %xmm2 -; SSE-NEXT: mulps {{.*}}(%rip), %xmm3 -; SSE-NEXT: mulps %xmm2, %xmm3 -; SSE-NEXT: andps {{.*}}(%rip), %xmm1 -; SSE-NEXT: divps %xmm1, %xmm3 -; SSE-NEXT: mulps %xmm3, %xmm0 +; SSE-NEXT: mulps %xmm1, %xmm1 +; SSE-NEXT: mulps %xmm2, %xmm1 +; SSE-NEXT: rsqrtps %xmm1, %xmm2 +; SSE-NEXT: mulps 
%xmm2, %xmm1 +; SSE-NEXT: mulps %xmm2, %xmm1 +; SSE-NEXT: addps {{.*}}(%rip), %xmm1 +; SSE-NEXT: mulps {{.*}}(%rip), %xmm2 +; SSE-NEXT: mulps %xmm1, %xmm2 +; SSE-NEXT: mulps %xmm2, %xmm0 ; SSE-NEXT: retq ; ; AVX1-LABEL: div_sqrt_fabs_v4f32: ; AVX1: # %bb.0: -; AVX1-NEXT: vrsqrtps %xmm2, %xmm3 -; AVX1-NEXT: vmulps %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vmulps %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vaddps {{.*}}(%rip), %xmm2, %xmm2 -; AVX1-NEXT: vmulps {{.*}}(%rip), %xmm3, %xmm3 -; AVX1-NEXT: vmulps %xmm2, %xmm3, %xmm2 -; AVX1-NEXT: vandps {{.*}}(%rip), %xmm1, %xmm1 -; AVX1-NEXT: vdivps %xmm1, %xmm2, %xmm1 +; AVX1-NEXT: vmulps %xmm1, %xmm1, %xmm1 +; AVX1-NEXT: vmulps %xmm2, %xmm1, %xmm1 +; AVX1-NEXT: vrsqrtps %xmm1, %xmm2 +; AVX1-NEXT: vmulps %xmm2, %xmm1, %xmm1 +; AVX1-NEXT: vmulps %xmm2, %xmm1, %xmm1 +; AVX1-NEXT: vaddps {{.*}}(%rip), %xmm1, %xmm1 +; AVX1-NEXT: vmulps {{.*}}(%rip), %xmm2, %xmm2 +; AVX1-NEXT: vmulps %xmm1, %xmm2, %xmm1 ; AVX1-NEXT: vmulps %xmm1, %xmm0, %xmm0 ; AVX1-NEXT: retq ; ; AVX512-LABEL: div_sqrt_fabs_v4f32: ; AVX512: # %bb.0: -; AVX512-NEXT: vrsqrtps %xmm2, %xmm3 -; AVX512-NEXT: vmulps %xmm3, %xmm2, %xmm2 -; AVX512-NEXT: vbroadcastss {{.*#+}} xmm4 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0] -; AVX512-NEXT: vfmadd231ps {{.*#+}} xmm4 = (xmm3 * xmm2) + xmm4 -; AVX512-NEXT: vbroadcastss {{.*#+}} xmm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1] -; AVX512-NEXT: vmulps %xmm2, %xmm3, %xmm2 -; AVX512-NEXT: vbroadcastss {{.*#+}} xmm3 = [NaN,NaN,NaN,NaN] -; AVX512-NEXT: vmulps %xmm4, %xmm2, %xmm2 -; AVX512-NEXT: vandps %xmm3, %xmm1, %xmm1 -; AVX512-NEXT: vdivps %xmm1, %xmm2, %xmm1 +; AVX512-NEXT: vmulps %xmm1, %xmm1, %xmm1 +; AVX512-NEXT: vmulps %xmm2, %xmm1, %xmm1 +; AVX512-NEXT: vrsqrtps %xmm1, %xmm2 +; AVX512-NEXT: vmulps %xmm2, %xmm1, %xmm1 +; AVX512-NEXT: vbroadcastss {{.*#+}} xmm3 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0] +; AVX512-NEXT: vfmadd231ps {{.*#+}} xmm3 = (xmm2 * xmm1) + xmm3 +; AVX512-NEXT: vbroadcastss {{.*#+}} xmm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1] +; AVX512-NEXT: 
vmulps %xmm1, %xmm2, %xmm1 +; AVX512-NEXT: vmulps %xmm3, %xmm1, %xmm1 ; AVX512-NEXT: vmulps %xmm1, %xmm0, %xmm0 ; AVX512-NEXT: retq %s = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %z) @@ -716,6 +716,11 @@ define <4 x float> @div_sqrt_fabs_v4f32(<4 x float> %x, <4 x float> %y, <4 x flo ret <4 x float> %d } +; This has 'arcp' but does not have 'reassoc' FMF. +; We allow converting the sqrt to an estimate, but +; do not pull the divisor into the estimate. +; x / (fabs(y) * sqrt(z)) --> x * rsqrt(z) / fabs(y) + define <4 x float> @div_sqrt_fabs_v4f32_fmf(<4 x float> %x, <4 x float> %y, <4 x float> %z) { ; SSE-LABEL: div_sqrt_fabs_v4f32_fmf: ; SSE: # %bb.0: @@ -765,6 +770,8 @@ define <4 x float> @div_sqrt_fabs_v4f32_fmf(<4 x float> %x, <4 x float> %y, <4 x ret <4 x float> %d } +; No estimates for f64, so do not convert fabs into an fmul. + define double @div_sqrt_fabs_f64(double %x, double %y, double %z) { ; SSE-LABEL: div_sqrt_fabs_f64: ; SSE: # %bb.0: From llvm-commits at lists.llvm.org Mon Jul 6 16:12:57 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:12:57 +0000 (UTC) Subject: [PATCH] D82716: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division In-Reply-To: References: Message-ID: <883ae3886f9dd4313f35f30050d09230@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGea71ba11ab11: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division (authored by spatel). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82716/new/ https://reviews.llvm.org/D82716 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/X86/sqrt-fastmath.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82716.275859.patch Type: text/x-patch Size: 8428 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 16:13:00 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:13:00 +0000 (UTC) Subject: [PATCH] D82871: [SVE] Custom ISel for fixed length extract/insert_subvector. In-Reply-To: References: Message-ID: <5fd0662b7cf13393815f43f82d15dfa0@localhost.localdomain> paulwalker-arm marked an inline comment as done. paulwalker-arm added inline comments. ================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll:25 +; how fixed length operation are lowered to scalable ones, with multiple blocks +; ensuring insert/extract sequences are not folded away. + ---------------- cameron.mcinally wrote: > Nit: could probably mark the loads/stores volatile to avoid the branch. I'm not sure how this helps. The reason for the branch is to force a block boundary to ensure the extract_subvector resulting from lowering the load is not combined with the insert_subvector that's created when lowering the store. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82871/new/ https://reviews.llvm.org/D82871 From llvm-commits at lists.llvm.org Mon Jul 6 16:13:57 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:13:57 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: sameerarora101 marked an inline comment as done. sameerarora101 added inline comments. ================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:169-170 + for (const auto &OldNew : Config.RPathsToUpdate) { + StringRef Old = OldNew.getFirst(); + StringRef New = OldNew.getSecond(); + if (RPaths.count(Old) == 0) ---------------- This is the latest update. Would it work on Darwin?
thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 From llvm-commits at lists.llvm.org Mon Jul 6 16:18:23 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:18:23 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: <48450470d1ba9f0116f6f229fcefcfa3@localhost.localdomain> nemanjai accepted this revision. nemanjai added a comment. This revision is now accepted and ready to land. The remaining updates are straightforward so feel free to address my comments on the commit. LGTM otherwise. ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9800 + LHS = peekThroughBitcasts(LHS); + RHS = peekThroughBitcasts(RHS); + if (RHS->getOpcode() != ISD::BUILD_VECTOR) { ---------------- Forgot to remove these? ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9840 + return SDValue(); + + SDValue SplatNode = DAG.getNode( ---------------- If the splat is smaller than 32 bits, you need to replicate it. ``` // If the splat is narrower than 32-bits, we need to get the 32-bit value // for XXSPLTI32DX. unsigned SplatVal = APSplatValue.getZExtValue(); for (; SplatBitSize < 32; SplatBitSize <<= 1) SplatVal |= (SplatVal << SplatBitSize); ``` and then use `SplatVal` below when creating the `XXSPLTI32DX` node. We also need a test case for this. Something like: ``` vector int test(vector int a) { unsigned Val = 0xABABABAB; a[0] = Val; a[2] = Val; return a; } ``` This should give you a `SplatBitSize == 8` and `APSplatValue == 0xAB`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 From llvm-commits at lists.llvm.org Mon Jul 6 16:33:37 2020 From: llvm-commits at lists.llvm.org (Joel E. 
Denny via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:33:37 +0000 (UTC) Subject: [PATCH] D82754: [lit] Prevent hang when lit sees non-ASCII characters In-Reply-To: References: Message-ID: <3eac1bb6d415007f4d13d4cf76625f5b@localhost.localdomain> jdenny added a comment. In D82754#2127919, @richard.barton.arm wrote: > Hi @jdenny Hi. Sorry for the delay. > I can make a test that is adapted from shtest-shell/stdout-encoding.txt with this RUN line > > # RUN: not env PYTHONIOENCODING=ascii %{lit} -j 1 -v %{inputs}/shtest-shell-ascii > %t.out > > > which will trigger the error for me when run with llvm-lit without -a (although the failure mode is to hang, so pretty nasty) Thanks for finding this reproducer! > but this test passes in make check-all. By "passes", I think you're saying that it doesn't reproduce the error. Right? If so, I suspect that llvm-lit called directly uses python2 on your system, but check-all uses python3. Can you confirm? I also suspect this is a python2-specific bug. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82754/new/ https://reviews.llvm.org/D82754 From llvm-commits at lists.llvm.org Mon Jul 6 16:36:47 2020 From: llvm-commits at lists.llvm.org (Adrian Prantl via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:36:47 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <286363da12d8756d0131c9b121e56b61@localhost.localdomain> aprantl accepted this revision. aprantl added a comment. This revision is now accepted and ready to land. Thank you! I was worried that we would be breaking compatibility with consumers for no compelling reason. If the consumers don't mind, then neither do I :-) I would appreciate it if you could follow up by adding both DWARF 2 & 4 variants to `packed_bitfields.ll` before landing this.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Mon Jul 6 16:37:50 2020 From: llvm-commits at lists.llvm.org (Adrian Prantl via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:37:50 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <635aff97e0b43911c7080edc0b3aeb4e@localhost.localdomain> aprantl added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1546 // Handle bitfield, assume bytes are 8 bits. + uint64_t StorageSize = ((Offset + Size + 7) / 8 - Offset / 8) * 8; + if (StorageSize > FieldSize) ---------------- Do we have something in https://llvm.org/doxygen/MathExtras_8h_source.html to make this more readable? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Mon Jul 6 16:39:58 2020 From: llvm-commits at lists.llvm.org (Adrian Prantl via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:39:58 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: aprantl added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1546 // Handle bitfield, assume bytes are 8 bits. + uint64_t StorageSize = ((Offset + Size + 7) / 8 - Offset / 8) * 8; + if (StorageSize > FieldSize) ---------------- aprantl wrote: > Do we have something in https://llvm.org/doxygen/MathExtras_8h_source.html to make this more readable? alignTo() perhaps? 
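To make the suggestion concrete, here is a rough sketch (not taken from the patch) of how the storage-size expression could be phrased with round-up and round-down helpers. `roundUpTo` and `roundDownTo` are hypothetical local stand-ins so the snippet compiles without LLVM headers; the assumption is that `llvm::alignTo` and `llvm::alignDown` from MathExtras.h would play these roles in the real code:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the MathExtras.h helpers (assumed to behave
// like llvm::alignTo and llvm::alignDown for a non-zero alignment).
constexpr uint64_t roundUpTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}
constexpr uint64_t roundDownTo(uint64_t Value, uint64_t Align) {
  return Value / Align * Align;
}

// The expression as written in the patch (Offset and Size are in bits).
constexpr uint64_t storageSizeOriginal(uint64_t Offset, uint64_t Size) {
  return ((Offset + Size + 7) / 8 - Offset / 8) * 8;
}

// The same quantity: the end of the field rounded up to a byte boundary,
// minus the start of the field rounded down to a byte boundary.
constexpr uint64_t storageSizeAligned(uint64_t Offset, uint64_t Size) {
  return roundUpTo(Offset + Size, 8) - roundDownTo(Offset, 8);
}

// A 7-bit field starting at bit 3 touches two bytes, so 16 bits of storage.
static_assert(storageSizeOriginal(3, 7) == 16, "two bytes expected");
static_assert(storageSizeAligned(3, 7) == storageSizeOriginal(3, 7),
              "both forms agree");
```

Whether this reads better than the one-line arithmetic is a judgment call; the two forms compute the same value.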
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Mon Jul 6 16:43:31 2020 From: llvm-commits at lists.llvm.org (Adrian Prantl via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:43:31 +0000 (UTC) Subject: [PATCH] D83236: [DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour In-Reply-To: References: Message-ID: <2e3b0999a88ebfe83be43e771cf8f092@localhost.localdomain> aprantl added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h:598 + const DbgValueHistoryMap::Entries &Entries, + SmallPtrSetImpl &VeryLargeBlocks); ---------------- Not very important, but: Assuming that `VeryLargeBlocks` will only be populated in the pathological case, micro-optimizing with a *Small*PtrSet seems unnecessary. Perhaps it's more memory-efficient on average to just use a DenseSet? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83236/new/ https://reviews.llvm.org/D83236 From llvm-commits at lists.llvm.org Mon Jul 6 16:45:46 2020 From: llvm-commits at lists.llvm.org (Guozhi Wei via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 23:45:46 +0000 (UTC) Subject: [PATCH] D83265: [MBP] Use profile count to compute tail dup cost if it is available Message-ID: Carrot created this revision. Carrot added a reviewer: davidxl. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Current tail duplication in machine block placement pass uses block frequency information in cost model. But frequency number has only relative meaning compared to other basic blocks in the same function. A large frequency number doesn't mean it is hot and a small frequency number doesn't mean it is cold. To overcome this problem, this patch uses profile count in cost model if it's available. So we can tail duplicate real hot basic blocks. 
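The difference between a relative frequency and a profile count can be sketched as follows. This is only an illustration under stated assumptions: `blockProfileCount` is a hypothetical helper, not the API used in the patch, and it assumes a block's count is roughly the function entry count scaled by the ratio of block frequency to entry frequency:

```cpp
#include <cstdint>

// Hypothetical helper: turn a block frequency, which is only meaningful
// relative to the entry frequency of the same function, into an absolute
// execution count. Overflow and rounding are ignored in this sketch.
constexpr uint64_t blockProfileCount(uint64_t EntryCount, uint64_t BlockFreq,
                                     uint64_t EntryFreq) {
  return EntryCount * BlockFreq / EntryFreq;
}

// Two blocks with the same relative frequency (8x their function entry)...
constexpr uint64_t ColdFnBlock = blockProfileCount(10, 128, 16);
constexpr uint64_t HotFnBlock = blockProfileCount(1000000, 128, 16);

// ...differ by five orders of magnitude in absolute execution count.
static_assert(ColdFnBlock == 80, "rarely entered function");
static_assert(HotFnBlock == 8000000, "frequently entered function");
```

A cost model that only sees the value 128 cannot tell these two blocks apart, while one that sees 80 versus 8000000 can.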
When tested with spec2006int, the performance doesn't change, the number of tail duplicated blocks was reduced from 2376 to 1746. In our internal testing, search1 was not impacted, search2 was improved by 0.1%, another 0.1% can be achieved with larger threshold parameter. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83265 Files: llvm/lib/CodeGen/MachineBlockPlacement.cpp llvm/test/CodeGen/X86/dup-cost.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83265.275860.patch Type: text/x-patch Size: 7035 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 16:58:50 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Mon, 6 Jul 2020 16:58:50 -0700 Subject: [llvm] 1fed131 - [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE In-Reply-To: <5eec28f8.1c69fb81.b595b.ca3c@mx.google.com> References: <5eec28f8.1c69fb81.b595b.ca3c@mx.google.com> Message-ID: Hi Nemanja! Running into a compiler crash with this building skia (https://skia.org/) for power after this patch. I'll see what I can do to get a testcase (if it doesn't reproduce for you), but would you mind terribly reverting in the meantime? Thanks! -eric On Thu, Jun 18, 2020 at 7:55 PM Nemanja Ivanovic via llvm-commits < llvm-commits at lists.llvm.org> wrote: > > Author: Nemanja Ivanovic > Date: 2020-06-18T21:54:22-05:00 > New Revision: 1fed131660b2c5d3ea7007e273a7a5da80699445 > > URL: > https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445 > DIFF: > https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445.diff > > LOG: [PowerPC] Canonicalize shuffles to match more single-instruction > masks on LE > > We currently miss a number of opportunities to emit single-instruction > VMRG[LH][BHW] instructions for shuffles on little endian subtargets. 
> Although > this in itself is not a huge performance opportunity since loading the > permute > vector for a VPERM can always be pulled out of loops, producing such merge > instructions is useful to downstream optimizations. > Since VPERM is essentially opaque to all subsequent optimizations, we want > to > avoid it as much as possible. Other permute instructions have semantics > that can > be reasoned about much more easily in later optimizations. > > This patch does the following: > - Canonicalize shuffles so that the first element comes from the first > vector > (since that's what most of the mask matching functions want) > - Switch the elements that come from splat vectors so that they match the > corresponding elements from the other vector (to allow for merges) > - Adds debugging messages for when a shuffle is matched to a VPERM so that > anyone interested in improving this further can get the info for their > code > > Differential revision: https://reviews.llvm.org/D77448 > > Added: > > > Modified: > llvm/lib/Target/PowerPC/PPCISelLowering.cpp > llvm/lib/Target/PowerPC/PPCISelLowering.h > llvm/lib/Target/PowerPC/PPCInstrVSX.td > llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll > llvm/test/CodeGen/PowerPC/build-vector-tests.ll > llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll > llvm/test/CodeGen/PowerPC/fp-strict-round.ll > llvm/test/CodeGen/PowerPC/load-and-splat.ll > llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll > llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll > llvm/test/CodeGen/PowerPC/pr25080.ll > llvm/test/CodeGen/PowerPC/pr25157-peephole.ll > llvm/test/CodeGen/PowerPC/pr38087.ll > llvm/test/CodeGen/PowerPC/pre-inc-disable.ll > llvm/test/CodeGen/PowerPC/qpx-load-splat.ll > llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll > llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll > llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll > llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll > llvm/test/CodeGen/PowerPC/swaps-le-5.ll > 
llvm/test/CodeGen/PowerPC/swaps-le-6.ll > llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll > llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll > llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll > llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll > llvm/test/CodeGen/PowerPC/vsx.ll > llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll > > Removed: > > > > > ################################################################################ > diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp > b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp > index d7698a5ec962..28bd80610c84 100644 > --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp > +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp > @@ -125,6 +125,7 @@ cl::desc("use absolute jump tables on ppc"), > cl::Hidden); > > STATISTIC(NumTailCalls, "Number of tail calls"); > STATISTIC(NumSiblingCalls, "Number of sibling calls"); > +STATISTIC(ShufflesHandledWithVPERM, "Number of shuffles lowered to a > VPERM"); > > static bool isNByteElemShuffleMask(ShuffleVectorSDNode *, unsigned, int); > > @@ -1505,6 +1506,8 @@ const char > *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const { > case PPCISD::MTVSRZ: return "PPCISD::MTVSRZ"; > case PPCISD::SINT_VEC_TO_FP: return "PPCISD::SINT_VEC_TO_FP"; > case PPCISD::UINT_VEC_TO_FP: return "PPCISD::UINT_VEC_TO_FP"; > + case PPCISD::SCALAR_TO_VECTOR_PERMUTED: > + return "PPCISD::SCALAR_TO_VECTOR_PERMUTED"; > case PPCISD::ANDI_rec_1_EQ_BIT: > return "PPCISD::ANDI_rec_1_EQ_BIT"; > case PPCISD::ANDI_rec_1_GT_BIT: > @@ 
-2716,7 +2719,8 @@ static bool usePartialVectorLoads(SDNode *N, const > PPCSubtarget& ST) { > for (SDNode::use_iterator UI = LD->use_begin(), UE = LD->use_end(); > UI != UE; ++UI) > if (UI.getUse().get().getResNo() == 0 && > - UI->getOpcode() != ISD::SCALAR_TO_VECTOR) > + UI->getOpcode() != ISD::SCALAR_TO_VECTOR && > + UI->getOpcode() != PPCISD::SCALAR_TO_VECTOR_PERMUTED) > return false; > > return true; > @@ -9041,7 +9045,8 @@ static const SDValue *getNormalLoadInput(const > SDValue &Op) { > const SDValue *InputLoad = &Op; > if (InputLoad->getOpcode() == ISD::BITCAST) > InputLoad = &InputLoad->getOperand(0); > - if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR) > + if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR || > + InputLoad->getOpcode() == PPCISD::SCALAR_TO_VECTOR_PERMUTED) > InputLoad = &InputLoad->getOperand(0); > if (InputLoad->getOpcode() != ISD::LOAD) > return nullptr; > @@ -9690,6 +9695,15 @@ SDValue > PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, > SDValue V1 = Op.getOperand(0); > SDValue V2 = Op.getOperand(1); > ShuffleVectorSDNode *SVOp = cast(Op); > + > + // Any nodes that were combined in the target-independent combiner prior > + // to vector legalization will not be sent to the target combine. Try to > + // combine it here. > + if (SDValue NewShuffle = combineVectorShuffle(SVOp, DAG)) { > + DAG.ReplaceAllUsesOfValueWith(Op, NewShuffle); > + Op = NewShuffle; > + SVOp = cast(Op); > + } > EVT VT = Op.getValueType(); > bool isLittleEndian = Subtarget.isLittleEndian(); > > @@ -9715,6 +9729,11 @@ SDValue > PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, > Offset = isLittleEndian ? (3 - SplatIdx) * 4 : SplatIdx * 4; > else > Offset = isLittleEndian ? (1 - SplatIdx) * 8 : SplatIdx * 8; > + > + // If we are loading a partial vector, it does not make sense to > adjust > + // the base pointer. This happens with (splat (s_to_v_permuted > (ld))). > + if (LD->getMemoryVT().getSizeInBits() == (IsFourByte ? 
32 : 64)) > + Offset = 0; > SDValue BasePtr = LD->getBasePtr(); > if (Offset != 0) > BasePtr = DAG.getNode(ISD::ADD, dl, > getPointerTy(DAG.getDataLayout()), > @@ -9988,7 +10007,13 @@ SDValue > PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, > MVT::i32)); > } > > + ShufflesHandledWithVPERM++; > SDValue VPermMask = DAG.getBuildVector(MVT::v16i8, dl, ResultMask); > + LLVM_DEBUG(dbgs() << "Emitting a VPERM for the following shuffle:\n"); > + LLVM_DEBUG(SVOp->dump()); > + LLVM_DEBUG(dbgs() << "With the following permute control vector:\n"); > + LLVM_DEBUG(VPermMask.dump()); > + > if (isLittleEndian) > return DAG.getNode(PPCISD::VPERM, dl, V1.getValueType(), > V2, V1, VPermMask); > @@ -14114,6 +14139,199 @@ SDValue > PPCTargetLowering::combineStoreFPToInt(SDNode *N, > return Val; > } > > +static bool isAlternatingShuffMask(const ArrayRef &Mask, int > NumElts) { > + // Check that the source of the element keeps flipping > + // (i.e. Mask[i] < NumElts -> Mask[i+i] >= NumElts). > + bool PrevElemFromFirstVec = Mask[0] < NumElts; > + for (int i = 1, e = Mask.size(); i < e; i++) { > + if (PrevElemFromFirstVec && Mask[i] < NumElts) > + return false; > + if (!PrevElemFromFirstVec && Mask[i] >= NumElts) > + return false; > + PrevElemFromFirstVec = !PrevElemFromFirstVec; > + } > + return true; > +} > + > +static bool isSplatBV(SDValue Op) { > + if (Op.getOpcode() != ISD::BUILD_VECTOR) > + return false; > + SDValue FirstOp; > + > + // Find first non-undef input. > + for (int i = 0, e = Op.getNumOperands(); i < e; i++) { > + FirstOp = Op.getOperand(i); > + if (!FirstOp.isUndef()) > + break; > + } > + > + // All inputs are undef or the same as the first non-undef input. 
> + for (int i = 1, e = Op.getNumOperands(); i < e; i++) > + if (Op.getOperand(i) != FirstOp && !Op.getOperand(i).isUndef()) > + return false; > + return true; > +} > + > +static SDValue isScalarToVec(SDValue Op) { > + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR) > + return Op; > + if (Op.getOpcode() != ISD::BITCAST) > + return SDValue(); > + Op = Op.getOperand(0); > + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR) > + return Op; > + return SDValue(); > +} > + > +static void fixupShuffleMaskForPermutedSToV(SmallVectorImpl &ShuffV, > + int LHSMaxIdx, int RHSMinIdx, > + int RHSMaxIdx, int HalfVec) { > + for (int i = 0, e = ShuffV.size(); i < e; i++) { > + int Idx = ShuffV[i]; > + if ((Idx >= 0 && Idx < LHSMaxIdx) || (Idx >= RHSMinIdx && Idx < > RHSMaxIdx)) > + ShuffV[i] += HalfVec; > + } > + return; > +} > + > +// Replace a SCALAR_TO_VECTOR with a SCALAR_TO_VECTOR_PERMUTED except if > +// the original is: > +// ( (scalar_to_vector (Ty (extract_elt %a, C)))) > +// In such a case, just change the shuffle mask to extract the element > +// from the permuted index. > +static SDValue getSToVPermuted(SDValue OrigSToV, SelectionDAG &DAG) { > + SDLoc dl(OrigSToV); > + EVT VT = OrigSToV.getValueType(); > + assert(OrigSToV.getOpcode() == ISD::SCALAR_TO_VECTOR && > + "Expecting a SCALAR_TO_VECTOR here"); > + SDValue Input = OrigSToV.getOperand(0); > + > + if (Input.getOpcode() == ISD::EXTRACT_VECTOR_ELT) { > + ConstantSDNode *Idx = dyn_cast(Input.getOperand(1)); > + SDValue OrigVector = Input.getOperand(0); > + > + // Can't handle non-const element indices or > different vector types > + // for the input to the extract and the output of the > scalar_to_vector.
> + if (Idx && VT == OrigVector.getValueType()) { > + SmallVector NewMask(VT.getVectorNumElements(), -1); > + NewMask[VT.getVectorNumElements() / 2] = Idx->getZExtValue(); > + return DAG.getVectorShuffle(VT, dl, OrigVector, OrigVector, > NewMask); > + } > + } > + return DAG.getNode(PPCISD::SCALAR_TO_VECTOR_PERMUTED, dl, VT, > + OrigSToV.getOperand(0)); > +} > + > +// On little endian subtargets, combine shuffles such as: > +// vector_shuffle<16,1,17,3,18,5,19,7,20,9,21,11,22,13,23,15>, , %b > +// into: > +// vector_shuffle<16,0,17,1,18,2,19,3,20,4,21,5,22,6,23,7>, , %b > +// because the latter can be matched to a single instruction merge. > +// Furthermore, SCALAR_TO_VECTOR on little endian always involves a > permute > +// to put the value into element zero. Adjust the shuffle mask so that the > +// vector can remain in permuted form (to prevent a swap prior to a > shuffle). > +SDValue PPCTargetLowering::combineVectorShuffle(ShuffleVectorSDNode *SVN, > + SelectionDAG &DAG) const { > + SDValue LHS = SVN->getOperand(0); > + SDValue RHS = SVN->getOperand(1); > + auto Mask = SVN->getMask(); > + int NumElts = LHS.getValueType().getVectorNumElements(); > + SDValue Res(SVN, 0); > + SDLoc dl(SVN); > + > + // None of these combines are useful on big endian systems since the ISA > + // already has a big endian bias. > + if (!Subtarget.isLittleEndian()) > + return Res; > + > + // If this is not a shuffle of a shuffle and the first element comes > from > + // the second vector, canonicalize to the commuted form. This will make > it > + // more likely to match one of the single instruction patterns. 
> + if (Mask[0] >= NumElts && LHS.getOpcode() != ISD::VECTOR_SHUFFLE && > + RHS.getOpcode() != ISD::VECTOR_SHUFFLE) { > + std::swap(LHS, RHS); > + Res = DAG.getCommutedVectorShuffle(*SVN); > + Mask = cast(Res)->getMask(); > + } > + > + // Adjust the shuffle mask if either input vector comes from a > + // SCALAR_TO_VECTOR and keep the respective input vector in permuted > + // form (to prevent the need for a swap). > + SmallVector ShuffV(Mask.begin(), Mask.end()); > + SDValue SToVLHS = isScalarToVec(LHS); > + SDValue SToVRHS = isScalarToVec(RHS); > + if (SToVLHS || SToVRHS) { > + int NumEltsIn = SToVLHS ? > SToVLHS.getValueType().getVectorNumElements() > + : > SToVRHS.getValueType().getVectorNumElements(); > + int NumEltsOut = ShuffV.size(); > + > + // Initially assume that neither input is permuted. These will be > adjusted > + // accordingly if either input is. > + int LHSMaxIdx = -1; > + int RHSMinIdx = -1; > + int RHSMaxIdx = -1; > + int HalfVec = LHS.getValueType().getVectorNumElements() / 2; > + > + // Get the permuted scalar to vector nodes for the source(s) that > come from > + // ISD::SCALAR_TO_VECTOR. > + if (SToVLHS) { > + // Set up the values for the shuffle vector fixup. > + LHSMaxIdx = NumEltsOut / NumEltsIn; > + SToVLHS = getSToVPermuted(SToVLHS, DAG); > + if (SToVLHS.getValueType() != LHS.getValueType()) > + SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS); > + LHS = SToVLHS; > + } > + if (SToVRHS) { > + RHSMinIdx = NumEltsOut; > + RHSMaxIdx = NumEltsOut / NumEltsIn + RHSMinIdx; > + SToVRHS = getSToVPermuted(SToVRHS, DAG); > + if (SToVRHS.getValueType() != RHS.getValueType()) > + SToVRHS = DAG.getBitcast(RHS.getValueType(), SToVRHS); > + RHS = SToVRHS; > + } > + > + // Fix up the shuffle mask to reflect where the desired element > actually is. > + // The minimum and maximum indices that correspond to element zero > for both > + // the LHS and RHS are computed and will control which shuffle mask > entries > + // are to be changed. 
For example, if the RHS is permuted, any > shuffle mask > + // entries in the range [RHSMinIdx,RHSMaxIdx) will be incremented by > + // HalfVec to refer to the corresponding element in the permuted > vector. > + fixupShuffleMaskForPermutedSToV(ShuffV, LHSMaxIdx, RHSMinIdx, > RHSMaxIdx, > + HalfVec); > + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, > ShuffV); > + > + // We may have simplified away the shuffle. We won't be able to do > anything > + // further with it here. > + if (!isa(Res)) > + return Res; > + Mask = cast(Res)->getMask(); > + } > + > + // The common case after we commuted the shuffle is that the RHS is a > splat > + // and we have elements coming in from the splat at indices that are not > + // conducive to using a merge. > + // Example: > + // vector_shuffle<0,17,1,19,2,21,3,23,4,25,5,27,6,29,7,31> t1, > + if (!isSplatBV(RHS)) > + return Res; > + > + // We are looking for a mask such that all even elements are from > + // one vector and all odd elements from the other. > + if (!isAlternatingShuffMask(Mask, NumElts)) > + return Res; > + > + // Adjust the mask so we are pulling in the same index from the splat > + // as the index from the interesting vector in consecutive elements. 
> + // Example: > + // vector_shuffle<0,16,1,17,2,18,3,19,4,20,5,21,6,22,7,23> t1, > + for (int i = 1, e = Mask.size(); i < e; i += 2) > + ShuffV[i] = (ShuffV[i - 1] + NumElts); > + > + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV); > + return Res; > +} > + > SDValue PPCTargetLowering::combineVReverseMemOP(ShuffleVectorSDNode *SVN, > LSBaseSDNode *LSBase, > DAGCombinerInfo &DCI) > const { > @@ -14223,7 +14441,7 @@ SDValue > PPCTargetLowering::PerformDAGCombine(SDNode *N, > LSBaseSDNode* LSBase = cast(N->getOperand(0)); > return combineVReverseMemOP(cast(N), LSBase, > DCI); > } > - break; > + return combineVectorShuffle(cast(N), DCI.DAG); > case ISD::STORE: { > > EVT Op1VT = N->getOperand(1).getValueType(); > > diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h > b/llvm/lib/Target/PowerPC/PPCISelLowering.h > index 77252e919553..9f7c6ab53a17 100644 > --- a/llvm/lib/Target/PowerPC/PPCISelLowering.h > +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h > @@ -221,6 +221,14 @@ namespace llvm { > /// As with SINT_VEC_TO_FP, used for converting illegal types. > UINT_VEC_TO_FP, > > + /// PowerPC instructions that have SCALAR_TO_VECTOR semantics tend to > + /// place the value into the least significant element of the most > + /// significant doubleword in the vector. This is not element zero for > + /// anything smaller than a doubleword on either endianness. This > node has > + /// the same semantics as SCALAR_TO_VECTOR except that the value > remains in > + /// the aforementioned location in the vector register. > + SCALAR_TO_VECTOR_PERMUTED, > + > // FIXME: Remove these once the ANDI glue bug is fixed: > /// i1 = ANDI_rec_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the result > of the > /// eq or gt bit of CR0 after executing andi. x, 1. 
This is used to > @@ -1215,6 +1223,8 @@ namespace llvm { > SDValue combineSetCC(SDNode *N, DAGCombinerInfo &DCI) const; > SDValue combineABS(SDNode *N, DAGCombinerInfo &DCI) const; > SDValue combineVSelect(SDNode *N, DAGCombinerInfo &DCI) const; > + SDValue combineVectorShuffle(ShuffleVectorSDNode *SVN, > + SelectionDAG &DAG) const; > SDValue combineVReverseMemOP(ShuffleVectorSDNode *SVN, LSBaseSDNode > *LSBase, > DAGCombinerInfo &DCI) const; > > > diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td > b/llvm/lib/Target/PowerPC/PPCInstrVSX.td > index e7ec1808ec3b..c43b2716cb37 100644 > --- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td > +++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td > @@ -138,6 +138,8 @@ def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH", > SDT_PPCldvsxlh, > [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>; > def PPCldsplat : SDNode<"PPCISD::LD_SPLAT", SDT_PPCldsplat, > [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>; > +def PPCSToV : SDNode<"PPCISD::SCALAR_TO_VECTOR_PERMUTED", > + SDTypeProfile<1, 1, []>, []>; > > //-------------------------- Predicate definitions > ---------------------------// > def HasVSX : Predicate<"PPCSubTarget->hasVSX()">; > @@ -288,6 +290,11 @@ class X_XS6_RA5_RB5 opcode, bits<10> xo, > string opc, > } // Predicates = HasP9Vector > } // AddedComplexity = 400, hasSideEffects = 0 > > +multiclass ScalToVecWPermute PermOut> { > + def : Pat<(Ty (scalar_to_vector In)), (Ty NonPermOut)>; > + def : Pat<(Ty (PPCSToV In)), (Ty PermOut)>; > +} > + > //-------------------------- Instruction definitions > -------------------------// > // VSX instructions require the VSX feature, they are to be selected over > // equivalent Altivec patterns (as they address a larger register set) and > @@ -2710,12 +2717,14 @@ def : Pat<(v2i64 (build_vector DblToLong.A, > DblToLong.A)), > def : Pat<(v2i64 (build_vector DblToULong.A, DblToULong.A)), > (v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), > (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), 0))>; > -def : 
Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)), > - (v4i32 (XXSPLTW (COPY_TO_REGCLASS > - (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC), > 1))>; > -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)), > - (v4i32 (XXSPLTW (COPY_TO_REGCLASS > - (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC), > 1))>; > +defm : ScalToVecWPermute< > + v4i32, FltToIntLoad.A, > + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC), > 1), > + (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC)>; > +defm : ScalToVecWPermute< > + v4i32, FltToUIntLoad.A, > + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC), > 1), > + (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC)>; > def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)), > (v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>; > def : Pat<(v2f64 (PPCldsplat xoaddr:$A)), > @@ -2730,10 +2739,12 @@ def : Pat<(v2i64 (build_vector FltToLong.A, > FltToLong.A)), > def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)), > (v2i64 (XXPERMDIs > (COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>; > -def : Pat<(v2i64 (scalar_to_vector DblToLongLoad.A)), > - (v2i64 (XVCVDPSXDS (LXVDSX xoaddr:$A)))>; > -def : Pat<(v2i64 (scalar_to_vector DblToULongLoad.A)), > - (v2i64 (XVCVDPUXDS (LXVDSX xoaddr:$A)))>; > +defm : ScalToVecWPermute< > + v2i64, DblToLongLoad.A, > + (XVCVDPSXDS (LXVDSX xoaddr:$A)), (XVCVDPSXDS (LXVDSX xoaddr:$A))>; > +defm : ScalToVecWPermute< > + v2i64, DblToULongLoad.A, > + (XVCVDPUXDS (LXVDSX xoaddr:$A)), (XVCVDPUXDS (LXVDSX xoaddr:$A))>; > } // HasVSX > > // Any big endian VSX subtarget. > @@ -2831,9 +2842,10 @@ def : Pat > // Any little endian VSX subtarget. 
> let Predicates = [HasVSX, IsLittleEndian] in { > -def : Pat<(v2f64 (scalar_to_vector f64:$A)), > - (v2f64 (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64), > - (SUBREG_TO_REG (i64 1), $A, sub_64), 0))>; > +defm : ScalToVecWPermute + (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64), > + (SUBREG_TO_REG (i64 1), $A, sub_64), > 0), > + (SUBREG_TO_REG (i64 1), $A, sub_64)>; > > def : Pat<(f64 (extractelt v2f64:$S, 0)), > (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>; > @@ -2943,18 +2955,24 @@ def : Pat<(PPCstore_scal_int_from_vsr > (STXSDX (XSCVDPUXDS f64:$src), xoaddr:$dst)>; > > // Load-and-splat with fp-to-int conversion (using X-Form VSX/FP loads). > -def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)), > - (v4i32 (XXSPLTW (COPY_TO_REGCLASS > - (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC), > 1))>; > -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)), > - (v4i32 (XXSPLTW (COPY_TO_REGCLASS > - (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC), > 1))>; > -def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)), > - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS > - (XFLOADf32 xoaddr:$A), VSFRC)), > 0))>; > -def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)), > - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS > - (XFLOADf32 xoaddr:$A), VSFRC)), > 0))>; > +defm : ScalToVecWPermute< > + v4i32, DblToIntLoad.A, > + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC), > 1), > + (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC)>; > +defm : ScalToVecWPermute< > + v4i32, DblToUIntLoad.A, > + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC), > 1), > + (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC)>; > +defm : ScalToVecWPermute< > + v2i64, FltToLongLoad.A, > + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A), > VSFRC)), 0), > + (SUBREG_TO_REG (i64 1), (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 > xoaddr:$A), > + VSFRC)), sub_64)>; > +defm : ScalToVecWPermute< > + v2i64, FltToULongLoad.A, > + (XXPERMDIs 
(XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A), > VSFRC)), 0), > + (SUBREG_TO_REG (i64 1), (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 > xoaddr:$A), > + VSFRC)), sub_64)>; > } // HasVSX, NoP9Vector > > // Any VSX subtarget that only has loads and stores that load in big > endian > @@ -3156,8 +3174,12 @@ def : Pat (f64 (COPY_TO_REGCLASS $S1, VSRC)), VSFRC)))>; > > // v4f32 scalar <-> vector conversions (LE) > -def : Pat<(v4f32 (scalar_to_vector f32:$A)), > - (v4f32 (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1))>; > + // The permuted version is no better than the version that puts the > value > + // into the right element because XSCVDPSPN is > different from all the other > + // instructions used for PPCSToV. > + defm : ScalToVecWPermute + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1), > + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 3)>; > def : Pat<(f32 (vector_extract v4f32:$S, 0)), > (f32 (XSCVSPDPN (XXSLDWI $S, $S, 3)))>; > def : Pat<(f32 (vector_extract v4f32:$S, 1)), > @@ -3189,18 +3211,25 @@ def : Pat<(f64 (PPCfcfid (f64 (PPCmtvsra (i32 > (extractelt v4i32:$A, 3)))))), > // LIWAX - This instruction is used for sign extending i32 -> i64. > // LIWZX - This instruction will be emitted for i32, f32, and when > // zero-extending i32 to i64 (zext i32 -> i64). 
> -def : Pat<(v2i64 (scalar_to_vector (i64 (sextloadi32 xoaddr:$src)))), > - (v2i64 (XXPERMDIs > - (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2))>; > -def : Pat<(v2i64 (scalar_to_vector (i64 (zextloadi32 xoaddr:$src)))), > - (v2i64 (XXPERMDIs > - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>; > -def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))), > - (v4i32 (XXPERMDIs > - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>; > -def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))), > - (v4f32 (XXPERMDIs > - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>; > +defm : ScalToVecWPermute< > + v2i64, (i64 (sextloadi32 xoaddr:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (LIWAX xoaddr:$src), sub_64)>; > + > +defm : ScalToVecWPermute< > + v2i64, (i64 (zextloadi32 xoaddr:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>; > + > +defm : ScalToVecWPermute< > + v4i32, (i32 (load xoaddr:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>; > + > +defm : ScalToVecWPermute< > + v4f32, (f32 (load xoaddr:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>; > > def : Pat (v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3), > @@ -3336,14 +3365,17 @@ def : Pat<(i64 (vector_extract v2i64:$S, > i64:$Idx)), > // Little endian VSX subtarget with direct moves. 
> let Predicates = [HasVSX, HasDirectMove, IsLittleEndian] in { > // v16i8 scalar <-> vector conversions (LE) > - def : Pat<(v16i8 (scalar_to_vector i32:$A)), > - (v16i8 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>; > - def : Pat<(v8i16 (scalar_to_vector i32:$A)), > - (v8i16 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>; > - def : Pat<(v4i32 (scalar_to_vector i32:$A)), > - (v4i32 MovesToVSR.LE_WORD_0)>; > - def : Pat<(v2i64 (scalar_to_vector i64:$A)), > - (v2i64 MovesToVSR.LE_DWORD_0)>; > + defm : ScalToVecWPermute + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC), > + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, VSRC)>; > + defm : ScalToVecWPermute + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC), > + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, VSRC)>; > + defm : ScalToVecWPermute + (SUBREG_TO_REG (i64 1), (MTVSRWZ $A), sub_64)>; > + defm : ScalToVecWPermute + MovesToVSR.LE_DWORD_1>; > + > // v2i64 scalar <-> vector conversions (LE) > def : Pat<(i64 (vector_extract v2i64:$S, 0)), > (i64 VectorExtractions.LE_DWORD_0)>; > @@ -3641,30 +3673,41 @@ def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, > xoaddr:$dst), > (STXVX $rS, xoaddr:$dst)>; > > // Build vectors from i8 loads > -def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)), > - (v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>; > -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)), > - (v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>; > -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)), > - (v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>; > -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi8i64)), > - (v2i64 (XXPERMDIs (LXSIBZX xoaddr:$src), 0))>; > -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi8)), > - (v4i32 (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1))>; > -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi8i64)), > - (v2i64 (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0))>; > +defm : ScalToVecWPermute + (VSPLTBs 7, (LXSIBZX xoaddr:$src)), > + (VSPLTBs 7, (LXSIBZX xoaddr:$src))>; > +defm : ScalToVecWPermute + 
(VSPLTHs 3, (LXSIBZX xoaddr:$src)), > + (VSPLTHs 3, (LXSIBZX xoaddr:$src))>; > +defm : ScalToVecWPermute + (XXSPLTWs (LXSIBZX xoaddr:$src), 1), > + (XXSPLTWs (LXSIBZX xoaddr:$src), 1)>; > +defm : ScalToVecWPermute + (XXPERMDIs (LXSIBZX xoaddr:$src), 0), > + (XXPERMDIs (LXSIBZX xoaddr:$src), 0)>; > +defm : ScalToVecWPermute + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1), > + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1)>; > +defm : ScalToVecWPermute + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0), > + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), > 0)>; > > // Build vectors from i16 loads > -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.Li16)), > - (v8i16 (VSPLTHs 3, (LXSIHZX xoaddr:$src)))>; > -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi16)), > - (v4i32 (XXSPLTWs (LXSIHZX xoaddr:$src), 1))>; > -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)), > - (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>; > -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi16)), > - (v4i32 (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1))>; > -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi16i64)), > - (v2i64 (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0))>; > +defm : ScalToVecWPermute + (VSPLTHs 3, (LXSIHZX xoaddr:$src)), > + (VSPLTHs 3, (LXSIHZX xoaddr:$src))>; > +defm : ScalToVecWPermute + (XXSPLTWs (LXSIHZX xoaddr:$src), 1), > + (XXSPLTWs (LXSIHZX xoaddr:$src), 1)>; > +defm : ScalToVecWPermute + (XXPERMDIs (LXSIHZX xoaddr:$src), 0), > + (XXPERMDIs (LXSIHZX xoaddr:$src), 0)>; > +defm : ScalToVecWPermute + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1), > + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1)>; > +defm : ScalToVecWPermute + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0), > + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), > 0)>; > > // Load/convert and convert/store patterns for f16. 
> def : Pat<(f64 (extloadf16 xoaddr:$src)), > @@ -3806,8 +3849,7 @@ def : Pat<(f32 (PPCxsminc f32:$XA, f32:$XB)), > VSSRC))>; > > // Endianness-neutral patterns for const splats with ISA 3.0 instructions. > -def : Pat<(v4i32 (scalar_to_vector i32:$A)), > - (v4i32 (MTVSRWS $A))>; > +defm : ScalToVecWPermute; > def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)), > (v4i32 (MTVSRWS $A))>; > def : Pat<(v16i8 (build_vector immNonAllOneAnyExt8:$A, > immNonAllOneAnyExt8:$A, > @@ -3819,24 +3861,32 @@ def : Pat<(v16i8 (build_vector > immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A, > immNonAllOneAnyExt8:$A, > immNonAllOneAnyExt8:$A, > immNonAllOneAnyExt8:$A, > immNonAllOneAnyExt8:$A)), > (v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>; > -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)), > - (v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>; > -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)), > - (v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>; > -def : Pat<(v4i32 (scalar_to_vector DblToIntLoadP9.A)), > - (v4i32 (XXSPLTW (COPY_TO_REGCLASS > - (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), > 1))>; > -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoadP9.A)), > - (v4i32 (XXSPLTW (COPY_TO_REGCLASS > - (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), > 1))>; > -def : Pat<(v2i64 (scalar_to_vector FltToLongLoadP9.A)), > - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS > - (DFLOADf32 iaddrX4:$A), > - VSFRC)), 0))>; > -def : Pat<(v2i64 (scalar_to_vector FltToULongLoadP9.A)), > - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS > - (DFLOADf32 iaddrX4:$A), > - VSFRC)), 0))>; > +defm : ScalToVecWPermute + (XVCVSPSXWS (LXVWSX xoaddr:$A)), > + (XVCVSPSXWS (LXVWSX xoaddr:$A))>; > +defm : ScalToVecWPermute + (XVCVSPUXWS (LXVWSX xoaddr:$A)), > + (XVCVSPUXWS (LXVWSX xoaddr:$A))>; > +defm : ScalToVecWPermute< > + v4i32, DblToIntLoadP9.A, > + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), > 1), > + (SUBREG_TO_REG (i64 1), (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), sub_64)>; > 
+defm : ScalToVecWPermute< > + v4i32, DblToUIntLoadP9.A, > + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), > 1), > + (SUBREG_TO_REG (i64 1), (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), sub_64)>; > +defm : ScalToVecWPermute< > + v2i64, FltToLongLoadP9.A, > + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), > VSFRC)), 0), > + (SUBREG_TO_REG > + (i64 1), > + (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), > sub_64)>; > +defm : ScalToVecWPermute< > + v2i64, FltToULongLoadP9.A, > + (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), > VSFRC)), 0), > + (SUBREG_TO_REG > + (i64 1), > + (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), > sub_64)>; > def : Pat<(v4f32 (PPCldsplat xoaddr:$A)), > (v4f32 (LXVWSX xoaddr:$A))>; > def : Pat<(v4i32 (PPCldsplat xoaddr:$A)), > @@ -4116,19 +4166,23 @@ def : Pat<(truncstorei16 (i32 (vector_extract > v8i16:$S, 6)), xoaddr:$dst), > def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)), xoaddr:$dst), > (STXSIHXv (COPY_TO_REGCLASS (v16i8 (VSLDOI $S, $S, 10)), VSRC), > xoaddr:$dst)>; > > -def : Pat<(v2i64 (scalar_to_vector (i64 (load iaddrX4:$src)))), > - (v2i64 (XXPERMDIs > - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>; > -def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddrX4:$src)))), > - (v2i64 (XXPERMDIs > - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>; > +defm : ScalToVecWPermute< > + v2i64, (i64 (load iaddrX4:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>; > +defm : ScalToVecWPermute< > + v2i64, (i64 (load xaddrX4:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>; > +defm : ScalToVecWPermute< > + v2f64, (f64 (load iaddrX4:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (DFLOADf64 
iaddrX4:$src), sub_64)>; > +defm : ScalToVecWPermute< > + v2f64, (f64 (load xaddrX4:$src)), > + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2), > + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>; > > -def : Pat<(v2f64 (scalar_to_vector (f64 (load iaddrX4:$src)))), > - (v2f64 (XXPERMDIs > - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>; > -def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddrX4:$src)))), > - (v2f64 (XXPERMDIs > - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>; > def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddrX4:$src), > (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2), > sub_64), xaddrX4:$src)>; > > diff --git a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll > b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll > index 8c9ffa815467..4d06571d0ec7 100644 > --- a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll > +++ b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll > @@ -13,8 +13,7 @@ define void @testExpandPostRAPseudo(i32* nocapture > readonly %ptr) { > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8: lfiwzx f0, 0, r3 > ; CHECK-P8: ld r4, .LC0@toc@l(r4) > -; CHECK-P8: xxswapd vs0, f0 > -; CHECK-P8: xxspltw v2, vs0, 3 > +; CHECK-P8: xxspltw v2, vs0, 1 > ; CHECK-P8: stvx v2, 0, r4 > ; CHECK-P8: lis r4, 1024 > ; CHECK-P8: lfiwax f0, 0, r3 > > diff --git a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll > b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll > index ee0cc41ea6bd..1cb7d7b62055 100644 > --- a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll > +++ b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll > @@ -1282,8 +1282,7 @@ define <4 x i32> @spltMemVali(i32* nocapture > readonly %ptr) { > ; P8LE-LABEL: spltMemVali: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfiwzx f0, 0, r3 > -; P8LE-NEXT: xxswapd vs0, f0 > -; P8LE-NEXT: xxspltw v2, vs0, 3 > +; P8LE-NEXT: xxspltw v2, vs0, 1 > ; P8LE-NEXT: blr > entry: > %0 = load i32, i32* %ptr, align 4 > @@ -2801,8 +2800,7 @@ define <4 x i32> @spltMemValui(i32* 
nocapture > readonly %ptr) { > ; P8LE-LABEL: spltMemValui: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfiwzx f0, 0, r3 > -; P8LE-NEXT: xxswapd vs0, f0 > -; P8LE-NEXT: xxspltw v2, vs0, 3 > +; P8LE-NEXT: xxspltw v2, vs0, 1 > ; P8LE-NEXT: blr > entry: > %0 = load i32, i32* %ptr, align 4 > @@ -4573,7 +4571,7 @@ define <2 x i64> @spltMemValConvftoll(float* > nocapture readonly %ptr) { > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfs f0, 0(r3) > ; P9LE-NEXT: xscvdpsxds f0, f0 > -; P9LE-NEXT: xxspltd v2, f0, 0 > +; P9LE-NEXT: xxspltd v2, vs0, 0 > ; P9LE-NEXT: blr > ; > ; P8BE-LABEL: spltMemValConvftoll: > @@ -4587,7 +4585,7 @@ define <2 x i64> @spltMemValConvftoll(float* > nocapture readonly %ptr) { > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfsx f0, 0, r3 > ; P8LE-NEXT: xscvdpsxds f0, f0 > -; P8LE-NEXT: xxspltd v2, f0, 0 > +; P8LE-NEXT: xxspltd v2, vs0, 0 > ; P8LE-NEXT: blr > entry: > %0 = load float, float* %ptr, align 4 > @@ -5761,7 +5759,7 @@ define <2 x i64> @spltMemValConvftoull(float* > nocapture readonly %ptr) { > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfs f0, 0(r3) > ; P9LE-NEXT: xscvdpuxds f0, f0 > -; P9LE-NEXT: xxspltd v2, f0, 0 > +; P9LE-NEXT: xxspltd v2, vs0, 0 > ; P9LE-NEXT: blr > ; > ; P8BE-LABEL: spltMemValConvftoull: > @@ -5775,7 +5773,7 @@ define <2 x i64> @spltMemValConvftoull(float* > nocapture readonly %ptr) { > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfsx f0, 0, r3 > ; P8LE-NEXT: xscvdpuxds f0, f0 > -; P8LE-NEXT: xxspltd v2, f0, 0 > +; P8LE-NEXT: xxspltd v2, vs0, 0 > ; P8LE-NEXT: blr > entry: > %0 = load float, float* %ptr, align 4 > > diff --git a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll > b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll > index 2ffe98e1f694..7fac0511e3c5 100644 > --- a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll > +++ b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll > @@ -23,18 +23,12 @@ entry: > define dso_local <16 x i8> @testmrghb2(<16 x i8> %a, <16 x i8> %b) > local_unnamed_addr #0 { > ; 
CHECK-P8-LABEL: testmrghb2: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r3, r2, .LCPI1_0@toc@ha > -; CHECK-P8-NEXT: addi r3, r3, .LCPI1_0@toc@l > -; CHECK-P8-NEXT: lvx v4, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P8-NEXT: vmrghb v2, v2, v3 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: testmrghb2: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0@toc@l > -; CHECK-P9-NEXT: lxvx v4, 0, r3 > -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P9-NEXT: vmrghb v2, v2, v3 > ; CHECK-P9-NEXT: blr > entry: > %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> 24, i32 8, i32 25, i32 9, i32 26, i32 10, i32 27, i32 11, i32 28, i32 12, > i32 29, i32 13, i32 30, i32 14, i32 31, i32 15> > @@ -57,18 +51,12 @@ entry: > define dso_local <16 x i8> @testmrghh2(<16 x i8> %a, <16 x i8> %b) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: testmrghh2: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r3, r2, .LCPI3_0@toc@ha > -; CHECK-P8-NEXT: addi r3, r3, .LCPI3_0@toc@l > -; CHECK-P8-NEXT: lvx v4, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P8-NEXT: vmrghh v2, v2, v3 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: testmrghh2: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: addis r3, r2, .LCPI3_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI3_0@toc@l > -; CHECK-P9-NEXT: lxvx v4, 0, r3 > -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P9-NEXT: vmrghh v2, v2, v3 > ; CHECK-P9-NEXT: blr > entry: > %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> 24, i32 25, i32 8, i32 9, i32 26, i32 27, i32 10, i32 11, i32 28, i32 29, > i32 12, i32 13, i32 30, i32 31, i32 14, i32 15> > @@ -91,18 +79,12 @@ entry: > define dso_local <16 x i8> @testmrglb2(<16 x i8> %a, <16 x i8> %b) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: testmrglb2: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r3, r2, .LCPI5_0@toc@ha > -; 
CHECK-P8-NEXT: addi r3, r3, .LCPI5_0@toc@l > -; CHECK-P8-NEXT: lvx v4, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P8-NEXT: vmrglb v2, v2, v3 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: testmrglb2: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: addis r3, r2, .LCPI5_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI5_0@toc@l > -; CHECK-P9-NEXT: lxvx v4, 0, r3 > -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P9-NEXT: vmrglb v2, v2, v3 > ; CHECK-P9-NEXT: blr > entry: > %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 20, i32 4, i32 > 21, i32 5, i32 22, i32 6, i32 23, i32 7> > @@ -125,18 +107,12 @@ entry: > define dso_local <16 x i8> @testmrglh2(<16 x i8> %a, <16 x i8> %b) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: testmrglh2: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r3, r2, .LCPI7_0@toc@ha > -; CHECK-P8-NEXT: addi r3, r3, .LCPI7_0@toc@l > -; CHECK-P8-NEXT: lvx v4, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P8-NEXT: vmrglh v2, v2, v3 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: testmrglh2: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: addis r3, r2, .LCPI7_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI7_0@toc@l > -; CHECK-P9-NEXT: lxvx v4, 0, r3 > -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P9-NEXT: vmrglh v2, v2, v3 > ; CHECK-P9-NEXT: blr > entry: > %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> 16, i32 17, i32 0, i32 1, i32 18, i32 19, i32 2, i32 3, i32 20, i32 21, i32 > 4, i32 5, i32 22, i32 23, i32 6, i32 7> > @@ -159,18 +135,12 @@ entry: > define dso_local <16 x i8> @testmrghw2(<16 x i8> %a, <16 x i8> %b) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: testmrghw2: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r3, r2, .LCPI9_0@toc@ha > -; CHECK-P8-NEXT: addi r3, r3, .LCPI9_0@toc@l > -; CHECK-P8-NEXT: lvx v4, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 > +; 
CHECK-P8-NEXT: vmrghw v2, v2, v3 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: testmrghw2: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: addis r3, r2, .LCPI9_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI9_0@toc@l > -; CHECK-P9-NEXT: lxvx v4, 0, r3 > -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P9-NEXT: vmrghw v2, v2, v3 > ; CHECK-P9-NEXT: blr > entry: > %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> 24, i32 25, i32 26, i32 27, i32 8, i32 9, i32 10, i32 11, i32 28, i32 29, > i32 30, i32 31, i32 12, i32 13, i32 14, i32 15> > @@ -193,18 +163,12 @@ entry: > define dso_local <16 x i8> @testmrglw2(<16 x i8> %a, <16 x i8> %b) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: testmrglw2: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r3, r2, .LCPI11_0@toc@ha > -; CHECK-P8-NEXT: addi r3, r3, .LCPI11_0@toc@l > -; CHECK-P8-NEXT: lvx v4, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P8-NEXT: vmrglw v2, v2, v3 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: testmrglw2: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: addis r3, r2, .LCPI11_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI11_0@toc@l > -; CHECK-P9-NEXT: lxvx v4, 0, r3 > -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 > +; CHECK-P9-NEXT: vmrglw v2, v2, v3 > ; CHECK-P9-NEXT: blr > entry: > %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> 16, i32 17, i32 18, i32 19, i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 > 22, i32 23, i32 4, i32 5, i32 6, i32 7> > @@ -215,24 +179,16 @@ define dso_local <8 x i16> @testmrglb3(<8 x i8>* > nocapture readonly %a) local_un > ; CHECK-P8-LABEL: testmrglb3: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: ld r3, 0(r3) > -; CHECK-P8-NEXT: addis r4, r2, .LCPI12_0@toc@ha > -; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI12_0@toc@l > -; CHECK-P8-NEXT: lvx v3, 0, r3 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: vperm v2, v2, v4, v3 > +; 
CHECK-P8-NEXT: xxlxor v2, v2, v2 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > +; CHECK-P8-NEXT: vmrghb v2, v2, v3 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: testmrglb3: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: lfd f0, 0(r3) > -; CHECK-P9-NEXT: addis r3, r2, .LCPI12_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI12_0@toc@l > -; CHECK-P9-NEXT: lxvx v3, 0, r3 > -; CHECK-P9-NEXT: xxswapd v2, f0 > -; CHECK-P9-NEXT: xxlxor v4, v4, v4 > -; CHECK-P9-NEXT: vperm v2, v2, v4, v3 > +; CHECK-P9-NEXT: lxsd v2, 0(r3) > +; CHECK-P9-NEXT: xxlxor v3, v3, v3 > +; CHECK-P9-NEXT: vmrghb v2, v3, v2 > ; CHECK-P9-NEXT: blr > entry: > %0 = load <8 x i8>, <8 x i8>* %a, align 8 > > diff --git a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll > b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll > index a23db59635a4..3a43b3584caf 100644 > --- a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll > +++ b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll > @@ -331,12 +331,12 @@ define <2 x float> @fptrunc_v2f32_v2f64(<2 x double> > %vf1) { > ; P9: # %bb.0: > ; P9-NEXT: xsrsp f0, v2 > ; P9-NEXT: xscvdpspn vs0, f0 > -; P9-NEXT: xxsldwi v3, vs0, vs0, 1 > +; P9-NEXT: xxsldwi v3, vs0, vs0, 3 > ; P9-NEXT: xxswapd vs0, v2 > ; P9-NEXT: xsrsp f0, f0 > ; P9-NEXT: xscvdpspn vs0, f0 > -; P9-NEXT: xxsldwi v2, vs0, vs0, 1 > -; P9-NEXT: vmrglw v2, v3, v2 > +; P9-NEXT: xxsldwi v2, vs0, vs0, 3 > +; P9-NEXT: vmrghw v2, v3, v2 > ; P9-NEXT: blr > %res = call <2 x float> > @llvm.experimental.constrained.fptrunc.v2f32.v2f64( > <2 x double> %vf1, > > diff --git a/llvm/test/CodeGen/PowerPC/load-and-splat.ll > b/llvm/test/CodeGen/PowerPC/load-and-splat.ll > index f411712ba3fa..26da1fdaefef 100644 > --- a/llvm/test/CodeGen/PowerPC/load-and-splat.ll > +++ b/llvm/test/CodeGen/PowerPC/load-and-splat.ll > @@ -40,8 +40,7 @@ define dso_local void @test2(<4 x float>* nocapture %c, > float* nocapture readonl > ; P8: # %bb.0: # %entry > ; P8-NEXT: addi r4, r4, 12 > ; P8-NEXT: lfiwzx f0, 0, r4 > -; P8-NEXT: xxswapd vs0, f0 > -; P8-NEXT: 
xxspltw v2, vs0, 3 > +; P8-NEXT: xxspltw v2, vs0, 1 > ; P8-NEXT: stvx v2, 0, r3 > ; P8-NEXT: blr > entry: > @@ -65,8 +64,7 @@ define dso_local void @test3(<4 x i32>* nocapture %c, > i32* nocapture readonly %a > ; P8: # %bb.0: # %entry > ; P8-NEXT: addi r4, r4, 12 > ; P8-NEXT: lfiwzx f0, 0, r4 > -; P8-NEXT: xxswapd vs0, f0 > -; P8-NEXT: xxspltw v2, vs0, 3 > +; P8-NEXT: xxspltw v2, vs0, 1 > ; P8-NEXT: stvx v2, 0, r3 > ; P8-NEXT: blr > entry: > @@ -110,8 +108,7 @@ define <16 x i8> @unadjusted_lxvwsx(i32* %s, i32* %t) { > ; P8-LABEL: unadjusted_lxvwsx: > ; P8: # %bb.0: # %entry > ; P8-NEXT: lfiwzx f0, 0, r3 > -; P8-NEXT: xxswapd vs0, f0 > -; P8-NEXT: xxspltw v2, vs0, 3 > +; P8-NEXT: xxspltw v2, vs0, 1 > ; P8-NEXT: blr > entry: > %0 = bitcast i32* %s to <4 x i8>* > @@ -131,8 +128,7 @@ define <16 x i8> @adjusted_lxvwsx(i64* %s, i64* %t) { > ; P8: # %bb.0: # %entry > ; P8-NEXT: ld r3, 0(r3) > ; P8-NEXT: mtfprd f0, r3 > -; P8-NEXT: xxswapd v2, vs0 > -; P8-NEXT: xxspltw v2, v2, 2 > +; P8-NEXT: xxspltw v2, vs0, 0 > ; P8-NEXT: blr > entry: > %0 = bitcast i64* %s to <8 x i8>* > > diff --git a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll > b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll > index 409978549c36..a03ab5f9519e 100644 > --- a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll > +++ b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll > @@ -9,8 +9,7 @@ define <16 x i8> @test(i32* %s, i32* %t) { > ; CHECK-LE-LABEL: test: > ; CHECK-LE: # %bb.0: # %entry > ; CHECK-LE-NEXT: lfiwzx f0, 0, r3 > -; CHECK-LE-NEXT: xxswapd vs0, f0 > -; CHECK-LE-NEXT: xxspltw v2, vs0, 3 > +; CHECK-LE-NEXT: xxspltw v2, vs0, 1 > ; CHECK-LE-NEXT: blr > > ; CHECK-LABEL: test: > > diff --git a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll > b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll > index e1f0e827b9f6..dffa0fb98fc0 100644 > --- a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll > +++ b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll > @@ -21,8 +21,8 @@ 
entry: > ; CHECK: sldi r3, r3, 56 > ; CHECK: mtvsrd v2, r3 > ; CHECK-LE-LABEL: buildc > -; CHECK-LE: mtfprd f0, r3 > -; CHECK-LE: xxswapd v2, vs0 > +; CHECK-LE: mtvsrd v2, r3 > +; CHECK-LE: vspltb v2, v2, 7 > } > > ; Function Attrs: norecurse nounwind readnone > @@ -35,8 +35,8 @@ entry: > ; CHECK: sldi r3, r3, 48 > ; CHECK: mtvsrd v2, r3 > ; CHECK-LE-LABEL: builds > -; CHECK-LE: mtfprd f0, r3 > -; CHECK-LE: xxswapd v2, vs0 > +; CHECK-LE: mtvsrd v2, r3 > +; CHECK-LE: vsplth v2, v2, 3 > } > > ; Function Attrs: norecurse nounwind readnone > > diff --git a/llvm/test/CodeGen/PowerPC/pr25080.ll > b/llvm/test/CodeGen/PowerPC/pr25080.ll > index 7a2fb76fd453..f87cb5b940ca 100644 > --- a/llvm/test/CodeGen/PowerPC/pr25080.ll > +++ b/llvm/test/CodeGen/PowerPC/pr25080.ll > @@ -17,41 +17,33 @@ define <8 x i16> @pr25080(<8 x i32> %a) { > ; LE-NEXT: mfvsrwz 3, 34 > ; LE-NEXT: xxsldwi 1, 34, 34, 1 > ; LE-NEXT: mfvsrwz 4, 35 > -; LE-NEXT: xxsldwi 4, 34, 34, 3 > -; LE-NEXT: mtfprd 2, 3 > +; LE-NEXT: xxsldwi 2, 34, 34, 3 > +; LE-NEXT: mtvsrd 36, 3 > ; LE-NEXT: mffprwz 3, 0 > ; LE-NEXT: xxswapd 0, 35 > -; LE-NEXT: mtfprd 3, 4 > -; LE-NEXT: xxsldwi 5, 35, 35, 1 > +; LE-NEXT: mtvsrd 37, 4 > ; LE-NEXT: mffprwz 4, 1 > -; LE-NEXT: xxsldwi 7, 35, 35, 3 > -; LE-NEXT: mtfprd 1, 3 > -; LE-NEXT: xxswapd 33, 3 > -; LE-NEXT: mffprwz 3, 4 > -; LE-NEXT: mtfprd 4, 4 > -; LE-NEXT: xxswapd 34, 1 > +; LE-NEXT: xxsldwi 1, 35, 35, 1 > +; LE-NEXT: mtvsrd 34, 3 > +; LE-NEXT: mffprwz 3, 2 > +; LE-NEXT: mtvsrd 32, 4 > ; LE-NEXT: mffprwz 4, 0 > -; LE-NEXT: mtfprd 0, 3 > -; LE-NEXT: xxswapd 35, 4 > -; LE-NEXT: mffprwz 3, 5 > -; LE-NEXT: mtfprd 6, 4 > -; LE-NEXT: xxswapd 36, 0 > -; LE-NEXT: mtfprd 1, 3 > -; LE-NEXT: mffprwz 3, 7 > -; LE-NEXT: xxswapd 37, 6 > -; LE-NEXT: vmrglh 2, 3, 2 > -; LE-NEXT: xxswapd 35, 2 > -; LE-NEXT: mtfprd 2, 3 > -; LE-NEXT: xxswapd 32, 1 > +; LE-NEXT: xxsldwi 0, 35, 35, 3 > +; LE-NEXT: mtvsrd 33, 3 > +; LE-NEXT: mffprwz 3, 1 > +; LE-NEXT: mtvsrd 38, 4 > +; LE-NEXT: mtvsrd 35, 3 > +; 
LE-NEXT: mffprwz 3, 0 > +; LE-NEXT: vmrghh 2, 0, 2 > +; LE-NEXT: mtvsrd 32, 3 > ; LE-NEXT: addis 3, 2, .LCPI0_1@toc@ha > +; LE-NEXT: vmrghh 4, 1, 4 > ; LE-NEXT: addi 3, 3, .LCPI0_1@toc@l > -; LE-NEXT: xxswapd 38, 2 > -; LE-NEXT: vmrglh 3, 4, 3 > -; LE-NEXT: vmrglh 4, 0, 5 > -; LE-NEXT: vmrglh 5, 6, 1 > -; LE-NEXT: vmrglw 2, 3, 2 > -; LE-NEXT: vmrglw 3, 5, 4 > +; LE-NEXT: vmrghh 3, 3, 6 > +; LE-NEXT: vmrghh 5, 0, 5 > +; LE-NEXT: vmrglw 2, 4, 2 > ; LE-NEXT: vspltish 4, 15 > +; LE-NEXT: vmrglw 3, 5, 3 > ; LE-NEXT: xxmrgld 34, 35, 34 > ; LE-NEXT: lvx 3, 0, 3 > ; LE-NEXT: xxlor 34, 34, 35 > > diff --git a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll > b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll > index 4c10c3813fb5..d3bfb910fc9f 100644 > --- a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll > +++ b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll > @@ -58,12 +58,11 @@ L.LB38_2452: > > ; CHECK-LABEL: @aercalc_ > ; CHECK: lfs > -; CHECK: xxspltd > +; CHECK: xxswapd > ; CHECK: stxvd2x > ; CHECK-NOT: xxswapd > > ; CHECK-P9-LABEL: @aercalc_ > ; CHECK-P9: lfs > -; CHECK-P9: xxspltd > ; CHECK-P9: stxv > ; CHECK-P9-NOT: xxswapd > > diff --git a/llvm/test/CodeGen/PowerPC/pr38087.ll > b/llvm/test/CodeGen/PowerPC/pr38087.ll > index e05a3d2b97aa..49b3d39bc18c 100644 > --- a/llvm/test/CodeGen/PowerPC/pr38087.ll > +++ b/llvm/test/CodeGen/PowerPC/pr38087.ll > @@ -11,9 +11,8 @@ declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, > i32) #0 > define void @draw_llvm_vs_variant0(<4 x float> %x) { > ; CHECK-LABEL: draw_llvm_vs_variant0: > ; CHECK: # %bb.0: # %entry > -; CHECK-NEXT: lfd f0, 0(r3) > -; CHECK-NEXT: xxswapd v3, f0 > -; CHECK-NEXT: vmrglh v3, v3, v3 > +; CHECK-NEXT: lxsd v3, 0(r3) > +; CHECK-NEXT: vmrghh v3, v3, v3 > ; CHECK-NEXT: vextsh2w v3, v3 > ; CHECK-NEXT: xvcvsxwsp vs0, v3 > ; CHECK-NEXT: xxspltw vs0, vs0, 2 > > diff --git a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll > b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll > index 4c9137d86124..6584cb74bdb5 100644 > ---
a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll > +++ b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll > @@ -11,34 +11,31 @@ > define signext i32 @test_pre_inc_disable_1(i8* nocapture readonly %pix1, > i32 signext %i_stride_pix1, i8* nocapture readonly %pix2) { > ; CHECK-LABEL: test_pre_inc_disable_1: > ; CHECK: # %bb.0: # %entry > -; CHECK-NEXT: lfd f0, 0(r5) > +; CHECK-NEXT: lxsd v5, 0(r5) > ; CHECK-NEXT: addis r5, r2, .LCPI0_0@toc@ha > ; CHECK-NEXT: addi r5, r5, .LCPI0_0@toc@l > ; CHECK-NEXT: lxvx v2, 0, r5 > ; CHECK-NEXT: addis r5, r2, .LCPI0_1@toc@ha > ; CHECK-NEXT: addi r5, r5, .LCPI0_1@toc@l > ; CHECK-NEXT: lxvx v4, 0, r5 > -; CHECK-NEXT: xxswapd v5, f0 > -; CHECK-NEXT: xxlxor v3, v3, v3 > ; CHECK-NEXT: li r5, 4 > +; CHECK-NEXT: xxlxor v3, v3, v3 > ; CHECK-NEXT: vperm v0, v3, v5, v2 > ; CHECK-NEXT: mtctr r5 > ; CHECK-NEXT: li r5, 0 > -; CHECK-NEXT: vperm v1, v5, v3, v4 > +; CHECK-NEXT: vperm v1, v3, v5, v4 > ; CHECK-NEXT: li r6, 0 > ; CHECK-NEXT: xvnegsp v5, v0 > ; CHECK-NEXT: xvnegsp v0, v1 > ; CHECK-NEXT: .p2align 4 > ; CHECK-NEXT: .LBB0_1: # %for.cond1.preheader > ; CHECK-NEXT: # > -; CHECK-NEXT: lfd f0, 0(r3) > -; CHECK-NEXT: xxswapd v1, f0 > -; CHECK-NEXT: lfdx f0, r3, r4 > -; CHECK-NEXT: vperm v6, v1, v3, v4 > +; CHECK-NEXT: lxsd v1, 0(r3) > +; CHECK-NEXT: vperm v6, v3, v1, v4 > ; CHECK-NEXT: vperm v1, v3, v1, v2 > ; CHECK-NEXT: xvnegsp v1, v1 > -; CHECK-NEXT: add r7, r3, r4 > ; CHECK-NEXT: xvnegsp v6, v6 > +; CHECK-NEXT: add r7, r3, r4 > ; CHECK-NEXT: vabsduw v1, v1, v5 > ; CHECK-NEXT: vabsduw v6, v6, v0 > ; CHECK-NEXT: vadduwm v1, v6, v1 > @@ -46,15 +43,14 @@ define signext i32 @test_pre_inc_disable_1(i8* > nocapture readonly %pix1, i32 sig > ; CHECK-NEXT: vadduwm v1, v1, v6 > ; CHECK-NEXT: xxspltw v6, v1, 2 > ; CHECK-NEXT: vadduwm v1, v1, v6 > -; CHECK-NEXT: xxswapd v6, f0 > +; CHECK-NEXT: lxsdx v6, r3, r4 > ; CHECK-NEXT: vextuwrx r3, r5, v1 > -; CHECK-NEXT: vperm v7, v6, v3, v4 > +; CHECK-NEXT: vperm v7, v3, v6, v4 > ; CHECK-NEXT: vperm
v6, v3, v6, v2 > -; CHECK-NEXT: add r6, r3, r6 > -; CHECK-NEXT: add r3, r7, r4 > ; CHECK-NEXT: xvnegsp v6, v6 > ; CHECK-NEXT: xvnegsp v1, v7 > ; CHECK-NEXT: vabsduw v6, v6, v5 > +; CHECK-NEXT: add r6, r3, r6 > ; CHECK-NEXT: vabsduw v1, v1, v0 > ; CHECK-NEXT: vadduwm v1, v1, v6 > ; CHECK-NEXT: xxswapd v6, v1 > @@ -62,6 +58,7 @@ define signext i32 @test_pre_inc_disable_1(i8* nocapture > readonly %pix1, i32 sig > ; CHECK-NEXT: xxspltw v6, v1, 2 > ; CHECK-NEXT: vadduwm v1, v1, v6 > ; CHECK-NEXT: vextuwrx r8, r5, v1 > +; CHECK-NEXT: add r3, r7, r4 > ; CHECK-NEXT: add r6, r8, r6 > ; CHECK-NEXT: bdnz .LBB0_1 > ; CHECK-NEXT: # %bb.2: # %for.cond.cleanup > @@ -181,29 +178,27 @@ for.cond.cleanup: ; > preds = %for.cond1.preheader > define signext i32 @test_pre_inc_disable_2(i8* nocapture readonly %pix1, > i8* nocapture readonly %pix2) { > ; CHECK-LABEL: test_pre_inc_disable_2: > ; CHECK: # %bb.0: # %entry > -; CHECK-NEXT: lfd f0, 0(r3) > +; CHECK-NEXT: lxsd v2, 0(r3) > ; CHECK-NEXT: addis r3, r2, .LCPI1_0@toc@ha > ; CHECK-NEXT: addi r3, r3, .LCPI1_0@toc@l > ; CHECK-NEXT: lxvx v4, 0, r3 > ; CHECK-NEXT: addis r3, r2, .LCPI1_1@toc@ha > -; CHECK-NEXT: xxswapd v2, f0 > -; CHECK-NEXT: lfd f0, 0(r4) > ; CHECK-NEXT: addi r3, r3, .LCPI1_1@toc@l > -; CHECK-NEXT: xxlxor v3, v3, v3 > ; CHECK-NEXT: lxvx v0, 0, r3 > -; CHECK-NEXT: xxswapd v1, f0 > -; CHECK-NEXT: vperm v5, v2, v3, v4 > +; CHECK-NEXT: lxsd v1, 0(r4) > +; CHECK-NEXT: xxlxor v3, v3, v3 > +; CHECK-NEXT: vperm v5, v3, v2, v4 > ; CHECK-NEXT: vperm v2, v3, v2, v0 > ; CHECK-NEXT: vperm v0, v3, v1, v0 > -; CHECK-NEXT: vperm v3, v1, v3, v4 > +; CHECK-NEXT: vperm v3, v3, v1, v4 > ; CHECK-NEXT: vabsduw v2, v2, v0 > ; CHECK-NEXT: vabsduw v3, v5, v3 > ; CHECK-NEXT: vadduwm v2, v3, v2 > ; CHECK-NEXT: xxswapd v3, v2 > -; CHECK-NEXT: li r3, 0 > ; CHECK-NEXT: vadduwm v2, v2, v3 > ; CHECK-NEXT: xxspltw v3, v2, 2 > ; CHECK-NEXT: vadduwm v2, v2, v3 > +; CHECK-NEXT: li r3, 0 > ; CHECK-NEXT: vextuwrx r3, r3, v2 > ; CHECK-NEXT: extsw
r3, r3 > ; CHECK-NEXT: blr > @@ -286,16 +281,14 @@ define void @test32(i8* nocapture readonly %pix2, > i32 signext %i_pix2) { > ; CHECK-LABEL: test32: > ; CHECK: # %bb.0: # %entry > ; CHECK-NEXT: add r5, r3, r4 > -; CHECK-NEXT: lfiwzx f0, r3, r4 > +; CHECK-NEXT: lxsiwzx v2, r3, r4 > ; CHECK-NEXT: addis r3, r2, .LCPI2_0@toc@ha > ; CHECK-NEXT: addi r3, r3, .LCPI2_0@toc@l > ; CHECK-NEXT: lxvx v4, 0, r3 > ; CHECK-NEXT: li r3, 4 > -; CHECK-NEXT: xxswapd v2, f0 > -; CHECK-NEXT: lfiwzx f0, r5, r3 > +; CHECK-NEXT: lxsiwzx v5, r5, r3 > ; CHECK-NEXT: xxlxor v3, v3, v3 > ; CHECK-NEXT: vperm v2, v2, v3, v4 > -; CHECK-NEXT: xxswapd v5, f0 > ; CHECK-NEXT: vperm v3, v5, v3, v4 > ; CHECK-NEXT: vspltisw v4, 8 > ; CHECK-NEXT: vnegw v3, v3 > @@ -361,16 +354,15 @@ define void @test16(i16* nocapture readonly %sums, > i32 signext %delta, i32 signe > ; CHECK-NEXT: lxsihzx v2, r6, r7 > ; CHECK-NEXT: lxsihzx v4, r3, r4 > ; CHECK-NEXT: li r6, 0 > -; CHECK-NEXT: mtfprd f0, r6 > +; CHECK-NEXT: mtvsrd v3, r6 > ; CHECK-NEXT: vsplth v4, v4, 3 > -; CHECK-NEXT: xxswapd v3, vs0 > ; CHECK-NEXT: vsplth v2, v2, 3 > ; CHECK-NEXT: addis r3, r2, .LCPI3_0@toc@ha > ; CHECK-NEXT: addi r3, r3, .LCPI3_0@toc@l > -; CHECK-NEXT: vmrglh v2, v3, v2 > -; CHECK-NEXT: vmrglh v3, v3, v4 > -; CHECK-NEXT: xxlxor v4, v4, v4 > -; CHECK-NEXT: vmrglw v3, v3, v4 > +; CHECK-NEXT: vmrghh v4, v3, v4 > +; CHECK-NEXT: vmrghh v2, v3, v2 > +; CHECK-NEXT: vsplth v3, v3, 3 > +; CHECK-NEXT: vmrglw v3, v4, v3 > ; CHECK-NEXT: lxvx v4, 0, r3 > ; CHECK-NEXT: li r3, 0 > ; CHECK-NEXT: vperm v2, v2, v3, v4 > @@ -446,18 +438,17 @@ define void @test8(i8* nocapture readonly %sums, i32 > signext %delta, i32 signext > ; CHECK-NEXT: add r6, r3, r4 > ; CHECK-NEXT: lxsibzx v2, r3, r4 > ; CHECK-NEXT: li r3, 0 > -; CHECK-NEXT: mtfprd f0, r3 > +; CHECK-NEXT: mtvsrd v3, r3 > ; CHECK-NEXT: li r3, 8 > ; CHECK-NEXT: lxsibzx v5, r6, r3 > -; CHECK-NEXT: xxswapd v3, vs0 > -; CHECK-NEXT: vspltb v4, v3, 15 > -; CHECK-NEXT: vspltb v2, v2, 7 > -;
CHECK-NEXT: vmrglb v2, v3, v2 > ; CHECK-NEXT: addis r3, r2, .LCPI4_0@toc@ha > ; CHECK-NEXT: addi r3, r3, .LCPI4_0@toc@l > +; CHECK-NEXT: vspltb v2, v2, 7 > +; CHECK-NEXT: vmrghb v2, v3, v2 > +; CHECK-NEXT: vspltb v4, v3, 7 > ; CHECK-NEXT: vspltb v5, v5, 7 > ; CHECK-NEXT: vmrglh v2, v2, v4 > -; CHECK-NEXT: vmrglb v3, v3, v5 > +; CHECK-NEXT: vmrghb v3, v3, v5 > ; CHECK-NEXT: vmrglw v2, v2, v4 > ; CHECK-NEXT: vmrglh v3, v3, v4 > ; CHECK-NEXT: vmrglw v3, v4, v3 > > diff --git a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll > b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll > index 099611a7b5e3..50b864980d98 100644 > --- a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll > +++ b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll > @@ -53,8 +53,7 @@ define <4 x float> @foof(float* nocapture readonly %a) > #0 { > ; CHECK-LABEL: foof: > ; CHECK: # %bb.0: # %entry > ; CHECK-NEXT: lfiwzx f0, 0, r3 > -; CHECK-NEXT: xxswapd vs0, f0 > -; CHECK-NEXT: xxspltw v2, vs0, 3 > +; CHECK-NEXT: xxspltw v2, vs0, 1 > ; CHECK-NEXT: blr > entry: > %0 = load float, float* %a, align 4 > @@ -68,8 +67,7 @@ define <4 x float> @foofx(float* nocapture readonly %a, > i64 %idx) #0 { > ; CHECK: # %bb.0: # %entry > ; CHECK-NEXT: sldi r4, r4, 2 > ; CHECK-NEXT: lfiwzx f0, r3, r4 > -; CHECK-NEXT: xxswapd vs0, f0 > -; CHECK-NEXT: xxspltw v2, vs0, 3 > +; CHECK-NEXT: xxspltw v2, vs0, 1 > ; CHECK-NEXT: blr > entry: > %p = getelementptr float, float* %a, i64 %idx > > diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll > b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll > index b43e2c8b97af..c12f7f9a9f05 100644 > --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll > +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll > @@ -13,8 +13,7 @@ define <2 x i64> @s2v_test1(i64* nocapture readonly > %int64, <2 x i64> %vec) { > ; P9LE-LABEL: s2v_test1: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 0(r3) > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test1: > @@ -33,8 +32,7 @@ define <2 x i64> @s2v_test2(i64* nocapture readonly > %int64, <2 x i64> %vec) { > ; P9LE-LABEL: s2v_test2: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 8(r3) > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test2: > @@ -55,8 +53,7 @@ define <2 x i64> @s2v_test3(i64* nocapture readonly > %int64, <2 x i64> %vec, i32 > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: sldi r4, r7, 3 > ; P9LE-NEXT: lfdx f0, r3, r4 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test3 > @@ -78,8 +75,7 @@ define <2 x i64> @s2v_test4(i64* nocapture readonly > %int64, <2 x i64> %vec) { > ; P9LE-LABEL: s2v_test4: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 8(r3) > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test4: > @@ -99,8 +95,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i64* > nocapture readonly %ptr1) { > ; P9LE-LABEL: s2v_test5: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 0(r5) > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test5: > @@ -119,8 +114,7 @@ define <2 x double> @s2v_test_f1(double* nocapture > readonly %f64, <2 x double> % > ; P9LE-LABEL: s2v_test_f1: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 0(r3) > -; P9LE-NEXT: xxswapd vs0, f0 > -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f1: > @@ -132,8 +126,7 @@ define <2 x double> @s2v_test_f1(double* nocapture > readonly %f64, <2 x double> % > ; P8LE-LABEL: s2v_test_f1: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfdx f0, 0, r3 > -; P8LE-NEXT: xxspltd vs0, vs0, 0 > -; P8LE-NEXT: 
xxpermdi v2, v2, vs0, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f1: > @@ -152,8 +145,7 @@ define <2 x double> @s2v_test_f2(double* nocapture > readonly %f64, <2 x double> % > ; P9LE-LABEL: s2v_test_f2: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 8(r3) > -; P9LE-NEXT: xxswapd vs0, f0 > -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f2: > @@ -165,8 +157,7 @@ define <2 x double> @s2v_test_f2(double* nocapture > readonly %f64, <2 x double> % > ; P8LE-LABEL: s2v_test_f2: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfd f0, 8(r3) > -; P8LE-NEXT: xxspltd vs0, vs0, 0 > -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f2: > @@ -187,8 +178,7 @@ define <2 x double> @s2v_test_f3(double* nocapture > readonly %f64, <2 x double> % > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: sldi r4, r7, 3 > ; P9LE-NEXT: lfdx f0, r3, r4 > -; P9LE-NEXT: xxswapd vs0, f0 > -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f3: > @@ -202,8 +192,7 @@ define <2 x double> @s2v_test_f3(double* nocapture > readonly %f64, <2 x double> % > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: sldi r4, r7, 3 > ; P8LE-NEXT: lfdx f0, r3, r4 > -; P8LE-NEXT: xxspltd vs0, vs0, 0 > -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f3: > @@ -225,8 +214,7 @@ define <2 x double> @s2v_test_f4(double* nocapture > readonly %f64, <2 x double> % > ; P9LE-LABEL: s2v_test_f4: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 8(r3) > -; P9LE-NEXT: xxswapd vs0, f0 > -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f4: > @@ -238,8 +226,7 @@ define <2 x double> @s2v_test_f4(double* nocapture > readonly %f64, <2 x double> % > ; P8LE-LABEL: 
s2v_test_f4: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfd f0, 8(r3) > -; P8LE-NEXT: xxspltd vs0, vs0, 0 > -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f4: > @@ -259,8 +246,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec, > double* nocapture readonly % > ; P9LE-LABEL: s2v_test_f5: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfd f0, 0(r5) > -; P9LE-NEXT: xxswapd vs0, f0 > -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f5: > @@ -272,8 +258,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec, > double* nocapture readonly % > ; P8LE-LABEL: s2v_test_f5: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfdx f0, 0, r5 > -; P8LE-NEXT: xxspltd vs0, vs0, 0 > -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f5: > > diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll > b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll > index 83691b52575d..f4572c359942 100644 > --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll > +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll > @@ -12,8 +12,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly > %int32, <2 x i64> %vec) { > ; P9LE-LABEL: s2v_test1: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfiwax f0, 0, r3 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test1: > @@ -25,8 +24,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly > %int32, <2 x i64> %vec) { > ; P8LE-LABEL: s2v_test1: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfiwax f0, 0, r3 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test1: > @@ -47,8 +45,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly > %int32, <2 x i64> %vec) { > ; 
P9LE: # %bb.0: # %entry > ; P9LE-NEXT: addi r3, r3, 4 > ; P9LE-NEXT: lfiwax f0, 0, r3 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test2: > @@ -62,8 +59,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly > %int32, <2 x i64> %vec) { > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: addi r3, r3, 4 > ; P8LE-NEXT: lfiwax f0, 0, r3 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test2: > @@ -86,8 +82,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly > %int32, <2 x i64> %vec, i32 > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: sldi r4, r7, 2 > ; P9LE-NEXT: lfiwax f0, r3, r4 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test3: > @@ -101,8 +96,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly > %int32, <2 x i64> %vec, i32 > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: sldi r4, r7, 2 > ; P8LE-NEXT: lfiwax f0, r3, r4 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test3: > @@ -126,8 +120,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly > %int32, <2 x i64> %vec) { > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: addi r3, r3, 4 > ; P9LE-NEXT: lfiwax f0, 0, r3 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test4: > @@ -141,8 +134,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly > %int32, <2 x i64> %vec) { > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: addi r3, r3, 4 > ; P8LE-NEXT: lfiwax f0, 0, r3 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test4: > @@ -164,8 +156,7 @@ define 
<2 x i64> @s2v_test5(<2 x i64> %vec, i32* > nocapture readonly %ptr1) { > ; P9LE-LABEL: s2v_test5: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfiwax f0, 0, r5 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P9LE-NEXT: xxmrghd v2, v2, vs0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test5: > @@ -177,8 +168,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32* > nocapture readonly %ptr1) { > ; P8LE-LABEL: s2v_test5: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfiwax f0, 0, r5 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 > +; P8LE-NEXT: xxmrghd v2, v2, vs0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test5: > @@ -198,8 +188,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly > %ptr) { > ; P9LE-LABEL: s2v_test6: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfiwax f0, 0, r3 > -; P9LE-NEXT: xxswapd v2, f0 > -; P9LE-NEXT: xxspltd v2, v2, 1 > +; P9LE-NEXT: xxspltd v2, vs0, 0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test6: > @@ -211,8 +200,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly > %ptr) { > ; P8LE-LABEL: s2v_test6: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfiwax f0, 0, r3 > -; P8LE-NEXT: xxswapd v2, f0 > -; P8LE-NEXT: xxspltd v2, v2, 1 > +; P8LE-NEXT: xxspltd v2, vs0, 0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test6: > @@ -233,8 +221,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly > %ptr) { > ; P9LE-LABEL: s2v_test7: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: lfiwax f0, 0, r3 > -; P9LE-NEXT: xxswapd v2, f0 > -; P9LE-NEXT: xxspltd v2, v2, 1 > +; P9LE-NEXT: xxspltd v2, vs0, 0 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test7: > @@ -246,8 +233,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly > %ptr) { > ; P8LE-LABEL: s2v_test7: > ; P8LE: # %bb.0: # %entry > ; P8LE-NEXT: lfiwax f0, 0, r3 > -; P8LE-NEXT: xxswapd v2, f0 > -; P8LE-NEXT: xxspltd v2, v2, 1 > +; P8LE-NEXT: xxspltd v2, vs0, 0 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test7: > > diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll > 
b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll > index 2261d75c6619..3dc34533420c 100644 > --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll > +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll > @@ -11,12 +11,11 @@ > define <4 x i32> @s2v_test1(i32* nocapture readonly %int32, <4 x i32> > %vec) { > ; P8LE-LABEL: s2v_test1: > ; P8LE: # %bb.0: # %entry > -; P8LE-NEXT: lfiwzx f0, 0, r3 > ; P8LE-NEXT: addis r4, r2, .LCPI0_0@toc@ha > -; P8LE-NEXT: addi r3, r4, .LCPI0_0@toc@l > -; P8LE-NEXT: lvx v3, 0, r3 > -; P8LE-NEXT: xxswapd v4, f0 > -; P8LE-NEXT: vperm v2, v4, v2, v3 > +; P8LE-NEXT: lxsiwzx v4, 0, r3 > +; P8LE-NEXT: addi r4, r4, .LCPI0_0@toc@l > +; P8LE-NEXT: lvx v3, 0, r4 > +; P8LE-NEXT: vperm v2, v2, v4, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test1: > @@ -36,13 +35,12 @@ entry: > define <4 x i32> @s2v_test2(i32* nocapture readonly %int32, <4 x i32> > %vec) { > ; P8LE-LABEL: s2v_test2: > ; P8LE: # %bb.0: # %entry > -; P8LE-NEXT: addi r3, r3, 4 > ; P8LE-NEXT: addis r4, r2, .LCPI1_0@toc@ha > -; P8LE-NEXT: lfiwzx f0, 0, r3 > -; P8LE-NEXT: addi r3, r4, .LCPI1_0@toc@l > -; P8LE-NEXT: lvx v3, 0, r3 > -; P8LE-NEXT: xxswapd v4, f0 > -; P8LE-NEXT: vperm v2, v4, v2, v3 > +; P8LE-NEXT: addi r3, r3, 4 > +; P8LE-NEXT: addi r4, r4, .LCPI1_0@toc@l > +; P8LE-NEXT: lxsiwzx v4, 0, r3 > +; P8LE-NEXT: lvx v3, 0, r4 > +; P8LE-NEXT: vperm v2, v2, v4, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test2: > @@ -64,13 +62,12 @@ entry: > define <4 x i32> @s2v_test3(i32* nocapture readonly %int32, <4 x i32> > %vec, i32 signext %Idx) { > ; P8LE-LABEL: s2v_test3: > ; P8LE: # %bb.0: # %entry > -; P8LE-NEXT: sldi r5, r7, 2 > ; P8LE-NEXT: addis r4, r2, .LCPI2_0@toc@ha > -; P8LE-NEXT: lfiwzx f0, r3, r5 > -; P8LE-NEXT: addi r3, r4, .LCPI2_0@toc@l > -; P8LE-NEXT: lvx v4, 0, r3 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: vperm v2, v3, v2, v4 > +; P8LE-NEXT: sldi r5, r7, 2 > +; P8LE-NEXT: addi r4, r4, .LCPI2_0@toc@l > +; P8LE-NEXT: lxsiwzx v3, r3, r5 >
+; P8LE-NEXT: lvx v4, 0, r4 > +; P8LE-NEXT: vperm v2, v2, v3, v4 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test3: > @@ -93,13 +90,12 @@ entry: > define <4 x i32> @s2v_test4(i32* nocapture readonly %int32, <4 x i32> > %vec) { > ; P8LE-LABEL: s2v_test4: > ; P8LE: # %bb.0: # %entry > -; P8LE-NEXT: addi r3, r3, 4 > ; P8LE-NEXT: addis r4, r2, .LCPI3_0@toc@ha > -; P8LE-NEXT: lfiwzx f0, 0, r3 > -; P8LE-NEXT: addi r3, r4, .LCPI3_0@toc@l > -; P8LE-NEXT: lvx v3, 0, r3 > -; P8LE-NEXT: xxswapd v4, f0 > -; P8LE-NEXT: vperm v2, v4, v2, v3 > +; P8LE-NEXT: addi r3, r3, 4 > +; P8LE-NEXT: addi r4, r4, .LCPI3_0@toc@l > +; P8LE-NEXT: lxsiwzx v4, 0, r3 > +; P8LE-NEXT: lvx v3, 0, r4 > +; P8LE-NEXT: vperm v2, v2, v4, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test4: > @@ -121,12 +117,11 @@ entry: > define <4 x i32> @s2v_test5(<4 x i32> %vec, i32* nocapture readonly > %ptr1) { > ; P8LE-LABEL: s2v_test5: > ; P8LE: # %bb.0: # %entry > -; P8LE-NEXT: lfiwzx f0, 0, r5 > ; P8LE-NEXT: addis r3, r2, .LCPI4_0@toc@ha > +; P8LE-NEXT: lxsiwzx v4, 0, r5 > ; P8LE-NEXT: addi r3, r3, .LCPI4_0@toc@l > ; P8LE-NEXT: lvx v3, 0, r3 > -; P8LE-NEXT: xxswapd v4, f0 > -; P8LE-NEXT: vperm v2, v4, v2, v3 > +; P8LE-NEXT: vperm v2, v2, v4, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test5: > @@ -146,12 +141,11 @@ entry: > define <4 x float> @s2v_test_f1(float* nocapture readonly %f64, <4 x > float> %vec) { > ; P8LE-LABEL: s2v_test_f1: > ; P8LE: # %bb.0: # %entry > -; P8LE-NEXT: lfiwzx f0, 0, r3 > ; P8LE-NEXT: addis r4, r2, .LCPI5_0@toc@ha > -; P8LE-NEXT: addi r3, r4, .LCPI5_0@toc@l > -; P8LE-NEXT: lvx v3, 0, r3 > -; P8LE-NEXT: xxswapd v4, f0 > -; P8LE-NEXT: vperm v2, v4, v2, v3 > +; P8LE-NEXT: lxsiwzx v4, 0, r3 > +; P8LE-NEXT: addi r4, r4, .LCPI5_0@toc@l > +; P8LE-NEXT: lvx v3, 0, r4 > +; P8LE-NEXT: vperm v2, v2, v4, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f1: > @@ -172,10 +166,9 @@ define <2 x float> @s2v_test_f2(float* nocapture > readonly %f64, <2 x float> %vec > ; P9LE-LABEL:
s2v_test_f2: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: addi r3, r3, 4 > -; P9LE-DAG: xxspltw v2, v2, 2 > -; P9LE-DAG: lfiwzx f0, 0, r3 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: vmrglw v2, v2, v3 > +; P9LE-NEXT: lxsiwzx v3, 0, r3 > +; P9LE-NEXT: vmrglw v2, v2, v2 > +; P9LE-NEXT: vmrghw v2, v2, v3 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f2: > @@ -189,11 +182,10 @@ define <2 x float> @s2v_test_f2(float* nocapture > readonly %f64, <2 x float> %vec > > ; P8LE-LABEL: s2v_test_f2: > ; P8LE: # %bb.0: # %entry > +; P8LE-NEXT: vmrglw v2, v2, v2 > ; P8LE-NEXT: addi r3, r3, 4 > -; P8LE-NEXT: xxspltw v2, v2, 2 > -; P8LE-NEXT: lfiwzx f0, 0, r3 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: vmrglw v2, v2, v3 > +; P8LE-NEXT: lxsiwzx v3, 0, r3 > +; P8LE-NEXT: vmrghw v2, v2, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f2: > @@ -216,10 +208,9 @@ define <2 x float> @s2v_test_f3(float* nocapture > readonly %f64, <2 x float> %vec > ; P9LE-LABEL: s2v_test_f3: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: sldi r4, r7, 2 > -; P9LE-NEXT: lfiwzx f0, r3, r4 > -; P9LE-DAG: xxspltw v2, v2, 2 > -; P9LE-DAG: xxswapd v3, f0 > -; P9LE-NEXT: vmrglw v2, v2, v3 > +; P9LE-NEXT: lxsiwzx v3, r3, r4 > +; P9LE-NEXT: vmrglw v2, v2, v2 > +; P9LE-NEXT: vmrghw v2, v2, v3 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f3: > @@ -233,11 +224,10 @@ define <2 x float> @s2v_test_f3(float* nocapture > readonly %f64, <2 x float> %vec > > ; P8LE-LABEL: s2v_test_f3: > ; P8LE: # %bb.0: # %entry > +; P8LE-NEXT: vmrglw v2, v2, v2 > ; P8LE-NEXT: sldi r4, r7, 2 > -; P8LE-NEXT: xxspltw v2, v2, 2 > -; P8LE-NEXT: lfiwzx f0, r3, r4 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: vmrglw v2, v2, v3 > +; P8LE-NEXT: lxsiwzx v3, r3, r4 > +; P8LE-NEXT: vmrghw v2, v2, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f3: > @@ -261,10 +251,9 @@ define <2 x float> @s2v_test_f4(float* nocapture > readonly %f64, <2 x float> %vec > ; P9LE-LABEL: s2v_test_f4: > ; P9LE: # %bb.0: # %entry > ; P9LE-NEXT: addi r3, r3, 4 > -; 
P9LE-NEXT: lfiwzx f0, 0, r3 > -; P9LE-DAG: xxspltw v2, v2, 2 > -; P9LE-DAG: xxswapd v3, f0 > -; P9LE-NEXT: vmrglw v2, v2, v3 > +; P9LE-NEXT: lxsiwzx v3, 0, r3 > +; P9LE-NEXT: vmrglw v2, v2, v2 > +; P9LE-NEXT: vmrghw v2, v2, v3 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f4: > @@ -278,11 +267,10 @@ define <2 x float> @s2v_test_f4(float* nocapture > readonly %f64, <2 x float> %vec > > ; P8LE-LABEL: s2v_test_f4: > ; P8LE: # %bb.0: # %entry > +; P8LE-NEXT: vmrglw v2, v2, v2 > ; P8LE-NEXT: addi r3, r3, 4 > -; P8LE-NEXT: xxspltw v2, v2, 2 > -; P8LE-NEXT: lfiwzx f0, 0, r3 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: vmrglw v2, v2, v3 > +; P8LE-NEXT: lxsiwzx v3, 0, r3 > +; P8LE-NEXT: vmrghw v2, v2, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f4: > @@ -304,10 +292,9 @@ entry: > define <2 x float> @s2v_test_f5(<2 x float> %vec, float* nocapture > readonly %ptr1) { > ; P9LE-LABEL: s2v_test_f5: > ; P9LE: # %bb.0: # %entry > -; P9LE-NEXT: lfiwzx f0, 0, r5 > -; P9LE-NEXT: xxspltw v2, v2, 2 > -; P9LE-NEXT: xxswapd v3, f0 > -; P9LE-NEXT: vmrglw v2, v2, v3 > +; P9LE-NEXT: lxsiwzx v3, 0, r5 > +; P9LE-NEXT: vmrglw v2, v2, v2 > +; P9LE-NEXT: vmrghw v2, v2, v3 > ; P9LE-NEXT: blr > > ; P9BE-LABEL: s2v_test_f5: > @@ -320,10 +307,9 @@ define <2 x float> @s2v_test_f5(<2 x float> %vec, > float* nocapture readonly %ptr > > ; P8LE-LABEL: s2v_test_f5: > ; P8LE: # %bb.0: # %entry > -; P8LE-NEXT: lfiwzx f0, 0, r5 > -; P8LE-NEXT: xxspltw v2, v2, 2 > -; P8LE-NEXT: xxswapd v3, f0 > -; P8LE-NEXT: vmrglw v2, v2, v3 > +; P8LE-NEXT: vmrglw v2, v2, v2 > +; P8LE-NEXT: lxsiwzx v3, 0, r5 > +; P8LE-NEXT: vmrghw v2, v2, v3 > ; P8LE-NEXT: blr > > ; P8BE-LABEL: s2v_test_f5: > > diff --git a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll > b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll > index 935630745f47..097ba07a5b1e 100644 > --- a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll > +++ b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll > @@ -13,60 +13,56 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) { > 
; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, -21386 > -; P9LE-NEXT: ori r5, r5, 37253 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r5, r4, r5 > -; P9LE-NEXT: add r4, r5, r4 > +; P9LE-NEXT: lis r4, -21386 > +; P9LE-NEXT: ori r4, r4, 37253 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r4, r3, r4 > +; P9LE-NEXT: add r4, r4, r3 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 6 > ; P9LE-NEXT: add r4, r4, r5 > -; P9LE-NEXT: lis r5, 31710 > ; P9LE-NEXT: mulli r4, r4, 95 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, 31710 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: ori r5, r5, 63421 > -; P9LE-NEXT: mulhw r5, r4, r5 > -; P9LE-NEXT: sub r4, r5, r4 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: ori r4, r4, 63421 > +; P9LE-NEXT: mulhw r4, r3, r4 > +; P9LE-NEXT: sub r4, r4, r3 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 6 > ; P9LE-NEXT: add r4, r4, r5 > -; P9LE-NEXT: lis r5, 21399 > ; P9LE-NEXT: mulli r4, r4, -124 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, 21399 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: ori r5, r5, 33437 > -; P9LE-NEXT: mulhw r4, r4, r5 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: ori r4, r4, 33437 > +; P9LE-NEXT: mulhw r4, r3, r4 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 5 > ; P9LE-NEXT: add r4, r4, r5 > -; P9LE-NEXT: lis r5, -16728 > ; P9LE-NEXT: mulli r4, r4, 98 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: ori r5, r5, 63249 > -; P9LE-NEXT: mulhw r4, 
r4, r5 > +; P9LE-NEXT: lis r4, -16728 > +; P9LE-NEXT: ori r4, r4, 63249 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r4, r3, r4 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 8 > ; P9LE-NEXT: add r4, r4, r5 > ; P9LE-NEXT: mulli r4, r4, -1003 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > ; P9LE-NEXT: vmrglw v2, v2, v3 > ; P9LE-NEXT: blr > ; > @@ -135,58 +131,54 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) { > ; P8LE: # %bb.0: > ; P8LE-NEXT: xxswapd vs0, v2 > ; P8LE-NEXT: lis r3, 21399 > -; P8LE-NEXT: lis r9, -21386 > -; P8LE-NEXT: lis r11, 31710 > ; P8LE-NEXT: lis r8, -16728 > +; P8LE-NEXT: lis r9, -21386 > +; P8LE-NEXT: lis r10, 31710 > ; P8LE-NEXT: ori r3, r3, 33437 > -; P8LE-NEXT: ori r9, r9, 37253 > ; P8LE-NEXT: ori r8, r8, 63249 > +; P8LE-NEXT: ori r9, r9, 37253 > +; P8LE-NEXT: ori r10, r10, 63421 > ; P8LE-NEXT: mffprd r4, f0 > ; P8LE-NEXT: rldicl r5, r4, 32, 48 > -; P8LE-NEXT: clrldi r7, r4, 48 > ; P8LE-NEXT: rldicl r6, r4, 16, 48 > +; P8LE-NEXT: clrldi r7, r4, 48 > +; P8LE-NEXT: extsh r5, r5 > +; P8LE-NEXT: extsh r6, r6 > ; P8LE-NEXT: rldicl r4, r4, 48, 48 > -; P8LE-NEXT: extsh r10, r5 > -; P8LE-NEXT: extsh r0, r7 > -; P8LE-NEXT: mulhw r3, r10, r3 > -; P8LE-NEXT: ori r10, r11, 63421 > -; P8LE-NEXT: extsh r11, r4 > -; P8LE-NEXT: extsh r12, r6 > -; P8LE-NEXT: mulhw r9, r0, r9 > -; P8LE-NEXT: mulhw r10, r11, r10 > -; P8LE-NEXT: mulhw r8, r12, r8 > -; P8LE-NEXT: srwi r12, r3, 31 > +; P8LE-NEXT: extsh r7, r7 > +; P8LE-NEXT: mulhw r3, r5, r3 > +; P8LE-NEXT: extsh r4, r4 > +; P8LE-NEXT: mulhw r8, r6, r8 > +; P8LE-NEXT: mulhw r9, r7, r9 > +; P8LE-NEXT: mulhw r10, r4, r10 > +; P8LE-NEXT: srwi r11, r3, 31 > ; P8LE-NEXT: srawi r3, r3, 5 > -; P8LE-NEXT: add r9, r9, r0 > -; P8LE-NEXT: sub r10, r10, r11 > -; P8LE-NEXT: add r3, r3, r12 > +; 
P8LE-NEXT: add r3, r3, r11 > +; P8LE-NEXT: srwi r11, r8, 31 > +; P8LE-NEXT: add r9, r9, r7 > +; P8LE-NEXT: srawi r8, r8, 8 > +; P8LE-NEXT: sub r10, r10, r4 > +; P8LE-NEXT: add r8, r8, r11 > ; P8LE-NEXT: srwi r11, r9, 31 > ; P8LE-NEXT: srawi r9, r9, 6 > -; P8LE-NEXT: srwi r12, r8, 31 > -; P8LE-NEXT: srawi r8, r8, 8 > +; P8LE-NEXT: mulli r3, r3, 98 > ; P8LE-NEXT: add r9, r9, r11 > ; P8LE-NEXT: srwi r11, r10, 31 > ; P8LE-NEXT: srawi r10, r10, 6 > -; P8LE-NEXT: add r8, r8, r12 > -; P8LE-NEXT: mulli r3, r3, 98 > -; P8LE-NEXT: add r10, r10, r11 > ; P8LE-NEXT: mulli r8, r8, -1003 > +; P8LE-NEXT: add r10, r10, r11 > ; P8LE-NEXT: mulli r9, r9, 95 > ; P8LE-NEXT: mulli r10, r10, -124 > ; P8LE-NEXT: sub r3, r5, r3 > +; P8LE-NEXT: mtvsrd v2, r3 > ; P8LE-NEXT: sub r5, r6, r8 > -; P8LE-NEXT: mtfprd f0, r3 > ; P8LE-NEXT: sub r3, r7, r9 > +; P8LE-NEXT: mtvsrd v3, r5 > ; P8LE-NEXT: sub r4, r4, r10 > -; P8LE-NEXT: mtfprd f1, r5 > -; P8LE-NEXT: mtfprd f2, r3 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: mtfprd f3, r4 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: xxswapd v5, vs3 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: vmrglh v3, v5, v4 > +; P8LE-NEXT: mtvsrd v4, r3 > +; P8LE-NEXT: mtvsrd v5, r4 > +; P8LE-NEXT: vmrghh v2, v3, v2 > +; P8LE-NEXT: vmrghh v3, v5, v4 > ; P8LE-NEXT: vmrglw v2, v2, v3 > ; P8LE-NEXT: blr > ; > @@ -256,56 +248,52 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, -21386 > -; P9LE-NEXT: ori r5, r5, 37253 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r6, r4, r5 > -; P9LE-NEXT: add r4, r6, r4 > -; P9LE-NEXT: srwi r6, r4, 31 > -; P9LE-NEXT: srawi r4, r4, 6 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: mulli r4, r4, 95 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, -21386 > +; P9LE-NEXT: ori r4, r4, 37253 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r5, r3, r4 > +; 
P9LE-NEXT: add r5, r5, r3 > +; P9LE-NEXT: srwi r6, r5, 31 > +; P9LE-NEXT: srawi r5, r5, 6 > +; P9LE-NEXT: add r5, r5, r6 > +; P9LE-NEXT: mulli r5, r5, 95 > +; P9LE-NEXT: sub r3, r3, r5 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r6, r4, r5 > -; P9LE-NEXT: add r4, r6, r4 > -; P9LE-NEXT: srwi r6, r4, 31 > -; P9LE-NEXT: srawi r4, r4, 6 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: mulli r4, r4, 95 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r5, r3, r4 > +; P9LE-NEXT: add r5, r5, r3 > +; P9LE-NEXT: srwi r6, r5, 31 > +; P9LE-NEXT: srawi r5, r5, 6 > +; P9LE-NEXT: add r5, r5, r6 > +; P9LE-NEXT: mulli r5, r5, 95 > +; P9LE-NEXT: sub r3, r3, r5 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r6, r4, r5 > -; P9LE-NEXT: add r4, r6, r4 > -; P9LE-NEXT: srwi r6, r4, 31 > -; P9LE-NEXT: srawi r4, r4, 6 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: mulli r4, r4, 95 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r5, r3, r4 > +; P9LE-NEXT: add r5, r5, r3 > +; P9LE-NEXT: srwi r6, r5, 31 > +; P9LE-NEXT: srawi r5, r5, 6 > +; P9LE-NEXT: add r5, r5, r6 > +; P9LE-NEXT: mulli r5, r5, 95 > +; P9LE-NEXT: sub r3, r3, r5 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r5, r4, r5 > -; P9LE-NEXT: add r4, r5, r4 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r4, r3, r4 > +; P9LE-NEXT: add r4, r4, r3 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 6 > ; P9LE-NEXT: add r4, r4, r5 > ; P9LE-NEXT: mulli r4, r4, 95 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; 
P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > ; P9LE-NEXT: vmrglw v2, v2, v3 > ; P9LE-NEXT: blr > ; > @@ -370,56 +358,50 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) { > ; P8LE: # %bb.0: > ; P8LE-NEXT: xxswapd vs0, v2 > ; P8LE-NEXT: lis r3, -21386 > -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill > ; P8LE-NEXT: ori r3, r3, 37253 > ; P8LE-NEXT: mffprd r4, f0 > ; P8LE-NEXT: clrldi r5, r4, 48 > ; P8LE-NEXT: rldicl r6, r4, 48, 48 > -; P8LE-NEXT: extsh r8, r5 > +; P8LE-NEXT: extsh r5, r5 > ; P8LE-NEXT: rldicl r7, r4, 32, 48 > -; P8LE-NEXT: extsh r9, r6 > -; P8LE-NEXT: mulhw r10, r8, r3 > +; P8LE-NEXT: extsh r6, r6 > +; P8LE-NEXT: mulhw r8, r5, r3 > ; P8LE-NEXT: rldicl r4, r4, 16, 48 > -; P8LE-NEXT: extsh r11, r7 > -; P8LE-NEXT: mulhw r12, r9, r3 > -; P8LE-NEXT: extsh r0, r4 > -; P8LE-NEXT: mulhw r30, r11, r3 > -; P8LE-NEXT: mulhw r3, r0, r3 > -; P8LE-NEXT: add r8, r10, r8 > -; P8LE-NEXT: add r9, r12, r9 > -; P8LE-NEXT: srwi r10, r8, 31 > +; P8LE-NEXT: extsh r7, r7 > +; P8LE-NEXT: mulhw r9, r6, r3 > +; P8LE-NEXT: extsh r4, r4 > +; P8LE-NEXT: mulhw r10, r7, r3 > +; P8LE-NEXT: mulhw r3, r4, r3 > +; P8LE-NEXT: add r8, r8, r5 > +; P8LE-NEXT: add r9, r9, r6 > +; P8LE-NEXT: srwi r11, r8, 31 > ; P8LE-NEXT: srawi r8, r8, 6 > -; P8LE-NEXT: add r11, r30, r11 > -; P8LE-NEXT: add r3, r3, r0 > -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload > -; P8LE-NEXT: add r8, r8, r10 > -; P8LE-NEXT: srwi r10, r9, 31 > +; P8LE-NEXT: add r10, r10, r7 > +; P8LE-NEXT: add r3, r3, r4 > +; P8LE-NEXT: add r8, r8, r11 > +; P8LE-NEXT: srwi r11, r9, 31 > ; P8LE-NEXT: srawi r9, r9, 6 > ; P8LE-NEXT: mulli r8, r8, 95 > -; P8LE-NEXT: add r9, r9, r10 > -; P8LE-NEXT: srwi r10, r11, 31 > -; P8LE-NEXT: srawi r11, r11, 6 > +; P8LE-NEXT: add r9, r9, r11 > +; P8LE-NEXT: srwi r11, r10, 31 > +; P8LE-NEXT: srawi r10, r10, 6 > ; P8LE-NEXT: mulli r9, r9, 95 > -; 
P8LE-NEXT: add r10, r11, r10 > +; P8LE-NEXT: add r10, r10, r11 > ; P8LE-NEXT: srwi r11, r3, 31 > ; P8LE-NEXT: srawi r3, r3, 6 > ; P8LE-NEXT: mulli r10, r10, 95 > ; P8LE-NEXT: sub r5, r5, r8 > ; P8LE-NEXT: add r3, r3, r11 > -; P8LE-NEXT: mtfprd f0, r5 > +; P8LE-NEXT: mtvsrd v2, r5 > ; P8LE-NEXT: mulli r3, r3, 95 > ; P8LE-NEXT: sub r6, r6, r9 > -; P8LE-NEXT: mtfprd f1, r6 > -; P8LE-NEXT: xxswapd v2, vs0 > +; P8LE-NEXT: mtvsrd v3, r6 > ; P8LE-NEXT: sub r5, r7, r10 > -; P8LE-NEXT: mtfprd f2, r5 > -; P8LE-NEXT: xxswapd v3, vs1 > +; P8LE-NEXT: mtvsrd v4, r5 > ; P8LE-NEXT: sub r3, r4, r3 > -; P8LE-NEXT: mtfprd f3, r3 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: xxswapd v5, vs3 > -; P8LE-NEXT: vmrglh v3, v5, v4 > +; P8LE-NEXT: vmrghh v2, v3, v2 > +; P8LE-NEXT: mtvsrd v5, r3 > +; P8LE-NEXT: vmrghh v3, v5, v4 > ; P8LE-NEXT: vmrglw v2, v3, v2 > ; P8LE-NEXT: blr > ; > @@ -487,67 +469,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, -21386 > -; P9LE-NEXT: ori r5, r5, 37253 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r6, r4, r5 > -; P9LE-NEXT: add r4, r6, r4 > -; P9LE-NEXT: srwi r6, r4, 31 > -; P9LE-NEXT: srawi r4, r4, 6 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: mulli r6, r4, 95 > +; P9LE-NEXT: lis r4, -21386 > +; P9LE-NEXT: ori r4, r4, 37253 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r5, r3, r4 > +; P9LE-NEXT: add r5, r5, r3 > +; P9LE-NEXT: srwi r6, r5, 31 > +; P9LE-NEXT: srawi r5, r5, 6 > +; P9LE-NEXT: add r5, r5, r6 > +; P9LE-NEXT: mulli r6, r5, 95 > ; P9LE-NEXT: sub r3, r3, r6 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: extsh r6, r3 > -; P9LE-NEXT: mulhw r7, r6, r5 > +; P9LE-NEXT: mulhw r7, r6, r4 > ; P9LE-NEXT: add r6, r7, r6 > ; P9LE-NEXT: srwi r7, r6, 31 > ; P9LE-NEXT: srawi r6, r6, 6 > ; P9LE-NEXT: add r6, r6, r7 > ; 
P9LE-NEXT: mulli r7, r6, 95 > ; P9LE-NEXT: sub r3, r3, r7 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: extsh r7, r3 > -; P9LE-NEXT: mulhw r8, r7, r5 > +; P9LE-NEXT: mulhw r8, r7, r4 > ; P9LE-NEXT: add r7, r8, r7 > ; P9LE-NEXT: srwi r8, r7, 31 > ; P9LE-NEXT: srawi r7, r7, 6 > ; P9LE-NEXT: add r7, r7, r8 > ; P9LE-NEXT: mulli r8, r7, 95 > ; P9LE-NEXT: sub r3, r3, r8 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: extsh r8, r3 > -; P9LE-NEXT: mulhw r5, r8, r5 > -; P9LE-NEXT: add r5, r5, r8 > -; P9LE-NEXT: srwi r8, r5, 31 > -; P9LE-NEXT: srawi r5, r5, 6 > -; P9LE-NEXT: add r5, r5, r8 > -; P9LE-NEXT: mulli r8, r5, 95 > +; P9LE-NEXT: mulhw r4, r8, r4 > +; P9LE-NEXT: add r4, r4, r8 > +; P9LE-NEXT: srwi r8, r4, 31 > +; P9LE-NEXT: srawi r4, r4, 6 > +; P9LE-NEXT: add r4, r4, r8 > +; P9LE-NEXT: mulli r8, r4, 95 > ; P9LE-NEXT: sub r3, r3, r8 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: mtfprd f0, r4 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v4, r6 > ; P9LE-NEXT: vmrglw v2, v2, v3 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r6 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r7 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r5 > -; P9LE-NEXT: xxswapd v5, vs0 > -; P9LE-NEXT: vmrglh v4, v5, v4 > +; P9LE-NEXT: mtvsrd v3, r5 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r7 > +; P9LE-NEXT: mtvsrd v5, r4 > +; P9LE-NEXT: vmrghh v4, v5, v4 > ; P9LE-NEXT: vmrglw v3, v4, v3 > ; P9LE-NEXT: vadduhm v2, v2, v3 > ; P9LE-NEXT: blr > @@ -624,69 +598,59 @@ define <4 x 
i16> @combine_srem_sdiv(<4 x i16> %x) { > ; P8LE-LABEL: combine_srem_sdiv: > ; P8LE: # %bb.0: > ; P8LE-NEXT: xxswapd vs0, v2 > -; P8LE-NEXT: lis r4, -21386 > -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill > -; P8LE-NEXT: ori r4, r4, 37253 > -; P8LE-NEXT: mffprd r5, f0 > -; P8LE-NEXT: clrldi r3, r5, 48 > -; P8LE-NEXT: rldicl r6, r5, 48, 48 > -; P8LE-NEXT: rldicl r7, r5, 32, 48 > -; P8LE-NEXT: extsh r8, r3 > -; P8LE-NEXT: extsh r9, r6 > -; P8LE-NEXT: extsh r10, r7 > -; P8LE-NEXT: mulhw r11, r8, r4 > -; P8LE-NEXT: rldicl r5, r5, 16, 48 > -; P8LE-NEXT: mulhw r12, r9, r4 > -; P8LE-NEXT: mulhw r0, r10, r4 > -; P8LE-NEXT: extsh r30, r5 > -; P8LE-NEXT: mulhw r4, r30, r4 > +; P8LE-NEXT: lis r3, -21386 > +; P8LE-NEXT: ori r3, r3, 37253 > +; P8LE-NEXT: mffprd r4, f0 > +; P8LE-NEXT: clrldi r5, r4, 48 > +; P8LE-NEXT: rldicl r6, r4, 48, 48 > +; P8LE-NEXT: rldicl r7, r4, 32, 48 > +; P8LE-NEXT: extsh r5, r5 > +; P8LE-NEXT: extsh r8, r6 > +; P8LE-NEXT: extsh r9, r7 > +; P8LE-NEXT: mulhw r10, r5, r3 > +; P8LE-NEXT: mulhw r11, r8, r3 > +; P8LE-NEXT: rldicl r4, r4, 16, 48 > +; P8LE-NEXT: mulhw r12, r9, r3 > +; P8LE-NEXT: extsh r0, r4 > +; P8LE-NEXT: mulhw r3, r0, r3 > +; P8LE-NEXT: add r10, r10, r5 > ; P8LE-NEXT: add r8, r11, r8 > +; P8LE-NEXT: srwi r11, r10, 31 > ; P8LE-NEXT: add r9, r12, r9 > -; P8LE-NEXT: srwi r11, r8, 31 > -; P8LE-NEXT: add r10, r0, r10 > -; P8LE-NEXT: srawi r8, r8, 6 > -; P8LE-NEXT: srawi r12, r9, 6 > +; P8LE-NEXT: srawi r10, r10, 6 > +; P8LE-NEXT: srawi r12, r8, 6 > +; P8LE-NEXT: srwi r8, r8, 31 > +; P8LE-NEXT: add r10, r10, r11 > +; P8LE-NEXT: add r3, r3, r0 > +; P8LE-NEXT: srawi r11, r9, 6 > ; P8LE-NEXT: srwi r9, r9, 31 > -; P8LE-NEXT: add r8, r8, r11 > -; P8LE-NEXT: add r4, r4, r30 > -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload > -; P8LE-NEXT: srawi r11, r10, 6 > -; P8LE-NEXT: srwi r10, r10, 31 > -; P8LE-NEXT: add r9, r12, r9 > -; P8LE-NEXT: mtfprd f0, r8 > -; P8LE-NEXT: mulli r12, r8, 95 > -; P8LE-NEXT: add r10, r11, r10 > -; P8LE-NEXT: srwi 
r8, r4, 31 > -; P8LE-NEXT: mtfprd f1, r9 > -; P8LE-NEXT: srawi r4, r4, 6 > -; P8LE-NEXT: mulli r11, r9, 95 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: mtfprd f2, r10 > -; P8LE-NEXT: mulli r9, r10, 95 > -; P8LE-NEXT: add r4, r4, r8 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: mtfprd f3, r4 > -; P8LE-NEXT: mulli r4, r4, 95 > -; P8LE-NEXT: xxswapd v1, vs2 > -; P8LE-NEXT: sub r3, r3, r12 > -; P8LE-NEXT: mtfprd f0, r3 > -; P8LE-NEXT: sub r6, r6, r11 > -; P8LE-NEXT: xxswapd v6, vs3 > -; P8LE-NEXT: sub r3, r7, r9 > -; P8LE-NEXT: mtfprd f1, r6 > -; P8LE-NEXT: mtfprd f4, r3 > -; P8LE-NEXT: sub r3, r5, r4 > -; P8LE-NEXT: mtfprd f5, r3 > -; P8LE-NEXT: xxswapd v4, vs1 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: xxswapd v3, vs0 > -; P8LE-NEXT: xxswapd v5, vs4 > -; P8LE-NEXT: xxswapd v0, vs5 > -; P8LE-NEXT: vmrglh v3, v4, v3 > -; P8LE-NEXT: vmrglh v4, v0, v5 > -; P8LE-NEXT: vmrglh v5, v6, v1 > -; P8LE-NEXT: vmrglw v3, v4, v3 > -; P8LE-NEXT: vmrglw v2, v5, v2 > +; P8LE-NEXT: add r8, r12, r8 > +; P8LE-NEXT: mtvsrd v2, r10 > +; P8LE-NEXT: mulli r12, r10, 95 > +; P8LE-NEXT: add r9, r11, r9 > +; P8LE-NEXT: srwi r11, r3, 31 > +; P8LE-NEXT: mtvsrd v3, r8 > +; P8LE-NEXT: srawi r3, r3, 6 > +; P8LE-NEXT: mulli r10, r8, 95 > +; P8LE-NEXT: mtvsrd v4, r9 > +; P8LE-NEXT: add r3, r3, r11 > +; P8LE-NEXT: mulli r8, r9, 95 > +; P8LE-NEXT: vmrghh v2, v3, v2 > +; P8LE-NEXT: mulli r9, r3, 95 > +; P8LE-NEXT: sub r5, r5, r12 > +; P8LE-NEXT: sub r6, r6, r10 > +; P8LE-NEXT: mtvsrd v3, r5 > +; P8LE-NEXT: mtvsrd v5, r6 > +; P8LE-NEXT: sub r5, r7, r8 > +; P8LE-NEXT: sub r4, r4, r9 > +; P8LE-NEXT: mtvsrd v0, r5 > +; P8LE-NEXT: mtvsrd v1, r4 > +; P8LE-NEXT: vmrghh v3, v5, v3 > +; P8LE-NEXT: mtvsrd v5, r3 > +; P8LE-NEXT: vmrghh v0, v1, v0 > +; P8LE-NEXT: vmrghh v4, v5, v4 > +; P8LE-NEXT: vmrglw v3, v0, v3 > +; P8LE-NEXT: vmrglw v2, v4, v2 > ; P8LE-NEXT: vadduhm v2, v3, v2 > ; P8LE-NEXT: blr > ; > @@ -767,47 +731,43 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x > i16> %x) { > ; P9LE: # 
%bb.0: > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: srawi r4, r4, 6 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: srawi r4, r3, 6 > ; P9LE-NEXT: addze r4, r4 > ; P9LE-NEXT: slwi r4, r4, 6 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: srawi r4, r4, 5 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: srawi r4, r3, 5 > ; P9LE-NEXT: addze r4, r4 > ; P9LE-NEXT: slwi r4, r4, 5 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, -21386 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, -21386 > -; P9LE-NEXT: ori r5, r5, 37253 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r5, r4, r5 > -; P9LE-NEXT: add r4, r5, r4 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: ori r4, r4, 37253 > +; P9LE-NEXT: mulhw r4, r3, r4 > +; P9LE-NEXT: add r4, r4, r3 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 6 > ; P9LE-NEXT: add r4, r4, r5 > ; P9LE-NEXT: mulli r4, r4, 95 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: srawi r4, r4, 3 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: srawi r4, r3, 3 > ; P9LE-NEXT: addze r4, r4 > ; P9LE-NEXT: slwi r4, r4, 3 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v2, v4, v2 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v4, v2 > ; P9LE-NEXT: vmrglw v2, v2, v3 > ; P9LE-NEXT: blr > ; > @@ -866,42 +826,38 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x > i16> %x) { > ; 
P8LE-NEXT: ori r3, r3, 37253 > ; P8LE-NEXT: mffprd r4, f0 > ; P8LE-NEXT: rldicl r5, r4, 16, 48 > -; P8LE-NEXT: clrldi r7, r4, 48 > -; P8LE-NEXT: extsh r6, r5 > -; P8LE-NEXT: extsh r8, r7 > -; P8LE-NEXT: mulhw r3, r6, r3 > -; P8LE-NEXT: rldicl r9, r4, 48, 48 > -; P8LE-NEXT: srawi r8, r8, 6 > -; P8LE-NEXT: extsh r10, r9 > +; P8LE-NEXT: clrldi r6, r4, 48 > +; P8LE-NEXT: extsh r5, r5 > +; P8LE-NEXT: extsh r6, r6 > +; P8LE-NEXT: mulhw r3, r5, r3 > +; P8LE-NEXT: rldicl r7, r4, 48, 48 > +; P8LE-NEXT: srawi r8, r6, 6 > +; P8LE-NEXT: extsh r7, r7 > ; P8LE-NEXT: addze r8, r8 > ; P8LE-NEXT: rldicl r4, r4, 32, 48 > -; P8LE-NEXT: srawi r10, r10, 5 > +; P8LE-NEXT: srawi r9, r7, 5 > +; P8LE-NEXT: extsh r4, r4 > ; P8LE-NEXT: slwi r8, r8, 6 > -; P8LE-NEXT: add r3, r3, r6 > -; P8LE-NEXT: addze r6, r10 > -; P8LE-NEXT: sub r7, r7, r8 > +; P8LE-NEXT: add r3, r3, r5 > +; P8LE-NEXT: addze r9, r9 > +; P8LE-NEXT: sub r6, r6, r8 > ; P8LE-NEXT: srwi r10, r3, 31 > ; P8LE-NEXT: srawi r3, r3, 6 > -; P8LE-NEXT: mtfprd f0, r7 > -; P8LE-NEXT: slwi r6, r6, 5 > +; P8LE-NEXT: slwi r8, r9, 5 > +; P8LE-NEXT: mtvsrd v2, r6 > ; P8LE-NEXT: add r3, r3, r10 > -; P8LE-NEXT: extsh r10, r4 > -; P8LE-NEXT: sub r6, r9, r6 > +; P8LE-NEXT: srawi r9, r4, 3 > +; P8LE-NEXT: sub r6, r7, r8 > ; P8LE-NEXT: mulli r3, r3, 95 > -; P8LE-NEXT: srawi r8, r10, 3 > -; P8LE-NEXT: mtfprd f1, r6 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: addze r7, r8 > -; P8LE-NEXT: xxswapd v3, vs1 > +; P8LE-NEXT: addze r7, r9 > +; P8LE-NEXT: mtvsrd v3, r6 > +; P8LE-NEXT: vmrghh v2, v3, v2 > ; P8LE-NEXT: sub r3, r5, r3 > ; P8LE-NEXT: slwi r5, r7, 3 > ; P8LE-NEXT: sub r4, r4, r5 > -; P8LE-NEXT: mtfprd f2, r3 > -; P8LE-NEXT: mtfprd f3, r4 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: xxswapd v5, vs3 > -; P8LE-NEXT: vmrglh v3, v4, v5 > +; P8LE-NEXT: mtvsrd v4, r3 > +; P8LE-NEXT: mtvsrd v5, r4 > +; P8LE-NEXT: vmrghh v3, v4, v5 > ; P8LE-NEXT: vmrglw v2, v3, v2 > ; P8LE-NEXT: blr > ; > @@ -959,48 +915,46 @@ 
define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, -14230 > -; P9LE-NEXT: ori r5, r5, 30865 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r5, r4, r5 > -; P9LE-NEXT: add r4, r5, r4 > +; P9LE-NEXT: lis r4, -14230 > +; P9LE-NEXT: ori r4, r4, 30865 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r4, r3, r4 > +; P9LE-NEXT: add r4, r4, r3 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 9 > ; P9LE-NEXT: add r4, r4, r5 > -; P9LE-NEXT: lis r5, -19946 > ; P9LE-NEXT: mulli r4, r4, 654 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, -19946 > +; P9LE-NEXT: mtvsrd v3, r3 > +; P9LE-NEXT: li r3, 0 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > -; P9LE-NEXT: ori r5, r5, 17097 > -; P9LE-NEXT: xxlxor v3, v3, v3 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r5, r4, r5 > -; P9LE-NEXT: add r4, r5, r4 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: ori r4, r4, 17097 > +; P9LE-NEXT: mulhw r4, r3, r4 > +; P9LE-NEXT: add r4, r4, r3 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 4 > ; P9LE-NEXT: add r4, r4, r5 > -; P9LE-NEXT: lis r5, 24749 > ; P9LE-NEXT: mulli r4, r4, 23 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: vmrghh v3, v3, v4 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: ori r5, r5, 47143 > -; P9LE-NEXT: mulhw r4, r4, r5 > +; P9LE-NEXT: lis r4, 24749 > +; P9LE-NEXT: ori r4, r4, 47143 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r4, r3, r4 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 11 > ; P9LE-NEXT: add r4, r4, r5 > ; P9LE-NEXT: mulli r4, r4, 5423 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, 
vs0 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > ; P9LE-NEXT: vmrglw v2, v2, v3 > ; P9LE-NEXT: blr > ; > @@ -1058,49 +1012,47 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) > { > ; P8LE-LABEL: dont_fold_srem_one: > ; P8LE: # %bb.0: > ; P8LE-NEXT: xxswapd vs0, v2 > -; P8LE-NEXT: lis r3, 24749 > -; P8LE-NEXT: lis r7, -19946 > -; P8LE-NEXT: lis r9, -14230 > -; P8LE-NEXT: xxlxor v5, v5, v5 > -; P8LE-NEXT: ori r3, r3, 47143 > -; P8LE-NEXT: ori r7, r7, 17097 > -; P8LE-NEXT: mffprd r4, f0 > -; P8LE-NEXT: rldicl r5, r4, 16, 48 > -; P8LE-NEXT: rldicl r6, r4, 32, 48 > -; P8LE-NEXT: rldicl r4, r4, 48, 48 > -; P8LE-NEXT: extsh r8, r5 > -; P8LE-NEXT: extsh r10, r6 > -; P8LE-NEXT: mulhw r3, r8, r3 > -; P8LE-NEXT: ori r8, r9, 30865 > -; P8LE-NEXT: extsh r9, r4 > -; P8LE-NEXT: mulhw r7, r10, r7 > -; P8LE-NEXT: mulhw r8, r9, r8 > -; P8LE-NEXT: add r7, r7, r10 > -; P8LE-NEXT: srwi r10, r3, 31 > -; P8LE-NEXT: add r8, r8, r9 > -; P8LE-NEXT: srawi r3, r3, 11 > -; P8LE-NEXT: srwi r9, r7, 31 > -; P8LE-NEXT: srawi r7, r7, 4 > -; P8LE-NEXT: add r3, r3, r10 > -; P8LE-NEXT: add r7, r7, r9 > +; P8LE-NEXT: lis r5, 24749 > +; P8LE-NEXT: lis r6, -19946 > +; P8LE-NEXT: lis r8, -14230 > +; P8LE-NEXT: ori r5, r5, 47143 > +; P8LE-NEXT: ori r6, r6, 17097 > +; P8LE-NEXT: ori r8, r8, 30865 > +; P8LE-NEXT: mffprd r3, f0 > +; P8LE-NEXT: rldicl r4, r3, 16, 48 > +; P8LE-NEXT: rldicl r7, r3, 32, 48 > +; P8LE-NEXT: rldicl r3, r3, 48, 48 > +; P8LE-NEXT: extsh r4, r4 > +; P8LE-NEXT: extsh r7, r7 > +; P8LE-NEXT: extsh r3, r3 > +; P8LE-NEXT: mulhw r5, r4, r5 > +; P8LE-NEXT: mulhw r6, r7, r6 > +; P8LE-NEXT: mulhw r8, r3, r8 > +; P8LE-NEXT: srwi r9, r5, 31 > +; P8LE-NEXT: srawi r5, r5, 11 > +; P8LE-NEXT: add r6, r6, r7 > +; P8LE-NEXT: add r8, r8, r3 > +; P8LE-NEXT: add r5, r5, r9 > +; P8LE-NEXT: srwi r9, r6, 31 > +; P8LE-NEXT: srawi r6, r6, 4 > +; P8LE-NEXT: add r6, r6, r9 > ; P8LE-NEXT: srwi r9, r8, 31 > ; P8LE-NEXT: srawi r8, r8, 9 > -; P8LE-NEXT: 
mulli r3, r3, 5423 > +; P8LE-NEXT: mulli r5, r5, 5423 > ; P8LE-NEXT: add r8, r8, r9 > -; P8LE-NEXT: mulli r7, r7, 23 > +; P8LE-NEXT: mulli r6, r6, 23 > +; P8LE-NEXT: li r9, 0 > ; P8LE-NEXT: mulli r8, r8, 654 > -; P8LE-NEXT: sub r3, r5, r3 > -; P8LE-NEXT: mtfprd f0, r3 > -; P8LE-NEXT: sub r3, r6, r7 > -; P8LE-NEXT: sub r4, r4, r8 > -; P8LE-NEXT: mtfprd f1, r3 > -; P8LE-NEXT: mtfprd f2, r4 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: vmrglh v2, v2, v3 > -; P8LE-NEXT: vmrglh v3, v4, v5 > -; P8LE-NEXT: vmrglw v2, v2, v3 > +; P8LE-NEXT: mtvsrd v2, r9 > +; P8LE-NEXT: sub r4, r4, r5 > +; P8LE-NEXT: sub r5, r7, r6 > +; P8LE-NEXT: mtvsrd v3, r4 > +; P8LE-NEXT: sub r3, r3, r8 > +; P8LE-NEXT: mtvsrd v4, r5 > +; P8LE-NEXT: mtvsrd v5, r3 > +; P8LE-NEXT: vmrghh v3, v3, v4 > +; P8LE-NEXT: vmrghh v2, v5, v2 > +; P8LE-NEXT: vmrglw v2, v3, v2 > ; P8LE-NEXT: blr > ; > ; P8BE-LABEL: dont_fold_srem_one: > @@ -1161,43 +1113,41 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x > i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, -19946 > -; P9LE-NEXT: ori r5, r5, 17097 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: mulhw r5, r4, r5 > -; P9LE-NEXT: add r4, r5, r4 > +; P9LE-NEXT: lis r4, -19946 > +; P9LE-NEXT: ori r4, r4, 17097 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: mulhw r4, r3, r4 > +; P9LE-NEXT: add r4, r4, r3 > ; P9LE-NEXT: srwi r5, r4, 31 > ; P9LE-NEXT: srawi r4, r4, 4 > ; P9LE-NEXT: add r4, r4, r5 > -; P9LE-NEXT: lis r5, 24749 > ; P9LE-NEXT: mulli r4, r4, 23 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, 24749 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: ori r5, r5, 47143 > -; P9LE-NEXT: mulhw r4, r4, r5 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: ori r4, r4, 47143 > +; P9LE-NEXT: mulhw r4, r3, r4 > ; P9LE-NEXT: srwi r5, r4, 31 
> ; P9LE-NEXT: srawi r4, r4, 11 > ; P9LE-NEXT: add r4, r4, r5 > ; P9LE-NEXT: mulli r4, r4, 5423 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: extsh r4, r3 > -; P9LE-NEXT: srawi r4, r4, 15 > +; P9LE-NEXT: extsh r3, r3 > +; P9LE-NEXT: srawi r4, r3, 15 > ; P9LE-NEXT: addze r4, r4 > ; P9LE-NEXT: slwi r4, r4, 15 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxlxor v4, v4, v4 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: li r3, 0 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > ; P9LE-NEXT: vmrglw v2, v3, v2 > ; P9LE-NEXT: blr > ; > @@ -1252,42 +1202,40 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x > i16> %x) { > ; P8LE-NEXT: xxswapd vs0, v2 > ; P8LE-NEXT: lis r4, 24749 > ; P8LE-NEXT: lis r5, -19946 > -; P8LE-NEXT: xxlxor v5, v5, v5 > ; P8LE-NEXT: ori r4, r4, 47143 > ; P8LE-NEXT: ori r5, r5, 17097 > ; P8LE-NEXT: mffprd r3, f0 > ; P8LE-NEXT: rldicl r6, r3, 16, 48 > ; P8LE-NEXT: rldicl r7, r3, 32, 48 > -; P8LE-NEXT: extsh r8, r6 > -; P8LE-NEXT: extsh r9, r7 > -; P8LE-NEXT: mulhw r4, r8, r4 > -; P8LE-NEXT: mulhw r5, r9, r5 > +; P8LE-NEXT: extsh r6, r6 > +; P8LE-NEXT: extsh r7, r7 > +; P8LE-NEXT: mulhw r4, r6, r4 > +; P8LE-NEXT: mulhw r5, r7, r5 > ; P8LE-NEXT: rldicl r3, r3, 48, 48 > +; P8LE-NEXT: extsh r3, r3 > ; P8LE-NEXT: srwi r8, r4, 31 > ; P8LE-NEXT: srawi r4, r4, 11 > -; P8LE-NEXT: add r5, r5, r9 > +; P8LE-NEXT: add r5, r5, r7 > ; P8LE-NEXT: add r4, r4, r8 > ; P8LE-NEXT: srwi r8, r5, 31 > ; P8LE-NEXT: srawi r5, r5, 4 > ; P8LE-NEXT: mulli r4, r4, 5423 > ; P8LE-NEXT: add r5, r5, r8 > -; P8LE-NEXT: extsh r8, r3 > +; P8LE-NEXT: srawi r9, r3, 15 > +; P8LE-NEXT: li r8, 0 > ; P8LE-NEXT: mulli r5, r5, 23 > -; 
P8LE-NEXT: srawi r8, r8, 15 > +; P8LE-NEXT: mtvsrd v2, r8 > ; P8LE-NEXT: sub r4, r6, r4 > -; P8LE-NEXT: addze r6, r8 > -; P8LE-NEXT: mtfprd f0, r4 > -; P8LE-NEXT: slwi r4, r6, 15 > +; P8LE-NEXT: addze r6, r9 > +; P8LE-NEXT: slwi r6, r6, 15 > +; P8LE-NEXT: mtvsrd v3, r4 > ; P8LE-NEXT: sub r5, r7, r5 > -; P8LE-NEXT: sub r3, r3, r4 > -; P8LE-NEXT: mtfprd f1, r5 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: mtfprd f2, r3 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: vmrglh v2, v2, v3 > -; P8LE-NEXT: vmrglh v3, v4, v5 > -; P8LE-NEXT: vmrglw v2, v2, v3 > +; P8LE-NEXT: sub r3, r3, r6 > +; P8LE-NEXT: mtvsrd v4, r5 > +; P8LE-NEXT: mtvsrd v5, r3 > +; P8LE-NEXT: vmrghh v3, v3, v4 > +; P8LE-NEXT: vmrghh v2, v5, v2 > +; P8LE-NEXT: vmrglw v2, v3, v2 > ; P8LE-NEXT: blr > ; > ; P8BE-LABEL: dont_fold_urem_i16_smax: > > diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll > b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll > index 323397202c00..95f0fc25f2dd 100644 > --- a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll > +++ b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll > @@ -15,10 +15,10 @@ entry: > } > > ; CHECK-LABEL: @bar0 > +; CHECK-DAG: xxswapd 1, 1 > ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]] > -; CHECK-DAG: xxspltd [[REG2:[0-9]+]] > -; CHECK: xxpermdi [[REG3:[0-9]+]], [[REG2]], [[REG1]], 1 > -; CHECK: stxvd2x [[REG3]] > +; CHECK: xxmrgld [[REG2:[0-9]+]], 1, [[REG1]] > +; CHECK: stxvd2x [[REG2]] > ; CHECK-NOT: xxswapd > > define void @bar1(double %y) { > @@ -30,10 +30,10 @@ entry: > } > > ; CHECK-LABEL: @bar1 > +; CHECK-DAG: xxswapd 1, 1 > ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]] > -; CHECK-DAG: xxspltd [[REG2:[0-9]+]] > -; CHECK: xxmrghd [[REG3:[0-9]+]], [[REG1]], [[REG2]] > -; CHECK: stxvd2x [[REG3]] > +; CHECK: xxpermdi [[REG2:[0-9]+]], [[REG1]], 1, 1 > +; CHECK: stxvd2x [[REG2]] > ; CHECK-NOT: xxswapd > > define void @baz0() { > > diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll > b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll > index 23738eaa95a7..4437e6799269 100644 
> --- a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll > +++ b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll > @@ -27,7 +27,7 @@ define void @bar0() { > ; CHECK: ld r3, .LC0@toc@l(r3) > ; CHECK: addis r3, r2, .LC2@toc@ha > ; CHECK: ld r3, .LC2@toc@l(r3) > -; CHECK: xxpermdi vs0, vs0, vs1, 1 > +; CHECK: xxmrgld vs0, vs0, vs1 > ; CHECK: stxvd2x vs0, 0, r3 > ; CHECK: blr > ; > @@ -38,7 +38,7 @@ define void @bar0() { > ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1@toc@ha > ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2@toc@ha > ; CHECK-P9-NOVECTOR: ld r3, .LC2@toc@l(r3) > -; CHECK-P9-NOVECTOR: xxpermdi vs0, vs1, vs0, 1 > +; CHECK-P9-NOVECTOR: xxmrgld vs0, vs1, vs0 > ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3 > ; CHECK-P9-NOVECTOR: blr > ; > @@ -72,7 +72,7 @@ define void @bar1() { > ; CHECK: ld r3, .LC0@toc@l(r3) > ; CHECK: addis r3, r2, .LC2@toc@ha > ; CHECK: ld r3, .LC2@toc@l(r3) > -; CHECK: xxmrghd vs0, vs1, vs0 > +; CHECK: xxpermdi vs0, vs1, vs0, 1 > ; CHECK: stxvd2x vs0, 0, r3 > ; CHECK: blr > ; > @@ -83,7 +83,7 @@ define void @bar1() { > ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1@toc@ha > ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2@toc@ha > ; CHECK-P9-NOVECTOR: ld r3, .LC2@toc@l(r3) > -; CHECK-P9-NOVECTOR: xxmrghd vs0, vs0, vs1 > +; CHECK-P9-NOVECTOR: xxpermdi vs0, vs0, vs1, 1 > ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3 > ; CHECK-P9-NOVECTOR: blr > ; > > diff --git a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll > b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll > index d853a420dcd8..4bb3730aa043 100644 > --- a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll > +++ b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll > @@ -13,53 +13,50 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, 21399 > -; P9LE-NEXT: ori r5, r5, 33437 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r4, r4, r5 > -; P9LE-NEXT: lis r5, 16727 > -; P9LE-NEXT: ori r5, r5, 2287 > +; P9LE-NEXT: lis r4, 
21399 > +; P9LE-NEXT: ori r4, r4, 33437 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: mulhwu r4, r3, r4 > ; P9LE-NEXT: srwi r4, r4, 5 > ; P9LE-NEXT: mulli r4, r4, 98 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, 16727 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r4, r4, r5 > -; P9LE-NEXT: lis r5, 8456 > -; P9LE-NEXT: ori r5, r5, 16913 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: ori r4, r4, 2287 > +; P9LE-NEXT: mulhwu r4, r3, r4 > ; P9LE-NEXT: srwi r4, r4, 8 > ; P9LE-NEXT: mulli r4, r4, 1003 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: rlwinm r4, r3, 30, 18, 31 > -; P9LE-NEXT: mulhwu r4, r4, r5 > -; P9LE-NEXT: lis r5, 22765 > -; P9LE-NEXT: ori r5, r5, 8969 > -; P9LE-NEXT: srwi r4, r4, 2 > -; P9LE-NEXT: mulli r4, r4, 124 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r5, 8456 > +; P9LE-NEXT: ori r5, r5, 16913 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: clrlwi r4, r3, 16 > +; P9LE-NEXT: rlwinm r3, r3, 30, 18, 31 > +; P9LE-NEXT: mulhwu r3, r3, r5 > +; P9LE-NEXT: srwi r3, r3, 2 > +; P9LE-NEXT: mulli r3, r3, 124 > +; P9LE-NEXT: sub r3, r4, r3 > +; P9LE-NEXT: lis r4, 22765 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r5, r4, r5 > -; P9LE-NEXT: sub r4, r4, r5 > -; P9LE-NEXT: srwi r4, r4, 1 > -; P9LE-NEXT: add r4, r4, r5 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: ori r4, r4, 8969 > +; P9LE-NEXT: mulhwu r4, r3, r4 > +; P9LE-NEXT: sub r5, r3, r4 > +; P9LE-NEXT: srwi r5, r5, 1 > +; P9LE-NEXT: add r4, r5, r4 > ; P9LE-NEXT: srwi r4, r4, 6 > ; P9LE-NEXT: mulli r4, r4, 95 > ; P9LE-NEXT: sub r3, r3, r4 > -; 
P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v2, v4, v2 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v4, v2 > ; P9LE-NEXT: vmrglw v2, v3, v2 > ; P9LE-NEXT: blr > ; > @@ -123,50 +120,47 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) { > ; P8LE-NEXT: xxswapd vs0, v2 > ; P8LE-NEXT: lis r3, 22765 > ; P8LE-NEXT: lis r7, 21399 > -; P8LE-NEXT: lis r10, 16727 > +; P8LE-NEXT: lis r9, 16727 > +; P8LE-NEXT: lis r10, 8456 > ; P8LE-NEXT: ori r3, r3, 8969 > ; P8LE-NEXT: ori r7, r7, 33437 > -; P8LE-NEXT: ori r10, r10, 2287 > +; P8LE-NEXT: ori r9, r9, 2287 > +; P8LE-NEXT: ori r10, r10, 16913 > ; P8LE-NEXT: mffprd r4, f0 > ; P8LE-NEXT: clrldi r6, r4, 48 > ; P8LE-NEXT: rldicl r5, r4, 32, 48 > -; P8LE-NEXT: clrlwi r9, r6, 16 > +; P8LE-NEXT: clrlwi r6, r6, 16 > ; P8LE-NEXT: rldicl r8, r4, 16, 48 > -; P8LE-NEXT: clrlwi r11, r5, 16 > -; P8LE-NEXT: mulhwu r3, r9, r3 > -; P8LE-NEXT: clrlwi r12, r8, 16 > -; P8LE-NEXT: mulhwu r7, r11, r7 > -; P8LE-NEXT: lis r11, 8456 > +; P8LE-NEXT: clrlwi r5, r5, 16 > +; P8LE-NEXT: mulhwu r3, r6, r3 > ; P8LE-NEXT: rldicl r4, r4, 48, 48 > -; P8LE-NEXT: mulhwu r10, r12, r10 > -; P8LE-NEXT: ori r11, r11, 16913 > -; P8LE-NEXT: rlwinm r12, r4, 30, 18, 31 > -; P8LE-NEXT: mulhwu r11, r12, r11 > -; P8LE-NEXT: sub r9, r9, r3 > -; P8LE-NEXT: srwi r9, r9, 1 > +; P8LE-NEXT: clrlwi r8, r8, 16 > +; P8LE-NEXT: rlwinm r11, r4, 30, 18, 31 > +; P8LE-NEXT: mulhwu r7, r5, r7 > +; P8LE-NEXT: clrlwi r4, r4, 16 > +; P8LE-NEXT: mulhwu r9, r8, r9 > +; P8LE-NEXT: mulhwu r10, r11, r10 > +; P8LE-NEXT: sub r11, r6, r3 > +; P8LE-NEXT: srwi r11, r11, 1 > ; P8LE-NEXT: srwi r7, r7, 5 > -; P8LE-NEXT: add r3, r9, r3 > -; P8LE-NEXT: srwi r9, r10, 8 > +; P8LE-NEXT: add r3, r11, r3 > +; P8LE-NEXT: srwi r9, r9, 8 > +; P8LE-NEXT: srwi r10, r10, 2 > ; P8LE-NEXT: srwi r3, r3, 6 > ; P8LE-NEXT: mulli r7, r7, 98 > -; P8LE-NEXT: srwi r10, r11, 2 > ; P8LE-NEXT: mulli r9, r9, 1003 > ; 
P8LE-NEXT: mulli r3, r3, 95 > ; P8LE-NEXT: mulli r10, r10, 124 > ; P8LE-NEXT: sub r5, r5, r7 > ; P8LE-NEXT: sub r7, r8, r9 > -; P8LE-NEXT: mtfprd f0, r5 > ; P8LE-NEXT: sub r3, r6, r3 > +; P8LE-NEXT: mtvsrd v2, r5 > ; P8LE-NEXT: sub r4, r4, r10 > -; P8LE-NEXT: mtfprd f1, r7 > -; P8LE-NEXT: mtfprd f2, r3 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: mtfprd f3, r4 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: xxswapd v5, vs3 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: vmrglh v3, v5, v4 > +; P8LE-NEXT: mtvsrd v3, r7 > +; P8LE-NEXT: mtvsrd v4, r3 > +; P8LE-NEXT: mtvsrd v5, r4 > +; P8LE-NEXT: vmrghh v2, v3, v2 > +; P8LE-NEXT: vmrghh v3, v5, v4 > ; P8LE-NEXT: vmrglw v2, v2, v3 > ; P8LE-NEXT: blr > ; > @@ -230,56 +224,52 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, 22765 > -; P9LE-NEXT: ori r5, r5, 8969 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r6, r4, r5 > -; P9LE-NEXT: sub r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 1 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 6 > -; P9LE-NEXT: mulli r4, r4, 95 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, 22765 > +; P9LE-NEXT: ori r4, r4, 8969 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: mulhwu r5, r3, r4 > +; P9LE-NEXT: sub r6, r3, r5 > +; P9LE-NEXT: srwi r6, r6, 1 > +; P9LE-NEXT: add r5, r6, r5 > +; P9LE-NEXT: srwi r5, r5, 6 > +; P9LE-NEXT: mulli r5, r5, 95 > +; P9LE-NEXT: sub r3, r3, r5 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r6, r4, r5 > -; P9LE-NEXT: sub r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 1 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 6 > -; P9LE-NEXT: mulli r4, r4, 95 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: clrlwi r3, 
r3, 16 > +; P9LE-NEXT: mulhwu r5, r3, r4 > +; P9LE-NEXT: sub r6, r3, r5 > +; P9LE-NEXT: srwi r6, r6, 1 > +; P9LE-NEXT: add r5, r6, r5 > +; P9LE-NEXT: srwi r5, r5, 6 > +; P9LE-NEXT: mulli r5, r5, 95 > +; P9LE-NEXT: sub r3, r3, r5 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r6, r4, r5 > -; P9LE-NEXT: sub r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 1 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 6 > -; P9LE-NEXT: mulli r4, r4, 95 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: mulhwu r5, r3, r4 > +; P9LE-NEXT: sub r6, r3, r5 > +; P9LE-NEXT: srwi r6, r6, 1 > +; P9LE-NEXT: add r5, r6, r5 > +; P9LE-NEXT: srwi r5, r5, 6 > +; P9LE-NEXT: mulli r5, r5, 95 > +; P9LE-NEXT: sub r3, r3, r5 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r5, r4, r5 > -; P9LE-NEXT: sub r4, r4, r5 > -; P9LE-NEXT: srwi r4, r4, 1 > -; P9LE-NEXT: add r4, r4, r5 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: mulhwu r4, r3, r4 > +; P9LE-NEXT: sub r5, r3, r4 > +; P9LE-NEXT: srwi r5, r5, 1 > +; P9LE-NEXT: add r4, r5, r4 > ; P9LE-NEXT: srwi r4, r4, 6 > ; P9LE-NEXT: mulli r4, r4, 95 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > ; P9LE-NEXT: vmrglw v2, v2, v3 > ; P9LE-NEXT: blr > ; > @@ -344,36 +334,34 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) { > ; P8LE: # %bb.0: > ; P8LE-NEXT: xxswapd vs0, v2 > ; P8LE-NEXT: lis r3, 22765 > -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill > ; P8LE-NEXT: ori r3, r3, 8969 > ; P8LE-NEXT: mffprd r4, f0 > ; P8LE-NEXT: 
clrldi r5, r4, 48 > ; P8LE-NEXT: rldicl r6, r4, 48, 48 > -; P8LE-NEXT: clrlwi r8, r5, 16 > +; P8LE-NEXT: clrlwi r5, r5, 16 > ; P8LE-NEXT: rldicl r7, r4, 32, 48 > -; P8LE-NEXT: clrlwi r9, r6, 16 > +; P8LE-NEXT: clrlwi r6, r6, 16 > +; P8LE-NEXT: mulhwu r8, r5, r3 > ; P8LE-NEXT: rldicl r4, r4, 16, 48 > -; P8LE-NEXT: mulhwu r10, r8, r3 > -; P8LE-NEXT: clrlwi r11, r7, 16 > -; P8LE-NEXT: clrlwi r0, r4, 16 > -; P8LE-NEXT: mulhwu r12, r9, r3 > -; P8LE-NEXT: mulhwu r30, r11, r3 > -; P8LE-NEXT: mulhwu r3, r0, r3 > -; P8LE-NEXT: sub r8, r8, r10 > -; P8LE-NEXT: srwi r8, r8, 1 > -; P8LE-NEXT: sub r9, r9, r12 > -; P8LE-NEXT: add r8, r8, r10 > -; P8LE-NEXT: sub r10, r11, r30 > -; P8LE-NEXT: sub r11, r0, r3 > -; P8LE-NEXT: srwi r9, r9, 1 > -; P8LE-NEXT: srwi r10, r10, 1 > +; P8LE-NEXT: clrlwi r7, r7, 16 > +; P8LE-NEXT: mulhwu r9, r6, r3 > +; P8LE-NEXT: clrlwi r4, r4, 16 > +; P8LE-NEXT: mulhwu r10, r7, r3 > +; P8LE-NEXT: mulhwu r3, r4, r3 > +; P8LE-NEXT: sub r11, r5, r8 > +; P8LE-NEXT: sub r12, r6, r9 > +; P8LE-NEXT: srwi r11, r11, 1 > +; P8LE-NEXT: add r8, r11, r8 > +; P8LE-NEXT: sub r11, r7, r10 > +; P8LE-NEXT: srwi r12, r12, 1 > +; P8LE-NEXT: add r9, r12, r9 > +; P8LE-NEXT: sub r12, r4, r3 > ; P8LE-NEXT: srwi r11, r11, 1 > -; P8LE-NEXT: add r9, r9, r12 > ; P8LE-NEXT: srwi r8, r8, 6 > -; P8LE-NEXT: add r10, r10, r30 > -; P8LE-NEXT: add r3, r11, r3 > +; P8LE-NEXT: add r10, r11, r10 > +; P8LE-NEXT: srwi r11, r12, 1 > ; P8LE-NEXT: srwi r9, r9, 6 > -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload > +; P8LE-NEXT: add r3, r11, r3 > ; P8LE-NEXT: mulli r8, r8, 95 > ; P8LE-NEXT: srwi r10, r10, 6 > ; P8LE-NEXT: srwi r3, r3, 6 > @@ -382,18 +370,14 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) { > ; P8LE-NEXT: mulli r3, r3, 95 > ; P8LE-NEXT: sub r5, r5, r8 > ; P8LE-NEXT: sub r6, r6, r9 > -; P8LE-NEXT: mtfprd f0, r5 > +; P8LE-NEXT: mtvsrd v2, r5 > ; P8LE-NEXT: sub r5, r7, r10 > ; P8LE-NEXT: sub r3, r4, r3 > -; P8LE-NEXT: mtfprd f1, r6 > -; P8LE-NEXT: mtfprd f2, r5 > -; P8LE-NEXT: 
xxswapd v2, vs0 > -; P8LE-NEXT: mtfprd f3, r3 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: xxswapd v5, vs3 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: vmrglh v3, v5, v4 > +; P8LE-NEXT: mtvsrd v3, r6 > +; P8LE-NEXT: mtvsrd v4, r5 > +; P8LE-NEXT: mtvsrd v5, r3 > +; P8LE-NEXT: vmrghh v2, v3, v2 > +; P8LE-NEXT: vmrghh v3, v5, v4 > ; P8LE-NEXT: vmrglw v2, v3, v2 > ; P8LE-NEXT: blr > ; > @@ -461,67 +445,59 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, 22765 > -; P9LE-NEXT: ori r5, r5, 8969 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r6, r4, r5 > -; P9LE-NEXT: sub r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 1 > -; P9LE-NEXT: add r4, r4, r6 > -; P9LE-NEXT: srwi r4, r4, 6 > -; P9LE-NEXT: mulli r6, r4, 95 > +; P9LE-NEXT: lis r4, 22765 > +; P9LE-NEXT: ori r4, r4, 8969 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: mulhwu r5, r3, r4 > +; P9LE-NEXT: sub r6, r3, r5 > +; P9LE-NEXT: srwi r6, r6, 1 > +; P9LE-NEXT: add r5, r6, r5 > +; P9LE-NEXT: srwi r5, r5, 6 > +; P9LE-NEXT: mulli r6, r5, 95 > ; P9LE-NEXT: sub r3, r3, r6 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: clrlwi r6, r3, 16 > -; P9LE-NEXT: mulhwu r7, r6, r5 > +; P9LE-NEXT: mulhwu r7, r6, r4 > ; P9LE-NEXT: sub r6, r6, r7 > ; P9LE-NEXT: srwi r6, r6, 1 > ; P9LE-NEXT: add r6, r6, r7 > ; P9LE-NEXT: srwi r6, r6, 6 > ; P9LE-NEXT: mulli r7, r6, 95 > ; P9LE-NEXT: sub r3, r3, r7 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: clrlwi r7, r3, 16 > -; P9LE-NEXT: mulhwu r8, r7, r5 > +; P9LE-NEXT: mulhwu r8, r7, r4 > ; P9LE-NEXT: sub r7, r7, r8 > ; P9LE-NEXT: srwi r7, r7, 1 > ; P9LE-NEXT: add r7, r7, r8 > ; P9LE-NEXT: srwi r7, r7, 6 > ; P9LE-NEXT: mulli r8, r7, 95 > ; 
P9LE-NEXT: sub r3, r3, r8 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: clrlwi r8, r3, 16 > -; P9LE-NEXT: mulhwu r5, r8, r5 > -; P9LE-NEXT: sub r8, r8, r5 > +; P9LE-NEXT: mulhwu r4, r8, r4 > +; P9LE-NEXT: sub r8, r8, r4 > ; P9LE-NEXT: srwi r8, r8, 1 > -; P9LE-NEXT: add r5, r8, r5 > -; P9LE-NEXT: srwi r5, r5, 6 > -; P9LE-NEXT: mulli r8, r5, 95 > +; P9LE-NEXT: add r4, r8, r4 > +; P9LE-NEXT: srwi r4, r4, 6 > +; P9LE-NEXT: mulli r8, r4, 95 > ; P9LE-NEXT: sub r3, r3, r8 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: mtfprd f0, r4 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > +; P9LE-NEXT: mtvsrd v4, r6 > ; P9LE-NEXT: vmrglw v2, v2, v3 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r6 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r7 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r5 > -; P9LE-NEXT: xxswapd v5, vs0 > -; P9LE-NEXT: vmrglh v4, v5, v4 > +; P9LE-NEXT: mtvsrd v3, r5 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: mtvsrd v4, r7 > +; P9LE-NEXT: mtvsrd v5, r4 > +; P9LE-NEXT: vmrghh v4, v5, v4 > ; P9LE-NEXT: vmrglw v3, v4, v3 > ; P9LE-NEXT: vadduhm v2, v2, v3 > ; P9LE-NEXT: blr > @@ -598,69 +574,61 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) { > ; P8LE-LABEL: combine_urem_udiv: > ; P8LE: # %bb.0: > ; P8LE-NEXT: xxswapd vs0, v2 > -; P8LE-NEXT: lis r4, 22765 > +; P8LE-NEXT: lis r3, 22765 > ; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill > -; P8LE-NEXT: ori r4, r4, 8969 > -; P8LE-NEXT: mffprd r5, f0 > -; P8LE-NEXT: clrldi r3, r5, 48 > -; P8LE-NEXT: rldicl r6, r5, 48, 48 > -; P8LE-NEXT: clrlwi r8, r3, 16 > -; P8LE-NEXT: rldicl r7, r5, 32, 48 > -; P8LE-NEXT: clrlwi r9, r6, 16 > -; 
P8LE-NEXT: mulhwu r10, r8, r4 > -; P8LE-NEXT: clrlwi r11, r7, 16 > -; P8LE-NEXT: rldicl r5, r5, 16, 48 > -; P8LE-NEXT: mulhwu r12, r9, r4 > -; P8LE-NEXT: mulhwu r0, r11, r4 > -; P8LE-NEXT: clrlwi r30, r5, 16 > -; P8LE-NEXT: mulhwu r4, r30, r4 > -; P8LE-NEXT: sub r8, r8, r10 > +; P8LE-NEXT: ori r3, r3, 8969 > +; P8LE-NEXT: mffprd r4, f0 > +; P8LE-NEXT: clrldi r5, r4, 48 > +; P8LE-NEXT: rldicl r6, r4, 48, 48 > +; P8LE-NEXT: clrlwi r5, r5, 16 > +; P8LE-NEXT: clrlwi r8, r6, 16 > +; P8LE-NEXT: rldicl r7, r4, 32, 48 > +; P8LE-NEXT: rldicl r4, r4, 16, 48 > +; P8LE-NEXT: mulhwu r9, r5, r3 > +; P8LE-NEXT: mulhwu r11, r8, r3 > +; P8LE-NEXT: clrlwi r10, r7, 16 > +; P8LE-NEXT: clrlwi r12, r4, 16 > +; P8LE-NEXT: mulhwu r0, r10, r3 > +; P8LE-NEXT: mulhwu r3, r12, r3 > +; P8LE-NEXT: sub r30, r5, r9 > +; P8LE-NEXT: sub r8, r8, r11 > +; P8LE-NEXT: srwi r30, r30, 1 > ; P8LE-NEXT: srwi r8, r8, 1 > -; P8LE-NEXT: sub r9, r9, r12 > -; P8LE-NEXT: add r8, r8, r10 > -; P8LE-NEXT: sub r10, r11, r0 > -; P8LE-NEXT: srwi r9, r9, 1 > +; P8LE-NEXT: sub r10, r10, r0 > +; P8LE-NEXT: add r9, r30, r9 > +; P8LE-NEXT: add r8, r8, r11 > +; P8LE-NEXT: sub r11, r12, r3 > ; P8LE-NEXT: srwi r10, r10, 1 > -; P8LE-NEXT: sub r11, r30, r4 > -; P8LE-NEXT: add r9, r9, r12 > -; P8LE-NEXT: srwi r8, r8, 6 > ; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload > -; P8LE-NEXT: add r10, r10, r0 > -; P8LE-NEXT: srwi r11, r11, 1 > ; P8LE-NEXT: srwi r9, r9, 6 > -; P8LE-NEXT: mtfprd f0, r8 > -; P8LE-NEXT: mulli r12, r8, 95 > +; P8LE-NEXT: srwi r11, r11, 1 > +; P8LE-NEXT: srwi r8, r8, 6 > +; P8LE-NEXT: add r10, r10, r0 > +; P8LE-NEXT: mulli r12, r9, 95 > +; P8LE-NEXT: add r3, r11, r3 > +; P8LE-NEXT: mtvsrd v2, r9 > ; P8LE-NEXT: srwi r10, r10, 6 > -; P8LE-NEXT: add r4, r11, r4 > -; P8LE-NEXT: mtfprd f1, r9 > -; P8LE-NEXT: mulli r8, r9, 95 > -; P8LE-NEXT: mulli r9, r10, 95 > -; P8LE-NEXT: srwi r4, r4, 6 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: mtfprd f2, r10 > -; P8LE-NEXT: mtfprd f3, r4 > -; P8LE-NEXT: mulli r4, r4, 
95 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: xxswapd v1, vs2 > -; P8LE-NEXT: sub r3, r3, r12 > -; P8LE-NEXT: xxswapd v6, vs3 > -; P8LE-NEXT: mtfprd f0, r3 > -; P8LE-NEXT: sub r3, r7, r9 > -; P8LE-NEXT: sub r6, r6, r8 > -; P8LE-NEXT: mtfprd f4, r3 > -; P8LE-NEXT: sub r3, r5, r4 > -; P8LE-NEXT: mtfprd f1, r6 > -; P8LE-NEXT: mtfprd f5, r3 > -; P8LE-NEXT: xxswapd v5, vs4 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: xxswapd v3, vs0 > -; P8LE-NEXT: xxswapd v4, vs1 > -; P8LE-NEXT: xxswapd v0, vs5 > -; P8LE-NEXT: vmrglh v3, v4, v3 > -; P8LE-NEXT: vmrglh v4, v0, v5 > -; P8LE-NEXT: vmrglh v5, v6, v1 > -; P8LE-NEXT: vmrglw v3, v4, v3 > -; P8LE-NEXT: vmrglw v2, v5, v2 > +; P8LE-NEXT: mulli r9, r8, 95 > +; P8LE-NEXT: srwi r3, r3, 6 > +; P8LE-NEXT: mtvsrd v3, r8 > +; P8LE-NEXT: mulli r8, r10, 95 > +; P8LE-NEXT: mtvsrd v4, r10 > +; P8LE-NEXT: mulli r10, r3, 95 > +; P8LE-NEXT: vmrghh v2, v3, v2 > +; P8LE-NEXT: sub r5, r5, r12 > +; P8LE-NEXT: sub r6, r6, r9 > +; P8LE-NEXT: mtvsrd v3, r5 > +; P8LE-NEXT: mtvsrd v5, r6 > +; P8LE-NEXT: sub r5, r7, r8 > +; P8LE-NEXT: sub r4, r4, r10 > +; P8LE-NEXT: mtvsrd v0, r5 > +; P8LE-NEXT: mtvsrd v1, r4 > +; P8LE-NEXT: vmrghh v3, v5, v3 > +; P8LE-NEXT: mtvsrd v5, r3 > +; P8LE-NEXT: vmrghh v0, v1, v0 > +; P8LE-NEXT: vmrghh v4, v5, v4 > +; P8LE-NEXT: vmrglw v3, v0, v3 > +; P8LE-NEXT: vmrglw v2, v4, v2 > ; P8LE-NEXT: vadduhm v2, v3, v2 > ; P8LE-NEXT: blr > ; > @@ -742,34 +710,30 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x > i16> %x) { > ; P9LE-NEXT: li r3, 0 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: clrlwi r3, r3, 26 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: clrlwi r3, r3, 27 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, 22765 > -; P9LE-NEXT: ori r5, r5, 8969 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: 
clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r5, r4, r5 > -; P9LE-NEXT: sub r4, r4, r5 > -; P9LE-NEXT: srwi r4, r4, 1 > -; P9LE-NEXT: add r4, r4, r5 > +; P9LE-NEXT: lis r4, 22765 > +; P9LE-NEXT: ori r4, r4, 8969 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: mulhwu r4, r3, r4 > +; P9LE-NEXT: sub r5, r3, r4 > +; P9LE-NEXT: srwi r5, r5, 1 > +; P9LE-NEXT: add r4, r5, r4 > ; P9LE-NEXT: srwi r4, r4, 6 > ; P9LE-NEXT: mulli r4, r4, 95 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > ; P9LE-NEXT: clrlwi r3, r3, 29 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v2, v4, v2 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: vmrghh v2, v4, v2 > ; P9LE-NEXT: vmrglw v2, v2, v3 > ; P9LE-NEXT: blr > ; > @@ -817,9 +781,9 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x > i16> %x) { > ; P8LE-NEXT: mffprd r4, f0 > ; P8LE-NEXT: rldicl r5, r4, 16, 48 > ; P8LE-NEXT: rldicl r7, r4, 48, 48 > -; P8LE-NEXT: clrlwi r6, r5, 16 > -; P8LE-NEXT: mulhwu r3, r6, r3 > -; P8LE-NEXT: sub r6, r6, r3 > +; P8LE-NEXT: clrlwi r5, r5, 16 > +; P8LE-NEXT: mulhwu r3, r5, r3 > +; P8LE-NEXT: sub r6, r5, r3 > ; P8LE-NEXT: srwi r6, r6, 1 > ; P8LE-NEXT: add r3, r6, r3 > ; P8LE-NEXT: clrldi r6, r4, 48 > @@ -827,19 +791,15 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x > i16> %x) { > ; P8LE-NEXT: clrlwi r6, r6, 26 > ; P8LE-NEXT: mulli r3, r3, 95 > ; P8LE-NEXT: rldicl r4, r4, 32, 48 > -; P8LE-NEXT: mtfprd f0, r6 > +; P8LE-NEXT: mtvsrd v2, r6 > ; P8LE-NEXT: clrlwi r6, r7, 27 > ; P8LE-NEXT: clrlwi r4, r4, 29 > -; P8LE-NEXT: mtfprd f1, r6 > -; P8LE-NEXT: mtfprd f3, r4 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: xxswapd v3, vs1 > +; P8LE-NEXT: mtvsrd v3, r6 > +; P8LE-NEXT: mtvsrd v5, r4 > +; P8LE-NEXT: vmrghh v2, v3, v2 > ; P8LE-NEXT: sub r3, r5, r3 > -; P8LE-NEXT: xxswapd 
v5, vs3 > -; P8LE-NEXT: mtfprd f2, r3 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: vmrglh v3, v4, v5 > +; P8LE-NEXT: mtvsrd v4, r3 > +; P8LE-NEXT: vmrghh v3, v4, v5 > ; P8LE-NEXT: vmrglw v2, v3, v2 > ; P8LE-NEXT: blr > ; > @@ -885,40 +845,39 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) { > ; P9LE: # %bb.0: > ; P9LE-NEXT: li r3, 4 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: lis r5, -19946 > -; P9LE-NEXT: ori r5, r5, 17097 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r4, r4, r5 > -; P9LE-NEXT: lis r5, 24749 > -; P9LE-NEXT: ori r5, r5, 47143 > +; P9LE-NEXT: lis r4, -19946 > +; P9LE-NEXT: ori r4, r4, 17097 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: mulhwu r4, r3, r4 > ; P9LE-NEXT: srwi r4, r4, 4 > ; P9LE-NEXT: mulli r4, r4, 23 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: lis r4, 24749 > +; P9LE-NEXT: mtvsrd v3, r3 > ; P9LE-NEXT: li r3, 6 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: clrlwi r4, r3, 16 > -; P9LE-NEXT: mulhwu r4, r4, r5 > -; P9LE-NEXT: lis r5, -14230 > -; P9LE-NEXT: ori r5, r5, 30865 > +; P9LE-NEXT: clrlwi r3, r3, 16 > +; P9LE-NEXT: ori r4, r4, 47143 > +; P9LE-NEXT: mulhwu r4, r3, r4 > ; P9LE-NEXT: srwi r4, r4, 11 > ; P9LE-NEXT: mulli r4, r4, 5423 > ; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v3, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > +; P9LE-NEXT: mtvsrd v4, r3 > ; P9LE-NEXT: li r3, 2 > ; P9LE-NEXT: vextuhrx r3, r3, v2 > -; P9LE-NEXT: rlwinm r4, r3, 31, 17, 31 > -; P9LE-NEXT: mulhwu r4, r4, r5 > -; P9LE-NEXT: srwi r4, r4, 8 > -; P9LE-NEXT: mulli r4, r4, 654 > -; P9LE-NEXT: sub r3, r3, r4 > -; P9LE-NEXT: xxswapd v4, vs0 > -; P9LE-NEXT: mtfprd f0, r3 > -; P9LE-NEXT: xxswapd v2, vs0 > -; P9LE-NEXT: vmrglh v3, v4, v3 > -; P9LE-NEXT: xxlxor v4, v4, v4 > -; P9LE-NEXT: vmrglh v2, v2, v4 > +; P9LE-NEXT: lis r5, -14230 > +; P9LE-NEXT: ori r5, r5, 30865 > +; P9LE-NEXT: vmrghh v3, v4, v3 > +; P9LE-NEXT: clrlwi r4, r3, 16 > +; P9LE-NEXT: rlwinm r3, r3, 
31, 17, 31 > +; P9LE-NEXT: mulhwu r3, r3, r5 > +; P9LE-NEXT: srwi r3, r3, 8 > +; P9LE-NEXT: mulli r3, r3, 654 > +; P9LE-NEXT: sub r3, r4, r3 > +; P9LE-NEXT: mtvsrd v2, r3 > +; P9LE-NEXT: li r3, 0 > +; P9LE-NEXT: mtvsrd v4, r3 > +; P9LE-NEXT: vmrghh v2, v2, v4 > ; P9LE-NEXT: vmrglw v2, v3, v2 > ; P9LE-NEXT: blr > ; > @@ -969,41 +928,40 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) { > ; P8LE-LABEL: dont_fold_urem_one: > ; P8LE: # %bb.0: > ; P8LE-NEXT: xxswapd vs0, v2 > -; P8LE-NEXT: lis r3, -19946 > -; P8LE-NEXT: lis r7, 24749 > -; P8LE-NEXT: lis r9, -14230 > -; P8LE-NEXT: xxlxor v5, v5, v5 > -; P8LE-NEXT: ori r3, r3, 17097 > -; P8LE-NEXT: ori r7, r7, 47143 > -; P8LE-NEXT: ori r9, r9, 30865 > +; P8LE-NEXT: lis r3, -14230 > +; P8LE-NEXT: lis r7, -19946 > +; P8LE-NEXT: lis r9, 24749 > +; P8LE-NEXT: ori r3, r3, 30865 > +; P8LE-NEXT: ori r7, r7, 17097 > ; P8LE-NEXT: mffprd r4, f0 > -; P8LE-NEXT: rldicl r5, r4, 32, 48 > -; P8LE-NEXT: rldicl r6, r4, 16, 48 > -; P8LE-NEXT: clrlwi r8, r5, 16 > -; P8LE-NEXT: rldicl r4, r4, 48, 48 > +; P8LE-NEXT: rldicl r5, r4, 48, 48 > +; P8LE-NEXT: rldicl r6, r4, 32, 48 > +; P8LE-NEXT: rldicl r4, r4, 16, 48 > +; P8LE-NEXT: rlwinm r8, r5, 31, 17, 31 > +; P8LE-NEXT: clrlwi r6, r6, 16 > +; P8LE-NEXT: clrlwi r5, r5, 16 > ; P8LE-NEXT: mulhwu r3, r8, r3 > -; P8LE-NEXT: clrlwi r8, r6, 16 > -; P8LE-NEXT: mulhwu r7, r8, r7 > -; P8LE-NEXT: rlwinm r8, r4, 31, 17, 31 > -; P8LE-NEXT: mulhwu r8, r8, r9 > -; P8LE-NEXT: srwi r3, r3, 4 > -; P8LE-NEXT: srwi r7, r7, 11 > -; P8LE-NEXT: mulli r3, r3, 23 > -; P8LE-NEXT: srwi r8, r8, 8 > -; P8LE-NEXT: mulli r7, r7, 5423 > -; P8LE-NEXT: mulli r8, r8, 654 > +; P8LE-NEXT: ori r8, r9, 47143 > +; P8LE-NEXT: clrlwi r4, r4, 16 > +; P8LE-NEXT: li r9, 0 > +; P8LE-NEXT: mulhwu r7, r6, r7 > +; P8LE-NEXT: mulhwu r8, r4, r8 > +; P8LE-NEXT: mtvsrd v2, r9 > +; P8LE-NEXT: srwi r3, r3, 8 > +; P8LE-NEXT: srwi r7, r7, 4 > +; P8LE-NEXT: mulli r3, r3, 654 > +; P8LE-NEXT: srwi r8, r8, 11 > +; P8LE-NEXT: mulli r7, r7, 23 > +; 
P8LE-NEXT: mulli r8, r8, 5423 > ; P8LE-NEXT: sub r3, r5, r3 > ; P8LE-NEXT: sub r5, r6, r7 > -; P8LE-NEXT: mtfprd f0, r3 > +; P8LE-NEXT: mtvsrd v3, r3 > ; P8LE-NEXT: sub r3, r4, r8 > -; P8LE-NEXT: mtfprd f1, r5 > -; P8LE-NEXT: mtfprd f2, r3 > -; P8LE-NEXT: xxswapd v2, vs0 > -; P8LE-NEXT: xxswapd v3, vs1 > -; P8LE-NEXT: xxswapd v4, vs2 > -; P8LE-NEXT: vmrglh v2, v3, v2 > -; P8LE-NEXT: vmrglh v3, v4, v5 > -; P8LE-NEXT: vmrglw v2, v2, v3 > +; P8LE-NEXT: mtvsrd v4, r5 > +; P8LE-NEXT: mtvsrd v5, r3 > +; P8LE-NEXT: vmrghh v2, v3, v2 > +; P8LE-NEXT: vmrghh v3, v5, v4 > +; P8LE-NEXT: vmrglw v2, v3, v2 > ; P8LE-NEXT: blr > ; > ; P8BE-LABEL: dont_fold_urem_one: > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll > index 239b38e2ec70..48b62f57c1c9 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll > @@ -20,12 +20,10 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs1 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: vmrghh v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: blr > @@ -40,13 +38,11 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs1 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > 
+; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: li r3, 0 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -90,20 +86,16 @@ define i64 @test4elt(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: mtfprd f3, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs2 > -; CHECK-P8-NEXT: xxswapd v5, vs3 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > -; CHECK-P8-NEXT: vmrglh v3, v4, v5 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: vmrghh v3, v4, v3 > +; CHECK-P8-NEXT: vmrghh v2, v2, v5 > +; CHECK-P8-NEXT: vmrglw v2, v2, v3 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -114,27 +106,23 @@ define i64 @test4elt(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > ; CHECK-P9-NEXT: xxswapd vs0, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xscvspdpn f0, v2 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghh v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglh v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; 
CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vmrglh v2, v4, v2 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > +; CHECK-P9-NEXT: vmrghh v2, v4, v2 > ; CHECK-P9-NEXT: vmrglw v2, v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > @@ -180,59 +168,51 @@ define <8 x i16> @test8elt(<8 x float>* nocapture > readonly) local_unnamed_addr # > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: lvx v2, 0, r3 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: lvx v5, r3, r4 > -; CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: lvx v3, r3, r4 > ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 > -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v5 > -; CHECK-P8-NEXT: xxswapd vs3, v5 > -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: xscvspdpn f2, v2 > +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 > +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 > +; CHECK-P8-NEXT: xscvspdpn f3, v3 > ; CHECK-P8-NEXT: xscvspdpn f0, vs0 > -; CHECK-P8-NEXT: xscvspdpn f2, vs2 > -; CHECK-P8-NEXT: xscvspdpn f3, vs3 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > ; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: mffprwz r6, f1 > -; CHECK-P8-NEXT: mffprwz r5, f0 > -; CHECK-P8-NEXT: mtfprd f1, r6 > -; CHECK-P8-NEXT: mtfprd f0, r5 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: xscvspdpn f0, v2 > -; CHECK-P8-NEXT: mtfprd 
f4, r4 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v1, vs4 > -; CHECK-P8-NEXT: vmrglh v2, v4, v3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: xxswapd v5, vs2 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: xxswapd vs0, v3 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: mffprwz r3, f2 > +; CHECK-P8-NEXT: xscvdpsxws f2, f4 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvdpsxws f4, f5 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghh v2, v4, v2 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v6, vs3 > -; CHECK-P8-NEXT: xxswapd v0, vs0 > -; CHECK-P8-NEXT: vmrglh v3, v3, v4 > -; CHECK-P8-NEXT: vmrglh v4, v0, v5 > -; CHECK-P8-NEXT: vmrglh v5, v1, v6 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: vmrghh v3, v3, v4 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: mffprwz r3, f1 > +; CHECK-P8-NEXT: vmrghh v5, v0, v5 > +; CHECK-P8-NEXT: mtvsrd v1, r3 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > -; CHECK-P8-NEXT: vmrglw v3, v5, v4 > +; CHECK-P8-NEXT: vmrghh v4, v4, v1 > +; CHECK-P8-NEXT: vmrglw v3, v4, v5 > ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > ; CHECK-P8-NEXT: blr > ; > @@ -244,53 +224,45 @@ define <8 x i16> 
@test8elt(<8 x float>* nocapture > readonly) local_unnamed_addr # > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: lxv vs0, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs2 > ; CHECK-P9-NEXT: xxswapd vs2, vs1 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs1 > ; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xscvspdpn f1, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghh v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; 
CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglh v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglh v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > +; CHECK-P9-NEXT: vmrghh v4, v4, v5 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 > ; CHECK-P9-NEXT: blr > @@ -363,116 +335,100 @@ define void @test16elt(<16 x i16>* noalias > nocapture sret %agg.result, <16 x flo > ; CHECK-P8-LABEL: test16elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: lvx v5, 0, r4 > -; CHECK-P8-NEXT: li r6, 32 > ; CHECK-P8-NEXT: li r5, 16 > -; CHECK-P8-NEXT: lvx v2, r4, r6 > +; CHECK-P8-NEXT: li r6, 32 > ; CHECK-P8-NEXT: lvx v3, r4, r5 > +; CHECK-P8-NEXT: lvx v2, r4, r6 > ; CHECK-P8-NEXT: li r6, 48 > -; CHECK-P8-NEXT: xscvspdpn f0, v5 > -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3 > +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3 > +; CHECK-P8-NEXT: xscvspdpn f1, v5 > ; CHECK-P8-NEXT: lvx v4, r4, r6 > -; CHECK-P8-NEXT: xscvspdpn f4, v2 > -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > -; CHECK-P8-NEXT: xscvspdpn f2, v3 > ; CHECK-P8-NEXT: xxswapd vs3, v5 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: xxswapd vs8, v3 > -; CHECK-P8-NEXT: xscvspdpn f6, v4 > +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3 > +; CHECK-P8-NEXT: xxswapd vs8, v3 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: xscvspdpn f3, vs3 > ; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: xscvspdpn f7, vs7 > +; CHECK-P8-NEXT: xscvspdpn f8, vs8 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1 > +; CHECK-P8-NEXT: xscvdpsxws f3, f3 > +; CHECK-P8-NEXT: xscvspdpn f2, v3 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvdpsxws f1, f5 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; 
CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1 > +; CHECK-P8-NEXT: xscvspdpn f4, v2 > +; CHECK-P8-NEXT: xscvdpsxws f5, f7 > +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > +; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 3 > +; CHECK-P8-NEXT: xscvspdpn f6, v4 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvdpsxws f1, f8 > +; CHECK-P8-NEXT: xxswapd vs8, v4 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: xscvdpsxws f2, f2 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: mffprwz r4, f5 > +; CHECK-P8-NEXT: xxswapd vs5, v2 > ; CHECK-P8-NEXT: xscvspdpn f3, vs3 > -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1 > -; CHECK-P8-NEXT: xscvspdpn f8, vs8 > -; CHECK-P8-NEXT: xxswapd vs11, v2 > ; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xxswapd v2, v4 > +; CHECK-P8-NEXT: vmrghh v3, v0, v3 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvdpsxws f6, f6 > +; CHECK-P8-NEXT: xscvspdpn f1, vs5 > +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghh v2, v5, v1 > +; CHECK-P8-NEXT: vmrghh v5, v6, v0 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mffprwz r4, f4 > +; CHECK-P8-NEXT: xscvdpsxws f2, f3 > +; CHECK-P8-NEXT: xscvspdpn f5, vs5 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: mffprwz r4, f6 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > ; CHECK-P8-NEXT: xscvspdpn f7, vs7 > -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3 > -; CHECK-P8-NEXT: xscvdpsxws f2, f2 > -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1 > -; CHECK-P8-NEXT: xscvspdpn f10, vs10 > +; CHECK-P8-NEXT: mtvsrd v7, r4 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1 > +; CHECK-P8-NEXT: xscvspdpn f8, vs8 > +; CHECK-P8-NEXT: xscvdpsxws f0, f5 > +; CHECK-P8-NEXT: 
mtvsrd v4, r4 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvspdpn f1, vs2 > +; CHECK-P8-NEXT: xscvdpsxws f3, f7 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: xscvdpsxws f0, f8 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xscvspdpn f9, vs9 > -; CHECK-P8-NEXT: xscvdpsxws f6, f6 > -; CHECK-P8-NEXT: xscvspdpn f12, vs12 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > +; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: vmrghh v0, v0, v7 > +; CHECK-P8-NEXT: mtvsrd v7, r4 > ; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvspdpn f11, vs11 > -; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvspdpn v2, v2 > -; CHECK-P8-NEXT: xscvdpsxws f8, f8 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f7, f7 > -; CHECK-P8-NEXT: mffprwz r6, f2 > -; CHECK-P8-NEXT: xscvspdpn f13, vs13 > -; CHECK-P8-NEXT: xscvspdpn v3, v3 > -; CHECK-P8-NEXT: xscvdpsxws f10, f10 > -; CHECK-P8-NEXT: mtfprd f4, r4 > +; CHECK-P8-NEXT: vmrghh v4, v8, v4 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > ; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: xscvdpsxws f9, f9 > -; CHECK-P8-NEXT: mtfprd f2, r6 > -; CHECK-P8-NEXT: mffprwz r6, f6 > -; CHECK-P8-NEXT: xscvdpsxws f12, f12 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: xscvdpsxws f11, f11 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f6, r6 > -; CHECK-P8-NEXT: mffprwz r6, f3 > -; CHECK-P8-NEXT: xscvdpsxws v2, v2 > -; CHECK-P8-NEXT: xxswapd v9, vs6 > -; CHECK-P8-NEXT: mtfprd f5, r4 > -; CHECK-P8-NEXT: mffprwz r4, f8 > -; CHECK-P8-NEXT: mtfprd f3, r6 > -; CHECK-P8-NEXT: xxswapd v0, vs5 > -; CHECK-P8-NEXT: mffprwz r6, f7 > -; CHECK-P8-NEXT: xscvdpsxws f13, f13 > -; CHECK-P8-NEXT: xxswapd v5, vs3 > -; CHECK-P8-NEXT: xscvdpsxws v3, v3 > -; CHECK-P8-NEXT: mtfprd f8, r4 > -; CHECK-P8-NEXT: mffprwz r4, f10 > -; CHECK-P8-NEXT: mtfprd f7, r6 > -; CHECK-P8-NEXT: 
mffprwz r6, f9 > -; CHECK-P8-NEXT: mtfprd f10, r4 > -; CHECK-P8-NEXT: mffprwz r4, f12 > -; CHECK-P8-NEXT: mtfprd f9, r6 > -; CHECK-P8-NEXT: xxswapd v6, vs10 > -; CHECK-P8-NEXT: mffprwz r6, f11 > -; CHECK-P8-NEXT: mtfprd f12, r4 > -; CHECK-P8-NEXT: xxswapd v1, vs9 > -; CHECK-P8-NEXT: mfvsrwz r4, v2 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: mtfprd f11, r6 > -; CHECK-P8-NEXT: mffprwz r6, f13 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: xxswapd v7, vs11 > -; CHECK-P8-NEXT: mfvsrwz r4, v3 > -; CHECK-P8-NEXT: vmrglh v3, v5, v4 > -; CHECK-P8-NEXT: xxswapd v4, vs7 > -; CHECK-P8-NEXT: vmrglh v2, v2, v0 > -; CHECK-P8-NEXT: xxswapd v5, vs8 > -; CHECK-P8-NEXT: xxswapd v0, vs2 > -; CHECK-P8-NEXT: mtfprd f13, r6 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: xxswapd v8, vs0 > -; CHECK-P8-NEXT: vmrglh v4, v5, v4 > -; CHECK-P8-NEXT: vmrglh v5, v0, v1 > -; CHECK-P8-NEXT: xxswapd v1, vs4 > -; CHECK-P8-NEXT: vmrglh v0, v7, v6 > -; CHECK-P8-NEXT: xxswapd v6, vs12 > -; CHECK-P8-NEXT: xxswapd v7, vs13 > -; CHECK-P8-NEXT: xxswapd v10, vs1 > +; CHECK-P8-NEXT: vmrghh v1, v1, v9 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > +; CHECK-P8-NEXT: vmrghh v7, v8, v7 > +; CHECK-P8-NEXT: vmrghh v6, v6, v9 > ; CHECK-P8-NEXT: vmrglw v2, v2, v3 > -; CHECK-P8-NEXT: vmrglh v1, v1, v6 > -; CHECK-P8-NEXT: vmrglh v6, v8, v7 > -; CHECK-P8-NEXT: vmrglh v7, v9, v10 > -; CHECK-P8-NEXT: vmrglw v3, v5, v4 > -; CHECK-P8-NEXT: vmrglw v4, v1, v0 > -; CHECK-P8-NEXT: vmrglw v5, v7, v6 > +; CHECK-P8-NEXT: vmrglw v3, v0, v5 > +; CHECK-P8-NEXT: vmrglw v4, v1, v4 > +; CHECK-P8-NEXT: vmrglw v5, v6, v7 > ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > ; CHECK-P8-NEXT: stvx v2, 0, r3 > ; CHECK-P8-NEXT: xxmrgld v3, v5, v4 > @@ -481,118 +437,102 @@ define void @test16elt(<16 x i16>* noalias > nocapture sret %agg.result, <16 x flo > ; > ; CHECK-P9-LABEL: test16elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: lxv vs1, 0(r4) > -; CHECK-P9-NEXT: lxv vs3, 16(r4) > -; CHECK-P9-NEXT: xscvspdpn f5, vs1 > -; 
CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 > -; CHECK-P9-NEXT: xscvspdpn f8, vs3 > -; CHECK-P9-NEXT: xxswapd vs4, vs1 > -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: lxv vs2, 0(r4) > +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 > +; CHECK-P9-NEXT: xxswapd vs4, vs2 > +; CHECK-P9-NEXT: xscvspdpn f3, vs3 > ; CHECK-P9-NEXT: xscvspdpn f4, vs4 > -; CHECK-P9-NEXT: xscvdpsxws f5, f5 > +; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: xscvspdpn f5, vs2 > +; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; CHECK-P9-NEXT: xscvdpsxws f8, f8 > -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3 > -; CHECK-P9-NEXT: xxswapd vs7, vs3 > -; CHECK-P9-NEXT: xscvspdpn f6, vs6 > -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 > -; CHECK-P9-NEXT: xscvspdpn f7, vs7 > -; CHECK-P9-NEXT: xscvspdpn f3, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: mffprwz r5, f3 > +; CHECK-P9-NEXT: lxv vs1, 16(r4) > +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3 > +; CHECK-P9-NEXT: xxswapd vs3, vs1 > +; CHECK-P9-NEXT: mtvsrd v2, r5 > +; CHECK-P9-NEXT: mffprwz r5, f4 > +; CHECK-P9-NEXT: xscvdpsxws f4, f5 > +; CHECK-P9-NEXT: xscvspdpn f3, vs3 > +; CHECK-P9-NEXT: mtvsrd v3, r5 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > +; CHECK-P9-NEXT: mffprwz r5, f4 > +; CHECK-P9-NEXT: xscvspdpn f4, vs6 > +; CHECK-P9-NEXT: mtvsrd v3, r5 > +; CHECK-P9-NEXT: mffprwz r5, f2 > +; CHECK-P9-NEXT: xscvspdpn f2, vs1 > +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > ; CHECK-P9-NEXT: xscvdpsxws f4, f4 > -; CHECK-P9-NEXT: xscvdpsxws f6, f6 > -; CHECK-P9-NEXT: mffprwz r5, f5 > -; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: xscvdpsxws f7, f7 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > -; CHECK-P9-NEXT: mtfprd f5, r5 > -; CHECK-P9-NEXT: mffprwz r5, f8 > -; CHECK-P9-NEXT: mtfprd f8, r5 > -; CHECK-P9-NEXT: mffprwz r5, f2 > ; CHECK-P9-NEXT: lxv vs0, 32(r4) > -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3 > -; CHECK-P9-NEXT: 
xxswapd vs10, vs0 > -; CHECK-P9-NEXT: xscvspdpn f9, vs9 > -; CHECK-P9-NEXT: xscvspdpn f10, vs10 > -; CHECK-P9-NEXT: xscvdpsxws f9, f9 > -; CHECK-P9-NEXT: xscvdpsxws f10, f10 > -; CHECK-P9-NEXT: mtfprd f2, r5 > +; CHECK-P9-NEXT: mtvsrd v4, r5 > +; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > +; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r5, f4 > -; CHECK-P9-NEXT: mtfprd f4, r5 > +; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: mtvsrd v4, r5 > +; CHECK-P9-NEXT: mffprwz r5, f3 > +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v5, r5 > +; CHECK-P9-NEXT: mffprwz r5, f2 > +; CHECK-P9-NEXT: xscvspdpn f2, vs3 > +; CHECK-P9-NEXT: vmrghh v4, v5, v4 > +; CHECK-P9-NEXT: mtvsrd v5, r5 > ; CHECK-P9-NEXT: mffprwz r5, f1 > -; CHECK-P9-NEXT: mtfprd f1, r5 > -; CHECK-P9-NEXT: mffprwz r5, f6 > -; CHECK-P9-NEXT: xxswapd v2, vs2 > -; CHECK-P9-NEXT: xxswapd v3, vs4 > +; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: mtvsrd v0, r5 > +; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghh v5, v5, v0 > +; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrglw v3, v5, v4 > +; CHECK-P9-NEXT: mffprwz r5, f2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd v0, r5 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: mtfprd f6, r5 > -; CHECK-P9-NEXT: mffprwz r5, f7 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > +; CHECK-P9-NEXT: mffprwz r5, f1 > ; CHECK-P9-NEXT: lxv vs1, 48(r4) > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs5 > -; CHECK-P9-NEXT: mtfprd f7, r5 > -; CHECK-P9-NEXT: mffprwz r5, f3 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs6 > -; CHECK-P9-NEXT: xxswapd v5, vs7 > -; CHECK-P9-NEXT: mtfprd f3, r5 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > -; CHECK-P9-NEXT: xxswapd v0, vs3 > -; 
CHECK-P9-NEXT: vmrglh v4, v5, v4 > -; CHECK-P9-NEXT: xxswapd v5, vs8 > -; CHECK-P9-NEXT: vmrglh v5, v5, v0 > +; CHECK-P9-NEXT: mtvsrd v1, r5 > +; CHECK-P9-NEXT: vmrghh v0, v1, v0 > ; CHECK-P9-NEXT: mffprwz r4, f2 > -; CHECK-P9-NEXT: mtfprd f2, r4 > -; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: vmrglw v2, v3, v2 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: vmrglw v3, v5, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > +; CHECK-P9-NEXT: mffprwz r4, f0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > +; CHECK-P9-NEXT: vmrghh v2, v4, v2 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrglw v2, v2, v0 > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > ; CHECK-P9-NEXT: xxswapd vs0, vs1 > +; CHECK-P9-NEXT: mtvsrd v3, r4 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: vmrglh v2, v4, v2 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xscvspdpn f0, vs1 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghh v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: vmrglh v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > -; CHECK-P9-NEXT: mffprwz r5, f9 > -; CHECK-P9-NEXT: mtfprd f9, r5 > -; CHECK-P9-NEXT: mffprwz r5, f10 > -; CHECK-P9-NEXT: mtfprd f10, r5 > -; CHECK-P9-NEXT: xxswapd v0, vs9 > -; CHECK-P9-NEXT: xxswapd v1, vs10 > -; CHECK-P9-NEXT: vmrglh v0, v1, v0 > -; CHECK-P9-NEXT: vmrglw v2, v2, v0 > -; CHECK-P9-NEXT: stxv vs2, 0(r3) > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; 
CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglh v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r4 > +; CHECK-P9-NEXT: vmrghh v4, v4, v5 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2 > ; CHECK-P9-NEXT: stxv vs0, 16(r3) > +; CHECK-P9-NEXT: stxv vs2, 0(r3) > ; CHECK-P9-NEXT: blr > ; > ; CHECK-BE-LABEL: test16elt: > @@ -728,12 +668,10 @@ define i32 @test2elt_signed(i64 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs1 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: vmrghh v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: blr > @@ -748,13 +686,11 @@ define i32 @test2elt_signed(i64 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs1 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: li r3, 0 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -798,20 +734,16 @@ define i64 @test4elt_signed(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: 
mtfprd f0, r3 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: mtfprd f3, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs2 > -; CHECK-P8-NEXT: xxswapd v5, vs3 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > -; CHECK-P8-NEXT: vmrglh v3, v4, v5 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: vmrghh v3, v4, v3 > +; CHECK-P8-NEXT: vmrghh v2, v2, v5 > +; CHECK-P8-NEXT: vmrglw v2, v2, v3 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -822,27 +754,23 @@ define i64 @test4elt_signed(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > ; CHECK-P9-NEXT: xxswapd vs0, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xscvspdpn f0, v2 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghh v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglh v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vmrglh v2, v4, v2 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > +; CHECK-P9-NEXT: vmrghh v2, v4, v2 > ; CHECK-P9-NEXT: vmrglw v2, v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > @@ -888,59 +816,51 @@ define <8 x i16> @test8elt_signed(<8 x float>* > nocapture 
readonly) local_unnamed > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: lvx v2, 0, r3 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: lvx v5, r3, r4 > -; CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: lvx v3, r3, r4 > ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 > -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v5 > -; CHECK-P8-NEXT: xxswapd vs3, v5 > -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: xscvspdpn f2, v2 > +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 > +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 > +; CHECK-P8-NEXT: xscvspdpn f3, v3 > ; CHECK-P8-NEXT: xscvspdpn f0, vs0 > -; CHECK-P8-NEXT: xscvspdpn f2, vs2 > -; CHECK-P8-NEXT: xscvspdpn f3, vs3 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > ; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: mffprwz r6, f1 > -; CHECK-P8-NEXT: mffprwz r5, f0 > -; CHECK-P8-NEXT: mtfprd f1, r6 > -; CHECK-P8-NEXT: mtfprd f0, r5 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: xscvspdpn f0, v2 > -; CHECK-P8-NEXT: mtfprd f4, r4 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v1, vs4 > -; CHECK-P8-NEXT: vmrglh v2, v4, v3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: xxswapd v5, vs2 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: xxswapd 
vs0, v3 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: mffprwz r3, f2 > +; CHECK-P8-NEXT: xscvdpsxws f2, f4 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvdpsxws f4, f5 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghh v2, v4, v2 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v6, vs3 > -; CHECK-P8-NEXT: xxswapd v0, vs0 > -; CHECK-P8-NEXT: vmrglh v3, v3, v4 > -; CHECK-P8-NEXT: vmrglh v4, v0, v5 > -; CHECK-P8-NEXT: vmrglh v5, v1, v6 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: vmrghh v3, v3, v4 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: mffprwz r3, f1 > +; CHECK-P8-NEXT: vmrghh v5, v0, v5 > +; CHECK-P8-NEXT: mtvsrd v1, r3 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > -; CHECK-P8-NEXT: vmrglw v3, v5, v4 > +; CHECK-P8-NEXT: vmrghh v4, v4, v1 > +; CHECK-P8-NEXT: vmrglw v3, v4, v5 > ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > ; CHECK-P8-NEXT: blr > ; > @@ -952,53 +872,45 @@ define <8 x i16> @test8elt_signed(<8 x float>* > nocapture readonly) local_unnamed > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: lxv vs0, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs2 > ; CHECK-P9-NEXT: xxswapd vs2, vs1 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs1 > ; 
CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs1 > ; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xscvspdpn f1, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghh v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglh v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglh v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > +; CHECK-P9-NEXT: vmrghh v4, v4, v5 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 > ; CHECK-P9-NEXT: blr > @@ -1071,116 +983,100 @@ define void @test16elt_signed(<16 x i16>* noalias > nocapture sret %agg.result, <1 > ; 
CHECK-P8-LABEL: test16elt_signed: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: lvx v5, 0, r4 > -; CHECK-P8-NEXT: li r6, 32 > ; CHECK-P8-NEXT: li r5, 16 > -; CHECK-P8-NEXT: lvx v2, r4, r6 > +; CHECK-P8-NEXT: li r6, 32 > ; CHECK-P8-NEXT: lvx v3, r4, r5 > +; CHECK-P8-NEXT: lvx v2, r4, r6 > ; CHECK-P8-NEXT: li r6, 48 > -; CHECK-P8-NEXT: xscvspdpn f0, v5 > -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3 > +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3 > +; CHECK-P8-NEXT: xscvspdpn f1, v5 > ; CHECK-P8-NEXT: lvx v4, r4, r6 > -; CHECK-P8-NEXT: xscvspdpn f4, v2 > -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > -; CHECK-P8-NEXT: xscvspdpn f2, v3 > ; CHECK-P8-NEXT: xxswapd vs3, v5 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: xxswapd vs8, v3 > -; CHECK-P8-NEXT: xscvspdpn f6, v4 > +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3 > +; CHECK-P8-NEXT: xxswapd vs8, v3 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: xscvspdpn f3, vs3 > ; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: xscvspdpn f7, vs7 > +; CHECK-P8-NEXT: xscvspdpn f8, vs8 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1 > +; CHECK-P8-NEXT: xscvdpsxws f3, f3 > +; CHECK-P8-NEXT: xscvspdpn f2, v3 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvdpsxws f1, f5 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1 > +; CHECK-P8-NEXT: xscvspdpn f4, v2 > +; CHECK-P8-NEXT: xscvdpsxws f5, f7 > +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > +; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 3 > +; CHECK-P8-NEXT: xscvspdpn f6, v4 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvdpsxws f1, f8 > +; CHECK-P8-NEXT: xxswapd vs8, v4 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: xscvdpsxws f2, f2 > 
+; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: mffprwz r4, f5 > +; CHECK-P8-NEXT: xxswapd vs5, v2 > ; CHECK-P8-NEXT: xscvspdpn f3, vs3 > -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1 > -; CHECK-P8-NEXT: xscvspdpn f8, vs8 > -; CHECK-P8-NEXT: xxswapd vs11, v2 > ; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xxswapd v2, v4 > +; CHECK-P8-NEXT: vmrghh v3, v0, v3 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvdpsxws f6, f6 > +; CHECK-P8-NEXT: xscvspdpn f1, vs5 > +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghh v2, v5, v1 > +; CHECK-P8-NEXT: vmrghh v5, v6, v0 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mffprwz r4, f4 > +; CHECK-P8-NEXT: xscvdpsxws f2, f3 > +; CHECK-P8-NEXT: xscvspdpn f5, vs5 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: mffprwz r4, f6 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > ; CHECK-P8-NEXT: xscvspdpn f7, vs7 > -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3 > -; CHECK-P8-NEXT: xscvdpsxws f2, f2 > -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1 > -; CHECK-P8-NEXT: xscvspdpn f10, vs10 > +; CHECK-P8-NEXT: mtvsrd v7, r4 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1 > +; CHECK-P8-NEXT: xscvspdpn f8, vs8 > +; CHECK-P8-NEXT: xscvdpsxws f0, f5 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xscvspdpn f1, vs2 > +; CHECK-P8-NEXT: xscvdpsxws f3, f7 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: xscvdpsxws f0, f8 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xscvspdpn f9, vs9 > -; CHECK-P8-NEXT: xscvdpsxws f6, f6 > -; CHECK-P8-NEXT: xscvspdpn f12, vs12 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > +; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: vmrghh v0, v0, v7 
> +; CHECK-P8-NEXT: mtvsrd v7, r4 > ; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvspdpn f11, vs11 > -; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvspdpn v2, v2 > -; CHECK-P8-NEXT: xscvdpsxws f8, f8 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f7, f7 > -; CHECK-P8-NEXT: mffprwz r6, f2 > -; CHECK-P8-NEXT: xscvspdpn f13, vs13 > -; CHECK-P8-NEXT: xscvspdpn v3, v3 > -; CHECK-P8-NEXT: xscvdpsxws f10, f10 > -; CHECK-P8-NEXT: mtfprd f4, r4 > +; CHECK-P8-NEXT: vmrghh v4, v8, v4 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > ; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: xscvdpsxws f9, f9 > -; CHECK-P8-NEXT: mtfprd f2, r6 > -; CHECK-P8-NEXT: mffprwz r6, f6 > -; CHECK-P8-NEXT: xscvdpsxws f12, f12 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: xscvdpsxws f11, f11 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f6, r6 > -; CHECK-P8-NEXT: mffprwz r6, f3 > -; CHECK-P8-NEXT: xscvdpsxws v2, v2 > -; CHECK-P8-NEXT: xxswapd v9, vs6 > -; CHECK-P8-NEXT: mtfprd f5, r4 > -; CHECK-P8-NEXT: mffprwz r4, f8 > -; CHECK-P8-NEXT: mtfprd f3, r6 > -; CHECK-P8-NEXT: xxswapd v0, vs5 > -; CHECK-P8-NEXT: mffprwz r6, f7 > -; CHECK-P8-NEXT: xscvdpsxws f13, f13 > -; CHECK-P8-NEXT: xxswapd v5, vs3 > -; CHECK-P8-NEXT: xscvdpsxws v3, v3 > -; CHECK-P8-NEXT: mtfprd f8, r4 > -; CHECK-P8-NEXT: mffprwz r4, f10 > -; CHECK-P8-NEXT: mtfprd f7, r6 > -; CHECK-P8-NEXT: mffprwz r6, f9 > -; CHECK-P8-NEXT: mtfprd f10, r4 > -; CHECK-P8-NEXT: mffprwz r4, f12 > -; CHECK-P8-NEXT: mtfprd f9, r6 > -; CHECK-P8-NEXT: xxswapd v6, vs10 > -; CHECK-P8-NEXT: mffprwz r6, f11 > -; CHECK-P8-NEXT: mtfprd f12, r4 > -; CHECK-P8-NEXT: xxswapd v1, vs9 > -; CHECK-P8-NEXT: mfvsrwz r4, v2 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: mtfprd f11, r6 > -; CHECK-P8-NEXT: mffprwz r6, f13 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: xxswapd v7, vs11 > -; CHECK-P8-NEXT: mfvsrwz r4, v3 > -; CHECK-P8-NEXT: 
vmrglh v3, v5, v4 > -; CHECK-P8-NEXT: xxswapd v4, vs7 > -; CHECK-P8-NEXT: vmrglh v2, v2, v0 > -; CHECK-P8-NEXT: xxswapd v5, vs8 > -; CHECK-P8-NEXT: xxswapd v0, vs2 > -; CHECK-P8-NEXT: mtfprd f13, r6 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: xxswapd v8, vs0 > -; CHECK-P8-NEXT: vmrglh v4, v5, v4 > -; CHECK-P8-NEXT: vmrglh v5, v0, v1 > -; CHECK-P8-NEXT: xxswapd v1, vs4 > -; CHECK-P8-NEXT: vmrglh v0, v7, v6 > -; CHECK-P8-NEXT: xxswapd v6, vs12 > -; CHECK-P8-NEXT: xxswapd v7, vs13 > -; CHECK-P8-NEXT: xxswapd v10, vs1 > +; CHECK-P8-NEXT: vmrghh v1, v1, v9 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > +; CHECK-P8-NEXT: vmrghh v7, v8, v7 > +; CHECK-P8-NEXT: vmrghh v6, v6, v9 > ; CHECK-P8-NEXT: vmrglw v2, v2, v3 > -; CHECK-P8-NEXT: vmrglh v1, v1, v6 > -; CHECK-P8-NEXT: vmrglh v6, v8, v7 > -; CHECK-P8-NEXT: vmrglh v7, v9, v10 > -; CHECK-P8-NEXT: vmrglw v3, v5, v4 > -; CHECK-P8-NEXT: vmrglw v4, v1, v0 > -; CHECK-P8-NEXT: vmrglw v5, v7, v6 > +; CHECK-P8-NEXT: vmrglw v3, v0, v5 > +; CHECK-P8-NEXT: vmrglw v4, v1, v4 > +; CHECK-P8-NEXT: vmrglw v5, v6, v7 > ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > ; CHECK-P8-NEXT: stvx v2, 0, r3 > ; CHECK-P8-NEXT: xxmrgld v3, v5, v4 > @@ -1189,118 +1085,102 @@ define void @test16elt_signed(<16 x i16>* > noalias nocapture sret %agg.result, <1 > ; > ; CHECK-P9-LABEL: test16elt_signed: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: lxv vs1, 0(r4) > -; CHECK-P9-NEXT: lxv vs3, 16(r4) > -; CHECK-P9-NEXT: xscvspdpn f5, vs1 > -; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 > -; CHECK-P9-NEXT: xscvspdpn f8, vs3 > -; CHECK-P9-NEXT: xxswapd vs4, vs1 > -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: lxv vs2, 0(r4) > +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 > +; CHECK-P9-NEXT: xxswapd vs4, vs2 > +; CHECK-P9-NEXT: xscvspdpn f3, vs3 > ; CHECK-P9-NEXT: xscvspdpn f4, vs4 > -; CHECK-P9-NEXT: xscvdpsxws f5, f5 > +; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: xscvspdpn f5, vs2 > +; CHECK-P9-NEXT: xxsldwi 
vs2, vs2, vs2, 1 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; CHECK-P9-NEXT: xscvdpsxws f8, f8 > -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3 > -; CHECK-P9-NEXT: xxswapd vs7, vs3 > -; CHECK-P9-NEXT: xscvspdpn f6, vs6 > -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 > -; CHECK-P9-NEXT: xscvspdpn f7, vs7 > -; CHECK-P9-NEXT: xscvspdpn f3, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: mffprwz r5, f3 > +; CHECK-P9-NEXT: lxv vs1, 16(r4) > +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3 > +; CHECK-P9-NEXT: xxswapd vs3, vs1 > +; CHECK-P9-NEXT: mtvsrd v2, r5 > +; CHECK-P9-NEXT: mffprwz r5, f4 > +; CHECK-P9-NEXT: xscvdpsxws f4, f5 > +; CHECK-P9-NEXT: xscvspdpn f3, vs3 > +; CHECK-P9-NEXT: mtvsrd v3, r5 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > +; CHECK-P9-NEXT: mffprwz r5, f4 > +; CHECK-P9-NEXT: xscvspdpn f4, vs6 > +; CHECK-P9-NEXT: mtvsrd v3, r5 > +; CHECK-P9-NEXT: mffprwz r5, f2 > +; CHECK-P9-NEXT: xscvspdpn f2, vs1 > +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > ; CHECK-P9-NEXT: xscvdpsxws f4, f4 > -; CHECK-P9-NEXT: xscvdpsxws f6, f6 > -; CHECK-P9-NEXT: mffprwz r5, f5 > -; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: xscvdpsxws f7, f7 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > -; CHECK-P9-NEXT: mtfprd f5, r5 > -; CHECK-P9-NEXT: mffprwz r5, f8 > -; CHECK-P9-NEXT: mtfprd f8, r5 > -; CHECK-P9-NEXT: mffprwz r5, f2 > ; CHECK-P9-NEXT: lxv vs0, 32(r4) > -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3 > -; CHECK-P9-NEXT: xxswapd vs10, vs0 > -; CHECK-P9-NEXT: xscvspdpn f9, vs9 > -; CHECK-P9-NEXT: xscvspdpn f10, vs10 > -; CHECK-P9-NEXT: xscvdpsxws f9, f9 > -; CHECK-P9-NEXT: xscvdpsxws f10, f10 > -; CHECK-P9-NEXT: mtfprd f2, r5 > +; CHECK-P9-NEXT: mtvsrd v4, r5 > +; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > +; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r5, f4 > -; CHECK-P9-NEXT: mtfprd f4, r5 > +; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: mtvsrd v4, r5 > 
+; CHECK-P9-NEXT: mffprwz r5, f3 > +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v5, r5 > +; CHECK-P9-NEXT: mffprwz r5, f2 > +; CHECK-P9-NEXT: xscvspdpn f2, vs3 > +; CHECK-P9-NEXT: vmrghh v4, v5, v4 > +; CHECK-P9-NEXT: mtvsrd v5, r5 > ; CHECK-P9-NEXT: mffprwz r5, f1 > -; CHECK-P9-NEXT: mtfprd f1, r5 > -; CHECK-P9-NEXT: mffprwz r5, f6 > -; CHECK-P9-NEXT: xxswapd v2, vs2 > -; CHECK-P9-NEXT: xxswapd v3, vs4 > +; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: mtvsrd v0, r5 > +; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghh v5, v5, v0 > +; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrglw v3, v5, v4 > +; CHECK-P9-NEXT: mffprwz r5, f2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd v0, r5 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: mtfprd f6, r5 > -; CHECK-P9-NEXT: mffprwz r5, f7 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > +; CHECK-P9-NEXT: mffprwz r5, f1 > ; CHECK-P9-NEXT: lxv vs1, 48(r4) > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs5 > -; CHECK-P9-NEXT: mtfprd f7, r5 > -; CHECK-P9-NEXT: mffprwz r5, f3 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs6 > -; CHECK-P9-NEXT: xxswapd v5, vs7 > -; CHECK-P9-NEXT: mtfprd f3, r5 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > -; CHECK-P9-NEXT: xxswapd v0, vs3 > -; CHECK-P9-NEXT: vmrglh v4, v5, v4 > -; CHECK-P9-NEXT: xxswapd v5, vs8 > -; CHECK-P9-NEXT: vmrglh v5, v5, v0 > +; CHECK-P9-NEXT: mtvsrd v1, r5 > +; CHECK-P9-NEXT: vmrghh v0, v1, v0 > ; CHECK-P9-NEXT: mffprwz r4, f2 > -; CHECK-P9-NEXT: mtfprd f2, r4 > -; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: vmrglw v2, v3, v2 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: vmrglw v3, v5, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > +; 
CHECK-P9-NEXT: mffprwz r4, f0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > +; CHECK-P9-NEXT: vmrghh v2, v4, v2 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrglw v2, v2, v0 > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > ; CHECK-P9-NEXT: xxswapd vs0, vs1 > +; CHECK-P9-NEXT: mtvsrd v3, r4 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: vmrglh v2, v4, v2 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xscvspdpn f0, vs1 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghh v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: vmrglh v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > -; CHECK-P9-NEXT: mffprwz r5, f9 > -; CHECK-P9-NEXT: mtfprd f9, r5 > -; CHECK-P9-NEXT: mffprwz r5, f10 > -; CHECK-P9-NEXT: mtfprd f10, r5 > -; CHECK-P9-NEXT: xxswapd v0, vs9 > -; CHECK-P9-NEXT: xxswapd v1, vs10 > -; CHECK-P9-NEXT: vmrglh v0, v1, v0 > -; CHECK-P9-NEXT: vmrglw v2, v2, v0 > -; CHECK-P9-NEXT: stxv vs2, 0(r3) > ; CHECK-P9-NEXT: mffprwz r4, f0 > -; CHECK-P9-NEXT: mtfprd f0, r4 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglh v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r4 > +; CHECK-P9-NEXT: vmrghh v4, v4, v5 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2 > ; CHECK-P9-NEXT: stxv vs0, 16(r3) > +; CHECK-P9-NEXT: stxv vs2, 0(r3) > ; CHECK-P9-NEXT: blr > ; > ; CHECK-BE-LABEL: test16elt_signed: > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll > index 
1f95eda2b1b5..928a19f3a55c 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll > @@ -20,12 +20,10 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs1 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: vmrglb v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: vmrghb v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: clrldi r3, r3, 48 > @@ -43,13 +41,11 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: addi r3, r1, -2 > -; CHECK-P9-NEXT: xxswapd v2, vs1 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > -; CHECK-P9-NEXT: vmrglb v2, v3, v2 > +; CHECK-P9-NEXT: vmrghb v2, v3, v2 > ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8 > ; CHECK-P9-NEXT: stxsihx v2, 0, r3 > ; CHECK-P9-NEXT: lhz r3, -2(r1) > @@ -97,20 +93,16 @@ define i32 @test4elt(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: mtfprd f3, r3 
> -; CHECK-P8-NEXT: xxswapd v3, vs2 > -; CHECK-P8-NEXT: xxswapd v5, vs3 > -; CHECK-P8-NEXT: vmrglb v2, v3, v2 > -; CHECK-P8-NEXT: vmrglb v3, v4, v5 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: vmrghb v3, v4, v3 > +; CHECK-P8-NEXT: vmrghb v2, v2, v5 > +; CHECK-P8-NEXT: vmrglh v2, v2, v3 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: blr > @@ -121,28 +113,24 @@ define i32 @test4elt(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > ; CHECK-P9-NEXT: xxswapd vs0, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xscvspdpn f0, v2 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: li r3, 0 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vmrglb v2, v4, v2 > +; CHECK-P9-NEXT: vmrghb v2, v4, v2 > ; CHECK-P9-NEXT: vmrglh v2, v2, v3 > ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 > ; CHECK-P9-NEXT: blr > @@ -189,59 +177,51 @@ define i64 @test8elt(<8 x float>* nocapture > readonly) local_unnamed_addr #2 { > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: lvx v2, 0, r3 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: lvx v5, r3, r4 > -; 
CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: lvx v3, r3, r4 > ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 > -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v5 > -; CHECK-P8-NEXT: xxswapd vs3, v5 > -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: xscvspdpn f2, v2 > +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 > +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 > +; CHECK-P8-NEXT: xscvspdpn f3, v3 > ; CHECK-P8-NEXT: xscvspdpn f0, vs0 > -; CHECK-P8-NEXT: xscvspdpn f2, vs2 > -; CHECK-P8-NEXT: xscvspdpn f3, vs3 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > ; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: mffprwz r6, f1 > -; CHECK-P8-NEXT: mffprwz r5, f0 > -; CHECK-P8-NEXT: mtfprd f1, r6 > -; CHECK-P8-NEXT: mtfprd f0, r5 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: xscvspdpn f0, v2 > -; CHECK-P8-NEXT: mtfprd f4, r4 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v1, vs4 > -; CHECK-P8-NEXT: vmrglb v2, v4, v3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: xxswapd v5, vs2 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: xxswapd vs0, v3 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > 
+; CHECK-P8-NEXT: mffprwz r3, f2 > +; CHECK-P8-NEXT: xscvdpsxws f2, f4 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvdpsxws f4, f5 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghb v2, v4, v2 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v6, vs3 > -; CHECK-P8-NEXT: xxswapd v0, vs0 > -; CHECK-P8-NEXT: vmrglb v3, v3, v4 > -; CHECK-P8-NEXT: vmrglb v4, v0, v5 > -; CHECK-P8-NEXT: vmrglb v5, v1, v6 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: vmrghb v3, v3, v4 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: mffprwz r3, f1 > +; CHECK-P8-NEXT: vmrghb v5, v0, v5 > +; CHECK-P8-NEXT: mtvsrd v1, r3 > ; CHECK-P8-NEXT: vmrglh v2, v3, v2 > -; CHECK-P8-NEXT: vmrglh v3, v5, v4 > +; CHECK-P8-NEXT: vmrghb v4, v4, v1 > +; CHECK-P8-NEXT: vmrglh v3, v4, v5 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > @@ -255,53 +235,45 @@ define i64 @test8elt(<8 x float>* nocapture > readonly) local_unnamed_addr #2 { > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: lxv vs0, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs2 > ; CHECK-P9-NEXT: xxswapd vs2, vs1 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: 
xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghb v2, v3, v2 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: vmrglb v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > ; CHECK-P9-NEXT: vmrglh v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs1 > ; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xscvspdpn f1, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > @@ -376,117 +348,101 @@ entry: > define <16 x i8> @test16elt(<16 x float>* nocapture readonly) > local_unnamed_addr #3 { > ; CHECK-P8-LABEL: test16elt: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: lvx v2, 0, r3 > +; CHECK-P8-NEXT: lvx v4, 
0, r3 > ; CHECK-P8-NEXT: li r4, 16 > +; CHECK-P8-NEXT: li r5, 32 > ; CHECK-P8-NEXT: lvx v3, r3, r4 > -; CHECK-P8-NEXT: li r4, 32 > -; CHECK-P8-NEXT: xscvspdpn f2, v2 > -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v3 > -; CHECK-P8-NEXT: xxswapd vs1, v2 > -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1 > -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 > -; CHECK-P8-NEXT: lvx v2, r3, r4 > +; CHECK-P8-NEXT: lvx v2, r3, r5 > +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3 > +; CHECK-P8-NEXT: xxswapd vs2, v4 > +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1 > +; CHECK-P8-NEXT: xscvspdpn f1, v4 > +; CHECK-P8-NEXT: xscvspdpn f3, v3 > +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3 > ; CHECK-P8-NEXT: xscvspdpn f0, vs0 > -; CHECK-P8-NEXT: xxswapd vs6, v3 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1 > -; CHECK-P8-NEXT: xscvspdpn f3, vs3 > -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3 > -; CHECK-P8-NEXT: xscvdpsxws f2, f2 > -; CHECK-P8-NEXT: xxswapd vs9, v2 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: xxswapd vs7, v3 > +; CHECK-P8-NEXT: xscvspdpn f2, vs2 > +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3 > ; CHECK-P8-NEXT: xscvspdpn f6, vs6 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: mffprwz r4, f2 > ; CHECK-P8-NEXT: xscvspdpn f7, vs7 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f3, f3 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: xscvdpsxws f2, f2 > +; CHECK-P8-NEXT: xscvdpsxws f4, f4 > ; CHECK-P8-NEXT: xscvspdpn f8, vs8 > -; CHECK-P8-NEXT: mtfprd f4, r4 > -; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvdpsxws f0, f5 > -; CHECK-P8-NEXT: xxswapd v0, vs4 > +; CHECK-P8-NEXT: xscvdpsxws f3, f3 > ; CHECK-P8-NEXT: xscvspdpn f9, vs9 > -; CHECK-P8-NEXT: mtfprd f5, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; 
CHECK-P8-NEXT: xxswapd vs0, v2 > +; CHECK-P8-NEXT: mffprwz r5, f2 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > ; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: mtvsrd v4, r5 > +; CHECK-P8-NEXT: mffprwz r5, f4 > ; CHECK-P8-NEXT: xscvdpsxws f1, f6 > -; CHECK-P8-NEXT: xxswapd v3, vs5 > -; CHECK-P8-NEXT: mtfprd f6, r4 > -; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: vmrghb v3, v4, v3 > +; CHECK-P8-NEXT: mtvsrd v4, r5 > +; CHECK-P8-NEXT: mffprwz r5, f3 > ; CHECK-P8-NEXT: xscvdpsxws f3, f7 > -; CHECK-P8-NEXT: xxswapd v4, vs6 > -; CHECK-P8-NEXT: mtfprd f7, r4 > -; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvdpsxws f0, f8 > -; CHECK-P8-NEXT: xxswapd v5, vs7 > -; CHECK-P8-NEXT: mtfprd f8, r4 > -; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: xscvdpsxws f1, f9 > -; CHECK-P8-NEXT: xxswapd v1, vs8 > -; CHECK-P8-NEXT: mtfprd f9, r4 > -; CHECK-P8-NEXT: mffprwz r4, f3 > -; CHECK-P8-NEXT: vmrglb v3, v4, v3 > -; CHECK-P8-NEXT: xxswapd v4, vs2 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: xxswapd v6, vs9 > -; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvspdpn f0, v2 > -; CHECK-P8-NEXT: xxswapd v7, vs3 > -; CHECK-P8-NEXT: mtfprd f5, r4 > -; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: vmrglb v4, v4, v5 > -; CHECK-P8-NEXT: xxswapd v5, vs5 > -; CHECK-P8-NEXT: mtfprd f1, r4 > +; CHECK-P8-NEXT: xscvdpsxws f4, f8 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > ; CHECK-P8-NEXT: li r4, 48 > -; CHECK-P8-NEXT: lvx v9, r3, r4 > -; CHECK-P8-NEXT: vmrglb v1, v6, v1 > -; CHECK-P8-NEXT: xxswapd v8, vs1 > +; CHECK-P8-NEXT: lvx v0, r3, r4 > +; CHECK-P8-NEXT: mffprwz r3, f1 > ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 > -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v9 > -; CHECK-P8-NEXT: xxswapd vs3, v9 > -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1 > +; CHECK-P8-NEXT: xscvspdpn f5, v2 > +; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3 > +; 
CHECK-P8-NEXT: mtvsrd v1, r3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > +; CHECK-P8-NEXT: xxswapd vs4, v0 > ; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: xscvspdpn f2, vs2 > +; CHECK-P8-NEXT: mtvsrd v7, r3 > +; CHECK-P8-NEXT: mffprwz r3, f0 > +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1 > +; CHECK-P8-NEXT: xscvspdpn f2, v0 > ; CHECK-P8-NEXT: xscvspdpn f3, vs3 > -; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > +; CHECK-P8-NEXT: xscvdpsxws f6, f9 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: xscvdpsxws f5, f5 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mtfprd f4, r4 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f6 > +; CHECK-P8-NEXT: xscvdpsxws f4, f4 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghb v2, v6, v1 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: mffprwz r4, f5 > +; CHECK-P8-NEXT: mtvsrd v6, r3 > ; CHECK-P8-NEXT: mffprwz r3, f1 > +; CHECK-P8-NEXT: vmrghb v4, v5, v4 > +; CHECK-P8-NEXT: mtvsrd v5, r5 > +; CHECK-P8-NEXT: vmrghb v0, v6, v1 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > ; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v9, vs4 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: mtvsrd v6, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: xxswapd v6, vs1 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: vmrglb v2, v0, v7 > -; CHECK-P8-NEXT: xxswapd v0, vs0 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v7, vs2 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: vmrglb v5, v8, v5 > -; CHECK-P8-NEXT: xxswapd v8, vs0 > -; CHECK-P8-NEXT: xxswapd v10, vs3 > -; CHECK-P8-NEXT: vmrglb v0, v0, v6 > +; 
CHECK-P8-NEXT: vmrghb v5, v5, v7 > +; CHECK-P8-NEXT: vmrghb v1, v1, v6 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f4 > +; CHECK-P8-NEXT: mtvsrd v7, r3 > +; CHECK-P8-NEXT: mffprwz r3, f0 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mtvsrd v9, r3 > +; CHECK-P8-NEXT: vmrghb v7, v8, v7 > +; CHECK-P8-NEXT: vmrghb v6, v6, v9 > ; CHECK-P8-NEXT: vmrglh v3, v4, v3 > -; CHECK-P8-NEXT: vmrglb v6, v8, v7 > -; CHECK-P8-NEXT: vmrglb v7, v9, v10 > -; CHECK-P8-NEXT: vmrglh v2, v2, v1 > -; CHECK-P8-NEXT: vmrglh v4, v0, v5 > -; CHECK-P8-NEXT: vmrglh v5, v7, v6 > +; CHECK-P8-NEXT: vmrglh v2, v5, v2 > +; CHECK-P8-NEXT: vmrglh v4, v1, v0 > +; CHECK-P8-NEXT: vmrglh v5, v6, v7 > ; CHECK-P8-NEXT: vmrglw v2, v2, v3 > ; CHECK-P8-NEXT: vmrglw v3, v5, v4 > ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > @@ -494,114 +450,98 @@ define <16 x i8> @test16elt(<16 x float>* nocapture > readonly) local_unnamed_addr > ; > ; CHECK-P9-LABEL: test16elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: lxv vs2, 0(r3) > +; CHECK-P9-NEXT: lxv vs3, 0(r3) > +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3 > +; CHECK-P9-NEXT: xscvspdpn f4, vs4 > +; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: lxv vs0, 48(r3) > +; CHECK-P9-NEXT: lxv vs1, 32(r3) > +; CHECK-P9-NEXT: lxv vs2, 16(r3) > +; CHECK-P9-NEXT: mffprwz r3, f4 > +; CHECK-P9-NEXT: xxswapd vs4, vs3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > +; CHECK-P9-NEXT: xscvspdpn f4, vs4 > +; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: mffprwz r3, f4 > +; CHECK-P9-NEXT: xscvspdpn f4, vs3 > +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > +; CHECK-P9-NEXT: xscvspdpn f3, vs3 > +; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: vmrghb v2, v3, v2 > +; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: mffprwz r3, f4 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > +; CHECK-P9-NEXT: mffprwz r3, f3 > ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f3, vs3 > +; 
CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > -; CHECK-P9-NEXT: lxv vs0, 48(r3) > -; CHECK-P9-NEXT: lxv vs1, 32(r3) > -; CHECK-P9-NEXT: lxv vs4, 16(r3) > +; CHECK-P9-NEXT: vmrglh v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs3 > ; CHECK-P9-NEXT: xxswapd vs3, vs2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f3, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs3 > ; CHECK-P9-NEXT: xscvspdpn f3, vs2 > ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3 > -; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: vmrglb v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs3 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: xxswapd vs2, vs4 > -; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: xscvspdpn f2, vs4 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: vmrglb v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1 > -; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: mffprwz 
r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs2 > ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > ; CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > ; CHECK-P9-NEXT: xxswapd vs2, vs1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: vmrglb v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > ; CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs1 > ; CHECK-P9-NEXT: xscvspdpn f1, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd 
v5, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghb v4, v5, v4 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v4, v5, v4 > -; CHECK-P9-NEXT: xxswapd v5, vs1 > -; CHECK-P9-NEXT: xxswapd v0, vs0 > -; CHECK-P9-NEXT: vmrglb v5, v5, v0 > +; CHECK-P9-NEXT: mtvsrd v0, r3 > +; CHECK-P9-NEXT: vmrghb v5, v5, v0 > ; CHECK-P9-NEXT: vmrglh v4, v5, v4 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 > @@ -738,12 +678,10 @@ define i16 @test2elt_signed(i64 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs1 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: vmrglb v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: vmrghb v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: clrldi r3, r3, 48 > @@ -761,13 +699,11 @@ define i16 @test2elt_signed(i64 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: addi r3, r1, -2 > -; CHECK-P9-NEXT: xxswapd v2, vs1 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > -; CHECK-P9-NEXT: vmrglb v2, v3, v2 > +; CHECK-P9-NEXT: vmrghb v2, v3, v2 > ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8 > ; CHECK-P9-NEXT: stxsihx v2, 0, r3 > ; CHECK-P9-NEXT: lhz r3, -2(r1) > @@ -815,20 +751,16 @@ define i32 
@test4elt_signed(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > ; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: mtfprd f3, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs2 > -; CHECK-P8-NEXT: xxswapd v5, vs3 > -; CHECK-P8-NEXT: vmrglb v2, v3, v2 > -; CHECK-P8-NEXT: vmrglb v3, v4, v5 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: vmrghb v3, v4, v3 > +; CHECK-P8-NEXT: vmrghb v2, v2, v5 > +; CHECK-P8-NEXT: vmrglh v2, v2, v3 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: blr > @@ -839,28 +771,24 @@ define i32 @test4elt_signed(<4 x float> %a) > local_unnamed_addr #1 { > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > ; CHECK-P9-NEXT: xxswapd vs0, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xscvspdpn f0, v2 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; 
CHECK-P9-NEXT: mtfprd f0, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: li r3, 0 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vmrglb v2, v4, v2 > +; CHECK-P9-NEXT: vmrghb v2, v4, v2 > ; CHECK-P9-NEXT: vmrglh v2, v2, v3 > ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 > ; CHECK-P9-NEXT: blr > @@ -907,59 +835,51 @@ define i64 @test8elt_signed(<8 x float>* nocapture > readonly) local_unnamed_addr > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: lvx v2, 0, r3 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: lvx v5, r3, r4 > -; CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: lvx v3, r3, r4 > ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 > -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v5 > -; CHECK-P8-NEXT: xxswapd vs3, v5 > -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xxswapd vs1, v2 > +; CHECK-P8-NEXT: xscvspdpn f2, v2 > +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 > +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 > +; CHECK-P8-NEXT: xscvspdpn f3, v3 > ; CHECK-P8-NEXT: xscvspdpn f0, vs0 > -; CHECK-P8-NEXT: xscvspdpn f2, vs2 > -; CHECK-P8-NEXT: xscvspdpn f3, vs3 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > ; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: mffprwz r6, f1 > -; CHECK-P8-NEXT: mffprwz r5, f0 > -; CHECK-P8-NEXT: mtfprd f1, r6 > -; CHECK-P8-NEXT: mtfprd f0, r5 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: xscvspdpn f0, v2 > -; CHECK-P8-NEXT: mtfprd f4, r4 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v1, vs4 > -; CHECK-P8-NEXT: vmrglb v2, v4, 
v3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: xxswapd v5, vs2 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mffprwz r3, f1 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: xxswapd vs0, v3 > +; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: mffprwz r3, f2 > +; CHECK-P8-NEXT: xscvdpsxws f2, f4 > +; CHECK-P8-NEXT: xscvspdpn f1, vs1 > +; CHECK-P8-NEXT: xscvdpsxws f4, f5 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghb v2, v4, v2 > +; CHECK-P8-NEXT: mffprwz r4, f2 > +; CHECK-P8-NEXT: xscvdpsxws f1, f1 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: xxswapd v4, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v6, vs3 > -; CHECK-P8-NEXT: xxswapd v0, vs0 > -; CHECK-P8-NEXT: vmrglb v3, v3, v4 > -; CHECK-P8-NEXT: vmrglb v4, v0, v5 > -; CHECK-P8-NEXT: vmrglb v5, v1, v6 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: vmrghb v3, v3, v4 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > +; CHECK-P8-NEXT: mtvsrd v5, r3 > +; CHECK-P8-NEXT: mffprwz r3, f1 > +; CHECK-P8-NEXT: vmrghb v5, v0, v5 > +; CHECK-P8-NEXT: mtvsrd v1, r3 > ; CHECK-P8-NEXT: vmrglh v2, v3, v2 > -; CHECK-P8-NEXT: vmrglh v3, v5, v4 > +; CHECK-P8-NEXT: vmrghb v4, v4, v1 > +; CHECK-P8-NEXT: vmrglh v3, v4, v5 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > @@ -973,53 +893,45 @@ define i64 @test8elt_signed(<8 x float>* nocapture > readonly) local_unnamed_addr > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: 
lxv vs0, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs2 > ; CHECK-P9-NEXT: xxswapd vs2, vs1 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghb v2, v3, v2 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: vmrglb v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > ; CHECK-P9-NEXT: vmrglh v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs1 > ; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xscvspdpn f1, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v3, v4, 
v3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > @@ -1094,117 +1006,101 @@ entry: > define <16 x i8> @test16elt_signed(<16 x float>* nocapture readonly) > local_unnamed_addr #3 { > ; CHECK-P8-LABEL: test16elt_signed: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: lvx v2, 0, r3 > +; CHECK-P8-NEXT: lvx v4, 0, r3 > ; CHECK-P8-NEXT: li r4, 16 > +; CHECK-P8-NEXT: li r5, 32 > ; CHECK-P8-NEXT: lvx v3, r3, r4 > -; CHECK-P8-NEXT: li r4, 32 > -; CHECK-P8-NEXT: xscvspdpn f2, v2 > -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v3 > -; CHECK-P8-NEXT: xxswapd vs1, v2 > -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1 > -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 > -; CHECK-P8-NEXT: lvx v2, r3, r4 > +; CHECK-P8-NEXT: lvx v2, r3, r5 > +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3 > +; CHECK-P8-NEXT: xxswapd vs2, v4 > +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1 > +; CHECK-P8-NEXT: xscvspdpn f1, v4 > +; CHECK-P8-NEXT: xscvspdpn f3, v3 > +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3 > ; CHECK-P8-NEXT: xscvspdpn f0, vs0 > -; CHECK-P8-NEXT: xxswapd vs6, v3 > -; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1 > -; CHECK-P8-NEXT: xscvspdpn f3, vs3 > -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3 > -; CHECK-P8-NEXT: xscvdpsxws f2, f2 > -; CHECK-P8-NEXT: xxswapd vs9, v2 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > -; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: xxswapd vs7, v3 > +; CHECK-P8-NEXT: xscvspdpn f2, vs2 > +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3 > ; CHECK-P8-NEXT: xscvspdpn f6, vs6 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: mffprwz r4, f2 > ; CHECK-P8-NEXT: xscvspdpn f7, 
vs7 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: xscvdpsxws f3, f3 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: xscvdpsxws f2, f2 > +; CHECK-P8-NEXT: xscvdpsxws f4, f4 > ; CHECK-P8-NEXT: xscvspdpn f8, vs8 > -; CHECK-P8-NEXT: mtfprd f4, r4 > -; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvdpsxws f0, f5 > -; CHECK-P8-NEXT: xxswapd v0, vs4 > +; CHECK-P8-NEXT: xscvdpsxws f3, f3 > ; CHECK-P8-NEXT: xscvspdpn f9, vs9 > -; CHECK-P8-NEXT: mtfprd f5, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: xxswapd vs0, v2 > +; CHECK-P8-NEXT: mffprwz r5, f2 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > ; CHECK-P8-NEXT: mffprwz r4, f1 > +; CHECK-P8-NEXT: mtvsrd v4, r5 > +; CHECK-P8-NEXT: mffprwz r5, f4 > ; CHECK-P8-NEXT: xscvdpsxws f1, f6 > -; CHECK-P8-NEXT: xxswapd v3, vs5 > -; CHECK-P8-NEXT: mtfprd f6, r4 > -; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: vmrghb v3, v4, v3 > +; CHECK-P8-NEXT: mtvsrd v4, r5 > +; CHECK-P8-NEXT: mffprwz r5, f3 > ; CHECK-P8-NEXT: xscvdpsxws f3, f7 > -; CHECK-P8-NEXT: xxswapd v4, vs6 > -; CHECK-P8-NEXT: mtfprd f7, r4 > -; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvdpsxws f0, f8 > -; CHECK-P8-NEXT: xxswapd v5, vs7 > -; CHECK-P8-NEXT: mtfprd f8, r4 > -; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: xscvdpsxws f1, f9 > -; CHECK-P8-NEXT: xxswapd v1, vs8 > -; CHECK-P8-NEXT: mtfprd f9, r4 > -; CHECK-P8-NEXT: mffprwz r4, f3 > -; CHECK-P8-NEXT: vmrglb v3, v4, v3 > -; CHECK-P8-NEXT: xxswapd v4, vs2 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: xxswapd v6, vs9 > -; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: xscvspdpn f0, v2 > -; CHECK-P8-NEXT: xxswapd v7, vs3 > -; CHECK-P8-NEXT: mtfprd f5, r4 > -; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: vmrglb v4, v4, v5 > -; CHECK-P8-NEXT: xxswapd v5, vs5 > -; CHECK-P8-NEXT: mtfprd f1, r4 > +; CHECK-P8-NEXT: xscvdpsxws f4, f8 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: 
mtvsrd v5, r4 > ; CHECK-P8-NEXT: li r4, 48 > -; CHECK-P8-NEXT: lvx v9, r3, r4 > -; CHECK-P8-NEXT: vmrglb v1, v6, v1 > -; CHECK-P8-NEXT: xxswapd v8, vs1 > +; CHECK-P8-NEXT: lvx v0, r3, r4 > +; CHECK-P8-NEXT: mffprwz r3, f1 > ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 > -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3 > -; CHECK-P8-NEXT: xscvspdpn f4, v9 > -; CHECK-P8-NEXT: xxswapd vs3, v9 > -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1 > +; CHECK-P8-NEXT: xscvspdpn f5, v2 > +; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3 > +; CHECK-P8-NEXT: mtvsrd v1, r3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > +; CHECK-P8-NEXT: xxswapd vs4, v0 > ; CHECK-P8-NEXT: xscvspdpn f1, vs1 > -; CHECK-P8-NEXT: xscvspdpn f2, vs2 > +; CHECK-P8-NEXT: mtvsrd v7, r3 > +; CHECK-P8-NEXT: mffprwz r3, f0 > +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1 > +; CHECK-P8-NEXT: xscvspdpn f2, v0 > ; CHECK-P8-NEXT: xscvspdpn f3, vs3 > -; CHECK-P8-NEXT: xscvspdpn f5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: xscvdpsxws f4, f4 > +; CHECK-P8-NEXT: xscvdpsxws f6, f9 > +; CHECK-P8-NEXT: xscvspdpn f4, vs4 > +; CHECK-P8-NEXT: xscvspdpn f0, vs0 > +; CHECK-P8-NEXT: xscvdpsxws f5, f5 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mtfprd f4, r4 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f6 > +; CHECK-P8-NEXT: xscvdpsxws f4, f4 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghb v2, v6, v1 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: mffprwz r4, f5 > +; CHECK-P8-NEXT: mtvsrd v6, r3 > ; CHECK-P8-NEXT: mffprwz r3, f1 > +; CHECK-P8-NEXT: vmrghb v4, v5, v4 > +; CHECK-P8-NEXT: mtvsrd v5, r5 > +; CHECK-P8-NEXT: vmrghb v0, v6, v1 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > ; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: xxswapd v9, vs4 > -; 
CHECK-P8-NEXT: mtfprd f1, r3 > +; CHECK-P8-NEXT: mtvsrd v6, r3 > ; CHECK-P8-NEXT: mffprwz r3, f3 > -; CHECK-P8-NEXT: mtfprd f2, r4 > -; CHECK-P8-NEXT: xxswapd v6, vs1 > -; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: vmrglb v2, v0, v7 > -; CHECK-P8-NEXT: xxswapd v0, vs0 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v7, vs2 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: vmrglb v5, v8, v5 > -; CHECK-P8-NEXT: xxswapd v8, vs0 > -; CHECK-P8-NEXT: xxswapd v10, vs3 > -; CHECK-P8-NEXT: vmrglb v0, v0, v6 > +; CHECK-P8-NEXT: vmrghb v5, v5, v7 > +; CHECK-P8-NEXT: vmrghb v1, v1, v6 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f4 > +; CHECK-P8-NEXT: mtvsrd v7, r3 > +; CHECK-P8-NEXT: mffprwz r3, f0 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mtvsrd v9, r3 > +; CHECK-P8-NEXT: vmrghb v7, v8, v7 > +; CHECK-P8-NEXT: vmrghb v6, v6, v9 > ; CHECK-P8-NEXT: vmrglh v3, v4, v3 > -; CHECK-P8-NEXT: vmrglb v6, v8, v7 > -; CHECK-P8-NEXT: vmrglb v7, v9, v10 > -; CHECK-P8-NEXT: vmrglh v2, v2, v1 > -; CHECK-P8-NEXT: vmrglh v4, v0, v5 > -; CHECK-P8-NEXT: vmrglh v5, v7, v6 > +; CHECK-P8-NEXT: vmrglh v2, v5, v2 > +; CHECK-P8-NEXT: vmrglh v4, v1, v0 > +; CHECK-P8-NEXT: vmrglh v5, v6, v7 > ; CHECK-P8-NEXT: vmrglw v2, v2, v3 > ; CHECK-P8-NEXT: vmrglw v3, v5, v4 > ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > @@ -1212,114 +1108,98 @@ define <16 x i8> @test16elt_signed(<16 x float>* > nocapture readonly) local_unnam > ; > ; CHECK-P9-LABEL: test16elt_signed: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: lxv vs2, 0(r3) > +; CHECK-P9-NEXT: lxv vs3, 0(r3) > +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3 > +; CHECK-P9-NEXT: xscvspdpn f4, vs4 > +; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: lxv vs0, 48(r3) > +; CHECK-P9-NEXT: lxv vs1, 32(r3) > +; CHECK-P9-NEXT: lxv vs2, 16(r3) > +; CHECK-P9-NEXT: mffprwz r3, f4 > +; CHECK-P9-NEXT: xxswapd vs4, vs3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > +; CHECK-P9-NEXT: xscvspdpn f4, vs4 > +; CHECK-P9-NEXT: xscvdpsxws 
f4, f4 > +; CHECK-P9-NEXT: mffprwz r3, f4 > +; CHECK-P9-NEXT: xscvspdpn f4, vs3 > +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > +; CHECK-P9-NEXT: xscvspdpn f3, vs3 > +; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: vmrghb v2, v3, v2 > +; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: mffprwz r3, f4 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > +; CHECK-P9-NEXT: mffprwz r3, f3 > ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f3, vs3 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > -; CHECK-P9-NEXT: lxv vs0, 48(r3) > -; CHECK-P9-NEXT: lxv vs1, 32(r3) > -; CHECK-P9-NEXT: lxv vs4, 16(r3) > +; CHECK-P9-NEXT: vmrglh v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs3 > ; CHECK-P9-NEXT: xxswapd vs3, vs2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f3, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs3 > ; CHECK-P9-NEXT: xscvspdpn f3, vs2 > ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3 > -; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: vmrglb v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs3 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: xxswapd vs2, vs4 > -; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; 
CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: xscvspdpn f2, vs4 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: vmrglb v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1 > -; CHECK-P9-NEXT: xscvspdpn f2, vs2 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs2 > ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > ; CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > ; CHECK-P9-NEXT: xxswapd vs2, vs1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvspdpn f2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > ; CHECK-P9-NEXT: xscvspdpn f2, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghb v3, v4, v3 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs1 > ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: vmrglb v3, v4, v3 > -; CHECK-P9-NEXT: xxswapd v4, 
vs2 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > ; CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xxswapd vs1, vs0 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvspdpn f1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs1 > ; CHECK-P9-NEXT: xscvspdpn f1, vs0 > ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvspdpn f0, vs0 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghb v4, v5, v4 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v4, v5, v4 > -; CHECK-P9-NEXT: xxswapd v5, vs1 > -; CHECK-P9-NEXT: xxswapd v0, vs0 > -; CHECK-P9-NEXT: vmrglb v5, v5, v0 > +; CHECK-P9-NEXT: mtvsrd v0, r3 > +; CHECK-P9-NEXT: vmrghb v5, v5, v0 > ; CHECK-P9-NEXT: vmrglh v4, v5, v4 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll > index c7d66ae784a0..dbc2774fed8c 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll > @@ -16,12 +16,10 @@ define i32 @test2elt(<2 x double> %a) > local_unnamed_addr #0 { > ; CHECK-P8-NEXT: xscvdpsxws f1, v2 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: mffprwz r3, f1 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: mffprwz r4, f0 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: xxswapd v3, vs1 > -; CHECK-P8-NEXT: vmrglh v2, v2, v3 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > +; CHECK-P8-NEXT: vmrghh 
v2, v2, v3 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: blr > @@ -30,15 +28,13 @@ define i32 @test2elt(<2 x double> %a) > local_unnamed_addr #0 { > ; CHECK-P9: # %bb.0: # %entry > ; CHECK-P9-NEXT: xscvdpsxws f0, v2 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs0 > ; CHECK-P9-NEXT: xxswapd vs0, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: li r3, 0 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vmrglh v2, v3, v2 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -77,18 +73,14 @@ define i64 @test4elt(<4 x double>* nocapture readonly) > local_unnamed_addr #1 { > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > ; CHECK-P8-NEXT: mffprwz r3, f2 > ; CHECK-P8-NEXT: mffprwz r4, f3 > -; CHECK-P8-NEXT: mtfprd f2, r3 > -; CHECK-P8-NEXT: mtfprd f3, r4 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: xxswapd v2, vs2 > ; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: xxswapd v4, vs3 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: xxswapd v5, vs1 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > -; CHECK-P8-NEXT: vmrglh v3, v5, v4 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > +; CHECK-P8-NEXT: vmrghh v2, v4, v2 > +; CHECK-P8-NEXT: vmrghh v3, v5, v3 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > @@ -102,22 +94,18 @@ define i64 @test4elt(<4 x double>* nocapture > readonly) local_unnamed_addr #1 { > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > ; CHECK-P9-NEXT: lxv vs0, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: mtvsrd v2, 
r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: xxswapd v2, vs2 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f0 > ; CHECK-P9-NEXT: xxswapd vs0, vs0 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghh v2, v2, v3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglh v2, v2, v3 > -; CHECK-P9-NEXT: xxswapd v3, vs1 > -; CHECK-P9-NEXT: xxswapd v4, vs0 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > @@ -176,36 +164,28 @@ define <8 x i16> @test8elt(<8 x double>* nocapture > readonly) local_unnamed_addr > ; CHECK-P8-NEXT: xxswapd vs3, vs3 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: mffprwz r3, f4 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > ; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: mtfprd f4, r3 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: mffprwz r3, f6 > -; CHECK-P8-NEXT: mtfprd f5, r4 > -; CHECK-P8-NEXT: xxswapd v2, vs4 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r4, f7 > -; CHECK-P8-NEXT: mtfprd f6, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs5 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > ; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mtfprd f7, r4 > -; CHECK-P8-NEXT: xxswapd v4, vs6 > ; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v1, vs7 > +; CHECK-P8-NEXT: mtvsrd v0, r3 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > ; CHECK-P8-NEXT: mffprwz r3, f2 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: xxswapd v5, vs0 > ; CHECK-P8-NEXT: mffprwz r4, f3 > -; 
CHECK-P8-NEXT: mtfprd f2, r3 > -; CHECK-P8-NEXT: xxswapd v0, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: xxswapd v6, vs2 > -; CHECK-P8-NEXT: vmrglh v2, v5, v2 > -; CHECK-P8-NEXT: xxswapd v5, vs0 > -; CHECK-P8-NEXT: vmrglh v3, v0, v3 > -; CHECK-P8-NEXT: vmrglh v4, v6, v4 > -; CHECK-P8-NEXT: vmrglh v5, v5, v1 > +; CHECK-P8-NEXT: vmrghh v2, v0, v2 > +; CHECK-P8-NEXT: vmrghh v3, v1, v3 > +; CHECK-P8-NEXT: mtvsrd v0, r3 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: vmrghh v4, v0, v4 > +; CHECK-P8-NEXT: vmrghh v5, v1, v5 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > ; CHECK-P8-NEXT: vmrglw v3, v5, v4 > ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > @@ -217,47 +197,39 @@ define <8 x i16> @test8elt(<8 x double>* nocapture > readonly) local_unnamed_addr > ; CHECK-P9-NEXT: xscvdpsxws f4, f3 > ; CHECK-P9-NEXT: xxswapd vs3, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: lxv vs2, 16(r3) > ; CHECK-P9-NEXT: lxv vs0, 48(r3) > ; CHECK-P9-NEXT: lxv vs1, 32(r3) > -; CHECK-P9-NEXT: lxv vs2, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f4 > -; CHECK-P9-NEXT: mtfprd f4, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: xxswapd v2, vs4 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f3, f2 > ; CHECK-P9-NEXT: xxswapd vs2, vs2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghh v2, v2, v3 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f1 > ; CHECK-P9-NEXT: xxswapd vs1, vs1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: vmrglh v2, v2, v3 > -; CHECK-P9-NEXT: xxswapd v3, 
vs3 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > +; CHECK-P9-NEXT: mffprwz r3, f1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f0 > ; CHECK-P9-NEXT: xxswapd vs0, vs0 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglh v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > +; CHECK-P9-NEXT: vmrghh v4, v4, v5 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 > ; CHECK-P9-NEXT: blr > @@ -321,209 +293,177 @@ entry: > define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, > <16 x double>* nocapture readonly) local_unnamed_addr #3 { > ; CHECK-P8-LABEL: test16elt: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 > ; CHECK-P8-NEXT: li r5, 16 > +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 > ; CHECK-P8-NEXT: li r6, 32 > +; CHECK-P8-NEXT: li r7, 48 > ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5 > ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6 > -; CHECK-P8-NEXT: li r6, 48 > -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6 > ; CHECK-P8-NEXT: li r6, 64 > -; CHECK-P8-NEXT: xscvdpsxws f4, f0 > +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7 > ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6 > -; CHECK-P8-NEXT: li r6, 80 > +; CHECK-P8-NEXT: li r7, 80 > +; CHECK-P8-NEXT: li r6, 96 > +; CHECK-P8-NEXT: xscvdpsxws f4, f0 > +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7 > +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6 > +; CHECK-P8-NEXT: li r6, 112 > ; CHECK-P8-NEXT: xxswapd vs0, vs0 > ; CHECK-P8-NEXT: xscvdpsxws f6, f1 > -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6 > -; CHECK-P8-NEXT: li r6, 96 > ; 
CHECK-P8-NEXT: xxswapd vs1, vs1 > ; CHECK-P8-NEXT: xscvdpsxws f8, f2 > -; CHECK-P8-NEXT: lxvd2x vs9, r4, r6 > -; CHECK-P8-NEXT: li r6, 112 > ; CHECK-P8-NEXT: xxswapd vs2, vs2 > -; CHECK-P8-NEXT: xscvdpsxws f10, f3 > -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6 > +; CHECK-P8-NEXT: xscvdpsxws f9, f3 > ; CHECK-P8-NEXT: xxswapd vs3, vs3 > -; CHECK-P8-NEXT: xscvdpsxws f12, f5 > +; CHECK-P8-NEXT: xscvdpsxws f11, f5 > ; CHECK-P8-NEXT: xxswapd vs5, vs5 > -; CHECK-P8-NEXT: xscvdpsxws f13, f7 > +; CHECK-P8-NEXT: xscvdpsxws f12, f7 > ; CHECK-P8-NEXT: xxswapd vs7, vs7 > -; CHECK-P8-NEXT: xscvdpsxws v2, f9 > -; CHECK-P8-NEXT: xxswapd vs9, vs9 > -; CHECK-P8-NEXT: mffprwz r4, f4 > -; CHECK-P8-NEXT: xscvdpsxws v3, f11 > -; CHECK-P8-NEXT: xxswapd vs11, vs11 > -; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: mffprwz r6, f6 > -; CHECK-P8-NEXT: mtfprd f4, r4 > +; CHECK-P8-NEXT: mffprwz r7, f4 > +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6 > +; CHECK-P8-NEXT: mffprwz r4, f6 > +; CHECK-P8-NEXT: xscvdpsxws f13, f10 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r4, f8 > +; CHECK-P8-NEXT: xscvdpsxws f6, f4 > +; CHECK-P8-NEXT: mtvsrd v4, r4 > +; CHECK-P8-NEXT: mffprwz r4, f9 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > +; CHECK-P8-NEXT: mffprwz r4, f11 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xxswapd v4, vs4 > -; CHECK-P8-NEXT: xscvdpsxws f2, f2 > -; CHECK-P8-NEXT: mtfprd f6, r6 > -; CHECK-P8-NEXT: mffprwz r6, f10 > -; CHECK-P8-NEXT: mtfprd f8, r4 > -; CHECK-P8-NEXT: xxswapd v5, vs6 > +; CHECK-P8-NEXT: mtvsrd v0, r4 > ; CHECK-P8-NEXT: mffprwz r4, f12 > -; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: xxswapd v0, vs8 > -; CHECK-P8-NEXT: mtfprd f10, r6 > -; CHECK-P8-NEXT: mffprwz r6, f13 > -; CHECK-P8-NEXT: mtfprd f12, r4 > -; CHECK-P8-NEXT: xxswapd v1, vs10 > -; CHECK-P8-NEXT: mfvsrwz r4, v2 > +; CHECK-P8-NEXT: xscvdpsxws f2, f2 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: mffprwz r4, f13 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 
> -; CHECK-P8-NEXT: xxswapd v6, vs12 > -; CHECK-P8-NEXT: xscvdpsxws f9, f9 > -; CHECK-P8-NEXT: mtfprd f13, r6 > -; CHECK-P8-NEXT: mfvsrwz r6, v3 > -; CHECK-P8-NEXT: mtvsrd v2, r4 > -; CHECK-P8-NEXT: xxswapd v7, vs13 > +; CHECK-P8-NEXT: mtvsrd v6, r4 > +; CHECK-P8-NEXT: mffprwz r4, f6 > +; CHECK-P8-NEXT: xxswapd vs6, vs10 > +; CHECK-P8-NEXT: xscvdpsxws f5, f5 > +; CHECK-P8-NEXT: mtvsrd v7, r4 > ; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: xxswapd vs0, vs4 > +; CHECK-P8-NEXT: mtvsrd v2, r7 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mffprwz r4, f1 > ; CHECK-P8-NEXT: xscvdpsxws f7, f7 > -; CHECK-P8-NEXT: xxswapd v2, v2 > -; CHECK-P8-NEXT: xscvdpsxws f11, f11 > -; CHECK-P8-NEXT: mtvsrd v3, r6 > -; CHECK-P8-NEXT: mffprwz r6, f1 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: xxswapd v3, v3 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > ; CHECK-P8-NEXT: mffprwz r4, f2 > -; CHECK-P8-NEXT: mtfprd f1, r6 > -; CHECK-P8-NEXT: xxswapd v8, vs0 > -; CHECK-P8-NEXT: mtfprd f2, r4 > +; CHECK-P8-NEXT: xscvdpsxws f4, f6 > +; CHECK-P8-NEXT: vmrghh v2, v8, v2 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mffprwz r4, f3 > +; CHECK-P8-NEXT: xscvdpsxws f0, f0 > +; CHECK-P8-NEXT: vmrghh v3, v9, v3 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > ; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: xxswapd v9, vs1 > -; CHECK-P8-NEXT: mffprwz r6, f3 > -; CHECK-P8-NEXT: xxswapd v10, vs2 > -; CHECK-P8-NEXT: mtfprd f5, r4 > -; CHECK-P8-NEXT: mffprwz r4, f9 > -; CHECK-P8-NEXT: mtfprd f3, r6 > -; CHECK-P8-NEXT: mffprwz r6, f7 > -; CHECK-P8-NEXT: mtfprd f9, r4 > -; CHECK-P8-NEXT: mffprwz r4, f11 > -; CHECK-P8-NEXT: vmrglh v4, v8, v4 > -; CHECK-P8-NEXT: xxswapd v8, vs3 > -; CHECK-P8-NEXT: vmrglh v5, v9, v5 > -; CHECK-P8-NEXT: xxswapd v9, vs5 > -; CHECK-P8-NEXT: mtfprd f7, r6 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: vmrglh v0, v10, v0 > -; CHECK-P8-NEXT: xxswapd v10, vs7 > -; CHECK-P8-NEXT: vmrglh v1, v8, v1 > -; CHECK-P8-NEXT: xxswapd v8, vs9 > -; CHECK-P8-NEXT: vmrglh v6, v9, v6 
> -; CHECK-P8-NEXT: xxswapd v9, vs0 > -; CHECK-P8-NEXT: vmrglh v7, v10, v7 > -; CHECK-P8-NEXT: vmrglh v2, v8, v2 > -; CHECK-P8-NEXT: vmrglh v3, v9, v3 > -; CHECK-P8-NEXT: vmrglw v4, v5, v4 > -; CHECK-P8-NEXT: vmrglw v5, v1, v0 > -; CHECK-P8-NEXT: vmrglw v0, v7, v6 > +; CHECK-P8-NEXT: vmrghh v4, v8, v4 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mffprwz r4, f7 > +; CHECK-P8-NEXT: vmrghh v5, v9, v5 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > +; CHECK-P8-NEXT: mffprwz r4, f4 > +; CHECK-P8-NEXT: vmrghh v0, v8, v0 > +; CHECK-P8-NEXT: mtvsrd v8, r4 > +; CHECK-P8-NEXT: mffprwz r4, f0 > +; CHECK-P8-NEXT: vmrghh v1, v9, v1 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > +; CHECK-P8-NEXT: vmrghh v6, v8, v6 > +; CHECK-P8-NEXT: vmrghh v7, v9, v7 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: vmrglw v3, v5, v4 > +; CHECK-P8-NEXT: vmrglw v4, v1, v0 > +; CHECK-P8-NEXT: vmrglw v5, v7, v6 > +; CHECK-P8-NEXT: xxmrgld v2, v3, v2 > +; CHECK-P8-NEXT: stvx v2, 0, r3 > ; CHECK-P8-NEXT: xxmrgld v3, v5, v4 > -; CHECK-P8-NEXT: stvx v3, 0, r3 > -; CHECK-P8-NEXT: xxmrgld v2, v2, v0 > -; CHECK-P8-NEXT: stvx v2, r3, r5 > +; CHECK-P8-NEXT: stvx v3, r3, r5 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: test16elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: lxv vs4, 0(r4) > -; CHECK-P9-NEXT: lxv vs3, 16(r4) > -; CHECK-P9-NEXT: lxv vs2, 32(r4) > -; CHECK-P9-NEXT: xscvdpsxws f5, f4 > -; CHECK-P9-NEXT: lxv vs1, 48(r4) > -; CHECK-P9-NEXT: xscvdpsxws f6, f3 > -; CHECK-P9-NEXT: lxv vs0, 64(r4) > -; CHECK-P9-NEXT: xscvdpsxws f7, f2 > -; CHECK-P9-NEXT: xscvdpsxws f8, f1 > -; CHECK-P9-NEXT: xxswapd vs4, vs4 > -; CHECK-P9-NEXT: xscvdpsxws f4, f4 > -; CHECK-P9-NEXT: mffprwz r5, f5 > -; CHECK-P9-NEXT: xscvdpsxws f9, f0 > +; CHECK-P9-NEXT: lxv vs3, 0(r4) > +; CHECK-P9-NEXT: lxv vs2, 16(r4) > +; CHECK-P9-NEXT: lxv vs1, 32(r4) > +; CHECK-P9-NEXT: xscvdpsxws f4, f3 > +; CHECK-P9-NEXT: lxv vs0, 48(r4) > +; CHECK-P9-NEXT: xscvdpsxws f5, f2 > +; CHECK-P9-NEXT: xscvdpsxws f6, f1 > ; CHECK-P9-NEXT: xxswapd 
vs3, vs3 > +; CHECK-P9-NEXT: xscvdpsxws f7, f0 > +; CHECK-P9-NEXT: xxswapd vs0, vs0 > +; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: mffprwz r5, f4 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > -; CHECK-P9-NEXT: mtfprd f5, r5 > -; CHECK-P9-NEXT: mffprwz r5, f6 > ; CHECK-P9-NEXT: xxswapd vs2, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: mtfprd f6, r5 > +; CHECK-P9-NEXT: mtvsrd v2, r5 > +; CHECK-P9-NEXT: mffprwz r5, f5 > +; CHECK-P9-NEXT: mtvsrd v3, r5 > +; CHECK-P9-NEXT: mffprwz r5, f6 > +; CHECK-P9-NEXT: mtvsrd v4, r5 > ; CHECK-P9-NEXT: mffprwz r5, f7 > -; CHECK-P9-NEXT: mtfprd f7, r5 > -; CHECK-P9-NEXT: mffprwz r5, f8 > -; CHECK-P9-NEXT: mtfprd f8, r5 > -; CHECK-P9-NEXT: mffprwz r5, f9 > -; CHECK-P9-NEXT: mtfprd f9, r5 > -; CHECK-P9-NEXT: mffprwz r5, f4 > -; CHECK-P9-NEXT: mtfprd f4, r5 > +; CHECK-P9-NEXT: mtvsrd v5, r5 > ; CHECK-P9-NEXT: mffprwz r5, f3 > +; CHECK-P9-NEXT: lxv vs3, 64(r4) > ; CHECK-P9-NEXT: xxswapd vs1, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: xxswapd v2, vs5 > -; CHECK-P9-NEXT: xxswapd v5, vs8 > -; CHECK-P9-NEXT: xxswapd v0, vs9 > -; CHECK-P9-NEXT: mtfprd f3, r5 > +; CHECK-P9-NEXT: mtvsrd v0, r5 > ; CHECK-P9-NEXT: mffprwz r5, f2 > -; CHECK-P9-NEXT: mtfprd f2, r5 > -; CHECK-P9-NEXT: xxswapd vs0, vs0 > -; CHECK-P9-NEXT: xscvdpsxws f0, f0 > -; CHECK-P9-NEXT: xxswapd v1, vs2 > ; CHECK-P9-NEXT: lxv vs2, 80(r4) > -; CHECK-P9-NEXT: xxswapd v3, vs4 > -; CHECK-P9-NEXT: vmrglh v2, v2, v3 > -; CHECK-P9-NEXT: xxswapd v3, vs6 > -; CHECK-P9-NEXT: xxswapd v4, vs3 > -; CHECK-P9-NEXT: xscvdpsxws f3, f2 > -; CHECK-P9-NEXT: xxswapd vs2, vs2 > +; CHECK-P9-NEXT: vmrghh v2, v2, v0 > +; CHECK-P9-NEXT: mtvsrd v0, r5 > ; CHECK-P9-NEXT: mffprwz r5, f1 > -; CHECK-P9-NEXT: vmrglh v3, v3, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs7 > -; CHECK-P9-NEXT: mtfprd f1, r5 > +; CHECK-P9-NEXT: lxv vs1, 96(r4) > +; CHECK-P9-NEXT: xscvdpsxws f4, f3 > +; CHECK-P9-NEXT: xxswapd vs3, vs3 > +; CHECK-P9-NEXT: vmrghh v3, v3, v0 > +; CHECK-P9-NEXT: mtvsrd v0, 
r5 > ; CHECK-P9-NEXT: mffprwz r5, f0 > -; CHECK-P9-NEXT: vmrglh v4, v4, v1 > -; CHECK-P9-NEXT: xxswapd v1, vs1 > -; CHECK-P9-NEXT: mtfprd f0, r5 > -; CHECK-P9-NEXT: vmrglh v5, v5, v1 > -; CHECK-P9-NEXT: xscvdpsxws f2, f2 > -; CHECK-P9-NEXT: xxswapd v1, vs0 > ; CHECK-P9-NEXT: lxv vs0, 112(r4) > -; CHECK-P9-NEXT: lxv vs1, 96(r4) > +; CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: vmrghh v4, v4, v0 > +; CHECK-P9-NEXT: mtvsrd v0, r5 > +; CHECK-P9-NEXT: vmrglw v2, v3, v2 > +; CHECK-P9-NEXT: vmrghh v5, v5, v0 > +; CHECK-P9-NEXT: mffprwz r4, f4 > +; CHECK-P9-NEXT: vmrglw v4, v5, v4 > +; CHECK-P9-NEXT: mtvsrd v3, r4 > ; CHECK-P9-NEXT: mffprwz r4, f3 > -; CHECK-P9-NEXT: mtfprd f3, r4 > +; CHECK-P9-NEXT: xscvdpsxws f3, f2 > +; CHECK-P9-NEXT: xxswapd vs2, vs2 > +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > +; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > +; CHECK-P9-NEXT: stxv vs4, 0(r3) > +; CHECK-P9-NEXT: mffprwz r4, f3 > +; CHECK-P9-NEXT: mtvsrd v3, r4 > ; CHECK-P9-NEXT: mffprwz r4, f2 > -; CHECK-P9-NEXT: vmrglw v2, v3, v2 > -; CHECK-P9-NEXT: vmrglw v3, v5, v4 > -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2 > -; CHECK-P9-NEXT: xxswapd v2, vs3 > -; CHECK-P9-NEXT: vmrglh v0, v0, v1 > -; CHECK-P9-NEXT: mtfprd f2, r4 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f1 > ; CHECK-P9-NEXT: xxswapd vs1, vs1 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r4, f2 > -; CHECK-P9-NEXT: mtfprd f2, r4 > +; CHECK-P9-NEXT: vmrglw v2, v3, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r4 > ; CHECK-P9-NEXT: mffprwz r4, f1 > -; CHECK-P9-NEXT: mtfprd f1, r4 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f0 > ; CHECK-P9-NEXT: xxswapd vs0, vs0 > +; CHECK-P9-NEXT: mtvsrd v4, r4 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghh v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r4, f1 > -; CHECK-P9-NEXT: mtfprd f1, r4 > +; 
CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r4
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
> -; CHECK-P9-NEXT: stxv vs4, 0(r3)
> ; CHECK-P9-NEXT: blr
> ;
> ; CHECK-BE-LABEL: test16elt:
> @@ -639,12 +579,10 @@ define i32 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglh v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -653,15 +591,13 @@ define i32 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9: # %bb.0: # %entry
> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: li r3, 0
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -700,18 +636,14 @@ define i64 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f2
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: xxswapd v2, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs3
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xxswapd v5, vs1
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
> +; CHECK-P8-NEXT: vmrghh v3, v5, v3
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> @@ -725,22 +657,18 @@ define i64 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> @@ -799,36 +727,28 @@ define <8 x i16> @test8elt_signed(<8 x double>*
> nocapture readonly) local_unname
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: mtfprd f4, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs4
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f6, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v1, vs7
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v0, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs2
> -; CHECK-P8-NEXT: vmrglh v2, v5, v2
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> -; CHECK-P8-NEXT: vmrglh v3, v0, v3
> -; CHECK-P8-NEXT: vmrglh v4, v6, v4
> -; CHECK-P8-NEXT: vmrglh v5, v5, v1
> +; CHECK-P8-NEXT: vmrghh v2, v0, v2
> +; CHECK-P8-NEXT: vmrghh v3, v1, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: vmrghh v4, v0, v4
> +; CHECK-P8-NEXT: vmrghh v5, v1, v5
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> @@ -840,47 +760,39 @@ define <8 x i16> @test8elt_signed(<8 x double>*
> nocapture readonly) local_unname
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: xxswapd v2, vs4
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P9-NEXT: blr
> @@ -944,209 +856,177 @@ entry:
> define void @test16elt_signed(<16 x i16>* noalias nocapture sret
> %agg.result, <16 x double>* nocapture readonly) local_unnamed_addr #3 {
> ; CHECK-P8-LABEL: test16elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r5, 16
> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r6, 32
> +; CHECK-P8-NEXT: li r7, 48
> ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5
> ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6
> -; CHECK-P8-NEXT: li r6, 48
> -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6
> ; CHECK-P8-NEXT: li r6, 64
> -; CHECK-P8-NEXT: xscvdpsxws f4, f0
> +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7
> ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6
> -; CHECK-P8-NEXT: li r6, 80
> +; CHECK-P8-NEXT: li r7, 80
> +; CHECK-P8-NEXT: li r6, 96
> +; CHECK-P8-NEXT: xscvdpsxws f4, f0
> +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7
> +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6
> +; CHECK-P8-NEXT: li r6, 112
> ; CHECK-P8-NEXT: xxswapd vs0, vs0
> ; CHECK-P8-NEXT: xscvdpsxws f6, f1
> -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6
> -; CHECK-P8-NEXT: li r6, 96
> ; CHECK-P8-NEXT: xxswapd vs1, vs1
> ; CHECK-P8-NEXT: xscvdpsxws f8, f2
> -; CHECK-P8-NEXT: lxvd2x vs9, r4, r6
> -; CHECK-P8-NEXT: li r6, 112
> ; CHECK-P8-NEXT: xxswapd vs2, vs2
> -; CHECK-P8-NEXT: xscvdpsxws f10, f3
> -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6
> +; CHECK-P8-NEXT: xscvdpsxws f9, f3
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> -; CHECK-P8-NEXT: xscvdpsxws f12, f5
> +; CHECK-P8-NEXT: xscvdpsxws f11, f5
> ; CHECK-P8-NEXT: xxswapd vs5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f13, f7
> +; CHECK-P8-NEXT: xscvdpsxws f12, f7
> ; CHECK-P8-NEXT: xxswapd vs7, vs7
> -; CHECK-P8-NEXT: xscvdpsxws v2, f9
> -; CHECK-P8-NEXT: xxswapd vs9, vs9
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: xscvdpsxws v3, f11
> -; CHECK-P8-NEXT: xxswapd vs11, vs11
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mffprwz r6, f6
> -; CHECK-P8-NEXT: mtfprd f4, r4
> +; CHECK-P8-NEXT: mffprwz r7, f4
> +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xscvdpsxws f13, f10
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f8
> +; CHECK-P8-NEXT: xscvdpsxws f6, f4
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f9
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: mffprwz r4, f11
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs4
> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: mtfprd f6, r6
> -; CHECK-P8-NEXT: mffprwz r6, f10
> -; CHECK-P8-NEXT: mtfprd f8, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs6
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> ; CHECK-P8-NEXT: mffprwz r4, f12
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: xxswapd v0, vs8
> -; CHECK-P8-NEXT: mtfprd f10, r6
> -; CHECK-P8-NEXT: mffprwz r6, f13
> -; CHECK-P8-NEXT: mtfprd f12, r4
> -; CHECK-P8-NEXT: xxswapd v1, vs10
> -; CHECK-P8-NEXT: mfvsrwz r4, v2
> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f13
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: mtfprd f13, r6
> -; CHECK-P8-NEXT: mfvsrwz r6, v3
> -; CHECK-P8-NEXT: mtvsrd v2, r4
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xxswapd vs6, vs10
> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xxswapd vs0, vs4
> +; CHECK-P8-NEXT: mtvsrd v2, r7
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: xxswapd v2, v2
> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: mtvsrd v3, r6
> -; CHECK-P8-NEXT: mffprwz r6, f1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v3, v3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: mtfprd f1, r6
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: mtfprd f2, r4
> +; CHECK-P8-NEXT: xscvdpsxws f4, f6
> +; CHECK-P8-NEXT: vmrghh v2, v8, v2
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghh v3, v9, v3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> -; CHECK-P8-NEXT: mffprwz r6, f3
> -; CHECK-P8-NEXT: xxswapd v10, vs2
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: mffprwz r4, f9
> -; CHECK-P8-NEXT: mtfprd f3, r6
> -; CHECK-P8-NEXT: mffprwz r6, f7
> -; CHECK-P8-NEXT: mtfprd f9, r4
> -; CHECK-P8-NEXT: mffprwz r4, f11
> -; CHECK-P8-NEXT: vmrglh v4, v8, v4
> -; CHECK-P8-NEXT: xxswapd v8, vs3
> -; CHECK-P8-NEXT: vmrglh v5, v9, v5
> -; CHECK-P8-NEXT: xxswapd v9, vs5
> -; CHECK-P8-NEXT: mtfprd f7, r6
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: vmrglh v0, v10, v0
> -; CHECK-P8-NEXT: xxswapd v10, vs7
> -; CHECK-P8-NEXT: vmrglh v1, v8, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs9
> -; CHECK-P8-NEXT: vmrglh v6, v9, v6
> -; CHECK-P8-NEXT: xxswapd v9, vs0
> -; CHECK-P8-NEXT: vmrglh v7, v10, v7
> -; CHECK-P8-NEXT: vmrglh v2, v8, v2
> -; CHECK-P8-NEXT: vmrglh v3, v9, v3
> -; CHECK-P8-NEXT: vmrglw v4, v5, v4
> -; CHECK-P8-NEXT: vmrglw v5, v1, v0
> -; CHECK-P8-NEXT: vmrglw v0, v7, v6
> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f7
> +; CHECK-P8-NEXT: vmrghh v5, v9, v5
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: mffprwz r4, f4
> +; CHECK-P8-NEXT: vmrghh v0, v8, v0
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: vmrghh v1, v9, v1
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghh v6, v8, v6
> +; CHECK-P8-NEXT: vmrghh v7, v9, v7
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: vmrglw v3, v5, v4
> +; CHECK-P8-NEXT: vmrglw v4, v1, v0
> +; CHECK-P8-NEXT: vmrglw v5, v7, v6
> +; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> +; CHECK-P8-NEXT: stvx v2, 0, r3
> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
> -; CHECK-P8-NEXT: stvx v3, 0, r3
> -; CHECK-P8-NEXT: xxmrgld v2, v2, v0
> -; CHECK-P8-NEXT: stvx v2, r3, r5
> +; CHECK-P8-NEXT: stvx v3, r3, r5
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test16elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lxv vs4, 0(r4)
> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
> -; CHECK-P9-NEXT: lxv vs2, 32(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f5, f4
> -; CHECK-P9-NEXT: lxv vs1, 48(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f6, f3
> -; CHECK-P9-NEXT: lxv vs0, 64(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f7, f2
> -; CHECK-P9-NEXT: xscvdpsxws f8, f1
> -; CHECK-P9-NEXT: xxswapd vs4, vs4
> -; CHECK-P9-NEXT: xscvdpsxws f4, f4
> -; CHECK-P9-NEXT: mffprwz r5, f5
> -; CHECK-P9-NEXT: xscvdpsxws f9, f0
> +; CHECK-P9-NEXT: lxv vs3, 0(r4)
> +; CHECK-P9-NEXT: lxv vs2, 16(r4)
> +; CHECK-P9-NEXT: lxv vs1, 32(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
> +; CHECK-P9-NEXT: lxv vs0, 48(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f5, f2
> +; CHECK-P9-NEXT: xscvdpsxws f6, f1
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: xscvdpsxws f7, f0
> +; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: mffprwz r5, f4
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: mtfprd f5, r5
> -; CHECK-P9-NEXT: mffprwz r5, f6
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mtfprd f6, r5
> +; CHECK-P9-NEXT: mtvsrd v2, r5
> +; CHECK-P9-NEXT: mffprwz r5, f5
> +; CHECK-P9-NEXT: mtvsrd v3, r5
> +; CHECK-P9-NEXT: mffprwz r5, f6
> +; CHECK-P9-NEXT: mtvsrd v4, r5
> ; CHECK-P9-NEXT: mffprwz r5, f7
> -; CHECK-P9-NEXT: mtfprd f7, r5
> -; CHECK-P9-NEXT: mffprwz r5, f8
> -; CHECK-P9-NEXT: mtfprd f8, r5
> -; CHECK-P9-NEXT: mffprwz r5, f9
> -; CHECK-P9-NEXT: mtfprd f9, r5
> -; CHECK-P9-NEXT: mffprwz r5, f4
> -; CHECK-P9-NEXT: mtfprd f4, r5
> +; CHECK-P9-NEXT: mtvsrd v5, r5
> ; CHECK-P9-NEXT: mffprwz r5, f3
> +; CHECK-P9-NEXT: lxv vs3, 64(r4)
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs5
> -; CHECK-P9-NEXT: xxswapd v5, vs8
> -; CHECK-P9-NEXT: xxswapd v0, vs9
> -; CHECK-P9-NEXT: mtfprd f3, r5
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f2
> -; CHECK-P9-NEXT: mtfprd f2, r5
> -; CHECK-P9-NEXT: xxswapd vs0, vs0
> -; CHECK-P9-NEXT: xscvdpsxws f0, f0
> -; CHECK-P9-NEXT: xxswapd v1, vs2
> ; CHECK-P9-NEXT: lxv vs2, 80(r4)
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs6
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> -; CHECK-P9-NEXT: xscvdpsxws f3, f2
> -; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: vmrghh v2, v2, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f1
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs7
> -; CHECK-P9-NEXT: mtfprd f1, r5
> +; CHECK-P9-NEXT: lxv vs1, 96(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
> +; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: vmrghh v3, v3, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v1
> -; CHECK-P9-NEXT: xxswapd v1, vs1
> -; CHECK-P9-NEXT: mtfprd f0, r5
> -; CHECK-P9-NEXT: vmrglh v5, v5, v1
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: xxswapd v1, vs0
> ; CHECK-P9-NEXT: lxv vs0, 112(r4)
> -; CHECK-P9-NEXT: lxv vs1, 96(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
> +; CHECK-P9-NEXT: mffprwz r4, f4
> +; CHECK-P9-NEXT: vmrglw v4, v5, v4
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f3
> -; CHECK-P9-NEXT: mtfprd f3, r4
> +; CHECK-P9-NEXT: xscvdpsxws f3, f2
> +; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> +; CHECK-P9-NEXT: stxv vs4, 0(r3)
> +; CHECK-P9-NEXT: mffprwz r4, f3
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
> -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2
> -; CHECK-P9-NEXT: xxswapd v2, vs3
> -; CHECK-P9-NEXT: vmrglh v0, v0, v1
> -; CHECK-P9-NEXT: mtfprd f2, r4
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: mtfprd f2, r4
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f1
> -; CHECK-P9-NEXT: mtfprd f1, r4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r4, f1
> -; CHECK-P9-NEXT: mtfprd f1, r4
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r4
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
> -; CHECK-P9-NEXT: stxv vs4, 0(r3)
> ; CHECK-P9-NEXT: blr
> ;
> ; CHECK-BE-LABEL: test16elt_signed:
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> index 369fb3f10100..173ced964ad6 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> @@ -16,12 +16,10 @@ define i64 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpuxws f1, v2
> ; CHECK-P8-NEXT: xscvdpuxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrwz v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglw v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrwz v3, r4
> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -35,7 +33,7 @@ define i64 @test2elt(<2 x double> %a) local_unnamed_addr
> #0 {
> ; CHECK-P9-NEXT: xscvdpuxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> ; CHECK-P9-NEXT: mtvsrws v2, r3
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: vmrghw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -310,12 +308,10 @@ define i64 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrwz v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglw v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrwz v3, r4
> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -329,7 +325,7 @@ define i64 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> ; CHECK-P9-NEXT: mtvsrws v2, r3
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: vmrghw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> index fb13d1bd71f5..fd28d9a1afdc 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> @@ -16,12 +16,10 @@ define i16 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: clrldi r3, r3, 48
> @@ -33,15 +31,13 @@ define i16 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9: # %bb.0: # %entry
> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: addi r3, r1, -2
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
> ; CHECK-P9-NEXT: lhz r3, -2(r1)
> @@ -84,18 +80,14 @@ define i32 @test4elt(<4 x double>* nocapture readonly)
> local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f2
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: xxswapd v2, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs3
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xxswapd v5, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> -; CHECK-P8-NEXT: vmrglb v3, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
> +; CHECK-P8-NEXT: vmrghb v3, v5, v3
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> @@ -109,24 +101,20 @@ define i32 @test4elt(<4 x double>* nocapture
> readonly) local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: li r3, 0
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -185,36 +173,28 @@ define i64 @test8elt(<8 x double>* nocapture
> readonly) local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: mtfprd f4, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs4
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f6, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v1, vs7
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v0, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs2
> -; CHECK-P8-NEXT: vmrglb v2, v5, v2
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> -; CHECK-P8-NEXT: vmrglb v3, v0, v3
> -; CHECK-P8-NEXT: vmrglb v4, v6, v4
> -; CHECK-P8-NEXT: vmrglb v5, v5, v1
> +; CHECK-P8-NEXT: vmrghb v2, v0, v2
> +; CHECK-P8-NEXT: vmrghb v3, v1, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: vmrghb v4, v0, v4
> +; CHECK-P8-NEXT: vmrghb v5, v1, v5
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> ; CHECK-P8-NEXT: vmrglh v3, v5, v4
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> @@ -228,47 +208,39 @@ define i64 @test8elt(<8 x double>* nocapture
> readonly) local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: xxswapd v2, vs4
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> @@ -364,79 +336,63 @@ define <16 x i8> @test16elt(<16 x double>* nocapture
> readonly) local_unnamed_add
> ; CHECK-P8-NEXT: xxswapd vs7, vs7
> ; CHECK-P8-NEXT: xscvdpsxws v2, f9
> ; CHECK-P8-NEXT: xxswapd vs9, vs9
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws v3, f11
> ; CHECK-P8-NEXT: xxswapd vs11, vs11
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f6
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mtfprd f4, r3
> -; CHECK-P8-NEXT: mffprwz r3, f8
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs4
> -; CHECK-P8-NEXT: mtfprd f6, r4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mffprwz r3, f8
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r4, f10
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: xxswapd v5, vs6
> -; CHECK-P8-NEXT: mtfprd f8, r3
> -; CHECK-P8-NEXT: mffprwz r3, f12
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xxswapd v0, vs8
> -; CHECK-P8-NEXT: mtfprd f10, r4
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mffprwz r3, f12
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r4, f13
> ; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: xxswapd v1, vs10
> -; CHECK-P8-NEXT: mtfprd f12, r3
> -; CHECK-P8-NEXT: mfvsrwz r3, v2
> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: mtfprd f13, r4
> +; CHECK-P8-NEXT: mtvsrd v6, r3
> +; CHECK-P8-NEXT: mfvsrwz r3, v2
> +; CHECK-P8-NEXT: mtvsrd v2, r4
> ; CHECK-P8-NEXT: mfvsrwz r4, v3
> -; CHECK-P8-NEXT: mtvsrd v2, r3
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> -; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: xxswapd v2, v2
> ; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> +; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v3, v3
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> +; CHECK-P8-NEXT: vmrghb v4, v8, v4
> +; CHECK-P8-NEXT: vmrghb v5, v9, v5
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f5
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v10, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f5, r3
> +; CHECK-P8-NEXT: vmrghb v0, v8, v0
> +; CHECK-P8-NEXT: vmrghb v1, v9, v1
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f9
> -; CHECK-P8-NEXT: mtfprd f7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f11
> -; CHECK-P8-NEXT: vmrglb v4, v8, v4
> -; CHECK-P8-NEXT: xxswapd v8, vs3
> -; CHECK-P8-NEXT: vmrglb v5, v9, v5
> -; CHECK-P8-NEXT: xxswapd v9, vs5
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: vmrglb v0, v10, v0
> -; CHECK-P8-NEXT: xxswapd v10, vs7
> -; CHECK-P8-NEXT: vmrglb v1, v8, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: vmrglb v6, v9, v6
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> -; CHECK-P8-NEXT: vmrglb v7, v10, v7
> -; CHECK-P8-NEXT: vmrglb v2, v8, v2
> -; CHECK-P8-NEXT: vmrglb v3, v9, v3
> +; CHECK-P8-NEXT: vmrghb v6, v8, v6
> +; CHECK-P8-NEXT: vmrghb v2, v9, v2
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghb v3, v8, v3
> +; CHECK-P8-NEXT: vmrghb v7, v9, v7
> ; CHECK-P8-NEXT: vmrglh v4, v5, v4
> ; CHECK-P8-NEXT: vmrglh v5, v1, v0
> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
> -; CHECK-P8-NEXT: vmrglw v2, v2, v0
> -; CHECK-P8-NEXT: xxmrgld v2, v2, v3
> +; CHECK-P8-NEXT: vmrglh v2, v2, v6
> +; CHECK-P8-NEXT: vmrglh v3, v7, v3
> +; CHECK-P8-NEXT: vmrglw v4, v5, v4
> +; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxmrgld v2, v2, v4
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test16elt:
> @@ -445,94 +401,78 @@ define <16 x i8> @test16elt(<16 x double>* nocapture
> readonly) local_unnamed_add
> ; CHECK-P9-NEXT: xscvdpsxws f8, f7
> ; CHECK-P9-NEXT: xxswapd vs7, vs7
> ; CHECK-P9-NEXT: xscvdpsxws f7, f7
> +; CHECK-P9-NEXT: lxv vs6, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 112(r3)
> ; CHECK-P9-NEXT: lxv vs1, 96(r3)
> ; CHECK-P9-NEXT: lxv vs2, 80(r3)
> ; CHECK-P9-NEXT: lxv vs3, 64(r3)
> ; CHECK-P9-NEXT: lxv vs4, 48(r3)
> ; CHECK-P9-NEXT: lxv vs5, 32(r3)
> -; CHECK-P9-NEXT: lxv vs6, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f8
> -; CHECK-P9-NEXT: mtfprd f8, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f7
> -; CHECK-P9-NEXT: xxswapd v2, vs8
> -; CHECK-P9-NEXT: mtfprd f7, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs7
> ; CHECK-P9-NEXT: xscvdpsxws f7, f6
> ; CHECK-P9-NEXT: xxswapd vs6, vs6
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f6, f6
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f7
> -; CHECK-P9-NEXT: mtfprd f7, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f6
> -; CHECK-P9-NEXT: mtfprd f6, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs6
> ; CHECK-P9-NEXT: xscvdpsxws f6, f5
> ; CHECK-P9-NEXT: xxswapd vs5, vs5
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f5, f5
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f6
> -; CHECK-P9-NEXT: mtfprd f6, r3
> -; CHECK-P9-NEXT: mffprwz r3, f5
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs7
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs6
> -; CHECK-P9-NEXT: mtfprd f5, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs5
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f5
> ; CHECK-P9-NEXT: xscvdpsxws f5, f4
> ; CHECK-P9-NEXT: xxswapd vs4, vs4
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f5
> -; CHECK-P9-NEXT: mtfprd f5, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs4
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs5
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: vmrglh v3, v4, v3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> -; CHECK-P9-NEXT: xxswapd v0, vs0
> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r3
> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> @@ -649,12 +589,10 @@ define i16 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: clrldi r3, r3, 48
> @@ -666,15 +604,13 @@ define i16 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9: # %bb.0: # %entry
> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: addi r3, r1, -2
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
> ; CHECK-P9-NEXT: lhz r3, -2(r1)
> @@ -717,18 +653,14 @@ define i32 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f2
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: xxswapd v2, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs3
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xxswapd v5, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> -; CHECK-P8-NEXT: vmrglb v3, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
> +; CHECK-P8-NEXT: vmrghb v3, v5, v3
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> @@ -742,24 +674,20 @@ define i32 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: li r3, 0
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -818,36 +746,28 @@ define i64
@test8elt_signed(<8 x double>* nocapture > readonly) local_unnamed_addr > ; CHECK-P8-NEXT: xxswapd vs3, vs3 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: mffprwz r3, f4 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > +; CHECK-P8-NEXT: mffprwz r3, f4 > ; CHECK-P8-NEXT: mffprwz r4, f5 > -; CHECK-P8-NEXT: mtfprd f4, r3 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: mffprwz r3, f6 > -; CHECK-P8-NEXT: mtfprd f5, r4 > -; CHECK-P8-NEXT: xxswapd v2, vs4 > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: mffprwz r4, f7 > -; CHECK-P8-NEXT: mtfprd f6, r3 > -; CHECK-P8-NEXT: xxswapd v3, vs5 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > ; CHECK-P8-NEXT: mffprwz r3, f0 > -; CHECK-P8-NEXT: mtfprd f7, r4 > -; CHECK-P8-NEXT: xxswapd v4, vs6 > ; CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v1, vs7 > +; CHECK-P8-NEXT: mtvsrd v0, r3 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > ; CHECK-P8-NEXT: mffprwz r3, f2 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: xxswapd v5, vs0 > ; CHECK-P8-NEXT: mffprwz r4, f3 > -; CHECK-P8-NEXT: mtfprd f2, r3 > -; CHECK-P8-NEXT: xxswapd v0, vs1 > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: xxswapd v6, vs2 > -; CHECK-P8-NEXT: vmrglb v2, v5, v2 > -; CHECK-P8-NEXT: xxswapd v5, vs0 > -; CHECK-P8-NEXT: vmrglb v3, v0, v3 > -; CHECK-P8-NEXT: vmrglb v4, v6, v4 > -; CHECK-P8-NEXT: vmrglb v5, v5, v1 > +; CHECK-P8-NEXT: vmrghb v2, v0, v2 > +; CHECK-P8-NEXT: vmrghb v3, v1, v3 > +; CHECK-P8-NEXT: mtvsrd v0, r3 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > +; CHECK-P8-NEXT: vmrghb v4, v0, v4 > +; CHECK-P8-NEXT: vmrghb v5, v1, v5 > ; CHECK-P8-NEXT: vmrglh v2, v3, v2 > ; CHECK-P8-NEXT: vmrglh v3, v5, v4 > ; CHECK-P8-NEXT: vmrglw v2, v3, v2 > @@ -861,47 +781,39 @@ define i64 @test8elt_signed(<8 x double>* nocapture > readonly) local_unnamed_addr > ; CHECK-P9-NEXT: xscvdpsxws f4, f3 > ; CHECK-P9-NEXT: xxswapd vs3, vs3 > ; 
CHECK-P9-NEXT: xscvdpsxws f3, f3 > +; CHECK-P9-NEXT: lxv vs2, 16(r3) > ; CHECK-P9-NEXT: lxv vs0, 48(r3) > ; CHECK-P9-NEXT: lxv vs1, 32(r3) > -; CHECK-P9-NEXT: lxv vs2, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f4 > -; CHECK-P9-NEXT: mtfprd f4, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: xxswapd v2, vs4 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f3, f2 > ; CHECK-P9-NEXT: xxswapd vs2, vs2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghb v2, v2, v3 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f1 > ; CHECK-P9-NEXT: xxswapd vs1, vs1 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: vmrglb v2, v2, v3 > -; CHECK-P9-NEXT: xxswapd v3, vs3 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > ; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs2 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > +; CHECK-P9-NEXT: mffprwz r3, f1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f0 > ; CHECK-P9-NEXT: xxswapd vs0, vs0 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs1 > -; CHECK-P9-NEXT: xxswapd v5, vs0 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; 
CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: vmrglw v2, v3, v2 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > @@ -997,79 +909,63 @@ define <16 x i8> @test16elt_signed(<16 x double>* > nocapture readonly) local_unna > ; CHECK-P8-NEXT: xxswapd vs7, vs7 > ; CHECK-P8-NEXT: xscvdpsxws v2, f9 > ; CHECK-P8-NEXT: xxswapd vs9, vs9 > -; CHECK-P8-NEXT: mffprwz r3, f4 > ; CHECK-P8-NEXT: xscvdpsxws v3, f11 > ; CHECK-P8-NEXT: xxswapd vs11, vs11 > +; CHECK-P8-NEXT: mffprwz r3, f4 > ; CHECK-P8-NEXT: mffprwz r4, f6 > ; CHECK-P8-NEXT: xscvdpsxws f0, f0 > -; CHECK-P8-NEXT: mtfprd f4, r3 > -; CHECK-P8-NEXT: mffprwz r3, f8 > ; CHECK-P8-NEXT: xscvdpsxws f1, f1 > -; CHECK-P8-NEXT: xxswapd v4, vs4 > -; CHECK-P8-NEXT: mtfprd f6, r4 > +; CHECK-P8-NEXT: mtvsrd v4, r3 > +; CHECK-P8-NEXT: mffprwz r3, f8 > +; CHECK-P8-NEXT: mtvsrd v5, r4 > ; CHECK-P8-NEXT: mffprwz r4, f10 > ; CHECK-P8-NEXT: xscvdpsxws f2, f2 > -; CHECK-P8-NEXT: xxswapd v5, vs6 > -; CHECK-P8-NEXT: mtfprd f8, r3 > -; CHECK-P8-NEXT: mffprwz r3, f12 > ; CHECK-P8-NEXT: xscvdpsxws f3, f3 > -; CHECK-P8-NEXT: xxswapd v0, vs8 > -; CHECK-P8-NEXT: mtfprd f10, r4 > +; CHECK-P8-NEXT: mtvsrd v0, r3 > +; CHECK-P8-NEXT: mffprwz r3, f12 > +; CHECK-P8-NEXT: mtvsrd v1, r4 > ; CHECK-P8-NEXT: mffprwz r4, f13 > ; CHECK-P8-NEXT: xscvdpsxws f5, f5 > -; CHECK-P8-NEXT: xxswapd v1, vs10 > -; CHECK-P8-NEXT: mtfprd f12, r3 > -; CHECK-P8-NEXT: mfvsrwz r3, v2 > ; CHECK-P8-NEXT: xscvdpsxws f7, f7 > -; CHECK-P8-NEXT: xxswapd v6, vs12 > -; CHECK-P8-NEXT: mtfprd f13, r4 > +; CHECK-P8-NEXT: mtvsrd v6, r3 > +; CHECK-P8-NEXT: mfvsrwz r3, v2 > +; CHECK-P8-NEXT: mtvsrd v2, r4 > ; CHECK-P8-NEXT: mfvsrwz r4, v3 > -; CHECK-P8-NEXT: mtvsrd v2, r3 > -; CHECK-P8-NEXT: xxswapd v7, vs13 > -; CHECK-P8-NEXT: mffprwz r3, f0 > ; CHECK-P8-NEXT: xscvdpsxws f9, f9 > -; CHECK-P8-NEXT: xxswapd v2, v2 > ; CHECK-P8-NEXT: xscvdpsxws f11, f11 > -; CHECK-P8-NEXT: mtvsrd v3, r4 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > +; CHECK-P8-NEXT: mtvsrd v7, r4 > +; CHECK-P8-NEXT: mffprwz r3, f0 > ; 
CHECK-P8-NEXT: mffprwz r4, f1 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: xxswapd v3, v3 > +; CHECK-P8-NEXT: mtvsrd v8, r3 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > ; CHECK-P8-NEXT: mffprwz r3, f2 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: xxswapd v8, vs0 > ; CHECK-P8-NEXT: mffprwz r4, f3 > -; CHECK-P8-NEXT: mtfprd f2, r3 > -; CHECK-P8-NEXT: xxswapd v9, vs1 > +; CHECK-P8-NEXT: vmrghb v4, v8, v4 > +; CHECK-P8-NEXT: vmrghb v5, v9, v5 > +; CHECK-P8-NEXT: mtvsrd v8, r3 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > ; CHECK-P8-NEXT: mffprwz r3, f5 > -; CHECK-P8-NEXT: mtfprd f3, r4 > -; CHECK-P8-NEXT: xxswapd v10, vs2 > ; CHECK-P8-NEXT: mffprwz r4, f7 > -; CHECK-P8-NEXT: mtfprd f5, r3 > +; CHECK-P8-NEXT: vmrghb v0, v8, v0 > +; CHECK-P8-NEXT: vmrghb v1, v9, v1 > +; CHECK-P8-NEXT: mtvsrd v8, r3 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > ; CHECK-P8-NEXT: mffprwz r3, f9 > -; CHECK-P8-NEXT: mtfprd f7, r4 > ; CHECK-P8-NEXT: mffprwz r4, f11 > -; CHECK-P8-NEXT: vmrglb v4, v8, v4 > -; CHECK-P8-NEXT: xxswapd v8, vs3 > -; CHECK-P8-NEXT: vmrglb v5, v9, v5 > -; CHECK-P8-NEXT: xxswapd v9, vs5 > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: mtfprd f1, r4 > -; CHECK-P8-NEXT: vmrglb v0, v10, v0 > -; CHECK-P8-NEXT: xxswapd v10, vs7 > -; CHECK-P8-NEXT: vmrglb v1, v8, v1 > -; CHECK-P8-NEXT: xxswapd v8, vs0 > -; CHECK-P8-NEXT: vmrglb v6, v9, v6 > -; CHECK-P8-NEXT: xxswapd v9, vs1 > -; CHECK-P8-NEXT: vmrglb v7, v10, v7 > -; CHECK-P8-NEXT: vmrglb v2, v8, v2 > -; CHECK-P8-NEXT: vmrglb v3, v9, v3 > +; CHECK-P8-NEXT: vmrghb v6, v8, v6 > +; CHECK-P8-NEXT: vmrghb v2, v9, v2 > +; CHECK-P8-NEXT: mtvsrd v8, r3 > +; CHECK-P8-NEXT: mtvsrd v9, r4 > +; CHECK-P8-NEXT: vmrghb v3, v8, v3 > +; CHECK-P8-NEXT: vmrghb v7, v9, v7 > ; CHECK-P8-NEXT: vmrglh v4, v5, v4 > ; CHECK-P8-NEXT: vmrglh v5, v1, v0 > -; CHECK-P8-NEXT: vmrglh v0, v7, v6 > -; CHECK-P8-NEXT: vmrglh v2, v3, v2 > -; CHECK-P8-NEXT: vmrglw v3, v5, v4 > -; CHECK-P8-NEXT: vmrglw v2, v2, v0 > -; CHECK-P8-NEXT: xxmrgld v2, v2, v3 > +; CHECK-P8-NEXT: 
vmrglh v2, v2, v6 > +; CHECK-P8-NEXT: vmrglh v3, v7, v3 > +; CHECK-P8-NEXT: vmrglw v4, v5, v4 > +; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: xxmrgld v2, v2, v4 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: test16elt_signed: > @@ -1078,94 +974,78 @@ define <16 x i8> @test16elt_signed(<16 x double>* > nocapture readonly) local_unna > ; CHECK-P9-NEXT: xscvdpsxws f8, f7 > ; CHECK-P9-NEXT: xxswapd vs7, vs7 > ; CHECK-P9-NEXT: xscvdpsxws f7, f7 > +; CHECK-P9-NEXT: lxv vs6, 16(r3) > ; CHECK-P9-NEXT: lxv vs0, 112(r3) > ; CHECK-P9-NEXT: lxv vs1, 96(r3) > ; CHECK-P9-NEXT: lxv vs2, 80(r3) > ; CHECK-P9-NEXT: lxv vs3, 64(r3) > ; CHECK-P9-NEXT: lxv vs4, 48(r3) > ; CHECK-P9-NEXT: lxv vs5, 32(r3) > -; CHECK-P9-NEXT: lxv vs6, 16(r3) > ; CHECK-P9-NEXT: mffprwz r3, f8 > -; CHECK-P9-NEXT: mtfprd f8, r3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > ; CHECK-P9-NEXT: mffprwz r3, f7 > -; CHECK-P9-NEXT: xxswapd v2, vs8 > -; CHECK-P9-NEXT: mtfprd f7, r3 > -; CHECK-P9-NEXT: xxswapd v3, vs7 > ; CHECK-P9-NEXT: xscvdpsxws f7, f6 > ; CHECK-P9-NEXT: xxswapd vs6, vs6 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: xscvdpsxws f6, f6 > +; CHECK-P9-NEXT: vmrghb v2, v2, v3 > ; CHECK-P9-NEXT: mffprwz r3, f7 > -; CHECK-P9-NEXT: mtfprd f7, r3 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f6 > -; CHECK-P9-NEXT: mtfprd f6, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs6 > ; CHECK-P9-NEXT: xscvdpsxws f6, f5 > ; CHECK-P9-NEXT: xxswapd vs5, vs5 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f5, f5 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r3, f6 > -; CHECK-P9-NEXT: mtfprd f6, r3 > -; CHECK-P9-NEXT: mffprwz r3, f5 > -; CHECK-P9-NEXT: vmrglb v2, v2, v3 > -; CHECK-P9-NEXT: xxswapd v3, vs7 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > ; CHECK-P9-NEXT: vmrglh v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs6 > -; CHECK-P9-NEXT: mtfprd f5, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs5 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > +; CHECK-P9-NEXT: mffprwz r3, f5 > ; CHECK-P9-NEXT: 
xscvdpsxws f5, f4 > ; CHECK-P9-NEXT: xxswapd vs4, vs4 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f4, f4 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r3, f5 > -; CHECK-P9-NEXT: mtfprd f5, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f4 > -; CHECK-P9-NEXT: mtfprd f4, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs4 > ; CHECK-P9-NEXT: xscvdpsxws f4, f3 > ; CHECK-P9-NEXT: xxswapd vs3, vs3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvdpsxws f3, f3 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs5 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: vmrglh v3, v4, v3 > ; CHECK-P9-NEXT: mffprwz r3, f4 > -; CHECK-P9-NEXT: mtfprd f4, r3 > +; CHECK-P9-NEXT: vmrglw v2, v3, v2 > +; CHECK-P9-NEXT: mtvsrd v3, r3 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > -; CHECK-P9-NEXT: xxswapd v4, vs3 > ; CHECK-P9-NEXT: xscvdpsxws f3, f2 > ; CHECK-P9-NEXT: xxswapd vs2, vs2 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: xscvdpsxws f2, f2 > +; CHECK-P9-NEXT: vmrghb v3, v3, v4 > ; CHECK-P9-NEXT: mffprwz r3, f3 > -; CHECK-P9-NEXT: mtfprd f3, r3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > -; CHECK-P9-NEXT: xxswapd v5, vs2 > ; CHECK-P9-NEXT: xscvdpsxws f2, f1 > ; CHECK-P9-NEXT: xxswapd vs1, vs1 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvdpsxws f1, f1 > -; CHECK-P9-NEXT: vmrglw v2, v3, v2 > -; CHECK-P9-NEXT: xxswapd v3, vs4 > -; CHECK-P9-NEXT: vmrglb v3, v3, v4 > -; CHECK-P9-NEXT: xxswapd v4, vs3 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > -; CHECK-P9-NEXT: vmrglh v3, v4, v3 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: mffprwz r3, f2 > -; CHECK-P9-NEXT: mtfprd f2, r3 > +; CHECK-P9-NEXT: vmrglh v3, v4, v3 > +; CHECK-P9-NEXT: mtvsrd v4, r3 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: xxswapd v4, vs2 > -; CHECK-P9-NEXT: mtfprd f1, r3 > -; 
CHECK-P9-NEXT: xxswapd v5, vs1 > ; CHECK-P9-NEXT: xscvdpsxws f1, f0 > ; CHECK-P9-NEXT: xxswapd vs0, vs0 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: xscvdpsxws f0, f0 > +; CHECK-P9-NEXT: vmrghb v4, v4, v5 > ; CHECK-P9-NEXT: mffprwz r3, f1 > -; CHECK-P9-NEXT: mtfprd f1, r3 > +; CHECK-P9-NEXT: mtvsrd v5, r3 > ; CHECK-P9-NEXT: mffprwz r3, f0 > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: vmrglb v4, v4, v5 > -; CHECK-P9-NEXT: xxswapd v5, vs1 > -; CHECK-P9-NEXT: xxswapd v0, vs0 > -; CHECK-P9-NEXT: vmrglb v5, v5, v0 > +; CHECK-P9-NEXT: mtvsrd v0, r3 > +; CHECK-P9-NEXT: vmrghb v5, v5, v0 > ; CHECK-P9-NEXT: vmrglh v4, v5, v4 > ; CHECK-P9-NEXT: vmrglw v3, v4, v3 > ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll > index e51af62cb128..5ecd34941b39 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll > @@ -24,9 +24,9 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P8-NEXT: xscvuxdsp f1, f1 > ; CHECK-P8-NEXT: xscvdpspn vs0, f0 > ; CHECK-P8-NEXT: xscvdpspn vs1, f1 > -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-P8-NEXT: vmrghw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -43,12 +43,12 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > ; CHECK-P9-NEXT: vextuhrx r3, r3, v2 > ; CHECK-P9-NEXT: clrlwi r3, r3, 16 > -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 > +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 > ; CHECK-P9-NEXT: mtfprwz f0, r3 > ; CHECK-P9-NEXT: xscvuxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 > -; 
CHECK-P9-NEXT: vmrglw v2, v2, v3 > +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P9-NEXT: vmrghw v2, v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -80,25 +80,17 @@ entry: > define <4 x float> @test4elt(i64 %a.coerce) local_unnamed_addr #1 { > ; CHECK-P8-LABEL: test4elt: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0@toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0@toc@l > -; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: lvx v3, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v4, v2, v3 > +; CHECK-P8-NEXT: xxlxor v2, v2, v2 > +; CHECK-P8-NEXT: mtvsrd v3, r3 > +; CHECK-P8-NEXT: vmrghh v2, v2, v3 > ; CHECK-P8-NEXT: xvcvuxwsp v2, v2 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: test4elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0@toc@ha > -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0@toc@l > -; CHECK-P9-NEXT: lxvx v3, 0, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: xxlxor v4, v4, v4 > -; CHECK-P9-NEXT: vperm v2, v4, v2, v3 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > +; CHECK-P9-NEXT: xxlxor v3, v3, v3 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > ; CHECK-P9-NEXT: xvcvuxwsp v2, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -121,17 +113,11 @@ entry: > define void @test8elt(<8 x float>* noalias nocapture sret %agg.result, <8 > x i16> %a) local_unnamed_addr #2 { > ; CHECK-P8-LABEL: test8elt: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_0@toc@ha > -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1@toc@ha > -; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_0@toc@l > -; CHECK-P8-NEXT: lvx v3, 0, r4 > -; CHECK-P8-NEXT: addi r4, r5, .LCPI2_1@toc@l > -; CHECK-P8-NEXT: lvx v5, 0, r4 > +; CHECK-P8-NEXT: xxlxor v3, v3, v3 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: vperm v3, v4, v2, v3 > -; CHECK-P8-NEXT: vperm v2, v4, v2, v5 > -; 
CHECK-P8-NEXT: xvcvuxwsp v3, v3 > +; CHECK-P8-NEXT: vmrglh v4, v3, v2 > +; CHECK-P8-NEXT: vmrghh v2, v3, v2 > +; CHECK-P8-NEXT: xvcvuxwsp v3, v4 > ; CHECK-P8-NEXT: xvcvuxwsp v2, v2 > ; CHECK-P8-NEXT: stvx v3, 0, r3 > ; CHECK-P8-NEXT: stvx v2, r3, r4 > @@ -139,19 +125,13 @@ define void @test8elt(<8 x float>* noalias nocapture > sret %agg.result, <8 x i16> > ; > ; CHECK-P9-LABEL: test8elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0@toc@ha > -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0@toc@l > -; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: xxlxor v4, v4, v4 > -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1@toc@ha > -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1@toc@l > -; CHECK-P9-NEXT: vperm v3, v4, v2, v3 > -; CHECK-P9-NEXT: xvcvuxwsp vs0, v3 > -; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: vperm v2, v4, v2, v3 > -; CHECK-P9-NEXT: stxv vs0, 0(r3) > +; CHECK-P9-NEXT: xxlxor v3, v3, v3 > +; CHECK-P9-NEXT: vmrglh v4, v3, v2 > +; CHECK-P9-NEXT: vmrghh v2, v3, v2 > +; CHECK-P9-NEXT: xvcvuxwsp vs0, v4 > ; CHECK-P9-NEXT: xvcvuxwsp vs1, v2 > ; CHECK-P9-NEXT: stxv vs1, 16(r3) > +; CHECK-P9-NEXT: stxv vs0, 0(r3) > ; CHECK-P9-NEXT: blr > ; > ; CHECK-BE-LABEL: test8elt: > @@ -276,9 +256,9 @@ define i64 @test2elt_signed(i32 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-NEXT: xscvsxdsp f1, f1 > ; CHECK-P8-NEXT: xscvdpspn vs0, f0 > ; CHECK-P8-NEXT: xscvdpspn vs1, f1 > -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-P8-NEXT: vmrghw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -295,12 +275,12 @@ define i64 @test2elt_signed(i32 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > ; CHECK-P9-NEXT: vextuhrx r3, r3, v2 > ; CHECK-P9-NEXT: extsh r3, r3 > -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 
1 > +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 > ; CHECK-P9-NEXT: mtfprwa f0, r3 > ; CHECK-P9-NEXT: xscvsxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P9-NEXT: vmrglw v2, v2, v3 > +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P9-NEXT: vmrghw v2, v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -332,11 +312,10 @@ entry: > define <4 x float> @test4elt_signed(i64 %a.coerce) local_unnamed_addr #1 { > ; CHECK-P8-LABEL: test4elt_signed: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: mtfprd f0, r3 > +; CHECK-P8-NEXT: mtvsrd v2, r3 > ; CHECK-P8-NEXT: vspltisw v3, 8 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > +; CHECK-P8-NEXT: vmrghh v2, v2, v2 > ; CHECK-P8-NEXT: vadduwm v3, v3, v3 > -; CHECK-P8-NEXT: vmrglh v2, v2, v2 > ; CHECK-P8-NEXT: vslw v2, v2, v3 > ; CHECK-P8-NEXT: vsraw v2, v2, v3 > ; CHECK-P8-NEXT: xvcvsxwsp v2, v2 > @@ -344,9 +323,8 @@ define <4 x float> @test4elt_signed(i64 %a.coerce) > local_unnamed_addr #1 { > ; > ; CHECK-P9-LABEL: test4elt_signed: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r3 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vmrglh v2, v2, v2 > +; CHECK-P9-NEXT: mtvsrd v2, r3 > +; CHECK-P9-NEXT: vmrghh v2, v2, v2 > ; CHECK-P9-NEXT: vextsh2w v2, v2 > ; CHECK-P9-NEXT: xvcvsxwsp v2, v2 > ; CHECK-P9-NEXT: blr > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll > index faec95831816..ea8ede3af22a 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll > @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i32 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: test2elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r4, r2, .LCPI0_0@toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0@toc@l > +; CHECK-P8-NEXT: mtvsrwz v2, r3 > +; 
CHECK-P8-NEXT: addi r4, r4, .LCPI0_0@toc@l > ; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: lvx v3, 0, r3 > +; CHECK-P8-NEXT: lvx v3, 0, r4 > ; CHECK-P8-NEXT: vperm v2, v4, v2, v3 > ; CHECK-P8-NEXT: xvcvuxddp v2, v2 > ; CHECK-P8-NEXT: blr > @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture > sret %agg.result, i64 %a.c > ; CHECK-P8-LABEL: test4elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0@toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1@toc@ha > +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1@toc@ha > +; CHECK-P8-NEXT: mtvsrd v2, r4 > ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0@toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1@toc@l > +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1@toc@l > ; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > +; CHECK-P8-NEXT: lvx v3, 0, r5 > ; CHECK-P8-NEXT: lvx v5, 0, r4 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 > -; CHECK-P8-NEXT: vperm v3, v4, v3, v5 > -; CHECK-P8-NEXT: xvcvuxddp vs0, v2 > -; CHECK-P8-NEXT: xvcvuxddp vs1, v3 > +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 > +; CHECK-P8-NEXT: vperm v2, v4, v2, v5 > +; CHECK-P8-NEXT: xvcvuxddp vs0, v3 > +; CHECK-P8-NEXT: xvcvuxddp vs1, v2 > ; CHECK-P8-NEXT: xxswapd vs0, vs0 > ; CHECK-P8-NEXT: xxswapd vs1, vs1 > ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4 > @@ -74,11 +72,10 @@ define void @test4elt(<4 x double>* noalias nocapture > sret %agg.result, i64 %a.c > ; > ; CHECK-P9-LABEL: test4elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r4 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_0@toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI1_0@toc@l > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > ; CHECK-P9-NEXT: xxlxor v4, v4, v4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_1@toc@ha > ; CHECK-P9-NEXT: addi r4, r4, 
.LCPI1_1@toc@l > @@ -370,14 +367,13 @@ define <2 x double> @test2elt_signed(i32 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: test2elt_signed: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0@toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0@toc@l > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: lvx v3, 0, r3 > +; CHECK-P8-NEXT: mtvsrwz v3, r3 > ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1@toc@ha > +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0@toc@l > ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1@toc@l > +; CHECK-P8-NEXT: lvx v2, 0, r4 > ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v2, v2, v3 > +; CHECK-P8-NEXT: vperm v2, v3, v3, v2 > ; CHECK-P8-NEXT: xxswapd v3, vs0 > ; CHECK-P8-NEXT: vsld v2, v2, v3 > ; CHECK-P8-NEXT: vsrad v2, v2, v3 > @@ -415,17 +411,16 @@ define void @test4elt_signed(<4 x double>* noalias > nocapture sret %agg.result, i > ; CHECK-P8-LABEL: test4elt_signed: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0@toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2@toc@ha > -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2@toc@l > -; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: lvx v4, 0, r4 > +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2@toc@ha > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1@toc@ha > +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l > ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1@toc@l > +; CHECK-P8-NEXT: lvx v2, 0, r5 > +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2@toc@l > ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 > ; CHECK-P8-NEXT: li r4, 16 > +; CHECK-P8-NEXT: lvx v4, 0, r5 > ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 > ; CHECK-P8-NEXT: vperm v3, v3, v3, v4 > ; CHECK-P8-NEXT: xxswapd v4, vs0 > @@ -443,14 +438,13 @@ define void @test4elt_signed(<4 x double>* noalias > nocapture sret 
%agg.result, i > ; > ; CHECK-P9-LABEL: test4elt_signed: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r4 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_0@toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_0@toc@l > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vperm v3, v2, v2, v3 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_1@toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_1@toc@l > +; CHECK-P9-NEXT: vperm v3, v2, v2, v3 > ; CHECK-P9-NEXT: vextsh2d v3, v3 > ; CHECK-P9-NEXT: xvcvsxddp vs0, v3 > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll > index 6f046f69ecca..f152c2b008ff 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll > @@ -18,9 +18,9 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr #0 > { > ; CHECK-P8-NEXT: xscvuxdsp f0, f0 > ; CHECK-P8-NEXT: xscvdpspn vs1, f1 > ; CHECK-P8-NEXT: xscvdpspn vs0, f0 > -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P8-NEXT: vmrghw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -30,12 +30,12 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr > #0 { > ; CHECK-P9-NEXT: xxswapd vs0, v2 > ; CHECK-P9-NEXT: xscvuxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 > +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 > ; CHECK-P9-NEXT: xxlor vs0, v2, v2 > ; CHECK-P9-NEXT: xscvuxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P9-NEXT: vmrglw v2, v2, v3 > +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P9-NEXT: vmrghw v2, 
v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -311,9 +311,9 @@ define i64 @test2elt_signed(<2 x i64> %a) > local_unnamed_addr #0 { > ; CHECK-P8-NEXT: xscvsxdsp f0, f0 > ; CHECK-P8-NEXT: xscvdpspn vs1, f1 > ; CHECK-P8-NEXT: xscvdpspn vs0, f0 > -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P8-NEXT: vmrghw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -323,12 +323,12 @@ define i64 @test2elt_signed(<2 x i64> %a) > local_unnamed_addr #0 { > ; CHECK-P9-NEXT: xxswapd vs0, v2 > ; CHECK-P9-NEXT: xscvsxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 > +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 > ; CHECK-P9-NEXT: xxlor vs0, v2, v2 > ; CHECK-P9-NEXT: xscvsxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P9-NEXT: vmrglw v2, v2, v3 > +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P9-NEXT: vmrghw v2, v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > ; > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll > index ce97ed67baa1..f2cb9f5f45fb 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll > @@ -24,9 +24,9 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P8-NEXT: xscvuxdsp f1, f1 > ; CHECK-P8-NEXT: xscvdpspn vs0, f0 > ; CHECK-P8-NEXT: xscvdpspn vs1, f1 > -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-P8-NEXT: vmrghw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd 
vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -43,12 +43,12 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr > #0 { > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > ; CHECK-P9-NEXT: vextubrx r3, r3, v2 > ; CHECK-P9-NEXT: clrlwi r3, r3, 24 > -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 > +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 > ; CHECK-P9-NEXT: mtfprwz f0, r3 > ; CHECK-P9-NEXT: xscvuxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P9-NEXT: vmrglw v2, v2, v3 > +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P9-NEXT: vmrghw v2, v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -81,11 +81,10 @@ define <4 x float> @test4elt(i32 %a.coerce) > local_unnamed_addr #1 { > ; CHECK-P8-LABEL: test4elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0@toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0@toc@l > +; CHECK-P8-NEXT: mtvsrwz v2, r3 > +; CHECK-P8-NEXT: addi r4, r4, .LCPI1_0@toc@l > ; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: lvx v3, 0, r3 > +; CHECK-P8-NEXT: lvx v3, 0, r4 > ; CHECK-P8-NEXT: vperm v2, v4, v2, v3 > ; CHECK-P8-NEXT: xvcvuxwsp v2, v2 > ; CHECK-P8-NEXT: blr > @@ -121,30 +120,28 @@ define void @test8elt(<8 x float>* noalias nocapture > sret %agg.result, i64 %a.co > ; CHECK-P8-LABEL: test8elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0@toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1@toc@ha > +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_1@toc@ha > +; CHECK-P8-NEXT: mtvsrd v2, r4 > ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0@toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1@toc@l > +; CHECK-P8-NEXT: addi r4, r6, .LCPI2_1@toc@l > ; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > +; CHECK-P8-NEXT: lvx v3, 0, r5 > ; 
CHECK-P8-NEXT: lvx v5, 0, r4 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 > -; CHECK-P8-NEXT: vperm v3, v4, v3, v5 > -; CHECK-P8-NEXT: xvcvuxwsp v2, v2 > +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 > +; CHECK-P8-NEXT: vperm v2, v4, v2, v5 > ; CHECK-P8-NEXT: xvcvuxwsp v3, v3 > -; CHECK-P8-NEXT: stvx v2, 0, r3 > -; CHECK-P8-NEXT: stvx v3, r3, r4 > +; CHECK-P8-NEXT: xvcvuxwsp v2, v2 > +; CHECK-P8-NEXT: stvx v3, 0, r3 > +; CHECK-P8-NEXT: stvx v2, r3, r4 > ; CHECK-P8-NEXT: blr > ; > ; CHECK-P9-LABEL: test8elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r4 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0 at toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0 at toc@l > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > ; CHECK-P9-NEXT: xxlxor v4, v4, v4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1 at toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1 at toc@l > @@ -292,9 +289,9 @@ define i64 @test2elt_signed(i16 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-NEXT: xscvsxdsp f1, f1 > ; CHECK-P8-NEXT: xscvdpspn vs0, f0 > ; CHECK-P8-NEXT: xscvdpspn vs1, f1 > -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-P8-NEXT: vmrglw v2, v3, v2 > +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-P8-NEXT: vmrghw v2, v3, v2 > ; CHECK-P8-NEXT: xxswapd vs0, v2 > ; CHECK-P8-NEXT: mffprd r3, f0 > ; CHECK-P8-NEXT: blr > @@ -311,12 +308,12 @@ define i64 @test2elt_signed(i16 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > ; CHECK-P9-NEXT: vextubrx r3, r3, v2 > ; CHECK-P9-NEXT: extsb r3, r3 > -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 > +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 > ; CHECK-P9-NEXT: mtfprwa f0, r3 > ; CHECK-P9-NEXT: xscvsxdsp f0, f0 > ; CHECK-P9-NEXT: xscvdpspn vs0, f0 > -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-P9-NEXT: vmrglw v2, v2, v3 > +; CHECK-P9-NEXT: xxsldwi v2, 
vs0, vs0, 3 > +; CHECK-P9-NEXT: vmrghw v2, v2, v3 > ; CHECK-P9-NEXT: mfvsrld r3, v2 > ; CHECK-P9-NEXT: blr > ; > @@ -349,11 +346,10 @@ define <4 x float> @test4elt_signed(i32 %a.coerce) > local_unnamed_addr #1 { > ; CHECK-P8-LABEL: test4elt_signed: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_0 at toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI5_0 at toc@l > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: lvx v3, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v2, v2, v3 > +; CHECK-P8-NEXT: mtvsrwz v3, r3 > +; CHECK-P8-NEXT: addi r4, r4, .LCPI5_0 at toc@l > +; CHECK-P8-NEXT: lvx v2, 0, r4 > +; CHECK-P8-NEXT: vperm v2, v3, v3, v2 > ; CHECK-P8-NEXT: vspltisw v3, 12 > ; CHECK-P8-NEXT: vadduwm v3, v3, v3 > ; CHECK-P8-NEXT: vslw v2, v2, v3 > @@ -392,15 +388,14 @@ define void @test8elt_signed(<8 x float>* noalias > nocapture sret %agg.result, i6 > ; CHECK-P8-LABEL: test8elt_signed: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0 at toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1 at toc@ha > +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_1 at toc@ha > +; CHECK-P8-NEXT: mtvsrd v3, r4 > ; CHECK-P8-NEXT: vspltisw v5, 12 > +; CHECK-P8-NEXT: li r4, 16 > ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0 at toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1 at toc@l > ; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: lvx v4, 0, r4 > -; CHECK-P8-NEXT: li r4, 16 > +; CHECK-P8-NEXT: addi r5, r6, .LCPI6_1 at toc@l > +; CHECK-P8-NEXT: lvx v4, 0, r5 > ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 > ; CHECK-P8-NEXT: vperm v3, v3, v3, v4 > ; CHECK-P8-NEXT: vadduwm v4, v5, v5 > @@ -416,14 +411,13 @@ define void @test8elt_signed(<8 x float>* noalias > nocapture sret %agg.result, i6 > ; > ; CHECK-P9-LABEL: test8elt_signed: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r4 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_0 at 
toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0 at toc@l > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vperm v3, v2, v2, v3 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1 at toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1 at toc@l > +; CHECK-P9-NEXT: vperm v3, v2, v2, v3 > ; CHECK-P9-NEXT: vextsb2w v3, v3 > ; CHECK-P9-NEXT: xvcvsxwsp vs0, v3 > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > > diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll > b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll > index b4582e844f30..268fc9b7d4cc 100644 > --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll > +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll > @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i16 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: test2elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r4, r2, .LCPI0_0 at toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0 at toc@l > +; CHECK-P8-NEXT: mtvsrwz v2, r3 > +; CHECK-P8-NEXT: addi r4, r4, .LCPI0_0 at toc@l > ; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: lvx v3, 0, r3 > +; CHECK-P8-NEXT: lvx v3, 0, r4 > ; CHECK-P8-NEXT: vperm v2, v4, v2, v3 > ; CHECK-P8-NEXT: xvcvuxddp v2, v2 > ; CHECK-P8-NEXT: blr > @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture > sret %agg.result, i32 %a.c > ; CHECK-P8-LABEL: test4elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0 at toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1 at toc@ha > +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1 at toc@ha > +; CHECK-P8-NEXT: mtvsrwz v2, r4 > ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0 at toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1 at toc@l > +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1 at toc@l > ; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > +; 
CHECK-P8-NEXT: lvx v3, 0, r5 > ; CHECK-P8-NEXT: lvx v5, 0, r4 > ; CHECK-P8-NEXT: li r4, 16 > -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 > -; CHECK-P8-NEXT: vperm v3, v4, v3, v5 > -; CHECK-P8-NEXT: xvcvuxddp vs0, v2 > -; CHECK-P8-NEXT: xvcvuxddp vs1, v3 > +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 > +; CHECK-P8-NEXT: vperm v2, v4, v2, v5 > +; CHECK-P8-NEXT: xvcvuxddp vs0, v3 > +; CHECK-P8-NEXT: xvcvuxddp vs1, v2 > ; CHECK-P8-NEXT: xxswapd vs0, vs0 > ; CHECK-P8-NEXT: xxswapd vs1, vs1 > ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4 > @@ -118,33 +116,32 @@ define void @test8elt(<8 x double>* noalias > nocapture sret %agg.result, i64 %a.c > ; CHECK-P8-LABEL: test8elt: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0 at toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_2 at toc@ha > +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_2 at toc@ha > +; CHECK-P8-NEXT: mtvsrd v2, r4 > +; CHECK-P8-NEXT: addis r4, r2, .LCPI2_3 at toc@ha > ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0 at toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_2 at toc@l > +; CHECK-P8-NEXT: addi r4, r4, .LCPI2_3 at toc@l > ; CHECK-P8-NEXT: xxlxor v4, v4, v4 > -; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_3 at toc@ha > -; CHECK-P8-NEXT: lvx v5, 0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1 at toc@ha > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: addi r5, r5, .LCPI2_3 at toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1 at toc@l > -; CHECK-P8-NEXT: lvx v0, 0, r5 > -; CHECK-P8-NEXT: lvx v1, 0, r4 > +; CHECK-P8-NEXT: lvx v3, 0, r5 > +; CHECK-P8-NEXT: addi r5, r6, .LCPI2_2 at toc@l > +; CHECK-P8-NEXT: lvx v0, 0, r4 > ; CHECK-P8-NEXT: li r4, 48 > +; CHECK-P8-NEXT: lvx v5, 0, r5 > +; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1 at toc@ha > +; CHECK-P8-NEXT: addi r5, r5, .LCPI2_1 at toc@l > +; CHECK-P8-NEXT: lvx v1, 0, r5 > +; CHECK-P8-NEXT: vperm v0, v4, v2, v0 > ; CHECK-P8-NEXT: li r5, 32 > -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 > -; CHECK-P8-NEXT: vperm v5, v4, 
v3, v5 > -; CHECK-P8-NEXT: vperm v0, v4, v3, v0 > -; CHECK-P8-NEXT: vperm v3, v4, v3, v1 > -; CHECK-P8-NEXT: xvcvuxddp vs0, v2 > -; CHECK-P8-NEXT: xvcvuxddp vs1, v5 > +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 > +; CHECK-P8-NEXT: vperm v5, v4, v2, v5 > +; CHECK-P8-NEXT: vperm v2, v4, v2, v1 > ; CHECK-P8-NEXT: xvcvuxddp vs2, v0 > -; CHECK-P8-NEXT: xvcvuxddp vs3, v3 > +; CHECK-P8-NEXT: xvcvuxddp vs0, v3 > +; CHECK-P8-NEXT: xvcvuxddp vs1, v5 > +; CHECK-P8-NEXT: xvcvuxddp vs3, v2 > +; CHECK-P8-NEXT: xxswapd vs2, vs2 > ; CHECK-P8-NEXT: xxswapd vs0, vs0 > ; CHECK-P8-NEXT: xxswapd vs1, vs1 > -; CHECK-P8-NEXT: xxswapd vs2, vs2 > ; CHECK-P8-NEXT: xxswapd vs3, vs3 > ; CHECK-P8-NEXT: stxvd2x vs2, r3, r4 > ; CHECK-P8-NEXT: li r4, 16 > @@ -155,11 +152,10 @@ define void @test8elt(<8 x double>* noalias > nocapture sret %agg.result, i64 %a.c > ; > ; CHECK-P9-LABEL: test8elt: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r4 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0 at toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0 at toc@l > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > ; CHECK-P9-NEXT: xxlxor v4, v4, v4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1 at toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1 at toc@l > @@ -404,14 +400,13 @@ define <2 x double> @test2elt_signed(i16 %a.coerce) > local_unnamed_addr #0 { > ; CHECK-P8-LABEL: test2elt_signed: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0 at toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r3 > -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0 at toc@l > -; CHECK-P8-NEXT: xxswapd v2, vs0 > -; CHECK-P8-NEXT: lvx v3, 0, r3 > +; CHECK-P8-NEXT: mtvsrwz v3, r3 > ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1 at toc@ha > +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0 at toc@l > ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1 at toc@l > +; CHECK-P8-NEXT: lvx v2, 0, r4 > ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3 > -; CHECK-P8-NEXT: vperm v2, v2, v2, v3 > +; CHECK-P8-NEXT: vperm v2, v3, v3, v2 > 
; CHECK-P8-NEXT: xxswapd v3, vs0 > ; CHECK-P8-NEXT: vsld v2, v2, v3 > ; CHECK-P8-NEXT: vsrad v2, v2, v3 > @@ -449,17 +444,16 @@ define void @test4elt_signed(<4 x double>* noalias > nocapture sret %agg.result, i > ; CHECK-P8-LABEL: test4elt_signed: > ; CHECK-P8: # %bb.0: # %entry > ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0 at toc@ha > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2 at toc@ha > -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l > -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2 at toc@l > -; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: lvx v4, 0, r4 > +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2 at toc@ha > +; CHECK-P8-NEXT: mtvsrwz v3, r4 > ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1 at toc@ha > +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l > ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1 at toc@l > +; CHECK-P8-NEXT: lvx v2, 0, r5 > +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2 at toc@l > ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 > ; CHECK-P8-NEXT: li r4, 16 > +; CHECK-P8-NEXT: lvx v4, 0, r5 > ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 > ; CHECK-P8-NEXT: vperm v3, v3, v3, v4 > ; CHECK-P8-NEXT: xxswapd v4, vs0 > @@ -523,26 +517,25 @@ entry: > define void @test8elt_signed(<8 x double>* noalias nocapture sret > %agg.result, i64 %a.coerce) local_unnamed_addr #1 { > ; CHECK-P8-LABEL: test8elt_signed: > ; CHECK-P8: # %bb.0: # %entry > -; CHECK-P8-NEXT: mtfprd f0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_2 at toc@ha > ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0 at toc@ha > -; CHECK-P8-NEXT: addis r6, r2, .LCPI6_3 at toc@ha > -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_2 at toc@l > +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_2 at toc@ha > +; CHECK-P8-NEXT: mtvsrd v3, r4 > +; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1 at toc@ha > ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0 at toc@l > -; CHECK-P8-NEXT: addi r6, r6, .LCPI6_3 at toc@l > -; CHECK-P8-NEXT: lvx v4, 0, r4 > -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_4 at toc@ha > +; CHECK-P8-NEXT: addi r6, r6, 
.LCPI6_2 at toc@l > +; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1 at toc@l > ; CHECK-P8-NEXT: lvx v2, 0, r5 > -; CHECK-P8-NEXT: xxswapd v3, vs0 > -; CHECK-P8-NEXT: lvx v5, 0, r6 > -; CHECK-P8-NEXT: addis r5, r2, .LCPI6_1 at toc@ha > -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_4 at toc@l > -; CHECK-P8-NEXT: addi r5, r5, .LCPI6_1 at toc@l > -; CHECK-P8-NEXT: lvx v0, 0, r4 > -; CHECK-P8-NEXT: lxvd2x vs0, 0, r5 > +; CHECK-P8-NEXT: addis r5, r2, .LCPI6_3 at toc@ha > +; CHECK-P8-NEXT: lvx v4, 0, r6 > +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_4 at toc@ha > +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 > ; CHECK-P8-NEXT: li r4, 48 > -; CHECK-P8-NEXT: li r5, 32 > +; CHECK-P8-NEXT: addi r5, r5, .LCPI6_3 at toc@l > +; CHECK-P8-NEXT: lvx v5, 0, r5 > +; CHECK-P8-NEXT: addi r5, r6, .LCPI6_4 at toc@l > +; CHECK-P8-NEXT: lvx v0, 0, r5 > ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 > +; CHECK-P8-NEXT: li r5, 32 > ; CHECK-P8-NEXT: vperm v4, v3, v3, v4 > ; CHECK-P8-NEXT: vperm v5, v3, v3, v5 > ; CHECK-P8-NEXT: vperm v3, v3, v3, v0 > @@ -572,14 +565,13 @@ define void @test8elt_signed(<8 x double>* noalias > nocapture sret %agg.result, i > ; > ; CHECK-P9-LABEL: test8elt_signed: > ; CHECK-P9: # %bb.0: # %entry > -; CHECK-P9-NEXT: mtfprd f0, r4 > +; CHECK-P9-NEXT: mtvsrd v2, r4 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_0 at toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0 at toc@l > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > -; CHECK-P9-NEXT: xxswapd v2, vs0 > -; CHECK-P9-NEXT: vperm v3, v2, v2, v3 > ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1 at toc@ha > ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1 at toc@l > +; CHECK-P9-NEXT: vperm v3, v2, v2, v3 > ; CHECK-P9-NEXT: vextsb2d v3, v3 > ; CHECK-P9-NEXT: xvcvsxddp vs0, v3 > ; CHECK-P9-NEXT: lxvx v3, 0, r4 > > diff --git > a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll > b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll > index 7e51f2b862ab..29955dc17f67 100644 > --- a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll > +++ 
b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll > @@ -82,10 +82,10 @@ define <3 x float> @constrained_vector_fdiv_v3f32() #0 > { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: xscvdpspn 2, 2 > ; PC64LE-NEXT: xscvdpspn 0, 0 > -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 > -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 > +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: blr > ; > @@ -106,12 +106,12 @@ define <3 x float> @constrained_vector_fdiv_v3f32() > #0 { > ; PC64LE9-NEXT: xsdivsp 2, 2, 0 > ; PC64LE9-NEXT: xsdivsp 0, 3, 0 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: blr > entry: > @@ -359,11 +359,11 @@ define <3 x float> @constrained_vector_frem_v3f32() > #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI7_4 at toc@l > ; PC64LE-NEXT: lvx 4, 0, 3 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 30 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: addi 1, 1, 64 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -401,15 +401,15 @@ define <3 x float> @constrained_vector_frem_v3f32() > #0 { > ; PC64LE9-NEXT: bl fmodf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; 
PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 29 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > ; PC64LE9-NEXT: addis 3, 2, .LCPI7_4 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI7_4 at toc@l > ; PC64LE9-NEXT: lxvx 36, 0, 3 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: addi 1, 1, 64 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -710,10 +710,10 @@ define <3 x float> @constrained_vector_fmul_v3f32() > #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: xscvdpspn 2, 2 > ; PC64LE-NEXT: xscvdpspn 0, 0 > -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 > -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 > +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: blr > ; > @@ -735,11 +735,11 @@ define <3 x float> @constrained_vector_fmul_v3f32() > #0 { > ; PC64LE9-NEXT: xsmulsp 1, 1, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > ; PC64LE9-NEXT: xscvdpspn 1, 1 > -; PC64LE9-NEXT: xxsldwi 34, 1, 1, 1 > +; PC64LE9-NEXT: xxsldwi 34, 1, 1, 3 > ; PC64LE9-NEXT: xscvdpspn 1, 2 > -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: blr > entry: > @@ -925,10 +925,10 @@ define <3 x float> @constrained_vector_fadd_v3f32() > #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: xscvdpspn 2, 2 > ; PC64LE-NEXT: xscvdpspn 0, 0 > -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 > -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > -; 
PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 > +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: blr > ; > @@ -945,15 +945,15 @@ define <3 x float> @constrained_vector_fadd_v3f32() > #0 { > ; PC64LE9-NEXT: xsaddsp 1, 0, 1 > ; PC64LE9-NEXT: xsaddsp 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 1 > ; PC64LE9-NEXT: addis 3, 2, .LCPI17_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI17_3 at toc@l > ; PC64LE9-NEXT: lxvx 36, 0, 3 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: blr > entry: > @@ -1137,10 +1137,10 @@ define <3 x float> > @constrained_vector_fsub_v3f32() #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: xscvdpspn 2, 2 > ; PC64LE-NEXT: xscvdpspn 0, 0 > -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 > -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 > +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: blr > ; > @@ -1157,15 +1157,15 @@ define <3 x float> > @constrained_vector_fsub_v3f32() #0 { > ; PC64LE9-NEXT: xssubsp 1, 0, 1 > ; PC64LE9-NEXT: xssubsp 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 1 > ; PC64LE9-NEXT: addis 3, 2, .LCPI22_3 at toc@ha > ; PC64LE9-NEXT: 
addi 3, 3, .LCPI22_3 at toc@l > ; PC64LE9-NEXT: lxvx 36, 0, 3 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: blr > entry: > @@ -1333,12 +1333,12 @@ define <3 x float> > @constrained_vector_sqrt_v3f32() #0 { > ; PC64LE-NEXT: xssqrtsp 2, 2 > ; PC64LE-NEXT: xscvdpspn 0, 0 > ; PC64LE-NEXT: xscvdpspn 1, 1 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 2 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: blr > ; > @@ -1358,10 +1358,10 @@ define <3 x float> > @constrained_vector_sqrt_v3f32() #0 { > ; PC64LE9-NEXT: xscvdpspn 0, 0 > ; PC64LE9-NEXT: xscvdpspn 1, 1 > ; PC64LE9-NEXT: xscvdpspn 2, 2 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: blr > @@ -1588,11 +1588,11 @@ define <3 x float> @constrained_vector_pow_v3f32() > #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI32_4 at toc@l > ; PC64LE-NEXT: lvx 4, 0, 3 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 30 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; 
PC64LE-NEXT: addi 1, 1, 64 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -1630,15 +1630,15 @@ define <3 x float> @constrained_vector_pow_v3f32() > #0 { > ; PC64LE9-NEXT: bl powf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 29 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > ; PC64LE9-NEXT: addis 3, 2, .LCPI32_4 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI32_4 at toc@l > ; PC64LE9-NEXT: lxvx 36, 0, 3 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: addi 1, 1, 64 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -1992,11 +1992,11 @@ define <3 x float> > @constrained_vector_powi_v3f32() #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI37_3 at toc@l > ; PC64LE-NEXT: lvx 4, 0, 3 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -2030,15 +2030,15 @@ define <3 x float> > @constrained_vector_powi_v3f32() #0 { > ; PC64LE9-NEXT: bl __powisf2 > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI37_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI37_3 at toc@l > ; PC64LE9-NEXT: lxvx 36, 0, 3 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; 
PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -2360,12 +2360,12 @@ define <3 x float> @constrained_vector_sin_v3f32() > #0 { > ; PC64LE-NEXT: addis 3, 2, .LCPI42_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI42_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -2396,15 +2396,15 @@ define <3 x float> @constrained_vector_sin_v3f32() > #0 { > ; PC64LE9-NEXT: bl sinf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI42_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI42_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -2709,12 +2709,12 @@ define <3 x float> @constrained_vector_cos_v3f32() > #0 { > ; PC64LE-NEXT: addis 3, 2, .LCPI47_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI47_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; 
PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -2745,15 +2745,15 @@ define <3 x float> @constrained_vector_cos_v3f32() > #0 { > ; PC64LE9-NEXT: bl cosf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI47_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI47_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -3058,12 +3058,12 @@ define <3 x float> @constrained_vector_exp_v3f32() > #0 { > ; PC64LE-NEXT: addis 3, 2, .LCPI52_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI52_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -3094,15 +3094,15 @@ define <3 x float> @constrained_vector_exp_v3f32() > #0 { > ; PC64LE9-NEXT: bl expf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI52_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI52_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -3407,12 +3407,12 @@ define <3 x float> > @constrained_vector_exp2_v3f32() #0 { > ; PC64LE-NEXT: addis 3, 2, .LCPI57_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI57_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -3443,15 +3443,15 @@ define <3 x float> > @constrained_vector_exp2_v3f32() #0 { > ; PC64LE9-NEXT: bl exp2f > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI57_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI57_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -3756,12 +3756,12 @@ define <3 x float> @constrained_vector_log_v3f32() > #0 { > ; 
PC64LE-NEXT: addis 3, 2, .LCPI62_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI62_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -3792,15 +3792,15 @@ define <3 x float> @constrained_vector_log_v3f32() > #0 { > ; PC64LE9-NEXT: bl logf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI62_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI62_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -4105,12 +4105,12 @@ define <3 x float> > @constrained_vector_log10_v3f32() #0 { > ; PC64LE-NEXT: addis 3, 2, .LCPI67_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI67_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) 
> @@ -4141,15 +4141,15 @@ define <3 x float> > @constrained_vector_log10_v3f32() #0 { > ; PC64LE9-NEXT: bl log10f > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI67_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI67_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -4454,12 +4454,12 @@ define <3 x float> > @constrained_vector_log2_v3f32() #0 { > ; PC64LE-NEXT: addis 3, 2, .LCPI72_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI72_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -4490,15 +4490,15 @@ define <3 x float> > @constrained_vector_log2_v3f32() #0 { > ; PC64LE9-NEXT: bl log2f > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI72_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI72_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; 
PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -4748,12 +4748,12 @@ define <3 x float> > @constrained_vector_rint_v3f32() #0 { > ; PC64LE-NEXT: xsrdpic 2, 2 > ; PC64LE-NEXT: xscvdpspn 0, 0 > ; PC64LE-NEXT: xscvdpspn 1, 1 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 2 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: blr > ; > @@ -4773,10 +4773,10 @@ define <3 x float> > @constrained_vector_rint_v3f32() #0 { > ; PC64LE9-NEXT: xscvdpspn 0, 0 > ; PC64LE9-NEXT: xscvdpspn 1, 1 > ; PC64LE9-NEXT: xscvdpspn 2, 2 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: blr > @@ -4947,12 +4947,12 @@ define <3 x float> > @constrained_vector_nearbyint_v3f32() #0 { > ; PC64LE-NEXT: addis 3, 2, .LCPI82_3 at toc@ha > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI82_3 at toc@l > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 31 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: 
addi 1, 1, 48 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -4983,15 +4983,15 @@ define <3 x float> > @constrained_vector_nearbyint_v3f32() #0 { > ; PC64LE9-NEXT: bl nearbyintf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 31 > ; PC64LE9-NEXT: addis 3, 2, .LCPI82_3 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI82_3 at toc@l > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: addi 1, 1, 48 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -5184,11 +5184,11 @@ define <3 x float> > @constrained_vector_maxnum_v3f32() #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI87_5 at toc@l > ; PC64LE-NEXT: lvx 4, 0, 3 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 30 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: addi 1, 1, 64 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -5227,15 +5227,15 @@ define <3 x float> > @constrained_vector_maxnum_v3f32() #0 { > ; PC64LE9-NEXT: bl fmaxf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 29 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > ; PC64LE9-NEXT: addis 3, 2, .LCPI87_5 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI87_5 at toc@l > ; PC64LE9-NEXT: lxvx 36, 0, 3 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; 
PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: addi 1, 1, 64 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -5471,11 +5471,11 @@ define <3 x float> > @constrained_vector_minnum_v3f32() #0 { > ; PC64LE-NEXT: xscvdpspn 1, 1 > ; PC64LE-NEXT: addi 3, 3, .LCPI92_5 at toc@l > ; PC64LE-NEXT: lvx 4, 0, 3 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 30 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 2, 3 > -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 2, 3 > +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 3, 2, 4 > ; PC64LE-NEXT: addi 1, 1, 64 > ; PC64LE-NEXT: ld 0, 16(1) > @@ -5514,15 +5514,15 @@ define <3 x float> > @constrained_vector_minnum_v3f32() #0 { > ; PC64LE9-NEXT: bl fminf > ; PC64LE9-NEXT: nop > ; PC64LE9-NEXT: xscvdpspn 0, 1 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 29 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: xscvdpspn 0, 30 > ; PC64LE9-NEXT: addis 3, 2, .LCPI92_5 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI92_5 at toc@l > ; PC64LE9-NEXT: lxvx 36, 0, 3 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 3, 2, 4 > ; PC64LE9-NEXT: addi 1, 1, 64 > ; PC64LE9-NEXT: ld 0, 16(1) > @@ -5686,9 +5686,9 @@ define <2 x float> > @constrained_vector_fptrunc_v2f64() #0 { > ; PC64LE-NEXT: xsrsp 1, 1 > ; PC64LE-NEXT: xscvdpspn 0, 0 > ; PC64LE-NEXT: xscvdpspn 1, 1 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > ; PC64LE-NEXT: 
blr > ; > ; PC64LE9-LABEL: constrained_vector_fptrunc_v2f64: > @@ -5698,12 +5698,12 @@ define <2 x float> > @constrained_vector_fptrunc_v2f64() #0 { > ; PC64LE9-NEXT: addis 3, 2, .LCPI96_1 at toc@ha > ; PC64LE9-NEXT: xsrsp 0, 0 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: lfd 0, .LCPI96_1 at toc@l(3) > ; PC64LE9-NEXT: xsrsp 0, 0 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: blr > entry: > %result = call <2 x float> > @llvm.experimental.constrained.fptrunc.v2f32.v2f64( > @@ -5729,12 +5729,12 @@ define <3 x float> > @constrained_vector_fptrunc_v3f64() #0 { > ; PC64LE-NEXT: xsrsp 2, 2 > ; PC64LE-NEXT: xscvdpspn 0, 0 > ; PC64LE-NEXT: xscvdpspn 1, 1 > -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE-NEXT: xscvdpspn 0, 2 > -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 > -; PC64LE-NEXT: vmrglw 2, 3, 2 > +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 > +; PC64LE-NEXT: vmrghw 2, 3, 2 > ; PC64LE-NEXT: lvx 3, 0, 3 > -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE-NEXT: vperm 2, 4, 2, 3 > ; PC64LE-NEXT: blr > ; > @@ -5745,20 +5745,20 @@ define <3 x float> > @constrained_vector_fptrunc_v3f64() #0 { > ; PC64LE9-NEXT: addis 3, 2, .LCPI97_1 at toc@ha > ; PC64LE9-NEXT: xsrsp 0, 0 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 > ; PC64LE9-NEXT: lfd 0, .LCPI97_1 at toc@l(3) > ; PC64LE9-NEXT: addis 3, 2, .LCPI97_2 at toc@ha > ; PC64LE9-NEXT: addi 3, 3, .LCPI97_2 at toc@l > ; PC64LE9-NEXT: xsrsp 0, 0 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 > -; PC64LE9-NEXT: vmrglw 2, 3, 2 > +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 > +; PC64LE9-NEXT: vmrghw 2, 3, 2 > ; PC64LE9-NEXT: lxvx 35, 0, 3 > ; PC64LE9-NEXT: addis 3, 2, 
.LCPI97_3 at toc@ha > ; PC64LE9-NEXT: lfd 0, .LCPI97_3 at toc@l(3) > ; PC64LE9-NEXT: xsrsp 0, 0 > ; PC64LE9-NEXT: xscvdpspn 0, 0 > -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 > +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 > ; PC64LE9-NEXT: vperm 2, 4, 2, 3 > ; PC64LE9-NEXT: blr > entry: > > diff --git a/llvm/test/CodeGen/PowerPC/vsx.ll > b/llvm/test/CodeGen/PowerPC/vsx.ll > index 8b4e3640ef6b..4a78218262ca 100644 > --- a/llvm/test/CodeGen/PowerPC/vsx.ll > +++ b/llvm/test/CodeGen/PowerPC/vsx.ll > @@ -1404,9 +1404,9 @@ define <2 x float> @test44(<2 x i64> %a) { > ; CHECK-LE-NEXT: xscvuxdsp f0, f0 > ; CHECK-LE-NEXT: xscvdpspn vs1, f1 > ; CHECK-LE-NEXT: xscvdpspn vs0, f0 > -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-LE-NEXT: vmrglw v2, v3, v2 > +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-LE-NEXT: vmrghw v2, v3, v2 > ; CHECK-LE-NEXT: blr > %v = uitofp <2 x i64> %a to <2 x float> > ret <2 x float> %v > @@ -1486,9 +1486,9 @@ define <2 x float> @test45(<2 x i64> %a) { > ; CHECK-LE-NEXT: xscvsxdsp f0, f0 > ; CHECK-LE-NEXT: xscvdpspn vs1, f1 > ; CHECK-LE-NEXT: xscvdpspn vs0, f0 > -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1 > -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1 > -; CHECK-LE-NEXT: vmrglw v2, v3, v2 > +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3 > +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3 > +; CHECK-LE-NEXT: vmrghw v2, v3, v2 > ; CHECK-LE-NEXT: blr > %v = sitofp <2 x i64> %a to <2 x float> > ret <2 x float> %v > @@ -2437,12 +2437,11 @@ define <2 x i32> @test80(i32 %v) { > ; > ; CHECK-LE-LABEL: test80: > ; CHECK-LE: # %bb.0: > -; CHECK-LE-NEXT: mtfprd f0, r3 > +; CHECK-LE-NEXT: mtfprwz f0, r3 > ; CHECK-LE-NEXT: addis r4, r2, .LCPI65_0 at toc@ha > ; CHECK-LE-NEXT: addi r3, r4, .LCPI65_0 at toc@l > -; CHECK-LE-NEXT: xxswapd vs0, vs0 > +; CHECK-LE-NEXT: xxspltw v2, vs0, 1 > ; CHECK-LE-NEXT: lvx v3, 0, r3 > -; CHECK-LE-NEXT: xxspltw v2, vs0, 3 > ; CHECK-LE-NEXT: vadduwm v2, v2, v3 > ; 
CHECK-LE-NEXT: blr > %b1 = insertelement <2 x i32> undef, i32 %v, i32 0 > > diff --git a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll > b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll > index 5c05f8dc3d81..a198604f79a4 100644 > --- a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll > +++ b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll > @@ -17,17 +17,15 @@ define <2 x double> @testi0(<2 x double>* %p1, double* > %p2) { > ; CHECK-NEXT: lxvd2x vs0, 0, r3 > ; CHECK-NEXT: lfdx f1, 0, r4 > ; CHECK-NEXT: xxswapd vs0, vs0 > -; CHECK-NEXT: xxspltd vs1, vs1, 0 > -; CHECK-NEXT: xxpermdi v2, vs0, vs1, 1 > +; CHECK-NEXT: xxmrghd v2, vs0, vs1 > ; CHECK-NEXT: blr > ; > ; CHECK-P9-VECTOR-LABEL: testi0: > ; CHECK-P9-VECTOR: # %bb.0: > ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3 > ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4 > -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0 > ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0 > -; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs0, vs1, 1 > +; CHECK-P9-VECTOR-NEXT: xxmrghd v2, vs0, vs1 > ; CHECK-P9-VECTOR-NEXT: blr > ; > ; CHECK-P9-LABEL: testi0: > @@ -51,17 +49,15 @@ define <2 x double> @testi1(<2 x double>* %p1, double* > %p2) { > ; CHECK-NEXT: lxvd2x vs0, 0, r3 > ; CHECK-NEXT: lfdx f1, 0, r4 > ; CHECK-NEXT: xxswapd vs0, vs0 > -; CHECK-NEXT: xxspltd vs1, vs1, 0 > -; CHECK-NEXT: xxmrgld v2, vs1, vs0 > +; CHECK-NEXT: xxpermdi v2, vs1, vs0, 1 > ; CHECK-NEXT: blr > ; > ; CHECK-P9-VECTOR-LABEL: testi1: > ; CHECK-P9-VECTOR: # %bb.0: > ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3 > ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4 > -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0 > ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0 > -; CHECK-P9-VECTOR-NEXT: xxmrgld v2, vs1, vs0 > +; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs1, vs0, 1 > ; CHECK-P9-VECTOR-NEXT: blr > ; > ; CHECK-P9-LABEL: testi1: > > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits > 
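Note on the quoted test updates: a recurring change in the diff above replaces `xxsldwi ..., 1` followed by `vmrglw` with `xxsldwi ..., 3` followed by `vmrghw`. The C++ below is only a rough model of those three instructions, written from my reading of the Power ISA descriptions (words numbered 0..3 from the left). It is not taken from the patch, and the helper names `oldSequence` and `newSequence` are made up for illustration. Under that model the two sequences build the same vector when both `xxsldwi` sources are the same register, which is the case in the sequences quoted here. The excerpt does not include the commit message, so this says nothing about why the codegen changed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four 32-bit words, numbered 0..3 from the left (the word numbering used in
// the ISA instruction descriptions). This is a model, not real vector code.
using Vec = std::array<uint32_t, 4>;

// xxsldwi XT, XA, XB, SHW: concatenate XA || XB, shift left by SHW words and
// keep the leftmost four words. With XA == XB this is a word rotate.
Vec xxsldwi(const Vec &xa, const Vec &xb, unsigned shw) {
  assert(shw < 4 && "SHW is a 2-bit immediate");
  Vec out{};
  for (unsigned i = 0; i < 4; ++i) {
    unsigned src = i + shw; // index into the 8-word concatenation
    out[i] = src < 4 ? xa[src] : xb[src - 4];
  }
  return out;
}

// vmrghw VRT, VRA, VRB: interleave the two leftmost words of each input.
Vec vmrghw(const Vec &a, const Vec &b) { return {a[0], b[0], a[1], b[1]}; }

// vmrglw VRT, VRA, VRB: interleave the two rightmost words of each input.
Vec vmrglw(const Vec &a, const Vec &b) { return {a[2], b[2], a[3], b[3]}; }

// Shape of the '-' lines: rotate each input by one word, then merge low.
// (Hypothetical helper name, for illustration only.)
Vec oldSequence(const Vec &x, const Vec &y) {
  return vmrglw(xxsldwi(y, y, 1), xxsldwi(x, x, 1));
}

// Shape of the '+' lines: rotate each input by three words, then merge high.
// (Hypothetical helper name, for illustration only.)
Vec newSequence(const Vec &x, const Vec &y) {
  return vmrghw(xxsldwi(y, y, 3), xxsldwi(x, x, 3));
}
```

With x = {10, 11, 12, 13} and y = {20, 21, 22, 23}, both helpers should return {23, 13, 20, 10} in this model.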
From llvm-commits at lists.llvm.org Mon Jul 6 17:13:57 2020
From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:13:57 +0000 (UTC)
Subject: [PATCH] D81166: [Matrix] Add matrix_nuw/matrix_nsw operand bundles for matrix.multiply.
In-Reply-To:
References:
Message-ID: <0fbd29cbf140438053a3c7f1a2b10a49@localhost.localdomain>

reames added a comment.

(drive by comments only, please don't block on me)

Reading over this, I find myself wondering if this is actually matrix specific. Would it make sense to have a means to declare that the operations in a reduce.add are nsw/nuw?

As a semantic clarification, do the nsw/nuw markers on the matrix make any assumptions about order of operations?

================
Comment at: llvm/docs/LangRef.rst:2268
+``@llvm.matrix.multiply.*`` intrinsic and at most one of each matrix operand
+bundle can be attached to a call.
+
----------------
The LangRef text is not sufficient to describe the syntax. I know this only because I went and read your verifier changes. Please clarify, ideally with an example.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81166/new/

https://reviews.llvm.org/D81166

From llvm-commits at lists.llvm.org Mon Jul 6 17:28:13 2020
From: llvm-commits at lists.llvm.org (Ruiling, Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:28:13 +0000 (UTC)
Subject: [PATCH] D83020: [AMDGPU] Avoid using s_cmpk when src0 is not register
In-Reply-To:
References:
Message-ID: <9d526f36c859f372b1fc06ea992d4f0e@localhost.localdomain>

ruiling marked an inline comment as done.
ruiling added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/cmp_shrink.mir:5
+# GCN: bb.0:
+# GCN-NOT: S_CMPK_GT_I32
+---
----------------
arsenm wrote:
> ruiling wrote:
> > arsenm wrote:
> > > positive checks are more useful.
> > > Also you can just generate these checks. Can you reproduce this with an IR test too?
> > Will try a positive check. How do I generate the checks? Could you give a little more info? I think the original test case that hit the issue is overly complex. Normally, a constant expression at the IR level is easily optimized away by the middle-end, so I think a .mir test is enough for this issue.
> So what is the context in which this appears? Why wasn't it optimized out?

Well, I haven't yet checked the program carefully enough to understand why the optimizations in LLVM fail to optimize it away, but I think that is a separate problem that is worth a careful investigation. I will investigate and try to optimize it away later. But I think this patch can be merged, right? Can anyone help merge it? I don't have commit access.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83020/new/

https://reviews.llvm.org/D83020

From llvm-commits at lists.llvm.org Mon Jul 6 17:28:29 2020
From: llvm-commits at lists.llvm.org (Wolfgang Pieb via llvm-commits)
Date: Mon, 06 Jul 2020 17:28:29 -0700 (PDT)
Subject: [llvm] 1293874 - Correct 3 spelling errors in headers and doc strings.
Message-ID: <5f03c1ad.1c69fb81.3f01b.184e@mx.google.com>

Author: Wolfgang Pieb
Date: 2020-07-06T17:27:51-07:00
New Revision: 129387497e582ae96de41c56083fe52fce68ba91

URL: https://github.com/llvm/llvm-project/commit/129387497e582ae96de41c56083fe52fce68ba91
DIFF: https://github.com/llvm/llvm-project/commit/129387497e582ae96de41c56083fe52fce68ba91.diff

LOG: Correct 3 spelling errors in headers and doc strings.

Added: Modified: llvm/include/llvm/DebugInfo/GSYM/InlineInfo.h llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/DebugInfo/GSYM/InlineInfo.h b/llvm/include/llvm/DebugInfo/GSYM/InlineInfo.h index 3b95e3e050bd..06126da7d007 100644 --- a/llvm/include/llvm/DebugInfo/GSYM/InlineInfo.h +++ b/llvm/include/llvm/DebugInfo/GSYM/InlineInfo.h @@ -46,7 +46,7 @@ class GsymReader; /// also makes any encoded addresses easy to relocate as we just need to /// relocate the FunctionInfo's start address. /// -/// - The AddressRanges member "Ranges" is encoded using an approriate base +/// - The AddressRanges member "Ranges" is encoded using an appropriate base /// address as described above. /// - UINT8 boolean value that specifies if the InlineInfo object has children. /// - UINT32 string table offset that points to the name of the inline diff --git a/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp b/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp index 1f8285eadb7e..29d54b26135a 100644 --- a/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp +++ b/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp @@ -30,7 +30,7 @@ STATISTIC(NumLFENCEsInserted, "Number of lfence instructions inserted"); static cl::opt EnableSpeculativeExecutionSideEffectSuppression( "x86-seses-enable", - cl::desc("Force enable speculative execution side effect suppresion. " + cl::desc("Force enable speculative execution side effect suppression. 
" "(Note: User must pass -mlvi-cfi in order to mitigate indirect " "branches and returns.)"), cl::init(false), cl::Hidden); @@ -153,5 +153,5 @@ FunctionPass *llvm::createX86SpeculativeExecutionSideEffectSuppression() { } INITIALIZE_PASS(X86SpeculativeExecutionSideEffectSuppression, "x86-seses", - "X86 Speculative Execution Side Effect Suppresion", false, + "X86 Speculative Execution Side Effect Suppression", false, false) From llvm-commits at lists.llvm.org Mon Jul 6 17:36:21 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 00:36:21 +0000 (UTC) Subject: [PATCH] D82919: [SampleFDO] Enable sample-profile-top-down-load by default. In-Reply-To: References: Message-ID: <790c57098982574dceee5a9fa062e984@localhost.localdomain> wmi updated this revision to Diff 275867. wmi added a comment. Disable sample-profile-merge-inlinee when sample-profile-top-down-load is not effective (Currently sample-profile-top-down-load is only effective for new pass manager). Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82919/new/ https://reviews.llvm.org/D82919 Files: llvm/lib/Transforms/IPO/SampleProfile.cpp llvm/test/Transforms/SampleProfile/inline-mergeprof.ll llvm/test/Transforms/SampleProfile/inline-topdown.ll Index: llvm/test/Transforms/SampleProfile/inline-topdown.ll =================================================================== --- llvm/test/Transforms/SampleProfile/inline-topdown.ll +++ llvm/test/Transforms/SampleProfile/inline-topdown.ll @@ -1,10 +1,10 @@ ; Note that this needs new pass manager for now. Passing `-sample-profile-top-down-load` to legacy pass manager is a no-op. 
; Test we aren't doing specialization for inlining with default source order -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -S | FileCheck -check-prefix=DEFAULT %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-top-down-load=false -S | FileCheck -check-prefix=DEFAULT %s ; Test we specialize based on call path with context-sensitive profile while inlining with '-sample-profile-top-down-load' -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load -S | FileCheck -check-prefix=TOPDOWN %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load=true -S | FileCheck -check-prefix=TOPDOWN %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 Index: llvm/test/Transforms/SampleProfile/inline-mergeprof.ll =================================================================== --- llvm/test/Transforms/SampleProfile/inline-mergeprof.ll +++ llvm/test/Transforms/SampleProfile/inline-mergeprof.ll @@ -1,10 +1,10 @@ ; Test we lose details of not inlined profile without '-sample-profile-merge-inlinee' -; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof 
-sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s ; Test we properly merge not inlined profile properly with '-sample-profile-merge-inlinee' -; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 Index: llvm/lib/Transforms/IPO/SampleProfile.cpp =================================================================== --- llvm/lib/Transforms/IPO/SampleProfile.cpp +++ llvm/lib/Transforms/IPO/SampleProfile.cpp @@ -148,14 +148,17 @@ "be accurate. It may be overriden by profile-sample-accurate. ")); static cl::opt ProfileMergeInlinee( - "sample-profile-merge-inlinee", cl::Hidden, cl::init(false), + "sample-profile-merge-inlinee", cl::Hidden, cl::init(true), cl::desc("Merge past inlinee's profile to outline version if sample " - "profile loader decided not to inline a call site.")); + "profile loader decided not to inline a call site. It will " + "only be enabled when top-down order of profile loading is " + "enabled. ")); static cl::opt ProfileTopDownLoad( - "sample-profile-top-down-load", cl::Hidden, cl::init(false), + "sample-profile-top-down-load", cl::Hidden, cl::init(true), cl::desc("Do profile annotation and inlining for functions in top-down " - "order of call graph during sample profile loading.")); + "order of call graph during sample profile loading. It only " + "works for new pass manager. 
")); static cl::opt ProfileSizeInline( "sample-profile-inline-size", cl::Hidden, cl::init(false), @@ -1785,6 +1788,15 @@ FunctionOrderList.reserve(M.size()); if (!ProfileTopDownLoad || CG == nullptr) { + if (ProfileMergeInlinee) { + // Disable ProfileMergeInlinee if profile is not loaded in top down order, + // because the profile for a function may be used for the profile + // annotation of its outline copy before the profile merging of its + // non-inlined inline instances, and that is not the way how + // ProfileMergeInlinee is supposed to work. + ProfileMergeInlinee = false; + } + for (Function &F : M) if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile")) FunctionOrderList.push_back(&F); -------------- next part -------------- A non-text attachment was scrubbed... Name: D82919.275867.patch Type: text/x-patch Size: 5156 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 17:37:20 2020 From: llvm-commits at lists.llvm.org (Nico Weber via llvm-commits) Date: Mon, 06 Jul 2020 17:37:20 -0700 (PDT) Subject: [llvm] 003ea14 - fix typos to cycle bots Message-ID: <5f03c3c0.1c69fb81.14908.662d@mx.google.com> Author: Nico Weber Date: 2020-07-06T20:37:11-04:00 New Revision: 003ea142205a927f5e444978515705927c3fb0f2 URL: https://github.com/llvm/llvm-project/commit/003ea142205a927f5e444978515705927c3fb0f2 DIFF: https://github.com/llvm/llvm-project/commit/003ea142205a927f5e444978515705927c3fb0f2.diff LOG: fix typos to cycle bots Added: Modified: llvm/docs/CommandGuide/FileCheck.rst Removed: ################################################################################ diff --git a/llvm/docs/CommandGuide/FileCheck.rst b/llvm/docs/CommandGuide/FileCheck.rst index 1e69c76d2c01..cb5db00c7b12 100644 --- a/llvm/docs/CommandGuide/FileCheck.rst +++ b/llvm/docs/CommandGuide/FileCheck.rst @@ -696,7 +696,7 @@ The syntax of a numeric substitution is A numeric operand is a previously defined numeric variable, an integer literal, or a function. 
Spaces are accepted before, after and between any of these elements. Numeric operands have 64-bit precision. Overflow and underflow
- are rejected. There is no support for operator precendence, but parentheses
+ are rejected. There is no support for operator precedence, but parentheses
 can be used to change the evaluation order.

 The supported operators are:
@@ -715,7 +715,7 @@ The syntax of a function call is ``()`` where:
 * mul - Returns the product of its two operands.
 * sub - Returns the difference of its two operands.
-* ```` is a comma seperated list of expressions.
+* ```` is a comma separated list of expressions.

 For example:

From llvm-commits at lists.llvm.org Mon Jul 6 17:37:37 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:37:37 +0000 (UTC)
Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc==
In-Reply-To:
References:
Message-ID: <4878fc1742934e523ee465af8f7d2105@localhost.localdomain>

MaskRay updated this revision to Diff 275868.
MaskRay retitled this revision from "[ELF] Add -z dead-nonalloc-reloc==" to "[ELF] Add -z dead-reloc-in-nonalloc==".
MaskRay edited the summary of this revision.
MaskRay added reviewers: grimar, jhenderson, psmith, ruiu, thakis.
MaskRay added a comment.

Add more tests

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83264/new/

https://reviews.llvm.org/D83264

Files:
  lld/ELF/Config.h
  lld/ELF/Driver.cpp
  lld/ELF/InputSection.cpp
  lld/ELF/Options.td
  lld/docs/ld.lld.1
  lld/test/ELF/dead-reloc-in-nonalloc.s
  lld/test/ELF/debug-dead-reloc.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83264.275868.patch
Type: text/x-patch
Size: 8379 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 17:44:11 2020
From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:44:11 +0000 (UTC)
Subject: [PATCH] D82129: [DebugInfo] Drop location ranges for variables which exist entirely outside the variable's scope
In-Reply-To:
References:
Message-ID: <92302612b0a78e5a98105788ed89d022@localhost.localdomain>

dblaikie added a comment.

In D82129#2134241, @probinson wrote:

> I think I didn't fully grasp that the blocks were being (tail-)merged, which makes the scope ambiguous, and all the rest. So I withdraw the objection on that basis. DWARF is fine with multiple variables pointing to the same location, but it's less forgiving about scopes IIRC, much like it can't describe multiple source attributions for an instruction. This all makes me sad, but that's how DWARF is at the moment.
>
> Is there still an open question about whether this wants to be a cleanup pass or a verifier check? I apologize for losing track.

My take on it is that it's probably not practical to do this as a cleanup - it'd mean any time we merge debug locations, etc, we'd have to go check for isolated variable locations that have become invalid. (though, inversely: I worry that not cleaning up those variable locations might be a source of IR bloat and algorithmic scaling problems when the debug locations are scanned...
) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82129/new/ https://reviews.llvm.org/D82129 From llvm-commits at lists.llvm.org Mon Jul 6 17:44:39 2020 From: llvm-commits at lists.llvm.org (LLVM GN Syncbot via llvm-commits) Date: Mon, 06 Jul 2020 17:44:39 -0700 (PDT) Subject: [llvm] bfa8bda - [gn build] Port Message-ID: <5f03c577.1c69fb81.2f9dd.8917@mx.google.com> Author: LLVM GN Syncbot Date: 2020-07-07T00:37:49Z New Revision: bfa8bda0460dc3883605735bd3dd6ce5c1252549 URL: https://github.com/llvm/llvm-project/commit/bfa8bda0460dc3883605735bd3dd6ce5c1252549 DIFF: https://github.com/llvm/llvm-project/commit/bfa8bda0460dc3883605735bd3dd6ce5c1252549.diff LOG: [gn build] Port Added: Modified: llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn index 34e99f4fe32a..efa80c1b86d8 100644 --- a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn @@ -17,7 +17,6 @@ executable("llvm-reduce") { "deltas/ReduceGlobalVars.cpp", "deltas/ReduceInstructions.cpp", "deltas/ReduceMetadata.cpp", - "deltas/ReduceOperandBundes.cpp", "llvm-reduce.cpp", ] } From llvm-commits at lists.llvm.org Mon Jul 6 17:44:41 2020 From: llvm-commits at lists.llvm.org (LLVM GN Syncbot via llvm-commits) Date: Mon, 06 Jul 2020 17:44:41 -0700 (PDT) Subject: [llvm] 7a32589 - [gn build] Port 05f2b5ccfc5 Message-ID: <5f03c579.1c69fb81.df72e.2182@mx.google.com> Author: LLVM GN Syncbot Date: 2020-07-07T00:37:49Z New Revision: 7a3258912c4e86ea4f3d6e1ccf72d090d9bb299c URL: https://github.com/llvm/llvm-project/commit/7a3258912c4e86ea4f3d6e1ccf72d090d9bb299c DIFF: https://github.com/llvm/llvm-project/commit/7a3258912c4e86ea4f3d6e1ccf72d090d9bb299c.diff LOG: [gn build] Port 05f2b5ccfc5 Added: Modified: 
llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn index efa80c1b86d8..efb8e40850c3 100644 --- a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn @@ -17,6 +17,7 @@ executable("llvm-reduce") { "deltas/ReduceGlobalVars.cpp", "deltas/ReduceInstructions.cpp", "deltas/ReduceMetadata.cpp", + "deltas/ReduceOperandBundles.cpp", "llvm-reduce.cpp", ] } From llvm-commits at lists.llvm.org Mon Jul 6 17:46:05 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 00:46:05 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: MaskRay accepted this revision. MaskRay added a comment. Some nits. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/mips-plt.test:64 +# GNU-NEXT: 0041081c 004007c0 00000000 FUNC UND puts +# GNU-NEXT: 00410820 004007c0 00000000 FUNC UND __libc_start_main + ---------------- `# ` prepending can be committed separately. ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:3024 + if (!DtPltGot) + return createError("cannot find PLTGOT dynamic table tag"); + if (!DtLocalGotNum) ---------------- The canonical term is dynamic tag, not dynamic table tag. 
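To illustrate the direction of this refactoring (return an error the caller can report instead of calling report_fatal_error()), here is a small standalone sketch. It deliberately avoids LLVM headers: `GotInfoOrError` is a plain std::variant stand-in for llvm::Expected, and `DynEntry`, `GotInfo`, `findTag` and `parseGotTags` are hypothetical names, not the ones in ELFDumper.cpp. The first message uses the "dynamic tag" wording suggested above; the second message is invented for the example.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Stand-in for an ELF dynamic table entry (d_tag, d_val).
struct DynEntry {
  int64_t Tag;
  uint64_t Val;
};

// Tag values as defined by the ELF gABI and the MIPS psABI.
constexpr int64_t DT_PLTGOT = 3;
constexpr int64_t DT_MIPS_LOCAL_GOTNO = 0x7000000a;

// Hypothetical result type: the values a GOT parser would need.
struct GotInfo {
  uint64_t PltGot;
  uint64_t LocalGotNum;
};

// Plain C++ stand-in for llvm::Expected<GotInfo>: a value or a message.
using GotInfoOrError = std::variant<GotInfo, std::string>;

// Return the value of the first entry with the given tag, if any.
static std::optional<uint64_t> findTag(const std::vector<DynEntry> &Dyn,
                                       int64_t Tag) {
  for (const DynEntry &E : Dyn)
    if (E.Tag == Tag)
      return E.Val;
  return std::nullopt;
}

// A missing tag is reported to the caller rather than terminating the
// process, so a dumper can print a warning and continue with the file.
GotInfoOrError parseGotTags(const std::vector<DynEntry> &Dyn) {
  std::optional<uint64_t> PltGot = findTag(Dyn, DT_PLTGOT);
  if (!PltGot)
    return std::string("cannot find PLTGOT dynamic tag");

  std::optional<uint64_t> LocalGotNum = findTag(Dyn, DT_MIPS_LOCAL_GOTNO);
  if (!LocalGotNum) // message text below is an assumption for this sketch
    return std::string("cannot find MIPS_LOCAL_GOTNO dynamic tag");

  return GotInfo{*PltGot, *LocalGotNum};
}
```

A caller would check which alternative is held and either use the `GotInfo` or print the message as a warning.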
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 From llvm-commits at lists.llvm.org Mon Jul 6 17:48:09 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 00:48:09 +0000 (UTC) Subject: [PATCH] D81704: [WebAssembly] Adding 64-bit version of R_WASM_MEMORY_ADDR_* relocs In-Reply-To: References: Message-ID: <08f6e4d7243aab5d42512683ef677ddd@localhost.localdomain> thakis added inline comments. ================ Comment at: llvm/lib/Target/WebAssembly/TargetInfo/WebAssemblyTargetInfo.cpp:40 +#define GET_INSTRINFO_ENUM 1 +#include "WebAssemblyGenInstrInfo.inc" ---------------- aardappel wrote: > aardappel wrote: > > thakis wrote: > > > This is a bit awkward. It makes WebAssembly the only target that has its TargetInfo dir depend on a llvm-tblgen generated file. It doesn't cause actual issues, but it's irregular. > > > > > > Make of this what you will :) > > I'd be happy to move this something else.. we just don't have any other .cpp that is shared between CodeGen and MC. I could add a new lib that both depend on? @dschuff > Actually, the cleaner solution would be to have tablegen (or whatever creates the instruction mappings) not stick both generated functions in the same #ifdef, which is the root cause of this having to sit in a shared location in the first place. Or make the functions static. But that potentially affects other targets, etc. This came back to bite me today (or, well, actually, a few weeks ago, but I didn't notice until today): In the GN build (which isn't supported and which you don't have to care about; it's just FYI, nothing to act on), the .inc files get generated in the Target sub directory they best fit into. Due to this change, I had moved WebAssemblyGenInstrInfo.inc from WebAssembly/MCTargetDesc to WebAssembly/TargetInfo. 
I didn't realize that this had the effect that I now had a correct WebAssemblyGenInstrInfo.inc and a stale WebAssemblyGenInstrInfo.inc in different directories in build dirs on the bots, and they happened to pick up the stale one. Since the .inc include is unqualified, a wasm .td change today broke my bots. The hack fix was to delete the stale .inc file. The cmake build puts all target llvm-tblgen output in lib/Target/$targetname so it's not a problem there (the GN build should do that too), but I thought it's a nice example of how "this looks a bit funny" turned into an actual problem down the road. Again, nothing for you to do here :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81704/new/

https://reviews.llvm.org/D81704

From llvm-commits at lists.llvm.org Mon Jul 6 17:52:23 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:52:23 +0000 (UTC)
Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable
In-Reply-To:
References:
Message-ID: <46e41b041d596c079c07d716fd5ee9fb@localhost.localdomain>

MaskRay added a comment.

Generally looks good, can you also add a test which will trigger an overflow with your previous revision?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15750
+  // are too large.
+  unsigned Bits = ConstNode.getScalarValueSizeInBits();
+  if (Bits > 8 * sizeof(int64_t))
----------------
Nit: const unsigned Bits

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15757
+  const APInt &C2 = C2Node->getAPIntValue();
+  // Prevent the transform since c1*c2 is overflow.
+  if ((C1 * C2).getBitWidth() > ConstNode.getScalarValueSizeInBits())
----------------
// Prevent the transform if c1*c2 overflows.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15758
+  // Prevent the transform since c1*c2 is overflow.
+  if ((C1 * C2).getBitWidth() > ConstNode.getScalarValueSizeInBits())
+    return false;
----------------
ConstNode.getScalarValueSizeInBits() -> Bits

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15761
+  // Do sign extension for c1*c2 according to c2's type.
+  int64_t C1C2 = llvm::SignExtend64((C1 * C2).getZExtValue(), Bits);
+  // This transform will introduce regression, if c1 is legal add
----------------
Store `C1 * C2` in a variable. Please don't repeat multiplication.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83153/new/

https://reviews.llvm.org/D83153

From llvm-commits at lists.llvm.org Mon Jul 6 17:53:18 2020
From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:53:18 +0000 (UTC)
Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted
In-Reply-To:
References:
Message-ID:

dblaikie added a comment.

In D82975#2134347, @probinson wrote:

> In D82975#2134132, @dblaikie wrote:
>
> > In D82975#2128353, @dstenb wrote:
> >
> > > In D82975#2127201, @SouraVX wrote:
> > >
> > > > I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the `.debug_macro` section with `version` 4 and teach the `llvm-dwarfdump` to parse it correctly.
> > >
> > >
> > > One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?
> >
> >
> > Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.
>
>
> My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them.
> Easy enough to check whether GCC's `-fdebug-macro` with v4 is emitting a .debug_str_offsets section.
>
> LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.

GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand:

  $ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug"
    1 .debug_info.dwo        0000003c 0000000000000000
    2 .debug_abbrev.dwo      0000003e 0000000000000000
    3 .debug_macro.dwo       0000001e 0000000000000000
    4 .debug_macro.dwo       00000364 0000000000000000
    5 .debug_macro.dwo       00000013 0000000000000000
    6 .debug_line.dwo        00000048 0000000000000000
    7 .debug_str_offsets.dwo 000002d5 0000000000000000
    8 .debug_str.dwo         00000e05 0000000000000000
  $ llvm-dwarfdump-tot main.dwo -debug-macro
  main.dwo: file format elf64-x86-64

  .debug_macro.dwo contents:
  0x00000000:
   - lineno: 19 macro: DW_MACINFO_invalid

I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82975/new/

https://reviews.llvm.org/D82975

From llvm-commits at lists.llvm.org Mon Jul 6 17:55:19 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:55:19 +0000 (UTC)
Subject: [PATCH] D83159: [RISCV] Add a new codegen test
In-Reply-To:
References:
Message-ID:

MaskRay added a comment.

Optional: you can add `[test]` to the subject, i.e. `[RISCV][test]` to make it clear this patch is completely about tests. This is even stronger than `NFC` (non-functional change).

================
Comment at: llvm/test/CodeGen/RISCV/addimm-mulimm.ll:1
+; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefix=RV32IM %s
----------------
It'd be better adding a file-level comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83159/new/

https://reviews.llvm.org/D83159

From llvm-commits at lists.llvm.org Mon Jul 6 17:56:28 2020
From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 00:56:28 +0000 (UTC)
Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections.
In-Reply-To:
References:
Message-ID: <656003f9c853a035248c31e41d961ab9@localhost.localdomain>

dblaikie added inline comments.

================
Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9
+# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o
+# RUN: llvm-readobj --sections --section-data %t1.le.o | \
+# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE
+
----------------
Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?)

Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Mon Jul 6 17:56:29 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via llvm-commits) Date: Mon, 06 Jul 2020 17:56:29 -0700 (PDT) Subject: [llvm] 10c82ee - Revert "[LV] Enable the LoopVectorizer to create pointer inductions" Message-ID: <5f03c83d.1c69fb81.70ffa.ac82@mx.google.com> Author: Jordan Rupprecht Date: 2020-07-06T17:50:38-07:00 New Revision: 10c82eecbcb7d9f000f6640b26c854843a78f091 URL: https://github.com/llvm/llvm-project/commit/10c82eecbcb7d9f000f6640b26c854843a78f091 DIFF: https://github.com/llvm/llvm-project/commit/10c82eecbcb7d9f000f6640b26c854843a78f091.diff LOG: Revert "[LV] Enable the LoopVectorizer to create pointer inductions" This reverts commit a8fe12065ec8137e55a6a8b35dd5355477c2ac16. It causes a crash when building gzip. Will post the detailed reduced test case to D81267. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll ################################################################################ diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index e3e0727b6b38..26f2aa0073e1 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -4201,66 +4201,26 @@ void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF, case InductionDescriptor::IK_PtrInduction: { // Handle the pointer induction variable case. assert(P->getType()->isPointerTy() && "Unexpected type."); - - if (Cost->isScalarAfterVectorization(P, VF)) { - // This is the normalized GEP that starts counting at zero. - Value *PtrInd = - Builder.CreateSExtOrTrunc(Induction, II.getStep()->getType()); - // Determine the number of scalars we need to generate for each unroll - // iteration. 
If the instruction is uniform, we only need to generate the - // first lane. Otherwise, we generate all VF values. - unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF; - for (unsigned Part = 0; Part < UF; ++Part) { - for (unsigned Lane = 0; Lane < Lanes; ++Lane) { - Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF); - Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx); - Value *SclrGep = - emitTransformedIndex(Builder, GlobalIdx, PSE.getSE(), DL, II); - SclrGep->setName("next.gep"); - VectorLoopValueMap.setScalarValue(P, {Part, Lane}, SclrGep); - } - } - return; - } - assert(isa(II.getStep()) && - "Induction step not a SCEV constant!"); - Type *PhiType = II.getStep()->getType(); - - // Build a pointer phi - Value *ScalarStartValue = II.getStartValue(); - Type *ScStValueType = ScalarStartValue->getType(); - PHINode *NewPointerPhi = - PHINode::Create(ScStValueType, 2, "pointer.phi", Induction); - NewPointerPhi->addIncoming(ScalarStartValue, LoopVectorPreHeader); - - // A pointer induction, performed by using a gep - const SCEV *ScalarStep = II.getStep(); - SCEVExpander Exp(*PSE.getSE(), DL, "induction"); - Value *ScalarStepValue = - Exp.expandCodeFor(ScalarStep, PhiType, &*Builder.GetInsertPoint()); - Value *InductionGEP = Builder.CreateGEP( - ScStValueType->getPointerElementType(), NewPointerPhi, - Builder.CreateMul(ScalarStepValue, ConstantInt::get(PhiType, VF * UF))); - NewPointerPhi->addIncoming(InductionGEP, - cast(InductionGEP)->getParent()); - - // Create UF many actual address geps that use the pointer - // phi as base and a vectorized version of the step value - // () as offset. + // This is the normalized GEP that starts counting at zero. + Value *PtrInd = Induction; + PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType()); + // Determine the number of scalars we need to generate for each unroll + // iteration. If the instruction is uniform, we only need to generate the + // first lane. 
Otherwise, we generate all VF values. + unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF; + // These are the scalar results. Notice that we don't generate vector GEPs + // because scalar GEPs result in better code. for (unsigned Part = 0; Part < UF; ++Part) { - SmallVector Indices; - // Create a vector of consecutive numbers from zero to VF. - for (unsigned i = 0; i < VF; ++i) - Indices.push_back(ConstantInt::get(PhiType, i + Part * VF)); - Constant *StartOffset = ConstantVector::get(Indices); - - Value *GEP = Builder.CreateGEP( - ScStValueType->getPointerElementType(), NewPointerPhi, - Builder.CreateMul(StartOffset, - Builder.CreateVectorSplat(VF, ScalarStepValue), - "vector.gep")); - VectorLoopValueMap.setVectorValue(P, Part, GEP); + for (unsigned Lane = 0; Lane < Lanes; ++Lane) { + Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF); + Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx); + Value *SclrGep = + emitTransformedIndex(Builder, GlobalIdx, PSE.getSE(), DL, II); + SclrGep->setName("next.gep"); + VectorLoopValueMap.setScalarValue(P, {Part, Lane}, SclrGep); + } } + return; } } } @@ -4496,7 +4456,6 @@ void LoopVectorizationCostModel::collectLoopScalars(unsigned VF) { // accesses that will remain scalar. SmallSetVector ScalarPtrs; SmallPtrSet PossibleNonScalarPtrs; - auto *Latch = TheLoop->getLoopLatch(); // A helper that returns true if the use of Ptr by MemAccess will be scalar. 
// The pointer operands of loads and stores will be scalar as long as the @@ -4522,33 +4481,11 @@ void LoopVectorizationCostModel::collectLoopScalars(unsigned VF) { !TheLoop->isLoopInvariant(V); }; - auto isScalarPtrInduction = [&](Instruction *MemAccess, Value *Ptr) { - if (!isa(Ptr) || - !Legal->getInductionVars().count(cast(Ptr))) - return false; - auto &Induction = Legal->getInductionVars()[cast(Ptr)]; - if (Induction.getKind() != InductionDescriptor::IK_PtrInduction) - return false; - return isScalarUse(MemAccess, Ptr); - }; - - // A helper that evaluates a memory access's use of a pointer. If the - // pointer is actually the pointer induction of a loop, it is being - // inserted into Worklist. If the use will be a scalar use, and the - // pointer is only used by memory accesses, we place the pointer in - // ScalarPtrs. Otherwise, the pointer is placed in PossibleNonScalarPtrs. + // A helper that evaluates a memory access's use of a pointer. If the use + // will be a scalar use, and the pointer is only used by memory accesses, we + // place the pointer in ScalarPtrs. Otherwise, the pointer is placed in + // PossibleNonScalarPtrs. auto evaluatePtrUse = [&](Instruction *MemAccess, Value *Ptr) { - if (isScalarPtrInduction(MemAccess, Ptr)) { - Worklist.insert(cast(Ptr)); - Instruction *Update = cast( - cast(Ptr)->getIncomingValueForBlock(Latch)); - Worklist.insert(Update); - LLVM_DEBUG(dbgs() << "LV: Found new scalar instruction: " << *Ptr - << "\n"); - LLVM_DEBUG(dbgs() << "LV: Found new scalar instruction: " << *Update - << "\n"); - return; - } // We only care about bitcast and getelementptr instructions contained in // the loop. 
if (!isLoopVaryingBitCastOrGEP(Ptr)) @@ -4572,9 +4509,10 @@ void LoopVectorizationCostModel::collectLoopScalars(unsigned VF) { }; // We seed the scalars analysis with three classes of instructions: (1) - // instructions marked uniform-after-vectorization and (2) bitcast, - // getelementptr and (pointer) phi instructions used by memory accesses - // requiring a scalar use. + // instructions marked uniform-after-vectorization, (2) bitcast and + // getelementptr instructions used by memory accesses requiring a scalar use, + // and (3) pointer induction variables and their update instructions (we + // currently only scalarize these). // // (1) Add to the worklist all instructions that have been identified as // uniform-after-vectorization. @@ -4600,6 +4538,24 @@ void LoopVectorizationCostModel::collectLoopScalars(unsigned VF) { Worklist.insert(I); } + // (3) Add to the worklist all pointer induction variables and their update + // instructions. + // + // TODO: Once we are able to vectorize pointer induction variables we should + // no longer insert them into the worklist here. + auto *Latch = TheLoop->getLoopLatch(); + for (auto &Induction : Legal->getInductionVars()) { + auto *Ind = Induction.first; + auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch)); + if (Induction.second.getKind() != InductionDescriptor::IK_PtrInduction) + continue; + Worklist.insert(Ind); + Worklist.insert(IndUpdate); + LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *Ind << "\n"); + LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *IndUpdate + << "\n"); + } + // Insert the forced scalars. // FIXME: Currently widenPHIInstruction() often creates a dead vector // induction variable when the PHI user is scalarized. 
@@ -4635,6 +4591,14 @@ void LoopVectorizationCostModel::collectLoopScalars(unsigned VF) { auto *Ind = Induction.first; auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch)); + // We already considered pointer induction variables, so there's no reason + // to look at their users again. + // + // TODO: Once we are able to vectorize pointer induction variables we + // should no longer skip over them here. + if (Induction.second.getKind() == InductionDescriptor::IK_PtrInduction) + continue; + // If tail-folding is applied, the primary induction variable will be used // to feed a vector compare. if (Ind == Legal->getPrimaryInduction() && foldTailByMasking()) diff --git a/llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll b/llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll deleted file mode 100644 index daeac07f33e1..000000000000 --- a/llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll +++ /dev/null @@ -1,972 +0,0 @@ -; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt -loop-vectorize -S -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -dce -instcombine --simplifycfg -enable-arm-maskedgatscat < %s | FileCheck %s - -target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64" -target triple = "thumbv8.1m.main-none-none-eabi" - -define hidden void @pointer_phi_v4i32_add1(i32* noalias nocapture readonly %A, i32* noalias nocapture %B, i32 %s, i32%y) { -; CHECK-LABEL: @pointer_phi_v4i32_add1( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[A:%.*]], i32 [[INDEX]] -; CHECK-NEXT: 
[[NEXT_GEP4:%.*]] = getelementptr i32, i32* [[B:%.*]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[NEXT_GEP]] to <4 x i32>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4 -; CHECK-NEXT: [[TMP1:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[NEXT_GEP4]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4 -; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000 -; CHECK-NEXT: br i1 [[TMP3]], label [[END:%.*]], label [[VECTOR_BODY]], !llvm.loop !0 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi i32* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi i32* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load i32, i32* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds i32, i32* %A.addr.09, i32 1 - %add = add nsw i32 %0, %y - store i32 %add, i32* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds i32, i32* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4i32_add2(i32* noalias nocapture readonly %A, i32* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v4i32_add2( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i32, i32* [[A:%.*]], i32 1992 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i32, i32* [[B:%.*]], i32 996 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP0:%.*]] = shl i32 [[INDEX]], 1 -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[A]], i32 [[TMP0]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr i32, i32* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[NEXT_GEP]] to <8 x i32>* -; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP1]], align 4 -; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> undef, <4 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[STRIDED_VEC]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[NEXT_GEP4]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4 -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 996 -; CHECK-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !2 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi i32* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 996, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi i32* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[A_ADDR_09]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[A_ADDR_09]], i32 2 -; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP5]], [[Y]] -; CHECK-NEXT: store i32 [[ADD]], i32* [[B_ADDR_07]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, i32* [[B_ADDR_07]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !3 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi i32* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ 
%inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi i32* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load i32, i32* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds i32, i32* %A.addr.09, i32 2 - %add = add nsw i32 %0, %y - store i32 %add, i32* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds i32, i32* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4i32_add3(i32* noalias nocapture readonly %A, i32* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v4i32_add3( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i32, i32* [[A:%.*]], i32 2988 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i32, i32* [[B:%.*]], i32 996 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi i32* [ [[A]], [[ENTRY:%.*]] ], [ [[TMP0:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP0]] = getelementptr i32, i32* [[POINTER_PHI]], i32 12 -; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, i32* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP1]], i32 4, <4 x i1> , <4 x i32> undef) -; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[WIDE_MASKED_GATHER]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[NEXT_GEP]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4 -; 
CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 996 -; CHECK-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !5 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi i32* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 996, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi i32* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[A_ADDR_09]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[A_ADDR_09]], i32 3 -; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP5]], [[Y]] -; CHECK-NEXT: store i32 [[ADD]], i32* [[B_ADDR_07]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, i32* [[B_ADDR_07]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !6 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi i32* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi i32* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load i32, i32* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds i32, i32* %A.addr.09, i32 3 - %add = add nsw i32 %0, %y - store i32 %add, i32* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds i32, i32* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v8i16_add1(i16* noalias nocapture readonly %A, i16* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v8i16_add1( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[Y:%.*]] to i16 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement 
<8 x i16> undef, i16 [[TMP0]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i16> [[BROADCAST_SPLATINSERT]], <8 x i16> undef, <8 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i16, i16* [[A:%.*]], i32 [[INDEX]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr i16, i16* [[B:%.*]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[NEXT_GEP]] to <8 x i16>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 2 -; CHECK-NEXT: [[TMP2:%.*]] = add <8 x i16> [[WIDE_LOAD]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP3:%.*]] = bitcast i16* [[NEXT_GEP4]] to <8 x i16>* -; CHECK-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* [[TMP3]], align 2 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8 -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000 -; CHECK-NEXT: br i1 [[TMP4]], label [[END:%.*]], label [[VECTOR_BODY]], !llvm.loop !7 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - %0 = trunc i32 %y to i16 - br label %for.body -for.body: ; preds = %for.body, %for.body.lr.ph - %A.addr.011 = phi i16* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.010 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.09 = phi i16* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %l1 = load i16, i16* %A.addr.011, align 2 - %add.ptr = getelementptr inbounds i16, i16* %A.addr.011, i32 1 - %conv1 = add i16 %l1, %0 - store i16 %conv1, i16* %B.addr.09, align 2 - %incdec.ptr = getelementptr inbounds i16, i16* %B.addr.09, i32 1 - %inc = add nuw nsw i32 %i.010, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v8i16_add2(i16* noalias nocapture readonly %A, i16* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v8i16_add2( -; CHECK-NEXT: entry: -; CHECK-NEXT: 
[[TMP0:%.*]] = trunc i32 [[Y:%.*]] to i16 -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i16, i16* [[A:%.*]], i32 1984 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i16, i16* [[B:%.*]], i32 992 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i16> undef, i16 [[TMP0]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i16> [[BROADCAST_SPLATINSERT]], <8 x i16> undef, <8 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[INDEX]], 1 -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i16, i16* [[A]], i32 [[TMP1]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr i16, i16* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[NEXT_GEP]] to <16 x i16>* -; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <16 x i16>, <16 x i16>* [[TMP2]], align 2 -; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i16> [[WIDE_VEC]], <16 x i16> undef, <8 x i32> -; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[STRIDED_VEC]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP4:%.*]] = bitcast i16* [[NEXT_GEP4]] to <8 x i16>* -; CHECK-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* [[TMP4]], align 2 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8 -; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992 -; CHECK-NEXT: br i1 [[TMP5]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !8 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_011:%.*]] = phi i16* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_010:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 992, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_09:%.*]] = phi i16* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[L1:%.*]] = load i16, i16* [[A_ADDR_011]], align 2 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i16, i16* [[A_ADDR_011]], i32 2 -; 
CHECK-NEXT: [[CONV1:%.*]] = add i16 [[L1]], [[TMP0]] -; CHECK-NEXT: store i16 [[CONV1]], i16* [[B_ADDR_09]], align 2 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i16, i16* [[B_ADDR_09]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_010]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !9 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - %0 = trunc i32 %y to i16 - br label %for.body -for.body: ; preds = %for.body, %for.body.lr.ph - %A.addr.011 = phi i16* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.010 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.09 = phi i16* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %l1 = load i16, i16* %A.addr.011, align 2 - %add.ptr = getelementptr inbounds i16, i16* %A.addr.011, i32 2 - %conv1 = add i16 %l1, %0 - store i16 %conv1, i16* %B.addr.09, align 2 - %incdec.ptr = getelementptr inbounds i16, i16* %B.addr.09, i32 1 - %inc = add nuw nsw i32 %i.010, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v8i16_add3(i16* noalias nocapture readonly %A, i16* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v8i16_add3( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[Y:%.*]] to i16 -; CHECK-NEXT: br label [[FOR_BODY:%.*]] -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_011:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY:%.*]] ], [ [[ADD_PTR:%.*]], [[FOR_BODY]] ] -; CHECK-NEXT: [[I_010:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC:%.*]], [[FOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_09:%.*]] = phi i16* [ [[B:%.*]], [[ENTRY]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ] -; CHECK-NEXT: [[L1:%.*]] = load i16, i16* [[A_ADDR_011]], align 2 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i16, i16* [[A_ADDR_011]], i32 3 -; CHECK-NEXT: [[CONV1:%.*]] = add i16 [[L1]], [[TMP0]] -; CHECK-NEXT: store i16 [[CONV1]], i16* [[B_ADDR_09]], align 2 
-; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i16, i16* [[B_ADDR_09]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_010]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]] -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - %0 = trunc i32 %y to i16 - br label %for.body -for.body: ; preds = %for.body, %for.body.lr.ph - %A.addr.011 = phi i16* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.010 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.09 = phi i16* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %l1 = load i16, i16* %A.addr.011, align 2 - %add.ptr = getelementptr inbounds i16, i16* %A.addr.011, i32 3 - %conv1 = add i16 %l1, %0 - store i16 %conv1, i16* %B.addr.09, align 2 - %incdec.ptr = getelementptr inbounds i16, i16* %B.addr.09, i32 1 - %inc = add nuw nsw i32 %i.010, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v16i8_add1(i8* noalias nocapture readonly %A, i8* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v16i8_add1( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[Y:%.*]] to i8 -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, i8* [[A:%.*]], i32 992 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i8, i8* [[B:%.*]], i32 992 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i8> undef, i8 [[TMP0]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT]], <16 x i8> undef, <16 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* [[A]], i32 [[INDEX]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr i8, i8* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[NEXT_GEP]] to <16 x 
i8>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1 -; CHECK-NEXT: [[TMP2:%.*]] = add <16 x i8> [[WIDE_LOAD]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[NEXT_GEP4]] to <16 x i8>* -; CHECK-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* [[TMP3]], align 1 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 16 -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992 -; CHECK-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !10 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_010:%.*]] = phi i8* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_09:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 992, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_08:%.*]] = phi i8* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[A_ADDR_010]], align 1 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_010]], i32 1 -; CHECK-NEXT: [[CONV1:%.*]] = add i8 [[TMP5]], [[TMP0]] -; CHECK-NEXT: store i8 [[CONV1]], i8* [[B_ADDR_08]], align 1 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8* [[B_ADDR_08]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_09]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !11 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - %0 = trunc i32 %y to i8 - br label %for.body - -for.body: - %A.addr.010 = phi i8* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.09 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.08 = phi i8* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %1 = load i8, i8* %A.addr.010, align 1 - %add.ptr = getelementptr inbounds i8, i8* %A.addr.010, i32 1 - %conv1 = add i8 %1, %0 - store i8 %conv1, i8* %B.addr.08, align 1 - %incdec.ptr = getelementptr inbounds i8, i8* %B.addr.08, i32 1 - %inc = add nuw nsw i32 %i.09, 1 - %exitcond = icmp eq i32 
%inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v16i8_add2(i8* noalias nocapture readonly %A, i8* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v16i8_add2( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[Y:%.*]] to i8 -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, i8* [[A:%.*]], i32 1984 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i8, i8* [[B:%.*]], i32 992 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i8> undef, i8 [[TMP0]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT]], <16 x i8> undef, <16 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[INDEX]], 1 -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* [[A]], i32 [[TMP1]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr i8, i8* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[NEXT_GEP]] to <32 x i8>* -; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <32 x i8>, <32 x i8>* [[TMP2]], align 1 -; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <32 x i8> [[WIDE_VEC]], <32 x i8> undef, <16 x i32> -; CHECK-NEXT: [[TMP3:%.*]] = add <16 x i8> [[STRIDED_VEC]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[NEXT_GEP4]] to <16 x i8>* -; CHECK-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* [[TMP4]], align 1 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 16 -; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992 -; CHECK-NEXT: br i1 [[TMP5]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !12 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_010:%.*]] = phi i8* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_09:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 992, [[VECTOR_BODY]] ] -; CHECK-NEXT: 
[[B_ADDR_08:%.*]] = phi i8* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[A_ADDR_010]], align 1 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_010]], i32 2 -; CHECK-NEXT: [[CONV1:%.*]] = add i8 [[TMP6]], [[TMP0]] -; CHECK-NEXT: store i8 [[CONV1]], i8* [[B_ADDR_08]], align 1 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8* [[B_ADDR_08]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_09]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !13 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - %0 = trunc i32 %y to i8 - br label %for.body - -for.body: - %A.addr.010 = phi i8* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.09 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.08 = phi i8* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %1 = load i8, i8* %A.addr.010, align 1 - %add.ptr = getelementptr inbounds i8, i8* %A.addr.010, i32 2 - %conv1 = add i8 %1, %0 - store i8 %conv1, i8* %B.addr.08, align 1 - %incdec.ptr = getelementptr inbounds i8, i8* %B.addr.08, i32 1 - %inc = add nuw nsw i32 %i.09, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v16i8_add3(i8* noalias nocapture readonly %A, i8* noalias nocapture %B, i32 %y) { -; CHECK-LABEL: @pointer_phi_v16i8_add3( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[Y:%.*]] to i8 -; CHECK-NEXT: br label [[FOR_BODY:%.*]] -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_010:%.*]] = phi i8* [ [[A:%.*]], [[ENTRY:%.*]] ], [ [[ADD_PTR:%.*]], [[FOR_BODY]] ] -; CHECK-NEXT: [[I_09:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC:%.*]], [[FOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_08:%.*]] = phi i8* [ [[B:%.*]], [[ENTRY]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ] -; CHECK-NEXT: [[TMP1:%.*]] = load i8, i8* [[A_ADDR_010]], align 1 -; 
CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_010]], i32 3 -; CHECK-NEXT: [[CONV1:%.*]] = add i8 [[TMP1]], [[TMP0]] -; CHECK-NEXT: store i8 [[CONV1]], i8* [[B_ADDR_08]], align 1 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8* [[B_ADDR_08]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_09]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]] -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - %0 = trunc i32 %y to i8 - br label %for.body - -for.body: - %A.addr.010 = phi i8* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.09 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.08 = phi i8* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %1 = load i8, i8* %A.addr.010, align 1 - %add.ptr = getelementptr inbounds i8, i8* %A.addr.010, i32 3 - %conv1 = add i8 %1, %0 - store i8 %conv1, i8* %B.addr.08, align 1 - %incdec.ptr = getelementptr inbounds i8, i8* %B.addr.08, i32 1 - %inc = add nuw nsw i32 %i.09, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4f32_add1(float* noalias nocapture readonly %A, float* noalias nocapture %B, float %y) { -; CHECK-LABEL: @pointer_phi_v4f32_add1( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> undef, float [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr float, float* [[A:%.*]], i32 [[INDEX]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr float, float* [[B:%.*]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[NEXT_GEP]] to <4 x float>* -; CHECK-NEXT: 
[[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4 -; CHECK-NEXT: [[TMP1:%.*]] = fadd fast <4 x float> [[WIDE_LOAD]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[NEXT_GEP4]] to <4 x float>* -; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4 -; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000 -; CHECK-NEXT: br i1 [[TMP3]], label [[END:%.*]], label [[VECTOR_BODY]], !llvm.loop !14 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi float* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi float* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load float, float* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds float, float* %A.addr.09, i32 1 - %add = fadd fast float %0, %y - store float %add, float* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds float, float* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4f32_add2(float* noalias nocapture readonly %A, float* noalias nocapture %B, float %y) { -; CHECK-LABEL: @pointer_phi_v4f32_add2( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr float, float* [[A:%.*]], i32 1992 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr float, float* [[B:%.*]], i32 996 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> undef, float [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP0:%.*]] = shl i32 [[INDEX]], 1 -; 
CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr float, float* [[A]], i32 [[TMP0]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr float, float* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[NEXT_GEP]] to <8 x float>* -; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x float>, <8 x float>* [[TMP1]], align 4 -; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x float> [[WIDE_VEC]], <8 x float> undef, <4 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> [[STRIDED_VEC]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[NEXT_GEP4]] to <4 x float>* -; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4 -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 996 -; CHECK-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !15 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi float* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 996, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi float* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[A_ADDR_09]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds float, float* [[A_ADDR_09]], i32 2 -; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP5]], [[Y]] -; CHECK-NEXT: store float [[ADD]], float* [[B_ADDR_07]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[B_ADDR_07]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !16 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi float* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - 
%B.addr.07 = phi float* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load float, float* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds float, float* %A.addr.09, i32 2 - %add = fadd fast float %0, %y - store float %add, float* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds float, float* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4f32_add3(float* noalias nocapture readonly %A, float* noalias nocapture %B, float %y) { -; CHECK-LABEL: @pointer_phi_v4f32_add3( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr float, float* [[A:%.*]], i32 2988 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr float, float* [[B:%.*]], i32 996 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> undef, float [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi float* [ [[A]], [[ENTRY:%.*]] ], [ [[TMP0:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP0]] = getelementptr float, float* [[POINTER_PHI]], i32 12 -; CHECK-NEXT: [[TMP1:%.*]] = getelementptr float, float* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr float, float* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP1]], i32 4, <4 x i1> , <4 x float> undef) -; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> [[WIDE_MASKED_GATHER]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[NEXT_GEP]] to <4 x float>* -; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4 -; CHECK-NEXT: 
[[INDEX_NEXT]] = add i32 [[INDEX]], 4 -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 996 -; CHECK-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !17 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi float* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 996, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi float* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[A_ADDR_09]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds float, float* [[A_ADDR_09]], i32 3 -; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP5]], [[Y]] -; CHECK-NEXT: store float [[ADD]], float* [[B_ADDR_07]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[B_ADDR_07]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !18 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi float* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi float* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load float, float* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds float, float* %A.addr.09, i32 3 - %add = fadd fast float %0, %y - store float %add, float* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds float, float* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4half_add1(half* noalias nocapture readonly %A, half* noalias nocapture %B, half %y) { -; CHECK-LABEL: @pointer_phi_v4half_add1( -; CHECK-NEXT: entry: -; CHECK-NEXT: 
[[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x half> undef, half [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x half> [[BROADCAST_SPLATINSERT]], <8 x half> undef, <8 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr half, half* [[A:%.*]], i32 [[INDEX]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr half, half* [[B:%.*]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[NEXT_GEP]] to <8 x half>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x half>, <8 x half>* [[TMP0]], align 4 -; CHECK-NEXT: [[TMP1:%.*]] = fadd fast <8 x half> [[WIDE_LOAD]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP2:%.*]] = bitcast half* [[NEXT_GEP4]] to <8 x half>* -; CHECK-NEXT: store <8 x half> [[TMP1]], <8 x half>* [[TMP2]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8 -; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000 -; CHECK-NEXT: br i1 [[TMP3]], label [[END:%.*]], label [[VECTOR_BODY]], !llvm.loop !19 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi half* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi half* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load half, half* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds half, half* %A.addr.09, i32 1 - %add = fadd fast half %0, %y - store half %add, half* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds half, half* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4half_add2(half* noalias nocapture readonly %A, half* noalias nocapture %B, half %y) { -; CHECK-LABEL: @pointer_phi_v4half_add2( -; CHECK-NEXT: entry: 
-; CHECK-NEXT: [[IND_END:%.*]] = getelementptr half, half* [[A:%.*]], i32 1984 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr half, half* [[B:%.*]], i32 992 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x half> undef, half [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x half> [[BROADCAST_SPLATINSERT]], <8 x half> undef, <8 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP0:%.*]] = shl i32 [[INDEX]], 1 -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr half, half* [[A]], i32 [[TMP0]] -; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr half, half* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[TMP1:%.*]] = bitcast half* [[NEXT_GEP]] to <16 x half>* -; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <16 x half>, <16 x half>* [[TMP1]], align 4 -; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x half> [[WIDE_VEC]], <16 x half> undef, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <8 x half> [[STRIDED_VEC]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP3:%.*]] = bitcast half* [[NEXT_GEP4]] to <8 x half>* -; CHECK-NEXT: store <8 x half> [[TMP2]], <8 x half>* [[TMP3]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8 -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992 -; CHECK-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !20 -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi half* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 992, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi half* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP5:%.*]] = load half, half* [[A_ADDR_09]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds half, half* [[A_ADDR_09]], i32 2 -; CHECK-NEXT: 
[[ADD:%.*]] = fadd fast half [[TMP5]], [[Y]] -; CHECK-NEXT: store half [[ADD]], half* [[B_ADDR_07]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds half, half* [[B_ADDR_07]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]], !llvm.loop !21 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi half* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi half* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load half, half* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds half, half* %A.addr.09, i32 2 - %add = fadd fast half %0, %y - store half %add, half* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds half, half* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -define hidden void @pointer_phi_v4half_add3(half* noalias nocapture readonly %A, half* noalias nocapture %B, half %y) { -; CHECK-LABEL: @pointer_phi_v4half_add3( -; CHECK-NEXT: entry: -; CHECK-NEXT: br label [[FOR_BODY:%.*]] -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi half* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[ENTRY:%.*]] ] -; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ] -; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi half* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[B:%.*]], [[ENTRY]] ] -; CHECK-NEXT: [[TMP0:%.*]] = load half, half* [[A_ADDR_09]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds half, half* [[A_ADDR_09]], i32 3 -; CHECK-NEXT: [[ADD:%.*]] = fadd fast half [[TMP0]], [[Y:%.*]] -; CHECK-NEXT: store half [[ADD]], half* [[B_ADDR_07]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds half, half* [[B_ADDR_07]], i32 1 -; CHECK-NEXT: 
[[INC]] = add nuw nsw i32 [[I_08]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[FOR_BODY]] -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body -for.body: - %A.addr.09 = phi half* [ %add.ptr, %for.body ], [ %A, %entry ] - %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %B.addr.07 = phi half* [ %incdec.ptr, %for.body ], [ %B, %entry ] - %0 = load half, half* %A.addr.09, align 4 - %add.ptr = getelementptr inbounds half, half* %A.addr.09, i32 3 - %add = fadd fast half %0, %y - store half %add, half* %B.addr.07, align 4 - %incdec.ptr = getelementptr inbounds half, half* %B.addr.07, i32 1 - %inc = add nuw nsw i32 %i.08, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body -end: - ret void -} - -!0 = distinct !{!0, !1} -!1 = !{!"llvm.loop.interleave.count", i32 2} - -define hidden void @pointer_phi_v4i32_uf2(i32* noalias nocapture readonly %A, i32* noalias nocapture %B, i32 %n, i32 %y) { -; CHECK-LABEL: @pointer_phi_v4i32_uf2( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i32, i32* [[A:%.*]], i32 59952 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i32, i32* [[B:%.*]], i32 9992 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x i32> undef, i32 [[Y]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT6]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi i32* [ [[A]], [[ENTRY:%.*]] ], [ [[TMP0:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; 
CHECK-NEXT: [[TMP0]] = getelementptr i32, i32* [[POINTER_PHI]], i32 48 -; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, i32* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, i32* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP1]], i32 4, <4 x i1> , <4 x i32> undef) -; CHECK-NEXT: [[WIDE_MASKED_GATHER5:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> , <4 x i32> undef) -; CHECK-NEXT: [[TMP3:%.*]] = add nsw <4 x i32> [[WIDE_MASKED_GATHER]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[WIDE_MASKED_GATHER5]], [[BROADCAST_SPLAT7]] -; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[NEXT_GEP]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP5]], align 4 -; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[NEXT_GEP]], i32 4 -; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP7]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8 -; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 9992 -; CHECK-NEXT: br i1 [[TMP8]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !22 -; CHECK: for.cond.cleanup: -; CHECK-NEXT: ret void -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_08:%.*]] = phi i32* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_07:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 9992, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_06:%.*]] = phi i32* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[A_ADDR_08]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[A_ADDR_08]], i32 6 -; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], [[Y]] -; CHECK-NEXT: store i32 [[ADD]], i32* 
[[B_ADDR_06]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, i32* [[B_ADDR_06]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_07]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 10000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]], !llvm.loop !23 -; - -entry: - br label %for.body - -for.cond.cleanup: - ret void - -for.body: - %A.addr.08 = phi i32* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.07 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.06 = phi i32* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %0 = load i32, i32* %A.addr.08, align 4 - %add.ptr = getelementptr inbounds i32, i32* %A.addr.08, i32 6 - %add = add nsw i32 %0, %y - store i32 %add, i32* %B.addr.06, align 4 - %incdec.ptr = getelementptr inbounds i32, i32* %B.addr.06, i32 1 - %inc = add nuw nsw i32 %i.07, 1 - %exitcond = icmp eq i32 %inc, 10000 - br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !0 -} - -!2 = distinct !{!2, !3} -!3 = !{!"llvm.loop.interleave.count", i32 4} - -define hidden void @pointer_phi_v4i32_uf4(i32* noalias nocapture readonly %A, i32* noalias nocapture %B, i32 %n, i32 %y) { -; CHECK-LABEL: @pointer_phi_v4i32_uf4( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i32, i32* [[A:%.*]], i32 59904 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i32, i32* [[B:%.*]], i32 9984 -; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: [[BROADCAST_SPLATINSERT10:%.*]] = insertelement <4 x i32> undef, i32 [[Y]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT11:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT10]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: [[BROADCAST_SPLATINSERT12:%.*]] = insertelement <4 x i32> undef, i32 [[Y]], i32 0 -; CHECK-NEXT: 
[[BROADCAST_SPLAT13:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT12]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: [[BROADCAST_SPLATINSERT14:%.*]] = insertelement <4 x i32> undef, i32 [[Y]], i32 0 -; CHECK-NEXT: [[BROADCAST_SPLAT15:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT14]], <4 x i32> undef, <4 x i32> zeroinitializer -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi i32* [ [[A]], [[ENTRY:%.*]] ], [ [[TMP0:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP0]] = getelementptr i32, i32* [[POINTER_PHI]], i32 96 -; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, i32* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, i32* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, i32* [[POINTER_PHI]], <4 x i32> -; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[B]], i32 [[INDEX]] -; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP1]], i32 4, <4 x i1> , <4 x i32> undef) -; CHECK-NEXT: [[WIDE_MASKED_GATHER7:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> , <4 x i32> undef) -; CHECK-NEXT: [[WIDE_MASKED_GATHER8:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> , <4 x i32> undef) -; CHECK-NEXT: [[WIDE_MASKED_GATHER9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP4]], i32 4, <4 x i1> , <4 x i32> undef) -; CHECK-NEXT: [[TMP5:%.*]] = add nsw <4 x i32> [[WIDE_MASKED_GATHER]], [[BROADCAST_SPLAT]] -; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[WIDE_MASKED_GATHER7]], [[BROADCAST_SPLAT11]] -; CHECK-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[WIDE_MASKED_GATHER8]], [[BROADCAST_SPLAT13]] -; 
CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[WIDE_MASKED_GATHER9]], [[BROADCAST_SPLAT15]] -; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[NEXT_GEP]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP9]], align 4 -; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[NEXT_GEP]], i32 4 -; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[TMP11]], align 4 -; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[NEXT_GEP]], i32 8 -; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP13]], align 4 -; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[NEXT_GEP]], i32 12 -; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <4 x i32>* -; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP15]], align 4 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 16 -; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i32 [[INDEX_NEXT]], 9984 -; CHECK-NEXT: br i1 [[TMP16]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop !24 -; CHECK: for.cond.cleanup: -; CHECK-NEXT: ret void -; CHECK: for.body: -; CHECK-NEXT: [[A_ADDR_08:%.*]] = phi i32* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[I_07:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 9984, [[VECTOR_BODY]] ] -; CHECK-NEXT: [[B_ADDR_06:%.*]] = phi i32* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END3]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP17:%.*]] = load i32, i32* [[A_ADDR_08]], align 4 -; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[A_ADDR_08]], i32 6 -; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP17]], [[Y]] -; CHECK-NEXT: store i32 [[ADD]], i32* [[B_ADDR_06]], align 4 -; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, i32* [[B_ADDR_06]], i32 1 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_07]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 10000 -; CHECK-NEXT: br i1 [[EXITCOND]], label 
[[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]], !llvm.loop !25 -; -entry: - br label %for.body - -for.cond.cleanup: - ret void - -for.body: - %A.addr.08 = phi i32* [ %A, %entry ], [ %add.ptr, %for.body ] - %i.07 = phi i32 [ 0, %entry ], [ %inc, %for.body ] - %B.addr.06 = phi i32* [ %B, %entry ], [ %incdec.ptr, %for.body ] - %0 = load i32, i32* %A.addr.08, align 4 - %add.ptr = getelementptr inbounds i32, i32* %A.addr.08, i32 6 - %add = add nsw i32 %0, %y - store i32 %add, i32* %B.addr.06, align 4 - %incdec.ptr = getelementptr inbounds i32, i32* %B.addr.06, i32 1 - %inc = add nuw nsw i32 %i.07, 1 - %exitcond = icmp eq i32 %inc, 10000 - br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !2 -} - -define hidden void @mult_ptr_iv(i8* noalias nocapture readonly %x, i8* noalias nocapture %z) { -; CHECK-LABEL: @mult_ptr_iv( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, i8* [[Z:%.*]], i32 3000 -; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, i8* [[X:%.*]], i32 3000 -; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i8* [[SCEVGEP1]], [[Z]] -; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i8* [[SCEVGEP]], [[X]] -; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]] -; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY:%.*]], label [[VECTOR_PH:%.*]] -; CHECK: vector.ph: -; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, i8* [[X]], i32 3000 -; CHECK-NEXT: [[IND_END3:%.*]] = getelementptr i8, i8* [[Z]], i32 3000 -; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] -; CHECK: vector.body: -; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi i8* [ [[X]], [[VECTOR_PH]] ], [ [[TMP0:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[POINTER_PHI5:%.*]] = phi i8* [ [[Z]], [[VECTOR_PH]] ], [ [[TMP2:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[TMP0]] = getelementptr i8, i8* [[POINTER_PHI]], i32 12 -; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, i8* [[POINTER_PHI]], <4 x i32> 
-; CHECK-NEXT: [[TMP2]] = getelementptr i8, i8* [[POINTER_PHI5]], i32 12 -; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i8, i8* [[POINTER_PHI5]], <4 x i32> -; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, <4 x i8*> [[TMP1]], i32 1 -; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8(<4 x i8*> [[TMP1]], i32 1, <4 x i1> , <4 x i8> undef), !alias.scope !26 -; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, <4 x i8*> [[TMP1]], i32 2 -; CHECK-NEXT: [[WIDE_MASKED_GATHER6:%.*]] = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8(<4 x i8*> [[TMP4]], i32 1, <4 x i1> , <4 x i8> undef), !alias.scope !26 -; CHECK-NEXT: [[WIDE_MASKED_GATHER7:%.*]] = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8(<4 x i8*> [[TMP5]], i32 1, <4 x i1> , <4 x i8> undef), !alias.scope !26 -; CHECK-NEXT: [[TMP6:%.*]] = mul <4 x i8> [[WIDE_MASKED_GATHER]], -; CHECK-NEXT: [[TMP7:%.*]] = mul <4 x i8> [[WIDE_MASKED_GATHER]], [[WIDE_MASKED_GATHER6]] -; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i8> [[WIDE_MASKED_GATHER]], [[WIDE_MASKED_GATHER7]] -; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, <4 x i8*> [[TMP3]], i32 1 -; CHECK-NEXT: call void @llvm.masked.scatter.v4i8.v4p0i8(<4 x i8> [[TMP6]], <4 x i8*> [[TMP3]], i32 1, <4 x i1> ), !alias.scope !29, !noalias !26 -; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, <4 x i8*> [[TMP3]], i32 2 -; CHECK-NEXT: call void @llvm.masked.scatter.v4i8.v4p0i8(<4 x i8> [[TMP7]], <4 x i8*> [[TMP9]], i32 1, <4 x i1> ), !alias.scope !29, !noalias !26 -; CHECK-NEXT: call void @llvm.masked.scatter.v4i8.v4p0i8(<4 x i8> [[TMP8]], <4 x i8*> [[TMP10]], i32 1, <4 x i1> ), !alias.scope !29, !noalias !26 -; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4 -; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000 -; CHECK-NEXT: br i1 [[TMP11]], label [[END:%.*]], label [[VECTOR_BODY]], !llvm.loop !31 -; CHECK: for.body: -; CHECK-NEXT: [[X_ADDR_050:%.*]] = phi i8* [ [[INCDEC_PTR2:%.*]], [[FOR_BODY]] ], [ [[X]], [[ENTRY:%.*]] ] 
-; CHECK-NEXT: [[Z_ADDR_049:%.*]] = phi i8* [ [[INCDEC_PTR34:%.*]], [[FOR_BODY]] ], [ [[Z]], [[ENTRY]] ] -; CHECK-NEXT: [[I_048:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ] -; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i8, i8* [[X_ADDR_050]], i32 1 -; CHECK-NEXT: [[TMP12:%.*]] = load i8, i8* [[X_ADDR_050]], align 1 -; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i8, i8* [[X_ADDR_050]], i32 2 -; CHECK-NEXT: [[TMP13:%.*]] = load i8, i8* [[INCDEC_PTR]], align 1 -; CHECK-NEXT: [[INCDEC_PTR2]] = getelementptr inbounds i8, i8* [[X_ADDR_050]], i32 3 -; CHECK-NEXT: [[TMP14:%.*]] = load i8, i8* [[INCDEC_PTR1]], align 1 -; CHECK-NEXT: [[MUL:%.*]] = mul i8 [[TMP12]], 10 -; CHECK-NEXT: [[MUL1:%.*]] = mul i8 [[TMP12]], [[TMP13]] -; CHECK-NEXT: [[MUL2:%.*]] = mul i8 [[TMP12]], [[TMP14]] -; CHECK-NEXT: [[INCDEC_PTR32:%.*]] = getelementptr inbounds i8, i8* [[Z_ADDR_049]], i32 1 -; CHECK-NEXT: store i8 [[MUL]], i8* [[Z_ADDR_049]], align 1 -; CHECK-NEXT: [[INCDEC_PTR33:%.*]] = getelementptr inbounds i8, i8* [[Z_ADDR_049]], i32 2 -; CHECK-NEXT: store i8 [[MUL1]], i8* [[INCDEC_PTR32]], align 1 -; CHECK-NEXT: [[INCDEC_PTR34]] = getelementptr inbounds i8, i8* [[Z_ADDR_049]], i32 3 -; CHECK-NEXT: store i8 [[MUL2]], i8* [[INCDEC_PTR33]], align 1 -; CHECK-NEXT: [[INC]] = add nuw i32 [[I_048]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[END]], label [[FOR_BODY]], !llvm.loop !32 -; CHECK: end: -; CHECK-NEXT: ret void -; -entry: - br label %for.body - -for.body: - %x.addr.050 = phi i8* [ %incdec.ptr2, %for.body ], [ %x, %entry ] - %z.addr.049 = phi i8* [ %incdec.ptr34, %for.body ], [ %z, %entry ] - %i.048 = phi i32 [ %inc, %for.body ], [ 0, %entry ] - %incdec.ptr = getelementptr inbounds i8, i8* %x.addr.050, i32 1 - %0 = load i8, i8* %x.addr.050, align 1 - %incdec.ptr1 = getelementptr inbounds i8, i8* %x.addr.050, i32 2 - %1 = load i8, i8* %incdec.ptr, align 1 - %incdec.ptr2 = 
getelementptr inbounds i8, i8* %x.addr.050, i32 3 - %2 = load i8, i8* %incdec.ptr1, align 1 - %conv = zext i8 %0 to i32 - %mul = mul nuw nsw i32 %conv, 10 - %conv1 = zext i8 %1 to i32 - %conv2 = zext i8 %2 to i32 - %mul1 = mul nuw nsw i32 %conv, %conv1 - %mul2 = mul nuw nsw i32 %conv, %conv2 - %conv3 = trunc i32 %mul to i8 - %conv4 = trunc i32 %mul1 to i8 - %conv5 = trunc i32 %mul2 to i8 - %incdec.ptr32 = getelementptr inbounds i8, i8* %z.addr.049, i32 1 - store i8 %conv3, i8* %z.addr.049, align 1 - %incdec.ptr33 = getelementptr inbounds i8, i8* %z.addr.049, i32 2 - store i8 %conv4, i8* %incdec.ptr32, align 1 - %incdec.ptr34 = getelementptr inbounds i8, i8* %z.addr.049, i32 3 - store i8 %conv5, i8* %incdec.ptr33, align 1 - %inc = add nuw i32 %i.048, 1 - %exitcond = icmp eq i32 %inc, 1000 - br i1 %exitcond, label %end, label %for.body - -end: - ret void -} From llvm-commits at lists.llvm.org Mon Jul 6 17:58:35 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 00:58:35 +0000 (UTC) Subject: [PATCH] D81704: [WebAssembly] Adding 64-bit version of R_WASM_MEMORY_ADDR_* relocs In-Reply-To: References: Message-ID: <6a6c185139d2323b5877bbb4b806a721@localhost.localdomain> aardappel marked an inline comment as done. aardappel added inline comments. ================ Comment at: llvm/lib/Target/WebAssembly/TargetInfo/WebAssemblyTargetInfo.cpp:40 +#define GET_INSTRINFO_ENUM 1 +#include "WebAssemblyGenInstrInfo.inc" ---------------- thakis wrote: > aardappel wrote: > > aardappel wrote: > > > thakis wrote: > > > > This is a bit awkward. It makes WebAssembly the only target that has its TargetInfo dir depend on a llvm-tblgen generated file. It doesn't cause actual issues, but it's irregular. > > > > > > > > Make of this what you will :) > > > I'd be happy to move this something else.. we just don't have any other .cpp that is shared between CodeGen and MC. I could add a new lib that both depend on? 
@dschuff > > Actually, the cleaner solution would be to have tablegen (or whatever creates the instruction mappings) not stick both generated functions in the same #ifdef, which is the root cause of this having to sit in a shared location in the first place. Or make the functions static. But that potentially affects other targets, etc. > This came back to bite me today (or, well, actually, a few weeks ago, but I didn't notice until today): In the GN build (which isn't supported and which you don't have to care about; it's just FYI, nothing to act on), the .inc files get generated in the Target sub directory they best fit into. Due to this change, I had moved WebAssemblyGenInstrInfo.inc from WebAssembly/MCTargetDesc to WebAssembly/TargetInfo. I didn't realize that this had the effect that I now had a correct WebAssemblyGenInstrInfo.inc and a stale WebAssemblyGenInstrInfo.inc in different directories in build dirs on the bots, and they happened to pick up the stale one. Since the .inc include is unqualified, a wasm .td change today broke my bots. The hack fix was to delete the stale .inc file. > > The cmake build puts all target llvm-tblgen output in lib/Target/$targetname so it's not a problem there (the GN build should do that too), but I thought it's a nice example of how "this looks a bit funny" turned into an actual problem down the road. > > Again, nothing for you to do here :) Sorry about that.. I wouldn't mind fixing this, question is what is the best fix? Do you know of a way to have tablegen generate one file per instruction mapping? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81704/new/ https://reviews.llvm.org/D81704 From llvm-commits at lists.llvm.org Mon Jul 6 17:59:50 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 00:59:50 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. 
In-Reply-To: References: Message-ID: <74e506f8c41ee880e0209effeb0329a5@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:76-89 + if (!C) { + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 + " terminated prematurely: %s", + SetOffset, toString(std::move(C.takeError())).c_str())); + continue; ---------------- jhenderson wrote: > ikudrin wrote: > > dblaikie wrote: > > > I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended. > > > > > > Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas. > > These wordings are already better than mine. Thanks! > How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context. The suggestion wasn't for brevity, but clarity. I found the original messages unclear & was hoping to clarify them. What are the two messages in total (with all the added context, for both too short and too long) & how clear are they? 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Mon Jul 6 18:00:49 2020 From: llvm-commits at lists.llvm.org (Chris Lattner via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:00:49 +0000 (UTC) Subject: [PATCH] D82594: Clarify a bit the guideline on omitting braces, including more examples (NFC) In-Reply-To: References: Message-ID: lattner added a comment. In D82594#2131998 , @MaskRay wrote: > In D82594#2131989 , @lattner wrote: > > > I'm not sure what the contention here is. `int` is a clearly inferior choice and is just as unprincipled as `unsigned` is, we should use the correct type. > > > > I get the sense that you're arguing for `ssize_t` or one of its analogues instead of `size_t`? Are you arguing for general principles or this specific case? In this specific case, using `size_t` seems completely reasonable, but I agree that `ssize_t` would also be fine. > > > > I'm ok with either size_t or ssize_t here, but if you want to turn this into a general principle that we apply to the code base, then it should be discussed on llvm-dev first. > > > ssize_t is inferior. POSIX.1-2017 says "The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}]." The implementation may not even support -2. > > We could also use intptr_t to avoid sign extension on a 64-bit machine. Well sure, but C and POSIX are designed to support non-2's complement and other weird machines. LLVM wouldn't run in general on a machine that weird. 
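As a rough illustration of the index-type question debated in this thread, the sketch below contrasts an `int`-sized range with a `size_t` loop index. The names `countFitsInInt` and `sumBytes` are hypothetical and appear in neither the patch nor the coding standard; this shows only the range argument and is not a recommendation from the thread.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: true if an element count can be represented by a
// plain `int` index without overflow.
constexpr bool countFitsInInt(std::size_t Count) {
  return Count <= static_cast<std::size_t>(INT_MAX);
}

// Hypothetical example loop: a size_t index covers every valid object size,
// whereas an `int` index would stop being usable once Size exceeds INT_MAX.
unsigned long long sumBytes(const unsigned char *Data, std::size_t Size) {
  unsigned long long Sum = 0;
  for (std::size_t I = 0; I != Size; ++I)
    Sum += Data[I];
  return Sum;
}
```

On an LP64 machine `size_t` is 64 bits wide while `int` is 32 bits, which is the gap the "`int` is a clearly inferior choice" remark refers to.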
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82594/new/ https://reviews.llvm.org/D82594 From llvm-commits at lists.llvm.org Mon Jul 6 18:02:09 2020 From: llvm-commits at lists.llvm.org (Chris Lattner via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:02:09 +0000 (UTC) Subject: [PATCH] D82594: Clarify a bit the guideline on omitting braces, including more examples (NFC) In-Reply-To: References: Message-ID: <3d9bbb4a3e0384a7448f098ba5012e5c@localhost.localdomain> lattner added a comment. In D82594#2131999, @mehdi_amini wrote: > In D82594#2131989, @lattner wrote: > > > `int` is a clearly inferior choice > > > It isn't clear to me, as I mention in my previous comment, I'd be happy to get more educated on this if you have resources you could point me to that would explain your point? I'm sorry I didn't unpack that. `int` is inferior on standard LP64 machines because it doesn't express anywhere near the range of subscripts required by some large loops / array subscripts. While it is surely fine in some cases, it is not a 'safe default' that we could advise be used everywhere. -Chris Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82594/new/ https://reviews.llvm.org/D82594 From llvm-commits at lists.llvm.org Mon Jul 6 18:03:41 2020 From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:03:41 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: reames requested changes to this revision. reames added inline comments. This revision now requires changes to proceed. ================ Comment at: llvm/include/llvm/MC/MCFragment.h:358 + /// Maximum number of bytes of each instruction. + int64_t NopLength; + ---------------- Can you call this MaxNopLength or something? 
================ Comment at: llvm/lib/MC/MCAssembler.cpp:625 + + if (NumBytes < 0) { + Asm.getContext().reportError( ---------------- When parsing asm, you reject negative lengths. Should these simply be asserts? ================ Comment at: llvm/lib/MC/MCAssembler.cpp:633 + if (MaxNopLength < 0 || + MaxNopLength > Asm.getBackend().getMaximumNopSize()) { + Asm.getContext().reportError( ---------------- Does this behaviour match existing GNU? I'd have expected the result of specifying a "too large" maximum size to simply clamp to the target's maximum. This is important because, if the result is semantic, then the difference between "largest encodable" and "largest profitable" becomes a thing the rest of the code has to care about. 15 byte nops are almost always *legal*; they're just not *fast*. ================ Comment at: llvm/lib/MC/MCAssembler.cpp:644 + + while (NumBytes) { + uint64_t NopsToEmit = (uint64_t)std::min(NumBytes, MaxNopLength); ---------------- This loop is duplicated from within emitNops. Can you pass in a MaxNopLength parameter instead? ================ Comment at: llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp:1072 +unsigned X86AsmBackend::getMaximumNopSize() const { + if (!STI.getFeatureBits()[X86::FeatureNOPL] && ---------------- Rename this function to getMaximumProfitableNop(). There's a difference between legality and profit here. As commented earlier, if that matters you'll have a harder task implementation-wise. ================ Comment at: llvm/test/MC/X86/align-branch-bundle.s:9 # CHECK-NEXT: e: nop -# CHECK-NEXT: f: nop # CHECK-NEXT: 10: jle ---------------- Having a test delta in a file without .nops is highly suspicious. I'd suggest splitting your patch into a trivial version which emits single byte nops, and a change which adds the multiple byte support. That would allow us to separate the directive mechanics from the interesting profit bits. 
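For illustration only, the clamping behaviour suggested in the comments above might be sketched as follows. `splitNopBytes` and its parameters are hypothetical names, not part of MCAssembler or X86AsmBackend, and treating a non-positive or over-large maximum as "use the target maximum" is an assumption here, not confirmed GNU behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not LLVM API): split NumBytes of padding into
// individual nop sizes, each no longer than the effective maximum.
std::vector<int64_t> splitNopBytes(int64_t NumBytes, int64_t MaxNopLength,
                                   int64_t TargetMaxNopLength) {
  assert(NumBytes >= 0 && "negative lengths are assumed rejected earlier");
  assert(TargetMaxNopLength > 0 && "target must allow at least 1-byte nops");

  // Assumption: clamp an out-of-range request instead of reporting an error.
  int64_t Effective = MaxNopLength;
  if (Effective <= 0 || Effective > TargetMaxNopLength)
    Effective = TargetMaxNopLength;

  std::vector<int64_t> Sizes;
  while (NumBytes > 0) {
    int64_t Chunk = std::min(NumBytes, Effective);
    Sizes.push_back(Chunk);
    NumBytes -= Chunk;
  }
  return Sizes;
}
```

With a single helper of this shape, the directive path and the existing padding path could share one loop by passing the maximum as a parameter, which is the deduplication requested for MCAssembler.cpp:644.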
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 From llvm-commits at lists.llvm.org Mon Jul 6 18:05:05 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:05:05 +0000 (UTC) Subject: [PATCH] D81359: [ELF] Add --[no-]relax for RISC-V In-Reply-To: References: Message-ID: MaskRay updated this revision to Diff 275870. MaskRay added a comment. In the absence of a third opinion, I give up and make --no-relax an ignored option. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81359/new/ https://reviews.llvm.org/D81359 Files: lld/ELF/Options.td lld/docs/ld.lld.1 lld/test/ELF/silent-ignore.test Index: lld/test/ELF/silent-ignore.test =================================================================== --- lld/test/ELF/silent-ignore.test +++ lld/test/ELF/silent-ignore.test @@ -7,6 +7,7 @@ RUN: -no-ctors-in-init-array \ RUN: -no-keep-memory \ RUN: -no-pipeline-knowledge \ +RUN: --no-relax \ RUN: -no-warn-mismatch \ RUN: -p \ RUN: -rpath-link . \ Index: lld/docs/ld.lld.1 =================================================================== --- lld/docs/ld.lld.1 +++ lld/docs/ld.lld.1 @@ -308,6 +308,8 @@ Page align sections. .It Fl -no-omagic Do not set the text data sections to be writable, page align sections. +.It Fl -no-relax +Disable target-specific relaxations. This is currently a no-op. .It Fl -no-rosegment Do not put read-only non-executable sections in their own segment. 
.It Fl -no-undefined-version Index: lld/ELF/Options.td =================================================================== --- lld/ELF/Options.td +++ lld/ELF/Options.td @@ -635,6 +635,7 @@ def: F<"no-ctors-in-init-array">; def: F<"no-keep-memory">; def: F<"no-pipeline-knowledge">; +def: F<"no-relax">; def: F<"no-warn-mismatch">; def: Flag<["-"], "p">; def: Separate<["--", "-"], "rpath-link">; -------------- next part -------------- A non-text attachment was scrubbed... Name: D81359.275870.patch Type: text/x-patch Size: 1220 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:05:36 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:05:36 +0000 (UTC) Subject: [PATCH] D81267: [LV] Enable the LoopVectorizer to create pointer inductions In-Reply-To: References: Message-ID: <0c2b3e0b84d650450e27a16017b620a2@localhost.localdomain> rupprecht reopened this revision. rupprecht added a comment. This revision is now accepted and ready to land. Hi, I had to revert this in 10c82eecbcb7 due to crashing when building gzip (util.c). Reduced as both C and IR: $ cat repro.c void a(char* b) { for (char* c = 0; c != b;) if (*--c) *c = '_'; } $ clang -cc1 -emit-obj -target-cpu x86-64 -target-feature +sse4.2 -O2 -Wall -vectorize-loops repro.c PHI node entries do not match predecessors! %pointer.phi = phi i8* [ null, %vector.ph ], [ %2, %vector.body ] label %vector.body label %vector.ph clang: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8029: bool llvm::LoopVectorizePass::processLoop(llvm::Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/rupprecht/dev/clang -cc1 -emit-obj -target-cpu x86-64 -target-feature +sse4.2 -O2 -Wall -vectorize-loops repro.c 1. 
parser at end of file 2. Per-module optimization passes 3. Running pass 'Function Pass Manager' on module 'repro.c'. 4. Running pass 'Loop Vectorization' on function '@a' #0 0x0000000008912197 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:564:11 #1 0x0000000008912339 PrintStackTraceSignalHandler(void*) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:625:1 #2 0x0000000008910b4b llvm::sys::RunSignalHandlers() /home/rupprecht/src/llvm-project/llvm/lib/Support/Signals.cpp:67:5 #3 0x0000000008912a95 SignalHandler(int) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:406:1 #4 0x00007f7c948ff110 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14110) #5 0x00007f7c92e7a761 raise /build/glibc-M65Gwz/glibc-2.30/signal/../sysdeps/unix/sysv/linux/raise.c:51:1 #6 0x00007f7c92e6455b abort /build/glibc-M65Gwz/glibc-2.30/stdlib/abort.c:81:7 #7 0x00007f7c92e6442f get_sysdep_segment_value /build/glibc-M65Gwz/glibc-2.30/intl/loadmsgcat.c:509:8 #8 0x00007f7c92e6442f _nl_load_domain /build/glibc-M65Gwz/glibc-2.30/intl/loadmsgcat.c:970:34 #9 0x00007f7c92e73092 (/lib/x86_64-linux-gnu/libc.so.6+0x34092) #10 0x0000000008adb37b llvm::LoopVectorizePass::processLoop(llvm::Loop*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8030:3 $ cat repro.ll ; ModuleID = './repro.ll' source_filename = "repro.c" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: noinline nounwind define void @a(i8* %b) #0 { entry: %b.addr = alloca i8*, align 8 %c = alloca i8*, align 8 store i8* %b, i8** %b.addr, align 8 br label %for.cond for.cond: ; preds = %if.end, %entry %0 = load i8*, i8** %c, align 8 %1 = load i8*, i8** %b.addr, align 8 %cmp = icmp ne i8* %0, %1 br i1 %cmp, label %for.body, label %for.end for.body: ; preds = %for.cond %2 = load i8*, i8** %c, align 8 
%incdec.ptr = getelementptr inbounds i8, i8* %2, i32 -1 store i8* %incdec.ptr, i8** %c, align 8 %3 = load i8, i8* %incdec.ptr, align 1 %tobool = icmp ne i8 %3, 0 br i1 %tobool, label %if.then, label %if.end if.then: ; preds = %for.body %4 = load i8*, i8** %c, align 8 store i8 95, i8* %4, align 1 br label %if.end if.end: ; preds = %if.then, %for.body br label %for.cond for.end: ; preds = %for.cond ret void } attributes #0 = { noinline nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.module.flags = !{!0} !0 = !{i32 1, !"wchar_size", i32 4} $ opt -O2 repro.ll -o repro.o Reopening as the patch is reverted Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81267/new/ https://reviews.llvm.org/D81267 From llvm-commits at lists.llvm.org Mon Jul 6 18:05:41 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Mon, 06 Jul 2020 18:05:41 -0700 (PDT) Subject: [llvm] 7a99aab - [ModuloSchedule] Devirtualize PeelingModuloScheduleExpander::expand as it's not needed Message-ID: <5f03ca65.1c69fb81.38d4a.2416@mx.google.com> Author: David Blaikie Date: 2020-07-06T18:05:32-07:00 New Revision: 7a99aab8692c58558b62e9a66120886b8a70fab8 URL: https://github.com/llvm/llvm-project/commit/7a99aab8692c58558b62e9a66120886b8a70fab8 DIFF: https://github.com/llvm/llvm-project/commit/7a99aab8692c58558b62e9a66120886b8a70fab8.diff LOG: [ModuloSchedule] Devirtualize PeelingModuloScheduleExpander::expand as it's not needed The use case is out of tree code deriving from this class - but 
without a need to use the base class polymorphically, so skip the virtualization and virtual dtor. Post-commit review from 50ac7ce94f34c5f43b02185ae0c33e150e78b044 Added: Modified: llvm/include/llvm/CodeGen/ModuloSchedule.h Removed: ################################################################################ diff --git a/llvm/include/llvm/CodeGen/ModuloSchedule.h b/llvm/include/llvm/CodeGen/ModuloSchedule.h index be0108dfa936..1aa23208cfb9 100644 --- a/llvm/include/llvm/CodeGen/ModuloSchedule.h +++ b/llvm/include/llvm/CodeGen/ModuloSchedule.h @@ -282,9 +282,8 @@ class PeelingModuloScheduleExpander { LiveIntervals *LIS) : Schedule(S), MF(MF), ST(MF.getSubtarget()), MRI(MF.getRegInfo()), TII(ST.getInstrInfo()), LIS(LIS) {} - virtual ~PeelingModuloScheduleExpander() {} - virtual void expand(); + void expand(); /// Runs ModuloScheduleExpander and treats it as a golden input to validate /// aspects of the code generated by PeelingModuloScheduleExpander. From llvm-commits at lists.llvm.org Mon Jul 6 18:05:49 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:05:49 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments Message-ID: jdoerfert created this revision. jdoerfert added reviewers: jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis. Herald added subscribers: llvm-commits, cfe-commits, sstefan1, guansong, bollu, yaxunl, jholewinski. Herald added projects: clang, OpenMP, LLVM. There are various runtime calls in the device runtime with unused, or always fixed, arguments. This is bad for all sorts of reasons. Clean up two before as we match them in OpenMPOpt now. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83268 Files: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp clang/test/OpenMP/nvptx_data_sharing.cpp clang/test/OpenMP/nvptx_parallel_codegen.cpp clang/test/OpenMP/nvptx_target_codegen.cpp clang/test/OpenMP/nvptx_target_teams_codegen.cpp clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPKinds.def openmp/libomptarget/deviceRTLs/common/src/parallel.cu openmp/libomptarget/deviceRTLs/interface.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83268.275871.patch Type: text/x-patch Size: 12960 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:06:09 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:06:09 +0000 (UTC) Subject: [PATCH] D82673: [ModuloSchedule] Make PeelingModuloScheduleExpander inheritable. In-Reply-To: References: Message-ID: dblaikie added a comment. In D82673#2129786 , @hgreving wrote: > In short - the virtuality is not the issue, the inheritability is. The class can be non-virtual, no problem for us. I would like to reuse existing code in the upstream pass that is currently "private". Hence, the "protected" part is important. It also makes sense generally, because a software pipeliner on a specific subtarget may follow different expansion strategies. If you would like to revert the virtuality, no problem at all, if you keep the protected inheritance. I think we should design a better defined interface for this class in the long run. Only one target upstream and us downstream are using it AFAIK, but as we are supporting more targets, we may come up with something better. SG? Can pull in Hexagon people, yes Ah, fair enough. Thanks for explaining! 
I've removed the virtual dtor and virtuality from `expand` in 7a99aab8692c58558b62e9a66120886b8a70fab8 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82673/new/ https://reviews.llvm.org/D82673 From llvm-commits at lists.llvm.org Mon Jul 6 18:07:04 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:07:04 +0000 (UTC) Subject: [PATCH] D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) Message-ID: jdoerfert created this revision. jdoerfert added reviewers: jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis. Herald added subscribers: llvm-commits, okura, bbn, sstefan1, guansong, bollu, hiraditya, yaxunl. Herald added a reviewer: sstefan1. Herald added a reviewer: baziotis. Herald added a project: LLVM. We now identify GPU kernels, that is entry points into the GPU code. These kernels (can) correspond to OpenMP target regions. With this patch we identify and on request print them via remarks. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83269 Files: llvm/include/llvm/Transforms/IPO/OpenMPOpt.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83269.275873.patch Type: text/x-patch Size: 9136 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:07:26 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:07:26 +0000 (UTC) Subject: [PATCH] D83270: [OpenMP] Compute a proper module slice for the CGSCCC pass Message-ID: jdoerfert created this revision. jdoerfert added reviewers: jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis. Herald added subscribers: llvm-commits, sstefan1, guansong, bollu, hiraditya, yaxunl. 
Herald added a project: LLVM. The module slice describes which functions we can analyze and transform while working on an SCC as part of the CGSCC OpenMPOpt pass. So far, we simply restricted it to the SCC. In a follow up we will need to have a bigger scope which is why this patch introduces a proper identification of the module slice. In short, everything that has a transitive reference to a function in the SCC or is transitively referenced by one is fair game. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83270 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83270.275874.patch Type: text/x-patch Size: 5399 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:08:00 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:08:00 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine Message-ID: jdoerfert created this revision. jdoerfert added reviewers: jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis. Herald added subscribers: llvm-commits, aaron.ballman, sstefan1, guansong, bollu, hiraditya, yaxunl. Herald added a project: LLVM. In non-SPMD mode we create a state machine like code to identify the parallel region the GPU worker threads should execute next. The identification uses the parallel region function pointer as that allows it to work even if the kernel (=target region) and the parallel region are in separate TUs. However, taking the address of a function comes with various downsides. With this patch we will identify the most common situation and replace the function pointer use with a dummy global symbol (for identification purposes only). 
That means, if the parallel region is only called from a single target region (or kernel), we do not use the function pointer of the parallel region to identify it but a new global symbol. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83271 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83271.275875.patch Type: text/x-patch Size: 18911 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:09:37 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:09:37 +0000 (UTC) Subject: [PATCH] D81267: [LV] Enable the LoopVectorizer to create pointer inductions In-Reply-To: References: Message-ID: <86d6f9a9adec2f0051c48792a342dd08@localhost.localdomain> rupprecht requested changes to this revision. rupprecht added a comment. This revision now requires changes to proceed. I guess I should also request changes since there's a crash :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81267/new/ https://reviews.llvm.org/D81267 From llvm-commits at lists.llvm.org Mon Jul 6 18:16:58 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:16:58 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: clementval updated this revision to Diff 275877. clementval added a comment. 
Rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82982.275877.patch Type: text/x-patch Size: 109657 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:18:25 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:18:25 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: <34ac37d3d1fba20ba3d31b60492aac0e@localhost.localdomain> greened added a comment. In D83004#2134303, @greened wrote: > > I don't particularly like this mode dichotomy but unifying it would necessitate updating a whole lot of clang tests. Actually that's not strictly true. It would just change the tests significantly when they are updated for some other reason. Whether that is reasonable is something we should discuss. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 From llvm-commits at lists.llvm.org Mon Jul 6 18:19:16 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:19:16 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: <6313a53d75e43799f9738b7dd9e27836@localhost.localdomain> smeenai accepted this revision. smeenai added inline comments. 
================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:169-170 + for (const auto &OldNew : Config.RPathsToUpdate) { + StringRef Old = OldNew.getFirst(); + StringRef New = OldNew.getSecond(); + if (RPaths.count(Old) == 0) ---------------- sameerarora101 wrote: > This is the latest update. Would it work on Darwin? Thanks Yup, that builds. I believe this is a libc++ bug though, and I commented on https://bugs.llvm.org/show_bug.cgi?id=17550#c11 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 From llvm-commits at lists.llvm.org Mon Jul 6 18:19:52 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:19:52 +0000 (UTC) Subject: [PATCH] D82886: [DebugInfo] Fix a possible crash when reading a malformed .debug_*lists section. In-Reply-To: References: Message-ID: dblaikie added a comment. In D82886#2127149, @ikudrin wrote: > In D82886#2126027, @dblaikie wrote: > > > Is this assertion really valuable? "length()" does the same addition that is done to create the value that length() is being compared to - it's pretty trivial? Perhaps "FullLength" should be initialized with a call to length instead of duplicating that addition? > > > In that case, we have a small inconsistency in the error message. In most cases, we report the full length of the data, which is the value of the unit length field plus the size of the field itself, but for the unit length of 0, we would not add the size of the field and would report only the raw value. This inconsistency is the only reason we calculate `FullLength` directly. Is that difference necessary? I tried removing the length == 0 special case from "length()" and no tests fail. Perhaps we could go that route instead?
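To make the inconsistency discussed above concrete, here is a minimal standalone sketch. It is not the LLVM implementation: `HeaderModel`, `unitLengthFieldSize`, `lengthWithSpecialCase` and `lengthUniform` are hypothetical names chosen for illustration. The only things assumed are that the unit length field itself occupies 4 bytes in DWARF32 and 12 bytes in DWARF64, and that the special case returns 0 for a zero unit length, as described in the quoted reply.

```cpp
#include <cstdint>

// Hypothetical stand-in for the DWARF format selector.
enum class DwarfFormat { DWARF32, DWARF64 };

// Size of the unit length field itself: 4 bytes in DWARF32, 12 bytes in
// DWARF64 (a 4-byte escape value followed by an 8-byte length).
constexpr uint64_t unitLengthFieldSize(DwarfFormat F) {
  return F == DwarfFormat::DWARF32 ? 4 : 12;
}

// Hypothetical model of a list table header; not the real LLVM class.
struct HeaderModel {
  uint64_t UnitLength; // raw value of the unit length field
  DwarfFormat Format;

  // Variant with the special case: a zero unit length reports 0, so the
  // size of the field is not added in that one situation.
  uint64_t lengthWithSpecialCase() const {
    if (UnitLength == 0)
      return 0;
    return UnitLength + unitLengthFieldSize(Format);
  }

  // Variant without the special case: the field size is always added, so
  // every reported length follows the same rule.
  uint64_t lengthUniform() const {
    return UnitLength + unitLengthFieldSize(Format);
  }
};
```

With a zero unit length in DWARF32 the first variant reports 0 and the second reports 4; for any non-zero unit length the two agree. If the special case were dropped, a directly computed full length and the accessor would presumably always match, which seems to be the point of the question above.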
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82886/new/ https://reviews.llvm.org/D82886 From llvm-commits at lists.llvm.org Mon Jul 6 18:21:54 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:21:54 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: greened marked an inline comment as done. greened added inline comments. ================ Comment at: llvm/utils/update_cc_test_checks.py:133 + parser.add_argument('--include-generated-funcs', action='store_true', + help='Output checks for functions not in source') parser.add_argument('tests', nargs='+') ---------------- greened wrote: > jdoerfert wrote: > > I think this should go into common.py (after D78618). I would also make this the default but OK. > Yes I suppose it should in case `opt` and friends generate functions. I hadn't considered that use-case. > > While I would like to make it default unfortunately it would require updating a bunch of the existing clang tests which doesn't seem too friendly. See the patch update comment for details. > Just realized it wouldn't necessarily require regeneration of tests, it would just cause regenerated tests to change a lot when they are eventually regenerated. We should discuss as to whether that's acceptable. I think for now this should be non-default to at least get the functionality in without disturbing existing users and then we can discuss a separate change to make it default. It's also possible we could change how clang orders functions. I discovered there's a difference in clang 10 vs. 11 in the order functions are output when OpenMP outlining happens. clang 10 seems to preserve the source order of functions and clang 11 does not. Perhaps that needs to be fixed as I don't know whether that change was intentional or not. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 From llvm-commits at lists.llvm.org Mon Jul 6 18:23:31 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:23:31 +0000 (UTC) Subject: [PATCH] D82838: Parse section ranges when verifying DWARF so we can exclude addresses that should have been stripped from DWARF. In-Reply-To: References: Message-ID: dblaikie added a comment. I think maybe this is sort of orthogonal to 46453... maybe not, but kind of. Seems like we should filter out known-tombstoned ranges (the only ones we can know for sure are the new -1/-2 tombstones - all the others have ambiguities). Then we should maybe flag maybe-tombstones with a little "eh, maybe?". Then we should warn for anything left that's even partially outside the .text range (this patch), then we should warn for overlaps/etc on the remaining ones? But as @jhenderson said, maybe those first ones come later & we use the .text range to determine which things to look at for overlap first, then add new verifier checks for "things outside .text that aren't clearly tombstoned" knowing that some of those are expected limitations of (at least gold's) previous tombstoning strategies. 
(I'd sort of like to avoid actually looking at the object's executable sections - but I can't really fault the strategy & even if we added all the other verifier checks/warnings/etc, it'd still be super reasonable to warn about ranges that are otherwise totally valid, but extend beyond/are entirely outside the actual executable .text) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82838/new/ https://reviews.llvm.org/D82838 From llvm-commits at lists.llvm.org Mon Jul 6 18:24:09 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:24:09 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: amyk updated this revision to Diff 275861. amyk added a comment. Address review comments from Nemanja: - update comments, variables - consider the case when the splat is smaller than 32-bits (and add associated test case) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/p10-splatImm32.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83245.275861.patch Type: text/x-patch Size: 12048 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:24:37 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:24:37 +0000 (UTC) Subject: [PATCH] D83049: [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. In-Reply-To: References: Message-ID: <94e9cc55ac9b8927595e22f2263260ca@localhost.localdomain> dblaikie added inline comments. 
================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s:2-7 +# RUN: not llvm-dwarfdump -debug-pubnames %t 2>&1 | FileCheck %s +# RUN: not llvm-dwarfdump -debug-pubtypes %t 2>&1 | FileCheck %s +# RUN: not llvm-dwarfdump -debug-gnu-pubnames %t 2>&1 | FileCheck %s +# RUN: not llvm-dwarfdump -debug-gnu-pubtypes %t 2>&1 | FileCheck %s + +# CHECK: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) ---------------- Might be nice if you could test all these in one go? (so it's one invocation of llvm-dwarfdump rather than 4 - save on process overhead, etc) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83049/new/ https://reviews.llvm.org/D83049 From llvm-commits at lists.llvm.org Mon Jul 6 18:27:39 2020 From: llvm-commits at lists.llvm.org (Xiang Zhang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:27:39 +0000 (UTC) Subject: [PATCH] D83111: [X86-64] Support Intel AMX Intrinsic In-Reply-To: References: Message-ID: <5b3f47598e2551e6d2f3ebcb380b8add@localhost.localdomain> xiangzhangllvm updated this revision to Diff 275878. xiangzhangllvm added a comment. Fix some missed change last time. 1 doxygen comments: amxintrin.h --> x86intrin.h refine ldtilecfg and sttilecfg comment. 
2 replace tile reg num 8 with TileRegHigh+1 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83111/new/ https://reviews.llvm.org/D83111 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Basic/BuiltinsX86_64.def clang/include/clang/Basic/DiagnosticSemaKinds.td clang/include/clang/Driver/Options.td clang/include/clang/Sema/Sema.h clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h clang/lib/Headers/CMakeLists.txt clang/lib/Headers/amxintrin.h clang/lib/Headers/cpuid.h clang/lib/Headers/immintrin.h clang/lib/Sema/SemaChecking.cpp clang/test/CodeGen/AMX/amx.c clang/test/CodeGen/AMX/amx_errors.c clang/test/CodeGen/AMX/amx_inline_asm.c clang/test/Driver/x86-target-features.c clang/test/Preprocessor/x86_amx_target_features.c llvm/include/llvm/IR/IntrinsicsX86.td llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86InstrAMX.td llvm/test/CodeGen/X86/AMX/amx-bf16-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-int8-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-tile-intrinsics.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83111.275878.patch Type: text/x-patch Size: 42434 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:29:05 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via llvm-commits) Date: Mon, 06 Jul 2020 18:29:05 -0700 (PDT) Subject: [llvm] c13e3e2 - [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. Message-ID: <5f03cfe1.1c69fb81.86a91.5811@mx.google.com> Author: Amy Kwan Date: 2020-07-06T20:28:38-05:00 New Revision: c13e3e2c2e0c774917bcc7f4f50c29c8133d3a55 URL: https://github.com/llvm/llvm-project/commit/c13e3e2c2e0c774917bcc7f4f50c29c8133d3a55 DIFF: https://github.com/llvm/llvm-project/commit/c13e3e2c2e0c774917bcc7f4f50c29c8133d3a55.diff LOG: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. 
This patch aims to exploit the xxsplti32dx XT, IX, IMM32 instruction when lowering VECTOR_SHUFFLEs. We implement lowerToXXSPLTI32DX when lowering vector shuffles to check if: - Element size is 4 bytes - The RHS is a constant vector (and constant splat of 4-bytes) - The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks: <0, 4-7, 2, 4-7> <4-7, 1, 4-7, 3> Differential Revision: https://reviews.llvm.org/D83245 Added: llvm/test/CodeGen/PowerPC/p10-splatImm32.ll Modified: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td Removed: ################################################################################ diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp index 40619519664f..815a84e8c320 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp @@ -1477,6 +1477,8 @@ const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const { case PPCISD::XXSPLT: return "PPCISD::XXSPLT"; case PPCISD::XXSPLTI_SP_TO_DP: return "PPCISD::XXSPLTI_SP_TO_DP"; + case PPCISD::XXSPLTI32DX: + return "PPCISD::XXSPLTI32DX"; case PPCISD::VECINSERT: return "PPCISD::VECINSERT"; case PPCISD::XXPERMDI: return "PPCISD::XXPERMDI"; case PPCISD::VECSHL: return "PPCISD::VECSHL"; @@ -9778,6 +9780,77 @@ SDValue PPCTargetLowering::lowerToVINSERTH(ShuffleVectorSDNode *N, return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins); } +/// lowerToXXSPLTI32DX - Return the SDValue if this VECTOR_SHUFFLE can be +/// handled by the XXSPLTI32DX instruction introduced in ISA 3.1, otherwise +/// return the default SDValue. +SDValue PPCTargetLowering::lowerToXXSPLTI32DX(ShuffleVectorSDNode *SVN, + SelectionDAG &DAG) const { + // The LHS and RHS may be bitcasts to v16i8 as we canonicalize shuffles + // to v16i8. Peek through the bitcasts to get the actual operands. 
+ SDValue LHS = peekThroughBitcasts(SVN->getOperand(0)); + SDValue RHS = peekThroughBitcasts(SVN->getOperand(1)); + + auto ShuffleMask = SVN->getMask(); + SDValue VecShuffle(SVN, 0); + SDLoc DL(SVN); + + // Check that we have a four byte shuffle. + if (!isNByteElemShuffleMask(SVN, 4, 1)) + return SDValue(); + + // Canonicalize the RHS being a BUILD_VECTOR when lowering to xxsplti32dx. + if (RHS->getOpcode() != ISD::BUILD_VECTOR) { + std::swap(LHS, RHS); + VecShuffle = DAG.getCommutedVectorShuffle(*SVN); + ShuffleMask = cast<ShuffleVectorSDNode>(VecShuffle)->getMask(); + } + + // Ensure that the RHS is a vector of constants. + BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(RHS.getNode()); + if (!BVN) + return SDValue(); + + // Check if RHS is a splat of 4-bytes (or smaller). + APInt APSplatValue, APSplatUndef; + unsigned SplatBitSize; + bool HasAnyUndefs; + if (!BVN->isConstantSplat(APSplatValue, APSplatUndef, SplatBitSize, + HasAnyUndefs, 0, !Subtarget.isLittleEndian()) || + SplatBitSize > 32) + return SDValue(); + + // Check that the shuffle mask matches the semantics of XXSPLTI32DX. + // The instruction splats a constant C into two words of the source vector + // producing { C, Unchanged, C, Unchanged } or { Unchanged, C, Unchanged, C }. + // Thus we check that the shuffle mask is the equivalent of + // <0, [4-7], 2, [4-7]> or <[4-7], 1, [4-7], 3> respectively. + // Note: the check above of isNByteElemShuffleMask() ensures that the bytes + // within each word are consecutive, so we only need to check the first byte. + SDValue Index; + bool IsLE = Subtarget.isLittleEndian(); + if ((ShuffleMask[0] == 0 && ShuffleMask[8] == 8) && + (ShuffleMask[4] % 4 == 0 && ShuffleMask[12] % 4 == 0 && + ShuffleMask[4] > 15 && ShuffleMask[12] > 15)) + Index = DAG.getTargetConstant(IsLE ? 0 : 1, DL, MVT::i32); + else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && + (ShuffleMask[0] % 4 == 0 && ShuffleMask[8] % 4 == 0 && + ShuffleMask[0] > 15 && ShuffleMask[8] > 15)) + Index = DAG.getTargetConstant(IsLE ?
1 : 0, DL, MVT::i32); + else + return SDValue(); + + // If the splat is narrower than 32-bits, we need to get the 32-bit value + // for XXSPLTI32DX. + unsigned SplatVal = APSplatValue.getZExtValue(); + for (; SplatBitSize < 32; SplatBitSize <<= 1) + SplatVal |= (SplatVal << SplatBitSize); + + SDValue SplatNode = DAG.getNode( + PPCISD::XXSPLTI32DX, DL, MVT::v2i64, DAG.getBitcast(MVT::v2i64, LHS), + Index, DAG.getTargetConstant(SplatVal, DL, MVT::i32)); + return DAG.getNode(ISD::BITCAST, DL, MVT::v16i8, SplatNode); +} + /// LowerROTL - Custom lowering for ROTL(v1i128) to vector_shuffle(v16i8). /// We lower ROTL(v1i128) to vector_shuffle(v16i8) only if shift amount is /// a multiple of 8. Otherwise convert it to a scalar rotation(i128) @@ -9895,6 +9968,12 @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins); } + if (Subtarget.hasPrefixInstrs()) { + SDValue SplatInsertNode; + if ((SplatInsertNode = lowerToXXSPLTI32DX(SVOp, DAG))) + return SplatInsertNode; + } + if (Subtarget.hasP9Altivec()) { SDValue NewISDNode; if ((NewISDNode = lowerToVINSERTH(SVOp, DAG))) diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h index 98256ae0c359..768eaa43e013 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.h +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h @@ -102,6 +102,10 @@ namespace llvm { /// vector or scalar. XXSPLTI_SP_TO_DP, + /// XXSPLTI32DX - The PPC XXSPLTI32DX instruction. + /// + XXSPLTI32DX, + /// VECINSERT - The PPC vector insert instruction /// VECINSERT, @@ -1270,6 +1274,10 @@ namespace llvm { /// essentially v16i8 vector version of VINSERTH. SDValue lowerToVINSERTB(ShuffleVectorSDNode *N, SelectionDAG &DAG) const; + /// lowerToXXSPLTI32DX - Return the SDValue if this VECTOR_SHUFFLE can be + /// handled by the XXSPLTI32DX instruction introduced in ISA 3.1. 
+ SDValue lowerToXXSPLTI32DX(ShuffleVectorSDNode *N, SelectionDAG &DAG) const; + // Return whether the call instruction can potentially be optimized to a // tail call. This will cause the optimizers to attempt to move, or // duplicate return instructions to help enable tail call optimizations. diff --git a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td index 818c4e839e0a..3ed651abe453 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -1,3 +1,19 @@ +//===----------------------------------------------------------------------===// +// PowerPC ISA 3.1 specific type constraints. +// + +def SDT_PPCSplat32 : SDTypeProfile<1, 3, [ SDTCisVT<0, v2i64>, + SDTCisVec<1>, SDTCisInt<2>, SDTCisInt<3> +]>; + +//===----------------------------------------------------------------------===// +// ISA 3.1 specific PPCISD nodes. +// + +def PPCxxsplti32dx : SDNode<"PPCISD::XXSPLTI32DX", SDT_PPCSplat32, []>; + +//===----------------------------------------------------------------------===// + // PC Relative flag (for instructions that use the address of the prefix for // address computations). 
class isPCRel { bit PCRel = 1; } @@ -732,8 +748,11 @@ let Predicates = [PrefixInstrs] in { (PPCxxspltidp i32:$IMM32))]>; def XXSPLTI32DX : 8RR_DForm_IMM32_XT6_IX<32, 0, (outs vsrc:$XT), - (ins vsrc:$XTi, i1imm:$IX, i32imm:$IMM32), - "xxsplti32dx $XT, $IX, $IMM32", IIC_VecGeneral, []>, + (ins vsrc:$XTi, u1imm:$IX, i32imm:$IMM32), + "xxsplti32dx $XT, $IX, $IMM32", IIC_VecGeneral, + [(set v2i64:$XT, + (PPCxxsplti32dx v2i64:$XTi, i32:$IX, + i32:$IMM32))]>, RegConstraint<"$XTi = $XT">, NoEncode<"$XTi">; def XXPERMX : 8RR_XX4Form_IMM3_XTABC6<34, 0, (outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, diff --git a/llvm/test/CodeGen/PowerPC/p10-splatImm32.ll b/llvm/test/CodeGen/PowerPC/p10-splatImm32.ll new file mode 100644 index 000000000000..d610bd260fc9 --- /dev/null +++ b/llvm/test/CodeGen/PowerPC/p10-splatImm32.ll @@ -0,0 +1,120 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -O2 \ +; RUN: -ppc-asm-full-reg-names -mcpu=pwr10 < %s | \ +; RUN: FileCheck --check-prefix=CHECK-LE %s +; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu -O2 \ +; RUN: -ppc-asm-full-reg-names -mcpu=pwr10 < %s | \ +; RUN: FileCheck --check-prefix=CHECK-BE %s + +; Function Attrs: norecurse nounwind readnone +define <4 x i32> @test_xxsplti32dx_1(<4 x i32> %a) { +; CHECK-LE-LABEL: test_xxsplti32dx_1: +; CHECK-LE: # %bb.0: # %entry +; CHECK-LE-NEXT: xxsplti32dx vs34, 0, 566 +; CHECK-LE-NEXT: blr +; +; CHECK-BE-LABEL: test_xxsplti32dx_1: +; CHECK-BE: # %bb.0: # %entry +; CHECK-BE-NEXT: xxsplti32dx vs34, 1, 566 +; CHECK-BE-NEXT: blr +entry: + %vecins1 = shufflevector <4 x i32> %a, <4 x i32> , <4 x i32> + ret <4 x i32> %vecins1 +} + +; Function Attrs: norecurse nounwind readnone +define <4 x i32> @test_xxsplti32dx_2(<4 x i32> %a) { +; CHECK-LE-LABEL: test_xxsplti32dx_2: +; CHECK-LE: # %bb.0: # %entry +; CHECK-LE-NEXT: xxsplti32dx vs34, 1, 33 +; CHECK-LE-NEXT: blr +; +; CHECK-BE-LABEL: 
test_xxsplti32dx_2: +; CHECK-BE: # %bb.0: # %entry +; CHECK-BE-NEXT: xxsplti32dx vs34, 0, 33 +; CHECK-BE-NEXT: blr +entry: + %vecins1 = shufflevector <4 x i32> , <4 x i32> %a, <4 x i32> + ret <4 x i32> %vecins1 +} + +; Function Attrs: norecurse nounwind readnone +define <4 x i32> @test_xxsplti32dx_3(<4 x i32> %a) { +; CHECK-LE-LABEL: test_xxsplti32dx_3: +; CHECK-LE: # %bb.0: # %entry +; CHECK-LE-NEXT: xxsplti32dx vs34, 0, 12 +; CHECK-LE-NEXT: blr +; +; CHECK-BE-LABEL: test_xxsplti32dx_3: +; CHECK-BE: # %bb.0: # %entry +; CHECK-BE-NEXT: xxsplti32dx vs34, 1, 12 +; CHECK-BE-NEXT: blr +entry: + %vecins1 = shufflevector <4 x i32> %a, <4 x i32> , <4 x i32> + ret <4 x i32> %vecins1 +} + +; Function Attrs: norecurse nounwind readnone +define <4 x i32> @test_xxsplti32dx_4(<4 x i32> %a) { +; CHECK-LE-LABEL: test_xxsplti32dx_4: +; CHECK-LE: # %bb.0: # %entry +; CHECK-LE-NEXT: xxsplti32dx vs34, 1, -683 +; CHECK-LE-NEXT: blr +; +; CHECK-BE-LABEL: test_xxsplti32dx_4: +; CHECK-BE: # %bb.0: # %entry +; CHECK-BE-NEXT: xxsplti32dx vs34, 0, -683 +; CHECK-BE-NEXT: blr +entry: + %vecins1 = shufflevector <4 x i32> , <4 x i32> %a, <4 x i32> + ret <4 x i32> %vecins1 +} + +; Function Attrs: nounwind +define <4 x float> @test_xxsplti32dx_5(<4 x float> %vfa) { +; CHECK-LE-LABEL: test_xxsplti32dx_5: +; CHECK-LE: # %bb.0: # %entry +; CHECK-LE-NEXT: xxsplti32dx vs34, 0, 1065353216 +; CHECK-LE-NEXT: blr +; +; CHECK-BE-LABEL: test_xxsplti32dx_5: +; CHECK-BE: # %bb.0: # %entry +; CHECK-BE-NEXT: xxsplti32dx vs34, 1, 1065353216 +; CHECK-BE-NEXT: blr +entry: + %vecins3.i = shufflevector <4 x float> %vfa, <4 x float> , <4 x i32> + ret <4 x float> %vecins3.i +} + +; Function Attrs: nounwind +define <4 x float> @test_xxsplti32dx_6(<4 x float> %vfa) { +; CHECK-LE-LABEL: test_xxsplti32dx_6: +; CHECK-LE: # %bb.0: # %entry +; CHECK-LE-NEXT: xxsplti32dx vs34, 1, 1073741824 +; CHECK-LE-NEXT: blr +; +; CHECK-BE-LABEL: test_xxsplti32dx_6: +; CHECK-BE: # %bb.0: # %entry +; CHECK-BE-NEXT: xxsplti32dx vs34, 0, 
1073741824 +; CHECK-BE-NEXT: blr +entry: + %vecins3.i = shufflevector <4 x float> , <4 x float> %vfa, <4 x i32> + ret <4 x float> %vecins3.i +} + +; Function Attrs: norecurse nounwind readnone +; Test to illustrate when the splat is narrower than 32-bits. +define dso_local <4 x i32> @test_xxsplti32dx_7(<4 x i32> %a) local_unnamed_addr #0 { +; CHECK-LE-LABEL: test_xxsplti32dx_7: +; CHECK-LE: # %bb.0: # %entry +; CHECK-LE-NEXT: xxsplti32dx vs34, 1, -1414812757 +; CHECK-LE-NEXT: blr +; +; CHECK-BE-LABEL: test_xxsplti32dx_7: +; CHECK-BE: # %bb.0: # %entry +; CHECK-BE-NEXT: xxsplti32dx vs34, 0, -1414812757 +; CHECK-BE-NEXT: blr +entry: + %vecins1 = shufflevector <4 x i32> , <4 x i32> %a, <4 x i32> + ret <4 x i32> %vecins1 +} From llvm-commits at lists.llvm.org Mon Jul 6 18:29:05 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:29:05 +0000 (UTC) Subject: [PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. In-Reply-To: References: Message-ID: <8374b84999da5bbdafeda5e4290f8eff@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGc13e3e2c2e0c: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering… (authored by amyk). Changed prior to commit: https://reviews.llvm.org/D83245?vs=275861&id=275880#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83245/new/ https://reviews.llvm.org/D83245 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/p10-splatImm32.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83245.275880.patch Type: text/x-patch Size: 12111 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:29:29 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:29:29 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine In-Reply-To: References: Message-ID: <10ca594f598a5e64d8ab6a1f08ec3f04@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:930-931 + Optional &CachedKernel = UniqueKernelMap[&F]; + if (CachedKernel.hasValue()) + return CachedKernel.getValue(); + ---------------- if (CachedKernel) return *CachedKernel ================ Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:938 + CachedKernel = Kernel(&F); + return CachedKernel.getValue(); + } ---------------- *CachedValue ================ Comment at: llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll:40 + +define internal void @__omp_offloading_35_a1e179_foo_l7_worker() { +entry: ---------------- These tests seem really big ================ Comment at: llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll:278-280 +!llvm.module.flags = !{!0, !1, !2, !3} +!omp_offload.info = !{!4} +!nvvm.annotations = !{!5, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8} ---------------- Mostly unneeded metadata? 
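The first two inline comments above suggest testing and dereferencing the cached optional directly instead of calling `hasValue()` and `getValue()`. A minimal standalone sketch of that idiom follows. It uses `std::optional` as a stand-in for the LLVM optional type, and `Kernel`, `UniqueKernelCache`, `getOrCompute` and the way the value is computed are all hypothetical names and choices made for illustration; none of this is taken from the patch.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for whatever the cache stores per function.
using Kernel = int;

class UniqueKernelCache {
  // An empty optional means "not computed yet".
  std::unordered_map<std::string, std::optional<Kernel>> Cache;
  int Computations = 0;

public:
  Kernel getOrCompute(const std::string &Name) {
    // operator[] default-constructs an empty optional on first access.
    std::optional<Kernel> &Cached = Cache[Name];

    // Contextual conversion to bool plus operator* replaces the
    // hasValue()/getValue() pair.
    if (Cached)
      return *Cached;

    // Placeholder computation; a real cache would do the expensive lookup.
    ++Computations;
    Cached = static_cast<Kernel>(Name.size());
    return *Cached;
  }

  // Number of times the placeholder computation actually ran.
  int computations() const { return Computations; }
};
```

Calling `getOrCompute` twice with the same name runs the placeholder computation only once; the second call returns through the `if (Cached)` branch.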
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83271/new/ https://reviews.llvm.org/D83271 From llvm-commits at lists.llvm.org Mon Jul 6 18:31:14 2020 From: llvm-commits at lists.llvm.org (Lei Huang via llvm-commits) Date: Mon, 06 Jul 2020 18:31:14 -0700 (PDT) Subject: [llvm] 0c6b6e2 - [PowerPC] Implement Vector Splat Immediate Builtins in Clang Message-ID: <5f03d062.1c69fb81.b51a4.fa22@mx.google.com> Author: Biplob Mishra Date: 2020-07-06T20:29:33-05:00 New Revision: 0c6b6e28e70c06a3cb4704d2d8f90829a689e230 URL: https://github.com/llvm/llvm-project/commit/0c6b6e28e70c06a3cb4704d2d8f90829a689e230 DIFF: https://github.com/llvm/llvm-project/commit/0c6b6e28e70c06a3cb4704d2d8f90829a689e230.diff LOG: [PowerPC] Implement Vector Splat Immediate Builtins in Clang Implements builtins for the following prototypes: vector signed int vec_splati (const signed int); vector float vec_splati (const float); vector double vec_splatid (const float); vector signed int vec_splati_ins (vector signed int, const unsigned int, const signed int); vector unsigned int vec_splati_ins (vector unsigned int, const unsigned int, const unsigned int); vector float vec_splati_ins (vector float, const unsigned int, const float); Differential Revision: https://reviews.llvm.org/D82520 Added: Modified: clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/test/CodeGen/PowerPC/p10-splatImm.ll Removed: ################################################################################ diff --git a/clang/lib/Headers/altivec.h b/clang/lib/Headers/altivec.h index a63f2ee359fd..9a4009216930 100644 --- a/clang/lib/Headers/altivec.h +++ b/clang/lib/Headers/altivec.h @@ -17094,6 +17094,58 @@ vec_blendv(vector double __a, vector double __b, vector unsigned long long __c) { return __builtin_vsx_xxblendvd(__a, __b, __c); } + +/* vec_splati */ + +#define vec_splati(__a) \ + _Generic((__a), signed int \ + : ((vector signed int)__a), unsigned int \ + : 
((vector unsigned int)__a), float \ + : ((vector float)__a)) + +/* vec_spatid */ + +static __inline__ vector double __ATTRS_o_ai vec_splatid(const float __a) { + return ((vector double)((double)__a)); +} + +/* vec_splati_ins */ + +static __inline__ vector signed int __ATTRS_o_ai vec_splati_ins( + vector signed int __a, const unsigned int __b, const signed int __c) { +#ifdef __LITTLE_ENDIAN__ + __a[1 - __b] = __c; + __a[3 - __b] = __c; +#else + __a[__b] = __c; + __a[2 + __b] = __c; +#endif + return __a; +} + +static __inline__ vector unsigned int __ATTRS_o_ai vec_splati_ins( + vector unsigned int __a, const unsigned int __b, const unsigned int __c) { +#ifdef __LITTLE_ENDIAN__ + __a[1 - __b] = __c; + __a[3 - __b] = __c; +#else + __a[__b] = __c; + __a[2 + __b] = __c; +#endif + return __a; +} + +static __inline__ vector float __ATTRS_o_ai +vec_splati_ins(vector float __a, const unsigned int __b, const float __c) { +#ifdef __LITTLE_ENDIAN__ + __a[1 - __b] = __c; + __a[3 - __b] = __c; +#else + __a[__b] = __c; + __a[2 + __b] = __c; +#endif + return __a; +} #endif /* __VSX__ */ #endif /* __POWER10_VECTOR__ */ diff --git a/clang/test/CodeGen/builtins-ppc-p10vector.c b/clang/test/CodeGen/builtins-ppc-p10vector.c index b0602dd66f53..22b4e7a6f3ec 100644 --- a/clang/test/CodeGen/builtins-ppc-p10vector.c +++ b/clang/test/CodeGen/builtins-ppc-p10vector.c @@ -512,3 +512,72 @@ vector unsigned int test_vec_inserth_uiv(void) { // CHECK-LE-NEXT: ret <4 x i32> return vec_inserth(vuia, vuib, uia); } + +vector signed int test_vec_vec_splati_si(void) { + // CHECK-BE: ret <4 x i32> + // CHECK: ret <4 x i32> + return vec_splati(-17); +} + +vector unsigned int test_vec_vec_splati_ui(void) { + // CHECK-BE: ret <4 x i32> + // CHECK: ret <4 x i32> + return vec_splati(16U); +} + +vector float test_vec_vec_splati_f(void) { + // CHECK-BE: ret <4 x float> + // CHECK: ret <4 x float> + return vec_splati(1.0f); +} + +vector double test_vec_vec_splatid(void) { + // CHECK-BE: [[T1:%.+]] = fpext float 
%{{.+}} to double + // CHECK-BE-NEXT: [[T2:%.+]] = insertelement <2 x double> undef, double [[T1:%.+]], i32 0 + // CHECK-BE-NEXT: [[T3:%.+]] = shufflevector <2 x double> [[T2:%.+]], <2 x double> undef, <2 x i32> zeroinitialize + // CHECK-BE-NEXT: ret <2 x double> [[T3:%.+]] + // CHECK: [[T1:%.+]] = fpext float %{{.+}} to double + // CHECK-NEXT: [[T2:%.+]] = insertelement <2 x double> undef, double [[T1:%.+]], i32 0 + // CHECK-NEXT: [[T3:%.+]] = shufflevector <2 x double> [[T2:%.+]], <2 x double> undef, <2 x i32> zeroinitialize + // CHECK-NEXT: ret <2 x double> [[T3:%.+]] + return vec_splatid(1.0); +} + +vector signed int test_vec_vec_splati_ins_si(void) { + // CHECK-BE: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 %{{.+}} + // CHECK-BE: [[T1:%.+]] = add i32 2, %{{.+}} + // CHECK-BE: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 [[T1]] + // CHECK-BE: ret <4 x i32> + // CHECK: [[T1:%.+]] = sub i32 1, %{{.+}} + // CHECK: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 [[T1]] + // CHECK: [[T2:%.+]] = sub i32 3, %{{.+}} + // CHECK: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 [[T2]] + // CHECK: ret <4 x i32> + return vec_splati_ins(vsia, 0, -17); +} + +vector unsigned int test_vec_vec_splati_ins_ui(void) { + // CHECK-BE: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 %{{.+}} + // CHECK-BE: [[T1:%.+]] = add i32 2, %{{.+}} + // CHECK-BE: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 [[T1]] + // CHECK-BE: ret <4 x i32> + // CHECK: [[T1:%.+]] = sub i32 1, %{{.+}} + // CHECK: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 [[T1]] + // CHECK: [[T2:%.+]] = sub i32 3, %{{.+}} + // CHECK: insertelement <4 x i32> %{{.+}}, i32 %{{.+}}, i32 [[T2]] + // CHECK: ret <4 x i32> + return vec_splati_ins(vuia, 1, 16U); +} + +vector float test_vec_vec_splati_ins_f(void) { + // CHECK-BE: insertelement <4 x float> %{{.+}}, float %{{.+}}, i32 %{{.+}} + // CHECK-BE: [[T1:%.+]] = add i32 2, %{{.+}} + // CHECK-BE: insertelement <4 x float> %{{.+}}, float %{{.+}}, i32 
[[T1]] + // CHECK-BE: ret <4 x float> + // CHECK: [[T1:%.+]] = sub i32 1, %{{.+}} + // CHECK: insertelement <4 x float> %{{.+}}, float %{{.+}}, i32 [[T1]] + // CHECK: [[T2:%.+]] = sub i32 3, %{{.+}} + // CHECK: insertelement <4 x float> %{{.+}}, float %{{.+}}, i32 [[T2]] + // CHECK: ret <4 x float> + return vec_splati_ins(vfa, 0, 1.0f); +} diff --git a/llvm/test/CodeGen/PowerPC/p10-splatImm.ll b/llvm/test/CodeGen/PowerPC/p10-splatImm.ll index b468f6d00451..8bb83c22be58 100644 --- a/llvm/test/CodeGen/PowerPC/p10-splatImm.ll +++ b/llvm/test/CodeGen/PowerPC/p10-splatImm.ll @@ -286,3 +286,21 @@ define dso_local double @testDoubleZeroScalar() local_unnamed_addr { entry: ret double 0.000000e+00 } + +define dso_local <4 x i32> @vec_splati() local_unnamed_addr { +; CHECK-LABEL: vec_splati: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: xxspltiw vs34, -17 +; CHECK-NEXT: blr +entry: + ret <4 x i32> +} + +define dso_local <2 x double> @vec_splatid() local_unnamed_addr { +; CHECK-LABEL: vec_splatid: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: xxspltidp vs34, 1065353216 +; CHECK-NEXT: blr +entry: + ret <2 x double> +} From llvm-commits at lists.llvm.org Mon Jul 6 18:31:29 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:31:29 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG0c6b6e28e70c: [PowerPC] Implement Vector Splat Immediate Builtins in Clang (authored by biplmish, committed by lei). Herald added a project: clang. Herald added a subscriber: cfe-commits. 
Changed prior to commit: https://reviews.llvm.org/D82520?vs=275790&id=275881#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 Files: clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/test/CodeGen/PowerPC/p10-splatImm.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82520.275881.patch Type: text/x-patch Size: 6056 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:35:36 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:35:36 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine In-Reply-To: References: Message-ID: JonChesterfield added a comment. That's interesting. Amdgpu does not handle function pointers well and I suspect nvptx has considerable performance overhead for them too. If a parallel region is only called from a single target region, it is always passed the same function pointer. Thus specialise the state machine. I think this machinery is equivalent to specialising the parallel region call. The general case involves calling one parallel region runtime function with various different function pointers. Devirtualising that is fairly difficult. For another time. For this simpler case, I think this transform is equivalent to specialising the various kmpc*parallel calls on a given function pointer. The callees are available when using a bitcode deviceRTL. Iirc function specialisation / partial evaluation is one of the classic compiler optimisations that LLVM doesn't really do. It's difficult to define a good cost model and C exposes function pointer comparison. What we could implement for this is an attribute driven one, where we mark the function pointer arguments in the deviceRTL with such and use LTO. Avoid specialising a function whose address escapes. 
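A minimal sketch of the specialisation idea described above, for readers of the archive. It is not code from D83271 or from the device runtime; the names WorkFn, regionA, regionB, runGeneric and runSpecialised are hypothetical. It only illustrates replacing an indirect call through a function pointer with a comparison against known callees followed by direct calls, keeping an indirect fallback for pointers that are not known.

```cpp
// Hypothetical signature for an outlined parallel-region body.
using WorkFn = void (*)(int *);

// Two stand-in parallel regions (assumptions, for illustration only).
static void regionA(int *Out) { *Out += 1; }
static void regionB(int *Out) { *Out += 10; }

// Generic state machine step: always an indirect call.
static void runGeneric(WorkFn Fn, int *Out) { Fn(Out); }

// Specialised step: when the set of possible callees is known, compare the
// pointer and call directly so the callee can be inlined; fall back to the
// indirect call for anything else.
static void runSpecialised(WorkFn Fn, int *Out) {
  if (Fn == &regionA)
    regionA(Out);
  else if (Fn == &regionB)
    regionB(Out);
  else
    Fn(Out);
}
```

Both functions compute the same result; the specialised one simply exposes direct calls to the optimiser, which is the property the comment above is after.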
I like this patch. It's a clear example of an effective OpenMP-specific optimisation. It just happens to run very close to specialisation, which may not be that much harder to implement if we cheat on the cost model. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83271/new/ https://reviews.llvm.org/D83271 From llvm-commits at lists.llvm.org Mon Jul 6 18:35:40 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:35:40 +0000 (UTC) Subject: [PATCH] D83111: [X86-64] Support Intel AMX Intrinsic In-Reply-To: References: Message-ID: craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM with all instances of "pointer point" replaced with just "pointer" ================ Comment at: clang/lib/Headers/amxintrin.h:33 +/// \param __config +/// A pointer point to 512-bits configuration +static __inline__ void __DEFAULT_FN_ATTRS ---------------- Drop 'point' here. I think it's understood that a pointer points. And it would need to be "pointing" if it stays. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83111/new/ https://reviews.llvm.org/D83111 From llvm-commits at lists.llvm.org Mon Jul 6 18:42:51 2020 From: llvm-commits at lists.llvm.org (Hendrik Greving via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:42:51 +0000 (UTC) Subject: [PATCH] D82673: [ModuloSchedule] Make PeelingModuloScheduleExpander inheritable. In-Reply-To: References: Message-ID: <0d6c16fa7c728eb9541d4462212828dc@localhost.localdomain> hgreving added a comment. In D82673#2134669, @dblaikie wrote: > In D82673#2129786, @hgreving wrote: > > > In short - the virtuality is not the issue, the inheritability is. The class can be non-virtual, no problem for us. I would like to reuse existing code in the upstream pass that is currently "private". Hence, the "protected" part is important. 
It also makes sense generally, because a software pipeliner on a specific subtarget may follow different expansion strategies. If you would like to revert the virtuality, no problem at all, if you keep the protected inheritance. I think we should design a better defined interface for this class in the long run. Only one target upstream and us downstream are using it AFAIK, but as we are supporting more targets, we may come up with something better. SG? Can pull in Hexagon people, yes > > > Ah, fair enough. Thanks for explaining! > > I've removed the virtual dtor and virtuality from `expand` in 7a99aab8692c58558b62e9a66120886b8a70fab8 Ok. Thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82673/new/ https://reviews.llvm.org/D82673 From llvm-commits at lists.llvm.org Mon Jul 6 18:45:19 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:45:19 +0000 (UTC) Subject: [PATCH] D83273: [X86] Remove the feature dependency handling in X86TargetInfo::setFeatureEnabledImpl to a table based lookup in X86TargetParser.cpp Message-ID: craig.topper created this revision. craig.topper added reviewers: RKSimon, spatel, echristo, erichkeane, LuoYuanke, LiuChen3, FreddyYe, xiangzhangllvm. Herald added subscribers: jfb, hiraditya. Herald added a project: LLVM. Previously we had to specify the forward and backwards feature dependencies separately which was error prone. And as dependencies have gotten more complex it was hard to be sure the transitive dependencies were handled correctly. The way it was written was also not super readable. This patch replaces everything with a table that lists what features a feature is dependent on directly. Then we just have to recursively walk through the table to find the transitive dependencies. This is largely based on how we handle subtarget features in the MC layer from the tablegen descriptions. 
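A minimal sketch of the table-based approach described above, under stated assumptions: the Feature enum, directDeps, enable and disable are hypothetical names and the dependency entries are illustrative, not the contents of X86TargetParser.cpp. The table records only direct dependencies; enabling a feature walks the table recursively to pull in everything it needs, and disabling a feature walks it the other way to drop everything that needs it.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical feature list (illustrative only).
enum Feature { SSE, SSE2, AVX, AVX2, AMX_TILE, AMX_INT8, NumFeatures };
using FeatureSet = std::bitset<NumFeatures>;

// The "table": for each feature, only the features it depends on directly.
static FeatureSet directDeps(Feature F) {
  FeatureSet S;
  switch (F) {
  case SSE2:     S.set(SSE);      break;
  case AVX:      S.set(SSE2);     break;
  case AVX2:     S.set(AVX);      break;
  case AMX_INT8: S.set(AMX_TILE); break;
  default:                        break;
  }
  return S;
}

// Enable F together with everything it transitively depends on.
static void enable(FeatureSet &Enabled, Feature F) {
  if (Enabled.test(F))
    return; // already handled; also guards against accidental cycles
  Enabled.set(F);
  FeatureSet Deps = directDeps(F);
  for (std::size_t I = 0; I < NumFeatures; ++I)
    if (Deps.test(I))
      enable(Enabled, static_cast<Feature>(I));
}

// Disable F together with everything that transitively depends on it.
static void disable(FeatureSet &Enabled, Feature F) {
  Enabled.reset(F);
  for (std::size_t I = 0; I < NumFeatures; ++I)
    if (Enabled.test(I) && directDeps(static_cast<Feature>(I)).test(F))
      disable(Enabled, static_cast<Feature>(I));
}
```

With a single table, the forward and backward directions cannot drift apart, which is the error-prone part the description mentions.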
https://reviews.llvm.org/D83273 Files: clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h llvm/include/llvm/Support/X86TargetParser.def llvm/include/llvm/Support/X86TargetParser.h llvm/lib/Support/X86TargetParser.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83273.275882.patch Type: text/x-patch Size: 23210 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:47:25 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:47:25 +0000 (UTC) Subject: [PATCH] D83273: [X86] Remove the feature dependency handling in X86TargetInfo::setFeatureEnabledImpl to a table based lookup in X86TargetParser.cpp In-Reply-To: References: Message-ID: <7672db721630b2e1664ead7e04b9ec89@localhost.localdomain> craig.topper updated this revision to Diff 275883. craig.topper added a comment. Drop two header includes I was using for debugging CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83273/new/ https://reviews.llvm.org/D83273 Files: clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h llvm/include/llvm/Support/X86TargetParser.def llvm/include/llvm/Support/X86TargetParser.h llvm/lib/Support/X86TargetParser.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83273.275883.patch Type: text/x-patch Size: 22958 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 18:48:14 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:48:14 +0000 (UTC) Subject: [PATCH] D83273: [X86] Remove the feature dependency handling in X86TargetInfo::setFeatureEnabledImpl to a table based lookup in X86TargetParser.cpp In-Reply-To: References: Message-ID: echristo accepted this revision. echristo added a comment. This revision is now accepted and ready to land. 
Works for me :) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83273/new/ https://reviews.llvm.org/D83273 From llvm-commits at lists.llvm.org Mon Jul 6 18:58:31 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 01:58:31 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: Higuoxing marked an inline comment as done. Higuoxing added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- dblaikie wrote: > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Mon Jul 6 19:06:08 2020 From: llvm-commits at lists.llvm.org (River Riddle via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:06:08 +0000 (UTC) Subject: [PATCH] D83087: DomTree: remove explicit use of DomTreeNodeBase::iterator In-Reply-To: References: Message-ID: <9132a28cf90e12a706ce59503329eb2d@localhost.localdomain> rriddle accepted this revision. rriddle added a comment. Approval for anything MLIR related. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83087/new/ https://reviews.llvm.org/D83087 From llvm-commits at lists.llvm.org Mon Jul 6 19:14:26 2020 From: llvm-commits at lists.llvm.org (Xiang1 Zhang via llvm-commits) Date: Mon, 06 Jul 2020 19:14:26 -0700 (PDT) Subject: [llvm] 939d830 - [X86-64] Support Intel AMX Intrinsic Message-ID: <5f03da82.1c69fb81.7bf0a.31d6@mx.google.com> Author: Xiang1 Zhang Date: 2020-07-07T10:13:40+08:00 New Revision: 939d8309dbd4ee6cf6e9ef3e8ea26df008b006b4 URL: https://github.com/llvm/llvm-project/commit/939d8309dbd4ee6cf6e9ef3e8ea26df008b006b4 DIFF: https://github.com/llvm/llvm-project/commit/939d8309dbd4ee6cf6e9ef3e8ea26df008b006b4.diff LOG: [X86-64] Support Intel AMX Intrinsic INTEL ADVANCED MATRIX EXTENSIONS (AMX). AMX is a new programming paradigm, it has a set of 2-dimensional registers (TILES) representing sub-arrays from a larger 2-dimensional memory image and operate on TILES. These intrinsics use direct TMM register number as its params. 
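Because the tile operands are literal register numbers, they can be validated at compile time. The following is a minimal standalone sketch of the two checks the tests in this commit exercise (valid range [0, 7], and no tile named twice); it is not the Sema code itself, and checkTileArgs and TileCheck are hypothetical names. Note that the range test joins the two bounds with a logical AND, which is an assumption about the intended condition.

```cpp
#include <bitset>
#include <initializer_list>

constexpr int TileRegLow = 0, TileRegHigh = 7;

enum class TileCheck { Ok, OutOfRange, Duplicate };

// Validate a list of tile register numbers: first that every number is in
// [TileRegLow, TileRegHigh], then that no register is named twice.
static TileCheck checkTileArgs(std::initializer_list<int> Tiles) {
  for (int T : Tiles)
    if (!(T >= TileRegLow && T <= TileRegHigh))
      return TileCheck::OutOfRange;

  // One bit per tile register records which ones were already seen.
  std::bitset<TileRegHigh + 1> Seen;
  for (int T : Tiles) {
    if (Seen.test(T))
      return TileCheck::Duplicate;
    Seen.set(T);
  }
  return TileCheck::Ok;
}
```

For example, checkTileArgs({1, 2, 3}) is accepted, while {16, 2, 3} is rejected as out of range and {4, 3, 3} as a duplicate.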
Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83111 Added: clang/lib/Headers/amxintrin.h clang/test/CodeGen/AMX/amx.c clang/test/CodeGen/AMX/amx_errors.c clang/test/CodeGen/AMX/amx_inline_asm.c clang/test/Preprocessor/x86_amx_target_features.c llvm/test/CodeGen/X86/AMX/amx-bf16-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-int8-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-tile-intrinsics.ll Modified: clang/docs/ClangCommandLineReference.rst clang/include/clang/Basic/BuiltinsX86_64.def clang/include/clang/Basic/DiagnosticSemaKinds.td clang/include/clang/Driver/Options.td clang/include/clang/Sema/Sema.h clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h clang/lib/Headers/CMakeLists.txt clang/lib/Headers/cpuid.h clang/lib/Headers/immintrin.h clang/lib/Sema/SemaChecking.cpp clang/test/Driver/x86-target-features.c llvm/include/llvm/IR/IntrinsicsX86.td llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86InstrAMX.td Removed: ################################################################################ diff --git a/clang/docs/ClangCommandLineReference.rst b/clang/docs/ClangCommandLineReference.rst index 67c341feffbb..672c4ae80e73 100644 --- a/clang/docs/ClangCommandLineReference.rst +++ b/clang/docs/ClangCommandLineReference.rst @@ -3127,6 +3127,12 @@ X86 .. option:: -maes, -mno-aes +.. option:: -mamx-bf16, -mno-amx-bf16 + +.. option:: -mamx-int8, -mno-amx-int8 + +.. option:: -mamx-tile, -mno-amx-tile + .. option:: -mavx, -mno-avx .. 
option:: -mavx2, -mno-avx2 diff --git a/clang/include/clang/Basic/BuiltinsX86_64.def b/clang/include/clang/Basic/BuiltinsX86_64.def index c535f43203e5..7feccd2a81a0 100644 --- a/clang/include/clang/Basic/BuiltinsX86_64.def +++ b/clang/include/clang/Basic/BuiltinsX86_64.def @@ -101,6 +101,22 @@ TARGET_BUILTIN(__builtin_ia32_cvtsi2ss64, "V4fV4fOiIi", "ncV:128:", "avx512f") TARGET_BUILTIN(__builtin_ia32_cvtusi2sd64, "V2dV2dUOiIi", "ncV:128:", "avx512f") TARGET_BUILTIN(__builtin_ia32_cvtusi2ss64, "V4fV4fUOiIi", "ncV:128:", "avx512f") TARGET_BUILTIN(__builtin_ia32_directstore_u64, "vULi*ULi", "n", "movdiri") + +// AMX +TARGET_BUILTIN(__builtin_ia32_tile_loadconfig, "vvC*", "n", "amx-tile") +TARGET_BUILTIN(__builtin_ia32_tile_storeconfig, "vvC*", "n", "amx-tile") +TARGET_BUILTIN(__builtin_ia32_tilerelease, "v", "n", "amx-tile") +TARGET_BUILTIN(__builtin_ia32_tilezero, "vUc", "n", "amx-tile") + +TARGET_BUILTIN(__builtin_ia32_tileloadd64, "vIUcvC*z", "n", "amx-tile") +TARGET_BUILTIN(__builtin_ia32_tileloaddt164, "vIUcvC*z", "n", "amx-tile") +TARGET_BUILTIN(__builtin_ia32_tilestored64, "vIUcv*z", "n", "amx-tile") + +TARGET_BUILTIN(__builtin_ia32_tdpbssd, "vIUcIUcIUc", "n", "amx-int8") +TARGET_BUILTIN(__builtin_ia32_tdpbsud, "vIUcIUcIUc", "n", "amx-int8") +TARGET_BUILTIN(__builtin_ia32_tdpbusd, "vIUcIUcIUc", "n", "amx-int8") +TARGET_BUILTIN(__builtin_ia32_tdpbuud, "vIUcIUcIUc", "n", "amx-int8") +TARGET_BUILTIN(__builtin_ia32_tdpbf16ps, "vIUcIUcIUc", "n", "amx-bf16") TARGET_BUILTIN(__builtin_ia32_ptwrite64, "vUOi", "n", "ptwrite") #undef BUILTIN diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 5b94aa8c4325..c935545610e0 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -9342,6 +9342,8 @@ def err_x86_builtin_invalid_rounding : Error< "invalid rounding argument">; def err_x86_builtin_invalid_scale : Error< "scale argument must be 1, 2, 4, 
or 8">; +def err_x86_builtin_tile_arg_duplicate : Error< + "tile arguments must refer to different tiles">; def err_builtin_target_unsupported : Error< "builtin is not supported on this target">; diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 50d18343f7d4..745c696bcaa3 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3065,6 +3065,12 @@ def m3dnow : Flag<["-"], "m3dnow">, Group; def mno_3dnow : Flag<["-"], "mno-3dnow">, Group; def m3dnowa : Flag<["-"], "m3dnowa">, Group; def mno_3dnowa : Flag<["-"], "mno-3dnowa">, Group; +def mamx_bf16 : Flag<["-"], "mamx-bf16">, Group; +def mno_amx_bf16 : Flag<["-"], "mno-amx-bf16">, Group; +def mtamx_int8 : Flag<["-"], "mamx-int8">, Group; +def mno_amx_int8 : Flag<["-"], "mno-amx-int8">, Group; +def mamx_tile : Flag<["-"], "mamx-tile">, Group; +def mno_amx_tile : Flag<["-"], "mno-amx-tile">, Group; def msse : Flag<["-"], "msse">, Group; def mno_sse : Flag<["-"], "mno-sse">, Group; def msse2 : Flag<["-"], "msse2">, Group; diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h index 9b82d2c984be..8ee7dd74712d 100644 --- a/clang/include/clang/Sema/Sema.h +++ b/clang/include/clang/Sema/Sema.h @@ -12142,6 +12142,13 @@ class Sema final { bool CheckSystemZBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall); bool CheckX86BuiltinRoundingOrSAE(unsigned BuiltinID, CallExpr *TheCall); bool CheckX86BuiltinGatherScatterScale(unsigned BuiltinID, CallExpr *TheCall); + bool CheckX86BuiltinTileArguments(unsigned BuiltinID, CallExpr *TheCall); + bool CheckX86BuiltinTileArgumentsRange(CallExpr *TheCall, + ArrayRef ArgNums); + bool CheckX86BuiltinTileArgumentsRange(CallExpr *TheCall, int ArgNum); + bool CheckX86BuiltinTileDuplicate(CallExpr *TheCall, ArrayRef ArgNums); + bool CheckX86BuiltinTileRangeAndDuplicate(CallExpr *TheCall, + ArrayRef ArgNums); bool CheckX86BuiltinFunctionCall(const TargetInfo &TI, unsigned 
BuiltinID, CallExpr *TheCall); bool CheckPPCBuiltinFunctionCall(const TargetInfo &TI, unsigned BuiltinID, diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp index 2c6742b9042a..ed62848d8070 100644 --- a/clang/lib/Basic/Targets/X86.cpp +++ b/clang/lib/Basic/Targets/X86.cpp @@ -62,6 +62,7 @@ static const char *const GCCRegNames[] = { "cr0", "cr2", "cr3", "cr4", "cr8", "dr0", "dr1", "dr2", "dr3", "dr6", "dr7", "bnd0", "bnd1", "bnd2", "bnd3", + "tmm0", "tmm1", "tmm2", "tmm3", "tmm4", "tmm5", "tmm6", "tmm7", }; const TargetInfo::AddlRegName AddlRegNames[] = { @@ -394,7 +395,10 @@ void X86TargetInfo::setFeatureEnabledImpl(llvm::StringMap &Features, } else if (Name == "xsaveopt" || Name == "xsavec" || Name == "xsaves") { if (Enabled) Features["xsave"] = true; - } + } else if (Name == "amx-tile" && !Enabled) { + Features["amx-bf16"] = Features["amx-int8"] = false; + } else if ((Name == "amx-bf16" || Name == "amx-int8") && Enabled) + Features["amx-tile"] = true; } /// handleTargetFeatures - Perform initialization based on the user @@ -529,6 +533,12 @@ bool X86TargetInfo::handleTargetFeatures(std::vector &Features, HasINVPCID = true; } else if (Feature == "+enqcmd") { HasENQCMD = true; + } else if (Feature == "+amx-bf16") { + HasAMXBF16 = true; + } else if (Feature == "+amx-int8") { + HasAMXINT8 = true; + } else if (Feature == "+amx-tile") { + HasAMXTILE = true; } else if (Feature == "+serialize") { HasSERIALIZE = true; } else if (Feature == "+tsxldtrk") { @@ -924,6 +934,12 @@ void X86TargetInfo::getTargetDefines(const LangOptions &Opts, Builder.defineMacro("__INVPCID__"); if (HasENQCMD) Builder.defineMacro("__ENQCMD__"); + if (HasAMXTILE) + Builder.defineMacro("__AMXTILE__"); + if (HasAMXINT8) + Builder.defineMacro("__AMXINT8__"); + if (HasAMXBF16) + Builder.defineMacro("__AMXBF16__"); if (HasSERIALIZE) Builder.defineMacro("__SERIALIZE__"); if (HasTSXLDTRK) @@ -1020,6 +1036,9 @@ bool X86TargetInfo::isValidFeatureName(StringRef Name) const { 
.Case("3dnowa", true) .Case("adx", true) .Case("aes", true) + .Case("amx-bf16", true) + .Case("amx-int8", true) + .Case("amx-tile", true) .Case("avx", true) .Case("avx2", true) .Case("avx512f", true) @@ -1102,6 +1121,9 @@ bool X86TargetInfo::hasFeature(StringRef Feature) const { return llvm::StringSwitch(Feature) .Case("adx", HasADX) .Case("aes", HasAES) + .Case("amx-bf16", HasAMXBF16) + .Case("amx-int8", HasAMXINT8) + .Case("amx-tile", HasAMXTILE) .Case("avx", SSELevel >= AVX) .Case("avx2", SSELevel >= AVX2) .Case("avx512f", SSELevel >= AVX512F) diff --git a/clang/lib/Basic/Targets/X86.h b/clang/lib/Basic/Targets/X86.h index c33c608e27c8..623ac9474b5c 100644 --- a/clang/lib/Basic/Targets/X86.h +++ b/clang/lib/Basic/Targets/X86.h @@ -125,6 +125,9 @@ class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo { bool HasPTWRITE = false; bool HasINVPCID = false; bool HasENQCMD = false; + bool HasAMXTILE = false; + bool HasAMXINT8 = false; + bool HasAMXBF16 = false; bool HasSERIALIZE = false; bool HasTSXLDTRK = false; diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt index fd9e3a0d672f..e7bee192d918 100644 --- a/clang/lib/Headers/CMakeLists.txt +++ b/clang/lib/Headers/CMakeLists.txt @@ -2,6 +2,7 @@ set(files adxintrin.h altivec.h ammintrin.h + amxintrin.h arm_acle.h arm_cmse.h armintr.h diff --git a/clang/lib/Headers/amxintrin.h b/clang/lib/Headers/amxintrin.h new file mode 100644 index 000000000000..58254e21c81a --- /dev/null +++ b/clang/lib/Headers/amxintrin.h @@ -0,0 +1,225 @@ +/*===--------------- amxintrin.h - AMX intrinsics -*- C/C++ -*---------------=== + * + * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. + * See https://llvm.org/LICENSE.txt for license information. + * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception + * + *===------------------------------------------------------------------------=== + */ + +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." 
+#endif /* __IMMINTRIN_H */ + +#ifndef __AMXINTRIN_H +#define __AMXINTRIN_H +#ifdef __x86_64__ + +#define __DEFAULT_FN_ATTRS \ + __attribute__((__always_inline__, __nodebug__, __target__("amx-tile"))) + +/// Load tile configuration from a 64-byte memory location specified by +/// "mem_addr". The tile configuration includes the tile type palette, the +/// number of bytes per row, and the number of rows. If the specified +/// palette_id is zero, that signifies the init state for both the tile +/// config and the tile data, and the tiles are zeroed. Any invalid +/// configurations will result in #GP fault. +/// +/// \headerfile +/// +/// This intrinsic corresponds to the LDTILECFG instruction. +/// +/// \param __config +/// A pointer to 512-bits configuration +static __inline__ void __DEFAULT_FN_ATTRS +_tile_loadconfig(const void *__config) +{ + __builtin_ia32_tile_loadconfig(__config); +} + +/// Stores the current tile configuration to a 64-byte memory location +/// specified by "mem_addr". The tile configuration includes the tile type +/// palette, the number of bytes per row, and the number of rows. If tiles +/// are not configured, all zeroes will be stored to memory. +/// +/// \headerfile +/// +/// This intrinsic corresponds to the STTILECFG instruction. +/// +/// \param __config +/// A pointer to 512-bits configuration +static __inline__ void __DEFAULT_FN_ATTRS +_tile_storeconfig(void *__config) +{ + __builtin_ia32_tile_storeconfig(__config); +} + +/// Release the tile configuration to return to the init state, which +/// releases all storage it currently holds. +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TILERELEASE instruction. +static __inline__ void __DEFAULT_FN_ATTRS +_tile_release(void) +{ + __builtin_ia32_tilerelease(); +} + +/// Load tile rows from memory specifieid by "base" address and "stride" into +/// destination tile "dst" using the tile configuration previously configured +/// via "_tile_loadconfig". 
+/// +/// \headerfile +/// +/// This intrinsic corresponds to the TILELOADD instruction. +/// +/// \param dst +/// A destination tile. Max size is 1024 Bytes. +/// \param base +/// A pointer to base address. +/// \param stride +/// The stride between the rows' data to be loaded in memory. +#define _tile_loadd(dst, base, stride) \ + __builtin_ia32_tileloadd64((dst), ((const void *)(base)), (__SIZE_TYPE__)(stride)) + +/// Load tile rows from memory specifieid by "base" address and "stride" into +/// destination tile "dst" using the tile configuration previously configured +/// via "_tile_loadconfig". This intrinsic provides a hint to the implementation +/// that the data will likely not be reused in the near future and the data +/// caching can be optimized accordingly. +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TILELOADDT1 instruction. +/// +/// \param dst +/// A destination tile. Max size is 1024 Bytes. +/// \param base +/// A pointer to base address. +/// \param stride +/// The stride between the rows' data to be loaded in memory. +#define _tile_stream_loadd(dst, base, stride) \ + __builtin_ia32_tileloaddt164((dst), ((const void *)(base)), (__SIZE_TYPE__)(stride)) + +/// Store the tile specified by "src" to memory specifieid by "base" address and +/// "stride" using the tile configuration previously configured via +/// "_tile_loadconfig". +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TILESTORED instruction. +/// +/// \param dst +/// A destination tile. Max size is 1024 Bytes. +/// \param base +/// A pointer to base address. +/// \param stride +/// The stride between the rows' data to be stored in memory. +#define _tile_stored(dst, base, stride) \ + __builtin_ia32_tilestored64((dst), ((void *)(base)), (__SIZE_TYPE__)(stride)) + +/// Zero the tile specified by "tdest". +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TILEZERO instruction. +/// +/// \param tile +/// The destination tile to be zero. 
Max size is 1024 Bytes. +#define _tile_zero(tile) __builtin_ia32_tilezero((tile)) + +/// Compute dot-product of bytes in tiles with a source/destination accumulator. +/// Multiply groups of 4 adjacent pairs of signed 8-bit integers in src0 with +/// corresponding signed 8-bit integers in src1, producing 4 intermediate 32-bit +/// results. Sum these 4 results with the corresponding 32-bit integer in "dst", +/// and store the 32-bit result back to tile "dst". +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TDPBSSD instruction. +/// +/// \param dst +/// The destination tile. Max size is 1024 Bytes. +/// \param src0 +/// The 1st source tile. Max size is 1024 Bytes. +/// \param src1 +/// The 2nd source tile. Max size is 1024 Bytes. +#define _tile_dpbssd(dst, src0, src1) __builtin_ia32_tdpbssd((dst), (src0), (src1)) + +/// Compute dot-product of bytes in tiles with a source/destination accumulator. +/// Multiply groups of 4 adjacent pairs of signed 8-bit integers in src0 with +/// corresponding unsigned 8-bit integers in src1, producing 4 intermediate +/// 32-bit results. Sum these 4 results with the corresponding 32-bit integer +/// in "dst", and store the 32-bit result back to tile "dst". +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TDPBSUD instruction. +/// +/// \param dst +/// The destination tile. Max size is 1024 Bytes. +/// \param src0 +/// The 1st source tile. Max size is 1024 Bytes. +/// \param src1 +/// The 2nd source tile. Max size is 1024 Bytes. +#define _tile_dpbsud(dst, src0, src1) __builtin_ia32_tdpbsud((dst), (src0), (src1)) + +/// Compute dot-product of bytes in tiles with a source/destination accumulator. +/// Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in src0 with +/// corresponding signed 8-bit integers in src1, producing 4 intermediate 32-bit +/// results. Sum these 4 results with the corresponding 32-bit integer in "dst", +/// and store the 32-bit result back to tile "dst". 
+/// +/// \headerfile +/// +/// This intrinsic corresponds to the TDPBUSD instruction. +/// +/// \param dst +/// The destination tile. Max size is 1024 Bytes. +/// \param src0 +/// The 1st source tile. Max size is 1024 Bytes. +/// \param src1 +/// The 2nd source tile. Max size is 1024 Bytes. +#define _tile_dpbusd(dst, src0, src1) __builtin_ia32_tdpbusd((dst), (src0), (src1)) + +/// Compute dot-product of bytes in tiles with a source/destination accumulator. +/// Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in src0 with +/// corresponding unsigned 8-bit integers in src1, producing 4 intermediate +/// 32-bit results. Sum these 4 results with the corresponding 32-bit integer in +/// "dst", and store the 32-bit result back to tile "dst". +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TDPBUUD instruction. +/// +/// \param dst +/// The destination tile. Max size is 1024 Bytes. +/// \param src0 +/// The 1st source tile. Max size is 1024 Bytes. +/// \param src1 +/// The 2nd source tile. Max size is 1024 Bytes. +#define _tile_dpbuud(dst, src0, src1) __builtin_ia32_tdpbuud((dst), (src0), (src1)) + +/// Compute dot-product of BF16 (16-bit) floating-point pairs in tiles src0 and +/// src1, accumulating the intermediate single-precision (32-bit) floating-point +/// elements with elements in "dst", and store the 32-bit result back to tile +/// "dst". +/// +/// \headerfile +/// +/// This intrinsic corresponds to the TDPBF16PS instruction. +/// +/// \param dst +/// The destination tile. Max size is 1024 Bytes. +/// \param src0 +/// The 1st source tile. Max size is 1024 Bytes. +/// \param src1 +/// The 2nd source tile. Max size is 1024 Bytes. 
+#define _tile_dpbf16ps(dst, src0, src1) \ + __builtin_ia32_tdpbf16ps((dst), (src0), (src1)) + +#undef __DEFAULT_FN_ATTRS + +#endif /* __x86_64__ */ +#endif /* __AMXINTRIN_H */ diff --git a/clang/lib/Headers/cpuid.h b/clang/lib/Headers/cpuid.h index 6c38b578b30e..2a88c042d046 100644 --- a/clang/lib/Headers/cpuid.h +++ b/clang/lib/Headers/cpuid.h @@ -190,6 +190,9 @@ #define bit_TSXLDTRK 0x00010000 #define bit_PCONFIG 0x00040000 #define bit_IBT 0x00100000 +#define bit_AMXBF16 0x00400000 +#define bit_AMXTILE 0x01000000 +#define bit_AMXINT8 0x02000000 /* Features in %eax for leaf 7 sub-leaf 1 */ #define bit_AVX512BF16 0x00000020 diff --git a/clang/lib/Headers/immintrin.h b/clang/lib/Headers/immintrin.h index dd27ca2f6605..e9dff2310fdf 100644 --- a/clang/lib/Headers/immintrin.h +++ b/clang/lib/Headers/immintrin.h @@ -471,6 +471,11 @@ _storebe_i64(void * __P, long long __D) { #include #endif +#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \ + defined(__AMXTILE__) || defined(__AMXINT8__) || defined(__AMXBF16__) +#include +#endif + #if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \ defined(__AVX512VP2INTERSECT__) #include diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp index 0ce84ea382b5..2b52415b2800 100644 --- a/clang/lib/Sema/SemaChecking.cpp +++ b/clang/lib/Sema/SemaChecking.cpp @@ -88,6 +88,7 @@ #include "llvm/Support/SaveAndRestore.h" #include "llvm/Support/raw_ostream.h" #include +#include #include #include #include @@ -3607,6 +3608,64 @@ bool Sema::CheckX86BuiltinGatherScatterScale(unsigned BuiltinID, << Arg->getSourceRange(); } +enum { TileRegLow = 0, TileRegHigh = 7 }; + +bool Sema::CheckX86BuiltinTileArgumentsRange(CallExpr *TheCall, + ArrayRef ArgNums) { + for (int ArgNum : ArgNums) { + if (SemaBuiltinConstantArgRange(TheCall, ArgNum, TileRegLow, TileRegHigh)) + return true; + } + return false; +} + +bool Sema::CheckX86BuiltinTileArgumentsRange(CallExpr *TheCall, int ArgNum) { + 
return SemaBuiltinConstantArgRange(TheCall, ArgNum, TileRegLow, TileRegHigh); +} + +bool Sema::CheckX86BuiltinTileDuplicate(CallExpr *TheCall, + ArrayRef ArgNums) { + // Because the max number of tile register is TileRegHigh + 1, so here we use + // each bit to represent the usage of them in bitset. + std::bitset ArgValues; + for (int ArgNum : ArgNums) { + llvm::APSInt Arg; + SemaBuiltinConstantArg(TheCall, ArgNum, Arg); + int ArgExtValue = Arg.getExtValue(); + assert((ArgExtValue >= TileRegLow || ArgExtValue <= TileRegHigh) && + "Incorrect tile register num."); + if (ArgValues.test(ArgExtValue)) + return Diag(TheCall->getBeginLoc(), + diag::err_x86_builtin_tile_arg_duplicate) + << TheCall->getArg(ArgNum)->getSourceRange(); + ArgValues.set(ArgExtValue); + } + return false; +} + +bool Sema::CheckX86BuiltinTileRangeAndDuplicate(CallExpr *TheCall, + ArrayRef ArgNums) { + return CheckX86BuiltinTileArgumentsRange(TheCall, ArgNums) || + CheckX86BuiltinTileDuplicate(TheCall, ArgNums); +} + +bool Sema::CheckX86BuiltinTileArguments(unsigned BuiltinID, CallExpr *TheCall) { + switch (BuiltinID) { + default: + return false; + case X86::BI__builtin_ia32_tileloadd64: + case X86::BI__builtin_ia32_tileloaddt164: + case X86::BI__builtin_ia32_tilestored64: + case X86::BI__builtin_ia32_tilezero: + return CheckX86BuiltinTileArgumentsRange(TheCall, 0); + case X86::BI__builtin_ia32_tdpbssd: + case X86::BI__builtin_ia32_tdpbsud: + case X86::BI__builtin_ia32_tdpbusd: + case X86::BI__builtin_ia32_tdpbuud: + case X86::BI__builtin_ia32_tdpbf16ps: + return CheckX86BuiltinTileRangeAndDuplicate(TheCall, {0, 1, 2}); + } +} static bool isX86_32Builtin(unsigned BuiltinID) { // These builtins only work on x86-32 targets. switch (BuiltinID) { @@ -3640,6 +3699,10 @@ bool Sema::CheckX86BuiltinFunctionCall(const TargetInfo &TI, unsigned BuiltinID, if (CheckX86BuiltinGatherScatterScale(BuiltinID, TheCall)) return true; + // If the intrinsic has a tile arguments, make sure they are valid. 
+ if (CheckX86BuiltinTileArguments(BuiltinID, TheCall)) + return true; + // For intrinsics which take an immediate value as part of the instruction, // range check them here. int i = 0, l = 0, u = 0; diff --git a/clang/test/CodeGen/AMX/amx.c b/clang/test/CodeGen/AMX/amx.c new file mode 100644 index 000000000000..89b486f7a601 --- /dev/null +++ b/clang/test/CodeGen/AMX/amx.c @@ -0,0 +1,32 @@ +// RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +amx-int8 \ +// RUN: -target-feature +amx-bf16 -emit-llvm -o - -Werror -pedantic | FileCheck %s --check-prefixes=CHECK + +#include + +void test_amx(void *data) { + //CHECK-LABEL: @test_amx + //CHECK: call void @llvm.x86.ldtilecfg(i8* %{{.*}}) + //CHECK: call void @llvm.x86.sttilecfg(i8* %{{.*}}) + //CHECK: call void @llvm.x86.tilerelease() + //CHECK: call void @llvm.x86.tilezero(i8 3) + //CHECK: call void @llvm.x86.tileloadd64(i8 4, i8* %{{.*}}, i64 8) + //CHECK: call void @llvm.x86.tileloaddt164(i8 0, i8* %{{.*}}, i64 1) + //CHECK: call void @llvm.x86.tilestored64(i8 0, i8* %{{.*}}, i64 1) + //CHECK: call void @llvm.x86.tdpbssd(i8 1, i8 2, i8 3) + //CHECK: call void @llvm.x86.tdpbsud(i8 1, i8 2, i8 3) + //CHECK: call void @llvm.x86.tdpbusd(i8 1, i8 2, i8 3) + //CHECK: call void @llvm.x86.tdpbuud(i8 1, i8 2, i8 3) + //CHECK: call void @llvm.x86.tdpbf16ps(i8 1, i8 2, i8 3) + _tile_loadconfig(data); + _tile_storeconfig(data); + _tile_release(); + _tile_zero(3); + _tile_loadd(4, data, 8); + _tile_stream_loadd(0, data, 1); + _tile_stored(0, data, 1); + _tile_dpbssd(1, 2, 3); + _tile_dpbsud(1, 2, 3); + _tile_dpbusd(1, 2, 3); + _tile_dpbuud(1, 2, 3); + _tile_dpbf16ps(1, 2, 3); +} diff --git a/clang/test/CodeGen/AMX/amx_errors.c b/clang/test/CodeGen/AMX/amx_errors.c new file mode 100644 index 000000000000..13a2b33b5a0a --- /dev/null +++ b/clang/test/CodeGen/AMX/amx_errors.c @@ -0,0 +1,17 @@ +// RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +amx-tile -target-feature 
+amx-int8 -target-feature +amx-bf16 -emit-llvm -fsyntax-only -verify + +#include + +void test_amx(void *data) { + _tile_zero(16); // expected-error {{argument value 16 is outside the valid range [0, 7]}} + _tile_loadd(19, data, 16); // expected-error {{argument value 19 is outside the valid range [0, 7]}} + _tile_stream_loadd(23, data, 1); // expected-error {{argument value 23 is outside the valid range [0, 7]}} + _tile_stored(88, data, 1); // expected-error {{argument value 88 is outside the valid range [0, 7]}} + _tile_dpbssd(16, 2, 3); // expected-error {{argument value 16 is outside the valid range [0, 7]}} + _tile_dpbssd(0, 16, 3); // expected-error {{argument value 16 is outside the valid range [0, 7]}} + _tile_dpbuud(0, 2, 16); // expected-error {{argument value 16 is outside the valid range [0, 7]}} + _tile_dpbsud(1, 1, 3); // expected-error {{tile arguments must refer to different tiles}} + _tile_dpbsud(7, 1, 7); // expected-error {{tile arguments must refer to different tiles}} + _tile_dpbsud(4, 3, 3); // expected-error {{tile arguments must refer to different tiles}} + _tile_dpbf16ps(4, 3, 3); // expected-error {{tile arguments must refer to different tiles}} +} diff --git a/clang/test/CodeGen/AMX/amx_inline_asm.c b/clang/test/CodeGen/AMX/amx_inline_asm.c new file mode 100644 index 000000000000..9d828f8ac94e --- /dev/null +++ b/clang/test/CodeGen/AMX/amx_inline_asm.c @@ -0,0 +1,11 @@ +// RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +amx-int8 -target-feature +amx-bf16 -emit-llvm -o - -Wall -Werror -pedantic | FileCheck %s --check-prefixes=CHECK,X86_64 + +void f_tilemul(short a) +{ + //CHECK: call void asm sideeffect "tileloadd 0(%rsi,%r13,4), %tmm0 \0A\09tileloadd 0(%rdx,%r14,4), %tmm6 \0A\09tdpbf16ps %tmm6, %tmm0, %tmm7 \0A\09tilestored %tmm7, 0(%r12,%r15,4) \0A\09", "~{memory},~{tmm0},~{tmm6},~{tmm7},~{dirflag},~{fpsr},~{flags}"() + __asm__ volatile ("tileloadd 0(%%rsi,%%r13,4), %%tmm0 \n\t" + "tileloadd
0(%%rdx,%%r14,4), %%tmm6 \n\t" + "tdpbf16ps %%tmm6, %%tmm0, %%tmm7 \n\t" + "tilestored %%tmm7, 0(%%r12,%%r15,4) \n\t" + ::: "memory", "tmm0", "tmm6", "tmm7"); +} diff --git a/clang/test/Driver/x86-target-features.c b/clang/test/Driver/x86-target-features.c index b96eed287bd9..817caeecd71e 100644 --- a/clang/test/Driver/x86-target-features.c +++ b/clang/test/Driver/x86-target-features.c @@ -232,3 +232,18 @@ // RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-tsxldtrk %s -### -o %t.o 2>&1 | FileCheck --check-prefix=NO-TSXLDTRK %s // TSXLDTRK: "-target-feature" "+tsxldtrk" // NO-TSXLDTRK: "-target-feature" "-tsxldtrk" + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mamx-tile %s -### -o %t.o 2>&1 | FileCheck --check-prefix=AMX-TILE %s +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-amx-tile %s -### -o %t.o 2>&1 | FileCheck --check-prefix=NO-AMX-TILE %s +// AMX-TILE: "-target-feature" "+amx-tile" +// NO-AMX-TILE: "-target-feature" "-amx-tile" + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mamx-bf16 %s -### -o %t.o 2>&1 | FileCheck --check-prefix=AMX-BF16 %s +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-amx-bf16 %s -### -o %t.o 2>&1 | FileCheck -check-prefix=NO-AMX-BF16 %s +// AMX-BF16: "-target-feature" "+amx-bf16" +// NO-AMX-BF16: "-target-feature" "-amx-bf16" + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mamx-int8 %s -### -o %t.o 2>&1 | FileCheck --check-prefix=AMX-INT8 %s +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-amx-int8 %s -### -o %t.o 2>&1 | FileCheck --check-prefix=NO-AMX-INT8 %s +// AMX-INT8: "-target-feature" "+amx-int8" +// NO-AMX-INT8: "-target-feature" "-amx-int8" diff --git a/clang/test/Preprocessor/x86_amx_target_features.c b/clang/test/Preprocessor/x86_amx_target_features.c new file mode 100644 index 000000000000..68a3d7f950b1 --- /dev/null +++ b/clang/test/Preprocessor/x86_amx_target_features.c @@ -0,0 +1,35 @@ +// RUN: %clang -target 
i386-unknown-linux-gnu -march=i386 -mamx-tile -x c -E -dM -o - %s | FileCheck -check-prefix=AMX-TILE %s + +// AMX-TILE: #define __AMXTILE__ 1 + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mamx-bf16 -x c -E -dM -o - %s | FileCheck -check-prefix=AMX-BF16 %s + +// AMX-BF16: #define __AMXBF16__ 1 +// AMX-BF16: #define __AMXTILE__ 1 + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mamx-int8 -x c -E -dM -o - %s | FileCheck -check-prefix=AMX-INT8 %s + +// AMX-INT8: #define __AMXINT8__ 1 +// AMX-INT8: #define __AMXTILE__ 1 + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-amx-tile -x c -E -dM -o - %s | FileCheck -check-prefix=NOAMX-TILE %s + +// NOAMX-TILE-NOT: #define __AMXTILE__ 1 + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-amx-bf16 -x c -E -dM -o - %s | FileCheck -check-prefix=NOAMX-BF16 %s + +// NOAMX-BF16-NOT: #define __AMXBF16__ 1 + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -amx-bf16 -mno-amx-tile -x c -E -dM -o - %s | FileCheck -check-prefix=NOAMX-BF16 %s + +// NOAMX-BF16-NOT: #define __AMXTILE__ 1 +// NOAMX-BF16-NOT: #define __AMXBF16__ 1 + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-amx-int8 -x c -E -dM -o - %s | FileCheck -check-prefix=NOAMX-INT8 %s + +// NOAMX-INT8-NOT: #define __AMXINT8__ 1 + +// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -amx-int8 -mno-amx-tile -x c -E -dM -o - %s | FileCheck -check-prefix=NOAMX-INT8 %s + +// NOAMX-INT8-NOT: #define __AMXTILE__ 1 +// NOAMX-INT8-NOT: #define __AMXINT8__ 1 diff --git a/llvm/include/llvm/IR/IntrinsicsX86.td b/llvm/include/llvm/IR/IntrinsicsX86.td index b3bf18720595..3f86fd075d3a 100644 --- a/llvm/include/llvm/IR/IntrinsicsX86.td +++ b/llvm/include/llvm/IR/IntrinsicsX86.td @@ -4948,3 +4948,32 @@ let TargetPrefix = "x86" in { def int_x86_xresldtrk : GCCBuiltin<"__builtin_ia32_xresldtrk">, Intrinsic<[], [], []>; } +//===----------------------------------------------------------------------===// +// AMX 
- Intel AMX extensions + +let TargetPrefix = "x86" in { + def int_x86_ldtilecfg : GCCBuiltin<"__builtin_ia32_tile_loadconfig">, + Intrinsic<[], [llvm_ptr_ty], []>; + def int_x86_sttilecfg : GCCBuiltin<"__builtin_ia32_tile_storeconfig">, + Intrinsic<[], [llvm_ptr_ty], []>; + def int_x86_tilerelease : GCCBuiltin<"__builtin_ia32_tilerelease">, + Intrinsic<[], [], []>; + def int_x86_tilezero : GCCBuiltin<"__builtin_ia32_tilezero">, + Intrinsic<[], [llvm_i8_ty], []>; + def int_x86_tileloadd64 : GCCBuiltin<"__builtin_ia32_tileloadd64">, + Intrinsic<[], [llvm_i8_ty, llvm_ptr_ty, llvm_i64_ty], []>; + def int_x86_tileloaddt164 : GCCBuiltin<"__builtin_ia32_tileloaddt164">, + Intrinsic<[], [llvm_i8_ty, llvm_ptr_ty, llvm_i64_ty], []>; + def int_x86_tilestored64 : GCCBuiltin<"__builtin_ia32_tilestored64">, + Intrinsic<[], [llvm_i8_ty, llvm_ptr_ty, llvm_i64_ty], []>; + def int_x86_tdpbssd : GCCBuiltin<"__builtin_ia32_tdpbssd">, + Intrinsic<[], [llvm_i8_ty, llvm_i8_ty, llvm_i8_ty], []>; + def int_x86_tdpbsud : GCCBuiltin<"__builtin_ia32_tdpbsud">, + Intrinsic<[], [llvm_i8_ty, llvm_i8_ty, llvm_i8_ty], []>; + def int_x86_tdpbusd : GCCBuiltin<"__builtin_ia32_tdpbusd">, + Intrinsic<[], [llvm_i8_ty, llvm_i8_ty, llvm_i8_ty], []>; + def int_x86_tdpbuud : GCCBuiltin<"__builtin_ia32_tdpbuud">, + Intrinsic<[], [llvm_i8_ty, llvm_i8_ty, llvm_i8_ty], []>; + def int_x86_tdpbf16ps : GCCBuiltin<"__builtin_ia32_tdpbf16ps">, + Intrinsic<[], [llvm_i8_ty, llvm_i8_ty, llvm_i8_ty], []>; +} diff --git a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp index 5a57ca7646ff..fb285376c580 100644 --- a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp +++ b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp @@ -4435,8 +4435,39 @@ void X86DAGToDAGISel::Select(SDNode *Node) { break; } + case Intrinsic::x86_tileloadd64: + case Intrinsic::x86_tileloaddt164: + case Intrinsic::x86_tilestored64: { + if (!Subtarget->hasAMXTILE()) + break; + unsigned Opc; + switch (IntNo) { + default: 
llvm_unreachable("Unexpected intrinsic!"); + case Intrinsic::x86_tileloadd64: Opc = X86::PTILELOADD; break; + case Intrinsic::x86_tileloaddt164: Opc = X86::PTILELOADDT1; break; + case Intrinsic::x86_tilestored64: Opc = X86::PTILESTORED; break; + } + // FIXME: Match displacement and scale. + unsigned TIndex = Node->getConstantOperandVal(2); + SDValue TReg = getI8Imm(TIndex, dl); + SDValue Base = Node->getOperand(3); + SDValue Scale = getI8Imm(1, dl); + SDValue Index = Node->getOperand(4); + SDValue Disp = CurDAG->getTargetConstant(0, dl, MVT::i32); + SDValue Segment = CurDAG->getRegister(0, MVT::i16); + SDValue Chain = Node->getOperand(0); + MachineSDNode *CNode; + if (Opc == X86::PTILESTORED) { + SDValue Ops[] = { Base, Scale, Index, Disp, Segment, TReg, Chain }; + CNode = CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops); + } else { + SDValue Ops[] = { TReg, Base, Scale, Index, Disp, Segment, Chain }; + CNode = CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops); + } + ReplaceNode(Node, CNode); + return; + } } - break; } case ISD::BRIND: { diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 88a563720c2a..d7a45f6fb7c4 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -33044,6 +33044,10 @@ X86TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI, const TargetInstrInfo *TII = Subtarget.getInstrInfo(); DebugLoc DL = MI.getDebugLoc(); + auto TMMImmToTMMReg = [](unsigned Imm) { + assert (Imm < 8 && "Illegal tmm index"); + return X86::TMM0 + Imm; + }; switch (MI.getOpcode()) { default: llvm_unreachable("Unexpected instr type to insert"); case X86::TLS_addr32: @@ -33326,6 +33330,67 @@ X86TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI, MI.eraseFromParent(); return BB; } + case X86::PTDPBSSD: + case X86::PTDPBSUD: + case X86::PTDPBUSD: + case X86::PTDPBUUD: + case X86::PTDPBF16PS: { + const DebugLoc &DL = MI.getDebugLoc(); + unsigned Opc; + switch 
(MI.getOpcode()) { + case X86::PTDPBSSD: Opc = X86::TDPBSSD; break; + case X86::PTDPBSUD: Opc = X86::TDPBSUD; break; + case X86::PTDPBUSD: Opc = X86::TDPBUSD; break; + case X86::PTDPBUUD: Opc = X86::TDPBUUD; break; + case X86::PTDPBF16PS: Opc = X86::TDPBF16PS; break; + } + + MachineInstrBuilder MIB = BuildMI(*BB, MI, DL, TII->get(Opc)); + MIB.addReg(TMMImmToTMMReg(MI.getOperand(0).getImm()), RegState::Define); + MIB.addReg(TMMImmToTMMReg(MI.getOperand(0).getImm()), RegState::Undef); + MIB.addReg(TMMImmToTMMReg(MI.getOperand(1).getImm()), RegState::Undef); + MIB.addReg(TMMImmToTMMReg(MI.getOperand(2).getImm()), RegState::Undef); + + MI.eraseFromParent(); // The pseudo is gone now. + return BB; + } + case X86::PTILEZERO: { + const DebugLoc &DL = MI.getDebugLoc(); + unsigned Imm = MI.getOperand(0).getImm(); + BuildMI(*BB, MI, DL, TII->get(X86::TILEZERO), TMMImmToTMMReg(Imm)); + MI.eraseFromParent(); // The pseudo is gone now. + return BB; + } + case X86::PTILELOADD: + case X86::PTILELOADDT1: + case X86::PTILESTORED: { + const DebugLoc &DL = MI.getDebugLoc(); + unsigned Opc; + switch (MI.getOpcode()) { + case X86::PTILELOADD: Opc = X86::TILELOADD; break; + case X86::PTILELOADDT1: Opc = X86::TILELOADDT1; break; + case X86::PTILESTORED: Opc = X86::TILESTORED; break; + } + + MachineInstrBuilder MIB = BuildMI(*BB, MI, DL, TII->get(Opc)); + unsigned CurOp = 0; + if (Opc != X86::TILESTORED) + MIB.addReg(TMMImmToTMMReg(MI.getOperand(CurOp++).getImm()), + RegState::Define); + + MIB.add(MI.getOperand(CurOp++)); // base + MIB.add(MI.getOperand(CurOp++)); // scale + MIB.add(MI.getOperand(CurOp++)); // index -- stride + MIB.add(MI.getOperand(CurOp++)); // displacement + MIB.add(MI.getOperand(CurOp++)); // segment + + if (Opc == X86::TILESTORED) + MIB.addReg(TMMImmToTMMReg(MI.getOperand(CurOp++).getImm()), + RegState::Undef); + + MI.eraseFromParent(); // The pseudo is gone now. 
+ return BB; + } } } diff --git a/llvm/lib/Target/X86/X86InstrAMX.td b/llvm/lib/Target/X86/X86InstrAMX.td index deefb3eecf39..e26dd5050a23 100644 --- a/llvm/lib/Target/X86/X86InstrAMX.td +++ b/llvm/lib/Target/X86/X86InstrAMX.td @@ -18,9 +18,11 @@ let Predicates = [HasAMXTILE, In64BitMode] in { let SchedRW = [WriteSystem] in { let Defs = [TMM0,TMM1,TMM2,TMM3,TMM4,TMM5,TMM6,TMM7] in def LDTILECFG : I <0x49, MRM0m, (outs), (ins opaquemem:$src), - "ldtilecfg\t$src", []>, VEX, T8PS; + "ldtilecfg\t$src", + [(int_x86_ldtilecfg addr:$src)]>, VEX, T8PS; def STTILECFG : I <0x49, MRM0m, (outs), (ins opaquemem:$src), - "sttilecfg\t$src", []>, VEX, T8PD; + "sttilecfg\t$src", + [(int_x86_sttilecfg addr:$src)]>, VEX, T8PD; def TILELOADD : I<0x4b, MRMSrcMemFSIB, (outs TILE:$dst), (ins sibmem:$src), "tileloadd\t{$src, $dst|$dst, $src}", []>, @@ -31,7 +33,7 @@ let Predicates = [HasAMXTILE, In64BitMode] in { VEX, T8PD; let Defs = [TMM0,TMM1,TMM2,TMM3,TMM4,TMM5,TMM6,TMM7] in def TILERELEASE : I<0x49, MRM_C0, (outs), (ins), - "tilerelease", []>, VEX, T8PS; + "tilerelease", [(int_x86_tilerelease)]>, VEX, T8PS; def TILESTORED : I<0x4b, MRMDestMemFSIB, (outs), (ins sibmem:$dst, TILE:$src), "tilestored\t{$src, $dst|$dst, $src}", []>, @@ -39,6 +41,17 @@ let Predicates = [HasAMXTILE, In64BitMode] in { def TILEZERO : I<0x49, MRMr0, (outs TILE:$dst), (ins), "tilezero\t$dst", []>, VEX, T8XD; + + let usesCustomInserter = 1 in { + // Pseudo instructions, using immediates instead of tile registers. 
+ // To be translated to the actual instructions in X86ISelLowering.cpp + def PTILELOADD : PseudoI<(outs), (ins u8imm:$src1, sibmem:$src2), []>; + def PTILELOADDT1 : PseudoI<(outs), (ins u8imm:$src1, + sibmem:$src2), []>; + def PTILESTORED : PseudoI<(outs), (ins i8mem:$dst, u8imm:$src), []>; + def PTILEZERO : PseudoI<(outs), (ins u8imm:$src), + [(int_x86_tilezero imm:$src)]>; + } } // SchedRW } // HasAMXTILE @@ -62,6 +75,27 @@ let Predicates = [HasAMXINT8, In64BitMode] in { "tdpbuud\t{$src3, $src2, $dst|$dst, $src2, $src3}", []>, VEX_4V, T8PS; } + + let usesCustomInserter = 1 in { + // Pseudo instructions, using immediates instead of tile registers. + // To be translated to the actual instructions in X86ISelLowering.cpp + def PTDPBSSD : PseudoI<(outs), (ins u8imm:$src1, + u8imm:$src2, u8imm:$src3), + [(int_x86_tdpbssd imm:$src1, + imm:$src2, imm:$src3)]>; + def PTDPBSUD : PseudoI<(outs), (ins u8imm:$src1, + u8imm:$src2, u8imm:$src3), + [(int_x86_tdpbsud imm:$src1, + imm:$src2, imm:$src3)]>; + def PTDPBUSD : PseudoI<(outs), (ins u8imm:$src1, + u8imm:$src2, u8imm:$src3), + [(int_x86_tdpbusd imm:$src1, + imm:$src2, imm:$src3)]>; + def PTDPBUUD : PseudoI<(outs), (ins u8imm:$src1, + u8imm:$src2, u8imm:$src3), + [(int_x86_tdpbuud imm:$src1, + imm:$src2, imm:$src3)]>; + } } } // HasAMXTILE @@ -72,5 +106,14 @@ let Predicates = [HasAMXBF16, In64BitMode] in { (ins TILE:$src1, TILE:$src2, TILE:$src3), "tdpbf16ps\t{$src3, $src2, $dst|$dst, $src2, $src3}", []>, VEX_4V, T8XS; + + let usesCustomInserter = 1 in { + // Pseudo instructions, using immediates instead of tile registers. 
+ // To be translated to the actual instructions in X86ISelLowering.cpp + def PTDPBF16PS : PseudoI<(outs), (ins u8imm:$src1, + u8imm:$src2, u8imm:$src3), + [(int_x86_tdpbf16ps imm:$src1, + imm:$src2, imm:$src3)]>; + } } } // HasAMXTILE, HasAMXBF16 diff --git a/llvm/test/CodeGen/X86/AMX/amx-bf16-intrinsics.ll b/llvm/test/CodeGen/X86/AMX/amx-bf16-intrinsics.ll new file mode 100644 index 000000000000..a415d9c15242 --- /dev/null +++ b/llvm/test/CodeGen/X86/AMX/amx-bf16-intrinsics.ll @@ -0,0 +1,13 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-tile -mattr=+amx-bf16 -verify-machineinstrs | FileCheck %s + +define void @test_amx() { +; CHECK-LABEL: test_amx: +; CHECK: # %bb.0: +; CHECK-NEXT: tdpbf16ps %tmm7, %tmm4, %tmm3 +; CHECK-NEXT: retq + call void @llvm.x86.tdpbf16ps(i8 3, i8 4, i8 7) + ret void +} + +declare void @llvm.x86.tdpbf16ps(i8 %tile0, i8 %tile1, i8 %tile2) diff --git a/llvm/test/CodeGen/X86/AMX/amx-int8-intrinsics.ll b/llvm/test/CodeGen/X86/AMX/amx-int8-intrinsics.ll new file mode 100644 index 000000000000..49e69aeab510 --- /dev/null +++ b/llvm/test/CodeGen/X86/AMX/amx-int8-intrinsics.ll @@ -0,0 +1,24 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -verify-machineinstrs | FileCheck %s + +define void @test_amx() { +; CHECK-LABEL: test_amx: +; CHECK: # %bb.0: + call void @llvm.x86.tdpbssd(i8 3, i8 4, i8 7) +; CHECK-NEXT: tdpbssd %tmm7, %tmm4, %tmm3 + + call void @llvm.x86.tdpbsud(i8 3, i8 4, i8 7) +; CHECK-NEXT: tdpbsud %tmm7, %tmm4, %tmm3 + + call void @llvm.x86.tdpbusd(i8 3, i8 0, i8 7) +; CHECK-NEXT: tdpbusd %tmm7, %tmm0, %tmm3 + + call void @llvm.x86.tdpbuud(i8 3, i8 4, i8 1) +; CHECK-NEXT: tdpbuud %tmm1, %tmm4, %tmm3 + ret void +} + +declare void @llvm.x86.tdpbssd(i8 %tile0, i8 %tile1, i8 %tile2) +declare void @llvm.x86.tdpbsud(i8 %tile0, i8 %tile1, i8 %tile2) 
+declare void @llvm.x86.tdpbusd(i8 %tile0, i8 %tile1, i8 %tile2) +declare void @llvm.x86.tdpbuud(i8 %tile0, i8 %tile1, i8 %tile2) diff --git a/llvm/test/CodeGen/X86/AMX/amx-tile-intrinsics.ll b/llvm/test/CodeGen/X86/AMX/amx-tile-intrinsics.ll new file mode 100644 index 000000000000..6b8e040abb9a --- /dev/null +++ b/llvm/test/CodeGen/X86/AMX/amx-tile-intrinsics.ll @@ -0,0 +1,36 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-tile -verify-machineinstrs | FileCheck %s + +define void @test_amx(i8* %pointer, i8* %base, i64 %stride) { +; CHECK-LABEL: test_amx: +; CHECK: # %bb.0: + call void @llvm.x86.ldtilecfg(i8* %pointer) +; CHECK-NEXT: ldtilecfg (%rdi) + + call void @llvm.x86.sttilecfg(i8* %pointer) +; CHECK-NEXT: sttilecfg (%rdi) + + call void @llvm.x86.tilerelease() +; CHECK-NEXT: tilerelease + + call void @llvm.x86.tilezero(i8 3) +; CHECK-NEXT: tilezero %tmm3 + + call void @llvm.x86.tileloadd64(i8 3, i8* %base, i64 %stride) +; CHECK-NEXT: tileloadd (%rsi,%rdx), %tmm3 + + call void @llvm.x86.tileloaddt164(i8 3, i8* %base, i64 %stride) +; CHECK-NEXT: tileloaddt1 (%rsi,%rdx), %tmm3 + + call void @llvm.x86.tilestored64(i8 3, i8* %base, i64 %stride) +; CHECK-NEXT: tilestored %tmm3, (%rsi,%rdx) + ret void +} + +declare void @llvm.x86.tileloadd64(i8 %tile, i8* %base, i64 %stride) +declare void @llvm.x86.tileloaddt164(i8 %tile, i8* %base, i64 %stride) +declare void @llvm.x86.tilestored64(i8 %tile, i8* %base, i64 %stride) +declare void @llvm.x86.ldtilecfg(i8* %pointer) +declare void @llvm.x86.sttilecfg(i8* %pointer) +declare void @llvm.x86.tilerelease() +declare void @llvm.x86.tilezero(i8 %tile) From llvm-commits at lists.llvm.org Mon Jul 6 19:14:33 2020 From: llvm-commits at lists.llvm.org (Xiang Zhang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:14:33 +0000 (UTC) Subject: [PATCH] D83111: [X86-64] Support Intel AMX Intrinsic In-Reply-To: References: Message-ID: 
This revision was automatically updated to reflect the committed changes. Closed by commit rG939d8309dbd4: [X86-64] Support Intel AMX Intrinsic (authored by xiangzhangllvm). Herald added a project: clang. Herald added a subscriber: cfe-commits. Changed prior to commit: https://reviews.llvm.org/D83111?vs=275878&id=275888#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83111/new/ https://reviews.llvm.org/D83111 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Basic/BuiltinsX86_64.def clang/include/clang/Basic/DiagnosticSemaKinds.td clang/include/clang/Driver/Options.td clang/include/clang/Sema/Sema.h clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h clang/lib/Headers/CMakeLists.txt clang/lib/Headers/amxintrin.h clang/lib/Headers/cpuid.h clang/lib/Headers/immintrin.h clang/lib/Sema/SemaChecking.cpp clang/test/CodeGen/AMX/amx.c clang/test/CodeGen/AMX/amx_errors.c clang/test/CodeGen/AMX/amx_inline_asm.c clang/test/Driver/x86-target-features.c clang/test/Preprocessor/x86_amx_target_features.c llvm/include/llvm/IR/IntrinsicsX86.td llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86InstrAMX.td llvm/test/CodeGen/X86/AMX/amx-bf16-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-int8-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-tile-intrinsics.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83111.275888.patch Type: text/x-patch Size: 42404 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 19:15:30 2020 From: llvm-commits at lists.llvm.org (Xiang Zhang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:15:30 +0000 (UTC) Subject: [PATCH] D83111: [X86-64] Support Intel AMX Intrinsic In-Reply-To: References: Message-ID: xiangzhangllvm added a comment. In D83111#2134747 , @craig.topper wrote: > LGTM with all instances of "pointer point" replace with just "pointer" Done it in commit. 
Thank you! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83111/new/ https://reviews.llvm.org/D83111 From llvm-commits at lists.llvm.org Mon Jul 6 19:15:57 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:15:57 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: <02260bb11ff42e6009de8beb9b3e1cd1@localhost.localdomain> jdoerfert added inline comments. ================ Comment at: llvm/utils/update_cc_test_checks.py:133 + parser.add_argument('--include-generated-funcs', action='store_true', + help='Output checks for functions not in source') parser.add_argument('tests', nargs='+') ---------------- greened wrote: > greened wrote: > > jdoerfert wrote: > > > I think this should go into common.py (after D78618). I would also make this the default but OK. > > Yes I suppose it should in case `opt` and friends generate functions. I hadn't considered that use-case. > > > > While I would like to make it default unfortunately it would require updating a bunch of the existing clang tests which doesn't seem too friendly. See the patch update comment for details. > > > Just realized it wouldn't necessarily require regeneration of tests, it would just cause regenerated tests to change a lot when they are eventually regenerated. We should discuss as to whether that's acceptable. I think for now this should be non-default to at least get the functionality in without disturbing existing users and then we can discuss a separate change to make it default. > > It's also possible we could change how clang orders functions. I discovered there's a difference in clang 10 vs. 11 in the order functions are output when OpenMP outlining happens. clang 10 seems to preserve the source order of functions and clang 11 does not. 
Perhaps that needs to be fixed as I don't know whether that change was intentional or not. Best case, without the option the original behavior is preserved. Is that not the case? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 From llvm-commits at lists.llvm.org Mon Jul 6 19:20:14 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Mon, 06 Jul 2020 19:20:14 -0700 (PDT) Subject: [llvm] 65482e8 - [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td Message-ID: <5f03dbde.1c69fb81.20a39.6bab@mx.google.com> Author: Valentin Clement Date: 2020-07-06T22:20:06-04:00 New Revision: 65482e8a703d4bfe8b9fb5771e34755045d5a5d7 URL: https://github.com/llvm/llvm-project/commit/65482e8a703d4bfe8b9fb5771e34755045d5a5d7 DIFF: https://github.com/llvm/llvm-project/commit/65482e8a703d4bfe8b9fb5771e34755045d5a5d7.diff LOG: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td Summary: Generate the isAllowedClauseForDirective function from tablegen. This patch introduce the VersionedClause in the tablegen file so that clause can be encapsulated in this class to specify a range of validity on a directive. VersionedClause has default minVersion, maxVersion so it can be used without them or minVersion. 
Reviewers: jdoerfert, jdenny Reviewed By: jdenny Subscribers: yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82982 Added: Modified: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td index 6e7d8a3fe960..87fb88c31ed0 100644 --- a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td +++ b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td @@ -61,7 +61,19 @@ class Clause { bit isDefault = 0; } -// Information about a specific directive +// Hold information about clause validity by version. +class VersionedClause { + // Actual clause. + Clause clause = c; + + // Mininum version number where this clause is valid. + int minVersion = min; + + // Maximum version number where this clause is valid. + int maxVersion = max; +} + +// Information about a specific directive. class Directive { // Name of the directive. Can be composite directive sepearted by whitespace. string name = d; @@ -71,13 +83,13 @@ class Directive { string alternativeName = ""; // List of allowed clauses for the directive. - list allowedClauses = ?; + list allowedClauses = []; // List of clauses that are allowed to appear only once. - list allowedOnceClauses = ?; + list allowedOnceClauses = []; // List of clauses that are required. - list requiredClauses = ?; + list requiredClauses = []; // Set directive used by default when unknown. 
bit isDefault = 0; diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td index ed44676d2fa1..ce0bf4661176 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMP.td +++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td @@ -199,388 +199,1163 @@ def OMPC_Notinbranch : Clause<"notinbranch"> {} def OMP_ThreadPrivate : Directive<"threadprivate"> {} def OMP_Parallel : Directive<"parallel"> { - let allowedClauses = [OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_FirstPrivate, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, - OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Task : Directive<"task"> { - let allowedClauses = [OMPC_If, OMPC_Final, OMPC_Default, OMPC_Private, - OMPC_FirstPrivate, OMPC_Shared, OMPC_Untied, OMPC_Mergeable, OMPC_Depend, - OMPC_Priority, OMPC_InReduction, OMPC_Allocate, OMPC_Detach, - OMPC_Affinity]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Simd : Directive<"simd"> { - let allowedClauses = [OMPC_Private, OMPC_LastPrivate, OMPC_Linear, - OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_Collapse, OMPC_Reduction, - OMPC_Allocate, OMPC_If, OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_For : Directive<"for"> { - let allowedClauses = [OMPC_Private, OMPC_LastPrivate, OMPC_FirstPrivate, - OMPC_Reduction, 
OMPC_Collapse, OMPC_Schedule, OMPC_Ordered, OMPC_NoWait, - OMPC_Linear, OMPC_Allocate, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Do : Directive<"do"> { - let allowedClauses = [OMPC_Private, OMPC_FirstPrivate, OMPC_LastPrivate, - OMPC_Linear, OMPC_Reduction]; - let allowedOnceClauses = [OMPC_Schedule, OMPC_Collapse, OMPC_Ordered]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Sections : Directive<"sections"> { - let allowedClauses = [OMPC_Private, OMPC_LastPrivate, OMPC_FirstPrivate, - OMPC_Reduction, OMPC_NoWait, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Section : Directive<"section"> {} def OMP_Single : Directive<"single"> { - let allowedClauses = [OMPC_Private, OMPC_FirstPrivate, OMPC_CopyPrivate, - OMPC_NoWait, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Master : Directive<"master"> {} def OMP_Critical : Directive<"critical"> { - let allowedClauses = [OMPC_Hint]; + let allowedClauses = [ + VersionedClause + ]; } def OMP_TaskYield : Directive<"taskyield"> {} def OMP_Barrier : Directive<"barrier"> {} def OMP_TaskWait : Directive<"taskwait"> {} def OMP_TaskGroup : Directive<"taskgroup"> { - let allowedClauses = [OMPC_TaskReduction, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause + ]; } def OMP_Flush : Directive<"flush"> { - let allowedClauses = [OMPC_AcqRel, OMPC_Acquire, OMPC_Release, OMPC_Flush]; + let 
allowedClauses = [
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    // TODO This should be `none` instead. Comment carried over from
+    // OMPKinds.def.
+    VersionedClause
+  ];
 }
 def OMP_Ordered : Directive<"ordered"> {
-  let allowedClauses = [OMPC_Threads, OMPC_Simd, OMPC_Depend];
+  let allowedClauses = [
+    VersionedClause,
+    VersionedClause,
+    VersionedClause
+  ];
 }
 def OMP_Atomic : Directive<"atomic"> {
-  let allowedClauses = [OMPC_Read, OMPC_Write, OMPC_Update, OMPC_Capture,
-      OMPC_SeqCst, OMPC_AcqRel, OMPC_Acquire, OMPC_Release, OMPC_Relaxed,
-      OMPC_Hint];
+  let allowedClauses = [
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause
+  ];
 }
 def OMP_Target : Directive<"target"> {
-  let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private,
-      OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate,
-      OMPC_IsDevicePtr, OMPC_Reduction, OMPC_Allocate, OMPC_UsesAllocators];
+  let allowedClauses = [
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause
+  ];
 }
 def OMP_Teams : Directive<"teams"> {
-  let allowedClauses = [OMPC_Default, OMPC_Private, OMPC_FirstPrivate,
-      OMPC_Shared, OMPC_Reduction, OMPC_NumTeams, OMPC_ThreadLimit,
-      OMPC_Allocate];
+  let allowedClauses = [
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    VersionedClause
+  ];
 }
 def OMP_Cancel : Directive<"cancel"> {
-  let allowedClauses = [OMPC_If];
+  let allowedClauses = [
+    VersionedClause
+  ];
 }
 def OMP_Requires : Directive<"requires"> {
-  let allowedClauses = [OMPC_UnifiedAddress, OMPC_UnifiedSharedMemory,
-      OMPC_ReverseOffload, OMPC_DynamicAllocators, OMPC_AtomicDefaultMemOrder];
+  let
allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetData : Directive<"target data"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_UseDevicePtr, - OMPC_UseDevicePtr]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetEnterData : Directive<"target enter data"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_NoWait, - OMPC_Depend]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetExitData : Directive<"target exit data"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_NoWait, - OMPC_Depend]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetParallel : Directive<"target parallel"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_NoWait, - OMPC_Depend, OMPC_Private, OMPC_FirstPrivate, OMPC_DefaultMap, - OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, OMPC_Shared, OMPC_Reduction, - OMPC_IsDevicePtr, OMPC_Allocator, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetParallelFor : Directive<"target parallel for"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_NoWait, OMPC_Depend, - OMPC_DefaultMap, OMPC_NumThreads, OMPC_DefaultMap, OMPC_ProcBind, - OMPC_Shared, OMPC_Reduction, OMPC_Collapse, OMPC_Schedule, OMPC_Ordered, - OMPC_Linear, OMPC_IsDevicePtr, OMPC_Allocator, OMPC_Order, - 
OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetParallelDo : Directive<"target parallel do"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_NoWait, OMPC_Depend, - OMPC_DefaultMap, OMPC_NumThreads, OMPC_DefaultMap, OMPC_ProcBind, - OMPC_Shared, OMPC_Reduction, OMPC_Collapse, OMPC_Schedule, OMPC_Ordered, - OMPC_Linear, OMPC_IsDevicePtr, OMPC_Allocator, OMPC_Order, - OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetUpdate : Directive<"target update"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_To, OMPC_From, OMPC_NoWait, - OMPC_Depend]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelFor : Directive<"parallel for"> { - let allowedClauses = [OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_FirstPrivate, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, - OMPC_LastPrivate, OMPC_Collapse, OMPC_Schedule, OMPC_Ordered, OMPC_Linear, - OMPC_Allocate, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelDo : Directive<"parallel do"> { - let allowedClauses = [ OMPC_Default, OMPC_Private, OMPC_FirstPrivate, - OMPC_Shared, OMPC_Reduction, OMPC_Copyin, OMPC_LastPrivate, OMPC_Linear]; - let allowedOnceClauses = [OMPC_If, OMPC_NumThreads, OMPC_ProcBind, - OMPC_Schedule, OMPC_Ordered, OMPC_Collapse]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelForSimd : Directive<"parallel for simd"> { - let allowedClauses = [OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_FirstPrivate, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, - OMPC_LastPrivate, OMPC_Collapse, OMPC_Schedule, OMPC_SafeLen, - OMPC_SimdLen, OMPC_Linear, OMPC_Aligned, OMPC_Ordered, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelDoSimd : Directive<"parallel do simd"> { - let allowedClauses = [OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_FirstPrivate, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, - OMPC_LastPrivate, OMPC_Collapse, OMPC_Schedule, 
OMPC_SafeLen, - OMPC_SimdLen, OMPC_Linear, OMPC_Aligned, OMPC_Ordered, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelMaster : Directive<"parallel master"> { - let allowedClauses = [OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_Private, - OMPC_FirstPrivate, OMPC_Shared, OMPC_Copyin, OMPC_Reduction, - OMPC_ProcBind, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelSections : Directive<"parallel sections"> { - let allowedClauses = [OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_FirstPrivate, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, - OMPC_LastPrivate, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ForSimd : Directive<"for simd"> { - let allowedClauses = [OMPC_Private, OMPC_FirstPrivate, OMPC_LastPrivate, - OMPC_Reduction, OMPC_Schedule, OMPC_Collapse, OMPC_NoWait, OMPC_SafeLen, - OMPC_SimdLen, OMPC_Linear, OMPC_Aligned, OMPC_Ordered, OMPC_Allocate, - OMPC_If, OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + ]; } def OMP_DoSimd : Directive<"do simd"> { - let allowedClauses = [OMPC_Aligned, OMPC_Private, OMPC_FirstPrivate, - OMPC_LastPrivate, OMPC_Linear, OMPC_Reduction]; - let allowedOnceClauses = [OMPC_Schedule, OMPC_Collapse, OMPC_Ordered, - OMPC_SafeLen, OMPC_SimdLen]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_CancellationPoint : Directive<"cancellation point"> {} def OMP_DeclareReduction : Directive<"declare reduction"> {} def OMP_DeclareMapper : Directive<"declare mapper"> { - let allowedClauses = [OMPC_Map]; + let allowedClauses = [ + VersionedClause + ]; } def OMP_DeclareSimd : Directive<"declare simd"> {} def OMP_TaskLoop : Directive<"taskloop"> { - let allowedClauses = [OMPC_If, OMPC_Shared, OMPC_Private, OMPC_FirstPrivate, - OMPC_LastPrivate, OMPC_Default, OMPC_Collapse, OMPC_Final, OMPC_Untied, - OMPC_Mergeable, OMPC_Priority, OMPC_GrainSize, OMPC_NoGroup, - OMPC_NumTasks, OMPC_Reduction, OMPC_InReduction, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TaskLoopSimd : Directive<"taskloop simd"> { - let allowedClauses = [OMPC_If, OMPC_Shared, OMPC_Private, OMPC_FirstPrivate, - OMPC_LastPrivate, OMPC_Default, OMPC_Collapse, OMPC_Final, OMPC_Untied, - OMPC_Mergeable, OMPC_Priority, OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, - OMPC_SimdLen, OMPC_GrainSize, OMPC_NoGroup, OMPC_NumTasks, 
OMPC_Reduction, - OMPC_InReduction, OMPC_Allocator, OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + ]; } def OMP_Distribute : Directive<"distribute"> { - let allowedClauses = [OMPC_Private, OMPC_FirstPrivate, OMPC_LastPrivate, - OMPC_Collapse, OMPC_DistSchedule, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_DeclareTarget : Directive<"declare target"> {} def OMP_EndDeclareTarget : Directive<"end declare target"> {} def OMP_DistributeParallelFor : Directive<"distribute parallel for"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, OMPC_Schedule, - OMPC_Allocate, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_DistributeParallelDo : Directive<"distribute parallel do"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, OMPC_Schedule, - OMPC_Allocate, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_DistributeParallelForSimd : Directive<"distribute parallel for simd"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, OMPC_Schedule, - OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_DistributeParallelDoSimd : Directive<"distribute parallel do simd"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Copyin, OMPC_Schedule, - OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_DistributeSimd : Directive<"distribute simd"> { - let allowedClauses = 
[OMPC_Private, OMPC_FirstPrivate, OMPC_LastPrivate, - OMPC_Collapse, OMPC_DistSchedule, OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, - OMPC_SimdLen, OMPC_Reduction, OMPC_Allocate, OMPC_If, OMPC_NonTemporal, - OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetParallelForSimd : Directive<"target parallel for simd"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_NoWait, OMPC_Depend, - OMPC_DefaultMap, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Shared, OMPC_Reduction, OMPC_Collapse, OMPC_Schedule, OMPC_Ordered, - OMPC_Linear, OMPC_SafeLen, OMPC_SimdLen, OMPC_Aligned, OMPC_IsDevicePtr, - OMPC_Allocate, OMPC_NonTemporal, OMPC_Order, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetParallelDoSimd : Directive<"target parallel do simd"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_NoWait, OMPC_Depend, - OMPC_DefaultMap, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Shared, OMPC_Reduction, OMPC_Collapse, OMPC_Schedule, OMPC_Ordered, - OMPC_Linear, OMPC_SafeLen, OMPC_SimdLen, OMPC_Aligned, OMPC_IsDevicePtr, - OMPC_Allocate, OMPC_NonTemporal, OMPC_Order, 
OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetSimd : Directive<"target simd"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_IsDevicePtr, OMPC_LastPrivate, OMPC_Linear, OMPC_Aligned, - OMPC_SafeLen, OMPC_SimdLen, OMPC_Collapse, OMPC_Reduction, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TeamsDistribute : Directive<"teams distribute"> { - let allowedClauses = [OMPC_Default, OMPC_Private, OMPC_FirstPrivate, - OMPC_Shared, OMPC_Reduction, OMPC_NumTeams, OMPC_ThreadLimit, - OMPC_LastPrivate, OMPC_Collapse, OMPC_DistSchedule, OMPC_Allocate]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TeamsDistributeSimd : Directive<"teams distribute simd"> { - let allowedClauses = [OMPC_Default, OMPC_Private, OMPC_FirstPrivate, - OMPC_Shared, 
OMPC_Reduction, OMPC_NumTeams, OMPC_ThreadLimit, - OMPC_LastPrivate, OMPC_Collapse, OMPC_DistSchedule, OMPC_Linear, - OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_Allocate, OMPC_If, - OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TeamsDistributeParallelForSimd : Directive<"teams distribute parallel for simd"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Schedule, OMPC_Linear, - OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_NumTeams, OMPC_ThreadLimit, - OMPC_Allocate, OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TeamsDistributeParallelDoSimd : Directive<"teams distribute parallel do simd"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Schedule, OMPC_Linear, - OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_NumTeams, OMPC_ThreadLimit, - OMPC_Allocate, OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TeamsDistributeParallelFor : Directive<"teams distribute parallel for"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Schedule, OMPC_NumTeams, - OMPC_ThreadLimit, OMPC_Copyin, OMPC_Allocate, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TeamsDistributeParallelDo : Directive<"teams distribute parallel do"> { - let allowedClauses = [OMPC_FirstPrivate, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_If, OMPC_NumThreads, OMPC_Default, OMPC_ProcBind, - OMPC_Private, OMPC_Shared, OMPC_Reduction, OMPC_Schedule, OMPC_NumTeams, - OMPC_ThreadLimit, OMPC_Copyin, OMPC_Allocate, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetTeams : Directive<"target teams"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_IsDevicePtr, 
OMPC_Default, OMPC_Shared, OMPC_Reduction, - OMPC_NumTeams, OMPC_ThreadLimit, OMPC_Allocate, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetTeamsDistribute : Directive<"target teams distribute"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_IsDevicePtr, OMPC_Default, OMPC_Shared, OMPC_Reduction, - OMPC_NumTeams, OMPC_ThreadLimit, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_Allocate, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetTeamsDistributeParallelFor : Directive<"target teams distribute parallel for"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_IsDevicePtr, OMPC_Default, OMPC_Shared, OMPC_Reduction, - OMPC_NumTeams, OMPC_ThreadLimit, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_NumThreads, OMPC_ProcBind, OMPC_Schedule, - OMPC_Allocate, OMPC_Order, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetTeamsDistributeParallelDo : Directive<"target teams distribute parallel do"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_IsDevicePtr, OMPC_Default, OMPC_Shared, OMPC_Reduction, - OMPC_NumTeams, OMPC_ThreadLimit, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_NumThreads, OMPC_ProcBind, OMPC_Schedule, - OMPC_Allocate, OMPC_Order, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetTeamsDistributeParallelForSimd : Directive<"target teams distribute parallel for simd"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_IsDevicePtr, OMPC_Default, OMPC_Shared, OMPC_Reduction, - OMPC_NumTeams, OMPC_ThreadLimit, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_NumThreads, OMPC_ProcBind, OMPC_Schedule, - OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetTeamsDistributeParallelDoSimd : Directive<"target teams distribute parallel do simd"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_IsDevicePtr, OMPC_Default, OMPC_Shared, OMPC_Reduction, - OMPC_NumTeams, OMPC_ThreadLimit, OMPC_LastPrivate, OMPC_Collapse, - OMPC_DistSchedule, OMPC_NumThreads, OMPC_ProcBind, OMPC_Schedule, - OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetTeamsDistributeSimd : Directive<"target teams distribute simd"> { - let allowedClauses = [OMPC_If, OMPC_Device, OMPC_Map, OMPC_Private, - OMPC_NoWait, OMPC_Depend, OMPC_DefaultMap, OMPC_FirstPrivate, - OMPC_LastPrivate, OMPC_IsDevicePtr, OMPC_Shared, OMPC_Reduction, - OMPC_NumTeams, OMPC_ThreadLimit, OMPC_Collapse, OMPC_DistSchedule, - OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, OMPC_SimdLen, OMPC_Allocate, - OMPC_NonTemporal, OMPC_Order, OMPC_UsesAllocators]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Allocate : Directive<"allocate"> { - let allowedClauses = [OMPC_Allocator]; + let allowedClauses = [ + VersionedClause + ]; } def OMP_DeclareVariant : Directive<"declare variant"> { - let allowedClauses = [OMPC_Match]; + let allowedClauses = [ + VersionedClause + ]; } def OMP_MasterTaskloop : Directive<"master taskloop"> { let allowedClauses = [ - OMPC_If, OMPC_Shared, OMPC_Private, OMPC_FirstPrivate, OMPC_LastPrivate, - OMPC_Default, OMPC_Collapse, OMPC_Final, OMPC_Untied, OMPC_Mergeable, - OMPC_Priority, OMPC_GrainSize, OMPC_NoGroup, OMPC_NumTasks, - OMPC_Reduction, OMPC_InReduction, OMPC_Allocate]; + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelMasterTaskloop : Directive<"parallel master taskloop"> { - let allowedClauses = [OMPC_If, OMPC_Shared, OMPC_Private, OMPC_FirstPrivate, - OMPC_LastPrivate, OMPC_Default, OMPC_Collapse, OMPC_Final, OMPC_Untied, - OMPC_Mergeable, OMPC_Priority, OMPC_GrainSize, OMPC_NoGroup, - OMPC_NumTasks, OMPC_Reduction, OMPC_Allocate, OMPC_NumThreads, - OMPC_ProcBind, OMPC_Copyin]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_MasterTaskloopSimd : Directive<"master taskloop simd"> { - let allowedClauses = [OMPC_If, OMPC_Shared, OMPC_Private, OMPC_FirstPrivate, - OMPC_LastPrivate, OMPC_DefaultMap, OMPC_Collapse, OMPC_Final, OMPC_Untied, - OMPC_Mergeable, OMPC_Priority, OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, - OMPC_SimdLen, OMPC_GrainSize, OMPC_NoGroup, OMPC_NumTasks, OMPC_Reduction, - OMPC_InReduction, OMPC_Allocate, OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelMasterTaskloopSimd : Directive<"parallel master taskloop simd"> { - let allowedClauses = [OMPC_If, OMPC_Shared, OMPC_Private, OMPC_FirstPrivate, - OMPC_LastPrivate, OMPC_Default, OMPC_Collapse, OMPC_Final, OMPC_Untied, - OMPC_Mergeable, OMPC_Priority, OMPC_GrainSize, OMPC_NoGroup, - OMPC_NumTasks, OMPC_Reduction, OMPC_Allocate, OMPC_NumThreads, - OMPC_ProcBind, OMPC_Copyin, OMPC_Linear, OMPC_Aligned, OMPC_SafeLen, - OMPC_SimdLen, OMPC_NonTemporal, OMPC_Order]; + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def 
OMP_Depobj : Directive<"depobj"> {
-  let allowedClauses = [OMPC_Depend, OMPC_Destroy, OMPC_Update, OMPC_Depobj];
+  let allowedClauses = [
+    VersionedClause,
+    VersionedClause,
+    VersionedClause,
+    // TODO This should ne `none` instead. Comment carried over from
+    // OMPKinds.def.
+    VersionedClause
+  ];
 }
 def OMP_Scan : Directive<"scan"> {
-  let allowedClauses = [OMPC_Inclusive, OMPC_Exclusive];
+  let allowedClauses = [
+    VersionedClause,
+    VersionedClause
+  ];
 }
 def OMP_BeginDeclareVariant : Directive<"begin declare variant"> {}
 def OMP_EndDeclareVariant : Directive<"end declare variant"> {}
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h b/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
index e0427138ecfa..bfdecdd5d711 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
@@ -89,9 +89,6 @@ enum class IdentFlag {
 #define OMP_IDENT_FLAG(Enum, ...) constexpr auto Enum = omp::IdentFlag::Enum;
 #include "llvm/Frontend/OpenMP/OMPKinds.def"

-/// Return true if \p C is a valid clause for \p D in version \p Version.
-bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version);
-
 /// Forward declarations for LLVM-IR types (simple, function and structure) are
 /// generated below. Their names are defined and used in OpenMP/OMPKinds.def.
 /// Here we provide the forward declarations, the initializeTypes function will
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 05e9e950bc7d..f286403e657c 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -1147,746 +1147,3 @@ OMP_LAST_TRAIT_PROPERTY(
 #undef __OMP_REQUIRES_TRAIT
 #undef OMP_REQUIRES_TRAIT
 ///}
-
-
-/// Clauses allowed per directive
-///
-///{
-
-#ifndef OMP_DIRECTIVE_CLAUSE
-#define OMP_DIRECTIVE_CLAUSE(Directive, MinVersion, MaxVersion, Clause)
-#endif
-
-#define __OMP_DIRECTIVE_CLAUSE(Directive, MinVersion, MaxVersion, Clause) \
-  OMP_DIRECTIVE_CLAUSE(OMPD_##Directive, unsigned(MinVersion), \
-                       unsigned(MaxVersion), OMPC_##Clause)
-
-__OMP_DIRECTIVE_CLAUSE(scan, 50, ~0, inclusive)
-__OMP_DIRECTIVE_CLAUSE(scan, 50, ~0, exclusive)
-
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, if)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, num_threads)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, default)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, proc_bind)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, private)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, firstprivate)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, shared)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, reduction)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, copyin)
-__OMP_DIRECTIVE_CLAUSE(parallel, 1, ~0, allocate)
-
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, private)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, lastprivate)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, linear)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, aligned)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, safelen)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, simdlen)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, collapse)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, reduction)
-__OMP_DIRECTIVE_CLAUSE(simd, 1, ~0, allocate)
-__OMP_DIRECTIVE_CLAUSE(simd, 50, ~0, if)
-__OMP_DIRECTIVE_CLAUSE(simd, 50, ~0, nontemporal)
-__OMP_DIRECTIVE_CLAUSE(simd, 50, ~0, order)
-
-__OMP_DIRECTIVE_CLAUSE(for, 1, ~0,
private) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, ordered) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(for, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(for, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~1, ordered) -__OMP_DIRECTIVE_CLAUSE(for_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(for_simd, 50, ~0, if) -__OMP_DIRECTIVE_CLAUSE(for_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(for_simd, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(sections, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(sections, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(sections, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(sections, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(sections, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(sections, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(single, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(single, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(single, 1, ~0, copyprivate) -__OMP_DIRECTIVE_CLAUSE(single, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(single, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(cancel, 1, ~0, if) - -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, 
num_threads) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~1, ordered) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(parallel_for, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~1, ordered) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(parallel_for_simd, 50, ~0, order) - 
-__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(parallel_master, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_sections, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, final) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, untied) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, mergeable) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, priority) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, in_reduction) -__OMP_DIRECTIVE_CLAUSE(task, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(task, 50, ~0, detach) -__OMP_DIRECTIVE_CLAUSE(task, 50, ~0, affinity) - -__OMP_DIRECTIVE_CLAUSE(atomic, 1, ~0, read) -__OMP_DIRECTIVE_CLAUSE(atomic, 1, ~0, write) 
-__OMP_DIRECTIVE_CLAUSE(atomic, 1, ~0, update) -__OMP_DIRECTIVE_CLAUSE(atomic, 1, ~0, capture) -__OMP_DIRECTIVE_CLAUSE(atomic, 1, ~0, seq_cst) -__OMP_DIRECTIVE_CLAUSE(atomic, 50, ~0, acq_rel) -__OMP_DIRECTIVE_CLAUSE(atomic, 50, ~0, acquire) -__OMP_DIRECTIVE_CLAUSE(atomic, 50, ~0, release) -__OMP_DIRECTIVE_CLAUSE(atomic, 50, ~0, relaxed) -__OMP_DIRECTIVE_CLAUSE(atomic, 50, ~0, hint) - -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(requires, 1, ~0, unified_address) -__OMP_DIRECTIVE_CLAUSE(requires, 1, ~0, unified_shared_memory) -__OMP_DIRECTIVE_CLAUSE(requires, 1, ~0, reverse_offload) -__OMP_DIRECTIVE_CLAUSE(requires, 1, ~0, dynamic_allocators) -__OMP_DIRECTIVE_CLAUSE(requires, 1, ~0, atomic_default_mem_order) - -__OMP_DIRECTIVE_CLAUSE(allocate, 1, ~0, allocator) - -__OMP_DIRECTIVE_CLAUSE(target_data, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_data, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_data, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_data, 1, ~0, use_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_data, 50, ~0, use_device_addr) - -__OMP_DIRECTIVE_CLAUSE(target_enter_data, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_enter_data, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_enter_data, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_enter_data, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_enter_data, 1, ~0, depend) - -__OMP_DIRECTIVE_CLAUSE(target_exit_data, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_exit_data, 1, 
~0, device) -__OMP_DIRECTIVE_CLAUSE(target_exit_data, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_exit_data, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_exit_data, 1, ~0, depend) - -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_parallel, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, reduction) 
-__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~1, ordered) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 50, ~0, order) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(target_update, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_update, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_update, 1, ~0, to) -__OMP_DIRECTIVE_CLAUSE(target_update, 1, ~0, from) -__OMP_DIRECTIVE_CLAUSE(target_update, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_update, 1, ~0, depend) - -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, thread_limit) -__OMP_DIRECTIVE_CLAUSE(teams, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(ordered, 1, ~0, threads) -__OMP_DIRECTIVE_CLAUSE(ordered, 1, ~0, simd) -__OMP_DIRECTIVE_CLAUSE(ordered, 1, ~0, depend) - -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, final) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, untied) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, mergeable) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, priority) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, grainsize) -__OMP_DIRECTIVE_CLAUSE(taskloop, 
1, ~0, nogroup) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, num_tasks) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, in_reduction) -__OMP_DIRECTIVE_CLAUSE(taskloop, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, final) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, untied) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, mergeable) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, priority) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, grainsize) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, nogroup) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, num_tasks) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, in_reduction) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(taskloop_simd, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, final) 
-__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, untied) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, mergeable) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, priority) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, grainsize) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, nogroup) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, num_tasks) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, in_reduction) -__OMP_DIRECTIVE_CLAUSE(master_taskloop, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, final) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, untied) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, mergeable) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, priority) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, grainsize) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, nogroup) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, num_tasks) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, in_reduction) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(master_taskloop_simd, 50, ~0, order) - 
-__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, final) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, untied) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, mergeable) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, priority) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, grainsize) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, nogroup) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, num_tasks) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop, 1, ~0, copyin) - -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, final) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, untied) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, mergeable) 
-__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, priority) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, grainsize) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, nogroup) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, num_tasks) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(parallel_master_taskloop_simd, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(critical, 1, ~0, hint) - -__OMP_DIRECTIVE_CLAUSE(distribute, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(distribute, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(distribute, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(distribute, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(distribute, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(distribute, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, proc_bind) 
-__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(distribute_parallel_for_simd, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, 
firstprivate) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 50, ~0, if) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(distribute_simd, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~1, ordered) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, safelen) 
-__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 50, ~0, order) -__OMP_DIRECTIVE_CLAUSE(target_parallel_for_simd, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(target_simd, 50, ~0, order) -__OMP_DIRECTIVE_CLAUSE(target_simd, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, thread_limit) 
-__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(teams_distribute, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, thread_limit) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 50, ~0, if) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_simd, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, default) 
-__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, thread_limit) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for_simd, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, num_threads) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, schedule) 
-__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, thread_limit) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, copyin) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(teams_distribute_parallel_for, 50, ~0, order) - -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, thread_limit) -__OMP_DIRECTIVE_CLAUSE(target_teams, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_teams, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, shared) 
-__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, thread_limit) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, dist_schedule) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, - firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, - is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, default) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, - thread_limit) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, - dist_schedule) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, num_threads) 
-__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, proc_bind) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, schedule) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 50, ~0, order) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for, 50, ~0, - uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - private) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - default) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - reduction) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - num_teams) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - thread_limit) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - lastprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - collapse) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - dist_schedule) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - num_threads) 
-__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - proc_bind) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - schedule) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - aligned) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - safelen) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - simdlen) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 1, ~0, - allocate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 50, ~0, - nontemporal) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 50, ~0, order) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_parallel_for_simd, 50, ~0, - uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, if) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, device) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, map) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, private) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, nowait) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, defaultmap) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, firstprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, lastprivate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, is_device_ptr) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, shared) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, reduction) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, num_teams) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, thread_limit) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, collapse) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, 
dist_schedule) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, linear) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, aligned) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, safelen) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, simdlen) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 1, ~0, allocate) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 50, ~0, nontemporal) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 50, ~0, order) -__OMP_DIRECTIVE_CLAUSE(target_teams_distribute_simd, 50, ~0, uses_allocators) - -__OMP_DIRECTIVE_CLAUSE(taskgroup, 1, ~0, task_reduction) -__OMP_DIRECTIVE_CLAUSE(taskgroup, 1, ~0, allocate) - -__OMP_DIRECTIVE_CLAUSE(declare_mapper, 1, ~0, map) - -__OMP_DIRECTIVE_CLAUSE(declare_variant, 1, ~0, match) - -__OMP_DIRECTIVE_CLAUSE(flush, 50, ~0, acq_rel) -__OMP_DIRECTIVE_CLAUSE(flush, 50, ~0, acquire) -__OMP_DIRECTIVE_CLAUSE(flush, 50, ~0, release) -// TODO This should ne `none` instead -__OMP_DIRECTIVE_CLAUSE(flush, 1, ~0, flush) - -__OMP_DIRECTIVE_CLAUSE(depobj, 50, ~0, depend) -__OMP_DIRECTIVE_CLAUSE(depobj, 50, ~0, destroy) -__OMP_DIRECTIVE_CLAUSE(depobj, 50, ~0, update) -// TODO This should ne `none` instead -__OMP_DIRECTIVE_CLAUSE(depobj, 50, ~0, depobj) - -#undef __OMP_DIRECTIVE_CLAUSE -#undef OMP_DIRECTIVE_CLAUSE -///} diff --git a/llvm/lib/Frontend/OpenMP/OMPConstants.cpp b/llvm/lib/Frontend/OpenMP/OMPConstants.cpp index a628501e1f91..471f0361191e 100644 --- a/llvm/lib/Frontend/OpenMP/OMPConstants.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPConstants.cpp @@ -21,17 +21,6 @@ using namespace types; #include "llvm/Frontend/OpenMP/OMP.cpp.inc" -bool llvm::omp::isAllowedClauseForDirective(Directive D, Clause C, - unsigned Version) { - assert(unsigned(D) <= llvm::omp::Directive_enumSize); - assert(unsigned(C) <= llvm::omp::Clause_enumSize); -#define OMP_DIRECTIVE_CLAUSE(Dir, MinVersion, MaxVersion, Cl) \ - if (D == Dir && C == Cl && MinVersion <= Version && MaxVersion >= 
Version) \ - return true; -#include "llvm/Frontend/OpenMP/OMPKinds.def" - return false; -} - /// Declarations for LLVM-IR types (simple, array, function and structure) are /// generated below. Their names are defined and used in OpenMPKinds.def. Here /// we provide the declarations, the initializeTypes function will provide the diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index 19fe218c4fa1..b4d1a6ed2026 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -19,7 +19,10 @@ def TDLC_ClauseB : Clause<"clauseb"> { } def TDL_DirA : Directive<"dira"> { - let allowedClauses = [TDLC_ClauseA, TDLC_ClauseB]; + let allowedClauses = [ + VersionedClause, + VersionedClause + ]; let isDefault = 1; } @@ -61,6 +64,9 @@ def TDL_DirA : Directive<"dira"> { // CHECK-EMPTY: // CHECK-NEXT: llvm::StringRef getTdlClauseName(Clause C); // CHECK-EMPTY: +// CHECK-NEXT: /// Return true if \p C is a valid clause for \p D in version \p Version. +// CHECK-NEXT: bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version); +// CHECK-EMPTY: // CHECK-NEXT: } // namespace tdl // CHECK-NEXT: } // namespace llvm // CHECK-NEXT: #endif // LLVM_Tdl_INC @@ -96,3 +102,13 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: } // IMPL-NEXT: llvm_unreachable("Invalid Tdl Clause kind"); // IMPL-NEXT: } +// IMPL-EMPTY: +// IMPL-NEXT: bool llvm::tdl::isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) { +// IMPL-NEXT: assert(unsigned(D) <= llvm::tdl::Directive_enumSize); +// IMPL-NEXT: assert(unsigned(C) <= llvm::tdl::Clause_enumSize); +// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clausea && 1 <= Version && 2147483647 >= Version) +// IMPL-NEXT: return true; +// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clauseb && 1 <= Version && 2147483647 >= Version) +// IMPL-NEXT: return true; +// IMPL-NEXT: return false; +// IMPL-NEXT: } diff --git a/llvm/test/TableGen/directive2.td b/llvm/test/TableGen/directive2.td index 
545dd251fdaf..8e180e20df1f 100644 --- a/llvm/test/TableGen/directive2.td +++ b/llvm/test/TableGen/directive2.td @@ -19,7 +19,10 @@ def TDLC_ClauseB : Clause<"clauseb"> { } def TDL_DirA : Directive<"dira"> { - let allowedClauses = [TDLC_ClauseA, TDLC_ClauseB]; + let allowedClauses = [ + VersionedClause, + VersionedClause + ]; let isDefault = 1; } @@ -52,6 +55,9 @@ def TDL_DirA : Directive<"dira"> { // CHECK-EMPTY: // CHECK-NEXT: llvm::StringRef getTdlClauseName(Clause C); // CHECK-EMPTY: +// CHECK-NEXT: /// Return true if \p C is a valid clause for \p D in version \p Version. +// CHECK-NEXT: bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version); +// CHECK-EMPTY: // CHECK-NEXT: } // namespace tdl // CHECK-NEXT: } // namespace llvm // CHECK-NEXT: #endif // LLVM_Tdl_INC @@ -86,4 +92,15 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: return "clauseb"; // IMPL-NEXT: } // IMPL-NEXT: llvm_unreachable("Invalid Tdl Clause kind"); -// IMPL-NEXT: } \ No newline at end of file +// IMPL-NEXT: } +// IMPL-EMPTY: +// IMPL-NEXT: bool llvm::tdl::isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) { +// IMPL-NEXT: assert(unsigned(D) <= llvm::tdl::Directive_enumSize); +// IMPL-NEXT: assert(unsigned(C) <= llvm::tdl::Clause_enumSize); +// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clausea && 2 <= Version && 4 >= Version) +// IMPL-NEXT: return true; +// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clauseb && 2 <= Version && 2147483647 >= Version) +// IMPL-NEXT: return true; +// IMPL-NEXT: return false; +// IMPL-NEXT: } + diff --git a/llvm/utils/TableGen/DirectiveEmitter.cpp b/llvm/utils/TableGen/DirectiveEmitter.cpp index 93fceb7a73ec..a9f3569c07a2 100644 --- a/llvm/utils/TableGen/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/DirectiveEmitter.cpp @@ -22,6 +22,14 @@ using namespace llvm; namespace llvm { +// Get Directive or Clause name formatted by replacing whitespaces with +// underscores. 
+std::string getFormattedName(StringRef Name) { + std::string N = Name.str(); + std::replace(N.begin(), N.end(), ' ', '_'); + return N; +} + // Generate enum class void GenerateEnumClass(const std::vector &Records, raw_ostream &OS, StringRef Enum, StringRef Prefix, StringRef CppNamespace, @@ -30,9 +38,7 @@ void GenerateEnumClass(const std::vector &Records, raw_ostream &OS, OS << "enum class " << Enum << " {\n"; for (const auto &R : Records) { const auto Name = R->getValueAsString("name"); - std::string N = Name.str(); - std::replace(N.begin(), N.end(), ' ', '_'); - OS << " " << Prefix << N << ",\n"; + OS << " " << Prefix << getFormattedName(Name) << ",\n"; } OS << "};\n"; OS << "\n"; @@ -47,12 +53,10 @@ void GenerateEnumClass(const std::vector &Records, raw_ostream &OS, if (MakeEnumAvailableInNamespace) { OS << "\n"; for (const auto &R : Records) { - const auto Name = R->getValueAsString("name"); - std::string N = Name.str(); - std::replace(N.begin(), N.end(), ' ', '_'); - OS << "constexpr auto " << Prefix << N << " = " - << "llvm::" << CppNamespace << "::" << Enum << "::" << Prefix << N - << ";\n"; + const auto FormattedName = getFormattedName(R->getValueAsString("name")); + OS << "constexpr auto " << Prefix << FormattedName << " = " + << "llvm::" << CppNamespace << "::" << Enum << "::" << Prefix + << FormattedName << ";\n"; } } } @@ -122,6 +126,11 @@ void EmitDirectivesDecl(RecordKeeper &Records, raw_ostream &OS) { OS << "\n"; OS << "llvm::StringRef get" << LanguageName << "ClauseName(Clause C);\n"; OS << "\n"; + OS << "/// Return true if \\p C is a valid clause for \\p D in version \\p " + << "Version.\n"; + OS << "bool isAllowedClauseForDirective(Directive D, " + << "Clause C, unsigned Version);\n"; + OS << "\n"; // Closing namespaces for (auto Ns : llvm::reverse(Namespaces)) @@ -143,9 +152,7 @@ void GenerateGetName(const std::vector &Records, raw_ostream &OS, for (const auto &R : Records) { const auto Name = R->getValueAsString("name"); const auto 
AlternativeName = R->getValueAsString("alternativeName"); - std::string N = Name.str(); - std::replace(N.begin(), N.end(), ' ', '_'); - OS << " case " << Prefix << N << ":\n"; + OS << " case " << Prefix << getFormattedName(Name) << ":\n"; OS << " return \""; if (AlternativeName.empty()) OS << Name; @@ -173,9 +180,8 @@ void GenerateGetKind(const std::vector &Records, raw_ostream &OS, return; } - const auto DefaultName = (*DefaultIt)->getValueAsString("name"); - std::string DefaultEnum = DefaultName.str(); - std::replace(DefaultEnum.begin(), DefaultEnum.end(), ' ', '_'); + const auto FormattedDefaultName = + getFormattedName((*DefaultIt)->getValueAsString("name")); OS << "\n"; OS << Enum << " llvm::" << Namespace << "::get" << LanguageName << Enum @@ -184,15 +190,66 @@ void GenerateGetKind(const std::vector &Records, raw_ostream &OS, for (const auto &R : Records) { const auto Name = R->getValueAsString("name"); - std::string N = Name.str(); - std::replace(N.begin(), N.end(), ' ', '_'); if (ImplicitAsUnknown && R->getValueAsBit("isImplicit")) { - OS << " .Case(\"" << Name << "\"," << Prefix << DefaultEnum << ")\n"; + OS << " .Case(\"" << Name << "\"," << Prefix << FormattedDefaultName + << ")\n"; } else { - OS << " .Case(\"" << Name << "\"," << Prefix << N << ")\n"; + OS << " .Case(\"" << Name << "\"," << Prefix << getFormattedName(Name) + << ")\n"; } } - OS << " .Default(" << Prefix << DefaultEnum << ");\n"; + OS << " .Default(" << Prefix << FormattedDefaultName << ");\n"; + OS << "}\n"; +} + +void GenerateTestForAllowedClauses(const std::vector &Clauses, + raw_ostream &OS, StringRef DirectiveName, + StringRef DirectivePrefix, + StringRef ClausePrefix) { + + const auto FormattedDirectiveName = getFormattedName(DirectiveName); + for (const auto &C : Clauses) { + const auto MinVersion = C->getValueAsInt("minVersion"); + const auto MaxVersion = C->getValueAsInt("maxVersion"); + const auto SpecificClause = C->getValueAsDef("clause"); + const auto ClauseName = 
SpecificClause->getValueAsString("name"); + + OS << " if (D == " << DirectivePrefix << FormattedDirectiveName + << " && C == " << ClausePrefix << getFormattedName(ClauseName) << " && " + << MinVersion << " <= Version && " << MaxVersion << " >= Version)\n"; + OS << " return true;\n"; + } +} + +// Generate the isAllowedClauseForDirective function implementation. +void GenerateIsAllowedClause(const std::vector &Directives, + raw_ostream &OS, StringRef DirectivePrefix, + StringRef ClausePrefix, StringRef CppNamespace) { + OS << "\n"; + OS << "bool llvm::" << CppNamespace << "::isAllowedClauseForDirective(" + << "Directive D, Clause C, unsigned Version) {\n"; + OS << " assert(unsigned(D) <= llvm::" << CppNamespace + << "::Directive_enumSize);\n"; + OS << " assert(unsigned(C) <= llvm::" << CppNamespace + << "::Clause_enumSize);\n"; + + for (const auto &D : Directives) { + const auto DirectiveName = D->getValueAsString("name"); + + const auto &AllowedClauses = D->getValueAsListOfDefs("allowedClauses"); + GenerateTestForAllowedClauses(AllowedClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); + + const auto &AllowedOnceClauses = + D->getValueAsListOfDefs("allowedOnceClauses"); + GenerateTestForAllowedClauses(AllowedOnceClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); + + const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); + GenerateTestForAllowedClauses(RequiredClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); + } + OS << " return false;\n"; OS << "}\n"; } @@ -233,6 +290,9 @@ void EmitDirectivesImpl(RecordKeeper &Records, raw_ostream &OS) { // getClauseName(Clause Kind) GenerateGetName(Clauses, OS, "Clause", ClausePrefix, LanguageName, CppNamespace); + + GenerateIsAllowedClause(Directives, OS, DirectivePrefix, ClausePrefix, + CppNamespace); } } // namespace llvm From llvm-commits at lists.llvm.org Mon Jul 6 19:20:20 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: 
Tue, 07 Jul 2020 02:20:20 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: <3e5982ace3256fff9ef650799630239f@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG65482e8a703d: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to… (authored by clementval). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82982.275889.patch Type: text/x-patch Size: 109657 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 19:27:46 2020 From: llvm-commits at lists.llvm.org (LLVM GN Syncbot via llvm-commits) Date: Mon, 06 Jul 2020 19:27:46 -0700 (PDT) Subject: [llvm] fc67b25 - [gn build] Port 939d8309dbd Message-ID: <5f03dda2.1c69fb81.40633.2c30@mx.google.com> Author: LLVM GN Syncbot Date: 2020-07-07T02:20:39Z New Revision: fc67b25426c8767ab5941d376ab0f0628d62256e URL: https://github.com/llvm/llvm-project/commit/fc67b25426c8767ab5941d376ab0f0628d62256e DIFF: https://github.com/llvm/llvm-project/commit/fc67b25426c8767ab5941d376ab0f0628d62256e.diff LOG: [gn build] Port 939d8309dbd Added: Modified: llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn b/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn index 9c86dbff22ad..38bbb68d64f3 100644 --- a/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn +++ b/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn @@ -82,6 +82,7 @@ copy("Headers") { "adxintrin.h", "altivec.h", "ammintrin.h", + "amxintrin.h", "arm64intr.h", "arm_acle.h", "arm_cmse.h", From llvm-commits at lists.llvm.org Mon Jul 6 19:39:06 2020 From: llvm-commits at lists.llvm.org (Luofan Chen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:39:06 +0000 (UTC) Subject: [PATCH] D78861: [Attributor] [WIP] Track AA dependency using dependency graph In-Reply-To: References: Message-ID: bbn updated this revision to Diff 275891. bbn added a comment. Herald added a subscriber: jfb. 
- Added tests for the dot file - Other style fixes CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78861/new/ https://reviews.llvm.org/D78861 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/Attributor.cpp llvm/lib/Transforms/IPO/AttributorAttributes.cpp llvm/test/Transforms/Attributor/depgraph.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D78861.275891.patch Type: text/x-patch Size: 30040 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 19:43:19 2020 From: llvm-commits at lists.llvm.org (Jakub Kuderski via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:43:19 +0000 (UTC) Subject: [PATCH] D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes In-Reply-To: References: Message-ID: <949bd25bb6f7498f864255c50f7b2ecb@localhost.localdomain> kuhar added a comment. Overall, this seems like a good idea to me. The amount of templated code started getting out of hand some time ago, to the point where it's really hard to make logical changes, especially in the generic updater code. This part of the code is *very* performance-sensitive and definitely needs benchmarking before moving forward. Have you tried doing some performance evaluation on this change? I suggest taking a few mid- to large-size programs (e.g., sqlite, webassembly, opt, clang, rippled), compiling them into whole-program bitcode, and then running `opt -O3` on that bitcode. This is pretty easy with gllvm; I can dig up my old instructions if that would help. Nit: please fix the linter warnings. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83089/new/ https://reviews.llvm.org/D83089 From llvm-commits at lists.llvm.org Mon Jul 6 19:44:04 2020 From: llvm-commits at lists.llvm.org (Shilei Tian via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:44:04 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine In-Reply-To: References: Message-ID: <53a90750df09ae42ebabcf30b7cca1f0@localhost.localdomain> tianshilei1992 added inline comments. ================ Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:1047 + for (Use *U : ToBeReplacedStateMachineUses) + U->set(ConstantExpr::getBitCast(ID, U->get()->getType())); + ---------------- Probably we need to set `Changed` to `true` here? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83271/new/ https://reviews.llvm.org/D83271 From llvm-commits at lists.llvm.org Mon Jul 6 19:52:14 2020 From: llvm-commits at lists.llvm.org (Jakub Kuderski via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:52:14 +0000 (UTC) Subject: [PATCH] D83092: DomTree: Add findSiblingOfUncle helper In-Reply-To: References: Message-ID: kuhar requested changes to this revision. kuhar added inline comments. This revision now requires changes to proceed. ================ Comment at: llvm/lib/Support/GenericDomTree.cpp:220 +/// the degenerate case where \p A itself is a sibling of \p Uncle. +const GenericDomTreeNodeBase *GenericDominatorTreeBase::findSiblingOfUncle( + const GenericDomTreeNodeBase *A, ---------------- arsenm wrote: > nhaehnle wrote: > > arsenm wrote: > > > I'm not sure these are the right family analogies. This could also find a great uncle, or the same parent. > > Fair enough, do you have a suggestion for a better name? `findSiblingOfNthUncle` perhaps? > I don't really have a better idea. The comment could maybe explain more of the cases it can encounter? "some ancestor" is a bit vague. 
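For readers following the naming discussion, here is a minimal sketch of what a "climb until siblings" helper could look like. It is only an illustration under assumptions: `Node`, its `Parent` and `Level` fields, and the function name `climbUntilSiblings` are hypothetical stand-ins, not the types or signature from the patch, and the handling of the case where the climb reaches `Uncle` itself is a guess.

```cpp
#include <cassert>

// Hypothetical stand-in for a dominator tree node; NOT the LLVM class.
struct Node {
  const Node *Parent = nullptr; // immediate dominator; null for the root
  unsigned Level = 0;           // depth in the tree; the root is at level 0
};

// Example tree (edges point from parent to child):
//
//        R
//       / \
//      U   S
//          |
//          T
//          |
//          A
//
// Starting at A and climbing, S is the first node that shares U's parent,
// so climbUntilSiblings(A, U) yields S. If A is already a child of R, the
// result is A itself (the degenerate case mentioned in the review).
//
// Assumption of this sketch: if A is U or a descendant of U, the climb
// stops at U and U is returned. Returns nullptr if no ancestor of A
// (including A) has the same parent as U.
const Node *climbUntilSiblings(const Node *A, const Node *Uncle) {
  assert(Uncle && "Uncle must be non-null");
  if (!Uncle->Parent)
    return nullptr; // the root has no siblings

  // Nodes deeper than Uncle cannot share its parent, so skip past them.
  while (A && A->Level > Uncle->Level)
    A = A->Parent;

  if (A && A->Level == Uncle->Level && A->Parent == Uncle->Parent)
    return A;
  return nullptr;
}
```

The level comparison is only a shortcut: any node with the same parent as `Uncle` necessarily sits at the same depth, so the climb can stop as soon as that depth is reached.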
I agree with arsenm; the naming is unfortunate. Maybe `climbUntilSiblings`? Whatever name we settle on, I think an ASCII-art diagram with an example would really help here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83092/new/ https://reviews.llvm.org/D83092 From llvm-commits at lists.llvm.org Mon Jul 6 19:54:07 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:54:07 +0000 (UTC) Subject: [PATCH] D77251: [llvm][CodeGen] Addressing modes for SVE ldN. In-Reply-To: References: Message-ID: fpetrogalli updated this revision to Diff 275892. fpetrogalli added a comment. I have rebased on top of master and added the bfloat test cases. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77251/new/ https://reviews.llvm.org/D77251 Files: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+reg-addr-mode.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D77251.275892.patch Type: text/x-patch Size: 45029 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 19:55:29 2020 From: llvm-commits at lists.llvm.org (Jakub Kuderski via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 02:55:29 +0000 (UTC) Subject: [PATCH] D83090: DomTree: Add TreeNode type alias In-Reply-To: References: Message-ID: kuhar accepted this revision. kuhar added a comment. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83090/new/ https://reviews.llvm.org/D83090 From llvm-commits at lists.llvm.org Mon Jul 6 20:00:40 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 03:00:40 +0000 (UTC) Subject: [PATCH] D83270: [OpenMP] Compute a proper module slice for the CGSCCC pass In-Reply-To: References: Message-ID: <951b697c761998fca28321c65b37b09f@localhost.localdomain> jdoerfert updated this revision to Diff 275894. jdoerfert added a comment. Limit foreachUse to the SCC rather than the module slice. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83270/new/ https://reviews.llvm.org/D83270 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83270.275894.patch Type: text/x-patch Size: 7789 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 20:03:05 2020 From: llvm-commits at lists.llvm.org (Wenlei He via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 03:03:05 +0000 (UTC) Subject: [PATCH] D82919: [SampleFDO] Enable sample-profile-top-down-load by default. In-Reply-To: References: Message-ID: <6b207f7ff1379b40c52b377353a7428c@localhost.localdomain> wenlei accepted this revision. wenlei added a comment. This revision is now accepted and ready to land. Thanks for the measurement. LGTM. 
Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82919/new/ https://reviews.llvm.org/D82919 From llvm-commits at lists.llvm.org Mon Jul 6 20:10:54 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via llvm-commits) Date: Mon, 06 Jul 2020 20:10:54 -0700 (PDT) Subject: [llvm] 1b15397 - [PowerPC] Do not RAUW combined nodes in VECTOR_SHUFFLE legalization Message-ID: <5f03e7be.1c69fb81.ba42c.df43@mx.google.com> Author: Nemanja Ivanovic Date: 2020-07-06T22:09:28-05:00 New Revision: 1b1539712e1ee30c02ed20493682fc05d52391c0 URL: https://github.com/llvm/llvm-project/commit/1b1539712e1ee30c02ed20493682fc05d52391c0 DIFF: https://github.com/llvm/llvm-project/commit/1b1539712e1ee30c02ed20493682fc05d52391c0.diff LOG: [PowerPC] Do not RAUW combined nodes in VECTOR_SHUFFLE legalization When legalizing shuffles, we make an attempt to combine it into a PPC specific canonical form that avoids a need for a swap. If the combine is successful, we RAUW the node and the custom legalization replaces the now dead node instead of the one it should replace. Remove that erroneous call to RAUW. Added: Modified: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp index 815a84e8c320..ff8e2382ec65 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp @@ -9896,9 +9896,10 @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, // to vector legalization will not be sent to the target combine. Try to // combine it here. 
if (SDValue NewShuffle = combineVectorShuffle(SVOp, DAG)) { - DAG.ReplaceAllUsesOfValueWith(Op, NewShuffle); Op = NewShuffle; SVOp = cast(Op); + V1 = Op.getOperand(0); + V2 = Op.getOperand(1); } EVT VT = Op.getValueType(); bool isLittleEndian = Subtarget.isLittleEndian(); diff --git a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll index 53e48b185714..ada7c73cd9ed 100644 --- a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll +++ b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll @@ -375,5 +375,49 @@ entry: ret <4 x i32> %vecins1 } +define dso_local <16 x i8> @no_RAUW_in_combine_during_legalize(i32* nocapture readonly %ptr, i32 signext %offset) local_unnamed_addr #0 { +; CHECK-P8-LABEL: no_RAUW_in_combine_during_legalize: +; CHECK-P8: # %bb.0: # %entry +; CHECK-P8-NEXT: addis r5, r2, .LCPI15_0 at toc@ha +; CHECK-P8-NEXT: sldi r4, r4, 2 +; CHECK-P8-NEXT: xxlxor v4, v4, v4 +; CHECK-P8-NEXT: addi r5, r5, .LCPI15_0 at toc@l +; CHECK-P8-NEXT: lxsiwzx v2, r3, r4 +; CHECK-P8-NEXT: lvx v3, 0, r5 +; CHECK-P8-NEXT: vperm v2, v4, v2, v3 +; CHECK-P8-NEXT: blr +; +; CHECK-P9-LABEL: no_RAUW_in_combine_during_legalize: +; CHECK-P9: # %bb.0: # %entry +; CHECK-P9-NEXT: sldi r4, r4, 2 +; CHECK-P9-NEXT: lxsiwzx v2, r3, r4 +; CHECK-P9-NEXT: addis r3, r2, .LCPI15_0 at toc@ha +; CHECK-P9-NEXT: addi r3, r3, .LCPI15_0 at toc@l +; CHECK-P9-NEXT: lxvx v3, 0, r3 +; CHECK-P9-NEXT: xxlxor v4, v4, v4 +; CHECK-P9-NEXT: vperm v2, v4, v2, v3 +; CHECK-P9-NEXT: blr +; +; CHECK-NOVSX-LABEL: no_RAUW_in_combine_during_legalize: +; CHECK-NOVSX: # %bb.0: # %entry +; CHECK-NOVSX-NEXT: sldi r4, r4, 2 +; CHECK-NOVSX-NEXT: vxor v2, v2, v2 +; CHECK-NOVSX-NEXT: lwzx r3, r3, r4 +; CHECK-NOVSX-NEXT: std r3, -16(r1) +; CHECK-NOVSX-NEXT: addi r3, r1, -16 +; CHECK-NOVSX-NEXT: lvx v3, 0, r3 +; CHECK-NOVSX-NEXT: vmrglb v2, v2, v3 +; CHECK-NOVSX-NEXT: blr +entry: + %idx.ext = sext i32 %offset to i64 + %add.ptr = getelementptr inbounds i32, i32* 
%ptr, i64 %idx.ext + %0 = load i32, i32* %add.ptr, align 4 + %conv = zext i32 %0 to i64 + %splat.splatinsert = insertelement <2 x i64> undef, i64 %conv, i32 0 + %1 = bitcast <2 x i64> %splat.splatinsert to <16 x i8> + %shuffle = shufflevector <16 x i8> %1, <16 x i8> , <16 x i32> + ret <16 x i8> %shuffle +} + declare double @dummy() local_unnamed_addr attributes #0 = { nounwind } From llvm-commits at lists.llvm.org Mon Jul 6 20:11:49 2020 From: llvm-commits at lists.llvm.org (Jakub Kuderski via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 03:11:49 +0000 (UTC) Subject: [PATCH] D83087: DomTree: remove explicit use of DomTreeNodeBase::iterator In-Reply-To: References: Message-ID: <5dcc5253fd173402f6aad63a3ea956a8@localhost.localdomain> kuhar accepted this revision. kuhar added a comment. LGTM modulo accidental formatting changes. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83087/new/ https://reviews.llvm.org/D83087 From llvm-commits at lists.llvm.org Mon Jul 6 20:33:10 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 03:33:10 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: sameerarora101 marked 2 inline comments as done. sameerarora101 added inline comments. ================ Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:169-170 + for (const auto &OldNew : Config.RPathsToUpdate) { + StringRef Old = OldNew.getFirst(); + StringRef New = OldNew.getSecond(); + if (RPaths.count(Old) == 0) ---------------- smeenai wrote: > sameerarora101 wrote: > > this is the lates update. Would it work on Darwin? thanks > Yup, that builds. I believe this is a libc++ bug though, and I commented on https://bugs.llvm.org/show_bug.cgi?id=17550#c11 ok, thanks a lot @smeenai . 
I'll commit this now and keep an eye out for other buildbot failures.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82812/new/

https://reviews.llvm.org/D82812

From llvm-commits at lists.llvm.org Mon Jul 6 20:40:21 2020
From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 03:40:21 +0000 (UTC)
Subject: [PATCH] D83275: [llc] (almost) remove `--print-machineinstrs`
Message-ID:

ychen created this revision.
ychen added reviewers: arsenm, dsanders.
Herald added subscribers: llvm-commits, jfb, atanasyan, jrtc27, aheejin, hiraditya, jgravelle-google, sbc100, wdng, sdardis, dschuff.
Herald added a project: LLVM.

Its effect can be achieved with `-stop-after`, `-print-after`, and `-print-after-all`. However, a few tests need to print MIR after ISel, which cannot be done with `-print-after`/`-stop-after` because the isel pass does not have a command-line name. That is why `--print-machineinstrs` is downgraded to `--print-after-isel` in this patch. `--print-after-isel` could be removed after we switch to the new pass manager, since the isel pass would then have a command-line name that works with `print-after` or equivalent switches.

The motivation for this patch is to reduce the tests' dependency on a would-be-deprecated feature.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83275

Files:
  llvm/docs/CommandGuide/llc.rst
  llvm/docs/CommandGuide/lli.rst
  llvm/include/llvm/CodeGen/TargetPassConfig.h
  llvm/include/llvm/Target/TargetMachine.h
  llvm/include/llvm/Target/TargetOptions.h
  llvm/lib/CodeGen/MachineOperand.cpp
  llvm/lib/CodeGen/TargetPassConfig.cpp
  llvm/lib/Target/Mips/MipsTargetMachine.cpp
  llvm/test/CodeGen/AArch64/chkstk.ll
  llvm/test/CodeGen/AArch64/max-jump-table.ll
  llvm/test/CodeGen/AArch64/min-jump-table.ll
  llvm/test/CodeGen/ARM/ifcvt-branch-weight-bug.ll
  llvm/test/CodeGen/ARM/ifcvt-branch-weight.ll
  llvm/test/CodeGen/ARM/ifcvt-iter-indbr.ll
  llvm/test/CodeGen/ARM/tail-merge-branch-weight.ll
  llvm/test/CodeGen/ARM/taildup-branch-weight.ll
  llvm/test/CodeGen/Generic/print-machineinstrs.ll
  llvm/test/CodeGen/Hexagon/ifcvt-edge-weight.ll
  llvm/test/CodeGen/X86/llc-print-machineinstrs.mir
  llvm/test/DebugInfo/WebAssembly/dbg-value-live-interval.ll
  llvm/test/DebugInfo/WebAssembly/dbg-value-move-2.ll
  llvm/test/DebugInfo/WebAssembly/dbg-value-move.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83275.275897.patch
Type: text/x-patch
Size: 26432 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 20:45:05 2020
From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 03:45:05 +0000 (UTC)
Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections.
In-Reply-To:
References:
Message-ID: <97cf7a5dc0359d9aa9ba3575832c4fd9@localhost.localdomain>

dblaikie added inline comments.

================
Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9
+# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o
+# RUN: llvm-readobj --sections --section-data %t1.le.o | \
+# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE
+
----------------
Higuoxing wrote:
> dblaikie wrote:
> > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?)
> Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense?

Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82367/new/

https://reviews.llvm.org/D82367

From llvm-commits at lists.llvm.org Mon Jul 6 20:50:44 2020
From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 03:50:44 +0000 (UTC)
Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC
In-Reply-To:
References:
Message-ID: <1e2f31052f685bcc1673579c210fd4d8@localhost.localdomain>

NeHuang updated this revision to Diff 275896.
NeHuang added a comment.

Thanks @sfertile and @stefanp for the review!
To Sean's question on the test coverage in this patch: Yes, your understanding is correct.
Addressed the review comments:
- Clang-formatted line 1039 in `lld/ELF/Arch/PPC64.cpp`.
- Inserted fatal errors for the unsupported call protocols and notoc thunks.
- Condensed the 4 lit cases into 1 lit test with an input file.
- Added another lit test with an input file for `callee` with hidden visibility.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82816/new/

https://reviews.llvm.org/D82816

Files:
  lld/ELF/Arch/PPC64.cpp
  lld/ELF/Thunks.cpp
  lld/test/ELF/Inputs/ppc64-extern-callee-hidden.s
  lld/test/ELF/Inputs/ppc64-extern-callee.s
  lld/test/ELF/ppc64-pcrel-call-to-pcrel-callee-hidden.s
  lld/test/ELF/ppc64-pcrel-call-to-pcrel.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82816.275896.patch
Type: text/x-patch
Size: 15601 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 20:56:04 2020
From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 03:56:04 +0000 (UTC)
Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields.
In-Reply-To:
References:
Message-ID:

dblaikie added a comment.

In D82881#2133548, @ABataev wrote:

> In D82881#2133511, @aprantl wrote:
>
> > And conversely, with this patch applied, do GDB and LLDB still produce the expected result?
>
>
> GDB works correctly. Did not check with lldb, but it also should work. The result is similar to the debug info produced for the following code:
>
>   struct {
>     short : 3;
>     short : 6;
>   } a;
>

Similar, but seems different in a critical way - in that code, the type of the field is short, which has size 2, which matches the size of the field. I think it would be pretty surprising to handle DWARF where the size of a field is different from the size of the type of that field? That said, I don't have great suggestions for how the DWARF should communicate this packed situation where a bitfield crosses a byte boundary either.
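
To make the size mismatch under discussion concrete, here is a minimal C++ sketch. It assumes GCC/Clang's `__attribute__((packed))` and a typical Itanium-ABI target such as x86-64. `ShortFields`, `PackedCharFields`, and `packedFieldsRoundTrip` are hypothetical names, and the packed struct is only a guess at the kind of layout D82881 deals with, not the test case from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the snippet quoted in the thread, with the fields named so they can
// be accessed. Both fields fit into a single `short` storage unit.
struct ShortFields {
  short lo : 3;
  short hi : 6;
};

// Hypothetical packed variant (an assumption, not taken from the patch): the
// declared type is `unsigned char` (1 byte), but 3 + 6 = 9 bits are needed.
// With the packed attribute, GCC and Clang are expected to place `hi`
// directly after `lo`, so it straddles the boundary between byte 0 and byte 1.
struct __attribute__((packed)) PackedCharFields {
  unsigned char lo : 3;
  unsigned char hi : 6;
};

// The unpacked struct is exactly as large as the declared field type.
static_assert(sizeof(ShortFields) == sizeof(short),
              "both short bit-fields share one short");
// The packed struct needs more storage than the declared field type has.
static_assert(sizeof(PackedCharFields) == 2, "9 bits need two bytes");
static_assert(sizeof(PackedCharFields) > sizeof(unsigned char),
              "storage is wider than the declared field type");

// Writes both fields and reads them back to show that the straddling field
// still behaves like an ordinary 6-bit member.
inline bool packedFieldsRoundTrip() {
  PackedCharFields p{};
  p.lo = 5;  // 3 bits hold 0..7
  p.hi = 42; // 6 bits hold 0..63
  return p.lo == 5 && p.hi == 42;
}
```

In the second struct the bytes that hold `hi` span two bytes even though its declared type is one byte wide, which is the situation the comment describes as awkward to express in DWARF.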
> But the code, produced by the compiler, is also the same. So, I think, the debug info also should be the same.
>
>> Also, what happens to the next bit field or variable right after the bit-field with the now larger container? Is that affected by the patch?
>
> It does not affect the next fields. We point exactly to the bytes allocated for this particular bitfield only.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82881/new/

https://reviews.llvm.org/D82881

From llvm-commits at lists.llvm.org Mon Jul 6 20:58:33 2020
From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 03:58:33 +0000 (UTC)
Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections
In-Reply-To:
References:
Message-ID:

dblaikie added inline comments.

================
Comment at: llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll:23-30
+; int compute(bool k, int p1, int p2, int p3, int p4, int p5, int p6, int p7, int p8, int p9, int pa, int pb, int pc) {
+;   int result;
+;   if (k)
+;     result = p1 * p2 + p3 / p4 - p5 * p6 + p7 / p8 - p9 * pa + pb / pc;
+;   else
+;     result = p1 / p2 - p3 * p4 + p5 / p6 - p7 * p8 + p9 / pa - pb * pc;
+;   return result;
----------------
tmsriram wrote:
> dblaikie wrote:
> > Seems like a surprisingly large amount of computation - is it there for a reason? needed to push some optimization or layout decisions? Could it all use the same operation (just all multiplication, for instance) or is the different operations significant? (Well, I guess they have to differ between the two branches - but could all be the same within each one?) does it need 12 parameters? Could it be fewer & use a function call?
> >
> > (etc, etc - simple test case, maybe some comments describing what's significant about the features of it that are needed to demonstrate the desired behavior, etc)
> >
> >
> It was done so that more callee-saved registers are used; when more callee-saved registers are used, cfi_offset directives are needed for them. The .s looks like this for a basic block that does the computation:
>
>   _Z7computebiiiiiiiiiiii.1: # %if.then
>     .cfi_startproc
>     .cfi_def_cfa %rbp, 16
>     .cfi_offset %rbx, -48
>     .cfi_offset %r12, -40
>     .cfi_offset %r14, -32
>     .cfi_offset %r15, -24
>     .cfi_offset %rbp, -16
>
> Each basic block that goes in a different section must emit cfi directives for callee-saved registers. The parameters are there to make sure the caller-saved registers are taken and the callee-saved registers are forced into use, so that we can check that the cfi emission indeed works for callee-saved registers.
>

Ah, OK - a comment might be handy to describe that? And rather than the somewhat arbitrary computation, perhaps an opaque function call would suffice? Or would that introduce other complications for spills/saves/etc? Maybe using a pass by value struct as the parameter type so the long parameter list doesn't have to be repeated?
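
A rough C++ sketch of what the suggested alternative might look like, purely as an illustration: `Operands`, `opaque`, and the rewritten `compute` are hypothetical and not taken from the patch. In a real test `opaque` would only be declared, so the optimizer could not see through the call; it is defined here (with the GCC/Clang `noinline` attribute) just so the sketch links and runs.

```cpp
#include <cassert>

// Hypothetical by-value bundle so the parameter list is written only once.
struct Operands {
  int a, b, c, d;
};

// Stand-in for an opaque external function (assumption: in an actual test this
// would be a bare declaration with the definition in another translation unit).
__attribute__((noinline)) int opaque(int x) { return x + 1; }

int compute(bool k, Operands v) {
  int result;
  if (k)
    // v.b, v.c and v.d are still needed after the call returns, so they must
    // survive the call somewhere: in callee-saved registers or on the stack.
    result = opaque(v.a) + v.b + v.c + v.d;
  else
    result = opaque(v.b) - v.a - v.c - v.d;
  return result;
}
```

Whether the register allocator keeps those values in callee-saved registers or simply reloads them from the stack is exactly the open question raised in the comment, so the generated `.cfi_offset` directives would have to be checked before relying on a test shaped like this.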
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Mon Jul 6 21:04:57 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via llvm-commits) Date: Mon, 06 Jul 2020 21:04:57 -0700 (PDT) Subject: [llvm] 3b5db7f - [llvm-install-name-tool] Merge install-name options Message-ID: <5f03f469.1c69fb81.8ceea.851c@mx.google.com> Author: Sameer Arora Date: 2020-07-06T20:32:32-07:00 New Revision: 3b5db7fc69bb1efac6f017830af98f192a1f8ab4 URL: https://github.com/llvm/llvm-project/commit/3b5db7fc69bb1efac6f017830af98f192a1f8ab4 DIFF: https://github.com/llvm/llvm-project/commit/3b5db7fc69bb1efac6f017830af98f192a1f8ab4.diff LOG: [llvm-install-name-tool] Merge install-name options This diff merges all options for llvm-install-name-tool under a single function processLoadCommands. Also adds another test case for -add_rpath option. Test plan: make check-all Reviewed by: jhenderson, alexshap, smeenai, Ktwu Differential Revision: https://reviews.llvm.org/D82812 Added: Modified: llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/CopyConfig.h llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test index 1435c6b744c8..7b21fdc2e03c 100644 --- a/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test +++ b/llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test @@ -22,6 +22,13 @@ # NO-INPUT: no input file specified +## Add same RPATH twice: +# RUN: not llvm-install-name-tool -add_rpath @executable_X \ +# RUN: -add_rpath @executable_X %t.i386 2>&1 \ +# RUN: | FileCheck --check-prefix=DOUBLE %s + +# DOUBLE: duplicate load command + ## Check that cmdsize accounts for NULL terminator. 
# RUN: yaml2obj %p/Inputs/x86_64.yaml -o %t.x86_64 # RUN: llvm-install-name-tool -add_rpath abcd %t.x86_64 diff --git a/llvm/tools/llvm-objcopy/CopyConfig.cpp b/llvm/tools/llvm-objcopy/CopyConfig.cpp index f93406f371d0..1fde54dd290a 100644 --- a/llvm/tools/llvm-objcopy/CopyConfig.cpp +++ b/llvm/tools/llvm-objcopy/CopyConfig.cpp @@ -874,42 +874,39 @@ parseInstallNameToolOptions(ArrayRef ArgsArr) { auto Match = [=](StringRef RPath) { return RPath == Old || RPath == New; }; // Cannot specify duplicate -rpath entries - auto It1 = find_if(Config.RPathsToUpdate, - [&Match](const std::pair &OldNew) { - return Match(OldNew.first) || Match(OldNew.second); - }); + auto It1 = find_if( + Config.RPathsToUpdate, + [&Match](const DenseMap::value_type &OldNew) { + return Match(OldNew.getFirst()) || Match(OldNew.getSecond()); + }); if (It1 != Config.RPathsToUpdate.end()) - return createStringError( - errc::invalid_argument, - "cannot specify both -rpath %s %s and -rpath %s %s", - It1->first.str().c_str(), It1->second.str().c_str(), - Old.str().c_str(), New.str().c_str()); + return createStringError(errc::invalid_argument, + "cannot specify both -rpath " + It1->getFirst() + + " " + It1->getSecond() + " and -rpath " + + Old + " " + New); // Cannot specify the same rpath under both -delete_rpath and -rpath auto It2 = find_if(Config.RPathsToRemove, Match); if (It2 != Config.RPathsToRemove.end()) - return createStringError( - errc::invalid_argument, - "cannot specify both -delete_rpath %s and -rpath %s %s", - It2->str().c_str(), Old.str().c_str(), New.str().c_str()); + return createStringError(errc::invalid_argument, + "cannot specify both -delete_rpath " + *It2 + + " and -rpath " + Old + " " + New); // Cannot specify the same rpath under both -add_rpath and -rpath auto It3 = find_if(Config.RPathToAdd, Match); if (It3 != Config.RPathToAdd.end()) - return createStringError( - errc::invalid_argument, - "cannot specify both -add_rpath %s and -rpath %s %s", - It3->str().c_str(), 
Old.str().c_str(), New.str().c_str()); + return createStringError(errc::invalid_argument, + "cannot specify both -add_rpath " + *It3 + + " and -rpath " + Old + " " + New); - Config.RPathsToUpdate.emplace_back(Old, New); + Config.RPathsToUpdate.insert({Old, New}); } if (auto *Arg = InputArgs.getLastArg(INSTALL_NAME_TOOL_id)) Config.SharedLibId = Arg->getValue(); for (auto *Arg : InputArgs.filtered(INSTALL_NAME_TOOL_change)) { - Config.InstallNamesToUpdate.emplace_back(Arg->getValue(0), - Arg->getValue(1)); + Config.InstallNamesToUpdate.insert({Arg->getValue(0), Arg->getValue(1)}); } SmallVector Positional; diff --git a/llvm/tools/llvm-objcopy/CopyConfig.h b/llvm/tools/llvm-objcopy/CopyConfig.h index ce119dee5bff..1341dd674c7b 100644 --- a/llvm/tools/llvm-objcopy/CopyConfig.h +++ b/llvm/tools/llvm-objcopy/CopyConfig.h @@ -178,8 +178,8 @@ struct CopyConfig { std::vector DumpSection; std::vector SymbolsToAdd; std::vector RPathToAdd; - std::vector> RPathsToUpdate; - std::vector> InstallNamesToUpdate; + DenseMap RPathsToUpdate; + DenseMap InstallNamesToUpdate; DenseSet RPathsToRemove; // install-name-tool's id option diff --git a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp index 3844b6f62de6..5ca5b133572b 100644 --- a/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp +++ b/llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp @@ -42,35 +42,6 @@ static StringRef getPayloadString(const LoadCommand &LC) { .rtrim('\0'); } -static Error removeLoadCommands(const CopyConfig &Config, Object &Obj) { - DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), - Config.RPathsToRemove.end()); - - LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { - StringRef RPath = getPayloadString(LC); - if (RPathsToRemove.count(RPath)) { - RPathsToRemove.erase(RPath); - return true; - } - } - return false; - }; - - if (Error E = Obj.removeLoadCommands(RemovePred)) - 
return E; - - // Emit an error if the Mach-O binary does not contain an rpath path name - // specified in -delete_rpath. - for (StringRef RPath : Config.RPathsToRemove) { - if (RPathsToRemove.count(RPath)) - return createStringError(errc::invalid_argument, - "no LC_RPATH load command with path: %s", - RPath.str().c_str()); - } - return Error::success(); -} - static Error removeSections(const CopyConfig &Config, Object &Obj) { SectionPred RemovePred = [](const std::unique_ptr
&) { return false; @@ -157,6 +128,103 @@ static LoadCommand buildRPathLoadCommand(StringRef Path) { return LC; } +static Error processLoadCommands(const CopyConfig &Config, Object &Obj) { + // Remove RPaths. + DenseSet RPathsToRemove(Config.RPathsToRemove.begin(), + Config.RPathsToRemove.end()); + + LoadCommandPred RemovePred = [&RPathsToRemove](const LoadCommand &LC) { + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) { + StringRef RPath = getPayloadString(LC); + if (RPathsToRemove.count(RPath)) { + RPathsToRemove.erase(RPath); + return true; + } + } + return false; + }; + + if (Error E = Obj.removeLoadCommands(RemovePred)) + return E; + + // Emit an error if the Mach-O binary does not contain an rpath path name + // specified in -delete_rpath. + for (StringRef RPath : Config.RPathsToRemove) { + if (RPathsToRemove.count(RPath)) + return createStringError(errc::invalid_argument, + "no LC_RPATH load command with path: %s", + RPath.str().c_str()); + } + + DenseSet RPaths; + + // Get all existing RPaths. + for (LoadCommand &LC : Obj.LoadCommands) { + if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH) + RPaths.insert(getPayloadString(LC)); + } + + // Throw errors for invalid RPaths. + for (const auto &OldNew : Config.RPathsToUpdate) { + StringRef Old = OldNew.getFirst(); + StringRef New = OldNew.getSecond(); + if (RPaths.count(Old) == 0) + return createStringError(errc::invalid_argument, + "no LC_RPATH load command with path: " + Old); + if (RPaths.count(New) != 0) + return createStringError(errc::invalid_argument, + "rpath " + New + + " would create a duplicate load command"); + } + + // Update load commands. 
+ for (LoadCommand &LC : Obj.LoadCommands) { + switch (LC.MachOLoadCommand.load_command_data.cmd) { + case MachO::LC_ID_DYLIB: + if (Config.SharedLibId) { + StringRef Id = Config.SharedLibId.getValue(); + if (Id.empty()) + return createStringError(errc::invalid_argument, + "cannot specify an empty id"); + updateLoadCommandPayloadString(LC, Id); + } + break; + + case MachO::LC_RPATH: { + StringRef RPath = getPayloadString(LC); + StringRef NewRPath = Config.RPathsToUpdate.lookup(RPath); + if (!NewRPath.empty()) + updateLoadCommandPayloadString(LC, NewRPath); + break; + } + + // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and LC_LOAD_UPWARD_DYLIB + // here once llvm-objcopy supports them. + case MachO::LC_LOAD_DYLIB: + case MachO::LC_LOAD_WEAK_DYLIB: + StringRef InstallName = getPayloadString(LC); + StringRef NewInstallName = + Config.InstallNamesToUpdate.lookup(InstallName); + if (!NewInstallName.empty()) + updateLoadCommandPayloadString(LC, + NewInstallName); + break; + } + } + + // Add new RPaths. + for (StringRef RPath : Config.RPathToAdd) { + if (RPaths.count(RPath) != 0) + return createStringError(errc::invalid_argument, + "rpath " + RPath + + " would create a duplicate load command"); + RPaths.insert(RPath); + Obj.addLoadCommand(buildRPathLoadCommand(RPath)); + } + + return Error::success(); +} + static Error dumpSectionToFile(StringRef SecName, StringRef Filename, Object &Obj) { for (LoadCommand &LC : Obj.LoadCommands) @@ -273,34 +341,6 @@ static Error handleArgs(const CopyConfig &Config, Object &Obj) { for (std::unique_ptr
&Sec : LC.Sections) Sec->Relocations.clear(); - for (LoadCommand &LC : Obj.LoadCommands) { - switch (LC.MachOLoadCommand.load_command_data.cmd) { - case MachO::LC_ID_DYLIB: - if (Config.SharedLibId) { - StringRef Id = Config.SharedLibId.getValue(); - if (Id.empty()) - return createStringError(errc::invalid_argument, - "cannot specify an empty id"); - updateLoadCommandPayloadString(LC, Id); - } - break; - - // TODO: Add LC_REEXPORT_DYLIB, LC_LAZY_LOAD_DYLIB, and LC_LOAD_UPWARD_DYLIB - // here once llvm-objcopy supports them. - case MachO::LC_LOAD_DYLIB: - case MachO::LC_LOAD_WEAK_DYLIB: - StringRef Old, New; - StringRef CurrentInstallName = getPayloadString(LC); - for (const auto &InstallNamePair : Config.InstallNamesToUpdate) { - std::tie(Old, New) = InstallNamePair; - if (CurrentInstallName == Old) { - updateLoadCommandPayloadString(LC, New); - break; - } - } - } - } - for (const auto &Flag : Config.AddSection) { std::pair SecPair = Flag.split("="); StringRef SecName = SecPair.first; @@ -311,45 +351,9 @@ static Error handleArgs(const CopyConfig &Config, Object &Obj) { return E; } - if (Error E = removeLoadCommands(Config, Obj)) + if (Error E = processLoadCommands(Config, Obj)) return E; - StringRef Old, New; - for (const auto &OldNew : Config.RPathsToUpdate) { - std::tie(Old, New) = OldNew; - - auto FindRPathLC = [&Obj](StringRef RPath) { - return find_if(Obj.LoadCommands, [=](const LoadCommand &LC) { - return LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH && - getPayloadString(LC) == RPath; - }); - }; - - auto NewIt = FindRPathLC(New); - if (NewIt != Obj.LoadCommands.end()) - return createStringError(errc::invalid_argument, - "rpath " + New + - " would create a duplicate load command"); - - auto OldIt = FindRPathLC(Old); - if (OldIt == Obj.LoadCommands.end()) - return createStringError(errc::invalid_argument, - "no LC_RPATH load command with path: " + Old); - - updateLoadCommandPayloadString(*OldIt, New); - } - - for (StringRef RPath : 
Config.RPathToAdd) { - for (LoadCommand &LC : Obj.LoadCommands) { - if (LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH && - RPath == getPayloadString(LC)) { - return createStringError(errc::invalid_argument, - "rpath " + RPath + - " would create a duplicate load command"); - } - } - Obj.addLoadCommand(buildRPathLoadCommand(RPath)); - } return Error::success(); } From llvm-commits at lists.llvm.org Mon Jul 6 21:05:08 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 04:05:08 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG3b5db7fc69bb: [llvm-install-name-tool] Merge install-name options (authored by sameerarora101). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 Files: llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/CopyConfig.h llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82812.275900.patch Type: text/x-patch Size: 12635 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 21:15:29 2020 From: llvm-commits at lists.llvm.org (Kiran Kumar T P via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 04:15:29 +0000 (UTC) Subject: [PATCH] D82931: [flang][OpenMP] Enhance parser support for atomic construct to OpenMP 5.0 In-Reply-To: References: Message-ID: <24b1420d1eb332bc0d6749b0ddb3adc1@localhost.localdomain> kiranktp added a comment. In D82931#2132486 , @DavidTruby wrote: > Looks good to me! As a nit, perhaps you could add some tests that shouldn't parse correctly as well? Thanks for reviewing the code David!. 

> As a nit, perhaps you could add some tests that shouldn't parse correctly as well?

Any syntax error will lead to a chain of errors. It will be tough to add a case for invalid syntax. I will look into this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82931/new/

https://reviews.llvm.org/D82931

From llvm-commits at lists.llvm.org Mon Jul 6 21:21:10 2020
From: llvm-commits at lists.llvm.org (Steve Scalpone via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 04:21:10 +0000 (UTC)
Subject: [PATCH] D82931: [flang][OpenMP] Enhance parser support for atomic construct to OpenMP 5.0
In-Reply-To:
References:
Message-ID: <4f8949f4ea04270e09f9a18fdb9327e7@localhost.localdomain>

sscalpone added a comment.

> Any syntax error will lead to a chain of errors. It will be tough to add a case for invalid syntax.

Please consider adding error recovery to prevent the cascade of errors. See the docs and the parser for examples.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82931/new/

https://reviews.llvm.org/D82931

From llvm-commits at lists.llvm.org Mon Jul 6 21:34:58 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 04:34:58 +0000 (UTC)
Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC
In-Reply-To:
References:
Message-ID:

MaskRay added inline comments.

================
Comment at: lld/ELF/Arch/PPC64.cpp:1043
+ // FIXME: Remove the assertions once the call protocols are supported.
+ assert(!(type == R_PPC64_REL24_NOTOC && (s.stOther >> 5) > 1) &&
+ "Unsupported protocol: RelType is R_PPC64_REL24_NOTOC and the callee "
----------------
Note: assert should only be used for logically unreachable code, i.e. if the implementation is not buggy, the negative code path should not trigger. You can use `fatal(...)` for unimplemented features. Please use all lowercase messages.
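
The distinction drawn in this comment can be sketched in self-contained C++. Everything below is illustrative and hypothetical: `RelocKind`, `unsupportedCallProtocol`, and `localEntryBits` are made-up names, and the diagnostic text is invented. The sketch returns the message as a string so it can be exercised without a linker; in lld the message would presumably be handed to `fatal(...)` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two relocation kinds involved.
enum class RelocKind { Rel24, Rel24Notoc };

// User-reachable condition: valid input can request a call protocol that is
// not implemented yet. This is reported as a diagnostic (lowercase message),
// never asserted. An empty string means the combination is supported.
std::string unsupportedCallProtocol(RelocKind type, uint8_t stOther) {
  if (type == RelocKind::Rel24Notoc && (stOther >> 5) > 1)
    return "unimplemented feature: notoc call to a callee whose st_other "
           "local-entry bits are greater than 1";
  return "";
}

// Internal invariant: callers are expected to have rejected unsupported
// protocols already, so a failure here would be a bug in this code rather
// than a property of the input. That is the case assert is meant for.
int localEntryBits(RelocKind type, uint8_t stOther) {
  assert(unsupportedCallProtocol(type, stOther).empty() &&
         "caller must reject unsupported protocols first");
  return stOther >> 5;
}
```

For example, `unsupportedCallProtocol(RelocKind::Rel24Notoc, 0x40)` yields a non-empty message because `0x40 >> 5` is 2, while the same call with `0x20` yields an empty string.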

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82816/new/

https://reviews.llvm.org/D82816

From llvm-commits at lists.llvm.org Mon Jul 6 21:36:56 2020
From: llvm-commits at lists.llvm.org (Daniel Sanders via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 04:36:56 +0000 (UTC)
Subject: [PATCH] D83275: [llc] (almost) remove `--print-machineinstrs`
In-Reply-To:
References:
Message-ID:

dsanders accepted this revision.
dsanders added a comment.
This revision is now accepted and ready to land.

I was worried for a moment that we'd be losing the ability to print between all machine passes but it looks like -print-after-all covers that now (I don't think that was always the case). So long as we aren't losing any inter-pass dumps this LGTM but I'd suggest giving it a few days to see if anyone was using this in a way that isn't evident from the tests.

> -print-after/-stop-after since isel pass does not have commandline name.

IIRC, there's something weird going on in this area. I vaguely remember a problem I never got to the bottom of where there was no name when AMDGPU was omitted but when it was compiled, everybody's pass was called `amdgpu-isel`. It had something to do with AMDGPU needing additional dependencies and using INITIALIZE_PASS to get them.

================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:144
-static cl::opt PrintMachineInstrs(
- "print-machineinstrs", cl::ValueOptional, cl::desc("Print machine instrs"),
- cl::value_desc("pass-name"), cl::init("option-unspecified"), cl::Hidden);
+// FIXME: remove this after switching to NPM. Currently there are two users:s
+// llvm/test/CodeGen/AArch64/min-jump-table.ll
----------------
There's a typo (`:`->`'`) but also there are more tests affected than just the two.
Probably best to keep the exact number vague

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83275/new/

https://reviews.llvm.org/D83275

From llvm-commits at lists.llvm.org Mon Jul 6 21:38:27 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 04:38:27 +0000 (UTC)
Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC
In-Reply-To:
References:
Message-ID:

MaskRay added inline comments.

================
Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel-callee-hidden.s:22
+
+# SYMBOL: 1: 000000001001000c 0 NOTYPE LOCAL DEFAULT [] 5 caller1
+# SYMBOL-NEXT: 2: 0000000010010028 0 NOTYPE LOCAL DEFAULT [] 5 caller1_tailcall
----------------
Align `1: `

================
Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:20
+# RUN: llvm-objdump -d --no-show-raw-insn --mcpu=pwr10 %t | FileCheck %s
+
+
----------------
One empty line is sufficient.

================
Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:22
+
+# SYMBOL: 1: 0000000010010000 0 NOTYPE LOCAL DEFAULT 1 callee1_stother0_local
+# SYMBOL-NEXT: 2: 000000001002000c 0 NOTYPE LOCAL DEFAULT [] 2 callee2_stother1_local
----------------
Indent the first line

================
Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:40
+# CHECK-NEXT: 10010000: mullw 3, 3, 3
+# CHECK: 10010008: blr
+
----------------
Make addresses indented.

================
Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:42
+
+# CHECK-LABEL: caller1
+# CHECK: 10010018: bl 0x10010000
----------------
`caller1` is not a good FileCheck label. `:` is. It is unique in the llvm-objdump output. Please fix all the occurrences.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82816/new/

https://reviews.llvm.org/D82816

From llvm-commits at lists.llvm.org Mon Jul 6 21:40:36 2020
From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 04:40:36 +0000 (UTC)
Subject: [PATCH] D78434: [mlir] resolve types from attributes in assemblyFormat
In-Reply-To:
References:
Message-ID: <2194a61788ed7100869431445b2bfcc5@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG72df59d59097: [mlir] resolve types from attributes in assemblyFormat (authored by tali, committed by mehdi_amini).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78434/new/

https://reviews.llvm.org/D78434

Files:
  mlir/docs/OpDefinitions.md
  mlir/test/lib/Dialect/Test/TestOps.td
  mlir/test/mlir-tblgen/op-format.mlir
  mlir/tools/mlir-tblgen/OpFormatGen.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D78434.275902.patch
Type: text/x-patch
Size: 11271 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Mon Jul 6 21:43:10 2020
From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 04:43:10 +0000 (UTC)
Subject: [PATCH] D83276: [PowerPC] Generate CFI directives when probing in prologue
Message-ID:

lkail created this revision.
lkail added reviewers: jsji, nemanjai, sfertile, steven.zhang, PowerPC.
Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya.
Herald added a project: LLVM.

Add missing CFI directives when probing in prologue if `stack-clash-protection` is enabled.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83276

Files:
  llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
  llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83276.275901.patch Type: text/x-patch Size: 20094 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 22:01:17 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:01:17 +0000 (UTC) Subject: [PATCH] D80358: [MLIR] Add RegionKindInterface In-Reply-To: References: Message-ID: <20eb04bf5c251764c5d3ac572acc2e30@localhost.localdomain> mehdi_amini added inline comments. ================ Comment at: mlir/docs/LangRef.md:677 +%3 = "op3"(%1) : (i32) -> (i32)) +}) +``` ---------------- The indentation does not seem right in this example? The reuse of %2 does not help readability either, just like %4 before %2 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80358/new/ https://reviews.llvm.org/D80358 From llvm-commits at lists.llvm.org Mon Jul 6 22:05:21 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:05:21 +0000 (UTC) Subject: [PATCH] D83275: [llc] (almost) remove `--print-machineinstrs` In-Reply-To: References: Message-ID: <74cd014e9034029bd19c6dbfaab4fffb@localhost.localdomain> ychen added a comment. In D83275#2134936 , @dsanders wrote: > I was worried for a moment that we'd be losing the ability to print between all machine passes but it looks like -print-after-all covers that now (I don't think that was always the case). So long as we aren't losing any inter-pass dumps this LGTM but I'd suggest giving it a few days to see if anyone was using this in a way that isn't evident from the tests. > > > -print-after/-stop-after since isel pass does not have commandline name. > > IIRC, there's something weird going on in this area. I vaguely remember a problem I never got to the bottom of where there was no name when AMDGPU was omitted but when it was compiled, everybody's pass was called `amdgpu-isel`. 
It had something to do with AMDGPU needing additional dependencies and using INITIALIZE_PASS to get them. I don't know if it is the cause but AMDGPUDAGToDAGISel::ID is actually SelectionDAGISel::ID. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83275/new/ https://reviews.llvm.org/D83275 From llvm-commits at lists.llvm.org Mon Jul 6 22:06:55 2020 From: llvm-commits at lists.llvm.org (Heejin Ahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:06:55 +0000 (UTC) Subject: [PATCH] D83277: [WebAssembly] Generate unreachable after __stack_chk_fail Message-ID: aheejin created this revision. aheejin added a reviewer: sunfish. Herald added subscribers: llvm-commits, hiraditya, jgravelle-google, sbc100, dschuff. Herald added a project: LLVM. `__stack_chk_fail` does not return, but `unreachable` was not generated following `call __stack_chk_fail`. This had a possibility to generate an invalid binary for functions with a return type, because `__stack_chk_fail`'s return type is void, and it is the last instruction in the function but its return type (void) does not match with the function's return type. Generating `unreachable` after it makes sure CFGStackify's `fixEndsAtEndOfFunction` handles it correctly.
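The validation problem described above can be pictured with a toy model. This is only a sketch, not LLVM code and not a real WebAssembly validator: the names `Op` and `endsWithValidStack` are made up for illustration, and every value is assumed to be an i32. It relies on one fact from the WebAssembly specification, namely that `unreachable` is stack-polymorphic.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: three instruction kinds, and every value is an i32.
// None of these names come from LLVM or a real WebAssembly validator.
enum class Op {
  I32Const,    // pushes one value
  CallVoid,    // call to a void function such as __stack_chk_fail: pushes nothing
  Unreachable  // traps; the code after it is never executed
};

// Returns true if the operand stack at the end of `body` is acceptable for a
// function that returns `numResults` values.
bool endsWithValidStack(const std::vector<Op> &body, std::size_t numResults) {
  std::size_t depth = 0;     // values pushed since entry or the last `unreachable`
  bool polymorphic = false;  // true once `unreachable` has been seen
  for (Op op : body) {
    switch (op) {
    case Op::I32Const:
      ++depth;
      break;
    case Op::CallVoid:
      break;
    case Op::Unreachable:
      // After `unreachable` the stack is polymorphic: missing values are
      // assumed to exist, so only values pushed afterwards are counted.
      depth = 0;
      polymorphic = true;
      break;
    }
  }
  return polymorphic ? depth <= numResults : depth == numResults;
}
```

In this model a body that ends in `CallVoid` is accepted for a void function but rejected for one returning i32, and appending `Unreachable` makes it acceptable for both, which is the situation the patch addresses for `test_return_i32`.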
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83277 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/CodeGen/WebAssembly/stack-protector.ll Index: llvm/test/CodeGen/WebAssembly/stack-protector.ll =================================================================== --- llvm/test/CodeGen/WebAssembly/stack-protector.ll +++ llvm/test/CodeGen/WebAssembly/stack-protector.ll @@ -1,13 +1,13 @@ ; RUN: llc -verify-machineinstrs -mtriple=wasm32-unknown-unknown < %s | FileCheck -check-prefix=WASM32 %s -; WASM32: i32.load 28 -; WASM32-NEXT: i32.ne -; WASM32-NEXT: br_if 0 - -; WASM32: __stack_chk_fail - @"\01LC" = internal constant [11 x i8] c"buf == %s\0A\00" ; <[11 x i8]*> [#uses=1] +; WASM32-LABEL: test +; WASM32: i32.load 28 +; WASM32: br_if 0 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + define void @test(i8* %a) nounwind ssp { entry: %a_addr = alloca i8* ; [#uses=2] @@ -25,6 +25,27 @@ ret void } +; WASM32-LABEL: test_return_i32 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + +define i32 @test_return_i32(i8* %a) nounwind ssp { +entry: + %a_addr = alloca i8* ; [#uses=2] + %buf = alloca [8 x i8] ; <[8 x i8]*> [#uses=2] + %"alloca point" = bitcast i32 0 to i32 ; [#uses=0] + store i8* %a, i8** %a_addr + %buf1 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %0 = load i8*, i8** %a_addr, align 4 ; [#uses=1] + %1 = call i8* @strcpy(i8* %buf1, i8* %0) nounwind ; [#uses=0] + %buf2 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %2 = call i32 (i8*, ...) @printf(i8* getelementptr ([11 x i8], [11 x i8]* @"\01LC", i32 0, i32 0), i8* %buf2) nounwind ; [#uses=0] + br label %return + +return: ; preds = %entry + ret i32 0 +} + declare i8* @strcpy(i8*, i8*) nounwind declare i32 @printf(i8*, ...) 
nounwind Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -2667,6 +2667,11 @@ // Passing 'true' for doesNotReturn above won't generate the trap for us. if (TM.getTargetTriple().isPS4CPU()) Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); + // WebAssembly needs an unreachable instruction after a non-returning call, + // because the function return type can be different from __stack_chk_fail's + // return type (void). + if (TM.getTargetTriple().isWasm()) + Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); DAG.setRoot(Chain); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83277.275903.patch Type: text/x-patch Size: 2519 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 22:08:40 2020 From: llvm-commits at lists.llvm.org (Heejin Ahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:08:40 +0000 (UTC) Subject: [PATCH] D83277: [WebAssembly] Generate unreachable after __stack_chk_fail In-Reply-To: References: Message-ID: <9e5db057532d30aa54086dfbc104ffe2@localhost.localdomain> aheejin updated this revision to Diff 275904. aheejin marked an inline comment as done. aheejin added a comment. 
- Align FileCheck lines Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83277/new/ https://reviews.llvm.org/D83277 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/CodeGen/WebAssembly/stack-protector.ll Index: llvm/test/CodeGen/WebAssembly/stack-protector.ll =================================================================== --- llvm/test/CodeGen/WebAssembly/stack-protector.ll +++ llvm/test/CodeGen/WebAssembly/stack-protector.ll @@ -1,13 +1,13 @@ ; RUN: llc -verify-machineinstrs -mtriple=wasm32-unknown-unknown < %s | FileCheck -check-prefix=WASM32 %s -; WASM32: i32.load 28 -; WASM32-NEXT: i32.ne -; WASM32-NEXT: br_if 0 - -; WASM32: __stack_chk_fail - @"\01LC" = internal constant [11 x i8] c"buf == %s\0A\00" ; <[11 x i8]*> [#uses=1] +; WASM32-LABEL: test +; WASM32: i32.load 28 +; WASM32: br_if 0 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + define void @test(i8* %a) nounwind ssp { entry: %a_addr = alloca i8* ; [#uses=2] @@ -25,6 +25,27 @@ ret void } +; WASM32-LABEL: test_return_i32 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + +define i32 @test_return_i32(i8* %a) nounwind ssp { +entry: + %a_addr = alloca i8* ; [#uses=2] + %buf = alloca [8 x i8] ; <[8 x i8]*> [#uses=2] + %"alloca point" = bitcast i32 0 to i32 ; [#uses=0] + store i8* %a, i8** %a_addr + %buf1 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %0 = load i8*, i8** %a_addr, align 4 ; [#uses=1] + %1 = call i8* @strcpy(i8* %buf1, i8* %0) nounwind ; [#uses=0] + %buf2 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %2 = call i32 (i8*, ...) @printf(i8* getelementptr ([11 x i8], [11 x i8]* @"\01LC", i32 0, i32 0), i8* %buf2) nounwind ; [#uses=0] + br label %return + +return: ; preds = %entry + ret i32 0 +} + declare i8* @strcpy(i8*, i8*) nounwind declare i32 @printf(i8*, ...) 
nounwind Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -2667,6 +2667,11 @@ // Passing 'true' for doesNotReturn above won't generate the trap for us. if (TM.getTargetTriple().isPS4CPU()) Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); + // WebAssembly needs an unreachable instruction after a non-returning call, + // because the function return type can be different from __stack_chk_fail's + // return type (void). + if (TM.getTargetTriple().isWasm()) + Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); DAG.setRoot(Chain); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83277.275904.patch Type: text/x-patch Size: 2534 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 22:08:41 2020 From: llvm-commits at lists.llvm.org (Heejin Ahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:08:41 +0000 (UTC) Subject: [PATCH] D83277: [WebAssembly] Generate unreachable after __stack_chk_fail In-Reply-To: References: Message-ID: <7d7f56205bdb48e956d5925fca7b1d5a@localhost.localdomain> aheejin added inline comments. ================ Comment at: llvm/test/CodeGen/WebAssembly/stack-protector.ll:4 -; WASM32: i32.load 28 -; WASM32-NEXT: i32.ne -; WASM32-NEXT: br_if 0 ---------------- After adding `unreachable`, somehow the branch order changed and this became `i32.eq` instead, so I deleted this. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83277/new/ https://reviews.llvm.org/D83277 From llvm-commits at lists.llvm.org Mon Jul 6 22:13:05 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Mon, 06 Jul 2020 22:13:05 -0700 (PDT) Subject: [llvm] 7fb3a84 - [X86] Remove duplicate SSE4A feature bit from X86TargetParser.def. NFC Message-ID: <5f040461.1c69fb81.f8b86.3818@mx.google.com> Author: Craig Topper Date: 2020-07-06T22:11:51-07:00 New Revision: 7fb3a849c13f302fadc8d16b28d5676964d886cf URL: https://github.com/llvm/llvm-project/commit/7fb3a849c13f302fadc8d16b28d5676964d886cf DIFF: https://github.com/llvm/llvm-project/commit/7fb3a849c13f302fadc8d16b28d5676964d886cf.diff LOG: [X86] Remove duplicate SSE4A feature bit from X86TargetParser.def. NFC We had both SSE4A and SSE4_A. So remove one of them. Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index c3a144e2dda3..a395ee86961d 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -206,7 +206,6 @@ X86_FEATURE (SERIALIZE, "serialize") X86_FEATURE (SGX, "sgx") X86_FEATURE (SHA, "sha") X86_FEATURE (SHSTK, "shstk") -X86_FEATURE (SSE4A, "sse4a") X86_FEATURE (TBM, "tbm") X86_FEATURE (TSXLDTRK, "tsxldtrk") X86_FEATURE (VAES, "vaes") diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index 8df165ed4e54..452d0934f29b 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -192,13 +192,13 @@ static constexpr FeatureBitset FeaturesK8 = static constexpr FeatureBitset FeaturesK8SSE3 = FeaturesK8 | FeatureSSE3; static constexpr FeatureBitset FeaturesAMDFAM10 = FeaturesK8SSE3 | 
FeatureCMPXCHG16B | FeatureLZCNT | FeaturePOPCNT | - FeaturePRFCHW | FeatureSAHF | FeatureSSE4A; + FeaturePRFCHW | FeatureSAHF | FeatureSSE4_A; // Bobcat architecture processors. static constexpr FeatureBitset FeaturesBTVER1 = FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeaturePOPCNT | FeaturePRFCHW | - FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4A | + FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_A | FeatureSAHF; static constexpr FeatureBitset FeaturesBTVER2 = FeaturesBTVER1 | FeatureAES | FeatureAVX | FeatureBMI | FeatureF16C | @@ -210,7 +210,7 @@ static constexpr FeatureBitset FeaturesBDVER1 = FeatureCMPXCHG16B | FeatureEM64T | FeatureFMA4 | FeatureFXSR | FeatureLWP | FeatureLZCNT | FeatureMMX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureSAHF | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | - FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4A | FeatureXOP | FeatureXSAVE; + FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4_A | FeatureXOP | FeatureXSAVE; static constexpr FeatureBitset FeaturesBDVER2 = FeaturesBDVER1 | FeatureBMI | FeatureFMA | FeatureF16C | FeatureTBM; static constexpr FeatureBitset FeaturesBDVER3 = @@ -228,7 +228,7 @@ static constexpr FeatureBitset FeaturesZNVER1 = FeatureMOVBE | FeatureMWAITX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureRDRND | FeatureRDSEED | FeatureSAHF | FeatureSHA | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_1 | - FeatureSSE4_2 | FeatureSSE4A | FeatureXSAVE | FeatureXSAVEC | + FeatureSSE4_2 | FeatureSSE4_A | FeatureXSAVE | FeatureXSAVEC | FeatureXSAVEOPT | FeatureXSAVES; static constexpr FeatureBitset FeaturesZNVER2 = FeaturesZNVER1 | FeatureCLWB | FeatureRDPID | FeatureWBNOINVD; From llvm-commits at lists.llvm.org Mon Jul 6 22:19:10 2020 From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:19:10 +0000 (UTC) Subject: 
[PATCH] D83276: [PowerPC] Generate CFI directives when probing in prologue In-Reply-To: References: Message-ID: lkail updated this revision to Diff 275906. Herald added a subscriber: wuzish. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83276/new/ https://reviews.llvm.org/D83276 Files: llvm/lib/Target/PowerPC/PPCFrameLowering.cpp llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83276.275906.patch Type: text/x-patch Size: 17955 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 22:19:48 2020 From: llvm-commits at lists.llvm.org (LiuChen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:19:48 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: LiuChen3 updated this revision to Diff 275905. LiuChen3 added a comment. Address the comments CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86ISelLowering.h llvm/test/CodeGen/X86/win64-byval.ll Index: llvm/test/CodeGen/X86/win64-byval.ll =================================================================== --- llvm/test/CodeGen/X86/win64-byval.ll +++ llvm/test/CodeGen/X86/win64-byval.ll @@ -32,3 +32,31 @@ call void @foo({ float, double }* byval %arg) ret void } + +declare void @foo2({ float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, i64 %f) + at data = external constant { float, double } + +define void @test() { +; CHECK-LABEL: @test +; CHECK: movq (%rax), %rcx +; CHECK-NEXT: movq 8(%rax), %rax +; CHECK-NEXT: movq %rax, 120(%rsp) +; CHECK-NEXT: movq %rcx, 112(%rsp) +; CHECK-NEXT: movq %rcx, 96(%rsp) +; CHECK-NEXT: movq %rax, 104(%rsp) +; CHECK-NEXT: movq %rcx, 80(%rsp) +; CHECK-NEXT: movq %rax, 88(%rsp) +; 
CHECK-NEXT: movq %rcx, 64(%rsp) +; CHECK-NEXT: movq %rax, 72(%rsp) +; CHECK-NEXT: movq %rax, 56(%rsp) +; CHECK-NEXT: movq %rcx, 48(%rsp) +; CHECK-NEXT: leaq 48(%rsp), %rax +; CHECK-NEXT: movq %rax, 32(%rsp) +; CHECK-NEXT: movq $10, 40(%rsp) +; CHECK-NEXT: leaq 112(%rsp), %rcx +; CHECK-NEXT: leaq 96(%rsp), %rdx +; CHECK-NEXT: leaq 80(%rsp), %r8 +; CHECK-NEXT: leaq 64(%rsp), %r9 + call void @foo2({ float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, i64 10) + ret void +} Index: llvm/lib/Target/X86/X86ISelLowering.h =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.h +++ llvm/lib/Target/X86/X86ISelLowering.h @@ -1436,7 +1436,7 @@ SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const; + ISD::ArgFlagsTy Flags, bool isByval) const; // Call lowering helpers. 
Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -3763,12 +3763,13 @@ SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const { + ISD::ArgFlagsTy Flags, + bool isByVal) const { unsigned LocMemOffset = VA.getLocMemOffset(); SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl); PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()), StackPtr, PtrOff); - if (Flags.isByVal()) + if (isByVal) return CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl); return DAG.getStore( @@ -4080,7 +4081,7 @@ StackPtr = DAG.getCopyFromReg(Chain, dl, RegInfo->getStackRegister(), getPointerTy(DAG.getDataLayout())); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, Arg, - dl, DAG, VA, Flags)); + dl, DAG, VA, Flags, isByVal)); } } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83175.275905.patch Type: text/x-patch Size: 3438 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 22:42:24 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:42:24 +0000 (UTC) Subject: [PATCH] D81375: [InstCombine] Simplify boolean Phis with const inputs using CFG In-Reply-To: References: Message-ID: mkazantsev added a comment. Still missing cases found for selects... It's getting not worth it. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81375/new/ https://reviews.llvm.org/D81375 From llvm-commits at lists.llvm.org Mon Jul 6 22:44:56 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:44:56 +0000 (UTC) Subject: [PATCH] D83278: [WebAssembly] Avoid scalarizing vector shifts in more cases Message-ID: tlively created this revision. tlively added reviewers: aheejin, dschuff. 
Herald added subscribers: llvm-commits, zzheng, sunfish, hiraditya, jgravelle-google, sbc100. Herald added a project: LLVM. Since WebAssembly's vector shift instructions take a scalar shift amount rather than a vector shift amount, we have to check in ISel that the vector shift amount is a splat. Previously, we were checking explicitly for splat BUILD_VECTOR nodes, but this change uses the standard utilities for detecting splat values that can handle more complex splat patterns. Since the C++ ISel lowering is now more general than the ISel patterns, this change also simplifies shift lowering by using the C++ lowering for all SIMD shifts rather than mixing C++ and normal pattern-based lowering. This change improves ISel for shifts to the point that the simd-shift-unroll.ll regression test no longer tests the code path it was originally meant to test. The bug corresponding to that regression test is no longer reproducible with its original reported reproducer, so rather than try to fix the regression test, this change just removes it. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83278 Files: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll llvm/test/CodeGen/WebAssembly/simd-shift-unroll.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83278.275907.patch Type: text/x-patch Size: 13519 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 22:55:47 2020 From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:55:47 +0000 (UTC) Subject: [PATCH] D83276: [PowerPC] Generate CFI directives when probing in prologue In-Reply-To: References: Message-ID: <2e6edc31df4a6e5a07ded974e3b11a0d@localhost.localdomain> lkail updated this revision to Diff 275908. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83276/new/ https://reviews.llvm.org/D83276 Files: llvm/lib/Target/PowerPC/PPCFrameLowering.cpp llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83276.275908.patch Type: text/x-patch Size: 18021 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 22:56:19 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:56:19 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: MaskRay added inline comments. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel-callee-hidden.s:111 + mr 30, 5 + bl callee2_stother1_local@notoc + add 3, 3, 30 ---------------- IIUC, you can merge caller2 and caller2_tailcall and delete all instructions except:

```
.localentry caller2_tailcall, 1
bl callee2_stother1_local@notoc
b callee2_stother1_local@notoc
```

================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:52 +callee1_stother0_local: + mullw 3, 3, 3 + extsw 3, 3 ---------------- I think these instructions are entirely irrelevant to the test and should be deleted to make tests more focused/readable: `mullw 3, 3, 3` and `extsw 3, 3`. Please check some newer x86-64-* and aarch64-* tests. They don't have such setup instructions. But please keep the instruction after `bl ...@notoc` to make it clear that the next instruction is not special as in R_PPC64_REL24. I think cleaning up the instructions can make the test smaller by at least one half.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Mon Jul 6 22:57:45 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 05:57:45 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <123fa0c39b1b80fa899a8fbf28f291a0@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:153 +.section .text_extern_stother0, "ax", %progbits +caller3: + .localentry caller3, 1 ---------------- ppc64-pcrel-call-to-pcrel-callee-hidden.s changes the symbol bindings to STB_GLOBAL. Do these symbols need `.globl`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Mon Jul 6 23:04:40 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via llvm-commits) Date: Mon, 06 Jul 2020 23:04:40 -0700 (PDT) Subject: [llvm] 094e99d - [Test] Add one more missing optimization opportunity test Message-ID: <5f041078.1c69fb81.1f3de.424d@mx.google.com> Author: Max Kazantsev Date: 2020-07-07T13:04:15+07:00 New Revision: 094e99d264c937cad33796b8c92fe123cb544c9e URL: https://github.com/llvm/llvm-project/commit/094e99d264c937cad33796b8c92fe123cb544c9e DIFF: https://github.com/llvm/llvm-project/commit/094e99d264c937cad33796b8c92fe123cb544c9e.diff LOG: [Test] Add one more missing optimization opportunity test Added: Modified: llvm/test/Transforms/InstCombine/select.ll Removed: ################################################################################ diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll index c7e96974ef32..381a77bb8d78 100644 --- a/llvm/test/Transforms/InstCombine/select.ll +++ 
b/llvm/test/Transforms/InstCombine/select.ll @@ -1255,8 +1255,8 @@ define i128 @test86(i1 %flag) { define i32 @test_select_select0(i32 %a, i32 %r0, i32 %r1, i32 %v1, i32 %v2) { ; CHECK-LABEL: @test_select_select0( -; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[A:%.*]], [[V1:%.*]] -; CHECK-NEXT: [[S0:%.*]] = select i1 [[C0]], i32 [[R1:%.*]], i32 [[R0:%.*]] +; CHECK-NEXT: [[C0_NOT:%.*]] = icmp slt i32 [[A:%.*]], [[V1:%.*]] +; CHECK-NEXT: [[S0:%.*]] = select i1 [[C0_NOT]], i32 [[R1:%.*]], i32 [[R0:%.*]] ; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[A]], [[V2:%.*]] ; CHECK-NEXT: [[S1:%.*]] = select i1 [[C1]], i32 [[S0]], i32 [[R1]] ; CHECK-NEXT: ret i32 [[S1]] @@ -1270,8 +1270,8 @@ define i32 @test_select_select0(i32 %a, i32 %r0, i32 %r1, i32 %v1, i32 %v2) { define i32 @test_select_select1(i32 %a, i32 %r0, i32 %r1, i32 %v1, i32 %v2) { ; CHECK-LABEL: @test_select_select1( -; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[A:%.*]], [[V1:%.*]] -; CHECK-NEXT: [[S0:%.*]] = select i1 [[C0]], i32 [[R1:%.*]], i32 [[R0:%.*]] +; CHECK-NEXT: [[C0_NOT:%.*]] = icmp slt i32 [[A:%.*]], [[V1:%.*]] +; CHECK-NEXT: [[S0:%.*]] = select i1 [[C0_NOT]], i32 [[R1:%.*]], i32 [[R0:%.*]] ; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[A]], [[V2:%.*]] ; CHECK-NEXT: [[S1:%.*]] = select i1 [[C1]], i32 [[R0]], i32 [[S0]] ; CHECK-NEXT: ret i32 [[S1]] @@ -2240,3 +2240,36 @@ exit: exit2: ret i32 %iv.inc } + +define i32 @test_select_into_phi_not_idom(i1 %cond, i32 %A, i32 %B) { +; CHECK-LABEL: @test_select_into_phi_not_idom( +; CHECK-NEXT: entry: +; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]] +; CHECK: if.true: +; CHECK-NEXT: br label [[MERGE:%.*]] +; CHECK: if.false: +; CHECK-NEXT: br label [[MERGE]] +; CHECK: merge: +; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[A:%.*]], [[IF_TRUE]] ], [ [[B:%.*]], [[IF_FALSE]] ] +; CHECK-NEXT: br label [[EXIT:%.*]] +; CHECK: exit: +; CHECK-NEXT: [[SEL:%.*]] = select i1 [[COND]], i32 [[PHI]], i32 [[A]] +; CHECK-NEXT: ret i32 [[SEL]] +; +entry: + br i1 %cond, 
label %if.true, label %if.false + +if.true: + br label %merge + +if.false: + br label %merge + +merge: + %phi = phi i32 [%A, %if.true], [%B, %if.false] + br label %exit + +exit: + %sel = select i1 %cond, i32 %phi, i32 %A + ret i32 %sel +} From llvm-commits at lists.llvm.org Mon Jul 6 23:07:54 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:07:54 +0000 (UTC) Subject: [PATCH] D80358: [MLIR] Add RegionKindInterface In-Reply-To: References: Message-ID: <395f3dfecc362edda9af08d6e278810d@localhost.localdomain> stephenneuendorffer updated this revision to Diff 275910. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80358/new/ https://reviews.llvm.org/D80358 Files: mlir/docs/Interfaces.md mlir/docs/LangRef.md mlir/include/mlir/IR/CMakeLists.txt mlir/include/mlir/IR/Dominance.h mlir/include/mlir/IR/RegionKindInterface.h mlir/include/mlir/IR/RegionKindInterface.td mlir/lib/IR/CMakeLists.txt mlir/lib/IR/Dominance.cpp mlir/lib/IR/RegionKindInterface.cpp mlir/lib/IR/Verifier.cpp mlir/lib/Transforms/CSE.cpp mlir/test/CMakeLists.txt mlir/test/IR/invalid.mlir mlir/test/IR/parser.mlir mlir/test/IR/traits.mlir mlir/test/lib/Dialect/Test/TestDialect.cpp mlir/test/lib/Dialect/Test/TestDialect.h mlir/test/lib/Dialect/Test/TestOps.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D80358.275910.patch Type: text/x-patch Size: 63366 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:10:24 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:10:24 +0000 (UTC) Subject: [PATCH] D83192: Fix off by one error in Bitfields In-Reply-To: References: Message-ID: courbet added a comment. Can you add a unit test for this?
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83192/new/ https://reviews.llvm.org/D83192 From llvm-commits at lists.llvm.org Mon Jul 6 23:13:36 2020 From: llvm-commits at lists.llvm.org (Ruiling, Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:13:36 +0000 (UTC) Subject: [PATCH] D83280: [AMDGPU] Fix typos in performCtlz_CttzCombine() Message-ID: ruiling created this revision. ruiling added reviewers: arsenm, nhaehnle. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl. Herald added a project: LLVM. Fix two obvious errors in the code and also update the test check. Also add one test to catch the failure. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83280 Files: llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp llvm/test/CodeGen/AMDGPU/cttz_zero_undef.ll llvm/test/CodeGen/AMDGPU/select-constant-cttz.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83280.275911.patch Type: text/x-patch Size: 4463 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:14:39 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:14:39 +0000 (UTC) Subject: [PATCH] D83281: [OpenMP] Allow traits for the OpenMP context selector `isa` Message-ID: jdoerfert created this revision. jdoerfert added reviewers: jhuber6, fghanim, JonChesterfield, grokos, ggeorgakoudis, ABataev. Herald added subscribers: llvm-commits, sstefan1, guansong, bollu, hiraditya, yaxunl. Herald added projects: clang, LLVM. NOTE: The changes are fairly mechanical overall but this patch currently lacks proper testing. An updated version with more test coverage will be provided. It was unclear what `isa` was supposed to mean so we did not provide any traits for this context selector. With this patch we will allow *any* string or identifier. 
We use the target attribute and target info to determine if the trait matches. In other words, we will check if the provided value is a target feature that is available (at the call site). Fixes PR46338 Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83281 Files: clang/include/clang/AST/OpenMPClause.h clang/lib/AST/OpenMPClause.cpp clang/lib/Parse/ParseOpenMP.cpp clang/lib/Sema/SemaOpenMP.cpp clang/test/OpenMP/declare_variant_device_isa_codegen_1.c llvm/include/llvm/Frontend/OpenMP/OMPContext.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPContext.cpp llvm/unittests/Frontend/OpenMPContextTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83281.275912.patch Type: text/x-patch Size: 21680 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:15:02 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Mon, 06 Jul 2020 23:15:02 -0700 (PDT) Subject: [llvm] 16f3d69 - [X86] Move the feature dependency handling in X86TargetInfo::setFeatureEnabledImpl to a table based lookup in X86TargetParser.cpp Message-ID: <5f0412e6.1c69fb81.d2c51.6e4d@mx.google.com> Author: Craig Topper Date: 2020-07-06T23:14:02-07:00 New Revision: 16f3d698f2afbbea43e0c3df81df6f2a640ce806 URL: https://github.com/llvm/llvm-project/commit/16f3d698f2afbbea43e0c3df81df6f2a640ce806 DIFF: https://github.com/llvm/llvm-project/commit/16f3d698f2afbbea43e0c3df81df6f2a640ce806.diff LOG: [X86] Move the feature dependency handling in X86TargetInfo::setFeatureEnabledImpl to a table based lookup in X86TargetParser.cpp Previously we had to specify the forward and backwards feature dependencies separately which was error prone. And as dependencies have gotten more complex it was hard to be sure the transitive dependencies were handled correctly. The way it was written was also not super readable. 
This patch replaces everything with a table that lists what features a feature is dependent on directly. Then we can recursively walk through the table to find the transitive dependencies. This is largely based on how we handle subtarget features in the MC layer from the tablegen descriptions. Differential Revision: https://reviews.llvm.org/D83273 Added: Modified: clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h llvm/include/llvm/Support/X86TargetParser.def llvm/include/llvm/Support/X86TargetParser.h llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp index ed62848d8070..9ea0925cb535 100644 --- a/clang/lib/Basic/Targets/X86.cpp +++ b/clang/lib/Basic/Targets/X86.cpp @@ -145,155 +145,6 @@ bool X86TargetInfo::initFeatureMap( return true; } -void X86TargetInfo::setSSELevel(llvm::StringMap &Features, - X86SSEEnum Level, bool Enabled) { - if (Enabled) { - switch (Level) { - case AVX512F: - Features["avx512f"] = true; - Features["fma"] = true; - Features["f16c"] = true; - LLVM_FALLTHROUGH; - case AVX2: - Features["avx2"] = true; - LLVM_FALLTHROUGH; - case AVX: - Features["avx"] = true; - LLVM_FALLTHROUGH; - case SSE42: - Features["sse4.2"] = true; - LLVM_FALLTHROUGH; - case SSE41: - Features["sse4.1"] = true; - LLVM_FALLTHROUGH; - case SSSE3: - Features["ssse3"] = true; - LLVM_FALLTHROUGH; - case SSE3: - Features["sse3"] = true; - LLVM_FALLTHROUGH; - case SSE2: - Features["sse2"] = true; - LLVM_FALLTHROUGH; - case SSE1: - Features["sse"] = true; - LLVM_FALLTHROUGH; - case NoSSE: - break; - } - return; - } - - switch (Level) { - case NoSSE: - case SSE1: - Features["sse"] = false; - LLVM_FALLTHROUGH; - case SSE2: - Features["sse2"] = Features["pclmul"] = Features["aes"] = false; - Features["sha"] = Features["gfni"] = false; - LLVM_FALLTHROUGH; - case SSE3: - Features["sse3"] = false; - setXOPLevel(Features, 
NoXOP, false); - LLVM_FALLTHROUGH; - case SSSE3: - Features["ssse3"] = false; - LLVM_FALLTHROUGH; - case SSE41: - Features["sse4.1"] = false; - LLVM_FALLTHROUGH; - case SSE42: - Features["sse4.2"] = false; - LLVM_FALLTHROUGH; - case AVX: - Features["fma"] = Features["avx"] = Features["f16c"] = false; - Features["vaes"] = Features["vpclmulqdq"] = false; - setXOPLevel(Features, FMA4, false); - LLVM_FALLTHROUGH; - case AVX2: - Features["avx2"] = false; - LLVM_FALLTHROUGH; - case AVX512F: - Features["avx512f"] = Features["avx512cd"] = Features["avx512er"] = false; - Features["avx512pf"] = Features["avx512dq"] = Features["avx512bw"] = false; - Features["avx512vl"] = Features["avx512vbmi"] = false; - Features["avx512ifma"] = Features["avx512vpopcntdq"] = false; - Features["avx512bitalg"] = Features["avx512vnni"] = false; - Features["avx512vbmi2"] = Features["avx512bf16"] = false; - Features["avx512vp2intersect"] = false; - break; - } -} - -void X86TargetInfo::setMMXLevel(llvm::StringMap &Features, - MMX3DNowEnum Level, bool Enabled) { - if (Enabled) { - switch (Level) { - case AMD3DNowAthlon: - Features["3dnowa"] = true; - LLVM_FALLTHROUGH; - case AMD3DNow: - Features["3dnow"] = true; - LLVM_FALLTHROUGH; - case MMX: - Features["mmx"] = true; - LLVM_FALLTHROUGH; - case NoMMX3DNow: - break; - } - return; - } - - switch (Level) { - case NoMMX3DNow: - case MMX: - Features["mmx"] = false; - LLVM_FALLTHROUGH; - case AMD3DNow: - Features["3dnow"] = false; - LLVM_FALLTHROUGH; - case AMD3DNowAthlon: - Features["3dnowa"] = false; - break; - } -} - -void X86TargetInfo::setXOPLevel(llvm::StringMap &Features, XOPEnum Level, - bool Enabled) { - if (Enabled) { - switch (Level) { - case XOP: - Features["xop"] = true; - LLVM_FALLTHROUGH; - case FMA4: - Features["fma4"] = true; - setSSELevel(Features, AVX, true); - LLVM_FALLTHROUGH; - case SSE4A: - Features["sse4a"] = true; - setSSELevel(Features, SSE3, true); - LLVM_FALLTHROUGH; - case NoXOP: - break; - } - return; - } - - switch (Level) 
{ - case NoXOP: - case SSE4A: - Features["sse4a"] = false; - LLVM_FALLTHROUGH; - case FMA4: - Features["fma4"] = false; - LLVM_FALLTHROUGH; - case XOP: - Features["xop"] = false; - break; - } -} - void X86TargetInfo::setFeatureEnabledImpl(llvm::StringMap &Features, StringRef Name, bool Enabled) { if (Name == "sse4") { @@ -309,96 +160,10 @@ void X86TargetInfo::setFeatureEnabledImpl(llvm::StringMap &Features, Features[Name] = Enabled; - if (Name == "mmx") { - setMMXLevel(Features, MMX, Enabled); - } else if (Name == "sse") { - setSSELevel(Features, SSE1, Enabled); - } else if (Name == "sse2") { - setSSELevel(Features, SSE2, Enabled); - } else if (Name == "sse3") { - setSSELevel(Features, SSE3, Enabled); - } else if (Name == "ssse3") { - setSSELevel(Features, SSSE3, Enabled); - } else if (Name == "sse4.2") { - setSSELevel(Features, SSE42, Enabled); - } else if (Name == "sse4.1") { - setSSELevel(Features, SSE41, Enabled); - } else if (Name == "3dnow") { - setMMXLevel(Features, AMD3DNow, Enabled); - } else if (Name == "3dnowa") { - setMMXLevel(Features, AMD3DNowAthlon, Enabled); - } else if (Name == "aes") { - if (Enabled) - setSSELevel(Features, SSE2, Enabled); - else - Features["vaes"] = false; - } else if (Name == "vaes") { - if (Enabled) { - setSSELevel(Features, AVX, Enabled); - Features["aes"] = true; - } - } else if (Name == "pclmul") { - if (Enabled) - setSSELevel(Features, SSE2, Enabled); - else - Features["vpclmulqdq"] = false; - } else if (Name == "vpclmulqdq") { - if (Enabled) { - setSSELevel(Features, AVX, Enabled); - Features["pclmul"] = true; - } - } else if (Name == "gfni") { - if (Enabled) - setSSELevel(Features, SSE2, Enabled); - } else if (Name == "avx") { - setSSELevel(Features, AVX, Enabled); - } else if (Name == "avx2") { - setSSELevel(Features, AVX2, Enabled); - } else if (Name == "avx512f") { - setSSELevel(Features, AVX512F, Enabled); - } else if (Name.startswith("avx512")) { - if (Enabled) - setSSELevel(Features, AVX512F, Enabled); - // Enable 
BWI instruction if certain features are being enabled. - if ((Name == "avx512vbmi" || Name == "avx512vbmi2" || - Name == "avx512bitalg" || Name == "avx512bf16") && Enabled) - Features["avx512bw"] = true; - // Also disable some features if BWI is being disabled. - if (Name == "avx512bw" && !Enabled) { - Features["avx512vbmi"] = false; - Features["avx512vbmi2"] = false; - Features["avx512bitalg"] = false; - Features["avx512bf16"] = false; - } - } else if (Name == "fma") { - if (Enabled) - setSSELevel(Features, AVX, Enabled); - else - setSSELevel(Features, AVX512F, Enabled); - } else if (Name == "fma4") { - setXOPLevel(Features, FMA4, Enabled); - } else if (Name == "xop") { - setXOPLevel(Features, XOP, Enabled); - } else if (Name == "sse4a") { - setXOPLevel(Features, SSE4A, Enabled); - } else if (Name == "f16c") { - if (Enabled) - setSSELevel(Features, AVX, Enabled); - else - setSSELevel(Features, AVX512F, Enabled); - } else if (Name == "sha") { - if (Enabled) - setSSELevel(Features, SSE2, Enabled); - } else if (Name == "xsave") { - if (!Enabled) - Features["xsaveopt"] = Features["xsavec"] = Features["xsaves"] = false; - } else if (Name == "xsaveopt" || Name == "xsavec" || Name == "xsaves") { - if (Enabled) - Features["xsave"] = true; - } else if (Name == "amx-tile" && !Enabled) { - Features["amx-bf16"] = Features["amx-int8"] = false; - } else if ((Name == "amx-bf16" || Name == "amx-int8") && Enabled) - Features["amx-tile"] = true; + SmallVector ImpliedFeatures; + llvm::X86::getImpliedFeatures(Name, Enabled, ImpliedFeatures); + for (const auto &F : ImpliedFeatures) + Features[F] = Enabled; } /// handleTargetFeatures - Perform initialization based on the user diff --git a/clang/lib/Basic/Targets/X86.h b/clang/lib/Basic/Targets/X86.h index 623ac9474b5c..b783decd72be 100644 --- a/clang/lib/Basic/Targets/X86.h +++ b/clang/lib/Basic/Targets/X86.h @@ -262,15 +262,6 @@ class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo { void getTargetDefines(const LangOptions 
&Opts, MacroBuilder &Builder) const override; - static void setSSELevel(llvm::StringMap &Features, X86SSEEnum Level, - bool Enabled); - - static void setMMXLevel(llvm::StringMap &Features, MMX3DNowEnum Level, - bool Enabled); - - static void setXOPLevel(llvm::StringMap &Features, XOPEnum Level, - bool Enabled); - void setFeatureEnabled(llvm::StringMap &Features, StringRef Name, bool Enabled) const override { setFeatureEnabledImpl(Features, Name, Enabled); diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index a395ee86961d..e53ef20f939e 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -174,13 +174,16 @@ X86_FEATURE_COMPAT(AVX512VP2INTERSECT, "avx512vp2intersect") X86_FEATURE (3DNOW, "3dnow") X86_FEATURE (3DNOWA, "3dnowa") X86_FEATURE (ADX, "adx") +X86_FEATURE (AMX_BF16, "amx-bf16") +X86_FEATURE (AMX_INT8, "amx-int8") +X86_FEATURE (AMX_TILE, "amx-tile") X86_FEATURE (CLDEMOTE, "cldemote") X86_FEATURE (CLFLUSHOPT, "clflushopt") X86_FEATURE (CLWB, "clwb") X86_FEATURE (CLZERO, "clzero") X86_FEATURE (CMPXCHG16B, "cx16") X86_FEATURE (CMPXCHG8B, "cx8") -X86_FEATURE (EM64T, nullptr) +X86_FEATURE (EM64T, "") X86_FEATURE (ENQCMD, "enqcmd") X86_FEATURE (F16C, "f16c") X86_FEATURE (FSGSBASE, "fsgsbase") @@ -209,6 +212,7 @@ X86_FEATURE (SHSTK, "shstk") X86_FEATURE (TBM, "tbm") X86_FEATURE (TSXLDTRK, "tsxldtrk") X86_FEATURE (VAES, "vaes") +X86_FEATURE (VZEROUPPER, "vzeroupper") X86_FEATURE (WAITPKG, "waitpkg") X86_FEATURE (WBNOINVD, "wbnoinvd") X86_FEATURE (X87, "x87") @@ -216,5 +220,10 @@ X86_FEATURE (XSAVE, "xsave") X86_FEATURE (XSAVEC, "xsavec") X86_FEATURE (XSAVEOPT, "xsaveopt") X86_FEATURE (XSAVES, "xsaves") +// These features aren't really CPU features, but the frontend can set them. 
+X86_FEATURE (RETPOLINE_INDIRECT_BRANCHES, "retpoline-indirect-branches") +X86_FEATURE (RETPOLINE_INDIRECT_CALLS, "retpoline-indirect-calls") +X86_FEATURE (LVI_CFI, "lvi-cfi") +X86_FEATURE (LVI_LOAD_HARDENING, "lvi-load-hardening") #undef X86_FEATURE_COMPAT #undef X86_FEATURE diff --git a/llvm/include/llvm/Support/X86TargetParser.h b/llvm/include/llvm/Support/X86TargetParser.h index 5897e79eb287..4a4fb8ccc4cc 100644 --- a/llvm/include/llvm/Support/X86TargetParser.h +++ b/llvm/include/llvm/Support/X86TargetParser.h @@ -137,6 +137,11 @@ ProcessorFeatures getKeyFeature(CPUKind Kind); /// Fill in the features that \p CPU supports into \p Features. void getFeaturesForCPU(StringRef CPU, SmallVectorImpl &Features); +/// Fill \p Features with the features that are implied to be enabled/disabled +/// by the provided \p Feature. +void getImpliedFeatures(StringRef Feature, bool Enabled, + SmallVectorImpl &Features); + } // namespace X86 } // namespace llvm diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index 452d0934f29b..5e4f62d8a1d6 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -48,6 +48,14 @@ class FeatureBitset { return (Bits[I / 32] & Mask) != 0; } + constexpr FeatureBitset &operator|=(const FeatureBitset &RHS) { + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { + uint32_t NewBits = Bits[I] | RHS.Bits[I]; + Bits[I] = NewBits; + } + return *this; + } + constexpr FeatureBitset operator&(const FeatureBitset &RHS) const { FeatureBitset Result; for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) @@ -77,6 +85,11 @@ struct ProcInfo { FeatureBitset Features; }; +struct FeatureInfo { + StringLiteral Name; + FeatureBitset ImpliedFeatures; +}; + } // end anonymous namespace #define X86_FEATURE(ENUM, STRING) \ @@ -376,19 +389,187 @@ ProcessorFeatures llvm::X86::getKeyFeature(X86::CPUKind Kind) { llvm_unreachable("Unable to find CPU kind!"); } -static const char 
*FeatureStrings[X86::CPU_FEATURE_MAX] = { -#define X86_FEATURE(ENUM, STR) STR, +// Features with no dependencies. +static constexpr FeatureBitset ImpliedFeaturesADX = {}; +static constexpr FeatureBitset ImpliedFeaturesBMI = {}; +static constexpr FeatureBitset ImpliedFeaturesBMI2 = {}; +static constexpr FeatureBitset ImpliedFeaturesCLDEMOTE = {}; +static constexpr FeatureBitset ImpliedFeaturesCLFLUSHOPT = {}; +static constexpr FeatureBitset ImpliedFeaturesCLWB = {}; +static constexpr FeatureBitset ImpliedFeaturesCLZERO = {}; +static constexpr FeatureBitset ImpliedFeaturesCMOV = {}; +static constexpr FeatureBitset ImpliedFeaturesCMPXCHG16B = {}; +static constexpr FeatureBitset ImpliedFeaturesCMPXCHG8B = {}; +static constexpr FeatureBitset ImpliedFeaturesEM64T = {}; +static constexpr FeatureBitset ImpliedFeaturesENQCMD = {}; +static constexpr FeatureBitset ImpliedFeaturesFSGSBASE = {}; +static constexpr FeatureBitset ImpliedFeaturesFXSR = {}; +static constexpr FeatureBitset ImpliedFeaturesINVPCID = {}; +static constexpr FeatureBitset ImpliedFeaturesLWP = {}; +static constexpr FeatureBitset ImpliedFeaturesLZCNT = {}; +static constexpr FeatureBitset ImpliedFeaturesMWAITX = {}; +static constexpr FeatureBitset ImpliedFeaturesMOVBE = {}; +static constexpr FeatureBitset ImpliedFeaturesMOVDIR64B = {}; +static constexpr FeatureBitset ImpliedFeaturesMOVDIRI = {}; +static constexpr FeatureBitset ImpliedFeaturesPCONFIG = {}; +static constexpr FeatureBitset ImpliedFeaturesPOPCNT = {}; +static constexpr FeatureBitset ImpliedFeaturesPKU = {}; +static constexpr FeatureBitset ImpliedFeaturesPREFETCHWT1 = {}; +static constexpr FeatureBitset ImpliedFeaturesPRFCHW = {}; +static constexpr FeatureBitset ImpliedFeaturesPTWRITE = {}; +static constexpr FeatureBitset ImpliedFeaturesRDPID = {}; +static constexpr FeatureBitset ImpliedFeaturesRDRND = {}; +static constexpr FeatureBitset ImpliedFeaturesRDSEED = {}; +static constexpr FeatureBitset ImpliedFeaturesRTM = {}; +static constexpr 
FeatureBitset ImpliedFeaturesSAHF = {}; +static constexpr FeatureBitset ImpliedFeaturesSERIALIZE = {}; +static constexpr FeatureBitset ImpliedFeaturesSGX = {}; +static constexpr FeatureBitset ImpliedFeaturesSHSTK = {}; +static constexpr FeatureBitset ImpliedFeaturesTBM = {}; +static constexpr FeatureBitset ImpliedFeaturesTSXLDTRK = {}; +static constexpr FeatureBitset ImpliedFeaturesWAITPKG = {}; +static constexpr FeatureBitset ImpliedFeaturesWBNOINVD = {}; +static constexpr FeatureBitset ImpliedFeaturesVZEROUPPER = {}; +static constexpr FeatureBitset ImpliedFeaturesX87 = {}; +static constexpr FeatureBitset ImpliedFeaturesXSAVE = {}; + +// Not really CPU features, but need to be in the table because clang uses +// target features to communicate them to the backend. +static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_BRANCHES = {}; +static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_CALLS = {}; +static constexpr FeatureBitset ImpliedFeaturesLVI_CFI = {}; +static constexpr FeatureBitset ImpliedFeaturesLVI_LOAD_HARDENING = {}; + +// XSAVE features are dependent on basic XSAVE. +static constexpr FeatureBitset ImpliedFeaturesXSAVEC = FeatureXSAVE; +static constexpr FeatureBitset ImpliedFeaturesXSAVEOPT = FeatureXSAVE; +static constexpr FeatureBitset ImpliedFeaturesXSAVES = FeatureXSAVE; + +// MMX->3DNOW->3DNOWA chain. +static constexpr FeatureBitset ImpliedFeaturesMMX = {}; +static constexpr FeatureBitset ImpliedFeatures3DNOW = FeatureMMX; +static constexpr FeatureBitset ImpliedFeatures3DNOWA = Feature3DNOW; + +// SSE/AVX/AVX512F chain. 
+static constexpr FeatureBitset ImpliedFeaturesSSE = {}; +static constexpr FeatureBitset ImpliedFeaturesSSE2 = FeatureSSE; +static constexpr FeatureBitset ImpliedFeaturesSSE3 = FeatureSSE2; +static constexpr FeatureBitset ImpliedFeaturesSSSE3 = FeatureSSE3; +static constexpr FeatureBitset ImpliedFeaturesSSE4_1 = FeatureSSSE3; +static constexpr FeatureBitset ImpliedFeaturesSSE4_2 = FeatureSSE4_1; +static constexpr FeatureBitset ImpliedFeaturesAVX = FeatureSSE4_2; +static constexpr FeatureBitset ImpliedFeaturesAVX2 = FeatureAVX; +static constexpr FeatureBitset ImpliedFeaturesAVX512F = + FeatureAVX2 | FeatureF16C | FeatureFMA; + +// Vector extensions that build on SSE or AVX. +static constexpr FeatureBitset ImpliedFeaturesAES = FeatureSSE2; +static constexpr FeatureBitset ImpliedFeaturesF16C = FeatureAVX; +static constexpr FeatureBitset ImpliedFeaturesFMA = FeatureAVX; +static constexpr FeatureBitset ImpliedFeaturesGFNI = FeatureSSE2; +static constexpr FeatureBitset ImpliedFeaturesPCLMUL = FeatureSSE2; +static constexpr FeatureBitset ImpliedFeaturesSHA = FeatureSSE2; +static constexpr FeatureBitset ImpliedFeaturesVAES = FeatureAES | FeatureAVX; +static constexpr FeatureBitset ImpliedFeaturesVPCLMULQDQ = + FeatureAVX | FeaturePCLMUL; + +// AVX512 features. 
+static constexpr FeatureBitset ImpliedFeaturesAVX512CD = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512BW = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512DQ = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512ER = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512PF = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512VL = FeatureAVX512F; + +static constexpr FeatureBitset ImpliedFeaturesAVX512BF16 = FeatureAVX512BW; +static constexpr FeatureBitset ImpliedFeaturesAVX512BITALG = FeatureAVX512BW; +static constexpr FeatureBitset ImpliedFeaturesAVX512IFMA = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512VNNI = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512VPOPCNTDQ = FeatureAVX512F; +static constexpr FeatureBitset ImpliedFeaturesAVX512VBMI = FeatureAVX512BW; +static constexpr FeatureBitset ImpliedFeaturesAVX512VBMI2 = FeatureAVX512BW; +static constexpr FeatureBitset ImpliedFeaturesAVX512VP2INTERSECT = + FeatureAVX512F; + +// FIXME: These two aren't really implemented and just exist in the feature +// list for __builtin_cpu_supports. So omit their dependencies. +static constexpr FeatureBitset ImpliedFeaturesAVX5124FMAPS = {}; +static constexpr FeatureBitset ImpliedFeaturesAVX5124VNNIW = {}; + +// SSE4_A->FMA4->XOP chain. 
+static constexpr FeatureBitset ImpliedFeaturesSSE4_A = FeatureSSSE3; +static constexpr FeatureBitset ImpliedFeaturesFMA4 = FeatureAVX | FeatureSSE4_A; +static constexpr FeatureBitset ImpliedFeaturesXOP = FeatureFMA4; + +// AMX Features +static constexpr FeatureBitset ImpliedFeaturesAMX_TILE = {}; +static constexpr FeatureBitset ImpliedFeaturesAMX_BF16 = FeatureAMX_TILE; +static constexpr FeatureBitset ImpliedFeaturesAMX_INT8 = FeatureAMX_TILE; + +static constexpr FeatureInfo FeatureInfos[X86::CPU_FEATURE_MAX] = { +#define X86_FEATURE(ENUM, STR) {{STR}, ImpliedFeatures##ENUM}, #include "llvm/Support/X86TargetParser.def" }; +// Convert the set bits in FeatureBitset to a list of strings. +static void getFeatureBitsAsStrings(const FeatureBitset &Bits, + SmallVectorImpl &Features) { + for (unsigned i = 0; i != CPU_FEATURE_MAX; ++i) + if (Bits[i] && !FeatureInfos[i].Name.empty()) + Features.push_back(FeatureInfos[i].Name); +} + void llvm::X86::getFeaturesForCPU(StringRef CPU, - SmallVectorImpl &Features) { + SmallVectorImpl &EnabledFeatures) { auto I = llvm::find_if(Processors, [&](const ProcInfo &P) { return P.Name == CPU; }); assert(I != std::end(Processors) && "Processor not found!"); // Add the string version of all set bits. - for (unsigned i = 0; i != CPU_FEATURE_MAX; ++i) - if (FeatureStrings[i] && I->Features[i]) - Features.push_back(FeatureStrings[i]); + getFeatureBitsAsStrings(I->Features, EnabledFeatures); +} + +// For each feature that is (transitively) implied by this feature, set it. +static void getImpliedEnabledFeatures(FeatureBitset &Bits, + const FeatureBitset &Implies) { + Bits |= Implies; + for (unsigned i = 0; i != CPU_FEATURE_MAX; ++i) { + if (Implies[i]) + getImpliedEnabledFeatures(Bits, FeatureInfos[i].ImpliedFeatures); + } +} + +/// Create bit vector of features that are implied disabled if the feature +/// passed in Value is disabled. 
+static void getImpliedDisabledFeatures(FeatureBitset &Bits, unsigned Value) { + // Check all features looking for any dependent on this feature. If we find + // one, mark it and recursively find any feature that depend on it. + for (unsigned i = 0; i != CPU_FEATURE_MAX; ++i) { + if (FeatureInfos[i].ImpliedFeatures[Value]) { + Bits.set(i); + getImpliedDisabledFeatures(Bits, i); + } + } +} + +void llvm::X86::getImpliedFeatures( + StringRef Feature, bool Enabled, + SmallVectorImpl &ImpliedFeatures) { + auto I = llvm::find_if( + FeatureInfos, [&](const FeatureInfo &FI) { return FI.Name == Feature; }); + if (I == std::end(FeatureInfos)) { + // This shouldn't happen, but handle it gracefully for release builds. + assert(false && "Feature not in table!"); + return; + } + + FeatureBitset ImpliedBits; + if (Enabled) + getImpliedEnabledFeatures(ImpliedBits, I->ImpliedFeatures); + else + getImpliedDisabledFeatures(ImpliedBits, + std::distance(std::begin(FeatureInfos), I)); + + // Convert all the found bits into strings. + getFeatureBitsAsStrings(ImpliedBits, ImpliedFeatures); } From llvm-commits at lists.llvm.org Mon Jul 6 23:15:04 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:15:04 +0000 (UTC) Subject: [PATCH] D83273: [X86] Remove the feature dependency handling in X86TargetInfo::setFeatureEnabledImpl to a table based lookup in X86TargetParser.cpp In-Reply-To: References: Message-ID: <7159e778ba97c1f8ba5eaf902f22b704@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG16f3d698f2af: [X86] Move the feature dependency handling in X86TargetInfo… (authored by craig.topper). Herald added a project: clang. 
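For readers skimming the archive, the table-driven approach described in this commit can be illustrated with a small standalone sketch. It is not the code that landed: the five-entry feature list is a hypothetical toy, `std::bitset` stands in for the patch's `FeatureBitset`, and the names `Feat`, `Deps`, `impliedEnabled` and `impliedDisabled` are assumptions made for illustration. Enabling a feature walks the table forward to collect everything it transitively depends on; disabling walks it in reverse to collect everything that transitively depends on it.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy feature list (hypothetical; the real table is generated from
// X86TargetParser.def and is much larger).
enum Feat : std::size_t { SSE, SSE2, AVX, AES, VAES, NumFeats };
using Bits = std::bitset<NumFeats>;

constexpr unsigned long long bit(Feat F) { return 1ull << F; }

// Direct dependencies only, indexed by feature.
static const std::array<Bits, NumFeats> Deps = {
    Bits(),                     // SSE  has no dependencies
    Bits(bit(SSE)),             // SSE2 -> SSE
    Bits(bit(SSE2)),            // AVX  -> SSE2 (chain shortened for the toy)
    Bits(bit(SSE2)),            // AES  -> SSE2
    Bits(bit(AES) | bit(AVX)),  // VAES -> AES, AVX
};

// Forward walk: add the direct dependencies, then recurse into each of them.
void collectEnabled(Bits &Out, const Bits &Implies) {
  Out |= Implies;
  for (std::size_t I = 0; I != NumFeats; ++I)
    if (Implies[I])
      collectEnabled(Out, Deps[I]);
}

// Reverse walk: find every feature that lists F as a dependency, mark it,
// and recurse to find the features that depend on that one.
void collectDisabled(Bits &Out, std::size_t F) {
  for (std::size_t I = 0; I != NumFeats; ++I)
    if (Deps[I][F] && !Out[I]) {
      Out.set(I);
      collectDisabled(Out, I);
    }
}

// Features that must also be switched on when F is enabled.
Bits impliedEnabled(Feat F) {
  assert(F < NumFeats && "unknown feature");
  Bits Out;
  collectEnabled(Out, Deps[F]);
  return Out;
}

// Features that must also be switched off when F is disabled.
Bits impliedDisabled(Feat F) {
  assert(F < NumFeats && "unknown feature");
  Bits Out;
  collectDisabled(Out, F);
  return Out;
}
```

With this toy table, enabling VAES pulls in AES, AVX, SSE2 and SSE, while disabling SSE2 also switches off AVX, AES and VAES. Because only direct dependencies are listed, the forward and backward relations cannot drift apart, which is the error-proneness the commit message mentions.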
Changed prior to commit: https://reviews.llvm.org/D83273?vs=275883&id=275913#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83273/new/ https://reviews.llvm.org/D83273 Files: clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h llvm/include/llvm/Support/X86TargetParser.def llvm/include/llvm/Support/X86TargetParser.h llvm/lib/Support/X86TargetParser.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83273.275913.patch Type: text/x-patch Size: 21842 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:17:19 2020 From: llvm-commits at lists.llvm.org (Greg Clayton via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:17:19 +0000 (UTC) Subject: [PATCH] D82838: Parse section ranges when verifying DWARF so we can exclude addresses that should have been stripped from DWARF. In-Reply-To: References: Message-ID: <291f7f5d5b443134e083e0cdb12f8716@localhost.localdomain> clayborg added a comment. In D82838#2134725, @dblaikie wrote: > I think maybe this is sort of orthogonal to 46453... maybe not, but kind of. > > Seems like we should filter out known-tombstoned ranges (the only ones we can know for sure are the new -1/-2 tombstones - all the others have ambiguities). Then we should maybe flag maybe-tombstones with a little "eh, maybe?". Then we should warn for anything left that's even partially outside the .text range (this patch), then we should warn for overlaps/etc on the remaining ones? So for this patch, anything that isn't in text becomes a warning and not an error? Or do we want to add an option to "llvm-dwarfdump --verify" to enforce the text ranges as a feature that is disabled by default? --ignore-invalid-text-ranges?
> But as @jhenderson said, maybe those first ones come later & we use the .text range to determine which things to look at for overlap first, then add new verifier checks for "things outside .text that aren't clearly tombstoned" knowing that some of those are expected limitations of (at least gold's) previous tombstoning strategies. > > (I'd sort of like to avoid actually looking at the object's executable sections - but I can't really fault the strategy & even if we added all the other verifier checks/warnings/etc, it'd still be super reasonable to warn about ranges that are otherwise totally valid, but extend beyond/are entirely outside the actual executable .text) Since zero is so prevalent, it is nice to get that noise out of the error checking since it creates so many false errors at the moment. It makes the --verify option less useful and way too noisy if we don't do something. We can also just not do the .text ranges for object files since they typically have relocations on each address. We already avoid looking at ranges in many cases for .o files. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82838/new/ https://reviews.llvm.org/D82838 From llvm-commits at lists.llvm.org Mon Jul 6 23:20:01 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:20:01 +0000 (UTC) Subject: [PATCH] D81375: [InstCombine] Simplify boolean Phis with const inputs using CFG In-Reply-To: References: Message-ID: <63fa7b8d58d5b07d96aab25936f39b37@localhost.localdomain> mkazantsev added a comment. At least 2 other transforms for selects are missing that we need in order to avoid regressions. I'm deprioritizing this as these efforts don't pay off.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81375/new/ https://reviews.llvm.org/D81375 From llvm-commits at lists.llvm.org Mon Jul 6 23:21:55 2020 From: llvm-commits at lists.llvm.org (LuoYuanke via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:21:55 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: <6bee971af617b8d8c1847cdcf9991f4a@localhost.localdomain> LuoYuanke added a comment. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 From llvm-commits at lists.llvm.org Mon Jul 6 23:25:20 2020 From: llvm-commits at lists.llvm.org (Serguei Katkov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:25:20 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: <70992895e0082539580c56a408a5c840@localhost.localdomain> skatkov added inline comments. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:125 +// Return statepoint GC args as a set +static SmallSet collectGCRegs(MachineInstr &MI) { + StatepointOpers SO(&MI); ---------------- Do I understand correctly that with your changes ALL GC pointers must be defs? So why do you need these iterations instead of just taking all defs? ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:401 + + // To insert reload at the end of MBB, insert it before last instruction + // and then swap them. ---------------- What is the reason for this magic? ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:506 + + for (Register Reg : RegsToReload) + insertReloads(Reg); ---------------- Don't you want to separate reload loads into a separate function?
So we'll have: spill registers, rewrite statepoint, insertReloads/unspill registers. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 From llvm-commits at lists.llvm.org Mon Jul 6 23:26:29 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:26:29 +0000 (UTC) Subject: [PATCH] D83202: [Bitfields][NFC] Make sure bitfields are contiguous In-Reply-To: References: Message-ID: <967456f69e0308fc4a073eeb2fe5f7e7@localhost.localdomain> courbet added inline comments. ================ Comment at: llvm/include/llvm/IR/Instruction.h:69 + template + using BoolBitfieldElement = typename Bitfield::Element; + ---------------- The current naming does not make it easy to tell a generic type (e.g. `AlignmentBitfieldElement`, `BoolBitfieldElement`) from an instantiation of this type with specific semantics (e.g. `AlignmentField`, `SwiftErrorField`). I think just appending a `T` or `Type` would make things more obvious: `AlignmentBitfieldElementT`, `BoolBitfieldElementT`.
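The naming suggestion above can be pictured with a standalone sketch. The `Element` struct below is a hypothetical stand-in written for this note, not LLVM's `Bitfield::Element`, and the widths are chosen only to match the "Next bit" comments quoted from the patch. The `T` suffix marks the generic alias templates, while the unsuffixed names are concrete fields with specific semantics.

```cpp
#include <cstddef>

// Hypothetical stand-in for a bitfield descriptor: a value of type T stored
// in Size bits starting at bit Offset.
template <typename T, std::size_t Offset, std::size_t Size>
struct Element {
  using Type = T;
  static constexpr std::size_t Shift = Offset;
  static constexpr std::size_t Bits = Size;
  static constexpr std::size_t NextBit = Offset + Size;
};

// Generic types: the T suffix signals that an offset is still needed.
template <std::size_t Offset>
using AlignmentBitfieldElementT = Element<unsigned, Offset, 5>;
template <std::size_t Offset>
using BoolBitfieldElementT = Element<bool, Offset, 1>;

// Instantiations with specific semantics, using explicit offsets.
using AlignmentField = AlignmentBitfieldElementT<0>;      // Next bit: 5
using UsedWithInAllocaField = BoolBitfieldElementT<5>;    // Next bit: 6
using SwiftErrorField = BoolBitfieldElementT<6>;          // Next bit: 7

// Contiguity check: each field must start where the previous one ends.
static_assert(UsedWithInAllocaField::Shift == AlignmentField::NextBit,
              "fields must be contiguous");
static_assert(SwiftErrorField::Shift == UsedWithInAllocaField::NextBit,
              "fields must be contiguous");
```

At a use site the suffix makes the distinction visible at a glance: `BoolBitfieldElementT<6>` is obviously a template being instantiated, while `SwiftErrorField` is obviously a finished field.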
================ Comment at: llvm/include/llvm/IR/Instructions.h:68 using AlignmentField = AlignmentBitfieldElement<0>; // Next bit:5 - using UsedWithInAllocaField = Bitfield::Element; // Next bit:6 - using SwiftErrorField = Bitfield::Element; // Next bit:7 + using UsedWithInAllocaField = BoolBitfieldElement<5>; // Next bit:6 + using SwiftErrorField = BoolBitfieldElement<6>; // Next bit:7 ---------------- What would you think of: ``` using AlignmentField = AlignmentBitfieldElement<0>; // Next bit:5 using UsedWithInAllocaField = BoolBitfieldElement; // Next bit:6 using SwiftErrorField = BoolBitfieldElement; // Next bit:7 ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83202/new/ https://reviews.llvm.org/D83202 From llvm-commits at lists.llvm.org Mon Jul 6 23:27:28 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:27:28 +0000 (UTC) Subject: [PATCH] D80358: [MLIR] Add RegionKindInterface In-Reply-To: References: Message-ID: <38de18338d3662978f2a5cd994f286e7@localhost.localdomain> stephenneuendorffer updated this revision to Diff 275914. stephenneuendorffer marked 2 inline comments as done. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80358/new/ https://reviews.llvm.org/D80358 Files: mlir/docs/Interfaces.md mlir/docs/LangRef.md mlir/include/mlir/IR/CMakeLists.txt mlir/include/mlir/IR/Dominance.h mlir/include/mlir/IR/RegionKindInterface.h mlir/include/mlir/IR/RegionKindInterface.td mlir/lib/IR/CMakeLists.txt mlir/lib/IR/Dominance.cpp mlir/lib/IR/RegionKindInterface.cpp mlir/lib/IR/Verifier.cpp mlir/lib/Transforms/CSE.cpp mlir/test/CMakeLists.txt mlir/test/IR/invalid.mlir mlir/test/IR/parser.mlir mlir/test/IR/traits.mlir mlir/test/lib/Dialect/Test/TestDialect.cpp mlir/test/lib/Dialect/Test/TestDialect.h mlir/test/lib/Dialect/Test/TestOps.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D80358.275914.patch Type: text/x-patch Size: 64053 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:35:48 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:35:48 +0000 (UTC) Subject: [PATCH] D83277: [WebAssembly] Generate unreachable after __stack_chk_fail In-Reply-To: References: Message-ID: tlively accepted this revision. tlively added a comment. This revision is now accepted and ready to land. LGTM! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83277/new/ https://reviews.llvm.org/D83277 From llvm-commits at lists.llvm.org Mon Jul 6 23:40:15 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:40:15 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: <31480c9cd47b8dc84f5b6278ed4997c6@localhost.localdomain> craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. 
LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 From llvm-commits at lists.llvm.org Mon Jul 6 23:42:33 2020 From: llvm-commits at lists.llvm.org (Carl Ritson via llvm-commits) Date: Mon, 06 Jul 2020 23:42:33 -0700 (PDT) Subject: [llvm] 560292f - [AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions Message-ID: <5f041959.1c69fb81.2b519.697e@mx.google.com> Author: Carl Ritson Date: 2020-07-07T15:40:44+09:00 New Revision: 560292fa990a2bfcf8415f07a166393beff667f6 URL: https://github.com/llvm/llvm-project/commit/560292fa990a2bfcf8415f07a166393beff667f6 DIFF: https://github.com/llvm/llvm-project/commit/560292fa990a2bfcf8415f07a166393beff667f6.diff LOG: [AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions MAD/MAC is no longer always available. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83207 Added: Modified: llvm/lib/Target/AMDGPU/SIISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index d90272848500..79204180540f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -4277,10 +4277,13 @@ bool SITargetLowering::isFMAFasterThanFMulAndFAdd(const MachineFunction &MF, switch (VT.getSimpleVT().SimpleTy) { case MVT::f32: { - // This is as fast on some subtargets. However, we always have full rate f32 - // mad available which returns the same result as the separate operations - // which we should prefer over fma. We can't use this if we want to support - // denormals, so only report this in these cases. + // If mad is not available this depends only on if f32 fma is full rate. + if (!Subtarget->hasMadMacF32Insts()) + return Subtarget->hasFastFMAF32(); + + // Otherwise f32 mad is always full rate and returns the same result as + // the separate operations so should be preferred over fma. 
+ // However does not support denomals. if (hasFP32Denormals(MF)) return Subtarget->hasFastFMAF32() || Subtarget->hasDLInsts(); From llvm-commits at lists.llvm.org Mon Jul 6 23:42:37 2020 From: llvm-commits at lists.llvm.org (Carl Ritson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:42:37 +0000 (UTC) Subject: [PATCH] D83207: [AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions In-Reply-To: References: Message-ID: <01e4f4b2010be59916b6d36214e0ede6@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG560292fa990a: [AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions (authored by critson). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83207/new/ https://reviews.llvm.org/D83207 Files: llvm/lib/Target/AMDGPU/SIISelLowering.cpp Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -4277,10 +4277,13 @@ switch (VT.getSimpleVT().SimpleTy) { case MVT::f32: { - // This is as fast on some subtargets. However, we always have full rate f32 - // mad available which returns the same result as the separate operations - // which we should prefer over fma. We can't use this if we want to support - // denormals, so only report this in these cases. + // If mad is not available this depends only on if f32 fma is full rate. + if (!Subtarget->hasMadMacF32Insts()) + return Subtarget->hasFastFMAF32(); + + // Otherwise f32 mad is always full rate and returns the same result as + // the separate operations so should be preferred over fma. + // However does not support denomals. if (hasFP32Denormals(MF)) return Subtarget->hasFastFMAF32() || Subtarget->hasDLInsts(); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83207.275919.patch Type: text/x-patch Size: 1043 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:51:50 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:51:50 +0000 (UTC) Subject: [PATCH] D83195: [CodeGen] Fix a warning in DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR In-Reply-To: References: Message-ID: <978503bf59550af5a9164dc9a9cfd564@localhost.localdomain> david-arm updated this revision to Diff 275920. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83195/new/ https://reviews.llvm.org/D83195 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp @@ -4334,7 +4334,6 @@ EVT OutVT = N->getValueType(0); EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT); assert(NOutVT.isVector() && "This type must be promoted to a vector type"); - unsigned OutNumElems = OutVT.getVectorNumElements(); EVT NOutVTElem = NOutVT.getVectorElementType(); SDLoc dl(N); @@ -4371,6 +4370,7 @@ EVT InVT = InOp0.getValueType(); + unsigned OutNumElems = OutVT.getVectorNumElements(); SmallVector Ops; Ops.reserve(OutNumElems); for (unsigned i = 0; i != OutNumElems; ++i) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83195.275920.patch Type: text/x-patch Size: 805 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:54:32 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:54:32 +0000 (UTC) Subject: [PATCH] D83197: [CodeGen] Fix warning in DAGTypeLegalizer::SplitVecRes_ExtendOp In-Reply-To: References: Message-ID: david-arm updated this revision to Diff 275922. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83197/new/ https://reviews.llvm.org/D83197 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1767,8 +1767,7 @@ // more effectively move in the right direction and prevent falling down // to scalarization in many cases due to the input vector being split too // far. - unsigned NumElements = SrcVT.getVectorNumElements(); - if ((NumElements & 1) == 0 && + if ((SrcVT.getVectorMinNumElements() & 1) == 0 && SrcVT.getSizeInBits() * 2 < DestVT.getSizeInBits()) { LLVMContext &Ctx = *DAG.getContext(); EVT NewSrcVT = SrcVT.widenIntegerVectorElementType(Ctx); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83197.275922.patch Type: text/x-patch Size: 739 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:55:24 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:55:24 +0000 (UTC) Subject: [PATCH] D83198: [CodeGen] Fix warnings in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR In-Reply-To: References: Message-ID: <86146d8f6f53321b12e3f382d0ae0e22@localhost.localdomain> david-arm updated this revision to Diff 275923. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83198/new/ https://reviews.llvm.org/D83198 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -2179,13 +2179,18 @@ SDValue Idx = N->getOperand(1); SDLoc dl(N); SDValue Lo, Hi; + + assert(SubVT.isScalableVector() == + N->getOperand(0).getValueType().isScalableVector() && + "We only support extracting fixed length vectors from legal scalable " + "vector types"); GetSplitVector(N->getOperand(0), Lo, Hi); - uint64_t LoElts = Lo.getValueType().getVectorNumElements(); + uint64_t LoElts = Lo.getValueType().getVectorMinNumElements(); uint64_t IdxVal = cast<ConstantSDNode>(Idx)->getZExtValue(); if (IdxVal < LoElts) { - assert(IdxVal + SubVT.getVectorNumElements() <= LoElts && + assert(IdxVal + SubVT.getVectorMinNumElements() <= LoElts && "Extracted subvector crosses vector split!"); return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, SubVT, Lo, Idx); } else { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83198.275923.patch Type: text/x-patch Size: 1091 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:56:01 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:56:01 +0000 (UTC) Subject: [PATCH] D83203: [CodeGen] Fix warnings in SelectionDAG::SplitVector In-Reply-To: References: Message-ID: david-arm updated this revision to Diff 275924.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83203/new/ https://reviews.llvm.org/D83203 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -9614,14 +9614,18 @@ std::pair<SDValue, SDValue> SelectionDAG::SplitVector(const SDValue &N, const SDLoc &DL, const EVT &LoVT, const EVT &HiVT) { - assert(LoVT.getVectorNumElements() + HiVT.getVectorNumElements() <= - N.getValueType().getVectorNumElements() && + assert(LoVT.isScalableVector() == HiVT.isScalableVector() && + LoVT.isScalableVector() == N.getValueType().isScalableVector() && + "Splitting vector with an invalid mixture of fixed and scalable " + "vector types"); + assert(LoVT.getVectorMinNumElements() + HiVT.getVectorMinNumElements() <= + N.getValueType().getVectorMinNumElements() && "More vector elements requested than available!"); SDValue Lo, Hi; Lo = getNode(ISD::EXTRACT_SUBVECTOR, DL, LoVT, N, getVectorIdxConstant(0, DL)); Hi = getNode(ISD::EXTRACT_SUBVECTOR, DL, HiVT, N, - getVectorIdxConstant(LoVT.getVectorNumElements(), DL)); + getVectorIdxConstant(LoVT.getVectorMinNumElements(), DL)); return std::make_pair(Lo, Hi); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83203.275924.patch Type: text/x-patch Size: 1315 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 23:56:35 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 06:56:35 +0000 (UTC) Subject: [PATCH] D83205: [SVE] Add checks for no warnings in CodeGen/AArch64/sve-sext-zext.ll In-Reply-To: References: Message-ID: <7ad07094585e8ae27159b2989b607639@localhost.localdomain> david-arm updated this revision to Diff 275925.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83205/new/ https://reviews.llvm.org/D83205 Files: llvm/test/CodeGen/AArch64/sve-sext-zext.ll Index: llvm/test/CodeGen/AArch64/sve-sext-zext.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-sext-zext.ll +++ llvm/test/CodeGen/AArch64/sve-sext-zext.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning define @sext_i1_i8( %a) { ; CHECK-LABEL: sext_i1_i8: -------------- next part -------------- A non-text attachment was scrubbed... Name: D83205.275925.patch Type: text/x-patch Size: 632 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:02:50 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:02:50 +0000 (UTC) Subject: [PATCH] D83196: [CodeGen] Fix a warning in DAGTypeLegalizer::SetSplitVector In-Reply-To: References: Message-ID: david-arm updated this revision to Diff 275926. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83196/new/ https://reviews.llvm.org/D83196 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp @@ -835,9 +835,9 @@ void DAGTypeLegalizer::SetSplitVector(SDValue Op, SDValue Lo, SDValue Hi) { assert(Lo.getValueType().getVectorElementType() == - Op.getValueType().getVectorElementType() && - 2*Lo.getValueType().getVectorNumElements() == - Op.getValueType().getVectorNumElements() && + Op.getValueType().getVectorElementType() && + Lo.getValueType().getVectorElementCount() * 2 == + Op.getValueType().getVectorElementCount() && Hi.getValueType() == Lo.getValueType() && "Invalid type for split vector"); // Lo/Hi may have been newly allocated, if so, add nodeid's as relevant. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83196.275926.patch Type: text/x-patch Size: 929 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:04:08 2020 From: llvm-commits at lists.llvm.org (Sander de Smalen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:04:08 +0000 (UTC) Subject: [PATCH] D83205: [SVE] Add checks for no warnings in CodeGen/AArch64/sve-sext-zext.ll In-Reply-To: References: Message-ID: <0446ea2c8f0ef59b21e97e6804ab7e45@localhost.localdomain> sdesmalen accepted this revision. sdesmalen added a comment. This revision is now accepted and ready to land. 
LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83205/new/ https://reviews.llvm.org/D83205 From llvm-commits at lists.llvm.org Tue Jul 7 00:08:57 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:08:57 +0000 (UTC) Subject: [PATCH] D83192: Fix off by one error in Bitfields In-Reply-To: References: Message-ID: <5b6df5f9f9602ed0ae83eda9b6fdd5c3@localhost.localdomain> gchatelet added a comment. In D83192#2135019 , @courbet wrote: > Can you add a unit test for this ? I'll add the following test as a separate commit if it LGTY. Note: I can't use EXPECT_EQ because it takes the arguments by `const &` and the properties are `static constexpr`, the compiler complains about undefined reference. TEST(BitfieldsTest, Properties) { using A = Bitfield::Element; EXPECT_TRUE(A::FirstBit == 0U); EXPECT_TRUE(A::LastBit == 0U); EXPECT_TRUE(A::Shift == 0U); EXPECT_TRUE(A::Bits == 1U); using B = Bitfield::Element; EXPECT_TRUE(B::FirstBit == 3U); EXPECT_TRUE(B::LastBit == 6U); EXPECT_TRUE(B::Shift == 3U); EXPECT_TRUE(B::Bits == 4U); } Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83192/new/ https://reviews.llvm.org/D83192 From llvm-commits at lists.llvm.org Tue Jul 7 00:09:09 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:09:09 +0000 (UTC) Subject: [PATCH] D82980: [NFC] Run clang-format on llvm-objcopy. In-Reply-To: References: Message-ID: jhenderson added a comment. In D82980#2133616 , @rupprecht wrote: > In D82980#2125967 , @MaskRay wrote: > > > It is subjective but it seems to me that clang-format is making bad decisions for .td files. Many people don't format .td files (various lib/Target/*/*.td and include/clang/Driver/Options.td) > > > The td formatting is less robust, but works well for simple CLI option files. 
I'm not particularly tied to the format it chooses, but I do find the options files easier to read when consistently formatted. > > The other td files you mentioned are much more complex, and I've tried using clang-format on them. It totally botches them and I don't think we should bother with complex files. > > > Formatting C++ files is fine but it seems that we run into some discrepancy between clang-format versions. Formatting with latest clang-format is probably fine. > > Yep > > In D82980#2127279 , @jhenderson wrote: > > > No issue in principle with this, but we do need to figure out the canonical version of clang-format we want to use for this if we are going to do it. I have no personal opinion on it, but suspect my installed clang-format version is out-of-date, and that if I were to do the same thing you did I'd get different results. > > > What does `clang-format --version` look like on your machine? The one on my machine seems to be updated on a rolling basis (built from trunk on a regular basis), but `sudo apt install clang-format-10` is available to use a more stable version. I'm primarily a Windows-based developer, so don't expect me to be sudo-ing or apt-getting anything. I actually have 3 different clang-formats installed on my machine, it looks like, partly because it comes as part of our downstream toolchain installation package. The one that appears to be used under my most frequent usage is based on clang v8, since that seems to be the one tied to our Visual Studio extension that hooks it into the IDE, but I also have two v9 versions kicking around. I update the toolchain installation maybe once or at most twice a year, and there's usually some lag between that and the current LLVM version, so I'm always likely to be a year or two out-of-date. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82980/new/ https://reviews.llvm.org/D82980 From llvm-commits at lists.llvm.org Tue Jul 7 00:10:21 2020 From: llvm-commits at lists.llvm.org (Jonas Hahnfeld via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:10:21 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <71f9285de61d63872f412f624a0c7ec9@localhost.localdomain> Hahnfeld added a comment. This is definitely not NFC and breaks ABI compatibility (but apparently nobody cares anymore). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 00:10:58 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:10:58 +0000 (UTC) Subject: [PATCH] D83131: [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. In-Reply-To: References: Message-ID: <09d1361892dd47bb966d665fa8a7ae9f@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83131/new/ https://reviews.llvm.org/D83131 From llvm-commits at lists.llvm.org Tue Jul 7 00:12:52 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:12:52 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine In-Reply-To: References: Message-ID: <28a526731bab3bd2933f35ac5982459c@localhost.localdomain> jdoerfert updated this revision to Diff 275927. jdoerfert marked 6 inline comments as done. jdoerfert added a comment. 
Addressed comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83271/new/ https://reviews.llvm.org/D83271 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83271.275927.patch Type: text/x-patch Size: 12129 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:12:56 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:12:56 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine In-Reply-To: References: Message-ID: <4256fdb76a00e4e81978509f156bceb6@localhost.localdomain> jdoerfert added a comment. In D83271#2134746 , @JonChesterfield wrote: > That's interesting. Amdgpu does not handle function pointers well and I suspect nvptx has considerable performance overhead for them too. If a parallel region is only called from a single target region, it is always passed the same function pointer. Thus specialise the state machine. I think this machinery is equivalent to specialising the parallel region call. The problem here was the spurious call edge from an unrelated kernel to the outlined parallel function. ptxas then needed more registers for a trivial kernel as it was "thought" to call the outlined function. > The general case involves calling one parallel region runtime function with various different function pointers. Devirtualising that is fairly difficult. For another time. > > For this simpler case, I think this transform is equivalent to specialising the various kmpc*parallel calls on a given function pointer. The callees are available when using a bitcode deviceRTL. > > Iirc function specialisation / partial evaluation is one of the classic compiler optimisations that LLVM doesn't really do. 
It's difficult to define a good cost model and C exposes function pointer comparison. What we could implement for this is an attribute driven one, where we mark the function pointer arguments in the deviceRTL with such and use LTO. Avoid specialising a function whose address escapes. > > I like this patch. It's a clear example of an effective openmp specific optimisation. It just happens to run very close to specialisation, which may not be that much harder to implement if we cheat on the cost model. Specialization is (soonish) coming to the Attributor ;) ---- ================ Comment at: llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll:278-280 +!llvm.module.flags = !{!0, !1, !2, !3} +!omp_offload.info = !{!4} +!nvvm.annotations = !{!5, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8} ---------------- arsenm wrote: > Mostly unneeded metadata? Interestingly, that is what our device runtime looks like. For reasons I haven't understood yet it has all these "null is aligned" annotations. CUDA is weird. Anyway, I can strip this down too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83271/new/ https://reviews.llvm.org/D83271 From llvm-commits at lists.llvm.org Tue Jul 7 00:13:47 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:13:47 +0000 (UTC) Subject: [PATCH] D83208: [llvm-readobj] - Refactor ELFDumper::getStaticSymbolName. 
In-Reply-To: References: Message-ID: <380095a71d3d4c2a0122f3af6d416d61@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM. ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:1135 +std::string ELFDumper::getStaticSymbolName(uint32_t Index) const { + auto ReportWarn = [&](Error E) -> std::string { + this->reportUniqueWarning( ---------------- Either simply `Warn` or `ReportWarning`, I think. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83208/new/ https://reviews.llvm.org/D83208 From llvm-commits at lists.llvm.org Tue Jul 7 00:17:33 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:17:33 +0000 (UTC) Subject: [PATCH] D83129: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections. In-Reply-To: References: Message-ID: <00152fb81f2b1a1652a5e3d5ecb869ca@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM, but see my inline comment (happy for you to commit the two patches separately or as one). ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:6559-6566 + if (Expected NameOrErr = + this->dumper()->getStaticSymbolName(Index)) + return *NameOrErr; + else + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(NameOrErr.takeError()))); ---------------- grimar wrote: > grimar wrote: > > jhenderson wrote: > > > This seems like a pattern we're likely to have in several different parts of the ELFDumper. Is there any code we could share to avoid duplication? Maybe it just makes sense to change `getStaticSymbolName` to report the warning/return the `` itself? 
> > From what I see, the `getStaticSymbolName` is used in one more place: > > > > ``` > > template > > void LLVMStyle::printAddrsig(const ELFFile *Obj) { > > ... > > for (uint64_t Sym : *V) { > > Expected NameOrErr = this->dumper()->getStaticSymbolName(Sym); > > if (NameOrErr) { > > W.printNumber("Sym", *NameOrErr, Sym); > > continue; > > } > > reportWarning(NameOrErr.takeError(), this->FileName); > > W.printNumber("Sym", "", Sym); > > } > > } > > ``` > > > > And it looks like it should be reasonable and possible to do what you suggest. Follow-up? > Follow-up: D83208 Normally, I'd say yes, follow-up, but in this case, D83208 looks small enough that it could be done as part of this, especially as most of that change is just undoing what some of this change does (in particular, the `GetSymName` lamdba replaces `getStaticSymbolName` in this change and then is just replaced back the other way again). I don't feel strongly about this though. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83129/new/ https://reviews.llvm.org/D83129 From llvm-commits at lists.llvm.org Tue Jul 7 00:18:51 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Tue, 07 Jul 2020 00:18:51 -0700 (PDT) Subject: [llvm] ef4cc70 - [X86] Remove assert for missing features from X86::getImpliedFeatures Message-ID: <5f0421db.1c69fb81.1e041.80e8@mx.google.com> Author: Craig Topper Date: 2020-07-07T00:18:01-07:00 New Revision: ef4cc70f3ed2a91e0a48c6448c517c3ba34c2846 URL: https://github.com/llvm/llvm-project/commit/ef4cc70f3ed2a91e0a48c6448c517c3ba34c2846 DIFF: https://github.com/llvm/llvm-project/commit/ef4cc70f3ed2a91e0a48c6448c517c3ba34c2846.diff LOG: [X86] Remove assert for missing features from X86::getImpliedFeatures This is failing on the bots. Remove while I try to figure out what feature I missed in the table. 
Added: Modified: llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index 5e4f62d8a1d6..12182179fe45 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -558,8 +558,6 @@ void llvm::X86::getImpliedFeatures( auto I = llvm::find_if( FeatureInfos, [&](const FeatureInfo &FI) { return FI.Name == Feature; }); if (I == std::end(FeatureInfos)) { - // This shouldn't happen, but handle it gracefully for release builds. - assert(false && "Feature not in table!"); return; } From llvm-commits at lists.llvm.org Tue Jul 7 00:20:19 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:20:19 +0000 (UTC) Subject: [PATCH] D83173: [VE] Correct stack alignment In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGdf3bda047d5a: [VE] Correct stack alignment (authored by kaz7). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83173/new/ https://reviews.llvm.org/D83173 Files: clang/lib/Basic/Targets/VE.h clang/test/CodeGen/target-data.c llvm/lib/Target/VE/VETargetMachine.cpp Index: llvm/lib/Target/VE/VETargetMachine.cpp =================================================================== --- llvm/lib/Target/VE/VETargetMachine.cpp +++ llvm/lib/Target/VE/VETargetMachine.cpp @@ -41,8 +41,8 @@ // VE supports 32 bit and 64 bits integer on registers Ret += "-n32:64"; - // Stack alignment is 64 bits - Ret += "-S64"; + // Stack alignment is 128 bits + Ret += "-S128"; return Ret; } Index: clang/test/CodeGen/target-data.c =================================================================== --- clang/test/CodeGen/target-data.c +++ clang/test/CodeGen/target-data.c @@ -250,3 +250,7 @@ // RUN: %clang_cc1 -triple bpfeb -o - -emit-llvm %s | \ // RUN: FileCheck %s -check-prefix=BPFEB // BPFEB: target datalayout = "E-m:e-p:64:64-i64:64-i128:128-n32:64-S128" + +// RUN: %clang_cc1 -triple ve -o - -emit-llvm %s | \ +// RUN: FileCheck %s -check-prefix=VE +// VE: target datalayout = "e-m:e-i64:64-n32:64-S128" Index: clang/lib/Basic/Targets/VE.h =================================================================== --- clang/lib/Basic/Targets/VE.h +++ clang/lib/Basic/Targets/VE.h @@ -45,7 +45,7 @@ WCharType = UnsignedInt; WIntType = UnsignedInt; UseZeroLengthBitfieldAlignment = true; - resetDataLayout("e-m:e-i64:64-n32:64-S64"); + resetDataLayout("e-m:e-i64:64-n32:64-S128"); } void getTargetDefines(const LangOptions &Opts, -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83173.275586.patch Type: text/x-patch Size: 1396 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:23:04 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:23:04 +0000 (UTC) Subject: [PATCH] D82456: [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG55227f85d09c: [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost (authored by dmgreen). Changed prior to commit: https://reviews.llvm.org/D82456?vs=272986&id=275588#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82456/new/ https://reviews.llvm.org/D82456 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll llvm/test/Analysis/CostModel/ARM/load_store.ll llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82456.275588.patch Type: text/x-patch Size: 29559 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:23:19 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:23:19 +0000 (UTC) Subject: [PATCH] D83160: [InstCombine] Lower infinite combine loop detection thresholds In-Reply-To: References: Message-ID: <1e34b36b667c235f1f51c91bcbe59f09@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGcd7f8051ac7b: [InstCombine] Lower infinite combine loop detection thresholds (authored by lebedev.ri). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83160/new/ https://reviews.llvm.org/D83160 Files: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Index: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp =================================================================== --- llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -123,8 +123,13 @@ DEBUG_COUNTER(VisitCounter, "instcombine-visit", "Controls which instructions are visited"); +// FIXME: these limits eventually should be as low as 2. static constexpr unsigned InstCombineDefaultMaxIterations = 1000; +#ifndef NDEBUG +static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 100; +#else static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 1000; +#endif static cl::opt<bool> EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"), -------------- next part -------------- A non-text attachment was scrubbed... Name: D83160.275589.patch Type: text/x-patch Size: 795 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:23:31 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:23:31 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: <17b4a75215c880a3e7ae1b3c810ecc7c@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG28b7816b782b: [Scalarizer] ExtractElement handling w/ constant extract index (authored by lebedev.ri).
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/constant-extractelement.ll llvm/test/Transforms/Scalarizer/phi-unreachable-pred.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83101.275591.patch Type: text/x-patch Size: 5424 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:23:31 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:23:31 +0000 (UTC) Subject: [PATCH] D83102: [Scalarizer] InsertElement handling w/ constant insert index In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGf62c8dbc99ea: [Scalarizer] InsertElement handling w/ constant insert index (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83102/new/ https://reviews.llvm.org/D83102 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/basic.ll llvm/test/Transforms/Scalarizer/constant-insertelement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83102.275590.patch Type: text/x-patch Size: 4797 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:23:36 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:23:36 +0000 (UTC) Subject: [PATCH] D82961: [Scalarizer] InsertElement handling w/ variable insert index (PR46524) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG6e5047458132: [Scalarizer] InsertElement handling w/ variable insert index (PR46524) (authored by lebedev.ri). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82961/new/ https://reviews.llvm.org/D82961 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/variable-insertelement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82961.275592.patch Type: text/x-patch Size: 9456 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:23:48 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:23:48 +0000 (UTC) Subject: [PATCH] D82970: [Scalarizer] ExtractElement handling w/ variable insert index (PR46524) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG51f9310ff2e3: [Scalarizer] ExtractElement handling w/ variable insert index (PR46524) (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82970/new/ https://reviews.llvm.org/D82970 Files: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/variable-extractelement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82970.275593.patch Type: text/x-patch Size: 7404 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:24:20 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:24:20 +0000 (UTC) Subject: [PATCH] D83128: [Support] Add path::user_config_directory for $XDG_CONFIG_HOME etc In-Reply-To: References: Message-ID: <3df6a020cc62dd0312724c7959f5473d@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGcd209f1a3790: [Support] Add path::user_config_directory for $XDG_CONFIG_HOME etc (authored by sammccall). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83128/new/ https://reviews.llvm.org/D83128 Files: llvm/include/llvm/Support/Path.h llvm/lib/Support/Unix/Path.inc llvm/lib/Support/Windows/Path.inc llvm/unittests/Support/Path.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83128.275594.patch Type: text/x-patch Size: 4154 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:24:43 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:24:43 +0000 (UTC) Subject: [PATCH] D82457: [ARM] Add extra extend and trunc costs for cast instructions In-Reply-To: References: Message-ID: <96e6ab8260cf6dd4b0c5c1ee07580ce5@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG60b8b2beeab9: [ARM] Add extra extend and trunc costs for cast instructions (authored by dmgreen). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82457/new/ https://reviews.llvm.org/D82457 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll Index: llvm/test/Analysis/CostModel/ARM/cast_ldst.ll =================================================================== --- llvm/test/Analysis/CostModel/ARM/cast_ldst.ll +++ llvm/test/Analysis/CostModel/ARM/cast_ldst.ll @@ -122,8 +122,8 @@ ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 74 for instruction: %v8864u = zext <8 x i8> %loadv8i8 to <8 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816s = sext <16 x i8> %loadv16i8 to <16 x i16> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816u = zext <16 x i8> %loadv16i8 to <16 x i16> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832s = sext <16 x i8> %loadv16i8 to <16 x i32> -; 
CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832u = zext <16 x i8> %loadv16i8 to <16 x i32> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832s = sext <16 x i8> %loadv16i8 to <16 x i32> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832u = zext <16 x i8> %loadv16i8 to <16 x i32> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 1322 for instruction: %v16864s = sext <16 x i8> %loadv16i8 to <16 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 298 for instruction: %v16864u = zext <16 x i8> %loadv16i8 to <16 x i64> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v21632s = sext <2 x i16> %loadv2i16 to <2 x i32> @@ -758,7 +758,7 @@ ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8832 = trunc <8 x i32> undef to <8 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v8864 = trunc <8 x i64> undef to <8 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16816 = trunc <16 x i16> undef to <16 x i8> -; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v16832 = trunc <16 x i32> undef to <16 x i8> +; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16832 = trunc <16 x i32> undef to <16 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v16864 = trunc <16 x i64> undef to <16 x i8> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v21632 = trunc <2 x i32> undef to <2 x i16> ; CHECK-MVE-RECIP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v21664 = trunc <2 x i64> undef to <2 x i16> Index: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp =================================================================== --- 
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -228,12 +228,39 @@ {ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 0}, {ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i8, 0}, {ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i8, 0}, + // The following extend from a legal type to an illegal type, so need to + // split the load. This introduced an extra load operation, but the + // extend is still "free". + {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 1}, + {ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 1}, + {ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 3}, + {ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 3}, + {ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 1}, + {ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 1}, }; if (SrcTy.isVector() && ST->hasMVEIntegerOps()) { if (const auto *Entry = ConvertCostTableLookup(MVELoadConversionTbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT())) - return AdjustCost(Entry->Cost); + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); + } + } + + // The truncate of a store is free. This is the mirror of extends above. + if (I && I->hasOneUse() && isa(*I->user_begin())) { + static const TypeConversionCostTblEntry MVELoadConversionTbl[] = { + {ISD::TRUNCATE, MVT::v4i32, MVT::v4i16, 0}, + {ISD::TRUNCATE, MVT::v4i32, MVT::v4i8, 0}, + {ISD::TRUNCATE, MVT::v8i16, MVT::v8i8, 0}, + {ISD::TRUNCATE, MVT::v8i32, MVT::v8i16, 1}, + {ISD::TRUNCATE, MVT::v16i32, MVT::v16i8, 3}, + {ISD::TRUNCATE, MVT::v16i16, MVT::v16i8, 1}, + }; + if (SrcTy.isVector() && ST->hasMVEIntegerOps()) { + if (const auto *Entry = + ConvertCostTableLookup(MVELoadConversionTbl, ISD, SrcTy.getSimpleVT(), + DstTy.getSimpleVT())) + return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor()); } } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82457.275595.patch
Type: text/x-patch
Size: 4820 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 00:24:56 2020
From: llvm-commits at lists.llvm.org (Kai Nacke via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 07:24:56 +0000 (UTC)
Subject: [PATCH] D82368: [SystemZ/zos] Define Endian constants for z/OS.
In-Reply-To:
References:
Message-ID: <67b330566fe0155a86f43d608d5e72a4@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG0663844b064d: [SystemZ/ZOS] Define Endian constants for z/OS. (authored by Kai).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82368/new/

https://reviews.llvm.org/D82368

Files:
  llvm/include/llvm/Support/SwapByteOrder.h

Index: llvm/include/llvm/Support/SwapByteOrder.h
===================================================================
--- llvm/include/llvm/Support/SwapByteOrder.h
+++ llvm/include/llvm/Support/SwapByteOrder.h
@@ -36,6 +36,10 @@
 #else
 #define BYTE_ORDER LITTLE_ENDIAN
 #endif
+#elif defined(__MVS__)
+#define BIG_ENDIAN 4321
+#define LITTLE_ENDIAN 1234
+#define BYTE_ORDER BIG_ENDIAN
 #else
 #if !defined(BYTE_ORDER) && !defined(_WIN32)
 #include

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82368.275596.patch
Type: text/x-patch
Size: 467 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 00:25:02 2020
From: llvm-commits at lists.llvm.org (Kai Nacke via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 07:25:02 +0000 (UTC)
Subject: [PATCH] D82544: [SystemZ][ZOS] Implement getMainExecutable() and is_local_impl()
In-Reply-To:
References:
Message-ID: <1598cf22d093d8c5408aee1b7249cb56@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rGbfd84b1c034d: [SystemZ/ZOS] Implement getMainExecutable() and is_local_impl() (authored by Kai).

Changed prior to commit: https://reviews.llvm.org/D82544?vs=274075&id=275597#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82544/new/ https://reviews.llvm.org/D82544 Files: llvm/lib/Support/Unix/Path.inc Index: llvm/lib/Support/Unix/Path.inc =================================================================== --- llvm/lib/Support/Unix/Path.inc +++ llvm/lib/Support/Unix/Path.inc @@ -48,6 +48,8 @@ #endif #elif defined(__DragonFly__) #include +#elif defined(__MVS__) +#include #endif // Both stdio.h and cstdio are included via different paths and @@ -56,9 +58,13 @@ #undef ferror #undef feof +#if !defined(PATH_MAX) // For GNU Hurd -#if defined(__GNU__) && !defined(PATH_MAX) -# define PATH_MAX 4096 +#if defined(__GNU__) +#define PATH_MAX 4096 +#elif defined(__MVS__) +#define PATH_MAX _XOPEN_PATH_MAX +#endif #endif #include @@ -100,7 +106,8 @@ #define STATVFS_F_FRSIZE(vfs) static_cast(vfs.f_bsize) #endif -#if defined(__NetBSD__) || defined(__DragonFly__) || defined(__GNU__) +#if defined(__NetBSD__) || defined(__DragonFly__) || defined(__GNU__) || \ + defined(__MVS__) #define STATVFS_F_FLAG(vfs) (vfs).f_flag #else #define STATVFS_F_FLAG(vfs) (vfs).f_flags @@ -265,6 +272,26 @@ // Fall back to the classical detection. if (getprogpath(exe_path, argv0)) return exe_path; +#elif defined(__MVS__) + int token = 0; + W_PSPROC buf; + char exe_path[PS_PATHBLEN]; + pid_t pid = getpid(); + + memset(&buf, 0, sizeof(buf)); + buf.ps_pathptr = exe_path; + buf.ps_pathlen = sizeof(exe_path); + + while (true) { + if ((token = w_getpsent(token, &buf, sizeof(buf))) <= 0) + break; + if (buf.ps_pid != pid) + continue; + char real_path[PATH_MAX]; + if (realpath(exe_path, real_path)) + return std::string(real_path); + break; // Found entry, but realpath failed. + } #elif defined(HAVE_DLFCN_H) && defined(HAVE_DLADDR) // Use dladdr to get executable path if available. 
Dl_info DLInfo; @@ -493,6 +520,10 @@ // vmount entry not found; "remote" is the conservative answer. return false; +#elif defined(__MVS__) + // The file system can have an arbitrary structure on z/OS; must go with the + // conservative answer. + return false; #else return !!(STATVFS_F_FLAG(Vfs) & MNT_LOCAL); #endif -------------- next part -------------- A non-text attachment was scrubbed... Name: D82544.275597.patch Type: text/x-patch Size: 2126 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:25:14 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:25:14 +0000 (UTC) Subject: [PATCH] D82539: [TargetLowering] Improve expansion of ROTL/ROTR In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGe7a4a24dc50a: [TargetLowering] Improve expansion of ROTL/ROTR (authored by foad). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82539/new/ https://reviews.llvm.org/D82539 Files: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Index: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -6177,12 +6177,15 @@ SDLoc DL(SDValue(Node, 0)); EVT ShVT = Op1.getValueType(); - SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT); + SDValue Zero = DAG.getConstant(0, DL, ShVT); - // If a rotate in the other direction is legal, use it. + assert(isPowerOf2_32(EltSizeInBits) && EltSizeInBits > 1 && + "Expecting the type bitwidth to be a power of 2"); + + // If a rotate in the other direction is supported, use it. unsigned RevRot = IsLeft ? 
ISD::ROTR : ISD::ROTL; - if (isOperationLegal(RevRot, VT)) { - SDValue Sub = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, Op1); + if (isOperationLegalOrCustom(RevRot, VT)) { + SDValue Sub = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); Result = DAG.getNode(RevRot, DL, VT, Op0, Sub); return true; } @@ -6195,15 +6198,13 @@ return false; // Otherwise, - // (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and w-c, w-1))) - // (rotr x, c) -> (or (srl x, (and c, w-1)), (shl x, (and w-c, w-1))) + // (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and -c, w-1))) + // (rotr x, c) -> (or (srl x, (and c, w-1)), (shl x, (and -c, w-1))) // - assert(isPowerOf2_32(EltSizeInBits) && EltSizeInBits > 1 && - "Expecting the type bitwidth to be a power of 2"); unsigned ShOpc = IsLeft ? ISD::SHL : ISD::SRL; unsigned HsOpc = IsLeft ? ISD::SRL : ISD::SHL; SDValue BitWidthMinusOneC = DAG.getConstant(EltSizeInBits - 1, DL, ShVT); - SDValue NegOp1 = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, Op1); + SDValue NegOp1 = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); SDValue And0 = DAG.getNode(ISD::AND, DL, ShVT, Op1, BitWidthMinusOneC); SDValue And1 = DAG.getNode(ISD::AND, DL, ShVT, NegOp1, BitWidthMinusOneC); Result = DAG.getNode(ISD::OR, DL, VT, DAG.getNode(ShOpc, DL, VT, Op0, And0), -------------- next part -------------- A non-text attachment was scrubbed... Name: D82539.275599.patch Type: text/x-patch Size: 2063 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:25:21 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:25:21 +0000 (UTC) Subject: [PATCH] D82540: [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount In-Reply-To: References: Message-ID: <6841725da568d9cecc9eed2e02ea6994@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. 
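A note on the ROTL/ROTR expansion in D82539 above: the patch replaces `w - c` with `-c` in the shift amounts. Because the bit width w is asserted to be a power of two, the two expressions agree once they are masked with w-1. The following is a minimal standalone sketch for w = 32, not code from the patch. The names rotl_ref, rotl_expanded and masked_amounts_agree are illustrative only and do not exist in LLVM, and plain unsigned integers stand in for SelectionDAG nodes.

```cpp
#include <cstdint>

// Reference rotate-left on 32-bit values; the amount is reduced modulo 32
// and the zero case is handled separately to avoid a shift by 32.
static uint32_t rotl_ref(uint32_t x, uint32_t c) {
  c %= 32;
  return c == 0 ? x : (x << c) | (x >> (32 - c));
}

// Expansion in the style of the updated comment in the patch:
//   (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and -c, w-1)))
// Both shift amounts are masked to [0, 31], so no shift is out of range.
static uint32_t rotl_expanded(uint32_t x, uint32_t c) {
  const uint32_t w = 32; // assumed power-of-two bit width
  return (x << (c & (w - 1))) | (x >> ((0u - c) & (w - 1)));
}

// The old form (w - c) and the new form (0 - c) give the same masked
// amount, since w itself is 0 modulo w in unsigned arithmetic.
static bool masked_amounts_agree(uint32_t c) {
  const uint32_t w = 32;
  return ((w - c) & (w - 1)) == ((0u - c) & (w - 1));
}
```

For instance, rotl_expanded(0x80000001u, 1) gives 0x00000003, and an amount of 32 leaves the value unchanged because both masked shift amounts become 0.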
Closed by commit rGbabbeafa006f: [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount (authored by foad). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82540/new/ https://reviews.llvm.org/D82540 Files: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Index: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -6117,6 +6117,14 @@ return Ok; } +// Check that (every element of) Z is undef or not an exact multiple of BW. +static bool isNonZeroModBitWidth(SDValue Z, unsigned BW) { + return ISD::matchUnaryPredicate( + Z, + [=](ConstantSDNode *C) { return !C || C->getAPIntValue().urem(BW) != 0; }, + true); +} + bool TargetLowering::expandFunnelShift(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { EVT VT = Node->getValueType(0); @@ -6127,40 +6135,52 @@ !isOperationLegalOrCustomOrPromote(ISD::OR, VT))) return false; - // fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW)) - // fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW) SDValue X = Node->getOperand(0); SDValue Y = Node->getOperand(1); SDValue Z = Node->getOperand(2); - unsigned EltSizeInBits = VT.getScalarSizeInBits(); + unsigned BW = VT.getScalarSizeInBits(); bool IsFSHL = Node->getOpcode() == ISD::FSHL; SDLoc DL(SDValue(Node, 0)); EVT ShVT = Z.getValueType(); - SDValue Mask = DAG.getConstant(EltSizeInBits - 1, DL, ShVT); - SDValue ShAmt, InvShAmt; - if (isPowerOf2_32(EltSizeInBits)) { - // Z % BW -> Z & (BW - 1) - ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask); - // (BW - 1) - (Z % BW) -> ~Z & (BW - 1) - InvShAmt = DAG.getNode(ISD::AND, DL, ShVT, DAG.getNOT(DL, Z, ShVT), Mask); - } else { - SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT); - ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); - InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt); - } - 
SDValue One = DAG.getConstant(1, DL, ShVT); SDValue ShX, ShY; - if (IsFSHL) { - ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt); - SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One); - ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt); + SDValue ShAmt, InvShAmt; + if (isNonZeroModBitWidth(Z, BW)) { + // fshl: X << C | Y >> (BW - C) + // fshr: X << (BW - C) | Y >> C + // where C = Z % BW is not zero + SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); + ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); + InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, ShAmt); + ShX = DAG.getNode(ISD::SHL, DL, VT, X, IsFSHL ? ShAmt : InvShAmt); + ShY = DAG.getNode(ISD::SRL, DL, VT, Y, IsFSHL ? InvShAmt : ShAmt); } else { - SDValue ShX1 = DAG.getNode(ISD::SHL, DL, VT, X, One); - ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt); - ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt); + // fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW)) + // fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW) + SDValue Mask = DAG.getConstant(BW - 1, DL, ShVT); + if (isPowerOf2_32(BW)) { + // Z % BW -> Z & (BW - 1) + ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask); + // (BW - 1) - (Z % BW) -> ~Z & (BW - 1) + InvShAmt = DAG.getNode(ISD::AND, DL, ShVT, DAG.getNOT(DL, Z, ShVT), Mask); + } else { + SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); + ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); + InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt); + } + + SDValue One = DAG.getConstant(1, DL, ShVT); + if (IsFSHL) { + ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt); + SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One); + ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt); + } else { + SDValue ShX1 = DAG.getNode(ISD::SHL, DL, VT, X, One); + ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt); + ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt); + } } Result = DAG.getNode(ISD::OR, DL, VT, ShX, ShY); return true; -------------- next part -------------- A non-text attachment was 
scrubbed... Name: D82540.275600.patch Type: text/x-patch Size: 3921 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:25:47 2020 From: llvm-commits at lists.llvm.org (EsmeYi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:25:47 +0000 (UTC) Subject: [PATCH] D82145: [NFC][PowerPC] Legalize SREM/UREM directly on P9. In-Reply-To: References: Message-ID: <75294d01e69eaedee02b15322c06e447@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG0607c8df7faf: [PowerPC] Legalize SREM/UREM directly on P9. (authored by Esme). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82145/new/ https://reviews.llvm.org/D82145 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/test/CodeGen/PowerPC/ppc64-P9-mod.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82145.275602.patch Type: text/x-patch Size: 4426 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:26:28 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:26:28 +0000 (UTC) Subject: [PATCH] D82458: [ARM] Adjust default fp extend and trunc costs In-Reply-To: References: Message-ID: <7e06e7393256a054268cf27e2a599e44@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGafdb2ef2ed9d: [ARM] Adjust default fp extend and trunc costs (authored by dmgreen). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82458/new/ https://reviews.llvm.org/D82458 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast.ll llvm/test/Analysis/CostModel/ARM/cast_ldst.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82458.275606.patch Type: text/x-patch Size: 205917 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:27:09 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:27:09 +0000 (UTC) Subject: [PATCH] D81813: [ARM] MVE FP16 cost adjustments In-Reply-To: References: Message-ID: <528790d7819f5d42816aaa948e46c9a1@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG146dad0077b4: [ARM] MVE FP16 cost adjustments (authored by dmgreen). Changed prior to commit: https://reviews.llvm.org/D81813?vs=273015&id=275611#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81813/new/ https://reviews.llvm.org/D81813 Files: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/test/Analysis/CostModel/ARM/cast_ldst.ll llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81813.275611.patch Type: text/x-patch Size: 15432 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:27:26 2020 From: llvm-commits at lists.llvm.org (Oliver Stannard (Linaro) via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:27:26 +0000 (UTC) Subject: [PATCH] D76291: [Support] Fix formatted_raw_ostream for UTF-8 In-Reply-To: References: Message-ID: <26095c9c697d5374bf02cbf8445a452f@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGe80b81d1cbf8: [Support] Fix formatted_raw_ostream for UTF-8 (authored by ostannard). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76291/new/ https://reviews.llvm.org/D76291 Files: clang/test/Analysis/checker-plugins.c llvm/include/llvm/Support/FormattedStream.h llvm/lib/Support/FormattedStream.cpp llvm/test/MC/ARM/lsl-zero.s llvm/unittests/Support/formatted_raw_ostream_test.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D76291.275612.patch Type: text/x-patch Size: 13508 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:27:29 2020 From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:27:29 +0000 (UTC) Subject: [PATCH] D82481: [XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s) In-Reply-To: References: Message-ID: <1036cd03416af6560665321176efba2d@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG6d3ae365bdfc: [XCOFF][AIX] Give symbol an internal name when desired symbol name contains… (authored by jasonliu). 
Changed prior to commit: https://reviews.llvm.org/D82481?vs=275463&id=275613#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82481/new/ https://reviews.llvm.org/D82481 Files: llvm/include/llvm/MC/MCAsmInfo.h llvm/include/llvm/MC/MCContext.h llvm/include/llvm/MC/MCSectionXCOFF.h llvm/include/llvm/MC/MCStreamer.h llvm/include/llvm/MC/MCSymbolXCOFF.h llvm/include/llvm/MC/MCXCOFFStreamer.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/MC/MCAsmInfoXCOFF.cpp llvm/lib/MC/MCAsmStreamer.cpp llvm/lib/MC/MCContext.cpp llvm/lib/MC/MCStreamer.cpp llvm/lib/MC/MCSymbolXCOFF.cpp llvm/lib/MC/XCOFFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll llvm/test/CodeGen/PowerPC/test_func_desc.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82481.275613.patch Type: text/x-patch Size: 29755 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:27:42 2020 From: llvm-commits at lists.llvm.org (Sander de Smalen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:27:42 +0000 (UTC) Subject: [PATCH] D83198: [CodeGen] Fix warnings in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR In-Reply-To: References: Message-ID: <92e4d0f18ea9495258cef37f9b6b904a@localhost.localdomain> sdesmalen added inline comments. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2183 + + assert(SubVT.isScalableVector() == + N->getOperand(0).getValueType().isScalableVector() && ---------------- I think this assert is saying that this is not *yet* supported, because if extracting a fixed-width vector from a scalable vector is allowed then it should (at some point) also support extracting it from an illegal scalable vector. 
So maybe better to change the wording to `"Extracting a fixed-length vector from a scalable vector is not yet supported"`, and perhaps you can wrap it in a `report_fatal_error(..)` instead of an assert, because this is more a `is-unsupported-error`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83198/new/ https://reviews.llvm.org/D83198 From llvm-commits at lists.llvm.org Tue Jul 7 00:27:48 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Lu=C3=ADs_Marques_via_Phabricator?= via llvm-commits) Date: Tue, 07 Jul 2020 07:27:48 +0000 (UTC) Subject: [PATCH] D79690: [RISCV] Fold ADDIs into load/stores with nonzero offsets In-Reply-To: References: Message-ID: <90902bd3b61ffdedbf436b0feb5e526b@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG61c2a0bb8236: [RISCV] Fold ADDIs into load/stores with nonzero offsets (authored by luismarques). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79690/new/ https://reviews.llvm.org/D79690 Files: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll llvm/test/CodeGen/RISCV/callee-saved-gprs.ll llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll llvm/test/CodeGen/RISCV/fp128.ll llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll llvm/test/CodeGen/RISCV/wide-mem.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79690.275615.patch Type: text/x-patch Size: 148485 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:27:49 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:27:49 +0000 (UTC) Subject: [PATCH] D83138: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". 
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc1a5f73a4ae7: [ELF][ARM] Represent R_ARM_LDO32 as R_DTPREL instead of R_ABS (authored by MaskRay).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83138/new/

https://reviews.llvm.org/D83138

Files:
  lld/ELF/Arch/ARM.cpp
  lld/ELF/Relocations.cpp
  lld/test/ELF/debug-dead-reloc-tls-arm.s

Index: lld/test/ELF/debug-dead-reloc-tls-arm.s
===================================================================
--- lld/test/ELF/debug-dead-reloc-tls-arm.s
+++ lld/test/ELF/debug-dead-reloc-tls-arm.s
@@ -7,8 +7,7 @@
 # RUN: llvm-objdump -s %t | FileCheck %s
 
 # CHECK: Contents of section .debug_info:
-## FIXME: Use ffffffff
-# CHECK-NEXT: 0000 00000000
+# CHECK-NEXT: 0000 ffffffff
 
 .globl _start
 _start:
Index: lld/ELF/Relocations.cpp
===================================================================
--- lld/ELF/Relocations.cpp
+++ lld/ELF/Relocations.cpp
@@ -238,7 +238,7 @@
   }
 
   // Local-Dynamic relocs can be relaxed to Local-Exec.
-  if (expr == R_DTPREL && !config->shared) {
+  if (expr == R_DTPREL && canRelax && !config->shared) {
     c.relocations.push_back(
         {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type,
          offset, addend, &sym});
Index: lld/ELF/Arch/ARM.cpp
===================================================================
--- lld/ELF/Arch/ARM.cpp
+++ lld/ELF/Arch/ARM.cpp
@@ -121,6 +121,8 @@
     return R_TLSGD_PC;
   case R_ARM_TLS_LDM32:
     return R_TLSLD_PC;
+  case R_ARM_TLS_LDO32:
+    return R_DTPREL;
   case R_ARM_BASE_PREL:
     // B(S) + A - P
     // FIXME: currently B(S) assumed to be .got, this may not hold for all

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83138.275616.patch Type: text/x-patch Size: 1314 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:27:50 2020 From: llvm-commits at lists.llvm.org (David Tenty via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:27:50 +0000 (UTC) Subject: [PATCH] D82905: [AIX] Add system-aix to lit config file In-Reply-To: References: Message-ID: <8911c5b740d6bb4593216dbc02e45a4d@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG2402f9385e85: [AIX] Add system-aix to lit config file (authored by ShuhongL, committed by daltenty). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82905/new/ https://reviews.llvm.org/D82905 Files: llvm/utils/lit/lit/llvm/config.py llvm/utils/lit/tests/lit.cfg llvm/utils/lit/tests/shtest-format-argv0.py Index: llvm/utils/lit/tests/shtest-format-argv0.py =================================================================== --- llvm/utils/lit/tests/shtest-format-argv0.py +++ llvm/utils/lit/tests/shtest-format-argv0.py @@ -5,7 +5,7 @@ # # This test is not supported on AIX since `[` is only available as a shell builtin # and is not installed under PATH by default. -# UNSUPPORTED: aix +# UNSUPPORTED: system-aix # # RUN: %{lit} -j 1 -v %{inputs}/shtest-format-argv0 | FileCheck %s Index: llvm/utils/lit/tests/lit.cfg =================================================================== --- llvm/utils/lit/tests/lit.cfg +++ llvm/utils/lit/tests/lit.cfg @@ -87,7 +87,7 @@ if sys.platform.startswith('win') or sys.platform.startswith('cygwin'): config.available_features.add('system-windows') if platform.system() == 'AIX': - config.available_features.add('aix') + config.available_features.add('system-aix') # For each of lit's internal shell commands ('env', 'cd', 'diff', etc.), put # a fake command that always fails at the start of PATH. 
This helps us check Index: llvm/utils/lit/lit/llvm/config.py =================================================================== --- llvm/utils/lit/lit/llvm/config.py +++ llvm/utils/lit/lit/llvm/config.py @@ -51,12 +51,14 @@ elif platform.system() == 'Windows': # For tests that require Windows to run. features.add('system-windows') - elif platform.system() == "Linux": + elif platform.system() == 'Linux': features.add('system-linux') elif platform.system() in ['FreeBSD']: features.add('system-freebsd') - elif platform.system() == "NetBSD": + elif platform.system() == 'NetBSD': features.add('system-netbsd') + elif platform.system() == 'AIX': + features.add('system-aix') # Native compilation: host arch == default triple arch # Both of these values should probably be in every site config (e.g. as -------------- next part -------------- A non-text attachment was scrubbed... Name: D82905.275617.patch Type: text/x-patch Size: 2018 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:28:07 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:28:07 +0000 (UTC) Subject: [PATCH] D83164: [flang] Basic tests of external I/O runtime (part 9/9) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGa39e9cf6bec4: [flang] Basic tests of external I/O runtime (part 9/9) (authored by klausler). 
Changed prior to commit: https://reviews.llvm.org/D83164?vs=275522&id=275618#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83164/new/ https://reviews.llvm.org/D83164 Files: flang/runtime/terminator.cpp flang/runtime/terminator.h flang/unittests/Runtime/CMakeLists.txt flang/unittests/Runtime/external-hello.cpp flang/unittests/Runtime/external-io.cpp flang/unittests/Runtime/testing.cpp flang/unittests/Runtime/testing.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83164.275618.patch Type: text/x-patch Size: 21538 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:28:25 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:28:25 +0000 (UTC) Subject: [PATCH] D82903: [flang] Bug fix for ambiguous references to data and functions In-Reply-To: References: Message-ID: <172f24d42c44f7473cf35845c6a9c107@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf9e24a563c36: [flang] Bug fix for ambiguous references to data and functions (authored by PeteSteinfeld). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82903/new/ https://reviews.llvm.org/D82903 Files: flang/lib/Semantics/expression.cpp flang/lib/Semantics/resolve-names.cpp flang/test/Semantics/resolve93.f90 Index: flang/test/Semantics/resolve93.f90 =================================================================== --- /dev/null +++ flang/test/Semantics/resolve93.f90 @@ -0,0 +1,44 @@ +! 
RUN: %S/test_errors.sh %s %t %f18 +subroutine s1() + character(10) str + character(10) str1 + !ERROR: Cannot reference function 'str' as data + print *, str(1:9), str(7) + block + character(10) str2 + character(10) str3 + !ERROR: Cannot reference function 'str1' as data + print *, str1(1:9), str1(7) + !ERROR: 'str2' is not an array + print *, str2(1:9), str2(7) + !ERROR: Cannot reference function 'str3' as data + print *, str3(7), str3(1:9) + end block +end subroutine s1 + +subroutine s2() + character(10) func + !ERROR: Cannot reference function 'func' as data + print *, func(7), func(1:9) +end subroutine s2 + +subroutine s3() + real(8) :: func + !ERROR: Cannot reference function 'func' as data + print *, func(7), func(1:6) +end subroutine s3 + +subroutine s4() + real(8) :: local + real(8) :: local1 + !ERROR: Cannot reference function 'local' as data + print *, local(1:6), local(7) + !ERROR: Cannot reference function 'local1' as data + print *, local1(7), local1(1:6) +end subroutine s4 + +subroutine s5(arg) + integer :: iVar + external :: arg + iVar = loc(arg) +end subroutine s5 Index: flang/lib/Semantics/resolve-names.cpp =================================================================== --- flang/lib/Semantics/resolve-names.cpp +++ flang/lib/Semantics/resolve-names.cpp @@ -5505,7 +5505,15 @@ }, [&](const Indirection &y) { Walk(y.value().subscripts); - return ResolveDataRef(y.value().base); + const parser::Name *name{ResolveDataRef(y.value().base)}; + if (!name) { + } else if (!name->symbol->has()) { + ConvertToObjectEntity(*name->symbol); + } else if (!context().HasError(*name->symbol)) { + SayWithDecl(*name, *name->symbol, + "Cannot reference function '%s' as data"_err_en_US); + } + return name; }, [&](const Indirection &y) { Walk(y.value().imageSelector); Index: flang/lib/Semantics/expression.cpp =================================================================== --- flang/lib/Semantics/expression.cpp +++ flang/lib/Semantics/expression.cpp @@ -909,7 +909,10 @@ 
return std::nullopt; } else if (baseExpr->Rank() == 0) { if (const Symbol * symbol{GetLastSymbol(*baseExpr)}) { - Say("'%s' is not an array"_err_en_US, symbol->name()); + if (!context_.HasError(symbol)) { + Say("'%s' is not an array"_err_en_US, symbol->name()); + context_.SetError(const_cast(*symbol)); + } } } else if (std::optional dataRef{ ExtractDataRef(std::move(*baseExpr))}) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D82903.275622.patch Type: text/x-patch Size: 2983 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:28:46 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:28:46 +0000 (UTC) Subject: [PATCH] D83184: Avoid using globals in ELF Symbol Table In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGdc6b3f03a872: [ELF] Drop an unneeded reference to `symtab` from SymbolTable::addSymbol (authored by William S. Moses <gh at wsmoses.com>, committed by MaskRay). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83184/new/ https://reviews.llvm.org/D83184 Files: lld/ELF/SymbolTable.cpp Index: lld/ELF/SymbolTable.cpp =================================================================== --- lld/ELF/SymbolTable.cpp +++ lld/ELF/SymbolTable.cpp @@ -94,7 +94,7 @@ } Symbol *SymbolTable::addSymbol(const Symbol &newSym) { - Symbol *sym = symtab->insert(newSym.getName()); + Symbol *sym = insert(newSym.getName()); sym->resolve(newSym); return sym; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83184.275623.patch Type: text/x-patch Size: 371 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:28:51 2020 From: llvm-commits at lists.llvm.org (Kazushi Marukawa via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:28:51 +0000 (UTC) Subject: [PATCH] D83170: [VE] Support symbol with offset in assembly In-Reply-To: References: Message-ID: <21287495fba81c018299b1921dcd8869@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGfa1fecc73d4d: [VE] Support symbol with offset in assembly (authored by kaz7). Changed prior to commit: https://reviews.llvm.org/D83170?vs=275539&id=275625#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83170/new/ https://reviews.llvm.org/D83170 Files: llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/VE/AsmParser/VEAsmParser.cpp llvm/test/MC/VE/sym-br.s llvm/test/MC/VE/symbols.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83170.275625.patch Type: text/x-patch Size: 13288 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:29:07 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:29:07 +0000 (UTC) Subject: [PATCH] D82821: [WebAssembly] Added 64-bit memory.grow/size/init/copy/fill In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG16d83c395a1f: [WebAssembly] Added 64-bit memory.grow/size/copy/fill (authored by aardappel). Herald added a project: clang. Herald added a subscriber: cfe-commits. 
Changed prior to commit: https://reviews.llvm.org/D82821?vs=275258&id=275627#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82821/new/ https://reviews.llvm.org/D82821 Files: clang/include/clang/Basic/BuiltinsWebAssembly.def clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGen/builtins-wasm.c llvm/include/llvm/IR/IntrinsicsWebAssembly.td llvm/lib/Target/WebAssembly/WebAssemblyInstrBulkMemory.td llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td llvm/lib/Target/WebAssembly/WebAssemblySelectionDAGInfo.cpp llvm/test/CodeGen/WebAssembly/bulk-memory-intrinsics.ll llvm/test/CodeGen/WebAssembly/bulk-memory64.ll llvm/test/CodeGen/WebAssembly/memory-addr64.ll llvm/test/MC/WebAssembly/bulk-memory-encodings.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82821.275627.patch Type: text/x-patch Size: 22087 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:29:10 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Tue, 07 Jul 2020 07:29:10 +0000 (UTC) Subject: [PATCH] D83084: DomTree: Remove the releaseMemory() method In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG723a44c9b5d6: DomTree: Remove the releaseMemory() method (authored by nhaehnle). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83084/new/ https://reviews.llvm.org/D83084 Files: clang/include/clang/Analysis/Analyses/Dominators.h llvm/include/llvm/Analysis/PostDominators.h llvm/include/llvm/IR/Dominators.h llvm/include/llvm/Support/GenericDomTree.h Index: llvm/include/llvm/Support/GenericDomTree.h =================================================================== --- llvm/include/llvm/Support/GenericDomTree.h +++ llvm/include/llvm/Support/GenericDomTree.h @@ -325,8 +325,6 @@ return false; } - void releaseMemory() { reset(); } - /// getNode - return the (Post)DominatorTree node for the specified basic /// block. This is the same as using operator[] on this class. The result /// may (but is not required to) be null for a forward (backwards) @@ -760,9 +758,6 @@ return DomTreeBuilder::Verify(*this, VL); } -protected: - void addRoot(NodeT *BB) { this->Roots.push_back(BB); } - void reset() { DomTreeNodes.clear(); Roots.clear(); @@ -772,6 +767,9 @@ SlowQueries = 0; } +protected: + void addRoot(NodeT *BB) { this->Roots.push_back(BB); } + // NewBB is split and now it has one successor. Update dominator tree to // reflect this change. 
template Index: llvm/include/llvm/IR/Dominators.h =================================================================== --- llvm/include/llvm/IR/Dominators.h +++ llvm/include/llvm/IR/Dominators.h @@ -277,7 +277,7 @@ AU.setPreservesAll(); } - void releaseMemory() override { DT.releaseMemory(); } + void releaseMemory() override { DT.reset(); } void print(raw_ostream &OS, const Module *M = nullptr) const override; }; Index: llvm/include/llvm/Analysis/PostDominators.h =================================================================== --- llvm/include/llvm/Analysis/PostDominators.h +++ llvm/include/llvm/Analysis/PostDominators.h @@ -88,9 +88,7 @@ AU.setPreservesAll(); } - void releaseMemory() override { - DT.releaseMemory(); - } + void releaseMemory() override { DT.reset(); } void print(raw_ostream &OS, const Module*) const override; }; Index: clang/include/clang/Analysis/Analyses/Dominators.h =================================================================== --- clang/include/clang/Analysis/Analyses/Dominators.h +++ clang/include/clang/Analysis/Analyses/Dominators.h @@ -167,9 +167,7 @@ } /// Releases the memory held by the dominator tree. - virtual void releaseMemory() { - DT.releaseMemory(); - } + virtual void releaseMemory() { DT.reset(); } /// Converts the dominator tree to human readable form. virtual void print(raw_ostream &OS, const llvm::Module* M= nullptr) const { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83084.275628.patch Type: text/x-patch Size: 2424 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:29:10 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Tue, 07 Jul 2020 07:29:10 +0000 (UTC) Subject: [PATCH] D83083: DomTree: Remove getChildren() accessor In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. 
Closed by commit rG76c5cb05a3a6: DomTree: Remove getChildren() accessor (authored by nhaehnle). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83083/new/ https://reviews.llvm.org/D83083 Files: llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/CodeGen/EarlyIfConversion.cpp llvm/lib/CodeGen/InlineSpiller.cpp llvm/lib/CodeGen/MachineCSE.cpp llvm/lib/CodeGen/MachineLICM.cpp llvm/lib/CodeGen/MachineSink.cpp llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp llvm/lib/Target/Mips/MipsOptimizePICCall.cpp llvm/lib/Transforms/Scalar/ConstantHoisting.cpp llvm/lib/Transforms/Scalar/NewGVN.cpp llvm/lib/Transforms/Utils/LoopSimplify.cpp llvm/lib/Transforms/Utils/LoopUnroll.cpp llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83083.275629.patch Type: text/x-patch Size: 13659 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:29:23 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Tue, 07 Jul 2020 07:29:23 +0000 (UTC) Subject: [PATCH] D83085: DomTree: Remove getRoots() accessor In-Reply-To: References: Message-ID: <613c86c1e23c9c5566c99a2e42d34a00@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGdfcc68c52826: DomTree: Remove getRoots() accessor (authored by nhaehnle). 
Changed prior to commit: https://reviews.llvm.org/D83085?vs=275228&id=275630#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83085/new/ https://reviews.llvm.org/D83085 Files: llvm/include/llvm/Analysis/DominanceFrontier.h llvm/include/llvm/CodeGen/MachineDominators.h llvm/include/llvm/CodeGen/MachinePostDominators.h llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/unittests/IR/DominatorTreeTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83085.275630.patch Type: text/x-patch Size: 6146 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:29:24 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Tue, 07 Jul 2020 07:29:24 +0000 (UTC) Subject: [PATCH] D83086: DomTree: add private create{Child,Node} helpers In-Reply-To: References: Message-ID: <1f877892f9e1342e1bcb8bf6d6b5be62@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf987ba3cf9af: DomTree: add private create{Child,Node} helpers (authored by nhaehnle). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83086/new/ https://reviews.llvm.org/D83086 Files: llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h Index: llvm/include/llvm/Support/GenericDomTreeConstruction.h =================================================================== --- llvm/include/llvm/Support/GenericDomTreeConstruction.h +++ llvm/include/llvm/Support/GenericDomTreeConstruction.h @@ -187,9 +187,7 @@ // Add a new tree node for this NodeT, and link it as a child of // IDomNode - return (DT.DomTreeNodes[BB] = IDomNode->addChild( - std::make_unique>(BB, IDomNode))) - .get(); + return DT.createChild(BB, IDomNode); } static bool AlwaysDescend(NodePtr, NodePtr) { return true; } @@ -587,9 +585,7 @@ // all real exits (including multiple exit blocks, infinite loops). NodePtr Root = IsPostDom ? nullptr : DT.Roots[0]; - DT.RootNode = (DT.DomTreeNodes[Root] = - std::make_unique>(Root, nullptr)) - .get(); + DT.RootNode = DT.createNode(Root); SNCA.attachNewSubtree(DT, DT.RootNode); } @@ -610,8 +606,7 @@ // Add a new tree node for this BasicBlock, and link it as a child of // IDomNode. - DT.DomTreeNodes[W] = IDomNode->addChild( - std::make_unique>(W, IDomNode)); + DT.createChild(W, IDomNode); } } @@ -661,10 +656,7 @@ // The unreachable node becomes a new root -- a tree node for it. 
TreeNodePtr VirtualRoot = DT.getNode(nullptr); - FromTN = - (DT.DomTreeNodes[From] = VirtualRoot->addChild( - std::make_unique>(From, VirtualRoot))) - .get(); + FromTN = DT.createChild(From, VirtualRoot); DT.Roots.push_back(From); } Index: llvm/include/llvm/Support/GenericDomTree.h =================================================================== --- llvm/include/llvm/Support/GenericDomTree.h +++ llvm/include/llvm/Support/GenericDomTree.h @@ -590,8 +590,7 @@ DomTreeNodeBase *IDomNode = getNode(DomBB); assert(IDomNode && "Not immediate dominator specified for block!"); DFSInfoValid = false; - return (DomTreeNodes[BB] = IDomNode->addChild( - std::make_unique>(BB, IDomNode))).get(); + return createChild(BB, IDomNode); } /// Add a new node to the forward dominator tree and make it a new root. @@ -604,8 +603,7 @@ assert(!this->isPostDominator() && "Cannot change root of post-dominator tree"); DFSInfoValid = false; - DomTreeNodeBase *NewNode = (DomTreeNodes[BB] = - std::make_unique>(BB, nullptr)).get(); + DomTreeNodeBase *NewNode = createNode(BB); if (Roots.empty()) { addRoot(BB); } else { @@ -786,6 +784,18 @@ protected: void addRoot(NodeT *BB) { this->Roots.push_back(BB); } + DomTreeNodeBase *createChild(NodeT *BB, DomTreeNodeBase *IDom) { + return (DomTreeNodes[BB] = IDom->addChild( + std::make_unique>(BB, IDom))) + .get(); + } + + DomTreeNodeBase *createNode(NodeT *BB) { + return (DomTreeNodes[BB] = + std::make_unique>(BB, nullptr)) + .get(); + } + // NewBB is split and now it has one successor. Update dominator tree to // reflect this change. template -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83086.275631.patch Type: text/x-patch Size: 3347 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:29:39 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:29:39 +0000 (UTC) Subject: [PATCH] D83022: Add option LLVM_NM to allow specifying the location of the llvm-nm tool. In-Reply-To: References: Message-ID: <7f7b910437987ce539401e97a03bfbec@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG1d8cb099231a: Add option LLVM_NM to allow specifying the location of the llvm-nm tool (authored by arlosi, committed by smeenai). Changed prior to commit: https://reviews.llvm.org/D83022?vs=275000&id=275632#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83022/new/ https://reviews.llvm.org/D83022 Files: llvm/tools/llvm-shlib/CMakeLists.txt Index: llvm/tools/llvm-shlib/CMakeLists.txt =================================================================== --- llvm/tools/llvm-shlib/CMakeLists.txt +++ llvm/tools/llvm-shlib/CMakeLists.txt @@ -154,13 +154,17 @@ set(GEN_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/gen-msvc-exports.py) set(LLVM_EXPORTED_SYMBOL_FILE ${CMAKE_BINARY_DIR}/${CMAKE_CFG_INTDIR}/libllvm-c.exports) - - if(CMAKE_CROSSCOMPILING) - build_native_tool(llvm-nm llvm_nm) - set(llvm_nm_target "${llvm_nm}") + if(NOT LLVM_NM) + if(CMAKE_CROSSCOMPILING) + build_native_tool(llvm-nm llvm_nm) + set(llvm_nm_target "${llvm_nm}") + else() + set(llvm_nm $) + set(llvm_nm_target llvm-nm) + endif() else() - set(llvm_nm $) - set(llvm_nm_target llvm-nm) + set(llvm_nm ${LLVM_NM}) + set(llvm_nm_target "") endif() add_custom_command(OUTPUT ${LLVM_EXPORTED_SYMBOL_FILE} -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83022.275632.patch Type: text/x-patch Size: 931 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:30:12 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:30:12 +0000 (UTC) Subject: [PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGc143900a0851: [llvm-install-name-tool] Merge install-name options (authored by sameerarora101). Changed prior to commit: https://reviews.llvm.org/D82812?vs=275562&id=275636#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82812/new/ https://reviews.llvm.org/D82812 Files: llvm/test/tools/llvm-objcopy/MachO/install-name-tool-add-rpath.test llvm/tools/llvm-objcopy/CopyConfig.cpp llvm/tools/llvm-objcopy/CopyConfig.h llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82812.275636.patch Type: text/x-patch Size: 12613 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:30:19 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:30:19 +0000 (UTC) Subject: [PATCH] D83177: [llvm-reduce] Reducing call operand bundles In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG05f2b5ccfc5d: [llvm-reduce] Reducing call operand bundles (authored by lebedev.ri). 
Changed prior to commit: https://reviews.llvm.org/D83177?vs=275560&id=275637#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83177/new/ https://reviews.llvm.org/D83177 Files: llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83177.275637.patch Type: text/x-patch Size: 10493 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:30:45 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:30:45 +0000 (UTC) Subject: [PATCH] D82716: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGea71ba11ab11: [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division (authored by spatel). Changed prior to commit: https://reviews.llvm.org/D82716?vs=273935&id=275640#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82716/new/ https://reviews.llvm.org/D82716 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/X86/sqrt-fastmath.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82716.275640.patch Type: text/x-patch Size: 8428 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:31:19 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:31:19 +0000 (UTC) Subject: [PATCH] D82520: [Power10] Implement Vector Splat Immediate Builtins in LLVM/Clang In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Revision". This revision was automatically updated to reflect the committed changes. Closed by commit rG0c6b6e28e70c: [PowerPC] Implement Vector Splat Immediate Builtins in Clang (authored by biplmish, committed by lei). Herald added a project: clang. Herald added a subscriber: cfe-commits. Changed prior to commit: https://reviews.llvm.org/D82520?vs=274748&id=275642#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82520/new/ https://reviews.llvm.org/D82520 Files: clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/test/CodeGen/PowerPC/p10-splatImm.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82520.275642.patch Type: text/x-patch Size: 6056 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:31:31 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:31:31 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: <67e222a6323578b7855b65842b95e888@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG65482e8a703d: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to… (authored by clementval). 
Changed prior to commit: https://reviews.llvm.org/D82982?vs=274857&id=275643#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82982.275643.patch Type: text/x-patch Size: 109657 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:31:39 2020 From: llvm-commits at lists.llvm.org (Xiang Zhang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:31:39 +0000 (UTC) Subject: [PATCH] D83111: [X86-64] Support Intel AMX Intrinsic In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG939d8309dbd4: [X86-64] Support Intel AMX Intrinsic (authored by xiangzhangllvm). Herald added a project: clang. Herald added a subscriber: cfe-commits. 
Changed prior to commit: https://reviews.llvm.org/D83111?vs=275537&id=275644#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83111/new/ https://reviews.llvm.org/D83111 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Basic/BuiltinsX86_64.def clang/include/clang/Basic/DiagnosticSemaKinds.td clang/include/clang/Driver/Options.td clang/include/clang/Sema/Sema.h clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h clang/lib/Headers/CMakeLists.txt clang/lib/Headers/amxintrin.h clang/lib/Headers/cpuid.h clang/lib/Headers/immintrin.h clang/lib/Sema/SemaChecking.cpp clang/test/CodeGen/AMX/amx.c clang/test/CodeGen/AMX/amx_errors.c clang/test/CodeGen/AMX/amx_inline_asm.c clang/test/Driver/x86-target-features.c clang/test/Preprocessor/x86_amx_target_features.c llvm/include/llvm/IR/IntrinsicsX86.td llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86InstrAMX.td llvm/test/CodeGen/X86/AMX/amx-bf16-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-int8-intrinsics.ll llvm/test/CodeGen/X86/AMX/amx-tile-intrinsics.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83111.275644.patch Type: text/x-patch Size: 42384 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:31:46 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:31:46 +0000 (UTC) Subject: [PATCH] D78434: [mlir] resolve types from attributes in assemblyFormat In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG72df59d59097: [mlir] resolve types from attributes in assemblyFormat (authored by tali, committed by mehdi_amini). 
Changed prior to commit:
  https://reviews.llvm.org/D78434?vs=275288&id=275645#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78434/new/

https://reviews.llvm.org/D78434

Files:
  mlir/docs/OpDefinitions.md
  mlir/test/lib/Dialect/Test/TestOps.td
  mlir/test/mlir-tblgen/op-format.mlir
  mlir/tools/mlir-tblgen/OpFormatGen.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D78434.275645.patch
Type: text/x-patch
Size: 11208 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 00:33:10 2020
From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 07:33:10 +0000 (UTC)
Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option
In-Reply-To:
References:
Message-ID: <9d748f638a799c8b21750146e58b4b3e@localhost.localdomain>

jhenderson added inline comments.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:11
+# RUN: llvm-ar t %t.lib | \
+# RUN: FileCheck %s --check-prefix=CHECK-NAMES --implicit-check-not={{.}} -DPREFIX=create-static-lib.test.tmp
+
----------------
It's not widely used, but there is `%basename_t` which substitutes for just the file name part of `%t`. This allows the test to not make assumptions about how `%t` expands, and also keeps the check independent of the test name. I'd recommend trying that.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:18
+# RUN: llvm-nm --print-armap %t.lib | \
+# RUN: FileCheck %s --check-prefix=CHECK-SYMBOLS -DPREFIX=create-static-lib.test.tmp
+
----------------
Mostly fine, but use `%basename_t` here too.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:34
+
+## Checking that the output file is overwritten:
+# RUN: llvm-libtool-darwin -static -o %t.lib %t-input2.o
----------------
Checking -> Check

================
Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:37
+# RUN: llvm-ar t %t.lib | \
+# RUN: FileCheck %s --check-prefix=OVERWRITE-NAMES --implicit-check-not={{.}}
+# RUN: llvm-nm --print-armap %t.lib | \
----------------
Here and below, use `%basename_t` too, to avoid baking in file names.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:44
+# OVERWRITE-SYMBOLS: Archive map
+# OVERWRITE-SYMBOLS-NEXT: _symbol2 in create-static-lib.test.tmp-input2.o
+# OVERWRITE-SYMBOLS-EMPTY:
----------------
Ditto.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:50-52
+# RUN: FileCheck %s --check-prefix=DUPLICATE-NAMES --implicit-check-not={{.}} -DPREFIX=create-static-lib.test.tmp
+# RUN: llvm-nm --print-armap %t.lib | \
+# RUN: FileCheck %s --check-prefix=DUPLICATE-SYMBOLS -DPREFIX=create-static-lib.test.tmp
----------------
Ditto.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:7
+
+# MISSING-OPERATION: Library Type: option: must be specified at least once!
+
----------------
sameerarora101 wrote:
> jhenderson wrote:
> > Does the double space match the actual error message?
> Yes, the actual error msg also has the double space:
> ```
> Library Type: option: must be specified at least once!
> ```
Okay, bonus marks for fixing it in another patch if you want. Or file a bug against the CommandLine code.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:27
+
+## Missing -static option:
+# RUN: not llvm-libtool-darwin -o %t.lib %t.input 2>&1 | \
----------------
Maybe make this more generic, e.g.
"Missing library type option:" I don't really know where to put this test. It might want to be its own test case entirely, e.g. "missing-library-type.test" ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:44 + +# NOT-OBJECT: error: 'invalid-input-output-args.test.tmp.invalid': The file was not recognized as a valid object file + ---------------- Use `%basename_t` here too. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:51 + +# NOT-MACHO: error: 'invalid-input-output-args.test.tmp.elf': format not supported + ---------------- Use `%basename_t` here too. ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:28 + cl::value_desc("filename"), cl::Required, + cl::cat(LibtoolCategory)); static cl::alias OutputFileShort("o", cl::desc("Alias for -output"), ---------------- Adding the categories sounds like a different change to me? You might want to include it alongside a test case to show that unrelated options aren't incldued. ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:85 + std::vector NewMembers; + for (const StringRef &Member : InputFiles) + if (Error E = addMember(NewMembers, Member)) ---------------- sameerarora101 wrote: > @jhenderson Should I replace the type of `Member` back to `auto`. clang-tidy raises a warning with `StringRef`? Well typically you don't want to use a `const &` for `StringRef` because it is already a light-weight construct (much the same as you wouldn't use `const &` to a pointer or primitive type). That is probably the cause of the problem here. ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:100 InitLLVM X(Argc, Argv); + cl::HideUnrelatedOptions({&LibtoolCategory, &ColorCategory}); cl::ParseCommandLineOptions(Argc, Argv, "llvm-libtool-darwin\n"); ---------------- See above re. option category change. 
================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:34-38 +static cl::opt LibraryOperation( + cl::desc("Library Type: "), + cl::values( + clEnumValN(Static, "static", + "Produce a statically linked library from the input files")), ---------------- sameerarora101 wrote: > jhenderson wrote: > > I'm not really familiar with the `Operation` type. What does it look like in the help text? > this is what help text looks like: > ``` > OVERVIEW: llvm-libtool-darwin > > USAGE: llvm-libtool-darwin [options] > > OPTIONS: > > Color Options: > > --color - Use colors in output (default=autodetect) > > Generic Options: > > --help - Display available options (--help-hidden for more) > --help-list - Display list of available options (--help-list-hidden for more) > --version - Display the version of this program > > llvm-libtool-darwin options: > > -o - Alias for -output > --output= - Specify output filename > Library Type: > --static - Produce a statically linked library from the input files > ``` > > I created an `enum Operation` here so that in future we can add support for `dynamic` operation easily. I can very well make the `-static` option a boolean flag as well. What do you think? Your current approach seems fine. I was just making sure it didn't do anything too weird. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Tue Jul 7 00:34:36 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:34:36 +0000 (UTC) Subject: [PATCH] D82540: [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount In-Reply-To: References: Message-ID: <68c3601a236c9433fc1302c7120a1a26@localhost.localdomain> foad added a comment. I misread something and thought @arsenm had accepted this patch. That's why I committed it. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82540/new/ https://reviews.llvm.org/D82540 From llvm-commits at lists.llvm.org Tue Jul 7 00:36:12 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:36:12 +0000 (UTC) Subject: [PATCH] D83192: Fix off by one error in Bitfields In-Reply-To: References: Message-ID: <2037158abc91cfe770544ae973d53d43@localhost.localdomain> courbet added a comment. In D83192#2135079 , @gchatelet wrote: > In D83192#2135019 , @courbet wrote: > > > Can you add a unit test for this ? > > > I'll add the following test as a separate commit if it LGTY. > Note: I can't use EXPECT_EQ because it takes the arguments by `const &` and the properties are `static constexpr`, the compiler complains about undefined reference. > > TEST(BitfieldsTest, Properties) { > using A = Bitfield::Element; > EXPECT_TRUE(A::FirstBit == 0U); > EXPECT_TRUE(A::LastBit == 0U); > EXPECT_TRUE(A::Shift == 0U); > EXPECT_TRUE(A::Bits == 1U); > > using B = Bitfield::Element; > EXPECT_TRUE(B::FirstBit == 3U); > EXPECT_TRUE(B::LastBit == 6U); > EXPECT_TRUE(B::Shift == 3U); > EXPECT_TRUE(B::Bits == 4U); > } > SG Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83192/new/ https://reviews.llvm.org/D83192 From llvm-commits at lists.llvm.org Tue Jul 7 00:38:04 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:38:04 +0000 (UTC) Subject: [PATCH] D73051: [GlobalISel][AMDGPU] Legalize saturating add/subtract In-Reply-To: References: Message-ID: foad added a comment. In D73051#2133179 , @arsenm wrote: > ping. Are you going to get back to this soon, or should I adopt this? This is on the shortlist of remaining operations falling back in the OpenCL conformance tests In the short term I don't have time to work on this, so I would be happy for you to commandeer it. 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73051/new/

https://reviews.llvm.org/D73051

From llvm-commits at lists.llvm.org Tue Jul 7 00:45:01 2020
From: llvm-commits at lists.llvm.org (Haojian Wu via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 07:45:01 +0000 (UTC)
Subject: [PATCH] D83099: [clangd] Store index in '.cache/clangd/index' instead of '.clangd/index'
In-Reply-To:
References:
Message-ID: <6ead4440031722382c83b7c1aacf7ff1@localhost.localdomain>

hokein accepted this revision.
hokein added a comment.
This revision is now accepted and ready to land.

Looks good from my side.

================
Comment at: .gitignore:58
+.clangd/
+.cache
 # static analyzer regression testing project files
----------------
I'm afraid that many projects have to update their `.gitignore`, but this is a tradeoff...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83099/new/

https://reviews.llvm.org/D83099

From llvm-commits at lists.llvm.org Tue Jul 7 00:52:39 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 07:52:39 +0000 (UTC)
Subject: [PATCH] D81359: [ELF] Add --[no-]relax for RISC-V
In-Reply-To:
References:
Message-ID: <2f0239c260685dabd0c7cf529b379005@localhost.localdomain>

grimar accepted this revision.
grimar added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81359/new/

https://reviews.llvm.org/D81359

From llvm-commits at lists.llvm.org Tue Jul 7 00:56:38 2020
From: llvm-commits at lists.llvm.org (Sourabh Singh Tomar via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 07:56:38 +0000 (UTC)
Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted
In-Reply-To:
References:
Message-ID: <42193fb7fca926809838b8bdea8123f4@localhost.localdomain>

SouraVX added a comment.

In D82975#2134626 , @dblaikie wrote: > In D82975#2134347 , @probinson wrote: > > > In D82975#2134132 , @dblaikie wrote: > > > > > In D82975#2128353 , @dstenb wrote: > > > > > > > In D82975#2127201 , @SouraVX wrote: > > > > > > > > > I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the `.debug_macro` section with `version` 4 and teach the `llvm-dwarfdump` to parse it correctly. > > > > > > > > > > > > One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4? > > > > > > > > > Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc. > > > > > > My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's `-fdebug-macro` with v4 is emitting a .debug_str_offsets section. > > > > LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either. 
> > > GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand: > > $ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug" > 1 .debug_info.dwo 0000003c 0000000000000000 > 2 .debug_abbrev.dwo 0000003e 0000000000000000 > 3 .debug_macro.dwo 0000001e 0000000000000000 > 4 .debug_macro.dwo 00000364 0000000000000000 > 5 .debug_macro.dwo 00000013 0000000000000000 > 6 .debug_line.dwo 00000048 0000000000000000 > 7 .debug_str_offsets.dwo 000002d5 0000000000000000 > 8 .debug_str.dwo 00000e05 0000000000000000 > $ llvm-dwarfdump-tot main.dwo -debug-macro > main.dwo: file format elf64-x86-64 > > .debug_macro.dwo contents: > 0x00000000: > - lineno: 19 macro: > DW_MACINFO_invalid > > > I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off. One more deciding factor to considered here(previously missed) is that: `GDB(trunk)` also doesn't understand `GNU macro extensions`(if you wish to call it) in split case. i.e `gcc -g3 -gsplit-dwarf test.c` `test.dwo` contains `.debug_macro.dwo` forms which no tool(as of now can dump). if you load `a.out` in GDB and try expanding macro(defined in source). GDB will report (gdb) info macro FOO The symbol `FOO' has no definition as a C/C++ preprocessor macro at :-1 on the other hand, if you try with `-gstrict-dwarf -gsplit-dwarf`. GDB is happy. So at the end of the day, even if we allow `GNU macro` extension, things will still be broken for `-gsplit-dwarf` case. Or we have to teach the debugger to understand this ?, this also hinges on the fact, what kinda form GCC uses in split-case in `.debug_macro.dwo` section. That it self is unclear right ? 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Tue Jul 7 00:57:29 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Tue, 07 Jul 2020 00:57:29 -0700 (PDT) Subject: [llvm] 44ea81a - [X86] Add 64bit and retpoline-external-thunk to list of featuers in X86TargetParser.def. Message-ID: <5f042ae9.1c69fb81.5b5fa.00a8@mx.google.com> Author: Craig Topper Date: 2020-07-07T00:57:04-07:00 New Revision: 44ea81acb696592281157656ea7d81ecb41ca161 URL: https://github.com/llvm/llvm-project/commit/44ea81acb696592281157656ea7d81ecb41ca161 DIFF: https://github.com/llvm/llvm-project/commit/44ea81acb696592281157656ea7d81ecb41ca161.diff LOG: [X86] Add 64bit and retpoline-external-thunk to list of featuers in X86TargetParser.def. '64bit' shows up from -march=native on 64-bit capable CPUs. 'retpoline-eternal-thunk' isn't a real feature but shows up when -mretpoline-external-thunk is passed to clang. Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index e53ef20f939e..91feb146baaa 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -173,6 +173,7 @@ X86_FEATURE_COMPAT(AVX512VP2INTERSECT, "avx512vp2intersect") // Features below here are not in libgcc/compiler-rt. X86_FEATURE (3DNOW, "3dnow") X86_FEATURE (3DNOWA, "3dnowa") +X86_FEATURE (64BIT, "64bit") X86_FEATURE (ADX, "adx") X86_FEATURE (AMX_BF16, "amx-bf16") X86_FEATURE (AMX_INT8, "amx-int8") @@ -183,6 +184,9 @@ X86_FEATURE (CLWB, "clwb") X86_FEATURE (CLZERO, "clzero") X86_FEATURE (CMPXCHG16B, "cx16") X86_FEATURE (CMPXCHG8B, "cx8") +// FIXME: Merge with 64BIT? 
Currently separate to be used to tell if CPU is +// valid for 64-bit mode, but has empty string so it doesn't get added to +// target attributes in IR. X86_FEATURE (EM64T, "") X86_FEATURE (ENQCMD, "enqcmd") X86_FEATURE (F16C, "f16c") @@ -221,6 +225,7 @@ X86_FEATURE (XSAVEC, "xsavec") X86_FEATURE (XSAVEOPT, "xsaveopt") X86_FEATURE (XSAVES, "xsaves") // These features aren't really CPU features, but the frontend can set them. +X86_FEATURE (RETPOLINE_EXTERNAL_THUNK, "retpoline-external-thunk") X86_FEATURE (RETPOLINE_INDIRECT_BRANCHES, "retpoline-indirect-branches") X86_FEATURE (RETPOLINE_INDIRECT_CALLS, "retpoline-indirect-calls") X86_FEATURE (LVI_CFI, "lvi-cfi") diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index 12182179fe45..261e296b9e5a 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -390,6 +390,7 @@ ProcessorFeatures llvm::X86::getKeyFeature(X86::CPUKind Kind) { } // Features with no dependencies. +static constexpr FeatureBitset ImpliedFeatures64BIT = {}; static constexpr FeatureBitset ImpliedFeaturesADX = {}; static constexpr FeatureBitset ImpliedFeaturesBMI = {}; static constexpr FeatureBitset ImpliedFeaturesBMI2 = {}; @@ -435,6 +436,7 @@ static constexpr FeatureBitset ImpliedFeaturesXSAVE = {}; // Not really CPU features, but need to be in the table because clang uses // target features to communicate them to the backend. 
+static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_EXTERNAL_THUNK = {}; static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_BRANCHES = {}; static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_CALLS = {}; static constexpr FeatureBitset ImpliedFeaturesLVI_CFI = {}; @@ -558,6 +560,8 @@ void llvm::X86::getImpliedFeatures( auto I = llvm::find_if( FeatureInfos, [&](const FeatureInfo &FI) { return FI.Name == Feature; }); if (I == std::end(FeatureInfos)) { + // FIXME: This shouldn't happen, but may not have all features in the table + // yet. return; } From llvm-commits at lists.llvm.org Tue Jul 7 01:04:49 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:04:49 +0000 (UTC) Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass In-Reply-To: References: Message-ID: djtodoro added a comment. Instead of `git mv llvm/lib/CodeGen/LiveDebugValues.cpp llvm/lib/CodeGen/VarLocBasedImpl.cpp` it should be as following: `git mv llvm/lib/CodeGen/LiveDebugValues.cpp llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp` > Plus moving the source file in CMakeLists.txt. I've assumed that such a file movement doesn't need reviewing; I can upload a review if anyone wants a closer look though. I'd prefer adding all the patches in the stack, since there might be developers using some scripts for applying the patches automatically from Phabricator. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp:1 +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/CodeGen/MachineFrameInfo.h" ---------------- Same as above. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp:22 + +#include "LiveDebugValues.h" + ---------------- According to the LLVM coding standard, this should be included first. 
================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h:1 +#include "llvm/CodeGen/MachineFunction.h" +#include "llvm/CodeGen/TargetPassConfig.h" ---------------- I think we are missing the file header at the beginning of the file. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h:20 + + } // NS SharedLiveDebugValues + ---------------- // namespace SharedLiveDebugValues ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h:24 + extern LDVImpl *makeVarLocBasedLiveDebugValues(); +} // NS llvm ---------------- // namespace llvm Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83046/new/ https://reviews.llvm.org/D83046 From llvm-commits at lists.llvm.org Tue Jul 7 01:09:48 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:09:48 +0000 (UTC) Subject: [PATCH] D83129: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections. In-Reply-To: References: Message-ID: <17ecb5d0335e8e9ee124b573c22253c3@localhost.localdomain> grimar marked an inline comment as done. grimar added inline comments. ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:6559-6566 + if (Expected NameOrErr = + this->dumper()->getStaticSymbolName(Index)) + return *NameOrErr; + else + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(NameOrErr.takeError()))); ---------------- jhenderson wrote: > grimar wrote: > > grimar wrote: > > > jhenderson wrote: > > > > This seems like a pattern we're likely to have in several different parts of the ELFDumper. Is there any code we could share to avoid duplication? Maybe it just makes sense to change `getStaticSymbolName` to report the warning/return the `` itself? 
> > > From what I see, the `getStaticSymbolName` is used in one more place: > > > > > > ``` > > > template > > > void LLVMStyle::printAddrsig(const ELFFile *Obj) { > > > ... > > > for (uint64_t Sym : *V) { > > > Expected NameOrErr = this->dumper()->getStaticSymbolName(Sym); > > > if (NameOrErr) { > > > W.printNumber("Sym", *NameOrErr, Sym); > > > continue; > > > } > > > reportWarning(NameOrErr.takeError(), this->FileName); > > > W.printNumber("Sym", "", Sym); > > > } > > > } > > > ``` > > > > > > And it looks like it should be reasonable and possible to do what you suggest. Follow-up? > > Follow-up: D83208 > Normally, I'd say yes, follow-up, but in this case, D83208 looks small enough that it could be done as part of this, especially as most of that change is just undoing what some of this change does (in particular, the `GetSymName` lamdba replaces `getStaticSymbolName` in this change and then is just replaced back the other way again). I don't feel strongly about this though. The problem is that D83208 is not a NFC. It changes the message reported for `--addrsig` (adds a prefix with the context): "warning: '[[FILE]]': unable to get symbol from section [index 2]: invalid symbol index (255)" -> "warning: '[[FILE]]': unable to read the name of symbol with index 255: unable to get symbol from section [index 2]: invalid symbol index (255)" That is why I slightly prefer to keep them separate. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83129/new/ https://reviews.llvm.org/D83129 From llvm-commits at lists.llvm.org Tue Jul 7 01:11:01 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:11:01 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: foad added a comment. In D83101#2134412 , @lebedev.ri wrote: > Fixed in rGdb05f2e34a5e9380ddcc199d6687531108d795e4 . Thanks. 
I can confirm that it fixes all the failures I was seeing.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83101/new/

https://reviews.llvm.org/D83101

From llvm-commits at lists.llvm.org Tue Jul 7 01:14:23 2020
From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:14:23 +0000 (UTC)
Subject: [PATCH] D83282: [DWARFYAML] Refactor: Pull out member functions to DWARFYAMLUtils.cpp.
Message-ID:

Higuoxing created this revision.
Higuoxing added reviewers: jhenderson, grimar, MaskRay.
Herald added subscribers: llvm-commits, hiraditya, aprantl, mgorny.
Herald added a project: LLVM.

DWARFYAML.cpp should only hold functions for mapping YAML representation to DWARF structures. In this change, I add a file DWARFYAMLUtils.cpp to contain helper functions that are defined as member functions.

In the future, we will add some helper functions that make DWARF sections interlinked. They should be implemented in DWARFYAMLUtils.cpp.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83282

Files:
  llvm/lib/ObjectYAML/CMakeLists.txt
  llvm/lib/ObjectYAML/DWARFYAML.cpp
  llvm/lib/ObjectYAML/DWARFYAMLUtils.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83282.275933.patch
Type: text/x-patch
Size: 3702 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 01:17:51 2020
From: llvm-commits at lists.llvm.org (Serguei Katkov via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:17:51 +0000 (UTC)
Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs.
In-Reply-To:
References:
Message-ID:

skatkov added a comment.

This is pretty close to done. Do we need special tests for this patch?

================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:125 +// Return statepoint GC args as a set +static SmallSet collectGCRegs(MachineInstr &MI) { + StatepointOpers SO(&MI); ---------------- dantrushin wrote: > skatkov wrote: > > Do I understand correctly that with your changes ALL GC pointers must be defs? > > So why do you need these iterations instead of just taking all defs? > Strictly speaking, no. Only derived pointers passed in registers. > Are we guaranteed that all base pointers will appear as derived ones too? > If yes, then it is good catch, taking them from defs is simpler (but taking them from operand list instead of def list sounds a bit more natural, IMHO) > > I'm a bit confused here. What is the difference between derived and based pointer here? You have an alive gc pointer. It might be relocated == can be changed. So it must be defined as def independent on whether it is a derived one or based one. Do I miss anything here? If you are doubt and write that it works under assumption I would suggest under debug assert that sets collected by different way are the same. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:401 + + // To insert reload at the end of MBB, insert it before last instruction + // and then swap them. ---------------- dantrushin wrote: > skatkov wrote: > > what is the reason for this magic? > The reason is that `TTI.loadRegFromStackSlot` can insert load only **before** some existing instruction. Does it make sense to add an utility function to TTI which after some existing instruction? It looks more natural then this magic. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:506 + + for (Register Reg : RegsToReload) + insertReloads(Reg); ---------------- dantrushin wrote: > skatkov wrote: > > Don't you want to separate reload loads into separate function? 
> > So we'll have:
> > spill registers
> > rewrite statepoint
> > insertReloads/unspill registers
> `insertReloads` uses local vector `RegsToReload` and `MI` (statepoint instruction).
> To call `insertReloads` outside of `rewriteStatepoint` I will have to make that local vector and new statepoint instruction
> available to `insertReloads()`.
>
> I don't think that making `RegsToReload` member variable or something like that:
>
>
> ```
> SmallVector RegsToReload;
> SS.spillRegisters();
> MachineInstr *NewStatepoint = SS.rewriteStatepoint(RegsToReload); // out parameter
> SS.insertReloads(RegsToReload, NewStatepoint);
>
> ```
> will be much cleaner.
> But I can do that if you want.

I do not have a strong preference here, but the separation makes sense to me. At the very least, rename the function `rewriteStatepoint` and definitely update the comment before it, since it does more than rewrite statepoints.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81647/new/

https://reviews.llvm.org/D81647

From llvm-commits at lists.llvm.org Tue Jul 7 01:18:00 2020
From: llvm-commits at lists.llvm.org (Sam Elliott via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:18:00 +0000 (UTC)
Subject: [PATCH] D82660: [RISCV] Optimize multiplication by specific immediates
In-Reply-To:
References:
Message-ID:

lenary added a comment.

One issue, then I'm happy.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:2990
+ if (auto *ConstNode = dyn_cast(C.getNode())) {
+ int64_t Imm = ConstNode->getSExtValue();
+ if (isPowerOf2_64(Imm + 1) || isPowerOf2_64(Imm - 1) ||
----------------
getSExtValue will assert if the value does not fit into 64 bits - you need to do a check before you get there. I think this hook can be called before legalisation, so you may not get only legal types in this call.
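
The guard requested in the comment above can be sketched in standalone C++. This is only an illustration under stated assumptions: `Wide128`, `trySExt64`, `isPowerOf2` and `isCheapMulImm` are hypothetical names, the 128-bit pair stands in for a constant of an illegal wide type, and only the two power-of-two checks visible in the quoted diff are modelled (the rest of the condition is elided in the review). In the real hook the width check would presumably be made on the constant's `APInt` before calling `getSExtValue`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a constant wider than 64 bits: a 128-bit
// two's-complement value split into halves. Not an LLVM type.
struct Wide128 {
  uint64_t Lo; // low 64 bits
  int64_t Hi;  // high 64 bits (carries the sign)
};

// Local helper for the sketch: true for 1, 2, 4, 8, ...
static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Writes the value to Out and returns true only if it is representable as
// int64_t, i.e. the high half is a pure sign extension of the low half.
static bool trySExt64(Wide128 V, int64_t &Out) {
  int64_t Expected = (V.Lo >> 63) ? -1 : 0;
  if (V.Hi != Expected)
    return false;
  Out = static_cast<int64_t>(V.Lo);
  return true;
}

// Guarded version of the check: bail out before extracting the immediate.
static bool isCheapMulImm(Wide128 C) {
  int64_t Imm = 0;
  if (!trySExt64(C, Imm))
    return false; // too wide; an unguarded extraction would assert here
  // Do the +/-1 in unsigned arithmetic so the extremes cannot overflow.
  uint64_t U = static_cast<uint64_t>(Imm);
  return isPowerOf2(U + 1) || isPowerOf2(U - 1);
}
```

Returning `false` for an over-wide constant simply declines the optimisation, which seems the conservative choice if the hook can run before type legalisation.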
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82660/new/ https://reviews.llvm.org/D82660 From llvm-commits at lists.llvm.org Tue Jul 7 01:22:05 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Tue, 07 Jul 2020 01:22:05 -0700 (PDT) Subject: [llvm] c061e56 - [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll Message-ID: <5f0430ad.1c69fb81.bbad3.4ed9@mx.google.com> Author: David Sherwood Date: 2020-07-07T09:21:47+01:00 New Revision: c061e56e880a20488e0f7e6cf9975aa24b83067c URL: https://github.com/llvm/llvm-project/commit/c061e56e880a20488e0f7e6cf9975aa24b83067c DIFF: https://github.com/llvm/llvm-project/commit/c061e56e880a20488e0f7e6cf9975aa24b83067c.diff LOG: [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll This patch fixes all remaining warnings in: llvm/test/CodeGen/AArch64/sve-trunc.ll llvm/test/CodeGen/AArch64/sve-vector-splat.ll I hit some warnings related to getCopyPartsToVector. I fixed two issues: 1. In widenVectorToPartType() we assumed that we'd always be using BUILD_VECTOR nodes to expand from one vector type to another, which is incorrect for scalable vector types. I've fixed this for now by simply bailing out immediately for scalable vectors. 2. In getCopyToPartsVector() I've changed the code to compare the element counts of different types. 
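
As a rough standalone illustration of item 2, the sketch below models an element count as a minimum plus a scalable flag. `ElemCount`, `sameElementCount` and `sameMinOnly` are hypothetical names for this note, not the LLVM classes or the code in the patch; the assumption is only that a scalable vector holds its minimum count times an unknown runtime factor.

```cpp
// Hypothetical model of a vector element count: Min elements, multiplied by
// an unknown runtime factor when Scalable is set. Not the LLVM class.
struct ElemCount {
  unsigned Min;
  bool Scalable;
};

// Two counts match only if both the minimum and the scalable flag agree.
bool sameElementCount(ElemCount A, ElemCount B) {
  return A.Min == B.Min && A.Scalable == B.Scalable;
}

// Comparing only the minimum treats a fixed 4-element vector and a scalable
// "at least 4" vector as equal, which is the mix-up a count comparison avoids.
bool sameMinOnly(ElemCount A, ElemCount B) { return A.Min == B.Min; }
```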
Differential Revision: https://reviews.llvm.org/D83028 Added: Modified: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/CodeGen/AArch64/sve-trunc.ll llvm/test/CodeGen/AArch64/sve-vector-splat.ll Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index a42dc3367097..05813296997e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -656,7 +656,7 @@ static void getCopyToParts(SelectionDAG &DAG, const SDLoc &DL, SDValue Val, static SDValue widenVectorToPartType(SelectionDAG &DAG, SDValue Val, const SDLoc &DL, EVT PartVT) { - if (!PartVT.isVector()) + if (!PartVT.isFixedLengthVector()) return SDValue(); EVT ValueVT = Val.getValueType(); @@ -702,8 +702,9 @@ static void getCopyToPartsVector(SelectionDAG &DAG, const SDLoc &DL, Val = Widened; } else if (PartVT.isVector() && PartEVT.getVectorElementType().bitsGE( - ValueVT.getVectorElementType()) && - PartEVT.getVectorNumElements() == ValueVT.getVectorNumElements()) { + ValueVT.getVectorElementType()) && + PartEVT.getVectorElementCount() == + ValueVT.getVectorElementCount()) { // Promoted vector extract Val = DAG.getAnyExtOrTrunc(Val, DL, PartVT); diff --git a/llvm/test/CodeGen/AArch64/sve-trunc.ll b/llvm/test/CodeGen/AArch64/sve-trunc.ll index c78f036078ba..50ce4d966087 100644 --- a/llvm/test/CodeGen/AArch64/sve-trunc.ll +++ b/llvm/test/CodeGen/AArch64/sve-trunc.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; For all the functions below should the operation is a nop define @trunc_i16toi8( %in) { 
diff --git a/llvm/test/CodeGen/AArch64/sve-vector-splat.ll b/llvm/test/CodeGen/AArch64/sve-vector-splat.ll index af43f8fc97e9..cd7ecbeb5ca1 100644 --- a/llvm/test/CodeGen/AArch64/sve-vector-splat.ll +++ b/llvm/test/CodeGen/AArch64/sve-vector-splat.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ;; Splats of legal integer vector types From llvm-commits at lists.llvm.org Tue Jul 7 01:22:09 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:22:09 +0000 (UTC) Subject: [PATCH] D83028: [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll In-Reply-To: References: Message-ID: <350e731f0334c23686f7abff2f108b25@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGc061e56e880a: [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll (authored by david-arm). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83028/new/ https://reviews.llvm.org/D83028 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/CodeGen/AArch64/sve-trunc.ll llvm/test/CodeGen/AArch64/sve-vector-splat.ll Index: llvm/test/CodeGen/AArch64/sve-vector-splat.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-vector-splat.ll +++ llvm/test/CodeGen/AArch64/sve-vector-splat.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ;; Splats of legal integer vector types Index: llvm/test/CodeGen/AArch64/sve-trunc.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-trunc.ll +++ llvm/test/CodeGen/AArch64/sve-trunc.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; For all the functions below should the operation is a nop define @trunc_i16toi8( %in) { Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -656,7 +656,7 @@ static SDValue widenVectorToPartType(SelectionDAG &DAG, SDValue Val, const SDLoc &DL, EVT PartVT) { - if (!PartVT.isVector()) + if (!PartVT.isFixedLengthVector()) return SDValue(); EVT ValueVT = Val.getValueType(); @@ -702,8 +702,9 @@ Val = Widened; } else if (PartVT.isVector() && PartEVT.getVectorElementType().bitsGE( - 
ValueVT.getVectorElementType()) && - PartEVT.getVectorNumElements() == ValueVT.getVectorNumElements()) { + ValueVT.getVectorElementType()) && + PartEVT.getVectorElementCount() == + ValueVT.getVectorElementCount()) { // Promoted vector extract Val = DAG.getAnyExtOrTrunc(Val, DL, PartVT); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83028.275937.patch Type: text/x-patch Size: 2218 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 01:26:52 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:26:52 +0000 (UTC) Subject: [PATCH] D83037: [llvm-readobj] - Fix a crash scenario in GNUStyle::printHashSymbols(). In-Reply-To: References: Message-ID: <2b7db9277777b7d4f7dce05961ed3071@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM, thanks! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83037/new/ https://reviews.llvm.org/D83037 From llvm-commits at lists.llvm.org Tue Jul 7 01:29:00 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:29:00 +0000 (UTC) Subject: [PATCH] D83283: [Attributor] AAPotentialValues Interface Message-ID: okura created this revision. okura added a reviewer: jdoerfert. Herald added subscribers: llvm-commits, bbn, kuter, uenoku, hiraditya. Herald added a reviewer: sstefan1. Herald added a reviewer: uenoku. Herald added a reviewer: homerdin. Herald added a reviewer: baziotis. Herald added a project: LLVM. This is a split patch of D80991 . This patch introduces AAPotentialValues and its interface only. For more detail of AAPotentialValues abstract attribute, see the original patch. 
https://reviews.llvm.org/D83283 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/Attributor.cpp llvm/lib/Transforms/IPO/AttributorAttributes.cpp llvm/test/Transforms/Attributor/dereferenceable-1.ll llvm/test/Transforms/Attributor/lvi-after-jumpthreading.ll llvm/test/Transforms/Attributor/potential.ll llvm/test/Transforms/Attributor/value-simplify.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83283.275936.patch Type: text/x-patch Size: 36337 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 01:29:36 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:29:36 +0000 (UTC) Subject: [PATCH] D83232: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. In-Reply-To: References: Message-ID: <41817e0d4109918082bf662426e8ec4b@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test:499-512 + - Tag: DT_RELA + Value: 0x0 + - Tag: DT_RELASZ + Value: 0x18 + - Tag: DT_RELAENT + Value: 0x18 + - Tag: DT_REL ---------------- The input here is a little subtle, i.e. reusing the same data for both REL and RELA relocations. Perhaps it would be clearer to have them point at different sections? If that's not possible, it might be worth pointing out how this works in a comment. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83232/new/ https://reviews.llvm.org/D83232 From llvm-commits at lists.llvm.org Tue Jul 7 01:32:01 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:32:01 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <820a8ec5ca66fc2cec379fa9b2eca5c8@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. 
This revision is now accepted and ready to land. LGTM, but please give others a chance to review too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Tue Jul 7 01:32:27 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:32:27 +0000 (UTC) Subject: [PATCH] D82524: [SVE][CodeGen] Fix bug when falling back to DAG ISel In-Reply-To: References: Message-ID: <59ad10ee74c1cb21b1b560a4b25e9176@localhost.localdomain> This revision was automatically updated to reflect the committed changes. david-arm marked an inline comment as done. Closed by commit rG79d34a5a1bce: [SVE][CodeGen] Fix bug when falling back to DAG ISel (authored by david-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82524/new/ https://reviews.llvm.org/D82524 Files: llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp llvm/lib/Target/AArch64/GISel/AArch64CallLowering.h llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82524.275939.patch Type: text/x-patch Size: 4885 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 01:32:27 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Tue, 07 Jul 2020 01:32:27 -0700 (PDT) Subject: [llvm] 79d34a5 - [SVE][CodeGen] Fix bug when falling back to DAG ISel Message-ID: <5f04331b.1c69fb81.7c16d.68ae@mx.google.com> Author: David Sherwood Date: 2020-07-07T09:23:04+01:00 New Revision: 79d34a5a1bce39d8153be3665456e9cb0cc8601b URL: https://github.com/llvm/llvm-project/commit/79d34a5a1bce39d8153be3665456e9cb0cc8601b DIFF: https://github.com/llvm/llvm-project/commit/79d34a5a1bce39d8153be3665456e9cb0cc8601b.diff LOG: [SVE][CodeGen] Fix bug when falling back to DAG ISel In an earlier commit 584d0d5c1749c13625a5d322178ccb4121eea610 I added functionality to allow AArch64 CodeGen support for falling back to DAG ISel when Global ISel encounters scalable vector types. However, it seems that we were not falling back early enough as llvm::getLLTForType was still being invoked for scalable vector types. I've added a new fallback function to the call lowering class in order to catch this problem early enough, rather than wait for lowerFormalArguments to reject scalable vector types. 
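The early-fallback idea described in the log above can be sketched in isolation roughly as follows. This is only an illustrative model, not the committed code: `ToyType`, `ToyFunction` and `translateFunction` are hypothetical stand-ins invented for this sketch (the real hook works on `llvm::Function` and LLVM's type classes), and the only assumption carried over from the log is that the check runs on the whole signature before any argument lowering starts.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for LLVM's type and function classes; they carry
// just enough information to show the control flow.
struct ToyType {
  bool IsScalableVector;
};

struct ToyFunction {
  ToyType ReturnType;
  std::vector<ToyType> ArgTypes;
};

// Shaped like the new target hook: answer "fall back" as soon as the
// signature mentions a scalable vector, in the return type or any argument.
bool fallBackToDAGISel(const ToyFunction &F) {
  if (F.ReturnType.IsScalableVector)
    return true;
  return std::any_of(F.ArgTypes.begin(), F.ArgTypes.end(),
                     [](const ToyType &T) { return T.IsScalableVector; });
}

// Hypothetical driver: the early check runs first, so the later lowering
// steps never have to handle a scalable vector type themselves.
bool translateFunction(const ToyFunction &F) {
  if (fallBackToDAGISel(F))
    return false; // caller is expected to retry with the DAG-based selector
  // ... lower formal arguments and translate the body here ...
  return true;
}
```

In this sketch a `false` result from `translateFunction` plays the role of the translation failure that triggers the fallback; checking the return type as well as the arguments is what lets a function with only a scalable return value be rejected early too.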
Differential Revision: https://reviews.llvm.org/D82524 Added: Modified: llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp llvm/lib/Target/AArch64/GISel/AArch64CallLowering.h llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h b/llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h index 88a1837665aa..4d60dffb91db 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h @@ -290,6 +290,8 @@ class CallLowering { return false; } + virtual bool fallBackToDAGISel(const Function &F) const { return false; } + /// This hook must be implemented to lower the incoming (formal) /// arguments, described by \p VRegs, for GlobalISel. Each argument /// must end up in the related virtual registers described by \p VRegs. diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp index 4d950d4d2038..0171d6cb18ca 100644 --- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp +++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp @@ -2384,6 +2384,14 @@ bool IRTranslator::runOnMachineFunction(MachineFunction &CurMF) { // Make our arguments/constants entry block fallthrough to the IR entry block. EntryBB->addSuccessor(&getMBB(F.front())); + if (CLI->fallBackToDAGISel(F)) { + OptimizationRemarkMissed R("gisel-irtranslator", "GISelFailure", + F.getSubprogram(), &F.getEntryBlock()); + R << "unable to lower function: " << ore::NV("Prototype", F.getType()); + reportTranslationError(*MF, *TPC, *ORE, R); + return false; + } + // Lower the actual args into this basic block. 
SmallVector, 8> VRegArgs; for (const Argument &Arg: F.args()) { diff --git a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp index 7903299fcc33..ec9683a560f8 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp @@ -427,6 +427,14 @@ static void handleMustTailForwardedRegisters(MachineIRBuilder &MIRBuilder, } } +bool AArch64CallLowering::fallBackToDAGISel(const Function &F) const { + if (isa(F.getReturnType())) + return true; + return llvm::any_of(F.args(), [](const Argument &A) { + return isa(A.getType()); + }); +} + bool AArch64CallLowering::lowerFormalArguments( MachineIRBuilder &MIRBuilder, const Function &F, ArrayRef> VRegs) const { @@ -438,9 +446,6 @@ bool AArch64CallLowering::lowerFormalArguments( SmallVector SplitArgs; unsigned i = 0; for (auto &Arg : F.args()) { - if (isa(Arg.getType())) - return false; - if (DL.getTypeStoreSize(Arg.getType()).isZero()) continue; diff --git a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.h b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.h index b0c601c7062c..640a86253059 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.h +++ b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.h @@ -37,6 +37,8 @@ class AArch64CallLowering: public CallLowering { ArrayRef VRegs, Register SwiftErrorVReg) const override; + bool fallBackToDAGISel(const Function &F) const override; + bool lowerFormalArguments(MachineIRBuilder &MIRBuilder, const Function &F, ArrayRef> VRegs) const override; diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll index ca628fa45f6d..8287ab716a80 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll @@ -220,21 +220,30 @@ entry: ret void } -; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to lower 
arguments{{.*}}scalable_arg +; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to lower function{{.*}}scalable_arg ; FALLBACK-WITH-REPORT-OUT-LABEL: scalable_arg define @scalable_arg( %pred, i8* %addr) #1 { %res = call @llvm.aarch64.sve.ld1.nxv16i8( %pred, i8* %addr) ret %res } -; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to translate instruction{{.*}}scalable_call -; FALLBACK-WITH-REPORT-OUT-LABEL: scalable_call -define @scalable_call(i8* %addr) #1 { +; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to lower function{{.*}}scalable_ret +; FALLBACK-WITH-REPORT-OUT-LABEL: scalable_ret +define @scalable_ret(i8* %addr) #1 { %pred = call @llvm.aarch64.sve.ptrue.nxv16i1(i32 0) %res = call @llvm.aarch64.sve.ld1.nxv16i8( %pred, i8* %addr) ret %res } +; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to translate instruction{{.*}}scalable_call +; FALLBACK-WITH-REPORT-OUT-LABEL: scalable_call +define i8 @scalable_call(i8* %addr) #1 { + %pred = call @llvm.aarch64.sve.ptrue.nxv16i1(i32 0) + %vec = call @llvm.aarch64.sve.ld1.nxv16i8( %pred, i8* %addr) + %res = extractelement %vec, i32 0 + ret i8 %res +} + attributes #1 = { "target-features"="+sve" } declare @llvm.aarch64.sve.ptrue.nxv16i1(i32 %pattern) From llvm-commits at lists.llvm.org Tue Jul 7 01:32:50 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:32:50 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: vitalybuka added a comment. LGTM ================ Comment at: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:1833 + Attrs.getAttributes(AttributeList::ReturnIndex)) + .addAttributes( + B.getContext(), AttributeList::FirstArgIndex + 0, ---------------- does this assign attributes of x to the 1.0 constant? why? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Mon Jul 6 12:42:58 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:42:58 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: arsenm added inline comments. ================ Comment at: llvm/lib/CodeGen/LiveIntervals.cpp:1014 updateRange(LI, Reg, LaneBitmask::getNone()); + for (LiveInterval::SubRange &S : LI.subranges()) { + if (LI.covers(S)) ---------------- Needs a comment explaining this CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 From llvm-commits at lists.llvm.org Mon Jul 6 12:53:32 2020 From: llvm-commits at lists.llvm.org (Sebastian Neubauer via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 19:53:32 +0000 (UTC) Subject: [PATCH] D81728: [InstCombine] Add target-specific inst combining In-Reply-To: References: Message-ID: <7c091cc74958be98f1e773d9ca49725e@localhost.localdomain> Flakebi updated this revision to Diff 275617. Flakebi added a comment. Rebased and removed a few includes as suggested. Make the TargetTransformInfo a private member of InstCombiner because it should not be used in general inst combines. Move CreateOverflowTuple out of InstCombiner and make CreateNonTerminatorUnreachable static. > I would really rather not make this be a public class - this is a very thick interface. Can this be cut down to something much smaller than the implementation details of InstCombine? I agree that keeping the public interface small is desirable, and I tried to do that by splitting the class into `InstCombiner` – the internal, public interface – and `InstCombinerImpl` – the actual implementation of the pass. 
As far as I understand it, `LLVM_LIBRARY_VISIBILITY` hides this class so it is not visible outside LLVM? With this change, inst combining is split across several places, the general InstCombine and all the targets. They do similar things with the difference that the inst combining part inside the targets does only have access to the public `InstCombiner` interface. As the target specific parts want to use the same helper methods, these helpers need to be in a public interface (public to the targets, not to LLVM users). The most prominent of these helpers is `peekThroughBitcast`. Some of these helper functions are currently not used by targets, so they can be moved to a utils header if desired. In general, I think we want them to be shared, so that not every target has its own set of helpers. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81728/new/ https://reviews.llvm.org/D81728 Files: clang/test/CodeGen/thinlto-distributed-newpm.ll llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/IR/Function.h llvm/include/llvm/Transforms/InstCombine/InstCombiner.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/IR/Function.cpp llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/AMDGPU/CMakeLists.txt llvm/lib/Target/AMDGPU/InstCombineTables.td llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/X86/CMakeLists.txt llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp llvm/lib/Target/X86/X86TargetTransformInfo.h llvm/lib/Transforms/InstCombine/CMakeLists.txt llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp 
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp llvm/lib/Transforms/InstCombine/InstCombineInternal.h llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp llvm/lib/Transforms/InstCombine/InstCombineTables.td llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp llvm/lib/Transforms/InstCombine/InstructionCombining.cpp llvm/test/CodeGen/Thumb2/mve-intrinsics/predicates.ll llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc-multiple.ll llvm/test/CodeGen/Thumb2/mve-vpt-from-intrinsics.ll llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll llvm/test/Transforms/InstCombine/AMDGPU/ldexp.ll llvm/test/Transforms/InstCombine/ARM/mve-v2i2v.ll llvm/test/Transforms/InstCombine/ARM/neon-intrinsics.ll llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll llvm/test/Transforms/InstCombine/X86/X86FsubCmpCombine.ll llvm/test/Transforms/InstCombine/X86/addcarry.ll llvm/test/Transforms/InstCombine/X86/clmulqdq.ll llvm/test/Transforms/InstCombine/X86/x86-avx2.ll llvm/test/Transforms/InstCombine/X86/x86-avx512.ll llvm/test/Transforms/InstCombine/X86/x86-bmi-tbm.ll llvm/test/Transforms/InstCombine/X86/x86-insertps.ll llvm/test/Transforms/InstCombine/X86/x86-masked-memops.ll llvm/test/Transforms/InstCombine/X86/x86-movmsk.ll llvm/test/Transforms/InstCombine/X86/x86-pack.ll llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll 
llvm/test/Transforms/InstCombine/X86/x86-sse.ll llvm/test/Transforms/InstCombine/X86/x86-sse2.ll llvm/test/Transforms/InstCombine/X86/x86-sse41.ll llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll llvm/test/Transforms/InstCombine/X86/x86-vec_demanded_elts.ll llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll llvm/test/Transforms/InstCombine/X86/x86-xop.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81728.275617.patch Type: text/x-patch Size: 484375 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:14:51 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:14:51 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: <52b36ca089ba6c97fc2e5621b0e75e03@localhost.localdomain> rampitec updated this revision to Diff 275817. rampitec marked an inline comment as done. rampitec added a comment. Added comment. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 Files: llvm/lib/CodeGen/LiveIntervals.cpp llvm/unittests/MI/LiveIntervalTest.cpp Index: llvm/unittests/MI/LiveIntervalTest.cpp =================================================================== --- llvm/unittests/MI/LiveIntervalTest.cpp +++ llvm/unittests/MI/LiveIntervalTest.cpp @@ -499,6 +499,26 @@ }); } +TEST(LiveIntervalTest, TestMoveSubRegUseAcrossMainRangeHole) { + liveIntervalTest(R"MIR( + %1:sgpr_128 = IMPLICIT_DEF + bb.1: + %2:sgpr_32 = COPY %1.sub2 + %3:sgpr_32 = COPY %1.sub1 + %1.sub2 = COPY %2 + undef %1.sub0 = IMPLICIT_DEF + %1.sub2 = IMPLICIT_DEF + S_CBRANCH_SCC1 %bb.1, implicit undef $scc + S_BRANCH %bb.2 + bb.2: +)MIR", [](MachineFunction &MF, LiveIntervals &LIS) { + MachineInstr &MI = getMI(MF, 3, /*BlockNum=*/1); + MI.getOperand(0).setIsUndef(false); + testHandleMove(MF, LIS, 4, 3, 1); + testHandleMove(MF, LIS, 1, 4, 1); + }); +} + TEST(LiveIntervalTest, BundleUse) { liveIntervalTest(R"MIR( %0 = IMPLICIT_DEF Index: llvm/lib/CodeGen/LiveIntervals.cpp =================================================================== --- llvm/lib/CodeGen/LiveIntervals.cpp +++ llvm/lib/CodeGen/LiveIntervals.cpp @@ -1011,6 +1011,20 @@ } } updateRange(LI, Reg, LaneBitmask::getNone()); + // If main range has a hole and we are moving a subrange use across + // the hole updateRange() cannot properly handle it since it only + // gets the LiveRange and not the whole LiveInterval. As a result + // we may end up with a main range not covering all subranges. + // This is extremely rare case, so let's check and reconstruct the + // main range. + for (LiveInterval::SubRange &S : LI.subranges()) { + if (LI.covers(S)) + continue; + LI.clear(); + LIS.constructMainRangeFromSubranges(LI); + break; + } + continue; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82916.275817.patch Type: text/x-patch Size: 1851 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Mon Jul 6 13:44:01 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 20:44:01 +0000 (UTC) Subject: [PATCH] D83078: [AMDGPU] Tweak getTypeLegalizationCost() In-Reply-To: References: Message-ID: <06ac03a8db330d8cde691ea75c3adcc3@localhost.localdomain> arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. The adds shouldn't really increase in cost, but the concept of type legality is broken anyway CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83078/new/ https://reviews.llvm.org/D83078 From llvm-commits at lists.llvm.org Mon Jul 6 14:08:12 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Mon, 06 Jul 2020 21:08:12 +0000 (UTC) Subject: [PATCH] D83078: [AMDGPU] Tweak getTypeLegalizationCost() In-Reply-To: References: Message-ID: <927472e38be754e5fc89e21b67ee9ac8@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf7a7efbf88b7: [AMDGPU] Tweak getTypeLegalizationCost() (authored by rampitec). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83078/new/ https://reviews.llvm.org/D83078 Files: llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIISelLowering.h llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll llvm/test/Analysis/CostModel/AMDGPU/mul.ll Index: llvm/test/Analysis/CostModel/AMDGPU/mul.ll =================================================================== --- llvm/test/Analysis/CostModel/AMDGPU/mul.ll +++ llvm/test/Analysis/CostModel/AMDGPU/mul.ll @@ -90,7 +90,7 @@ ; ALL: 'mul_v8i64' -; ALL: estimated cost of 128 for {{.*}} mul <8 x i64> +; ALL: estimated cost of 256 for {{.*}} mul <8 x i64> define amdgpu_kernel void @mul_v8i64(<8 x i64> addrspace(1)* %out, <8 x i64> addrspace(1)* %vaddr, <8 x i64> %b) #0 { %vec = load <8 x i64>, <8 x i64> addrspace(1)* %vaddr %mul = mul <8 x i64> %vec, %b Index: llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll =================================================================== --- llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll +++ llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll @@ -90,7 +90,7 @@ } ; ALL: 'add_v16i64' -; ALL: estimated cost of 32 for {{.*}} add <16 x i64> +; ALL: estimated cost of 128 for {{.*}} add <16 x i64> define amdgpu_kernel void @add_v16i64(<16 x i64> addrspace(1)* %out, <16 x i64> addrspace(1)* %vaddr, <16 x i64> %b) #0 { %vec = load <16 x i64>, <16 x i64> addrspace(1)* %vaddr %add = add <16 x i64> %vec, %b Index: llvm/lib/Target/AMDGPU/SIISelLowering.h =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.h +++ llvm/lib/Target/AMDGPU/SIISelLowering.h @@ -464,6 +464,9 @@ MachineFunction &MF, const SIRegisterInfo &TRI, SIMachineFunctionInfo &Info) const; + + std::pair getTypeLegalizationCost(const DataLayout &DL, + Type *Ty) const; }; } // End namespace llvm Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp =================================================================== --- 
llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11690,3 +11690,18 @@ SmallPtrSet Visited; return hasCFUser(V, Visited, Subtarget->getWavefrontSize()); } + +std::pair +SITargetLowering::getTypeLegalizationCost(const DataLayout &DL, + Type *Ty) const { + auto Cost = TargetLoweringBase::getTypeLegalizationCost(DL, Ty); + auto Size = DL.getTypeSizeInBits(Ty); + // Maximum load or store can handle 8 dwords for scalar and 4 for + // vector ALU. Let's assume anything above 8 dwords is expensive + // even if legal. + if (Size <= 256) + return Cost; + + Cost.first = (Size + 255) / 256; + return Cost; +} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83078.275829.patch Type: text/x-patch Size: 2613 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 00:29:49 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 07:29:49 +0000 (UTC) Subject: [PATCH] D83078: [AMDGPU] Tweak getTypeLegalizationCost() In-Reply-To: References: Message-ID: <9f2d503adc832421340d0e3c5a4c85c0@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGf7a7efbf88b7: [AMDGPU] Tweak getTypeLegalizationCost() (authored by rampitec). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83078/new/ https://reviews.llvm.org/D83078 Files: llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIISelLowering.h llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll llvm/test/Analysis/CostModel/AMDGPU/mul.ll Index: llvm/test/Analysis/CostModel/AMDGPU/mul.ll =================================================================== --- llvm/test/Analysis/CostModel/AMDGPU/mul.ll +++ llvm/test/Analysis/CostModel/AMDGPU/mul.ll @@ -90,7 +90,7 @@ ; ALL: 'mul_v8i64' -; ALL: estimated cost of 128 for {{.*}} mul <8 x i64> +; ALL: estimated cost of 256 for {{.*}} mul <8 x i64> define amdgpu_kernel void @mul_v8i64(<8 x i64> addrspace(1)* %out, <8 x i64> addrspace(1)* %vaddr, <8 x i64> %b) #0 { %vec = load <8 x i64>, <8 x i64> addrspace(1)* %vaddr %mul = mul <8 x i64> %vec, %b Index: llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll =================================================================== --- llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll +++ llvm/test/Analysis/CostModel/AMDGPU/add-sub.ll @@ -90,7 +90,7 @@ } ; ALL: 'add_v16i64' -; ALL: estimated cost of 32 for {{.*}} add <16 x i64> +; ALL: estimated cost of 128 for {{.*}} add <16 x i64> define amdgpu_kernel void @add_v16i64(<16 x i64> addrspace(1)* %out, <16 x i64> addrspace(1)* %vaddr, <16 x i64> %b) #0 { %vec = load <16 x i64>, <16 x i64> addrspace(1)* %vaddr %add = add <16 x i64> %vec, %b Index: llvm/lib/Target/AMDGPU/SIISelLowering.h =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.h +++ llvm/lib/Target/AMDGPU/SIISelLowering.h @@ -464,6 +464,9 @@ MachineFunction &MF, const SIRegisterInfo &TRI, SIMachineFunctionInfo &Info) const; + + std::pair getTypeLegalizationCost(const DataLayout &DL, + Type *Ty) const; }; } // End namespace llvm Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp =================================================================== --- 
llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11690,3 +11690,18 @@ SmallPtrSet Visited; return hasCFUser(V, Visited, Subtarget->getWavefrontSize()); } + +std::pair +SITargetLowering::getTypeLegalizationCost(const DataLayout &DL, + Type *Ty) const { + auto Cost = TargetLoweringBase::getTypeLegalizationCost(DL, Ty); + auto Size = DL.getTypeSizeInBits(Ty); + // Maximum load or store can handle 8 dwords for scalar and 4 for + // vector ALU. Let's assume anything above 8 dwords is expensive + // even if legal. + if (Size <= 256) + return Cost; + + Cost.first = (Size + 255) / 256; + return Cost; +} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83078.275634.patch Type: text/x-patch Size: 2613 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 01:33:49 2020 From: llvm-commits at lists.llvm.org (Peter Smith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:33:49 +0000 (UTC) Subject: [PATCH] D83243: [ELF] Rename canRelax to sharedToExecRelax. NFC In-Reply-To: References: Message-ID: <30a39b9a856cf8529643d16356268fae@localhost.localdomain> psmith accepted this revision. psmith added a comment. This revision is now accepted and ready to land. LGTM it looks like a good simplification. Worth waiting a little bit to see if there are any objections. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83243/new/ https://reviews.llvm.org/D83243 From llvm-commits at lists.llvm.org Tue Jul 7 01:34:08 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:34:08 +0000 (UTC) Subject: [PATCH] D83282: [DWARFYAML] Refactor: Pull out member functions to DWARFYAMLUtils.cpp. In-Reply-To: References: Message-ID: <7250034af5c9ed6adaa1ff65207ae98d@localhost.localdomain> grimar added a comment. 
So, currently, for example, `DWARFYAML::Data` is defined in `DWARFYAML.h` (https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ObjectYAML/DWARFYAML.h#L180) and implemented in `DWARFYAML.cpp`. With this change the implementation of members is moved to a new `DWARFYAMLUtils.cpp` file. I.e. we have `DWARFYAML.h` + `DWARFYAMLUtils.cpp` with this patch. Honestly, it looks a bit unnatural to me, because it breaks the straightforward common approach: "define in a header, implement in a cpp with the same name". You're saying that "In the future, we will add some helper functions that make DWARF sections interlinked". Perhaps I'd wait for these functions to appear and maybe then create `DWARFYAMLUtils.h` + `DWARFYAMLUtils.cpp` for them. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83282/new/ https://reviews.llvm.org/D83282 From llvm-commits at lists.llvm.org Tue Jul 7 01:34:27 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:34:27 +0000 (UTC) Subject: [PATCH] D80991: [WIP][Attributor] AAPotentialValues Attribute In-Reply-To: References: Message-ID: <5d515655321a330f62f1cab61fb17109@localhost.localdomain> okura added a comment. Herald added a subscriber: bbn. This patch is too large and hard to review, so I split part of it off into D83283. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80991/new/ https://reviews.llvm.org/D80991 From llvm-commits at lists.llvm.org Tue Jul 7 01:35:24 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:35:24 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <39f95cc6ae3f7926b2ffb1b2667d16d3@localhost.localdomain> hans added a comment. > I don't want to block this patch, but I do agree with Eric's point. We *really* want to focus more on the switch then invest into more LPM infra. 
Short term resolutions to unblock folks, with the best split possible, sure, keeping in mind they'll need to be cleaned up. Sounds good to me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 01:35:40 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:35:40 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: hans accepted this revision. hans added a comment. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 01:37:32 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:37:32 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/mips-plt.test:64 +# GNU-NEXT: 0041081c 004007c0 00000000 FUNC UND puts +# GNU-NEXT: 00410820 004007c0 00000000 FUNC UND __libc_start_main + ---------------- MaskRay wrote: > `# ` prepending can be committed separately. +1 to this (in both tests) ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:3077-3084 + if (!PltSec) + return createError("There is no not empty PLTGOT section at 0x " + + Twine::utohexstr(*DtMipsPltGot)); + + PltRelSec = findNotEmptySectionByAddress(Obj, FileName, *DtJmpRel); + if (!PltRelSec) + return createError("There is no not empty RELPLT section at 0x" + ---------------- These two cases don't appear to have tests? Also, the errors should be "not empty" -> "non-empty" and have lower-case first letters. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 From llvm-commits at lists.llvm.org Tue Jul 7 01:39:19 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Tue, 07 Jul 2020 01:39:19 -0700 (PDT) Subject: [llvm] 9a1a7d8 - [SVE] Add more warnings checks to clang and LLVM SVE tests Message-ID: <5f0434b7.1c69fb81.5a597.4dfb@mx.google.com> Author: David Sherwood Date: 2020-07-07T09:33:20+01:00 New Revision: 9a1a7d888b53ebe5a934a8193de37da86e276f1e URL: https://github.com/llvm/llvm-project/commit/9a1a7d888b53ebe5a934a8193de37da86e276f1e DIFF: https://github.com/llvm/llvm-project/commit/9a1a7d888b53ebe5a934a8193de37da86e276f1e.diff LOG: [SVE] Add more warnings checks to clang and LLVM SVE tests There are now more SVE tests in LLVM and Clang that do not emit warnings related to invalid use of EVT::getVectorNumElements() and VectorType::getNumElements(). For these tests I have added additional checks that there are no warnings in order to prevent any future regressions. 
Differential Revision: https://reviews.llvm.org/D82943 Added: Modified: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll llvm/test/CodeGen/AArch64/sve-fcmp.ll llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll llvm/test/CodeGen/AArch64/sve-gep.ll llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll 
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll Removed: ################################################################################ diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c index 7db98086d286..d1752815bcee 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c index 89d16d955f8d..0377105a1ff0 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c index a45bf6b01710..f411cd2cf627 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c index ed6750ab2910..cbbd6ed94753 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c index e6b7b65513b3..269801c71fdc 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c index 19695759167f..56c761c77318 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c index 7512c05856a1..f49520b3e360 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c index 4fa8e656f964..cb93f1ece5fb 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c index 37cbd818de76..b19d51555956 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c index b8cbaea05de0..272af486f895 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c index 4727cb1177a7..bc9d506b5033 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c index a3b8f3c91600..5475d55564a6 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c index 5412915dcc32..d604d13356e5 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c index d174e5ee79d0..f4d8478ec83e 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include svint8_t test_svindex_s8(int8_t base, int8_t step) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c index c6729546606e..7a108331c940 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c index 758295fe554d..6475b19ab653 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c index 749b3d0dcd0d..3f4db7aec244 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c index 4f2d97d6a0e6..e3209fa8d924 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c index 6aa806063dd9..9219b687bd2c 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c index 0e7d62480805..1c2b48becbb9 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c index b02f28958f0b..0915a8ae4959 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c index c5841824bfc2..5c1fd27c3bbb 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c index 4595867811b2..b7892b96d0bb 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c index cffc1080d036..108980c19043 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c index 83c20e4ac510..2a267e8301c8 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c index a92ba68d9a72..6865c3ee6258 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c index ebde19c625ef..841da37bc12f 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include svint16_t test_svldnf1sb_s16(svbool_t pg, const int8_t *base) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c index d862954855cf..e5a1666abd60 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include svint32_t test_svldnf1sh_s32(svbool_t pg, const int16_t *base) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c index eac86a59a42b..addb6825aa37 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include svint64_t test_svldnf1sw_s64(svbool_t pg, const int32_t *base) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c index fb0abc4ed6e2..63ea57c43c55 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include svint16_t test_svldnf1ub_s16(svbool_t pg, const uint8_t *base) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c index 624aefd6ff27..3d70bba7f7d4 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include svint32_t test_svldnf1uh_s32(svbool_t pg, const uint16_t *base) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c index b1f9e5398fa0..685a563c99ff 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include svint64_t test_svldnf1uw_s64(svbool_t pg, const uint32_t *base) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c index eeaae1e104f7..4d023fa34b8f 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include svbool_t test_svpnext_b8(svbool_t pg, svbool_t op) diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c index 496fea8a2051..98ea771ddf54 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include svbool_t test_svptrue_b8() diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c index 336e00bc2ab9..f48c6c71f496 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c index 773eda97a7f9..3a9dcf0c739f 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c @@ -1,7 +1,11 @@ // REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t // CHECK-NOT: warning +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include void test_svsetffr() diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c index 0e9b61cb3c32..7e2e8e133bda 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve 
it. +// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c index 51bf31c6c744..704b8d10f715 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c index b60e083c24f3..581ea7a050d3 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c @@ -1,5 +1,10 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include svint8_t test_svundef_s8() diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c index 72c40dc9f82c..cae68a3e0a85 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c index 44598c0a84b1..a73419779a28 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c index fb35121df82c..6eea3bc17d98 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c index 204e94e3dc48..645e96c4e55f 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c index a322612653c1..6aa30f2ef59b 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c index a689c8921048..6904dbc079b7 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c index e438a6ac4d09..218f63764bec 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. 
+// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c index bab6ea8ed532..099d78958697 100644 --- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c +++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c @@ -1,6 +1,11 @@ +// REQUIRES: aarch64-registered-target // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -o - %s >/dev/null 2>%t +// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t +// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it. +// ASM-NOT: warning #include #ifdef SVE_OVERLOADED_FORMS diff --git a/llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll b/llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll index ca29e15697fe..caa8d32186f4 100644 --- a/llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll +++ b/llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll @@ -1,6 +1,9 @@ ; Because some arguments are passed by reference (through stack), ; the compiler should not do tail-call optimization. 
-; RUN: llc -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64 -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; CHECK-LABEL: caller: ; CHECK: addvl sp, sp, #-[[STACKSIZE:[0-9]+]] diff --git a/llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll b/llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll index bbb8209941b0..d579ba08b59b 100644 --- a/llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll +++ b/llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -stop-after=finalize-isel < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -stop-after=finalize-isel < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; Test that z8 and z9, passed in by reference, are correctly loaded from x0 and x1. ; i.e. z0 = %z0 diff --git a/llvm/test/CodeGen/AArch64/sve-fcmp.ll b/llvm/test/CodeGen/AArch64/sve-fcmp.ll index cbafae608262..703e86d9f453 100644 --- a/llvm/test/CodeGen/AArch64/sve-fcmp.ll +++ b/llvm/test/CodeGen/AArch64/sve-fcmp.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning define @oeq( %x, %x2) { ; CHECK-LABEL: oeq: diff --git a/llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll b/llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll index dffeee2df570..e9e34ada83d1 100644 --- a/llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll +++ b/llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu 
-mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; Verify that DAG combine rules for LD1 + sext/zext don't apply when the ; result of LD1 has multiple uses diff --git a/llvm/test/CodeGen/AArch64/sve-gep.ll b/llvm/test/CodeGen/AArch64/sve-gep.ll index 1b558a833f3b..48fc3ccb48bb 100644 --- a/llvm/test/CodeGen/AArch64/sve-gep.ll +++ b/llvm/test/CodeGen/AArch64/sve-gep.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning define * @scalar_of_scalable_1(* %base) { ; CHECK-LABEL: scalar_of_scalable_1: diff --git a/llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll b/llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll index d8d44a8f5611..b721cc7b00c5 100644 --- a/llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll +++ b/llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; SMAX diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll index a86da9594a21..5637a6982c2a 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LDFF1H, 
LDFF1W, LDFF1D: base + 32-bit scaled offset, sign (sxtw) or zero (uxtw) diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll index 012812fb22b0..27aa5622160d 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LDFF1B, LDFF1W, LDFF1H, LDFF1D: base + 32-bit unscaled offset, sign (sxtw) or zero diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll index 4d5267356081..6cfbddf031da 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LDFF1H, LDFF1W, LDFF1D: base + 64-bit scaled offset diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll index 570bac58cc9a..9e17b470037a 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc 
-mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LDFF1B, LDFF1W, LDFF1H, LDFF1D: base + 64-bit unscaled offset diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll index 5cb887932eff..d591614b964c 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LDFF1B, LDFF1W, LDFF1H, LDFF1D: vector base + immediate offset (index) diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll index 4b9fba9dc275..6534f32cfbb1 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LDFF1B, LDFF1W, LDFF1H, LDFF1D: vector base + scalar offset (index) diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll index db593413f7af..b03e1a25f5bf 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll +++ 
b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1H, LD1W, LD1D: base + 32-bit scaled offset, sign (sxtw) or zero (uxtw) diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll index ba8806986d69..cf3847323734 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1B, LD1W, LD1H, LD1D: base + 32-bit unscaled offset, sign (sxtw) or zero diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll index 10de34975fa0..3818c6178faa 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1H, LD1W, LD1D: base + 64-bit scaled offset diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll index fddbc24e911a..87580c92e710 100644 
--- a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1B, LD1W, LD1H, LD1D: base + 64-bit unscaled offset diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll index c7798e7f52d2..856d29aec7a4 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1B, LD1W, LD1H, LD1D: vector base + immediate offset (index) diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll index 3d84c0bbfc71..f877d24111da 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1B, LD1W, LD1H, LD1D: vector base + scalar offset (index) diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll 
b/llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll index 60a9dd88da6e..18d7f3515756 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; SMAX diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll index e3fccea179e6..2aaf222504a7 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1B diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll index a47da1c004ca..e66b84a74103 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1B diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll index 69f20fa5c13e..e5d200945098 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll @@ -1,5 +1,8 @@ -; RUN: llc 
-mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s ; RUN: llc -O0 -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1B diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll index b4ac587c0b79..1a4a25c83b34 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+f64mm,+bf16 -asm-verbose=0 < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+f64mm,+bf16 -asm-verbose=0 < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LD1ROB diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll index 96de8cc67802..e56ebcbde8e6 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ; LDFF1B diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll index 27394bbbf944..31f5c6797bbc 100644 --- a/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll +++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll @@ -1,4 +1,7 @@ -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN 
--allow-empty %s <%t + +; WARN-NOT: warning ; Range testing for the immediate in the reg+imm(mulvl) addressing ; mode is done only for one instruction. The rest of the instrucions From llvm-commits at lists.llvm.org Tue Jul 7 01:39:21 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:39:21 +0000 (UTC) Subject: [PATCH] D82943: [SVE] Add more warnings checks to clang and LLVM SVE tests In-Reply-To: References: Message-ID: <73065efd3bd258576bc6b300816b4c78@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG9a1a7d888b53: [SVE] Add more warnings checks to clang and LLVM SVE tests (authored by david-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82943/new/ https://reviews.llvm.org/D82943 Files: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll llvm/test/CodeGen/AArch64/sve-fcmp.ll llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll llvm/test/CodeGen/AArch64/sve-gep.ll llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll 
llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82943.275941.patch Type: text/x-patch Size: 63544 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 01:40:27 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:40:27 +0000 (UTC) Subject: [PATCH] D83282: [DWARFYAML] Refactor: Pull out member functions to DWARFYAMLUtils.cpp. In-Reply-To: References: Message-ID: <5c4c85f92504de8c16bef4341842291d@localhost.localdomain> Higuoxing planned changes to this revision. 
Higuoxing added a comment.

In D83282#2135231, @grimar wrote:

> So, currently, for example, `DWARFYAML::Data` is defined in `DWARFYAML.h`
> (https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ObjectYAML/DWARFYAML.h#L180) and implemented in `DWARFYAML.cpp`.
>
> With this change the implementation of members is moved to a new `DWARFYAMLUtils.cpp` file. I.e. we have `DWARFYAML.h` + `DWARFYAMLUtils.cpp` with this patch.
> Honestly, it looks a bit unnatural to me, because it breaks the straightforward common approach: "define in a header, implement in a cpp with the same name".
>
> You're saying that "In the future, we will add some helper functions that make DWARF sections interlinked". Perhaps I'd wait for these functions to appear and maybe
> then create `DWARFYAMLUtils.h` + `DWARFYAMLUtils.cpp` for them.

Sure, I will upload them later to see how it works, thanks for the suggestion!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83282/new/

https://reviews.llvm.org/D83282

From llvm-commits at lists.llvm.org Tue Jul 7 01:42:58 2020
From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:42:58 +0000 (UTC)
Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM
In-Reply-To:
References:
Message-ID: <53ed24925af01a010eae2af70f770ab9@localhost.localdomain>

nikic requested changes to this revision.
nikic added inline comments.
This revision now requires changes to proceed.

================
Comment at: llvm/test/Other/opt-O2-pipeline.ll:289
+; CHECK-NEXT: Branch Probability Analysis
+; CHECK-NEXT: Block Frequency Analysis
 ; CHECK-NEXT: FunctionPass Manager
----------------
Is it possible to switch this pass to use LazyBPI / LazyBFA, only fetched if PGO is actually in use? PGO functionality that most people don't use should not add expensive analysis passes like PDT.
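
As a rough illustration of the request above (compute an expensive analysis only when it is actually needed), the following C++ sketch caches a result on first use. It is a minimal sketch under assumptions: `LazyAnalysis`, `BlockFrequencies` and `runProfilePass` are hypothetical names invented here, and they are not LLVM's LazyBPI / LazyBFA passes or any part of the patch under review.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for an expensive analysis result; not an LLVM class.
struct BlockFrequencies {
  int HotBlocks = 0;
};

// Computes the wrapped result on first use only, then returns the cached copy.
template <typename ResultT> class LazyAnalysis {
  std::function<ResultT()> Compute;
  ResultT Cached{};
  bool Valid = false;

public:
  explicit LazyAnalysis(std::function<ResultT()> Fn) : Compute(std::move(Fn)) {}

  const ResultT &get() {
    if (!Valid) {
      Cached = Compute();
      Valid = true;
    }
    return Cached;
  }

  bool computed() const { return Valid; }
};

// Toy "pass" (hypothetical): consults the analysis only when profile data is
// present. Returns the number of hot blocks, or 0 when there is no profile.
inline int runProfilePass(bool HasProfileData,
                          LazyAnalysis<BlockFrequencies> &BFI) {
  if (!HasProfileData)
    return 0; // The expensive analysis is never requested on this path.
  return BFI.get().HotBlocks;
}
```

With this shape, a run without profile data never pays for the analysis, which appears to be the property the review comment asks for.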

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83013/new/

https://reviews.llvm.org/D83013

From llvm-commits at lists.llvm.org Tue Jul 7 01:43:43 2020
From: llvm-commits at lists.llvm.org (Sander de Smalen via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:43:43 +0000 (UTC)
Subject: [PATCH] D83195: [CodeGen] Fix a warning in DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR
In-Reply-To:
References:
Message-ID:

sdesmalen accepted this revision.
sdesmalen added a comment.
This revision is now accepted and ready to land.

nit: The title and commit message speak of fixing warnings (in some test), but I think many people who don't follow our developments closely will not know what those 'warnings' relate to. Perhaps you can phrase the title as `[CodeGen] Fix wrong use of getVectorNumElements in PromoteIntRes_EXTRACT_SUBVECTOR`.

Also I think it's worth explaining in the commit message why you're moving this code, e.g. `getVectorNumElements` is not safe for scalable vectors and code should normally use `getVectorElementCount` instead. But because at this point in the code only fixed-width vectors are used, the use of `getVectorNumElements` is valid.

LGTM otherwise.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83195/new/

https://reviews.llvm.org/D83195

From llvm-commits at lists.llvm.org Tue Jul 7 01:44:30 2020
From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:44:30 +0000 (UTC)
Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores
In-Reply-To:
References:
Message-ID: <77aefeac3b6ccb0f4f6ab69d52cdca4e@localhost.localdomain>

david-arm added a comment.

The patch overall looks good to me - just a question about the assert!

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7135
   if (IsCompressedMemory) {
+    assert(!DataVT.isScalableVector() &&
+           "Cannot currently handle compressed memory with scalable vectors");
----------------
Do we know if this is something we catch earlier and hence should never get here? I just wonder if here it's not really an assert that something went wrong with the code, but perhaps we just hit a case we don't support yet? If it's just because we don't support it yet, instead of asserting we could do:

  if (DataVT.isScalableVector())
    report_fatal_error("Cannot currently handle compressed memory with scalable vectors");

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83137/new/

https://reviews.llvm.org/D83137

From llvm-commits at lists.llvm.org Tue Jul 7 01:45:57 2020
From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:45:57 +0000 (UTC)
Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks
Message-ID:

mkazantsev created this revision.
mkazantsev added reviewers: nikic, lebedev.ri, fhahn, asbirlea.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a project: LLVM.

We can try to replace a select with a Phi not only in its parent block,
but also in the blocks of its arguments. This helps when the select's
argument is itself a Phi.

https://reviews.llvm.org/D83284

Files:
  llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
  llvm/test/Transforms/InstCombine/select.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83284.275943.patch
Type: text/x-patch
Size: 7121 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 01:46:29 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:46:29 +0000 (UTC)
Subject: [PATCH] D83243: [ELF] Rename canRelax to sharedToExecRelax.
NFC
In-Reply-To:
References:
Message-ID:

grimar accepted this revision.
grimar added a comment.

LGTM too. A question/suggestion for a follow-up is inlined.

================
Comment at: lld/ELF/Relocations.cpp:211
   // DTPMOD may not be expected at load time.
   bool isLocalInExecutable = !sym.isPreemptible && !config->shared;
----------------
It looks like we can get rid of this variable now? I.e. it looks like `isLocalInExecutable` can be replaced with `!sym.isPreemptible` everywhere.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83243/new/

https://reviews.llvm.org/D83243

From llvm-commits at lists.llvm.org Tue Jul 7 01:48:01 2020
From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:48:01 +0000 (UTC)
Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks
In-Reply-To:
References:
Message-ID: <075d0af6fe69de2ad4cfc9782933e613@localhost.localdomain>

mkazantsev added a comment.

Ideally we would consider the whole dominator chain, but that is too expensive. This is an attempt to reduce the impact.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83284/new/

https://reviews.llvm.org/D83284

From llvm-commits at lists.llvm.org Tue Jul 7 01:48:04 2020
From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:48:04 +0000 (UTC)
Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
In-Reply-To:
References:
Message-ID: <509a639043f8e8b11eabc6ff51806bdf@localhost.localdomain>

ebevhan marked an inline comment as done.
ebevhan added a comment.

In D83216#2134130, @lebedev.ri wrote:

> Was there an RFC for this?

No explicit RFC for these particular intrinsics, but they have been mentioned in the larger scope of fixed-point support:

- http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html
- http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html

I notice now that Leonard's mail does not mention saturated shifts, but my older one does.

> While i agree it likely makes sense to have these for consistency,
> i'm not sure why they are *needed* for implementing the Embedded-C fixed point support in Clang.

Yes, "needed" might be stronger wording than necessary. I originally wrote "useful" but was concerned it wasn't strong enough. Of course, they aren't needed per se, but it becomes more of a hassle to select instructions for the operations if there are no intrinsics.

================
Comment at: llvm/lib/IR/Verifier.cpp:4948-4951
+    Assert(Op1->getType()->isIntegerTy(),
+           "first operand of [us]shl_sat must be an int type");
+    Assert(Op2->getType()->isIntegerTy(),
+           "second operand of [us]shl_sat must be an int type");
----------------
lebedev.ri wrote:
> I don't think it makes sense to limit these to scalars.
The add.sat and sub.sat intrinsics were given vector operands because they were useful for some of x86's vector instructions. I couldn't see any such operations for shifts, but I can add the vector type support for consistency.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83216/new/

https://reviews.llvm.org/D83216

From llvm-commits at lists.llvm.org Tue Jul 7 01:54:07 2020
From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:54:07 +0000 (UTC)
Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining
In-Reply-To:
References:
Message-ID: <5429cc86a5a39b469bcef7a1c17230bb@localhost.localdomain>

jhenderson added reviewers: rupprecht, MaskRay, grimar.
jhenderson added a comment.

Hi @danzimm,

In LLVM tools using the cl::opt interface, like llvm-symbolizer and llvm-addr2line, where you see a `cl::opt` (such as `--inlining`) you should be able to do `--inlining=0` to disable it. I'm not necessarily opposed to `--no-inlining` too mind you, but wanted to raise that before going further. If you wish to go ahead with adding `--no-inlining`, I'd recommend moving it to a separate patch, as it is independent of the `--source-files` stuff. Also, please add any new options to the llvm-symbolizer and llvm-addr2line docs (located in llvm/docs/CommandGuide).

In D83262#2134394, @danzimm wrote:

> If there are tests somewhere that I can add to, please point me to them! I'd love to add some tests, just couldn't quite find any (I'm guessing I'm just not looking in the right place... 😅)

Most tests for llvm-symbolizer (and llvm-addr2line) are located in llvm/test/tools/llvm-symbolizer (there are a few in other scattered locations - grep for llvm-symbolizer). Any new front-end options like these should be tested there.

> Together --functions=linkage --demangle --output-style=LLVM --no-source-file --no-inlining results in a list of symbol names which appear in the resulting binary. This is useful when symbolicating a list of addresses e.g. for link order files.
>
> N.B. The same data can be extracted with a processor on top of --functions=linkage --demangle --output-style=LLVM, however with large lists of symbols I've found that this takes quite a long time (my processor(s) were in perl/python- in theory I could've written a C/++ one, but I figure best just add these as formatting options to llvm-symbolizer instead).

Are you specifically interested in symbols at specific addresses, or with a specific type? llvm-nm and llvm-readobj can both be used to dump symbols too.
It doesn't feel to me like llvm-symbolizer is the right tool for the job if you want to dump all symbols (or all functions), though I could possibly see an argument if you are limiting it to the symbols with specific addresses. I personally would think it would make more sense to add any necessary options to llvm-nm or possibly llvm-readelf.

Adding a few others with binutils knowledge for more visibility and to get their input.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83262/new/

https://reviews.llvm.org/D83262

From llvm-commits at lists.llvm.org Tue Jul 7 01:57:12 2020
From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:57:12 +0000 (UTC)
Subject: [PATCH] D81256: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare
In-Reply-To:
References:
Message-ID: <905528ca02b7746c7ea5822be1400867@localhost.localdomain>

serge-sans-paille updated this revision to Diff 275946.
serge-sans-paille marked 2 inline comments as done.
serge-sans-paille added a comment.

Stick to original API for easier maintenance.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81256/new/

https://reviews.llvm.org/D81256

Files:
  llvm/lib/CodeGen/CodeGenPrepare.cpp

Index: llvm/lib/CodeGen/CodeGenPrepare.cpp
===================================================================
--- llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -2822,8 +2822,9 @@
   TypePromotionTransaction(SetOfInstrs &RemovedInsts)
       : RemovedInsts(RemovedInsts) {}
 
-  /// Advocate every changes made in that transaction.
-  void commit();
+  /// Advocate every changes made in that transaction. Return true if any change
+  /// happen.
+  bool commit();
 
   /// Undo all the changes made after the given point.
   void rollback(ConstRestorationPt Point);
@@ -2929,11 +2930,13 @@
   return !Actions.empty() ? Actions.back().get() : nullptr;
 }
 
-void TypePromotionTransaction::commit() {
+bool TypePromotionTransaction::commit() {
   for (CommitPt It = Actions.begin(), EndIt = Actions.end(); It != EndIt;
        ++It)
     (*It)->commit();
+  bool Modified = !Actions.empty();
   Actions.clear();
+  return Modified;
 }
 
 void TypePromotionTransaction::rollback(
@@ -4959,7 +4962,7 @@
     TPT.rollback(LastKnownGood);
     return false;
   }
-  TPT.commit();
+  bool Modified = TPT.commit();
 
   // Get the combined AddrMode (or the only AddrMode, if we only had one).
   ExtAddrMode AddrMode = AddrModes.getAddrMode();
@@ -4973,7 +4976,7 @@
       })) {
     LLVM_DEBUG(dbgs() << "CGP: Found local addrmode: " << AddrMode
                       << "\n");
-    return false;
+    return Modified;
   }
 
   // Insert this computation right after this user. Since our caller is
@@ -5014,7 +5017,7 @@
       // We can't add more than one pointer together, nor can we scale a
       // pointer (both of which seem meaningless).
       if (ResultPtr || AddrMode.Scale != 1)
-        return false;
+        return Modified;
 
       ResultPtr = AddrMode.ScaledReg;
       AddrMode.Scale = 0;
@@ -5031,12 +5034,12 @@
       Type *ScaledRegTy = AddrMode.ScaledReg->getType();
       if (cast(IntPtrTy)->getBitWidth() >
           cast(ScaledRegTy)->getBitWidth())
-        return false;
+        return Modified;
     }
 
     if (AddrMode.BaseGV) {
       if (ResultPtr)
-        return false;
+        return Modified;
 
       ResultPtr = AddrMode.BaseGV;
     }
@@ -5060,7 +5063,7 @@
         !AddrMode.BaseReg && !AddrMode.Scale && !AddrMode.BaseOffs) {
       SunkAddr = Constant::getNullValue(Addr->getType());
     } else if (!ResultPtr) {
-      return false;
+      return Modified;
     } else {
       Type *I8PtrTy =
           Builder.getInt8PtrTy(Addr->getType()->getPointerAddressSpace());
@@ -5145,7 +5148,7 @@
         (ScalePtrTy && DL->isNonIntegralPointerType(ScalePtrTy)) ||
         (AddrMode.BaseGV &&
          DL->isNonIntegralPointerType(AddrMode.BaseGV->getType())))
-      return false;
+      return Modified;
 
     LLVM_DEBUG(dbgs() << "CGP: SINKING nonlocal addrmode: " << AddrMode
                       << " for " << *MemoryInst << "\n");
@@ -5185,7 +5188,7 @@
         Instruction *I = dyn_cast_or_null(Result);
         if (I && (Result != AddrMode.BaseReg))
           I->eraseFromParent();
-        return false;
+        return Modified;
       }
       if (AddrMode.Scale != 1)
         V = Builder.CreateMul(V, ConstantInt::get(IntPtrTy, AddrMode.Scale),

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D81256.275946.patch
Type: text/x-patch
Size: 3297 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 01:57:36 2020
From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:57:36 +0000 (UTC)
Subject: [PATCH] D81256: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare
In-Reply-To:
References:
Message-ID:

serge-sans-paille added inline comments.

================
Comment at: llvm/lib/CodeGen/CodeGenPrepare.cpp:2943
 
+bool TypePromotionTransaction::changed(
+    TypePromotionTransaction::ConstRestorationPt Point) {
----------------
foad wrote:
> Maybe `changedSince` would be a better name?
I've reworked the patch to make it closer to the original code, making that change no longer necessary.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81256/new/

https://reviews.llvm.org/D81256

From llvm-commits at lists.llvm.org Tue Jul 7 01:58:53 2020
From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:58:53 +0000 (UTC)
Subject: [PATCH] D83028: [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll
In-Reply-To:
References:
Message-ID: <13955d19cb7ec696e8781089b51c85a7@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rGc061e56e880a: [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll (authored by david-arm).
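
For context on why this commit moves from comparing element numbers to comparing element counts, here is a small C++ sketch. It assumes a toy `ToyElementCount` type invented for illustration; it is not LLVM's `ElementCount`, and the helper `sameMinOnly` is likewise hypothetical. The point it tries to show is that a fixed-width vector and a scalable vector can share the same minimum element number while still being different types.

```cpp
#include <cassert>

// Hypothetical model of a vector element count; not LLVM's ElementCount class.
struct ToyElementCount {
  unsigned Min;  // Minimum number of elements.
  bool Scalable; // True if the real count is Min times a runtime factor.
};

// Full comparison: both the minimum and the scalable flag must match.
inline bool operator==(const ToyElementCount &A, const ToyElementCount &B) {
  return A.Min == B.Min && A.Scalable == B.Scalable;
}

// Comparing only the minimum treats a fixed 4-element vector and a scalable
// vector with a minimum of 4 elements as if they had the same shape.
inline bool sameMinOnly(const ToyElementCount &A, const ToyElementCount &B) {
  return A.Min == B.Min;
}
```

Under these assumptions, a check written in terms of the full count rejects the fixed/scalable mix that a plain number comparison would let through.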

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83028/new/

https://reviews.llvm.org/D83028

Files:
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/test/CodeGen/AArch64/sve-trunc.ll
  llvm/test/CodeGen/AArch64/sve-vector-splat.ll

Index: llvm/test/CodeGen/AArch64/sve-vector-splat.ll
===================================================================
--- llvm/test/CodeGen/AArch64/sve-vector-splat.ll
+++ llvm/test/CodeGen/AArch64/sve-vector-splat.ll
@@ -1,4 +1,7 @@
-; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s
+; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t
+
+; WARN-NOT: warning
 
 ;; Splats of legal integer vector types
 
Index: llvm/test/CodeGen/AArch64/sve-trunc.ll
===================================================================
--- llvm/test/CodeGen/AArch64/sve-trunc.ll
+++ llvm/test/CodeGen/AArch64/sve-trunc.ll
@@ -1,5 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s
+; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t
+
+; WARN-NOT: warning
 
 ; For all the functions below should the operation is a nop
 define @trunc_i16toi8( %in) {
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===================================================================
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -656,7 +656,7 @@
 
 static SDValue widenVectorToPartType(SelectionDAG &DAG, SDValue Val,
                                      const SDLoc &DL, EVT PartVT) {
-  if (!PartVT.isVector())
+  if (!PartVT.isFixedLengthVector())
     return SDValue();
 
   EVT ValueVT = Val.getValueType();
@@ -702,8 +702,9 @@
       Val = Widened;
   } else if (PartVT.isVector() &&
              PartEVT.getVectorElementType().bitsGE(
-               ValueVT.getVectorElementType()) &&
-             PartEVT.getVectorNumElements() == ValueVT.getVectorNumElements()) {
+                 ValueVT.getVectorElementType()) &&
+             PartEVT.getVectorElementCount() ==
+                 ValueVT.getVectorElementCount()) {
 
     // Promoted vector extract
     Val = DAG.getAnyExtOrTrunc(Val, DL, PartVT);

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83028.275648.patch
Type: text/x-patch
Size: 2218 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 01:58:54 2020
From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 08:58:54 +0000 (UTC)
Subject: [PATCH] D82524: [SVE][CodeGen] Fix bug when falling back to DAG ISel
In-Reply-To:
References:
Message-ID: <25726ad90fad3bbb47bc3d5d4b7a5ea4@localhost.localdomain>

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
david-arm marked an inline comment as done.
Closed by commit rG79d34a5a1bce: [SVE][CodeGen] Fix bug when falling back to DAG ISel (authored by david-arm).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82524/new/

https://reviews.llvm.org/D82524

Files:
  llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h
  llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
  llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
  llvm/lib/Target/AArch64/GISel/AArch64CallLowering.h
  llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82524.275649.patch Type: text/x-patch Size: 4885 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 01:59:03 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:59:03 +0000 (UTC) Subject: [PATCH] D82943: [SVE] Add more warnings checks to clang and LLVM SVE tests In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG9a1a7d888b53: [SVE] Add more warnings checks to clang and LLVM SVE tests (authored by david-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82943/new/ https://reviews.llvm.org/D82943 Files: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll llvm/test/CodeGen/AArch64/sve-fcmp.ll llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll llvm/test/CodeGen/AArch64/sve-gep.ll llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll 
llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82943.275650.patch Type: text/x-patch Size: 63544 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 01:59:29 2020 From: llvm-commits at lists.llvm.org (Sander de Smalen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 08:59:29 +0000 (UTC) Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores In-Reply-To: References: Message-ID: <8afb73dab599c9a1e635fec72e2f1eb7@localhost.localdomain> sdesmalen added inline comments. 
================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7155 + DAG.getVScale(DL, AddrVT, + APInt(Addr.getValueSizeInBits().getFixedSize(), + DataVT.getSizeInBits().getKnownMinSize() / 8)); ---------------- Given that the type of VScale will be `AddrVT`, it's clearer to use `AddrVT.getSizeInBits().getFixedSize()`; that avoids the types being different. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7156 + APInt(Addr.getValueSizeInBits().getFixedSize(), + DataVT.getSizeInBits().getKnownMinSize() / 8)); } else ---------------- Should this be using `DataVT.getStoreSize()` instead of `DataVT.getSizeInBits()`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83137/new/ https://reviews.llvm.org/D83137 From llvm-commits at lists.llvm.org Tue Jul 7 02:03:06 2020 From: llvm-commits at lists.llvm.org (Hafiz Abid Qadeer via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:03:06 +0000 (UTC) Subject: [PATCH] D83244: [lld] Don't error out on relocations in .gcc_except_table to discarded sections. In-Reply-To: References: Message-ID: abidh added a comment. In D83244#2133669, @MaskRay wrote: > The `.eh_frame` test case is invalid. LLD handles .eh_frame input sections differently. It parses .eh_frame and deduplicates them. See `eh-frame-merge.s`, an input .eh_frame referencing a non-prevailing COMDAT group is dropped (EhFrameSection::isFdeLive) > > Do you have a realistic case where LLD erroneously errors? If so, can you get a minimal reproduce, use `LLD_REPRODUCE=/tmp/rep.tar` or `-Wl,--reproduce=/tmp/rep.tar` to get a reproduce file and upload it somewhere? The problem that I faced is with .gcc_except_table. I added .eh_frame for completeness' sake. The problem occurred while linking in libc++ for a bare-metal target. I will see if I can get a minimal test case from that, but it may be difficult.
The error looks like this: ld.lld: error: relocation refers to a symbol in a discarded section: >>> defined in /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a(locale.cpp.o) >>> section group signature: _ZNSt3__116__pad_and_outputIcNS_11char_traitsIcEEEENS_19ostreambuf_iteratorIT_T0_EES6_PKS4_S8_S8_RNS_8ios_baseES4_ >>> prevailing definition is in /tmp/test-48f00f.o >>> referenced by locale.cpp >>> >>> locale.cpp.o:(.gcc_except_table+0x6FD) in archive /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a >>> >>> referenced by locale.cpp >>> >>> locale.cpp.o:(.gcc_except_table+0x706) in archive /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a >>> >>> referenced by locale.cpp >>> >>> locale.cpp.o:(.gcc_except_table+0x70A) in archive /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83244/new/ https://reviews.llvm.org/D83244 From llvm-commits at lists.llvm.org Tue Jul 7 02:03:09 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:03:09 +0000 (UTC) Subject: [PATCH] D83202: [Bitfields][NFC] Make sure bitfields are contiguous In-Reply-To: References: Message-ID: <1269595e28bd92cf94e4d9063f0c0ec9@localhost.localdomain> gchatelet updated this revision to Diff 275947. gchatelet added a comment. - Address comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83202/new/ https://reviews.llvm.org/D83202 Files: llvm/include/llvm/ADT/Bitfields.h llvm/include/llvm/IR/InstrTypes.h llvm/include/llvm/IR/Instruction.h llvm/include/llvm/IR/Instructions.h llvm/unittests/ADT/BitFieldsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83202.275947.patch Type: text/x-patch Size: 9830 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 02:03:50 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:03:50 +0000 (UTC) Subject: [PATCH] D83152: llvm-nm: add flag to suppress no symbols warning In-Reply-To: References: Message-ID: <1477e7afccb063043ce51f8830c8c8c4@localhost.localdomain> jhenderson added a comment. In D83152#2134405, @keith wrote: > In D83152#2133950, @rupprecht wrote: > > > In D83152#2133855, @MaskRay wrote: > > > > > I cannot find any search result about `no-warning-for-no-symbols`. Is `-no-warning-for-no-symbols` really an existing option? libtool is an `ar` like tool. > > > > > > I found it by looking for underscores instead of hyphens: `-no_warning_for_no_symbols`. > > However, the flag is an ar/ranlib/libtool flag, not nm, AFAICT. > > > Yea sorry I should have been more clear, it's not the _exact_ same spelling because of the conventions used in nm with `-` instead of `_`. > > >> Second, I wonder how you are going to plug `-no-warning-for-no-symbols` into a build system. If you only parse stdout, you can ignore stderr. Even if you do, you can probably use `grep -v '^no symbols'`. This will have better portability (supported on older nm, supported on other binary formats). > > > > I agree this is likely the simpler option (just add `2> /dev/null` to the build script using `nm`) > > If folks feel strongly about this that would definitely work, this felt like a safer way to silence this for the future for me, but if you all think it's not worth adding an option for that's fine. I don't have strong opinions either way. I think there probably is some benefit to adding the new option: one counter-point to redirecting stderr to `/dev/null` is that it will hide any real error messages from llvm-nm (e.g. caused by a missing input file). I don't think that's a good idea personally.
That being said, if stderr is being redirected somewhere other than `/dev/null` maybe it's okay. Another issue is that because stderr and stdout are often handled independently, but end up in the same final output, you can end up with the "no symbols" message appearing quite some way from where it really belongs, which could be slightly confusing. However, it's not an issue I actually have at the moment. Another possible benefit (again not something I personally have worried about, but one I thought of) is that users could specify the option to avoid having to special case parsing scripts to handle no symbols. A naive script would typically just iterate through the lines and expect each one to correspond to a symbol, so if there are none, the parser might break. Another option name could just be `--quiet`, although there's a small risk that could clash in the future with a GNU option. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83152/new/ https://reviews.llvm.org/D83152 From llvm-commits at lists.llvm.org Tue Jul 7 02:07:25 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:07:25 +0000 (UTC) Subject: [PATCH] D83037: [llvm-readobj] - Fix a crash scenario in GNUStyle::printHashSymbols(). In-Reply-To: References: Message-ID: <41ea63f4c62f3d408b2dde13dcd83459@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGd5cbf7ba3252: [llvm-readobj] - Fix a crash scenario in GNUStyle<ELFT>::printHashSymbols(). (authored by grimar). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83037/new/ https://reviews.llvm.org/D83037 Files: llvm/test/tools/llvm-readobj/ELF/hash-symbols.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83037.275948.patch Type: text/x-patch Size: 7651 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 02:07:25 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Tue, 07 Jul 2020 02:07:25 -0700 (PDT) Subject: [llvm] d5cbf7b - [llvm-readobj] - Fix a crash scenario in GNUStyle::printHashSymbols(). Message-ID: <5f043b4d.1c69fb81.5a90d.067b@mx.google.com> Author: Georgii Rymar Date: 2020-07-07T11:59:00+03:00 New Revision: d5cbf7ba32527dba53fa673ff7fd7f7fbb0b82fc URL: https://github.com/llvm/llvm-project/commit/d5cbf7ba32527dba53fa673ff7fd7f7fbb0b82fc DIFF: https://github.com/llvm/llvm-project/commit/d5cbf7ba32527dba53fa673ff7fd7f7fbb0b82fc.diff LOG: [llvm-readobj] - Fix a crash scenario in GNUStyle::printHashSymbols(). We might crash when the dynamic symbols table is empty (or not found) and --hash-symbols is requested. Both .hash and .gnu.hash logic is affected. The patch fixes this issue. Differential revision: https://reviews.llvm.org/D83037 Added: Modified: llvm/test/tools/llvm-readobj/ELF/hash-symbols.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/hash-symbols.test b/llvm/test/tools/llvm-readobj/ELF/hash-symbols.test index b43f3eb64aec..2576fe03deae 100644 --- a/llvm/test/tools/llvm-readobj/ELF/hash-symbols.test +++ b/llvm/test/tools/llvm-readobj/ELF/hash-symbols.test @@ -509,3 +509,131 @@ ProgramHeaders: Sections: - Section: .gnu.hash - Section: .dynamic + +## Check the behavior when the dynamic symbol table is empty or not found. + +## Case A.1: Check we report a warning when the dynamic symbol table is empty and we attempt to print hash symbols +## from the .hash table. The number of symbols in the dynamic symbol table can be calculated from its size +## or derived from the Chain vector of the .hash table. Make both ways to return a zero to do the check. 
+# RUN: yaml2obj --docnum=9 %s -o %t9.1.so +# RUN: llvm-readelf --hash-symbols %t9.1.so 2>&1 | FileCheck %s -DFILE=%t9.1.so --check-prefix=DYNSYM-EMPTY-HASH + +# DYNSYM-EMPTY-HASH: Symbol table of .hash for image: +# DYNSYM-EMPTY-HASH-NEXT: Num Buc: Value Size Type Bind Vis Ndx Name +# DYNSYM-EMPTY-HASH-NEXT: warning: '[[FILE]]': unable to print symbols for the .hash table: the dynamic symbol table is empty +# DYNSYM-EMPTY-HASH-NOT: {{.}} + +--- !ELF +FileHeader: + Class: ELFCLASS32 + Data: ELFDATA2LSB + Type: ET_DYN + Machine: EM_386 +Sections: + - Name: .hash + Type: SHT_HASH + Flags: [ SHF_ALLOC ] + Bucket: [ 0 ] + Chain: [ ] + - Name: .dynamic + Type: SHT_DYNAMIC + Flags: [ SHF_ALLOC ] + Entries: + - Tag: DT_HASH + Value: 0x0 + - Tag: DT_STRTAB +## PT_LOAD p_offset == .hash offset == 0x54. +## 0x54 + 0x2c == 0x80 == .dynstr offset. + Value: 0x2c + - Tag: DT_STRSZ + Value: 0x1 + - Tag: DT_NULL + Value: 0x0 + - Name: .dynstr + Type: SHT_STRTAB + Flags: [ SHF_ALLOC ] + - Name: .dynsym + Type: [[DYNSYMTYPE=SHT_DYNSYM]] + Flags: [ SHF_ALLOC ] + Size: 0 +ProgramHeaders: + - Type: PT_LOAD + Flags: [ PF_R, PF_X ] + Sections: + - Section: .hash + - Section: .dynamic + - Section: .dynstr + +## Case A.2: similar to A.1, but now check that we report a warning when the dynamic symbol table was not found. +## To do that, set the type of the .dynsym to SHT_PROGBITS to hide it. 
+# RUN: yaml2obj --docnum=9 -DDYNSYMTYPE=SHT_PROGBITS %s -o %t9.2.so +# RUN: llvm-readelf --hash-symbols %t9.2.so 2>&1 | FileCheck %s -DFILE=%t9.2.so --check-prefix=DYNSYM-NOTFOUND-HASH + +# DYNSYM-NOTFOUND-HASH: Symbol table of .hash for image: +# DYNSYM-NOTFOUND-HASH-NEXT: Num Buc: Value Size Type Bind Vis Ndx Name +# DYNSYM-NOTFOUND-HASH-NEXT: warning: '[[FILE]]': unable to print symbols for the .hash table: the dynamic symbol table was not found +# DYNSYM-NOTFOUND-HASH-NOT: {{.}} + +## Case B.1: Check we report a warning when the dynamic symbol table is empty and we attempt to print +## hash symbols from the .gnu.hash table. +# RUN: yaml2obj --docnum=10 %s -o %t10.1.so +# RUN: llvm-readelf --hash-symbols %t10.1.so 2>&1 | FileCheck %s -DFILE=%t10.1.so --check-prefix=DYNSYM-EMPTY-GNUHASH + +# DYNSYM-EMPTY-GNUHASH: Symbol table of .gnu.hash for image: +# DYNSYM-EMPTY-GNUHASH-NEXT: Num Buc: Value Size Type Bind Vis Ndx Name +# DYNSYM-EMPTY-GNUHASH-NEXT: warning: '[[FILE]]': unable to print symbols for the .gnu.hash table: the dynamic symbol table is empty +# DYNSYM-EMPTY-GNUHASH-NOT: {{.}} + +## Case B.2: similar to B.1, but now check that we report a warning when the dynamic symbol table was not found. +## To do that, set the type of the .dynsym to SHT_PROGBITS to hide it. 
+# RUN: yaml2obj --docnum=10 -DDYNSYMTYPE=SHT_PROGBITS %s -o %t10.2.so +# RUN: llvm-readelf --hash-symbols %t10.2.so 2>&1 | FileCheck %s -DFILE=%t10.2.so --check-prefix=DYNSYM-NOTFOUND-GNUHASH + +# DYNSYM-NOTFOUND-GNUHASH: Symbol table of .gnu.hash for image: +# DYNSYM-NOTFOUND-GNUHASH-NEXT: Num Buc: Value Size Type Bind Vis Ndx Name +# DYNSYM-NOTFOUND-GNUHASH-NEXT: warning: '[[FILE]]': unable to print symbols for the .gnu.hash table: the dynamic symbol table was not found +# DYNSYM-NOTFOUND-GNUHASH-NOT: {{.}} + +--- !ELF +FileHeader: + Class: ELFCLASS32 + Data: ELFDATA2LSB + Type: ET_DYN + Machine: EM_386 +Sections: + - Name: .gnu.hash + Type: SHT_GNU_HASH + Flags: [ SHF_ALLOC ] + Header: + SymNdx: 0x0 + Shift2: 0x0 + BloomFilter: [ 0x0 ] + HashBuckets: [ 0x1 ] + HashValues: [ 0x0 ] + - Name: .dynamic + Type: SHT_DYNAMIC + Flags: [ SHF_ALLOC ] + Entries: + - Tag: DT_GNU_HASH + Value: 0x0 + - Tag: DT_STRTAB +## PT_LOAD p_offset == .hash offset == 0x54. +## 0x54 + 0x3c == 0x80 == .dynstr offset. 
+ Value: 0x3c + - Tag: DT_STRSZ + Value: 0x1 + - Tag: DT_NULL + Value: 0x0 + - Name: .dynstr + Type: SHT_STRTAB + - Name: .dynsym + Type: [[DYNSYMTYPE=SHT_DYNSYM]] + Flags: [ SHF_ALLOC ] + Size: 0 +ProgramHeaders: + - Type: PT_LOAD + Flags: [ PF_R, PF_X ] + Sections: + - Section: .gnu.hash + - Section: .dynamic + - Section: .dynstr diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index e2ea2c32e0a3..7a2960682e8c 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -4066,7 +4066,7 @@ template void GNUStyle::printHashSymbols(const ELFO *Obj) { if (this->dumper()->getDynamicStringTable().empty()) return; auto StringTable = this->dumper()->getDynamicStringTable(); - auto DynSyms = this->dumper()->dynamic_symbols(); + Elf_Sym_Range DynSyms = this->dumper()->dynamic_symbols(); auto PrintHashTable = [&](const Elf_Hash *SysVHash) { if (ELFT::Is64Bits) @@ -4075,6 +4075,16 @@ template void GNUStyle::printHashSymbols(const ELFO *Obj) { OS << " Num Buc: Value Size Type Bind Vis Ndx Name"; OS << "\n"; + const Elf_Sym *FirstSym = DynSyms.empty() ? nullptr : &DynSyms[0]; + if (!FirstSym) { + Optional DynSymRegion = this->dumper()->getDynSymRegion(); + this->reportUniqueWarning( + createError(Twine("unable to print symbols for the .hash table: the " + "dynamic symbol table ") + + (DynSymRegion ? "is empty" : "was not found"))); + return; + } + auto Buckets = SysVHash->buckets(); auto Chains = SysVHash->chains(); for (uint32_t Buc = 0; Buc < SysVHash->nbucket; Buc++) { @@ -4093,7 +4103,7 @@ template void GNUStyle::printHashSymbols(const ELFO *Obj) { break; } - printHashedSymbol(Obj, &DynSyms[0], Ch, StringTable, Buc); + printHashedSymbol(Obj, FirstSym, Ch, StringTable, Buc); Visited[Ch] = true; } } @@ -4124,6 +4134,16 @@ template void GNUStyle::printHashSymbols(const ELFO *Obj) { return; } + const Elf_Sym *FirstSym = DynSyms.empty() ? 
nullptr : &DynSyms[0]; + if (!FirstSym) { + Optional DynSymRegion = this->dumper()->getDynSymRegion(); + this->reportUniqueWarning(createError( + Twine("unable to print symbols for the .gnu.hash table: the " + "dynamic symbol table ") + + (DynSymRegion ? "is empty" : "was not found"))); + return; + } + auto Buckets = GnuHash->buckets(); for (uint32_t Buc = 0; Buc < GnuHash->nbuckets; Buc++) { if (Buckets[Buc] == ELF::STN_UNDEF) @@ -4132,7 +4152,7 @@ template void GNUStyle::printHashSymbols(const ELFO *Obj) { uint32_t GnuHashable = Index - GnuHash->symndx; // Print whole chain while (true) { - printHashedSymbol(Obj, &DynSyms[0], Index++, StringTable, Buc); + printHashedSymbol(Obj, FirstSym, Index++, StringTable, Buc); // Chain ends at symbol with stopper bit if ((GnuHash->values(DynSyms.size())[GnuHashable++] & 1) == 1) break; From llvm-commits at lists.llvm.org Tue Jul 7 02:08:59 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:08:59 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: dantrushin marked 2 inline comments as done. dantrushin added inline comments. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:125 +// Return statepoint GC args as a set +static SmallSet collectGCRegs(MachineInstr &MI) { + StatepointOpers SO(&MI); ---------------- skatkov wrote: > dantrushin wrote: > > skatkov wrote: > > > Do I understand correctly that with your changes ALL GC pointers must be defs? > > > So why do you need these iterations instead of just taking all defs? > > Strictly speaking, no. Only derived pointers passed in registers. > > Are we guaranteed that all base pointers will appear as derived ones too? > > If yes, then it is good catch, taking them from defs is simpler (but taking them from operand list instead of def list sounds a bit more natural, IMHO) > > > > > I'm a bit confused here. 
What is the difference between derived and base pointer here? > You have an alive gc pointer. It might be relocated == can be changed. So it must be defined as def independent on whether it is a derived one or base one. Do I miss anything here? > > If you are in doubt and write that it works under assumption I would suggest under debug assert that sets collected by different way are the same. GC pointers always occur in pairs (base, derived). Only derived pointers are relocated (and so can be tied to defs of the statepoint). If a base pointer needs to be relocated, it will appear as a (base, base) pair. It is not specified whether the base pointer must be relocated together with its derived pointer. At least, this is how I interpret the LLVM docs. So I originally wrote it in a way I had no doubts about. What's the point of having two implementations and comparing them with an assert? An assert is not proof that the 'doubtful' implementation is correct. ================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:401 + + // To insert reload at the end of MBB, insert it before last instruction + // and then swap them. ---------------- skatkov wrote: > dantrushin wrote: > > skatkov wrote: > > > what is the reason for this magic? > > The reason is that `TTI.loadRegFromStackSlot` can insert load only **before** some existing instruction. > Does it make sense to add a utility function to TTI which inserts after some existing instruction? It looks more natural than this magic. I will have to implement that function in all backends and then spend 6 months or so asking people to review a change which they won't ever use...
I would like to avoid that Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 From llvm-commits at lists.llvm.org Tue Jul 7 02:09:01 2020 From: llvm-commits at lists.llvm.org (Peter Smith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:09:01 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: <1ccbf0c22f6a2ecfd4872dbebf6df360@localhost.localdomain> psmith added a comment. I think an option makes sense. I have a small reservation about making users order their matches on the command line. We should make it more explicit in the help and the manual, or try to sort the matches using a heuristic order of specificity; for example, no wildcards is more specific than wildcards, and longer text is more specific than shorter. Other than that I think it looks OK. ================ Comment at: lld/ELF/Options.td:127 +defm dead_nonalloc_reloc_value : EEq<"dead-nonalloc-reloc-value", + "Resolve a relocation from a matched non-SHF_ALLOC section to a discarded " ---------------- I think it will be worth mentioning that there can be multiple occurrences and that the last takes precedence. "Resolve a relocation from a matched non-SHF_ALLOC section to a discarded symbol to the specified value. Accepts wildcards, in the event of a section matching more than one instance of this option, the last instance on the command-line takes precedence." ================ Comment at: lld/docs/ld.lld.1:626 .Pp +.It Cm dead-reloc-in-nonalloc Ns = Ns Ar section_glob=value +Resolve a relocation in a matched non-SHF_ALLOC section referencing a discarded symbol to ---------------- Similar to the comment above. Although, as this is the man page, we can elaborate a bit. "Resolve a relocation from a matched non-SHF_ALLOC section to a discarded symbol to the specified value.
Accepts wildcards, in the event of a section matching more than one instance of this option, the last instance on the command-line takes precedence. An order of least specific to most specific match is recommended." Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Tue Jul 7 02:13:11 2020 From: llvm-commits at lists.llvm.org (Lucas Prates via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:13:11 +0000 (UTC) Subject: [PATCH] D82552: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand In-Reply-To: References: Message-ID: <937280c1bfbe68f07556c4a94818c548@localhost.localdomain> pratlucas updated this revision to Diff 275951. pratlucas added a comment. Fixing typos. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82552/new/ https://reviews.llvm.org/D82552 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/test/CodeGen/ARM/arm-half-promote.ll Index: llvm/test/CodeGen/ARM/arm-half-promote.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/ARM/arm-half-promote.ll @@ -0,0 +1,53 @@ +; RUN: llc < %s -mtriple=thumbv7s-apple-ios7.0.0 | FileCheck %s + +define arm_aapcs_vfpcc { <8 x half>, <8 x half> } @f1() { +; CHECK-LABEL: _f1 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 
s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + ret { <8 x half>, <8 x half> } zeroinitializer +} + +define swiftcc { <8 x half>, <8 x half> } @f2() { +; CHECK-LABEL: _f2 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + ret { <8 x half>, <8 x half> } zeroinitializer +} Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -4554,7 +4554,7 @@ // FIXME need to be more flexible about rounding mode. (void)V.convert(APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored); - return getConstant(V.bitcastToAPInt(), DL, VT); + return getConstant(V.bitcastToAPInt().getZExtValue(), DL, VT); } } } -------------- next part -------------- A non-text attachment was scrubbed... Name: D82552.275951.patch Type: text/x-patch Size: 2681 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 02:15:08 2020 From: llvm-commits at lists.llvm.org (Serguei Katkov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:15:08 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: skatkov added a comment. I need an answer to the question about test and at least update doc for rewriteStatepoint. 
================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:125 +// Return statepoint GC args as a set +static SmallSet collectGCRegs(MachineInstr &MI) { + StatepointOpers SO(&MI); ---------------- dantrushin wrote: > skatkov wrote: > > dantrushin wrote: > > > skatkov wrote: > > > > Do I understand correctly that with your changes ALL GC pointers must be defs? > > > > So why do you need these iterations instead of just taking all defs? > > > Strictly speaking, no. Only derived pointers passed in registers. > > > Are we guaranteed that all base pointers will appear as derived ones too? > > > If yes, then it is good catch, taking them from defs is simpler (but taking them from operand list instead of def list sounds a bit more natural, IMHO) > > > > > > > > I'm a bit confused here. What is the difference between derived and base pointer here? > > You have an alive gc pointer. It might be relocated == can be changed. So it must be defined as def independent on whether it is a derived one or base one. Do I miss anything here? > > > > If you are in doubt and write that it works under assumption I would suggest under debug assert that sets collected by different way are the same. > GC pointers always occur in pairs (base, derived). > Only derived pointers are relocated (and so can be tied to defs of the statepoint). > If a base pointer needs to be relocated, it will appear as a (base, base) pair. > > It is not specified whether the base pointer must be relocated together with its derived pointer. > At least, this is how I interpret the LLVM docs. So I originally wrote it in a way I had no doubts about. > > What's the point of having two implementations and comparing them with an assert? > An assert is not proof that the 'doubtful' implementation is correct. It will help you easily catch an error in case I am wrong about the defs.
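For illustration only, the debug-build cross-check discussed in this thread could look roughly like the sketch below. It is a minimal model, not code from D81647: it uses plain `std::set` and a made-up `Operand` struct instead of `SmallSet`, `MachineInstr` and `StatepointOpers`, and the names `collectFromGCOperands`, `collectFromDefs` and `collectGCRegsChecked` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a statepoint operand (not an LLVM type).
struct Operand {
  unsigned Reg;     // register number
  bool IsDef;       // true if this operand is a def of the instruction
  bool IsGCPointer; // true if this operand is listed among the GC pointers
};

// Way 1: walk the operand list and keep the registers used as GC pointers.
std::set<unsigned> collectFromGCOperands(const std::vector<Operand> &Ops) {
  std::set<unsigned> Regs;
  for (const Operand &Op : Ops)
    if (Op.IsGCPointer)
      Regs.insert(Op.Reg);
  return Regs;
}

// Way 2: simply take every register the instruction defines.
std::set<unsigned> collectFromDefs(const std::vector<Operand> &Ops) {
  std::set<unsigned> Regs;
  for (const Operand &Op : Ops)
    if (Op.IsDef)
      Regs.insert(Op.Reg);
  return Regs;
}

// Use one way as the implementation and, in builds with assertions enabled,
// check that the other way agrees. The assert does not prove correctness;
// it only flags inputs on which the two approaches diverge.
std::set<unsigned> collectGCRegsChecked(const std::vector<Operand> &Ops) {
  std::set<unsigned> Regs = collectFromGCOperands(Ops);
  assert(Regs == collectFromDefs(Ops) &&
         "GC pointer operands and defs disagree");
  return Regs;
}
```

In this model the check costs nothing when `NDEBUG` is defined, because the `assert` expression is not evaluated there.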
================ Comment at: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp:401 + + // To insert reload at the end of MBB, insert it before last instruction + // and then swap them. ---------------- dantrushin wrote: > skatkov wrote: > > dantrushin wrote: > > > skatkov wrote: > > > > what is the reason for this magic? > > > The reason is that `TTI.loadRegFromStackSlot` can insert load only **before** some existing instruction. > > Does it make sense to add a utility function to TTI which inserts after some existing instruction? It looks more natural than this magic. > I will have to implement that function in all backends and then spend 6 months or so asking people > to review a change which they won't ever use... > I would like to avoid that ok Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 From llvm-commits at lists.llvm.org Tue Jul 7 02:16:05 2020 From: llvm-commits at lists.llvm.org (Ulrich Weigand via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:16:05 +0000 (UTC) Subject: [PATCH] D83251: [SystemZ] Allow specifying integer registers as part of the address calculation In-Reply-To: References: Message-ID: uweigand accepted this revision. uweigand added a comment. This revision is now accepted and ready to land. A couple of cosmetic changes inline, otherwise LGTM. Thanks! ================ Comment at: llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp:907 + + auto *CE = dyn_cast(Register); + if (!CE) ---------------- Good point, this should be const. ================ Comment at: llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp:959 + // (i.e. insn is of type BDVMem) is true. + RegisterGroup RegGroup = (HasVectorIndex) ? RegV : RegGR; + ---------------- Parentheses around HasVectorIndex are not necessary.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83251/new/ https://reviews.llvm.org/D83251 From llvm-commits at lists.llvm.org Tue Jul 7 02:16:27 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:16:27 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: hans added inline comments. ================ Comment at: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp:170 PassManagerBuilder::PassManagerBuilder() { OptLevel = 2; ---------------- Oh, just noticed: I think CallGraphProfile should be initialized along with the other flags here. ================ Comment at: llvm/test/Other/opt-O2-pipeline.ll:289 +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ---------------- nikic wrote: > Is it possible to switch this pass to use LazyBPI / LazyBFA, only fetched if PGO is actually in use? > > PGO functionality that most people don't use adding expensive analysis passes like PDT should be avoided. I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 02:21:28 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:21:28 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. 
In-Reply-To: References: Message-ID: <78ca6c296188e52e39fad907e0b16fb7@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:76-89 + if (!C) { + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 + " terminated prematurely: %s", + SetOffset, toString(std::move(C.takeError())).c_str())); + continue; ---------------- dblaikie wrote: > jhenderson wrote: > > ikudrin wrote: > > > dblaikie wrote: > > > > I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended. > > > > > > > > Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas. > > > These wordings are already better than mine. Thanks! > > How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context. > The suggestion wasn't for brevity, but clarity. I found the original messages unclear & was hoping to clarify them. > > What are the two messages in total (with all the added context, for both too short and too long) & how clear are they? Taken from the test case: ``` error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72 ``` (the "no null terminated" bit might differ depending on the exact failure, e.g.
"unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)") ``` error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c ``` ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:3 + +## All four public name sections share the same parser, but slightly different +## code paths are used to reach it. Do a comprehensive check for one of the ---------------- I think this is the first time I've heard the term "public name sections" being used. Is this called that in the standard? Otherwise, I might suggest using a different phrasing (though don't necessarily know what). ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:7-8 + +# RUN: not llvm-dwarfdump -debug-gnu-pubnames %t 2> %t.err | FileCheck %s +# RUN: FileCheck %s --input-file=%t.err --check-prefix=ERR + ---------------- I don't mind too much either way, especially given the difficulties I recently had with the debug line equivalent test, but is there a particular reason you've kept the two streams separate? By combining them you can show the relative position of output for the common case of the streams being combined. ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:17 + +## The next few sets does not contain all required fields in the header. +# CHECK-NEXT: length = 0x00000001, format = DWARF32, version = 0x0000, unit_offset = 0x00000000, unit_size = 0x00000000 ---------------- does not -> do not ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:44-45 + .short 2 # Version + .long 0x32 # Debug Info offset + .byte 1, 2, 3 # Debug Info Length (truncated) +.LSet2End: ---------------- For consistency, either offset -> Offset or Length -> length (here and below). 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Tue Jul 7 02:21:50 2020 From: llvm-commits at lists.llvm.org (Ruiling, Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:21:50 +0000 (UTC) Subject: [PATCH] D83020: [AMDGPU] Avoid using s_cmpk when src0 is not register In-Reply-To: References: Message-ID: <1d6bb547b7c606a8d4613d1d675e5a45@localhost.localdomain> ruiling marked an inline comment as done. ruiling added inline comments. ================ Comment at: llvm/test/CodeGen/AMDGPU/cmp_shrink.mir:5 +# GCN: bb.0: +# GCN-NOT: S_CMPK_GT_I32 +--- ---------------- ruiling wrote: > arsenm wrote: > > ruiling wrote: > > > arsenm wrote: > > > > positive checks are more useful. Also you can just generate these checks. Can you reproduce this with an IR test too? > > > will try positive check, how to generate the checks? could you give a little bit more info? The original test case that hit the issue is over-complex I think. Normally, a constant expression at IR level is easy to be optimized off by the middle-end. so I think a .mir test is enough for this issue. > > So what is the context this appears? Why wasn't it optimized out? > well I didn't carefully check the program yet to understand why the optimization algorithms in llvm fails to optimize the program. but I think that is another problem that worth a careful investigation. I will investigate and try to optimize it off later. But I think this patch can be merged, right? can anyone help to merge? I don't have commit access. @arsenm The issue occurs when running vulkancts again AMD open-source vulkan driver. I did a little more check, the test-case has ~40 BBs and lots of phi instructions which is later simplified and proved to be constant. And the problem may be because LLPC choose a subset of llvm optimization passes considering compilation time (https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/patch/Patch.cpp#L223). 
I tried using the standard LLVM set of passes, and the constant was optimized away. I think the optimization passes for Vulkan may need further tuning to reach a better trade-off between compile time and the quality of the generated code. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83020/new/ https://reviews.llvm.org/D83020 From llvm-commits at lists.llvm.org Tue Jul 7 02:27:49 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:27:49 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: jhenderson added a comment. Aside from the inline comment, this looks reasonable, but I haven't looked in detail due to time constraints. ================ Comment at: lld/test/ELF/dead-reloc-in-nonalloc.s:32-33 +## If a section matches multiple patterns. The last pattern wins. +# RUN: ld.lld --icf=all -z dead-reloc-in-nonalloc='.debug_i*=1' \ +# RUN: -z dead-reloc-in-nonalloc='.debug_info=0' %t.o -o - | cmp %tzero - + ---------------- I'd be slightly tempted to swap the two arguments around, to show that it is definitely last, and not most-specific, that wins. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Tue Jul 7 02:29:40 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:29:40 +0000 (UTC) Subject: [PATCH] D83220: [DWARFYAML][unittest] Refactor parseDWARFYAML(). In-Reply-To: References: Message-ID: jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. Makes sense. LGTM.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83220/new/ https://reviews.llvm.org/D83220 From llvm-commits at lists.llvm.org Tue Jul 7 02:38:33 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:38:33 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <7a4493fdd65d1633076fc24eec640304@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- dblaikie wrote: > Higuoxing wrote: > > dblaikie wrote: > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) (just in case you missed it, this is a yaml2obj test). 
The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the existing llvm-dwarfdump tests require for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Tue Jul 7 02:39:10 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:39:10 +0000 (UTC) Subject: [PATCH] D83257: [SCCP] Handle assume predicates In-Reply-To: References: Message-ID: fhahn accepted this revision. fhahn added a comment. This revision is now accepted and ready to land. LGTM, thanks!
================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1337 IV, &CB, ValueLatticeElement::getRange(NewCR, /*MayIncludeUndef=*/true)); return; ---------------- nikic wrote: > We could set MayIncludeUndef=false for assumes (as undef/poison has always been UB there), but I didn't think it worthwhile to make the distinction, as we plan to flip this for branches in the future anyway. It's probably not worth the trouble and we should just prioritize flipping the flag in general. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83257/new/ https://reviews.llvm.org/D83257 From llvm-commits at lists.llvm.org Tue Jul 7 02:41:57 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:41:57 +0000 (UTC) Subject: [PATCH] D83037: [llvm-readobj] - Fix a crash scenario in GNUStyle::printHashSymbols(). In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGd5cbf7ba3252: [llvm-readobj] - Fix a crash scenario in GNUStyle<ELFT>::printHashSymbols(). (authored by grimar). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83037/new/ https://reviews.llvm.org/D83037 Files: llvm/test/tools/llvm-readobj/ELF/hash-symbols.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83037.275651.patch Type: text/x-patch Size: 7651 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 02:44:05 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:44:05 +0000 (UTC) Subject: [PATCH] D83287: [NFCI][llvm-reduce] Cleanup Delta passes to use Oracle abstraction Message-ID: lebedev.ri created this revision. 
lebedev.ri added reviewers: nickdesaulniers, dblaikie, diegotf, george.burgess.iv. lebedev.ri added a project: LLVM. I think, this results in much more understandable/readable flow. At least the original logic was perhaps the most hard thing for me to grasp when taking an initial look on the delta passes. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83287 Files: llvm/tools/llvm-reduce/deltas/Delta.h llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83287.275961.patch Type: text/x-patch Size: 11576 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 02:45:27 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:45:27 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <085dd96a1c4e2069bb44b8d11936d868@localhost.localdomain> ebevhan updated this revision to Diff 275963. ebevhan added a comment. Add vector support and TD isel nodes. 
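For readers unfamiliar with saturating shifts, the intended behaviour can be sketched on plain 32-bit integers as below. This is an illustration only, not the lowering from the patch: the function names `ushlSat` and `sshlSat` are hypothetical, the width is fixed at 32 bits, and the semantics are an assumption based on the intrinsic names (a left shift that clamps to the largest or smallest representable value when bits would be lost). The LangRef.rst change in the patch is the authoritative definition.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned saturating shift left (sketch). Clamps to the largest unsigned
// value if any set bit would be shifted out.
std::uint32_t ushlSat(std::uint32_t X, unsigned Amt) {
  assert(Amt < 32 && "shift amount must be less than the bit width");
  std::uint32_t Shifted = X << Amt;
  // If shifting back does not recover X, bits were lost.
  if ((Shifted >> Amt) != X)
    return std::numeric_limits<std::uint32_t>::max();
  return Shifted;
}

// Signed saturating shift left (sketch). Computes the exact result in
// 64 bits, then clamps it into the 32-bit signed range.
std::int32_t sshlSat(std::int32_t X, unsigned Amt) {
  assert(Amt < 32 && "shift amount must be less than the bit width");
  // |X| <= 2^31 and 2^Amt <= 2^31, so the product fits in 64 bits.
  std::int64_t Wide = static_cast<std::int64_t>(X) * (std::int64_t{1} << Amt);
  if (Wide > std::numeric_limits<std::int32_t>::max())
    return std::numeric_limits<std::int32_t>::max();
  if (Wide < std::numeric_limits<std::int32_t>::min())
    return std::numeric_limits<std::int32_t>::min();
  return static_cast<std::int32_t>(Wide);
}
```

The signed version multiplies instead of shifting so that negative inputs avoid the pitfalls of left-shifting negative values in C++.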
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 Files: llvm/docs/LangRef.rst llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/include/llvm/IR/Intrinsics.td llvm/include/llvm/Target/TargetSelectionDAG.td llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/Verifier.cpp llvm/test/CodeGen/X86/sshl_sat.ll llvm/test/CodeGen/X86/ushl_sat.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83216.275963.patch Type: text/x-patch Size: 50054 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 02:46:55 2020 From: llvm-commits at lists.llvm.org (James Henderson via llvm-commits) Date: Tue, 7 Jul 2020 10:46:55 +0100 Subject: [PATCH] D83037: [llvm-readobj] - Fix a crash scenario in GNUStyle::printHashSymbols(). In-Reply-To: References: Message-ID: Hi @Mehdi/MaskRay, Looks like something else odd is going on with Phabricator - I accepted this patch, but when it landed, I got this email saying it landed in a "Needs Review" state (see below). Probably there's something wrong with Phabricator again? James On Tue, 7 Jul 2020 at 10:41, George Rimar via Phabricator < reviews at reviews.llvm.org> wrote: > This revision was not accepted when it landed; it landed in state "Needs > Review". > This revision was automatically updated to reflect the committed changes. > Closed by commit rGd5cbf7ba3252: [llvm-readobj] - Fix a crash scenario in > GNUStyle<ELFT>::printHashSymbols(). (authored by grimar). 
> > Repository: > rG LLVM Github Monorepo > > CHANGES SINCE LAST ACTION > https://reviews.llvm.org/D83037/new/ > > https://reviews.llvm.org/D83037 > > Files: > llvm/test/tools/llvm-readobj/ELF/hash-symbols.test > llvm/tools/llvm-readobj/ELFDumper.cpp > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Tue Jul 7 02:48:03 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:48:03 +0000 (UTC) Subject: [PATCH] D81256: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare In-Reply-To: References: Message-ID: <6d05b5a2cf96f0cc93aee6c7f33cf923@localhost.localdomain> foad accepted this revision. foad added a comment. This revision is now accepted and ready to land. Looks good to me now - though it would still be nice to hear from @qcolombet as the author of `TypePromotionTransaction`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81256/new/ https://reviews.llvm.org/D81256 From llvm-commits at lists.llvm.org Tue Jul 7 02:48:52 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:48:52 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <7d08e9fcb68ea8d975ecf985fecd2d1c@localhost.localdomain> Higuoxing marked an inline comment as done. Higuoxing added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- jhenderson wrote: > dblaikie wrote: > > Higuoxing wrote: > > > dblaikie wrote: > > > > Should this be tested via llvm-dwarfdump instead? 
(perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the existing llvm-dwarfdump tests require for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great).
> > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Tue Jul 7 02:50:19 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:50:19 +0000 (UTC) Subject: [PATCH] D83136: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst In-Reply-To: References: Message-ID: <76e482edadbd839a97f283d9b1c17b9f@localhost.localdomain> gchatelet marked an inline comment as done. gchatelet added inline comments. ================ Comment at: llvm/include/llvm/IR/IRBuilder.h:1746 + return Insert(new AtomicCmpXchgInst( + Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID)); } ---------------- jfb wrote: > jyknight wrote: > > jfb wrote: > > > Are types always rounded to a power of two at this point? > > > > > > i.e. what does this do: `struct { char c[3]; };` ? > > > > > > Also, I think this is wrong for non-lock-free types. The alignment requirement is lower on those. > > This is just encoding the pre-existing behavior -- you cannot currently create an cmpxchg instruction with alignment other than the size of the type. > > > > Right now, you _also_ cannot create a cmpxchg instruction with other than integral or pointer types, which -- in any _current_ llvm backend, afaik -- have non-power-of-2 sizes. > > > > Upcoming changes will plumb through a required alignment argument everywhere, and then we'll be rid of this weird hardcoded special alignment behavior here. > That sounds good to me. 
FWIW I checked and we get the following today: > ``` > %3 = bitcast %"struct.std::__1::atomic"* %0 to i32* > %4 = zext i24 %1 to i32 > %5 = cmpxchg i32* %3, i32 %4, i32 %4 seq_cst seq_cst > ret void > ``` > That being said, if we're going to allow other things to come into a cmpxchg in the future (i.e. remove the need to bitcast) then I want to make sure we encode the right requirements here, right now. I agree that they're enforced later in the code when the instruction is created, but that won't always be the case. With this patch, the alignment is still not user accessible, just defined a bit higher in the call hierarchy so I think it's fine. This patch is NFC. Next patch will make this change visible to users so we'll have to document it in LangRef, @jfb I'll loop you in so you can proofread it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83136/new/ https://reviews.llvm.org/D83136 From llvm-commits at lists.llvm.org Tue Jul 7 02:54:25 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via llvm-commits) Date: Tue, 07 Jul 2020 02:54:25 -0700 (PDT) Subject: [llvm] 74c7237 - [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst Message-ID: <5f044651.1c69fb81.67a4.72a4@mx.google.com> Author: Guillaume Chatelet Date: 2020-07-07T09:54:13Z New Revision: 74c723757e69fbe7d85e42527d07b728113699ae URL: https://github.com/llvm/llvm-project/commit/74c723757e69fbe7d85e42527d07b728113699ae DIFF: https://github.com/llvm/llvm-project/commit/74c723757e69fbe7d85e42527d07b728113699ae.diff LOG: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst This is the first step to add support for the align attribute to AtomicRMWInst and AtomicCmpXchgInst. Next step is to add support in IRBuilder and BitcodeReader. 
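The `getAlign()`/`setAlignment()` pair in the diff that follows stores the alignment as its base-2 logarithm, which is why the power-of-two question raised in the review matters. A minimal sketch of that encoding, independent of LLVM's `Align` class, is below. The helper names `isPowerOfTwo`, `encodeAlign` and `decodeAlign` are hypothetical, and the five-bit field width is only inferred from the `// Next bit:13` comments in the diff.

```cpp
#include <cassert>
#include <cstdint>

// True for 1, 2, 4, 8, ...; false for 0 and for values such as 3.
constexpr bool isPowerOfTwo(std::uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Encode an alignment in bytes as its log2, so that it fits in a few bits.
// A size such as 3 (e.g. struct { char c[3]; }) cannot be encoded this way.
inline unsigned encodeAlign(std::uint64_t Bytes) {
  assert(isPowerOfTwo(Bytes) && "alignment must be a power of two");
  unsigned Log2 = 0;
  while ((std::uint64_t{1} << Log2) != Bytes)
    ++Log2;
  return Log2;
}

// Decode the stored log2 back into a byte alignment.
inline std::uint64_t decodeAlign(unsigned Log2) {
  return std::uint64_t{1} << Log2;
}
```

With this scheme a five-bit field can describe alignments from 1 byte up to 2^31 bytes.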
Bug: https://bugs.llvm.org/show_bug.cgi?id=27168 Differential Revision: https://reviews.llvm.org/D83136 Added: Modified: llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/IR/Instructions.h llvm/lib/AsmParser/LLParser.cpp llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/IR/Instructions.cpp llvm/unittests/Analysis/AliasAnalysisTest.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h index ec042f0740fc..ffec4ff64ca6 100644 --- a/llvm/include/llvm/IR/IRBuilder.h +++ b/llvm/include/llvm/IR/IRBuilder.h @@ -1733,19 +1733,21 @@ class IRBuilderBase { return Insert(new FenceInst(Context, Ordering, SSID), Name); } - AtomicCmpXchgInst * - CreateAtomicCmpXchg(Value *Ptr, Value *Cmp, Value *New, - AtomicOrdering SuccessOrdering, - AtomicOrdering FailureOrdering, - SyncScope::ID SSID = SyncScope::System) { - return Insert(new AtomicCmpXchgInst(Ptr, Cmp, New, SuccessOrdering, - FailureOrdering, SSID)); + AtomicCmpXchgInst *CreateAtomicCmpXchg( + Value *Ptr, Value *Cmp, Value *New, AtomicOrdering SuccessOrdering, + AtomicOrdering FailureOrdering, SyncScope::ID SSID = SyncScope::System) { + const DataLayout &DL = BB->getModule()->getDataLayout(); + Align Alignment(DL.getTypeStoreSize(New->getType())); + return Insert(new AtomicCmpXchgInst( + Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID)); } AtomicRMWInst *CreateAtomicRMW(AtomicRMWInst::BinOp Op, Value *Ptr, Value *Val, AtomicOrdering Ordering, SyncScope::ID SSID = SyncScope::System) { - return Insert(new AtomicRMWInst(Op, Ptr, Val, Ordering, SSID)); + const DataLayout &DL = BB->getModule()->getDataLayout(); + Align Alignment(DL.getTypeStoreSize(Val->getType())); + return Insert(new AtomicRMWInst(Op, Ptr, Val, Alignment, Ordering, SSID)); } Value *CreateGEP(Value *Ptr, ArrayRef IdxList, diff --git a/llvm/include/llvm/IR/Instructions.h b/llvm/include/llvm/IR/Instructions.h index 
57ad0db6f3ce..7119b1392d2d 100644 --- a/llvm/include/llvm/IR/Instructions.h +++ b/llvm/include/llvm/IR/Instructions.h @@ -513,10 +513,15 @@ class FenceInst : public Instruction { /// failure (false) as second element. /// class AtomicCmpXchgInst : public Instruction { - void Init(Value *Ptr, Value *Cmp, Value *NewVal, + void Init(Value *Ptr, Value *Cmp, Value *NewVal, Align Align, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SyncScope::ID SSID); + template + using AtomicOrderingBitfieldElement = + typename Bitfield::Element; + protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; @@ -524,34 +529,35 @@ class AtomicCmpXchgInst : public Instruction { AtomicCmpXchgInst *cloneImpl() const; public: - AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, + AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, Align Alignment, AtomicOrdering SuccessOrdering, - AtomicOrdering FailureOrdering, - SyncScope::ID SSID, Instruction *InsertBefore = nullptr); - AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, + AtomicOrdering FailureOrdering, SyncScope::ID SSID, + Instruction *InsertBefore = nullptr); + AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, Align Alignment, AtomicOrdering SuccessOrdering, - AtomicOrdering FailureOrdering, - SyncScope::ID SSID, BasicBlock *InsertAtEnd); + AtomicOrdering FailureOrdering, SyncScope::ID SSID, + BasicBlock *InsertAtEnd); // allocate space for exactly three operands void *operator new(size_t s) { return User::operator new(s, 3); } - // FIXME: Reuse bit 1 that was used by `syncscope.` - using VolatileField = Bitfield::Element; // Next bit:1 - using SuccessOrderingField = - Bitfield::Element; // Next bit:5 - using FailureOrderingField = - Bitfield::Element; // Next bit:8 - using WeakField = Bitfield::Element; // Next bit:9 + using VolatileField = Bitfield::Element; // Next bit:1 + using WeakField = Bitfield::Element; // Next bit:2 + using 
SuccessOrderingField = AtomicOrderingBitfieldElement<2>; // Next bit:5 + using FailureOrderingField = AtomicOrderingBitfieldElement<5>; // Next bit:8 + using AlignmentField = AlignmentBitfieldElement<8>; // Next bit:13 - /// Always returns the natural type alignment. - /// FIXME: Introduce a proper alignment - /// https://bugs.llvm.org/show_bug.cgi?id=27168 - Align getAlign() const; + /// Return the alignment of the memory that is being allocated by the + /// instruction. + Align getAlign() const { + return Align(1ULL << getSubclassData()); + } + + void setAlignment(Align Align) { + setSubclassData(Log2(Align)); + } /// Return true if this is a cmpxchg from a volatile memory /// location. @@ -726,10 +732,21 @@ class AtomicRMWInst : public Instruction { BAD_BINOP }; - AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val, +private: + template + using AtomicOrderingBitfieldElement = + typename Bitfield::Element; + + template + using BinOpBitfieldElement = + typename Bitfield::Element; + +public: + AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val, Align Alignment, AtomicOrdering Ordering, SyncScope::ID SSID, Instruction *InsertBefore = nullptr); - AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val, + AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val, Align Alignment, AtomicOrdering Ordering, SyncScope::ID SSID, BasicBlock *InsertAtEnd); @@ -738,13 +755,10 @@ class AtomicRMWInst : public Instruction { return User::operator new(s, 2); } - // FIXME: Reuse bit 1 that was used by `syncscope.` - using VolatileField = Bitfield::Element; // Next bit:1 - using AtomicOrderingField = - Bitfield::Element; // Next bit:5 - using OperationField = Bitfield::Element; // Next bit:9 + using VolatileField = Bitfield::Element; // Next bit:1 + using AtomicOrderingField = AtomicOrderingBitfieldElement<1>; // Next bit:4 + using OperationField = BinOpBitfieldElement<4>; // Next bit:8 + using AlignmentField = AlignmentBitfieldElement<8>; // Next bit:13 BinOp getOperation() const { 
return getSubclassData(); } @@ -764,10 +778,15 @@ class AtomicRMWInst : public Instruction { setSubclassData(Operation); } - /// Always returns the natural type alignment. - /// FIXME: Introduce a proper alignment - /// https://bugs.llvm.org/show_bug.cgi?id=27168 - Align getAlign() const; + /// Return the alignment of the memory that is being allocated by the + /// instruction. + Align getAlign() const { + return Align(1ULL << getSubclassData()); + } + + void setAlignment(Align Align) { + setSubclassData(Log2(Align)); + } /// Return true if this is a RMW on a volatile memory location. /// @@ -827,7 +846,7 @@ class AtomicRMWInst : public Instruction { } private: - void Init(BinOp Operation, Value *Ptr, Value *Val, + void Init(BinOp Operation, Value *Ptr, Value *Val, Align Align, AtomicOrdering Ordering, SyncScope::ID SSID); // Shadow Instruction::setInstructionSubclassData with a private forwarding diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp index db4fbfd7e3c9..85105f2c4b49 100644 --- a/llvm/lib/AsmParser/LLParser.cpp +++ b/llvm/lib/AsmParser/LLParser.cpp @@ -7209,8 +7209,13 @@ int LLParser::ParseCmpXchg(Instruction *&Inst, PerFunctionState &PFS) { return Error(NewLoc, "new value and pointer type do not match"); if (!New->getType()->isFirstClassType()) return Error(NewLoc, "cmpxchg operand must be a first class value"); + + Align Alignment( + PFS.getFunction().getParent()->getDataLayout().getTypeStoreSize( + Cmp->getType())); + AtomicCmpXchgInst *CXI = new AtomicCmpXchgInst( - Ptr, Cmp, New, SuccessOrdering, FailureOrdering, SSID); + Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID); CXI->setVolatile(isVolatile); CXI->setWeak(isWeak); Inst = CXI; @@ -7294,9 +7299,11 @@ int LLParser::ParseAtomicRMW(Instruction *&Inst, PerFunctionState &PFS) { if (Size < 8 || (Size & (Size - 1))) return Error(ValLoc, "atomicrmw operand must be power-of-two byte-sized" " integer"); - + Align Alignment( + 
PFS.getFunction().getParent()->getDataLayout().getTypeStoreSize( + Val->getType())); AtomicRMWInst *RMWI = - new AtomicRMWInst(Operation, Ptr, Val, Ordering, SSID); + new AtomicRMWInst(Operation, Ptr, Val, Alignment, Ordering, SSID); RMWI->setVolatile(isVolatile); Inst = RMWI; return AteExtraComma ? InstExtraComma : InstNormal; diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 4471302c05d7..dceb492c9120 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -5020,8 +5020,10 @@ Error BitcodeReader::parseFunctionBody(Function *F) { else FailureOrdering = getDecodedOrdering(Record[OpNum + 3]); - I = new AtomicCmpXchgInst(Ptr, Cmp, New, SuccessOrdering, FailureOrdering, - SSID); + Align Alignment( + TheModule->getDataLayout().getTypeStoreSize(Cmp->getType())); + I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, SuccessOrdering, + FailureOrdering, SSID); FullTy = StructType::get(Context, {FullTy, Type::getInt1Ty(Context)}); cast(I)->setVolatile(Record[OpNum]); @@ -5058,7 +5060,9 @@ Error BitcodeReader::parseFunctionBody(Function *F) { Ordering == AtomicOrdering::Unordered) return error("Invalid record"); SyncScope::ID SSID = getDecodedSyncScopeID(Record[OpNum + 3]); - I = new AtomicRMWInst(Operation, Ptr, Val, Ordering, SSID); + Align Alignment( + TheModule->getDataLayout().getTypeStoreSize(Val->getType())); + I = new AtomicRMWInst(Operation, Ptr, Val, Alignment, Ordering, SSID); FullTy = getPointerElementFlatType(FullTy); cast(I)->setVolatile(Record[OpNum+1]); InstructionList.push_back(I); diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp index e22f609b1885..f650ad9130ac 100644 --- a/llvm/lib/IR/Instructions.cpp +++ b/llvm/lib/IR/Instructions.cpp @@ -1479,7 +1479,7 @@ StoreInst::StoreInst(Value *val, Value *addr, bool isVolatile, Align Align, //===----------------------------------------------------------------------===// void 
AtomicCmpXchgInst::Init(Value *Ptr, Value *Cmp, Value *NewVal, - AtomicOrdering SuccessOrdering, + Align Alignment, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SyncScope::ID SSID) { Op<0>() = Ptr; @@ -1488,6 +1488,7 @@ void AtomicCmpXchgInst::Init(Value *Ptr, Value *Cmp, Value *NewVal, setSuccessOrdering(SuccessOrdering); setFailureOrdering(FailureOrdering); setSyncScopeID(SSID); + setAlignment(Alignment); assert(getOperand(0) && getOperand(1) && getOperand(2) && "All operands must be non-null!"); @@ -1512,6 +1513,7 @@ void AtomicCmpXchgInst::Init(Value *Ptr, Value *Cmp, Value *NewVal, } AtomicCmpXchgInst::AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, + Align Alignment, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SyncScope::ID SSID, @@ -1520,10 +1522,11 @@ AtomicCmpXchgInst::AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, StructType::get(Cmp->getType(), Type::getInt1Ty(Cmp->getContext())), AtomicCmpXchg, OperandTraits::op_begin(this), OperandTraits::operands(this), InsertBefore) { - Init(Ptr, Cmp, NewVal, SuccessOrdering, FailureOrdering, SSID); + Init(Ptr, Cmp, NewVal, Alignment, SuccessOrdering, FailureOrdering, SSID); } AtomicCmpXchgInst::AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, + Align Alignment, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SyncScope::ID SSID, @@ -1532,14 +1535,7 @@ AtomicCmpXchgInst::AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, StructType::get(Cmp->getType(), Type::getInt1Ty(Cmp->getContext())), AtomicCmpXchg, OperandTraits::op_begin(this), OperandTraits::operands(this), InsertAtEnd) { - Init(Ptr, Cmp, NewVal, SuccessOrdering, FailureOrdering, SSID); -} - -Align AtomicCmpXchgInst::getAlign() const { - // The default here is to assume it has NATURAL alignment, not - // DataLayout-specified alignment. 
- const DataLayout &DL = getModule()->getDataLayout(); - return Align(DL.getTypeStoreSize(getCompareOperand()->getType())); + Init(Ptr, Cmp, NewVal, Alignment, SuccessOrdering, FailureOrdering, SSID); } //===----------------------------------------------------------------------===// @@ -1547,13 +1543,14 @@ Align AtomicCmpXchgInst::getAlign() const { //===----------------------------------------------------------------------===// void AtomicRMWInst::Init(BinOp Operation, Value *Ptr, Value *Val, - AtomicOrdering Ordering, + Align Alignment, AtomicOrdering Ordering, SyncScope::ID SSID) { Op<0>() = Ptr; Op<1>() = Val; setOperation(Operation); setOrdering(Ordering); setSyncScopeID(SSID); + setAlignment(Alignment); assert(getOperand(0) && getOperand(1) && "All operands must be non-null!"); @@ -1567,25 +1564,21 @@ void AtomicRMWInst::Init(BinOp Operation, Value *Ptr, Value *Val, } AtomicRMWInst::AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val, - AtomicOrdering Ordering, - SyncScope::ID SSID, - Instruction *InsertBefore) - : Instruction(Val->getType(), AtomicRMW, - OperandTraits::op_begin(this), - OperandTraits::operands(this), - InsertBefore) { - Init(Operation, Ptr, Val, Ordering, SSID); + Align Alignment, AtomicOrdering Ordering, + SyncScope::ID SSID, Instruction *InsertBefore) + : Instruction(Val->getType(), AtomicRMW, + OperandTraits::op_begin(this), + OperandTraits::operands(this), InsertBefore) { + Init(Operation, Ptr, Val, Alignment, Ordering, SSID); } AtomicRMWInst::AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val, - AtomicOrdering Ordering, - SyncScope::ID SSID, - BasicBlock *InsertAtEnd) - : Instruction(Val->getType(), AtomicRMW, - OperandTraits::op_begin(this), - OperandTraits::operands(this), - InsertAtEnd) { - Init(Operation, Ptr, Val, Ordering, SSID); + Align Alignment, AtomicOrdering Ordering, + SyncScope::ID SSID, BasicBlock *InsertAtEnd) + : Instruction(Val->getType(), AtomicRMW, + OperandTraits::op_begin(this), + OperandTraits::operands(this), 
InsertAtEnd) { + Init(Operation, Ptr, Val, Alignment, Ordering, SSID); } StringRef AtomicRMWInst::getOperationName(BinOp Op) { @@ -1623,13 +1616,6 @@ StringRef AtomicRMWInst::getOperationName(BinOp Op) { llvm_unreachable("invalid atomicrmw operation"); } -Align AtomicRMWInst::getAlign() const { - // The default here is to assume it has NATURAL alignment, not - // DataLayout-specified alignment. - const DataLayout &DL = getModule()->getDataLayout(); - return Align(DL.getTypeStoreSize(getValOperand()->getType())); -} - //===----------------------------------------------------------------------===// // FenceInst Implementation //===----------------------------------------------------------------------===// @@ -4282,10 +4268,9 @@ StoreInst *StoreInst::cloneImpl() const { } AtomicCmpXchgInst *AtomicCmpXchgInst::cloneImpl() const { - AtomicCmpXchgInst *Result = - new AtomicCmpXchgInst(getOperand(0), getOperand(1), getOperand(2), - getSuccessOrdering(), getFailureOrdering(), - getSyncScopeID()); + AtomicCmpXchgInst *Result = new AtomicCmpXchgInst( + getOperand(0), getOperand(1), getOperand(2), getAlign(), + getSuccessOrdering(), getFailureOrdering(), getSyncScopeID()); Result->setVolatile(isVolatile()); Result->setWeak(isWeak()); return Result; @@ -4293,8 +4278,8 @@ AtomicCmpXchgInst *AtomicCmpXchgInst::cloneImpl() const { AtomicRMWInst *AtomicRMWInst::cloneImpl() const { AtomicRMWInst *Result = - new AtomicRMWInst(getOperation(), getOperand(0), getOperand(1), - getOrdering(), getSyncScopeID()); + new AtomicRMWInst(getOperation(), getOperand(0), getOperand(1), + getAlign(), getOrdering(), getSyncScopeID()); Result->setVolatile(isVolatile()); return Result; } diff --git a/llvm/unittests/Analysis/AliasAnalysisTest.cpp b/llvm/unittests/Analysis/AliasAnalysisTest.cpp index 83f4c2481934..6f0f6f5f6326 100644 --- a/llvm/unittests/Analysis/AliasAnalysisTest.cpp +++ b/llvm/unittests/Analysis/AliasAnalysisTest.cpp @@ -174,6 +174,7 @@ TEST_F(AliasAnalysisTest, getModRefInfo) { auto 
PtrType = Type::getInt32PtrTy(C); auto *Value = ConstantInt::get(IntType, 42); auto *Addr = ConstantPointerNull::get(PtrType); + auto Alignment = Align(IntType->getBitWidth() / 8); auto *Store1 = new StoreInst(Value, Addr, BB); auto *Load1 = new LoadInst(IntType, Addr, "load", BB); @@ -181,11 +182,11 @@ TEST_F(AliasAnalysisTest, getModRefInfo) { auto *VAArg1 = new VAArgInst(Addr, PtrType, "vaarg", BB); auto *CmpXChg1 = new AtomicCmpXchgInst( Addr, ConstantInt::get(IntType, 0), ConstantInt::get(IntType, 1), - AtomicOrdering::Monotonic, AtomicOrdering::Monotonic, + Alignment, AtomicOrdering::Monotonic, AtomicOrdering::Monotonic, SyncScope::System, BB); - auto *AtomicRMW = - new AtomicRMWInst(AtomicRMWInst::Xchg, Addr, ConstantInt::get(IntType, 1), - AtomicOrdering::Monotonic, SyncScope::System, BB); + auto *AtomicRMW = new AtomicRMWInst( + AtomicRMWInst::Xchg, Addr, ConstantInt::get(IntType, 1), Alignment, + AtomicOrdering::Monotonic, SyncScope::System, BB); ReturnInst::Create(C, nullptr, BB); From llvm-commits at lists.llvm.org Tue Jul 7 02:54:30 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 09:54:30 +0000 (UTC) Subject: [PATCH] D83136: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG74c723757e69: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst (authored by gchatelet). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83136/new/ https://reviews.llvm.org/D83136 Files: llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/IR/Instructions.h llvm/lib/AsmParser/LLParser.cpp llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/IR/Instructions.cpp llvm/unittests/Analysis/AliasAnalysisTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83136.275965.patch
Type: text/x-patch
Size: 18437 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:00:48 2020
From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:00:48 +0000 (UTC)
Subject: [PATCH] D82876: [Alignment][NFC] Migrate TargetTransformInfo::allowsMisalignedMemoryAccesses to Align
In-Reply-To:
References:
Message-ID: <21a360d23826f308a69df2c2478fe49b@localhost.localdomain>

gchatelet updated this revision to Diff 275966.
gchatelet added a comment.

rebase

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82876/new/

https://reviews.llvm.org/D82876

Files:
  llvm/include/llvm/Analysis/TargetTransformInfo.h
  llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
  llvm/include/llvm/CodeGen/BasicTTIImpl.h
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/Analysis/TargetTransformInfo.cpp
  llvm/lib/CodeGen/CodeGenPrepare.cpp
  llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/R600ISelLowering.cpp
  llvm/lib/Target/AMDGPU/R600ISelLowering.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.h
  llvm/lib/Target/ARM/ARMISelLowering.cpp
  llvm/lib/Target/ARM/ARMISelLowering.h
  llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
  llvm/lib/Target/Hexagon/HexagonISelLowering.h
  llvm/lib/Target/Mips/Mips16ISelLowering.cpp
  llvm/lib/Target/Mips/Mips16ISelLowering.h
  llvm/lib/Target/Mips/MipsSEISelLowering.cpp
  llvm/lib/Target/Mips/MipsSEISelLowering.h
  llvm/lib/Target/PowerPC/PPCISelLowering.cpp
  llvm/lib/Target/PowerPC/PPCISelLowering.h
  llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
  llvm/lib/Target/SystemZ/SystemZISelLowering.h
  llvm/lib/Target/VE/VEISelLowering.cpp
  llvm/lib/Target/VE/VEISelLowering.h
  llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
  llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82876.275966.patch
Type: text/x-patch
Size: 32960 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:05:01 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:05:01 +0000 (UTC)
Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc==
In-Reply-To:
References:
Message-ID: <7a242b33514c77840e81aab8c948d3cd@localhost.localdomain>

grimar added a comment.

Are there perhaps other names besides "dead-reloc-in-nonalloc" that we might want to consider?

`-z dead-noalloc-reloc-val`
`-z tombstone-reloc`
`-z resolve-dead-reloc`

================
Comment at: lld/ELF/Driver.cpp:1074
+ for (opt::Arg *arg : args.filtered(OPT_z)) {
+ constexpr StringRef prefix = "-z dead-reloc-in-nonalloc=: ";
+ std::pair option =
----------------
I'd move it after the first `continue`.
Also, perhaps `errPrefix` would be a clearer name for this variable.

================
Comment at: lld/test/ELF/dead-reloc-in-nonalloc.s:9
+# RUN: llvm-objdump -s %t | FileCheck %s --check-prefixes=COMMON,AA
+# RUN: ld.lld --icf=all -z dead-reloc-in-nonalloc=.debug_info=2863311530 \
+# RUN: -z dead-reloc-in-nonalloc=.not_debug=0xbbbbbbbb %t.o -o - | cmp %t -
----------------
Please add a comment saying that 2863311530 == 0xaaaaaaaa.
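
The equivalence requested in the comment above can be checked mechanically. The following is only an illustrative sketch: the constant and function names are made up for this note and are not taken from lld.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only; not part of lld.
constexpr std::uint64_t kDecimalTombstone = 2863311530ULL;
constexpr std::uint64_t kHexTombstone = 0xaaaaaaaaULL;

// Compile-time check that both spellings denote the same value.
static_assert(kDecimalTombstone == kHexTombstone,
              "2863311530 and 0xaaaaaaaa are the same value");

// Returns true if each of the four low-order bytes of `value` equals `byte`.
constexpr bool isRepeatedByte(std::uint64_t value, std::uint8_t byte) {
  for (int i = 0; i < 4; ++i)
    if (((value >> (8 * i)) & 0xff) != byte)
      return false;
  return true;
}
```

The byte-pattern helper shows why the hexadecimal spelling is easier to match against the objdump output that the test checks.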
================
Comment at: lld/test/ELF/dead-reloc-in-nonalloc.s:35
+
+# RUN: not ld.lld -z dead-reloc-in-nonalloc= 2>&1 | FileCheck %s --check-prefix=USAGE
+# RUN: not ld.lld -z dead-reloc-in-nonalloc=a= 2>&1 | FileCheck %s --check-prefix=USAGE
----------------
I'd leave a single common comment saying that you are now going to check all possible invalid cases (to separate these tests from the ones above).

================
Comment at: lld/test/ELF/debug-dead-reloc.s:27
+# OVERRIDE: Contents of section .debug_loc:
+# OVERRIDE-NEXT: 0000 2a000000 00000000 2a000000 00000000
+
----------------
What about printing the contents of the other sections too? (There seems to be no other test showing that, when the tombstone value is overridden for one debug section, the other ones remain unaffected.)

It is probably worth combining `CHECK` and `OVERRIDE`:

```
# CHECK: Contents of section .debug_loc:
# NOOVERRIDE-NEXT: 0000 feffffff ffffffff feffffff ffffffff
# OVERRIDE-NEXT: 0000 2a000000 00000000 2a000000 00000000
...
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83264/new/

https://reviews.llvm.org/D83264

From llvm-commits at lists.llvm.org Tue Jul 7 03:08:23 2020
From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:08:23 +0000 (UTC)
Subject: [PATCH] D83288: [LV] Pick vector loop body as insert point for SCEV expansion.
Message-ID:

fhahn created this revision.
fhahn added reviewers: Ayal, gilr, mkazantsev.
Herald added subscribers: javed.absar, hiraditya.
Herald added a project: LLVM.

Currently the DomTree is not kept up to date for additional blocks generated in the vector loop, for example when vectorizing with predication. SCEVExpander relies on dominance checks when looking for existing instructions to re-use, and in some cases that can lead to the expander picking instructions that do not actually dominate their insert point (e.g. as in PR46525).

Unfortunately, keeping the DT up to date is a bit tricky, because the CFG is only patched up after generating code for a block. For now, we can just use the vector loop header, as this ensures the inserted instructions dominate all uses in the vector loop. There should be no noticeable impact on the generated code, as other passes should sink those instructions, if profitable.

Fixes PR46525.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83288

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83288.275968.patch
Type: text/x-patch
Size: 8194 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:09:47 2020
From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:09:47 +0000 (UTC)
Subject: [PATCH] D83202: [Bitfields][NFC] Make sure bitfields are contiguous
In-Reply-To:
References:
Message-ID:

gchatelet updated this revision to Diff 275970.
gchatelet added a comment.

- Various nits

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83202/new/

https://reviews.llvm.org/D83202

Files:
  llvm/include/llvm/ADT/Bitfields.h
  llvm/include/llvm/IR/InstrTypes.h
  llvm/include/llvm/IR/Instruction.h
  llvm/include/llvm/IR/Instructions.h
  llvm/unittests/ADT/BitFieldsTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83202.275970.patch
Type: text/x-patch
Size: 11544 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:23:38 2020
From: llvm-commits at lists.llvm.org (Ilya Leoshkevich via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:23:38 +0000 (UTC)
Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays
Message-ID:

iii created this revision.
iii added a reviewer: yonghong-song.
Herald added subscribers: llvm-commits, dexonsmith, hiraditya.
Herald added a project: LLVM.

Some BPF programs compiled on s390 fail to load, because s390 arch-specific Linux headers contain float and double types. At the moment there is no BTF_KIND for floats and doubles, so the release version of LLVM ends up emitting type id 0 for them, which the in-kernel verifier does not accept. This should also be the case for structs and enums with >=64k members.

Fix by emitting byte arrays instead of real types. Another option would be to skip the offending types altogether, but this would be unnecessarily restrictive. Yet another option would be what pahole does (represent floats and doubles as ints), but that won't cover other "weird" cases like large structs.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83289

Files:
  llvm/lib/Target/BPF/BTFDebug.cpp
  llvm/lib/Target/BPF/BTFDebug.h
  llvm/test/CodeGen/BPF/BTF/double.ll
  llvm/test/CodeGen/BPF/BTF/float.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83289.275975.patch
Type: text/x-patch
Size: 17488 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:24:10 2020
From: llvm-commits at lists.llvm.org (Kerry McLaughlin via llvm-commits)
Date: Tue, 07 Jul 2020 03:24:10 -0700 (PDT)
Subject: [llvm] 5e8084b - [SVE][CodeGen] Legalisation of unpredicated load instructions
Message-ID: <5f044d4a.1c69fb81.ba0b.0c96@mx.google.com>

Author: Kerry McLaughlin
Date: 2020-07-07T11:05:03+01:00
New Revision: 5e8084beba20f27ce14536168087e5c6971e292d

URL: https://github.com/llvm/llvm-project/commit/5e8084beba20f27ce14536168087e5c6971e292d
DIFF: https://github.com/llvm/llvm-project/commit/5e8084beba20f27ce14536168087e5c6971e292d.diff

LOG: [SVE][CodeGen] Legalisation of unpredicated load instructions

Summary:
When splitting a load of a scalable type, the new address is calculated in SplitVecRes_LOAD using a vscale and an add instruction.
This patch also adds a DAG combiner fold to visitADD for vscale:
- Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1)))

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82792

Added:
    llvm/test/CodeGen/AArch64/sve-split-load.ll

Modified:
    llvm/include/llvm/CodeGen/SelectionDAG.h
    llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
    llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
    llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Removed:


################################################################################
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index e084c42b2ffd..f26ab6f287a0 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -931,7 +931,8 @@ class SelectionDAG {
   SDValue getVScale(const SDLoc &DL, EVT VT, APInt MulImm) {
     assert(MulImm.getMinSignedBits() <= VT.getSizeInBits() &&
            "Immediate does not fit VT");
-    return getNode(ISD::VSCALE, DL, VT, getConstant(MulImm, DL, VT));
+    return getNode(ISD::VSCALE, DL, VT,
+                   getConstant(MulImm.sextOrTrunc(VT.getSizeInBits()), DL, VT));
   }
 
   /// Return a GLOBAL_OFFSET_TABLE node. This does not have a useful SDLoc.
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index c94bbeb60716..4042a81b9cb7 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -2371,6 +2371,16 @@ SDValue DAGCombiner::visitADD(SDNode *N) { return DAG.getVScale(DL, VT, C0 + C1); } + // fold a+vscale(c1)+vscale(c2) -> a+vscale(c1+c2) + if ((N0.getOpcode() == ISD::ADD) && + (N0.getOperand(1).getOpcode() == ISD::VSCALE) && + (N1.getOpcode() == ISD::VSCALE)) { + auto VS0 = N0.getOperand(1)->getConstantOperandAPInt(0); + auto VS1 = N1->getConstantOperandAPInt(0); + auto VS = DAG.getVScale(DL, VT, VS0 + VS1); + return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), VS); + } + return SDValue(); } diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 6afa7b196030..cacc2dfa03c2 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1537,11 +1537,22 @@ void DAGTypeLegalizer::SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, LD->getPointerInfo(), LoMemVT, LD->getOriginalAlign(), MMOFlags, AAInfo); - unsigned IncrementSize = LoMemVT.getSizeInBits()/8; - Ptr = DAG.getObjectPtrOffset(dl, Ptr, IncrementSize); - Hi = DAG.getLoad(ISD::UNINDEXED, ExtType, HiVT, dl, Ch, Ptr, Offset, - LD->getPointerInfo().getWithOffset(IncrementSize), HiMemVT, - LD->getOriginalAlign(), MMOFlags, AAInfo); + unsigned IncrementSize = LoMemVT.getSizeInBits().getKnownMinSize() / 8; + + MachinePointerInfo MPI; + if (LoVT.isScalableVector()) { + SDValue BytesIncrement = DAG.getVScale( + dl, Ptr.getValueType(), + APInt(Ptr.getValueSizeInBits().getFixedSize(), IncrementSize)); + MPI = MachinePointerInfo(LD->getPointerInfo().getAddrSpace()); + Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr, BytesIncrement); + } else { + MPI = LD->getPointerInfo().getWithOffset(IncrementSize); + Ptr 
= DAG.getObjectPtrOffset(dl, Ptr, IncrementSize); + } + + Hi = DAG.getLoad(ISD::UNINDEXED, ExtType, HiVT, dl, Ch, Ptr, Offset, MPI, + HiMemVT, LD->getOriginalAlign(), MMOFlags, AAInfo); // Build a factor node to remember that this load is independent of the // other one. diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 732aea83d9ee..d1411c4b6060 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -4802,6 +4802,9 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, if (OpOpcode == ISD::FNEG) // abs(-X) -> abs(X) return getNode(ISD::FABS, DL, VT, Operand.getOperand(0)); break; + case ISD::VSCALE: + assert(VT == Operand.getValueType() && "Unexpected VT!"); + break; } SDNode *N; diff --git a/llvm/test/CodeGen/AArch64/sve-split-load.ll b/llvm/test/CodeGen/AArch64/sve-split-load.ll new file mode 100644 index 000000000000..a76b27e63557 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-split-load.ll @@ -0,0 +1,55 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s + +; LOAD + +define @load_promote_4i8(* %a) { +; CHECK-LABEL: load_promote_4i8: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.s +; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0] +; CHECK-NEXT: ret + %load = load , * %a + ret %load +} + +define @load_split_i16(* %a) { +; CHECK-LABEL: load_split_i16: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.h +; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0] +; CHECK-NEXT: ld1h { z1.h }, p0/z, [x0, #1, mul vl] +; CHECK-NEXT: ret + %load = load , * %a + ret %load +} + +define @load_split_32i16(* %a) { +; CHECK-LABEL: load_split_32i16: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.h +; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0] +; CHECK-NEXT: ld1h { z1.h }, p0/z, [x0, #1, mul vl] +; CHECK-NEXT: ld1h { z2.h }, p0/z, [x0, #2, mul vl] +; CHECK-NEXT: ld1h { z3.h 
}, p0/z, [x0, #3, mul vl] +; CHECK-NEXT: ret + %load = load , * %a + ret %load +} + +define @load_split_16i64(* %a) { +; CHECK-LABEL: load_split_16i64: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.d +; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0] +; CHECK-NEXT: ld1d { z1.d }, p0/z, [x0, #1, mul vl] +; CHECK-NEXT: ld1d { z2.d }, p0/z, [x0, #2, mul vl] +; CHECK-NEXT: ld1d { z3.d }, p0/z, [x0, #3, mul vl] +; CHECK-NEXT: ld1d { z4.d }, p0/z, [x0, #4, mul vl] +; CHECK-NEXT: ld1d { z5.d }, p0/z, [x0, #5, mul vl] +; CHECK-NEXT: ld1d { z6.d }, p0/z, [x0, #6, mul vl] +; CHECK-NEXT: ld1d { z7.d }, p0/z, [x0, #7, mul vl] +; CHECK-NEXT: ret + %load = load , * %a + ret %load +} From llvm-commits at lists.llvm.org Tue Jul 7 03:24:25 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 10:24:25 +0000 (UTC) Subject: [PATCH] D82792: [SVE][CodeGen] Legalisation of unpredicated load instructions In-Reply-To: References: Message-ID: <0cf225b2d4475a491d80a0f30ae58a30@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG5e8084beba20: [SVE][CodeGen] Legalisation of unpredicated load instructions (authored by kmclaughlin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82792/new/ https://reviews.llvm.org/D82792 Files: llvm/include/llvm/CodeGen/SelectionDAG.h llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/test/CodeGen/AArch64/sve-split-load.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82792.275976.patch
Type: text/x-patch
Size: 5712 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:24:48 2020
From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:24:48 +0000 (UTC)
Subject: [PATCH] D83136: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst
In-Reply-To:
References:
Message-ID: <396a2adb4a369e7f88fe6095d77697ef@localhost.localdomain>

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rG74c723757e69: [NFC] Adding the align attribute on Atomic{CmpXchg|RMW}Inst (authored by gchatelet).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83136/new/

https://reviews.llvm.org/D83136

Files:
  llvm/include/llvm/IR/IRBuilder.h
  llvm/include/llvm/IR/Instructions.h
  llvm/lib/AsmParser/LLParser.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/IR/Instructions.cpp
  llvm/unittests/Analysis/AliasAnalysisTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83136.275652.patch
Type: text/x-patch
Size: 18437 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:24:49 2020
From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:24:49 +0000 (UTC)
Subject: [PATCH] D82792: [SVE][CodeGen] Legalisation of unpredicated load instructions
In-Reply-To:
References:
Message-ID:

This revision was automatically updated to reflect the committed changes.
Closed by commit rG5e8084beba20: [SVE][CodeGen] Legalisation of unpredicated load instructions (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82792/new/

https://reviews.llvm.org/D82792

Files:
  llvm/include/llvm/CodeGen/SelectionDAG.h
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
  llvm/test/CodeGen/AArch64/sve-split-load.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82792.275653.patch
Type: text/x-patch
Size: 5712 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 03:25:06 2020
From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:25:06 +0000 (UTC)
Subject: [PATCH] D82886: [DebugInfo] Fix a possible crash when reading a malformed .debug_*lists section.
In-Reply-To:
References:
Message-ID: <92ecc5686dee9d31ef66351a009f7356@localhost.localdomain>

ikudrin added a comment.

In D82886#2134722, @dblaikie wrote:

> Is that difference necessary? I tried removing the length == 0 special case from "length()" and no tests fail. Perhaps we could go that route instead?

For example, `dumpRnglistsSection()` in `DWARFContext.cpp` terminates the loop when `length()` returns 0. With a specially constructed input, your variant would result in several additional unsuccessful reads with additional error messages:

  .section .debug_rnglists,"",@progbits
  .long 0xffffffff
  .long 0xffffffff
  .byte 0xff
  ...
  error: parsing .debug_rnglists table at offset 0x0: unexpected end of data at offset 0xb while reading [0x4, 0xc)
  error: parsing .debug_rnglists table at offset 0x4: unexpected end of data at offset 0xb while reading [0x8, 0x10)
  error: parsing .debug_rnglists table at offset 0x8: unexpected end of data at offset 0xb while reading [0x8, 0xc)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82886/new/

https://reviews.llvm.org/D82886

From llvm-commits at lists.llvm.org Tue Jul 7 03:26:15 2020
From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:26:15 +0000 (UTC)
Subject: [PATCH] D82792: [SVE][CodeGen] Legalisation of unpredicated load instructions
In-Reply-To:
References:
Message-ID:

kmclaughlin added a comment.

Thanks for reviewing this change, @efriedma & @david-arm!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82792/new/

https://reviews.llvm.org/D82792

From llvm-commits at lists.llvm.org Tue Jul 7 03:30:59 2020
From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits)
Date: Tue, 07 Jul 2020 03:30:59 -0700 (PDT)
Subject: [llvm] 2d9bd44 - [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections.
Message-ID: <5f044ee3.1c69fb81.f8536.81c5@mx.google.com>

Author: Georgii Rymar
Date: 2020-07-07T13:30:12+03:00
New Revision: 2d9bd448c9f051de088b53592f89871e9b390fba

URL: https://github.com/llvm/llvm-project/commit/2d9bd448c9f051de088b53592f89871e9b390fba
DIFF: https://github.com/llvm/llvm-project/commit/2d9bd448c9f051de088b53592f89871e9b390fba.diff

LOG: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections.

The code we have currently reports an error if something is not right with the profile section.
Instead we can report a warning and continue dumping when it is possible. This patch does it.
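
The warn-and-continue approach described in the log message can be sketched in a self-contained way. This is an illustration under stated assumptions, not the llvm-readobj implementation: `ProfileEntry`, `SymbolNames` and `dumpProfile` are hypothetical names, an empty `std::optional` stands in for a failed `Expected` name lookup, and warnings are collected in a vector without the de-duplication a unique-warning helper would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one call graph profile record.
struct ProfileEntry {
  std::uint32_t from;
  std::uint32_t to;
  std::uint64_t weight;
};

// Hypothetical symbol table: an empty optional models a symbol whose name
// cannot be read (for example, a corrupted string table offset).
using SymbolNames = std::vector<std::optional<std::string>>;

// Dumps every entry. A failed name lookup records a warning and yields an
// empty name, so the remaining entries are still dumped.
std::vector<std::string> dumpProfile(const std::vector<ProfileEntry> &entries,
                                     const SymbolNames &symbols,
                                     std::vector<std::string> &warnings) {
  auto getSymName = [&](std::uint32_t index) -> std::string {
    if (index < symbols.size() && symbols[index])
      return *symbols[index];
    warnings.push_back("unable to read the name of symbol with index " +
                       std::to_string(index));
    return "";
  };

  std::vector<std::string> lines;
  for (const ProfileEntry &e : entries) {
    // Look up the names in separate statements to fix the warning order.
    std::string from = getSymName(e.from);
    std::string to = getSymName(e.to);
    lines.push_back("From: " + from + " (" + std::to_string(e.from) +
                    "), To: " + to + " (" + std::to_string(e.to) +
                    "), Weight: " + std::to_string(e.weight));
  }
  return lines;
}
```

With a table in which one name is unreadable and one index is out of range, all entries are still produced and the problems surface only as warnings, which is the behaviour the commit aims for.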
Differential revision: https://reviews.llvm.org/D83129 Added: Modified: llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test b/llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test index dfd95ea46a06..8ccc93cf426f 100644 --- a/llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test +++ b/llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test @@ -37,6 +37,69 @@ Sections: - From: bar To: foo Weight: 98 +## 0x10 is the normal entry size for the SHT_LLVM_CALL_GRAPH_PROFILE section. + EntSize: [[ENTSIZE=0x10]] Symbols: - Name: foo - Name: bar + +## Check we report a warning when unable to get the content of the SHT_LLVM_CALL_GRAPH_PROFILE section. +# RUN: yaml2obj %s -DENTSIZE=0xF -o %t2.o +# RUN: llvm-readobj %t2.o --cg-profile 2>&1 | FileCheck %s -DFILE=%t2.o --check-prefix=LLVM-ERR +# RUN: llvm-readelf %t2.o --cg-profile | FileCheck %s --check-prefix=GNU + +# LLVM-ERR: CGProfile [ +# LLVM-ERR-NEXT: warning: '[[FILE]]': unable to dump the SHT_LLVM_CALL_GRAPH_PROFILE section: section [index 1] has an invalid sh_entsize: 15 +# LLVM-ERR-NEXT: ] + +## Check we report a warning when unable to dump a name of a symbol. 
+# RUN: yaml2obj %s --docnum=2 -o %t3.o +# RUN: llvm-readobj %t3.o --cg-profile 2>&1 | FileCheck %s -DFILE=%t3.o --check-prefix=LLVM-BROKEN-SYM +# RUN: llvm-readelf %t3.o --cg-profile | FileCheck %s --check-prefix=GNU + +# LLVM-BROKEN-SYM: CGProfile [ +# LLVM-BROKEN-SYM-NEXT: CGProfileEntry { +# LLVM-BROKEN-SYM-NEXT: From: A (1) +# LLVM-BROKEN-SYM-NEXT: warning: '[[FILE]]': unable to read the name of symbol with index 2: st_name (0xff) is past the end of the string table of size 0x5 +# LLVM-BROKEN-SYM-NEXT: To: (2) +# LLVM-BROKEN-SYM-NEXT: Weight: 10 +# LLVM-BROKEN-SYM-NEXT: } +# LLVM-BROKEN-SYM-NEXT: CGProfileEntry { +# LLVM-BROKEN-SYM-NEXT: From: (2) +# LLVM-BROKEN-SYM-NEXT: To: B (3) +# LLVM-BROKEN-SYM-NEXT: Weight: 20 +# LLVM-BROKEN-SYM-NEXT: } +# LLVM-BROKEN-SYM-NEXT: CGProfileEntry { +# LLVM-BROKEN-SYM-NEXT: From: (0) +# LLVM-BROKEN-SYM-NEXT: warning: '[[FILE]]': unable to read the name of symbol with index 4: unable to get symbol from section [index 3]: invalid symbol index (4) +# LLVM-BROKEN-SYM-NEXT: To: (4) +# LLVM-BROKEN-SYM-NEXT: Weight: 20 +# LLVM-BROKEN-SYM-NEXT: } +# LLVM-BROKEN-SYM-NEXT: ] + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_DYN + Machine: EM_X86_64 +Sections: + - Name: .llvm.call-graph-profile + Type: SHT_LLVM_CALL_GRAPH_PROFILE + Entries: + - From: 1 + To: 2 + Weight: 10 + - From: 2 + To: 3 + Weight: 20 + - From: 0x0 ## Null symbol. + To: 0x4 ## This index goes past the end of the symbol table. + Weight: 20 + - Name: .strtab + Type: SHT_STRTAB + Content: "0041004200" ## '\0', 'A', '\0', 'B', '\0' +Symbols: + - StName: 1 ## 'A' + - StName: 0xFF ## An arbitrary currupted index in the string table. 
+ - StName: 3 ## 'B' diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index 7a2960682e8c..3d51515682d8 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6564,21 +6564,32 @@ void LLVMStyle::printCGProfile(const ELFFile *Obj) { ListScope L(W, "CGProfile"); if (!this->dumper()->getDotCGProfileSec()) return; - auto CGProfile = unwrapOrError( - this->FileName, Obj->template getSectionContentsAsArray( - this->dumper()->getDotCGProfileSec())); - for (const Elf_CGProfile &CGPE : CGProfile) { + + Expected> CGProfileOrErr = + Obj->template getSectionContentsAsArray( + this->dumper()->getDotCGProfileSec()); + if (!CGProfileOrErr) { + this->reportUniqueWarning( + createError("unable to dump the SHT_LLVM_CALL_GRAPH_PROFILE section: " + + toString(CGProfileOrErr.takeError()))); + return; + } + + auto GetSymName = [&](uint32_t Index) -> std::string { + if (Expected NameOrErr = + this->dumper()->getStaticSymbolName(Index)) + return *NameOrErr; + else + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(NameOrErr.takeError()))); + return ""; + }; + + for (const Elf_CGProfile &CGPE : *CGProfileOrErr) { DictScope D(W, "CGProfileEntry"); - W.printNumber( - "From", - unwrapOrError(this->FileName, - this->dumper()->getStaticSymbolName(CGPE.cgp_from)), - CGPE.cgp_from); - W.printNumber( - "To", - unwrapOrError(this->FileName, - this->dumper()->getStaticSymbolName(CGPE.cgp_to)), - CGPE.cgp_to); + W.printNumber("From", GetSymName(CGPE.cgp_from), CGPE.cgp_from); + W.printNumber("To", GetSymName(CGPE.cgp_to), CGPE.cgp_to); W.printNumber("Weight", CGPE.cgp_weight); } } From llvm-commits at lists.llvm.org Tue Jul 7 03:31:01 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 10:31:01 +0000 (UTC) Subject: [PATCH] D83129: [llvm-readobj] - Allow dumping partially corrupted 
SHT_LLVM_CALL_GRAPH_PROFILE sections. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG2d9bd448c9f0: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE… (authored by grimar). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83129/new/ https://reviews.llvm.org/D83129 Files: llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83129.275977.patch Type: text/x-patch Size: 4927 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 03:38:56 2020 From: llvm-commits at lists.llvm.org (Stefanos Baziotis via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 10:38:56 +0000 (UTC) Subject: [PATCH] D80991: [WIP][Attributor] AAPotentialValues Attribute In-Reply-To: References: Message-ID: <2392373de72472a5feabb4ac39a83609@localhost.localdomain> baziotis added a comment. First of all, great patch. Reading the description and the referenced diff, I can't find an explanation about what this patch tries to do, it would be good if you added that. My assumption is: You try to deduce what potential values can an IR position value get. Currently, it supports only ints (with that assumption I wrote the comments above). Correct? (Even if you agree with that, please add some explanation about the method you're following) ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:3057 + /// If this is true, the value of \p Set is meaningless. + bool isFull; + ---------------- This is something that Johannes can give more insight and also a description for the patch might help but: I guess that what you mean here by "full" is the "top" (or "bottom", depending on how you see it) of a lattice in a data-flow analysis. 
Point being, "top" pretty much means "I don't know, it could be anything", and that is common knowledge. "Full" OTOH, especially without further explanation of what it means, is not that clear IMHO. E.g. can I insert stuff up to a certain point? When does something become full if the set can grow infinitely large?

================
Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:3185
+  /// Set representing assumed set of potential values.
+  StateTy Assumed;
+};
----------------
okura wrote:
> jdoerfert wrote:
> > I'm not sure if we really want/need to track the known set here. Maybe I just don't understand what we currently track in it. Feel free to explain :)
> > If it is "just" to match the design of other states, don't worry. Take the full set as the worst state and remove the known stuff.
> `Known.IsFull` is always true in the current implementation. But when the bit width of a value is 1 or 2, we can initialize the known set as the actual full set (`isFull` is false).
> By this, the fixpoint state can be reached faster and we may be able to reduce the number of iterations. That's why I left known in here. Is this a bad idea?
I agree with Johannes. OTOH, I'm sorry but I did not understand your comment. First of all, regarding the bit-width, do you mean that e.g. if I have a bit-width of 1, then I know I have only two possible values and so I know the full set? But what is the "actual" full set? Even in that case, if you don't know which of the values you have, you still have a pessimistic fixpoint (i.e. both values, i.e. `isFull = true`). No?
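To make the "full" versus "top" question concrete, here is a minimal sketch of a bounded set of potential integer values that saturates to a top state. It is only an illustration of the lattice idea discussed in this thread: the names `PotentialIntSet`, `IsTop` and `MaxSize`, and the cap of 4, are assumptions of the sketch and are not taken from the patch under review.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical illustration only; this is not the Attributor's state class.
struct PotentialIntSet {
  // Arbitrary cap chosen for the sketch; the patch may use a different policy.
  static constexpr std::size_t MaxSize = 4;

  // Top means "could be anything"; Values is meaningless once this is set.
  bool IsTop = false;
  std::set<int64_t> Values;

  // Add one potential value, saturating to top when the cap is exceeded.
  void insert(int64_t V) {
    if (IsTop)
      return;
    Values.insert(V);
    if (Values.size() > MaxSize) {
      IsTop = true;
      Values.clear();
    }
  }

  // Lattice join: set union, where top absorbs everything.
  void join(const PotentialIntSet &Other) {
    if (IsTop)
      return;
    if (Other.IsTop) {
      IsTop = true;
      Values.clear();
      return;
    }
    for (int64_t V : Other.Values) {
      insert(V);
      if (IsTop)
        return;
    }
  }
};
```

In this model a 1-bit value with `Values == {0, 1}` and `IsTop == false` already covers every possible value, which is the situation the question about the "actual" full set is probing.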
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80991/new/

https://reviews.llvm.org/D80991

From llvm-commits at lists.llvm.org Tue Jul 7 03:39:23 2020
From: llvm-commits at lists.llvm.org (Stefanos Baziotis via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 10:39:23 +0000 (UTC)
Subject: [PATCH] D83283: [Attributor] AAPotentialValues Interface
In-Reply-To:
References:
Message-ID:

baziotis added a comment.

Nit comments, I haven't yet digested the whole patch.

================
Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:3056
+  static inline APInt getEmptyKey() {
+    APInt V(nullptr, 0);
+    V.U.VAL = 0;
----------------
Is there an `APInt` constructor that takes a pointer and an `int`? I think the only such constructor is a private one.

================
Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:3057
+    APInt V(nullptr, 0);
+    V.U.VAL = 0;
+    return V;
----------------
I also think that this `U` member is `private`.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83283/new/

https://reviews.llvm.org/D83283

From llvm-commits at lists.llvm.org Tue Jul 7 03:41:01 2020
From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits)
Date: Tue, 07 Jul 2020 03:41:01 -0700 (PDT)
Subject: [llvm] 2953ac0 - [llvm-readobj] - Refactor ELFDumper<ELFT>::getStaticSymbolName.
Message-ID: <5f04513d.1c69fb81.df057.1031@mx.google.com>

Author: Georgii Rymar
Date: 2020-07-07T13:33:47+03:00
New Revision: 2953ac0975bc7e5dbe61fbd6538f02487efa62d2

URL: https://github.com/llvm/llvm-project/commit/2953ac0975bc7e5dbe61fbd6538f02487efa62d2
DIFF: https://github.com/llvm/llvm-project/commit/2953ac0975bc7e5dbe61fbd6538f02487efa62d2.diff

LOG: [llvm-readobj] - Refactor ELFDumper<ELFT>::getStaticSymbolName.

This is a followup for D83129.

It is possible to make `getStaticSymbolName` report warnings internally and return "" on an error. This encapsulates the error handling and slightly simplifies the logic in the callers' code.
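The pattern described in this log, a lookup that reports its own warning and hands back an empty string, can be sketched without any LLVM types. Everything below is a hypothetical stand-in written for illustration: `SymbolNamer`, `warnOnce` and `getName` are not LLVM names, and the choice to print each distinct message only once is an assumption of the sketch.

```cpp
#include <cstdint>
#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dumper; none of these names come from LLVM.
class SymbolNamer {
  std::vector<std::string> Names;
  std::set<std::string> Reported; // Messages that were already printed.

public:
  std::vector<std::string> Warnings; // Kept so callers can inspect them.

  explicit SymbolNamer(std::vector<std::string> N) : Names(std::move(N)) {}

  // Print a warning only the first time a given message is seen.
  void warnOnce(const std::string &Msg) {
    if (Reported.insert(Msg).second) {
      Warnings.push_back(Msg);
      std::cerr << "warning: " << Msg << "\n";
    }
  }

  // Callers never see an error object: on failure they get "" and the
  // warning has already been reported here.
  std::string getName(uint32_t Index) {
    if (Index >= Names.size()) {
      warnOnce("unable to read the name of symbol with index " +
               std::to_string(Index) + ": invalid symbol index");
      return "";
    }
    return Names[Index];
  }
};
```

A caller can then print `getName(I)` unconditionally inside a loop, which is the simplification at the call sites that the log refers to.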
Differential revision: https://reviews.llvm.org/D83208 Added: Modified: llvm/test/tools/llvm-readobj/ELF/addrsig.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/addrsig.test b/llvm/test/tools/llvm-readobj/ELF/addrsig.test index 1234a42a839a..c4793aae7b73 100644 --- a/llvm/test/tools/llvm-readobj/ELF/addrsig.test +++ b/llvm/test/tools/llvm-readobj/ELF/addrsig.test @@ -58,7 +58,7 @@ Sections: # INVALID-INDEX: Addrsig [ # INVALID-INDEX-NEXT: Sym: foo (1) -# INVALID-INDEX-NEXT: warning: '[[FILE]]': unable to get symbol from section [index 2]: invalid symbol index (255) +# INVALID-INDEX-NEXT: warning: '[[FILE]]': unable to read the name of symbol with index 255: unable to get symbol from section [index 2]: invalid symbol index (255) # INVALID-INDEX-NEXT: Sym: (255) # INVALID-INDEX-NEXT: Sym: bar (2) # INVALID-INDEX-NEXT: ] diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index 3d51515682d8..b284aae84470 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -343,7 +343,7 @@ template class ELFDumper : public ObjDumper { const Elf_Sym *FirstSym) const; Expected getSymbolSectionName(const Elf_Sym *Symbol, unsigned SectionIndex) const; - Expected getStaticSymbolName(uint32_t Index) const; + std::string getStaticSymbolName(uint32_t Index) const; StringRef getDynamicString(uint64_t Value) const; Expected getSymbolVersionByIndex(uint32_t VersionSymbolIndex, bool &IsDefault) const; @@ -1131,21 +1131,27 @@ static std::string maybeDemangle(StringRef Name) { } template -Expected -ELFDumper::getStaticSymbolName(uint32_t Index) const { +std::string ELFDumper::getStaticSymbolName(uint32_t Index) const { + auto Warn = [&](Error E) -> std::string { + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + 
toString(std::move(E)))); + return ""; + }; + const ELFFile *Obj = ObjF->getELFFile(); Expected SymOrErr = Obj->getSymbol(DotSymtabSec, Index); if (!SymOrErr) - return SymOrErr.takeError(); + return Warn(SymOrErr.takeError()); Expected StrTabOrErr = Obj->getStringTableForSymtab(*DotSymtabSec); if (!StrTabOrErr) - return StrTabOrErr.takeError(); + return Warn(StrTabOrErr.takeError()); Expected NameOrErr = (*SymOrErr)->getName(*StrTabOrErr); if (!NameOrErr) - return NameOrErr.takeError(); + return Warn(NameOrErr.takeError()); return maybeDemangle(*NameOrErr); } @@ -6575,21 +6581,12 @@ void LLVMStyle::printCGProfile(const ELFFile *Obj) { return; } - auto GetSymName = [&](uint32_t Index) -> std::string { - if (Expected NameOrErr = - this->dumper()->getStaticSymbolName(Index)) - return *NameOrErr; - else - this->reportUniqueWarning( - createError("unable to read the name of symbol with index " + - Twine(Index) + ": " + toString(NameOrErr.takeError()))); - return ""; - }; - for (const Elf_CGProfile &CGPE : *CGProfileOrErr) { DictScope D(W, "CGProfileEntry"); - W.printNumber("From", GetSymName(CGPE.cgp_from), CGPE.cgp_from); - W.printNumber("To", GetSymName(CGPE.cgp_to), CGPE.cgp_to); + W.printNumber("From", this->dumper()->getStaticSymbolName(CGPE.cgp_from), + CGPE.cgp_from); + W.printNumber("To", this->dumper()->getStaticSymbolName(CGPE.cgp_to), + CGPE.cgp_to); W.printNumber("Weight", CGPE.cgp_weight); } } From llvm-commits at lists.llvm.org Tue Jul 7 03:41:08 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 10:41:08 +0000 (UTC) Subject: [PATCH] D83208: [llvm-readobj] - Refactor ELFDumper::getStaticSymbolName. In-Reply-To: References: Message-ID: <4dab71952bc8ff47201c0009b61414a9@localhost.localdomain> This revision was automatically updated to reflect the committed changes. grimar marked an inline comment as done. 
Closed by commit rG2953ac0975bc: [llvm-readobj] - Refactor ELFDumper<ELFT>::getStaticSymbolName. (authored by grimar). Changed prior to commit: https://reviews.llvm.org/D83208?vs=275644&id=275982#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83208/new/ https://reviews.llvm.org/D83208 Files: llvm/test/tools/llvm-readobj/ELF/addrsig.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -343,7 +343,7 @@ const Elf_Sym *FirstSym) const; Expected getSymbolSectionName(const Elf_Sym *Symbol, unsigned SectionIndex) const; - Expected getStaticSymbolName(uint32_t Index) const; + std::string getStaticSymbolName(uint32_t Index) const; StringRef getDynamicString(uint64_t Value) const; Expected getSymbolVersionByIndex(uint32_t VersionSymbolIndex, bool &IsDefault) const; @@ -1131,21 +1131,27 @@ } template -Expected -ELFDumper::getStaticSymbolName(uint32_t Index) const { +std::string ELFDumper::getStaticSymbolName(uint32_t Index) const { + auto Warn = [&](Error E) -> std::string { + this->reportUniqueWarning( + createError("unable to read the name of symbol with index " + + Twine(Index) + ": " + toString(std::move(E)))); + return ""; + }; + const ELFFile *Obj = ObjF->getELFFile(); Expected SymOrErr = Obj->getSymbol(DotSymtabSec, Index); if (!SymOrErr) - return SymOrErr.takeError(); + return Warn(SymOrErr.takeError()); Expected StrTabOrErr = Obj->getStringTableForSymtab(*DotSymtabSec); if (!StrTabOrErr) - return StrTabOrErr.takeError(); + return Warn(StrTabOrErr.takeError()); Expected NameOrErr = (*SymOrErr)->getName(*StrTabOrErr); if (!NameOrErr) - return NameOrErr.takeError(); + return Warn(NameOrErr.takeError()); return maybeDemangle(*NameOrErr); } @@ -6575,21 +6581,12 @@ return; } - auto GetSymName = [&](uint32_t Index) -> std::string { - if 
(Expected NameOrErr = - this->dumper()->getStaticSymbolName(Index)) - return *NameOrErr; - else - this->reportUniqueWarning( - createError("unable to read the name of symbol with index " + - Twine(Index) + ": " + toString(NameOrErr.takeError()))); - return ""; - }; - for (const Elf_CGProfile &CGPE : *CGProfileOrErr) { DictScope D(W, "CGProfileEntry"); - W.printNumber("From", GetSymName(CGPE.cgp_from), CGPE.cgp_from); - W.printNumber("To", GetSymName(CGPE.cgp_to), CGPE.cgp_to); + W.printNumber("From", this->dumper()->getStaticSymbolName(CGPE.cgp_from), + CGPE.cgp_from); + W.printNumber("To", this->dumper()->getStaticSymbolName(CGPE.cgp_to), + CGPE.cgp_to); W.printNumber("Weight", CGPE.cgp_weight); } } Index: llvm/test/tools/llvm-readobj/ELF/addrsig.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/addrsig.test +++ llvm/test/tools/llvm-readobj/ELF/addrsig.test @@ -58,7 +58,7 @@ # INVALID-INDEX: Addrsig [ # INVALID-INDEX-NEXT: Sym: foo (1) -# INVALID-INDEX-NEXT: warning: '[[FILE]]': unable to get symbol from section [index 2]: invalid symbol index (255) +# INVALID-INDEX-NEXT: warning: '[[FILE]]': unable to read the name of symbol with index 255: unable to get symbol from section [index 2]: invalid symbol index (255) # INVALID-INDEX-NEXT: Sym: (255) # INVALID-INDEX-NEXT: Sym: bar (2) # INVALID-INDEX-NEXT: ] -------------- next part -------------- A non-text attachment was scrubbed... Name: D83208.275982.patch Type: text/x-patch Size: 3508 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 03:41:21 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 10:41:21 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: <1eb7737980afb8f828585fafef120595@localhost.localdomain> dmgreen updated this revision to Diff 275979. dmgreen marked 12 inline comments as done. 
dmgreen added a comment. Thanks for taking another look. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 Files: llvm/include/llvm/Analysis/IVDescriptors.h llvm/lib/Analysis/IVDescriptors.cpp llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/VPlan.cpp llvm/lib/Transforms/Vectorize/VPlan.h llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll llvm/test/Transforms/LoopVectorize/reduction-inloop.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D75069.275979.patch Type: text/x-patch Size: 64625 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 03:41:28 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 10:41:28 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: <81c25778012237b4a0cfaf1305aba091@localhost.localdomain> dmgreen added inline comments. ================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:856 + // The loop exit instruction also needs to be the same opcode. We dont allow + // them to be Subs. + if (!isCorrectOpcode(Cur)) ---------------- Ayal wrote: > Is Subs the only issue?. > Can check this earlier, before traversing the chain, although it is pushed back last, here. I believe sub is the only issue, unless I am forgetting something. ================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:812 + if (LHS->getOpcode() == Opcode && L->contains(LHS->getParent()) && + LHS->hasOneUse() && + findPathToPhi(LHS, ReductionOperations, Opcode, Phi, L)) { ---------------- Ayal wrote: > dmgreen wrote: > > fhahn wrote: > > > Ayal wrote: > > > > Looking for a chain of hasOneUse op's would be easier starting from the Phi and going downwards, until reaching LoopExitInstr? 
> > > >
> > > > Note that when extended to handle reductions with conditional bumps, some ops will have more than one use.
> > > Instead of doing a recursive traversal, would it be simpler to just do the traversal iteratively, at least as long as we are only using at a single use chain?
> > Yeah, that direction makes it a lot simpler. Thanks.
> Is treating sub as an add reduction something in-loop reduction could support as a future extension?
Hmm. I don't want to say never. A normal inloop reduction looks like:

  p = PHI(0, a)
  l = VLDR (..)
  a = VADDVA(p, l)

Where the `VADDV` is an across-vector reduction, and the extra `A` means also add p. Reducing a sub would need to become:

  p = PHI(0, a)
  l = VLDR (..)
  a = VADDV(l)
  p = SUB(p, a)

With the SUB as a separate scalar instruction, which would be quite slow on some hardware (getting a value over from the VADDV to the SUB). So this would almost certainly be slower than an out-of-loop reduction.

But if we could end up using a higher vector factor for the reduction, or end up vectorizing loops that would previously not be vectorized, that may lead to an overall gain that overcomes the extra cost of adding the sub to the loop. It will require some very careful costing I think. And maybe the ability to create multiple vplans and cost them against one another :)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3769
     // MinMax reduction have the start value as their identify.
-    if (VF == 1) {
+    if (VF == 1 || UseInloopReductions) {
       VectorStart = Identity = ReductionStartValue;
----------------
Ayal wrote:
> dmgreen wrote:
> > Ayal wrote:
> > > dmgreen wrote:
> > > > Ayal wrote:
> > > > > This is dead code if cmp/select chains are not recognized yet, as noted above.
> > > > I've added the code to handle minmax too (but not tested it a lot yet. I will try that now).
> > > >
> > > > MVE has instructions for integer min/max reductions, but they can be slow enough to make them not worth using over a normal vmin/vmax. Adds are always not-slower-enough to warrant the inloop reduction (and have other advantages like handling higher type sizes and folding in more instructions.)
> > > >
> > > > My point is that min/max, like some of the other fadd/mul/and/etc might not be used by MVE yet. If you think the code is more hassle than it deserves, then we could take them out for the time being. I'd like to leave them in for consistency though, even if it's not used straight away.
> > > Would be good to make sure code is being exercised and tested. Could inloop min/max (and/or other reductions) help reduce code size, and be applied when vectorizing under optsize?
> > -Os sounds like a good plan. It will take some backend work to make it efficient enough first though. And predicated reductions?
> Hoisting the horizontal reduction from the middle block into the loop could potentially eliminate the middle block (as in tests below), so could presumably lead to code of smaller size? At least for in-loop chains of a single link.
>
> > And predicated reductions?
> These are yet to be handled in-loop, right?
>> And predicated reductions?
>These are yet to be handled in-loop, right?
Yep. It will need a predicated reduction intrinsic. A vecreduce that takes a mask. That will allow us to tail-fold the reductions with trip counts that do not divide the vector factor, which will make them look a lot better under -Os. And nice in general I think once it all starts being tail predicated.

The backend work I was mentioning was that we need to more efficiently transform

  x = min(vecreduce.min(z), y)

into

  x = VMINV(y, z)

Where y is (confusingly) accumulated in this case (even though the instruction doesn't have an A suffix). We currently generate

  x = min(VMINV(UINT_MAX, z), y)

Once that is sorted out then, yep, using these for Os sounds like a good plan.
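The transform described in this reply rests on a simple scalar identity: taking the minimum of a reduction and `y` gives the same result as running the reduction with `y` as its starting accumulator. The sketch below is a plain scalar model of that identity for unsigned values. It makes no claim about MVE instruction semantics, and the function names `reduceThenMin` and `reduceSeeded` are invented for the illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Models x = min(reduce_min(UINT_MAX, z), y): reduce from the neutral
// element first, then combine with y in a separate step.
uint32_t reduceThenMin(const std::vector<uint32_t> &Z, uint32_t Y) {
  uint32_t R = std::numeric_limits<uint32_t>::max();
  for (uint32_t V : Z)
    R = std::min(R, V);
  return std::min(R, Y);
}

// Models x = reduce_min(y, z): y is the starting accumulator, so no
// separate scalar min is needed afterwards.
uint32_t reduceSeeded(const std::vector<uint32_t> &Z, uint32_t Y) {
  uint32_t R = Y;
  for (uint32_t V : Z)
    R = std::min(R, V);
  return R;
}
```

Both functions return the same value for any input, including an empty vector, because unsigned min is associative and commutative with the maximum value as its neutral element.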
================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6537 + // want to record it as such. + if (!ForceInloopReductions) + continue; ---------------- Ayal wrote: > dmgreen wrote: > > Ayal wrote: > > > Move this invariant check out as another early-exit? > > This does look a little strange here on it's own. The followup patch to add the TTI hook makes it look like: > > if (!PreferInloopReductions && > > !TTI.preferInloopReduction(Opcode, Phi->getType(), > > TargetTransformInfo::ReductionFlags())) > > continue; > Then better placed above right after defining Phi? I can put it somewhere sensible in this patch and move it in the next :) ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7286 + RecurrenceDescriptor &RdxDesc = Reduction.second; + if (CM.useInloopReductions(Reduction.first)) { + PHINode *Phi = Reduction.first; ---------------- Ayal wrote: > dmgreen wrote: > > Ayal wrote: > > > dmgreen wrote: > > > > Ayal wrote: > > > > > Iterate over in loop reductions? > > > > Do you mean adding an iterator for iterating over reductions, stepping over the ones not inloop? > > > > > > > > It would seem like it's similar to the existing code, but as a new iterator class. My gut says the current code is simpler and clearer what is going on? > > > Suggestion was to iterate over the PHIs/elements of InloopReductionChains, rather than over all reduction PHIs of Legal->getReductionVars(). > > > > > > (Better early-exit via "if (!CM.isInLoopReduction(Reduction.first)) continue;") > > I believe that InloopReductionChains would not iterate in a deterministic order, which is why I avoided it. > > > > Perhaps that would not matter here? The reductions should be independent anyway. Seems safer to try and use deterministic ordering anyway if we can. > Agreed it would be better to use deterministic ordering. How about letting InloopReductionChains be a MapVector and iterate over > for (auto &Reduction : CM.getInloopReductions())? 
> The number of reductions is expected to be small, w/o removals. MapVector sounds good. I've changed it to use that and tried to use that in a few more places. Let me know what you think. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Tue Jul 7 04:00:42 2020 From: llvm-commits at lists.llvm.org (Kadir Cetinkaya via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:00:42 +0000 (UTC) Subject: [PATCH] D83099: [clangd] Store index in '.cache/clangd/index' instead of '.clangd/index' In-Reply-To: References: Message-ID: <80927a752bbabb16c03d38b19a400109@localhost.localdomain> kadircet added inline comments. ================ Comment at: .gitignore:57 +# clangd index. (".clangd" is a config file now, thus trailing slash) +.clangd/ +.cache ---------------- why do we still need this ? i thought index (and other caches) would reside in `.cache` ? ================ Comment at: clang-tools-extra/clangd/index/Background.h:61 + // CDBDirectory is the first directory containing a CDB in parent directories + // of a file, or user's home directory if none was found, e.g. stdlib headers. static Factory createDiskBackedStorageFactory( ---------------- nit: maybe talk about XDG_CACHE instead of home directory as fallback location. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83099/new/ https://reviews.llvm.org/D83099 From llvm-commits at lists.llvm.org Tue Jul 7 04:01:02 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via llvm-commits) Date: Tue, 07 Jul 2020 04:01:02 -0700 (PDT) Subject: [llvm] cdf2eef - [SVE][CodeGen] Legalisation of unpredicated store instructions Message-ID: <5f0455ee.1c69fb81.9f328.5b8f@mx.google.com> Author: Kerry McLaughlin Date: 2020-07-07T11:47:10+01:00 New Revision: cdf2eef613f7d6949e2929b316925c3553a8e3d0 URL: https://github.com/llvm/llvm-project/commit/cdf2eef613f7d6949e2929b316925c3553a8e3d0 DIFF: https://github.com/llvm/llvm-project/commit/cdf2eef613f7d6949e2929b316925c3553a8e3d0.diff LOG: [SVE][CodeGen] Legalisation of unpredicated store instructions Summary: When splitting a store of a scalable type, the new address is calculated in SplitVecOp_STORE using a vscale and an add instruction. Reviewers: sdesmalen, efriedma, david-arm Reviewed By: david-arm Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83041 Added: llvm/test/CodeGen/AArch64/sve-split-store.ll Modified: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index cacc2dfa03c2..96c3a715532a 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -2490,7 +2490,7 @@ SDValue DAGTypeLegalizer::SplitVecOp_STORE(StoreSDNode *N, unsigned OpNo) { if (!LoMemVT.isByteSized() || !HiMemVT.isByteSized()) return TLI.scalarizeVectorStore(N, DAG); - unsigned IncrementSize = LoMemVT.getSizeInBits()/8; + unsigned IncrementSize = LoMemVT.getSizeInBits().getKnownMinSize() / 8; if (isTruncating) Lo = 
DAG.getTruncStore(Ch, DL, Lo, Ptr, N->getPointerInfo(), LoMemVT, @@ -2499,17 +2499,24 @@ SDValue DAGTypeLegalizer::SplitVecOp_STORE(StoreSDNode *N, unsigned OpNo) { Lo = DAG.getStore(Ch, DL, Lo, Ptr, N->getPointerInfo(), Alignment, MMOFlags, AAInfo); - // Increment the pointer to the other half. - Ptr = DAG.getObjectPtrOffset(DL, Ptr, IncrementSize); + MachinePointerInfo MPI; + if (LoMemVT.isScalableVector()) { + SDValue BytesIncrement = DAG.getVScale( + DL, Ptr.getValueType(), + APInt(Ptr.getValueSizeInBits().getFixedSize(), IncrementSize)); + MPI = MachinePointerInfo(N->getPointerInfo().getAddrSpace()); + Ptr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr, BytesIncrement); + } else { + MPI = N->getPointerInfo().getWithOffset(IncrementSize); + // Increment the pointer to the other half. + Ptr = DAG.getObjectPtrOffset(DL, Ptr, IncrementSize); + } if (isTruncating) - Hi = DAG.getTruncStore(Ch, DL, Hi, Ptr, - N->getPointerInfo().getWithOffset(IncrementSize), + Hi = DAG.getTruncStore(Ch, DL, Hi, Ptr, MPI, HiMemVT, Alignment, MMOFlags, AAInfo); else - Hi = DAG.getStore(Ch, DL, Hi, Ptr, - N->getPointerInfo().getWithOffset(IncrementSize), - Alignment, MMOFlags, AAInfo); + Hi = DAG.getStore(Ch, DL, Hi, Ptr, MPI, Alignment, MMOFlags, AAInfo); return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Lo, Hi); } diff --git a/llvm/test/CodeGen/AArch64/sve-split-store.ll b/llvm/test/CodeGen/AArch64/sve-split-store.ll new file mode 100644 index 000000000000..2fba0404ef34 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-split-store.ll @@ -0,0 +1,53 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s + +define void @store_promote_4i8( %data, * %a) { +; CHECK-LABEL: store_promote_4i8: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.s +; CHECK-NEXT: st1b { z0.s }, p0, [x0] +; CHECK-NEXT: ret + store %data, * %a + ret void +} + +define void @store_split_i16( %data, * %a) { +; 
CHECK-LABEL: store_split_i16: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.h +; CHECK-NEXT: st1h { z1.h }, p0, [x0, #1, mul vl] +; CHECK-NEXT: st1h { z0.h }, p0, [x0] +; CHECK-NEXT: ret + store %data, * %a + ret void +} + +define void @store_split_16i32( %data, * %a) { +; CHECK-LABEL: store_split_16i32: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.s +; CHECK-NEXT: st1w { z3.s }, p0, [x0, #3, mul vl] +; CHECK-NEXT: st1w { z2.s }, p0, [x0, #2, mul vl] +; CHECK-NEXT: st1w { z1.s }, p0, [x0, #1, mul vl] +; CHECK-NEXT: st1w { z0.s }, p0, [x0] +; CHECK-NEXT: ret + store %data, * %a + ret void +} + +define void @store_split_16i64( %data, * %a) { +; CHECK-LABEL: store_split_16i64: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.d +; CHECK-NEXT: st1d { z7.d }, p0, [x0, #7, mul vl] +; CHECK-NEXT: st1d { z6.d }, p0, [x0, #6, mul vl] +; CHECK-NEXT: st1d { z5.d }, p0, [x0, #5, mul vl] +; CHECK-NEXT: st1d { z4.d }, p0, [x0, #4, mul vl] +; CHECK-NEXT: st1d { z3.d }, p0, [x0, #3, mul vl] +; CHECK-NEXT: st1d { z2.d }, p0, [x0, #2, mul vl] +; CHECK-NEXT: st1d { z1.d }, p0, [x0, #1, mul vl] +; CHECK-NEXT: st1d { z0.d }, p0, [x0] +; CHECK-NEXT: ret + store %data, * %a + ret void +} From llvm-commits at lists.llvm.org Tue Jul 7 04:01:14 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:01:14 +0000 (UTC) Subject: [PATCH] D83041: [SVE][CodeGen] Legalisation of unpredicated store instructions In-Reply-To: References: Message-ID: <1dbe2245ba24bade5b14c7c6347c16cd@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGcdf2eef613f7: [SVE][CodeGen] Legalisation of unpredicated store instructions (authored by kmclaughlin). 
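For the store split in the commit above, the address of the high half depends on whether the type is scalable: a fixed-width vector uses a constant byte offset, while a scalable vector uses the known minimum size multiplied by the runtime `vscale`. The following is a small arithmetic sketch of that rule under stated assumptions. `VecSize` and `hiHalfByteOffset` are hypothetical names, and `VScale` is passed in as a plain number here, whereas the real code emits a vscale node plus an add.

```cpp
#include <cstdint>

// Hypothetical description of a vector memory type: its known minimum size
// in bits and whether the real size is that minimum multiplied by vscale.
struct VecSize {
  uint64_t KnownMinBits;
  bool Scalable;
};

// Byte offset from the low half to the high half of a split store.
uint64_t hiHalfByteOffset(VecSize LoHalf, uint64_t VScale) {
  uint64_t IncrementSize = LoHalf.KnownMinBits / 8;
  // Scalable: the offset is only known at run time, as vscale * min bytes.
  // Fixed: the offset is the constant size of the low half.
  return LoHalf.Scalable ? VScale * IncrementSize : IncrementSize;
}
```

For a low half with a 128-bit known minimum size and `VScale == 2`, the scalable offset is 32 bytes, against 16 bytes for a fixed 128-bit vector. That matches the shape of the `[x0, #1, mul vl]` addressing in the test output, where the offset is expressed in units of the vector length.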
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83041/new/ https://reviews.llvm.org/D83041 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/test/CodeGen/AArch64/sve-split-store.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83041.275985.patch Type: text/x-patch Size: 4173 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:01:50 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Tue, 07 Jul 2020 04:01:50 -0700 (PDT) Subject: [llvm] 8f0f7db - [llvm-readobj] - Split the printHashSymbols. NFCI. Message-ID: <5f04561e.1c69fb81.8e44b.9d67@mx.google.com> Author: Georgii Rymar Date: 2020-07-07T14:01:34+03:00 New Revision: 8f0f7dbcea34e2e89fb0946067af3488c20c4532 URL: https://github.com/llvm/llvm-project/commit/8f0f7dbcea34e2e89fb0946067af3488c20c4532 DIFF: https://github.com/llvm/llvm-project/commit/8f0f7dbcea34e2e89fb0946067af3488c20c4532.diff LOG: [llvm-readobj] - Split the printHashSymbols. NFCI. This introduces `printHashTableSymbols` and `printGNUHashTableSymbols` to split the `printHashSymbols`. It makes the code more readable and consistent. 
Differential revision: https://reviews.llvm.org/D83040 Added: Modified: llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index b284aae84470..bbdae7504525 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -851,6 +851,10 @@ template class GNUStyle : public DumpStyle { void printHashHistogram(const Elf_Hash &HashTable); void printGnuHashHistogram(const Elf_GnuHash &GnuHashTable); + void printHashTableSymbols(const ELFO *Obj, const Elf_Hash &HashTable); + void printGnuHashTableSymbols(const ELFO *Obj, + const Elf_GnuHash &GnuHashTable); + struct Field { std::string Str; unsigned Column; @@ -4068,78 +4072,62 @@ void GNUStyle::printSymbols(const ELFO *Obj, bool PrintSymbols, this->dumper()->printSymbolsHelper(false); } -template void GNUStyle::printHashSymbols(const ELFO *Obj) { - if (this->dumper()->getDynamicStringTable().empty()) - return; - auto StringTable = this->dumper()->getDynamicStringTable(); - Elf_Sym_Range DynSyms = this->dumper()->dynamic_symbols(); - - auto PrintHashTable = [&](const Elf_Hash *SysVHash) { - if (ELFT::Is64Bits) - OS << " Num Buc: Value Size Type Bind Vis Ndx Name"; - else - OS << " Num Buc: Value Size Type Bind Vis Ndx Name"; - OS << "\n"; - - const Elf_Sym *FirstSym = DynSyms.empty() ? nullptr : &DynSyms[0]; - if (!FirstSym) { - Optional DynSymRegion = this->dumper()->getDynSymRegion(); - this->reportUniqueWarning( - createError(Twine("unable to print symbols for the .hash table: the " - "dynamic symbol table ") + - (DynSymRegion ? 
"is empty" : "was not found"))); - return; - } - - auto Buckets = SysVHash->buckets(); - auto Chains = SysVHash->chains(); - for (uint32_t Buc = 0; Buc < SysVHash->nbucket; Buc++) { - if (Buckets[Buc] == ELF::STN_UNDEF) - continue; - std::vector Visited(SysVHash->nchain); - for (uint32_t Ch = Buckets[Buc]; Ch < SysVHash->nchain; Ch = Chains[Ch]) { - if (Ch == ELF::STN_UNDEF) - break; - - if (Visited[Ch]) { - reportWarning( - createError(".hash section is invalid: bucket " + Twine(Ch) + - ": a cycle was detected in the linked chain"), - this->FileName); - break; - } - - printHashedSymbol(Obj, FirstSym, Ch, StringTable, Buc); - Visited[Ch] = true; - } - } - }; - - if (const Elf_Hash *SysVHash = this->dumper()->getHashTable()) { - OS << "\n Symbol table of .hash for image:\n"; - if (Error E = checkHashTable(Obj, SysVHash)) - this->reportUniqueWarning(std::move(E)); - else - PrintHashTable(SysVHash); - } - - // Try printing the .gnu.hash table. - const Elf_GnuHash *GnuHash = this->dumper()->getGnuHashTable(); - if (!GnuHash) +template +void GNUStyle::printHashTableSymbols(const ELFO *Obj, + const Elf_Hash &SysVHash) { + StringRef StringTable = this->dumper()->getDynamicStringTable(); + if (StringTable.empty()) return; - OS << "\n Symbol table of .gnu.hash for image:\n"; if (ELFT::Is64Bits) OS << " Num Buc: Value Size Type Bind Vis Ndx Name"; else OS << " Num Buc: Value Size Type Bind Vis Ndx Name"; OS << "\n"; - if (Error E = checkGNUHashTable(Obj, GnuHash)) { - this->reportUniqueWarning(std::move(E)); + Elf_Sym_Range DynSyms = this->dumper()->dynamic_symbols(); + const Elf_Sym *FirstSym = DynSyms.empty() ? nullptr : &DynSyms[0]; + if (!FirstSym) { + Optional DynSymRegion = this->dumper()->getDynSymRegion(); + this->reportUniqueWarning( + createError(Twine("unable to print symbols for the .hash table: the " + "dynamic symbol table ") + + (DynSymRegion ? 
"is empty" : "was not found"))); return; } + auto Buckets = SysVHash.buckets(); + auto Chains = SysVHash.chains(); + for (uint32_t Buc = 0; Buc < SysVHash.nbucket; Buc++) { + if (Buckets[Buc] == ELF::STN_UNDEF) + continue; + std::vector Visited(SysVHash.nchain); + for (uint32_t Ch = Buckets[Buc]; Ch < SysVHash.nchain; Ch = Chains[Ch]) { + if (Ch == ELF::STN_UNDEF) + break; + + if (Visited[Ch]) { + reportWarning(createError(".hash section is invalid: bucket " + + Twine(Ch) + + ": a cycle was detected in the linked chain"), + this->FileName); + break; + } + + printHashedSymbol(Obj, FirstSym, Ch, StringTable, Buc); + Visited[Ch] = true; + } + } +} + +template +void GNUStyle::printGnuHashTableSymbols(const ELFO *Obj, + const Elf_GnuHash &GnuHash) { + StringRef StringTable = this->dumper()->getDynamicStringTable(); + if (StringTable.empty()) + return; + + Elf_Sym_Range DynSyms = this->dumper()->dynamic_symbols(); const Elf_Sym *FirstSym = DynSyms.empty() ? nullptr : &DynSyms[0]; if (!FirstSym) { Optional DynSymRegion = this->dumper()->getDynSymRegion(); @@ -4150,22 +4138,47 @@ template void GNUStyle::printHashSymbols(const ELFO *Obj) { return; } - auto Buckets = GnuHash->buckets(); - for (uint32_t Buc = 0; Buc < GnuHash->nbuckets; Buc++) { + ArrayRef Buckets = GnuHash.buckets(); + for (uint32_t Buc = 0; Buc < GnuHash.nbuckets; Buc++) { if (Buckets[Buc] == ELF::STN_UNDEF) continue; uint32_t Index = Buckets[Buc]; - uint32_t GnuHashable = Index - GnuHash->symndx; + uint32_t GnuHashable = Index - GnuHash.symndx; // Print whole chain while (true) { printHashedSymbol(Obj, FirstSym, Index++, StringTable, Buc); // Chain ends at symbol with stopper bit - if ((GnuHash->values(DynSyms.size())[GnuHashable++] & 1) == 1) + if ((GnuHash.values(DynSyms.size())[GnuHashable++] & 1) == 1) break; } } } +template void GNUStyle::printHashSymbols(const ELFO *Obj) { + if (const Elf_Hash *SysVHash = this->dumper()->getHashTable()) { + OS << "\n Symbol table of .hash for image:\n"; + if (Error E 
= checkHashTable(Obj, SysVHash)) + this->reportUniqueWarning(std::move(E)); + else + printHashTableSymbols(Obj, *SysVHash); + } + + // Try printing the .gnu.hash table. + if (const Elf_GnuHash *GnuHash = this->dumper()->getGnuHashTable()) { + OS << "\n Symbol table of .gnu.hash for image:\n"; + if (ELFT::Is64Bits) + OS << " Num Buc: Value Size Type Bind Vis Ndx Name"; + else + OS << " Num Buc: Value Size Type Bind Vis Ndx Name"; + OS << "\n"; + + if (Error E = checkGNUHashTable(Obj, GnuHash)) + this->reportUniqueWarning(std::move(E)); + else + printGnuHashTableSymbols(Obj, *GnuHash); + } +} + static inline std::string printPhdrFlags(unsigned Flag) { std::string Str; Str = (Flag & PF_R) ? "R" : " "; From llvm-commits at lists.llvm.org Tue Jul 7 04:02:02 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:02:02 +0000 (UTC) Subject: [PATCH] D83040: [llvm-readobj] - Split the printHashSymbols. NFCI. In-Reply-To: References: Message-ID: <681bb527fb389821a0f0e98375651c83@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG8f0f7dbcea34: [llvm-readobj] - Split the printHashSymbols. NFCI. (authored by grimar). Changed prior to commit: https://reviews.llvm.org/D83040?vs=275070&id=275986#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83040/new/ https://reviews.llvm.org/D83040 Files: llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83040.275986.patch Type: text/x-patch Size: 7043 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:07:25 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:07:25 +0000 (UTC) Subject: [PATCH] D83129: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE sections. 
In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG2d9bd448c9f0: [llvm-readobj] - Allow dumping partially corrupted SHT_LLVM_CALL_GRAPH_PROFILE… (authored by grimar). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83129/new/ https://reviews.llvm.org/D83129 Files: llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83129.275654.patch Type: text/x-patch Size: 4927 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:07:28 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:07:28 +0000 (UTC) Subject: [PATCH] D83041: [SVE][CodeGen] Legalisation of unpredicated store instructions In-Reply-To: References: Message-ID: <7a2c4c413638a8bbca3210eff8a007a5@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGcdf2eef613f7: [SVE][CodeGen] Legalisation of unpredicated store instructions (authored by kmclaughlin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83041/new/ https://reviews.llvm.org/D83041 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/test/CodeGen/AArch64/sve-split-store.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83041.275655.patch Type: text/x-patch Size: 4173 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:07:35 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:07:35 +0000 (UTC) Subject: [PATCH] D83040: [llvm-readobj] - Split the printHashSymbols. NFCI. 
In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG8f0f7dbcea34: [llvm-readobj] - Split the printHashSymbols. NFCI. (authored by grimar). Changed prior to commit: https://reviews.llvm.org/D83040?vs=275070&id=275656#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83040/new/ https://reviews.llvm.org/D83040 Files: llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83040.275656.patch Type: text/x-patch Size: 7043 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:11:03 2020 From: llvm-commits at lists.llvm.org (Dan Zimmerman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:11:03 +0000 (UTC) Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining In-Reply-To: References: Message-ID: <60568e34733ed15fe1acc1a2c1a7888f@localhost.localdomain> danzimm added a comment. Wow, thanks for the quick and thoughtful response @jhenderson! This was awesome to wake up to! > In LLVM tools using the cl::opt interface, like llvm-symbolizer and llvm-addr2line, where you see a cl::opt (such as --inlining) you should be able to do --inlining=0 to disable it. I'm not necessarily opposed to --no-inlining too mind you, but wanted to raise that before going further Whoa, I didn't know that! This is pretty cool! I'll go ahead and pull this change out of this diff. Do you have a preference on introducing `--no-inlining`? I think I'll drop it altogether (as opposed to putting it in another diff) in order to try and reduce complexity. > Most tests for llvm-symbolizer (and llvm-addr2line) are located in llvm/test/tools/llvm-symbolizer (there are a few in other scattered locations - grep for llvm-symbolizer). Any new front-end options like these should be tested there. Doh! It was right in front of me the whole time!
Thanks for pointing that out. Next time (if there is a next time for this diff) I upload changes I'll add some tests! > Are you specifically interested in symbols at specific addresses, or with a specific type? llvm-nm and llvm-readobj can both be used to dump symbols too. It doesn't feel to me like llvm-symbolizer is the right tool for the job if you want to dump all symbols (or all functions), though I could possibly see an argument if you are limiting it to the symbols with specific addresses. I personally would think it would make more sense to add any necessary options to llvm-nm or possibly llvm-readelf. Adding a few others with binutils knowledge for more visibility and to get their input. Originally I was interested in mapping a list of addresses to the names of the addresses of functions that appear in the binary from which the addresses came (these addresses are coming from instrumentation, e.g. `-finstrument-function-entry-bare`). After a bit of thought (and trial & error) I think I've concluded I actually do want source file information... it seems symbol names are duplicated across compilation units more often than I had originally expected. So.... I think for my personal use cases we can go ahead and scrap this diff. I'm also ok with creating the tests and pushing through the `--no-source-file` change (it still probably has some utility in the case when there aren't duplicate symbols). What do you say @jhenderson ? Does scrapping this diff sound ok? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83262/new/ https://reviews.llvm.org/D83262 From llvm-commits at lists.llvm.org Tue Jul 7 04:11:45 2020 From: llvm-commits at lists.llvm.org (Nathan James via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:11:45 +0000 (UTC) Subject: [PATCH] D82159: Add a cmake warning when someone tries to configure clang-tools-extra without clang In-Reply-To: References: Message-ID: <38e6fe3815e0f4d46102b0336d92ecbd@localhost.localdomain> njames93 updated this revision to Diff 275988. njames93 added a comment. Automatically enable clang as well as displaying warning Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82159/new/ https://reviews.llvm.org/D82159 Files: llvm/CMakeLists.txt Index: llvm/CMakeLists.txt =================================================================== --- llvm/CMakeLists.txt +++ llvm/CMakeLists.txt @@ -81,6 +81,10 @@ if( LLVM_ENABLE_PROJECTS STREQUAL "all" ) set( LLVM_ENABLE_PROJECTS ${LLVM_ALL_PROJECTS}) endif() +if ("clang-tools-extra" IN_LIST LLVM_ENABLE_PROJECTS AND NOT "clang" IN_LIST LLVM_ENABLE_PROJECTS) + message(WARNING "clang-tools-extra is enabled, which depends on 'clang'. Automatically enabling 'clang'.") + list(APPEND LLVM_ENABLE_PROJECTS "clang") +endif() if ("flang" IN_LIST LLVM_ENABLE_PROJECTS AND NOT "mlir" IN_LIST LLVM_ENABLE_PROJECTS) message(STATUS "Enabling MLIR as a dependency to flang") list(APPEND LLVM_ENABLE_PROJECTS "mlir") -------------- next part -------------- A non-text attachment was scrubbed... Name: D82159.275988.patch Type: text/x-patch Size: 720 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:12:44 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Tue, 07 Jul 2020 04:12:44 -0700 (PDT) Subject: [llvm] 0d656cb - [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. 
Message-ID: <5f0458ac.1c69fb81.72350.6234@mx.google.com> Author: Georgii Rymar Date: 2020-07-07T14:04:17+03:00 New Revision: 0d656cb25dc760cdfe0adfdd7256651b7bd0bcab URL: https://github.com/llvm/llvm-project/commit/0d656cb25dc760cdfe0adfdd7256651b7bd0bcab DIFF: https://github.com/llvm/llvm-project/commit/0d656cb25dc760cdfe0adfdd7256651b7bd0bcab.diff LOG: [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. It is possible to: 1) Avoid using the `unwrapOrError` calls and hence allow dumping to continue even when something is not OK with one of the SHT_LLVM_LINKER_OPTIONS sections. 2) Replace `reportWarning` with `reportUniqueWarning` calls. In this method it is a no-op, because it is not possible to have duplicated warnings anyway, but since we probably want to switch to `reportUniqueWarning` globally, this is a good thing to do. This patch addresses both of these points. Differential revision: https://reviews.llvm.org/D83131 Added: Modified: llvm/test/tools/llvm-readobj/ELF/linker-options.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/linker-options.test b/llvm/test/tools/llvm-readobj/ELF/linker-options.test index 3fad6b65c7a5..8e66547bd5ed 100644 --- a/llvm/test/tools/llvm-readobj/ELF/linker-options.test +++ b/llvm/test/tools/llvm-readobj/ELF/linker-options.test @@ -2,13 +2,14 @@ ## to dump SHT_LLVM_LINKER_OPTIONS sections. # RUN: yaml2obj --docnum=1 %s -o %t1 -# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s --check-prefix=CHECK -DFILE=%t1 +# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s -DFILE=%t1 # CHECK: LinkerOptions [ # CHECK: option 0: value 0 # CHECK: option 1: value 1 # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 2 is broken: an incomplete key-value pair was found.
The last possible key was: "c" # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 4 is broken: the content is not null-terminated +# CHECK-NEXT: warning: '[[FILE]]': unable to read the content of the SHT_LLVM_LINKER_OPTIONS section: section [index 5] has a sh_offset (0xffffffff) + sh_size (0x8) that is greater than the file size (0x370) # CHECK-NEXT: option 3: value 3 # CHECK-NEXT: ] @@ -44,7 +45,15 @@ Sections: - Name: .linker-options.nonul Type: SHT_LLVM_LINKER_OPTIONS Content: "61" -## Case 5: another correct case to show we do not stop dumping after reporting a warning. +## Case 5: check we report a warning when it is not possible to read +## the content of the SHT_LLVM_LINKER_OPTIONS section. + - Name: .linker-options.broken.content + Type: SHT_LLVM_LINKER_OPTIONS + ShOffset: 0xffffffff + Options: + - Name: foo + Value: bar +## Case 6: another correct case to show we do not stop dumping after reporting a warning. - Name: .linker-options.valid2 Type: SHT_LLVM_LINKER_OPTIONS Options: diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index bbdae7504525..247dfd9d6039 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6794,30 +6794,33 @@ void LLVMStyle::printELFLinkerOptions(const ELFFile *Obj) { if (Shdr.sh_type != ELF::SHT_LLVM_LINKER_OPTIONS) continue; - ArrayRef Contents = - unwrapOrError(this->FileName, Obj->getSectionContents(&Shdr)); - if (Contents.empty()) + Expected> ContentsOrErr = Obj->getSectionContents(&Shdr); + if (!ContentsOrErr) { + this->reportUniqueWarning( + createError("unable to read the content of the " + "SHT_LLVM_LINKER_OPTIONS section: " + + toString(ContentsOrErr.takeError()))); + continue; + } + if (ContentsOrErr->empty()) continue; - if (Contents.back() != 0) { - reportWarning(createError("SHT_LLVM_LINKER_OPTIONS section at index " + - Twine(I) + - " is broken: the " - "content is not null-terminated"), - this->FileName); + 
if (ContentsOrErr->back() != 0) { + this->reportUniqueWarning( + createError("SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: the " + "content is not null-terminated")); continue; } SmallVector Strings; - toStringRef(Contents.drop_back()).split(Strings, '\0'); + toStringRef(ContentsOrErr->drop_back()).split(Strings, '\0'); if (Strings.size() % 2 != 0) { - reportWarning( - createError( - "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + - " is broken: an incomplete " - "key-value pair was found. The last possible key was: \"" + - Strings.back() + "\""), - this->FileName); + this->reportUniqueWarning(createError( + "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: an incomplete " + "key-value pair was found. The last possible key was: \"" + + Strings.back() + "\"")); continue; } From llvm-commits at lists.llvm.org Tue Jul 7 04:12:54 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:12:54 +0000 (UTC) Subject: [PATCH] D83131: [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG0d656cb25dc7: [llvm-readobj] - Refine the error reporting in LLVMStyle<ELFT>… (authored by grimar). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83131/new/ https://reviews.llvm.org/D83131 Files: llvm/test/tools/llvm-readobj/ELF/linker-options.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6794,30 +6794,33 @@ if (Shdr.sh_type != ELF::SHT_LLVM_LINKER_OPTIONS) continue; - ArrayRef Contents = - unwrapOrError(this->FileName, Obj->getSectionContents(&Shdr)); - if (Contents.empty()) + Expected> ContentsOrErr = Obj->getSectionContents(&Shdr); + if (!ContentsOrErr) { + this->reportUniqueWarning( + createError("unable to read the content of the " + "SHT_LLVM_LINKER_OPTIONS section: " + + toString(ContentsOrErr.takeError()))); + continue; + } + if (ContentsOrErr->empty()) continue; - if (Contents.back() != 0) { - reportWarning(createError("SHT_LLVM_LINKER_OPTIONS section at index " + - Twine(I) + - " is broken: the " - "content is not null-terminated"), - this->FileName); + if (ContentsOrErr->back() != 0) { + this->reportUniqueWarning( + createError("SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: the " + "content is not null-terminated")); continue; } SmallVector Strings; - toStringRef(Contents.drop_back()).split(Strings, '\0'); + toStringRef(ContentsOrErr->drop_back()).split(Strings, '\0'); if (Strings.size() % 2 != 0) { - reportWarning( - createError( - "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + - " is broken: an incomplete " - "key-value pair was found. The last possible key was: \"" + - Strings.back() + "\""), - this->FileName); + this->reportUniqueWarning(createError( + "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: an incomplete " + "key-value pair was found. 
The last possible key was: \"" + + Strings.back() + "\"")); continue; } Index: llvm/test/tools/llvm-readobj/ELF/linker-options.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/linker-options.test +++ llvm/test/tools/llvm-readobj/ELF/linker-options.test @@ -2,13 +2,14 @@ ## to dump SHT_LLVM_LINKER_OPTIONS sections. # RUN: yaml2obj --docnum=1 %s -o %t1 -# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s --check-prefix=CHECK -DFILE=%t1 +# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s -DFILE=%t1 # CHECK: LinkerOptions [ # CHECK: option 0: value 0 # CHECK: option 1: value 1 # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 2 is broken: an incomplete key-value pair was found. The last possible key was: "c" # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 4 is broken: the content is not null-terminated +# CHECK-NEXT: warning: '[[FILE]]': unable to read the content of the SHT_LLVM_LINKER_OPTIONS section: section [index 5] has a sh_offset (0xffffffff) + sh_size (0x8) that is greater than the file size (0x370) # CHECK-NEXT: option 3: value 3 # CHECK-NEXT: ] @@ -44,7 +45,15 @@ - Name: .linker-options.nonul Type: SHT_LLVM_LINKER_OPTIONS Content: "61" -## Case 5: another correct case to show we do not stop dumping after reporting a warning. +## Case 5: check we report a warning when it is not possible to read +## the content of the SHT_LLVM_LINKER_OPTIONS section. + - Name: .linker-options.broken.content + Type: SHT_LLVM_LINKER_OPTIONS + ShOffset: 0xffffffff + Options: + - Name: foo + Value: bar +## Case 6: another correct case to show we do not stop dumping after reporting a warning. - Name: .linker-options.valid2 Type: SHT_LLVM_LINKER_OPTIONS Options: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83131.275989.patch Type: text/x-patch Size: 4074 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:14:24 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:14:24 +0000 (UTC) Subject: [PATCH] D83202: [Bitfields][NFC] Make sure bitfields are contiguous In-Reply-To: References: Message-ID: <2e48c04c9e9ca97bc927f3c7a5f7f382@localhost.localdomain> courbet accepted this revision. courbet added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/include/llvm/IR/InstrTypes.h:1102 using CallInstReservedField = Bitfield::Element; // Next bit:2 using CallingConvField = Bitfield::Element; // Next bit:12 ---------------- here too ? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83202/new/ https://reviews.llvm.org/D83202 From llvm-commits at lists.llvm.org Tue Jul 7 04:19:40 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:19:40 +0000 (UTC) Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining In-Reply-To: References: Message-ID: <272273f1069c9655900600187b8b8e71@localhost.localdomain> jhenderson added a comment. I'm personally fine with dropping this if it's not actually useful for you, as I don't have any use-case for it at the current time. Re. --no-inlining, I have a slight preference for not adding it, but I'm also okay with it being added, if you'd find it less confusing. I didn't know about the functionality of `=0` to disable a flag in LLVM tools when I first came to the project myself, so it could be a little confusing. I actually added --no-demangle precisely for that reason. 
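As a rough illustration of the `=0` form discussed in this thread, the standalone sketch below mimics how a boolean command-line flag could accept an explicit value. It deliberately does not use LLVM's `cl::opt`; `parseBoolFlag` is a hypothetical helper written only for this example, and the set of accepted spellings is an assumption modelled on common true/false values rather than a statement about what the LLVM parser accepts.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not an LLVM API): interprets a single command-line
// token for a boolean flag called Name, e.g. "--inlining" or "--inlining=0".
// Returns std::nullopt when the token is a different flag or carries a value
// this sketch does not recognise.
std::optional<bool> parseBoolFlag(const std::string &Arg,
                                  const std::string &Name) {
  const std::string Prefix = "--" + Name;

  // The token must start with "--<Name>".
  if (Arg.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;

  const std::string Rest = Arg.substr(Prefix.size());

  // A bare "--flag" enables the option.
  if (Rest.empty())
    return true;

  // Anything other than "=<value>" means this is some other, longer flag.
  if (Rest[0] != '=')
    return std::nullopt;

  // Assumed spellings for the explicit value.
  const std::string Value = Rest.substr(1);
  if (Value == "1" || Value == "true")
    return true;
  if (Value == "0" || Value == "false")
    return false;

  return std::nullopt;
}
```

With this sketch, `parseBoolFlag("--inlining=0", "inlining")` yields `false`, which is the behaviour a separate `--no-inlining` spelling would otherwise have to provide.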
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83262/new/ https://reviews.llvm.org/D83262 From llvm-commits at lists.llvm.org Tue Jul 7 04:23:27 2020 From: llvm-commits at lists.llvm.org (Mikhail Maltsev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:23:27 +0000 (UTC) Subject: [PATCH] D83231: [CodeGen] Don't combine extract + concat vectors with non-legal types In-Reply-To: References: Message-ID: <45af1543f0bca683cfa9bf78e6937aa0@localhost.localdomain> miyuki accepted this revision. miyuki added a comment. This revision is now accepted and ready to land. LGTM. Please wait until Friday before committing, so that others have a chance to share their objections. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83231/new/ https://reviews.llvm.org/D83231 From llvm-commits at lists.llvm.org Tue Jul 7 04:26:44 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:26:44 +0000 (UTC) Subject: [PATCH] D83231: [CodeGen] Don't combine extract + concat vectors with non-legal types In-Reply-To: References: Message-ID: <60e2af06921a90e04dcc83cafd1bdb35@localhost.localdomain> lebedev.ri requested changes to this revision. lebedev.ri added inline comments. This revision now requires changes to proceed. 
================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:17817-17818 + VecVT.getVectorElementType() == ScalarVT && + TLI.isTypeLegal( + VecOp.getOperand(0).getValueType().getVectorElementType())) { // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 0 ---------------- !LegalTypes || TLI.isTypeLegal( Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83231/new/ https://reviews.llvm.org/D83231 From llvm-commits at lists.llvm.org Tue Jul 7 04:29:03 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:29:03 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <32fed6f3b17f0a7cb151b4b6ca487079@localhost.localdomain> ABataev added a comment. In D82881#2134913, @dblaikie wrote: > In D82881#2133548, @ABataev wrote: > > > In D82881#2133511, @aprantl wrote: > > > > > And conversely, with this patch applied, do GDB and LLDB still produce the expected result? > > > > > > GDB works correctly. Did not check with lldb, but it also should work. The result is similar to the debug info, produced for the next code: > > > > struct { > > short : 3; > > short : 6; > > } a; > > > > > Similar, but seems different in a critical way - in that code, the type of the field is short, which has size 2. Which matches the size of the field. > > I think it would be pretty surprising to handle DWARF where the size of a field is different from the size of the type of that field? The standard clearly says: A base type entry has a DW_AT_byte_size attribute, whose value is a constant, describing the size in bytes of the storage unit used to represent an object of the given type. In our example, the storage size is the same, just like for short. The standard does not say anything about the size of the base type. The real size of the bitfield is passed in the `DW_AT_bit_size` attribute (in bits).
> That said, I don't have great suggestions for how the DWARF should communicate this packed situation where a bitfield crosses a byte boundary either. > >> But the code, produced by the compiler, is also the same. So, I think, the debug info also should be the same. >> >>> Also, what happens to the next bit field or variable right after the bit-field with the now larger container? Is that affected by the patch? >> >> It does not affect the next fields. We point exactly to the bytes, allocated for this particular bitfield only. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Tue Jul 7 04:32:10 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:32:10 +0000 (UTC) Subject: [PATCH] D83236: [DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour In-Reply-To: References: Message-ID: <5c42e20596cd5a7d9522f04ada1ae1c0@localhost.localdomain> jmorse marked an inline comment as done. jmorse added a comment. In D83236#2133902, @dblaikie wrote: > Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together? This is probably doable, although I'm generally unfamiliar with the DWARF emission code. Right now we iterate over variable entities and call validThroughout for those that might be single-locations; we would need to pivot to iterate over variable entities collecting those that _might_ be single-location, then call validThroughout once for that set. My preference is to fold this problem into the work that @Orlando is doing though -- his patch is already solving this problem (intersection of scope and variable-location range) in one context, we should be able to re-purpose it to solve this one too.
(Most of my motivation for this patch is the upcoming branch for LLVM11, I'd like to get a limit on this, then work towards doing it more efficiently) ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h:598 + const DbgValueHistoryMap::Entries &Entries, + SmallPtrSetImpl &VeryLargeBlocks); ---------------- aprantl wrote: > Not very important, but: Assuming that `VeryLargeBlocks` will only be populated in the pathological case, micro-optimizing with a *Small*PtrSet seems unnecessary. Perhaps it's more memory-efficient on average to just use a DenseSet? Sounds fair, I'll fold that change in, Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83236/new/ https://reviews.llvm.org/D83236 From llvm-commits at lists.llvm.org Tue Jul 7 04:32:56 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:32:56 +0000 (UTC) Subject: [PATCH] D83232: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. In-Reply-To: References: Message-ID: grimar updated this revision to Diff 275993. grimar marked 2 inline comments as done. grimar added a comment. - Addressed review comments. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83232/new/ https://reviews.llvm.org/D83232 Files: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6401,14 +6401,14 @@ const DynRegionInfo &DynRelaRegion = this->dumper()->getDynRelaRegion(); const DynRegionInfo &DynRelrRegion = this->dumper()->getDynRelrRegion(); const DynRegionInfo &DynPLTRelRegion = this->dumper()->getDynPLTRelRegion(); - if (DynRelRegion.Size && DynRelaRegion.Size) - report_fatal_error("There are both REL and RELA dynamic relocations"); + W.startLine() << "Dynamic Relocations {\n"; W.indent(); - if (DynRelaRegion.Size > 0) + if (DynRelaRegion.Size > 0) { for (const Elf_Rela &Rela : this->dumper()->dyn_relas()) printDynamicRelocation(Obj, Rela); - else + } + if (DynRelRegion.Size > 0) { for (const Elf_Rel &Rel : this->dumper()->dyn_rels()) { Elf_Rela Rela; Rela.r_offset = Rel.r_offset; @@ -6416,6 +6416,8 @@ Rela.r_addend = 0; printDynamicRelocation(Obj, Rela); } + } + if (DynRelrRegion.Size > 0) { Elf_Relr_Range Relrs = this->dumper()->dyn_relrs(); std::vector RelrRelas = Index: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test +++ llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test @@ -463,3 +463,62 @@ Sections: - Section: .rela.dyn - Section: .dynamic + +## Show that when we have both REL and RELA relocations, we dump both sets. 
+# RUN: yaml2obj --docnum=13 %s -o %t13 +# RUN: llvm-readobj --dyn-relocations %t13 2>&1 | FileCheck %s -DFILE=%t13 --check-prefix=BOTH-RELA-REL-LLVM +# RUN: llvm-readelf --dyn-relocations %t13 2>&1 | FileCheck %s -DFILE=%t13 --check-prefix=BOTH-RELA-REL-GNU + +# BOTH-RELA-REL-LLVM: Dynamic Relocations { +# BOTH-RELA-REL-LLVM-NEXT: 0x0 R_X86_64_NONE - 0x0 +# BOTH-RELA-REL-LLVM-NEXT: 0x0 R_X86_64_NONE - 0x0 +# BOTH-RELA-REL-LLVM-NEXT: } + +# BOTH-RELA-REL-GNU: 'RELA' relocation section at offset 0x78 contains 24 bytes: +# BOTH-RELA-REL-GNU-NEXT: Offset Info Type Symbol's Value Symbol's Name + Addend +# BOTH-RELA-REL-GNU-NEXT: 0000000000000000 0000000000000000 R_X86_64_NONE 0 +# BOTH-RELA-REL-GNU-EMPTY: +# BOTH-RELA-REL-GNU: 'REL' relocation section at offset 0x90 contains 16 bytes: +# BOTH-RELA-REL-GNU-NEXT: Offset Info Type Symbol's Value Symbol's Name +# BOTH-RELA-REL-GNU-NEXT: 0000000000000000 0000000000000000 R_X86_64_NONE + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_DYN + Machine: EM_X86_64 +Sections: + - Name: .rela.dyn + Type: SHT_RELA + Relocations: + - Type: R_X86_64_NONE + - Name: .rel.dyn + Type: SHT_REL + Relocations: + - Type: R_X86_64_NONE + - Name: .dynamic + Type: SHT_DYNAMIC + Entries: + - Tag: DT_RELA + Value: 0x0 + - Tag: DT_RELASZ + Value: 0x18 + - Tag: DT_RELAENT + Value: 0x18 +## 0x18 == offset of .rel.dyn == size of .rela.dyn section. + - Tag: DT_REL + Value: 0x18 + - Tag: DT_RELSZ + Value: 0x10 + - Tag: DT_RELENT + Value: 0x10 + - Tag: DT_NULL + Value: 0x0 +DynamicSymbols: [] +ProgramHeaders: + - Type: PT_LOAD + Sections: + - Section: .rela.dyn + - Section: .rel.dyn + - Section: .dynamic -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83232.275993.patch Type: text/x-patch Size: 3596 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 04:33:00 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:33:00 +0000 (UTC) Subject: [PATCH] D83232: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. In-Reply-To: References: Message-ID: <7d7c75450f256d49affdf5b8d8ab82ac@localhost.localdomain> grimar added inline comments. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test:499-512 + - Tag: DT_RELA + Value: 0x0 + - Tag: DT_RELASZ + Value: 0x18 + - Tag: DT_RELAENT + Value: 0x18 + - Tag: DT_REL ---------------- jhenderson wrote: > The input here is a little subtle, i.e. reusing the same data for both REL and RELA relocations. Perhaps it would be clearer to have them point at different sections? If that's not possible, it might be worth pointing out how this works in a comment. Fixed. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83232/new/ https://reviews.llvm.org/D83232 From llvm-commits at lists.llvm.org Tue Jul 7 04:35:04 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:35:04 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <321d4aa24d26b49c2c41ed9163e6f855@localhost.localdomain> ABataev added a comment. In D83268#2135081, @Hahnfeld wrote: > This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?). +1. Better to introduce new entry points and mark the existing ones as deprecated.
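As a rough sketch of that suggestion (new entry point added, old one kept and marked deprecated), something along these lines would preserve API compatibility. The function names and parameters here are hypothetical and chosen only for the example; they are not the real OpenMP runtime entry points touched by D83268.

```cpp
// Hypothetical entry points for illustration; these names and parameters are
// assumptions and do not correspond to the real OpenMP runtime interface.
extern "C" {

// New entry point: the always-fixed argument is gone.
int example_rt_init_v2(int NumThreads) {
  return NumThreads > 0 ? NumThreads : 1;
}

// Old entry point, kept so existing callers still compile and link. It
// ignores the formerly fixed argument and forwards to the new function.
[[deprecated("use example_rt_init_v2 instead")]]
int example_rt_init(int NumThreads, int /*AlwaysFixed*/) {
  return example_rt_init_v2(NumThreads);
}

} // extern "C"
```

Callers built against the old signature keep working and get a compile-time warning, while new code can move to the slimmer entry point at its own pace.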
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 04:43:32 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:43:32 +0000 (UTC) Subject: [PATCH] D82159: Add a cmake warning when someone tries to configure clang-tools-extra without clang In-Reply-To: References: Message-ID: DavidTruby added a comment. I agree that we might want to abstract this somewhere as I can imagine other subprojects wanting to do it as well. It would probably be nicer for the dependency to appear in the subproject CMakeLists but I'm not sure if that's possible. Perhaps we should change the STATUS message to a WARNING for flang too, or at least keep both the same? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82159/new/ https://reviews.llvm.org/D82159 From llvm-commits at lists.llvm.org Tue Jul 7 04:46:35 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Tue, 07 Jul 2020 04:46:35 -0700 (PDT) Subject: [llvm] a256193 - [llvm-readobj] - Add prepending # to mips-got.test and mips-plt.test. NFC. Message-ID: <5f04609b.1c69fb81.2eeba.60b9@mx.google.com> Author: Georgii Rymar Date: 2020-07-07T14:44:30+03:00 New Revision: a256193afa4869ae749eaeec7548244772843da3 URL: https://github.com/llvm/llvm-project/commit/a256193afa4869ae749eaeec7548244772843da3 DIFF: https://github.com/llvm/llvm-project/commit/a256193afa4869ae749eaeec7548244772843da3.diff LOG: [llvm-readobj] - Add prepending # to mips-got.test and mips-plt.test. NFC. It was requested in D83225 review to do it separately. 
Added: Modified: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/mips-got.test b/llvm/test/tools/llvm-readobj/ELF/mips-got.test index 8ed35d4b68e2..c13d57233326 100644 --- a/llvm/test/tools/llvm-readobj/ELF/mips-got.test +++ b/llvm/test/tools/llvm-readobj/ELF/mips-got.test @@ -1,486 +1,486 @@ -RUN: llvm-readobj -A %p/Inputs/dynamic-table-exe.mips | \ -RUN: FileCheck %s -check-prefix GOT-EXE -RUN: llvm-readobj -A %p/Inputs/dynamic-table-so.mips | \ -RUN: FileCheck %s -check-prefix GOT-SO -RUN: llvm-readobj -A %p/Inputs/got-tls.so.elf-mips64el | \ -RUN: FileCheck %s -check-prefix GOT-TLS -RUN: llvm-readobj -A %p/Inputs/got-empty.exe.mipsel | \ -RUN: FileCheck %s -check-prefix GOT-EMPTY -RUN: llvm-readobj -A %p/Inputs/got-static.exe.mips | \ -RUN: FileCheck %s -check-prefix GOT-STATIC +# RUN: llvm-readobj -A %p/Inputs/dynamic-table-exe.mips | \ +# RUN: FileCheck %s -check-prefix GOT-EXE +# RUN: llvm-readobj -A %p/Inputs/dynamic-table-so.mips | \ +# RUN: FileCheck %s -check-prefix GOT-SO +# RUN: llvm-readobj -A %p/Inputs/got-tls.so.elf-mips64el | \ +# RUN: FileCheck %s -check-prefix GOT-TLS +# RUN: llvm-readobj -A %p/Inputs/got-empty.exe.mipsel | \ +# RUN: FileCheck %s -check-prefix GOT-EMPTY +# RUN: llvm-readobj -A %p/Inputs/got-static.exe.mips | \ +# RUN: FileCheck %s -check-prefix GOT-STATIC -RUN: llvm-readelf -A %p/Inputs/dynamic-table-exe.mips | \ -RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-EXE -RUN: llvm-readelf -A %p/Inputs/dynamic-table-so.mips | \ -RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-SO -RUN: llvm-readelf -A %p/Inputs/got-tls.so.elf-mips64el | \ -RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-TLS -RUN: llvm-readelf -A %p/Inputs/got-empty.exe.mipsel | \ -RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-EMPTY -RUN: 
llvm-readelf -A %p/Inputs/got-static.exe.mips | \ -RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-STATIC +# RUN: llvm-readelf -A %p/Inputs/dynamic-table-exe.mips | \ +# RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-EXE +# RUN: llvm-readelf -A %p/Inputs/dynamic-table-so.mips | \ +# RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-SO +# RUN: llvm-readelf -A %p/Inputs/got-tls.so.elf-mips64el | \ +# RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-TLS +# RUN: llvm-readelf -A %p/Inputs/got-empty.exe.mipsel | \ +# RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-EMPTY +# RUN: llvm-readelf -A %p/Inputs/got-static.exe.mips | \ +# RUN: FileCheck %s --strict-whitespace -check-prefix GNU-GOT-STATIC -GOT-EXE: Primary GOT { -GOT-EXE-NEXT: Canonical gp value: 0x418880 -GOT-EXE-NEXT: Reserved entries [ -GOT-EXE-NEXT: Entry { -GOT-EXE-NEXT: Address: 0x410890 -GOT-EXE-NEXT: Access: -32752 -GOT-EXE-NEXT: Initial: 0x0 -GOT-EXE-NEXT: Purpose: Lazy resolver -GOT-EXE-NEXT: } -GOT-EXE-NEXT: Entry { -GOT-EXE-NEXT: Address: 0x410894 -GOT-EXE-NEXT: Access: -32748 -GOT-EXE-NEXT: Initial: 0x80000000 -GOT-EXE-NEXT: Purpose: Module pointer (GNU extension) -GOT-EXE-NEXT: } -GOT-EXE-NEXT: ] -GOT-EXE-NEXT: Local entries [ -GOT-EXE-NEXT: Entry { -GOT-EXE-NEXT: Address: 0x410898 -GOT-EXE-NEXT: Access: -32744 -GOT-EXE-NEXT: Initial: 0x400418 -GOT-EXE-NEXT: } -GOT-EXE-NEXT: Entry { -GOT-EXE-NEXT: Address: 0x41089C -GOT-EXE-NEXT: Access: -32740 -GOT-EXE-NEXT: Initial: 0x410840 -GOT-EXE-NEXT: } -GOT-EXE-NEXT: Entry { -GOT-EXE-NEXT: Address: 0x4108A0 -GOT-EXE-NEXT: Access: -32736 -GOT-EXE-NEXT: Initial: 0x0 -GOT-EXE-NEXT: } -GOT-EXE-NEXT: ] -GOT-EXE-NEXT: Global entries [ -GOT-EXE-NEXT: Entry { -GOT-EXE-NEXT: Address: 0x4108A4 -GOT-EXE-NEXT: Access: -32732 -GOT-EXE-NEXT: Initial: 0x0 -GOT-EXE-NEXT: Value: 0x0 -GOT-EXE-NEXT: Type: Function (0x2) -GOT-EXE-NEXT: Section: Undefined (0x0) -GOT-EXE-NEXT: Name: __gmon_start__ (1) -GOT-EXE-NEXT: } 
-GOT-EXE-NEXT: ] -GOT-EXE-NEXT: Number of TLS and multi-GOT entries: 0 -GOT-EXE-NEXT: } +# GOT-EXE: Primary GOT { +# GOT-EXE-NEXT: Canonical gp value: 0x418880 +# GOT-EXE-NEXT: Reserved entries [ +# GOT-EXE-NEXT: Entry { +# GOT-EXE-NEXT: Address: 0x410890 +# GOT-EXE-NEXT: Access: -32752 +# GOT-EXE-NEXT: Initial: 0x0 +# GOT-EXE-NEXT: Purpose: Lazy resolver +# GOT-EXE-NEXT: } +# GOT-EXE-NEXT: Entry { +# GOT-EXE-NEXT: Address: 0x410894 +# GOT-EXE-NEXT: Access: -32748 +# GOT-EXE-NEXT: Initial: 0x80000000 +# GOT-EXE-NEXT: Purpose: Module pointer (GNU extension) +# GOT-EXE-NEXT: } +# GOT-EXE-NEXT: ] +# GOT-EXE-NEXT: Local entries [ +# GOT-EXE-NEXT: Entry { +# GOT-EXE-NEXT: Address: 0x410898 +# GOT-EXE-NEXT: Access: -32744 +# GOT-EXE-NEXT: Initial: 0x400418 +# GOT-EXE-NEXT: } +# GOT-EXE-NEXT: Entry { +# GOT-EXE-NEXT: Address: 0x41089C +# GOT-EXE-NEXT: Access: -32740 +# GOT-EXE-NEXT: Initial: 0x410840 +# GOT-EXE-NEXT: } +# GOT-EXE-NEXT: Entry { +# GOT-EXE-NEXT: Address: 0x4108A0 +# GOT-EXE-NEXT: Access: -32736 +# GOT-EXE-NEXT: Initial: 0x0 +# GOT-EXE-NEXT: } +# GOT-EXE-NEXT: ] +# GOT-EXE-NEXT: Global entries [ +# GOT-EXE-NEXT: Entry { +# GOT-EXE-NEXT: Address: 0x4108A4 +# GOT-EXE-NEXT: Access: -32732 +# GOT-EXE-NEXT: Initial: 0x0 +# GOT-EXE-NEXT: Value: 0x0 +# GOT-EXE-NEXT: Type: Function (0x2) +# GOT-EXE-NEXT: Section: Undefined (0x0) +# GOT-EXE-NEXT: Name: __gmon_start__ (1) +# GOT-EXE-NEXT: } +# GOT-EXE-NEXT: ] +# GOT-EXE-NEXT: Number of TLS and multi-GOT entries: 0 +# GOT-EXE-NEXT: } -GOT-SO: Primary GOT { -GOT-SO-NEXT: Canonical gp value: 0x188D0 -GOT-SO-NEXT: Reserved entries [ -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108E0 -GOT-SO-NEXT: Access: -32752 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: Purpose: Lazy resolver -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108E4 -GOT-SO-NEXT: Access: -32748 -GOT-SO-NEXT: Initial: 0x80000000 -GOT-SO-NEXT: Purpose: Module pointer (GNU extension) -GOT-SO-NEXT: } -GOT-SO-NEXT: ] -GOT-SO-NEXT: Local entries [ 
-GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108E8 -GOT-SO-NEXT: Access: -32744 -GOT-SO-NEXT: Initial: 0x108E0 -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108EC -GOT-SO-NEXT: Access: -32740 -GOT-SO-NEXT: Initial: 0x10000 -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108F0 -GOT-SO-NEXT: Access: -32736 -GOT-SO-NEXT: Initial: 0x10920 -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108F4 -GOT-SO-NEXT: Access: -32732 -GOT-SO-NEXT: Initial: 0x108CC -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108F8 -GOT-SO-NEXT: Access: -32728 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x108FC -GOT-SO-NEXT: Access: -32724 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x10900 -GOT-SO-NEXT: Access: -32720 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x10904 -GOT-SO-NEXT: Access: -32716 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: } -GOT-SO-NEXT: ] -GOT-SO-NEXT: Global entries [ -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x10908 -GOT-SO-NEXT: Access: -32712 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: Value: 0x0 -GOT-SO-NEXT: Type: None (0x0) -GOT-SO-NEXT: Section: Undefined (0x0) -GOT-SO-NEXT: Name: _ITM_registerTMCloneTable (87) -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x1090C -GOT-SO-NEXT: Access: -32708 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: Value: 0x0 -GOT-SO-NEXT: Type: None (0x0) -GOT-SO-NEXT: Section: Undefined (0x0) -GOT-SO-NEXT: Name: _Jv_RegisterClasses (128) -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x10910 -GOT-SO-NEXT: Access: -32704 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: Value: 0x0 -GOT-SO-NEXT: Type: Function (0x2) -GOT-SO-NEXT: Section: Undefined (0x0) -GOT-SO-NEXT: Name: __gmon_start__ (23) -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x10914 -GOT-SO-NEXT: Access: -32700 -GOT-SO-NEXT: Initial: 0x840 -GOT-SO-NEXT: Value: 
0x840 -GOT-SO-NEXT: Type: Function (0x2) -GOT-SO-NEXT: Section: Undefined (0x0) -GOT-SO-NEXT: Name: puts at GLIBC_2.0 (162) -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x10918 -GOT-SO-NEXT: Access: -32696 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: Value: 0x0 -GOT-SO-NEXT: Type: None (0x0) -GOT-SO-NEXT: Section: Undefined (0x0) -GOT-SO-NEXT: Name: _ITM_deregisterTMCloneTable (59) -GOT-SO-NEXT: } -GOT-SO-NEXT: Entry { -GOT-SO-NEXT: Address: 0x1091C -GOT-SO-NEXT: Access: -32692 -GOT-SO-NEXT: Initial: 0x0 -GOT-SO-NEXT: Value: 0x0 -GOT-SO-NEXT: Type: Function (0x2) -GOT-SO-NEXT: Section: Undefined (0x0) -GOT-SO-NEXT: Name: __cxa_finalize at GLIBC_2.2 (113) -GOT-SO-NEXT: } -GOT-SO-NEXT: ] -GOT-SO-NEXT: Number of TLS and multi-GOT entries: 0 -GOT-SO-NEXT: } +# GOT-SO: Primary GOT { +# GOT-SO-NEXT: Canonical gp value: 0x188D0 +# GOT-SO-NEXT: Reserved entries [ +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108E0 +# GOT-SO-NEXT: Access: -32752 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: Purpose: Lazy resolver +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108E4 +# GOT-SO-NEXT: Access: -32748 +# GOT-SO-NEXT: Initial: 0x80000000 +# GOT-SO-NEXT: Purpose: Module pointer (GNU extension) +# GOT-SO-NEXT: } +# GOT-SO-NEXT: ] +# GOT-SO-NEXT: Local entries [ +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108E8 +# GOT-SO-NEXT: Access: -32744 +# GOT-SO-NEXT: Initial: 0x108E0 +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108EC +# GOT-SO-NEXT: Access: -32740 +# GOT-SO-NEXT: Initial: 0x10000 +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108F0 +# GOT-SO-NEXT: Access: -32736 +# GOT-SO-NEXT: Initial: 0x10920 +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108F4 +# GOT-SO-NEXT: Access: -32732 +# GOT-SO-NEXT: Initial: 0x108CC +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108F8 +# GOT-SO-NEXT: Access: -32728 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: } +# 
GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x108FC +# GOT-SO-NEXT: Access: -32724 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x10900 +# GOT-SO-NEXT: Access: -32720 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x10904 +# GOT-SO-NEXT: Access: -32716 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: } +# GOT-SO-NEXT: ] +# GOT-SO-NEXT: Global entries [ +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x10908 +# GOT-SO-NEXT: Access: -32712 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: Value: 0x0 +# GOT-SO-NEXT: Type: None (0x0) +# GOT-SO-NEXT: Section: Undefined (0x0) +# GOT-SO-NEXT: Name: _ITM_registerTMCloneTable (87) +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x1090C +# GOT-SO-NEXT: Access: -32708 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: Value: 0x0 +# GOT-SO-NEXT: Type: None (0x0) +# GOT-SO-NEXT: Section: Undefined (0x0) +# GOT-SO-NEXT: Name: _Jv_RegisterClasses (128) +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x10910 +# GOT-SO-NEXT: Access: -32704 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: Value: 0x0 +# GOT-SO-NEXT: Type: Function (0x2) +# GOT-SO-NEXT: Section: Undefined (0x0) +# GOT-SO-NEXT: Name: __gmon_start__ (23) +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x10914 +# GOT-SO-NEXT: Access: -32700 +# GOT-SO-NEXT: Initial: 0x840 +# GOT-SO-NEXT: Value: 0x840 +# GOT-SO-NEXT: Type: Function (0x2) +# GOT-SO-NEXT: Section: Undefined (0x0) +# GOT-SO-NEXT: Name: puts at GLIBC_2.0 (162) +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x10918 +# GOT-SO-NEXT: Access: -32696 +# GOT-SO-NEXT: Initial: 0x0 +# GOT-SO-NEXT: Value: 0x0 +# GOT-SO-NEXT: Type: None (0x0) +# GOT-SO-NEXT: Section: Undefined (0x0) +# GOT-SO-NEXT: Name: _ITM_deregisterTMCloneTable (59) +# GOT-SO-NEXT: } +# GOT-SO-NEXT: Entry { +# GOT-SO-NEXT: Address: 0x1091C +# GOT-SO-NEXT: Access: -32692 +# GOT-SO-NEXT: 
Initial: 0x0 +# GOT-SO-NEXT: Value: 0x0 +# GOT-SO-NEXT: Type: Function (0x2) +# GOT-SO-NEXT: Section: Undefined (0x0) +# GOT-SO-NEXT: Name: __cxa_finalize at GLIBC_2.2 (113) +# GOT-SO-NEXT: } +# GOT-SO-NEXT: ] +# GOT-SO-NEXT: Number of TLS and multi-GOT entries: 0 +# GOT-SO-NEXT: } -GOT-TLS: Primary GOT { -GOT-TLS-NEXT: Canonical gp value: 0x18BF0 -GOT-TLS-NEXT: Reserved entries [ -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C00 -GOT-TLS-NEXT: Access: -32752 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: Purpose: Lazy resolver -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C08 -GOT-TLS-NEXT: Access: -32744 -GOT-TLS-NEXT: Initial: 0x8000000000000000 -GOT-TLS-NEXT: Purpose: Module pointer (GNU extension) -GOT-TLS-NEXT: } -GOT-TLS-NEXT: ] -GOT-TLS-NEXT: Local entries [ -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C10 -GOT-TLS-NEXT: Access: -32736 -GOT-TLS-NEXT: Initial: 0x10000 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C18 -GOT-TLS-NEXT: Access: -32728 -GOT-TLS-NEXT: Initial: 0x10C00 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C20 -GOT-TLS-NEXT: Access: -32720 -GOT-TLS-NEXT: Initial: 0x10CB8 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C28 -GOT-TLS-NEXT: Access: -32712 -GOT-TLS-NEXT: Initial: 0x10BF0 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C30 -GOT-TLS-NEXT: Access: -32704 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C38 -GOT-TLS-NEXT: Access: -32696 -GOT-TLS-NEXT: Initial: 0x948 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C40 -GOT-TLS-NEXT: Access: -32688 -GOT-TLS-NEXT: Initial: 0xA20 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C48 -GOT-TLS-NEXT: Access: -32680 -GOT-TLS-NEXT: Initial: 0xAF0 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C50 -GOT-TLS-NEXT: Access: -32672 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: } 
-GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C58 -GOT-TLS-NEXT: Access: -32664 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C60 -GOT-TLS-NEXT: Access: -32656 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: } -GOT-TLS-NEXT: ] -GOT-TLS-NEXT: Global entries [ -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C68 -GOT-TLS-NEXT: Access: -32648 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: Value: 0x0 -GOT-TLS-NEXT: Type: None (0x0) -GOT-TLS-NEXT: Section: Undefined (0x0) -GOT-TLS-NEXT: Name: _ITM_registerTMCloneTable (78) -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C70 -GOT-TLS-NEXT: Access: -32640 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: Value: 0x0 -GOT-TLS-NEXT: Type: None (0x0) -GOT-TLS-NEXT: Section: Undefined (0x0) -GOT-TLS-NEXT: Name: _Jv_RegisterClasses (119) -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C78 -GOT-TLS-NEXT: Access: -32632 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: Value: 0x0 -GOT-TLS-NEXT: Type: Function (0x2) -GOT-TLS-NEXT: Section: Undefined (0x0) -GOT-TLS-NEXT: Name: __gmon_start__ (23) -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C80 -GOT-TLS-NEXT: Access: -32624 -GOT-TLS-NEXT: Initial: 0xB60 -GOT-TLS-NEXT: Value: 0xB60 -GOT-TLS-NEXT: Type: Function (0x2) -GOT-TLS-NEXT: Section: Undefined (0x0) -GOT-TLS-NEXT: Name: __tls_get_addr at GLIBC_2.3 (150) -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C88 -GOT-TLS-NEXT: Access: -32616 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: Value: 0x0 -GOT-TLS-NEXT: Type: None (0x0) -GOT-TLS-NEXT: Section: Undefined (0x0) -GOT-TLS-NEXT: Name: _ITM_deregisterTMCloneTable (50) -GOT-TLS-NEXT: } -GOT-TLS-NEXT: Entry { -GOT-TLS-NEXT: Address: 0x10C90 -GOT-TLS-NEXT: Access: -32608 -GOT-TLS-NEXT: Initial: 0x0 -GOT-TLS-NEXT: Value: 0x0 -GOT-TLS-NEXT: Type: Function (0x2) -GOT-TLS-NEXT: Section: Undefined (0x0) -GOT-TLS-NEXT: Name: __cxa_finalize at GLIBC_2.2 (104) -GOT-TLS-NEXT: } 
-GOT-TLS-NEXT: ] -GOT-TLS-NEXT: Number of TLS and multi-GOT entries: 4 -GOT-TLS-NEXT: } +# GOT-TLS: Primary GOT { +# GOT-TLS-NEXT: Canonical gp value: 0x18BF0 +# GOT-TLS-NEXT: Reserved entries [ +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C00 +# GOT-TLS-NEXT: Access: -32752 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: Purpose: Lazy resolver +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C08 +# GOT-TLS-NEXT: Access: -32744 +# GOT-TLS-NEXT: Initial: 0x8000000000000000 +# GOT-TLS-NEXT: Purpose: Module pointer (GNU extension) +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: ] +# GOT-TLS-NEXT: Local entries [ +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C10 +# GOT-TLS-NEXT: Access: -32736 +# GOT-TLS-NEXT: Initial: 0x10000 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C18 +# GOT-TLS-NEXT: Access: -32728 +# GOT-TLS-NEXT: Initial: 0x10C00 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C20 +# GOT-TLS-NEXT: Access: -32720 +# GOT-TLS-NEXT: Initial: 0x10CB8 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C28 +# GOT-TLS-NEXT: Access: -32712 +# GOT-TLS-NEXT: Initial: 0x10BF0 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C30 +# GOT-TLS-NEXT: Access: -32704 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C38 +# GOT-TLS-NEXT: Access: -32696 +# GOT-TLS-NEXT: Initial: 0x948 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C40 +# GOT-TLS-NEXT: Access: -32688 +# GOT-TLS-NEXT: Initial: 0xA20 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C48 +# GOT-TLS-NEXT: Access: -32680 +# GOT-TLS-NEXT: Initial: 0xAF0 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C50 +# GOT-TLS-NEXT: Access: -32672 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C58 +# GOT-TLS-NEXT: 
Access: -32664 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C60 +# GOT-TLS-NEXT: Access: -32656 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: ] +# GOT-TLS-NEXT: Global entries [ +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C68 +# GOT-TLS-NEXT: Access: -32648 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: Value: 0x0 +# GOT-TLS-NEXT: Type: None (0x0) +# GOT-TLS-NEXT: Section: Undefined (0x0) +# GOT-TLS-NEXT: Name: _ITM_registerTMCloneTable (78) +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C70 +# GOT-TLS-NEXT: Access: -32640 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: Value: 0x0 +# GOT-TLS-NEXT: Type: None (0x0) +# GOT-TLS-NEXT: Section: Undefined (0x0) +# GOT-TLS-NEXT: Name: _Jv_RegisterClasses (119) +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C78 +# GOT-TLS-NEXT: Access: -32632 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: Value: 0x0 +# GOT-TLS-NEXT: Type: Function (0x2) +# GOT-TLS-NEXT: Section: Undefined (0x0) +# GOT-TLS-NEXT: Name: __gmon_start__ (23) +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C80 +# GOT-TLS-NEXT: Access: -32624 +# GOT-TLS-NEXT: Initial: 0xB60 +# GOT-TLS-NEXT: Value: 0xB60 +# GOT-TLS-NEXT: Type: Function (0x2) +# GOT-TLS-NEXT: Section: Undefined (0x0) +# GOT-TLS-NEXT: Name: __tls_get_addr at GLIBC_2.3 (150) +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C88 +# GOT-TLS-NEXT: Access: -32616 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: Value: 0x0 +# GOT-TLS-NEXT: Type: None (0x0) +# GOT-TLS-NEXT: Section: Undefined (0x0) +# GOT-TLS-NEXT: Name: _ITM_deregisterTMCloneTable (50) +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: Entry { +# GOT-TLS-NEXT: Address: 0x10C90 +# GOT-TLS-NEXT: Access: -32608 +# GOT-TLS-NEXT: Initial: 0x0 +# GOT-TLS-NEXT: Value: 0x0 +# GOT-TLS-NEXT: Type: Function (0x2) +# GOT-TLS-NEXT: Section: Undefined (0x0) +# GOT-TLS-NEXT: Name: 
__cxa_finalize at GLIBC_2.2 (104) +# GOT-TLS-NEXT: } +# GOT-TLS-NEXT: ] +# GOT-TLS-NEXT: Number of TLS and multi-GOT entries: 4 +# GOT-TLS-NEXT: } -GOT-EMPTY: Primary GOT { -GOT-EMPTY-NEXT: Canonical gp value: 0x409FF0 -GOT-EMPTY-NEXT: Reserved entries [ -GOT-EMPTY-NEXT: Entry { -GOT-EMPTY-NEXT: Address: 0x402000 -GOT-EMPTY-NEXT: Access: -32752 -GOT-EMPTY-NEXT: Initial: 0x0 -GOT-EMPTY-NEXT: Purpose: Lazy resolver -GOT-EMPTY-NEXT: } -GOT-EMPTY-NEXT: Entry { -GOT-EMPTY-NEXT: Address: 0x402004 -GOT-EMPTY-NEXT: Access: -32748 -GOT-EMPTY-NEXT: Initial: 0x80000000 -GOT-EMPTY-NEXT: Purpose: Module pointer (GNU extension) -GOT-EMPTY-NEXT: } -GOT-EMPTY-NEXT: ] -GOT-EMPTY-NEXT: Local entries [ -GOT-EMPTY-NEXT: ] -GOT-EMPTY-NEXT: Global entries [ -GOT-EMPTY-NEXT: ] -GOT-EMPTY-NEXT: Number of TLS and multi-GOT entries: 2 -GOT-EMPTY-NEXT: } +# GOT-EMPTY: Primary GOT { +# GOT-EMPTY-NEXT: Canonical gp value: 0x409FF0 +# GOT-EMPTY-NEXT: Reserved entries [ +# GOT-EMPTY-NEXT: Entry { +# GOT-EMPTY-NEXT: Address: 0x402000 +# GOT-EMPTY-NEXT: Access: -32752 +# GOT-EMPTY-NEXT: Initial: 0x0 +# GOT-EMPTY-NEXT: Purpose: Lazy resolver +# GOT-EMPTY-NEXT: } +# GOT-EMPTY-NEXT: Entry { +# GOT-EMPTY-NEXT: Address: 0x402004 +# GOT-EMPTY-NEXT: Access: -32748 +# GOT-EMPTY-NEXT: Initial: 0x80000000 +# GOT-EMPTY-NEXT: Purpose: Module pointer (GNU extension) +# GOT-EMPTY-NEXT: } +# GOT-EMPTY-NEXT: ] +# GOT-EMPTY-NEXT: Local entries [ +# GOT-EMPTY-NEXT: ] +# GOT-EMPTY-NEXT: Global entries [ +# GOT-EMPTY-NEXT: ] +# GOT-EMPTY-NEXT: Number of TLS and multi-GOT entries: 2 +# GOT-EMPTY-NEXT: } -GOT-STATIC: Static GOT { -GOT-STATIC-NEXT: Canonical gp value: 0x418100 -GOT-STATIC-NEXT: Reserved entries [ -GOT-STATIC-NEXT: Entry { -GOT-STATIC-NEXT: Address: 0x410110 -GOT-STATIC-NEXT: Access: -32752 -GOT-STATIC-NEXT: Initial: 0x0 -GOT-STATIC-NEXT: Purpose: Lazy resolver -GOT-STATIC-NEXT: } -GOT-STATIC-NEXT: Entry { -GOT-STATIC-NEXT: Address: 0x410114 -GOT-STATIC-NEXT: Access: -32748 -GOT-STATIC-NEXT: Initial: 
0x80000000 -GOT-STATIC-NEXT: Purpose: Module pointer (GNU extension) -GOT-STATIC-NEXT: } -GOT-STATIC-NEXT: ] -GOT-STATIC-NEXT: Local entries [ -GOT-STATIC-NEXT: Entry { -GOT-STATIC-NEXT: Address: 0x410118 -GOT-STATIC-NEXT: Access: -32744 -GOT-STATIC-NEXT: Initial: 0x400000 -GOT-STATIC-NEXT: } -GOT-STATIC-NEXT: Entry { -GOT-STATIC-NEXT: Address: 0x41011C -GOT-STATIC-NEXT: Access: -32740 -GOT-STATIC-NEXT: Initial: 0x400100 -GOT-STATIC-NEXT: } -GOT-STATIC-NEXT: Entry { -GOT-STATIC-NEXT: Address: 0x410120 -GOT-STATIC-NEXT: Access: -32736 -GOT-STATIC-NEXT: Initial: 0x400104 -GOT-STATIC-NEXT: } -GOT-STATIC-NEXT: ] -GOT-STATIC-NEXT: } +# GOT-STATIC: Static GOT { +# GOT-STATIC-NEXT: Canonical gp value: 0x418100 +# GOT-STATIC-NEXT: Reserved entries [ +# GOT-STATIC-NEXT: Entry { +# GOT-STATIC-NEXT: Address: 0x410110 +# GOT-STATIC-NEXT: Access: -32752 +# GOT-STATIC-NEXT: Initial: 0x0 +# GOT-STATIC-NEXT: Purpose: Lazy resolver +# GOT-STATIC-NEXT: } +# GOT-STATIC-NEXT: Entry { +# GOT-STATIC-NEXT: Address: 0x410114 +# GOT-STATIC-NEXT: Access: -32748 +# GOT-STATIC-NEXT: Initial: 0x80000000 +# GOT-STATIC-NEXT: Purpose: Module pointer (GNU extension) +# GOT-STATIC-NEXT: } +# GOT-STATIC-NEXT: ] +# GOT-STATIC-NEXT: Local entries [ +# GOT-STATIC-NEXT: Entry { +# GOT-STATIC-NEXT: Address: 0x410118 +# GOT-STATIC-NEXT: Access: -32744 +# GOT-STATIC-NEXT: Initial: 0x400000 +# GOT-STATIC-NEXT: } +# GOT-STATIC-NEXT: Entry { +# GOT-STATIC-NEXT: Address: 0x41011C +# GOT-STATIC-NEXT: Access: -32740 +# GOT-STATIC-NEXT: Initial: 0x400100 +# GOT-STATIC-NEXT: } +# GOT-STATIC-NEXT: Entry { +# GOT-STATIC-NEXT: Address: 0x410120 +# GOT-STATIC-NEXT: Access: -32736 +# GOT-STATIC-NEXT: Initial: 0x400104 +# GOT-STATIC-NEXT: } +# GOT-STATIC-NEXT: ] +# GOT-STATIC-NEXT: } -GNU-GOT-EXE: Primary GOT: -GNU-GOT-EXE-NEXT: Canonical gp value: 00418880 +# GNU-GOT-EXE: Primary GOT: +# GNU-GOT-EXE-NEXT: Canonical gp value: 00418880 -GNU-GOT-EXE: Reserved entries: -GNU-GOT-EXE-NEXT: Address Access Initial Purpose 
-GNU-GOT-EXE-NEXT: 00410890 -32752(gp) 00000000 Lazy resolver -GNU-GOT-EXE-NEXT: 00410894 -32748(gp) 80000000 Module pointer (GNU extension) +# GNU-GOT-EXE: Reserved entries: +# GNU-GOT-EXE-NEXT: Address Access Initial Purpose +# GNU-GOT-EXE-NEXT: 00410890 -32752(gp) 00000000 Lazy resolver +# GNU-GOT-EXE-NEXT: 00410894 -32748(gp) 80000000 Module pointer (GNU extension) -GNU-GOT-EXE: Local entries: -GNU-GOT-EXE-NEXT: Address Access Initial -GNU-GOT-EXE-NEXT: 00410898 -32744(gp) 00400418 -GNU-GOT-EXE-NEXT: 0041089c -32740(gp) 00410840 -GNU-GOT-EXE-NEXT: 004108a0 -32736(gp) 00000000 +# GNU-GOT-EXE: Local entries: +# GNU-GOT-EXE-NEXT: Address Access Initial +# GNU-GOT-EXE-NEXT: 00410898 -32744(gp) 00400418 +# GNU-GOT-EXE-NEXT: 0041089c -32740(gp) 00410840 +# GNU-GOT-EXE-NEXT: 004108a0 -32736(gp) 00000000 -GNU-GOT-EXE: Global entries: -GNU-GOT-EXE-NEXT: Address Access Initial Sym.Val. Type Ndx Name -GNU-GOT-EXE-NEXT: 004108a4 -32732(gp) 00000000 00000000 FUNC UND __gmon_start__ +# GNU-GOT-EXE: Global entries: +# GNU-GOT-EXE-NEXT: Address Access Initial Sym.Val. Type Ndx Name +# GNU-GOT-EXE-NEXT: 004108a4 -32732(gp) 00000000 00000000 FUNC UND __gmon_start__ -GNU-GOT-EXE: PLT GOT: +# GNU-GOT-EXE: PLT GOT: -GNU-GOT-EXE: Reserved entries: -GNU-GOT-EXE-NEXT: Address Initial Purpose -GNU-GOT-EXE-NEXT: 00410854 00000000 PLT lazy resolver -GNU-GOT-EXE-NEXT: 00410858 00000000 Module pointer +# GNU-GOT-EXE: Reserved entries: +# GNU-GOT-EXE-NEXT: Address Initial Purpose +# GNU-GOT-EXE-NEXT: 00410854 00000000 PLT lazy resolver +# GNU-GOT-EXE-NEXT: 00410858 00000000 Module pointer -GNU-GOT-EXE: Entries: -GNU-GOT-EXE-NEXT: Address Initial Sym.Val. Type Ndx Name -GNU-GOT-EXE-NEXT: 0041085c 00400800 00000000 FUNC UND puts -GNU-GOT-EXE-NEXT: 00410860 00400800 00000000 FUNC UND __libc_start_main +# GNU-GOT-EXE: Entries: +# GNU-GOT-EXE-NEXT: Address Initial Sym.Val. 
Type Ndx Name +# GNU-GOT-EXE-NEXT: 0041085c 00400800 00000000 FUNC UND puts +# GNU-GOT-EXE-NEXT: 00410860 00400800 00000000 FUNC UND __libc_start_main -GNU-GOT-SO: Primary GOT: -GNU-GOT-SO-NEXT: Canonical gp value: 000188d0 +# GNU-GOT-SO: Primary GOT: +# GNU-GOT-SO-NEXT: Canonical gp value: 000188d0 -GNU-GOT-SO: Reserved entries: -GNU-GOT-SO-NEXT: Address Access Initial Purpose -GNU-GOT-SO-NEXT: 000108e0 -32752(gp) 00000000 Lazy resolver -GNU-GOT-SO-NEXT: 000108e4 -32748(gp) 80000000 Module pointer (GNU extension) +# GNU-GOT-SO: Reserved entries: +# GNU-GOT-SO-NEXT: Address Access Initial Purpose +# GNU-GOT-SO-NEXT: 000108e0 -32752(gp) 00000000 Lazy resolver +# GNU-GOT-SO-NEXT: 000108e4 -32748(gp) 80000000 Module pointer (GNU extension) -GNU-GOT-SO: Local entries: -GNU-GOT-SO-NEXT: Address Access Initial -GNU-GOT-SO-NEXT: 000108e8 -32744(gp) 000108e0 -GNU-GOT-SO-NEXT: 000108ec -32740(gp) 00010000 -GNU-GOT-SO-NEXT: 000108f0 -32736(gp) 00010920 -GNU-GOT-SO-NEXT: 000108f4 -32732(gp) 000108cc -GNU-GOT-SO-NEXT: 000108f8 -32728(gp) 00000000 -GNU-GOT-SO-NEXT: 000108fc -32724(gp) 00000000 -GNU-GOT-SO-NEXT: 00010900 -32720(gp) 00000000 -GNU-GOT-SO-NEXT: 00010904 -32716(gp) 00000000 +# GNU-GOT-SO: Local entries: +# GNU-GOT-SO-NEXT: Address Access Initial +# GNU-GOT-SO-NEXT: 000108e8 -32744(gp) 000108e0 +# GNU-GOT-SO-NEXT: 000108ec -32740(gp) 00010000 +# GNU-GOT-SO-NEXT: 000108f0 -32736(gp) 00010920 +# GNU-GOT-SO-NEXT: 000108f4 -32732(gp) 000108cc +# GNU-GOT-SO-NEXT: 000108f8 -32728(gp) 00000000 +# GNU-GOT-SO-NEXT: 000108fc -32724(gp) 00000000 +# GNU-GOT-SO-NEXT: 00010900 -32720(gp) 00000000 +# GNU-GOT-SO-NEXT: 00010904 -32716(gp) 00000000 -GNU-GOT-SO: Global entries: -GNU-GOT-SO-NEXT: Address Access Initial Sym.Val. 
Type Ndx Name -GNU-GOT-SO-NEXT: 00010908 -32712(gp) 00000000 00000000 NOTYPE UND _ITM_registerTMCloneTable -GNU-GOT-SO-NEXT: 0001090c -32708(gp) 00000000 00000000 NOTYPE UND _Jv_RegisterClasses -GNU-GOT-SO-NEXT: 00010910 -32704(gp) 00000000 00000000 FUNC UND __gmon_start__ -GNU-GOT-SO-NEXT: 00010914 -32700(gp) 00000840 00000840 FUNC UND puts -GNU-GOT-SO-NEXT: 00010918 -32696(gp) 00000000 00000000 NOTYPE UND _ITM_deregisterTMCloneTable -GNU-GOT-SO-NEXT: 0001091c -32692(gp) 00000000 00000000 FUNC UND __cxa_finalize +# GNU-GOT-SO: Global entries: +# GNU-GOT-SO-NEXT: Address Access Initial Sym.Val. Type Ndx Name +# GNU-GOT-SO-NEXT: 00010908 -32712(gp) 00000000 00000000 NOTYPE UND _ITM_registerTMCloneTable +# GNU-GOT-SO-NEXT: 0001090c -32708(gp) 00000000 00000000 NOTYPE UND _Jv_RegisterClasses +# GNU-GOT-SO-NEXT: 00010910 -32704(gp) 00000000 00000000 FUNC UND __gmon_start__ +# GNU-GOT-SO-NEXT: 00010914 -32700(gp) 00000840 00000840 FUNC UND puts +# GNU-GOT-SO-NEXT: 00010918 -32696(gp) 00000000 00000000 NOTYPE UND _ITM_deregisterTMCloneTable +# GNU-GOT-SO-NEXT: 0001091c -32692(gp) 00000000 00000000 FUNC UND __cxa_finalize -GNU-GOT-TLS: Primary GOT: -GNU-GOT-TLS-NEXT: Canonical gp value: 0000000000018bf0 +# GNU-GOT-TLS: Primary GOT: +# GNU-GOT-TLS-NEXT: Canonical gp value: 0000000000018bf0 -GNU-GOT-TLS: Reserved entries: -GNU-GOT-TLS-NEXT: Address Access Initial Purpose -GNU-GOT-TLS-NEXT: 0000000000010c00 -32752(gp) 0000000000000000 Lazy resolver -GNU-GOT-TLS-NEXT: 0000000000010c08 -32744(gp) 8000000000000000 Module pointer (GNU extension) +# GNU-GOT-TLS: Reserved entries: +# GNU-GOT-TLS-NEXT: Address Access Initial Purpose +# GNU-GOT-TLS-NEXT: 0000000000010c00 -32752(gp) 0000000000000000 Lazy resolver +# GNU-GOT-TLS-NEXT: 0000000000010c08 -32744(gp) 8000000000000000 Module pointer (GNU extension) -GNU-GOT-TLS: Local entries: -GNU-GOT-TLS-NEXT: Address Access Initial -GNU-GOT-TLS-NEXT: 0000000000010c10 -32736(gp) 0000000000010000 -GNU-GOT-TLS-NEXT: 0000000000010c18 
-32728(gp) 0000000000010c00 -GNU-GOT-TLS-NEXT: 0000000000010c20 -32720(gp) 0000000000010cb8 -GNU-GOT-TLS-NEXT: 0000000000010c28 -32712(gp) 0000000000010bf0 -GNU-GOT-TLS-NEXT: 0000000000010c30 -32704(gp) 0000000000000000 -GNU-GOT-TLS-NEXT: 0000000000010c38 -32696(gp) 0000000000000948 -GNU-GOT-TLS-NEXT: 0000000000010c40 -32688(gp) 0000000000000a20 -GNU-GOT-TLS-NEXT: 0000000000010c48 -32680(gp) 0000000000000af0 -GNU-GOT-TLS-NEXT: 0000000000010c50 -32672(gp) 0000000000000000 -GNU-GOT-TLS-NEXT: 0000000000010c58 -32664(gp) 0000000000000000 -GNU-GOT-TLS-NEXT: 0000000000010c60 -32656(gp) 0000000000000000 +# GNU-GOT-TLS: Local entries: +# GNU-GOT-TLS-NEXT: Address Access Initial +# GNU-GOT-TLS-NEXT: 0000000000010c10 -32736(gp) 0000000000010000 +# GNU-GOT-TLS-NEXT: 0000000000010c18 -32728(gp) 0000000000010c00 +# GNU-GOT-TLS-NEXT: 0000000000010c20 -32720(gp) 0000000000010cb8 +# GNU-GOT-TLS-NEXT: 0000000000010c28 -32712(gp) 0000000000010bf0 +# GNU-GOT-TLS-NEXT: 0000000000010c30 -32704(gp) 0000000000000000 +# GNU-GOT-TLS-NEXT: 0000000000010c38 -32696(gp) 0000000000000948 +# GNU-GOT-TLS-NEXT: 0000000000010c40 -32688(gp) 0000000000000a20 +# GNU-GOT-TLS-NEXT: 0000000000010c48 -32680(gp) 0000000000000af0 +# GNU-GOT-TLS-NEXT: 0000000000010c50 -32672(gp) 0000000000000000 +# GNU-GOT-TLS-NEXT: 0000000000010c58 -32664(gp) 0000000000000000 +# GNU-GOT-TLS-NEXT: 0000000000010c60 -32656(gp) 0000000000000000 -GNU-GOT-TLS: Global entries: -GNU-GOT-TLS-NEXT: Address Access Initial Sym.Val. 
Type Ndx Name -GNU-GOT-TLS-NEXT: 0000000000010c68 -32648(gp) 0000000000000000 0000000000000000 NOTYPE UND _ITM_registerTMCloneTable -GNU-GOT-TLS-NEXT: 0000000000010c70 -32640(gp) 0000000000000000 0000000000000000 NOTYPE UND _Jv_RegisterClasses -GNU-GOT-TLS-NEXT: 0000000000010c78 -32632(gp) 0000000000000000 0000000000000000 FUNC UND __gmon_start__ -GNU-GOT-TLS-NEXT: 0000000000010c80 -32624(gp) 0000000000000b60 0000000000000b60 FUNC UND __tls_get_addr -GNU-GOT-TLS-NEXT: 0000000000010c88 -32616(gp) 0000000000000000 0000000000000000 NOTYPE UND _ITM_deregisterTMCloneTable -GNU-GOT-TLS-NEXT: 0000000000010c90 -32608(gp) 0000000000000000 0000000000000000 FUNC UND __cxa_finalize +# GNU-GOT-TLS: Global entries: +# GNU-GOT-TLS-NEXT: Address Access Initial Sym.Val. Type Ndx Name +# GNU-GOT-TLS-NEXT: 0000000000010c68 -32648(gp) 0000000000000000 0000000000000000 NOTYPE UND _ITM_registerTMCloneTable +# GNU-GOT-TLS-NEXT: 0000000000010c70 -32640(gp) 0000000000000000 0000000000000000 NOTYPE UND _Jv_RegisterClasses +# GNU-GOT-TLS-NEXT: 0000000000010c78 -32632(gp) 0000000000000000 0000000000000000 FUNC UND __gmon_start__ +# GNU-GOT-TLS-NEXT: 0000000000010c80 -32624(gp) 0000000000000b60 0000000000000b60 FUNC UND __tls_get_addr +# GNU-GOT-TLS-NEXT: 0000000000010c88 -32616(gp) 0000000000000000 0000000000000000 NOTYPE UND _ITM_deregisterTMCloneTable +# GNU-GOT-TLS-NEXT: 0000000000010c90 -32608(gp) 0000000000000000 0000000000000000 FUNC UND __cxa_finalize -GNU-GOTY : Primary GOT: -GNU-GOT-EMPTY: Canonical gp value: 00409ff0 +# GNU-GOTY : Primary GOT: +# GNU-GOT-EMPTY: Canonical gp value: 00409ff0 -GNU-GOTY : Reserved entries: -GNU-GOT-EMPTY: Address Access Initial Purpose -GNU-GOT-EMPTY: 00402000 -32752(gp) 00000000 Lazy resolver -GNU-GOT-EMPTY: 00402004 -32748(gp) 80000000 Module pointer (GNU extension) +# GNU-GOTY : Reserved entries: +# GNU-GOT-EMPTY: Address Access Initial Purpose +# GNU-GOT-EMPTY: 00402000 -32752(gp) 00000000 Lazy resolver +# GNU-GOT-EMPTY: 00402004 -32748(gp) 80000000 
Module pointer (GNU extension) -GNU-GOT-STATIC: Static GOT: -GNU-GOT-STATIC-NEXT: Canonical gp value: 00418100 +# GNU-GOT-STATIC: Static GOT: +# GNU-GOT-STATIC-NEXT: Canonical gp value: 00418100 -GNU-GOT-STATIC: Reserved entries: -GNU-GOT-STATIC-NEXT: Address Access Initial Purpose -GNU-GOT-STATIC-NEXT: 00410110 -32752(gp) 00000000 Lazy resolver -GNU-GOT-STATIC-NEXT: 00410114 -32748(gp) 80000000 Module pointer (GNU extension) +# GNU-GOT-STATIC: Reserved entries: +# GNU-GOT-STATIC-NEXT: Address Access Initial Purpose +# GNU-GOT-STATIC-NEXT: 00410110 -32752(gp) 00000000 Lazy resolver +# GNU-GOT-STATIC-NEXT: 00410114 -32748(gp) 80000000 Module pointer (GNU extension) -GNU-GOT-STATIC: Local entries: -GNU-GOT-STATIC-NEXT: Address Access Initial -GNU-GOT-STATIC-NEXT: 00410118 -32744(gp) 00400000 -GNU-GOT-STATIC-NEXT: 0041011c -32740(gp) 00400100 -GNU-GOT-STATIC-NEXT: 00410120 -32736(gp) 00400104 +# GNU-GOT-STATIC: Local entries: +# GNU-GOT-STATIC-NEXT: Address Access Initial +# GNU-GOT-STATIC-NEXT: 00410118 -32744(gp) 00400000 +# GNU-GOT-STATIC-NEXT: 0041011c -32740(gp) 00400100 +# GNU-GOT-STATIC-NEXT: 00410120 -32736(gp) 00400104 diff --git a/llvm/test/tools/llvm-readobj/ELF/mips-plt.test b/llvm/test/tools/llvm-readobj/ELF/mips-plt.test index 4e40ca6aa2c1..b79237ce6c36 100644 --- a/llvm/test/tools/llvm-readobj/ELF/mips-plt.test +++ b/llvm/test/tools/llvm-readobj/ELF/mips-plt.test @@ -1,64 +1,64 @@ -RUN: llvm-readobj -A %p/Inputs/got-plt.exe.elf-mipsel | FileCheck %s -RUN: llvm-readelf -A %p/Inputs/got-plt.exe.elf-mipsel | FileCheck --check-prefix=GNU %s +# RUN: llvm-readobj -A %p/Inputs/got-plt.exe.elf-mipsel | FileCheck %s +# RUN: llvm-readelf -A %p/Inputs/got-plt.exe.elf-mipsel | FileCheck --check-prefix=GNU %s -CHECK: PLT GOT { -CHECK-NEXT: Reserved entries [ -CHECK-NEXT: Entry { -CHECK-NEXT: Address: 0x410814 -CHECK-NEXT: Initial: 0x0 -CHECK-NEXT: Purpose: PLT lazy resolver -CHECK-NEXT: } -CHECK-NEXT: Entry { -CHECK-NEXT: Address: 0x410818 -CHECK-NEXT: Initial: 0x0 
-CHECK-NEXT: Purpose: Module pointer -CHECK-NEXT: } -CHECK-NEXT: ] -CHECK-NEXT: Entries [ -CHECK-NEXT: Entry { -CHECK-NEXT: Address: 0x41081C -CHECK-NEXT: Initial: 0x4007C0 -CHECK-NEXT: Value: 0x0 -CHECK-NEXT: Type: Function (0x2) -CHECK-NEXT: Section: Undefined (0x0) -CHECK-NEXT: Name: puts at GLIBC_2.0 (71) -CHECK-NEXT: } -CHECK-NEXT: Entry { -CHECK-NEXT: Address: 0x410820 -CHECK-NEXT: Initial: 0x4007C0 -CHECK-NEXT: Value: 0x0 -CHECK-NEXT: Type: Function (0x2) -CHECK-NEXT: Section: Undefined (0x0) -CHECK-NEXT: Name: __libc_start_main at GLIBC_2.0 (53) -CHECK-NEXT: } -CHECK-NEXT: ] -CHECK-NEXT: } +# CHECK: PLT GOT { +# CHECK-NEXT: Reserved entries [ +# CHECK-NEXT: Entry { +# CHECK-NEXT: Address: 0x410814 +# CHECK-NEXT: Initial: 0x0 +# CHECK-NEXT: Purpose: PLT lazy resolver +# CHECK-NEXT: } +# CHECK-NEXT: Entry { +# CHECK-NEXT: Address: 0x410818 +# CHECK-NEXT: Initial: 0x0 +# CHECK-NEXT: Purpose: Module pointer +# CHECK-NEXT: } +# CHECK-NEXT: ] +# CHECK-NEXT: Entries [ +# CHECK-NEXT: Entry { +# CHECK-NEXT: Address: 0x41081C +# CHECK-NEXT: Initial: 0x4007C0 +# CHECK-NEXT: Value: 0x0 +# CHECK-NEXT: Type: Function (0x2) +# CHECK-NEXT: Section: Undefined (0x0) +# CHECK-NEXT: Name: puts at GLIBC_2.0 (71) +# CHECK-NEXT: } +# CHECK-NEXT: Entry { +# CHECK-NEXT: Address: 0x410820 +# CHECK-NEXT: Initial: 0x4007C0 +# CHECK-NEXT: Value: 0x0 +# CHECK-NEXT: Type: Function (0x2) +# CHECK-NEXT: Section: Undefined (0x0) +# CHECK-NEXT: Name: __libc_start_main at GLIBC_2.0 (53) +# CHECK-NEXT: } +# CHECK-NEXT: ] +# CHECK-NEXT: } -GNU: Primary GOT: -GNU-NEXT: Canonical gp value: 00418840 +# GNU: Primary GOT: +# GNU-NEXT: Canonical gp value: 00418840 -GNU: Reserved entries: -GNU-NEXT: Address Access Initial Purpose -GNU-NEXT: 00410850 -32752(gp) 00000000 Lazy resolver -GNU-NEXT: 00410854 -32748(gp) 80000000 Module pointer (GNU extension) +# GNU: Reserved entries: +# GNU-NEXT: Address Access Initial Purpose +# GNU-NEXT: 00410850 -32752(gp) 00000000 Lazy resolver +# GNU-NEXT: 00410854 
-32748(gp) 80000000 Module pointer (GNU extension) -GNU: Local entries: -GNU-NEXT: Address Access Initial -GNU-NEXT: 00410858 -32744(gp) 004003d4 -GNU-NEXT: 0041085c -32740(gp) 00410800 -GNU-NEXT: 00410860 -32736(gp) 00000000 +# GNU: Local entries: +# GNU-NEXT: Address Access Initial +# GNU-NEXT: 00410858 -32744(gp) 004003d4 +# GNU-NEXT: 0041085c -32740(gp) 00410800 +# GNU-NEXT: 00410860 -32736(gp) 00000000 -GNU: Global entries: -GNU-NEXT: Address Access Initial Sym.Val. Type Ndx Name -GNU-NEXT: 00410864 -32732(gp) 00000000 00000000 FUNC UND __gmon_start__ -GNU-NEXT: PLT GOT: +# GNU: Global entries: +# GNU-NEXT: Address Access Initial Sym.Val. Type Ndx Name +# GNU-NEXT: 00410864 -32732(gp) 00000000 00000000 FUNC UND __gmon_start__ +# GNU-NEXT: PLT GOT: -GNU: Reserved entries: -GNU-NEXT: Address Initial Purpose -GNU-NEXT: 00410814 00000000 PLT lazy resolver -GNU-NEXT: 00410818 00000000 Module pointer +# GNU: Reserved entries: +# GNU-NEXT: Address Initial Purpose +# GNU-NEXT: 00410814 00000000 PLT lazy resolver +# GNU-NEXT: 00410818 00000000 Module pointer -GNU: Entries: -GNU-NEXT: Address Initial Sym.Val. Type Ndx Name -GNU-NEXT: 0041081c 004007c0 00000000 FUNC UND puts -GNU-NEXT: 00410820 004007c0 00000000 FUNC UND __libc_start_main +# GNU: Entries: +# GNU-NEXT: Address Initial Sym.Val. Type Ndx Name +# GNU-NEXT: 0041081c 004007c0 00000000 FUNC UND puts +# GNU-NEXT: 00410820 004007c0 00000000 FUNC UND __libc_start_main From llvm-commits at lists.llvm.org Tue Jul 7 04:46:41 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 11:46:41 +0000 (UTC) Subject: [PATCH] D83131: [llvm-readobj] - Refine the error reporting in LLVMStyle::printELFLinkerOptions. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG0d656cb25dc7: [llvm-readobj] - Refine the error reporting in LLVMStyle<ELFT>… (authored by grimar). 
Changed prior to commit: https://reviews.llvm.org/D83131?vs=275391&id=275657#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83131/new/ https://reviews.llvm.org/D83131 Files: llvm/test/tools/llvm-readobj/ELF/linker-options.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6794,30 +6794,33 @@ if (Shdr.sh_type != ELF::SHT_LLVM_LINKER_OPTIONS) continue; - ArrayRef Contents = - unwrapOrError(this->FileName, Obj->getSectionContents(&Shdr)); - if (Contents.empty()) + Expected> ContentsOrErr = Obj->getSectionContents(&Shdr); + if (!ContentsOrErr) { + this->reportUniqueWarning( + createError("unable to read the content of the " + "SHT_LLVM_LINKER_OPTIONS section: " + + toString(ContentsOrErr.takeError()))); + continue; + } + if (ContentsOrErr->empty()) continue; - if (Contents.back() != 0) { - reportWarning(createError("SHT_LLVM_LINKER_OPTIONS section at index " + - Twine(I) + - " is broken: the " - "content is not null-terminated"), - this->FileName); + if (ContentsOrErr->back() != 0) { + this->reportUniqueWarning( + createError("SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: the " + "content is not null-terminated")); continue; } SmallVector Strings; - toStringRef(Contents.drop_back()).split(Strings, '\0'); + toStringRef(ContentsOrErr->drop_back()).split(Strings, '\0'); if (Strings.size() % 2 != 0) { - reportWarning( - createError( - "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + - " is broken: an incomplete " - "key-value pair was found. The last possible key was: \"" + - Strings.back() + "\""), - this->FileName); + this->reportUniqueWarning(createError( + "SHT_LLVM_LINKER_OPTIONS section at index " + Twine(I) + + " is broken: an incomplete " + "key-value pair was found. 
The last possible key was: \"" + + Strings.back() + "\"")); continue; } Index: llvm/test/tools/llvm-readobj/ELF/linker-options.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/linker-options.test +++ llvm/test/tools/llvm-readobj/ELF/linker-options.test @@ -2,13 +2,14 @@ ## to dump SHT_LLVM_LINKER_OPTIONS sections. # RUN: yaml2obj --docnum=1 %s -o %t1 -# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s --check-prefix=CHECK -DFILE=%t1 +# RUN: llvm-readobj --elf-linker-options %t1 2>&1 | FileCheck %s -DFILE=%t1 # CHECK: LinkerOptions [ # CHECK: option 0: value 0 # CHECK: option 1: value 1 # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 2 is broken: an incomplete key-value pair was found. The last possible key was: "c" # CHECK-NEXT: warning: '[[FILE]]': SHT_LLVM_LINKER_OPTIONS section at index 4 is broken: the content is not null-terminated +# CHECK-NEXT: warning: '[[FILE]]': unable to read the content of the SHT_LLVM_LINKER_OPTIONS section: section [index 5] has a sh_offset (0xffffffff) + sh_size (0x8) that is greater than the file size (0x370) # CHECK-NEXT: option 3: value 3 # CHECK-NEXT: ] @@ -44,7 +45,15 @@ - Name: .linker-options.nonul Type: SHT_LLVM_LINKER_OPTIONS Content: "61" -## Case 5: another correct case to show we do not stop dumping after reporting a warning. +## Case 5: check we report a warning when it is not possible to read +## the content of the SHT_LLVM_LINKER_OPTIONS section. + - Name: .linker-options.broken.content + Type: SHT_LLVM_LINKER_OPTIONS + ShOffset: 0xffffffff + Options: + - Name: foo + Value: bar +## Case 6: another correct case to show we do not stop dumping after reporting a warning. - Name: .linker-options.valid2 Type: SHT_LLVM_LINKER_OPTIONS Options: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83131.275657.patch
Type: text/x-patch
Size: 4074 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 04:51:12 2020
From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 11:51:12 +0000 (UTC)
Subject: [PATCH] D83288: [LV] Pick vector loop body as insert point for SCEV expansion.
In-Reply-To:
References:
Message-ID: <1df513aac41230e547e43d2e5c2a268a@localhost.localdomain>

dmgreen added a comment.

Is the DT reliable enough to use for checking the block is in the loop? I see we might have to exclude the preheader and midblock. But if it's not up to date, and only knows about the split block, it might think the midblock is dominated by the vector body.

Maybe if LI->getLoopFor(LoopVectorBody) == LI->getLoopFor(InsertBB) works, that may be better? It looks like LI is kept up to date as blocks get split.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2881
+  // Get a suitable insert point for SCEV expansion. For blocks in the vector
+  // loop, chose the end of the vector loop header (=LoopVectorBody), because
+  // the DomTree is not kept up-to-date for additional blocks generated in the
----------------
*choose

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83288/new/

https://reviews.llvm.org/D83288

From llvm-commits at lists.llvm.org Tue Jul 7 04:54:57 2020
From: llvm-commits at lists.llvm.org (Nathan James via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 11:54:57 +0000 (UTC)
Subject: [PATCH] D82159: Add a cmake warning when someone tries to configure clang-tools-extra without clang
In-Reply-To:
References:
Message-ID:

njames93 updated this revision to Diff 275999.
njames93 added a comment.

Change message from WARNING to STATUS in line with flang.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82159/new/

https://reviews.llvm.org/D82159

Files:
  llvm/CMakeLists.txt

Index: llvm/CMakeLists.txt
===================================================================
--- llvm/CMakeLists.txt
+++ llvm/CMakeLists.txt
@@ -81,6 +81,10 @@
 if( LLVM_ENABLE_PROJECTS STREQUAL "all" )
   set( LLVM_ENABLE_PROJECTS ${LLVM_ALL_PROJECTS})
 endif()
+if ("clang-tools-extra" IN_LIST LLVM_ENABLE_PROJECTS AND NOT "clang" IN_LIST LLVM_ENABLE_PROJECTS)
+  message(STATUS "clang-tools-extra is enabled, which depends on 'clang'. Automatically enabling 'clang'.")
+  list(APPEND LLVM_ENABLE_PROJECTS "clang")
+endif()
 if ("flang" IN_LIST LLVM_ENABLE_PROJECTS AND NOT "mlir" IN_LIST LLVM_ENABLE_PROJECTS)
   message(STATUS "Enabling MLIR as a dependency to flang")
   list(APPEND LLVM_ENABLE_PROJECTS "mlir")

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82159.275999.patch
Type: text/x-patch
Size: 719 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 04:55:08 2020
From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 11:55:08 +0000 (UTC)
Subject: [PATCH] D83099: [clangd] Store index in '.cache/clangd/index' instead of '.clangd/index'
In-Reply-To:
References:
Message-ID: <4468e98012d5118433e1e67009436cd3@localhost.localdomain>

sammccall marked 5 inline comments as done.
sammccall added inline comments.

================
Comment at: .gitignore:57
+# clangd index. (".clangd" is a config file now, thus trailing slash)
+.clangd/
+.cache
----------------
kadircet wrote:
> why do we still need this ? i thought index (and other caches) would reside in `.cache` ?
Otherwise we're going to end up with indexes from old versions of clangd checked in :-(

================
Comment at: .gitignore:58
+.clangd/
+.cache
 # static analyzer regression testing project files
----------------
hokein wrote:
> I'm afraid that many projects have to update their `.gitignore`, but this is a tradeoff...
Yeah. This is a consequence of naming the config file `.clangd`, which I think is pretty desirable.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83099/new/

https://reviews.llvm.org/D83099

From llvm-commits at lists.llvm.org Tue Jul 7 04:58:15 2020
From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 11:58:15 +0000 (UTC)
Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass
In-Reply-To:
References:
Message-ID: <3c9b626bf8b7cec17ec47479d4a18650@localhost.localdomain>

jmorse updated this revision to Diff 276001.
jmorse added a comment.

clang-format, add file headers, address some review comments

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83046/new/

https://reviews.llvm.org/D83046

Files:
  llvm/lib/CodeGen/CMakeLists.txt
  llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp
  llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h
  llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83046.276001.patch
Type: text/x-patch
Size: 23074 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 04:59:25 2020
From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 11:59:25 +0000 (UTC)
Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass
In-Reply-To:
References:
Message-ID:

jmorse marked 9 inline comments as done.
jmorse added a comment.

Thanks for the feedback,

In D83046#2135151, @djtodoro wrote:

> Instead of `git mv llvm/lib/CodeGen/LiveDebugValues.cpp llvm/lib/CodeGen/VarLocBasedImpl.cpp`
> it should be as following:
> `git mv llvm/lib/CodeGen/LiveDebugValues.cpp llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp`

Ah yep, missed that,

> I'd prefer adding all the patches in the stack, since there might be developers using some scripts for applying the patches automatically from Phabricator.

Sure, I'll upload that shortly.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83046/new/

https://reviews.llvm.org/D83046

From llvm-commits at lists.llvm.org Tue Jul 7 05:00:03 2020
From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 12:00:03 +0000 (UTC)
Subject: [PATCH] D83297: [Attributor][WIP] Attribute scheduling visualization.
Message-ID:

kuter created this revision.
Herald added subscribers: llvm-commits, okura, bbn, uenoku, hiraditya.
Herald added a reviewer: jdoerfert.
Herald added a reviewer: sstefan1.
Herald added a reviewer: uenoku.
Herald added a reviewer: homerdin.
Herald added a reviewer: baziotis.
Herald added a project: LLVM.

This patch is meant to be used for creating videos of the attributor scheduling.

Usage:

1. Create dot files (--attribute-seed-allow-list recommended)
2. Render the dot files.

  for f in *.dot
  do
    dot -Tpng "$f" -o "$f.png"
  done

3. Concatenate the images with ffmpeg.

  ffmpeg -r 1 -pattern_type glob -i '*.png' -codec:v libx264 out.mp4

(TODO: Find options that don't break the aspect ratio)

https://reviews.llvm.org/D83297

Files:
  llvm/include/llvm/Transforms/IPO/Attributor.h
  llvm/lib/Transforms/IPO/Attributor.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83297.276000.patch
Type: text/x-patch
Size: 6694 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 05:04:02 2020
From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 12:04:02 +0000 (UTC)
Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments
In-Reply-To:
References:
Message-ID: <794d9cddd1b69c0c4486551ad370c987@localhost.localdomain>

jdoerfert added a comment.

In D83268#2135081, @Hahnfeld wrote:

> This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

This is the device RTL. I am not aware we (want to) keep the API stable. If we are, I'm not sure why:

- Dynamic linking (among other things) is not really an option so people that link against the device runtime (should) do so statically.
- Linking against an old device runtime with a new clang seems unreasonable to me. If you replace clang you must replace the static runtime as the new clang might use new functions.

In D83268#2135655, @ABataev wrote:

> In D83268#2135081, @Hahnfeld wrote:
>
> > This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).
>
>
> +1. Better to introduce new entry points and mark these ones as deprecated.

Same response as above. What is the use case here which we want to continue to support?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83268/new/

https://reviews.llvm.org/D83268

From llvm-commits at lists.llvm.org Tue Jul 7 05:05:20 2020
From: llvm-commits at lists.llvm.org (Ayal Zaks via llvm-commits)
Date: Tue, 07 Jul 2020 05:05:20 -0700 (PDT)
Subject: [llvm] 7bf299c - [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz
Message-ID: <5f046500.1c69fb81.7a422.79e1@mx.google.com>

Author: Ayal Zaks
Date: 2020-07-07T15:04:21+03:00
New Revision: 7bf299c8d8d59304fb821f8811618cdeb1d1f1fd

URL: https://github.com/llvm/llvm-project/commit/7bf299c8d8d59304fb821f8811618cdeb1d1f1fd
DIFF: https://github.com/llvm/llvm-project/commit/7bf299c8d8d59304fb821f8811618cdeb1d1f1fd.diff

LOG: [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz

If a loop is in a function marked OptSize, Loop Access Analysis should refrain
from generating runtime checks for unit strides that will version the loop.

If a loop is in a function marked OptSize and its vectorization is enabled, it
should be vectorized w/o any versioning.

Fixes PR46228.
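
[Editorial illustration, not part of the commit] The log above talks about "versioning for unit stride". As a rough, hand-written picture of that idea (assuming the loop shape a[i] = b[i * k] from the scev4stride1 test in this commit), the sketch below contrasts a plain strided loop with a manually versioned one. The function names copyStrided and copyStridedVersioned are hypothetical, and this is not what LLVM actually emits; it only shows why versioning keeps two copies of the loop, which is presumably the cost the change avoids in functions marked OptSize.

```cpp
#include <cstddef>

// Baseline loop with a symbolic stride k: a[i] = b[i * k].
// (Hypothetical helper; the shape mirrors the scev4stride1 test.)
void copyStrided(int *a, const int *b, std::size_t n, std::size_t k) {
  for (std::size_t i = 0; i < n; ++i)
    a[i] = b[i * k];
}

// Hand-written picture of versioning for unit stride: a runtime check
// selects between a loop specialized for k == 1 and the general loop.
// Both loop bodies stay in the function, so the code grows.
void copyStridedVersioned(int *a, const int *b, std::size_t n,
                          std::size_t k) {
  if (k == 1) {
    // Specialized version: accesses to b are contiguous.
    for (std::size_t i = 0; i < n; ++i)
      a[i] = b[i];
  } else {
    // Fallback version: general stride.
    for (std::size_t i = 0; i < n; ++i)
      a[i] = b[i * k];
  }
}
```

Both functions compute the same result for every k; the versioned form only trades code size for a simpler fast path when k happens to be 1.
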
Differential Revision: https://reviews.llvm.org/D81345 Added: Modified: llvm/lib/Analysis/LoopAccessAnalysis.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/X86/optsize.ll llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll llvm/test/Transforms/LoopVectorize/optsize.ll llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll llvm/test/Transforms/LoopVectorize/runtime-check.ll Removed: ################################################################################ diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index 3c75f0fcebfa..ae282a7a1095 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -1835,6 +1835,10 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI, const bool IsAnnotatedParallel = TheLoop->isAnnotatedParallel(); + const bool EnableMemAccessVersioningOfLoop = + EnableMemAccessVersioning && + !TheLoop->getHeader()->getParent()->hasOptSize(); + // For each block. for (BasicBlock *BB : TheLoop->blocks()) { // Scan the BB and collect legal loads and stores. Also detect any @@ -1890,7 +1894,7 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI, NumLoads++; Loads.push_back(Ld); DepChecker->addAccess(Ld); - if (EnableMemAccessVersioning) + if (EnableMemAccessVersioningOfLoop) collectStridedAccess(Ld); continue; } @@ -1914,7 +1918,7 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI, NumStores++; Stores.push_back(St); DepChecker->addAccess(St); - if (EnableMemAccessVersioning) + if (EnableMemAccessVersioningOfLoop) collectStridedAccess(St); } } // Next instr. 
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 26f2aa0073e1..4998082f3868 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -4937,15 +4937,8 @@ bool LoopVectorizationCostModel::runtimeChecksRequired() { return true; } - // FIXME: Avoid specializing for stride==1 instead of bailing out. - if (!Legal->getLAI()->getSymbolicStrides().empty()) { - reportVectorizationFailure("Runtime stride check is required with -Os/-Oz", - "runtime stride == 1 checks needed. Enable vectorization of " - "this loop with '#pragma clang loop vectorize(enable)' when " - "compiling with -Os/-Oz", - "CantVersionLoopWithOptForSize", ORE, TheLoop); - return true; - } + assert(Legal->getLAI()->getSymbolicStrides().empty() && + "Specializing for stride == 1 under -Os/-Oz"); return false; } @@ -7611,7 +7604,7 @@ static ScalarEpilogueLowering getScalarEpilogueLowering( PGSOQueryType::IRPass); // 1) OptSize takes precedence over all other options, i.e. if this is set, // don't look at hints or options, and don't request a scalar epilogue. - if (OptSize && Hints.getForce() != LoopVectorizeHints::FK_Enabled) + if (OptSize) return CM_ScalarEpilogueNotAllowedOptSize; bool PredicateOptDisabled = PreferPredicateOverEpilog.getNumOccurrences() && diff --git a/llvm/test/Transforms/LoopVectorize/X86/optsize.ll b/llvm/test/Transforms/LoopVectorize/X86/optsize.ll index ad72f90d0852..0bbb62379c2c 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/optsize.ll +++ b/llvm/test/Transforms/LoopVectorize/X86/optsize.ll @@ -218,38 +218,35 @@ for.end: ; preds = %for.body attributes #1 = { minsize } -; We can't vectorize this one because we version for stride==1; even having TC -; a multiple of VF. +; We can vectorize this one by refraining from versioning for stride==1. 
define void @scev4stride1(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32 %k) #2 { ; CHECK-LABEL: @scev4stride1( ; CHECK-NEXT: for.body.preheader: -; CHECK-NEXT: br label [[FOR_BODY:%.*]] -; CHECK: for.body: -; CHECK-NEXT: [[I_07:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER:%.*]] ] -; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[I_07]], [[K:%.*]] -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 [[MUL]] -; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 -; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_07]] -; CHECK-NEXT: store i32 [[TMP0]], i32* [[ARRAYIDX1]], align 4 -; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_07]], 1 -; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 256 -; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]] +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <64 x i32> undef, i32 [[K:%.*]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <64 x i32> [[BROADCAST_SPLATINSERT]], <64 x i32> undef, <64 x i32> zeroinitializer +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK: middle.block: +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 256, 256 +; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: ; CHECK: for.end.loopexit: ; CHECK-NEXT: ret void ; ; AUTOVF-LABEL: @scev4stride1( ; AUTOVF-NEXT: for.body.preheader: -; AUTOVF-NEXT: br label [[FOR_BODY:%.*]] -; AUTOVF: for.body: -; AUTOVF-NEXT: [[I_07:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER:%.*]] ] -; AUTOVF-NEXT: [[MUL:%.*]] = mul nsw i32 [[I_07]], [[K:%.*]] -; AUTOVF-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 [[MUL]] -; AUTOVF-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 -; 
AUTOVF-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_07]] -; AUTOVF-NEXT: store i32 [[TMP0]], i32* [[ARRAYIDX1]], align 4 -; AUTOVF-NEXT: [[INC]] = add nuw nsw i32 [[I_07]], 1 -; AUTOVF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 256 -; AUTOVF-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]] +; AUTOVF-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; AUTOVF: vector.ph: +; AUTOVF-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> undef, i32 [[K:%.*]], i32 0 +; AUTOVF-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> undef, <8 x i32> zeroinitializer +; AUTOVF-NEXT: br label [[VECTOR_BODY:%.*]] +; AUTOVF: vector.body: +; AUTOVF: middle.block: +; AUTOVF-NEXT: [[CMP_N:%.*]] = icmp eq i32 256, 256 +; AUTOVF-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]] +; AUTOVF: scalar.ph: ; AUTOVF: for.end.loopexit: ; AUTOVF-NEXT: ret void ; diff --git a/llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll b/llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll index 967dd3f2c26b..8852513cc3db 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll +++ b/llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt < %s -loop-vectorize -S | FileCheck %s --check-prefixes=CHECK,DEFAULT -; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s --check-prefixes=CHECK,PREDFLAG +; RUN: opt < %s -loop-vectorize -S | FileCheck %s +; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @@ -74,141 +74,56 @@ for.body: br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !6 } +; Marking function as optsize turns tail folding on, as if 
explicit tail folding +; flag was enabled. define dso_local void @tail_folding_disabled(i32* noalias nocapture %A, i32* noalias nocapture readonly %B, i32* noalias nocapture readonly %C) local_unnamed_addr #0 { -; DEFAULT-LABEL: @tail_folding_disabled( -; DEFAULT-NEXT: entry: -; DEFAULT-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] -; DEFAULT: vector.ph: -; DEFAULT-NEXT: br label [[VECTOR_BODY:%.*]] -; DEFAULT: vector.body: -; DEFAULT-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; DEFAULT-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0 -; DEFAULT-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 8 -; DEFAULT-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16 -; DEFAULT-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24 -; DEFAULT-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP0]] -; DEFAULT-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]] -; DEFAULT-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP2]] -; DEFAULT-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP3]] -; DEFAULT-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0 -; DEFAULT-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <8 x i32>* -; DEFAULT-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP9]], align 4 -; DEFAULT-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 8 -; DEFAULT-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>* -; DEFAULT-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP11]], align 4 -; DEFAULT-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 16 -; DEFAULT-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>* -; DEFAULT-NEXT: [[WIDE_LOAD2:%.*]] = load <8 x i32>, <8 x i32>* [[TMP13]], align 4 -; DEFAULT-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 24 -; DEFAULT-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <8 x i32>* -; DEFAULT-NEXT: 
[[WIDE_LOAD3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP15]], align 4 -; DEFAULT-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[TMP0]] -; DEFAULT-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[TMP1]] -; DEFAULT-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[TMP2]] -; DEFAULT-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[TMP3]] -; DEFAULT-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP16]], i32 0 -; DEFAULT-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <8 x i32>* -; DEFAULT-NEXT: [[WIDE_LOAD4:%.*]] = load <8 x i32>, <8 x i32>* [[TMP21]], align 4 -; DEFAULT-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[TMP16]], i32 8 -; DEFAULT-NEXT: [[TMP23:%.*]] = bitcast i32* [[TMP22]] to <8 x i32>* -; DEFAULT-NEXT: [[WIDE_LOAD5:%.*]] = load <8 x i32>, <8 x i32>* [[TMP23]], align 4 -; DEFAULT-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[TMP16]], i32 16 -; DEFAULT-NEXT: [[TMP25:%.*]] = bitcast i32* [[TMP24]] to <8 x i32>* -; DEFAULT-NEXT: [[WIDE_LOAD6:%.*]] = load <8 x i32>, <8 x i32>* [[TMP25]], align 4 -; DEFAULT-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, i32* [[TMP16]], i32 24 -; DEFAULT-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP26]] to <8 x i32>* -; DEFAULT-NEXT: [[WIDE_LOAD7:%.*]] = load <8 x i32>, <8 x i32>* [[TMP27]], align 4 -; DEFAULT-NEXT: [[TMP28:%.*]] = add nsw <8 x i32> [[WIDE_LOAD4]], [[WIDE_LOAD]] -; DEFAULT-NEXT: [[TMP29:%.*]] = add nsw <8 x i32> [[WIDE_LOAD5]], [[WIDE_LOAD1]] -; DEFAULT-NEXT: [[TMP30:%.*]] = add nsw <8 x i32> [[WIDE_LOAD6]], [[WIDE_LOAD2]] -; DEFAULT-NEXT: [[TMP31:%.*]] = add nsw <8 x i32> [[WIDE_LOAD7]], [[WIDE_LOAD3]] -; DEFAULT-NEXT: [[TMP32:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[TMP0]] -; DEFAULT-NEXT: [[TMP33:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP1]] -; DEFAULT-NEXT: [[TMP34:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP2]] -; DEFAULT-NEXT: [[TMP35:%.*]] = 
getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]] -; DEFAULT-NEXT: [[TMP36:%.*]] = getelementptr inbounds i32, i32* [[TMP32]], i32 0 -; DEFAULT-NEXT: [[TMP37:%.*]] = bitcast i32* [[TMP36]] to <8 x i32>* -; DEFAULT-NEXT: store <8 x i32> [[TMP28]], <8 x i32>* [[TMP37]], align 4 -; DEFAULT-NEXT: [[TMP38:%.*]] = getelementptr inbounds i32, i32* [[TMP32]], i32 8 -; DEFAULT-NEXT: [[TMP39:%.*]] = bitcast i32* [[TMP38]] to <8 x i32>* -; DEFAULT-NEXT: store <8 x i32> [[TMP29]], <8 x i32>* [[TMP39]], align 4 -; DEFAULT-NEXT: [[TMP40:%.*]] = getelementptr inbounds i32, i32* [[TMP32]], i32 16 -; DEFAULT-NEXT: [[TMP41:%.*]] = bitcast i32* [[TMP40]] to <8 x i32>* -; DEFAULT-NEXT: store <8 x i32> [[TMP30]], <8 x i32>* [[TMP41]], align 4 -; DEFAULT-NEXT: [[TMP42:%.*]] = getelementptr inbounds i32, i32* [[TMP32]], i32 24 -; DEFAULT-NEXT: [[TMP43:%.*]] = bitcast i32* [[TMP42]] to <8 x i32>* -; DEFAULT-NEXT: store <8 x i32> [[TMP31]], <8 x i32>* [[TMP43]], align 4 -; DEFAULT-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 32 -; DEFAULT-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], 416 -; DEFAULT-NEXT: br i1 [[TMP44]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !4 -; DEFAULT: middle.block: -; DEFAULT-NEXT: [[CMP_N:%.*]] = icmp eq i64 430, 416 -; DEFAULT-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] -; DEFAULT: scalar.ph: -; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 416, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] -; DEFAULT-NEXT: br label [[FOR_BODY:%.*]] -; DEFAULT: for.cond.cleanup: -; DEFAULT-NEXT: ret void -; DEFAULT: for.body: -; DEFAULT-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ] -; DEFAULT-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDVARS_IV]] -; DEFAULT-NEXT: [[TMP45:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 -; DEFAULT-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDVARS_IV]] -; DEFAULT-NEXT: 
[[TMP46:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4 -; DEFAULT-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP46]], [[TMP45]] -; DEFAULT-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]] -; DEFAULT-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX4]], align 4 -; DEFAULT-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1 -; DEFAULT-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 430 -; DEFAULT-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop !5 -; -; PREDFLAG-LABEL: @tail_folding_disabled( -; PREDFLAG-NEXT: entry: -; PREDFLAG-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] -; PREDFLAG: vector.ph: -; PREDFLAG-NEXT: br label [[VECTOR_BODY:%.*]] -; PREDFLAG: vector.body: -; PREDFLAG-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; PREDFLAG-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> undef, i64 [[INDEX]], i32 0 -; PREDFLAG-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> undef, <8 x i32> zeroinitializer -; PREDFLAG-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], -; PREDFLAG-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0 -; PREDFLAG-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP0]] -; PREDFLAG-NEXT: [[TMP2:%.*]] = icmp ule <8 x i64> [[INDUCTION]], -; PREDFLAG-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0 -; PREDFLAG-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>* -; PREDFLAG-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP4]], i32 4, <8 x i1> [[TMP2]], <8 x i32> undef) -; PREDFLAG-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[TMP0]] -; PREDFLAG-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0 -; PREDFLAG-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>* -; PREDFLAG-NEXT: [[WIDE_MASKED_LOAD1:%.*]] = call 
<8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP7]], i32 4, <8 x i1> [[TMP2]], <8 x i32> undef) -; PREDFLAG-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD1]], [[WIDE_MASKED_LOAD]] -; PREDFLAG-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[TMP0]] -; PREDFLAG-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP9]], i32 0 -; PREDFLAG-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>* -; PREDFLAG-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP8]], <8 x i32>* [[TMP11]], i32 4, <8 x i1> [[TMP2]]) -; PREDFLAG-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8 -; PREDFLAG-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 432 -; PREDFLAG-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !4 -; PREDFLAG: middle.block: -; PREDFLAG-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] -; PREDFLAG: scalar.ph: -; PREDFLAG-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 432, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] -; PREDFLAG-NEXT: br label [[FOR_BODY:%.*]] -; PREDFLAG: for.cond.cleanup: -; PREDFLAG-NEXT: ret void -; PREDFLAG: for.body: -; PREDFLAG-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ] -; PREDFLAG-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDVARS_IV]] -; PREDFLAG-NEXT: [[TMP13:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 -; PREDFLAG-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDVARS_IV]] -; PREDFLAG-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4 -; PREDFLAG-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP14]], [[TMP13]] -; PREDFLAG-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]] -; PREDFLAG-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX4]], align 4 -; PREDFLAG-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1 -; PREDFLAG-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 430 -; 
PREDFLAG-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop !5 +; CHECK-LABEL: @tail_folding_disabled( +; CHECK-NEXT: entry: +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> undef, i64 [[INDEX]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> undef, <8 x i32> zeroinitializer +; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], +; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0 +; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP0]] +; CHECK-NEXT: [[TMP2:%.*]] = icmp ule <8 x i64> [[INDUCTION]], +; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0 +; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>* +; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP4]], i32 4, <8 x i1> [[TMP2]], <8 x i32> undef) +; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[TMP0]] +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0 +; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>* +; CHECK-NEXT: [[WIDE_MASKED_LOAD1:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP7]], i32 4, <8 x i1> [[TMP2]], <8 x i32> undef) +; CHECK-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD1]], [[WIDE_MASKED_LOAD]] +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[TMP0]] +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP9]], i32 0 +; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>* +; CHECK-NEXT: call void 
@llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP8]], <8 x i32>* [[TMP11]], i32 4, <8 x i1> [[TMP2]]) +; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8 +; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 432 +; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !4 +; CHECK: middle.block: +; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 432, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] +; CHECK-NEXT: br label [[FOR_BODY:%.*]] +; CHECK: for.cond.cleanup: +; CHECK-NEXT: ret void +; CHECK: for.body: +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDVARS_IV]] +; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 +; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDVARS_IV]] +; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4 +; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP14]], [[TMP13]] +; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]] +; CHECK-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX4]], align 4 +; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1 +; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 430 +; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop !5 ; entry: br label %for.body diff --git a/llvm/test/Transforms/LoopVectorize/optsize.ll b/llvm/test/Transforms/LoopVectorize/optsize.ll index f0ca35b411c5..8def1ab0a0e8 100644 --- a/llvm/test/Transforms/LoopVectorize/optsize.ll +++ b/llvm/test/Transforms/LoopVectorize/optsize.ll @@ -154,6 +154,73 @@ exit: ret i32 %for } +; PR46228: Vectorize w/o versioning for unit stride under optsize and enabled +; vectorization. 
+ +; NOTE: Some assertions have been autogenerated by utils/update_test_checks.py +define void @stride1(i16* noalias %B, i32 %BStride) optsize { +; CHECK-LABEL: @stride1( +; CHECK-NEXT: entry: +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> undef, i32 [[BSTRIDE:%.*]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> undef, <2 x i32> zeroinitializer +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ] +; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ , [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE2]] ] +; CHECK-NEXT: [[TMP0:%.*]] = mul nsw <2 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]] +; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <2 x i32> [[VEC_IND]], +; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0 +; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]] +; CHECK: pred.store.if: +; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i32 [[TMP3]] +; CHECK-NEXT: store i16 42, i16* [[TMP4]], align 4 +; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]] +; CHECK: pred.store.continue: +; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1 +; CHECK-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2]] +; CHECK: pred.store.if1: +; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[TMP6]] +; CHECK-NEXT: store i16 42, i16* [[TMP7]], align 4 +; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]] +; CHECK: pred.store.continue2: +; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2 +; CHECK-NEXT: 
[[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], +; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1026 +; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !19 +; CHECK: middle.block: +; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: +; CHECK: for.end: +; CHECK-NEXT: ret void +; +; PGSO-LABEL: @stride1( +; PGSO-NEXT: entry: +; PGSO-NEXT: br i1 false, label %scalar.ph, label %vector.ph +; +; NPGSO-LABEL: @stride1( +; NPGSO-NEXT: entry: +; NPGSO-NEXT: br i1 false, label %scalar.ph, label %vector.ph + +entry: + br label %for.body + +for.body: + %iv = phi i32 [ %iv.next, %for.body ], [ 0, %entry ] + %mulB = mul nsw i32 %iv, %BStride + %gepOfB = getelementptr inbounds i16, i16* %B, i32 %mulB + store i16 42, i16* %gepOfB, align 4 + %iv.next = add nuw nsw i32 %iv, 1 + %exitcond = icmp eq i32 %iv.next, 1025 + br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !15 + +for.end: + ret void +} + !llvm.module.flags = !{!0} !0 = !{i32 1, !"ProfileSummary", !1} !1 = !{!2, !3, !4, !5, !6, !7, !8, !9} @@ -170,3 +237,5 @@ exit: !12 = !{i32 999000, i64 100, i32 1} !13 = !{i32 999999, i64 1, i32 2} !14 = !{!"function_entry_count", i64 0} +!15 = distinct !{!15, !16} +!16 = !{!"llvm.loop.vectorize.enable", i1 true} diff --git a/llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll b/llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll index 6032fb18a387..d07e35687932 100644 --- a/llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll +++ b/llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll @@ -26,13 +26,57 @@ bb68: ret void } -; Check that the need for stride==1 check prevents vectorizing a loop under opt -; for size. -; CHECK-LABEL: @scev4stride1 -; CHECK-NOT: vector.scevcheck -; CHECK-NOT: vector.body: -; CHECK-LABEL: for.body: +; Check that a loop under opt-for-size is vectorized, w/o checking for +; stride==1. 
+; NOTE: Some assertions have been autogenerated by utils/update_test_checks.py define void @scev4stride1(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32 %k) #0 { +; CHECK-LABEL: @scev4stride1( +; CHECK-NEXT: for.body.preheader: +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[K:%.*]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ , [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0 +; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1 +; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2 +; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3 +; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]] +; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0 +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 [[TMP5]] +; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1 +; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 [[TMP7]] +; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2 +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 [[TMP9]] +; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3 +; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 [[TMP11]] +; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP6]], align 4 +; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP8]], align 4 +; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP10]], align 4 +; CHECK-NEXT: [[TMP16:%.*]] = load 
i32, i32* [[TMP12]], align 4 +; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i32> undef, i32 [[TMP13]], i32 0 +; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x i32> [[TMP17]], i32 [[TMP14]], i32 1 +; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i32> [[TMP18]], i32 [[TMP15]], i32 2 +; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i32> [[TMP19]], i32 [[TMP16]], i32 3 +; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]] +; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[TMP21]], i32 0 +; CHECK-NEXT: [[TMP23:%.*]] = bitcast i32* [[TMP22]] to <4 x i32>* +; CHECK-NEXT: store <4 x i32> [[TMP20]], <4 x i32>* [[TMP23]], align 4 +; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4 +; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], +; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024 +; CHECK-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0 +; CHECK: middle.block: +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 1024, 1024 +; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: +; CHECK: for.body: +; CHECK: for.end.loopexit: +; CHECK-NEXT: ret void +; for.body.preheader: br label %for.body diff --git a/llvm/test/Transforms/LoopVectorize/runtime-check.ll b/llvm/test/Transforms/LoopVectorize/runtime-check.ll index bf54b2c798f2..37f9c0ce0de4 100644 --- a/llvm/test/Transforms/LoopVectorize/runtime-check.ll +++ b/llvm/test/Transforms/LoopVectorize/runtime-check.ll @@ -162,9 +162,9 @@ loopexit: define dso_local void @forced_optsize(i64* noalias nocapture readonly %x_p, i64* noalias nocapture readonly %y_p, i64* noalias nocapture %z_p) minsize optsize { ; -; FORCED_OPTSIZE: remark: :0:0: Code-size may be reduced by not forcing vectorization, or by source-code modifications eliminating the need for runtime checks (e.g., adding 'restrict'). +; FORCED_OPTSIZE: remark: :0:0: loop not vectorized: runtime pointer checks needed. 
Enable vectorization of this loop with '#pragma clang loop vectorize(enable)' when compiling with -Os/-Oz ; FORCED_OPTSIZE-LABEL: @forced_optsize( -; FORCED_OPTSIZE: vector.body: +; FORCED_OPTSIZE-NOT: vector.body: ; entry: br label %for.body From llvm-commits at lists.llvm.org Tue Jul 7 05:05:31 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:05:31 +0000 (UTC) Subject: [PATCH] D81345: [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz In-Reply-To: References: Message-ID: <682a30d034355d13e2786412c0f2a3ad@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG7bf299c8d8d5: [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz (authored by Ayal). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81345/new/ https://reviews.llvm.org/D81345 Files: llvm/lib/Analysis/LoopAccessAnalysis.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/X86/optsize.ll llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll llvm/test/Transforms/LoopVectorize/optsize.ll llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll llvm/test/Transforms/LoopVectorize/runtime-check.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81345.276002.patch Type: text/x-patch Size: 30313 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:07:12 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:07:12 +0000 (UTC) Subject: [PATCH] D83299: [Attributor] [WIP] Introduce callbase context bridges. Message-ID: kuter created this revision. Herald added subscribers: llvm-commits, okura, bbn, uenoku, hiraditya. Herald added a reviewer: jdoerfert. Herald added a reviewer: sstefan1. Herald added a reviewer: uenoku. Herald added a reviewer: jdoerfert. 
Herald added a reviewer: homerdin. Herald added a reviewer: baziotis. Herald added a project: LLVM. This patch introduces the bridges between call base context and the callbase https://reviews.llvm.org/D83299 Files: llvm/lib/Transforms/IPO/AttributorAttributes.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83299.276003.patch Type: text/x-patch Size: 5974 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:09:23 2020 From: llvm-commits at lists.llvm.org (Hsiangkai Wang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:09:23 +0000 (UTC) Subject: [PATCH] D80802: [RISCV] Upgrade RVV MC to v0.9. In-Reply-To: References: Message-ID: <427688bfb6a14fd87156df9255591f58@localhost.localdomain> HsiangKai updated this revision to Diff 276005. HsiangKai added a comment. Address @rogfer01 and @fpallares' comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80802/new/ https://reviews.llvm.org/D80802 Files: llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp llvm/lib/Target/RISCV/RISCVInstrFormats.td llvm/lib/Target/RISCV/RISCVInstrFormatsV.td llvm/lib/Target/RISCV/RISCVInstrInfo.h llvm/lib/Target/RISCV/RISCVInstrInfoV.td llvm/test/MC/RISCV/rvv/convert.s llvm/test/MC/RISCV/rvv/ext.s llvm/test/MC/RISCV/rvv/fothers.s llvm/test/MC/RISCV/rvv/invalid.s llvm/test/MC/RISCV/rvv/load.s llvm/test/MC/RISCV/rvv/mask.s llvm/test/MC/RISCV/rvv/snippet.s llvm/test/MC/RISCV/rvv/store.s llvm/test/MC/RISCV/rvv/vsetvl.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D80802.276005.patch Type: text/x-patch Size: 129048 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:10:24 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:10:24 +0000 (UTC) Subject: [PATCH] D83071: Add support for options with two flags for controlling the same field. In-Reply-To: References: Message-ID: <052139caeaaa4c622cf89943e9860f90@localhost.localdomain> dang updated this revision to Diff 276007. dang added a comment. Make mergers use values directly instead of constant references Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83071/new/ https://reviews.llvm.org/D83071 Files: clang/include/clang/Driver/Options.td clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/Option/OptParser.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D83071.276007.patch Type: text/x-patch Size: 8305 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:10:36 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:10:36 +0000 (UTC) Subject: [PATCH] D81345: [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz In-Reply-To: References: Message-ID: <7d7b6339493cef8516e36aa9df74aa1a@localhost.localdomain> Ayal marked an inline comment as done. Ayal added inline comments. ================ Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:1821 + const bool EnableMemAccessVersioningOfLoop = + EnableMemAccessVersioning && ---------------- fhahn wrote: > Ayal wrote: > > fhahn wrote: > > > I think it might be slightly preferable to let LV drive the decision whether to version or not based on cost estimates (and LAA is used by other passes as well, which might have different requirements). 
> > > > > > Did you consider disabling generating the codes for (I think it should happen `emitSCEVChecks`) conditionally on optsize? IIUC we should only need to generate code for predicates, if either we need runtime checks or have symbolic strides. And runtime checks are already rejected in `runtimeChecksRequired`. > > Would it be better if LV passes a "only one copy of the loop is allowed" flag to analyzeLoop(), via the constructor of LAI, instead of the latter checking for OptSize? This requirement of OptSize is common to all other passes, right? > > > > Suppressing emitSCEVChecks would appease the assert, but LAI should refrain from collecting Strided Accesses in order (for getPtrStride()) to consider related accesses as non-unit strided accesses. It already does so under a cl::opt flag, for all its users. > > Would it be better if LV passes a "only one copy of the loop is allowed" flag to analyzeLoop(), via the constructor of LAI, instead of the latter checking for OptSize? This requirement of OptSize is common to all other passes, right? > > (Sorry for the long delay!) > > I think something like that would be preferable, as it makes explicit the interaction between LV/LAI. But I am not sure if that is possible at the moment though, because we get LAI as analysis. > > I am not sure if there's a feasible alternative without too much refactoring and it is probably not worth blocking the change on that, especially given that there's already precedence for using opt flags in a similar way. 
OK, thanks, unblocked the change :-) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81345/new/ https://reviews.llvm.org/D81345 From llvm-commits at lists.llvm.org Tue Jul 7 05:12:24 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:12:24 +0000 (UTC) Subject: [PATCH] D82860: Port ObjCMTAction to new option parsing system In-Reply-To: References: Message-ID: <7d37224158819852d54c9aaa2e6172f0@localhost.localdomain> dang updated this revision to Diff 276008. dang added a comment. Make mergers use values directly instead of constant references. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82860/new/ https://reviews.llvm.org/D82860 Files: clang/include/clang/Driver/Options.td clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/Option/OptParser.td llvm/utils/TableGen/OptParserEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82860.276008.patch Type: text/x-patch Size: 27572 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:13:13 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:13:13 +0000 (UTC) Subject: [PATCH] D83071: Add support for options with two flags for controlling the same field. In-Reply-To: References: Message-ID: <1ec67e80f25b84a154f5afc1016e9f25@localhost.localdomain> dang updated this revision to Diff 276009. dang added a comment. Rebase on top of some changes to parent patches. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83071/new/ https://reviews.llvm.org/D83071 Files: clang/include/clang/Driver/Options.td clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/Option/OptParser.td -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83071.276009.patch Type: text/x-patch Size: 7912 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:17:36 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:17:36 +0000 (UTC) Subject: [PATCH] D83299: [Attributor] [WIP] Introduce callbase context bridges. In-Reply-To: References: Message-ID: kuter updated this revision to Diff 276011. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83299/new/ https://reviews.llvm.org/D83299 Files: llvm/lib/Transforms/IPO/AttributorAttributes.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83299.276011.patch Type: text/x-patch Size: 6174 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:18:58 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:18:58 +0000 (UTC) Subject: [PATCH] D83232: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. In-Reply-To: References: Message-ID: jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM, with one suggestion. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test:499 + Relocations: + - Type: R_X86_64_NONE + - Name: .dynamic ---------------- Consider specifying an offset for one of these relocations, to distinguish it from the other, and show that the same region isn't being read for both now.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83232/new/ https://reviews.llvm.org/D83232 From llvm-commits at lists.llvm.org Tue Jul 7 05:20:21 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:20:21 +0000 (UTC) Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass In-Reply-To: References: Message-ID: <09f71bf20f05694119dc665d601bf9f0@localhost.localdomain> djtodoro added inline comments. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp:17 + +#include "LiveDebugValues.h" + ---------------- //According to the LLVM coding standard, this should be included first.// CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83046/new/ https://reviews.llvm.org/D83046 From llvm-commits at lists.llvm.org Tue Jul 7 05:20:35 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:20:35 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: ABataev added a comment. In D83268#2135724 , @jdoerfert wrote: > In D83268#2135081 , @Hahnfeld wrote: > > > This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?). > > > This is the device RTL. I am not aware we (want to) keep the API stable. If we are, I'm not sure why: > > - Dynamic linking (among other things) is not really an option so people that link against the device runtime (should) do so statically. > - Linking against an old device runtime with a new clang seems unreasonable to me. If you replace clang you must replace the static runtime as the new clang might use new functions. > > > > In D83268#2135655 , @ABataev wrote: > > > In D83268#2135081 , @Hahnfeld wrote: > > > > > This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?). > > > > > > +1. 
Better to introduce new entry points and mark these ones as deprecated. > > > Same response as above. What is the use case here which we want to continue to support? Use of the new library with the previous version of the compiler. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 05:21:31 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:21:31 +0000 (UTC) Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass In-Reply-To: References: Message-ID: <48ab405cbf093b89cffa22ca4c3e6738@localhost.localdomain> djtodoro added a comment. Thanks for addressing comments! One nit included. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83046/new/ https://reviews.llvm.org/D83046 From llvm-commits at lists.llvm.org Tue Jul 7 05:22:55 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:22:55 +0000 (UTC) Subject: [PATCH] D81345: [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz In-Reply-To: References: Message-ID: <0a2aec035684a3200943affe728fa1b4@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG7bf299c8d8d5: [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz (authored by Ayal). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81345/new/ https://reviews.llvm.org/D81345 Files: llvm/lib/Analysis/LoopAccessAnalysis.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/X86/optsize.ll llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll llvm/test/Transforms/LoopVectorize/optsize.ll llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll llvm/test/Transforms/LoopVectorize/runtime-check.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81345.275658.patch Type: text/x-patch Size: 30313 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:25:47 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:25:47 +0000 (UTC) Subject: [PATCH] D60413: SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: <904eab2f7ae38c9166d734498b74fc33@localhost.localdomain> dnsampaio updated this revision to Diff 276016. dnsampaio added a comment. Herald added a subscriber: hiraditya. Rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 Files: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp llvm/test/Transforms/AggressiveInstCombine/sext_multi_uses.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D60413.276016.patch Type: text/x-patch Size: 5451 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:26:14 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:26:14 +0000 (UTC) Subject: [PATCH] D83260: [PGO][PGSO] Add profile guided size optimizations to some new sites. 
In-Reply-To: References: Message-ID: <12a450863c8e7e54347c4b250f26e603@localhost.localdomain> fhahn requested changes to this revision. fhahn added a comment. This revision now requires changes to proceed. Herald added a subscriber: nikic. I think it would be preferable to have separate patches per pass. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83260/new/ https://reviews.llvm.org/D83260 From llvm-commits at lists.llvm.org Tue Jul 7 05:27:21 2020 From: llvm-commits at lists.llvm.org (Ulrich Weigand via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:27:21 +0000 (UTC) Subject: [PATCH] D74601: SystemZ target - Incorrect code generated for accessing thread_local variables when high-word feature is enabled In-Reply-To: References: Message-ID: uweigand added a comment. This was actually fixed in a different way here: https://reviews.llvm.org/D75014 I'm closing this review now. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D74601/new/ https://reviews.llvm.org/D74601 From llvm-commits at lists.llvm.org Tue Jul 7 05:40:03 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:40:03 +0000 (UTC) Subject: [PATCH] D60413: [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: lebedev.ri added a comment. Why doesn't `InstCombiner::SimplifyDemandedUseBits()` handle this? I would have expected this to already deal with it. 
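For context on the transform under review, here is a minimal sketch, assuming 8-bit values widened to 32 bits, of why a sign extension can be replaced by a zero extension when no user looks at the extended bits. The helper names (`sext8to32`, `zext8to32`, `extensionsAgree`) are hypothetical and are not LLVM APIs; the real transform reasons about demanded bits on IR values rather than enumerating inputs.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend an 8-bit value to 32 bits (hypothetical helper, not an LLVM API).
uint32_t sext8to32(uint8_t V) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(V)));
}

// Zero-extend an 8-bit value to 32 bits (hypothetical helper, not an LLVM API).
uint32_t zext8to32(uint8_t V) { return V; }

// Returns true if, for every 8-bit input, both extensions agree on all bits
// selected by DemandedMask. When this holds, a user that only reads those
// bits cannot tell the two extensions apart.
bool extensionsAgree(uint32_t DemandedMask) {
  for (unsigned I = 0; I < 256; ++I) {
    uint8_t V = static_cast<uint8_t>(I);
    if ((sext8to32(V) & DemandedMask) != (zext8to32(V) & DemandedMask))
      return false;
  }
  return true;
}
```

With a mask such as `0xFF` the two extensions are indistinguishable; as soon as any bit at position 8 or above is demanded, they differ for inputs whose top bit is set.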
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Tue Jul 7 05:41:49 2020 From: llvm-commits at lists.llvm.org (Kai Nacke via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:41:49 +0000 (UTC) Subject: [PATCH] D83251: [SystemZ] Allow specifying integer registers as part of the address calculation In-Reply-To: References: Message-ID: <7203fc1056929a974c2ebad9be232835@localhost.localdomain> Kai accepted this revision. Kai added a comment. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83251/new/ https://reviews.llvm.org/D83251 From llvm-commits at lists.llvm.org Tue Jul 7 05:49:00 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:49:00 +0000 (UTC) Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable In-Reply-To: References: Message-ID: <85cd43b61b0f7d0ae75a2621af674cd8@localhost.localdomain> benshi001 updated this revision to Diff 276023. benshi001 edited the summary of this revision. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83153/new/ https://reviews.llvm.org/D83153 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll llvm/test/CodeGen/X86/urem-seteq-nonzero.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83153.276023.patch Type: text/x-patch Size: 9473 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:53:06 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:53:06 +0000 (UTC) Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable In-Reply-To: References: Message-ID: benshi001 marked 4 inline comments as done. benshi001 added a comment. Change list according to all your comments. 1. 
Separate the test cases that show the improvement into another patch. Done: https://reviews.llvm.org/D83159 2. Make sure c1 and c2 do not exceed int64, to avoid an assertion failure. Done. One more if-statement is added to check that. 3. Check whether c1*c2 overflows. Done. One more if-statement is added for that. 4. Add a new test case that triggers the overflow check. I will do that in https://reviews.llvm.org/D83159 5. Make an inverse transform if "opt -instcombine" has been performed. Shall we separate this inverse transform into another patch? At least this patch improves the test case urem-seteq-nonzero.ll, and the case in https://reviews.llvm.org/D83159 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83153/new/ https://reviews.llvm.org/D83153 From llvm-commits at lists.llvm.org Tue Jul 7 05:54:02 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:54:02 +0000 (UTC) Subject: [PATCH] D83049: [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. In-Reply-To: References: Message-ID: ikudrin updated this revision to Diff 276024. ikudrin marked an inline comment as done. ikudrin added a comment. - Updated the test to check everything in one run. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83049/new/ https://reviews.llvm.org/D83049 Files: lld/ELF/DWARF.h lld/ELF/SyntheticSections.cpp lld/test/ELF/gdb-index-invalid-pubnames.s llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83049.276024.patch Type: text/x-patch Size: 10275 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:54:05 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:54:05 +0000 (UTC) Subject: [PATCH] D83300: [GlobalOpt] Don't remove inalloca from musttail-called functions Message-ID: hans created this revision. hans added reviewers: aeubanks, efriedma. Herald added a subscriber: hiraditya. Herald added a project: LLVM. Otherwise the verifier complains about the mismatching function ABIs. https://reviews.llvm.org/D83300 Files: llvm/lib/Transforms/IPO/GlobalOpt.cpp llvm/test/Transforms/GlobalOpt/fastcc.ll Index: llvm/test/Transforms/GlobalOpt/fastcc.ll =================================================================== --- llvm/test/Transforms/GlobalOpt/fastcc.ll +++ llvm/test/Transforms/GlobalOpt/fastcc.ll @@ -35,6 +35,17 @@ ret i32 %rv } +define i32 @inalloca2_caller(i32* inalloca %p) { + %rv = musttail call i32 @inalloca2(i32* inalloca %p) + ret i32 %rv +} +define internal i32 @inalloca2(i32* inalloca %p) { +; Because of the musttail caller, this inalloca cannot be dropped. +; CHECK-LABEL: define internal i32 @inalloca2(i32* inalloca %p) + %rv = load i32, i32* %p + ret i32 %rv +} + define internal i32 @preallocated(i32* preallocated(i32) %p) { ; CHECK-LABEL: define internal fastcc i32 @preallocated(i32* %p) %rv = load i32, i32* %p Index: llvm/lib/Transforms/IPO/GlobalOpt.cpp =================================================================== --- llvm/lib/Transforms/IPO/GlobalOpt.cpp +++ llvm/lib/Transforms/IPO/GlobalOpt.cpp @@ -2442,7 +2442,7 @@ // FIXME: We should also hoist alloca affected by this to the entry // block if possible. 
if (F->getAttributes().hasAttrSomewhere(Attribute::InAlloca) && - !F->hasAddressTaken()) { + !F->hasAddressTaken() && !hasMustTailCallers(F)) { RemoveAttribute(F, Attribute::InAlloca); Changed = true; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83300.276026.patch Type: text/x-patch Size: 1318 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:54:10 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:54:10 +0000 (UTC) Subject: [PATCH] D83099: [clangd] Store index in '.cache/clangd/index' instead of '.clangd/index' In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. sammccall marked 2 inline comments as done. Closed by commit rG9b55bc4d1197: [clangd] Store index in '.cache/clangd/index' instead of '.clangd/index' (authored by sammccall). Changed prior to commit: https://reviews.llvm.org/D83099?vs=275645&id=276027#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83099/new/ https://reviews.llvm.org/D83099 Files: .gitignore clang-tools-extra/clangd/index/Background.h clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp clang-tools-extra/clangd/test/background-index.test llvm/.gitignore Index: llvm/.gitignore =================================================================== --- llvm/.gitignore +++ llvm/.gitignore @@ -59,8 +59,6 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd #==============================================================================# # Files created in tree by the Go bindings. 
Index: clang-tools-extra/clangd/test/background-index.test =================================================================== --- clang-tools-extra/clangd/test/background-index.test +++ clang-tools-extra/clangd/test/background-index.test @@ -15,8 +15,8 @@ # RUN: clangd -background-index -lit-test < %t/definition.jsonrpc | FileCheck %t/definition.jsonrpc --check-prefixes=CHECK,BUILD # Test that the index is writing files in the expected location. -# RUN: ls %t/.clangd/index/foo.cpp.*.idx -# RUN: ls %t/sub_dir/.clangd/index/foo.h.*.idx +# RUN: ls %t/.cache/clangd/index/foo.cpp.*.idx +# RUN: ls %t/sub_dir/.cache/clangd/index/foo.h.*.idx # Test the index is read from disk: delete code and restart clangd. # RUN: rm %t/foo.cpp Index: clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp =================================================================== --- clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp +++ clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp @@ -95,8 +95,8 @@ }; // Creates and owns IndexStorages for multiple CDBs. -// When a CDB root is found, shards are stored in $ROOT/.clangd/index. -// When no root is found, the fallback path is ~/.cache/clangd/index. +// When a CDB root is found, shards are stored in $ROOT/.cache/clangd/index/. +// When no root is found, the fallback path is ~/.cache/clangd/index/. 
class DiskBackedIndexStorageManager { public: DiskBackedIndexStorageManager( @@ -115,7 +115,7 @@ llvm::SmallString<128> StorageDir(FallbackDir); if (auto PI = GetProjectInfo(File)) { StorageDir = PI->SourceRoot; - llvm::sys::path::append(StorageDir, ".clangd", "index"); + llvm::sys::path::append(StorageDir, ".cache", "clangd", "index"); } auto &IndexStorage = IndexStorageMap[StorageDir]; if (!IndexStorage) Index: clang-tools-extra/clangd/index/Background.h =================================================================== --- clang-tools-extra/clangd/index/Background.h +++ clang-tools-extra/clangd/index/Background.h @@ -56,9 +56,9 @@ using Factory = llvm::unique_function; // Creates an Index Storage that saves shards into disk. Index storage uses - // CDBDirectory + ".clangd/index/" as the folder to save shards. CDBDirectory - // is the first directory containing a CDB in parent directories of a file, or - // user's home directory if none was found, e.g. standard library headers. + // CDBDirectory + ".cache/clangd/index/" as the folder to save shards. + // CDBDirectory is the first directory containing a CDB in parent directories + // of a file, or user cache directory if none was found, e.g. stdlib headers. static Factory createDiskBackedStorageFactory( std::function(PathRef)> GetProjectInfo); }; Index: .gitignore =================================================================== --- .gitignore +++ .gitignore @@ -53,10 +53,11 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd +# clangd index. 
(".clangd" is a config file now, thus trailing slash) +.clangd/ +.cache # static analyzer regression testing project files /clang/utils/analyzer/projects/*/CachedSource /clang/utils/analyzer/projects/*/PatchedSource /clang/utils/analyzer/projects/*/ScanBuildResults -/clang/utils/analyzer/projects/*/RefScanBuildResults \ No newline at end of file +/clang/utils/analyzer/projects/*/RefScanBuildResults -------------- next part -------------- A non-text attachment was scrubbed... Name: D83099.276027.patch Type: text/x-patch Size: 3809 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:54:49 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:54:49 +0000 (UTC) Subject: [PATCH] D83300: [GlobalOpt] Don't remove inalloca from musttail-called functions In-Reply-To: References: Message-ID: <752319f85b05c8bf9bc1521fd576da1b@localhost.localdomain> hans added a comment. Please take a look! (We're hitting this in the Chromium browser / compiler test suite, https://crbug.com/1101286) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83300/new/ https://reviews.llvm.org/D83300 From llvm-commits at lists.llvm.org Tue Jul 7 05:55:10 2020 From: llvm-commits at lists.llvm.org (Mirko Brkusanin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:55:10 +0000 (UTC) Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot In-Reply-To: References: Message-ID: <440c057e79520ac3647bc5417a5a7eaa@localhost.localdomain> mbrkusanin updated this revision to Diff 276025. mbrkusanin added a comment. - Addressed comments - Also renamed and updated tests for SDag. Let me know if you would rather have this as a separate patch. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83214/new/ https://reviews.llvm.org/D83214 Files: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83214.276025.patch Type: text/x-patch Size: 16523 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:55:12 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:55:12 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: grimar updated this revision to Diff 276019. grimar marked 5 inline comments as done. grimar added a comment. - Addressed review comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 Files: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83225.276019.patch Type: text/x-patch Size: 11592 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:55:18 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:55:18 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: <23137c12ec2365a867a04e7ee77c6b99@localhost.localdomain> grimar added inline comments. 
================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:3024 + if (!DtPltGot) + return createError("cannot find PLTGOT dynamic table tag"); + if (!DtLocalGotNum) ---------------- MaskRay wrote: > The canonical term is dynamic tag, not dynamic table tag. Fixed. ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:3077-3084 + if (!PltSec) + return createError("There is no not empty PLTGOT section at 0x " + + Twine::utohexstr(*DtMipsPltGot)); + + PltRelSec = findNotEmptySectionByAddress(Obj, FileName, *DtJmpRel); + if (!PltRelSec) + return createError("There is no not empty RELPLT section at 0x" + ---------------- jhenderson wrote: > These two cases don't appear to have tests? > > Also, the errors should be "not empty" -> "non-empty" and have lower-case first letters. Oh, right. I've missed them for unknown reason. I've added tests for these 2. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 From llvm-commits at lists.llvm.org Tue Jul 7 05:55:40 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:55:40 +0000 (UTC) Subject: [PATCH] D80951: [GlobalOpt] Remove preallocated calls when possible In-Reply-To: References: Message-ID: hans added inline comments. ================ Comment at: llvm/lib/Transforms/IPO/GlobalOpt.cpp:2405 + if (F->getAttributes().hasAttrSomewhere(Attribute::Preallocated) && + !F->hasAddressTaken()) { + RemovePreallocated(F); ---------------- aeubanks wrote: > efriedma wrote: > > Do you also need to check for musttail calls? 
> Good point, done Actually, I think the old code is also missing that check :-) Sent D83300 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80951/new/ https://reviews.llvm.org/D80951 From llvm-commits at lists.llvm.org Tue Jul 7 05:55:56 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:55:56 +0000 (UTC) Subject: [PATCH] D83099: Revert "[clangd] Store index in '.clangd/index' instead of '.clangd-index'" In-Reply-To: References: Message-ID: <6ed8e2fe8467b3075ce87b55a2243ac4@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG9b55bc4d1197: [clangd] Store index in '.cache/clangd/index' instead of '.clangd/index' (authored by sammccall). Changed prior to commit: https://reviews.llvm.org/D83099?vs=275244&id=275660#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83099/new/ https://reviews.llvm.org/D83099 Files: .gitignore clang-tools-extra/clangd/index/Background.h clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp clang-tools-extra/clangd/test/background-index.test llvm/.gitignore Index: llvm/.gitignore =================================================================== --- llvm/.gitignore +++ llvm/.gitignore @@ -59,8 +59,6 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd #==============================================================================# # Files created in tree by the Go bindings. 
Index: clang-tools-extra/clangd/test/background-index.test =================================================================== --- clang-tools-extra/clangd/test/background-index.test +++ clang-tools-extra/clangd/test/background-index.test @@ -15,8 +15,8 @@ # RUN: clangd -background-index -lit-test < %t/definition.jsonrpc | FileCheck %t/definition.jsonrpc --check-prefixes=CHECK,BUILD # Test that the index is writing files in the expected location. -# RUN: ls %t/.clangd/index/foo.cpp.*.idx -# RUN: ls %t/sub_dir/.clangd/index/foo.h.*.idx +# RUN: ls %t/.cache/clangd/index/foo.cpp.*.idx +# RUN: ls %t/sub_dir/.cache/clangd/index/foo.h.*.idx # Test the index is read from disk: delete code and restart clangd. # RUN: rm %t/foo.cpp Index: clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp =================================================================== --- clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp +++ clang-tools-extra/clangd/index/BackgroundIndexStorage.cpp @@ -95,8 +95,8 @@ }; // Creates and owns IndexStorages for multiple CDBs. -// When a CDB root is found, shards are stored in $ROOT/.clangd/index. -// When no root is found, the fallback path is ~/.cache/clangd/index. +// When a CDB root is found, shards are stored in $ROOT/.cache/clangd/index/. +// When no root is found, the fallback path is ~/.cache/clangd/index/. 
class DiskBackedIndexStorageManager { public: DiskBackedIndexStorageManager( @@ -115,7 +115,7 @@ llvm::SmallString<128> StorageDir(FallbackDir); if (auto PI = GetProjectInfo(File)) { StorageDir = PI->SourceRoot; - llvm::sys::path::append(StorageDir, ".clangd", "index"); + llvm::sys::path::append(StorageDir, ".cache", "clangd", "index"); } auto &IndexStorage = IndexStorageMap[StorageDir]; if (!IndexStorage) Index: clang-tools-extra/clangd/index/Background.h =================================================================== --- clang-tools-extra/clangd/index/Background.h +++ clang-tools-extra/clangd/index/Background.h @@ -56,9 +56,9 @@ using Factory = llvm::unique_function; // Creates an Index Storage that saves shards into disk. Index storage uses - // CDBDirectory + ".clangd/index/" as the folder to save shards. CDBDirectory - // is the first directory containing a CDB in parent directories of a file, or - // user's home directory if none was found, e.g. standard library headers. + // CDBDirectory + ".cache/clangd/index/" as the folder to save shards. + // CDBDirectory is the first directory containing a CDB in parent directories + // of a file, or user cache directory if none was found, e.g. stdlib headers. static Factory createDiskBackedStorageFactory( std::function(PathRef)> GetProjectInfo); }; Index: .gitignore =================================================================== --- .gitignore +++ .gitignore @@ -53,10 +53,11 @@ # VS2017 and VSCode config files. .vscode .vs -# clangd index -.clangd +# clangd index. 
(".clangd" is a config file now, thus trailing slash) +.clangd/ +.cache # static analyzer regression testing project files /clang/utils/analyzer/projects/*/CachedSource /clang/utils/analyzer/projects/*/PatchedSource /clang/utils/analyzer/projects/*/ScanBuildResults -/clang/utils/analyzer/projects/*/RefScanBuildResults \ No newline at end of file +/clang/utils/analyzer/projects/*/RefScanBuildResults -------------- next part -------------- A non-text attachment was scrubbed... Name: D83099.275660.patch Type: text/x-patch Size: 3809 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:56:22 2020 From: llvm-commits at lists.llvm.org (Hideto Ueno via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:56:22 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: uenoku added a comment. In D83185#2134452, @kuter wrote: > In D83185#2134444, @uenoku wrote: > > > In D83185#2134395, @kuter wrote: > > > > > In D83185#2134275, @uenoku wrote: > > > > > > > Either is fine but I think it is more natural to forbid an empty list. > > > > > > > > > Do you mean returning an error if an empty `--attributor-seed-allow-list` option is present? > > > Currently the size of the list is being used to tell if a list is present or not. > > > I think I can use `getNumOccurrences()` to replace this behaviour. > > > > > > Yes, I think replacing ZeroOrMore with OneOrMore is enough > > > I think `cl::OneOrMore` causes an error to be generated if the option is not specified. > That would make it mandatory to have at least one `--attributor-seed-allow-list` parameter for every invocation of > opt > > https://llvm.org/docs/CommandLine.html > > > The cl::OneOrMore modifier indicates that the option must be specified at least one time. Yeah, I might be wrong. Sorry for the confusion.
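The distinction discussed in this thread (option absent, versus option present with an empty list) can be sketched as follows. This is only a self-contained model under stated assumptions: `ListOption`, `SeedPolicy` and `classify` are hypothetical names, and `ListOption` is not `llvm::cl::list`; it merely tracks an occurrence count separately from the collected values, which is the idea behind using `getNumOccurrences()` instead of the list size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a list-valued command-line option. It is NOT
// llvm::cl::list; it only models "how often was the flag seen" separately
// from "which values were collected".
struct ListOption {
  std::string Prefix;              // e.g. "--allow-list="
  std::vector<std::string> Values; // collected comma-separated values
  unsigned NumOccurrences = 0;     // number of times the flag appeared

  explicit ListOption(std::string P) : Prefix(std::move(P)) {}

  // Scans Args, counts every occurrence of the flag and splits its value
  // on commas. Empty items are skipped.
  void parse(const std::vector<std::string> &Args) {
    for (const std::string &Arg : Args) {
      if (Arg.compare(0, Prefix.size(), Prefix) != 0)
        continue;
      ++NumOccurrences;
      std::string Rest = Arg.substr(Prefix.size());
      std::size_t Start = 0;
      while (Start < Rest.size()) {
        std::size_t Comma = Rest.find(',', Start);
        if (Comma == std::string::npos)
          Comma = Rest.size();
        if (Comma > Start)
          Values.push_back(Rest.substr(Start, Comma - Start));
        Start = Comma + 1;
      }
    }
  }
};

enum class SeedPolicy { AllowAll, AllowListed, RejectEmptyList };

// Flag never given: no filtering. Flag given but empty: reject, matching the
// "forbid an empty list" suggestion. Otherwise: filter by the listed names.
SeedPolicy classify(const ListOption &Opt) {
  if (Opt.NumOccurrences == 0)
    return SeedPolicy::AllowAll;
  if (Opt.Values.empty())
    return SeedPolicy::RejectEmptyList;
  return SeedPolicy::AllowListed;
}
```

Keying the "is a list present" decision on the occurrence count keeps the option optional, which a "one or more" requirement would not, while still allowing an explicitly empty list to be diagnosed.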
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Tue Jul 7 05:56:58 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:56:58 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: ABataev added inline comments. ================ Comment at: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:42 - /// *outlined_function, int16_t - /// IsOMPRuntimeInitialized); OMPRTL_NVPTX__kmpc_kernel_prepare_parallel, ---------------- I think, instead the optimizer can try to detect if the runtime library is used by the kernel and switch this flag to `0` if no runtime calls are used in the kernel. For non-SPMD mode in most cases, the runtime is required, but in some cases, it can be disabled. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 05:57:57 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:57:57 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: ikudrin updated this revision to Diff 276028. ikudrin marked 6 inline comments as done. ikudrin added a comment. - Fixed typos (again). Thanks, @jhenderson! - Updated the test to shrink the number of tool runs. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 Files: lld/ELF/SyntheticSections.cpp lld/test/ELF/Inputs/gdb-index.s lld/test/ELF/gdb-index-invalid-pubnames.s lld/test/ELF/gdb-index.s llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83050.276028.patch Type: text/x-patch Size: 15674 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 05:58:10 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:58:10 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: <7614d84d28c3b164c2d7bfbf0b870aeb@localhost.localdomain> ikudrin added inline comments. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:76-89 + if (!C) { + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 + " terminated prematurely: %s", + SetOffset, toString(std::move(C.takeError())).c_str())); + continue; ---------------- jhenderson wrote: > dblaikie wrote: > > jhenderson wrote: > > > ikudrin wrote: > > > > dblaikie wrote: > > > > > I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended. > > > > > > > > > > Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... 
open to ideas. > > > > These wordings are already better than mine. Thanks! > > > How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context. > > The suggestion wasn't for brevity, but clarity. I found the original messages unclear & was hoping to clarify them. > > > > What are the two messages in total (with all the added context, for both too short and too long) & how clear are they? > Taken from the test case: > > ``` > error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72 > ``` > (the "no null terminated" bit might differ depending on the exact failure, e.g. "unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)") > > ``` > error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c > ``` Thanks, @jhenderson! @dblaikie are you OK with these messages or going to suggest a better alternative? ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:3 + +## All four public name sections share the same parser, but slightly different +## code paths are used to reach it. Do a comprehensive check for one of the ---------------- jhenderson wrote: > I think this is the first time I've heard the term "public name sections" being used. Is this called that in the standard? Otherwise, I might suggest using a different phrasing (though don't necessarily know what). Well, the standard sometimes uses the term "name lookup tables". Do you think the comment sounds better now?
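To make the two wordings discussed for DWARFDebugPubTable.cpp concrete, here is a minimal standalone sketch that composes both messages in the form quoted from the test case. It deliberately avoids LLVM's own error utilities; the helper names `hexOffset`, `parseFailureMessage` and `unexpectedTerminatorMessage` are hypothetical and exist only for this illustration, and the "error: " prefix shown in the tool output is assumed to be added elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: render an offset as lowercase hex with a 0x prefix.
std::string hexOffset(std::uint64_t Offset) {
  char Buf[2 + 16 + 1]; // "0x" + up to 16 hex digits + terminating NUL
  std::snprintf(Buf, sizeof(Buf), "0x%llx",
                static_cast<unsigned long long>(Offset));
  return Buf;
}

// Generic wording: the underlying read error supplies the detail, so this
// message only names the table that could not be parsed.
std::string parseFailureMessage(std::uint64_t SetOffset,
                                const std::string &CursorError) {
  assert(!CursorError.empty() && "expected a description of the read error");
  return "name lookup table at offset " + hexOffset(SetOffset) +
         " parsing failed: " + CursorError;
}

// Specific wording: a terminator entry was found before the length recorded
// in the table header was reached.
std::string unexpectedTerminatorMessage(std::uint64_t SetOffset,
                                        std::uint64_t TerminatorOffset) {
  return "name lookup table at offset " + hexOffset(SetOffset) +
         " has an unexpected terminator at offset " +
         hexOffset(TerminatorOffset);
}
```

Keeping the first message generic means new failure modes in the reader need no new wording at this level; only the second case, which the reader cannot detect on its own, gets a dedicated message.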
================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:7-8 + +# RUN: not llvm-dwarfdump -debug-gnu-pubnames %t 2> %t.err | FileCheck %s +# RUN: FileCheck %s --input-file=%t.err --check-prefix=ERR + ---------------- jhenderson wrote: > I don't mind too much either way, especially given the difficulties I recently had with the debug line equivalent test, but is there a particular reason you've kept the two streams separate? By combining them you can show the relative position of output for the common case of the streams being combined. That is done to improve readability. The error messages are printed during parsing and dumping of all sets in the section comes after that. Thus, if we want to check all the messages at once, the error messages (or dumping messages) have to be separated from the corresponding lines of source code. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Tue Jul 7 05:58:11 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:58:11 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: jhenderson accepted this revision. jhenderson added a comment. LGTM, with message fix. ================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:3077-3084 + if (!PltSec) + return createError("There is no not empty PLTGOT section at 0x " + + Twine::utohexstr(*DtMipsPltGot)); + + PltRelSec = findNotEmptySectionByAddress(Obj, FileName, *DtJmpRel); + if (!PltRelSec) + return createError("There is no not empty RELPLT section at 0x" + ---------------- grimar wrote: > jhenderson wrote: > > These two cases don't appear to have tests? > > > > Also, the errors should be "not empty" -> "non-empty" and have lower-case first letters. > Oh, right. I've missed them for unknown reason. 
I've added tests for these 2. "not empty" -> "non-empty" (still applies to both errors) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 From llvm-commits at lists.llvm.org Tue Jul 7 06:01:21 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:01:21 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: <0a1baeef9d44df06723518642a160834@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM, I think, but please wait for @dblaikie. ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:3 + +## All four public name sections share the same parser, but slightly different +## code paths are used to reach it. Do a comprehensive check for one of the ---------------- ikudrin wrote: > jhenderson wrote: > > I think this is the first time I've heard the term "public name sections" being used. Is this called that in the standard? Otherwise, I might suggest using a different phrasing (though don't necessarily know what). > Well, the standard sometimes uses the term "name lookup tables". Do you think the comment sounds better now? Looks okay to me. ================ Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s:7-8 + +# RUN: not llvm-dwarfdump -debug-gnu-pubnames %t 2> %t.err | FileCheck %s +# RUN: FileCheck %s --input-file=%t.err --check-prefix=ERR + ---------------- ikudrin wrote: > jhenderson wrote: > > I don't mind too much either way, especially given the difficulties I recently had with the debug line equivalent test, but is there a particular reason you've kept the two streams separate? By combining them you can show the relative position of output for the common case of the streams being combined. > That is done to improve readability.
The error messages are printed during parsing and dumping of all sets in the section comes after that. Thus, if we want to check all the messages at once, the error messages (or dumping messages) have to be separated from the corresponding lines of source code. Thanks, makes sense. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Tue Jul 7 06:03:22 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:03:22 +0000 (UTC) Subject: [PATCH] D81256: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare In-Reply-To: References: Message-ID: <026b7f5cea5acbf5e50946c086ab28c0@localhost.localdomain> serge-sans-paille added a comment. In D81256#2135457 , @foad wrote: > Looks good to me now - though it would still be nice to hear from @qcolombet as the author of `TypePromotionTransaction`. Thanks! I'll leave another day before I merge it if we don't hear from @qcolombet CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81256/new/ https://reviews.llvm.org/D81256 From llvm-commits at lists.llvm.org Tue Jul 7 06:04:31 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:04:31 +0000 (UTC) Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot In-Reply-To: References: Message-ID: <79e11da0a0ed8fa08de9f0a2ce840afe@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1053-1054 + + Optional Arg = + getConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI, true); + ---------------- I think you want just regular getConstantVRegVal. 
I don't think you're getting much from the look through ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1059 + if (Value == 0) { + BuildMI(*BB, &I, DL, TII.get(AMDGPU::S_MOV_B32), DstReg).addImm(0); + } else if (Value == -1) { // all ones ---------------- This would need to be an S_MOV_B64 for wave 64? ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1064 + } else + return false; + } else { ---------------- This should be unreachable code (however, the verifier doesn't check intrinsic operand types so I guess you have to leave this) ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:11-12 +; CHECK: ; %bb.0: +; CHECK-NEXT: s_mov_b32 s0, 0 +; CHECK-NEXT: s_mov_b32 s1, 0 +; CHECK-NEXT: ; return to shader part epilog ---------------- This can be one s_mov_b64 ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:23-24 +; CHECK: ; %bb.0: +; CHECK-NEXT: s_mov_b32 s0, exec_lo +; CHECK-NEXT: s_mov_b32 s1, exec_hi +; CHECK-NEXT: ; return to shader part epilog ---------------- One s_mov_b64 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83214/new/ https://reviews.llvm.org/D83214 From llvm-commits at lists.llvm.org Tue Jul 7 06:08:22 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:08:22 +0000 (UTC) Subject: [PATCH] D83275: [llc] (almost) remove `--print-machineinstrs` In-Reply-To: References: Message-ID: <351f8f96f8d7cf44610ca498c3c5af96@localhost.localdomain> arsenm accepted this revision. arsenm added a comment. 
I've never used -print-machineinstrs and was surprised it existed last time I noticed it ================ Comment at: llvm/test/CodeGen/Hexagon/ifcvt-edge-weight.ll:1 -; RUN: llc -march=hexagon -mcpu=hexagonv5 -hexagon-eif=0 -print-machineinstrs=if-converter %s -o /dev/null 2>&1 | FileCheck %s +; RUN: llc -march=hexagon -mcpu=hexagonv5 -hexagon-eif=0 -print-after=if-converter %s -o /dev/null 2>&1 | FileCheck %s ; Check that the edge weights are updated correctly after if-conversion. ---------------- This test should probably be switched to stop-after and MIR ================ Comment at: llvm/test/DebugInfo/WebAssembly/dbg-value-move-2.ll:1 -; RUN: llc < %s -print-machineinstrs 2>&1 | FileCheck %s +; RUN: llc < %s -print-after=wasm-reg-stackify 2>&1 | FileCheck %s ---------------- Ditto ================ Comment at: llvm/test/DebugInfo/WebAssembly/dbg-value-move.ll:1 -; RUN: llc < %s -print-machineinstrs 2>&1 | FileCheck %s +; RUN: llc < %s -print-after=wasm-reg-stackify 2>&1 | FileCheck %s ---------------- Ditto Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83275/new/ https://reviews.llvm.org/D83275 From llvm-commits at lists.llvm.org Tue Jul 7 06:10:36 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:10:36 +0000 (UTC) Subject: [PATCH] D81947: [PowerPC][PCRelative] Thread Local Storage Support for Initial Exec In-Reply-To: References: Message-ID: <546bdec079b5432537e1f0af9f182267@localhost.localdomain> kamaub updated this revision to Diff 276031. kamaub added a comment. - rearranging conditional logic for emitting `MCSymbolRefExpr::VK_PPC_TLS_PCREL` so that it is less redundant and reads better. - replacing `future` with `pwr10` in test cases and adding better comments. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81947/new/ https://reviews.llvm.org/D81947 Files: llvm/include/llvm/BinaryFormat/ELFRelocs/PowerPC64.def llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp llvm/lib/Target/PowerPC/PPC.h llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/lib/Target/PowerPC/PPCMCInstLower.cpp llvm/test/CodeGen/PowerPC/pcrel-tls-initial-exec.ll llvm/test/MC/PowerPC/pcrel-tls-initial-exec-address-load-reloc.s llvm/test/MC/PowerPC/pcrel-tls-initial-exec-value-load-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D81947.276031.patch Type: text/x-patch Size: 15310 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:15:05 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Tue, 07 Jul 2020 06:15:05 -0700 (PDT) Subject: [llvm] 4a3c3d7 - [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. Message-ID: <5f047559.1c69fb81.f27fc.27b4@mx.google.com> Author: Georgii Rymar Date: 2020-07-07T16:14:51+03:00 New Revision: 4a3c3d741a1711e0da618e4fdaee0b74dd2d6ace URL: https://github.com/llvm/llvm-project/commit/4a3c3d741a1711e0da618e4fdaee0b74dd2d6ace DIFF: https://github.com/llvm/llvm-project/commit/4a3c3d741a1711e0da618e4fdaee0b74dd2d6ace.diff LOG: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. Currently, llvm-readobj calls `report_fatal_error` when an object has both REL and RELA dynamic relocations. llvm-readelf is able to handle this case properly. This patch adds such a test case and adjusts the llvm-readobj code to follow (and be consistent with its own RELR and PLTREL cases). 
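The behaviour change described in the log above can be sketched in isolation as follows. This is a simplified illustration, not llvm-readobj's code: the `Rel` and `Rela` structs and the function `collectDynamicRelocations` are hypothetical stand-ins for the ELF types and the dumper logic. The point it shows is that both regions are visited independently, and each REL entry is widened to a RELA-shaped record with a zero addend, instead of treating the presence of both as fatal.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-ins for the ELF relocation records.
struct Rel {
  std::uint64_t Offset;
  std::uint64_t Info;
};
struct Rela {
  std::uint64_t Offset;
  std::uint64_t Info;
  std::int64_t Addend;
};

// Gather every dynamic relocation in a common RELA-shaped form. The RELA
// and REL regions are handled by two independent loops, so an object that
// carries both simply yields both sets.
std::vector<Rela> collectDynamicRelocations(const std::vector<Rela> &Relas,
                                            const std::vector<Rel> &Rels) {
  std::vector<Rela> Out;
  Out.reserve(Relas.size() + Rels.size());
  for (const Rela &R : Relas)
    Out.push_back(R);
  for (const Rel &R : Rels)
    Out.push_back(Rela{R.Offset, R.Info, 0}); // REL carries no addend field
  assert(Out.size() == Relas.size() + Rels.size() && "no entry may be dropped");
  return Out;
}
```

With one RELA entry at offset 0x1 and one REL entry at offset 0x2, as in the test added by the commit, the sketch returns two records in that order.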
Differential revision: https://reviews.llvm.org/D83232 Added: Modified: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test b/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test index e13492e07cf1..5313ae78bf49 100644 --- a/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test +++ b/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test @@ -463,3 +463,64 @@ ProgramHeaders: Sections: - Section: .rela.dyn - Section: .dynamic + +## Show that when we have both REL and RELA relocations, we dump both sets. +# RUN: yaml2obj --docnum=13 %s -o %t13 +# RUN: llvm-readobj --dyn-relocations %t13 2>&1 | FileCheck %s -DFILE=%t13 --check-prefix=BOTH-RELA-REL-LLVM +# RUN: llvm-readelf --dyn-relocations %t13 2>&1 | FileCheck %s -DFILE=%t13 --check-prefix=BOTH-RELA-REL-GNU + +# BOTH-RELA-REL-LLVM: Dynamic Relocations { +# BOTH-RELA-REL-LLVM-NEXT: 0x1 R_X86_64_NONE - 0x0 +# BOTH-RELA-REL-LLVM-NEXT: 0x2 R_X86_64_NONE - 0x0 +# BOTH-RELA-REL-LLVM-NEXT: } + +# BOTH-RELA-REL-GNU: 'RELA' relocation section at offset 0x78 contains 24 bytes: +# BOTH-RELA-REL-GNU-NEXT: Offset Info Type Symbol's Value Symbol's Name + Addend +# BOTH-RELA-REL-GNU-NEXT: 0000000000000001 0000000000000000 R_X86_64_NONE 0 +# BOTH-RELA-REL-GNU-EMPTY: +# BOTH-RELA-REL-GNU: 'REL' relocation section at offset 0x90 contains 16 bytes: +# BOTH-RELA-REL-GNU-NEXT: Offset Info Type Symbol's Value Symbol's Name +# BOTH-RELA-REL-GNU-NEXT: 0000000000000002 0000000000000000 R_X86_64_NONE + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_DYN + Machine: EM_X86_64 +Sections: + - Name: .rela.dyn + Type: SHT_RELA + Relocations: + - Type: R_X86_64_NONE + Offset: 0x1 + - Name: .rel.dyn + Type: SHT_REL + Relocations: + - Type: R_X86_64_NONE + Offset: 0x2 + - Name: .dynamic + Type: SHT_DYNAMIC + 
Entries: + - Tag: DT_RELA + Value: 0x0 + - Tag: DT_RELASZ + Value: 0x18 + - Tag: DT_RELAENT + Value: 0x18 +## 0x18 == offset of .rel.dyn == size of .rela.dyn section. + - Tag: DT_REL + Value: 0x18 + - Tag: DT_RELSZ + Value: 0x10 + - Tag: DT_RELENT + Value: 0x10 + - Tag: DT_NULL + Value: 0x0 +DynamicSymbols: [] +ProgramHeaders: + - Type: PT_LOAD + Sections: + - Section: .rela.dyn + - Section: .rel.dyn + - Section: .dynamic diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index 247dfd9d6039..7b19cfd42e2d 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -6401,14 +6401,14 @@ void LLVMStyle::printDynamicRelocations(const ELFO *Obj) { const DynRegionInfo &DynRelaRegion = this->dumper()->getDynRelaRegion(); const DynRegionInfo &DynRelrRegion = this->dumper()->getDynRelrRegion(); const DynRegionInfo &DynPLTRelRegion = this->dumper()->getDynPLTRelRegion(); - if (DynRelRegion.Size && DynRelaRegion.Size) - report_fatal_error("There are both REL and RELA dynamic relocations"); + W.startLine() << "Dynamic Relocations {\n"; W.indent(); - if (DynRelaRegion.Size > 0) + if (DynRelaRegion.Size > 0) { for (const Elf_Rela &Rela : this->dumper()->dyn_relas()) printDynamicRelocation(Obj, Rela); - else + } + if (DynRelRegion.Size > 0) { for (const Elf_Rel &Rel : this->dumper()->dyn_rels()) { Elf_Rela Rela; Rela.r_offset = Rel.r_offset; @@ -6416,6 +6416,8 @@ void LLVMStyle::printDynamicRelocations(const ELFO *Obj) { Rela.r_addend = 0; printDynamicRelocation(Obj, Rela); } + } + if (DynRelrRegion.Size > 0) { Elf_Relr_Range Relrs = this->dumper()->dyn_relrs(); std::vector RelrRelas = From llvm-commits at lists.llvm.org Tue Jul 7 06:15:19 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:15:19 +0000 (UTC) Subject: [PATCH] D83232: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object has both REL and RELA. 
In-Reply-To: References: Message-ID: <26490249fb431795d0a7ef64cfd9c83e@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG4a3c3d741a17: [llvm-readobj] - Don't abort when dumping dynamic relocations when an object… (authored by grimar). Changed prior to commit: https://reviews.llvm.org/D83232?vs=275993&id=276033#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83232/new/ https://reviews.llvm.org/D83232 Files: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83232.276033.patch Type: text/x-patch Size: 3640 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:17:10 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:17:10 +0000 (UTC) Subject: [PATCH] D72932: [ARM] Follow AACPS standard for volatile bit-fields access width In-Reply-To: References: Message-ID: <7d9ec1750c562e3b0d18f2ad4ac18f58@localhost.localdomain> dnsampaio updated this revision to Diff 276034. dnsampaio added a comment. Herald added a subscriber: danielkiss. Rebased. The AAPCS now explicitly avoids conflicts with C11 by not imposing any restriction when the natural container would overlap a zero-length bit-field: https://github.com/ARM-software/abi-aa/commit/2334fc7611ede31b33e314ddd0dc90579015b322 Both the 32-bit and 64-bit versions were updated in the same way.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72932/new/ https://reviews.llvm.org/D72932 Files: clang/include/clang/Basic/CodeGenOptions.def clang/include/clang/Driver/Options.td clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGRecordLayout.h clang/lib/CodeGen/CGRecordLayoutBuilder.cpp clang/lib/Frontend/CompilerInvocation.cpp clang/test/CodeGen/aapcs-bitfield.c clang/test/CodeGen/bitfield-2.c -------------- next part -------------- A non-text attachment was scrubbed... Name: D72932.276034.patch Type: text/x-patch Size: 230319 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:17:17 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:17:17 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: JonChesterfield added a comment. I'm not sure we have a consensus on api stability. Usually llvm allows mixing libraries and compilers from different sources, e.g. the various libunwind or compiler-rt vs libgcc. Libomptarget in general appears to be considered fixed and has external users (intel, maybe gcc). The device runtime would be ill served by this default. This is the only openmp device runtime library which works with llvm. It's statically linked, usually as bitcode when performance is important. The code used to handle target offloading for nvptx is spread across the compiler and the runtime, probably not optimally. I'm not familiar with the gcc-nvptx-openmp implementation. Reading through https://gcc.gnu.org/wiki/Offloading strongly suggests a totally independent implementation to this one. I don't think gcc can be using this runtime library for nvptx. It definitely doesn't for amdgcn. Proprietary compilers might be using this code, but we can have no duty of care to toolchains that use this code without telling us they're doing so. 
Therefore the only backwards/forwards compatibility we can strive for is between different versions of clang and the device runtime. That seems potentially useful - one could use a release clang binary while working on the deviceRTL or vice versa. It's an expensive developer convenience though. We would pay with things like the API rot fixed above. Introducing a faster lowering for an openmp construct would mean a redundant path through clang and some version checking to guess which runtime library we're targeting, which is not presently versioned. Likewise moving code between compiler and runtime becomes much more expensive to implement. Getting it wrong is inevitable given our test coverage. I think the project is much better served by assuming that the runtime library used by clang is the one from the same hash in the monorepo. That leaves us free to fix debt and improve performance, at the price of needing to build clang from (near to) trunk while developing the rtl. Perhaps we can embrace API stability later on, once we have things like versioning and a solid optimisation pipeline in place, especially if gcc wants to use the deviceRTL for nvptx. Now is too early. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:22:47 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Tue, 07 Jul 2020 06:22:47 -0700 (PDT) Subject: [llvm] f7522a5 - [llvm-readobj] - Fix indentation in broken-dynamic-reloc.test. NFC. 
Message-ID: <5f047727.1c69fb81.8b07.2137@mx.google.com> Author: Georgii Rymar Date: 2020-07-07T16:22:10+03:00 New Revision: f7522a5823d66303edfd8d872232dd6b07190f42 URL: https://github.com/llvm/llvm-project/commit/f7522a5823d66303edfd8d872232dd6b07190f42 DIFF: https://github.com/llvm/llvm-project/commit/f7522a5823d66303edfd8d872232dd6b07190f42.diff LOG: [llvm-readobj] - Fix indentation in broken-dynamic-reloc.test. NFC. Fix a broken indentation introduced by myself in rG4a3c3d741a17. Added: Modified: llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test b/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test index 5313ae78bf49..9142fc65a025 100644 --- a/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test +++ b/llvm/test/tools/llvm-readobj/ELF/broken-dynamic-reloc.test @@ -497,7 +497,7 @@ Sections: - Name: .rel.dyn Type: SHT_REL Relocations: - - Type: R_X86_64_NONE + - Type: R_X86_64_NONE Offset: 0x2 - Name: .dynamic Type: SHT_DYNAMIC From llvm-commits at lists.llvm.org Tue Jul 7 06:25:26 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:25:26 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: JonChesterfield accepted this revision. JonChesterfield added a comment. This revision is now accepted and ready to land. Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM. @Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?
================ Comment at: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:42 - /// *outlined_function, int16_t - /// IsOMPRuntimeInitialized); OMPRTL_NVPTX__kmpc_kernel_prepare_parallel, ---------------- ABataev wrote: > I think, instead the optimizer can try to detect if the runtime library is used by the kernel and switch this flag to `0` if no runtime calls are used in the kernel. For non-SPMD mode in most cases, the runtime is required, but in some cases, it can be disabled. If we can detect that no runtime calls are used, we should be able to do better than passing a different argument. E.g. delete some setup calls. Failing that, if we want to pass an argument which says 'actually don't do any work', it shouldn't be the same argument used to check whether the runtime has been initialised. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:28:45 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:28:45 +0000 (UTC) Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass In-Reply-To: References: Message-ID: jmorse updated this revision to Diff 276036. jmorse added a comment. Move include to start of file; was on autopilot sorry. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83046/new/ https://reviews.llvm.org/D83046 Files: llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83046.276036.patch Type: text/x-patch Size: 23175 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:29:44 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:29:44 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: ABataev added a comment. In D83268#2135930 , @JonChesterfield wrote: > Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM. > > @Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost? No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately. ================ Comment at: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:42 - /// *outlined_function, int16_t - /// IsOMPRuntimeInitialized); OMPRTL_NVPTX__kmpc_kernel_prepare_parallel, ---------------- JonChesterfield wrote: > ABataev wrote: > > I think, instead the optimizer can try to detect if the runtime library is used by the kernel and switch this flag to `0` if no runtime calls are used in the kernel. For non-SPMD mode in most cases, the runtime is required, but in some cases, it can be disabled. > If we can detect that no runtime calls are used, we should be able to do better than passing a different argument. E.g. delete some setup calls. > > Failing that, if we want to pass an argument which says 'actually don't do any work', it shouldn't be the same argument used to check whether the runtime has been initialised. 
No, I don't think you can do this in all cases Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:32:11 2020 From: llvm-commits at lists.llvm.org (Bjorn Pettersson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:32:11 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: bjope added a comment. I unfortunately still see some problems (related to the Scalarizer changes, and probably this patch): > cat scalarizer-bug.ll ; RUN: opt < %s -scalarizer -S -o - define void @foo() { vector.ph: br label %vector.body115 vector.body115: ; preds = %vector.body115, %vector.ph %vector.recur = phi <4 x i16> [ undef, %vector.ph ], [ %wide.load125, %vector.body115 ] %wide.load125 = load <4 x i16>, <4 x i16>* undef, align 1 br i1 undef, label %middle.block113, label %vector.body115 middle.block113: ; preds = %vector.body115 ret void } ----------------------------------------------------- > ~/opt.master -scalarizer scalarizer-bug.ll -S opt.master: ../lib/IR/Value.cpp:458: void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `!contains(New, this) && "this->replaceAllUsesWith(expr(this)) is NOT valid!"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/uabbpet/opt.master -scalarizer scalarizer-bug.ll -S 1. Running pass 'Function Pass Manager' on module 'scalarizer-bug.ll'. 2. 
Running pass 'Scalarize vector operations' on function '@foo' Abort Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Tue Jul 7 06:33:57 2020 From: llvm-commits at lists.llvm.org (Orlando Cazalet-Hyams via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:33:57 +0000 (UTC) Subject: [PATCH] D82129: [DebugInfo] Drop location ranges for variables which exist entirely outside the variable's scope In-Reply-To: References: Message-ID: Orlando added a comment. In D82129#2134241, @probinson wrote: > I think I didn't fully grasp that the blocks were being (tail-)merged, which makes the scope ambiguous, and all the rest. So I withdraw the objection on that basis. DWARF is fine with multiple variables pointing to the same location, but it's less forgiving about scopes IIRC, much like it can't describe multiple source attributions for an instruction. This all makes me sad, but that's how DWARF is at the moment. > > Is there still an open question about whether this wants to be a cleanup pass or a verifier check? I apologize for losing track. I think we have ruled out a MIR/IR verifier pass, but flagging it in llvm-dwarfdump somehow would still be nice, and I wrote a ticket for fixing up the --statistics (PR46575). Instead, I think the question is now whether this should happen earlier in some way to reduce the number of redundant intrinsics, as David says in his reply below. In D82129#2134609, @dblaikie wrote: > My take on it is that it's probably not practical to do this as a cleanup - it'd mean any time we merge debug locations, etc, we'd have to go check for isolated variable locations that have become invalid. > > (though, inversely: I worry that not cleaning up those variable locations might be a source of IR bloat and algorithmic scaling problems when the debug locations are scanned...
) I chose to do the trimming here because I can say with confidence that it won't cause any coverage or correctness regressions. I agree that it would be nice to remove redundant intrinsics, though I'm not exactly sure what that implementation would entail without further investigation. Is anyone able to offer any insight on this? If not, I suppose it might be reasonable to attempt to do this earlier (in IR) to see if there are any problems, and compare the results. Though I won't be able to get on this for a little while myself. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82129/new/ https://reviews.llvm.org/D82129 From llvm-commits at lists.llvm.org Tue Jul 7 06:34:41 2020 From: llvm-commits at lists.llvm.org (John Fastabend via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:34:41 +0000 (UTC) Subject: [PATCH] D83242: [RFC][BPF] support expr with typedef type for FIELD_EXISTENCE reloc In-Reply-To: References: Message-ID: <4e2b432311e28d0c12fe7db0179ae7e3@localhost.localdomain> jrfastab added a comment. Agreed also useful for us. I can pull it in and test if that is useful. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83242/new/ https://reviews.llvm.org/D83242 From llvm-commits at lists.llvm.org Tue Jul 7 06:36:24 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:36:24 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: jdoerfert marked an inline comment as done. jdoerfert added a comment. > I don't think gcc can be using this runtime library for nvptx. Yes, and: We are (going to) use clang specific intrinsics to avoid CUDA (soon). > Use of the new library with the previous version of the compiler. Except that you cannot generally expect this to work. In our supported use case the library is kept as bitcode (LLVM-IR). Bitcode is not backward compatible. 
An old toolchain (clang, llvm-link, ...) cannot be fed new IR and be expected to work. So, we are already not able to give a stability guarantee here, why pretend we do. The bitcode runtime has to be kept in-sync with the toolchain. ================ Comment at: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:42 - /// *outlined_function, int16_t - /// IsOMPRuntimeInitialized); OMPRTL_NVPTX__kmpc_kernel_prepare_parallel, ---------------- JonChesterfield wrote: > ABataev wrote: > > I think, instead the optimizer can try to detect if the runtime library is used by the kernel and switch this flag to `0` if no runtime calls are used in the kernel. For non-SPMD mode in most cases, the runtime is required, but in some cases, it can be disabled. > If we can detect that no runtime calls are used, we should be able to do better than passing a different argument. E.g. delete some setup calls. > > Failing that, if we want to pass an argument which says 'actually don't do any work', it shouldn't be the same argument used to check whether the runtime has been initialised. We can detect all we want but switching it *does not have any effect*. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:36:41 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:36:41 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <6ca33676f6970fcb2a608079255ca56b@localhost.localdomain> JonChesterfield added a comment. In D83268#2135938 , @ABataev wrote: > > @Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost? > > No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. 
Especially taking into account, that libomp can be built separately. Yes, the existing v# naming and deprecated comments should also go. What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:37:47 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:37:47 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <75af94adfeef8a349cbf80350687479b@localhost.localdomain> jdoerfert added a comment. > Especially taking into account, that libomp can be built separately. This is *not* affecting libomp in any way. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:37:55 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:37:55 +0000 (UTC) Subject: [PATCH] D83149: [gcov] Add __gcov_dump/__gcov_reset and delete __gcov_flush In-Reply-To: References: Message-ID: serge-sans-paille added a comment. Some nit-picking, lgtm otherwise. 
Please wait for @calixte review though :-) ================ Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c:49 dlerror(); - void (*gcov_flush1)() = (void (*)())dlsym(f1_handle, "__gcov_flush"); - if (gcov_flush1 == NULL) { - fprintf(stderr, "unable to find __gcov_flush in func.shared': %s\n", dlerror()); + void (*gcov_reset1)() = (void (*)())dlsym(f1_handle, "__gcov_reset"); + if (gcov_reset1 == NULL) { ---------------- Do we also need to test gcov_flush symbol here too? ================ Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c.gcov:56 +// CHECK-NEXT: 1: 50: if (gcov_reset1 == NULL) { +// CHECK-NEXT: #####: 51: fprintf(stderr, "unable to find __gcov_reset in func.shared': %s\n", dlerror()); // CHECK-NEXT: #####: 52: return EXIT_FAILURE; ---------------- Same question here, what about gcov_flush symbol? ================ Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main_three-libs.c.gcov:55 +// CHECK-NEXT: 1: 49: void (*gcov_reset1)() = (void (*)())dlsym(f1_handle, "__gcov_reset"); +// CHECK-NEXT: 1: 50: if (gcov_reset1 == NULL) { +// CHECK-NEXT: #####: 51: fprintf(stderr, "unable to find __gcov_reset in func.shared': %s\n", dlerror()); ---------------- And here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83149/new/ https://reviews.llvm.org/D83149 From llvm-commits at lists.llvm.org Tue Jul 7 06:41:10 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:41:10 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: lebedev.ri added a comment. 
In D83101#2135945 , @bjope wrote: > I unfortunately still see some problems (related to the Scalarizer changes, and probably this patch): > > > cat scalarizer-bug.ll > ; RUN: opt < %s -scalarizer -S -o - > > define void @foo() { > vector.ph: > br label %vector.body115 > > vector.body115: ; preds = %vector.body115, %vector.ph > %vector.recur = phi <4 x i16> [ undef, %vector.ph ], [ %wide.load125, %vector.body115 ] > %wide.load125 = load <4 x i16>, <4 x i16>* undef, align 1 > br i1 undef, label %middle.block113, label %vector.body115 > > middle.block113: ; preds = %vector.body115 > ret void > } > > > ----------------------------------------------------- > > > ~/opt.master -scalarizer scalarizer-bug.ll -S > opt.master: ../lib/IR/Value.cpp:458: void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `!contains(New, this) && "this->replaceAllUsesWith(expr(this)) is NOT valid!"' failed. > PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. > Stack dump: > 0. Program arguments: /home/uabbpet/opt.master -scalarizer scalarizer-bug.ll -S > 1. Running pass 'Function Pass Manager' on module 'scalarizer-bug.ll'. > 2. Running pass 'Scalarize vector operations' on function '@foo' > Abort > Acknowledged, looking. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Tue Jul 7 06:44:00 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:44:00 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: grimar marked an inline comment as done. grimar added inline comments. 
================ Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:3077-3084 + if (!PltSec) + return createError("There is no not empty PLTGOT section at 0x " + + Twine::utohexstr(*DtMipsPltGot)); + + PltRelSec = findNotEmptySectionByAddress(Obj, FileName, *DtJmpRel); + if (!PltRelSec) + return createError("There is no not empty RELPLT section at 0x" + ---------------- jhenderson wrote: > grimar wrote: > > jhenderson wrote: > > > These two cases don't appear to have tests? > > > > > > Also, the errors should be "not empty" -> "non-empty" and have lower-case first letters. > > Oh, right. I missed them for some unknown reason. I've added tests for these 2. > not empty -> non-empty > > (in both errors still) > (in both errors still) Sorry! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 From llvm-commits at lists.llvm.org Tue Jul 7 06:44:21 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Tue, 07 Jul 2020 06:44:21 -0700 (PDT) Subject: [llvm] e7abed3 - [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). Message-ID: <5f047c35.1c69fb81.3563a.1e42@mx.google.com> Author: Georgii Rymar Date: 2020-07-07T16:43:38+03:00 New Revision: e7abed3d48ec40350006bc76ad5c6c1f64b1bfad URL: https://github.com/llvm/llvm-project/commit/e7abed3d48ec40350006bc76ad5c6c1f64b1bfad DIFF: https://github.com/llvm/llvm-project/commit/e7abed3d48ec40350006bc76ad5c6c1f64b1bfad.diff LOG: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). `MipsGOTParser` is a helper class that is used to dump MIPS GOT and PLT. There is a problem with it: it might call report_fatal_error() on invalid input. When this happens, the tool reports a crash: ``` # command stderr: LLVM ERROR: Cannot find PLTGOT dynamic table tag. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Stack dump: ... ``` Such errors were not tested.
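The failure mode described above (aborting the whole tool on malformed input) can be contrasted with returning the problem to the caller. The following is only a minimal standalone sketch of that idea, not the code from this patch: it does not use LLVM's `Error` type, and `DynTable`, `findGot` and `dump` are hypothetical names made up for illustration. An empty string stands in for success.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a dynamic table: tag name -> value.
using DynTable = std::map<std::string, std::uint64_t>;

// Returns an error message to the caller instead of aborting the process.
// An empty string means success (assumption of this sketch; the real code
// returns llvm::Error). If none of the tags are present there is nothing
// to dump, which is not treated as an error.
std::string findGot(const DynTable &Table) {
  const char *Required[] = {"PLTGOT", "MIPS_LOCAL_GOTNO", "MIPS_GOTSYM"};
  bool AnyPresent = false;
  for (const char *Tag : Required)
    AnyPresent = AnyPresent || Table.count(Tag) != 0;
  if (!AnyPresent)
    return "";
  // Report the first required tag that is missing.
  for (const char *Tag : Required)
    if (Table.count(Tag) == 0)
      return std::string("cannot find ") + Tag + " dynamic tag";
  return "";
}

// The caller decides how to report the problem; here it only formats the
// message in the same shape that the test expectations use.
std::string dump(const std::string &FileName, const DynTable &Table) {
  std::string Err = findGot(Table);
  if (!Err.empty())
    return "error: '" + FileName + "': " + Err;
  return "ok";
}
```

Because the lookup only returns a message, a caller is free to keep going after one table fails, which is the property the commit message is after.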
In this patch I've refactored `MipsGOTParser`: I've split the handling of GOT and PLT into separate methods. This allows propagating any possible errors to the caller and should allow dumping the PLT when something is wrong with the GOT and vice versa in the future. I've added tests for each `report_fatal_error()` and now call `reportError` instead. In the future we might want to switch to reporting warnings, but that requires additional testing and should be performed independently. I've kept `unwrapOrError` calls untouched for now as I'd like to focus on eliminating `report_fatal_error` calls in this patch only. Differential revision: https://reviews.llvm.org/D83225 Added: Modified: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/mips-got.test b/llvm/test/tools/llvm-readobj/ELF/mips-got.test index c13d57233326..54b321bcaae4 100644 --- a/llvm/test/tools/llvm-readobj/ELF/mips-got.test +++ b/llvm/test/tools/llvm-readobj/ELF/mips-got.test @@ -484,3 +484,38 @@ # GNU-GOT-STATIC-NEXT: 00410118 -32744(gp) 00400000 # GNU-GOT-STATIC-NEXT: 0041011c -32740(gp) 00400100 # GNU-GOT-STATIC-NEXT: 00410120 -32736(gp) 00400104 + +## Check we report errors when dynamic tags, needed for dumping GOT, are missing.
+ +# RUN: yaml2obj --docnum=1 -DTAG1=DT_MIPS_LOCAL_GOTNO -DTAG2=DT_MIPS_GOTSYM %s -o %t.err1.o +# RUN: not llvm-readobj -A %t.err1.o 2>&1 | FileCheck %s -DFILE=%t.err1.o -check-prefix ERR1 + +# ERR1: error: '[[FILE]]': cannot find PLTGOT dynamic tag + +# RUN: yaml2obj --docnum=1 -DTAG1=DT_PLTGOT -DTAG2=DT_MIPS_GOTSYM %s -o %t.err2.o +# RUN: not llvm-readobj -A %t.err2.o 2>&1 | FileCheck %s -DFILE=%t.err2.o -check-prefix ERR2 + +# ERR2: error: '[[FILE]]': cannot find MIPS_LOCAL_GOTNO dynamic tag + +# RUN: yaml2obj --docnum=1 -DTAG1=DT_PLTGOT -DTAG2=DT_MIPS_LOCAL_GOTNO %s -o %t.err3.o +# RUN: not llvm-readobj -A %t.err3.o 2>&1 | FileCheck %s -DFILE=%t.err3.o -check-prefix ERR3 + +# ERR3: error: '[[FILE]]': cannot find MIPS_GOTSYM dynamic tag + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_EXEC + Machine: EM_MIPS +Sections: + - Name: .dynamic + Type: SHT_DYNAMIC + Entries: + - Tag: [[TAG1]] + Value: 0 + - Tag: [[TAG2]] + Value: 0 + - Tag: DT_NULL + Value: 0 +DynamicSymbols: [] diff --git a/llvm/test/tools/llvm-readobj/ELF/mips-plt.test b/llvm/test/tools/llvm-readobj/ELF/mips-plt.test index b79237ce6c36..fa62aa98251a 100644 --- a/llvm/test/tools/llvm-readobj/ELF/mips-plt.test +++ b/llvm/test/tools/llvm-readobj/ELF/mips-plt.test @@ -62,3 +62,59 @@ # GNU-NEXT: Address Initial Sym.Val. Type Ndx Name # GNU-NEXT: 0041081c 004007c0 00000000 FUNC UND puts # GNU-NEXT: 00410820 004007c0 00000000 FUNC UND __libc_start_main + +## Check we report errors when dynamic tags, needed for dumping PLT, are missing. 
+ +# RUN: yaml2obj --docnum=1 -DTAG=DT_MIPS_PLTGOT %s -o %t.err1.o +# RUN: not llvm-readobj -A %t.err1.o 2>&1 | FileCheck %s -DFILE=%t.err1.o --check-prefix=ERR1 + +# ERR1: error: '[[FILE]]': cannot find JMPREL dynamic tag + +# RUN: yaml2obj --docnum=1 -DTAG=DT_JMPREL %s -o %t.err2.o +# RUN: not llvm-readobj -A %t.err2.o 2>&1 | FileCheck %s -DFILE=%t.err2.o --check-prefix=ERR2 + +# ERR2: error: '[[FILE]]': cannot find MIPS_PLTGOT dynamic tag + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_EXEC + Machine: EM_MIPS +Sections: + - Name: .dynamic + Type: SHT_DYNAMIC + Entries: + - Tag: [[TAG]] + Value: 0 + - Tag: DT_NULL + Value: 0 + +## Check we report errors when we are unable to find PLTGOT/JMPREL sections. +# RUN: yaml2obj --docnum=2 %s -DVAL1=0xffff -o %t.err3.o +# RUN: not llvm-readobj -A %t.err3.o 2>&1 | FileCheck %s -DFILE=%t.err3.o -check-prefix ERR3 + +# ERR3: error: '[[FILE]]': there is no non-empty PLTGOT section at 0xffff + +# RUN: yaml2obj --docnum=2 %s -DVAL2=0xffff -o %t.err4.o +# RUN: not llvm-readobj -A %t.err4.o 2>&1 | FileCheck %s -DFILE=%t.err4.o -check-prefix ERR4 + +# ERR4: error: '[[FILE]]': there is no non-empty RELPLT section at 0xffff + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_EXEC + Machine: EM_MIPS +Sections: + - Name: .dynamic + Type: SHT_DYNAMIC + Entries: + - Tag: DT_MIPS_PLTGOT + Value: [[VAL1=0]] + - Tag: DT_JMPREL + Value: [[VAL2=0]] + - Tag: DT_NULL + Value: 0 +DynamicSymbols: [] diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index 7b19cfd42e2d..cd3c79d208e4 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -2865,9 +2865,14 @@ template void ELFDumper::printArchSpecificInfo() { MipsGOTParser Parser(Obj, ObjF->getFileName(), dynamic_table(), dynamic_symbols()); - if (Parser.hasGot()) + if (Error E = Parser.findGOT(dynamic_table(), dynamic_symbols())) + reportError(std::move(E), 
ObjF->getFileName()); + else if (!Parser.isGotEmpty()) ELFDumperStyle->printMipsGOT(Parser); - if (Parser.hasPlt()) + + if (Error E = Parser.findPLT(dynamic_table())) + reportError(std::move(E), ObjF->getFileName()); + else if (!Parser.isPltEmpty()) ELFDumperStyle->printMipsPLT(Parser); break; } @@ -2929,9 +2934,11 @@ template class MipsGOTParser { MipsGOTParser(const ELFO *Obj, StringRef FileName, Elf_Dyn_Range DynTable, Elf_Sym_Range DynSyms); + Error findGOT(Elf_Dyn_Range DynTable, Elf_Sym_Range DynSyms); + Error findPLT(Elf_Dyn_Range DynTable); - bool hasGot() const { return !GotEntries.empty(); } - bool hasPlt() const { return !PltEntries.empty(); } + bool isGotEmpty() const { return GotEntries.empty(); } + bool isPltEmpty() const { return PltEntries.empty(); } uint64_t getGp() const; @@ -2979,7 +2986,11 @@ MipsGOTParser::MipsGOTParser(const ELFO *Obj, StringRef FileName, Elf_Sym_Range DynSyms) : IsStatic(DynTable.empty()), Obj(Obj), GotSec(nullptr), LocalNum(0), GlobalNum(0), PltSec(nullptr), PltRelSec(nullptr), PltSymTable(nullptr), - FileName(FileName) { + FileName(FileName) {} + +template +Error MipsGOTParser::findGOT(Elf_Dyn_Range DynTable, + Elf_Sym_Range DynSyms) { // See "Global Offset Table" in Chapter 5 in the following document // for detailed GOT description. // ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf @@ -2988,22 +2999,20 @@ MipsGOTParser::MipsGOTParser(const ELFO *Obj, StringRef FileName, if (IsStatic) { GotSec = findSectionByName(*Obj, FileName, ".got"); if (!GotSec) - return; + return Error::success(); ArrayRef Content = unwrapOrError(FileName, Obj->getSectionContents(GotSec)); GotEntries = Entries(reinterpret_cast(Content.data()), Content.size() / sizeof(Entry)); LocalNum = GotEntries.size(); - return; + return Error::success(); } - // Lookup dynamic table tags which define GOT/PLT layouts. + // Lookup dynamic table tags which define the GOT layout. 
Optional DtPltGot; Optional DtLocalGotNum; Optional DtGotSym; - Optional DtMipsPltGot; - Optional DtJmpRel; for (const auto &Entry : DynTable) { switch (Entry.getTag()) { case ELF::DT_PLTGOT: @@ -3015,6 +3024,47 @@ MipsGOTParser::MipsGOTParser(const ELFO *Obj, StringRef FileName, case ELF::DT_MIPS_GOTSYM: DtGotSym = Entry.getVal(); break; + } + } + + if (!DtPltGot && !DtLocalGotNum && !DtGotSym) + return Error::success(); + + if (!DtPltGot) + return createError("cannot find PLTGOT dynamic tag"); + if (!DtLocalGotNum) + return createError("cannot find MIPS_LOCAL_GOTNO dynamic tag"); + if (!DtGotSym) + return createError("cannot find MIPS_GOTSYM dynamic tag"); + + size_t DynSymTotal = DynSyms.size(); + if (*DtGotSym > DynSymTotal) + return createError("MIPS_GOTSYM exceeds a number of dynamic symbols"); + + GotSec = findNotEmptySectionByAddress(Obj, FileName, *DtPltGot); + if (!GotSec) + return createError("there is no non-empty GOT section at 0x" + + Twine::utohexstr(*DtPltGot)); + + LocalNum = *DtLocalGotNum; + GlobalNum = DynSymTotal - *DtGotSym; + + ArrayRef Content = + unwrapOrError(FileName, Obj->getSectionContents(GotSec)); + GotEntries = Entries(reinterpret_cast(Content.data()), + Content.size() / sizeof(Entry)); + GotDynSyms = DynSyms.drop_front(*DtGotSym); + + return Error::success(); +} + +template +Error MipsGOTParser::findPLT(Elf_Dyn_Range DynTable) { + // Lookup dynamic table tags which define the PLT layout. + Optional DtMipsPltGot; + Optional DtJmpRel; + for (const auto &Entry : DynTable) { + switch (Entry.getTag()) { case ELF::DT_MIPS_PLTGOT: DtMipsPltGot = Entry.getVal(); break; @@ -3024,63 +3074,35 @@ MipsGOTParser::MipsGOTParser(const ELFO *Obj, StringRef FileName, } } - // Find dynamic GOT section. 
- if (DtPltGot || DtLocalGotNum || DtGotSym) { - if (!DtPltGot) - report_fatal_error("Cannot find PLTGOT dynamic table tag."); - if (!DtLocalGotNum) - report_fatal_error("Cannot find MIPS_LOCAL_GOTNO dynamic table tag."); - if (!DtGotSym) - report_fatal_error("Cannot find MIPS_GOTSYM dynamic table tag."); - - size_t DynSymTotal = DynSyms.size(); - if (*DtGotSym > DynSymTotal) - reportError( - createError("MIPS_GOTSYM exceeds a number of dynamic symbols"), - FileName); - - GotSec = findNotEmptySectionByAddress(Obj, FileName, *DtPltGot); - if (!GotSec) - reportError(createError("There is no not empty GOT section at 0x" + - Twine::utohexstr(*DtPltGot)), - FileName); - - LocalNum = *DtLocalGotNum; - GlobalNum = DynSymTotal - *DtGotSym; - - ArrayRef Content = - unwrapOrError(FileName, Obj->getSectionContents(GotSec)); - GotEntries = Entries(reinterpret_cast(Content.data()), - Content.size() / sizeof(Entry)); - GotDynSyms = DynSyms.drop_front(*DtGotSym); - } + if (!DtMipsPltGot && !DtJmpRel) + return Error::success(); // Find PLT section. 
- if (DtMipsPltGot || DtJmpRel) { - if (!DtMipsPltGot) - report_fatal_error("Cannot find MIPS_PLTGOT dynamic table tag."); - if (!DtJmpRel) - report_fatal_error("Cannot find JMPREL dynamic table tag."); - - PltSec = findNotEmptySectionByAddress(Obj, FileName, * DtMipsPltGot); - if (!PltSec) - report_fatal_error("There is no not empty PLTGOT section at 0x " + - Twine::utohexstr(*DtMipsPltGot)); - - PltRelSec = findNotEmptySectionByAddress(Obj, FileName, * DtJmpRel); - if (!PltRelSec) - report_fatal_error("There is no not empty RELPLT section at 0x" + - Twine::utohexstr(*DtJmpRel)); - - ArrayRef PltContent = - unwrapOrError(FileName, Obj->getSectionContents(PltSec)); - PltEntries = Entries(reinterpret_cast(PltContent.data()), - PltContent.size() / sizeof(Entry)); - - PltSymTable = unwrapOrError(FileName, Obj->getSection(PltRelSec->sh_link)); - PltStrTable = - unwrapOrError(FileName, Obj->getStringTableForSymtab(*PltSymTable)); - } + if (!DtMipsPltGot) + return createError("cannot find MIPS_PLTGOT dynamic tag"); + if (!DtJmpRel) + return createError("cannot find JMPREL dynamic tag"); + + PltSec = findNotEmptySectionByAddress(Obj, FileName, *DtMipsPltGot); + if (!PltSec) + return createError("there is no non-empty PLTGOT section at 0x" + + Twine::utohexstr(*DtMipsPltGot)); + + PltRelSec = findNotEmptySectionByAddress(Obj, FileName, *DtJmpRel); + if (!PltRelSec) + return createError("there is no non-empty RELPLT section at 0x" + + Twine::utohexstr(*DtJmpRel)); + + ArrayRef PltContent = + unwrapOrError(FileName, Obj->getSectionContents(PltSec)); + PltEntries = Entries(reinterpret_cast(PltContent.data()), + PltContent.size() / sizeof(Entry)); + + PltSymTable = unwrapOrError(FileName, Obj->getSection(PltRelSec->sh_link)); + PltStrTable = + unwrapOrError(FileName, Obj->getStringTableForSymtab(*PltSymTable)); + + return Error::success(); } template uint64_t MipsGOTParser::getGp() const { From llvm-commits at lists.llvm.org Tue Jul 7 06:44:23 2020 From: llvm-commits at 
lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:44:23 +0000 (UTC) Subject: [PATCH] D83225: [llvm-readobj] - Refactor the MipsGOTParser to stop using report_fatal_error(). In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGe7abed3d48ec: [llvm-readobj] - Refactor the MipsGOTParser<ELFT> to stop using… (authored by grimar). Herald added a subscriber: jrtc27. Changed prior to commit: https://reviews.llvm.org/D83225?vs=276019&id=276038#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83225/new/ https://reviews.llvm.org/D83225 Files: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp From llvm-commits at lists.llvm.org Tue Jul 7 06:44:52 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:44:52 +0000 (UTC) Subject: [PATCH] D83047: [LiveDebugValues] 2/4 Add instruction-referencing LiveDebugValues implementation In-Reply-To: References: Message-ID: <5c77dc4c3385523025797166b58415d6@localhost.localdomain> jmorse updated this revision to Diff 276039. jmorse added a comment. (Rebasing, only affects LiveDebugValues.h) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83047/new/ https://reviews.llvm.org/D83047 Files: llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h
From llvm-commits at lists.llvm.org Tue Jul 7 06:45:04 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:45:04 +0000 (UTC) Subject: [PATCH] D82502: [PowerPC][Power10] Implement Load VSX Vector and Sign Extend and Zero Extend In-Reply-To: References: Message-ID: <32c8916a16a8363b5dffd938a64439d2@localhost.localdomain> lei added inline comments. ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:13791 + // Ensure that the load from the narrow width is being zero extended to i128. + if ((!ValidLDType) || (LD->getValueType(0) != MVT::i128) || + (LD->getExtensionType() != ISD::ZEXTLOAD)) ---------------- nit: don't need `()` around `!ValidLDType` ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:13797 + // we are creating in order to pattern match to the appropriate instruction + // in the backend. + SDValue Width = DAG.getIntPtrConstant(MemoryType.getScalarSizeInBits(), dl); ---------------- I don't think we need to explicitly say this since everything we do here is for pattern matching to appropriate instructions in the backend... ================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:13798 + // in the backend. + SDValue Width = DAG.getIntPtrConstant(MemoryType.getScalarSizeInBits(), dl); + SDValue LoadOps[] = {LD->getChain(), LD->getBasePtr(), Width}; ---------------- Can just merge this into the next line and remove this tmp value. ================ Comment at: llvm/test/CodeGen/PowerPC/p10-vsx-builtins.ll:6 +; RUN: -mcpu=pwr10 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \ +; RUN: FileCheck %s + ---------------- Please add tests for BE.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82502/new/ https://reviews.llvm.org/D82502 From llvm-commits at lists.llvm.org Tue Jul 7 06:46:03 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:46:03 +0000 (UTC) Subject: [PATCH] D83048: [LiveDebugValues] 3/4 Add Xclang and CodeGen options for using instr-ref variable locations In-Reply-To: References: Message-ID: <9c92b5cd90c98f848aa95cc4cc260c00@localhost.localdomain> jmorse updated this revision to Diff 276040. jmorse added a comment. (Rebase, only affecting LiveDebugValues.cpp) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83048/new/ https://reviews.llvm.org/D83048 Files: clang/include/clang/Basic/CodeGenOptions.def clang/include/clang/Driver/CC1Options.td clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/CodeGen/CommandFlags.h llvm/include/llvm/Target/TargetOptions.h llvm/lib/CodeGen/CommandFlags.cpp llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83048.276040.patch Type: text/x-patch Size: 7337 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:46:42 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:46:42 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: ABataev added a comment. In D83268#2135955 , @jdoerfert wrote: > > Especially taking into account, that libomp can be built separately. > > This is *not* affecting libomp in any way. libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:47:42 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:47:42 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <1cb89fe40acdc399e6b7aef1055455a8@localhost.localdomain> ABataev added a comment. In D83268#2135954 , @JonChesterfield wrote: > In D83268#2135938 , @ABataev wrote: > > > > @Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost? > > > > No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately. > > > Yes, the existing v# naming and deprecated comments should also go. > > What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent. Answered already: the previous version of the compiler with the new version of the runtime. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:50:13 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Tue, 07 Jul 2020 06:50:13 -0700 (PDT) Subject: [llvm] ea85ff8 - [X86] Fix a bug that when lowering byval argument Message-ID: <5f047d95.1c69fb81.8ea7e.6a23@mx.google.com> Author: Liu, Chen3 Date: 2020-07-07T21:49:31+08:00 New Revision: ea85ff82c82687496453bc14c4ac60548a42d8f3 URL: https://github.com/llvm/llvm-project/commit/ea85ff82c82687496453bc14c4ac60548a42d8f3 DIFF: https://github.com/llvm/llvm-project/commit/ea85ff82c82687496453bc14c4ac60548a42d8f3.diff LOG: [X86] Fix a bug that when lowering byval argument When an argument has the 'byval' attribute and should be passed on the stack according to the calling convention, a stack copy would be emitted twice. This causes the real value to be put on the stack where the pointer should be passed. Differential Revision: https://reviews.llvm.org/D83175 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86ISelLowering.h llvm/test/CodeGen/X86/win64-byval.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index d7a45f6fb7c4..4821dd44e01f 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -3763,12 +3763,13 @@ SDValue X86TargetLowering::LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const { + ISD::ArgFlagsTy Flags, + bool isByVal) const { unsigned LocMemOffset = VA.getLocMemOffset(); SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl); PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()), StackPtr, PtrOff); - if (Flags.isByVal()) + if (isByVal) return
CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl); return DAG.getStore( @@ -4080,7 +4081,7 @@ X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI, StackPtr = DAG.getCopyFromReg(Chain, dl, RegInfo->getStackRegister(), getPointerTy(DAG.getDataLayout())); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, Arg, - dl, DAG, VA, Flags)); + dl, DAG, VA, Flags, isByVal)); } } diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h index ad76c55e9c6e..7f3dc90a2d73 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.h +++ b/llvm/lib/Target/X86/X86ISelLowering.h @@ -1436,7 +1436,7 @@ namespace llvm { SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const; + ISD::ArgFlagsTy Flags, bool isByval) const; // Call lowering helpers. diff --git a/llvm/test/CodeGen/X86/win64-byval.ll b/llvm/test/CodeGen/X86/win64-byval.ll index 2fefa736e64e..af2af4e6e9b9 100644 --- a/llvm/test/CodeGen/X86/win64-byval.ll +++ b/llvm/test/CodeGen/X86/win64-byval.ll @@ -32,3 +32,31 @@ define void @baz({ float, double }* byval %arg) call void @foo({ float, double }* byval %arg) ret void } + +declare void @foo2({ float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, i64 %f) + at data = external constant { float, double } + +define void @test() { +; CHECK-LABEL: @test +; CHECK: movq (%rax), %rcx +; CHECK-NEXT: movq 8(%rax), %rax +; CHECK-NEXT: movq %rax, 120(%rsp) +; CHECK-NEXT: movq %rcx, 112(%rsp) +; CHECK-NEXT: movq %rcx, 96(%rsp) +; CHECK-NEXT: movq %rax, 104(%rsp) +; CHECK-NEXT: movq %rcx, 80(%rsp) +; CHECK-NEXT: movq %rax, 88(%rsp) +; CHECK-NEXT: movq %rcx, 64(%rsp) +; CHECK-NEXT: movq %rax, 72(%rsp) +; CHECK-NEXT: movq %rax, 56(%rsp) +; CHECK-NEXT: movq %rcx, 48(%rsp) +; CHECK-NEXT: leaq 48(%rsp), %rax +; CHECK-NEXT: movq %rax, 32(%rsp) +; CHECK-NEXT: movq $10, 
40(%rsp) +; CHECK-NEXT: leaq 112(%rsp), %rcx +; CHECK-NEXT: leaq 96(%rsp), %rdx +; CHECK-NEXT: leaq 80(%rsp), %r8 +; CHECK-NEXT: leaq 64(%rsp), %r9 + call void @foo2({ float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, i64 10) + ret void +} From llvm-commits at lists.llvm.org Tue Jul 7 06:50:30 2020 From: llvm-commits at lists.llvm.org (LiuChen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:50:30 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGea85ff82c826: [X86] Fix a bug that when lowering byval argument (authored by LiuChen3). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86ISelLowering.h llvm/test/CodeGen/X86/win64-byval.ll Index: llvm/test/CodeGen/X86/win64-byval.ll =================================================================== --- llvm/test/CodeGen/X86/win64-byval.ll +++ llvm/test/CodeGen/X86/win64-byval.ll @@ -32,3 +32,31 @@ call void @foo({ float, double }* byval %arg) ret void } + +declare void @foo2({ float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, i64 %f) + at data = external constant { float, double } + +define void @test() { +; CHECK-LABEL: @test +; CHECK: movq (%rax), %rcx +; CHECK-NEXT: movq 8(%rax), %rax +; CHECK-NEXT: movq %rax, 120(%rsp) +; CHECK-NEXT: movq %rcx, 112(%rsp) +; CHECK-NEXT: movq %rcx, 96(%rsp) +; CHECK-NEXT: movq %rax, 104(%rsp) +; CHECK-NEXT: movq %rcx, 80(%rsp) +; CHECK-NEXT: movq %rax, 88(%rsp) +; CHECK-NEXT: movq %rcx, 64(%rsp) +; CHECK-NEXT: movq %rax, 72(%rsp) +; CHECK-NEXT: movq %rax, 56(%rsp) +; CHECK-NEXT: movq %rcx, 
48(%rsp) +; CHECK-NEXT: leaq 48(%rsp), %rax +; CHECK-NEXT: movq %rax, 32(%rsp) +; CHECK-NEXT: movq $10, 40(%rsp) +; CHECK-NEXT: leaq 112(%rsp), %rcx +; CHECK-NEXT: leaq 96(%rsp), %rdx +; CHECK-NEXT: leaq 80(%rsp), %r8 +; CHECK-NEXT: leaq 64(%rsp), %r9 + call void @foo2({ float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, i64 10) + ret void +} Index: llvm/lib/Target/X86/X86ISelLowering.h =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.h +++ llvm/lib/Target/X86/X86ISelLowering.h @@ -1436,7 +1436,7 @@ SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const; + ISD::ArgFlagsTy Flags, bool isByval) const; // Call lowering helpers. Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -3763,12 +3763,13 @@ SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const { + ISD::ArgFlagsTy Flags, + bool isByVal) const { unsigned LocMemOffset = VA.getLocMemOffset(); SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl); PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()), StackPtr, PtrOff); - if (Flags.isByVal()) + if (isByVal) return CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl); return DAG.getStore( @@ -4080,7 +4081,7 @@ StackPtr = DAG.getCopyFromReg(Chain, dl, RegInfo->getStackRegister(), getPointerTy(DAG.getDataLayout())); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, Arg, - dl, DAG, VA, Flags)); + dl, DAG, VA, Flags, isByVal)); } } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83175.276044.patch Type: text/x-patch Size: 3438 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:50:58 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:50:58 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <674acf42a1ffd2f971097300e7345000@localhost.localdomain> jdoerfert added a comment. In D83268#2135989 , @ABataev wrote: > In D83268#2135954 , @JonChesterfield wrote: > > > What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent. > > > Answered already: the previous version of the compiler with the new version of the runtime. Still cannot be expected to work: https://reviews.llvm.org/D83268#2135951 Are there other use cases? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:53:05 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:53:05 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: jdoerfert added a comment. In D83268#2135988 , @ABataev wrote: > In D83268#2135955 , @jdoerfert wrote: > > > > Especially taking into account, that libomp can be built separately. > > > > This is *not* affecting libomp in any way. > > > libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no? No they are not. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:53:57 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:53:57 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. Message-ID: paulwalker-arm created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS. However, when calculating the offsets for each of the operands we incorrectly use the element size rather than actual size and thus the stores overlap. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83303 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -0,0 +1,38 @@ +; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s +; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 + +target triple = "aarch64-unknown-linux-gnu" + +; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in +; validating all combinations of vector type. 
+ +define void @concat_vectors_v4i64(<2 x i64> %a, <2 x i64> %b, <4 x i64> *%c.addr) #0 { +; CHECK-LABEL: concat_vectors_v4i64: +; CHECK: stp q0, q1, [sp] +; CHECK: ptrue [[OUT_PG:p[0-9]+]].d, vl4 +; CHECK: mov x[[LO_ADDR:[0-9]+]], sp +; CHECK: ld1d { z{{[0-9]+}}.d }, [[OUT_PG]]/z, [x[[LO_ADDR]]] + %concat = shufflevector <2 x i64> %a, <2 x i64> %b, <4 x i32> + store <4 x i64> %concat, <4 x i64>* %c.addr + ret void +} + +define void @concat_vectors_v8i64(<4 x i64> *%a.addr, <4 x i64> *%b.addr, <8 x i64> *%c.addr) #0 { +; VBITS_GE_512-LABEL: concat_vectors_v8i64: +; VBITS_GE_512: ptrue [[IN_PG:p[0-9]+]].d, vl4 +; VBITS_GE_512: ld1d { [[LO:z[0-9]+]].d }, [[IN_PG]]/z, [x0] +; VBITS_GE_512: ld1d { [[HI:z[0-9]+]].d }, [[IN_PG]]/z, [x1] +; VBITS_GE_512: mov x[[LO_ADDR:[0-9]+]], sp +; VBITS_GE_512: orr x[[HI_ADDR:[0-9]+]], x[[LO_ADDR]], #0x20 +; VBITS_GE_512: st1d { [[LO]].d }, [[IN_PG]], [x[[LO_ADDR]]] +; VBITS_GE_512: st1d { [[HI]].d }, [[IN_PG]], [x[[HI_ADDR]]] +; VBITS_GE_512: ptrue [[OUT_PG:p[0-9]+]].d, vl8 +; VBITS_GE_512: ld1d { z{{[0-9]+}}.d }, [[OUT_PG]]/z, [x8] + %a = load <4 x i64>, <4 x i64>* %a.addr + %b = load <4 x i64>, <4 x i64>* %b.addr + %concat = shufflevector <4 x i64> %a, <4 x i64> %b, <8 x i32> + store <8 x i64> %concat, <8 x i64>* %c.addr + ret void +} + +attributes #0 = { nounwind "target-features"="+sve" } Index: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp @@ -1390,12 +1390,17 @@ } SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) { + assert((Node->getOpcode() == ISD::BUILD_VECTOR || + Node->getOpcode() == ISD::CONCAT_VECTORS) && + "Unexpected opcode!"); + // We can't handle this case efficiently. 
Allocate a sufficiently - // aligned object on the stack, store each element into it, then load + // aligned object on the stack, store each operand into it, then load // the result as a vector. // Create the stack frame object. EVT VT = Node->getValueType(0); - EVT EltVT = VT.getVectorElementType(); + EVT MemVT = isa<BuildVectorSDNode>(Node) ? VT.getVectorElementType() + : Node->getOperand(0).getValueType(); SDLoc dl(Node); SDValue FIPtr = DAG.CreateStackTemporary(VT); int FI = cast<FrameIndexSDNode>(FIPtr.getNode())->getIndex(); @@ -1404,7 +1409,7 @@ // Emit a store of each element to the stack slot. SmallVector<SDValue, 8> Stores; - unsigned TypeByteSize = EltVT.getSizeInBits() / 8; + unsigned TypeByteSize = MemVT.getSizeInBits() / 8; assert(TypeByteSize > 0 && "Vector element type too small for stack store!"); // Store (in the right endianness) the elements to memory. for (unsigned i = 0, e = Node->getNumOperands(); i != e; ++i) { @@ -1417,11 +1422,11 @@ // If the destination vector element type is narrower than the source // element type, only store the bits necessary. - if (EltVT.bitsLT(Node->getOperand(i).getValueType().getScalarType())) { + if (MemVT.bitsLT(Node->getOperand(i).getValueType())) Stores.push_back(DAG.getTruncStore(DAG.getEntryNode(), dl, Node->getOperand(i), Idx, - PtrInfo.getWithOffset(Offset), EltVT)); - } else + PtrInfo.getWithOffset(Offset), MemVT)); + else Stores.push_back(DAG.getStore(DAG.getEntryNode(), dl, Node->getOperand(i), Idx, PtrInfo.getWithOffset(Offset))); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83303.276049.patch Type: text/x-patch Size: 4320 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:54:37 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:54:37 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: ABataev added a comment. 
In D83268#2136021 , @jdoerfert wrote: > In D83268#2135988 , @ABataev wrote: > > > In D83268#2135955 , @jdoerfert wrote: > > > > > > Especially taking into account, that libomp can be built separately. > > > > > > This is *not* affecting libomp in any way. > > > > > > libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no? > > > No they are not. `llvm-project/openmp/libomptarget` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:56:41 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:56:41 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: ABataev added a comment. In D83268#2135951 , @jdoerfert wrote: > > I don't think gcc can be using this runtime library for nvptx. > > Yes, and: We are (going to) use clang specific intrinsics to avoid CUDA (soon). > > > Use of the new library with the previous version of the compiler. > > Except that you cannot generally expect this to work. In our supported use case the library is kept as bitcode (LLVM-IR). Bitcode is not backward compatible. An old toolchain (clang, llvm-link, ...) cannot be fed new IR and be expected to work. So, we are already not able to give a stability guarantee here, why pretend we do. The bitcode runtime has to be kept in-sync with the toolchain. There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level? Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as `.a` library, without LLVM IR inlining. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:57:44 2020 From: llvm-commits at lists.llvm.org (Anirudh Prasad via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:57:44 +0000 (UTC) Subject: [PATCH] D83251: [SystemZ] Allow specifying integer registers as part of the address calculation In-Reply-To: References: Message-ID: anirudhp updated this revision to Diff 276050. anirudhp added a comment. - Addressing the clang-tidy warning, adding const to the auto var - Removing unnecessary parenthesis in a ternary assignment Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83251/new/ https://reviews.llvm.org/D83251 Files: llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/test/MC/SystemZ/insn-bad.s llvm/test/MC/SystemZ/insn-good-z13.s llvm/test/MC/SystemZ/insn-good-z14.s llvm/test/MC/SystemZ/insn-good-z15.s llvm/test/MC/SystemZ/insn-good.s llvm/test/MC/SystemZ/regs-good.s llvm/test/MC/SystemZ/tokens.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83251.276050.patch Type: text/x-patch Size: 40228 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:59:37 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:59:37 +0000 (UTC) Subject: [PATCH] D83304: [LiveDebugValues][NFC] 0/4 Move LiveDebugValues source file ahead of refactor Message-ID: jmorse created this revision. jmorse added reviewers: aprantl, vsk, djtodoro, probinson, Orlando, StephenTozer, TWeaver. Herald added subscribers: llvm-commits, hiraditya, mgorny. Herald added a project: LLVM. This is the base patch of the stack from D83046 , where a new implementation of the pass is being merged in. 
To allow arc and other tools to construct the stack correctly, this base patch moves one source file, ahead of other things being built around it. (No functional change). Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83304 Files: llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/LiveDebugValues.cpp llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp Index: llvm/lib/CodeGen/CMakeLists.txt =================================================================== --- llvm/lib/CodeGen/CMakeLists.txt +++ llvm/lib/CodeGen/CMakeLists.txt @@ -49,7 +49,6 @@ LatencyPriorityQueue.cpp LazyMachineBlockFrequencyInfo.cpp LexicalScopes.cpp - LiveDebugValues.cpp LiveDebugVariables.cpp LiveIntervals.cpp LiveInterval.cpp @@ -182,6 +181,8 @@ WinEHPrepare.cpp XRayInstrumentation.cpp + LiveDebugValues/VarLocBasedImpl.cpp + ADDITIONAL_HEADER_DIRS ${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen ${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen/PBQP -------------- next part -------------- A non-text attachment was scrubbed... Name: D83304.276046.patch Type: text/x-patch Size: 596 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 06:59:40 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:59:40 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <15aae2f23c2cbc4ad40373217bd87561@localhost.localdomain> jdoerfert added a comment. In D83268#2136031 , @ABataev wrote: > There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level? That is the point. They might or might not be, right? There is no guarantee they are. > Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as `.a` library, without LLVM IR inlining. That mode is deprecated. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 06:59:42 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 13:59:42 +0000 (UTC) Subject: [PATCH] D83202: [Bitfields][NFC] Make sure bitfields are contiguous In-Reply-To: References: Message-ID: gchatelet updated this revision to Diff 276051. gchatelet added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83202/new/ https://reviews.llvm.org/D83202 Files: llvm/include/llvm/ADT/Bitfields.h llvm/include/llvm/IR/InstrTypes.h llvm/include/llvm/IR/Instruction.h llvm/include/llvm/IR/Instructions.h llvm/unittests/ADT/BitFieldsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83202.276051.patch Type: text/x-patch Size: 11588 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 07:00:38 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:00:38 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: jdoerfert added a comment. In D83268#2136029 , @ABataev wrote: > `llvm-project/openmp/libomptarget` Please use more words. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 07:00:44 2020 From: llvm-commits at lists.llvm.org (LiuChen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:00:44 +0000 (UTC) Subject: [PATCH] D83175: [X86] Fix a bug that when lowering byval argument In-Reply-To: References: Message-ID: <4894825963f6bed81d43202eb17388ed@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". 
This revision was automatically updated to reflect the committed changes. Closed by commit rGea85ff82c826: [X86] Fix a bug that when lowering byval argument (authored by LiuChen3). Changed prior to commit: https://reviews.llvm.org/D83175?vs=275547&id=275662#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83175/new/ https://reviews.llvm.org/D83175 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86ISelLowering.h llvm/test/CodeGen/X86/win64-byval.ll Index: llvm/test/CodeGen/X86/win64-byval.ll =================================================================== --- llvm/test/CodeGen/X86/win64-byval.ll +++ llvm/test/CodeGen/X86/win64-byval.ll @@ -32,3 +32,31 @@ call void @foo({ float, double }* byval %arg) ret void } + +declare void @foo2({ float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, { float, double }* byval, i64 %f) + at data = external constant { float, double } + +define void @test() { +; CHECK-LABEL: @test +; CHECK: movq (%rax), %rcx +; CHECK-NEXT: movq 8(%rax), %rax +; CHECK-NEXT: movq %rax, 120(%rsp) +; CHECK-NEXT: movq %rcx, 112(%rsp) +; CHECK-NEXT: movq %rcx, 96(%rsp) +; CHECK-NEXT: movq %rax, 104(%rsp) +; CHECK-NEXT: movq %rcx, 80(%rsp) +; CHECK-NEXT: movq %rax, 88(%rsp) +; CHECK-NEXT: movq %rcx, 64(%rsp) +; CHECK-NEXT: movq %rax, 72(%rsp) +; CHECK-NEXT: movq %rax, 56(%rsp) +; CHECK-NEXT: movq %rcx, 48(%rsp) +; CHECK-NEXT: leaq 48(%rsp), %rax +; CHECK-NEXT: movq %rax, 32(%rsp) +; CHECK-NEXT: movq $10, 40(%rsp) +; CHECK-NEXT: leaq 112(%rsp), %rcx +; CHECK-NEXT: leaq 96(%rsp), %rdx +; CHECK-NEXT: leaq 80(%rsp), %r8 +; CHECK-NEXT: leaq 64(%rsp), %r9 + call void @foo2({ float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, { float, double }* byval @G, i64 10) + ret void +} Index: llvm/lib/Target/X86/X86ISelLowering.h =================================================================== --- 
llvm/lib/Target/X86/X86ISelLowering.h +++ llvm/lib/Target/X86/X86ISelLowering.h @@ -1436,7 +1436,7 @@ SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const; + ISD::ArgFlagsTy Flags, bool isByval) const; // Call lowering helpers. Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -3763,12 +3763,13 @@ SDValue Arg, const SDLoc &dl, SelectionDAG &DAG, const CCValAssign &VA, - ISD::ArgFlagsTy Flags) const { + ISD::ArgFlagsTy Flags, + bool isByVal) const { unsigned LocMemOffset = VA.getLocMemOffset(); SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl); PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()), StackPtr, PtrOff); - if (Flags.isByVal()) + if (isByVal) return CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl); return DAG.getStore( @@ -4080,7 +4081,7 @@ StackPtr = DAG.getCopyFromReg(Chain, dl, RegInfo->getStackRegister(), getPointerTy(DAG.getDataLayout())); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, Arg, - dl, DAG, VA, Flags)); + dl, DAG, VA, Flags, isByVal)); } } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83175.275662.patch Type: text/x-patch Size: 3438 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 07:01:15 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:01:15 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <839dbeaf2ca28a50c19f8c3071bd94bc@localhost.localdomain> ABataev added a comment. In D83268#2136054 , @jdoerfert wrote: > In D83268#2136029 , @ABataev wrote: > > > `llvm-project/openmp/libomptarget` > > > Please use more words. 
libomptarget is part of libomp Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 07:01:19 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:01:19 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. In-Reply-To: References: Message-ID: paulwalker-arm marked an inline comment as done. paulwalker-arm added inline comments. ================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll:13 +; CHECK: ptrue [[OUT_PG:p[0-9]+]].d, vl4 +; CHECK: mov x[[LO_ADDR:[0-9]+]], sp +; CHECK: ld1d { z{{[0-9]+}}.d }, [[OUT_PG]]/z, [x[[LO_ADDR]]] ---------------- The tests are a little looser than I'd like but this mov prevents me from using CHECK-DAG because there's a similar unrelated mov that's part of the function prolog. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83303/new/ https://reviews.llvm.org/D83303 From llvm-commits at lists.llvm.org Tue Jul 7 07:04:22 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:04:22 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: ABataev added a comment. In D83268#2136049 , @jdoerfert wrote: > In D83268#2136031 , @ABataev wrote: > > > There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level? > > > That is the point. They might or might not be, right? There is no guarantee they are. Better to ask the users. Maybe, send an RFC to openmp-devs? > > >> Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as `.a` library, without LLVM IR inlining. > > That mode is deprecated. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 07:04:50 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Tue, 07 Jul 2020 07:04:50 -0700 (PDT) Subject: [llvm] 16266e6 - [Scalarizer] When gathering scattered scalar, don't replace it with itself Message-ID: <5f048102.1c69fb81.673ba.992e@mx.google.com> Author: Roman Lebedev Date: 2020-07-07T17:03:53+03:00 New Revision: 16266e63963ad6ee27ad21983a9366ab313dfd03 URL: https://github.com/llvm/llvm-project/commit/16266e63963ad6ee27ad21983a9366ab313dfd03 DIFF: https://github.com/llvm/llvm-project/commit/16266e63963ad6ee27ad21983a9366ab313dfd03.diff LOG: [Scalarizer] When gathering scattered scalar, don't replace it with itself The (previously-crashing) test-case would cause us to seemingly-harmlessly replace some use with something else, but we can't replace it with itself, so we would crash. 
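(Illustration only, not part of the commit.) The log message above describes a guard against replacing a value with itself. A minimal stand-alone sketch of that idea follows. `ToyValue` and `finishSketch` are hypothetical names invented here; they only loosely mimic `llvm::Value` and `ScalarizerVisitor::finish()`, and the real change applies the check only in the single-component branch shown in the diff.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an IR value. Hypothetical: this is not llvm::Value.
struct ToyValue {
  std::string Name;
  std::vector<ToyValue **> UseSlots; // pointers that currently refer to this value

  // Redirect every recorded use to New. As with the real RAUW,
  // replacing a value with itself is treated as a programming error.
  void replaceAllUsesWith(ToyValue *New) {
    assert(New != this && "cannot replace a value with itself");
    for (ToyValue **Slot : UseSlots) {
      *Slot = New;
      New->UseSlots.push_back(Slot);
    }
    UseSlots.clear();
  }
};

// Walk (Op, Res) pairs and replace Op with Res, skipping pairs where
// both sides are the same value. Returns the number of replacements.
unsigned finishSketch(std::vector<std::pair<ToyValue *, ToyValue *>> &Gathered) {
  unsigned NumReplaced = 0;
  for (auto &P : Gathered) {
    ToyValue *Op = P.first;
    ToyValue *Res = P.second;
    if (Op == Res)
      continue; // the guard: nothing to do, and RAUW would assert
    Res->Name = std::move(Op->Name); // rough analogue of takeName()
    Op->replaceAllUsesWith(Res);
    ++NumReplaced;
  }
  return NumReplaced;
}
```

Without the `Op == Res` check, the second pair in a list such as `{{&A, &B}, {&B, &B}}` would trip the assertion in `replaceAllUsesWith`, which is the shape of the crash reported in D83101.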
Added: Modified: llvm/lib/Transforms/Scalar/Scalarizer.cpp llvm/test/Transforms/Scalarizer/crash-bug.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index e8fea501b1d8..3d650c66a862 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -944,6 +944,8 @@ bool ScalarizerVisitor::finish() { } else { assert(CV.size() == 1 && Op->getType() == CV[0]->getType()); Res = CV[0]; + if (Op == Res) + continue; } Res->takeName(Op); Op->replaceAllUsesWith(Res); diff --git a/llvm/test/Transforms/Scalarizer/crash-bug.ll b/llvm/test/Transforms/Scalarizer/crash-bug.ll index d581707971e7..97756362d424 100644 --- a/llvm/test/Transforms/Scalarizer/crash-bug.ll +++ b/llvm/test/Transforms/Scalarizer/crash-bug.ll @@ -22,3 +22,32 @@ bb3: ret void } +; See https://reviews.llvm.org/D83101#2135945 +define void @f1_crash(<2 x i16> %base, i1 %c, <2 x i16>* %ptr) { +; CHECK-LABEL: @f1_crash( +; CHECK: vector.ph: +; CHECK: %base.i0 = extractelement <2 x i16> %base, i32 0 +; CHECK: %base.i1 = extractelement <2 x i16> %base, i32 1 +; CHECK: br label %vector.body115 +; CHECK: vector.body115: ; preds = %vector.body115, %vector.ph +; CHECK: %vector.recur.i0 = phi i16 [ %base.i0, %vector.ph ], [ %wide.load125.i0, %vector.body115 ] +; CHECK: %vector.recur.i1 = phi i16 [ %base.i1, %vector.ph ], [ %wide.load125.i1, %vector.body115 ] +; CHECK: %wide.load125 = load <2 x i16>, <2 x i16>* %ptr, align 1 +; CHECK: %wide.load125.i0 = extractelement <2 x i16> %wide.load125, i32 0 +; CHECK: %wide.load125.i1 = extractelement <2 x i16> %wide.load125, i32 1 +; CHECK: br i1 %c, label %middle.block113, label %vector.body115 +; CHECK: middle.block113: ; preds = %vector.body115 +; CHECK: ret void +; CHECK: } + +vector.ph: + br label %vector.body115 + +vector.body115: + %vector.recur = phi <2 x i16> [ %base, %vector.ph ], [ 
%wide.load125, %vector.body115 ] + %wide.load125 = load <2 x i16>, <2 x i16>* %ptr, align 1 + br i1 %c, label %middle.block113, label %vector.body115 + +middle.block113: + ret void +} From llvm-commits at lists.llvm.org Tue Jul 7 07:06:32 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:06:32 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: lebedev.ri added a comment. In D83101#2135962 , @lebedev.ri wrote: > In D83101#2135945 , @bjope wrote: > > > I unfortunately still see some problems (related to the Scalarizer changes, and probably this patch): > > > > > cat scalarizer-bug.ll > > ; RUN: opt < %s -scalarizer -S -o - > > > > define void @foo() { > > vector.ph: > > br label %vector.body115 > > > > vector.body115: ; preds = %vector.body115, %vector.ph > > %vector.recur = phi <4 x i16> [ undef, %vector.ph ], [ %wide.load125, %vector.body115 ] > > %wide.load125 = load <4 x i16>, <4 x i16>* undef, align 1 > > br i1 undef, label %middle.block113, label %vector.body115 > > > > middle.block113: ; preds = %vector.body115 > > ret void > > } > > > > > > ----------------------------------------------------- > > > > > ~/opt.master -scalarizer scalarizer-bug.ll -S > > opt.master: ../lib/IR/Value.cpp:458: void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `!contains(New, this) && "this->replaceAllUsesWith(expr(this)) is NOT valid!"' failed. > > PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. > > Stack dump: > > 0. Program arguments: /home/uabbpet/opt.master -scalarizer scalarizer-bug.ll -S > > 1. Running pass 'Function Pass Manager' on module 'scalarizer-bug.ll'. > > 2. Running pass 'Scalarize vector operations' on function '@foo' > > Abort > > > > > Acknowledged, looking. Hm, this is saddening. 
I've fixed it in rG16266e63963ad6ee27ad21983a9366ab313dfd03 , but are there more? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Tue Jul 7 07:07:33 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:07:33 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: jdoerfert added a comment. In D83268#2136055 , @ABataev wrote: > In D83268#2136054 , @jdoerfert wrote: > > > In D83268#2136029 , @ABataev wrote: > > > > > `llvm-project/openmp/libomptarget` > > > > > > Please use more words. > > > libomptarget is part of libomp As mentioned before, no it is not. Despite the similarity in name, libomp and libomptarget are distinct libraries, this was a conscious design choice. FWIW, this patch does *not* modify libomptarget either. This modifies *the device runtime*, aka. libomptarget-nvptx-sm_XXX.bc. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 07:07:35 2020 From: llvm-commits at lists.llvm.org (Aaron Ballman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:07:35 +0000 (UTC) Subject: [PATCH] D80301: [yaml][clang-tidy] Fix multiline YAML serialization In-Reply-To: References: Message-ID: <328025d1b47fcb5c1d424ba31d48a638@localhost.localdomain> aaron.ballman accepted this revision. aaron.ballman added a comment. This revision is now accepted and ready to land. LGTM, this seems like a pretty clean solution. Thanks! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80301/new/ https://reviews.llvm.org/D80301 From llvm-commits at lists.llvm.org Tue Jul 7 07:09:14 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:09:14 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: lebedev.ri added a comment. Forgot to say, thank you for the reduced reproducer! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Tue Jul 7 07:11:05 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via llvm-commits) Date: Tue, 07 Jul 2020 07:11:05 -0700 (PDT) Subject: [llvm] c9fb7f8 - [DEBUGINFO]Add dwarf versions to the test, NFC. Message-ID: <5f048279.1c69fb81.54774.230e@mx.google.com> Author: Alexey Bataev Date: 2020-07-07T10:10:44-04:00 New Revision: c9fb7f81715a321a00096e0463b0e8635204641e URL: https://github.com/llvm/llvm-project/commit/c9fb7f81715a321a00096e0463b0e8635204641e DIFF: https://github.com/llvm/llvm-project/commit/c9fb7f81715a321a00096e0463b0e8635204641e.diff LOG: [DEBUGINFO]Add dwarf versions to the test, NFC. 
Added: Modified: llvm/test/DebugInfo/X86/packed_bitfields.ll Removed: ################################################################################ diff --git a/llvm/test/DebugInfo/X86/packed_bitfields.ll b/llvm/test/DebugInfo/X86/packed_bitfields.ll index 66768f98c02c..c36df72439eb 100644 --- a/llvm/test/DebugInfo/X86/packed_bitfields.ll +++ b/llvm/test/DebugInfo/X86/packed_bitfields.ll @@ -1,5 +1,7 @@ -; RUN: llc -mtriple x86_64-apple-macosx -O0 -filetype=obj -o %t_le.o %s -; RUN: llvm-dwarfdump -v -debug-info %t_le.o | FileCheck %s +; RUN: llc -dwarf-version=2 -mtriple x86_64-apple-macosx -O0 -filetype=obj -o %t_2_le.o %s +; RUN: llvm-dwarfdump -v -debug-info %t_2_le.o | FileCheck %s +; RUN: llc -dwarf-version=4 -debugger-tune=gdb -mtriple x86_64-apple-macosx -O0 -filetype=obj -o %t_4_le.o %s +; RUN: llvm-dwarfdump -v -debug-info %t_4_le.o | FileCheck %s ; Produced at -O0 from: ; struct { @@ -16,7 +18,7 @@ ; CHECK: DW_AT_byte_size {{.*}} (0x01) ; CHECK-NEXT: DW_AT_bit_size {{.*}} (0x06) ; CHECK-NEXT: DW_AT_bit_offset {{.*}} (0xffffffffffffffff) -; CHECK-NEXT: DW_AT_data_member_location {{.*}} (DW_OP_plus_uconst 0x0) +; CHECK-NEXT: DW_AT_data_member_location {{.*}} ({{.*}}0x0{{0*}}) ; ModuleID = 'repro.c' source_filename = "repro.c" From llvm-commits at lists.llvm.org Tue Jul 7 07:11:12 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:11:12 +0000 (UTC) Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions. In-Reply-To: References: Message-ID: Ayal added a comment. In D81416#2096034 , @AaronLiu wrote: > In D81416#2095961 , @spatel wrote: > > > IIUC, we should add a test under test/Transforms/PhaseOrdering with -O2 to show the cooperative effect of the 2 vectorizers rather than a stand-alone SLP test. 
> > If you can push that test with full baseline CHECK lines and then apply this patch and show test diffs, that would make it much easier to tell what is intended with this patch.
>
>
> Thanks for the comment. This patch does not intend to change or test phase ordering. In this patch, we interleave for small loops with scalar reductions which cannot be vectorized by LV, and later on SLP captures the opportunities. Interleaving is done by LV, and vectorization is done by SLP.

LV *cannot* vectorize the loop, or will not do so? If LV can interleave the loop, it should be able to also/instead vectorize it, unless there are some other obstacles? Is this an issue of LV's cost-model being more conservative than SLP's? If so, would updating LV's cost-model be a (more) appropriate remedy, than convincing LV to unroll for SLP? The term "small loops" may be confusing; it presumably relates to loops having small number of instructions or low ILP(?), rather than small trip-count.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81416/new/

https://reviews.llvm.org/D81416

From llvm-commits at lists.llvm.org Tue Jul 7 07:11:48 2020
From: llvm-commits at lists.llvm.org (Nathan James via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:11:48 +0000 (UTC)
Subject: [PATCH] D83305: [ADT] Fix join_impl using the wrong size when calculating total length
Message-ID:

njames93 created this revision.
njames93 added reviewers: joerg, chandlerc.
Herald added subscribers: llvm-commits, dexonsmith.
Herald added a project: LLVM.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83305

Files:
  llvm/include/llvm/ADT/StringExtras.h

Index: llvm/include/llvm/ADT/StringExtras.h
===================================================================
--- llvm/include/llvm/ADT/StringExtras.h
+++ llvm/include/llvm/ADT/StringExtras.h
@@ -338,7 +338,7 @@
 
   size_t Len = (std::distance(Begin, End) - 1) * Separator.size();
   for (IteratorT I = Begin; I != End; ++I)
-    Len += (*Begin).size();
+    Len += I->size();
   S.reserve(Len);
   S += (*Begin);
   while (++Begin != End) {

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83305.276052.patch
Type: text/x-patch
Size: 445 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 07:21:05 2020
From: llvm-commits at lists.llvm.org (Jonas Hahnfeld via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:21:05 +0000 (UTC)
Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments
In-Reply-To:
References:
Message-ID: <66cc765ee6c473de8742d0ae31e1c0ce@localhost.localdomain>

Hahnfeld added a comment.

In D83268#2135930, @JonChesterfield wrote:

> Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM.
>
> @Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

I'm neither, and I've long argued that being able to build the OpenMP runtime(s) without Clang trunk is an important use case. These arguments have gone largely unheard, so I'll not join this discussion once more.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83268/new/

https://reviews.llvm.org/D83268

From llvm-commits at lists.llvm.org Tue Jul 7 07:23:04 2020
From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:23:04 +0000 (UTC)
Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields.
In-Reply-To:
References:
Message-ID: <65930215e855dc070ae3809241cedcd6@localhost.localdomain>

ABataev updated this revision to Diff 276057.
ABataev added a comment.

Rebase + used `align...` functions.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82881/new/

https://reviews.llvm.org/D82881

Files:
  llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
  llvm/test/DebugInfo/X86/packed_bitfields.ll

Index: llvm/test/DebugInfo/X86/packed_bitfields.ll
===================================================================
--- llvm/test/DebugInfo/X86/packed_bitfields.ll
+++ llvm/test/DebugInfo/X86/packed_bitfields.ll
@@ -15,9 +15,9 @@
 ; CHECK: DW_TAG_member
 ; CHECK-NEXT: DW_AT_name{{.*}}"a"
 ; CHECK-NOT: DW_TAG_member
-; CHECK: DW_AT_byte_size {{.*}} (0x01)
+; CHECK: DW_AT_byte_size {{.*}} (0x02)
 ; CHECK-NEXT: DW_AT_bit_size {{.*}} (0x06)
-; CHECK-NEXT: DW_AT_bit_offset {{.*}} (0xffffffffffffffff)
+; CHECK-NEXT: DW_AT_bit_offset {{.*}} (0x07)
 ; CHECK-NEXT: DW_AT_data_member_location {{.*}} ({{.*}}0x0{{0*}})
 
 ; ModuleID = 'repro.c'
Index: llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
===================================================================
--- llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -1541,12 +1541,15 @@
 
     bool IsBitfield = FieldSize && Size != FieldSize;
     if (IsBitfield) {
+      uint64_t Offset = DT->getOffsetInBits();
       // Handle bitfield, assume bytes are 8 bits.
+      uint64_t StorageSize = alignTo(Offset + Size, 8) - alignDown(Offset, 8);
+      if (StorageSize > FieldSize)
+        FieldSize = StorageSize;
       if (DD->useDWARF2Bitfields())
         addUInt(MemberDie, dwarf::DW_AT_byte_size, None, FieldSize/8);
       addUInt(MemberDie, dwarf::DW_AT_bit_size, None, Size);
 
-      uint64_t Offset = DT->getOffsetInBits();
       // We can't use DT->getAlignInBits() here: AlignInBits for member type
       // is non-zero if and only if alignment was forced (e.g. _Alignas()),
       // which can't be done with bitfields. Thus we use FieldSize here.

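The storage-size arithmetic added in the DwarfUnit.cpp hunk can be tried in isolation. The following is only a sketch, not the code from the patch: `alignUpTo`, `alignDownTo` and `storageSizeInBits` are hypothetical stand-ins, assuming that `alignTo` and `alignDown` round up and down to a multiple of the given alignment, and the offsets and sizes are made-up values rather than ones taken from packed_bitfields.ll.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the rounding helpers used in the hunk
// (assumption: round V up / down to the nearest multiple of Align).
constexpr uint64_t alignUpTo(uint64_t V, uint64_t Align) {
  return (V + Align - 1) / Align * Align;
}
constexpr uint64_t alignDownTo(uint64_t V, uint64_t Align) {
  return V / Align * Align;
}

// Number of bits in the smallest run of whole bytes that covers a bitfield
// starting at OffsetInBits and spanning SizeInBits (bytes assumed 8 bits).
constexpr uint64_t storageSizeInBits(uint64_t OffsetInBits,
                                     uint64_t SizeInBits) {
  return alignUpTo(OffsetInBits + SizeInBits, 8) - alignDownTo(OffsetInBits, 8);
}

// A 6-bit field at bit 0 fits in one byte.
static_assert(storageSizeInBits(0, 6) == 8, "one byte");
// A 6-bit field at bit 7 covers bits 7..12, so it straddles two bytes.
static_assert(storageSizeInBits(7, 6) == 16, "two bytes");
// A byte-aligned 8-bit field occupies exactly one byte.
static_assert(storageSizeInBits(8, 8) == 8, "one byte");
```

In the hunk, a result larger than `FieldSize` replaces `FieldSize`, which would be consistent with the expected `DW_AT_byte_size` in the test moving from 0x01 to 0x02 for a field that crosses a byte boundary.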
-------------- next part -------------- A non-text attachment was scrubbed... Name: D82881.276057.patch Type: text/x-patch Size: 1670 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 07:26:59 2020 From: llvm-commits at lists.llvm.org (Bjorn Pettersson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:26:59 +0000 (UTC) Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index In-Reply-To: References: Message-ID: <452a5909b607f3d00eb74848322f2e24@localhost.localdomain> bjope added a comment. In D83101#2136062 , @lebedev.ri wrote: > In D83101#2135962 , @lebedev.ri wrote: > > > In D83101#2135945 , @bjope wrote: > > > > > I unfortunately still see some problems (related to the Scalarizer changes, and probably this patch): > > > > > > > cat scalarizer-bug.ll > > > ; RUN: opt < %s -scalarizer -S -o - > > > > > > define void @foo() { > > > vector.ph: > > > br label %vector.body115 > > > > > > vector.body115: ; preds = %vector.body115, %vector.ph > > > %vector.recur = phi <4 x i16> [ undef, %vector.ph ], [ %wide.load125, %vector.body115 ] > > > %wide.load125 = load <4 x i16>, <4 x i16>* undef, align 1 > > > br i1 undef, label %middle.block113, label %vector.body115 > > > > > > middle.block113: ; preds = %vector.body115 > > > ret void > > > } > > > > > > > > > ----------------------------------------------------- > > > > > > > ~/opt.master -scalarizer scalarizer-bug.ll -S > > > opt.master: ../lib/IR/Value.cpp:458: void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `!contains(New, this) && "this->replaceAllUsesWith(expr(this)) is NOT valid!"' failed. > > > PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. > > > Stack dump: > > > 0. Program arguments: /home/uabbpet/opt.master -scalarizer scalarizer-bug.ll -S > > > 1. Running pass 'Function Pass Manager' on module 'scalarizer-bug.ll'. > > > 2. 
Running pass 'Scalarize vector operations' on function '@foo'
> > > Abort
> > >
> >
> >
> > Acknowledged, looking.
>
>
> Hm, this is saddening. I've fixed it in rG16266e63963ad6ee27ad21983a9366ab313dfd03, but are there more?

Ah, maybe I was so occupied reducing the fault so I missed that there was another fix. I actually did take a look in github just to avoid reporting a problem that had been fixed, but must have done it just before rG16266e63963ad6ee27ad21983a9366ab313dfd03 landed.

I'll fetch, rebuild, and test. Let you know if it didn't help.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83101/new/

https://reviews.llvm.org/D83101

From llvm-commits at lists.llvm.org Tue Jul 7 07:28:02 2020
From: llvm-commits at lists.llvm.org (Ronak Chauhan via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:28:02 +0000 (UTC)
Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors
In-Reply-To:
References:
Message-ID: <43fd176e08c380a78cafda6546f97d38@localhost.localdomain>

rochauha updated this revision to Diff 276059.
rochauha added a comment.

- Handled default statement to silence the warning.
- Expanded comments for decodeCOMPUTE_PGM_RSRC1 and decodeCOMPUTE_PGM_RSRC2.
- Removed extra comment at the end of functions.
- Changed SoftFail to Success for code object v2.
- Replaced the old test case with a small assembly file.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80713/new/

https://reviews.llvm.org/D80713

Files:
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
  llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
  llvm/tools/llvm-objdump/llvm-objdump.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D80713.276059.patch
Type: text/x-patch
Size: 18712 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 07:32:14 2020
From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:32:14 +0000 (UTC)
Subject: [PATCH] D83267: [flang] Add lowering of I/O statements
In-Reply-To:
References:
Message-ID:

This revision was automatically updated to reflect the committed changes.
Closed by commit rG216a54a04b9b: [flang] Add lowering of I/O statements. (authored by schweitz).
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D83267?vs=275864&id=276060#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83267/new/

https://reviews.llvm.org/D83267

Files:
  flang/include/flang/Lower/IO.h
  flang/lib/Lower/CMakeLists.txt
  flang/lib/Lower/IO.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83267.276060.patch
Type: text/x-patch
Size: 72010 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 07:34:03 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:34:03 +0000 (UTC)
Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index
In-Reply-To:
References:
Message-ID:

lebedev.ri added a comment.

In D83101#2136101 , @bjope wrote: > In D83101#2136062 , @lebedev.ri wrote: > > > In D83101#2135962 , @lebedev.ri wrote: > > > > > In D83101#2135945 , @bjope wrote: > > > > > > > I unfortunately still see some problems (related to the Scalarizer changes, and probably this patch): > > > > > > > > > cat scalarizer-bug.ll > > > > ; RUN: opt < %s -scalarizer -S -o - > > > > > > > > define void @foo() { > > > > vector.ph: > > > > br label %vector.body115 > > > > > > > > vector.body115: ; preds = %vector.body115, %vector.ph > > > > %vector.recur = phi <4 x i16> [ undef, %vector.ph ], [ %wide.load125, %vector.body115 ] > > > > %wide.load125 = load <4 x i16>, <4 x i16>* undef, align 1 > > > > br i1 undef, label %middle.block113, label %vector.body115 > > > > > > > > middle.block113: ; preds = %vector.body115 > > > > ret void > > > > } > > > > > > > > > > > > ----------------------------------------------------- > > > > > > > > > ~/opt.master -scalarizer scalarizer-bug.ll -S > > > > opt.master: ../lib/IR/Value.cpp:458: void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `!contains(New, this) && "this->replaceAllUsesWith(expr(this)) is NOT valid!"' failed. > > > > PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. > > > > Stack dump: > > > > 0. Program arguments: /home/uabbpet/opt.master -scalarizer scalarizer-bug.ll -S > > > > 1. Running pass 'Function Pass Manager' on module 'scalarizer-bug.ll'. > > > > 2. Running pass 'Scalarize vector operations' on function '@foo' > > > > Abort > > > > > > > > > > > > > Acknowledged, looking. > > > > > > Hm, this is saddening. I've fixed it in rG16266e63963ad6ee27ad21983a9366ab313dfd03 , but are there more? > > > Ah, maybe I was so occupied reducing the fault so I missed that there was another fix. 
I actually did take a look in github just to avoid reporting a problem that had been fixed, but must have done it just before rG16266e63963ad6ee27ad21983a9366ab313dfd03 landed. Nono, rG16266e63963ad6ee27ad21983a9366ab313dfd03 is the fix as a reaction to the bug you reported, your report wasn't a duplicate, sorry for confusion. > I'll fetch, rebuild, and test. Let you know if it didn't help. Yes, please, thank you! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83101/new/ https://reviews.llvm.org/D83101 From llvm-commits at lists.llvm.org Tue Jul 7 07:35:24 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via llvm-commits) Date: Tue, 07 Jul 2020 07:35:24 -0700 (PDT) Subject: [llvm] abdd367 - [Bitfields][NFC] Make sure bitfields are contiguous Message-ID: <5f04882c.1c69fb81.7dcae.b135@mx.google.com> Author: Guillaume Chatelet Date: 2020-07-07T14:35:13Z New Revision: abdd367b200a0bf4176dbdaf200b23f750a35cb0 URL: https://github.com/llvm/llvm-project/commit/abdd367b200a0bf4176dbdaf200b23f750a35cb0 DIFF: https://github.com/llvm/llvm-project/commit/abdd367b200a0bf4176dbdaf200b23f750a35cb0.diff LOG: [Bitfields][NFC] Make sure bitfields are contiguous Differential Revision: https://reviews.llvm.org/D83202 Added: Modified: llvm/include/llvm/ADT/Bitfields.h llvm/include/llvm/IR/InstrTypes.h llvm/include/llvm/IR/Instruction.h llvm/include/llvm/IR/Instructions.h llvm/unittests/ADT/BitFieldsTest.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/ADT/Bitfields.h b/llvm/include/llvm/ADT/Bitfields.h index 9891b4692e80..d93f6483fa52 100644 --- a/llvm/include/llvm/ADT/Bitfields.h +++ b/llvm/include/llvm/ADT/Bitfields.h @@ -228,6 +228,7 @@ struct Bitfield { static constexpr unsigned Bits = Size; static constexpr unsigned FirstBit = Offset; static constexpr unsigned LastBit = Shift + Bits - 1; + static constexpr unsigned NextBit = Shift + Bits; private: 
template friend struct bitfields_details::Impl; @@ -275,6 +276,12 @@ struct Bitfield { template static constexpr bool isOverlapping() { return A::LastBit >= B::FirstBit && B::LastBit >= A::FirstBit; } + + template static constexpr bool areContiguous() { return true; } + template + static constexpr bool areContiguous() { + return A::NextBit == B::FirstBit && areContiguous(); + } }; } // namespace llvm diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h index 770d3183a909..8408c8772b22 100644 --- a/llvm/include/llvm/IR/InstrTypes.h +++ b/llvm/include/llvm/IR/InstrTypes.h @@ -758,7 +758,7 @@ class CmpInst : public Instruction { BAD_ICMP_PREDICATE = ICMP_SLE + 1 }; using PredicateField = - Bitfield::Element; // Next bit:6 + Bitfield::Element; protected: CmpInst(Type *ty, Instruction::OtherOps op, Predicate pred, @@ -1096,12 +1096,16 @@ using ConstOperandBundleDef = OperandBundleDefT; /// subclass requires. Note that accessing the end of the argument list isn't /// as cheap as most other operations on the base class. class CallBase : public Instruction { - // The first two bits are reserved by CallInst for fast retrieving, - using CallInstReservedField = Bitfield::Element; // Next bit:2 - using CallingConvField = Bitfield::Element; // Next bit:12 - protected: + // The first two bits are reserved by CallInst for fast retrieval, + using CallInstReservedField = Bitfield::Element; + using CallingConvField = + Bitfield::Element; + static_assert( + Bitfield::areContiguous(), + "Bitfields must be contiguous"); + /// The last operand is the called operand. 
static constexpr int CalledOperandOpEndIdx = -1; diff --git a/llvm/include/llvm/IR/Instruction.h b/llvm/include/llvm/IR/Instruction.h index a85353ff9027..a03eac0ad40d 100644 --- a/llvm/include/llvm/IR/Instruction.h +++ b/llvm/include/llvm/IR/Instruction.h @@ -23,6 +23,7 @@ #include "llvm/IR/SymbolTableListTraits.h" #include "llvm/IR/User.h" #include "llvm/IR/Value.h" +#include "llvm/Support/AtomicOrdering.h" #include "llvm/Support/Casting.h" #include #include @@ -53,7 +54,7 @@ class Instruction : public User, protected: // The 15 first bits of `Value::SubclassData` are available for subclasses of // `Instruction` to use. - using OpaqueField = Bitfield::Element; // Next bit:15 + using OpaqueField = Bitfield::Element; // Template alias so that all Instruction storing alignment use the same // definiton. @@ -61,10 +62,18 @@ class Instruction : public User, // 2^29. We store them as Log2(Alignment), so we need 5 bits to encode the 30 // possible values. template - using AlignmentBitfieldElement = + using AlignmentBitfieldElementT = typename Bitfield::Element; + template + using BoolBitfieldElementT = typename Bitfield::Element; + + template + using AtomicOrderingBitfieldElementT = + typename Bitfield::Element; + private: // The last bit is used to store whether the instruction has metadata attached // or not. 
diff --git a/llvm/include/llvm/IR/Instructions.h b/llvm/include/llvm/IR/Instructions.h index 7119b1392d2d..0afc585dfbe5 100644 --- a/llvm/include/llvm/IR/Instructions.h +++ b/llvm/include/llvm/IR/Instructions.h @@ -60,9 +60,12 @@ class LLVMContext; class AllocaInst : public UnaryInstruction { Type *AllocatedType; - using AlignmentField = AlignmentBitfieldElement<0>; // Next bit:5 - using UsedWithInAllocaField = Bitfield::Element; // Next bit:6 - using SwiftErrorField = Bitfield::Element; // Next bit:7 + using AlignmentField = AlignmentBitfieldElementT<0>; + using UsedWithInAllocaField = BoolBitfieldElementT; + using SwiftErrorField = BoolBitfieldElementT; + static_assert(Bitfield::areContiguous(), + "Bitfields must be contiguous"); protected: // Note: Instruction needs to be a friend here to call cloneImpl. @@ -168,10 +171,12 @@ class AllocaInst : public UnaryInstruction { /// An instruction for reading from memory. This uses the SubclassData field in /// Value to store whether or not the load is volatile. class LoadInst : public UnaryInstruction { - using VolatileField = Bitfield::Element; // Next bit:1 - using AlignmentField = AlignmentBitfieldElement<1>; // Next bit:6 - using OrderingField = Bitfield::Element; // Next bit:9 + using VolatileField = BoolBitfieldElementT<0>; + using AlignmentField = AlignmentBitfieldElementT; + using OrderingField = AtomicOrderingBitfieldElementT; + static_assert( + Bitfield::areContiguous(), + "Bitfields must be contiguous"); void AssertOK(); @@ -295,10 +300,12 @@ class LoadInst : public UnaryInstruction { /// An instruction for storing to memory. 
class StoreInst : public Instruction { - using VolatileField = Bitfield::Element; // Next bit:1 - using AlignmentField = AlignmentBitfieldElement<1>; // Next bit:6 - using OrderingField = Bitfield::Element; // Next bit:9 + using VolatileField = BoolBitfieldElementT<0>; + using AlignmentField = AlignmentBitfieldElementT; + using OrderingField = AtomicOrderingBitfieldElementT; + static_assert( + Bitfield::areContiguous(), + "Bitfields must be contiguous"); void AssertOK(); @@ -434,8 +441,7 @@ DEFINE_TRANSPARENT_OPERAND_ACCESSORS(StoreInst, Value) /// An instruction for ordering other memory operations. class FenceInst : public Instruction { - using OrderingField = Bitfield::Element; // Next bit:4 + using OrderingField = AtomicOrderingBitfieldElementT<0>; void Init(AtomicOrdering Ordering, SyncScope::ID SSID); @@ -543,11 +549,18 @@ class AtomicCmpXchgInst : public Instruction { return User::operator new(s, 3); } - using VolatileField = Bitfield::Element; // Next bit:1 - using WeakField = Bitfield::Element; // Next bit:2 - using SuccessOrderingField = AtomicOrderingBitfieldElement<2>; // Next bit:5 - using FailureOrderingField = AtomicOrderingBitfieldElement<5>; // Next bit:8 - using AlignmentField = AlignmentBitfieldElement<8>; // Next bit:13 + using VolatileField = BoolBitfieldElementT<0>; + using WeakField = BoolBitfieldElementT; + using SuccessOrderingField = + AtomicOrderingBitfieldElementT; + using FailureOrderingField = + AtomicOrderingBitfieldElementT; + using AlignmentField = + AlignmentBitfieldElementT; + static_assert( + Bitfield::areContiguous(), + "Bitfields must be contiguous"); /// Return the alignment of the memory that is being allocated by the /// instruction. 
@@ -755,10 +768,14 @@ class AtomicRMWInst : public Instruction { return User::operator new(s, 2); } - using VolatileField = Bitfield::Element; // Next bit:1 - using AtomicOrderingField = AtomicOrderingBitfieldElement<1>; // Next bit:4 - using OperationField = BinOpBitfieldElement<4>; // Next bit:8 - using AlignmentField = AlignmentBitfieldElement<8>; // Next bit:13 + using VolatileField = BoolBitfieldElementT<0>; + using AtomicOrderingField = + AtomicOrderingBitfieldElementT; + using OperationField = BinOpBitfieldElement; + using AlignmentField = AlignmentBitfieldElementT; + static_assert(Bitfield::areContiguous(), + "Bitfields must be contiguous"); BinOp getOperation() const { return getSubclassData(); } @@ -1591,6 +1608,9 @@ class CallInst : public CallBase { }; using TailCallKindField = Bitfield::Element; + static_assert( + Bitfield::areContiguous(), + "Bitfields must be contiguous"); TailCallKind getTailCallKind() const { return getSubclassData(); @@ -2754,7 +2774,7 @@ DEFINE_TRANSPARENT_OPERAND_ACCESSORS(PHINode, Value) /// cleanup. /// class LandingPadInst : public Instruction { - using CleanupField = Bitfield::Element; + using CleanupField = BoolBitfieldElementT<0>; /// The number of operands actually allocated. NumOperands is /// the number actually in use. @@ -4125,7 +4145,7 @@ DEFINE_TRANSPARENT_OPERAND_ACCESSORS(ResumeInst, Value) // CatchSwitchInst Class //===----------------------------------------------------------------------===// class CatchSwitchInst : public Instruction { - using UnwindDestField = Bitfield::Element; // Next bit:1 + using UnwindDestField = BoolBitfieldElementT<0>; /// The number of operands actually allocated. NumOperands is /// the number actually in use. 
@@ -4474,7 +4494,8 @@ DEFINE_TRANSPARENT_OPERAND_ACCESSORS(CatchReturnInst, Value) //===----------------------------------------------------------------------===// class CleanupReturnInst : public Instruction { - using UnwindDestField = Bitfield::Element; // Next bit:1 + using UnwindDestField = BoolBitfieldElementT<0>; + private: CleanupReturnInst(const CleanupReturnInst &RI); CleanupReturnInst(Value *CleanupPad, BasicBlock *UnwindBB, unsigned Values, diff --git a/llvm/unittests/ADT/BitFieldsTest.cpp b/llvm/unittests/ADT/BitFieldsTest.cpp index 759c18394995..3062d5d7f293 100644 --- a/llvm/unittests/ADT/BitFieldsTest.cpp +++ b/llvm/unittests/ADT/BitFieldsTest.cpp @@ -192,6 +192,18 @@ TEST(BitfieldsTest, isOverlapping) { EXPECT_FALSE((Bitfield::isOverlapping())); } +TEST(BitfieldsTest, areContiguous) { + using A = Bitfield::Element; // Next Bit:1 + using B = Bitfield::Element; // Next Bit:5 + using C = Bitfield::Element; // Next Bit:8 + EXPECT_TRUE((Bitfield::areContiguous())); + EXPECT_TRUE((Bitfield::areContiguous())); + + EXPECT_FALSE((Bitfield::areContiguous())); + EXPECT_FALSE((Bitfield::areContiguous())); + EXPECT_FALSE((Bitfield::areContiguous())); +} + TEST(BitfieldsTest, FullUint64) { uint64_t Storage = 0; using Value = Bitfield::Element; From llvm-commits at lists.llvm.org Tue Jul 7 07:35:39 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:35:39 +0000 (UTC) Subject: [PATCH] D83202: [Bitfields][NFC] Make sure bitfields are contiguous In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. gchatelet marked an inline comment as done. Closed by commit rGabdd367b200a: [Bitfields][NFC] Make sure bitfields are contiguous (authored by gchatelet). 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83202/new/

https://reviews.llvm.org/D83202

Files:
  llvm/include/llvm/ADT/Bitfields.h
  llvm/include/llvm/IR/InstrTypes.h
  llvm/include/llvm/IR/Instruction.h
  llvm/include/llvm/IR/Instructions.h
  llvm/unittests/ADT/BitFieldsTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83202.276062.patch
Type: text/x-patch
Size: 11588 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 07:38:29 2020
From: llvm-commits at lists.llvm.org (Alexander Richardson via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:38:29 +0000 (UTC)
Subject: [PATCH] D78478: [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py
In-Reply-To:
References:
Message-ID: <97e375991d060115d195b0367ef34567@localhost.localdomain>

arichardson added a comment.

@MaskRay are you okay with me committing this change and delaying the global search-replace?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78478/new/

https://reviews.llvm.org/D78478

From llvm-commits at lists.llvm.org Tue Jul 7 07:42:05 2020
From: llvm-commits at lists.llvm.org (Bjorn Pettersson via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:42:05 +0000 (UTC)
Subject: [PATCH] D83101: [Scalarizer] ExtractElement handling w/ constant extract index
In-Reply-To:
References:
Message-ID: <6ab6b19a75410cb9533032d1dde8819d@localhost.localdomain>

bjope added a comment.

In D83101#2136116, @lebedev.ri wrote:

> > Ah, maybe I was so occupied reducing the fault so I missed that there was another fix. I actually did take a look in github just to avoid reporting a problem that had been fixed, but must have done it just before rG16266e63963ad6ee27ad21983a9366ab313dfd03 landed.
>
> Nono, rG16266e63963ad6ee27ad21983a9366ab313dfd03 is the fix as a reaction to the bug you reported, your report wasn't a duplicate, sorry for confusion.

Ah, thanks!
(That was such a quick fix/response so I thought someone else had discovered the same problem.)

>> I'll fetch, rebuild, and test. Let you know if it didn't help.
>
> Yes, please, thank you!

The fix works fine. So I don't know about more problems right now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83101/new/

https://reviews.llvm.org/D83101

From llvm-commits at lists.llvm.org Tue Jul 7 07:42:26 2020
From: llvm-commits at lists.llvm.org (Ronak Chauhan via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:42:26 +0000 (UTC)
Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors
In-Reply-To:
References:
Message-ID:

rochauha updated this revision to Diff 276063.
rochauha marked 5 inline comments as done.
rochauha added a comment.

- Return MCDisassembler::Fail for code object v2.
- Add missing full stops in doxygen comments.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80713/new/

https://reviews.llvm.org/D80713

Files:
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
  llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
  llvm/tools/llvm-objdump/llvm-objdump.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D80713.276063.patch
Type: text/x-patch
Size: 18712 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 07:42:54 2020
From: llvm-commits at lists.llvm.org (via llvm-commits)
Date: Tue, 07 Jul 2020 07:42:54 -0700 (PDT)
Subject: [llvm] 082e395 - [CodeMoverUtils] Make specific analysis dependent checks optional
Message-ID: <5f0489ee.1c69fb81.4ed0d.24b4@mx.google.com>

Author: SharmaRithik
Date: 2020-07-07T20:11:07+05:30
New Revision: 082e3952300003ecf2eaa6bf346ae2e783b7a02e

URL: https://github.com/llvm/llvm-project/commit/082e3952300003ecf2eaa6bf346ae2e783b7a02e
DIFF: https://github.com/llvm/llvm-project/commit/082e3952300003ecf2eaa6bf346ae2e783b7a02e.diff

LOG: [CodeMoverUtils] Make specific analysis dependent checks optional

Summary: This patch makes code motion checks optional which are dependent on
specific analysis example, dominator tree, post dominator tree and dependence
info. The aim is to make the adoption of CodeMoverUtils easier for clients that
don't use analysis which were strictly required by CodeMoverUtils. This will
also help in diversifying code motion checks using other analysis example MSSA.
Authored By: RithikSharma Reviewer: Whitney, bmahjour, etiotto Reviewed By: Whitney Subscribers: Prazek, hiraditya, george.burgess.iv, asbirlea, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D82566 Added: Modified: llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h llvm/lib/Transforms/Scalar/LoopFuse.cpp llvm/lib/Transforms/Utils/CodeMoverUtils.cpp llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h b/llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h index 3f5c0aed2abf..630f936471f2 100644 --- a/llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h +++ b/llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h @@ -38,14 +38,16 @@ bool isControlFlowEquivalent(const BasicBlock &BB0, const BasicBlock &BB1, /// Return true if \p I can be safely moved before \p InsertPoint. bool isSafeToMoveBefore(Instruction &I, Instruction &InsertPoint, - DominatorTree &DT, const PostDominatorTree &PDT, - DependenceInfo &DI); + DominatorTree &DT, + const PostDominatorTree *PDT = nullptr, + DependenceInfo *DI = nullptr); /// Return true if all instructions (except the terminator) in \p BB can be /// safely moved before \p InsertPoint. bool isSafeToMoveBefore(BasicBlock &BB, Instruction &InsertPoint, - DominatorTree &DT, const PostDominatorTree &PDT, - DependenceInfo &DI); + DominatorTree &DT, + const PostDominatorTree *PDT = nullptr, + DependenceInfo *DI = nullptr); /// Move instructions, in an order-preserving manner, from \p FromBB to the /// beginning of \p ToBB when proven safe. 
diff --git a/llvm/lib/Transforms/Scalar/LoopFuse.cpp b/llvm/lib/Transforms/Scalar/LoopFuse.cpp index e29bf6325850..20edc8699d79 100644 --- a/llvm/lib/Transforms/Scalar/LoopFuse.cpp +++ b/llvm/lib/Transforms/Scalar/LoopFuse.cpp @@ -743,8 +743,8 @@ struct LoopFuser { } if (!isSafeToMoveBefore(*FC1->Preheader, - *FC0->Preheader->getTerminator(), DT, PDT, - DI)) { + *FC0->Preheader->getTerminator(), DT, &PDT, + &DI)) { LLVM_DEBUG(dbgs() << "Fusion candidate contains unsafe " "instructions in preheader. Not fusing.\n"); reportLoopFusion(*FC0, *FC1, @@ -757,7 +757,7 @@ struct LoopFuser { if (!isSafeToMoveBefore(*FC0->ExitBlock, *FC1->ExitBlock->getFirstNonPHIOrDbg(), DT, - PDT, DI)) { + &PDT, &DI)) { LLVM_DEBUG(dbgs() << "Fusion candidate contains unsafe " "instructions in exit block. Not fusing.\n"); reportLoopFusion(*FC0, *FC1, @@ -767,8 +767,8 @@ struct LoopFuser { if (!isSafeToMoveBefore( *FC1->GuardBranch->getParent(), - *FC0->GuardBranch->getParent()->getTerminator(), DT, PDT, - DI)) { + *FC0->GuardBranch->getParent()->getTerminator(), DT, &PDT, + &DI)) { LLVM_DEBUG(dbgs() << "Fusion candidate contains unsafe " "instructions in guard block. Not fusing.\n"); diff --git a/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp b/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp index 4583ff74167a..11a740f8285b 100644 --- a/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp +++ b/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp @@ -297,8 +297,12 @@ collectInstructionsInBetween(Instruction &StartInst, const Instruction &EndInst, } bool llvm::isSafeToMoveBefore(Instruction &I, Instruction &InsertPoint, - DominatorTree &DT, const PostDominatorTree &PDT, - DependenceInfo &DI) { + DominatorTree &DT, const PostDominatorTree *PDT, + DependenceInfo *DI) { + // Skip tests when we don't have PDT or DI + if (!PDT || !DI) + return false; + // Cannot move itself before itself. 
if (&I == &InsertPoint) return false; @@ -314,7 +318,7 @@ bool llvm::isSafeToMoveBefore(Instruction &I, Instruction &InsertPoint, return reportInvalidCandidate(I, NotMovedTerminator); // TODO remove this limitation. - if (!isControlFlowEquivalent(I, InsertPoint, DT, PDT)) + if (!isControlFlowEquivalent(I, InsertPoint, DT, *PDT)) return reportInvalidCandidate(I, NotControlFlowEquivalent); if (!DT.dominates(&InsertPoint, &I)) @@ -363,7 +367,7 @@ bool llvm::isSafeToMoveBefore(Instruction &I, Instruction &InsertPoint, // StartInst to \p EndInst. if (std::any_of(InstsToCheck.begin(), InstsToCheck.end(), [&DI, &I](Instruction *CurInst) { - auto DepResult = DI.depends(&I, CurInst, true); + auto DepResult = DI->depends(&I, CurInst, true); if (DepResult && (DepResult->isOutput() || DepResult->isFlow() || DepResult->isAnti())) @@ -376,8 +380,8 @@ bool llvm::isSafeToMoveBefore(Instruction &I, Instruction &InsertPoint, } bool llvm::isSafeToMoveBefore(BasicBlock &BB, Instruction &InsertPoint, - DominatorTree &DT, const PostDominatorTree &PDT, - DependenceInfo &DI) { + DominatorTree &DT, const PostDominatorTree *PDT, + DependenceInfo *DI) { return llvm::all_of(BB, [&](Instruction &I) { if (BB.getTerminator() == &I) return true; @@ -396,7 +400,7 @@ void llvm::moveInstructionsToTheBeginning(BasicBlock &FromBB, BasicBlock &ToBB, // Increment the iterator before modifying FromBB. 
++It; - if (isSafeToMoveBefore(I, *MovePos, DT, PDT, DI)) + if (isSafeToMoveBefore(I, *MovePos, DT, &PDT, &DI)) I.moveBefore(MovePos); } } @@ -408,7 +412,7 @@ void llvm::moveInstructionsToTheEnd(BasicBlock &FromBB, BasicBlock &ToBB, Instruction *MovePos = ToBB.getTerminator(); while (FromBB.size() > 1) { Instruction &I = FromBB.front(); - if (isSafeToMoveBefore(I, *MovePos, DT, PDT, DI)) + if (isSafeToMoveBefore(I, *MovePos, DT, &PDT, &DI)) I.moveBefore(MovePos); } } diff --git a/llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp b/llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp index 27e140d81a98..f9f1b235d0d0 100644 --- a/llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp +++ b/llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp @@ -505,48 +505,50 @@ TEST(CodeMoverUtils, IsSafeToMoveTest1) { // Can move after CI_safecall, as it does not throw, not synchronize, or // must return. EXPECT_TRUE(isSafeToMoveBefore(*CI_safecall->getPrevNode(), - *CI_safecall->getNextNode(), DT, PDT, - DI)); + *CI_safecall->getNextNode(), DT, &PDT, + &DI)); // Cannot move CI_unsafecall, as it may throw. EXPECT_FALSE(isSafeToMoveBefore(*CI_unsafecall->getNextNode(), - *CI_unsafecall, DT, PDT, DI)); + *CI_unsafecall, DT, &PDT, &DI)); // Moving instruction to non control flow equivalent places are not // supported. - EXPECT_FALSE( - isSafeToMoveBefore(*SI_A5, *Entry->getTerminator(), DT, PDT, DI)); + EXPECT_FALSE(isSafeToMoveBefore(*SI_A5, *Entry->getTerminator(), DT, + &PDT, &DI)); // Moving PHINode is not supported. EXPECT_FALSE(isSafeToMoveBefore(PN, *PN.getNextNode()->getNextNode(), - DT, PDT, DI)); + DT, &PDT, &DI)); // Cannot move non-PHINode before PHINode. - EXPECT_FALSE(isSafeToMoveBefore(*PN.getNextNode(), PN, DT, PDT, DI)); + EXPECT_FALSE(isSafeToMoveBefore(*PN.getNextNode(), PN, DT, &PDT, &DI)); // Moving Terminator is not supported. 
EXPECT_FALSE(isSafeToMoveBefore(*Entry->getTerminator(), - *PN.getNextNode(), DT, PDT, DI)); + *PN.getNextNode(), DT, &PDT, &DI)); // Cannot move %arrayidx_A after SI, as SI is its user. EXPECT_FALSE(isSafeToMoveBefore(*SI->getPrevNode(), *SI->getNextNode(), - DT, PDT, DI)); + DT, &PDT, &DI)); // Cannot move SI before %arrayidx_A, as %arrayidx_A is its operand. - EXPECT_FALSE(isSafeToMoveBefore(*SI, *SI->getPrevNode(), DT, PDT, DI)); + EXPECT_FALSE( + isSafeToMoveBefore(*SI, *SI->getPrevNode(), DT, &PDT, &DI)); // Cannot move LI2 after SI_A6, as there is a flow dependence. EXPECT_FALSE( - isSafeToMoveBefore(*LI2, *SI_A6->getNextNode(), DT, PDT, DI)); + isSafeToMoveBefore(*LI2, *SI_A6->getNextNode(), DT, &PDT, &DI)); // Cannot move SI after LI1, as there is a anti dependence. - EXPECT_FALSE(isSafeToMoveBefore(*SI, *LI1->getNextNode(), DT, PDT, DI)); + EXPECT_FALSE( + isSafeToMoveBefore(*SI, *LI1->getNextNode(), DT, &PDT, &DI)); // Cannot move SI_A5 after SI, as there is a output dependence. - EXPECT_FALSE(isSafeToMoveBefore(*SI_A5, *LI1, DT, PDT, DI)); + EXPECT_FALSE(isSafeToMoveBefore(*SI_A5, *LI1, DT, &PDT, &DI)); // Can move LI2 before LI1, as there is only an input dependence. - EXPECT_TRUE(isSafeToMoveBefore(*LI2, *LI1, DT, PDT, DI)); + EXPECT_TRUE(isSafeToMoveBefore(*LI2, *LI1, DT, &PDT, &DI)); }); } @@ -578,11 +580,11 @@ TEST(CodeMoverUtils, IsSafeToMoveTest2) { Instruction *SubInst = getInstructionByName(F, "sub"); // Cannot move as %user uses %add and %sub doesn't dominates %user. - EXPECT_FALSE(isSafeToMoveBefore(*AddInst, *SubInst, DT, PDT, DI)); + EXPECT_FALSE(isSafeToMoveBefore(*AddInst, *SubInst, DT, &PDT, &DI)); // Cannot move as %sub_op0 is an operand of %sub and %add doesn't // dominates %sub_op0. 
- EXPECT_FALSE(isSafeToMoveBefore(*SubInst, *AddInst, DT, PDT, DI)); + EXPECT_FALSE(isSafeToMoveBefore(*SubInst, *AddInst, DT, &PDT, &DI)); }); } @@ -611,7 +613,7 @@ TEST(CodeMoverUtils, IsSafeToMoveTest3) { // Can move as the incoming block of %inc for %i (%for.latch) dominated // by %cmp. - EXPECT_TRUE(isSafeToMoveBefore(*IncInst, *CmpInst, DT, PDT, DI)); + EXPECT_TRUE(isSafeToMoveBefore(*IncInst, *CmpInst, DT, &PDT, &DI)); }); } @@ -643,10 +645,10 @@ TEST(CodeMoverUtils, IsSafeToMoveTest4) { Instruction *SubInst = getInstructionByName(F, "sub"); // Cannot move as %user uses %add and %sub doesn't dominates %user. - EXPECT_FALSE(isSafeToMoveBefore(*AddInst, *SubInst, DT, PDT, DI)); + EXPECT_FALSE(isSafeToMoveBefore(*AddInst, *SubInst, DT, &PDT, &DI)); // Cannot move as %sub_op0 is an operand of %sub and %add doesn't // dominates %sub_op0. - EXPECT_FALSE(isSafeToMoveBefore(*SubInst, *AddInst, DT, PDT, DI)); + EXPECT_FALSE(isSafeToMoveBefore(*SubInst, *AddInst, DT, &PDT, &DI)); }); } From llvm-commits at lists.llvm.org Tue Jul 7 07:43:04 2020 From: llvm-commits at lists.llvm.org (Ronak Chauhan via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:43:04 +0000 (UTC) Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors In-Reply-To: References: Message-ID: rochauha added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1345 + return MCDisassembler::Success; +} // decodeCOMPUTE_PGM_RSRC1() + ---------------- scott.linder wrote: > rochauha wrote: > > scott.linder wrote: > > > I don't know the general conventions here, but I don't think I have seen a comment for the end of a function elsewhere in LLVM. I do know that it is required for namespaces, so maybe it is permitted for long functions? > > I'm not sure. I added those comments because these functions were getting quite long. > I would lean towards omitting these, especially with the functions becoming shorter. 
> For example, `decodeCOMPUTE_PGM_RSRC2()` is now <50 lines long and the entire definition now fits on one screen for me. It seems like there are other examples of this in the codebase, though, so I'm OK with it for the longer functions.
Done.


================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1349
+    uint32_t FourByteBuffer, raw_string_ostream &KdStream) const {
+  // Decode as directives that handle COMPUTE_PGM_RSRC2.
+  StringRef Indent = "\t";
----------------
scott.linder wrote:
> Can you expand this comment a little and move it to a Doxygen comment for the function?
Done.


================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1624
+  }
+} // decodeKernelDescriptorDirective()
+
----------------
scott.linder wrote:
> Need to handle the "default" case here:
> 
> ```
> llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1510:1: warning: control may reach end of non-void function [-Wreturn-type]
> ```
Done.


================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1665
+    Size = 256;
+    return MCDisassembler::SoftFail;
+  }
----------------
scott.linder wrote:
> rochauha wrote:
> > scott.linder wrote:
> > > I'm still not sure what we landed on for the semantics of `SoftFail` here?
> > It should be Success / Fail based on what the bytes are for code object v2. But since there's nothing we are 'doing' at the moment for v2, I returned SoftFail.
> If `SoftFail` isn't applicable I don't think we should return it, even if it is just because we haven't implemented something yet. Its existence doesn't mean it needs to be used; I think it has a very narrow definition that doesn't apply here.
> 
> Maybe just emit a diagnostic and return `Fail` so we get the "decode as .byte" behavior? What exactly happens now with the current patch as-is?
Done.
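As a rough illustration of the two review points above (a `default` case so that control cannot reach the end of a non-void function, and returning `Fail` rather than `SoftFail` on a path that is simply not implemented), a minimal standalone sketch might look like the following. Everything in it is a hypothetical stand-in: `DecodeStatus`, the `Kind*` constants and `decodeDirective()` are invented for the example and are not the `MCDisassembler` API or the code in D80713.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a three-state decode result; not the LLVM type.
enum class DecodeStatus { Fail, SoftFail, Success };

// Hypothetical directive selectors, invented for this sketch only.
constexpr uint8_t KindRsrc1 = 0;
constexpr uint8_t KindRsrc2 = 1;

// Every path through the switch returns a value. Unrecognized or
// not-yet-implemented kinds fall into the default case and report Fail,
// which lets a caller fall back to printing the raw bytes.
DecodeStatus decodeDirective(uint8_t Kind) {
  switch (Kind) {
  case KindRsrc1:
    return DecodeStatus::Success;
  case KindRsrc2:
    return DecodeStatus::Success;
  default:
    // Assumption for the sketch: "not handled" is a hard failure,
    // not a soft one.
    return DecodeStatus::Fail;
  }
}
```

With the `default` case in place the compiler no longer has a path that falls off the end of the function, so the `-Wreturn-type` warning quoted above would not apply to a function shaped like this.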
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80713/new/

https://reviews.llvm.org/D80713


From llvm-commits at lists.llvm.org Tue Jul 7 07:43:06 2020
From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:43:06 +0000 (UTC)
Subject: [PATCH] D82566: [CodeMoverUtils] Make specific analysis dependent checks optional
In-Reply-To:
References:
Message-ID: <4fca0a8ae5f4bed5db84801410a24caf@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG082e39523000: [CodeMoverUtils] Make specific analysis dependent checks optional (authored by RithikSharma).

Changed prior to commit:
  https://reviews.llvm.org/D82566?vs=275758&id=276064#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82566/new/

https://reviews.llvm.org/D82566

Files:
  llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h
  llvm/lib/Transforms/Scalar/LoopFuse.cpp
  llvm/lib/Transforms/Utils/CodeMoverUtils.cpp
  llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82566.276064.patch
Type: text/x-patch
Size: 10548 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 07:44:02 2020
From: llvm-commits at lists.llvm.org (Shilei Tian via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:44:02 +0000 (UTC)
Subject: [PATCH] D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions)
In-Reply-To:
References:
Message-ID:

tianshilei1992 added a comment.

LGTM.


================
Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:1139
 #define OMP_RTL(_Enum, _Name, ...) \
-  if (M.getFunction(_Name)) \
-    return OMPInModule = true;
+  else if (M.getFunction(_Name)) OMPInModule = true;
 #include "llvm/Frontend/OpenMP/OMPKinds.def"
----------------
This change looks unrelated to this patch.
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83269/new/

https://reviews.llvm.org/D83269


From llvm-commits at lists.llvm.org Tue Jul 7 07:51:31 2020
From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 14:51:31 +0000 (UTC)
Subject: [PATCH] D83308: [Power10] Implement Vector Replace Builtins in LLVM
Message-ID:

biplmish created this revision.
biplmish added reviewers: lei, amyk, PowerPC.
Herald added subscribers: wuzish, kbarton, hiraditya, nemanjai.
Herald added a project: LLVM.

This patch implements the LLVM intrinsics needed to implement the following prototypes of the Vector Replace Builtins:

vector signed int vec_replace_elt (vector signed int, signed int, const int);
vector unsigned int vec_replace_elt (vector unsigned int, unsigned int, const int);
vector float vec_replace_elt (vector float, float, const int);
vector signed long long vec_replace_elt (vector signed long long, signed long long, const int);
vector unsigned long long vec_replace_elt (vector unsigned long long, unsigned long long, const int);
vector double vec_replace_elt (vector double, double, const int);
vector unsigned char vec_replace_unaligned (vector unsigned char, signed int, const int);
vector unsigned char vec_replace_unaligned (vector unsigned char, unsigned int, const int);
vector unsigned char vec_replace_unaligned (vector unsigned char, float, const int);
vector unsigned char vec_replace_unaligned (vector unsigned char, signed long long, const int);
vector unsigned char vec_replace_unaligned (vector unsigned char, unsigned long long, const int);
vector unsigned char vec_replace_unaligned (vector unsigned char, double, const int);


https://reviews.llvm.org/D83308

Files:
  llvm/include/llvm/IR/IntrinsicsPowerPC.td
  llvm/lib/Target/PowerPC/PPCInstrPrefix.td
  llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll


Index: llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll
=================================================================== --- llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll +++ llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll @@ -231,3 +231,25 @@ ret <4 x i32> %0 } declare <4 x i32> @llvm.ppc.altivec.vinswvrx(<4 x i32>, i64, <4 x i32>) + +define <4 x i32> @testVINSW(<4 x i32> %a, i64 %b) { +; CHECK-LABEL: testVINSW: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: vinsw v2, r5, 1 +; CHECK-NEXT: blr +entry: + %0 = tail call <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32> %a, i64 %b, i32 1) + ret <4 x i32> %0 +} +declare <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32>, i64, i32 immarg) + +define <2 x i64> @testVINSD(<2 x i64> %a, i64 %b) { +; CHECK-LABEL: testVINSD: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: vinsd v2, r5, 1 +; CHECK-NEXT: blr +entry: + %0 = tail call <2 x i64> @llvm.ppc.altivec.vinsd(<2 x i64> %a, i64 %b, i32 1) + ret <2 x i64> %0 +} +declare <2 x i64> @llvm.ppc.altivec.vinsd(<2 x i64>, i64, i32 immarg) Index: llvm/lib/Target/PowerPC/PPCInstrPrefix.td =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -794,8 +794,16 @@ (int_ppc_altivec_vsrdbi v16i8:$VRA, v16i8:$VRB, i32:$SH))]>; - def VINSW : VXForm_VRT5_UIM5_RB5_ins<207, "vinsw", []>; - def VINSD : VXForm_VRT5_UIM5_RB5_ins<463, "vinsd", []>; + def VINSW : + VXForm_VRT5_UIM5_RB5_ins<207, "vinsw", + [(set v4i32:$vD, + (int_ppc_altivec_vinsw v4i32:$vDi, i64:$rB, + timm:$UIM))]>; + def VINSD : + VXForm_VRT5_UIM5_RB5_ins<463, "vinsd", + [(set v2i64:$vD, + (int_ppc_altivec_vinsd v2i64:$vDi, i64:$rB, + timm:$UIM))]>; def VINSBVLX : VXForm_VTB5_RA5_ins<15, "vinsbvlx", [(set v16i8:$vD, Index: llvm/include/llvm/IR/IntrinsicsPowerPC.td =================================================================== --- llvm/include/llvm/IR/IntrinsicsPowerPC.td +++ llvm/include/llvm/IR/IntrinsicsPowerPC.td @@ -522,6 +522,15 @@ Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, 
llvm_i64_ty, llvm_v4i32_ty], [IntrNoMem]>; + // P10 Vector Insert with immediate. + def int_ppc_altivec_vinsw : GCCBuiltin<"__builtin_altivec_vinsw">, + Intrinsic<[llvm_v4i32_ty], + [llvm_v4i32_ty, llvm_i64_ty, llvm_i32_ty], + [IntrNoMem, ImmArg>]>; + def int_ppc_altivec_vinsd : GCCBuiltin<"__builtin_altivec_vinsd">, + Intrinsic<[llvm_v2i64_ty], + [llvm_v2i64_ty, llvm_i64_ty, llvm_i32_ty], + [IntrNoMem, ImmArg>]>; } // Vector average. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83308.276066.patch Type: text/x-patch Size: 3192 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 07:52:14 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:52:14 +0000 (UTC) Subject: [PATCH] D83031: AMDGPU/GlobalISel: Select G_FREEZE In-Reply-To: References: Message-ID: Petar.Avramovic updated this revision to Diff 276068. Petar.Avramovic added a comment. Add a few sgpr,agpr and vcc tests similar to ones from regbankselect-copy.mir. Original tests were from legalize-freeze.mir CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83031/new/ https://reviews.llvm.org/D83031 Files: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-freeze.mir llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-freeze.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D83031.276068.patch Type: text/x-patch Size: 43529 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 07:54:13 2020 From: llvm-commits at lists.llvm.org (Biplob Mishra via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:54:13 +0000 (UTC) Subject: [PATCH] D82359: [Power10] Implement Vector Replace Builtins in LLVM/Clang In-Reply-To: References: Message-ID: biplmish abandoned this revision. biplmish added a comment. 
This patch will be implemented as 2 separate patches for LLVM and CLANG respectively. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82359/new/ https://reviews.llvm.org/D82359 From llvm-commits at lists.llvm.org Tue Jul 7 07:57:28 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:57:28 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. In-Reply-To: References: Message-ID: cameron.mcinally accepted this revision. cameron.mcinally added a comment. This revision is now accepted and ready to land. LGTM ================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1429 + PtrInfo.getWithOffset(Offset), MemVT)); + else Stores.push_back(DAG.getStore(DAG.getEntryNode(), dl, Node->getOperand(i), ---------------- Nit: Not specific to this patch, but I think we can hoist `Node->getOperand(i).getValueType()` out of the loop. All the BUILD_VECTOR/CONCAT_VECTOR operand types should be the same. Looking deeper, the BUILD_VECTOR description is a little vague though: ``` /// The types of the operands must all be /// the same and must match the vector element type, except that integer types /// are allowed to be larger than the element type, in which case the operands /// are implicitly truncated. ``` I assume the larger integer operand types must all be the same type. Maybe I'm misinterpreting this though. Just queried llvm-dev about BUILD_VECTOR and will report back... Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83303/new/ https://reviews.llvm.org/D83303 From llvm-commits at lists.llvm.org Tue Jul 7 07:59:06 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:59:06 +0000 (UTC) Subject: [PATCH] D83308: [Power10] Implement Vector Replace Builtins in LLVM In-Reply-To: References: Message-ID: amyk added inline comments. 
================ Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:526 + // P10 Vector Insert with immediate. + def int_ppc_altivec_vinsw : GCCBuiltin<"__builtin_altivec_vinsw">, + Intrinsic<[llvm_v4i32_ty], ---------------- If we are going to custom codegen the front end, I believe we have to omit the `GCCBuiltin` part. Same with `int_ppc_altivec_vinsd`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83308/new/ https://reviews.llvm.org/D83308 From llvm-commits at lists.llvm.org Tue Jul 7 08:01:51 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:01:51 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <4cb21cbbd7c04d05fb830d40eb344a0a@localhost.localdomain> jdoerfert added a comment. In D83268#2136060 , @ABataev wrote: > Better to ask the users. Maybe, send an RFC to openmp-devs? Sure: http://lists.llvm.org/pipermail/openmp-dev/2020-July/003531.html Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 08:02:00 2020 From: llvm-commits at lists.llvm.org (David Greene via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:02:00 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: greened marked an inline comment as done. greened added inline comments. ================ Comment at: llvm/utils/update_cc_test_checks.py:133 + parser.add_argument('--include-generated-funcs', action='store_true', + help='Output checks for functions not in source') parser.add_argument('tests', nargs='+') ---------------- jdoerfert wrote: > greened wrote: > > greened wrote: > > > jdoerfert wrote: > > > > I think this should go into common.py (after D78618). I would also make this the default but OK. 
> > > Yes I suppose it should in case `opt` and friends generate functions. I hadn't considered that use-case. > > > > > > While I would like to make it default unfortunately it would require updating a bunch of the existing clang tests which doesn't seem too friendly. See the patch update comment for details. > > > > > Just realized it wouldn't necessarily require regeneration of tests, it would just cause regenerated tests to change a lot when they are eventually regenerated. We should discuss as to whether that's acceptable. I think for now this should be non-default to at least get the functionality in without disturbing existing users and then we can discuss a separate change to make it default. > > > > It's also possible we could change how clang orders functions. I discovered there's a difference in clang 10 vs. 11 in the order functions are output when OpenMP outlining happens. clang 10 seems to preserve the source order of functions and clang 11 does not. Perhaps that needs to be fixed as I don't know whether that change was intentional or not. > Best case, without the option the original behavior is preserved. Is that not the case? That's right. I was referring to making this behavior default. If we do that, we could clean up the script code a bit but it would mean clang tests would change pretty dramatically when they are regenerated. If we fix the clang output, the test changes wouldn't be so dramatic. The way clang is behaving now, I would expect any tests that use `-fopenmp`, have multiple functions with OpenMP regions and use function prototypes for some of those functions would break given clang's reordering of function definitions. Perhaps we don't have any tests like that though. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 From llvm-commits at lists.llvm.org Tue Jul 7 08:03:38 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:03:38 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: ABataev accepted this revision. ABataev added a comment. LGTM with a nit. ================ Comment at: clang/lib/Parse/ParseOpenMP.cpp:2787 + if (getLangOpts().OpenMP < 51 && Kind == OMPC_default && + static_cast(Val.getValue().Type) == + OMP_DEFAULT_firstprivate) { ---------------- ABataev wrote: > No need for cast here. Still no need for the cast Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 From llvm-commits at lists.llvm.org Tue Jul 7 08:04:32 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:04:32 +0000 (UTC) Subject: [PATCH] D82566: [CodeMoverUtils] Make specific analysis dependent checks optional In-Reply-To: References: Message-ID: <00935adfc880662d2cf7a98500a05f30@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG082e39523000: [CodeMoverUtils] Make specific analysis dependent checks optional (authored by RithikSharma). 
Changed prior to commit: https://reviews.llvm.org/D82566?vs=274121&id=275665#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82566/new/ https://reviews.llvm.org/D82566 Files: llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h llvm/lib/Transforms/Scalar/LoopFuse.cpp llvm/lib/Transforms/Utils/CodeMoverUtils.cpp llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82566.275665.patch Type: text/x-patch Size: 10548 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:04:36 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:04:36 +0000 (UTC) Subject: [PATCH] D83308: [Power10] Implement Vector Replace Builtins in LLVM In-Reply-To: References: Message-ID: <16597362edfc2fb5b8f786b3ab8a1bf3@localhost.localdomain> nemanjai accepted this revision. nemanjai added a comment. This revision is now accepted and ready to land. LGTM as long as the inheriting of GCCBuiltin is removed. This can be done on commit. ================ Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:526 + // P10 Vector Insert with immediate. + def int_ppc_altivec_vinsw : GCCBuiltin<"__builtin_altivec_vinsw">, + Intrinsic<[llvm_v4i32_ty], ---------------- amyk wrote: > If we are going to custom codegen the front end, I believe we have to omit the `GCCBuiltin` part. Same with `int_ppc_altivec_vinsd`. Yes. Even if we don't plan custom code gen in the front end, we shouldn't have this in the initial back end patch because it is referencing a builtin that we do not define. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83308/new/ https://reviews.llvm.org/D83308 From llvm-commits at lists.llvm.org Tue Jul 7 08:05:27 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:05:27 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA Message-ID: RithikSharma created this revision. RithikSharma added reviewers: Whitney, bmahjour, etiotto. RithikSharma added a project: LLVM. Herald added subscribers: llvm-commits, asbirlea, george.burgess.iv, hiraditya, Prazek. isSafeToMoveBefore uses Dependence Info to check for flow/anti/output dependence, this patch adds alternative checks using MSSA. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83311 Files: llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h llvm/lib/Transforms/Utils/CodeMoverUtils.cpp llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83311.276067.patch Type: text/x-patch Size: 18100 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:06:17 2020 From: llvm-commits at lists.llvm.org (Steve Scalpone via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:06:17 +0000 (UTC) Subject: [PATCH] D83227: [flang] Add algorithm include to runtime/file.cpp for std::min In-Reply-To: References: Message-ID: <6e7d938f35ef414d7dec6a996a6fd283@localhost.localdomain> sscalpone added a comment. Is this a dup of 1b183918184ecbcd03898badf8d1789ea0f4ffe4 ? 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83227/new/

https://reviews.llvm.org/D83227


From llvm-commits at lists.llvm.org Tue Jul 7 08:07:37 2020
From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 15:07:37 +0000 (UTC)
Subject: [PATCH] D83227: [flang] Add algorithm include to runtime/file.cpp for std::min
In-Reply-To:
References:
Message-ID: <56c932b098e7a2b83e0c5f20fd40928b@localhost.localdomain>

DavidTruby abandoned this revision.
DavidTruby added a comment.

Yeah, I posted this first, but it looks like the other one got committed first. No problem!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83227/new/

https://reviews.llvm.org/D83227


From llvm-commits at lists.llvm.org Tue Jul 7 08:13:47 2020
From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 15:13:47 +0000 (UTC)
Subject: [PATCH] D83235: [GlobalISel][InlineAsm] Fix matching input constraints to mem operand
In-Reply-To:
References:
Message-ID:

Petar.Avramovic marked an inline comment as done.
Petar.Avramovic added inline comments.


================
Comment at: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp:417
+            !InlineAsm::isRegDefEarlyClobberKind(MatchedOperandFlag)) {
+          LLVM_DEBUG(dbgs() << "Unknown matching constraint\n");
+          return false;
----------------
arsenm wrote:
> I don't think this case is covered in the test?
That was covered in D82651, but I didn't check the flag and failed to report that isMemKind() was unsupported.
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83235/new/

https://reviews.llvm.org/D83235


From llvm-commits at lists.llvm.org Tue Jul 7 08:15:03 2020
From: llvm-commits at lists.llvm.org (Hamilton Tobon-Mosquera via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 15:15:03 +0000 (UTC)
Subject: [PATCH] D83312: [OpenMPOpt][NFC] Exposing OMPInformationCache as private header for unittesting
Message-ID:

hamax97 created this revision.
hamax97 added reviewers: jdoerfert, sstefan1.
hamax97 added projects: OpenMP, LLVM.
Herald added subscribers: llvm-commits, bbn, guansong, hiraditya, yaxunl.
Herald added a reviewer: baziotis.

What I propose is to move the structures that are needed for unit testing to the private header file `OpenMPOptPriv.h`. In this patch I only moved `OMPInformationCache` because that's the one I needed for writing the unit test of one of the functions. But we can progressively move here whatever else needs to be tested.


https://reviews.llvm.org/D83312

Files:
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/lib/Transforms/IPO/OpenMPOptPriv.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83312.276074.patch
Type: text/x-patch
Size: 15941 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 08:18:16 2020
From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 15:18:16 +0000 (UTC)
Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option
In-Reply-To:
References:
Message-ID: <68f1a1855f1301a88b3db81785eb7a6b@localhost.localdomain>

sameerarora101 marked 19 inline comments as done.
sameerarora101 added inline comments.
================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:11 +# RUN: llvm-ar t %t.lib | \ +# RUN: FileCheck %s --check-prefix=CHECK-NAMES --implicit-check-not={{.}} -DPREFIX=create-static-lib.test.tmp + ---------------- jhenderson wrote: > It's not widely used, but there is `%basename_t` which substitutes for just the file name part of `%t`. This allows the test to not make assumptions about how `%t` expands, and also keeps the check independent of the test name. I'd recommend trying that. thanks, I now have `-DPREFIX=%basename_t.tmp` as `%basename_t` substitutes for the last path component of %t but without the .tmp extension ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-arguments.test:7 + +# MISSING-OPERATION: Library Type: option: must be specified at least once! + ---------------- jhenderson wrote: > sameerarora101 wrote: > > jhenderson wrote: > > > Does the double space match the actual error message? > > Yes, the actual error msg also has the double space: > > ``` > > Library Type: option: must be specified at least once! > > ``` > Okay, bonus marks for fixing it in another patch if you want. Or file a bug against the CommandLine code. 😂I can fix it in another patch. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:27 + +## Missing -static option: +# RUN: not llvm-libtool-darwin -o %t.lib %t.input 2>&1 | \ ---------------- jhenderson wrote: > Maybe make this more generic, e.g. "Missing library type option:" > > I don't really know where to put this test. It might want to be its own test case entirely, e.g. "missing-library-type.test" Isn't it similar to `## Missing output file`? (That's why I thought it would go here). But I have placed it in `missing-library-type.test` for now. 
================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:28 + cl::value_desc("filename"), cl::Required, + cl::cat(LibtoolCategory)); static cl::alias OutputFileShort("o", cl::desc("Alias for -output"), ---------------- jhenderson wrote: > Adding the categories sounds like a different change to me? You might want to include it alongside a test case to show that unrelated options aren't incldued. yup, I added `OptionCategory` in order to prevent the general option `--safepoint-ir-verifier-print-only` from showing up in the help text. Added a test case for it as well. Thanks ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:85 + std::vector NewMembers; + for (const StringRef &Member : InputFiles) + if (Error E = addMember(NewMembers, Member)) ---------------- jhenderson wrote: > sameerarora101 wrote: > > @jhenderson Should I replace the type of `Member` back to `auto`. clang-tidy raises a warning with `StringRef`? > Well typically you don't want to use a `const &` for `StringRef` because it is already a light-weight construct (much the same as you wouldn't use `const &` to a pointer or primitive type). That is probably the cause of the problem here. ok thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Tue Jul 7 08:18:28 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:18:28 +0000 (UTC) Subject: [PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements. In-Reply-To: References: Message-ID: courbet added a comment. 
Only style comments left ================ Comment at: llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:110 + + auto *From = (void *)Function.getFunctionBytes().data(); + auto *To = (void *)(((char *)From) + Function.getFunctionBytes().size()); ---------------- let's use a c++ cast here. ================ Comment at: llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:112 + auto *To = (void *)(((char *)From) + Function.getFunctionBytes().size()); + auto ValueOrError = Counter->readOrError(From, To); if (!ValueOrError) ---------------- Actually I'm thinking you can just pass a `StringRef` to `readOrError` ================ Comment at: llvm/tools/llvm-exegesis/lib/PerfHelper.h:90 /// Returns the current value of the counter or error if it cannot be read. + virtual llvm::Expected> ---------------- Please explain what the arguments are (and switch to StringRef as mentioned above ?) ================ Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:149 + PROT_READ | PROT_WRITE, MAP_SHARED, FileDescriptor, 0); + if (MMappedBuffer == MAP_FAILED) { + llvm::errs() << "Failed to mmap buffer."; ---------------- braces ================ Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:204 + + if (!error) { + return CycleArray; ---------------- style: no braces ================ Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.h:8 +//===----------------------------------------------------------------------===// + +#ifndef LLVM_TOOLS_LLVM_EXEGESIS_LIB_X86_X86COUNTER_H ---------------- This could use a file comment with a link to LBR documentation. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77422/new/ https://reviews.llvm.org/D77422 From llvm-commits at lists.llvm.org Tue Jul 7 08:22:04 2020 From: llvm-commits at lists.llvm.org (Yvan Roux via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:22:04 +0000 (UTC) Subject: [PATCH] D83313: [MachineOutliner] Fix liveness computing. 
Message-ID: yroux created this revision. yroux added reviewers: paquette, samparker, efriedma, SjoerdMeijer, dmgreen. Herald added subscribers: llvm-commits, kristof.beyls. Herald added a project: LLVM. There are targets, such as ARM, where if-conversion can insert a predicated terminator when merging blocks. On ARM this instruction (BX_RET) doesn't contain the information that it uses the link register (this is handled by addLiveOuts, which will add callee-saved registers if needed), so the current implementation might miss this link register usage and wrongly insert an outlined call without saving it. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83313 Files: llvm/include/llvm/CodeGen/MachineOutliner.h llvm/test/CodeGen/ARM/machine-outliner-liveness.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83313.276072.patch Type: text/x-patch Size: 6997 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:27:54 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:27:54 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <2dd95f449b3e60f748db867494a3e919@localhost.localdomain> sameerarora101 updated this revision to Diff 276081. sameerarora101 marked 5 inline comments as done. sameerarora101 added a comment. Use `%basename_t` and add test for unrelated options.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 Files: llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/create-static-lib.test llvm/test/tools/llvm-libtool-darwin/hide-unrelated-options.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/test/tools/llvm-libtool-darwin/missing-library-type.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83002.276081.patch Type: text/x-patch Size: 10949 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:28:09 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:28:09 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: <6864915dac3fcb209e2dc6937cfbb145@localhost.localdomain> dantrushin updated this revision to Diff 276082. dantrushin added a comment. Return back old implementation of collectGCRegs as I've found case where statepoint defs != gc args; Refactor `insertReloads()` into a separate routine. Will add some tests soon. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 Files: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp llvm/test/CodeGen/X86/statepoint-vreg.mir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81647.276082.patch Type: text/x-patch Size: 20675 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:28:44 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:28:44 +0000 (UTC) Subject: [PATCH] D83314: [llvm-readobj] - Refine error reporting in MipsGOTParser helper. Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay, atanasyan. Herald added subscribers: rupprecht, arichardson, sdardis, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. This is a follow-up for D83225. This does the following: 1. Adds missing tests for existing errors. 2. Stops using `unwrapOrError`, so that errors are propagated to the caller. (I am trying to get rid of all `unwrapOrError` calls in the llvm-readelf code). 3. Slightly improves the reported error messages. https://reviews.llvm.org/D83314 Files: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83314.276080.patch Type: text/x-patch Size: 5122 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:45:26 2020 From: llvm-commits at lists.llvm.org (Hamilton Tobon-Mosquera via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:45:26 +0000 (UTC) Subject: [PATCH] D83312: [OpenMPOpt][NFC] Exposing OMPInformationCache as private header for unittesting In-Reply-To: References: Message-ID: <4b0e88541fa89273dd45a819885aaa40@localhost.localdomain> hamax97 updated this revision to Diff 276085. hamax97 added a comment. - Adding static function needed in unittest. - Fixing description in header of new file.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83312/new/ https://reviews.llvm.org/D83312 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/lib/Transforms/IPO/OpenMPOptPriv.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83312.276085.patch Type: text/x-patch Size: 21112 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:46:02 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:46:02 +0000 (UTC) Subject: [PATCH] D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) In-Reply-To: References: Message-ID: <00fb3571fbc7af26ee0e67473b7e4e19@localhost.localdomain> jdoerfert marked an inline comment as done. jdoerfert added inline comments. ================ Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:1139 #define OMP_RTL(_Enum, _Name, ...) \ - if (M.getFunction(_Name)) \ - return OMPInModule = true; + else if (M.getFunction(_Name)) OMPInModule = true; #include "llvm/Frontend/OpenMP/OMPKinds.def" ---------------- tianshilei1992 wrote: > This line of change looks not related to this patch. It is. I needed to get rid of the return statements and I wanted to keep the "early exit" out of the if-cascade. Entrance: `else`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83269/new/ https://reviews.llvm.org/D83269 From llvm-commits at lists.llvm.org Tue Jul 7 08:51:56 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:51:56 +0000 (UTC) Subject: [PATCH] D82584: [PowerPC][Power10] Exploit the High Order Vector Multiply Instructions on Power10 In-Reply-To: References: Message-ID: <1ef5457e9fa9b2a07fe4270307523d33@localhost.localdomain> kamaub accepted this revision. kamaub added a comment. This revision is now accepted and ready to land. 
This LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82584/new/ https://reviews.llvm.org/D82584 From llvm-commits at lists.llvm.org Tue Jul 7 08:52:12 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:52:12 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: nikic added inline comments. ================ Comment at: llvm/test/Other/opt-O2-pipeline.ll:289 +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ---------------- hans wrote: > nikic wrote: > > Is it possible to switch this pass to use LazyBPI / LazyBFA, only fetched if PGO is actually in use? > > > > PGO functionality that most people don't use adding expensive analysis passes like PDT should be avoided. > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. It would only help if there is some way to only fetch the analysis conditionally. I believe many PGO passes use something like PSI.hasProfileSummary() or F.hasProfileData() for that. > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? Right, just disabling this by default in clang/opt would also work. 
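As an aside to the point above about fetching an analysis only conditionally, the idea can be sketched outside of LLVM. This is a toy model, not the LazyBlockFrequencyInfo API: `LazyAnalysis`, `runToyPass`, and the `HasProfileData` flag are hypothetical names chosen for illustration, with the flag standing in for a check such as `F.hasProfileData()`.

```cpp
#include <functional>
#include <utility>

// Hypothetical lazy wrapper: the expensive computation runs only on first use.
template <typename ResultT> class LazyAnalysis {
  std::function<ResultT()> Compute;
  ResultT Cached{};
  bool Computed = false;

public:
  explicit LazyAnalysis(std::function<ResultT()> Fn) : Compute(std::move(Fn)) {}

  // Compute the result on the first request and reuse it afterwards.
  const ResultT &get() {
    if (!Computed) {
      Cached = Compute();
      Computed = true;
    }
    return Cached;
  }

  bool wasComputed() const { return Computed; }
};

// Toy pass body: the analysis is requested only when profile data is present.
inline int runToyPass(bool HasProfileData, LazyAnalysis<int> &BlockFreq) {
  if (!HasProfileData)
    return 0; // Early exit; the expensive analysis never runs.
  return BlockFreq.get();
}
```

In this sketch, a laziness wrapper alone does not save anything; the saving comes from the early exit that avoids calling `get()` at all, which mirrors the concern raised in the review.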
For reference, the current compile-time numbers for this patch: https://llvm-compile-time-tracker.com/compare.php?from=516ff1d4baee28b1911737e47b42973567adf8ff&to=8df840660bb764b6653fcfd9ac7a72cc6adebde6&stat=instructions Not huge, but it adds up (some similar regressions have been introduced in LLVM 10). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 08:52:17 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:52:17 +0000 (UTC) Subject: [PATCH] D83315: Turn arcmt-* options into a single option Message-ID: dang created this revision. dang added a reviewer: Bigcheese. Herald added subscribers: llvm-commits, cfe-commits, dexonsmith. Herald added projects: clang, LLVM. - The new option, -arcmt-action, is a simple enum based option. - The driver is modified to translate the existing -ccc-acmt-* options accordingly Depends on D83298 Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83315 Files: clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Frontend/CompilerInvocation.cpp clang/test/ARCMT/GC-check-warn-nsalloc.m clang/test/ARCMT/GC-check.m clang/test/ARCMT/atautorelease-check.m clang/test/ARCMT/check-api.m clang/test/ARCMT/check-with-pch.m clang/test/ARCMT/check-with-serialized-diag.m clang/test/ARCMT/checking-in-arc.m clang/test/ARCMT/checking.m clang/test/ARCMT/cxx-checking.mm clang/test/ARCMT/driver-migrate.m clang/test/ARCMT/migrate-emit-errors.m clang/test/ARCMT/migrate-plist-output.m clang/test/ARCMT/migrate-space-in-path.m clang/test/ARCMT/migrate-with-pch.m clang/test/ARCMT/migrate.m clang/test/ARCMT/no-canceling-bridge-to-bridge-cast.m clang/test/ARCMT/nonobjc-to-objc-cast-2.m clang/test/ARCMT/releases-driver.m clang/test/ARCMT/releases-driver.m.result clang/test/ARCMT/verify.m clang/test/ARCMT/with-arc-mode-modify.m 
clang/test/ARCMT/with-arc-mode-modify.m.result llvm/utils/TableGen/OptParserEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83315.276088.patch Type: text/x-patch Size: 18163 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:52:46 2020 From: llvm-commits at lists.llvm.org (Hamilton Tobon-Mosquera via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:52:46 +0000 (UTC) Subject: [PATCH] D83316: [OpenMPOpt][WIP] Structure for unittests Message-ID: hamax97 created this revision. hamax97 added reviewers: jdoerfert, sstefan1. hamax97 added projects: OpenMP, LLVM. Herald added subscribers: llvm-commits, guansong, yaxunl, mgorny. Basic structure for adding new unittests to the OpenMPOpt optimizations. https://reviews.llvm.org/D83316 Files: llvm/unittests/Transforms/IPO/CMakeLists.txt llvm/unittests/Transforms/IPO/OpenMPOpt/CMakeLists.txt llvm/unittests/Transforms/IPO/OpenMPOpt/HideMemTransferLatencyTest.cpp llvm/unittests/Transforms/IPO/OpenMPOpt/OpenMPOptTest.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83316.276087.patch Type: text/x-patch Size: 4647 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 08:54:11 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:54:11 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <2162f86d726d61ef63c98efc53642c12@localhost.localdomain> wmi added inline comments. ================ Comment at: llvm/lib/CodeGen/CFIInstrInserter.cpp:344 + *MBBInfo.MBB, MBBI); + InsertedCFIInstr = true; + } ---------------- Should we add a continue here after setting PrevMBBInfo = &MBBInfo? We shouldn't insert any extra CSR related CFI directives based on prev MBB. 
================ Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:488-490 + MachineFunction &MF = *MBB.getParent(); + if (!hasFP(MF)) + return; ---------------- .cfi_offset directive for framepointer is inserted after other .cfi_offset directives for callee save registers. This is different from the .cfi_offset order inserted for prologue. I am not familiar with how the dedup cfi is implemented. A question is will the order difference reduce the chance of deduplicating cfi in prologue? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 08:55:58 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:55:58 +0000 (UTC) Subject: [PATCH] D83300: [GlobalOpt] Don't remove inalloca from musttail-called functions In-Reply-To: References: Message-ID: aeubanks accepted this revision. aeubanks added a comment. LGTM Ideally we could look remove musttail inalloca calls/parameters recursively CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83300/new/ https://reviews.llvm.org/D83300 From llvm-commits at lists.llvm.org Tue Jul 7 08:56:30 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:56:30 +0000 (UTC) Subject: [PATCH] D60413: [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: <204a68d0095c15068f910e395486ae43@localhost.localdomain> nikic added a comment. In D60413#2135831 , @lebedev.ri wrote: > Why doesn't `InstCombiner::SimplifyDemandedUseBits()` handle this? > I would have expected this to already deal with it. `SimplifyDemandedUseBits()` can only handle single-use cases, because demanded bits are only computed for a specific use. There is `SimplifyMultipleUseDemandedBits()`, but it can only simplify a specific use-site, not replace a whole instruction. 
This patch has the right general idea, in that `DemandedBits` is the analysis that can determine this for the multi-use case. However, I'm not very comfortable with performing a full demanded bits calculation (on the whole function) in AggressiveInstCombine, just for this purpose. I think it would be better to repurpose BDCE (which already computes DemandedBits) to be a bit more general and also perform some demanded-bits based folds there, rather than only DCE. (This is similar to how SCCP has recently started replacing sext with zext if possible, even though that is not the primary purpose of that pass.) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Tue Jul 7 08:58:57 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 15:58:57 +0000 (UTC) Subject: [PATCH] D60413: [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: <8aa4b064a80a4338c381fe9b1b6a2f55@localhost.localdomain> nikic added inline comments. ================ Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:356 + AssumptionCache AC(F); + DemandedBits DB(F, AC, DT); for (BasicBlock &BB : F) { ---------------- It's quite likely that this analysis may get invalidated by some of the performed transforms. AssumptionCache should also be made a pass dependency, not constructed inline. But as mentioned, I would recommend moving this into BDCE instead. 
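For readers following the D60413 discussion, the demanded-bits argument can be sketched outside of LLVM. The snippet below is a standalone illustration, not code from the patch: `sext8to32`, `zext8to32`, `unionOfDemandedBits`, and `canReplaceSExtWithZExt` are hypothetical helper names, and an 8-bit to 32-bit extension is assumed. It shows why the multi-use case needs the bits demanded by all users taken together.

```cpp
#include <cstdint>
#include <initializer_list>

// Sign-extend an 8-bit value to 32 bits without relying on signed casts.
inline uint32_t sext8to32(uint8_t V) {
  return (V & 0x80u) ? (0xFFFFFF00u | V) : V;
}

// Zero-extend an 8-bit value to 32 bits.
inline uint32_t zext8to32(uint8_t V) { return V; }

// Each user of the extended value demands some set of bits. With several
// users, the bits that matter are the union of all per-use masks.
inline uint32_t unionOfDemandedBits(std::initializer_list<uint32_t> UseMasks) {
  uint32_t Demanded = 0;
  for (uint32_t Mask : UseMasks)
    Demanded |= Mask;
  return Demanded;
}

// Exhaustively check that sext and zext agree on every demanded bit for all
// 256 possible inputs.
inline bool canReplaceSExtWithZExt(uint32_t Demanded) {
  for (uint32_t I = 0; I < 256; ++I) {
    uint8_t V = static_cast<uint8_t>(I);
    if ((sext8to32(V) & Demanded) != (zext8to32(V) & Demanded))
      return false;
  }
  return true;
}
```

Under these assumptions, two users that only look at the low byte allow the replacement, while a single additional user that reads bit 8 or above blocks it for the whole instruction.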
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Tue Jul 7 09:01:36 2020 From: llvm-commits at lists.llvm.org (Kit Barton via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:01:36 +0000 (UTC) Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure In-Reply-To: References: Message-ID: <3d58144e6cbc33714a70f2050de6e85c@localhost.localdomain> kbarton added a comment. In D83100#2134249 , @tschuett wrote: > Pardon my cmake, but don't you need a CMakeLists.txt in the GISel sub-directory? I thought I would too, but I added GISel/ prefix to the CMake in the PowerPC directory, which works. This is the way it was done in the Aarch64 patch that did the refactoring. That said, I am by no means a cmake expert, so if having a separate CMakeLists.txt file in the GISel directory is "better", I can add one there also. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83100/new/ https://reviews.llvm.org/D83100 From llvm-commits at lists.llvm.org Tue Jul 7 09:02:48 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:02:48 +0000 (UTC) Subject: [PATCH] D83195: [CodeGen] Fix a warning in DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR In-Reply-To: References: Message-ID: <3e711552c7715f16bf103739707e7668@localhost.localdomain> fpetrogalli accepted this revision. fpetrogalli added a comment. LGTM! 
Thank you CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83195/new/ https://reviews.llvm.org/D83195 From llvm-commits at lists.llvm.org Tue Jul 7 09:08:54 2020 From: llvm-commits at lists.llvm.org (Daniel Sanders via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:08:54 +0000 (UTC) Subject: [PATCH] D83275: [llc] (almost) remove `--print-machineinstrs` In-Reply-To: References: Message-ID: <4269d1b773924ea1ca24d5bae3562348@localhost.localdomain> dsanders added a comment. In D83275#2134966 , @ychen wrote: > In D83275#2134936 , @dsanders wrote: > > > I was worried for a moment that we'd be losing the ability to print between all machine passes but it looks like -print-after-all covers that now (I don't think that was always the case). So long as we aren't losing any inter-pass dumps this LGTM but I'd suggest giving it a few days to see if anyone was using this in a way that isn't evident from the tests. > > > > > -print-after/-stop-after since isel pass does not have commandline name. > > > > IIRC, there's something weird going on in this area. I vaguely remember a problem I never got to the bottom of where there was no name when AMDGPU was omitted but when it was compiled, everybody's pass was called `amdgpu-isel`. It had something to do with AMDGPU needing additional dependencies and using INITIALIZE_PASS to get them. > > > I don't know if it is the cause but AMDGPUDAGToDAGISel::ID is actually SelectionDAGISel::ID. That could be it. Those ID's are the keys into the pass registry so that would cause AMDGPUDAGToDAGISel to register a PassInfo for SelectionDAGISel with the argument "amdgpu-isel" which would be how other targets are able to use it. Without it compiled in, there would be no PassInfo registered leaving getPassIDFromName() unable to find it. 
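The registry effect described above can be modelled with a toy sketch. This is not LLVM's `PassRegistry`: `ToyRegistry`, `BaseISel`, and `TargetISel` are hypothetical names, and the only assumption carried over from the thread is that registrations are keyed by the address of a pass's static `ID`.

```cpp
#include <map>
#include <string>

// Toy stand-in for a registry keyed by the address of a static ID member.
class ToyRegistry {
  std::map<const void *, std::string> ArgByID;

public:
  void registerPass(const void *ID, const std::string &Arg) {
    ArgByID[ID] = Arg;
  }

  // Returns the ID registered under the given command-line name, or nullptr
  // when no pass was registered with that name.
  const void *getPassIDFromName(const std::string &Arg) const {
    for (const auto &Entry : ArgByID)
      if (Entry.second == Arg)
        return Entry.first;
    return nullptr;
  }
};

struct BaseISel {
  static char ID;
};
char BaseISel::ID = 0;

// The derived pass does not declare its own ID, so TargetISel::ID names the
// same object as BaseISel::ID.
struct TargetISel : BaseISel {};
```

In this model, registering `&TargetISel::ID` under a target-specific name makes that name resolve to the base pass's key, and if the registration never runs the lookup returns `nullptr`, which matches the behaviour described in the comment.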
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83275/new/ https://reviews.llvm.org/D83275 From llvm-commits at lists.llvm.org Tue Jul 7 09:09:00 2020 From: llvm-commits at lists.llvm.org (Richard Barton via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:09:00 +0000 (UTC) Subject: [PATCH] D82754: [lit] Prevent hang when lit sees non-ASCII characters In-Reply-To: References: Message-ID: <6fa4151f39e690a9248a1d568eced358@localhost.localdomain> richard.barton.arm added a comment. > By "passes", I think you're saying that it doesn't reproduce the error. Right? Yes, sorry. I was trying to write a test that would fail _without_ my patch applied so had got myself turned around. > If so, I suspect that llvm-lit called directly uses python2 on your system, but check-all uses python3. Can you confirm? > I also suspect this is a python2-specific bug. Spot on - I can confirm this is what is happening. Thanks for clearing that up for me. I will push an update with a new comment and my new test. I guess the evilness of it when/if it were to regress is ok - make sure we don't regress I suppose! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82754/new/ https://reviews.llvm.org/D82754 From llvm-commits at lists.llvm.org Tue Jul 7 09:09:26 2020 From: llvm-commits at lists.llvm.org (Richard Barton via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:09:26 +0000 (UTC) Subject: [PATCH] D82754: [lit] Prevent hang when lit sees non-ASCII characters In-Reply-To: References: Message-ID: <5215d75499328efb8b5d28610a9ec6d4@localhost.localdomain> richard.barton.arm updated this revision to Diff 276094. richard.barton.arm added a comment. Herald added a reviewer: DavidTruby. Add test and comment. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82754/new/ https://reviews.llvm.org/D82754 Files: flang/test/Semantics/modfile01.f90 -------------- next part -------------- A non-text attachment was scrubbed... Name: D82754.276094.patch Type: text/x-patch Size: 3017 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 09:13:38 2020 From: llvm-commits at lists.llvm.org (Aaron H Liu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:13:38 +0000 (UTC) Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions. In-Reply-To: References: Message-ID: <0ada795792a80cd41db6db7282f61e87@localhost.localdomain> AaronLiu added a comment. In the application we tried, LV refuses to vectorize because it is not profitable, and if we force LV to vectorize, it crashes. Apparently there are some obstacles. There are cases where SLP can succeed even if LV fails. Yes, the term "small loop" is a little confusing. For example, if a loop has a small number of instructions but a huge trip count, is the loop small or big? In our example, both the trip count and the instruction count are small. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81416/new/ https://reviews.llvm.org/D81416 From llvm-commits at lists.llvm.org Tue Jul 7 09:14:03 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:14:03 +0000 (UTC) Subject: [PATCH] D83100: [PPC][GlobalISel] Add initial GlobalIsel infrastructure In-Reply-To: References: Message-ID: <23d67c0020ea4c6b75fa17b105096c26@localhost.localdomain> arsenm added a comment. In D83100#2136392 , @kbarton wrote: > In D83100#2134249 , @tschuett wrote: > > > Pardon my cmake, but don't you need a CMakeLists.txt in the GISel sub-directory? > > > I thought I would too, but I added GISel/ prefix to the CMake in the PowerPC directory, which works.
> This is the way it was done in the Aarch64 patch that did the refactoring. > > That said, I am by no means a cmake expert, so if having a separate CMakeLists.txt file in the GISel directory is "better", I can add one there also. It's not a separate library, so it's not better. Things are not layered to make this possible Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83100/new/ https://reviews.llvm.org/D83100 From llvm-commits at lists.llvm.org Tue Jul 7 09:14:41 2020 From: llvm-commits at lists.llvm.org (Richard Barton via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:14:41 +0000 (UTC) Subject: [PATCH] D82754: [lit] Prevent hang when lit sees non-ASCII characters In-Reply-To: References: Message-ID: <42231758956e023be3cfe085b88b71e9@localhost.localdomain> richard.barton.arm updated this revision to Diff 276096. richard.barton.arm added a comment. Arg - finger trouble using arc! I pushed the Flang test that I was writing that made me stumble across this issue as part of this patch by mistake. Remove it from the new version. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82754/new/ https://reviews.llvm.org/D82754 Files: llvm/utils/lit/lit/display.py llvm/utils/lit/tests/Inputs/shtest-shell-ascii/diff-in.bin llvm/utils/lit/tests/Inputs/shtest-shell-ascii/lit.cfg llvm/utils/lit/tests/Inputs/shtest-shell-ascii/stdout-encoding.txt llvm/utils/lit/tests/shtest-shell-ascii.py Index: llvm/utils/lit/tests/shtest-shell-ascii.py =================================================================== --- /dev/null +++ llvm/utils/lit/tests/shtest-shell-ascii.py @@ -0,0 +1,22 @@ +# Check the internal shell handling component of the ShTest format. +# +# RUN: env PYTHONIOENCODING=ascii %{lit} -j 1 -a %{inputs}/shtest-shell-ascii > %t.out +# FIXME: Temporarily dump test output so we can debug failing tests on +# buildbots. +# RUN: cat %t.out +# RUN: FileCheck --input-file %t.out %s +# +# END. 
+ +# CHECK: -- Testing: + +# CHECK: PASS: shtest-shell-ascii :: stdout-encoding.txt +# CHECK: $ "cat" "diff-in.bin" +# CHECK: # command output: +# CHECK-NEXT: {{^.f.o.o.$}} +# CHECK-NEXT: {{^.b.a.r.$}} +# CHECK-NEXT: {{^.b.a.z.$}} +# CHECK-NOT: error +# CHECK: *** + +# CHECK-NOT: Failed Tests Index: llvm/utils/lit/tests/Inputs/shtest-shell-ascii/stdout-encoding.txt =================================================================== --- /dev/null +++ llvm/utils/lit/tests/Inputs/shtest-shell-ascii/stdout-encoding.txt @@ -0,0 +1,4 @@ +# Check that lit doesn't fail when printing special characters in its test +# results. + +# RUN: cat diff-in.bin Index: llvm/utils/lit/tests/Inputs/shtest-shell-ascii/lit.cfg =================================================================== --- /dev/null +++ llvm/utils/lit/tests/Inputs/shtest-shell-ascii/lit.cfg @@ -0,0 +1,7 @@ +import lit.formats +config.name = 'shtest-shell-ascii' +config.suffixes = ['.txt'] +config.test_format = lit.formats.ShTest() +config.test_source_root = None +config.test_exec_root = None +config.substitutions.append(('%{python}', '"%s"' % (sys.executable))) Index: llvm/utils/lit/lit/display.py =================================================================== --- llvm/utils/lit/lit/display.py +++ llvm/utils/lit/lit/display.py @@ -86,7 +86,9 @@ errors="replace") except UnicodeDecodeError: pass - out = out.decode(encoding=sys.stdout.encoding) + # Python 2 can raise UnicodeDecodeError when the stdout + # encoding is ASCII. Ignore decode errors in this case + out = out.decode(encoding=sys.stdout.encoding, errors="ignore") print(out) print("*" * 20) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82754.276096.patch Type: text/x-patch Size: 2305 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 09:15:18 2020 From: llvm-commits at lists.llvm.org (Aaron H Liu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:15:18 +0000 (UTC) Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions. In-Reply-To: References: Message-ID: AaronLiu added a comment. In D81416#2117912 , @bmahjour wrote: > May I suggest we only test what this patch actually changes? This patch adds an option which when enabled allows interleaving of loops with small trip counts and scalar reductions, so it suffices to test exactly that. That should be covered by llvm/test/Transforms/LoopVectorize/PowerPC/interleave_IC.ll. I think the other test cases can be removed. IMHO adding more tests to make sure SLP vectorization happens (and the like) are redundant, add unnecessary maintenance in the future and are beyond the scope of what this patch is trying to do. Will remove other three testcases. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81416/new/ https://reviews.llvm.org/D81416 From llvm-commits at lists.llvm.org Tue Jul 7 09:15:42 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:15:42 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: MaskRay updated this revision to Diff 276098. MaskRay marked 8 inline comments as done. MaskRay added a comment. Address comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 Files: lld/ELF/Config.h lld/ELF/Driver.cpp lld/ELF/InputSection.cpp lld/docs/ld.lld.1 lld/test/ELF/dead-reloc-in-nonalloc.s lld/test/ELF/debug-dead-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83264.276098.patch Type: text/x-patch Size: 8032 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 09:15:53 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:15:53 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: <102a1bc7982e17ef12e78f3d0f749f20@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/ELF/Options.td:127 +defm dead_nonalloc_reloc_value : EEq<"dead-nonalloc-reloc-value", + "Resolve a relocation from a matched non-SHF_ALLOC section to a discarded " ---------------- psmith wrote: > I think it will be worth mentioning that the there can be multiple occurrences and that the last takes precedence. > "Resolve a relocation from a matched non-SHF_ALLOC section to a discarded symbol to the specified value. Accepts wildcards, in the event of a section matching more than one instance of this option, the last instance on the command-line takes precedence." Thanks for the suggestion. My fault, we don't actually have a help for `-z` options, so I can only leave the message in ld.lld.1 We probably should figure out a way to print `-z` options in `--help`. I pick `-z` because nonalloc is an ELF specific concept. ================ Comment at: lld/test/ELF/debug-dead-reloc.s:27 +# OVERRIDE: Contents of section .debug_loc: +# OVERRIDE-NEXT: 0000 2a000000 00000000 2a000000 00000000 + ---------------- grimar wrote: > What about printing other sections content too? > (seems there is no other test showing that when override the tombstone value for a debug section, the other ones > remain unaffected) > > Probably it worth to combine `CHECK` and `OVERRIDE`: > > ``` > # CHECK: Contents of section .debug_loc: > # NOOVERRIDE-NEXT: 0000 feffffff ffffffff feffffff ffffffff > # OVERRIDE-NEXT: 0000 2a000000 00000000 2a000000 00000000 > ... 
> ``` I think the value testing more section contents is low. This is just an auxiliary test showing that --gc-sections is similar to ICF. (Moreover, NOOVERRIDE is longer than CHECK and I would need to re-align `CHECK-NEXT:` above.) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Tue Jul 7 09:16:30 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:16:30 +0000 (UTC) Subject: [PATCH] D83319: [x86] fix miscompile in buildvector v16i8 lowering Message-ID: spatel created this revision. spatel added reviewers: craig.topper, RKSimon, lebedev.ri. Herald added subscribers: hiraditya, mcrosier. Herald added a project: LLVM. In the test based on PR46586: https://bugs.llvm.org/show_bug.cgi?id=46586 ...we are inserting 16-bits into the high element of the vector, shuffling it to element 0, and extracting 32-bits. But xmm1 was never initialized, so the top 16-bits of the extract are undef without this patch. (It seems like we could do better than this by recognizing that we only demand a subsection of the build vector, but I want to make sure we fix the miscompile 1st.) This path is only used for pre-SSE4.1, and simpler patterns get squashed somewhere along the way, so the test still includes a 'urem' as it did in the original test from the bug report. https://reviews.llvm.org/D83319 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/buildvec-insertvec.ll Index: llvm/test/CodeGen/X86/buildvec-insertvec.ll =================================================================== --- llvm/test/CodeGen/X86/buildvec-insertvec.ll +++ llvm/test/CodeGen/X86/buildvec-insertvec.ll @@ -784,12 +784,13 @@ ret <4 x i32> %5 } -; FIXME: If we do not define all bytes that are extracted, this is a miscompile. +; If we do not define all bytes that are extracted, this is a miscompile. 
define i32 @PR46586(i8* %p, <4 x i32> %v) { ; SSE2-LABEL: PR46586: ; SSE2: # %bb.0: ; SSE2-NEXT: movzbl 3(%rdi), %eax +; SSE2-NEXT: pxor %xmm1, %xmm1 ; SSE2-NEXT: pinsrw $6, %eax, %xmm1 ; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3] ; SSE2-NEXT: movd %xmm1, %eax Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -8002,10 +8002,11 @@ Elt = NextElt; } - // If our first insertion is not the first index then insert into zero - // vector to break any register dependency else use SCALAR_TO_VECTOR. + // If our first insertion is not the first index or zeros are needed, then + // insert into zero vector. Otherwise, use SCALAR_TO_VECTOR (leaves high + // elements undefined). if (!V) { - if (i != 0) + if (i != 0 || NumZero) V = getZeroVector(MVT::v8i16, Subtarget, DAG, dl); else { V = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v4i32, Elt); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83319.276090.patch Type: text/x-patch Size: 1521 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 09:19:01 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:19:01 +0000 (UTC) Subject: [PATCH] D83047: [LiveDebugValues] 2/4 Add instruction-referencing LiveDebugValues implementation In-Reply-To: References: Message-ID: <34b314b74520ce3804f9153d6310f3a9@localhost.localdomain> jmorse updated this revision to Diff 276099. jmorse marked 11 inline comments as done. jmorse added a comment. - Replace a std::pair with a DbgValueProperties class, - Replace some Densemaps of LocIdx with std::map, for initial cleanliness, - Rename ValueRec to DbgValue and make two exclusive fields a union, - Use an Optional instead of in-band signalling when it's an invalid result, - Address additional assorted feedback. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83047/new/ https://reviews.llvm.org/D83047 Files: llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83047.276099.patch Type: text/x-patch Size: 113048 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 09:19:14 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:19:14 +0000 (UTC) Subject: [PATCH] D83047: [LiveDebugValues] 2/4 Add instruction-referencing LiveDebugValues implementation In-Reply-To: References: Message-ID: jmorse added a comment. In D83047#2133722 , @vsk wrote: > Thanks for this, Jeremy. It'll take me multiple passes to page all of this in. I hope to get to the core algorithm changes in my next review. In the interest of getting some feedback to you sooner rather than later, I've included some minor suggestions and questions inline. Thanks for taking the time!, ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:241 +public: + uint64_t BlockNo : 20; /// The block where the def happens. + uint64_t InstNo : 20; /// The Instruction where the def happens. ---------------- vsk wrote: > You might find it convenient to use the new bitfield utilities from https://reviews.llvm.org/D82454. I'll read up on this for a further revision, I'm generally unfamiliar with the bitfield utilities as they are. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:245 + LocIdx LocNo : NUM_LOC_BITS; /// The machine location where the def happens. + // (No idea why this can work as a LocIdx, it probably shouldn't) + ---------------- vsk wrote: > I don't follow this caveat, could you rephrase this? Hmm, I don't think I meant to upload that, now removed. 
================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:313 + static bool isEqual(LocIdx A, LocIdx B) { return A == B; } +}; + ---------------- vsk wrote: > Wdyt of getting rid of these DenseMapInfo specializations? Having special reserved values complicates things a bit. If profiling demonstrates that std::map is a bottleneck, they could be added back. Sounds like a plan, ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:321 +/// the the value, and Boolean of whether or not it's indirect. +typedef std::pair MetaVal; + ---------------- vsk wrote: > Seems worthwhile to make this a proper class, with a constructor that accepts a MachineInstr and fills out the structure. I also find the name somewhat non-specific. Wdyt of "DbgValueProperties"? Replaced with a DbgValueProperties class. This flushes out several circumstances where the class was potentially being default-constructed on map-assignment, which it's probably good to suppress. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:360 + /// as the number of registers on the target -- if the value in the register + /// is not being tracked, then the LocIdx value will be zero. New entries are + /// appended if a new spill slot begins being tracked. ---------------- vsk wrote: > Why does there need to be a LocIdx reserved for the case where the value in a register isn't tracked? It doesn't look like this is done for stack slots. Ah, this is the curse of early optimisation: it's a shortcut so that LocIDToLocIdx can be an array with constant-time lookup to identify whether the corresponding register is tracked or not (signalled by the element being nonzero). This could easily be eliminated by making LocIDToLocIdx an associative array. It isn't necessary for stack slots as they're identified by an associative array elsewhere. 
However, outside of MLocTracker, I've been using a zero LocIdx as shorthand for "this is an invalid/empty value/location", in the manner of Optional<> and None. It's not really tied to either registers or stack slots. This isn't good practice, but there was a stage in prototyping where the machine value number live-in / live-outs of a block could be an empty or null value. I don't think this is the case any more (95% sure) , I'll spend some time polishing this "feature" out of the implementation. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:669 +/// (DebugVariable specific) dataflow analysis. +class ValueRec { +public: ---------------- vsk wrote: > nit -- "Rec" is suggestive of "recurrence". Wdyt of naming this "DbgValue"? Works for me, and I guess it drives home that this is the value of the variable. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp:674 + /// If Kind is Const, the MachineOperand defining this value. + Optional MO; + /// Qualifiers for the ValueIDNum above. ---------------- vsk wrote: > Wdyt of grouping 'ID' and 'MO' in a union? This would make it clear that they cannot both be in use at the same time. Works for me too. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83047/new/ https://reviews.llvm.org/D83047 From llvm-commits at lists.llvm.org Tue Jul 7 09:20:07 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:20:07 +0000 (UTC) Subject: [PATCH] D83243: [ELF] Rename canRelax to sharedToExecRelax. NFC In-Reply-To: References: Message-ID: MaskRay marked 2 inline comments as done. MaskRay added inline comments. ================ Comment at: lld/ELF/Relocations.cpp:211 // DTPMOD may not be expected at load time. bool isLocalInExecutable = !sym.isPreemptible && !config->shared; ---------------- grimar wrote: > It looks like we can get rid of this variable now? I.e. 
looks like `isLocalInExecutable` can be replaced with `!sym.isPreemptible` everywhere.

We can't delete it.

```
if (!sharedToExecRelax) {
  // if (isLocalInExecutable)
  // cannot be changed to
  // if (!sym.isPreemptible)
}
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83243/new/

https://reviews.llvm.org/D83243

From llvm-commits at lists.llvm.org Tue Jul 7 09:22:38 2020
From: llvm-commits at lists.llvm.org (Richard Barton via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 16:22:38 +0000 (UTC)
Subject: [PATCH] D83320: Hand port modfile01.f90 from test_modfile.sh to FileCheck
Message-ID:

richard.barton.arm created this revision.
Herald added a reviewer: DavidTruby.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Sample of a test_modfile.sh port to FileCheck to support an RFC

Summary of changes:
- Create module files in a temporary directory using -module to isolate the tests from each other.
- Use ls and wc to replace test_modfile.sh's check that no additional modules were created.
- Separate, explicit RUN line to check each module content with FileCheck.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83320

Files:
  flang/test/Semantics/modfile01.f90

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83320.276101.patch
Type: text/x-patch
Size: 3017 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 09:25:07 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 16:25:07 +0000 (UTC)
Subject: [PATCH] D83243: [ELF] Rename canRelax to sharedToExecRelax. NFC
In-Reply-To:
References:
Message-ID: <93ca48ace3d0620bab96f157e87989f0@localhost.localdomain>

MaskRay marked 2 inline comments as done.
MaskRay added inline comments.
================ Comment at: lld/ELF/Relocations.cpp:311 R_TLSIE_HINT>(expr) && - canRelax && isLocalInExecutable) { + sharedToExecRelax && isLocalInExecutable) { c.relocations.push_back({R_RELAX_TLS_IE_TO_LE, type, offset, addend, &sym}); ---------------- Initial-Exec -> Local-Exec is a relaxation from executable to executable. `sharedToExecRelax` is not an appropriate name. Shall we rename the variable? Technically, a shared object can use Initial-Exec as well if it is part of initial modules (via transitive DT_NEEDED; `DF_STATIC_TLS`). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83243/new/ https://reviews.llvm.org/D83243 From llvm-commits at lists.llvm.org Tue Jul 7 09:25:09 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Tue, 07 Jul 2020 09:25:09 -0700 (PDT) Subject: [llvm] 1c956a3 - [x86] add test for buildvector lowering miscompile (PR46586); NFC Message-ID: <5f04a1e5.1c69fb81.adec1.5216@mx.google.com> Author: Sanjay Patel Date: 2020-07-07T12:24:56-04:00 New Revision: 1c956a3eb934fffd719ab027829d616e762eca2d URL: https://github.com/llvm/llvm-project/commit/1c956a3eb934fffd719ab027829d616e762eca2d DIFF: https://github.com/llvm/llvm-project/commit/1c956a3eb934fffd719ab027829d616e762eca2d.diff LOG: [x86] add test for buildvector lowering miscompile (PR46586); NFC Added: Modified: llvm/test/CodeGen/X86/buildvec-insertvec.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/X86/buildvec-insertvec.ll b/llvm/test/CodeGen/X86/buildvec-insertvec.ll index 73daef78bc04..3922450b0f21 100644 --- a/llvm/test/CodeGen/X86/buildvec-insertvec.ll +++ b/llvm/test/CodeGen/X86/buildvec-insertvec.ll @@ -783,3 +783,50 @@ define <4 x i32> @ossfuzz5688(i32 %a0) { store i32 %4, i32* undef ret <4 x i32> %5 } + +; FIXME: If we do not define all bytes that are extracted, this is a miscompile. 
+ +define i32 @PR46586(i8* %p, <4 x i32> %v) { +; SSE2-LABEL: PR46586: +; SSE2: # %bb.0: +; SSE2-NEXT: movzbl 3(%rdi), %eax +; SSE2-NEXT: pinsrw $6, %eax, %xmm1 +; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3] +; SSE2-NEXT: movd %xmm1, %eax +; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,3] +; SSE2-NEXT: movd %xmm0, %ecx +; SSE2-NEXT: xorl %edx, %edx +; SSE2-NEXT: divl %ecx +; SSE2-NEXT: movl %edx, %eax +; SSE2-NEXT: retq +; +; SSE41-LABEL: PR46586: +; SSE41: # %bb.0: +; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero +; SSE41-NEXT: extractps $3, %xmm0, %ecx +; SSE41-NEXT: pextrd $3, %xmm1, %eax +; SSE41-NEXT: xorl %edx, %edx +; SSE41-NEXT: divl %ecx +; SSE41-NEXT: movl %edx, %eax +; SSE41-NEXT: retq +; +; AVX-LABEL: PR46586: +; AVX: # %bb.0: +; AVX-NEXT: vpmovzxbd {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero +; AVX-NEXT: vextractps $3, %xmm0, %ecx +; AVX-NEXT: vpextrd $3, %xmm1, %eax +; AVX-NEXT: xorl %edx, %edx +; AVX-NEXT: divl %ecx +; AVX-NEXT: movl %edx, %eax +; AVX-NEXT: retq + %p0 = getelementptr inbounds i8, i8* %p, i64 0 + %p3 = getelementptr inbounds i8, i8* %p, i64 3 + %t25 = load i8, i8* %p0 + %t28 = load i8, i8* %p3 + %t29 = insertelement <4 x i8> undef, i8 %t25, i32 0 + %t32 = insertelement <4 x i8> %t29, i8 %t28, i32 3 + %t33 = zext <4 x i8> %t32 to <4 x i32> + %t34 = urem <4 x i32> %t33, %v + %t35 = extractelement <4 x i32> %t34, i32 3 + ret i32 %t35 +} From llvm-commits at lists.llvm.org Tue Jul 7 09:25:42 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:25:42 +0000 (UTC) Subject: [PATCH] D83287: [NFCI][llvm-reduce] Cleanup Delta passes to use Oracle abstraction In-Reply-To: References: Message-ID: nickdesaulniers accepted this revision. nickdesaulniers added a comment. This revision is now accepted and ready to land. 
Thanks, this makes the delta passes much more readable. It even looks like it might be simpler to avoid some of the local `std::vector`'s now, folding some of the loops. ================ Comment at: llvm/tools/llvm-reduce/deltas/Delta.h:53 +/// actually understand what is going on. +struct Oracle { + /// Out of all the features that we promised to be, ---------------- class ================ Comment at: llvm/tools/llvm-reduce/deltas/Delta.h:63 +public: + Oracle(ArrayRef ChunksToKeep_) : ChunksToKeep(ChunksToKeep_) {} + ---------------- explicit Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83287/new/ https://reviews.llvm.org/D83287 From llvm-commits at lists.llvm.org Tue Jul 7 09:27:31 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:27:31 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <47854798cd5eb304de4ed661318f9d44@localhost.localdomain> lebedev.ri added a comment. Thanks. This looks good to me in principle. Alive2 support for these intrinsics: https://github.com/AliveToolkit/alive2/pull/448 ================ Comment at: llvm/include/llvm/CodeGen/ISDOpcodes.h:313-314 + /// RESULT = [US]SHLSAT(LHS, RHS) - Perform saturation left shift on 2 + /// integers with the same bit width (W). If the true value of LHS << RHS + /// exceeds the largest value that can be represented by W bits, the ---------------- I'm not sure what `left shift on 2 integers` means. Perhaps this needs some rewording. 
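
For readers following the wording question above, here is a minimal sketch of what a saturating left shift computes, written for 8-bit values. It is an illustration only, not the patch's lowering: `ushlSat8` and `sshlSat8` are hypothetical helper names, and the sketch assumes the shift amount is smaller than the bit width and that the signed form clamps to the smallest value for negative inputs and the largest otherwise.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers (not LLVM API) modelling 8-bit saturating left shifts.
// Assumption: Amt < 8; larger shift amounts are outside this sketch.

// Unsigned form: if any set bit is shifted out, clamp to the maximum value.
constexpr uint8_t ushlSat8(uint8_t X, unsigned Amt) {
  // Truncate the shifted value back to 8 bits, then test whether shifting
  // right again recovers the input. If not, bits were lost.
  const uint8_t Shifted = static_cast<uint8_t>(X << Amt);
  return (Shifted >> Amt) == X ? Shifted
                               : std::numeric_limits<uint8_t>::max();
}

// Signed form: compute the exact result in a wider type, then clamp.
constexpr int8_t sshlSat8(int8_t X, unsigned Amt) {
  const int Wide = static_cast<int>(X) * (1 << Amt);
  if (Wide > std::numeric_limits<int8_t>::max())
    return std::numeric_limits<int8_t>::max();
  if (Wide < std::numeric_limits<int8_t>::min())
    return std::numeric_limits<int8_t>::min();
  return static_cast<int8_t>(Wide);
}
```

The signed helper widens instead of shifting a negative value directly, which keeps the sketch well defined under older C++ standards.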
================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:810 + Op1Promoted = SExtPromotedInteger(Op1); + Op2Promoted = SExtPromotedInteger(Op2); + ShiftOp = ISD::SRA; ---------------- `Op2Promoted = ZExtPromotedInteger(Op2);` ================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7360-7365 + // For signed shifts, we can check for overflow by checking if we would have + // shifted out any bits that disagree with the sign bit. For unsigned shifts, + // we can just check if we would have shifted out any ones. + // TODO: On targets that don't support CTLZ, it may be more efficient to pull + // down the bits to be shifted out and compare those to the signmask/zero + // instead. ---------------- Have you checked if naive `x != ((x << y) u/s>> y)` results in worse lowering? ================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7377-7378 + SDValue LSignBits = DAG.getNode(ISD::CTLZ, dl, VT, XORLHS); + Threshold = DAG.getNode(ISD::SUB, dl, VT, LSignBits, + DAG.getConstant(1, dl, VT)); + } else { ---------------- Why not just change predicate to `ISD::SETUGE`? ================ Comment at: llvm/test/CodeGen/X86/sshl_sat.ll:9 +declare i18 @llvm.sshl.sat.i18 (i18, i18) +declare i64 @llvm.sshl.sat.i64 (i64, i64) +declare <4 x i32> @llvm.sshl.sat.v4i32(<4 x i32>, <4 x i32>) ---------------- Add `i32` test while at it? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Tue Jul 7 09:27:32 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:27:32 +0000 (UTC) Subject: [PATCH] D83319: [x86] fix miscompile in buildvector v16i8 lowering In-Reply-To: References: Message-ID: <3eb21f1667ccad3eb3147412fe07ec08@localhost.localdomain> RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land. 
ouch! LGTM - cheers.

IIRC I attempted to add a DAGCombine for ANY/ZERO_EXTEND_VECTOR_INREG(BUILD_VECTOR()) for something similar to the poor codegen - I can't remember what the problem was I hit though.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83319/new/

https://reviews.llvm.org/D83319

From llvm-commits at lists.llvm.org Tue Jul 7 09:29:35 2020
From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 16:29:35 +0000 (UTC)
Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d
In-Reply-To:
References:
Message-ID: <7a26626d091387859ca219eaeb00b8a4@localhost.localdomain>

jasonliu added inline comments.

================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:298
+struct TracebackTable {
+  // Byte 1
----------------
I would suggest some renames if we decide not to add any doxygen comment for the masks, because some names here are not self-explanatory.
Here are some suggestions as examples:
GlobaLinkageMask -> GlobalLinkageMask
IsEprolMask -> IsOutOfLineEpilogOrPrologueMask
HasCodeLenMask -> HasTraceBackTableOffsetMask
...
Hope you get the idea.

================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:310
+  static constexpr uint32_t IsEprolMask = 0x0000'4000;
+  static constexpr uint32_t HasCodeLenMask = 0x0000'2000;
+  static constexpr uint32_t IntProcMask = 0x0000'1000;
----------------
In AIX OS header "/usr/include/sys/debug.h", we have this field as
```
unsigned has_tboff:1; /* Set if offset from start of proc stored */
```
It might be better to rename HasCodeLenMask so that it has some association with the original name. That way people do not need to reason about whether this is the correct field.
================ Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:334 + // Byte 6 + static constexpr uint32_t HasVecInfoMask = 0x0080'0000; + static constexpr uint32_t Spare4Mask = 0x0040'0000; ---------------- I find the 6th byte here is a bit different than what we have in the OS headers: ``` /* Byte 6 */ unsigned longtbtable:1; /* Set if xtbtable extension exists. */ unsigned has_vec:1; /* Set if optional vector info is present */ unsigned gpr_saved:6; /* Number of GPRs saved, max of 32 */ ``` ================ Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:348 + + static constexpr uint32_t FixedParaTypeBit = 0x8000'0000; + static constexpr uint32_t FloatPointParaTypeBit = 0x4000'0000; ---------------- TODO: add a comment to say this is for other purpose? ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:399 + Optional CodeLen; + Optional HandMask; + Optional CtlInfo; ---------------- HandMask -> HandlerMask ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:400 + Optional HandMask; + Optional CtlInfo; + Optional FunctionNameLen; ---------------- TODO: no number of CTL anchors? or Displacement into stack of each anchor? ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:406 +public: + static Expected create(const uint8_t *Ptr, uint64_t S); + XCOFFTracebackTable(const uint8_t *Ptr, uint64_t S, Error &Err); ---------------- Rename parameter `S` as `Size`. `S` itself doesn't tell people what this parameter is meant to be without people looking at the function body for a while. 
================
Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:428
+
+  bool doesBackChainStore();
+  uint8_t getNumOfFPRsSaved();
----------------
doesBackChainStore -> isBackChainStore

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:842
+bool doesXCOFFTracebackTableBegin(ArrayRef Bytes) {
+  assert(Bytes.size() == 4 && "Traceback table started with 4 bytes zero.");
+  return support::endian::read32be(Bytes.data()) == 0;
----------------
How do callers know that they have to pass an ArrayRef that has exactly 4 bytes to this function? What if they have an empty array? What if they pass in an 8-byte array?
I don't think it's the callers' responsibility to ensure that. This function should still return a correct answer in the above situations.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:847
+static std::string parseParaType(uint32_t Value, unsigned int ParaNum) {
+  std::string ParaType;
+  for (unsigned I = 0; I < ParaNum; ++I) {
----------------
I think we could use a SmallString here when editing the string and return std::string at the end for better performance (especially when you know the ParaNum, which has some implication on how large the string could be).

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:850
+    if (I != 0)
+      ParaType += ", ";
+    if ((Value & TracebackTable::FixedParaTypeBit) == 0) {
----------------
Consider doing ParaType += "i, " and ParaType += "f, " ...
and do a removal of ", " after parsing all parameters.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:891
+
+  if (!Err && ParaNum && !hasVecInfo())
+    ParaType = parseParaType(DE.getU32(&offset_ptr, &Err), ParaNum);
----------------
When (ParaNum && hasVecInfo()) returns true, and we do not parse this, would we have the wrong offset_ptr for the rest of the traceback table?
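
To illustrate the "append with separator, trim once at the end" suggestion made for parseParaType above, here is a rough sketch. It is not the code from the patch: `parseParaTypeSketch` is a hypothetical name, the two constants merely mirror the mask values quoted from XCOFF.h, and the bit encoding (a left-justified word where `0` is a fixed parameter, `10` a single-precision float and `11` a double-precision float) is an assumption on my part.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the masks quoted from XCOFF.h in this review.
constexpr uint32_t FixedParaTypeBit = 0x80000000u;
constexpr uint32_t FloatPointParaTypeBit = 0x40000000u;

// Sketch only. Assumed encoding, read from the most significant bit down:
//   '0'  -> fixed-point parameter  ("i")
//   '10' -> single-precision float ("f")
//   '11' -> double-precision float ("d")
std::string parseParaTypeSketch(uint32_t Value, unsigned ParaNum) {
  std::string ParaType;
  for (unsigned I = 0; I < ParaNum; ++I) {
    if ((Value & FixedParaTypeBit) == 0) {
      ParaType += "i, ";
      Value <<= 1; // A fixed parameter consumes one bit.
    } else {
      ParaType += (Value & FloatPointParaTypeBit) == 0 ? "f, " : "d, ";
      Value <<= 2; // A floating-point parameter consumes two bits.
    }
  }
  // Remove the trailing ", " once, rather than testing I != 0 per iteration.
  if (!ParaType.empty())
    ParaType.resize(ParaType.size() - 2);
  return ParaType;
}
```

With this shape, `parseParaTypeSketch(0xB0000000u, 3)` decodes the leading bits `10 11 0` as a float, a double and a fixed parameter. Swapping `std::string` for a SmallString, as suggested for line 847, would not change the structure.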
================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:902 + CtlInfo = DE.getU32(&offset_ptr, &Err); + + if (!Err && isFuncNamePresent()) { ---------------- Are we missing something between CtlInfo and FunctionNameLen? ``` * ctl_info exists if has_ctl bit is set. * ctl_info_disp exists if ctl_info exists. * name_len exists if name_present bit is set. ``` i.e. the ctl_info_disp? ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:912 + AllocaRegister = DE.getU8(&offset_ptr, &Err); +} + ---------------- It seems this is missing the parsing of ``` struct vec_ext vec_ext; /* Vector extension (if has_vec is set) */ unsigned char xtbtable; /* More tbtable fields, if longtbtable is set*/ ``` Although it is Okay (or even preferable) that we do not handle these extra fields in this patch to keep the patch size reasonable, I think we should still return errors when these fields are present. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:916 + (support::endian::read32be(TBPtr + P) & TracebackTable::X) +#define GETBITWITHMASKSHIFT(P, X, S) \ + (support::endian::read32be(TBPtr + P) & TracebackTable::X) >> \ ---------------- Macros are missing undefs. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Tue Jul 7 09:29:47 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:29:47 +0000 (UTC) Subject: [PATCH] D82871: [SVE] Custom ISel for fixed length extract/insert_subvector. In-Reply-To: References: Message-ID: <3dd88eb5e513a71926286fa67a435011@localhost.localdomain> cameron.mcinally added inline comments. ================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll:25 +; how fixed length operation are lowered to scalable ones, with multiple blocks +; ensuring insert/extract sequences are not folded away. 
+ ---------------- paulwalker-arm wrote: > cameron.mcinally wrote: > > Nit: could probably mark the loads/stores volatile to avoid the branch. > I'm not sure how this helps. The reason for the branch is to force a block boundary to ensure the extract_subvector resulting from lowering the load is not combined with the insert_subvector that's created when lowering the store. Ok, that makes sense. Let me ask the opposite though -- if the load and store are volatile, will the fixed-width lowering honor the volatile? ``` x = load volatile *p y = fneg x x = load volatile *p store *p, x ``` Unlikely a problem considering the loads made it to the backend, but would be good to confirm. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82871/new/ https://reviews.llvm.org/D82871 From llvm-commits at lists.llvm.org Tue Jul 7 09:32:39 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:32:39 +0000 (UTC) Subject: [PATCH] D83181: [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) In-Reply-To: References: Message-ID: <74bfb2e4c3e7ce63f892ef63e1c53c30@localhost.localdomain> RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land. 
LGTM - cheers ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:40269 + SmallVector ShlVals; + for (unsigned i = 0; i != VT.getVectorNumElements(); ++i) { + auto *MaskVal = cast(Mask.getOperand(i)); ---------------- unsigned i = 0, e = VT.getVectorNumElements(); i != e; ++i CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83181/new/ https://reviews.llvm.org/D83181 From llvm-commits at lists.llvm.org Tue Jul 7 09:33:18 2020 From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:33:18 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <24b96392f78d986cf224fe1f86a9f9ce@localhost.localdomain> jasonliu added inline comments. ================ Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:348 + + static constexpr uint32_t FixedParaTypeBit = 0x8000'0000; + static constexpr uint32_t FloatPointParaTypeBit = 0x4000'0000; ---------------- jasonliu wrote: > TODO: add a comment to say this is for other purpose? Sorry... TODO was for my reviewing purpose. What I wanted to say is that we might want to add a comment here to separate FixedParaTypeBit and FloatPointParaTypeBit from above `Byte [num]`. Just a blank line is not enough. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Tue Jul 7 09:34:53 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:34:53 +0000 (UTC) Subject: [PATCH] D83319: [x86] fix miscompile in buildvector v16i8 lowering In-Reply-To: References: Message-ID: <208fd0f9db6ace1c905782f7a2fd6a6e@localhost.localdomain> craig.topper accepted this revision. craig.topper added a comment. 
LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83319/new/ https://reviews.llvm.org/D83319 From llvm-commits at lists.llvm.org Tue Jul 7 09:38:21 2020 From: llvm-commits at lists.llvm.org (Ding Fei via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:38:21 +0000 (UTC) Subject: [PATCH] D83321: [Support] Fix utf16 path's index upper bound Message-ID: danix800 created this revision. danix800 added reviewers: ddunbar, lattner, krememek. Herald added subscribers: llvm-commits, Charusso, arphaman, dexonsmith, hiraditya. Herald added a project: LLVM. Size of paths in utf8 is possibly bigger than in utf16. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83321 Files: llvm/lib/Support/Windows/Path.inc Index: llvm/lib/Support/Windows/Path.inc =================================================================== --- llvm/lib/Support/Windows/Path.inc +++ llvm/lib/Support/Windows/Path.inc @@ -958,8 +958,8 @@ // Convert path to the format that Windows is happy with. if (PathUTF16.size() > 0 && - !is_separator(PathUTF16[Path.size() - 1]) && - PathUTF16[Path.size() - 1] != L':') { + !is_separator(PathUTF16[PathUTF16.size() - 1]) && + PathUTF16[PathUTF16.size() - 1] != L':') { PathUTF16.push_back(L'\\'); PathUTF16.push_back(L'*'); } else { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83321.276103.patch Type: text/x-patch Size: 580 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 09:48:15 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:48:15 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <879daba22369a667ada34d0237e93e5f@localhost.localdomain> hubert.reinterpretcast added inline comments. 
================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:428 + + bool doesBackChainStore(); + uint8_t getNumOfFPRsSaved(); ---------------- jasonliu wrote: > doesBackChainStore -> isBackChainStore From a brief look, I think the "is" form would be `isBackChainStoring`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Tue Jul 7 09:48:16 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:48:16 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: <14f361495c56d0dddb5c7b0e992d74ea@localhost.localdomain> guiand added a comment. I was discussing this with @eugenis and he suggested that it's reasonable not to copy any call-inst attributes when simplifying LibCalls, specifically because you run into these messy questions of how reasonable it is to blindly apply attributes as a caller. Since LibCalls have their own inherent attribute lists which get applied upon creating a `Function` instance for them, it seems like we shouldn't impose any further attributes.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Tue Jul 7 09:48:23 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Tue, 07 Jul 2020 09:48:23 -0700 (PDT) Subject: [lld] 09b81a7 - [ELF] Ignore --no-relax for RISC-V Message-ID: <5f04a757.1c69fb81.88fa8.77bb@mx.google.com> Author: Fangrui Song Date: 2020-07-07T09:48:13-07:00 New Revision: 09b81a72ac67c035f74ff369e6862d75cc4c4090 URL: https://github.com/llvm/llvm-project/commit/09b81a72ac67c035f74ff369e6862d75cc4c4090 DIFF: https://github.com/llvm/llvm-project/commit/09b81a72ac67c035f74ff369e6862d75cc4c4090.diff LOG: [ELF] Ignore --no-relax for RISC-V In GNU ld, --no-relax can disable x86-64 GOTPCRELX relaxation. It is not useful, so we don't implement it. For RISC-V, --no-relax disables linker relaxations which have larger impact. Linux kernel specifies --no-relax when CONFIG_DYNAMIC_FTRACE is specified (since http://git.kernel.org/linus/a1d2a6b4cee858a2f27eebce731fbf1dfd72cb4e ). LLD has not implemented the relaxations, so this option is a no-op. 
Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D81359 Added: Modified: lld/ELF/Options.td lld/docs/ld.lld.1 lld/test/ELF/silent-ignore.test Removed: ################################################################################ diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td index 0a16faa2b8fe..bc12f4d45546 100644 --- a/lld/ELF/Options.td +++ b/lld/ELF/Options.td @@ -635,6 +635,7 @@ def: F<"no-copy-dt-needed-entries">; def: F<"no-ctors-in-init-array">; def: F<"no-keep-memory">; def: F<"no-pipeline-knowledge">; +def: F<"no-relax">; def: F<"no-warn-mismatch">; def: Flag<["-"], "p">; def: Separate<["--", "-"], "rpath-link">; diff --git a/lld/docs/ld.lld.1 b/lld/docs/ld.lld.1 index 0522feb145f1..3acc818afa22 100644 --- a/lld/docs/ld.lld.1 +++ b/lld/docs/ld.lld.1 @@ -308,6 +308,8 @@ Disable merging .ARM.exidx entries. Page align sections. .It Fl -no-omagic Do not set the text data sections to be writable, page align sections. +.It Fl -no-relax +Disable target-specific relaxations. This is currently a no-op. .It Fl -no-rosegment Do not put read-only non-executable sections in their own segment. .It Fl -no-undefined-version diff --git a/lld/test/ELF/silent-ignore.test b/lld/test/ELF/silent-ignore.test index 600c9f86b18a..6f76809d0e11 100644 --- a/lld/test/ELF/silent-ignore.test +++ b/lld/test/ELF/silent-ignore.test @@ -7,6 +7,7 @@ RUN: -no-copy-dt-needed-entries \ RUN: -no-ctors-in-init-array \ RUN: -no-keep-memory \ RUN: -no-pipeline-knowledge \ +RUN: --no-relax \ RUN: -no-warn-mismatch \ RUN: -p \ RUN: -rpath-link . \ From llvm-commits at lists.llvm.org Tue Jul 7 09:48:34 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:48:34 +0000 (UTC) Subject: [PATCH] D81359: [ELF] Ignore --no-relax for RISC-V In-Reply-To: References: Message-ID: <048bfdbd3e4b8ecb8b98153b39cdde05@localhost.localdomain> This revision was automatically updated to reflect the committed changes. 
Closed by commit rG09b81a72ac67: [ELF] Ignore --no-relax for RISC-V (authored by MaskRay). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81359/new/ https://reviews.llvm.org/D81359 Files: lld/ELF/Options.td lld/docs/ld.lld.1 lld/test/ELF/silent-ignore.test Index: lld/test/ELF/silent-ignore.test =================================================================== --- lld/test/ELF/silent-ignore.test +++ lld/test/ELF/silent-ignore.test @@ -7,6 +7,7 @@ RUN: -no-ctors-in-init-array \ RUN: -no-keep-memory \ RUN: -no-pipeline-knowledge \ +RUN: --no-relax \ RUN: -no-warn-mismatch \ RUN: -p \ RUN: -rpath-link . \ Index: lld/docs/ld.lld.1 =================================================================== --- lld/docs/ld.lld.1 +++ lld/docs/ld.lld.1 @@ -308,6 +308,8 @@ Page align sections. .It Fl -no-omagic Do not set the text data sections to be writable, page align sections. +.It Fl -no-relax +Disable target-specific relaxations. This is currently a no-op. .It Fl -no-rosegment Do not put read-only non-executable sections in their own segment. .It Fl -no-undefined-version Index: lld/ELF/Options.td =================================================================== --- lld/ELF/Options.td +++ lld/ELF/Options.td @@ -635,6 +635,7 @@ def: F<"no-ctors-in-init-array">; def: F<"no-keep-memory">; def: F<"no-pipeline-knowledge">; +def: F<"no-relax">; def: F<"no-warn-mismatch">; def: Flag<["-"], "p">; def: Separate<["--", "-"], "rpath-link">;
From llvm-commits at lists.llvm.org Tue Jul 7 09:48:55 2020 From: llvm-commits at lists.llvm.org (Bardia Mahjour via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:48:55 +0000 (UTC) Subject: [PATCH] D82927: Intergerate Loop Peeling into Loop Fusion In-Reply-To: References: Message-ID: <9b50c9805e29dc0a00eb25ee70d4b4f3@localhost.localdomain> bmahjour added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:124 + +static cl::opt FusionPeelMaxCount( + "loop-fusion-peel-max-count", cl::init(3), cl::Hidden, ---------------- We only really need one option to control peeling for loop fusion. You can remove `loop-fusion-peel` and change the default for this threshold to 0. I don't know how others feel about this, but I'd rather have shorter commands when possible. ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:758 + + LLVM_DEBUG(dbgs() << "Difference in Loop trip count is: " << Difference + << "\n"); ---------------- nit: Loop -> loop ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:766 + unsigned PeelCount) { + assert(FC0.AbleToPeel && "Should be able to Peel Loop"); + ---------------- nit: Peel Loop -> peel loop. ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:814 + LLVM_DEBUG( + dbgs() << "Sucessfully Peeled " << FC0.PP.PeelCount + << " iterations from the first loop.\n" ---------------- nit: Peeled -> peeled ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:861 + + // Check if the user specfices to use Peeling with fusion + // Check FC0 (first loop) is able to be peeled ---------------- nit: comments should be full sentences ending with a period.
================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:862 + // Check if the user specfices to use Peeling with fusion + // Check FC0 (first loop) is able to be peeled + // Check that both loops have the different tripcounts ---------------- is able to be peeled -> can be peeled ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:863 + // Check FC0 (first loop) is able to be peeled + // Check that both loops have the different tripcounts + if (FusionPeel.getNumOccurrences() > 0 && FusionPeel && ---------------- the different -> different ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:867 + assert(*TCDifference <= FusionPeelMaxCount && + "Peel Count: is greater than maximum value specificed"); + // Dependent on peeling being performed on the first loop, and ---------------- Please change this to give up on such loops instead of asserting. ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:1252 /// Simplify the condition of the latch branch of \p FC to true, when both of /// its successors are the same. ---------------- This function no longer does what this comment says. Please update the comment. ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:1558 + // of the FC0 Exit block to the to the beginning of the exit block of FC1. + if (!FC0.Peeled) + moveInstructionsToTheBeginning(*FC0.ExitBlock, *FC1.ExitBlock, DT, PDT, ---------------- `moveInstructionsToTheBeginning(FC0.Peeled ? *FC0ExitBlockSuccessor : *FC0.ExitBlock, *FC1.ExitBlock, DT, PDT, DI);` ================ Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:1583 FC0.GuardBranch->replaceUsesOfWith(FC0NonLoopBlock, FC1NonLoopBlock); - FC0.ExitBlock->getTerminator()->replaceUsesOfWith(FC1GuardBlock, - FC1.Header); + if (!FC0.Peeled) + FC0.ExitBlock->getTerminator()->replaceUsesOfWith(FC1GuardBlock, ---------------- ``` BasicBlock *BBToUpdate = FC0.Peeled ? 
FC0ExitBlockSuccessor : FC0.ExitBlock; BBToUpdate->getTerminator()->replaceUsesOfWith(FC1GuardBlock, FC1.Header); ``` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82927/new/ https://reviews.llvm.org/D82927 From llvm-commits at lists.llvm.org Tue Jul 7 09:51:13 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:51:13 +0000 (UTC) Subject: [PATCH] D82871: [SVE] Custom ISel for fixed length extract/insert_subvector. In-Reply-To: References: Message-ID: <2998ce966969b262f5bcce2e7a074356@localhost.localdomain> paulwalker-arm marked an inline comment as done. paulwalker-arm added inline comments. ================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll:25 +; how fixed length operation are lowered to scalable ones, with multiple blocks +; ensuring insert/extract sequences are not folded away. + ---------------- cameron.mcinally wrote: > paulwalker-arm wrote: > > cameron.mcinally wrote: > > > Nit: could probably mark the loads/stores volatile to avoid the branch. > > I'm not sure how this helps. The reason for the branch is to force a block boundary to ensure the extract_subvector resulting from lowering the load is not combined with the insert_subvector that's created when lowering the store. > Ok, that makes sense. Let me ask the opposite though -- if the load and store are volatile, will the fixed-width lowering honor the volatile? > > ``` > x = load volatile *p > y = fneg x > x = load volatile *p > store *p, x > ``` > > Unlikely a problem considering the loads made it to the backend, but would be good to confirm. The resulting masked memory operation takes the same MachineMemOperand as the original fixed length operation, so the volatile flag will be maintained. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82871/new/ https://reviews.llvm.org/D82871 From llvm-commits at lists.llvm.org Tue Jul 7 09:52:10 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:52:10 +0000 (UTC) Subject: [PATCH] D82680: MSAN: Allow emitting checks for struct types In-Reply-To: References: Message-ID: guiand updated this revision to Diff 276110. guiand added a comment. Fixed msan_x86_bts_asm test after optimizing away `icmp i1 %x, false`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82680/new/ https://reviews.llvm.org/D82680 Files: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/check-struct.ll llvm/test/Instrumentation/MemorySanitizer/msan_x86_bts_asm.ll From llvm-commits at lists.llvm.org Tue Jul 7 09:56:16 2020 From: llvm-commits at lists.llvm.org (Bardia Mahjour via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:56:16 +0000 (UTC) Subject: [PATCH] D83056: Separate the Loop Peeling Utilities from the Loop Unrolling Utilities In-Reply-To: References: Message-ID: <833424774584d2d897459e427d37ffe2@localhost.localdomain> bmahjour added a comment. Please put NFC in the title if no functional changes are intended. The same comment applies to D80580. Thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83056/new/ https://reviews.llvm.org/D83056 From llvm-commits at lists.llvm.org Tue Jul 7 09:57:22 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:57:22 +0000 (UTC) Subject: [PATCH] D83323: AMDGPU/GlobalISel: Handle call return values Message-ID: arsenm created this revision.
arsenm added reviewers: nhaehnle, kerbowa, foad, Petar.Avramovic, mbrkusanin. Herald added subscribers: hiraditya, t-tye, tpr, dstuttard, rovka, yaxunl, wdng, jvesely, kzhuravl. Herald added a project: LLVM. The only case that I know doesn't work is the implicit sret case when the return type doesn't fit in the return registers. https://reviews.llvm.org/D83323 Files: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll From llvm-commits at lists.llvm.org Tue Jul 7 10:00:03 2020 From: llvm-commits at lists.llvm.org (Amy Huang via llvm-commits) Date: Tue, 07 Jul 2020 10:00:03 -0700 (PDT) Subject: [llvm] 9ee90a4 - [NativeSession] Add column numbers to NativeLineNumber. Message-ID: <5f04aa13.1c69fb81.f2c3e.2aea@mx.google.com> Author: Amy Huang Date: 2020-07-07T09:59:22-07:00 New Revision: 9ee90a490563a735ddaa739a34c2204c7494826f URL: https://github.com/llvm/llvm-project/commit/9ee90a490563a735ddaa739a34c2204c7494826f DIFF: https://github.com/llvm/llvm-project/commit/9ee90a490563a735ddaa739a34c2204c7494826f.diff LOG: [NativeSession] Add column numbers to NativeLineNumber. Summary: This adds column numbers if they are present, and otherwise sets the column number to be zero.
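The behaviour stated in the summary (use the column when one is recorded, fall back to zero otherwise) can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the code from the patch: `LineRecord`, `ColumnRecord`, `Entry`, and `pairLinesWithColumns` are hypothetical stand-ins invented here, and the only thing taken from the commit is the idea of walking the column records in parallel with the line records.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the line and column records; these are not the
// LLVM types, just enough structure to show the pairing.
struct LineRecord {
  uint32_t Offset;
  uint32_t StartLine;
};
struct ColumnRecord {
  uint16_t StartColumn;
  uint16_t EndColumn;
};
struct Entry {
  uint32_t Offset;
  uint32_t Line;
  uint32_t Column;
};

// Walks the line records and, when column info is present, consumes one
// column record per line record. A line record with no matching column
// record (or a table without column info) gets column 0.
std::vector<Entry> pairLinesWithColumns(const std::vector<LineRecord> &Lines,
                                        const std::vector<ColumnRecord> &Columns,
                                        bool HasColumnInfo) {
  std::vector<Entry> Entries;
  auto ColIt = Columns.begin();
  auto ColsEnd = Columns.end();
  for (const LineRecord &LN : Lines) {
    uint32_t ColNum = 0;
    if (HasColumnInfo && ColIt != ColsEnd) {
      ColNum = ColIt->StartColumn;
      ++ColIt;
    }
    Entries.push_back({LN.Offset, LN.StartLine, ColNum});
  }
  return Entries;
}
```

Checking the iterator against the end on every step keeps the walk safe even if there are fewer column records than line records, at the cost of silently reporting column 0 for the remainder.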
Bug: https://bugs.llvm.org/show_bug.cgi?id=41795 Reviewers: amccarth Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81950 Added: llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.exe llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.pdb llvm/test/tools/llvm-symbolizer/pdb/pdb-native-columns.test Modified: llvm/include/llvm/DebugInfo/PDB/Native/NativeLineNumber.h llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp llvm/test/tools/llvm-symbolizer/pdb/Inputs/test.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/DebugInfo/PDB/Native/NativeLineNumber.h b/llvm/include/llvm/DebugInfo/PDB/Native/NativeLineNumber.h index f105526adf56..a7ce82c70b08 100644 --- a/llvm/include/llvm/DebugInfo/PDB/Native/NativeLineNumber.h +++ b/llvm/include/llvm/DebugInfo/PDB/Native/NativeLineNumber.h @@ -19,7 +19,8 @@ namespace pdb { class NativeLineNumber : public IPDBLineNumber { public: explicit NativeLineNumber(const NativeSession &Session, - const codeview::LineInfo Line, uint32_t Length, + const codeview::LineInfo Line, + uint32_t ColumnNumber, uint32_t Length, uint32_t Section, uint32_t Offset, uint32_t SrcFileId); @@ -39,6 +40,7 @@ class NativeLineNumber : public IPDBLineNumber { private: const NativeSession &Session; const codeview::LineInfo Line; + uint32_t ColumnNumber; uint32_t Section; uint32_t Offset; uint32_t Length; diff --git a/llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h b/llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h index e14ef3209916..90fd19a7a2fb 100644 --- a/llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h +++ b/llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h @@ -75,6 +75,7 @@ class SymbolCache { struct LineTableEntry { uint64_t Addr; codeview::LineInfo Line; + uint32_t ColumnNumber; uint32_t FileNameIndex; bool 
IsTerminalEntry; }; diff --git a/llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp b/llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp index f493c1807942..2535e09baf62 100644 --- a/llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp +++ b/llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp @@ -13,10 +13,11 @@ using namespace llvm::pdb; NativeLineNumber::NativeLineNumber(const NativeSession &Session, const codeview::LineInfo Line, - uint32_t Section, uint32_t Offset, - uint32_t Length, uint32_t SrcFileId) - : Session(Session), Line(Line), Section(Section), Offset(Offset), - Length(Length), SrcFileId(SrcFileId) {} + uint32_t ColumnNumber, uint32_t Section, + uint32_t Offset, uint32_t Length, + uint32_t SrcFileId) + : Session(Session), Line(Line), ColumnNumber(ColumnNumber), + Section(Section), Offset(Offset), Length(Length), SrcFileId(SrcFileId) {} uint32_t NativeLineNumber::getLineNumber() const { return Line.getStartLine(); } @@ -24,7 +25,7 @@ uint32_t NativeLineNumber::getLineNumberEnd() const { return Line.getEndLine(); } -uint32_t NativeLineNumber::getColumnNumber() const { return 0; } +uint32_t NativeLineNumber::getColumnNumber() const { return ColumnNumber; } uint32_t NativeLineNumber::getColumnNumberEnd() const { return 0; } diff --git a/llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp b/llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp index 83cf77aae862..9f15907b519e 100644 --- a/llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp +++ b/llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp @@ -460,18 +460,33 @@ SymbolCache::findLineTable(uint16_t Modi) const { continue; std::vector Entries; + + // If there are column numbers, then they should be in a parallel stream + // to the line numbers. 
+ auto ColIt = Group.Columns.begin(); + auto ColsEnd = Group.Columns.end(); + for (const LineNumberEntry &LN : Group.LineNumbers) { - LineInfo Line(LN.Flags); uint64_t VA = Session.getVAFromSectOffset(RelocSegment, RelocOffset + LN.Offset); - Entries.push_back({VA, Line, Group.NameIndex, false}); + LineInfo Line(LN.Flags); + uint32_t ColNum = 0; + + if (Lines.hasColumnInfo() && ColIt != ColsEnd) { + ColNum = ColIt->StartColumn; + ++ColIt; + } + Entries.push_back({VA, Line, ColNum, Group.NameIndex, false}); } // Add a terminal entry line to mark the end of this subsection. - LineInfo LastLine(Group.LineNumbers.back().Flags); uint64_t VA = Session.getVAFromSectOffset( RelocSegment, RelocOffset + Lines.header()->CodeSize); - Entries.push_back({VA, LastLine, Group.NameIndex, true}); + LineInfo LastLine(Group.LineNumbers.back().Flags); + uint32_t ColNum = + (Lines.hasColumnInfo()) ? Group.Columns.back().StartColumn : 0; + Entries.push_back({VA, LastLine, ColNum, Group.NameIndex, true}); + EntryList.push_back(Entries); } } @@ -571,8 +586,8 @@ SymbolCache::findLineNumbersByVA(uint64_t VA, uint32_t Length) const { auto ChecksumIter = ExpectedChecksums->getArray().at(LineIter->FileNameIndex); uint32_t SrcFileId = getOrCreateSourceFile(*ChecksumIter); - NativeLineNumber LineNum(Session, LineIter->Line, LineSect, LineOff, - LineLength, SrcFileId); + NativeLineNumber LineNum(Session, LineIter->Line, LineIter->ColumnNumber, + LineSect, LineOff, LineLength, SrcFileId); LineNumbers.push_back(LineNum); ++LineIter; } diff --git a/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.exe b/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.exe new file mode 100644 index 000000000000..1d9a40dc74e5 Binary files /dev/null and b/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.exe differ diff --git a/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.pdb b/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.pdb new file mode 100644 index
000000000000..cd1093270e84 Binary files /dev/null and b/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.pdb differ diff --git a/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test.cpp b/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test.cpp index bf97594fa4c8..e1ac50f2e820 100644 --- a/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test.cpp +++ b/llvm/test/tools/llvm-symbolizer/pdb/Inputs/test.cpp @@ -1,5 +1,7 @@ // To generate the corresponding EXE/PDB, run: // cl /Zi test.cpp +// To generate the PDB with column numbers, run: +// clang-cl /Zi -gcolumn-info test.cpp namespace NS { struct Foo { diff --git a/llvm/test/tools/llvm-symbolizer/pdb/pdb-native-columns.test b/llvm/test/tools/llvm-symbolizer/pdb/pdb-native-columns.test new file mode 100644 index 000000000000..a8ccc33d0357 --- /dev/null +++ b/llvm/test/tools/llvm-symbolizer/pdb/pdb-native-columns.test @@ -0,0 +1,29 @@ +RUN: echo 0x140006BA0 > %t.input +RUN: echo 0x140006C00 >> %t.input +RUN: echo 0x140006BB0 >> %t.input +RUN: echo 0x140006C10 >> %t.input +RUN: echo 0x140006C20 >> %t.input +RUN: echo 0x140006C30 >> %t.input +RUN: echo 0x140006C40 >> %t.input +RUN: echo 0x140006C70 >> %t.input +RUN: llvm-symbolizer -obj="%p/Inputs/test-columns.exe" -use-native-pdb-reader < %t.input \ +RUN: | FileCheck %s + +This tests that the symbolizer outputs column info when it is present in the pdb.
+ +CHECK: foo(void) +CHECK-NEXT: test.cpp:11:1 +CHECK: {{^private_symbol$}} +CHECK-NEXT: test.cpp:14:1 +CHECK: {{^main}} +CHECK-NEXT: test.cpp:16:0 +CHECK: {{^foo_cdecl$}} +CHECK-NEXT: test.cpp:25:27 +CHECK: {{^foo_stdcall$}} +CHECK-NEXT: test.cpp:26:31 +CHECK: {{^foo_fastcall$}} +CHECK-NEXT: test.cpp:27:33 +CHECK: {{^foo_vectorcall}} +CHECK-NEXT: test.cpp:28:37 +CHECK: NS::Foo::bar(void) +CHECK-NEXT: test.cpp:6:0 From llvm-commits at lists.llvm.org Tue Jul 7 10:00:15 2020 From: llvm-commits at lists.llvm.org (Amy Huang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:00:15 +0000 (UTC) Subject: [PATCH] D81950: [NativeSession] Add column numbers to NativeLineNumber. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG9ee90a490563: [NativeSession] Add column numbers to NativeLineNumber. (authored by akhuang). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81950/new/ https://reviews.llvm.org/D81950 Files: llvm/include/llvm/DebugInfo/PDB/Native/NativeLineNumber.h llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.exe llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.pdb llvm/test/tools/llvm-symbolizer/pdb/Inputs/test.cpp llvm/test/tools/llvm-symbolizer/pdb/pdb-native-columns.test
From llvm-commits at lists.llvm.org Tue Jul 7 10:00:35 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:00:35 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: <75195ffab508942babd212c05e71d20c@localhost.localdomain> tskeith added a comment. @ktras, do you have the necessary permissions to commit this or do you need someone to do it for you? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 From llvm-commits at lists.llvm.org Tue Jul 7 10:01:33 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:01:33 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <6978c55b55316cb167aafd7d5978f431@localhost.localdomain> zequanwu marked an inline comment as done. zequanwu added inline comments. ================ Comment at: llvm/test/Other/opt-O2-pipeline.ll:289 +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ---------------- nikic wrote: > hans wrote: > > nikic wrote: > > > Is it possible to switch this pass to use LazyBPI / LazyBFA, only fetched if PGO is actually in use? > > > > > > PGO functionality that most people don't use adding expensive analysis passes like PDT should be avoided. > > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. > > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > I wonder if just switching to LazyBlockFrequencyInfo would help though.
It looks to me like the CGProfile would request info about each function anyway. > > It would only help if there is some way to only fetch the analysis conditionally. I believe many PGO passes use something like PSI.hasProfileSummary() or F.hasProfileData() for that. > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > Right, just disabling this by default in clang/opt would also work. > > For reference, the current compile-time numbers for this patch: https://llvm-compile-time-tracker.com/compare.php?from=516ff1d4baee28b1911737e47b42973567adf8ff&to=8df840660bb764b6653fcfd9ac7a72cc6adebde6&stat=instructions Not huge, but it adds up (some similar regressions have been introduced in LLVM 10). Do you mean disabling it just for LPM or both? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 10:02:40 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Tue, 07 Jul 2020 10:02:40 -0700 (PDT) Subject: [llvm] 642eed3 - [x86] fix miscompile in buildvector v16i8 lowering Message-ID: <5f04aab0.1c69fb81.8555b.76a2@mx.google.com> Author: Sanjay Patel Date: 2020-07-07T13:02:31-04:00 New Revision: 642eed37134db4aca953704d1e4ae856af675f51 URL: https://github.com/llvm/llvm-project/commit/642eed37134db4aca953704d1e4ae856af675f51 DIFF: https://github.com/llvm/llvm-project/commit/642eed37134db4aca953704d1e4ae856af675f51.diff LOG: [x86] fix miscompile in buildvector v16i8 lowering In the test based on PR46586: https://bugs.llvm.org/show_bug.cgi?id=46586 ...we are inserting 16-bits into the high element of the vector, shuffling it to element 0, and extracting 32-bits. But xmm1 was never initialized, so the top 16-bits of the extract are undef without this patch. 
(It seems like we could do better than this by recognizing that we only demand a subsection of the build vector, but I want to make sure we fix the miscompile 1st.) This path is only used for pre-SSE4.1, and simpler patterns get squashed somewhere along the way, so the test still includes a 'urem' as it did in the original test from the bug report. Differential Revision: https://reviews.llvm.org/D83319 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/buildvec-insertvec.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 4821dd44e01f..575f358361b1 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -8002,10 +8002,11 @@ static SDValue LowerBuildVectorv16i8(SDValue Op, unsigned NonZeros, Elt = NextElt; } - // If our first insertion is not the first index then insert into zero - // vector to break any register dependency else use SCALAR_TO_VECTOR. + // If our first insertion is not the first index or zeros are needed, then + // insert into zero vector. Otherwise, use SCALAR_TO_VECTOR (leaves high + // elements undefined). if (!V) { - if (i != 0) + if (i != 0 || NumZero) V = getZeroVector(MVT::v8i16, Subtarget, DAG, dl); else { V = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v4i32, Elt); diff --git a/llvm/test/CodeGen/X86/buildvec-insertvec.ll b/llvm/test/CodeGen/X86/buildvec-insertvec.ll index 3922450b0f21..e428ae8d5919 100644 --- a/llvm/test/CodeGen/X86/buildvec-insertvec.ll +++ b/llvm/test/CodeGen/X86/buildvec-insertvec.ll @@ -784,12 +784,13 @@ define <4 x i32> @ossfuzz5688(i32 %a0) { ret <4 x i32> %5 } -; FIXME: If we do not define all bytes that are extracted, this is a miscompile. +; If we do not define all bytes that are extracted, this is a miscompile. 
define i32 @PR46586(i8* %p, <4 x i32> %v) { ; SSE2-LABEL: PR46586: ; SSE2: # %bb.0: ; SSE2-NEXT: movzbl 3(%rdi), %eax +; SSE2-NEXT: pxor %xmm1, %xmm1 ; SSE2-NEXT: pinsrw $6, %eax, %xmm1 ; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3] ; SSE2-NEXT: movd %xmm1, %eax From llvm-commits at lists.llvm.org Tue Jul 7 10:02:57 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:02:57 +0000 (UTC) Subject: [PATCH] D83319: [x86] fix miscompile in buildvector v16i8 lowering In-Reply-To: References: Message-ID: <541aede94d25dd664006672fea9a52f1@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG642eed37134d: [x86] fix miscompile in buildvector v16i8 lowering (authored by spatel). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83319/new/ https://reviews.llvm.org/D83319 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/buildvec-insertvec.ll Index: llvm/test/CodeGen/X86/buildvec-insertvec.ll =================================================================== --- llvm/test/CodeGen/X86/buildvec-insertvec.ll +++ llvm/test/CodeGen/X86/buildvec-insertvec.ll @@ -784,12 +784,13 @@ ret <4 x i32> %5 } -; FIXME: If we do not define all bytes that are extracted, this is a miscompile. +; If we do not define all bytes that are extracted, this is a miscompile. 
define i32 @PR46586(i8* %p, <4 x i32> %v) { ; SSE2-LABEL: PR46586: ; SSE2: # %bb.0: ; SSE2-NEXT: movzbl 3(%rdi), %eax +; SSE2-NEXT: pxor %xmm1, %xmm1 ; SSE2-NEXT: pinsrw $6, %eax, %xmm1 ; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3] ; SSE2-NEXT: movd %xmm1, %eax Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -8002,10 +8002,11 @@ Elt = NextElt; } - // If our first insertion is not the first index then insert into zero - // vector to break any register dependency else use SCALAR_TO_VECTOR. + // If our first insertion is not the first index or zeros are needed, then + // insert into zero vector. Otherwise, use SCALAR_TO_VECTOR (leaves high + // elements undefined). if (!V) { - if (i != 0) + if (i != 0 || NumZero) V = getZeroVector(MVT::v8i16, Subtarget, DAG, dl); else { V = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v4i32, Elt); From llvm-commits at lists.llvm.org Tue Jul 7 10:03:02 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via llvm-commits) Date: Tue, 07 Jul 2020 10:03:02 -0700 (PDT) Subject: [llvm] 7fc279c - [GlobalOpt] Don't remove inalloca from musttail-called functions Message-ID: <5f04aac6.1c69fb81.f1d78.2e53@mx.google.com> Author: Hans Wennborg Date: 2020-07-07T19:02:46+02:00 New Revision: 7fc279ca3d414c0997998cb30d1adc2c63c837a5 URL: https://github.com/llvm/llvm-project/commit/7fc279ca3d414c0997998cb30d1adc2c63c837a5 DIFF: https://github.com/llvm/llvm-project/commit/7fc279ca3d414c0997998cb30d1adc2c63c837a5.diff LOG: [GlobalOpt] Don't remove inalloca from musttail-called functions Otherwise the verifier complains about the mismatching function ABIs.
Differential revision: https://reviews.llvm.org/D83300 Added: Modified: llvm/lib/Transforms/IPO/GlobalOpt.cpp llvm/test/Transforms/GlobalOpt/fastcc.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp index ed989529d1ae..d9fb820f7cb5 100644 --- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp +++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp @@ -2442,7 +2442,7 @@ OptimizeFunctions(Module &M, // FIXME: We should also hoist alloca affected by this to the entry // block if possible. if (F->getAttributes().hasAttrSomewhere(Attribute::InAlloca) && - !F->hasAddressTaken()) { + !F->hasAddressTaken() && !hasMustTailCallers(F)) { RemoveAttribute(F, Attribute::InAlloca); Changed = true; } diff --git a/llvm/test/Transforms/GlobalOpt/fastcc.ll b/llvm/test/Transforms/GlobalOpt/fastcc.ll index 9c9076d0155b..edd0688ea92b 100644 --- a/llvm/test/Transforms/GlobalOpt/fastcc.ll +++ b/llvm/test/Transforms/GlobalOpt/fastcc.ll @@ -35,6 +35,17 @@ define internal i32 @inalloca(i32* inalloca %p) { ret i32 %rv } +define i32 @inalloca2_caller(i32* inalloca %p) { + %rv = musttail call i32 @inalloca2(i32* inalloca %p) + ret i32 %rv +} +define internal i32 @inalloca2(i32* inalloca %p) { +; Because of the musttail caller, this inalloca cannot be dropped. 
+; CHECK-LABEL: define internal i32 @inalloca2(i32* inalloca %p) + %rv = load i32, i32* %p + ret i32 %rv +} + define internal i32 @preallocated(i32* preallocated(i32) %p) { ; CHECK-LABEL: define internal fastcc i32 @preallocated(i32* %p) %rv = load i32, i32* %p From llvm-commits at lists.llvm.org Tue Jul 7 10:03:12 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:03:12 +0000 (UTC) Subject: [PATCH] D83300: [GlobalOpt] Don't remove inalloca from musttail-called functions In-Reply-To: References: Message-ID: <9543dbc1648f19648938fdb47dd4b394@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG7fc279ca3d41: [GlobalOpt] Don't remove inalloca from musttail-called functions (authored by hans). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83300/new/ https://reviews.llvm.org/D83300 Files: llvm/lib/Transforms/IPO/GlobalOpt.cpp llvm/test/Transforms/GlobalOpt/fastcc.ll Index: llvm/test/Transforms/GlobalOpt/fastcc.ll =================================================================== --- llvm/test/Transforms/GlobalOpt/fastcc.ll +++ llvm/test/Transforms/GlobalOpt/fastcc.ll @@ -35,6 +35,17 @@ ret i32 %rv } +define i32 @inalloca2_caller(i32* inalloca %p) { + %rv = musttail call i32 @inalloca2(i32* inalloca %p) + ret i32 %rv +} +define internal i32 @inalloca2(i32* inalloca %p) { +; Because of the musttail caller, this inalloca cannot be dropped. 
+; CHECK-LABEL: define internal i32 @inalloca2(i32* inalloca %p) + %rv = load i32, i32* %p + ret i32 %rv +} + define internal i32 @preallocated(i32* preallocated(i32) %p) { ; CHECK-LABEL: define internal fastcc i32 @preallocated(i32* %p) %rv = load i32, i32* %p Index: llvm/lib/Transforms/IPO/GlobalOpt.cpp =================================================================== --- llvm/lib/Transforms/IPO/GlobalOpt.cpp +++ llvm/lib/Transforms/IPO/GlobalOpt.cpp @@ -2442,7 +2442,7 @@ // FIXME: We should also hoist alloca affected by this to the entry // block if possible. if (F->getAttributes().hasAttrSomewhere(Attribute::InAlloca) && - !F->hasAddressTaken()) { + !F->hasAddressTaken() && !hasMustTailCallers(F)) { RemoveAttribute(F, Attribute::InAlloca); Changed = true; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83300.276115.patch Type: text/x-patch Size: 1318 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:05:09 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:05:09 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types In-Reply-To: References: Message-ID: <2812f763430ec838406f7991b06cca5c@localhost.localdomain> kamaub accepted this revision as: kamaub. kamaub added a comment. This revision is now accepted and ready to land. This is LGTM, one suggestion on the clang-format change that can be done pre-commit. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp:97 + // name offset bits flags + {"fixup_ppc_br24", 6, 24, MCFixupKindInfo::FKF_IsPCRel}, + {"fixup_ppc_br24_notoc", 6, 24, MCFixupKindInfo::FKF_IsPCRel}, ---------------- I think it might be a good idea to ignore the clang-format suggestions in this case since the previous way is a lot more readable.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83255/new/ https://reviews.llvm.org/D83255 From llvm-commits at lists.llvm.org Tue Jul 7 10:12:00 2020 From: llvm-commits at lists.llvm.org (Amy Huang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:12:00 +0000 (UTC) Subject: [PATCH] D81950: [NativeSession] Add column numbers to NativeLineNumber. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG9ee90a490563: [NativeSession] Add column numbers to NativeLineNumber. (authored by akhuang). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81950/new/ https://reviews.llvm.org/D81950 Files: llvm/include/llvm/DebugInfo/PDB/Native/NativeLineNumber.h llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.exe llvm/test/tools/llvm-symbolizer/pdb/Inputs/test-columns.pdb llvm/test/tools/llvm-symbolizer/pdb/Inputs/test.cpp llvm/test/tools/llvm-symbolizer/pdb/pdb-native-columns.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D81950.275670.patch Type: text/x-patch Size: 6415 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:12:02 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:12:02 +0000 (UTC) Subject: [PATCH] D81359: [ELF] Add --[no-]relax for RISC-V In-Reply-To: References: Message-ID: <4045a0cb644112d15d1945c2c17dd5b4@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG09b81a72ac67: [ELF] Ignore --no-relax for RISC-V (authored by MaskRay). 
Changed prior to commit: https://reviews.llvm.org/D81359?vs=269281&id=275669#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81359/new/ https://reviews.llvm.org/D81359 Files: lld/ELF/Options.td lld/docs/ld.lld.1 lld/test/ELF/silent-ignore.test Index: lld/test/ELF/silent-ignore.test =================================================================== --- lld/test/ELF/silent-ignore.test +++ lld/test/ELF/silent-ignore.test @@ -7,6 +7,7 @@ RUN: -no-ctors-in-init-array \ RUN: -no-keep-memory \ RUN: -no-pipeline-knowledge \ +RUN: --no-relax \ RUN: -no-warn-mismatch \ RUN: -p \ RUN: -rpath-link . \ Index: lld/docs/ld.lld.1 =================================================================== --- lld/docs/ld.lld.1 +++ lld/docs/ld.lld.1 @@ -308,6 +308,8 @@ Page align sections. .It Fl -no-omagic Do not set the text data sections to be writable, page align sections. +.It Fl -no-relax +Disable target-specific relaxations. This is currently a no-op. .It Fl -no-rosegment Do not put read-only non-executable sections in their own segment. .It Fl -no-undefined-version Index: lld/ELF/Options.td =================================================================== --- lld/ELF/Options.td +++ lld/ELF/Options.td @@ -635,6 +635,7 @@ def: F<"no-ctors-in-init-array">; def: F<"no-keep-memory">; def: F<"no-pipeline-knowledge">; +def: F<"no-relax">; def: F<"no-warn-mismatch">; def: Flag<["-"], "p">; def: Separate<["--", "-"], "rpath-link">; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81359.275669.patch Type: text/x-patch Size: 1220 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:15:00 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:15:00 +0000 (UTC) Subject: [PATCH] D80580: Separate Peeling Properties into its own struct In-Reply-To: References: Message-ID: <86d17cead5dbf179b07b2e3d729d8d5a@localhost.localdomain> Meinersbur accepted this revision. Meinersbur added a comment. This revision is now accepted and ready to land. Given there is no other feedback, I accept the patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80580/new/ https://reviews.llvm.org/D80580 From llvm-commits at lists.llvm.org Tue Jul 7 10:16:11 2020 From: llvm-commits at lists.llvm.org (Dan Liew via llvm-commits) Date: Tue, 07 Jul 2020 10:16:11 -0700 (PDT) Subject: [compiler-rt] 888951a - Disable interception of sigaltstack on i386 macOS. Message-ID: <5f04addb.1c69fb81.7a422.93fc@mx.google.com> Author: Dan Liew Date: 2020-07-07T10:15:37-07:00 New Revision: 888951aaca583bcce85b42ea6166416db8f96fe0 URL: https://github.com/llvm/llvm-project/commit/888951aaca583bcce85b42ea6166416db8f96fe0 DIFF: https://github.com/llvm/llvm-project/commit/888951aaca583bcce85b42ea6166416db8f96fe0.diff LOG: Disable interception of sigaltstack on i386 macOS. Summary: 28c91219c7e introduced an interceptor for `sigaltstack`. It turns out this broke `setjmp` on i386 macOS. This is because the implementation of `setjmp` on i386 macOS is written in assembly and makes the assumption that the call to `sigaltstack` does not clobber any registers. Presumably that assumption was made because it's a system call. In particular, `setjmp` assumes that `%ecx` contains a pointer to the `jmp_buf` both before and after the call. The current interceptor breaks this assumption because it's written in C++ and `%ecx` is not a callee-saved register.
This could be fixed by writing, in assembly, a trampoline to the existing interceptor that ensures all the registers are preserved. However, this is a lot of work for very little gain. Instead this patch just disables the interceptor on i386 macOS. For other Darwin architectures it currently appears to be safe to intercept `sigaltstack` using the current implementation because: * `setjmp` for x86_64 saves the `jmp_buf` pointer to the stack before calling `sigaltstack`. * `setjmp` for armv7/arm64/arm64_32/arm64e appears to not call `sigaltstack` at all. This patch should unbreak (once they are re-enabled) the following tests: ``` AddressSanitizer-Unit :: ./Asan-i386-calls-Test/AddressSanitizer.LongJmpTest AddressSanitizer-Unit :: ./Asan-i386-calls-Test/AddressSanitizer.SigLongJmpTest AddressSanitizer-Unit :: ./Asan-i386-inline-Test/AddressSanitizer.LongJmpTest AddressSanitizer-Unit :: ./Asan-i386-inline-Test/AddressSanitizer.SigLongJmpTest AddressSanitizer-i386-darwin :: TestCases/longjmp.cpp ``` This patch introduces a `SANITIZER_I386` macro for convenience.
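For illustration only, here is a minimal standalone sketch of the gating logic described in the summary above. The `SANITIZER_I386` definition follows the form given in the patch; the `interceptSigaltstack` helper and its boolean parameters are hypothetical stand-ins for the real `SI_POSIX` and `SANITIZER_MAC` macros, which live in the sanitizer headers and are not reproduced here.

```cpp
// 1 when targeting 32-bit x86 (GCC/Clang or MSVC spelling), otherwise 0.
#if defined(__i386__) || defined(_M_IX86)
# define SANITIZER_I386 1
#else
# define SANITIZER_I386 0
#endif

// Hypothetical helper (not part of compiler-rt): mirrors the condition
//   SI_POSIX && !(SANITIZER_MAC && SANITIZER_I386)
// with the platform facts passed in as plain booleans so the truth table
// can be checked on any host.
constexpr bool interceptSigaltstack(bool posix, bool mac, bool i386) {
  return posix && !(mac && i386);
}

// Interception stays on for POSIX targets in general...
static_assert(interceptSigaltstack(true, false, true), "i386 non-macOS");
static_assert(interceptSigaltstack(true, true, false), "x86_64 macOS");
// ...and is switched off only for the i386 macOS combination.
static_assert(!interceptSigaltstack(true, true, true), "i386 macOS");
// Non-POSIX targets never intercept.
static_assert(!interceptSigaltstack(false, false, false), "non-POSIX");
```

Defining the macro as 0 or 1 in every configuration, rather than leaving it undefined, is what allows it to be combined with `&&` and `!` in another macro's expansion.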
rdar://problem/62141412 Reviewers: kubamracek, yln, eugenis Subscribers: kristof.beyls, #sanitizers, llvm-commits Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D82691 Added: Modified: compiler-rt/lib/sanitizer_common/sanitizer_platform.h compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h Removed: ################################################################################ diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform.h index c68bfa258755..f0b1e04d1dd6 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_platform.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform.h @@ -132,6 +132,12 @@ # define SANITIZER_X32 0 #endif +#if defined(__i386__) || defined(_M_IX86) +# define SANITIZER_I386 1 +#else +# define SANITIZER_I386 0 +#endif + #if defined(__mips__) # define SANITIZER_MIPS 1 # if defined(__mips64) diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h index fb0dbfb2e9ae..a5fcbadb2597 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h @@ -597,7 +597,10 @@ #define SANITIZER_INTERCEPT_QSORT \ (SI_POSIX && !SI_IOSSIM && !SI_WATCHOS && !SI_TVOS && !SI_ANDROID) #define SANITIZER_INTERCEPT_QSORT_R (SI_LINUX && !SI_ANDROID) -#define SANITIZER_INTERCEPT_SIGALTSTACK SI_POSIX +// sigaltstack on i386 macOS cannot be intercepted due to setjmp() +// calling it and assuming that it does not clobber registers. 
+#define SANITIZER_INTERCEPT_SIGALTSTACK \ + (SI_POSIX && !(SANITIZER_MAC && SANITIZER_I386)) #define SANITIZER_INTERCEPT_UNAME (SI_POSIX && !SI_FREEBSD) #define SANITIZER_INTERCEPT___XUNAME SI_FREEBSD From llvm-commits at lists.llvm.org Tue Jul 7 10:16:13 2020 From: llvm-commits at lists.llvm.org (Dan Liew via llvm-commits) Date: Tue, 07 Jul 2020 10:16:13 -0700 (PDT) Subject: [compiler-rt] 8a8d6e2 - Revert "Temporarily disable the following failing tests on Darwin:" Message-ID: <5f04addd.1c69fb81.1c783.a8c8@mx.google.com> Author: Dan Liew Date: 2020-07-07T10:15:46-07:00 New Revision: 8a8d6e2b727112fafc52477acaf25affb62b6e65 URL: https://github.com/llvm/llvm-project/commit/8a8d6e2b727112fafc52477acaf25affb62b6e65 DIFF: https://github.com/llvm/llvm-project/commit/8a8d6e2b727112fafc52477acaf25affb62b6e65.diff LOG: Revert "Temporarily disable the following failing tests on Darwin:" This reverts commit f3a089506fdcc4a1d658697009572c93e00c4373. 888951aaca583bcce85b42ea6166416db8f96fe0 introduced a fix that should make the disabled tests work again. rdar://problem/62141412 Added: Modified: compiler-rt/lib/asan/tests/asan_test.cpp Removed: ################################################################################ diff --git a/compiler-rt/lib/asan/tests/asan_test.cpp b/compiler-rt/lib/asan/tests/asan_test.cpp index 83b0b0e8d33e..edc98ed18520 100644 --- a/compiler-rt/lib/asan/tests/asan_test.cpp +++ b/compiler-rt/lib/asan/tests/asan_test.cpp @@ -588,9 +588,6 @@ NOINLINE void TouchStackFunc() { A[i] = i*i; } -// Disabled due to rdar://problem/62141412 -#if !(defined(__APPLE__) && defined(__i386__)) - // Test that we handle longjmp and do not report false positives on stack. TEST(AddressSanitizer, LongJmpTest) { static jmp_buf buf; @@ -600,7 +597,6 @@ TEST(AddressSanitizer, LongJmpTest) { TouchStackFunc(); } } -#endif #if !defined(_WIN32) // Only basic longjmp is available on Windows. 
NOINLINE void UnderscopeLongJmpFunc1(jmp_buf buf) { @@ -662,8 +658,6 @@ TEST(AddressSanitizer, UnderscopeLongJmpTest) { } } -// Disabled due to rdar://problem/62141412 -#if !(defined(__APPLE__) && defined(__i386__)) TEST(AddressSanitizer, SigLongJmpTest) { static sigjmp_buf buf; if (!sigsetjmp(buf, 1)) { @@ -674,8 +668,6 @@ TEST(AddressSanitizer, SigLongJmpTest) { } #endif -#endif - // FIXME: Why does clang-cl define __EXCEPTIONS? #if defined(__EXCEPTIONS) && !defined(_WIN32) NOINLINE void ThrowFunc() { From llvm-commits at lists.llvm.org Tue Jul 7 10:16:14 2020 From: llvm-commits at lists.llvm.org (Dan Liew via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:16:14 +0000 (UTC) Subject: [PATCH] D82691: Disable interception of sigaltstack on i386 macOS. In-Reply-To: References: Message-ID: <9345e415ef4d0f2a398417d89e5807b5@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG888951aaca58: Disable interception of sigaltstack on i386 macOS. (authored by delcypher). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82691/new/ https://reviews.llvm.org/D82691 Files: compiler-rt/lib/sanitizer_common/sanitizer_platform.h compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h Index: compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h =================================================================== --- compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h +++ compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h @@ -597,7 +597,10 @@ #define SANITIZER_INTERCEPT_QSORT \ (SI_POSIX && !SI_IOSSIM && !SI_WATCHOS && !SI_TVOS && !SI_ANDROID) #define SANITIZER_INTERCEPT_QSORT_R (SI_LINUX && !SI_ANDROID) -#define SANITIZER_INTERCEPT_SIGALTSTACK SI_POSIX +// sigaltstack on i386 macOS cannot be intercepted due to setjmp() +// calling it and assuming that it does not clobber registers. 
+#define SANITIZER_INTERCEPT_SIGALTSTACK \ + (SI_POSIX && !(SANITIZER_MAC && SANITIZER_I386)) #define SANITIZER_INTERCEPT_UNAME (SI_POSIX && !SI_FREEBSD) #define SANITIZER_INTERCEPT___XUNAME SI_FREEBSD Index: compiler-rt/lib/sanitizer_common/sanitizer_platform.h =================================================================== --- compiler-rt/lib/sanitizer_common/sanitizer_platform.h +++ compiler-rt/lib/sanitizer_common/sanitizer_platform.h @@ -132,6 +132,12 @@ # define SANITIZER_X32 0 #endif +#if defined(__i386__) || defined(_M_IX86) +# define SANITIZER_I386 1 +#else +# define SANITIZER_I386 0 +#endif + #if defined(__mips__) # define SANITIZER_MIPS 1 # if defined(__mips64) -------------- next part -------------- A non-text attachment was scrubbed... Name: D82691.276121.patch Type: text/x-patch Size: 1354 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:17:16 2020 From: llvm-commits at lists.llvm.org (Vy Nguyen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:17:16 +0000 (UTC) Subject: [PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements. In-Reply-To: References: Message-ID: oontvoo updated this revision to Diff 276122. oontvoo marked 6 inline comments as done. oontvoo added a comment. 
updated diff Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77422/new/ https://reviews.llvm.org/D77422 Files: llvm/docs/CommandGuide/llvm-exegesis.rst llvm/test/tools/llvm-exegesis/X86/lbr/Inputs/mov_add.att llvm/test/tools/llvm-exegesis/X86/lbr/lit.local.cfg llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.h llvm/tools/llvm-exegesis/lib/X86/CMakeLists.txt llvm/tools/llvm-exegesis/lib/X86/Target.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.h llvm/tools/llvm-exegesis/llvm-exegesis.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D77422.276122.patch Type: text/x-patch Size: 21441 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:18:02 2020 From: llvm-commits at lists.llvm.org (Jonas Devlieghere via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:18:02 +0000 (UTC) Subject: [PATCH] D83161: [llvm] [docs] Do not require recommonmark for manpage build In-Reply-To: References: Message-ID: <3c19872dd510744448249756adc1bf57@localhost.localdomain> JDevlieghere accepted this revision. JDevlieghere added a comment. This revision is now accepted and ready to land. I think this makes sense, I often need to build just the man pages and if we can avoid the dependency I'm all for it. LGTM. In D83161#2131468 , @mgorny wrote: > (though personally I'd prefer just converting these three docs into .rst and having a single markup everywhere, without extra dependencies) Personally I agree, but it seems LLVM might be moving in the opposite direction. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83161/new/ https://reviews.llvm.org/D83161 From llvm-commits at lists.llvm.org Tue Jul 7 10:18:51 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:18:51 +0000 (UTC) Subject: [PATCH] D82037: [x86][lvi][seses] Use SESES at O0 for LVI mitigation In-Reply-To: References: Message-ID: zbrid updated this revision to Diff 276123. zbrid added a comment. Herald added a subscriber: nikic. Add inline comment about lvi hardening falling back to seses Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82037/new/ https://reviews.llvm.org/D82037 Files: llvm/lib/Target/X86/X86.h llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86TargetMachine.cpp llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/lvi-hardening-loads.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82037.276123.patch Type: text/x-patch Size: 9359 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:23:12 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:23:12 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: <4a563a142628436240660b01d6a1020b@localhost.localdomain> RithikSharma updated this revision to Diff 276126. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 Files: llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h llvm/lib/Transforms/Utils/CodeMoverUtils.cpp llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83311.276126.patch Type: text/x-patch Size: 18259 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:24:31 2020 From: llvm-commits at lists.llvm.org (Lei Huang via llvm-commits) Date: Tue, 07 Jul 2020 10:24:31 -0700 (PDT) Subject: [llvm] 62ba48b - [PowerPC] Implement Vector Replace Builtins in LLVM Message-ID: <5f04afcf.1c69fb81.bd60a.7c60@mx.google.com> Author: Biplob Mishra Date: 2020-07-07T12:22:52-05:00 New Revision: 62ba48b45f6525a3e453b54a6e5562d2f3dc7324 URL: https://github.com/llvm/llvm-project/commit/62ba48b45f6525a3e453b54a6e5562d2f3dc7324 DIFF: https://github.com/llvm/llvm-project/commit/62ba48b45f6525a3e453b54a6e5562d2f3dc7324.diff LOG: [PowerPC] Implement Vector Replace Builtins in LLVM Provide the LLVM intrinsics needed to implement vector replace element builtins in altivec.h which will be added in a subsequent patch. Differential Revision: https://reviews.llvm.org/D83308 Added: Modified: llvm/include/llvm/IR/IntrinsicsPowerPC.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/IntrinsicsPowerPC.td b/llvm/include/llvm/IR/IntrinsicsPowerPC.td index 6aa0c351e95a..12f4a3ce8e28 100644 --- a/llvm/include/llvm/IR/IntrinsicsPowerPC.td +++ b/llvm/include/llvm/IR/IntrinsicsPowerPC.td @@ -522,6 +522,15 @@ let TargetPrefix = "ppc" in { // All intrinsics start with "llvm.ppc.". Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_i64_ty, llvm_v4i32_ty], [IntrNoMem]>; + // P10 Vector Insert with immediate. + def int_ppc_altivec_vinsw : + Intrinsic<[llvm_v4i32_ty], + [llvm_v4i32_ty, llvm_i64_ty, llvm_i32_ty], + [IntrNoMem, ImmArg>]>; + def int_ppc_altivec_vinsd : + Intrinsic<[llvm_v2i64_ty], + [llvm_v2i64_ty, llvm_i64_ty, llvm_i32_ty], + [IntrNoMem, ImmArg>]>; } // Vector average. 
diff --git a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td index 3ed651abe453..2c21d0a175ad 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -794,8 +794,16 @@ let Predicates = [IsISA3_1] in { (int_ppc_altivec_vsrdbi v16i8:$VRA, v16i8:$VRB, i32:$SH))]>; - def VINSW : VXForm_VRT5_UIM5_RB5_ins<207, "vinsw", []>; - def VINSD : VXForm_VRT5_UIM5_RB5_ins<463, "vinsd", []>; + def VINSW : + VXForm_VRT5_UIM5_RB5_ins<207, "vinsw", + [(set v4i32:$vD, + (int_ppc_altivec_vinsw v4i32:$vDi, i64:$rB, + timm:$UIM))]>; + def VINSD : + VXForm_VRT5_UIM5_RB5_ins<463, "vinsd", + [(set v2i64:$vD, + (int_ppc_altivec_vinsd v2i64:$vDi, i64:$rB, + timm:$UIM))]>; def VINSBVLX : VXForm_VTB5_RA5_ins<15, "vinsbvlx", [(set v16i8:$vD, diff --git a/llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll b/llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll index 577bb212f018..84bf4032aa34 100644 --- a/llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll +++ b/llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll @@ -231,3 +231,25 @@ entry: ret <4 x i32> %0 } declare <4 x i32> @llvm.ppc.altivec.vinswvrx(<4 x i32>, i64, <4 x i32>) + +define <4 x i32> @testVINSW(<4 x i32> %a, i64 %b) { +; CHECK-LABEL: testVINSW: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: vinsw v2, r5, 1 +; CHECK-NEXT: blr +entry: + %0 = tail call <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32> %a, i64 %b, i32 1) + ret <4 x i32> %0 +} +declare <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32>, i64, i32 immarg) + +define <2 x i64> @testVINSD(<2 x i64> %a, i64 %b) { +; CHECK-LABEL: testVINSD: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: vinsd v2, r5, 1 +; CHECK-NEXT: blr +entry: + %0 = tail call <2 x i64> @llvm.ppc.altivec.vinsd(<2 x i64> %a, i64 %b, i32 1) + ret <2 x i64> %0 +} +declare <2 x i64> @llvm.ppc.altivec.vinsd(<2 x i64>, i64, i32 immarg) From llvm-commits at lists.llvm.org Tue Jul 7 10:24:45 2020 From: llvm-commits at lists.llvm.org (Lei Huang via 
Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:24:45 +0000 (UTC) Subject: [PATCH] D83308: [Power10] Implement Vector Replace Builtins in LLVM In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG62ba48b45f65: [PowerPC] Implement Vector Replace Builtins in LLVM (authored by biplmish, committed by lei). Changed prior to commit: https://reviews.llvm.org/D83308?vs=276066&id=276127#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83308/new/ https://reviews.llvm.org/D83308 Files: llvm/include/llvm/IR/IntrinsicsPowerPC.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll Index: llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll =================================================================== --- llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll +++ llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll @@ -231,3 +231,25 @@ ret <4 x i32> %0 } declare <4 x i32> @llvm.ppc.altivec.vinswvrx(<4 x i32>, i64, <4 x i32>) + +define <4 x i32> @testVINSW(<4 x i32> %a, i64 %b) { +; CHECK-LABEL: testVINSW: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: vinsw v2, r5, 1 +; CHECK-NEXT: blr +entry: + %0 = tail call <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32> %a, i64 %b, i32 1) + ret <4 x i32> %0 +} +declare <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32>, i64, i32 immarg) + +define <2 x i64> @testVINSD(<2 x i64> %a, i64 %b) { +; CHECK-LABEL: testVINSD: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: vinsd v2, r5, 1 +; CHECK-NEXT: blr +entry: + %0 = tail call <2 x i64> @llvm.ppc.altivec.vinsd(<2 x i64> %a, i64 %b, i32 1) + ret <2 x i64> %0 +} +declare <2 x i64> @llvm.ppc.altivec.vinsd(<2 x i64>, i64, i32 immarg) Index: llvm/lib/Target/PowerPC/PPCInstrPrefix.td =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -794,8 
+794,16 @@ (int_ppc_altivec_vsrdbi v16i8:$VRA, v16i8:$VRB, i32:$SH))]>; - def VINSW : VXForm_VRT5_UIM5_RB5_ins<207, "vinsw", []>; - def VINSD : VXForm_VRT5_UIM5_RB5_ins<463, "vinsd", []>; + def VINSW : + VXForm_VRT5_UIM5_RB5_ins<207, "vinsw", + [(set v4i32:$vD, + (int_ppc_altivec_vinsw v4i32:$vDi, i64:$rB, + timm:$UIM))]>; + def VINSD : + VXForm_VRT5_UIM5_RB5_ins<463, "vinsd", + [(set v2i64:$vD, + (int_ppc_altivec_vinsd v2i64:$vDi, i64:$rB, + timm:$UIM))]>; def VINSBVLX : VXForm_VTB5_RA5_ins<15, "vinsbvlx", [(set v16i8:$vD, Index: llvm/include/llvm/IR/IntrinsicsPowerPC.td =================================================================== --- llvm/include/llvm/IR/IntrinsicsPowerPC.td +++ llvm/include/llvm/IR/IntrinsicsPowerPC.td @@ -522,6 +522,15 @@ Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_i64_ty, llvm_v4i32_ty], [IntrNoMem]>; + // P10 Vector Insert with immediate. + def int_ppc_altivec_vinsw : + Intrinsic<[llvm_v4i32_ty], + [llvm_v4i32_ty, llvm_i64_ty, llvm_i32_ty], + [IntrNoMem, ImmArg>]>; + def int_ppc_altivec_vinsd : + Intrinsic<[llvm_v2i64_ty], + [llvm_v2i64_ty, llvm_i64_ty, llvm_i32_ty], + [IntrNoMem, ImmArg>]>; } // Vector average. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83308.276127.patch Type: text/x-patch Size: 3114 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:25:01 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:25:01 +0000 (UTC) Subject: [PATCH] D83056: Separate the Loop Peeling Utilities from the Loop Unrolling Utilities In-Reply-To: References: Message-ID: fhahn added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/LoopPeel.cpp:50 -#define DEBUG_TYPE "loop-unroll" +#define DEBUG_TYPE "loop-peel" ---------------- I am not sure about this change. 
Currently peeling is integrated in loop-unroll and remarks/debug can be filtered by loop-unroll, but now we will generate remarks for `loop-unroll` and `loop-peel` when running `-loop-unroll`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83056/new/ https://reviews.llvm.org/D83056 From llvm-commits at lists.llvm.org Tue Jul 7 10:25:16 2020 From: llvm-commits at lists.llvm.org (Daniil Fukalov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:25:16 +0000 (UTC) Subject: [PATCH] D82761: SpeculativeExecution: Fix for logic change introduced in D81730. In-Reply-To: References: Message-ID: dfukalov marked an inline comment as done. dfukalov added a comment. Ping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82761/new/ https://reviews.llvm.org/D82761 From llvm-commits at lists.llvm.org Tue Jul 7 10:25:45 2020 From: llvm-commits at lists.llvm.org (Whitney Tsang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:25:45 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: Whitney added inline comments. 
================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:222 + SmallPtrSet InstsToCheck) { + if (!std::any_of(InstsToCheck.begin(), InstsToCheck.end(), + [&DI, &I](Instruction *CurInst) { ---------------- ``` return !std::any_of(InstsToCheck.begin(), InstsToCheck.end(), [&DI, &I](Instruction *CurInst) { auto DepResult = DI.depends(&I, CurInst, true); if (DepResult && (DepResult->isOutput() || DepResult->isFlow() || DepResult->isAnti())) return true; return false; }); ``` ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:237 + SmallPtrSet InstsToCheck) { + if (!std::any_of(InstsToCheck.begin(), InstsToCheck.end(), + [&MSSAU, &I](Instruction *CurInst) { ---------------- Below is a cleaner version of your code: ``` MemoryUseOrDef *MemUseOrDef = MSSAU.getMemorySSA()->getMemoryAccess(&I); if (isa(MemUseOrDef)) return false; return !std::any_of(InstsToCheck.begin(), InstsToCheck.end(), [&MSSAU, &I](Instruction *CurInst) { MemoryUseOrDef *MemUseOrDef = MSSAU.getMemorySSA()->getMemoryAccess(CurInst); return isa(MemUseOrDef); }); ``` But this code is very restrictive. It can be improved by considering the relationship between `I` and `CurInst`, e.g. if they don't access the same memory then it should still be safe. ================ Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:513 *CI_safecall->getNextNode(), DT, &PDT, - &DI)); + &DI, &MSSAU)); + EXPECT_TRUE(isSafeToMoveBefore(*CI_safecall->getPrevNode(), ---------------- Change all the existing ones to `&DI, nullptr))` to make sure you are testing `DI`.
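As a side note on the `std::any_of` suggestions above, the simplification can be sketched without any LLVM types. Everything in this sketch is hypothetical: `Inst` and its `writesMemory` flag merely stand in for an instruction and for "this access is a definition", and the function names are made up for the comparison. It shows only the control-flow cleanup, not the dependence or MemorySSA queries from the patch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an instruction under inspection.
struct Inst {
  bool writesMemory;
};

// Shape before the review: branch on the negated result, then return literals.
bool isSafeVerbose(const std::vector<Inst> &toCheck) {
  if (!std::any_of(toCheck.begin(), toCheck.end(),
                   [](const Inst &i) { return i.writesMemory; }))
    return true;
  return false;
}

// Shape suggested in the review: return the negated result directly.
bool isSafeDirect(const std::vector<Inst> &toCheck) {
  return !std::any_of(toCheck.begin(), toCheck.end(),
                      [](const Inst &i) { return i.writesMemory; });
}

// Equivalent spelling: "not any" is "none".
bool isSafeNoneOf(const std::vector<Inst> &toCheck) {
  return std::none_of(toCheck.begin(), toCheck.end(),
                      [](const Inst &i) { return i.writesMemory; });
}
```

All three functions agree on every input, including the empty range, where `std::any_of` yields false and `std::none_of` yields true.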
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Tue Jul 7 10:27:36 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:27:36 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: ctetreau updated this revision to Diff 276128. ctetreau added a comment. add comment for mul.ll Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 Files: llvm/include/llvm/IR/PatternMatch.h llvm/test/Transforms/InstCombine/fmul.ll llvm/test/Transforms/InstCombine/mul.ll Index: llvm/test/Transforms/InstCombine/mul.ll =================================================================== --- llvm/test/Transforms/InstCombine/mul.ll +++ llvm/test/Transforms/InstCombine/mul.ll @@ -680,3 +680,12 @@ %m = mul <4 x i32> %r, %r ret <4 x i32> %m } + +; z * splat(0) = splat(0), even for scalable vectors +define @mul_scalable_splat_zero( %z) { +; CHECK-LABEL: @mul_scalable_splat_zero( +; CHECK-NEXT: ret zeroinitializer + %shuf = shufflevector insertelement ( undef, i64 0, i32 0), undef, zeroinitializer + %t3 = mul %shuf, %z + ret %t3 +} Index: llvm/test/Transforms/InstCombine/fmul.ll =================================================================== --- llvm/test/Transforms/InstCombine/fmul.ll +++ llvm/test/Transforms/InstCombine/fmul.ll @@ -1164,3 +1164,12 @@ %mul = fmul fast double %sqr, %sel ret double %mul } + +; fastmath => z * splat(0) = splat(0), even for scalable vectors +define @mul_scalable_splat_zero( %z) { +; CHECK-LABEL: @mul_scalable_splat_zero( +; CHECK-NEXT: ret zeroinitializer + %shuf = shufflevector insertelement ( undef, float 0.0, i32 0), undef, zeroinitializer + %t3 = fmul fast %shuf, %z + ret %t3 +} Index: 
llvm/include/llvm/IR/PatternMatch.h =================================================================== --- llvm/include/llvm/IR/PatternMatch.h +++ llvm/include/llvm/IR/PatternMatch.h @@ -262,18 +262,23 @@ return constantint_match(); } -/// This helper class is used to match scalar and fixed width vector integer -/// constants that satisfy a specified predicate. -/// For vector constants, undefined elements are ignored. +/// This helper class is used to match integer constant scalars, vector splats, +/// and fixed width vectors that satisfy a specified predicate. +/// For fixed width vector constants, undefined elements are ignored. template struct cst_pred_ty : public Predicate { template bool match(ITy *V) { if (const auto *CI = dyn_cast(V)) return this->isValue(CI->getValue()); - if (const auto *FVTy = dyn_cast(V->getType())) { + if (const auto *VTy = dyn_cast(V->getType())) { if (const auto *C = dyn_cast(V)) { if (const auto *CI = dyn_cast_or_null(C->getSplatValue())) return this->isValue(CI->getValue()); + // Number of elements of a scalable vector unknown at compile time + auto *FVTy = dyn_cast(VTy); + if (!FVTy) + return false; + // Non-splat vector constant: check each element for a match. unsigned NumElts = FVTy->getNumElements(); assert(NumElts != 0 && "Constant vector with no elements?"); @@ -321,25 +326,25 @@ } }; -/// This helper class is used to match scalar and vector floating-point -/// constants that satisfy a specified predicate. -/// For vector constants, undefined elements are ignored. +/// This helper class is used to match float constant scalars, vector splats, +/// and fixed width vectors that satisfy a specified predicate. +/// For fixed width vector constants, undefined elements are ignored. 
template struct cstfp_pred_ty : public Predicate { template bool match(ITy *V) { if (const auto *CF = dyn_cast(V)) return this->isValue(CF->getValueAPF()); - if (V->getType()->isVectorTy()) { + if (const auto *VTy = dyn_cast(V->getType())) { if (const auto *C = dyn_cast(V)) { if (const auto *CF = dyn_cast_or_null(C->getSplatValue())) return this->isValue(CF->getValueAPF()); // Number of elements of a scalable vector unknown at compile time - if (isa(V->getType())) + auto *FVTy = dyn_cast(VTy); + if (!FVTy) return false; // Non-splat vector constant: check each element for a match. - unsigned NumElts = - cast(V->getType())->getNumElements(); + unsigned NumElts = FVTy->getNumElements(); assert(NumElts != 0 && "Constant vector with no elements?"); bool HasNonUndefElements = false; for (unsigned i = 0; i != NumElts; ++i) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83001.276128.patch Type: text/x-patch Size: 4657 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:30:31 2020 From: llvm-commits at lists.llvm.org (Chris Lattner via llvm-commits) Date: Tue, 07 Jul 2020 10:30:31 -0700 (PDT) Subject: [llvm] 79b30af - Expand the LLVM Developer Policy to include new sections on adding Message-ID: <5f04b137.1c69fb81.c92ca.82c8@mx.google.com> Author: Chris Lattner Date: 2020-07-07T10:30:24-07:00 New Revision: 79b30af0ec53de7c7e3378465124ff1026a77f75 URL: https://github.com/llvm/llvm-project/commit/79b30af0ec53de7c7e3378465124ff1026a77f75 DIFF: https://github.com/llvm/llvm-project/commit/79b30af0ec53de7c7e3378465124ff1026a77f75.diff LOG: Expand the LLVM Developer Policy to include new sections on adding a project to the LLVM Monorepo, and a second about the LLVM Incubator projects. 
Differential Revision: https://reviews.llvm.org/D83182 Added: Modified: llvm/docs/DeveloperPolicy.rst Removed: ################################################################################ diff --git a/llvm/docs/DeveloperPolicy.rst b/llvm/docs/DeveloperPolicy.rst index 22f29a06b289..8d424e63372a 100644 --- a/llvm/docs/DeveloperPolicy.rst +++ b/llvm/docs/DeveloperPolicy.rst @@ -521,8 +521,69 @@ C API Changes release notes so that it's clear to external users who do not follow the project how the C API is changing and evolving. -New Targets ------------ +.. _toolchain: + +Updating Toolchain Requirements +------------------------------- + +We intend to require newer toolchains as time goes by. This means LLVM's +codebase can use newer versions of C++ as they get standardized. Requiring newer +toolchains to build LLVM can be painful for those building LLVM; therefore, it +will only be done through the following process: + + * It is a general goal to support LLVM and GCC versions from the last 3 years + at a minimum. This time-based guideline is not strict: we may support much + older compilers, or decide to support fewer versions. + + * An RFC is sent to the `llvm-dev mailing list `_ + + - Detail upsides of the version increase (e.g. which newer C++ language or + library features LLVM should use; avoid miscompiles in particular compiler + versions, etc). + - Detail downsides on important platforms (e.g. Ubuntu LTS status). + + * Once the RFC reaches consensus, update the CMake toolchain version checks as + well as the :doc:`getting started` guide. This provides a + softer transition path for developers compiling LLVM, because the + error can be turned into a warning using a CMake flag. This is an important + step: LLVM still doesn't have code which requires the new toolchains, but it + soon will. If you compile LLVM but don't read the mailing list, we should + tell you! + + * Ensure that at least one LLVM release has had this soft-error. 
Not all + developers compile LLVM top-of-tree. These release-bound developers should + also be told about upcoming changes. + + * Turn the soft-error into a hard-error after said LLVM release has branched. + + * Update the :doc:`coding standards` to allow the new + features we've explicitly approved in the RFC. + + * Start using the new features in LLVM's codebase. + +Here's a `sample RFC +`_ and the +`corresponding change `_. + +.. _new-llvm-components: + +Introducing New Components into LLVM +==================================== + +The LLVM community is a vibrant and exciting place to be, and we look to be +inclusive of new projects and foster new communities, and increase +collaboration across industry and academia. + +That said, we need to strike a balance between being inclusive of new ideas and +people and the cost of ongoing maintenance that new code requires. As such, we +have the following general policies for introducing major new components into +the LLVM world. However, this is really only intended to cover common cases +that we have seen arise: different situations are different, and we are open +to discussing unusual cases as well - just start an RFC thread on the +`llvm-dev mailing list `_. + +Adding a New Target +------------------- LLVM is very receptive to new targets, even experimental ones, but a number of problems can appear when adding new large portions of code, and back-ends are @@ -606,49 +667,111 @@ In essences, these rules are necessary for targets to gain and retain their status, but also markers to define bit-rot, and will be used to clean up the tree from unmaintained targets. -.. _toolchain: - -Updating Toolchain Requirements -------------------------------- - -We intend to require newer toolchains as time goes by. This means LLVM's -codebase can use newer versions of C++ as they get standardized.
Requiring newer -toolchains to build LLVM can be painful for those building LLVM; therefore, it -will only be done through the following process: - - * Generally, try to support LLVM and GCC versions from the last 3 years at a - minimum. This time-based guideline is not strict: we may support much older - compilers, or decide to support fewer versions. - - * An RFC is sent to the `llvm-dev mailing list `_ - - - Detail upsides of the version increase (e.g. which newer C++ language or - library features LLVM should use; avoid miscompiles in particular compiler - versions, etc). - - Detail downsides on important platforms (e.g. Ubuntu LTS status). - - * Once the RFC reaches consensus, update the CMake toolchain version checks as - well as the :doc:`getting started` guide. We want to - soft-error when developers compile LLVM. We say "soft-error" because the - error can be turned into a warning using a CMake flag. This is an important - step: LLVM still doesn't have code which requires the new toolchains, but it - soon will. If you compile LLVM but don't read the mailing list, we should - tell you! - - * Ensure that at least one LLVM release has had this soft-error. Not all - developers compile LLVM top-of-tree. These release-bound developers should - also be told about upcoming changes. - - * Turn the soft-error into a hard-error after said LLVM release has branched. +Adding an Established Project To the LLVM Monorepo +-------------------------------------------------- + +The `LLVM monorepo `_ is the centerpoint +of development in the LLVM world, and has all of the primary LLVM components, +including the LLVM optimizer and code generators, Clang, LLDB, etc. `Monorepos +in general `_ are great because they +allow atomic commits to the project, simplify CI, and make it easier for +subcommunities to collaborate. 
+ +That said, the burden to add things to the LLVM monorepo needs to be very high - +code that is added to this repository is checked out by everyone in the +community. As such, we hold subprojects to a high bar similar to "official +targets", they: + + * Must be generally aligned with the mission of the LLVM project to advance + compilers, languages, tools, runtimes, etc. + * Must conform to all of the policies laid out in this developer policy + document, including license, patent, coding standards, and code of conduct. + * Must have an active community that maintains the code, including established + code owners. + * Should have reasonable documentation about how it works, including a high + quality README file. + * Should have CI to catch breakage within the project itself or due to + underlying LLVM dependencies. + * Should have code free of issues the community finds contentious, or be on a + clear path to resolving them. + * Must be proposed through the LLVM RFC process, and have its addition approved + by the LLVM community - this ultimately mediates the resolution of the + "should" concerns above. + +If you have a project that you think would make sense to add to the LLVM +monorepo, please start an RFC thread on the llvm-dev mailing list to kick off +the discussion. This process can take some time and iteration - please don’t +be discouraged or intimidated by that! + +If you have an earlier stage project that you think is aligned with LLVM, please +see the "Incubating New Projects" section. + +Incubating New Projects +----------------------- - * Update the :doc:`coding standards` to allow the new - features we've explicitly approved in the RFC. +The burden to add a new project to the LLVM monorepo is intentionally very high, +but that can have a chilling effect on new and innovative projects. To help +foster these sorts of projects, LLVM supports an "incubator" process that is +much easier to get started with. 
It provides space for potentially valuable, +new top-level and sub-projects to reach a critical mass before they have enough +code to prove their utility and grow a community. This also allows +collaboration between teams that already have permissions to make contributions +to projects under the LLVM umbrella. + +Projects which can be considered for the LLVM incubator meet the following +criteria: + + * Must be generally aligned with the mission of the LLVM project to advance + compilers, languages, tools, runtimes, etc. + * Must conform to the license, patent, and code of conduct policies laid out + in this developer policy document. + * Must have a documented charter and development plan, e.g. in the form of a + README file, mission statement, and/or manifesto. + * Should conform to coding standards, incremental development process, and + other expectations. + * Should have a sense of the community that it hopes to eventually foster, and + there should be interest from members with different affiliations / + organizations. + * Should have a feasible path to eventually graduate as a dedicated top-level + or sub-project within the `LLVM monorepo + `_. + * Should include a notice (e.g. in the project README or web page) that the + project is in ‘incubation status’ and is not included in LLVM releases (see + suggested wording below). + * Must be proposed through the LLVM RFC process, and have its addition + approved by the LLVM community - this ultimately mediates the resolution of + the "should" concerns above. + +That said, the project need not have any code to get started, and need not have +an established community at all! Furthermore, incubating projects may pass +through transient states that violate the "Should" guidelines above, or would +otherwise make them unsuitable for direct inclusion in the monorepo (e.g. +dependencies that have not yet been factored appropriately, leveraging +experimental components or APIs that are not yet upstream, etc).
+ +When approved, the llvm-admin group can grant the new project: + * A new repository in the LLVM Github Organization - but not the LLVM monorepo. + * New mailing list, discourse forum, and/or discord chat hosted with other LLVM + forums. + * Other infrastructure integration can be discussed on a case-by-case basis. + +Graduation to the mono-repo would follow existing processes and standards for +becoming a first-class part of the monorepo. Similarly, an incubating project +may be eventually retired, but no process has been established for that yet. If +and when this comes up, please start an RFC discussion on llvm-dev. + +This process is very new - please expect the details to change, it is always +safe to ask on the `llvm-dev mailing list +`_ about this. + +Suggested disclaimer for the project README and the main project web page: - * Start using the new features in LLVM's codebase. +:: -Here's a `sample RFC -`_ and the -`corresponding change `_. + This project is participating in the LLVM Incubator process: as such, it is + not part of any official LLVM release. While incubation status is not + necessarily a reflection of the completeness or stability of the code, it + does indicate that the project is not yet endorsed as a component of LLVM. .. _copyright-license-patents: From llvm-commits at lists.llvm.org Tue Jul 7 10:30:34 2020 From: llvm-commits at lists.llvm.org (Chris Lattner via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:30:34 +0000 (UTC) Subject: [PATCH] D83182: Expand the LLVM Developer Policy to include new sections on adding a project to the LLVM Monorepo, and a second about the LLVM Incubator projects. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG79b30af0ec53: Expand the LLVM Developer Policy to include new sections on adding a project to… (authored by lattner). 
Changed prior to commit: https://reviews.llvm.org/D83182?vs=275576&id=276131#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83182/new/ https://reviews.llvm.org/D83182 Files: llvm/docs/DeveloperPolicy.rst -------------- next part -------------- A non-text attachment was scrubbed... Name: D83182.276131.patch Type: text/x-patch Size: 11398 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:31:24 2020 From: llvm-commits at lists.llvm.org (Michael Spencer via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:31:24 +0000 (UTC) Subject: [PATCH] D83315: Turn arcmt-* options into a single option In-Reply-To: References: Message-ID: <529d0c7d3436f22d788a48c7f6f2a7a7@localhost.localdomain> Bigcheese accepted this revision. Bigcheese added a comment. This revision is now accepted and ready to land. I have a slight preference for `-arcmt-action=`, but up to you if you want to change it. Otherwise LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83315/new/ https://reviews.llvm.org/D83315 From llvm-commits at lists.llvm.org Tue Jul 7 10:32:18 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:32:18 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: <7db66473229cc8fd6ba5371f167fe500@localhost.localdomain> RithikSharma marked an inline comment as done. RithikSharma added inline comments. ================ Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:513 *CI_safecall->getNextNode(), DT, &PDT, - &DI)); + &DI, &MSSAU)); + EXPECT_TRUE(isSafeToMoveBefore(*CI_safecall->getPrevNode(), ---------------- Whitney wrote: > change all the existing ones to `&DI, nullptr))` to make sure you are testing `DI`. Sure but even when we give preference to DI? 
``` if (DI) return isDependenceSafe(I, *DI, InstsToCheck); else if (MSSAU) return isDependenceSafe(I, *MSSAU, InstsToCheck); ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Tue Jul 7 10:32:39 2020 From: llvm-commits at lists.llvm.org (Derek Schuff via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:32:39 +0000 (UTC) Subject: [PATCH] D83278: [WebAssembly] Avoid scalarizing vector shifts in more cases In-Reply-To: References: Message-ID: <643804bc7b61e565370f909d93693308@localhost.localdomain> dschuff accepted this revision. dschuff added a comment. This revision is now accepted and ready to land. This is a nice simplification. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83278/new/ https://reviews.llvm.org/D83278 From llvm-commits at lists.llvm.org Tue Jul 7 10:33:08 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Tue, 07 Jul 2020 10:33:08 -0700 (PDT) Subject: [llvm] 3030e6b - [X86][AVX] Add AVX2 tests to extractelement-load.ll Message-ID: <5f04b1d4.1c69fb81.6f121.acac@mx.google.com> Author: Simon Pilgrim Date: 2020-07-07T18:32:32+01:00 New Revision: 3030e6b94b21f4d37ada6bef7cde6920415409d8 URL: https://github.com/llvm/llvm-project/commit/3030e6b94b21f4d37ada6bef7cde6920415409d8 DIFF: https://github.com/llvm/llvm-project/commit/3030e6b94b21f4d37ada6bef7cde6920415409d8.diff LOG: [X86][AVX] Add AVX2 tests to extractelement-load.ll Added: Modified: llvm/test/CodeGen/X86/extractelement-load.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/X86/extractelement-load.ll b/llvm/test/CodeGen/X86/extractelement-load.ll index 332fea81adff..5eb24632e066 100644 --- a/llvm/test/CodeGen/X86/extractelement-load.ll +++ b/llvm/test/CodeGen/X86/extractelement-load.ll @@ -1,7 +1,8 @@ ; NOTE: Assertions have 
been autogenerated by utils/update_llc_test_checks.py ; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=X32-SSE2 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=X64,X64-SSSE3 -; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefixes=X64,X64-AVX +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX1 +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX2 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" From llvm-commits at lists.llvm.org Tue Jul 7 10:33:10 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Tue, 07 Jul 2020 10:33:10 -0700 (PDT) Subject: [llvm] 6cff71e - [X86][AVX] Add test case showing incorrect extraction from VBROADCAST_LOAD on AVX2 targets Message-ID: <5f04b1d6.1c69fb81.96be9.a3da@mx.google.com> Author: Simon Pilgrim Date: 2020-07-07T18:32:32+01:00 New Revision: 6cff71e92e644adf5eab8cb411e5ac053746bbac URL: https://github.com/llvm/llvm-project/commit/6cff71e92e644adf5eab8cb411e5ac053746bbac DIFF: https://github.com/llvm/llvm-project/commit/6cff71e92e644adf5eab8cb411e5ac053746bbac.diff LOG: [X86][AVX] Add test case showing incorrect extraction from VBROADCAST_LOAD on AVX2 targets On AVX2 we tend to lower BUILD_VECTOR of constants as broadcasts if we can, in this case a <2 x i16> non-uniform constant has been lowered as a <4 x i32> broadcast. The test case shows that the extraction folding code has incorrectly extracted the wrong part (lower WORD) of the resulting i32 memory source. Found by internal fuzzing tests. 
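To make the "wrong part" concrete, here is a small sketch of the expected arithmetic, using the byte values the test added by this commit stores (-98, -97, -96, -95, i.e. 0x9E, 0x9F, 0xA0, 0xA1). It assumes a little-endian layout, as on x86; `packLittleEndian` and `extractI16` are hypothetical helper names, not LLVM functions.

```cpp
#include <cstdint>

// Assemble four bytes stored at increasing addresses into the 32-bit
// value a little-endian load of that memory would produce.
constexpr std::uint32_t packLittleEndian(std::uint8_t b0, std::uint8_t b1,
                                         std::uint8_t b2, std::uint8_t b3) {
  return std::uint32_t{b0} | (std::uint32_t{b1} << 8) |
         (std::uint32_t{b2} << 16) | (std::uint32_t{b3} << 24);
}

// Extract 16-bit element `index` of a <2 x i16> held in a 32-bit value:
// element 0 is the low half, element 1 is the high half.
constexpr std::uint16_t extractI16(std::uint32_t packed, unsigned index) {
  return static_cast<std::uint16_t>(packed >> (16u * index));
}
```

With these bytes the packed value is 0xA1A09F9E, element 0 is 0x9F9E and element 1 is 0xA1A0. The SSE2, SSSE3 and AVX1 check lines store 0x9F9E and then 0xA1A0, while the AVX2 check lines store 0x9F9E twice, which is the miscompile the FIXME records.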
Added: Modified: llvm/test/CodeGen/X86/extractelement-load.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/X86/extractelement-load.ll b/llvm/test/CodeGen/X86/extractelement-load.ll index 5eb24632e066..752ba5b2a33d 100644 --- a/llvm/test/CodeGen/X86/extractelement-load.ll +++ b/llvm/test/CodeGen/X86/extractelement-load.ll @@ -266,3 +266,51 @@ entry: %cond = select i1 %cmp, float 1.000000e+00, float %vecext ret float %cond } + +; FIXME: Incorrect AVX2 codegen due to bad extraction from a VBROADCAST_LOAD of the <2 x i16> constant bitcast as <4 x i32>. +define void @subextract_broadcast_load_constant(<2 x i16>* nocapture %0, i16* nocapture %1, i16* nocapture %2) { +; X32-SSE2-LABEL: subextract_broadcast_load_constant: +; X32-SSE2: # %bb.0: +; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax +; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx +; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %edx +; X32-SSE2-NEXT: movl $-1583308898, (%edx) # imm = 0xA1A09F9E +; X32-SSE2-NEXT: movw $-24674, (%ecx) # imm = 0x9F9E +; X32-SSE2-NEXT: movw $-24160, (%eax) # imm = 0xA1A0 +; X32-SSE2-NEXT: retl +; +; X64-SSSE3-LABEL: subextract_broadcast_load_constant: +; X64-SSSE3: # %bb.0: +; X64-SSSE3-NEXT: movl $-1583308898, (%rdi) # imm = 0xA1A09F9E +; X64-SSSE3-NEXT: movw $-24674, (%rsi) # imm = 0x9F9E +; X64-SSSE3-NEXT: movw $-24160, (%rdx) # imm = 0xA1A0 +; X64-SSSE3-NEXT: retq +; +; X64-AVX1-LABEL: subextract_broadcast_load_constant: +; X64-AVX1: # %bb.0: +; X64-AVX1-NEXT: movl $-1583308898, (%rdi) # imm = 0xA1A09F9E +; X64-AVX1-NEXT: movw $-24674, (%rsi) # imm = 0x9F9E +; X64-AVX1-NEXT: movw $-24160, (%rdx) # imm = 0xA1A0 +; X64-AVX1-NEXT: retq +; +; X64-AVX2-LABEL: subextract_broadcast_load_constant: +; X64-AVX2: # %bb.0: +; X64-AVX2-NEXT: movl $-1583308898, (%rdi) # imm = 0xA1A09F9E +; X64-AVX2-NEXT: movw $-24674, (%rsi) # imm = 0x9F9E +; X64-AVX2-NEXT: movw $-24674, (%rdx) # imm = 0x9F9E +; X64-AVX2-NEXT: retq + %4 = bitcast <2 
x i16>* %0 to i8* + store i8 -98, i8* %4, align 1 + %5 = getelementptr inbounds i8, i8* %4, i64 1 + store i8 -97, i8* %5, align 1 + %6 = getelementptr inbounds i8, i8* %4, i64 2 + store i8 -96, i8* %6, align 1 + %7 = getelementptr inbounds i8, i8* %4, i64 3 + store i8 -95, i8* %7, align 1 + %8 = load <2 x i16>, <2 x i16>* %0, align 4 + %9 = extractelement <2 x i16> %8, i32 0 + store i16 %9, i16* %1, align 2 + %10 = extractelement <2 x i16> %8, i32 1 + store i16 %10, i16* %2, align 2 + ret void +} From llvm-commits at lists.llvm.org Tue Jul 7 10:37:49 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:37:49 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowance with tablegen generated map Message-ID: clementval created this revision. clementval added reviewers: DavidTruby, sscalpone, kiranchandramohan, ichoyjx. Herald added subscribers: llvm-commits, aaron.ballman, sstefan1, guansong, hiraditya, yaxunl. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. This patch enables the generation of clause enum sets for semantic checks in Flang through tablegen. Enum sets and the directive - sets map are generated by the new tablegen infrastructure for OpenMP and other directive languages. The semantic checks for OpenMP are modified to use this newly generated map. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83326 Files: flang/lib/Semantics/check-omp-structure.cpp flang/lib/Semantics/check-omp-structure.h flang/test/Semantics/omp-clause-validity01.f90 llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83326.276135.patch Type: text/x-patch Size: 75244 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:39:18 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:39:18 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowance with tablegen generated map In-Reply-To: References: Message-ID: <6342f1ab04cc736f31387d693c422614@localhost.localdomain> clementval marked an inline comment as done. clementval added inline comments. ================ Comment at: flang/test/Semantics/omp-clause-validity01.f90:457 - !ERROR: REDUCTION clause is not allowed on the TASKLOOP SIMD directive !$omp taskloop simd reduction(+:a) ---------------- As a side note, this is supposed to be fine in Clang, so I removed the check. I looked at the OpenMP 5.0 standard and didn't see a restriction on `reduction` for `taskloop simd`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 From llvm-commits at lists.llvm.org Tue Jul 7 10:43:28 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Tue, 07 Jul 2020 10:43:28 -0700 (PDT) Subject: [llvm] 1143f09 - [NewPM][LoopFusion] Rename loop-fuse -> loop-fusion Message-ID: <5f04b440.1c69fb81.8ea7e.7989@mx.google.com> Author: Arthur Eubanks Date: 2020-07-07T10:43:07-07:00 New Revision: 1143f09678f42352620d373939b0655e7a332268 URL: https://github.com/llvm/llvm-project/commit/1143f09678f42352620d373939b0655e7a332268 DIFF: https://github.com/llvm/llvm-project/commit/1143f09678f42352620d373939b0655e7a332268.diff LOG: [NewPM][LoopFusion] Rename loop-fuse -> loop-fusion The legacy pass name is "loop-fusion". Fixes most tests under Transforms/LoopFusion under NPM.
Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D83066 Added: Modified: llvm/lib/Passes/PassRegistry.def Removed: ################################################################################ diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def index a28f6f9d6c3e..ecb532ee5553 100644 --- a/llvm/lib/Passes/PassRegistry.def +++ b/llvm/lib/Passes/PassRegistry.def @@ -221,7 +221,7 @@ FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass()) FUNCTION_PASS("lcssa", LCSSAPass()) FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass()) FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass()) -FUNCTION_PASS("loop-fuse", LoopFusePass()) +FUNCTION_PASS("loop-fusion", LoopFusePass()) FUNCTION_PASS("loop-distribute", LoopDistributePass()) FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt()) FUNCTION_PASS("print", PrintFunctionPass(dbgs())) From llvm-commits at lists.llvm.org Tue Jul 7 10:43:37 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:43:37 +0000 (UTC) Subject: [PATCH] D83066: [NewPM][LoopFusion] Rename loop-fuse -> loop-fusion In-Reply-To: References: Message-ID: <5d9e0126dfaa5fbb00a24518a87c3089@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG1143f09678f4: [NewPM][LoopFusion] Rename loop-fuse -> loop-fusion (authored by aeubanks). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83066/new/ https://reviews.llvm.org/D83066 Files: llvm/lib/Passes/PassRegistry.def Index: llvm/lib/Passes/PassRegistry.def =================================================================== --- llvm/lib/Passes/PassRegistry.def +++ llvm/lib/Passes/PassRegistry.def @@ -221,7 +221,7 @@ FUNCTION_PASS("lcssa", LCSSAPass()) FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass()) FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass()) -FUNCTION_PASS("loop-fuse", LoopFusePass()) +FUNCTION_PASS("loop-fusion", LoopFusePass()) FUNCTION_PASS("loop-distribute", LoopDistributePass()) FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt()) FUNCTION_PASS("print", PrintFunctionPass(dbgs())) -------------- next part -------------- A non-text attachment was scrubbed... Name: D83066.276137.patch Type: text/x-patch Size: 608 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 10:44:32 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Tue, 07 Jul 2020 10:44:32 -0700 (PDT) Subject: [llvm] 83158cf - [BasicAA] Remove -basicaa alias Message-ID: <5f04b480.1c69fb81.6305c.b895@mx.google.com> Author: Arthur Eubanks Date: 2020-07-07T10:44:23-07:00 New Revision: 83158cf95dd7fd9fa8a1eb515f16bf47856601ef URL: https://github.com/llvm/llvm-project/commit/83158cf95dd7fd9fa8a1eb515f16bf47856601ef DIFF: https://github.com/llvm/llvm-project/commit/83158cf95dd7fd9fa8a1eb515f16bf47856601ef.diff LOG: [BasicAA] Remove -basicaa alias Follow up of https://reviews.llvm.org/D82607. 
Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D83067 Added: Modified: llvm/include/llvm/IR/LegacyPassNameParser.h llvm/test/Analysis/BasicAA/empty.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/LegacyPassNameParser.h b/llvm/include/llvm/IR/LegacyPassNameParser.h index 931181930d16..c33b9fc40472 100644 --- a/llvm/include/llvm/IR/LegacyPassNameParser.h +++ b/llvm/include/llvm/IR/LegacyPassNameParser.h @@ -73,11 +73,6 @@ class PassNameParser : public PassRegistrationListener, llvm_unreachable(nullptr); } addLiteralOption(P->getPassArgument().data(), P, P->getPassName().data()); - - // Temporary alias for basicaa -> basic-aa - // TODO: remove once everything is migrated to basic-aa - if (P->getPassArgument() == "basic-aa") - addLiteralOption("basicaa", P, "deprecated alias for basic-aa"); } void passEnumerate(const PassInfo *P) override { passRegistered(P); } diff --git a/llvm/test/Analysis/BasicAA/empty.ll b/llvm/test/Analysis/BasicAA/empty.ll index 01eeb1c4193f..a6fdcabfb3d5 100644 --- a/llvm/test/Analysis/BasicAA/empty.ll +++ b/llvm/test/Analysis/BasicAA/empty.ll @@ -1,4 +1,3 @@ -; RUN: opt < %s -basicaa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s ; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" From llvm-commits at lists.llvm.org Tue Jul 7 10:44:41 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:44:41 +0000 (UTC) Subject: [PATCH] D83067: [BasicAA] Remove -basicaa alias In-Reply-To: References: Message-ID: <27d9942930804d8c032432cca2c0f2dd@localhost.localdomain> This revision was automatically updated to reflect the committed changes. 
Closed by commit rG83158cf95dd7: [BasicAA] Remove -basicaa alias (authored by aeubanks).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83067/new/

https://reviews.llvm.org/D83067

Files:
  llvm/include/llvm/IR/LegacyPassNameParser.h
  llvm/test/Analysis/BasicAA/empty.ll

Index: llvm/test/Analysis/BasicAA/empty.ll
===================================================================
--- llvm/test/Analysis/BasicAA/empty.ll
+++ llvm/test/Analysis/BasicAA/empty.ll
@@ -1,4 +1,3 @@
-; RUN: opt < %s -basicaa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s
 ; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s

 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
Index: llvm/include/llvm/IR/LegacyPassNameParser.h
===================================================================
--- llvm/include/llvm/IR/LegacyPassNameParser.h
+++ llvm/include/llvm/IR/LegacyPassNameParser.h
@@ -73,11 +73,6 @@
       llvm_unreachable(nullptr);
     }
     addLiteralOption(P->getPassArgument().data(), P, P->getPassName().data());
-
-    // Temporary alias for basicaa -> basic-aa
-    // TODO: remove once everything is migrated to basic-aa
-    if (P->getPassArgument() == "basic-aa")
-      addLiteralOption("basicaa", P, "deprecated alias for basic-aa");
   }

   void passEnumerate(const PassInfo *P) override { passRegistered(P); }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83067.276139.patch
Type: text/x-patch
Size: 1237 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:45:22 2020
From: llvm-commits at lists.llvm.org (Dan Liew via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:45:22 +0000 (UTC)
Subject: [PATCH] D82691: Disable interception of sigaltstack on i386 macOS.
In-Reply-To:
References:
Message-ID: <36a4e663b9283f163f8c65c6c1663312@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG888951aaca58: Disable interception of sigaltstack on i386 macOS. (authored by delcypher).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82691/new/

https://reviews.llvm.org/D82691

Files:
  compiler-rt/lib/sanitizer_common/sanitizer_platform.h
  compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h

Index: compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
===================================================================
--- compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
+++ compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
@@ -597,7 +597,10 @@
 #define SANITIZER_INTERCEPT_QSORT \
   (SI_POSIX && !SI_IOSSIM && !SI_WATCHOS && !SI_TVOS && !SI_ANDROID)
 #define SANITIZER_INTERCEPT_QSORT_R (SI_LINUX && !SI_ANDROID)
-#define SANITIZER_INTERCEPT_SIGALTSTACK SI_POSIX
+// sigaltstack on i386 macOS cannot be intercepted due to setjmp()
+// calling it and assuming that it does not clobber registers.
+#define SANITIZER_INTERCEPT_SIGALTSTACK \
+  (SI_POSIX && !(SANITIZER_MAC && SANITIZER_I386))
 #define SANITIZER_INTERCEPT_UNAME (SI_POSIX && !SI_FREEBSD)
 #define SANITIZER_INTERCEPT___XUNAME SI_FREEBSD

Index: compiler-rt/lib/sanitizer_common/sanitizer_platform.h
===================================================================
--- compiler-rt/lib/sanitizer_common/sanitizer_platform.h
+++ compiler-rt/lib/sanitizer_common/sanitizer_platform.h
@@ -132,6 +132,12 @@
 # define SANITIZER_X32 0
 #endif

+#if defined(__i386__) || defined(_M_IX86)
+# define SANITIZER_I386 1
+#else
+# define SANITIZER_I386 0
+#endif
+
 #if defined(__mips__)
 # define SANITIZER_MIPS 1
 # if defined(__mips64)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82691.275672.patch
Type: text/x-patch
Size: 1354 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:45:17 2020
From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits)
Date: Tue, 7 Jul 2020 10:45:17 -0700
Subject: [PATCH] D82387: [flang] add RTBuilder
In-Reply-To:
References: <7291fbdeedbc2408b140944fa7971cfe@localhost.localdomain>
Message-ID:

... you're the one that put the review up and didn't credit anyone else.
Can you fix this then?

Thanks.

-eric

On Tue, Jul 7, 2020 at 10:10 AM Eric Schweitz (PGI) <eric.schweitz at pgroup.com> wrote:

> The original author of this particular code knows of the issue, but is
> currently out on vacation.
>
> Thanks,
>
> Eric
>
> *From:* Eric Christopher
> *Sent:* Monday, July 6, 2020 2:40 PM
> *To:* reviews+D82387+public+36dcff9b060bae84 at reviews.llvm.org
> *Cc:* Eric Schweitz (PGI) ; Jean Perier <
> jperier at nvidia.com>; Steve Scalpone ;
> kiranchandramohan at gmail.com; clementval at gmail.com; Doerfert, Johannes <
> jdoerfert at anl.gov>; David Truby ;
> gsocsameeran at gmail.com; River Riddle ;
> stephen.neuendorffer at gmail.com; llvm-commits ;
> 88888yl at gmail.com; Samuel.j.knapp at btinternet.com; Peter Steinfeld <
> psteinfeld at nvidia.com>; aperry at lanl.gov; Timothy Keith ;
> Sourabh Singh Tomar ; Isuru Fernando <
> isuruf at gmail.com>; Valentin Churavy ;
> uday at polymagelabs.com
> *Subject:* Re: [PATCH] D82387: [flang] add RTBuilder
>
> Agreed. This should be fixed :)
>
> -eric
>
> On Mon, Jul 6, 2020 at 12:54 PM David Truby via Phabricator <
> reviews at reviews.llvm.org> wrote:
>
> DavidTruby added a comment.
>
> Is there a reason to use `float _Complex` here at all? The C++ standard
> (29.5.4 of C++17) guarantees that `std::complex` and `float
> _Complex` are layout compatible and can be reinterpret_casted to each other
> so even if these functions are intended to be callable from C/interoperable
> with _Complex in C code, it'd be better to use std::complex on the
> C++ side.
>
> Repository:
> rG LLVM Github Monorepo
>
> CHANGES SINCE LAST ACTION
> https://reviews.llvm.org/D82387/new/
>
> https://reviews.llvm.org/D82387
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:45:35 2020
From: llvm-commits at lists.llvm.org (Thomas Lively via llvm-commits)
Date: Tue, 07 Jul 2020 10:45:35 -0700 (PDT)
Subject: [llvm] 0d7286a - [WebAssembly] Avoid scalarizing vector shifts in more cases
Message-ID: <5f04b4bf.1c69fb81.8256c.a0bc@mx.google.com>

Author: Thomas Lively
Date: 2020-07-07T10:45:26-07:00
New Revision: 0d7286a652371bca460357348f3b4828cd4ca214

URL: https://github.com/llvm/llvm-project/commit/0d7286a652371bca460357348f3b4828cd4ca214
DIFF: https://github.com/llvm/llvm-project/commit/0d7286a652371bca460357348f3b4828cd4ca214.diff

LOG: [WebAssembly] Avoid scalarizing vector shifts in more cases

Since WebAssembly's vector shift instructions take a scalar shift
amount rather than a vector shift amount, we have to check in ISel
that the vector shift amount is a splat. Previously, we were checking
explicitly for splat BUILD_VECTOR nodes, but this change uses the
standard utilities for detecting splat values that can handle more
complex splat patterns. Since the C++ ISel lowering is now more
general than the ISel patterns, this change also simplifies shift
lowering by using the C++ lowering for all SIMD shifts rather than
mixing C++ and normal pattern-based lowering.

This change improves ISel for shifts to the point that the
simd-shift-unroll.ll regression test no longer tests the code path it
was originally meant to test.
The bug corresponding to that regression
test is no longer reproducible with its original reported reproducer,
so rather than try to fix the regression test, this change just
removes it.

Differential Revision: https://reviews.llvm.org/D83278

Added:
    llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll

Modified:
    llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
    llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td

Removed:
    llvm/test/CodeGen/WebAssembly/simd-shift-unroll.ll

################################################################################
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index 651c504efc06..3f4ebd501595 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -1677,19 +1677,13 @@ SDValue WebAssemblyTargetLowering::LowerShift(SDValue Op,
   // Only manually lower vector shifts
   assert(Op.getSimpleValueType().isVector());

-  // Unroll non-splat vector shifts
-  BuildVectorSDNode *ShiftVec;
-  SDValue SplatVal;
-  if (!(ShiftVec = dyn_cast(Op.getOperand(1).getNode())) ||
-      !(SplatVal = ShiftVec->getSplatValue()))
+  auto ShiftVal = Op.getOperand(1);
+  if (!DAG.isSplatValue(ShiftVal, /*AllowUndefs=*/true))
     return unrollVectorShift(Op, DAG);

-  // All splats except i64x2 const splats are handled by patterns
-  auto *SplatConst = dyn_cast(SplatVal);
-  if (!SplatConst || Op.getSimpleValueType() != MVT::v2i64)
-    return Op;
+  auto SplatVal = DAG.getSplatValue(ShiftVal);
+  assert(SplatVal != SDValue());

-  // i64x2 const splats are custom lowered to avoid unnecessary wraps
   unsigned Opcode;
   switch (Op.getOpcode()) {
   case ISD::SHL:
@@ -1704,9 +1698,11 @@ SDValue WebAssemblyTargetLowering::LowerShift(SDValue Op,
   default:
     llvm_unreachable("unexpected opcode");
   }
-  APInt Shift = SplatConst->getAPIntValue().zextOrTrunc(32);
+
+  // Use anyext because none of the high bits can affect the shift
+  auto ScalarShift = DAG.getAnyExtOrTrunc(SplatVal, DL, MVT::i32);
   return DAG.getNode(Opcode, DL, Op.getValueType(), Op.getOperand(0),
-                     DAG.getConstant(Shift, DL, MVT::i32));
+                     ScalarShift);
 }

 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td b/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
index b4a8a7bc42ae..814bb80fb693 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
@@ -654,55 +654,36 @@ defm BITMASK : SIMDBitmask;
 // Bit shifts
 //===----------------------------------------------------------------------===//

-multiclass SIMDShift simdop> {
+multiclass SIMDShift simdop> {
   defm _#vec_t : SIMD_I<(outs V128:$dst), (ins V128:$vec, I32:$x),
                         (outs), (ins),
-                        [(set (vec_t V128:$dst),
-                          (node V128:$vec, (vec_t shift_vec)))],
+                        [(set (vec_t V128:$dst), (node V128:$vec, I32:$x))],
                         vec#"."#name#"\t$dst, $vec, $x", vec#"."#name, simdop>;
 }

 multiclass SIMDShiftInt baseInst> {
-  defm "" : SIMDShift;
-  defm "" : SIMDShift;
-  defm "" : SIMDShift;
-  defm "" : SIMDShift;
+  defm "" : SIMDShift;
+  defm "" : SIMDShift;
+  defm "" : SIMDShift;
+  defm "" : SIMDShift;
 }

-// Left shift by scalar: shl
-defm SHL : SIMDShiftInt;
-
-// Right shift by scalar: shr_s / shr_u
-defm SHR_S : SIMDShiftInt;
-defm SHR_U : SIMDShiftInt;
-
-// Truncate i64 shift operands to i32s, except if they are already i32s
-foreach shifts = [[shl, SHL_v2i64], [sra, SHR_S_v2i64], [srl, SHR_U_v2i64]] in {
-def : Pat<(v2i64 (shifts[0]
-            (v2i64 V128:$vec),
-            (v2i64 (splat2 (i64 (sext I32:$x))))
-          )),
-          (v2i64 (shifts[1] (v2i64 V128:$vec), (i32 I32:$x)))>;
-def : Pat<(v2i64 (shifts[0] (v2i64 V128:$vec), (v2i64 (splat2 I64:$x)))),
-          (v2i64 (shifts[1] (v2i64 V128:$vec), (I32_WRAP_I64 I64:$x)))>;
-}
-
-// 2xi64 shifts with constant shift amounts are custom lowered to avoid wrapping
+// WebAssembly SIMD shifts are nonstandard in that the shift amount is
+// an i32 rather than a vector, so they need custom nodes.
 def wasm_shift_t : SDTypeProfile<1, 2,
   [SDTCisVec<0>, SDTCisSameAs<0, 1>, SDTCisVT<2, i32>]
 >;
 def wasm_shl : SDNode<"WebAssemblyISD::VEC_SHL", wasm_shift_t>;
 def wasm_shr_s : SDNode<"WebAssemblyISD::VEC_SHR_S", wasm_shift_t>;
 def wasm_shr_u : SDNode<"WebAssemblyISD::VEC_SHR_U", wasm_shift_t>;
-foreach shifts = [[wasm_shl, SHL_v2i64],
-                  [wasm_shr_s, SHR_S_v2i64],
-                  [wasm_shr_u, SHR_U_v2i64]] in
-def : Pat<(v2i64 (shifts[0] (v2i64 V128:$vec), I32:$x)),
-          (v2i64 (shifts[1] (v2i64 V128:$vec), I32:$x))>;
+
+// Left shift by scalar: shl
+defm SHL : SIMDShiftInt;
+
+// Right shift by scalar: shr_s / shr_u
+defm SHR_S : SIMDShiftInt;
+defm SHR_U : SIMDShiftInt;

 //===----------------------------------------------------------------------===//
 // Integer binary arithmetic
@@ -766,12 +747,12 @@ def add_nuw : PatFrag<(ops node:$lhs, node:$rhs),
                       "return N->getFlags().hasNoUnsignedWrap();">;

 foreach nodes = [[v16i8, splat16], [v8i16, splat8]] in
-def : Pat<(srl
+def : Pat<(wasm_shr_u
             (add_nuw
               (add_nuw (nodes[0] V128:$lhs), (nodes[0] V128:$rhs)),
               (nodes[1] (i32 1))
             ),
-            (nodes[0] (nodes[1] (i32 1)))
+            (i32 1)
           ),
           (!cast("AVGR_U_"#nodes[0]) V128:$lhs, V128:$rhs)>;

diff --git a/llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll b/llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll
new file mode 100644
index 000000000000..ded430f89545
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll
@@ -0,0 +1,27 @@
+; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s
+
+; Test that SIMD shifts can be lowered correctly even with shift
+; values that are more complex than plain splats.
+
+target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
+target triple = "wasm32-unknown-unknown"
+
+;; TODO: Optimize this further by scalarizing the add
+
+; CHECK-LABEL: shl_add:
+; CHECK-NEXT: .functype shl_add (v128, i32, i32) -> (v128)
+; CHECK-NEXT: i8x16.splat $push1=, $1
+; CHECK-NEXT: i8x16.splat $push0=, $2
+; CHECK-NEXT: i8x16.add $push2=, $pop1, $pop0
+; CHECK-NEXT: i8x16.extract_lane_u $push3=, $pop2, 0
+; CHECK-NEXT: i8x16.shl $push4=, $0, $pop3
+; CHECK-NEXT: return $pop4
+define <16 x i8> @shl_add(<16 x i8> %v, i8 %a, i8 %b) {
+  %t1 = insertelement <16 x i8> undef, i8 %a, i32 0
+  %va = shufflevector <16 x i8> %t1, <16 x i8> undef, <16 x i32> zeroinitializer
+  %t2 = insertelement <16 x i8> undef, i8 %b, i32 0
+  %vb = shufflevector <16 x i8> %t2, <16 x i8> undef, <16 x i32> zeroinitializer
+  %shift = add <16 x i8> %va, %vb
+  %r = shl <16 x i8> %v, %shift
+  ret <16 x i8> %r
+}
diff --git a/llvm/test/CodeGen/WebAssembly/simd-shift-unroll.ll b/llvm/test/CodeGen/WebAssembly/simd-shift-unroll.ll
deleted file mode 100644
index 2a5422cb0110..000000000000
--- a/llvm/test/CodeGen/WebAssembly/simd-shift-unroll.ll
+++ /dev/null
@@ -1,128 +0,0 @@
-; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+unimplemented-simd128 | FileCheck %s --check-prefixes CHECK,SIMD128,SIMD128-SLOW
-
-;; Test that the custom shift unrolling works correctly in cases that
-;; cause assertion failures due to illegal types when using
-;; DAG.UnrollVectorOp. Regression test for PR45178.
-
-target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
-target triple = "wasm32-unknown-unknown"
-
-; CHECK-LABEL: shl_v16i8:
-; CHECK-NEXT: .functype shl_v16i8 (v128) -> (v128)
-; CHECK-NEXT: i8x16.extract_lane_u $push0=, $0, 0
-; CHECK-NEXT: i32.const $push1=, 3
-; CHECK-NEXT: i32.shl $push2=, $pop0, $pop1
-; CHECK-NEXT: i8x16.splat $push3=, $pop2
-; CHECK-NEXT: i8x16.extract_lane_u $push4=, $0, 1
-; CHECK-NEXT: i8x16.replace_lane $push5=, $pop3, 1, $pop4
-; ...
-; CHECK: i8x16.extract_lane_u $push32=, $0, 15
-; CHECK-NEXT: i8x16.replace_lane $push33=, $pop31, 15, $pop32
-; CHECK-NEXT: v8x16.shuffle $push34=, $pop33, $0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
-; CHECK-NEXT: return $pop34
-define <16 x i8> @shl_v16i8(<16 x i8> %in) {
-  %out = shl <16 x i8> %in,
-
-  %ret = shufflevector <16 x i8> %out, <16 x i8> undef, <16 x i32> zeroinitializer
-  ret <16 x i8> %ret
-}
-
-; CHECK-LABEL: shr_s_v16i8:
-; CHECK-NEXT: functype shr_s_v16i8 (v128) -> (v128)
-; CHECK-NEXT: i8x16.extract_lane_s $push0=, $0, 0
-; CHECK-NEXT: i32.const $push1=, 3
-; CHECK-NEXT: i32.shr_s $push2=, $pop0, $pop1
-; CHECK-NEXT: i8x16.splat $push3=, $pop2
-; CHECK-NEXT: i8x16.extract_lane_s $push4=, $0, 1
-; CHECK-NEXT: i8x16.replace_lane $push5=, $pop3, 1, $pop4
-; ...
-; CHECK: i8x16.extract_lane_s $push32=, $0, 15
-; CHECK-NEXT: i8x16.replace_lane $push33=, $pop31, 15, $pop32
-; CHECK-NEXT: v8x16.shuffle $push34=, $pop33, $0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
-; CHECK-NEXT: return $pop34
-define <16 x i8> @shr_s_v16i8(<16 x i8> %in) {
-  %out = ashr <16 x i8> %in,
-
-  %ret = shufflevector <16 x i8> %out, <16 x i8> undef, <16 x i32> zeroinitializer
-  ret <16 x i8> %ret
-}
-
-; CHECK-LABEL: shr_u_v16i8:
-; CHECK-NEXT: functype shr_u_v16i8 (v128) -> (v128)
-; CHECK-NEXT: i8x16.extract_lane_u $push0=, $0, 0
-; CHECK-NEXT: i32.const $push1=, 3
-; CHECK-NEXT: i32.shr_u $push2=, $pop0, $pop1
-; CHECK-NEXT: i8x16.splat $push3=, $pop2
-; CHECK-NEXT: i8x16.extract_lane_u $push4=, $0, 1
-; CHECK-NEXT: i8x16.replace_lane $push5=, $pop3, 1, $pop4
-; ...
-; CHECK: i8x16.extract_lane_u $push32=, $0, 15
-; CHECK-NEXT: i8x16.replace_lane $push33=, $pop31, 15, $pop32
-; CHECK-NEXT: v8x16.shuffle $push34=, $pop33, $0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
-; CHECK-NEXT: return $pop34
-define <16 x i8> @shr_u_v16i8(<16 x i8> %in) {
-  %out = lshr <16 x i8> %in,
-
-  %ret = shufflevector <16 x i8> %out, <16 x i8> undef, <16 x i32> zeroinitializer
-  ret <16 x i8> %ret
-}
-
-; CHECK-LABEL: shl_v8i16:
-; CHECK-NEXT: functype shl_v8i16 (v128) -> (v128)
-; CHECK-NEXT: i16x8.extract_lane_u $push0=, $0, 0
-; CHECK-NEXT: i32.const $push1=, 9
-; CHECK-NEXT: i32.shl $push2=, $pop0, $pop1
-; CHECK-NEXT: i16x8.splat $push3=, $pop2
-; CHECK-NEXT: i16x8.extract_lane_u $push4=, $0, 1
-; CHECK-NEXT: i16x8.replace_lane $push5=, $pop3, 1, $pop4
-; ...
-; CHECK: i16x8.extract_lane_u $push16=, $0, 7
-; CHECK-NEXT: i16x8.replace_lane $push17=, $pop15, 7, $pop16
-; CHECK-NEXT: v8x16.shuffle $push18=, $pop17, $0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
-; CHECK-NEXT: return $pop18
-define <8 x i16> @shl_v8i16(<8 x i16> %in) {
-  %out = shl <8 x i16> %in,
-  %ret = shufflevector <8 x i16> %out, <8 x i16> undef, <8 x i32> zeroinitializer
-  ret <8 x i16> %ret
-}
-
-; CHECK-LABEL: shr_s_v8i16:
-; CHECK-NEXT: functype shr_s_v8i16 (v128) -> (v128)
-; CHECK-NEXT: i16x8.extract_lane_s $push0=, $0, 0
-; CHECK-NEXT: i32.const $push1=, 9
-; CHECK-NEXT: i32.shr_s $push2=, $pop0, $pop1
-; CHECK-NEXT: i16x8.splat $push3=, $pop2
-; CHECK-NEXT: i16x8.extract_lane_s $push4=, $0, 1
-; CHECK-NEXT: i16x8.replace_lane $push5=, $pop3, 1, $pop4
-; ...
-; CHECK: i16x8.extract_lane_s $push16=, $0, 7
-; CHECK-NEXT: i16x8.replace_lane $push17=, $pop15, 7, $pop16
-; CHECK-NEXT: v8x16.shuffle $push18=, $pop17, $0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
-; CHECK-NEXT: return $pop18
-define <8 x i16> @shr_s_v8i16(<8 x i16> %in) {
-  %out = ashr <8 x i16> %in,
-  %ret = shufflevector <8 x i16> %out, <8 x i16> undef, <8 x i32> zeroinitializer
-  ret <8 x i16> %ret
-}
-
-; CHECK-LABEL: shr_u_v8i16:
-; CHECK-NEXT: functype shr_u_v8i16 (v128) -> (v128)
-; CHECK-NEXT: i16x8.extract_lane_u $push0=, $0, 0
-; CHECK-NEXT: i32.const $push1=, 9
-; CHECK-NEXT: i32.shr_u $push2=, $pop0, $pop1
-; CHECK-NEXT: i16x8.splat $push3=, $pop2
-; CHECK-NEXT: i16x8.extract_lane_u $push4=, $0, 1
-; CHECK-NEXT: i16x8.replace_lane $push5=, $pop3, 1, $pop4
-; ...
-; CHECK: i16x8.extract_lane_u $push16=, $0, 7
-; CHECK-NEXT: i16x8.replace_lane $push17=, $pop15, 7, $pop16
-; CHECK-NEXT: v8x16.shuffle $push18=, $pop17, $0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
-; CHECK-NEXT: return $pop18
-define <8 x i16> @shr_u_v8i16(<8 x i16> %in) {
-  %out = lshr <8 x i16> %in,
-  %ret = shufflevector <8 x i16> %out, <8 x i16> undef, <8 x i32> zeroinitializer
-  ret <8 x i16> %ret
-}

From llvm-commits at lists.llvm.org Tue Jul 7 10:45:44 2020
From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:45:44 +0000 (UTC)
Subject: [PATCH] D83278: [WebAssembly] Avoid scalarizing vector shifts in more cases
In-Reply-To:
References:
Message-ID:

This revision was automatically updated to reflect the committed changes.
Closed by commit rG0d7286a65237: [WebAssembly] Avoid scalarizing vector shifts in more cases (authored by tlively).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83278/new/

https://reviews.llvm.org/D83278

Files:
  llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
  llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
  llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll
  llvm/test/CodeGen/WebAssembly/simd-shift-unroll.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83278.276140.patch
Type: text/x-patch
Size: 13519 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:45:52 2020
From: llvm-commits at lists.llvm.org (Chris Lattner via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:45:52 +0000 (UTC)
Subject: [PATCH] D83182: Expand the LLVM Developer Policy to include new sections on adding a project to the LLVM Monorepo, and a second about the LLVM Incubator projects.
In-Reply-To:
References:
Message-ID: <9deedb3b19c26f8402113b87dd7885cf@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG79b30af0ec53: Expand the LLVM Developer Policy to include new sections on adding a project to… (authored by lattner).

Changed prior to commit:
  https://reviews.llvm.org/D83182?vs=275576&id=275676#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83182/new/

https://reviews.llvm.org/D83182

Files:
  llvm/docs/DeveloperPolicy.rst

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83182.275676.patch
Type: text/x-patch
Size: 11398 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:46:01 2020
From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:46:01 +0000 (UTC)
Subject: [PATCH] D82552: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand
In-Reply-To:
References:
Message-ID: <86360b09fb0718af4c2ae10c7b104046@localhost.localdomain>

plotfi added a comment.

@efriedma Does this seem right to you?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82552/new/

https://reviews.llvm.org/D82552

From llvm-commits at lists.llvm.org Tue Jul 7 10:46:26 2020
From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:46:26 +0000 (UTC)
Subject: [PATCH] D83066: [NewPM][LoopFusion] Rename loop-fuse -> loop-fusion
In-Reply-To:
References:
Message-ID:

This revision was automatically updated to reflect the committed changes.
Closed by commit rG1143f09678f4: [NewPM][LoopFusion] Rename loop-fuse -> loop-fusion (authored by aeubanks).
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83066/new/

https://reviews.llvm.org/D83066

Files:
  llvm/lib/Passes/PassRegistry.def

Index: llvm/lib/Passes/PassRegistry.def
===================================================================
--- llvm/lib/Passes/PassRegistry.def
+++ llvm/lib/Passes/PassRegistry.def
@@ -221,7 +221,7 @@
 FUNCTION_PASS("lcssa", LCSSAPass())
 FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass())
 FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass())
-FUNCTION_PASS("loop-fuse", LoopFusePass())
+FUNCTION_PASS("loop-fusion", LoopFusePass())
 FUNCTION_PASS("loop-distribute", LoopDistributePass())
 FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())
 FUNCTION_PASS("print", PrintFunctionPass(dbgs()))
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83066.275677.patch
Type: text/x-patch
Size: 608 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:46:43 2020
From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:46:43 +0000 (UTC)
Subject: [PATCH] D83067: [BasicAA] Remove -basicaa alias
In-Reply-To:
References:
Message-ID:

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rG83158cf95dd7: [BasicAA] Remove -basicaa alias (authored by aeubanks).
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83067/new/

https://reviews.llvm.org/D83067

Files:
  llvm/include/llvm/IR/LegacyPassNameParser.h
  llvm/test/Analysis/BasicAA/empty.ll

Index: llvm/test/Analysis/BasicAA/empty.ll
===================================================================
--- llvm/test/Analysis/BasicAA/empty.ll
+++ llvm/test/Analysis/BasicAA/empty.ll
@@ -1,4 +1,3 @@
-; RUN: opt < %s -basicaa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s
 ; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s

 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
Index: llvm/include/llvm/IR/LegacyPassNameParser.h
===================================================================
--- llvm/include/llvm/IR/LegacyPassNameParser.h
+++ llvm/include/llvm/IR/LegacyPassNameParser.h
@@ -73,11 +73,6 @@
       llvm_unreachable(nullptr);
     }
     addLiteralOption(P->getPassArgument().data(), P, P->getPassName().data());
-
-    // Temporary alias for basicaa -> basic-aa
-    // TODO: remove once everything is migrated to basic-aa
-    if (P->getPassArgument() == "basic-aa")
-      addLiteralOption("basicaa", P, "deprecated alias for basic-aa");
   }

   void passEnumerate(const PassInfo *P) override { passRegistered(P); }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83067.275678.patch
Type: text/x-patch
Size: 1237 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:50:29 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:50:29 +0000 (UTC)
Subject: [PATCH] D83149: [gcov] Add __gcov_dump/__gcov_reset and delete __gcov_flush
In-Reply-To:
References:
Message-ID:

MaskRay marked 6 inline comments as done.
MaskRay added inline comments.

================
Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c:49
   dlerror();
-  void (*gcov_flush1)() = (void (*)())dlsym(f1_handle, "__gcov_flush");
-  if (gcov_flush1 == NULL) {
-    fprintf(stderr, "unable to find __gcov_flush in func.shared': %s\n", dlerror());
+  void (*gcov_reset1)() = (void (*)())dlsym(f1_handle, "__gcov_reset");
+  if (gcov_reset1 == NULL) {
----------------
serge-sans-paille wrote:
> Do we also need to test gcov_flush symbol here too?
`__gcov_flush` is deleted. We don't need to test it.

================
Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c.gcov:56
+// CHECK-NEXT: 1: 50: if (gcov_reset1 == NULL) {
+// CHECK-NEXT: #####: 51: fprintf(stderr, "unable to find __gcov_reset in func.shared': %s\n", dlerror());
 // CHECK-NEXT: #####: 52: return EXIT_FAILURE;
----------------
serge-sans-paille wrote:
> Same question here, what about gcov_flush symbol?
`__gcov_flush` is deleted. We don't need to test it.

================
Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main_three-libs.c.gcov:55
+// CHECK-NEXT: 1: 49: void (*gcov_reset1)() = (void (*)())dlsym(f1_handle, "__gcov_reset");
+// CHECK-NEXT: 1: 50: if (gcov_reset1 == NULL) {
+// CHECK-NEXT: #####: 51: fprintf(stderr, "unable to find __gcov_reset in func.shared': %s\n", dlerror());
----------------
serge-sans-paille wrote:
> And here.
`__gcov_flush` is deleted. We don't need to test it.
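
For readers porting code that used to call `__gcov_flush`, the replies above imply a two-call replacement. Below is a minimal sketch, assuming GCC or Clang on an ELF target (it relies on the GNU `weak` attribute) and assuming the caller wants the old dump-then-reset behaviour. `flushCoverage` is a hypothetical helper name and is not part of compiler-rt or of this patch.

```cpp
// Declared weak so the program still links when it is built without coverage
// instrumentation; in that case both addresses are null.
extern "C" void __gcov_dump(void) __attribute__((weak));
extern "C" void __gcov_reset(void) __attribute__((weak));

// Hypothetical helper: writes out the current counters, then zeroes them.
// This is the two-step stand-in for a single __gcov_flush() call.
// Returns false when the coverage hooks are not linked in.
bool flushCoverage() {
  if (!__gcov_dump || !__gcov_reset)
    return false;
  __gcov_dump();
  __gcov_reset();
  return true;
}
```

In a build without `--coverage` the helper simply returns false, so callers do not need a separate build-time guard.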

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83149/new/

https://reviews.llvm.org/D83149

From llvm-commits at lists.llvm.org Tue Jul 7 10:53:23 2020
From: llvm-commits at lists.llvm.org (LLVM GN Syncbot via llvm-commits)
Date: Tue, 07 Jul 2020 10:53:23 -0700 (PDT)
Subject: [llvm] 907f15c - [gn build] Port dfa0db79d0e
Message-ID: <5f04b693.1c69fb81.4bea.2ffc@mx.google.com>

Author: LLVM GN Syncbot
Date: 2020-07-07T17:49:12Z
New Revision: 907f15c5914187b2831977e10599bede6bbafe72

URL: https://github.com/llvm/llvm-project/commit/907f15c5914187b2831977e10599bede6bbafe72
DIFF: https://github.com/llvm/llvm-project/commit/907f15c5914187b2831977e10599bede6bbafe72.diff

LOG: [gn build] Port dfa0db79d0e

Added:

Modified:
    llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/bugprone/BUILD.gn

Removed:

################################################################################
diff --git a/llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/bugprone/BUILD.gn b/llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/bugprone/BUILD.gn
index a34f9dcf9c0a..45ecef719fdb 100644
--- a/llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/bugprone/BUILD.gn
+++ b/llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/bugprone/BUILD.gn
@@ -39,6 +39,7 @@ static_library("bugprone") {
     "MisplacedWideningCastCheck.cpp",
     "MoveForwardingReferenceCheck.cpp",
     "MultipleStatementMacroCheck.cpp",
+    "NoEscapeCheck.cpp",
     "NotNullTerminatedResultCheck.cpp",
     "ParentVirtualCallCheck.cpp",
     "PosixReturnCheck.cpp",

From llvm-commits at lists.llvm.org Tue Jul 7 10:53:55 2020
From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:53:55 +0000 (UTC)
Subject: [PATCH] D82037: [x86][lvi][seses] Use SESES at O0 for LVI mitigation
In-Reply-To:
References:
Message-ID:

zbrid updated this revision to Diff 276141.
zbrid marked an inline comment as done.
zbrid added a comment.

Fix redundant LFENCE problem with SESES

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82037/new/

https://reviews.llvm.org/D82037

Files:
  llvm/lib/Target/X86/X86.h
  llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
  llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp
  llvm/lib/Target/X86/X86TargetMachine.cpp
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/lvi-hardening-loads.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82037.276141.patch
Type: text/x-patch
Size: 11338 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 10:54:31 2020
From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:54:31 +0000 (UTC)
Subject: [PATCH] D82037: [x86][lvi][seses] Use SESES at O0 for LVI mitigation
In-Reply-To:
References:
Message-ID: <8e10b58b3c8d59e966fbfa93fb3441d4@localhost.localdomain>

zbrid added a comment.

Going to merge this today. I updated based on the comments.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82037/new/

https://reviews.llvm.org/D82037

From llvm-commits at lists.llvm.org Tue Jul 7 10:54:59 2020
From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:54:59 +0000 (UTC)
Subject: [PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support
In-Reply-To:
References:
Message-ID:

thakis accepted this revision.
thakis added inline comments.
This revision is now accepted and ready to land.

================
Comment at: llvm/test/tools/llvm-ml/struct_errors.test:10
+t1 int_test <<1,2,3>>
+// CHECK: error: Initializer too long for field
+
----------------
This diag could probably add "expected at most %d elements, got %d" at the end

================
Comment at: llvm/test/tools/llvm-ml/struct_errors.test:48
+t9 STRUCT 3
+// CHECK: error: unsupported alignment value
+t9 ENDS
----------------
this could maybe say "alignment must be power of two, got %d" or similar (I realize it's an existing diag)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75306/new/

https://reviews.llvm.org/D75306

From llvm-commits at lists.llvm.org Tue Jul 7 10:57:06 2020
From: llvm-commits at lists.llvm.org (Hamilton Tobon-Mosquera via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 17:57:06 +0000 (UTC)
Subject: [PATCH] D83312: [OpenMPOpt][NFC] Exposing OMPInformationCache and OpenMPOpt in the public header for unittesting
In-Reply-To:
References:
Message-ID: <161c5b31f5d7fc3b72944f132de926a7@localhost.localdomain>

hamax97 updated this revision to Diff 276142.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83312/new/

https://reviews.llvm.org/D83312

Files:
  llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83312.276142.patch
Type: text/x-patch
Size: 50297 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 11:03:42 2020
From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 18:03:42 +0000 (UTC)
Subject: [PATCH] D82037: [x86][lvi][seses] Use SESES at O0 for LVI mitigation
In-Reply-To:
References:
Message-ID: <7dc05167ad6ea5ba01e54982cc111cde@localhost.localdomain>

zbrid updated this revision to Diff 276144.
zbrid added a comment.
Update commit message to add info about SESES change too Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82037/new/ https://reviews.llvm.org/D82037 Files: llvm/lib/Target/X86/X86.h llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86TargetMachine.cpp llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/lvi-hardening-loads.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82037.276144.patch Type: text/x-patch Size: 11338 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:05:33 2020 From: llvm-commits at lists.llvm.org (Atmn Patel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:05:33 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: <18113c629292031dcd34cffd2c681bbf@localhost.localdomain> atmnpatel added inline comments. ================ Comment at: clang/lib/Parse/ParseOpenMP.cpp:2787 + if (getLangOpts().OpenMP < 51 && Kind == OMPC_default && + static_cast(Val.getValue().Type) == + OMP_DEFAULT_firstprivate) { ---------------- ABataev wrote: > ABataev wrote: > > No need for cast here. > Still no need for the cast Sorry, I saw that before and checked if I could remove it and I can't. Val.getValue().Type is an unsigned int and the other is an enum. 
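A minimal sketch of why the cast cannot simply be dropped, assuming the enum on the right-hand side is a scoped enumeration (`enum class`): a scoped enumerator has no implicit conversion to an integer type, so comparing it with an `unsigned` needs an explicit cast on one side. `DefaultKindSketch` and `isFirstprivate` are hypothetical stand-ins for illustration, not the real clang/LLVM declarations.

```cpp
#include <cassert>

// Hypothetical stand-in for a scoped enum; the names and values are
// illustrative only and do not mirror the real OpenMP definitions.
enum class DefaultKindSketch : unsigned { None = 0, Shared = 1, Firstprivate = 2 };

// RawType mimics a field that is stored as a plain unsigned integer.
bool isFirstprivate(unsigned RawType) {
  // Writing `RawType == DefaultKindSketch::Firstprivate` would not compile:
  // scoped enums do not convert implicitly to integers (or vice versa).
  return static_cast<DefaultKindSketch>(RawType) ==
         DefaultKindSketch::Firstprivate;
}
```

With an unscoped `enum`, the enumerator would be promoted to an integer and the comparison would compile without the cast.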
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 From llvm-commits at lists.llvm.org Tue Jul 7 11:05:32 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via llvm-commits) Date: Tue, 07 Jul 2020 11:05:32 -0700 (PDT) Subject: [llvm] dfabffb - [x86][lvi][seses] Use SESES at O0 for LVI mitigation Message-ID: <5f04b96c.1c69fb81.c31ce.ae91@mx.google.com> Author: Zola Bridges Date: 2020-07-07T11:05:09-07:00 New Revision: dfabffb195ee7c9f9db327f29feb781cbec53724 URL: https://github.com/llvm/llvm-project/commit/dfabffb195ee7c9f9db327f29feb781cbec53724 DIFF: https://github.com/llvm/llvm-project/commit/dfabffb195ee7c9f9db327f29feb781cbec53724.diff LOG: [x86][lvi][seses] Use SESES at O0 for LVI mitigation Use SESES as the fallback at O0 where the optimized LVI pass isn't desired due to its effect on build times at O0. I updated the LVI tests since this changes the code gen for the tests touched in the parent revision. This is a follow up to the comments I made here: https://reviews.llvm.org/D80964 Hopefully we can continue the discussion here. Also updated SESES to handle LFENCE instructions properly instead of adding redundant LFENCEs. 
In particular, 1) no longer add LFENCE if the current instruction being processed is an LFENCE and 2) no longer add LFENCE if the instruction right before the instruction being processed is an LFENCE Reviewed By: sconstab Differential Revision: https://reviews.llvm.org/D82037 Added: Modified: llvm/lib/Target/X86/X86.h llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86TargetMachine.cpp llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/lvi-hardening-loads.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86.h b/llvm/lib/Target/X86/X86.h index e7f373e5e865..91ba4e3d091e 100644 --- a/llvm/lib/Target/X86/X86.h +++ b/llvm/lib/Target/X86/X86.h @@ -141,7 +141,6 @@ InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM, X86RegisterBankInfo &); FunctionPass *createX86LoadValueInjectionLoadHardeningPass(); -FunctionPass *createX86LoadValueInjectionLoadHardeningUnoptimizedPass(); FunctionPass *createX86LoadValueInjectionRetHardeningPass(); FunctionPass *createX86SpeculativeLoadHardeningPass(); FunctionPass *createX86SpeculativeExecutionSideEffectSuppression(); @@ -161,7 +160,6 @@ void initializeX86ExecutionDomainFixPass(PassRegistry &); void initializeX86ExpandPseudoPass(PassRegistry &); void initializeX86FixupSetCCPassPass(PassRegistry &); void initializeX86FlagsCopyLoweringPassPass(PassRegistry &); -void initializeX86LoadValueInjectionLoadHardeningUnoptimizedPassPass(PassRegistry &); void initializeX86LoadValueInjectionLoadHardeningPassPass(PassRegistry &); void initializeX86LoadValueInjectionRetHardeningPassPass(PassRegistry &); void initializeX86OptimizeLEAPassPass(PassRegistry &); diff --git a/llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp b/llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp index 35fc439998f9..50f8b3477acc 100644 --- 
a/llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp +++ b/llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp @@ -822,79 +822,3 @@ INITIALIZE_PASS_END(X86LoadValueInjectionLoadHardeningPass, PASS_KEY, FunctionPass *llvm::createX86LoadValueInjectionLoadHardeningPass() { return new X86LoadValueInjectionLoadHardeningPass(); } - -namespace { - -/// The `X86LoadValueInjectionLoadHardeningPass` above depends on expensive -/// analysis passes that add complexity to the pipeline. This complexity -/// can cause noticable overhead when no optimizations are enabled, i.e., -O0. -/// The purpose of `X86LoadValueInjectionLoadHardeningUnoptimizedPass` is to -/// provide the same security as the optimized pass, but without adding -/// unnecessary complexity to the LLVM pipeline. -/// -/// The behavior of this pass is simply to insert an LFENCE after every load -/// instruction. -class X86LoadValueInjectionLoadHardeningUnoptimizedPass - : public MachineFunctionPass { -public: - X86LoadValueInjectionLoadHardeningUnoptimizedPass() - : MachineFunctionPass(ID) {} - - StringRef getPassName() const override { - return "X86 Load Value Injection (LVI) Load Hardening (Unoptimized)"; - } - bool runOnMachineFunction(MachineFunction &MF) override; - static char ID; -}; - -} // end anonymous namespace - -char X86LoadValueInjectionLoadHardeningUnoptimizedPass::ID = 0; - -bool X86LoadValueInjectionLoadHardeningUnoptimizedPass::runOnMachineFunction( - MachineFunction &MF) { - LLVM_DEBUG(dbgs() << "***** " << getPassName() << " : " << MF.getName() - << " *****\n"); - const X86Subtarget *STI = &MF.getSubtarget(); - if (!STI->useLVILoadHardening()) - return false; - - // FIXME: support 32-bit - if (!STI->is64Bit()) - report_fatal_error("LVI load hardening is only supported on 64-bit", false); - - // Don't skip functions with the "optnone" attr but participate in opt-bisect. 
- const Function &F = MF.getFunction(); - if (!F.hasOptNone() && skipFunction(F)) - return false; - - bool Modified = false; - ++NumFunctionsConsidered; - - const TargetInstrInfo *TII = STI->getInstrInfo(); - for (auto &MBB : MF) { - for (auto &MI : MBB) { - if (!MI.mayLoad() || MI.getOpcode() == X86::LFENCE || - MI.getOpcode() == X86::MFENCE) - continue; - - MachineBasicBlock::iterator InsertionPt = - MI.getNextNode() ? MI.getNextNode() : MBB.end(); - BuildMI(MBB, InsertionPt, DebugLoc(), TII->get(X86::LFENCE)); - ++NumFences; - Modified = true; - } - } - - if (Modified) - ++NumFunctionsMitigated; - - return Modified; -} - -INITIALIZE_PASS(X86LoadValueInjectionLoadHardeningUnoptimizedPass, PASS_KEY, - "X86 LVI load hardening", false, false) - -FunctionPass *llvm::createX86LoadValueInjectionLoadHardeningUnoptimizedPass() { - return new X86LoadValueInjectionLoadHardeningUnoptimizedPass(); -} diff --git a/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp b/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp index 29d54b26135a..75138f2de696 100644 --- a/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp +++ b/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp @@ -22,6 +22,7 @@ #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/Pass.h" +#include "llvm/Target/TargetMachine.h" using namespace llvm; #define DEBUG_TYPE "x86-seses" @@ -86,27 +87,42 @@ static bool hasConstantAddressingMode(const MachineInstr &MI) { bool X86SpeculativeExecutionSideEffectSuppression::runOnMachineFunction( MachineFunction &MF) { - if (!EnableSpeculativeExecutionSideEffectSuppression) + + const auto &OptLevel = MF.getTarget().getOptLevel(); + const X86Subtarget &Subtarget = MF.getSubtarget(); + + // Check whether SESES needs to run as the fallback for LVI at O0 or if the + // user explicitly passed the SESES flag. 
+ if (!EnableSpeculativeExecutionSideEffectSuppression && + !(Subtarget.useLVILoadHardening() && OptLevel == CodeGenOpt::None)) return false; LLVM_DEBUG(dbgs() << "********** " << getPassName() << " : " << MF.getName() << " **********\n"); bool Modified = false; - const X86Subtarget &Subtarget = MF.getSubtarget(); const X86InstrInfo *TII = Subtarget.getInstrInfo(); for (MachineBasicBlock &MBB : MF) { MachineInstr *FirstTerminator = nullptr; - + // Keep track of whether the previous instruction was an LFENCE to avoid + // adding redundant LFENCEs. + bool PrevInstIsLFENCE = false; for (auto &MI : MBB) { + + if (MI.getOpcode() == X86::LFENCE) { + PrevInstIsLFENCE = true; + continue; + } // We want to put an LFENCE before any instruction that // may load or store. This LFENCE is intended to avoid leaking any secret // data due to a given load or store. This results in closing the cache // and memory timing side channels. We will treat terminators that load // or store separately. if (MI.mayLoadOrStore() && !MI.isTerminator()) { - BuildMI(MBB, MI, DebugLoc(), TII->get(X86::LFENCE)); - NumLFENCEsInserted++; - Modified = true; + if (!PrevInstIsLFENCE) { + BuildMI(MBB, MI, DebugLoc(), TII->get(X86::LFENCE)); + NumLFENCEsInserted++; + Modified = true; + } if (OneLFENCEPerBasicBlock) break; } @@ -128,19 +144,25 @@ bool X86SpeculativeExecutionSideEffectSuppression::runOnMachineFunction( // Look for branch instructions that will require an LFENCE to be put // before this basic block's terminators. - if (!MI.isBranch() || OmitBranchLFENCEs) + if (!MI.isBranch() || OmitBranchLFENCEs) { // This isn't a branch or we're not putting LFENCEs before branches. + PrevInstIsLFENCE = false; continue; + } - if (OnlyLFENCENonConst && hasConstantAddressingMode(MI)) + if (OnlyLFENCENonConst && hasConstantAddressingMode(MI)) { // This is a branch, but it only has constant addressing mode and we're // not adding LFENCEs before such branches. 
+ PrevInstIsLFENCE = false; continue; + } // This branch requires adding an LFENCE. - BuildMI(MBB, FirstTerminator, DebugLoc(), TII->get(X86::LFENCE)); - NumLFENCEsInserted++; - Modified = true; + if (!PrevInstIsLFENCE) { + BuildMI(MBB, FirstTerminator, DebugLoc(), TII->get(X86::LFENCE)); + NumLFENCEsInserted++; + Modified = true; + } break; } } diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp index 7e00b30915a0..7344116e14af 100644 --- a/llvm/lib/Target/X86/X86TargetMachine.cpp +++ b/llvm/lib/Target/X86/X86TargetMachine.cpp @@ -489,10 +489,12 @@ void X86PassConfig::addMachineSSAOptimization() { void X86PassConfig::addPostRegAlloc() { addPass(createX86FloatingPointStackifierPass()); + // When -O0 is enabled, the Load Value Injection Hardening pass will fall back + // to using the Speculative Execution Side Effect Suppression pass for + // mitigation. This is to prevent slow downs due to + // analyses needed by the LVIHardening pass when compiling at -O0. 
if (getOptLevel() != CodeGenOpt::None) addPass(createX86LoadValueInjectionLoadHardeningPass()); - else - addPass(createX86LoadValueInjectionLoadHardeningUnoptimizedPass()); } void X86PassConfig::addPreSched2() { addPass(createX86ExpandPseudoPass()); } diff --git a/llvm/test/CodeGen/X86/O0-pipeline.ll b/llvm/test/CodeGen/X86/O0-pipeline.ll index a1cd828abeab..528b3c39c879 100644 --- a/llvm/test/CodeGen/X86/O0-pipeline.ll +++ b/llvm/test/CodeGen/X86/O0-pipeline.ll @@ -46,7 +46,6 @@ ; CHECK-NEXT: Fast Register Allocator ; CHECK-NEXT: Bundle Machine CFG Edges ; CHECK-NEXT: X86 FP Stackifier -; CHECK-NEXT: X86 Load Value Injection (LVI) Load Hardening (Unoptimized) ; CHECK-NEXT: Fixup Statepoint Caller Saved ; CHECK-NEXT: Lazy Machine Block Frequency Analysis ; CHECK-NEXT: Machine Optimization Remark Emitter diff --git a/llvm/test/CodeGen/X86/lvi-hardening-loads.ll b/llvm/test/CodeGen/X86/lvi-hardening-loads.ll index e40911aa1f0c..ff8276f6f1c2 100644 --- a/llvm/test/CodeGen/X86/lvi-hardening-loads.ll +++ b/llvm/test/CodeGen/X86/lvi-hardening-loads.ll @@ -26,8 +26,11 @@ entry: ; X64-NEXT: jmp .LBB0_1 ; X64-NOOPT: # %bb.0: # %entry +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movl %esi, -{{[0-9]+}}(%rsp) +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movl $0, -{{[0-9]+}}(%rsp) ; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movl $0, -{{[0-9]+}}(%rsp) @@ -48,6 +51,7 @@ for.cond: ; preds = %for.inc, %entry ; X64-NOOPT: .LBB0_1: # %for.cond ; X64-NOOPT-NEXT: # =>This Inner Loop Header: Depth=1 +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax ; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: cmpl -{{[0-9]+}}(%rsp), %eax @@ -73,12 +77,13 @@ for.body: ; preds = %for.cond ; X64-NOOPT: # %bb.2: # %for.body ; X64-NOOPT-NEXT: # in Loop: Header=BB0_1 Depth=1 -; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax ; X64-NOOPT-NEXT: lfence +; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax ; X64-NOOPT-NEXT: cltd ; 
X64-NOOPT-NEXT: movl $2, %ecx ; X64-NOOPT-NEXT: idivl %ecx ; X64-NOOPT-NEXT: cmpl $0, %edx +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: jne .LBB0_4 if.then: ; preds = %for.body @@ -105,6 +110,7 @@ if.then: ; preds = %for.body ; X64-NOOPT: # %bb.3: # %if.then ; X64-NOOPT-NEXT: # in Loop: Header=BB0_1 Depth=1 +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movq -{{[0-9]+}}(%rsp), %rax ; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movslq -{{[0-9]+}}(%rsp), %rcx @@ -126,10 +132,12 @@ for.inc: ; preds = %if.end ; X64-NOOPT: .LBB0_5: # %for.inc ; X64-NOOPT-NEXT: # in Loop: Header=BB0_1 Depth=1 -; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax ; X64-NOOPT-NEXT: lfence +; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax ; X64-NOOPT-NEXT: addl $1, %eax +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: movl %eax, -{{[0-9]+}}(%rsp) +; X64-NOOPT-NEXT: lfence ; X64-NOOPT-NEXT: jmp .LBB0_1 for.end: ; preds = %for.cond From llvm-commits at lists.llvm.org Tue Jul 7 11:05:37 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:05:37 +0000 (UTC) Subject: [PATCH] D82037: [x86][lvi][seses] Use SESES at O0 for LVI mitigation In-Reply-To: References: Message-ID: <79fd10c503e65cf8214213f94c69e13c@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGdfabffb195ee: [x86][lvi][seses] Use SESES at O0 for LVI mitigation (authored by zbrid). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82037/new/ https://reviews.llvm.org/D82037 Files: llvm/lib/Target/X86/X86.h llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86TargetMachine.cpp llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/lvi-hardening-loads.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82037.276146.patch Type: text/x-patch Size: 11338 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:07:39 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:07:39 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: ABataev added inline comments. ================ Comment at: clang/lib/Parse/ParseOpenMP.cpp:2787 + if (getLangOpts().OpenMP < 51 && Kind == OMPC_default && + static_cast(Val.getValue().Type) == + OMP_DEFAULT_firstprivate) { ---------------- atmnpatel wrote: > ABataev wrote: > > ABataev wrote: > > > No need for cast here. > > Still no need for the cast > Sorry, I saw that before and checked if I could remove it and I can't. Val.getValue().Type is an unsigned int and the other is an enum. This enum should be converted to `int` implicitly, no? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 From llvm-commits at lists.llvm.org Tue Jul 7 11:08:34 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:08:34 +0000 (UTC) Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores In-Reply-To: References: Message-ID: <957f73e2cc5bf8491eee944d1666b20f@localhost.localdomain> kmclaughlin updated this revision to Diff 276145. kmclaughlin added a comment. 
Changes to IncrementMemoryAddress: - Changed the assert added for scalable vectors to a report_fatal_error - Replaced `Addr.getValueSizeInBits().getFixedSize()` with `AddrVT.getSizeInBits().getFixedSize()` - Use `DataVT.getStoreSize()` instead of `DataVT.getSizeInBits()` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83137/new/ https://reviews.llvm.org/D83137 Files: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td llvm/test/CodeGen/AArch64/sve-split-load.ll llvm/test/CodeGen/AArch64/sve-split-store.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83137.276145.patch Type: text/x-patch Size: 11304 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:10:30 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:10:30 +0000 (UTC) Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores In-Reply-To: References: Message-ID: kmclaughlin marked 3 inline comments as done. kmclaughlin added inline comments. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7135 if (IsCompressedMemory) { + assert(!DataVT.isScalableVector() && + "Cannot currently handle compressed memory with scalable vectors"); ---------------- david-arm wrote: > Do we know if this is something we catch earlier and hence should never get here? I just wonder if here it's not really an assert that something went wrong with the code, but perhaps we just hit a case we don't support yet? 
If it's just because we don't support it yet, instead of asserting we could do: > > if (DataVT.isScalableVector()) > report_fatal_error("Cannot currently handle compressed memory with scalable vectors"); I think this is something that we just don't support yet, so I've changed this to `report_fatal_error` as suggested CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83137/new/ https://reviews.llvm.org/D83137 From llvm-commits at lists.llvm.org Tue Jul 7 11:10:49 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:10:49 +0000 (UTC) Subject: [PATCH] D79910: [x86][seses] Add clang flag; Use lvi-cfi with seses In-Reply-To: References: Message-ID: <6e53767aca44da48de42fa830f591d11@localhost.localdomain> zbrid updated this revision to Diff 276147. zbrid added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79910/new/ https://reviews.llvm.org/D79910 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Arch/X86.cpp clang/test/Driver/x86-target-features.c llvm/lib/Target/X86/X86.td llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86Subtarget.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D79910.276147.patch Type: text/x-patch Size: 8050 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:12:24 2020 From: llvm-commits at lists.llvm.org (Atmn Patel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:12:24 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: <3baa8e029802ff893400395d3cab09e8@localhost.localdomain> atmnpatel marked an inline comment as done. atmnpatel added inline comments. 
================ Comment at: clang/lib/Parse/ParseOpenMP.cpp:2787 + if (getLangOpts().OpenMP < 51 && Kind == OMPC_default && + static_cast(Val.getValue().Type) == + OMP_DEFAULT_firstprivate) { ---------------- ABataev wrote: > atmnpatel wrote: > > ABataev wrote: > > > ABataev wrote: > > > > No need for cast here. > > > Still no need for the cast > > Sorry, I saw that before and checked if I could remove it and I can't. Val.getValue().Type is an unsigned int and the other is an enum. > This enum should be converted to `int` implicitly, no? When we moved the definition of this enum over from clang to llvm, we converted them to strongly typed enums. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 From llvm-commits at lists.llvm.org Tue Jul 7 11:14:07 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:14:07 +0000 (UTC) Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing attribute to clang In-Reply-To: References: Message-ID: <9c9ca0c4acd9f51c8ad2904aeaf47d6e@localhost.localdomain> ABataev added inline comments. ================ Comment at: clang/lib/Parse/ParseOpenMP.cpp:2787 + if (getLangOpts().OpenMP < 51 && Kind == OMPC_default && + static_cast(Val.getValue().Type) == + OMP_DEFAULT_firstprivate) { ---------------- atmnpatel wrote: > ABataev wrote: > > atmnpatel wrote: > > > ABataev wrote: > > > > ABataev wrote: > > > > > No need for cast here. > > > > Still no need for the cast > > > Sorry, I saw that before and checked if I could remove it and I can't. Val.getValue().Type is an unsigned int and the other is an enum. > > This enum should be converted to `int` implicitly, no? > When we moved the definition of this enum over from clang to llvm, we converted them to strongly typed enums. I see. 
Ok then, leave it as is Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75591/new/ https://reviews.llvm.org/D75591 From llvm-commits at lists.llvm.org Tue Jul 7 11:14:34 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Tue, 07 Jul 2020 11:14:34 -0700 (PDT) Subject: [llvm] 931ec74 - [X86][AVX] Don't fold PEXTR(VBROADCAST_LOAD(X)) -> LOAD(X). Message-ID: <5f04bb8a.1c69fb81.17c50.b383@mx.google.com> Author: Simon Pilgrim Date: 2020-07-07T19:10:03+01:00 New Revision: 931ec74f7a29f53e18b574dc9500012ecbeba23a URL: https://github.com/llvm/llvm-project/commit/931ec74f7a29f53e18b574dc9500012ecbeba23a DIFF: https://github.com/llvm/llvm-project/commit/931ec74f7a29f53e18b574dc9500012ecbeba23a.diff LOG: [X86][AVX] Don't fold PEXTR(VBROADCAST_LOAD(X)) -> LOAD(X). We were checking the VBROADCAST_LOAD element size against the extraction destination size instead of the extracted vector element size - PEXTRW/PEXTB have implicit zext'ing so have i32 destination sizes for v8i16/v16i8 vectors, resulting in us extracting from the wrong part of a load. This patch bails from the fold if the vector element sizes don't match, and we now use the target constant extraction code later on like the pre-AVX2 targets, fixing the test case. Found by internal fuzzing tests. 
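The guard described in this log can be sketched in isolation, under the assumption that the fold is only valid when the broadcast's memory element, the broadcast's scalar element, the extraction result type, and the extracted vector element all have the same bit width. The function and parameter names below are hypothetical and do not mirror the actual `X86ISelLowering.cpp` code.

```cpp
#include <cassert>

// Hypothetical helper; every width is given in bits.
// Returns true only if replacing extract(broadcast_load(X)) with a plain
// scalar load of X would read the same bits that the extraction would.
bool canFoldExtractOfBroadcastLoad(unsigned MemEltBits,
                                   unsigned BroadcastEltBits,
                                   unsigned ResultBits,
                                   unsigned ExtractedEltBits) {
  return MemEltBits == BroadcastEltBits &&
         ResultBits == BroadcastEltBits &&
         ExtractedEltBits == BroadcastEltBits;
}
```

For example, a 16-bit extraction whose result is implicitly zero-extended to 32 bits, taken from a 32-bit broadcast, passes a check on the result width alone but fails once the extracted element width (16) is compared as well.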
Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/extractelement-load.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 575f358361b1..023b5975f0c7 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -38986,7 +38986,7 @@ static SDValue combineExtractWithShuffle(SDNode *N, SelectionDAG &DAG, auto *MemIntr = cast(SrcBC); unsigned SrcBCWidth = SrcBC.getScalarValueSizeInBits(); if (MemIntr->getMemoryVT().getSizeInBits() == SrcBCWidth && - VT.getSizeInBits() == SrcBCWidth) { + VT.getSizeInBits() == SrcBCWidth && SrcEltBits == SrcBCWidth) { SDValue Load = DAG.getLoad(VT, dl, MemIntr->getChain(), MemIntr->getBasePtr(), MemIntr->getPointerInfo(), diff --git a/llvm/test/CodeGen/X86/extractelement-load.ll b/llvm/test/CodeGen/X86/extractelement-load.ll index 752ba5b2a33d..94628c70d989 100644 --- a/llvm/test/CodeGen/X86/extractelement-load.ll +++ b/llvm/test/CodeGen/X86/extractelement-load.ll @@ -267,8 +267,8 @@ entry: ret float %cond } -; FIXME: Incorrect AVX2 codegen due to bad extraction from a VBROADCAST_LOAD of the <2 x i16> constant bitcast as <4 x i32>. -define void @subextract_broadcast_load_constant(<2 x i16>* nocapture %0, i16* nocapture %1, i16* nocapture %2) { +; Test for bad extractions from a VBROADCAST_LOAD of the <2 x i16> non-uniform constant bitcast as <4 x i32>. 
+define void @subextract_broadcast_load_constant(<2 x i16>* nocapture %0, i16* nocapture %1, i16* nocapture %2) { ; X32-SSE2-LABEL: subextract_broadcast_load_constant: ; X32-SSE2: # %bb.0: ; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax @@ -279,26 +279,12 @@ define void @subextract_broadcast_load_constant(<2 x i16>* nocapture %0, i16* no ; X32-SSE2-NEXT: movw $-24160, (%eax) # imm = 0xA1A0 ; X32-SSE2-NEXT: retl ; -; X64-SSSE3-LABEL: subextract_broadcast_load_constant: -; X64-SSSE3: # %bb.0: -; X64-SSSE3-NEXT: movl $-1583308898, (%rdi) # imm = 0xA1A09F9E -; X64-SSSE3-NEXT: movw $-24674, (%rsi) # imm = 0x9F9E -; X64-SSSE3-NEXT: movw $-24160, (%rdx) # imm = 0xA1A0 -; X64-SSSE3-NEXT: retq -; -; X64-AVX1-LABEL: subextract_broadcast_load_constant: -; X64-AVX1: # %bb.0: -; X64-AVX1-NEXT: movl $-1583308898, (%rdi) # imm = 0xA1A09F9E -; X64-AVX1-NEXT: movw $-24674, (%rsi) # imm = 0x9F9E -; X64-AVX1-NEXT: movw $-24160, (%rdx) # imm = 0xA1A0 -; X64-AVX1-NEXT: retq -; -; X64-AVX2-LABEL: subextract_broadcast_load_constant: -; X64-AVX2: # %bb.0: -; X64-AVX2-NEXT: movl $-1583308898, (%rdi) # imm = 0xA1A09F9E -; X64-AVX2-NEXT: movw $-24674, (%rsi) # imm = 0x9F9E -; X64-AVX2-NEXT: movw $-24674, (%rdx) # imm = 0x9F9E -; X64-AVX2-NEXT: retq +; X64-LABEL: subextract_broadcast_load_constant: +; X64: # %bb.0: +; X64-NEXT: movl $-1583308898, (%rdi) # imm = 0xA1A09F9E +; X64-NEXT: movw $-24674, (%rsi) # imm = 0x9F9E +; X64-NEXT: movw $-24160, (%rdx) # imm = 0xA1A0 +; X64-NEXT: retq %4 = bitcast <2 x i16>* %0 to i8* store i8 -98, i8* %4, align 1 %5 = getelementptr inbounds i8, i8* %4, i64 1 From llvm-commits at lists.llvm.org Tue Jul 7 11:17:40 2020 From: llvm-commits at lists.llvm.org (Bardia Mahjour via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:17:40 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: <840a83a6a3999872f2e20ccfccd21639@localhost.localdomain> bmahjour added inline comments. 
================ Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:582 + EXPECT_TRUE(isSafeToMoveBefore(*LI2, *LI1, DT, &PDT, &DI, &MSSAU)); + EXPECT_TRUE(isSafeToMoveBefore(*LI2, *LI1, DT, &PDT, nullptr, &MSSAU)); }); ---------------- Please also add a check to make sure independent memory loads and stores can be moved past each other. For example, `%load2` should be able to move before the store to B. store i32 %load1, i32* %arrayidx_B, align 4 %load2 = load i32, i32* %arrayidx_A, align 4 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Tue Jul 7 11:18:00 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:18:00 +0000 (UTC) Subject: [PATCH] D82037: [x86][lvi][seses] Use SESES at O0 for LVI mitigation In-Reply-To: References: Message-ID: <80be3bd2cac32af2808dd5fbbb612946@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGdfabffb195ee: [x86][lvi][seses] Use SESES at O0 for LVI mitigation (authored by zbrid). Changed prior to commit: https://reviews.llvm.org/D82037?vs=271791&id=275679#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82037/new/ https://reviews.llvm.org/D82037 Files: llvm/lib/Target/X86/X86.h llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86TargetMachine.cpp llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/lvi-hardening-loads.ll -------------- next part -------------- A non-text attachment was scrubbed...
Name: D82037.275679.patch Type: text/x-patch Size: 11338 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:19:54 2020 From: llvm-commits at lists.llvm.org (Katherine Rasmussen via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:19:54 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: <2d11100db1fa855041710d27db5ef2ec@localhost.localdomain> ktras added a comment. In D83142#2136615 , @tskeith wrote: > @ktras, do you have the necessary permissions to commit this or do you need someone to do it for you? No, I do not, so I would need someone to commit it for me if everything looks ready to go. Thank you all for your help! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 From llvm-commits at lists.llvm.org Tue Jul 7 11:23:02 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Tue, 07 Jul 2020 11:23:02 -0700 (PDT) Subject: [llvm] 9dfea03 - [SCCP] Handle assume predicates Message-ID: <5f04bd86.1c69fb81.f20c8.ecc2@mx.google.com> Author: Nikita Popov Date: 2020-07-07T20:22:52+02:00 New Revision: 9dfea0351797e1e10b6e28e290c134c179eba504 URL: https://github.com/llvm/llvm-project/commit/9dfea0351797e1e10b6e28e290c134c179eba504 DIFF: https://github.com/llvm/llvm-project/commit/9dfea0351797e1e10b6e28e290c134c179eba504.diff LOG: [SCCP] Handle assume predicates Take assume predicates into account when visiting ssa.copy. The handling is the same as for branch predicates, with the difference that we're always on the true edge. 
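To illustrate the kind of fact an assume predicate contributes, here is a small sketch, independent of the actual SCCP lattice code, that narrows an unsigned range by assumed comparisons and then tries to decide later comparisons from it. Because an assume always holds, there is no false edge to consider, which is why the narrowing helpers take no edge flag. The `Range` and `Tri` types and all helper names are hypothetical, and the constants in the usage note are chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed unsigned interval [Lo, Hi]; not LLVM's ConstantRange.
struct Range {
  uint32_t Lo;
  uint32_t Hi;
  Range() : Lo(0), Hi(UINT32_MAX) {}
};

// Three-valued answer for a comparison that may not be decidable.
enum class Tri { False, True, Unknown };

// Narrow R using an assumed `v <u C`. Assumes C > 0.
Range assumeULT(Range R, uint32_t C) {
  if (C - 1 < R.Hi)
    R.Hi = C - 1;
  return R;
}

// Narrow R using an assumed `v >u C`. Assumes C < UINT32_MAX.
Range assumeUGT(Range R, uint32_t C) {
  if (C + 1 > R.Lo)
    R.Lo = C + 1;
  return R;
}

// Try to decide `v <u C` for every v in R.
Tri decideULT(const Range &R, uint32_t C) {
  if (R.Hi < C)
    return Tri::True;
  if (R.Lo >= C)
    return Tri::False;
  return Tri::Unknown;
}

// Try to decide `v >u C` for every v in R.
Tri decideUGT(const Range &R, uint32_t C) {
  if (R.Lo > C)
    return Tri::True;
  if (R.Hi <= C)
    return Tri::False;
  return Tri::Unknown;
}
```

With `v <u 10` and `v >u 5` assumed, the range becomes [6, 9]: `v <u 10` and `v >u 5` are decided true, `v >u 9` and `v <u 6` are decided false, while `v <u 9` or `v >u 8` remain undecided.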
Differential Revision: https://reviews.llvm.org/D83257 Added: llvm/test/Transforms/SCCP/assume.ll Modified: llvm/lib/Transforms/Scalar/SCCP.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/SCCP.cpp b/llvm/lib/Transforms/Scalar/SCCP.cpp index c08007f49a63..d28846d6e6de 100644 --- a/llvm/lib/Transforms/Scalar/SCCP.cpp +++ b/llvm/lib/Transforms/Scalar/SCCP.cpp @@ -1247,16 +1247,24 @@ void SCCPSolver::handleCallResult(CallBase &CB) { return; Value *CopyOf = CB.getOperand(0); - auto *PI = getPredicateInfoFor(&CB); - auto *PBranch = dyn_cast_or_null(PI); ValueLatticeElement OriginalVal = getValueState(CopyOf); - if (!PI || !PBranch) { + auto *PI = getPredicateInfoFor(&CB); + assert(PI && "Missing predicate info for ssa.copy"); + + CmpInst *Cmp; + bool TrueEdge; + if (auto *PBranch = dyn_cast(PI)) { + Cmp = dyn_cast(PBranch->Condition); + TrueEdge = PBranch->TrueEdge; + } else if (auto *PAssume = dyn_cast(PI)) { + Cmp = dyn_cast(PAssume->Condition); + TrueEdge = true; + } else { mergeInValue(ValueState[&CB], &CB, OriginalVal); return; } // Everything below relies on the condition being a comparison. 
- auto *Cmp = dyn_cast(PBranch->Condition); if (!Cmp) { mergeInValue(ValueState[&CB], &CB, OriginalVal); return; @@ -1281,7 +1289,7 @@ void SCCPSolver::handleCallResult(CallBase &CB) { return; } - if (!PBranch->TrueEdge) + if (!TrueEdge) Pred = CmpInst::getInversePredicate(Pred); ValueLatticeElement CondVal = getValueState(CmpOp1); diff --git a/llvm/test/Transforms/SCCP/assume.ll b/llvm/test/Transforms/SCCP/assume.ll new file mode 100644 index 000000000000..764c1737c287 --- /dev/null +++ b/llvm/test/Transforms/SCCP/assume.ll @@ -0,0 +1,48 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s -ipsccp -S | FileCheck %s + +declare void @use(i1) +declare void @llvm.assume(i1) + +define void @basic(i32 %v) { +; CHECK-LABEL: @basic( +; CHECK-NEXT: [[A1:%.*]] = icmp ult i32 [[V:%.*]], 10 +; CHECK-NEXT: call void @llvm.assume(i1 [[A1]]) +; CHECK-NEXT: [[A2:%.*]] = icmp ugt i32 [[V]], 5 +; CHECK-NEXT: call void @llvm.assume(i1 [[A2]]) +; CHECK-NEXT: call void @use(i1 true) +; CHECK-NEXT: [[C2:%.*]] = icmp ult i32 [[V]], 9 +; CHECK-NEXT: call void @use(i1 [[C2]]) +; CHECK-NEXT: call void @use(i1 false) +; CHECK-NEXT: [[C4:%.*]] = icmp ugt i32 [[V]], 8 +; CHECK-NEXT: call void @use(i1 [[C4]]) +; CHECK-NEXT: call void @use(i1 true) +; CHECK-NEXT: [[C6:%.*]] = icmp ugt i32 [[V]], 6 +; CHECK-NEXT: call void @use(i1 [[C6]]) +; CHECK-NEXT: call void @use(i1 false) +; CHECK-NEXT: [[C8:%.*]] = icmp ult i32 [[V]], 7 +; CHECK-NEXT: call void @use(i1 [[C8]]) +; CHECK-NEXT: ret void +; + %a1 = icmp ult i32 %v, 10 + call void @llvm.assume(i1 %a1) + %a2 = icmp ugt i32 %v, 5 + call void @llvm.assume(i1 %a2) + %c1 = icmp ult i32 %v, 10 + call void @use(i1 %c1) + %c2 = icmp ult i32 %v, 9 + call void @use(i1 %c2) + %c3 = icmp ugt i32 %v, 9 + call void @use(i1 %c3) + %c4 = icmp ugt i32 %v, 8 + call void @use(i1 %c4) + %c5 = icmp ugt i32 %v, 5 + call void @use(i1 %c5) + %c6 = icmp ugt i32 %v, 6 + call void @use(i1 %c6) + %c7 = icmp ult i32 %v, 6 + call 
void @use(i1 %c7) + %c8 = icmp ult i32 %v, 7 + call void @use(i1 %c8) + ret void +} From llvm-commits at lists.llvm.org Tue Jul 7 11:23:05 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:23:05 +0000 (UTC) Subject: [PATCH] D83257: [SCCP] Handle assume predicates In-Reply-To: References: Message-ID: <491c9e55cb322a0c0192653dfe2e0873@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG9dfea0351797: [SCCP] Handle assume predicates (authored by nikic). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83257/new/ https://reviews.llvm.org/D83257 Files: llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/assume.ll Index: llvm/test/Transforms/SCCP/assume.ll =================================================================== --- /dev/null +++ llvm/test/Transforms/SCCP/assume.ll @@ -0,0 +1,48 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s -ipsccp -S | FileCheck %s + +declare void @use(i1) +declare void @llvm.assume(i1) + +define void @basic(i32 %v) { +; CHECK-LABEL: @basic( +; CHECK-NEXT: [[A1:%.*]] = icmp ult i32 [[V:%.*]], 10 +; CHECK-NEXT: call void @llvm.assume(i1 [[A1]]) +; CHECK-NEXT: [[A2:%.*]] = icmp ugt i32 [[V]], 5 +; CHECK-NEXT: call void @llvm.assume(i1 [[A2]]) +; CHECK-NEXT: call void @use(i1 true) +; CHECK-NEXT: [[C2:%.*]] = icmp ult i32 [[V]], 9 +; CHECK-NEXT: call void @use(i1 [[C2]]) +; CHECK-NEXT: call void @use(i1 false) +; CHECK-NEXT: [[C4:%.*]] = icmp ugt i32 [[V]], 8 +; CHECK-NEXT: call void @use(i1 [[C4]]) +; CHECK-NEXT: call void @use(i1 true) +; CHECK-NEXT: [[C6:%.*]] = icmp ugt i32 [[V]], 6 +; CHECK-NEXT: call void @use(i1 [[C6]]) +; CHECK-NEXT: call void @use(i1 false) +; CHECK-NEXT: [[C8:%.*]] = icmp ult i32 [[V]], 7 +; CHECK-NEXT: call void @use(i1 [[C8]]) +; CHECK-NEXT: ret void +; + %a1 = icmp ult i32 %v, 10 + call void 
@llvm.assume(i1 %a1) + %a2 = icmp ugt i32 %v, 5 + call void @llvm.assume(i1 %a2) + %c1 = icmp ult i32 %v, 10 + call void @use(i1 %c1) + %c2 = icmp ult i32 %v, 9 + call void @use(i1 %c2) + %c3 = icmp ugt i32 %v, 9 + call void @use(i1 %c3) + %c4 = icmp ugt i32 %v, 8 + call void @use(i1 %c4) + %c5 = icmp ugt i32 %v, 5 + call void @use(i1 %c5) + %c6 = icmp ugt i32 %v, 6 + call void @use(i1 %c6) + %c7 = icmp ult i32 %v, 6 + call void @use(i1 %c7) + %c8 = icmp ult i32 %v, 7 + call void @use(i1 %c8) + ret void +} Index: llvm/lib/Transforms/Scalar/SCCP.cpp =================================================================== --- llvm/lib/Transforms/Scalar/SCCP.cpp +++ llvm/lib/Transforms/Scalar/SCCP.cpp @@ -1247,16 +1247,24 @@ return; Value *CopyOf = CB.getOperand(0); - auto *PI = getPredicateInfoFor(&CB); - auto *PBranch = dyn_cast_or_null<PredicateBranch>(PI); ValueLatticeElement OriginalVal = getValueState(CopyOf); - if (!PI || !PBranch) { + auto *PI = getPredicateInfoFor(&CB); + assert(PI && "Missing predicate info for ssa.copy"); + + CmpInst *Cmp; + bool TrueEdge; + if (auto *PBranch = dyn_cast<PredicateBranch>(PI)) { + Cmp = dyn_cast<CmpInst>(PBranch->Condition); + TrueEdge = PBranch->TrueEdge; + } else if (auto *PAssume = dyn_cast<PredicateAssume>(PI)) { + Cmp = dyn_cast<CmpInst>(PAssume->Condition); + TrueEdge = true; + } else { mergeInValue(ValueState[&CB], &CB, OriginalVal); return; } // Everything below relies on the condition being a comparison. - auto *Cmp = dyn_cast<CmpInst>(PBranch->Condition); if (!Cmp) { mergeInValue(ValueState[&CB], &CB, OriginalVal); return; @@ -1281,7 +1289,7 @@ return; } - if (!PBranch->TrueEdge) + if (!TrueEdge) Pred = CmpInst::getInversePredicate(Pred); ValueLatticeElement CondVal = getValueState(CmpOp1); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83257.276149.patch Type: text/x-patch Size: 3254 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:23:06 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:23:06 +0000 (UTC) Subject: [PATCH] D82210: [SVE] Remove calls to VectorType::getNumElements from CodeGen In-Reply-To: References: Message-ID: <0e0bbabb4e74c5219f3d4acb2f457115@localhost.localdomain> ctetreau updated this revision to Diff 276148. ctetreau added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82210/new/ https://reviews.llvm.org/D82210 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/ExpandReductions.cpp llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/InterleavedAccessPass.cpp llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/CodeGen/ValueTypes.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82210.276148.patch Type: text/x-patch Size: 10218 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:23:41 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:23:41 +0000 (UTC) Subject: [PATCH] D83149: [gcov] Add __gcov_dump/__gcov_reset and delete __gcov_flush In-Reply-To: References: Message-ID: serge-sans-paille added inline comments. 
================ Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c:49 dlerror(); - void (*gcov_flush1)() = (void (*)())dlsym(f1_handle, "__gcov_flush"); - if (gcov_flush1 == NULL) { - fprintf(stderr, "unable to find __gcov_flush in func.shared': %s\n", dlerror()); + void (*gcov_reset1)() = (void (*)())dlsym(f1_handle, "__gcov_reset"); + if (gcov_reset1 == NULL) { ---------------- MaskRay wrote: > serge-sans-paille wrote: > > Do we also need to test gcov_flush symbol here too? > `__gcov_flush` is deleted. We don't need to test it. Sorry, I meant `__gcov_dump` ================ Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c.gcov:56 +// CHECK-NEXT: 1: 50: if (gcov_reset1 == NULL) { +// CHECK-NEXT: #####: 51: fprintf(stderr, "unable to find __gcov_reset in func.shared': %s\n", dlerror()); // CHECK-NEXT: #####: 52: return EXIT_FAILURE; ---------------- MaskRay wrote: > serge-sans-paille wrote: > > Same question here, what about gcov_flush symbol? > `__gcov_flush` is deleted. We don't need to test it. Sorry, I meant `__gcov_dump` ================ Comment at: compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main_three-libs.c.gcov:55 +// CHECK-NEXT: 1: 49: void (*gcov_reset1)() = (void (*)())dlsym(f1_handle, "__gcov_reset"); +// CHECK-NEXT: 1: 50: if (gcov_reset1 == NULL) { +// CHECK-NEXT: #####: 51: fprintf(stderr, "unable to find __gcov_reset in func.shared': %s\n", dlerror()); ---------------- MaskRay wrote: > serge-sans-paille wrote: > > And here. > `__gcov_flush` is deleted. We don't need to test it. 
Sorry, I meant `__gcov_dump` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83149/new/ https://reviews.llvm.org/D83149 From llvm-commits at lists.llvm.org Tue Jul 7 11:25:31 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:25:31 +0000 (UTC) Subject: [PATCH] D82243: [SVE] Remove calls to VectorType::getNumElements from Scalar In-Reply-To: References: Message-ID: <11df6d9587fe449ad3d7bfec68ca5c04@localhost.localdomain> ctetreau updated this revision to Diff 276150. ctetreau added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82243/new/ https://reviews.llvm.org/D82243 Files: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp llvm/lib/Transforms/Scalar/SROA.cpp llvm/lib/Transforms/Scalar/Scalarizer.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82243.276150.patch Type: text/x-patch Size: 14942 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:27:50 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:27:50 +0000 (UTC) Subject: [PATCH] D82754: [lit] Prevent hang when lit sees non-ASCII characters In-Reply-To: References: Message-ID: <252f4644f5d05c110b638bb5ba1c05bb@localhost.localdomain> jdenny added a comment. In D82754#2136412 , @richard.barton.arm wrote: > I will push an update with a new comment and my new test. Thanks! > I guess the evilness of it when/if it were to regress is ok - make sure we don't regress I suppose! I agree that the possibility of regressions causing hangs is not nice, but I think it's better for them to occur in our test suite immediately than in the wild later. 
================ Comment at: llvm/utils/lit/lit/display.py:89 pass - out = out.decode(encoding=sys.stdout.encoding) + out = out.decode(encoding=sys.stdout.encoding, errors="ignore") print(out) ---------------- jdenny wrote: > Please add a comment documenting the platform where this problem can be seen. Include the python version. If people try to simplify this subtle passage of code later, perhaps to abandon python 2 support, such comments should prove helpful. When I first read this new comment, I thought it might be repeating the info from the earlier comment but at a more specific location in the code. To make it clear this is a different issue, I'd prefer "can raise UnicodeDecodeError" -> "can raise UnicodeDecodeError here too". Don't forget the `.` on the last sentence. ================ Comment at: llvm/utils/lit/tests/shtest-shell-ascii.py:7 +# RUN: cat %t.out +# RUN: FileCheck --input-file %t.out %s +# ---------------- This test mostly copies a piece of `shtest-shell.py` but with `PYTHONIOENCODING=ascii`. In doing so, it separates the original `shtest-encoding.txt` from the new one even though they're covering related bugs. I wonder if it would be better to keep everything together and avoid duplication by just extending `shtest-shell.py` with these new `RUN:` lines. A comment could explain that `shtest-encoding.txt` is the main focus but that the other tests are covered by these new `RUN:` lines too just in case a problem crops up in them. What do you think? 
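For reference, a minimal sketch of what the `errors="ignore"` argument in the `display.py` hunk above changes. The helper name `decode_output` and the sample bytes are assumptions made up for illustration and are not lit's actual code; only `bytes.decode` and its `errors` parameter come from the Python standard library.

```python
# Bytes as a child process might emit them: valid UTF-8, but not ASCII.
raw = "caf\u00e9 ok\n".encode("utf-8")


def decode_output(data, encoding):
    """Hypothetical helper: decode captured output, silently dropping
    any bytes that the target encoding cannot represent."""
    return data.decode(encoding=encoding, errors="ignore")


# Strict decoding (the default) raises on the first non-ASCII byte,
# which is the failure mode the review is discussing.
try:
    raw.decode(encoding="ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="ignore" the undecodable bytes are dropped instead.
lenient = decode_output(raw, "ascii")
```

The trade-off is that `errors="ignore"` loses the offending characters without any indication; `errors="replace"` would keep a visible placeholder instead, if that were preferred.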
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82754/new/ https://reviews.llvm.org/D82754 From llvm-commits at lists.llvm.org Tue Jul 7 11:28:41 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:28:41 +0000 (UTC) Subject: [PATCH] D79910: [x86][seses] Add clang flag; Use lvi-cfi with seses In-Reply-To: References: Message-ID: <6611c6f397896b3308f0f941769787cb@localhost.localdomain> zbrid updated this revision to Diff 276153. zbrid added a comment. Herald added a subscriber: jfb. update seses flag Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79910/new/ https://reviews.llvm.org/D79910 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Arch/X86.cpp clang/test/Driver/x86-target-features.c llvm/lib/Target/X86/X86.td llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86Subtarget.h llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79910.276153.patch Type: text/x-patch Size: 10042 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:29:09 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:29:09 +0000 (UTC) Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions. In-Reply-To: References: Message-ID: Ayal added a comment. In D81416#2136421 , @AaronLiu wrote: > In the application we try, LV refuse to vectorize due to not profitable, but if we force LV to vectorize and it will crash. Apparently there are some obstacles. There are cases that even if LV fails, SLP could succeed. 
In that case, best understand why LV's cost model claims vectorizing the loop is not profitable, which you and SLP know it is; and ideally fix LV's cost model. A crash due to forced vectorization sounds like a bug, which best be reported and/or fixed. If cases with concrete "obstacles" are identified preventing LV from vectorizing a loop but allowing SLP to vectorize (part of) it, after LV interleaves the loop, such obstacles could potentially be used to (further) drive LV to interleave the loop. > Yes, the term small loop is a little bit of confusing. For example a loop which has a small number of instructions but has a huge loop trip count, is the loop small or big? In our example, the loop trip count is small, and also the instruction number is small. Hence the term "small loop" should be more specific; as in "vectorizer-min-trip-count" / "TinyTripCountVectorThreshold". CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81416/new/ https://reviews.llvm.org/D81416 From llvm-commits at lists.llvm.org Tue Jul 7 11:32:11 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:32:11 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: <323b94e0105cb8a87e959198e43189f6@localhost.localdomain> dantrushin updated this revision to Diff 276155. dantrushin added a comment. Add two simple MIR tests Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 Files: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp llvm/test/CodeGen/X86/statepoint-fixup-call.mir llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir llvm/test/CodeGen/X86/statepoint-vreg.mir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81647.276155.patch Type: text/x-patch Size: 31139 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:32:23 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:32:23 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <5fa370484d2c00434d77a7c43ba55372@localhost.localdomain> zequanwu marked an inline comment as done. zequanwu added inline comments. ================ Comment at: llvm/test/Other/opt-O2-pipeline.ll:289 +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ---------------- zequanwu wrote: > nikic wrote: > > hans wrote: > > > nikic wrote: > > > > Is it possible to switch this pass to use LazyBPI / LazyBFA, only fetched if PGO is actually in use? > > > > > > > > PGO functionality that most people don't use adding expensive analysis passes like PDT should be avoided. > > > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. > > > > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. > > > > It would only help if there is some way to only fetch the analysis conditionally. I believe many PGO passes use something like PSI.hasProfileSummary() or F.hasProfileData() for that. > > > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? 
> > > > Right, just disabling this by default in clang/opt would also work. > > > > For reference, the current compile-time numbers for this patch: https://llvm-compile-time-tracker.com/compare.php?from=516ff1d4baee28b1911737e47b42973567adf8ff&to=8df840660bb764b6653fcfd9ac7a72cc6adebde6&stat=instructions Not huge, but it adds up (some similar regressions have been introduced in LLVM 10). > Do you mean disabling it just for LPM or both? > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? For Clang, a better fix I think is that `Opts.CallGraphProfile` should based on both whether the integrated assembler is used and whether profile instrumentation is turned on. What do you think? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 11:35:03 2020 From: llvm-commits at lists.llvm.org (Michele Scandale via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:35:03 +0000 (UTC) Subject: [PATCH] D82659: Fix missing build dependency on omp_gen. In-Reply-To: References: Message-ID: michele.scandale added a comment. Why `omp_gen` is now a dependency of `clang-tablegen-targets` rather than being in the `LLVM_COMMON_DEPENDS` list like `clang-tablegen-targets`? Moreover I've noticed that with the recent changes where `omp_gen` has been added as a dependency in several libraries, this was done unconditionally breaking the Clang standalone build. For the same issue `intrinsics_gen` is added only if `CLANG_BUILT_STANDALONE ` is false. At this point I think that something like: # All targets below may depend on all tablegen'd files. 
get_property(CLANG_TABLEGEN_TARGETS GLOBAL PROPERTY CLANG_TABLEGEN_TARGETS) add_custom_target(clang-tablegen-targets DEPENDS ${CLANG_TABLEGEN_TARGETS}) set_target_properties(clang-tablegen-targets PROPERTIES FOLDER "Misc") list(APPEND LLVM_COMMON_DEPENDS clang-tablegen-targets) if(NOT CLANG_BUILT_STANDALONE) list(APPEND LLVM_COMMON_DEPENDS omp_gen) endif() would fix all the issues, and it would allow removing the explicit dependencies added to each clang library. Is there any issue with my reasoning? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82659/new/ https://reviews.llvm.org/D82659 From llvm-commits at lists.llvm.org Tue Jul 7 11:35:44 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:35:44 +0000 (UTC) Subject: [PATCH] D83329: [PGO][PGSO] Add profile guided size optimization to loop vectorization legality. Message-ID: yamauchi created this revision. yamauchi added a reviewer: davidxl. Herald added a subscriber: hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83329 Files: llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/optsize.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83329.276156.patch Type: text/x-patch Size: 9870 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:35:45 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:35:45 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks In-Reply-To: References: Message-ID: <373ab9670ded72dccca8c90928a232c4@localhost.localdomain> nikic added inline comments. 
================ Comment at: llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp:2518 + if (auto *PN = foldSelectToPhiImpl(Sel, I->getParent(), DT, Builder)) + return PN; + ---------------- It seems quite likely that some of the parents (or all of them) are going to be the same. Might it make sense to deduplicate?
```
// Collect likely candidates for placing the phi node.
SmallPtrSet<BasicBlock *, 4> CandidateBlocks;
CandidateBlocks.insert(Sel.getParent());
for (Value *V : Sel.operands())
  if (auto *I = dyn_cast<Instruction>(V))
    CandidateBlocks.insert(I->getParent());

for (BasicBlock *BB : CandidateBlocks)
  if (auto *PN = foldSelectToPhiImpl(Sel, BB, DT, Builder))
    return PN;
```
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83284/new/ https://reviews.llvm.org/D83284 From llvm-commits at lists.llvm.org Tue Jul 7 11:36:19 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:36:19 +0000 (UTC) Subject: [PATCH] D83330: [PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. Message-ID: yamauchi created this revision. yamauchi added a reviewer: davidxl. Herald added subscribers: nikic, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83330 Files: llvm/lib/Target/X86/X86FixupLEAs.cpp llvm/test/CodeGen/X86/fixup-lea.ll llvm/test/CodeGen/X86/opt-pipeline.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83330.276158.patch Type: text/x-patch Size: 5547 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:36:30 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:36:30 +0000 (UTC) Subject: [PATCH] D83152: llvm-nm: add flag to suppress no symbols warning In-Reply-To: References: Message-ID: rupprecht added a comment. 
In D83152#2135334 , @jhenderson wrote: > In D83152#2134405 , @keith wrote: > > > In D83152#2133950 , @rupprecht wrote: > > > > > In D83152#2133855 , @MaskRay wrote: > > > > > > > I cannot find any search result about `no-warning-for-no-symbols`. Is `-no-warning-for-no-symbols` really an existing option? libtool is an `ar` like tool. > > > > > > > > > I found it by looking for underscores instead of hyphens: `-no_warning_for_no_symbols`. > > > However, the flag is an ar/ranlib/libtool flag, not nm, AFAICT. > > > > > > Yea sorry I should have been more clear, it's not the _exact_ same spelling because of the conventions used in nm with `-` instead of `_`. > > > > >> Second, I wonder how you are going to plug `-no-warning-for-no-symbols` into a build system. If you only parse stdout, you can ignore stderr. Even if you do, you can probably use `grep -v '^no symbols'`. This will have better portability (supported on older nm, supported on other binary formats). > > > > > > I agree this is likely the simpler option (just add `2> /dev/null` to the build script using `nm`) > > > > If folks feel strongly about this that would definitely work, this felt like a safer way to silence this for the future for me, but if you all think it's not worth adding an option for that's fine. > > > I don't have strong opinions either way. I think there probably is some benefit to adding the new option: one counter-point to redirecting stderr to `/dev/null` is that will hide any real error messages llvm-nm (e.g. caused by a missing input file). I don't think that's a good idea personally. The return code would be 1 for a missing input file, which should fail whatever build step/etc. is executing llvm-nm. If you're ignoring that return code, you have other problems :) That said, I'm on the fence here -- it doesn't seem all that complex to add it, but it also doesn't seem worth the extra (minor) complexity. Maybe I don't understand the use case well enough. 
Is there a build step executing llvm-nm that chokes on this? Can you say more about the use case? > That being said, if stderr is being redirected somewhere other than `/dev/null` maybe it's okay. Another issue is that because stderr and stdout are often handled independently, but end up in the same final output, you can end up with the "no symbols" message appearing quite some way from where it really belongs, which could be slightly confusing. However, it's not an issue I actually have at the moment. > > Another possible benefit (again not something I personally have worried about, but one I thought of) is that users could specify the option to avoid having to special case parsing scripts to handle no symbols. A naive script would typically just iterate through the lines and expect each one to correspond to a symbol, so if there are none, the parser might break. Since "no symbols" is printed to stderr, not stdout, you would have to go out of your way to have such a script need to handle "no symbols" as input by joining stdout and stderr. Normal pipelining (`$ llvm-nm foo.o | script`) would drop stderr. > Another option name could just be `--quiet`, although there's a small risk that could clash in the future with a GNU option. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83152/new/ https://reviews.llvm.org/D83152 From llvm-commits at lists.llvm.org Tue Jul 7 11:36:44 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:36:44 +0000 (UTC) Subject: [PATCH] D83331: [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. Message-ID: yamauchi created this revision. yamauchi added a reviewer: davidxl. Herald added a subscriber: hiraditya. Herald added a project: LLVM. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83331 Files: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/test/CodeGen/X86/popcnt.ll llvm/test/CodeGen/X86/pr27202.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83331.276159.patch Type: text/x-patch Size: 19323 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:37:15 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:37:15 +0000 (UTC) Subject: [PATCH] D83332: [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering. Message-ID: yamauchi created this revision. yamauchi added a reviewer: davidxl. Herald added a subscriber: hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83332 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/phaddsub-extract.ll llvm/test/CodeGen/X86/splat-for-size.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83332.276160.patch Type: text/x-patch Size: 4205 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:37:42 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:37:42 +0000 (UTC) Subject: [PATCH] D83333: [PGO][PGSO] Add profile guided size optimization to LegalizeDAG. Message-ID: yamauchi created this revision. yamauchi added a reviewer: davidxl. Herald added a subscriber: hiraditya. Herald added a project: LLVM. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83333 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/test/CodeGen/AArch64/arm64-fp-imm-size.ll Index: llvm/test/CodeGen/AArch64/arm64-fp-imm-size.ll =================================================================== --- llvm/test/CodeGen/AArch64/arm64-fp-imm-size.ll +++ llvm/test/CodeGen/AArch64/arm64-fp-imm-size.ll @@ -38,3 +38,38 @@ ; CHECK-NEXT: ret ret fp128 0xL00000000000000000000000000000000 } + +; CHECK: literal8 +; CHECK: .quad 0x0000001fffffffd +define double @foo2_pgso() !prof !14 { +; CHECK: _foo2_pgso: +; CHECK: adrp x[[REG:[0-9]+]], lCPI4_0@PAGE +; CHECK: ldr d0, [x[[REG]], lCPI4_0@PAGEOFF] +; CHECK-NEXT: ret + ret double 0x1FFFFFFFd1 +} + +define float @bar_pgso() !prof !14 { +; CHECK: _bar_pgso: +; CHECK: adrp x[[REG:[0-9]+]], lCPI5_0@PAGE +; CHECK: ldr s0, [x[[REG]], lCPI5_0@PAGEOFF] +; CHECK-NEXT: ret + ret float 0x400921FB80000000 +} + +!llvm.module.flags = !{!0} +!0 = !{i32 1, !"ProfileSummary", !1} +!1 = !{!2, !3, !4, !5, !6, !7, !8, !9} +!2 = !{!"ProfileFormat", !"InstrProf"} +!3 = !{!"TotalCount", i64 10000} +!4 = !{!"MaxCount", i64 10} +!5 = !{!"MaxInternalCount", i64 1} +!6 = !{!"MaxFunctionCount", i64 1000} +!7 = !{!"NumCounts", i64 3} +!8 = !{!"NumFunctions", i64 3} +!9 = !{!"DetailedSummary", !10} +!10 = !{!11, !12, !13} +!11 = !{i32 10000, i64 100, i32 1} +!12 = !{i32 999000, i64 100, i32 1} +!13 = !{i32 999999, i64 1, i32 2} +!14 = !{!"function_entry_count", i64 0} Index: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp @@ -3310,7 +3310,7 @@ // Check to see if this FP immediate is already legal. // If this is a legal constant, turn it into a TargetConstantFP node. 
if (!TLI.isFPImmLegal(CFP->getValueAPF(), Node->getValueType(0), - DAG.getMachineFunction().getFunction().hasOptSize())) + DAG.shouldOptForSize())) Results.push_back(ExpandConstantFP(CFP, true)); break; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83333.276161.patch Type: text/x-patch Size: 1987 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:41:03 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:41:03 +0000 (UTC) Subject: [PATCH] D83260: [PGO][PGSO] Add profile guided size optimizations to some new sites. In-Reply-To: References: Message-ID: yamauchi added a comment. In D83260#2135806 , @fhahn wrote: > I think it would be preferable to have separate patches per pass. Done. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83260/new/ https://reviews.llvm.org/D83260 From llvm-commits at lists.llvm.org Tue Jul 7 11:41:10 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:41:10 +0000 (UTC) Subject: [PATCH] D83330: [PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. In-Reply-To: References: Message-ID: nikic added inline comments. ================ Comment at: llvm/test/CodeGen/X86/opt-pipeline.ll:187 ; CHECK-NEXT: X86 Atom pad short functions +; CHECK-NEXT: Lazy Machine Block Frequency Analysis ; CHECK-NEXT: X86 LEA Fixup ---------------- Side note: You might want to mark LazyMBFI as preserved in X86PadShortFunction, I doubt that pass changes anything related to block frequency. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83330/new/ https://reviews.llvm.org/D83330 From llvm-commits at lists.llvm.org Tue Jul 7 11:47:28 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:47:28 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: <84b555ae7ee4672065fde08aed818718@localhost.localdomain> stefanp updated this revision to Diff 276164. stefanp marked an inline comment as done. stefanp added a comment. Fixed some nits in comments. Added a fatal error and a test case to cover the situation where the offset for the branch is more than 26 bits. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Thunks.cpp lld/test/ELF/ppc64-error-toc-local-call.s lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s lld/test/ELF/ppc64-toc-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82950.276164.patch Type: text/x-patch Size: 7143 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:48:07 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:48:07 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: <8de519c9ff50f37f42c4d47633039eb0@localhost.localdomain> stefanp added inline comments. ================ Comment at: lld/ELF/Thunks.cpp:842 + write32(buf + 0, 0xf8410018); // std r2,24(r1) + write32(buf + 4, 0x48000000 | (offset & 0x03fffffc)); // b +} ---------------- sfertile wrote: > What happens if offset doesn't fit within 26 bits? Good point. 
Right now we just drop the extra bits from the offset and jump to an incorrect offset. I'll add an error to fail at this point if the offset does not fit in the 26 bits. ================ Comment at: lld/ELF/Thunks.cpp:846 +void PPC64R2SaveStub::addSymbols(ThunkSection &isec) { + Defined *s = addSymbol(saver.save("__long_branch_" + destination.getName()), + STT_FUNC, 0, isec); ---------------- sfertile wrote: > This is being named the same as a branch extending thunk (i.e. a trampoline for when the call is too far to represent with a single call instruction). The name we create should represent the thunk type, it makes reading disassembly much easier. How about "__toc_save_` instead? The reason I had used this name was because GCC used `long_branch.callee` as the name as well so I figured I would do the same thing. However, I do like the idea of calling it `__toc_save` better so I'm going to use that instead. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 From llvm-commits at lists.llvm.org Tue Jul 7 11:48:48 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:48:48 +0000 (UTC) Subject: [PATCH] D81378: GlobalISel: Handle more cases in getGCDType In-Reply-To: References: Message-ID: arsenm added a comment.
ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81378/new/ https://reviews.llvm.org/D81378 From llvm-commits at lists.llvm.org Tue Jul 7 11:52:42 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via llvm-commits) Date: Tue, 07 Jul 2020 11:52:42 -0700 (PDT) Subject: [llvm] 7c03872 - LIS: fix handleMove to properly extend main range Message-ID: <5f04c47a.1c69fb81.2e511.7e02@mx.google.com> Author: Stanislav Mekhanoshin Date: 2020-07-07T11:52:32-07:00 New Revision: 7c038726453b76d6f40590b22304c43ffa05aaf1 URL: https://github.com/llvm/llvm-project/commit/7c038726453b76d6f40590b22304c43ffa05aaf1 DIFF: https://github.com/llvm/llvm-project/commit/7c038726453b76d6f40590b22304c43ffa05aaf1.diff LOG: LIS: fix handleMove to properly extend main range handleMoveDown or handleMoveUp cannot properly repair a main range of a LiveInterval since they only get LiveRange. There is a problem if certain use has moved few segments away and there is a hole in the main range in between of these two locations. We may get a SubRange with a very extended Segment spanning several Segments of the main range and also spanning that hole. If that happens then we end up with the main range not covering its SubRange which is an error. It might be possible to attempt fixing the main range in place just between of the old and new index by extending all of its Segments in between, but it is unclear this logic will be faster than just straight constructMainRangeFromSubranges, which itself is pretty cheap since it only contains interval logic. That will also require shrinkToUses() call after which is probably even more expensive. In the test second move is from 64B to 92B for the sub1. 
Subrange is correctly fixed: L000000000000000C [16r,32B:0)[32B,92r:1) 0 at 16r 1 at 32B-phi But the main range has a hole in between 80d and 88r after updateRange(): %1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[88r,96r:1)[96r,160B:2) Since source position is 64B this segment is not even considered by the updateRange(). Differential Revision: https://reviews.llvm.org/D82916 Added: Modified: llvm/lib/CodeGen/LiveIntervals.cpp llvm/unittests/MI/LiveIntervalTest.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/LiveIntervals.cpp b/llvm/lib/CodeGen/LiveIntervals.cpp index 2bbe036e8425..e8ee0599e1a2 100644 --- a/llvm/lib/CodeGen/LiveIntervals.cpp +++ b/llvm/lib/CodeGen/LiveIntervals.cpp @@ -1011,6 +1011,20 @@ class LiveIntervals::HMEditor { } } updateRange(LI, Reg, LaneBitmask::getNone()); + // If main range has a hole and we are moving a subrange use across + // the hole updateRange() cannot properly handle it since it only + // gets the LiveRange and not the whole LiveInterval. As a result + // we may end up with a main range not covering all subranges. + // This is extremely rare case, so let's check and reconstruct the + // main range. 
+ for (LiveInterval::SubRange &S : LI.subranges()) { + if (LI.covers(S)) + continue; + LI.clear(); + LIS.constructMainRangeFromSubranges(LI); + break; + } + continue; } diff --git a/llvm/unittests/MI/LiveIntervalTest.cpp b/llvm/unittests/MI/LiveIntervalTest.cpp index 5c974ea7461e..3971d86e82d3 100644 --- a/llvm/unittests/MI/LiveIntervalTest.cpp +++ b/llvm/unittests/MI/LiveIntervalTest.cpp @@ -499,6 +499,26 @@ TEST(LiveIntervalTest, TestMoveSubRegDefAcrossUseDefMulti) { }); } +TEST(LiveIntervalTest, TestMoveSubRegUseAcrossMainRangeHole) { + liveIntervalTest(R"MIR( + %1:sgpr_128 = IMPLICIT_DEF + bb.1: + %2:sgpr_32 = COPY %1.sub2 + %3:sgpr_32 = COPY %1.sub1 + %1.sub2 = COPY %2 + undef %1.sub0 = IMPLICIT_DEF + %1.sub2 = IMPLICIT_DEF + S_CBRANCH_SCC1 %bb.1, implicit undef $scc + S_BRANCH %bb.2 + bb.2: +)MIR", [](MachineFunction &MF, LiveIntervals &LIS) { + MachineInstr &MI = getMI(MF, 3, /*BlockNum=*/1); + MI.getOperand(0).setIsUndef(false); + testHandleMove(MF, LIS, 4, 3, 1); + testHandleMove(MF, LIS, 1, 4, 1); + }); +} + TEST(LiveIntervalTest, BundleUse) { liveIntervalTest(R"MIR( %0 = IMPLICIT_DEF From llvm-commits at lists.llvm.org Tue Jul 7 11:53:34 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:53:34 +0000 (UTC) Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores In-Reply-To: References: Message-ID: <2c4873815fdef58c59376c6900c97fed@localhost.localdomain> efriedma added inline comments. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7135 if (IsCompressedMemory) { + assert(!DataVT.isScalableVector() && + "Cannot currently handle compressed memory with scalable vectors"); ---------------- kmclaughlin wrote: > david-arm wrote: > > Do we know if this is something we catch earlier and hence should never get here? 
I just wonder if here it's not really an assert that something went wrong with the code, but perhaps we just hit a case we don't support yet? If it's just because we don't support it yet, instead of asserting we could do: > > > > if (DataVT.isScalableVector()) > > report_fatal_error("Cannot currently handle compressed memory with scalable vectors"); > I think this is something that we just don't support yet, so I've changed this to `report_fatal_error` as suggested This is part of the support for llvm.masked.expandload/llvm.masked.compressstore. There isn't a native instruction for that in SVE, but it's still a reasonable operation with scalable vectors. ================ Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:1096 + // Extract lo/hi halves of legal predicate types. + def : Pat<(nxv2i1 (extract_subvector (nxv4i1 PPR:$Ps), (i64 0))), ---------------- Do we need to support extracting, for example, an nxv2i1 from an nxv16i1? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83137/new/ https://reviews.llvm.org/D83137 From llvm-commits at lists.llvm.org Tue Jul 7 11:58:13 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:58:13 +0000 (UTC) Subject: [PATCH] D82552: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand In-Reply-To: References: Message-ID: <27408a821ede32babfe30ba4af89161f@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82552/new/ https://reviews.llvm.org/D82552 From llvm-commits at lists.llvm.org Tue Jul 7 11:58:40 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:58:40 +0000 (UTC) Subject: [PATCH] D82659: Fix missing build dependency on omp_gen. 
In-Reply-To: References: Message-ID: <85315bf7f239eb177058e0c1bfe66ee2@localhost.localdomain> clementval added a comment. In D82659#2136909, @michele.scandale wrote: > Why `omp_gen` is now a dependency of `clang-tablegen-targets` rather than being in the `LLVM_COMMON_DEPENDS` list like `clang-tablegen-targets`? > > Moreover I've noticed that with the recent changes where `omp_gen` has been added as a dependency in several libraries, this was done unconditionally breaking the Clang standalone build. > For the same issue `intrinsics_gen` is added only if `CLANG_BUILT_STANDALONE` is false. > > At this point I think that something like: > > # All targets below may depend on all tablegen'd files. > get_property(CLANG_TABLEGEN_TARGETS GLOBAL PROPERTY CLANG_TABLEGEN_TARGETS) > add_custom_target(clang-tablegen-targets DEPENDS ${CLANG_TABLEGEN_TARGETS}) > set_target_properties(clang-tablegen-targets PROPERTIES FOLDER "Misc") > list(APPEND LLVM_COMMON_DEPENDS clang-tablegen-targets) > if(NOT CLANG_BUILT_STANDALONE) > list(APPEND LLVM_COMMON_DEPENDS omp_gen) > endif() > > > would fix all the issues, and it would allow removing the explicit dependencies added to each clang library. > > Is there any issue with my reasoning? Looks good but just one question ... When clang is built as standalone it does not build the OpenMP part inside Clang? I haven't seen any code to avoid compiling the OpenMP parsing and semantic checking inside clang.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82659/new/ https://reviews.llvm.org/D82659 From llvm-commits at lists.llvm.org Tue Jul 7 11:59:18 2020 From: llvm-commits at lists.llvm.org (=?UTF-8?B?TWljaGHFgiBHw7Nybnk=?= via llvm-commits) Date: Tue, 07 Jul 2020 11:59:18 -0700 (PDT) Subject: [llvm] 446e3df - [llvm] [docs] Do not require recommonmark for manpage build Message-ID: <5f04c606.1c69fb81.85f57.7d57@mx.google.com> Author: Michał Górny Date: 2020-07-07T20:59:02+02:00 New Revision: 446e3df25483312c8a7dfb3c53eef0de0e13074a URL: https://github.com/llvm/llvm-project/commit/446e3df25483312c8a7dfb3c53eef0de0e13074a DIFF: https://github.com/llvm/llvm-project/commit/446e3df25483312c8a7dfb3c53eef0de0e13074a.diff LOG: [llvm] [docs] Do not require recommonmark for manpage build Do not enforce recommonmark dependency if sphinx is called to build manpages. In order to do this, try to import recommonmark first and do not configure it if it's not available. Additionally, declare a custom tags for the selected builder via CMake, and ignore recommonmark import failure when 'man' target is used. This will permit us to avoid the problematic recommonmark dependency for the majority of Gentoo users that do not need to locally build the complete documentation but want to have tool manpages. Differential Revision: https://reviews.llvm.org/D83161 Added: Modified: llvm/cmake/modules/AddSphinxTarget.cmake llvm/docs/conf.py Removed: ################################################################################ diff --git a/llvm/cmake/modules/AddSphinxTarget.cmake b/llvm/cmake/modules/AddSphinxTarget.cmake index f053d8084da4..b5babb30abcf 100644 --- a/llvm/cmake/modules/AddSphinxTarget.cmake +++ b/llvm/cmake/modules/AddSphinxTarget.cmake @@ -38,6 +38,7 @@ function (add_sphinx_target builder project) -b ${builder} -d "${SPHINX_DOC_TREE_DIR}" -q # Quiet: no output other than errors and warnings. 
+ -t builder-${builder} # tag for builder ${SPHINX_WARNINGS_AS_ERRORS_FLAG} # Treat warnings as errors if requested "${ARG_SOURCE_DIR}" # Source "${SPHINX_BUILD_DIR}" # Output diff --git a/llvm/docs/conf.py b/llvm/docs/conf.py index 948c61c2c0f9..aed5e06b6f50 100644 --- a/llvm/docs/conf.py +++ b/llvm/docs/conf.py @@ -28,22 +28,29 @@ # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.intersphinx', 'sphinx.ext.todo'] -import sphinx -if sphinx.version_info >= (3, 0): - # This requires 0.5 or later. - extensions.append('recommonmark') -else: - source_parsers = {'.md': 'recommonmark.parser.CommonMarkParser'} - # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = { '.rst': 'restructuredtext', - '.md': 'markdown', } +try: + import recommonmark +except ImportError: + # manpages do not use any .md sources + if not tags.has('builder-man'): + raise +else: + import sphinx + if sphinx.version_info >= (3, 0): + # This requires 0.5 or later. + extensions.append('recommonmark') + else: + source_parsers = {'.md': 'recommonmark.parser.CommonMarkParser'} + source_suffix['.md'] = 'markdown' + # The encoding of source files. #source_encoding = 'utf-8-sig' From llvm-commits at lists.llvm.org Tue Jul 7 11:59:22 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Micha=C5=82_G=C3=B3rny_via_Phabricator?= via llvm-commits) Date: Tue, 07 Jul 2020 18:59:22 +0000 (UTC) Subject: [PATCH] D83161: [llvm] [docs] Do not require recommonmark for manpage build In-Reply-To: References: Message-ID: <77f2d637b9b4111bacc89fc32433ae72@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG446e3df25483: [llvm] [docs] Do not require recommonmark for manpage build (authored by mgorny). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83161/new/ https://reviews.llvm.org/D83161 Files: llvm/cmake/modules/AddSphinxTarget.cmake llvm/docs/conf.py Index: llvm/docs/conf.py =================================================================== --- llvm/docs/conf.py +++ llvm/docs/conf.py @@ -28,22 +28,29 @@ # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.intersphinx', 'sphinx.ext.todo'] -import sphinx -if sphinx.version_info >= (3, 0): - # This requires 0.5 or later. - extensions.append('recommonmark') -else: - source_parsers = {'.md': 'recommonmark.parser.CommonMarkParser'} - # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = { '.rst': 'restructuredtext', - '.md': 'markdown', } +try: + import recommonmark +except ImportError: + # manpages do not use any .md sources + if not tags.has('builder-man'): + raise +else: + import sphinx + if sphinx.version_info >= (3, 0): + # This requires 0.5 or later. + extensions.append('recommonmark') + else: + source_parsers = {'.md': 'recommonmark.parser.CommonMarkParser'} + source_suffix['.md'] = 'markdown' + # The encoding of source files. #source_encoding = 'utf-8-sig' Index: llvm/cmake/modules/AddSphinxTarget.cmake =================================================================== --- llvm/cmake/modules/AddSphinxTarget.cmake +++ llvm/cmake/modules/AddSphinxTarget.cmake @@ -38,6 +38,7 @@ -b ${builder} -d "${SPHINX_DOC_TREE_DIR}" -q # Quiet: no output other than errors and warnings. + -t builder-${builder} # tag for builder ${SPHINX_WARNINGS_AS_ERRORS_FLAG} # Treat warnings as errors if requested "${ARG_SOURCE_DIR}" # Source "${SPHINX_BUILD_DIR}" # Output -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83161.276167.patch Type: text/x-patch Size: 1861 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:59:25 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:59:25 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: efriedma accepted this revision. efriedma added a comment. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 From llvm-commits at lists.llvm.org Tue Jul 7 12:03:21 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:03:21 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: <7dcc912a41a38fc86b178f5b78e2c551@localhost.localdomain> efriedma added a comment. Actually, can you add similar tests that don't involve zero constants? I'm afraid at some point, we'll start folding them to zeroinitializer and end up on a different codepath. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 From llvm-commits at lists.llvm.org Tue Jul 7 12:03:28 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:03:28 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. In-Reply-To: References: Message-ID: <3f3101d99cb546ba5e6d0ec438aa56d1@localhost.localdomain> cameron.mcinally marked an inline comment as done. cameron.mcinally added inline comments. 
================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1429 + PtrInfo.getWithOffset(Offset), MemVT)); + else Stores.push_back(DAG.getStore(DAG.getEntryNode(), dl, Node->getOperand(i), ---------------- cameron.mcinally wrote: > Nit: Not specific to this patch, but I think we can hoist `Node->getOperand(i).getValueType()` out of the loop. All the BUILD_VECTOR/CONCAT_VECTOR operand types should be the same. > > Looking deeper, the BUILD_VECTOR description is a little vague though: > > ``` > /// The types of the operands must all be > /// the same and must match the vector element type, except that integer types > /// are allowed to be larger than the element type, in which case the operands > /// are implicitly truncated. > ``` > > I assume the larger integer operand types must all be the same type. Maybe I'm misinterpreting this though. > > Just queried llvm-dev about BUILD_VECTOR and will report back... > It seems that the operands must always have the same type, but there's a modicum of disagreement (uncertainty?) there. No reason to hold up this patch though. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83303/new/ https://reviews.llvm.org/D83303 From llvm-commits at lists.llvm.org Tue Jul 7 12:06:48 2020 From: llvm-commits at lists.llvm.org (Sid Manning via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:06:48 +0000 (UTC) Subject: [PATCH] D82263: [Hexagon] Cleanup compiler-rt.builtins remove code that belongs in the c-library In-Reply-To: References: Message-ID: <83380d603451c0da138a5e7fb26f85cb@localhost.localdomain> sidneym updated this revision to Diff 276169. sidneym added a comment. 
Rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82263/new/ https://reviews.llvm.org/D82263 Files: compiler-rt/lib/builtins/CMakeLists.txt compiler-rt/lib/builtins/hexagon/dffma.S compiler-rt/lib/builtins/hexagon/fabs_opt.S compiler-rt/lib/builtins/hexagon/fma_opt.S compiler-rt/lib/builtins/hexagon/fmax_opt.S compiler-rt/lib/builtins/hexagon/fmin_opt.S -------------- next part -------------- A non-text attachment was scrubbed... Name: D82263.276169.patch Type: text/x-patch Size: 4939 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:09:22 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:09:22 +0000 (UTC) Subject: [PATCH] D77808: [SCCP] Use conditional info with AND/OR branch conditions. In-Reply-To: References: Message-ID: <067b5525787b03d09bd05144cb06c0ba@localhost.localdomain> fhahn updated this revision to Diff 276170. fhahn added a comment. Simplify code as suggested, remove dedicated `CopyOf != OriginalVal` check. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77808/new/ https://reviews.llvm.org/D77808 Files: llvm/lib/Transforms/IPO/SCCP.cpp llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/conditions-ranges.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D77808.276170.patch Type: text/x-patch Size: 6955 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:10:43 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:10:43 +0000 (UTC) Subject: [PATCH] D83056: [NFC] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities In-Reply-To: References: Message-ID: <6a34ce0417718bfbc9b57ccb9a0e83e8@localhost.localdomain> Meinersbur added inline comments. 
================ Comment at: llvm/lib/Transforms/Utils/LoopPeel.cpp:50 -#define DEBUG_TYPE "loop-unroll" +#define DEBUG_TYPE "loop-peel" ---------------- fhahn wrote: > I am not sure about this change. Currently peeling is integrated in loop-unroll and remarks/debug can be filtered by loop-unroll, but now we will generate remarks for `loop-unroll` and `loop-peel` when running `-loop-unroll`. Isn't it actually better since you can now filter `-debug-only=loop-unroll`, respectively `-debug-only=loop-peel` depending on what you want to look at? Note: `-Rpass=` remarks use the pass name, not `DEBUG_TYPE`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83056/new/ https://reviews.llvm.org/D83056 From llvm-commits at lists.llvm.org Tue Jul 7 12:11:32 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:11:32 +0000 (UTC) Subject: [PATCH] D83179: [SCCP] Use range metadata for loads and calls In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG8691544a2767: [SCCP] Use range metadata for loads and calls (authored by nikic). Changed prior to commit: https://reviews.llvm.org/D83179?vs=275564&id=276171#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83179/new/ https://reviews.llvm.org/D83179 Files: llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/metadata.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83179.276171.patch Type: text/x-patch Size: 7418 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:11:32 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Tue, 07 Jul 2020 12:11:32 -0700 (PDT) Subject: [llvm] 8691544 - [SCCP] Use range metadata for loads and calls Message-ID: <5f04c8e4.1c69fb81.7786e.aeb0@mx.google.com> Author: Nikita Popov Date: 2020-07-07T21:09:21+02:00 New Revision: 8691544a276744474ff04b71d7e220069435c7fe URL: https://github.com/llvm/llvm-project/commit/8691544a276744474ff04b71d7e220069435c7fe DIFF: https://github.com/llvm/llvm-project/commit/8691544a276744474ff04b71d7e220069435c7fe.diff LOG: [SCCP] Use range metadata for loads and calls When all else fails, use range metadata to constrain the result of loads and calls. It should also be possible to use !nonnull, but that would require some general support for inequalities in SCCP first. Differential Revision: https://reviews.llvm.org/D83179 Added: Modified: llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/metadata.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/SCCP.cpp b/llvm/lib/Transforms/Scalar/SCCP.cpp index d28846d6e6de..8ba118291206 100644 --- a/llvm/lib/Transforms/Scalar/SCCP.cpp +++ b/llvm/lib/Transforms/Scalar/SCCP.cpp @@ -1104,11 +1104,21 @@ void SCCPSolver::visitStoreInst(StoreInst &SI) { TrackedGlobals.erase(I); // No need to keep tracking this! } +static ValueLatticeElement getValueFromMetadata(const Instruction *I) { + if (MDNode *Ranges = I->getMetadata(LLVMContext::MD_range)) + if (I->getType()->isIntegerTy()) + return ValueLatticeElement::getRange( + getConstantRangeFromMetadata(*Ranges)); + // TODO: Also handle MD_nonnull. + return ValueLatticeElement::getOverdefined(); +} + // Handle load instructions. 
If the operand is a constant pointer to a constant // global, we can replace the load with the loaded constant value! void SCCPSolver::visitLoadInst(LoadInst &I) { - // If this load is of a struct, just mark the result overdefined. - if (I.getType()->isStructTy()) + // If this load is of a struct or the load is volatile, just mark the result + // as overdefined. + if (I.getType()->isStructTy() || I.isVolatile()) return (void)markOverdefined(&I); // ResolvedUndefsIn might mark I as overdefined. Bail out, even if we would @@ -1122,41 +1132,39 @@ void SCCPSolver::visitLoadInst(LoadInst &I) { ValueLatticeElement &IV = ValueState[&I]; - if (!isConstant(PtrVal) || I.isVolatile()) - return (void)markOverdefined(IV, &I); - - Constant *Ptr = getConstant(PtrVal); - - // load null is undefined. - if (isa<ConstantPointerNull>(Ptr)) { - if (NullPointerIsDefined(I.getFunction(), I.getPointerAddressSpace())) - return (void)markOverdefined(IV, &I); - else - return; - } + if (isConstant(PtrVal)) { + Constant *Ptr = getConstant(PtrVal); - // Transform load (constant global) into the value loaded. - if (auto *GV = dyn_cast<GlobalVariable>(Ptr)) { - if (!TrackedGlobals.empty()) { - // If we are tracking this global, merge in the known value for it. - auto It = TrackedGlobals.find(GV); - if (It != TrackedGlobals.end()) { - mergeInValue(IV, &I, It->second, getMaxWidenStepsOpts()); + // load null is undefined. + if (isa<ConstantPointerNull>(Ptr)) { + if (NullPointerIsDefined(I.getFunction(), I.getPointerAddressSpace())) + return (void)markOverdefined(IV, &I); + else return; + } + + // Transform load (constant global) into the value loaded. + if (auto *GV = dyn_cast<GlobalVariable>(Ptr)) { + if (!TrackedGlobals.empty()) { + // If we are tracking this global, merge in the known value for it. + auto It = TrackedGlobals.find(GV); + if (It != TrackedGlobals.end()) { + mergeInValue(IV, &I, It->second, getMaxWidenStepsOpts()); + return; + } } } - } - // Transform load from a constant into a constant if possible.
- if (Constant *C = ConstantFoldLoadFromConstPtr(Ptr, I.getType(), DL)) { - if (isa<UndefValue>(C)) - return; - return (void)markConstant(IV, &I, C); + // Transform load from a constant into a constant if possible. + if (Constant *C = ConstantFoldLoadFromConstPtr(Ptr, I.getType(), DL)) { + if (isa<UndefValue>(C)) + return; + return (void)markConstant(IV, &I, C); + } } - // Otherwise we cannot say for certain what value this load will produce. - // Bail out. - markOverdefined(IV, &I); + // Fall back to metadata. + mergeInValue(&I, getValueFromMetadata(&I)); } void SCCPSolver::visitCallBase(CallBase &CB) { @@ -1171,10 +1179,13 @@ void SCCPSolver::handleCallOverdefined(CallBase &CB) { if (CB.getType()->isVoidTy()) return; + // Always mark struct return as overdefined. + if (CB.getType()->isStructTy()) + return (void)markOverdefined(&CB); + // Otherwise, if we have a single return value case, and if the function is // a declaration, maybe we can constant fold it. - if (F && F->isDeclaration() && !CB.getType()->isStructTy() && - canConstantFoldCallTo(&CB, F)) { + if (F && F->isDeclaration() && canConstantFoldCallTo(&CB, F)) { SmallVector Operands; for (auto AI = CB.arg_begin(), E = CB.arg_end(); AI != E; ++AI) { if (AI->get()->getType()->isStructTy()) @@ -1202,8 +1213,8 @@ void SCCPSolver::handleCallOverdefined(CallBase &CB) { } } - // Otherwise, we don't know anything about this call, mark it overdefined. - return (void)markOverdefined(&CB); + // Fall back to metadata.
+ mergeInValue(&CB, getValueFromMetadata(&CB)); } void SCCPSolver::handleCallArguments(CallBase &CB) { diff --git a/llvm/test/Transforms/SCCP/metadata.ll b/llvm/test/Transforms/SCCP/metadata.ll index afc66df58676..43e4c59571e9 100644 --- a/llvm/test/Transforms/SCCP/metadata.ll +++ b/llvm/test/Transforms/SCCP/metadata.ll @@ -7,12 +7,10 @@ declare i32 @get_i32() define void @load_range(i32* %p) { ; CHECK-LABEL: @load_range( ; CHECK-NEXT: [[V:%.*]] = load i32, i32* [[P:%.*]], align 4, !range !0 -; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[V]], 10 -; CHECK-NEXT: call void @use(i1 [[C1]]) +; CHECK-NEXT: call void @use(i1 true) ; CHECK-NEXT: [[C2:%.*]] = icmp ult i32 [[V]], 9 ; CHECK-NEXT: call void @use(i1 [[C2]]) -; CHECK-NEXT: [[C3:%.*]] = icmp ugt i32 [[V]], 9 -; CHECK-NEXT: call void @use(i1 [[C3]]) +; CHECK-NEXT: call void @use(i1 false) ; CHECK-NEXT: [[C4:%.*]] = icmp ugt i32 [[V]], 8 ; CHECK-NEXT: call void @use(i1 [[C4]]) ; CHECK-NEXT: ret void @@ -29,9 +27,26 @@ define void @load_range(i32* %p) { ret void } +define i32 @load_range_single(i32* %p) { +; CHECK-LABEL: @load_range_single( +; CHECK-NEXT: ret i32 0 +; + %v = load i32, i32* %p, !range !{i32 0, i32 1} + ret i32 %v +} + +define i32 @load_range_single_volatile(i32* %p) { +; CHECK-LABEL: @load_range_single_volatile( +; CHECK-NEXT: [[V:%.*]] = load volatile i32, i32* [[P:%.*]], align 4, !range !1 +; CHECK-NEXT: ret i32 [[V]] +; + %v = load volatile i32, i32* %p, !range !{i32 0, i32 1} + ret i32 %v +} + define void @load_nonnull(i32** %p) { ; CHECK-LABEL: @load_nonnull( -; CHECK-NEXT: [[V:%.*]] = load i32*, i32** [[P:%.*]], align 8, !nonnull !1 +; CHECK-NEXT: [[V:%.*]] = load i32*, i32** [[P:%.*]], align 8, !nonnull !2 ; CHECK-NEXT: [[C1:%.*]] = icmp ne i32* [[V]], null ; CHECK-NEXT: call void @use(i1 [[C1]]) ; CHECK-NEXT: ret void @@ -45,12 +60,10 @@ define void @load_nonnull(i32** %p) { define void @call_range(i32* %p) { ; CHECK-LABEL: @call_range( ; CHECK-NEXT: [[V:%.*]] = call i32 @get_i32(), !range !0 -; 
CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[V]], 10 -; CHECK-NEXT: call void @use(i1 [[C1]]) +; CHECK-NEXT: call void @use(i1 true) ; CHECK-NEXT: [[C2:%.*]] = icmp ult i32 [[V]], 9 ; CHECK-NEXT: call void @use(i1 [[C2]]) -; CHECK-NEXT: [[C3:%.*]] = icmp ugt i32 [[V]], 9 -; CHECK-NEXT: call void @use(i1 [[C3]]) +; CHECK-NEXT: call void @use(i1 false) ; CHECK-NEXT: [[C4:%.*]] = icmp ugt i32 [[V]], 8 ; CHECK-NEXT: call void @use(i1 [[C4]]) ; CHECK-NEXT: ret void @@ -69,8 +82,7 @@ define void @call_range(i32* %p) { define internal i1 @ip_cmp_range(i32 %v) { ; CHECK-LABEL: @ip_cmp_range( -; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[V:%.*]], 10 -; CHECK-NEXT: ret i1 [[C]] +; CHECK-NEXT: ret i1 undef ; %c = icmp ult i32 %v, 10 ret i1 %c @@ -80,7 +92,7 @@ define i1 @ip_load_range(i32* %p) { ; CHECK-LABEL: @ip_load_range( ; CHECK-NEXT: [[V:%.*]] = load i32, i32* [[P:%.*]], align 4, !range !0 ; CHECK-NEXT: [[C:%.*]] = call i1 @ip_cmp_range(i32 [[V]]) -; CHECK-NEXT: ret i1 [[C]] +; CHECK-NEXT: ret i1 true ; %v = load i32, i32* %p, !range !{i32 0, i32 10} %c = call i1 @ip_cmp_range(i32 %v) From llvm-commits at lists.llvm.org Tue Jul 7 12:12:12 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:12:12 +0000 (UTC) Subject: [PATCH] D77808: [SCCP] Use conditional info with AND/OR branch conditions. In-Reply-To: References: Message-ID: fhahn marked an inline comment as done. fhahn added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1320 + else + NewCR = NewCR.intersectWith(CopyOfCR); ---------------- nikic wrote: > I'd write this as: > ``` > NewCR = NewCR.intersectWith(CopyOfCR); > if (!CopyOfCR.contains(NewCR) && > CopyOfCR.getSingleMissingElement() && > CopyOf != OriginalVal) > NewCR = CopyOfCR; > ``` > There is no need to discard the intersection if it is smaller than the original range. E.g. 
if you had information for `ne 0` before, then intersecting it with `ugt 42` will always be beneficial. > > I'm also not clear on why the `CopyOf != OriginalVal` comparison is there. If we think that inequality information is more valuable, shouldn't that hold regardless of how we arrived at the inequality? Thanks, that's simpler indeed. I think the impact in practice is relatively low, but it is definitely more straightforward :) > I'm also not clear on why the CopyOf != OriginalVal comparison is there It was mainly there to guard against specific potential regressions caused by this patch. But it is probably not worth special casing here. I've removed the check. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77808/new/ https://reviews.llvm.org/D77808 From llvm-commits at lists.llvm.org Tue Jul 7 12:12:36 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:12:36 +0000 (UTC) Subject: [PATCH] D83149: [gcov] Add __gcov_dump/__gcov_reset and delete __gcov_flush In-Reply-To: References: Message-ID: <124dd2b787fd224376b807d33ea808b4@localhost.localdomain> MaskRay updated this revision to Diff 276173. MaskRay marked 6 inline comments as done. MaskRay added a comment. Test __gcov_dump Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83149/new/ https://reviews.llvm.org/D83149 Files: clang/lib/Driver/ToolChains/Darwin.cpp clang/test/CodeGen/code-coverage.c clang/test/Driver/darwin-ld.c compiler-rt/lib/profile/GCDAProfiling.c compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c compiler-rt/test/profile/Posix/gcov-dlopen.c compiler-rt/test/profile/Posix/gcov-shared-flush.c compiler-rt/test/profile/gcov-__gcov_flush-terminate.c compiler-rt/test/profile/gcov-dump-and-remove.c llvm/lib/Transforms/Instrumentation/GCOVProfiling.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83149.276173.patch Type: text/x-patch Size: 15649 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:14:30 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:14:30 +0000 (UTC) Subject: [PATCH] D83149: [gcov] Add __gcov_dump/__gcov_reset and delete __gcov_flush In-Reply-To: References: Message-ID: MaskRay added a comment. `compiler-rt/test/profile/Inputs/instrprof-dlopen-dlclose-main.c.gcov` is clumsy to update. The filename is also wrong: gcov has nothing to do with instrprof. I'll update the tests separately like my fba8523fb55c8e3bc853df7a250845cf51e5fc99 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83149/new/ https://reviews.llvm.org/D83149 From llvm-commits at lists.llvm.org Tue Jul 7 12:16:17 2020 From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:16:17 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: jasonliu added inline comments. ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:409 + + uint8_t getVersion(); + uint8_t getLanguageID(); ---------------- Add `const` modifier to member functions when applicable. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Tue Jul 7 12:17:06 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:17:06 +0000 (UTC) Subject: [PATCH] D83179: [SCCP] Use range metadata for loads and calls In-Reply-To: References: Message-ID: nikic marked an inline comment as done. nikic added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1121 + // as overdefined. 
+ if (I.getType()->isStructTy() || I.isVolatile()) return (void)markOverdefined(&I); ---------------- fhahn wrote: > For volatile loads, I think we could still use the range info if present? We are just not allowed to remove the volatile operation, right? This would be safe. I briefly tried this, but found that SCCP would zap the volatile load, because it only uses `isSafeToRemove()` to determine whether instructions can be dropped. We would have to replace that with something stronger, like `isInstructionTriviallyDead()`. Do you think that would be worthwhile? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83179/new/ https://reviews.llvm.org/D83179 From llvm-commits at lists.llvm.org Tue Jul 7 12:18:18 2020 From: llvm-commits at lists.llvm.org (Itay Bookstein via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:18:18 +0000 (UTC) Subject: [PATCH] D81911: [IR] Fix getBaseObject for GlobalAlias-to-GlobalIFunc In-Reply-To: References: Message-ID: nextsilicon-itay-bookstein added a comment. How do we want to proceed, then? Does my analysis make sense? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81911/new/ https://reviews.llvm.org/D81911 From llvm-commits at lists.llvm.org Tue Jul 7 12:18:59 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:18:59 +0000 (UTC) Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap. Message-ID: fhahn created this revision. fhahn added reviewers: efriedma, niravd, paquette. Herald added subscribers: dexonsmith, hiraditya, MatzeB. Herald added a project: LLVM. Currently popFromQueueImpl iterates over all candidates to find the best one. While the candidate queue is small, this is not a problem. But it becomes a problem once the queue gets larger. For example, the snippet below takes 330s to compile with llc -O0, but completes in 3s with this patch. 
define void @test(i4000000* %ptr) { entry: store i4000000 0, i4000000* %ptr, align 4 ret void } On backends that use the MachineScheduler, there should be no changes in the generated code (e.g. for X86 there are no binary changes with this patch when building MultiSource, SPEC2000, SPEC2006 with -O3 -lto). On backends that are not using the MachineScheduler, there is a slight change in behavior: previously, the first candidate in the list would be picked if there were multiple candidates with the same score. For small worklists, maintaining the heap can be more expensive than it is worth, so the new approach is only used for candidate lists with more than 100 candidates. See http://llvm-compile-time-tracker.com/compare.php?from=058af835063ff9afc39fc53279fa660e075564ed&to=390d055bee63860be70caf57515b3f29f7728d91&stat=instructions where the first commit is with a limit and the second one always uses a heap. The first commit is slightly faster. Overall on CTMark, the change is mostly neutral (http://llvm-compile-time-tracker.com/compare.php?from=f7522a5823d66303edfd8d872232dd6b07190f42&to=058af835063ff9afc39fc53279fa660e075564ed&stat=instructions) but it is beneficial for very large inputs. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83335 Files: llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83335.276178.patch Type: text/x-patch Size: 3337 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:19:26 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:19:26 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <03278fe517e2a4210e5a619a5507111e@localhost.localdomain> MaskRay added inline comments.
================ Comment at: llvm/test/Other/opt-O2-pipeline.ll:289 +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ---------------- zequanwu wrote: > zequanwu wrote: > > nikic wrote: > > > hans wrote: > > > > nikic wrote: > > > > > Is it possible to switch this pass to use LazyBPI / LazyBFA, only fetched if PGO is actually in use? > > > > > > > > > > PGO functionality that most people don't use adding expensive analysis passes like PDT should be avoided. > > > > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. > > > > > > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > > > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. > > > > > > It would only help if there is some way to only fetch the analysis conditionally. I believe many PGO passes use something like PSI.hasProfileSummary() or F.hasProfileData() for that. > > > > > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > > > > > Right, just disabling this by default in clang/opt would also work. > > > > > > For reference, the current compile-time numbers for this patch: https://llvm-compile-time-tracker.com/compare.php?from=516ff1d4baee28b1911737e47b42973567adf8ff&to=8df840660bb764b6653fcfd9ac7a72cc6adebde6&stat=instructions Not huge, but it adds up (some similar regressions have been introduced in LLVM 10). > > Do you mean disabling it just for LPM or both?
> > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > For Clang, a better fix I think is that `Opts.CallGraphProfile` should be based on both whether the integrated assembler is used and whether profile instrumentation is turned on. What do you think? I'd prefer not having `CallGraphProfile` * `-no-integrated-as -S` => no .cgprofile (.llvm_addrsig behaves this way) * `-S` -> .cgprofile Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 12:22:46 2020 From: llvm-commits at lists.llvm.org (Michał Górny via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:22:46 +0000 (UTC) Subject: [PATCH] D83161: [llvm] [docs] Do not require recommonmark for manpage build In-Reply-To: References: Message-ID: <3436f23001a7e24c0fe446a003284c2a@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG446e3df25483: [llvm] [docs] Do not require recommonmark for manpage build (authored by mgorny). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83161/new/ https://reviews.llvm.org/D83161 Files: llvm/cmake/modules/AddSphinxTarget.cmake llvm/docs/conf.py Index: llvm/docs/conf.py =================================================================== --- llvm/docs/conf.py +++ llvm/docs/conf.py @@ -28,22 +28,29 @@ # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.intersphinx', 'sphinx.ext.todo'] -import sphinx -if sphinx.version_info >= (3, 0): - # This requires 0.5 or later.
- extensions.append('recommonmark') -else: - source_parsers = {'.md': 'recommonmark.parser.CommonMarkParser'} - # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = { '.rst': 'restructuredtext', - '.md': 'markdown', } +try: + import recommonmark +except ImportError: + # manpages do not use any .md sources + if not tags.has('builder-man'): + raise +else: + import sphinx + if sphinx.version_info >= (3, 0): + # This requires 0.5 or later. + extensions.append('recommonmark') + else: + source_parsers = {'.md': 'recommonmark.parser.CommonMarkParser'} + source_suffix['.md'] = 'markdown' + # The encoding of source files. #source_encoding = 'utf-8-sig' Index: llvm/cmake/modules/AddSphinxTarget.cmake =================================================================== --- llvm/cmake/modules/AddSphinxTarget.cmake +++ llvm/cmake/modules/AddSphinxTarget.cmake @@ -38,6 +38,7 @@ -b ${builder} -d "${SPHINX_DOC_TREE_DIR}" -q # Quiet: no output other than errors and warnings. + -t builder-${builder} # tag for builder ${SPHINX_WARNINGS_AS_ERRORS_FLAG} # Treat warnings as errors if requested "${ARG_SOURCE_DIR}" # Source "${SPHINX_BUILD_DIR}" # Output -------------- next part -------------- A non-text attachment was scrubbed... Name: D83161.275680.patch Type: text/x-patch Size: 1861 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:22:59 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:22:59 +0000 (UTC) Subject: [PATCH] D83179: [SCCP] Use range metadata for loads and calls In-Reply-To: References: Message-ID: <386fbe049a041b24ae0513b1c1ec1c09@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. 
Closed by commit rG8691544a2767: [SCCP] Use range metadata for loads and calls (authored by nikic). Changed prior to commit: https://reviews.llvm.org/D83179?vs=275564&id=275682#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83179/new/ https://reviews.llvm.org/D83179 Files: llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/metadata.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83179.275682.patch Type: text/x-patch Size: 7418 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:26:26 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:26:26 +0000 (UTC) Subject: [PATCH] D78478: [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py In-Reply-To: References: Message-ID: <36e75243a0d79cd53a837dc3b21f062c@localhost.localdomain> MaskRay accepted this revision. MaskRay added a comment. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78478/new/ https://reviews.llvm.org/D78478 From llvm-commits at lists.llvm.org Tue Jul 7 12:28:23 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:28:23 +0000 (UTC) Subject: [PATCH] D77808: [SCCP] Use conditional info with AND/OR branch conditions. In-Reply-To: References: Message-ID: nikic accepted this revision. nikic added a comment. This revision is now accepted and ready to land. LGTM modulo some naming. ================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1268 Value *CmpOp1 = Cmp->getOperand(1); - if (CopyOf != CmpOp0 && CopyOf != CmpOp1) { - mergeInValue(ValueState[&CB], &CB, OriginalVal); + // Bail out if neither of the operands matches the OriginalVal or CopyOf. + if (CmpOp0 != OriginalVal && CmpOp1 != OriginalVal) { ---------------- The "or CopyOf" is too much now. 
(You might want to convert this into an assertion, to make sure there are no more surprises here.) ================ Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:1296 ValueLatticeElement &IV = ValueState[&CB]; - if (CondVal.isConstantRange() || OriginalVal.isConstantRange()) { - auto NewCR = + ValueLatticeElement CopyOfVal = getValueState(CopyOf); + if (CondVal.isConstantRange() || CopyOfVal.isConstantRange()) { ---------------- You already fetched this above under the name of `OriginalValState`. I think the naming here got a bit messed up due to the back-and-forth refactoring. You might want to rename `OriginalValState` above to `CopyOfVal`, to avoid confusion with `OriginalVal`. I'd also s/OriginalVal/OriginalOp to keep with the name from PredicateInfo, and because you're using `Val` to indicate lattice values rather than `Value*`s. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77808/new/ https://reviews.llvm.org/D77808 From llvm-commits at lists.llvm.org Tue Jul 7 12:28:39 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:28:39 +0000 (UTC) Subject: [PATCH] D82871: [SVE] Custom ISel for fixed length extract/insert_subvector. In-Reply-To: References: Message-ID: <82c0b5e0c0a57618f0d9299f61c065ee@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82871/new/ https://reviews.llvm.org/D82871 From llvm-commits at lists.llvm.org Tue Jul 7 12:31:39 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:31:39 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes.
Closed by commit rGc6a23df691fb: [flang] Make 'num_images()' intrinsic (authored by ktras, committed by tskeith). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 Files: flang/documentation/Intrinsics.md flang/lib/Evaluate/intrinsics.cpp flang/test/Semantics/call10.f90 flang/test/Semantics/num_images.f90 flang/unittests/Evaluate/intrinsics.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83142.276180.patch Type: text/x-patch Size: 5738 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:33:26 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:33:26 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: <6f17a88136f0b523e69f44b7fc0903ab@localhost.localdomain> Ayal added inline comments. ================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:844 + // check the opcode is correct (and dont allow them to be Subs) and that they + // have expected to have an extra user from the LCSSA phi, but select do not + // have a single use from the phi, meaning we should always expect 2 uses. ---------------- rephrase sentence so it parses ================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:812 + if (LHS->getOpcode() == Opcode && L->contains(LHS->getParent()) && + LHS->hasOneUse() && + findPathToPhi(LHS, ReductionOperations, Opcode, Phi, L)) { ---------------- dmgreen wrote: > Ayal wrote: > > dmgreen wrote: > > > fhahn wrote: > > > > Ayal wrote: > > > > > Looking for a chain of hasOneUse op's would be easier starting from the Phi and going downwards, until reaching LoopExitInstr? > > > > > > > > > > Note that when extended to handle reductions with conditional bumps, some ops will have more than one use. 
> > > > Instead of doing a recursive traversal, would it be simpler to just do the traversal iteratively, at least as long as we are only using at a single use chain? > > > Yeah, that direction makes it a lot simpler. Thanks. > > Is treating sub as an add reduction something in-loop reduction could support as a future extension? > Hmm. I don't want to say never. A normal inloop reduction looks like: > p = PHI(0, a) > l = VLDR (..) > a = VADDVA(p, l) > Where the `VADDV` is an across-vector reductions, and the extra `A` means also add p. Reducing a sub would need to become: > p = PHI(0, a) > l = VLDR (..) > a = VADDV(l) > p = SUB(p, a) > With the SUB as a separate scalar instruction, which would be quite slow on some hardware (getting a value over from the VADDV to the SUB). So this would almost certainly be slower than a out-of-loop reduction. > > But if we could end up using a higher vector factor for the reduction, or end up vectorizing loops that would previously not be vectorized.. that may lead to a gain overall to overcome the extra cost of adding the sub to the loop. It will require some very careful costing I think. And maybe the ability to create multiple vplans and cost them against one another :) An original sub code, say, acc -= a[i], can be treated as acc += (-a[i]). This could be in-loop reduced by first negating a[i]'s, at LV's LLVM-IR level, presumably lowered later to something like ``` p = PHI(0, a) l = VLDR (..) s = VSUBV (zero, l) a = VADDVA(p, s) ``` , right? ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7348 + // For min/max reducitons, where we have a pair of icmp/select, we also + // need to recode the ICmp recipe, so it can be removed later. 
+ if (Kind == RecurrenceDescriptor::RK_IntegerMinMax || ---------------- recode >> record ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3751 setDebugLocFromInst(Builder, ReductionStartValue); + bool UseInloopReductions = Cost->useInloopReductions(Phi); ---------------- Ayal wrote: > isInLoopReductionPhi Ahh, this should actually be capitalized `IsInLoopReductionPhi` ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3769 // MinMax reduction have the start value as their identify. - if (VF == 1) { + if (VF == 1 || UseInloopReductions) { VectorStart = Identity = ReductionStartValue; ---------------- dmgreen wrote: > Ayal wrote: > > dmgreen wrote: > > > Ayal wrote: > > > > dmgreen wrote: > > > > > Ayal wrote: > > > > > > This is dead code if cmp/select chains are not recognized yet, as noted above. > > > > > I've added the code to handle minmax too (but not tested it a lot yet. I will try that now). > > > > > > > > > > MVE has instructions for integer min/max reductions, but they can be slow enough to make them not worth using over a normal vmin/vmax. Adds are always not-slower-enough to warrant the inloop reduction (and have other advantages like handling higher type sizes and folding in more instructions.) > > > > > > > > > > My point is that min/max, like some of the other fadd/mul/and/etc might not be used by MVE yet. If you think the code is more hassle than it deserves, then we could take them out for the time being. I'd like to leave them in for consistency though, even if it's not used straight away. > > > > Would be good to make sure code is being exercised and tested. Could inloop min/max (and/or other reductions) help reduce code size, and be applied when vectorizing under optsize? > > > -Os sounds like a good plan. It will take some backend work to make it efficient enough first though. And predicated reductions? 
> > Hoisting the horizontal reduction from the middle block into the loop could potentially eliminate the middle block (as in tests below), so could presumably lead to code of smaller size? At-least for in-loop chains of a single link. > > > > > And predicated reductions? > > These are yet to be handled in-loop, right? > >> And predicated reductions? > >These are yet to be handled in-loop, right? > Yep. It will need a predicated reduction intrinsic. A vecreduce that takes a mask. That will allow us to tail-fold the reductions with trip counts that do not divide the vector factor, which will make them look a lot better under -Os. And nice in general I think once it all starts being tail predicated. > > The backend work I was mentioning was that we need to more efficiently transform > x = min(vecreduce.min(z), y) > into > x = VMINV(y, z) > Where y is (confusingly) accumulated in the case (even though the instruction doesn't have an A suffix). We currently generate > x = min(VMINV(UINT_MAX, z), y) > > Once that is sorted out then, yep, using these for Os sounds like a good plan. Re: predicated reductions - could they be handled by replacing masked-off elements with `Identity` using a select prior to reduction? To be potentially folded later by suitable targets into a predicated reduction operation which they may support. Somewhat akin to "passthru" values of masked loads. ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7286 + RecurrenceDescriptor &RdxDesc = Reduction.second; + if (CM.useInloopReductions(Reduction.first)) { + PHINode *Phi = Reduction.first; ---------------- dmgreen wrote: > Ayal wrote: > > dmgreen wrote: > > > Ayal wrote: > > > > dmgreen wrote: > > > > > Ayal wrote: > > > > > > Iterate over in loop reductions? > > > > > Do you mean adding an iterator for iterating over reductions, stepping over the ones not inloop? > > > > > > > > > > It would seem like it's similar to the existing code, but as a new iterator class. 
My gut says the current code is simpler and clearer what is going on? > > > > Suggestion was to iterate over the PHIs/elements of InloopReductionChains, rather than over all reduction PHIs of Legal->getReductionVars(). > > > > > > > > (Better early-exit via "if (!CM.isInLoopReduction(Reduction.first)) continue;") > > > I believe that InloopReductionChains would not iterate in a deterministic order, which is why I avoided it. > > > > > > Perhaps that would not matter here? The reductions should be independent anyway. Seems safer to try and use deterministic ordering anyway if we can. > > Agreed it would be better to use deterministic ordering. How about letting InloopReductionChains be a MapVector and iterate over > > for (auto &Reduction : CM.getInloopReductions())? > > The number of reductions is expected to be small, w/o removals. > MapVector sounds good. I've changed it to use that and tried to use that in a few more places. Let me know what you think. Uses as MapVector look good to me, thanks. Can also retain isInLoopReduction(PHI). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Tue Jul 7 12:33:50 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:33:50 +0000 (UTC) Subject: [PATCH] D81911: [IR] Fix getBaseObject for GlobalAlias-to-GlobalIFunc In-Reply-To: References: Message-ID: DmitryPolukhin added a comment. getBaseObject is only a small part of the code sharing; the majority of the code is outside the class, in common code that handles both GlobalIFunc and GlobalAlias in the same way. Here they should be treated differently, so, IMHO, there is no need to change the inheritance, but GlobalIFunc should be handled as a special case here.
================ Comment at: llvm/lib/IR/Globals.cpp:448 + if (auto *GI = dyn_cast(C)) + return findBaseObject(GI->getOperand(0), Aliases); if (auto *GA = dyn_cast(C)) ---------------- `return GI;` should work here I think. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81911/new/ https://reviews.llvm.org/D81911 From llvm-commits at lists.llvm.org Tue Jul 7 12:34:17 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:34:17 +0000 (UTC) Subject: [PATCH] D83112: [flang] Added missing runtime I/O definitions In-Reply-To: References: Message-ID: <37dc28a5913e4ae95619e1819a5fd198@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG4b9b64d561e9: [flang] Added missing runtime I/O definitions (authored by zacharyselk, committed by tskeith). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83112/new/ https://reviews.llvm.org/D83112 Files: flang/runtime/io-api.cpp Index: flang/runtime/io-api.cpp =================================================================== --- flang/runtime/io-api.cpp +++ flang/runtime/io-api.cpp @@ -843,6 +843,35 @@ return false; } +bool IONAME(OutputReal32)(Cookie cookie, float x) { + IoStatementState &io{*cookie}; + if (!io.get_if()) { + io.GetIoErrorHandler().Crash( + "OutputReal32() called for a non-output I/O statement"); + return false; + } + if (auto edit{io.GetNextDataEdit()}) { + return RealOutputEditing<24>{io, x}.Edit(*edit); + } + return false; +} + +bool IONAME(InputReal32)(Cookie cookie, float &x) { + IoStatementState &io{*cookie}; + if (!io.get_if()) { + io.GetIoErrorHandler().Crash( + "InputReal32() called for a non-input I/O statement"); + return false; + } + if (auto edit{io.GetNextDataEdit()}) { + if (edit->descriptor == DataEdit::ListDirectedNullValue) { + return true; + } + return EditRealInput<24>(io, *edit, reinterpret_cast(&x)); + } + return false; +} + bool 
IONAME(OutputReal64)(Cookie cookie, double x) { IoStatementState &io{*cookie}; if (!io.get_if()) { @@ -872,6 +901,18 @@ return false; } +bool IONAME(OutputComplex32)(Cookie cookie, float r, float z) { + IoStatementState &io{*cookie}; + if (io.get_if>()) { + DataEdit real, imaginary; + real.descriptor = DataEdit::ListDirectedRealPart; + imaginary.descriptor = DataEdit::ListDirectedImaginaryPart; + return RealOutputEditing<24>{io, r}.Edit(real) && + RealOutputEditing<24>{io, z}.Edit(imaginary); + } + return IONAME(OutputReal32)(cookie, r) && IONAME(OutputReal32)(cookie, z); +} + bool IONAME(OutputComplex64)(Cookie cookie, double r, double z) { IoStatementState &io{*cookie}; if (io.get_if>()) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83112.276181.patch Type: text/x-patch Size: 1930 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:34:27 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:34:27 +0000 (UTC) Subject: [PATCH] D83336: [flang] Support for image selectors Message-ID: PeteSteinfeld created this revision. PeteSteinfeld added reviewers: klausler, tskeith. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. This change implements support for image selectors and image selector specifications as described in section 9.6. In check-coarray[.h,cpp] I changed the `Leave()` function for `parser::ImageSelectorSpec` to take a `parser::ImageSelector`, which contains a list of image selector specifications. This allows us to detect when the same specification is used more than once. I also added code to analyze the expressions for the image selector specifications to expression.cpp and a test for all of the conditions to check at compile time. Note that we do not check at compile time to see if the values of the cosubscripts are within the specified cobounds.
We also do not check anything related to selecting a valid team. We also do not check that the denotation of the `stat-variable` is not dependent on the evaluation of an entity in the same statement. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83336 Files: flang/include/flang/Parser/tools.h flang/lib/Parser/tools.cpp flang/lib/Semantics/check-coarray.cpp flang/lib/Semantics/check-coarray.h flang/lib/Semantics/expression.cpp flang/test/Semantics/resolve94.f90 -------------- next part -------------- A non-text attachment was scrubbed... Name: D83336.276182.patch Type: text/x-patch Size: 9736 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:36:13 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:36:13 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls Message-ID: guiand created this revision. guiand added reviewers: eugenis, vitalybuka. Herald added subscribers: llvm-commits, Sanitizers, jfb, hiraditya. Herald added projects: Sanitizers, LLVM. These calls are neither intercepted by compiler-rt nor is libatomic.a naturally instrumented. This patch uses the existing libcall mechanism to detect a call to __atomic_load or __atomic_store, and instruments them much like the preexisting instrumentation for atomics. Calls to _load are modified to have at least Acquire ordering, and calls to _store at least Release ordering. Because this needs to be converted at runtime, msan injects a LUT (implemented as a vector with extractelement). Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83337 Files: compiler-rt/test/msan/libatomic.c llvm/include/llvm/Analysis/TargetLibraryInfo.def llvm/lib/Analysis/TargetLibraryInfo.cpp llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/libatomic.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83337.276184.patch Type: text/x-patch Size: 12570 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:36:15 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:36:15 +0000 (UTC) Subject: [PATCH] D83338: [PowerPC][Power10] Implemented Vector Shift Builtins Message-ID: Conanap created this revision. Conanap added reviewers: PowerPC, power-llvm-team, saghir, nemanjai, hfinkel. Conanap added projects: LLVM, clang, PowerPC. Implemented the following vector right and left shift builtins and their test cases: vector unsigned __int128 vec_sl(vector unsigned __int128 a, vector unsigned __int128 b) vector signed __int128 vec_sl(vector signed __int128 a, vector unsigned __int128 b) vector unsigned __int128 vec_sr(vector unsigned __int128 a, vector unsigned __int128 b) vector signed __int128 vec_sr(vector signed __int128 a, vector unsigned __int128 b) vector unsigned __int128 vec_sra(vector unsigned __int128 a, vector unsigned __int128 b) vector signed __int128 vec_sra(vector signed __int128 a, vector unsigned __int128 b) Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83338 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/lib/Headers/altivec.h llvm/include/llvm/IR/IntrinsicsPowerPC.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83338.276179.patch Type: text/x-patch Size: 5729 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:38:24 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:38:24 +0000 (UTC) Subject: [PATCH] D81911: [IR] Fix getBaseObject for GlobalAlias-to-GlobalIFunc In-Reply-To: References: Message-ID: <8c46cd40aa266da7985c18872baf17f4@localhost.localdomain> DmitryPolukhin added inline comments. ================ Comment at: llvm/lib/IR/Globals.cpp:448 + if (auto *GI = dyn_cast(C)) + return findBaseObject(GI->getOperand(0), Aliases); if (auto *GA = dyn_cast(C)) ---------------- DmitryPolukhin wrote: > `return GI;` should work here I think. Nope, it doesn't just from types point of view. But IFunc should be treated as an object itself. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81911/new/ https://reviews.llvm.org/D81911 From llvm-commits at lists.llvm.org Tue Jul 7 12:43:13 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:43:13 +0000 (UTC) Subject: [PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support In-Reply-To: References: Message-ID: epastor updated this revision to Diff 276185. epastor added a comment. Improve error messages and fix a missing OnFailure case Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75306/new/ https://reviews.llvm.org/D75306 Files: llvm/include/llvm/MC/MCParser/MCAsmParser.h llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h llvm/lib/MC/MCParser/MasmParser.cpp llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/test/tools/llvm-ml/struct.test llvm/test/tools/llvm-ml/struct_errors.test -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D75306.276185.patch Type: text/x-patch Size: 84988 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:49:11 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:49:11 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls In-Reply-To: References: Message-ID: <433b2356ddc2bb67c6a7c946eb6ce322@localhost.localdomain> guiand marked 3 inline comments as done. guiand added inline comments. ================ Comment at: compiler-rt/test/msan/libatomic.c:37 +#endif +} ---------------- One thing that turned out a little strange is that because I have to insert instructions *after* the atomic load, including the origin update, the msan reporter decides that the origin is one line below the call. Is there anything I can do about this? ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:1872 + OrderingTable[(int)AtomicOrderingCABI::seq_cst] = + (int)AtomicOrderingCABI::seq_cst; + ---------------- Is there a more elegant way to do this? ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3544 + IRBuilder<> NextIRB(CB.getNextNode()); + Align AlignOne = assumeAligned(1); + auto SrcShadowOriginPair = ---------------- Is it valid to assume any alignment other than 1 here? I can't think of a way, but I'd like to make sure. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83337/new/ https://reviews.llvm.org/D83337 From llvm-commits at lists.llvm.org Tue Jul 7 12:54:58 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:54:58 +0000 (UTC) Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap. In-Reply-To: References: Message-ID: efriedma added a comment. 
I'm concerned that the behavior of queues with multiple candidates with the same score might not be consistent across compilers. (This is similar to using llvm::sort when you really need std::stable_sort.) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83335/new/ https://reviews.llvm.org/D83335 From llvm-commits at lists.llvm.org Tue Jul 7 12:55:03 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:55:03 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls In-Reply-To: References: Message-ID: <958310f92169b531dc469b1e19b7c10d@localhost.localdomain> guiand updated this revision to Diff 276186. guiand added a comment. Update TargetLibraryInfoTest to include new LibFunc declarations. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83337/new/ https://reviews.llvm.org/D83337 Files: compiler-rt/test/msan/libatomic.c llvm/include/llvm/Analysis/TargetLibraryInfo.def llvm/lib/Analysis/TargetLibraryInfo.cpp llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/libatomic.ll llvm/unittests/Analysis/TargetLibraryInfoTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83337.276186.patch Type: text/x-patch Size: 13185 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:55:38 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:55:38 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: <86c5558162414c70975c392973063292@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGc6a23df691fb: [flang] Make 'num_images()' intrinsic (authored by ktras, committed by tskeith). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 Files: flang/documentation/Intrinsics.md flang/lib/Evaluate/intrinsics.cpp flang/test/Semantics/call10.f90 flang/test/Semantics/num_images.f90 flang/unittests/Evaluate/intrinsics.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83142.275683.patch Type: text/x-patch Size: 5738 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:55:44 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:55:44 +0000 (UTC) Subject: [PATCH] D83112: Added missing runtime I/O definitions In-Reply-To: References: Message-ID: <8a1ff2a914837759254fed0937df75c9@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG4b9b64d561e9: [flang] Added missing runtime I/O definitions (authored by zacharyselk, committed by tskeith). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83112/new/ https://reviews.llvm.org/D83112 Files: flang/runtime/io-api.cpp Index: flang/runtime/io-api.cpp =================================================================== --- flang/runtime/io-api.cpp +++ flang/runtime/io-api.cpp @@ -843,6 +843,35 @@ return false; } +bool IONAME(OutputReal32)(Cookie cookie, float x) { + IoStatementState &io{*cookie}; + if (!io.get_if()) { + io.GetIoErrorHandler().Crash( + "OutputReal32() called for a non-output I/O statement"); + return false; + } + if (auto edit{io.GetNextDataEdit()}) { + return RealOutputEditing<24>{io, x}.Edit(*edit); + } + return false; +} + +bool IONAME(InputReal32)(Cookie cookie, float &x) { + IoStatementState &io{*cookie}; + if (!io.get_if()) { + io.GetIoErrorHandler().Crash( + "InputReal32() called for a non-input I/O statement"); + return false; + } + if (auto edit{io.GetNextDataEdit()}) { + if (edit->descriptor == DataEdit::ListDirectedNullValue) { + return true; + } + return EditRealInput<24>(io, *edit, reinterpret_cast(&x)); + } + return false; +} + bool IONAME(OutputReal64)(Cookie cookie, double x) { IoStatementState &io{*cookie}; if (!io.get_if()) { @@ -872,6 +901,18 @@ return false; } +bool IONAME(OutputComplex32)(Cookie cookie, float r, float z) { + IoStatementState &io{*cookie}; + if (io.get_if>()) { + DataEdit real, imaginary; + real.descriptor = DataEdit::ListDirectedRealPart; + imaginary.descriptor = DataEdit::ListDirectedImaginaryPart; + return RealOutputEditing<24>{io, r}.Edit(real) && + RealOutputEditing<24>{io, z}.Edit(imaginary); + } + return IONAME(OutputReal32)(cookie, r) && IONAME(OutputReal32)(cookie, z); +} + bool IONAME(OutputComplex64)(Cookie cookie, double r, double z) { IoStatementState &io{*cookie}; if (io.get_if>()) { -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83112.275684.patch Type: text/x-patch Size: 1930 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 12:59:08 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:59:08 +0000 (UTC) Subject: [PATCH] D83021: [Inliner] Don't skip inlining alwaysinline in optnone functions In-Reply-To: References: Message-ID: aeubanks added a comment. In D83021#2128775 , @aeubanks wrote: > Actually, I'm wondering if the NPM inliner properly handles alwaysinline in all cases? I don't see any other references to "alwaysinline" in Inliner.cpp (besides some legacy passes). jyknight pointed me to llvm::getAttributeBasedInliningDecision() where alwaysinline is handled. It still seems a little weird to me that alwaysinline isn't a separate pass under the -O1/-O2/-O3 pipelines and would make opt-bisect always run the inliner pass (once it's implemented in NPM). But this will do for now. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83021/new/ https://reviews.llvm.org/D83021 From llvm-commits at lists.llvm.org Tue Jul 7 13:02:07 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:02:07 +0000 (UTC) Subject: [PATCH] D82416: [SVE] Make Constant::getSplatValue work for scalable vector splats In-Reply-To: References: Message-ID: <3fda0998f6c8456b192ad7957726d872@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82416/new/ https://reviews.llvm.org/D82416 From llvm-commits at lists.llvm.org Tue Jul 7 13:02:57 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Tue, 07 Jul 2020 13:02:57 -0700 (PDT) Subject: [llvm] 2279380 - [Inliner] Don't skip inlining alwaysinline in optnone functions Message-ID: <5f04d4f1.1c69fb81.1edb6.801b@mx.google.com> Author: Arthur Eubanks Date: 2020-07-07T12:54:55-07:00 New Revision: 2279380eab08219911910e1ecdcef3eacb0b7f0c URL: https://github.com/llvm/llvm-project/commit/2279380eab08219911910e1ecdcef3eacb0b7f0c DIFF: https://github.com/llvm/llvm-project/commit/2279380eab08219911910e1ecdcef3eacb0b7f0c.diff LOG: [Inliner] Don't skip inlining alwaysinline in optnone functions Previously the NPM inliner would skip all potential inlines in an optnone function, but alwaysinline callees should be inlined regardless of optnone. Fixes inline-optnone.ll under NPM. 
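The rule stated in the log message above can be read as a two-input predicate. The following is a minimal sketch of that reading only, not the Inliner.cpp code: `FnAttrs` and `skipForOptNone` are hypothetical names, and it assumes the `optnone` check applies to the caller and the `alwaysinline` check to the callee, as the message describes.

```cpp
// Hypothetical stand-in for the two function attributes involved; this is
// not an LLVM type.
struct FnAttrs {
  bool OptNone;      // function carries the optnone attribute
  bool AlwaysInline; // function carries the alwaysinline attribute
};

// Returns true when the call site would be skipped with the
// "optnone attribute" remark. An alwaysinline callee is never skipped for
// this reason, even when the caller is optnone.
bool skipForOptNone(const FnAttrs &Caller, const FnAttrs &Callee) {
  return !Callee.AlwaysInline && Caller.OptNone;
}
```

Under this reading, the only behavioral change from the previous check is the case of an optnone caller with an alwaysinline callee, which is no longer skipped.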
Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D83021 Added: Modified: llvm/lib/Transforms/IPO/Inliner.cpp llvm/test/Transforms/Inline/inline-optnone.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp index 876ad8846a05..7d2260f4c169 100644 --- a/llvm/lib/Transforms/IPO/Inliner.cpp +++ b/llvm/lib/Transforms/IPO/Inliner.cpp @@ -791,7 +791,9 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC, LazyCallGraph::Node &N = *CG.lookup(F); if (CG.lookupSCC(N) != C) continue; - if (F.hasOptNone()) { + if (!Calls[I].first->getCalledFunction()->hasFnAttribute( + Attribute::AlwaysInline) && + F.hasOptNone()) { setInlineRemark(*Calls[I].first, "optnone attribute"); continue; } diff --git a/llvm/test/Transforms/Inline/inline-optnone.ll b/llvm/test/Transforms/Inline/inline-optnone.ll index 9b99c4558ea0..475ef7692ce7 100644 --- a/llvm/test/Transforms/Inline/inline-optnone.ll +++ b/llvm/test/Transforms/Inline/inline-optnone.ll @@ -1,4 +1,5 @@ ; RUN: opt < %s -inline -S | FileCheck %s +; RUN: opt < %s --passes=inline -S | FileCheck %s ; Test that functions with attribute optnone are not inlined. ; Also test that only functions with attribute alwaysinline are From llvm-commits at lists.llvm.org Tue Jul 7 13:03:11 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:03:11 +0000 (UTC) Subject: [PATCH] D83021: [Inliner] Don't skip inlining alwaysinline in optnone functions In-Reply-To: References: Message-ID: <05e147df0d205c28616c990d2f3acd28@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG2279380eab08: [Inliner] Don't skip inlining alwaysinline in optnone functions (authored by aeubanks). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83021/new/ https://reviews.llvm.org/D83021 Files: llvm/lib/Transforms/IPO/Inliner.cpp llvm/test/Transforms/Inline/inline-optnone.ll Index: llvm/test/Transforms/Inline/inline-optnone.ll =================================================================== --- llvm/test/Transforms/Inline/inline-optnone.ll +++ llvm/test/Transforms/Inline/inline-optnone.ll @@ -1,4 +1,5 @@ ; RUN: opt < %s -inline -S | FileCheck %s +; RUN: opt < %s --passes=inline -S | FileCheck %s ; Test that functions with attribute optnone are not inlined. ; Also test that only functions with attribute alwaysinline are Index: llvm/lib/Transforms/IPO/Inliner.cpp =================================================================== --- llvm/lib/Transforms/IPO/Inliner.cpp +++ llvm/lib/Transforms/IPO/Inliner.cpp @@ -791,7 +791,9 @@ LazyCallGraph::Node &N = *CG.lookup(F); if (CG.lookupSCC(N) != C) continue; - if (F.hasOptNone()) { + if (!Calls[I].first->getCalledFunction()->hasFnAttribute( + Attribute::AlwaysInline) && + F.hasOptNone()) { setInlineRemark(*Calls[I].first, "optnone attribute"); continue; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83021.276188.patch Type: text/x-patch Size: 1009 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:03:15 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:03:15 +0000 (UTC) Subject: [PATCH] D72410: [DSE] Eliminate stores by terminators (free,lifetime.end). In-Reply-To: References: Message-ID: <46ac33527d2b6995fd2d8fad01ef1882@localhost.localdomain> asbirlea added inline comments. 
================ Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1706 + + if (isFreeCall(MaybeTerm, &TLI)) { + DataLayout DL = MaybeTerm->getParent()->getModule()->getDataLayout(); ---------------- I'm not sure how expensive the `isFreeCall` is, but the check is already done inside the call to `getLocForTerminator`. You could add a bool optional, or return a pair, if worth eliminating the repeated check. ================ Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:2263 + + ToCheck.insert(NextDef->getDefiningAccess()); + if (OR == OW_Complete) { ---------------- I don't understand how this change is related. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72410/new/ https://reviews.llvm.org/D72410 From llvm-commits at lists.llvm.org Tue Jul 7 13:07:06 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:07:06 +0000 (UTC) Subject: [PATCH] D83339: [SVE] Remove calls to VectorType::getNumElements from AsmParserTest Message-ID: ctetreau created this revision. Herald added subscribers: llvm-commits, psnobl, tschuett. Herald added a reviewer: efriedma. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83339 Files: llvm/unittests/AsmParser/AsmParserTest.cpp Index: llvm/unittests/AsmParser/AsmParserTest.cpp =================================================================== --- llvm/unittests/AsmParser/AsmParserTest.cpp +++ llvm/unittests/AsmParser/AsmParserTest.cpp @@ -230,7 +230,7 @@ ASSERT_TRUE(Ty->isVectorTy()); // Check the details of the vector. - VectorType *VT = cast(Ty); + auto *VT = cast(Ty); ASSERT_TRUE(VT->getNumElements() == 5); ASSERT_TRUE(VT->getPrimitiveSizeInBits().getFixedSize() == 160); Ty = VT->getElementType(); @@ -362,7 +362,7 @@ ASSERT_TRUE(Read == 9); // Check the details of the vector. 
- VectorType *VT = cast(Ty); + auto *VT = cast(Ty); ASSERT_TRUE(VT->getNumElements() == 5); ASSERT_TRUE(VT->getPrimitiveSizeInBits().getFixedSize() == 160); Ty = VT->getElementType(); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83339.276189.patch Type: text/x-patch Size: 843 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:09:16 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:09:16 +0000 (UTC) Subject: [PATCH] D78133: [PredicateInfo] Optionally set OriginalOp to renamed value it refers to. In-Reply-To: References: Message-ID: <7aa905880055238dd39da97530a92784@localhost.localdomain> nikic added a comment. In D78133#2134287 , @fhahn wrote: > In D78133#2131838 , @nikic wrote: > > > Do we need to have this behind an option? That is, are there any PredicateInfo consumers who would //not// want this behavior? > > > Currently NewGVN relies on the current behavior. It uses it to easily look up the original instruction and uses it to merge the metadata of replacement instructions (which is not an issue for SCCP because we don't replace the predicates with other equivalent instructions which could have metadata). I guess we could try to keep track of the original instruction in NewGVN (or traverse the chain there), but I don't think it's worth blocking the patch on the change to NewGVN. Why don't we include both variants? While NewGVN needs the current OriginalOp for the replacement instruction patching, it also needs what is being implemented here (lets call it CondOp) when originally handling the PredicateInfo. 
As this comment indicates, NewGVN currently has the same problem as SCCP: https://github.com/llvm/llvm-project/blob/2279380eab08219911910e1ecdcef3eacb0b7f0c/llvm/lib/Transforms/Scalar/NewGVN.cpp#L1565-L1572 As PredicateInfo structures aren't particularly memory critical, I'd suggest including both `OriginalOp` (value prior to any renaming) and `CondOp` (value used inside the condition) and use them as appropriate. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78133/new/ https://reviews.llvm.org/D78133 From llvm-commits at lists.llvm.org Tue Jul 7 13:10:22 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:10:22 +0000 (UTC) Subject: [PATCH] D83313: [MachineOutliner] Fix liveness computing. In-Reply-To: References: Message-ID: <28459860aa2f055c25f9432a2f2c7a1d@localhost.localdomain> efriedma added a comment. I'm not really happy with this approach; if LiveRegUnits isn't producing correct results, we should fix it, not try to hack around it. Maybe we should revive D40061 ... Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83313/new/ https://reviews.llvm.org/D83313 From llvm-commits at lists.llvm.org Tue Jul 7 13:11:55 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:11:55 +0000 (UTC) Subject: [PATCH] D83341: [SVE] Scalarize fixed length masked loads and stores. Message-ID: paulwalker-arm created this revision. Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett. Herald added a reviewer: rengolin. Herald added a reviewer: efriedma. Herald added a project: LLVM. When adding support for scalable vector masked loads and stores we accidentally opened up the same support for fixed length vectors. This patch restricts support to scalable vectors only, thus ensuring fixed length vectors are treated the same regardless of SVE support. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83341 Files: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-load.ll llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-store.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83341.276191.patch Type: text/x-patch Size: 11825 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:13:28 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:13:28 +0000 (UTC) Subject: [PATCH] D82416: [SVE] Make Constant::getSplatValue work for scalable vector splats In-Reply-To: References: Message-ID: <8ddbc0175473bf08e261367d45685281@localhost.localdomain> ctetreau updated this revision to Diff 276193. ctetreau added a comment. fix test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82416/new/ https://reviews.llvm.org/D82416 Files: llvm/lib/IR/Constants.cpp llvm/test/Transforms/InstSimplify/vscale.ll llvm/unittests/IR/ConstantsTest.cpp Index: llvm/unittests/IR/ConstantsTest.cpp =================================================================== --- llvm/unittests/IR/ConstantsTest.cpp +++ llvm/unittests/IR/ConstantsTest.cpp @@ -638,5 +638,34 @@ EXPECT_FALSE(CP00U->isElementWiseEqual(CP00U0)); } +TEST(ConstantsTest, GetSplatValueRoundTrip) { + LLVMContext Context; + + Type *FloatTy = Type::getFloatTy(Context); + Type *Int32Ty = Type::getInt32Ty(Context); + Type *Int8Ty = Type::getInt8Ty(Context); + + for (unsigned Min : {1, 2, 8}) { + ElementCount SEC = {Min, true}; + ElementCount FEC = {Min, false}; + + for (auto EC : {SEC, FEC}) { + for (auto *Ty : {FloatTy, Int32Ty, Int8Ty}) { + Constant *Zero = Constant::getNullValue(Ty); + Constant *One = Constant::getAllOnesValue(Ty); + + for (auto *C : {Zero, One}) { + Constant *Splat = ConstantVector::getSplat(EC, C); + 
ASSERT_NE(nullptr, Splat); + + Constant *SplatVal = Splat->getSplatValue(); + EXPECT_NE(nullptr, SplatVal); + EXPECT_EQ(SplatVal, C); + } + } + } + } +} + } // end anonymous namespace } // end namespace llvm Index: llvm/test/Transforms/InstSimplify/vscale.ll =================================================================== --- llvm/test/Transforms/InstSimplify/vscale.ll +++ llvm/test/Transforms/InstSimplify/vscale.ll @@ -95,6 +95,15 @@ ret i32 %r } +; more complicated expressions + +define @cmp_le_smax_always_true( %x) { +; CHECK-LABEL: @cmp_le_smax_always_true( +; CHECK-NEXT: ret shufflevector ( insertelement ( undef, i1 true, i32 0), undef, zeroinitializer) + %cmp = icmp sle %x, shufflevector ( insertelement ( undef, i64 9223372036854775807, i32 0), undef, zeroinitializer) + ret %cmp +} + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Memory Access and Addressing Operations ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Index: llvm/lib/IR/Constants.cpp =================================================================== --- llvm/lib/IR/Constants.cpp +++ llvm/lib/IR/Constants.cpp @@ -1588,6 +1588,27 @@ return CV->getSplatValue(); if (const ConstantVector *CV = dyn_cast(this)) return CV->getSplatValue(AllowUndefs); + + // Check if this is a constant expression splat of the form returned by + // ConstantVector::getSplat() + const auto *Shuf = dyn_cast(this); + if (Shuf && Shuf->getOpcode() == Instruction::ShuffleVector && + isa(Shuf->getOperand(1))) { + + const auto *IElt = dyn_cast(Shuf->getOperand(0)); + if (IElt && IElt->getOpcode() == Instruction::InsertElement && + isa(IElt->getOperand(0))) { + + ArrayRef Mask = Shuf->getShuffleMask(); + Constant *SplatVal = IElt->getOperand(1); + ConstantInt *Index = dyn_cast(IElt->getOperand(2)); + + if (Index && Index->getValue() == 0 && + std::all_of(Mask.begin(), Mask.end(), [](int I) { return I == 0; })) + return SplatVal; + } + } + return nullptr; } 
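The shape that the Constants.cpp hunk above looks for, a shufflevector of an insertelement into undef at lane 0 with an all-zero mask, can be sketched without any LLVM types. This is an illustration under stated assumptions, not the patch's code: `InsertElt`, `Shuffle`, and `matchSplat` are hypothetical names, and the scalar is modeled as a plain `int`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for the constant-expression nodes; none of these
// types exist in LLVM.
struct InsertElt {
  bool BaseIsUndef; // operand 0 is undef
  int Scalar;       // operand 1, the value being inserted
  int Index;        // operand 2, the lane it is inserted into
};

struct Shuffle {
  InsertElt Src;         // operand 0, the insertelement expression
  bool SecondIsUndef;    // operand 1 is undef
  std::vector<int> Mask; // shuffle mask
};

// Writes the splatted scalar to Out and returns true if S has the form
//   shufflevector(insertelement(undef, X, 0), undef, <all-zero mask>).
// Any other shape is rejected and Out is left untouched.
bool matchSplat(const Shuffle &S, int &Out) {
  if (!S.SecondIsUndef || !S.Src.BaseIsUndef || S.Src.Index != 0)
    return false;
  bool AllZero = std::all_of(S.Mask.begin(), S.Mask.end(),
                             [](int I) { return I == 0; });
  if (!AllZero)
    return false;
  Out = S.Src.Scalar;
  return true;
}
```

The mask check matters because every result lane must read lane 0 of the first operand, which is the only lane the insertelement defines.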
-------------- next part -------------- A non-text attachment was scrubbed... Name: D82416.276193.patch Type: text/x-patch Size: 3369 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:15:58 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:15:58 +0000 (UTC) Subject: [PATCH] D83041: [SVE][CodeGen] Legalisation of unpredicated store instructions In-Reply-To: References: Message-ID: efriedma added inline comments. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2513 + Ptr = DAG.getObjectPtrOffset(DL, Ptr, IncrementSize); + } ---------------- It might make sense to write a helper with the pointer increment logic? Not sure how many places you need to repeat it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83041/new/ https://reviews.llvm.org/D83041 From llvm-commits at lists.llvm.org Tue Jul 7 13:19:55 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:19:55 +0000 (UTC) Subject: [PATCH] D83341: [SVE] Scalarize fixed length masked loads and stores. In-Reply-To: References: Message-ID: <1eed463ac8da545f746bc347eda2c636@localhost.localdomain> paulwalker-arm added a reviewer: kmclaughlin. paulwalker-arm marked an inline comment as done. paulwalker-arm added inline comments. ================ Comment at: llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-load.ll:1 +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -S %s -scalarize-masked-mem-intrin -mtriple=aarch64-linux-gnu | FileCheck %s ---------------- For information I just copied these tests from ScalarizeMaskedMemIntrin/X86. I did consider moving them somewhere common but when we add code generation support for fixed length masked ops they'll diverge anyway. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83341/new/ https://reviews.llvm.org/D83341 From llvm-commits at lists.llvm.org Tue Jul 7 13:19:56 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:19:56 +0000 (UTC) Subject: [PATCH] D79910: [x86][seses] Add clang flag; Use lvi-cfi with seses In-Reply-To: References: Message-ID: <3d36bf37259a01a0219a32f3f5f921f6@localhost.localdomain> zbrid updated this revision to Diff 276196. zbrid added a comment. rebase prior to commit Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79910/new/ https://reviews.llvm.org/D79910 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Arch/X86.cpp clang/test/Driver/x86-target-features.c llvm/lib/Target/X86/X86.td llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86Subtarget.h llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79910.276196.patch Type: text/x-patch Size: 10042 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:20:53 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:20:53 +0000 (UTC) Subject: [PATCH] D81911: [IR] Fix getBaseObject for GlobalAlias-to-GlobalIFunc In-Reply-To: References: Message-ID: <3ff8dc70f2fd580d3d4b705ed856a4a1@localhost.localdomain> DmitryPolukhin added a comment. I did a bit of archeology and it turns out that getBaseObject was part of what was moved from GlobalAlias to GlobalIndirectSymbol in https://github.com/llvm/llvm-project/commit/95549497ec8b5269f0439f12859537b7371b7c90 It looks like the simplest solution is to handle nullptr from getBaseObject in computeAliasSummary... 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81911/new/ https://reviews.llvm.org/D81911 From llvm-commits at lists.llvm.org Tue Jul 7 13:21:36 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via llvm-commits) Date: Tue, 07 Jul 2020 13:21:36 -0700 (PDT) Subject: [llvm] 9d9e499 - [x86][seses] Add clang flag; Use lvi-cfi with seses Message-ID: <5f04d950.1c69fb81.bd60a.8785@mx.google.com> Author: Zola Bridges Date: 2020-07-07T13:20:13-07:00 New Revision: 9d9e499840af670b9644af77ce846c52085c23a1 URL: https://github.com/llvm/llvm-project/commit/9d9e499840af670b9644af77ce846c52085c23a1 DIFF: https://github.com/llvm/llvm-project/commit/9d9e499840af670b9644af77ce846c52085c23a1.diff LOG: [x86][seses] Add clang flag; Use lvi-cfi with seses This patch creates a clang flag to enable SESES. This flag also ensures that lvi-cfi is on when using seses via clang. SESES should use lvi-cfi to mitigate returns and indirect branches. The flag to enable the SESES functionality only without lvi-cfi is now -x86-seses-enable-without-lvi-cfi to warn users part of the mitigation is not enabled if they use this flag. This is useful in case folks want to see the cost of SESES separate from the LVI-CFI. 
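The flag interaction described in the log message above (-mseses turns on lvi-cfi unless the user opts out) can be sketched as a small function. This is a simplified model and not the driver code: `SesesArgs` and `sesesFeatures` are hypothetical names, the -mno-lvi-cfi opt-out is taken from the driver hunk in this commit, and the conflict diagnostics against -mlvi-hardening and the Spectre options are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of the relevant command-line state; not a clang type.
struct SesesArgs {
  bool Seses;    // -mseses given, and not overridden by -mno-seses
  bool NoLviCfi; // -mno-lvi-cfi given
};

// Returns the target features the SESES handling would add under this model.
std::vector<std::string> sesesFeatures(const SesesArgs &A) {
  std::vector<std::string> Features;
  if (!A.Seses)
    return Features;
  Features.push_back("+seses");
  // lvi-cfi is implied unless the user explicitly disabled it.
  if (!A.NoLviCfi)
    Features.push_back("+lvi-cfi");
  return Features;
}
```

This matches the SESES and SESES-NOLVICFI expectations in clang/test/Driver/x86-target-features.c from the same commit.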
Reviewed By: sconstab Differential Revision: https://reviews.llvm.org/D79910 Added: Modified: clang/docs/ClangCommandLineReference.rst clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Arch/X86.cpp clang/test/Driver/x86-target-features.c llvm/lib/Target/X86/X86.td llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86Subtarget.h llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll Removed: ################################################################################ diff --git a/clang/docs/ClangCommandLineReference.rst b/clang/docs/ClangCommandLineReference.rst index 672c4ae80e73..0b56b7ac4206 100644 --- a/clang/docs/ClangCommandLineReference.rst +++ b/clang/docs/ClangCommandLineReference.rst @@ -2755,6 +2755,10 @@ Generate a \_\_mcount\_loc section entry for each \_\_fentry\_\_ call. Make StdCall calling convention the default +.. option:: -mseses, -mno-seses + +Enable speculative execution side effect suppression (SESES). Includes LVI control flow integrity mitigations + .. option:: -msign-return-address= Select return address signing scope diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 745c696bcaa3..c95d427da267 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -2264,6 +2264,11 @@ def mlvi_cfi : Flag<["-"], "mlvi-cfi">, Group, Flags<[CoreOption,Driver HelpText<"Enable only control-flow mitigations for Load Value Injection (LVI)">; def mno_lvi_cfi : Flag<["-"], "mno-lvi-cfi">, Group, Flags<[CoreOption,DriverOption]>, HelpText<"Disable control-flow mitigations for Load Value Injection (LVI)">; +def m_seses : Flag<["-"], "mseses">, Group, Flags<[CoreOption, DriverOption]>, + HelpText<"Enable speculative execution side effect suppression (SESES). 
" + "Includes LVI control flow integrity mitigations">; +def mno_seses : Flag<["-"], "mno-seses">, Group, Flags<[CoreOption, DriverOption]>, + HelpText<"Disable speculative execution side effect suppression (SESES)">; def mrelax : Flag<["-"], "mrelax">, Group, HelpText<"Enable linker relaxation">; diff --git a/clang/lib/Driver/ToolChains/Arch/X86.cpp b/clang/lib/Driver/ToolChains/Arch/X86.cpp index dbbc025de38c..aa95c4189d1e 100644 --- a/clang/lib/Driver/ToolChains/Arch/X86.cpp +++ b/clang/lib/Driver/ToolChains/Arch/X86.cpp @@ -184,6 +184,24 @@ void x86::getX86TargetFeatures(const Driver &D, const llvm::Triple &Triple, LVIOpt = options::OPT_mlvi_cfi; } + if (Args.hasFlag(options::OPT_m_seses, options::OPT_mno_seses, false)) { + if (LVIOpt == options::OPT_mlvi_hardening) + D.Diag(diag::err_drv_argument_not_allowed_with) + << D.getOpts().getOptionName(options::OPT_mlvi_hardening) + << D.getOpts().getOptionName(options::OPT_m_seses); + + if (SpectreOpt != clang::driver::options::ID::OPT_INVALID) + D.Diag(diag::err_drv_argument_not_allowed_with) + << D.getOpts().getOptionName(SpectreOpt) + << D.getOpts().getOptionName(options::OPT_m_seses); + + Features.push_back("+seses"); + if (!Args.hasArg(options::OPT_mno_lvi_cfi)) { + Features.push_back("+lvi-cfi"); + LVIOpt = options::OPT_mlvi_cfi; + } + } + if (SpectreOpt != clang::driver::options::ID::OPT_INVALID && LVIOpt != clang::driver::options::ID::OPT_INVALID) { D.Diag(diag::err_drv_argument_not_allowed_with) diff --git a/clang/test/Driver/x86-target-features.c b/clang/test/Driver/x86-target-features.c index 817caeecd71e..85a9374ab905 100644 --- a/clang/test/Driver/x86-target-features.c +++ b/clang/test/Driver/x86-target-features.c @@ -178,6 +178,27 @@ // RUN: %clang -target i386-linux-gnu -mlvi-hardening -mretpoline-external-thunk %s -### -o %t.o 2>&1 | FileCheck -check-prefix=LVIHARDENING-RETPOLINE-EXTERNAL-THUNK %s // LVIHARDENING-RETPOLINE-EXTERNAL-THUNK: error: invalid argument 'mretpoline-external-thunk' not allowed 
with 'mlvi-hardening' +// RUN: %clang -target i386-linux-gnu -mseses %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SESES %s +// RUN: %clang -target i386-linux-gnu -mno-seses %s -### -o %t.o 2>&1 | FileCheck -check-prefix=NO-SESES %s +// SESES: "-target-feature" "+seses" +// SESES: "-target-feature" "+lvi-cfi" +// NO-SESES-NOT: seses +// NO-SESES-NOT: lvi-cfi + +// RUN: %clang -target i386-linux-gnu -mseses -mno-lvi-cfi %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SESES-NOLVICFI %s +// SESES-NOLVICFI: "-target-feature" "+seses" +// SESES-NOLVICFI-NOT: lvi-cfi + +// RUN: %clang -target i386-linux-gnu -mseses -mspeculative-load-hardening %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SESES-SLH %s +// SESES-SLH: error: invalid argument 'mspeculative-load-hardening' not allowed with 'mseses' +// RUN: %clang -target i386-linux-gnu -mseses -mretpoline %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SESES-RETPOLINE %s +// SESES-RETPOLINE: error: invalid argument 'mretpoline' not allowed with 'mseses' +// RUN: %clang -target i386-linux-gnu -mseses -mretpoline-external-thunk %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SESES-RETPOLINE-EXTERNAL-THUNK %s +// SESES-RETPOLINE-EXTERNAL-THUNK: error: invalid argument 'mretpoline-external-thunk' not allowed with 'mseses' + +// RUN: %clang -target i386-linux-gnu -mseses -mlvi-hardening %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SESES-LVIHARDENING %s +// SESES-LVIHARDENING: error: invalid argument 'mlvi-hardening' not allowed with 'mseses' + // RUN: %clang -target i386-linux-gnu -mwaitpkg %s -### -o %t.o 2>&1 | FileCheck -check-prefix=WAITPKG %s // RUN: %clang -target i386-linux-gnu -mno-waitpkg %s -### -o %t.o 2>&1 | FileCheck -check-prefix=NO-WAITPKG %s // WAITPKG: "-target-feature" "+waitpkg" diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td index eb50e6bf9ff1..dc1ff72add49 100644 --- a/llvm/lib/Target/X86/X86.td +++ b/llvm/lib/Target/X86/X86.td @@ -455,6 +455,15 @@ def FeatureLVIControlFlowIntegrity 
"LFENCE instruction to serialize control flow. Also decompose RET " "instructions into a POP+LFENCE+JMP sequence.">; +// Enable SESES to mitigate speculative execution attacks +def FeatureSpeculativeExecutionSideEffectSuppression + : SubtargetFeature< + "seses", "UseSpeculativeExecutionSideEffectSuppression", "true", + "Prevent speculative execution side channel timing attacks by " + "inserting a speculation barrier before memory reads, memory writes, " + "and conditional branches. Implies LVI Control Flow integrity.", + [FeatureLVIControlFlowIntegrity]>; + // Mitigate LVI attacks against data loads def FeatureLVILoadHardening : SubtargetFeature< diff --git a/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp b/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp index 75138f2de696..7e91c37367d2 100644 --- a/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp +++ b/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp @@ -30,7 +30,7 @@ using namespace llvm; STATISTIC(NumLFENCEsInserted, "Number of lfence instructions inserted"); static cl::opt EnableSpeculativeExecutionSideEffectSuppression( - "x86-seses-enable", + "x86-seses-enable-without-lvi-cfi", cl::desc("Force enable speculative execution side effect suppression. " "(Note: User must pass -mlvi-cfi in order to mitigate indirect " "branches and returns.)"), @@ -91,10 +91,12 @@ bool X86SpeculativeExecutionSideEffectSuppression::runOnMachineFunction( const auto &OptLevel = MF.getTarget().getOptLevel(); const X86Subtarget &Subtarget = MF.getSubtarget(); - // Check whether SESES needs to run as the fallback for LVI at O0 or if the - // user explicitly passed the SESES flag. + // Check whether SESES needs to run as the fallback for LVI at O0, whether the + // user explicitly passed an SESES flag, or whether the SESES target feature + // was set. 
if (!EnableSpeculativeExecutionSideEffectSuppression && - !(Subtarget.useLVILoadHardening() && OptLevel == CodeGenOpt::None)) + !(Subtarget.useLVILoadHardening() && OptLevel == CodeGenOpt::None) && + !Subtarget.useSpeculativeExecutionSideEffectSuppression()) return false; LLVM_DEBUG(dbgs() << "********** " << getPassName() << " : " << MF.getName() diff --git a/llvm/lib/Target/X86/X86Subtarget.h b/llvm/lib/Target/X86/X86Subtarget.h index 6a2879e4a5d7..de45d357e3c2 100644 --- a/llvm/lib/Target/X86/X86Subtarget.h +++ b/llvm/lib/Target/X86/X86Subtarget.h @@ -442,6 +442,9 @@ class X86Subtarget final : public X86GenSubtargetInfo { /// POP+LFENCE+JMP sequence. bool UseLVIControlFlowIntegrity = false; + /// Enable Speculative Execution Side Effect Suppression + bool UseSpeculativeExecutionSideEffectSuppression = false; + /// Insert LFENCE instructions to prevent data speculatively injected into /// loads from being used maliciously. bool UseLVILoadHardening = false; @@ -759,6 +762,9 @@ class X86Subtarget final : public X86GenSubtargetInfo { bool useGLMDivSqrtCosts() const { return UseGLMDivSqrtCosts; } bool useLVIControlFlowIntegrity() const { return UseLVIControlFlowIntegrity; } bool useLVILoadHardening() const { return UseLVILoadHardening; } + bool useSpeculativeExecutionSideEffectSuppression() const { + return UseSpeculativeExecutionSideEffectSuppression; + } unsigned getPreferVectorWidth() const { return PreferVectorWidth; } unsigned getRequiredVectorWidth() const { return RequiredVectorWidth; } diff --git a/llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll b/llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll index acbdc9e9387e..fdd56382448a 100644 --- a/llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll +++ b/llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll @@ -1,8 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc 
-mtriple=x86_64-unknown-linux-gnu -x86-seses-enable %s -o - | FileCheck %s -; RUN: llc -mtriple=x86_64-unknown-linux-gnu -x86-seses-enable -x86-seses-one-lfence-per-bb %s -o - | FileCheck %s --check-prefix=X86-ONE-LFENCE -; RUN: llc -mtriple=x86_64-unknown-linux-gnu -x86-seses-enable -x86-seses-omit-branch-lfences %s -o - | FileCheck %s --check-prefix=X86-OMIT-BR -; RUN: llc -mtriple=x86_64-unknown-linux-gnu -x86-seses-enable -x86-seses-only-lfence-non-const %s -o - | FileCheck %s --check-prefix=X86-NON-CONST +; RUN: llc -mtriple=x86_64-unknown-linux-gnu -x86-seses-enable-without-lvi-cfi %s -o - | FileCheck %s +; RUN: llc -mtriple=x86_64-unknown-linux-gnu -x86-seses-enable-without-lvi-cfi -x86-seses-one-lfence-per-bb %s -o - | FileCheck %s --check-prefix=X86-ONE-LFENCE +; RUN: llc -mtriple=x86_64-unknown-linux-gnu -x86-seses-enable-without-lvi-cfi -x86-seses-omit-branch-lfences %s -o - | FileCheck %s --check-prefix=X86-OMIT-BR +; RUN: llc -mtriple=x86_64-unknown-linux-gnu -x86-seses-enable-without-lvi-cfi -x86-seses-only-lfence-non-const %s -o - | FileCheck %s --check-prefix=X86-NON-CONST define void @_Z4buzzv() { ; CHECK-LABEL: _Z4buzzv: From llvm-commits at lists.llvm.org Tue Jul 7 13:21:42 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:21:42 +0000 (UTC) Subject: [PATCH] D79910: [x86][seses] Add clang flag; Use lvi-cfi with seses In-Reply-To: References: Message-ID: <55efb5eb8e3afd20ee03935dc7a3f536@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG9d9e499840af: [x86][seses] Add clang flag; Use lvi-cfi with seses (authored by zbrid). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79910/new/ https://reviews.llvm.org/D79910 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Arch/X86.cpp clang/test/Driver/x86-target-features.c llvm/lib/Target/X86/X86.td llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86Subtarget.h llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79910.276197.patch Type: text/x-patch Size: 10042 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:24:14 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:24:14 +0000 (UTC) Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap. In-Reply-To: References: Message-ID: <0f0dba2030c5d9ca75b76fb6554d8095@localhost.localdomain> fhahn planned changes to this revision. fhahn added a comment. In D83335#2137150, @efriedma wrote: > I'm concerned that the behavior of queues with multiple candidates with the same score might not be consistent across compilers. (This is similar to using llvm::sort when you really need std::stable_sort.) It looks like the comparators try hard to break ties between candidates with the same score (via an increasing `NodeQueueId`), but I think I noticed cases where we still visit candidates in a slightly different order.
I'll take a closer look Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83335/new/ https://reviews.llvm.org/D83335 From llvm-commits at lists.llvm.org Tue Jul 7 13:28:15 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:28:15 +0000 (UTC) Subject: [PATCH] D83021: [Inliner] Don't skip inlining alwaysinline in optnone functions In-Reply-To: References: Message-ID: <94e21828ec9c333ab0f2838cfe891d70@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG2279380eab08: [Inliner] Don't skip inlining alwaysinline in optnone functions (authored by aeubanks). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83021/new/ https://reviews.llvm.org/D83021 Files: llvm/lib/Transforms/IPO/Inliner.cpp llvm/test/Transforms/Inline/inline-optnone.ll Index: llvm/test/Transforms/Inline/inline-optnone.ll =================================================================== --- llvm/test/Transforms/Inline/inline-optnone.ll +++ llvm/test/Transforms/Inline/inline-optnone.ll @@ -1,4 +1,5 @@ ; RUN: opt < %s -inline -S | FileCheck %s +; RUN: opt < %s --passes=inline -S | FileCheck %s ; Test that functions with attribute optnone are not inlined. ; Also test that only functions with attribute alwaysinline are Index: llvm/lib/Transforms/IPO/Inliner.cpp =================================================================== --- llvm/lib/Transforms/IPO/Inliner.cpp +++ llvm/lib/Transforms/IPO/Inliner.cpp @@ -791,7 +791,9 @@ LazyCallGraph::Node &N = *CG.lookup(F); if (CG.lookupSCC(N) != C) continue; - if (F.hasOptNone()) { + if (!Calls[I].first->getCalledFunction()->hasFnAttribute( + Attribute::AlwaysInline) && + F.hasOptNone()) { setInlineRemark(*Calls[I].first, "optnone attribute"); continue; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83021.275685.patch Type: text/x-patch Size: 1009 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:28:19 2020 From: llvm-commits at lists.llvm.org (Zola Bridges via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:28:19 +0000 (UTC) Subject: [PATCH] D79910: [x86][seses] Add clang flag; Use lvi-cfi with seses In-Reply-To: References: Message-ID: <1fd4847c1a78012fb11b04a4f237ccd2@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG9d9e499840af: [x86][seses] Add clang flag; Use lvi-cfi with seses (authored by zbrid). Herald added a subscriber: jfb. Changed prior to commit: https://reviews.llvm.org/D79910?vs=272117&id=275687#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79910/new/ https://reviews.llvm.org/D79910 Files: clang/docs/ClangCommandLineReference.rst clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Arch/X86.cpp clang/test/Driver/x86-target-features.c llvm/lib/Target/X86/X86.td llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp llvm/lib/Target/X86/X86Subtarget.h llvm/test/CodeGen/X86/speculative-execution-side-effect-suppression.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79910.275687.patch Type: text/x-patch Size: 10042 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:31:05 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:31:05 +0000 (UTC) Subject: [PATCH] D83336: [flang] Support for image selectors In-Reply-To: References: Message-ID: <083c323582440dac6abe05a73ae8e1dd@localhost.localdomain> tskeith accepted this revision. tskeith added inline comments. This revision is now accepted and ready to land. 
================ Comment at: flang/lib/Semantics/check-coarray.cpp:113 + context_.Say(parser::FindSourceLocation(imageSelectorSpec), // C929 + "TEAM STAT variable can only be specified once"_err_en_US); + } ---------------- I think this should be "STAT variable..." ================ Comment at: flang/lib/Semantics/expression.cpp:1090-1100 + std::visit( + common::visitors{ + [&](const parser::ImageSelectorSpec::Stat &statVar) { + Analyze(statVar.v); + }, + [&](const parser::TeamValue &teamValue) { Analyze(teamValue.v); }, + [&](const parser::ImageSelectorSpec::Team_Number &teamNumber) { ---------------- Because all of the cases are the same, this can be simplified to: ``` std::visit([&](const auto &x) { Analyze(x.v); }, imageSelSpec.u); ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83336/new/ https://reviews.llvm.org/D83336 From llvm-commits at lists.llvm.org Tue Jul 7 13:33:42 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:33:42 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <12b503b0bff707e1b02cdf54c5697e75@localhost.localdomain> zequanwu added inline comments. ================ Comment at: llvm/test/Other/opt-O2-pipeline.ll:289 +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ---------------- MaskRay wrote: > zequanwu wrote: > > zequanwu wrote: > > > nikic wrote: > > > > hans wrote: > > > > > nikic wrote: > > > > > > Is it possible to switch this pass to use LazyBPI / LazyBFA, only fetched if PGO is actually in use? > > > > > > > > > > > > PGO functionality that most people don't use adding expensive analysis passes like PDT should be avoided. > > > > > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. 
> > > > > > > > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > > > > I wonder if just switching to LazyBlockFrequencyInfo would help though. It looks to me like the CGProfile would request info about each function anyway. > > > > > > > > It would only help if there is some way to only fetch the analysis conditionally. I believe many PGO passes use something like PSI.hasProfileSummary() or F.hasProfileData() for that. > > > > > > > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > > > > > > > Right, just disabling this by default in clang/opt would also work. > > > > > > > > For reference, the current compile-time numbers for this patch: https://llvm-compile-time-tracker.com/compare.php?from=516ff1d4baee28b1911737e47b42973567adf8ff&to=8df840660bb764b6653fcfd9ac7a72cc6adebde6&stat=instructions Not huge, but it adds up (some similar regressions have been introduced in LLVM 10). > > > Do you mean disabling it just for LPM or both? > > > I was surprised to see that Clang sets Opts.CallGraphProfile solely based on whether the integrated assembler is used. Maybe a better fix is to only set that to true when a profile is actually being used? > > For Clang, a better fix I think is that `Opts.CallGraphProfile` should based on both whether the integrated assembler is used and whether profile instrumentation is turned on. What do you think? > I'd prefer not having `CallGraphProfile` > > * `-no-integrated-as -S` => no .cgprofile (.llvm_addrsig behaves this way) > * `-S` -> .cgprofile As discussed above, I think `CGProfilePass` should be disabled by default in clang unless `-no-integrated-as` is not given and `-fprofile-instrument-use-path=` is given. 
So, `Opts.CallGraphProfile` is a convenient switch for that. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 13:34:40 2020 From: llvm-commits at lists.llvm.org (Philip Reames via llvm-commits) Date: Tue, 07 Jul 2020 13:34:40 -0700 (PDT) Subject: [llvm] b172cd7 - [Statepoint] Factor out logic for non-stack non-vreg lowering [almost NFC] Message-ID: <5f04dc60.1c69fb81.9eb5b.8075@mx.google.com> Author: Philip Reames Date: 2020-07-07T13:34:28-07:00 New Revision: b172cd7812405beaa9db817d85ac1458f7e31547 URL: https://github.com/llvm/llvm-project/commit/b172cd7812405beaa9db817d85ac1458f7e31547 DIFF: https://github.com/llvm/llvm-project/commit/b172cd7812405beaa9db817d85ac1458f7e31547.diff LOG: [Statepoint] Factor out logic for non-stack non-vreg lowering [almost NFC] This is inspired by D81648. The basic idea is to have the set of SDValues which are lowered as either constants or direct frame references explicit in one place, and to separate them clearly from the spilling logic. This is not NFC in that the handling of constants larger than 64 bits has changed. The old lowering would crash on values which could not be encoded as a sign-extended 64-bit value. The new lowering just spills all constants > 64 bits. We could be consistent about doing the sext(Con64) optimization, but I happen to know that this code path is utterly unexercised in practice, so simple is better for now.
Added: Modified: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp llvm/test/CodeGen/X86/statepoint-vector.ll Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp index 20aae87531ad..d63fedda3bd1 100644 --- a/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp @@ -221,6 +221,28 @@ static Optional findPreviousSpillSlot(const Value *Val, return None; } + +/// Return true if-and-only-if the given SDValue can be lowered as either a +/// constant argument or a stack reference. The key point is that the value +/// doesn't need to be spilled or tracked as a vreg use. +static bool willLowerDirectly(SDValue Incoming) { + // We are making an unchecked assumption that the frame size <= 2^16 as that + // is the largest offset which can be encoded in the stackmap format. + if (isa(Incoming)) + return true; + + // The largest constant describeable in the StackMap format is 64 bits. + // Potential Optimization: Constants values are sign extended by consumer, + // and thus there are many constants of static type > 64 bits whose value + // happens to be sext(Con64) and could thus be lowered directly. + if (Incoming.getValueType().getSizeInBits() > 64) + return false; + + return (isa(Incoming) || isa(Incoming) || + Incoming.isUndef()); +} + + /// Try to find existing copies of the incoming values in stack slots used for /// statepoint spilling. If we can find a spill slot for the incoming value, /// mark that slot as allocated, and reuse the same slot for this safepoint. 
@@ -230,12 +252,10 @@ static void reservePreviousStackSlotForValue(const Value *IncomingValue, SelectionDAGBuilder &Builder) { SDValue Incoming = Builder.getValue(IncomingValue); - if (isa(Incoming) || isa(Incoming) || - isa(Incoming) || Incoming.isUndef()) { - // We won't need to spill this, so no need to check for previously - // allocated stack slots + // If we won't spill this, we don't need to check for previously allocated + // stack slots. + if (willLowerDirectly(Incoming)) return; - } SDValue OldLocation = Builder.StatepointLowering.getLocation(Incoming); if (OldLocation.getNode()) @@ -384,45 +404,53 @@ lowerIncomingStatepointValue(SDValue Incoming, bool RequireSpillSlot, SmallVectorImpl &Ops, SmallVectorImpl &MemRefs, SelectionDAGBuilder &Builder) { - // Note: We know all of these spills are independent, but don't bother to - // exploit that chain wise. DAGCombine will happily do so as needed, so - // doing it here would be a small compile time win at most. - SDValue Chain = Builder.getRoot(); - - if (Incoming.isUndef() && Incoming.getValueType().getSizeInBits() <= 64) { - // Put an easily recognized constant that's unlikely to be a valid - // value so that uses of undef by the consumer of the stackmap is - // easily recognized. This is legal since the compiler is always - // allowed to chose an arbitrary value for undef. - pushStackMapConstant(Ops, Builder, 0xFEFEFEFE); - return; + + if (willLowerDirectly(Incoming)) { + if (FrameIndexSDNode *FI = dyn_cast(Incoming)) { + // This handles allocas as arguments to the statepoint (this is only + // really meaningful for a deopt value. For GC, we'd be trying to + // relocate the address of the alloca itself?) 
+ assert(Incoming.getValueType() == Builder.getFrameIndexTy() && + "Incoming value is a frame index!"); + Ops.push_back(Builder.DAG.getTargetFrameIndex(FI->getIndex(), + Builder.getFrameIndexTy())); + + auto &MF = Builder.DAG.getMachineFunction(); + auto *MMO = getMachineMemOperand(MF, *FI); + MemRefs.push_back(MMO); + return; + } + + assert(Incoming.getValueType().getSizeInBits() <= 64); + + if (Incoming.isUndef()) { + // Put an easily recognized constant that's unlikely to be a valid + // value so that uses of undef by the consumer of the stackmap is + // easily recognized. This is legal since the compiler is always + // allowed to chose an arbitrary value for undef. + pushStackMapConstant(Ops, Builder, 0xFEFEFEFE); + return; + } + + // If the original value was a constant, make sure it gets recorded as + // such in the stackmap. This is required so that the consumer can + // parse any internal format to the deopt state. It also handles null + // pointers and other constant pointers in GC states. + if (ConstantSDNode *C = dyn_cast(Incoming)) { + pushStackMapConstant(Ops, Builder, C->getSExtValue()); + return; + } else if (ConstantFPSDNode *C = dyn_cast(Incoming)) { + pushStackMapConstant(Ops, Builder, + C->getValueAPF().bitcastToAPInt().getZExtValue()); + return; + } + + llvm_unreachable("unhandled direct lowering case"); } - // If the original value was a constant, make sure it gets recorded as - // such in the stackmap. This is required so that the consumer can - // parse any internal format to the deopt state. It also handles null - // pointers and other constant pointers in GC states. Note the constant - // vectors do not appear to actually hit this path and that anything larger - // than an i64 value (not type!) will fail asserts here. 
- if (ConstantFPSDNode *C = dyn_cast(Incoming)) { - pushStackMapConstant(Ops, Builder, - C->getValueAPF().bitcastToAPInt().getZExtValue()); - } else if (ConstantSDNode *C = dyn_cast(Incoming)) { - pushStackMapConstant(Ops, Builder, C->getSExtValue()); - } else if (FrameIndexSDNode *FI = dyn_cast(Incoming)) { - // This handles allocas as arguments to the statepoint (this is only - // really meaningful for a deopt value. For GC, we'd be trying to - // relocate the address of the alloca itself?) - assert(Incoming.getValueType() == Builder.getFrameIndexTy() && - "Incoming value is a frame index!"); - Ops.push_back(Builder.DAG.getTargetFrameIndex(FI->getIndex(), - Builder.getFrameIndexTy())); - auto &MF = Builder.DAG.getMachineFunction(); - auto *MMO = getMachineMemOperand(MF, *FI); - MemRefs.push_back(MMO); - } else if (!RequireSpillSlot) { + if (!RequireSpillSlot) { // If this value is live in (not live-on-return, or live-through), we can // treat it the same way patchpoint treats it's "live in" values. We'll // end up folding some of these into stack references, but they'll be @@ -432,20 +460,20 @@ lowerIncomingStatepointValue(SDValue Incoming, bool RequireSpillSlot, // fix-up pass should be executed to force spilling of such registers. Ops.push_back(Incoming); } else { - // Otherwise, locate a spill slot and explicitly spill it so it - // can be found by the runtime later. We currently do not support - // tracking values through callee saved registers to their eventual - // spill location. This would be a useful optimization, but would - // need to be optional since it requires a lot of complexity on the - // runtime side which not all would support. + // Otherwise, locate a spill slot and explicitly spill it so it can be + // found by the runtime later. Note: We know all of these spills are + // independent, but don't bother to exploit that chain wise. DAGCombine + // will happily do so as needed, so doing it here would be a small compile + // time win at most. 
+ SDValue Chain = Builder.getRoot(); auto Res = spillIncomingStatepointValue(Incoming, Chain, Builder); Ops.push_back(std::get<0>(Res)); if (auto *MMO = std::get<2>(Res)) MemRefs.push_back(MMO); Chain = std::get<1>(Res);; + Builder.DAG.setRoot(Chain); } - Builder.DAG.setRoot(Chain); } /// Lower deopt state and gc pointer arguments of the statepoint. The actual diff --git a/llvm/test/CodeGen/X86/statepoint-vector.ll b/llvm/test/CodeGen/X86/statepoint-vector.ll index 3c2408251d6c..a7d7be8ed069 100644 --- a/llvm/test/CodeGen/X86/statepoint-vector.ll +++ b/llvm/test/CodeGen/X86/statepoint-vector.ll @@ -112,21 +112,26 @@ entry: ret <2 x i8 addrspace(1)*> %obj.relocated } -; Check that we can lower a constant typed as i128 correctly. Note that the -; actual value is representable in 64 bits. We don't have a representation -; of larger than 64 bit constant in the StackMap format. +; Check that we can lower a constant typed as i128 correctly. We don't have +; a representation of larger than 64 bit constant in the StackMap format. At +; the moment, this simply means spilling them, but there's a potential +; optimization for values representable as sext(Con64). define void @test5() gc "statepoint-example" { ; CHECK-LABEL: test5: ; CHECK: # %bb.0: # %entry -; CHECK-NEXT: pushq %rax -; CHECK-NEXT: .cfi_def_cfa_offset 16 +; CHECK-NEXT: subq $40, %rsp +; CHECK-NEXT: .cfi_def_cfa_offset 48 +; CHECK-NEXT: xorps %xmm0, %xmm0 +; CHECK-NEXT: movups %xmm0, {{[0-9]+}}(%rsp) +; CHECK-NEXT: movq $-1, {{[0-9]+}}(%rsp) +; CHECK-NEXT: movq $-1, {{[0-9]+}}(%rsp) ; CHECK-NEXT: callq do_safepoint ; CHECK-NEXT: .Ltmp4: -; CHECK-NEXT: popq %rax +; CHECK-NEXT: addq $40, %rsp ; CHECK-NEXT: .cfi_def_cfa_offset 8 ; CHECK-NEXT: retq entry: - %safepoint_token = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @do_safepoint, i32 0, i32 0, i32 0, i32 0) ["deopt" (i128 0)] + %safepoint_token = call token (i64, i32, void ()*, i32, i32, ...) 
@llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @do_safepoint, i32 0, i32 0, i32 0, i32 0) ["deopt" (i128 0, i128 -1)] ret void } From llvm-commits at lists.llvm.org Tue Jul 7 13:36:20 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Tue, 07 Jul 2020 13:36:20 -0700 (PDT) Subject: [llvm] 42bb481 - AMDGPU/GlobalISel: Fix skipping unused kernel arguments Message-ID: <5f04dcc4.1c69fb81.a0387.f968@mx.google.com> Author: Matt Arsenault Date: 2020-07-07T16:36:13-04:00 New Revision: 42bb481442c3368f2e98f26da6151e7c5ad3ae8e URL: https://github.com/llvm/llvm-project/commit/42bb481442c3368f2e98f26da6151e7c5ad3ae8e DIFF: https://github.com/llvm/llvm-project/commit/42bb481442c3368f2e98f26da6151e7c5ad3ae8e.diff LOG: AMDGPU/GlobalISel: Fix skipping unused kernel arguments The tests in a5b9ad7e9aca1329ba310e638dafa58c47468a58 actually failed the verifier, which for some reason is not the default. Also add tests for 0-sized function arguments, which do not add entries to the expected register lists. 
Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp index 83e5fcef7d7b..d1701851fea0 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp @@ -523,8 +523,10 @@ bool AMDGPUCallLowering::lowerFormalArgumentsKernel( uint64_t ArgOffset = alignTo(ExplicitArgOffset, ABIAlign) + BaseOffset; ExplicitArgOffset = alignTo(ExplicitArgOffset, ABIAlign) + AllocSize; - if (Arg.use_empty()) + if (Arg.use_empty()) { + ++i; continue; + } ArrayRef OrigArgRegs = VRegs[i]; Register ArgReg = diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll index 4c48f9cc49fe..76a6f1732102 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -O0 -amdgpu-ir-lower-kernel-arguments=0 -stop-after=irtranslator -global-isel %s -o - | FileCheck -check-prefix=HSA-VI %s -; RUN: llc -mtriple=amdgcn-- -mcpu=fiji -O0 -amdgpu-ir-lower-kernel-arguments=0 -stop-after=irtranslator -global-isel %s -o - | FileCheck -check-prefix=LEGACY-MESA-VI %s +; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -O0 -amdgpu-ir-lower-kernel-arguments=0 -stop-after=irtranslator -verify-machineinstrs %s -o - | FileCheck -check-prefix=HSA-VI %s +; RUN: llc -global-isel -mtriple=amdgcn-- -mcpu=fiji -O0 -amdgpu-ir-lower-kernel-arguments=0 -stop-after=irtranslator -verify-machineinstrs %s -o - | FileCheck 
-check-prefix=LEGACY-MESA-VI %s define amdgpu_kernel void @i8_arg(i32 addrspace(1)* nocapture %out, i8 %in) nounwind { ; HSA-VI-LABEL: name: i8_arg @@ -1070,17 +1070,55 @@ define amdgpu_kernel void @i1_arg_sext_i64(i64 addrspace(1)* %out, i1 %x) nounwi ret void } -define amdgpu_kernel void @empty_struct_arg({} %in) nounwind { +; 0-sized arguments do not add a slot to the argument register set, so +; waste an index in the virtual register array. +define amdgpu_kernel void @empty_struct_arg({} %arg0, i32 %arg1) nounwind { ; HSA-VI-LABEL: name: empty_struct_arg ; HSA-VI: bb.1 (%ir-block.0): ; HSA-VI: liveins: $sgpr4_sgpr5 ; HSA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr4_sgpr5 + ; HSA-VI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0 + ; HSA-VI: [[PTR_ADD:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C]](s64) + ; HSA-VI: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD]](p4) :: (dereferenceable invariant load 4, align 16, addrspace 4) + ; HSA-VI: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; HSA-VI: G_STORE [[LOAD]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1) ; HSA-VI: S_ENDPGM 0 ; LEGACY-MESA-VI-LABEL: name: empty_struct_arg ; LEGACY-MESA-VI: bb.1 (%ir-block.0): ; LEGACY-MESA-VI: liveins: $sgpr0_sgpr1 ; LEGACY-MESA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr0_sgpr1 + ; LEGACY-MESA-VI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 36 + ; LEGACY-MESA-VI: [[PTR_ADD:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C]](s64) + ; LEGACY-MESA-VI: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD]](p4) :: (dereferenceable invariant load 4, addrspace 4) + ; LEGACY-MESA-VI: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; LEGACY-MESA-VI: G_STORE [[LOAD]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1) ; LEGACY-MESA-VI: S_ENDPGM 0 + store i32 %arg1, i32 addrspace(1)* undef + ret void +} + +define amdgpu_kernel void @empty_array_arg([0 x i8] %arg0, i32 %arg1) nounwind { + ; HSA-VI-LABEL: name: empty_array_arg + ; HSA-VI: bb.1 (%ir-block.0): + ; HSA-VI: liveins: $sgpr4_sgpr5 + ; 
HSA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr4_sgpr5 + ; HSA-VI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0 + ; HSA-VI: [[PTR_ADD:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C]](s64) + ; HSA-VI: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD]](p4) :: (dereferenceable invariant load 4, align 16, addrspace 4) + ; HSA-VI: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; HSA-VI: G_STORE [[LOAD]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1) + ; HSA-VI: S_ENDPGM 0 + ; LEGACY-MESA-VI-LABEL: name: empty_array_arg + ; LEGACY-MESA-VI: bb.1 (%ir-block.0): + ; LEGACY-MESA-VI: liveins: $sgpr0_sgpr1 + ; LEGACY-MESA-VI: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr0_sgpr1 + ; LEGACY-MESA-VI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 36 + ; LEGACY-MESA-VI: [[PTR_ADD:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C]](s64) + ; LEGACY-MESA-VI: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD]](p4) :: (dereferenceable invariant load 4, addrspace 4) + ; LEGACY-MESA-VI: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; LEGACY-MESA-VI: G_STORE [[LOAD]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1) + ; LEGACY-MESA-VI: S_ENDPGM 0 + store i32 %arg1, i32 addrspace(1)* undef ret void } @@ -1171,13 +1209,15 @@ define amdgpu_kernel void @packed_struct_argument_alignment(<{i32, i64}> %arg0, ; HSA-VI: [[EXTRACT1:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD]](s96), 32 ; HSA-VI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 13 ; HSA-VI: [[PTR_ADD1:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C1]](s64) - ; HSA-VI: [[LOAD1:%[0-9]+]]:_(s8) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) + ; HSA-VI: [[LOAD1:%[0-9]+]]:_(s96) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) + ; HSA-VI: [[EXTRACT2:%[0-9]+]]:_(s32) = G_EXTRACT [[LOAD1]](s96), 0 + ; HSA-VI: [[EXTRACT3:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD1]](s96), 32 ; HSA-VI: [[C2:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 ; HSA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C2]](p1) ; HSA-VI: G_STORE 
[[EXTRACT]](s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; HSA-VI: G_STORE [[EXTRACT1]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) - ; HSA-VI: G_STORE %3:_(s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) - ; HSA-VI: G_STORE %4:_(s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) + ; HSA-VI: G_STORE [[EXTRACT2]](s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) + ; HSA-VI: G_STORE [[EXTRACT3]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) ; HSA-VI: S_ENDPGM 0 ; LEGACY-MESA-VI-LABEL: name: packed_struct_argument_alignment ; LEGACY-MESA-VI: bb.1 (%ir-block.1): @@ -1190,13 +1230,15 @@ define amdgpu_kernel void @packed_struct_argument_alignment(<{i32, i64}> %arg0, ; LEGACY-MESA-VI: [[EXTRACT1:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD]](s96), 32 ; LEGACY-MESA-VI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 49 ; LEGACY-MESA-VI: [[PTR_ADD1:%[0-9]+]]:_(p4) = G_PTR_ADD [[COPY]], [[C1]](s64) - ; LEGACY-MESA-VI: [[LOAD1:%[0-9]+]]:_(s8) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) + ; LEGACY-MESA-VI: [[LOAD1:%[0-9]+]]:_(s96) = G_LOAD [[PTR_ADD1]](p4) :: (dereferenceable invariant load 12, align 1, addrspace 4) + ; LEGACY-MESA-VI: [[EXTRACT2:%[0-9]+]]:_(s32) = G_EXTRACT [[LOAD1]](s96), 0 + ; LEGACY-MESA-VI: [[EXTRACT3:%[0-9]+]]:_(s64) = G_EXTRACT [[LOAD1]](s96), 32 ; LEGACY-MESA-VI: [[C2:%[0-9]+]]:_(p1) = G_CONSTANT i64 0 ; LEGACY-MESA-VI: [[COPY1:%[0-9]+]]:_(p1) = COPY [[C2]](p1) ; LEGACY-MESA-VI: G_STORE [[EXTRACT]](s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: G_STORE [[EXTRACT1]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) - ; LEGACY-MESA-VI: G_STORE %3:_(s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) - 
; LEGACY-MESA-VI: G_STORE %4:_(s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) + ; LEGACY-MESA-VI: G_STORE [[EXTRACT2]](s32), [[C2]](p1) :: (volatile store 4 into `i32 addrspace(1)* null`, addrspace 1) + ; LEGACY-MESA-VI: G_STORE [[EXTRACT3]](s64), [[COPY1]](p1) :: (volatile store 8 into `i64 addrspace(1)* null`, addrspace 1) ; LEGACY-MESA-VI: S_ENDPGM 0 %val0 = extractvalue <{i32, i64}> %arg0, 0 %val1 = extractvalue <{i32, i64}> %arg0, 1 diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll index f7e6fd5cd474..c5bea011252f 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll @@ -2,6 +2,34 @@ ; RUN: llc -march=amdgcn -mcpu=fiji -O0 -stop-after=irtranslator -global-isel -verify-machineinstrs -o - %s | FileCheck %s ; FIXME: pre-VI should have same ABI without legal i16 operations. 
+define void @void_func_empty_arg({} %arg0, i32 %arg1) #0 { + ; CHECK-LABEL: name: void_func_empty_arg + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 + ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 + ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; CHECK: G_STORE [[COPY]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1) + ; CHECK: [[COPY2:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY1]] + ; CHECK: S_SETPC_B64_return [[COPY2]] + store i32 %arg1, i32 addrspace(1)* undef + ret void +} + +define void @void_func_empty_array([0 x i8] %arg0, i32 %arg1) #0 { + ; CHECK-LABEL: name: void_func_empty_array + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 + ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 + ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF + ; CHECK: G_STORE [[COPY]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1) + ; CHECK: [[COPY2:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY1]] + ; CHECK: S_SETPC_B64_return [[COPY2]] + store i32 %arg1, i32 addrspace(1)* undef + ret void +} + define void @void_func_i1(i1 %arg0) #0 { ; CHECK-LABEL: name: void_func_i1 ; CHECK: bb.1 (%ir-block.0): From llvm-commits at lists.llvm.org Tue Jul 7 13:36:22 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Tue, 07 Jul 2020 13:36:22 -0700 (PDT) Subject: [llvm] 23157f3 - GlobalISel: Handle EVT argument lowering correctly Message-ID: <5f04dcc6.1c69fb81.29eff.a48a@mx.google.com> Author: Matt Arsenault Date: 2020-07-07T16:36:14-04:00 New Revision: 23157f3bdb4f6af1d24aac7d4fbf439b71bba216 URL: https://github.com/llvm/llvm-project/commit/23157f3bdb4f6af1d24aac7d4fbf439b71bba216 DIFF: https://github.com/llvm/llvm-project/commit/23157f3bdb4f6af1d24aac7d4fbf439b71bba216.diff LOG: GlobalISel: Handle EVT argument lowering correctly handleAssignments was assuming 
every argument type is an MVT, and assignArg would always fail. This fixes one of the hacks in the current AMDGPU calling convention code that pre-processes the arguments. Added: Modified: llvm/lib/CodeGen/GlobalISel/CallLowering.cpp llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sat.ll Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp index c2294fad998a..a7146515c4c9 100644 --- a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp @@ -194,11 +194,11 @@ bool CallLowering::handleAssignments(CCState &CCInfo, unsigned NumArgs = Args.size(); for (unsigned i = 0; i != NumArgs; ++i) { - MVT CurVT = MVT::getVT(Args[i].Ty); - if (Handler.assignArg(i, CurVT, CurVT, CCValAssign::Full, Args[i], - Args[i].Flags[0], CCInfo)) { - if (!CurVT.isValid()) - return false; + EVT CurVT = EVT::getEVT(Args[i].Ty); + if (!CurVT.isSimple() || + Handler.assignArg(i, CurVT.getSimpleVT(), CurVT.getSimpleVT(), + CCValAssign::Full, Args[i], Args[i].Flags[0], + CCInfo)) { MVT NewVT = TLI->getRegisterTypeForCallingConv( F.getContext(), F.getCallingConv(), EVT(CurVT)); @@ -309,8 +309,10 @@ bool CallLowering::handleAssignments(CCState &CCInfo, // FIXME: Pack registers if we have more than one. 
Register ArgReg = Args[i].Regs[0]; - MVT OrigVT = MVT::getVT(Args[i].Ty); - MVT VAVT = VA.getValVT(); + EVT OrigVT = EVT::getEVT(Args[i].Ty); + EVT VAVT = VA.getValVT(); + const LLT OrigTy = getLLTForType(*Args[i].Ty, DL); + if (VA.isRegLoc()) { if (Handler.isIncomingArgumentHandler() && VAVT != OrigVT) { if (VAVT.getSizeInBits() < OrigVT.getSizeInBits()) { @@ -332,7 +334,7 @@ bool CallLowering::handleAssignments(CCState &CCInfo, MIRBuilder.buildMerge(Args[i].OrigRegs[0], Args[i].Regs); continue; } - const LLT VATy(VAVT); + const LLT VATy(VAVT.getSimpleVT()); Register NewReg = MIRBuilder.getMRI()->createGenericVirtualRegister(VATy); Handler.assignValueToReg(NewReg, VA.getLocReg(), VA); @@ -340,7 +342,6 @@ bool CallLowering::handleAssignments(CCState &CCInfo, // or do an unmerge to get the lower block of elements. if (VATy.isVector() && VATy.getNumElements() > OrigVT.getVectorNumElements()) { - const LLT OrigTy(OrigVT); // Just handle the case where the VA type is 2 * original type. if (VATy.getNumElements() != OrigVT.getVectorNumElements() * 2) { LLVM_DEBUG(dbgs() diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp index d1701851fea0..05a4e3462a26 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp @@ -226,19 +226,6 @@ void AMDGPUCallLowering::splitToValueTypes( MVT RegVT = TLI.getRegisterTypeForCallingConv(Ctx, CallConv, VT); if (NumParts == 1) { - // Fixup EVTs to an MVT. - // - // FIXME: This is pretty hacky. Why do we have to split the type - // legalization logic between here and handleAssignments? 
- if (OrigArgIdx != AttributeList::ReturnIndex && VT != RegVT) { - assert(VT.getSizeInBits() < 32 && - "unexpected illegal type"); - Ty = Type::getInt32Ty(Ctx); - Register OrigReg = Reg; - Reg = B.getMRI()->createGenericVirtualRegister(LLT::scalar(32)); - B.buildTrunc(OrigReg, Reg); - } - // No splitting to do, but we want to replace the original type (e.g. [1 x // double] -> double). SplitArgs.emplace_back(Reg, Ty, OrigArg.Flags, OrigArg.IsFixed); diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll index c5bea011252f..340392ea8d46 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll @@ -123,10 +123,11 @@ define void @void_func_i8(i8 %arg0) #0 { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 + ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) + ; CHECK: [[TRUNC1:%[0-9]+]]:_(s8) = G_TRUNC [[TRUNC]](s16) ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 - ; CHECK: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s32) ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF - ; CHECK: G_STORE [[TRUNC]](s8), [[DEF]](p1) :: (store 1 into `i8 addrspace(1)* undef`, addrspace 1) + ; CHECK: G_STORE [[TRUNC1]](s8), [[DEF]](p1) :: (store 1 into `i8 addrspace(1)* undef`, addrspace 1) ; CHECK: [[COPY2:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY1]] ; CHECK: S_SETPC_B64_return [[COPY2]] store i8 %arg0, i8 addrspace(1)* undef @@ -138,8 +139,8 @@ define void @void_func_i8_zeroext(i8 zeroext %arg0) #0 { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT 
i32 12 ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF ; CHECK: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[TRUNC]](s8) @@ -158,8 +159,8 @@ define void @void_func_i8_signext(i8 signext %arg0) #0 { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 12 ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF ; CHECK: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[TRUNC]](s8) @@ -233,8 +234,8 @@ define void @void_func_i24(i24 %arg0) #0 { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s24) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF ; CHECK: G_STORE [[TRUNC]](s24), [[DEF]](p1) :: (store 3 into `i24 addrspace(1)* undef`, align 4, addrspace 1) ; CHECK: [[COPY2:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY1]] @@ -248,8 +249,8 @@ define void @void_func_i24_zeroext(i24 zeroext %arg0) #0 { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s24) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF ; CHECK: G_STORE [[TRUNC]](s24), [[DEF]](p1) :: (store 3 into `i24 addrspace(1)* undef`, align 4, addrspace 1) ; CHECK: [[COPY2:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY1]] @@ -263,8 +264,8 @@ define void @void_func_i24_signext(i24 signext %arg0) #0 { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: 
[[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s24) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF ; CHECK: G_STORE [[TRUNC]](s24), [[DEF]](p1) :: (store 3 into `i24 addrspace(1)* undef`, align 4, addrspace 1) ; CHECK: [[COPY2:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY1]] @@ -1569,11 +1570,12 @@ define void @void_func_struct_i8_i32({ i8, i32 } %arg0) #0 { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $vgpr1, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 + ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) + ; CHECK: [[TRUNC1:%[0-9]+]]:_(s8) = G_TRUNC [[TRUNC]](s16) ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 - ; CHECK: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s32) ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF - ; CHECK: G_STORE [[TRUNC]](s8), [[DEF]](p1) :: (store 1 into `{ i8, i32 } addrspace(1)* undef`, align 4, addrspace 1) + ; CHECK: G_STORE [[TRUNC1]](s8), [[DEF]](p1) :: (store 1 into `{ i8, i32 } addrspace(1)* undef`, align 4, addrspace 1) ; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 4 ; CHECK: [[PTR_ADD:%[0-9]+]]:_(p1) = G_PTR_ADD [[DEF]], [[C]](s64) ; CHECK: G_STORE [[COPY1]](s32), [[PTR_ADD]](p1) :: (store 4 into `{ i8, i32 } addrspace(1)* undef` + 4, addrspace 1) @@ -1766,14 +1768,13 @@ define void @void_func_v32i32_i1_i8_i16(<32 x i32> %arg0, i1 %arg1, i8 %arg2, i1 ; CHECK: [[FRAME_INDEX:%[0-9]+]]:_(p5) = G_FRAME_INDEX %fixed-stack.3 ; CHECK: [[LOAD:%[0-9]+]]:_(s1) = G_LOAD [[FRAME_INDEX]](p5) :: (invariant load 1 from %fixed-stack.3, align 16, addrspace 5) ; CHECK: [[FRAME_INDEX1:%[0-9]+]]:_(p5) = G_FRAME_INDEX %fixed-stack.2 - ; CHECK: [[LOAD1:%[0-9]+]]:_(s32) = G_LOAD [[FRAME_INDEX1]](p5) :: (invariant load 4 from %fixed-stack.2, addrspace 5) + ; CHECK: [[LOAD1:%[0-9]+]]:_(s8) = G_LOAD [[FRAME_INDEX1]](p5) :: (invariant load 1 from %fixed-stack.2, 
align 4, addrspace 5) ; CHECK: [[FRAME_INDEX2:%[0-9]+]]:_(p5) = G_FRAME_INDEX %fixed-stack.1 ; CHECK: [[LOAD2:%[0-9]+]]:_(s16) = G_LOAD [[FRAME_INDEX2]](p5) :: (invariant load 2 from %fixed-stack.1, align 8, addrspace 5) ; CHECK: [[FRAME_INDEX3:%[0-9]+]]:_(p5) = G_FRAME_INDEX %fixed-stack.0 ; CHECK: [[LOAD3:%[0-9]+]]:_(s16) = G_LOAD [[FRAME_INDEX3]](p5) :: (invariant load 2 from %fixed-stack.0, align 4, addrspace 5) ; CHECK: [[COPY32:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<32 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32), [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32), [[COPY12]](s32), [[COPY13]](s32), [[COPY14]](s32), [[COPY15]](s32), [[COPY16]](s32), [[COPY17]](s32), [[COPY18]](s32), [[COPY19]](s32), [[COPY20]](s32), [[COPY21]](s32), [[COPY22]](s32), [[COPY23]](s32), [[COPY24]](s32), [[COPY25]](s32), [[COPY26]](s32), [[COPY27]](s32), [[COPY28]](s32), [[COPY29]](s32), [[COPY30]](s32), [[COPY31]](s32) - ; CHECK: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[LOAD1]](s32) ; CHECK: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF ; CHECK: [[COPY33:%[0-9]+]]:_(p1) = COPY [[DEF]](p1) ; CHECK: [[COPY34:%[0-9]+]]:_(p1) = COPY [[DEF]](p1) @@ -1781,7 +1782,7 @@ define void @void_func_v32i32_i1_i8_i16(<32 x i32> %arg0, i1 %arg1, i8 %arg2, i1 ; CHECK: [[COPY36:%[0-9]+]]:_(p1) = COPY [[DEF]](p1) ; CHECK: G_STORE [[BUILD_VECTOR]](<32 x s32>), [[DEF]](p1) :: (volatile store 128 into `<32 x i32> addrspace(1)* undef`, addrspace 1) ; CHECK: G_STORE [[LOAD]](s1), [[COPY33]](p1) :: (volatile store 1 into `i1 addrspace(1)* undef`, addrspace 1) - ; CHECK: G_STORE [[TRUNC]](s8), [[COPY34]](p1) :: (volatile store 1 into `i8 addrspace(1)* undef`, addrspace 1) + ; CHECK: G_STORE [[LOAD1]](s8), [[COPY34]](p1) :: (volatile store 1 into `i8 addrspace(1)* undef`, addrspace 1) ; CHECK: G_STORE [[LOAD2]](s16), [[COPY35]](p1) :: (volatile store 2 into `i16 
addrspace(1)* undef`, addrspace 1) ; CHECK: G_STORE [[LOAD3]](s16), [[COPY36]](p1) :: (volatile store 2 into `half addrspace(1)* undef`, addrspace 1) ; CHECK: [[COPY37:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY32]] diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll index cc1c75e404e0..4f56e1ae598d 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll @@ -48,9 +48,9 @@ define i8* @ptrmask_flat_i16(i8* %ptr, i16 %mask) { ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2 + ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY2]](s32) ; CHECK: [[COPY3:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32) - ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY2]](s32) ; CHECK: [[PTRMASK:%[0-9]+]]:_(p0) = G_PTRMASK [[MV]], [[TRUNC]](s16) ; CHECK: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[PTRMASK]](p0) ; CHECK: $vgpr0 = COPY [[UV]](s32) @@ -119,8 +119,8 @@ define i8 addrspace(3)* @ptrmask_local_i16(i8 addrspace(3)* %ptr, i16 %mask) { ; CHECK: liveins: $vgpr0, $vgpr1, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(p3) = COPY $vgpr0 ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 - ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[PTRMASK:%[0-9]+]]:_(p3) = G_PTRMASK [[COPY]], [[TRUNC]](s16) ; CHECK: $vgpr0 = COPY [[PTRMASK]](p3) ; CHECK: [[COPY3:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY2]] diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sat.ll index 2d648580408f..efdecf7fb49f 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sat.ll +++ 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sat.ll @@ -6,10 +6,10 @@ define i16 @uaddsat_i16(i16 %lhs, i16 %rhs) { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $vgpr1, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 - ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; CHECK: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[UADDSAT:%[0-9]+]]:_(s16) = G_UADDSAT [[TRUNC]], [[TRUNC1]] ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[UADDSAT]](s16) ; CHECK: $vgpr0 = COPY [[ANYEXT]](s32) @@ -85,10 +85,10 @@ define i16 @saddsat_i16(i16 %lhs, i16 %rhs) { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $vgpr1, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 - ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; CHECK: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[SADDSAT:%[0-9]+]]:_(s16) = G_SADDSAT [[TRUNC]], [[TRUNC1]] ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[SADDSAT]](s16) ; CHECK: $vgpr0 = COPY [[ANYEXT]](s32) @@ -164,10 +164,10 @@ define i16 @usubsat_i16(i16 %lhs, i16 %rhs) { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $vgpr1, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 - ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; CHECK: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: 
[[USUBSAT:%[0-9]+]]:_(s16) = G_USUBSAT [[TRUNC]], [[TRUNC1]] ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[USUBSAT]](s16) ; CHECK: $vgpr0 = COPY [[ANYEXT]](s32) @@ -243,10 +243,10 @@ define i16 @ssubsat_i16(i16 %lhs, i16 %rhs) { ; CHECK: bb.1 (%ir-block.0): ; CHECK: liveins: $vgpr0, $vgpr1, $sgpr30_sgpr31 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 - ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 - ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; CHECK: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + ; CHECK: [[COPY2:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31 ; CHECK: [[SSUBSAT:%[0-9]+]]:_(s16) = G_SSUBSAT [[TRUNC]], [[TRUNC1]] ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[SSUBSAT]](s16) ; CHECK: $vgpr0 = COPY [[ANYEXT]](s32) From llvm-commits at lists.llvm.org Tue Jul 7 13:40:05 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:40:05 +0000 (UTC) Subject: [PATCH] D72410: [DSE] Eliminate stores by terminators (free,lifetime.end). In-Reply-To: References: Message-ID: fhahn updated this revision to Diff 276200. fhahn added a comment. Addressed comments, thanks! I updated `getLocForTerminator` to return an optional pair and removed the unnecessary `ToCheck.insert` calls. 
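For readers following the review, here is a minimal sketch of the "optional pair" return shape mentioned above. It is not the code from D72410: `Loc`, `Inst`, and `getLocForTerminatorSketch` are hypothetical stand-ins for illustration, and the real patch works on LLVM's own instruction and memory-location types. The idea is that the helper reports both the location and whether the instruction was a free-like call, so the caller should not need to repeat that check.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a memory location (not LLVM's MemoryLocation).
struct Loc {
  std::string Ptr;
};

// Hypothetical stand-in for an instruction that may end an object's lifetime.
struct Inst {
  enum Kind { Free, LifetimeEnd, Other } K;
  std::string Operand;
};

// Returns the location whose lifetime the instruction ends, paired with a flag
// that is true for free-like calls, or std::nullopt if the instruction does
// not end any lifetime.
std::optional<std::pair<Loc, bool>> getLocForTerminatorSketch(const Inst &I) {
  switch (I.K) {
  case Inst::Free:
    return std::make_pair(Loc{I.Operand}, true);
  case Inst::LifetimeEnd:
    return std::make_pair(Loc{I.Operand}, false);
  case Inst::Other:
    return std::nullopt;
  }
  return std::nullopt;
}
```

A caller can then test the result once and read the flag from `->second` instead of classifying the instruction a second time.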
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72410/new/ https://reviews.llvm.org/D72410 Files: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D72410.276200.patch Type: text/x-patch Size: 12860 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:41:13 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:41:13 +0000 (UTC) Subject: [PATCH] D72410: [DSE] Eliminate stores by terminators (free,lifetime.end). In-Reply-To: References: Message-ID: <2da339bd972e03c977a00fc10786353e@localhost.localdomain> fhahn marked 3 inline comments as done. fhahn added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1706 + + if (isFreeCall(MaybeTerm, &TLI)) { + DataLayout DL = MaybeTerm->getParent()->getModule()->getDataLayout(); ---------------- asbirlea wrote: > I'm not sure how expensive the `isFreeCall` is, but the check is already done inside the call to `getLocForTerminator`. You could add a bool optional, or return a pair, if worth eliminating the repeated check. Done, thanks! ================ Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:2263 + + ToCheck.insert(NextDef->getDefiningAccess()); + if (OR == OW_Complete) { ---------------- asbirlea wrote: > I don't understand how this change is related. 
It looks like this was a leftover from an earlier rebase. I removed it. It should not be required as it is already added around line 2205. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72410/new/ https://reviews.llvm.org/D72410 From llvm-commits at lists.llvm.org Tue Jul 7 13:42:57 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:42:57 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: PeteSteinfeld added a comment. @ktras, are you planning to implement the other coarray intrinsic functions? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 From llvm-commits at lists.llvm.org Tue Jul 7 13:43:28 2020 From: llvm-commits at lists.llvm.org (Jian Cai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:43:28 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: <195ccac958cfb4c1eaa08a8e85670799@localhost.localdomain> jcai19 marked 2 inline comments as done. jcai19 added inline comments. ================ Comment at: llvm/lib/MC/MCAssembler.cpp:633 + if (MaxNopLength < 0 || + MaxNopLength > Asm.getBackend().getMaximumNopSize()) { + Asm.getContext().reportError( ---------------- reames wrote: > Does this behaviour match existing gnu? I'd have expected the result of specifying a "too large" maximum size to simply clamp to the target's maximum. > > This is important as if the result is semantic, then the difference between "largest encodeable" and "largest profitable" becomes a thing the rest of the code has to care about. 15 byte nops are almost always *legal* they're just not *fast*. > Does this behaviour match existing gnu? Appears so. 
$ cat foo.s .nops 16, 15 $ gcc -c foo.s foo.s: Assembler messages: foo.s:1: Error: invalid single nop size: 15 (expect within [0, 11]) With the patch applied, $ llvm-mc -filetype=obj -triple=x86_64 foo.s foo.s:1:1: error: illegal NOP size 15. (expected within [0, 10]) .nops 16, 15 ^ ================ Comment at: llvm/test/MC/X86/align-branch-bundle.s:9 # CHECK-NEXT: e: nop -# CHECK-NEXT: f: nop # CHECK-NEXT: 10: jle ---------------- reames wrote: > Having a test delta in a file without .nops is highly suspicious. > > I'd suggest splitting your patch into a trivial version which emits single byte nops, and an change which adds the multiple byte support. That would allow us to separate the directive mechanics from the interesting profit bits. How about we also print out instruction bytes here. If 64-bit processors can generate a two-byte long nop instruction here, shouldn't we emit that instead of two single-byte nop? Thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 From llvm-commits at lists.llvm.org Tue Jul 7 13:46:07 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via llvm-commits) Date: Tue, 07 Jul 2020 13:46:07 -0700 (PDT) Subject: [llvm] 021d56a - [SVE] Make Constant::getSplatValue work for scalable vector splats Message-ID: <5f04df0f.1c69fb81.9ca75.3b1b@mx.google.com> Author: Christopher Tetreault Date: 2020-07-07T13:45:51-07:00 New Revision: 021d56abb9ee3028cb88895144d71365e566c32f URL: https://github.com/llvm/llvm-project/commit/021d56abb9ee3028cb88895144d71365e566c32f DIFF: https://github.com/llvm/llvm-project/commit/021d56abb9ee3028cb88895144d71365e566c32f.diff LOG: [SVE] Make Constant::getSplatValue work for scalable vector splats Summary: Make Constant::getSplatValue recognize scalable vector splats of the form created by ConstantVector::getSplat. 
Add unit test to verify that C == ConstantVector::getSplat(C)->getSplatValue() for fixed width and scalable vector splats Reviewers: efriedma, spatel, fpetrogalli, c-rhodes Reviewed By: efriedma Subscribers: sdesmalen, tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82416 Added: Modified: llvm/lib/IR/Constants.cpp llvm/test/Transforms/InstSimplify/vscale.ll llvm/unittests/IR/ConstantsTest.cpp Removed: ################################################################################ diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index d8e044ee4bdc..cbbcca20ea51 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -1585,6 +1585,27 @@ Constant *Constant::getSplatValue(bool AllowUndefs) const { return CV->getSplatValue(); if (const ConstantVector *CV = dyn_cast<ConstantVector>(this)) return CV->getSplatValue(AllowUndefs); + + // Check if this is a constant expression splat of the form returned by + // ConstantVector::getSplat() + const auto *Shuf = dyn_cast<ConstantExpr>(this); + if (Shuf && Shuf->getOpcode() == Instruction::ShuffleVector && + isa<UndefValue>(Shuf->getOperand(1))) { + + const auto *IElt = dyn_cast<ConstantExpr>(Shuf->getOperand(0)); + if (IElt && IElt->getOpcode() == Instruction::InsertElement && + isa<UndefValue>(IElt->getOperand(0))) { + + ArrayRef<int> Mask = Shuf->getShuffleMask(); + Constant *SplatVal = IElt->getOperand(1); + ConstantInt *Index = dyn_cast<ConstantInt>(IElt->getOperand(2)); + + if (Index && Index->getValue() == 0 && + std::all_of(Mask.begin(), Mask.end(), [](int I) { return I == 0; })) + return SplatVal; + } + } + return nullptr; } diff --git a/llvm/test/Transforms/InstSimplify/vscale.ll b/llvm/test/Transforms/InstSimplify/vscale.ll index 669c824685e8..d396f0289196 100644 --- a/llvm/test/Transforms/InstSimplify/vscale.ll +++ b/llvm/test/Transforms/InstSimplify/vscale.ll @@ -95,6 +95,15 @@ define i32 @insert_extract_element_same_vec_idx_2( %a) { ret i32 %r } +; more complicated expressions + +define 
@cmp_le_smax_always_true( %x) { +; CHECK-LABEL: @cmp_le_smax_always_true( +; CHECK-NEXT: ret shufflevector ( insertelement ( undef, i1 true, i32 0), undef, zeroinitializer) + %cmp = icmp sle %x, shufflevector ( insertelement ( undef, i64 9223372036854775807, i32 0), undef, zeroinitializer) + ret %cmp +} + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Memory Access and Addressing Operations ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; diff --git a/llvm/unittests/IR/ConstantsTest.cpp b/llvm/unittests/IR/ConstantsTest.cpp index e20039e8d9c4..3fed395daee4 100644 --- a/llvm/unittests/IR/ConstantsTest.cpp +++ b/llvm/unittests/IR/ConstantsTest.cpp @@ -638,5 +638,34 @@ TEST(ConstantsTest, isElementWiseEqual) { EXPECT_FALSE(CP00U->isElementWiseEqual(CP00U0)); } +TEST(ConstantsTest, GetSplatValueRoundTrip) { + LLVMContext Context; + + Type *FloatTy = Type::getFloatTy(Context); + Type *Int32Ty = Type::getInt32Ty(Context); + Type *Int8Ty = Type::getInt8Ty(Context); + + for (unsigned Min : {1, 2, 8}) { + ElementCount SEC = {Min, true}; + ElementCount FEC = {Min, false}; + + for (auto EC : {SEC, FEC}) { + for (auto *Ty : {FloatTy, Int32Ty, Int8Ty}) { + Constant *Zero = Constant::getNullValue(Ty); + Constant *One = Constant::getAllOnesValue(Ty); + + for (auto *C : {Zero, One}) { + Constant *Splat = ConstantVector::getSplat(EC, C); + ASSERT_NE(nullptr, Splat); + + Constant *SplatVal = Splat->getSplatValue(); + EXPECT_NE(nullptr, SplatVal); + EXPECT_EQ(SplatVal, C); + } + } + } + } +} + } // end anonymous namespace } // end namespace llvm From llvm-commits at lists.llvm.org Tue Jul 7 13:46:12 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:46:12 +0000 (UTC) Subject: [PATCH] D82416: [SVE] Make Constant::getSplatValue work for scalable vector splats In-Reply-To: References: Message-ID: This revision was automatically updated to reflect 
the committed changes. Closed by commit rG021d56abb9ee: [SVE] Make Constant::getSplatValue work for scalable vector splats (authored by ctetreau). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82416/new/ https://reviews.llvm.org/D82416 Files: llvm/lib/IR/Constants.cpp llvm/test/Transforms/InstSimplify/vscale.ll llvm/unittests/IR/ConstantsTest.cpp Index: llvm/unittests/IR/ConstantsTest.cpp =================================================================== --- llvm/unittests/IR/ConstantsTest.cpp +++ llvm/unittests/IR/ConstantsTest.cpp @@ -638,5 +638,34 @@ EXPECT_FALSE(CP00U->isElementWiseEqual(CP00U0)); } +TEST(ConstantsTest, GetSplatValueRoundTrip) { + LLVMContext Context; + + Type *FloatTy = Type::getFloatTy(Context); + Type *Int32Ty = Type::getInt32Ty(Context); + Type *Int8Ty = Type::getInt8Ty(Context); + + for (unsigned Min : {1, 2, 8}) { + ElementCount SEC = {Min, true}; + ElementCount FEC = {Min, false}; + + for (auto EC : {SEC, FEC}) { + for (auto *Ty : {FloatTy, Int32Ty, Int8Ty}) { + Constant *Zero = Constant::getNullValue(Ty); + Constant *One = Constant::getAllOnesValue(Ty); + + for (auto *C : {Zero, One}) { + Constant *Splat = ConstantVector::getSplat(EC, C); + ASSERT_NE(nullptr, Splat); + + Constant *SplatVal = Splat->getSplatValue(); + EXPECT_NE(nullptr, SplatVal); + EXPECT_EQ(SplatVal, C); + } + } + } + } +} + } // end anonymous namespace } // end namespace llvm Index: llvm/test/Transforms/InstSimplify/vscale.ll =================================================================== --- llvm/test/Transforms/InstSimplify/vscale.ll +++ llvm/test/Transforms/InstSimplify/vscale.ll @@ -95,6 +95,15 @@ ret i32 %r } +; more complicated expressions + +define @cmp_le_smax_always_true( %x) { +; CHECK-LABEL: @cmp_le_smax_always_true( +; CHECK-NEXT: ret shufflevector ( insertelement ( undef, i1 true, i32 0), undef, zeroinitializer) + %cmp = icmp sle %x, shufflevector ( insertelement ( undef, i64 9223372036854775807, i32 0), 
undef, zeroinitializer) + ret %cmp +} + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Memory Access and Addressing Operations ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Index: llvm/lib/IR/Constants.cpp =================================================================== --- llvm/lib/IR/Constants.cpp +++ llvm/lib/IR/Constants.cpp @@ -1585,6 +1585,27 @@ return CV->getSplatValue(); if (const ConstantVector *CV = dyn_cast(this)) return CV->getSplatValue(AllowUndefs); + + // Check if this is a constant expression splat of the form returned by + // ConstantVector::getSplat() + const auto *Shuf = dyn_cast(this); + if (Shuf && Shuf->getOpcode() == Instruction::ShuffleVector && + isa(Shuf->getOperand(1))) { + + const auto *IElt = dyn_cast(Shuf->getOperand(0)); + if (IElt && IElt->getOpcode() == Instruction::InsertElement && + isa(IElt->getOperand(0))) { + + ArrayRef Mask = Shuf->getShuffleMask(); + Constant *SplatVal = IElt->getOperand(1); + ConstantInt *Index = dyn_cast(IElt->getOperand(2)); + + if (Index && Index->getValue() == 0 && + std::all_of(Mask.begin(), Mask.end(), [](int I) { return I == 0; })) + return SplatVal; + } + } + return nullptr; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D82416.276202.patch Type: text/x-patch Size: 3369 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 13:51:06 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 20:51:06 +0000 (UTC) Subject: [PATCH] D83339: [SVE] Remove calls to VectorType::getNumElements from AsmParserTest In-Reply-To: References: Message-ID: efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. 
LGTM

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83339/new/

https://reviews.llvm.org/D83339

From llvm-commits at lists.llvm.org Tue Jul 7 13:56:25 2020
From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits)
Date: Tue, 7 Jul 2020 13:56:25 -0700
Subject: [PATCH] D83179: [SCCP] Use range metadata for loads and calls
In-Reply-To: <386fbe049a041b24ae0513b1c1ec1c09@localhost.localdomain>
References: <386fbe049a041b24ae0513b1c1ec1c09@localhost.localdomain>
Message-ID:

FWIW I'm starting to see errors with this in CUDA tests. Working on getting a testcase to show what's going on :)

-eric

On Tue, Jul 7, 2020 at 12:23 PM Nikita Popov via Phabricator via llvm-commits wrote:

> This revision was not accepted when it landed; it landed in state "Needs
> Review".
> This revision was automatically updated to reflect the committed changes.
> Closed by commit rG8691544a2767: [SCCP] Use range metadata for loads and
> calls (authored by nikic).
>
> Changed prior to commit:
> https://reviews.llvm.org/D83179?vs=275564&id=275682#toc
>
> Repository:
> rG LLVM Github Monorepo
>
> CHANGES SINCE LAST ACTION
> https://reviews.llvm.org/D83179/new/
>
> https://reviews.llvm.org/D83179
>
> Files:
> llvm/lib/Transforms/Scalar/SCCP.cpp
> llvm/test/Transforms/SCCP/metadata.ll
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 13:57:05 2020
From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 20:57:05 +0000 (UTC)
Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d
In-Reply-To:
References:
Message-ID: <02ca74056a479ccad1bce0a0ec31cb1c@localhost.localdomain>

jasonliu added inline comments.

================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:346
+ static constexpr uint32_t ParmsOnStackMask = 0x0000'0001;
+ static constexpr uint8_t NumberOfFPParaShift = 01;
+
----------------
01 -> 1?

================
Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:396
+ const uint8_t *TBPtr;
+ const uint64_t Size;
+ Optional ParaType;
----------------
Do you actually need `Size` as a data member?

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:872
+Expected XCOFFTracebackTable::create(const uint8_t *Ptr,
+ const uint64_t S) {
+ Error Err = Error::success();
----------------
Remove `const` to match the declaration.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:884
+ ErrorAsOutParameter EAO(&Err);
+ DataExtractor DE(ArrayRef(Ptr, S), false, 4);
+ uint64_t offset_ptr = 0;
----------------
Suggestion:
```
DataExtractor DE(ArrayRef(Ptr, S), /* IsLittleEndian */ false, AddressSize);
```
AddressSize is not necessarily 4; it's 8 in 64-bit mode. (We don't actually use any DataExtractor member functions that use AddressSize, though, so I'm not sure whether we could just pass in `/* AddressSize */ 0` to avoid confusion.)

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:904
+ if (!Err && isFuncNamePresent()) {
+ uint16_t Len = DE.getU16(&offset_ptr, &Err);
+ if (!Err)
----------------
Why do we need to declare a new variable?
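
For readers following the `DataExtractor` comments above, here is a minimal standalone sketch of the underlying idea: reading a big-endian 16-bit length and then the bytes it describes. It deliberately does not use LLVM. `readU16BE` and `readLengthPrefixedName` are hypothetical helpers invented for illustration (they are not part of `DataExtractor` or of this patch), and the assumption that the function name is preceded by a 16-bit length comes only from the `getU16` call quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: read a big-endian 16-bit value at Offset.
// On success, stores it in Value, advances Offset by 2, and returns true.
// On failure (fewer than 2 bytes remain), leaves Offset untouched.
bool readU16BE(const std::vector<uint8_t> &Data, size_t &Offset,
               uint16_t &Value) {
  if (Offset > Data.size() || Data.size() - Offset < 2)
    return false;
  // Most significant byte comes first in big-endian order.
  Value = static_cast<uint16_t>((Data[Offset] << 8) | Data[Offset + 1]);
  Offset += 2;
  return true;
}

// Hypothetical helper: read a 16-bit big-endian length followed by that many
// bytes of name. Offset is only updated if the whole field was read.
bool readLengthPrefixedName(const std::vector<uint8_t> &Data, size_t &Offset,
                            std::string &Name) {
  size_t Cursor = Offset; // Work on a copy so failures do not move Offset.
  uint16_t Len = 0;
  if (!readU16BE(Data, Cursor, Len))
    return false;
  assert(Cursor <= Data.size() && "cursor must stay inside the buffer");
  if (Data.size() - Cursor < Len)
    return false;
  Name.assign(reinterpret_cast<const char *>(Data.data()) + Cursor, Len);
  Offset = Cursor + Len;
  return true;
}
```

In this sketch the offset is advanced only on success, so a caller can report an error at the position where decoding stopped.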

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81585/new/

https://reviews.llvm.org/D81585

From llvm-commits at lists.llvm.org Tue Jul 7 14:00:14 2020
From: llvm-commits at lists.llvm.org (Sidharth Baveja via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 21:00:14 +0000 (UTC)
Subject: [PATCH] D82927: Intergerate Loop Peeling into Loop Fusion
In-Reply-To:
References:
Message-ID:

sidbav marked an inline comment as done.
sidbav added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:124
+
+static cl::opt FusionPeelMaxCount(
+ "loop-fusion-peel-max-count", cl::init(3), cl::Hidden,
----------------
bmahjour wrote:
> We only really need one option to control peeling for loop fusion. You can remove `loop-fusion-peel` and change the default for this threshold to 0. I don't know what others feel about this, but I'd rather have shorter commands when possible.
I agree with this idea. Setting up the option this way lets it serve both purposes: enabling peeling for fusion and specifying the maximum peel count.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82927/new/

https://reviews.llvm.org/D82927

From llvm-commits at lists.llvm.org Tue Jul 7 14:01:24 2020
From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 21:01:24 +0000 (UTC)
Subject: [PATCH] D82416: [SVE] Make Constant::getSplatValue work for scalable vector splats
In-Reply-To:
References:
Message-ID:

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rG021d56abb9ee: [SVE] Make Constant::getSplatValue work for scalable vector splats (authored by ctetreau).
Changed prior to commit: https://reviews.llvm.org/D82416?vs=274903&id=275688#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82416/new/ https://reviews.llvm.org/D82416 Files: llvm/lib/IR/Constants.cpp llvm/test/Transforms/InstSimplify/vscale.ll llvm/unittests/IR/ConstantsTest.cpp Index: llvm/unittests/IR/ConstantsTest.cpp =================================================================== --- llvm/unittests/IR/ConstantsTest.cpp +++ llvm/unittests/IR/ConstantsTest.cpp @@ -638,5 +638,34 @@ EXPECT_FALSE(CP00U->isElementWiseEqual(CP00U0)); } +TEST(ConstantsTest, GetSplatValueRoundTrip) { + LLVMContext Context; + + Type *FloatTy = Type::getFloatTy(Context); + Type *Int32Ty = Type::getInt32Ty(Context); + Type *Int8Ty = Type::getInt8Ty(Context); + + for (unsigned Min : {1, 2, 8}) { + ElementCount SEC = {Min, true}; + ElementCount FEC = {Min, false}; + + for (auto EC : {SEC, FEC}) { + for (auto *Ty : {FloatTy, Int32Ty, Int8Ty}) { + Constant *Zero = Constant::getNullValue(Ty); + Constant *One = Constant::getAllOnesValue(Ty); + + for (auto *C : {Zero, One}) { + Constant *Splat = ConstantVector::getSplat(EC, C); + ASSERT_NE(nullptr, Splat); + + Constant *SplatVal = Splat->getSplatValue(); + EXPECT_NE(nullptr, SplatVal); + EXPECT_EQ(SplatVal, C); + } + } + } + } +} + } // end anonymous namespace } // end namespace llvm Index: llvm/test/Transforms/InstSimplify/vscale.ll =================================================================== --- llvm/test/Transforms/InstSimplify/vscale.ll +++ llvm/test/Transforms/InstSimplify/vscale.ll @@ -95,6 +95,15 @@ ret i32 %r } +; more complicated expressions + +define @cmp_le_smax_always_true( %x) { +; CHECK-LABEL: @cmp_le_smax_always_true( +; CHECK-NEXT: ret shufflevector ( insertelement ( undef, i1 true, i32 0), undef, zeroinitializer) + %cmp = icmp sle %x, shufflevector ( insertelement ( undef, i64 9223372036854775807, i32 0), undef, zeroinitializer) + ret %cmp +} + 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Memory Access and Addressing Operations ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Index: llvm/lib/IR/Constants.cpp =================================================================== --- llvm/lib/IR/Constants.cpp +++ llvm/lib/IR/Constants.cpp @@ -1585,6 +1585,27 @@ return CV->getSplatValue(); if (const ConstantVector *CV = dyn_cast(this)) return CV->getSplatValue(AllowUndefs); + + // Check if this is a constant expression splat of the form returned by + // ConstantVector::getSplat() + const auto *Shuf = dyn_cast(this); + if (Shuf && Shuf->getOpcode() == Instruction::ShuffleVector && + isa(Shuf->getOperand(1))) { + + const auto *IElt = dyn_cast(Shuf->getOperand(0)); + if (IElt && IElt->getOpcode() == Instruction::InsertElement && + isa(IElt->getOperand(0))) { + + ArrayRef Mask = Shuf->getShuffleMask(); + Constant *SplatVal = IElt->getOperand(1); + ConstantInt *Index = dyn_cast(IElt->getOperand(2)); + + if (Index && Index->getValue() == 0 && + std::all_of(Mask.begin(), Mask.end(), [](int I) { return I == 0; })) + return SplatVal; + } + } + return nullptr; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D82416.275688.patch Type: text/x-patch Size: 3369 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:02:32 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:02:32 +0000 (UTC) Subject: [PATCH] D83341: [SVE] Scalarize fixed length masked loads and stores. In-Reply-To: References: Message-ID: <9f80de3f61dcc5f0888f767ff11faac8@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. 
LGTM

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83341/new/

https://reviews.llvm.org/D83341

From llvm-commits at lists.llvm.org Tue Jul 7 14:02:42 2020
From: llvm-commits at lists.llvm.org (Eric Astor via llvm-commits)
Date: Tue, 07 Jul 2020 14:02:42 -0700 (PDT)
Subject: [llvm] bc8e262 - [ms] [llvm-ml] Add initial MASM STRUCT/UNION support
Message-ID: <5f04e2f2.1c69fb81.17c50.be6c@mx.google.com>

Author: Eric Astor
Date: 2020-07-07T17:02:10-04:00
New Revision: bc8e262afe833fce2bff46c73d9e77ed23fd720f

URL: https://github.com/llvm/llvm-project/commit/bc8e262afe833fce2bff46c73d9e77ed23fd720f
DIFF: https://github.com/llvm/llvm-project/commit/bc8e262afe833fce2bff46c73d9e77ed23fd720f.diff

LOG: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support

Summary:
Add support for user-defined types to MasmParser, including initialization and field access.

Known issues:
- Omitted entry initializers (e.g., <,0>) do not work consistently for nested structs/arrays.
- Size checking/inference for values with known types is not yet implemented.
- Some ml64.exe syntaxes for accessing STRUCT fields are not recognized.
- `[.].` - `[[.]]` - `( PTR []).` - `[.].` - `( PTR ).` Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D75306 Added: llvm/test/tools/llvm-ml/struct.test llvm/test/tools/llvm-ml/struct_errors.test Modified: llvm/include/llvm/MC/MCParser/MCAsmParser.h llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h llvm/lib/MC/MCParser/MasmParser.cpp llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/MC/MCParser/MCAsmParser.h b/llvm/include/llvm/MC/MCParser/MCAsmParser.h index f02578e451b5..204008975959 100644 --- a/llvm/include/llvm/MC/MCParser/MCAsmParser.h +++ b/llvm/include/llvm/MC/MCParser/MCAsmParser.h @@ -170,6 +170,11 @@ class MCAsmParser { virtual bool isParsingMasm() const { return false; } + virtual bool LookUpFieldOffset(StringRef Base, StringRef Member, + unsigned &Offset) { + return true; + } + /// Parse MS-style inline assembly. virtual bool parseMSInlineAsm( void *AsmLoc, std::string &AsmString, unsigned &NumOutputs, diff --git a/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h b/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h index ad086eaa539c..1d10c66b4201 100644 --- a/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h +++ b/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h @@ -334,7 +334,7 @@ class MCTargetAsmParser : public MCAsmParserExtension { /// SemaCallback - The Sema callback implementation. Must be set when parsing /// ms-style inline assembly. - MCAsmParserSemaCallback *SemaCallback; + MCAsmParserSemaCallback *SemaCallback = nullptr; /// Set of options which affects instrumentation of inline assembly. 
MCTargetOptions MCOptions; diff --git a/llvm/lib/MC/MCParser/MasmParser.cpp b/llvm/lib/MC/MCParser/MasmParser.cpp index 423230962de5..14c889da5c3e 100644 --- a/llvm/lib/MC/MCParser/MasmParser.cpp +++ b/llvm/lib/MC/MCParser/MasmParser.cpp @@ -80,8 +80,7 @@ namespace { typedef std::vector MCAsmMacroArgument; typedef std::vector MCAsmMacroArguments; -/// Helper class for storing information about an active macro -/// instantiation. +/// Helper class for storing information about an active macro instantiation. struct MacroInstantiation { /// The location of the instantiation. SMLoc InstantiationLoc; @@ -113,6 +112,238 @@ struct ParseStatementInfo { : AsmRewrites(rewrites) {} }; +enum FieldType { + FT_INTEGRAL, // Initializer: integer expression, stored as an MCExpr. + FT_REAL, // Initializer: real number, stored as an APInt. + FT_STRUCT // Initializer: struct initializer, stored recursively. +}; + +struct FieldInfo; +struct StructInfo { + StringRef Name; + bool IsUnion = false; + size_t Alignment = 0; + size_t Size = 0; + std::vector Fields; + StringMap FieldsByName; + + FieldInfo &addField(StringRef FieldName, FieldType FT); + + StructInfo() = default; + + StructInfo(StringRef StructName, bool Union, unsigned AlignmentValue) + : Name(StructName), IsUnion(Union), Alignment(AlignmentValue) {} +}; + +// FIXME: This should probably use a class hierarchy, raw pointers between the +// objects, and dynamic type resolution instead of a union. On the other hand, +// ownership then becomes much more complicated; the obvious thing would be to +// use BumpPtrAllocator, but the lack of a destructor makes that messy. 
+ +struct StructInitializer; +struct IntFieldInfo { + SmallVector Values; + + IntFieldInfo() = default; + IntFieldInfo(const SmallVector &V) { Values = V; } + IntFieldInfo(SmallVector &&V) { Values = V; } +}; +struct RealFieldInfo { + SmallVector AsIntValues; + + RealFieldInfo() = default; + RealFieldInfo(const SmallVector &V) { AsIntValues = V; } + RealFieldInfo(SmallVector &&V) { AsIntValues = V; } +}; +struct StructFieldInfo { + std::vector Initializers; + StructInfo Structure; + + StructFieldInfo() = default; + StructFieldInfo(const std::vector &V, StructInfo S) { + Initializers = V; + Structure = S; + } + StructFieldInfo(std::vector &&V, StructInfo S) { + Initializers = V; + Structure = S; + } +}; + +class FieldInitializer { +public: + FieldType FT; + union { + IntFieldInfo IntInfo; + RealFieldInfo RealInfo; + StructFieldInfo StructInfo; + }; + + ~FieldInitializer() { + switch (FT) { + case FT_INTEGRAL: + IntInfo.~IntFieldInfo(); + break; + case FT_REAL: + RealInfo.~RealFieldInfo(); + break; + case FT_STRUCT: + StructInfo.~StructFieldInfo(); + break; + } + } + + FieldInitializer(FieldType FT) : FT(FT) { + switch (FT) { + case FT_INTEGRAL: + new (&IntInfo) IntFieldInfo(); + break; + case FT_REAL: + new (&RealInfo) RealFieldInfo(); + break; + case FT_STRUCT: + new (&StructInfo) StructFieldInfo(); + break; + } + } + + FieldInitializer(SmallVector &&Values) : FT(FT_INTEGRAL) { + new (&IntInfo) IntFieldInfo(Values); + } + + FieldInitializer(SmallVector &&AsIntValues) : FT(FT_REAL) { + new (&RealInfo) RealFieldInfo(AsIntValues); + } + + FieldInitializer(std::vector &&Initializers, + struct StructInfo Structure) + : FT(FT_STRUCT) { + new (&StructInfo) StructFieldInfo(Initializers, Structure); + } + + FieldInitializer(const FieldInitializer &Initializer) : FT(Initializer.FT) { + switch (FT) { + case FT_INTEGRAL: + new (&IntInfo) IntFieldInfo(Initializer.IntInfo); + break; + case FT_REAL: + new (&RealInfo) RealFieldInfo(Initializer.RealInfo); + break; + case FT_STRUCT: 
+ new (&StructInfo) StructFieldInfo(Initializer.StructInfo); + break; + } + } + + FieldInitializer(FieldInitializer &&Initializer) : FT(Initializer.FT) { + switch (FT) { + case FT_INTEGRAL: + new (&IntInfo) IntFieldInfo(Initializer.IntInfo); + break; + case FT_REAL: + new (&RealInfo) RealFieldInfo(Initializer.RealInfo); + break; + case FT_STRUCT: + new (&StructInfo) StructFieldInfo(Initializer.StructInfo); + break; + } + } + + FieldInitializer &operator=(const FieldInitializer &Initializer) { + if (FT != Initializer.FT) { + switch (FT) { + case FT_INTEGRAL: + IntInfo.~IntFieldInfo(); + break; + case FT_REAL: + RealInfo.~RealFieldInfo(); + break; + case FT_STRUCT: + StructInfo.~StructFieldInfo(); + break; + } + } + FT = Initializer.FT; + switch (FT) { + case FT_INTEGRAL: + IntInfo = Initializer.IntInfo; + break; + case FT_REAL: + RealInfo = Initializer.RealInfo; + break; + case FT_STRUCT: + StructInfo = Initializer.StructInfo; + break; + } + return *this; + } + + FieldInitializer &operator=(FieldInitializer &&Initializer) { + if (FT != Initializer.FT) { + switch (FT) { + case FT_INTEGRAL: + IntInfo.~IntFieldInfo(); + break; + case FT_REAL: + RealInfo.~RealFieldInfo(); + break; + case FT_STRUCT: + StructInfo.~StructFieldInfo(); + break; + } + } + FT = Initializer.FT; + switch (FT) { + case FT_INTEGRAL: + IntInfo = Initializer.IntInfo; + break; + case FT_REAL: + RealInfo = Initializer.RealInfo; + break; + case FT_STRUCT: + StructInfo = Initializer.StructInfo; + break; + } + return *this; + } +}; + +struct StructInitializer { + std::vector FieldInitializers; +}; + +struct FieldInfo { + // Offset of the field within the containing STRUCT. + size_t Offset = 0; + + // Total size of the field (= LengthOf * Type). + size_t SizeOf = 0; + + // Number of elements in the field (1 if scalar, >1 if an array). + size_t LengthOf = 0; + + // Size of a single entry in this field, in bytes ("type" in MASM standards). 
+ size_t Type = 0; + + FieldInitializer Contents; + + FieldInfo(FieldType FT) : Contents(FT) {} +}; + +FieldInfo &StructInfo::addField(StringRef FieldName, FieldType FT) { + if (!FieldName.empty()) + FieldsByName[FieldName] = Fields.size(); + Fields.emplace_back(FT); + FieldInfo &Field = Fields.back(); + if (IsUnion) { + Field.Offset = 0; + } else { + Size = llvm::alignTo(Size, Alignment); + Field.Offset = Size; + } + return Field; +} + /// The concrete assembly parser instance. // Note that this is a full MCAsmParser, not an MCAsmParserExtension! // It's a peer of AsmParser, not of COFFAsmParser, WasmAsmParser, etc. @@ -139,7 +370,7 @@ class MasmParser : public MCAsmParser { /// addDirectiveHandler. StringMap ExtensionDirectiveMap; - /// maps assembly-time variable names to variables + /// maps assembly-time variable names to variables. struct Variable { StringRef Name; bool Redefinable = true; @@ -149,6 +380,15 @@ class MasmParser : public MCAsmParser { }; StringMap Variables; + /// Stack of active struct definitions. + SmallVector StructInProgress; + + /// Maps struct tags to struct definitions. + StringMap Structs; + + /// Maps data location names to user-defined types. + StringMap KnownType; + /// Stack of active macro instantiations. std::vector ActiveMacros; @@ -190,6 +430,9 @@ class MasmParser : public MCAsmParser { // Is alt macro mode enabled. bool AltMacroMode = false; + // Current <...> expression depth. 
+ unsigned AngleBracketDepth = 0U; + public: MasmParser(SourceMgr &SM, MCContext &Ctx, MCStreamer &Out, const MCAsmInfo &MAI, unsigned CB); @@ -247,6 +490,9 @@ class MasmParser : public MCAsmParser { bool isParsingMasm() const override { return true; } + bool LookUpFieldOffset(StringRef Base, StringRef Member, + unsigned &Offset) override; + bool parseMSInlineAsm(void *AsmLoc, std::string &AsmString, unsigned &NumOutputs, unsigned &NumInputs, SmallVectorImpl> &OpDecls, @@ -315,6 +561,9 @@ class MasmParser : public MCAsmParser { } static void DiagHandler(const SMDiagnostic &Diag, void *Context); + bool LookUpFieldOffset(const StructInfo &Structure, StringRef Member, + unsigned &Offset); + /// Should we emit DWARF describing this assembler source? (Returns false if /// the source has .file directives, which means we don't want to generate /// info describing the assembler source itself.) @@ -464,11 +713,14 @@ class MasmParser : public MCAsmParser { DK_ERRE, DK_ERRNZ, DK_ECHO, + DK_STRUCT, + DK_UNION, + DK_ENDS, DK_END }; - /// Maps directive name --> DirectiveKind enum, for - /// directives parsed by this class. + /// Maps directive name --> DirectiveKind enum, for directives parsed by this + /// class. StringMap DirectiveKindMap; // Codeview def_range type parsing. @@ -480,8 +732,8 @@ class MasmParser : public MCAsmParser { CVDR_DEFRANGE_REGISTER_REL }; - /// Maps Codeview def_range types --> CVDefRangeType enum, for - /// Codeview def_range types parsed by this class. + /// Maps Codeview def_range types --> CVDefRangeType enum, for Codeview + /// def_range types parsed by this class. StringMap CVDefRangeTypeMap; bool parseInitValue(unsigned Size); @@ -490,20 +742,85 @@ class MasmParser : public MCAsmParser { bool parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated); // "byte", "word", ... 
- bool parseScalarInstList(unsigned Size, - SmallVectorImpl &Values); + bool emitIntValue(const MCExpr *Value, unsigned Size); + bool parseScalarInitializer(unsigned Size, + SmallVectorImpl &Values, + unsigned StringPadLength = 0); + bool parseScalarInstList( + unsigned Size, SmallVectorImpl &Values, + const AsmToken::TokenKind EndToken = AsmToken::EndOfStatement); + bool emitIntegralValues(unsigned Size); + bool addIntegralField(StringRef Name, unsigned Size); bool parseDirectiveValue(StringRef IDVal, unsigned Size); bool parseDirectiveNamedValue(StringRef IDVal, unsigned Size, StringRef Name, SMLoc NameLoc); // "real4", "real8" + bool emitRealValues(const fltSemantics &Semantics); + bool addRealField(StringRef Name, const fltSemantics &Semantics); bool parseDirectiveRealValue(StringRef IDVal, const fltSemantics &Semantics); - bool parseRealInstList(const fltSemantics &Semantics, - SmallVectorImpl &Values); + bool parseRealInstList( + const fltSemantics &Semantics, SmallVectorImpl &Values, + const AsmToken::TokenKind EndToken = AsmToken::EndOfStatement); bool parseDirectiveNamedRealValue(StringRef IDVal, const fltSemantics &Semantics, StringRef Name, SMLoc NameLoc); + bool parseOptionalAngleBracketOpen(); + bool parseAngleBracketClose(const Twine &Msg = "expected '>'"); + + bool parseFieldInitializer(const FieldInfo &Field, + FieldInitializer &Initializer); + bool parseFieldInitializer(const FieldInfo &Field, + const IntFieldInfo &Contents, + FieldInitializer &Initializer); + bool parseFieldInitializer(const FieldInfo &Field, + const RealFieldInfo &Contents, + FieldInitializer &Initializer); + bool parseFieldInitializer(const FieldInfo &Field, + const StructFieldInfo &Contents, + FieldInitializer &Initializer); + + bool parseStructInitializer(const StructInfo &Structure, + StructInitializer &Initializer); + bool parseStructInstList( + const StructInfo &Structure, std::vector &Initializers, + const AsmToken::TokenKind EndToken = AsmToken::EndOfStatement); + + bool 
emitFieldValue(const FieldInfo &Field); + bool emitFieldValue(const FieldInfo &Field, const IntFieldInfo &Contents); + bool emitFieldValue(const FieldInfo &Field, const RealFieldInfo &Contents); + bool emitFieldValue(const FieldInfo &Field, const StructFieldInfo &Contents); + + bool emitStructValue(const StructInfo &Structure); + + bool emitFieldInitializer(const FieldInfo &Field, + const FieldInitializer &Initializer); + bool emitFieldInitializer(const FieldInfo &Field, + const IntFieldInfo &Contents, + const IntFieldInfo &Initializer); + bool emitFieldInitializer(const FieldInfo &Field, + const RealFieldInfo &Contents, + const RealFieldInfo &Initializer); + bool emitFieldInitializer(const FieldInfo &Field, + const StructFieldInfo &Contents, + const StructFieldInfo &Initializer); + + bool emitStructInitializer(const StructInfo &Structure, + const StructInitializer &Initializer); + + // User-defined types (structs, unions): + bool emitStructValue(const StructInfo &Structure, + const StructInitializer &Initializer, + size_t InitialOffset = 0, size_t InitialField = 0); + bool emitStructValues(const StructInfo &Structure); + bool addStructField(StringRef Name, const StructInfo &Structure); + bool parseDirectiveStructValue(const StructInfo &Structure, + StringRef Directive, SMLoc DirLoc); + bool parseDirectiveNamedStructValue(const StructInfo &Structure, + StringRef Directive, SMLoc DirLoc, + StringRef Name); + // "=", "equ", "textequ" bool parseDirectiveEquate(StringRef IDVal, StringRef Name, DirectiveKind DirKind); @@ -562,8 +879,14 @@ class MasmParser : public MCAsmParser { // alternate macro mode directives bool parseDirectiveAltmacro(StringRef Directive); - /// Parse a directive like ".globl" which - /// accepts a single symbol (which should be a label or an external). 
+ bool parseDirectiveStruct(StringRef Directive, DirectiveKind DirKind, + StringRef Name, SMLoc NameLoc); + bool parseDirectiveNestedStruct(StringRef Directive, DirectiveKind DirKind); + bool parseDirectiveEnds(StringRef Name, SMLoc NameLoc); + bool parseDirectiveNestedEnds(); + + /// Parse a directive like ".globl" which accepts a single symbol (which + /// should be a label or an external). bool parseDirectiveSymbolAttribute(MCSymbolAttr Attr); bool parseDirectiveComm(bool IsLocal); // ".comm" and ".lcomm" @@ -1024,7 +1347,7 @@ bool MasmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) { return Error(FirstTokenLoc, "invalid token in expression"); } } - // Parse symbol variant + // Parse symbol variant. std::pair Split; if (!MAI.useParensForSymbolVariant()) { if (FirstTokenKind == AsmToken::String) { @@ -1060,7 +1383,7 @@ bool MasmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) { MCSymbolRefExpr::VariantKind Variant = MCSymbolRefExpr::VK_None; - // Lookup the symbol variant if used. + // Look up the symbol variant if used. if (!Split.second.empty()) { Variant = MCSymbolRefExpr::getVariantKindForName(Split.second); if (Variant != MCSymbolRefExpr::VK_Invalid) { @@ -1073,6 +1396,27 @@ bool MasmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) { } } + // Find the field offset if used. + unsigned Offset = 0; + Split = SymbolName.split('.'); + if (!Split.second.empty()) { + SymbolName = Split.first; + if (Structs.count(SymbolName.lower()) && + !LookUpFieldOffset(SymbolName, Split.second, Offset)) { + // This is actually a reference to a field offset. 
+ Res = MCConstantExpr::create(Offset, getContext()); + return false; + } + + auto TypeIt = KnownType.find(SymbolName); + if (TypeIt == KnownType.end() || + LookUpFieldOffset(*TypeIt->second, Split.second, Offset)) { + std::pair BaseMember = Split.second.split('.'); + StringRef Base = BaseMember.first, Member = BaseMember.second; + LookUpFieldOffset(Base, Member, Offset); + } + } + MCSymbol *Sym = getContext().getInlineAsmLabel(SymbolName); if (!Sym) Sym = getContext().getOrCreateSymbol(SymbolName); @@ -1093,7 +1437,15 @@ bool MasmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) { } // Otherwise create a symbol ref. - Res = MCSymbolRefExpr::create(Sym, Variant, getContext(), FirstTokenLoc); + const MCExpr *SymRef = + MCSymbolRefExpr::create(Sym, Variant, getContext(), FirstTokenLoc); + if (Offset) { + Res = MCBinaryExpr::create(MCBinaryExpr::Add, SymRef, + MCConstantExpr::create(Offset, getContext()), + getContext()); + } else { + Res = SymRef; + } return false; } case AsmToken::BigNum: @@ -1104,10 +1456,10 @@ bool MasmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) { Res = MCConstantExpr::create(IntVal, getContext()); EndLoc = Lexer.getTok().getEndLoc(); Lex(); // Eat token. - // Look for 'b' or 'f' following an Integer as a directional label + // Look for 'b' or 'f' following an Integer as a directional label. if (Lexer.getKind() == AsmToken::Identifier) { StringRef IDVal = getTok().getString(); - // Lookup the symbol variant if used. + // Look up the symbol variant if used. std::pair Split = IDVal.split('@'); MCSymbolRefExpr::VariantKind Variant = MCSymbolRefExpr::VK_None; if (Split.first.size() != IDVal.size()) { @@ -1324,7 +1676,8 @@ bool MasmParser::parseAbsoluteExpression(int64_t &Res) { static unsigned getGNUBinOpPrecedence(AsmToken::TokenKind K, MCBinaryExpr::Opcode &Kind, - bool ShouldUseLogicalShr) { + bool ShouldUseLogicalShr, + bool EndExpressionAtGreater) { switch (K) { default: return 0; // not a binop. 
@@ -1352,6 +1705,8 @@ static unsigned getGNUBinOpPrecedence(AsmToken::TokenKind K, Kind = MCBinaryExpr::LTE; return 3; case AsmToken::Greater: + if (EndExpressionAtGreater) + return 0; Kind = MCBinaryExpr::GT; return 3; case AsmToken::GreaterEqual: @@ -1393,6 +1748,8 @@ static unsigned getGNUBinOpPrecedence(AsmToken::TokenKind K, Kind = MCBinaryExpr::Shl; return 6; case AsmToken::GreaterGreater: + if (EndExpressionAtGreater) + return 0; Kind = ShouldUseLogicalShr ? MCBinaryExpr::LShr : MCBinaryExpr::AShr; return 6; } @@ -1401,7 +1758,8 @@ static unsigned getGNUBinOpPrecedence(AsmToken::TokenKind K, unsigned MasmParser::getBinOpPrecedence(AsmToken::TokenKind K, MCBinaryExpr::Opcode &Kind) { bool ShouldUseLogicalShr = MAI.shouldUseLogicalShr(); - return getGNUBinOpPrecedence(K, Kind, ShouldUseLogicalShr); + return getGNUBinOpPrecedence(K, Kind, ShouldUseLogicalShr, + AngleBracketDepth > 0); } /// Parse all binary operators with precedence >= 'Precedence'. @@ -1444,11 +1802,11 @@ bool MasmParser::parseBinOpRHS(unsigned Precedence, const MCExpr *&Res, bool MasmParser::parseStatement(ParseStatementInfo &Info, MCAsmParserSemaCallback *SI) { assert(!hasPendingError() && "parseStatement started with pending error"); - // Eat initial spaces and comments + // Eat initial spaces and comments. while (Lexer.is(AsmToken::Space)) Lex(); if (Lexer.is(AsmToken::EndOfStatement)) { - // if this is a line comment we can drop it safely + // If this is a line comment we can drop it safely. if (getTok().getString().empty() || getTok().getString().front() == '\r' || getTok().getString().front() == '\n') Out.AddBlankLine(); @@ -1678,6 +2036,13 @@ bool MasmParser::parseStatement(ParseStatementInfo &Info, getTargetParser().flushPendingInstructions(getStreamer()); + // Special-case handling of structure-end directives at higher priority, + // since ENDS is overloaded as a segment-end directive. 
+ if (IDVal.equals_lower("ends") && StructInProgress.size() > 1 && + getTok().is(AsmToken::EndOfStatement)) { + return parseDirectiveNestedEnds(); + } + // First, check the extension directive map to see if any extension has // registered itself to parse this directive. std::pair Handler = @@ -1735,6 +2100,11 @@ bool MasmParser::parseStatement(ParseStatementInfo &Info, return parseDirectiveRealValue(IDVal, APFloat::IEEEsingle()); case DK_REAL8: return parseDirectiveRealValue(IDVal, APFloat::IEEEdouble()); + case DK_STRUCT: + case DK_UNION: + return parseDirectiveNestedStruct(IDVal, DirKind); + case DK_ENDS: + return parseDirectiveNestedEnds(); case DK_ALIGN: return parseDirectiveAlign(); case DK_ORG: @@ -1878,6 +2248,12 @@ bool MasmParser::parseStatement(ParseStatementInfo &Info, return Error(IDLoc, "unknown directive"); } + // We also check if this is allocating memory with user-defined type. + auto IDIt = Structs.find(IDVal.lower()); + if (IDIt != Structs.end()) + return parseDirectiveStructValue(/*Structure=*/IDIt->getValue(), IDVal, + IDLoc); + // Non-conditional Microsoft directives sometimes follow their first argument. const AsmToken nextTok = getTok(); const StringRef nextVal = nextTok.getString(); @@ -1894,6 +2270,13 @@ bool MasmParser::parseStatement(ParseStatementInfo &Info, getTargetParser().flushPendingInstructions(getStreamer()); + // Special-case handling of structure-end directives at higher priority, since + // ENDS is overloaded as a segment-end directive. + if (nextVal.equals_lower("ends") && StructInProgress.size() == 1) { + Lex(); + return parseDirectiveEnds(IDVal, IDLoc); + } + // First, check the extension directive map to see if any extension has // registered itself to parse this directive. 
std::pair Handler = @@ -1904,7 +2287,7 @@ bool MasmParser::parseStatement(ParseStatementInfo &Info, return (*Handler.second)(Handler.first, nextVal, nextLoc); } - // Finally, if no one else is interested in this directive, it must be + // If no one else is interested in this directive, it must be // generic and familiar to this class. DirKindIt = DirectiveKindMap.find(nextVal.lower()); DirKind = (DirKindIt == DirectiveKindMap.end()) @@ -1945,6 +2328,21 @@ bool MasmParser::parseStatement(ParseStatementInfo &Info, Lex(); return parseDirectiveNamedRealValue(nextVal, APFloat::IEEEdouble(), IDVal, IDLoc); + case DK_STRUCT: + case DK_UNION: + Lex(); + return parseDirectiveStruct(nextVal, DirKind, IDVal, IDLoc); + case DK_ENDS: + Lex(); + return parseDirectiveEnds(IDVal, IDLoc); + } + + // Finally, we check if this is allocating a variable with user-defined type. + auto NextIt = Structs.find(nextVal.lower()); + if (NextIt != Structs.end()) { + Lex(); + return parseDirectiveNamedStructValue(/*Structure=*/NextIt->getValue(), + nextVal, nextLoc, IDVal); } // __asm _emit or __asm __emit @@ -2029,19 +2427,19 @@ bool MasmParser::parseStatement(ParseStatementInfo &Info, return false; } -// Parse and erase curly braces marking block start/end +// Parse and erase curly braces marking block start/end. bool MasmParser::parseCurlyBlockScope( SmallVectorImpl &AsmStrRewrites) { - // Identify curly brace marking block start/end + // Identify curly brace marking block start/end. if (Lexer.isNot(AsmToken::LCurly) && Lexer.isNot(AsmToken::RCurly)) return false; SMLoc StartLoc = Lexer.getLoc(); - Lex(); // Eat the brace + Lex(); // Eat the brace. if (Lexer.is(AsmToken::EndOfStatement)) - Lex(); // Eat EndOfStatement following the brace + Lex(); // Eat EndOfStatement following the brace. - // Erase the block start/end brace from the output asm string + // Erase the block start/end brace from the output asm string. 
AsmStrRewrites.emplace_back(AOK_Skip, StartLoc, Lexer.getLoc().getPointer() - StartLoc.getPointer()); return true; @@ -2348,7 +2746,7 @@ bool MasmParser::parseMacroArgument(MCAsmMacroArgument &MA, bool Vararg) { if (Lexer.is(AsmToken::Space)) { SpaceEaten = true; - Lexer.Lex(); // Eat spaces + Lexer.Lex(); // Eat spaces. } // Spaces can delimit parameters, but could also be part an expression. @@ -2431,7 +2829,7 @@ bool MasmParser::parseMacroArguments(const MCAsmMacro *M, if (AltMacroMode && Lexer.is(AsmToken::Percent)) { const MCExpr *AbsoluteExp; int64_t Value; - /// Eat '%' + /// Eat '%'. Lex(); if (parseExpression(AbsoluteExp, EndLoc)) return false; @@ -2448,7 +2846,7 @@ bool MasmParser::parseMacroArguments(const MCAsmMacro *M, const char *StrChar = StrLoc.getPointer(); const char *EndChar = EndLoc.getPointer(); jumpToLoc(EndLoc, CurBuffer); - /// Eat from '<' to '>' + /// Eat from '<' to '>'. Lex(); AsmToken newToken(AsmToken::String, StringRef(StrChar, EndChar - StrChar)); @@ -2634,7 +3032,7 @@ bool MasmParser::parseDirectiveEquate(StringRef IDVal, StringRef Name, Var.IsText = true; Var.TextValue = Value; - // Accept a text-list, not just one text-item + // Accept a text-list, not just one text-item. auto parseItem = [&]() -> bool { if (parseTextItem(Value)) return true; @@ -2650,7 +3048,7 @@ bool MasmParser::parseDirectiveEquate(StringRef IDVal, StringRef Name, if (DirKind == DK_TEXTEQU) return TokError("expected <text> in '" + Twine(IDVal) + "' directive"); - // Parse as expression assignment + // Parse as expression assignment. const MCExpr *Expr; SMLoc EndLoc, StartLoc = Lexer.getLoc(); if (parseExpression(Expr, EndLoc)) @@ -2748,7 +3146,7 @@ bool MasmParser::parseAngleBracketString(std::string &Data) { const char *StartChar = StartLoc.getPointer() + 1; const char *EndChar = EndLoc.getPointer() - 1; jumpToLoc(EndLoc, CurBuffer); - /// Eat from '<' to '>' + // Eat from '<' to '>'.
Lex(); Data = angleBracketString(StringRef(StartChar, EndChar - StartChar)); @@ -2781,89 +3179,140 @@ bool MasmParser::parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated) { return false; } -bool MasmParser::parseScalarInstList(unsigned Size, - SmallVectorImpl &Values) { - do { - if (getTok().is(AsmToken::String)) { - StringRef Value = getTok().getStringContents(); - if (Size == 1) { - // Treat each character as an initializer. - for (const char CharVal : Value) - Values.push_back(MCConstantExpr::create(CharVal, getContext())); - } else { - // Treat the string as an initial value in big-endian representation. - if (Value.size() > Size) - return Error(getTok().getLoc(), "out of range literal value"); - - uint64_t IntValue = 0; - for (const unsigned char CharVal : Value.bytes()) - IntValue = (IntValue << 8) | CharVal; - Values.push_back(MCConstantExpr::create(IntValue, getContext())); - } - Lex(); +bool MasmParser::emitIntValue(const MCExpr *Value, unsigned Size) { + // Special case constant expressions to match code generator. + if (const MCConstantExpr *MCE = dyn_cast(Value)) { + assert(Size <= 8 && "Invalid size"); + int64_t IntValue = MCE->getValue(); + if (!isUIntN(8 * Size, IntValue) && !isIntN(8 * Size, IntValue)) + return Error(MCE->getLoc(), "out of range literal value"); + getStreamer().emitIntValue(IntValue, Size); + } else { + const MCSymbolRefExpr *MSE = dyn_cast(Value); + if (MSE && MSE->getSymbol().getName() == "?") { + // ? initializer; treat as 0. + getStreamer().emitIntValue(0, Size); } else { - const MCExpr *Value; - if (checkForValidSection() || parseExpression(Value)) + getStreamer().emitValue(Value, Size, Value->getLoc()); + } + } + return false; +} + +bool MasmParser::parseScalarInitializer(unsigned Size, + SmallVectorImpl &Values, + unsigned StringPadLength) { + if (getTok().is(AsmToken::String)) { + StringRef Value = getTok().getStringContents(); + if (Size == 1) { + // Treat each character as an initializer. 
+ for (const char CharVal : Value) + Values.push_back(MCConstantExpr::create(CharVal, getContext())); + + // Pad the string with spaces to the specified length. + for (size_t i = Value.size(); i < StringPadLength; ++i) + Values.push_back(MCConstantExpr::create(' ', getContext())); + } else { + // Treat the string as an initial value in big-endian representation. + if (Value.size() > Size) + return Error(getTok().getLoc(), "out of range literal value"); + + uint64_t IntValue = 0; + for (const unsigned char CharVal : Value.bytes()) + IntValue = (IntValue << 8) | CharVal; + Values.push_back(MCConstantExpr::create(IntValue, getContext())); + } + Lex(); + } else { + const MCExpr *Value; + if (checkForValidSection() || parseExpression(Value)) + return true; + if (getTok().is(AsmToken::Identifier) && + getTok().getString().equals_lower("dup")) { + Lex(); // Eat 'dup'. + const MCConstantExpr *MCE = dyn_cast(Value); + if (!MCE) + return Error(Value->getLoc(), + "cannot repeat value a non-constant number of times"); + const int64_t Repetitions = MCE->getValue(); + if (Repetitions < 0) + return Error(Value->getLoc(), + "cannot repeat value a negative number of times"); + + SmallVector DuplicatedValues; + if (parseToken(AsmToken::LParen, + "parentheses required for 'dup' contents") || + parseScalarInstList(Size, DuplicatedValues) || + parseToken(AsmToken::RParen, "unmatched parentheses")) return true; - if (getTok().is(AsmToken::Identifier) && - getTok().getString().equals_lower("dup")) { - Lex(); // eat 'dup' - const MCConstantExpr *MCE = dyn_cast(Value); - if (!MCE) - return Error(Value->getLoc(), - "cannot repeat value a non-constant number of times"); - const int64_t Repetitions = MCE->getValue(); - if (Repetitions < 0) - return Error(Value->getLoc(), - "cannot repeat value a negative number of times"); - - SmallVector DuplicatedValues; - if (parseToken(AsmToken::LParen, - "parentheses required for 'dup' contents") || - parseScalarInstList(Size, DuplicatedValues) || - 
parseToken(AsmToken::RParen, "unmatched parentheses")) - return true; - for (int i = 0; i < Repetitions; ++i) - Values.append(DuplicatedValues.begin(), DuplicatedValues.end()); - } else { - Values.push_back(Value); - } + for (int i = 0; i < Repetitions; ++i) + Values.append(DuplicatedValues.begin(), DuplicatedValues.end()); + } else { + Values.push_back(Value); } + } + return false; +} - // Continue if we see a comma. (Also, allow line continuation.) - } while (parseOptionalToken(AsmToken::Comma) && - (getTok().isNot(AsmToken::EndOfStatement) || - !parseToken(AsmToken::EndOfStatement))); +bool MasmParser::parseScalarInstList(unsigned Size, + SmallVectorImpl &Values, + const AsmToken::TokenKind EndToken) { + while (getTok().isNot(EndToken) && + (EndToken != AsmToken::Greater || + getTok().isNot(AsmToken::GreaterGreater))) { + parseScalarInitializer(Size, Values); + + // If we see a comma, continue, and allow line continuation. + if (!parseOptionalToken(AsmToken::Comma)) + break; + parseOptionalToken(AsmToken::EndOfStatement); + } + return false; +} + +bool MasmParser::emitIntegralValues(unsigned Size) { + SmallVector Values; + if (checkForValidSection() || parseScalarInstList(Size, Values)) + return true; + + for (auto Value : Values) { + emitIntValue(Value, Size); + } + return false; +} + +// Add a field to the current structure. +bool MasmParser::addIntegralField(StringRef Name, unsigned Size) { + StructInfo &Struct = StructInProgress.back(); + FieldInfo &Field = Struct.addField(Name, FT_INTEGRAL); + IntFieldInfo &IntInfo = Field.Contents.IntInfo; + + Field.Type = Size; + + if (parseScalarInstList(Size, IntInfo.Values)) + return true; + Field.SizeOf = Field.Type * IntInfo.Values.size(); + Field.LengthOf = IntInfo.Values.size(); + if (Struct.IsUnion) + Struct.Size = std::max(Struct.Size, Field.SizeOf); + else + Struct.Size += Field.SizeOf; return false; } /// parseDirectiveValue /// ::= (byte | word | ... 
) [ expression (, expression)* ] bool MasmParser::parseDirectiveValue(StringRef IDVal, unsigned Size) { - SmallVector Values; - if (parseScalarInstList(Size, Values)) + if (StructInProgress.empty()) { + // Initialize data value. + if (emitIntegralValues(Size)) + return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); + } else if (addIntegralField("", Size)) { return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); - - for (const MCExpr *Value : Values) { - // Special case constant expressions to match code generator. - if (const MCConstantExpr *MCE = dyn_cast(Value)) { - assert(Size <= 8 && "Invalid size"); - int64_t IntValue = MCE->getValue(); - if (!isUIntN(8 * Size, IntValue) && !isIntN(8 * Size, IntValue)) - return Error(MCE->getLoc(), "out of range literal value"); - getStreamer().emitIntValue(IntValue, Size); - } else { - const MCSymbolRefExpr *MSE = dyn_cast(Value); - if (MSE && MSE->getSymbol().getName() == "?") { - // ? initializer; treat as 0. - getStreamer().emitIntValue(0, Size); - } else { - getStreamer().emitValue(Value, Size, Value->getLoc()); - } - } } + return false; } @@ -2871,9 +3320,17 @@ bool MasmParser::parseDirectiveValue(StringRef IDVal, unsigned Size) { /// ::= name (byte | word | ... ) [ expression (, expression)* ] bool MasmParser::parseDirectiveNamedValue(StringRef IDVal, unsigned Size, StringRef Name, SMLoc NameLoc) { - MCSymbol *Sym = getContext().getOrCreateSymbol(Name); - getStreamer().emitLabel(Sym); - return parseDirectiveValue(IDVal, Size); + if (StructInProgress.empty()) { + // Initialize named data value. 
+ MCSymbol *Sym = getContext().getOrCreateSymbol(Name); + getStreamer().emitLabel(Sym); + if (emitIntegralValues(Size)) + return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); + } else if (addIntegralField(Name, Size)) { + return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); + } + + return false; } static bool parseHexOcta(MasmParser &Asm, uint64_t &hi, uint64_t &lo) { @@ -2902,8 +3359,9 @@ bool MasmParser::parseRealValue(const fltSemantics &Semantics, APInt &Res) { if (getLexer().is(AsmToken::Minus)) { Lexer.Lex(); IsNeg = true; - } else if (getLexer().is(AsmToken::Plus)) + } else if (getLexer().is(AsmToken::Plus)) { Lexer.Lex(); + } if (Lexer.is(AsmToken::Error)) return TokError(Lexer.getErr()); @@ -2915,16 +3373,19 @@ bool MasmParser::parseRealValue(const fltSemantics &Semantics, APInt &Res) { APFloat Value(Semantics); StringRef IDVal = getTok().getString(); if (getLexer().is(AsmToken::Identifier)) { - if (!IDVal.compare_lower("infinity") || !IDVal.compare_lower("inf")) + if (IDVal.equals_lower("infinity") || IDVal.equals_lower("inf")) Value = APFloat::getInf(Semantics); - else if (!IDVal.compare_lower("nan")) + else if (IDVal.equals_lower("nan")) Value = APFloat::getNaN(Semantics, false, ~0); + else if (IDVal.equals_lower("?")) + Value = APFloat::getZero(Semantics); else return TokError("invalid floating point literal"); } else if (errorToBool( Value.convertFromString(IDVal, APFloat::rmNearestTiesToEven) - .takeError())) + .takeError())) { return TokError("invalid floating point literal"); + } if (IsNeg) Value.changeSign(); @@ -2937,8 +3398,11 @@ bool MasmParser::parseRealValue(const fltSemantics &Semantics, APInt &Res) { } bool MasmParser::parseRealInstList(const fltSemantics &Semantics, - SmallVectorImpl &ValuesAsInt) { - do { + SmallVectorImpl &ValuesAsInt, + const AsmToken::TokenKind EndToken) { + while (getTok().isNot(EndToken) || + (EndToken == AsmToken::Greater && + getTok().isNot(AsmToken::GreaterGreater))) { const AsmToken NextTok = 
Lexer.peekTok(); if (NextTok.is(AsmToken::Identifier) && NextTok.getString().equals_lower("dup")) { @@ -2969,11 +3433,48 @@ bool MasmParser::parseRealInstList(const fltSemantics &Semantics, return true; ValuesAsInt.push_back(AsInt); } + // Continue if we see a comma. (Also, allow line continuation.) - } while (parseOptionalToken(AsmToken::Comma) && - (getTok().isNot(AsmToken::EndOfStatement) || - !parseToken(AsmToken::EndOfStatement))); + if (!parseOptionalToken(AsmToken::Comma)) + break; + parseOptionalToken(AsmToken::EndOfStatement); + } + + return false; +} + +// Initialize real data values. +bool MasmParser::emitRealValues(const fltSemantics &Semantics) { + SmallVector ValuesAsInt; + if (parseRealInstList(Semantics, ValuesAsInt)) + return true; + for (const APInt &AsInt : ValuesAsInt) { + getStreamer().emitIntValue(AsInt.getLimitedValue(), + AsInt.getBitWidth() / 8); + } + return false; +} + +// Add a real field to the current struct. +bool MasmParser::addRealField(StringRef Name, const fltSemantics &Semantics) { + StructInfo &Struct = StructInProgress.back(); + FieldInfo &Field = Struct.addField(Name, FT_REAL); + RealFieldInfo &RealInfo = Field.Contents.RealInfo; + + Field.SizeOf = 0; + + if (checkForValidSection() || + parseRealInstList(Semantics, RealInfo.AsIntValues)) + return true; + + Field.Type = RealInfo.AsIntValues.back().getBitWidth() / 8; + Field.LengthOf = RealInfo.AsIntValues.size(); + Field.SizeOf = Field.Type * Field.LengthOf; + if (Struct.IsUnion) + Struct.Size = std::max(Struct.Size, Field.SizeOf); + else + Struct.Size += Field.SizeOf; return false; } @@ -2984,13 +3485,12 @@ bool MasmParser::parseDirectiveRealValue(StringRef IDVal, if (checkForValidSection()) return true; - SmallVector ValuesAsInt; - if (parseRealInstList(Semantics, ValuesAsInt)) + if (StructInProgress.empty()) { + // Initialize data value. 
+ if (emitRealValues(Semantics)) + return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); + } else if (addRealField("", Semantics)) { return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); - - for (const APInt &AsInt : ValuesAsInt) { - getStreamer().emitIntValue(AsInt.getLimitedValue(), - AsInt.getBitWidth() / 8); } return false; } @@ -3000,9 +3500,633 @@ bool MasmParser::parseDirectiveRealValue(StringRef IDVal, bool MasmParser::parseDirectiveNamedRealValue(StringRef IDVal, const fltSemantics &Semantics, StringRef Name, SMLoc NameLoc) { - MCSymbol *Sym = getContext().getOrCreateSymbol(Name); - getStreamer().emitLabel(Sym); - return parseDirectiveRealValue(IDVal, Semantics); + if (checkForValidSection()) + return true; + + if (StructInProgress.empty()) { + // Initialize named data value. + MCSymbol *Sym = getContext().getOrCreateSymbol(Name); + getStreamer().emitLabel(Sym); + if (emitRealValues(Semantics)) + return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); + } else if (addRealField(Name, Semantics)) { + return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); + } + return false; +} + +bool MasmParser::parseOptionalAngleBracketOpen() { + const AsmToken Tok = getTok(); + if (parseOptionalToken(AsmToken::LessLess)) { + AngleBracketDepth++; + Lexer.UnLex(AsmToken(AsmToken::Less, Tok.getString().substr(1))); + return true; + } else if (parseOptionalToken(AsmToken::LessGreater)) { + AngleBracketDepth++; + Lexer.UnLex(AsmToken(AsmToken::Greater, Tok.getString().substr(1))); + return true; + } else if (parseOptionalToken(AsmToken::Less)) { + AngleBracketDepth++; + return true; + } + + return false; +} + +bool MasmParser::parseAngleBracketClose(const Twine &Msg) { + const AsmToken Tok = getTok(); + if (parseOptionalToken(AsmToken::GreaterGreater)) { + Lexer.UnLex(AsmToken(AsmToken::Greater, Tok.getString().substr(1))); + } else if (parseToken(AsmToken::Greater, Msg)) { + return true; + } + AngleBracketDepth--; + return false; +} + +bool 
MasmParser::parseFieldInitializer(const FieldInfo &Field, + const IntFieldInfo &Contents, + FieldInitializer &Initializer) { + SMLoc Loc = getTok().getLoc(); + + SmallVector Values; + if (parseOptionalToken(AsmToken::LCurly)) { + if (Field.LengthOf == 1 && Field.Type > 1) + return Error(Loc, "Cannot initialize scalar field with array value"); + if (parseScalarInstList(Field.Type, Values, AsmToken::RCurly) || + parseToken(AsmToken::RCurly)) + return true; + } else if (parseOptionalAngleBracketOpen()) { + if (Field.LengthOf == 1 && Field.Type > 1) + return Error(Loc, "Cannot initialize scalar field with array value"); + if (parseScalarInstList(Field.Type, Values, AsmToken::Greater) || + parseAngleBracketClose()) + return true; + } else if (Field.LengthOf > 1 && Field.Type > 1) { + return Error(Loc, "Cannot initialize array field with scalar value"); + } else if (parseScalarInitializer(Field.Type, Values, + /*StringPadLength=*/Field.LengthOf)) { + return true; + } + + if (Values.size() > Field.LengthOf) { + return Error(Loc, "Initializer too long for field; expected at most " + + std::to_string(Field.LengthOf) + " elements, got " + + std::to_string(Values.size())); + } + // Default-initialize all remaining values. + Values.append(Contents.Values.begin() + Values.size(), Contents.Values.end()); + + Initializer = FieldInitializer(std::move(Values)); + return false; +} + +bool MasmParser::parseFieldInitializer(const FieldInfo &Field, + const RealFieldInfo &Contents, + FieldInitializer &Initializer) { + const fltSemantics &Semantics = + (Field.Type == 4) ? 
APFloat::IEEEsingle() : APFloat::IEEEdouble(); + + SMLoc Loc = getTok().getLoc(); + + SmallVector AsIntValues; + if (parseOptionalToken(AsmToken::LCurly)) { + if (Field.LengthOf == 1) + return Error(Loc, "Cannot initialize scalar field with array value"); + if (parseRealInstList(Semantics, AsIntValues, AsmToken::RCurly) || + parseToken(AsmToken::RCurly)) + return true; + } else if (parseOptionalAngleBracketOpen()) { + if (Field.LengthOf == 1) + return Error(Loc, "Cannot initialize scalar field with array value"); + if (parseRealInstList(Semantics, AsIntValues, AsmToken::Greater) || + parseAngleBracketClose()) + return true; + } else if (Field.LengthOf > 1) { + return Error(Loc, "Cannot initialize array field with scalar value"); + } else { + AsIntValues.emplace_back(); + if (parseRealValue(Semantics, AsIntValues.back())) + return true; + } + + if (AsIntValues.size() > Field.LengthOf) { + return Error(Loc, "Initializer too long for field; expected at most " + + std::to_string(Field.LengthOf) + " elements, got " + + std::to_string(AsIntValues.size())); + } + // Default-initialize all remaining values. 
+ AsIntValues.append(Contents.AsIntValues.begin() + AsIntValues.size(), + Contents.AsIntValues.end()); + + Initializer = FieldInitializer(std::move(AsIntValues)); + return false; +} + +bool MasmParser::parseFieldInitializer(const FieldInfo &Field, + const StructFieldInfo &Contents, + FieldInitializer &Initializer) { + SMLoc Loc = getTok().getLoc(); + + std::vector Initializers; + if (Field.LengthOf > 1) { + if (parseOptionalToken(AsmToken::LCurly)) { + if (parseStructInstList(Contents.Structure, Initializers, + AsmToken::RCurly) || + parseToken(AsmToken::RCurly)) + return true; + } else if (parseOptionalAngleBracketOpen()) { + if (parseStructInstList(Contents.Structure, Initializers, + AsmToken::Greater) || + parseAngleBracketClose()) + return true; + } else { + return Error(Loc, "Cannot initialize array field with scalar value"); + } + } else { + Initializers.emplace_back(); + if (parseStructInitializer(Contents.Structure, Initializers.back())) + return true; + } + + if (Initializers.size() > Field.LengthOf) { + return Error(Loc, "Initializer too long for field; expected at most " + + std::to_string(Field.LengthOf) + " elements, got " + + std::to_string(Initializers.size())); + } + // Default-initialize all remaining values. 
+ Initializers.insert(Initializers.end(), + Contents.Initializers.begin() + Initializers.size(), + Contents.Initializers.end()); + + Initializer = FieldInitializer(std::move(Initializers), Contents.Structure); + return false; +} + +bool MasmParser::parseFieldInitializer(const FieldInfo &Field, + FieldInitializer &Initializer) { + switch (Field.Contents.FT) { + case FT_INTEGRAL: + return parseFieldInitializer(Field, Field.Contents.IntInfo, Initializer); + case FT_REAL: + return parseFieldInitializer(Field, Field.Contents.RealInfo, Initializer); + case FT_STRUCT: + return parseFieldInitializer(Field, Field.Contents.StructInfo, Initializer); + } +} + +bool MasmParser::parseStructInitializer(const StructInfo &Structure, + StructInitializer &Initializer) { + const AsmToken FirstToken = getTok(); + + Optional EndToken; + if (parseOptionalToken(AsmToken::LCurly)) { + EndToken = AsmToken::RCurly; + } else if (parseOptionalAngleBracketOpen()) { + EndToken = AsmToken::Greater; + AngleBracketDepth++; + } else if (FirstToken.is(AsmToken::Identifier) && + FirstToken.getString() == "?") { + // ? initializer; leave EndToken uninitialized to treat as empty. + if (parseToken(AsmToken::Identifier)) + return true; + } else { + return Error(FirstToken.getLoc(), "Expected struct initializer"); + } + + auto &FieldInitializers = Initializer.FieldInitializers; + size_t FieldIndex = 0; + if (EndToken.hasValue()) { + // Initialize all fields with given initializers. + while (getTok().isNot(EndToken.getValue()) && + FieldIndex < Structure.Fields.size()) { + const FieldInfo &Field = Structure.Fields[FieldIndex++]; + if (parseOptionalToken(AsmToken::Comma)) { + // Empty initializer; use the default and continue. (Also, allow line + // continuation.) 
+ FieldInitializers.push_back(Field.Contents); + parseOptionalToken(AsmToken::EndOfStatement); + continue; + } + FieldInitializers.emplace_back(Field.Contents.FT); + if (parseFieldInitializer(Field, FieldInitializers.back())) + return true; + + // Continue if we see a comma. (Also, allow line continuation.) + SMLoc CommaLoc = getTok().getLoc(); + if (!parseOptionalToken(AsmToken::Comma)) + break; + if (FieldIndex == Structure.Fields.size()) + return Error(CommaLoc, "'" + Structure.Name + + "' initializer initializes too many fields"); + parseOptionalToken(AsmToken::EndOfStatement); + } + } + // Default-initialize all remaining fields. + for (auto It = Structure.Fields.begin() + FieldIndex; + It != Structure.Fields.end(); ++It) { + const FieldInfo &Field = *It; + FieldInitializers.push_back(Field.Contents); + } + + if (EndToken.hasValue()) { + if (EndToken.getValue() == AsmToken::Greater) + return parseAngleBracketClose(); + + return parseToken(EndToken.getValue()); + } + + return false; +} + +bool MasmParser::parseStructInstList( + const StructInfo &Structure, std::vector &Initializers, + const AsmToken::TokenKind EndToken) { + while (getTok().isNot(EndToken) || + (EndToken == AsmToken::Greater && + getTok().isNot(AsmToken::GreaterGreater))) { + const AsmToken NextTok = Lexer.peekTok(); + if (NextTok.is(AsmToken::Identifier) && + NextTok.getString().equals_lower("dup")) { + const MCExpr *Value; + if (parseExpression(Value) || parseToken(AsmToken::Identifier)) + return true; + const MCConstantExpr *MCE = dyn_cast(Value); + if (!MCE) + return Error(Value->getLoc(), + "cannot repeat value a non-constant number of times"); + const int64_t Repetitions = MCE->getValue(); + if (Repetitions < 0) + return Error(Value->getLoc(), + "cannot repeat value a negative number of times"); + + std::vector DuplicatedValues; + if (parseToken(AsmToken::LParen, + "parentheses required for 'dup' contents") || + parseStructInstList(Structure, DuplicatedValues) || + 
parseToken(AsmToken::RParen, "unmatched parentheses")) + return true; + + for (int i = 0; i < Repetitions; ++i) + Initializers.insert(Initializers.end(), DuplicatedValues.begin(), + DuplicatedValues.end()); + } else { + Initializers.emplace_back(); + if (parseStructInitializer(Structure, Initializers.back())) + return true; + } + + // Continue if we see a comma. (Also, allow line continuation.) + if (!parseOptionalToken(AsmToken::Comma)) + break; + parseOptionalToken(AsmToken::EndOfStatement); + } + + return false; +} + +bool MasmParser::emitFieldValue(const FieldInfo &Field, + const IntFieldInfo &Contents) { + // Default-initialize all values. + for (const MCExpr *Value : Contents.Values) { + if (emitIntValue(Value, Field.Type)) + return true; + } + return false; +} + +bool MasmParser::emitFieldValue(const FieldInfo &Field, + const RealFieldInfo &Contents) { + for (const APInt &AsInt : Contents.AsIntValues) { + getStreamer().emitIntValue(AsInt.getLimitedValue(), + AsInt.getBitWidth() / 8); + } + return false; +} + +bool MasmParser::emitFieldValue(const FieldInfo &Field, + const StructFieldInfo &Contents) { + for (const auto &Initializer : Contents.Initializers) { + size_t Index = 0, Offset = 0; + for (const auto &SubField : Contents.Structure.Fields) { + getStreamer().emitZeros(SubField.Offset - Offset); + Offset = SubField.Offset + SubField.SizeOf; + emitFieldInitializer(SubField, Initializer.FieldInitializers[Index++]); + } + } + return false; +} + +bool MasmParser::emitFieldValue(const FieldInfo &Field) { + switch (Field.Contents.FT) { + case FT_INTEGRAL: + return emitFieldValue(Field, Field.Contents.IntInfo); + case FT_REAL: + return emitFieldValue(Field, Field.Contents.RealInfo); + case FT_STRUCT: + return emitFieldValue(Field, Field.Contents.StructInfo); + } +} + +bool MasmParser::emitStructValue(const StructInfo &Structure) { + size_t Offset = 0; + for (const auto &Field : Structure.Fields) { + getStreamer().emitZeros(Field.Offset - Offset); + if 
(emitFieldValue(Field)) + return true; + Offset = Field.Offset + Field.SizeOf; + } + // Add final padding. + if (Offset != Structure.Size) + getStreamer().emitZeros(Structure.Size - Offset); + return false; +} + +bool MasmParser::emitFieldInitializer(const FieldInfo &Field, + const IntFieldInfo &Contents, + const IntFieldInfo &Initializer) { + for (const auto &Value : Initializer.Values) { + if (emitIntValue(Value, Field.Type)) + return true; + } + // Default-initialize all remaining values. + for (auto it = Contents.Values.begin() + Initializer.Values.size(); + it != Contents.Values.end(); ++it) { + const auto &Value = *it; + if (emitIntValue(Value, Field.Type)) + return true; + } + return false; +} + +bool MasmParser::emitFieldInitializer(const FieldInfo &Field, + const RealFieldInfo &Contents, + const RealFieldInfo &Initializer) { + for (const auto &AsInt : Initializer.AsIntValues) { + getStreamer().emitIntValue(AsInt.getLimitedValue(), + AsInt.getBitWidth() / 8); + } + // Default-initialize all remaining values. + for (auto It = Contents.AsIntValues.begin() + Initializer.AsIntValues.size(); + It != Contents.AsIntValues.end(); ++It) { + const auto &AsInt = *It; + getStreamer().emitIntValue(AsInt.getLimitedValue(), + AsInt.getBitWidth() / 8); + } + return false; +} + +bool MasmParser::emitFieldInitializer(const FieldInfo &Field, + const StructFieldInfo &Contents, + const StructFieldInfo &Initializer) { + for (const auto &Init : Initializer.Initializers) { + emitStructInitializer(Contents.Structure, Init); + } + // Default-initialize all remaining values. 
+ for (auto It = + Contents.Initializers.begin() + Initializer.Initializers.size(); + It != Contents.Initializers.end(); ++It) { + const auto &Init = *It; + emitStructInitializer(Contents.Structure, Init); + } + return false; +} + +bool MasmParser::emitFieldInitializer(const FieldInfo &Field, + const FieldInitializer &Initializer) { + switch (Field.Contents.FT) { + case FT_INTEGRAL: + return emitFieldInitializer(Field, Field.Contents.IntInfo, + Initializer.IntInfo); + case FT_REAL: + return emitFieldInitializer(Field, Field.Contents.RealInfo, + Initializer.RealInfo); + case FT_STRUCT: + return emitFieldInitializer(Field, Field.Contents.StructInfo, + Initializer.StructInfo); + } +} + +bool MasmParser::emitStructInitializer(const StructInfo &Structure, + const StructInitializer &Initializer) { + size_t Index = 0, Offset = 0; + for (const auto &Init : Initializer.FieldInitializers) { + const auto &Field = Structure.Fields[Index++]; + getStreamer().emitZeros(Field.Offset - Offset); + Offset = Field.Offset + Field.SizeOf; + if (emitFieldInitializer(Field, Init)) + return true; + } + // Default-initialize all remaining fields. + for (auto It = + Structure.Fields.begin() + Initializer.FieldInitializers.size(); + It != Structure.Fields.end(); ++It) { + const auto &Field = *It; + getStreamer().emitZeros(Field.Offset - Offset); + Offset = Field.Offset + Field.SizeOf; + if (emitFieldValue(Field)) + return true; + } + // Add final padding. + if (Offset != Structure.Size) + getStreamer().emitZeros(Structure.Size - Offset); + return false; +} + +// Set data values from initializers. +bool MasmParser::emitStructValues(const StructInfo &Structure) { + std::vector Initializers; + if (parseStructInstList(Structure, Initializers)) + return true; + + for (const auto &Initializer : Initializers) { + if (emitStructInitializer(Structure, Initializer)) + return true; + } + + return false; +} + +// Declare a field in the current struct. 
+bool MasmParser::addStructField(StringRef Name, const StructInfo &Structure) { + StructInfo &OwningStruct = StructInProgress.back(); + FieldInfo &Field = OwningStruct.addField(Name, FT_STRUCT); + StructFieldInfo &StructInfo = Field.Contents.StructInfo; + + StructInfo.Structure = Structure; + Field.Type = Structure.Size; + + if (parseStructInstList(Structure, StructInfo.Initializers)) + return true; + + Field.LengthOf = StructInfo.Initializers.size(); + Field.SizeOf = Field.Type * Field.LengthOf; + if (OwningStruct.IsUnion) + OwningStruct.Size = std::max(OwningStruct.Size, Field.SizeOf); + else + OwningStruct.Size += Field.SizeOf; + + return false; +} + +/// parseDirectiveStructValue +/// ::= struct-id ( | {struct-initializer}) +/// [, ( | {struct-initializer})]* +bool MasmParser::parseDirectiveStructValue(const StructInfo &Structure, + StringRef Directive, SMLoc DirLoc) { + if (StructInProgress.empty()) { + if (emitStructValues(Structure)) + return true; + } else if (addStructField("", Structure)) { + return addErrorSuffix(" in '" + Twine(Directive) + "' directive"); + } + + return false; +} + +/// parseDirectiveNamedValue +/// ::= name (byte | word | ... ) [ expression (, expression)* ] +bool MasmParser::parseDirectiveNamedStructValue(const StructInfo &Structure, + StringRef Directive, + SMLoc DirLoc, StringRef Name) { + if (StructInProgress.empty()) { + // Initialize named data value. 
+ MCSymbol *Sym = getContext().getOrCreateSymbol(Name); + getStreamer().emitLabel(Sym); + KnownType[Name] = &Structure; + if (emitStructValues(Structure)) + return true; + } else if (addStructField(Name, Structure)) { + return addErrorSuffix(" in '" + Twine(Directive) + "' directive"); + } + + return false; +} + +/// parseDirectiveStruct +/// ::= (STRUC | STRUCT | UNION) [fieldAlign] [, NONUNIQUE] +/// (dataDir | generalDir | offsetDir | nestedStruct)+ +/// ENDS +////// dataDir = data declaration +////// offsetDir = EVEN, ORG, ALIGN +bool MasmParser::parseDirectiveStruct(StringRef Directive, + DirectiveKind DirKind, StringRef Name, + SMLoc NameLoc) { + // We ignore NONUNIQUE; we do not support OPTION M510 or OPTION OLDSTRUCTS + // anyway, so all field accesses must be qualified. + AsmToken NextTok = getTok(); + int64_t AlignmentValue = 1; + if (NextTok.isNot(AsmToken::Comma) && + NextTok.isNot(AsmToken::EndOfStatement) && + parseAbsoluteExpression(AlignmentValue)) { + return addErrorSuffix(" in alignment value for '" + Twine(Directive) + + "' directive"); + } + if (!isPowerOf2_64(AlignmentValue)) { + return Error(NextTok.getLoc(), "alignment must be a power of two; was " + + std::to_string(AlignmentValue)); + } + + StringRef Qualifier; + SMLoc QualifierLoc; + if (parseOptionalToken(AsmToken::Comma)) { + QualifierLoc = getTok().getLoc(); + if (parseIdentifier(Qualifier)) + return addErrorSuffix(" in '" + Twine(Directive) + "' directive"); + if (!Qualifier.equals_lower("nonunique")) + return Error(QualifierLoc, "Unrecognized qualifier for '" + + Twine(Directive) + + "' directive; expected none or NONUNIQUE"); + } + + if (parseToken(AsmToken::EndOfStatement)) + return addErrorSuffix(" in '" + Twine(Directive) + "' directive"); + + StructInProgress.emplace_back(Name, DirKind == DK_UNION, AlignmentValue); + return false; +} + +/// parseDirectiveNestedStruct +/// ::= (STRUC | STRUCT | UNION) [name] +/// (dataDir | generalDir | offsetDir | nestedStruct)+ +/// ENDS +bool 
MasmParser::parseDirectiveNestedStruct(StringRef Directive, + DirectiveKind DirKind) { + if (StructInProgress.empty()) + return TokError("missing name in top-level '" + Twine(Directive) + + "' directive"); + + StringRef Name; + if (getTok().is(AsmToken::Identifier)) { + Name = getTok().getIdentifier(); + parseToken(AsmToken::Identifier); + } + if (parseToken(AsmToken::EndOfStatement)) + return addErrorSuffix(" in '" + Twine(Directive) + "' directive"); + + StructInProgress.emplace_back(Name, DirKind == DK_UNION, + StructInProgress.back().Alignment); + return false; +} + +bool MasmParser::parseDirectiveEnds(StringRef Name, SMLoc NameLoc) { + if (StructInProgress.empty()) + return Error(NameLoc, "ENDS directive without matching STRUC/STRUCT/UNION"); + if (StructInProgress.size() > 1) + return Error(NameLoc, "unexpected name in nested ENDS directive"); + if (StructInProgress.back().Name.compare_lower(Name)) + return Error(NameLoc, "mismatched name in ENDS directive; expected '" + + StructInProgress.back().Name + "'"); + StructInfo Structure = StructInProgress.pop_back_val(); + if (Structure.Size % Structure.Alignment != 0) { + // Pad to make the structure's size divisible by its alignment. + Structure.Size += + Structure.Alignment - (Structure.Size % Structure.Alignment); + } + Structs[Name.lower()] = Structure; + + if (parseToken(AsmToken::EndOfStatement)) + return addErrorSuffix(" in ENDS directive"); + + return false; +} + +bool MasmParser::parseDirectiveNestedEnds() { + if (StructInProgress.empty()) + return TokError("ENDS directive without matching STRUC/STRUCT/UNION"); + if (StructInProgress.size() == 1) + return TokError("missing name in top-level ENDS directive"); + + if (parseToken(AsmToken::EndOfStatement)) + return addErrorSuffix(" in nested ENDS directive"); + + StructInfo Structure = StructInProgress.pop_back_val(); + if (Structure.Size % Structure.Alignment != 0) { + // Pad to make the structure's size divisible by its alignment. 
+ Structure.Size += + Structure.Alignment - (Structure.Size % Structure.Alignment); + } + StructInfo &ParentStruct = StructInProgress.back(); + + FieldInfo &Field = ParentStruct.addField(Structure.Name, FT_STRUCT); + StructFieldInfo &StructInfo = Field.Contents.StructInfo; + Field.Type = Structure.Size; + Field.LengthOf = 1; + Field.SizeOf = Structure.Size; + + if (ParentStruct.IsUnion) + ParentStruct.Size = std::max(ParentStruct.Size, Field.SizeOf); + else + ParentStruct.Size += Field.SizeOf; + + StructInfo.Structure = Structure; + StructInfo.Initializers.emplace_back(); + auto &FieldInitializers = StructInfo.Initializers.back().FieldInitializers; + for (const auto &SubField : Structure.Fields) { + FieldInitializers.push_back(SubField.Contents); + } + + return false; } /// parseDirectiveOrg @@ -3033,7 +4157,7 @@ bool MasmParser::parseDirectiveAlign() { if (checkForValidSection()) return addErrorSuffix(" in align directive"); - // Ignore empty 'align' directives + // Ignore empty 'align' directives. if (getTok().is(AsmToken::EndOfStatement)) { Warning(AlignmentLoc, "align directive with no operand is ignored"); return parseToken(AsmToken::EndOfStatement); @@ -3045,9 +4169,8 @@ bool MasmParser::parseDirectiveAlign() { // Always emit an alignment here even if we thrown an error. bool ReturnVal = false; - // Reject alignments that aren't either a power of two or zero, - // for gas compatibility. Alignment of zero is silently rounded - // up to one. + // Reject alignments that aren't either a power of two or zero, for gas + // compatibility. Alignment of zero is silently rounded up to one. 
if (Alignment == 0) Alignment = 1; if (!isPowerOf2_64(Alignment)) @@ -4088,7 +5211,7 @@ bool MasmParser::parseDirectiveMacro(SMLoc DirectiveLoc) { if (parseIdentifier(Parameter.Name)) return TokError("expected identifier in '.macro' directive"); - // Emit an error if two (or more) named parameters share the same name + // Emit an error if two (or more) named parameters share the same name. for (const MCAsmMacroParameter& CurrParam : Parameters) if (CurrParam.Name.equals(Parameter.Name)) return TokError("macro '" + Name + "' has multiple parameters" @@ -4137,7 +5260,7 @@ bool MasmParser::parseDirectiveMacro(SMLoc DirectiveLoc) { // Eat just the end of statement. Lexer.Lex(); - // Consuming deferred text, so use Lexer.Lex to ignore Lexing Errors + // Consuming deferred text, so use Lexer.Lex to ignore Lexing Errors. AsmToken EndToken, StartToken = getTok(); unsigned MacroDepth = 0; // Lex the macro definition. @@ -4445,7 +5568,7 @@ bool MasmParser::parseDirectiveComm(bool IsLocal) { if (!Sym->isUndefined()) return Error(IDLoc, "invalid symbol redefinition"); - // Create the Symbol as a common or local common with Size and Pow2Alignment + // Create the Symbol as a common or local common with Size and Pow2Alignment. 
if (IsLocal) { getStreamer().emitLocalCommonSymbol(Sym, Size, 1 << Pow2Alignment); return false; @@ -5130,6 +6253,10 @@ void MasmParser::initializeDirectiveKindMap() { DirectiveKindMap["dq"] = DK_DQ; DirectiveKindMap["dw"] = DK_DW; DirectiveKindMap["echo"] = DK_ECHO; + DirectiveKindMap["struc"] = DK_STRUCT; + DirectiveKindMap["struct"] = DK_STRUCT; + DirectiveKindMap["union"] = DK_UNION; + DirectiveKindMap["ends"] = DK_ENDS; } MCAsmMacro *MasmParser::parseMacroLikeBody(SMLoc DirectiveLoc) { @@ -5389,6 +6516,49 @@ static int rewritesSort(const AsmRewrite *AsmRewriteA, llvm_unreachable("Unstable rewrite sort."); } +bool MasmParser::LookUpFieldOffset(StringRef Base, StringRef Member, + unsigned &Offset) { + if (Base.empty()) + return true; + + auto TypeIt = KnownType.find(Base); + if (TypeIt != KnownType.end()) + return LookUpFieldOffset(*TypeIt->second, Member, Offset); + + auto StructIt = Structs.find(Base.lower()); + if (StructIt != Structs.end()) + return LookUpFieldOffset(StructIt->second, Member, Offset); + + return true; +} + +bool MasmParser::LookUpFieldOffset(const StructInfo &Structure, + StringRef Member, unsigned &Offset) { + std::pair Split = Member.split('.'); + const StringRef FieldName = Split.first, FieldMember = Split.second; + + auto FieldIt = Structure.FieldsByName.find(FieldName.lower()); + if (FieldIt == Structure.FieldsByName.end()) + return true; + + const FieldInfo &Field = Structure.Fields[FieldIt->second]; + if (FieldMember.empty()) { + Offset = Field.Offset; + return false; + } + + if (Field.Contents.FT != FT_STRUCT) + return true; + const StructFieldInfo &StructInfo = Field.Contents.StructInfo; + + bool Result = LookUpFieldOffset(StructInfo.Structure, FieldMember, Offset); + if (Result) + return true; + + Offset += Field.Offset; + return false; +} + bool MasmParser::parseMSInlineAsm( void *AsmLoc, std::string &AsmString, unsigned &NumOutputs, unsigned &NumInputs, SmallVectorImpl> &OpDecls, @@ -5412,7 +6582,7 @@ bool 
MasmParser::parseMSInlineAsm( unsigned InputIdx = 0; unsigned OutputIdx = 0; while (getLexer().isNot(AsmToken::Eof)) { - // Parse curly braces marking block start/end + // Parse curly braces marking block start/end. if (parseCurlyBlockScope(AsmStrRewrites)) continue; @@ -5458,7 +6628,7 @@ bool MasmParser::parseMSInlineAsm( StringRef Constraint = Operand.getConstraint(); if (Operand.isImm()) { - // Offset as immediate + // Offset as immediate. if (Operand.isOffsetOfLocal()) Constraint = "r"; else diff --git a/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp b/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp index 7de2b2079b8d..0573d4eec059 100644 --- a/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp +++ b/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp @@ -864,6 +864,8 @@ class X86AsmParser : public MCTargetAsmParser { return nullptr; } + bool MatchRegisterByName(unsigned &RegNo, StringRef RegName, SMLoc StartLoc, + SMLoc EndLoc); bool ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc, bool RestoreOnFailure); @@ -1145,6 +1147,108 @@ static bool CheckBaseRegAndIndexRegAndScale(unsigned BaseReg, unsigned IndexReg, return checkScale(Scale, ErrMsg); } +bool X86AsmParser::MatchRegisterByName(unsigned &RegNo, StringRef RegName, + SMLoc StartLoc, SMLoc EndLoc) { + // If we encounter a %, ignore it. This code handles registers with and + // without the prefix, unprefixed registers can occur in cfi directives. + RegName.consume_front("%"); + + RegNo = MatchRegisterName(RegName); + + // If the match failed, try the register name as lowercase. + if (RegNo == 0) + RegNo = MatchRegisterName(RegName.lower()); + + // The "flags" and "mxcsr" registers cannot be referenced directly. + // Treat it as an identifier instead. 
+ if (isParsingMSInlineAsm() && isParsingIntelSyntax() && + (RegNo == X86::EFLAGS || RegNo == X86::MXCSR)) + RegNo = 0; + + if (!is64BitMode()) { + // FIXME: This should be done using Requires and + // Requires so "eiz" usage in 64-bit instructions can be also + // checked. + // FIXME: Check AH, CH, DH, BH cannot be used in an instruction requiring a + // REX prefix. + if (RegNo == X86::RIZ || RegNo == X86::RIP || + X86MCRegisterClasses[X86::GR64RegClassID].contains(RegNo) || + X86II::isX86_64NonExtLowByteReg(RegNo) || + X86II::isX86_64ExtendedReg(RegNo)) { + return Error(StartLoc, + "register %" + RegName + " is only available in 64-bit mode", + SMRange(StartLoc, EndLoc)); + } + } + + // If this is "db[0-15]", match it as an alias + // for dr[0-15]. + if (RegNo == 0 && RegName.startswith("db")) { + if (RegName.size() == 3) { + switch (RegName[2]) { + case '0': + RegNo = X86::DR0; + break; + case '1': + RegNo = X86::DR1; + break; + case '2': + RegNo = X86::DR2; + break; + case '3': + RegNo = X86::DR3; + break; + case '4': + RegNo = X86::DR4; + break; + case '5': + RegNo = X86::DR5; + break; + case '6': + RegNo = X86::DR6; + break; + case '7': + RegNo = X86::DR7; + break; + case '8': + RegNo = X86::DR8; + break; + case '9': + RegNo = X86::DR9; + break; + } + } else if (RegName.size() == 4 && RegName[2] == '1') { + switch (RegName[3]) { + case '0': + RegNo = X86::DR10; + break; + case '1': + RegNo = X86::DR11; + break; + case '2': + RegNo = X86::DR12; + break; + case '3': + RegNo = X86::DR13; + break; + case '4': + RegNo = X86::DR14; + break; + case '5': + RegNo = X86::DR15; + break; + } + } + } + + if (RegNo == 0) { + if (isParsingIntelSyntax()) + return true; + return Error(StartLoc, "invalid register name", SMRange(StartLoc, EndLoc)); + } + return false; +} + bool X86AsmParser::ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc, bool RestoreOnFailure) { MCAsmParser &Parser = getParser(); @@ -1180,37 +1284,9 @@ bool X86AsmParser::ParseRegister(unsigned 
&RegNo, SMLoc &StartLoc, SMRange(StartLoc, EndLoc)); } - RegNo = MatchRegisterName(Tok.getString()); - - // If the match failed, try the register name as lowercase. - if (RegNo == 0) - RegNo = MatchRegisterName(Tok.getString().lower()); - - // The "flags" and "mxcsr" registers cannot be referenced directly. - // Treat it as an identifier instead. - if (isParsingMSInlineAsm() && isParsingIntelSyntax() && - (RegNo == X86::EFLAGS || RegNo == X86::MXCSR)) - RegNo = 0; - - if (!is64BitMode()) { - // FIXME: This should be done using Requires and - // Requires so "eiz" usage in 64-bit instructions can be also - // checked. - // FIXME: Check AH, CH, DH, BH cannot be used in an instruction requiring a - // REX prefix. - if (RegNo == X86::RIZ || RegNo == X86::RIP || - X86MCRegisterClasses[X86::GR64RegClassID].contains(RegNo) || - X86II::isX86_64NonExtLowByteReg(RegNo) || - X86II::isX86_64ExtendedReg(RegNo)) { - StringRef RegName = Tok.getString(); - OnFailure(); - if (!RestoreOnFailure) { - Parser.Lex(); // Eat register name. - } - return Error(StartLoc, - "register %" + RegName + " is only available in 64-bit mode", - SMRange(StartLoc, EndLoc)); - } + if (MatchRegisterByName(RegNo, Tok.getString(), StartLoc, EndLoc)) { + OnFailure(); + return true; } // Parse "%st" as "%st(0)" and "%st(1)", which is multiple tokens. @@ -1259,40 +1335,6 @@ bool X86AsmParser::ParseRegister(unsigned &RegNo, SMLoc &StartLoc, EndLoc = Parser.getTok().getEndLoc(); - // If this is "db[0-15]", match it as an alias - // for dr[0-15]. 
- if (RegNo == 0 && Tok.getString().startswith("db")) { - if (Tok.getString().size() == 3) { - switch (Tok.getString()[2]) { - case '0': RegNo = X86::DR0; break; - case '1': RegNo = X86::DR1; break; - case '2': RegNo = X86::DR2; break; - case '3': RegNo = X86::DR3; break; - case '4': RegNo = X86::DR4; break; - case '5': RegNo = X86::DR5; break; - case '6': RegNo = X86::DR6; break; - case '7': RegNo = X86::DR7; break; - case '8': RegNo = X86::DR8; break; - case '9': RegNo = X86::DR9; break; - } - } else if (Tok.getString().size() == 4 && Tok.getString()[2] == '1') { - switch (Tok.getString()[3]) { - case '0': RegNo = X86::DR10; break; - case '1': RegNo = X86::DR11; break; - case '2': RegNo = X86::DR12; break; - case '3': RegNo = X86::DR13; break; - case '4': RegNo = X86::DR14; break; - case '5': RegNo = X86::DR15; break; - } - } - - if (RegNo != 0) { - EndLoc = Parser.getTok().getEndLoc(); - Parser.Lex(); // Eat it. - return false; - } - } - if (RegNo == 0) { OnFailure(); if (isParsingIntelSyntax()) return true; @@ -1590,12 +1632,41 @@ bool X86AsmParser::ParseIntelExpression(IntelExprStateMachine &SM, SMLoc &End) { SMLoc IdentLoc = Tok.getLoc(); StringRef Identifier = Tok.getString(); UpdateLocLex = false; - // Register + // Register, or (MASM only) . 
unsigned Reg; - if (Tok.is(AsmToken::Identifier) && !ParseRegister(Reg, IdentLoc, End)) { - if (SM.onRegister(Reg, ErrMsg)) - return Error(Tok.getLoc(), ErrMsg); - break; + if (Tok.is(AsmToken::Identifier)) { + if (!ParseRegister(Reg, IdentLoc, End, /*RestoreOnFailure=*/true)) { + if (SM.onRegister(Reg, ErrMsg)) + return Error(IdentLoc, ErrMsg); + break; + } + if (Parser.isParsingMasm()) { + const std::pair RegField = + Tok.getString().split('.'); + const StringRef RegName = RegField.first, Field = RegField.second; + SMLoc RegEndLoc = + SMLoc::getFromPointer(RegName.data() + RegName.size()); + if (!Field.empty() && + !MatchRegisterByName(Reg, RegName, IdentLoc, RegEndLoc)) { + if (SM.onRegister(Reg, ErrMsg)) + return Error(IdentLoc, ErrMsg); + + SMLoc FieldStartLoc = SMLoc::getFromPointer(Field.data()); + const std::pair BaseMember = Field.split('.'); + const StringRef Base = BaseMember.first, Member = BaseMember.second; + + unsigned Offset; + if (Parser.LookUpFieldOffset(Base, Member, Offset)) + return Error(FieldStartLoc, "unknown offset"); + else if (SM.onPlus(ErrMsg)) + return Error(getTok().getLoc(), ErrMsg); + else if (SM.onInteger(Offset, ErrMsg)) + return Error(IdentLoc, ErrMsg); + + End = consumeToken(); + break; + } + } } // Operator synonymous ("not", "or" etc.) 
bool ParseError = false; @@ -1607,37 +1678,39 @@ bool X86AsmParser::ParseIntelExpression(IntelExprStateMachine &SM, SMLoc &End) { // Symbol reference, when parsing assembly content InlineAsmIdentifierInfo Info; const MCExpr *Val; - if (!isParsingMSInlineAsm()) { - if (getParser().parsePrimaryExpr(Val, End)) { - return Error(Tok.getLoc(), "Unexpected identifier!"); - } else if (SM.onIdentifierExpr(Val, Identifier, Info, false, ErrMsg)) { - return Error(IdentLoc, ErrMsg); - } else + if (isParsingMSInlineAsm() || Parser.isParsingMasm()) { + // MS Dot Operator expression + if (Identifier.count('.') && PrevTK == AsmToken::RBrac) { + if (ParseIntelDotOperator(SM, End)) + return true; break; + } } - // MS InlineAsm operators (TYPE/LENGTH/SIZE) - if (unsigned OpKind = IdentifyIntelInlineAsmOperator(Identifier)) { - if (int64_t Val = ParseIntelInlineAsmOperator(OpKind)) { - if (SM.onInteger(Val, ErrMsg)) - return Error(IdentLoc, ErrMsg); - } else - return true; - break; - } - // MS Dot Operator expression - if (Identifier.count('.') && PrevTK == AsmToken::RBrac) { - if (ParseIntelDotOperator(SM, End)) + if (isParsingMSInlineAsm()) { + // MS InlineAsm operators (TYPE/LENGTH/SIZE) + if (unsigned OpKind = IdentifyIntelInlineAsmOperator(Identifier)) { + if (int64_t Val = ParseIntelInlineAsmOperator(OpKind)) { + if (SM.onInteger(Val, ErrMsg)) + return Error(IdentLoc, ErrMsg); + } else + return true; + break; + } + // MS InlineAsm identifier + // Call parseIdentifier() to combine @ with the identifier behind it. + if (TK == AsmToken::At && Parser.parseIdentifier(Identifier)) + return Error(IdentLoc, "expected identifier"); + if (ParseIntelInlineAsmIdentifier(Val, Identifier, Info, false, End)) return true; + else if (SM.onIdentifierExpr(Val, Identifier, Info, true, ErrMsg)) + return Error(IdentLoc, ErrMsg); break; } - // MS InlineAsm identifier - // Call parseIdentifier() to combine @ with the identifier behind it. 
- if (TK == AsmToken::At && Parser.parseIdentifier(Identifier)) - return Error(IdentLoc, "expected identifier"); - if (ParseIntelInlineAsmIdentifier(Val, Identifier, Info, false, End)) - return true; - else if (SM.onIdentifierExpr(Val, Identifier, Info, true, ErrMsg)) + if (getParser().parsePrimaryExpr(Val, End)) { + return Error(Tok.getLoc(), "Unexpected identifier!"); + } else if (SM.onIdentifierExpr(Val, Identifier, Info, false, ErrMsg)) { return Error(IdentLoc, ErrMsg); + } break; } case AsmToken::Integer: { @@ -1856,10 +1929,14 @@ bool X86AsmParser::ParseIntelDotOperator(IntelExprStateMachine &SM, SMLoc &End) APInt DotDisp; DotDispStr.getAsInteger(10, DotDisp); Offset = DotDisp.getZExtValue(); - } else if (isParsingMSInlineAsm() && Tok.is(AsmToken::Identifier)) { - std::pair BaseMember = DotDispStr.split('.'); - if (SemaCallback->LookupInlineAsmField(BaseMember.first, BaseMember.second, - Offset)) + } else if ((isParsingMSInlineAsm() || getParser().isParsingMasm()) && + Tok.is(AsmToken::Identifier)) { + const std::pair BaseMember = DotDispStr.split('.'); + const StringRef Base = BaseMember.first, Member = BaseMember.second; + if (getParser().LookUpFieldOffset(SM.getSymName(), DotDispStr, Offset) && + getParser().LookUpFieldOffset(Base, Member, Offset) && + (!SemaCallback || + SemaCallback->LookupInlineAsmField(Base, Member, Offset))) return Error(Tok.getLoc(), "Unable to lookup field reference!"); } else return Error(Tok.getLoc(), "Unexpected token type!"); diff --git a/llvm/test/tools/llvm-ml/struct.test b/llvm/test/tools/llvm-ml/struct.test new file mode 100644 index 000000000000..0e60d2449455 --- /dev/null +++ b/llvm/test/tools/llvm-ml/struct.test @@ -0,0 +1,104 @@ +# RUN: llvm-ml -filetype=asm %s | FileCheck %s + +.data +BAZ STRUCT + a BYTE 1 + b BYTE 2 +BAZ ENDS + +FOOBAR struct 2 + c BYTE 3 DUP (4) + d DWORD 5 + e BAZ <> + STRUCT f + g BYTE 6 + h BYTE 7 + ends + h BYTE "abcde" +foobar ENDS + +t1 foobar <> + +; CHECK: t1: +; +; BYTE 3 DUP (4), plus 
alignment padding
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .zero 1
+;
+; DWORD 5
+; CHECK-NEXT: .long 5
+;
+; BAZ <>
+; CHECK-NEXT: .byte 1
+; CHECK-NEXT: .byte 2
+;
+; , with internal alignment padding
+; CHECK-NEXT: .byte 6
+; CHECK-NEXT: .zero 1
+; CHECK-NEXT: .byte 7
+; CHECK-NEXT: .zero 1
+;
+; BYTE "abcde", plus alignment padding
+; CHECK-NEXT: .byte 97
+; CHECK-NEXT: .byte 98
+; CHECK-NEXT: .byte 99
+; CHECK-NEXT: .byte 100
+; CHECK-NEXT: .byte 101
+; CHECK-NEXT: .zero 1
+
+t2 FOOBAR <"gh",,<10,11>,<12>,"ijk">
+
+; CHECK: t2:
+;
+; BYTE "gh", padded with " ", plus alignment padding
+; CHECK-NEXT: .byte 103
+; CHECK-NEXT: .byte 104
+; CHECK-NEXT: .byte 32
+; CHECK-NEXT: .zero 1
+;
+; DWORD 5 (default-initialized when omitted)
+; CHECK-NEXT: .long 5
+;
+; BAZ <10, 11>
+; CHECK-NEXT: .byte 10
+; CHECK-NEXT: .byte 11
+;
+; , with internal alignment padding
+; CHECK-NEXT: .byte 12
+; CHECK-NEXT: .zero 1
+; CHECK-NEXT: .byte 7
+; CHECK-NEXT: .zero 1
+;
+; BYTE "ijk", padded with " ", plus alignment padding
+; CHECK-NEXT: .byte 105
+; CHECK-NEXT: .byte 106
+; CHECK-NEXT: .byte 107
+; CHECK-NEXT: .byte 32
+; CHECK-NEXT: .byte 32
+; CHECK-NEXT: .zero 1
+
+.code
+
+t3:
+mov eax, t2.f.h
+mov eax, [t2].f.h
+mov eax, [t2.f.h]
+mov eax, t2.FOOBAR.f.h
+
+; CHECK: t3:
+; CHECK-NEXT: mov eax, dword ptr [rip + t2+12]
+; CHECK-NEXT: mov eax, dword ptr [rip + t2+12]
+; CHECK-NEXT: mov eax, dword ptr [rip + t2+12]
+; CHECK-NEXT: mov eax, dword ptr [rip + t2+12]
+
+t4:
+mov eax, j.FOOBAR.f.h
+mov eax, j.baz.b
+
+; CHECK: t4:
+; CHECK-NEXT: mov eax, dword ptr [rip + j+12]
+; CHECK-NEXT: mov eax, dword ptr [rip + j+1]
+
+END
diff --git a/llvm/test/tools/llvm-ml/struct_errors.test b/llvm/test/tools/llvm-ml/struct_errors.test
new file mode 100644
index 000000000000..13d72a2840f0
--- /dev/null
+++ b/llvm/test/tools/llvm-ml/struct_errors.test
@@ -0,0 +1,57 @@
+# RUN: not llvm-ml -filetype=asm %s 2>&1 | FileCheck %s --dump-input=always
+
+.data
+int_test STRUCT
+ int_arr DWORD ?, ?
+ int_scalar DWORD ?
+int_test ENDS
+
+t1 int_test <<1,2,3>>
+// CHECK: error: Initializer too long for field; expected at most 2 elements, got 3
+
+t2 int_test <4>
+// CHECK: error: Cannot initialize array field with scalar value
+
+t3 int_test <,<5,6>>
+// CHECK: error: Cannot initialize scalar field with array value
+
+real_test STRUCT
+ real_arr REAL4 ?, ?, ?
+ real_scalar REAL4 ?
+real_test ENDS
+
+t4 real_test <<1.0,0.0,-1.0,-2.0>>
+// CHECK: error: Initializer too long for field; expected at most 3 elements, got 4
+
+t5 real_test <2.0>
+// CHECK: error: Cannot initialize array field with scalar value
+
+t6 real_test <,<2.0,-2.0>>
+// CHECK: error: Cannot initialize scalar field with array value
+
+inner_struct STRUCT
+ a BYTE ?
+inner_struct ENDS
+
+struct_test STRUCT
+ struct_arr inner_struct 4 DUP (?)
+ struct_scalar inner_struct ?
+struct_test ENDS
+
+t7 struct_test <<<>, <>, <>, <>, <>>>
+// CHECK: error: Initializer too long for field; expected at most 4 elements, got 5
+
+t8 struct_test <,<<>, <>>>
+// CHECK: error: 'inner_struct' initializer initializes too many fields
+
+t9 STRUCT 3
+// CHECK: error: alignment must be a power of two; was 3
+t9 ENDS
+
+t10 STRUCT 1, X
+// CHECK: error: Unrecognized qualifier for 'STRUCT' directive; expected none or NONUNIQUE
+t10 ENDS
+
+t11 STRUCT
+different_struct ENDS
+// CHECK: error: mismatched name in ENDS directive; expected 't11'

From llvm-commits at lists.llvm.org Tue Jul 7 14:02:52 2020
From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 21:02:52 +0000 (UTC)
Subject: [PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support
In-Reply-To:
References:
Message-ID:

This revision was automatically updated to reflect the committed changes.
Closed by commit rGbc8e262afe83: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support (authored by epastor).
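
For readers following the layout arithmetic in this patch, here is a minimal sketch of how the field offsets and final padding seem to work. It is not the parser's code: `ToyStruct`, `ToyField` and `offsetOfFoobarFH` are hypothetical names, and the rule that every field starts at a multiple of the struct's alignment is an assumption inferred from the CHECK lines in struct.test (the real `addField` is not shown in this part of the diff). Only the end-of-struct padding mirrors code that is visible above, in parseDirectiveEnds.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the parser's FieldInfo/StructInfo.
struct ToyField {
  std::string Name;
  std::size_t Offset;
  std::size_t SizeOf;
};

struct ToyStruct {
  std::size_t Alignment = 1;
  bool IsUnion = false;
  std::size_t Size = 0;
  std::vector<ToyField> Fields;

  // Round Value up to the next multiple of Align (Align must be >= 1).
  static std::size_t alignTo(std::size_t Value, std::size_t Align) {
    return (Value + Align - 1) / Align * Align;
  }

  // Append a field and return its offset.
  // Assumption: each field starts at a multiple of the struct alignment;
  // all members of a union start at offset 0.
  std::size_t addField(const std::string &Name, std::size_t SizeOf) {
    std::size_t Offset = IsUnion ? 0 : alignTo(Size, Alignment);
    Fields.push_back({Name, Offset, SizeOf});
    Size = IsUnion ? std::max(Size, SizeOf) : Offset + SizeOf;
    return Offset;
  }

  // Same arithmetic as the ENDS handling in the patch: pad the total size
  // so that it is divisible by the alignment.
  void finish() {
    if (Size % Alignment != 0)
      Size += Alignment - (Size % Alignment);
  }
};

// Rebuild FOOBAR from struct.test and return the offset of the nested f.h.
std::size_t offsetOfFoobarFH() {
  ToyStruct F; // the nested "STRUCT f", which inherits alignment 2
  F.Alignment = 2;
  F.addField("g", 1);
  std::size_t HInF = F.addField("h", 1);
  F.finish();

  ToyStruct Foobar;
  Foobar.Alignment = 2;
  Foobar.addField("c", 3); // BYTE 3 DUP (4)
  Foobar.addField("d", 4); // DWORD 5
  Foobar.addField("e", 2); // BAZ <>, two bytes
  std::size_t FInFoobar = Foobar.addField("f", F.Size);
  Foobar.addField("h", 5); // BYTE "abcde"
  Foobar.finish();
  return FInFoobar + HInF;
}
```

Under these assumptions `offsetOfFoobarFH()` returns 12, which agrees with the `t2+12` operand that the t3 checks in struct.test expect for `t2.f.h`.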
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75306/new/

https://reviews.llvm.org/D75306

Files:
  llvm/include/llvm/MC/MCParser/MCAsmParser.h
  llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h
  llvm/lib/MC/MCParser/MasmParser.cpp
  llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
  llvm/test/tools/llvm-ml/struct.test
  llvm/test/tools/llvm-ml/struct_errors.test

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D75306.276205.patch
Type: text/x-patch
Size: 84988 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 14:07:50 2020
From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 21:07:50 +0000 (UTC)
Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions
In-Reply-To:
References:
Message-ID: <1dbaca8a80a4ae1f778e944cc46279a8@localhost.localdomain>

dmgreen updated this revision to Diff 276198.
dmgreen marked 4 inline comments as done.
dmgreen added a comment.

Reinstate isInLoopReduction and fixup some wording/typos/capitalisation.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75069/new/

https://reviews.llvm.org/D75069

Files:
  llvm/include/llvm/Analysis/IVDescriptors.h
  llvm/lib/Analysis/IVDescriptors.cpp
  llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlan.cpp
  llvm/lib/Transforms/Vectorize/VPlan.h
  llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll
  llvm/test/Transforms/LoopVectorize/reduction-inloop.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D75069.276198.patch
Type: text/x-patch
Size: 64690 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 14:08:01 2020
From: llvm-commits at lists.llvm.org (Rainer Orth via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 21:08:01 +0000 (UTC)
Subject: [PATCH] D77815: [flang] Fix setting mxcsr on MSVC
In-Reply-To:
References:
Message-ID:

ro added a comment.

@isuruf, do you intend to commit your patch any time soon? It would be good to have it in-tree before LLVM 11 branches.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77815/new/

https://reviews.llvm.org/D77815

From llvm-commits at lists.llvm.org Tue Jul 7 14:08:11 2020
From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits)
Date: Tue, 07 Jul 2020 21:08:11 +0000 (UTC)
Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions
In-Reply-To:
References:
Message-ID:

dmgreen added inline comments.

================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:812
+  if (LHS->getOpcode() == Opcode && L->contains(LHS->getParent()) &&
+      LHS->hasOneUse() &&
+      findPathToPhi(LHS, ReductionOperations, Opcode, Phi, L)) {
----------------
Ayal wrote:
> dmgreen wrote:
> > Ayal wrote:
> > > dmgreen wrote:
> > > > fhahn wrote:
> > > > > Ayal wrote:
> > > > > > Looking for a chain of hasOneUse op's would be easier starting from the Phi and going downwards, until reaching LoopExitInstr?
> > > > > >
> > > > > > Note that when extended to handle reductions with conditional bumps, some ops will have more than one use.
> > > > > Instead of doing a recursive traversal, would it be simpler to just do the traversal iteratively, at least as long as we are only using at a single use chain?
> > > > Yeah, that direction makes it a lot simpler. Thanks.
> > > Is treating sub as an add reduction something in-loop reduction could support as a future extension?
> > Hmm. I don't want to say never.
> > A normal inloop reduction looks like:
> >   p = PHI(0, a)
> >   l = VLDR (..)
> >   a = VADDVA(p, l)
> > Where the `VADDV` is an across-vector reduction, and the extra `A` means also add p. Reducing a sub would need to become:
> >   p = PHI(0, a)
> >   l = VLDR (..)
> >   a = VADDV(l)
> >   p = SUB(p, a)
> > With the SUB as a separate scalar instruction, which would be quite slow on some hardware (getting a value over from the VADDV to the SUB). So this would almost certainly be slower than an out-of-loop reduction.
> >
> > But if we could end up using a higher vector factor for the reduction, or end up vectorizing loops that would previously not be vectorized.. that may lead to a gain overall to overcome the extra cost of adding the sub to the loop. It will require some very careful costing I think. And maybe the ability to create multiple vplans and cost them against one another :)
> An original sub code, say, acc -= a[i], can be treated as acc += (-a[i]). This could be in-loop reduced by first negating a[i]'s, at LV's LLVM-IR level, presumably lowered later to something like
>
> ```
> p = PHI(0, a)
> l = VLDR (..)
> s = VSUBV (zero, l)
> a = VADDVA(p, s)
> ```
> , right?

Yep. We would have the option of trading a scalar instruction for a vector instruction + an extra register (to hold the 0, we only have 8 registers!) Unfortunately both would be slower than an out-of-loop reduction unless we were vectorizing at a higher factor, though.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3769
   // MinMax reduction have the start value as their identify.
-  if (VF == 1) {
+  if (VF == 1 || UseInloopReductions) {
     VectorStart = Identity = ReductionStartValue;
----------------
Ayal wrote:
> dmgreen wrote:
> > Ayal wrote:
> > > dmgreen wrote:
> > > > Ayal wrote:
> > > > > dmgreen wrote:
> > > > > > Ayal wrote:
> > > > > > > This is dead code if cmp/select chains are not recognized yet, as noted above.
> > > > > > I've added the code to handle minmax too (but not tested it a lot yet. I will try that now).
> > > > > >
> > > > > > MVE has instructions for integer min/max reductions, but they can be slow enough to make them not worth using over a normal vmin/vmax. Adds are always not-slower-enough to warrant the inloop reduction (and have other advantages like handling higher type sizes and folding in more instructions.)
> > > > > >
> > > > > > My point is that min/max, like some of the other fadd/mul/and/etc might not be used by MVE yet. If you think the code is more hassle than it deserves, then we could take them out for the time being. I'd like to leave them in for consistency though, even if it's not used straight away.
> > > > > Would be good to make sure code is being exercised and tested. Could inloop min/max (and/or other reductions) help reduce code size, and be applied when vectorizing under optsize?
> > > > -Os sounds like a good plan. It will take some backend work to make it efficient enough first though. And predicated reductions?
> > > Hoisting the horizontal reduction from the middle block into the loop could potentially eliminate the middle block (as in tests below), so could presumably lead to code of smaller size? At-least for in-loop chains of a single link.
> > >
> > > > And predicated reductions?
> > > These are yet to be handled in-loop, right?
> > >> And predicated reductions?
> > >These are yet to be handled in-loop, right?
> > Yep. It will need a predicated reduction intrinsic. A vecreduce that takes a mask. That will allow us to tail-fold the reductions with trip counts that do not divide the vector factor, which will make them look a lot better under -Os. And nice in general I think once it all starts being tail predicated.
> >
> > The backend work I was mentioning was that we need to more efficiently transform
> > > > The backend work I was mentioning was that we need to more efficiently transform > > x = min(vecreduce.min(z), y) > > into > > x = VMINV(y, z) > > Where y is (confusingly) accumulated in the case (even though the instruction doesn't have an A suffix). We currently generate > > x = min(VMINV(UINT_MAX, z), y) > > > > Once that is sorted out then, yep, using these for Os sounds like a good plan. > Re: predicated reductions - could they be handled by replacing masked-off elements with `Identity` using a select prior to reduction? To be potentially folded later by suitable targets into a predicated reduction operation which they may support. > Somewhat akin to "passthru" values of masked loads. Oh. So select s = select m, a, 0 v = vecreduce.add s to a predicated vaddv? Yeah, sounds interesting. I'll look into that as an alternative to predicated intrinsics. Nice suggestion. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Tue Jul 7 14:13:51 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:13:51 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: <8e4ed90d04394418de6633866fcf64f7@localhost.localdomain> eugenis added a comment. This code generates a libcall out of thin air. My intuition says the safest thing to do is to drop all call site attributes, because they generally specify something about how an attribute must be passed to the callee, and not a property of the value being passed, so there is no reason for the attribute lists on pow and on exp to have anything in common at all. This way we would lose the noundef attribute on the exp call arguments. We might extend TargetLibraryInfo in the future to specify attributes on the declarations. WDYT? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Tue Jul 7 14:14:38 2020 From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:14:38 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: <32269ef938ba019d5b558d3a66406abe@localhost.localdomain> reames added a comment. Initial comments, continue to look for other suggestions. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:228 +// Return true if V is a values which need not to be relocated/spilled. +static bool isConstantVal(SDValue V) { ---------------- Rebase on commit b172cd781. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:533 // Before we actually start lowering (and allocating spill slots for values), // reserve any stack slots which we judge to be profitable to reuse for a ---------------- Please replace your modifications to this function with the following: DenseSet LowerAsVReg; if (UseRegistersForGCPointers && isa(SI.StatepointInstr)) { for (unsigned i = 0; i < SI.Bases.size(); ++i) { if (willDirectlyLower(getValue(SI.Bases[i])) || continue; LowerAsVReg.insert(getValue(SI.Bases[i])); if (LowerAsVReg.size() == N) break; } } auto requireSpillSlot = [&](const Value *V) { if (isGCValue(V)) return LowerAsVReg.contains(getValue(V))); return !(LiveInDeopt || UseRegistersForDeoptValues); }; Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Tue Jul 7 14:15:34 2020 From: llvm-commits at lists.llvm.org (Isuru Fernando via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:15:34 +0000 (UTC) Subject: [PATCH] D77815: [flang] Fix setting mxcsr on MSVC In-Reply-To: References: Message-ID: 
<760c482964b39d0a4a99b6741564114e@localhost.localdomain> isuruf added a comment. @ro, there are some red pre-merge checks. Not sure what those are about. Any ideas? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77815/new/ https://reviews.llvm.org/D77815 From llvm-commits at lists.llvm.org Tue Jul 7 14:15:43 2020 From: llvm-commits at lists.llvm.org (Aaron H Liu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:15:43 +0000 (UTC) Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions. In-Reply-To: References: Message-ID: AaronLiu added a comment. > In that case, best understand why LV's cost model claims vectorizing the loop is not profitable, which you and SLP know it is; and ideally fix LV's cost model. > A crash due to forced vectorization sounds like a bug, which best be reported and/or fixed. > If cases with concrete "obstacles" are identified preventing LV from vectorizing a loop but allowing SLP to vectorize (part of) it, after LV interleaves the loop, such obstacles could potentially be used to (further) drive LV to interleave the loop. Agreed, ideally LV's cost model and its vectorization functionality should be improved in the future to be able to vectorize a lot more instructions. We see some applications keep crashing, either due to changes in LV that will probably be fixed later on, or because of LV's own weaknesses in some aspects. But all of the above is beyond the scope of this patch. Currently, LV and SLP complement each other, and there are cases that LV fails to vectorize (it is functionally unable to) but SLP succeeds. > Hence the term "small loop" should be more specific; as in "vectorizer-min-trip-count" / "TinyTripCountVectorThreshold". The "small or tiny" values are relative, and will keep on changing. In the situations we see, it is even more dynamic: the exact trip count is not known, but we know that it is relatively small.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81416/new/ https://reviews.llvm.org/D81416 From llvm-commits at lists.llvm.org Tue Jul 7 14:17:48 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:17:48 +0000 (UTC) Subject: [PATCH] D83197: [CodeGen] Fix warning in DAGTypeLegalizer::SplitVecRes_ExtendOp In-Reply-To: References: Message-ID: efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83197/new/ https://reviews.llvm.org/D83197 From llvm-commits at lists.llvm.org Tue Jul 7 14:19:00 2020 From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:19:00 +0000 (UTC) Subject: [PATCH] D83320: Hand port modfile01.f90 from test_modfile.sh to FileCheck In-Reply-To: References: Message-ID: tskeith added inline comments. ================ Comment at: flang/test/Semantics/modfile01.f90:1 -! RUN: %S/test_modfile.sh %s %t %f18 +! RUN: mkdir -p %t.dir +! RUN: %f18 -module %t.dir -fdebug-resolve-names -fparse-only %s ---------------- It looks like the temp directory could still be there from the previous run. Does it get cleaned up? test_modfile.sh has the same problem, dating back to the port to lit. ================ Comment at: flang/test/Semantics/modfile01.f90:9 +! Module files start with a 3-byte UTF BOM which lit struggles with. Remove it +! with sed before lit sees it. +! RUN: FileCheck --input-file=%t.dir/m1.mod --check-prefix=MOD-M1 %s ---------------- Where is this sed command? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83320/new/ https://reviews.llvm.org/D83320 From llvm-commits at lists.llvm.org Tue Jul 7 14:19:32 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:19:32 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: tmsriram updated this revision to Diff 276206. tmsriram marked 3 inline comments as done. tmsriram added a comment. Simplify CFI instructions test. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 Files: llvm/include/llvm/CodeGen/TargetFrameLowering.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCFIException.cpp llvm/lib/CodeGen/AsmPrinter/DwarfException.h llvm/lib/CodeGen/CFIInstrInserter.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/X86/X86FrameLowering.cpp llvm/lib/Target/X86/X86FrameLowering.h llvm/test/DebugInfo/X86/basic-block-sections-cfi_1.ll llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79978.276206.patch Type: text/x-patch Size: 16078 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:22:42 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:22:42 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: <73732747bab4c682ddf4ada41b326f38@localhost.localdomain> guiand updated this revision to Diff 276207. guiand added a comment. This update removes attribute list copying per the discussion above. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 Files: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp llvm/test/Transforms/InstCombine/pow_fp_int.ll llvm/test/Transforms/InstCombine/simplify-libcalls.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82820.276207.patch Type: text/x-patch Size: 4751 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:24:17 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:24:17 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: tmsriram added inline comments. ================ Comment at: llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll:23-30 +; int compute(bool k, int p1, int p2, int p3, int p4, int p5, int p6, int p7, int p8, int p9, int pa, int pb, int pc) { +; int result; +; if (k) +; result = p1 * p2 + p3 / p4 - p5 * p6 + p7 / p8 - p9 * pa + pb / pc; +; else +; result = p1 / p2 - p3 * p4 + p5 / p6 - p7 * p8 + p9 / pa - pb * pc; +; return result; ---------------- dblaikie wrote: > tmsriram wrote: > > dblaikie wrote: > > > Seems like a surprisingly large amount of computation - is it there for a reason? needed to push some optimization or layout decisions? Could it all use the same operation (just all multiplication, for instance) or is the different operations significant? (Well, I guess they have to differ between the two branches - but could all be the same within each one?) does it need 12 parameters? Could it be fewer & use a function call? 
> > > > > > (etc, etc - simple test case, maybe some comments describing what's significant about the features of it that are needed to demonstrate the desired behavior, etc) > > > > > > > > > > It was done so that more callee-saved registers are used and when more callee saved registers are used cfi_offset directives are needed for it. The .s looks like this for a basic block that does the computation: > > > > _Z7computebiiiiiiiiiiii.1: # %if.then > > .cfi_startproc > > .cfi_def_cfa %rbp, 16 > > .cfi_offset %rbx, -48 > > .cfi_offset %r12, -40 > > .cfi_offset %r14, -32 > > .cfi_offset %r15, -24 > > .cfi_offset %rbp, -16 > > > > Each basic block that goes in a different section must emit cfi directives for callee-saved registers. The parameters is to make sure the caller saved registers are taken and the callee saved registers are forced so that we can check that the cfi emission indeed works for callee saved registers. > > > Ah, OK - a comment might be handy to describe that? > > And rather than the somewhat arbitrary computation, perhaps an opaque function call would suffice? Or would that introduce other complications for spills/saves/etc? > > Maybe using a pass by value struct as the parameter type so the long parameter list doesn't have to be repeated? Simplified the test and added comments. Having more than 4 integers in the struct seems to go to the stack, though the ABI says up to 32 bytes. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 14:25:22 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:25:22 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <5399fc3dabfd74b0db3bc53d5b466f23@localhost.localdomain> zequanwu updated this revision to Diff 276208. zequanwu marked an inline comment as done. zequanwu added a comment.
Herald added subscribers: dexonsmith, steven_wu. - Disable `enable-call-graph-profile` by default in opt. - Disable `CGProfilePass` by default in clang unless `-no-integrated-as` is not given and `-fprofile-instrument-use-path=` is given, as this pass only generates module metadata when profile data is given. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 Files: clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/new-pm-cgprofile.ll llvm/test/Other/new-pm-defaults.ll llvm/test/Other/new-pm-thinlto-defaults.ll llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll llvm/tools/opt/NewPMDriver.cpp llvm/tools/opt/NewPMDriver.h llvm/tools/opt/opt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83013.276208.patch Type: text/x-patch Size: 17883 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:26:19 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:26:19 +0000 (UTC) Subject: [PATCH] D83332: [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering. In-Reply-To: References: Message-ID: <5825c825dafbf81fd6ebd8c37c81879d@localhost.localdomain> RKSimon added reviewers: andreadb, RKSimon. RKSimon added a comment. You should probably commit the additional tests to trunk with its current codegen and then rebase the patch to show the diff. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83332/new/ https://reviews.llvm.org/D83332 From llvm-commits at lists.llvm.org Tue Jul 7 14:28:25 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:28:25 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: <95b0735862ad22d7df54c2ff61f28cfb@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Tue Jul 7 14:29:51 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:29:51 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. In-Reply-To: References: Message-ID: RKSimon added a comment. @paulwalker-arm Does this finally fix PR12772? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83303/new/ https://reviews.llvm.org/D83303 From llvm-commits at lists.llvm.org Tue Jul 7 14:30:06 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:30:06 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls In-Reply-To: References: Message-ID: <8b371876e5f0f387ad66f755c8d85a4c@localhost.localdomain> vitalybuka added inline comments. 
================ Comment at: llvm/include/llvm/Analysis/TargetLibraryInfo.def:266 +/// void __atomic_load(size_t size, void *mptr, void *vptr, int smodel); +TLI_DEFINE_ENUM_INTERNAL(atomic_load) +TLI_DEFINE_STRING_INTERNAL("__atomic_load") ---------------- Can you move TargetLibraryInfo extension into a separate patch ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3546 + auto SrcShadowOriginPair = + getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), AlignOne, + /*isStore*/ false); ---------------- I'd use ", Align(1), " without temp var ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3553 + + NextIRB.CreateMemCpy(DstShadowPtr, AlignOne, SrcShadowOriginPair.first, + AlignOne, Size); ---------------- DstShadowPtr.first for consistency ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3544 + IRBuilder<> NextIRB(CB.getNextNode()); + Align AlignOne = assumeAligned(1); + auto SrcShadowOriginPair = ---------------- guiand wrote: > Is it valid to assume any alignment other than 1 here? I can't think of a way, but I'd like to make sure. Can we do MemCpy of shadow here at all? not all bits of shadow bytes correspond to the src/dst Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83337/new/ https://reviews.llvm.org/D83337 From llvm-commits at lists.llvm.org Tue Jul 7 14:33:42 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:33:42 +0000 (UTC) Subject: [PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGbc8e262afe83: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support (authored by epastor). 
Changed prior to commit: https://reviews.llvm.org/D75306?vs=274814&id=275690#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75306/new/ https://reviews.llvm.org/D75306 Files: llvm/include/llvm/MC/MCParser/MCAsmParser.h llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h llvm/lib/MC/MCParser/MasmParser.cpp llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/test/tools/llvm-ml/struct.test llvm/test/tools/llvm-ml/struct_errors.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D75306.275690.patch Type: text/x-patch Size: 84988 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:35:02 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:35:02 +0000 (UTC) Subject: [PATCH] D83331: [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. In-Reply-To: References: Message-ID: <881e96cd2868da046384ddd862711752@localhost.localdomain> craig.topper added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:165 /// performance. bool OptForSize; ---------------- Are more changes needed to get rid of this member? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83331/new/ https://reviews.llvm.org/D83331 From llvm-commits at lists.llvm.org Tue Jul 7 14:35:13 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:35:13 +0000 (UTC) Subject: [PATCH] D83344: [ms] [llvm-ml] Improve MASM STRUCT field accessor support Message-ID: epastor created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM.
Adds support for several accessors: - `[.].` - `[..].` (where `field` has already-defined STRUCT type) - `[.].` (where `field` has already-defined STRUCT type) Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83344 Files: llvm/include/llvm/MC/MCParser/MCAsmParser.h llvm/lib/MC/MCParser/MasmParser.cpp llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/test/tools/llvm-ml/struct.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D83344.276213.patch Type: text/x-patch Size: 11508 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:35:56 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:35:56 +0000 (UTC) Subject: [PATCH] D83345: [ms] [llvm-ml] Fix MASM support for nested unnamed STRUCTs and UNIONs Message-ID: epastor created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Fix MASM support for nested unnamed STRUCTs and UNIONs Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83345 Files: llvm/lib/MC/MCParser/MasmParser.cpp llvm/test/tools/llvm-ml/struct.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D83345.276214.patch Type: text/x-patch Size: 3418 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:36:50 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:36:50 +0000 (UTC) Subject: [PATCH] D83346: [ms] [llvm-ml] Add support for MASM STRUCT casting field accessors: ( PTR ). Message-ID: epastor created this revision. Herald added subscribers: llvm-commits, hiraditya, kristof.beyls. Herald added a project: LLVM. Add support for MASM STRUCT casting field accessors: ( PTR ). Since these are operands, we add them to X86AsmParser. If/when we extend MASM support to other architectures (e.g., ARM), we will need similar changes there as well. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83346 Files: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/test/tools/llvm-ml/struct.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D83346.276215.patch Type: text/x-patch Size: 3028 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:37:15 2020 From: llvm-commits at lists.llvm.org (Aaron H Liu via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:37:15 +0000 (UTC) Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions. In-Reply-To: References: Message-ID: <02a3c2e3ee601d277595dcfd9e6d8718@localhost.localdomain> AaronLiu updated this revision to Diff 276212. AaronLiu added a comment. deleted: llvm/test/Transforms/PhaseOrdering/interleave_LV_SLP.ll deleted: llvm/test/Transforms/PhaseOrdering/interleave_LV_SLP_false.ll deleted: llvm/test/Transforms/SLPVectorizer/PowerPC/interleave_SLP.ll CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81416/new/ https://reviews.llvm.org/D81416 Files: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/PowerPC/interleave_IC.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81416.276212.patch Type: text/x-patch Size: 8525 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:38:46 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:38:46 +0000 (UTC) Subject: [PATCH] D83347: [ms] [llvm-ml] Add support for line continuations in MASM Message-ID: epastor created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Add support for line continuations (the "backslash operator") in MASM by modifying the Parser's Lex method. 
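For readers skimming the archive, the line-continuation handling described in D83347 can be sketched outside of LLVM roughly as follows. This is only an illustration under stated assumptions: the `Tok` enum and `skipLineContinuations` are hypothetical names made up for this sketch and are not the `AsmToken` or `MasmParser` API, and the real patch does the skipping inside the parser's `Lex` method with one token of lookahead rather than as a filter over a whole token vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds for illustration only; not the real AsmToken enum.
enum class Tok { Identifier, BackSlash, EndOfStatement, Eof };

// Returns a copy of the token stream with every "BackSlash, EndOfStatement"
// pair dropped, so a statement split across physical lines reads as one.
std::vector<Tok> skipLineContinuations(const std::vector<Tok> &in) {
  std::vector<Tok> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool isContinuation = in[i] == Tok::BackSlash &&
                                i + 1 < in.size() &&
                                in[i + 1] == Tok::EndOfStatement;
    if (isContinuation) {
      ++i; // also consume the EndOfStatement that follows the backslash
      continue;
    }
    out.push_back(in[i]);
  }
  assert(out.size() <= in.size() && "filtering never adds tokens");
  return out;
}
```

A backslash that is not immediately followed by an end of statement is left untouched in this sketch, which mirrors the two-part condition described above.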
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83347 Files: llvm/lib/MC/MCParser/MasmParser.cpp Index: llvm/lib/MC/MCParser/MasmParser.cpp =================================================================== --- llvm/lib/MC/MCParser/MasmParser.cpp +++ llvm/lib/MC/MCParser/MasmParser.cpp @@ -1099,6 +1099,14 @@ tok = &Lexer.Lex(); } + // Recognize and bypass line continuations. + while (tok->is(AsmToken::BackSlash) && + Lexer.peekTok().is(AsmToken::EndOfStatement)) { + // Eat both the backslash and the end of statement. + Lexer.Lex(); + tok = &Lexer.Lex(); + } + if (tok->is(AsmToken::Eof)) { // If this is the end of an included file, pop the parent file off the // include stack. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83347.276216.patch Type: text/x-patch Size: 632 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:39:12 2020 From: llvm-commits at lists.llvm.org (Jian Cai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:39:12 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: <707eaba6b396badf266ad1a1f764db43@localhost.localdomain> jcai19 marked 2 inline comments as done. jcai19 added inline comments. ================ Comment at: llvm/lib/MC/MCAssembler.cpp:644 + + while (NumBytes) { + uint64_t NopsToEmit = (uint64_t)std::min(NumBytes, MaxNopLength); ---------------- reames wrote: > This loop is duplicated from within emitNops. Can you pass in a MaxNopLength parameter instead? There isn't any loop in emitNops. Do you by any chance refer to the loop in writeNopData? In that case, they are not duplicate as this loop will break total bytes into nop instructions no longer than specified by the second argument of .nops if provided, while the loop in writeNopData makes sure each instruction emitted is no longer than the maximum length allowed by the target. 
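The distinction drawn in the reply above (splitting the requested byte count by the optional per-instruction cap, as opposed to the target's own limit applied later) can be illustrated with a small standalone sketch. It assumes nothing about the real `MCAssembler` code beyond the quoted `std::min(NumBytes, MaxNopLength)` loop; `splitNopBytes` is a hypothetical helper name, not an LLVM function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not an LLVM API: breaks a total padding size into
// chunk sizes, each no larger than the caller-supplied cap.
std::vector<uint64_t> splitNopBytes(uint64_t numBytes, uint64_t maxNopLength) {
  assert(maxNopLength > 0 && "a zero cap would never make progress");
  std::vector<uint64_t> chunks;
  while (numBytes) {
    // Emit the largest chunk the cap allows, then continue with the rest.
    const uint64_t n = std::min(numBytes, maxNopLength);
    chunks.push_back(n);
    numBytes -= n;
  }
  return chunks;
}
```

Under this sketch, a request for 10 bytes with a cap of 4 yields chunks of 4, 4 and 2 bytes; whether each chunk is then encodable as a single instruction is the separate, target-specific question the reply attributes to `writeNopData`.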
================ Comment at: llvm/test/MC/X86/align-branch-bundle.s:9 # CHECK-NEXT: e: nop -# CHECK-NEXT: f: nop # CHECK-NEXT: 10: jle ---------------- jcai19 wrote: > reames wrote: > > Having a test delta in a file without .nops is highly suspicious. > > > > I'd suggest splitting your patch into a trivial version which emits single byte nops, and an change which adds the multiple byte support. That would allow us to separate the directive mechanics from the interesting profit bits. > How about we also print out instruction bytes here. If 64-bit processors can generate a two-byte long nop instruction here, shouldn't we emit that instead of two single-byte nop? Thanks. On second thought, I agree that splitting the patch is the better approach in case the multiple-byte support causes any regression. Will address this in the next iteration. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 From llvm-commits at lists.llvm.org Tue Jul 7 14:39:22 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:39:22 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: eugenis added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:1788 Function *Callee = CI->getCalledFunction(); + AttributeList Attrs; // LibCalls have built-in attribute lists StringRef Name = Callee->getName(); ---------------- This comment seems wrong. 
(intrinsics have attribute lists, libcalls don't) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Tue Jul 7 14:39:26 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:39:26 +0000 (UTC) Subject: [PATCH] D83348: [ms] [llvm-ml] Add support for bitwise named operators (AND, NOT, OR) in MASM Message-ID: epastor created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Add support for bitwise named operators (AND, NOT, OR) in MASM. Also move valid-section checking to where it's actually required. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83348 Files: llvm/lib/MC/MCParser/MasmParser.cpp Index: llvm/lib/MC/MCParser/MasmParser.cpp =================================================================== --- llvm/lib/MC/MCParser/MasmParser.cpp +++ llvm/lib/MC/MCParser/MasmParser.cpp @@ -1779,8 +1779,18 @@ SMLoc &EndLoc) { SMLoc StartLoc = Lexer.getLoc(); while (true) { + AsmToken::TokenKind TokKind = Lexer.getKind(); + if (Lexer.getKind() == AsmToken::Identifier) { + StringRef Identifier = Lexer.getTok().getString(); + if (Identifier.equals_lower("and")) + TokKind = AsmToken::Amp; + else if (Identifier.equals_lower("not")) + TokKind = AsmToken::Exclaim; + else if (Identifier.equals_lower("or")) + TokKind = AsmToken::Pipe; + } MCBinaryExpr::Opcode Kind = MCBinaryExpr::Add; - unsigned TokPrec = getBinOpPrecedence(Lexer.getKind(), Kind); + unsigned TokPrec = getBinOpPrecedence(TokKind, Kind); // If the next token is lower precedence than we are allowed to eat, return // successfully with what we ate already. 
@@ -3236,7 +3246,7 @@ Lex(); } else { const MCExpr *Value; - if (checkForValidSection() || parseExpression(Value)) + if (parseExpression(Value)) return true; if (getTok().is(AsmToken::Identifier) && getTok().getString().equals_lower("dup")) { @@ -3456,6 +3466,9 @@ // Initialize real data values. bool MasmParser::emitRealValues(const fltSemantics &Semantics) { + if (checkForValidSection()) + return true; + SmallVector ValuesAsInt; if (parseRealInstList(Semantics, ValuesAsInt)) return true; @@ -3475,8 +3488,7 @@ Field.SizeOf = 0; - if (checkForValidSection() || - parseRealInstList(Semantics, RealInfo.AsIntValues)) + if (parseRealInstList(Semantics, RealInfo.AsIntValues)) return true; Field.Type = RealInfo.AsIntValues.back().getBitWidth() / 8; @@ -3493,9 +3505,6 @@ /// ::= (real4 | real8) [ expression (, expression)* ] bool MasmParser::parseDirectiveRealValue(StringRef IDVal, const fltSemantics &Semantics) { - if (checkForValidSection()) - return true; - if (StructInProgress.empty()) { // Initialize data value. if (emitRealValues(Semantics)) @@ -3511,9 +3520,6 @@ bool MasmParser::parseDirectiveNamedRealValue(StringRef IDVal, const fltSemantics &Semantics, StringRef Name, SMLoc NameLoc) { - if (checkForValidSection()) - return true; - if (StructInProgress.empty()) { // Initialize named data value. MCSymbol *Sym = getContext().getOrCreateSymbol(Name); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83348.276217.patch Type: text/x-patch Size: 2675 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:43:56 2020 From: llvm-commits at lists.llvm.org (Sidharth Baveja via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:43:56 +0000 (UTC) Subject: [PATCH] D82927: Intergerate Loop Peeling into Loop Fusion In-Reply-To: References: Message-ID: <18a63bcc4e99d1f74738c0da1ac33d7c@localhost.localdomain> sidbav updated this revision to Diff 276219. sidbav added a comment. 
Address Bardia's comments, major change is removing the `loop-fusion-peel` option. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82927/new/ https://reviews.llvm.org/D82927 Files: llvm/lib/Transforms/Scalar/LoopFuse.cpp llvm/test/Transforms/LoopFusion/guarded_peel.ll llvm/test/Transforms/LoopFusion/guarded_unsafeblock_peel.ll llvm/test/Transforms/LoopFusion/nonadjacent_peel.ll llvm/test/Transforms/LoopFusion/peel.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82927.276219.patch Type: text/x-patch Size: 38876 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:49:29 2020 From: llvm-commits at lists.llvm.org (Sidharth Baveja via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:49:29 +0000 (UTC) Subject: [PATCH] D83056: [NFC] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities In-Reply-To: References: Message-ID: sidbav marked an inline comment as done. sidbav added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/LoopPeel.cpp:50 -#define DEBUG_TYPE "loop-unroll" +#define DEBUG_TYPE "loop-peel" ---------------- Meinersbur wrote: > fhahn wrote: > > I am not sure about this change. Currently peeling is integrated in loop-unroll and remarks/debug can be filtered by loop-unroll, but now we will generate remarks for `loop-unroll` and `loop-peel` when running `-loop-unroll`. > Isn't it actually better since you can now filter `-debug-only=loop-unroll`, respectively `-debug-only=loop-peel` depending on what you want to look at? > > Note: `-Rpass=` remarks use the pass name, not `DEBUG_TYPE`. I also agree with @Meinersbur, having them separate is better. Additionally, in the case that the developer wants to look at both unrolling and peeling at the same time, they can specify `debug-only=loop-unroll,loop-peel`. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83056/new/ https://reviews.llvm.org/D83056 From llvm-commits at lists.llvm.org Tue Jul 7 14:50:33 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:50:33 +0000 (UTC) Subject: [PATCH] D83345: [ms] [llvm-ml] Fix MASM support for nested unnamed STRUCTs and UNIONs In-Reply-To: References: Message-ID: epastor updated this revision to Diff 276220. epastor added a comment. Avoid re-implementing alignTo. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83345/new/ https://reviews.llvm.org/D83345 Files: llvm/lib/MC/MCParser/MasmParser.cpp llvm/test/tools/llvm-ml/struct.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D83345.276220.patch Type: text/x-patch Size: 4545 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:51:04 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:51:04 +0000 (UTC) Subject: [PATCH] D83346: [ms] [llvm-ml] Add support for MASM STRUCT casting field accessors: ( PTR ). In-Reply-To: References: Message-ID: <682f2770c1b751127bd82a2c059630ea@localhost.localdomain> epastor updated this revision to Diff 276221. epastor added a comment. Rebase on parent Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83346/new/ https://reviews.llvm.org/D83346 Files: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/test/tools/llvm-ml/struct.test -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83346.276221.patch Type: text/x-patch Size: 3028 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:51:17 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:51:17 +0000 (UTC) Subject: [PATCH] D83347: [ms] [llvm-ml] Add support for line continuations in MASM In-Reply-To: References: Message-ID: <8a45a0d15dd0aaa0ea196ae2b3f9f7fb@localhost.localdomain> epastor updated this revision to Diff 276222. epastor added a comment. Rebase on parent Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83347/new/ https://reviews.llvm.org/D83347 Files: llvm/lib/MC/MCParser/MasmParser.cpp Index: llvm/lib/MC/MCParser/MasmParser.cpp =================================================================== --- llvm/lib/MC/MCParser/MasmParser.cpp +++ llvm/lib/MC/MCParser/MasmParser.cpp @@ -1099,6 +1099,14 @@ tok = &Lexer.Lex(); } + // Recognize and bypass line continuations. + while (tok->is(AsmToken::BackSlash) && + Lexer.peekTok().is(AsmToken::EndOfStatement)) { + // Eat both the backslash and the end of statement. + Lexer.Lex(); + tok = &Lexer.Lex(); + } + if (tok->is(AsmToken::Eof)) { // If this is the end of an included file, pop the parent file off the // include stack. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83347.276222.patch Type: text/x-patch Size: 632 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:51:32 2020 From: llvm-commits at lists.llvm.org (Eric Astor via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:51:32 +0000 (UTC) Subject: [PATCH] D83348: [ms] [llvm-ml] Add support for bitwise named operators (AND, NOT, OR) in MASM In-Reply-To: References: Message-ID: <472d67694ed13cd65d62f481121a9b66@localhost.localdomain> epastor updated this revision to Diff 276223. epastor added a comment. 
Rebase on parent Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83348/new/ https://reviews.llvm.org/D83348 Files: llvm/lib/MC/MCParser/MasmParser.cpp Index: llvm/lib/MC/MCParser/MasmParser.cpp =================================================================== --- llvm/lib/MC/MCParser/MasmParser.cpp +++ llvm/lib/MC/MCParser/MasmParser.cpp @@ -1779,8 +1779,18 @@ SMLoc &EndLoc) { SMLoc StartLoc = Lexer.getLoc(); while (true) { + AsmToken::TokenKind TokKind = Lexer.getKind(); + if (Lexer.getKind() == AsmToken::Identifier) { + StringRef Identifier = Lexer.getTok().getString(); + if (Identifier.equals_lower("and")) + TokKind = AsmToken::Amp; + else if (Identifier.equals_lower("not")) + TokKind = AsmToken::Exclaim; + else if (Identifier.equals_lower("or")) + TokKind = AsmToken::Pipe; + } MCBinaryExpr::Opcode Kind = MCBinaryExpr::Add; - unsigned TokPrec = getBinOpPrecedence(Lexer.getKind(), Kind); + unsigned TokPrec = getBinOpPrecedence(TokKind, Kind); // If the next token is lower precedence than we are allowed to eat, return // successfully with what we ate already. @@ -3236,7 +3246,7 @@ Lex(); } else { const MCExpr *Value; - if (checkForValidSection() || parseExpression(Value)) + if (parseExpression(Value)) return true; if (getTok().is(AsmToken::Identifier) && getTok().getString().equals_lower("dup")) { @@ -3456,6 +3466,9 @@ // Initialize real data values. 
bool MasmParser::emitRealValues(const fltSemantics &Semantics) { + if (checkForValidSection()) + return true; + SmallVector ValuesAsInt; if (parseRealInstList(Semantics, ValuesAsInt)) return true; @@ -3475,8 +3488,7 @@ Field.SizeOf = 0; - if (checkForValidSection() || - parseRealInstList(Semantics, RealInfo.AsIntValues)) + if (parseRealInstList(Semantics, RealInfo.AsIntValues)) return true; Field.Type = RealInfo.AsIntValues.back().getBitWidth() / 8; @@ -3493,9 +3505,6 @@ /// ::= (real4 | real8) [ expression (, expression)* ] bool MasmParser::parseDirectiveRealValue(StringRef IDVal, const fltSemantics &Semantics) { - if (checkForValidSection()) - return true; - if (StructInProgress.empty()) { // Initialize data value. if (emitRealValues(Semantics)) @@ -3511,9 +3520,6 @@ bool MasmParser::parseDirectiveNamedRealValue(StringRef IDVal, const fltSemantics &Semantics, StringRef Name, SMLoc NameLoc) { - if (checkForValidSection()) - return true; - if (StructInProgress.empty()) { // Initialize named data value. MCSymbol *Sym = getContext().getOrCreateSymbol(Name); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83348.276223.patch Type: text/x-patch Size: 2675 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:55:08 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:55:08 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls In-Reply-To: References: Message-ID: eugenis added inline comments. ================ Comment at: compiler-rt/test/msan/libatomic.c:37 +#endif +} ---------------- guiand wrote: > One thing that turned out a little strange is that because I have to insert instructions *after* the atomic load, including the origin update, the msan reporter decides that the origin is one line below the call. Is there anything I can do about this? 
use SetCurrentDebugLocation on the builder ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3550 + getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), AlignOne, + /*isStore*/ false) + .first; ---------------- I think isStore needs to be true here. ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3554 + NextIRB.CreateMemCpy(DstShadowPtr, AlignOne, SrcShadowOriginPair.first, + AlignOne, Size); + if (MS.TrackOrigins) { ---------------- How about we do the same thing that happens to memcpy() calls in the user code, i.e. emit a call to __msan_memcpy? It will take care of everything. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83337/new/ https://reviews.llvm.org/D83337 From llvm-commits at lists.llvm.org Tue Jul 7 14:55:24 2020 From: llvm-commits at lists.llvm.org (Jian Cai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:55:24 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: jcai19 updated this revision to Diff 276226. jcai19 added a comment. Address some of @reames's concerns, and rename variables to avoid confusion. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 Files: llvm/include/llvm/MC/MCAsmBackend.h llvm/include/llvm/MC/MCFragment.h llvm/include/llvm/MC/MCObjectStreamer.h llvm/include/llvm/MC/MCStreamer.h llvm/lib/MC/MCAssembler.cpp llvm/lib/MC/MCFragment.cpp llvm/lib/MC/MCObjectStreamer.cpp llvm/lib/MC/MCStreamer.cpp llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp llvm/test/MC/X86/x86-directive-nops-errors.s llvm/test/MC/X86/x86-directive-nops.s llvm/test/MC/X86/x86_64-directive-nops.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82826.276226.patch Type: text/x-patch Size: 13933 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:56:02 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via llvm-commits) Date: Tue, 07 Jul 2020 14:56:02 -0700 (PDT) Subject: [llvm] c17120a - [SVE] Remove calls to VectorType::getNumElements from AsmParserTest Message-ID: <5f04ef72.1c69fb81.d3264.ceaa@mx.google.com> Author: Christopher Tetreault Date: 2020-07-07T14:55:42-07:00 New Revision: c17120a3a4c14940982320beafa455437d60d170 URL: https://github.com/llvm/llvm-project/commit/c17120a3a4c14940982320beafa455437d60d170 DIFF: https://github.com/llvm/llvm-project/commit/c17120a3a4c14940982320beafa455437d60d170.diff LOG: [SVE] Remove calls to VectorType::getNumElements from AsmParserTest Reviewers: efriedma, c-rhodes, david-arm, kmclaughlin, fpetrogalli, sdesmalen Reviewed By: efriedma Subscribers: tschuett, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83339 Added: Modified: llvm/unittests/AsmParser/AsmParserTest.cpp Removed: ################################################################################ diff --git a/llvm/unittests/AsmParser/AsmParserTest.cpp b/llvm/unittests/AsmParser/AsmParserTest.cpp index 198191bf435f..9a7d70ad1ed0 100644 --- a/llvm/unittests/AsmParser/AsmParserTest.cpp +++ b/llvm/unittests/AsmParser/AsmParserTest.cpp @@ -230,7 +230,7 @@ TEST(AsmParserTest, TypeWithSlotMappingParsing) { ASSERT_TRUE(Ty->isVectorTy()); // Check the details of the vector. - VectorType *VT = cast(Ty); + auto *VT = cast(Ty); ASSERT_TRUE(VT->getNumElements() == 5); ASSERT_TRUE(VT->getPrimitiveSizeInBits().getFixedSize() == 160); Ty = VT->getElementType(); @@ -362,7 +362,7 @@ TEST(AsmParserTest, TypeAtBeginningWithSlotMappingParsing) { ASSERT_TRUE(Read == 9); // Check the details of the vector. 
- VectorType *VT = cast(Ty); + auto *VT = cast(Ty); ASSERT_TRUE(VT->getNumElements() == 5); ASSERT_TRUE(VT->getPrimitiveSizeInBits().getFixedSize() == 160); Ty = VT->getElementType(); From llvm-commits at lists.llvm.org Tue Jul 7 14:56:04 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:56:04 +0000 (UTC) Subject: [PATCH] D83339: [SVE] Remove calls to VectorType::getNumElements from AsmParserTest In-Reply-To: References: Message-ID: <3eebf617793e2355ec7f298238e87141@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGc17120a3a4c1: [SVE] Remove calls to VectorType::getNumElements from AsmParserTest (authored by ctetreau). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83339/new/ https://reviews.llvm.org/D83339 Files: llvm/unittests/AsmParser/AsmParserTest.cpp Index: llvm/unittests/AsmParser/AsmParserTest.cpp =================================================================== --- llvm/unittests/AsmParser/AsmParserTest.cpp +++ llvm/unittests/AsmParser/AsmParserTest.cpp @@ -230,7 +230,7 @@ ASSERT_TRUE(Ty->isVectorTy()); // Check the details of the vector. - VectorType *VT = cast(Ty); + auto *VT = cast(Ty); ASSERT_TRUE(VT->getNumElements() == 5); ASSERT_TRUE(VT->getPrimitiveSizeInBits().getFixedSize() == 160); Ty = VT->getElementType(); @@ -362,7 +362,7 @@ ASSERT_TRUE(Read == 9); // Check the details of the vector. - VectorType *VT = cast(Ty); + auto *VT = cast(Ty); ASSERT_TRUE(VT->getNumElements() == 5); ASSERT_TRUE(VT->getPrimitiveSizeInBits().getFixedSize() == 160); Ty = VT->getElementType(); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83339.276227.patch Type: text/x-patch Size: 843 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 14:57:54 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 21:57:54 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <0dc62fca50649450ae2b568a39636d26@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll:23-30 +; int compute(bool k, int p1, int p2, int p3, int p4, int p5, int p6, int p7, int p8, int p9, int pa, int pb, int pc) { +; int result; +; if (k) +; result = p1 * p2 + p3 / p4 - p5 * p6 + p7 / p8 - p9 * pa + pb / pc; +; else +; result = p1 / p2 - p3 * p4 + p5 / p6 - p7 * p8 + p9 / pa - pb * pc; +; return result; ---------------- tmsriram wrote: > dblaikie wrote: > > tmsriram wrote: > > > dblaikie wrote: > > > > Seems like a surprisingly large amount of computation - is it there for a reason? needed to push some optimization or layout decisions? Could it all use the same operation (just all multiplication, for instance) or is the different operations significant? (Well, I guess they have to differ between the two branches - but could all be the same within each one?) does it need 12 parameters? Could it be fewer & use a function call? > > > > > > > > (etc, etc - simple test case, maybe some comments describing what's significant about the features of it that are needed to demonstrate the desired behavior, etc) > > > > > > > > > > > > > > > It was done so that more callee-saved registers are used and when more callee saved registers are used cfi_offset directives are needed for it. 
The .s looks like this for a basic block that does the computation: > > > > > > _Z7computebiiiiiiiiiiii.1: # %if.then > > > .cfi_startproc > > > .cfi_def_cfa %rbp, 16 > > > .cfi_offset %rbx, -48 > > > .cfi_offset %r12, -40 > > > .cfi_offset %r14, -32 > > > .cfi_offset %r15, -24 > > > .cfi_offset %rbp, -16 > > > > > > Each basic block that goes in a different section must emit cfi directives for callee-saved registers. The parameters are there to make sure the caller saved registers are taken and the callee saved registers are forced so that we can check that the cfi emission indeed works for callee saved registers. > > > > > Ah, OK - a comment might be handy to describe that? > > > > And rather than the somewhat arbitrary computation, perhaps an opaque function call would suffice? Or would that introduce other complications for spills/saves/etc? > > > > Maybe using a pass by value struct as the parameter type so the long parameter list doesn't have to be repeated? > Simplified the test and added comments. Having more than 4 integers in the struct seems to go to the stack though the ABI says up to 32 bytes. > Ah - the comment's good, thanks! Not sure about the code changes - I was hoping for more uniformity (& brevity as a second-order benefit), but having a struct, then 3 struct parameters and some ints lacks the uniformity I was hoping for. Also the arithmetic looks sort of arbitrarily complicated (& raises the question, as a reader (for me at least), why is it complicated? Is the particular sequence of arithmetic important in some way?). Is a more uniform operation (like all addition) not viable due to vectorizing or something? (is a function call inadequate because of other spill issues (eg: int f1(bool k, int a, int b, ...) { int result; if (k) { result = f2(a, b, ...); } else { result = f3(a, b, ...); } return result; })?)
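To make the suggested shape concrete, here is a compilable sketch of the uniform variant, with the branch bodies reduced to calls. The callees `f2` and `f3` and the six-parameter list are hypothetical choices for illustration; in an actual test they would more likely be declared but left undefined so they stay opaque. Whether this shape still forces enough callee-saved registers to exercise the `.cfi_offset` emission is the open question in this thread, and the sketch does not answer it.

```cpp
// Hypothetical stand-ins for the opaque callees. They are defined here only
// so the sketch links and runs; a real test would leave them as declarations.
int f2(int a, int b, int c, int d, int e, int f) {
  return a + b + c + d + e + f;
}
int f3(int a, int b, int c, int d, int e, int f) {
  return a - b - c - d - e - f;
}

// Uniform variant of the test function: each branch is a single call, so
// the two basic blocks differ only in which callee they invoke.
int f1(bool k, int a, int b, int c, int d, int e, int f) {
  int result;
  if (k)
    result = f2(a, b, c, d, e, f);
  else
    result = f3(a, b, c, d, e, f);
  return result;
}
```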
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 15:01:10 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Tue, 07 Jul 2020 15:01:10 -0700 (PDT) Subject: [llvm] 8c5825b - [llvm-readobj][test] Fix ELF/verneed-flags.yaml Message-ID: <5f04f0a6.1c69fb81.422c0.42e3@mx.google.com> Author: Fangrui Song Date: 2020-07-07T15:01:02-07:00 New Revision: 8c5825befb7bbb2e76f7eccedc6d3bf26e9b2a6a URL: https://github.com/llvm/llvm-project/commit/8c5825befb7bbb2e76f7eccedc6d3bf26e9b2a6a DIFF: https://github.com/llvm/llvm-project/commit/8c5825befb7bbb2e76f7eccedc6d3bf26e9b2a6a.diff LOG: [llvm-readobj][test] Fix ELF/verneed-flags.yaml *.yaml tests don't currently run, so we failed to update it. Added: Modified: llvm/test/tools/llvm-readobj/ELF/verneed-flags.yaml Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/verneed-flags.yaml b/llvm/test/tools/llvm-readobj/ELF/verneed-flags.yaml index eedc7fe7ad7a..685acfbd696a 100644 --- a/llvm/test/tools/llvm-readobj/ELF/verneed-flags.yaml +++ b/llvm/test/tools/llvm-readobj/ELF/verneed-flags.yaml @@ -13,37 +13,52 @@ # LLVM-VERDEF-NEXT: Entries [ # LLVM-VERDEF-NEXT: Entry { # LLVM-VERDEF-NEXT: Hash: 0 -# LLVM-VERDEF-NEXT: Flags: Base (0x1) +# LLVM-VERDEF-NEXT: Flags [ (0x1) +# LLVM-VERDEF-NEXT: Base (0x1) +# LLVM-VERDEF-NEXT: ] # LLVM-VERDEF-NEXT: Index: 0 # LLVM-VERDEF-NEXT: Name: base # LLVM-VERDEF-NEXT: } # LLVM-VERDEF-NEXT: Entry { # LLVM-VERDEF-NEXT: Hash: 0 -# LLVM-VERDEF-NEXT: Flags: Weak (0x2) +# LLVM-VERDEF-NEXT: Flags [ (0x2) +# LLVM-VERDEF-NEXT: Weak (0x2) +# LLVM-VERDEF-NEXT: ] # LLVM-VERDEF-NEXT: Index: 0 # LLVM-VERDEF-NEXT: Name: weak # LLVM-VERDEF-NEXT: } # LLVM-VERDEF-NEXT: Entry { # LLVM-VERDEF-NEXT: Hash: 0 -# LLVM-VERDEF-NEXT: Flags: Info (0x4) +# LLVM-VERDEF-NEXT: Flags [ (0x4) +# LLVM-VERDEF-NEXT: Info (0x4) +# 
LLVM-VERDEF-NEXT: ] # LLVM-VERDEF-NEXT: Index: 0 # LLVM-VERDEF-NEXT: Name: info # LLVM-VERDEF-NEXT: } # LLVM-VERDEF-NEXT: Entry { # LLVM-VERDEF-NEXT: Hash: 0 -# LLVM-VERDEF-NEXT: Flags: 0x7 +# LLVM-VERDEF-NEXT: Flags [ (0x7) +# LLVM-VERDEF-NEXT: Base (0x1) +# LLVM-VERDEF-NEXT: Info (0x4) +# LLVM-VERDEF-NEXT: Weak (0x2) +# LLVM-VERDEF-NEXT: ] # LLVM-VERDEF-NEXT: Index: 0 # LLVM-VERDEF-NEXT: Name: all # LLVM-VERDEF-NEXT: } # LLVM-VERDEF-NEXT: Entry { # LLVM-VERDEF-NEXT: Hash: 0 -# LLVM-VERDEF-NEXT: Flags: 0x8 +# LLVM-VERDEF-NEXT: Flags [ (0x8) +# LLVM-VERDEF-NEXT: ] # LLVM-VERDEF-NEXT: Index: 0 # LLVM-VERDEF-NEXT: Name: unknown # LLVM-VERDEF-NEXT: } # LLVM-VERDEF-NEXT: Entry { # LLVM-VERDEF-NEXT: Hash: 0 -# LLVM-VERDEF-NEXT: Flags: 0xF +# LLVM-VERDEF-NEXT: Flags [ (0xF) +# LLVM-VERDEF-NEXT: Base (0x1) +# LLVM-VERDEF-NEXT: Info (0x4) +# LLVM-VERDEF-NEXT: Weak (0x2) +# LLVM-VERDEF-NEXT: ] # LLVM-VERDEF-NEXT: Index: 0 # LLVM-VERDEF-NEXT: Name: all_and_unknown # LLVM-VERDEF-NEXT: } @@ -52,7 +67,7 @@ # LLVM-VERDEF-NEXT: ] # GNU-VERDEF: Version needs section '.gnu.version_r' contains 1 entries: -# GNU-VERDEF-NEXT: Addr: 0000000000000000 Offset: 0x000200 Link: 6 (.dynstr) +# GNU-VERDEF-NEXT: Addr: 0000000000000000 Offset: 0x000040 Link: 3 (.dynstr) # GNU-VERDEF-NEXT: 0x0000: Version: 1 File: dso.so.0 Cnt: 6 # GNU-VERDEF-NEXT: 0x0010: Name: base Flags: BASE Version: 0 # GNU-VERDEF-NEXT: 0x0020: Name: weak Flags: WEAK Version: 0 From llvm-commits at lists.llvm.org Tue Jul 7 15:05:17 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:05:17 +0000 (UTC) Subject: [PATCH] D83056: [NFC] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities In-Reply-To: References: Message-ID: <09677cca67461a091a0a889a03a4f08d@localhost.localdomain> fhahn added inline comments. 
================ Comment at: llvm/lib/Transforms/Utils/LoopPeel.cpp:50 -#define DEBUG_TYPE "loop-unroll" +#define DEBUG_TYPE "loop-peel" ---------------- sidbav wrote: > Meinersbur wrote: > > fhahn wrote: > > > I am not sure about this change. Currently peeling is integrated in loop-unroll and remarks/debug can be filtered by loop-unroll, but now we will generate remarks for `loop-unroll` and `loop-peel` when running `-loop-unroll`. > > Isn't it actually better since you can now filter `-debug-only=loop-unroll`, respectively `-debug-only=loop-peel` depending on what you want to look at? > > > > Note: `-Rpass=` remarks use the pass name, not `DEBUG_TYPE`. > I also agree with @Meinersbur, having them separate is better. Additionally, in the case that the developer wants to look at both unrolling and peeling at the same time, they can specify `debug-only=loop-unroll,loop-peel`. > Isn't it actually better since you can now filter -debug-only=loop-unroll, respectively -debug-only=loop-peel depending on what you want to look at? I'd say it depends. Personally I find it mostly makes things less discoverable for newcomers. I can see how it might be surprising if a user wants to ask for debug output of the LoopUnroll pass and then the pass makes changes but doesn't display the debug output. It's certainly not a new problem though and not a blocker. I think it means that the patch changes behavior though ;) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83056/new/ https://reviews.llvm.org/D83056 From llvm-commits at lists.llvm.org Tue Jul 7 15:05:22 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:05:22 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: Ayal added inline comments. 
================ Comment at: llvm/lib/Analysis/IVDescriptors.cpp:812 + if (LHS->getOpcode() == Opcode && L->contains(LHS->getParent()) && + LHS->hasOneUse() && + findPathToPhi(LHS, ReductionOperations, Opcode, Phi, L)) { ---------------- dmgreen wrote: > Ayal wrote: > > dmgreen wrote: > > > Ayal wrote: > > > > dmgreen wrote: > > > > > fhahn wrote: > > > > > > Ayal wrote: > > > > > > > Looking for a chain of hasOneUse op's would be easier starting from the Phi and going downwards, until reaching LoopExitInstr? > > > > > > > > > > > > > > Note that when extended to handle reductions with conditional bumps, some ops will have more than one use. > > > > > > Instead of doing a recursive traversal, would it be simpler to just do the traversal iteratively, at least as long as we are only using at a single use chain? > > > > > Yeah, that direction makes it a lot simpler. Thanks. > > > > Is treating sub as an add reduction something in-loop reduction could support as a future extension? > > > Hmm. I don't want to say never. A normal inloop reduction looks like: > > > p = PHI(0, a) > > > l = VLDR (..) > > > a = VADDVA(p, l) > > > Where the `VADDV` is an across-vector reductions, and the extra `A` means also add p. Reducing a sub would need to become: > > > p = PHI(0, a) > > > l = VLDR (..) > > > a = VADDV(l) > > > p = SUB(p, a) > > > With the SUB as a separate scalar instruction, which would be quite slow on some hardware (getting a value over from the VADDV to the SUB). So this would almost certainly be slower than a out-of-loop reduction. > > > > > > But if we could end up using a higher vector factor for the reduction, or end up vectorizing loops that would previously not be vectorized.. that may lead to a gain overall to overcome the extra cost of adding the sub to the loop. It will require some very careful costing I think. 
And maybe the ability to create multiple vplans and cost them against one another :) > > An original sub code, say, acc -= a[i], can be treated as acc += (-a[i]). This could be in-loop reduced by first negating a[i]'s, at LV's LLVM-IR level, presumably lowered later to something like > > > > ``` > > p = PHI(0, a) > > l = VLDR (..) > > s = VSUBV (zero, l) > > a = VADDVA(p, s) > > ``` > > , right? > Yep. We would have the option to trading a scalar instruction for a vector instruction + an extra register (to hold the 0, we only have 8 registers!) > > Unfortunately both would be slower than in out-of-loop reduction unless we were vectorizing at a higher factor, though. ok, so sub's can be handled in-loop, but doing so is expected to be more costly than out-of-loop, at-least if a horizontal add operation is to be used rather than a horizontal subtract; probably worth a comment. If a reduction chain has only sub's, they could all sink - negating the sum once after the loop, using VADDVA inside. Doing so however will retain the middle block, i.e., w/o decreasing code size. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Tue Jul 7 15:05:58 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Tue, 07 Jul 2020 15:05:58 -0700 (PDT) Subject: [llvm] bdc3134 - [RuntimeDyld][test] Fix ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml after D60122 Message-ID: <5f04f1c6.1c69fb81.f1d78.42f2@mx.google.com> Author: Fangrui Song Date: 2020-07-07T15:05:50-07:00 New Revision: bdc3134e237737dd46b51cd1ecd41ecbbe9f921a URL: https://github.com/llvm/llvm-project/commit/bdc3134e237737dd46b51cd1ecd41ecbbe9f921a DIFF: https://github.com/llvm/llvm-project/commit/bdc3134e237737dd46b51cd1ecd41ecbbe9f921a.diff LOG: [RuntimeDyld][test] Fix ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml after D60122 *.yaml tests don't currently run, so we failed to notice it. 
Added: Modified: llvm/test/ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml Removed: ################################################################################ diff --git a/llvm/test/ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml b/llvm/test/ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml index ed4b243fae13..09e14c832a20 100644 --- a/llvm/test/ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml +++ b/llvm/test/ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml @@ -22,8 +22,8 @@ Sections: - Offset: 0x0000000000000000 Type: R_X86_64_NONE Symbols: - Global: - - Name: _main - Section: .text - Value: 0 - Size: 4 + - Name: _main + Section: .text + Binding: STB_GLOBAL + Value: 0 + Size: 4 From llvm-commits at lists.llvm.org Tue Jul 7 15:09:03 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:09:03 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: <71cf5fc1afd8b75c83b6def7bd084593@localhost.localdomain> efriedma added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:94 /// If it contains any dynamic allocas, returns false. static bool canTRE(Function &F) { // Because of PR962, we don't TRE dynamic allocas. ---------------- If we're not going to try to do TRE at all on calls not marked "tail", we can probably drop this check. ================ Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:808 // Until this is resolved, disable this transformation if that would ever // happen. This bug is PR962. for (Function::iterator BBI = F.begin(), E = F.end(); BBI != E; /*in loop*/) { ---------------- Can you move this FIXME into a more appropriate spot? 
================ Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:335 + II->getIntrinsicID() == Intrinsic::assume) + return true; + ---------------- avl wrote: > efriedma wrote: > > What is the new handling for lifetime.end/assume doing? > They are just skipped. In following test case: > > > ``` > call void @_Z5test5i(i32 %sub) > call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %1) #5 > call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5 > br label %return > > ``` > > they are generated in between call and ret. It is safe to ignore them while checking whether transformation is possible. It makes sense we can ignore lifetime.end on an alloca: we know the call doesn't refer to the alloca. (Maybe we should check that the pointer argument is pointing at an alloca? That should usually be true anyway, but better to be on the safe side, I guess.) I don't think it's safe to hoist assume without additional checks; I think we'd need to check that the call is marked "willreturn"? Since this is sort of tricky, I'd prefer to split this off into a followup. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 From llvm-commits at lists.llvm.org Tue Jul 7 15:12:13 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:12:13 +0000 (UTC) Subject: [PATCH] D83336: [flang] Support for image selectors In-Reply-To: References: Message-ID: <4bc73f34b2774cec3264c64fa37b1503@localhost.localdomain> PeteSteinfeld marked 2 inline comments as done. PeteSteinfeld added inline comments. ================ Comment at: flang/lib/Semantics/check-coarray.cpp:113 + context_.Say(parser::FindSourceLocation(imageSelectorSpec), // C929 + "TEAM STAT variable can only be specified once"_err_en_US); + } ---------------- tskeith wrote: > I think this should be "STAT variable..." So it shall be written. 
================ Comment at: flang/lib/Semantics/expression.cpp:1090-1100 + std::visit( + common::visitors{ + [&](const parser::ImageSelectorSpec::Stat &statVar) { + Analyze(statVar.v); + }, + [&](const parser::TeamValue &teamValue) { Analyze(teamValue.v); }, + [&](const parser::ImageSelectorSpec::Team_Number &teamNumber) { ---------------- tskeith wrote: > Because all of the cases are the same, this can be simplified to: > ``` > std::visit([&](const auto &x) { Analyze(x.v); }, imageSelSpec.u); > ``` Thanks! I thought there might be something like this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83336/new/ https://reviews.llvm.org/D83336 From llvm-commits at lists.llvm.org Tue Jul 7 15:15:29 2020 From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:15:29 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: <86b8adfe204e0f842a50bc2fc8d8494e@localhost.localdomain> reames added a comment. Required Change - Please introduce a runtime flag which controls how many values are handled via vregs. Default this value to zero. This will remove all existing test diffs; if it doesn't, you have a bug. Then introduce a new test file, optionally copied from an existing one, called statepoint-gc-regs.ll which enumerates sufficient coverage for the new feature. This is not a suggestion, it is a requirement. I specifically need to see a zero diff in the existing tests to have confidence in the changes.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Tue Jul 7 15:19:38 2020 From: llvm-commits at lists.llvm.org (Alexandre Ganea via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:19:38 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: aganea added inline comments. ================ Comment at: llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp:1072 +unsigned X86AsmBackend::getMaximumNopSize() const { + if (!STI.getFeatureBits()[X86::FeatureNOPL] && ---------------- reames wrote: > Rename this function to getMaximumProfitableNop() > > There's a difference between legality and profit here. As commented earlier, if that matters you'll have a harder task implementation wise. Any reason for not reusing `maxLongNopLength()` rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085 That function could perhaps be moved to `llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h` ? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 From llvm-commits at lists.llvm.org Tue Jul 7 15:19:59 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Tue, 07 Jul 2020 15:19:59 -0700 (PDT) Subject: [llvm] 04b85e2 - Revert "[SLP] Make sure instructions are ordered when computing spill cost." 
Message-ID: <5f04f50f.1c69fb81.ac8a0.8aa1@mx.google.com> Author: Florian Hahn Date: 2020-07-07T23:15:01+01:00 New Revision: 04b85e2bcbffcbe4ff4eb1ef3762f326525cae7c URL: https://github.com/llvm/llvm-project/commit/04b85e2bcbffcbe4ff4eb1ef3762f326525cae7c DIFF: https://github.com/llvm/llvm-project/commit/04b85e2bcbffcbe4ff4eb1ef3762f326525cae7c.diff LOG: Revert "[SLP] Make sure instructions are ordered when computing spill cost." This seems to break http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24371 This reverts commit eb46137daa92723b75d828f2db959f2061612622. Added: Modified: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-order.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp index b65832450828..5bb05c6ac3d1 100644 --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -3760,24 +3760,11 @@ int BoUpSLP::getSpillCost() const { SmallPtrSet LiveValues; Instruction *PrevInst = nullptr; - // The entries in VectorizableTree are not necessarily ordered by their - // position in basic blocks. Collect them and order them by dominance so later - // instructions are guaranteed to be visited first. For instructions in - // different basic blocks, we only scan to the beginning of the block, so - // their order does not matter, as long as all instructions in a basic block - // are grouped together. Using dominance ensures a deterministic order.
- SmallVector OrderedScalars; for (const auto &TEPtr : VectorizableTree) { Instruction *Inst = dyn_cast(TEPtr->Scalars[0]); if (!Inst) continue; - OrderedScalars.push_back(Inst); - } - llvm::stable_sort(OrderedScalars, [this](Instruction *A, Instruction *B) { - return !DT->dominates(A, B); - }); - for (Instruction *Inst : OrderedScalars) { if (!PrevInst) { PrevInst = Inst; continue; diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-order.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-order.ll index 8e0ca4b29384..9286b7ce9a69 100644 --- a/llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-order.ll +++ b/llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-order.ll @@ -13,19 +13,22 @@ define void @test(i64* %ptr, i64* noalias %res) { ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: ; CHECK-NEXT: [[CALL_I_I:%.*]] = call i32* @get_ptr() +; CHECK-NEXT: [[L_0_0:%.*]] = load i32, i32* [[CALL_I_I]], align 2 ; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 2 +; CHECK-NEXT: [[L_1_0:%.*]] = load i32, i32* [[GEP_1]], align 2 +; CHECK-NEXT: [[EXT_0_0:%.*]] = zext i32 [[L_0_0]] to i64 +; CHECK-NEXT: [[EXT_1_0:%.*]] = zext i32 [[L_1_0]] to i64 +; CHECK-NEXT: [[SUB_1:%.*]] = sub nsw i64 [[EXT_0_0]], [[EXT_1_0]] ; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 1 -; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[CALL_I_I]] to <2 x i32>* -; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 2 +; CHECK-NEXT: [[L_0_1:%.*]] = load i32, i32* [[GEP_2]], align 2 ; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 3 -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[GEP_1]] to <2 x i32>* -; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 2 -; CHECK-NEXT: [[TMP4:%.*]] = zext <2 x i32> [[TMP1]] to <2 x i64> -; CHECK-NEXT: [[TMP5:%.*]] = zext <2 x i32> [[TMP3]] to <2 x i64> -; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i64> [[TMP4]], [[TMP5]] -; CHECK-NEXT: 
[[RES_1:%.*]] = getelementptr i64, i64* [[RES:%.*]], i64 1 -; CHECK-NEXT: [[TMP7:%.*]] = bitcast i64* [[RES]] to <2 x i64>* -; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* [[TMP7]], align 8 +; CHECK-NEXT: [[L_1_1:%.*]] = load i32, i32* [[GEP_3]], align 2 +; CHECK-NEXT: [[EXT_0_1:%.*]] = zext i32 [[L_0_1]] to i64 +; CHECK-NEXT: [[EXT_1_1:%.*]] = zext i32 [[L_1_1]] to i64 +; CHECK-NEXT: [[SUB_2:%.*]] = sub nsw i64 [[EXT_0_1]], [[EXT_1_1]] +; CHECK-NEXT: store i64 [[SUB_1]], i64* [[RES:%.*]], align 8 +; CHECK-NEXT: [[RES_1:%.*]] = getelementptr i64, i64* [[RES]], i64 1 +; CHECK-NEXT: store i64 [[SUB_2]], i64* [[RES_1]], align 8 ; CHECK-NEXT: [[C:%.*]] = call i1 @cond() ; CHECK-NEXT: br i1 [[C]], label [[FOR_BODY]], label [[EXIT:%.*]] ; CHECK: exit: From llvm-commits at lists.llvm.org Tue Jul 7 15:20:15 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:20:15 +0000 (UTC) Subject: [PATCH] D83350: [test] Run llvm/test/**/*.yaml Message-ID: MaskRay created this revision. MaskRay added reviewers: grimar, jhenderson, rupprecht. Herald added subscribers: llvm-commits, cmtice. Herald added a project: LLVM. This patch extends D58439 (`llvm/test/{yaml2obj,obj2yaml}/**/*.yaml`) and runs all `llvm/test/**/*.yaml` Many directories have configured `.yaml` (see the deleted lit.local.cfg files). Yet still some don't configure .yaml and have caused stale tests: - 8c5825befb7bbb2e76f7eccedc6d3bf26e9b2a6a test/llvm-readobj - bdc3134e237737dd46b51cd1ecd41ecbbe9f921a test/ExecutionEngine Just hoist .yaml to `llvm/test/`. The number of tests running on my machine increases from 38304 to 38309. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83350 Files: llvm/test/Object/lit.local.cfg llvm/test/ObjectYAML/lit.local.cfg llvm/test/lit.cfg.py llvm/test/tools/llvm-as/lit.local.cfg llvm/test/tools/llvm-dwarfdump/lit.local.cfg llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg llvm/test/tools/llvm-nm/lit.local.cfg llvm/test/tools/llvm-objdump/lit.local.cfg llvm/test/tools/llvm-readobj/COFF/lit.local.cfg llvm/test/tools/llvm-xray/X86/lit.local.cfg llvm/test/tools/obj2yaml/lit.local.cfg llvm/test/tools/yaml2obj/lit.local.cfg Index: llvm/test/tools/yaml2obj/lit.local.cfg =================================================================== --- llvm/test/tools/yaml2obj/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/obj2yaml/lit.local.cfg =================================================================== --- llvm/test/tools/obj2yaml/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-xray/X86/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-xray/X86/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml', '.ll', '.txt'] Index: llvm/test/tools/llvm-readobj/COFF/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-readobj/COFF/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes.add('.yaml') Index: llvm/test/tools/llvm-objdump/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-objdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml', '.txt'] Index: llvm/test/tools/llvm-nm/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-nm/lit.local.cfg +++ llvm/test/tools/llvm-nm/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in 
config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', '.s', '.test', '.yaml'] Index: llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg +++ llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg +++ llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg @@ -1,4 +1,2 @@ if not ('ARM' in config.root.targets and 'AArch64' in config.root.targets): config.unsupported = True - -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-dwarfdump/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-dwarfdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] Index: llvm/test/tools/llvm-as/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-as/lit.local.cfg +++ llvm/test/tools/llvm-as/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', '.s', '.test', '.yaml'] Index: llvm/test/lit.cfg.py =================================================================== --- llvm/test/lit.cfg.py +++ llvm/test/lit.cfg.py @@ -22,7 +22,7 @@ # suffixes: A list of file extensions to treat as test files. This is overriden # by individual lit.local.cfg files in the test subdirectories. -config.suffixes = ['.ll', '.c', '.cxx', '.test', '.txt', '.s', '.mir'] +config.suffixes = ['.ll', '.c', '.cxx', '.test', '.txt', '.s', '.mir', '.yaml'] # excludes: A list of directories to exclude from the testsuite. 
The 'Inputs' # subdirectories contain auxiliary inputs for various tests in their parent Index: llvm/test/ObjectYAML/lit.local.cfg =================================================================== --- llvm/test/ObjectYAML/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml'] Index: llvm/test/Object/lit.local.cfg =================================================================== --- llvm/test/Object/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] -------------- next part -------------- A non-text attachment was scrubbed... Name: D83350.276231.patch Type: text/x-patch Size: 3904 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:21:34 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Tue, 07 Jul 2020 15:21:34 -0700 (PDT) Subject: [llvm] 91f7067 - [X86] Add back the assert in getImpliedFeatures that I removed in ef4cc70f3ed2a91e0a48c6448c517c3ba34c2846 Message-ID: <5f04f56e.1c69fb81.3210.428f@mx.google.com> Author: Craig Topper Date: 2020-07-07T15:20:59-07:00 New Revision: 91f70675cc6e5c872e0059c11d797b8726eeac67 URL: https://github.com/llvm/llvm-project/commit/91f70675cc6e5c872e0059c11d797b8726eeac67 DIFF: https://github.com/llvm/llvm-project/commit/91f70675cc6e5c872e0059c11d797b8726eeac67.diff LOG: [X86] Add back the assert in getImpliedFeatures that I removed in ef4cc70f3ed2a91e0a48c6448c517c3ba34c2846 I've added additional features to the table so I want to see if the bots are happier with this. 
Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index 91feb146baaa..9910fd615b1d 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -230,5 +230,6 @@ X86_FEATURE (RETPOLINE_INDIRECT_BRANCHES, "retpoline-indirect-branches") X86_FEATURE (RETPOLINE_INDIRECT_CALLS, "retpoline-indirect-calls") X86_FEATURE (LVI_CFI, "lvi-cfi") X86_FEATURE (LVI_LOAD_HARDENING, "lvi-load-hardening") +X86_FEATURE (SESES, "seses") #undef X86_FEATURE_COMPAT #undef X86_FEATURE diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index 261e296b9e5a..df03f63e720e 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -441,6 +441,7 @@ static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_BRANCHES = {}; static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_CALLS = {}; static constexpr FeatureBitset ImpliedFeaturesLVI_CFI = {}; static constexpr FeatureBitset ImpliedFeaturesLVI_LOAD_HARDENING = {}; +static constexpr FeatureBitset ImpliedFeaturesSESES = {}; // XSAVE features are dependent on basic XSAVE. static constexpr FeatureBitset ImpliedFeaturesXSAVEC = FeatureXSAVE; @@ -562,6 +563,7 @@ void llvm::X86::getImpliedFeatures( if (I == std::end(FeatureInfos)) { // FIXME: This shouldn't happen, but may not have all features in the table // yet. 
+ assert(false && "Feature not found in table!"); return; } From llvm-commits at lists.llvm.org Tue Jul 7 15:23:32 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:23:32 +0000 (UTC) Subject: [PATCH] D83350: [test] Run llvm/test/**/*.yaml In-Reply-To: References: Message-ID: MaskRay updated this revision to Diff 276233. MaskRay edited the summary of this revision. MaskRay added a comment. Delete .cxx (there is no such test) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83350/new/ https://reviews.llvm.org/D83350 Files: llvm/test/Object/lit.local.cfg llvm/test/ObjectYAML/lit.local.cfg llvm/test/lit.cfg.py llvm/test/tools/llvm-as/lit.local.cfg llvm/test/tools/llvm-dwarfdump/lit.local.cfg llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg llvm/test/tools/llvm-nm/lit.local.cfg llvm/test/tools/llvm-objdump/lit.local.cfg llvm/test/tools/llvm-readobj/COFF/lit.local.cfg llvm/test/tools/llvm-xray/X86/lit.local.cfg llvm/test/tools/obj2yaml/lit.local.cfg llvm/test/tools/yaml2obj/lit.local.cfg Index: llvm/test/tools/yaml2obj/lit.local.cfg =================================================================== --- llvm/test/tools/yaml2obj/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/obj2yaml/lit.local.cfg =================================================================== --- llvm/test/tools/obj2yaml/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-xray/X86/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-xray/X86/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml', '.ll', '.txt'] Index: llvm/test/tools/llvm-readobj/COFF/lit.local.cfg =================================================================== --- 
llvm/test/tools/llvm-readobj/COFF/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes.add('.yaml') Index: llvm/test/tools/llvm-objdump/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-objdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml', '.txt'] Index: llvm/test/tools/llvm-nm/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-nm/lit.local.cfg +++ llvm/test/tools/llvm-nm/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', '.s', '.test', '.yaml'] Index: llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg +++ llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg +++ llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg @@ -1,4 +1,2 @@ if not ('ARM' in config.root.targets and 'AArch64' in config.root.targets): config.unsupported = True - -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-dwarfdump/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-dwarfdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] Index: llvm/test/tools/llvm-as/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-as/lit.local.cfg +++ llvm/test/tools/llvm-as/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', 
'.s', '.test', '.yaml'] Index: llvm/test/lit.cfg.py =================================================================== --- llvm/test/lit.cfg.py +++ llvm/test/lit.cfg.py @@ -22,7 +22,7 @@ # suffixes: A list of file extensions to treat as test files. This is overriden # by individual lit.local.cfg files in the test subdirectories. -config.suffixes = ['.ll', '.c', '.cxx', '.test', '.txt', '.s', '.mir'] +config.suffixes = ['.ll', '.c', '.test', '.txt', '.s', '.mir', '.yaml'] # excludes: A list of directories to exclude from the testsuite. The 'Inputs' # subdirectories contain auxiliary inputs for various tests in their parent Index: llvm/test/ObjectYAML/lit.local.cfg =================================================================== --- llvm/test/ObjectYAML/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml'] Index: llvm/test/Object/lit.local.cfg =================================================================== --- llvm/test/Object/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] -------------- next part -------------- A non-text attachment was scrubbed... Name: D83350.276233.patch Type: text/x-patch Size: 3896 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:26:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:26:27 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes Message-ID: lebedev.ri created this revision. lebedev.ri added reviewers: nickdesaulniers, dblaikie, diegotf, george.burgess.iv, jdoerfert, Tyker. lebedev.ri added a project: LLVM. Herald added a subscriber: mgorny. This handles all three places where attributes could currently be - `GlobalVariable`, `Function` and `CallBase`. 
For the last two, it correctly handles all three possible attribute locations (return value, arguments, and the function itself). Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83351 Files: llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83351.276234.patch Type: text/x-patch Size: 13521 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:29:00 2020 From: llvm-commits at lists.llvm.org (Jian Cai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:29:00 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: jcai19 updated this revision to Diff 276235. jcai19 added a comment. Fixed an assertion message. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 Files: llvm/include/llvm/MC/MCAsmBackend.h llvm/include/llvm/MC/MCFragment.h llvm/include/llvm/MC/MCObjectStreamer.h llvm/include/llvm/MC/MCStreamer.h llvm/lib/MC/MCAssembler.cpp llvm/lib/MC/MCFragment.cpp llvm/lib/MC/MCObjectStreamer.cpp llvm/lib/MC/MCStreamer.cpp llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp llvm/test/MC/X86/x86-directive-nops-errors.s llvm/test/MC/X86/x86-directive-nops.s llvm/test/MC/X86/x86_64-directive-nops.s -------------- next part -------------- A non-text attachment was scrubbed...
Name: D82826.276235.patch Type: text/x-patch Size: 13946 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:30:16 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Tue, 7 Jul 2020 23:30:16 +0100 Subject: buildbot failure in LLVM on llvm-clang-x86_64-expensive-checks-win In-Reply-To: References: <20200703181025.3DDD210A03F7@lab.llvm.org> Message-ID: Thanks for letting me know. I reverted the patch in 04b85e2bcbffcbe4ff4eb1ef3762f326525cae7c for now. The real failure must have slipped through, as I was swamped with failure emails unrelated to my patch on Friday. Cheers Florian -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:30:53 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:30:53 +0000 (UTC) Subject: [PATCH] D83287: [NFCI][llvm-reduce] Cleanup Delta passes to use Oracle abstraction In-Reply-To: References: Message-ID: <4f2679293b4bb560d24baa098df018d3@localhost.localdomain> lebedev.ri updated this revision to Diff 276237. lebedev.ri marked 2 inline comments as done. lebedev.ri added a comment. @nickdesaulniers thank you for taking a look! Addressed review comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83287/new/ https://reviews.llvm.org/D83287 Files: llvm/tools/llvm-reduce/deltas/Delta.h llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83287.276237.patch Type: text/x-patch Size: 11591 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:33:13 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:33:13 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <35801aae2ec968b95694ed6e3ba9ea00@localhost.localdomain> arsenm added a comment. One of the problems with bugpoint's attribute handling is it burns a lot of time trying to reduce attributes on intrinsics, which cannot actually be removed since getDeclaration will just put them back on ================ Comment at: llvm/tools/llvm-reduce/DeltaManager.h:36 reduceOperandBundesDeltaPass(Tester); + reduceAttributesDeltaPass(Tester); // TODO: Implement the remaining Delta Passes ---------------- Doing this last is an improvement over bugpoint's attempt to do this first. I don't think removing attributes is actually a great reduction strategy. For most of the hard to reduce testcases I debug, removing attributes is entirely pointless (and adding them is more helpful). I think this needs a flag to disable it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Tue Jul 7 15:33:49 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:33:49 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: <6b05c4fbb1df4887aef34e4630e8b6ff@localhost.localdomain> craig.topper added inline comments. 
================ Comment at: llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp:1072 +unsigned X86AsmBackend::getMaximumNopSize() const { + if (!STI.getFeatureBits()[X86::FeatureNOPL] && ---------------- aganea wrote: > reames wrote: > > Rename this function to getMaximumProfitableNop() > > > > There's a difference between legality and profit here. As commented earlier, if that matters you'll have a harder task implementation wise. > Any reason for not reusing `maxLongNopLength()` rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085 > That function could perhaps be moved to `llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h` ? > That function can't be moved as is. It uses X86Subtarget, which isn't available to MC. It does something different for 32-bit mode than what is currently in this patch, as that causes additional test failures as discussed elsewhere in this review. That function also uses ProcIntelSLM instead of Feature7ByteNOP. And setting the FeatureFast flags assumes FeatureNOPL is set, which is backwards from how it should be. I think the function here is closer to how it should be except for the 32-bit difference. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 From llvm-commits at lists.llvm.org Tue Jul 7 15:34:11 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:34:11 +0000 (UTC) Subject: [PATCH] D83352: [flang] Fix CHARACTER length folding problem Message-ID: klausler created this revision. klausler added reviewers: tskeith, sscalpone. klausler added a project: Flang. Herald added a reviewer: jdoerfert. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Herald added a subscriber: llvm-commits.
Do not rewrite LEN(x) or x%len to the expression that specifies the length of x when that length is not a constant expression. Its value may have changed since the value of the expression was first captured in the definition of the object. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83352 Files: flang/include/flang/Evaluate/type.h flang/lib/Evaluate/fold-integer.cpp flang/lib/Evaluate/shape.cpp flang/lib/Evaluate/type.cpp flang/lib/Evaluate/variable.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83352.276239.patch Type: text/x-patch Size: 5587 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:36:26 2020 From: llvm-commits at lists.llvm.org (Jian Cai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:36:26 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: <1fe96fd226e0e5f3aa6b32e4ee5e4645@localhost.localdomain> jcai19 marked an inline comment as done. jcai19 added inline comments. ================ Comment at: llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp:1072 +unsigned X86AsmBackend::getMaximumNopSize() const { + if (!STI.getFeatureBits()[X86::FeatureNOPL] && ---------------- craig.topper wrote: > aganea wrote: > > reames wrote: > > > Rename this function to getMaximumProfitableNop() > > > > > > There's a difference between legality and profit here. As commented earlier, if that matters you'll have a harder task implementation wise. > > Any reason for not reusing `maxLongNopLength()` rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085 > > That function could perhaps be moved to `llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h` ? > > > That function can't be moved as is. It uses X86Subtarget which isn't available to MC. 
It does something different than for 32-bit mode than what is currently in this patch as that causes additional test failures as discussed elsewhere in this review. That function also uses ProcIntelSLM instead of Feature7ByteNOP. And the FeatureFast flags being set assumes FeatureNOPL is set which is backwards of how it should be. > > I think the function here is closer to how it should be except for the 32-bit difference. > Any reason for not reusing maxLongNopLength() rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085 Yes I'm all for merging these two functions although there are some differences on both 32-bit and 64-bit mode that would break some unit tests, such as https://reviews.llvm.org/D82826?id=275853 on 64-bit mode. Maybe we can address that in a separate patch as previously discussed. > That function could perhaps be moved to llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h ? SG. Will start to work on merging them once this patch is checked in. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 From llvm-commits at lists.llvm.org Tue Jul 7 15:37:36 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:37:36 +0000 (UTC) Subject: [PATCH] D83124: [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll In-Reply-To: References: Message-ID: efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83124/new/ https://reviews.llvm.org/D83124 From llvm-commits at lists.llvm.org Tue Jul 7 15:44:10 2020 From: llvm-commits at lists.llvm.org (Jian Cai via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:44:10 +0000 (UTC) Subject: [PATCH] D82826: [X86] support .nops directive In-Reply-To: References: Message-ID: jcai19 marked an inline comment as done. jcai19 added inline comments. ================ Comment at: llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp:1072 +unsigned X86AsmBackend::getMaximumNopSize() const { + if (!STI.getFeatureBits()[X86::FeatureNOPL] && ---------------- jcai19 wrote: > craig.topper wrote: > > aganea wrote: > > > reames wrote: > > > > Rename this function to getMaximumProfitableNop() > > > > > > > > There's a difference between legality and profit here. As commented earlier, if that matters you'll have a harder task implementation wise. > > > Any reason for not reusing `maxLongNopLength()` rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085 > > > That function could perhaps be moved to `llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h` ? > > > > > That function can't be moved as is. It uses X86Subtarget which isn't available to MC. It does something different than for 32-bit mode than what is currently in this patch as that causes additional test failures as discussed elsewhere in this review. That function also uses ProcIntelSLM instead of Feature7ByteNOP. And the FeatureFast flags being set assumes FeatureNOPL is set which is backwards of how it should be. > > > > I think the function here is closer to how it should be except for the 32-bit difference. > > Any reason for not reusing maxLongNopLength() rather than rewriting the same thing here? 
https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085 > > Yes I'm all for merging these two functions although there are some differences on both 32-bit and 64-bit mode that would break some unit tests, such as https://reviews.llvm.org/D82826?id=275853 on 64-bit mode. Maybe we can address that in a separate patch as previously discussed. > > > That function could perhaps be moved to llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h ? > > SG. Will start to work on merging them once this patch is checked in. @craig.topper Okay I missed your comment. Thanks for the clarification. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82826/new/ https://reviews.llvm.org/D82826 From llvm-commits at lists.llvm.org Tue Jul 7 15:49:26 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:49:26 +0000 (UTC) Subject: [PATCH] D82262: [RISCV] Optimize addition with an immediate In-Reply-To: References: Message-ID: <20f9b8ec5f8fcaff2e012cecede23a24@localhost.localdomain> MaskRay added inline comments. ================ Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:80 + case ISD::ADD: { + // The second operand must be an immediate. + if (auto *ConstOp = dyn_cast(Node->getOperand(1))) { ---------------- The two comments can be combined and placed under `case ISD::ADD:` Optimize (add r, imm) to (addi (addi r, imm0) imm1) if applicable. ================ Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:83 + // The immediate operand must not have other use. + if (!ConstOp->hasOneUse()) + break; ---------------- `if (!ConstOp->hasOneUse())` is not tested. 
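As a worked illustration of the `(add r, imm)` to `(addi (addi r, imm0) imm1)` rewrite discussed in this review, here is a minimal standalone sketch. It assumes the immediate is split as `Imm - Imm / 2` and `Imm / 2`, as in the patch excerpt, and that each half must fit the 12-bit signed immediate of ADDI. The helper names `isSImm12` and `splitAddImm` are hypothetical, and the exact range check used in D82262 may differ.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// True if V fits a 12-bit signed immediate, the width ADDI accepts.
constexpr bool isSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Splits Imm into two ADDI immediates (Imm - Imm / 2, Imm / 2).
// Returns nullopt if Imm already fits one ADDI (no split needed) or if
// either half would not fit a 12-bit signed immediate.
std::optional<std::pair<int64_t, int64_t>> splitAddImm(int64_t Imm) {
  if (isSImm12(Imm))
    return std::nullopt;
  const int64_t Imm0 = Imm - Imm / 2; // integer division truncates toward 0
  const int64_t Imm1 = Imm / 2;
  if (!isSImm12(Imm0) || !isSImm12(Imm1))
    return std::nullopt;
  return std::make_pair(Imm0, Imm1);
}
```

Under these assumptions, the split succeeds for immediates in [-4096, -2049] and [2048, 4094]; for example, 4094 becomes 2047 + 2047, while 4095 would need a first half of 2048 and is rejected.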
================ Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:92 + EVT VT = Node->getValueType(0); + SDValue ImmOp0 = CurDAG->getTargetConstant(Imm - Imm / 2, DL, VT); + SDValue ImmOp1 = CurDAG->getTargetConstant(Imm / 2, DL, VT); ---------------- Add const if applicable. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82262/new/ https://reviews.llvm.org/D82262 From llvm-commits at lists.llvm.org Tue Jul 7 15:52:23 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:52:23 +0000 (UTC) Subject: [PATCH] D62627: [NFC] Do not run CGProfilePass when not using integrated assembler In-Reply-To: References: Message-ID: <46522c6fda8f01edf391cbb2b43386b4@localhost.localdomain> MaskRay added a comment. I do not see a strong argument for `-enable-npm-call-graph-profile`. We just need `Opts.CallGraphProfile = !Opts.DisableIntegratedAS` The option does not appear to be very useful to me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D62627/new/ https://reviews.llvm.org/D62627 From llvm-commits at lists.llvm.org Tue Jul 7 15:53:31 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:53:31 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <9f9e1c55f7ee1c605295d99a2b8cafe6@localhost.localdomain> NeHuang updated this revision to Diff 276241. NeHuang marked 8 inline comments as done. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Thunks.cpp lld/test/ELF/Inputs/ppc64-extern-callee-hidden.s lld/test/ELF/Inputs/ppc64-extern-callee.s lld/test/ELF/ppc64-pcrel-call-to-pcrel-callee-hidden.s lld/test/ELF/ppc64-pcrel-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82816.276241.patch Type: text/x-patch Size: 11797 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 15:54:05 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:54:05 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <7324b14adc6ae7f53790517303ea9ce4@localhost.localdomain> nickdesaulniers added a comment. I think it would be good in the description+commit message to reference https://reviews.llvm.org/D73853 (and that it was reverted). Does llvm-reduce still have "interestingness tests" in `llvm/test/Reduce/Inputs/`? If so, shouldn't this change add such a test? Thanks for the patch! ================ Comment at: llvm/tools/llvm-reduce/DeltaManager.h:36 reduceOperandBundesDeltaPass(Tester); + reduceAttributesDeltaPass(Tester); // TODO: Implement the remaining Delta Passes ---------------- arsenm wrote: > Doing this last is an improvement over bugpoint's attempt to do this first. > > I don't think removing attributes is actually a great reduction strategy. For most of the hard to reduce testcases I debug, removing attributes is entirely pointless (and adding them is more helpful). I think this needs a flag to disable it. Counterpoint, I find removing attributes very helpful in reducing the amount of noise in reduced test cases and have had bugs when I needed to figure out which attribute was the source of differences in codegen. 
I don't mind a flag (I don't think it's necessary, but doesn't hurt); but I'd prefer it to be default on so you can opt-out if you don't want to reduce attributes. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:146-148 + for (const auto &I : zip(Res, AttributeSets)) { + std::pair &NewSet = std::get<0>(I); + const AttrPtrIdxVecVecTy &V = std::get<1>(I); ---------------- does `zip` actually simplify this sequence? Looks kind of complicated. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:177 + I.first->setAttributes(convertAttributeRefVecToAttributeList(C, I.second)); + }); +} ---------------- I wish all these members of `AttributeRemapper` were `private` and these three loops maybe hidden in a method of `AttributeRemapper`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Tue Jul 7 15:55:37 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 22:55:37 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <01de8af836e1f3e632e7fa3a100d34f6@localhost.localdomain> MaskRay added a comment. I still haven't seen a strong argument keeping a command line option `-enable-npm-call-graph-profile`. Asked in D62627 . `Opts.getProfileUse() != CodeGenOptions::ProfileNone ` in Opts.CallGraphProfile = Opts.getProfileUse() != CodeGenOptions::ProfileNone && !Opts.DisableIntegratedAS; is redundant. CGProfile.cpp is a no-op if no function provides `getEntryFreq()`. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Tue Jul 7 16:01:40 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:01:40 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <8876784306c0033294a8aa439070b35c@localhost.localdomain> lebedev.ri added a comment. In D83351#2137603 , @nickdesaulniers wrote: > I think it would be good in the description+commit message to reference https://reviews.llvm.org/D73853 (and that it was reverted). > > Does llvm-reduce still have "interestingness tests" in `llvm/test/Reduce/Inputs/`? If so, shouldn't this change add such a test? *cough* `llvm/test/Reduce/*.ll` There are (no more new!) `llvm/test/Reduce/Inputs/*.ll` files because they are pointless. As you can see we can just as easily use FileCheck to drive interestingness tests. > Thanks for the patch! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Tue Jul 7 16:02:24 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:02:24 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: MaskRay added inline comments. ================ Comment at: lld/ELF/Thunks.cpp:843 + if (!isInt<26>(offset)) + fatal("R2 save stub branch offset is too large."); + write32(buf + 0, 0xf8410018); // std r2,24(r1) ---------------- Drop trailing period. 
http://llvm.org/docs/CodingStandards.html#error-and-warning-messages `"R2 save stub branch offset is too large: " + Twine(offset)` ================ Comment at: lld/test/ELF/ppc64-error-toc-local-call.s:15 + .localentry callee, 1 + addi 3, 3, 5 # 0x0 + extsw 3, 3 ---------------- Delete `addi 3, 3, 5` and extsw. They are irrelevant. ================ Comment at: lld/test/ELF/ppc64-error-toc-local-call.s:25 + .localentry caller, .Lfunc_lep1-.Lfunc_gep1 + mr 30, 3 + bl callee # 0x18 ---------------- Delete irrelevant mr and add Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 From llvm-commits at lists.llvm.org Tue Jul 7 16:02:41 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:02:41 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: NeHuang marked 5 inline comments as done and an inline comment as not done. NeHuang added a comment. Thanks @MaskRay. - Inserted `fatal()` for unsupported features. - Cleaned up test to keep necessary instructions and combine cases. - Fixed some nits in comment. ================ Comment at: lld/ELF/Arch/PPC64.cpp:1043 + // FIXME: Remove the assertions once the call protocols are supported. + assert(!(type == R_PPC64_REL24_NOTOC && (s.stOther >> 5) > 1) && + "Unsupported protocol: RelType is R_PPC64_REL24_NOTOC and the callee " ---------------- MaskRay wrote: > Note: assert should only be used for logically unreachable code, i.e. if the implementation is not buggy, the negative code path should not trigger. > > You can use `fatal(...)` for unimplemented features. Please use all lowercase messages. Added `fatal()`. I am using lower cases for the messages except the relocation name. 
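Editorial sketch of the assert-versus-fatal distinction raised in these two reviews. It is not lld code: `fatalSketch` is a hypothetical stand-in that throws so the example can be exercised (lld's real `fatal()` reports the error and exits), `fitsSigned26` is a local range check playing the role of `isInt<26>`, and `checkR2SaveStub` is an invented name. The message text follows the form suggested in the review: no trailing period, with the offending value appended.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a fatal-error reporter. Throwing keeps the
// sketch testable; a real linker would print the message and exit.
[[noreturn]] static void fatalSketch(const std::string &Msg) {
  throw std::runtime_error(Msg);
}

// True if X is representable as a 26-bit signed integer.
static bool fitsSigned26(int64_t X) {
  return X >= -(INT64_C(1) << 25) && X < (INT64_C(1) << 25);
}

// Hypothetical check performed before writing a stub into Buf.
static void checkR2SaveStub(const uint8_t *Buf, int64_t Offset) {
  // Internal invariant: callers always pass a buffer. If this fires the
  // tool itself is buggy, so an assert is the right mechanism.
  assert(Buf && "stub buffer must not be null");
  (void)Buf;

  // Reachable from user input (a large enough layout), so this has to be a
  // diagnostic that survives release builds, not an assert.
  if (!fitsSigned26(Offset))
    fatalSketch("R2 save stub branch offset is too large: " +
                std::to_string(Offset));
}
```

The same split applies to the unsupported-protocol case: a combination of relocation type and `st_other` bits that the linker does not implement yet can come from a valid input file, so it is reported rather than asserted.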
================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel-callee-hidden.s:111 + mr 30, 5 + bl callee2_stother1_local at notoc + add 3, 3, 30 ---------------- MaskRay wrote: > IIUC, you can merge caller2 and caller2_tailcall and delete all instructions except: > > ``` > .localentry caller2_tailcall, 1 > bl callee2_stother1_local at notoc > b callee2_stother1_local at notoc > ``` Good point. Merged them in all the tests. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:42 + +# CHECK-LABEL: caller1 +# CHECK: 10010018: bl 0x10010000 ---------------- MaskRay wrote: > `caller1` is not a good FileCheck label. > > `:` is. It is unique in the llvm-objdump output. > > Please fix all the occurrences. Fixed all the occurrences. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:52 +callee1_stother0_local: + mullw 3, 3, 3 + extsw 3, 3 ---------------- MaskRay wrote: > I think these instructions are entirely irrelevant to the test and should be deleted to make tests more focused/readable. > > mullw 3, 3, 3 > extsw 3, 3 > > Please check some newer x86-64-* and aarch64-* tests. They don't have such setup instructions. > > But please keep the instruction after `bl ... at notoc` to make it clear that the next instruction is not special as in R_PPC64_REL24 > > I think cleaning up the instructions can make the test smaller by at least one half. Thanks for the advice. Kept only the necessary instructions to make the test smaller. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:153 +.section .text_extern_stother0, "ax", %progbits +caller3: + .localentry caller3, 1 ---------------- MaskRay wrote: > ppc64-pcrel-call-to-pcrel-callee-hidden.s changes the symbol bindings to STB_GLOBAL. > Do these symbols need `.globl`? Good catch. They are supposed to be `.local` by default. Updated `ppc64-pcrel-call-to-pcrel-callee-hidden.s` to make them consistent.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Tue Jul 7 16:02:54 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:02:54 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: JonChesterfield added a comment. __kmpc_spmd_kernel_init is always called with RequiresDataSharing == 0 Specifically, it's only called from clang, and emitSPMDEntryHeader unconditionally passes zero to it I.e. I think there's more stuff that can be cleaned up in the theme of the above, suggest in later patches Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 From llvm-commits at lists.llvm.org Tue Jul 7 16:04:03 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:04:03 +0000 (UTC) Subject: [PATCH] D79870: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions In-Reply-To: References: Message-ID: <4a08e5875ef3269887abbe35e849dadb@localhost.localdomain> PaoloS updated this revision to Diff 276249. PaoloS marked 2 inline comments as done. PaoloS added a comment. Added missing pattern-matching for *w instructions. Added codegen tests. Added ComplexPattern instances that are crucial to pattern-match SLOI, SROI, SLOIW, SROIW and SLLIUW. Both 32 and 64 bit test files have both 32 and 64 bit test cases of the instructions (were existing). 
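Editorial sketch of why the 64-bit and *w forms need separate patterns, using SLOI and SLOIW as the example. The semantics below are an assumption based on the draft Bitmanip specification (shift left, filling the vacated bits with ones; the *w form works on the low 32 bits and sign-extends the result), not something taken from this patch, and `sloi64` and `sloiw` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed SLOI semantics on RV64: shift left by Shamt, shifting in ones.
static uint64_t sloi64(uint64_t Rs1, unsigned Shamt) {
  assert(Shamt < 64 && "RV64 shift amounts are 6 bits wide");
  return ~(~Rs1 << Shamt);
}

// Assumed SLOIW semantics: the same operation on the low 32 bits of Rs1,
// with the 32-bit result sign-extended to 64 bits.
static uint64_t sloiw(uint64_t Rs1, unsigned Shamt) {
  assert(Shamt < 32 && "*w shift amounts are 5 bits wide");
  const uint32_t Lo = static_cast<uint32_t>(Rs1);
  const uint32_t Res = ~(~Lo << Shamt);
  return static_cast<uint64_t>(
      static_cast<int64_t>(static_cast<int32_t>(Res)));
}
```

With the input 0x7FFFFFFF and a shift of 1, the two helpers disagree in the upper half of the register: `sloi64` yields 0x00000000FFFFFFFF while `sloiw` yields 0xFFFFFFFFFFFFFFFF. Selecting the 64-bit instruction for i32 IR on RV64 would therefore not be equivalent unless the upper bits are known not to matter.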
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79870/new/ https://reviews.llvm.org/D79870 Files: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVInstrInfoB.td llvm/test/CodeGen/RISCV/rv32Zbb.ll llvm/test/CodeGen/RISCV/rv64Zbb.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79870.276249.patch Type: text/x-patch Size: 66863 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:04:37 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:04:37 +0000 (UTC) Subject: [PATCH] D83354: [Preallocated] Add @llvm.call.preallocated.teardown Message-ID: aeubanks created this revision. aeubanks added reviewers: efriedma, hans. Herald added subscribers: llvm-commits, jdoerfert, hiraditya. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. This cleans up the stack allocated by a @llvm.call.preallocated.setup. Should either call the teardown or the preallocated call to clean up the stack. Calling both is UB. Add LangRef. Add verifier check that the token argument is a @llvm.call.preallocated.setup. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83354 Files: llvm/docs/LangRef.rst llvm/include/llvm/IR/Intrinsics.td llvm/lib/IR/Verifier.cpp llvm/test/Verifier/preallocated-invalid.ll llvm/test/Verifier/preallocated-valid.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83354.276251.patch Type: text/x-patch Size: 6934 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:05:01 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:05:01 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <51783f5ea66faf3176573f12aab8d31a@localhost.localdomain> nickdesaulniers added a comment. In D83351#2137610 , @lebedev.ri wrote: > In D83351#2137603 , @nickdesaulniers wrote: > > > I think it would be good in the description+commit message to reference https://reviews.llvm.org/D73853 (and that it was reverted). > > > > Does llvm-reduce still have "interestingness tests" in `llvm/test/Reduce/Inputs/`? If so, shouldn't this change add such a test? > > > *cough* `llvm/test/Reduce/*.ll` > There are (no more new!) `llvm/test/Reduce/Inputs/*` files because they are pointless. > As you can see we can just as easily use FileCheck to drive interestingness tests. Might be worth proving it if you're bored and doing other janitorial work in the area. Having homogeneous tests is nice, since we'd rather future travelers use `FileCheck` for BOTH interesting-ness and functional testing of their reduction pass itself. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Tue Jul 7 16:07:38 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:07:38 +0000 (UTC) Subject: [PATCH] D56387: [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE In-Reply-To: References: Message-ID: <468ea67970fbf1403b36a963734044d3@localhost.localdomain> efriedma added a comment. 
The pattern in question comes out of https://github.com/llvm/llvm-project/blob/0fa0cf8638b0777a1a44feebf78a63865e48ecf6/llvm/lib/Target/ARM/ARMInstrNEON.td#L3100 , and it traces out to https://github.com/llvm/llvm-project/blob/0fa0cf8638b0777a1a44feebf78a63865e48ecf6/llvm/lib/Target/ARM/ARMInstrNEON.td#L4216 . Probably we want to do what the Hexagon backend does: `def asext: PatFrags<(ops node:$Rs), [(sext node:$Rs), (anyext node:$Rs)]>;`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D56387/new/ https://reviews.llvm.org/D56387 From llvm-commits at lists.llvm.org Tue Jul 7 16:08:02 2020 From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:08:02 +0000 (UTC) Subject: [PATCH] D83355: [flang] upstream intrinsic call lowering Message-ID: schweitz created this revision. schweitz added reviewers: jeanPerier, sscalpone, vjayathirtha-nv, kiranchandramohan, clementval. schweitz added a project: Flang. Herald added subscribers: llvm-commits, mgorny. Herald added a reviewer: jdoerfert. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. This module implements the lowering of Fortran intrinsics to the corresponding calls in support libraries (the Fortran runtime, math libraries, etc.) This revision adds lowering for a fair number of Fortran intrinsics, of which there are many. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83355 Files: flang/include/flang/Lower/CharacterExpr.h flang/include/flang/Lower/IntrinsicCall.h flang/include/flang/Lower/Mangler.h flang/include/flang/Optimizer/Dialect/FIRType.h flang/lib/Lower/CMakeLists.txt flang/lib/Lower/CharacterExpr.cpp flang/lib/Lower/IntrinsicCall.cpp flang/lib/Lower/Mangler.cpp flang/lib/Optimizer/Dialect/FIRType.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83355.276250.patch Type: text/x-patch Size: 81345 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:11:47 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:11:47 +0000 (UTC) Subject: [PATCH] D81378: GlobalISel: Handle more cases in getGCDType In-Reply-To: References: Message-ID: <1dd09c09b34b61b9bfb656588a99be00@localhost.localdomain> aemerson added inline comments. ================ Comment at: llvm/lib/CodeGen/GlobalISel/Utils.cpp:551 + + if (OrigTy.isVector()) { + LLT OrigElt = OrigTy.getElementType(); ---------------- Can we reorganize this to not have so much nesting? Maybe duplicate `greatestCommonDivisor(OrigSize, TargetSize);` above this for the scalar case and early exit. ================ Comment at: llvm/lib/CodeGen/GlobalISel/Utils.cpp:560 + } + } else { + // If the source is a vector of pointers, return a pointer element. ---------------- Invert this so the else early exits? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81378/new/ https://reviews.llvm.org/D81378 From llvm-commits at lists.llvm.org Tue Jul 7 16:12:11 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:12:11 +0000 (UTC) Subject: [PATCH] D83354: [Preallocated] Add @llvm.call.preallocated.teardown In-Reply-To: References: Message-ID: <83879a9f903a33e2552b96e69fd7c058@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83354/new/ https://reviews.llvm.org/D83354 From llvm-commits at lists.llvm.org Tue Jul 7 16:12:51 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:12:51 +0000 (UTC) Subject: [PATCH] D79870: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions In-Reply-To: References: Message-ID: <0492ee2ed9c8307b6748e2314f718ef5@localhost.localdomain> PaoloS added a comment. Just a clarification. I decided to split the tests into 32bit and 64bit because the 32bit code compiled on RV64 commonly produces sign-extended IR and that's when many *w instructions are selected. A version of the tests in a unique file could imply on one hand to have 32 bit IR with sign-extension compiled for RV32 (harmless but redundant), on the other hand we would have i32 code with no explicit sign-extension compiled for RV32. That is correct but it might lead to misleading selections, like pattern-matching the IR code of a 32bit SLOI on RV64 with a RV64 SLOI instead of a SLOIW (the difference is that SLOIW ignores the upper 32 bit of the result while RV64 doesn't). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79870/new/ https://reviews.llvm.org/D79870 From llvm-commits at lists.llvm.org Tue Jul 7 16:14:50 2020 From: llvm-commits at lists.llvm.org (Michael Spencer via llvm-commits) Date: Tue, 07 Jul 2020 16:14:50 -0700 (PDT) Subject: [llvm] 64788d7 - [clang] Include missing LangOpts in `getModuleHash`. Message-ID: <5f0501ea.1c69fb81.7d473.15c0@mx.google.com> Author: Michael Spencer Date: 2020-07-07T17:13:23-06:00 New Revision: 64788d7d5377345af5e3080d26cb6a76c324ab5b URL: https://github.com/llvm/llvm-project/commit/64788d7d5377345af5e3080d26cb6a76c324ab5b DIFF: https://github.com/llvm/llvm-project/commit/64788d7d5377345af5e3080d26cb6a76c324ab5b.diff LOG: [clang] Include missing LangOpts in `getModuleHash`. 
`ObjCRuntime` and `CommentOpts.BlockCommandNames` are checked by `ASTReader::checkLanguageOptions`, but are not part of the module context hash. This can lead to errors when using implicit modules if different TUs have different values for these options when using the same module cache. This was not hit very often due to the rare usage of `-fblock-command-names=` and that `ObjCRuntime` is by default set by the target triple, which is part of the existing context hash. Added: Modified: clang/include/clang/Basic/ObjCRuntime.h clang/lib/Frontend/CompilerInvocation.cpp clang/test/Modules/context-hash.c llvm/include/llvm/Support/VersionTuple.h Removed: ################################################################################ diff --git a/clang/include/clang/Basic/ObjCRuntime.h b/clang/include/clang/Basic/ObjCRuntime.h index 1c4a69269dee..26403bfa98c9 100644 --- a/clang/include/clang/Basic/ObjCRuntime.h +++ b/clang/include/clang/Basic/ObjCRuntime.h @@ -476,6 +476,10 @@ class ObjCRuntime { friend bool operator!=(const ObjCRuntime &left, const ObjCRuntime &right) { return !(left == right); } + + friend llvm::hash_code hash_value(const ObjCRuntime &OCR) { + return llvm::hash_combine(OCR.getKind(), OCR.getVersion()); + } }; raw_ostream &operator<<(raw_ostream &out, const ObjCRuntime &value); diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp index f58854cd9e08..6f6af917e3a3 100644 --- a/clang/lib/Frontend/CompilerInvocation.cpp +++ b/clang/lib/Frontend/CompilerInvocation.cpp @@ -3852,6 +3852,10 @@ std::string CompilerInvocation::getModuleHash() const { for (StringRef Feature : LangOpts->ModuleFeatures) code = hash_combine(code, Feature); + code = hash_combine(code, LangOpts->ObjCRuntime); + const auto &BCN = LangOpts->CommentOpts.BlockCommandNames; + code = hash_combine(code, hash_combine_range(BCN.begin(), BCN.end())); + // Extend the signature with the target options. 
code = hash_combine(code, TargetOpts->Triple, TargetOpts->CPU, TargetOpts->ABI); diff --git a/clang/test/Modules/context-hash.c b/clang/test/Modules/context-hash.c index 33dfb2f15a2c..8bb7422f6a54 100644 --- a/clang/test/Modules/context-hash.c +++ b/clang/test/Modules/context-hash.c @@ -1,3 +1,6 @@ +// This test verifies that only strict hashing includes search paths and +// diagnostics in the module context hash. + // RUN: rm -rf %t // RUN: %clang_cc1 -fsyntax-only -internal-isystem \ // RUN: %S/Inputs/System/usr/include -fmodules -fimplicit-module-maps \ @@ -20,8 +23,25 @@ // RUN: echo %t > %t.path // RUN: cat %t.path %t1 %t2 %t3 %t4 | FileCheck %s -// This test verifies that only strict hashing includes search paths and -// diagnostics in the module context hash. +// This tests things verified by ASTReader::checkLanguageOptions that are not +// part of LangOpts.def. + +// RUN: rm -rf %t +// RUN: %clang_cc1 -fsyntax-only -internal-isystem \ +// RUN: %S/Inputs/System/usr/include -fmodules -fimplicit-module-maps \ +// RUN: -fmodules-cache-path=%t -x objective-c %s -Rmodule-build 2> %t1 +// RUN: rm -rf %t +// RUN: %clang_cc1 -fsyntax-only -internal-isystem \ +// RUN: %S/Inputs/System/usr/include -fmodules -fimplicit-module-maps \ +// RUN: -fobjc-runtime=macosx-1.0.0.0 \ +// RUN: -fmodules-cache-path=%t -x objective-c %s -Rmodule-build 2> %t2 +// RUN: rm -rf %t +// RUN: %clang_cc1 -fsyntax-only -internal-isystem \ +// RUN: %S/Inputs/System/usr/include -fmodules -fimplicit-module-maps \ +// RUN: -fcomment-block-commands=lp,bj \ +// RUN: -fmodules-cache-path=%t -x objective-c %s -Rmodule-build 2> %t3 +// RUN: echo %t > %t.path +// RUN: cat %t.path %t1 %t2 %t3 | FileCheck --check-prefix=LANGOPTS %s #include @@ -32,3 +52,10 @@ // CHECK: cstd-[[AST_HASH]].pcm' // CHECK-NOT: building module 'cstd' as '{{.*[/\\]}}[[CONTEXT_HASH]]{{[/\\]}} // CHECK: cstd-[[AST_HASH]].pcm' + +// LANGOPTS: [[PREFIX:(.*[/\\])+[a-zA-Z0-9.-]+]] +// LANGOPTS: building module 'cstd' as 
'[[PREFIX]]{{[/\\]}}[[CONTEXT_HASH:[A-Z0-9]+]]{{[/\\]}}cstd-[[AST_HASH:[A-Z0-9]+]].pcm' +// LANGOPTS-NOT: building module 'cstd' as '{{.*[/\\]}}[[CONTEXT_HASH]]{{[/\\]}} +// LANGOPTS: cstd-[[AST_HASH]].pcm' +// LANGOPTS-NOT: building module 'cstd' as '{{.*[/\\]}}[[CONTEXT_HASH]]{{[/\\]}} +// LANGOPTS: cstd-[[AST_HASH]].pcm' diff --git a/llvm/include/llvm/Support/VersionTuple.h b/llvm/include/llvm/Support/VersionTuple.h index ad89e40f0f14..6f3711f06f1a 100644 --- a/llvm/include/llvm/Support/VersionTuple.h +++ b/llvm/include/llvm/Support/VersionTuple.h @@ -14,6 +14,7 @@ #ifndef LLVM_SUPPORT_VERSIONTUPLE_H #define LLVM_SUPPORT_VERSIONTUPLE_H +#include "llvm/ADT/Hashing.h" #include "llvm/ADT/Optional.h" #include #include @@ -144,6 +145,10 @@ class VersionTuple { return !(X < Y); } + friend llvm::hash_code hash_value(const VersionTuple &VT) { + return llvm::hash_combine(VT.Major, VT.Minor, VT.Subminor, VT.Build); + } + /// Retrieve a string representation of the version number. std::string getAsString() const; From llvm-commits at lists.llvm.org Tue Jul 7 16:15:56 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:15:56 +0000 (UTC) Subject: [PATCH] D81485: GlobalISel: Verify G_BITCAST changes the type In-Reply-To: References: Message-ID: aemerson added inline comments. ================ Comment at: llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:410 +# FAST-NEXT: %1:fpr(<4 x s8>) = G_BITCAST %0 +# GREEDY-NEXT: %1:gpr(<4 x s8>) = G_BITCAST %0 body: | ---------------- paquette wrote: > AFAIK we should only have vectors on FPRs, but maybe I'm wrong about that. > > @aemerson ? We never enable the greedy RBS mode. I think this behavior is wrong for vectors. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81485/new/ https://reviews.llvm.org/D81485 From llvm-commits at lists.llvm.org Tue Jul 7 16:16:45 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:16:45 +0000 (UTC) Subject: [PATCH] D81901: GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT In-Reply-To: References: Message-ID: aemerson added inline comments. ================ Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2248 + LLT SrcEltTy = SrcVecTy.getElementType(); + if (DstTy != SrcEltTy) // XXX - Is this valid mir? + return UnableToLegalize; ---------------- I don't think so. @aditya_nandakumar G_EXTRACT_VECTOR_ELT doesn't extend/truncate the element right? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81901/new/ https://reviews.llvm.org/D81901 From llvm-commits at lists.llvm.org Tue Jul 7 16:16:54 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:16:54 +0000 (UTC) Subject: [PATCH] D83350: [test] Run llvm/test/**/*.yaml In-Reply-To: References: Message-ID: <005a30569d49f7d92f3ab765bbf1d794@localhost.localdomain> rupprecht accepted this revision. rupprecht added a comment. This revision is now accepted and ready to land. Good catch Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83350/new/ https://reviews.llvm.org/D83350 From llvm-commits at lists.llvm.org Tue Jul 7 16:17:28 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:17:28 +0000 (UTC) Subject: [PATCH] D82329: [SVE] Fix invalid Scalable to fixed width vetor type demotion in LLT In-Reply-To: References: Message-ID: <14eed35756531c1996817e63808aa78a@localhost.localdomain> aemerson added a comment. This can be abandoned now? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82329/new/ https://reviews.llvm.org/D82329 From llvm-commits at lists.llvm.org Tue Jul 7 16:17:30 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:17:30 +0000 (UTC) Subject: [PATCH] D82906: [flang][openmp] Use common Directive and Clause enum from llvm/Frontend In-Reply-To: References: Message-ID: <750fe88b56d6dee5b32222221de2b4fb@localhost.localdomain> jdoerfert added a comment. YAY :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82906/new/ https://reviews.llvm.org/D82906 From llvm-commits at lists.llvm.org Tue Jul 7 16:19:14 2020 From: llvm-commits at lists.llvm.org (Philip Reames via llvm-commits) Date: Tue, 07 Jul 2020 16:19:14 -0700 (PDT) Subject: [llvm] 9955876 - [Statepoint] Reduce intendation and change a variable name [NFC] Message-ID: <5f0502f2.1c69fb81.5ced7.8ee4@mx.google.com> Author: Philip Reames Date: 2020-07-07T16:19:05-07:00 New Revision: 9955876d74a51659791fd72f8acb1fba27deacb0 URL: https://github.com/llvm/llvm-project/commit/9955876d74a51659791fd72f8acb1fba27deacb0 DIFF: https://github.com/llvm/llvm-project/commit/9955876d74a51659791fd72f8acb1fba27deacb0.diff LOG: [Statepoint] Reduce intendation and change a variable name [NFC] Added: Modified: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp index d63fedda3bd1..c3a23d67306b 100644 --- a/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp @@ -988,23 +988,23 @@ void SelectionDAGBuilder::LowerCallSiteWithDeoptBundle( void SelectionDAGBuilder::visitGCResult(const GCResultInst &CI) { // The result value of the gc_result is simply 
the result of the actual // call. We've already emitted this, so just grab the value. - const GCStatepointInst *I = CI.getStatepoint(); - - if (I->getParent() != CI.getParent()) { - // Statepoint is in different basic block so we should have stored call - // result in a virtual register. - // We can not use default getValue() functionality to copy value from this - // register because statepoint and actual call return types can be - // different, and getValue() will use CopyFromReg of the wrong type, - // which is always i32 in our case. - Type *RetTy = I->getActualReturnType(); - SDValue CopyFromReg = getCopyFromRegs(I, RetTy); - - assert(CopyFromReg.getNode()); - setValue(&CI, CopyFromReg); - } else { - setValue(&CI, getValue(I)); + const GCStatepointInst *SI = CI.getStatepoint(); + + if (SI->getParent() == CI.getParent()) { + setValue(&CI, getValue(SI)); + return; } + // Statepoint is in different basic block so we should have stored call + // result in a virtual register. + // We can not use default getValue() functionality to copy value from this + // register because statepoint and actual call return types can be + // different, and getValue() will use CopyFromReg of the wrong type, + // which is always i32 in our case.
+ Type *RetTy = SI->getActualReturnType(); + SDValue CopyFromReg = getCopyFromRegs(SI, RetTy); + + assert(CopyFromReg.getNode()); + setValue(&CI, CopyFromReg); } void SelectionDAGBuilder::visitGCRelocate(const GCRelocateInst &Relocate) { From llvm-commits at lists.llvm.org Tue Jul 7 16:19:16 2020 From: llvm-commits at lists.llvm.org (Philip Reames via llvm-commits) Date: Tue, 07 Jul 2020 16:19:16 -0700 (PDT) Subject: [llvm] 22596e7 - [Statepoint] Use early return to reduce nesting and clarify comments [NFC] Message-ID: <5f0502f4.1c69fb81.84ad4.487f@mx.google.com> Author: Philip Reames Date: 2020-07-07T16:19:05-07:00 New Revision: 22596e7b2f3e22b99b828f8dc17434c53f1f67e7 URL: https://github.com/llvm/llvm-project/commit/22596e7b2f3e22b99b828f8dc17434c53f1f67e7 DIFF: https://github.com/llvm/llvm-project/commit/22596e7b2f3e22b99b828f8dc17434c53f1f67e7.diff LOG: [Statepoint] Use early return to reduce nesting and clarify comments [NFC] Added: Modified: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp index c3a23d67306b..2cb57c1d1ccc 100644 --- a/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp @@ -913,36 +913,38 @@ SelectionDAGBuilder::LowerStatepoint(const GCStatepointInst &I, // Export the result value if needed const GCResultInst *GCResult = I.getGCResult(); Type *RetTy = I.getActualReturnType(); - if (!RetTy->isVoidTy() && GCResult) { - if (GCResult->getParent() != I.getParent()) { - // Result value will be used in a different basic block so we need to - // export it now. Default exporting mechanism will not work here because - // statepoint call has a different type than the actual call. It means
It means - // that by default llvm will create export register of the wrong type - // (always i32 in our case). So instead we need to create export register - // with correct type manually. - // TODO: To eliminate this problem we can remove gc.result intrinsics - // completely and make statepoint call to return a tuple. - unsigned Reg = FuncInfo.CreateRegs(RetTy); - RegsForValue RFV(*DAG.getContext(), DAG.getTargetLoweringInfo(), - DAG.getDataLayout(), Reg, RetTy, - I.getCallingConv()); - SDValue Chain = DAG.getEntryNode(); - - RFV.getCopyToRegs(ReturnValue, DAG, getCurSDLoc(), Chain, nullptr); - PendingExports.push_back(Chain); - FuncInfo.ValueMap[&I] = Reg; - } else { - // Result value will be used in a same basic block. Don't export it or - // perform any explicit register copies. - // We'll replace the actuall call node shortly. gc_result will grab - // this value. - setValue(&I, ReturnValue); - } - } else { - // The token value is never used from here on, just generate a poison value + + if (RetTy->isVoidTy() || !GCResult) { + // The return value is not needed, just generate a poison value. setValue(&I, DAG.getIntPtrConstant(-1, getCurSDLoc())); + return; } + + if (GCResult->getParent() == I.getParent()) { + // Result value will be used in a same basic block. Don't export it or + // perform any explicit register copies. The gc_result will simply grab + // this value. + setValue(&I, ReturnValue); + return; + } + + // Result value will be used in a different basic block so we need to export + // it now. Default exporting mechanism will not work here because statepoint + // call has a different type than the actual call. It means that by default + // llvm will create export register of the wrong type (always i32 in our + // case). So instead we need to create export register with correct type + // manually. + // TODO: To eliminate this problem we can remove gc.result intrinsics + // completely and make statepoint call to return a tuple. 
+ unsigned Reg = FuncInfo.CreateRegs(RetTy); + RegsForValue RFV(*DAG.getContext(), DAG.getTargetLoweringInfo(), + DAG.getDataLayout(), Reg, RetTy, + I.getCallingConv()); + SDValue Chain = DAG.getEntryNode(); + + RFV.getCopyToRegs(ReturnValue, DAG, getCurSDLoc(), Chain, nullptr); + PendingExports.push_back(Chain); + FuncInfo.ValueMap[&I] = Reg; } void SelectionDAGBuilder::LowerCallSiteWithDeoptBundleImpl( From llvm-commits at lists.llvm.org Tue Jul 7 16:20:32 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:20:32 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: <079513fedf66f72ff00e83cbfbaf373f@localhost.localdomain> jdoerfert added a comment. Herald added a subscriber: bbn. Can you merge this? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 From llvm-commits at lists.llvm.org Tue Jul 7 16:20:42 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:20:42 +0000 (UTC) Subject: [PATCH] D83357: [llvm][sve] Reg + Imm addressing mode for ld1ro. Message-ID: fpetrogalli created this revision. fpetrogalli added reviewers: kmclaughlin, efriedma, sdesmalen. Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83357 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64InstrFormats.td llvm/lib/Target/AArch64/SVEInstrFormats.td llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-imm.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83357.276258.patch Type: text/x-patch Size: 9262 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:21:01 2020 From: llvm-commits at lists.llvm.org (Hamilton Tobon-Mosquera via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:21:01 +0000 (UTC) Subject: [PATCH] D82719: [OpenMPOpt][SplitMemTransfer][WIP] Getting values stored in offload arrays In-Reply-To: References: Message-ID: hamax97 updated this revision to Diff 276256. hamax97 added a comment. - Exposing `getValuesInOfflArrays()` to make it unit testable. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82719/new/ https://reviews.llvm.org/D82719 Files: llvm/include/llvm/Transforms/IPO/OpenMPOpt.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82719.276256.patch Type: text/x-patch Size: 10023 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:23:57 2020 From: llvm-commits at lists.llvm.org (Dinar Temirbulatov via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:23:57 +0000 (UTC) Subject: [PATCH] D57779: [SLP] Add support for throttling. In-Reply-To: References: Message-ID: <1911a39b609b1ab5e2400b6e08199f57@localhost.localdomain> dtemirbulatov added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57779/new/ https://reviews.llvm.org/D57779 From llvm-commits at lists.llvm.org Tue Jul 7 16:24:36 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:24:36 +0000 (UTC) Subject: [PATCH] D82982: [openmp] Move isAllowedClauseForDirective to tablegen + add clause version to OMP.td In-Reply-To: References: Message-ID: jdoerfert added a comment. In D82982#2134458 , @jdenny wrote: > The TODOs we talked about can be handled later, once @jdoerfert replies. FWIW, you seem to have a pretty good handle on this (and I am a bit behind in my review list), thanks! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82982/new/ https://reviews.llvm.org/D82982 From llvm-commits at lists.llvm.org Tue Jul 7 16:25:27 2020 From: llvm-commits at lists.llvm.org (Aditya Nandakumar via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:25:27 +0000 (UTC) Subject: [PATCH] D81901: GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT In-Reply-To: References: Message-ID: <28dd67d24c88d6e5fc4d891b6d58d8dd@localhost.localdomain> aditya_nandakumar added inline comments. ================ Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2248 + LLT SrcEltTy = SrcVecTy.getElementType(); + if (DstTy != SrcEltTy) // XXX - Is this valid mir? + return UnableToLegalize; ---------------- aemerson wrote: > I don't think so. > > @aditya_nandakumar G_EXTRACT_VECTOR_ELT doesn't extend/truncate the element right? Nope. Plan was to add a truncating version of it which I didn't get around to doing. Thanks for the reminder though - I'll try to get to it soon. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81901/new/ https://reviews.llvm.org/D81901 From llvm-commits at lists.llvm.org Tue Jul 7 16:28:13 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:28:13 +0000 (UTC) Subject: [PATCH] D82316: [LangRef] Add `noundef` attribute to documentation In-Reply-To: References: Message-ID: <856292bf77fa9682cda27d8b6313d4a8@localhost.localdomain> jdoerfert added a comment. reverse ping. Let's land this :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82316/new/ https://reviews.llvm.org/D82316 From llvm-commits at lists.llvm.org Tue Jul 7 16:31:38 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:31:38 +0000 (UTC) Subject: [PATCH] D72410: [DSE] Eliminate stores by terminators (free,lifetime.end). 
In-Reply-To: References: Message-ID: asbirlea accepted this revision. asbirlea added a comment. This revision is now accepted and ready to land. Thank you, LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72410/new/ https://reviews.llvm.org/D72410 From llvm-commits at lists.llvm.org Tue Jul 7 16:31:52 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:31:52 +0000 (UTC) Subject: [PATCH] D76342: [OpenMP] Implement '#pragma omp tile' In-Reply-To: References: Message-ID: <83204dabeae601d790649bd1d1d431fa@localhost.localdomain> Meinersbur updated this revision to Diff 276259. Meinersbur added a comment. - Rebase, especially omp_gen - Some cleanup Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76342/new/ https://reviews.llvm.org/D76342 Files: clang/include/clang-c/Index.h clang/include/clang/AST/OpenMPClause.h clang/include/clang/AST/RecursiveASTVisitor.h clang/include/clang/AST/StmtOpenMP.h clang/include/clang/Basic/DiagnosticCommonKinds.td clang/include/clang/Basic/DiagnosticSemaKinds.td clang/include/clang/Basic/OpenMPKinds.h clang/include/clang/Basic/StmtNodes.td clang/include/clang/Parse/Parser.h clang/include/clang/Sema/Sema.h clang/include/clang/Serialization/ASTBitCodes.h clang/lib/AST/OpenMPClause.cpp clang/lib/AST/StmtOpenMP.cpp clang/lib/AST/StmtPrinter.cpp clang/lib/AST/StmtProfile.cpp clang/lib/Analysis/CFG.cpp clang/lib/Basic/OpenMPKinds.cpp clang/lib/CodeGen/CGOpenMPRuntime.cpp clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp clang/lib/CodeGen/CGStmt.cpp clang/lib/CodeGen/CGStmtOpenMP.cpp clang/lib/CodeGen/CodeGenFunction.h clang/lib/Parse/ParseOpenMP.cpp clang/lib/Sema/SemaExceptionSpec.cpp clang/lib/Sema/SemaOpenMP.cpp clang/lib/Sema/TreeTransform.h clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTReaderStmt.cpp clang/lib/Serialization/ASTWriter.cpp clang/lib/Serialization/ASTWriterStmt.cpp 
clang/test/Index/openmp-tile.c clang/test/OpenMP/tile_ast_print.cpp clang/test/OpenMP/tile_codegen.cpp clang/test/OpenMP/tile_messages.cpp clang/tools/libclang/CIndex.cpp clang/tools/libclang/CXCursor.cpp llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPKinds.def -------------- next part -------------- A non-text attachment was scrubbed... Name: D76342.276259.patch Type: text/x-patch Size: 165399 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:32:43 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:32:43 +0000 (UTC) Subject: [PATCH] D82995: [UpdateTestChecks] Allow $ in function names In-Reply-To: References: Message-ID: jdoerfert added a comment. We need a test. If it helps, OpenMP begin/end declare variant mangles the names with `$`, so do we for CUDA (sometimes). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82995/new/ https://reviews.llvm.org/D82995 From llvm-commits at lists.llvm.org Tue Jul 7 16:38:10 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:38:10 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <860bf0af1b891a3891fc656040ed5c57@localhost.localdomain> MaskRay added inline comments. ================ Comment at: llvm/test/Reduce/remove-call-site-attributes.ll:3 +; +; RUN: rm -rf %t +; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t ---------------- `rm -f %t` is sufficient. 
================ Comment at: llvm/test/Reduce/remove-call-site-attributes.ll:20 +; CHECK-INTERESTINGNESS: %r = call +; CHECK-INTERESTINGNESS: "attr0" +; CHECK-INTERESTINGNESS: i32 @f1( ---------------- Add `-SAME:` if applicable ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:155 + }); + sort(Res, [](const std::pair &LHS, + const std::pair &RHS) { ---------------- If it is non-deterministic, use `stable_sort` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Tue Jul 7 16:39:19 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:39:19 +0000 (UTC) Subject: [PATCH] D83352: [flang] Fix CHARACTER length folding problem In-Reply-To: References: Message-ID: <17adec247ed602c47e01ff43ad4b03ca@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG8f0f9eaddf9e: [flang] Fix CHARACTER length folding problem (authored by klausler). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83352/new/ https://reviews.llvm.org/D83352 Files: flang/include/flang/Evaluate/type.h flang/lib/Evaluate/fold-integer.cpp flang/lib/Evaluate/shape.cpp flang/lib/Evaluate/type.cpp flang/lib/Evaluate/variable.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83352.276262.patch Type: text/x-patch Size: 5587 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:40:16 2020 From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:40:16 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: reames requested changes to this revision. reames added a comment. This revision now requires changes to proceed. 
I've spent several hours today wrapping my head around this patch. I think I've found a material simplification which should greatly simplify the code. You track the def offset for each pointer of interest. You have plumbed through the merge value code with the purpose of being able to get to the SDNode corresponding to the actual statepoint. In the gc.relocate code, you combine these two pieces into enough information to extract the relevant definition from the statepoint. Under the assumption that we're only talking about relocates within a single basic block, this is correct. My suggested simplification is the following. Rather than maintaining a map to offsets in the STATEPOINT node, simply maintain a map from pointer to SDValue representing the particular output of the node. Doing this means that you do not need any of the merge value code, all of the result propagation remains the same, and the gc.relocate changes are isolated. You would need to export these values in the cross block case, but this is fairly easily handled by copying each output to a vreg during lowering of the statepoint and having the vreg be the value tracked in the FuncInfo. If my written description is not clear, or you disagree with my conclusions, let's find a time to talk offline. In addition to this larger change, I have added a couple of smaller FIXMEs throughout the day. I've also been landing NFCs where doing so made sense in the process of understanding your changes. You should expect a non-trivial, but not particularly difficult either, rebase. You should consider this comment a blocking comment. I do not intend to spend further time on this review until either a) the changes noted today have been made, or b) we've talked offline and we've agreed to an alternate approach. 
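To make the two bookkeeping schemes in the comment above concrete, here is a minimal sketch, assuming toy stand-in types. `ToyNode`, `ToyValue`, `PtrId` and both `relocateVia...` helpers are hypothetical names invented for illustration; they are not LLVM types or part of the patch under review. It only shows why a map from pointer to value needs no extra plumbing of the statepoint node, while a map from pointer to offset does.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins for illustration only; none of these are LLVM types.
struct ToyNode {            // plays the role of the STATEPOINT node
  std::vector<int> Results; // one result per relocated pointer
};

struct ToyValue { // plays the role of an SDValue: a (node, result index) pair
  const ToyNode *Node;
  unsigned ResNo;
  int get() const {
    assert(Node && ResNo < Node->Results.size());
    return Node->Results[ResNo];
  }
};

using PtrId = std::string; // hypothetical key naming a pointer of interest

// Offset-based scheme: the map holds only an index, so the relocate also
// needs the statepoint node itself handed to it.
int relocateViaOffset(const std::unordered_map<PtrId, unsigned> &DefOffset,
                      const ToyNode &Statepoint, const PtrId &P) {
  return Statepoint.Results.at(DefOffset.at(P));
}

// Value-based scheme: the map entry already identifies the exact output,
// so nothing else has to be threaded through to the relocate.
int relocateViaValue(const std::unordered_map<PtrId, ToyValue> &Relocated,
                     const PtrId &P) {
  return Relocated.at(P).get();
}

// Builds one toy statepoint and checks that both lookups agree.
bool demo() {
  ToyNode Statepoint{{10, 20}};
  std::unordered_map<PtrId, unsigned> DefOffset{{"base", 0}, {"derived", 1}};
  std::unordered_map<PtrId, ToyValue> Relocated{
      {"base", ToyValue{&Statepoint, 0}},
      {"derived", ToyValue{&Statepoint, 1}}};
  return relocateViaOffset(DefOffset, Statepoint, "derived") ==
             relocateViaValue(Relocated, "derived") &&
         relocateViaValue(Relocated, "base") == 10;
}
```

In the toy version the value-based lookup takes one argument fewer, which is the whole point of the suggestion. The cross-block case (copying each output to a vreg) is not modelled here.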
================ Comment at: llvm/include/llvm/CodeGen/FunctionLoweringInfo.h:106 + DenseMap DerivedPtrMap; + /// StaticAllocaMap - Keep track of frame indices for fixed sized allocas in ---------------- Marker (see overall comment) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Tue Jul 7 16:47:38 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:47:38 +0000 (UTC) Subject: [PATCH] D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) In-Reply-To: References: Message-ID: <7b3e38f074d02e513ddff03f4382812b@localhost.localdomain> JonChesterfield added a comment. I think there's slightly more code here than is necessary. Specifically, I think identifyKernels should return SmallPtrSetImpl instead of populating a member variable which can later be accessed. With a rename, proposing: `SmallPtrSetImpl getKernels(Module &M){/*roughly contents of current identifyKernels */}` The cache then stores the set by value instead of by reference. Less state lying around, can't accidentally add multiple copies of the name to a single set. Depending on the control flow we might look up the metadata more than once, but that seems fine given it usually goes in a cache. Thoughts? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83269/new/ https://reviews.llvm.org/D83269 From llvm-commits at lists.llvm.org Tue Jul 7 16:48:51 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:48:51 +0000 (UTC) Subject: [PATCH] D79625: [PowerPC] Extend .reloc directive on PowerPC In-Reply-To: References: Message-ID: <571b3544e23f188e0a882960f04d90c3@localhost.localdomain> stefanp updated this revision to Diff 276264. stefanp marked 4 inline comments as done and an inline comment as not done. 
stefanp edited the summary of this revision. stefanp added a comment. Added a couple of checks when reading the .reloc directive. Fixed a number of comments. Fixed the specifications for fixup_ppc_linker_opt. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79625/new/ https://reviews.llvm.org/D79625 Files: llvm/include/llvm/BinaryFormat/ELFRelocs/PowerPC64.def llvm/include/llvm/MC/MCExpr.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.h llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h llvm/test/MC/PowerPC/pcrel-reloc-with-expr.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D79625.276264.patch Type: text/x-patch Size: 27965 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:50:00 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:50:00 +0000 (UTC) Subject: [PATCH] D79625: [PowerPC] Extend .reloc directive on PowerPC In-Reply-To: References: Message-ID: stefanp added inline comments. ================ Comment at: llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp:1814 +/// Format: .reloc expression , identifier [ , expression ] +/// For example: +/// pld 3, vec@got@pcrel(0), 1 ---------------- sfertile wrote: > Do we want the example to show up in the doxygen? Yes, we can add it to Doxygen. I'm not sure how exactly it would look in this case (or if it would be formatted correctly) but it would be nice to have the full text. I assume that the `///` will add anything after it to Doxygen. 
================ Comment at: llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp:1819 +/// lwa 3, 4(3) +/// In the above example the .reloc specifies to the linker that the instruction +/// at position .Lpcrel-8 (in this case the pld) has a relocation of type ---------------- sfertile wrote: > A couple nits in regards to this comment: > > The .reloc directive is consumed by the assembler to emit a relocation. It is clearer to separate the explanation of the directive, and then explain the relocation's significance to the linker. > > We use 'position' to refer to the expression which defines the relocation's r_offset field, then use 'offset' to refer to the expression which defines the relocation's r_addend field, which I find confusing. > > Taking both those into account and stealing from the ABI description of the relocation: > > ``` > /// pld 3, vec@got@pcrel(0), 1 > /// .Lpcrel1: > /// .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) > /// lwa 3, 4(3) > > /// The .reloc directive instructs the assembler to emit a relocation of type R_PPC64_PCREL_OPT, referencing > /// offset `.Lpcrel1-8` (the pc-relative load from the got) with addend `.-(.Lpcrel1-8)` (the offset from > /// got access instruction to the associated load/store instruction). > /// The relocation specifies that the instructions at r_offset (pld) and r_offset + r_addend (lwa) > /// may be optimized by the linker (i.e. the compiler guarantees that register lifetimes are such > /// that the optimization is safe). > ``` I agree your description is much clearer. ================ Comment at: llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp:1849 + Offset->getKind() != llvm::MCExpr::SymbolRef && + Offset->getKind() != llvm::MCExpr::Binary, + OffsetLoc, ---------------- sfertile wrote: > Don't we need further checking on the BinaryExpr to make sure it's a symbol+offset expression? 
I'm guessing we are relying on `getOffsetFromBinaryExpr` to catch the error right now, but a diagnostic should be emitted while parsing since we have OffsetLoc. Yes, we are relying on `getOffsetFromBinaryExpr` to catch the error. However, I'll add another check in here. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp:72 case PPC::fixup_ppc_pcrel34: + case PPC::fixup_ppc_linker_opt: case FK_Data_8: ---------------- sfertile wrote: > Shouldn't this be zero? The relocation is a hint to the linker that the 2 instructions it references are optimizable and doesn't represent any relocatable bits. Yes, you are right that this should be like `fixup_ppc_nofixups`. Initially I had considered this to be potentially "changing" all of the bits of the instruction it is optimizing but that doesn't really make a ton of sense. I'll change it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79625/new/ https://reviews.llvm.org/D79625 From llvm-commits at lists.llvm.org Tue Jul 7 16:53:51 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:53:51 +0000 (UTC) Subject: [PATCH] D82709: [MachineLICM] [PowerPC] hoisting rematerializable cheap instructions based on register pressure. In-Reply-To: References: Message-ID: efriedma added a comment. > But unfortunately, RA can not sink down the hoisted rematerializable instruction(LIS), so there are still many spills. "Cannot", as in the target hooks for remat forbid it somehow? Or are the heuristics somehow favoring spilling over remat? If we trust the register allocator to remat appropriately, we can just hoist everything without worrying about it. If we can't trust the register allocator, that means we're assuming the instruction won't be rematerialized, so we shouldn't be checking if it's rematerializable in the first place. 
> I am thinking this may be not a good solution either as it ties two passes Machine LICM and RA together. It may also hard to maintain, if we change one pass we must also consider another pass. They're sort of tied together anyway in a general sense: given we have a register pressure heuristic, it needs to be aware of what the register allocator is actually going to do. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrInfo.h:318 + /// Hoist cheap instructions based on register pressure in Machine LICM. + bool shouldHoistCheapInstructions() const override { return true; } + ---------------- shchenz wrote: > efriedma wrote: > > I'm not really happy about adding target-specific heuristics to MachineLICM. Each target has its own cost model to some extent; we might want to use different rules for specific instructions, or maybe specific register files if they're particularly tiny. But I'd like to avoid knobs that universally change the way the algorithm works. If the core algorithm changes depending on the target, that makes it much harder to understand the the way the code is supposed to work, or make any changes in the future, or implement the hook appropriately. > yeah, agree with you for the hook adding policy. There is a conflict place in MachineLICM on PowerPC: > 1: `hasLowDefLatency` overriding on PowerPC in commit https://reviews.llvm.org/rL258142, it indicates that all instructions including cheap instructions should be hoisted outside of loop. > 2: In function `MachineLICMBase::CanCauseHighRegPressure`, there are statements: > ``` > // Don't hoist cheap instructions if they would increase register pressure, > // even if we're under the limit. > if (CheapInstr && !HoistCheapInsts) > return true; > ``` > > Here I just want to make it consistent on PowerPC target. But it seems strange with two different hooks...Maybe we need another patch to improve this. I assume you meant to link https://reviews.llvm.org/rL225471 ? 
I think this is all tied together; it doesn't really make sense to push it off to later. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82709/new/ https://reviews.llvm.org/D82709 From llvm-commits at lists.llvm.org Tue Jul 7 16:56:17 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:56:17 +0000 (UTC) Subject: [PATCH] D81485: GlobalISel: Verify G_BITCAST changes the type In-Reply-To: References: Message-ID: <6dfb64484f0f3d748893fdd0825deaf6@localhost.localdomain> arsenm marked an inline comment as done. arsenm added inline comments. ================ Comment at: llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:410 +# FAST-NEXT: %1:fpr(<4 x s8>) = G_BITCAST %0 +# GREEDY-NEXT: %1:gpr(<4 x s8>) = G_BITCAST %0 body: | ---------------- aemerson wrote: > paquette wrote: > > AFAIK we should only have vectors on FPRs, but maybe I'm wrong about that. > > > > @aemerson ? > We never enable the greedy RBS mode. I think this behavior is wrong for vectors. For the purpose of the patch, there just needs to be something here to test a bitcast. If it happens to produce the wrong output, that's a separate issue CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81485/new/ https://reviews.llvm.org/D81485 From llvm-commits at lists.llvm.org Tue Jul 7 16:57:37 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:57:37 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine In-Reply-To: References: Message-ID: <19d80527dfa045b485f83b427c7fa4ab@localhost.localdomain> JonChesterfield accepted this revision. JonChesterfield added a comment. This revision is now accepted and ready to land. I haven't been able to apply this to the aomp tree (for reasons unrelated to this patch), but by inspection I think it's sound. I like the conservative pattern matching approach. 
The function pointer specialisation alternative is more complicated than I suggested above - because the pointer gets stored in local state and loaded, it isn't readily available for specialisation on by each call. ================ Comment at: llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll:40 + +define internal void @__omp_offloading_35_a1e179_foo_l7_worker() { +entry: ---------------- arsenm wrote: > These tests seem really big Agreed. I wonder if it's worth restructuring the openmp codegen to favour emitting functions instead of blocks with interesting control flow, such that tests like these look more like a linear sequence of named function calls. Said functions would then be inlined downstream of the codegen to produce the same IR we see here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83271/new/ https://reviews.llvm.org/D83271 From llvm-commits at lists.llvm.org Tue Jul 7 16:58:22 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:58:22 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: <795eced448f370059aa004d87b8928a0@localhost.localdomain> ctetreau updated this revision to Diff 276266. ctetreau added a comment. Add unit tests Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 Files: llvm/include/llvm/IR/PatternMatch.h llvm/test/Transforms/InstCombine/fmul.ll llvm/test/Transforms/InstCombine/mul.ll llvm/unittests/IR/PatternMatch.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83001.276266.patch Type: text/x-patch Size: 12635 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:58:44 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:58:44 +0000 (UTC) Subject: [PATCH] D83056: [NFC] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities In-Reply-To: References: Message-ID: <5b7bb3e7ae54a228054724d6d6f47bca@localhost.localdomain> Meinersbur added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/LoopPeel.cpp:50 -#define DEBUG_TYPE "loop-unroll" +#define DEBUG_TYPE "loop-peel" ---------------- fhahn wrote: > sidbav wrote: > > Meinersbur wrote: > > > fhahn wrote: > > > > I am not sure about this change. Currently peeling is integrated in loop-unroll and remarks/debug can be filtered by loop-unroll, but now we will generate remarks for `loop-unroll` and `loop-peel` when running `-loop-unroll`. > > > Isn't it actually better since you can now filter `-debug-only=loop-unroll`, respectively `-debug-only=loop-peel` depending on what you want to look at? > > > > > > Note: `-Rpass=` remarks use the pass name, not `DEBUG_TYPE`. > > I also agree with @Meinersbur, having them separate is better. Additionally, in the case that the developer wants to look at both unrolling and peeling at the same time, they can specify `debug-only=loop-unroll,loop-peel`. > > Isn't it actually better since you can now filter -debug-only=loop-unroll, respectively -debug-only=loop-peel depending on what you want to look at? > > I'd say it depends. Personally I find it mostly makes things less discoverable for newcomers. I can see how it might be surprising if a user wants to ask for debug output of the LoopUnroll pass and then the pass makes changes but doesn't display the debug output. It's certainly not a new problem though and not a blocker. 
I think it means that the patch changes behavior though ;) If you want both, use `-debug-only=loop-unroll,loop-peel`. For discoverability one can emit everything using just `-debug`. Ultimately, every change can be surprising. `DEBUG_TYPE` is for debugging, not an intentional behavioral change (in debug builds). I agree this is a somewhat grey area for a NFC(I) patch, but in the strict interpretation assertion also may change behavior and we change those regularly in refactoring NFC changes. I don't think we have a policy for this case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83056/new/ https://reviews.llvm.org/D83056 From llvm-commits at lists.llvm.org Tue Jul 7 17:01:05 2020 From: llvm-commits at lists.llvm.org (Jessica Paquette via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:01:05 +0000 (UTC) Subject: [PATCH] D82651: [GlobalISel][InlineAsm] Add support for matching input constraints In-Reply-To: References: Message-ID: paquette added a comment. Another AArch64 failure from this (bugpointed from a failure in http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O0-g) Given the following: define void @main() { %1 = call i16 asm sideeffect "", "=r,0"(i16 undef) unreachable } Compile with: llc -global-isel -mtriple aarch64-apple-ios -verify-machineinstrs And you'll get the following error: # After IRTranslator # Machine code for function main: IsSSA, TracksLiveness bb.1 (%ir-block.0): %1:_(s16) = G_IMPLICIT_DEF %2:gpr32common = COPY %1:_(s16) INLINEASM &"" [sideeffect] [attdialect], $0:[regdef:GPR32common], def %0:gpr32common, $1:[reguse tiedto:$0], %2:gpr32common(tied-def 3) %4:_(s32) = COPY %0:gpr32common %3:_(s16) = G_TRUNC %4:_(s32) # End machine code for function main. 
*** Bad machine code: Copy Instruction is illegal with mismatching sizes *** - function: main - basic block: %bb.1 (0x7fdea2021d40) - instruction: %2:gpr32common = COPY %1:_(s16) Def Size = 32, Src Size = 16 LLVM ERROR: Found 1 machine code errors. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82651/new/ https://reviews.llvm.org/D82651 From llvm-commits at lists.llvm.org Tue Jul 7 17:06:49 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:06:49 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X Message-ID: craig.topper created this revision. craig.topper added reviewers: spatel, nlopes, aqjune, lebedev.ri. Herald added subscribers: steven.zhang, hiraditya. Herald added a project: LLVM. As noted here https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html and by alive2, this transform isn't valid. If X is poison this potentially propagates poison when it shouldn't. It seems we don't have any tests for this in either InstSimplify or InstCombine. I can add negative tests if we want. The same transform also exists in DAGCombiner. https://reviews.llvm.org/D83360 Files: llvm/lib/Analysis/InstructionSimplify.cpp Index: llvm/lib/Analysis/InstructionSimplify.cpp =================================================================== --- llvm/lib/Analysis/InstructionSimplify.cpp +++ llvm/lib/Analysis/InstructionSimplify.cpp @@ -4117,11 +4117,6 @@ if (TrueVal == FalseVal) return TrueVal; - if (isa<UndefValue>(TrueVal)) // select ?, undef, X -> X - return FalseVal; - if (isa<UndefValue>(FalseVal)) // select ?, X, undef -> X - return TrueVal; - // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83360.276265.patch Type: text/x-patch Size: 643 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:08:38 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:08:38 +0000 (UTC) Subject: [PATCH] D83176: [OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. In-Reply-To: References: Message-ID: <91f32270b7e1471eaa7199c29854722c@localhost.localdomain> jdoerfert accepted this revision. jdoerfert added a comment. This revision is now accepted and ready to land. LGTM. ================ Comment at: llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:1125 +#include "llvm/Frontend/OpenMP/OMPKinds.def" +} ---------------- While we are here, remove `uninitializeTypes` ================ Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:313 { \ SmallVector ArgsTypes({__VA_ARGS__}); \ Function *F = M.getFunction(_Name); \ ---------------- hoyFB wrote: > sstefan1 wrote: > > I wasn't sure how to handle `__VA_ARGS__` here, since we would need `OMPBuilder` in front of every type. > > That is why helper macros above exist. The problem with this is that this creates some unused variables in `OpenMPOpt`. > > > > Not sure if `-Wno-unused-variable` would be a good thing to do temporarily? Is there another way to handle `__VA_ARGS__` here? > Could it be possible to use `OMPBuilder. _Name`, `OMPBuilder. _ReturnType` everywhere? 
Since this will (soonish) be replaced by tablegen definitions we can apply the following workaround: Keep your helper macros but add `(void) VarName;` after each declaration :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83176/new/ https://reviews.llvm.org/D83176 From llvm-commits at lists.llvm.org Tue Jul 7 17:08:41 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:08:41 +0000 (UTC) Subject: [PATCH] D83361: [LLVM] Add libatomic load/store functions to TargetLibraryInfo Message-ID: guiand created this revision. guiand added reviewers: eugenis, vitalybuka, efriedma. Herald added subscribers: llvm-commits, jfb, hiraditya. Herald added a project: LLVM. This allows treating these functions like libcalls. This patch is a prerequisite to instrumenting them in MSAN: https://reviews.llvm.org/D83337 Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83361 Files: llvm/include/llvm/Analysis/TargetLibraryInfo.def llvm/lib/Analysis/TargetLibraryInfo.cpp llvm/unittests/Analysis/TargetLibraryInfoTest.cpp Index: llvm/unittests/Analysis/TargetLibraryInfoTest.cpp =================================================================== --- llvm/unittests/Analysis/TargetLibraryInfoTest.cpp +++ llvm/unittests/Analysis/TargetLibraryInfoTest.cpp @@ -495,6 +495,9 @@ "declare i8* @mempcpy(i8*, i8*, i64)\n" "declare i8* @memrchr(i8*, i32, i64)\n" + "declare void @__atomic_load(i64, i8*, i8*, i32)\n" + "declare void @__atomic_store(i64, i8*, i8*, i32)\n" + // These are similar to the FILE* fgetc/fputc. 
"declare i32 @_IO_getc(%struct*)\n" "declare i32 @_IO_putc(i32, %struct*)\n" Index: llvm/lib/Analysis/TargetLibraryInfo.cpp =================================================================== --- llvm/lib/Analysis/TargetLibraryInfo.cpp +++ llvm/lib/Analysis/TargetLibraryInfo.cpp @@ -1228,6 +1228,15 @@ case LibFunc_ZdaPvmSt11align_val_t: return (NumParams == 3 && FTy.getParamType(0)->isPointerTy()); + case LibFunc_atomic_load: + // void __atomic_load(size_t, void *, void *, int) + case LibFunc_atomic_store: + // void __atomic_store(size_t, void *, void *, int) + return (NumParams == 4 && FTy.getParamType(0)->isIntegerTy() && + FTy.getParamType(1)->isPointerTy() && + FTy.getParamType(2)->isPointerTy() && + FTy.getParamType(3)->isIntegerTy()); + case LibFunc_memset_pattern16: return (!FTy.isVarArg() && NumParams == 3 && FTy.getParamType(0)->isPointerTy() && Index: llvm/include/llvm/Analysis/TargetLibraryInfo.def =================================================================== --- llvm/include/llvm/Analysis/TargetLibraryInfo.def +++ llvm/include/llvm/Analysis/TargetLibraryInfo.def @@ -262,6 +262,12 @@ /// long double __atanhl_finite(long double x); TLI_DEFINE_ENUM_INTERNAL(atanhl_finite) TLI_DEFINE_STRING_INTERNAL("__atanhl_finite") +/// void __atomic_load(size_t size, void *mptr, void *vptr, int smodel); +TLI_DEFINE_ENUM_INTERNAL(atomic_load) +TLI_DEFINE_STRING_INTERNAL("__atomic_load") +/// void __atomic_store(size_t size, void *mptr, void *vptr, int smodel); +TLI_DEFINE_ENUM_INTERNAL(atomic_store) +TLI_DEFINE_STRING_INTERNAL("__atomic_store") /// double __cosh_finite(double x); TLI_DEFINE_ENUM_INTERNAL(cosh_finite) TLI_DEFINE_STRING_INTERNAL("__cosh_finite") -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83361.276269.patch Type: text/x-patch Size: 2306 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:08:57 2020 From: llvm-commits at lists.llvm.org (Steve Scalpone via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:08:57 +0000 (UTC) Subject: [PATCH] D83355: [flang] upstream intrinsic call lowering In-Reply-To: References: Message-ID: <0adb979cbeb559034271165064fb7ac1@localhost.localdomain> sscalpone accepted this revision. sscalpone added inline comments. This revision is now accepted and ready to land. ================ Comment at: flang/include/flang/Optimizer/Dialect/FIRType.h:245 /// returns -1 if the rank is unknown - int getRank() const; + unsigned getRank() const; }; ---------------- Either the comment or the type ought to be changed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83355/new/ https://reviews.llvm.org/D83355 From llvm-commits at lists.llvm.org Tue Jul 7 17:09:16 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:09:16 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls In-Reply-To: References: Message-ID: <8682e79404ec05c528d1a4e2bb3037cd@localhost.localdomain> guiand added a comment. Split off the TLI changes to https://reviews.llvm.org/D83361 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83337/new/ https://reviews.llvm.org/D83337 From llvm-commits at lists.llvm.org Tue Jul 7 17:09:49 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:09:49 +0000 (UTC) Subject: [PATCH] D82895: [NFC][LoopInfo] Document empty() In-Reply-To: References: Message-ID: <6881715609ddd4b2e9b1db01279ea81d@localhost.localdomain> Meinersbur added a comment. In D82895#2130151 , @fhahn wrote: > Hm, adding multiple ways to check the same thing may lead to other problems. 
IMO it will be more confusing if some places use `!L->getParentLoop()` and other `L->isOutermost()`. There are often multiple ways to reach the same goal, for instance `LoadInst::getPointerOperand()` and `LoadInst::getOperand(0)`. One makes the intention clearer than the other. > Same for `isInnerMost()`/`empty()`. Since these are identical, I'd tend to remove one. In this case I'd remove `empty()` since `Loop` does not represent a container. ================ Comment at: llvm/include/llvm/Analysis/LoopInfo.h:160 bool empty() const { return getSubLoops().empty(); } + bool isInnermost() const { return empty(); } + // Outermost is the same as top-level. ---------------- `isInnermost` should be documented using a doc-string as well (or grouped together with `empty()` using `/// @{` / `/// @}`) ================ Comment at: llvm/include/llvm/Analysis/LoopInfo.h:161 + bool isInnermost() const { return empty(); } + // Outermost is the same as top-level. + bool isOutermost() const { return getParentLoop() == nullptr; } ---------------- make this a doc-string as well? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82895/new/ https://reviews.llvm.org/D82895 From llvm-commits at lists.llvm.org Tue Jul 7 17:10:31 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:10:31 +0000 (UTC) Subject: [PATCH] D82709: [MachineLICM] [PowerPC] hoisting rematerializable cheap instructions based on register pressure. In-Reply-To: References: Message-ID: <4a4d5518c021fe48eb03953738ec8f9e@localhost.localdomain> efriedma added a comment. In D82709#2137756 , @efriedma wrote: > > But unfortunately, RA can not sink down the hoisted rematerializable instruction(LIS), so there are still many spills. > > "Cannot", as in the target hooks for remat forbid it somehow? Or are the heuristics somehow favoring spilling over remat?
> > If we trust the register allocator to remat appropriately, we can just hoist everything without worrying about it. If we can't trust the register allocator, that means we're assuming the instruction won't be rematerialized, so we shouldn't be checking if it's rematerializable in the first place. Rereading this, I should probably say a bit more. I think the patch in https://reviews.llvm.org/D82709#2121904 makes sense in a world where we trust the register allocator. In a world where we don't trust the register allocator, we should just delete the isTriviallyReMaterializable() check completely. Either way, we need a consistent model. The original patch is essentially saying remat only works for instructions that have non-loop-invariant operands, and that doesn't really make sense. ------- Not really related to the contents of this patch, but some targets, like ARM, use a pseudo-instruction for integer immediates, and expand it after register allocation; this makes remat more effective. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82709/new/ https://reviews.llvm.org/D82709 From llvm-commits at lists.llvm.org Tue Jul 7 17:13:34 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:13:34 +0000 (UTC) Subject: [PATCH] D71739: [AssumeBundles] Use operand bundles to encode alignment assumptions In-Reply-To: References: Message-ID: <09008075e8f920e7b57073b7f3c2c848@localhost.localdomain> jdoerfert accepted this revision. jdoerfert added a comment. LGTM. @lebedev.ri ? ================ Comment at: llvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp:227 } - - if (!AAPtr) - return false; - - // Sign extend the offset to 64 bits (so that it is like all of the other - // expressions). 
- unsigned OffSCEVBits = OffSCEV->getType()->getPrimitiveSizeInBits(); - if (OffSCEVBits < 64) - OffSCEV = SE->getSignExtendExpr(OffSCEV, Int64Ty); - else if (OffSCEVBits > 64) - return false; - - AAPtr = AAPtr->stripPointerCasts(); - return true; + return false; } ---------------- Early exit please. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71739/new/ https://reviews.llvm.org/D71739 From llvm-commits at lists.llvm.org Tue Jul 7 17:17:38 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:17:38 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: <3d2acf37694390ffeb3e13372d74f81f@localhost.localdomain> thakis accepted this revision. thakis added a comment. This revision is now accepted and ready to land. Thanks! Should the release notes grow an entry mentioning the behavior change and this flag? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Tue Jul 7 17:18:00 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:18:00 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls In-Reply-To: References: Message-ID: <49095e2f6e601ca79c00869c645b3fd4@localhost.localdomain> guiand updated this revision to Diff 276271. guiand marked 6 inline comments as done. guiand added a comment. Addressed comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83337/new/ https://reviews.llvm.org/D83337 Files: compiler-rt/test/msan/libatomic.c llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/libatomic.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83337.276271.patch Type: text/x-patch Size: 10900 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:18:43 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:18:43 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: <9e5404e316c9549a174e988e73250106@localhost.localdomain> jdoerfert added a comment. As discussed, we can check for the validity of the list in the constructor. Two more comments below. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:1433 + /// Before the run function gets called. + bool SeedingPeriod; + ---------------- Use more words to describe this please. Nit: Maybe initialize it directly here. ================ Comment at: llvm/test/Transforms/Attributor/allow_list.ll:37 + +!0 = !{!"clang version 3.8.1-24 (tags/RELEASE_381/final)"} ---------------- Get rid of all lines we don't need, e.g. the last 5. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Tue Jul 7 17:20:01 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:20:01 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: dblaikie added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:60-61 + void visitModule(Module &M) { + for_each(M.getGlobalList(), + [&](GlobalVariable &GV) { visitGlobalVariable(GV); }); + } ---------------- range-based-for loop, probably? ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:107-108 + + AttributeCounter() {} + + void visitModule(Module &M) { ---------------- drop this since it's implicit/default? 
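For readers following the two review suggestions above (a range-based for loop in place of `for_each`, and dropping the empty user-provided constructor), here is a minimal standalone sketch in plain C++. It is not the code from D83351: `Counter`, `visitOne`, `visitAllForEach` and `visitAll` are hypothetical stand-ins for `AttributeCounter`, `visitGlobalVariable` and `visitModule`, and a `std::vector<int>` stands in for the module's global list.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a visitor such as AttributeCounter.
// No constructor is declared: the default member initializer plus the
// implicitly generated default constructor are sufficient.
struct Counter {
  int Total = 0;

  // Hypothetical per-element visit; here it just accumulates the value.
  void visitOne(int V) { Total += V; }

  // Shape of the reviewed code: an algorithm call with a forwarding lambda.
  void visitAllForEach(const std::vector<int> &Items) {
    std::for_each(Items.begin(), Items.end(), [&](int V) { visitOne(V); });
  }

  // Shape suggested in review: an equivalent range-based for loop.
  void visitAll(const std::vector<int> &Items) {
    for (int V : Items)
      visitOne(V);
  }
};
```

Both member functions visit the same elements in the same order; the range-based form simply avoids the lambda and the explicit capture.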
================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:146-148 + for (const auto &I : zip(Res, AttributeSets)) { + std::pair &NewSet = std::get<0>(I); + const AttrPtrIdxVecVecTy &V = std::get<1>(I); ---------------- nickdesaulniers wrote: > does `zip` actually simplify this sequence? Looks kind of complicated. +1 to this. ``` std::vector> Res; Res.reserve(AttributeSets.size()); for (const auto &V : AttributeSets) Res.push_back({V.first, convertAttributeRefToAttributeSet(C, V.second)}); ``` Seems simpler. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:152-154 + erase_if(Res, [](const std::pair &NewSet) { + return !NewSet.second.hasAttributes(); + }); ---------------- Could roll this into the previous for loop, making the push_back conditional: ``` std::vector> Res; Res.reserve(AttributeSets.size()); for (const auto &V : AttributeSets) { AttributeSet S = convertAttributeRefToAttributeSet(C, V.second); if (S.hasAttributes()) Res.push_back({V.first, S}); // maybe std::move(S)? Not sure if it's sufficiently heavy to benefit from moving } ``` ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:169 + LLVMContext &C = Program->getContext(); + for_each(R.GlobalVariablesToRefine, [&C](const auto &I) { + I.first->setAttributes(convertAttributeRefToAttributeSet(C, I.second)); ---------------- why std::for_each rather than a range-based for loop? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Tue Jul 7 17:20:11 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:20:11 +0000 (UTC) Subject: [PATCH] D83337: [MSAN] Instrument libatomic load/store calls In-Reply-To: References: Message-ID: <6f01a89a54dadbcedec3805181a15e52@localhost.localdomain> guiand updated this revision to Diff 276272. guiand added a comment. 
isStore=true on atomic load destination Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83337/new/ https://reviews.llvm.org/D83337 Files: compiler-rt/test/msan/libatomic.c llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/libatomic.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83337.276272.patch Type: text/x-patch Size: 10899 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:25:19 2020 From: llvm-commits at lists.llvm.org (Katherine Rasmussen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:25:19 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: <2a9be078c99e402aae8263a24f4394a0@localhost.localdomain> ktras added a comment. In D83142#2137260 , @PeteSteinfeld wrote: > @ktras, are you planning to implement the other coarray intrinsic functions? Yes, that is my plan. I am looking at 'this_image()' next. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 From llvm-commits at lists.llvm.org Tue Jul 7 17:27:16 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:27:16 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: <9b5b20b88583686308c83ac8ab8f0eda@localhost.localdomain> MaskRay added a comment. In D83264#2137849 , @thakis wrote: > Thanks! Should the release notes grow an entry mentioning the behavior change and this flag? Good idea. Will send a review afterwards. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Tue Jul 7 17:27:37 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:27:37 +0000 (UTC) Subject: [PATCH] D83246: [Attributor] use liveness information from AAIsDead in AAReachability and cache query results In-Reply-To: References: Message-ID: jdoerfert added a comment. I'm not sold on the liveness check. Do we have users of this function that would even try to call it with a dead instruction? Usually, liveness is baked in so deeply that all the AAs actually see is assumed-live code. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:729 + return Result; + } + ---------------- Please add documentation and consider taking the instructions as references. Nit: Move `F` after the first check to shorten the lifetime (and avoid confusion). ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:2312 + bool isAssumedReachable(const Instruction *From, const Instruction *To, + Attributor &A) const { + const auto &LivenessAA = ---------------- Style: Most of our interfaces take the Attributor first (I think). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83246/new/ https://reviews.llvm.org/D83246 From llvm-commits at lists.llvm.org Tue Jul 7 17:29:19 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:29:19 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: <8bf77a5c27660bcdfa88a2ef6a7e67f9@localhost.localdomain> dblaikie added a comment.
In D82975#2135150 , @SouraVX wrote: > In D82975#2134626 , @dblaikie wrote: > > > In D82975#2134347 , @probinson wrote: > > > > > In D82975#2134132 , @dblaikie wrote: > > > > > > > In D82975#2128353 , @dstenb wrote: > > > > > > > > > In D82975#2127201 , @SouraVX wrote: > > > > > > > > > > > I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the `.debug_macro` section with `version` 4 and teach the `llvm-dwarfdump` to parse it correctly. > > > > > > > > > > > > > > > One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4? > > > > > > > > > > > > Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc. > > > > > > > > > My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's `-fdebug-macro` with v4 is emitting a .debug_str_offsets section. > > > > > > LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either. 
> > > > > > GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand: > > > > $ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug" > > 1 .debug_info.dwo 0000003c 0000000000000000 > > 2 .debug_abbrev.dwo 0000003e 0000000000000000 > > 3 .debug_macro.dwo 0000001e 0000000000000000 > > 4 .debug_macro.dwo 00000364 0000000000000000 > > 5 .debug_macro.dwo 00000013 0000000000000000 > > 6 .debug_line.dwo 00000048 0000000000000000 > > 7 .debug_str_offsets.dwo 000002d5 0000000000000000 > > 8 .debug_str.dwo 00000e05 0000000000000000 > > $ llvm-dwarfdump-tot main.dwo -debug-macro > > main.dwo: file format elf64-x86-64 > > > > .debug_macro.dwo contents: > > 0x00000000: > > - lineno: 19 macro: > > DW_MACINFO_invalid > > > > > > I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off. > > > One more deciding factor to considered here(previously missed) is that: `GDB(trunk)` also doesn't understand `GNU macro extensions`(if you wish to call it) in split case. > i.e > `gcc -g3 -gsplit-dwarf test.c` > `test.dwo` contains `.debug_macro.dwo` forms which no tool(as of now can dump). > if you load `a.out` in GDB and try expanding macro(defined in source). > GDB will report > > (gdb) info macro FOO > The symbol `FOO' has no definition as a C/C++ preprocessor macro > at :-1 > > > on the other hand, if you try with `-gstrict-dwarf -gsplit-dwarf`. GDB is happy. > So at the end of the day, even if we allow `GNU macro` extension, things will still be broken for `-gsplit-dwarf` case. 
> Or we have to teach the debugger to understand this ?, this also hinges on the fact, what kinda form GCC uses in split-case in `.debug_macro.dwo` section. > That it self is unclear right ? Sure, but easy enough to find the answer to by looking at GCC's output or implementation. But, yeah, hardly high-value if gdb doesn't support it anyway, unless someone wants to add support there, or has some other DWARF Consumer that can handle it. Don't mind -gsplit-dwarf -gdwarf-4 -fdebug-macro -ggdb to use debug_macinfo.dwo while non-split uses GNU extension debug_macro. Don't mind hugely in any case. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Tue Jul 7 17:29:32 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:29:32 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <035ec96f43fbc5d2aae35043e6db9000@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- Higuoxing wrote: > jhenderson wrote: > > dblaikie wrote: > > > Higuoxing wrote: > > > > dblaikie wrote: > > > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > > Hmm, fair enough. 
Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. > > > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the exisiting llvm-dwarfdump tests require for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. > Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! 
> Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me. Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Tue Jul 7 17:33:44 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:33:44 +0000 (UTC) Subject: [PATCH] D82886: [DebugInfo] Fix a possible crash when reading a malformed .debug_*lists section. In-Reply-To: References: Message-ID: <5e8d9429708a6dfb55f99146d6dec161@localhost.localdomain> dblaikie added a comment. In D82886#2135527 , @ikudrin wrote: > In D82886#2134722 , @dblaikie wrote: > > > Is that difference necessary? I tried removing the length == 0 special case from "length()" and no tests fail. Perhaps we could go that route instead? > > > For example, `dumpRnglistsSection()` in `DWARFContext.cpp` terminates the loop when `length()` returns 0. With a specially constructed input, your variant would result in several additional unsuccessful reads with additional error messages: > > .section .debug_rnglists,"",@progbits > .long 0xffffffff > .long 0xffffffff > .byte 0xff > ...
> error: parsing .debug_rnglists table at offset 0x0: unexpected end of data at offset 0xb while reading [0x4, 0xc) > error: parsing .debug_rnglists table at offset 0x4: unexpected end of data at offset 0xb while reading [0x8, 0x10) > error: parsing .debug_rnglists table at offset 0x8: unexpected end of data at offset 0xb while reading [0x8, 0xc) > Ah, thanks! Would be handy to have a test case for that & perhaps some other way to communicate "end of list" that's a bit more explicit? Hmm, I'm not sure why this produces the repetition - if length() accurately returned the length that was read rather than zero, then it'd go to the end and stop, right? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82886/new/ https://reviews.llvm.org/D82886 From llvm-commits at lists.llvm.org Tue Jul 7 17:34:42 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:34:42 +0000 (UTC) Subject: [PATCH] D83236: [DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour In-Reply-To: References: Message-ID: <1c66baf22b33040d35264095e24fc52f@localhost.localdomain> dblaikie added a comment. In D83236#2135647, @jmorse wrote: > In D83236#2133902, @dblaikie wrote: > > > Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together? > > > This is probably do-able, although I'm generally unfamiliar with the DWARF emission code. Right now we iterate over variable entities and call validThroughout for those that might be single-locations; we would need to pivot to iterate over variable entities collecting those that _might_ be single-location, then calling validThroughout once for that set. My preference is to fold this problem into the work that @Orlando is doing though -- his patch is already solving this problem (intersection of scope and variable-location range) in one context, we should be able to re-purpose it to solve this one too. 
> > (Most of my motivation for this patch is the upcoming branch for LLVM11, I'd like to get a limit on this, then work towards doing it more efficiently) Fair enough. Perhaps a clear FIXME or the like, suggesting that this should be obsoleted by @Orlando's work (& ensuring they're aware that this is something that should be considered in the replacement design)? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83236/new/ https://reviews.llvm.org/D83236 From llvm-commits at lists.llvm.org Tue Jul 7 17:35:44 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:35:44 +0000 (UTC) Subject: [PATCH] D82709: [MachineLICM] [PowerPC] hoisting rematerializable cheap instructions based on register pressure. In-Reply-To: References: Message-ID: shchenz added a comment. OK, I will look into RA and see what's wrong here on the PowerPC target. It seems RA cannot sink any remat instructions at all on PowerPC, since the case here should be very common. Also, thanks very much for the info about how ARM handles big IMM. To be honest, that was also my favorite solution when I first ran into this issue. On PowerPC we expand the big IMM in ISEL in different ways for different IMM, which expands a remat instruction into 2 or more non-remat instructions. But I gave up because of the large effort needed to refactor these. Anyway, thanks very much for your good comments. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82709/new/ https://reviews.llvm.org/D82709 From llvm-commits at lists.llvm.org Tue Jul 7 17:36:21 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:36:21 +0000 (UTC) Subject: [PATCH] D79871: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbp asm instructions In-Reply-To: References: Message-ID: <7869bcfa1e73ac25f9ce99cf704298f5@localhost.localdomain> PaoloS updated this revision to Diff 276279. PaoloS added a comment. Add missing pattern-matching for *w instructions. Add tests. Add both i32 and i64 code versions to both i32 and i64 test files. Remove NOT labels from tests. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79871/new/ https://reviews.llvm.org/D79871 Files: llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVInstrInfoB.td llvm/test/CodeGen/RISCV/rv32Zbp.ll llvm/test/CodeGen/RISCV/rv64Zbp.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79871.276279.patch Type: text/x-patch Size: 78266 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:37:01 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:37:01 +0000 (UTC) Subject: [PATCH] D82708: [MachineLICM] NFC - make safety of moving explicitly for IsLoopInvariantInst In-Reply-To: References: Message-ID: shchenz abandoned this revision. shchenz added a comment. We decided to use another solution in https://reviews.llvm.org/D82709, so this NFC patch is no longer needed. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82708/new/ https://reviews.llvm.org/D82708 From llvm-commits at lists.llvm.org Tue Jul 7 17:37:39 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:37:39 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: <094cde380cb717ebf4f15539faa7628d@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:76-89 + if (!C) { + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 + " terminated prematurely: %s", + SetOffset, toString(std::move(C.takeError())).c_str())); + continue; ---------------- ikudrin wrote: > jhenderson wrote: > > dblaikie wrote: > > > jhenderson wrote: > > > > ikudrin wrote: > > > > > dblaikie wrote: > > > > > > I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended. > > > > > > > > > > > > Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas. > > > > > These wordings are already better than mine. Thanks! > > > > How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context. > > > The suggestion wasn't for brevity, but clarity. 
I found the original messages unclear & was hoping to clarify them. > > > > > > What are the two messages in total (with all the added context, for both too short and too long) & how clear are they? > > Taken from the test case: > > > > ``` > > error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72 > > ``` > > (the "no null terminated" bit might differ depending on the exact failure, e.g. "unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)") > > > > ``` > > error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c > > ``` > Thanks, @jhenderson! @dblaikie are you OK with these messages or going to suggest a better alternative? This one sounds OK (guess it could be more precise in this case "bounds reached without finding expected null terminator" perhaps - but I realize that's fairly orthogonal to this patch & could be improved in the general DataExtractor infrastructure) - honestly the verbosity of these messages doesn't seem like a problem to me. They should be pretty rare & when they do come up, the more explicit/precise the better, it seems to me. ``` error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72 ``` This one ``` error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c ``` Still seems like it could be more precise - exactly why was the terminator unexpected? "has a terminator at 0x8c before the expected end at 0x??" perhaps. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Tue Jul 7 17:37:40 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:37:40 +0000 (UTC) Subject: [PATCH] D82441: [MachineLICM] NFC - add a target hook shouldHoistCheapInstructions for machine licm to hoist cheap instruction In-Reply-To: References: Message-ID: shchenz abandoned this revision. 
shchenz added a comment. We decided to use another solution in https://reviews.llvm.org/D82709, so this NFC patch is no longer needed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82441/new/ https://reviews.llvm.org/D82441 From llvm-commits at lists.llvm.org Tue Jul 7 17:42:07 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:42:07 +0000 (UTC) Subject: [PATCH] D82129: [DebugInfo] Drop location ranges for variables which exist entirely outside the variable's scope In-Reply-To: References: Message-ID: dblaikie added a comment. In D82129#2135946, @Orlando wrote: > In D82129#2134241, @probinson wrote: > > > I think I didn't fully grasp that the blocks were being (tail-)merged, which makes the scope ambiguous, and all the rest. So I withdraw the objection on that basis. DWARF is fine with multiple variables pointing to the same location, but it's less forgiving about scopes IIRC, much like it can't describe multiple source attributions for an instruction. This all makes me sad, but that's how DWARF is at the moment. > > > > Is there still an open question about whether this wants to be a cleanup pass or a verifier check? I apologize for losing track. > > > I think we have ruled out a MIR/IR verifier pass, but flagging it in llvm-dwarfdump somehow would still be nice and I wrote a ticket for fixing up the --statistics (PR46575). Instead, I think the question is now whether this should happen earlier in some way to reduce the number of redundant intrinsics, as David says in his reply below. > > In D82129#2134609, @dblaikie wrote: > > > My take on it is that it's probably not practical to do this as a cleanup - it'd mean any time we merge debug locations, etc, we'd have to go check for isolated variable locations that have become invalid. 
> > > > (though, inversely: I worry that not cleaning up those variable locations might be a source of IR bloat and algorithmic scaling problems when the debug locations are scanned... ) > > > I chose to do the trimming here because I can say with confidence that it won't cause any coverage or correctness regressions. I agree that it would be nice to remove redundant intrinsics, though I'm not exactly sure what that implementation would entail without further investigation. Is anyone able to offer any insight on this? If not, I suppose it might be reasonable to attempt to do this earlier (in IR) to see if there are any problems, and compare the results. Though I won't be able to get on this for a little while myself. I don't have any particular insight on that, no. If no one else is stepping up, this patch as-is (though I haven't reviewed the implementation in detail) seems like a reasonable improvement at least, and should be acceptable. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82129/new/ https://reviews.llvm.org/D82129 From llvm-commits at lists.llvm.org Tue Jul 7 17:43:42 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:43:42 +0000 (UTC) Subject: [PATCH] D78861: [Attributor] [WIP] Track AA dependency using dependency graph In-Reply-To: References: Message-ID: jdoerfert accepted this revision. jdoerfert added a comment. This revision is now accepted and ready to land. Some minor things (below and from the last comment). LGTM otherwise. Thx :) ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:2072 + static bool classof(const AADepGraphNode *DGN) { return true; } + ---------------- Add a comment here and explain (especially) why we only return true. ================ Comment at: llvm/lib/Transforms/IPO/Attributor.cpp:2175 + for (AbstractAttribute *AA : AAs) + AA->printWithDeps(errs()); +} ---------------- Nit: use `outs` for `print`. 
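For readers following the `classof` request above, here is one possible shape for such a comment, as a minimal sketch rather than the code from the patch. `DepGraphNode`, `Attribute` and `isaSketch` are hypothetical stand-ins invented for illustration, and the stated reason for returning true (every node that reaches a cast is an attribute) is an assumption, not something confirmed in this thread.

```cpp
// Hypothetical stand-ins for the dependency-graph classes under review; only
// the shape of classof() is meant to mirror the line being discussed.
struct DepGraphNode {
  virtual ~DepGraphNode() = default;
};

struct Attribute : DepGraphNode {
  // Assumption for this sketch: every DepGraphNode that is ever handed to a
  // cast is an Attribute, so the check can succeed unconditionally. If a
  // second node kind existed (a synthetic root, say), callers would have to
  // avoid casting it, and the comment should say so.
  static bool classof(const DepGraphNode *) { return true; }
};

// Minimal imitation of an isa<>-style query built on top of classof().
template <typename To, typename From> bool isaSketch(const From *Node) {
  return To::classof(Node);
}
```

With this shape, `isaSketch<Attribute>(N)` is true for any `DepGraphNode *N`, which is exactly why the explanatory comment matters: the check encodes an invariant of the graph rather than a test of the node.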
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78861/new/ https://reviews.llvm.org/D78861 From llvm-commits at lists.llvm.org Tue Jul 7 17:43:45 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:43:45 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <7cabff7d7b30a5ee1eb3421ada9a5aaf@localhost.localdomain> dblaikie added a comment. In D82881#2135641 , @ABataev wrote: > In D82881#2134913 , @dblaikie wrote: > > > In D82881#2133548 , @ABataev wrote: > > > > > In D82881#2133511 , @aprantl wrote: > > > > > > > And conversely, with this patch applied, do GDB and LLDB still produce the expected result? > > > > > > > > > GDB works correctly. Did not check with lldb, but it also should work. The result is similar to the debug info, produced for the next code: > > > > > > struct { > > > short : 3; > > > short : 6; > > > } a; > > > > > > > > > Similar, but seems different in a critical way - in that code, the type of the field is short, which has size 2. Which matches the size of the field. > > > > I think it would be pretty surprising to handle DWARF where the size of a field is different from the size of the type of that field? > > > The standard clearly says: > A base type entry has a DW_AT_byte_size attribute, whose value is a constant, describing the size in bytes of the storage unit used to represent an object of the given type. > In our example, the storage size is the same, just like for short. The storage size is the same as what? It looks like/my understanding is this patch produces a field of type 'char' with size 2 bytes - which seems surprisingly inconsistent, at least. If there are other pre-existing examples of fields having larger sizes than their types, that might be useful to draw analogy and confidence from. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Tue Jul 7 17:44:37 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:44:37 +0000 (UTC) Subject: [PATCH] D82316: [LangRef] Add `noundef` attribute to documentation In-Reply-To: References: Message-ID: <154c4019145039d4fbadc599e6db301f@localhost.localdomain> guiand added a comment. Should I land this without waiting for the other two patches? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82316/new/ https://reviews.llvm.org/D82316 From llvm-commits at lists.llvm.org Tue Jul 7 17:45:44 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:45:44 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: <7f7ff9400c21a0801bb0211b6dcc9158@localhost.localdomain> stefanp updated this revision to Diff 276281. stefanp added a comment. Updated text for error message. Test cleanup by deleting lines that are not needed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Thunks.cpp lld/test/ELF/ppc64-error-toc-local-call.s lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s lld/test/ELF/ppc64-toc-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82950.276281.patch Type: text/x-patch Size: 7080 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:46:45 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:46:45 +0000 (UTC) Subject: [PATCH] D83244: [lld] Don't error out on relocations in .gcc_except_table to discarded sections. In-Reply-To: References: Message-ID: MaskRay added a comment. In D83244#2135329, @abidh wrote: > In D83244#2133669, @MaskRay wrote: > > > The `.eh_frame` test case is invalid. LLD handles .eh_frame input sections differently. It parses .eh_frame and deduplicates them. See `eh-frame-merge.s`, an input .eh_frame referencing a non-prevailing COMDAT group is dropped (EhFrameSection::isFdeLive) > > > > Do you have a realistic case where LLD erroneously errors? If so, can you get a minimal reproducer, use `LLD_REPRODUCE=/tmp/rep.tar` or `-Wl,--reproduce=/tmp/rep.tar` to get a reproduce file and upload it somewhere? > > > The problem that I faced is with gcc_except_table. I added .eh_frame for completeness' sake. The problem occurred while linking in libc++ for a bare-metal target. I will see whether I can get a minimal test case from that, but it may be difficult. 
The error looks like this: > > ld.lld: error: relocation refers to a symbol in a discarded section: > > >>> defined in /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a(locale.cpp.o) > >>> section group signature: _ZNSt3__116__pad_and_outputIcNS_11char_traitsIcEEEENS_19ostreambuf_iteratorIT_T0_EES6_PKS4_S8_S8_RNS_8ios_baseES4_ > >>> prevailing definition is in /tmp/test-48f00f.o > >>> referenced by locale.cpp > >>> > >>> locale.cpp.o:(.gcc_except_table+0x6FD) in archive /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a > >>> > >>> referenced by locale.cpp > >>> > >>> locale.cpp.o:(.gcc_except_table+0x706) in archive /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a > >>> > >>> referenced by locale.cpp > >>> > >>> locale.cpp.o:(.gcc_except_table+0x70A) in archive /opt/llvm/lib/clang-runtimes/riscv64-unknown-elf/rv64imac/lp64/medany/lib/libc++.a OK, please drop .eh_frame from the patch. I don't think a valid input can cause lld to choke. For .gcc_except_table, I still hope we can have a better understanding of the problem. If it is indeed a larger issue like .eh_frame/ppc64 .toc/ppc32 .got2, we may have to work around it, but I want to make sure we are not working around a riscv toolchain problem. If you can get a reproducer, I'd be happy to help analyze it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83244/new/ https://reviews.llvm.org/D83244 From llvm-commits at lists.llvm.org Tue Jul 7 17:48:37 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:48:37 +0000 (UTC) Subject: [PATCH] D83004: [UpdateCCTestChecks] Include generated functions if asked In-Reply-To: References: Message-ID: <206eade602cee464ba4d566b6282c24c@localhost.localdomain> jdoerfert added inline comments. 
================ Comment at: llvm/utils/update_cc_test_checks.py:133 + parser.add_argument('--include-generated-funcs', action='store_true', + help='Output checks for functions not in source') parser.add_argument('tests', nargs='+') ---------------- greened wrote: > jdoerfert wrote: > > greened wrote: > > > greened wrote: > > > > jdoerfert wrote: > > > > > I think this should go into common.py (after D78618). I would also make this the default but OK. > > > > Yes I suppose it should in case `opt` and friends generate functions. I hadn't considered that use-case. > > > > > > > > While I would like to make it default unfortunately it would require updating a bunch of the existing clang tests which doesn't seem too friendly. See the patch update comment for details. > > > > > > > Just realized it wouldn't necessarily require regeneration of tests, it would just cause regenerated tests to change a lot when they are eventually regenerated. We should discuss as to whether that's acceptable. I think for now this should be non-default to at least get the functionality in without disturbing existing users and then we can discuss a separate change to make it default. > > > > > > It's also possible we could change how clang orders functions. I discovered there's a difference in clang 10 vs. 11 in the order functions are output when OpenMP outlining happens. clang 10 seems to preserve the source order of functions and clang 11 does not. Perhaps that needs to be fixed as I don't know whether that change was intentional or not. > > Best case, without the option the original behavior is preserved. Is that not the case? > That's right. I was referring to making this behavior default. If we do that, we could clean up the script code a bit but it would mean clang tests would change pretty dramatically when they are regenerated. If we fix the clang output, the test changes wouldn't be so dramatic. 
> > The way clang is behaving now, I would expect any tests that use `-fopenmp`, have multiple functions with OpenMP regions and use function prototypes for some of those functions would break given clang's reordering of function definitions. Perhaps we don't have any tests like that though. We (almost) do not have OpenMP tests with autogenerated test lines. Partially, because we do not test new functions. I would really like this to be available for OpenMP, both in _cc_ and IR tests. If people can opt out of this, especially if the default is "off", the ordering is not a problem (IMHO). With UTC_ARGS we also remember the choice so I really don't see the downside to this being in the common part for all scripts. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83004/new/ https://reviews.llvm.org/D83004 From llvm-commits at lists.llvm.org Tue Jul 7 17:49:41 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:49:41 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: <68c90b811f71660ed3b66f44c7f2ef08@localhost.localdomain> stefanp updated this revision to Diff 276284. stefanp added a comment. Cleaned up third test as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Thunks.cpp lld/test/ELF/ppc64-error-toc-local-call.s lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s lld/test/ELF/ppc64-toc-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82950.276284.patch Type: text/x-patch Size: 7036 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:50:30 2020 From: llvm-commits at lists.llvm.org (Sid Manning via llvm-commits) Date: Tue, 07 Jul 2020 17:50:30 -0700 (PDT) Subject: [compiler-rt] baca8f9 - [compiler-rt][Hexagon] Remove fma/fmin/max code Message-ID: <5f051856.1c69fb81.c31ce.ccd4@mx.google.com> Author: Sid Manning Date: 2020-07-07T19:50:04-05:00 New Revision: baca8f977edce6edb0f9074b77a38c753f8f0c79 URL: https://github.com/llvm/llvm-project/commit/baca8f977edce6edb0f9074b77a38c753f8f0c79 DIFF: https://github.com/llvm/llvm-project/commit/baca8f977edce6edb0f9074b77a38c753f8f0c79.diff LOG: [compiler-rt][Hexagon] Remove fma/fmin/max code This code should reside in the c-library. Differential Revision: https://reviews.llvm.org/D82263 Added: Modified: compiler-rt/lib/builtins/CMakeLists.txt compiler-rt/lib/builtins/hexagon/dffma.S Removed: compiler-rt/lib/builtins/hexagon/fabs_opt.S compiler-rt/lib/builtins/hexagon/fma_opt.S compiler-rt/lib/builtins/hexagon/fmax_opt.S compiler-rt/lib/builtins/hexagon/fmin_opt.S ################################################################################ diff --git a/compiler-rt/lib/builtins/CMakeLists.txt b/compiler-rt/lib/builtins/CMakeLists.txt index aa0df8d5bfdf..5e3c901322ec 100644 --- a/compiler-rt/lib/builtins/CMakeLists.txt +++ b/compiler-rt/lib/builtins/CMakeLists.txt @@ -505,13 +505,9 @@ set(hexagon_SOURCES hexagon/dfsqrt.S hexagon/divdi3.S hexagon/divsi3.S - hexagon/fabs_opt.S hexagon/fastmath2_dlib_asm.S hexagon/fastmath2_ldlib_asm.S hexagon/fastmath_dlib_asm.S - hexagon/fma_opt.S - hexagon/fmax_opt.S - hexagon/fmin_opt.S hexagon/memcpy_forward_vp4cp4n2.S hexagon/memcpy_likely_aligned.S hexagon/moddi3.S diff --git a/compiler-rt/lib/builtins/hexagon/dffma.S b/compiler-rt/lib/builtins/hexagon/dffma.S index c201d3d8be5e..843e88b3cab8 100644 --- a/compiler-rt/lib/builtins/hexagon/dffma.S +++ 
b/compiler-rt/lib/builtins/hexagon/dffma.S @@ -104,13 +104,11 @@ .type __hexagon_fmadf4, at function .global __hexagon_fmadf5 .type __hexagon_fmadf5, at function - .global fma - .type fma, at function Q6_ALIAS(fmadf5) .p2align 5 __hexagon_fmadf4: __hexagon_fmadf5: -fma: +.Lfma_begin: { P_TMP = dfclass(A,#2) P_TMP = dfclass(B,#2) @@ -561,7 +559,7 @@ fma: B = insert(BTMP,#63,#0) AH -= asl(TMP,#HI_MANTBITS) } - jump fma + jump .Lfma_begin .Lfma_ab_tiny: ATMP = combine(##0x00100000,#0) @@ -569,7 +567,7 @@ fma: A = insert(ATMP,#63,#0) B = insert(ATMP,#63,#0) } - jump fma + jump .Lfma_begin .Lab_inf: { diff --git a/compiler-rt/lib/builtins/hexagon/fabs_opt.S b/compiler-rt/lib/builtins/hexagon/fabs_opt.S deleted file mode 100644 index 6bf9b84b3d20..000000000000 --- a/compiler-rt/lib/builtins/hexagon/fabs_opt.S +++ /dev/null @@ -1,36 +0,0 @@ -//===----------------------Hexagon builtin routine ------------------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -.macro FUNCTION_BEGIN name -.text -.p2align 5 -.globl \name -.type \name, @function -\name: -.endm - -.macro FUNCTION_END name -.size \name, . - \name -.endm - -FUNCTION_BEGIN fabs - { - r1 = clrbit(r1, #31) - jumpr r31 - } -FUNCTION_END fabs - -FUNCTION_BEGIN fabsf - { - r0 = clrbit(r0, #31) - jumpr r31 - } -FUNCTION_END fabsf - - .globl fabsl - .set fabsl, fabs diff --git a/compiler-rt/lib/builtins/hexagon/fma_opt.S b/compiler-rt/lib/builtins/hexagon/fma_opt.S deleted file mode 100644 index 7f566adffd6a..000000000000 --- a/compiler-rt/lib/builtins/hexagon/fma_opt.S +++ /dev/null @@ -1,30 +0,0 @@ -//===----------------------Hexagon builtin routine ------------------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 
-// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -.macro FUNCTION_BEGIN name -.text -.p2align 5 -.globl \name -.type \name, @function -\name: -.endm - -.macro FUNCTION_END name -.size \name, . - \name -.endm - -FUNCTION_BEGIN fmaf - r2 += sfmpy(r0, r1) - { - r0 = r2 - jumpr r31 - } -FUNCTION_END fmaf - - .globl fmal - .set fmal, fma diff --git a/compiler-rt/lib/builtins/hexagon/fmax_opt.S b/compiler-rt/lib/builtins/hexagon/fmax_opt.S deleted file mode 100644 index 81d711dff8d2..000000000000 --- a/compiler-rt/lib/builtins/hexagon/fmax_opt.S +++ /dev/null @@ -1,29 +0,0 @@ -//===----------------------Hexagon builtin routine ------------------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -.macro FUNCTION_BEGIN name -.text -.p2align 5 -.globl \name -.type \name, @function -\name: -.endm - -.macro FUNCTION_END name -.size \name, . - \name -.endm - -FUNCTION_BEGIN fmaxf - { - r0 = sfmax(r0, r1) - jumpr r31 - } -FUNCTION_END fmaxf - - .globl fmaxl - .set fmaxl, fmax diff --git a/compiler-rt/lib/builtins/hexagon/fmin_opt.S b/compiler-rt/lib/builtins/hexagon/fmin_opt.S deleted file mode 100644 index d043f1d7a698..000000000000 --- a/compiler-rt/lib/builtins/hexagon/fmin_opt.S +++ /dev/null @@ -1,29 +0,0 @@ -//===----------------------Hexagon builtin routine ------------------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. 
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -.macro FUNCTION_BEGIN name -.text -.p2align 5 -.globl \name -.type \name, @function -\name: -.endm - -.macro FUNCTION_END name -.size \name, . - \name -.endm - -FUNCTION_BEGIN fminf - { - r0 = sfmin(r0, r1) - jumpr r31 - } -FUNCTION_END fminf - - .globl fminl - .set fminl, fmin From llvm-commits at lists.llvm.org Tue Jul 7 17:50:32 2020 From: llvm-commits at lists.llvm.org (Sid Manning via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:50:32 +0000 (UTC) Subject: [PATCH] D82263: [Hexagon] Cleanup compiler-rt.builtins remove code that belongs in the c-library In-Reply-To: References: Message-ID: <91a5c4ffdc2d11753f02c48da75e5229@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbaca8f977edc: [compiler-rt][Hexagon] Remove fma/fmin/max code (authored by sidneym). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82263/new/ https://reviews.llvm.org/D82263 Files: compiler-rt/lib/builtins/CMakeLists.txt compiler-rt/lib/builtins/hexagon/dffma.S compiler-rt/lib/builtins/hexagon/fabs_opt.S compiler-rt/lib/builtins/hexagon/fma_opt.S compiler-rt/lib/builtins/hexagon/fmax_opt.S compiler-rt/lib/builtins/hexagon/fmin_opt.S -------------- next part -------------- A non-text attachment was scrubbed... Name: D82263.276286.patch Type: text/x-patch Size: 4939 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:50:54 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:50:54 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: smeenai accepted this revision. smeenai added a comment. LGTM. 
================ Comment at: llvm/docs/CommandGuide/llvm-libtool-darwin.rst:17 + +For most scenarios, it works as a drop-in replacement for cctool's +:program:`libtool`. ---------------- Nit: The package name is "cctools", so you should say "cctools'" (trailing apostrophe) instead of "cctool's". ================ Comment at: llvm/test/tools/llvm-libtool-darwin/basic.test:1 +## This test checks that main exits normally (EC 0) for correct input/output args. + ---------------- Nit: just spell out "error code" instead of abbreviating it as EC. ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:18 + +static cl::opt OutputFile("output", + cl::desc("Specify output filename"), ---------------- As far as I can see, cctools libtool doesn't support the `-output` spelling, only `-o`. Is there any reason for us to support it? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Tue Jul 7 17:52:46 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:52:46 +0000 (UTC) Subject: [PATCH] D82316: [LangRef] Add `noundef` attribute to documentation In-Reply-To: References: Message-ID: <0ca5a18719039659f8f62e5e4b458bc1@localhost.localdomain> jdoerfert added a comment. In D82316#2137939 , @guiand wrote: > Should I land this without waiting for the other two patches? Either way works. 
Don't want us to forget to land it ;) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82316/new/ https://reviews.llvm.org/D82316 From llvm-commits at lists.llvm.org Tue Jul 7 17:53:03 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:53:03 +0000 (UTC) Subject: [PATCH] D79873: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbbp asm instructions In-Reply-To: References: Message-ID: <03cf476e6bff6e77ae7155ed0efbd149@localhost.localdomain> PaoloS updated this revision to Diff 276288. PaoloS added a comment. Added missing pattern-matching for *w instructions. Added tests. Added both i32 and i64 code versions to both i32 and i64 test files. Removed NOT labels from tests. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79873/new/ https://reviews.llvm.org/D79873 Files: llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVInstrInfoB.td llvm/test/CodeGen/RISCV/rv32Zbbp.ll llvm/test/CodeGen/RISCV/rv64Zbbp.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79873.276288.patch Type: text/x-patch Size: 26199 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:55:27 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 00:55:27 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if Message-ID: clementval created this revision. clementval added reviewers: jdoerfert, jdenny. Herald added subscribers: llvm-commits, sstefan1, guansong, yaxunl. Herald added a project: LLVM. Change the test in isAllowedClauseForDirective from a single if with multiple conditions to a main switch on the directive, followed by a switch on the clause for each directive. The version check is still done with a condition in the return statement.
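The restructuring described in this revision can be sketched roughly as follows. This is only an illustration, not the code that TableGen actually emits: the `Directive` and `Clause` enumerators, the `inRange` helper, and the version bounds are all hypothetical stand-ins.

```cpp
#include <climits>

// Hypothetical stand-ins for the generated enums; not the real OMP.td output.
enum class Directive { Parallel, Taskwait };
enum class Clause { If, NumThreads, Depend };

// Hypothetical helper: true when Version lies in the inclusive [Min, Max] range.
constexpr bool inRange(unsigned Version, unsigned Min, unsigned Max) {
  return Min <= Version && Version <= Max;
}

// Outer switch on the directive, inner switch on the clause. The version
// check stays a plain condition in the return statement.
bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) {
  switch (D) {
  case Directive::Parallel:
    switch (C) {
    case Clause::If:
      return inRange(Version, 1, UINT_MAX);
    case Clause::NumThreads:
      return inRange(Version, 1, UINT_MAX);
    default:
      return false;
    }
  case Directive::Taskwait:
    switch (C) {
    case Clause::Depend:
      return inRange(Version, 50, UINT_MAX); // assumed bound, for illustration
    default:
      return false;
    }
  }
  return false; // not reached for valid enumerators
}
```

Compared with one long chain of `if (D == ... && C == ... && version-check)` tests, the nested form lets the compiler build jump tables and keeps each directive's clause list in one place.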
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83363 Files: llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83363.276289.patch Type: text/x-patch Size: 7583 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 17:56:05 2020 From: llvm-commits at lists.llvm.org (Wouter van Oortmerssen via llvm-commits) Date: Tue, 07 Jul 2020 17:56:05 -0700 (PDT) Subject: [llvm] fd0964a - [WebAssembly] fix gcc 10 warning Message-ID: <5f0519a5.1c69fb81.1ea66.c5e8@mx.google.com> Author: Wouter van Oortmerssen Date: 2020-07-07T17:55:37-07:00 New Revision: fd0964ae8340d24ce7991767fbbfe4bc01af87b3 URL: https://github.com/llvm/llvm-project/commit/fd0964ae8340d24ce7991767fbbfe4bc01af87b3 DIFF: https://github.com/llvm/llvm-project/commit/fd0964ae8340d24ce7991767fbbfe4bc01af87b3.diff LOG: [WebAssembly] fix gcc 10 warning Added: Modified: llvm/include/llvm/BinaryFormat/Wasm.h llvm/lib/MC/WasmObjectWriter.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/BinaryFormat/Wasm.h b/llvm/include/llvm/BinaryFormat/Wasm.h index d8d72cacf226..1aca692e30a7 100644 --- a/llvm/include/llvm/BinaryFormat/Wasm.h +++ b/llvm/include/llvm/BinaryFormat/Wasm.h @@ -280,6 +280,7 @@ enum : unsigned { }; enum : unsigned { + WASM_LIMITS_FLAG_NONE = 0x0, WASM_LIMITS_FLAG_HAS_MAX = 0x1, WASM_LIMITS_FLAG_IS_SHARED = 0x2, WASM_LIMITS_FLAG_IS_64 = 0x4, diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp index d1290b050ef2..f51d908c53e1 100644 --- a/llvm/lib/MC/WasmObjectWriter.cpp +++ b/llvm/lib/MC/WasmObjectWriter.cpp @@ -1198,7 +1198,8 @@ uint64_t WasmObjectWriter::writeObject(MCAssembler &Asm, MemImport.Module = "env"; MemImport.Field = "__linear_memory"; MemImport.Kind = wasm::WASM_EXTERNAL_MEMORY; - 
MemImport.Memory.Flags = is64Bit() ? wasm::WASM_LIMITS_FLAG_IS_64 : 0; + MemImport.Memory.Flags = is64Bit() ? wasm::WASM_LIMITS_FLAG_IS_64 + : wasm::WASM_LIMITS_FLAG_NONE; Imports.push_back(MemImport); // For now, always emit the table section, since indirect calls are not From llvm-commits at lists.llvm.org Tue Jul 7 18:00:22 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:00:22 +0000 (UTC) Subject: [PATCH] D60413: [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: dnsampaio updated this revision to Diff 276291. dnsampaio added a comment. Moved to BDCE. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 Files: llvm/lib/Transforms/Scalar/BDCE.cpp llvm/test/Transforms/AggressiveInstCombine/sext_multi_uses.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D60413.276291.patch Type: text/x-patch Size: 5468 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 18:01:21 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Tue, 07 Jul 2020 18:01:21 -0700 (PDT) Subject: [llvm] f1d290d - [X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def. Message-ID: <5f051ae1.1c69fb81.f5c3c.9556@mx.google.com> Author: Craig Topper Date: 2020-07-07T17:59:54-07:00 New Revision: f1d290d81298092b693076725cef4f34e951e974 URL: https://github.com/llvm/llvm-project/commit/f1d290d81298092b693076725cef4f34e951e974 DIFF: https://github.com/llvm/llvm-project/commit/f1d290d81298092b693076725cef4f34e951e974.diff LOG: [X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def. These represent the same thing but 64BIT only showed up from getHostCPUFeatures providing a list of features to clang. While EM64T showed up from getting the features for a named CPU.
EM64T didn't have a string specifically so it would not be passed up to clang when getting features for a named CPU. While 64bit needed a name since that's how it is indexed. Merge them by filtering 64bit out before sending features to clang for named CPUs. Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Analysis/InstructionSimplify.cpp llvm/lib/Support/Host.cpp llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index 9910fd615b1d..ed41295166b3 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -184,10 +184,6 @@ X86_FEATURE (CLWB, "clwb") X86_FEATURE (CLZERO, "clzero") X86_FEATURE (CMPXCHG16B, "cx16") X86_FEATURE (CMPXCHG8B, "cx8") -// FIXME: Merge with 64BIT? Currently separate to be used to tell if CPU is -// valid for 64-bit mode, but has empty string so it doesn't get added to -// target attributes in IR. 
-X86_FEATURE (EM64T, "") X86_FEATURE (ENQCMD, "enqcmd") X86_FEATURE (F16C, "f16c") X86_FEATURE (FSGSBASE, "fsgsbase") diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp index df4abe09797c..723bea7c2ad7 100644 --- a/llvm/lib/Analysis/InstructionSimplify.cpp +++ b/llvm/lib/Analysis/InstructionSimplify.cpp @@ -4117,11 +4117,6 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal, if (TrueVal == FalseVal) return TrueVal; - if (isa(TrueVal)) // select ?, undef, X -> X - return FalseVal; - if (isa(FalseVal)) // select ?, X, undef -> X - return TrueVal; - // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index 3a7d9a0242fa..db99612c97b5 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -868,7 +868,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; } - if (testFeature(X86::FEATURE_EM64T)) { + if (testFeature(X86::FEATURE_64BIT)) { *Type = X86::INTEL_CORE2; // "core2" *Subtype = X86::INTEL_CORE2_65; break; @@ -894,7 +894,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; case 15: { - if (testFeature(X86::FEATURE_EM64T)) { + if (testFeature(X86::FEATURE_64BIT)) { *Type = X86::INTEL_NOCONA; break; } @@ -1140,7 +1140,7 @@ static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, setFeature(X86::FEATURE_FMA4); if (HasExtLeaf1 && ((EDX >> 29) & 1)) - setFeature(X86::FEATURE_EM64T); + setFeature(X86::FEATURE_64BIT); } StringRef sys::getHostCPUName() { diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index df03f63e720e..cbb7f6186d0d 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -48,6 +48,14 @@ class FeatureBitset { return 
(Bits[I / 32] & Mask) != 0; } + constexpr FeatureBitset &operator&=(const FeatureBitset &RHS) { + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { + uint32_t NewBits = Bits[I] & RHS.Bits[I]; + Bits[I] = NewBits; + } + return *this; + } + constexpr FeatureBitset &operator|=(const FeatureBitset &RHS) { for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { uint32_t NewBits = Bits[I] | RHS.Bits[I]; @@ -57,16 +65,14 @@ class FeatureBitset { } constexpr FeatureBitset operator&(const FeatureBitset &RHS) const { - FeatureBitset Result; - for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) - Result.Bits[I] = Bits[I] & RHS.Bits[I]; + FeatureBitset Result = *this; + Result &= RHS; return Result; } constexpr FeatureBitset operator|(const FeatureBitset &RHS) const { - FeatureBitset Result; - for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) - Result.Bits[I] = Bits[I] | RHS.Bits[I]; + FeatureBitset Result = *this; + Result |= RHS; return Result; } @@ -111,10 +117,10 @@ static constexpr FeatureBitset FeaturesPentium4 = static constexpr FeatureBitset FeaturesPrescott = FeaturesPentium4 | FeatureSSE3; static constexpr FeatureBitset FeaturesNocona = - FeaturesPrescott | FeatureEM64T | FeatureCMPXCHG16B; + FeaturesPrescott | Feature64BIT | FeatureCMPXCHG16B; // Basic 64-bit capable CPU. 
-static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | FeatureEM64T; +static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | Feature64BIT; // Intel Core CPUs static constexpr FeatureBitset FeaturesCore2 = @@ -201,7 +207,7 @@ static constexpr FeatureBitset FeaturesAthlon = static constexpr FeatureBitset FeaturesAthlonXP = FeaturesAthlon | FeatureFXSR | FeatureSSE; static constexpr FeatureBitset FeaturesK8 = - FeaturesAthlonXP | FeatureSSE2 | FeatureEM64T; + FeaturesAthlonXP | FeatureSSE2 | Feature64BIT; static constexpr FeatureBitset FeaturesK8SSE3 = FeaturesK8 | FeatureSSE3; static constexpr FeatureBitset FeaturesAMDFAM10 = FeaturesK8SSE3 | FeatureCMPXCHG16B | FeatureLZCNT | FeaturePOPCNT | @@ -209,7 +215,7 @@ static constexpr FeatureBitset FeaturesAMDFAM10 = // Bobcat architecture processors. static constexpr FeatureBitset FeaturesBTVER1 = - FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | + FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeaturePOPCNT | FeaturePRFCHW | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_A | FeatureSAHF; @@ -220,7 +226,7 @@ static constexpr FeatureBitset FeaturesBTVER2 = // AMD Bulldozer architecture processors. 
static constexpr FeatureBitset FeaturesBDVER1 = FeatureX87 | FeatureAES | FeatureAVX | FeatureCMPXCHG8B | - FeatureCMPXCHG16B | FeatureEM64T | FeatureFMA4 | FeatureFXSR | FeatureLWP | + FeatureCMPXCHG16B | Feature64BIT | FeatureFMA4 | FeatureFXSR | FeatureLWP | FeatureLZCNT | FeatureMMX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureSAHF | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4_A | FeatureXOP | FeatureXSAVE; @@ -236,7 +242,7 @@ static constexpr FeatureBitset FeaturesBDVER4 = static constexpr FeatureBitset FeaturesZNVER1 = FeatureX87 | FeatureADX | FeatureAES | FeatureAVX | FeatureAVX2 | FeatureBMI | FeatureBMI2 | FeatureCLFLUSHOPT | FeatureCLZERO | - FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureF16C | + FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureF16C | FeatureFMA | FeatureFSGSBASE | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeatureMOVBE | FeatureMWAITX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureRDRND | FeatureRDSEED | FeatureSAHF | FeatureSHA | @@ -363,7 +369,7 @@ static constexpr ProcInfo Processors[] = { X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { for (const auto &P : Processors) - if (P.Name == CPU && (P.Features[FEATURE_EM64T] || !Only64Bit)) + if (P.Name == CPU && (P.Features[FEATURE_64BIT] || !Only64Bit)) return P.Kind; return CK_None; @@ -372,7 +378,7 @@ X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { void llvm::X86::fillValidCPUArchList(SmallVectorImpl &Values, bool Only64Bit) { for (const auto &P : Processors) - if (!P.Name.empty() && (P.Features[FEATURE_EM64T] || !Only64Bit)) + if (!P.Name.empty() && (P.Features[FEATURE_64BIT] || !Only64Bit)) Values.emplace_back(P.Name); } @@ -401,7 +407,6 @@ static constexpr FeatureBitset ImpliedFeaturesCLZERO = {}; static constexpr FeatureBitset ImpliedFeaturesCMOV = {}; static constexpr FeatureBitset ImpliedFeaturesCMPXCHG16B = {}; static constexpr 
FeatureBitset ImpliedFeaturesCMPXCHG8B = {}; -static constexpr FeatureBitset ImpliedFeaturesEM64T = {}; static constexpr FeatureBitset ImpliedFeaturesENQCMD = {}; static constexpr FeatureBitset ImpliedFeaturesFSGSBASE = {}; static constexpr FeatureBitset ImpliedFeaturesFXSR = {}; @@ -528,8 +533,14 @@ void llvm::X86::getFeaturesForCPU(StringRef CPU, [&](const ProcInfo &P) { return P.Name == CPU; }); assert(I != std::end(Processors) && "Processor not found!"); + FeatureBitset Bits = I->Features; + + // Remove the 64-bit feature which we only use to validate if a CPU can + // be used with 64-bit mode. + Bits &= ~Feature64BIT; + // Add the string version of all set bits. - getFeatureBitsAsStrings(I->Features, EnabledFeatures); + getFeatureBitsAsStrings(Bits, EnabledFeatures); } // For each feature that is (transitively) implied by this feature, set it. From llvm-commits at lists.llvm.org Tue Jul 7 18:04:59 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:04:59 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend Message-ID: Conanap created this revision. Conanap added reviewers: power-llvm-team, PowerPC, saghir, nemanjai, hfinkel. Conanap added projects: LLVM, clang, PowerPC. Includes instruction definitions and MC tests for the above instructions.
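As a rough cross-check of the encodings that the MC tests in this revision expect, the sketch below packs the fields of one of these instructions into a 32-bit word. The field layout is an assumption inferred from the test encodings rather than taken from the patch, and `encodeXForm` is a hypothetical helper that does not exist in LLVM.

```cpp
#include <cstdint>

// Hypothetical helper: pack an X-form instruction with a 6-bit VSX register.
// Assumed layout, most significant field first:
//   opcode(6) | XReg[4:0](5) | RA(5) | RB(5) | XO(10) | XReg[5](1)
constexpr uint32_t encodeXForm(uint32_t Opcode, uint32_t XO, uint32_t XReg,
                               uint32_t RA, uint32_t RB) {
  return (Opcode << 26) | ((XReg & 31u) << 21) | (RA << 16) | (RB << 11) |
         (XO << 1) | (XReg >> 5);
}

// lxvrbx 32, 1, 2 is expected as bytes 0x7c 0x01 0x10 0x1b (big-endian).
static_assert(encodeXForm(31, 13, 32, 1, 2) == 0x7C01101Bu, "lxvrbx");
// stxvrdx 35, 3, 1 is expected as bytes 0x7c 0x63 0x09 0xdb (big-endian).
static_assert(encodeXForm(31, 237, 35, 3, 1) == 0x7C6309DBu, "stxvrdx");
```

The opcode pairs (31, 13) and (31, 237) are the values passed to `X_XT6_RA5_RB5` and `X_XS6_RA5_RB5` in the .td change, and the little-endian CHECK lines list the same four bytes in reverse order.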
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83364 Files: llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s Index: llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s =================================================================== --- llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s +++ llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s @@ -405,3 +405,27 @@ # CHECK-BE: vinsdrx 1, 2, 3 # encoding: [0x10,0x22,0x1b,0xcf] # CHECK-LE: vinsdrx 1, 2, 3 # encoding: [0xcf,0x1b,0x22,0x10] vinsdrx 1, 2, 3 +# CHECK-BE: lxvrbx 32, 1, 2 # encoding: [0x7c,0x01,0x10,0x1b] +# CHECK-LE: lxvrbx 32, 1, 2 # encoding: [0x1b,0x10,0x01,0x7c] + lxvrbx 32, 1, 2 +# CHECK-BE: lxvrhx 33, 1, 2 # encoding: [0x7c,0x21,0x10,0x5b] +# CHECK-LE: lxvrhx 33, 1, 2 # encoding: [0x5b,0x10,0x21,0x7c] + lxvrhx 33, 1, 2 +# CHECK-BE: lxvrdx 34, 1, 2 # encoding: [0x7c,0x41,0x10,0xdb] +# CHECK-LE: lxvrdx 34, 1, 2 # encoding: [0xdb,0x10,0x41,0x7c] + lxvrdx 34, 1, 2 +# CHECK-BE: lxvrwx 35, 1, 2 # encoding: [0x7c,0x61,0x10,0x9b] +# CHECK-LE: lxvrwx 35, 1, 2 # encoding: [0x9b,0x10,0x61,0x7c] + lxvrwx 35, 1, 2 +# CHECK-BE: stxvrbx 32, 3, 1 # encoding: [0x7c,0x03,0x09,0x1b] +# CHECK-LE: stxvrbx 32, 3, 1 # encoding: [0x1b,0x09,0x03,0x7c] + stxvrbx 32, 3, 1 +# CHECK-BE: stxvrhx 33, 3, 1 # encoding: [0x7c,0x23,0x09,0x5b] +# CHECK-LE: stxvrhx 33, 3, 1 # encoding: [0x5b,0x09,0x23,0x7c] + stxvrhx 33, 3, 1 +# CHECK-BE: stxvrwx 34, 3, 1 # encoding: [0x7c,0x43,0x09,0x9b] +# CHECK-LE: stxvrwx 34, 3, 1 # encoding: [0x9b,0x09,0x43,0x7c] + stxvrwx 34, 3, 1 +# CHECK-BE: stxvrdx 35, 3, 1 # encoding: [0x7c,0x63,0x09,0xdb] +# CHECK-LE: stxvrdx 35, 3, 1 # encoding: [0xdb,0x09,0x63,0x7c] + stxvrdx 35, 3, 1 Index: llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt =================================================================== --- llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt +++ 
llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt @@ -278,3 +278,27 @@ # CHECK: vinsdrx 1, 2, 3 0x10 0x22 0x1b 0xcf + +# CHECK: lxvrbx 32, 1, 2 +0x7c 0x01 0x10 0x1b + +# CHECK: lxvrhx 33, 1, 2 +0x7c 0x21 0x10 0x5b + +# CHECK: lxvrdx 34, 1, 2 +0x7c 0x41 0x10 0xdb + +# CHECK: lxvrwx 35, 1, 2 +0x7c 0x61 0x10 0x9b + +# CHECK: stxvrbx 32, 3, 1 +0x7c 0x03 0x09 0x1b + +# CHECK: stxvrhx 33, 3, 1 +0x7c 0x23 0x09 0x5b + +# CHECK: stxvrwx 34, 3, 1 +0x7c 0x43 0x09 0x9b + +# CHECK: stxvrdx 35, 3, 1 +0x7c 0x63 0x09 0xdb Index: llvm/lib/Target/PowerPC/PPCInstrPrefix.td =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -428,6 +428,22 @@ def PrefixInstrs : Predicate<"Subtarget->hasPrefixInstrs()">; def IsISA3_1 : Predicate<"Subtarget->isISA3_1()">; +let mayLoad = 1, mayStore = 0, Predicates = [IsISA3_1] in { + // The XFormMemOp flag is set on the instruction format. + def LXVRBX : X_XT6_RA5_RB5<31, 13, "lxvrbx", vsrc, []>; + def LXVRHX : X_XT6_RA5_RB5<31, 45, "lxvrhx", vsrc, []>; + def LXVRWX : X_XT6_RA5_RB5<31, 77, "lxvrwx", vsrc, []>; + def LXVRDX : X_XT6_RA5_RB5<31, 109, "lxvrdx", vsrc, []>; +} + +let mayLoad = 0, mayStore = 1, Predicates = [IsISA3_1] in { + // The XFormMemOp flag is set on the instruction format. + def STXVRBX : X_XS6_RA5_RB5<31, 141, "stxvrbx", vsrc, []>; + def STXVRHX : X_XS6_RA5_RB5<31, 173, "stxvrhx", vsrc, []>; + def STXVRWX : X_XS6_RA5_RB5<31, 205, "stxvrwx", vsrc, []>; + def STXVRDX : X_XS6_RA5_RB5<31, 237, "stxvrdx", vsrc, []>; +} + let Predicates = [PrefixInstrs] in { let Interpretation64Bit = 1, isCodeGenOnly = 1 in { defm PADDI8 : -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83364.276282.patch Type: text/x-patch Size: 3953 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 18:10:47 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:10:47 +0000 (UTC) Subject: [PATCH] D82262: [RISCV] optimize addition with a pair of (addi imm) In-Reply-To: References: Message-ID: <0c6c946e77881ddb677a32ef3d79c9c4@localhost.localdomain> benshi001 updated this revision to Diff 276293. benshi001 retitled this revision from "[RISCV] Optimize addition with an immediate" to "[RISCV] optimize addition with a pair of (addi imm)". CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82262/new/ https://reviews.llvm.org/D82262 Files: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/test/CodeGen/RISCV/add-imm.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82262.276293.patch Type: text/x-patch Size: 7616 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 18:18:50 2020 From: llvm-commits at lists.llvm.org (Sid Manning via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:18:50 +0000 (UTC) Subject: [PATCH] D82263: [Hexagon] Cleanup compiler-rt.builtins remove code that belongs in the c-library In-Reply-To: References: Message-ID: <7adbf92fd261f94cf950b050856d3098@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbaca8f977edc: [compiler-rt][Hexagon] Remove fma/fmin/max code (authored by sidneym). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82263/new/ https://reviews.llvm.org/D82263 Files: compiler-rt/lib/builtins/CMakeLists.txt compiler-rt/lib/builtins/hexagon/dffma.S compiler-rt/lib/builtins/hexagon/fabs_opt.S compiler-rt/lib/builtins/hexagon/fma_opt.S compiler-rt/lib/builtins/hexagon/fmax_opt.S compiler-rt/lib/builtins/hexagon/fmin_opt.S -------------- next part -------------- A non-text attachment was scrubbed... Name: D82263.275693.patch Type: text/x-patch Size: 4939 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 18:20:01 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:20:01 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <4f578d7233010093af279c1c2774770e@localhost.localdomain> smeenai added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:18 +# RUN: llvm-nm --print-armap %t.lib | \ +# RUN: FileCheck %s --check-prefix=CHECK-SYMBOLS -DPREFIX=%basename_t.tmp + ---------------- Might wanna add `--match-full-lines` here, to ensure there's no unexpected symbol name prefix or member name suffix. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:32 +# FORMAT-NEXT: [[PREFIX]]-input2.o +# FORMAT_NOT: {{.}} + ---------------- You have an underscore instead of a dash :) Is the purpose to ensure that there's no other members? I assume a -EMPTY would work for that. We should also check for the "Archive : " header, to ensure there's no members before the table of contents member. 
================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:47 + +## Duplicate a binary: +# RUN: llvm-libtool-darwin -static -o %t.lib %t-input1.o %t-input2.o %t-input1.o ---------------- Might be worth noting that cctools libtool warns in this case, and we don't implement that warning yet. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/hide-unrelated-options.test:1 +## This test checks that unrelated options are hidden in help text. + ---------------- This seems unrelated to this diff; perhaps it should be in the previous one? ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:1 ## This test checks that an error is thrown in case of invalid input/output args. ---------------- Can you add `-static` to all of the tests in this file, so that the only invalid part of the command line is the aspect being tested? ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:28 + cl::value_desc("filename"), cl::Required, + cl::cat(LibtoolCategory)); static cl::alias OutputFileShort("o", cl::desc("Alias for -output"), ---------------- sameerarora101 wrote: > jhenderson wrote: > > Adding the categories sounds like a different change to me? You might want to include it alongside a test case to show that unrelated options aren't included. > yup, I added `OptionCategory` in order to prevent the general option `--safepoint-ir-verifier-print-only` from showing up in the help text. Added a test case for it as well. Thanks Yup, this should either go in the previous diff or be its own diff.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Tue Jul 7 18:20:55 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Tue, 07 Jul 2020 18:20:55 -0700 (PDT) Subject: [llvm] d92bf71 - Revert "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def." Message-ID: <5f051f77.1c69fb81.74009.c5f6@mx.google.com> Author: Craig Topper Date: 2020-07-07T18:20:07-07:00 New Revision: d92bf71a07c1787b535f8eb9deb27a4f11ac2d44 URL: https://github.com/llvm/llvm-project/commit/d92bf71a07c1787b535f8eb9deb27a4f11ac2d44 DIFF: https://github.com/llvm/llvm-project/commit/d92bf71a07c1787b535f8eb9deb27a4f11ac2d44.diff LOG: Revert "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def." An accidental change snuck in here This reverts commit f1d290d81298092b693076725cef4f34e951e974. Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Analysis/InstructionSimplify.cpp llvm/lib/Support/Host.cpp llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index ed41295166b3..9910fd615b1d 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -184,6 +184,10 @@ X86_FEATURE (CLWB, "clwb") X86_FEATURE (CLZERO, "clzero") X86_FEATURE (CMPXCHG16B, "cx16") X86_FEATURE (CMPXCHG8B, "cx8") +// FIXME: Merge with 64BIT? Currently separate to be used to tell if CPU is +// valid for 64-bit mode, but has empty string so it doesn't get added to +// target attributes in IR. 
+X86_FEATURE (EM64T, "") X86_FEATURE (ENQCMD, "enqcmd") X86_FEATURE (F16C, "f16c") X86_FEATURE (FSGSBASE, "fsgsbase") diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp index 723bea7c2ad7..df4abe09797c 100644 --- a/llvm/lib/Analysis/InstructionSimplify.cpp +++ b/llvm/lib/Analysis/InstructionSimplify.cpp @@ -4117,6 +4117,11 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal, if (TrueVal == FalseVal) return TrueVal; + if (isa(TrueVal)) // select ?, undef, X -> X + return FalseVal; + if (isa(FalseVal)) // select ?, X, undef -> X + return TrueVal; + // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index db99612c97b5..3a7d9a0242fa 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -868,7 +868,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; } - if (testFeature(X86::FEATURE_64BIT)) { + if (testFeature(X86::FEATURE_EM64T)) { *Type = X86::INTEL_CORE2; // "core2" *Subtype = X86::INTEL_CORE2_65; break; @@ -894,7 +894,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; case 15: { - if (testFeature(X86::FEATURE_64BIT)) { + if (testFeature(X86::FEATURE_EM64T)) { *Type = X86::INTEL_NOCONA; break; } @@ -1140,7 +1140,7 @@ static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, setFeature(X86::FEATURE_FMA4); if (HasExtLeaf1 && ((EDX >> 29) & 1)) - setFeature(X86::FEATURE_64BIT); + setFeature(X86::FEATURE_EM64T); } StringRef sys::getHostCPUName() { diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index cbb7f6186d0d..df03f63e720e 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -48,14 +48,6 @@ class FeatureBitset { return 
(Bits[I / 32] & Mask) != 0; } - constexpr FeatureBitset &operator&=(const FeatureBitset &RHS) { - for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { - uint32_t NewBits = Bits[I] & RHS.Bits[I]; - Bits[I] = NewBits; - } - return *this; - } - constexpr FeatureBitset &operator|=(const FeatureBitset &RHS) { for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { uint32_t NewBits = Bits[I] | RHS.Bits[I]; @@ -65,14 +57,16 @@ class FeatureBitset { } constexpr FeatureBitset operator&(const FeatureBitset &RHS) const { - FeatureBitset Result = *this; - Result &= RHS; + FeatureBitset Result; + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) + Result.Bits[I] = Bits[I] & RHS.Bits[I]; return Result; } constexpr FeatureBitset operator|(const FeatureBitset &RHS) const { - FeatureBitset Result = *this; - Result |= RHS; + FeatureBitset Result; + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) + Result.Bits[I] = Bits[I] | RHS.Bits[I]; return Result; } @@ -117,10 +111,10 @@ static constexpr FeatureBitset FeaturesPentium4 = static constexpr FeatureBitset FeaturesPrescott = FeaturesPentium4 | FeatureSSE3; static constexpr FeatureBitset FeaturesNocona = - FeaturesPrescott | Feature64BIT | FeatureCMPXCHG16B; + FeaturesPrescott | FeatureEM64T | FeatureCMPXCHG16B; // Basic 64-bit capable CPU. 
-static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | Feature64BIT; +static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | FeatureEM64T; // Intel Core CPUs static constexpr FeatureBitset FeaturesCore2 = @@ -207,7 +201,7 @@ static constexpr FeatureBitset FeaturesAthlon = static constexpr FeatureBitset FeaturesAthlonXP = FeaturesAthlon | FeatureFXSR | FeatureSSE; static constexpr FeatureBitset FeaturesK8 = - FeaturesAthlonXP | FeatureSSE2 | Feature64BIT; + FeaturesAthlonXP | FeatureSSE2 | FeatureEM64T; static constexpr FeatureBitset FeaturesK8SSE3 = FeaturesK8 | FeatureSSE3; static constexpr FeatureBitset FeaturesAMDFAM10 = FeaturesK8SSE3 | FeatureCMPXCHG16B | FeatureLZCNT | FeaturePOPCNT | @@ -215,7 +209,7 @@ static constexpr FeatureBitset FeaturesAMDFAM10 = // Bobcat architecture processors. static constexpr FeatureBitset FeaturesBTVER1 = - FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | + FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeaturePOPCNT | FeaturePRFCHW | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_A | FeatureSAHF; @@ -226,7 +220,7 @@ static constexpr FeatureBitset FeaturesBTVER2 = // AMD Bulldozer architecture processors. 
static constexpr FeatureBitset FeaturesBDVER1 = FeatureX87 | FeatureAES | FeatureAVX | FeatureCMPXCHG8B | - FeatureCMPXCHG16B | Feature64BIT | FeatureFMA4 | FeatureFXSR | FeatureLWP | + FeatureCMPXCHG16B | FeatureEM64T | FeatureFMA4 | FeatureFXSR | FeatureLWP | FeatureLZCNT | FeatureMMX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureSAHF | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4_A | FeatureXOP | FeatureXSAVE; @@ -242,7 +236,7 @@ static constexpr FeatureBitset FeaturesBDVER4 = static constexpr FeatureBitset FeaturesZNVER1 = FeatureX87 | FeatureADX | FeatureAES | FeatureAVX | FeatureAVX2 | FeatureBMI | FeatureBMI2 | FeatureCLFLUSHOPT | FeatureCLZERO | - FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureF16C | + FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureF16C | FeatureFMA | FeatureFSGSBASE | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeatureMOVBE | FeatureMWAITX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureRDRND | FeatureRDSEED | FeatureSAHF | FeatureSHA | @@ -369,7 +363,7 @@ static constexpr ProcInfo Processors[] = { X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { for (const auto &P : Processors) - if (P.Name == CPU && (P.Features[FEATURE_64BIT] || !Only64Bit)) + if (P.Name == CPU && (P.Features[FEATURE_EM64T] || !Only64Bit)) return P.Kind; return CK_None; @@ -378,7 +372,7 @@ X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { void llvm::X86::fillValidCPUArchList(SmallVectorImpl &Values, bool Only64Bit) { for (const auto &P : Processors) - if (!P.Name.empty() && (P.Features[FEATURE_64BIT] || !Only64Bit)) + if (!P.Name.empty() && (P.Features[FEATURE_EM64T] || !Only64Bit)) Values.emplace_back(P.Name); } @@ -407,6 +401,7 @@ static constexpr FeatureBitset ImpliedFeaturesCLZERO = {}; static constexpr FeatureBitset ImpliedFeaturesCMOV = {}; static constexpr FeatureBitset ImpliedFeaturesCMPXCHG16B = {}; static constexpr 
FeatureBitset ImpliedFeaturesCMPXCHG8B = {}; +static constexpr FeatureBitset ImpliedFeaturesEM64T = {}; static constexpr FeatureBitset ImpliedFeaturesENQCMD = {}; static constexpr FeatureBitset ImpliedFeaturesFSGSBASE = {}; static constexpr FeatureBitset ImpliedFeaturesFXSR = {}; @@ -533,14 +528,8 @@ void llvm::X86::getFeaturesForCPU(StringRef CPU, [&](const ProcInfo &P) { return P.Name == CPU; }); assert(I != std::end(Processors) && "Processor not found!"); - FeatureBitset Bits = I->Features; - - // Remove the 64-bit feature which we only use to validate if a CPU can - // be used with 64-bit mode. - Bits &= ~Feature64BIT; - // Add the string version of all set bits. - getFeatureBitsAsStrings(Bits, EnabledFeatures); + getFeatureBitsAsStrings(I->Features, EnabledFeatures); } // For each feature that is (transitively) implied by this feature, set it. From llvm-commits at lists.llvm.org Tue Jul 7 18:27:54 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:27:54 +0000 (UTC) Subject: [PATCH] D82660: [RISCV] Optimize multiplication by specific immediates In-Reply-To: References: Message-ID: benshi001 updated this revision to Diff 276297. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82660/new/ https://reviews.llvm.org/D82660 Files: llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVISelLowering.h llvm/test/CodeGen/RISCV/mul.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82660.276297.patch Type: text/x-patch Size: 14293 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 18:49:48 2020 From: llvm-commits at lists.llvm.org (Kostya Serebryany via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:49:48 +0000 (UTC) Subject: [PATCH] D76665: [asan] Stop instrumenting user-defined ELF sections In-Reply-To: References: Message-ID: kcc added a comment. can we instead slap an attribute on these special variables? 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76665/new/ https://reviews.llvm.org/D76665 From llvm-commits at lists.llvm.org Tue Jul 7 18:50:39 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Tue, 07 Jul 2020 18:50:39 -0700 (PDT) Subject: [llvm] cb82de2 - [RISCV] Optimize multiplication by constant Message-ID: <5f05266f.1c69fb81.54774.5830@mx.google.com> Author: Ben Shi Date: 2020-07-07T18:50:24-07:00 New Revision: cb82de29601745d6c4beaf51ee1dbd1bf7acc186 URL: https://github.com/llvm/llvm-project/commit/cb82de29601745d6c4beaf51ee1dbd1bf7acc186 DIFF: https://github.com/llvm/llvm-project/commit/cb82de29601745d6c4beaf51ee1dbd1bf7acc186.diff LOG: [RISCV] Optimize multiplication by constant ... to shift/add or shift/sub. Do not enable it on riscv32 with the M extension where decomposeMulByConstant may not be an optimization. Reviewed By: luismarques, MaskRay Differential Revision: https://reviews.llvm.org/D82660 Added: Modified: llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVISelLowering.h llvm/test/CodeGen/RISCV/mul.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index b2e51516b983..91fc69b5bc10 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ -33,6 +33,7 @@ #include "llvm/IR/IntrinsicsRISCV.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" using namespace llvm; @@ -2978,6 +2979,26 @@ bool RISCVTargetLowering::shouldExtendTypeInLibCall(EVT Type) const { return true; } +bool RISCVTargetLowering::decomposeMulByConstant(LLVMContext &Context, EVT VT, + SDValue C) const { + // Check integral scalar types. + if (VT.isScalarInteger()) { + // Do not perform the transformation on riscv32 with the M extension. 
+ if (!Subtarget.is64Bit() && Subtarget.hasStdExtM()) + return false; + if (auto *ConstNode = dyn_cast(C.getNode())) { + if (ConstNode->getAPIntValue().getBitWidth() > 8 * sizeof(int64_t)) + return false; + int64_t Imm = ConstNode->getSExtValue(); + if (isPowerOf2_64(Imm + 1) || isPowerOf2_64(Imm - 1) || + isPowerOf2_64(1 - Imm) || isPowerOf2_64(-1 - Imm)) + return true; + } + } + + return false; +} + #define GET_REGISTER_MATCHER #include "RISCVGenAsmMatcher.inc" diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h index 691bb6d75d13..e420e879efc9 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.h +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h @@ -185,6 +185,9 @@ class RISCVTargetLowering : public TargetLowering { bool mayBeEmittedAsTailCall(const CallInst *CI) const override; bool shouldConsiderGEPOffsetSplit() const override { return true; } + bool decomposeMulByConstant(LLVMContext &Context, EVT VT, + SDValue C) const override; + TargetLowering::AtomicExpansionKind shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const override; Value *emitMaskedAtomicRMWIntrinsic(IRBuilder<> &Builder, AtomicRMWInst *AI, diff --git a/llvm/test/CodeGen/RISCV/mul.ll b/llvm/test/CodeGen/RISCV/mul.ll index 5808660b5713..89c4bce122fd 100644 --- a/llvm/test/CodeGen/RISCV/mul.ll +++ b/llvm/test/CodeGen/RISCV/mul.ll @@ -79,12 +79,8 @@ define signext i32 @mul(i32 %a, i32 %b) nounwind { define signext i32 @mul_constant(i32 %a) nounwind { ; RV32I-LABEL: mul_constant: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a1, zero, 5 -; RV32I-NEXT: call __mulsi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a1, a0, 2 +; RV32I-NEXT: add a0, a1, a0 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: mul_constant: @@ -95,19 +91,14 @@ define signext i32 @mul_constant(i32 %a) nounwind { ; ; RV64I-LABEL: mul_constant: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 
8(sp) -; RV64I-NEXT: addi a1, zero, 5 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: sext.w a0, a0 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 2 +; RV64I-NEXT: addw a0, a1, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: mul_constant: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, 5 -; RV64IM-NEXT: mulw a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 2 +; RV64IM-NEXT: addw a0, a1, a0 ; RV64IM-NEXT: ret %1 = mul i32 %a, 5 ret i32 %1 @@ -177,13 +168,15 @@ define i64 @mul64(i64 %a, i64 %b) nounwind { define i64 @mul64_constant(i64 %a) nounwind { ; RV32I-LABEL: mul64_constant: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a2, zero, 5 -; RV32I-NEXT: mv a3, zero -; RV32I-NEXT: call __muldi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a3, a0, 2 +; RV32I-NEXT: add a2, a3, a0 +; RV32I-NEXT: sltu a3, a2, a3 +; RV32I-NEXT: srli a0, a0, 30 +; RV32I-NEXT: slli a4, a1, 2 +; RV32I-NEXT: or a0, a4, a0 +; RV32I-NEXT: add a0, a0, a1 +; RV32I-NEXT: add a1, a0, a3 +; RV32I-NEXT: mv a0, a2 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: mul64_constant: @@ -197,18 +190,14 @@ define i64 @mul64_constant(i64 %a) nounwind { ; ; RV64I-LABEL: mul64_constant: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, 5 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 2 +; RV64I-NEXT: add a0, a1, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: mul64_constant: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, 5 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 2 +; RV64IM-NEXT: add a0, a1, a0 ; RV64IM-NEXT: ret %1 = mul i64 %a, 5 ret i64 %1 @@ -305,12 +294,8 @@ define zeroext i32 @mulhu(i32 zeroext %a, i32 zeroext %b) nounwind { define i32 @muli32_p65(i32 %a) nounwind { ; RV32I-LABEL: muli32_p65: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; 
RV32I-NEXT: addi a1, zero, 65 -; RV32I-NEXT: call __mulsi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a1, a0, 6 +; RV32I-NEXT: add a0, a1, a0 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli32_p65: @@ -321,18 +306,14 @@ define i32 @muli32_p65(i32 %a) nounwind { ; ; RV64I-LABEL: muli32_p65: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, 65 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: addw a0, a1, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli32_p65: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, 65 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: addw a0, a1, a0 ; RV64IM-NEXT: ret %1 = mul i32 %a, 65 ret i32 %1 @@ -341,12 +322,8 @@ define i32 @muli32_p65(i32 %a) nounwind { define i32 @muli32_p63(i32 %a) nounwind { ; RV32I-LABEL: muli32_p63: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a1, zero, 63 -; RV32I-NEXT: call __mulsi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a1, a0, 6 +; RV32I-NEXT: sub a0, a1, a0 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli32_p63: @@ -357,18 +334,14 @@ define i32 @muli32_p63(i32 %a) nounwind { ; ; RV64I-LABEL: muli32_p63: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, 63 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: subw a0, a1, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli32_p63: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, 63 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: subw a0, a1, a0 ; RV64IM-NEXT: ret %1 = mul i32 %a, 63 ret i32 %1 @@ -377,13 +350,15 @@ define i32 @muli32_p63(i32 %a) nounwind { define i64 @muli64_p65(i64 %a) nounwind { ; RV32I-LABEL: muli64_p65: ; 
RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a2, zero, 65 -; RV32I-NEXT: mv a3, zero -; RV32I-NEXT: call __muldi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a3, a0, 6 +; RV32I-NEXT: add a2, a3, a0 +; RV32I-NEXT: sltu a3, a2, a3 +; RV32I-NEXT: srli a0, a0, 26 +; RV32I-NEXT: slli a4, a1, 6 +; RV32I-NEXT: or a0, a4, a0 +; RV32I-NEXT: add a0, a0, a1 +; RV32I-NEXT: add a1, a0, a3 +; RV32I-NEXT: mv a0, a2 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli64_p65: @@ -397,18 +372,14 @@ define i64 @muli64_p65(i64 %a) nounwind { ; ; RV64I-LABEL: muli64_p65: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, 65 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: add a0, a1, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli64_p65: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, 65 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: add a0, a1, a0 ; RV64IM-NEXT: ret %1 = mul i64 %a, 65 ret i64 %1 @@ -417,13 +388,14 @@ define i64 @muli64_p65(i64 %a) nounwind { define i64 @muli64_p63(i64 %a) nounwind { ; RV32I-LABEL: muli64_p63: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a2, zero, 63 -; RV32I-NEXT: mv a3, zero -; RV32I-NEXT: call __muldi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a2, a0, 6 +; RV32I-NEXT: sltu a3, a2, a0 +; RV32I-NEXT: srli a4, a0, 26 +; RV32I-NEXT: slli a5, a1, 6 +; RV32I-NEXT: or a4, a5, a4 +; RV32I-NEXT: sub a1, a4, a1 +; RV32I-NEXT: sub a1, a1, a3 +; RV32I-NEXT: sub a0, a2, a0 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli64_p63: @@ -437,18 +409,14 @@ define i64 @muli64_p63(i64 %a) nounwind { ; ; RV64I-LABEL: muli64_p63: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, 63 -; 
RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: sub a0, a1, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli64_p63: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, 63 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: sub a0, a1, a0 ; RV64IM-NEXT: ret %1 = mul i64 %a, 63 ret i64 %1 @@ -457,12 +425,8 @@ define i64 @muli64_p63(i64 %a) nounwind { define i32 @muli32_m63(i32 %a) nounwind { ; RV32I-LABEL: muli32_m63: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a1, zero, -63 -; RV32I-NEXT: call __mulsi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a1, a0, 6 +; RV32I-NEXT: sub a0, a0, a1 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli32_m63: @@ -473,18 +437,14 @@ define i32 @muli32_m63(i32 %a) nounwind { ; ; RV64I-LABEL: muli32_m63: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, -63 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: subw a0, a0, a1 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli32_m63: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, -63 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: subw a0, a0, a1 ; RV64IM-NEXT: ret %1 = mul i32 %a, -63 ret i32 %1 @@ -493,12 +453,9 @@ define i32 @muli32_m63(i32 %a) nounwind { define i32 @muli32_m65(i32 %a) nounwind { ; RV32I-LABEL: muli32_m65: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a1, zero, -65 -; RV32I-NEXT: call __mulsi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a1, a0, 6 +; RV32I-NEXT: add a0, a1, a0 +; RV32I-NEXT: neg a0, a0 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli32_m65: @@ -509,18 +466,16 @@ define i32 @muli32_m65(i32 %a) nounwind { ; ; RV64I-LABEL: muli32_m65: ; 
RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, -65 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: add a0, a1, a0 +; RV64I-NEXT: negw a0, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli32_m65: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, -65 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: add a0, a1, a0 +; RV64IM-NEXT: negw a0, a0 ; RV64IM-NEXT: ret %1 = mul i32 %a, -65 ret i32 %1 @@ -529,13 +484,14 @@ define i32 @muli32_m65(i32 %a) nounwind { define i64 @muli64_m63(i64 %a) nounwind { ; RV32I-LABEL: muli64_m63: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 -; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a2, zero, -63 -; RV32I-NEXT: addi a3, zero, -1 -; RV32I-NEXT: call __muldi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a2, a0, 6 +; RV32I-NEXT: sltu a3, a0, a2 +; RV32I-NEXT: srli a4, a0, 26 +; RV32I-NEXT: slli a5, a1, 6 +; RV32I-NEXT: or a4, a5, a4 +; RV32I-NEXT: sub a1, a1, a4 +; RV32I-NEXT: sub a1, a1, a3 +; RV32I-NEXT: sub a0, a0, a2 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli64_m63: @@ -550,18 +506,14 @@ define i64 @muli64_m63(i64 %a) nounwind { ; ; RV64I-LABEL: muli64_m63: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, -63 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: sub a0, a0, a1 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli64_m63: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, -63 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: sub a0, a0, a1 ; RV64IM-NEXT: ret %1 = mul i64 %a, -63 ret i64 %1 @@ -570,13 +522,18 @@ define i64 @muli64_m63(i64 %a) nounwind { define i64 @muli64_m65(i64 %a) nounwind { ; RV32I-LABEL: muli64_m65: ; RV32I: # %bb.0: -; RV32I-NEXT: addi sp, sp, -16 
-; RV32I-NEXT: sw ra, 12(sp) -; RV32I-NEXT: addi a2, zero, -65 -; RV32I-NEXT: addi a3, zero, -1 -; RV32I-NEXT: call __muldi3 -; RV32I-NEXT: lw ra, 12(sp) -; RV32I-NEXT: addi sp, sp, 16 +; RV32I-NEXT: slli a2, a0, 6 +; RV32I-NEXT: add a3, a2, a0 +; RV32I-NEXT: sltu a2, a3, a2 +; RV32I-NEXT: srli a0, a0, 26 +; RV32I-NEXT: slli a4, a1, 6 +; RV32I-NEXT: or a0, a4, a0 +; RV32I-NEXT: add a0, a0, a1 +; RV32I-NEXT: add a0, a0, a2 +; RV32I-NEXT: snez a1, a3 +; RV32I-NEXT: add a0, a0, a1 +; RV32I-NEXT: neg a1, a0 +; RV32I-NEXT: neg a0, a3 ; RV32I-NEXT: ret ; ; RV32IM-LABEL: muli64_m65: @@ -591,18 +548,16 @@ define i64 @muli64_m65(i64 %a) nounwind { ; ; RV64I-LABEL: muli64_m65: ; RV64I: # %bb.0: -; RV64I-NEXT: addi sp, sp, -16 -; RV64I-NEXT: sd ra, 8(sp) -; RV64I-NEXT: addi a1, zero, -65 -; RV64I-NEXT: call __muldi3 -; RV64I-NEXT: ld ra, 8(sp) -; RV64I-NEXT: addi sp, sp, 16 +; RV64I-NEXT: slli a1, a0, 6 +; RV64I-NEXT: add a0, a1, a0 +; RV64I-NEXT: neg a0, a0 ; RV64I-NEXT: ret ; ; RV64IM-LABEL: muli64_m65: ; RV64IM: # %bb.0: -; RV64IM-NEXT: addi a1, zero, -65 -; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: slli a1, a0, 6 +; RV64IM-NEXT: add a0, a1, a0 +; RV64IM-NEXT: neg a0, a0 ; RV64IM-NEXT: ret %1 = mul i64 %a, -65 ret i64 %1 From llvm-commits at lists.llvm.org Tue Jul 7 18:50:42 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 01:50:42 +0000 (UTC) Subject: [PATCH] D82660: [RISCV] Optimize multiplication by constant In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGcb82de296017: [RISCV] Optimize multiplication by constant (authored by benshi001, committed by MaskRay). Herald added a subscriber: jrtc27. 
Changed prior to commit:
  https://reviews.llvm.org/D82660?vs=276297&id=276299#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82660/new/

https://reviews.llvm.org/D82660

Files:
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/lib/Target/RISCV/RISCVISelLowering.h
  llvm/test/CodeGen/RISCV/mul.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82660.276299.patch
Type: text/x-patch
Size: 14343 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 18:51:15 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 01:51:15 +0000 (UTC)
Subject: [PATCH] D82660: [RISCV] Optimize multiplication by specific immediates
In-Reply-To:
References:
Message-ID: <51030df7aa7f8c4843f61949926ffc86@localhost.localdomain>

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rGcb82de296017: [RISCV] Optimize multiplication by constant (authored by benshi001, committed by MaskRay).
Herald added a subscriber: jrtc27.

Changed prior to commit:
  https://reviews.llvm.org/D82660?vs=275139&id=275694#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82660/new/

https://reviews.llvm.org/D82660

Files:
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/lib/Target/RISCV/RISCVISelLowering.h
  llvm/test/CodeGen/RISCV/mul.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82660.275694.patch
Type: text/x-patch
Size: 14343 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 18:54:13 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 01:54:13 +0000 (UTC)
Subject: [PATCH] D82262: [RISCV] optimize addition with a pair of (addi imm)
In-Reply-To:
References:
Message-ID:

MaskRay accepted this revision.
MaskRay added a comment.

Looks great!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82262/new/

https://reviews.llvm.org/D82262

From llvm-commits at lists.llvm.org Tue Jul 7 18:57:50 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits)
Date: Tue, 07 Jul 2020 18:57:50 -0700 (PDT)
Subject: [llvm] 1e9d081 - [RISCV] optimize addition with a pair of (addi imm)
Message-ID: <5f05281e.1c69fb81.849e4.5711@mx.google.com>

Author: Ben Shi
Date: 2020-07-07T18:57:28-07:00
New Revision: 1e9d0811c9bf40abbe07d591057869568f140036

URL: https://github.com/llvm/llvm-project/commit/1e9d0811c9bf40abbe07d591057869568f140036
DIFF: https://github.com/llvm/llvm-project/commit/1e9d0811c9bf40abbe07d591057869568f140036.diff

LOG: [RISCV] optimize addition with a pair of (addi imm)

For an addition with an immediate in specific ranges, a pair of
addi-addi can be generated instead of the ordinary lui-addi-add sequence.
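
The split described in this log can be sketched in isolation. This is a minimal standalone illustration, not code from the patch: the names `AddiPair`, `isSImm12` and `splitAddImm` are hypothetical, and only the range check and the `Imm - Imm / 2` / `Imm / 2` split are taken from the diff in this commit. The idea is that each half must fit the 12-bit signed immediate of `addi`, i.e. [-2048, 2047].

```cpp
#include <cstdint>

// Hypothetical helper types and names, for illustration only.
struct AddiPair {
  bool Valid;   // false if the immediate is outside the accepted ranges
  int64_t Imm0; // immediate of the first addi
  int64_t Imm1; // immediate of the second addi
};

// True if V fits the 12-bit signed immediate field of addi.
inline bool isSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Mirrors the check in the patch: only immediates in [-4096,-2049] or
// [2048,4094] are split; everything else is left to the usual lowering.
inline AddiPair splitAddImm(int64_t Imm) {
  const bool InRange =
      (-4096 <= Imm && Imm <= -2049) || (2048 <= Imm && Imm <= 4094);
  if (!InRange)
    return AddiPair{false, 0, 0};
  // Integer division truncates toward zero, so Imm1 carries the smaller
  // magnitude and Imm0 the remainder; Imm0 + Imm1 == Imm always holds.
  const int64_t Imm1 = Imm / 2;
  const int64_t Imm0 = Imm - Imm1;
  return AddiPair{true, Imm0, Imm1};
}
```

With this sketch, 2999 splits into 1500 and 1499 and -2049 into -1025 and -1024, which agrees with the `addi` pairs checked in add-imm.ll. The upper bound is 4094 rather than 4095 because 4095 would need a half of 2048, which no longer fits in 12 signed bits.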

Reviewed By: MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D82262

Added:
    llvm/test/CodeGen/RISCV/add-imm.ll

Modified:
    llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp

Removed:


################################################################################
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index e7584e4f60ea..a0ae05081adc 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -75,6 +75,30 @@ void RISCVDAGToDAGISel::Select(SDNode *Node) {
   EVT VT = Node->getValueType(0);
 
   switch (Opcode) {
+  case ISD::ADD: {
+    // Optimize (add r, imm) to (addi (addi r, imm0) imm1) if applicable. The
+    // immediate must be in specific ranges and have a single use.
+    if (auto *ConstOp = dyn_cast(Node->getOperand(1))) {
+      if (!(ConstOp->hasOneUse()))
+        break;
+      // The imm must be in range [-4096,-2049] or [2048,4094].
+      int64_t Imm = ConstOp->getSExtValue();
+      if (!(-4096 <= Imm && Imm <= -2049) && !(2048 <= Imm && Imm <= 4094))
+        break;
+      // Break the imm to imm0+imm1.
+      SDLoc DL(Node);
+      EVT VT = Node->getValueType(0);
+      const SDValue ImmOp0 = CurDAG->getTargetConstant(Imm - Imm / 2, DL, VT);
+      const SDValue ImmOp1 = CurDAG->getTargetConstant(Imm / 2, DL, VT);
+      auto *NodeAddi0 = CurDAG->getMachineNode(RISCV::ADDI, DL, VT,
+                                               Node->getOperand(0), ImmOp0);
+      auto *NodeAddi1 = CurDAG->getMachineNode(RISCV::ADDI, DL, VT,
+                                               SDValue(NodeAddi0, 0), ImmOp1);
+      ReplaceNode(Node, NodeAddi1);
+      return;
+    }
+    break;
+  }
   case ISD::Constant: {
     auto ConstNode = cast(Node);
     if (VT == XLenVT && ConstNode->isNullValue()) {

diff --git a/llvm/test/CodeGen/RISCV/add-imm.ll b/llvm/test/CodeGen/RISCV/add-imm.ll
new file mode 100644
index 000000000000..2db13eb0ba83
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/add-imm.ll
@@ -0,0 +1,209 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefix=RV32I %s
+; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefix=RV64I %s
+
+; These test how the immediate in an addition is materialized.
+
+define i32 @add_positive_low_bound_reject(i32 %a) nounwind {
+; RV32I-LABEL: add_positive_low_bound_reject:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a0, a0, 2047
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_positive_low_bound_reject:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, 2047
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, 2047
+  ret i32 %1
+}
+
+define i32 @add_positive_low_bound_accept(i32 %a) nounwind {
+; RV32I-LABEL: add_positive_low_bound_accept:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a0, a0, 1024
+; RV32I-NEXT:    addi a0, a0, 1024
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_positive_low_bound_accept:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, 1024
+; RV64I-NEXT:    addi a0, a0, 1024
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, 2048
+  ret i32 %1
+}
+
+define i32 @add_positive_high_bound_accept(i32 %a) nounwind {
+; RV32I-LABEL: add_positive_high_bound_accept:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a0, a0, 2047
+; RV32I-NEXT:    addi a0, a0, 2047
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_positive_high_bound_accept:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, 2047
+; RV64I-NEXT:    addi a0, a0, 2047
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, 4094
+  ret i32 %1
+}
+
+define i32 @add_positive_high_bound_reject(i32 %a) nounwind {
+; RV32I-LABEL: add_positive_high_bound_reject:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    lui a1, 1
+; RV32I-NEXT:    addi a1, a1, -1
+; RV32I-NEXT:    add a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_positive_high_bound_reject:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    lui a1, 1
+; RV64I-NEXT:    addiw a1, a1, -1
+; RV64I-NEXT:    add a0, a0, a1
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, 4095
+  ret i32 %1
+}
+
+define i32 @add_negative_high_bound_reject(i32 %a) nounwind {
+; RV32I-LABEL: add_negative_high_bound_reject:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a0, a0, -2048
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_negative_high_bound_reject:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, -2048
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, -2048
+  ret i32 %1
+}
+
+define i32 @add_negative_high_bound_accept(i32 %a) nounwind {
+; RV32I-LABEL: add_negative_high_bound_accept:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a0, a0, -1025
+; RV32I-NEXT:    addi a0, a0, -1024
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_negative_high_bound_accept:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, -1025
+; RV64I-NEXT:    addi a0, a0, -1024
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, -2049
+  ret i32 %1
+}
+
+define i32 @add_negative_low_bound_accept(i32 %a) nounwind {
+; RV32I-LABEL: add_negative_low_bound_accept:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a0, a0, -2048
+; RV32I-NEXT:    addi a0, a0, -2048
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_negative_low_bound_accept:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, -2048
+; RV64I-NEXT:    addi a0, a0, -2048
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, -4096
+  ret i32 %1
+}
+
+define i32 @add_negative_low_bound_reject(i32 %a) nounwind {
+; RV32I-LABEL: add_negative_low_bound_reject:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    lui a1, 1048575
+; RV32I-NEXT:    addi a1, a1, -1
+; RV32I-NEXT:    add a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add_negative_low_bound_reject:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    lui a1, 1048575
+; RV64I-NEXT:    addiw a1, a1, -1
+; RV64I-NEXT:    add a0, a0, a1
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, -4097
+  ret i32 %1
+}
+
+define i32 @add32_accept(i32 %a) nounwind {
+; RV32I-LABEL: add32_accept:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a0, a0, 1500
+; RV32I-NEXT:    addi a0, a0, 1499
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add32_accept:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, 1500
+; RV64I-NEXT:    addi a0, a0, 1499
+; RV64I-NEXT:    ret
+  %1 = add i32 %a, 2999
+  ret i32 %1
+}
+
+define i64 @add64_accept(i64 %a) nounwind {
+; RV32I-LABEL: add64_accept:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a2, a0, 1500
+; RV32I-NEXT:    addi a2, a2, 1499
+; RV32I-NEXT:    sltu a0, a2, a0
+; RV32I-NEXT:    add a1, a1, a0
+; RV32I-NEXT:    mv a0, a2
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add64_accept:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a0, a0, 1500
+; RV64I-NEXT:    addi a0, a0, 1499
+; RV64I-NEXT:    ret
+  %1 = add i64 %a, 2999
+  ret i64 %1
+}
+
+@ga = global i32 0, align 4
+@gb = global i32 0, align 4
+define void @add32_reject() nounwind {
+; RV32I-LABEL: add32_reject:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    lui a0, %hi(ga)
+; RV32I-NEXT:    lw a1, %lo(ga)(a0)
+; RV32I-NEXT:    lui a2, %hi(gb)
+; RV32I-NEXT:    lw a3, %lo(gb)(a2)
+; RV32I-NEXT:    lui a4, 1
+; RV32I-NEXT:    addi a4, a4, -1096
+; RV32I-NEXT:    add a1, a1, a4
+; RV32I-NEXT:    add a3, a3, a4
+; RV32I-NEXT:    sw a1, %lo(ga)(a0)
+; RV32I-NEXT:    sw a3, %lo(gb)(a2)
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: add32_reject:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    lui a0, %hi(ga)
+; RV64I-NEXT:    lw a1, %lo(ga)(a0)
+; RV64I-NEXT:    lui a2, %hi(gb)
+; RV64I-NEXT:    lw a3, %lo(gb)(a2)
+; RV64I-NEXT:    lui a4, 1
+; RV64I-NEXT:    addiw a4, a4, -1096
+; RV64I-NEXT:    add a1, a1, a4
+; RV64I-NEXT:    add a3, a3, a4
+; RV64I-NEXT:    sw a1, %lo(ga)(a0)
+; RV64I-NEXT:    sw a3, %lo(gb)(a2)
+; RV64I-NEXT:    ret
+  %1 = load i32, i32* @ga, align 4
+  %2 = load i32, i32* @gb, align 4
+  %3 = add i32 %1, 3000
+  %4 = add i32 %2, 3000
+  store i32 %3, i32* @ga, align 4
+  store i32 %4, i32* @gb, align 4
+  ret void
+}

From llvm-commits at lists.llvm.org Tue Jul 7 18:58:03 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 01:58:03 +0000 (UTC)
Subject: [PATCH] D82262: [RISCV] optimize addition with a pair of (addi imm)
In-Reply-To:
References:
Message-ID: <13906cdfd1f29d5d25a9a8a9f4958f63@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG1e9d0811c9bf: [RISCV] optimize addition with a pair of (addi imm) (authored by benshi001, committed by MaskRay).

Changed prior to commit:
  https://reviews.llvm.org/D82262?vs=276293&id=276300#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82262/new/

https://reviews.llvm.org/D82262

Files:
  llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
  llvm/test/CodeGen/RISCV/add-imm.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82262.276300.patch
Type: text/x-patch
Size: 7625 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Tue Jul 7 19:01:30 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 02:01:30 +0000 (UTC)
Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation
In-Reply-To:
References:
Message-ID: <470cede55e0fb6b595c9ee19923c5133@localhost.localdomain>

MaskRay added inline comments.

================
Comment at: lld/ELF/Thunks.cpp:982
+  if ((s.stOther >> 5) == 1 && type == R_PPC64_REL24)
+    return make(s);
----------------
This needs a comment.

================
Comment at: lld/test/ELF/ppc64-error-toc-local-call.s:7
+
+# This test checks that the linker produces errors when it is missing the nop
+# after a local call to a callee with st_other=1.
----------------
Use `## ` for comments. Newer tests conform to this rule.

================
Comment at: lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s:29
+  lwz 4, global@toc@l(30)
+  add 3, 4, 3
+  bl callee
----------------
`add 3, 4, 3` is irrelevant and can be deleted.

================
Comment at: lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s:35
+  .long 0
+  .size global, 4
----------------
`.size` can be deleted.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82950/new/

https://reviews.llvm.org/D82950

From llvm-commits at lists.llvm.org Tue Jul 7 19:04:05 2020
From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits)
Date: Tue, 07 Jul 2020 19:04:05 -0700 (PDT)
Subject: [llvm] 51b0da7 - Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def."
Message-ID: <5f052995.1c69fb81.150c9.bb11@mx.google.com>

Author: Craig Topper
Date: 2020-07-07T19:01:58-07:00
New Revision: 51b0da731af75c68dd521e04cc576d5a611b1612

URL: https://github.com/llvm/llvm-project/commit/51b0da731af75c68dd521e04cc576d5a611b1612
DIFF: https://github.com/llvm/llvm-project/commit/51b0da731af75c68dd521e04cc576d5a611b1612.diff

LOG: Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def."

These represent the same thing, but 64BIT only showed up when
getHostCPUFeatures provided a list of features to clang, while
EM64T showed up when getting the features for a named CPU.

EM64T had no string of its own, so it would not be passed up to
clang when getting features for a named CPU, while 64bit needed
a name since that is how it is indexed.

Merge them by filtering 64bit out before sending features to clang
for named CPUs.
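
The filtering step in this log can be pictured with a much reduced bitset. This is a hedged sketch, not the FeatureBitset from X86TargetParser.cpp: `MiniFeatureBitset`, the three-entry `Feature` enum, `featuresForNamedCPU` and `isValidFor64BitMode` are all hypothetical names, and a single 32-bit word stands in for the real array of words. It only shows the shape of the approach: keep the 64-bit bit in the per-CPU table so it can validate 64-bit mode, and mask it out with `&= ~` before the feature list is reported.

```cpp
#include <cstdint>

// Hypothetical feature indices, for illustration only.
enum Feature : unsigned {
  FEATURE_64BIT = 0,
  FEATURE_SSE2 = 1,
  FEATURE_CMPXCHG16B = 2
};

// Simplified stand-in for a feature bitset: one 32-bit word.
class MiniFeatureBitset {
  uint32_t Bits;

public:
  MiniFeatureBitset() : Bits(0) {}

  MiniFeatureBitset &set(unsigned I) {
    Bits |= uint32_t(1) << I;
    return *this;
  }

  bool test(unsigned I) const { return ((Bits >> I) & 1u) != 0; }

  MiniFeatureBitset &operator&=(const MiniFeatureBitset &RHS) {
    Bits &= RHS.Bits;
    return *this;
  }

  MiniFeatureBitset operator~() const {
    MiniFeatureBitset Result;
    Result.Bits = ~Bits;
    return Result;
  }
};

// The per-CPU table keeps the 64-bit bit so a CPU name can be validated
// for 64-bit mode.
inline bool isValidFor64BitMode(const MiniFeatureBitset &CPUFeatures) {
  return CPUFeatures.test(FEATURE_64BIT);
}

// Before the features of a named CPU are reported, the 64-bit bit is
// masked out so that it never appears in the resulting feature list.
inline MiniFeatureBitset featuresForNamedCPU(MiniFeatureBitset CPUFeatures) {
  MiniFeatureBitset Only64Bit;
  Only64Bit.set(FEATURE_64BIT);
  CPUFeatures &= ~Only64Bit;
  return CPUFeatures;
}
```

The design point is that one bit serves two consumers: the validity check reads it directly, while the reporting path works on a masked copy, so no second "unnamed" feature is needed.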
Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index 9910fd615b1d..ed41295166b3 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -184,10 +184,6 @@ X86_FEATURE (CLWB, "clwb") X86_FEATURE (CLZERO, "clzero") X86_FEATURE (CMPXCHG16B, "cx16") X86_FEATURE (CMPXCHG8B, "cx8") -// FIXME: Merge with 64BIT? Currently separate to be used to tell if CPU is -// valid for 64-bit mode, but has empty string so it doesn't get added to -// target attributes in IR. -X86_FEATURE (EM64T, "") X86_FEATURE (ENQCMD, "enqcmd") X86_FEATURE (F16C, "f16c") X86_FEATURE (FSGSBASE, "fsgsbase") diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index 3a7d9a0242fa..db99612c97b5 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -868,7 +868,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; } - if (testFeature(X86::FEATURE_EM64T)) { + if (testFeature(X86::FEATURE_64BIT)) { *Type = X86::INTEL_CORE2; // "core2" *Subtype = X86::INTEL_CORE2_65; break; @@ -894,7 +894,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; case 15: { - if (testFeature(X86::FEATURE_EM64T)) { + if (testFeature(X86::FEATURE_64BIT)) { *Type = X86::INTEL_NOCONA; break; } @@ -1140,7 +1140,7 @@ static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, setFeature(X86::FEATURE_FMA4); if (HasExtLeaf1 && ((EDX >> 29) & 1)) - setFeature(X86::FEATURE_EM64T); + setFeature(X86::FEATURE_64BIT); } StringRef sys::getHostCPUName() { diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index df03f63e720e..cbb7f6186d0d 100644 --- 
a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -48,6 +48,14 @@ class FeatureBitset { return (Bits[I / 32] & Mask) != 0; } + constexpr FeatureBitset &operator&=(const FeatureBitset &RHS) { + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { + uint32_t NewBits = Bits[I] & RHS.Bits[I]; + Bits[I] = NewBits; + } + return *this; + } + constexpr FeatureBitset &operator|=(const FeatureBitset &RHS) { for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { uint32_t NewBits = Bits[I] | RHS.Bits[I]; @@ -57,16 +65,14 @@ class FeatureBitset { } constexpr FeatureBitset operator&(const FeatureBitset &RHS) const { - FeatureBitset Result; - for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) - Result.Bits[I] = Bits[I] & RHS.Bits[I]; + FeatureBitset Result = *this; + Result &= RHS; return Result; } constexpr FeatureBitset operator|(const FeatureBitset &RHS) const { - FeatureBitset Result; - for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) - Result.Bits[I] = Bits[I] | RHS.Bits[I]; + FeatureBitset Result = *this; + Result |= RHS; return Result; } @@ -111,10 +117,10 @@ static constexpr FeatureBitset FeaturesPentium4 = static constexpr FeatureBitset FeaturesPrescott = FeaturesPentium4 | FeatureSSE3; static constexpr FeatureBitset FeaturesNocona = - FeaturesPrescott | FeatureEM64T | FeatureCMPXCHG16B; + FeaturesPrescott | Feature64BIT | FeatureCMPXCHG16B; // Basic 64-bit capable CPU. 
-static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | FeatureEM64T; +static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | Feature64BIT; // Intel Core CPUs static constexpr FeatureBitset FeaturesCore2 = @@ -201,7 +207,7 @@ static constexpr FeatureBitset FeaturesAthlon = static constexpr FeatureBitset FeaturesAthlonXP = FeaturesAthlon | FeatureFXSR | FeatureSSE; static constexpr FeatureBitset FeaturesK8 = - FeaturesAthlonXP | FeatureSSE2 | FeatureEM64T; + FeaturesAthlonXP | FeatureSSE2 | Feature64BIT; static constexpr FeatureBitset FeaturesK8SSE3 = FeaturesK8 | FeatureSSE3; static constexpr FeatureBitset FeaturesAMDFAM10 = FeaturesK8SSE3 | FeatureCMPXCHG16B | FeatureLZCNT | FeaturePOPCNT | @@ -209,7 +215,7 @@ static constexpr FeatureBitset FeaturesAMDFAM10 = // Bobcat architecture processors. static constexpr FeatureBitset FeaturesBTVER1 = - FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | + FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeaturePOPCNT | FeaturePRFCHW | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_A | FeatureSAHF; @@ -220,7 +226,7 @@ static constexpr FeatureBitset FeaturesBTVER2 = // AMD Bulldozer architecture processors. 
static constexpr FeatureBitset FeaturesBDVER1 = FeatureX87 | FeatureAES | FeatureAVX | FeatureCMPXCHG8B | - FeatureCMPXCHG16B | FeatureEM64T | FeatureFMA4 | FeatureFXSR | FeatureLWP | + FeatureCMPXCHG16B | Feature64BIT | FeatureFMA4 | FeatureFXSR | FeatureLWP | FeatureLZCNT | FeatureMMX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureSAHF | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4_A | FeatureXOP | FeatureXSAVE; @@ -236,7 +242,7 @@ static constexpr FeatureBitset FeaturesBDVER4 = static constexpr FeatureBitset FeaturesZNVER1 = FeatureX87 | FeatureADX | FeatureAES | FeatureAVX | FeatureAVX2 | FeatureBMI | FeatureBMI2 | FeatureCLFLUSHOPT | FeatureCLZERO | - FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureF16C | + FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureF16C | FeatureFMA | FeatureFSGSBASE | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeatureMOVBE | FeatureMWAITX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureRDRND | FeatureRDSEED | FeatureSAHF | FeatureSHA | @@ -363,7 +369,7 @@ static constexpr ProcInfo Processors[] = { X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { for (const auto &P : Processors) - if (P.Name == CPU && (P.Features[FEATURE_EM64T] || !Only64Bit)) + if (P.Name == CPU && (P.Features[FEATURE_64BIT] || !Only64Bit)) return P.Kind; return CK_None; @@ -372,7 +378,7 @@ X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { void llvm::X86::fillValidCPUArchList(SmallVectorImpl &Values, bool Only64Bit) { for (const auto &P : Processors) - if (!P.Name.empty() && (P.Features[FEATURE_EM64T] || !Only64Bit)) + if (!P.Name.empty() && (P.Features[FEATURE_64BIT] || !Only64Bit)) Values.emplace_back(P.Name); } @@ -401,7 +407,6 @@ static constexpr FeatureBitset ImpliedFeaturesCLZERO = {}; static constexpr FeatureBitset ImpliedFeaturesCMOV = {}; static constexpr FeatureBitset ImpliedFeaturesCMPXCHG16B = {}; static constexpr 
FeatureBitset ImpliedFeaturesCMPXCHG8B = {}; -static constexpr FeatureBitset ImpliedFeaturesEM64T = {}; static constexpr FeatureBitset ImpliedFeaturesENQCMD = {}; static constexpr FeatureBitset ImpliedFeaturesFSGSBASE = {}; static constexpr FeatureBitset ImpliedFeaturesFXSR = {}; @@ -528,8 +533,14 @@ void llvm::X86::getFeaturesForCPU(StringRef CPU, [&](const ProcInfo &P) { return P.Name == CPU; }); assert(I != std::end(Processors) && "Processor not found!"); + FeatureBitset Bits = I->Features; + + // Remove the 64-bit feature which we only use to validate if a CPU can + // be used with 64-bit mode. + Bits &= ~Feature64BIT; + // Add the string version of all set bits. - getFeatureBitsAsStrings(I->Features, EnabledFeatures); + getFeatureBitsAsStrings(Bits, EnabledFeatures); } // For each feature that is (transitively) implied by this feature, set it. From llvm-commits at lists.llvm.org Tue Jul 7 19:12:45 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:12:45 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <31ed5806319efb6ce4b0689761b23b4e@localhost.localdomain> jdoerfert added inline comments. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:471 VersionedClause, - VersionedClause, VersionedClause, ---------------- [Drive By] Unrelated? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Tue Jul 7 19:14:05 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:14:05 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <69979d073578d7cb47f9156d15914e62@localhost.localdomain> clementval marked 2 inline comments as done. clementval added inline comments. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:471 VersionedClause, - VersionedClause, VersionedClause, ---------------- jdoerfert wrote: > [Drive By] Unrelated? It was a duplicate. Already on line 469. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Tue Jul 7 19:14:49 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:14:49 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <84a120b2ff415915e426d4c801475e4e@localhost.localdomain> clementval marked 2 inline comments as done. clementval added inline comments. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:471 VersionedClause, - VersionedClause, VersionedClause, ---------------- clementval wrote: > jdoerfert wrote: > > [Drive By] Unrelated? > It was a duplicate. Already on line 469. Makes compilation error because of duplicate case when code is generated. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Tue Jul 7 19:15:20 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:15:20 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: jdoerfert added inline comments. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:471 VersionedClause, - VersionedClause, VersionedClause, ---------------- clementval wrote: > clementval wrote: > > jdoerfert wrote: > > > [Drive By] Unrelated? > > It was a duplicate. Already on line 469. > Makes compilation error because of duplicate case when code is generated. Commit NFC changes like that w/o review if it "just makes sense". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Tue Jul 7 19:17:03 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:17:03 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: clementval marked 2 inline comments as done. clementval added inline comments. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:471 VersionedClause, - VersionedClause, VersionedClause, ---------------- jdoerfert wrote: > clementval wrote: > > clementval wrote: > > > jdoerfert wrote: > > > > [Drive By] Unrelated? > > > It was a duplicate. Already on line 469. > > Makes compilation error because of duplicate case when code is generated. > Commit NFC changes like that w/o review if it "just makes sense". Ok! Good to know! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Tue Jul 7 19:23:20 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:23:20 +0000 (UTC) Subject: [PATCH] D82262: [RISCV] Optimize addition with an immediate In-Reply-To: References: Message-ID: <71b1ef9aaeb977bae0c0b90d398ad255@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG1e9d0811c9bf: [RISCV] optimize addition with a pair of (addi imm) (authored by benshi001, committed by MaskRay). Changed prior to commit: https://reviews.llvm.org/D82262?vs=275546&id=275696#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82262/new/ https://reviews.llvm.org/D82262 Files: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/test/CodeGen/RISCV/add-imm.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82262.275696.patch Type: text/x-patch Size: 7625 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 19:29:34 2020 From: llvm-commits at lists.llvm.org (Nico Weber via llvm-commits) Date: Tue, 07 Jul 2020 19:29:34 -0700 (PDT) Subject: [llvm] fe13ee8 - [gn build] Port baca8f977ed Message-ID: <5f052f8e.1c69fb81.b3c89.b1c3@mx.google.com> Author: Nico Weber Date: 2020-07-07T22:29:19-04:00 New Revision: fe13ee875b102e5bf66c212883852f08796168b3 URL: https://github.com/llvm/llvm-project/commit/fe13ee875b102e5bf66c212883852f08796168b3 DIFF: https://github.com/llvm/llvm-project/commit/fe13ee875b102e5bf66c212883852f08796168b3.diff LOG: [gn build] Port baca8f977ed Added: Modified: llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn b/llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn index 7793901770f3..1291a5d33cbb 100644 --- a/llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn +++ b/llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn @@ -449,13 +449,9 @@ static_library("builtins") { "hexagon/dfsqrt.S", "hexagon/divdi3.S", "hexagon/divsi3.S", - "hexagon/fabs_opt.S", "hexagon/fastmath2_dlib_asm.S", "hexagon/fastmath2_ldlib_asm.S", "hexagon/fastmath_dlib_asm.S", - "hexagon/fma_opt.S", - "hexagon/fmax_opt.S", - "hexagon/fmin_opt.S", "hexagon/memcpy_forward_vp4cp4n2.S", "hexagon/memcpy_likely_aligned.S", "hexagon/moddi3.S", From llvm-commits at lists.llvm.org Tue Jul 7 19:37:23 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:37:23 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: <6abbfedd258cf40cea1de5a8c6e030c7@localhost.localdomain> craig.topper added inline comments. 
================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36727 + unsigned NumCstElts = cast(C->getType())->getNumElements(); + if (NumCstElts != NumElts && NumCstElts != (NumElts * 2)) + return false; ---------------- I think this check isn't enough if the load is narrower than the constant pool vector. For example a v16i8 load with a v32i8 constant pool. So NumCstElts == NumElts * 2 and we'll proceed. I think this is the cause of some failures we're seeing, but I don't have a reduced case yet. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Tue Jul 7 19:38:23 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Tue, 07 Jul 2020 19:38:23 -0700 (PDT) Subject: [llvm] 0a41493 - [openmp][NFC] Remove duplicate clause defaultmap for target parallel do Message-ID: <5f05319f.1c69fb81.2066a.a0cd@mx.google.com> Author: clementval Date: 2020-07-07T22:38:13-04:00 New Revision: 0a41493b9822514daf72d036d088ac91d9235b0c URL: https://github.com/llvm/llvm-project/commit/0a41493b9822514daf72d036d088ac91d9235b0c DIFF: https://github.com/llvm/llvm-project/commit/0a41493b9822514daf72d036d088ac91d9235b0c.diff LOG: [openmp][NFC] Remove duplicate clause defaultmap for target parallel do Added: Modified: llvm/include/llvm/Frontend/OpenMP/OMP.td Removed: ################################################################################ diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td index ce0bf4661176..692bd2fb3210 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMP.td +++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td @@ -468,7 +468,6 @@ def OMP_TargetParallelDo : Directive<"target parallel do"> { VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, From llvm-commits at lists.llvm.org Tue Jul 7 19:41:00 2020 From: llvm-commits at 
lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 02:41:00 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <7e68da8f483a589599d434d38bedff4f@localhost.localdomain> clementval updated this revision to Diff 276302. clementval marked an inline comment as done. clementval added a comment. Rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 Files: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83363.276302.patch Type: text/x-patch Size: 7095 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 19:57:31 2020 From: llvm-commits at lists.llvm.org (Nico Weber via llvm-commits) Date: Tue, 07 Jul 2020 19:57:31 -0700 (PDT) Subject: [llvm] e885f33 - Revert "[X86] Add back the assert in getImpliedFeatures that I removed in ef4cc70f3ed2a91e0a48c6448c517c3ba34c2846" Message-ID: <5f05361b.1c69fb81.b3c89.b49c@mx.google.com> Author: Nico Weber Date: 2020-07-07T22:56:08-04:00 New Revision: e885f336fd78e35ccb8e967e0664b356de333963 URL: https://github.com/llvm/llvm-project/commit/e885f336fd78e35ccb8e967e0664b356de333963 DIFF: https://github.com/llvm/llvm-project/commit/e885f336fd78e35ccb8e967e0664b356de333963.diff LOG: Revert "[X86] Add back the assert in getImpliedFeatures that I removed in ef4cc70f3ed2a91e0a48c6448c517c3ba34c2846" This reverts commit 91f70675cc6e5c872e0059c11d797b8726eeac67. It seems to break most (all?) hwasan tests. 
Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index ed41295166b3..4b96c66b0e29 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -226,6 +226,5 @@ X86_FEATURE (RETPOLINE_INDIRECT_BRANCHES, "retpoline-indirect-branches") X86_FEATURE (RETPOLINE_INDIRECT_CALLS, "retpoline-indirect-calls") X86_FEATURE (LVI_CFI, "lvi-cfi") X86_FEATURE (LVI_LOAD_HARDENING, "lvi-load-hardening") -X86_FEATURE (SESES, "seses") #undef X86_FEATURE_COMPAT #undef X86_FEATURE diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index cbb7f6186d0d..7e87d65a7c56 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -446,7 +446,6 @@ static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_BRANCHES = {}; static constexpr FeatureBitset ImpliedFeaturesRETPOLINE_INDIRECT_CALLS = {}; static constexpr FeatureBitset ImpliedFeaturesLVI_CFI = {}; static constexpr FeatureBitset ImpliedFeaturesLVI_LOAD_HARDENING = {}; -static constexpr FeatureBitset ImpliedFeaturesSESES = {}; // XSAVE features are dependent on basic XSAVE. static constexpr FeatureBitset ImpliedFeaturesXSAVEC = FeatureXSAVE; @@ -574,7 +573,6 @@ void llvm::X86::getImpliedFeatures( if (I == std::end(FeatureInfos)) { // FIXME: This shouldn't happen, but may not have all features in the table // yet. 
- assert(false && "Feature not found in table!"); return; } From llvm-commits at lists.llvm.org Tue Jul 7 20:03:21 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:03:21 +0000 (UTC) Subject: [PATCH] D83365: [PowerPC] start and end may exist in different block before Message-ID: shchenz created this revision. shchenz added reviewers: nemanjai, jsji, PowerPC. Herald added subscribers: llvm-commits, wuzish, kbarton, hiraditya. Herald added a project: LLVM. In `fixupIsDeadOrKill`, we assume `StartMI` and `EndMI` exist in the same basic block, so we added an assertion in that function. This assumption is correct after RA. We extended this assumption to before RA in https://reviews.llvm.org/D81723. This is not right, as before RA the true definition may exist in another block through copy-like instructions. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83365 Files: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir Index: llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir =================================================================== --- /dev/null +++ llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir @@ -0,0 +1,21 @@ +# RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs -start-before ppc-mi-peepholes \ +# RUN: -stop-after ppc-mi-peepholes %s -o - | FileCheck %s + +--- +name: test +#CHECK : name : test +tracksRegLiveness: true +body: | + bb.0.entry: + liveins: $x3 + %0:g8rc = COPY $x3 + %1:gprc = COPY %0.sub_32:g8rc + %2:g8rc = LI8 63 + + bb.1: + %3:gprc = COPY %2.sub_32:g8rc + ; CHECK: %4:gprc = LI 0 + %4:gprc = XORI killed %3:gprc, 63 + STW killed %4:gprc, %4:gprc, 100 + BLR8 implicit $lr8, implicit $rm +... 
Index: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrInfo.cpp +++ llvm/lib/Target/PowerPC/PPCInstrInfo.cpp @@ -2655,10 +2655,9 @@ void PPCInstrInfo::fixupIsDeadOrKill(MachineInstr &StartMI, MachineInstr &EndMI, unsigned RegNo) const { - - // Instructions between [StartMI, EndMI] should be in same basic block. - assert((StartMI.getParent() == EndMI.getParent()) && - "Instructions are not in same basic block"); + // FIXME: fix up kill/dead flag crossing basic blocks. + if (StartMI.getParent() != EndMI.getParent()) + return; bool IsKillSet = false; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83365.276304.patch Type: text/x-patch Size: 1490 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 20:07:52 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:07:52 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: tmsriram marked an inline comment as done and an inline comment as not done. tmsriram added inline comments. ================ Comment at: llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll:23-30 +; int compute(bool k, int p1, int p2, int p3, int p4, int p5, int p6, int p7, int p8, int p9, int pa, int pb, int pc) { +; int result; +; if (k) +; result = p1 * p2 + p3 / p4 - p5 * p6 + p7 / p8 - p9 * pa + pb / pc; +; else +; result = p1 / p2 - p3 * p4 + p5 / p6 - p7 * p8 + p9 / pa - pb * pc; +; return result; ---------------- dblaikie wrote: > tmsriram wrote: > > dblaikie wrote: > > > tmsriram wrote: > > > > dblaikie wrote: > > > > > Seems like a surprisingly large amount of computation - is it there for a reason? needed to push some optimization or layout decisions? 
Could it all use the same operation (just all multiplication, for instance) or is the different operations significant? (Well, I guess they have to differ between the two branches - but could all be the same within each one?) does it need 12 parameters? Could it be fewer & use a function call? > > > > > > > > > > (etc, etc - simple test case, maybe some comments describing what's significant about the features of it that are needed to demonstrate the desired behavior, etc) > > > > > > > > > > > > > > > > > > > > It was done so that more callee-saved registers are used and when more callee saved registers are used cfi_offset directives are needed for it. The .s looks like this for a basic block that does the computation: > > > > > > > > _Z7computebiiiiiiiiiiii.1: # %if.then > > > > .cfi_startproc > > > > .cfi_def_cfa %rbp, 16 > > > > .cfi_offset %rbx, -48 > > > > .cfi_offset %r12, -40 > > > > .cfi_offset %r14, -32 > > > > .cfi_offset %r15, -24 > > > > .cfi_offset %rbp, -16 > > > > > > > > Each basic block that goes in a different section must emit cfi directives for callee-saved registers. The parameters is to make sure the caller saved registers are taken and the callee saved registers are forced so that we can check that the cfi emission indeed works for callee saved registers. > > > > > > > Ah, OK - a comment might be handy to describe that? > > > > > > And rather than the somewhat arbitrary computation, perhaps an opaque function call would suffice? Or would that introduce other complications for spills/saves/etc? > > > > > > Maybe using a pass by value struct as the parameter type so the long parameter list doesn't have to be repeated? > > Simplified the test and added comments. Having more than 4 integers in the struct seems to go to the stack though the ABI says upto 32 bytes. > > > Ah - the comment's good, thanks! 
Not sure about the code changes - I was hoping for more uniformity (& brevity as a second-order benefit), but having a struct, then 3 struct parameters and some ints lacks the uniformity I was hoping for. > > Also the arithmetic looks sort of arbitrarily complicated (& raises the question, as a reader (for me at least), why is it complicated? Is the particular sequence of arithmetic important in some way?). Is a more uniform operation (like all addition) not viable due to vectorizing or something? (is a function call inadequate because of other spill issues (eg: void f1(bool k, int a, int b.... ) { int result; if (k) { result f2(a, b, ... ); } else { result = f3(a, b, ...); } return result; })?) @wmi who is a register allocation expert. I am not an expert but we need to use callee saved registers in some manner. Function calls would still only use caller-save registers and unless we have a *callee* where there is serious computation combined with a dearth of scratch registers (caller-save), callee saved registers will be avoided. Does this make sense :) ? I can try to simplify the computation or spend some time to see if I can produce simpler code that can still use callee saved registers, but I dont have concrete ideas now. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 20:20:51 2020 From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:20:51 +0000 (UTC) Subject: [PATCH] D81682: [PGO] Extend the value profile buckets for mem op sizes. In-Reply-To: References: Message-ID: davidxl added inline comments. ================ Comment at: compiler-rt/include/profile/InstrProfData.inc:839 +#if defined(_MSC_VER) && !defined(__clang__) + +#include ---------------- There is __popcnt etc. Can they be used? 
https://docs.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=vs-2019 ================ Comment at: compiler-rt/include/profile/InstrProfData.inc:842 +INSTR_PROF_VISIBILITY INSTR_PROF_INLINE +int InstProfClzll(unsigned long long X) { + unsigned long LeadZeroIdx = 0; ---------------- Since these helpers are only used by runtime on target, not by the host compiler, they should be moved to InstrProfilingUtil.c instead as InstrProfData.Inc is shared by runtime and compiler. ================ Comment at: compiler-rt/lib/profile/InstrProfilingValue.c:274 +COMPILER_RT_VISIBILITY void +__llvm_profile_instrument_memop(uint64_t TargetValue, void *Data, + uint32_t CounterIndex) { ---------------- Ideally, this function should be inline expanded by the compiler at instrumentation time -- but that can be done separately. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81682/new/ https://reviews.llvm.org/D81682 From llvm-commits at lists.llvm.org Tue Jul 7 20:21:30 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:21:30 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <1c240ae00333c82f4974ef95ceb786df@localhost.localdomain> tmsriram marked an inline comment as done and 2 inline comments as not done. tmsriram added inline comments. 
================ Comment at: llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll:23-30 +; int compute(bool k, int p1, int p2, int p3, int p4, int p5, int p6, int p7, int p8, int p9, int pa, int pb, int pc) { +; int result; +; if (k) +; result = p1 * p2 + p3 / p4 - p5 * p6 + p7 / p8 - p9 * pa + pb / pc; +; else +; result = p1 / p2 - p3 * p4 + p5 / p6 - p7 * p8 + p9 / pa - pb * pc; +; return result; ---------------- tmsriram wrote: > dblaikie wrote: > > tmsriram wrote: > > > dblaikie wrote: > > > > tmsriram wrote: > > > > > dblaikie wrote: > > > > > > Seems like a surprisingly large amount of computation - is it there for a reason? needed to push some optimization or layout decisions? Could it all use the same operation (just all multiplication, for instance) or is the different operations significant? (Well, I guess they have to differ between the two branches - but could all be the same within each one?) does it need 12 parameters? Could it be fewer & use a function call? > > > > > > > > > > > > (etc, etc - simple test case, maybe some comments describing what's significant about the features of it that are needed to demonstrate the desired behavior, etc) > > > > > > > > > > > > > > > > > > > > > > > > > It was done so that more callee-saved registers are used and when more callee saved registers are used cfi_offset directives are needed for it. The .s looks like this for a basic block that does the computation: > > > > > > > > > > _Z7computebiiiiiiiiiiii.1: # %if.then > > > > > .cfi_startproc > > > > > .cfi_def_cfa %rbp, 16 > > > > > .cfi_offset %rbx, -48 > > > > > .cfi_offset %r12, -40 > > > > > .cfi_offset %r14, -32 > > > > > .cfi_offset %r15, -24 > > > > > .cfi_offset %rbp, -16 > > > > > > > > > > Each basic block that goes in a different section must emit cfi directives for callee-saved registers. 
The parameters is to make sure the caller saved registers are taken and the callee saved registers are forced so that we can check that the cfi emission indeed works for callee saved registers. > > > > > > > > > Ah, OK - a comment might be handy to describe that? > > > > > > > > And rather than the somewhat arbitrary computation, perhaps an opaque function call would suffice? Or would that introduce other complications for spills/saves/etc? > > > > > > > > Maybe using a pass by value struct as the parameter type so the long parameter list doesn't have to be repeated? > > > Simplified the test and added comments. Having more than 4 integers in the struct seems to go to the stack though the ABI says upto 32 bytes. > > > > > Ah - the comment's good, thanks! Not sure about the code changes - I was hoping for more uniformity (& brevity as a second-order benefit), but having a struct, then 3 struct parameters and some ints lacks the uniformity I was hoping for. > > > > Also the arithmetic looks sort of arbitrarily complicated (& raises the question, as a reader (for me at least), why is it complicated? Is the particular sequence of arithmetic important in some way?). Is a more uniform operation (like all addition) not viable due to vectorizing or something? (is a function call inadequate because of other spill issues (eg: void f1(bool k, int a, int b.... ) { int result; if (k) { result f2(a, b, ... ); } else { result = f3(a, b, ...); } return result; })?) > @wmi who is a register allocation expert. I am not an expert but we need to use callee saved registers in some manner. Function calls would still only use caller-save registers and unless we have a *callee* where there is serious computation combined with a dearth of scratch registers (caller-save), callee saved registers will be avoided. Does this make sense :) ? 
> > I can try to simplify the computation or spend some time to see if I can produce simpler code that can still use callee saved registers, but I dont have concrete ideas now. > > Ah! I think you are right :), sorry about that! It does use callee saved registers for function calls. Let me simplify this along these lines CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 20:30:50 2020 From: llvm-commits at lists.llvm.org (Kan Shengchen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:30:50 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI Message-ID: skan created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83366 Files: llvm/lib/MC/MCAssembler.cpp Index: llvm/lib/MC/MCAssembler.cpp =================================================================== --- llvm/lib/MC/MCAssembler.cpp +++ llvm/lib/MC/MCAssembler.cpp @@ -820,48 +820,58 @@ // Evaluate and apply the fixups, generating relocation entries as necessary. for (MCSection &Sec : *this) { for (MCFragment &Frag : Sec) { - // Data and relaxable fragments both have fixups. So only process - // those here. - // FIXME: Is there a better way to do this? MCEncodedFragmentWithFixups - // being templated makes this tricky. - if (isa(&Frag) && - isa(&Frag)) - continue; - if (!isa(&Frag) && !isa(&Frag) && - !isa(&Frag)) + // Insert fixup type for code alignment if the target define + // shouldInsertFixupForCodeAlign target hook. + if (MCAlignFragment *AF = dyn_cast(&Frag)) { + if (Sec.UseCodeAlign() && AF->hasEmitNops()) { + getBackend().shouldInsertFixupForCodeAlign(*this, Layout, *AF); + } continue; + } + // Only process MCEncodedFragmentWithFixups here. 
ArrayRef Fixups; MutableArrayRef Contents; const MCSubtargetInfo *STI = nullptr; - if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *AF = dyn_cast(&Frag)) { - // Insert fixup type for code alignment if the target define - // shouldInsertFixupForCodeAlign target hook. - if (Sec.UseCodeAlign() && AF->hasEmitNops()) { - getBackend().shouldInsertFixupForCodeAlign(*this, Layout, *AF); - } + switch (Frag.getKind()) { + default: + // We reach here when Frag is not neither a MCAlignFragment nor + // MCEncodedFragmentWithFixups. 
continue; - } else if (auto *FragWithFixups = - dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else - llvm_unreachable("Unknown fragment with fixups!"); + case MCFragment::FT_Data: { + MCDataFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + STI = DF.getSubtargetInfo(); + assert(!DF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_Relaxable: { + MCRelaxableFragment &RF = cast(Frag); + Fixups = RF.getFixups(); + Contents = RF.getContents(); + STI = RF.getSubtargetInfo(); + assert(!RF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_CVDefRange: { + MCCVDefRangeFragment &CF = cast(Frag); + Fixups = CF.getFixups(); + Contents = CF.getContents(); + break; + } + case MCFragment::FT_Dwarf: { + MCDwarfLineAddrFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + case MCFragment::FT_DwarfFrame: { + MCDwarfCallFrameFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + } for (const MCFixup &Fixup : Fixups) { uint64_t FixedValue; bool IsResolved; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83366.276306.patch Type: text/x-patch Size: 4411 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 20:31:14 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:31:14 +0000 (UTC) Subject: [PATCH] D83367: [NewPM][opt] Translate "-O#" to NPM's "default" Message-ID: aeubanks created this revision. aeubanks added reviewers: ychen, hans, asbirlea. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. Fixes 52 check-llvm tests under NPM. 
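For readers following along, here is a rough sketch of the translation described above. It is not the code from opt.cpp: `OptLevelFlags` and `withDefaultPipelines` are hypothetical names standing in for opt's individual `cl::opt` flags and the surrounding driver code, and the sketch assumes the new pass manager's `default<O#>` pipeline spelling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for opt's individual -O0 ... -Oz command-line flags.
struct OptLevelFlags {
  bool O0 = false, O1 = false, O2 = false, O3 = false, Os = false, Oz = false;
};

// Appends one "default<O#>" pipeline entry for each -O# flag that is set,
// after any passes that were named explicitly on the command line.
std::vector<std::string> withDefaultPipelines(std::vector<std::string> Passes,
                                              const OptLevelFlags &Flags) {
  if (Flags.O0) Passes.push_back("default<O0>");
  if (Flags.O1) Passes.push_back("default<O1>");
  if (Flags.O2) Passes.push_back("default<O2>");
  if (Flags.O3) Passes.push_back("default<O3>");
  if (Flags.Os) Passes.push_back("default<Os>");
  if (Flags.Oz) Passes.push_back("default<Oz>");
  return Passes;
}
```

Under this sketch, combining `-Os` with an explicit `-passes=...` pipeline would schedule both, which is presumably why a test that already names a pipeline would not also want an `-O#` flag.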
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83367 Files: llvm/test/Other/opt-hot-cold-split.ll llvm/tools/opt/opt.cpp Index: llvm/tools/opt/opt.cpp =================================================================== --- llvm/tools/opt/opt.cpp +++ llvm/tools/opt/opt.cpp @@ -747,6 +747,18 @@ for (const auto &P : PassList) { Passes.push_back(P->getPassArgument()); } + if (OptLevelO0) + Passes.push_back("default"); + if (OptLevelO1) + Passes.push_back("default"); + if (OptLevelO2) + Passes.push_back("default"); + if (OptLevelO3) + Passes.push_back("default"); + if (OptLevelOs) + Passes.push_back("default"); + if (OptLevelOz) + Passes.push_back("default"); OutputKind OK = OK_NoOutput; if (!NoOutput) OK = OutputAssembly Index: llvm/test/Other/opt-hot-cold-split.ll =================================================================== --- llvm/test/Other/opt-hot-cold-split.ll +++ llvm/test/Other/opt-hot-cold-split.ll @@ -1,8 +1,8 @@ ; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=DEFAULT-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-PRELINK-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='thinlto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-PRELINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto-pre-link' 
-debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os ; REQUIRES: asserts -------------- next part -------------- A non-text attachment was scrubbed... Name: D83367.276307.patch Type: text/x-patch Size: 2426 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 20:33:21 2020 From: llvm-commits at lists.llvm.org (Kan Shengchen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:33:21 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <841d6c686bf44836d3fdab7e7fd3bc64@localhost.localdomain> skan updated this revision to Diff 276308. skan added a comment. Update comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 Files: llvm/lib/MC/MCAssembler.cpp Index: llvm/lib/MC/MCAssembler.cpp =================================================================== --- llvm/lib/MC/MCAssembler.cpp +++ llvm/lib/MC/MCAssembler.cpp @@ -820,48 +820,58 @@ // Evaluate and apply the fixups, generating relocation entries as necessary. for (MCSection &Sec : *this) { for (MCFragment &Frag : Sec) { - // Data and relaxable fragments both have fixups. So only process - // those here. - // FIXME: Is there a better way to do this? MCEncodedFragmentWithFixups - // being templated makes this tricky. - if (isa(&Frag) && - isa(&Frag)) - continue; - if (!isa(&Frag) && !isa(&Frag) && - !isa(&Frag)) + // Insert fixup type for code alignment if the target define + // shouldInsertFixupForCodeAlign target hook. 
+ if (MCAlignFragment *AF = dyn_cast(&Frag)) { + if (Sec.UseCodeAlign() && AF->hasEmitNops()) { + getBackend().shouldInsertFixupForCodeAlign(*this, Layout, *AF); + } continue; + } + // Only process MCEncodedFragmentWithFixups here. ArrayRef Fixups; MutableArrayRef Contents; const MCSubtargetInfo *STI = nullptr; - if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *AF = dyn_cast(&Frag)) { - // Insert fixup type for code alignment if the target define - // shouldInsertFixupForCodeAlign target hook. - if (Sec.UseCodeAlign() && AF->hasEmitNops()) { - getBackend().shouldInsertFixupForCodeAlign(*this, Layout, *AF); - } + switch (Frag.getKind()) { + default: + // We reach here when Frag is neither a MCAlignFragment nor a + // MCEncodedFragmentWithFixups. 
continue; - } else if (auto *FragWithFixups = - dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else - llvm_unreachable("Unknown fragment with fixups!"); + case MCFragment::FT_Data: { + MCDataFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + STI = DF.getSubtargetInfo(); + assert(!DF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_Relaxable: { + MCRelaxableFragment &RF = cast(Frag); + Fixups = RF.getFixups(); + Contents = RF.getContents(); + STI = RF.getSubtargetInfo(); + assert(!RF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_CVDefRange: { + MCCVDefRangeFragment &CF = cast(Frag); + Fixups = CF.getFixups(); + Contents = CF.getContents(); + break; + } + case MCFragment::FT_Dwarf: { + MCDwarfLineAddrFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + case MCFragment::FT_DwarfFrame: { + MCDwarfCallFrameFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + } for (const MCFixup &Fixup : Fixups) { uint64_t FixedValue; bool IsResolved; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83366.276308.patch Type: text/x-patch Size: 4409 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 20:36:41 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:36:41 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: NeHuang updated this revision to Diff 276305. NeHuang marked an inline comment as not done. NeHuang added a comment. - Keep only one lit test with one input file to cover all the cases: 1. local linkage callee with `st_other=0` 2. local linkage callee with `st_other=1` 3. global linkage callee with `st_other=0` 4. 
global linkage callee with `st_other=1` - Put the check of all three unimplemented protocols in `PPC64::needsThunk` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 Files: lld/ELF/Arch/PPC64.cpp lld/test/ELF/Inputs/ppc64-callee-global-hidden.s lld/test/ELF/ppc64-pcrel-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82816.276305.patch Type: text/x-patch Size: 6996 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 20:44:08 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:44:08 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <510605d5f16109f3c930ef740a03f581@localhost.localdomain> NeHuang marked an inline comment as done. NeHuang added inline comments. ================ Comment at: lld/ELF/Arch/PPC64.cpp:1041 return false; // If a function is in the Plt it needs to be called with a call-stub. ---------------- sfertile wrote: > We should probably insert a couple of fatal errors here: > 1) If the type is NOTOC and the symbol's st_other indicates it needs the toc-pointer setup. > 2) If the type is not NOTOC but the symbol's st_other indicates it tramples the toc. Thanks Sean for the advice. I also moved the fatal error check for the protocol "external call with R_PPC64_REL_NOTOC" here so that we are checking all unimplemented protocols in the same function.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Tue Jul 7 20:47:32 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 03:47:32 +0000 (UTC) Subject: [PATCH] D83368: [NewPM][opt] Share -disable-loop-unrolling between pass managers Message-ID: aeubanks created this revision. aeubanks added a reviewer: echristo. Herald added subscribers: llvm-commits, zzheng. Herald added a project: LLVM. There's no reason to introduce a new option for the NPM. The various PGO options are shared in this manner. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83368 Files: llvm/test/Transforms/LoopUnroll/FullUnroll.ll llvm/tools/opt/NewPMDriver.cpp llvm/tools/opt/opt.cpp Index: llvm/tools/opt/opt.cpp =================================================================== --- llvm/tools/opt/opt.cpp +++ llvm/tools/opt/opt.cpp @@ -183,10 +183,9 @@ static cl::opt TargetTriple("mtriple", cl::desc("Override target triple for module")); -static cl::opt -DisableLoopUnrolling("disable-loop-unrolling", - cl::desc("Disable loop unrolling in all relevant passes"), - cl::init(false)); +cl::opt DisableLoopUnrolling( + "disable-loop-unrolling", + cl::desc("Disable loop unrolling in all relevant passes"), cl::init(false)); static cl::opt EmitSummaryIndex("module-summary", cl::desc("Emit module summary index"), Index: llvm/tools/opt/NewPMDriver.cpp =================================================================== --- llvm/tools/opt/NewPMDriver.cpp +++ llvm/tools/opt/NewPMDriver.cpp @@ -102,9 +102,7 @@ cl::Hidden); // Individual pipeline tuning options. 
-static cl::opt DisableLoopUnrolling( - "new-pm-disable-loop-unrolling", - cl::desc("Disable loop unrolling in all relevant passes"), cl::init(false)); +extern cl::opt DisableLoopUnrolling; extern cl::opt PGOKindFlag; extern cl::opt ProfileFile; Index: llvm/test/Transforms/LoopUnroll/FullUnroll.ll =================================================================== --- llvm/test/Transforms/LoopUnroll/FullUnroll.ll +++ llvm/test/Transforms/LoopUnroll/FullUnroll.ll @@ -1,4 +1,4 @@ -; RUN: opt -passes='default' -disable-verify --mtriple x86_64-pc-linux-gnu -new-pm-disable-loop-unrolling=true \ +; RUN: opt -passes='default' -disable-verify --mtriple x86_64-pc-linux-gnu -disable-loop-unrolling=true \ ; RUN: -S -o - %s | FileCheck %s ; This checks that the loop full unroller will fire in the new pass manager -------------- next part -------------- A non-text attachment was scrubbed... Name: D83368.276309.patch Type: text/x-patch Size: 1880 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 21:07:53 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:07:53 +0000 (UTC) Subject: [PATCH] D83142: [flang] Make 'num_images()' intrinsic In-Reply-To: References: Message-ID: <508c7170b4ec264f5eaf95975470b4f5@localhost.localdomain> PeteSteinfeld added a comment. In D83142#2137875 , @ktras wrote: > In D83142#2137260 , @PeteSteinfeld wrote: > > > @ktras, are you planning to implement the other coarray intrinsic functions? > > > Yes, that is my plan. I am looking at 'this_image()' next. Excellent! I'm working on some NAG tests that use coarrays, and they use `this_image()`. Thanks for doing this! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83142/new/ https://reviews.llvm.org/D83142 From llvm-commits at lists.llvm.org Tue Jul 7 21:08:08 2020 From: llvm-commits at lists.llvm.org (Serguei Katkov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:08:08 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: skatkov added a comment. Could you please explain the case because of which you need to return back in the collectGCRegs implementation? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 From llvm-commits at lists.llvm.org Tue Jul 7 21:10:43 2020 From: llvm-commits at lists.llvm.org (Serguei Katkov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:10:43 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: skatkov added a comment. As I understand it, a lot of things here are done to work out the case where one catch block can correspond to different invoke instructions. Can you please add a test that examines this case? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 From llvm-commits at lists.llvm.org Tue Jul 7 21:41:01 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:41:01 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <5db64a97e8c4ebf10eb054367de0c497@localhost.localdomain> tmsriram updated this revision to Diff 276310. tmsriram marked 6 inline comments as done. tmsriram added a comment. Fix CFIInstr test and address other reviewer comments.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 Files: llvm/include/llvm/CodeGen/TargetFrameLowering.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCFIException.cpp llvm/lib/CodeGen/AsmPrinter/DwarfException.h llvm/lib/CodeGen/CFIInstrInserter.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/X86/X86FrameLowering.cpp llvm/lib/Target/X86/X86FrameLowering.h llvm/test/DebugInfo/X86/basic-block-sections-cfi_1.ll llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79978.276310.patch Type: text/x-patch Size: 15095 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 21:41:10 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:41:10 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: tmsriram added inline comments. ================ Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:488-490 + MachineFunction &MF = *MBB.getParent(); + if (!hasFP(MF)) + return; ---------------- wmi wrote: > .cfi_offset directive for framepointer is inserted after other .cfi_offset directives for callee save registers. This is different from the .cfi_offset order inserted for prologue. I am not familiar with how the dedup cfi is implemented. A question is will the order difference reduce the chance of deduplicating cfi in prologue? I fixed the order and checked that dedup works just fine. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 21:47:08 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:47:08 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <12224296ab1369e542c451512e338a5b@localhost.localdomain> MaskRay added a comment. Note that the number of lines actually increases... so I am not sure this is "simplification" Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Tue Jul 7 21:52:19 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:52:19 +0000 (UTC) Subject: [PATCH] D83368: [NewPM][opt] Share -disable-loop-unrolling between pass managers In-Reply-To: References: Message-ID: <4783a15e2f454c9a933112c0a78b69f4@localhost.localdomain> echristo accepted this revision. echristo added a comment. This revision is now accepted and ready to land. If this works then awesome. :) -eric Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83368/new/ https://reviews.llvm.org/D83368 From llvm-commits at lists.llvm.org Tue Jul 7 21:54:29 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:54:29 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: wmi accepted this revision. wmi added a comment. This revision is now accepted and ready to land. Thanks. LGTM. Please wait and see if David has more comments. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 21:58:45 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 04:58:45 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <6679dde89018de3f3050d081f1fa3540@localhost.localdomain> MaskRay added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:3084 + for (const HandlerInfo &HI : Handlers) { + HI.Handler->beginBasicBlock(MBB); + } ---------------- Drop braces for simple statements CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 22:02:15 2020 From: llvm-commits at lists.llvm.org (Kan Shengchen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:02:15 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <51f7151a645c559a160a9a7f1f42f371@localhost.localdomain> skan added a comment. In D83366#2138183 , @MaskRay wrote: > Note that the number of lines actually increases... so I am not sure this is "simplification" The increased line number comes from `}` and `break`, the logic is simplified and corrected since we do not need to check `isa` here. We extract the code for "MCAlignFragment" out to make it clear that it doesn't have fixup. If you care about the increased line number here, I can remove the unnecessary "{}" here or use `if` statement. 
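To make the shape of the refactoring concrete, here is a minimal sketch of the same control flow. It is a sketch under stated assumptions rather than the MC code: `FragKind`, `Frag`, and `countAppliedFixups` are hypothetical stand-ins, not the real `MCFragment` hierarchy. It shows a `switch` on the fragment kind where only the kinds that carry fixups fall through to the shared processing, and every other kind is skipped by the `default` label.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are not the real llvm::MCFragment classes.
enum class FragKind { Align, Data, Relaxable, Fill };

struct Frag {
  FragKind Kind;
  std::vector<int> Fixups; // only meaningful for Data and Relaxable
};

// Returns how many fixups would be applied across all fragments.
int countAppliedFixups(const std::vector<Frag> &Frags) {
  int Applied = 0;
  for (const Frag &F : Frags) {
    const std::vector<int> *Fixups = nullptr;
    switch (F.Kind) {
    default:
      // Kinds that carry no fixups (Align, Fill, or any kind added later)
      // land here; `continue` skips to the next fragment in the loop.
      continue;
    case FragKind::Data:
    case FragKind::Relaxable:
      Fixups = &F.Fixups;
      break;
    }
    // Shared processing, reached only by kinds that carry fixups.
    Applied += static_cast<int>(Fixups->size());
  }
  return Applied;
}
```

With this layout, adding a new enumerator that has no fixups needs no change to the loop, since it is covered by `default`.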
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Tue Jul 7 22:08:42 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:08:42 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <0b49d0b325e7dba823f1442106e56c73@localhost.localdomain> MaskRay added a comment. I studied CFIInstrInserter in May. If you don't mind, please give me some time to review as well. For `basic-block-sections-cfiinstr_1.ll`, have you considered places like `CodeGen/X86/cfi-inserter-*`? You may even create a subdirectory there. `_1` is not very common. `-1` is more common. `curl -L 'https://reviews.llvm.org/D79978?download=1'` does not have a/ or b/ prefix. I think that may be why `arc patch D79978` cannot apply the patch. Can you upload a diff with either `arc diff`, `git format-patch -1` or `git diff 'HEAD^'`? Thanks. ================ Comment at: llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll:6 +; CFI_INSTR: _Z7computebiiiiii +; CFI_INSTR: bb.0 +; CFI_INSTR: bb.1 ---------------- I think these labels may need a `:` suffix and a `# ` prefix to make them unique. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Tue Jul 7 22:38:45 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:38:45 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: MaskRay added a comment. In D83366#2138199 , @skan wrote: > In D83366#2138183 , @MaskRay wrote: > > > Note that the number of lines actually increases... 
so I am not sure this is "simplification" > > > The increased line number comes from `}` and `break`, the logic is simplified and corrected since we do not need to check `isa` here. We extract the code for "MCAlignFragment" out to make it clear that it doesn't have fixup. If you care > about the increased line number here, I can remove the unnecessary "{}" here or use `if` statement. This is: if else if else if -> switch. I think it is just a refactoring and I agree that it is the right thing to do, but I don't find complexity being simplified. ================ Comment at: llvm/lib/MC/MCAssembler.cpp:825 + // shouldInsertFixupForCodeAlign target hook. + if (MCAlignFragment *AF = dyn_cast(&Frag)) { + if (Sec.UseCodeAlign() && AF->hasEmitNops()) { ---------------- Can you move MCAlignFragment into the switch as well? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Tue Jul 7 22:52:33 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:52:33 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses Message-ID: ggeorgakoudis created this revision. Herald added subscribers: llvm-commits, sstefan1, hiraditya. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. Ignore callback uses when adding a callback function in the CallGraph. Callback functions are typically created when outlining, e.g. for OpenMP, so they have internal scope and linkage. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83370 Files: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp llvm/test/Analysis/CallGraph/ignore-callback-uses.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83370.276312.patch Type: text/x-patch Size: 5334 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 22:52:40 2020 From: llvm-commits at lists.llvm.org (Kan Shengchen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:52:40 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <37a9ca93342a7aefaf284d3adde46399@localhost.localdomain> skan added a comment. In D83366#2138210 , @MaskRay wrote: > This is: if else if else if -> switch. I think it is just a refactoring and I agree that it is the right thing to do, but I don't find complexity being simplified. Beyond the if else -> switch change, the simplification is that we no longer need to check `isa(&Frag)` in the new code. In the old code, adding a new kind of fragment that inherits from `MCEncodedFragment` but is not a `MCEncodedFragmentWithFixups` would require adding `if(!isa(Frag) continue)` here to avoid reaching "llvm_unreachable("Unknown fragment with fixups!")", which doesn't make sense since the new fragment has nothing to do with fixups. We only need to process `MCAlignFragment` and `MCEncodedFragmentWithFixups` here, so Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Tue Jul 7 22:53:36 2020 From: llvm-commits at lists.llvm.org (Xiang Zhang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:53:36 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <395e92d2ba8e93e6b880324f12e675e9@localhost.localdomain> xiangzhangllvm added a comment. That is really clearer than the old code. +1 for this refinement.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Tue Jul 7 22:56:16 2020 From: llvm-commits at lists.llvm.org (Serge Pavlov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:56:16 +0000 (UTC) Subject: [PATCH] D74729: [FPEnv] Intrinsic for setting rounding mode In-Reply-To: References: Message-ID: <1951d0ab5bac5d1c6047f923570ae850@localhost.localdomain> sepavloff updated this revision to Diff 276313. sepavloff added a comment. Rebased patch Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D74729/new/ https://reviews.llvm.org/D74729 Files: llvm/docs/LangRef.rst llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/IR/Intrinsics.td llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/IR/Verifier.cpp llvm/unittests/IR/IRBuilderTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D74729.276313.patch Type: text/x-patch Size: 6083 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 22:56:48 2020 From: llvm-commits at lists.llvm.org (Kan Shengchen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 05:56:48 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <36fb43b04f3c8967295b6176e8fdbdcd@localhost.localdomain> skan marked an inline comment as done. skan added inline comments. ================ Comment at: llvm/lib/MC/MCAssembler.cpp:825 + // shouldInsertFixupForCodeAlign target hook. + if (MCAlignFragment *AF = dyn_cast(&Frag)) { + if (Sec.UseCodeAlign() && AF->hasEmitNops()) { ---------------- MaskRay wrote: > Can you move MCAlignFragment into the switch as well? 
``` ArrayRef Fixups; MutableArrayRef Contents; const MCSubtargetInfo *STI = nullptr; for (const MCFixup &Fixup : Fixups) { ... ``` is not used by `MCAlignFragment`. I think an early `continue` here can make things clearer. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Tue Jul 7 23:20:53 2020 From: llvm-commits at lists.llvm.org (Michele Scandale via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:20:53 +0000 (UTC) Subject: [PATCH] D82659: Fix missing build dependency on omp_gen. In-Reply-To: References: Message-ID: <73cfa182f8947318a2073b00c2f8cf94@localhost.localdomain> michele.scandale added a comment. In D82659#2136999 , @clementval wrote: > Looks good but just one question ... When clang is built as standalone it does not build the OpenMP part inside Clang? I haven't seen any code to avoid compiling the OpenMP parsing and semantic checking inside clang. I don't think there is a way to avoid compiling the OpenMP support in Clang. The standalone build is just building the content of the `clang` directory as a separate CMake project reusing an already built LLVM -- therefore the `libLLVMFrontendOpenMP` as well as the `OMP.h.inc` would have been generated already. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82659/new/ https://reviews.llvm.org/D82659 From llvm-commits at lists.llvm.org Tue Jul 7 23:24:20 2020 From: llvm-commits at lists.llvm.org (Michał Górny via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:24:20 +0000 (UTC) Subject: [PATCH] D76665: [asan] Stop instrumenting user-defined ELF sections In-Reply-To: References: Message-ID: mgorny added a comment. In D76665#2138050 , @kcc wrote: > can we instead slap an attribute on these special variables? Could you explain, please?
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76665/new/ https://reviews.llvm.org/D76665 From llvm-commits at lists.llvm.org Tue Jul 7 23:28:38 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:28:38 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: <5bff1fb230cec495ed57bba8e72ec608@localhost.localdomain> kyulee updated this revision to Diff 276315. kyulee added a comment. Updating for Linux support. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 Files: llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/AArch64/AArch64InstrInfo.td llvm/lib/Target/AArch64/AArch64LowerHomogeneousPrologEpilog.cpp llvm/lib/Target/AArch64/AArch64TargetMachine.cpp llvm/lib/Target/AArch64/CMakeLists.txt llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-frame-tail.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-no-helper.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D76570.276315.patch Type: text/x-patch Size: 46122 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 23:30:58 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:30:58 +0000 (UTC) Subject: [PATCH] D83304: [LiveDebugValues][NFC] 0/4 Move LiveDebugValues source file ahead of refactor In-Reply-To: References: Message-ID: <8472a41cfa87071e59f7b2305dfb8ce1@localhost.localdomain> djtodoro accepted this revision. djtodoro added a comment. This revision is now accepted and ready to land. Thanks! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83304/new/ https://reviews.llvm.org/D83304 From llvm-commits at lists.llvm.org Tue Jul 7 23:35:58 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Tue, 07 Jul 2020 23:35:58 -0700 (PDT) Subject: [llvm] edc7da2 - Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare Message-ID: <5f05694e.1c69fb81.91876.e84f@mx.google.com> Author: serge-sans-paille Date: 2020-07-08T08:35:44+02:00 New Revision: edc7da24057b22896dc6522d3f98ccdd75a4e7f8 URL: https://github.com/llvm/llvm-project/commit/edc7da24057b22896dc6522d3f98ccdd75a4e7f8 DIFF: https://github.com/llvm/llvm-project/commit/edc7da24057b22896dc6522d3f98ccdd75a4e7f8.diff LOG: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare optimizeMemoryInst was reporting no change while still modifying the IR. Inspect the status of TypePromotionTransaction to get a better status. Related to https://reviews.llvm.org/D80916 Differential Revision: https://reviews.llvm.org/D81256 Added: Modified: llvm/lib/CodeGen/CodeGenPrepare.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp index 0d97a7feadcb..8181c6643cfd 100644 --- a/llvm/lib/CodeGen/CodeGenPrepare.cpp +++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -2822,8 +2822,9 @@ class TypePromotionTransaction { TypePromotionTransaction(SetOfInstrs &RemovedInsts) : RemovedInsts(RemovedInsts) {} - /// Advocate every changes made in that transaction. - void commit(); + /// Advocate every changes made in that transaction. Return true if any change + /// happen. + bool commit(); /// Undo all the changes made after the given point. void rollback(ConstRestorationPt Point); @@ -2929,11 +2930,13 @@ TypePromotionTransaction::getRestorationPoint() const { return !Actions.empty() ? 
Actions.back().get() : nullptr; } -void TypePromotionTransaction::commit() { +bool TypePromotionTransaction::commit() { for (CommitPt It = Actions.begin(), EndIt = Actions.end(); It != EndIt; ++It) (*It)->commit(); + bool Modified = !Actions.empty(); Actions.clear(); + return Modified; } void TypePromotionTransaction::rollback( @@ -4959,7 +4962,7 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr, TPT.rollback(LastKnownGood); return false; } - TPT.commit(); + bool Modified = TPT.commit(); // Get the combined AddrMode (or the only AddrMode, if we only had one). ExtAddrMode AddrMode = AddrModes.getAddrMode(); @@ -4973,7 +4976,7 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr, })) { LLVM_DEBUG(dbgs() << "CGP: Found local addrmode: " << AddrMode << "\n"); - return false; + return Modified; } // Insert this computation right after this user. Since our caller is @@ -5014,7 +5017,7 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr, // We can't add more than one pointer together, nor can we scale a // pointer (both of which seem meaningless). 
if (ResultPtr || AddrMode.Scale != 1) - return false; + return Modified; ResultPtr = AddrMode.ScaledReg; AddrMode.Scale = 0; @@ -5031,12 +5034,12 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr, Type *ScaledRegTy = AddrMode.ScaledReg->getType(); if (cast(IntPtrTy)->getBitWidth() > cast(ScaledRegTy)->getBitWidth()) - return false; + return Modified; } if (AddrMode.BaseGV) { if (ResultPtr) - return false; + return Modified; ResultPtr = AddrMode.BaseGV; } @@ -5060,7 +5063,7 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr, !AddrMode.BaseReg && !AddrMode.Scale && !AddrMode.BaseOffs) { SunkAddr = Constant::getNullValue(Addr->getType()); } else if (!ResultPtr) { - return false; + return Modified; } else { Type *I8PtrTy = Builder.getInt8PtrTy(Addr->getType()->getPointerAddressSpace()); @@ -5145,7 +5148,7 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr, (ScalePtrTy && DL->isNonIntegralPointerType(ScalePtrTy)) || (AddrMode.BaseGV && DL->isNonIntegralPointerType(AddrMode.BaseGV->getType()))) - return false; + return Modified; LLVM_DEBUG(dbgs() << "CGP: SINKING nonlocal addrmode: " << AddrMode << " for " << *MemoryInst << "\n"); @@ -5185,7 +5188,7 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr, Instruction *I = dyn_cast_or_null(Result); if (I && (Result != AddrMode.BaseReg)) I->eraseFromParent(); - return false; + return Modified; } if (AddrMode.Scale != 1) V = Builder.CreateMul(V, ConstantInt::get(IntPtrTy, AddrMode.Scale), From llvm-commits at lists.llvm.org Tue Jul 7 23:36:08 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:36:08 +0000 (UTC) Subject: [PATCH] D81256: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare In-Reply-To: References: Message-ID: <667d173e1bdff630b1671b61c83e0c3a@localhost.localdomain> This revision was automatically updated to reflect 
the committed changes. Closed by commit rGedc7da24057b: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare (authored by serge-sans-paille). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81256/new/ https://reviews.llvm.org/D81256 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp Index: llvm/lib/CodeGen/CodeGenPrepare.cpp =================================================================== --- llvm/lib/CodeGen/CodeGenPrepare.cpp +++ llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -2822,8 +2822,9 @@ TypePromotionTransaction(SetOfInstrs &RemovedInsts) : RemovedInsts(RemovedInsts) {} - /// Advocate every changes made in that transaction. - void commit(); + /// Advocate every changes made in that transaction. Return true if any change + /// happen. + bool commit(); /// Undo all the changes made after the given point. void rollback(ConstRestorationPt Point); @@ -2929,11 +2930,13 @@ return !Actions.empty() ? Actions.back().get() : nullptr; } -void TypePromotionTransaction::commit() { +bool TypePromotionTransaction::commit() { for (CommitPt It = Actions.begin(), EndIt = Actions.end(); It != EndIt; ++It) (*It)->commit(); + bool Modified = !Actions.empty(); Actions.clear(); + return Modified; } void TypePromotionTransaction::rollback( @@ -4959,7 +4962,7 @@ TPT.rollback(LastKnownGood); return false; } - TPT.commit(); + bool Modified = TPT.commit(); // Get the combined AddrMode (or the only AddrMode, if we only had one). ExtAddrMode AddrMode = AddrModes.getAddrMode(); @@ -4973,7 +4976,7 @@ })) { LLVM_DEBUG(dbgs() << "CGP: Found local addrmode: " << AddrMode << "\n"); - return false; + return Modified; } // Insert this computation right after this user. Since our caller is @@ -5014,7 +5017,7 @@ // We can't add more than one pointer together, nor can we scale a // pointer (both of which seem meaningless). 
if (ResultPtr || AddrMode.Scale != 1) - return false; + return Modified; ResultPtr = AddrMode.ScaledReg; AddrMode.Scale = 0; @@ -5031,12 +5034,12 @@ Type *ScaledRegTy = AddrMode.ScaledReg->getType(); if (cast(IntPtrTy)->getBitWidth() > cast(ScaledRegTy)->getBitWidth()) - return false; + return Modified; } if (AddrMode.BaseGV) { if (ResultPtr) - return false; + return Modified; ResultPtr = AddrMode.BaseGV; } @@ -5060,7 +5063,7 @@ !AddrMode.BaseReg && !AddrMode.Scale && !AddrMode.BaseOffs) { SunkAddr = Constant::getNullValue(Addr->getType()); } else if (!ResultPtr) { - return false; + return Modified; } else { Type *I8PtrTy = Builder.getInt8PtrTy(Addr->getType()->getPointerAddressSpace()); @@ -5145,7 +5148,7 @@ (ScalePtrTy && DL->isNonIntegralPointerType(ScalePtrTy)) || (AddrMode.BaseGV && DL->isNonIntegralPointerType(AddrMode.BaseGV->getType()))) - return false; + return Modified; LLVM_DEBUG(dbgs() << "CGP: SINKING nonlocal addrmode: " << AddrMode << " for " << *MemoryInst << "\n"); @@ -5185,7 +5188,7 @@ Instruction *I = dyn_cast_or_null(Result); if (I && (Result != AddrMode.BaseReg)) I->eraseFromParent(); - return false; + return Modified; } if (AddrMode.Scale != 1) V = Builder.CreateMul(V, ConstantInt::get(IntPtrTy, AddrMode.Scale), -------------- next part -------------- A non-text attachment was scrubbed... Name: D81256.276318.patch Type: text/x-patch Size: 3297 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 23:38:17 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:38:17 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <16626199936b7c4d705585643f36b39d@localhost.localdomain> serge-sans-paille added a comment. All the reviews required to land this without breaking the validation have landed, @jdoerfert / @nikic / @foad I'd be super-happy to land this one before branching. 
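For readers following the D81256 diff quoted above, the shape of the change can be sketched in a few lines of standalone C++. This is a simplified illustration, not the CodeGenPrepare code: `Transaction`, `record`, and `optimize` are hypothetical names, and plain `std::function` callbacks stand in for the real action objects. The only point is that `commit()` reports whether any action ran, and that a later early exit returns that status instead of `false`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transaction that queues pending changes.
class Transaction {
  std::vector<std::function<void()>> Actions;

public:
  void record(std::function<void()> A) { Actions.push_back(std::move(A)); }

  // Apply every queued action; return true if there was at least one.
  bool commit() {
    for (auto &A : Actions)
      A();
    bool Modified = !Actions.empty();
    Actions.clear();
    return Modified;
  }
};

// Hypothetical caller: even when it bails out early after commit(), it must
// report the changes that commit() already applied.
bool optimize(Transaction &T, bool BailOutEarly, int &Counter) {
  bool Modified = T.commit();
  if (BailOutEarly)
    return Modified; // a plain "return false" here would hide the commit
  ++Counter;
  return true;
}
```

A check that compares a pass's reported status with what it actually did, which is roughly what D80916 is described as adding, would flag the `return false` variant whenever the transaction was non-empty.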
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Tue Jul 7 23:47:39 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:47:39 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <4646a56de0615c863811fdf5cc77d140@localhost.localdomain> MaskRay added inline comments. ================ Comment at: llvm/lib/MC/MCAssembler.cpp:825 + // shouldInsertFixupForCodeAlign target hook. + if (MCAlignFragment *AF = dyn_cast(&Frag)) { + if (Sec.UseCodeAlign() && AF->hasEmitNops()) { ---------------- skan wrote: > MaskRay wrote: > > Can you move MCAlignFragment into the switch as well? > ``` > ArrayRef Fixups; > MutableArrayRef Contents; > const MCSubtargetInfo *STI = nullptr; > > for (const MCFixup &Fixup : Fixups) { > ... > ``` > is not used by `MCAlignFragment`. I think an early `continue` here can make things more clear. They have trivial constructors. Moving it can be clearer/more efficient. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Tue Jul 7 23:58:37 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 06:58:37 +0000 (UTC) Subject: [PATCH] D81256: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGedc7da24057b: Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare (authored by serge-sans-paille). 
Changed prior to commit: https://reviews.llvm.org/D81256?vs=272951&id=275697#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81256/new/ https://reviews.llvm.org/D81256 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp Index: llvm/lib/CodeGen/CodeGenPrepare.cpp =================================================================== --- llvm/lib/CodeGen/CodeGenPrepare.cpp +++ llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -2822,8 +2822,9 @@ TypePromotionTransaction(SetOfInstrs &RemovedInsts) : RemovedInsts(RemovedInsts) {} - /// Advocate every changes made in that transaction. - void commit(); + /// Advocate every changes made in that transaction. Return true if any change + /// happen. + bool commit(); /// Undo all the changes made after the given point. void rollback(ConstRestorationPt Point); @@ -2929,11 +2930,13 @@ return !Actions.empty() ? Actions.back().get() : nullptr; } -void TypePromotionTransaction::commit() { +bool TypePromotionTransaction::commit() { for (CommitPt It = Actions.begin(), EndIt = Actions.end(); It != EndIt; ++It) (*It)->commit(); + bool Modified = !Actions.empty(); Actions.clear(); + return Modified; } void TypePromotionTransaction::rollback( @@ -4959,7 +4962,7 @@ TPT.rollback(LastKnownGood); return false; } - TPT.commit(); + bool Modified = TPT.commit(); // Get the combined AddrMode (or the only AddrMode, if we only had one). ExtAddrMode AddrMode = AddrModes.getAddrMode(); @@ -4973,7 +4976,7 @@ })) { LLVM_DEBUG(dbgs() << "CGP: Found local addrmode: " << AddrMode << "\n"); - return false; + return Modified; } // Insert this computation right after this user. Since our caller is @@ -5014,7 +5017,7 @@ // We can't add more than one pointer together, nor can we scale a // pointer (both of which seem meaningless). 
if (ResultPtr || AddrMode.Scale != 1) - return false; + return Modified; ResultPtr = AddrMode.ScaledReg; AddrMode.Scale = 0; @@ -5031,12 +5034,12 @@ Type *ScaledRegTy = AddrMode.ScaledReg->getType(); if (cast(IntPtrTy)->getBitWidth() > cast(ScaledRegTy)->getBitWidth()) - return false; + return Modified; } if (AddrMode.BaseGV) { if (ResultPtr) - return false; + return Modified; ResultPtr = AddrMode.BaseGV; } @@ -5060,7 +5063,7 @@ !AddrMode.BaseReg && !AddrMode.Scale && !AddrMode.BaseOffs) { SunkAddr = Constant::getNullValue(Addr->getType()); } else if (!ResultPtr) { - return false; + return Modified; } else { Type *I8PtrTy = Builder.getInt8PtrTy(Addr->getType()->getPointerAddressSpace()); @@ -5145,7 +5148,7 @@ (ScalePtrTy && DL->isNonIntegralPointerType(ScalePtrTy)) || (AddrMode.BaseGV && DL->isNonIntegralPointerType(AddrMode.BaseGV->getType()))) - return false; + return Modified; LLVM_DEBUG(dbgs() << "CGP: SINKING nonlocal addrmode: " << AddrMode << " for " << *MemoryInst << "\n"); @@ -5185,7 +5188,7 @@ Instruction *I = dyn_cast_or_null(Result); if (I && (Result != AddrMode.BaseReg)) I->eraseFromParent(); - return false; + return Modified; } if (AddrMode.Scale != 1) V = Builder.CreateMul(V, ConstantInt::get(IntPtrTy, AddrMode.Scale), -------------- next part -------------- A non-text attachment was scrubbed... Name: D81256.275697.patch Type: text/x-patch Size: 3297 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 00:00:28 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:00:28 +0000 (UTC) Subject: [PATCH] D83314: [llvm-readobj] - Refine error reporting in MipsGOTParser helper. In-Reply-To: References: Message-ID: <86037664e97348f820526f276ca98989@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM, with one nit. 
================ Comment at: llvm/test/tools/llvm-readobj/ELF/mips-plt.test:127 + +## Check we report errors when we are unable to dump PLTGOT table properly. + ---------------- "to dump the PLTGOT properly" (GOT stands for Global Offset Table, so no need to say "table" again). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83314/new/ https://reviews.llvm.org/D83314 From llvm-commits at lists.llvm.org Wed Jul 8 00:08:34 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:08:34 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <3095a25cdcc1ee54fa6e9d13c765696a@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- dblaikie wrote: > Higuoxing wrote: > > jhenderson wrote: > > > dblaikie wrote: > > > > Higuoxing wrote: > > > > > dblaikie wrote: > > > > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > > > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. 
> > > > > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > > > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the exisiting llvm-dwarfdump tests require for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > > > > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. > > Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! > > Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? 
Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me. > > Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure. By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Wed Jul 8 00:10:02 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:10:02 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: jhenderson added inline comments. ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:76-89 + if (!C) { + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 + " terminated prematurely: %s", + SetOffset, toString(std::move(C.takeError())).c_str())); + continue; ---------------- dblaikie wrote: > ikudrin wrote: > > jhenderson wrote: > > > dblaikie wrote: > > > > jhenderson wrote: > > > > > ikudrin wrote: > > > > > > dblaikie wrote: > > > > > > > I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended. 
> > > > > > > > > > > > > > Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas. > > > > > > These wordings are already better than mine. Thanks! > > > > > How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context. > > > > The suggestion wasn't for brevity, but clarity. I found the original messages unclear & was hoping to clarify them. > > > > > > > > What are the two messages in total (with all the added context, for both too short and too long) & how clear are they? > > > Taken from the test case: > > > > > > ``` > > > error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72 > > > ``` > > > (the "no null teminated" bit might differ depending on the exact failure, e.g. "unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)") > > > > > > ``` > > > error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c > > > ``` > > Thanks, @jhenderson! @dblaikie are you OK with these messages or going to suggest a better alternative? > This one sounds OK (guess it could be more precise in this case "bounds reached without finding expected null terminator" perhaps - but I realize that's fairly orthogonal to this patch & could be improved in the general DataExtractor infrastructure) - honestly the verbosity of these messages doesn't seem like a problem to me. They should be pretty rare & when they do come up, the more explicit/precise the better, it seems to me. 
> ``` > error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72 > ``` > This one > ``` > error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c > ``` > Still seems like it could be more precise - exactly why was the terminator unexpected? "has a terminator at 0x8c before the expected end at 0x??" perhaps. > "has a terminator at 0x8c before the expected end at 0x??" perhaps. Sounds good to me. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Wed Jul 8 00:12:55 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:12:55 +0000 (UTC) Subject: [PATCH] D83350: [test] Run llvm/test/**/*.yaml In-Reply-To: References: Message-ID: jhenderson accepted this revision. jhenderson added a comment. LGTM. For my sanity, can you confirm which the five new tests are that you are running? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83350/new/ https://reviews.llvm.org/D83350 From llvm-commits at lists.llvm.org Wed Jul 8 00:15:22 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:15:22 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: <32b157000488c844309c2595df7da94b@localhost.localdomain> RKSimon marked an inline comment as done. RKSimon added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36727 + unsigned NumCstElts = cast(C->getType())->getNumElements(); + if (NumCstElts != NumElts && NumCstElts != (NumElts * 2)) + return false; ---------------- craig.topper wrote: > I think this check isn't enough if the load is narrower than the constant pool vector. 
For example a v16i8 load with a v32i8 constant pool. So NumCstElts == NumElts * 2 and we'll proceed. > > I think this is the cause of some failures we're seeing, but I don't have a reduced case yet. Does checking that Mask.getValueSizeInBits() == C->getSizeInBits() as well help? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Wed Jul 8 00:16:18 2020 From: llvm-commits at lists.llvm.org (Kan Shengchen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:16:18 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: skan updated this revision to Diff 276323. skan marked an inline comment as done. skan added a comment. Address review comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 Files: llvm/lib/MC/MCAssembler.cpp Index: llvm/lib/MC/MCAssembler.cpp =================================================================== --- llvm/lib/MC/MCAssembler.cpp +++ llvm/lib/MC/MCAssembler.cpp @@ -820,48 +820,57 @@ // Evaluate and apply the fixups, generating relocation entries as necessary. for (MCSection &Sec : *this) { for (MCFragment &Frag : Sec) { - // Data and relaxable fragments both have fixups. So only process - // those here. - // FIXME: Is there a better way to do this? MCEncodedFragmentWithFixups - // being templated makes this tricky. 
- if (isa(&Frag) && - isa(&Frag)) - continue; - if (!isa(&Frag) && !isa(&Frag) && - !isa(&Frag)) - continue; ArrayRef Fixups; MutableArrayRef Contents; const MCSubtargetInfo *STI = nullptr; - if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *AF = dyn_cast(&Frag)) { + + // Process MCAlignFragment and MCEncodedFragmentWithFixups here. + switch (Frag.getKind()) { + default: + continue; + case MCFragment::FT_Align: { + MCAlignFragment &AF = cast(Frag); // Insert fixup type for code alignment if the target define // shouldInsertFixupForCodeAlign target hook. 
- if (Sec.UseCodeAlign() && AF->hasEmitNops()) { - getBackend().shouldInsertFixupForCodeAlign(*this, Layout, *AF); - } + if (Sec.UseCodeAlign() && AF.hasEmitNops()) + getBackend().shouldInsertFixupForCodeAlign(*this, Layout, AF); continue; - } else if (auto *FragWithFixups = - dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else - llvm_unreachable("Unknown fragment with fixups!"); + } + case MCFragment::FT_Data: { + MCDataFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + STI = DF.getSubtargetInfo(); + assert(!DF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_Relaxable: { + MCRelaxableFragment &RF = cast(Frag); + Fixups = RF.getFixups(); + Contents = RF.getContents(); + STI = RF.getSubtargetInfo(); + assert(!RF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_CVDefRange: { + MCCVDefRangeFragment &CF = cast(Frag); + Fixups = CF.getFixups(); + Contents = CF.getContents(); + break; + } + case MCFragment::FT_Dwarf: { + MCDwarfLineAddrFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + case MCFragment::FT_DwarfFrame: { + MCDwarfCallFrameFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + } for (const MCFixup &Fixup : Fixups) { uint64_t FixedValue; bool IsResolved; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83366.276323.patch Type: text/x-patch Size: 4224 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 00:21:55 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:21:55 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <948603b95e2d819361920a9f8756de4e@localhost.localdomain> jhenderson added a comment. 
Mostly looks good, barring @smeenai's comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test:27 + +## Missing -static option: +# RUN: not llvm-libtool-darwin -o %t.lib %t.input 2>&1 | \ ---------------- sameerarora101 wrote: > jhenderson wrote: > > Maybe make this more generic, e.g. "Missing library type option:" > > > > I don't really know where to put this test. It might want to be its own test case entirely, e.g. "missing-library-type.test" > Isn't it similar to `## Missing output file`? (That's why I thought it would go here). But I have placed it in `missing-library-type.test` for now. Similar in concept, but different behaviour (it's checking the behaviour of the `cl::Required` on the `Library operation` option, which is a different thing to the input and output files). ================ Comment at: llvm/test/tools/llvm-libtool-darwin/missing-library-type.test:3-5 +# RUN: FileCheck %s --check-prefix=MISSING-OPERATION --strict-whitespace + +# MISSING-OPERATION: Library Type: option: must be specified at least once! ---------------- I don't think it's necessary to add --strict-whitespace here, as the exact formatting of the message isn't really the tool's responsibility. I'd get rid of the double space, as if the message was correct, and the option, so that you don't have to modify this test when you fix the command line message. ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:85 + std::vector NewMembers; + for (const StringRef Member : InputFiles) + if (Error E = addMember(NewMembers, Member)) ---------------- You can get rid of the `const` too if you want. `StringRef` itself is immutable, so all this does is stop you re-assigning `Member`. 
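The remark about `const` on a by-value `StringRef` loop variable can be illustrated without LLVM headers. The sketch below uses `std::string_view` as a stand-in for `llvm::StringRef` (an assumption made only to keep the example self-contained; both are non-owning, read-only views), and `countNonEmpty` and `orPlaceholder` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// With `const`, the loop variable cannot be re-assigned inside the body; the
// characters it refers to are read-only either way.
std::size_t countNonEmpty(const std::vector<std::string> &Files) {
  std::size_t N = 0;
  for (const std::string_view Member : Files)
    if (!Member.empty())
      ++N;
  return N;
}

// Without `const`, the view itself can be re-pointed at other data, but the
// underlying characters still cannot be modified through it.
std::string_view orPlaceholder(std::string_view Member) {
  if (Member.empty())
    Member = "<none>"; // legal only because Member is not const
  return Member;
}
```

Either spelling compiles and behaves the same when the body never re-assigns the variable, which is why dropping the `const` is presented as optional.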
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Wed Jul 8 00:33:34 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:33:34 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: craig.topper added a subscriber: yubing. craig.topper added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36727 + unsigned NumCstElts = cast(C->getType())->getNumElements(); + if (NumCstElts != NumElts && NumCstElts != (NumElts * 2)) + return false; ---------------- RKSimon wrote: > craig.topper wrote: > > I think this check isn't enough if the load is narrower than the constant pool vector. For example a v16i8 load with a v32i8 constant pool. So NumCstElts == NumElts * 2 and we'll proceed. > > > > I think this is the cause of some failures we're seeing, but I don't have a reduced case yet. > Does checking that Mask.getValueSizeInBits() == C->getSizeInBits() as well help? Yeah I ended up trying that right after I wrote my earlier message. That fixed the failing tests we had. I think we may have a reduced test case now. @pengfei or @yubing can you share it? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Wed Jul 8 00:40:24 2020 From: llvm-commits at lists.llvm.org (Itay Bookstein via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:40:24 +0000 (UTC) Subject: [PATCH] D81911: [IR] Fix getBaseObject for GlobalAlias-to-GlobalIFunc In-Reply-To: References: Message-ID: <9cf32655b8abb2ee4fd53a2a922e552e@localhost.localdomain> nextsilicon-itay-bookstein added a comment. 
In D81911#2137237, @DmitryPolukhin wrote: > I did a bit of archeology and it turns out that getBaseObject was moved from GlobalAlias to GlobalIndirectSymbol in https://github.com/llvm/llvm-project/commit/95549497ec8b5269f0439f12859537b7371b7c90 > It looks like the simplest solution is to handle nullptr from getBaseObject in computeAliasSummary... Doesn't that mean that getBaseObject is given two contradictory meanings, then? On the one hand, from what you say, the base object should be able to "act in place of the enclosing object" => GlobalIFunc::getBaseObject() should return itself, since the resolver doesn't substitute for the symbol it's supposed to resolve. On the other hand, from the expectations of code like DCE and SplitModule (in the commit you linked), the base object should tell you that these objects are "linked together" in some respect => GlobalIFunc::getBaseObject() should return the resolver, since they are 'inseparable' in that respect. From a types point of view, either of the two restores the idempotence of getBaseObject(). But as far as this PR is concerned: - Without this PR's changes, if you have Alias-to-IFunc, getBaseObject() on the Alias returns nullptr and getBaseObject() on the IFunc returns the resolver. - If we put "return GI;" in findBaseObject(), getBaseObject() for the Alias returns the IFunc, but getBaseObject() on the IFunc returns the resolver. Neither case is idempotent :\ CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81911/new/ https://reviews.llvm.org/D81911 From llvm-commits at lists.llvm.org Wed Jul 8 00:42:07 2020 From: llvm-commits at lists.llvm.org (Pengfei Wang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:42:07 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: <4efb9e9636b54cd7fd5bdb3ba7eb217c@localhost.localdomain> pengfei added inline comments.
================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36727 + unsigned NumCstElts = cast(C->getType())->getNumElements(); + if (NumCstElts != NumElts && NumCstElts != (NumElts * 2)) + return false; ---------------- craig.topper wrote: > RKSimon wrote: > > craig.topper wrote: > > > I think this check isn't enough if the load is narrower than the constant pool vector. For example a v16i8 load with a v32i8 constant pool. So NumCstElts == NumElts * 2 and we'll proceed. > > > > > > I think this is the cause of some failures we're seeing, but I don't have a reduced case yet. > > Does checking that Mask.getValueSizeInBits() == C->getSizeInBits() as well help? > Yeah I ended up trying that right after I wrote my earlier message. That fixed the failing tests we had. I think we may have a reduced test case now. @pengfei or @yubing can you share it? Sure. There's a small one here https://godbolt.org/z/hsh5_K Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Wed Jul 8 00:53:07 2020 From: llvm-commits at lists.llvm.org (Bing Yu via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:53:07 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: <69747d65eb6eb8022e7f1973e7225d59@localhost.localdomain> yubing added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36736 + Constant *Elt = C->getAggregateElement(i); + if (!DemandedElts[i / Scale] && !isa(Elt)) { + ConstVecOps.push_back(UndefValue::get(Elt->getType())); ---------------- Hi, Simon. I'm just wondering why we divide i by Scale here. In my case, when SimplifyDemandedVectorEltsForTargetShuffle visits t150, DemandedElts is 0xff0f and Scale is 2, so when i=8, DemandedElts[i / Scale] is false, but DemandedElts[i] is true.
Thus t146[8] will become undef, whereas its previous value was -1. t146: i64 = X86ISD::Wrapper TargetConstantPool:i64<<32 x i8> > 0 t154: v16i8,ch = load<(load 16 from constant-pool, align 32)> t0, t146, undef:i64 t150: v16i8 = X86ISD::PSHUFB t156, t154 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Wed Jul 8 00:54:10 2020 From: llvm-commits at lists.llvm.org (Maksym Wezdecki via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:54:10 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind Message-ID: mwezdeck created this revision. mwezdeck added reviewers: lattner, dblaikie, resistor, bkramer, craig.topper. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. If the LLVM shared library is dlopen'ed and dlclose'd several times, Valgrind reports a memory leak. This patch fixes the issue. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83372 Files: llvm/lib/Support/ManagedStatic.cpp Index: llvm/lib/Support/ManagedStatic.cpp =================================================================== --- llvm/lib/Support/ManagedStatic.cpp +++ llvm/lib/Support/ManagedStatic.cpp @@ -18,23 +18,14 @@ using namespace llvm; static const ManagedStaticBase *StaticList = nullptr; -static std::recursive_mutex *ManagedStaticMutex = nullptr; +static std::recursive_mutex ManagedStaticMutex; static llvm::once_flag mutex_init_flag; -static void initializeMutex() { - ManagedStaticMutex = new std::recursive_mutex(); -} - -static std::recursive_mutex *getManagedStaticMutex() { - llvm::call_once(mutex_init_flag, initializeMutex); - return ManagedStaticMutex; -} - void ManagedStaticBase::RegisterManagedStatic(void *(*Creator)(), void (*Deleter)(void*)) const { assert(Creator); if (llvm_is_multithreaded()) { - std::lock_guard Lock(*getManagedStaticMutex()); + std::lock_guard Lock(ManagedStaticMutex);
if (!Ptr.load(std::memory_order_relaxed)) { void *Tmp = Creator(); @@ -76,7 +67,7 @@ /// llvm_shutdown - Deallocate and destroy all ManagedStatic variables. void llvm::llvm_shutdown() { - std::lock_guard Lock(*getManagedStaticMutex()); + std::lock_guard Lock(ManagedStaticMutex); while (StaticList) StaticList->destroy(); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83372.276326.patch Type: text/x-patch Size: 1415 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 00:57:42 2020 From: llvm-commits at lists.llvm.org (Sam Parker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:57:42 +0000 (UTC) Subject: [PATCH] D83313: [MachineOutliner] Fix liveness computing. In-Reply-To: References: Message-ID: <694704ad0a5cd19162e7ba1438ead339@localhost.localdomain> samparker added a comment. I guess I just don't understand why the BX_RET wouldn't be marked with using the link register in the first place? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83313/new/ https://reviews.llvm.org/D83313 From llvm-commits at lists.llvm.org Wed Jul 8 00:58:45 2020 From: llvm-commits at lists.llvm.org (LLVM GN Syncbot via llvm-commits) Date: Wed, 08 Jul 2020 00:58:45 -0700 (PDT) Subject: [llvm] d8dfd6d - [gn build] Port 20e271a98de Message-ID: <5f057cb5.1c69fb81.f6fd9.7d9f@mx.google.com> Author: LLVM GN Syncbot Date: 2020-07-08T07:52:15Z New Revision: d8dfd6dcc143a2164ae781de6598e72b7183fc3f URL: https://github.com/llvm/llvm-project/commit/d8dfd6dcc143a2164ae781de6598e72b7183fc3f DIFF: https://github.com/llvm/llvm-project/commit/d8dfd6dcc143a2164ae781de6598e72b7183fc3f.diff LOG: [gn build] Port 20e271a98de Added: Modified: llvm/utils/gn/secondary/clang/lib/StaticAnalyzer/Checkers/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/clang/lib/StaticAnalyzer/Checkers/BUILD.gn 
b/llvm/utils/gn/secondary/clang/lib/StaticAnalyzer/Checkers/BUILD.gn index c5f57a89623f..8a930b459774 100644 --- a/llvm/utils/gn/secondary/clang/lib/StaticAnalyzer/Checkers/BUILD.gn +++ b/llvm/utils/gn/secondary/clang/lib/StaticAnalyzer/Checkers/BUILD.gn @@ -107,6 +107,7 @@ static_library("Checkers") { "RunLoopAutoreleaseLeakChecker.cpp", "STLAlgorithmModeling.cpp", "SimpleStreamChecker.cpp", + "SmartPtrChecker.cpp", "SmartPtrModeling.cpp", "StackAddrEscapeChecker.cpp", "StdLibraryFunctionsChecker.cpp", From llvm-commits at lists.llvm.org Wed Jul 8 00:59:01 2020 From: llvm-commits at lists.llvm.org (Peter Smith via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 07:59:01 +0000 (UTC) Subject: [PATCH] D83243: [ELF] Rename canRelax to sharedToExecRelax. NFC In-Reply-To: References: Message-ID: <4353db6cb68cd554946b301bd48ef422@localhost.localdomain> psmith added inline comments. ================ Comment at: lld/ELF/Relocations.cpp:311 R_TLSIE_HINT>(expr) && - canRelax && isLocalInExecutable) { + sharedToExecRelax && isLocalInExecutable) { c.relocations.push_back({R_RELAX_TLS_IE_TO_LE, type, offset, addend, &sym}); ---------------- MaskRay wrote: > Initial-Exec -> Local-Exec is a relaxation from executable to executable. `sharedToExecRelax` is not an appropriate name. Shall we rename the variable? > > Technically, a shared object can use Initial-Exec as well if it is part of initial modules (via transitive DT_NEEDED; `DF_STATIC_TLS`). Perhaps just toExecRelax or canExecRelax? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83243/new/ https://reviews.llvm.org/D83243 From llvm-commits at lists.llvm.org Wed Jul 8 00:59:55 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Wed, 08 Jul 2020 00:59:55 -0700 (PDT) Subject: [llvm] 80970ac - [DSE,MSSA] Eliminate stores by terminators (free,lifetime.end). 
Message-ID: <5f057cfb.1c69fb81.8256c.eb99@mx.google.com> Author: Florian Hahn Date: 2020-07-08T08:59:46+01:00 New Revision: 80970ac87574c6d0292894a4a912fa512336f434 URL: https://github.com/llvm/llvm-project/commit/80970ac87574c6d0292894a4a912fa512336f434 DIFF: https://github.com/llvm/llvm-project/commit/80970ac87574c6d0292894a4a912fa512336f434.diff LOG: [DSE,MSSA] Eliminate stores by terminators (free,lifetime.end). This patch adds support for eliminating stores by free & lifetime.end calls. We can remove stores that are not read before calling a memory terminator and we can eliminate all stores after a memory terminator until we see a new lifetime.start. The second case seems to not really trigger much in practice though. Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D72410 Added: Modified: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp index dd8dc84d9589..e58db03225ee 100644 --- a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp +++ b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp @@ -51,6 +51,7 @@ #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Module.h" #include "llvm/IR/PassManager.h" +#include "llvm/IR/PatternMatch.h" #include "llvm/IR/Value.h" #include "llvm/InitializePasses.h" #include "llvm/Pass.h" @@ 
-73,6 +74,7 @@ #include using namespace llvm; +using namespace PatternMatch; #define DEBUG_TYPE "dse" @@ -1533,7 +1535,7 @@ struct DSEState { auto *MD = dyn_cast_or_null(MA); if (MD && State.MemDefs.size() < MemorySSADefsPerBlockLimit && - State.getLocForWriteEx(&I)) + (State.getLocForWriteEx(&I) || State.isMemTerminatorInst(&I))) State.MemDefs.push_back(MD); // Track whether alloca and alloca-like objects are visible in the @@ -1667,6 +1669,51 @@ struct DSEState { return true; } + /// If \p I is a memory terminator like llvm.lifetime.end or free, return a + /// pair with the MemoryLocation terminated by \p I and a boolean flag + /// indicating whether \p I is a free-like call. + Optional> + getLocForTerminator(Instruction *I) const { + uint64_t Len; + Value *Ptr; + if (match(I, m_Intrinsic(m_ConstantInt(Len), + m_Value(Ptr)))) + return {std::make_pair(MemoryLocation(Ptr, Len), false)}; + + if (auto *CB = dyn_cast(I)) { + if (isFreeCall(I, &TLI)) + return {std::make_pair(MemoryLocation(CB->getArgOperand(0)), true)}; + } + + return None; + } + + /// Returns true if \p I is a memory terminator instruction like + /// llvm.lifetime.end or free. + bool isMemTerminatorInst(Instruction *I) const { + IntrinsicInst *II = dyn_cast(I); + return (II && II->getIntrinsicID() == Intrinsic::lifetime_end) || + isFreeCall(I, &TLI); + } + + /// Returns true if \p MaybeTerm is a memory terminator for the same + /// underlying object as \p DefLoc. + bool isMemTerminator(MemoryLocation DefLoc, Instruction *MaybeTerm) const { + Optional> MaybeTermLoc = + getLocForTerminator(MaybeTerm); + + if (!MaybeTermLoc) + return false; + + // If the terminator is a free-like call, all accesses to the underlying + // object can be considered terminated. 
+ if (MaybeTermLoc->second) { + DataLayout DL = MaybeTerm->getParent()->getModule()->getDataLayout(); + DefLoc = MemoryLocation(GetUnderlyingObject(DefLoc.Ptr, DL)); + } + return AA.isMustAlias(MaybeTermLoc->first, DefLoc); + } + // Returns true if \p Use may read from \p DefLoc. bool isReadClobber(MemoryLocation DefLoc, Instruction *UseInst) const { if (!UseInst->mayReadFromMemory()) @@ -1772,6 +1819,11 @@ struct DSEState { continue; } + // A memory terminator kills all preceeding MemoryDefs and all succeeding + // MemoryAccesses. We do not have to check it's users. + if (isMemTerminator(DefLoc, UseInst)) + continue; + // Uses which may read the original MemoryDef mean we cannot eliminate the // original MD. Stop walk. if (isReadClobber(DefLoc, UseInst)) { @@ -2059,6 +2111,12 @@ bool eliminateDeadStoresMemorySSA(Function &F, AliasAnalysis &AA, Instruction *SI = KillingDef->getMemoryInst(); auto MaybeSILoc = State.getLocForWriteEx(SI); + if (State.isMemTerminatorInst(SI)) + MaybeSILoc = State.getLocForTerminator(SI).map( + [](const std::pair &P) { return P.first; }); + else + MaybeSILoc = State.getLocForWriteEx(SI); + if (!MaybeSILoc) { LLVM_DEBUG(dbgs() << "Failed to find analyzable write location for " << *SI << "\n"); @@ -2165,43 +2223,55 @@ bool eliminateDeadStoresMemorySSA(Function &F, AliasAnalysis &AA, continue; MemoryLocation NILoc = *State.getLocForWriteEx(NI); - // Check if NI overwrites SI. 
- int64_t InstWriteOffset, DepWriteOffset; - auto Iter = State.IOLs.insert( - std::make_pair( - NI->getParent(), InstOverlapIntervalsTy())); - auto &IOL = Iter.first->second; - OverwriteResult OR = isOverwrite(SILoc, NILoc, DL, TLI, DepWriteOffset, - InstWriteOffset, NI, IOL, AA, &F); - - if (EnablePartialStoreMerging && OR == OW_PartialEarlierWithFullLater) { - auto *Earlier = dyn_cast(NI); - auto *Later = dyn_cast(SI); - if (Constant *Merged = tryToMergePartialOverlappingStores( - Earlier, Later, InstWriteOffset, DepWriteOffset, DL, &AA, - &DT)) { - - // Update stored value of earlier store to merged constant. - Earlier->setOperand(0, Merged); - ++NumModifiedStores; - MadeChange = true; - - // Remove later store and remove any outstanding overlap intervals for - // the updated store. - State.deleteDeadInstruction(Later); - auto I = State.IOLs.find(Earlier->getParent()); - if (I != State.IOLs.end()) - I->second.erase(Earlier); - break; - } - } - if (OR == OW_Complete) { + if (State.isMemTerminatorInst(SI)) { + const Value *NIUnd = GetUnderlyingObject(NILoc.Ptr, DL); + if (!SILocUnd || SILocUnd != NIUnd) + continue; LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *NI << "\n KILLER: " << *SI << '\n'); State.deleteDeadInstruction(NI); ++NumFastStores; MadeChange = true; + } else { + // Check if NI overwrites SI. + int64_t InstWriteOffset, DepWriteOffset; + auto Iter = State.IOLs.insert( + std::make_pair( + NI->getParent(), InstOverlapIntervalsTy())); + auto &IOL = Iter.first->second; + OverwriteResult OR = isOverwrite(SILoc, NILoc, DL, TLI, DepWriteOffset, + InstWriteOffset, NI, IOL, AA, &F); + + if (EnablePartialStoreMerging && OR == OW_PartialEarlierWithFullLater) { + auto *Earlier = dyn_cast(NI); + auto *Later = dyn_cast(SI); + if (Constant *Merged = tryToMergePartialOverlappingStores( + Earlier, Later, InstWriteOffset, DepWriteOffset, DL, &AA, + &DT)) { + + // Update stored value of earlier store to merged constant. 
+ Earlier->setOperand(0, Merged); + ++NumModifiedStores; + MadeChange = true; + + // Remove later store and remove any outstanding overlap intervals + // for the updated store. + State.deleteDeadInstruction(Later); + auto I = State.IOLs.find(Earlier->getParent()); + if (I != State.IOLs.end()) + I->second.erase(Earlier); + break; + } + } + + if (OR == OW_Complete) { + LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *NI + << "\n KILLER: " << *SI << '\n'); + State.deleteDeadInstruction(NI); + ++NumFastStores; + MadeChange = true; + } } } } diff --git a/llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll b/llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll index f5d7e2514e56..85a749f81d50 100644 --- a/llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll +++ b/llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll @@ -1,5 +1,4 @@ -; XFAIL: * -; RUN: opt < %s -basic-aa -dse-enable-dse-memoryssa -S -enable-dse-partial-overwrite-tracking | FileCheck %s +; RUN: opt < %s -basic-aa -dse -enable-dse-memoryssa -S -enable-dse-partial-overwrite-tracking | FileCheck %s ; PR28588 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" diff --git a/llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll b/llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll index 81e64c8ba005..13cfb7002cf1 100644 --- a/llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll +++ b/llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll @@ -1,5 +1,3 @@ -; XFAIL: * - ; RUN: opt < %s -basic-aa -dse -enable-dse-memoryssa -S | FileCheck %s target datalayout = "e-p:64:64:64" diff --git a/llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll b/llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll index 222c293111a2..29ff7726c4ee 100644 --- a/llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll +++ b/llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll @@ -1,5 +1,3 @@ -; XFAIL: * - ; RUN: 
opt -S -basic-aa -dse -enable-dse-memoryssa < %s | FileCheck %s target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128" @@ -35,5 +33,3 @@ define void @test2(i32* %P) { ; CHECK: lifetime.end ret void } - - diff --git a/llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll b/llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll index 80db7f5d6b6e..c28f0cc90124 100644 --- a/llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll +++ b/llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll @@ -1,3 +1,4 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; Test that the getelementptr generated when the dse pass determines that ; a memset can be shortened has the debugloc carried over from the memset. diff --git a/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll b/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll index c7949e7c97d9..e28713929e9a 100644 --- a/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll +++ b/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll @@ -207,6 +207,7 @@ exit: call void @capture(i8* %m) ret i8* %m } + ; Stores to stack objects can be eliminated if they are not captured inside the function. 
define void @test_alloca_nocapture_1() { ; CHECK-LABEL: @test_alloca_nocapture_1( diff --git a/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll b/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll index c8b951ea80a6..04cdae285d81 100644 --- a/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll +++ b/llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll @@ -17,13 +17,12 @@ declare void @free(i8* nocapture) #2 define void @test16(i32* noalias %P) { ; CHECK-LABEL: @test16( ; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P:%.*]] to i8* -; CHECK-NEXT: store i32 1, i32* [[P]] ; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB3:%.*]] ; CHECK: bb1: -; CHECK-NEXT: store i32 1, i32* [[P]] ; CHECK-NEXT: br label [[BB3]] ; CHECK: bb3: ; CHECK-NEXT: call void @free(i8* [[P2]]) +; CHECK-NEXT: store i32 1, i32* [[P]] ; CHECK-NEXT: ret void ; %P2 = bitcast i32* %P to i8* @@ -34,6 +33,7 @@ bb1: br label %bb3 bb3: call void @free(i8* %P2) + store i32 1, i32* %P ret void } @@ -41,11 +41,9 @@ bb3: define void @test17(i32* noalias %P) { ; CHECK-LABEL: @test17( ; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P:%.*]] to i8* -; CHECK-NEXT: store i32 1, i32* [[P]] ; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB3:%.*]] ; CHECK: bb1: ; CHECK-NEXT: call void @unknown_func() -; CHECK-NEXT: store i32 1, i32* [[P]] ; CHECK-NEXT: br label [[BB3]] ; CHECK: bb3: ; CHECK-NEXT: call void @free(i8* [[P2]]) @@ -63,6 +61,30 @@ bb3: ret void } +define void @test17_read_after_free(i32* noalias %P) { +; CHECK-LABEL: @test17_read_after_free( +; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P:%.*]] to i8* +; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB3:%.*]] +; CHECK: bb1: +; CHECK-NEXT: br label [[BB3]] +; CHECK: bb3: +; CHECK-NEXT: call void @free(i8* [[P2]]) +; CHECK-NEXT: [[LV:%.*]] = load i8, i8* [[P2]] +; CHECK-NEXT: ret void +; + %P2 = bitcast i32* %P to i8* + store i32 1, i32* %P + br i1 true, label %bb1, label 
%bb3 +bb1: + store i32 1, i32* %P + br label %bb3 +bb3: + call void @free(i8* %P2) + %lv = load i8, i8* %P2 + ret void +} + + define void @test6(i32* noalias %P) { ; CHECK-LABEL: @test6( ; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]] diff --git a/llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll b/llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll index 8411c1d9ac8b..ef785f10ffaf 100644 --- a/llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll +++ b/llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll @@ -625,7 +625,6 @@ define void @test41(i32* noalias %P) { ; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P:%.*]] to i8* ; CHECK-NEXT: store i32 1, i32* [[P]], align 4 ; CHECK-NEXT: call void @unknown_func() -; CHECK-NEXT: store i32 2, i32* [[P]], align 4 ; CHECK-NEXT: call void @free(i8* [[P2]]) ; CHECK-NEXT: ret void ; From llvm-commits at lists.llvm.org Wed Jul 8 01:00:01 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:00:01 +0000 (UTC) Subject: [PATCH] D72410: [DSE] Eliminate stores by terminators (free,lifetime.end). In-Reply-To: References: Message-ID: <9137a776c554ab07af5037b1e0d54a05@localhost.localdomain> This revision was automatically updated to reflect the committed changes. fhahn marked an inline comment as done. Closed by commit rG80970ac87574: [DSE,MSSA] Eliminate stores by terminators (free,lifetime.end). (authored by fhahn). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72410/new/ https://reviews.llvm.org/D72410 Files: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D72410.276328.patch Type: text/x-patch Size: 12860 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:02:17 2020 From: llvm-commits at lists.llvm.org (Heejin Ahn via llvm-commits) Date: Wed, 08 Jul 2020 01:02:17 -0700 (PDT) Subject: [llvm] 7e6793a - [WebAssembly] Generate unreachable after __stack_chk_fail Message-ID: <5f057d89.1c69fb81.18697.babb@mx.google.com> Author: Heejin Ahn Date: 2020-07-08T01:02:05-07:00 New Revision: 7e6793aa33dd61ed9dd531871fce30c1b7978e13 URL: https://github.com/llvm/llvm-project/commit/7e6793aa33dd61ed9dd531871fce30c1b7978e13 DIFF: https://github.com/llvm/llvm-project/commit/7e6793aa33dd61ed9dd531871fce30c1b7978e13.diff LOG: [WebAssembly] Generate unreachable after __stack_chk_fail `__stack_chk_fail` does not return, but `unreachable` was not generated following `call __stack_chk_fail`. This had a possibility to generate an invalid binary for functions with a return type, because `__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can be the last instruction in the function whose return type is non-void. Generating `unreachable` after it makes sure CFGStackify's `fixEndsAtEndOfFunction` handles it correctly. 
Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D83277 Added: Modified: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/CodeGen/WebAssembly/stack-protector.ll Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 05813296997e..f5e12101c8e9 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -2668,6 +2668,11 @@ SelectionDAGBuilder::visitSPDescriptorFailure(StackProtectorDescriptor &SPD) { // Passing 'true' for doesNotReturn above won't generate the trap for us. if (TM.getTargetTriple().isPS4CPU()) Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); + // WebAssembly needs an unreachable instruction after a non-returning call, + // because the function return type can be different from __stack_chk_fail's + // return type (void).
+ if (TM.getTargetTriple().isWasm()) + Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); DAG.setRoot(Chain); } diff --git a/llvm/test/CodeGen/WebAssembly/stack-protector.ll b/llvm/test/CodeGen/WebAssembly/stack-protector.ll index 2b6521b43770..eda24bcb366b 100644 --- a/llvm/test/CodeGen/WebAssembly/stack-protector.ll +++ b/llvm/test/CodeGen/WebAssembly/stack-protector.ll @@ -1,13 +1,13 @@ ; RUN: llc -verify-machineinstrs -mtriple=wasm32-unknown-unknown < %s | FileCheck -check-prefix=WASM32 %s -; WASM32: i32.load 28 -; WASM32-NEXT: i32.ne -; WASM32-NEXT: br_if 0 - -; WASM32: __stack_chk_fail - @"\01LC" = internal constant [11 x i8] c"buf == %s\0A\00" ; <[11 x i8]*> [#uses=1] +; WASM32-LABEL: test +; WASM32: i32.load 28 +; WASM32: br_if 0 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + define void @test(i8* %a) nounwind ssp { entry: %a_addr = alloca i8* ; [#uses=2] @@ -25,6 +25,27 @@ return: ; preds = %entry ret void } +; WASM32-LABEL: test_return_i32 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + +define i32 @test_return_i32(i8* %a) nounwind ssp { +entry: + %a_addr = alloca i8* ; [#uses=2] + %buf = alloca [8 x i8] ; <[8 x i8]*> [#uses=2] + %"alloca point" = bitcast i32 0 to i32 ; [#uses=0] + store i8* %a, i8** %a_addr + %buf1 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %0 = load i8*, i8** %a_addr, align 4 ; [#uses=1] + %1 = call i8* @strcpy(i8* %buf1, i8* %0) nounwind ; [#uses=0] + %buf2 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %2 = call i32 (i8*, ...) @printf(i8* getelementptr ([11 x i8], [11 x i8]* @"\01LC", i32 0, i32 0), i8* %buf2) nounwind ; [#uses=0] + br label %return + +return: ; preds = %entry + ret i32 0 +} + declare i8* @strcpy(i8*, i8*) nounwind declare i32 @printf(i8*, ...) 
nounwind From llvm-commits at lists.llvm.org Wed Jul 8 01:02:26 2020 From: llvm-commits at lists.llvm.org (Heejin Ahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:02:26 +0000 (UTC) Subject: [PATCH] D83277: [WebAssembly] Generate unreachable after __stack_chk_fail In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG7e6793aa33dd: [WebAssembly] Generate unreachable after __stack_chk_fail (authored by aheejin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83277/new/ https://reviews.llvm.org/D83277 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/CodeGen/WebAssembly/stack-protector.ll Index: llvm/test/CodeGen/WebAssembly/stack-protector.ll =================================================================== --- llvm/test/CodeGen/WebAssembly/stack-protector.ll +++ llvm/test/CodeGen/WebAssembly/stack-protector.ll @@ -1,13 +1,13 @@ ; RUN: llc -verify-machineinstrs -mtriple=wasm32-unknown-unknown < %s | FileCheck -check-prefix=WASM32 %s -; WASM32: i32.load 28 -; WASM32-NEXT: i32.ne -; WASM32-NEXT: br_if 0 - -; WASM32: __stack_chk_fail - @"\01LC" = internal constant [11 x i8] c"buf == %s\0A\00" ; <[11 x i8]*> [#uses=1] +; WASM32-LABEL: test +; WASM32: i32.load 28 +; WASM32: br_if 0 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + define void @test(i8* %a) nounwind ssp { entry: %a_addr = alloca i8* ; [#uses=2] @@ -25,6 +25,27 @@ ret void } +; WASM32-LABEL: test_return_i32 +; WASM32: call __stack_chk_fail +; WASM32-NEXT: unreachable + +define i32 @test_return_i32(i8* %a) nounwind ssp { +entry: + %a_addr = alloca i8* ; [#uses=2] + %buf = alloca [8 x i8] ; <[8 x i8]*> [#uses=2] + %"alloca point" = bitcast i32 0 to i32 ; [#uses=0] + store i8* %a, i8** %a_addr + %buf1 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %0 = load i8*, i8** %a_addr, align 4 ; [#uses=1] + %1 = call i8* @strcpy(i8* %buf1, i8* %0) nounwind 
; [#uses=0] + %buf2 = bitcast [8 x i8]* %buf to i8* ; [#uses=1] + %2 = call i32 (i8*, ...) @printf(i8* getelementptr ([11 x i8], [11 x i8]* @"\01LC", i32 0, i32 0), i8* %buf2) nounwind ; [#uses=0] + br label %return + +return: ; preds = %entry + ret i32 0 +} + declare i8* @strcpy(i8*, i8*) nounwind declare i32 @printf(i8*, ...) nounwind Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -2668,6 +2668,11 @@ // Passing 'true' for doesNotReturn above won't generate the trap for us. if (TM.getTargetTriple().isPS4CPU()) Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); + // WebAssembly needs an unreachable instruction after a non-returning call, + // because the function return type can be different from __stack_chk_fail's + // return type (void). + if (TM.getTargetTriple().isWasm()) + Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); DAG.setRoot(Chain); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83277.276329.patch Type: text/x-patch Size: 2534 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:03:22 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:03:22 +0000 (UTC) Subject: [PATCH] D72410: [DSE] Eliminate stores by terminators (free,lifetime.end). In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG80970ac87574: [DSE,MSSA] Eliminate stores by terminators (free,lifetime.end). (authored by fhahn). 
Changed prior to commit: https://reviews.llvm.org/D72410?vs=273340&id=275698#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72410/new/ https://reviews.llvm.org/D72410 Files: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp llvm/test/Transforms/DeadStoreElimination/MSSA/2016-07-17-UseAfterFree.ll llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D72410.275698.patch Type: text/x-patch Size: 12860 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:06:48 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:06:48 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <527ab2b9c11ee5f9c7b96e277a8d2602@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:842 +bool doesXCOFFTracebackTableBegin(ArrayRef<uint8_t> Bytes) { + assert(Bytes.size() == 4 && "Traceback table started with 4 bytes zero."); + return support::endian::read32be(Bytes.data()) == 0; ---------------- jasonliu wrote: > How do callers know that they have to pass an ArrayRef that has exactly 4 bytes to this function? > What if they have an empty array? What if they pass in an 8-byte array? > I don't think it's the callers' responsibility to ensure that. This function should still return a correct answer for the above situations. 
This is exactly the point I made earlier, but seems to have been ignored... :( ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:846 + +static std::string parseParaType(uint32_t Value, unsigned int ParaNum) { + std::string ParaType; ---------------- Any particular reason you're using `unsigned int` here instead of just `unsigned` like you do below? Should it actually be a `size_t` in both cases? ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:885 + DataExtractor DE(ArrayRef(Ptr, S), false, 4); + uint64_t offset_ptr = 0; + ---------------- Please use LLVM style for variable names. ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:85 + XCOFFTracebackTable::create(V1, sizeof(V1)); + ASSERT_TRUE(!!TTOrErr1) << "Parse error"; + XCOFFTracebackTable TT1 = TTOrErr1.get(); ---------------- Here and in the equivalent cases elsewhere, use `ASSERT_THAT_EXPECTED(TTOrErr1, Succeeded());` ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:86 + ASSERT_TRUE(!!TTOrErr1) << "Parse error"; + XCOFFTracebackTable TT1 = TTOrErr1.get(); + EXPECT_EQ(TT1.getVersion(), 1); ---------------- `XCOFFTracebackTable TT1 = *TTOrErr1;` is more traditional usage. ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:95 + + EXPECT_TRUE(TT1.getParaType()); + EXPECT_STREQ(TT1.getParaType().getValue().data(), "i, i, f, f, d"); ---------------- `ASSERT_TRUE` or the next check will crash if it ever fails. ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:96 + EXPECT_TRUE(TT1.getParaType()); + EXPECT_STREQ(TT1.getParaType().getValue().data(), "i, i, f, f, d"); + ---------------- Does `EXPECT_EQ(TT1.getParaType().getValue(), "i, i, f, f, d");` not work? 
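The review points above about checking an `Optional` before reading its value can be illustrated outside of gtest. What follows is only a minimal sketch, assuming `std::optional` from the standard library in place of `llvm::Optional`; `parseParaTypes`, `describe` and `getChecked` are hypothetical helpers invented for this illustration and are not part of the patch under review.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a parser that may or may not produce a value.
// An empty input yields an empty optional.
std::optional<std::string> parseParaTypes(const std::string &Encoded) {
  if (Encoded.empty())
    return std::nullopt;
  return Encoded;
}

// Check for presence first and stop early, which is the role a fatal
// assertion plays in a unit test. Reading the value of an empty optional is
// undefined behaviour, so the value is only read after the check.
std::string describe(const std::optional<std::string> &ParaType) {
  if (!ParaType)
    return "<none>";
  return *ParaType;
}

// Returns the contained value; the assert documents that the caller is
// expected to have checked for presence already.
const std::string &getChecked(const std::optional<std::string> &Value) {
  assert(Value.has_value() && "caller must check before dereferencing");
  return *Value;
}
```

In a test, the presence check corresponds to the fatal assertion and the comparison of the contained string to the non-fatal expectation that follows it.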
================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:101-103 + ASSERT_TRUE(!!TTOrErr2) << "Parse error"; + XCOFFTracebackTable TT2 = TTOrErr2.get(); + EXPECT_STREQ(TT2.getParaType().getValue().data(), "f, f, d, i, i"); ---------------- For this and the ones below, same comments as above, but you also need an `ASSERT_TRUE(TT2.getParaType())` to avoid a crash in case the `Optional` is empty. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Wed Jul 8 01:07:49 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:07:49 +0000 (UTC) Subject: [PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements. In-Reply-To: References: Message-ID: courbet added inline comments. ================ Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:178 + const void *To = nullptr; + if (!FunctionBytes.empty()) { + From = reinterpret_cast(FunctionBytes.data()); ---------------- Let's return an error when `FunctionBytes.empty()`. Right now this will assert later. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77422/new/ https://reviews.llvm.org/D77422 From llvm-commits at lists.llvm.org Wed Jul 8 01:09:13 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:09:13 +0000 (UTC) Subject: [PATCH] D81675: SILoadStoreOptimizer: add support for GFX10 image instructions In-Reply-To: References: Message-ID: <245e54ec15db4967c68adbcb0a683ce7@localhost.localdomain> foad added a comment. Ping! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81675/new/ https://reviews.llvm.org/D81675 From llvm-commits at lists.llvm.org Wed Jul 8 01:12:35 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:12:35 +0000 (UTC) Subject: [PATCH] D82868: [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related `CFIProgram::parse` code. In-Reply-To: References: Message-ID: <54a62ca41c71a4dd6732c4637dfcb361@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM, with nit. ================ Comment at: llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp:124 +static Error ParseCFI(dwarf::CIE &C, ArrayRef Instructions, + Optional Size = None) { ---------------- `parseCFI` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82868/new/ https://reviews.llvm.org/D82868 From llvm-commits at lists.llvm.org Wed Jul 8 01:15:00 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:15:00 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <38fec6063a69dadd9b62187f91db3620@localhost.localdomain> lebedev.ri added a comment. I do think we should have (miscompiled) tests in all places first. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 01:16:16 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Wed, 08 Jul 2020 01:16:16 -0700 (PDT) Subject: [llvm] 15aeb80 - [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll Message-ID: <5f0580d0.1c69fb81.d651f.c1c2@mx.google.com> Author: David Sherwood Date: 2020-07-08T09:16:00+01:00 New Revision: 15aeb805dc46fbd268388af5f8de19e4de29cdb3 URL: https://github.com/llvm/llvm-project/commit/15aeb805dc46fbd268388af5f8de19e4de29cdb3 DIFF: https://github.com/llvm/llvm-project/commit/15aeb805dc46fbd268388af5f8de19e4de29cdb3.diff LOG: [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll For the GetElementPtr case in function AddressingModeMatcher::matchOperationAddr I've changed the code to use the TypeSize class instead of relying upon the implicit conversion to a uint64_t. As part of this, we now check for scalable types and, if we encounter one, just bail out for now, as the subsequent optimisations don't currently support them. 
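The pattern described in the log message above can be sketched roughly as follows: keep the size as a (minimum size, scalable flag) pair, and bail out when the size is scalable instead of silently converting it to a plain integer. This is a simplified model only. `SizeInfo` and `accumulateConstantOffset` are hypothetical names made up for the illustration; they are neither the LLVM `TypeSize` class nor the CodeGenPrepare code touched by the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a size that is either fixed or a multiple of an
// unknown runtime factor (as with scalable vectors).
struct SizeInfo {
  uint64_t MinSize;
  bool Scalable;

  bool isNonZero() const { return MinSize != 0; }
  bool isScalable() const { return Scalable; }
  // Only meaningful for fixed sizes; the assert documents that contract.
  uint64_t getFixedSize() const {
    assert(!Scalable && "size is not known at compile time");
    return MinSize;
  }
};

// Adds Index * size to Offset. Returns std::nullopt when the size is
// scalable, because the byte offset cannot be folded into a constant.
std::optional<int64_t> accumulateConstantOffset(int64_t Offset, int64_t Index,
                                                SizeInfo Size) {
  if (!Size.isNonZero())
    return Offset; // A zero-sized element contributes nothing.
  if (Size.isScalable())
    return std::nullopt; // Bail out: only fixed offsets are handled.
  return Offset + Index * static_cast<int64_t>(Size.getFixedSize());
}
```

Returning an empty optional here plays the role of the `return false` in the matcher: the caller learns that the addressing mode could not be matched and leaves the code alone.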
This change fixes up all warnings in the following tests: llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll Differential Revision: https://reviews.llvm.org/D83124 Added: Modified: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp index 8181c6643cfd..5fe8a092797b 100644 --- a/llvm/lib/CodeGen/CodeGenPrepare.cpp +++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -4356,15 +4356,20 @@ bool AddressingModeMatcher::matchOperationAddr(User *AddrInst, unsigned Opcode, cast<ConstantInt>(AddrInst->getOperand(i))->getZExtValue(); ConstantOffset += SL->getElementOffset(Idx); } else { - uint64_t TypeSize = DL.getTypeAllocSize(GTI.getIndexedType()); - if (ConstantInt *CI = dyn_cast<ConstantInt>(AddrInst->getOperand(i))) { - const APInt &CVal = CI->getValue(); - if (CVal.getMinSignedBits() <= 64) { - ConstantOffset += CVal.getSExtValue() * TypeSize; - continue; + TypeSize TS = DL.getTypeAllocSize(GTI.getIndexedType()); + if (TS.isNonZero()) { + // The optimisations below currently only work for fixed offsets. + if (TS.isScalable()) + return false; + int64_t TypeSize = TS.getFixedSize(); + if (ConstantInt *CI = + dyn_cast<ConstantInt>(AddrInst->getOperand(i))) { + const APInt &CVal = CI->getValue(); + if (CVal.getMinSignedBits() <= 64) { + ConstantOffset += CVal.getSExtValue() * TypeSize; + continue; + } } - } - if (TypeSize) { // Scales of zero don't do anything. // We only allow one variable index at the moment. 
if (VariableOperand != -1) return false; diff --git a/llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll b/llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll index 20bcd51e716d..13bd864c1f23 100644 --- a/llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll +++ b/llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; LD1B diff --git a/llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll b/llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll index ba9941d35e3a..2e4f19014545 100644 --- a/llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll +++ b/llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ST1B From llvm-commits at lists.llvm.org Wed Jul 8 01:16:26 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:16:26 +0000 (UTC) Subject: [PATCH] D83124: [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll In-Reply-To: References: Message-ID: <5ab81749f6e8dd5beebda5f786db3a97@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG15aeb805dc46: [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll (authored by david-arm). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83124/new/ https://reviews.llvm.org/D83124 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll Index: llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll +++ llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ST1B Index: llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll +++ llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; LD1B Index: llvm/lib/CodeGen/CodeGenPrepare.cpp =================================================================== --- llvm/lib/CodeGen/CodeGenPrepare.cpp +++ llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -4356,15 +4356,20 @@ cast(AddrInst->getOperand(i))->getZExtValue(); ConstantOffset += SL->getElementOffset(Idx); } else { - uint64_t TypeSize = DL.getTypeAllocSize(GTI.getIndexedType()); - if (ConstantInt *CI = dyn_cast(AddrInst->getOperand(i))) { - const APInt &CVal = CI->getValue(); - if (CVal.getMinSignedBits() 
<= 64) { - ConstantOffset += CVal.getSExtValue() * TypeSize; - continue; + TypeSize TS = DL.getTypeAllocSize(GTI.getIndexedType()); + if (TS.isNonZero()) { + // The optimisations below currently only work for fixed offsets. + if (TS.isScalable()) + return false; + int64_t TypeSize = TS.getFixedSize(); + if (ConstantInt *CI = + dyn_cast(AddrInst->getOperand(i))) { + const APInt &CVal = CI->getValue(); + if (CVal.getMinSignedBits() <= 64) { + ConstantOffset += CVal.getSExtValue() * TypeSize; + continue; + } } - } - if (TypeSize) { // Scales of zero don't do anything. // We only allow one variable index at the moment. if (VariableOperand != -1) return false; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83124.276332.patch Type: text/x-patch Size: 2696 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:17:36 2020 From: llvm-commits at lists.llvm.org (Rainer Orth via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:17:36 +0000 (UTC) Subject: [PATCH] D77815: [flang] Fix setting mxcsr on MSVC In-Reply-To: References: Message-ID: ro added a comment. The one from April wasn't run at all, but only reports some obscure internal error. There were generic problems with the pre-merge infrastructure at that time IIRC. Last night, I tried to restart the check, but it's still stuck in the first step (creating a branch in the repo to apply the patch to) after 10 hours. In that form, the pre-merge checks are worse than useless IMO. I'd suggest you go ahead and commit the patch. Should there be any issues despite your and my testing, the buildbots will let us know quickly enough. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77815/new/ https://reviews.llvm.org/D77815 From llvm-commits at lists.llvm.org Wed Jul 8 01:18:26 2020 From: llvm-commits at lists.llvm.org (Richard Barton via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:18:26 +0000 (UTC) Subject: [PATCH] D77815: [flang] Fix setting mxcsr on MSVC In-Reply-To: References: Message-ID: <892cc61d2b4b3e140b5134cc94a26d4e@localhost.localdomain> richard.barton.arm added a comment. +1 on @ro 's suggestion. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77815/new/ https://reviews.llvm.org/D77815 From llvm-commits at lists.llvm.org Wed Jul 8 01:19:27 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:19:27 +0000 (UTC) Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors In-Reply-To: References: Message-ID: <526a7760db04d1f1c9a9c0ce5c4aac35@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s:1 +; RUN: llvm-mc < %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1 +; RUN: llvm-objdump --triple=amdgcn-amd-amdhsa --mcpu=gfx908 --disassemble-symbols=my_kernel.kd %t1 | tail -n +8 | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2 ---------------- No need for the `<`. llvm-mc is quite capable of taking inputs on the command-line as positional arguments. 
================ Comment at: llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s:2 +; RUN: llvm-mc < %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1 +; RUN: llvm-objdump --triple=amdgcn-amd-amdhsa --mcpu=gfx908 --disassemble-symbols=my_kernel.kd %t1 | tail -n +8 | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2 +; RUN: diff %t1 %t2 ---------------- This line is too long. Please break it up into individual lines: ; RUN: llvm-objdump ... | \ ; RUN: tail -n +8 | llvm-mc ... ================ Comment at: llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s:10 + +; Right now this test fails for some combinations for the two directives above. For example (50, 23) and (42, 42) ---------------- Is this a FIXME/TODO? If so, please add "FIXME" or "TODO". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80713/new/ https://reviews.llvm.org/D80713 From llvm-commits at lists.llvm.org Wed Jul 8 01:19:53 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:19:53 +0000 (UTC) Subject: [PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors In-Reply-To: References: Message-ID: jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s:5-8 +.amdhsa_kernel my_kernel +.amdhsa_next_free_vgpr 50 +.amdhsa_next_free_sgpr 2 +.end_amdhsa_kernel ---------------- This test is also quite small. Does it actually cover every code path? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80713/new/ https://reviews.llvm.org/D80713 From llvm-commits at lists.llvm.org Wed Jul 8 01:20:06 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:20:06 +0000 (UTC) Subject: [PATCH] D60413: [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: lebedev.ri accepted this revision. lebedev.ri added a comment. This revision is now accepted and ready to land. Thanks, this looks about right to me, but please wait for @nikic. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Wed Jul 8 01:21:54 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:21:54 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <19854e03b9dbfae224426f4bb4a29bd8@localhost.localdomain> lebedev.ri added a comment. I may be missing context, but this may be missing some wording. *Why* should they not be in the callgraph? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Wed Jul 8 01:29:55 2020 From: llvm-commits at lists.llvm.org (Petr Hosek via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:29:55 +0000 (UTC) Subject: [PATCH] D80873: [clang][cmake] Force CMAKE_LINKER for multistage build in case of BOOTSTRAP_LLVM_ENABLE_LLD and MSVC In-Reply-To: References: Message-ID: phosek added inline comments. 
================ Comment at: clang/CMakeLists.txt:751 + if(BOOTSTRAP_LLVM_ENABLE_LLD) + if(MSVC AND NOT BOOTSTRAP_CMAKE_SYSTEM_NAME) + set(${CLANG_STAGE}_LINKER -DCMAKE_LINKER=${LLVM_RUNTIME_OUTPUT_INTDIR}/lld-link.exe) ---------------- I don't understand the second part of this condition, can you elaborate? Why not set `CMAKE_LINKER` to `lld-link.exe` even if `BOOTSTRAP_CMAKE_SYSTEM_NAME STREQUAL "Windows"`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80873/new/ https://reviews.llvm.org/D80873 From llvm-commits at lists.llvm.org Wed Jul 8 01:35:52 2020 From: llvm-commits at lists.llvm.org (Nuno Lopes via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:35:52 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <22a56e2fbf10f7e679a203a7fc488f5f@localhost.localdomain> nlopes accepted this revision. nlopes added a comment. This revision is now accepted and ready to land. 
Here's an end-to-end miscompilation: https://bugs.llvm.org/show_bug.cgi?id=31633 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 01:36:52 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Wed, 08 Jul 2020 01:36:52 -0700 (PDT) Subject: [llvm] 5b14f50 - [CodeGen] Fix wrong use of getVectorNumElements in PromoteIntRes_EXTRACT_SUBVECTOR Message-ID: <5f0585a4.1c69fb81.9852f.f613@mx.google.com> Author: David Sherwood Date: 2020-07-08T09:36:34+01:00 New Revision: 5b14f5051f134d29f51b523e5c9b602c08a4a7af URL: https://github.com/llvm/llvm-project/commit/5b14f5051f134d29f51b523e5c9b602c08a4a7af DIFF: https://github.com/llvm/llvm-project/commit/5b14f5051f134d29f51b523e5c9b602c08a4a7af.diff LOG: [CodeGen] Fix wrong use of getVectorNumElements in PromoteIntRes_EXTRACT_SUBVECTOR Calling getVectorNumElements() is not safe for scalable vectors and we should normally use getVectorElementCount() instead. However, for the code changed in this patch I decided to simply move the instantiation of the variable 'OutNumElems' lower down to the place where only fixed-width vectors are used, and hence it is safe to call getVectorNumElements(). 
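As a rough illustration of the approach described in the log message above, the sketch below defers the exact element-count query until after every path that might see a scalable vector has returned. It is a toy model under stated assumptions: `VecShape` and `buildLaneIndices` are hypothetical names invented here, and the early return merely stands in for whatever earlier paths in the real function do not need the exact count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a vector type: a minimum element count plus a flag
// saying whether the real count is a runtime multiple of that minimum.
struct VecShape {
  unsigned MinElems;
  bool Scalable;

  // Exact element count; only valid for fixed-width vectors.
  unsigned getNumElements() const {
    assert(!Scalable && "exact element count is unknown for scalable vectors");
    return MinElems;
  }
};

// Builds one index per lane for fixed-width vectors. Scalable vectors take
// an early exit that never asks for the exact element count.
std::vector<unsigned> buildLaneIndices(VecShape Ty) {
  if (Ty.Scalable)
    return {}; // Early path: no per-lane expansion is attempted.

  // Only fixed-width vectors reach this point, so the query is safe here.
  unsigned NumElems = Ty.getNumElements();
  std::vector<unsigned> Indices;
  Indices.reserve(NumElems);
  for (unsigned i = 0; i != NumElems; ++i)
    Indices.push_back(i);
  return Indices;
}
```

Had the count been queried at the top of the function, the scalable case would have hit the assertion even though it never uses the result, which is the kind of problem moving the declaration lower down avoids.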
Fixes up one warning in this test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83195 Added: Modified: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp index 4e0e8c5b052b..74071f763dbf 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp @@ -4334,7 +4334,6 @@ SDValue DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N) { EVT OutVT = N->getValueType(0); EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT); assert(NOutVT.isVector() && "This type must be promoted to a vector type"); - unsigned OutNumElems = OutVT.getVectorNumElements(); EVT NOutVTElem = NOutVT.getVectorElementType(); SDLoc dl(N); @@ -4371,6 +4370,7 @@ SDValue DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N) { EVT InVT = InOp0.getValueType(); + unsigned OutNumElems = OutVT.getVectorNumElements(); SmallVector Ops; Ops.reserve(OutNumElems); for (unsigned i = 0; i != OutNumElems; ++i) { From llvm-commits at lists.llvm.org Wed Jul 8 01:37:05 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:37:05 +0000 (UTC) Subject: [PATCH] D83195: [CodeGen] Fix a warning in DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR In-Reply-To: References: Message-ID: <527a56768664e620d9bdb042dd36753e@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG5b14f5051f13: [CodeGen] Fix wrong use of getVectorNumElements in… (authored by david-arm). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83195/new/ https://reviews.llvm.org/D83195 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp @@ -4334,7 +4334,6 @@ EVT OutVT = N->getValueType(0); EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT); assert(NOutVT.isVector() && "This type must be promoted to a vector type"); - unsigned OutNumElems = OutVT.getVectorNumElements(); EVT NOutVTElem = NOutVT.getVectorElementType(); SDLoc dl(N); @@ -4371,6 +4370,7 @@ EVT InVT = InOp0.getValueType(); + unsigned OutNumElems = OutVT.getVectorNumElements(); SmallVector Ops; Ops.reserve(OutNumElems); for (unsigned i = 0; i != OutNumElems; ++i) { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83195.276339.patch Type: text/x-patch Size: 805 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:37:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:37:27 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <16af8ee3afed0282b834da0fabbc26ea@localhost.localdomain> lebedev.ri added a comment. In D83360#2138386, @nlopes wrote: > Here's an end-to-end miscompilation: https://bugs.llvm.org/show_bug.cgi?id=31633 Sure, but we still need to have a test with a comment saying that it should not be folded, referencing all this. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 01:39:02 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:39:02 +0000 (UTC) Subject: [PATCH] D83124: [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll In-Reply-To: References: Message-ID: <9b4b1824acaac2d3d01bcafc25de7e95@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG15aeb805dc46: [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll (authored by david-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83124/new/ https://reviews.llvm.org/D83124 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll Index: llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll +++ llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; ST1B Index: llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll +++ llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; 
RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; LD1B Index: llvm/lib/CodeGen/CodeGenPrepare.cpp =================================================================== --- llvm/lib/CodeGen/CodeGenPrepare.cpp +++ llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -4356,15 +4356,20 @@ cast(AddrInst->getOperand(i))->getZExtValue(); ConstantOffset += SL->getElementOffset(Idx); } else { - uint64_t TypeSize = DL.getTypeAllocSize(GTI.getIndexedType()); - if (ConstantInt *CI = dyn_cast(AddrInst->getOperand(i))) { - const APInt &CVal = CI->getValue(); - if (CVal.getMinSignedBits() <= 64) { - ConstantOffset += CVal.getSExtValue() * TypeSize; - continue; + TypeSize TS = DL.getTypeAllocSize(GTI.getIndexedType()); + if (TS.isNonZero()) { + // The optimisations below currently only work for fixed offsets. + if (TS.isScalable()) + return false; + int64_t TypeSize = TS.getFixedSize(); + if (ConstantInt *CI = + dyn_cast(AddrInst->getOperand(i))) { + const APInt &CVal = CI->getValue(); + if (CVal.getMinSignedBits() <= 64) { + ConstantOffset += CVal.getSExtValue() * TypeSize; + continue; + } } - } - if (TypeSize) { // Scales of zero don't do anything. // We only allow one variable index at the moment. if (VariableOperand != -1) return false; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83124.275700.patch Type: text/x-patch Size: 2696 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:39:48 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:39:48 +0000 (UTC) Subject: [PATCH] D83350: [test] Run llvm/test/**/*.yaml In-Reply-To: References: Message-ID: <3845462ed073d8bf6479cbdc35fe1f0d@localhost.localdomain> grimar accepted this revision. grimar added a comment. LGTM, cool! 
Since you are removing `.cxx`, it is probably worth mentioning in the patch header (in theory it might be used in downstream builds). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83350/new/ https://reviews.llvm.org/D83350 From llvm-commits at lists.llvm.org Wed Jul 8 01:42:03 2020 From: llvm-commits at lists.llvm.org (Stefanos Baziotis via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:42:03 +0000 (UTC) Subject: [PATCH] D82895: [NFC][LoopInfo] Document empty() In-Reply-To: References: Message-ID: <2288836832ac9b38f031c8a75d96c989@localhost.localdomain> baziotis marked an inline comment as done. baziotis added a comment. In D82895#2137834, @Meinersbur wrote: > In D82895#2130151, @fhahn wrote: > > > Hm, adding multiple ways to check the same thing may lead to other problems. IMO it will be more confusing if some places use `!L->getParentLoop()` and others `L->isOutermost()`. > > > There are often multiple ways to achieve the same goal, for instance `LoadInst::getPointerOperand()` and `LoadInst::getOperand(0)`. One makes the intention clearer than the other. > > > Same for `isInnerMost()`/`empty()`. > > Since these are identical, I'd tend to remove one. In this case I'd remove `empty()` since `Loop` does not represent a container. This is the orthogonality vs. expressiveness trade-off, and I'm sure we all value each under certain circumstances. That said, I too agree that if we were to take a step towards orthogonality, that would be to replace `empty()`, rather than not introducing `isInnermost/Outermost`. ================ Comment at: llvm/include/llvm/Analysis/LoopInfo.h:161 + bool isInnermost() const { return empty(); } + // Outermost is the same as top-level. + bool isOutermost() const { return getParentLoop() == nullptr; } ---------------- Meinersbur wrote: > make this a doc-string as well? I added it as a non-doc comment intentionally, but that was probably a bad idea. I'll fix that. 
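To make the naming discussion above concrete, here is a small sketch of the two predicates on a toy loop-nest node. It assumes a deliberately simplified structure: `LoopNode` and `addSubLoop` are hypothetical and are not `llvm::Loop` or its API. Only the two one-line predicates mirror the snippet quoted in the review.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified loop-nest node; not llvm::Loop.
struct LoopNode {
  LoopNode *Parent = nullptr;
  std::vector<LoopNode *> SubLoops;

  LoopNode *getParentLoop() const { return Parent; }

  /// Return true if this loop contains no sub-loops.
  bool isInnermost() const { return SubLoops.empty(); }

  /// Return true if this loop has no parent, i.e. it is a top-level loop.
  bool isOutermost() const { return getParentLoop() == nullptr; }

  // Links Child underneath this loop. A loop can have at most one parent.
  void addSubLoop(LoopNode &Child) {
    assert(Child.Parent == nullptr && "child already has a parent");
    Child.Parent = this;
    SubLoops.push_back(&Child);
  }
};
```

With these names a call site reads as a statement about the loop nest (`L.isInnermost()`) rather than about a container (`L.empty()`), which is the expressiveness argument made in the thread. A loop with neither parent nor children is both innermost and outermost.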
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82895/new/ https://reviews.llvm.org/D82895 From llvm-commits at lists.llvm.org Wed Jul 8 01:47:30 2020 From: llvm-commits at lists.llvm.org (Diana Picus via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:47:30 +0000 (UTC) Subject: [PATCH] D83361: [LLVM] Add libatomic load/store functions to TargetLibraryInfo In-Reply-To: References: Message-ID: <0c8b83e49c01fd6d05ced7d0802b9c73@localhost.localdomain> rovka added a comment. Don't you also have to set them as Available/Unavailable when initializing the TLI? ================ Comment at: llvm/lib/Analysis/TargetLibraryInfo.cpp:1232 + case LibFunc_atomic_load: + // void __atomic_load(size_t, void *, void *, int) + case LibFunc_atomic_store: ---------------- Nit: The comment should go above the case statement. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83361/new/ https://reviews.llvm.org/D83361 From llvm-commits at lists.llvm.org Wed Jul 8 01:48:45 2020 From: llvm-commits at lists.llvm.org (Maksym Wezdecki via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:48:45 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: <840a9ab294fc671c8b9cb97fd0a87103@localhost.localdomain> mwezdeck updated this revision to Diff 276340. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 Files: llvm/lib/Support/ManagedStatic.cpp Index: llvm/lib/Support/ManagedStatic.cpp =================================================================== --- llvm/lib/Support/ManagedStatic.cpp +++ llvm/lib/Support/ManagedStatic.cpp @@ -18,23 +18,13 @@ using namespace llvm; static const ManagedStaticBase *StaticList = nullptr; -static std::recursive_mutex *ManagedStaticMutex = nullptr; -static llvm::once_flag mutex_init_flag; - -static void initializeMutex() { - ManagedStaticMutex = new std::recursive_mutex(); -} - -static std::recursive_mutex *getManagedStaticMutex() { - llvm::call_once(mutex_init_flag, initializeMutex); - return ManagedStaticMutex; -} +static std::recursive_mutex ManagedStaticMutex; void ManagedStaticBase::RegisterManagedStatic(void *(*Creator)(), void (*Deleter)(void*)) const { assert(Creator); if (llvm_is_multithreaded()) { - std::lock_guard Lock(*getManagedStaticMutex()); + std::lock_guard Lock(ManagedStaticMutex); if (!Ptr.load(std::memory_order_relaxed)) { void *Tmp = Creator(); @@ -76,7 +66,7 @@ /// llvm_shutdown - Deallocate and destroy all ManagedStatic variables. void llvm::llvm_shutdown() { - std::lock_guard Lock(*getManagedStaticMutex()); + std::lock_guard Lock(ManagedStaticMutex); while (StaticList) StaticList->destroy(); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83372.276340.patch Type: text/x-patch Size: 1415 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:50:00 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:50:00 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <2bdc600117de27770268507de1d8b9e1@localhost.localdomain> foad added inline comments. 
================ Comment at: llvm/unittests/IR/LegacyPassManagerTest.cpp:683 Function *SF = splitSimpleFunction(*F); - CallInst::Create(F, "", &SF->getEntryBlock()); + CallInst::Create(F, "", &*SF->getEntryBlock().getFirstInsertionPt()); ASSERT_EQ(M->getFunctionList().size(), 5U); ---------------- Is this change related to the rest of the patch somehow? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Wed Jul 8 01:57:57 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 08:57:57 +0000 (UTC) Subject: [PATCH] D83197: [CodeGen] Fix warning in DAGTypeLegalizer::SplitVecRes_ExtendOp In-Reply-To: References: Message-ID: <53e49afd8fc3844341afec78547ffefb@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG9e66e9c30a19: [CodeGen] Fix wrong use of getVectorNumElements() in DAGTypeLegalizer… (authored by david-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83197/new/ https://reviews.llvm.org/D83197 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1778,8 +1778,7 @@ // more effectively move in the right direction and prevent falling down // to scalarization in many cases due to the input vector being split too // far. 
- unsigned NumElements = SrcVT.getVectorNumElements(); - if ((NumElements & 1) == 0 && + if ((SrcVT.getVectorMinNumElements() & 1) == 0 && SrcVT.getSizeInBits() * 2 < DestVT.getSizeInBits()) { LLVMContext &Ctx = *DAG.getContext(); EVT NewSrcVT = SrcVT.widenIntegerVectorElementType(Ctx); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83197.276342.patch Type: text/x-patch Size: 739 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 01:57:58 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Wed, 08 Jul 2020 01:57:58 -0700 (PDT) Subject: [llvm] 9e66e9c - [CodeGen] Fix wrong use of getVectorNumElements() in DAGTypeLegalizer::SplitVecRes_ExtendOp Message-ID: <5f058a96.1c69fb81.ceeb6.06ee@mx.google.com> Author: David Sherwood Date: 2020-07-08T09:53:20+01:00 New Revision: 9e66e9c30a19dc5923c85d3a3a4b757935299fba URL: https://github.com/llvm/llvm-project/commit/9e66e9c30a19dc5923c85d3a3a4b757935299fba DIFF: https://github.com/llvm/llvm-project/commit/9e66e9c30a19dc5923c85d3a3a4b757935299fba.diff LOG: [CodeGen] Fix wrong use of getVectorNumElements() in DAGTypeLegalizer::SplitVecRes_ExtendOp In DAGTypeLegalizer::SplitVecRes_ExtendOp I have replaced an invalid call to getVectorNumElements() with a call to getVectorMinNumElements(), since the code path works for both fixed and scalable vectors. 
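Why checking the minimum element count is sufficient for both vector kinds can be sketched as follows. `ToyElementCount` is a hypothetical, simplified model and not the LLVM `EVT` API; it assumes the actual element count is the minimum count multiplied by a runtime factor (`vscale`) that is 1 for fixed-width vectors.

```cpp
#include <cassert>

// Hypothetical, simplified model of a vector element count. The actual count
// is assumed to be MinNumElements * vscale, where vscale is 1 for fixed-width
// vectors and an unknown positive integer for scalable ones.
struct ToyElementCount {
  unsigned MinNumElements;
  bool Scalable;
};

// True if the element count is even for every possible vscale. Because
// vscale == 1 is allowed, this holds exactly when the minimum count is even,
// so the same check works for fixed and scalable vectors.
bool hasEvenElementCount(ToyElementCount EC) {
  return (EC.MinNumElements & 1) == 0;
}
```

Under that assumption the evenness test never needs the exact element count, which is why the patch can drop the fixed-width-only query.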
This fixes up a warning in the following test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83197 Added: Modified: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 96c3a715532a..15d88eb5811f 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1778,8 +1778,7 @@ void DAGTypeLegalizer::SplitVecRes_ExtendOp(SDNode *N, SDValue &Lo, // more effectively move in the right direction and prevent falling down // to scalarization in many cases due to the input vector being split too // far. - unsigned NumElements = SrcVT.getVectorNumElements(); - if ((NumElements & 1) == 0 && + if ((SrcVT.getVectorMinNumElements() & 1) == 0 && SrcVT.getSizeInBits() * 2 < DestVT.getSizeInBits()) { LLVMContext &Ctx = *DAG.getContext(); EVT NewSrcVT = SrcVT.widenIntegerVectorElementType(Ctx); From llvm-commits at lists.llvm.org Wed Jul 8 01:59:00 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Mikael_Holm=C3=A9n_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 08:59:00 +0000 (UTC) Subject: [PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support In-Reply-To: References: Message-ID: uabelho added inline comments. ================ Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3676 + } +} + ---------------- gcc (7.4.0) warns with ../lib/MC/MCParser/MasmParser.cpp:3676:1: warning: control reaches end of non-void function [-Wreturn-type] here. It can be silenced with an llvm_unreachable if that's acceptable. 
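For context, a minimal sketch of the pattern behind this warning. The enum and function names are hypothetical, and `unreachable` is a stand-in for `llvm_unreachable` so that the snippet compiles without LLVM headers. Even when a `switch` covers every enumerator, an enum object may hold a value outside the enumerator list, so the compiler has to assume control can fall off the end of the function.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical enum, loosely modelled on the field-type switch in the patch.
enum FieldKind { FK_Integral, FK_Real, FK_Struct };

// Stand-in for llvm_unreachable: marked noreturn, aborts if ever executed.
[[noreturn]] inline void unreachable(const char * /*Msg*/) { std::abort(); }

int fieldWidth(FieldKind K) {
  switch (K) {
  case FK_Integral:
    return 4;
  case FK_Real:
    return 8;
  case FK_Struct:
    return 16;
  }
  // Without this call, a -Wreturn-type style warning is expected here,
  // because K could in principle hold a value that matches no case.
  unreachable("unhandled FieldKind");
}
```

Leaving out a `default:` label keeps the compiler's unhandled-enumerator warning useful when a new enumerator is added later, while the trailing noreturn call documents that falling through is a bug.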
================ Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3829 + } +} + ---------------- gcc 7.4.0 warning here too ../lib/MC/MCParser/MasmParser.cpp:3829:1: warning: control reaches end of non-void function [-Wreturn-type] ================ Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3831 + +bool MasmParser::emitStructValue(const StructInfo &Structure) { + size_t Offset = 0; ---------------- gcc warns that this method is unused. Will it be used or can it perhaps be removed? ================ Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3908 + } +} + ---------------- gcc 7.4.0 warning here too ../lib/MC/MCParser/MasmParser.cpp:3908:1: warning: control reaches end of non-void function [-Wreturn-type] Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75306/new/ https://reviews.llvm.org/D75306 From llvm-commits at lists.llvm.org Wed Jul 8 02:06:18 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Wed, 08 Jul 2020 02:06:18 -0700 (PDT) Subject: [llvm] 1f84ace - [llvm-readobj] - Refine error reporting in MipsGOTParser<ELFT> helper. Message-ID: <5f058c8a.1c69fb81.b2862.043e@mx.google.com> Author: Georgii Rymar Date: 2020-07-08T12:05:52+03:00 New Revision: 1f84ace3c7266564801d79185ebb05eb451205f1 URL: https://github.com/llvm/llvm-project/commit/1f84ace3c7266564801d79185ebb05eb451205f1 DIFF: https://github.com/llvm/llvm-project/commit/1f84ace3c7266564801d79185ebb05eb451205f1.diff LOG: [llvm-readobj] - Refine error reporting in MipsGOTParser<ELFT> helper. This is a follow-up for D83225. This does the following: 1) Adds missing tests for existing errors. 2) Stops using `unwrapOrError`, so that errors are propagated to the caller. (I am trying to get rid of all `unwrapOrError` calls in the llvm-readelf code). 3) Slightly improves the reported error messages.
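The propagation style described in the list above can be sketched as follows. Everything here is hypothetical: `Result` is a toy stand-in for `llvm::Expected<T>`, and `getSection`/`findLinkedSection` only imitate the shape of the calls in the patch. The point is that the callee hands the failure back with added context instead of handling it at the point of failure, which is my understanding of what `unwrapOrError` does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Expected<T>: a value or an error message.
template <typename T> struct Result {
  std::optional<T> Value;
  std::string Error;
  explicit operator bool() const { return Value.has_value(); }
};

// Hypothetical lookup that can fail, in the role of Obj->getSection(Index).
Result<int> getSection(const std::vector<int> &Sections, std::size_t Index) {
  if (Index >= Sections.size())
    return {std::nullopt, "invalid section index: " + std::to_string(Index)};
  return {Sections[Index], ""};
}

// Returns an empty string on success. On failure, returns the underlying
// message prefixed with context, leaving the decision to the caller.
std::string findLinkedSection(const std::vector<int> &Sections,
                              std::size_t Link, int &Out) {
  Result<int> SecOrErr = getSection(Sections, Link);
  if (!SecOrErr)
    return "unable to get a symbol table linked to the RELPLT section: " +
           SecOrErr.Error;
  Out = *SecOrErr.Value;
  return "";
}
```

Because the caller receives the message, it can decide whether to report it and continue or to stop, which is also what makes the new error paths testable.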
Differential revision: https://reviews.llvm.org/D83314 Added: Modified: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/mips-got.test b/llvm/test/tools/llvm-readobj/ELF/mips-got.test index 54b321bcaae4..cfbf1c4f37a3 100644 --- a/llvm/test/tools/llvm-readobj/ELF/mips-got.test +++ b/llvm/test/tools/llvm-readobj/ELF/mips-got.test @@ -519,3 +519,31 @@ Sections: - Tag: DT_NULL Value: 0 DynamicSymbols: [] + +# RUN: yaml2obj --docnum=2 -DVAL1=0xffff %s -o %t.err4.o +# RUN: not llvm-readobj -A %t.err4.o 2>&1 | FileCheck %s -DFILE=%t.err4.o -check-prefix=ERR4 + +# ERR4: error: '[[FILE]]': DT_MIPS_GOTSYM value (65535) exceeds the number of dynamic symbols (1) + +# RUN: yaml2obj --docnum=2 -DVAL2=0xffff %s -o %t.err5.o +# RUN: not llvm-readobj -A %t.err5.o 2>&1 | FileCheck %s -DFILE=%t.err5.o -check-prefix=ERR5 + +# ERR5: error: '[[FILE]]': there is no non-empty GOT section at 0xffff + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_EXEC + Machine: EM_MIPS +Sections: + - Name: .dynamic + Type: SHT_DYNAMIC + Entries: + - Tag: DT_MIPS_LOCAL_GOTNO + Value: 0 + - Tag: DT_MIPS_GOTSYM + Value: [[VAL1=0]] + - Tag: DT_PLTGOT + Value: [[VAL2=0]] +DynamicSymbols: [] diff --git a/llvm/test/tools/llvm-readobj/ELF/mips-plt.test b/llvm/test/tools/llvm-readobj/ELF/mips-plt.test index fa62aa98251a..4959d8892ac2 100644 --- a/llvm/test/tools/llvm-readobj/ELF/mips-plt.test +++ b/llvm/test/tools/llvm-readobj/ELF/mips-plt.test @@ -117,4 +117,26 @@ Sections: Value: [[VAL2=0]] - Tag: DT_NULL Value: 0 + - Name: .foo + Type: SHT_PROGBITS + Address: 0x100 + ShSize: 0xffffffff + Link: [[LINK=0x1]] DynamicSymbols: [] + +## Check we report errors when we are unable to dump PLTGOT properly. 
+ +# RUN: yaml2obj --docnum=2 -DVAL1=0x100 %s -o %t.err5.o +# RUN: not llvm-readobj -A %t.err5.o 2>&1 | FileCheck %s -DFILE=%t.err5.o -check-prefix ERR5 + +# ERR5: error: '[[FILE]]': unable to read PLTGOT section content: section [index 2] has a sh_offset (0x70) + sh_size (0xffffffff) that is greater than the file size (0x280) + +# RUN: yaml2obj --docnum=2 -DVAL2=0x100 -DLINK=0xaaaaaaaa %s -o %t.err6.o +# RUN: not llvm-readobj -A %t.err6.o 2>&1 | FileCheck %s -DFILE=%t.err6.o -check-prefix ERR6 + +# ERR6: error: '[[FILE]]': unable to get a symbol table linked to the RELPLT section with index 2: invalid section index: 2863311530 + +# RUN: yaml2obj --docnum=2 -DVAL2=0x100 %s -o %t.err7.o +# RUN: not llvm-readobj -A %t.err7.o 2>&1 | FileCheck %s -DFILE=%t.err7.o -check-prefix ERR7 + +# ERR7: error: '[[FILE]]': unable to get a string table for the symbol table with index 1: invalid sh_type for symbol table, expected SHT_SYMTAB or SHT_DYNSYM diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index cd3c79d208e4..b8a5de27cb67 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -3039,7 +3039,9 @@ Error MipsGOTParser::findGOT(Elf_Dyn_Range DynTable, size_t DynSymTotal = DynSyms.size(); if (*DtGotSym > DynSymTotal) - return createError("MIPS_GOTSYM exceeds a number of dynamic symbols"); + return createError("DT_MIPS_GOTSYM value (" + Twine(*DtGotSym) + + ") exceeds the number of dynamic symbols (" + + Twine(DynSymTotal) + ")"); GotSec = findNotEmptySectionByAddress(Obj, FileName, *DtPltGot); if (!GotSec) @@ -3093,14 +3095,35 @@ Error MipsGOTParser::findPLT(Elf_Dyn_Range DynTable) { return createError("there is no non-empty RELPLT section at 0x" + Twine::utohexstr(*DtJmpRel)); - ArrayRef PltContent = - unwrapOrError(FileName, Obj->getSectionContents(PltSec)); - PltEntries = Entries(reinterpret_cast(PltContent.data()), - PltContent.size() / sizeof(Entry)); + if (Expected> PltContentOrErr = + 
Obj->getSectionContents(PltSec)) + PltEntries = + Entries(reinterpret_cast(PltContentOrErr->data()), + PltContentOrErr->size() / sizeof(Entry)); + else + return createError("unable to read PLTGOT section content: " + + toString(PltContentOrErr.takeError())); + + if (Expected PltSymTableOrErr = + Obj->getSection(PltRelSec->sh_link)) { + PltSymTable = *PltSymTableOrErr; + } else { + unsigned SecNdx = PltRelSec - &cantFail(Obj->sections()).front(); + return createError("unable to get a symbol table linked to the RELPLT " + "section with index " + + Twine(SecNdx) + ": " + + toString(PltSymTableOrErr.takeError())); + } - PltSymTable = unwrapOrError(FileName, Obj->getSection(PltRelSec->sh_link)); - PltStrTable = - unwrapOrError(FileName, Obj->getStringTableForSymtab(*PltSymTable)); + if (Expected StrTabOrErr = + Obj->getStringTableForSymtab(*PltSymTable)) { + PltStrTable = *StrTabOrErr; + } else { + unsigned SecNdx = PltSymTable - &cantFail(Obj->sections()).front(); + return createError( + "unable to get a string table for the symbol table with index " + + Twine(SecNdx) + ": " + toString(StrTabOrErr.takeError())); + } return Error::success(); } From llvm-commits at lists.llvm.org Wed Jul 8 02:06:23 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:06:23 +0000 (UTC) Subject: [PATCH] D83314: [llvm-readobj] - Refine error reporting in MipsGOTParser helper. In-Reply-To: References: Message-ID: <24da92b9a89ac270b410a2d064d402d2@localhost.localdomain> This revision was automatically updated to reflect the committed changes. grimar marked an inline comment as done. Closed by commit rG1f84ace3c726: [llvm-readobj] - Refine error reporting in MipsGOTParser<ELFT> helper. (authored by grimar). Herald added a subscriber: jrtc27. 
Changed prior to commit: https://reviews.llvm.org/D83314?vs=276080&id=276343#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83314/new/ https://reviews.llvm.org/D83314 Files: llvm/test/tools/llvm-readobj/ELF/mips-got.test llvm/test/tools/llvm-readobj/ELF/mips-plt.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83314.276343.patch Type: text/x-patch Size: 5116 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 02:18:04 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Wed, 08 Jul 2020 02:18:04 -0700 (PDT) Subject: [llvm] bee8cdc - [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related `CFIProgram::parse` code. Message-ID: <5f058f4c.1c69fb81.42787.0be6@mx.google.com> Author: Georgii Rymar Date: 2020-07-08T12:10:23+03:00 New Revision: bee8cdcabd2b3931be3f240e70b0b04e766ea4fe URL: https://github.com/llvm/llvm-project/commit/bee8cdcabd2b3931be3f240e70b0b04e766ea4fe DIFF: https://github.com/llvm/llvm-project/commit/bee8cdcabd2b3931be3f240e70b0b04e766ea4fe.diff LOG: [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related `CFIProgram::parse` code. There are the following issues with the `CFIProgram::parse` code: 1) Invalid CFI opcodes were never tested, and currently such a test would fail when `LLVM_ENABLE_ABI_BREAKING_CHECKS` is enabled. This happens because the `DataExtractor::Cursor C` remains unchecked when the "Invalid extended CFI opcode" error is reported: ``` .eh_frame section at offset 0x1128 address 0x0: Program aborted due to an unhandled Error: Error value was Success. (Note: Success values must still be checked prior to being destroyed). ``` 2) It is impossible to reach the "Invalid primary CFI opcode" error with the current code. There are 3 possible primary opcode values and all of them are handled. Hence this error should be replaced with llvm_unreachable.
3) Errors currently reported are upper-case. This patch refines the code in the `CFIProgram::parse` method to fix all issues mentioned and adds unit tests for all possible invalid extended CFI opcodes. Differential revision: https://reviews.llvm.org/D82868 Added: Modified: llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp Removed: ################################################################################ diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp index 6ffbf1212d47..0a1b75592290 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp @@ -39,18 +39,15 @@ Error CFIProgram::parse(DWARFDataExtractor Data, uint64_t *Offset, DataExtractor::Cursor C(*Offset); while (C && C.tell() < EndOffset) { uint8_t Opcode = Data.getRelocatedValue(C, 1); - // Some instructions have a primary opcode encoded in the top bits. - uint8_t Primary = Opcode & DWARF_CFI_PRIMARY_OPCODE_MASK; + if (!C) + break; - if (Primary) { + // Some instructions have a primary opcode encoded in the top bits. + if (uint8_t Primary = Opcode & DWARF_CFI_PRIMARY_OPCODE_MASK) { // If it's a primary opcode, the first operand is encoded in the bottom // bits of the opcode itself. uint64_t Op1 = Opcode & DWARF_CFI_PRIMARY_OPERAND_MASK; switch (Primary) { - default: - return createStringError(errc::illegal_byte_sequence, - "Invalid primary CFI opcode 0x%" PRIx8, - Primary); case DW_CFA_advance_loc: case DW_CFA_restore: addInstruction(Primary, Op1); @@ -58,104 +55,106 @@ Error CFIProgram::parse(DWARFDataExtractor Data, uint64_t *Offset, case DW_CFA_offset: addInstruction(Primary, Op1, Data.getULEB128(C)); break; - } - } else { - // Extended opcode - its value is Opcode itself. 
- switch (Opcode) { default: - return createStringError(errc::illegal_byte_sequence, - "Invalid extended CFI opcode 0x%" PRIx8, - Opcode); - case DW_CFA_nop: - case DW_CFA_remember_state: - case DW_CFA_restore_state: - case DW_CFA_GNU_window_save: - // No operands - addInstruction(Opcode); - break; - case DW_CFA_set_loc: - // Operands: Address - addInstruction(Opcode, Data.getRelocatedAddress(C)); - break; - case DW_CFA_advance_loc1: - // Operands: 1-byte delta - addInstruction(Opcode, Data.getRelocatedValue(C, 1)); - break; - case DW_CFA_advance_loc2: - // Operands: 2-byte delta - addInstruction(Opcode, Data.getRelocatedValue(C, 2)); - break; - case DW_CFA_advance_loc4: - // Operands: 4-byte delta - addInstruction(Opcode, Data.getRelocatedValue(C, 4)); - break; - case DW_CFA_restore_extended: - case DW_CFA_undefined: - case DW_CFA_same_value: - case DW_CFA_def_cfa_register: - case DW_CFA_def_cfa_offset: - case DW_CFA_GNU_args_size: - // Operands: ULEB128 - addInstruction(Opcode, Data.getULEB128(C)); - break; - case DW_CFA_def_cfa_offset_sf: - // Operands: SLEB128 - addInstruction(Opcode, Data.getSLEB128(C)); - break; - case DW_CFA_offset_extended: - case DW_CFA_register: - case DW_CFA_def_cfa: - case DW_CFA_val_offset: { - // Operands: ULEB128, ULEB128 - // Note: We can not embed getULEB128 directly into function - // argument list. getULEB128 changes Offset and order of evaluation - // for arguments is unspecified. 
- uint64_t op1 = Data.getULEB128(C); - uint64_t op2 = Data.getULEB128(C); - addInstruction(Opcode, op1, op2); - break; - } - case DW_CFA_offset_extended_sf: - case DW_CFA_def_cfa_sf: - case DW_CFA_val_offset_sf: { - // Operands: ULEB128, SLEB128 - // Note: see comment for the previous case - uint64_t op1 = Data.getULEB128(C); - uint64_t op2 = (uint64_t)Data.getSLEB128(C); - addInstruction(Opcode, op1, op2); - break; - } - case DW_CFA_def_cfa_expression: { - uint64_t ExprLength = Data.getULEB128(C); - addInstruction(Opcode, 0); - StringRef Expression = Data.getBytes(C, ExprLength); - - DataExtractor Extractor(Expression, Data.isLittleEndian(), - Data.getAddressSize()); - // Note. We do not pass the DWARF format to DWARFExpression, because - // DW_OP_call_ref, the only operation which depends on the format, is - // prohibited in call frame instructions, see sec. 6.4.2 in DWARFv5. - Instructions.back().Expression = - DWARFExpression(Extractor, Data.getAddressSize()); - break; - } - case DW_CFA_expression: - case DW_CFA_val_expression: { - uint64_t RegNum = Data.getULEB128(C); - addInstruction(Opcode, RegNum, 0); - - uint64_t BlockLength = Data.getULEB128(C); - StringRef Expression = Data.getBytes(C, BlockLength); - DataExtractor Extractor(Expression, Data.isLittleEndian(), - Data.getAddressSize()); - // Note. We do not pass the DWARF format to DWARFExpression, because - // DW_OP_call_ref, the only operation which depends on the format, is - // prohibited in call frame instructions, see sec. 6.4.2 in DWARFv5. - Instructions.back().Expression = - DWARFExpression(Extractor, Data.getAddressSize()); - break; - } + llvm_unreachable("invalid primary CFI opcode"); } + continue; + } + + // Extended opcode - its value is Opcode itself. 
+ switch (Opcode) { + default: + return createStringError(errc::illegal_byte_sequence, + "invalid extended CFI opcode 0x%" PRIx8, Opcode); + case DW_CFA_nop: + case DW_CFA_remember_state: + case DW_CFA_restore_state: + case DW_CFA_GNU_window_save: + // No operands + addInstruction(Opcode); + break; + case DW_CFA_set_loc: + // Operands: Address + addInstruction(Opcode, Data.getRelocatedAddress(C)); + break; + case DW_CFA_advance_loc1: + // Operands: 1-byte delta + addInstruction(Opcode, Data.getRelocatedValue(C, 1)); + break; + case DW_CFA_advance_loc2: + // Operands: 2-byte delta + addInstruction(Opcode, Data.getRelocatedValue(C, 2)); + break; + case DW_CFA_advance_loc4: + // Operands: 4-byte delta + addInstruction(Opcode, Data.getRelocatedValue(C, 4)); + break; + case DW_CFA_restore_extended: + case DW_CFA_undefined: + case DW_CFA_same_value: + case DW_CFA_def_cfa_register: + case DW_CFA_def_cfa_offset: + case DW_CFA_GNU_args_size: + // Operands: ULEB128 + addInstruction(Opcode, Data.getULEB128(C)); + break; + case DW_CFA_def_cfa_offset_sf: + // Operands: SLEB128 + addInstruction(Opcode, Data.getSLEB128(C)); + break; + case DW_CFA_offset_extended: + case DW_CFA_register: + case DW_CFA_def_cfa: + case DW_CFA_val_offset: { + // Operands: ULEB128, ULEB128 + // Note: We can not embed getULEB128 directly into function + // argument list. getULEB128 changes Offset and order of evaluation + // for arguments is unspecified. 
+ uint64_t op1 = Data.getULEB128(C); + uint64_t op2 = Data.getULEB128(C); + addInstruction(Opcode, op1, op2); + break; + } + case DW_CFA_offset_extended_sf: + case DW_CFA_def_cfa_sf: + case DW_CFA_val_offset_sf: { + // Operands: ULEB128, SLEB128 + // Note: see comment for the previous case + uint64_t op1 = Data.getULEB128(C); + uint64_t op2 = (uint64_t)Data.getSLEB128(C); + addInstruction(Opcode, op1, op2); + break; + } + case DW_CFA_def_cfa_expression: { + uint64_t ExprLength = Data.getULEB128(C); + addInstruction(Opcode, 0); + StringRef Expression = Data.getBytes(C, ExprLength); + + DataExtractor Extractor(Expression, Data.isLittleEndian(), + Data.getAddressSize()); + // Note. We do not pass the DWARF format to DWARFExpression, because + // DW_OP_call_ref, the only operation which depends on the format, is + // prohibited in call frame instructions, see sec. 6.4.2 in DWARFv5. + Instructions.back().Expression = + DWARFExpression(Extractor, Data.getAddressSize()); + break; + } + case DW_CFA_expression: + case DW_CFA_val_expression: { + uint64_t RegNum = Data.getULEB128(C); + addInstruction(Opcode, RegNum, 0); + + uint64_t BlockLength = Data.getULEB128(C); + StringRef Expression = Data.getBytes(C, BlockLength); + DataExtractor Extractor(Expression, Data.isLittleEndian(), + Data.getAddressSize()); + // Note. We do not pass the DWARF format to DWARFExpression, because + // DW_OP_call_ref, the only operation which depends on the format, is + // prohibited in call frame instructions, see sec. 6.4.2 in DWARFv5. 
+ Instructions.back().Expression = + DWARFExpression(Extractor, Data.getAddressSize()); + break; + } } } diff --git a/llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp b/llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp index fd9a2be8a3e9..65e2be723090 100644 --- a/llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp +++ b/llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp @@ -6,6 +6,7 @@ // //===----------------------------------------------------------------------===// +#include "llvm/ADT/DenseSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringRef.h" #include "llvm/BinaryFormat/Dwarf.h" @@ -120,65 +121,116 @@ TEST(DWARFDebugFrame, DumpEH64FDE) { "cie=1111ab9a000c pc=4444abcdabcd...5555bcdebcde"); } +static Error parseCFI(dwarf::CIE &C, ArrayRef Instructions, + Optional Size = None) { + DWARFDataExtractor Data(Instructions, /*IsLittleEndian=*/true, + /*AddressSize=*/8); + uint64_t Offset = 0; + const uint64_t EndOffset = Size ? *Size : (uint64_t)Instructions.size(); + return C.cfis().parse(Data, &Offset, EndOffset); +} + +TEST(DWARFDebugFrame, InvalidCFIOpcodesTest) { + llvm::DenseSet ValidExtendedOpcodes = { + dwarf::DW_CFA_nop, + dwarf::DW_CFA_advance_loc, + dwarf::DW_CFA_offset, + dwarf::DW_CFA_restore, + dwarf::DW_CFA_set_loc, + dwarf::DW_CFA_advance_loc1, + dwarf::DW_CFA_advance_loc2, + dwarf::DW_CFA_advance_loc4, + dwarf::DW_CFA_offset_extended, + dwarf::DW_CFA_restore_extended, + dwarf::DW_CFA_undefined, + dwarf::DW_CFA_same_value, + dwarf::DW_CFA_register, + dwarf::DW_CFA_remember_state, + dwarf::DW_CFA_restore_state, + dwarf::DW_CFA_def_cfa, + dwarf::DW_CFA_def_cfa_register, + dwarf::DW_CFA_def_cfa_offset, + dwarf::DW_CFA_def_cfa_expression, + dwarf::DW_CFA_expression, + dwarf::DW_CFA_offset_extended_sf, + dwarf::DW_CFA_def_cfa_sf, + dwarf::DW_CFA_def_cfa_offset_sf, + dwarf::DW_CFA_val_offset, + dwarf::DW_CFA_val_offset_sf, + dwarf::DW_CFA_val_expression, + dwarf::DW_CFA_MIPS_advance_loc8, + dwarf::DW_CFA_GNU_window_save, + 
dwarf::DW_CFA_AARCH64_negate_ra_state, + dwarf::DW_CFA_GNU_args_size}; + + dwarf::CIE TestCIE = createCIE(/*IsDWARF64=*/false, + /*Offset=*/0x0, + /*Length=*/0xff); + + // See DWARF standard v3, section 7.23: low 6 bits are used to encode an + // extended opcode. + for (uint8_t Code = 0; Code <= 63; ++Code) { + if (ValidExtendedOpcodes.count(Code)) + continue; + + EXPECT_THAT_ERROR(parseCFI(TestCIE, Code), + FailedWithMessage(("invalid extended CFI opcode 0x" + + Twine::utohexstr(Code)) + .str() + .c_str())); + } +} + // Here we test how truncated Call Frame Instructions are parsed. TEST(DWARFDebugFrame, ParseTruncatedCFITest) { - auto ParseCFI = [](dwarf::CIE &C, ArrayRef Instructions, - Optional Size = None) { - DWARFDataExtractor Data(Instructions, /*IsLittleEndian=*/true, - /*AddressSize=*/8); - uint64_t Offset = 0; - const uint64_t EndOffset = Size ? *Size : (uint64_t)Instructions.size(); - return C.cfis().parse(Data, &Offset, EndOffset); - }; - dwarf::CIE TestCIE = createCIE(/*IsDWARF64=*/false, /*Offset=*/0x0, /*Length=*/0xff); // Having an empty instructions list is fine. - EXPECT_THAT_ERROR(ParseCFI(TestCIE, {}), Succeeded()); + EXPECT_THAT_ERROR(parseCFI(TestCIE, {}), Succeeded()); // Unable to read an opcode, because the instructions list is empty, but we // say to the parser that it is not. EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {}, /*Size=*/1), + parseCFI(TestCIE, {}, /*Size=*/1), FailedWithMessage( "unexpected end of data at offset 0x0 while reading [0x0, 0x1)")); // Unable to read a truncated DW_CFA_offset instruction. EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_offset}), + parseCFI(TestCIE, {dwarf::DW_CFA_offset}), FailedWithMessage("unable to decode LEB128 at offset 0x00000001: " "malformed uleb128, extends past end")); // Unable to read a truncated DW_CFA_set_loc instruction. 
EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_set_loc}), + parseCFI(TestCIE, {dwarf::DW_CFA_set_loc}), FailedWithMessage( "unexpected end of data at offset 0x1 while reading [0x1, 0x9)")); // Unable to read a truncated DW_CFA_advance_loc1 instruction. EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_advance_loc1}), + parseCFI(TestCIE, {dwarf::DW_CFA_advance_loc1}), FailedWithMessage( "unexpected end of data at offset 0x1 while reading [0x1, 0x2)")); // Unable to read a truncated DW_CFA_advance_loc2 instruction. EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_advance_loc2}), + parseCFI(TestCIE, {dwarf::DW_CFA_advance_loc2}), FailedWithMessage( "unexpected end of data at offset 0x1 while reading [0x1, 0x3)")); // Unable to read a truncated DW_CFA_advance_loc4 instruction. EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_advance_loc4}), + parseCFI(TestCIE, {dwarf::DW_CFA_advance_loc4}), FailedWithMessage( "unexpected end of data at offset 0x1 while reading [0x1, 0x5)")); // A test for an instruction with a single ULEB128 operand. auto CheckOp_ULEB128 = [&](uint8_t Inst) { EXPECT_THAT_ERROR( - ParseCFI(TestCIE, Inst), + parseCFI(TestCIE, Inst), FailedWithMessage("unable to decode LEB128 at offset 0x00000001: " "malformed uleb128, extends past end")); }; @@ -191,19 +243,19 @@ TEST(DWARFDebugFrame, ParseTruncatedCFITest) { // Unable to read a truncated DW_CFA_def_cfa_offset_sf instruction. EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_offset_sf}), + parseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_offset_sf}), FailedWithMessage("unable to decode LEB128 at offset 0x00000001: " "malformed sleb128, extends past end")); // A test for an instruction with two ULEB128 operands. 
auto CheckOp_ULEB128_ULEB128 = [&](uint8_t Inst) { EXPECT_THAT_ERROR( - ParseCFI(TestCIE, Inst), + parseCFI(TestCIE, Inst), FailedWithMessage("unable to decode LEB128 at offset 0x00000001: " "malformed uleb128, extends past end")); EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {Inst, /*Op1=*/0}), + parseCFI(TestCIE, {Inst, /*Op1=*/0}), FailedWithMessage("unable to decode LEB128 at offset 0x00000002: " "malformed uleb128, extends past end")); }; @@ -215,12 +267,12 @@ TEST(DWARFDebugFrame, ParseTruncatedCFITest) { // A test for an instruction with two operands: ULEB128, SLEB128. auto CheckOp_ULEB128_SLEB128 = [&](uint8_t Inst) { EXPECT_THAT_ERROR( - ParseCFI(TestCIE, Inst), + parseCFI(TestCIE, Inst), FailedWithMessage("unable to decode LEB128 at offset 0x00000001: " "malformed uleb128, extends past end")); EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {Inst, /*Op1=*/0}), + parseCFI(TestCIE, {Inst, /*Op1=*/0}), FailedWithMessage("unable to decode LEB128 at offset 0x00000002: " "malformed sleb128, extends past end")); }; @@ -231,16 +283,16 @@ TEST(DWARFDebugFrame, ParseTruncatedCFITest) { // Unable to read a truncated DW_CFA_def_cfa_expression instruction. EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_expression}), + parseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_expression}), FailedWithMessage("unable to decode LEB128 at offset 0x00000001: " "malformed uleb128, extends past end")); EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_expression, + parseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_expression, /*expression length=*/0x1}), FailedWithMessage( "unexpected end of data at offset 0x2 while reading [0x2, 0x3)")); // The DW_CFA_def_cfa_expression can contain a zero length expression. - EXPECT_THAT_ERROR(ParseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_expression, + EXPECT_THAT_ERROR(parseCFI(TestCIE, {dwarf::DW_CFA_def_cfa_expression, /*ExprLen=*/0}), Succeeded()); @@ -248,19 +300,19 @@ TEST(DWARFDebugFrame, ParseTruncatedCFITest) { // (ULEB128) and expression bytes. 
auto CheckOp_ULEB128_Expr = [&](uint8_t Inst) { EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {Inst}), + parseCFI(TestCIE, {Inst}), FailedWithMessage("unable to decode LEB128 at offset 0x00000001: " "malformed uleb128, extends past end")); EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {Inst, /*Op1=*/0}), + parseCFI(TestCIE, {Inst, /*Op1=*/0}), FailedWithMessage("unable to decode LEB128 at offset 0x00000002: " "malformed uleb128, extends past end")); // A zero length expression is fine - EXPECT_THAT_ERROR(ParseCFI(TestCIE, {Inst, + EXPECT_THAT_ERROR(parseCFI(TestCIE, {Inst, /*Op1=*/0, /*ExprLen=*/0}), Succeeded()); EXPECT_THAT_ERROR( - ParseCFI(TestCIE, {Inst, + parseCFI(TestCIE, {Inst, /*Op1=*/0, /*ExprLen=*/1}), FailedWithMessage( "unexpected end of data at offset 0x3 while reading [0x3, 0x4)")); From llvm-commits at lists.llvm.org Wed Jul 8 02:18:16 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:18:16 +0000 (UTC) Subject: [PATCH] D82868: [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related `CFIProgram::parse` code. In-Reply-To: References: Message-ID: <94d55a6879f675683543aa99e6c4065a@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbee8cdcabd2b: [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related… (authored by grimar). Changed prior to commit: https://reviews.llvm.org/D82868?vs=274429&id=276345#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82868/new/ https://reviews.llvm.org/D82868 Files: llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82868.276345.patch Type: text/x-patch Size: 18522 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 02:21:03 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Wed, 08 Jul 2020 02:21:03 -0700 (PDT) Subject: [llvm] c00a277 - [X86][AVX] Remove redundant EXTRACT_VECTOR_ELT(VBROADCAST(SCALAR())) fold Message-ID: <5f058fff.1c69fb81.70bb1.3f69@mx.google.com> Author: Simon Pilgrim Date: 2020-07-08T10:18:36+01:00 New Revision: c00a27752e4944db609a683504bb10e0975fdf76 URL: https://github.com/llvm/llvm-project/commit/c00a27752e4944db609a683504bb10e0975fdf76 DIFF: https://github.com/llvm/llvm-project/commit/c00a27752e4944db609a683504bb10e0975fdf76.diff LOG: [X86][AVX] Remove redundant EXTRACT_VECTOR_ELT(VBROADCAST(SCALAR())) fold Noticed while looking for similar cases to rG931ec74f7a29 - SimplifyDemandedVectorElts and shuffle combining both should handle this now. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 023b5975f0c7..5238014008be 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -38963,9 +38963,6 @@ static SDValue combineExtractWithShuffle(SDNode *N, SelectionDAG &DAG, // Handle extract(bitcast(broadcast(scalar_value))). 
if (X86ISD::VBROADCAST == SrcBC.getOpcode()) { SDValue SrcOp = SrcBC.getOperand(0); - if (SrcOp.getValueSizeInBits() == VT.getSizeInBits()) - return DAG.getBitcast(VT, SrcOp); - EVT SrcOpVT = SrcOp.getValueType(); if (SrcOpVT.isScalarInteger() && VT.isInteger() && (SrcOpVT.getSizeInBits() % SrcEltBits) == 0) { From llvm-commits at lists.llvm.org Wed Jul 8 02:21:04 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Wed, 08 Jul 2020 02:21:04 -0700 (PDT) Subject: [llvm] 997a3c2 - Fix MSVC "not all control paths return a value" warnings. NFC. Message-ID: <5f059000.1c69fb81.1ccd4.8f6d@mx.google.com> Author: Simon Pilgrim Date: 2020-07-08T10:18:36+01:00 New Revision: 997a3c29f4655e930a9ef44be98d28368d757d98 URL: https://github.com/llvm/llvm-project/commit/997a3c29f4655e930a9ef44be98d28368d757d98 DIFF: https://github.com/llvm/llvm-project/commit/997a3c29f4655e930a9ef44be98d28368d757d98.diff LOG: Fix MSVC "not all control paths return a value" warnings. NFC. Added: Modified: llvm/lib/MC/MCParser/MasmParser.cpp Removed: ################################################################################ diff --git a/llvm/lib/MC/MCParser/MasmParser.cpp b/llvm/lib/MC/MCParser/MasmParser.cpp index 14c889da5c3e..3dbd00aae47a 100644 --- a/llvm/lib/MC/MCParser/MasmParser.cpp +++ b/llvm/lib/MC/MCParser/MasmParser.cpp @@ -3673,6 +3673,7 @@ bool MasmParser::parseFieldInitializer(const FieldInfo &Field, case FT_STRUCT: return parseFieldInitializer(Field, Field.Contents.StructInfo, Initializer); } + llvm_unreachable("Unhandled FieldType enum"); } bool MasmParser::parseStructInitializer(const StructInfo &Structure, @@ -3826,6 +3827,7 @@ bool MasmParser::emitFieldValue(const FieldInfo &Field) { case FT_STRUCT: return emitFieldValue(Field, Field.Contents.StructInfo); } + llvm_unreachable("Unhandled FieldType enum"); } bool MasmParser::emitStructValue(const StructInfo &Structure) { @@ -3905,6 +3907,7 @@ bool MasmParser::emitFieldInitializer(const FieldInfo 
&Field, return emitFieldInitializer(Field, Field.Contents.StructInfo, Initializer.StructInfo); } + llvm_unreachable("Unhandled FieldType enum"); } bool MasmParser::emitStructInitializer(const StructInfo &Structure, From llvm-commits at lists.llvm.org Wed Jul 8 02:26:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Wed, 08 Jul 2020 02:26:27 -0700 (PDT) Subject: [llvm] a39c7ab - [NFCI][llvm-reduce] Cleanup Delta passes to use Oracle abstraction Message-ID: <5f059143.1c69fb81.937d3.bee7@mx.google.com> Author: Roman Lebedev Date: 2020-07-08T12:26:00+03:00 New Revision: a39c7ab9c355670510341191a802f3799265e9ef URL: https://github.com/llvm/llvm-project/commit/a39c7ab9c355670510341191a802f3799265e9ef DIFF: https://github.com/llvm/llvm-project/commit/a39c7ab9c355670510341191a802f3799265e9ef.diff LOG: [NFCI][llvm-reduce] Cleanup Delta passes to use Oracle abstraction Summary: I think, this results in much more understandable/readable flow. At least the original logic was perhaps the most hard thing for me to grasp when taking an initial look on the delta passes. 
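For readers skimming the log, the Oracle abstraction named in the subject can be approximated outside of LLVM as below. This is only a minimal sketch, not the committed code: `std::vector` stands in for `ArrayRef`, the `Chunk` struct with an inclusive `[Begin, End]` range is an assumption, and the helper `keepMask` is hypothetical and exists purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative stand-in for llvm-reduce's Chunk: an inclusive range of
// 1-based feature indices. (Assumed shape, not copied from Delta.h.)
struct Chunk {
  int Begin;
  int End;
  bool contains(int Index) const { return Index >= Begin && Index <= End; }
};

// Answers "should the next feature be kept?" while hiding the chunk
// bookkeeping from the caller.
class Oracle {
  int Index = 1;         // 1-based index of the feature being queried.
  std::vector<Chunk> ChunksToKeep;
  std::size_t Front = 0; // Position of the current (front) chunk.

public:
  explicit Oracle(std::vector<Chunk> Chunks)
      : ChunksToKeep(std::move(Chunks)) {
    // Assumption of this sketch: chunks are sorted and do not overlap.
    for (std::size_t I = 1; I < ChunksToKeep.size(); ++I)
      assert(ChunksToKeep[I - 1].End < ChunksToKeep[I].Begin);
  }

  // Call once per feature, in a stable order.
  bool shouldKeep() {
    if (Front == ChunksToKeep.size())
      return false; // No chunks left: discard all further features.
    const Chunk &C = ChunksToKeep[Front];
    bool Keep = C.contains(Index);
    if (C.End == Index)
      ++Front; // Last feature of this chunk: move on to the next chunk.
    ++Index;   // Next call refers to the next feature.
    return Keep;
  }
};

// Hypothetical helper: query the oracle for NumFeatures features in a row.
std::vector<bool> keepMask(std::vector<Chunk> Chunks, int NumFeatures) {
  Oracle O(std::move(Chunks));
  std::vector<bool> Mask;
  for (int I = 0; I < NumFeatures; ++I)
    Mask.push_back(O.shouldKeep());
  return Mask;
}
```

Each delta pass then only needs a loop of the form "for each feature, if the oracle says keep, remember it", which is the shape the diff below converges on.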
Reviewers: nickdesaulniers, dblaikie, diegotf, george.burgess.iv Reviewed By: nickdesaulniers Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83287 Added: Modified: llvm/tools/llvm-reduce/deltas/Delta.h llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Removed: ################################################################################ diff --git a/llvm/tools/llvm-reduce/deltas/Delta.h b/llvm/tools/llvm-reduce/deltas/Delta.h index dbb18e4bd07f..7da3c79c958e 100644 --- a/llvm/tools/llvm-reduce/deltas/Delta.h +++ b/llvm/tools/llvm-reduce/deltas/Delta.h @@ -16,9 +16,10 @@ #define LLVM_TOOLS_LLVMREDUCE_LLVMREDUCE_DELTA_H #include "TestRunner.h" -#include -#include +#include "llvm/ADT/ScopeExit.h" #include +#include +#include namespace llvm { @@ -47,6 +48,39 @@ struct Chunk { } }; +/// Provides opaque interface for querying into ChunksToKeep without having to +/// actually understand what is going on. +class Oracle { + /// Out of all the features that we promised to be, + /// how many have we already processed? 1-based! + int Index = 1; + + /// The actual workhorse, contains the knowledge whether or not + /// some particular feature should be preserved this time. + ArrayRef ChunksToKeep; + +public: + explicit Oracle(ArrayRef ChunksToKeep_) + : ChunksToKeep(ChunksToKeep_) {} + + /// Should be called for each feature on which we are operating. + /// Name is self-explanatory - if returns true, then it should be preserved. + bool shouldKeep() { + if (ChunksToKeep.empty()) + return false; // All further features are to be discarded. + + // Does the current (front) chunk contain such a feature? 
+ bool ShouldKeep = ChunksToKeep.front().contains(Index); + auto _ = make_scope_exit([&]() { ++Index; }); // Next time - next feature. + + // Is this the last feature in the chunk? + if (ChunksToKeep.front().end == Index) + ChunksToKeep = ChunksToKeep.drop_front(); // Onto next chunk. + + return ShouldKeep; + } +}; + /// This function implements the Delta Debugging algorithm, it receives a /// number of Targets (e.g. Functions, Instructions, Basic Blocks, etc.) and /// splits them in half; these chunks of targets are then tested while ignoring diff --git a/llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp b/llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp index b7d92049a67a..a119b40018b3 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp @@ -42,7 +42,8 @@ static void replaceFunctionCalls(Function &OldF, Function &NewF, /// accordingly. It also removes allocations of out-of-chunk arguments. static void extractArgumentsFromModule(std::vector ChunksToKeep, Module *Program) { - int I = 0, ArgCount = 0; + Oracle O(ChunksToKeep); + std::set ArgsToKeep; std::vector Funcs; // Get inside-chunk arguments, as well as their parent function @@ -50,12 +51,8 @@ static void extractArgumentsFromModule(std::vector ChunksToKeep, if (!F.isDeclaration()) { Funcs.push_back(&F); for (auto &A : F.args()) - if (I < (int)ChunksToKeep.size()) { - if (ChunksToKeep[I].contains(++ArgCount)) - ArgsToKeep.insert(&A); - if (ChunksToKeep[I].end == ArgCount) - ++I; - } + if (O.shouldKeep()) + ArgsToKeep.insert(&A); } for (auto *F : Funcs) { diff --git a/llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp b/llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp index 03c3962d2fd9..002d81a67487 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp @@ -82,17 +82,14 @@ static void removeUninterestingBBsFromSwitch(SwitchInst &SwInst, /// accordingly. 
It also removes allocations of out-of-chunk arguments. static void extractBasicBlocksFromModule(std::vector ChunksToKeep, Module *Program) { - int I = 0, BBCount = 0; + Oracle O(ChunksToKeep); + std::set BBsToKeep; for (auto &F : *Program) for (auto &BB : F) - if (I < (int)ChunksToKeep.size()) { - if (ChunksToKeep[I].contains(++BBCount)) - BBsToKeep.insert(&BB); - if (ChunksToKeep[I].end == BBCount) - ++I; - } + if (O.shouldKeep()) + BBsToKeep.insert(&BB); std::vector BBsToDelete; for (auto &F : *Program) diff --git a/llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp b/llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp index 7652f0fda55f..b29df88261d9 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp @@ -24,16 +24,13 @@ using namespace llvm; /// that aren't inside any of the desired Chunks. static void extractFunctionsFromModule(const std::vector &ChunksToKeep, Module *Program) { + Oracle O(ChunksToKeep); + // Get functions inside desired chunks std::set FuncsToKeep; - int I = 0, FunctionCount = 0; for (auto &F : *Program) - if (I < (int)ChunksToKeep.size()) { - if (ChunksToKeep[I].contains(++FunctionCount)) - FuncsToKeep.insert(&F); - if (FunctionCount == ChunksToKeep[I].end) - ++I; - } + if (O.shouldKeep()) + FuncsToKeep.insert(&F); // Delete out-of-chunk functions, and replace their calls with undef std::vector FuncsToRemove; diff --git a/llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp b/llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp index 55d732cfec98..dc8df7395e1f 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp @@ -20,16 +20,13 @@ using namespace llvm; /// Removes all the Initialized GVs that aren't inside the desired Chunks. 
static void extractGVsFromModule(std::vector ChunksToKeep, Module *Program) { + Oracle O(ChunksToKeep); + // Get GVs inside desired chunks std::set GVsToKeep; - int I = 0, GVCount = 0; for (auto &GV : Program->globals()) - if (GV.hasInitializer() && I < (int)ChunksToKeep.size()) { - if (ChunksToKeep[I].contains(++GVCount)) - GVsToKeep.insert(&GV); - if (GVCount == ChunksToKeep[I].end) - ++I; - } + if (GV.hasInitializer() && O.shouldKeep()) + GVsToKeep.insert(&GV); // Delete out-of-chunk GVs and their uses std::vector ToRemove; diff --git a/llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp b/llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp index b3497ad2dc02..18dec02b90ad 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp @@ -19,18 +19,15 @@ using namespace llvm; /// accordingly. It also removes allocations of out-of-chunk arguments. static void extractInstrFromModule(std::vector ChunksToKeep, Module *Program) { - int I = 0, InstCount = 0; + Oracle O(ChunksToKeep); + std::set InstToKeep; for (auto &F : *Program) for (auto &BB : F) for (auto &Inst : BB) - if (I < (int)ChunksToKeep.size()) { - if (ChunksToKeep[I].contains(++InstCount)) - InstToKeep.insert(&Inst); - if (ChunksToKeep[I].end == InstCount) - ++I; - } + if (O.shouldKeep()) + InstToKeep.insert(&Inst); std::vector InstToDelete; for (auto &F : *Program) diff --git a/llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp b/llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp index 4ea223546efa..4587295a00be 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp @@ -21,20 +21,15 @@ using namespace llvm; /// Adds all Unnamed Metadata Nodes that are inside desired Chunks to set template -static void getChunkMetadataNodes(T &MDUser, int &I, - const std::vector &ChunksToKeep, +static void getChunkMetadataNodes(T &MDUser, Oracle &O, std::set &SeenNodes, std::set &NodesToKeep) { 
SmallVector, 4> MDs; MDUser.getAllMetadata(MDs); for (auto &MD : MDs) { SeenNodes.insert(MD.second); - if (I < (int)ChunksToKeep.size()) { - if (ChunksToKeep[I].contains(SeenNodes.size())) - NodesToKeep.insert(MD.second); - if (ChunksToKeep[I].end == (int)SeenNodes.size()) - ++I; - } + if (O.shouldKeep()) + NodesToKeep.insert(MD.second); } } @@ -53,19 +48,20 @@ static void eraseMetadataIfOutsideChunk(T &MDUser, /// functions that aren't inside the desired Chunks. static void extractMetadataFromModule(const std::vector &ChunksToKeep, Module *Program) { + Oracle O(ChunksToKeep); + std::set SeenNodes; std::set NodesToKeep; - int I = 0; // Add chunk MDNodes used by GVs, Functions, and Instructions to set for (auto &GV : Program->globals()) - getChunkMetadataNodes(GV, I, ChunksToKeep, SeenNodes, NodesToKeep); + getChunkMetadataNodes(GV, O, SeenNodes, NodesToKeep); for (auto &F : *Program) { - getChunkMetadataNodes(F, I, ChunksToKeep, SeenNodes, NodesToKeep); + getChunkMetadataNodes(F, O, SeenNodes, NodesToKeep); for (auto &BB : F) for (auto &Inst : BB) - getChunkMetadataNodes(Inst, I, ChunksToKeep, SeenNodes, NodesToKeep); + getChunkMetadataNodes(Inst, O, SeenNodes, NodesToKeep); } // Once more, go over metadata nodes, but deleting the ones outside chunks @@ -81,17 +77,10 @@ static void extractMetadataFromModule(const std::vector &ChunksToKeep, // Get out-of-chunk Named metadata nodes - unsigned MetadataCount = SeenNodes.size(); std::vector NamedNodesToDelete; - for (auto &MD : Program->named_metadata()) { - if (I < (int)ChunksToKeep.size()) { - if (!ChunksToKeep[I].contains(++MetadataCount)) - NamedNodesToDelete.push_back(&MD); - if (ChunksToKeep[I].end == (int)SeenNodes.size()) - ++I; - } else + for (auto &MD : Program->named_metadata()) + if (!O.shouldKeep()) NamedNodesToDelete.push_back(&MD); - } for (auto *NN : NamedNodesToDelete) { for (int I = 0, E = NN->getNumOperands(); I != E; ++I) diff --git a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp 
b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp index 23a1ae3909b1..3f1cb3740813 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -17,7 +17,6 @@ #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/STLExtras.h" -#include "llvm/ADT/ScopeExit.h" #include "llvm/ADT/Sequence.h" #include "llvm/ADT/iterator_range.h" #include "llvm/IR/InstVisitor.h" @@ -35,38 +34,6 @@ using namespace llvm; namespace { -/// Provides opaque interface for querying into ChunksToKeep without having to -/// actually understand what is going on. -struct Oracle { - /// Out of all the features that we promised to be, - /// how many have we already processed? 1-based! - int Index = 1; - - /// The actual workhorse, contains the knowledge whether or not - /// some particular feature should be preserved this time. - ArrayRef ChunksToKeep; - -public: - Oracle(ArrayRef ChunksToKeep_) : ChunksToKeep(ChunksToKeep_) {} - - /// Should be called for each feature on which we are operating. - /// Name is self-explanatory - if returns true, then it should be preserved. - bool shouldKeep() { - if (ChunksToKeep.empty()) - return false; // All further features are to be discarded. - - // Does the current (front) chunk contain such a feature? - bool ShouldKeep = ChunksToKeep.front().contains(Index); - auto _ = make_scope_exit([&]() { ++Index; }); // Next time - next feature. - - // Is this the last feature in the chunk? - if (ChunksToKeep.front().end == Index) - ChunksToKeep = ChunksToKeep.drop_front(); // Onto next chunk. - - return ShouldKeep; - } -}; - /// Given ChunksToKeep, produce a map of calls and indexes of operand bundles /// to be preserved for each call. 
class OperandBundleRemapper : public InstVisitor { From llvm-commits at lists.llvm.org Wed Jul 8 02:26:30 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:26:30 +0000 (UTC) Subject: [PATCH] D83287: [NFCI][llvm-reduce] Cleanup Delta passes to use Oracle abstraction In-Reply-To: References: Message-ID: <8a9e6414f350931e54b4c590653031fc@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGa39c7ab9c355: [NFCI][llvm-reduce] Cleanup Delta passes to use Oracle abstraction (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83287/new/ https://reviews.llvm.org/D83287 Files: llvm/tools/llvm-reduce/deltas/Delta.h llvm/tools/llvm-reduce/deltas/ReduceArguments.cpp llvm/tools/llvm-reduce/deltas/ReduceBasicBlocks.cpp llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp llvm/tools/llvm-reduce/deltas/ReduceGlobalVars.cpp llvm/tools/llvm-reduce/deltas/ReduceInstructions.cpp llvm/tools/llvm-reduce/deltas/ReduceMetadata.cpp llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83287.276347.patch Type: text/x-patch Size: 11591 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 02:27:54 2020 From: llvm-commits at lists.llvm.org (Vishal Chebrolu via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:27:54 +0000 (UTC) Subject: [PATCH] D82892: [NFC] Added comparision for all types in haveSameSpecialState() of Instruction.cpp In-Reply-To: References: Message-ID: <74c6eac46f47bcfbe3324dee773b2918@localhost.localdomain> vish99 added a comment. In D82892#2133157 , @tejohnson wrote: > Is this a bug fix? If it is NFC (No Functional Change) please specify that in the title, otherwise needs a test case. Thanks. 
The title has been edited :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82892/new/

https://reviews.llvm.org/D82892

From llvm-commits at lists.llvm.org Wed Jul 8 02:31:08 2020
From: llvm-commits at lists.llvm.org (Jeremy Morse via llvm-commits)
Date: Wed, 08 Jul 2020 02:31:08 -0700 (PDT)
Subject: [llvm] b9d977b - [DWARF] Add cuttoff guarding quadratic validThroughout behaviour
Message-ID: <5f05925c.1c69fb81.a82aa.eb12@mx.google.com>

Author: Jeremy Morse
Date: 2020-07-08T10:30:09+01:00
New Revision: b9d977b0ca60c54f11615ca9d144c9f08b29fd85

URL: https://github.com/llvm/llvm-project/commit/b9d977b0ca60c54f11615ca9d144c9f08b29fd85
DIFF: https://github.com/llvm/llvm-project/commit/b9d977b0ca60c54f11615ca9d144c9f08b29fd85.diff

LOG: [DWARF] Add cuttoff guarding quadratic validThroughout behaviour

Occasionally we see absolutely massive basic blocks, typically in global
constructors that are vulnerable to heavy inlining. When these blocks are
dense with DBG_VALUE instructions, we can hit near quadratic complexity in
DwarfDebug's validThroughout function. The problem is caused by:

  * validThroughout having to step through all instructions in the block to
    examine their lexical scope,
  * and a high proportion of instructions in that block being DBG_VALUEs
    for a unique variable fragment,

Leading to us stepping through every instruction in the block, for (nearly)
each instruction in the block.

By adding this guard, we force variables in large blocks to use a location
list rather than a single-location expression, as shown in the added test.
This shouldn't change the meaning of the output DWARF at all: instead we use
a less efficient DWARF encoding to avoid a poor-performance code path.
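The guard described in the log can be illustrated in isolation roughly as follows. This is a simplified sketch under stated assumptions rather than the DwarfDebug code: `Block`, `LocKind`, `collectVeryLargeBlocks` and `chooseEncoding` are hypothetical names, and the expensive scan is modelled as a callback so that it is visible when it gets skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a machine basic block; only its size matters.
struct Block {
  std::size_t NumInstrs;
};

enum class LocKind { SingleLocation, LocationList };

// Identify over-sized blocks once per function, so the per-variable code
// only has to do a cheap set lookup.
std::unordered_set<const Block *>
collectVeryLargeBlocks(const std::vector<Block> &Blocks, std::size_t Limit) {
  std::unordered_set<const Block *> VeryLarge;
  for (const Block &B : Blocks)
    if (B.NumInstrs > Limit)
      VeryLarge.insert(&B);
  return VeryLarge;
}

// ValidThroughout models the linear scan over the block. For variables in
// very large blocks it is never invoked; those variables fall back to the
// (larger but always valid) location-list encoding.
LocKind chooseEncoding(const Block &B,
                       const std::unordered_set<const Block *> &VeryLarge,
                       const std::function<bool()> &ValidThroughout) {
  assert(static_cast<bool>(ValidThroughout) && "scan callback must be set");
  if (VeryLarge.count(&B) != 0)
    return LocKind::LocationList; // Skip the expensive scan entirely.
  return ValidThroughout() ? LocKind::SingleLocation : LocKind::LocationList;
}
```

With N DBG_VALUEs in one N-instruction block, the unguarded path performs on the order of N scans of length N; the guard turns that into N set lookups at the cost of a less compact encoding.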
Differential Revision: https://reviews.llvm.org/D83236 Added: llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir Modified: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp index 80935f8a4540..45ed5256deb9 100644 --- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp @@ -167,6 +167,11 @@ static cl::opt "Abstract subprograms")), cl::init(DefaultLinkageNames)); +static cl::opt LocationAnalysisSizeLimit( + "singlevarlocation-input-bb-limit", + cl::desc("Maximum block size to analyze for single-location variables"), + cl::init(30000), cl::Hidden); + static const char *const DWARFGroupName = "dwarf"; static const char *const DWARFGroupDescription = "DWARF Emission"; static const char *const DbgTimerName = "writer"; @@ -1637,8 +1642,10 @@ static bool validThroughout(LexicalScopes &LScopes, // [1-3) [(reg0, fragment 0, 32), (reg1, fragment 32, 32)] // [3-4) [(reg1, fragment 32, 32), (123, fragment 64, 32)] // [4-) [(@g, fragment 0, 96)] -bool DwarfDebug::buildLocationList(SmallVectorImpl &DebugLoc, - const DbgValueHistoryMap::Entries &Entries) { +bool DwarfDebug::buildLocationList( + SmallVectorImpl &DebugLoc, + const DbgValueHistoryMap::Entries &Entries, + DenseSet &VeryLargeBlocks) { using OpenRange = std::pair; SmallVector OpenRanges; @@ -1734,8 +1741,14 @@ bool DwarfDebug::buildLocationList(SmallVectorImpl &DebugLoc, DebugLoc.pop_back(); } - return DebugLoc.size() == 1 && isSafeForSingleLocation && - validThroughout(LScopes, StartDebugMI, EndMI); + // If there's a single entry, safe for a single location, and not part of + // an over-sized basic block, then ask validThroughout whether this + // location can be represented as a single variable location. 
+ if (DebugLoc.size() != 1 || !isSafeForSingleLocation) + return false; + if (VeryLargeBlocks.count(StartDebugMI->getParent())) + return false; + return validThroughout(LScopes, StartDebugMI, EndMI); } DbgEntity *DwarfDebug::createConcreteEntity(DwarfCompileUnit &TheCU, @@ -1767,6 +1780,13 @@ void DwarfDebug::collectEntityInfo(DwarfCompileUnit &TheCU, // Grab the variable info that was squirreled away in the MMI side-table. collectVariableInfoFromMFTable(TheCU, Processed); + // Identify blocks that are unreasonably sized, so that we can later + // skip lexical scope analysis over them. + DenseSet VeryLargeBlocks; + for (const auto &MBB : *CurFn) + if (MBB.size() > LocationAnalysisSizeLimit) + VeryLargeBlocks.insert(&MBB); + for (const auto &I : DbgValues) { InlinedEntity IV = I.first; if (Processed.count(IV)) @@ -1803,7 +1823,8 @@ void DwarfDebug::collectEntityInfo(DwarfCompileUnit &TheCU, if (HistSize == 1 || SingleValueWithClobber) { const auto *End = SingleValueWithClobber ? HistoryMapEntries[1].getInstr() : nullptr; - if (validThroughout(LScopes, MInsn, End)) { + if (VeryLargeBlocks.count(MInsn->getParent()) == 0 && + validThroughout(LScopes, MInsn, End)) { RegVar->initializeDbgValue(MInsn); continue; } @@ -1818,7 +1839,8 @@ void DwarfDebug::collectEntityInfo(DwarfCompileUnit &TheCU, // Build the location list for this variable. SmallVector Entries; - bool isValidSingleLocation = buildLocationList(Entries, HistoryMapEntries); + bool isValidSingleLocation = + buildLocationList(Entries, HistoryMapEntries, VeryLargeBlocks); // Check whether buildLocationList managed to merge all locations to one // that is valid throughout the variable's scope. 
If so, produce single diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h index bb0f550b9654..d7a4b2abf52b 100644 --- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h +++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h @@ -592,8 +592,10 @@ class DwarfDebug : public DebugHandlerBase { /// function that describe the same variable. If the resulting /// list has only one entry that is valid for entire variable's /// scope return true. - bool buildLocationList(SmallVectorImpl &DebugLoc, - const DbgValueHistoryMap::Entries &Entries); + bool buildLocationList( + SmallVectorImpl &DebugLoc, + const DbgValueHistoryMap::Entries &Entries, + DenseSet &VeryLargeBlocks); /// Collect variable information from the side table maintained by MF. void collectVariableInfoFromMFTable(DwarfCompileUnit &TheCU, diff --git a/llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir b/llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir new file mode 100644 index 000000000000..6ad64d9d74bb --- /dev/null +++ b/llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir @@ -0,0 +1,65 @@ +# Test cutoffs for single-location variable analysis. 
+# Disable validThroughout if the input size exceeds the specified limit + +# RUN: llc %s -o - -start-after=livedebugvalues -mtriple=x86_64-unknown-unknown \ +# RUN: --singlevarlocation-input-bb-limit=0 -filetype=obj\ +# RUN: | llvm-dwarfdump -v -\ +# RUN: | FileCheck %s -check-prefix=LIMITED + +# RUN: llc %s -o - -start-after=livedebugvalues -mtriple=x86_64-unknown-unknown \ +# RUN: --singlevarlocation-input-bb-limit=20 -filetype=obj | llvm-dwarfdump -v -\ +# RUN: | FileCheck %s -check-prefix=UNLIMITED + +# LIMITED: DW_AT_location [DW_FORM_sec_offset] + +# UNLIMITED: DW_AT_location [DW_FORM_exprloc] + +--- | + target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" + + declare i32 @use(i32) + + define i32 @foo(i32 %x) !dbg !6 { + entry: + ret i32 1, !dbg !15 + } + + declare void @llvm.dbg.value(metadata, metadata, metadata) + + !llvm.dbg.cu = !{!0} + !llvm.debugify = !{!3, !4} + !llvm.module.flags = !{!5} + + !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2) + !1 = !DIFile(filename: "/tmp/t.ll", directory: "/") + !2 = !{} + !3 = !{i32 4} + !4 = !{i32 2} + !5 = !{i32 2, !"Debug Info Version", i32 3} + !6 = distinct !DISubprogram(name: "foo", linkageName: "foo", scope: null, file: !1, line: 1, type: !7, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !8) + !7 = !DISubroutineType(types: !2) + !8 = !{!9, !11} + !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10) + !10 = !DIBasicType(name: "ty32", size: 32, encoding: DW_ATE_unsigned) + !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10) + !12 = !DILocation(line: 1, column: 1, scope: !6) + !13 = !DILocation(line: 2, column: 1, scope: !6) + !14 = !DILocation(line: 3, column: 1, scope: !6) + !15 = !DILocation(line: 4, column: 1, scope: !6) + +... 
+--- +name: foo +liveins: + - { reg: '$edi', virtual-reg: '' } +stack: + - { id: 0, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4, + stack-id: default, callee-saved-register: '', callee-saved-restored: true, + debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } +body: | + bb.0.entry: + liveins: $edi + DBG_VALUE renamable $edi, $noreg, !11, !DIExpression(), debug-location !14 + RETQ debug-location !14 + +... From llvm-commits at lists.llvm.org Wed Jul 8 02:31:16 2020 From: llvm-commits at lists.llvm.org (Jeremy Morse via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:31:16 +0000 (UTC) Subject: [PATCH] D83236: [DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGb9d977b0ca60: [DWARF] Add cuttoff guarding quadratic validThroughout behaviour (authored by jmorse). Changed prior to commit: https://reviews.llvm.org/D83236?vs=275714&id=276350#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83236/new/ https://reviews.llvm.org/D83236 Files: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D83236.276350.patch Type: text/x-patch Size: 7117 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 02:32:28 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:32:28 +0000 (UTC) Subject: [PATCH] D77129: [Verifier] Verify matrix dimensions operands match vector size. In-Reply-To: References: Message-ID: <2f6575341e410fc02b044b4a48306517@localhost.localdomain> SjoerdMeijer added inline comments. 
================
Comment at: llvm/lib/IR/Verifier.cpp:4805
+    NumColumns = cast(Call.getArgOperand(4));
+    TypeToCheck = cast(Call.getType());
+    break;
----------------
Quick query on this and the semantics:

  declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 , i32 , i32 )

Do we expect the element types of vectors %A and %B to be the same, and do we need to check this?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77129/new/

https://reviews.llvm.org/D77129

From llvm-commits at lists.llvm.org Wed Jul 8 02:35:05 2020
From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 09:35:05 +0000 (UTC)
Subject: [PATCH] D83375: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)
Message-ID:

gchatelet created this revision.
gchatelet added a reviewer: courbet.
Herald added subscribers: llvm-commits, jfb, hiraditya.
Herald added a project: LLVM.

This is preparatory work to enable storing alignment for AtomicCmpXchgInst.
See D83136 for context and bug: https://bugs.llvm.org/show_bug.cgi?id=27168

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83375

Files:
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83375.276353.patch
Type: text/x-patch
Size: 8009 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 02:40:01 2020
From: llvm-commits at lists.llvm.org (Qiu Chaofan via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 09:40:01 +0000 (UTC)
Subject: [PATCH] D83376: [Legalizer] Fix wrong operand in split vector helper
Message-ID:

qiucf created this revision.
qiucf added reviewers: andrew.w.kaylor, uweigand, craig.topper, kpn.
Herald added subscribers: llvm-commits, steven.zhang, hiraditya.
Herald added a project: LLVM.

It's hard to provide a test case now since vector int-to-fp isn't ready on PowerPC.
But this should be easy to review. I guess it's a typo introduced in D69275. It may cause an unknown segfault in `getNode`.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83376

Files:
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
===================================================================
--- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -2610,9 +2610,9 @@
   SDValue Chain;
   if (N->isStrictFPOpcode()) {
     HalfLo = DAG.getNode(N->getOpcode(), DL, {HalfVT, MVT::Other},
-                         {N->getOperand(0), HalfLo});
+                         {N->getOperand(0), InLoVec});
     HalfHi = DAG.getNode(N->getOpcode(), DL, {HalfVT, MVT::Other},
-                         {N->getOperand(0), HalfHi});
+                         {N->getOperand(0), InHiVec});
     // Legalize the chain result - switch anything that used the old chain to
     // use the new one.
     Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, HalfLo.getValue(1),

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83376.276349.patch
Type: text/x-patch
Size: 858 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 02:41:01 2020
From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 09:41:01 +0000 (UTC)
Subject: [PATCH] D83375: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)
In-Reply-To:
References:
Message-ID: <52bbc2063a6db69324d75c399f8ad010@localhost.localdomain>

gchatelet marked 2 inline comments as done.
gchatelet added a comment.

This patch splits the two implementations, moves a few lines and simplifies the code by inferring constant values. The semantics are unchanged.
================
Comment at: llvm/include/llvm/Bitcode/LLVMBitCodes.h:539
   FUNC_CODE_INST_FENCE = 36, // FENCE: [ordering, synchscope]
-  FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty,ptr,cmp,new, align, vol,
-  // ordering, synchscope]
+  FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty, ptr, cmp, new, vol,
+  // success_ordering, ssid,
----------------
The documentation here was wrong.
Alignment was never stored for `FUNC_CODE_INST_CMPXCHG_OLD`, and `failure_ordering` and `weak` were optional.


================
Comment at: llvm/lib/Bitcode/Reader/BitcodeReader.cpp:4989
+    Value *Ptr = nullptr;
+    if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy))
       return error("Invalid record");
----------------
Each function taking `Slot` (previously `OpNum`) will increase it if successful.
This allows replacing `OpNum + X` with its value.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83375/new/

https://reviews.llvm.org/D83375



From llvm-commits at lists.llvm.org  Wed Jul  8 02:42:56 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 09:42:56 +0000 (UTC)
Subject: [PATCH] D82868: [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related `CFIProgram::parse` code.
In-Reply-To:
References:
Message-ID: <1eeb5bfdf17ba47f05cdefe8ee6ebc5c@localhost.localdomain>

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rGbee8cdcabd2b: [DebugInfo/DWARF] - Test invalid CFI opcodes properly and refine related… (authored by grimar).
Changed prior to commit: https://reviews.llvm.org/D82868?vs=274429&id=275701#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82868/new/ https://reviews.llvm.org/D82868 Files: llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp llvm/unittests/DebugInfo/DWARF/DWARFDebugFrameTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82868.275701.patch Type: text/x-patch Size: 18522 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 02:46:29 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:46:29 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: <9c54e11aee66c5257ce2b02543b04680@localhost.localdomain> RKSimon marked an inline comment as done. RKSimon added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:36736 + Constant *Elt = C->getAggregateElement(i); + if (!DemandedElts[i / Scale] && !isa(Elt)) { + ConstVecOps.push_back(UndefValue::get(Elt->getType())); ---------------- yubing wrote: > Hi, Simon. I'm just wondering why we divide i by scale here. In my case: > When SimplifyDemandedVectorEltsForTargetShuffle visit t150, demandedElts is 0xff0f, scale is 2. so when i=8, DemandedElts[i / Scale] is false, but DemandedElts[i] is true. Thus the t146[8] will become undef while the previous value is -1. 
> > t146: i64 = X86ISD::Wrapper TargetConstantPool:i64<<32 x i8> > 0 > t154: v16i8,ch = load<(load 16 from constant-pool, align 32)> t0, t146, undef:i64 > t150: v16i8 = X86ISD::PSHUFB t156, t154 Scale should only be used to handle vXi64 <-> v2Xi32 style issues on 32-bit targets - that we're hitting this on other types is a bug because we're not dealing with the fact that the Constant might be a different size to the mask Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Wed Jul 8 02:51:27 2020 From: llvm-commits at lists.llvm.org (Paul Walker via llvm-commits) Date: Wed, 08 Jul 2020 02:51:27 -0700 (PDT) Subject: [llvm] fb75451 - [SVE] Custom ISel for fixed length extract/insert_subvector. Message-ID: <5f05971f.1c69fb81.a0b21.0291@mx.google.com> Author: Paul Walker Date: 2020-07-08T09:49:28Z New Revision: fb75451775f83c04d53e4e94bb4bd298ea9a882f URL: https://github.com/llvm/llvm-project/commit/fb75451775f83c04d53e4e94bb4bd298ea9a882f DIFF: https://github.com/llvm/llvm-project/commit/fb75451775f83c04d53e4e94bb4bd298ea9a882f.diff LOG: [SVE] Custom ISel for fixed length extract/insert_subvector. We use extact_subvector and insert_subvector to "cast" between fixed length and scalable vectors. This patch adds custom c++ based ISel for the following cases: fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0 scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0 Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized vectors or COPY_TO_REGCLASS otherwise. 
Differential Revision: https://reviews.llvm.org/D82871 Added: llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll Modified: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h Removed: ################################################################################ diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp index 4ef9bfb3aab6..10c477853353 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp @@ -3240,6 +3240,63 @@ void AArch64DAGToDAGISel::SelectTagP(SDNode *N) { ReplaceNode(N, N3); } +// NOTE: We cannot use EXTRACT_SUBREG in all cases because the fixed length +// vector types larger than NEON don't have a matching SubRegIndex. +static SDNode *extractSubReg(SelectionDAG *DAG, EVT VT, SDValue V) { + assert(V.getValueType().isScalableVector() && + V.getValueType().getSizeInBits().getKnownMinSize() == + AArch64::SVEBitsPerBlock && + "Expected to extract from a packed scalable vector!"); + assert(VT.isFixedLengthVector() && + "Expected to extract a fixed length vector!"); + + SDLoc DL(V); + switch (VT.getSizeInBits()) { + case 64: { + auto SubReg = DAG->getTargetConstant(AArch64::dsub, DL, MVT::i32); + return DAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, VT, V, SubReg); + } + case 128: { + auto SubReg = DAG->getTargetConstant(AArch64::zsub, DL, MVT::i32); + return DAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, VT, V, SubReg); + } + default: { + auto RC = DAG->getTargetConstant(AArch64::ZPRRegClassID, DL, MVT::i64); + return DAG->getMachineNode(TargetOpcode::COPY_TO_REGCLASS, DL, VT, V, RC); + } + } +} + +// NOTE: We cannot use INSERT_SUBREG in all cases because the fixed length +// vector types larger than NEON don't have a matching SubRegIndex. 
+static SDNode *insertSubReg(SelectionDAG *DAG, EVT VT, SDValue V) { + assert(VT.isScalableVector() && + VT.getSizeInBits().getKnownMinSize() == AArch64::SVEBitsPerBlock && + "Expected to insert into a packed scalable vector!"); + assert(V.getValueType().isFixedLengthVector() && + "Expected to insert a fixed length vector!"); + + SDLoc DL(V); + switch (V.getValueType().getSizeInBits()) { + case 64: { + auto SubReg = DAG->getTargetConstant(AArch64::dsub, DL, MVT::i32); + auto Container = DAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT); + return DAG->getMachineNode(TargetOpcode::INSERT_SUBREG, DL, VT, + SDValue(Container, 0), V, SubReg); + } + case 128: { + auto SubReg = DAG->getTargetConstant(AArch64::zsub, DL, MVT::i32); + auto Container = DAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT); + return DAG->getMachineNode(TargetOpcode::INSERT_SUBREG, DL, VT, + SDValue(Container, 0), V, SubReg); + } + default: { + auto RC = DAG->getTargetConstant(AArch64::ZPRRegClassID, DL, MVT::i64); + return DAG->getMachineNode(TargetOpcode::COPY_TO_REGCLASS, DL, VT, V, RC); + } + } +} + void AArch64DAGToDAGISel::Select(SDNode *Node) { // If we have a custom node, we already have selected! if (Node->isMachineOpcode()) { @@ -3313,6 +3370,52 @@ void AArch64DAGToDAGISel::Select(SDNode *Node) { return; break; + case ISD::EXTRACT_SUBVECTOR: { + // Bail when not a "cast" like extract_subvector. + if (cast(Node->getOperand(1))->getZExtValue() != 0) + break; + + // Bail when normal isel can do the job. + EVT InVT = Node->getOperand(0).getValueType(); + if (VT.isScalableVector() || InVT.isFixedLengthVector()) + break; + + // NOTE: We can only get here when doing fixed length SVE code generation. + // We do manual selection because the types involved are not linked to real + // registers (despite being legal) and must be coerced into SVE registers. 
+ // + // NOTE: If the above changes, be aware that selection will still not work + // because the td definition of extract_vector does not support extracting + // a fixed length vector from a scalable vector. + + ReplaceNode(Node, extractSubReg(CurDAG, VT, Node->getOperand(0))); + return; + } + + case ISD::INSERT_SUBVECTOR: { + // Bail when not a "cast" like insert_subvector. + if (cast(Node->getOperand(2))->getZExtValue() != 0) + break; + if (!Node->getOperand(0).isUndef()) + break; + + // Bail when normal isel should do the job. + EVT InVT = Node->getOperand(1).getValueType(); + if (VT.isFixedLengthVector() || InVT.isScalableVector()) + break; + + // NOTE: We can only get here when doing fixed length SVE code generation. + // We do manual selection because the types involved are not linked to real + // registers (despite being legal) and must be coerced into SVE registers. + // + // NOTE: If the above changes, be aware that selection will still not work + // because the td definition of insert_vector does not support inserting a + // fixed length vector into a scalable vector. + + ReplaceNode(Node, insertSubReg(CurDAG, VT, Node->getOperand(1))); + return; + } + case ISD::Constant: { // Materialize zero constants as copies from WZR/XZR. This allows // the coalescer to propagate these into other instructions. diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index aaeb6b459915..729fb8f62912 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -116,6 +116,18 @@ EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden, /// Value type used for condition codes. static const MVT MVT_CC = MVT::i32; +/// Returns true if VT's elements occupy the lowest bit positions of its +/// associated register class without any intervening space. 
+/// +/// For example, nxv2f16, nxv4f16 and nxv8f16 are legal types that belong to the +/// same register class, but only nxv8f16 can be treated as a packed vector. +static inline bool isPackedVectorType(EVT VT, SelectionDAG &DAG) { + assert(VT.isVector() && DAG.getTargetLoweringInfo().isTypeLegal(VT) && + "Expected legal vector type!"); + return VT.isFixedLengthVector() || + VT.getSizeInBits().getKnownMinSize() == AArch64::SVEBitsPerBlock; +} + AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM, const AArch64Subtarget &STI) : TargetLowering(TM), Subtarget(&STI) { @@ -908,6 +920,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM, // D68877 for more details. for (MVT VT : MVT::integer_scalable_vector_valuetypes()) { if (isTypeLegal(VT)) { + setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom); setOperationAction(ISD::SPLAT_VECTOR, VT, Custom); setOperationAction(ISD::SELECT, VT, Custom); setOperationAction(ISD::SDIV, VT, Custom); @@ -921,16 +934,18 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM, setOperationAction(ISD::SRA, VT, Custom); if (VT.getScalarType() == MVT::i1) setOperationAction(ISD::SETCC, VT, Custom); - } else { - for (auto VT : { MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32 }) - setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom); } } + + for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) + setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom); + setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom); setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom); for (MVT VT : MVT::fp_scalable_vector_valuetypes()) { if (isTypeLegal(VT)) { + setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom); setOperationAction(ISD::SPLAT_VECTOR, VT, Custom); setOperationAction(ISD::SELECT, VT, Custom); } @@ -1037,9 +1052,7 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) { for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op) setOperationAction(Op, VT, Expand); - // 
EXTRACT_SUBVECTOR/INSERT_SUBVECTOR are used to "cast" between scalable - // and fixed length vector types, although with the current level of support - // only the former is exercised. + // We use EXTRACT_SUBVECTOR to "cast" a scalable vector to a fixed length one. setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom); // Lower fixed length vector operations to scalable equivalents. @@ -3469,6 +3482,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op, return LowerSPLAT_VECTOR(Op, DAG); case ISD::EXTRACT_SUBVECTOR: return LowerEXTRACT_SUBVECTOR(Op, DAG); + case ISD::INSERT_SUBVECTOR: + return LowerINSERT_SUBVECTOR(Op, DAG); case ISD::SDIV: return LowerToPredicatedOp(Op, DAG, AArch64ISD::SDIV_PRED); case ISD::UDIV: @@ -8679,29 +8694,47 @@ AArch64TargetLowering::LowerEXTRACT_VECTOR_ELT(SDValue Op, SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const { - assert(!Op.getValueType().isScalableVector() && - "Unexpected scalable type for custom lowering EXTRACT_SUBVECTOR"); + assert(Op.getValueType().isFixedLengthVector() && + "Only cases that extract a fixed length vector are supported!"); - EVT VT = Op.getOperand(0).getValueType(); - SDLoc dl(Op); - // Just in case... - if (!VT.isVector()) - return SDValue(); + EVT InVT = Op.getOperand(0).getValueType(); + unsigned Idx = cast(Op.getOperand(1))->getZExtValue(); + unsigned Size = Op.getValueSizeInBits(); - ConstantSDNode *Cst = dyn_cast(Op.getOperand(1)); - if (!Cst) - return SDValue(); - unsigned Val = Cst->getZExtValue(); + if (InVT.isScalableVector()) { + // This will be matched by custom code during ISelDAGToDAG. + if (Idx == 0 && isPackedVectorType(InVT, DAG)) + return Op; - unsigned Size = Op.getValueSizeInBits(); + return SDValue(); + } // This will get lowered to an appropriate EXTRACT_SUBREG in ISel. - if (Val == 0) + if (Idx == 0 && InVT.getSizeInBits() <= 128) return Op; // If this is extracting the upper 64-bits of a 128-bit vector, we match // that directly. 
- if (Size == 64 && Val * VT.getScalarSizeInBits() == 64) + if (Size == 64 && Idx * InVT.getScalarSizeInBits() == 64) + return Op; + + return SDValue(); +} + +SDValue AArch64TargetLowering::LowerINSERT_SUBVECTOR(SDValue Op, + SelectionDAG &DAG) const { + assert(Op.getValueType().isScalableVector() && + "Only expect to lower inserts into scalable vectors!"); + + EVT InVT = Op.getOperand(1).getValueType(); + unsigned Idx = cast(Op.getOperand(2))->getZExtValue(); + + // We don't have any patterns for scalable vector yet. + if (InVT.isScalableVector() || !useSVEForFixedLengthVectorVT(InVT)) + return SDValue(); + + // This will be matched by custom code during ISelDAGToDAG. + if (Idx == 0 && isPackedVectorType(InVT, DAG) && Op.getOperand(0).isUndef()) return Op; return SDValue(); diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h index 210b8c842701..60ce88576f91 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -850,6 +850,7 @@ class AArch64TargetLowering : public TargetLowering { SDValue LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG, unsigned NewOp) const; SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const; + SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const; SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const; SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const; SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const; diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll new file mode 100644 index 000000000000..45ebdc78784e --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll @@ -0,0 +1,88 @@ +; RUN: llc -aarch64-sve-vector-bits-min=128 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefix=NO_SVE +; RUN: llc -aarch64-sve-vector-bits-min=256 
-aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK +; RUN: llc -aarch64-sve-vector-bits-min=384 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK +; RUN: llc -aarch64-sve-vector-bits-min=512 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=640 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=768 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=896 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=1024 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1152 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1280 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1408 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1536 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1664 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1792 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1920 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc 
-aarch64-sve-vector-bits-min=2048 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048 + +; Test we can code generater patterns of the form: +; fixed_length_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0 +; scalable_vector = ISD::INSERT_SUBVECTOR scalable_vector, fixed_length_vector, 0 +; +; NOTE: Currently shufflevector does not support scalable vectors so it cannot +; be used to model the above operations. Instead these tests rely on knowing +; how fixed length operation are lowered to scalable ones, with multiple blocks +; ensuring insert/extract sequences are not folded away. + +target triple = "aarch64-unknown-linux-gnu" + +; Don't use SVE when its registers are no bigger than NEON. +; NO_SVE-NOT: ptrue + +define void @subvector_v8i32(<8 x i32> *%in, <8 x i32>* %out) #0 { +; CHECK-LABEL: subvector_v8i32: +; CHECK: ptrue [[PG:p[0-9]+]].s, vl8 +; CHECK: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0] +; CHECK: st1w { [[DATA]] }, [[PG]], [x1] +; CHECK: ret + %a = load <8 x i32>, <8 x i32>* %in + br label %bb1 + +bb1: + store <8 x i32> %a, <8 x i32>* %out + ret void +} + +define void @subvector_v16i32(<16 x i32> *%in, <16 x i32>* %out) #0 { +; CHECK-LABEL: subvector_v16i32: +; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16 +; VBITS_GE_512: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0] +; VBITS_GE_512: st1w { [[DATA]] }, [[PG]], [x1] +; CHECKT: ret + %a = load <16 x i32>, <16 x i32>* %in + br label %bb1 + +bb1: + store <16 x i32> %a, <16 x i32>* %out + ret void +} + +define void @subvector_v32i32(<32 x i32> *%in, <32 x i32>* %out) #0 { +; CHECK-LABEL: subvector_v32i32: +; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32 +; VBITS_GE_1024: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0] +; VBITS_GE_1024: st1w { [[DATA]] }, [[PG]], [x1] +; CHECK: ret + %a = load <32 x i32>, <32 x i32>* %in + br label %bb1 + +bb1: + store <32 x i32> %a, <32 x i32>* %out + ret void +} + +define void @subvector_v64i32(<64 x i32> *%in, 
<64 x i32>* %out) #0 { +; CHECK-LABEL: subvector_v64i32: +; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl64 +; VBITS_GE_2048: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0] +; VBITS_GE_2048: st1w { [[DATA]] }, [[PG]], [x1] +; CHECK: ret + %a = load <64 x i32>, <64 x i32>* %in + br label %bb1 + +bb1: + store <64 x i32> %a, <64 x i32>* %out + ret void +} + +attributes #0 = { "target-features"="+sve" } From llvm-commits at lists.llvm.org Wed Jul 8 02:51:43 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:51:43 +0000 (UTC) Subject: [PATCH] D82871: [SVE] Custom ISel for fixed length extract/insert_subvector. In-Reply-To: References: Message-ID: <1fab0c714acddee991354c8b6851a2b0@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGfb75451775f8: [SVE] Custom ISel for fixed length extract/insert_subvector. (authored by paulwalker-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82871/new/ https://reviews.llvm.org/D82871 Files: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82871.276355.patch Type: text/x-patch Size: 15990 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 02:57:26 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 09:57:26 +0000 (UTC) Subject: [PATCH] D83377: [ARM] Expand distributing increments to also handle existing pre/post inc instructions. Message-ID: dmgreen created this revision. dmgreen added reviewers: samparker, SjoerdMeijer, efriedma, simon_tatham, ostannard. Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Herald added a project: LLVM. 
This extends the distributing postinc code in the load/store optimizer to also handle the case where there is an existing pre/post inc instruction, where subsequent instructions can be modified to use the adjusted offset from the increment. This can save us having to keep the old register live past the increment instruction.


https://reviews.llvm.org/D83377

Files:
  llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
  llvm/test/CodeGen/Thumb2/mve-postinc-distribute.mir
  llvm/test/CodeGen/Thumb2/mve-vst2.ll
  llvm/test/CodeGen/Thumb2/mve-vst3.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83377.276336.patch
Type: text/x-patch
Size: 30498 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org  Wed Jul  8 03:05:14 2020
From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 10:05:14 +0000 (UTC)
Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default
In-Reply-To:
References:
Message-ID: <7a93218433478bd237b75f2d17259d7c@localhost.localdomain>

dmgreen added a comment.

Ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82998/new/

https://reviews.llvm.org/D82998



From llvm-commits at lists.llvm.org  Wed Jul  8 03:05:43 2020
From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 10:05:43 +0000 (UTC)
Subject: [PATCH] D82676: [CGP] Prevent optimizePhiType from iterating forever
In-Reply-To:
References:
Message-ID:

dmgreen added a comment.

^ Thanks for the help anyway!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82676/new/

https://reviews.llvm.org/D82676



From llvm-commits at lists.llvm.org  Wed Jul  8 03:15:13 2020
From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 10:15:13 +0000 (UTC)
Subject: [PATCH] D82871: [SVE] Custom ISel for fixed length extract/insert_subvector.
In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGfb75451775f8: [SVE] Custom ISel for fixed length extract/insert_subvector. (authored by paulwalker-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82871/new/ https://reviews.llvm.org/D82871 Files: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82871.275703.patch Type: text/x-patch Size: 15990 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:18:00 2020 From: llvm-commits at lists.llvm.org (Alex Richardson via llvm-commits) Date: Wed, 08 Jul 2020 03:18:00 -0700 (PDT) Subject: [llvm] aae4134 - [UpdateTestChecks] Move more update_test_checks.py logic to common.py Message-ID: <5f059d58.1c69fb81.6c5b8.8801@mx.google.com> Author: Alex Richardson Date: 2020-07-08T10:59:28+01:00 New Revision: aae413462fae16c481df31ff23b951c5df494a60 URL: https://github.com/llvm/llvm-project/commit/aae413462fae16c481df31ff23b951c5df494a60 DIFF: https://github.com/llvm/llvm-project/commit/aae413462fae16c481df31ff23b951c5df494a60.diff LOG: [UpdateTestChecks] Move more update_test_checks.py logic to common.py I intend to reuse this to add UTC_ARGS support for update_llc_test_checks.py and update_cc_test_checks.py in D78478. 
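As a rough illustration of the kind of per-test filtering this change moves into common.py, here is a minimal, self-contained sketch. It is not the code from the patch: `iter_tests_sketch` and its keyword parameters are hypothetical names chosen for this example, and the helper in the patch does more (it also re-parses UTC_ARGS from the first line and yields a richer per-test object). The sketch only shows the skip rules: expand the glob patterns, skip tests autogenerated by a different script unless an update is forced, and skip tests that are not autogenerated when only updates are requested.

```python
import glob

# Marker looked for on the first line of a test file (same text as in the patch).
UTC_ADVERT = 'NOTE: Assertions have been autogenerated by '


def iter_tests_sketch(patterns, script_name, update_only=False,
                      force_update=False):
    """Yield (path, lines) for each test file this script should regenerate.

    Hypothetical helper for illustration only; not the function in common.py.
    """
    for pattern in patterns:
        # Expand the pattern ourselves; some shells (e.g. on Windows) do not.
        paths = glob.glob(pattern)
        if not paths:
            print("warning: test file pattern %r was not found" % (pattern,))
            continue
        for path in paths:
            with open(path) as f:
                lines = [line.rstrip() for line in f]
            first_line = lines[0] if lines else ''
            if UTC_ADVERT in first_line:
                # Autogenerated by some script: only touch it if it was ours,
                # unless the caller explicitly forces the update.
                if script_name not in first_line and not force_update:
                    continue
            elif update_only:
                # Not autogenerated at all, and only updates were requested.
                continue
            yield path, lines
```

A caller would loop over the generator once per script run, for example `for path, lines in iter_tests_sketch(['*.ll'], 'utils/update_test_checks.py'): ...`, and keep its tool-specific CHECK-line generation inside the loop body.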
Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D78618 Added: Modified: llvm/utils/UpdateTestChecks/common.py llvm/utils/update_test_checks.py Removed: ################################################################################ diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py index 9abb8d790c5f..7b4a9b3e6d16 100644 --- a/llvm/utils/UpdateTestChecks/common.py +++ b/llvm/utils/UpdateTestChecks/common.py @@ -1,9 +1,10 @@ from __future__ import print_function + +import copy +import glob import re -import string import subprocess import sys -import copy if sys.version_info[0] > 2: class string: @@ -21,11 +22,85 @@ def parse_commandline_args(parser): help='Show verbose output') parser.add_argument('-u', '--update-only', action='store_true', help='Only update test if it was already autogened') + parser.add_argument('--force-update', action='store_true', + help='Update test even if it was autogened by a different script') + parser.add_argument('--enable', action='store_true', dest='enabled', default=True, + help='Activate CHECK line generation from this point forward') + parser.add_argument('--disable', action='store_false', dest='enabled', + help='Deactivate CHECK line generation from this point forward') args = parser.parse_args() global _verbose _verbose = args.verbose return args + +class InputLineInfo(object): + def __init__(self, line, line_number, args, argv): + self.line = line + self.line_number = line_number + self.args = args + self.argv = argv + + +class TestInfo(object): + def __init__(self, test, parser, script_name, input_lines, args, argv, + comment_prefix): + self.parser = parser + self.path = test + self.args = args + self.argv = argv + self.input_lines = input_lines + self.run_lines = find_run_lines(test, self.input_lines) + self.comment_prefix = comment_prefix + if self.comment_prefix is None: + if self.path.endswith('.mir'): + self.comment_prefix = '#' + else: + self.comment_prefix
= ';' + self.autogenerated_note_prefix = self.comment_prefix + ' ' + UTC_ADVERT + self.test_autogenerated_note = self.autogenerated_note_prefix + script_name + self.test_autogenerated_note += get_autogennote_suffix(parser, self.args) + + def iterlines(self, output_lines): + output_lines.append(self.test_autogenerated_note) + for line_num, input_line in enumerate(self.input_lines): + # Discard any previous script advertising. + if input_line.startswith(self.autogenerated_note_prefix): + continue + self.args, self.argv = check_for_command(input_line, self.parser, + self.args, self.argv) + if not self.args.enabled: + output_lines.append(input_line) + continue + yield InputLineInfo(input_line, line_num, self.args, self.argv) + + +def itertests(test_patterns, parser, script_name, comment_prefix=None): + for pattern in test_patterns: + # On Windows we must expand the patterns ourselves. + tests_list = glob.glob(pattern) + if not tests_list: + warn("Test file pattern '%s' was not found. Ignoring it." % (pattern,)) + continue + for test in tests_list: + with open(test) as f: + input_lines = [l.rstrip() for l in f] + args = parser.parse_args() + argv = sys.argv[:] + first_line = input_lines[0] if input_lines else "" + if UTC_ADVERT in first_line: + if script_name not in first_line and not args.force_update: + warn("Skipping test which wasn't autogenerated by " + script_name, test) + continue + args, argv = check_for_command(first_line, parser, args, argv) + elif args.update_only: + assert UTC_ADVERT not in first_line + warn("Skipping test which isn't autogenerated: " + test) + continue + yield TestInfo(test, parser, script_name, input_lines, args, argv, + comment_prefix) + + def should_add_line_to_output(input_line, prefix_set): # Skip any blank comment lines in the IR. 
if input_line.strip() == ';': @@ -57,7 +132,6 @@ def invoke_tool(exe, cmd_args, ir): return stdout.replace('\r\n', '\n') ##### LLVM IR parser - RUN_LINE_RE = re.compile(r'^\s*(?://|[;#])\s*RUN:\s*(.*)$') CHECK_PREFIX_RE = re.compile(r'--?check-prefix(?:es)?[= ](\S+)') PREFIX_RE = re.compile('^[a-zA-Z0-9_-]+$') @@ -65,6 +139,7 @@ def invoke_tool(exe, cmd_args, ir): UTC_ARGS_KEY = 'UTC_ARGS:' UTC_ARGS_CMD = re.compile(r'.*' + UTC_ARGS_KEY + '\s*(?P.*)\s*$') +UTC_ADVERT = 'NOTE: Assertions have been autogenerated by ' OPT_FUNCTION_RE = re.compile( r'^\s*define\s+(?:internal\s+)?[^@]*@(?P[\w.-]+?)\s*' diff --git a/llvm/utils/update_test_checks.py b/llvm/utils/update_test_checks.py index 014b55c9ae83..be15ae9e2685 100755 --- a/llvm/utils/update_test_checks.py +++ b/llvm/utils/update_test_checks.py @@ -32,18 +32,12 @@ from __future__ import print_function import argparse -import glob -import itertools -import os # Used to advertise this file's name ("autogenerated_note"). -import string -import subprocess -import sys -import tempfile +import os # Used to advertise this file's name ("autogenerated_note"). 
import re +import sys from UpdateTestChecks import common -ADVERT = '; NOTE: Assertions have been autogenerated by ' def main(): from argparse import RawTextHelpFormatter @@ -58,58 +52,26 @@ def main(): help='Keep function signature information around for the check line') parser.add_argument('--scrub-attributes', action='store_true', help='Remove attribute annotations (#0) from the end of check line') - parser.add_argument('--enable', action='store_true', dest='enabled', default=True, - help='Activate CHECK line generation from this point forward') - parser.add_argument('--disable', action='store_false', dest='enabled', - help='Deactivate CHECK line generation from this point forward') parser.add_argument('tests', nargs='+') - args = common.parse_commandline_args(parser) + initial_args = common.parse_commandline_args(parser) script_name = os.path.basename(__file__) - autogenerated_note = (ADVERT + 'utils/' + script_name) - - opt_basename = os.path.basename(args.opt_binary) + opt_basename = os.path.basename(initial_args.opt_binary) if not re.match(r'^opt(-\d+)?$', opt_basename): common.error('Unexpected opt name: ' + opt_basename) sys.exit(1) opt_basename = 'opt' - for test in args.tests: - if not glob.glob(test): - common.warn("Test file pattern '%s' was not found. Ignoring it." % (test,)) - continue - - # On Windows we must expand the patterns ourselves. 
- test_paths = [test for pattern in args.tests for test in glob.glob(pattern)] - for test in test_paths: - argv = sys.argv[:] - args = parser.parse_args() - with open(test) as f: - input_lines = [l.rstrip() for l in f] - - first_line = input_lines[0] if input_lines else "" - if 'autogenerated' in first_line and script_name not in first_line: - common.warn("Skipping test which wasn't autogenerated by " + script_name, test) - continue - if first_line and 'autogenerated' in first_line: - args, argv = common.check_for_command(first_line, parser, args, argv) - test_autogenerated_note = autogenerated_note + common.get_autogennote_suffix(parser, args) - - if args.update_only: - if not first_line or 'autogenerated' not in first_line: - common.warn("Skipping test which isn't autogenerated: " + test) - continue - - run_lines = common.find_run_lines(test, input_lines) - + for ti in common.itertests(initial_args.tests, parser, + script_name='utils/' + script_name): # If requested we scrub trailing attribute annotations, e.g., '#0', together with whitespaces - if args.scrub_attributes: + if ti.args.scrub_attributes: common.SCRUB_TRAILING_WHITESPACE_TEST_RE = common.SCRUB_TRAILING_WHITESPACE_AND_ATTRIBUTES_RE else: common.SCRUB_TRAILING_WHITESPACE_TEST_RE = common.SCRUB_TRAILING_WHITESPACE_RE prefix_list = [] - for l in run_lines: + for l in ti.run_lines: if '|' not in l: common.warn('Skipping unparseable RUN line: ' + l) continue @@ -127,8 +89,9 @@ def main(): tool_cmd_args = tool_cmd[len(opt_basename):].strip() tool_cmd_args = tool_cmd_args.replace('< %s', '').replace('%s', '').strip() - check_prefixes = [item for m in common.CHECK_PREFIX_RE.finditer(filecheck_cmd) - for item in m.group(1).split(',')] + check_prefixes = [item for m in + common.CHECK_PREFIX_RE.finditer(filecheck_cmd) + for item in m.group(1).split(',')] if not check_prefixes: check_prefixes = ['CHECK'] @@ -144,28 +107,20 @@ def main(): common.debug('Extracted opt cmd: ' + opt_basename + ' ' + opt_args) 
common.debug('Extracted FileCheck prefixes: ' + str(prefixes)) - raw_tool_output = common.invoke_tool(args.opt_binary, opt_args, test) + raw_tool_output = common.invoke_tool(ti.args.opt_binary, opt_args, ti.path) common.build_function_body_dictionary( common.OPT_FUNCTION_RE, common.scrub_body, [], - raw_tool_output, prefixes, func_dict, args.verbose, - args.function_signature) + raw_tool_output, prefixes, func_dict, ti.args.verbose, + ti.args.function_signature) is_in_function = False is_in_function_start = False prefix_set = set([prefix for prefixes, _ in prefix_list for prefix in prefixes]) common.debug('Rewriting FileCheck prefixes:', str(prefix_set)) output_lines = [] - output_lines.append(test_autogenerated_note) - - for input_line in input_lines: - # Discard any previous script advertising. - if input_line.startswith(ADVERT): - continue - - args, argv = common.check_for_command(input_line, parser, args, argv) - if not args.enabled: - output_lines.append(input_line) - continue + for input_line_info in ti.iterlines(output_lines): + input_line = input_line_info.line + args = input_line_info.args if is_in_function_start: if input_line == '': continue @@ -204,9 +159,9 @@ def main(): continue is_in_function = is_in_function_start = True - common.debug('Writing %d lines to %s...' % (len(output_lines), test)) + common.debug('Writing %d lines to %s...' 
% (len(output_lines), ti.path)) - with open(test, 'wb') as f: + with open(ti.path, 'wb') as f: f.writelines(['{}\n'.format(l).encode('utf-8') for l in output_lines]) From llvm-commits at lists.llvm.org Wed Jul 8 03:18:03 2020 From: llvm-commits at lists.llvm.org (Alex Richardson via llvm-commits) Date: Wed, 08 Jul 2020 03:18:03 -0700 (PDT) Subject: [llvm] a80afc0 - [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py Message-ID: <5f059d5b.1c69fb81.618ec.ebce@mx.google.com> Author: Alex Richardson Date: 2020-07-08T11:00:10+01:00 New Revision: a80afc032859ebe65af283f76b38a0f5921b683f URL: https://github.com/llvm/llvm-project/commit/a80afc032859ebe65af283f76b38a0f5921b683f DIFF: https://github.com/llvm/llvm-project/commit/a80afc032859ebe65af283f76b38a0f5921b683f.diff LOG: [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py https://reviews.llvm.org/D69701 added support for on-the-fly argument changes for update scripts. I recently wanted to keep some manual check lines in a test generated by update_cc_test_checks.py in our CHERI fork, so this commit adds support for UTC_ARGS in update_cc_test_checks.py. And since I was refactoring the code to be in common.py, I also added it for update_llc_test_checks.py. 
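For readers skimming the archive, the on-the-fly argument handling described in the log above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that landed in common.py: the helper names make_parser, apply_utc_args and lines_with_checks_enabled are hypothetical, and only the --enable/--disable pair and --function-signature are modelled.

```python
import argparse
import re
import shlex

# Same marker the update scripts look for in test files.
UTC_ARGS_KEY = 'UTC_ARGS:'
# Hypothetical pattern for this sketch; the real one lives in common.py.
UTC_ARGS_RE = re.compile(r'.*' + re.escape(UTC_ARGS_KEY) + r'\s*(?P<cmd>.*?)\s*$')


def make_parser():
    # Hypothetical parser modelling only a few of the script options.
    parser = argparse.ArgumentParser()
    parser.add_argument('--enable', action='store_true', dest='enabled',
                        default=True)
    parser.add_argument('--disable', action='store_false', dest='enabled')
    parser.add_argument('--function-signature', action='store_true')
    return parser


def apply_utc_args(line, parser, argv):
    """Fold any UTC_ARGS found on `line` into argv and re-parse."""
    m = UTC_ARGS_RE.match(line)
    if m:
        # Later flags win, so appending is enough to toggle a setting.
        argv = argv + shlex.split(m.group('cmd'))
    return parser.parse_args(argv), argv


def lines_with_checks_enabled(lines):
    """Return the lines for which CHECK generation would be active."""
    parser = make_parser()
    argv = []
    enabled_lines = []
    for line in lines:
        args, argv = apply_utc_args(line, parser, argv)
        if args.enabled:
            enabled_lines.append(line)
    return enabled_lines
```

Because argparse lets the last occurrence of a flag win, re-parsing the accumulated argument list at every UTC_ARGS comment is enough to switch generation off for a manually written function and back on afterwards, which is the behaviour the on_the_fly_arg_change tests exercise.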
Reviewed By: jdoerfert, MaskRay Differential Revision: https://reviews.llvm.org/D78478 Added: clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c.expected clang/test/utils/update_cc_test_checks/on_the_fly_arg_change.test llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll.expected llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test Modified: clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected clang/test/utils/update_cc_test_checks/mangled_names.test llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected llvm/test/tools/UpdateTestChecks/update_llc_test_checks/basic.test llvm/utils/update_cc_test_checks.py llvm/utils/update_llc_test_checks.py Removed: ################################################################################ diff --git a/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected b/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected index 005b2f242747..e76cf074bdb7 100644 --- a/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected +++ b/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected @@ -1,4 +1,4 @@ -// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature // Example input for update_cc_test_checks // RUN: %clang_cc1 -triple=x86_64-unknown-linux-gnu -emit-llvm -o - %s | FileCheck %s diff --git a/clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c b/clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c new file mode 100644 index 000000000000..8956e6b52a21 --- /dev/null +++ 
b/clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c @@ -0,0 +1,20 @@ +// RUN: %clang_cc1 -triple=x86_64-unknown-linux-gnu -emit-llvm -o - %s | FileCheck %s + +int checks_please() { + return 1; +} + +// UTC_ARGS: --disable + +int no_checks_please() { + // Manual CHECK line should be retained: + // CHECK: manual check line + return -1; +} + +// UTC_ARGS: --enable + + +int checks_again() { + return 2; +} diff --git a/clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c.expected b/clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c.expected new file mode 100644 index 000000000000..cb7846c7b3d5 --- /dev/null +++ b/clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c.expected @@ -0,0 +1,29 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// RUN: %clang_cc1 -triple=x86_64-unknown-linux-gnu -emit-llvm -o - %s | FileCheck %s + +// CHECK-LABEL: @checks_please( +// CHECK-NEXT: entry: +// CHECK-NEXT: ret i32 1 +// +int checks_please() { + return 1; +} + +// UTC_ARGS: --disable + +int no_checks_please() { + // Manual CHECK line should be retained: + // CHECK: manual check line + return -1; +} + +// UTC_ARGS: --enable + + +// CHECK-LABEL: @checks_again( +// CHECK-NEXT: entry: +// CHECK-NEXT: ret i32 2 +// +int checks_again() { + return 2; +} diff --git a/clang/test/utils/update_cc_test_checks/mangled_names.test b/clang/test/utils/update_cc_test_checks/mangled_names.test index 082ed74304f0..bc88c9b5a382 100644 --- a/clang/test/utils/update_cc_test_checks/mangled_names.test +++ b/clang/test/utils/update_cc_test_checks/mangled_names.test @@ -8,6 +8,11 @@ ## Also try the --function-signature flag # RUN: %update_cc_test_checks %t.c --function-signature # RUN: diff -u %t.c %S/Inputs/mangled_names.c.funcsig.expected -## Verify that running without the --function-signature flag removes the -SAME: lines: +## Running it again should implicitly add the function-signature flag due to 
UTC_ARGS: # RUN: %update_cc_test_checks %t.c -# RUN: diff -u %t.c %S/Inputs/mangled_names.c.expected +# RUN: diff -u %t.c %S/Inputs/mangled_names.c.funcsig.expected +## Verify that running without the --function-signature flag removes the -SAME: lines: +## We have to remove the UTC_ARGS comment first: +# RUN: grep -v UTC_ARGS %t.c > %t-no-args.c +# RUN: %update_cc_test_checks %t-no-args.c +# RUN: diff -u %t-no-args.c %S/Inputs/mangled_names.c.expected diff --git a/clang/test/utils/update_cc_test_checks/on_the_fly_arg_change.test b/clang/test/utils/update_cc_test_checks/on_the_fly_arg_change.test new file mode 100644 index 000000000000..629b01f9d066 --- /dev/null +++ b/clang/test/utils/update_cc_test_checks/on_the_fly_arg_change.test @@ -0,0 +1,6 @@ +# RUN: cp -f %S/Inputs/on_the_fly_arg_change.c %t.c +# RUN: %update_cc_test_checks %t.c +# RUN: diff -u %t.c %S/Inputs/on_the_fly_arg_change.c.expected +## Check that running the script again does not change the result: +# RUN: %update_cc_test_checks %t.c +# RUN: diff -u %t.c %S/Inputs/on_the_fly_arg_change.c.expected diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected index 96b06de79c93..3c49f489a353 100644 --- a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected +++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected @@ -1,4 +1,3 @@ -; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; Example input for update_llc_test_checks (taken from CodeGen/X86/iabs.ll) ; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s --check-prefix=X86 --check-prefix=X86-NO-CMOV ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+cmov | FileCheck %s --check-prefix=X86 --check-prefix=X86-CMOV diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll 
b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll new file mode 100644 index 000000000000..ed5d949bb092 --- /dev/null +++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll @@ -0,0 +1,22 @@ +; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s + +declare void @foo() + +define i64 @check_lines_1() { + ret i64 1 +} + +; UTC_ARGS: --disable + +define i64 @no_check_lines() { +; A check line that would not be auto-generated (should not be removed!). +; CHECK: manual check line + ret i64 2 +} + +; UTC_ARGS: --enable --no_x86_scrub_rip + +define i64 @check_lines_2() { + %result = call i64 @no_check_lines() + ret i64 %result +} diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll.expected b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll.expected new file mode 100644 index 000000000000..f1955c4af252 --- /dev/null +++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll.expected @@ -0,0 +1,32 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s + +declare void @foo() + +define i64 @check_lines_1() { +; CHECK-LABEL: check_lines_1: +; CHECK: # %bb.0: +; CHECK-NEXT: movl $1, %eax +; CHECK-NEXT: xorl %edx, %edx +; CHECK-NEXT: retl + ret i64 1 +} + +; UTC_ARGS: --disable + +define i64 @no_check_lines() { +; A check line that would not be auto-generated (should not be removed!). 
+; CHECK: manual check line + ret i64 2 +} + +; UTC_ARGS: --enable --no_x86_scrub_rip + +define i64 @check_lines_2() { +; CHECK-LABEL: check_lines_2: +; CHECK: # %bb.0: +; CHECK-NEXT: calll no_check_lines +; CHECK-NEXT: retl + %result = call i64 @no_check_lines() + ret i64 %result +} diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/basic.test b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/basic.test index 74c2b6cd70ed..36d0f5d9ff7e 100644 --- a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/basic.test +++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/basic.test @@ -2,14 +2,24 @@ ## Basic test checking that update_llc_test_checks.py can update a file with multiple check prefixes # RUN: cp -f %S/Inputs/basic.ll %t.ll && %update_llc_test_checks %t.ll -# RUN: diff -u %S/Inputs/basic.ll.expected %t.ll -## The flags --x86_scrub_rip and --extra_scrub should have any effect for this +# RUN: echo '; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py' > %t.expected.ll +# RUN: cat %S/Inputs/basic.ll.expected >> %t.expected.ll +# RUN: diff -u %t.expected.ll %t.ll + +## The flags --x86_scrub_rip and --extra_scrub should not have any effect on this ## test. Check the output is identical. 
-# RUN: cp -f %S/Inputs/basic.ll %t.ll && %update_llc_test_checks --extra_scrub %t.ll -# RUN: diff -u %S/Inputs/basic.ll.expected %t.ll -# RUN: cp -f %S/Inputs/basic.ll %t.ll && %update_llc_test_checks --x86_scrub_rip %t.ll -# RUN: diff -u %S/Inputs/basic.ll.expected %t.ll +# RUN: cp -f %S/Inputs/basic.ll %t.ll && %update_llc_test_checks --extra_scrub %t.ll +# RUN: echo '; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --extra_scrub' > %t.expected.ll +# RUN: cat %S/Inputs/basic.ll.expected >> %t.expected.ll +# RUN: diff -u %t.expected.ll %t.ll +# RUN: cp -f %S/Inputs/basic.ll %t.ll && %update_llc_test_checks --no_x86_scrub_rip %t.ll +# RUN: echo '; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_rip' > %t.expected.ll +# RUN: cat %S/Inputs/basic.ll.expected >> %t.expected.ll +# RUN: diff -u %t.expected.ll %t.ll + ## Finally, run the script on an already updated file and verify that all previous ## CHECK lines are removed. 
# RUN: %update_llc_test_checks %t.ll -# RUN: diff -u %S/Inputs/basic.ll.expected %t.ll +# RUN: echo '; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_rip' > %t.expected.ll +# RUN: cat %S/Inputs/basic.ll.expected >> %t.expected.ll +# RUN: diff -u %t.expected.ll %t.ll diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test new file mode 100644 index 000000000000..ec85fd7cb7ef --- /dev/null +++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test @@ -0,0 +1,6 @@ +# RUN: cp -f %S/Inputs/on_the_fly_arg_change.ll %t.ll +# RUN: %update_llc_test_checks %t.ll +# RUN: diff -u %t.ll %S/Inputs/on_the_fly_arg_change.ll.expected +## Check that running the script again does not change the result: +# RUN: %update_llc_test_checks %t.ll +# RUN: diff -u %t.ll %S/Inputs/on_the_fly_arg_change.ll.expected diff --git a/llvm/utils/update_cc_test_checks.py b/llvm/utils/update_cc_test_checks.py index 7c69ff339cfb..5851f19e3d2d 100755 --- a/llvm/utils/update_cc_test_checks.py +++ b/llvm/utils/update_cc_test_checks.py @@ -19,16 +19,13 @@ import distutils.spawn import json import os +import re import shlex -import string import subprocess import sys -import re import tempfile -from UpdateTestChecks import asm, common - -ADVERT = '// NOTE: Assertions have been autogenerated by ' +from UpdateTestChecks import common SUBST = { '%clang': [], @@ -110,6 +107,11 @@ def parse_clang_ast_json(node): return ret +def str_to_commandline(value): + if not value: + return [] + return shlex.split(value) + def config(): parser = argparse.ArgumentParser( description=__doc__, @@ -117,7 +119,7 @@ def config(): parser.add_argument('--llvm-bin', help='llvm $prefix/bin path') parser.add_argument('--clang', help='"clang" executable, defaults to $llvm_bin/clang') - parser.add_argument('--clang-args', + 
parser.add_argument('--clang-args', default=[], type=str_to_commandline, help='Space-separated extra args to clang, e.g. --clang-args=-v') parser.add_argument('--opt', help='"opt" executable, defaults to $llvm_bin/opt') @@ -131,7 +133,6 @@ def config(): help='Keep function signature information around for the check line') parser.add_argument('tests', nargs='+') args = common.parse_commandline_args(parser) - args.clang_args = shlex.split(args.clang_args or '') if args.clang is None: if args.llvm_bin is None: @@ -165,7 +166,7 @@ def config(): # defer this error message until we find that opt is actually needed. args.opt = None - return args + return args, parser def get_function_body(args, filename, clang_args, extra_commands, prefixes, triple_in_cmd, func_dict): @@ -196,31 +197,15 @@ def get_function_body(args, filename, clang_args, extra_commands, prefixes, trip def main(): - args = config() + initial_args, parser = config() script_name = os.path.basename(__file__) - autogenerated_note = (ADVERT + 'utils/' + script_name) - - for filename in args.tests: - with open(filename) as f: - input_lines = [l.rstrip() for l in f] - - first_line = input_lines[0] if input_lines else "" - if 'autogenerated' in first_line and script_name not in first_line: - common.warn("Skipping test which wasn't autogenerated by " + script_name, filename) - continue - - if args.update_only: - if not first_line or 'autogenerated' not in first_line: - common.warn("Skipping test which isn't autogenerated: " + filename) - continue - - # Extract RUN lines. - run_lines = common.find_run_lines(filename, input_lines) + for ti in common.itertests(initial_args.tests, parser, 'utils/' + script_name, + comment_prefix='//'): # Build a list of clang command lines and check prefixes from RUN lines. 
run_list = [] line2spell_and_mangled_list = collections.defaultdict(list) - for l in run_lines: + for l in ti.run_lines: commands = [cmd.strip() for cmd in l.split('|')] triple_in_cmd = None @@ -234,7 +219,7 @@ def main(): print('WARNING: Skipping non-clang RUN line: ' + l, file=sys.stderr) continue clang_args[0:1] = SUBST[clang_args[0]] - clang_args = [filename if i == '%s' else i for i in clang_args] + args.clang_args + clang_args = [ti.path if i == '%s' else i for i in clang_args] + ti.args.clang_args # Permit piping the output through opt if not (len(commands) == 2 or @@ -253,18 +238,6 @@ def main(): check_prefixes = ['CHECK'] run_list.append((check_prefixes, clang_args, commands[1:-1], triple_in_cmd)) - # Strip CHECK lines which are in `prefix_set`, update test file. - prefix_set = set([prefix for p in run_list for prefix in p[0]]) - input_lines = [] - with open(filename, 'r+') as f: - for line in f: - m = common.CHECK_RE.match(line) - if not (m and m.group(1) in prefix_set) and line != '//\n': - input_lines.append(line) - f.seek(0) - f.writelines(input_lines) - f.truncate() - # Execute clang, generate LLVM IR, and extract functions. func_dict = {} for p in run_list: @@ -275,18 +248,23 @@ def main(): common.debug('Extracted clang cmd: clang {}'.format(clang_args)) common.debug('Extracted FileCheck prefixes: {}'.format(prefixes)) - get_function_body(args, filename, clang_args, extra_commands, prefixes, triple_in_cmd, func_dict) + get_function_body(ti.args, ti.path, clang_args, extra_commands, prefixes, triple_in_cmd, func_dict) # Invoke clang -Xclang -ast-dump=json to get mapping from start lines to # mangled names. Forward all clang args for now. - for k, v in get_line2spell_and_mangled(args, clang_args).items(): + for k, v in get_line2spell_and_mangled(ti.args, clang_args).items(): line2spell_and_mangled_list[k].append(v) - output_lines = [autogenerated_note] - for idx, line in enumerate(input_lines): - # Discard any previous script advertising. 
- if line.startswith(ADVERT): - continue + prefix_set = set([prefix for p in run_list for prefix in p[0]]) + output_lines = [] + for line_info in ti.iterlines(output_lines): + idx = line_info.line_number + line = line_info.line + args = line_info.args + include_line = True + m = common.CHECK_RE.match(line) + if m and m.group(1) in prefix_set: + continue # Don't append the existing CHECK lines if idx in line2spell_and_mangled_list: added = set() for spell, mangled in line2spell_and_mangled_list[idx]: @@ -298,16 +276,25 @@ def main(): if mangled in added or spell not in line: continue if args.functions is None or any(re.search(regex, spell) for regex in args.functions): + last_line = output_lines[-1].strip() + while last_line == '//': + # Remove the comment line since we will generate a new comment + # line as part of common.add_ir_checks() + output_lines.pop() + last_line = output_lines[-1].strip() if added: output_lines.append('//') added.add(mangled) common.add_ir_checks(output_lines, '//', run_list, func_dict, mangled, False, args.function_signature) - output_lines.append(line.rstrip('\n')) + if line.rstrip('\n') == '//': + include_line = False + if include_line: + output_lines.append(line.rstrip('\n')) - common.debug('Writing %d lines to %s...' % (len(output_lines), filename)) - with open(filename, 'wb') as f: + common.debug('Writing %d lines to %s...' % (len(output_lines), ti.path)) + with open(ti.path, 'wb') as f: f.writelines(['{}\n'.format(l).encode('utf-8') for l in output_lines]) return 0 diff --git a/llvm/utils/update_llc_test_checks.py b/llvm/utils/update_llc_test_checks.py index d873c60187b7..05fca340760e 100755 --- a/llvm/utils/update_llc_test_checks.py +++ b/llvm/utils/update_llc_test_checks.py @@ -10,19 +10,13 @@ from __future__ import print_function import argparse -import glob -import os # Used to advertise this file's name ("autogenerated_note"). 
-import string -import subprocess -import sys -import re +import os # Used to advertise this file's name ("autogenerated_note"). from UpdateTestChecks import asm, common -ADVERT = ' NOTE: Assertions have been autogenerated by ' # llc is the only llc-like in the LLVM tree but downstream forks can add # additional ones here if they have them. -LLC_LIKE_TOOLS = ('llc',) +LLC_LIKE_TOOLS = ('llc',) def main(): parser = argparse.ArgumentParser(description=__doc__) @@ -42,35 +36,21 @@ def main(): '--no_x86_scrub_mem_shuffle', action='store_true', default=False, help='Reduce scrubbing shuffles with memory operands') parser.add_argument('tests', nargs='+') - args = common.parse_commandline_args(parser) + initial_args = common.parse_commandline_args(parser) script_name = os.path.basename(__file__) - test_paths = [test for pattern in args.tests for test in glob.glob(pattern)] - for test in test_paths: - with open(test) as f: - input_lines = [l.rstrip() for l in f] - - first_line = input_lines[0] if input_lines else "" - if 'autogenerated' in first_line and script_name not in first_line: - common.warn("Skipping test which wasn't autogenerated by " + script_name, test) - continue - - if args.update_only: - if not first_line or 'autogenerated' not in first_line: - common.warn("Skipping test which isn't autogenerated: " + test) - continue - + for ti in common.itertests(initial_args.tests, parser, + script_name='utils/' + script_name): triple_in_ir = None - for l in input_lines: + for l in ti.input_lines: m = common.TRIPLE_IR_RE.match(l) if m: triple_in_ir = m.groups()[0] break - run_lines = common.find_run_lines(test, input_lines) run_list = [] - for l in run_lines: + for l in ti.run_lines: if '|' not in l: common.warn('Skipping unparseable RUN line: ' + l) continue @@ -103,7 +83,7 @@ def main(): llc_cmd_args = llc_cmd[len(llc_tool):].strip() llc_cmd_args = llc_cmd_args.replace('< %s', '').replace('%s', '').strip() - if test.endswith('.mir'): + if ti.path.endswith('.mir'): 
llc_cmd_args += ' -x mir' check_prefixes = [item for m in common.CHECK_PREFIX_RE.finditer(filecheck_cmd) for item in m.group(1).split(',')] @@ -114,13 +94,10 @@ def main(): # now, we just ignore all but the last. run_list.append((check_prefixes, llc_cmd_args, triple_in_cmd, march_in_cmd)) - if test.endswith('.mir'): - comment_sym = '#' + if ti.path.endswith('.mir'): check_indent = ' ' else: - comment_sym = ';' check_indent = '' - autogenerated_note = (comment_sym + ADVERT + 'utils/' + script_name) func_dict = {} for p in run_list: @@ -131,13 +108,12 @@ def main(): common.debug('Extracted LLC cmd:', llc_tool, llc_args) common.debug('Extracted FileCheck prefixes:', str(prefixes)) - raw_tool_output = common.invoke_tool(args.llc_binary or llc_tool, - llc_args, test) + raw_tool_output = common.invoke_tool(ti.args.llc_binary or llc_tool, llc_args, ti.path) triple = triple_in_cmd or triple_in_ir if not triple: triple = asm.get_triple_from_march(march_in_cmd) - asm.build_function_body_dictionary_for_triple(args, raw_tool_output, + asm.build_function_body_dictionary_for_triple(ti.args, raw_tool_output, triple, prefixes, func_dict) is_in_function = False @@ -146,9 +122,9 @@ def main(): prefix_set = set([prefix for p in run_list for prefix in p[0]]) common.debug('Rewriting FileCheck prefixes:', str(prefix_set)) output_lines = [] - output_lines.append(autogenerated_note) - - for input_line in input_lines: + for input_info in ti.iterlines(output_lines): + input_line = input_info.line + args = input_info.args if is_in_function_start: if input_line == '': continue @@ -172,10 +148,6 @@ def main(): is_in_function = False continue - # Discard any previous script advertising. - if input_line.startswith(comment_sym + ADVERT): - continue - # If it's outside a function, it just gets copied to the output. output_lines.append(input_line) @@ -188,9 +160,9 @@ def main(): continue is_in_function = is_in_function_start = True - common.debug('Writing %d lines to %s...' 
% (len(output_lines), test)) + common.debug('Writing %d lines to %s...' % (len(output_lines), ti.path)) - with open(test, 'wb') as f: + with open(ti.path, 'wb') as f: f.writelines(['{}\n'.format(l).encode('utf-8') for l in output_lines]) From llvm-commits at lists.llvm.org Wed Jul 8 03:18:09 2020 From: llvm-commits at lists.llvm.org (Alexander Richardson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:18:09 +0000 (UTC) Subject: [PATCH] D78618: [UpdateTestChecks] Move more update_test_checks.py logic to common.py In-Reply-To: References: Message-ID: <9cc8942deb7833e7e610f5d87146fbf6@localhost.localdomain> This revision was automatically updated to reflect the committed changes. arichardson marked an inline comment as done. Closed by commit rGaae413462fae: [UpdateTestChecks] Move more update_test_checks.py logic to common.py (authored by arichardson). Changed prior to commit: https://reviews.llvm.org/D78618?vs=274426&id=276358#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78618/new/ https://reviews.llvm.org/D78618 Files: llvm/utils/UpdateTestChecks/common.py llvm/utils/update_test_checks.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D78618.276358.patch Type: text/x-patch Size: 10573 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:18:13 2020 From: llvm-commits at lists.llvm.org (Alexander Richardson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:18:13 +0000 (UTC) Subject: [PATCH] D78478: [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py In-Reply-To: References: Message-ID: <7d951ca1e1ee9c1dd1b821c0275ebaa2@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGa80afc032859: [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py (authored by arichardson). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78478/new/ https://reviews.llvm.org/D78478 Files: clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c.expected clang/test/utils/update_cc_test_checks/mangled_names.test clang/test/utils/update_cc_test_checks/on_the_fly_arg_change.test llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll.expected llvm/test/tools/UpdateTestChecks/update_llc_test_checks/basic.test llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test llvm/utils/update_cc_test_checks.py llvm/utils/update_llc_test_checks.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D78478.276359.patch Type: text/x-patch Size: 21233 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:19:02 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:19:02 +0000 (UTC) Subject: [PATCH] D77129: [Verifier] Verify matrix dimensions operands match vector size. In-Reply-To: References: Message-ID: <2487623762a874ef9c2a2b78a7daa3aa@localhost.localdomain> fhahn marked an inline comment as done. fhahn added inline comments. 
================ Comment at: llvm/lib/IR/Verifier.cpp:4805 + NumColumns = cast<ConstantInt>(Call.getArgOperand(4)); + TypeToCheck = cast<VectorType>(Call.getType()); + break; ---------------- SjoerdMeijer wrote: > Quick query on this and the semantics: > > declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 <OuterRows>, i32 <Inner>, i32 <OuterColumns>) > > do we expect the element types of vectors %A and %B to be same, and do we need to check this? Yes, the element types of all types must match currently, but I think it is neither checked in the verifier nor explicit in the LangRef. To generate code for llvm.aarch64.neon.udot & co, there probably needs to be a way to have different element type widths for result and source operands. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77129/new/ https://reviews.llvm.org/D77129 From llvm-commits at lists.llvm.org Wed Jul 8 03:26:23 2020 From: llvm-commits at lists.llvm.org (Oliver Stannard via llvm-commits) Date: Wed, 08 Jul 2020 03:26:23 -0700 (PDT) Subject: [llvm] a50c7eb - [Support] Fix signed/unsigned comparison warning Message-ID: <5f059f4f.1c69fb81.97b4.c92c@mx.google.com> Author: Oliver Stannard Date: 2020-07-08T11:26:10+01:00 New Revision: a50c7ebfd0f06982e8dc31020acae4d32e6d0e9f URL: https://github.com/llvm/llvm-project/commit/a50c7ebfd0f06982e8dc31020acae4d32e6d0e9f DIFF: https://github.com/llvm/llvm-project/commit/a50c7ebfd0f06982e8dc31020acae4d32e6d0e9f.diff LOG: [Support] Fix signed/unsigned comparison warning Added: Modified: llvm/lib/Support/FormattedStream.cpp Removed: ################################################################################ diff --git a/llvm/lib/Support/FormattedStream.cpp b/llvm/lib/Support/FormattedStream.cpp index 081b8bf2cc19..5716afc187e4 100644 --- a/llvm/lib/Support/FormattedStream.cpp +++ b/llvm/lib/Support/FormattedStream.cpp @@ -82,7 +82,7 @@ void formatted_raw_ostream::UpdatePosition(const char *Ptr, size_t Size) { // the display width until we see the rest of the
code point. Stash the // bytes we do have, so that we can reconstruct the whole code point later, // even if the buffer is being flushed. - if ((End - Ptr) < NumBytes) { + if ((unsigned)(End - Ptr) < NumBytes) { PartialUTF8Char = StringRef(Ptr, End - Ptr); return; } From llvm-commits at lists.llvm.org Wed Jul 8 03:26:45 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Wed, 08 Jul 2020 03:26:45 -0700 (PDT) Subject: [llvm] 75f9aa6 - [X86][AVX] Add SimplifyDemandedVectorEltsForTargetShuffle test for v32i8->v16i8 PSHUFB Message-ID: <5f059f65.1c69fb81.d2dd5.877a@mx.google.com> Author: Simon Pilgrim Date: 2020-07-08T11:26:33+01:00 New Revision: 75f9aa6ce0751064d89bb19c9767866d770adf84 URL: https://github.com/llvm/llvm-project/commit/75f9aa6ce0751064d89bb19c9767866d770adf84 DIFF: https://github.com/llvm/llvm-project/commit/75f9aa6ce0751064d89bb19c9767866d770adf84.diff LOG: [X86][AVX] Add SimplifyDemandedVectorEltsForTargetShuffle test for v32i8->v16i8 PSHUFB On SKX targets we end up loading a v16i8 PSHUFB mask from a v32i8 constant and scaling incorrectly indexes the demanded elts mask - we're missing a check that the constant pool is the same size as the loaded mask. Test case from D81791 post-commit review. 
Added: Modified: llvm/test/CodeGen/X86/vector-shuffle-avx512.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll b/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll index 112fd4beed99..b79746b1d67a 100644 --- a/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll +++ b/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll @@ -522,6 +522,69 @@ define <16 x float> @test_masked_permps_v16f32(<16 x float>* %vp, <16 x float> % ret <16 x float> %res } +define void @test_demandedelts_pshufb_v32i8_v16i8(<2 x i32>* %src, <8 x i32>* %dst) { +; SKX64-LABEL: test_demandedelts_pshufb_v32i8_v16i8: +; SKX64: # %bb.0: +; SKX64-NEXT: vmovdqa 32(%rdi), %xmm0 +; SKX64-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[12,13,14,15,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero +; SKX64-NEXT: vmovdqa %ymm0, 672(%rsi) +; SKX64-NEXT: vpermilps {{.*#+}} xmm0 = mem[1,0,2,3] +; SKX64-NEXT: vmovaps %ymm0, 832(%rsi) +; SKX64-NEXT: vzeroupper +; SKX64-NEXT: retq +; +; KNL64-LABEL: test_demandedelts_pshufb_v32i8_v16i8: +; KNL64: # %bb.0: +; KNL64-NEXT: vmovdqa 32(%rdi), %xmm0 +; KNL64-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[12,13,14,15,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero +; KNL64-NEXT: vmovdqa %ymm0, 672(%rsi) +; KNL64-NEXT: vmovdqa 208(%rdi), %xmm0 +; KNL64-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4,5,6,7,0,1,2,3],zero,zero,zero,zero,zero,zero,zero,zero +; KNL64-NEXT: vmovdqa %ymm0, 832(%rsi) +; KNL64-NEXT: retq +; +; SKX32-LABEL: test_demandedelts_pshufb_v32i8_v16i8: +; SKX32: # %bb.0: +; SKX32-NEXT: movl {{[0-9]+}}(%esp), %eax +; SKX32-NEXT: movl {{[0-9]+}}(%esp), %ecx +; SKX32-NEXT: vmovdqa 32(%ecx), %xmm0 +; SKX32-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[12,13,14,15,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero +; SKX32-NEXT: vmovdqa %ymm0, 672(%eax) +; SKX32-NEXT: vpermilps {{.*#+}} xmm0 = mem[1,0,2,3] +; SKX32-NEXT: vmovaps %ymm0, 832(%eax) +; SKX32-NEXT: vzeroupper +; SKX32-NEXT: retl +; +; KNL32-LABEL: 
test_demandedelts_pshufb_v32i8_v16i8: +; KNL32: # %bb.0: +; KNL32-NEXT: movl {{[0-9]+}}(%esp), %eax +; KNL32-NEXT: vmovdqa 32(%eax), %xmm0 +; KNL32-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[12,13,14,15,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero +; KNL32-NEXT: movl {{[0-9]+}}(%esp), %ecx +; KNL32-NEXT: vmovdqa %ymm0, 672(%ecx) +; KNL32-NEXT: vmovdqa 208(%eax), %xmm0 +; KNL32-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4,5,6,7,0,1,2,3],zero,zero,zero,zero,zero,zero,zero,zero +; KNL32-NEXT: vmovdqa %ymm0, 832(%ecx) +; KNL32-NEXT: retl + %t64 = bitcast <2 x i32>* %src to <16 x i32>* + %t87 = load <16 x i32>, <16 x i32>* %t64, align 64 + %t88 = extractelement <16 x i32> %t87, i64 11 + %t89 = insertelement <8 x i32> , i32 %t88, i64 0 + %t90 = insertelement <8 x i32> %t89, i32 %t88, i64 1 + %ptridx49.i = getelementptr inbounds <8 x i32>, <8 x i32>* %dst, i64 21 + store <8 x i32> %t90, <8 x i32>* %ptridx49.i, align 32 + %ptridx56.i = getelementptr inbounds <2 x i32>, <2 x i32>* %src, i64 24 + %t00 = bitcast <2 x i32>* %ptridx56.i to <16 x i32>* + %t09 = load <16 x i32>, <16 x i32>* %t00, align 64 + %t10 = extractelement <16 x i32> %t09, i64 5 + %t11 = insertelement <8 x i32> , i32 %t10, i64 0 + %t12 = extractelement <16 x i32> %t09, i64 4 + %t13 = insertelement <8 x i32> %t11, i32 %t12, i64 1 + %ptridx64.i = getelementptr inbounds <8 x i32>, <8 x i32>* %dst, i64 26 + store <8 x i32> %t13, <8 x i32>* %ptridx64.i, align 32 + ret void +} + %union1= type { <16 x float> } @src1 = external dso_local local_unnamed_addr global %union1, align 64 From llvm-commits at lists.llvm.org Wed Jul 8 03:27:38 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:27:38 +0000 (UTC) Subject: [PATCH] D77129: [Verifier] Verify matrix dimensions operands match vector size. In-Reply-To: References: Message-ID: SjoerdMeijer added inline comments. 
================
Comment at: llvm/lib/IR/Verifier.cpp:4805
+    NumColumns = cast(Call.getArgOperand(4));
+    TypeToCheck = cast(Call.getType());
+    break;
----------------
fhahn wrote:
> SjoerdMeijer wrote:
> > Quick query on this and the semantics:
> >
> >   declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 , i32 , i32 )
> >
> > do we expect the element types of vectors %A and %B to be same, and do we need to check this?
> Yes, the element types of all types must match currently, but I think it is neither checked in the verifier nor explicit in the LangRef.
>
> To generate code for llvm.aarch64.neon.udot & co, there probably needs to be a way to have different element type widths for result and source operands.

> Yes, the element types of all types must match currently, but I think it is neither checked in the verifier nor explicit in the LangRef.

I started looking at the matrix support, getting up to speed with it, and this is where I started and the first thing I noticed. Was just asking about that here as a sanity check. I wouldn't mind putting up a patch for that if that's helpful. Probably the least we can do for now is to check if we are not mixing integers and float types, and then we also need to add that to LangRef and be explicit about that.
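A rough sketch of the kind of check discussed above, written against toy types rather than LLVM's real `Type` hierarchy. `ElemKind`, `ToyVectorType` and `checkMultiplyElementTypes` are hypothetical names invented for illustration; the actual verifier code would look different and is not part of this thread.

```cpp
#include <string>

// Hypothetical, simplified description of a vector type. This is NOT
// LLVM's Type class; it only models what the check needs.
enum class ElemKind { Integer, Float };

struct ToyVectorType {
  ElemKind Kind;      // integer or floating-point elements
  unsigned BitWidth;  // width of one element in bits
  unsigned NumElts;   // number of elements (unused by this check)
};

// Returns an empty string when the element types are acceptable,
// otherwise a short diagnostic message.
std::string checkMultiplyElementTypes(const ToyVectorType &A,
                                      const ToyVectorType &B,
                                      const ToyVectorType &Result) {
  // The minimal check proposed in the thread: no integer/float mixing.
  if (A.Kind != B.Kind || A.Kind != Result.Kind)
    return "operands and result must not mix integer and float elements";
  // The stricter rule stated in the thread: all element types match.
  if (A.BitWidth != B.BitWidth || A.BitWidth != Result.BitWidth)
    return "element types of operands and result must match";
  return "";
}
```

Splitting the two conditions keeps the door open for relaxing the width rule later (for example for udot-style operations) while still rejecting integer/float mixes.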
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77129/new/ https://reviews.llvm.org/D77129 From llvm-commits at lists.llvm.org Wed Jul 8 03:33:00 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via llvm-commits) Date: Wed, 08 Jul 2020 03:33:00 -0700 (PDT) Subject: [llvm] 419c92a - [GlobalISel][InlineAsm] Fix matching input constraints to mem operand Message-ID: <5f05a0dc.1c69fb81.7c16d.e928@mx.google.com> Author: Petar Avramovic Date: 2020-07-08T12:32:17+02:00 New Revision: 419c92a749294a22a3deaa22719094ebd4e70568 URL: https://github.com/llvm/llvm-project/commit/419c92a749294a22a3deaa22719094ebd4e70568 DIFF: https://github.com/llvm/llvm-project/commit/419c92a749294a22a3deaa22719094ebd4e70568.diff LOG: [GlobalISel][InlineAsm] Fix matching input constraints to mem operand Mark matching input constraint to mem operand as not supported. Differential Revision: https://reviews.llvm.org/D83235 Added: Modified: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp b/llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp index 1950a4e8b763..241d5bace248 100644 --- a/llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ b/llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -406,6 +406,18 @@ bool InlineAsmLowering::lowerInlineAsm( InstFlagIdx += getNumOpRegs(*Inst, InstFlagIdx) + 1; assert(getNumOpRegs(*Inst, InstFlagIdx) == 1 && "Wrong flag"); + unsigned MatchedOperandFlag = Inst->getOperand(InstFlagIdx).getImm(); + if (InlineAsm::isMemKind(MatchedOperandFlag)) { + LLVM_DEBUG(dbgs() << "Matching input constraint to mem operand not " + "supported. 
This should be target specific.\n"); + return false; + } + if (!InlineAsm::isRegDefKind(MatchedOperandFlag) && + !InlineAsm::isRegDefEarlyClobberKind(MatchedOperandFlag)) { + LLVM_DEBUG(dbgs() << "Unknown matching constraint\n"); + return false; + } + // We want to tie input to register in next operand. unsigned DefRegIdx = InstFlagIdx + 1; Register Def = Inst->getOperand(DefRegIdx).getReg(); diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll index 8287ab716a80..cf596c98d462 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll @@ -244,6 +244,16 @@ define i8 @scalable_call(i8* %addr) #1 { ret i8 %res } +; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to translate instruction{{.*}}asm_indirect_output +; FALLBACK-WITH-REPORT-OUT-LABEL: asm_indirect_output +define void @asm_indirect_output() { +entry: + %ap = alloca i8*, align 8 + %0 = load i8*, i8** %ap, align 8 + call void asm sideeffect "", "=*r|m,0,~{memory}"(i8** %ap, i8* %0) + ret void +} + attributes #1 = { "target-features"="+sve" } declare @llvm.aarch64.sve.ptrue.nxv16i1(i32 %pattern) From llvm-commits at lists.llvm.org Wed Jul 8 03:33:01 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:33:01 +0000 (UTC) Subject: [PATCH] D83235: [GlobalISel][InlineAsm] Fix matching input constraints to mem operand In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG419c92a74929: [GlobalISel][InlineAsm] Fix matching input constraints to mem operand (authored by Petar.Avramovic). 
Changed prior to commit: https://reviews.llvm.org/D83235?vs=275730&id=276362#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83235/new/ https://reviews.llvm.org/D83235 Files: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll Index: llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll =================================================================== --- llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll +++ llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll @@ -244,6 +244,16 @@ ret i8 %res } +; FALLBACK-WITH-REPORT-ERR: remark: :0:0: unable to translate instruction{{.*}}asm_indirect_output +; FALLBACK-WITH-REPORT-OUT-LABEL: asm_indirect_output +define void @asm_indirect_output() { +entry: + %ap = alloca i8*, align 8 + %0 = load i8*, i8** %ap, align 8 + call void asm sideeffect "", "=*r|m,0,~{memory}"(i8** %ap, i8* %0) + ret void +} + attributes #1 = { "target-features"="+sve" } declare @llvm.aarch64.sve.ptrue.nxv16i1(i32 %pattern) Index: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp =================================================================== --- llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -406,6 +406,18 @@ InstFlagIdx += getNumOpRegs(*Inst, InstFlagIdx) + 1; assert(getNumOpRegs(*Inst, InstFlagIdx) == 1 && "Wrong flag"); + unsigned MatchedOperandFlag = Inst->getOperand(InstFlagIdx).getImm(); + if (InlineAsm::isMemKind(MatchedOperandFlag)) { + LLVM_DEBUG(dbgs() << "Matching input constraint to mem operand not " + "supported. This should be target specific.\n"); + return false; + } + if (!InlineAsm::isRegDefKind(MatchedOperandFlag) && + !InlineAsm::isRegDefEarlyClobberKind(MatchedOperandFlag)) { + LLVM_DEBUG(dbgs() << "Unknown matching constraint\n"); + return false; + } + // We want to tie input to register in next operand. 
unsigned DefRegIdx = InstFlagIdx + 1; Register Def = Inst->getOperand(DefRegIdx).getReg(); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83235.276362.patch Type: text/x-patch Size: 1907 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:34:40 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:34:40 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <5c2c53b4be36ad19a9bfac86040fecb7@localhost.localdomain> serge-sans-paille marked an inline comment as done. serge-sans-paille added inline comments. ================ Comment at: llvm/unittests/IR/LegacyPassManagerTest.cpp:683 Function *SF = splitSimpleFunction(*F); - CallInst::Create(F, "", &SF->getEntryBlock()); + CallInst::Create(F, "", &*SF->getEntryBlock().getFirstInsertionPt()); ASSERT_EQ(M->getFunctionList().size(), 5U); ---------------- foad wrote: > Is this change related to the rest of the patch somehow? Yes, without this change, the call is added at the end of the entryblock, i.e. after the terminator. It happens that hash computation triggers an assert (indirectly) on that (bad) situation. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Wed Jul 8 03:38:31 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:38:31 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <4a1b453734cc1c96fe016ba3d769a74a@localhost.localdomain> lebedev.ri updated this revision to Diff 276363. lebedev.ri marked 18 inline comments as done. lebedev.ri added a comment. Thanks for taking a look! Applied some refactoring, addressed some comments. 
I will split some of the changes into separate commits when actually landing this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 Files: llvm/test/Reduce/remove-args.ll llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-funcs.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/test/Reduce/remove-global-vars.ll llvm/test/Reduce/remove-metadata.ll llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83351.276363.patch Type: text/x-patch Size: 17928 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:38:35 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:38:35 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <5e3b0d42c20b3385291b3b19ac3ceb2c@localhost.localdomain> lebedev.ri added inline comments. ================ Comment at: llvm/tools/llvm-reduce/DeltaManager.h:36 reduceOperandBundesDeltaPass(Tester); + reduceAttributesDeltaPass(Tester); // TODO: Implement the remaining Delta Passes ---------------- nickdesaulniers wrote: > arsenm wrote: > > Doing this last is an improvement over bugpoint's attempt to do this first. > > > > I don't think removing attributes is actually a great reduction strategy. 
For most of the hard to reduce testcases I debug, removing attributes is entirely pointless (and adding them is more helpful). I think this needs a flag to disable it. > Counterpoint, I find removing attributes very helpful in reducing the amount of noise in reduced test cases and have had bugs when I needed to figure out which attribute was the source of differences in codegen. > > I don't mind a flag (I don't think it's necessary, but doesn't hurt); but I'd prefer it to be default on so you can opt-out if you don't want to reduce attributes. We currently have no such options. How about we deal with that afterwards, by just consistently adding one for each? ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:60-61 + void visitModule(Module &M) { + for_each(M.getGlobalList(), + [&](GlobalVariable &GV) { visitGlobalVariable(GV); }); + } ---------------- dblaikie wrote: > range-based-for loop, probably? Hm, can use either. Why not `for_each` ? ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:146-148 + for (const auto &I : zip(Res, AttributeSets)) { + std::pair &NewSet = std::get<0>(I); + const AttrPtrIdxVecVecTy &V = std::get<1>(I); ---------------- dblaikie wrote: > nickdesaulniers wrote: > > does `zip` actually simplify this sequence? Looks kind of complicated. > +1 to this. > ``` > std::vector> Res; > Res.reserve(AttributeSets.size()); > for (const auto &V : AttributeSets) > Res.push_back({V.first, convertAttributeRefToAttributeSet(C, V.second)}); > ``` > > Seems simpler. After some refactoring... ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:155 + }); + sort(Res, [](const std::pair &LHS, + const std::pair &RHS) { ---------------- MaskRay wrote: > If it is non-deterministic, use `stable_sort` It is. 
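For reference, a minimal standalone sketch of why `stable_sort` matters when the comparator looks at only part of each element. The `IndexedItem` alias and `sortByIndex` helper are hypothetical and unrelated to the actual types in ReduceAttributes.cpp; the point is only that elements with equal keys keep their original relative order.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical element: (index, payload). Only the index is compared.
using IndexedItem = std::pair<unsigned, char>;

// Sorts by index only. Because the comparator ignores the payload, items
// with the same index compare as equivalent; std::stable_sort guarantees
// they stay in their input order, which std::sort does not.
std::vector<IndexedItem> sortByIndex(std::vector<IndexedItem> Items) {
  std::stable_sort(Items.begin(), Items.end(),
                   [](const IndexedItem &LHS, const IndexedItem &RHS) {
                     return LHS.first < RHS.first;
                   });
  return Items;
}
```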
================
Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:169
+  LLVMContext &C = Program->getContext();
+  for_each(R.GlobalVariablesToRefine, [&C](const auto &I) {
+    I.first->setAttributes(convertAttributeRefToAttributeSet(C, I.second));
----------------
dblaikie wrote:
> why std::for_each rather than a range-based for loop?
Why a range-based for loop rather than std::for_each?
I'd prefer to keep these cases as for_each, because it really doesn't matter in which order we iterate here, while a range-based for loop is more for traditional direct forward iteration.


================
Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:177
+    I.first->setAttributes(convertAttributeRefVecToAttributeList(C, I.second));
+  });
+}
----------------
nickdesaulniers wrote:
> I wish all these members of `AttributeRemapper` were `private` and these three loops maybe hidden in a method of `AttributeRemapper`.
The delta pass consists of three stages:
1. just counting all the features
2. enumerating each feature (in a stable order) and recording whether or not it is to be kept
3. actually applying the rewrite from the previous step

Right now, I think each step is neatly separated into `AttributeCounter`, `AttributeRemapper` and `extractAttributesFromModule`.
I think sinking the implementation details of the last step into the middle step would convolute things.
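The three stages listed above can be sketched on a toy data model. Everything here (`ToyGlobal`, `ToyModule`, `countAttributes`, `planRewrite`, `applyRewrite`, and the inclusive 1-based "keep" range) is a hypothetical stand-in chosen for illustration. It is not the llvm-reduce API and not the code under review.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module: each global has a list of
// attribute names.
struct ToyGlobal {
  std::string Name;
  std::vector<std::string> Attrs;
};
using ToyModule = std::vector<ToyGlobal>;

// Stage 1: count every removable feature (here, every attribute).
std::size_t countAttributes(const ToyModule &M) {
  std::size_t N = 0;
  for (const ToyGlobal &G : M)
    N += G.Attrs.size();
  return N;
}

// For each global (by position), the attributes that survive.
using KeepPlan =
    std::vector<std::pair<std::size_t, std::vector<std::string>>>;

// Stage 2: visit features in a stable order, number them from 1, and
// record which ones fall inside the inclusive range [Begin, End].
// The module itself is not modified here.
KeepPlan planRewrite(const ToyModule &M, std::size_t Begin, std::size_t End) {
  KeepPlan Plan;
  std::size_t Index = 0;
  for (std::size_t GI = 0; GI < M.size(); ++GI) {
    std::vector<std::string> Kept;
    for (const std::string &A : M[GI].Attrs) {
      ++Index;
      if (Index >= Begin && Index <= End)
        Kept.push_back(A);
    }
    Plan.push_back({GI, std::move(Kept)});
  }
  return Plan;
}

// Stage 3: apply the plan recorded in stage 2.
void applyRewrite(ToyModule &M, const KeepPlan &Plan) {
  for (const auto &Entry : Plan)
    M[Entry.first].Attrs = Entry.second;
}
```

Keeping stage 2 free of mutation means the numbering seen while planning is exactly the numbering produced by counting, which is the separation argued for above.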
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 03:39:55 2020 From: llvm-commits at lists.llvm.org (Alex Richardson via llvm-commits) Date: Wed, 08 Jul 2020 03:39:55 -0700 (PDT) Subject: [llvm] 1be92dd - Add missing REQUIRES: x86-registered-target Message-ID: <5f05a27b.1c69fb81.64f59.7cdd@mx.google.com> Author: Alex Richardson Date: 2020-07-08T11:39:29+01:00 New Revision: 1be92dd207275e1ecf09dd38f44ead908fe4b8c9 URL: https://github.com/llvm/llvm-project/commit/1be92dd207275e1ecf09dd38f44ead908fe4b8c9 DIFF: https://github.com/llvm/llvm-project/commit/1be92dd207275e1ecf09dd38f44ead908fe4b8c9.diff LOG: Add missing REQUIRES: x86-registered-target This should fix build bot failures after a80afc032859ebe65af283f76b38a0f5921b683f Added: Modified: llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test Removed: ################################################################################ diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test index ec85fd7cb7ef..a53e03b8909a 100644 --- a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test +++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test @@ -1,3 +1,4 @@ +# REQUIRES: x86-registered-target # RUN: cp -f %S/Inputs/on_the_fly_arg_change.ll %t.ll # RUN: %update_llc_test_checks %t.ll # RUN: diff -u %t.ll %S/Inputs/on_the_fly_arg_change.ll.expected From llvm-commits at lists.llvm.org Wed Jul 8 03:43:42 2020 From: llvm-commits at lists.llvm.org (JunMa via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:43:42 +0000 (UTC) Subject: [PATCH] D83379: [Coroutines] Refactor sinkLifetimeStartMarkers Message-ID: junparser created this revision. 
junparser added reviewers: modocache, lxfind, lewissbaker.
junparser added a project: LLVM.
Herald added subscribers: llvm-commits, hiraditya.

D82314 implemented a method for reducing the coroutine frame in sinkLifetimeStartMarkers, which sinks the lifetime.start marker to multiple users. However, this may cause a mismatch. Consider this:

  coroutine A(){
    a;
    lifetime.start(a);
    if (b) {
      ... // do something
      co_await use(a);
    }
    use(a);
    lifetime.end(a);
  }

Since neither user of a dominates the other, we just sink lifetime.start for both of them. This may cause buildCoroutineFrame to keep a as a local variable. More importantly, it introduces a pattern in the resume function such as:

  lifetime.start(a);
  use(a);
  lifetime.start(a);
  use(a);
  lifetime.end(a);

which causes wrong optimization in later passes.

This patch rewrites sinkLifetimeStartMarkers so that it only sinks a lifetime.start marker when all of its users are used inside only one of the suspended regions.

TestPlan: cppcoro, check-llvm, https://godbolt.org/z/zVx_eB


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83379

Files:
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83379.276357.patch
Type: text/x-patch
Size: 12599 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 03:46:59 2020
From: llvm-commits at lists.llvm.org (Alexander Richardson via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 10:46:59 +0000 (UTC)
Subject: [PATCH] D78618: [UpdateTestChecks] Move more update_test_checks.py logic to common.py
In-Reply-To:
References:
Message-ID: <04ccdd69fdc156b7a1641268de1bf7b8@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rGaae413462fae: [UpdateTestChecks] Move more update_test_checks.py logic to common.py (authored by arichardson). Changed prior to commit: https://reviews.llvm.org/D78618?vs=274426&id=275704#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78618/new/ https://reviews.llvm.org/D78618 Files: llvm/utils/UpdateTestChecks/common.py llvm/utils/update_test_checks.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D78618.275704.patch Type: text/x-patch Size: 10573 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:47:10 2020 From: llvm-commits at lists.llvm.org (Alexander Richardson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:47:10 +0000 (UTC) Subject: [PATCH] D78478: [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py In-Reply-To: References: Message-ID: <25d5c74becc54713c4b420e4fc1ab59b@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGa80afc032859: [UpdateTestChecks] Add UTC_ARGS support for update_{llc,cc}_test_checks.py (authored by arichardson). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78478/new/ https://reviews.llvm.org/D78478 Files: clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c clang/test/utils/update_cc_test_checks/Inputs/on_the_fly_arg_change.c.expected clang/test/utils/update_cc_test_checks/mangled_names.test clang/test/utils/update_cc_test_checks/on_the_fly_arg_change.test llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/on_the_fly_arg_change.ll.expected llvm/test/tools/UpdateTestChecks/update_llc_test_checks/basic.test llvm/test/tools/UpdateTestChecks/update_llc_test_checks/on_the_fly_arg_change.test llvm/utils/update_cc_test_checks.py llvm/utils/update_llc_test_checks.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D78478.275705.patch Type: text/x-patch Size: 21233 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:48:02 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:48:02 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: foad added a comment. Looks good to me apart from the comments inline, but I'd like someone else to approve it too. As a follow-up can we have the same checking for Loop, Region and SCC passes? ================ Comment at: llvm/lib/IR/LegacyPassManager.cpp:1490 + +uint64_t FunctionHash(Function &F) { + StructuralHash H; ---------------- I don't like having both `functionHash` and `FunctionHash`. How about having `FunctionHash` take the initial hash value as an optional argument that defaults to StructuralHash() ? 
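One way to read the suggestion above, sketched with toy types: a single hashing entry point whose second parameter defaults to a fresh accumulator, so callers can either hash one function on its own or thread a running hash through several. `ToyHash`, `ToyFunction`, `hashFunction` and `hashModule` are hypothetical names; the `StructuralHash` class from the patch is not shown in this thread and may look quite different.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical accumulator standing in for the patch's StructuralHash.
class ToyHash {
  std::uint64_t Value = 0x6acaa36bef8325c5ULL; // arbitrary seed

public:
  // Mixes one value into the running hash (simple multiply-xor scheme).
  void update(std::uint64_t V) { Value = (Value * 31u) ^ V; }
  std::uint64_t get() const { return Value; }
};

// Hypothetical stand-in for a function body: a list of opcodes.
struct ToyFunction {
  std::vector<std::uint64_t> Opcodes;
};

// Single entry point. With no second argument it starts from a fresh
// accumulator; otherwise it continues from the hash passed in.
ToyHash hashFunction(const ToyFunction &F, ToyHash H = ToyHash()) {
  for (std::uint64_t Op : F.Opcodes)
    H.update(Op);
  return H;
}

// Chains the running hash through every function of a "module".
std::uint64_t hashModule(const std::vector<ToyFunction> &Fns) {
  ToyHash H;
  for (const ToyFunction &F : Fns)
    H = hashFunction(F, H);
  return H.get();
}
```

With the default argument there is no need for two functions whose names differ only in capitalization.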
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Wed Jul 8 03:48:45 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Wed, 08 Jul 2020 03:48:45 -0700 (PDT) Subject: [llvm] 9dc250d - [X86][AVX] SimplifyDemandedVectorEltsForTargetShuffle - ensure mask is same size as constant size Message-ID: <5f05a48d.1c69fb81.f9552.d2cd@mx.google.com> Author: Simon Pilgrim Date: 2020-07-08T11:47:59+01:00 New Revision: 9dc250db9db29b0264fbb1e59bde8efa86d90c9b URL: https://github.com/llvm/llvm-project/commit/9dc250db9db29b0264fbb1e59bde8efa86d90c9b DIFF: https://github.com/llvm/llvm-project/commit/9dc250db9db29b0264fbb1e59bde8efa86d90c9b.diff LOG: [X86][AVX] SimplifyDemandedVectorEltsForTargetShuffle - ensure mask is same size as constant size Fixes test regression reported on D81791 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-shuffle-avx512.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 5238014008be..2e7d3062430c 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36935,11 +36935,16 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetShuffle( return false; const Constant *C = getTargetConstantFromNode(Load); - if (!C || !C->getType()->isVectorTy()) + if (!C) + return false; + + Type *CTy = C->getType(); + if (!CTy->isVectorTy() || + CTy->getPrimitiveSizeInBits() != Mask.getValueSizeInBits()) return false; // Handle scaling for i64 elements on 32-bit targets. 
- unsigned NumCstElts = cast(C->getType())->getNumElements(); + unsigned NumCstElts = cast(CTy)->getNumElements(); if (NumCstElts != NumElts && NumCstElts != (NumElts * 2)) return false; unsigned Scale = NumCstElts / NumElts; diff --git a/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll b/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll index b79746b1d67a..1ab6f2cc45fc 100644 --- a/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll +++ b/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll @@ -528,8 +528,9 @@ define void @test_demandedelts_pshufb_v32i8_v16i8(<2 x i32>* %src, <8 x i32>* %d ; SKX64-NEXT: vmovdqa 32(%rdi), %xmm0 ; SKX64-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[12,13,14,15,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero ; SKX64-NEXT: vmovdqa %ymm0, 672(%rsi) -; SKX64-NEXT: vpermilps {{.*#+}} xmm0 = mem[1,0,2,3] -; SKX64-NEXT: vmovaps %ymm0, 832(%rsi) +; SKX64-NEXT: vmovdqa 208(%rdi), %xmm0 +; SKX64-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4,5,6,7,0,1,2,3],zero,zero,zero,zero,zero,zero,zero,zero +; SKX64-NEXT: vmovdqa %ymm0, 832(%rsi) ; SKX64-NEXT: vzeroupper ; SKX64-NEXT: retq ; @@ -550,8 +551,9 @@ define void @test_demandedelts_pshufb_v32i8_v16i8(<2 x i32>* %src, <8 x i32>* %d ; SKX32-NEXT: vmovdqa 32(%ecx), %xmm0 ; SKX32-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[12,13,14,15,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero ; SKX32-NEXT: vmovdqa %ymm0, 672(%eax) -; SKX32-NEXT: vpermilps {{.*#+}} xmm0 = mem[1,0,2,3] -; SKX32-NEXT: vmovaps %ymm0, 832(%eax) +; SKX32-NEXT: vmovdqa 208(%ecx), %xmm0 +; SKX32-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4,5,6,7,0,1,2,3],zero,zero,zero,zero,zero,zero,zero,zero +; SKX32-NEXT: vmovdqa %ymm0, 832(%eax) ; SKX32-NEXT: vzeroupper ; SKX32-NEXT: retl ; From llvm-commits at lists.llvm.org Wed Jul 8 03:52:45 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:52:45 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: dmgreen 
updated this revision to Diff 276364. dmgreen added a comment. Update a FIXME with sub info. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 Files: llvm/include/llvm/Analysis/IVDescriptors.h llvm/lib/Analysis/IVDescriptors.cpp llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/VPlan.cpp llvm/lib/Transforms/Vectorize/VPlan.h llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll llvm/test/Transforms/LoopVectorize/reduction-inloop.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D75069.276364.patch Type: text/x-patch Size: 64918 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 03:53:33 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 10:53:33 +0000 (UTC) Subject: [PATCH] D77129: [Verifier] Verify matrix dimensions operands match vector size. In-Reply-To: References: Message-ID: fhahn marked an inline comment as done. fhahn added inline comments. ================ Comment at: llvm/lib/IR/Verifier.cpp:4805 + NumColumns = cast(Call.getArgOperand(4)); + TypeToCheck = cast(Call.getType()); + break; ---------------- SjoerdMeijer wrote: > fhahn wrote: > > SjoerdMeijer wrote: > > > Quick query on this and the semantics: > > > > > > declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 , i32 , i32 ) > > > > > > do we expect the element types of vectors %A and %B to be same, and do we need to check this? > > Yes, the element types of all types must match currently, but I think it is neither checked in the verifier nor explicit in the LangRef. > > > > To generate code for llvm.aarch64.neon.udot & co, there probably needs to be a way to have different element type widths for result and source operands. 
> > Yes, the element types of all types must match currently, but I think it is neither checked in the verifier nor explicit in the LangRef.
>
> I started looking at the matrix support, getting up to speed with it, and this is where I started and the first thing I noticed. Was just asking about that here as a sanity check. I wouldn't mind putting up a patch for that if that's helpful. Probably the least we can do for now is to check if we are not mixing integers and float types, and then we also need to add that to LangRef and be explicit about that.

> I wouldn't mind putting up a patch for that if that's helpful.

That would be great. I think things will fall apart/miscompile if the element types differ at the moment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77129/new/

https://reviews.llvm.org/D77129

From llvm-commits at lists.llvm.org Wed Jul 8 03:55:33 2020
From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 10:55:33 +0000 (UTC)
Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions.
In-Reply-To:
References:
Message-ID: <5fc31c6e5b30839f812b2999f0405df8@localhost.localdomain>

fhahn added a comment.

> We see some applications keep being crashed, due to some changes in LV and probably being fixed later on, or because of its own weakness in some aspects.

Does the application crash or is the crash in LV?
If it is LV, it would be great if you could report it at https://bugs.llvm.org CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81416/new/ https://reviews.llvm.org/D81416 From llvm-commits at lists.llvm.org Wed Jul 8 04:04:29 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via llvm-commits) Date: Wed, 08 Jul 2020 04:04:29 -0700 (PDT) Subject: [llvm] b199131 - [cmake] Use CMAKE_GENERATOR to determine if Ninja is used Message-ID: <5f05a83d.1c69fb81.34dd1.d01d@mx.google.com> Author: Michael Forney Date: 2020-07-08T13:04:13+02:00 New Revision: b19913188d03d59332908f6280af37325bc49492 URL: https://github.com/llvm/llvm-project/commit/b19913188d03d59332908f6280af37325bc49492 DIFF: https://github.com/llvm/llvm-project/commit/b19913188d03d59332908f6280af37325bc49492.diff LOG: [cmake] Use CMAKE_GENERATOR to determine if Ninja is used The name of the make program does not necessarily match "ninja", especially if an alternative implementation like samurai is used. Using CMAKE_GENERATOR is a more robust detection method, and is already used elsewhere in this file. 
Differential revision: https://reviews.llvm.org/D77091 Added: Modified: llvm/cmake/modules/HandleLLVMOptions.cmake Removed: ################################################################################ diff --git a/llvm/cmake/modules/HandleLLVMOptions.cmake b/llvm/cmake/modules/HandleLLVMOptions.cmake index dfcf684fc502..2e249593e12f 100644 --- a/llvm/cmake/modules/HandleLLVMOptions.cmake +++ b/llvm/cmake/modules/HandleLLVMOptions.cmake @@ -27,7 +27,7 @@ string(TOUPPER "${LLVM_ENABLE_LTO}" uppercase_LLVM_ENABLE_LTO) set(LLVM_PARALLEL_COMPILE_JOBS "" CACHE STRING "Define the maximum number of concurrent compilation jobs (Ninja only).") if(LLVM_PARALLEL_COMPILE_JOBS) - if(NOT CMAKE_MAKE_PROGRAM MATCHES "ninja") + if(NOT CMAKE_GENERATOR STREQUAL "Ninja") message(WARNING "Job pooling is only available with Ninja generators.") else() set_property(GLOBAL APPEND PROPERTY JOB_POOLS compile_job_pool=${LLVM_PARALLEL_COMPILE_JOBS}) @@ -37,7 +37,7 @@ endif() set(LLVM_PARALLEL_LINK_JOBS "" CACHE STRING "Define the maximum number of concurrent link jobs (Ninja only).") -if(CMAKE_MAKE_PROGRAM MATCHES "ninja") +if(CMAKE_GENERATOR STREQUAL "Ninja") if(NOT LLVM_PARALLEL_LINK_JOBS AND uppercase_LLVM_ENABLE_LTO STREQUAL "THIN") message(STATUS "ThinLTO provides its own parallel linking - limiting parallel link jobs to 2.") set(LLVM_PARALLEL_LINK_JOBS "2") From llvm-commits at lists.llvm.org Wed Jul 8 04:04:35 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:04:35 +0000 (UTC) Subject: [PATCH] D77091: [cmake] Use CMAKE_GENERATOR to determine if Ninja is used In-Reply-To: References: Message-ID: <640f9716288b2e7bdeccabf315604be9@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGb19913188d03: [cmake] Use CMAKE_GENERATOR to determine if Ninja is used (authored by mcf, committed by hans). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77091/new/ https://reviews.llvm.org/D77091 Files: llvm/cmake/modules/HandleLLVMOptions.cmake Index: llvm/cmake/modules/HandleLLVMOptions.cmake =================================================================== --- llvm/cmake/modules/HandleLLVMOptions.cmake +++ llvm/cmake/modules/HandleLLVMOptions.cmake @@ -27,7 +27,7 @@ set(LLVM_PARALLEL_COMPILE_JOBS "" CACHE STRING "Define the maximum number of concurrent compilation jobs (Ninja only).") if(LLVM_PARALLEL_COMPILE_JOBS) - if(NOT CMAKE_MAKE_PROGRAM MATCHES "ninja") + if(NOT CMAKE_GENERATOR STREQUAL "Ninja") message(WARNING "Job pooling is only available with Ninja generators.") else() set_property(GLOBAL APPEND PROPERTY JOB_POOLS compile_job_pool=${LLVM_PARALLEL_COMPILE_JOBS}) @@ -37,7 +37,7 @@ set(LLVM_PARALLEL_LINK_JOBS "" CACHE STRING "Define the maximum number of concurrent link jobs (Ninja only).") -if(CMAKE_MAKE_PROGRAM MATCHES "ninja") +if(CMAKE_GENERATOR STREQUAL "Ninja") if(NOT LLVM_PARALLEL_LINK_JOBS AND uppercase_LLVM_ENABLE_LTO STREQUAL "THIN") message(STATUS "ThinLTO provides its own parallel linking - limiting parallel link jobs to 2.") set(LLVM_PARALLEL_LINK_JOBS "2") -------------- next part -------------- A non-text attachment was scrubbed... Name: D77091.276365.patch Type: text/x-patch Size: 1096 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:19:17 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:19:17 +0000 (UTC) Subject: [PATCH] D77091: [cmake] Use CMAKE_GENERATOR to determine if Ninja is used In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGb19913188d03: [cmake] Use CMAKE_GENERATOR to determine if Ninja is used (authored by mcf, committed by hans). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77091/new/ https://reviews.llvm.org/D77091 Files: llvm/cmake/modules/HandleLLVMOptions.cmake Index: llvm/cmake/modules/HandleLLVMOptions.cmake =================================================================== --- llvm/cmake/modules/HandleLLVMOptions.cmake +++ llvm/cmake/modules/HandleLLVMOptions.cmake @@ -27,7 +27,7 @@ set(LLVM_PARALLEL_COMPILE_JOBS "" CACHE STRING "Define the maximum number of concurrent compilation jobs (Ninja only).") if(LLVM_PARALLEL_COMPILE_JOBS) - if(NOT CMAKE_MAKE_PROGRAM MATCHES "ninja") + if(NOT CMAKE_GENERATOR STREQUAL "Ninja") message(WARNING "Job pooling is only available with Ninja generators.") else() set_property(GLOBAL APPEND PROPERTY JOB_POOLS compile_job_pool=${LLVM_PARALLEL_COMPILE_JOBS}) @@ -37,7 +37,7 @@ set(LLVM_PARALLEL_LINK_JOBS "" CACHE STRING "Define the maximum number of concurrent link jobs (Ninja only).") -if(CMAKE_MAKE_PROGRAM MATCHES "ninja") +if(CMAKE_GENERATOR STREQUAL "Ninja") if(NOT LLVM_PARALLEL_LINK_JOBS AND uppercase_LLVM_ENABLE_LTO STREQUAL "THIN") message(STATUS "ThinLTO provides its own parallel linking - limiting parallel link jobs to 2.") set(LLVM_PARALLEL_LINK_JOBS "2") -------------- next part -------------- A non-text attachment was scrubbed... Name: D77091.275706.patch Type: text/x-patch Size: 1096 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:20:25 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:20:25 +0000 (UTC) Subject: [PATCH] D81911: [IR] Fix getBaseObject for GlobalAlias-to-GlobalIFunc In-Reply-To: References: Message-ID: <2aa94264b802f1a90947488554403d83@localhost.localdomain> DmitryPolukhin added a comment. I don't understand what do you mean by "not idempotent" behavior in this case. 
As far as I can see, GlobalIFunc doesn't implement its own getBaseObject (and it is not virtual), so calling getBaseObject on the IFunc should return null, the same as calling it on an Alias-to-IFunc. Calling getBaseObject on an Alias-to-IFunc will recursively call it on the IFunc, which will return null, and that null will be propagated, won't it? So in my opinion computeAliasSummary should handle null without crashing, because other places check for null returned from getBaseObject. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81911/new/ https://reviews.llvm.org/D81911 From llvm-commits at lists.llvm.org Wed Jul 8 04:23:37 2020 From: llvm-commits at lists.llvm.org (Ties Stuij via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:23:37 +0000 (UTC) Subject: [PATCH] D83231: [CodeGen] Don't combine extract + concat vectors with non-legal types In-Reply-To: References: Message-ID: <9e477475c2183d6dbdeb2784c0a48f21@localhost.localdomain> stuij updated this revision to Diff 276368. stuij added a comment. addressed review comment Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83231/new/ https://reviews.llvm.org/D83231 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll Index: llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll @@ -0,0 +1,17 @@ +; RUN: llc -asm-verbose=0 -mtriple aarch64-arm-none-eabi < %s | FileCheck %s + +; The following code previously broke in the DAGCombiner. 
Specifically, trying to combine: +; extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x +; -> extract_vector_elt a, x + +define half @test_combine_extract_concat_vectors(<4 x i16> %a) nounwind { +entry: + %0 = shufflevector <4 x i16> %a, <4 x i16> undef, <8 x i32> + %1 = bitcast <8 x i16> %0 to <8 x half> + %2 = extractelement <8 x half> %1, i32 3 + ret half %2 +} + +; CHECK-LABEL: test_combine_extract_concat_vectors: +; CHECK-NEXT: mov h0, v0.h[3] +; CHECK-NEXT: ret Index: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -17812,8 +17812,11 @@ Elt = (Idx < (int)NumElts) ? Idx : Idx - (int)NumElts; Index = DAG.getConstant(Elt, DL, Index.getValueType()); } - } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && - !BCNumEltsChanged && VecVT.getVectorElementType() == ScalarVT) { + } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && !BCNumEltsChanged && + VecVT.getVectorElementType() == ScalarVT && + (!LegalTypes || + TLI.isTypeLegal( + VecOp.getOperand(0).getValueType().getVectorElementType()))) { // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 0 // -> extract_vector_elt a, 0 // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 1 -------------- next part -------------- A non-text attachment was scrubbed... Name: D83231.276368.patch Type: text/x-patch Size: 1874 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:26:32 2020 From: llvm-commits at lists.llvm.org (Ties Stuij via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:26:32 +0000 (UTC) Subject: [PATCH] D83231: [CodeGen] Don't combine extract + concat vectors with non-legal types In-Reply-To: References: Message-ID: <13ed934aca6eb6ec52232970aac04937@localhost.localdomain> stuij marked 2 inline comments as done. stuij added inline comments. 
================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:17817-17818 + VecVT.getVectorElementType() == ScalarVT && + TLI.isTypeLegal( + VecOp.getOperand(0).getValueType().getVectorElementType())) { // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 0 ---------------- lebedev.ri wrote: > !LegalTypes || TLI.isTypeLegal( Thanks. Fixed this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83231/new/ https://reviews.llvm.org/D83231 From llvm-commits at lists.llvm.org Wed Jul 8 04:26:51 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:26:51 +0000 (UTC) Subject: [PATCH] D83381: [AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 Message-ID: foad created this revision. foad added reviewers: arsenm, rampitec, b-sumner. Herald added subscribers: llvm-commits, kerbowa, asbirlea, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM. Fix the division/remainder algorithm by adding a second quotient refinement step, which is required in some cases like 0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212). Also document, rewrite and simplify it by ensuring that we always have a lower bound on inv(y), which simplifies the UNR step and the quotient refinement steps. 
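[Editorial sketch, not part of the patch] The need for a second refinement step can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the helper name `udivViaReciprocal`, its `Steps` parameter, and the deliberately pessimistic estimate `floor(2^32 / Y) - 1` are hypothetical and are not taken from AMDGPUCodeGenPrepare::expandDivRem32. The only property borrowed from the description above is that the reciprocal estimate is a lower bound, so the quotient estimate can only be too small and is corrected by conditional increments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not the AMDGPU lowering itself.
// Divides X by Y using a reciprocal estimate Z that never exceeds 2^32 / Y,
// followed by at most `Steps` quotient refinement steps.
static uint32_t udivViaReciprocal(uint32_t X, uint32_t Y, unsigned Steps = 2) {
  assert(Y != 0 && "division by zero");
  // Assumed estimate: one less than floor(2^32 / Y), i.e. a lower bound on
  // the scaled reciprocal that is off by less than 2.
  uint64_t Z = ((uint64_t{1} << 32) / Y) - 1;
  // Quotient estimate: high 32 bits of X * Z. Because Z is a lower bound,
  // Q never exceeds the true quotient, and here it is at most 2 too small.
  uint32_t Q = static_cast<uint32_t>((uint64_t{X} * Z) >> 32);
  uint32_t R = X - Q * Y; // Q * Y <= X, so this cannot wrap.
  // Refinement: each step fixes an underestimate of one.
  for (unsigned I = 0; I < Steps; ++I) {
    if (R >= Y) {
      ++Q;
      R -= Y;
    }
  }
  return Q;
}
```

With this assumed estimate, 0xFFFFFFFFu / 0x11111111u starts from a quotient estimate of 13, so a single refinement step yields 14 and only the second step reaches the correct result, 15.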
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83381 Files: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll llvm/test/CodeGen/AMDGPU/bypass-div.ll llvm/test/CodeGen/AMDGPU/idiv-licm.ll llvm/test/CodeGen/AMDGPU/sdiv.ll llvm/test/CodeGen/AMDGPU/udivrem.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83381.276370.patch Type: text/x-patch Size: 475691 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:27:07 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:27:07 +0000 (UTC) Subject: [PATCH] D83382: [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM Message-ID: foad created this revision. foad added reviewers: arsenm, rampitec, b-sumner. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM. Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83382 Files: llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructions.td llvm/lib/Target/AMDGPU/CaymanInstructions.td llvm/lib/Target/AMDGPU/SIInstructions.td llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll llvm/test/CodeGen/AMDGPU/bypass-div.ll llvm/test/CodeGen/AMDGPU/sdiv.ll llvm/test/CodeGen/AMDGPU/udivrem.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83382.276371.patch Type: text/x-patch Size: 65713 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:27:31 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:27:31 +0000 (UTC) Subject: [PATCH] D83383: [AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl Message-ID: foad created this revision. foad added reviewers: arsenm, rampitec, b-sumner. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM. Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83383 Files: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sdiv.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-srem.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-udiv.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-urem.mir llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll From llvm-commits at lists.llvm.org Wed Jul 8 04:28:49 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:28:49 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: <4386eadf09afce4964f34504e27ba5df@localhost.localdomain> stefanp updated this revision to Diff 276373. stefanp marked an inline comment as done. stefanp added a comment. Fixed a couple of conditions and added comments for them. Added a test for the new condition. 
Cleaned up existing tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Thunks.cpp lld/test/ELF/ppc64-error-toc-local-call.s lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s lld/test/ELF/ppc64-toc-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82950.276373.patch Type: text/x-patch Size: 7586 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:29:30 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:29:30 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: stefanp added inline comments. ================ Comment at: lld/ELF/Thunks.cpp:982 + if ((s.stOther >> 5) == 1 && type == R_PPC64_REL24) + return make(s); ---------------- MaskRay wrote: > This needs a comment. I've added a comment. Adding the comment forced me to re-assess the condition statement so I've also changed the condition to be just: ``` if ((s.stOther >> 5) == 1) ``` There was no reason to exclude R_PPC64_REL14 from this condition. I have also added a test for R_PPC64_REL14. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 From llvm-commits at lists.llvm.org Wed Jul 8 04:35:14 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:35:14 +0000 (UTC) Subject: [PATCH] D83231: [CodeGen] Don't combine extract + concat vectors with non-legal types In-Reply-To: References: Message-ID: lebedev.ri accepted this revision. lebedev.ri added a comment. This revision is now accepted and ready to land. 
Thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83231/new/ https://reviews.llvm.org/D83231 From llvm-commits at lists.llvm.org Wed Jul 8 04:40:19 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:40:19 +0000 (UTC) Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for matching input constraints Message-ID: Petar.Avramovic created this revision. Petar.Avramovic added reviewers: paquette, arsenm. Herald added subscribers: llvm-commits, hiraditya, rovka, wdng. Herald added a project: LLVM. Check that input size matches size of destination reg class. Attempt to extend input size when needed. https://reviews.llvm.org/D83384 Files: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll Index: llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll =================================================================== --- llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll +++ llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll @@ -211,3 +211,16 @@ %1 = tail call i32 asm "ldr $0, $1", "=r,*m"(i32* %a) ret i32 %1 } + +define void @test_anyext_input_matching_constraint() { + ; CHECK-LABEL: name: test_anyext_input_matching_constraint + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: [[DEF:%[0-9]+]]:_(s16) = G_IMPLICIT_DEF + ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[DEF]](s16) + ; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY [[ANYEXT]](s32) + ; CHECK: INLINEASM &"", 1 /* sideeffect attdialect */, 655370 /* regdef:GPR32common */, def %0, 2147483657 /* reguse tiedto:$0 */, [[COPY]](tied-def 3) + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY %0 + ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + %1 = call i16 asm sideeffect "", "=r,0"(i16 undef) + unreachable +} Index: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp 
=================================================================== --- llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -427,7 +427,23 @@ ArrayRef SrcRegs = GetOrCreateVRegs(*OpInfo.CallOperandVal); assert(SrcRegs.size() == 1 && "Single register is expected here"); Register Tmp = MRI->createVirtualRegister(RC); - MIRBuilder.buildCopy(Tmp, SrcRegs[0]); + Register Src = SrcRegs[0]; + unsigned SrcSize = TRI->getRegSizeInBits(Src, *MRI); + unsigned TmpSize = TRI->getRegSizeInBits(Tmp, *MRI); + if (TmpSize < SrcSize) { + LLVM_DEBUG(dbgs() << "Input can't fit in destination reg class\n"); + return false; + } + // Attempt to anyext small scalar sources. + if (TmpSize > SrcSize) { + if (!MRI->getType(Src).isValid() || !MRI->getType(Src).isScalar()) { + LLVM_DEBUG(dbgs() << "Can't extend input to size of destination" + " reg class\n"); + return false; + } + Src = MIRBuilder.buildAnyExt(LLT::scalar(TmpSize), Src).getReg(0); + } + MIRBuilder.buildCopy(Tmp, Src); // Add Flag and input register operand (Tmp) to Inst. Tie Tmp to Def. unsigned UseFlag = InlineAsm::getFlagWord(InlineAsm::Kind_RegUse, 1); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83384.276374.patch Type: text/x-patch Size: 2496 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:47:04 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:47:04 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: spatel added a comment. In D83360#2138392 , @lebedev.ri wrote: > In D83360#2138386 , @nlopes wrote: > > > Here's an end-to-end miscompilation: https://bugs.llvm.org/show_bug.cgi?id=31633 > > > Sure, but we still need to have a test with comment that it should not be folded, referencing all this. 
Yes - we need ~4 tests ( {trueval/falseval} * {scalar/vector} ) to ensure that the transform doesn't get mistakenly re-added. Also, there's a clang test that is going to fail with this change: clang/test/CodeGen/arm-mve-intrinsics/dup.c That file tried to be minimally dependent on opt, but still used -early-cse which calls instsimplify for analysis. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 04:49:18 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:49:18 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <6ff8cd96006476ecf7bc5878e5eccf41@localhost.localdomain> ebevhan updated this revision to Diff 276377. ebevhan added a comment. Addressed review comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 Files: llvm/docs/LangRef.rst llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/include/llvm/IR/Intrinsics.td llvm/include/llvm/Target/TargetSelectionDAG.td llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/Verifier.cpp llvm/test/CodeGen/X86/sshl_sat.ll llvm/test/CodeGen/X86/ushl_sat.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83216.276377.patch Type: text/x-patch Size: 47301 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 04:51:38 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:51:38 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <7f6a6665a186158f663835872df38731@localhost.localdomain> ebevhan marked 5 inline comments as done. ebevhan added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/ISDOpcodes.h:313-314 + /// RESULT = [US]SHLSAT(LHS, RHS) - Perform saturation left shift on 2 + /// integers with the same bit width (W). If the true value of LHS << RHS + /// exceeds the largest value that can be represented by W bits, the ---------------- lebedev.ri wrote: > I'm not sure what `left shift on 2 integers` means. > Perhaps this needs some rewording. I lifted it from the other node descriptions, but it doesn't really make sense for shift. I changed it a bit. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7360-7365 + // For signed shifts, we can check for overflow by checking if we would have + // shifted out any bits that disagree with the sign bit. For unsigned shifts, + // we can just check if we would have shifted out any ones. + // TODO: On targets that don't support CTLZ, it may be more efficient to pull + // down the bits to be shifted out and compare those to the signmask/zero + // instead. ---------------- lebedev.ri wrote: > Have you checked if naive `x != ((x << y) u/s>> y)` results in worse lowering? The CTLZ approach was the one that popped into my head first, so I went with that. But it does turn out that yours works a bit better, at least for sshl.sat, so I swapped it out. 
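[Editorial sketch, not part of the patch] The round-trip check `x != ((x << y) u/s>> y)` discussed above can be modelled on plain 32-bit integers. This is a minimal sketch, assuming a shift amount below the bit width and saturation of negative signed values to the minimum; the function names `ushlSat` and `sshlSat` are hypothetical and this is not the SelectionDAG expansion from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned saturating shift left: shift, shift back logically, and compare.
// If any set bit was shifted out, the round trip does not reproduce X.
static uint32_t ushlSat(uint32_t X, uint32_t Amt) {
  assert(Amt < 32 && "shift amount must be less than the bit width");
  uint32_t Shifted = X << Amt;
  if ((Shifted >> Amt) != X)
    return std::numeric_limits<uint32_t>::max();
  return Shifted;
}

// Signed saturating shift left: same idea with an arithmetic shift back.
// The left shift is done on the unsigned representation to avoid signed
// overflow; the conversion back and the right shift of a negative value are
// assumed to behave as two's complement (guaranteed since C++20).
static int32_t sshlSat(int32_t X, uint32_t Amt) {
  assert(Amt < 32 && "shift amount must be less than the bit width");
  int32_t Shifted = static_cast<int32_t>(static_cast<uint32_t>(X) << Amt);
  if ((Shifted >> Amt) != X)
    return X < 0 ? std::numeric_limits<int32_t>::min()
                 : std::numeric_limits<int32_t>::max();
  return Shifted;
}
```

The signed variant detects overflow exactly when a shifted-out bit, or the new sign bit, disagrees with the original sign bit, which is the same condition the CTLZ-based approach was testing for.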
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 04:53:53 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:53:53 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <7669de96cdfce141fe2c95b6440eecdb@localhost.localdomain> hans added a comment. In D83013#2137607 , @MaskRay wrote: > `Opts.getProfileUse() != CodeGenOptions::ProfileNone ` in > > Opts.CallGraphProfile = Opts.getProfileUse() != CodeGenOptions::ProfileNone && > !Opts.DisableIntegratedAS; > > > is redundant. CGProfile.cpp is a no-op if no function provides `getEntryFreq()`. It's a functional no-op, but it runs the BFI analysis, which as Nikita pointed out above adds some compile-time cost. Not scheduling the pass unless we're using profile info seems like a reasonable way to avoid that cost to me. The alternative of using LazyBlockFrequencyInfoPass and checking PSI->hasProfileSummary() first would also work I guess. If you think that's cleaner, maybe that's the better way to go. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Wed Jul 8 04:59:41 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 11:59:41 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: RKSimon added a comment. @yubing @pengfei @craig.topper Please can you confirm the regressions have now been addressed? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Wed Jul 8 05:00:57 2020 From: llvm-commits at lists.llvm.org (Ulrich Weigand via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:00:57 +0000 (UTC) Subject: [PATCH] D83376: [Legalizer] Fix wrong operand in split vector helper In-Reply-To: References: Message-ID: uweigand accepted this revision. uweigand added a comment. This revision is now accepted and ready to land. Yes, this does look like an obvious typo. LGTM! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83376/new/ https://reviews.llvm.org/D83376 From llvm-commits at lists.llvm.org Wed Jul 8 05:02:13 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:02:13 +0000 (UTC) Subject: [PATCH] D60413: [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: spatel added inline comments. 
================ Comment at: llvm/test/Transforms/AggressiveInstCombine/sext_multi_uses.ll:2 +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -o - -bdce -S %s | FileCheck %s +define i32 @ZEXT_0(i16 %a) { ---------------- Run line should be more like this: opt -S -bdce < %s The test file should be moved to this folder: https://github.com/llvm/llvm-project/tree/master/llvm/test/Transforms/BDCE (please also update the title of this review / commit message) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Wed Jul 8 05:05:48 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:05:48 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: LukeGeeson updated this revision to Diff 276383. LukeGeeson marked 2 inline comments as done. LukeGeeson added a comment. Addressed Mikhail's feedback: Sorted CPU lists accordingly CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83206.276383.patch Type: text/x-patch Size: 14148 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:10:21 2020 From: llvm-commits at lists.llvm.org (Kiran Kumar T P via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:10:21 +0000 (UTC) Subject: [PATCH] D82931: [flang][OpenMP] Enhance parser support for atomic construct to OpenMP 5.0 In-Reply-To: References: Message-ID: <4f75636e64422a2fead4cb93dac74c68@localhost.localdomain> kiranktp added a comment. In D82931#2134924, @sscalpone wrote: > > Any syntax error will lead to chain of errors. It will be tough to add a case for invalid syntax. > > Please consider adding error recovery to prevent the cascade of errors. See the docs and the parser for examples. Yes Steve. Checking this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82931/new/ https://reviews.llvm.org/D82931 From llvm-commits at lists.llvm.org Wed Jul 8 05:10:40 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:10:40 +0000 (UTC) Subject: [PATCH] D83387: [llvm-readobj] - Add a generic test for --dyn-relocations and fix an issue. Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay. Herald added subscribers: rupprecht, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. We currently have an issue: --dyn-relocations always prints the following relocation header when dumping `DynPLTRelRegion`: "Offset Info Type Symbol's Value Symbol's Name + Addend" I.e. even for an empty object, --dyn-relocations still prints this. It is an easy-to-fix bug, but we have no dedicated test case for this option. (We have a dynamic-reloc-no-section-headers.test, which has a slightly different purpose.) This patch adds a test and fixes the behavior. 
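[Editorial sketch, not part of the patch] The behavior described above, printing the relocation header only when there is something to dump, might look roughly like the following. Everything here is hypothetical: `DynReloc`, `dumpDynRelocRegion`, and the string-returning interface are invented for illustration and do not reflect the actual ELFDumper.cpp code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one already-formatted dynamic relocation entry.
struct DynReloc {
  std::string Text;
};

// Returns the dump of a relocation region. An empty region produces no
// output at all, so the column header is never printed on its own.
static std::string dumpDynRelocRegion(const std::vector<DynReloc> &Region) {
  std::string Out;
  if (Region.empty())
    return Out;
  Out += "Offset Info Type Symbol's Value Symbol's Name + Addend\n";
  for (const DynReloc &R : Region) {
    Out += R.Text;
    Out += '\n';
  }
  return Out;
}
```

Returning a string rather than writing to a stream is just a convenience of the sketch; it makes the "empty input, empty output" property easy to check in a test.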
https://reviews.llvm.org/D83387 Files: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83387.276384.patch Type: text/x-patch Size: 6517 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:14:09 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:14:09 +0000 (UTC) Subject: [PATCH] D83387: [llvm-readobj] - Add a generic test for --dyn-relocations and fix an issue. In-Reply-To: References: Message-ID: grimar updated this revision to Diff 276385. grimar added a comment. - Minor fix. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83387/new/ https://reviews.llvm.org/D83387 Files: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83387.276385.patch Type: text/x-patch Size: 6507 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:14:37 2020 From: llvm-commits at lists.llvm.org (Konstantin Schwarz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:14:37 +0000 (UTC) Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for matching input constraints In-Reply-To: References: Message-ID: <39cd6889b05c1ba69214c14d3dc95a84@localhost.localdomain> kschwarz added a comment. Hi @Petar.Avramovic, we noticed that this extension is also missing for the non-tied register input operands. I added the opposite logic, i.e. truncating of output operands, but forgot about extending the inputs. Would you mind adding that handling in this patch too, as it is basically the same? If you feel like it should be in a separate commit, I can also provide a patch for it. Thanks! 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83384/new/ https://reviews.llvm.org/D83384 From llvm-commits at lists.llvm.org Wed Jul 8 05:18:03 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:18:03 +0000 (UTC) Subject: [PATCH] D83073: [x86] improve codegen for bit-masked vector compare and select (PR46531) In-Reply-To: References: Message-ID: <6ca8e9ddcbc8025beb6cdf003dd7ddff@localhost.localdomain> spatel closed this revision. spatel added a comment. rG26543f1c0cee CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83073/new/ https://reviews.llvm.org/D83073 From llvm-commits at lists.llvm.org Wed Jul 8 05:18:52 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:18:52 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: dstenb added a comment. In D82975#2134132 , @dblaikie wrote: > In D82975#2128353 , @dstenb wrote: > > > In D82975#2127201 , @SouraVX wrote: > > > > > > When you say 'by default' - do you mean by default when the user requests macro debug info (via -fdebug-macro) or by default without any extra flag? > > > > & what does GCC do? Does it have a way to emit the standard debug_macinfo in v4 and below? Or does it always emit the debug_macro GNU extension? > > > > > > I'm not particularly sure of this(introduction of GNU encodings). Behavior of GCC trunk(11.0.0) is as follows: > > > > > > `gcc -g3 test.c -c`, after dumping using `objdump(2.32)`, GCC will create `.debug_macro`(sort of it's default, until you specify `-gstrict-dwarf` in which case GCC will generate `.debug_macinfo`). > > > > > > As Sourabh says this is default when not emitting strict DWARF in GCC. For Clang, my intention was for it to be enabled by default for `-fdebug-macro` when tuning for GDB. Maybe it would also be interesting when tuning for LLDB? > > > Sounds alright. 
Not sure if the LLDB folks (@aprantl @JDevlieghere @labath) would be interested in that - a separate patch in any case. Yes, let's take that in another patch. >> I just want to add that one downside with emitting `.debug_macro` that we have noticed downstream is that size of archives can grow quite a bit, since you then both pay for the uncoalesced strings in the different object files (same cost as for `.debug_macinfo`), plus all of the relocations. > > Got a rough %? Is it easy to disable this functionality if someone were trying to optimize for object size? (is there an easy way to disable gdb tuning on platforms that default to it, for instance?) I wrote that because in some downstream cases I have seen the size of large archives more than double. I don't know whether there is something special about those cases, or about the cost of relocations on that target, though. When building a Clang 8.0 RelWithDebInfo binary on x86-64, the size of the archives under lib/ increased from 5224M to 5341M (roughly a 2.2% increase), so that's not too bad.
$ du -h -s build-lib-with-macinfo/*.a | sort -h | tail -10
152M build-lib-with-macinfo/libclangARCMigrate.a
194M build-lib-with-macinfo/libclangStaticAnalyzerCore.a
200M build-lib-with-macinfo/libLLVMX86CodeGen.a
230M build-lib-with-macinfo/libLLVMAnalysis.a
245M build-lib-with-macinfo/libLLVMScalarOpts.a
289M build-lib-with-macinfo/libclangAST.a
457M build-lib-with-macinfo/libclangSema.a
481M build-lib-with-macinfo/libclangCodeGen.a
535M build-lib-with-macinfo/libLLVMCodeGen.a
573M build-lib-with-macinfo/libclangStaticAnalyzerCheckers.a
$ du -h -s build-lib-with-macro/*.a | sort -h | tail -10
154M build-lib-with-macro/libclangARCMigrate.a
197M build-lib-with-macro/libclangStaticAnalyzerCore.a
204M build-lib-with-macro/libLLVMX86CodeGen.a
237M build-lib-with-macro/libLLVMAnalysis.a
250M build-lib-with-macro/libLLVMScalarOpts.a
295M build-lib-with-macro/libclangAST.a
460M build-lib-with-macro/libclangSema.a
487M build-lib-with-macro/libclangCodeGen.a
548M build-lib-with-macro/libLLVMCodeGen.a
581M build-lib-with-macro/libclangStaticAnalyzerCheckers.a
Regarding overriding the default, I think the only way is to explicitly pass another tuning option, e.g. `-glldb`? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Wed Jul 8 05:21:24 2020 From: llvm-commits at lists.llvm.org (Adhemerval Zanella via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:21:24 +0000 (UTC) Subject: [PATCH] D83134: [asan] Disable fast unwinder on arm-linux-gnueabi with thumb In-Reply-To: References: Message-ID: <43f8adc3b8d3f0d83fd2abe6351995ab@localhost.localdomain> zatrazz added a comment. In D83134#2133857, @eugenis wrote: > Is unwinding actually broken on an all-clang, all-thumb system? The issue arises when instrumented objects interact with gcc objects, for instance when running instrumented binaries on common distros like Ubuntu.
I haven't tested on an all-clang/all-thumb system, but my understanding is that clang in thumb mode uses the FP in a similar way to arm mode (unlike gcc), so it should work. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83134/new/ https://reviews.llvm.org/D83134 From llvm-commits at lists.llvm.org Wed Jul 8 05:23:09 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Wed, 08 Jul 2020 05:23:09 -0700 (PDT) Subject: [llvm] 9114900 - [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) Message-ID: <5f05baad.1c69fb81.2821d.2a1b@mx.google.com> Author: Sanjay Patel Date: 2020-07-08T08:20:49-04:00 New Revision: 91149002872f968673c8f01f641dfe11dc4a4d7c URL: https://github.com/llvm/llvm-project/commit/91149002872f968673c8f01f641dfe11dc4a4d7c DIFF: https://github.com/llvm/llvm-project/commit/91149002872f968673c8f01f641dfe11dc4a4d7c.diff LOG: [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) vselect ((X & Pow2C) == 0), LHS, RHS --> vselect ((shl X, C') < 0), RHS, LHS Follow-up to D83073 - the non-splat mask cases where we actually see an improvement are quite limited from what I can tell. AVX1 needs multiply and blend capabilities and AVX2 needs vector shift and blend capabilities. The intersection of those 2 constraints is only vectors with 32-bit or 64-bit elements. XOP is/was better.
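The rewrite in the log message can be checked on a single lane with a small scalar model. This is a sketch under stated assumptions, not the DAG combine itself: it assumes 32-bit elements, and `selectByMask` and `selectByShift` are hypothetical helper names. `BitIndex` is log2 of the power-of-2 mask, so the shift amount C' is `31 - BitIndex`, matching the `EltBitWidth - 1 - exactLogBase2()` computation in the patch.

```cpp
#include <cstdint>

// Original form: pick LHS when the masked bit is clear.
// Pow2Mask is assumed to have exactly one bit set.
int32_t selectByMask(int32_t X, uint32_t Pow2Mask, int32_t LHS, int32_t RHS) {
  return ((static_cast<uint32_t>(X) & Pow2Mask) == 0) ? LHS : RHS;
}

// Rewritten form: move the masked bit into the sign bit, then test the sign.
// BitIndex must be in [0, 31]; the shift amount is 31 - BitIndex.
int32_t selectByShift(int32_t X, unsigned BitIndex, int32_t LHS, int32_t RHS) {
  uint32_t Shifted = static_cast<uint32_t>(X) << (31 - BitIndex);
  bool SignSet = (Shifted >> 31) != 0; // equivalent to "(shl X, C') < 0"
  return SignSet ? RHS : LHS;          // arms swapped relative to the == 0 test
}
```

Because a sign-bit test is exactly what the variable blend instructions consume, the shifted value can feed the blend directly, which is why the and/compare pair disappears in the updated test checks.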
Differential Revision: https://reviews.llvm.org/D83181 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vselect-pcmp.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 2e7d3062430c..5ac94be28adf 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -40311,18 +40311,42 @@ static SDValue combineSelect(SDNode *N, SelectionDAG &DAG, cast(Cond.getOperand(2))->get() == ISD::SETEQ && Cond.getOperand(0).getValueType() == VT) { // The 'and' mask must be composed of power-of-2 constants. - // TODO: This is limited to splats because the availability/lowering of - // non-uniform shifts and variable blend types is lumpy. Supporting - // arbitrary power-of-2 vector constants will make the code more - // complicated and may require target limitations to ensure that the - // transform is profitable. - auto *C = isConstOrConstSplat(Cond.getOperand(0).getOperand(1)); + SDValue And = Cond.getOperand(0); + auto *C = isConstOrConstSplat(And.getOperand(1)); if (C && C->getAPIntValue().isPowerOf2()) { // vselect (X & C == 0), LHS, RHS --> vselect (X & C != 0), RHS, LHS - SDValue NotCond = DAG.getSetCC(DL, CondVT, Cond.getOperand(0), - Cond.getOperand(1), ISD::SETNE); + SDValue NotCond = + DAG.getSetCC(DL, CondVT, And, Cond.getOperand(1), ISD::SETNE); return DAG.getSelect(DL, VT, NotCond, RHS, LHS); } + + // If we have a non-splat but still powers-of-2 mask, AVX1 can use pmulld + // and AVX2 can use vpsllv{dq}. 8-bit lacks a proper shift or multiply. + // 16-bit lacks a proper blendv. 
+ unsigned EltBitWidth = VT.getScalarSizeInBits(); + bool CanShiftBlend = + TLI.isTypeLegal(VT) && ((Subtarget.hasAVX() && EltBitWidth == 32) || + (Subtarget.hasAVX2() && EltBitWidth == 64) || + (Subtarget.hasXOP())); + if (CanShiftBlend && + ISD::matchUnaryPredicate(And.getOperand(1), [](ConstantSDNode *C) { + return C->getAPIntValue().isPowerOf2(); + })) { + // Create a left-shift constant to get the mask bits over to the sign-bit. + SDValue Mask = And.getOperand(1); + SmallVector ShlVals; + for (unsigned i = 0, e = VT.getVectorNumElements(); i != e; ++i) { + auto *MaskVal = cast(Mask.getOperand(i)); + ShlVals.push_back(EltBitWidth - 1 - + MaskVal->getAPIntValue().exactLogBase2()); + } + // vsel ((X & C) == 0), LHS, RHS --> vsel ((shl X, C') < 0), RHS, LHS + SDValue ShlAmt = getConstVector(ShlVals, VT.getSimpleVT(), DAG, DL); + SDValue Shl = DAG.getNode(ISD::SHL, DL, VT, And.getOperand(0), ShlAmt); + SDValue NewCond = + DAG.getSetCC(DL, CondVT, Shl, Cond.getOperand(1), ISD::SETLT); + return DAG.getSelect(DL, VT, NewCond, RHS, LHS); + } } return SDValue(); diff --git a/llvm/test/CodeGen/X86/vselect-pcmp.ll b/llvm/test/CodeGen/X86/vselect-pcmp.ll index b7065c69b83b..4c56c654defa 100644 --- a/llvm/test/CodeGen/X86/vselect-pcmp.ll +++ b/llvm/test/CodeGen/X86/vselect-pcmp.ll @@ -931,13 +931,19 @@ define <16 x i8> @blend_splat_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x } define <2 x i64> @blend_mask_cond_v2i64(<2 x i64> %x, <2 x i64> %y, <2 x i64> %z) { -; AVX12-LABEL: blend_mask_cond_v2i64: -; AVX12: # %bb.0: -; AVX12-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; AVX12-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; AVX12-NEXT: vpcmpeqq %xmm3, %xmm0, %xmm0 -; AVX12-NEXT: vblendvpd %xmm0, %xmm1, %xmm2, %xmm0 -; AVX12-NEXT: retq +; AVX1-LABEL: blend_mask_cond_v2i64: +; AVX1: # %bb.0: +; AVX1-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 +; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3 +; AVX1-NEXT: vpcmpeqq %xmm3, %xmm0, %xmm0 +; AVX1-NEXT: vblendvpd %xmm0, %xmm1, %xmm2, %xmm0 +; AVX1-NEXT: retq 
+; +; AVX2-LABEL: blend_mask_cond_v2i64: +; AVX2: # %bb.0: +; AVX2-NEXT: vpsllvq {{.*}}(%rip), %xmm0, %xmm0 +; AVX2-NEXT: vblendvpd %xmm0, %xmm2, %xmm1, %xmm0 +; AVX2-NEXT: retq ; ; AVX512F-LABEL: blend_mask_cond_v2i64: ; AVX512F: # %bb.0: @@ -959,10 +965,8 @@ define <2 x i64> @blend_mask_cond_v2i64(<2 x i64> %x, <2 x i64> %y, <2 x i64> %z ; ; XOP-LABEL: blend_mask_cond_v2i64: ; XOP: # %bb.0: -; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqq %xmm3, %xmm0, %xmm0 -; XOP-NEXT: vblendvpd %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: vpshlq {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vblendvpd %xmm0, %xmm2, %xmm1, %xmm0 ; XOP-NEXT: retq %a = and <2 x i64> %x, %c = icmp eq <2 x i64> %a, zeroinitializer @@ -971,13 +975,17 @@ define <2 x i64> @blend_mask_cond_v2i64(<2 x i64> %x, <2 x i64> %y, <2 x i64> %z } define <4 x i32> @blend_mask_cond_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) { -; AVX12-LABEL: blend_mask_cond_v4i32: -; AVX12: # %bb.0: -; AVX12-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; AVX12-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; AVX12-NEXT: vpcmpeqd %xmm3, %xmm0, %xmm0 -; AVX12-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0 -; AVX12-NEXT: retq +; AVX1-LABEL: blend_mask_cond_v4i32: +; AVX1: # %bb.0: +; AVX1-NEXT: vpmulld {{.*}}(%rip), %xmm0, %xmm0 +; AVX1-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0 +; AVX1-NEXT: retq +; +; AVX2-LABEL: blend_mask_cond_v4i32: +; AVX2: # %bb.0: +; AVX2-NEXT: vpsllvd {{.*}}(%rip), %xmm0, %xmm0 +; AVX2-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0 +; AVX2-NEXT: retq ; ; AVX512F-LABEL: blend_mask_cond_v4i32: ; AVX512F: # %bb.0: @@ -999,10 +1007,8 @@ define <4 x i32> @blend_mask_cond_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z ; ; XOP-LABEL: blend_mask_cond_v4i32: ; XOP: # %bb.0: -; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqd %xmm3, %xmm0, %xmm0 -; XOP-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: vpshld {{.*}}(%rip), %xmm0, 
%xmm0 +; XOP-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0 ; XOP-NEXT: retq %a = and <4 x i32> %x, %c = icmp eq <4 x i32> %a, zeroinitializer @@ -1021,10 +1027,10 @@ define <8 x i16> @blend_mask_cond_v8i16(<8 x i16> %x, <8 x i16> %y, <8 x i16> %z ; ; XOP-LABEL: blend_mask_cond_v8i16: ; XOP: # %bb.0: -; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 ; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqw %xmm3, %xmm0, %xmm0 -; XOP-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: vpshlw {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpcomltw %xmm3, %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm2, %xmm1, %xmm0 ; XOP-NEXT: retq %a = and <8 x i16> %x, %c = icmp eq <8 x i16> %a, zeroinitializer @@ -1043,10 +1049,8 @@ define <16 x i8> @blend_mask_cond_v16i8(<16 x i8> %x, <16 x i8> %y, <16 x i8> %z ; ; XOP-LABEL: blend_mask_cond_v16i8: ; XOP: # %bb.0: -; XOP-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0 -; XOP-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqb %xmm3, %xmm0, %xmm0 -; XOP-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0 +; XOP-NEXT: vpshlb {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpblendvb %xmm0, %xmm2, %xmm1, %xmm0 ; XOP-NEXT: retq %a = and <16 x i8> %x, %c = icmp eq <16 x i8> %a, zeroinitializer @@ -1068,10 +1072,8 @@ define <4 x i64> @blend_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %z ; ; AVX2-LABEL: blend_mask_cond_v4i64: ; AVX2: # %bb.0: -; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0 -; AVX2-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; AVX2-NEXT: vpcmpeqq %ymm3, %ymm0, %ymm0 -; AVX2-NEXT: vblendvpd %ymm0, %ymm1, %ymm2, %ymm0 +; AVX2-NEXT: vpsllvq {{.*}}(%rip), %ymm0, %ymm0 +; AVX2-NEXT: vblendvpd %ymm0, %ymm2, %ymm1, %ymm0 ; AVX2-NEXT: retq ; ; AVX512F-LABEL: blend_mask_cond_v4i64: @@ -1093,13 +1095,11 @@ define <4 x i64> @blend_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %z ; ; XOP-LABEL: blend_mask_cond_v4i64: ; XOP: # %bb.0: -; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 -; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 -; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 
-; XOP-NEXT: vpcomeqq %xmm4, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqq %xmm4, %xmm0, %xmm0 -; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 -; XOP-NEXT: vblendvpd %ymm0, %ymm1, %ymm2, %ymm0 +; XOP-NEXT: vpshlq {{.*}}(%rip), %xmm0, %xmm3 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm0 +; XOP-NEXT: vpshlq {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm0, %ymm3, %ymm0 +; XOP-NEXT: vblendvpd %ymm0, %ymm2, %ymm1, %ymm0 ; XOP-NEXT: retq %a = and <4 x i64> %x, %c = icmp eq <4 x i64> %a, zeroinitializer @@ -1110,21 +1110,17 @@ define <4 x i64> @blend_mask_cond_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %z define <8 x i32> @blend_mask_cond_v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i32> %z) { ; AVX1-LABEL: blend_mask_cond_v8i32: ; AVX1: # %bb.0: -; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3 -; AVX1-NEXT: vpxor %xmm4, %xmm4, %xmm4 -; AVX1-NEXT: vpcmpeqd %xmm4, %xmm3, %xmm3 -; AVX1-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0 -; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 -; AVX1-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0 +; AVX1-NEXT: vpmulld {{.*}}(%rip), %xmm0, %xmm3 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 +; AVX1-NEXT: vpmulld {{.*}}(%rip), %xmm0, %xmm0 +; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm3, %ymm0 +; AVX1-NEXT: vblendvps %ymm0, %ymm2, %ymm1, %ymm0 ; AVX1-NEXT: retq ; ; AVX2-LABEL: blend_mask_cond_v8i32: ; AVX2: # %bb.0: -; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0 -; AVX2-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; AVX2-NEXT: vpcmpeqd %ymm3, %ymm0, %ymm0 -; AVX2-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0 +; AVX2-NEXT: vpsllvd {{.*}}(%rip), %ymm0, %ymm0 +; AVX2-NEXT: vblendvps %ymm0, %ymm2, %ymm1, %ymm0 ; AVX2-NEXT: retq ; ; AVX512F-LABEL: blend_mask_cond_v8i32: @@ -1146,13 +1142,11 @@ define <8 x i32> @blend_mask_cond_v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i32> %z ; ; XOP-LABEL: blend_mask_cond_v8i32: ; XOP: # %bb.0: -; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 -; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 -; XOP-NEXT: vpxor %xmm4, 
%xmm4, %xmm4 -; XOP-NEXT: vpcomeqd %xmm4, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqd %xmm4, %xmm0, %xmm0 -; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 -; XOP-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0 +; XOP-NEXT: vpshld {{.*}}(%rip), %xmm0, %xmm3 +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm0 +; XOP-NEXT: vpshld {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vinsertf128 $1, %xmm0, %ymm3, %ymm0 +; XOP-NEXT: vblendvps %ymm0, %ymm2, %ymm1, %ymm0 ; XOP-NEXT: retq %a = and <8 x i32> %x, %c = icmp eq <8 x i32> %a, zeroinitializer @@ -1192,13 +1186,14 @@ define <16 x i16> @blend_mask_cond_v16i16(<16 x i16> %x, <16 x i16> %y, <16 x i1 ; ; XOP-LABEL: blend_mask_cond_v16i16: ; XOP: # %bb.0: -; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 ; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpshlw {{.*}}(%rip), %xmm3, %xmm3 ; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 -; XOP-NEXT: vpcomeqw %xmm4, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqw %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vpcomltw %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpshlw {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpcomltw %xmm4, %xmm0, %xmm0 ; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 -; XOP-NEXT: vpcmov %ymm0, %ymm2, %ymm1, %ymm0 +; XOP-NEXT: vpcmov %ymm0, %ymm1, %ymm2, %ymm0 ; XOP-NEXT: retq %a = and <16 x i16> %x, %c = icmp eq <16 x i16> %a, zeroinitializer @@ -1238,13 +1233,14 @@ define <32 x i8> @blend_mask_cond_v32i8(<32 x i8> %x, <32 x i8> %y, <32 x i8> %z ; ; XOP-LABEL: blend_mask_cond_v32i8: ; XOP: # %bb.0: -; XOP-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 ; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3 +; XOP-NEXT: vpshlb {{.*}}(%rip), %xmm3, %xmm3 ; XOP-NEXT: vpxor %xmm4, %xmm4, %xmm4 -; XOP-NEXT: vpcomeqb %xmm4, %xmm3, %xmm3 -; XOP-NEXT: vpcomeqb %xmm4, %xmm0, %xmm0 +; XOP-NEXT: vpcomltb %xmm4, %xmm3, %xmm3 +; XOP-NEXT: vpshlb {{.*}}(%rip), %xmm0, %xmm0 +; XOP-NEXT: vpcomltb %xmm4, %xmm0, %xmm0 ; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm0, %ymm0 -; XOP-NEXT: vpcmov %ymm0, %ymm2, %ymm1, %ymm0 +; XOP-NEXT: vpcmov %ymm0, %ymm1, %ymm2, %ymm0 ; XOP-NEXT: retq 
%a = and <32 x i8> %x, %c = icmp eq <32 x i8> %a, zeroinitializer From llvm-commits at lists.llvm.org Wed Jul 8 05:23:13 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:23:13 +0000 (UTC) Subject: [PATCH] D83181: [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) In-Reply-To: References: Message-ID: <1e807950a5acaf619a5a17aba1a5387b@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG91149002872f: [x86] improve codegen for non-splat bit-masked vector compare and select… (authored by spatel). Changed prior to commit: https://reviews.llvm.org/D83181?vs=275705&id=276386#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83181/new/ https://reviews.llvm.org/D83181 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vselect-pcmp.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83181.276386.patch Type: text/x-patch Size: 12121 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:24:09 2020 From: llvm-commits at lists.llvm.org (Luofan Chen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:24:09 +0000 (UTC) Subject: [PATCH] D83297: [Attributor][WIP] Attribute scheduling visualization. In-Reply-To: References: Message-ID: <21b2f34cb160201b0d4b40ad9bf60bf5@localhost.localdomain> bbn added a comment. Can we just merge the dependency graph and the "SchedulingGraph"? I think they are quite similar.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83297/new/ https://reviews.llvm.org/D83297 From llvm-commits at lists.llvm.org Wed Jul 8 05:25:36 2020 From: llvm-commits at lists.llvm.org (Isuru Fernando via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:25:36 +0000 (UTC) Subject: [PATCH] D77815: [flang] Fix setting mxcsr on MSVC In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG2ebf4b6e4c35: [flang] Fix setting mxcsr on MSVC (authored by isuruf). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77815/new/ https://reviews.llvm.org/D77815 Files: flang/lib/Evaluate/host.cpp flang/lib/Evaluate/host.h flang/unittests/Evaluate/fp-testing.cpp flang/unittests/Evaluate/fp-testing.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D77815.276387.patch Type: text/x-patch Size: 5694 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:26:27 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:26:27 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <0500039001daa93fef5d867ecb4854a6@localhost.localdomain> serge-sans-paille updated this revision to Diff 276388. serge-sans-paille added a comment. Take final review into account CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 Files: llvm/lib/IR/LegacyPassManager.cpp llvm/unittests/IR/LegacyPassManagerTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D80916.276388.patch Type: text/x-patch Size: 3685 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:27:40 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:27:40 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <5cbb453f61c59705957d39609af8ceb8@localhost.localdomain> dmgreen added a comment. I would expect this to be very similar to https://reviews.llvm.org/rG8bf99f1e6f0f9b426d6060361ea6d9d47c1868d1, but some parts seem to be missing. Can you make sure that everything is included and in a sensible order? ================ Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:131 +AARCH64_CPU_NAME("cortex-a78", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, + (AArch64::AEK_RAS | AArch64::AEK_DOTPROD | AArch64::AEK_RCPC | AArch64::AEK_SSBS)) +AARCH64_CPU_NAME("cortex-x1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, ---------------- This no longer has FP16? AEK_RAS, I believe, should already be included in ARMV8_2A. ================ Comment at: llvm/include/llvm/Support/ARMTargetParser.def:301 + (ARM::AEK_RAS | ARM::AEK_DOTPROD)) +ARM_CPU_NAME("cortex-a78",ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, + (ARM::AEK_RAS | ARM::AEK_DOTPROD)) ---------------- All these can go in a better order, please. A78 can go next to A77. Same everywhere else. ================ Comment at: llvm/lib/Target/AArch64/AArch64Subtarget.cpp:196 + case CortexX1: + case CortexA78: + PrefFunctionLogAlignment = 4; ---------------- These can go with the other CortexAXX CPUs, which seem to set the same PrefFunctionLogAlignment. Same for the ARM equivalent.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 From llvm-commits at lists.llvm.org Wed Jul 8 05:31:28 2020 From: llvm-commits at lists.llvm.org (Eugene Leviant via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:31:28 +0000 (UTC) Subject: [PATCH] D73242: [WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP In-Reply-To: References: Message-ID: evgeny777 accepted this revision. evgeny777 added a comment. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73242/new/ https://reviews.llvm.org/D73242 From llvm-commits at lists.llvm.org Wed Jul 8 05:38:24 2020 From: llvm-commits at lists.llvm.org (Eugene Leviant via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:38:24 +0000 (UTC) Subject: [PATCH] D83389: [MIR] Speedup parsing (up to 1000x+) Message-ID: evgeny777 created this revision. evgeny777 added reviewers: arphaman, arsenm, Eugene.Zelenko, qcolombet. Herald added subscribers: dexonsmith, hiraditya, wdng. Herald added a project: LLVM. This patch eliminates a string length calculation when lexing a token. It lowers parsing time from 15 minutes to a few seconds for a 4MB MIR file on a Ryzen7 4800H. https://reviews.llvm.org/D83389 Files: llvm/lib/CodeGen/MIRParser/MIParser.cpp Index: llvm/lib/CodeGen/MIRParser/MIParser.cpp =================================================================== --- llvm/lib/CodeGen/MIRParser/MIParser.cpp +++ llvm/lib/CodeGen/MIRParser/MIParser.cpp @@ -563,7 +563,7 @@ void MIParser::lex(unsigned SkipChar) { CurrentSource = lexMIToken( - CurrentSource.data() + SkipChar, Token, + CurrentSource.slice(SkipChar, StringRef::npos), Token, [this](StringRef::iterator Loc, const Twine &Msg) { error(Loc, Msg); }); } -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83389.276389.patch Type: text/x-patch Size: 491 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:41:06 2020 From: llvm-commits at lists.llvm.org (Aaron H Liu via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:41:06 +0000 (UTC) Subject: [PATCH] D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions. In-Reply-To: References: Message-ID: AaronLiu added a comment. Thanks! Will keep an eye on it in the future. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81416/new/ https://reviews.llvm.org/D81416 From llvm-commits at lists.llvm.org Wed Jul 8 05:49:01 2020 From: llvm-commits at lists.llvm.org (Anna Welker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:49:01 +0000 (UTC) Subject: [PATCH] D81267: [LV] Enable the LoopVectorizer to create pointer inductions In-Reply-To: References: Message-ID: <7068163c1cda8513ef9d181fe2cacaef@localhost.localdomain> anwel updated this revision to Diff 276391. anwel added a comment. Hi, there was a bug concerning the location at which the new getelementptr instruction would be generated, thanks for catching this. I have fixed the bug and derived a LoopVectorize test case from your C example. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81267/new/ https://reviews.llvm.org/D81267 Files: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll llvm/test/Transforms/LoopVectorize/pointer-induction.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81267.276391.patch Type: text/x-patch Size: 54809 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:50:12 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:50:12 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. 
In-Reply-To: References: Message-ID: <9b634bd5644f7c537fc419c9d9daa116@localhost.localdomain> lebedev.ri added a comment. Some more thoughts. ================ Comment at: llvm/docs/LangRef.rst:14243 +'``llvm.sshl.sat.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + ---------------- This should be ``` '``llvm.sshl.sat.*``' Intrinsics ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``` ================ Comment at: llvm/docs/LangRef.rst:14262 +The '``llvm.sshl.sat``' family of intrinsic functions perform signed +saturation left shift on the first argument. + ---------------- Here and elsewhere: i strongly suspect it should be `s/saturation/saturating/` ================ Comment at: llvm/docs/LangRef.rst:14292 +'``llvm.ushl.sat.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + ---------------- same ================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:800 + + unsigned Opcode = N->getOpcode(); + ---------------- Assert that we only ever get `ISD::USHLSAT`/`ISD::SSHLSAT` ? ================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:809 + } else { + Op1Promoted = SExtPromotedInteger(Op1); + Op2Promoted = ZExtPromotedInteger(Op2); ---------------- Actually, why do we need to signext, or even zeroext it? As the comment before function notes, we want anyext, we don't care about those new high bits, because we are immediately going to shift them out. ``` ---------------------------------------- Name: promote ushl %r = ushl_sat i8 %x, %y ret i8 %r => %x_wide = zext i8 %x to i16 %y_wide = zext i8 %y to i16 %t0 = shl i16 %x_wide, 8 %t1 = ushl_sat i16 %t0, %y_wide %t2 = lshr i16 %t1, 8 %r = trunc i16 %t2 to i8 ret i8 %r Done: 1 Transformation seems to be correct! 
---------------------------------------- Name: promote sshl %r = sshl_sat i8 %x, %y ret i8 %r => %x_wide = zext i8 %x to i16 %y_wide = zext i8 %y to i16 %t0 = shl i16 %x_wide, 8 %t1 = sshl_sat i16 %t0, %y_wide %t2 = ashr i16 %t1, 8 %r = trunc i16 %t2 to i8 ret i8 %r Done: 1 Transformation seems to be correct! ``` So i think you want ``` SDValue Op1Promoted = GetPromotedInteger(Op1); SDValue Op2Promoted = GetPromotedInteger(Op2); unsigned ShiftOp = Opcode == ISD::USHLSAT ? ISD::SRL : ISD::SRA; ``` and maybe get rid of the `ShiftOp` variable, or sink it closer to its use. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 05:51:05 2020 From: llvm-commits at lists.llvm.org (Dominik Montada via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:51:05 +0000 (UTC) Subject: [PATCH] D83390: [GlobalISel][InlineAsm] Extend input operands when register class size does not match type Message-ID: gargaroff created this revision. gargaroff added reviewers: arsenm, kschwarz. Herald added subscribers: llvm-commits, hiraditya, rovka, wdng. Herald added a project: LLVM. The InlineAsmLowering was blindly copying the input operands to the selected register class without checking if their sizes match. This resulted in illegal COPYs. Use G_ANYEXT to extend the input operand to fit the selected register class. G_ANYEXT is used since the inline assembler is not allowed to assume anything about the upper bits of the input operands.
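The reasoning about the upper bits can be illustrated with a scalar model of an any-extend. This is a sketch only, assuming an 8-bit input widened to a 32-bit register as in the new test case; `anyExtend8To32` and `readLow8` are hypothetical helpers, not GlobalISel APIs, and `UpperBits` stands in for whatever happens to end up in the unspecified bits.

```cpp
#include <cstdint>

// Model of an any-extend from 8 to 32 bits: the low 8 bits carry the value,
// the upper 24 bits are unspecified and may hold anything.
uint32_t anyExtend8To32(uint8_t Value, uint32_t UpperBits) {
  return (UpperBits & 0xFFFFFF00u) | Value;
}

// A consumer that respects the contract reads only the low 8 bits.
uint8_t readLow8(uint32_t Reg) { return static_cast<uint8_t>(Reg & 0xFFu); }
```

Any consumer that reads only the low 8 bits sees the same value regardless of the upper bits, which is why a zero- or sign-extension would be stricter than the inline asm contract requires.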
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83390 Files: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll Index: llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll =================================================================== --- llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll +++ llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll @@ -199,6 +199,18 @@ ret i8 %0 } +define void @test_input_register_extend() { + ; CHECK-LABEL: name: test_input_register_extend + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 42 + ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[C]](s8) + ; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY [[ANYEXT]](s32) + ; CHECK: INLINEASM &"mov x0, $0", 1 /* sideeffect attdialect */, 9 /* reguse */, [[COPY]] + ; CHECK: RET_ReallyLR + call void asm sideeffect "mov x0, $0", "r"(i8 42) + ret void +} + define i32 @test_memory_constraint(i32* %a) nounwind { ; CHECK-LABEL: name: test_memory_constraint ; CHECK: bb.1 (%ir-block.0): Index: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp =================================================================== --- llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -525,7 +525,21 @@ unsigned Flag = InlineAsm::getFlagWord(InlineAsm::Kind_RegUse, NumRegs); Inst.addImm(Flag); - MIRBuilder.buildCopy(OpInfo.Regs[0], SourceRegs[0]); + + // We might need to extend the input registers to fit the register class + Register TargetReg = OpInfo.Regs[0]; + Register InputReg = SourceRegs[0]; + + unsigned TargetSize = TRI->getRegSizeInBits(TargetReg, *MRI); + unsigned InputSize = MRI->getType(InputReg).getSizeInBits(); + + if (InputSize < TargetSize) { + Register ScratchReg = + MRI->createGenericVirtualRegister(LLT::scalar(TargetSize)); + InputReg = MIRBuilder.buildAnyExt(ScratchReg, InputReg).getReg(0); + } + + 
MIRBuilder.buildCopy(OpInfo.Regs[0], InputReg); Inst.addReg(OpInfo.Regs[0]); break; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83390.276393.patch Type: text/x-patch Size: 2020 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:52:44 2020 From: llvm-commits at lists.llvm.org (Dominik Montada via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:52:44 +0000 (UTC) Subject: [PATCH] D83390: [GlobalISel][InlineAsm] Extend input operands when register class size does not match type In-Reply-To: References: Message-ID: <525e10d8bfd66dbfbaa404b23cf96c46@localhost.localdomain> gargaroff updated this revision to Diff 276394. gargaroff added a comment. Forgot to commit final changes before diff Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83390/new/ https://reviews.llvm.org/D83390 Files: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll Index: llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll =================================================================== --- llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll +++ llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll @@ -199,6 +199,18 @@ ret i8 %0 } +define void @test_input_register_extend() { + ; CHECK-LABEL: name: test_input_register_extend + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 42 + ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[C]](s8) + ; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY [[ANYEXT]](s32) + ; CHECK: INLINEASM &"mov x0, $0", 1 /* sideeffect attdialect */, 9 /* reguse */, [[COPY]] + ; CHECK: RET_ReallyLR + call void asm sideeffect "mov x0, $0", "r"(i8 42) + ret void +} + define i32 @test_memory_constraint(i32* %a) nounwind { ; CHECK-LABEL: name: test_memory_constraint ; CHECK: bb.1 (%ir-block.0): Index: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp 
=================================================================== --- llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -525,8 +525,22 @@ unsigned Flag = InlineAsm::getFlagWord(InlineAsm::Kind_RegUse, NumRegs); Inst.addImm(Flag); - MIRBuilder.buildCopy(OpInfo.Regs[0], SourceRegs[0]); - Inst.addReg(OpInfo.Regs[0]); + + // We might need to extend the input registers to fit the register class + Register TargetReg = OpInfo.Regs[0]; + Register InputReg = SourceRegs[0]; + + unsigned TargetSize = TRI->getRegSizeInBits(TargetReg, *MRI); + unsigned InputSize = MRI->getType(InputReg).getSizeInBits(); + + if (InputSize < TargetSize) { + Register ScratchReg = + MRI->createGenericVirtualRegister(LLT::scalar(TargetSize)); + InputReg = MIRBuilder.buildAnyExt(ScratchReg, InputReg).getReg(0); + } + + MIRBuilder.buildCopy(TargetReg, InputReg); + Inst.addReg(TargetReg); break; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83390.276394.patch Type: text/x-patch Size: 2048 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:54:41 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:54:41 +0000 (UTC) Subject: [PATCH] D83181: [x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) In-Reply-To: References: Message-ID: <3fd12adbc78552f0e463e4ec3109dd9c@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG91149002872f: [x86] improve codegen for non-splat bit-masked vector compare and select… (authored by spatel). 
Changed prior to commit: https://reviews.llvm.org/D83181?vs=275568&id=275708#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83181/new/ https://reviews.llvm.org/D83181 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vselect-pcmp.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83181.275708.patch Type: text/x-patch Size: 12121 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:54:41 2020 From: llvm-commits at lists.llvm.org (Isuru Fernando via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:54:41 +0000 (UTC) Subject: [PATCH] D77815: [flang] Fix setting mxcsr on MSVC In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG2ebf4b6e4c35: [flang] Fix setting mxcsr on MSVC (authored by isuruf). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77815/new/ https://reviews.llvm.org/D77815 Files: flang/lib/Evaluate/host.cpp flang/lib/Evaluate/host.h flang/unittests/Evaluate/fp-testing.cpp flang/unittests/Evaluate/fp-testing.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D77815.275707.patch Type: text/x-patch Size: 5694 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:57:50 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:57:50 +0000 (UTC) Subject: [PATCH] D83391: [NFC][Debugify] Rename OptCustomPassManager into DebugifyCustomPassManager Message-ID: djtodoro created this revision. djtodoro added reviewers: vsk, aprantl. djtodoro added projects: debug-info, LLVM. Herald added subscribers: llvm-commits, hiraditya. In addition, move the definition of the class into the LegacyPassManager.h, so we can use it from different levels. The motivation for this is D82547 . 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83391 Files: llvm/include/llvm/IR/LegacyPassManager.h llvm/lib/IR/LegacyPassManager.cpp llvm/tools/opt/opt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83391.276395.patch Type: text/x-patch Size: 7348 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 05:58:47 2020 From: llvm-commits at lists.llvm.org (Maksym Wezdecki via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:58:47 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: <5bf1605b8f4f062eff20c92045894b08@localhost.localdomain> mwezdeck added a comment. Can I kindly ask for review? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 From llvm-commits at lists.llvm.org Wed Jul 8 05:59:35 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 12:59:35 +0000 (UTC) Subject: [PATCH] D82547: [Debugify] Expose debugify (original mode) as CC1 option In-Reply-To: References: Message-ID: <65b81f9d7c29c9b29f18a7bacc357758@localhost.localdomain> djtodoro marked an inline comment as done. djtodoro added inline comments. ================ Comment at: clang/lib/CodeGen/BackendUtil.cpp:855 +class ClangCustomPassManager : public legacy::PassManager { +public: ---------------- djtodoro wrote: > vsk wrote: > > Please factor out OptCustomPassManager from opt and generalize it so it can be used by both opt and clang. That should help ensure that extensions and bug fixes are only made to one custom 'debugify' pass manager. > I'll try that with the latest code. I remember I've tried it once, but I ended up moving it into the IR library (since we need to link it within legacy pass manager). Hi @vsk, I've posted the patch as D83391. Thanks! 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82547/new/ https://reviews.llvm.org/D82547 From llvm-commits at lists.llvm.org Wed Jul 8 06:00:15 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:00:15 +0000 (UTC) Subject: [PATCH] D82545: [Debugify] Make the debugify aware of the original (-g) Debug Info In-Reply-To: References: Message-ID: <59302b301bfc4be6a8a67134ded99d5a@localhost.localdomain> djtodoro updated this revision to Diff 276396. djtodoro added a comment. - Rebase on top of D83391 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82545/new/ https://reviews.llvm.org/D82545 Files: llvm/docs/HowToUpdateDebugInfo.rst llvm/include/llvm/IR/LegacyPassManager.h llvm/include/llvm/Transforms/Utils/Debugify.h llvm/lib/Transforms/Utils/Debugify.cpp llvm/test/DebugInfo/check-debugify-preserves-analyses.ll llvm/test/DebugInfo/debugify-each-original-dbginfo.ll llvm/test/DebugInfo/debugify-each.ll llvm/test/DebugInfo/debugify-export.ll llvm/test/DebugInfo/debugify-original-dbginfo.ll llvm/test/DebugInfo/debugify-original-no-dbg-info.ll llvm/test/DebugInfo/debugify.ll llvm/test/DebugInfo/pr37964.ll llvm/test/Transforms/CodeGenPrepare/AArch64/overflow-intrinsics.ll llvm/test/Transforms/CodeGenPrepare/SPARC/overflow-intrinsics.ll llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll llvm/test/Transforms/CodeGenPrepare/X86/vec-shift.ll llvm/test/Transforms/CorrelatedValuePropagation/ashr.ll llvm/test/Transforms/CorrelatedValuePropagation/overflows.ll llvm/test/Transforms/CorrelatedValuePropagation/sext.ll llvm/test/Transforms/CorrelatedValuePropagation/udiv.ll llvm/test/Transforms/InstCombine/call-guard.ll llvm/test/Transforms/InstCombine/double-float-shrink-2.ll llvm/test/Transforms/InstCombine/malloc-free-delete-dbginvar.ll llvm/test/Transforms/InstCombine/musttail-thunk.ll llvm/test/Transforms/SCCP/ipsccp-basic.ll llvm/tools/opt/opt.cpp -------------- next part -------------- A 
non-text attachment was scrubbed... Name: D82545.276396.patch Type: text/x-patch Size: 58421 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:01:31 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:01:31 +0000 (UTC) Subject: [PATCH] D82546: [Debugify][OriginalMode] Export the report into JSON file In-Reply-To: References: Message-ID: <25dcebd907ff9c5f33464abeb836606d@localhost.localdomain> djtodoro updated this revision to Diff 276397. djtodoro added a comment. - Rebase on top of D83391 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82546/new/ https://reviews.llvm.org/D82546 Files: llvm/docs/HowToUpdateDebugInfo.rst llvm/include/llvm/IR/LegacyPassManager.h llvm/include/llvm/Transforms/Utils/Debugify.h llvm/lib/Transforms/Utils/Debugify.cpp llvm/test/DebugInfo/debugify-each-original-dbginfo.ll llvm/tools/opt/opt.cpp llvm/utils/llvm-debugify-original.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D82546.276397.patch Type: text/x-patch Size: 29951 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:01:56 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:01:56 +0000 (UTC) Subject: [PATCH] D82547: [Debugify] Expose debugify (original mode) as CC1 option In-Reply-To: References: Message-ID: <294e44c035816bd60e72f816cef88fc9@localhost.localdomain> djtodoro updated this revision to Diff 276398. djtodoro added a comment. 
- Rebase on top of D83391 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82547/new/ https://reviews.llvm.org/D82547 Files: clang/include/clang/Basic/CodeGenOptions.def clang/include/clang/Basic/CodeGenOptions.h clang/include/clang/Driver/CC1Options.td clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp clang/test/Driver/debugify-each-original.c llvm/docs/HowToUpdateDebugInfo.rst -------------- next part -------------- A non-text attachment was scrubbed... Name: D82547.276398.patch Type: text/x-patch Size: 6225 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:05:31 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:05:31 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: lebedev.ri updated this revision to Diff 276399. lebedev.ri added a comment. IWYU Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 Files: llvm/test/Reduce/remove-args.ll llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-funcs.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/test/Reduce/remove-global-vars.ll llvm/test/Reduce/remove-metadata.ll llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83351.276399.patch
Type: text/x-patch
Size: 17931 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 06:08:01 2020
From: llvm-commits at lists.llvm.org (Bardia Mahjour via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 13:08:01 +0000 (UTC)
Subject: [PATCH] D82927: Intergerate Loop Peeling into Loop Fusion
In-Reply-To:
References:
Message-ID:

bmahjour added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:858
+
+      // Here we are checking the user specfices to enable loop peeling with
+      // fusion, ensure FC0 (the first loop) can be peeled, and both loops
----------------
we are checking **that** the user **specifies**...

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:861
+      // have different tripcounts
+      if (FusionPeelMaxCount.getNumOccurrences() > 0 &&
+          FusionPeelMaxCount != 0 && FC0->AbleToPeel && !SameTripCount &&
----------------
I don't think we should check for occurrence of the option on the command line. It suffices to just compare `TCDifference` against the `FusionPeelMaxCount`. When the option is not specified, the value of `FusionPeelMaxCount` will be the default (which is currently zero and would prevent peeling). Similarly, `FusionPeelMaxCount != 0` is redundant.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82927/new/

https://reviews.llvm.org/D82927

From llvm-commits at lists.llvm.org Wed Jul 8 06:08:04 2020
From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 13:08:04 +0000 (UTC)
Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass
In-Reply-To:
References:
Message-ID: <1e7d302eb5e4c11f8aec432d3b91610c@localhost.localdomain>

djtodoro added a comment.

Super nits included.
================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp:13 +#include "llvm/CodeGen/MachineFrameInfo.h" +#include "llvm/CodeGen/MachineFunction.h" +#include "llvm/CodeGen/MachineFunctionPass.h" ---------------- I think this include is redundant. ================ Comment at: llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp:1792 + // Skip functions from NoDebug compilation units. + if (MF.getFunction().getSubprogram()->getUnit()->getEmissionKind() == + DICompileUnit::NoDebug) ---------------- clang-format please CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83046/new/ https://reviews.llvm.org/D83046 From llvm-commits at lists.llvm.org Wed Jul 8 06:09:40 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:09:40 +0000 (UTC) Subject: [PATCH] D83071: Add support for options with two flags for controlling the same field. In-Reply-To: References: Message-ID: <5fc6c407f09075f8c2148d62dcd86cd0@localhost.localdomain> dang updated this revision to Diff 276400. dang added a comment. Split into two macro kinds. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83071/new/ https://reviews.llvm.org/D83071 Files: clang/include/clang/Driver/Options.td clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/Option/OptParser.td llvm/utils/TableGen/OptParserEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83071.276400.patch Type: text/x-patch Size: 18193 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:11:30 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:11:30 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: dstenb added a comment. 
In D82975#2135150 , @SouraVX wrote: > In D82975#2134626 , @dblaikie wrote: > > > In D82975#2134347 , @probinson wrote: > > > > > In D82975#2134132 , @dblaikie wrote: > > > > > > > In D82975#2128353 , @dstenb wrote: > > > > > > > > > In D82975#2127201 , @SouraVX wrote: > > > > > > > > > > > I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the `.debug_macro` section with `version` 4 and teach the `llvm-dwarfdump` to parse it correctly. > > > > > > > > > > > > > > > One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4? > > > > > > > > > > > > Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc. > > > > > > > > > My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's `-fdebug-macro` with v4 is emitting a .debug_str_offsets section. > > > > > > LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either. 
> > > > > > GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand: > > > > $ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug" > > 1 .debug_info.dwo 0000003c 0000000000000000 > > 2 .debug_abbrev.dwo 0000003e 0000000000000000 > > 3 .debug_macro.dwo 0000001e 0000000000000000 > > 4 .debug_macro.dwo 00000364 0000000000000000 > > 5 .debug_macro.dwo 00000013 0000000000000000 > > 6 .debug_line.dwo 00000048 0000000000000000 > > 7 .debug_str_offsets.dwo 000002d5 0000000000000000 > > 8 .debug_str.dwo 00000e05 0000000000000000 > > $ llvm-dwarfdump-tot main.dwo -debug-macro > > main.dwo: file format elf64-x86-64 > > > > .debug_macro.dwo contents: > > 0x00000000: > > - lineno: 19 macro: > > DW_MACINFO_invalid > > > > > > I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off. > > > One more deciding factor to considered here(previously missed) is that: `GDB(trunk)` also doesn't understand `GNU macro extensions`(if you wish to call it) in split case. > i.e > `gcc -g3 -gsplit-dwarf test.c` > `test.dwo` contains `.debug_macro.dwo` forms which no tool(as of now can dump). > if you load `a.out` in GDB and try expanding macro(defined in source). > GDB will report > > (gdb) info macro FOO > The symbol `FOO' has no definition as a C/C++ preprocessor macro > at :-1 > > > on the other hand, if you try with `-gstrict-dwarf -gsplit-dwarf`. GDB is happy. > So at the end of the day, even if we allow `GNU macro` extension, things will still be broken for `-gsplit-dwarf` case. 
> Or we have to teach the debugger to understand this ?, this also hinges on the fact, what kinda form GCC uses in split-case in `.debug_macro.dwo` section.
> That it self is unclear right ?

(Sorry, I don't have a GCC trunk build readily available, so I used GCC 9.3.0 here.)

When using those flags, GCC seems to emit DW_MACRO_define_strp (DW_MACRO_GNU_define_indirect) entries, but with indexed strings as operands. Neither binutils nor GDB considers that such entries may hold indexed strings; both just treat those operands as indirect strings, which is why they are not handled properly.

"Overloading" those indirect operands with indexed strings seems very weird to me. Perhaps that is just a bug in GCC, rather than a limitation in the consumers?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82975/new/

https://reviews.llvm.org/D82975

From llvm-commits at lists.llvm.org Wed Jul 8 06:12:34 2020
From: llvm-commits at lists.llvm.org (Alexey Bataev via llvm-commits)
Date: Wed, 08 Jul 2020 06:12:34 -0700 (PDT)
Subject: [llvm] 64363a9 - [NVPTX]Add a test for debug info for packed bitfields, NFC.
Message-ID: <5f05c642.1c69fb81.25532.0a8f@mx.google.com>

Author: Alexey Bataev
Date: 2020-07-08T09:10:42-04:00
New Revision: 64363a9d93006595d05825ce8fdbcc146aac15b1

URL: https://github.com/llvm/llvm-project/commit/64363a9d93006595d05825ce8fdbcc146aac15b1
DIFF: https://github.com/llvm/llvm-project/commit/64363a9d93006595d05825ce8fdbcc146aac15b1.diff

LOG: [NVPTX]Add a test for debug info for packed bitfields, NFC.
Added:
    llvm/test/DebugInfo/NVPTX/packed_bitfields.ll

Modified:

Removed:

################################################################################
diff --git a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
new file mode 100644
index 000000000000..36433064fa92
--- /dev/null
+++ b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
@@ -0,0 +1,42 @@
+; RUN: llc < %s -mtriple=nvptx64-nvidia-cuda | FileCheck %s
+
+; Produced at -O0 from:
+; struct {
+;   char : 3;
+;   char a : 6;
+; } __attribute__((__packed__)) b;
+
+; Note that DWARF 2 counts bit offsets backwards from the high end of
+; the storage unit to the high end of the bit field.
+
+; CHECK: .section .debug_info
+; CHECK: .b8 3 // Abbrev {{.*}} DW_TAG_structure_type
+; CHECK: .b8 3 // DW_AT_decl_line
+; CHECK-NEXT: .b8 1 // DW_AT_byte_size
+; CHECK-NEXT: .b8 6 // DW_AT_bit_size
+; CHECK-NEXT: .b64 -1 // DW_AT_bit_offset
+; CHECK-NEXT: .b8 2 // DW_AT_data_member_location
+
+%struct.anon = type { i16 }
+
+@b = global %struct.anon zeroinitializer, align 1, !dbg !0
+
+!llvm.dbg.cu = !{!2}
+!llvm.module.flags = !{!10, !11, !12, !13}
+!llvm.ident = !{!14}
+
+!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
+!1 = distinct !DIGlobalVariable(name: "b", scope: !2, file: !3, line: 4, type: !6, isLocal: false, isDefinition: true)
+!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "clang version 11.0.0 ", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5, nameTableKind: None, sysroot: "/")
+!3 = !DIFile(filename: "repro.c", directory: "/tmp")
+!4 = !{}
+!5 = !{!0}
+!6 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !3, line: 1, size: 16, elements: !7)
+!7 = !{!8}
+!8 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !6, file: !3, line: 3, baseType: !9, size: 6, offset: 3, flags: DIFlagBitField, extraData: i64 0)
+!9 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
+!10 = !{i32 7, !"Dwarf Version", i32 2}
+!11 = !{i32 2, !"Debug Info Version", i32 3}
+!12 = !{i32 1, !"wchar_size", i32 4}
+!13 = !{i32 7, !"PIC Level", i32 2}
+!14 = !{!"clang version 11.0.0 "}

From llvm-commits at lists.llvm.org Wed Jul 8 06:13:40 2020
From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 13:13:40 +0000 (UTC)
Subject: [PATCH] D82659: Fix missing build dependency on omp_gen.
In-Reply-To:
References:
Message-ID: <91ca3cc1797895e101b5e918290afaf1@localhost.localdomain>

clementval added a comment.

In D82659#2138228, @michele.scandale wrote:

> In D82659#2136999, @clementval wrote:
>
> > Looks good but just one question ... When clang is built as standalone it does not build the OpenMP part inside Clang? I haven't seen any code to avoid compiling the OpenMP parsing and semantic checking inside clang.
>
>
> I don't think there is a way to avoid compiling the OpenMP support in Clang. The standalone build is just building the content of the `clang` directory as a separate CMake project reusing the an already built LLVM -- therefore the `libLLVMFrontendOpenMP` as well as the `OMP.h.inc` would have been generated already.

Ok then your fix should work.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82659/new/

https://reviews.llvm.org/D82659

From llvm-commits at lists.llvm.org Wed Jul 8 06:19:12 2020
From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 13:19:12 +0000 (UTC)
Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default
In-Reply-To:
References:
Message-ID:

fhahn added a comment.

Do you have any stats on how often this triggers?
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82998/new/

https://reviews.llvm.org/D82998

From llvm-commits at lists.llvm.org Wed Jul 8 06:31:10 2020
From: llvm-commits at lists.llvm.org (Anastasia Stulova via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 13:31:10 +0000 (UTC)
Subject: [PATCH] D82756: Port some floating point options to new option marshalling infrastructure
In-Reply-To:
References:
Message-ID: <456265f6efbd87196ced152960dbd571@localhost.localdomain>

Anastasia added inline comments.

================
Comment at: clang/include/clang/Driver/Options.td:1176
+defm reciprocal_math : OptInFFlag< "reciprocal-math", "Allow division operations to be reassociated", "", "", [], "LangOpts->AllowRecip">;
+def fapprox_func : Flag<["-"], "fapprox-func">, Group, Flags<[CC1Option, NoDriverOption]>,
+  MarshallingInfoFlag<"LangOpts->ApproxFunc", "false">;
----------------
Could this also be an OptInFFlag?

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2805
     CmdArgs.push_back("-menable-unsafe-fp-math");
+    ApproxFunc = true;
+  }
----------------
Is this a bug fix?

================
Comment at: clang/test/CodeGen/fp-function-attrs.cpp:2
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffinite-math-only -menable-unsafe-fp-math \
+// RUN: -menable-no-infs -menable-no-nans -fno-signed-zeros -freciprocal-math \
+// RUN: -fapprox-func -mreassociate -ffp-contract=fast -emit-llvm -o - %s | FileCheck %s
----------------
It is not clear why you need to pass these extra flags now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82756/new/

https://reviews.llvm.org/D82756

From llvm-commits at lists.llvm.org Wed Jul 8 06:35:05 2020
From: llvm-commits at lists.llvm.org (Muhammad Usman via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 13:35:05 +0000 (UTC)
Subject: [PATCH] D83392: Strlen loop idiom recognition
Message-ID:

musman created this revision.
musman added reviewers: lebedev.ri, nemanjai, hfinkel, sanjoy.
musman added a project: LLVM.
Herald added subscribers: llvm-commits, hiraditya.

This patch adds strlen to LoopIdiomRecognize.cpp. It is the first part of 3 patches:

1. Strlen loop idiom recognition
2. strlen16 recognition and creation of new strlen16 intrinsic
3. Folding of strlen/strlen16 call if they are only used for zero equality comparison (replace with load of first char)

Examples that this recognizes:

  unsigned strlen1(char *Str) {
    char *Src = Str;
    if (!Src)
      return 0;
    for (; *Src;)
      Src++;
    return Src - Str;
  }

  unsigned strlen2(char *Str) {
    unsigned Len = 0;
    if (!Str)
      return 0;
    while(*Str) {
      Len++;
      Str++;
    }
    return Len;
  }

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83392

Files:
  llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
  llvm/test/Transforms/LoopIdiom/recognize-strlen.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83392.276403.patch
Type: text/x-patch
Size: 14908 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 06:35:50 2020
From: llvm-commits at lists.llvm.org (Muhammad Usman via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 13:35:50 +0000 (UTC)
Subject: [PATCH] D82395: Strlen loop idiom recognition and folding
In-Reply-To:
References:
Message-ID: <0f70c3b70fdd9bd39da50101aaf04130@localhost.localdomain>

musman added a comment.
In D82395#2109591 , @lebedev.ri wrote: > This should be several patches: > > - Loop idiom recognition for plain strlen > - langref for new strlen16 > - other strlen16 patches > - Loop idiom recognition for strlen16 Thanks, I will split this up into smaller patches Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82395/new/ https://reviews.llvm.org/D82395 From llvm-commits at lists.llvm.org Wed Jul 8 06:40:19 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:40:19 +0000 (UTC) Subject: [PATCH] D83393: [llvm-readelf] - Stop using 'unwrapOrError()' in 'ELFDumper::getSymbolVersion'. Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay. Herald added subscribers: rupprecht, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. This allows to propagate an error and report a warning properly. https://reviews.llvm.org/D83393 Files: llvm/test/tools/llvm-readobj/ELF/versym-invalid.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -1086,10 +1086,12 @@ sizeof(Elf_Sym); // Get the corresponding version index entry. 
- const Elf_Versym *Versym = unwrapOrError( - ObjF->getFileName(), ObjF->getELFFile()->template getEntry( - SymbolVersionSection, EntryIndex)); - return this->getSymbolVersionByIndex(Versym->vs_index, IsDefault); + if (Expected EntryOrErr = + ObjF->getELFFile()->template getEntry( + SymbolVersionSection, EntryIndex)) + return this->getSymbolVersionByIndex((*EntryOrErr)->vs_index, IsDefault); + else + return EntryOrErr.takeError(); } template Index: llvm/test/tools/llvm-readobj/ELF/versym-invalid.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/versym-invalid.test +++ llvm/test/tools/llvm-readobj/ELF/versym-invalid.test @@ -115,13 +115,39 @@ ## Check we report a warning when a SHT_GNU_versym section has an invalid entry size. # RUN: yaml2obj --docnum=5 %s -o %t5 -# RUN: llvm-readelf -V %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-GNU -# RUN: llvm-readobj -V %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-LLVM - +# RUN: llvm-readelf -V --dyn-syms %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-GNU +# RUN: llvm-readobj -V --dyn-syms %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-LLVM + +# INVALID-ENT-SIZE-GNU: Symbol table '.dynsym' contains 2 entries: +# INVALID-ENT-SIZE-GNU-NEXT: Num: Value Size Type Bind Vis Ndx Name +# INVALID-ENT-SIZE-GNU-NEXT: warning: '[[FILE]]': section [index 1] has invalid sh_entsize: expected 2, but got 3 +# INVALID-ENT-SIZE-GNU-NEXT: 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND @ +# INVALID-ENT-SIZE-GNU-NEXT: 1: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND foo@ # INVALID-ENT-SIZE-GNU: Version symbols section '.gnu.version' contains 1 entries: # INVALID-ENT-SIZE-GNU-NEXT: Addr: 0000000000000000 Offset: 0x000040 Link: 0 () # INVALID-ENT-SIZE-GNU-NEXT: warning: '[[FILE]]': cannot read content of SHT_GNU_versym section with index 1: section [index 1] has an invalid sh_entsize: 3 +# INVALID-ENT-SIZE-LLVM: 
DynamicSymbols [ +# INVALID-ENT-SIZE-LLVM-NEXT: warning: '[[FILE]]': section [index 1] has invalid sh_entsize: expected 2, but got 3 +# INVALID-ENT-SIZE-LLVM-NEXT: Symbol { +# INVALID-ENT-SIZE-LLVM-NEXT: Name: @ (0) +# INVALID-ENT-SIZE-LLVM-NEXT: Value: 0x0 +# INVALID-ENT-SIZE-LLVM-NEXT: Size: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Binding: Local (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Type: None (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Other: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Section: Undefined (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: } +# INVALID-ENT-SIZE-LLVM-NEXT: Symbol { +# INVALID-ENT-SIZE-LLVM-NEXT: Name: foo@ (1) +# INVALID-ENT-SIZE-LLVM-NEXT: Value: 0x0 +# INVALID-ENT-SIZE-LLVM-NEXT: Size: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Binding: Local (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Type: None (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Other: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Section: Undefined (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: } +# INVALID-ENT-SIZE-LLVM-NEXT: ] # INVALID-ENT-SIZE-LLVM: VersionSymbols [ # INVALID-ENT-SIZE-LLVM-NEXT: warning: '[[FILE]]': cannot read content of SHT_GNU_versym section with index 1: section [index 1] has an invalid sh_entsize: 3 # INVALID-ENT-SIZE-LLVM-NEXT: ] @@ -137,6 +163,8 @@ Type: SHT_GNU_versym Entries: [ 0 ] EntSize: 3 +DynamicSymbols: + - Name: foo ## Check we report a warning when the number of version entries does not match the number of symbols in the associated symbol table. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83393.276409.patch Type: text/x-patch Size: 4072 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:42:11 2020 From: llvm-commits at lists.llvm.org (Gil Rapaport via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:42:11 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: gilr added inline comments. 
================ Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:238 VPTransformState(unsigned VF, unsigned UF, LoopInfo *LI, DominatorTree *DT, - IRBuilder<> &Builder, VectorizerValueMap &ValueMap, - InnerLoopVectorizer *ILV, VPCallback &Callback) - : VF(VF), UF(UF), Instance(), LI(LI), DT(DT), Builder(Builder), + const TargetTransformInfo *TTI, IRBuilder<> &Builder, + VectorizerValueMap &ValueMap, InnerLoopVectorizer *ILV, ---------------- dmgreen wrote: > Ayal wrote: > > Too bad this requires passing TTI through the State everywhere. > > Perhaps storing TTI in the recipe would be somewhat better. > I've changed it to be stored there. It does mean multiple things are holding TTI. Let me know what you think. It seems that TTI is only used later for deciding whether to use a shuffle sequence or an intrinsic based on data available during planning. If so, then it would be best if the Planner calls TTI->useReductionIntrinsic() and records that boolean decision in the Recipe. This is also required in order to estimate in-loop reduction cost. This could be done separately. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Wed Jul 8 06:44:09 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:44:09 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: ABataev added a comment. In D82881#2137938 , @dblaikie wrote: > In D82881#2135641 , @ABataev wrote: > > > In D82881#2134913 , @dblaikie wrote: > > > > > In D82881#2133548 , @ABataev wrote: > > > > > > > In D82881#2133511 , @aprantl wrote: > > > > > > > > > And conversely, with this patch applied, do GDB and LLDB still produce the expected result? > > > > > > > > > > > > GDB works correctly. Did not check with lldb, but it also should work. 
The result is similar to the debug info, produced for the next code: > > > > > > > > struct { > > > > short : 3; > > > > short : 6; > > > > } a; > > > > > > > > > > > > > Similar, but seems different in a critical way - in that code, the type of the field is short, which has size 2. Which matches the size of the field. > > > > > > I think it would be pretty surprising to handle DWARF where the size of a field is different from the size of the type of that field? > > > > > > The standard clearly says: > > A base type entry has a DW_AT_byte_size attribute, whose value is a constant, describing the size in bytes of the storage unit used to represent an object of the given type. > > In our example, the storage size is the same, just like for short. > > > The storage size is the same as what? > > It looks like/my understanding is this patch produces a field of type 'char' with size 2 bytes - which seems surprisingly inconsistent, at least. If there are other pre-existing examples of fields having larger sizes than their types, that might be useful to draw analogy and confidence from. Found a discussion in the dwarf-discuss mailing list http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2011-September/003881.html. Looks like signed offsets are allowed in DWARF2. The bug I'm trying to fix is the incompatibility with NVPTX ptxas compiler. It does not allow signed integers in debug sections. Would it be good to emit bit_offset as `DW_FORM_udata` for NVPTX target to fix incompatibility? Checked that it works with ptxas. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Wed Jul 8 06:44:40 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:44:40 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. 
In-Reply-To: References: Message-ID: <485df04453b468e84893c767da0000b7@localhost.localdomain> ebevhan updated this revision to Diff 276410. ebevhan added a comment. Addressed review comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 Files: llvm/docs/LangRef.rst llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/include/llvm/IR/Intrinsics.td llvm/include/llvm/Target/TargetSelectionDAG.td llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/Verifier.cpp llvm/test/CodeGen/X86/sshl_sat.ll llvm/test/CodeGen/X86/ushl_sat.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83216.276410.patch Type: text/x-patch Size: 51004 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:46:23 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:46:23 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <2ac33786a91aab4cb4329073efbadb02@localhost.localdomain> ebevhan marked 5 inline comments as done. ebevhan added inline comments. 
================ Comment at: llvm/docs/LangRef.rst:14243 +'``llvm.sshl.sat.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + ---------------- lebedev.ri wrote: > This should be > ``` > '``llvm.sshl.sat.*``' Intrinsics > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > ``` I made the change to the rest of the saturating/fixedpoint intrinsics as well. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:809 + } else { + Op1Promoted = SExtPromotedInteger(Op1); + Op2Promoted = ZExtPromotedInteger(Op2); ---------------- lebedev.ri wrote: > Actually, why do we need to signext, or even zeroext it? > As the comment before function notes, we want anyext, we don't care about those new high bits, > because we are immediately going to shift them out. > > ``` > ---------------------------------------- > Name: promote ushl > %r = ushl_sat i8 %x, %y > ret i8 %r > => > %x_wide = zext i8 %x to i16 > %y_wide = zext i8 %y to i16 > %t0 = shl i16 %x_wide, 8 > %t1 = ushl_sat i16 %t0, %y_wide > %t2 = lshr i16 %t1, 8 > %r = trunc i16 %t2 to i8 > ret i8 %r > > Done: 1 > Transformation seems to be correct! > > ---------------------------------------- > Name: promote sshl > %r = sshl_sat i8 %x, %y > ret i8 %r > => > %x_wide = zext i8 %x to i16 > %y_wide = zext i8 %y to i16 > %t0 = shl i16 %x_wide, 8 > %t1 = sshl_sat i16 %t0, %y_wide > %t2 = ashr i16 %t1, 8 > %r = trunc i16 %t2 to i8 > ret i8 %r > > Done: 1 > Transformation seems to be correct! > > ``` > > So i think you want > ``` > SDValue Op1Promoted = GetPromotedInteger(Op1); > SDValue Op1Promoted = GetPromotedInteger(Op2); > unsigned ShiftOp = Opcode == ISD::USHLSAT ? ISD::SRL : ISD::SRA; > ``` > and maybe get rid of `ShiftOp` variable, or sink it closer to use. Ah, that's true. I grabbed it from the ADDSUBSAT promotion without thinking. That needs the proper extension due to the min/max expansion, I think. 
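The promotion argument in this thread can be checked outside the compiler. What follows is a minimal sketch, assuming the semantics used in the quoted proofs: shift the widened operand left by 8, do the saturating shift at 16 bits, shift back right (logical for unsigned, arithmetic for signed), then truncate. The helper names (`ushl_sat`, `sshl_sat`, the `*_promoted` wrappers, `promotion_matches_reference`) and the `junk` parameter are illustrative only and are not LLVM APIs. `junk` models whatever an any-extend leaves in the high byte of the first operand; the shift amount is passed through unchanged, i.e. zero-extended.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Reference semantics for a `bits`-wide unsigned saturating shift left.
// Assumes amt < bits.
uint32_t ushl_sat(uint32_t x, unsigned amt, unsigned bits) {
  const uint64_t max = (uint64_t{1} << bits) - 1;
  const uint64_t wide = uint64_t{x} << amt;
  return static_cast<uint32_t>(wide > max ? max : wide);
}

// Reference semantics for a `bits`-wide signed saturating shift left.
// Assumes amt < bits.
int32_t sshl_sat(int32_t x, unsigned amt, unsigned bits) {
  const int64_t max = (int64_t{1} << (bits - 1)) - 1;
  const int64_t min = -max - 1;
  // Multiply instead of shifting so negative inputs stay well defined.
  const int64_t wide = int64_t{x} * (int64_t{1} << amt);
  return static_cast<int32_t>(wide > max ? max : (wide < min ? min : wide));
}

// i8 unsigned saturating shift carried out at i16.
uint8_t ushl_sat8_promoted(uint8_t x, unsigned amt, uint8_t junk) {
  const uint32_t wide = (uint32_t{junk} << 8) | x; // any-extended operand
  const uint32_t t0 = (wide << 8) & 0xFFFFu;       // shl by 8 drops the junk
  const uint32_t t1 = ushl_sat(t0, amt, 16);       // saturating shift at i16
  return static_cast<uint8_t>(t1 >> 8);            // lshr by 8, then trunc
}

// i8 signed saturating shift carried out at i16.
int8_t sshl_sat8_promoted(int8_t x, unsigned amt, uint8_t junk) {
  const uint32_t wide = (uint32_t{junk} << 8) | static_cast<uint8_t>(x);
  const uint32_t t0 = (wide << 8) & 0xFFFFu;
  // Reinterpret the 16-bit pattern as a signed value.
  const int32_t t0s = t0 >= 0x8000u ? static_cast<int32_t>(t0) - 0x10000
                                    : static_cast<int32_t>(t0);
  const int32_t t1 = sshl_sat(t0s, amt, 16);
  // ashr by 8: right shift of a negative value is arithmetic on mainstream
  // compilers and guaranteed to be so from C++20 onwards.
  return static_cast<int8_t>(t1 >> 8);
}

// Exhaustively compares the promoted forms against the 8-bit reference.
bool promotion_matches_reference() {
  for (unsigned amt = 0; amt < 8; ++amt) {
    for (int v = 0; v < 256; ++v) {
      const uint8_t ux = static_cast<uint8_t>(v);
      const int8_t sx = static_cast<int8_t>(v - 128);
      for (uint8_t junk : {uint8_t{0x00}, uint8_t{0xAB}, uint8_t{0xFF}}) {
        if (ushl_sat8_promoted(ux, amt, junk) != ushl_sat(ux, amt, 8))
          return false;
        if (sshl_sat8_promoted(sx, amt, junk) != sshl_sat(sx, amt, 8))
          return false;
      }
    }
  }
  return true;
}
```

Calling `promotion_matches_reference()` walks every 8-bit value and every in-range shift amount with a few different high-byte patterns. This only exercises the arithmetic identity; it says nothing about how the legalizer should be structured.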
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 06:51:27 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:51:27 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways Message-ID: foad created this revision. foad added reviewers: arsenm, sameerds. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM. As explained in the comment: // For a FLAT instruction the hardware decides whether to access // global/scratch/shared memory based on the high bits of vaddr, // ignoring the offset field, so we have to ensure that when we add // remainder to vaddr it still points into the same underlying object. // The easiest way to do that is to make sure that we split the offset // into two pieces that are both >= 0 or both <= 0. In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have an unsigned immediate offset field, so we can't use it to help split a negative offset. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83394 Files: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp llvm/test/CodeGen/AMDGPU/offset-split-global.ll llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83394.276412.patch Type: text/x-patch Size: 13844 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:54:19 2020 From: llvm-commits at lists.llvm.org (Anirudh Prasad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:54:19 +0000 (UTC) Subject: [PATCH] D83251: [SystemZ] Allow specifying integer registers as part of the address calculation In-Reply-To: References: Message-ID: <92745c2d92b00bcceb80dc72dd9d3304@localhost.localdomain> anirudhp added a comment. @uweigand if you are happy with the patch and have no other comments for me to address, would you mind committing this? (as I don't have commit access) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83251/new/ https://reviews.llvm.org/D83251 From llvm-commits at lists.llvm.org Wed Jul 8 06:54:29 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:54:29 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. Message-ID: paulwalker-arm created this revision. Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett. Herald added a reviewer: rengolin. Herald added a reviewer: efriedma. Herald added a project: LLVM. Lower fixed length vector truncates to a sequence of SVE UZP1 instructions. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83395 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83395.276413.patch Type: text/x-patch Size: 18860 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 06:56:24 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 13:56:24 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <6d64145f3fc3630e5ed7c4f841313982@localhost.localdomain> lebedev.ri added a comment. Patch as-is looks good but i'm not sure what's the RFC status here. If these new intrinsics were already previously proposed as part of some RFC that got accepted, can you state that in the patch's description? (with link to the thread) ================ Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:809 + } else { + Op1Promoted = SExtPromotedInteger(Op1); + Op2Promoted = ZExtPromotedInteger(Op2); ---------------- ebevhan wrote: > lebedev.ri wrote: > > Actually, why do we need to signext, or even zeroext it? > > As the comment before function notes, we want anyext, we don't care about those new high bits, > > because we are immediately going to shift them out. > > > > ``` > > ---------------------------------------- > > Name: promote ushl > > %r = ushl_sat i8 %x, %y > > ret i8 %r > > => > > %x_wide = zext i8 %x to i16 > > %y_wide = zext i8 %y to i16 > > %t0 = shl i16 %x_wide, 8 > > %t1 = ushl_sat i16 %t0, %y_wide > > %t2 = lshr i16 %t1, 8 > > %r = trunc i16 %t2 to i8 > > ret i8 %r > > > > Done: 1 > > Transformation seems to be correct! > > > > ---------------------------------------- > > Name: promote sshl > > %r = sshl_sat i8 %x, %y > > ret i8 %r > > => > > %x_wide = zext i8 %x to i16 > > %y_wide = zext i8 %y to i16 > > %t0 = shl i16 %x_wide, 8 > > %t1 = sshl_sat i16 %t0, %y_wide > > %t2 = ashr i16 %t1, 8 > > %r = trunc i16 %t2 to i8 > > ret i8 %r > > > > Done: 1 > > Transformation seems to be correct! 
> > > > ``` > > > > So i think you want > > ``` > > SDValue Op1Promoted = GetPromotedInteger(Op1); > > SDValue Op1Promoted = GetPromotedInteger(Op2); > > unsigned ShiftOp = Opcode == ISD::USHLSAT ? ISD::SRL : ISD::SRA; > > ``` > > and maybe get rid of `ShiftOp` variable, or sink it closer to use. > Ah, that's true. I grabbed it from the ADDSUBSAT promotion without thinking. That needs the proper extension due to the min/max expansion, I think. Err, i was half-right i think. I guess we actually need to zext the shift amount. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 07:00:55 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:00:55 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. In-Reply-To: References: Message-ID: <680c995d3a4a049c38550099c4c682ff@localhost.localdomain> paulwalker-arm added a comment. In D83303#2137371 , @RKSimon wrote: > @paulwalker-arm Does this finally fix PR12772? @RKSimon: I believe so. I'll update the ticket once I've pushed this change. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83303/new/ https://reviews.llvm.org/D83303 From llvm-commits at lists.llvm.org Wed Jul 8 07:04:10 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:04:10 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <9d5564826204852679eae534dd1b4713@localhost.localdomain> dmgreen added a comment. It should be triggering relatively often, from looking at the tests/benchmarks I was seeing. Any code that uses pointer induction variables can trigger it. 
Whether that will be common will be a matter of the style of the original code, and whether it actually proves more NoAlias will be variable. But it seems to come up enough to make code improvements in a fair number of cases. From what I can tell it should be almost free though, in terms of compile time. Just an extra check of whether a phi is a simple recursion. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Wed Jul 8 07:07:10 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:07:10 +0000 (UTC) Subject: [PATCH] D78133: [PredicateInfo] Optionally set OriginalOp to renamed value it refers to. In-Reply-To: References: Message-ID: fhahn updated this revision to Diff 276418. fhahn retitled this revision from "[PredicateInfo] Optionally set OriginalOp to renamed value it refers to. " to "[PredicateInfo] Optionally set OriginalOp to renamed value it refers to.". fhahn added a comment. Add new RenamedOp to PredicateBase instead of having a separate option. >> In D78133#2131838 , @nikic wrote: >> >>> > > Why don't we include both variants? While NewGVN needs the current OriginalOp for the replacement instruction patching, it also needs what is being implemented here (let's call it CondOp) when originally handling the PredicateInfo. As this comment indicates, NewGVN currently has the same problem as SCCP: https://github.com/llvm/llvm-project/blob/2279380eab08219911910e1ecdcef3eacb0b7f0c/llvm/lib/Transforms/Scalar/NewGVN.cpp#L1565-L1572 > > As PredicateInfo structures aren't particularly memory critical, I'd suggest to include both `OriginalOp` (value prior to any renaming) and `CondOp` (value used inside the condition) and use them as appropriate. Initially I wanted to not increase memory consumption, but it's probably not a big deal.
I also had some patches further reducing the memory consumption here, but did not have time to follow up on them. Anyways, I've added a separate RenamedOp field. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78133/new/ https://reviews.llvm.org/D78133 Files: llvm/include/llvm/Transforms/Utils/PredicateInfo.h llvm/lib/Transforms/Utils/PredicateInfo.cpp Index: llvm/lib/Transforms/Utils/PredicateInfo.cpp =================================================================== --- llvm/lib/Transforms/Utils/PredicateInfo.cpp +++ llvm/lib/Transforms/Utils/PredicateInfo.cpp @@ -600,6 +600,9 @@ RenameIter == RenameStack.begin() ? OrigOp : (RenameIter - 1)->Def; ValueDFS &Result = *RenameIter; auto *ValInfo = Result.PInfo; + ValInfo->RenamedOp = (RenameStack.end() - Start) == RenameStack.begin() + ? OrigOp + : (RenameStack.end() - Start - 1)->Def; // For edge predicates, we can just place the operand in the block before // the terminator. For assume, we have to place it right before the assume // to ensure we dominate all of our uses. Always insert right before the Index: llvm/include/llvm/Transforms/Utils/PredicateInfo.h =================================================================== --- llvm/include/llvm/Transforms/Utils/PredicateInfo.h +++ llvm/include/llvm/Transforms/Utils/PredicateInfo.h @@ -79,6 +79,10 @@ // This can be use by passes, when destroying predicateinfo, to know // whether they can just drop the intrinsic, or have to merge metadata. Value *OriginalOp; + // The renamed operand in the condition used for this predicate. For nested + // predicates, this is different to OriginalOp which refers to the initial + // operand. + Value *RenamedOp; PredicateBase(const PredicateBase &) = delete; PredicateBase &operator=(const PredicateBase &) = delete; PredicateBase() = delete; -------------- next part -------------- A non-text attachment was scrubbed...
Name: D78133.276418.patch Type: text/x-patch Size: 1566 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:07:49 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:07:49 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <59fca09040bee2e59fa107880beb391c@localhost.localdomain> sameerarora101 marked 5 inline comments as done. sameerarora101 added inline comments. ================ Comment at: llvm/docs/CommandGuide/llvm-libtool-darwin.rst:17 + +For most scenarios, it works as a drop-in replacement for cctool's +:program:`libtool`. ---------------- smeenai wrote: > Nit: The package name is "cctools", so you should say "cctools'" (trailing apostrophe) instead of "cctool's". good catch, thanks 😊 ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:18 + +static cl::opt OutputFile("output", + cl::desc("Specify output filename"), ---------------- smeenai wrote: > As far as I can see, cctools libtool doesn't support the `-output` spelling, only `-o`. Is there any reason for us to support it? Yup, that is true. I was just looking at other llvm tools and they have `-output` in addition to `-o`. So I thought of adding both. I can remove `-output` if you guys prefer that? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Wed Jul 8 07:08:17 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:08:17 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <4a379218eeefa6e9e3bef600ae38782e@localhost.localdomain> jdenny added inline comments. 
================ Comment at: llvm/test/TableGen/directive1.td:120 +// IMPL-NEXT: } +// IMPL-NEXT: } break; +// IMPL-NEXT: default: ---------------- In most examples I've seen, the outer `case` would be formatted as: ``` case TDLD_dira: { . . . break; } ``` clang-format puts the `{` on the same line as the `case`. grep shows some code putting `break;` on the same line as the `}` and some code putting it on the line before. However, I did at least find [[ http://llvm.org/docs/WritingAnLLVMBackend.html#instruction-selector | one example in LLVM docs ]] showing the way I'm suggesting above. Alternatively, [[ http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return | as in this example ]], couldn't those braces be dropped given that there are no local declarations? ================ Comment at: llvm/test/TableGen/directive1.td:124 +// IMPL-NEXT: } +// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } ---------------- The unreachable message doesn't make sense given the `default` in the directive switch. If that switch covers all directives, `default` isn't needed anyway. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:255 const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); - GenerateTestForAllowedClauses(RequiredClauses, OS, DirectiveName, + GenerateCaseForAllowedClauses(RequiredClauses, OS, DirectiveName, DirectivePrefix, ClausePrefix); ---------------- Maybe `GenerateCaseForVersionedClauses` given that it's not just `allowedClauses`? 
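For reference, the brace-free shape suggested above might look roughly like the following. This is only a sketch under assumptions: the `Directive` and `Clause` enums and the version ranges are hypothetical stand-ins, not the identifiers or values the TableGen emitter actually produces. Each outer `case` contains nothing but an inner `switch` whose every path returns, so neither braces nor `break;` are required, and the outer switch has no `default:` because it covers every enumerator.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generated directive and clause enums.
enum class Directive { DirA, DirB };
enum class Clause { ClauseA, ClauseB, ClauseC };

bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) {
  switch (D) {
  case Directive::DirA:
    // No local declarations here, so the case body needs no braces.
    switch (C) {
    case Clause::ClauseA:
      return Version >= 1; // illustrative version range
    case Clause::ClauseB:
      return Version >= 2 && Version <= 4; // illustrative version range
    default:
      return false;
    }
  case Directive::DirB:
    switch (C) {
    case Clause::ClauseC:
      return true;
    default:
      return false;
    }
  }
  // Every directive is handled above, so this point is only reached for an
  // out-of-range enum value. The generated code uses llvm_unreachable here;
  // a plain return keeps this sketch free of LLVM headers.
  return false;
}
```

With this layout an unreachable marker after the outer switch is meaningful again, since a missing directive case would then show up as a compiler warning about an unhandled enumerator rather than being swallowed by `default:`.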
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Wed Jul 8 07:08:22 2020 From: llvm-commits at lists.llvm.org (Sidharth Baveja via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:08:22 +0000 (UTC) Subject: [PATCH] D82927: Intergerate Loop Peeling into Loop Fusion In-Reply-To: References: Message-ID: <7e2ec9e645ac8d3f8d03ede02edd1412@localhost.localdomain> sidbav updated this revision to Diff 276419. sidbav added a comment. Address Bardia's comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82927/new/ https://reviews.llvm.org/D82927 Files: llvm/lib/Transforms/Scalar/LoopFuse.cpp llvm/test/Transforms/LoopFusion/guarded_peel.ll llvm/test/Transforms/LoopFusion/guarded_unsafeblock_peel.ll llvm/test/Transforms/LoopFusion/nonadjacent_peel.ll llvm/test/Transforms/LoopFusion/peel.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82927.276419.patch Type: text/x-patch Size: 38716 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:09:44 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:09:44 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <49518dbd01025f233a59a8c65d7ae2b5@localhost.localdomain> sameerarora101 added a comment. @smeenai I have also moved `OptionCategory` and `hiding-unrelated-options.test` to this diff instead of D83002 that adds the support for `-static`. 
Thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Wed Jul 8 07:10:16 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:10:16 +0000 (UTC) Subject: [PATCH] D77808: [SCCP] Use conditional info with AND/OR branch conditions. In-Reply-To: References: Message-ID: fhahn updated this revision to Diff 276420. fhahn added a comment. Rebased after updating D78133 . Adjust comment/naming. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77808/new/ https://reviews.llvm.org/D77808 Files: llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/conditions-ranges.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D77808.276420.patch Type: text/x-patch Size: 5678 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:11:37 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:11:37 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: sameerarora101 updated this revision to Diff 276422. sameerarora101 added a comment. 
Add `OptionCategory` and `hide-unrelated-options.test` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 Files: llvm/docs/CommandGuide/index.rst llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/CMakeLists.txt llvm/test/tools/llvm-libtool-darwin/Inputs/input1.yaml llvm/test/tools/llvm-libtool-darwin/Inputs/input2.yaml llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/help-message.test llvm/test/tools/llvm-libtool-darwin/hide-unrelated-options.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82923.276422.patch Type: text/x-patch Size: 12257 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:13:42 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:13:42 +0000 (UTC) Subject: [PATCH] D60413: [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses In-Reply-To: References: Message-ID: dnsampaio updated this revision to Diff 276423. dnsampaio added a comment. Fixed test location and command Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 Files: llvm/lib/Transforms/Scalar/BDCE.cpp llvm/test/Transforms/BDCE/sext_multi_uses.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D60413.276423.patch Type: text/x-patch Size: 5431 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:18:16 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:18:16 +0000 (UTC) Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for inputs In-Reply-To: References: Message-ID: <927b0a63934b569ed9b0cb594d63f8c9@localhost.localdomain> Petar.Avramovic updated this revision to Diff 276425. Petar.Avramovic retitled this revision from "[GlobalISel][InlineAsm] Fix buildCopy for matching input constraints" to "[GlobalISel][InlineAsm] Fix buildCopy for inputs". Petar.Avramovic added a reviewer: kschwarz. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83384/new/ https://reviews.llvm.org/D83384 Files: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll Index: llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll =================================================================== --- llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll +++ llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll @@ -211,3 +211,35 @@ %1 = tail call i32 asm "ldr $0, $1", "=r,*m"(i32* %a) ret i32 %1 } + +define i16 @test_anyext_input() { + ; CHECK-LABEL: name: test_anyext_input + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 1 + ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[C]](s16) + ; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY [[ANYEXT]](s32) + ; CHECK: INLINEASM &"", 1 /* sideeffect attdialect */, 655370 /* regdef:GPR32common */, def %0, 9 /* reguse */, [[COPY]] + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY %0 + ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + ; CHECK: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[TRUNC]](s16) + ; CHECK: $w0 = COPY [[ANYEXT1]](s32) + ; CHECK: RET_ReallyLR implicit $w0 + %1 = call i16 asm sideeffect "", "=r,r"(i16 1) + 
ret i16 %1 +} + +define i16 @test_anyext_input_with_matching_constraint() { + ; CHECK-LABEL: name: test_anyext_input_with_matching_constraint + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 1 + ; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[C]](s16) + ; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY [[ANYEXT]](s32) + ; CHECK: INLINEASM &"", 1 /* sideeffect attdialect */, 655370 /* regdef:GPR32common */, def %0, 2147483657 /* reguse tiedto:$0 */, [[COPY]](tied-def 3) + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY %0 + ; CHECK: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) + ; CHECK: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[TRUNC]](s16) + ; CHECK: $w0 = COPY [[ANYEXT1]](s32) + ; CHECK: RET_ReallyLR implicit $w0 + %1 = call i16 asm sideeffect "", "=r,0"(i16 1) + ret i16 %1 +} Index: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp =================================================================== --- llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp +++ llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp @@ -237,6 +237,31 @@ return InlineAsm::getNumOperandRegisters(Flag); } +static bool buildAnyextOrCopy(Register Dst, Register Src, + MachineIRBuilder &MIRBuilder) { + const TargetRegisterInfo *TRI = + MIRBuilder.getMF().getSubtarget().getRegisterInfo(); + MachineRegisterInfo *MRI = MIRBuilder.getMRI(); + + unsigned SrcSize = TRI->getRegSizeInBits(Src, *MRI); + unsigned DstSize = TRI->getRegSizeInBits(Dst, *MRI); + if (DstSize < SrcSize) { + LLVM_DEBUG(dbgs() << "Input can't fit in destination reg class\n"); + return false; + } + // Attempt to anyext small scalar sources. 
+ if (DstSize > SrcSize) { + if (!MRI->getType(Src).isValid() || !MRI->getType(Src).isScalar()) { + LLVM_DEBUG(dbgs() << "Can't extend input to size of destination" + " reg class\n"); + return false; + } + Src = MIRBuilder.buildAnyExt(LLT::scalar(DstSize), Src).getReg(0); + } + MIRBuilder.buildCopy(Dst, Src); + return true; +} + bool InlineAsmLowering::lowerInlineAsm( MachineIRBuilder &MIRBuilder, const CallBase &Call, std::function(const Value &Val)> GetOrCreateVRegs) @@ -427,7 +452,8 @@ ArrayRef SrcRegs = GetOrCreateVRegs(*OpInfo.CallOperandVal); assert(SrcRegs.size() == 1 && "Single register is expected here"); Register Tmp = MRI->createVirtualRegister(RC); - MIRBuilder.buildCopy(Tmp, SrcRegs[0]); + if (!buildAnyextOrCopy(Tmp, SrcRegs[0], MIRBuilder)) + return false; // Add Flag and input register operand (Tmp) to Inst. Tie Tmp to Def. unsigned UseFlag = InlineAsm::getFlagWord(InlineAsm::Kind_RegUse, 1); @@ -525,7 +551,8 @@ unsigned Flag = InlineAsm::getFlagWord(InlineAsm::Kind_RegUse, NumRegs); Inst.addImm(Flag); - MIRBuilder.buildCopy(OpInfo.Regs[0], SourceRegs[0]); + if (!buildAnyextOrCopy(OpInfo.Regs[0], SourceRegs[0], MIRBuilder)) + return false; Inst.addReg(OpInfo.Regs[0]); break; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83384.276425.patch Type: text/x-patch Size: 4232 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:19:34 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:19:34 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: arsenm added a comment. 
The mirror change is needed for globalisel Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Wed Jul 8 07:20:37 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:20:37 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex Message-ID: echristo created this revision. echristo added a reviewer: DavidTruby. echristo added a project: Flang. Herald added subscribers: llvm-commits, mcrosier. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. Fixing warning: '_Complex' is a C99 extension [-Wc99-extensions]. The functions are currently unused so no real testing has happened past build time, however, complex is strictly compatible with _Complex in the C++ standard and can be reinterpret casted in uses. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83397 Files: flang/lib/Lower/RTBuilder.h Index: flang/lib/Lower/RTBuilder.h =================================================================== --- flang/lib/Lower/RTBuilder.h +++ flang/lib/Lower/RTBuilder.h @@ -22,6 +22,7 @@ #include "mlir/IR/MLIRContext.h" #include "mlir/IR/StandardTypes.h" #include "llvm/ADT/SmallVector.h" +#include #include // List the runtime headers we want to be able to dissect @@ -158,13 +159,13 @@ } template <> -constexpr TypeBuilderFunc getModel() { +constexpr TypeBuilderFunc getModel>() { return [](mlir::MLIRContext *context) -> mlir::Type { return fir::CplxType::get(context, sizeof(float)); }; } template <> -constexpr TypeBuilderFunc getModel() { +constexpr TypeBuilderFunc getModel>() { return [](mlir::MLIRContext *context) -> mlir::Type { return fir::CplxType::get(context, sizeof(double)); }; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83397.276424.patch Type: text/x-patch Size: 925 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:22:00 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Wed, 8 Jul 2020 07:22:00 -0700 Subject: [PATCH] D82387: [flang] add RTBuilder In-Reply-To: References: <7291fbdeedbc2408b140944fa7971cfe@localhost.localdomain> Message-ID: Sent out https://reviews.llvm.org/D83397 -eric On Tue, Jul 7, 2020 at 10:45 AM Eric Christopher wrote: > ... you're the one that put the review up and didn't credit anyone else. > Can you fix this then? > > Thanks. > > -eric > > On Tue, Jul 7, 2020 at 10:10 AM Eric Schweitz (PGI) < > eric.schweitz at pgroup.com> wrote: > >> The original author of this particular code knows of the issue, but is >> currently out on vacation. >> >> >> >> Thanks, >> >> Eric >> >> >> >> *From:* Eric Christopher >> *Sent:* Monday, July 6, 2020 2:40 PM >> *To:* reviews+D82387+public+36dcff9b060bae84 at reviews.llvm.org >> *Cc:* Eric Schweitz (PGI) ; Jean Perier < >> jperier at nvidia.com>; Steve Scalpone ; >> kiranchandramohan at gmail.com; clementval at gmail.com; Doerfert, Johannes < >> jdoerfert at anl.gov>; David Truby ; >> gsocsameeran at gmail.com; River Riddle ; >> stephen.neuendorffer at gmail.com; llvm-commits ; >> 88888yl at gmail.com; Samuel.j.knapp at btinternet.com; Peter Steinfeld < >> psteinfeld at nvidia.com>; aperry at lanl.gov; Timothy Keith ; >> Sourabh Singh Tomar ; Isuru Fernando < >> isuruf at gmail.com>; Valentin Churavy ; >> uday at polymagelabs.com >> *Subject:* Re: [PATCH] D82387: [flang] add RTBuilder >> >> >> >> Agreed. This should be fixed :) >> >> >> >> -eric >> >> >> >> On Mon, Jul 6, 2020 at 12:54 PM David Truby via Phabricator < >> reviews at reviews.llvm.org> wrote: >> >> DavidTruby added a comment. >> >> Is there a reason to use `float _Complex` here at all? 
The C++ standard >> (29.5.4 of C++17) guarantees that `std::complex` and `float >> _Complex` are layout compatible and can be reinterpret_casted to each other >> so even if these functions are intended to be callable from C/interoperable >> with _Complex in C code, it'd be better to use std::complex on the >> C++ side. >> >> >> Repository: >> rG LLVM Github Monorepo >> >> CHANGES SINCE LAST ACTION >> https://reviews.llvm.org/D82387/new/ >> >> https://reviews.llvm.org/D82387 >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:23:29 2020 From: llvm-commits at lists.llvm.org (Rainer Orth via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:23:29 +0000 (UTC) Subject: [PATCH] D82416: [SVE] Make Constant::getSplatValue work for scalable vector splats In-Reply-To: References: Message-ID: <09782b2d11873fce9e070a69b292f768@localhost.localdomain> ro added a comment. This patch broke `ninja check-all` on the Solaris buildbots, e.g. 
clang-solaris11-sparcv9 : [530/724] Building CXX object unittests/IR/CMakeFiles/IRTests.dir/ConstantsTest.cpp.o FAILED: unittests/IR/CMakeFiles/IRTests.dir/ConstantsTest.cpp.o /opt/llvm-buildbot/bin/c++ -DGTEST_HAS_RTTI=0 -DGTEST_HAS_TR1_TUPLE=0 -DGTEST_LANG_CXX11=1 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iunittests/IR -I/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/unittests/IR -Iinclude -I/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/include -I/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/include/llvm/Support/Solaris -I/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/utils/unittest/googletest/include -I/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/utils/unittest/googlemock/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -Wno-variadic-macros -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT unittests/IR/CMakeFiles/IRTests.dir/ConstantsTest.cpp.o -MF unittests/IR/CMakeFiles/IRTests.dir/ConstantsTest.cpp.o.d -o unittests/IR/CMakeFiles/IRTests.dir/ConstantsTest.cpp.o -c /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/unittests/IR/ConstantsTest.cpp In file included from /usr/include/sys/select.h:27:0, from /usr/include/sys/types.h:665, from /usr/include/sys/wait.h:12, from /usr/include/stdlib.h:16, from /opt/llvm-buildbot/include/c++/7.4.0/cstdlib:75, from /opt/llvm-buildbot/include/c++/7.4.0/bits/stl_algo.h:59, from /opt/llvm-buildbot/include/c++/7.4.0/algorithm:62, from 
/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/include/llvm/Support/MathExtras.h:17, from /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/include/llvm/ADT/APInt.h:19, from /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/include/llvm/ADT/APFloat.h:19, from /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/include/llvm/IR/Constants.h:23, from /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/unittests/IR/ConstantsTest.cpp:9: /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/unittests/IR/ConstantsTest.cpp: In member function ‘virtual void llvm::{anonymous}::ConstantsTest_GetSplatValueRoundTrip_Test::TestBody()’: /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/unittests/IR/ConstantsTest.cpp:649:18: error: expected unqualified-id before numeric constant ElementCount SEC = {Min, true}; ^ /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/unittests/IR/ConstantsTest.cpp:652:29: error: unable to deduce ‘std::initializer_list&&’ from ‘{1, FEC}’ for (auto EC : {SEC, FEC}) { ^ /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/unittests/IR/ConstantsTest.cpp:652:29: note: deduced conf `` has `#define SEC 1` Please fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82416/new/ https://reviews.llvm.org/D82416 From llvm-commits at lists.llvm.org Wed Jul 8 07:24:55 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:24:55 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <7ab74fccf4c84d9f150580e9511bec1d@localhost.localdomain> DavidTruby accepted this revision. DavidTruby added a comment. This revision is now accepted and ready to land. 
LGTM, @schweitz might want to check in case there was a specific reason he was using _Complex previously. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 07:25:09 2020 From: llvm-commits at lists.llvm.org (Simon Wallis via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:25:09 +0000 (UTC) Subject: [PATCH] D82638: [MachineCopyPropagation] BackwardPropagatableCopy: add check for hasOverlappingMultipleDef In-Reply-To: References: Message-ID: <9eb9d6b6945c9647d6eac4a8ec99e564@localhost.localdomain> simonwallis2 added a comment. Please review the recent changes made since the last review comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82638/new/ https://reviews.llvm.org/D82638 From llvm-commits at lists.llvm.org Wed Jul 8 07:28:07 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:28:07 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: <127ccbf9cdc5bf218354d3b26222af49@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705 + // Use signed division by a power of two to truncate towards 0. 
+ int64_t D = 1LL << (NumBits - 1); + RemainderOffset = (static_cast(COffsetVal) / D) * D; ---------------- This limitation also only needs to be applied if AS == FLAT_ADDRESS Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Wed Jul 8 07:30:06 2020 From: llvm-commits at lists.llvm.org (Ties Stuij via llvm-commits) Date: Wed, 08 Jul 2020 07:30:06 -0700 (PDT) Subject: [llvm] 26a2247 - [CodeGen] Don't combine extract + concat vectors with non-legal types Message-ID: <5f05d86e.1c69fb81.5b739.053a@mx.google.com> Author: Ties Stuij Date: 2020-07-08T15:29:57+01:00 New Revision: 26a22478cdfe6fe4d169320910c38958d5dafc38 URL: https://github.com/llvm/llvm-project/commit/26a22478cdfe6fe4d169320910c38958d5dafc38 DIFF: https://github.com/llvm/llvm-project/commit/26a22478cdfe6fe4d169320910c38958d5dafc38.diff LOG: [CodeGen] Don't combine extract + concat vectors with non-legal types Summary: The following combine currently breaks in the DAGCombiner: ``` extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x -> extract_vector_elt a, x ``` This happens because after we have combined these nodes we have inserted nodes that use individual instances of the vector element type. In the above example i16. However this isn't a legal type on all backends, and when the combining pass calls the legalizer it breaks as it expects types to already be legal. The type legalizer has already been run, and running it again would make a mess of the nodes. In the example code at least, the generated code is still efficient after the change. 
Reviewers: miyuki, arsenm, dmgreen, lebedev.ri Reviewed By: miyuki, lebedev.ri Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83231 Added: llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 4042a81b9cb7..a1d5769369bb 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -17843,8 +17843,11 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) { Elt = (Idx < (int)NumElts) ? Idx : Idx - (int)NumElts; Index = DAG.getConstant(Elt, DL, Index.getValueType()); } - } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && - !BCNumEltsChanged && VecVT.getVectorElementType() == ScalarVT) { + } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && !BCNumEltsChanged && + VecVT.getVectorElementType() == ScalarVT && + (!LegalTypes || + TLI.isTypeLegal( + VecOp.getOperand(0).getValueType().getVectorElementType()))) { // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 0 // -> extract_vector_elt a, 0 // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 1 diff --git a/llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll b/llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll new file mode 100644 index 000000000000..1662e27ecdef --- /dev/null +++ b/llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll @@ -0,0 +1,17 @@ +; RUN: llc -asm-verbose=0 -mtriple aarch64-arm-none-eabi < %s | FileCheck %s + +; The following code previously broke in the DAGCombiner. 
Specifically, trying to combine: +; extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x +; -> extract_vector_elt a, x + +define half @test_combine_extract_concat_vectors(<4 x i16> %a) nounwind { +entry: + %0 = shufflevector <4 x i16> %a, <4 x i16> undef, <8 x i32> + %1 = bitcast <8 x i16> %0 to <8 x half> + %2 = extractelement <8 x half> %1, i32 3 + ret half %2 +} + +; CHECK-LABEL: test_combine_extract_concat_vectors: +; CHECK-NEXT: mov h0, v0.h[3] +; CHECK-NEXT: ret From llvm-commits at lists.llvm.org Wed Jul 8 07:30:14 2020 From: llvm-commits at lists.llvm.org (Ties Stuij via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:30:14 +0000 (UTC) Subject: [PATCH] D83231: [CodeGen] Don't combine extract + concat vectors with non-legal types In-Reply-To: References: Message-ID: <87fdb74cd0f85cd0ea324798ebe1d066@localhost.localdomain> This revision was automatically updated to reflect the committed changes. stuij marked an inline comment as done. Closed by commit rG26a22478cdfe: [CodeGen] Don't combine extract + concat vectors with non-legal types (authored by stuij). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83231/new/ https://reviews.llvm.org/D83231 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll Index: llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AArch64/regress-combine-extract-vectors.ll @@ -0,0 +1,17 @@ +; RUN: llc -asm-verbose=0 -mtriple aarch64-arm-none-eabi < %s | FileCheck %s + +; The following code previously broke in the DAGCombiner. 
Specifically, trying to combine: +; extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x +; -> extract_vector_elt a, x + +define half @test_combine_extract_concat_vectors(<4 x i16> %a) nounwind { +entry: + %0 = shufflevector <4 x i16> %a, <4 x i16> undef, <8 x i32> + %1 = bitcast <8 x i16> %0 to <8 x half> + %2 = extractelement <8 x half> %1, i32 3 + ret half %2 +} + +; CHECK-LABEL: test_combine_extract_concat_vectors: +; CHECK-NEXT: mov h0, v0.h[3] +; CHECK-NEXT: ret Index: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -17843,8 +17843,11 @@ Elt = (Idx < (int)NumElts) ? Idx : Idx - (int)NumElts; Index = DAG.getConstant(Elt, DL, Index.getValueType()); } - } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && - !BCNumEltsChanged && VecVT.getVectorElementType() == ScalarVT) { + } else if (VecOp.getOpcode() == ISD::CONCAT_VECTORS && !BCNumEltsChanged && + VecVT.getVectorElementType() == ScalarVT && + (!LegalTypes || + TLI.isTypeLegal( + VecOp.getOperand(0).getValueType().getVectorElementType()))) { // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 0 // -> extract_vector_elt a, 0 // extract_vector_elt (concat_vectors v2i16:a, v2i16:b), 1 -------------- next part -------------- A non-text attachment was scrubbed... Name: D83231.276428.patch Type: text/x-patch Size: 1874 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:30:50 2020 From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:30:50 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <9d2a7f23e40ce21f84fa504e3dc39542@localhost.localdomain> schweitz requested changes to this revision. schweitz added a comment. This revision now requires changes to proceed. 
This will not compile in our builds. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 07:32:21 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:32:21 +0000 (UTC) Subject: [PATCH] D83336: [flang] Support for image selectors In-Reply-To: References: Message-ID: <698dd3b07785b57bc5c75630d54bf999@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG15fa287b64d0: [flang] Support for image selectors (authored by PeteSteinfeld). Changed prior to commit: https://reviews.llvm.org/D83336?vs=276182&id=276429#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83336/new/ https://reviews.llvm.org/D83336 Files: flang/include/flang/Parser/tools.h flang/lib/Parser/tools.cpp flang/lib/Semantics/check-coarray.cpp flang/lib/Semantics/check-coarray.h flang/lib/Semantics/expression.cpp flang/test/Semantics/resolve94.f90 -------------- next part -------------- A non-text attachment was scrubbed... Name: D83336.276429.patch Type: text/x-patch Size: 9435 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:33:40 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:33:40 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: foad marked an inline comment as done. foad added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705 + // Use signed division by a power of two to truncate towards 0. 
+ int64_t D = 1LL << (NumBits - 1); + RemainderOffset = (static_cast(COffsetVal) / D) * D; ---------------- arsenm wrote: > This limitation also only needs to be applied if AS == FLAT_ADDRESS The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing?? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Wed Jul 8 07:33:55 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:33:55 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <11e41cba55073783b4e491885b9c93cc@localhost.localdomain> DavidTruby added a comment. In D83397#2139078 , @schweitz wrote: > This will not compile in our builds. Could you give a little more info here? What's the error? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 07:34:08 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:34:08 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <2391dfe79a9c1b595072408a8c52c814@localhost.localdomain> echristo added a comment. In whose builds and why? What changes would you like? What compiler are you using? Can you provide any more information? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 07:34:45 2020 From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:34:45 +0000 (UTC) Subject: [PATCH] D83355: [flang] upstream intrinsic call lowering In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG24b62f28c5da: [flang] Upstreaming intrinsic call lowering. (authored by schweitz). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83355/new/ https://reviews.llvm.org/D83355 Files: flang/include/flang/Lower/CharacterExpr.h flang/include/flang/Lower/IntrinsicCall.h flang/include/flang/Lower/Mangler.h flang/include/flang/Optimizer/Dialect/FIRType.h flang/lib/Lower/CMakeLists.txt flang/lib/Lower/CharacterExpr.cpp flang/lib/Lower/IntrinsicCall.cpp flang/lib/Lower/Mangler.cpp flang/lib/Optimizer/Dialect/FIRType.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83355.276431.patch Type: text/x-patch Size: 81345 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:37:45 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:37:45 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: ebevhan updated this revision to Diff 276432. ebevhan added a comment. Fixed review comment and updated summary. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 Files: llvm/docs/LangRef.rst llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/include/llvm/IR/Intrinsics.td llvm/include/llvm/Target/TargetSelectionDAG.td llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/Verifier.cpp llvm/test/CodeGen/X86/sshl_sat.ll llvm/test/CodeGen/X86/ushl_sat.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83216.276432.patch Type: text/x-patch Size: 51125 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:40:05 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:40:05 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <668662cb45589bb1e6de511fac6fa9d7@localhost.localdomain> lebedev.ri added a comment. In D83216#2139098 , @ebevhan wrote: > Fixed review comment and updated summary. 
(note that updating commit msg does not automatically update description in phab differential) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 07:42:03 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:42:03 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: arsenm added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705 + // Use signed division by a power of two to truncate towards 0. + int64_t D = 1LL << (NumBits - 1); + RemainderOffset = (static_cast(COffsetVal) / D) * D; ---------------- foad wrote: > arsenm wrote: > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing?? Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Wed Jul 8 07:42:50 2020 From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:42:50 +0000 (UTC) Subject: [PATCH] D82638: [MachineCopyPropagation] BackwardPropagatableCopy: add check for hasOverlappingMultipleDef In-Reply-To: References: Message-ID: lkail added a comment. 
Hi @simonwallis2, you can create an MIR test with `llc -simplify-mir -verify-machineinstrs -mtriple=arm-eabi -stop-before=machine-cp foo.ll -o foo.mir` and then add a RUN line for it: `RUN: llc -simplify-mir -verify-machineinstrs -mtriple=arm-eabi -run-pass=machine-cp %s -o - | FileCheck %s` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82638/new/ https://reviews.llvm.org/D82638 From llvm-commits at lists.llvm.org Wed Jul 8 07:44:31 2020 From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:44:31 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <4025ebb99807c8f9791517f77a821762@localhost.localdomain> schweitz added a comment. Hi Eric, There is an active development branch for the flang middle end. https://github.com/flang-compiler/f18-llvm-project/tree/fir-dev That code base is being upstreamed piecemeal. Not all of the code is upstreamed at this point. It is simply a false impression that code in the middle of being upstreamed is "unused" or "unnecessary". Since not all of it is upstreamed, changing interfaces and support code in llvm-project directly is going to cause problems that can become hard to track and resolve while the upstreaming is ongoing. Besides, there are plenty of compilation warnings in LLVM today, and many of them, from personal experience, have persisted for months before being resolved. This is not a unique case. Is it possible to wait for the author to implement a solution? Or if not, can we just disable the warning?
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 07:46:09 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:46:09 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: <9e14270c60971b1e3fdde5bd08c39450@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705 + // Use signed division by a power of two to truncate towards 0. + int64_t D = 1LL << (NumBits - 1); + RemainderOffset = (static_cast(COffsetVal) / D) * D; ---------------- arsenm wrote: > foad wrote: > > arsenm wrote: > > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > > The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing?? > Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions Actually pre-gfx9 also didn't have flat offsets. However gfx10 does have a bug with flat offsets, so I think it would still be correct to model this correctly. 
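For reference, the signed split quoted from AMDGPUISelDAGToDAG.cpp in the comment above can be sketched in isolation. This is a minimal sketch, not the patch itself: `D`, `RemainderOffset`, `COffsetVal` and `NumBits` follow the quoted hunk, while the `splitSignedOffset` helper, the `SplitOffset` struct and its `Imm` member are hypothetical names introduced here for illustration, and it assumes `NumBits` is the width of a signed immediate field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the offset is expressed as Remainder + Imm.
struct SplitOffset {
  int64_t Remainder; // multiple of D, to be materialized separately
  int64_t Imm;       // part left over for the immediate offset field
};

// Hypothetical helper mirroring the two quoted lines of the hunk.
// NumBits is assumed to be the width of a signed immediate field.
SplitOffset splitSignedOffset(int64_t COffsetVal, unsigned NumBits) {
  assert(NumBits >= 1 && NumBits <= 63 && "shift below must stay in range");
  // Signed division by a power of two truncates towards 0 (C++11 and later),
  // so Remainder never overshoots the original offset in magnitude.
  int64_t D = int64_t(1) << (NumBits - 1);
  int64_t RemainderOffset = (COffsetVal / D) * D;
  // The leftover has the same sign as COffsetVal and magnitude below D.
  return {RemainderOffset, COffsetVal - RemainderOffset};
}
```

With `NumBits = 13` (so `D = 4096`), an offset of 5000 splits into 4096 + 904 and an offset of -5000 splits into -4096 + (-904). Because the division truncates towards zero rather than rounding down, the leftover keeps the sign of the original offset, which appears to be the property the "truncate towards 0" comment in the hunk relies on.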
The instruction patterns do accept either (and global instructions are only preferred through pattern priority) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Wed Jul 8 07:46:39 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:46:39 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: amyk accepted this revision as: amyk. amyk added a comment. This revision is now accepted and ready to land. This LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 From llvm-commits at lists.llvm.org Wed Jul 8 07:49:04 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:49:04 +0000 (UTC) Subject: [PATCH] D83288: [LV] Pick vector loop body as insert point for SCEV expansion. In-Reply-To: References: Message-ID: <4661f0b64c3224fd6367fcd1cb2f31aa@localhost.localdomain> fhahn updated this revision to Diff 276435. fhahn added a comment. In D83288#2135681, @dmgreen wrote: > Is the DT reliable enough to use for checking the block is in the loop? I see we might have to exclude the preheader and midblock. But if it's not up to date, and only knows about split block, it might think the midblock is dominated by the vector body. > Maybe if LI->getLoopFor(LoopVectorBody) == LI->getLoopFor(InsertBB) works, that may be better? It looks like LI is kept up to date as blocks get split. We mainly want to avoid using LoopVectorBody as insert point for blocks not dominated by it. The new blocks will be unreachable initially and dominated by everything, so it should work fine.
That being said, LI is updated directly as you mentioned, so it is probably better to just use LI. Updated the patch, thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83288/new/ https://reviews.llvm.org/D83288 Files: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83288.276435.patch Type: text/x-patch Size: 8223 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:49:28 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:49:28 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <2754991427b963904458c4e041e3f2d8@localhost.localdomain> ebevhan added a comment. In D83216#2138974 , @lebedev.ri wrote: > Patch as-is looks good but i'm not sure what's the RFC status here. > If these new intrinsics were already previously proposed as part of some RFC that got accepted, > can you state that in the patch's description? (with link to the thread) I added the links to the threads I mentioned earlier. Looking back at the full discussion, it doesn't really seem like any real consensus regarding **how** to implement the types was reached, but the prevailing view was that an altogether new IR type was the best approach. I don't think either I or Leonard thought that was the right (or fastest, at least) way to go, though. The final listing of intrinsics was in http://lists.llvm.org/pipermail/llvm-dev/2018-September/126311.html but the design has diverged a bit from that since then. In D83216#2139104 , @lebedev.ri wrote: > In D83216#2139098 , @ebevhan wrote: > > > Fixed review comment and updated summary. > > > (note that updating commit msg does not automatically update description in phab differential) I noticed! 
I was looking for an arc option to do that but couldn't seem to find one. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 07:49:56 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:49:56 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <4b70cbb36614bc0a58874383a5b4b7b6@localhost.localdomain> echristo added a comment. In D83397#2139111 , @schweitz wrote: > Hi Eric, > > There is an active development branch for the flang middle end. https://github.com/flang-compiler/f18-llvm-project/tree/fir-dev That's not part of the llvm project. > That code base is being upstreamed piecemeal. Not all of the code is upstreamed at this point. It is simply a false impression that code in the middle of being upstreamed is "unused" or "unnecessary". Since it not all of it is upstreamed, changing interfaces and support code in llvm-project directly is going to cause problems that can become hard to track and resolve while the upstreaming is ongoing. It very much is unused and unnecessary as there are no pieces of that code in the repository. > Besides, there are plenty of compilation warnings in LLVM today, and many of them from personal experience have persisted for months before being resolved. This is not a unique case. In general, I'd raise exception with "there are plenty of compilation warnings in LLVM today" as it's just not true - at least not with clang as I run a full build with Werror and fix any that slip in. Happy to have anything you're seeing fixed, but warnings with errors turned on are even on bots. Do you have a bot that shows a warning that's erroring out? This is the only warning that shows up in my full build with clang. > Is it possible to wait for the author to implement a solution? 
Or if not, can we just disable the warning? What solution? What compiler is causing issues? If it's a supported compiler I'm happy to help figure it out, but this has been going on for a bit and it's a pair of one line fixes. -eric Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 07:51:13 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:51:13 +0000 (UTC) Subject: [PATCH] D81267: [LV] Enable the LoopVectorizer to create pointer inductions In-Reply-To: References: Message-ID: <7af8924153349a7896e202d28000f8c5@localhost.localdomain> dmgreen added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1033 ---------------- Latch might be a better name here. ================ Comment at: llvm/test/Transforms/LoopVectorize/pointer-induction.ll:4 +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-unknown-linux-gnu" + ---------------- I think if you use x86 as a target (and needs it for the costing), the test needs to go into test/Transforms/LoopVectorize/X86 in case the target is not compiled in. ================ Comment at: llvm/test/Transforms/LoopVectorize/pointer-induction.ll:7 +; Function Attrs: nofree norecurse nounwind +define void @a(i8* readnone %b) local_unnamed_addr #0 { +; CHECK-LABEL: @a( ---------------- Also some of this might be able to be cleaned up, like the local_unnamed_addr, the metadata and all/most(?) of the attributes. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81267/new/ https://reviews.llvm.org/D81267 From llvm-commits at lists.llvm.org Wed Jul 8 07:52:14 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:52:14 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <8c85bb1579d31018a1b71774b0309f59@localhost.localdomain> lebedev.ri added a comment. In D83216#2139130 , @ebevhan wrote: > In D83216#2138974 , @lebedev.ri wrote: > > > Patch as-is looks good but i'm not sure what's the RFC status here. > > If these new intrinsics were already previously proposed as part of some RFC that got accepted, > > can you state that in the patch's description? (with link to the thread) > > > I added the links to the threads I mentioned earlier. > > Looking back at the full discussion, it doesn't really seem like any real consensus regarding **how** to implement the types was reached, but the prevailing view was that an altogether new IR type was the best approach. I don't think either I or Leonard thought that was the right (or fastest, at least) way to go, though. > > The final listing of intrinsics was in http://lists.llvm.org/pipermail/llvm-dev/2018-September/126311.html but the design has diverged a bit from that since then. i see. I think this is fine, but just to be safe, may i suggest to do an RFC for these two intrinsics specifically, just so we're 100% sure everyone is on the same page about them? > In D83216#2139104 , @lebedev.ri wrote: > >> In D83216#2139098 , @ebevhan wrote: >> >> > Fixed review comment and updated summary. >> >> >> (note that updating commit msg does not automatically update description in phab differential) > > > I noticed! I was looking for an arc option to do that but couldn't seem to find one. 
Sorry, it's just a repeating issue in many reviews :/ Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 07:55:54 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:55:54 +0000 (UTC) Subject: [PATCH] D83315: Turn arcmt-* options into a single option In-Reply-To: References: Message-ID: <30fda6527475da530a21587ee6597a3e@localhost.localdomain> dang updated this revision to Diff 276438. dang added a comment. Instead of using a Separate option kind (-arcmt-action action) use a Joined kind (-arcmt-action=*) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83315/new/ https://reviews.llvm.org/D83315 Files: clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Frontend/CompilerInvocation.cpp clang/test/ARCMT/GC-check-warn-nsalloc.m clang/test/ARCMT/GC-check.m clang/test/ARCMT/atautorelease-check.m clang/test/ARCMT/check-api.m clang/test/ARCMT/check-with-pch.m clang/test/ARCMT/check-with-serialized-diag.m clang/test/ARCMT/checking-in-arc.m clang/test/ARCMT/checking.m clang/test/ARCMT/cxx-checking.mm clang/test/ARCMT/driver-migrate.m clang/test/ARCMT/migrate-emit-errors.m clang/test/ARCMT/migrate-plist-output.m clang/test/ARCMT/migrate-space-in-path.m clang/test/ARCMT/migrate-with-pch.m clang/test/ARCMT/migrate.m clang/test/ARCMT/no-canceling-bridge-to-bridge-cast.m clang/test/ARCMT/nonobjc-to-objc-cast-2.m clang/test/ARCMT/releases-driver.m clang/test/ARCMT/releases-driver.m.result clang/test/ARCMT/verify.m clang/test/ARCMT/with-arc-mode-modify.m clang/test/ARCMT/with-arc-mode-modify.m.result llvm/include/llvm/Option/OptParser.td llvm/utils/TableGen/OptParserEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83315.276438.patch Type: text/x-patch Size: 21338 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:55:49 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Wed, 8 Jul 2020 07:55:49 -0700 Subject: [llvm] 1fed131 - [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE In-Reply-To: References: <5eec28f8.1c69fb81.b595b.ca3c@mx.google.com> Message-ID: To follow up here: 1b1539712e1ee30c02ed20493682fc05d52391c0 fixed the crashes I was seeing. Thanks Nemanja! :) On Mon, Jul 6, 2020 at 4:58 PM Eric Christopher wrote: > Hi Nemanja! > > Running into a compiler crash with this building skia (https://skia.org/) > for power after this patch. I'll see what I can do to get a testcase (if it > doesn't reproduce for you), but would you mind terribly reverting in the > meantime? > > Thanks! > > -eric > > On Thu, Jun 18, 2020 at 7:55 PM Nemanja Ivanovic via llvm-commits < > llvm-commits at lists.llvm.org> wrote: > >> >> Author: Nemanja Ivanovic >> Date: 2020-06-18T21:54:22-05:00 >> New Revision: 1fed131660b2c5d3ea7007e273a7a5da80699445 >> >> URL: >> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445 >> DIFF: >> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445.diff >> >> LOG: [PowerPC] Canonicalize shuffles to match more single-instruction >> masks on LE >> >> We currently miss a number of opportunities to emit single-instruction >> VMRG[LH][BHW] instructions for shuffles on little endian subtargets. >> Although >> this in itself is not a huge performance opportunity since loading the >> permute >> vector for a VPERM can always be pulled out of loops, producing such merge >> instructions is useful to downstream optimizations. >> Since VPERM is essentially opaque to all subsequent optimizations, we >> want to >> avoid it as much as possible. 
Other permute instructions have semantics >> that can >> be reasoned about much more easily in later optimizations. >> >> This patch does the following: >> - Canonicalize shuffles so that the first element comes from the first >> vector >> (since that's what most of the mask matching functions want) >> - Switch the elements that come from splat vectors so that they match the >> corresponding elements from the other vector (to allow for merges) >> - Adds debugging messages for when a shuffle is matched to a VPERM so that >> anyone interested in improving this further can get the info for their >> code >> >> Differential revision: https://reviews.llvm.org/D77448 >> >> Added: >> >> >> Modified: >> llvm/lib/Target/PowerPC/PPCISelLowering.cpp >> llvm/lib/Target/PowerPC/PPCISelLowering.h >> llvm/lib/Target/PowerPC/PPCInstrVSX.td >> llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll >> llvm/test/CodeGen/PowerPC/build-vector-tests.ll >> llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll >> llvm/test/CodeGen/PowerPC/fp-strict-round.ll >> llvm/test/CodeGen/PowerPC/load-and-splat.ll >> llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll >> llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll >> llvm/test/CodeGen/PowerPC/pr25080.ll >> llvm/test/CodeGen/PowerPC/pr25157-peephole.ll >> llvm/test/CodeGen/PowerPC/pr38087.ll >> llvm/test/CodeGen/PowerPC/pre-inc-disable.ll >> llvm/test/CodeGen/PowerPC/qpx-load-splat.ll >> llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll >> llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll >> llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll >> llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll >> llvm/test/CodeGen/PowerPC/swaps-le-5.ll >> llvm/test/CodeGen/PowerPC/swaps-le-6.ll >> llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll >> llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll >> 
llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll >> llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll >> llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll >> llvm/test/CodeGen/PowerPC/vsx.ll >> llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll >> >> Removed: >> >> >> >> >> ################################################################################ >> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp >> b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp >> index d7698a5ec962..28bd80610c84 100644 >> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp >> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp >> @@ -125,6 +125,7 @@ cl::desc("use absolute jump tables on ppc"), >> cl::Hidden); >> >> STATISTIC(NumTailCalls, "Number of tail calls"); >> STATISTIC(NumSiblingCalls, "Number of sibling calls"); >> +STATISTIC(ShufflesHandledWithVPERM, "Number of shuffles lowered to a >> VPERM"); >> >> static bool isNByteElemShuffleMask(ShuffleVectorSDNode *, unsigned, int); >> >> @@ -1505,6 +1506,8 @@ const char >> *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const { >> case PPCISD::MTVSRZ: return "PPCISD::MTVSRZ"; >> case PPCISD::SINT_VEC_TO_FP: return "PPCISD::SINT_VEC_TO_FP"; >> case PPCISD::UINT_VEC_TO_FP: return "PPCISD::UINT_VEC_TO_FP"; >> + case PPCISD::SCALAR_TO_VECTOR_PERMUTED: >> + return "PPCISD::SCALAR_TO_VECTOR_PERMUTED"; >> case PPCISD::ANDI_rec_1_EQ_BIT: >> return "PPCISD::ANDI_rec_1_EQ_BIT"; >> case PPCISD::ANDI_rec_1_GT_BIT: >> @@ -2716,7 +2719,8 @@ static bool usePartialVectorLoads(SDNode *N, const >> PPCSubtarget& ST) { >> for (SDNode::use_iterator UI = LD->use_begin(), UE = LD->use_end(); >> UI != UE; ++UI) >> if (UI.getUse().get().getResNo() == 0 && >> - UI->getOpcode() != ISD::SCALAR_TO_VECTOR) >> + 
UI->getOpcode() != ISD::SCALAR_TO_VECTOR && >> + UI->getOpcode() != PPCISD::SCALAR_TO_VECTOR_PERMUTED) >> return false; >> >> return true; >> @@ -9041,7 +9045,8 @@ static const SDValue *getNormalLoadInput(const >> SDValue &Op) { >> const SDValue *InputLoad = &Op; >> if (InputLoad->getOpcode() == ISD::BITCAST) >> InputLoad = &InputLoad->getOperand(0); >> - if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR) >> + if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR || >> + InputLoad->getOpcode() == PPCISD::SCALAR_TO_VECTOR_PERMUTED) >> InputLoad = &InputLoad->getOperand(0); >> if (InputLoad->getOpcode() != ISD::LOAD) >> return nullptr; >> @@ -9690,6 +9695,15 @@ SDValue >> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, >> SDValue V1 = Op.getOperand(0); >> SDValue V2 = Op.getOperand(1); >> ShuffleVectorSDNode *SVOp = cast(Op); >> + >> + // Any nodes that were combined in the target-independent combiner >> prior >> + // to vector legalization will not be sent to the target combine. Try >> to >> + // combine it here. >> + if (SDValue NewShuffle = combineVectorShuffle(SVOp, DAG)) { >> + DAG.ReplaceAllUsesOfValueWith(Op, NewShuffle); >> + Op = NewShuffle; >> + SVOp = cast(Op); >> + } >> EVT VT = Op.getValueType(); >> bool isLittleEndian = Subtarget.isLittleEndian(); >> >> @@ -9715,6 +9729,11 @@ SDValue >> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, >> Offset = isLittleEndian ? (3 - SplatIdx) * 4 : SplatIdx * 4; >> else >> Offset = isLittleEndian ? (1 - SplatIdx) * 8 : SplatIdx * 8; >> + >> + // If we are loading a partial vector, it does not make sense to >> adjust >> + // the base pointer. This happens with (splat (s_to_v_permuted >> (ld))). >> + if (LD->getMemoryVT().getSizeInBits() == (IsFourByte ? 
32 : 64)) >> + Offset = 0; >> SDValue BasePtr = LD->getBasePtr(); >> if (Offset != 0) >> BasePtr = DAG.getNode(ISD::ADD, dl, >> getPointerTy(DAG.getDataLayout()), >> @@ -9988,7 +10007,13 @@ SDValue >> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, >> MVT::i32)); >> } >> >> + ShufflesHandledWithVPERM++; >> SDValue VPermMask = DAG.getBuildVector(MVT::v16i8, dl, ResultMask); >> + LLVM_DEBUG(dbgs() << "Emitting a VPERM for the following shuffle:\n"); >> + LLVM_DEBUG(SVOp->dump()); >> + LLVM_DEBUG(dbgs() << "With the following permute control vector:\n"); >> + LLVM_DEBUG(VPermMask.dump()); >> + >> if (isLittleEndian) >> return DAG.getNode(PPCISD::VPERM, dl, V1.getValueType(), >> V2, V1, VPermMask); >> @@ -14114,6 +14139,199 @@ SDValue >> PPCTargetLowering::combineStoreFPToInt(SDNode *N, >> return Val; >> } >> >> +static bool isAlternatingShuffMask(const ArrayRef &Mask, int >> NumElts) { >> + // Check that the source of the element keeps flipping >> + // (i.e. Mask[i] < NumElts -> Mask[i+i] >= NumElts). >> + bool PrevElemFromFirstVec = Mask[0] < NumElts; >> + for (int i = 1, e = Mask.size(); i < e; i++) { >> + if (PrevElemFromFirstVec && Mask[i] < NumElts) >> + return false; >> + if (!PrevElemFromFirstVec && Mask[i] >= NumElts) >> + return false; >> + PrevElemFromFirstVec = !PrevElemFromFirstVec; >> + } >> + return true; >> +} >> + >> +static bool isSplatBV(SDValue Op) { >> + if (Op.getOpcode() != ISD::BUILD_VECTOR) >> + return false; >> + SDValue FirstOp; >> + >> + // Find first non-undef input. >> + for (int i = 0, e = Op.getNumOperands(); i < e; i++) { >> + FirstOp = Op.getOperand(i); >> + if (!FirstOp.isUndef()) >> + break; >> + } >> + >> + // All inputs are undef or the same as the first non-undef input. 
>> + for (int i = 1, e = Op.getNumOperands(); i < e; i++) >> + if (Op.getOperand(i) != FirstOp && !Op.getOperand(i).isUndef()) >> + return false; >> + return true; >> +} >> + >> +static SDValue isScalarToVec(SDValue Op) { >> + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR) >> + return Op; >> + if (Op.getOpcode() != ISD::BITCAST) >> + return SDValue(); >> + Op = Op.getOperand(0); >> + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR) >> + return Op; >> + return SDValue(); >> +} >> + >> +static void fixupShuffleMaskForPermutedSToV(SmallVectorImpl &ShuffV, >> + int LHSMaxIdx, int RHSMinIdx, >> + int RHSMaxIdx, int HalfVec) { >> + for (int i = 0, e = ShuffV.size(); i < e; i++) { >> + int Idx = ShuffV[i]; >> + if ((Idx >= 0 && Idx < LHSMaxIdx) || (Idx >= RHSMinIdx && Idx < >> RHSMaxIdx)) >> + ShuffV[i] += HalfVec; >> + } >> + return; >> +} >> + >> +// Replace a SCALAR_TO_VECTOR with a SCALAR_TO_VECTOR_PERMUTED except if >> +// the original is: >> +// ( (scalar_to_vector (Ty (extract_elt %a, C)))) >> +// In such a case, just change the shuffle mask to extract the element >> +// from the permuted index. >> +static SDValue getSToVPermuted(SDValue OrigSToV, SelectionDAG &DAG) { >> + SDLoc dl(OrigSToV); >> + EVT VT = OrigSToV.getValueType(); >> + assert(OrigSToV.getOpcode() == ISD::SCALAR_TO_VECTOR && >> + "Expecting a SCALAR_TO_VECTOR here"); >> + SDValue Input = OrigSToV.getOperand(0); >> + >> + if (Input.getOpcode() == ISD::EXTRACT_VECTOR_ELT) { >> + ConstantSDNode *Idx = dyn_cast(Input.getOperand(1)); >> + SDValue OrigVector = Input.getOperand(0); >> + >> + // Can't handle non-const element indices or >> different vector types >> + // for the input to the extract and the output of the >> scalar_to_vector.
>> + if (Idx && VT == OrigVector.getValueType()) { >> + SmallVector NewMask(VT.getVectorNumElements(), -1); >> + NewMask[VT.getVectorNumElements() / 2] = Idx->getZExtValue(); >> + return DAG.getVectorShuffle(VT, dl, OrigVector, OrigVector, >> NewMask); >> + } >> + } >> + return DAG.getNode(PPCISD::SCALAR_TO_VECTOR_PERMUTED, dl, VT, >> + OrigSToV.getOperand(0)); >> +} >> + >> +// On little endian subtargets, combine shuffles such as: >> +// vector_shuffle<16,1,17,3,18,5,19,7,20,9,21,11,22,13,23,15>, , %b >> +// into: >> +// vector_shuffle<16,0,17,1,18,2,19,3,20,4,21,5,22,6,23,7>, , %b >> +// because the latter can be matched to a single instruction merge. >> +// Furthermore, SCALAR_TO_VECTOR on little endian always involves a >> permute >> +// to put the value into element zero. Adjust the shuffle mask so that >> the >> +// vector can remain in permuted form (to prevent a swap prior to a >> shuffle). >> +SDValue PPCTargetLowering::combineVectorShuffle(ShuffleVectorSDNode *SVN, >> + SelectionDAG &DAG) const >> { >> + SDValue LHS = SVN->getOperand(0); >> + SDValue RHS = SVN->getOperand(1); >> + auto Mask = SVN->getMask(); >> + int NumElts = LHS.getValueType().getVectorNumElements(); >> + SDValue Res(SVN, 0); >> + SDLoc dl(SVN); >> + >> + // None of these combines are useful on big endian systems since the >> ISA >> + // already has a big endian bias. >> + if (!Subtarget.isLittleEndian()) >> + return Res; >> + >> + // If this is not a shuffle of a shuffle and the first element comes >> from >> + // the second vector, canonicalize to the commuted form. This will >> make it >> + // more likely to match one of the single instruction patterns. 
>> + if (Mask[0] >= NumElts && LHS.getOpcode() != ISD::VECTOR_SHUFFLE && >> + RHS.getOpcode() != ISD::VECTOR_SHUFFLE) { >> + std::swap(LHS, RHS); >> + Res = DAG.getCommutedVectorShuffle(*SVN); >> + Mask = cast(Res)->getMask(); >> + } >> + >> + // Adjust the shuffle mask if either input vector comes from a >> + // SCALAR_TO_VECTOR and keep the respective input vector in permuted >> + // form (to prevent the need for a swap). >> + SmallVector ShuffV(Mask.begin(), Mask.end()); >> + SDValue SToVLHS = isScalarToVec(LHS); >> + SDValue SToVRHS = isScalarToVec(RHS); >> + if (SToVLHS || SToVRHS) { >> + int NumEltsIn = SToVLHS ? >> SToVLHS.getValueType().getVectorNumElements() >> + : >> SToVRHS.getValueType().getVectorNumElements(); >> + int NumEltsOut = ShuffV.size(); >> + >> + // Initially assume that neither input is permuted. These will be >> adjusted >> + // accordingly if either input is. >> + int LHSMaxIdx = -1; >> + int RHSMinIdx = -1; >> + int RHSMaxIdx = -1; >> + int HalfVec = LHS.getValueType().getVectorNumElements() / 2; >> + >> + // Get the permuted scalar to vector nodes for the source(s) that >> come from >> + // ISD::SCALAR_TO_VECTOR. >> + if (SToVLHS) { >> + // Set up the values for the shuffle vector fixup. >> + LHSMaxIdx = NumEltsOut / NumEltsIn; >> + SToVLHS = getSToVPermuted(SToVLHS, DAG); >> + if (SToVLHS.getValueType() != LHS.getValueType()) >> + SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS); >> + LHS = SToVLHS; >> + } >> + if (SToVRHS) { >> + RHSMinIdx = NumEltsOut; >> + RHSMaxIdx = NumEltsOut / NumEltsIn + RHSMinIdx; >> + SToVRHS = getSToVPermuted(SToVRHS, DAG); >> + if (SToVRHS.getValueType() != RHS.getValueType()) >> + SToVRHS = DAG.getBitcast(RHS.getValueType(), SToVRHS); >> + RHS = SToVRHS; >> + } >> + >> + // Fix up the shuffle mask to reflect where the desired element >> actually is. 
>> + // The minimum and maximum indices that correspond to element zero >> for both >> + // the LHS and RHS are computed and will control which shuffle mask >> entries >> + // are to be changed. For example, if the RHS is permuted, any >> shuffle mask >> + // entries in the range [RHSMinIdx,RHSMaxIdx) will be incremented by >> + // HalfVec to refer to the corresponding element in the permuted >> vector. >> + fixupShuffleMaskForPermutedSToV(ShuffV, LHSMaxIdx, RHSMinIdx, >> RHSMaxIdx, >> + HalfVec); >> + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, >> ShuffV); >> + >> + // We may have simplified away the shuffle. We won't be able to do >> anything >> + // further with it here. >> + if (!isa(Res)) >> + return Res; >> + Mask = cast(Res)->getMask(); >> + } >> + >> + // The common case after we commuted the shuffle is that the RHS is a >> splat >> + // and we have elements coming in from the splat at indices that are >> not >> + // conducive to using a merge. >> + // Example: >> + // vector_shuffle<0,17,1,19,2,21,3,23,4,25,5,27,6,29,7,31> t1, >> + if (!isSplatBV(RHS)) >> + return Res; >> + >> + // We are looking for a mask such that all even elements are from >> + // one vector and all odd elements from the other. >> + if (!isAlternatingShuffMask(Mask, NumElts)) >> + return Res; >> + >> + // Adjust the mask so we are pulling in the same index from the splat >> + // as the index from the interesting vector in consecutive elements. 
>> + // Example: >> + // vector_shuffle<0,16,1,17,2,18,3,19,4,20,5,21,6,22,7,23> t1, >> + for (int i = 1, e = Mask.size(); i < e; i += 2) >> + ShuffV[i] = (ShuffV[i - 1] + NumElts); >> + >> + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV); >> + return Res; >> +} >> + >> SDValue PPCTargetLowering::combineVReverseMemOP(ShuffleVectorSDNode *SVN, >> LSBaseSDNode *LSBase, >> DAGCombinerInfo &DCI) >> const { >> @@ -14223,7 +14441,7 @@ SDValue >> PPCTargetLowering::PerformDAGCombine(SDNode *N, >> LSBaseSDNode* LSBase = cast(N->getOperand(0)); >> return combineVReverseMemOP(cast(N), LSBase, >> DCI); >> } >> - break; >> + return combineVectorShuffle(cast(N), DCI.DAG); >> case ISD::STORE: { >> >> EVT Op1VT = N->getOperand(1).getValueType(); >> >> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h >> b/llvm/lib/Target/PowerPC/PPCISelLowering.h >> index 77252e919553..9f7c6ab53a17 100644 >> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.h >> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h >> @@ -221,6 +221,14 @@ namespace llvm { >> /// As with SINT_VEC_TO_FP, used for converting illegal types. >> UINT_VEC_TO_FP, >> >> + /// PowerPC instructions that have SCALAR_TO_VECTOR semantics tend to >> + /// place the value into the least significant element of the most >> + /// significant doubleword in the vector. This is not element zero >> for >> + /// anything smaller than a doubleword on either endianness. This >> node has >> + /// the same semantics as SCALAR_TO_VECTOR except that the value >> remains in >> + /// the aforementioned location in the vector register. >> + SCALAR_TO_VECTOR_PERMUTED, >> + >> // FIXME: Remove these once the ANDI glue bug is fixed: >> /// i1 = ANDI_rec_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the >> result of the >> /// eq or gt bit of CR0 after executing andi. x, 1. 
This is used to >> @@ -1215,6 +1223,8 @@ namespace llvm { >> SDValue combineSetCC(SDNode *N, DAGCombinerInfo &DCI) const; >> SDValue combineABS(SDNode *N, DAGCombinerInfo &DCI) const; >> SDValue combineVSelect(SDNode *N, DAGCombinerInfo &DCI) const; >> + SDValue combineVectorShuffle(ShuffleVectorSDNode *SVN, >> + SelectionDAG &DAG) const; >> SDValue combineVReverseMemOP(ShuffleVectorSDNode *SVN, LSBaseSDNode >> *LSBase, >> DAGCombinerInfo &DCI) const; >> >> >> diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td >> b/llvm/lib/Target/PowerPC/PPCInstrVSX.td >> index e7ec1808ec3b..c43b2716cb37 100644 >> --- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td >> +++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td >> @@ -138,6 +138,8 @@ def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH", >> SDT_PPCldvsxlh, >> [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>; >> def PPCldsplat : SDNode<"PPCISD::LD_SPLAT", SDT_PPCldsplat, >> [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>; >> +def PPCSToV : SDNode<"PPCISD::SCALAR_TO_VECTOR_PERMUTED", >> + SDTypeProfile<1, 1, []>, []>; >> >> //-------------------------- Predicate definitions >> ---------------------------// >> def HasVSX : Predicate<"PPCSubTarget->hasVSX()">; >> @@ -288,6 +290,11 @@ class X_XS6_RA5_RB5 opcode, bits<10> xo, >> string opc, >> } // Predicates = HasP9Vector >> } // AddedComplexity = 400, hasSideEffects = 0 >> >> +multiclass ScalToVecWPermute> PermOut> { >> + def : Pat<(Ty (scalar_to_vector In)), (Ty NonPermOut)>; >> + def : Pat<(Ty (PPCSToV In)), (Ty PermOut)>; >> +} >> + >> //-------------------------- Instruction definitions >> -------------------------// >> // VSX instructions require the VSX feature, they are to be selected over >> // equivalent Altivec patterns (as they address a larger register set) >> and >> @@ -2710,12 +2717,14 @@ def : Pat<(v2i64 (build_vector DblToLong.A, >> DblToLong.A)), >> def : Pat<(v2i64 (build_vector DblToULong.A, DblToULong.A)), >> (v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), >> 
(COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), >> 0))>; >> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)), >> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS >> - (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC), >> 1))>; >> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)), >> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS >> - (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC), >> 1))>; >> +defm : ScalToVecWPermute< >> + v4i32, FltToIntLoad.A, >> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC), >> 1), >> + (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC)>; >> +defm : ScalToVecWPermute< >> + v4i32, FltToUIntLoad.A, >> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC), >> 1), >> + (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC)>; >> def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)), >> (v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>; >> def : Pat<(v2f64 (PPCldsplat xoaddr:$A)), >> @@ -2730,10 +2739,12 @@ def : Pat<(v2i64 (build_vector FltToLong.A, >> FltToLong.A)), >> def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)), >> (v2i64 (XXPERMDIs >> (COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>; >> -def : Pat<(v2i64 (scalar_to_vector DblToLongLoad.A)), >> - (v2i64 (XVCVDPSXDS (LXVDSX xoaddr:$A)))>; >> -def : Pat<(v2i64 (scalar_to_vector DblToULongLoad.A)), >> - (v2i64 (XVCVDPUXDS (LXVDSX xoaddr:$A)))>; >> +defm : ScalToVecWPermute< >> + v2i64, DblToLongLoad.A, >> + (XVCVDPSXDS (LXVDSX xoaddr:$A)), (XVCVDPSXDS (LXVDSX xoaddr:$A))>; >> +defm : ScalToVecWPermute< >> + v2i64, DblToULongLoad.A, >> + (XVCVDPUXDS (LXVDSX xoaddr:$A)), (XVCVDPUXDS (LXVDSX xoaddr:$A))>; >> } // HasVSX >> >> // Any big endian VSX subtarget. >> @@ -2831,9 +2842,10 @@ def : Pat> >> // Any little endian VSX subtarget. 
>> let Predicates = [HasVSX, IsLittleEndian] in { >> -def : Pat<(v2f64 (scalar_to_vector f64:$A)), >> - (v2f64 (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64), >> - (SUBREG_TO_REG (i64 1), $A, sub_64), 0))>; >> +defm : ScalToVecWPermute> + (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64), >> + (SUBREG_TO_REG (i64 1), $A, sub_64), >> 0), >> + (SUBREG_TO_REG (i64 1), $A, sub_64)>; >> >> def : Pat<(f64 (extractelt v2f64:$S, 0)), >> (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>; >> @@ -2943,18 +2955,24 @@ def : Pat<(PPCstore_scal_int_from_vsr >> (STXSDX (XSCVDPUXDS f64:$src), xoaddr:$dst)>; >> >> // Load-and-splat with fp-to-int conversion (using X-Form VSX/FP loads). >> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)), >> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS >> - (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC), >> 1))>; >> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)), >> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS >> - (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC), >> 1))>; >> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)), >> - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS >> - (XFLOADf32 xoaddr:$A), >> VSFRC)), 0))>; >> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)), >> - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS >> - (XFLOADf32 xoaddr:$A), >> VSFRC)), 0))>; >> +defm : ScalToVecWPermute< >> + v4i32, DblToIntLoad.A, >> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC), >> 1), >> + (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC)>; >> +defm : ScalToVecWPermute< >> + v4i32, DblToUIntLoad.A, >> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC), >> 1), >> + (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC)>; >> +defm : ScalToVecWPermute< >> + v2i64, FltToLongLoad.A, >> + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A), >> VSFRC)), 0), >> + (SUBREG_TO_REG (i64 1), (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 >> xoaddr:$A), >> + VSFRC)), >> sub_64)>; >> +defm : ScalToVecWPermute< 
>> + v2i64, FltToULongLoad.A, >> + (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A), >> VSFRC)), 0), >> + (SUBREG_TO_REG (i64 1), (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 >> xoaddr:$A), >> + VSFRC)), >> sub_64)>; >> } // HasVSX, NoP9Vector >> >> // Any VSX subtarget that only has loads and stores that load in big >> endian >> @@ -3156,8 +3174,12 @@ def : Pat> (f64 (COPY_TO_REGCLASS $S1, VSRC)), >> VSFRC)))>; >> >> // v4f32 scalar <-> vector conversions (LE) >> -def : Pat<(v4f32 (scalar_to_vector f32:$A)), >> - (v4f32 (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1))>; >> + // The permuted version is no better than the version that puts the >> value >> + // into the right element because XSCVDPSPN is >> different from all the other >> + // instructions used for PPCSToV. >> + defm : ScalToVecWPermute> + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1), >> + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 3)>; >> def : Pat<(f32 (vector_extract v4f32:$S, 0)), >> (f32 (XSCVSPDPN (XXSLDWI $S, $S, 3)))>; >> def : Pat<(f32 (vector_extract v4f32:$S, 1)), >> @@ -3189,18 +3211,25 @@ def : Pat<(f64 (PPCfcfid (f64 (PPCmtvsra (i32 >> (extractelt v4i32:$A, 3)))))), >> // LIWAX - This instruction is used for sign extending i32 -> i64. >> // LIWZX - This instruction will be emitted for i32, f32, and when >> // zero-extending i32 to i64 (zext i32 -> i64).
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (sextloadi32 xoaddr:$src)))), >> - (v2i64 (XXPERMDIs >> - (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2))>; >> -def : Pat<(v2i64 (scalar_to_vector (i64 (zextloadi32 xoaddr:$src)))), >> - (v2i64 (XXPERMDIs >> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>; >> -def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))), >> - (v4i32 (XXPERMDIs >> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>; >> -def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))), >> - (v4f32 (XXPERMDIs >> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>; >> +defm : ScalToVecWPermute< >> + v2i64, (i64 (sextloadi32 xoaddr:$src)), >> + (XXPERMDIs (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (LIWAX xoaddr:$src), sub_64)>; >> + >> +defm : ScalToVecWPermute< >> + v2i64, (i64 (zextloadi32 xoaddr:$src)), >> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>; >> + >> +defm : ScalToVecWPermute< >> + v4i32, (i32 (load xoaddr:$src)), >> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>; >> + >> +defm : ScalToVecWPermute< >> + v4f32, (f32 (load xoaddr:$src)), >> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>; >> >> def : Pat> (v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3), >> @@ -3336,14 +3365,17 @@ def : Pat<(i64 (vector_extract v2i64:$S, >> i64:$Idx)), >> // Little endian VSX subtarget with direct moves. 
>> let Predicates = [HasVSX, HasDirectMove, IsLittleEndian] in { >> // v16i8 scalar <-> vector conversions (LE) >> - def : Pat<(v16i8 (scalar_to_vector i32:$A)), >> - (v16i8 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>; >> - def : Pat<(v8i16 (scalar_to_vector i32:$A)), >> - (v8i16 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>; >> - def : Pat<(v4i32 (scalar_to_vector i32:$A)), >> - (v4i32 MovesToVSR.LE_WORD_0)>; >> - def : Pat<(v2i64 (scalar_to_vector i64:$A)), >> - (v2i64 MovesToVSR.LE_DWORD_0)>; >> + defm : ScalToVecWPermute> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC), >> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, >> VSRC)>; >> + defm : ScalToVecWPermute> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC), >> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, >> VSRC)>; >> + defm : ScalToVecWPermute> + (SUBREG_TO_REG (i64 1), (MTVSRWZ $A), >> sub_64)>; >> + defm : ScalToVecWPermute> + MovesToVSR.LE_DWORD_1>; >> + >> // v2i64 scalar <-> vector conversions (LE) >> def : Pat<(i64 (vector_extract v2i64:$S, 0)), >> (i64 VectorExtractions.LE_DWORD_0)>; >> @@ -3641,30 +3673,41 @@ def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, >> xoaddr:$dst), >> (STXVX $rS, xoaddr:$dst)>; >> >> // Build vectors from i8 loads >> -def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)), >> - (v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>; >> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)), >> - (v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>; >> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)), >> - (v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>; >> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi8i64)), >> - (v2i64 (XXPERMDIs (LXSIBZX xoaddr:$src), 0))>; >> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi8)), >> - (v4i32 (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1))>; >> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi8i64)), >> - (v2i64 (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0))>; >> +defm : ScalToVecWPermute> + (VSPLTBs 7, (LXSIBZX xoaddr:$src)), >> + (VSPLTBs 7, 
(LXSIBZX xoaddr:$src))>; >> +defm : ScalToVecWPermute> + (VSPLTHs 3, (LXSIBZX xoaddr:$src)), >> + (VSPLTHs 3, (LXSIBZX xoaddr:$src))>; >> +defm : ScalToVecWPermute> + (XXSPLTWs (LXSIBZX xoaddr:$src), 1), >> + (XXSPLTWs (LXSIBZX xoaddr:$src), 1)>; >> +defm : ScalToVecWPermute> + (XXPERMDIs (LXSIBZX xoaddr:$src), 0), >> + (XXPERMDIs (LXSIBZX xoaddr:$src), 0)>; >> +defm : ScalToVecWPermute> + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1), >> + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), >> 1)>; >> +defm : ScalToVecWPermute> + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), >> 0), >> + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), >> 0)>; >> >> // Build vectors from i16 loads >> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.Li16)), >> - (v8i16 (VSPLTHs 3, (LXSIHZX xoaddr:$src)))>; >> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi16)), >> - (v4i32 (XXSPLTWs (LXSIHZX xoaddr:$src), 1))>; >> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)), >> - (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>; >> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi16)), >> - (v4i32 (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1))>; >> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi16i64)), >> - (v2i64 (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0))>; >> +defm : ScalToVecWPermute> + (VSPLTHs 3, (LXSIHZX xoaddr:$src)), >> + (VSPLTHs 3, (LXSIHZX xoaddr:$src))>; >> +defm : ScalToVecWPermute> + (XXSPLTWs (LXSIHZX xoaddr:$src), 1), >> + (XXSPLTWs (LXSIHZX xoaddr:$src), 1)>; >> +defm : ScalToVecWPermute> + (XXPERMDIs (LXSIHZX xoaddr:$src), 0), >> + (XXPERMDIs (LXSIHZX xoaddr:$src), 0)>; >> +defm : ScalToVecWPermute> + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1), >> + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), >> 1)>; >> +defm : ScalToVecWPermute> + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), >> 0), >> + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), >> 0)>; >> >> // Load/convert and convert/store patterns for f16. 
>> def : Pat<(f64 (extloadf16 xoaddr:$src)), >> @@ -3806,8 +3849,7 @@ def : Pat<(f32 (PPCxsminc f32:$XA, f32:$XB)), >> VSSRC))>; >> >> // Endianness-neutral patterns for const splats with ISA 3.0 >> instructions. >> -def : Pat<(v4i32 (scalar_to_vector i32:$A)), >> - (v4i32 (MTVSRWS $A))>; >> +defm : ScalToVecWPermute> $A)>; >> def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)), >> (v4i32 (MTVSRWS $A))>; >> def : Pat<(v16i8 (build_vector immNonAllOneAnyExt8:$A, >> immNonAllOneAnyExt8:$A, >> @@ -3819,24 +3861,32 @@ def : Pat<(v16i8 (build_vector >> immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A, >> immNonAllOneAnyExt8:$A, >> immNonAllOneAnyExt8:$A, >> immNonAllOneAnyExt8:$A, >> immNonAllOneAnyExt8:$A)), >> (v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>; >> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)), >> - (v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>; >> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)), >> - (v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>; >> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoadP9.A)), >> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS >> - (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), >> 1))>; >> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoadP9.A)), >> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS >> - (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), >> 1))>; >> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoadP9.A)), >> - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS >> - (DFLOADf32 iaddrX4:$A), >> - VSFRC)), 0))>; >> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoadP9.A)), >> - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS >> - (DFLOADf32 iaddrX4:$A), >> - VSFRC)), 0))>; >> +defm : ScalToVecWPermute> + (XVCVSPSXWS (LXVWSX xoaddr:$A)), >> + (XVCVSPSXWS (LXVWSX xoaddr:$A))>; >> +defm : ScalToVecWPermute> + (XVCVSPUXWS (LXVWSX xoaddr:$A)), >> + (XVCVSPUXWS (LXVWSX xoaddr:$A))>; >> +defm : ScalToVecWPermute< >> + v4i32, DblToIntLoadP9.A, >> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), >> 1), >> + (SUBREG_TO_REG 
(i64 1), (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), sub_64)>; >> +defm : ScalToVecWPermute< >> + v4i32, DblToUIntLoadP9.A, >> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), >> 1), >> + (SUBREG_TO_REG (i64 1), (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), sub_64)>; >> +defm : ScalToVecWPermute< >> + v2i64, FltToLongLoadP9.A, >> + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), >> VSFRC)), 0), >> + (SUBREG_TO_REG >> + (i64 1), >> + (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), >> sub_64)>; >> +defm : ScalToVecWPermute< >> + v2i64, FltToULongLoadP9.A, >> + (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), >> VSFRC)), 0), >> + (SUBREG_TO_REG >> + (i64 1), >> + (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), >> sub_64)>; >> def : Pat<(v4f32 (PPCldsplat xoaddr:$A)), >> (v4f32 (LXVWSX xoaddr:$A))>; >> def : Pat<(v4i32 (PPCldsplat xoaddr:$A)), >> @@ -4116,19 +4166,23 @@ def : Pat<(truncstorei16 (i32 (vector_extract >> v8i16:$S, 6)), xoaddr:$dst), >> def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)), >> xoaddr:$dst), >> (STXSIHXv (COPY_TO_REGCLASS (v16i8 (VSLDOI $S, $S, 10)), >> VSRC), xoaddr:$dst)>; >> >> -def : Pat<(v2i64 (scalar_to_vector (i64 (load iaddrX4:$src)))), >> - (v2i64 (XXPERMDIs >> - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>; >> -def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddrX4:$src)))), >> - (v2i64 (XXPERMDIs >> - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>; >> +defm : ScalToVecWPermute< >> + v2i64, (i64 (load iaddrX4:$src)), >> + (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>; >> +defm : ScalToVecWPermute< >> + v2i64, (i64 (load xaddrX4:$src)), >> + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>; >> +defm : ScalToVecWPermute< >> + v2f64, (f64 (load iaddrX4:$src)), >> + 
(XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>; >> +defm : ScalToVecWPermute< >> + v2f64, (f64 (load xaddrX4:$src)), >> + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2), >> + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>; >> >> -def : Pat<(v2f64 (scalar_to_vector (f64 (load iaddrX4:$src)))), >> - (v2f64 (XXPERMDIs >> - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>; >> -def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddrX4:$src)))), >> - (v2f64 (XXPERMDIs >> - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>; >> def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddrX4:$src), >> (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2), >> sub_64), xaddrX4:$src)>; >> >> diff --git a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll >> b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll >> index 8c9ffa815467..4d06571d0ec7 100644 >> --- a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll >> +++ b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll >> @@ -13,8 +13,7 @@ define void @testExpandPostRAPseudo(i32* nocapture >> readonly %ptr) { >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8: lfiwzx f0, 0, r3 >> ; CHECK-P8: ld r4, .LC0@toc@l(r4) >> -; CHECK-P8: xxswapd vs0, f0 >> -; CHECK-P8: xxspltw v2, vs0, 3 >> +; CHECK-P8: xxspltw v2, vs0, 1 >> ; CHECK-P8: stvx v2, 0, r4 >> ; CHECK-P8: lis r4, 1024 >> ; CHECK-P8: lfiwax f0, 0, r3 >> >> diff --git a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll >> b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll >> index ee0cc41ea6bd..1cb7d7b62055 100644 >> --- a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll >> +++ b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll >> @@ -1282,8 +1282,7 @@ define <4 x i32> @spltMemVali(i32* nocapture >> readonly %ptr) { >> ; P8LE-LABEL: spltMemVali: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfiwzx f0, 0, r3 >> -; P8LE-NEXT: xxswapd vs0, f0 >> -; P8LE-NEXT: xxspltw v2, vs0, 3 >> +; 
P8LE-NEXT: xxspltw v2, vs0, 1 >> ; P8LE-NEXT: blr >> entry: >> %0 = load i32, i32* %ptr, align 4 >> @@ -2801,8 +2800,7 @@ define <4 x i32> @spltMemValui(i32* nocapture >> readonly %ptr) { >> ; P8LE-LABEL: spltMemValui: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfiwzx f0, 0, r3 >> -; P8LE-NEXT: xxswapd vs0, f0 >> -; P8LE-NEXT: xxspltw v2, vs0, 3 >> +; P8LE-NEXT: xxspltw v2, vs0, 1 >> ; P8LE-NEXT: blr >> entry: >> %0 = load i32, i32* %ptr, align 4 >> @@ -4573,7 +4571,7 @@ define <2 x i64> @spltMemValConvftoll(float* >> nocapture readonly %ptr) { >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfs f0, 0(r3) >> ; P9LE-NEXT: xscvdpsxds f0, f0 >> -; P9LE-NEXT: xxspltd v2, f0, 0 >> +; P9LE-NEXT: xxspltd v2, vs0, 0 >> ; P9LE-NEXT: blr >> ; >> ; P8BE-LABEL: spltMemValConvftoll: >> @@ -4587,7 +4585,7 @@ define <2 x i64> @spltMemValConvftoll(float* >> nocapture readonly %ptr) { >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfsx f0, 0, r3 >> ; P8LE-NEXT: xscvdpsxds f0, f0 >> -; P8LE-NEXT: xxspltd v2, f0, 0 >> +; P8LE-NEXT: xxspltd v2, vs0, 0 >> ; P8LE-NEXT: blr >> entry: >> %0 = load float, float* %ptr, align 4 >> @@ -5761,7 +5759,7 @@ define <2 x i64> @spltMemValConvftoull(float* >> nocapture readonly %ptr) { >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfs f0, 0(r3) >> ; P9LE-NEXT: xscvdpuxds f0, f0 >> -; P9LE-NEXT: xxspltd v2, f0, 0 >> +; P9LE-NEXT: xxspltd v2, vs0, 0 >> ; P9LE-NEXT: blr >> ; >> ; P8BE-LABEL: spltMemValConvftoull: >> @@ -5775,7 +5773,7 @@ define <2 x i64> @spltMemValConvftoull(float* >> nocapture readonly %ptr) { >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfsx f0, 0, r3 >> ; P8LE-NEXT: xscvdpuxds f0, f0 >> -; P8LE-NEXT: xxspltd v2, f0, 0 >> +; P8LE-NEXT: xxspltd v2, vs0, 0 >> ; P8LE-NEXT: blr >> entry: >> %0 = load float, float* %ptr, align 4 >> >> diff --git a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll >> b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll >> index 2ffe98e1f694..7fac0511e3c5 100644 >> --- 
a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll >> +++ b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll >> @@ -23,18 +23,12 @@ entry: >> define dso_local <16 x i8> @testmrghb2(<16 x i8> %a, <16 x i8> %b) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: testmrghb2: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r3, r2, .LCPI1_0 at toc@ha >> -; CHECK-P8-NEXT: addi r3, r3, .LCPI1_0 at toc@l >> -; CHECK-P8-NEXT: lvx v4, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P8-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testmrghb2: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0 at toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0 at toc@l >> -; CHECK-P9-NEXT: lxvx v4, 0, r3 >> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P9-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P9-NEXT: blr >> entry: >> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> > 24, i32 8, i32 25, i32 9, i32 26, i32 10, i32 27, i32 11, i32 28, i32 12, >> i32 29, i32 13, i32 30, i32 14, i32 31, i32 15> >> @@ -57,18 +51,12 @@ entry: >> define dso_local <16 x i8> @testmrghh2(<16 x i8> %a, <16 x i8> %b) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: testmrghh2: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r3, r2, .LCPI3_0 at toc@ha >> -; CHECK-P8-NEXT: addi r3, r3, .LCPI3_0 at toc@l >> -; CHECK-P8-NEXT: lvx v4, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P8-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testmrghh2: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI3_0 at toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI3_0 at toc@l >> -; CHECK-P9-NEXT: lxvx v4, 0, r3 >> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P9-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P9-NEXT: blr >> entry: >> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> > 24, i32 25, i32 8, i32 9, i32 26, i32 27, i32 10, i32 11, i32 28, i32 29, >> i32 
12, i32 13, i32 30, i32 31, i32 14, i32 15> >> @@ -91,18 +79,12 @@ entry: >> define dso_local <16 x i8> @testmrglb2(<16 x i8> %a, <16 x i8> %b) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: testmrglb2: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r3, r2, .LCPI5_0 at toc@ha >> -; CHECK-P8-NEXT: addi r3, r3, .LCPI5_0 at toc@l >> -; CHECK-P8-NEXT: lvx v4, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P8-NEXT: vmrglb v2, v2, v3 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testmrglb2: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI5_0 at toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI5_0 at toc@l >> -; CHECK-P9-NEXT: lxvx v4, 0, r3 >> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P9-NEXT: vmrglb v2, v2, v3 >> ; CHECK-P9-NEXT: blr >> entry: >> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> > 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 20, i32 4, i32 >> 21, i32 5, i32 22, i32 6, i32 23, i32 7> >> @@ -125,18 +107,12 @@ entry: >> define dso_local <16 x i8> @testmrglh2(<16 x i8> %a, <16 x i8> %b) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: testmrglh2: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r3, r2, .LCPI7_0 at toc@ha >> -; CHECK-P8-NEXT: addi r3, r3, .LCPI7_0 at toc@l >> -; CHECK-P8-NEXT: lvx v4, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P8-NEXT: vmrglh v2, v2, v3 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testmrglh2: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI7_0 at toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI7_0 at toc@l >> -; CHECK-P9-NEXT: lxvx v4, 0, r3 >> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> ; CHECK-P9-NEXT: blr >> entry: >> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> > 16, i32 17, i32 0, i32 1, i32 18, i32 19, i32 2, i32 3, i32 20, i32 21, i32 >> 4, i32 5, i32 22, i32 23, i32 6, i32 7> >> @@ -159,18 +135,12 @@ entry: >> define 
dso_local <16 x i8> @testmrghw2(<16 x i8> %a, <16 x i8> %b) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: testmrghw2: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r3, r2, .LCPI9_0 at toc@ha >> -; CHECK-P8-NEXT: addi r3, r3, .LCPI9_0 at toc@l >> -; CHECK-P8-NEXT: lvx v4, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P8-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testmrghw2: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI9_0 at toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI9_0 at toc@l >> -; CHECK-P9-NEXT: lxvx v4, 0, r3 >> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P9-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P9-NEXT: blr >> entry: >> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> > 24, i32 25, i32 26, i32 27, i32 8, i32 9, i32 10, i32 11, i32 28, i32 29, >> i32 30, i32 31, i32 12, i32 13, i32 14, i32 15> >> @@ -193,18 +163,12 @@ entry: >> define dso_local <16 x i8> @testmrglw2(<16 x i8> %a, <16 x i8> %b) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: testmrglw2: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r3, r2, .LCPI11_0 at toc@ha >> -; CHECK-P8-NEXT: addi r3, r3, .LCPI11_0 at toc@l >> -; CHECK-P8-NEXT: lvx v4, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testmrglw2: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI11_0 at toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI11_0 at toc@l >> -; CHECK-P9-NEXT: lxvx v4, 0, r3 >> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4 >> +; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P9-NEXT: blr >> entry: >> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> > 16, i32 17, i32 18, i32 19, i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 >> 22, i32 23, i32 4, i32 5, i32 6, i32 7> >> @@ -215,24 +179,16 @@ define dso_local <8 x i16> @testmrglb3(<8 x i8>* >> nocapture readonly %a) local_un >> ; 
CHECK-P8-LABEL: testmrglb3: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: ld r3, 0(r3) >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI12_0@toc@ha >> -; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI12_0@toc@l >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: vperm v2, v2, v4, v3 >> +; CHECK-P8-NEXT: xxlxor v2, v2, v2 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> +; CHECK-P8-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testmrglb3: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: lfd f0, 0(r3) >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI12_0@toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI12_0@toc@l >> -; CHECK-P9-NEXT: lxvx v3, 0, r3 >> -; CHECK-P9-NEXT: xxswapd v2, f0 >> -; CHECK-P9-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P9-NEXT: vperm v2, v2, v4, v3 >> +; CHECK-P9-NEXT: lxsd v2, 0(r3) >> +; CHECK-P9-NEXT: xxlxor v3, v3, v3 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P9-NEXT: blr >> entry: >> %0 = load <8 x i8>, <8 x i8>* %a, align 8 >> >> diff --git a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll >> b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll >> index a23db59635a4..3a43b3584caf 100644 >> --- a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll >> +++ b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll >> @@ -331,12 +331,12 @@ define <2 x float> @fptrunc_v2f32_v2f64(<2 x >> double> %vf1) { >> ; P9: # %bb.0: >> ; P9-NEXT: xsrsp f0, v2 >> ; P9-NEXT: xscvdpspn vs0, f0 >> -; P9-NEXT: xxsldwi v3, vs0, vs0, 1 >> +; P9-NEXT: xxsldwi v3, vs0, vs0, 3 >> ; P9-NEXT: xxswapd vs0, v2 >> ; P9-NEXT: xsrsp f0, f0 >> ; P9-NEXT: xscvdpspn vs0, f0 >> -; P9-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; P9-NEXT: vmrglw v2, v3, v2 >> +; P9-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; P9-NEXT: vmrghw v2, v3, v2 >> ; P9-NEXT: blr >> %res = call <2 x float> >> @llvm.experimental.constrained.fptrunc.v2f32.v2f64( >> <2 x double> %vf1, >> >> diff --git 
a/llvm/test/CodeGen/PowerPC/load-and-splat.ll >> b/llvm/test/CodeGen/PowerPC/load-and-splat.ll >> index f411712ba3fa..26da1fdaefef 100644 >> --- a/llvm/test/CodeGen/PowerPC/load-and-splat.ll >> +++ b/llvm/test/CodeGen/PowerPC/load-and-splat.ll >> @@ -40,8 +40,7 @@ define dso_local void @test2(<4 x float>* nocapture %c, >> float* nocapture readonl >> ; P8: # %bb.0: # %entry >> ; P8-NEXT: addi r4, r4, 12 >> ; P8-NEXT: lfiwzx f0, 0, r4 >> -; P8-NEXT: xxswapd vs0, f0 >> -; P8-NEXT: xxspltw v2, vs0, 3 >> +; P8-NEXT: xxspltw v2, vs0, 1 >> ; P8-NEXT: stvx v2, 0, r3 >> ; P8-NEXT: blr >> entry: >> @@ -65,8 +64,7 @@ define dso_local void @test3(<4 x i32>* nocapture %c, >> i32* nocapture readonly %a >> ; P8: # %bb.0: # %entry >> ; P8-NEXT: addi r4, r4, 12 >> ; P8-NEXT: lfiwzx f0, 0, r4 >> -; P8-NEXT: xxswapd vs0, f0 >> -; P8-NEXT: xxspltw v2, vs0, 3 >> +; P8-NEXT: xxspltw v2, vs0, 1 >> ; P8-NEXT: stvx v2, 0, r3 >> ; P8-NEXT: blr >> entry: >> @@ -110,8 +108,7 @@ define <16 x i8> @unadjusted_lxvwsx(i32* %s, i32* %t) >> { >> ; P8-LABEL: unadjusted_lxvwsx: >> ; P8: # %bb.0: # %entry >> ; P8-NEXT: lfiwzx f0, 0, r3 >> -; P8-NEXT: xxswapd vs0, f0 >> -; P8-NEXT: xxspltw v2, vs0, 3 >> +; P8-NEXT: xxspltw v2, vs0, 1 >> ; P8-NEXT: blr >> entry: >> %0 = bitcast i32* %s to <4 x i8>* >> @@ -131,8 +128,7 @@ define <16 x i8> @adjusted_lxvwsx(i64* %s, i64* %t) { >> ; P8: # %bb.0: # %entry >> ; P8-NEXT: ld r3, 0(r3) >> ; P8-NEXT: mtfprd f0, r3 >> -; P8-NEXT: xxswapd v2, vs0 >> -; P8-NEXT: xxspltw v2, v2, 2 >> +; P8-NEXT: xxspltw v2, vs0, 0 >> ; P8-NEXT: blr >> entry: >> %0 = bitcast i64* %s to <8 x i8>* >> >> diff --git a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll >> b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll >> index 409978549c36..a03ab5f9519e 100644 >> --- a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll >> +++ b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll >> @@ -9,8 +9,7 @@ define <16 x i8> @test(i32* %s, i32* %t) { >> ; CHECK-LE-LABEL: test: >> ; CHECK-LE: # %bb.0: # 
%entry >> ; CHECK-LE-NEXT: lfiwzx f0, 0, r3 >> -; CHECK-LE-NEXT: xxswapd vs0, f0 >> -; CHECK-LE-NEXT: xxspltw v2, vs0, 3 >> +; CHECK-LE-NEXT: xxspltw v2, vs0, 1 >> ; CHECK-LE-NEXT: blr >> >> ; CHECK-LABEL: test: >> >> diff --git a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll >> b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll >> index e1f0e827b9f6..dffa0fb98fc0 100644 >> --- a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll >> +++ b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll >> @@ -21,8 +21,8 @@ entry: >> ; CHECK: sldi r3, r3, 56 >> ; CHECK: mtvsrd v2, r3 >> ; CHECK-LE-LABEL: buildc >> -; CHECK-LE: mtfprd f0, r3 >> -; CHECK-LE: xxswapd v2, vs0 >> +; CHECK-LE: mtvsrd v2, r3 >> +; CHECK-LE: vspltb v2, v2, 7 >> } >> >> ; Function Attrs: norecurse nounwind readnone >> @@ -35,8 +35,8 @@ entry: >> ; CHECK: sldi r3, r3, 48 >> ; CHECK: mtvsrd v2, r3 >> ; CHECK-LE-LABEL: builds >> -; CHECK-LE: mtfprd f0, r3 >> -; CHECK-LE: xxswapd v2, vs0 >> +; CHECK-LE: mtvsrd v2, r3 >> +; CHECK-LE: vsplth v2, v2, 3 >> } >> >> ; Function Attrs: norecurse nounwind readnone >> >> diff --git a/llvm/test/CodeGen/PowerPC/pr25080.ll >> b/llvm/test/CodeGen/PowerPC/pr25080.ll >> index 7a2fb76fd453..f87cb5b940ca 100644 >> --- a/llvm/test/CodeGen/PowerPC/pr25080.ll >> +++ b/llvm/test/CodeGen/PowerPC/pr25080.ll >> @@ -17,41 +17,33 @@ define <8 x i16> @pr25080(<8 x i32> %a) { >> ; LE-NEXT: mfvsrwz 3, 34 >> ; LE-NEXT: xxsldwi 1, 34, 34, 1 >> ; LE-NEXT: mfvsrwz 4, 35 >> -; LE-NEXT: xxsldwi 4, 34, 34, 3 >> -; LE-NEXT: mtfprd 2, 3 >> +; LE-NEXT: xxsldwi 2, 34, 34, 3 >> +; LE-NEXT: mtvsrd 36, 3 >> ; LE-NEXT: mffprwz 3, 0 >> ; LE-NEXT: xxswapd 0, 35 >> -; LE-NEXT: mtfprd 3, 4 >> -; LE-NEXT: xxsldwi 5, 35, 35, 1 >> +; LE-NEXT: mtvsrd 37, 4 >> ; LE-NEXT: mffprwz 4, 1 >> -; LE-NEXT: xxsldwi 7, 35, 35, 3 >> -; LE-NEXT: mtfprd 1, 3 >> -; LE-NEXT: xxswapd 33, 3 >> -; LE-NEXT: mffprwz 3, 4 >> -; LE-NEXT: mtfprd 4, 4 >> -; LE-NEXT: xxswapd 34, 1 >> +; LE-NEXT: 
xxsldwi 1, 35, 35, 1 >> +; LE-NEXT: mtvsrd 34, 3 >> +; LE-NEXT: mffprwz 3, 2 >> +; LE-NEXT: mtvsrd 32, 4 >> ; LE-NEXT: mffprwz 4, 0 >> -; LE-NEXT: mtfprd 0, 3 >> -; LE-NEXT: xxswapd 35, 4 >> -; LE-NEXT: mffprwz 3, 5 >> -; LE-NEXT: mtfprd 6, 4 >> -; LE-NEXT: xxswapd 36, 0 >> -; LE-NEXT: mtfprd 1, 3 >> -; LE-NEXT: mffprwz 3, 7 >> -; LE-NEXT: xxswapd 37, 6 >> -; LE-NEXT: vmrglh 2, 3, 2 >> -; LE-NEXT: xxswapd 35, 2 >> -; LE-NEXT: mtfprd 2, 3 >> -; LE-NEXT: xxswapd 32, 1 >> +; LE-NEXT: xxsldwi 0, 35, 35, 3 >> +; LE-NEXT: mtvsrd 33, 3 >> +; LE-NEXT: mffprwz 3, 1 >> +; LE-NEXT: mtvsrd 38, 4 >> +; LE-NEXT: mtvsrd 35, 3 >> +; LE-NEXT: mffprwz 3, 0 >> +; LE-NEXT: vmrghh 2, 0, 2 >> +; LE-NEXT: mtvsrd 32, 3 >> ; LE-NEXT: addis 3, 2, .LCPI0_1@toc@ha >> +; LE-NEXT: vmrghh 4, 1, 4 >> ; LE-NEXT: addi 3, 3, .LCPI0_1@toc@l >> -; LE-NEXT: xxswapd 38, 2 >> -; LE-NEXT: vmrglh 3, 4, 3 >> -; LE-NEXT: vmrglh 4, 0, 5 >> -; LE-NEXT: vmrglh 5, 6, 1 >> -; LE-NEXT: vmrglw 2, 3, 2 >> -; LE-NEXT: vmrglw 3, 5, 4 >> +; LE-NEXT: vmrghh 3, 3, 6 >> +; LE-NEXT: vmrghh 5, 0, 5 >> +; LE-NEXT: vmrglw 2, 4, 2 >> ; LE-NEXT: vspltish 4, 15 >> +; LE-NEXT: vmrglw 3, 5, 3 >> ; LE-NEXT: xxmrgld 34, 35, 34 >> ; LE-NEXT: lvx 3, 0, 3 >> ; LE-NEXT: xxlor 34, 34, 35 >> >> diff --git a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll >> b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll >> index 4c10c3813fb5..d3bfb910fc9f 100644 >> --- a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll >> +++ b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll >> @@ -58,12 +58,11 @@ L.LB38_2452: >> >> ; CHECK-LABEL: @aercalc_ >> ; CHECK: lfs >> -; CHECK: xxspltd >> +; CHECK: xxswapd >> ; CHECK: stxvd2x >> ; CHECK-NOT: xxswapd >> >> ; CHECK-P9-LABEL: @aercalc_ >> ; CHECK-P9: lfs >> -; CHECK-P9: xxspltd >> ; CHECK-P9: stxv >> ; CHECK-P9-NOT: xxswapd >> >> diff --git a/llvm/test/CodeGen/PowerPC/pr38087.ll >> b/llvm/test/CodeGen/PowerPC/pr38087.ll >> index e05a3d2b97aa..49b3d39bc18c 100644 >> --- a/llvm/test/CodeGen/PowerPC/pr38087.ll >> +++ 
b/llvm/test/CodeGen/PowerPC/pr38087.ll >> @@ -11,9 +11,8 @@ declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, >> i32) #0 >> define void @draw_llvm_vs_variant0(<4 x float> %x) { >> ; CHECK-LABEL: draw_llvm_vs_variant0: >> ; CHECK: # %bb.0: # %entry >> -; CHECK-NEXT: lfd f0, 0(r3) >> -; CHECK-NEXT: xxswapd v3, f0 >> -; CHECK-NEXT: vmrglh v3, v3, v3 >> +; CHECK-NEXT: lxsd v3, 0(r3) >> +; CHECK-NEXT: vmrghh v3, v3, v3 >> ; CHECK-NEXT: vextsh2w v3, v3 >> ; CHECK-NEXT: xvcvsxwsp vs0, v3 >> ; CHECK-NEXT: xxspltw vs0, vs0, 2 >> >> diff --git a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll >> b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll >> index 4c9137d86124..6584cb74bdb5 100644 >> --- a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll >> +++ b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll >> @@ -11,34 +11,31 @@ >> define signext i32 @test_pre_inc_disable_1(i8* nocapture readonly %pix1, >> i32 signext %i_stride_pix1, i8* nocapture readonly %pix2) { >> ; CHECK-LABEL: test_pre_inc_disable_1: >> ; CHECK: # %bb.0: # %entry >> -; CHECK-NEXT: lfd f0, 0(r5) >> +; CHECK-NEXT: lxsd v5, 0(r5) >> ; CHECK-NEXT: addis r5, r2, .LCPI0_0@toc@ha >> ; CHECK-NEXT: addi r5, r5, .LCPI0_0@toc@l >> ; CHECK-NEXT: lxvx v2, 0, r5 >> ; CHECK-NEXT: addis r5, r2, .LCPI0_1@toc@ha >> ; CHECK-NEXT: addi r5, r5, .LCPI0_1@toc@l >> ; CHECK-NEXT: lxvx v4, 0, r5 >> -; CHECK-NEXT: xxswapd v5, f0 >> -; CHECK-NEXT: xxlxor v3, v3, v3 >> ; CHECK-NEXT: li r5, 4 >> +; CHECK-NEXT: xxlxor v3, v3, v3 >> ; CHECK-NEXT: vperm v0, v3, v5, v2 >> ; CHECK-NEXT: mtctr r5 >> ; CHECK-NEXT: li r5, 0 >> -; CHECK-NEXT: vperm v1, v5, v3, v4 >> +; CHECK-NEXT: vperm v1, v3, v5, v4 >> ; CHECK-NEXT: li r6, 0 >> ; CHECK-NEXT: xvnegsp v5, v0 >> ; CHECK-NEXT: xvnegsp v0, v1 >> ; CHECK-NEXT: .p2align 4 >> ; CHECK-NEXT: .LBB0_1: # %for.cond1.preheader >> ; CHECK-NEXT: # >> -; CHECK-NEXT: lfd f0, 0(r3) >> -; CHECK-NEXT: xxswapd v1, f0 >> -; CHECK-NEXT: lfdx f0, r3, r4 >> -; CHECK-NEXT: vperm v6, v1, v3, v4 >> +; CHECK-NEXT: lxsd 
v1, 0(r3) >> +; CHECK-NEXT: vperm v6, v3, v1, v4 >> ; CHECK-NEXT: vperm v1, v3, v1, v2 >> ; CHECK-NEXT: xvnegsp v1, v1 >> -; CHECK-NEXT: add r7, r3, r4 >> ; CHECK-NEXT: xvnegsp v6, v6 >> +; CHECK-NEXT: add r7, r3, r4 >> ; CHECK-NEXT: vabsduw v1, v1, v5 >> ; CHECK-NEXT: vabsduw v6, v6, v0 >> ; CHECK-NEXT: vadduwm v1, v6, v1 >> @@ -46,15 +43,14 @@ define signext i32 @test_pre_inc_disable_1(i8* >> nocapture readonly %pix1, i32 sig >> ; CHECK-NEXT: vadduwm v1, v1, v6 >> ; CHECK-NEXT: xxspltw v6, v1, 2 >> ; CHECK-NEXT: vadduwm v1, v1, v6 >> -; CHECK-NEXT: xxswapd v6, f0 >> +; CHECK-NEXT: lxsdx v6, r3, r4 >> ; CHECK-NEXT: vextuwrx r3, r5, v1 >> -; CHECK-NEXT: vperm v7, v6, v3, v4 >> +; CHECK-NEXT: vperm v7, v3, v6, v4 >> ; CHECK-NEXT: vperm v6, v3, v6, v2 >> -; CHECK-NEXT: add r6, r3, r6 >> -; CHECK-NEXT: add r3, r7, r4 >> ; CHECK-NEXT: xvnegsp v6, v6 >> ; CHECK-NEXT: xvnegsp v1, v7 >> ; CHECK-NEXT: vabsduw v6, v6, v5 >> +; CHECK-NEXT: add r6, r3, r6 >> ; CHECK-NEXT: vabsduw v1, v1, v0 >> ; CHECK-NEXT: vadduwm v1, v1, v6 >> ; CHECK-NEXT: xxswapd v6, v1 >> @@ -62,6 +58,7 @@ define signext i32 @test_pre_inc_disable_1(i8* >> nocapture readonly %pix1, i32 sig >> ; CHECK-NEXT: xxspltw v6, v1, 2 >> ; CHECK-NEXT: vadduwm v1, v1, v6 >> ; CHECK-NEXT: vextuwrx r8, r5, v1 >> +; CHECK-NEXT: add r3, r7, r4 >> ; CHECK-NEXT: add r6, r8, r6 >> ; CHECK-NEXT: bdnz .LBB0_1 >> ; CHECK-NEXT: # %bb.2: # %for.cond.cleanup >> @@ -181,29 +178,27 @@ for.cond.cleanup: ; >> preds = %for.cond1.preheader >> define signext i32 @test_pre_inc_disable_2(i8* nocapture readonly %pix1, >> i8* nocapture readonly %pix2) { >> ; CHECK-LABEL: test_pre_inc_disable_2: >> ; CHECK: # %bb.0: # %entry >> -; CHECK-NEXT: lfd f0, 0(r3) >> +; CHECK-NEXT: lxsd v2, 0(r3) >> ; CHECK-NEXT: addis r3, r2, .LCPI1_0 at toc@ha >> ; CHECK-NEXT: addi r3, r3, .LCPI1_0 at toc@l >> ; CHECK-NEXT: lxvx v4, 0, r3 >> ; CHECK-NEXT: addis r3, r2, .LCPI1_1 at toc@ha >> -; CHECK-NEXT: xxswapd v2, f0 >> -; CHECK-NEXT: lfd f0, 0(r4) >> ; 
CHECK-NEXT: addi r3, r3, .LCPI1_1 at toc@l >> -; CHECK-NEXT: xxlxor v3, v3, v3 >> ; CHECK-NEXT: lxvx v0, 0, r3 >> -; CHECK-NEXT: xxswapd v1, f0 >> -; CHECK-NEXT: vperm v5, v2, v3, v4 >> +; CHECK-NEXT: lxsd v1, 0(r4) >> +; CHECK-NEXT: xxlxor v3, v3, v3 >> +; CHECK-NEXT: vperm v5, v3, v2, v4 >> ; CHECK-NEXT: vperm v2, v3, v2, v0 >> ; CHECK-NEXT: vperm v0, v3, v1, v0 >> -; CHECK-NEXT: vperm v3, v1, v3, v4 >> +; CHECK-NEXT: vperm v3, v3, v1, v4 >> ; CHECK-NEXT: vabsduw v2, v2, v0 >> ; CHECK-NEXT: vabsduw v3, v5, v3 >> ; CHECK-NEXT: vadduwm v2, v3, v2 >> ; CHECK-NEXT: xxswapd v3, v2 >> -; CHECK-NEXT: li r3, 0 >> ; CHECK-NEXT: vadduwm v2, v2, v3 >> ; CHECK-NEXT: xxspltw v3, v2, 2 >> ; CHECK-NEXT: vadduwm v2, v2, v3 >> +; CHECK-NEXT: li r3, 0 >> ; CHECK-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-NEXT: extsw r3, r3 >> ; CHECK-NEXT: blr >> @@ -286,16 +281,14 @@ define void @test32(i8* nocapture readonly %pix2, >> i32 signext %i_pix2) { >> ; CHECK-LABEL: test32: >> ; CHECK: # %bb.0: # %entry >> ; CHECK-NEXT: add r5, r3, r4 >> -; CHECK-NEXT: lfiwzx f0, r3, r4 >> +; CHECK-NEXT: lxsiwzx v2, r3, r4 >> ; CHECK-NEXT: addis r3, r2, .LCPI2_0 at toc@ha >> ; CHECK-NEXT: addi r3, r3, .LCPI2_0 at toc@l >> ; CHECK-NEXT: lxvx v4, 0, r3 >> ; CHECK-NEXT: li r3, 4 >> -; CHECK-NEXT: xxswapd v2, f0 >> -; CHECK-NEXT: lfiwzx f0, r5, r3 >> +; CHECK-NEXT: lxsiwzx v5, r5, r3 >> ; CHECK-NEXT: xxlxor v3, v3, v3 >> ; CHECK-NEXT: vperm v2, v2, v3, v4 >> -; CHECK-NEXT: xxswapd v5, f0 >> ; CHECK-NEXT: vperm v3, v5, v3, v4 >> ; CHECK-NEXT: vspltisw v4, 8 >> ; CHECK-NEXT: vnegw v3, v3 >> @@ -361,16 +354,15 @@ define void @test16(i16* nocapture readonly %sums, >> i32 signext %delta, i32 signe >> ; CHECK-NEXT: lxsihzx v2, r6, r7 >> ; CHECK-NEXT: lxsihzx v4, r3, r4 >> ; CHECK-NEXT: li r6, 0 >> -; CHECK-NEXT: mtfprd f0, r6 >> +; CHECK-NEXT: mtvsrd v3, r6 >> ; CHECK-NEXT: vsplth v4, v4, 3 >> -; CHECK-NEXT: xxswapd v3, vs0 >> ; CHECK-NEXT: vsplth v2, v2, 3 >> ; CHECK-NEXT: addis r3, r2, .LCPI3_0 at toc@ha >> ; 
CHECK-NEXT: addi r3, r3, .LCPI3_0@toc@l >> -; CHECK-NEXT: vmrglh v2, v3, v2 >> -; CHECK-NEXT: vmrglh v3, v3, v4 >> -; CHECK-NEXT: xxlxor v4, v4, v4 >> -; CHECK-NEXT: vmrglw v3, v3, v4 >> +; CHECK-NEXT: vmrghh v4, v3, v4 >> +; CHECK-NEXT: vmrghh v2, v3, v2 >> +; CHECK-NEXT: vsplth v3, v3, 3 >> +; CHECK-NEXT: vmrglw v3, v4, v3 >> ; CHECK-NEXT: lxvx v4, 0, r3 >> ; CHECK-NEXT: li r3, 0 >> ; CHECK-NEXT: vperm v2, v2, v3, v4 >> @@ -446,18 +438,17 @@ define void @test8(i8* nocapture readonly %sums, >> i32 signext %delta, i32 signext >> ; CHECK-NEXT: add r6, r3, r4 >> ; CHECK-NEXT: lxsibzx v2, r3, r4 >> ; CHECK-NEXT: li r3, 0 >> -; CHECK-NEXT: mtfprd f0, r3 >> +; CHECK-NEXT: mtvsrd v3, r3 >> ; CHECK-NEXT: li r3, 8 >> ; CHECK-NEXT: lxsibzx v5, r6, r3 >> -; CHECK-NEXT: xxswapd v3, vs0 >> -; CHECK-NEXT: vspltb v4, v3, 15 >> -; CHECK-NEXT: vspltb v2, v2, 7 >> -; CHECK-NEXT: vmrglb v2, v3, v2 >> ; CHECK-NEXT: addis r3, r2, .LCPI4_0@toc@ha >> ; CHECK-NEXT: addi r3, r3, .LCPI4_0@toc@l >> +; CHECK-NEXT: vspltb v2, v2, 7 >> +; CHECK-NEXT: vmrghb v2, v3, v2 >> +; CHECK-NEXT: vspltb v4, v3, 7 >> ; CHECK-NEXT: vspltb v5, v5, 7 >> ; CHECK-NEXT: vmrglh v2, v2, v4 >> -; CHECK-NEXT: vmrglb v3, v3, v5 >> +; CHECK-NEXT: vmrghb v3, v3, v5 >> ; CHECK-NEXT: vmrglw v2, v2, v4 >> ; CHECK-NEXT: vmrglh v3, v3, v4 >> ; CHECK-NEXT: vmrglw v3, v4, v3 >> >> diff --git a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll >> b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll >> index 099611a7b5e3..50b864980d98 100644 >> --- a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll >> +++ b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll >> @@ -53,8 +53,7 @@ define <4 x float> @foof(float* nocapture readonly %a) >> #0 { >> ; CHECK-LABEL: foof: >> ; CHECK: # %bb.0: # %entry >> ; CHECK-NEXT: lfiwzx f0, 0, r3 >> -; CHECK-NEXT: xxswapd vs0, f0 >> -; CHECK-NEXT: xxspltw v2, vs0, 3 >> +; CHECK-NEXT: xxspltw v2, vs0, 1 >> ; CHECK-NEXT: blr >> entry: >> %0 = load float, float* %a, align 4 >> @@ -68,8 +67,7 @@ define <4 x float>
@foofx(float* nocapture readonly %a, >> i64 %idx) #0 { >> ; CHECK: # %bb.0: # %entry >> ; CHECK-NEXT: sldi r4, r4, 2 >> ; CHECK-NEXT: lfiwzx f0, r3, r4 >> -; CHECK-NEXT: xxswapd vs0, f0 >> -; CHECK-NEXT: xxspltw v2, vs0, 3 >> +; CHECK-NEXT: xxspltw v2, vs0, 1 >> ; CHECK-NEXT: blr >> entry: >> %p = getelementptr float, float* %a, i64 %idx >> >> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll >> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll >> index b43e2c8b97af..c12f7f9a9f05 100644 >> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll >> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll >> @@ -13,8 +13,7 @@ define <2 x i64> @s2v_test1(i64* nocapture readonly >> %int64, <2 x i64> %vec) { >> ; P9LE-LABEL: s2v_test1: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 0(r3) >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test1: >> @@ -33,8 +32,7 @@ define <2 x i64> @s2v_test2(i64* nocapture readonly >> %int64, <2 x i64> %vec) { >> ; P9LE-LABEL: s2v_test2: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 8(r3) >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test2: >> @@ -55,8 +53,7 @@ define <2 x i64> @s2v_test3(i64* nocapture readonly >> %int64, <2 x i64> %vec, i32 >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: sldi r4, r7, 3 >> ; P9LE-NEXT: lfdx f0, r3, r4 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test3 >> @@ -78,8 +75,7 @@ define <2 x i64> @s2v_test4(i64* nocapture readonly >> %int64, <2 x i64> %vec) { >> ; P9LE-LABEL: s2v_test4: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 8(r3) >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> 
>> ; P9BE-LABEL: s2v_test4: >> @@ -99,8 +95,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i64* >> nocapture readonly %ptr1) { >> ; P9LE-LABEL: s2v_test5: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 0(r5) >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test5: >> @@ -119,8 +114,7 @@ define <2 x double> @s2v_test_f1(double* nocapture >> readonly %f64, <2 x double> % >> ; P9LE-LABEL: s2v_test_f1: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 0(r3) >> -; P9LE-NEXT: xxswapd vs0, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f1: >> @@ -132,8 +126,7 @@ define <2 x double> @s2v_test_f1(double* nocapture >> readonly %f64, <2 x double> % >> ; P8LE-LABEL: s2v_test_f1: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfdx f0, 0, r3 >> -; P8LE-NEXT: xxspltd vs0, vs0, 0 >> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f1: >> @@ -152,8 +145,7 @@ define <2 x double> @s2v_test_f2(double* nocapture >> readonly %f64, <2 x double> % >> ; P9LE-LABEL: s2v_test_f2: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 8(r3) >> -; P9LE-NEXT: xxswapd vs0, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f2: >> @@ -165,8 +157,7 @@ define <2 x double> @s2v_test_f2(double* nocapture >> readonly %f64, <2 x double> % >> ; P8LE-LABEL: s2v_test_f2: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfd f0, 8(r3) >> -; P8LE-NEXT: xxspltd vs0, vs0, 0 >> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f2: >> @@ -187,8 +178,7 @@ define <2 x double> @s2v_test_f3(double* nocapture >> readonly %f64, <2 x double> % >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: sldi r4, r7, 3 >> ; 
P9LE-NEXT: lfdx f0, r3, r4 >> -; P9LE-NEXT: xxswapd vs0, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f3: >> @@ -202,8 +192,7 @@ define <2 x double> @s2v_test_f3(double* nocapture >> readonly %f64, <2 x double> % >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: sldi r4, r7, 3 >> ; P8LE-NEXT: lfdx f0, r3, r4 >> -; P8LE-NEXT: xxspltd vs0, vs0, 0 >> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f3: >> @@ -225,8 +214,7 @@ define <2 x double> @s2v_test_f4(double* nocapture >> readonly %f64, <2 x double> % >> ; P9LE-LABEL: s2v_test_f4: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 8(r3) >> -; P9LE-NEXT: xxswapd vs0, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f4: >> @@ -238,8 +226,7 @@ define <2 x double> @s2v_test_f4(double* nocapture >> readonly %f64, <2 x double> % >> ; P8LE-LABEL: s2v_test_f4: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfd f0, 8(r3) >> -; P8LE-NEXT: xxspltd vs0, vs0, 0 >> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f4: >> @@ -259,8 +246,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec, >> double* nocapture readonly % >> ; P9LE-LABEL: s2v_test_f5: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfd f0, 0(r5) >> -; P9LE-NEXT: xxswapd vs0, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f5: >> @@ -272,8 +258,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec, >> double* nocapture readonly % >> ; P8LE-LABEL: s2v_test_f5: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfdx f0, 0, r5 >> -; P8LE-NEXT: xxspltd vs0, vs0, 0 >> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: 
s2v_test_f5: >> >> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll >> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll >> index 83691b52575d..f4572c359942 100644 >> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll >> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll >> @@ -12,8 +12,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly >> %int32, <2 x i64> %vec) { >> ; P9LE-LABEL: s2v_test1: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfiwax f0, 0, r3 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test1: >> @@ -25,8 +24,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly >> %int32, <2 x i64> %vec) { >> ; P8LE-LABEL: s2v_test1: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfiwax f0, 0, r3 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test1: >> @@ -47,8 +45,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly >> %int32, <2 x i64> %vec) { >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: addi r3, r3, 4 >> ; P9LE-NEXT: lfiwax f0, 0, r3 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test2: >> @@ -62,8 +59,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly >> %int32, <2 x i64> %vec) { >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: addi r3, r3, 4 >> ; P8LE-NEXT: lfiwax f0, 0, r3 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test2: >> @@ -86,8 +82,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly >> %int32, <2 x i64> %vec, i32 >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: sldi r4, r7, 2 >> ; P9LE-NEXT: lfiwax f0, r3, r4 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: 
xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test3: >> @@ -101,8 +96,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly >> %int32, <2 x i64> %vec, i32 >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: sldi r4, r7, 2 >> ; P8LE-NEXT: lfiwax f0, r3, r4 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test3: >> @@ -126,8 +120,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly >> %int32, <2 x i64> %vec) { >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: addi r3, r3, 4 >> ; P9LE-NEXT: lfiwax f0, 0, r3 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test4: >> @@ -141,8 +134,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly >> %int32, <2 x i64> %vec) { >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: addi r3, r3, 4 >> ; P8LE-NEXT: lfiwax f0, 0, r3 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test4: >> @@ -164,8 +156,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32* >> nocapture readonly %ptr1) { >> ; P9LE-LABEL: s2v_test5: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfiwax f0, 0, r5 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P9LE-NEXT: xxmrghd v2, v2, vs0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test5: >> @@ -177,8 +168,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32* >> nocapture readonly %ptr1) { >> ; P8LE-LABEL: s2v_test5: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfiwax f0, 0, r5 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1 >> +; P8LE-NEXT: xxmrghd v2, v2, vs0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test5: >> @@ -198,8 +188,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly >> %ptr) { >> ; P9LE-LABEL: s2v_test6: >> ; P9LE: # %bb.0: # %entry >> ; 
P9LE-NEXT: lfiwax f0, 0, r3 >> -; P9LE-NEXT: xxswapd v2, f0 >> -; P9LE-NEXT: xxspltd v2, v2, 1 >> +; P9LE-NEXT: xxspltd v2, vs0, 0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test6: >> @@ -211,8 +200,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly >> %ptr) { >> ; P8LE-LABEL: s2v_test6: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfiwax f0, 0, r3 >> -; P8LE-NEXT: xxswapd v2, f0 >> -; P8LE-NEXT: xxspltd v2, v2, 1 >> +; P8LE-NEXT: xxspltd v2, vs0, 0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test6: >> @@ -233,8 +221,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly >> %ptr) { >> ; P9LE-LABEL: s2v_test7: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: lfiwax f0, 0, r3 >> -; P9LE-NEXT: xxswapd v2, f0 >> -; P9LE-NEXT: xxspltd v2, v2, 1 >> +; P9LE-NEXT: xxspltd v2, vs0, 0 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test7: >> @@ -246,8 +233,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly >> %ptr) { >> ; P8LE-LABEL: s2v_test7: >> ; P8LE: # %bb.0: # %entry >> ; P8LE-NEXT: lfiwax f0, 0, r3 >> -; P8LE-NEXT: xxswapd v2, f0 >> -; P8LE-NEXT: xxspltd v2, v2, 1 >> +; P8LE-NEXT: xxspltd v2, vs0, 0 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test7: >> >> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll >> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll >> index 2261d75c6619..3dc34533420c 100644 >> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll >> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll >> @@ -11,12 +11,11 @@ >> define <4 x i32> @s2v_test1(i32* nocapture readonly %int32, <4 x i32> >> %vec) { >> ; P8LE-LABEL: s2v_test1: >> ; P8LE: # %bb.0: # %entry >> -; P8LE-NEXT: lfiwzx f0, 0, r3 >> ; P8LE-NEXT: addis r4, r2, .LCPI0_0@toc@ha >> -; P8LE-NEXT: addi r3, r4, .LCPI0_0@toc@l >> -; P8LE-NEXT: lvx v3, 0, r3 >> -; P8LE-NEXT: xxswapd v4, f0 >> -; P8LE-NEXT: vperm v2, v4, v2, v3 >> +; P8LE-NEXT: lxsiwzx v4, 0, r3 >> +; P8LE-NEXT: addi r4, r4, .LCPI0_0@toc@l >> +; P8LE-NEXT: lvx v3, 0, r4 >> +; P8LE-NEXT: vperm v2, v2,
v4, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test1: >> @@ -36,13 +35,12 @@ entry: >> define <4 x i32> @s2v_test2(i32* nocapture readonly %int32, <4 x i32> >> %vec) { >> ; P8LE-LABEL: s2v_test2: >> ; P8LE: # %bb.0: # %entry >> -; P8LE-NEXT: addi r3, r3, 4 >> ; P8LE-NEXT: addis r4, r2, .LCPI1_0@toc@ha >> -; P8LE-NEXT: lfiwzx f0, 0, r3 >> -; P8LE-NEXT: addi r3, r4, .LCPI1_0@toc@l >> -; P8LE-NEXT: lvx v3, 0, r3 >> -; P8LE-NEXT: xxswapd v4, f0 >> -; P8LE-NEXT: vperm v2, v4, v2, v3 >> +; P8LE-NEXT: addi r3, r3, 4 >> +; P8LE-NEXT: addi r4, r4, .LCPI1_0@toc@l >> +; P8LE-NEXT: lxsiwzx v4, 0, r3 >> +; P8LE-NEXT: lvx v3, 0, r4 >> +; P8LE-NEXT: vperm v2, v2, v4, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test2: >> @@ -64,13 +62,12 @@ entry: >> define <4 x i32> @s2v_test3(i32* nocapture readonly %int32, <4 x i32> >> %vec, i32 signext %Idx) { >> ; P8LE-LABEL: s2v_test3: >> ; P8LE: # %bb.0: # %entry >> -; P8LE-NEXT: sldi r5, r7, 2 >> ; P8LE-NEXT: addis r4, r2, .LCPI2_0@toc@ha >> -; P8LE-NEXT: lfiwzx f0, r3, r5 >> -; P8LE-NEXT: addi r3, r4, .LCPI2_0@toc@l >> -; P8LE-NEXT: lvx v4, 0, r3 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: vperm v2, v3, v2, v4 >> +; P8LE-NEXT: sldi r5, r7, 2 >> +; P8LE-NEXT: addi r4, r4, .LCPI2_0@toc@l >> +; P8LE-NEXT: lxsiwzx v3, r3, r5 >> +; P8LE-NEXT: lvx v4, 0, r4 >> +; P8LE-NEXT: vperm v2, v2, v3, v4 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test3: >> @@ -93,13 +90,12 @@ entry: >> define <4 x i32> @s2v_test4(i32* nocapture readonly %int32, <4 x i32> >> %vec) { >> ; P8LE-LABEL: s2v_test4: >> ; P8LE: # %bb.0: # %entry >> -; P8LE-NEXT: addi r3, r3, 4 >> ; P8LE-NEXT: addis r4, r2, .LCPI3_0@toc@ha >> -; P8LE-NEXT: lfiwzx f0, 0, r3 >> -; P8LE-NEXT: addi r3, r4, .LCPI3_0@toc@l >> -; P8LE-NEXT: lvx v3, 0, r3 >> -; P8LE-NEXT: xxswapd v4, f0 >> -; P8LE-NEXT: vperm v2, v4, v2, v3 >> +; P8LE-NEXT: addi r3, r3, 4 >> +; P8LE-NEXT: addi r4, r4, .LCPI3_0@toc@l >> +; P8LE-NEXT: lxsiwzx v4, 0, r3 >> +; P8LE-NEXT: lvx v3, 0, r4
>> +; P8LE-NEXT: vperm v2, v2, v4, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test4: >> @@ -121,12 +117,11 @@ entry: >> define <4 x i32> @s2v_test5(<4 x i32> %vec, i32* nocapture readonly >> %ptr1) { >> ; P8LE-LABEL: s2v_test5: >> ; P8LE: # %bb.0: # %entry >> -; P8LE-NEXT: lfiwzx f0, 0, r5 >> ; P8LE-NEXT: addis r3, r2, .LCPI4_0@toc@ha >> +; P8LE-NEXT: lxsiwzx v4, 0, r5 >> ; P8LE-NEXT: addi r3, r3, .LCPI4_0@toc@l >> ; P8LE-NEXT: lvx v3, 0, r3 >> -; P8LE-NEXT: xxswapd v4, f0 >> -; P8LE-NEXT: vperm v2, v4, v2, v3 >> +; P8LE-NEXT: vperm v2, v2, v4, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test5: >> @@ -146,12 +141,11 @@ entry: >> define <4 x float> @s2v_test_f1(float* nocapture readonly %f64, <4 x >> float> %vec) { >> ; P8LE-LABEL: s2v_test_f1: >> ; P8LE: # %bb.0: # %entry >> -; P8LE-NEXT: lfiwzx f0, 0, r3 >> ; P8LE-NEXT: addis r4, r2, .LCPI5_0@toc@ha >> -; P8LE-NEXT: addi r3, r4, .LCPI5_0@toc@l >> -; P8LE-NEXT: lvx v3, 0, r3 >> -; P8LE-NEXT: xxswapd v4, f0 >> -; P8LE-NEXT: vperm v2, v4, v2, v3 >> +; P8LE-NEXT: lxsiwzx v4, 0, r3 >> +; P8LE-NEXT: addi r4, r4, .LCPI5_0@toc@l >> +; P8LE-NEXT: lvx v3, 0, r4 >> +; P8LE-NEXT: vperm v2, v2, v4, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f1: >> @@ -172,10 +166,9 @@ define <2 x float> @s2v_test_f2(float* nocapture >> readonly %f64, <2 x float> %vec >> ; P9LE-LABEL: s2v_test_f2: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: addi r3, r3, 4 >> -; P9LE-DAG: xxspltw v2, v2, 2 >> -; P9LE-DAG: lfiwzx f0, 0, r3 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: vmrglw v2, v2, v3 >> +; P9LE-NEXT: lxsiwzx v3, 0, r3 >> +; P9LE-NEXT: vmrglw v2, v2, v2 >> +; P9LE-NEXT: vmrghw v2, v2, v3 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f2: >> @@ -189,11 +182,10 @@ define <2 x float> @s2v_test_f2(float* nocapture >> readonly %f64, <2 x float> %vec >> >> ; P8LE-LABEL: s2v_test_f2: >> ; P8LE: # %bb.0: # %entry >> +; P8LE-NEXT: vmrglw v2, v2, v2 >> ; P8LE-NEXT: addi r3, r3, 4 >> -; P8LE-NEXT: xxspltw v2, v2, 2 >>
-; P8LE-NEXT: lfiwzx f0, 0, r3 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: vmrglw v2, v2, v3 >> +; P8LE-NEXT: lxsiwzx v3, 0, r3 >> +; P8LE-NEXT: vmrghw v2, v2, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f2: >> @@ -216,10 +208,9 @@ define <2 x float> @s2v_test_f3(float* nocapture >> readonly %f64, <2 x float> %vec >> ; P9LE-LABEL: s2v_test_f3: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: sldi r4, r7, 2 >> -; P9LE-NEXT: lfiwzx f0, r3, r4 >> -; P9LE-DAG: xxspltw v2, v2, 2 >> -; P9LE-DAG: xxswapd v3, f0 >> -; P9LE-NEXT: vmrglw v2, v2, v3 >> +; P9LE-NEXT: lxsiwzx v3, r3, r4 >> +; P9LE-NEXT: vmrglw v2, v2, v2 >> +; P9LE-NEXT: vmrghw v2, v2, v3 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f3: >> @@ -233,11 +224,10 @@ define <2 x float> @s2v_test_f3(float* nocapture >> readonly %f64, <2 x float> %vec >> >> ; P8LE-LABEL: s2v_test_f3: >> ; P8LE: # %bb.0: # %entry >> +; P8LE-NEXT: vmrglw v2, v2, v2 >> ; P8LE-NEXT: sldi r4, r7, 2 >> -; P8LE-NEXT: xxspltw v2, v2, 2 >> -; P8LE-NEXT: lfiwzx f0, r3, r4 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: vmrglw v2, v2, v3 >> +; P8LE-NEXT: lxsiwzx v3, r3, r4 >> +; P8LE-NEXT: vmrghw v2, v2, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f3: >> @@ -261,10 +251,9 @@ define <2 x float> @s2v_test_f4(float* nocapture >> readonly %f64, <2 x float> %vec >> ; P9LE-LABEL: s2v_test_f4: >> ; P9LE: # %bb.0: # %entry >> ; P9LE-NEXT: addi r3, r3, 4 >> -; P9LE-NEXT: lfiwzx f0, 0, r3 >> -; P9LE-DAG: xxspltw v2, v2, 2 >> -; P9LE-DAG: xxswapd v3, f0 >> -; P9LE-NEXT: vmrglw v2, v2, v3 >> +; P9LE-NEXT: lxsiwzx v3, 0, r3 >> +; P9LE-NEXT: vmrglw v2, v2, v2 >> +; P9LE-NEXT: vmrghw v2, v2, v3 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f4: >> @@ -278,11 +267,10 @@ define <2 x float> @s2v_test_f4(float* nocapture >> readonly %f64, <2 x float> %vec >> >> ; P8LE-LABEL: s2v_test_f4: >> ; P8LE: # %bb.0: # %entry >> +; P8LE-NEXT: vmrglw v2, v2, v2 >> ; P8LE-NEXT: addi r3, r3, 4 >> -; P8LE-NEXT: xxspltw v2, v2, 2 >> -; P8LE-NEXT: lfiwzx 
f0, 0, r3 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: vmrglw v2, v2, v3 >> +; P8LE-NEXT: lxsiwzx v3, 0, r3 >> +; P8LE-NEXT: vmrghw v2, v2, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f4: >> @@ -304,10 +292,9 @@ entry: >> define <2 x float> @s2v_test_f5(<2 x float> %vec, float* nocapture >> readonly %ptr1) { >> ; P9LE-LABEL: s2v_test_f5: >> ; P9LE: # %bb.0: # %entry >> -; P9LE-NEXT: lfiwzx f0, 0, r5 >> -; P9LE-NEXT: xxspltw v2, v2, 2 >> -; P9LE-NEXT: xxswapd v3, f0 >> -; P9LE-NEXT: vmrglw v2, v2, v3 >> +; P9LE-NEXT: lxsiwzx v3, 0, r5 >> +; P9LE-NEXT: vmrglw v2, v2, v2 >> +; P9LE-NEXT: vmrghw v2, v2, v3 >> ; P9LE-NEXT: blr >> >> ; P9BE-LABEL: s2v_test_f5: >> @@ -320,10 +307,9 @@ define <2 x float> @s2v_test_f5(<2 x float> %vec, >> float* nocapture readonly %ptr >> >> ; P8LE-LABEL: s2v_test_f5: >> ; P8LE: # %bb.0: # %entry >> -; P8LE-NEXT: lfiwzx f0, 0, r5 >> -; P8LE-NEXT: xxspltw v2, v2, 2 >> -; P8LE-NEXT: xxswapd v3, f0 >> -; P8LE-NEXT: vmrglw v2, v2, v3 >> +; P8LE-NEXT: vmrglw v2, v2, v2 >> +; P8LE-NEXT: lxsiwzx v3, 0, r5 >> +; P8LE-NEXT: vmrghw v2, v2, v3 >> ; P8LE-NEXT: blr >> >> ; P8BE-LABEL: s2v_test_f5: >> >> diff --git a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll >> b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll >> index 935630745f47..097ba07a5b1e 100644 >> --- a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll >> +++ b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll >> @@ -13,60 +13,56 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, -21386 >> -; P9LE-NEXT: ori r5, r5, 37253 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r5, r4, r5 >> -; P9LE-NEXT: add r4, r5, r4 >> +; P9LE-NEXT: lis r4, -21386 >> +; P9LE-NEXT: ori r4, r4, 37253 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> +; P9LE-NEXT: add r4, r4, r3 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 6 >> ; P9LE-NEXT: add r4, r4, r5 >> -; P9LE-NEXT: lis 
r5, 31710 >> ; P9LE-NEXT: mulli r4, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, 31710 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: ori r5, r5, 63421 >> -; P9LE-NEXT: mulhw r5, r4, r5 >> -; P9LE-NEXT: sub r4, r5, r4 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: ori r4, r4, 63421 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> +; P9LE-NEXT: sub r4, r4, r3 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 6 >> ; P9LE-NEXT: add r4, r4, r5 >> -; P9LE-NEXT: lis r5, 21399 >> ; P9LE-NEXT: mulli r4, r4, -124 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, 21399 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: ori r5, r5, 33437 >> -; P9LE-NEXT: mulhw r4, r4, r5 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: ori r4, r4, 33437 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 5 >> ; P9LE-NEXT: add r4, r4, r5 >> -; P9LE-NEXT: lis r5, -16728 >> ; P9LE-NEXT: mulli r4, r4, 98 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: ori r5, r5, 63249 >> -; P9LE-NEXT: mulhw r4, r4, r5 >> +; P9LE-NEXT: lis r4, -16728 >> +; P9LE-NEXT: ori r4, r4, 63249 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 8 >> ; P9LE-NEXT: add r4, r4, r5 >> ; P9LE-NEXT: mulli r4, r4, -1003 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh 
v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> ; P9LE-NEXT: vmrglw v2, v2, v3 >> ; P9LE-NEXT: blr >> ; >> @@ -135,58 +131,54 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) { >> ; P8LE: # %bb.0: >> ; P8LE-NEXT: xxswapd vs0, v2 >> ; P8LE-NEXT: lis r3, 21399 >> -; P8LE-NEXT: lis r9, -21386 >> -; P8LE-NEXT: lis r11, 31710 >> ; P8LE-NEXT: lis r8, -16728 >> +; P8LE-NEXT: lis r9, -21386 >> +; P8LE-NEXT: lis r10, 31710 >> ; P8LE-NEXT: ori r3, r3, 33437 >> -; P8LE-NEXT: ori r9, r9, 37253 >> ; P8LE-NEXT: ori r8, r8, 63249 >> +; P8LE-NEXT: ori r9, r9, 37253 >> +; P8LE-NEXT: ori r10, r10, 63421 >> ; P8LE-NEXT: mffprd r4, f0 >> ; P8LE-NEXT: rldicl r5, r4, 32, 48 >> -; P8LE-NEXT: clrldi r7, r4, 48 >> ; P8LE-NEXT: rldicl r6, r4, 16, 48 >> +; P8LE-NEXT: clrldi r7, r4, 48 >> +; P8LE-NEXT: extsh r5, r5 >> +; P8LE-NEXT: extsh r6, r6 >> ; P8LE-NEXT: rldicl r4, r4, 48, 48 >> -; P8LE-NEXT: extsh r10, r5 >> -; P8LE-NEXT: extsh r0, r7 >> -; P8LE-NEXT: mulhw r3, r10, r3 >> -; P8LE-NEXT: ori r10, r11, 63421 >> -; P8LE-NEXT: extsh r11, r4 >> -; P8LE-NEXT: extsh r12, r6 >> -; P8LE-NEXT: mulhw r9, r0, r9 >> -; P8LE-NEXT: mulhw r10, r11, r10 >> -; P8LE-NEXT: mulhw r8, r12, r8 >> -; P8LE-NEXT: srwi r12, r3, 31 >> +; P8LE-NEXT: extsh r7, r7 >> +; P8LE-NEXT: mulhw r3, r5, r3 >> +; P8LE-NEXT: extsh r4, r4 >> +; P8LE-NEXT: mulhw r8, r6, r8 >> +; P8LE-NEXT: mulhw r9, r7, r9 >> +; P8LE-NEXT: mulhw r10, r4, r10 >> +; P8LE-NEXT: srwi r11, r3, 31 >> ; P8LE-NEXT: srawi r3, r3, 5 >> -; P8LE-NEXT: add r9, r9, r0 >> -; P8LE-NEXT: sub r10, r10, r11 >> -; P8LE-NEXT: add r3, r3, r12 >> +; P8LE-NEXT: add r3, r3, r11 >> +; P8LE-NEXT: srwi r11, r8, 31 >> +; P8LE-NEXT: add r9, r9, r7 >> +; P8LE-NEXT: srawi r8, r8, 8 >> +; P8LE-NEXT: sub r10, r10, r4 >> +; P8LE-NEXT: add r8, r8, r11 >> ; P8LE-NEXT: srwi r11, r9, 31 >> ; P8LE-NEXT: srawi r9, r9, 6 >> -; P8LE-NEXT: srwi r12, r8, 31 >> -; P8LE-NEXT: srawi r8, r8, 8 >> +; P8LE-NEXT: mulli r3, r3, 98 >> ; P8LE-NEXT: add r9, r9, r11 >> 
; P8LE-NEXT: srwi r11, r10, 31 >> ; P8LE-NEXT: srawi r10, r10, 6 >> -; P8LE-NEXT: add r8, r8, r12 >> -; P8LE-NEXT: mulli r3, r3, 98 >> -; P8LE-NEXT: add r10, r10, r11 >> ; P8LE-NEXT: mulli r8, r8, -1003 >> +; P8LE-NEXT: add r10, r10, r11 >> ; P8LE-NEXT: mulli r9, r9, 95 >> ; P8LE-NEXT: mulli r10, r10, -124 >> ; P8LE-NEXT: sub r3, r5, r3 >> +; P8LE-NEXT: mtvsrd v2, r3 >> ; P8LE-NEXT: sub r5, r6, r8 >> -; P8LE-NEXT: mtfprd f0, r3 >> ; P8LE-NEXT: sub r3, r7, r9 >> +; P8LE-NEXT: mtvsrd v3, r5 >> ; P8LE-NEXT: sub r4, r4, r10 >> -; P8LE-NEXT: mtfprd f1, r5 >> -; P8LE-NEXT: mtfprd f2, r3 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: mtfprd f3, r4 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: xxswapd v5, vs3 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: vmrglh v3, v5, v4 >> +; P8LE-NEXT: mtvsrd v4, r3 >> +; P8LE-NEXT: mtvsrd v5, r4 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> +; P8LE-NEXT: vmrghh v3, v5, v4 >> ; P8LE-NEXT: vmrglw v2, v2, v3 >> ; P8LE-NEXT: blr >> ; >> @@ -256,56 +248,52 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, -21386 >> -; P9LE-NEXT: ori r5, r5, 37253 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r6, r4, r5 >> -; P9LE-NEXT: add r4, r6, r4 >> -; P9LE-NEXT: srwi r6, r4, 31 >> -; P9LE-NEXT: srawi r4, r4, 6 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: mulli r4, r4, 95 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, -21386 >> +; P9LE-NEXT: ori r4, r4, 37253 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r5, r3, r4 >> +; P9LE-NEXT: add r5, r5, r3 >> +; P9LE-NEXT: srwi r6, r5, 31 >> +; P9LE-NEXT: srawi r5, r5, 6 >> +; P9LE-NEXT: add r5, r5, r6 >> +; P9LE-NEXT: mulli r5, r5, 95 >> +; P9LE-NEXT: sub r3, r3, r5 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw 
r6, r4, r5 >> -; P9LE-NEXT: add r4, r6, r4 >> -; P9LE-NEXT: srwi r6, r4, 31 >> -; P9LE-NEXT: srawi r4, r4, 6 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: mulli r4, r4, 95 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r5, r3, r4 >> +; P9LE-NEXT: add r5, r5, r3 >> +; P9LE-NEXT: srwi r6, r5, 31 >> +; P9LE-NEXT: srawi r5, r5, 6 >> +; P9LE-NEXT: add r5, r5, r6 >> +; P9LE-NEXT: mulli r5, r5, 95 >> +; P9LE-NEXT: sub r3, r3, r5 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r6, r4, r5 >> -; P9LE-NEXT: add r4, r6, r4 >> -; P9LE-NEXT: srwi r6, r4, 31 >> -; P9LE-NEXT: srawi r4, r4, 6 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: mulli r4, r4, 95 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r5, r3, r4 >> +; P9LE-NEXT: add r5, r5, r3 >> +; P9LE-NEXT: srwi r6, r5, 31 >> +; P9LE-NEXT: srawi r5, r5, 6 >> +; P9LE-NEXT: add r5, r5, r6 >> +; P9LE-NEXT: mulli r5, r5, 95 >> +; P9LE-NEXT: sub r3, r3, r5 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r5, r4, r5 >> -; P9LE-NEXT: add r4, r5, r4 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> +; P9LE-NEXT: add r4, r4, r3 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 6 >> ; P9LE-NEXT: add r4, r4, r5 >> ; P9LE-NEXT: mulli r4, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> ; P9LE-NEXT: vmrglw v2, v2, v3 >> ; P9LE-NEXT: blr >> ; >> @@ -370,56 +358,50 
@@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) { >> ; P8LE: # %bb.0: >> ; P8LE-NEXT: xxswapd vs0, v2 >> ; P8LE-NEXT: lis r3, -21386 >> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill >> ; P8LE-NEXT: ori r3, r3, 37253 >> ; P8LE-NEXT: mffprd r4, f0 >> ; P8LE-NEXT: clrldi r5, r4, 48 >> ; P8LE-NEXT: rldicl r6, r4, 48, 48 >> -; P8LE-NEXT: extsh r8, r5 >> +; P8LE-NEXT: extsh r5, r5 >> ; P8LE-NEXT: rldicl r7, r4, 32, 48 >> -; P8LE-NEXT: extsh r9, r6 >> -; P8LE-NEXT: mulhw r10, r8, r3 >> +; P8LE-NEXT: extsh r6, r6 >> +; P8LE-NEXT: mulhw r8, r5, r3 >> ; P8LE-NEXT: rldicl r4, r4, 16, 48 >> -; P8LE-NEXT: extsh r11, r7 >> -; P8LE-NEXT: mulhw r12, r9, r3 >> -; P8LE-NEXT: extsh r0, r4 >> -; P8LE-NEXT: mulhw r30, r11, r3 >> -; P8LE-NEXT: mulhw r3, r0, r3 >> -; P8LE-NEXT: add r8, r10, r8 >> -; P8LE-NEXT: add r9, r12, r9 >> -; P8LE-NEXT: srwi r10, r8, 31 >> +; P8LE-NEXT: extsh r7, r7 >> +; P8LE-NEXT: mulhw r9, r6, r3 >> +; P8LE-NEXT: extsh r4, r4 >> +; P8LE-NEXT: mulhw r10, r7, r3 >> +; P8LE-NEXT: mulhw r3, r4, r3 >> +; P8LE-NEXT: add r8, r8, r5 >> +; P8LE-NEXT: add r9, r9, r6 >> +; P8LE-NEXT: srwi r11, r8, 31 >> ; P8LE-NEXT: srawi r8, r8, 6 >> -; P8LE-NEXT: add r11, r30, r11 >> -; P8LE-NEXT: add r3, r3, r0 >> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload >> -; P8LE-NEXT: add r8, r8, r10 >> -; P8LE-NEXT: srwi r10, r9, 31 >> +; P8LE-NEXT: add r10, r10, r7 >> +; P8LE-NEXT: add r3, r3, r4 >> +; P8LE-NEXT: add r8, r8, r11 >> +; P8LE-NEXT: srwi r11, r9, 31 >> ; P8LE-NEXT: srawi r9, r9, 6 >> ; P8LE-NEXT: mulli r8, r8, 95 >> -; P8LE-NEXT: add r9, r9, r10 >> -; P8LE-NEXT: srwi r10, r11, 31 >> -; P8LE-NEXT: srawi r11, r11, 6 >> +; P8LE-NEXT: add r9, r9, r11 >> +; P8LE-NEXT: srwi r11, r10, 31 >> +; P8LE-NEXT: srawi r10, r10, 6 >> ; P8LE-NEXT: mulli r9, r9, 95 >> -; P8LE-NEXT: add r10, r11, r10 >> +; P8LE-NEXT: add r10, r10, r11 >> ; P8LE-NEXT: srwi r11, r3, 31 >> ; P8LE-NEXT: srawi r3, r3, 6 >> ; P8LE-NEXT: mulli r10, r10, 95 >> ; P8LE-NEXT: sub r5, r5, r8 >> ; P8LE-NEXT: add 
r3, r3, r11 >> -; P8LE-NEXT: mtfprd f0, r5 >> +; P8LE-NEXT: mtvsrd v2, r5 >> ; P8LE-NEXT: mulli r3, r3, 95 >> ; P8LE-NEXT: sub r6, r6, r9 >> -; P8LE-NEXT: mtfprd f1, r6 >> -; P8LE-NEXT: xxswapd v2, vs0 >> +; P8LE-NEXT: mtvsrd v3, r6 >> ; P8LE-NEXT: sub r5, r7, r10 >> -; P8LE-NEXT: mtfprd f2, r5 >> -; P8LE-NEXT: xxswapd v3, vs1 >> +; P8LE-NEXT: mtvsrd v4, r5 >> ; P8LE-NEXT: sub r3, r4, r3 >> -; P8LE-NEXT: mtfprd f3, r3 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: xxswapd v5, vs3 >> -; P8LE-NEXT: vmrglh v3, v5, v4 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> +; P8LE-NEXT: mtvsrd v5, r3 >> +; P8LE-NEXT: vmrghh v3, v5, v4 >> ; P8LE-NEXT: vmrglw v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> @@ -487,67 +469,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, -21386 >> -; P9LE-NEXT: ori r5, r5, 37253 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r6, r4, r5 >> -; P9LE-NEXT: add r4, r6, r4 >> -; P9LE-NEXT: srwi r6, r4, 31 >> -; P9LE-NEXT: srawi r4, r4, 6 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: mulli r6, r4, 95 >> +; P9LE-NEXT: lis r4, -21386 >> +; P9LE-NEXT: ori r4, r4, 37253 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r5, r3, r4 >> +; P9LE-NEXT: add r5, r5, r3 >> +; P9LE-NEXT: srwi r6, r5, 31 >> +; P9LE-NEXT: srawi r5, r5, 6 >> +; P9LE-NEXT: add r5, r5, r6 >> +; P9LE-NEXT: mulli r6, r5, 95 >> ; P9LE-NEXT: sub r3, r3, r6 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: extsh r6, r3 >> -; P9LE-NEXT: mulhw r7, r6, r5 >> +; P9LE-NEXT: mulhw r7, r6, r4 >> ; P9LE-NEXT: add r6, r7, r6 >> ; P9LE-NEXT: srwi r7, r6, 31 >> ; P9LE-NEXT: srawi r6, r6, 6 >> ; P9LE-NEXT: add r6, r6, r7 >> ; P9LE-NEXT: mulli r7, r6, 95 >> ; P9LE-NEXT: sub r3, r3, r7 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v4, 
r3 >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: extsh r7, r3 >> -; P9LE-NEXT: mulhw r8, r7, r5 >> +; P9LE-NEXT: mulhw r8, r7, r4 >> ; P9LE-NEXT: add r7, r8, r7 >> ; P9LE-NEXT: srwi r8, r7, 31 >> ; P9LE-NEXT: srawi r7, r7, 6 >> ; P9LE-NEXT: add r7, r7, r8 >> ; P9LE-NEXT: mulli r8, r7, 95 >> ; P9LE-NEXT: sub r3, r3, r8 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: extsh r8, r3 >> -; P9LE-NEXT: mulhw r5, r8, r5 >> -; P9LE-NEXT: add r5, r5, r8 >> -; P9LE-NEXT: srwi r8, r5, 31 >> -; P9LE-NEXT: srawi r5, r5, 6 >> -; P9LE-NEXT: add r5, r5, r8 >> -; P9LE-NEXT: mulli r8, r5, 95 >> +; P9LE-NEXT: mulhw r4, r8, r4 >> +; P9LE-NEXT: add r4, r4, r8 >> +; P9LE-NEXT: srwi r8, r4, 31 >> +; P9LE-NEXT: srawi r4, r4, 6 >> +; P9LE-NEXT: add r4, r4, r8 >> +; P9LE-NEXT: mulli r8, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r8 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: mtfprd f0, r4 >> -; P9LE-NEXT: vmrglh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v4, r6 >> ; P9LE-NEXT: vmrglw v2, v2, v3 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r6 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r7 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r5 >> -; P9LE-NEXT: xxswapd v5, vs0 >> -; P9LE-NEXT: vmrglh v4, v5, v4 >> +; P9LE-NEXT: mtvsrd v3, r5 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r7 >> +; P9LE-NEXT: mtvsrd v5, r4 >> +; P9LE-NEXT: vmrghh v4, v5, v4 >> ; P9LE-NEXT: vmrglw v3, v4, v3 >> ; P9LE-NEXT: vadduhm v2, v2, v3 >> ; P9LE-NEXT: blr >> @@ -624,69 +598,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) { >> ; P8LE-LABEL: combine_srem_sdiv: >> ; P8LE: # 
%bb.0: >> ; P8LE-NEXT: xxswapd vs0, v2 >> -; P8LE-NEXT: lis r4, -21386 >> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill >> -; P8LE-NEXT: ori r4, r4, 37253 >> -; P8LE-NEXT: mffprd r5, f0 >> -; P8LE-NEXT: clrldi r3, r5, 48 >> -; P8LE-NEXT: rldicl r6, r5, 48, 48 >> -; P8LE-NEXT: rldicl r7, r5, 32, 48 >> -; P8LE-NEXT: extsh r8, r3 >> -; P8LE-NEXT: extsh r9, r6 >> -; P8LE-NEXT: extsh r10, r7 >> -; P8LE-NEXT: mulhw r11, r8, r4 >> -; P8LE-NEXT: rldicl r5, r5, 16, 48 >> -; P8LE-NEXT: mulhw r12, r9, r4 >> -; P8LE-NEXT: mulhw r0, r10, r4 >> -; P8LE-NEXT: extsh r30, r5 >> -; P8LE-NEXT: mulhw r4, r30, r4 >> +; P8LE-NEXT: lis r3, -21386 >> +; P8LE-NEXT: ori r3, r3, 37253 >> +; P8LE-NEXT: mffprd r4, f0 >> +; P8LE-NEXT: clrldi r5, r4, 48 >> +; P8LE-NEXT: rldicl r6, r4, 48, 48 >> +; P8LE-NEXT: rldicl r7, r4, 32, 48 >> +; P8LE-NEXT: extsh r5, r5 >> +; P8LE-NEXT: extsh r8, r6 >> +; P8LE-NEXT: extsh r9, r7 >> +; P8LE-NEXT: mulhw r10, r5, r3 >> +; P8LE-NEXT: mulhw r11, r8, r3 >> +; P8LE-NEXT: rldicl r4, r4, 16, 48 >> +; P8LE-NEXT: mulhw r12, r9, r3 >> +; P8LE-NEXT: extsh r0, r4 >> +; P8LE-NEXT: mulhw r3, r0, r3 >> +; P8LE-NEXT: add r10, r10, r5 >> ; P8LE-NEXT: add r8, r11, r8 >> +; P8LE-NEXT: srwi r11, r10, 31 >> ; P8LE-NEXT: add r9, r12, r9 >> -; P8LE-NEXT: srwi r11, r8, 31 >> -; P8LE-NEXT: add r10, r0, r10 >> -; P8LE-NEXT: srawi r8, r8, 6 >> -; P8LE-NEXT: srawi r12, r9, 6 >> +; P8LE-NEXT: srawi r10, r10, 6 >> +; P8LE-NEXT: srawi r12, r8, 6 >> +; P8LE-NEXT: srwi r8, r8, 31 >> +; P8LE-NEXT: add r10, r10, r11 >> +; P8LE-NEXT: add r3, r3, r0 >> +; P8LE-NEXT: srawi r11, r9, 6 >> ; P8LE-NEXT: srwi r9, r9, 31 >> -; P8LE-NEXT: add r8, r8, r11 >> -; P8LE-NEXT: add r4, r4, r30 >> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload >> -; P8LE-NEXT: srawi r11, r10, 6 >> -; P8LE-NEXT: srwi r10, r10, 31 >> -; P8LE-NEXT: add r9, r12, r9 >> -; P8LE-NEXT: mtfprd f0, r8 >> -; P8LE-NEXT: mulli r12, r8, 95 >> -; P8LE-NEXT: add r10, r11, r10 >> -; P8LE-NEXT: srwi r8, r4, 31 >> -; P8LE-NEXT: 
mtfprd f1, r9 >> -; P8LE-NEXT: srawi r4, r4, 6 >> -; P8LE-NEXT: mulli r11, r9, 95 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: mtfprd f2, r10 >> -; P8LE-NEXT: mulli r9, r10, 95 >> -; P8LE-NEXT: add r4, r4, r8 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: mtfprd f3, r4 >> -; P8LE-NEXT: mulli r4, r4, 95 >> -; P8LE-NEXT: xxswapd v1, vs2 >> -; P8LE-NEXT: sub r3, r3, r12 >> -; P8LE-NEXT: mtfprd f0, r3 >> -; P8LE-NEXT: sub r6, r6, r11 >> -; P8LE-NEXT: xxswapd v6, vs3 >> -; P8LE-NEXT: sub r3, r7, r9 >> -; P8LE-NEXT: mtfprd f1, r6 >> -; P8LE-NEXT: mtfprd f4, r3 >> -; P8LE-NEXT: sub r3, r5, r4 >> -; P8LE-NEXT: mtfprd f5, r3 >> -; P8LE-NEXT: xxswapd v4, vs1 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: xxswapd v3, vs0 >> -; P8LE-NEXT: xxswapd v5, vs4 >> -; P8LE-NEXT: xxswapd v0, vs5 >> -; P8LE-NEXT: vmrglh v3, v4, v3 >> -; P8LE-NEXT: vmrglh v4, v0, v5 >> -; P8LE-NEXT: vmrglh v5, v6, v1 >> -; P8LE-NEXT: vmrglw v3, v4, v3 >> -; P8LE-NEXT: vmrglw v2, v5, v2 >> +; P8LE-NEXT: add r8, r12, r8 >> +; P8LE-NEXT: mtvsrd v2, r10 >> +; P8LE-NEXT: mulli r12, r10, 95 >> +; P8LE-NEXT: add r9, r11, r9 >> +; P8LE-NEXT: srwi r11, r3, 31 >> +; P8LE-NEXT: mtvsrd v3, r8 >> +; P8LE-NEXT: srawi r3, r3, 6 >> +; P8LE-NEXT: mulli r10, r8, 95 >> +; P8LE-NEXT: mtvsrd v4, r9 >> +; P8LE-NEXT: add r3, r3, r11 >> +; P8LE-NEXT: mulli r8, r9, 95 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> +; P8LE-NEXT: mulli r9, r3, 95 >> +; P8LE-NEXT: sub r5, r5, r12 >> +; P8LE-NEXT: sub r6, r6, r10 >> +; P8LE-NEXT: mtvsrd v3, r5 >> +; P8LE-NEXT: mtvsrd v5, r6 >> +; P8LE-NEXT: sub r5, r7, r8 >> +; P8LE-NEXT: sub r4, r4, r9 >> +; P8LE-NEXT: mtvsrd v0, r5 >> +; P8LE-NEXT: mtvsrd v1, r4 >> +; P8LE-NEXT: vmrghh v3, v5, v3 >> +; P8LE-NEXT: mtvsrd v5, r3 >> +; P8LE-NEXT: vmrghh v0, v1, v0 >> +; P8LE-NEXT: vmrghh v4, v5, v4 >> +; P8LE-NEXT: vmrglw v3, v0, v3 >> +; P8LE-NEXT: vmrglw v2, v4, v2 >> ; P8LE-NEXT: vadduhm v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> @@ -767,47 +731,43 @@ define <4 x i16> 
@dont_fold_srem_power_of_two(<4 x >> i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: srawi r4, r4, 6 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: srawi r4, r3, 6 >> ; P9LE-NEXT: addze r4, r4 >> ; P9LE-NEXT: slwi r4, r4, 6 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: srawi r4, r4, 5 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: srawi r4, r3, 5 >> ; P9LE-NEXT: addze r4, r4 >> ; P9LE-NEXT: slwi r4, r4, 5 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, -21386 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, -21386 >> -; P9LE-NEXT: ori r5, r5, 37253 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r5, r4, r5 >> -; P9LE-NEXT: add r4, r5, r4 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: ori r4, r4, 37253 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> +; P9LE-NEXT: add r4, r4, r3 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 6 >> ; P9LE-NEXT: add r4, r4, r5 >> ; P9LE-NEXT: mulli r4, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: srawi r4, r4, 3 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: srawi r4, r3, 3 >> ; P9LE-NEXT: addze r4, r4 >> ; P9LE-NEXT: slwi r4, r4, 3 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v2, v4, v2 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v4, v2 >> ; P9LE-NEXT: vmrglw v2, v2, 
v3 >> ; P9LE-NEXT: blr >> ; >> @@ -866,42 +826,38 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x >> i16> %x) { >> ; P8LE-NEXT: ori r3, r3, 37253 >> ; P8LE-NEXT: mffprd r4, f0 >> ; P8LE-NEXT: rldicl r5, r4, 16, 48 >> -; P8LE-NEXT: clrldi r7, r4, 48 >> -; P8LE-NEXT: extsh r6, r5 >> -; P8LE-NEXT: extsh r8, r7 >> -; P8LE-NEXT: mulhw r3, r6, r3 >> -; P8LE-NEXT: rldicl r9, r4, 48, 48 >> -; P8LE-NEXT: srawi r8, r8, 6 >> -; P8LE-NEXT: extsh r10, r9 >> +; P8LE-NEXT: clrldi r6, r4, 48 >> +; P8LE-NEXT: extsh r5, r5 >> +; P8LE-NEXT: extsh r6, r6 >> +; P8LE-NEXT: mulhw r3, r5, r3 >> +; P8LE-NEXT: rldicl r7, r4, 48, 48 >> +; P8LE-NEXT: srawi r8, r6, 6 >> +; P8LE-NEXT: extsh r7, r7 >> ; P8LE-NEXT: addze r8, r8 >> ; P8LE-NEXT: rldicl r4, r4, 32, 48 >> -; P8LE-NEXT: srawi r10, r10, 5 >> +; P8LE-NEXT: srawi r9, r7, 5 >> +; P8LE-NEXT: extsh r4, r4 >> ; P8LE-NEXT: slwi r8, r8, 6 >> -; P8LE-NEXT: add r3, r3, r6 >> -; P8LE-NEXT: addze r6, r10 >> -; P8LE-NEXT: sub r7, r7, r8 >> +; P8LE-NEXT: add r3, r3, r5 >> +; P8LE-NEXT: addze r9, r9 >> +; P8LE-NEXT: sub r6, r6, r8 >> ; P8LE-NEXT: srwi r10, r3, 31 >> ; P8LE-NEXT: srawi r3, r3, 6 >> -; P8LE-NEXT: mtfprd f0, r7 >> -; P8LE-NEXT: slwi r6, r6, 5 >> +; P8LE-NEXT: slwi r8, r9, 5 >> +; P8LE-NEXT: mtvsrd v2, r6 >> ; P8LE-NEXT: add r3, r3, r10 >> -; P8LE-NEXT: extsh r10, r4 >> -; P8LE-NEXT: sub r6, r9, r6 >> +; P8LE-NEXT: srawi r9, r4, 3 >> +; P8LE-NEXT: sub r6, r7, r8 >> ; P8LE-NEXT: mulli r3, r3, 95 >> -; P8LE-NEXT: srawi r8, r10, 3 >> -; P8LE-NEXT: mtfprd f1, r6 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: addze r7, r8 >> -; P8LE-NEXT: xxswapd v3, vs1 >> +; P8LE-NEXT: addze r7, r9 >> +; P8LE-NEXT: mtvsrd v3, r6 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> ; P8LE-NEXT: sub r3, r5, r3 >> ; P8LE-NEXT: slwi r5, r7, 3 >> ; P8LE-NEXT: sub r4, r4, r5 >> -; P8LE-NEXT: mtfprd f2, r3 >> -; P8LE-NEXT: mtfprd f3, r4 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: xxswapd v5, vs3 >> -; P8LE-NEXT: vmrglh v3, v4, v5 
>> +; P8LE-NEXT: mtvsrd v4, r3 >> +; P8LE-NEXT: mtvsrd v5, r4 >> +; P8LE-NEXT: vmrghh v3, v4, v5 >> ; P8LE-NEXT: vmrglw v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> @@ -959,48 +915,46 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, -14230 >> -; P9LE-NEXT: ori r5, r5, 30865 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r5, r4, r5 >> -; P9LE-NEXT: add r4, r5, r4 >> +; P9LE-NEXT: lis r4, -14230 >> +; P9LE-NEXT: ori r4, r4, 30865 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> +; P9LE-NEXT: add r4, r4, r3 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 9 >> ; P9LE-NEXT: add r4, r4, r5 >> -; P9LE-NEXT: lis r5, -19946 >> ; P9LE-NEXT: mulli r4, r4, 654 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, -19946 >> +; P9LE-NEXT: mtvsrd v3, r3 >> +; P9LE-NEXT: li r3, 0 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 4 >> -; P9LE-NEXT: ori r5, r5, 17097 >> -; P9LE-NEXT: xxlxor v3, v3, v3 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r5, r4, r5 >> -; P9LE-NEXT: add r4, r5, r4 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: ori r4, r4, 17097 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> +; P9LE-NEXT: add r4, r4, r3 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 4 >> ; P9LE-NEXT: add r4, r4, r5 >> -; P9LE-NEXT: lis r5, 24749 >> ; P9LE-NEXT: mulli r4, r4, 23 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: vmrghh v3, v3, v4 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: ori r5, r5, 47143 >> -; P9LE-NEXT: mulhw r4, r4, r5 >> +; P9LE-NEXT: lis r4, 24749 >> +; P9LE-NEXT: ori r4, r4, 47143 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: 
srawi r4, r4, 11 >> ; P9LE-NEXT: add r4, r4, r5 >> ; P9LE-NEXT: mulli r4, r4, 5423 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> ; P9LE-NEXT: vmrglw v2, v2, v3 >> ; P9LE-NEXT: blr >> ; >> @@ -1058,49 +1012,47 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> >> %x) { >> ; P8LE-LABEL: dont_fold_srem_one: >> ; P8LE: # %bb.0: >> ; P8LE-NEXT: xxswapd vs0, v2 >> -; P8LE-NEXT: lis r3, 24749 >> -; P8LE-NEXT: lis r7, -19946 >> -; P8LE-NEXT: lis r9, -14230 >> -; P8LE-NEXT: xxlxor v5, v5, v5 >> -; P8LE-NEXT: ori r3, r3, 47143 >> -; P8LE-NEXT: ori r7, r7, 17097 >> -; P8LE-NEXT: mffprd r4, f0 >> -; P8LE-NEXT: rldicl r5, r4, 16, 48 >> -; P8LE-NEXT: rldicl r6, r4, 32, 48 >> -; P8LE-NEXT: rldicl r4, r4, 48, 48 >> -; P8LE-NEXT: extsh r8, r5 >> -; P8LE-NEXT: extsh r10, r6 >> -; P8LE-NEXT: mulhw r3, r8, r3 >> -; P8LE-NEXT: ori r8, r9, 30865 >> -; P8LE-NEXT: extsh r9, r4 >> -; P8LE-NEXT: mulhw r7, r10, r7 >> -; P8LE-NEXT: mulhw r8, r9, r8 >> -; P8LE-NEXT: add r7, r7, r10 >> -; P8LE-NEXT: srwi r10, r3, 31 >> -; P8LE-NEXT: add r8, r8, r9 >> -; P8LE-NEXT: srawi r3, r3, 11 >> -; P8LE-NEXT: srwi r9, r7, 31 >> -; P8LE-NEXT: srawi r7, r7, 4 >> -; P8LE-NEXT: add r3, r3, r10 >> -; P8LE-NEXT: add r7, r7, r9 >> +; P8LE-NEXT: lis r5, 24749 >> +; P8LE-NEXT: lis r6, -19946 >> +; P8LE-NEXT: lis r8, -14230 >> +; P8LE-NEXT: ori r5, r5, 47143 >> +; P8LE-NEXT: ori r6, r6, 17097 >> +; P8LE-NEXT: ori r8, r8, 30865 >> +; P8LE-NEXT: mffprd r3, f0 >> +; P8LE-NEXT: rldicl r4, r3, 16, 48 >> +; P8LE-NEXT: rldicl r7, r3, 32, 48 >> +; P8LE-NEXT: rldicl r3, r3, 48, 48 >> +; P8LE-NEXT: extsh r4, r4 >> +; P8LE-NEXT: extsh r7, r7 >> +; P8LE-NEXT: extsh r3, r3 >> +; P8LE-NEXT: mulhw r5, r4, r5 >> +; P8LE-NEXT: mulhw r6, r7, r6 >> +; P8LE-NEXT: mulhw r8, r3, r8 >> +; P8LE-NEXT: srwi r9, r5, 31 >> +; 
P8LE-NEXT: srawi r5, r5, 11 >> +; P8LE-NEXT: add r6, r6, r7 >> +; P8LE-NEXT: add r8, r8, r3 >> +; P8LE-NEXT: add r5, r5, r9 >> +; P8LE-NEXT: srwi r9, r6, 31 >> +; P8LE-NEXT: srawi r6, r6, 4 >> +; P8LE-NEXT: add r6, r6, r9 >> ; P8LE-NEXT: srwi r9, r8, 31 >> ; P8LE-NEXT: srawi r8, r8, 9 >> -; P8LE-NEXT: mulli r3, r3, 5423 >> +; P8LE-NEXT: mulli r5, r5, 5423 >> ; P8LE-NEXT: add r8, r8, r9 >> -; P8LE-NEXT: mulli r7, r7, 23 >> +; P8LE-NEXT: mulli r6, r6, 23 >> +; P8LE-NEXT: li r9, 0 >> ; P8LE-NEXT: mulli r8, r8, 654 >> -; P8LE-NEXT: sub r3, r5, r3 >> -; P8LE-NEXT: mtfprd f0, r3 >> -; P8LE-NEXT: sub r3, r6, r7 >> -; P8LE-NEXT: sub r4, r4, r8 >> -; P8LE-NEXT: mtfprd f1, r3 >> -; P8LE-NEXT: mtfprd f2, r4 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: vmrglh v2, v2, v3 >> -; P8LE-NEXT: vmrglh v3, v4, v5 >> -; P8LE-NEXT: vmrglw v2, v2, v3 >> +; P8LE-NEXT: mtvsrd v2, r9 >> +; P8LE-NEXT: sub r4, r4, r5 >> +; P8LE-NEXT: sub r5, r7, r6 >> +; P8LE-NEXT: mtvsrd v3, r4 >> +; P8LE-NEXT: sub r3, r3, r8 >> +; P8LE-NEXT: mtvsrd v4, r5 >> +; P8LE-NEXT: mtvsrd v5, r3 >> +; P8LE-NEXT: vmrghh v3, v3, v4 >> +; P8LE-NEXT: vmrghh v2, v5, v2 >> +; P8LE-NEXT: vmrglw v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> ; P8BE-LABEL: dont_fold_srem_one: >> @@ -1161,43 +1113,41 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x >> i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, -19946 >> -; P9LE-NEXT: ori r5, r5, 17097 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: mulhw r5, r4, r5 >> -; P9LE-NEXT: add r4, r5, r4 >> +; P9LE-NEXT: lis r4, -19946 >> +; P9LE-NEXT: ori r4, r4, 17097 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> +; P9LE-NEXT: add r4, r4, r3 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 4 >> ; P9LE-NEXT: add r4, r4, r5 >> -; P9LE-NEXT: lis r5, 24749 >> ; P9LE-NEXT: mulli r4, r4, 23 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: 
mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, 24749 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: ori r5, r5, 47143 >> -; P9LE-NEXT: mulhw r4, r4, r5 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: ori r4, r4, 47143 >> +; P9LE-NEXT: mulhw r4, r3, r4 >> ; P9LE-NEXT: srwi r5, r4, 31 >> ; P9LE-NEXT: srawi r4, r4, 11 >> ; P9LE-NEXT: add r4, r4, r5 >> ; P9LE-NEXT: mulli r4, r4, 5423 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: extsh r4, r3 >> -; P9LE-NEXT: srawi r4, r4, 15 >> +; P9LE-NEXT: extsh r3, r3 >> +; P9LE-NEXT: srawi r4, r3, 15 >> ; P9LE-NEXT: addze r4, r4 >> ; P9LE-NEXT: slwi r4, r4, 15 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxlxor v4, v4, v4 >> -; P9LE-NEXT: vmrglh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: li r3, 0 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> ; P9LE-NEXT: vmrglw v2, v3, v2 >> ; P9LE-NEXT: blr >> ; >> @@ -1252,42 +1202,40 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x >> i16> %x) { >> ; P8LE-NEXT: xxswapd vs0, v2 >> ; P8LE-NEXT: lis r4, 24749 >> ; P8LE-NEXT: lis r5, -19946 >> -; P8LE-NEXT: xxlxor v5, v5, v5 >> ; P8LE-NEXT: ori r4, r4, 47143 >> ; P8LE-NEXT: ori r5, r5, 17097 >> ; P8LE-NEXT: mffprd r3, f0 >> ; P8LE-NEXT: rldicl r6, r3, 16, 48 >> ; P8LE-NEXT: rldicl r7, r3, 32, 48 >> -; P8LE-NEXT: extsh r8, r6 >> -; P8LE-NEXT: extsh r9, r7 >> -; P8LE-NEXT: mulhw r4, r8, r4 >> -; P8LE-NEXT: mulhw r5, r9, r5 >> +; P8LE-NEXT: extsh r6, r6 >> +; P8LE-NEXT: extsh r7, r7 >> +; P8LE-NEXT: mulhw r4, r6, r4 >> +; P8LE-NEXT: mulhw r5, r7, r5 >> ; P8LE-NEXT: rldicl r3, r3, 48, 48 >> +; P8LE-NEXT: extsh 
r3, r3 >> ; P8LE-NEXT: srwi r8, r4, 31 >> ; P8LE-NEXT: srawi r4, r4, 11 >> -; P8LE-NEXT: add r5, r5, r9 >> +; P8LE-NEXT: add r5, r5, r7 >> ; P8LE-NEXT: add r4, r4, r8 >> ; P8LE-NEXT: srwi r8, r5, 31 >> ; P8LE-NEXT: srawi r5, r5, 4 >> ; P8LE-NEXT: mulli r4, r4, 5423 >> ; P8LE-NEXT: add r5, r5, r8 >> -; P8LE-NEXT: extsh r8, r3 >> +; P8LE-NEXT: srawi r9, r3, 15 >> +; P8LE-NEXT: li r8, 0 >> ; P8LE-NEXT: mulli r5, r5, 23 >> -; P8LE-NEXT: srawi r8, r8, 15 >> +; P8LE-NEXT: mtvsrd v2, r8 >> ; P8LE-NEXT: sub r4, r6, r4 >> -; P8LE-NEXT: addze r6, r8 >> -; P8LE-NEXT: mtfprd f0, r4 >> -; P8LE-NEXT: slwi r4, r6, 15 >> +; P8LE-NEXT: addze r6, r9 >> +; P8LE-NEXT: slwi r6, r6, 15 >> +; P8LE-NEXT: mtvsrd v3, r4 >> ; P8LE-NEXT: sub r5, r7, r5 >> -; P8LE-NEXT: sub r3, r3, r4 >> -; P8LE-NEXT: mtfprd f1, r5 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: mtfprd f2, r3 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: vmrglh v2, v2, v3 >> -; P8LE-NEXT: vmrglh v3, v4, v5 >> -; P8LE-NEXT: vmrglw v2, v2, v3 >> +; P8LE-NEXT: sub r3, r3, r6 >> +; P8LE-NEXT: mtvsrd v4, r5 >> +; P8LE-NEXT: mtvsrd v5, r3 >> +; P8LE-NEXT: vmrghh v3, v3, v4 >> +; P8LE-NEXT: vmrghh v2, v5, v2 >> +; P8LE-NEXT: vmrglw v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> ; P8BE-LABEL: dont_fold_urem_i16_smax: >> >> diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll >> b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll >> index 323397202c00..95f0fc25f2dd 100644 >> --- a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll >> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll >> @@ -15,10 +15,10 @@ entry: >> } >> >> ; CHECK-LABEL: @bar0 >> +; CHECK-DAG: xxswapd 1, 1 >> ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]] >> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]] >> -; CHECK: xxpermdi [[REG3:[0-9]+]], [[REG2]], [[REG1]], 1 >> -; CHECK: stxvd2x [[REG3]] >> +; CHECK: xxmrgld [[REG2:[0-9]+]], 1, [[REG1]] >> +; CHECK: stxvd2x [[REG2]] >> ; CHECK-NOT: xxswapd >> >> define void @bar1(double %y) { >> @@ -30,10 +30,10 @@ entry: >> } >> >> ; 
CHECK-LABEL: @bar1 >> +; CHECK-DAG: xxswapd 1, 1 >> ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]] >> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]] >> -; CHECK: xxmrghd [[REG3:[0-9]+]], [[REG1]], [[REG2]] >> -; CHECK: stxvd2x [[REG3]] >> +; CHECK: xxpermdi [[REG2:[0-9]+]], [[REG1]], 1, 1 >> +; CHECK: stxvd2x [[REG2]] >> ; CHECK-NOT: xxswapd >> >> define void @baz0() { >> >> diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll >> b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll >> index 23738eaa95a7..4437e6799269 100644 >> --- a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll >> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll >> @@ -27,7 +27,7 @@ define void @bar0() { >> ; CHECK: ld r3, .LC0@toc@l(r3) >> ; CHECK: addis r3, r2, .LC2@toc@ha >> ; CHECK: ld r3, .LC2@toc@l(r3) >> -; CHECK: xxpermdi vs0, vs0, vs1, 1 >> +; CHECK: xxmrgld vs0, vs0, vs1 >> ; CHECK: stxvd2x vs0, 0, r3 >> ; CHECK: blr >> ; >> @@ -38,7 +38,7 @@ define void @bar0() { >> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1@toc@ha >> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2@toc@ha >> ; CHECK-P9-NOVECTOR: ld r3, .LC2@toc@l(r3) >> -; CHECK-P9-NOVECTOR: xxpermdi vs0, vs1, vs0, 1 >> +; CHECK-P9-NOVECTOR: xxmrgld vs0, vs1, vs0 >> ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3 >> ; CHECK-P9-NOVECTOR: blr >> ; >> @@ -72,7 +72,7 @@ define void @bar1() { >> ; CHECK: ld r3, .LC0@toc@l(r3) >> ; CHECK: addis r3, r2, .LC2@toc@ha >> ; CHECK: ld r3, .LC2@toc@l(r3) >> -; CHECK: xxmrghd vs0, vs1, vs0 >> +; CHECK: xxpermdi vs0, vs1, vs0, 1 >> ; CHECK: stxvd2x vs0, 0, r3 >> ; CHECK: blr >> ; >> @@ -83,7 +83,7 @@ define void @bar1() { >> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1@toc@ha >> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2@toc@ha >> ; CHECK-P9-NOVECTOR: ld r3, .LC2@toc@l(r3) >> -; CHECK-P9-NOVECTOR: xxmrghd vs0, vs0, vs1 >> +; CHECK-P9-NOVECTOR: xxpermdi vs0, vs0, vs1, 1 >> ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3 >> ; CHECK-P9-NOVECTOR: blr >> ; >> >> diff --git a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll >>
b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll >> index d853a420dcd8..4bb3730aa043 100644 >> --- a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll >> +++ b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll >> @@ -13,53 +13,50 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, 21399 >> -; P9LE-NEXT: ori r5, r5, 33437 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r4, r4, r5 >> -; P9LE-NEXT: lis r5, 16727 >> -; P9LE-NEXT: ori r5, r5, 2287 >> +; P9LE-NEXT: lis r4, 21399 >> +; P9LE-NEXT: ori r4, r4, 33437 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r4, r3, r4 >> ; P9LE-NEXT: srwi r4, r4, 5 >> ; P9LE-NEXT: mulli r4, r4, 98 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, 16727 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r4, r4, r5 >> -; P9LE-NEXT: lis r5, 8456 >> -; P9LE-NEXT: ori r5, r5, 16913 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: ori r4, r4, 2287 >> +; P9LE-NEXT: mulhwu r4, r3, r4 >> ; P9LE-NEXT: srwi r4, r4, 8 >> ; P9LE-NEXT: mulli r4, r4, 1003 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: rlwinm r4, r3, 30, 18, 31 >> -; P9LE-NEXT: mulhwu r4, r4, r5 >> -; P9LE-NEXT: lis r5, 22765 >> -; P9LE-NEXT: ori r5, r5, 8969 >> -; P9LE-NEXT: srwi r4, r4, 2 >> -; P9LE-NEXT: mulli r4, r4, 124 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r5, 8456 >> +; P9LE-NEXT: ori r5, r5, 16913 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: clrlwi r4, r3, 16 >> +; P9LE-NEXT: rlwinm r3, r3, 30, 18, 31 >> +; P9LE-NEXT: mulhwu r3, r3, r5 >> +; P9LE-NEXT: srwi r3, r3, 2 >> +; P9LE-NEXT: 
mulli r3, r3, 124 >> +; P9LE-NEXT: sub r3, r4, r3 >> +; P9LE-NEXT: lis r4, 22765 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r5, r4, r5 >> -; P9LE-NEXT: sub r4, r4, r5 >> -; P9LE-NEXT: srwi r4, r4, 1 >> -; P9LE-NEXT: add r4, r4, r5 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: ori r4, r4, 8969 >> +; P9LE-NEXT: mulhwu r4, r3, r4 >> +; P9LE-NEXT: sub r5, r3, r4 >> +; P9LE-NEXT: srwi r5, r5, 1 >> +; P9LE-NEXT: add r4, r5, r4 >> ; P9LE-NEXT: srwi r4, r4, 6 >> ; P9LE-NEXT: mulli r4, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v2, v4, v2 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v4, v2 >> ; P9LE-NEXT: vmrglw v2, v3, v2 >> ; P9LE-NEXT: blr >> ; >> @@ -123,50 +120,47 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) { >> ; P8LE-NEXT: xxswapd vs0, v2 >> ; P8LE-NEXT: lis r3, 22765 >> ; P8LE-NEXT: lis r7, 21399 >> -; P8LE-NEXT: lis r10, 16727 >> +; P8LE-NEXT: lis r9, 16727 >> +; P8LE-NEXT: lis r10, 8456 >> ; P8LE-NEXT: ori r3, r3, 8969 >> ; P8LE-NEXT: ori r7, r7, 33437 >> -; P8LE-NEXT: ori r10, r10, 2287 >> +; P8LE-NEXT: ori r9, r9, 2287 >> +; P8LE-NEXT: ori r10, r10, 16913 >> ; P8LE-NEXT: mffprd r4, f0 >> ; P8LE-NEXT: clrldi r6, r4, 48 >> ; P8LE-NEXT: rldicl r5, r4, 32, 48 >> -; P8LE-NEXT: clrlwi r9, r6, 16 >> +; P8LE-NEXT: clrlwi r6, r6, 16 >> ; P8LE-NEXT: rldicl r8, r4, 16, 48 >> -; P8LE-NEXT: clrlwi r11, r5, 16 >> -; P8LE-NEXT: mulhwu r3, r9, r3 >> -; P8LE-NEXT: clrlwi r12, r8, 16 >> -; P8LE-NEXT: mulhwu r7, r11, r7 >> -; P8LE-NEXT: lis r11, 8456 >> +; P8LE-NEXT: clrlwi r5, r5, 16 >> +; P8LE-NEXT: mulhwu r3, r6, r3 >> ; P8LE-NEXT: rldicl r4, r4, 48, 48 >> -; P8LE-NEXT: mulhwu r10, r12, r10 >> -; P8LE-NEXT: ori r11, r11, 16913 >> -; P8LE-NEXT: rlwinm r12, r4, 30, 18, 31 >> -; P8LE-NEXT: mulhwu 
r11, r12, r11 >> -; P8LE-NEXT: sub r9, r9, r3 >> -; P8LE-NEXT: srwi r9, r9, 1 >> +; P8LE-NEXT: clrlwi r8, r8, 16 >> +; P8LE-NEXT: rlwinm r11, r4, 30, 18, 31 >> +; P8LE-NEXT: mulhwu r7, r5, r7 >> +; P8LE-NEXT: clrlwi r4, r4, 16 >> +; P8LE-NEXT: mulhwu r9, r8, r9 >> +; P8LE-NEXT: mulhwu r10, r11, r10 >> +; P8LE-NEXT: sub r11, r6, r3 >> +; P8LE-NEXT: srwi r11, r11, 1 >> ; P8LE-NEXT: srwi r7, r7, 5 >> -; P8LE-NEXT: add r3, r9, r3 >> -; P8LE-NEXT: srwi r9, r10, 8 >> +; P8LE-NEXT: add r3, r11, r3 >> +; P8LE-NEXT: srwi r9, r9, 8 >> +; P8LE-NEXT: srwi r10, r10, 2 >> ; P8LE-NEXT: srwi r3, r3, 6 >> ; P8LE-NEXT: mulli r7, r7, 98 >> -; P8LE-NEXT: srwi r10, r11, 2 >> ; P8LE-NEXT: mulli r9, r9, 1003 >> ; P8LE-NEXT: mulli r3, r3, 95 >> ; P8LE-NEXT: mulli r10, r10, 124 >> ; P8LE-NEXT: sub r5, r5, r7 >> ; P8LE-NEXT: sub r7, r8, r9 >> -; P8LE-NEXT: mtfprd f0, r5 >> ; P8LE-NEXT: sub r3, r6, r3 >> +; P8LE-NEXT: mtvsrd v2, r5 >> ; P8LE-NEXT: sub r4, r4, r10 >> -; P8LE-NEXT: mtfprd f1, r7 >> -; P8LE-NEXT: mtfprd f2, r3 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: mtfprd f3, r4 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: xxswapd v5, vs3 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: vmrglh v3, v5, v4 >> +; P8LE-NEXT: mtvsrd v3, r7 >> +; P8LE-NEXT: mtvsrd v4, r3 >> +; P8LE-NEXT: mtvsrd v5, r4 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> +; P8LE-NEXT: vmrghh v3, v5, v4 >> ; P8LE-NEXT: vmrglw v2, v2, v3 >> ; P8LE-NEXT: blr >> ; >> @@ -230,56 +224,52 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, 22765 >> -; P9LE-NEXT: ori r5, r5, 8969 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r6, r4, r5 >> -; P9LE-NEXT: sub r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 1 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 6 >> -; P9LE-NEXT: mulli r4, r4, 95 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; 
P9LE-NEXT: lis r4, 22765 >> +; P9LE-NEXT: ori r4, r4, 8969 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r5, r3, r4 >> +; P9LE-NEXT: sub r6, r3, r5 >> +; P9LE-NEXT: srwi r6, r6, 1 >> +; P9LE-NEXT: add r5, r6, r5 >> +; P9LE-NEXT: srwi r5, r5, 6 >> +; P9LE-NEXT: mulli r5, r5, 95 >> +; P9LE-NEXT: sub r3, r3, r5 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r6, r4, r5 >> -; P9LE-NEXT: sub r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 1 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 6 >> -; P9LE-NEXT: mulli r4, r4, 95 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r5, r3, r4 >> +; P9LE-NEXT: sub r6, r3, r5 >> +; P9LE-NEXT: srwi r6, r6, 1 >> +; P9LE-NEXT: add r5, r6, r5 >> +; P9LE-NEXT: srwi r5, r5, 6 >> +; P9LE-NEXT: mulli r5, r5, 95 >> +; P9LE-NEXT: sub r3, r3, r5 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r6, r4, r5 >> -; P9LE-NEXT: sub r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 1 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 6 >> -; P9LE-NEXT: mulli r4, r4, 95 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r5, r3, r4 >> +; P9LE-NEXT: sub r6, r3, r5 >> +; P9LE-NEXT: srwi r6, r6, 1 >> +; P9LE-NEXT: add r5, r6, r5 >> +; P9LE-NEXT: srwi r5, r5, 6 >> +; P9LE-NEXT: mulli r5, r5, 95 >> +; P9LE-NEXT: sub r3, r3, r5 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r5, r4, r5 >> -; P9LE-NEXT: sub r4, r4, r5 >> -; P9LE-NEXT: srwi r4, r4, 1 >> -; P9LE-NEXT: add r4, r4, r5 >> +; 
P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r4, r3, r4 >> +; P9LE-NEXT: sub r5, r3, r4 >> +; P9LE-NEXT: srwi r5, r5, 1 >> +; P9LE-NEXT: add r4, r5, r4 >> ; P9LE-NEXT: srwi r4, r4, 6 >> ; P9LE-NEXT: mulli r4, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> ; P9LE-NEXT: vmrglw v2, v2, v3 >> ; P9LE-NEXT: blr >> ; >> @@ -344,36 +334,34 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) { >> ; P8LE: # %bb.0: >> ; P8LE-NEXT: xxswapd vs0, v2 >> ; P8LE-NEXT: lis r3, 22765 >> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill >> ; P8LE-NEXT: ori r3, r3, 8969 >> ; P8LE-NEXT: mffprd r4, f0 >> ; P8LE-NEXT: clrldi r5, r4, 48 >> ; P8LE-NEXT: rldicl r6, r4, 48, 48 >> -; P8LE-NEXT: clrlwi r8, r5, 16 >> +; P8LE-NEXT: clrlwi r5, r5, 16 >> ; P8LE-NEXT: rldicl r7, r4, 32, 48 >> -; P8LE-NEXT: clrlwi r9, r6, 16 >> +; P8LE-NEXT: clrlwi r6, r6, 16 >> +; P8LE-NEXT: mulhwu r8, r5, r3 >> ; P8LE-NEXT: rldicl r4, r4, 16, 48 >> -; P8LE-NEXT: mulhwu r10, r8, r3 >> -; P8LE-NEXT: clrlwi r11, r7, 16 >> -; P8LE-NEXT: clrlwi r0, r4, 16 >> -; P8LE-NEXT: mulhwu r12, r9, r3 >> -; P8LE-NEXT: mulhwu r30, r11, r3 >> -; P8LE-NEXT: mulhwu r3, r0, r3 >> -; P8LE-NEXT: sub r8, r8, r10 >> -; P8LE-NEXT: srwi r8, r8, 1 >> -; P8LE-NEXT: sub r9, r9, r12 >> -; P8LE-NEXT: add r8, r8, r10 >> -; P8LE-NEXT: sub r10, r11, r30 >> -; P8LE-NEXT: sub r11, r0, r3 >> -; P8LE-NEXT: srwi r9, r9, 1 >> -; P8LE-NEXT: srwi r10, r10, 1 >> +; P8LE-NEXT: clrlwi r7, r7, 16 >> +; P8LE-NEXT: mulhwu r9, r6, r3 >> +; P8LE-NEXT: clrlwi r4, r4, 16 >> +; P8LE-NEXT: mulhwu r10, r7, r3 >> +; P8LE-NEXT: mulhwu r3, r4, r3 >> +; P8LE-NEXT: sub r11, r5, r8 >> +; P8LE-NEXT: sub r12, r6, r9 >> +; P8LE-NEXT: srwi r11, r11, 1 >> +; P8LE-NEXT: add r8, r11, r8 >> +; P8LE-NEXT: sub r11, r7, r10 >> +; P8LE-NEXT: srwi r12, 
r12, 1 >> +; P8LE-NEXT: add r9, r12, r9 >> +; P8LE-NEXT: sub r12, r4, r3 >> ; P8LE-NEXT: srwi r11, r11, 1 >> -; P8LE-NEXT: add r9, r9, r12 >> ; P8LE-NEXT: srwi r8, r8, 6 >> -; P8LE-NEXT: add r10, r10, r30 >> -; P8LE-NEXT: add r3, r11, r3 >> +; P8LE-NEXT: add r10, r11, r10 >> +; P8LE-NEXT: srwi r11, r12, 1 >> ; P8LE-NEXT: srwi r9, r9, 6 >> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload >> +; P8LE-NEXT: add r3, r11, r3 >> ; P8LE-NEXT: mulli r8, r8, 95 >> ; P8LE-NEXT: srwi r10, r10, 6 >> ; P8LE-NEXT: srwi r3, r3, 6 >> @@ -382,18 +370,14 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) { >> ; P8LE-NEXT: mulli r3, r3, 95 >> ; P8LE-NEXT: sub r5, r5, r8 >> ; P8LE-NEXT: sub r6, r6, r9 >> -; P8LE-NEXT: mtfprd f0, r5 >> +; P8LE-NEXT: mtvsrd v2, r5 >> ; P8LE-NEXT: sub r5, r7, r10 >> ; P8LE-NEXT: sub r3, r4, r3 >> -; P8LE-NEXT: mtfprd f1, r6 >> -; P8LE-NEXT: mtfprd f2, r5 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: mtfprd f3, r3 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: xxswapd v5, vs3 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: vmrglh v3, v5, v4 >> +; P8LE-NEXT: mtvsrd v3, r6 >> +; P8LE-NEXT: mtvsrd v4, r5 >> +; P8LE-NEXT: mtvsrd v5, r3 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> +; P8LE-NEXT: vmrghh v3, v5, v4 >> ; P8LE-NEXT: vmrglw v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> @@ -461,67 +445,59 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, 22765 >> -; P9LE-NEXT: ori r5, r5, 8969 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r6, r4, r5 >> -; P9LE-NEXT: sub r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 1 >> -; P9LE-NEXT: add r4, r4, r6 >> -; P9LE-NEXT: srwi r4, r4, 6 >> -; P9LE-NEXT: mulli r6, r4, 95 >> +; P9LE-NEXT: lis r4, 22765 >> +; P9LE-NEXT: ori r4, r4, 8969 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r5, r3, r4 >> +; P9LE-NEXT: sub r6, r3, r5 >> +; P9LE-NEXT: srwi r6, r6, 1 >> +; 
P9LE-NEXT: add r5, r6, r5 >> +; P9LE-NEXT: srwi r5, r5, 6 >> +; P9LE-NEXT: mulli r6, r5, 95 >> ; P9LE-NEXT: sub r3, r3, r6 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: clrlwi r6, r3, 16 >> -; P9LE-NEXT: mulhwu r7, r6, r5 >> +; P9LE-NEXT: mulhwu r7, r6, r4 >> ; P9LE-NEXT: sub r6, r6, r7 >> ; P9LE-NEXT: srwi r6, r6, 1 >> ; P9LE-NEXT: add r6, r6, r7 >> ; P9LE-NEXT: srwi r6, r6, 6 >> ; P9LE-NEXT: mulli r7, r6, 95 >> ; P9LE-NEXT: sub r3, r3, r7 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: clrlwi r7, r3, 16 >> -; P9LE-NEXT: mulhwu r8, r7, r5 >> +; P9LE-NEXT: mulhwu r8, r7, r4 >> ; P9LE-NEXT: sub r7, r7, r8 >> ; P9LE-NEXT: srwi r7, r7, 1 >> ; P9LE-NEXT: add r7, r7, r8 >> ; P9LE-NEXT: srwi r7, r7, 6 >> ; P9LE-NEXT: mulli r8, r7, 95 >> ; P9LE-NEXT: sub r3, r3, r8 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: clrlwi r8, r3, 16 >> -; P9LE-NEXT: mulhwu r5, r8, r5 >> -; P9LE-NEXT: sub r8, r8, r5 >> +; P9LE-NEXT: mulhwu r4, r8, r4 >> +; P9LE-NEXT: sub r8, r8, r4 >> ; P9LE-NEXT: srwi r8, r8, 1 >> -; P9LE-NEXT: add r5, r8, r5 >> -; P9LE-NEXT: srwi r5, r5, 6 >> -; P9LE-NEXT: mulli r8, r5, 95 >> +; P9LE-NEXT: add r4, r8, r4 >> +; P9LE-NEXT: srwi r4, r4, 6 >> +; P9LE-NEXT: mulli r8, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r8 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: mtfprd f0, r4 >> -; P9LE-NEXT: vmrglh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> +; P9LE-NEXT: mtvsrd v4, r6 >> ; P9LE-NEXT: vmrglw v2, v2, v3 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r6 >> 
-; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r7 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r5 >> -; P9LE-NEXT: xxswapd v5, vs0 >> -; P9LE-NEXT: vmrglh v4, v5, v4 >> +; P9LE-NEXT: mtvsrd v3, r5 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: mtvsrd v4, r7 >> +; P9LE-NEXT: mtvsrd v5, r4 >> +; P9LE-NEXT: vmrghh v4, v5, v4 >> ; P9LE-NEXT: vmrglw v3, v4, v3 >> ; P9LE-NEXT: vadduhm v2, v2, v3 >> ; P9LE-NEXT: blr >> @@ -598,69 +574,61 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) { >> ; P8LE-LABEL: combine_urem_udiv: >> ; P8LE: # %bb.0: >> ; P8LE-NEXT: xxswapd vs0, v2 >> -; P8LE-NEXT: lis r4, 22765 >> +; P8LE-NEXT: lis r3, 22765 >> ; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill >> -; P8LE-NEXT: ori r4, r4, 8969 >> -; P8LE-NEXT: mffprd r5, f0 >> -; P8LE-NEXT: clrldi r3, r5, 48 >> -; P8LE-NEXT: rldicl r6, r5, 48, 48 >> -; P8LE-NEXT: clrlwi r8, r3, 16 >> -; P8LE-NEXT: rldicl r7, r5, 32, 48 >> -; P8LE-NEXT: clrlwi r9, r6, 16 >> -; P8LE-NEXT: mulhwu r10, r8, r4 >> -; P8LE-NEXT: clrlwi r11, r7, 16 >> -; P8LE-NEXT: rldicl r5, r5, 16, 48 >> -; P8LE-NEXT: mulhwu r12, r9, r4 >> -; P8LE-NEXT: mulhwu r0, r11, r4 >> -; P8LE-NEXT: clrlwi r30, r5, 16 >> -; P8LE-NEXT: mulhwu r4, r30, r4 >> -; P8LE-NEXT: sub r8, r8, r10 >> +; P8LE-NEXT: ori r3, r3, 8969 >> +; P8LE-NEXT: mffprd r4, f0 >> +; P8LE-NEXT: clrldi r5, r4, 48 >> +; P8LE-NEXT: rldicl r6, r4, 48, 48 >> +; P8LE-NEXT: clrlwi r5, r5, 16 >> +; P8LE-NEXT: clrlwi r8, r6, 16 >> +; P8LE-NEXT: rldicl r7, r4, 32, 48 >> +; P8LE-NEXT: rldicl r4, r4, 16, 48 >> +; P8LE-NEXT: mulhwu r9, r5, r3 >> +; P8LE-NEXT: mulhwu r11, r8, r3 >> +; P8LE-NEXT: clrlwi r10, r7, 16 >> +; P8LE-NEXT: clrlwi r12, r4, 16 >> +; P8LE-NEXT: mulhwu r0, r10, r3 >> +; P8LE-NEXT: mulhwu r3, r12, r3 >> +; P8LE-NEXT: sub r30, r5, r9 >> +; P8LE-NEXT: sub r8, r8, r11 >> +; P8LE-NEXT: srwi r30, r30, 1 >> ; P8LE-NEXT: srwi r8, r8, 1 >> -; P8LE-NEXT: sub r9, r9, r12 >> -; P8LE-NEXT: add r8, r8, r10 >> 
-; P8LE-NEXT: sub r10, r11, r0 >> -; P8LE-NEXT: srwi r9, r9, 1 >> +; P8LE-NEXT: sub r10, r10, r0 >> +; P8LE-NEXT: add r9, r30, r9 >> +; P8LE-NEXT: add r8, r8, r11 >> +; P8LE-NEXT: sub r11, r12, r3 >> ; P8LE-NEXT: srwi r10, r10, 1 >> -; P8LE-NEXT: sub r11, r30, r4 >> -; P8LE-NEXT: add r9, r9, r12 >> -; P8LE-NEXT: srwi r8, r8, 6 >> ; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload >> -; P8LE-NEXT: add r10, r10, r0 >> -; P8LE-NEXT: srwi r11, r11, 1 >> ; P8LE-NEXT: srwi r9, r9, 6 >> -; P8LE-NEXT: mtfprd f0, r8 >> -; P8LE-NEXT: mulli r12, r8, 95 >> +; P8LE-NEXT: srwi r11, r11, 1 >> +; P8LE-NEXT: srwi r8, r8, 6 >> +; P8LE-NEXT: add r10, r10, r0 >> +; P8LE-NEXT: mulli r12, r9, 95 >> +; P8LE-NEXT: add r3, r11, r3 >> +; P8LE-NEXT: mtvsrd v2, r9 >> ; P8LE-NEXT: srwi r10, r10, 6 >> -; P8LE-NEXT: add r4, r11, r4 >> -; P8LE-NEXT: mtfprd f1, r9 >> -; P8LE-NEXT: mulli r8, r9, 95 >> -; P8LE-NEXT: mulli r9, r10, 95 >> -; P8LE-NEXT: srwi r4, r4, 6 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: mtfprd f2, r10 >> -; P8LE-NEXT: mtfprd f3, r4 >> -; P8LE-NEXT: mulli r4, r4, 95 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: xxswapd v1, vs2 >> -; P8LE-NEXT: sub r3, r3, r12 >> -; P8LE-NEXT: xxswapd v6, vs3 >> -; P8LE-NEXT: mtfprd f0, r3 >> -; P8LE-NEXT: sub r3, r7, r9 >> -; P8LE-NEXT: sub r6, r6, r8 >> -; P8LE-NEXT: mtfprd f4, r3 >> -; P8LE-NEXT: sub r3, r5, r4 >> -; P8LE-NEXT: mtfprd f1, r6 >> -; P8LE-NEXT: mtfprd f5, r3 >> -; P8LE-NEXT: xxswapd v5, vs4 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: xxswapd v3, vs0 >> -; P8LE-NEXT: xxswapd v4, vs1 >> -; P8LE-NEXT: xxswapd v0, vs5 >> -; P8LE-NEXT: vmrglh v3, v4, v3 >> -; P8LE-NEXT: vmrglh v4, v0, v5 >> -; P8LE-NEXT: vmrglh v5, v6, v1 >> -; P8LE-NEXT: vmrglw v3, v4, v3 >> -; P8LE-NEXT: vmrglw v2, v5, v2 >> +; P8LE-NEXT: mulli r9, r8, 95 >> +; P8LE-NEXT: srwi r3, r3, 6 >> +; P8LE-NEXT: mtvsrd v3, r8 >> +; P8LE-NEXT: mulli r8, r10, 95 >> +; P8LE-NEXT: mtvsrd v4, r10 >> +; P8LE-NEXT: mulli r10, r3, 95 >> +; P8LE-NEXT: vmrghh v2, 
v3, v2 >> +; P8LE-NEXT: sub r5, r5, r12 >> +; P8LE-NEXT: sub r6, r6, r9 >> +; P8LE-NEXT: mtvsrd v3, r5 >> +; P8LE-NEXT: mtvsrd v5, r6 >> +; P8LE-NEXT: sub r5, r7, r8 >> +; P8LE-NEXT: sub r4, r4, r10 >> +; P8LE-NEXT: mtvsrd v0, r5 >> +; P8LE-NEXT: mtvsrd v1, r4 >> +; P8LE-NEXT: vmrghh v3, v5, v3 >> +; P8LE-NEXT: mtvsrd v5, r3 >> +; P8LE-NEXT: vmrghh v0, v1, v0 >> +; P8LE-NEXT: vmrghh v4, v5, v4 >> +; P8LE-NEXT: vmrglw v3, v0, v3 >> +; P8LE-NEXT: vmrglw v2, v4, v2 >> ; P8LE-NEXT: vadduhm v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> @@ -742,34 +710,30 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x >> i16> %x) { >> ; P9LE-NEXT: li r3, 0 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: clrlwi r3, r3, 26 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: clrlwi r3, r3, 27 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, 22765 >> -; P9LE-NEXT: ori r5, r5, 8969 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r5, r4, r5 >> -; P9LE-NEXT: sub r4, r4, r5 >> -; P9LE-NEXT: srwi r4, r4, 1 >> -; P9LE-NEXT: add r4, r4, r5 >> +; P9LE-NEXT: lis r4, 22765 >> +; P9LE-NEXT: ori r4, r4, 8969 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r4, r3, r4 >> +; P9LE-NEXT: sub r5, r3, r4 >> +; P9LE-NEXT: srwi r5, r5, 1 >> +; P9LE-NEXT: add r4, r5, r4 >> ; P9LE-NEXT: srwi r4, r4, 6 >> ; P9LE-NEXT: mulli r4, r4, 95 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> ; P9LE-NEXT: clrlwi r3, r3, 29 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v2, v4, v2 >> +; P9LE-NEXT: mtvsrd 
v2, r3 >> +; P9LE-NEXT: vmrghh v2, v4, v2 >> ; P9LE-NEXT: vmrglw v2, v2, v3 >> ; P9LE-NEXT: blr >> ; >> @@ -817,9 +781,9 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x >> i16> %x) { >> ; P8LE-NEXT: mffprd r4, f0 >> ; P8LE-NEXT: rldicl r5, r4, 16, 48 >> ; P8LE-NEXT: rldicl r7, r4, 48, 48 >> -; P8LE-NEXT: clrlwi r6, r5, 16 >> -; P8LE-NEXT: mulhwu r3, r6, r3 >> -; P8LE-NEXT: sub r6, r6, r3 >> +; P8LE-NEXT: clrlwi r5, r5, 16 >> +; P8LE-NEXT: mulhwu r3, r5, r3 >> +; P8LE-NEXT: sub r6, r5, r3 >> ; P8LE-NEXT: srwi r6, r6, 1 >> ; P8LE-NEXT: add r3, r6, r3 >> ; P8LE-NEXT: clrldi r6, r4, 48 >> @@ -827,19 +791,15 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x >> i16> %x) { >> ; P8LE-NEXT: clrlwi r6, r6, 26 >> ; P8LE-NEXT: mulli r3, r3, 95 >> ; P8LE-NEXT: rldicl r4, r4, 32, 48 >> -; P8LE-NEXT: mtfprd f0, r6 >> +; P8LE-NEXT: mtvsrd v2, r6 >> ; P8LE-NEXT: clrlwi r6, r7, 27 >> ; P8LE-NEXT: clrlwi r4, r4, 29 >> -; P8LE-NEXT: mtfprd f1, r6 >> -; P8LE-NEXT: mtfprd f3, r4 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: xxswapd v3, vs1 >> +; P8LE-NEXT: mtvsrd v3, r6 >> +; P8LE-NEXT: mtvsrd v5, r4 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> ; P8LE-NEXT: sub r3, r5, r3 >> -; P8LE-NEXT: xxswapd v5, vs3 >> -; P8LE-NEXT: mtfprd f2, r3 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: vmrglh v3, v4, v5 >> +; P8LE-NEXT: mtvsrd v4, r3 >> +; P8LE-NEXT: vmrghh v3, v4, v5 >> ; P8LE-NEXT: vmrglw v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> @@ -885,40 +845,39 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) { >> ; P9LE: # %bb.0: >> ; P9LE-NEXT: li r3, 4 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: lis r5, -19946 >> -; P9LE-NEXT: ori r5, r5, 17097 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r4, r4, r5 >> -; P9LE-NEXT: lis r5, 24749 >> -; P9LE-NEXT: ori r5, r5, 47143 >> +; P9LE-NEXT: lis r4, -19946 >> +; P9LE-NEXT: ori r4, r4, 17097 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: mulhwu r4, r3, r4 >> ; P9LE-NEXT: srwi r4, r4, 4 
>> ; P9LE-NEXT: mulli r4, r4, 23 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: lis r4, 24749 >> +; P9LE-NEXT: mtvsrd v3, r3 >> ; P9LE-NEXT: li r3, 6 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: clrlwi r4, r3, 16 >> -; P9LE-NEXT: mulhwu r4, r4, r5 >> -; P9LE-NEXT: lis r5, -14230 >> -; P9LE-NEXT: ori r5, r5, 30865 >> +; P9LE-NEXT: clrlwi r3, r3, 16 >> +; P9LE-NEXT: ori r4, r4, 47143 >> +; P9LE-NEXT: mulhwu r4, r3, r4 >> ; P9LE-NEXT: srwi r4, r4, 11 >> ; P9LE-NEXT: mulli r4, r4, 5423 >> ; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v3, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> +; P9LE-NEXT: mtvsrd v4, r3 >> ; P9LE-NEXT: li r3, 2 >> ; P9LE-NEXT: vextuhrx r3, r3, v2 >> -; P9LE-NEXT: rlwinm r4, r3, 31, 17, 31 >> -; P9LE-NEXT: mulhwu r4, r4, r5 >> -; P9LE-NEXT: srwi r4, r4, 8 >> -; P9LE-NEXT: mulli r4, r4, 654 >> -; P9LE-NEXT: sub r3, r3, r4 >> -; P9LE-NEXT: xxswapd v4, vs0 >> -; P9LE-NEXT: mtfprd f0, r3 >> -; P9LE-NEXT: xxswapd v2, vs0 >> -; P9LE-NEXT: vmrglh v3, v4, v3 >> -; P9LE-NEXT: xxlxor v4, v4, v4 >> -; P9LE-NEXT: vmrglh v2, v2, v4 >> +; P9LE-NEXT: lis r5, -14230 >> +; P9LE-NEXT: ori r5, r5, 30865 >> +; P9LE-NEXT: vmrghh v3, v4, v3 >> +; P9LE-NEXT: clrlwi r4, r3, 16 >> +; P9LE-NEXT: rlwinm r3, r3, 31, 17, 31 >> +; P9LE-NEXT: mulhwu r3, r3, r5 >> +; P9LE-NEXT: srwi r3, r3, 8 >> +; P9LE-NEXT: mulli r3, r3, 654 >> +; P9LE-NEXT: sub r3, r4, r3 >> +; P9LE-NEXT: mtvsrd v2, r3 >> +; P9LE-NEXT: li r3, 0 >> +; P9LE-NEXT: mtvsrd v4, r3 >> +; P9LE-NEXT: vmrghh v2, v2, v4 >> ; P9LE-NEXT: vmrglw v2, v3, v2 >> ; P9LE-NEXT: blr >> ; >> @@ -969,41 +928,40 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) { >> ; P8LE-LABEL: dont_fold_urem_one: >> ; P8LE: # %bb.0: >> ; P8LE-NEXT: xxswapd vs0, v2 >> -; P8LE-NEXT: lis r3, -19946 >> -; P8LE-NEXT: lis r7, 24749 >> -; P8LE-NEXT: lis r9, -14230 >> -; P8LE-NEXT: xxlxor v5, v5, v5 >> -; P8LE-NEXT: ori r3, r3, 17097 >> -; P8LE-NEXT: ori r7, r7, 47143 >> -; P8LE-NEXT: ori r9, r9, 30865 >> +; 
P8LE-NEXT: lis r3, -14230 >> +; P8LE-NEXT: lis r7, -19946 >> +; P8LE-NEXT: lis r9, 24749 >> +; P8LE-NEXT: ori r3, r3, 30865 >> +; P8LE-NEXT: ori r7, r7, 17097 >> ; P8LE-NEXT: mffprd r4, f0 >> -; P8LE-NEXT: rldicl r5, r4, 32, 48 >> -; P8LE-NEXT: rldicl r6, r4, 16, 48 >> -; P8LE-NEXT: clrlwi r8, r5, 16 >> -; P8LE-NEXT: rldicl r4, r4, 48, 48 >> +; P8LE-NEXT: rldicl r5, r4, 48, 48 >> +; P8LE-NEXT: rldicl r6, r4, 32, 48 >> +; P8LE-NEXT: rldicl r4, r4, 16, 48 >> +; P8LE-NEXT: rlwinm r8, r5, 31, 17, 31 >> +; P8LE-NEXT: clrlwi r6, r6, 16 >> +; P8LE-NEXT: clrlwi r5, r5, 16 >> ; P8LE-NEXT: mulhwu r3, r8, r3 >> -; P8LE-NEXT: clrlwi r8, r6, 16 >> -; P8LE-NEXT: mulhwu r7, r8, r7 >> -; P8LE-NEXT: rlwinm r8, r4, 31, 17, 31 >> -; P8LE-NEXT: mulhwu r8, r8, r9 >> -; P8LE-NEXT: srwi r3, r3, 4 >> -; P8LE-NEXT: srwi r7, r7, 11 >> -; P8LE-NEXT: mulli r3, r3, 23 >> -; P8LE-NEXT: srwi r8, r8, 8 >> -; P8LE-NEXT: mulli r7, r7, 5423 >> -; P8LE-NEXT: mulli r8, r8, 654 >> +; P8LE-NEXT: ori r8, r9, 47143 >> +; P8LE-NEXT: clrlwi r4, r4, 16 >> +; P8LE-NEXT: li r9, 0 >> +; P8LE-NEXT: mulhwu r7, r6, r7 >> +; P8LE-NEXT: mulhwu r8, r4, r8 >> +; P8LE-NEXT: mtvsrd v2, r9 >> +; P8LE-NEXT: srwi r3, r3, 8 >> +; P8LE-NEXT: srwi r7, r7, 4 >> +; P8LE-NEXT: mulli r3, r3, 654 >> +; P8LE-NEXT: srwi r8, r8, 11 >> +; P8LE-NEXT: mulli r7, r7, 23 >> +; P8LE-NEXT: mulli r8, r8, 5423 >> ; P8LE-NEXT: sub r3, r5, r3 >> ; P8LE-NEXT: sub r5, r6, r7 >> -; P8LE-NEXT: mtfprd f0, r3 >> +; P8LE-NEXT: mtvsrd v3, r3 >> ; P8LE-NEXT: sub r3, r4, r8 >> -; P8LE-NEXT: mtfprd f1, r5 >> -; P8LE-NEXT: mtfprd f2, r3 >> -; P8LE-NEXT: xxswapd v2, vs0 >> -; P8LE-NEXT: xxswapd v3, vs1 >> -; P8LE-NEXT: xxswapd v4, vs2 >> -; P8LE-NEXT: vmrglh v2, v3, v2 >> -; P8LE-NEXT: vmrglh v3, v4, v5 >> -; P8LE-NEXT: vmrglw v2, v2, v3 >> +; P8LE-NEXT: mtvsrd v4, r5 >> +; P8LE-NEXT: mtvsrd v5, r3 >> +; P8LE-NEXT: vmrghh v2, v3, v2 >> +; P8LE-NEXT: vmrghh v3, v5, v4 >> +; P8LE-NEXT: vmrglw v2, v3, v2 >> ; P8LE-NEXT: blr >> ; >> ; P8BE-LABEL: 
dont_fold_urem_one: >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll >> index 239b38e2ec70..48b62f57c1c9 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll >> @@ -20,12 +20,10 @@ define i32 @test2elt(i64 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -40,13 +38,11 @@ define i32 @test2elt(i64 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs1 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -90,20 +86,16 @@ define i64 @test4elt(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> +; CHECK-P8-NEXT: 
mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: mtfprd f3, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs2 >> -; CHECK-P8-NEXT: xxswapd v5, vs3 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v4, v5 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: vmrghh v3, v4, v3 >> +; CHECK-P8-NEXT: vmrghh v2, v2, v5 >> +; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -114,27 +106,23 @@ define i64 @test4elt(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xscvspdpn f0, v2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglh v2, v4, v2 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> +; CHECK-P9-NEXT: vmrghh v2, v4, v2 >> ; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -180,59 +168,51 @@ define <8 x i16> @test8elt(<8 x float>* nocapture >> 
readonly) local_unnamed_addr # >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: lvx v2, 0, r3 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: lvx v5, r3, r4 >> -; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: lvx v3, r3, r4 >> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 >> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v5 >> -; CHECK-P8-NEXT: xxswapd vs3, v5 >> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: xscvspdpn f2, v2 >> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 >> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 >> +; CHECK-P8-NEXT: xscvspdpn f3, v3 >> ; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> ; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: mffprwz r6, f1 >> -; CHECK-P8-NEXT: mffprwz r5, f0 >> -; CHECK-P8-NEXT: mtfprd f1, r6 >> -; CHECK-P8-NEXT: mtfprd f0, r5 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f0, v2 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v1, vs4 >> -; CHECK-P8-NEXT: vmrglh v2, v4, v3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xxswapd v5, vs2 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; 
CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: xxswapd vs0, v3 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: mffprwz r3, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f4 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f5 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghh v2, v4, v2 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v6, vs3 >> -; CHECK-P8-NEXT: xxswapd v0, vs0 >> -; CHECK-P8-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P8-NEXT: vmrglh v4, v0, v5 >> -; CHECK-P8-NEXT: vmrglh v5, v1, v6 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: vmrghh v3, v3, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: vmrghh v5, v0, v5 >> +; CHECK-P8-NEXT: mtvsrd v1, r3 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> +; CHECK-P8-NEXT: vmrghh v4, v4, v1 >> +; CHECK-P8-NEXT: vmrglw v3, v4, v5 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> ; CHECK-P8-NEXT: blr >> ; >> @@ -244,53 +224,45 @@ define <8 x i16> @test8elt(<8 x float>* nocapture >> readonly) local_unnamed_addr # >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs1 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; 
CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghh v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld 
v2, v3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -363,116 +335,100 @@ define void @test16elt(<16 x i16>* noalias >> nocapture sret %agg.result, <16 x flo >> ; CHECK-P8-LABEL: test16elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: lvx v5, 0, r4 >> -; CHECK-P8-NEXT: li r6, 32 >> ; CHECK-P8-NEXT: li r5, 16 >> -; CHECK-P8-NEXT: lvx v2, r4, r6 >> +; CHECK-P8-NEXT: li r6, 32 >> ; CHECK-P8-NEXT: lvx v3, r4, r5 >> +; CHECK-P8-NEXT: lvx v2, r4, r6 >> ; CHECK-P8-NEXT: li r6, 48 >> -; CHECK-P8-NEXT: xscvspdpn f0, v5 >> -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3 >> +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3 >> +; CHECK-P8-NEXT: xscvspdpn f1, v5 >> ; CHECK-P8-NEXT: lvx v4, r4, r6 >> -; CHECK-P8-NEXT: xscvspdpn f4, v2 >> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> -; CHECK-P8-NEXT: xscvspdpn f2, v3 >> ; CHECK-P8-NEXT: xxswapd vs3, v5 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: xxswapd vs8, v3 >> -; CHECK-P8-NEXT: xscvspdpn f6, v4 >> +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3 >> +; CHECK-P8-NEXT: xxswapd vs8, v3 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: xscvspdpn f7, vs7 >> +; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1 >> +; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: xscvspdpn f2, v3 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f5 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1 >> +; CHECK-P8-NEXT: xscvspdpn f4, v2 >> +; CHECK-P8-NEXT: xscvdpsxws f5, f7 >> +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 3 >> +; CHECK-P8-NEXT: xscvspdpn f6, v4 >> +; CHECK-P8-NEXT: mtvsrd v0, 
r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f8 >> +; CHECK-P8-NEXT: xxswapd vs8, v4 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f5 >> +; CHECK-P8-NEXT: xxswapd vs5, v2 >> ; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1 >> -; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> -; CHECK-P8-NEXT: xxswapd vs11, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xxswapd v2, v4 >> +; CHECK-P8-NEXT: vmrghh v3, v0, v3 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvdpsxws f6, f6 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs5 >> +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghh v2, v5, v1 >> +; CHECK-P8-NEXT: vmrghh v5, v6, v0 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f4 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f3 >> +; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> ; CHECK-P8-NEXT: xscvspdpn f7, vs7 >> -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3 >> -; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1 >> -; CHECK-P8-NEXT: xscvspdpn f10, vs10 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1 >> +; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f5 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs2 >> +; CHECK-P8-NEXT: xscvdpsxws f3, f7 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f8 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: 
xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xscvspdpn f9, vs9 >> -; CHECK-P8-NEXT: xscvdpsxws f6, f6 >> -; CHECK-P8-NEXT: xscvspdpn f12, vs12 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: vmrghh v0, v0, v7 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvspdpn f11, vs11 >> -; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvspdpn v2, v2 >> -; CHECK-P8-NEXT: xscvdpsxws f8, f8 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f7, f7 >> -; CHECK-P8-NEXT: mffprwz r6, f2 >> -; CHECK-P8-NEXT: xscvspdpn f13, vs13 >> -; CHECK-P8-NEXT: xscvspdpn v3, v3 >> -; CHECK-P8-NEXT: xscvdpsxws f10, f10 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> +; CHECK-P8-NEXT: vmrghh v4, v8, v4 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f9, f9 >> -; CHECK-P8-NEXT: mtfprd f2, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f6 >> -; CHECK-P8-NEXT: xscvdpsxws f12, f12 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xscvdpsxws f11, f11 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f6, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f3 >> -; CHECK-P8-NEXT: xscvdpsxws v2, v2 >> -; CHECK-P8-NEXT: xxswapd v9, vs6 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f8 >> -; CHECK-P8-NEXT: mtfprd f3, r6 >> -; CHECK-P8-NEXT: xxswapd v0, vs5 >> -; CHECK-P8-NEXT: mffprwz r6, f7 >> -; CHECK-P8-NEXT: xscvdpsxws f13, f13 >> -; CHECK-P8-NEXT: xxswapd v5, vs3 >> -; CHECK-P8-NEXT: xscvdpsxws v3, v3 >> -; CHECK-P8-NEXT: mtfprd f8, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f10 >> -; CHECK-P8-NEXT: mtfprd f7, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f9 >> -; CHECK-P8-NEXT: mtfprd f10, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f12 >> -; CHECK-P8-NEXT: mtfprd f9, r6 >> -; CHECK-P8-NEXT: xxswapd v6, vs10 >> -; CHECK-P8-NEXT: mffprwz r6, f11 >> -; CHECK-P8-NEXT: mtfprd f12, r4 >> -; 
CHECK-P8-NEXT: xxswapd v1, vs9 >> -; CHECK-P8-NEXT: mfvsrwz r4, v2 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: mtfprd f11, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f13 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v7, vs11 >> -; CHECK-P8-NEXT: mfvsrwz r4, v3 >> -; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> -; CHECK-P8-NEXT: xxswapd v4, vs7 >> -; CHECK-P8-NEXT: vmrglh v2, v2, v0 >> -; CHECK-P8-NEXT: xxswapd v5, vs8 >> -; CHECK-P8-NEXT: xxswapd v0, vs2 >> -; CHECK-P8-NEXT: mtfprd f13, r6 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: vmrglh v4, v5, v4 >> -; CHECK-P8-NEXT: vmrglh v5, v0, v1 >> -; CHECK-P8-NEXT: xxswapd v1, vs4 >> -; CHECK-P8-NEXT: vmrglh v0, v7, v6 >> -; CHECK-P8-NEXT: xxswapd v6, vs12 >> -; CHECK-P8-NEXT: xxswapd v7, vs13 >> -; CHECK-P8-NEXT: xxswapd v10, vs1 >> +; CHECK-P8-NEXT: vmrghh v1, v1, v9 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: vmrghh v7, v8, v7 >> +; CHECK-P8-NEXT: vmrghh v6, v6, v9 >> ; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> -; CHECK-P8-NEXT: vmrglh v1, v1, v6 >> -; CHECK-P8-NEXT: vmrglh v6, v8, v7 >> -; CHECK-P8-NEXT: vmrglh v7, v9, v10 >> -; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P8-NEXT: vmrglw v4, v1, v0 >> -; CHECK-P8-NEXT: vmrglw v5, v7, v6 >> +; CHECK-P8-NEXT: vmrglw v3, v0, v5 >> +; CHECK-P8-NEXT: vmrglw v4, v1, v4 >> +; CHECK-P8-NEXT: vmrglw v5, v6, v7 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> ; CHECK-P8-NEXT: stvx v2, 0, r3 >> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4 >> @@ -481,118 +437,102 @@ define void @test16elt(<16 x i16>* noalias >> nocapture sret %agg.result, <16 x flo >> ; >> ; CHECK-P9-LABEL: test16elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: lxv vs1, 0(r4) >> -; CHECK-P9-NEXT: lxv vs3, 16(r4) >> -; CHECK-P9-NEXT: xscvspdpn f5, vs1 >> -; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 >> -; CHECK-P9-NEXT: xscvspdpn f8, vs3 >> -; CHECK-P9-NEXT: xxswapd vs4, vs1 >> -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: lxv vs2, 
0(r4) >> +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 >> +; CHECK-P9-NEXT: xxswapd vs4, vs2 >> +; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P9-NEXT: xscvspdpn f4, vs4 >> -; CHECK-P9-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: xscvspdpn f5, vs2 >> +; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f8, f8 >> -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3 >> -; CHECK-P9-NEXT: xxswapd vs7, vs3 >> -; CHECK-P9-NEXT: xscvspdpn f6, vs6 >> -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 >> -; CHECK-P9-NEXT: xscvspdpn f7, vs7 >> -; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: mffprwz r5, f3 >> +; CHECK-P9-NEXT: lxv vs1, 16(r4) >> +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3 >> +; CHECK-P9-NEXT: xxswapd vs3, vs1 >> +; CHECK-P9-NEXT: mtvsrd v2, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f4 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f5 >> +; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P9-NEXT: mtvsrd v3, r5 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> +; CHECK-P9-NEXT: mffprwz r5, f4 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs6 >> +; CHECK-P9-NEXT: mtvsrd v3, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f2 >> +; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> ; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P9-NEXT: xscvdpsxws f6, f6 >> -; CHECK-P9-NEXT: mffprwz r5, f5 >> -; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: xscvdpsxws f7, f7 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: mtfprd f5, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f8 >> -; CHECK-P9-NEXT: mtfprd f8, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f2 >> ; CHECK-P9-NEXT: lxv vs0, 32(r4) >> -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3 >> -; CHECK-P9-NEXT: xxswapd vs10, vs0 >> -; CHECK-P9-NEXT: xscvspdpn f9, vs9 >> -; CHECK-P9-NEXT: xscvspdpn f10, vs10 >> -; CHECK-P9-NEXT: xscvdpsxws f9, f9 >> -; 
CHECK-P9-NEXT: xscvdpsxws f10, f10 >> -; CHECK-P9-NEXT: mtfprd f2, r5 >> +; CHECK-P9-NEXT: mtvsrd v4, r5 >> +; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r5, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r5 >> +; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: mtvsrd v4, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f3 >> +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v5, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f2 >> +; CHECK-P9-NEXT: xscvspdpn f2, vs3 >> +; CHECK-P9-NEXT: vmrghh v4, v5, v4 >> +; CHECK-P9-NEXT: mtvsrd v5, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f6 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> -; CHECK-P9-NEXT: xxswapd v3, vs4 >> +; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> +; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghh v5, v5, v0 >> +; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrglw v3, v5, v4 >> +; CHECK-P9-NEXT: mffprwz r5, f2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mtfprd f6, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f7 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> +; CHECK-P9-NEXT: mffprwz r5, f1 >> ; CHECK-P9-NEXT: lxv vs1, 48(r4) >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs5 >> -; CHECK-P9-NEXT: mtfprd f7, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f3 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs6 >> -; CHECK-P9-NEXT: xxswapd v5, vs7 >> -; CHECK-P9-NEXT: mtfprd f3, r5 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P9-NEXT: xxswapd v0, vs3 >> -; CHECK-P9-NEXT: vmrglh v4, v5, v4 >> -; CHECK-P9-NEXT: xxswapd v5, vs8 >> -; CHECK-P9-NEXT: 
vmrglh v5, v5, v0 >> +; CHECK-P9-NEXT: mtvsrd v1, r5 >> +; CHECK-P9-NEXT: vmrghh v0, v1, v0 >> ; CHECK-P9-NEXT: mffprwz r4, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r4 >> -; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> +; CHECK-P9-NEXT: mffprwz r4, f0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P9-NEXT: vmrghh v2, v4, v2 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrglw v2, v2, v0 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: vmrglh v2, v4, v2 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P9-NEXT: mffprwz r5, f9 >> -; CHECK-P9-NEXT: mtfprd f9, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f10 >> -; CHECK-P9-NEXT: mtfprd f10, r5 >> -; CHECK-P9-NEXT: xxswapd v0, vs9 >> -; CHECK-P9-NEXT: xxswapd v1, vs10 >> -; CHECK-P9-NEXT: vmrglh v0, v1, v0 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v0 >> -; CHECK-P9-NEXT: stxv vs2, 0(r3) >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; 
CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r4 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2 >> ; CHECK-P9-NEXT: stxv vs0, 16(r3) >> +; CHECK-P9-NEXT: stxv vs2, 0(r3) >> ; CHECK-P9-NEXT: blr >> ; >> ; CHECK-BE-LABEL: test16elt: >> @@ -728,12 +668,10 @@ define i32 @test2elt_signed(i64 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -748,13 +686,11 @@ define i32 @test2elt_signed(i64 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs1 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -798,20 +734,16 @@ define i64 @test4elt_signed(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 
>> -; CHECK-P8-NEXT: mtfprd f0, r3 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: mtfprd f3, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs2 >> -; CHECK-P8-NEXT: xxswapd v5, vs3 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v4, v5 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: vmrghh v3, v4, v3 >> +; CHECK-P8-NEXT: vmrghh v2, v2, v5 >> +; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -822,27 +754,23 @@ define i64 @test4elt_signed(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xscvspdpn f0, v2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglh v2, v4, v2 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> +; CHECK-P9-NEXT: vmrghh v2, v4, v2 >> ; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -888,59 
+816,51 @@ define <8 x i16> @test8elt_signed(<8 x float>* >> nocapture readonly) local_unnamed >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: lvx v2, 0, r3 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: lvx v5, r3, r4 >> -; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: lvx v3, r3, r4 >> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 >> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v5 >> -; CHECK-P8-NEXT: xxswapd vs3, v5 >> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: xscvspdpn f2, v2 >> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 >> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 >> +; CHECK-P8-NEXT: xscvspdpn f3, v3 >> ; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> ; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: mffprwz r6, f1 >> -; CHECK-P8-NEXT: mffprwz r5, f0 >> -; CHECK-P8-NEXT: mtfprd f1, r6 >> -; CHECK-P8-NEXT: mtfprd f0, r5 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f0, v2 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v1, vs4 >> -; CHECK-P8-NEXT: vmrglh v2, v4, v3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xxswapd v5, vs2 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; 
CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: xxswapd vs0, v3 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: mffprwz r3, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f4 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f5 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghh v2, v4, v2 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v6, vs3 >> -; CHECK-P8-NEXT: xxswapd v0, vs0 >> -; CHECK-P8-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P8-NEXT: vmrglh v4, v0, v5 >> -; CHECK-P8-NEXT: vmrglh v5, v1, v6 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: vmrghh v3, v3, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: vmrghh v5, v0, v5 >> +; CHECK-P8-NEXT: mtvsrd v1, r3 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> +; CHECK-P8-NEXT: vmrghh v4, v4, v1 >> +; CHECK-P8-NEXT: vmrglw v3, v4, v5 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> ; CHECK-P8-NEXT: blr >> ; >> @@ -952,53 +872,45 @@ define <8 x i16> @test8elt_signed(<8 x float>* >> nocapture readonly) local_unnamed >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs1 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; 
CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghh v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghh v4, 
v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -1071,116 +983,100 @@ define void @test16elt_signed(<16 x i16>* >> noalias nocapture sret %agg.result, <1 >> ; CHECK-P8-LABEL: test16elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: lvx v5, 0, r4 >> -; CHECK-P8-NEXT: li r6, 32 >> ; CHECK-P8-NEXT: li r5, 16 >> -; CHECK-P8-NEXT: lvx v2, r4, r6 >> +; CHECK-P8-NEXT: li r6, 32 >> ; CHECK-P8-NEXT: lvx v3, r4, r5 >> +; CHECK-P8-NEXT: lvx v2, r4, r6 >> ; CHECK-P8-NEXT: li r6, 48 >> -; CHECK-P8-NEXT: xscvspdpn f0, v5 >> -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3 >> +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3 >> +; CHECK-P8-NEXT: xscvspdpn f1, v5 >> ; CHECK-P8-NEXT: lvx v4, r4, r6 >> -; CHECK-P8-NEXT: xscvspdpn f4, v2 >> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> -; CHECK-P8-NEXT: xscvspdpn f2, v3 >> ; CHECK-P8-NEXT: xxswapd vs3, v5 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: xxswapd vs8, v3 >> -; CHECK-P8-NEXT: xscvspdpn f6, v4 >> +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3 >> +; CHECK-P8-NEXT: xxswapd vs8, v3 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: xscvspdpn f7, vs7 >> +; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1 >> +; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: xscvspdpn f2, v3 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f5 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1 >> +; CHECK-P8-NEXT: xscvspdpn f4, v2 >> +; CHECK-P8-NEXT: xscvdpsxws f5, f7 >> +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: xxsldwi 
vs3, v2, v2, 3 >> +; CHECK-P8-NEXT: xscvspdpn f6, v4 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f8 >> +; CHECK-P8-NEXT: xxswapd vs8, v4 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f5 >> +; CHECK-P8-NEXT: xxswapd vs5, v2 >> ; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1 >> -; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> -; CHECK-P8-NEXT: xxswapd vs11, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xxswapd v2, v4 >> +; CHECK-P8-NEXT: vmrghh v3, v0, v3 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvdpsxws f6, f6 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs5 >> +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghh v2, v5, v1 >> +; CHECK-P8-NEXT: vmrghh v5, v6, v0 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f4 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f3 >> +; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> ; CHECK-P8-NEXT: xscvspdpn f7, vs7 >> -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3 >> -; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1 >> -; CHECK-P8-NEXT: xscvspdpn f10, vs10 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1 >> +; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f5 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs2 >> +; CHECK-P8-NEXT: xscvdpsxws f3, f7 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; 
CHECK-P8-NEXT: xscvdpsxws f0, f8 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xscvspdpn f9, vs9 >> -; CHECK-P8-NEXT: xscvdpsxws f6, f6 >> -; CHECK-P8-NEXT: xscvspdpn f12, vs12 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: vmrghh v0, v0, v7 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvspdpn f11, vs11 >> -; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvspdpn v2, v2 >> -; CHECK-P8-NEXT: xscvdpsxws f8, f8 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f7, f7 >> -; CHECK-P8-NEXT: mffprwz r6, f2 >> -; CHECK-P8-NEXT: xscvspdpn f13, vs13 >> -; CHECK-P8-NEXT: xscvspdpn v3, v3 >> -; CHECK-P8-NEXT: xscvdpsxws f10, f10 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> +; CHECK-P8-NEXT: vmrghh v4, v8, v4 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f9, f9 >> -; CHECK-P8-NEXT: mtfprd f2, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f6 >> -; CHECK-P8-NEXT: xscvdpsxws f12, f12 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xscvdpsxws f11, f11 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f6, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f3 >> -; CHECK-P8-NEXT: xscvdpsxws v2, v2 >> -; CHECK-P8-NEXT: xxswapd v9, vs6 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f8 >> -; CHECK-P8-NEXT: mtfprd f3, r6 >> -; CHECK-P8-NEXT: xxswapd v0, vs5 >> -; CHECK-P8-NEXT: mffprwz r6, f7 >> -; CHECK-P8-NEXT: xscvdpsxws f13, f13 >> -; CHECK-P8-NEXT: xxswapd v5, vs3 >> -; CHECK-P8-NEXT: xscvdpsxws v3, v3 >> -; CHECK-P8-NEXT: mtfprd f8, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f10 >> -; CHECK-P8-NEXT: mtfprd f7, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f9 >> -; CHECK-P8-NEXT: mtfprd f10, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f12 >> -; CHECK-P8-NEXT: mtfprd f9, r6 >> -; CHECK-P8-NEXT: xxswapd 
v6, vs10 >> -; CHECK-P8-NEXT: mffprwz r6, f11 >> -; CHECK-P8-NEXT: mtfprd f12, r4 >> -; CHECK-P8-NEXT: xxswapd v1, vs9 >> -; CHECK-P8-NEXT: mfvsrwz r4, v2 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: mtfprd f11, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f13 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v7, vs11 >> -; CHECK-P8-NEXT: mfvsrwz r4, v3 >> -; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> -; CHECK-P8-NEXT: xxswapd v4, vs7 >> -; CHECK-P8-NEXT: vmrglh v2, v2, v0 >> -; CHECK-P8-NEXT: xxswapd v5, vs8 >> -; CHECK-P8-NEXT: xxswapd v0, vs2 >> -; CHECK-P8-NEXT: mtfprd f13, r6 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: vmrglh v4, v5, v4 >> -; CHECK-P8-NEXT: vmrglh v5, v0, v1 >> -; CHECK-P8-NEXT: xxswapd v1, vs4 >> -; CHECK-P8-NEXT: vmrglh v0, v7, v6 >> -; CHECK-P8-NEXT: xxswapd v6, vs12 >> -; CHECK-P8-NEXT: xxswapd v7, vs13 >> -; CHECK-P8-NEXT: xxswapd v10, vs1 >> +; CHECK-P8-NEXT: vmrghh v1, v1, v9 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: vmrghh v7, v8, v7 >> +; CHECK-P8-NEXT: vmrghh v6, v6, v9 >> ; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> -; CHECK-P8-NEXT: vmrglh v1, v1, v6 >> -; CHECK-P8-NEXT: vmrglh v6, v8, v7 >> -; CHECK-P8-NEXT: vmrglh v7, v9, v10 >> -; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P8-NEXT: vmrglw v4, v1, v0 >> -; CHECK-P8-NEXT: vmrglw v5, v7, v6 >> +; CHECK-P8-NEXT: vmrglw v3, v0, v5 >> +; CHECK-P8-NEXT: vmrglw v4, v1, v4 >> +; CHECK-P8-NEXT: vmrglw v5, v6, v7 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> ; CHECK-P8-NEXT: stvx v2, 0, r3 >> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4 >> @@ -1189,118 +1085,102 @@ define void @test16elt_signed(<16 x i16>* >> noalias nocapture sret %agg.result, <1 >> ; >> ; CHECK-P9-LABEL: test16elt_signed: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: lxv vs1, 0(r4) >> -; CHECK-P9-NEXT: lxv vs3, 16(r4) >> -; CHECK-P9-NEXT: xscvspdpn f5, vs1 >> -; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 >> -; CHECK-P9-NEXT: xscvspdpn f8, vs3 >> -; 
CHECK-P9-NEXT: xxswapd vs4, vs1 >> -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: lxv vs2, 0(r4) >> +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 >> +; CHECK-P9-NEXT: xxswapd vs4, vs2 >> +; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P9-NEXT: xscvspdpn f4, vs4 >> -; CHECK-P9-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: xscvspdpn f5, vs2 >> +; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f8, f8 >> -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3 >> -; CHECK-P9-NEXT: xxswapd vs7, vs3 >> -; CHECK-P9-NEXT: xscvspdpn f6, vs6 >> -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 >> -; CHECK-P9-NEXT: xscvspdpn f7, vs7 >> -; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: mffprwz r5, f3 >> +; CHECK-P9-NEXT: lxv vs1, 16(r4) >> +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3 >> +; CHECK-P9-NEXT: xxswapd vs3, vs1 >> +; CHECK-P9-NEXT: mtvsrd v2, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f4 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f5 >> +; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P9-NEXT: mtvsrd v3, r5 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> +; CHECK-P9-NEXT: mffprwz r5, f4 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs6 >> +; CHECK-P9-NEXT: mtvsrd v3, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f2 >> +; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> ; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P9-NEXT: xscvdpsxws f6, f6 >> -; CHECK-P9-NEXT: mffprwz r5, f5 >> -; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: xscvdpsxws f7, f7 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: mtfprd f5, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f8 >> -; CHECK-P9-NEXT: mtfprd f8, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f2 >> ; CHECK-P9-NEXT: lxv vs0, 32(r4) >> -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3 >> -; CHECK-P9-NEXT: xxswapd vs10, vs0 >> -; 
CHECK-P9-NEXT: xscvspdpn f9, vs9 >> -; CHECK-P9-NEXT: xscvspdpn f10, vs10 >> -; CHECK-P9-NEXT: xscvdpsxws f9, f9 >> -; CHECK-P9-NEXT: xscvdpsxws f10, f10 >> -; CHECK-P9-NEXT: mtfprd f2, r5 >> +; CHECK-P9-NEXT: mtvsrd v4, r5 >> +; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r5, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r5 >> +; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: mtvsrd v4, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f3 >> +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v5, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f2 >> +; CHECK-P9-NEXT: xscvspdpn f2, vs3 >> +; CHECK-P9-NEXT: vmrghh v4, v5, v4 >> +; CHECK-P9-NEXT: mtvsrd v5, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f6 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> -; CHECK-P9-NEXT: xxswapd v3, vs4 >> +; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> +; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghh v5, v5, v0 >> +; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrglw v3, v5, v4 >> +; CHECK-P9-NEXT: mffprwz r5, f2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mtfprd f6, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f7 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> +; CHECK-P9-NEXT: mffprwz r5, f1 >> ; CHECK-P9-NEXT: lxv vs1, 48(r4) >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs5 >> -; CHECK-P9-NEXT: mtfprd f7, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f3 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs6 >> -; CHECK-P9-NEXT: xxswapd v5, vs7 >> -; CHECK-P9-NEXT: mtfprd f3, r5 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> -; 
CHECK-P9-NEXT: xxswapd v0, vs3 >> -; CHECK-P9-NEXT: vmrglh v4, v5, v4 >> -; CHECK-P9-NEXT: xxswapd v5, vs8 >> -; CHECK-P9-NEXT: vmrglh v5, v5, v0 >> +; CHECK-P9-NEXT: mtvsrd v1, r5 >> +; CHECK-P9-NEXT: vmrghh v0, v1, v0 >> ; CHECK-P9-NEXT: mffprwz r4, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r4 >> -; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> +; CHECK-P9-NEXT: mffprwz r4, f0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P9-NEXT: vmrghh v2, v4, v2 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrglw v2, v2, v0 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: vmrglh v2, v4, v2 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P9-NEXT: mffprwz r5, f9 >> -; CHECK-P9-NEXT: mtfprd f9, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f10 >> -; CHECK-P9-NEXT: mtfprd f10, r5 >> -; CHECK-P9-NEXT: xxswapd v0, vs9 >> -; CHECK-P9-NEXT: xxswapd v1, vs10 >> -; CHECK-P9-NEXT: vmrglh v0, v1, v0 >> -; CHECK-P9-NEXT: vmrglw 
v2, v2, v0 >> -; CHECK-P9-NEXT: stxv vs2, 0(r3) >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r4 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2 >> ; CHECK-P9-NEXT: stxv vs0, 16(r3) >> +; CHECK-P9-NEXT: stxv vs2, 0(r3) >> ; CHECK-P9-NEXT: blr >> ; >> ; CHECK-BE-LABEL: test16elt_signed: >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll >> index 1f95eda2b1b5..928a19f3a55c 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll >> @@ -20,12 +20,10 @@ define i16 @test2elt(i64 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: vmrglb v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: clrldi r3, r3, 48 >> @@ -43,13 +41,11 @@ define i16 @test2elt(i64 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: addi r3, r1, -2 >> -; CHECK-P9-NEXT: xxswapd v2, vs1 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> ; 
CHECK-P9-NEXT: vsldoi v2, v2, v2, 8 >> ; CHECK-P9-NEXT: stxsihx v2, 0, r3 >> ; CHECK-P9-NEXT: lhz r3, -2(r1) >> @@ -97,20 +93,16 @@ define i32 @test4elt(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: mtfprd f3, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs2 >> -; CHECK-P8-NEXT: xxswapd v5, vs3 >> -; CHECK-P8-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglb v3, v4, v5 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: vmrghb v3, v4, v3 >> +; CHECK-P8-NEXT: vmrghb v2, v2, v5 >> +; CHECK-P8-NEXT: vmrglh v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -121,28 +113,24 @@ define i32 @test4elt(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xscvspdpn f0, v2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; 
CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglb v2, v4, v2 >> +; CHECK-P9-NEXT: vmrghb v2, v4, v2 >> ; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -189,59 +177,51 @@ define i64 @test8elt(<8 x float>* nocapture >> readonly) local_unnamed_addr #2 { >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: lvx v2, 0, r3 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: lvx v5, r3, r4 >> -; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: lvx v3, r3, r4 >> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 >> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v5 >> -; CHECK-P8-NEXT: xxswapd vs3, v5 >> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: xscvspdpn f2, v2 >> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 >> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 >> +; CHECK-P8-NEXT: xscvspdpn f3, v3 >> ; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> ; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: mffprwz r6, f1 >> -; CHECK-P8-NEXT: mffprwz r5, f0 >> -; CHECK-P8-NEXT: mtfprd f1, r6 >> -; CHECK-P8-NEXT: mtfprd f0, r5 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 >> -; 
CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f0, v2 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v1, vs4 >> -; CHECK-P8-NEXT: vmrglb v2, v4, v3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xxswapd v5, vs2 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: xxswapd vs0, v3 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: mffprwz r3, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f4 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f5 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghb v2, v4, v2 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v6, vs3 >> -; CHECK-P8-NEXT: xxswapd v0, vs0 >> -; CHECK-P8-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P8-NEXT: vmrglb v4, v0, v5 >> -; CHECK-P8-NEXT: vmrglb v5, v1, v6 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: vmrghb v3, v3, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: vmrghb v5, v0, v5 >> +; CHECK-P8-NEXT: mtvsrd v1, r3 >> ; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> +; CHECK-P8-NEXT: vmrghb v4, v4, v1 >> +; 
CHECK-P8-NEXT: vmrglh v3, v4, v5 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> @@ -255,53 +235,45 @@ define i64 @test8elt(<8 x float>* nocapture >> readonly) local_unnamed_addr #2 { >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs1 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; 
CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> @@ -376,117 +348,101 @@ entry: >> define <16 x i8> @test16elt(<16 x float>* nocapture readonly) >> local_unnamed_addr #3 { >> ; CHECK-P8-LABEL: test16elt: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: lvx v2, 0, r3 >> +; CHECK-P8-NEXT: lvx v4, 0, r3 >> ; CHECK-P8-NEXT: li r4, 16 >> +; CHECK-P8-NEXT: li r5, 32 >> ; CHECK-P8-NEXT: lvx v3, r3, r4 >> -; CHECK-P8-NEXT: li r4, 32 >> -; CHECK-P8-NEXT: xscvspdpn f2, v2 >> -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v3 >> -; CHECK-P8-NEXT: xxswapd vs1, v2 >> -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1 >> -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 >> -; CHECK-P8-NEXT: lvx v2, r3, r4 >> +; CHECK-P8-NEXT: lvx v2, r3, r5 >> +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3 >> +; CHECK-P8-NEXT: xxswapd vs2, v4 >> +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1 >> +; CHECK-P8-NEXT: xscvspdpn f1, v4 >> +; CHECK-P8-NEXT: xscvspdpn f3, v3 >> +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3 >> ; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> -; CHECK-P8-NEXT: xxswapd vs6, v3 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1 >> -; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3 >> -; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P8-NEXT: xxswapd vs9, v2 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xscvspdpn 
f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: xxswapd vs7, v3 >> +; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3 >> ; CHECK-P8-NEXT: xscvspdpn f6, vs6 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: mffprwz r4, f2 >> ; CHECK-P8-NEXT: xscvspdpn f7, vs7 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> ; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f5 >> -; CHECK-P8-NEXT: xxswapd v0, vs4 >> +; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P8-NEXT: xscvspdpn f9, vs9 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: xxswapd vs0, v2 >> +; CHECK-P8-NEXT: mffprwz r5, f2 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: mtvsrd v4, r5 >> +; CHECK-P8-NEXT: mffprwz r5, f4 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f6 >> -; CHECK-P8-NEXT: xxswapd v3, vs5 >> -; CHECK-P8-NEXT: mtfprd f6, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: vmrghb v3, v4, v3 >> +; CHECK-P8-NEXT: mtvsrd v4, r5 >> +; CHECK-P8-NEXT: mffprwz r5, f3 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f7 >> -; CHECK-P8-NEXT: xxswapd v4, vs6 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f8 >> -; CHECK-P8-NEXT: xxswapd v5, vs7 >> -; CHECK-P8-NEXT: mtfprd f8, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f1, f9 >> -; CHECK-P8-NEXT: xxswapd v1, vs8 >> -; CHECK-P8-NEXT: mtfprd f9, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P8-NEXT: xxswapd v4, vs2 >> -; 
CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs9 >> -; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvspdpn f0, v2 >> -; CHECK-P8-NEXT: xxswapd v7, vs3 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: vmrglb v4, v4, v5 >> -; CHECK-P8-NEXT: xxswapd v5, vs5 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f8 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: li r4, 48 >> -; CHECK-P8-NEXT: lvx v9, r3, r4 >> -; CHECK-P8-NEXT: vmrglb v1, v6, v1 >> -; CHECK-P8-NEXT: xxswapd v8, vs1 >> +; CHECK-P8-NEXT: lvx v0, r3, r4 >> +; CHECK-P8-NEXT: mffprwz r3, f1 >> ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 >> -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v9 >> -; CHECK-P8-NEXT: xxswapd vs3, v9 >> -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1 >> +; CHECK-P8-NEXT: xscvspdpn f5, v2 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3 >> +; CHECK-P8-NEXT: mtvsrd v1, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> +; CHECK-P8-NEXT: xxswapd vs4, v0 >> ; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> +; CHECK-P8-NEXT: mtvsrd v7, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f0 >> +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1 >> +; CHECK-P8-NEXT: xscvspdpn f2, v0 >> ; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> -; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P8-NEXT: xscvdpsxws f6, f9 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; 
CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghb v2, v6, v1 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f5 >> +; CHECK-P8-NEXT: mtvsrd v6, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: vmrghb v4, v5, v4 >> +; CHECK-P8-NEXT: mtvsrd v5, r5 >> +; CHECK-P8-NEXT: vmrghb v0, v6, v1 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v9, vs4 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: mtvsrd v6, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs1 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: vmrglb v2, v0, v7 >> -; CHECK-P8-NEXT: xxswapd v0, vs0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v7, vs2 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: vmrglb v5, v8, v5 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: xxswapd v10, vs3 >> -; CHECK-P8-NEXT: vmrglb v0, v0, v6 >> +; CHECK-P8-NEXT: vmrghb v5, v5, v7 >> +; CHECK-P8-NEXT: vmrghb v1, v1, v6 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f4 >> +; CHECK-P8-NEXT: mtvsrd v7, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f0 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mtvsrd v9, r3 >> +; CHECK-P8-NEXT: vmrghb v7, v8, v7 >> +; CHECK-P8-NEXT: vmrghb v6, v6, v9 >> ; CHECK-P8-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P8-NEXT: vmrglb v6, v8, v7 >> -; CHECK-P8-NEXT: vmrglb v7, v9, v10 >> -; CHECK-P8-NEXT: vmrglh v2, v2, v1 >> -; CHECK-P8-NEXT: vmrglh v4, v0, v5 >> -; CHECK-P8-NEXT: vmrglh v5, v7, v6 >> +; CHECK-P8-NEXT: vmrglh v2, v5, v2 >> +; CHECK-P8-NEXT: vmrglh v4, v1, v0 >> +; CHECK-P8-NEXT: vmrglh v5, v6, v7 >> ; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> @@ -494,114 +450,98 @@ define <16 x i8> @test16elt(<16 x float>* >> nocapture readonly) 
local_unnamed_addr >> ; >> ; CHECK-P9-LABEL: test16elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: lxv vs2, 0(r3) >> +; CHECK-P9-NEXT: lxv vs3, 0(r3) >> +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: lxv vs0, 48(r3) >> +; CHECK-P9-NEXT: lxv vs1, 32(r3) >> +; CHECK-P9-NEXT: lxv vs2, 16(r3) >> +; CHECK-P9-NEXT: mffprwz r3, f4 >> +; CHECK-P9-NEXT: xxswapd vs4, vs3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: mffprwz r3, f4 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs3 >> +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> +; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: mffprwz r3, f4 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f3 >> ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: lxv vs0, 48(r3) >> -; CHECK-P9-NEXT: lxv vs1, 32(r3) >> -; CHECK-P9-NEXT: lxv vs4, 16(r3) >> +; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> ; CHECK-P9-NEXT: xscvspdpn f3, vs2 >> ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: 
mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: xxswapd vs2, vs4 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs4 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs2 >> ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> ; CHECK-P9-NEXT: xxsldwi 
vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghb v4, v5, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v4, v5, v4 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> -; CHECK-P9-NEXT: xxswapd v0, vs0 >> -; CHECK-P9-NEXT: vmrglb v5, v5, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r3 >> +; CHECK-P9-NEXT: vmrghb v5, v5, v0 >> ; CHECK-P9-NEXT: vmrglh v4, v5, v4 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 >> @@ -738,12 +678,10 @@ define i16 @test2elt_signed(i64 %a.coerce) >> 
local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: vmrglb v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: clrldi r3, r3, 48 >> @@ -761,13 +699,11 @@ define i16 @test2elt_signed(i64 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: addi r3, r1, -2 >> -; CHECK-P9-NEXT: xxswapd v2, vs1 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8 >> ; CHECK-P9-NEXT: stxsihx v2, 0, r3 >> ; CHECK-P9-NEXT: lhz r3, -2(r1) >> @@ -815,20 +751,16 @@ define i32 @test4elt_signed(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: mtfprd f3, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs2 >> -; CHECK-P8-NEXT: xxswapd v5, vs3 >> -; CHECK-P8-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglb v3, v4, 
v5 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: vmrghb v3, v4, v3 >> +; CHECK-P8-NEXT: vmrghb v2, v2, v5 >> +; CHECK-P8-NEXT: vmrglh v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -839,28 +771,24 @@ define i32 @test4elt_signed(<4 x float> %a) >> local_unnamed_addr #1 { >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xscvspdpn f0, v2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglb v2, v4, v2 >> +; CHECK-P9-NEXT: vmrghb v2, v4, v2 >> ; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -907,59 +835,51 @@ define i64 @test8elt_signed(<8 x float>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: lvx v2, 0, r3 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: lvx v5, r3, r4 >> -; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: lvx v3, r3, r4 >> ; CHECK-P8-NEXT: xxsldwi 
vs0, v2, v2, 3 >> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v5 >> -; CHECK-P8-NEXT: xxswapd vs3, v5 >> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xxswapd vs1, v2 >> +; CHECK-P8-NEXT: xscvspdpn f2, v2 >> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1 >> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 >> +; CHECK-P8-NEXT: xscvspdpn f3, v3 >> ; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> ; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: mffprwz r6, f1 >> -; CHECK-P8-NEXT: mffprwz r5, f0 >> -; CHECK-P8-NEXT: mtfprd f1, r6 >> -; CHECK-P8-NEXT: mtfprd f0, r5 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xscvspdpn f0, v2 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v1, vs4 >> -; CHECK-P8-NEXT: vmrglb v2, v4, v3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xxswapd v5, vs2 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mffprwz r3, f1 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: xxswapd vs0, v3 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: mffprwz r3, f2 >> +; 
CHECK-P8-NEXT: xscvdpsxws f2, f4 >> +; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f5 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghb v2, v4, v2 >> +; CHECK-P8-NEXT: mffprwz r4, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v6, vs3 >> -; CHECK-P8-NEXT: xxswapd v0, vs0 >> -; CHECK-P8-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P8-NEXT: vmrglb v4, v0, v5 >> -; CHECK-P8-NEXT: vmrglb v5, v1, v6 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: vmrghb v3, v3, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> +; CHECK-P8-NEXT: mtvsrd v5, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: vmrghb v5, v0, v5 >> +; CHECK-P8-NEXT: mtvsrd v1, r3 >> ; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> +; CHECK-P8-NEXT: vmrghb v4, v4, v1 >> +; CHECK-P8-NEXT: vmrglh v3, v4, v5 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> @@ -973,53 +893,45 @@ define i64 @test8elt_signed(<8 x float>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs1 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; 
CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> @@ -1094,117 +1006,101 @@ entry: >> define <16 x i8> @test16elt_signed(<16 x float>* nocapture readonly) >> local_unnamed_addr #3 { >> ; CHECK-P8-LABEL: test16elt_signed: >> ; 
CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: lvx v2, 0, r3 >> +; CHECK-P8-NEXT: lvx v4, 0, r3 >> ; CHECK-P8-NEXT: li r4, 16 >> +; CHECK-P8-NEXT: li r5, 32 >> ; CHECK-P8-NEXT: lvx v3, r3, r4 >> -; CHECK-P8-NEXT: li r4, 32 >> -; CHECK-P8-NEXT: xscvspdpn f2, v2 >> -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v3 >> -; CHECK-P8-NEXT: xxswapd vs1, v2 >> -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1 >> -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3 >> -; CHECK-P8-NEXT: lvx v2, r3, r4 >> +; CHECK-P8-NEXT: lvx v2, r3, r5 >> +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3 >> +; CHECK-P8-NEXT: xxswapd vs2, v4 >> +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1 >> +; CHECK-P8-NEXT: xscvspdpn f1, v4 >> +; CHECK-P8-NEXT: xscvspdpn f3, v3 >> +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3 >> ; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> -; CHECK-P8-NEXT: xxswapd vs6, v3 >> -; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1 >> -; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3 >> -; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P8-NEXT: xxswapd vs9, v2 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: xxswapd vs7, v3 >> +; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3 >> ; CHECK-P8-NEXT: xscvspdpn f6, vs6 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: mffprwz r4, f2 >> ; CHECK-P8-NEXT: xscvspdpn f7, vs7 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> ; CHECK-P8-NEXT: xscvspdpn f8, vs8 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f5 >> -; CHECK-P8-NEXT: xxswapd v0, vs4 >> +; 
CHECK-P8-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P8-NEXT: xscvspdpn f9, vs9 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: xxswapd vs0, v2 >> +; CHECK-P8-NEXT: mffprwz r5, f2 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> +; CHECK-P8-NEXT: mtvsrd v4, r5 >> +; CHECK-P8-NEXT: mffprwz r5, f4 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f6 >> -; CHECK-P8-NEXT: xxswapd v3, vs5 >> -; CHECK-P8-NEXT: mtfprd f6, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: vmrghb v3, v4, v3 >> +; CHECK-P8-NEXT: mtvsrd v4, r5 >> +; CHECK-P8-NEXT: mffprwz r5, f3 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f7 >> -; CHECK-P8-NEXT: xxswapd v4, vs6 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f8 >> -; CHECK-P8-NEXT: xxswapd v5, vs7 >> -; CHECK-P8-NEXT: mtfprd f8, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xscvdpsxws f1, f9 >> -; CHECK-P8-NEXT: xxswapd v1, vs8 >> -; CHECK-P8-NEXT: mtfprd f9, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P8-NEXT: xxswapd v4, vs2 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs9 >> -; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: xscvspdpn f0, v2 >> -; CHECK-P8-NEXT: xxswapd v7, vs3 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: vmrglb v4, v4, v5 >> -; CHECK-P8-NEXT: xxswapd v5, vs5 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f8 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: li r4, 48 >> -; CHECK-P8-NEXT: lvx v9, r3, r4 >> -; CHECK-P8-NEXT: vmrglb v1, v6, v1 >> -; CHECK-P8-NEXT: xxswapd v8, vs1 >> +; CHECK-P8-NEXT: lvx v0, r3, r4 >> +; CHECK-P8-NEXT: mffprwz r3, f1 >> ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1 >> -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3 >> -; CHECK-P8-NEXT: xscvspdpn f4, v9 >> -; 
CHECK-P8-NEXT: xxswapd vs3, v9 >> -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1 >> +; CHECK-P8-NEXT: xscvspdpn f5, v2 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3 >> +; CHECK-P8-NEXT: mtvsrd v1, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> +; CHECK-P8-NEXT: xxswapd vs4, v0 >> ; CHECK-P8-NEXT: xscvspdpn f1, vs1 >> -; CHECK-P8-NEXT: xscvspdpn f2, vs2 >> +; CHECK-P8-NEXT: mtvsrd v7, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f0 >> +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1 >> +; CHECK-P8-NEXT: xscvspdpn f2, v0 >> ; CHECK-P8-NEXT: xscvspdpn f3, vs3 >> -; CHECK-P8-NEXT: xscvspdpn f5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P8-NEXT: xscvdpsxws f6, f9 >> +; CHECK-P8-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P8-NEXT: xscvspdpn f0, vs0 >> +; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghb v2, v6, v1 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f5 >> +; CHECK-P8-NEXT: mtvsrd v6, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: vmrghb v4, v5, v4 >> +; CHECK-P8-NEXT: mtvsrd v5, r5 >> +; CHECK-P8-NEXT: vmrghb v0, v6, v1 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: xxswapd v9, vs4 >> -; CHECK-P8-NEXT: mtfprd f1, r3 >> +; CHECK-P8-NEXT: mtvsrd v6, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs1 >> -; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: vmrglb v2, v0, v7 >> -; CHECK-P8-NEXT: xxswapd v0, vs0 >> -; CHECK-P8-NEXT: 
mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v7, vs2 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: vmrglb v5, v8, v5 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: xxswapd v10, vs3 >> -; CHECK-P8-NEXT: vmrglb v0, v0, v6 >> +; CHECK-P8-NEXT: vmrghb v5, v5, v7 >> +; CHECK-P8-NEXT: vmrghb v1, v1, v6 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f4 >> +; CHECK-P8-NEXT: mtvsrd v7, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f0 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mtvsrd v9, r3 >> +; CHECK-P8-NEXT: vmrghb v7, v8, v7 >> +; CHECK-P8-NEXT: vmrghb v6, v6, v9 >> ; CHECK-P8-NEXT: vmrglh v3, v4, v3 >> -; CHECK-P8-NEXT: vmrglb v6, v8, v7 >> -; CHECK-P8-NEXT: vmrglb v7, v9, v10 >> -; CHECK-P8-NEXT: vmrglh v2, v2, v1 >> -; CHECK-P8-NEXT: vmrglh v4, v0, v5 >> -; CHECK-P8-NEXT: vmrglh v5, v7, v6 >> +; CHECK-P8-NEXT: vmrglh v2, v5, v2 >> +; CHECK-P8-NEXT: vmrglh v4, v1, v0 >> +; CHECK-P8-NEXT: vmrglh v5, v6, v7 >> ; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> ; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> @@ -1212,114 +1108,98 @@ define <16 x i8> @test16elt_signed(<16 x float>* >> nocapture readonly) local_unnam >> ; >> ; CHECK-P9-LABEL: test16elt_signed: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: lxv vs2, 0(r3) >> +; CHECK-P9-NEXT: lxv vs3, 0(r3) >> +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: lxv vs0, 48(r3) >> +; CHECK-P9-NEXT: lxv vs1, 32(r3) >> +; CHECK-P9-NEXT: lxv vs2, 16(r3) >> +; CHECK-P9-NEXT: mffprwz r3, f4 >> +; CHECK-P9-NEXT: xxswapd vs4, vs3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs4 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: mffprwz r3, f4 >> +; CHECK-P9-NEXT: xscvspdpn f4, vs3 >> +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; 
CHECK-P9-NEXT: vmrghb v2, v3, v2 >> +; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: mffprwz r3, f4 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f3 >> ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: lxv vs0, 48(r3) >> -; CHECK-P9-NEXT: lxv vs1, 32(r3) >> -; CHECK-P9-NEXT: lxv vs4, 16(r3) >> +; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> ; CHECK-P9-NEXT: xscvspdpn f3, vs2 >> ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: xxswapd vs2, vs4 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs4 >> -; 
CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1 >> -; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs2 >> ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xscvspdpn f2, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v3, v4, v3 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglb v3, v4, v3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; 
CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> ; CHECK-P9-NEXT: xscvspdpn f1, vs0 >> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvspdpn f0, vs0 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghb v4, v5, v4 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v4, v5, v4 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> -; CHECK-P9-NEXT: xxswapd v0, vs0 >> -; CHECK-P9-NEXT: vmrglb v5, v5, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r3 >> +; CHECK-P9-NEXT: vmrghb v5, v5, v0 >> ; CHECK-P9-NEXT: vmrglh v4, v5, v4 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll >> index c7d66ae784a0..dbc2774fed8c 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll >> @@ -16,12 +16,10 @@ define i32 @test2elt(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f1, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: vmrglh v2, v2, v3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: 
mffprwz r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -30,15 +28,13 @@ define i32 @test2elt(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P9: # %bb.0: # %entry >> ; CHECK-P9-NEXT: xscvdpsxws f0, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -77,18 +73,14 @@ define i64 @test4elt(<4 x double>* nocapture >> readonly) local_unnamed_addr #1 { >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: xxswapd v2, vs2 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs3 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xxswapd v5, vs1 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: vmrghh v2, v4, v2 >> +; CHECK-P8-NEXT: vmrghh v3, v5, v3 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> @@ -102,22 +94,18 @@ define i64 @test4elt(<4 x double>* nocapture >> readonly) local_unnamed_addr #1 { >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; 
CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -176,36 +164,28 @@ define <8 x i16> @test8elt(<8 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P8-NEXT: xxswapd vs3, vs3 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: mtfprd f4, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f6 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs4 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f7 >> -; CHECK-P8-NEXT: mtfprd f6, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs5 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs6 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v1, vs7 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd 
v5, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: xxswapd v0, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs2 >> -; CHECK-P8-NEXT: vmrglh v2, v5, v2 >> -; CHECK-P8-NEXT: xxswapd v5, vs0 >> -; CHECK-P8-NEXT: vmrglh v3, v0, v3 >> -; CHECK-P8-NEXT: vmrglh v4, v6, v4 >> -; CHECK-P8-NEXT: vmrglh v5, v5, v1 >> +; CHECK-P8-NEXT: vmrghh v2, v0, v2 >> +; CHECK-P8-NEXT: vmrghh v3, v1, v3 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: vmrghh v4, v0, v4 >> +; CHECK-P8-NEXT: vmrghh v5, v1, v5 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> @@ -217,47 +197,39 @@ define <8 x i16> @test8elt(<8 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: lxv vs0, 48(r3) >> ; CHECK-P9-NEXT: lxv vs1, 32(r3) >> -; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: xxswapd v2, vs4 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; 
CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -321,209 +293,177 @@ entry: >> define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, >> <16 x double>* nocapture readonly) local_unnamed_addr #3 { >> ; CHECK-P8-LABEL: test16elt: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 >> ; CHECK-P8-NEXT: li r5, 16 >> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 >> ; CHECK-P8-NEXT: li r6, 32 >> +; CHECK-P8-NEXT: li r7, 48 >> ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5 >> ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6 >> -; CHECK-P8-NEXT: li r6, 48 >> -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6 >> ; CHECK-P8-NEXT: li r6, 64 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f0 >> +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7 >> ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6 >> -; CHECK-P8-NEXT: li r6, 80 >> +; CHECK-P8-NEXT: li r7, 80 >> +; CHECK-P8-NEXT: li r6, 96 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f0 >> +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7 >> +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6 >> +; CHECK-P8-NEXT: li r6, 112 
>> ; CHECK-P8-NEXT: xxswapd vs0, vs0 >> ; CHECK-P8-NEXT: xscvdpsxws f6, f1 >> -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6 >> -; CHECK-P8-NEXT: li r6, 96 >> ; CHECK-P8-NEXT: xxswapd vs1, vs1 >> ; CHECK-P8-NEXT: xscvdpsxws f8, f2 >> -; CHECK-P8-NEXT: lxvd2x vs9, r4, r6 >> -; CHECK-P8-NEXT: li r6, 112 >> ; CHECK-P8-NEXT: xxswapd vs2, vs2 >> -; CHECK-P8-NEXT: xscvdpsxws f10, f3 >> -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6 >> +; CHECK-P8-NEXT: xscvdpsxws f9, f3 >> ; CHECK-P8-NEXT: xxswapd vs3, vs3 >> -; CHECK-P8-NEXT: xscvdpsxws f12, f5 >> +; CHECK-P8-NEXT: xscvdpsxws f11, f5 >> ; CHECK-P8-NEXT: xxswapd vs5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f13, f7 >> +; CHECK-P8-NEXT: xscvdpsxws f12, f7 >> ; CHECK-P8-NEXT: xxswapd vs7, vs7 >> -; CHECK-P8-NEXT: xscvdpsxws v2, f9 >> -; CHECK-P8-NEXT: xxswapd vs9, vs9 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws v3, f11 >> -; CHECK-P8-NEXT: xxswapd vs11, vs11 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mffprwz r6, f6 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> +; CHECK-P8-NEXT: mffprwz r7, f4 >> +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6 >> +; CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xscvdpsxws f13, f10 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f8 >> +; CHECK-P8-NEXT: xscvdpsxws f6, f4 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f9 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f11 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs4 >> -; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P8-NEXT: mtfprd f6, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f10 >> -; CHECK-P8-NEXT: mtfprd f8, r4 >> -; CHECK-P8-NEXT: xxswapd v5, vs6 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f12 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: xxswapd v0, vs8 >> -; CHECK-P8-NEXT: mtfprd f10, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f13 >> -; CHECK-P8-NEXT: mtfprd f12, r4 >> -; 
CHECK-P8-NEXT: xxswapd v1, vs10 >> -; CHECK-P8-NEXT: mfvsrwz r4, v2 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f13 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xxswapd v6, vs12 >> -; CHECK-P8-NEXT: xscvdpsxws f9, f9 >> -; CHECK-P8-NEXT: mtfprd f13, r6 >> -; CHECK-P8-NEXT: mfvsrwz r6, v3 >> -; CHECK-P8-NEXT: mtvsrd v2, r4 >> -; CHECK-P8-NEXT: xxswapd v7, vs13 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xxswapd vs6, vs10 >> +; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: xxswapd vs0, vs4 >> +; CHECK-P8-NEXT: mtvsrd v2, r7 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> ; CHECK-P8-NEXT: xscvdpsxws f7, f7 >> -; CHECK-P8-NEXT: xxswapd v2, v2 >> -; CHECK-P8-NEXT: xscvdpsxws f11, f11 >> -; CHECK-P8-NEXT: mtvsrd v3, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v3, v3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: mtfprd f1, r6 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f6 >> +; CHECK-P8-NEXT: vmrghh v2, v8, v2 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghh v3, v9, v3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xxswapd v9, vs1 >> -; CHECK-P8-NEXT: mffprwz r6, f3 >> -; CHECK-P8-NEXT: xxswapd v10, vs2 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f9 >> -; CHECK-P8-NEXT: mtfprd f3, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f7 >> -; CHECK-P8-NEXT: mtfprd f9, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f11 >> -; CHECK-P8-NEXT: vmrglh v4, v8, v4 >> -; CHECK-P8-NEXT: xxswapd v8, vs3 >> -; CHECK-P8-NEXT: vmrglh v5, v9, v5 >> -; CHECK-P8-NEXT: 
xxswapd v9, vs5 >> -; CHECK-P8-NEXT: mtfprd f7, r6 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: vmrglh v0, v10, v0 >> -; CHECK-P8-NEXT: xxswapd v10, vs7 >> -; CHECK-P8-NEXT: vmrglh v1, v8, v1 >> -; CHECK-P8-NEXT: xxswapd v8, vs9 >> -; CHECK-P8-NEXT: vmrglh v6, v9, v6 >> -; CHECK-P8-NEXT: xxswapd v9, vs0 >> -; CHECK-P8-NEXT: vmrglh v7, v10, v7 >> -; CHECK-P8-NEXT: vmrglh v2, v8, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v9, v3 >> -; CHECK-P8-NEXT: vmrglw v4, v5, v4 >> -; CHECK-P8-NEXT: vmrglw v5, v1, v0 >> -; CHECK-P8-NEXT: vmrglw v0, v7, v6 >> +; CHECK-P8-NEXT: vmrghh v4, v8, v4 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f7 >> +; CHECK-P8-NEXT: vmrghh v5, v9, v5 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f4 >> +; CHECK-P8-NEXT: vmrghh v0, v8, v0 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: vmrghh v1, v9, v1 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: vmrghh v6, v8, v6 >> +; CHECK-P8-NEXT: vmrghh v7, v9, v7 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> +; CHECK-P8-NEXT: vmrglw v4, v1, v0 >> +; CHECK-P8-NEXT: vmrglw v5, v7, v6 >> +; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> +; CHECK-P8-NEXT: stvx v2, 0, r3 >> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4 >> -; CHECK-P8-NEXT: stvx v3, 0, r3 >> -; CHECK-P8-NEXT: xxmrgld v2, v2, v0 >> -; CHECK-P8-NEXT: stvx v2, r3, r5 >> +; CHECK-P8-NEXT: stvx v3, r3, r5 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: test16elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: lxv vs4, 0(r4) >> -; CHECK-P9-NEXT: lxv vs3, 16(r4) >> -; CHECK-P9-NEXT: lxv vs2, 32(r4) >> -; CHECK-P9-NEXT: xscvdpsxws f5, f4 >> -; CHECK-P9-NEXT: lxv vs1, 48(r4) >> -; CHECK-P9-NEXT: xscvdpsxws f6, f3 >> -; CHECK-P9-NEXT: lxv vs0, 64(r4) >> -; CHECK-P9-NEXT: xscvdpsxws f7, f2 >> -; CHECK-P9-NEXT: xscvdpsxws f8, f1 >> -; CHECK-P9-NEXT: xxswapd vs4, vs4 >> -; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P9-NEXT: mffprwz r5, f5 
>> -; CHECK-P9-NEXT: xscvdpsxws f9, f0 >> +; CHECK-P9-NEXT: lxv vs3, 0(r4) >> +; CHECK-P9-NEXT: lxv vs2, 16(r4) >> +; CHECK-P9-NEXT: lxv vs1, 32(r4) >> +; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> +; CHECK-P9-NEXT: lxv vs0, 48(r4) >> +; CHECK-P9-NEXT: xscvdpsxws f5, f2 >> +; CHECK-P9-NEXT: xscvdpsxws f6, f1 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> +; CHECK-P9-NEXT: xscvdpsxws f7, f0 >> +; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: mffprwz r5, f4 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: mtfprd f5, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f6 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mtfprd f6, r5 >> +; CHECK-P9-NEXT: mtvsrd v2, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f5 >> +; CHECK-P9-NEXT: mtvsrd v3, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f6 >> +; CHECK-P9-NEXT: mtvsrd v4, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f7 >> -; CHECK-P9-NEXT: mtfprd f7, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f8 >> -; CHECK-P9-NEXT: mtfprd f8, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f9 >> -; CHECK-P9-NEXT: mtfprd f9, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r5 >> +; CHECK-P9-NEXT: mtvsrd v5, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f3 >> +; CHECK-P9-NEXT: lxv vs3, 64(r4) >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: xxswapd v2, vs5 >> -; CHECK-P9-NEXT: xxswapd v5, vs8 >> -; CHECK-P9-NEXT: xxswapd v0, vs9 >> -; CHECK-P9-NEXT: mtfprd f3, r5 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r5 >> -; CHECK-P9-NEXT: xxswapd vs0, vs0 >> -; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P9-NEXT: xxswapd v1, vs2 >> ; CHECK-P9-NEXT: lxv vs2, 80(r4) >> -; CHECK-P9-NEXT: xxswapd v3, vs4 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs6 >> -; CHECK-P9-NEXT: xxswapd v4, vs3 >> -; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> -; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; 
CHECK-P9-NEXT: vmrghh v2, v2, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f1 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs7 >> -; CHECK-P9-NEXT: mtfprd f1, r5 >> +; CHECK-P9-NEXT: lxv vs1, 96(r4) >> +; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> +; CHECK-P9-NEXT: xxswapd vs3, vs3 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v1 >> -; CHECK-P9-NEXT: xxswapd v1, vs1 >> -; CHECK-P9-NEXT: mtfprd f0, r5 >> -; CHECK-P9-NEXT: vmrglh v5, v5, v1 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: xxswapd v1, vs0 >> ; CHECK-P9-NEXT: lxv vs0, 112(r4) >> -; CHECK-P9-NEXT: lxv vs1, 96(r4) >> +; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghh v5, v5, v0 >> +; CHECK-P9-NEXT: mffprwz r4, f4 >> +; CHECK-P9-NEXT: vmrglw v4, v5, v4 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r4 >> +; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> +; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> +; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> +; CHECK-P9-NEXT: stxv vs4, 0(r3) >> +; CHECK-P9-NEXT: mffprwz r4, f3 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f2 >> -; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v2, vs3 >> -; CHECK-P9-NEXT: vmrglh v0, v0, v1 >> -; CHECK-P9-NEXT: mtfprd f2, r4 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r4, f2 >> -; CHECK-P9-NEXT: mtfprd 
f2, r4 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r4, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r4 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r4 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2 >> ; CHECK-P9-NEXT: stxv vs0, 16(r3) >> -; CHECK-P9-NEXT: stxv vs4, 0(r3) >> ; CHECK-P9-NEXT: blr >> ; >> ; CHECK-BE-LABEL: test16elt: >> @@ -639,12 +579,10 @@ define i32 @test2elt_signed(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f1, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: vmrglh v2, v2, v3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -653,15 +591,13 @@ define i32 @test2elt_signed(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P9: # %bb.0: # %entry >> ; CHECK-P9-NEXT: xscvdpsxws f0, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: 
xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -700,18 +636,14 @@ define i64 @test4elt_signed(<4 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: xxswapd v2, vs2 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs3 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xxswapd v5, vs1 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: vmrghh v2, v4, v2 >> +; CHECK-P8-NEXT: vmrghh v3, v5, v3 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> @@ -725,22 +657,18 @@ define i64 @test4elt_signed(<4 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, 
f0 >> +; CHECK-P9-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -799,36 +727,28 @@ define <8 x i16> @test8elt_signed(<8 x double>* >> nocapture readonly) local_unname >> ; CHECK-P8-NEXT: xxswapd vs3, vs3 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: mtfprd f4, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f6 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs4 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f7 >> -; CHECK-P8-NEXT: mtfprd f6, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs5 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs6 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v1, vs7 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v5, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: xxswapd v0, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs2 >> -; CHECK-P8-NEXT: vmrglh v2, v5, v2 >> -; CHECK-P8-NEXT: xxswapd v5, vs0 >> -; 
CHECK-P8-NEXT: vmrglh v3, v0, v3 >> -; CHECK-P8-NEXT: vmrglh v4, v6, v4 >> -; CHECK-P8-NEXT: vmrglh v5, v5, v1 >> +; CHECK-P8-NEXT: vmrghh v2, v0, v2 >> +; CHECK-P8-NEXT: vmrghh v3, v1, v3 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: vmrghh v4, v0, v4 >> +; CHECK-P8-NEXT: vmrghh v5, v1, v5 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> @@ -840,47 +760,39 @@ define <8 x i16> @test8elt_signed(<8 x double>* >> nocapture readonly) local_unname >> ; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: lxv vs0, 48(r3) >> ; CHECK-P9-NEXT: lxv vs1, 32(r3) >> -; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: xxswapd v2, vs4 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; 
CHECK-P9-NEXT: xxswapd v4, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 >> ; CHECK-P9-NEXT: blr >> @@ -944,209 +856,177 @@ entry: >> define void @test16elt_signed(<16 x i16>* noalias nocapture sret >> %agg.result, <16 x double>* nocapture readonly) local_unnamed_addr #3 { >> ; CHECK-P8-LABEL: test16elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 >> ; CHECK-P8-NEXT: li r5, 16 >> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 >> ; CHECK-P8-NEXT: li r6, 32 >> +; CHECK-P8-NEXT: li r7, 48 >> ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5 >> ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6 >> -; CHECK-P8-NEXT: li r6, 48 >> -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6 >> ; CHECK-P8-NEXT: li r6, 64 >> -; CHECK-P8-NEXT: xscvdpsxws f4, f0 >> +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7 >> ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6 >> -; CHECK-P8-NEXT: li r6, 80 >> +; CHECK-P8-NEXT: li r7, 80 >> +; CHECK-P8-NEXT: li r6, 96 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f0 >> +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7 >> +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6 >> +; CHECK-P8-NEXT: li r6, 112 >> ; CHECK-P8-NEXT: xxswapd vs0, vs0 >> ; CHECK-P8-NEXT: xscvdpsxws f6, f1 >> -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6 >> -; CHECK-P8-NEXT: li r6, 96 >> ; CHECK-P8-NEXT: xxswapd vs1, vs1 >> ; CHECK-P8-NEXT: xscvdpsxws f8, f2 >> -; CHECK-P8-NEXT: lxvd2x 
vs9, r4, r6 >> -; CHECK-P8-NEXT: li r6, 112 >> ; CHECK-P8-NEXT: xxswapd vs2, vs2 >> -; CHECK-P8-NEXT: xscvdpsxws f10, f3 >> -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6 >> +; CHECK-P8-NEXT: xscvdpsxws f9, f3 >> ; CHECK-P8-NEXT: xxswapd vs3, vs3 >> -; CHECK-P8-NEXT: xscvdpsxws f12, f5 >> +; CHECK-P8-NEXT: xscvdpsxws f11, f5 >> ; CHECK-P8-NEXT: xxswapd vs5, vs5 >> -; CHECK-P8-NEXT: xscvdpsxws f13, f7 >> +; CHECK-P8-NEXT: xscvdpsxws f12, f7 >> ; CHECK-P8-NEXT: xxswapd vs7, vs7 >> -; CHECK-P8-NEXT: xscvdpsxws v2, f9 >> -; CHECK-P8-NEXT: xxswapd vs9, vs9 >> -; CHECK-P8-NEXT: mffprwz r4, f4 >> -; CHECK-P8-NEXT: xscvdpsxws v3, f11 >> -; CHECK-P8-NEXT: xxswapd vs11, vs11 >> -; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mffprwz r6, f6 >> -; CHECK-P8-NEXT: mtfprd f4, r4 >> +; CHECK-P8-NEXT: mffprwz r7, f4 >> +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6 >> +; CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xscvdpsxws f13, f10 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f8 >> +; CHECK-P8-NEXT: xscvdpsxws f6, f4 >> +; CHECK-P8-NEXT: mtvsrd v4, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f9 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f11 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs4 >> -; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P8-NEXT: mtfprd f6, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f10 >> -; CHECK-P8-NEXT: mtfprd f8, r4 >> -; CHECK-P8-NEXT: xxswapd v5, vs6 >> +; CHECK-P8-NEXT: mtvsrd v0, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f12 >> -; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: xxswapd v0, vs8 >> -; CHECK-P8-NEXT: mtfprd f10, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f13 >> -; CHECK-P8-NEXT: mtfprd f12, r4 >> -; CHECK-P8-NEXT: xxswapd v1, vs10 >> -; CHECK-P8-NEXT: mfvsrwz r4, v2 >> +; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f13 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xxswapd v6, vs12 >> -; 
CHECK-P8-NEXT: xscvdpsxws f9, f9 >> -; CHECK-P8-NEXT: mtfprd f13, r6 >> -; CHECK-P8-NEXT: mfvsrwz r6, v3 >> -; CHECK-P8-NEXT: mtvsrd v2, r4 >> -; CHECK-P8-NEXT: xxswapd v7, vs13 >> +; CHECK-P8-NEXT: mtvsrd v6, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f6 >> +; CHECK-P8-NEXT: xxswapd vs6, vs10 >> +; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: xxswapd vs0, vs4 >> +; CHECK-P8-NEXT: mtvsrd v2, r7 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f1 >> ; CHECK-P8-NEXT: xscvdpsxws f7, f7 >> -; CHECK-P8-NEXT: xxswapd v2, v2 >> -; CHECK-P8-NEXT: xscvdpsxws f11, f11 >> -; CHECK-P8-NEXT: mtvsrd v3, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v3, v3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f2 >> -; CHECK-P8-NEXT: mtfprd f1, r6 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: mtfprd f2, r4 >> +; CHECK-P8-NEXT: xscvdpsxws f4, f6 >> +; CHECK-P8-NEXT: vmrghh v2, v8, v2 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f3 >> +; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P8-NEXT: vmrghh v3, v9, v3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: xxswapd v9, vs1 >> -; CHECK-P8-NEXT: mffprwz r6, f3 >> -; CHECK-P8-NEXT: xxswapd v10, vs2 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f9 >> -; CHECK-P8-NEXT: mtfprd f3, r6 >> -; CHECK-P8-NEXT: mffprwz r6, f7 >> -; CHECK-P8-NEXT: mtfprd f9, r4 >> -; CHECK-P8-NEXT: mffprwz r4, f11 >> -; CHECK-P8-NEXT: vmrglh v4, v8, v4 >> -; CHECK-P8-NEXT: xxswapd v8, vs3 >> -; CHECK-P8-NEXT: vmrglh v5, v9, v5 >> -; CHECK-P8-NEXT: xxswapd v9, vs5 >> -; CHECK-P8-NEXT: mtfprd f7, r6 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: vmrglh v0, v10, v0 >> -; CHECK-P8-NEXT: xxswapd v10, vs7 >> -; CHECK-P8-NEXT: vmrglh v1, v8, v1 >> -; CHECK-P8-NEXT: xxswapd v8, vs9 >> -; CHECK-P8-NEXT: 
vmrglh v6, v9, v6 >> -; CHECK-P8-NEXT: xxswapd v9, vs0 >> -; CHECK-P8-NEXT: vmrglh v7, v10, v7 >> -; CHECK-P8-NEXT: vmrglh v2, v8, v2 >> -; CHECK-P8-NEXT: vmrglh v3, v9, v3 >> -; CHECK-P8-NEXT: vmrglw v4, v5, v4 >> -; CHECK-P8-NEXT: vmrglw v5, v1, v0 >> -; CHECK-P8-NEXT: vmrglw v0, v7, v6 >> +; CHECK-P8-NEXT: vmrghh v4, v8, v4 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f7 >> +; CHECK-P8-NEXT: vmrghh v5, v9, v5 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f4 >> +; CHECK-P8-NEXT: vmrghh v0, v8, v0 >> +; CHECK-P8-NEXT: mtvsrd v8, r4 >> +; CHECK-P8-NEXT: mffprwz r4, f0 >> +; CHECK-P8-NEXT: vmrghh v1, v9, v1 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: vmrghh v6, v8, v6 >> +; CHECK-P8-NEXT: vmrghh v7, v9, v7 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> +; CHECK-P8-NEXT: vmrglw v4, v1, v0 >> +; CHECK-P8-NEXT: vmrglw v5, v7, v6 >> +; CHECK-P8-NEXT: xxmrgld v2, v3, v2 >> +; CHECK-P8-NEXT: stvx v2, 0, r3 >> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4 >> -; CHECK-P8-NEXT: stvx v3, 0, r3 >> -; CHECK-P8-NEXT: xxmrgld v2, v2, v0 >> -; CHECK-P8-NEXT: stvx v2, r3, r5 >> +; CHECK-P8-NEXT: stvx v3, r3, r5 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: test16elt_signed: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: lxv vs4, 0(r4) >> -; CHECK-P9-NEXT: lxv vs3, 16(r4) >> -; CHECK-P9-NEXT: lxv vs2, 32(r4) >> -; CHECK-P9-NEXT: xscvdpsxws f5, f4 >> -; CHECK-P9-NEXT: lxv vs1, 48(r4) >> -; CHECK-P9-NEXT: xscvdpsxws f6, f3 >> -; CHECK-P9-NEXT: lxv vs0, 64(r4) >> -; CHECK-P9-NEXT: xscvdpsxws f7, f2 >> -; CHECK-P9-NEXT: xscvdpsxws f8, f1 >> -; CHECK-P9-NEXT: xxswapd vs4, vs4 >> -; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> -; CHECK-P9-NEXT: mffprwz r5, f5 >> -; CHECK-P9-NEXT: xscvdpsxws f9, f0 >> +; CHECK-P9-NEXT: lxv vs3, 0(r4) >> +; CHECK-P9-NEXT: lxv vs2, 16(r4) >> +; CHECK-P9-NEXT: lxv vs1, 32(r4) >> +; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> +; CHECK-P9-NEXT: lxv vs0, 48(r4) >> +; CHECK-P9-NEXT: 
xscvdpsxws f5, f2 >> +; CHECK-P9-NEXT: xscvdpsxws f6, f1 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> +; CHECK-P9-NEXT: xscvdpsxws f7, f0 >> +; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: mffprwz r5, f4 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: mtfprd f5, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f6 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: mtfprd f6, r5 >> +; CHECK-P9-NEXT: mtvsrd v2, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f5 >> +; CHECK-P9-NEXT: mtvsrd v3, r5 >> +; CHECK-P9-NEXT: mffprwz r5, f6 >> +; CHECK-P9-NEXT: mtvsrd v4, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f7 >> -; CHECK-P9-NEXT: mtfprd f7, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f8 >> -; CHECK-P9-NEXT: mtfprd f8, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f9 >> -; CHECK-P9-NEXT: mtfprd f9, r5 >> -; CHECK-P9-NEXT: mffprwz r5, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r5 >> +; CHECK-P9-NEXT: mtvsrd v5, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f3 >> +; CHECK-P9-NEXT: lxv vs3, 64(r4) >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: xxswapd v2, vs5 >> -; CHECK-P9-NEXT: xxswapd v5, vs8 >> -; CHECK-P9-NEXT: xxswapd v0, vs9 >> -; CHECK-P9-NEXT: mtfprd f3, r5 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r5 >> -; CHECK-P9-NEXT: xxswapd vs0, vs0 >> -; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P9-NEXT: xxswapd v1, vs2 >> ; CHECK-P9-NEXT: lxv vs2, 80(r4) >> -; CHECK-P9-NEXT: xxswapd v3, vs4 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs6 >> -; CHECK-P9-NEXT: xxswapd v4, vs3 >> -; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> -; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: vmrghh v2, v2, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f1 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs7 >> -; CHECK-P9-NEXT: mtfprd f1, r5 >> +; CHECK-P9-NEXT: lxv vs1, 96(r4) >> +; 
CHECK-P9-NEXT: xscvdpsxws f4, f3 >> +; CHECK-P9-NEXT: xxswapd vs3, vs3 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> ; CHECK-P9-NEXT: mffprwz r5, f0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v1 >> -; CHECK-P9-NEXT: xxswapd v1, vs1 >> -; CHECK-P9-NEXT: mtfprd f0, r5 >> -; CHECK-P9-NEXT: vmrglh v5, v5, v1 >> -; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P9-NEXT: xxswapd v1, vs0 >> ; CHECK-P9-NEXT: lxv vs0, 112(r4) >> -; CHECK-P9-NEXT: lxv vs1, 96(r4) >> +; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r5 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghh v5, v5, v0 >> +; CHECK-P9-NEXT: mffprwz r4, f4 >> +; CHECK-P9-NEXT: vmrglw v4, v5, v4 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r4 >> +; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> +; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> +; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> +; CHECK-P9-NEXT: stxv vs4, 0(r3) >> +; CHECK-P9-NEXT: mffprwz r4, f3 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f2 >> -; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v2, vs3 >> -; CHECK-P9-NEXT: vmrglh v0, v0, v1 >> -; CHECK-P9-NEXT: mtfprd f2, r4 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r4, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r4 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd 
vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghh v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r4, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r4 >> +; CHECK-P9-NEXT: mtvsrd v4, r4 >> ; CHECK-P9-NEXT: mffprwz r4, f0 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: vmrglh v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v0 >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglh v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r4 >> +; CHECK-P9-NEXT: vmrghh v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2 >> ; CHECK-P9-NEXT: stxv vs0, 16(r3) >> -; CHECK-P9-NEXT: stxv vs4, 0(r3) >> ; CHECK-P9-NEXT: blr >> ; >> ; CHECK-BE-LABEL: test16elt_signed: >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll >> index 369fb3f10100..173ced964ad6 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll >> @@ -16,12 +16,10 @@ define i64 @test2elt(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpuxws f1, v2 >> ; CHECK-P8-NEXT: xscvdpuxws f0, f0 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: mtvsrwz v2, r3 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P8-NEXT: mtvsrwz v3, r4 >> +; CHECK-P8-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -35,7 +33,7 @@ define i64 @test2elt(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpuxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> ; CHECK-P9-NEXT: mtvsrws v2, r3 >> -; CHECK-P9-NEXT: 
vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -310,12 +308,10 @@ define i64 @test2elt_signed(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f1, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: mtvsrwz v2, r3 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P8-NEXT: mtvsrwz v3, r4 >> +; CHECK-P8-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -329,7 +325,7 @@ define i64 @test2elt_signed(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> ; CHECK-P9-NEXT: mtvsrws v2, r3 >> -; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll >> index fb13d1bd71f5..fd28d9a1afdc 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll >> @@ -16,12 +16,10 @@ define i16 @test2elt(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f1, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: vmrglb v2, v2, v3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; 
CHECK-P8-NEXT: clrldi r3, r3, 48 >> @@ -33,15 +31,13 @@ define i16 @test2elt(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P9: # %bb.0: # %entry >> ; CHECK-P9-NEXT: xscvdpsxws f0, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: addi r3, r1, -2 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8 >> ; CHECK-P9-NEXT: stxsihx v2, 0, r3 >> ; CHECK-P9-NEXT: lhz r3, -2(r1) >> @@ -84,18 +80,14 @@ define i32 @test4elt(<4 x double>* nocapture >> readonly) local_unnamed_addr #1 { >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: xxswapd v2, vs2 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs3 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xxswapd v5, vs1 >> -; CHECK-P8-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglb v3, v5, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: vmrghb v2, v4, v2 >> +; CHECK-P8-NEXT: vmrghb v3, v5, v3 >> ; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> @@ -109,24 +101,20 @@ define i32 @test4elt(<4 x double>* nocapture >> readonly) local_unnamed_addr #1 { >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd 
f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> +; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -185,36 +173,28 @@ define i64 @test8elt(<8 x double>* nocapture >> readonly) local_unnamed_addr #1 { >> ; CHECK-P8-NEXT: xxswapd vs3, vs3 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: mffprwz r4, f5 >> -; CHECK-P8-NEXT: mtfprd f4, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f6 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs4 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f7 >> -; CHECK-P8-NEXT: mtfprd f6, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs5 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs6 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v1, vs7 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; 
CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v5, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: xxswapd v0, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs2 >> -; CHECK-P8-NEXT: vmrglb v2, v5, v2 >> -; CHECK-P8-NEXT: xxswapd v5, vs0 >> -; CHECK-P8-NEXT: vmrglb v3, v0, v3 >> -; CHECK-P8-NEXT: vmrglb v4, v6, v4 >> -; CHECK-P8-NEXT: vmrglb v5, v5, v1 >> +; CHECK-P8-NEXT: vmrghb v2, v0, v2 >> +; CHECK-P8-NEXT: vmrghb v3, v1, v3 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: vmrghb v4, v0, v4 >> +; CHECK-P8-NEXT: vmrghb v5, v1, v5 >> ; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> @@ -228,47 +208,39 @@ define i64 @test8elt(<8 x double>* nocapture >> readonly) local_unnamed_addr #1 { >> ; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: lxv vs0, 48(r3) >> ; CHECK-P9-NEXT: lxv vs1, 32(r3) >> -; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: xxswapd v2, vs4 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: 
xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: vmrglb v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> @@ -364,79 +336,63 @@ define <16 x i8> @test16elt(<16 x double>* >> nocapture readonly) local_unnamed_add >> ; CHECK-P8-NEXT: xxswapd vs7, vs7 >> ; CHECK-P8-NEXT: xscvdpsxws v2, f9 >> ; CHECK-P8-NEXT: xxswapd vs9, vs9 >> -; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: xscvdpsxws v3, f11 >> ; CHECK-P8-NEXT: xxswapd vs11, vs11 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: mffprwz r4, f6 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mtfprd f4, r3 >> -; CHECK-P8-NEXT: mffprwz r3, f8 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs4 >> -; CHECK-P8-NEXT: mtfprd f6, r4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f8 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f10 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 
>> -; CHECK-P8-NEXT: xxswapd v5, vs6 >> -; CHECK-P8-NEXT: mtfprd f8, r3 >> -; CHECK-P8-NEXT: mffprwz r3, f12 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xxswapd v0, vs8 >> -; CHECK-P8-NEXT: mtfprd f10, r4 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f12 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f13 >> ; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: xxswapd v1, vs10 >> -; CHECK-P8-NEXT: mtfprd f12, r3 >> -; CHECK-P8-NEXT: mfvsrwz r3, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f7, f7 >> -; CHECK-P8-NEXT: xxswapd v6, vs12 >> -; CHECK-P8-NEXT: mtfprd f13, r4 >> +; CHECK-P8-NEXT: mtvsrd v6, r3 >> +; CHECK-P8-NEXT: mfvsrwz r3, v2 >> +; CHECK-P8-NEXT: mtvsrd v2, r4 >> ; CHECK-P8-NEXT: mfvsrwz r4, v3 >> -; CHECK-P8-NEXT: mtvsrd v2, r3 >> -; CHECK-P8-NEXT: xxswapd v7, vs13 >> -; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f9, f9 >> -; CHECK-P8-NEXT: xxswapd v2, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f11, f11 >> -; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> +; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v3, v3 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: xxswapd v9, vs1 >> +; CHECK-P8-NEXT: vmrghb v4, v8, v4 >> +; CHECK-P8-NEXT: vmrghb v5, v9, v5 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f5 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v10, vs2 >> ; CHECK-P8-NEXT: mffprwz r4, f7 >> -; CHECK-P8-NEXT: mtfprd f5, r3 >> +; CHECK-P8-NEXT: vmrghb v0, v8, v0 >> +; CHECK-P8-NEXT: vmrghb v1, v9, v1 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; 
CHECK-P8-NEXT: mffprwz r3, f9 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f11 >> -; CHECK-P8-NEXT: vmrglb v4, v8, v4 >> -; CHECK-P8-NEXT: xxswapd v8, vs3 >> -; CHECK-P8-NEXT: vmrglb v5, v9, v5 >> -; CHECK-P8-NEXT: xxswapd v9, vs5 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: vmrglb v0, v10, v0 >> -; CHECK-P8-NEXT: xxswapd v10, vs7 >> -; CHECK-P8-NEXT: vmrglb v1, v8, v1 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: vmrglb v6, v9, v6 >> -; CHECK-P8-NEXT: xxswapd v9, vs1 >> -; CHECK-P8-NEXT: vmrglb v7, v10, v7 >> -; CHECK-P8-NEXT: vmrglb v2, v8, v2 >> -; CHECK-P8-NEXT: vmrglb v3, v9, v3 >> +; CHECK-P8-NEXT: vmrghb v6, v8, v6 >> +; CHECK-P8-NEXT: vmrghb v2, v9, v2 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: vmrghb v3, v8, v3 >> +; CHECK-P8-NEXT: vmrghb v7, v9, v7 >> ; CHECK-P8-NEXT: vmrglh v4, v5, v4 >> ; CHECK-P8-NEXT: vmrglh v5, v1, v0 >> -; CHECK-P8-NEXT: vmrglh v0, v7, v6 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P8-NEXT: vmrglw v2, v2, v0 >> -; CHECK-P8-NEXT: xxmrgld v2, v2, v3 >> +; CHECK-P8-NEXT: vmrglh v2, v2, v6 >> +; CHECK-P8-NEXT: vmrglh v3, v7, v3 >> +; CHECK-P8-NEXT: vmrglw v4, v5, v4 >> +; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: xxmrgld v2, v2, v4 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: test16elt: >> @@ -445,94 +401,78 @@ define <16 x i8> @test16elt(<16 x double>* >> nocapture readonly) local_unnamed_add >> ; CHECK-P9-NEXT: xscvdpsxws f8, f7 >> ; CHECK-P9-NEXT: xxswapd vs7, vs7 >> ; CHECK-P9-NEXT: xscvdpsxws f7, f7 >> +; CHECK-P9-NEXT: lxv vs6, 16(r3) >> ; CHECK-P9-NEXT: lxv vs0, 112(r3) >> ; CHECK-P9-NEXT: lxv vs1, 96(r3) >> ; CHECK-P9-NEXT: lxv vs2, 80(r3) >> ; CHECK-P9-NEXT: lxv vs3, 64(r3) >> ; CHECK-P9-NEXT: lxv vs4, 48(r3) >> ; CHECK-P9-NEXT: lxv vs5, 32(r3) >> -; CHECK-P9-NEXT: lxv vs6, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f8 >> -; CHECK-P9-NEXT: 
mtfprd f8, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f7 >> -; CHECK-P9-NEXT: xxswapd v2, vs8 >> -; CHECK-P9-NEXT: mtfprd f7, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs7 >> ; CHECK-P9-NEXT: xscvdpsxws f7, f6 >> ; CHECK-P9-NEXT: xxswapd vs6, vs6 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f6, f6 >> +; CHECK-P9-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f7 >> -; CHECK-P9-NEXT: mtfprd f7, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f6 >> -; CHECK-P9-NEXT: mtfprd f6, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs6 >> ; CHECK-P9-NEXT: xscvdpsxws f6, f5 >> ; CHECK-P9-NEXT: xxswapd vs5, vs5 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f6 >> -; CHECK-P9-NEXT: mtfprd f6, r3 >> -; CHECK-P9-NEXT: mffprwz r3, f5 >> -; CHECK-P9-NEXT: vmrglb v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs7 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs6 >> -; CHECK-P9-NEXT: mtfprd f5, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs5 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f5 >> ; CHECK-P9-NEXT: xscvdpsxws f5, f4 >> ; CHECK-P9-NEXT: xxswapd vs4, vs4 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f5 >> -; CHECK-P9-NEXT: mtfprd f5, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs4 >> ; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs5 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; 
CHECK-P9-NEXT: mtfprd f4, r3 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs4 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs3 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> -; CHECK-P9-NEXT: xxswapd v0, vs0 >> -; CHECK-P9-NEXT: vmrglb v5, v5, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r3 >> +; CHECK-P9-NEXT: vmrghb v5, v5, v0 >> ; CHECK-P9-NEXT: vmrglh v4, v5, v4 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: 
xxmrgld v2, v3, v2 >> @@ -649,12 +589,10 @@ define i16 @test2elt_signed(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvdpsxws f1, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: mffprwz r3, f1 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r4, f0 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: xxswapd v3, vs1 >> -; CHECK-P8-NEXT: vmrglb v2, v2, v3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: clrldi r3, r3, 48 >> @@ -666,15 +604,13 @@ define i16 @test2elt_signed(<2 x double> %a) >> local_unnamed_addr #0 { >> ; CHECK-P9: # %bb.0: # %entry >> ; CHECK-P9-NEXT: xscvdpsxws f0, v2 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs0 >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: addi r3, r1, -2 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglb v2, v3, v2 >> +; CHECK-P9-NEXT: vmrghb v2, v3, v2 >> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8 >> ; CHECK-P9-NEXT: stxsihx v2, 0, r3 >> ; CHECK-P9-NEXT: lhz r3, -2(r1) >> @@ -717,18 +653,14 @@ define i32 @test4elt_signed(<4 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: xxswapd v2, vs2 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs3 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; 
CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: xxswapd v5, vs1 >> -; CHECK-P8-NEXT: vmrglb v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglb v3, v5, v4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> +; CHECK-P8-NEXT: vmrghb v2, v4, v2 >> +; CHECK-P8-NEXT: vmrghb v3, v5, v3 >> ; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> @@ -742,24 +674,20 @@ define i32 @test4elt_signed(<4 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> ; CHECK-P9-NEXT: lxv vs0, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: xxswapd v2, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs1 >> -; CHECK-P9-NEXT: xxswapd v4, vs0 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: li r3, 0 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> +; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -818,36 +746,28 @@ define i64 @test8elt_signed(<8 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P8-NEXT: xxswapd vs3, vs3 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: mffprwz 
r4, f5 >> -; CHECK-P8-NEXT: mtfprd f4, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: mffprwz r3, f6 >> -; CHECK-P8-NEXT: mtfprd f5, r4 >> -; CHECK-P8-NEXT: xxswapd v2, vs4 >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f7 >> -; CHECK-P8-NEXT: mtfprd f6, r3 >> -; CHECK-P8-NEXT: xxswapd v3, vs5 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f0 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> -; CHECK-P8-NEXT: xxswapd v4, vs6 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v1, vs7 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v5, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: xxswapd v0, vs1 >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: xxswapd v6, vs2 >> -; CHECK-P8-NEXT: vmrglb v2, v5, v2 >> -; CHECK-P8-NEXT: xxswapd v5, vs0 >> -; CHECK-P8-NEXT: vmrglb v3, v0, v3 >> -; CHECK-P8-NEXT: vmrglb v4, v6, v4 >> -; CHECK-P8-NEXT: vmrglb v5, v5, v1 >> +; CHECK-P8-NEXT: vmrghb v2, v0, v2 >> +; CHECK-P8-NEXT: vmrghb v3, v1, v3 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> +; CHECK-P8-NEXT: vmrghb v4, v0, v4 >> +; CHECK-P8-NEXT: vmrghb v5, v1, v5 >> ; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> ; CHECK-P8-NEXT: vmrglh v3, v5, v4 >> ; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> @@ -861,47 +781,39 @@ define i64 @test8elt_signed(<8 x double>* nocapture >> readonly) local_unnamed_addr >> ; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> +; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: lxv vs0, 48(r3) >> ; CHECK-P9-NEXT: lxv vs1, 32(r3) >> -; CHECK-P9-NEXT: lxv vs2, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz 
r3, f3 >> -; CHECK-P9-NEXT: xxswapd v2, vs4 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: vmrglb v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f1 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs1 >> -; CHECK-P9-NEXT: xxswapd v5, vs0 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> @@ -997,79 +909,63 @@ define <16 x i8> @test16elt_signed(<16 x double>* >> nocapture readonly) local_unna >> ; CHECK-P8-NEXT: xxswapd vs7, vs7 >> ; 
CHECK-P8-NEXT: xscvdpsxws v2, f9 >> ; CHECK-P8-NEXT: xxswapd vs9, vs9 >> -; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: xscvdpsxws v3, f11 >> ; CHECK-P8-NEXT: xxswapd vs11, vs11 >> +; CHECK-P8-NEXT: mffprwz r3, f4 >> ; CHECK-P8-NEXT: mffprwz r4, f6 >> ; CHECK-P8-NEXT: xscvdpsxws f0, f0 >> -; CHECK-P8-NEXT: mtfprd f4, r3 >> -; CHECK-P8-NEXT: mffprwz r3, f8 >> ; CHECK-P8-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P8-NEXT: xxswapd v4, vs4 >> -; CHECK-P8-NEXT: mtfprd f6, r4 >> +; CHECK-P8-NEXT: mtvsrd v4, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f8 >> +; CHECK-P8-NEXT: mtvsrd v5, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f10 >> ; CHECK-P8-NEXT: xscvdpsxws f2, f2 >> -; CHECK-P8-NEXT: xxswapd v5, vs6 >> -; CHECK-P8-NEXT: mtfprd f8, r3 >> -; CHECK-P8-NEXT: mffprwz r3, f12 >> ; CHECK-P8-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P8-NEXT: xxswapd v0, vs8 >> -; CHECK-P8-NEXT: mtfprd f10, r4 >> +; CHECK-P8-NEXT: mtvsrd v0, r3 >> +; CHECK-P8-NEXT: mffprwz r3, f12 >> +; CHECK-P8-NEXT: mtvsrd v1, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f13 >> ; CHECK-P8-NEXT: xscvdpsxws f5, f5 >> -; CHECK-P8-NEXT: xxswapd v1, vs10 >> -; CHECK-P8-NEXT: mtfprd f12, r3 >> -; CHECK-P8-NEXT: mfvsrwz r3, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f7, f7 >> -; CHECK-P8-NEXT: xxswapd v6, vs12 >> -; CHECK-P8-NEXT: mtfprd f13, r4 >> +; CHECK-P8-NEXT: mtvsrd v6, r3 >> +; CHECK-P8-NEXT: mfvsrwz r3, v2 >> +; CHECK-P8-NEXT: mtvsrd v2, r4 >> ; CHECK-P8-NEXT: mfvsrwz r4, v3 >> -; CHECK-P8-NEXT: mtvsrd v2, r3 >> -; CHECK-P8-NEXT: xxswapd v7, vs13 >> -; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: xscvdpsxws f9, f9 >> -; CHECK-P8-NEXT: xxswapd v2, v2 >> ; CHECK-P8-NEXT: xscvdpsxws f11, f11 >> -; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> +; CHECK-P8-NEXT: mtvsrd v7, r4 >> +; CHECK-P8-NEXT: mffprwz r3, f0 >> ; CHECK-P8-NEXT: mffprwz r4, f1 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: xxswapd v3, v3 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f2 >> -; 
CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> ; CHECK-P8-NEXT: mffprwz r4, f3 >> -; CHECK-P8-NEXT: mtfprd f2, r3 >> -; CHECK-P8-NEXT: xxswapd v9, vs1 >> +; CHECK-P8-NEXT: vmrghb v4, v8, v4 >> +; CHECK-P8-NEXT: vmrghb v5, v9, v5 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f5 >> -; CHECK-P8-NEXT: mtfprd f3, r4 >> -; CHECK-P8-NEXT: xxswapd v10, vs2 >> ; CHECK-P8-NEXT: mffprwz r4, f7 >> -; CHECK-P8-NEXT: mtfprd f5, r3 >> +; CHECK-P8-NEXT: vmrghb v0, v8, v0 >> +; CHECK-P8-NEXT: vmrghb v1, v9, v1 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> ; CHECK-P8-NEXT: mffprwz r3, f9 >> -; CHECK-P8-NEXT: mtfprd f7, r4 >> ; CHECK-P8-NEXT: mffprwz r4, f11 >> -; CHECK-P8-NEXT: vmrglb v4, v8, v4 >> -; CHECK-P8-NEXT: xxswapd v8, vs3 >> -; CHECK-P8-NEXT: vmrglb v5, v9, v5 >> -; CHECK-P8-NEXT: xxswapd v9, vs5 >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: mtfprd f1, r4 >> -; CHECK-P8-NEXT: vmrglb v0, v10, v0 >> -; CHECK-P8-NEXT: xxswapd v10, vs7 >> -; CHECK-P8-NEXT: vmrglb v1, v8, v1 >> -; CHECK-P8-NEXT: xxswapd v8, vs0 >> -; CHECK-P8-NEXT: vmrglb v6, v9, v6 >> -; CHECK-P8-NEXT: xxswapd v9, vs1 >> -; CHECK-P8-NEXT: vmrglb v7, v10, v7 >> -; CHECK-P8-NEXT: vmrglb v2, v8, v2 >> -; CHECK-P8-NEXT: vmrglb v3, v9, v3 >> +; CHECK-P8-NEXT: vmrghb v6, v8, v6 >> +; CHECK-P8-NEXT: vmrghb v2, v9, v2 >> +; CHECK-P8-NEXT: mtvsrd v8, r3 >> +; CHECK-P8-NEXT: mtvsrd v9, r4 >> +; CHECK-P8-NEXT: vmrghb v3, v8, v3 >> +; CHECK-P8-NEXT: vmrghb v7, v9, v7 >> ; CHECK-P8-NEXT: vmrglh v4, v5, v4 >> ; CHECK-P8-NEXT: vmrglh v5, v1, v0 >> -; CHECK-P8-NEXT: vmrglh v0, v7, v6 >> -; CHECK-P8-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P8-NEXT: vmrglw v3, v5, v4 >> -; CHECK-P8-NEXT: vmrglw v2, v2, v0 >> -; CHECK-P8-NEXT: xxmrgld v2, v2, v3 >> +; CHECK-P8-NEXT: vmrglh v2, v2, v6 >> +; CHECK-P8-NEXT: vmrglh v3, v7, v3 >> +; CHECK-P8-NEXT: vmrglw v4, v5, v4 >> +; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: 
xxmrgld v2, v2, v4 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: test16elt_signed: >> @@ -1078,94 +974,78 @@ define <16 x i8> @test16elt_signed(<16 x double>* >> nocapture readonly) local_unna >> ; CHECK-P9-NEXT: xscvdpsxws f8, f7 >> ; CHECK-P9-NEXT: xxswapd vs7, vs7 >> ; CHECK-P9-NEXT: xscvdpsxws f7, f7 >> +; CHECK-P9-NEXT: lxv vs6, 16(r3) >> ; CHECK-P9-NEXT: lxv vs0, 112(r3) >> ; CHECK-P9-NEXT: lxv vs1, 96(r3) >> ; CHECK-P9-NEXT: lxv vs2, 80(r3) >> ; CHECK-P9-NEXT: lxv vs3, 64(r3) >> ; CHECK-P9-NEXT: lxv vs4, 48(r3) >> ; CHECK-P9-NEXT: lxv vs5, 32(r3) >> -; CHECK-P9-NEXT: lxv vs6, 16(r3) >> ; CHECK-P9-NEXT: mffprwz r3, f8 >> -; CHECK-P9-NEXT: mtfprd f8, r3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f7 >> -; CHECK-P9-NEXT: xxswapd v2, vs8 >> -; CHECK-P9-NEXT: mtfprd f7, r3 >> -; CHECK-P9-NEXT: xxswapd v3, vs7 >> ; CHECK-P9-NEXT: xscvdpsxws f7, f6 >> ; CHECK-P9-NEXT: xxswapd vs6, vs6 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f6, f6 >> +; CHECK-P9-NEXT: vmrghb v2, v2, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f7 >> -; CHECK-P9-NEXT: mtfprd f7, r3 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f6 >> -; CHECK-P9-NEXT: mtfprd f6, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs6 >> ; CHECK-P9-NEXT: xscvdpsxws f6, f5 >> ; CHECK-P9-NEXT: xxswapd vs5, vs5 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f5, f5 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f6 >> -; CHECK-P9-NEXT: mtfprd f6, r3 >> -; CHECK-P9-NEXT: mffprwz r3, f5 >> -; CHECK-P9-NEXT: vmrglb v2, v2, v3 >> -; CHECK-P9-NEXT: xxswapd v3, vs7 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> ; CHECK-P9-NEXT: vmrglh v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs6 >> -; CHECK-P9-NEXT: mtfprd f5, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs5 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> +; CHECK-P9-NEXT: mffprwz r3, f5 >> ; CHECK-P9-NEXT: xscvdpsxws f5, f4 >> ; CHECK-P9-NEXT: xxswapd vs4, vs4 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; 
CHECK-P9-NEXT: xscvdpsxws f4, f4 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f5 >> -; CHECK-P9-NEXT: mtfprd f5, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs4 >> ; CHECK-P9-NEXT: xscvdpsxws f4, f3 >> ; CHECK-P9-NEXT: xxswapd vs3, vs3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f3 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs5 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> ; CHECK-P9-NEXT: mffprwz r3, f4 >> -; CHECK-P9-NEXT: mtfprd f4, r3 >> +; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P9-NEXT: mtvsrd v3, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> -; CHECK-P9-NEXT: xxswapd v4, vs3 >> ; CHECK-P9-NEXT: xscvdpsxws f3, f2 >> ; CHECK-P9-NEXT: xxswapd vs2, vs2 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f2 >> +; CHECK-P9-NEXT: vmrghb v3, v3, v4 >> ; CHECK-P9-NEXT: mffprwz r3, f3 >> -; CHECK-P9-NEXT: mtfprd f3, r3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs2 >> ; CHECK-P9-NEXT: xscvdpsxws f2, f1 >> ; CHECK-P9-NEXT: xxswapd vs1, vs1 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f1, f1 >> -; CHECK-P9-NEXT: vmrglw v2, v3, v2 >> -; CHECK-P9-NEXT: xxswapd v3, vs4 >> -; CHECK-P9-NEXT: vmrglb v3, v3, v4 >> -; CHECK-P9-NEXT: xxswapd v4, vs3 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> -; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: mffprwz r3, f2 >> -; CHECK-P9-NEXT: mtfprd f2, r3 >> +; CHECK-P9-NEXT: vmrglh v3, v4, v3 >> +; CHECK-P9-NEXT: mtvsrd v4, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: xxswapd v4, vs2 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> ; CHECK-P9-NEXT: 
xscvdpsxws f1, f0 >> ; CHECK-P9-NEXT: xxswapd vs0, vs0 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: xscvdpsxws f0, f0 >> +; CHECK-P9-NEXT: vmrghb v4, v4, v5 >> ; CHECK-P9-NEXT: mffprwz r3, f1 >> -; CHECK-P9-NEXT: mtfprd f1, r3 >> +; CHECK-P9-NEXT: mtvsrd v5, r3 >> ; CHECK-P9-NEXT: mffprwz r3, f0 >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: vmrglb v4, v4, v5 >> -; CHECK-P9-NEXT: xxswapd v5, vs1 >> -; CHECK-P9-NEXT: xxswapd v0, vs0 >> -; CHECK-P9-NEXT: vmrglb v5, v5, v0 >> +; CHECK-P9-NEXT: mtvsrd v0, r3 >> +; CHECK-P9-NEXT: vmrghb v5, v5, v0 >> ; CHECK-P9-NEXT: vmrglh v4, v5, v4 >> ; CHECK-P9-NEXT: vmrglw v3, v4, v3 >> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2 >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll >> index e51af62cb128..5ecd34941b39 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll >> @@ -24,9 +24,9 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr >> #0 { >> ; CHECK-P8-NEXT: xscvuxdsp f1, f1 >> ; CHECK-P8-NEXT: xscvdpspn vs0, f0 >> ; CHECK-P8-NEXT: xscvdpspn vs1, f1 >> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-P8-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -43,12 +43,12 @@ define i64 @test2elt(i32 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> ; CHECK-P9-NEXT: vextuhrx r3, r3, v2 >> ; CHECK-P9-NEXT: clrlwi r3, r3, 16 >> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 >> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 >> ; CHECK-P9-NEXT: mtfprwz f0, r3 >> ; CHECK-P9-NEXT: xscvuxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; 
CHECK-P9-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P9-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -80,25 +80,17 @@ entry: >> define <4 x float> @test4elt(i64 %a.coerce) local_unnamed_addr #1 { >> ; CHECK-P8-LABEL: test4elt: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0@toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0@toc@l >> -; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v4, v2, v3 >> +; CHECK-P8-NEXT: xxlxor v2, v2, v2 >> +; CHECK-P8-NEXT: mtvsrd v3, r3 >> +; CHECK-P8-NEXT: vmrghh v2, v2, v3 >> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: test4elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0@toc@ha >> -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0@toc@l >> -; CHECK-P9-NEXT: lxvx v3, 0, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P9-NEXT: vperm v2, v4, v2, v3 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> +; CHECK-P9-NEXT: xxlxor v3, v3, v3 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> ; CHECK-P9-NEXT: xvcvuxwsp v2, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -121,17 +113,11 @@ entry: >> define void @test8elt(<8 x float>* noalias nocapture sret %agg.result, >> <8 x i16> %a) local_unnamed_addr #2 { >> ; CHECK-P8-LABEL: test8elt: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_0@toc@ha >> -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1@toc@ha >> -; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_0@toc@l >> -; CHECK-P8-NEXT: lvx v3, 0, r4 >> -; CHECK-P8-NEXT: addi r4, r5, .LCPI2_1@toc@l >> -; CHECK-P8-NEXT: lvx v5, 0, r4 >> +; CHECK-P8-NEXT: xxlxor v3, v3, v3 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: vperm v3, v4, v2, v3
>> -; CHECK-P8-NEXT: vperm v2, v4, v2, v5 >> -; CHECK-P8-NEXT: xvcvuxwsp v3, v3 >> +; CHECK-P8-NEXT: vmrglh v4, v3, v2 >> +; CHECK-P8-NEXT: vmrghh v2, v3, v2 >> +; CHECK-P8-NEXT: xvcvuxwsp v3, v4 >> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2 >> ; CHECK-P8-NEXT: stvx v3, 0, r3 >> ; CHECK-P8-NEXT: stvx v2, r3, r4 >> @@ -139,19 +125,13 @@ define void @test8elt(<8 x float>* noalias >> nocapture sret %agg.result, <8 x i16> >> ; >> ; CHECK-P9-LABEL: test8elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0@toc@ha >> -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0@toc@l >> -; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1@toc@ha >> -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1@toc@l >> -; CHECK-P9-NEXT: vperm v3, v4, v2, v3 >> -; CHECK-P9-NEXT: xvcvuxwsp vs0, v3 >> -; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: vperm v2, v4, v2, v3 >> -; CHECK-P9-NEXT: stxv vs0, 0(r3) >> +; CHECK-P9-NEXT: xxlxor v3, v3, v3 >> +; CHECK-P9-NEXT: vmrglh v4, v3, v2 >> +; CHECK-P9-NEXT: vmrghh v2, v3, v2 >> +; CHECK-P9-NEXT: xvcvuxwsp vs0, v4 >> ; CHECK-P9-NEXT: xvcvuxwsp vs1, v2 >> ; CHECK-P9-NEXT: stxv vs1, 16(r3) >> +; CHECK-P9-NEXT: stxv vs0, 0(r3) >> ; CHECK-P9-NEXT: blr >> ; >> ; CHECK-BE-LABEL: test8elt: >> @@ -276,9 +256,9 @@ define i64 @test2elt_signed(i32 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvsxdsp f1, f1 >> ; CHECK-P8-NEXT: xscvdpspn vs0, f0 >> ; CHECK-P8-NEXT: xscvdpspn vs1, f1 >> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-P8-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -295,12 +275,12 @@ define i64 @test2elt_signed(i32 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> ;
CHECK-P9-NEXT: vextuhrx r3, r3, v2 >> ; CHECK-P9-NEXT: extsh r3, r3 >> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 >> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 >> ; CHECK-P9-NEXT: mtfprwa f0, r3 >> ; CHECK-P9-NEXT: xscvsxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P9-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -332,11 +312,10 @@ entry: >> define <4 x float> @test4elt_signed(i64 %a.coerce) local_unnamed_addr #1 >> { >> ; CHECK-P8-LABEL: test4elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> +; CHECK-P8-NEXT: mtvsrd v2, r3 >> ; CHECK-P8-NEXT: vspltisw v3, 8 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> +; CHECK-P8-NEXT: vmrghh v2, v2, v2 >> ; CHECK-P8-NEXT: vadduwm v3, v3, v3 >> -; CHECK-P8-NEXT: vmrglh v2, v2, v2 >> ; CHECK-P8-NEXT: vslw v2, v2, v3 >> ; CHECK-P8-NEXT: vsraw v2, v2, v3 >> ; CHECK-P8-NEXT: xvcvsxwsp v2, v2 >> @@ -344,9 +323,8 @@ define <4 x float> @test4elt_signed(i64 %a.coerce) >> local_unnamed_addr #1 { >> ; >> ; CHECK-P9-LABEL: test4elt_signed: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r3 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vmrglh v2, v2, v2 >> +; CHECK-P9-NEXT: mtvsrd v2, r3 >> +; CHECK-P9-NEXT: vmrghh v2, v2, v2 >> ; CHECK-P9-NEXT: vextsh2w v2, v2 >> ; CHECK-P9-NEXT: xvcvsxwsp v2, v2 >> ; CHECK-P9-NEXT: blr >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll >> index faec95831816..ea8ede3af22a 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll >> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i32 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: test2elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: 
addis r4, r2, .LCPI0_0@toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0@toc@l >> +; CHECK-P8-NEXT: mtvsrwz v2, r3 >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI0_0@toc@l >> ; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> +; CHECK-P8-NEXT: lvx v3, 0, r4 >> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3 >> ; CHECK-P8-NEXT: xvcvuxddp v2, v2 >> ; CHECK-P8-NEXT: blr >> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture >> sret %agg.result, i64 %a.c >> ; CHECK-P8-LABEL: test4elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0@toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1@toc@ha >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1@toc@ha >> +; CHECK-P8-NEXT: mtvsrd v2, r4 >> ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0@toc@l >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1@toc@l >> +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1@toc@l >> ; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> +; CHECK-P8-NEXT: lvx v3, 0, r5 >> ; CHECK-P8-NEXT: lvx v5, 0, r4 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 >> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5 >> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2 >> -; CHECK-P8-NEXT: xvcvuxddp vs1, v3 >> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 >> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5 >> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3 >> +; CHECK-P8-NEXT: xvcvuxddp vs1, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, vs0 >> ; CHECK-P8-NEXT: xxswapd vs1, vs1 >> ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4 >> @@ -74,11 +72,10 @@ define void @test4elt(<4 x double>* noalias nocapture >> sret %agg.result, i64 %a.c >> ; >> ; CHECK-P9-LABEL: test4elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_0@toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI1_0@toc@l
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> ; CHECK-P9-NEXT: xxlxor v4, v4, v4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_1@toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI1_1@toc@l >> @@ -370,14 +367,13 @@ define <2 x double> @test2elt_signed(i32 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: test2elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0@toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0@toc@l >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> +; CHECK-P8-NEXT: mtvsrwz v3, r3 >> ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1@toc@ha >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0@toc@l >> ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1@toc@l >> +; CHECK-P8-NEXT: lvx v2, 0, r4 >> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3 >> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2 >> ; CHECK-P8-NEXT: xxswapd v3, vs0 >> ; CHECK-P8-NEXT: vsld v2, v2, v3 >> ; CHECK-P8-NEXT: vsrad v2, v2, v3 >> @@ -415,17 +411,16 @@ define void @test4elt_signed(<4 x double>* noalias >> nocapture sret %agg.result, i >> ; CHECK-P8-LABEL: test4elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0@toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2@toc@ha >> -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2@toc@l >> -; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: lvx v4, 0, r4 >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2@toc@ha >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1@toc@ha >> +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l >> ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1@toc@l >> +; CHECK-P8-NEXT: lvx v2, 0, r5 >> +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2@toc@l >> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 >> ; CHECK-P8-NEXT: li r4,
16 >> +; CHECK-P8-NEXT: lvx v4, 0, r5 >> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 >> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4 >> ; CHECK-P8-NEXT: xxswapd v4, vs0 >> @@ -443,14 +438,13 @@ define void @test4elt_signed(<4 x double>* noalias >> nocapture sret %agg.result, i >> ; >> ; CHECK-P9-LABEL: test4elt_signed: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_0@toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_0@toc@l >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_1@toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_1@toc@l >> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3 >> ; CHECK-P9-NEXT: vextsh2d v3, v3 >> ; CHECK-P9-NEXT: xvcvsxddp vs0, v3 >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll >> index 6f046f69ecca..f152c2b008ff 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll >> @@ -18,9 +18,9 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr >> #0 { >> ; CHECK-P8-NEXT: xscvuxdsp f0, f0 >> ; CHECK-P8-NEXT: xscvdpspn vs1, f1 >> ; CHECK-P8-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P8-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -30,12 +30,12 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr >> #0 { >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> ; CHECK-P9-NEXT: xscvuxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 >> +; CHECK-P9-NEXT:
xxsldwi v3, vs0, vs0, 3 >> ; CHECK-P9-NEXT: xxlor vs0, v2, v2 >> ; CHECK-P9-NEXT: xscvuxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P9-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -311,9 +311,9 @@ define i64 @test2elt_signed(<2 x i64> %a) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvsxdsp f0, f0 >> ; CHECK-P8-NEXT: xscvdpspn vs1, f1 >> ; CHECK-P8-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P8-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -323,12 +323,12 @@ define i64 @test2elt_signed(<2 x i64> %a) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xxswapd vs0, v2 >> ; CHECK-P9-NEXT: xscvsxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 >> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 >> ; CHECK-P9-NEXT: xxlor vs0, v2, v2 >> ; CHECK-P9-NEXT: xscvsxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P9-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll >> index ce97ed67baa1..f2cb9f5f45fb 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll >> @@ -24,9 +24,9 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr >> #0 { >> ; CHECK-P8-NEXT: xscvuxdsp f1, f1 >> ; 
CHECK-P8-NEXT: xscvdpspn vs0, f0 >> ; CHECK-P8-NEXT: xscvdpspn vs1, f1 >> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-P8-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -43,12 +43,12 @@ define i64 @test2elt(i16 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> ; CHECK-P9-NEXT: vextubrx r3, r3, v2 >> ; CHECK-P9-NEXT: clrlwi r3, r3, 24 >> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 >> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 >> ; CHECK-P9-NEXT: mtfprwz f0, r3 >> ; CHECK-P9-NEXT: xscvuxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P9-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -81,11 +81,10 @@ define <4 x float> @test4elt(i32 %a.coerce) >> local_unnamed_addr #1 { >> ; CHECK-P8-LABEL: test4elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0@toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0@toc@l >> +; CHECK-P8-NEXT: mtvsrwz v2, r3 >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI1_0@toc@l >> ; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> +; CHECK-P8-NEXT: lvx v3, 0, r4 >> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3 >> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2 >> ; CHECK-P8-NEXT: blr >> @@ -121,30 +120,28 @@ define void @test8elt(<8 x float>* noalias >> nocapture sret %agg.result, i64 %a.co >> ; CHECK-P8-LABEL: test8elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0@toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1@toc@ha
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_1@toc@ha >> +; CHECK-P8-NEXT: mtvsrd v2, r4 >> ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0@toc@l >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1@toc@l >> +; CHECK-P8-NEXT: addi r4, r6, .LCPI2_1@toc@l >> ; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> +; CHECK-P8-NEXT: lvx v3, 0, r5 >> ; CHECK-P8-NEXT: lvx v5, 0, r4 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 >> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5 >> -; CHECK-P8-NEXT: xvcvuxwsp v2, v2 >> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 >> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5 >> ; CHECK-P8-NEXT: xvcvuxwsp v3, v3 >> -; CHECK-P8-NEXT: stvx v2, 0, r3 >> -; CHECK-P8-NEXT: stvx v3, r3, r4 >> +; CHECK-P8-NEXT: xvcvuxwsp v2, v2 >> +; CHECK-P8-NEXT: stvx v3, 0, r3 >> +; CHECK-P8-NEXT: stvx v2, r3, r4 >> ; CHECK-P8-NEXT: blr >> ; >> ; CHECK-P9-LABEL: test8elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0@toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0@toc@l >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> ; CHECK-P9-NEXT: xxlxor v4, v4, v4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1@toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1@toc@l >> @@ -292,9 +289,9 @@ define i64 @test2elt_signed(i16 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-NEXT: xscvsxdsp f1, f1 >> ; CHECK-P8-NEXT: xscvdpspn vs0, f0 >> ; CHECK-P8-NEXT: xscvdpspn vs1, f1 >> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-P8-NEXT: vmrglw v2, v3, v2 >> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-P8-NEXT: vmrghw v2, v3, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, v2 >> ; CHECK-P8-NEXT: mffprd r3, f0 >> ; CHECK-P8-NEXT: blr >> @@ -311,12 +308,12 @@ define i64 @test2elt_signed(i16 %a.coerce) >>
local_unnamed_addr #0 { >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> ; CHECK-P9-NEXT: vextubrx r3, r3, v2 >> ; CHECK-P9-NEXT: extsb r3, r3 >> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1 >> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3 >> ; CHECK-P9-NEXT: mtfprwa f0, r3 >> ; CHECK-P9-NEXT: xscvsxdsp f0, f0 >> ; CHECK-P9-NEXT: xscvdpspn vs0, f0 >> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-P9-NEXT: vmrglw v2, v2, v3 >> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-P9-NEXT: vmrghw v2, v2, v3 >> ; CHECK-P9-NEXT: mfvsrld r3, v2 >> ; CHECK-P9-NEXT: blr >> ; >> @@ -349,11 +346,10 @@ define <4 x float> @test4elt_signed(i32 %a.coerce) >> local_unnamed_addr #1 { >> ; CHECK-P8-LABEL: test4elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_0 at toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI5_0 at toc@l >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3 >> +; CHECK-P8-NEXT: mtvsrwz v3, r3 >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI5_0 at toc@l >> +; CHECK-P8-NEXT: lvx v2, 0, r4 >> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2 >> ; CHECK-P8-NEXT: vspltisw v3, 12 >> ; CHECK-P8-NEXT: vadduwm v3, v3, v3 >> ; CHECK-P8-NEXT: vslw v2, v2, v3 >> @@ -392,15 +388,14 @@ define void @test8elt_signed(<8 x float>* noalias >> nocapture sret %agg.result, i6 >> ; CHECK-P8-LABEL: test8elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0 at toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1 at toc@ha >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_1 at toc@ha >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> ; CHECK-P8-NEXT: vspltisw v5, 12 >> +; CHECK-P8-NEXT: li r4, 16 >> ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0 at toc@l >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1 at toc@l >> ; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: lvx v4, 0, r4 >> -; CHECK-P8-NEXT: li r4, 16 >> +; 
CHECK-P8-NEXT: addi r5, r6, .LCPI6_1 at toc@l >> +; CHECK-P8-NEXT: lvx v4, 0, r5 >> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 >> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4 >> ; CHECK-P8-NEXT: vadduwm v4, v5, v5 >> @@ -416,14 +411,13 @@ define void @test8elt_signed(<8 x float>* noalias >> nocapture sret %agg.result, i6 >> ; >> ; CHECK-P9-LABEL: test8elt_signed: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_0 at toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0 at toc@l >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1 at toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1 at toc@l >> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3 >> ; CHECK-P9-NEXT: vextsb2w v3, v3 >> ; CHECK-P9-NEXT: xvcvsxwsp vs0, v3 >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> >> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll >> b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll >> index b4582e844f30..268fc9b7d4cc 100644 >> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll >> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll >> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i16 %a.coerce) >> local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: test2elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r4, r2, .LCPI0_0 at toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0 at toc@l >> +; CHECK-P8-NEXT: mtvsrwz v2, r3 >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI0_0 at toc@l >> ; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> +; CHECK-P8-NEXT: lvx v3, 0, r4 >> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3 >> ; CHECK-P8-NEXT: xvcvuxddp v2, v2 >> ; CHECK-P8-NEXT: blr >> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture >> sret %agg.result, i32 %a.c >> ; CHECK-P8-LABEL: 
test4elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0 at toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1 at toc@ha >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1 at toc@ha >> +; CHECK-P8-NEXT: mtvsrwz v2, r4 >> ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0 at toc@l >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1 at toc@l >> +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1 at toc@l >> ; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> +; CHECK-P8-NEXT: lvx v3, 0, r5 >> ; CHECK-P8-NEXT: lvx v5, 0, r4 >> ; CHECK-P8-NEXT: li r4, 16 >> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 >> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5 >> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2 >> -; CHECK-P8-NEXT: xvcvuxddp vs1, v3 >> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 >> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5 >> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3 >> +; CHECK-P8-NEXT: xvcvuxddp vs1, v2 >> ; CHECK-P8-NEXT: xxswapd vs0, vs0 >> ; CHECK-P8-NEXT: xxswapd vs1, vs1 >> ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4 >> @@ -118,33 +116,32 @@ define void @test8elt(<8 x double>* noalias >> nocapture sret %agg.result, i64 %a.c >> ; CHECK-P8-LABEL: test8elt: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0 at toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_2 at toc@ha >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_2 at toc@ha >> +; CHECK-P8-NEXT: mtvsrd v2, r4 >> +; CHECK-P8-NEXT: addis r4, r2, .LCPI2_3 at toc@ha >> ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0 at toc@l >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_2 at toc@l >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI2_3 at toc@l >> ; CHECK-P8-NEXT: xxlxor v4, v4, v4 >> -; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_3 at toc@ha >> -; CHECK-P8-NEXT: lvx v5, 0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1 at toc@ha >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: addi r5, r5, .LCPI2_3 at toc@l >> 
-; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1 at toc@l >> -; CHECK-P8-NEXT: lvx v0, 0, r5 >> -; CHECK-P8-NEXT: lvx v1, 0, r4 >> +; CHECK-P8-NEXT: lvx v3, 0, r5 >> +; CHECK-P8-NEXT: addi r5, r6, .LCPI2_2 at toc@l >> +; CHECK-P8-NEXT: lvx v0, 0, r4 >> ; CHECK-P8-NEXT: li r4, 48 >> +; CHECK-P8-NEXT: lvx v5, 0, r5 >> +; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1 at toc@ha >> +; CHECK-P8-NEXT: addi r5, r5, .LCPI2_1 at toc@l >> +; CHECK-P8-NEXT: lvx v1, 0, r5 >> +; CHECK-P8-NEXT: vperm v0, v4, v2, v0 >> ; CHECK-P8-NEXT: li r5, 32 >> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2 >> -; CHECK-P8-NEXT: vperm v5, v4, v3, v5 >> -; CHECK-P8-NEXT: vperm v0, v4, v3, v0 >> -; CHECK-P8-NEXT: vperm v3, v4, v3, v1 >> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2 >> -; CHECK-P8-NEXT: xvcvuxddp vs1, v5 >> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3 >> +; CHECK-P8-NEXT: vperm v5, v4, v2, v5 >> +; CHECK-P8-NEXT: vperm v2, v4, v2, v1 >> ; CHECK-P8-NEXT: xvcvuxddp vs2, v0 >> -; CHECK-P8-NEXT: xvcvuxddp vs3, v3 >> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3 >> +; CHECK-P8-NEXT: xvcvuxddp vs1, v5 >> +; CHECK-P8-NEXT: xvcvuxddp vs3, v2 >> +; CHECK-P8-NEXT: xxswapd vs2, vs2 >> ; CHECK-P8-NEXT: xxswapd vs0, vs0 >> ; CHECK-P8-NEXT: xxswapd vs1, vs1 >> -; CHECK-P8-NEXT: xxswapd vs2, vs2 >> ; CHECK-P8-NEXT: xxswapd vs3, vs3 >> ; CHECK-P8-NEXT: stxvd2x vs2, r3, r4 >> ; CHECK-P8-NEXT: li r4, 16 >> @@ -155,11 +152,10 @@ define void @test8elt(<8 x double>* noalias >> nocapture sret %agg.result, i64 %a.c >> ; >> ; CHECK-P9-LABEL: test8elt: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0 at toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0 at toc@l >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> ; CHECK-P9-NEXT: xxlxor v4, v4, v4 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1 at toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1 at toc@l >> @@ -404,14 +400,13 @@ define <2 x double> @test2elt_signed(i16 %a.coerce) >> 
local_unnamed_addr #0 { >> ; CHECK-P8-LABEL: test2elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0 at toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r3 >> -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0 at toc@l >> -; CHECK-P8-NEXT: xxswapd v2, vs0 >> -; CHECK-P8-NEXT: lvx v3, 0, r3 >> +; CHECK-P8-NEXT: mtvsrwz v3, r3 >> ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1 at toc@ha >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0 at toc@l >> ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1 at toc@l >> +; CHECK-P8-NEXT: lvx v2, 0, r4 >> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3 >> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3 >> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2 >> ; CHECK-P8-NEXT: xxswapd v3, vs0 >> ; CHECK-P8-NEXT: vsld v2, v2, v3 >> ; CHECK-P8-NEXT: vsrad v2, v2, v3 >> @@ -449,17 +444,16 @@ define void @test4elt_signed(<4 x double>* noalias >> nocapture sret %agg.result, i >> ; CHECK-P8-LABEL: test4elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0 at toc@ha >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2 at toc@ha >> -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2 at toc@l >> -; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: lvx v4, 0, r4 >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2 at toc@ha >> +; CHECK-P8-NEXT: mtvsrwz v3, r4 >> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1 at toc@ha >> +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l >> ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1 at toc@l >> +; CHECK-P8-NEXT: lvx v2, 0, r5 >> +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2 at toc@l >> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 >> ; CHECK-P8-NEXT: li r4, 16 >> +; CHECK-P8-NEXT: lvx v4, 0, r5 >> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 >> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4 >> ; CHECK-P8-NEXT: xxswapd v4, vs0 >> @@ -523,26 +517,25 @@ entry: >> define void @test8elt_signed(<8 x double>* noalias nocapture sret >> %agg.result, i64 %a.coerce) 
local_unnamed_addr #1 { >> ; CHECK-P8-LABEL: test8elt_signed: >> ; CHECK-P8: # %bb.0: # %entry >> -; CHECK-P8-NEXT: mtfprd f0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_2 at toc@ha >> ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0 at toc@ha >> -; CHECK-P8-NEXT: addis r6, r2, .LCPI6_3 at toc@ha >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_2 at toc@l >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_2 at toc@ha >> +; CHECK-P8-NEXT: mtvsrd v3, r4 >> +; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1 at toc@ha >> ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0 at toc@l >> -; CHECK-P8-NEXT: addi r6, r6, .LCPI6_3 at toc@l >> -; CHECK-P8-NEXT: lvx v4, 0, r4 >> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_4 at toc@ha >> +; CHECK-P8-NEXT: addi r6, r6, .LCPI6_2 at toc@l >> +; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1 at toc@l >> ; CHECK-P8-NEXT: lvx v2, 0, r5 >> -; CHECK-P8-NEXT: xxswapd v3, vs0 >> -; CHECK-P8-NEXT: lvx v5, 0, r6 >> -; CHECK-P8-NEXT: addis r5, r2, .LCPI6_1 at toc@ha >> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_4 at toc@l >> -; CHECK-P8-NEXT: addi r5, r5, .LCPI6_1 at toc@l >> -; CHECK-P8-NEXT: lvx v0, 0, r4 >> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r5 >> +; CHECK-P8-NEXT: addis r5, r2, .LCPI6_3 at toc@ha >> +; CHECK-P8-NEXT: lvx v4, 0, r6 >> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_4 at toc@ha >> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4 >> ; CHECK-P8-NEXT: li r4, 48 >> -; CHECK-P8-NEXT: li r5, 32 >> +; CHECK-P8-NEXT: addi r5, r5, .LCPI6_3 at toc@l >> +; CHECK-P8-NEXT: lvx v5, 0, r5 >> +; CHECK-P8-NEXT: addi r5, r6, .LCPI6_4 at toc@l >> +; CHECK-P8-NEXT: lvx v0, 0, r5 >> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2 >> +; CHECK-P8-NEXT: li r5, 32 >> ; CHECK-P8-NEXT: vperm v4, v3, v3, v4 >> ; CHECK-P8-NEXT: vperm v5, v3, v3, v5 >> ; CHECK-P8-NEXT: vperm v3, v3, v3, v0 >> @@ -572,14 +565,13 @@ define void @test8elt_signed(<8 x double>* noalias >> nocapture sret %agg.result, i >> ; >> ; CHECK-P9-LABEL: test8elt_signed: >> ; CHECK-P9: # %bb.0: # %entry >> -; CHECK-P9-NEXT: mtfprd f0, r4 >> +; CHECK-P9-NEXT: mtvsrd v2, r4 >> ; 
CHECK-P9-NEXT: addis r4, r2, .LCPI6_0 at toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0 at toc@l >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> -; CHECK-P9-NEXT: xxswapd v2, vs0 >> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3 >> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1 at toc@ha >> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1 at toc@l >> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3 >> ; CHECK-P9-NEXT: vextsb2d v3, v3 >> ; CHECK-P9-NEXT: xvcvsxddp vs0, v3 >> ; CHECK-P9-NEXT: lxvx v3, 0, r4 >> >> diff --git >> a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll >> b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll >> index 7e51f2b862ab..29955dc17f67 100644 >> --- a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll >> +++ b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll >> @@ -82,10 +82,10 @@ define <3 x float> @constrained_vector_fdiv_v3f32() >> #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: xscvdpspn 2, 2 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 >> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 >> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: blr >> ; >> @@ -106,12 +106,12 @@ define <3 x float> @constrained_vector_fdiv_v3f32() >> #0 { >> ; PC64LE9-NEXT: xsdivsp 2, 2, 0 >> ; PC64LE9-NEXT: xsdivsp 0, 3, 0 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: blr >> entry: >> @@ -359,11 
+359,11 @@ define <3 x float> @constrained_vector_frem_v3f32() >> #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI7_4 at toc@l >> ; PC64LE-NEXT: lvx 4, 0, 3 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 30 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: addi 1, 1, 64 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -401,15 +401,15 @@ define <3 x float> @constrained_vector_frem_v3f32() >> #0 { >> ; PC64LE9-NEXT: bl fmodf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 29 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI7_4 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI7_4 at toc@l >> ; PC64LE9-NEXT: lxvx 36, 0, 3 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: addi 1, 1, 64 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -710,10 +710,10 @@ define <3 x float> @constrained_vector_fmul_v3f32() >> #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: xscvdpspn 2, 2 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 >> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 >> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: blr >> ; >> @@ -735,11 +735,11 @@ define <3 x float> 
@constrained_vector_fmul_v3f32() >> #0 { >> ; PC64LE9-NEXT: xsmulsp 1, 1, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 1, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 1, 1, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 1, 1, 3 >> ; PC64LE9-NEXT: xscvdpspn 1, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: blr >> entry: >> @@ -925,10 +925,10 @@ define <3 x float> @constrained_vector_fadd_v3f32() >> #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: xscvdpspn 2, 2 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 >> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 >> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: blr >> ; >> @@ -945,15 +945,15 @@ define <3 x float> @constrained_vector_fadd_v3f32() >> #0 { >> ; PC64LE9-NEXT: xsaddsp 1, 0, 1 >> ; PC64LE9-NEXT: xsaddsp 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI17_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI17_3 at toc@l >> ; PC64LE9-NEXT: lxvx 36, 0, 3 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: blr >> entry: >> @@ -1137,10 +1137,10 @@ define <3 x float> >> @constrained_vector_fsub_v3f32() #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; 
PC64LE-NEXT: xscvdpspn 2, 2 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1 >> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3 >> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: blr >> ; >> @@ -1157,15 +1157,15 @@ define <3 x float> >> @constrained_vector_fsub_v3f32() #0 { >> ; PC64LE9-NEXT: xssubsp 1, 0, 1 >> ; PC64LE9-NEXT: xssubsp 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI22_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI22_3 at toc@l >> ; PC64LE9-NEXT: lxvx 36, 0, 3 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: blr >> entry: >> @@ -1333,12 +1333,12 @@ define <3 x float> >> @constrained_vector_sqrt_v3f32() #0 { >> ; PC64LE-NEXT: xssqrtsp 2, 2 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 2 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: blr >> ; >> @@ -1358,10 +1358,10 @@ define <3 x float> >> @constrained_vector_sqrt_v3f32() #0 { >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 1, 1 >> ; PC64LE9-NEXT: xscvdpspn 2, 2 >> 
-; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: blr >> @@ -1588,11 +1588,11 @@ define <3 x float> >> @constrained_vector_pow_v3f32() #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI32_4 at toc@l >> ; PC64LE-NEXT: lvx 4, 0, 3 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 30 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: addi 1, 1, 64 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -1630,15 +1630,15 @@ define <3 x float> >> @constrained_vector_pow_v3f32() #0 { >> ; PC64LE9-NEXT: bl powf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 29 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI32_4 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI32_4 at toc@l >> ; PC64LE9-NEXT: lxvx 36, 0, 3 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: addi 1, 1, 64 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -1992,11 +1992,11 @@ define <3 x float> >> @constrained_vector_powi_v3f32() #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI37_3 at toc@l >> ; PC64LE-NEXT: 
lvx 4, 0, 3 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -2030,15 +2030,15 @@ define <3 x float> >> @constrained_vector_powi_v3f32() #0 { >> ; PC64LE9-NEXT: bl __powisf2 >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI37_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI37_3 at toc@l >> ; PC64LE9-NEXT: lxvx 36, 0, 3 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -2360,12 +2360,12 @@ define <3 x float> >> @constrained_vector_sin_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI42_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI42_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -2396,15 +2396,15 @@ define <3 x float> >> @constrained_vector_sin_v3f32() #0 { 
>> ; PC64LE9-NEXT: bl sinf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI42_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI42_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -2709,12 +2709,12 @@ define <3 x float> >> @constrained_vector_cos_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI47_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI47_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -2745,15 +2745,15 @@ define <3 x float> >> @constrained_vector_cos_v3f32() #0 { >> ; PC64LE9-NEXT: bl cosf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI47_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI47_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> -; PC64LE9-NEXT: 
xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -3058,12 +3058,12 @@ define <3 x float> >> @constrained_vector_exp_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI52_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI52_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -3094,15 +3094,15 @@ define <3 x float> >> @constrained_vector_exp_v3f32() #0 { >> ; PC64LE9-NEXT: bl expf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI52_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI52_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -3407,12 +3407,12 @@ define <3 x float> >> @constrained_vector_exp2_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI57_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI57_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; 
PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -3443,15 +3443,15 @@ define <3 x float> >> @constrained_vector_exp2_v3f32() #0 { >> ; PC64LE9-NEXT: bl exp2f >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI57_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI57_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -3756,12 +3756,12 @@ define <3 x float> >> @constrained_vector_log_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI62_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI62_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -3792,15 +3792,15 @@ define <3 x float> >> @constrained_vector_log_v3f32() #0 { >> ; PC64LE9-NEXT: bl logf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; 
PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI62_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI62_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -4105,12 +4105,12 @@ define <3 x float> >> @constrained_vector_log10_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI67_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI67_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -4141,15 +4141,15 @@ define <3 x float> >> @constrained_vector_log10_v3f32() #0 { >> ; PC64LE9-NEXT: bl log10f >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI67_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI67_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 
1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -4454,12 +4454,12 @@ define <3 x float> >> @constrained_vector_log2_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI72_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI72_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -4490,15 +4490,15 @@ define <3 x float> >> @constrained_vector_log2_v3f32() #0 { >> ; PC64LE9-NEXT: bl log2f >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI72_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI72_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -4748,12 +4748,12 @@ define <3 x float> >> @constrained_vector_rint_v3f32() #0 { >> ; PC64LE-NEXT: xsrdpic 2, 2 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 2 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: 
xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: blr >> ; >> @@ -4773,10 +4773,10 @@ define <3 x float> >> @constrained_vector_rint_v3f32() #0 { >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 1, 1 >> ; PC64LE9-NEXT: xscvdpspn 2, 2 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: blr >> @@ -4947,12 +4947,12 @@ define <3 x float> >> @constrained_vector_nearbyint_v3f32() #0 { >> ; PC64LE-NEXT: addis 3, 2, .LCPI82_3 at toc@ha >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI82_3 at toc@l >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 31 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: addi 1, 1, 48 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -4983,15 +4983,15 @@ define <3 x float> >> @constrained_vector_nearbyint_v3f32() #0 { >> ; PC64LE9-NEXT: bl nearbyintf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 31 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI82_3 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI82_3 at toc@l >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: 
lxvx 35, 0, 3 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: addi 1, 1, 48 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -5184,11 +5184,11 @@ define <3 x float> >> @constrained_vector_maxnum_v3f32() #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI87_5 at toc@l >> ; PC64LE-NEXT: lvx 4, 0, 3 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 30 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 >> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: addi 1, 1, 64 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -5227,15 +5227,15 @@ define <3 x float> >> @constrained_vector_maxnum_v3f32() #0 { >> ; PC64LE9-NEXT: bl fmaxf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 29 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI87_5 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI87_5 at toc@l >> ; PC64LE9-NEXT: lxvx 36, 0, 3 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: addi 1, 1, 64 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -5471,11 +5471,11 @@ define <3 x float> >> @constrained_vector_minnum_v3f32() #0 { >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> ; PC64LE-NEXT: addi 3, 3, .LCPI92_5 at toc@l >> ; PC64LE-NEXT: lvx 4, 0, 3 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 30 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 2, 3 
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 2, 3 >> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE-NEXT: addi 1, 1, 64 >> ; PC64LE-NEXT: ld 0, 16(1) >> @@ -5514,15 +5514,15 @@ define <3 x float> >> @constrained_vector_minnum_v3f32() #0 { >> ; PC64LE9-NEXT: bl fminf >> ; PC64LE9-NEXT: nop >> ; PC64LE9-NEXT: xscvdpspn 0, 1 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 29 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: xscvdpspn 0, 30 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI92_5 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI92_5 at toc@l >> ; PC64LE9-NEXT: lxvx 36, 0, 3 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 3, 2, 4 >> ; PC64LE9-NEXT: addi 1, 1, 64 >> ; PC64LE9-NEXT: ld 0, 16(1) >> @@ -5686,9 +5686,9 @@ define <2 x float> >> @constrained_vector_fptrunc_v2f64() #0 { >> ; PC64LE-NEXT: xsrsp 1, 1 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> ; PC64LE-NEXT: blr >> ; >> ; PC64LE9-LABEL: constrained_vector_fptrunc_v2f64: >> @@ -5698,12 +5698,12 @@ define <2 x float> >> @constrained_vector_fptrunc_v2f64() #0 { >> ; PC64LE9-NEXT: addis 3, 2, .LCPI96_1 at toc@ha >> ; PC64LE9-NEXT: xsrsp 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: lfd 0, .LCPI96_1 at toc@l(3) >> ; PC64LE9-NEXT: xsrsp 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> 
+; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: blr >> entry: >> %result = call <2 x float> >> @llvm.experimental.constrained.fptrunc.v2f32.v2f64( >> @@ -5729,12 +5729,12 @@ define <3 x float> >> @constrained_vector_fptrunc_v3f64() #0 { >> ; PC64LE-NEXT: xsrsp 2, 2 >> ; PC64LE-NEXT: xscvdpspn 0, 0 >> ; PC64LE-NEXT: xscvdpspn 1, 1 >> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE-NEXT: xscvdpspn 0, 2 >> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1 >> -; PC64LE-NEXT: vmrglw 2, 3, 2 >> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3 >> +; PC64LE-NEXT: vmrghw 2, 3, 2 >> ; PC64LE-NEXT: lvx 3, 0, 3 >> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE-NEXT: blr >> ; >> @@ -5745,20 +5745,20 @@ define <3 x float> >> @constrained_vector_fptrunc_v3f64() #0 { >> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_1 at toc@ha >> ; PC64LE9-NEXT: xsrsp 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3 >> ; PC64LE9-NEXT: lfd 0, .LCPI97_1 at toc@l(3) >> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_2 at toc@ha >> ; PC64LE9-NEXT: addi 3, 3, .LCPI97_2 at toc@l >> ; PC64LE9-NEXT: xsrsp 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1 >> -; PC64LE9-NEXT: vmrglw 2, 3, 2 >> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3 >> +; PC64LE9-NEXT: vmrghw 2, 3, 2 >> ; PC64LE9-NEXT: lxvx 35, 0, 3 >> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_3 at toc@ha >> ; PC64LE9-NEXT: lfd 0, .LCPI97_3 at toc@l(3) >> ; PC64LE9-NEXT: xsrsp 0, 0 >> ; PC64LE9-NEXT: xscvdpspn 0, 0 >> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1 >> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3 >> ; PC64LE9-NEXT: vperm 2, 4, 2, 3 >> ; PC64LE9-NEXT: blr >> entry: >> >> diff --git a/llvm/test/CodeGen/PowerPC/vsx.ll >> b/llvm/test/CodeGen/PowerPC/vsx.ll >> index 8b4e3640ef6b..4a78218262ca 100644 >> --- a/llvm/test/CodeGen/PowerPC/vsx.ll >> +++ 
b/llvm/test/CodeGen/PowerPC/vsx.ll >> @@ -1404,9 +1404,9 @@ define <2 x float> @test44(<2 x i64> %a) { >> ; CHECK-LE-NEXT: xscvuxdsp f0, f0 >> ; CHECK-LE-NEXT: xscvdpspn vs1, f1 >> ; CHECK-LE-NEXT: xscvdpspn vs0, f0 >> -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-LE-NEXT: vmrglw v2, v3, v2 >> +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-LE-NEXT: vmrghw v2, v3, v2 >> ; CHECK-LE-NEXT: blr >> %v = uitofp <2 x i64> %a to <2 x float> >> ret <2 x float> %v >> @@ -1486,9 +1486,9 @@ define <2 x float> @test45(<2 x i64> %a) { >> ; CHECK-LE-NEXT: xscvsxdsp f0, f0 >> ; CHECK-LE-NEXT: xscvdpspn vs1, f1 >> ; CHECK-LE-NEXT: xscvdpspn vs0, f0 >> -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1 >> -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1 >> -; CHECK-LE-NEXT: vmrglw v2, v3, v2 >> +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3 >> +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3 >> +; CHECK-LE-NEXT: vmrghw v2, v3, v2 >> ; CHECK-LE-NEXT: blr >> %v = sitofp <2 x i64> %a to <2 x float> >> ret <2 x float> %v >> @@ -2437,12 +2437,11 @@ define <2 x i32> @test80(i32 %v) { >> ; >> ; CHECK-LE-LABEL: test80: >> ; CHECK-LE: # %bb.0: >> -; CHECK-LE-NEXT: mtfprd f0, r3 >> +; CHECK-LE-NEXT: mtfprwz f0, r3 >> ; CHECK-LE-NEXT: addis r4, r2, .LCPI65_0 at toc@ha >> ; CHECK-LE-NEXT: addi r3, r4, .LCPI65_0 at toc@l >> -; CHECK-LE-NEXT: xxswapd vs0, vs0 >> +; CHECK-LE-NEXT: xxspltw v2, vs0, 1 >> ; CHECK-LE-NEXT: lvx v3, 0, r3 >> -; CHECK-LE-NEXT: xxspltw v2, vs0, 3 >> ; CHECK-LE-NEXT: vadduwm v2, v2, v3 >> ; CHECK-LE-NEXT: blr >> %b1 = insertelement <2 x i32> undef, i32 %v, i32 0 >> >> diff --git a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll >> b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll >> index 5c05f8dc3d81..a198604f79a4 100644 >> --- a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll >> +++ b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll >> @@ -17,17 +17,15 @@ define <2 x double> @testi0(<2 
x double>* %p1, >> double* %p2) { >> ; CHECK-NEXT: lxvd2x vs0, 0, r3 >> ; CHECK-NEXT: lfdx f1, 0, r4 >> ; CHECK-NEXT: xxswapd vs0, vs0 >> -; CHECK-NEXT: xxspltd vs1, vs1, 0 >> -; CHECK-NEXT: xxpermdi v2, vs0, vs1, 1 >> +; CHECK-NEXT: xxmrghd v2, vs0, vs1 >> ; CHECK-NEXT: blr >> ; >> ; CHECK-P9-VECTOR-LABEL: testi0: >> ; CHECK-P9-VECTOR: # %bb.0: >> ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3 >> ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4 >> -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0 >> ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0 >> -; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs0, vs1, 1 >> +; CHECK-P9-VECTOR-NEXT: xxmrghd v2, vs0, vs1 >> ; CHECK-P9-VECTOR-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testi0: >> @@ -51,17 +49,15 @@ define <2 x double> @testi1(<2 x double>* %p1, >> double* %p2) { >> ; CHECK-NEXT: lxvd2x vs0, 0, r3 >> ; CHECK-NEXT: lfdx f1, 0, r4 >> ; CHECK-NEXT: xxswapd vs0, vs0 >> -; CHECK-NEXT: xxspltd vs1, vs1, 0 >> -; CHECK-NEXT: xxmrgld v2, vs1, vs0 >> +; CHECK-NEXT: xxpermdi v2, vs1, vs0, 1 >> ; CHECK-NEXT: blr >> ; >> ; CHECK-P9-VECTOR-LABEL: testi1: >> ; CHECK-P9-VECTOR: # %bb.0: >> ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3 >> ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4 >> -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0 >> ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0 >> -; CHECK-P9-VECTOR-NEXT: xxmrgld v2, vs1, vs0 >> +; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs1, vs0, 1 >> ; CHECK-P9-VECTOR-NEXT: blr >> ; >> ; CHECK-P9-LABEL: testi1: >> >> >> >> _______________________________________________ >> llvm-commits mailing list >> llvm-commits at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From llvm-commits at lists.llvm.org Wed Jul 8 07:56:48 2020 From: llvm-commits at lists.llvm.org (dmajor via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:56:48 +0000 (UTC) Subject: [PATCH] D82717: [InstSimplify] Fold icmp with dominating assume In-Reply-To: References: Message-ID: dmajor added a comment. After this commit, Firefox builds crash inside the `isValidAssumeForContext`. I filed https://bugs.llvm.org/show_bug.cgi?id=46638. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82717/new/ https://reviews.llvm.org/D82717 From llvm-commits at lists.llvm.org Wed Jul 8 07:57:21 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:57:21 +0000 (UTC) Subject: [PATCH] D81267: [LV] Enable the LoopVectorizer to create pointer inductions In-Reply-To: References: Message-ID: <798a11b4ea43f935ef7315cf839b3210@localhost.localdomain> fhahn added inline comments. ================ Comment at: llvm/test/Transforms/LoopVectorize/pointer-induction.ll:4 +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-unknown-linux-gnu" + ---------------- dmgreen wrote: > I think if you use x86 as a target (and needs it for the costing), the test needs to go into test/Transforms/LoopVectorize/X86 in case the target is not compiled in. It looks like the options above actually force vectorization with a certain factor. In that case, it Is probably best to remove the triple. I'd also consider just checking the loop-vectorize output (without -dce -instcombine), if it is not too messy, as it makes the test more prone to break when something changes in instcombine. Also, it might be possible to only specifically check the IR related to the generated induction, rather than autogenerating the checks, which include a lot of relatively irrelevant stuff. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81267/new/ https://reviews.llvm.org/D81267 From llvm-commits at lists.llvm.org Wed Jul 8 07:59:51 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 14:59:51 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <58d0513cce5eeadca3c50730fb8106c7@localhost.localdomain> sameerarora101 marked 11 inline comments as done. sameerarora101 added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:32 +# FORMAT-NEXT: [[PREFIX]]-input2.o +# FORMAT_NOT: {{.}} + ---------------- smeenai wrote: > You have an underscore instead of a dash :) > > Is the purpose to ensure that there's no other members? I assume a -EMPTY would work for that. We should also check for the "Archive : " header, to ensure there's no members before the table of contents member. Thanks for catching the underscore. I added a check for "Archive : " header now. However, using `FORMAT-EMPTY` would just check that the next line (after 2nd member) has nothing on it. What I thought we wanted to check was that there is nothing at all after the second member. For eg, ``` Archive : ... ... __.SYMDEF ...-input1.o ...-input2.o something here ``` would pass with `FORMAT-EMPTY` just below the check for second member, right? But we want it to fail. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/hide-unrelated-options.test:1 +## This test checks that unrelated options are hidden in help text. + ---------------- smeenai wrote: > This seems unrelated to this diff; perhaps it should be in the previous one? Yup, thanks. 
Moved it to the previous diff D82923 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Wed Jul 8 08:04:06 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:04:06 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <510365f3d25e8946b1cbd599bfe05650@localhost.localdomain> jdoerfert accepted this revision. jdoerfert added a comment. This revision is now accepted and ready to land. LGTM. Thanks for all the hard work on making this possible :) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Wed Jul 8 08:06:51 2020 From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:06:51 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <41a7f705dc2b31541ff82531324cb660@localhost.localdomain> ebevhan added a comment. In D83216#2139135 , @lebedev.ri wrote: > i see. > > I think this is fine, but just to be safe, may i suggest to do an RFC for these two intrinsics specifically, > just so we're 100% sure everyone is on the same page about them? Sure, I'll send one out. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 08:07:04 2020 From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:07:04 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <38c21647511171eecf65e4433616a139@localhost.localdomain> schweitz added a comment. 
In D83397#2139131 , @echristo wrote: > In D83397#2139111 , @schweitz wrote: > > > Hi Eric, > > > > There is an active development branch for the flang middle end. https://github.com/flang-compiler/f18-llvm-project/tree/fir-dev > > > That's not part of the llvm project. > > > That code base is being upstreamed piecemeal. Not all of the code is upstreamed at this point. It is simply a false impression that code in the middle of being upstreamed is "unused" or "unnecessary". Since it not all of it is upstreamed, changing interfaces and support code in llvm-project directly is going to cause problems that can become hard to track and resolve while the upstreaming is ongoing. > > It very much is unused and unnecessary as there are no pieces of that code in the repository. We are stuck between a rock and a hard place. Flang needs a middle end, as it currently can't generate code. Work on that middle end is being done. We are following the rules and trying to upstream that code in small "reviewable" chunks as quickly as possible. It's just simply going to be the case because of the interdependencies involved and the small chunks process that some code will merely appear to be temporally unused. If you have a better solution to how to change the upstream process, then our group would be happy to hear it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 08:07:09 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:07:09 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: sameerarora101 updated this revision to Diff 276440. sameerarora101 added a comment. 
Updating tests Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 Files: llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/create-static-lib.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/test/tools/llvm-libtool-darwin/missing-library-type.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83002.276440.patch Type: text/x-patch Size: 10702 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:09:59 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:09:59 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: clementval updated this revision to Diff 276441. clementval marked 6 inline comments as done. clementval added a comment. Address review comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 Files: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83363.276441.patch Type: text/x-patch Size: 7228 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:10:04 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:10:04 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: clementval added a comment. 
Thanks for the review. I just updated the patch. ================ Comment at: llvm/test/TableGen/directive1.td:120 +// IMPL-NEXT: } +// IMPL-NEXT: } break; +// IMPL-NEXT: default: ---------------- jdenny wrote: > In most examples I've seen, the outer `case` would be formatted as: > > ``` > case TDLD_dira: { > . . . > break; > } > ``` > > clang-format puts the `{` on the same line as the `case`. > > grep shows some code putting `break;` on the same line as the `}` and some code putting it on the line before. However, I did at least find [[ http://llvm.org/docs/WritingAnLLVMBackend.html#instruction-selector | one example in LLVM docs ]] showing the way I'm suggesting above. > > Alternatively, [[ http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return | as in this example ]], couldn't those braces be dropped given that there are no local declarations? I guess we can drop the braces and then it's clear where the `break;` is going. ================ Comment at: llvm/test/TableGen/directive1.td:124 +// IMPL-NEXT: } +// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } ---------------- jdenny wrote: > The unreachable message doesn't make sense given the `default` in the directive switch. If that switch covers all directives, `default` isn't needed anyway. Will remove it. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:255 const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); - GenerateTestForAllowedClauses(RequiredClauses, OS, DirectiveName, + GenerateCaseForAllowedClauses(RequiredClauses, OS, DirectiveName, DirectivePrefix, ClausePrefix); ---------------- jdenny wrote: > Maybe `GenerateCaseForVersionedClauses` given that it's not just `allowedClauses`? Yeah might be better and less confusing in the future. 
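For illustration only, the nested-switch shape discussed in this review might look roughly like the sketch below. The enum names, the clause sets, and the version bounds are hypothetical stand-ins loosely modeled on the names in directive1.td. This is not the actual output of DirectiveEmitter.cpp. The braces around each outer `case` are dropped, as suggested, since there are no local declarations, and `std::abort()` stands in for `llvm_unreachable` so the snippet compiles without LLVM headers.

```cpp
#include <cstdlib>

// Hypothetical directive and clause kinds (assumptions for this sketch).
enum class Directive { TDLD_dira, TDLD_dirb };
enum class Clause { TDLC_clausea, TDLC_clauseb, TDLC_clausec };

// Returns true if clause C is allowed on directive D for the given version.
// The version ranges are made up for illustration.
bool isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) {
  switch (D) {
  case Directive::TDLD_dira:
    switch (C) {
    case Clause::TDLC_clausea:
      return 1u <= Version && Version <= 5u;
    case Clause::TDLC_clauseb:
      return 2u <= Version && Version <= 4u;
    default:
      // Any clause not listed above is rejected for this directive.
      return false;
    }
    break;
  case Directive::TDLD_dirb:
    // No clauses are allowed on this directive in the sketch.
    return false;
  }
  // The outer switch covers every directive and has no default, so this
  // point is only reached for an invalid enum value.
  std::abort();
}
```

Because the outer switch has no `default`, a compiler that warns on unhandled enumerators can flag a newly added directive that lacks a `case`, which is one reason to leave the `default` out when all directives are covered.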
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Wed Jul 8 08:10:43 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 15:10:43 +0000 (UTC) Subject: [PATCH] D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes In-Reply-To: References: Message-ID: nhaehnle added a comment. In D83089#2134858, @kuhar wrote: > This part of the code is *very* performance sensitive and definitely needs benchmarking before moving forward. Have you tried doing some performance evaluation on this change? I suggest compiling a few mid to large size programs (e.g., sqlite, webassembly, opt, clang, rippled) and compiling them into whole-program bitcode, and then running `opt -O3` on this bitcode. This is pretty easy with gllvm; I can dig up my old instruction if that would help. I've done this now: used gllvm to extract bitcode of Z3, then run `perf stat opt -O3 z3.bc -o z3.out.bc`. 3 runs on both my branch and the underlying master:

Underlying master:

  1.441.691.613.522 cycles:u        # 3,948 GHz (83,34%)
  1.511.470.542.186 instructions:u  # 1,05 insn per cycle

  1.445.040.063.358 cycles:u        # 3,832 GHz (83,33%)
  1.511.151.342.715 instructions:u  # 1,05 insn per cycle

  1.445.488.489.379 cycles:u        # 3,851 GHz (83,34%)
  1.510.528.339.517 instructions:u  # 1,04 insn per cycle

My branch:

  1.447.208.247.196 cycles:u        # 3,920 GHz (83,33%)
  1.506.361.797.982 instructions:u  # 1,04 insn per cycle

  1.451.415.961.317 cycles:u        # 3,913 GHz (83,33%)
  1.505.605.385.713 instructions:u  # 1,04 insn per cycle

  1.445.005.341.369 cycles:u        # 3,908 GHz (83,33%)
  1.506.364.033.713 instructions:u  # 1,04 insn per cycle

So the results are a bit unintuitive: lower number of instructions overall on my branch, slightly higher average number of cycles.
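As a rough cross-check of the comparison above, the per-side averages can be computed from the six quoted runs. The sketch below copies the counter values verbatim from the message; the helper names `mean` and `relativeChangePercent` are made up for this example and are not part of any LLVM or perf tooling.

```cpp
#include <array>
#include <cstdint>
#include <numeric>

using Samples = std::array<std::uint64_t, 3>;

// Counter values copied from the three `perf stat` runs on each side.
constexpr Samples MasterCycles = {1441691613522ULL, 1445040063358ULL,
                                  1445488489379ULL};
constexpr Samples MasterInstructions = {1511470542186ULL, 1511151342715ULL,
                                        1510528339517ULL};
constexpr Samples BranchCycles = {1447208247196ULL, 1451415961317ULL,
                                  1445005341369ULL};
constexpr Samples BranchInstructions = {1506361797982ULL, 1505605385713ULL,
                                        1506364033713ULL};

// Arithmetic mean of the three samples, computed in double precision.
double mean(const Samples &S) {
  return std::accumulate(S.begin(), S.end(), 0.0) / S.size();
}

// Relative change of the branch mean versus the master mean, in percent.
// Positive means the branch value is higher.
double relativeChangePercent(const Samples &Master, const Samples &Branch) {
  return (mean(Branch) / mean(Master) - 1.0) * 100.0;
}
```

Under these definitions, `relativeChangePercent(MasterCycles, BranchCycles)` comes out at roughly +0.26% and `relativeChangePercent(MasterInstructions, BranchInstructions)` at roughly -0.33%. With only three runs per side, and a spread of about 0.26% between the lowest and highest cycle counts on master alone, the cycle difference may well be within run-to-run noise.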
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83089/new/ https://reviews.llvm.org/D83089 From llvm-commits at lists.llvm.org Wed Jul 8 08:12:56 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:12:56 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <831b48a4ec81892ea111a292bdb8bd4a@localhost.localdomain> echristo added a comment. In D83397#2139187 , @schweitz wrote: > In D83397#2139131 , @echristo wrote: > > > In D83397#2139111 , @schweitz wrote: > > > > > Hi Eric, > > > > > > There is an active development branch for the flang middle end. https://github.com/flang-compiler/f18-llvm-project/tree/fir-dev > > > > > > That's not part of the llvm project. > > > > > That code base is being upstreamed piecemeal. Not all of the code is upstreamed at this point. It is simply a false impression that code in the middle of being upstreamed is "unused" or "unnecessary". Since it not all of it is upstreamed, changing interfaces and support code in llvm-project directly is going to cause problems that can become hard to track and resolve while the upstreaming is ongoing. > > > > It very much is unused and unnecessary as there are no pieces of that code in the repository. > > > We are stuck between a rock and a hard place. Flang needs a middle end, as it currently can't generate code. Work on that middle end is being done. We are following the rules and trying to upstream that code in small "reviewable" chunks as quickly as possible. It's just simply going to be the case because of the interdependencies involved and the small chunks process that some code will merely appear to be temporally unused. > > If you have a better solution to how to change the upstream process, then our group would be happy to hear it. You commit something as you get a use for it. Partial files etc. 
At least this is what the rest of us have done. :) You also still haven't replied with "what bot/compiler/etc is going to break by making this change". Would you please do that? Thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 08:13:50 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 15:13:50 +0000 (UTC) Subject: [PATCH] D82788: AMDGPU: Fix alignment requirements for 96bit and 128bit local loads and stores In-Reply-To: References: Message-ID: <911c8936139761673faed8b82a4ac68e@localhost.localdomain> nhaehnle added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h:700 + bool hasUnalignedDSAccess() const { + return UnalignedDSAccess; + } ---------------- arsenm wrote: > I believe this is actually the same control as UnalignedBufferAccess, so a new feature isn't needed (but this needs double checking) I believe LDS only became fully-featured with gfx9. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82788/new/ https://reviews.llvm.org/D82788 From llvm-commits at lists.llvm.org Wed Jul 8 08:14:34 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:14:34 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: <526a382a1d5f74d936c924e50dd630d7@localhost.localdomain> kamaub accepted this revision. kamaub added a comment.
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 From llvm-commits at lists.llvm.org Wed Jul 8 08:14:57 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:14:57 +0000 (UTC) Subject: [PATCH] D83288: [LV] Pick vector loop body as insert point for SCEV expansion. In-Reply-To: References: Message-ID: <5905be79e1f8eea5f02ee87ede7c65bc@localhost.localdomain> dmgreen accepted this revision. dmgreen added a comment. This revision is now accepted and ready to land. Thanks. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83288/new/ https://reviews.llvm.org/D83288 From llvm-commits at lists.llvm.org Wed Jul 8 08:15:22 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 15:15:22 +0000 (UTC) Subject: [PATCH] D83087: DomTree: remove explicit use of DomTreeNodeBase::iterator In-Reply-To: References: Message-ID: <6e51ef276d884aed7596bc4eca51c1f9@localhost.localdomain> nhaehnle added a comment. In D83087#2134881 , @kuhar wrote: > modulo accidental formatting changes. I'm not aware of any. Some line breaks changed because "const_iterator" is longer than "iterator". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83087/new/ https://reviews.llvm.org/D83087 From llvm-commits at lists.llvm.org Wed Jul 8 08:16:36 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:16:36 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <86531d51dd8b59a0e4a953a45b6d1d68@localhost.localdomain> clementval updated this revision to Diff 276442. clementval added a comment. 
Small fix to indent in lit test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 Files: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83363.276442.patch Type: text/x-patch Size: 7232 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:18:28 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:18:28 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types In-Reply-To: References: Message-ID: <8435a8fee41cd2612514bb90a451387d@localhost.localdomain> nemanjai added a comment. Minor nits, otherwise LGTM. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp:97 + // name offset bits flags + {"fixup_ppc_br24", 6, 24, MCFixupKindInfo::FKF_IsPCRel}, + {"fixup_ppc_br24_notoc", 6, 24, MCFixupKindInfo::FKF_IsPCRel}, ---------------- kamaub wrote: > I think it might be a good idea to ignore the clang-format suggestions in this case since the previous way is a lot more readable. Yes. Please do not change the existing ones. Unrelated whitespace changes are quite detrimental to meaningful git log history. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp:413 + case PPC::fixup_ppc_imm34: + switch (Modifier) { + default: ---------------- I think we can just skip the `switch` that does nothing and add an `llvm_unreachable` for this case. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:399 def PLI8 : MLS_DForm_SI34_RT5<14, (outs g8rc:$RT), - (ins s34imm:$SI), + (ins s34imm_pcrel:$SI), "pli $RT, $SI", IIC_IntSimple, []>; ---------------- It seems very odd to me that we would use the `_pcrel` version here.
There should be no way to do anything PC-relative with this instruction since it will necessarily set the PC-Rel bit to zero. The immediate should always be a real immediate (never any fixup). So although it doesn't matter, we should probably not use the `_pcrel` version because it will be confusing. I was certainly confused and wrote about 3 versions of this comment :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83255/new/ https://reviews.llvm.org/D83255 From llvm-commits at lists.llvm.org Wed Jul 8 08:18:53 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:18:53 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types In-Reply-To: References: Message-ID: <095054a43bb21df9da85f73e5d7acc3e@localhost.localdomain> nemanjai accepted this revision. nemanjai added a comment. Forgot to select Accept. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83255/new/ https://reviews.llvm.org/D83255 From llvm-commits at lists.llvm.org Wed Jul 8 08:19:47 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via llvm-commits) Date: Wed, 08 Jul 2020 08:19:47 -0700 (PDT) Subject: [llvm] 6403009 - SLP: honor requested max vector size merging PHIs Message-ID: <5f05e413.1c69fb81.3a60c.082d@mx.google.com> Author: Stanislav Mekhanoshin Date: 2020-07-08T08:06:15-07:00 New Revision: 64030099c378062131fa1b29742a783f2ca14c17 URL: https://github.com/llvm/llvm-project/commit/64030099c378062131fa1b29742a783f2ca14c17 DIFF: https://github.com/llvm/llvm-project/commit/64030099c378062131fa1b29742a783f2ca14c17.diff LOG: SLP: honor requested max vector size merging PHIs At the moment this place does not check the maximum size set by TTI and just creates vectors of the maximum possible size.
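As a rough illustration of the cap this commit introduces (not code from the patch itself), the sketch below computes how many same-typed PHI values may be merged into one vector. The helper names `maxNumElts` and `phiGroupSizes` are hypothetical and are not LLVM APIs; the sketch also assumes the element size is known in bits, whereas the patch falls back to the register size for unsized types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): number of elements of width
// EltSizeBits that fit into a vector register of MaxVecRegSizeBits.
unsigned maxNumElts(unsigned MaxVecRegSizeBits, unsigned EltSizeBits) {
  return EltSizeBits == 0 ? 0 : MaxVecRegSizeBits / EltSizeBits;
}

// Hypothetical helper: split a run of NumPhis same-typed PHI values into
// groups no larger than the register-size cap. If fewer than two elements
// fit, nothing is grouped, mirroring the "MaxNumElts < 2" early exit in
// the diff. A trailing group may be smaller than the cap.
std::vector<std::size_t> phiGroupSizes(std::size_t NumPhis,
                                       unsigned MaxVecRegSizeBits,
                                       unsigned EltSizeBits) {
  std::vector<std::size_t> Groups;
  const std::size_t Cap = maxNumElts(MaxVecRegSizeBits, EltSizeBits);
  if (Cap < 2)
    return Groups;
  for (std::size_t I = 0; I < NumPhis; I += Cap)
    Groups.push_back(std::min(Cap, NumPhis - I));
  return Groups;
}
```

Under these assumptions, 32 float PHIs with a 256-bit limit split into four groups of 8, which is consistent with the `<8 x float>` vectors in the updated MAX256 checks, while a 32-bit limit yields no groups, consistent with the scalar MAX32 checks.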
Differential Revision: https://reviews.llvm.org/D82227 Added: Modified: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/X86/remark_unsupported.ll llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp index 5bb05c6ac3d1..d4b16fac985d 100644 --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -7361,6 +7361,7 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) { bool Changed = false; SmallVector Incoming; SmallPtrSet VisitedInstrs; + unsigned MaxVecRegSize = R.getMaxVecRegSize(); bool HaveVectorizedPhiNodes = true; while (HaveVectorizedPhiNodes) { @@ -7387,8 +7388,18 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) { // Look for the next elements with the same type. SmallVector::iterator SameTypeIt = IncIt; + Type *EltTy = (*IncIt)->getType(); + unsigned EltSize = EltTy->isSized() ? 
DL->getTypeSizeInBits(EltTy) + : MaxVecRegSize; + unsigned MaxNumElts = MaxVecRegSize / EltSize; + if (MaxNumElts < 2) { + ++IncIt; + continue; + } + while (SameTypeIt != E && - (*SameTypeIt)->getType() == (*IncIt)->getType()) { + (*SameTypeIt)->getType() == EltTy && + (SameTypeIt - IncIt) < MaxNumElts) { VisitedInstrs.insert(*SameTypeIt); ++SameTypeIt; } diff --git a/llvm/test/Transforms/SLPVectorizer/X86/remark_unsupported.ll b/llvm/test/Transforms/SLPVectorizer/X86/remark_unsupported.ll index a134aec00bbb..afc08aa4fb27 100644 --- a/llvm/test/Transforms/SLPVectorizer/X86/remark_unsupported.ll +++ b/llvm/test/Transforms/SLPVectorizer/X86/remark_unsupported.ll @@ -1,5 +1,5 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -slp-vectorizer -pass-remarks-output=%t < %s | FileCheck %s +; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -slp-vectorizer --slp-max-reg-size=256 -pass-remarks-output=%t < %s | FileCheck %s ; RUN: FileCheck --input-file=%t --check-prefix=YAML %s ; This type is not supported by SLP diff --git a/llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll b/llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll index e349016fa4f2..466e83d0260a 100644 --- a/llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll +++ b/llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll @@ -8,94 +8,74 @@ define void @phi_float32(half %hval, float %fval) { ; MAX32-NEXT: bb: ; MAX32-NEXT: br label [[BB1:%.*]] ; MAX32: bb1: -; MAX32-NEXT: [[TMP0:%.*]] = insertelement <4 x half> undef, half [[HVAL:%.*]], i32 0 -; MAX32-NEXT: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half [[HVAL]], i32 1 -; MAX32-NEXT: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half [[HVAL]], i32 2 -; MAX32-NEXT: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half [[HVAL]], i32 3 -; MAX32-NEXT: [[TMP4:%.*]] = fpext <4 x half> [[TMP3]] to <4 x float> -; MAX32-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 
x float> [[TMP4]], <4 x float> undef, <32 x i32> -; MAX32-NEXT: [[TMP5:%.*]] = insertelement <32 x float> undef, float [[FVAL:%.*]], i32 0 -; MAX32-NEXT: [[TMP6:%.*]] = insertelement <32 x float> [[TMP5]], float [[FVAL]], i32 1 -; MAX32-NEXT: [[TMP7:%.*]] = insertelement <32 x float> [[TMP6]], float [[FVAL]], i32 2 -; MAX32-NEXT: [[TMP8:%.*]] = insertelement <32 x float> [[TMP7]], float [[FVAL]], i32 3 -; MAX32-NEXT: [[TMP9:%.*]] = insertelement <32 x float> [[TMP8]], float [[FVAL]], i32 4 -; MAX32-NEXT: [[TMP10:%.*]] = insertelement <32 x float> [[TMP9]], float [[FVAL]], i32 5 -; MAX32-NEXT: [[TMP11:%.*]] = insertelement <32 x float> [[TMP10]], float [[FVAL]], i32 6 -; MAX32-NEXT: [[TMP12:%.*]] = insertelement <32 x float> [[TMP11]], float [[FVAL]], i32 7 -; MAX32-NEXT: [[TMP13:%.*]] = insertelement <32 x float> [[TMP12]], float [[FVAL]], i32 8 -; MAX32-NEXT: [[TMP14:%.*]] = insertelement <32 x float> [[TMP13]], float [[FVAL]], i32 9 -; MAX32-NEXT: [[TMP15:%.*]] = insertelement <32 x float> [[TMP14]], float [[FVAL]], i32 10 -; MAX32-NEXT: [[TMP16:%.*]] = insertelement <32 x float> [[TMP15]], float [[FVAL]], i32 11 -; MAX32-NEXT: [[TMP17:%.*]] = insertelement <32 x float> [[TMP16]], float [[FVAL]], i32 12 -; MAX32-NEXT: [[TMP18:%.*]] = insertelement <32 x float> [[TMP17]], float [[FVAL]], i32 13 -; MAX32-NEXT: [[TMP19:%.*]] = insertelement <32 x float> [[TMP18]], float [[FVAL]], i32 14 -; MAX32-NEXT: [[TMP20:%.*]] = insertelement <32 x float> [[TMP19]], float [[FVAL]], i32 15 -; MAX32-NEXT: [[TMP21:%.*]] = insertelement <32 x float> [[TMP20]], float [[FVAL]], i32 16 -; MAX32-NEXT: [[TMP22:%.*]] = insertelement <32 x float> [[TMP21]], float [[FVAL]], i32 17 -; MAX32-NEXT: [[TMP23:%.*]] = insertelement <32 x float> [[TMP22]], float [[FVAL]], i32 18 -; MAX32-NEXT: [[TMP24:%.*]] = insertelement <32 x float> [[TMP23]], float [[FVAL]], i32 19 -; MAX32-NEXT: [[TMP25:%.*]] = insertelement <32 x float> [[TMP24]], float [[FVAL]], i32 20 -; MAX32-NEXT: [[TMP26:%.*]] = 
insertelement <32 x float> [[TMP25]], float [[FVAL]], i32 21 -; MAX32-NEXT: [[TMP27:%.*]] = insertelement <32 x float> [[TMP26]], float [[FVAL]], i32 22 -; MAX32-NEXT: [[TMP28:%.*]] = insertelement <32 x float> [[TMP27]], float [[FVAL]], i32 23 -; MAX32-NEXT: [[TMP29:%.*]] = insertelement <32 x float> [[TMP28]], float [[FVAL]], i32 24 -; MAX32-NEXT: [[TMP30:%.*]] = insertelement <32 x float> [[TMP29]], float [[FVAL]], i32 25 -; MAX32-NEXT: [[TMP31:%.*]] = insertelement <32 x float> [[TMP30]], float [[FVAL]], i32 26 -; MAX32-NEXT: [[TMP32:%.*]] = insertelement <32 x float> [[TMP31]], float [[FVAL]], i32 27 -; MAX32-NEXT: [[TMP33:%.*]] = insertelement <32 x float> [[TMP32]], float [[FVAL]], i32 28 -; MAX32-NEXT: [[TMP34:%.*]] = insertelement <32 x float> [[TMP33]], float [[FVAL]], i32 29 -; MAX32-NEXT: [[TMP35:%.*]] = insertelement <32 x float> [[TMP34]], float [[FVAL]], i32 30 -; MAX32-NEXT: [[TMP36:%.*]] = insertelement <32 x float> [[TMP35]], float [[FVAL]], i32 31 -; MAX32-NEXT: [[TMP37:%.*]] = fmul <32 x float> [[SHUFFLE]], [[TMP36]] -; MAX32-NEXT: [[TMP38:%.*]] = fadd <32 x float> zeroinitializer, [[TMP37]] -; MAX32-NEXT: [[TMP39:%.*]] = extractelement <32 x float> [[TMP38]], i32 0 -; MAX32-NEXT: [[TMP40:%.*]] = insertelement <32 x float> undef, float [[TMP39]], i32 0 -; MAX32-NEXT: [[TMP41:%.*]] = extractelement <32 x float> [[TMP38]], i32 1 -; MAX32-NEXT: [[TMP42:%.*]] = insertelement <32 x float> [[TMP40]], float [[TMP41]], i32 1 -; MAX32-NEXT: [[TMP43:%.*]] = insertelement <32 x float> [[TMP42]], float [[FVAL]], i32 2 -; MAX32-NEXT: [[TMP44:%.*]] = insertelement <32 x float> [[TMP43]], float [[FVAL]], i32 3 -; MAX32-NEXT: [[TMP45:%.*]] = extractelement <32 x float> [[TMP38]], i32 4 -; MAX32-NEXT: [[TMP46:%.*]] = insertelement <32 x float> [[TMP44]], float [[TMP45]], i32 4 -; MAX32-NEXT: [[TMP47:%.*]] = extractelement <32 x float> [[TMP38]], i32 5 -; MAX32-NEXT: [[TMP48:%.*]] = insertelement <32 x float> [[TMP46]], float [[TMP47]], i32 5 -; MAX32-NEXT: 
[[TMP49:%.*]] = insertelement <32 x float> [[TMP48]], float [[FVAL]], i32 6 -; MAX32-NEXT: [[TMP50:%.*]] = insertelement <32 x float> [[TMP49]], float [[FVAL]], i32 7 -; MAX32-NEXT: [[TMP51:%.*]] = insertelement <32 x float> [[TMP50]], float [[FVAL]], i32 8 -; MAX32-NEXT: [[TMP52:%.*]] = insertelement <32 x float> [[TMP51]], float [[FVAL]], i32 9 -; MAX32-NEXT: [[TMP53:%.*]] = extractelement <32 x float> [[TMP38]], i32 10 -; MAX32-NEXT: [[TMP54:%.*]] = insertelement <32 x float> [[TMP52]], float [[TMP53]], i32 10 -; MAX32-NEXT: [[TMP55:%.*]] = extractelement <32 x float> [[TMP38]], i32 11 -; MAX32-NEXT: [[TMP56:%.*]] = insertelement <32 x float> [[TMP54]], float [[TMP55]], i32 11 -; MAX32-NEXT: [[TMP57:%.*]] = insertelement <32 x float> [[TMP56]], float [[FVAL]], i32 12 -; MAX32-NEXT: [[TMP58:%.*]] = insertelement <32 x float> [[TMP57]], float [[FVAL]], i32 13 -; MAX32-NEXT: [[TMP59:%.*]] = extractelement <32 x float> [[TMP38]], i32 14 -; MAX32-NEXT: [[TMP60:%.*]] = insertelement <32 x float> [[TMP58]], float [[TMP59]], i32 14 -; MAX32-NEXT: [[TMP61:%.*]] = extractelement <32 x float> [[TMP38]], i32 15 -; MAX32-NEXT: [[TMP62:%.*]] = insertelement <32 x float> [[TMP60]], float [[TMP61]], i32 15 -; MAX32-NEXT: [[TMP63:%.*]] = insertelement <32 x float> [[TMP62]], float [[FVAL]], i32 16 -; MAX32-NEXT: [[TMP64:%.*]] = insertelement <32 x float> [[TMP63]], float [[FVAL]], i32 17 -; MAX32-NEXT: [[TMP65:%.*]] = extractelement <32 x float> [[TMP38]], i32 18 -; MAX32-NEXT: [[TMP66:%.*]] = insertelement <32 x float> [[TMP64]], float [[TMP65]], i32 18 -; MAX32-NEXT: [[TMP67:%.*]] = extractelement <32 x float> [[TMP38]], i32 19 -; MAX32-NEXT: [[TMP68:%.*]] = insertelement <32 x float> [[TMP66]], float [[TMP67]], i32 19 -; MAX32-NEXT: [[TMP69:%.*]] = insertelement <32 x float> [[TMP68]], float [[FVAL]], i32 20 -; MAX32-NEXT: [[TMP70:%.*]] = insertelement <32 x float> [[TMP69]], float [[FVAL]], i32 21 -; MAX32-NEXT: [[TMP71:%.*]] = extractelement <32 x float> [[TMP38]], i32 22 
-; MAX32-NEXT: [[TMP72:%.*]] = insertelement <32 x float> [[TMP70]], float [[TMP71]], i32 22 -; MAX32-NEXT: [[TMP73:%.*]] = extractelement <32 x float> [[TMP38]], i32 23 -; MAX32-NEXT: [[TMP74:%.*]] = insertelement <32 x float> [[TMP72]], float [[TMP73]], i32 23 -; MAX32-NEXT: [[TMP75:%.*]] = insertelement <32 x float> [[TMP74]], float [[FVAL]], i32 24 -; MAX32-NEXT: [[TMP76:%.*]] = insertelement <32 x float> [[TMP75]], float [[FVAL]], i32 25 -; MAX32-NEXT: [[TMP77:%.*]] = extractelement <32 x float> [[TMP38]], i32 26 -; MAX32-NEXT: [[TMP78:%.*]] = insertelement <32 x float> [[TMP76]], float [[TMP77]], i32 26 -; MAX32-NEXT: [[TMP79:%.*]] = extractelement <32 x float> [[TMP38]], i32 27 -; MAX32-NEXT: [[TMP80:%.*]] = insertelement <32 x float> [[TMP78]], float [[TMP79]], i32 27 -; MAX32-NEXT: [[TMP81:%.*]] = insertelement <32 x float> [[TMP80]], float [[FVAL]], i32 28 -; MAX32-NEXT: [[TMP82:%.*]] = insertelement <32 x float> [[TMP81]], float [[FVAL]], i32 29 -; MAX32-NEXT: [[TMP83:%.*]] = extractelement <32 x float> [[TMP38]], i32 30 -; MAX32-NEXT: [[TMP84:%.*]] = insertelement <32 x float> [[TMP82]], float [[TMP83]], i32 30 -; MAX32-NEXT: [[TMP85:%.*]] = extractelement <32 x float> [[TMP38]], i32 31 -; MAX32-NEXT: [[TMP86:%.*]] = insertelement <32 x float> [[TMP84]], float [[TMP85]], i32 31 +; MAX32-NEXT: [[I:%.*]] = fpext half [[HVAL:%.*]] to float +; MAX32-NEXT: [[I1:%.*]] = fmul float [[I]], [[FVAL:%.*]] +; MAX32-NEXT: [[I2:%.*]] = fadd float 0.000000e+00, [[I1]] +; MAX32-NEXT: [[I3:%.*]] = fpext half [[HVAL]] to float +; MAX32-NEXT: [[I4:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I5:%.*]] = fadd float 0.000000e+00, [[I4]] +; MAX32-NEXT: [[I6:%.*]] = fpext half [[HVAL]] to float +; MAX32-NEXT: [[I7:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I8:%.*]] = fadd float 0.000000e+00, [[I7]] +; MAX32-NEXT: [[I9:%.*]] = fpext half [[HVAL]] to float +; MAX32-NEXT: [[I10:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I11:%.*]] = fadd float 
0.000000e+00, [[I10]] +; MAX32-NEXT: [[I12:%.*]] = fmul float [[I]], [[FVAL]] +; MAX32-NEXT: [[I13:%.*]] = fadd float 0.000000e+00, [[I12]] +; MAX32-NEXT: [[I14:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I15:%.*]] = fadd float 0.000000e+00, [[I14]] +; MAX32-NEXT: [[I16:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I17:%.*]] = fadd float 0.000000e+00, [[I16]] +; MAX32-NEXT: [[I18:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I19:%.*]] = fadd float 0.000000e+00, [[I18]] +; MAX32-NEXT: [[I20:%.*]] = fmul float [[I]], [[FVAL]] +; MAX32-NEXT: [[I21:%.*]] = fadd float 0.000000e+00, [[I20]] +; MAX32-NEXT: [[I22:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I23:%.*]] = fadd float 0.000000e+00, [[I22]] +; MAX32-NEXT: [[I24:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I25:%.*]] = fadd float 0.000000e+00, [[I24]] +; MAX32-NEXT: [[I26:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I27:%.*]] = fadd float 0.000000e+00, [[I26]] +; MAX32-NEXT: [[I28:%.*]] = fmul float [[I]], [[FVAL]] +; MAX32-NEXT: [[I29:%.*]] = fadd float 0.000000e+00, [[I28]] +; MAX32-NEXT: [[I30:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I31:%.*]] = fadd float 0.000000e+00, [[I30]] +; MAX32-NEXT: [[I32:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I33:%.*]] = fadd float 0.000000e+00, [[I32]] +; MAX32-NEXT: [[I34:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I35:%.*]] = fadd float 0.000000e+00, [[I34]] +; MAX32-NEXT: [[I36:%.*]] = fmul float [[I]], [[FVAL]] +; MAX32-NEXT: [[I37:%.*]] = fadd float 0.000000e+00, [[I36]] +; MAX32-NEXT: [[I38:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I39:%.*]] = fadd float 0.000000e+00, [[I38]] +; MAX32-NEXT: [[I40:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I41:%.*]] = fadd float 0.000000e+00, [[I40]] +; MAX32-NEXT: [[I42:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I43:%.*]] = fadd float 0.000000e+00, [[I42]] +; MAX32-NEXT: [[I44:%.*]] = fmul float [[I]], [[FVAL]] +; MAX32-NEXT: 
[[I45:%.*]] = fadd float 0.000000e+00, [[I44]] +; MAX32-NEXT: [[I46:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I47:%.*]] = fadd float 0.000000e+00, [[I46]] +; MAX32-NEXT: [[I48:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I49:%.*]] = fadd float 0.000000e+00, [[I48]] +; MAX32-NEXT: [[I50:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I51:%.*]] = fadd float 0.000000e+00, [[I50]] +; MAX32-NEXT: [[I52:%.*]] = fmul float [[I]], [[FVAL]] +; MAX32-NEXT: [[I53:%.*]] = fadd float 0.000000e+00, [[I52]] +; MAX32-NEXT: [[I54:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I55:%.*]] = fadd float 0.000000e+00, [[I54]] +; MAX32-NEXT: [[I56:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I57:%.*]] = fadd float 0.000000e+00, [[I56]] +; MAX32-NEXT: [[I58:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I59:%.*]] = fadd float 0.000000e+00, [[I58]] +; MAX32-NEXT: [[I60:%.*]] = fmul float [[I]], [[FVAL]] +; MAX32-NEXT: [[I61:%.*]] = fadd float 0.000000e+00, [[I60]] +; MAX32-NEXT: [[I62:%.*]] = fmul float [[I3]], [[FVAL]] +; MAX32-NEXT: [[I63:%.*]] = fadd float 0.000000e+00, [[I62]] +; MAX32-NEXT: [[I64:%.*]] = fmul float [[I6]], [[FVAL]] +; MAX32-NEXT: [[I65:%.*]] = fadd float 0.000000e+00, [[I64]] +; MAX32-NEXT: [[I66:%.*]] = fmul float [[I9]], [[FVAL]] +; MAX32-NEXT: [[I67:%.*]] = fadd float 0.000000e+00, [[I66]] ; MAX32-NEXT: switch i32 undef, label [[BB5:%.*]] [ ; MAX32-NEXT: i32 0, label [[BB2:%.*]] ; MAX32-NEXT: i32 1, label [[BB3:%.*]] @@ -104,89 +84,42 @@ define void @phi_float32(half %hval, float %fval) { ; MAX32: bb3: ; MAX32-NEXT: br label [[BB2]] ; MAX32: bb4: -; MAX32-NEXT: [[TMP87:%.*]] = insertelement <32 x float> [[TMP40]], float [[FVAL]], i32 1 -; MAX32-NEXT: [[TMP88:%.*]] = insertelement <32 x float> [[TMP87]], float [[FVAL]], i32 2 -; MAX32-NEXT: [[TMP89:%.*]] = extractelement <32 x float> [[TMP38]], i32 3 -; MAX32-NEXT: [[TMP90:%.*]] = insertelement <32 x float> [[TMP88]], float [[TMP89]], i32 3 -; MAX32-NEXT: [[TMP91:%.*]] = 
insertelement <32 x float> [[TMP90]], float [[TMP45]], i32 4 -; MAX32-NEXT: [[TMP92:%.*]] = insertelement <32 x float> [[TMP91]], float [[FVAL]], i32 5 -; MAX32-NEXT: [[TMP93:%.*]] = insertelement <32 x float> [[TMP92]], float [[FVAL]], i32 6 -; MAX32-NEXT: [[TMP94:%.*]] = extractelement <32 x float> [[TMP38]], i32 7 -; MAX32-NEXT: [[TMP95:%.*]] = insertelement <32 x float> [[TMP93]], float [[TMP94]], i32 7 -; MAX32-NEXT: [[TMP96:%.*]] = extractelement <32 x float> [[TMP38]], i32 8 -; MAX32-NEXT: [[TMP97:%.*]] = insertelement <32 x float> [[TMP95]], float [[TMP96]], i32 8 -; MAX32-NEXT: [[TMP98:%.*]] = insertelement <32 x float> [[TMP97]], float [[FVAL]], i32 9 -; MAX32-NEXT: [[TMP99:%.*]] = insertelement <32 x float> [[TMP98]], float [[FVAL]], i32 10 -; MAX32-NEXT: [[TMP100:%.*]] = insertelement <32 x float> [[TMP99]], float [[TMP55]], i32 11 -; MAX32-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[TMP38]], i32 12 -; MAX32-NEXT: [[TMP102:%.*]] = insertelement <32 x float> [[TMP100]], float [[TMP101]], i32 12 -; MAX32-NEXT: [[TMP103:%.*]] = insertelement <32 x float> [[TMP102]], float [[FVAL]], i32 13 -; MAX32-NEXT: [[TMP104:%.*]] = insertelement <32 x float> [[TMP103]], float [[FVAL]], i32 14 -; MAX32-NEXT: [[TMP105:%.*]] = insertelement <32 x float> [[TMP104]], float [[TMP61]], i32 15 -; MAX32-NEXT: [[TMP106:%.*]] = extractelement <32 x float> [[TMP38]], i32 16 -; MAX32-NEXT: [[TMP107:%.*]] = insertelement <32 x float> [[TMP105]], float [[TMP106]], i32 16 -; MAX32-NEXT: [[TMP108:%.*]] = insertelement <32 x float> [[TMP107]], float [[FVAL]], i32 17 -; MAX32-NEXT: [[TMP109:%.*]] = insertelement <32 x float> [[TMP108]], float [[FVAL]], i32 18 -; MAX32-NEXT: [[TMP110:%.*]] = insertelement <32 x float> [[TMP109]], float [[TMP67]], i32 19 -; MAX32-NEXT: [[TMP111:%.*]] = extractelement <32 x float> [[TMP38]], i32 20 -; MAX32-NEXT: [[TMP112:%.*]] = insertelement <32 x float> [[TMP110]], float [[TMP111]], i32 20 -; MAX32-NEXT: [[TMP113:%.*]] = insertelement <32 x 
float> [[TMP112]], float [[FVAL]], i32 21 -; MAX32-NEXT: [[TMP114:%.*]] = insertelement <32 x float> [[TMP113]], float [[FVAL]], i32 22 -; MAX32-NEXT: [[TMP115:%.*]] = insertelement <32 x float> [[TMP114]], float [[TMP73]], i32 23 -; MAX32-NEXT: [[TMP116:%.*]] = extractelement <32 x float> [[TMP38]], i32 24 -; MAX32-NEXT: [[TMP117:%.*]] = insertelement <32 x float> [[TMP115]], float [[TMP116]], i32 24 -; MAX32-NEXT: [[TMP118:%.*]] = insertelement <32 x float> [[TMP117]], float [[FVAL]], i32 25 -; MAX32-NEXT: [[TMP119:%.*]] = insertelement <32 x float> [[TMP118]], float [[FVAL]], i32 26 -; MAX32-NEXT: [[TMP120:%.*]] = insertelement <32 x float> [[TMP119]], float [[TMP79]], i32 27 -; MAX32-NEXT: [[TMP121:%.*]] = extractelement <32 x float> [[TMP38]], i32 28 -; MAX32-NEXT: [[TMP122:%.*]] = insertelement <32 x float> [[TMP120]], float [[TMP121]], i32 28 -; MAX32-NEXT: [[TMP123:%.*]] = insertelement <32 x float> [[TMP122]], float [[FVAL]], i32 29 -; MAX32-NEXT: [[TMP124:%.*]] = insertelement <32 x float> [[TMP123]], float [[FVAL]], i32 30 -; MAX32-NEXT: [[TMP125:%.*]] = insertelement <32 x float> [[TMP124]], float [[TMP85]], i32 31 ; MAX32-NEXT: br label [[BB2]] ; MAX32: bb5: -; MAX32-NEXT: [[TMP126:%.*]] = insertelement <32 x float> [[TMP5]], float [[TMP41]], i32 1 -; MAX32-NEXT: [[TMP127:%.*]] = insertelement <32 x float> [[TMP126]], float [[FVAL]], i32 2 -; MAX32-NEXT: [[TMP128:%.*]] = extractelement <32 x float> [[TMP38]], i32 3 -; MAX32-NEXT: [[TMP129:%.*]] = insertelement <32 x float> [[TMP127]], float [[TMP128]], i32 3 -; MAX32-NEXT: [[TMP130:%.*]] = insertelement <32 x float> [[TMP129]], float [[FVAL]], i32 4 -; MAX32-NEXT: [[TMP131:%.*]] = insertelement <32 x float> [[TMP130]], float [[TMP47]], i32 5 -; MAX32-NEXT: [[TMP132:%.*]] = insertelement <32 x float> [[TMP131]], float [[FVAL]], i32 6 -; MAX32-NEXT: [[TMP133:%.*]] = extractelement <32 x float> [[TMP38]], i32 7 -; MAX32-NEXT: [[TMP134:%.*]] = insertelement <32 x float> [[TMP132]], float [[TMP133]], i32 7 
-; MAX32-NEXT: [[TMP135:%.*]] = extractelement <32 x float> [[TMP38]], i32 8 -; MAX32-NEXT: [[TMP136:%.*]] = insertelement <32 x float> [[TMP134]], float [[TMP135]], i32 8 -; MAX32-NEXT: [[TMP137:%.*]] = insertelement <32 x float> [[TMP136]], float [[FVAL]], i32 9 -; MAX32-NEXT: [[TMP138:%.*]] = insertelement <32 x float> [[TMP137]], float [[TMP53]], i32 10 -; MAX32-NEXT: [[TMP139:%.*]] = insertelement <32 x float> [[TMP138]], float [[FVAL]], i32 11 -; MAX32-NEXT: [[TMP140:%.*]] = extractelement <32 x float> [[TMP38]], i32 12 -; MAX32-NEXT: [[TMP141:%.*]] = insertelement <32 x float> [[TMP139]], float [[TMP140]], i32 12 -; MAX32-NEXT: [[TMP142:%.*]] = insertelement <32 x float> [[TMP141]], float [[FVAL]], i32 13 -; MAX32-NEXT: [[TMP143:%.*]] = insertelement <32 x float> [[TMP142]], float [[TMP59]], i32 14 -; MAX32-NEXT: [[TMP144:%.*]] = insertelement <32 x float> [[TMP143]], float [[FVAL]], i32 15 -; MAX32-NEXT: [[TMP145:%.*]] = extractelement <32 x float> [[TMP38]], i32 16 -; MAX32-NEXT: [[TMP146:%.*]] = insertelement <32 x float> [[TMP144]], float [[TMP145]], i32 16 -; MAX32-NEXT: [[TMP147:%.*]] = insertelement <32 x float> [[TMP146]], float [[FVAL]], i32 17 -; MAX32-NEXT: [[TMP148:%.*]] = insertelement <32 x float> [[TMP147]], float [[TMP65]], i32 18 -; MAX32-NEXT: [[TMP149:%.*]] = insertelement <32 x float> [[TMP148]], float [[FVAL]], i32 19 -; MAX32-NEXT: [[TMP150:%.*]] = extractelement <32 x float> [[TMP38]], i32 20 -; MAX32-NEXT: [[TMP151:%.*]] = insertelement <32 x float> [[TMP149]], float [[TMP150]], i32 20 -; MAX32-NEXT: [[TMP152:%.*]] = insertelement <32 x float> [[TMP151]], float [[FVAL]], i32 21 -; MAX32-NEXT: [[TMP153:%.*]] = insertelement <32 x float> [[TMP152]], float [[TMP71]], i32 22 -; MAX32-NEXT: [[TMP154:%.*]] = insertelement <32 x float> [[TMP153]], float [[FVAL]], i32 23 -; MAX32-NEXT: [[TMP155:%.*]] = extractelement <32 x float> [[TMP38]], i32 24 -; MAX32-NEXT: [[TMP156:%.*]] = insertelement <32 x float> [[TMP154]], float [[TMP155]], i32 24 
-; MAX32-NEXT: [[TMP157:%.*]] = insertelement <32 x float> [[TMP156]], float [[FVAL]], i32 25 -; MAX32-NEXT: [[TMP158:%.*]] = insertelement <32 x float> [[TMP157]], float [[TMP77]], i32 26 -; MAX32-NEXT: [[TMP159:%.*]] = insertelement <32 x float> [[TMP158]], float [[FVAL]], i32 27 -; MAX32-NEXT: [[TMP160:%.*]] = extractelement <32 x float> [[TMP38]], i32 28 -; MAX32-NEXT: [[TMP161:%.*]] = insertelement <32 x float> [[TMP159]], float [[TMP160]], i32 28 -; MAX32-NEXT: [[TMP162:%.*]] = insertelement <32 x float> [[TMP161]], float [[FVAL]], i32 29 -; MAX32-NEXT: [[TMP163:%.*]] = insertelement <32 x float> [[TMP162]], float [[TMP83]], i32 30 -; MAX32-NEXT: [[TMP164:%.*]] = insertelement <32 x float> [[TMP163]], float [[FVAL]], i32 31 ; MAX32-NEXT: br label [[BB2]] ; MAX32: bb2: -; MAX32-NEXT: [[TMP165:%.*]] = phi <32 x float> [ [[TMP38]], [[BB3]] ], [ [[TMP125]], [[BB4]] ], [ [[TMP164]], [[BB5]] ], [ [[TMP86]], [[BB1]] ] +; MAX32-NEXT: [[PHI1:%.*]] = phi float [ [[I19]], [[BB3]] ], [ [[I19]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I19]], [[BB1]] ] +; MAX32-NEXT: [[PHI2:%.*]] = phi float [ [[I17]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I17]], [[BB5]] ], [ [[I17]], [[BB1]] ] +; MAX32-NEXT: [[PHI3:%.*]] = phi float [ [[I15]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI4:%.*]] = phi float [ [[I13]], [[BB3]] ], [ [[I13]], [[BB4]] ], [ [[I13]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI5:%.*]] = phi float [ [[I11]], [[BB3]] ], [ [[I11]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I11]], [[BB1]] ] +; MAX32-NEXT: [[PHI6:%.*]] = phi float [ [[I8]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I8]], [[BB5]] ], [ [[I8]], [[BB1]] ] +; MAX32-NEXT: [[PHI7:%.*]] = phi float [ [[I5]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI8:%.*]] = phi float [ [[I2]], [[BB3]] ], [ [[I2]], [[BB4]] ], [ [[I2]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI9:%.*]] = phi float [ 
[[I21]], [[BB3]] ], [ [[I21]], [[BB4]] ], [ [[I21]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI10:%.*]] = phi float [ [[I23]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI11:%.*]] = phi float [ [[I25]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I25]], [[BB5]] ], [ [[I25]], [[BB1]] ] +; MAX32-NEXT: [[PHI12:%.*]] = phi float [ [[I27]], [[BB3]] ], [ [[I27]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I27]], [[BB1]] ] +; MAX32-NEXT: [[PHI13:%.*]] = phi float [ [[I29]], [[BB3]] ], [ [[I29]], [[BB4]] ], [ [[I29]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI14:%.*]] = phi float [ [[I31]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI15:%.*]] = phi float [ [[I33]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I33]], [[BB5]] ], [ [[I33]], [[BB1]] ] +; MAX32-NEXT: [[PHI16:%.*]] = phi float [ [[I35]], [[BB3]] ], [ [[I35]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I35]], [[BB1]] ] +; MAX32-NEXT: [[PHI17:%.*]] = phi float [ [[I37]], [[BB3]] ], [ [[I37]], [[BB4]] ], [ [[I37]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI18:%.*]] = phi float [ [[I39]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI19:%.*]] = phi float [ [[I41]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I41]], [[BB5]] ], [ [[I41]], [[BB1]] ] +; MAX32-NEXT: [[PHI20:%.*]] = phi float [ [[I43]], [[BB3]] ], [ [[I43]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I43]], [[BB1]] ] +; MAX32-NEXT: [[PHI21:%.*]] = phi float [ [[I45]], [[BB3]] ], [ [[I45]], [[BB4]] ], [ [[I45]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI22:%.*]] = phi float [ [[I47]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI23:%.*]] = phi float [ [[I49]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I49]], [[BB5]] ], [ [[I49]], [[BB1]] ] +; MAX32-NEXT: [[PHI24:%.*]] = phi float [ [[I51]], [[BB3]] ], [ [[I51]], [[BB4]] ], [ [[FVAL]], 
[[BB5]] ], [ [[I51]], [[BB1]] ] +; MAX32-NEXT: [[PHI25:%.*]] = phi float [ [[I53]], [[BB3]] ], [ [[I53]], [[BB4]] ], [ [[I53]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI26:%.*]] = phi float [ [[I55]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI27:%.*]] = phi float [ [[I57]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I57]], [[BB5]] ], [ [[I57]], [[BB1]] ] +; MAX32-NEXT: [[PHI28:%.*]] = phi float [ [[I59]], [[BB3]] ], [ [[I59]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I59]], [[BB1]] ] +; MAX32-NEXT: [[PHI29:%.*]] = phi float [ [[I61]], [[BB3]] ], [ [[I61]], [[BB4]] ], [ [[I61]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI30:%.*]] = phi float [ [[I63]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[FVAL]], [[BB1]] ] +; MAX32-NEXT: [[PHI31:%.*]] = phi float [ [[I65]], [[BB3]] ], [ [[FVAL]], [[BB4]] ], [ [[I65]], [[BB5]] ], [ [[I65]], [[BB1]] ] +; MAX32-NEXT: [[PHI32:%.*]] = phi float [ [[I67]], [[BB3]] ], [ [[I67]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I67]], [[BB1]] ] ; MAX32-NEXT: ret void ; ; MAX256-LABEL: @phi_float32( @@ -198,89 +131,77 @@ define void @phi_float32(half %hval, float %fval) { ; MAX256-NEXT: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half [[HVAL]], i32 2 ; MAX256-NEXT: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half [[HVAL]], i32 3 ; MAX256-NEXT: [[TMP4:%.*]] = fpext <4 x half> [[TMP3]] to <4 x float> -; MAX256-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> undef, <32 x i32> -; MAX256-NEXT: [[TMP5:%.*]] = insertelement <32 x float> undef, float [[FVAL:%.*]], i32 0 -; MAX256-NEXT: [[TMP6:%.*]] = insertelement <32 x float> [[TMP5]], float [[FVAL]], i32 1 -; MAX256-NEXT: [[TMP7:%.*]] = insertelement <32 x float> [[TMP6]], float [[FVAL]], i32 2 -; MAX256-NEXT: [[TMP8:%.*]] = insertelement <32 x float> [[TMP7]], float [[FVAL]], i32 3 -; MAX256-NEXT: [[TMP9:%.*]] = insertelement <32 x float> [[TMP8]], float [[FVAL]], i32 4 -; 
MAX256-NEXT: [[TMP10:%.*]] = insertelement <32 x float> [[TMP9]], float [[FVAL]], i32 5 -; MAX256-NEXT: [[TMP11:%.*]] = insertelement <32 x float> [[TMP10]], float [[FVAL]], i32 6 -; MAX256-NEXT: [[TMP12:%.*]] = insertelement <32 x float> [[TMP11]], float [[FVAL]], i32 7 -; MAX256-NEXT: [[TMP13:%.*]] = insertelement <32 x float> [[TMP12]], float [[FVAL]], i32 8 -; MAX256-NEXT: [[TMP14:%.*]] = insertelement <32 x float> [[TMP13]], float [[FVAL]], i32 9 -; MAX256-NEXT: [[TMP15:%.*]] = insertelement <32 x float> [[TMP14]], float [[FVAL]], i32 10 -; MAX256-NEXT: [[TMP16:%.*]] = insertelement <32 x float> [[TMP15]], float [[FVAL]], i32 11 -; MAX256-NEXT: [[TMP17:%.*]] = insertelement <32 x float> [[TMP16]], float [[FVAL]], i32 12 -; MAX256-NEXT: [[TMP18:%.*]] = insertelement <32 x float> [[TMP17]], float [[FVAL]], i32 13 -; MAX256-NEXT: [[TMP19:%.*]] = insertelement <32 x float> [[TMP18]], float [[FVAL]], i32 14 -; MAX256-NEXT: [[TMP20:%.*]] = insertelement <32 x float> [[TMP19]], float [[FVAL]], i32 15 -; MAX256-NEXT: [[TMP21:%.*]] = insertelement <32 x float> [[TMP20]], float [[FVAL]], i32 16 -; MAX256-NEXT: [[TMP22:%.*]] = insertelement <32 x float> [[TMP21]], float [[FVAL]], i32 17 -; MAX256-NEXT: [[TMP23:%.*]] = insertelement <32 x float> [[TMP22]], float [[FVAL]], i32 18 -; MAX256-NEXT: [[TMP24:%.*]] = insertelement <32 x float> [[TMP23]], float [[FVAL]], i32 19 -; MAX256-NEXT: [[TMP25:%.*]] = insertelement <32 x float> [[TMP24]], float [[FVAL]], i32 20 -; MAX256-NEXT: [[TMP26:%.*]] = insertelement <32 x float> [[TMP25]], float [[FVAL]], i32 21 -; MAX256-NEXT: [[TMP27:%.*]] = insertelement <32 x float> [[TMP26]], float [[FVAL]], i32 22 -; MAX256-NEXT: [[TMP28:%.*]] = insertelement <32 x float> [[TMP27]], float [[FVAL]], i32 23 -; MAX256-NEXT: [[TMP29:%.*]] = insertelement <32 x float> [[TMP28]], float [[FVAL]], i32 24 -; MAX256-NEXT: [[TMP30:%.*]] = insertelement <32 x float> [[TMP29]], float [[FVAL]], i32 25 -; MAX256-NEXT: [[TMP31:%.*]] = insertelement <32 x 
float> [[TMP30]], float [[FVAL]], i32 26 -; MAX256-NEXT: [[TMP32:%.*]] = insertelement <32 x float> [[TMP31]], float [[FVAL]], i32 27 -; MAX256-NEXT: [[TMP33:%.*]] = insertelement <32 x float> [[TMP32]], float [[FVAL]], i32 28 -; MAX256-NEXT: [[TMP34:%.*]] = insertelement <32 x float> [[TMP33]], float [[FVAL]], i32 29 -; MAX256-NEXT: [[TMP35:%.*]] = insertelement <32 x float> [[TMP34]], float [[FVAL]], i32 30 -; MAX256-NEXT: [[TMP36:%.*]] = insertelement <32 x float> [[TMP35]], float [[FVAL]], i32 31 -; MAX256-NEXT: [[TMP37:%.*]] = fmul <32 x float> [[SHUFFLE]], [[TMP36]] -; MAX256-NEXT: [[TMP38:%.*]] = fadd <32 x float> zeroinitializer, [[TMP37]] -; MAX256-NEXT: [[TMP39:%.*]] = extractelement <32 x float> [[TMP38]], i32 0 -; MAX256-NEXT: [[TMP40:%.*]] = insertelement <32 x float> undef, float [[TMP39]], i32 0 -; MAX256-NEXT: [[TMP41:%.*]] = extractelement <32 x float> [[TMP38]], i32 1 -; MAX256-NEXT: [[TMP42:%.*]] = insertelement <32 x float> [[TMP40]], float [[TMP41]], i32 1 -; MAX256-NEXT: [[TMP43:%.*]] = insertelement <32 x float> [[TMP42]], float [[FVAL]], i32 2 -; MAX256-NEXT: [[TMP44:%.*]] = insertelement <32 x float> [[TMP43]], float [[FVAL]], i32 3 -; MAX256-NEXT: [[TMP45:%.*]] = extractelement <32 x float> [[TMP38]], i32 4 -; MAX256-NEXT: [[TMP46:%.*]] = insertelement <32 x float> [[TMP44]], float [[TMP45]], i32 4 -; MAX256-NEXT: [[TMP47:%.*]] = extractelement <32 x float> [[TMP38]], i32 5 -; MAX256-NEXT: [[TMP48:%.*]] = insertelement <32 x float> [[TMP46]], float [[TMP47]], i32 5 -; MAX256-NEXT: [[TMP49:%.*]] = insertelement <32 x float> [[TMP48]], float [[FVAL]], i32 6 -; MAX256-NEXT: [[TMP50:%.*]] = insertelement <32 x float> [[TMP49]], float [[FVAL]], i32 7 -; MAX256-NEXT: [[TMP51:%.*]] = insertelement <32 x float> [[TMP50]], float [[FVAL]], i32 8 -; MAX256-NEXT: [[TMP52:%.*]] = insertelement <32 x float> [[TMP51]], float [[FVAL]], i32 9 -; MAX256-NEXT: [[TMP53:%.*]] = extractelement <32 x float> [[TMP38]], i32 10 -; MAX256-NEXT: [[TMP54:%.*]] = 
insertelement <32 x float> [[TMP52]], float [[TMP53]], i32 10 -; MAX256-NEXT: [[TMP55:%.*]] = extractelement <32 x float> [[TMP38]], i32 11 -; MAX256-NEXT: [[TMP56:%.*]] = insertelement <32 x float> [[TMP54]], float [[TMP55]], i32 11 -; MAX256-NEXT: [[TMP57:%.*]] = insertelement <32 x float> [[TMP56]], float [[FVAL]], i32 12 -; MAX256-NEXT: [[TMP58:%.*]] = insertelement <32 x float> [[TMP57]], float [[FVAL]], i32 13 -; MAX256-NEXT: [[TMP59:%.*]] = extractelement <32 x float> [[TMP38]], i32 14 -; MAX256-NEXT: [[TMP60:%.*]] = insertelement <32 x float> [[TMP58]], float [[TMP59]], i32 14 -; MAX256-NEXT: [[TMP61:%.*]] = extractelement <32 x float> [[TMP38]], i32 15 -; MAX256-NEXT: [[TMP62:%.*]] = insertelement <32 x float> [[TMP60]], float [[TMP61]], i32 15 -; MAX256-NEXT: [[TMP63:%.*]] = insertelement <32 x float> [[TMP62]], float [[FVAL]], i32 16 -; MAX256-NEXT: [[TMP64:%.*]] = insertelement <32 x float> [[TMP63]], float [[FVAL]], i32 17 -; MAX256-NEXT: [[TMP65:%.*]] = extractelement <32 x float> [[TMP38]], i32 18 -; MAX256-NEXT: [[TMP66:%.*]] = insertelement <32 x float> [[TMP64]], float [[TMP65]], i32 18 -; MAX256-NEXT: [[TMP67:%.*]] = extractelement <32 x float> [[TMP38]], i32 19 -; MAX256-NEXT: [[TMP68:%.*]] = insertelement <32 x float> [[TMP66]], float [[TMP67]], i32 19 -; MAX256-NEXT: [[TMP69:%.*]] = insertelement <32 x float> [[TMP68]], float [[FVAL]], i32 20 -; MAX256-NEXT: [[TMP70:%.*]] = insertelement <32 x float> [[TMP69]], float [[FVAL]], i32 21 -; MAX256-NEXT: [[TMP71:%.*]] = extractelement <32 x float> [[TMP38]], i32 22 -; MAX256-NEXT: [[TMP72:%.*]] = insertelement <32 x float> [[TMP70]], float [[TMP71]], i32 22 -; MAX256-NEXT: [[TMP73:%.*]] = extractelement <32 x float> [[TMP38]], i32 23 -; MAX256-NEXT: [[TMP74:%.*]] = insertelement <32 x float> [[TMP72]], float [[TMP73]], i32 23 -; MAX256-NEXT: [[TMP75:%.*]] = insertelement <32 x float> [[TMP74]], float [[FVAL]], i32 24 -; MAX256-NEXT: [[TMP76:%.*]] = insertelement <32 x float> [[TMP75]], float 
[[FVAL]], i32 25 -; MAX256-NEXT: [[TMP77:%.*]] = extractelement <32 x float> [[TMP38]], i32 26 -; MAX256-NEXT: [[TMP78:%.*]] = insertelement <32 x float> [[TMP76]], float [[TMP77]], i32 26 -; MAX256-NEXT: [[TMP79:%.*]] = extractelement <32 x float> [[TMP38]], i32 27 -; MAX256-NEXT: [[TMP80:%.*]] = insertelement <32 x float> [[TMP78]], float [[TMP79]], i32 27 -; MAX256-NEXT: [[TMP81:%.*]] = insertelement <32 x float> [[TMP80]], float [[FVAL]], i32 28 -; MAX256-NEXT: [[TMP82:%.*]] = insertelement <32 x float> [[TMP81]], float [[FVAL]], i32 29 -; MAX256-NEXT: [[TMP83:%.*]] = extractelement <32 x float> [[TMP38]], i32 30 -; MAX256-NEXT: [[TMP84:%.*]] = insertelement <32 x float> [[TMP82]], float [[TMP83]], i32 30 -; MAX256-NEXT: [[TMP85:%.*]] = extractelement <32 x float> [[TMP38]], i32 31 -; MAX256-NEXT: [[TMP86:%.*]] = insertelement <32 x float> [[TMP84]], float [[TMP85]], i32 31 +; MAX256-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> undef, <8 x i32> +; MAX256-NEXT: [[TMP5:%.*]] = insertelement <8 x float> undef, float [[FVAL:%.*]], i32 0 +; MAX256-NEXT: [[TMP6:%.*]] = insertelement <8 x float> [[TMP5]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP7:%.*]] = insertelement <8 x float> [[TMP6]], float [[FVAL]], i32 2 +; MAX256-NEXT: [[TMP8:%.*]] = insertelement <8 x float> [[TMP7]], float [[FVAL]], i32 3 +; MAX256-NEXT: [[TMP9:%.*]] = insertelement <8 x float> [[TMP8]], float [[FVAL]], i32 4 +; MAX256-NEXT: [[TMP10:%.*]] = insertelement <8 x float> [[TMP9]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP11:%.*]] = insertelement <8 x float> [[TMP10]], float [[FVAL]], i32 6 +; MAX256-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[FVAL]], i32 7 +; MAX256-NEXT: [[TMP13:%.*]] = fmul <8 x float> [[SHUFFLE]], [[TMP12]] +; MAX256-NEXT: [[TMP14:%.*]] = fadd <8 x float> zeroinitializer, [[TMP13]] +; MAX256-NEXT: [[TMP15:%.*]] = extractelement <8 x float> [[SHUFFLE]], i32 3 +; MAX256-NEXT: [[TMP16:%.*]] = extractelement <8 x float> 
[[SHUFFLE]], i32 2 +; MAX256-NEXT: [[TMP17:%.*]] = extractelement <8 x float> [[SHUFFLE]], i32 1 +; MAX256-NEXT: [[TMP18:%.*]] = extractelement <8 x float> [[SHUFFLE]], i32 0 +; MAX256-NEXT: [[TMP19:%.*]] = insertelement <8 x float> undef, float [[TMP15]], i32 0 +; MAX256-NEXT: [[TMP20:%.*]] = insertelement <8 x float> [[TMP19]], float [[TMP16]], i32 1 +; MAX256-NEXT: [[TMP21:%.*]] = insertelement <8 x float> [[TMP20]], float [[TMP17]], i32 2 +; MAX256-NEXT: [[TMP22:%.*]] = insertelement <8 x float> [[TMP21]], float [[TMP18]], i32 3 +; MAX256-NEXT: [[TMP23:%.*]] = insertelement <8 x float> [[TMP22]], float [[TMP15]], i32 4 +; MAX256-NEXT: [[TMP24:%.*]] = insertelement <8 x float> [[TMP23]], float [[TMP16]], i32 5 +; MAX256-NEXT: [[TMP25:%.*]] = insertelement <8 x float> [[TMP24]], float [[TMP17]], i32 6 +; MAX256-NEXT: [[TMP26:%.*]] = insertelement <8 x float> [[TMP25]], float [[TMP18]], i32 7 +; MAX256-NEXT: [[TMP27:%.*]] = fmul <8 x float> [[TMP26]], [[TMP12]] +; MAX256-NEXT: [[TMP28:%.*]] = fadd <8 x float> zeroinitializer, [[TMP27]] +; MAX256-NEXT: [[TMP29:%.*]] = fmul <8 x float> [[TMP26]], [[TMP12]] +; MAX256-NEXT: [[TMP30:%.*]] = fadd <8 x float> zeroinitializer, [[TMP29]] +; MAX256-NEXT: [[TMP31:%.*]] = fmul <8 x float> [[TMP26]], [[TMP12]] +; MAX256-NEXT: [[TMP32:%.*]] = fadd <8 x float> zeroinitializer, [[TMP31]] +; MAX256-NEXT: [[TMP33:%.*]] = extractelement <8 x float> [[TMP14]], i32 0 +; MAX256-NEXT: [[TMP34:%.*]] = insertelement <8 x float> undef, float [[TMP33]], i32 0 +; MAX256-NEXT: [[TMP35:%.*]] = extractelement <8 x float> [[TMP14]], i32 1 +; MAX256-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP35]], i32 1 +; MAX256-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[FVAL]], i32 2 +; MAX256-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[FVAL]], i32 3 +; MAX256-NEXT: [[TMP39:%.*]] = extractelement <8 x float> [[TMP14]], i32 4 +; MAX256-NEXT: [[TMP40:%.*]] = insertelement <8 x float> 
[[TMP38]], float [[TMP39]], i32 4 +; MAX256-NEXT: [[TMP41:%.*]] = extractelement <8 x float> [[TMP14]], i32 5 +; MAX256-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP41]], i32 5 +; MAX256-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[FVAL]], i32 6 +; MAX256-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[FVAL]], i32 7 +; MAX256-NEXT: [[TMP45:%.*]] = extractelement <8 x float> [[TMP28]], i32 2 +; MAX256-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP6]], float [[TMP45]], i32 2 +; MAX256-NEXT: [[TMP47:%.*]] = extractelement <8 x float> [[TMP28]], i32 3 +; MAX256-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP47]], i32 3 +; MAX256-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[FVAL]], i32 4 +; MAX256-NEXT: [[TMP50:%.*]] = insertelement <8 x float> [[TMP49]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP51:%.*]] = extractelement <8 x float> [[TMP28]], i32 6 +; MAX256-NEXT: [[TMP52:%.*]] = insertelement <8 x float> [[TMP50]], float [[TMP51]], i32 6 +; MAX256-NEXT: [[TMP53:%.*]] = extractelement <8 x float> [[TMP28]], i32 7 +; MAX256-NEXT: [[TMP54:%.*]] = insertelement <8 x float> [[TMP52]], float [[TMP53]], i32 7 +; MAX256-NEXT: [[TMP55:%.*]] = extractelement <8 x float> [[TMP30]], i32 2 +; MAX256-NEXT: [[TMP56:%.*]] = insertelement <8 x float> [[TMP6]], float [[TMP55]], i32 2 +; MAX256-NEXT: [[TMP57:%.*]] = extractelement <8 x float> [[TMP30]], i32 3 +; MAX256-NEXT: [[TMP58:%.*]] = insertelement <8 x float> [[TMP56]], float [[TMP57]], i32 3 +; MAX256-NEXT: [[TMP59:%.*]] = insertelement <8 x float> [[TMP58]], float [[FVAL]], i32 4 +; MAX256-NEXT: [[TMP60:%.*]] = insertelement <8 x float> [[TMP59]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP61:%.*]] = extractelement <8 x float> [[TMP30]], i32 6 +; MAX256-NEXT: [[TMP62:%.*]] = insertelement <8 x float> [[TMP60]], float [[TMP61]], i32 6 +; MAX256-NEXT: [[TMP63:%.*]] = extractelement <8 x float> [[TMP30]], i32 7 
+; MAX256-NEXT: [[TMP64:%.*]] = insertelement <8 x float> [[TMP62]], float [[TMP63]], i32 7 +; MAX256-NEXT: [[TMP65:%.*]] = extractelement <8 x float> [[TMP32]], i32 2 +; MAX256-NEXT: [[TMP66:%.*]] = insertelement <8 x float> [[TMP6]], float [[TMP65]], i32 2 +; MAX256-NEXT: [[TMP67:%.*]] = extractelement <8 x float> [[TMP32]], i32 3 +; MAX256-NEXT: [[TMP68:%.*]] = insertelement <8 x float> [[TMP66]], float [[TMP67]], i32 3 +; MAX256-NEXT: [[TMP69:%.*]] = insertelement <8 x float> [[TMP68]], float [[FVAL]], i32 4 +; MAX256-NEXT: [[TMP70:%.*]] = insertelement <8 x float> [[TMP69]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP71:%.*]] = extractelement <8 x float> [[TMP32]], i32 6 +; MAX256-NEXT: [[TMP72:%.*]] = insertelement <8 x float> [[TMP70]], float [[TMP71]], i32 6 +; MAX256-NEXT: [[TMP73:%.*]] = extractelement <8 x float> [[TMP32]], i32 7 +; MAX256-NEXT: [[TMP74:%.*]] = insertelement <8 x float> [[TMP72]], float [[TMP73]], i32 7 ; MAX256-NEXT: switch i32 undef, label [[BB5:%.*]] [ ; MAX256-NEXT: i32 0, label [[BB2:%.*]] ; MAX256-NEXT: i32 1, label [[BB3:%.*]] @@ -289,89 +210,92 @@ define void @phi_float32(half %hval, float %fval) { ; MAX256: bb3: ; MAX256-NEXT: br label [[BB2]] ; MAX256: bb4: -; MAX256-NEXT: [[TMP87:%.*]] = insertelement <32 x float> [[TMP40]], float [[FVAL]], i32 1 -; MAX256-NEXT: [[TMP88:%.*]] = insertelement <32 x float> [[TMP87]], float [[FVAL]], i32 2 -; MAX256-NEXT: [[TMP89:%.*]] = extractelement <32 x float> [[TMP38]], i32 3 -; MAX256-NEXT: [[TMP90:%.*]] = insertelement <32 x float> [[TMP88]], float [[TMP89]], i32 3 -; MAX256-NEXT: [[TMP91:%.*]] = insertelement <32 x float> [[TMP90]], float [[TMP45]], i32 4 -; MAX256-NEXT: [[TMP92:%.*]] = insertelement <32 x float> [[TMP91]], float [[FVAL]], i32 5 -; MAX256-NEXT: [[TMP93:%.*]] = insertelement <32 x float> [[TMP92]], float [[FVAL]], i32 6 -; MAX256-NEXT: [[TMP94:%.*]] = extractelement <32 x float> [[TMP38]], i32 7 -; MAX256-NEXT: [[TMP95:%.*]] = insertelement <32 x float> [[TMP93]], float 
[[TMP94]], i32 7 -; MAX256-NEXT: [[TMP96:%.*]] = extractelement <32 x float> [[TMP38]], i32 8 -; MAX256-NEXT: [[TMP97:%.*]] = insertelement <32 x float> [[TMP95]], float [[TMP96]], i32 8 -; MAX256-NEXT: [[TMP98:%.*]] = insertelement <32 x float> [[TMP97]], float [[FVAL]], i32 9 -; MAX256-NEXT: [[TMP99:%.*]] = insertelement <32 x float> [[TMP98]], float [[FVAL]], i32 10 -; MAX256-NEXT: [[TMP100:%.*]] = insertelement <32 x float> [[TMP99]], float [[TMP55]], i32 11 -; MAX256-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[TMP38]], i32 12 -; MAX256-NEXT: [[TMP102:%.*]] = insertelement <32 x float> [[TMP100]], float [[TMP101]], i32 12 -; MAX256-NEXT: [[TMP103:%.*]] = insertelement <32 x float> [[TMP102]], float [[FVAL]], i32 13 -; MAX256-NEXT: [[TMP104:%.*]] = insertelement <32 x float> [[TMP103]], float [[FVAL]], i32 14 -; MAX256-NEXT: [[TMP105:%.*]] = insertelement <32 x float> [[TMP104]], float [[TMP61]], i32 15 -; MAX256-NEXT: [[TMP106:%.*]] = extractelement <32 x float> [[TMP38]], i32 16 -; MAX256-NEXT: [[TMP107:%.*]] = insertelement <32 x float> [[TMP105]], float [[TMP106]], i32 16 -; MAX256-NEXT: [[TMP108:%.*]] = insertelement <32 x float> [[TMP107]], float [[FVAL]], i32 17 -; MAX256-NEXT: [[TMP109:%.*]] = insertelement <32 x float> [[TMP108]], float [[FVAL]], i32 18 -; MAX256-NEXT: [[TMP110:%.*]] = insertelement <32 x float> [[TMP109]], float [[TMP67]], i32 19 -; MAX256-NEXT: [[TMP111:%.*]] = extractelement <32 x float> [[TMP38]], i32 20 -; MAX256-NEXT: [[TMP112:%.*]] = insertelement <32 x float> [[TMP110]], float [[TMP111]], i32 20 -; MAX256-NEXT: [[TMP113:%.*]] = insertelement <32 x float> [[TMP112]], float [[FVAL]], i32 21 -; MAX256-NEXT: [[TMP114:%.*]] = insertelement <32 x float> [[TMP113]], float [[FVAL]], i32 22 -; MAX256-NEXT: [[TMP115:%.*]] = insertelement <32 x float> [[TMP114]], float [[TMP73]], i32 23 -; MAX256-NEXT: [[TMP116:%.*]] = extractelement <32 x float> [[TMP38]], i32 24 -; MAX256-NEXT: [[TMP117:%.*]] = insertelement <32 x float> 
[[TMP115]], float [[TMP116]], i32 24 -; MAX256-NEXT: [[TMP118:%.*]] = insertelement <32 x float> [[TMP117]], float [[FVAL]], i32 25 -; MAX256-NEXT: [[TMP119:%.*]] = insertelement <32 x float> [[TMP118]], float [[FVAL]], i32 26 -; MAX256-NEXT: [[TMP120:%.*]] = insertelement <32 x float> [[TMP119]], float [[TMP79]], i32 27 -; MAX256-NEXT: [[TMP121:%.*]] = extractelement <32 x float> [[TMP38]], i32 28 -; MAX256-NEXT: [[TMP122:%.*]] = insertelement <32 x float> [[TMP120]], float [[TMP121]], i32 28 -; MAX256-NEXT: [[TMP123:%.*]] = insertelement <32 x float> [[TMP122]], float [[FVAL]], i32 29 -; MAX256-NEXT: [[TMP124:%.*]] = insertelement <32 x float> [[TMP123]], float [[FVAL]], i32 30 -; MAX256-NEXT: [[TMP125:%.*]] = insertelement <32 x float> [[TMP124]], float [[TMP85]], i32 31 +; MAX256-NEXT: [[TMP75:%.*]] = insertelement <8 x float> [[TMP34]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP76:%.*]] = insertelement <8 x float> [[TMP75]], float [[FVAL]], i32 2 +; MAX256-NEXT: [[TMP77:%.*]] = extractelement <8 x float> [[TMP14]], i32 3 +; MAX256-NEXT: [[TMP78:%.*]] = insertelement <8 x float> [[TMP76]], float [[TMP77]], i32 3 +; MAX256-NEXT: [[TMP79:%.*]] = insertelement <8 x float> [[TMP78]], float [[TMP39]], i32 4 +; MAX256-NEXT: [[TMP80:%.*]] = insertelement <8 x float> [[TMP79]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP81:%.*]] = insertelement <8 x float> [[TMP80]], float [[FVAL]], i32 6 +; MAX256-NEXT: [[TMP82:%.*]] = extractelement <8 x float> [[TMP14]], i32 7 +; MAX256-NEXT: [[TMP83:%.*]] = insertelement <8 x float> [[TMP81]], float [[TMP82]], i32 7 +; MAX256-NEXT: [[TMP84:%.*]] = extractelement <8 x float> [[TMP28]], i32 0 +; MAX256-NEXT: [[TMP85:%.*]] = insertelement <8 x float> undef, float [[TMP84]], i32 0 +; MAX256-NEXT: [[TMP86:%.*]] = insertelement <8 x float> [[TMP85]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP87:%.*]] = insertelement <8 x float> [[TMP86]], float [[FVAL]], i32 2 +; MAX256-NEXT: [[TMP88:%.*]] = insertelement <8 x float> [[TMP87]], float 
[[TMP47]], i32 3 +; MAX256-NEXT: [[TMP89:%.*]] = extractelement <8 x float> [[TMP28]], i32 4 +; MAX256-NEXT: [[TMP90:%.*]] = insertelement <8 x float> [[TMP88]], float [[TMP89]], i32 4 +; MAX256-NEXT: [[TMP91:%.*]] = insertelement <8 x float> [[TMP90]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP92:%.*]] = insertelement <8 x float> [[TMP91]], float [[FVAL]], i32 6 +; MAX256-NEXT: [[TMP93:%.*]] = insertelement <8 x float> [[TMP92]], float [[TMP53]], i32 7 +; MAX256-NEXT: [[TMP94:%.*]] = extractelement <8 x float> [[TMP30]], i32 0 +; MAX256-NEXT: [[TMP95:%.*]] = insertelement <8 x float> undef, float [[TMP94]], i32 0 +; MAX256-NEXT: [[TMP96:%.*]] = insertelement <8 x float> [[TMP95]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP97:%.*]] = insertelement <8 x float> [[TMP96]], float [[FVAL]], i32 2 +; MAX256-NEXT: [[TMP98:%.*]] = insertelement <8 x float> [[TMP97]], float [[TMP57]], i32 3 +; MAX256-NEXT: [[TMP99:%.*]] = extractelement <8 x float> [[TMP30]], i32 4 +; MAX256-NEXT: [[TMP100:%.*]] = insertelement <8 x float> [[TMP98]], float [[TMP99]], i32 4 +; MAX256-NEXT: [[TMP101:%.*]] = insertelement <8 x float> [[TMP100]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP102:%.*]] = insertelement <8 x float> [[TMP101]], float [[FVAL]], i32 6 +; MAX256-NEXT: [[TMP103:%.*]] = insertelement <8 x float> [[TMP102]], float [[TMP63]], i32 7 +; MAX256-NEXT: [[TMP104:%.*]] = extractelement <8 x float> [[TMP32]], i32 0 +; MAX256-NEXT: [[TMP105:%.*]] = insertelement <8 x float> undef, float [[TMP104]], i32 0 +; MAX256-NEXT: [[TMP106:%.*]] = insertelement <8 x float> [[TMP105]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP107:%.*]] = insertelement <8 x float> [[TMP106]], float [[FVAL]], i32 2 +; MAX256-NEXT: [[TMP108:%.*]] = insertelement <8 x float> [[TMP107]], float [[TMP67]], i32 3 +; MAX256-NEXT: [[TMP109:%.*]] = extractelement <8 x float> [[TMP32]], i32 4 +; MAX256-NEXT: [[TMP110:%.*]] = insertelement <8 x float> [[TMP108]], float [[TMP109]], i32 4 +; MAX256-NEXT: [[TMP111:%.*]] = 
insertelement <8 x float> [[TMP110]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP112:%.*]] = insertelement <8 x float> [[TMP111]], float [[FVAL]], i32 6 +; MAX256-NEXT: [[TMP113:%.*]] = insertelement <8 x float> [[TMP112]], float [[TMP73]], i32 7 ; MAX256-NEXT: br label [[BB2]] ; MAX256: bb5: -; MAX256-NEXT: [[TMP126:%.*]] = insertelement <32 x float> [[TMP5]], float [[TMP41]], i32 1 -; MAX256-NEXT: [[TMP127:%.*]] = insertelement <32 x float> [[TMP126]], float [[FVAL]], i32 2 -; MAX256-NEXT: [[TMP128:%.*]] = extractelement <32 x float> [[TMP38]], i32 3 -; MAX256-NEXT: [[TMP129:%.*]] = insertelement <32 x float> [[TMP127]], float [[TMP128]], i32 3 -; MAX256-NEXT: [[TMP130:%.*]] = insertelement <32 x float> [[TMP129]], float [[FVAL]], i32 4 -; MAX256-NEXT: [[TMP131:%.*]] = insertelement <32 x float> [[TMP130]], float [[TMP47]], i32 5 -; MAX256-NEXT: [[TMP132:%.*]] = insertelement <32 x float> [[TMP131]], float [[FVAL]], i32 6 -; MAX256-NEXT: [[TMP133:%.*]] = extractelement <32 x float> [[TMP38]], i32 7 -; MAX256-NEXT: [[TMP134:%.*]] = insertelement <32 x float> [[TMP132]], float [[TMP133]], i32 7 -; MAX256-NEXT: [[TMP135:%.*]] = extractelement <32 x float> [[TMP38]], i32 8 -; MAX256-NEXT: [[TMP136:%.*]] = insertelement <32 x float> [[TMP134]], float [[TMP135]], i32 8 -; MAX256-NEXT: [[TMP137:%.*]] = insertelement <32 x float> [[TMP136]], float [[FVAL]], i32 9 -; MAX256-NEXT: [[TMP138:%.*]] = insertelement <32 x float> [[TMP137]], float [[TMP53]], i32 10 -; MAX256-NEXT: [[TMP139:%.*]] = insertelement <32 x float> [[TMP138]], float [[FVAL]], i32 11 -; MAX256-NEXT: [[TMP140:%.*]] = extractelement <32 x float> [[TMP38]], i32 12 -; MAX256-NEXT: [[TMP141:%.*]] = insertelement <32 x float> [[TMP139]], float [[TMP140]], i32 12 -; MAX256-NEXT: [[TMP142:%.*]] = insertelement <32 x float> [[TMP141]], float [[FVAL]], i32 13 -; MAX256-NEXT: [[TMP143:%.*]] = insertelement <32 x float> [[TMP142]], float [[TMP59]], i32 14 -; MAX256-NEXT: [[TMP144:%.*]] = insertelement <32 x float> 
[[TMP143]], float [[FVAL]], i32 15 -; MAX256-NEXT: [[TMP145:%.*]] = extractelement <32 x float> [[TMP38]], i32 16 -; MAX256-NEXT: [[TMP146:%.*]] = insertelement <32 x float> [[TMP144]], float [[TMP145]], i32 16 -; MAX256-NEXT: [[TMP147:%.*]] = insertelement <32 x float> [[TMP146]], float [[FVAL]], i32 17 -; MAX256-NEXT: [[TMP148:%.*]] = insertelement <32 x float> [[TMP147]], float [[TMP65]], i32 18 -; MAX256-NEXT: [[TMP149:%.*]] = insertelement <32 x float> [[TMP148]], float [[FVAL]], i32 19 -; MAX256-NEXT: [[TMP150:%.*]] = extractelement <32 x float> [[TMP38]], i32 20 -; MAX256-NEXT: [[TMP151:%.*]] = insertelement <32 x float> [[TMP149]], float [[TMP150]], i32 20 -; MAX256-NEXT: [[TMP152:%.*]] = insertelement <32 x float> [[TMP151]], float [[FVAL]], i32 21 -; MAX256-NEXT: [[TMP153:%.*]] = insertelement <32 x float> [[TMP152]], float [[TMP71]], i32 22 -; MAX256-NEXT: [[TMP154:%.*]] = insertelement <32 x float> [[TMP153]], float [[FVAL]], i32 23 -; MAX256-NEXT: [[TMP155:%.*]] = extractelement <32 x float> [[TMP38]], i32 24 -; MAX256-NEXT: [[TMP156:%.*]] = insertelement <32 x float> [[TMP154]], float [[TMP155]], i32 24 -; MAX256-NEXT: [[TMP157:%.*]] = insertelement <32 x float> [[TMP156]], float [[FVAL]], i32 25 -; MAX256-NEXT: [[TMP158:%.*]] = insertelement <32 x float> [[TMP157]], float [[TMP77]], i32 26 -; MAX256-NEXT: [[TMP159:%.*]] = insertelement <32 x float> [[TMP158]], float [[FVAL]], i32 27 -; MAX256-NEXT: [[TMP160:%.*]] = extractelement <32 x float> [[TMP38]], i32 28 -; MAX256-NEXT: [[TMP161:%.*]] = insertelement <32 x float> [[TMP159]], float [[TMP160]], i32 28 -; MAX256-NEXT: [[TMP162:%.*]] = insertelement <32 x float> [[TMP161]], float [[FVAL]], i32 29 -; MAX256-NEXT: [[TMP163:%.*]] = insertelement <32 x float> [[TMP162]], float [[TMP83]], i32 30 -; MAX256-NEXT: [[TMP164:%.*]] = insertelement <32 x float> [[TMP163]], float [[FVAL]], i32 31 +; MAX256-NEXT: [[TMP114:%.*]] = insertelement <8 x float> [[TMP5]], float [[TMP35]], i32 1 +; MAX256-NEXT: 
[[TMP115:%.*]] = insertelement <8 x float> [[TMP114]], float [[FVAL]], i32 2 +; MAX256-NEXT: [[TMP116:%.*]] = extractelement <8 x float> [[TMP14]], i32 3 +; MAX256-NEXT: [[TMP117:%.*]] = insertelement <8 x float> [[TMP115]], float [[TMP116]], i32 3 +; MAX256-NEXT: [[TMP118:%.*]] = insertelement <8 x float> [[TMP117]], float [[FVAL]], i32 4 +; MAX256-NEXT: [[TMP119:%.*]] = insertelement <8 x float> [[TMP118]], float [[TMP41]], i32 5 +; MAX256-NEXT: [[TMP120:%.*]] = insertelement <8 x float> [[TMP119]], float [[FVAL]], i32 6 +; MAX256-NEXT: [[TMP121:%.*]] = extractelement <8 x float> [[TMP14]], i32 7 +; MAX256-NEXT: [[TMP122:%.*]] = insertelement <8 x float> [[TMP120]], float [[TMP121]], i32 7 +; MAX256-NEXT: [[TMP123:%.*]] = extractelement <8 x float> [[TMP28]], i32 0 +; MAX256-NEXT: [[TMP124:%.*]] = insertelement <8 x float> undef, float [[TMP123]], i32 0 +; MAX256-NEXT: [[TMP125:%.*]] = insertelement <8 x float> [[TMP124]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP126:%.*]] = insertelement <8 x float> [[TMP125]], float [[TMP45]], i32 2 +; MAX256-NEXT: [[TMP127:%.*]] = insertelement <8 x float> [[TMP126]], float [[FVAL]], i32 3 +; MAX256-NEXT: [[TMP128:%.*]] = extractelement <8 x float> [[TMP28]], i32 4 +; MAX256-NEXT: [[TMP129:%.*]] = insertelement <8 x float> [[TMP127]], float [[TMP128]], i32 4 +; MAX256-NEXT: [[TMP130:%.*]] = insertelement <8 x float> [[TMP129]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP131:%.*]] = insertelement <8 x float> [[TMP130]], float [[TMP51]], i32 6 +; MAX256-NEXT: [[TMP132:%.*]] = insertelement <8 x float> [[TMP131]], float [[FVAL]], i32 7 +; MAX256-NEXT: [[TMP133:%.*]] = extractelement <8 x float> [[TMP30]], i32 0 +; MAX256-NEXT: [[TMP134:%.*]] = insertelement <8 x float> undef, float [[TMP133]], i32 0 +; MAX256-NEXT: [[TMP135:%.*]] = insertelement <8 x float> [[TMP134]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP136:%.*]] = insertelement <8 x float> [[TMP135]], float [[TMP55]], i32 2 +; MAX256-NEXT: [[TMP137:%.*]] = 
insertelement <8 x float> [[TMP136]], float [[FVAL]], i32 3 +; MAX256-NEXT: [[TMP138:%.*]] = extractelement <8 x float> [[TMP30]], i32 4 +; MAX256-NEXT: [[TMP139:%.*]] = insertelement <8 x float> [[TMP137]], float [[TMP138]], i32 4 +; MAX256-NEXT: [[TMP140:%.*]] = insertelement <8 x float> [[TMP139]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP141:%.*]] = insertelement <8 x float> [[TMP140]], float [[TMP61]], i32 6 +; MAX256-NEXT: [[TMP142:%.*]] = insertelement <8 x float> [[TMP141]], float [[FVAL]], i32 7 +; MAX256-NEXT: [[TMP143:%.*]] = extractelement <8 x float> [[TMP32]], i32 0 +; MAX256-NEXT: [[TMP144:%.*]] = insertelement <8 x float> undef, float [[TMP143]], i32 0 +; MAX256-NEXT: [[TMP145:%.*]] = insertelement <8 x float> [[TMP144]], float [[FVAL]], i32 1 +; MAX256-NEXT: [[TMP146:%.*]] = insertelement <8 x float> [[TMP145]], float [[TMP65]], i32 2 +; MAX256-NEXT: [[TMP147:%.*]] = insertelement <8 x float> [[TMP146]], float [[FVAL]], i32 3 +; MAX256-NEXT: [[TMP148:%.*]] = extractelement <8 x float> [[TMP32]], i32 4 +; MAX256-NEXT: [[TMP149:%.*]] = insertelement <8 x float> [[TMP147]], float [[TMP148]], i32 4 +; MAX256-NEXT: [[TMP150:%.*]] = insertelement <8 x float> [[TMP149]], float [[FVAL]], i32 5 +; MAX256-NEXT: [[TMP151:%.*]] = insertelement <8 x float> [[TMP150]], float [[TMP71]], i32 6 +; MAX256-NEXT: [[TMP152:%.*]] = insertelement <8 x float> [[TMP151]], float [[FVAL]], i32 7 ; MAX256-NEXT: br label [[BB2]] ; MAX256: bb2: -; MAX256-NEXT: [[TMP165:%.*]] = phi <32 x float> [ [[TMP38]], [[BB3]] ], [ [[TMP125]], [[BB4]] ], [ [[TMP164]], [[BB5]] ], [ [[TMP86]], [[BB1]] ] +; MAX256-NEXT: [[TMP153:%.*]] = phi <8 x float> [ [[TMP14]], [[BB3]] ], [ [[TMP83]], [[BB4]] ], [ [[TMP122]], [[BB5]] ], [ [[TMP44]], [[BB1]] ] +; MAX256-NEXT: [[TMP154:%.*]] = phi <8 x float> [ [[TMP28]], [[BB3]] ], [ [[TMP93]], [[BB4]] ], [ [[TMP132]], [[BB5]] ], [ [[TMP54]], [[BB1]] ] +; MAX256-NEXT: [[TMP155:%.*]] = phi <8 x float> [ [[TMP30]], [[BB3]] ], [ [[TMP103]], [[BB4]] ], [ 
[[TMP142]], [[BB5]] ], [ [[TMP64]], [[BB1]] ] +; MAX256-NEXT: [[TMP156:%.*]] = phi <8 x float> [ [[TMP32]], [[BB3]] ], [ [[TMP113]], [[BB4]] ], [ [[TMP152]], [[BB5]] ], [ [[TMP74]], [[BB1]] ] ; MAX256-NEXT: ret void ; ; MAX1024-LABEL: @phi_float32( From llvm-commits at lists.llvm.org Wed Jul 8 08:25:47 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Wed, 08 Jul 2020 08:25:47 -0700 (PDT) Subject: [llvm] 6aab27b - [OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. Message-ID: <5f05e57b.1c69fb81.1c5d2.0892@mx.google.com> Author: sstefan1 Date: 2020-07-08T17:23:55+02:00 New Revision: 6aab27ba851f132f01ea8e87f92243918dc23cfd URL: https://github.com/llvm/llvm-project/commit/6aab27ba851f132f01ea8e87f92243918dc23cfd DIFF: https://github.com/llvm/llvm-project/commit/6aab27ba851f132f01ea8e87f92243918dc23cfd.diff LOG: [OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. Summary: D82193 exposed a problem with global type definitions in `OMPConstants.h`. This causes a race when running in thinLTO mode. Types now live inside of OpenMPIRBuilder to prevent this from happening. 
Reviewers: jdoerfert Subscribers: yaxunl, hiraditya, guansong, dexonsmith, aaron.ballman, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D83176 Added: Modified: clang/lib/CodeGen/CGDecl.cpp clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGOpenMPRuntime.cpp clang/lib/CodeGen/CGOpenMPRuntime.h clang/lib/CodeGen/CGStmtOpenMP.cpp clang/lib/CodeGen/CodeGenFunction.cpp clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.h llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp Removed: ################################################################################ diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp index 09593531af83..1729c7ed3c31 100644 --- a/clang/lib/CodeGen/CGDecl.cpp +++ b/clang/lib/CodeGen/CGDecl.cpp @@ -1401,7 +1401,7 @@ CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D) { Address address = Address::invalid(); Address AllocaAddr = Address::invalid(); Address OpenMPLocalAddr = Address::invalid(); - if (CGM.getOpenMPIRBuilder()) + if (CGM.getLangOpts().OpenMPIRBuilder) OpenMPLocalAddr = OMPBuilderCBHelpers::getAddressOfLocalVariable(*this, &D); else OpenMPLocalAddr = diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp index 2547690ed3a3..be5d976f346c 100644 --- a/clang/lib/CodeGen/CGExpr.cpp +++ b/clang/lib/CodeGen/CGExpr.cpp @@ -2398,7 +2398,7 @@ EmitBitCastOfLValueToProperType(CodeGenFunction &CGF, static LValue EmitThreadPrivateVarDeclLValue( CodeGenFunction &CGF, const VarDecl *VD, QualType T, Address Addr, llvm::Type *RealVarTy, SourceLocation Loc) { - if (CGF.CGM.getOpenMPIRBuilder()) + if (CGF.CGM.getLangOpts().OpenMPIRBuilder) Addr = CodeGenFunction::OMPBuilderCBHelpers::getAddrOfThreadPrivate( CGF, VD, Addr, Loc); else diff 
--git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index 858dbbf81f20..fb2ce60f2e41 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1060,7 +1060,7 @@ static FieldDecl *addFieldToRecordDecl(ASTContext &C, DeclContext *DC, CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM, StringRef FirstSeparator, StringRef Separator) : CGM(CGM), FirstSeparator(FirstSeparator), Separator(Separator), - OffloadEntriesInfoManager(CGM) { + OMPBuilder(CGM.getModule()), OffloadEntriesInfoManager(CGM) { ASTContext &C = CGM.getContext(); RecordDecl *RD = C.buildImplicitRecord("ident_t"); QualType KmpInt32Ty = C.getIntTypeForBitwidth(/*DestWidth=*/32, /*Signed=*/1); @@ -1081,7 +1081,7 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM, StringRef FirstSeparator, KmpCriticalNameTy = llvm::ArrayType::get(CGM.Int32Ty, /*NumElements*/ 8); // Initialize Types used in OpenMPIRBuilder from OMPKinds.def - llvm::omp::types::initializeTypes(CGM.getModule()); + OMPBuilder.initialize(); loadOffloadInfoMetadata(); } @@ -1278,8 +1278,8 @@ static llvm::Function *emitParallelOrTeamsOutlinedFunction( // TODO: Temporarily inform the OpenMPIRBuilder, if any, about the new // parallel region to make cancellation barriers work properly. 
- llvm::OpenMPIRBuilder *OMPBuilder = CGM.getOpenMPIRBuilder(); - PushAndPopStackRAII PSR(OMPBuilder, CGF, HasCancel); + llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder(); + PushAndPopStackRAII PSR(&OMPBuilder, CGF, HasCancel); CGOpenMPOutlinedRegionInfo CGInfo(*CS, ThreadIDVar, CodeGen, InnermostKind, HasCancel, OutlinedHelperName); CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo); @@ -1316,7 +1316,7 @@ llvm::Function *CGOpenMPRuntime::emitTaskOutlinedFunction( CGF.EmitLoadOfPointerLValue(CGF.GetAddrOfLocalVar(TaskTVar), TaskTVar->getType()->castAs<PointerType>()) .getPointer(CGF)}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_task), TaskArgs); }; @@ -1563,8 +1563,8 @@ llvm::Value *CGOpenMPRuntime::getThreadID(CodeGenFunction &CGF, CGBuilderTy::InsertPointGuard IPG(CGF.Builder); CGF.Builder.SetInsertPoint(Elem.second.ServiceInsertPt); llvm::CallInst *Call = CGF.Builder.CreateCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_global_thread_num), + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___kmpc_global_thread_num), emitUpdateLocation(CGF, Loc)); Call->setCallingConv(CGF.getRuntimeCC()); Elem.second.ThreadID = Call; @@ -1783,7 +1783,7 @@ Address CGOpenMPRuntime::getAddrOfThreadPrivate(CodeGenFunction &CGF, CGM.getSize(CGM.GetTargetTypeStoreSize(VarTy)), getOrCreateThreadPrivateCache(VD)}; return Address(CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_threadprivate_cached), Args), VDAddr.getAlignment()); @@ -1795,7 +1795,7 @@ void CGOpenMPRuntime::emitThreadPrivateVarInit( // Call kmp_int32 __kmpc_global_thread_num(&loc) to init OpenMP runtime // library. 
llvm::Value *OMPLoc = emitUpdateLocation(CGF, Loc); - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_global_thread_num), OMPLoc); // Call __kmpc_threadprivate_register(&loc, &var, ctor, cctor/*NULL*/, dtor) @@ -1804,7 +1804,7 @@ void CGOpenMPRuntime::emitThreadPrivateVarInit( OMPLoc, CGF.Builder.CreatePointerCast(VDAddr.getPointer(), CGM.VoidPtrTy), Ctor, CopyCtor, Dtor}; CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_threadprivate_register), Args); } @@ -2068,7 +2068,7 @@ Address CGOpenMPRuntime::getAddrOfArtificialThreadPrivate(CodeGenFunction &CGF, return Address( CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_threadprivate_cached), Args), VarLVType->getPointerTo(/*AddrSpace=*/0)), @@ -2122,8 +2122,8 @@ void CGOpenMPRuntime::emitParallelCall(CodeGenFunction &CGF, SourceLocation Loc, return; llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc); auto &M = CGM.getModule(); - auto &&ThenGen = [&M, OutlinedFn, CapturedVars, RTLoc](CodeGenFunction &CGF, - PrePostActionTy &) { + auto &&ThenGen = [&M, OutlinedFn, CapturedVars, RTLoc, + this](CodeGenFunction &CGF, PrePostActionTy &) { // Build call __kmpc_fork_call(loc, n, microtask, var1, .., varn); CGOpenMPRuntime &RT = CGF.CGM.getOpenMPRuntime(); llvm::Value *Args[] = { @@ -2135,18 +2135,17 @@ void CGOpenMPRuntime::emitParallelCall(CodeGenFunction &CGF, SourceLocation Loc, RealArgs.append(CapturedVars.begin(), CapturedVars.end()); llvm::FunctionCallee RTLFn = - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - M, OMPRTL___kmpc_fork_call); + OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_fork_call); CGF.EmitRuntimeCall(RTLFn, RealArgs); }; - auto 
&&ElseGen = [&M, OutlinedFn, CapturedVars, RTLoc, - Loc](CodeGenFunction &CGF, PrePostActionTy &) { + auto &&ElseGen = [&M, OutlinedFn, CapturedVars, RTLoc, Loc, + this](CodeGenFunction &CGF, PrePostActionTy &) { CGOpenMPRuntime &RT = CGF.CGM.getOpenMPRuntime(); llvm::Value *ThreadID = RT.getThreadID(CGF, Loc); // Build calls: // __kmpc_serialized_parallel(&Loc, GTid); llvm::Value *Args[] = {RTLoc, ThreadID}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( M, OMPRTL___kmpc_serialized_parallel), Args); @@ -2165,7 +2164,7 @@ void CGOpenMPRuntime::emitParallelCall(CodeGenFunction &CGF, SourceLocation Loc, // __kmpc_end_serialized_parallel(&Loc, GTid); llvm::Value *EndArgs[] = {RT.emitUpdateLocation(CGF, Loc), ThreadID}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( M, OMPRTL___kmpc_end_serialized_parallel), EndArgs); }; @@ -2284,12 +2283,12 @@ void CGOpenMPRuntime::emitCriticalRegion(CodeGenFunction &CGF, CGF.EmitScalarExpr(Hint), CGM.Int32Ty, /*isSigned=*/false)); } CommonActionTy Action( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), Hint ? 
OMPRTL___kmpc_critical_with_hint : OMPRTL___kmpc_critical), EnterArgs, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_end_critical), + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___kmpc_end_critical), Args); CriticalOpGen.setAction(Action); emitInlinedDirective(CGF, OMPD_critical, CriticalOpGen); @@ -2306,10 +2305,10 @@ void CGOpenMPRuntime::emitMasterRegion(CodeGenFunction &CGF, // } // Prepare arguments and build a call to __kmpc_master llvm::Value *Args[] = {emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc)}; - CommonActionTy Action(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_master), Args, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_end_master), Args, /*Conditional=*/true); @@ -2322,15 +2321,14 @@ void CGOpenMPRuntime::emitTaskyieldCall(CodeGenFunction &CGF, SourceLocation Loc) { if (!CGF.HaveInsertPoint()) return; - llvm::OpenMPIRBuilder *OMPBuilder = CGF.CGM.getOpenMPIRBuilder(); - if (OMPBuilder) { - OMPBuilder->CreateTaskyield(CGF.Builder); + if (CGF.CGM.getLangOpts().OpenMPIRBuilder) { + OMPBuilder.CreateTaskyield(CGF.Builder); } else { // Build call __kmpc_omp_taskyield(loc, thread_id, 0); llvm::Value *Args[] = { emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc), llvm::ConstantInt::get(CGM.IntTy, /*V=*/0, /*isSigned=*/true)}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_taskyield), Args); } @@ -2349,10 +2347,10 @@ void CGOpenMPRuntime::emitTaskgroupRegion(CodeGenFunction &CGF, // __kmpc_end_taskgroup(ident_t *, gtid); // Prepare arguments and build a call to __kmpc_taskgroup llvm::Value *Args[] = {emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc)}; - CommonActionTy 
Action(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_taskgroup), Args, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_end_taskgroup), Args); TaskgroupOpGen.setAction(Action); @@ -2459,10 +2457,10 @@ void CGOpenMPRuntime::emitSingleRegion(CodeGenFunction &CGF, } // Prepare arguments and build a call to __kmpc_single llvm::Value *Args[] = {emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc)}; - CommonActionTy Action(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_single), Args, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_end_single), Args, /*Conditional=*/true); @@ -2509,7 +2507,7 @@ void CGOpenMPRuntime::emitSingleRegion(CodeGenFunction &CGF, CpyFn, // void (*) (void *, void *) DidItVal // i32 did_it }; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_copyprivate), Args); } @@ -2526,10 +2524,10 @@ void CGOpenMPRuntime::emitOrderedRegion(CodeGenFunction &CGF, // Prepare arguments and build a call to __kmpc_ordered if (IsThreads) { llvm::Value *Args[] = {emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc)}; - CommonActionTy Action(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_ordered), Args, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_end_ordered), Args); OrderedOpGen.setAction(Action); @@ -2578,9 +2576,8 @@ void CGOpenMPRuntime::emitBarrierCall(CodeGenFunction &CGF, SourceLocation Loc, // Check if we should use the OMPBuilder 
auto *OMPRegionInfo = dyn_cast_or_null(CGF.CapturedStmtInfo); - llvm::OpenMPIRBuilder *OMPBuilder = CGF.CGM.getOpenMPIRBuilder(); - if (OMPBuilder) { - CGF.Builder.restoreIP(OMPBuilder->CreateBarrier( + if (CGF.CGM.getLangOpts().OpenMPIRBuilder) { + CGF.Builder.restoreIP(OMPBuilder.CreateBarrier( CGF.Builder, Kind, ForceSimpleCall, EmitChecks)); return; } @@ -2597,8 +2594,8 @@ void CGOpenMPRuntime::emitBarrierCall(CodeGenFunction &CGF, SourceLocation Loc, if (OMPRegionInfo) { if (!ForceSimpleCall && OMPRegionInfo->hasCancel()) { llvm::Value *Result = CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_cancel_barrier), + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___kmpc_cancel_barrier), Args); if (EmitChecks) { // if (__kmpc_cancel_barrier()) { @@ -2618,7 +2615,7 @@ void CGOpenMPRuntime::emitBarrierCall(CodeGenFunction &CGF, SourceLocation Loc, return; } } - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_barrier), Args); } @@ -2870,7 +2867,7 @@ void CGOpenMPRuntime::emitForStaticFinish(CodeGenFunction &CGF, : OMP_IDENT_WORK_SECTIONS), getThreadID(CGF, Loc)}; auto DL = ApplyDebugLocation::CreateDefaultArtificial(CGF, Loc); - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_for_static_fini), Args); } @@ -2919,7 +2916,7 @@ void CGOpenMPRuntime::emitNumThreadsClause(CodeGenFunction &CGF, llvm::Value *Args[] = { emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc), CGF.Builder.CreateIntCast(NumThreads, CGF.Int32Ty, /*isSigned*/ true)}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_push_num_threads), Args); } @@ -2934,21 +2931,20 @@ void 
CGOpenMPRuntime::emitProcBindClause(CodeGenFunction &CGF, llvm::Value *Args[] = { emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc), llvm::ConstantInt::get(CGM.IntTy, unsigned(ProcBind), /*isSigned=*/true)}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_push_proc_bind), Args); } void CGOpenMPRuntime::emitFlush(CodeGenFunction &CGF, ArrayRef, SourceLocation Loc, llvm::AtomicOrdering AO) { - llvm::OpenMPIRBuilder *OMPBuilder = CGF.CGM.getOpenMPIRBuilder(); - if (OMPBuilder) { - OMPBuilder->CreateFlush(CGF.Builder); + if (CGF.CGM.getLangOpts().OpenMPIRBuilder) { + OMPBuilder.CreateFlush(CGF.Builder); } else { if (!CGF.HaveInsertPoint()) return; // Build call void __kmpc_flush(ident_t *loc) - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_flush), emitUpdateLocation(CGF, Loc)); } @@ -4302,12 +4298,12 @@ CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc, DeviceID = CGF.Builder.getInt64(OMP_DEVICEID_UNDEF); AllocArgs.push_back(DeviceID); NewTask = CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_target_task_alloc), AllocArgs); } else { NewTask = - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_task_alloc), AllocArgs); } @@ -4324,7 +4320,7 @@ CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc, llvm::Value *Tid = getThreadID(CGF, DC->getBeginLoc()); Tid = CGF.Builder.CreateIntCast(Tid, CGF.IntTy, /*isSigned=*/false); llvm::Value *EvtVal = CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), 
OMPRTL___kmpc_task_allow_completion_event), {Loc, Tid, NewTask}); EvtVal = CGF.EmitScalarConversion(EvtVal, C.VoidPtrTy, Evt->getType(), @@ -4463,7 +4459,7 @@ CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc, // FIXME: Emit the function and ignore its result for now unless the // runtime function is properly implemented. (void)CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_reg_task_with_affinity), {LocRef, GTid, NewTask, NumOfElements, AffinListPtr}); } @@ -4966,7 +4962,7 @@ Address CGOpenMPRuntime::emitDepobjDependClause( llvm::Value *Args[] = {ThreadID, Size, Allocator}; llvm::Value *Addr = - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_alloc), Args, ".dep.arr.addr"); Addr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( @@ -5019,7 +5015,7 @@ void CGOpenMPRuntime::emitDestroyClause(CodeGenFunction &CGF, LValue DepobjLVal, llvm::Value *Args[] = {ThreadID, DepObjAddr, Allocator}; // _kmpc_free(gtid, addr, nullptr); - (void)CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + (void)CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_free), Args); } @@ -5120,11 +5116,11 @@ void CGOpenMPRuntime::emitTaskCall(CodeGenFunction &CGF, SourceLocation Loc, } if (!Data.Dependences.empty()) { CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_task_with_deps), DepTaskArgs); } else { - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_task), TaskArgs); } @@ -5144,8 +5140,8 @@ void CGOpenMPRuntime::emitTaskCall(CodeGenFunction &CGF, SourceLocation Loc, DepWaitTaskArgs[5] 
= llvm::ConstantPointerNull::get(CGF.VoidPtrTy); } auto &M = CGM.getModule(); - auto &&ElseCodeGen = [&M, &TaskArgs, ThreadID, NewTaskNewTaskTTy, TaskEntry, - &Data, &DepWaitTaskArgs, + auto &&ElseCodeGen = [this, &M, &TaskArgs, ThreadID, NewTaskNewTaskTTy, + TaskEntry, &Data, &DepWaitTaskArgs, Loc](CodeGenFunction &CGF, PrePostActionTy &) { CodeGenFunction::RunCleanupsScope LocalScope(CGF); // Build void __kmpc_omp_wait_deps(ident_t *, kmp_int32 gtid, @@ -5153,9 +5149,9 @@ void CGOpenMPRuntime::emitTaskCall(CodeGenFunction &CGF, SourceLocation Loc, // ndeps_noalias, kmp_depend_info_t *noalias_dep_list); if dependence info // is specified. if (!Data.Dependences.empty()) - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - M, OMPRTL___kmpc_omp_wait_deps), - DepWaitTaskArgs); + CGF.EmitRuntimeCall( + OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_omp_wait_deps), + DepWaitTaskArgs); // Call proxy_task_entry(gtid, new_task); auto &&CodeGen = [TaskEntry, ThreadID, NewTaskNewTaskTTy, Loc](CodeGenFunction &CGF, PrePostActionTy &Action) { @@ -5170,10 +5166,10 @@ void CGOpenMPRuntime::emitTaskCall(CodeGenFunction &CGF, SourceLocation Loc, // Build void __kmpc_omp_task_complete_if0(ident_t *, kmp_int32 gtid, // kmp_task_t *new_task); RegionCodeGenTy RCG(CodeGen); - CommonActionTy Action(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction( M, OMPRTL___kmpc_omp_task_begin_if0), TaskArgs, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( M, OMPRTL___kmpc_omp_task_complete_if0), TaskArgs); RCG.setAction(Action); @@ -5269,7 +5265,7 @@ void CGOpenMPRuntime::emitTaskLoopCall(CodeGenFunction &CGF, SourceLocation Loc, Result.TaskDupFn ? 
CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( Result.TaskDupFn, CGF.VoidPtrTy) : llvm::ConstantPointerNull::get(CGF.VoidPtrTy)}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_taskloop), TaskArgs); } @@ -5613,7 +5609,7 @@ void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc, Lock // kmp_critical_name *& }; llvm::Value *Res = CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), WithNowait ? OMPRTL___kmpc_reduce_nowait : OMPRTL___kmpc_reduce), Args); @@ -5656,7 +5652,7 @@ void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc, RegionCodeGenTy RCG(CodeGen); CommonActionTy Action( nullptr, llvm::None, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), WithNowait ? OMPRTL___kmpc_end_reduce_nowait : OMPRTL___kmpc_end_reduce), EndArgs); @@ -5781,7 +5777,7 @@ void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc, Lock // kmp_critical_name *& }; CommonActionTy Action(nullptr, llvm::None, - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_end_reduce), EndArgs); AtomicRCG.setAction(Action); @@ -6121,7 +6117,7 @@ llvm::Value *CGOpenMPRuntime::emitTaskReductionInit( CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( TaskRedInput.getPointer(), CGM.VoidPtrTy)}; return CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_taskred_modifier_init), Args); } @@ -6132,7 +6128,7 @@ llvm::Value *CGOpenMPRuntime::emitTaskReductionInit( llvm::ConstantInt::get(CGM.IntTy, Size, /*isSigned=*/true), CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(TaskRedInput.getPointer(), CGM.VoidPtrTy)}; - return 
CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + return CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_taskred_init), Args); } @@ -6150,7 +6146,7 @@ void CGOpenMPRuntime::emitTaskReductionFini(CodeGenFunction &CGF, IsWorksharingReduction ? 1 : 0, /*isSigned=*/true)}; (void)CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_task_reduction_modifier_fini), Args); } @@ -6186,7 +6182,7 @@ Address CGOpenMPRuntime::getTaskReductionItem(CodeGenFunction &CGF, SharedLVal.getPointer(CGF), CGM.VoidPtrTy)}; return Address( CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_task_reduction_get_th_data), Args), SharedLVal.getAlignment()); @@ -6197,15 +6193,14 @@ void CGOpenMPRuntime::emitTaskwaitCall(CodeGenFunction &CGF, if (!CGF.HaveInsertPoint()) return; - llvm::OpenMPIRBuilder *OMPBuilder = CGF.CGM.getOpenMPIRBuilder(); - if (OMPBuilder) { - OMPBuilder->CreateTaskwait(CGF.Builder); + if (CGF.CGM.getLangOpts().OpenMPIRBuilder) { + OMPBuilder.CreateTaskwait(CGF.Builder); } else { // Build call kmp_int32 __kmpc_omp_taskwait(ident_t *loc, kmp_int32 // global_tid); llvm::Value *Args[] = {emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc)}; // Ignore return result until untied tasks are supported. - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_omp_taskwait), Args); } @@ -6266,7 +6261,7 @@ void CGOpenMPRuntime::emitCancellationPointCall( CGF.Builder.getInt32(getCancellationKind(CancelRegion))}; // Ignore return result until untied tasks are supported. 
llvm::Value *Result = CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_cancellationpoint), Args); // if (__kmpc_cancellationpoint()) { @@ -6296,17 +6291,15 @@ void CGOpenMPRuntime::emitCancelCall(CodeGenFunction &CGF, SourceLocation Loc, auto &M = CGM.getModule(); if (auto *OMPRegionInfo = dyn_cast_or_null(CGF.CapturedStmtInfo)) { - auto &&ThenGen = [&M, Loc, CancelRegion, + auto &&ThenGen = [this, &M, Loc, CancelRegion, OMPRegionInfo](CodeGenFunction &CGF, PrePostActionTy &) { CGOpenMPRuntime &RT = CGF.CGM.getOpenMPRuntime(); llvm::Value *Args[] = { RT.emitUpdateLocation(CGF, Loc), RT.getThreadID(CGF, Loc), CGF.Builder.getInt32(getCancellationKind(CancelRegion))}; // Ignore return result until untied tasks are supported. - llvm::Value *Result = - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - M, OMPRTL___kmpc_cancel), - Args); + llvm::Value *Result = CGF.EmitRuntimeCall( + OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_cancel), Args); // if (__kmpc_cancel()) { // exit from construct; // } @@ -6402,7 +6395,7 @@ void CGOpenMPRuntime::emitUsesAllocatorsInit(CodeGenFunction &CGF, CGF.EmitLoadOfScalar(AllocatorTraitsLVal, AllocatorTraits->getExprLoc()); llvm::Value *AllocatorVal = - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_init_allocator), {ThreadId, MemSpaceHandle, NumTraits, Traits}); // Store to allocator. 
@@ -6426,8 +6419,8 @@ void CGOpenMPRuntime::emitUsesAllocatorsFini(CodeGenFunction &CGF, CGF.getContext().VoidPtrTy, Allocator->getExprLoc()); (void)CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_destroy_allocator), + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___kmpc_destroy_allocator), {ThreadId, AllocatorVal}); } @@ -9068,8 +9061,8 @@ void CGOpenMPRuntime::emitUserDefinedMapper(const OMPDeclareMapperDecl *D, // pre-existing components. llvm::Value *OffloadingArgs[] = {Handle}; llvm::Value *PreviousSize = MapperCGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___tgt_mapper_num_components), + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___tgt_mapper_num_components), OffloadingArgs); llvm::Value *ShiftedPreviousSize = MapperCGF.Builder.CreateShl( PreviousSize, @@ -9176,7 +9169,7 @@ void CGOpenMPRuntime::emitUserDefinedMapper(const OMPDeclareMapperDecl *D, llvm::Value *OffloadingArgs[] = {Handle, CurBaseArg, CurBeginArg, CurSizeArg, CurMapType}; MapperCGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___tgt_push_mapper_component), OffloadingArgs); } @@ -9258,8 +9251,8 @@ void CGOpenMPRuntime::emitUDMapperArrayInitOrDel( // data structure. 
llvm::Value *OffloadingArgs[] = {Handle, Base, Begin, ArraySize, MapTypeArg}; MapperCGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___tgt_push_mapper_component), + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___tgt_push_mapper_component), OffloadingArgs); } @@ -9282,7 +9275,7 @@ void CGOpenMPRuntime::emitTargetNumIterationsCall( if (llvm::Value *NumIterations = SizeEmitter(CGF, *LD)) { llvm::Value *Args[] = {DeviceID, NumIterations}; CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_push_target_tripcount), Args); } @@ -9411,7 +9404,7 @@ void CGOpenMPRuntime::emitTargetCall( NumTeams, NumThreads}; Return = CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), HasNowait ? OMPRTL___tgt_target_teams_nowait : OMPRTL___tgt_target_teams), OffloadingArgs); @@ -9424,7 +9417,7 @@ void CGOpenMPRuntime::emitTargetCall( InputInfo.SizesArray.getPointer(), MapTypesArray}; Return = CGF.EmitRuntimeCall( - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), HasNowait ? 
OMPRTL___tgt_target_nowait : OMPRTL___tgt_target), OffloadingArgs); @@ -10060,7 +10053,7 @@ llvm::Function *CGOpenMPRuntime::emitRequiresDirectiveRegFun() { "Target or declare target region expected."); if (HasRequiresUnifiedSharedMemory) Flags = OMP_REQ_UNIFIED_SHARED_MEMORY; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___tgt_register_requires), llvm::ConstantInt::get(CGM.Int64Ty, Flags)); CGF.FinishFunction(); @@ -10088,9 +10081,8 @@ void CGOpenMPRuntime::emitTeamsCall(CodeGenFunction &CGF, RealArgs.append(std::begin(Args), std::end(Args)); RealArgs.append(CapturedVars.begin(), CapturedVars.end()); - llvm::FunctionCallee RTLFn = - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_fork_teams); + llvm::FunctionCallee RTLFn = OMPBuilder.getOrCreateRuntimeFunction( + CGM.getModule(), OMPRTL___kmpc_fork_teams); CGF.EmitRuntimeCall(RTLFn, RealArgs); } @@ -10118,7 +10110,7 @@ void CGOpenMPRuntime::emitNumTeamsClause(CodeGenFunction &CGF, // Build call __kmpc_push_num_teamss(&loc, global_tid, num_teams, thread_limit) llvm::Value *PushNumTeamsArgs[] = {RTLoc, getThreadID(CGF, Loc), NumTeamsVal, ThreadLimitVal}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_push_num_teams), PushNumTeamsArgs); } @@ -10173,7 +10165,7 @@ void CGOpenMPRuntime::emitTargetDataCalls( llvm::Value *OffloadingArgs[] = { DeviceID, PointerNum, BasePointersArrayArg, PointersArrayArg, SizesArrayArg, MapTypesArrayArg}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___tgt_target_data_begin), OffloadingArgs); @@ -10210,7 +10202,7 @@ void CGOpenMPRuntime::emitTargetDataCalls( llvm::Value *OffloadingArgs[] = { DeviceID, PointerNum, 
BasePointersArrayArg, PointersArrayArg, SizesArrayArg, MapTypesArrayArg}; - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___tgt_target_data_end), OffloadingArgs); }; @@ -10372,9 +10364,9 @@ void CGOpenMPRuntime::emitTargetDataStandAloneCall( llvm_unreachable("Unexpected standalone target data directive."); break; } - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), RTLFn), - OffloadingArgs); + CGF.EmitRuntimeCall( + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), RTLFn), + OffloadingArgs); }; auto &&TargetThenGen = [this, &ThenGen, &D, &InputInfo, &MapTypesArray]( @@ -11065,15 +11057,13 @@ void CGOpenMPRuntime::emitDoacrossInit(CodeGenFunction &CGF, CGF.Builder.CreateConstArrayGEP(DimsAddr, 0).getPointer(), CGM.VoidPtrTy)}; - llvm::FunctionCallee RTLFn = - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_doacross_init); + llvm::FunctionCallee RTLFn = OMPBuilder.getOrCreateRuntimeFunction( + CGM.getModule(), OMPRTL___kmpc_doacross_init); CGF.EmitRuntimeCall(RTLFn, Args); llvm::Value *FiniArgs[DoacrossCleanupTy::DoacrossFinArgs] = { emitUpdateLocation(CGF, D.getEndLoc()), getThreadID(CGF, D.getEndLoc())}; - llvm::FunctionCallee FiniRTLFn = - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_doacross_fini); + llvm::FunctionCallee FiniRTLFn = OMPBuilder.getOrCreateRuntimeFunction( + CGM.getModule(), OMPRTL___kmpc_doacross_fini); CGF.EHStack.pushCleanup(NormalAndEHCleanup, FiniRTLFn, llvm::makeArrayRef(FiniArgs)); } @@ -11101,12 +11091,12 @@ void CGOpenMPRuntime::emitDoacrossOrdered(CodeGenFunction &CGF, CGF.Builder.CreateConstArrayGEP(CntAddr, 0).getPointer()}; llvm::FunctionCallee RTLFn; if (C->getDependencyKind() == OMPC_DEPEND_source) { - RTLFn = llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), 
OMPRTL___kmpc_doacross_post); + RTLFn = OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___kmpc_doacross_post); } else { assert(C->getDependencyKind() == OMPC_DEPEND_sink); - RTLFn = llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_doacross_wait); + RTLFn = OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), + OMPRTL___kmpc_doacross_wait); } CGF.EmitRuntimeCall(RTLFn, Args); } @@ -11210,14 +11200,13 @@ Address CGOpenMPRuntime::getAddressOfLocalVariable(CodeGenFunction &CGF, llvm::Value *Args[] = {ThreadID, Size, Allocator}; llvm::Value *Addr = - CGF.EmitRuntimeCall(llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction( + CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( CGM.getModule(), OMPRTL___kmpc_alloc), Args, getName({CVD->getName(), ".void.addr"})); llvm::Value *FiniArgs[OMPAllocateCleanupTy::CleanupArgs] = {ThreadID, Addr, Allocator}; - llvm::FunctionCallee FiniRTLFn = - llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction(CGM.getModule(), - OMPRTL___kmpc_free); + llvm::FunctionCallee FiniRTLFn = OMPBuilder.getOrCreateRuntimeFunction( + CGM.getModule(), OMPRTL___kmpc_free); CGF.EHStack.pushCleanup(NormalAndEHCleanup, FiniRTLFn, llvm::makeArrayRef(FiniArgs)); diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h index dea92e16e59f..eb22f155f5ef 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.h +++ b/clang/lib/CodeGen/CGOpenMPRuntime.h @@ -25,6 +25,7 @@ #include "llvm/ADT/StringMap.h" #include "llvm/ADT/StringSet.h" #include "llvm/Frontend/OpenMP/OMPConstants.h" +#include "llvm/Frontend/OpenMP/OMPIRBuilder.h" #include "llvm/IR/Function.h" #include "llvm/IR/ValueHandle.h" #include "llvm/Support/AtomicOrdering.h" @@ -37,6 +38,7 @@ class GlobalVariable; class StructType; class Type; class Value; +class OpenMPIRBuilder; } // namespace llvm namespace clang { @@ -284,6 +286,8 @@ class CGOpenMPRuntime { ~LastprivateConditionalRAII(); }; + llvm::OpenMPIRBuilder &getOMPBuilder() { 
return OMPBuilder; } + protected: CodeGenModule &CGM; StringRef FirstSeparator, Separator; @@ -368,6 +372,8 @@ class CGOpenMPRuntime { llvm::Value *getCriticalRegionLock(StringRef CriticalName); private: + /// An OpenMP-IR-Builder instance. + llvm::OpenMPIRBuilder OMPBuilder; /// Default const ident_t object used for initialization of all other /// ident_t objects. llvm::Constant *DefaultOpenMPPSource = nullptr; diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp index 4141acfcd1fb..7135135d2a41 100644 --- a/clang/lib/CodeGen/CGStmtOpenMP.cpp +++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp @@ -1569,8 +1569,7 @@ static void emitEmptyBoundParameters(CodeGenFunction &, Address CodeGenFunction::OMPBuilderCBHelpers::getAddressOfLocalVariable( CodeGenFunction &CGF, const VarDecl *VD) { CodeGenModule &CGM = CGF.CGM; - auto OMPBuilder = CGM.getOpenMPIRBuilder(); - assert(OMPBuilder && "OMPIRBuilder does not exist!"); + auto &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder(); if (!VD) return Address::invalid(); @@ -1607,11 +1606,11 @@ Address CodeGenFunction::OMPBuilderCBHelpers::getAddressOfLocalVariable( Allocator = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(Allocator, CGM.VoidPtrTy); - llvm::Value *Addr = OMPBuilder->CreateOMPAlloc( + llvm::Value *Addr = OMPBuilder.CreateOMPAlloc( CGF.Builder, Size, Allocator, getNameWithSeparators({CVD->getName(), ".void.addr"}, ".", ".")); llvm::CallInst *FreeCI = - OMPBuilder->CreateOMPFree(CGF.Builder, Addr, Allocator); + OMPBuilder.CreateOMPFree(CGF.Builder, Addr, Allocator); CGF.EHStack.pushCleanup(NormalAndEHCleanup, FreeCI); Addr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( @@ -1629,8 +1628,7 @@ Address CodeGenFunction::OMPBuilderCBHelpers::getAddrOfThreadPrivate( CGM.getContext().getTargetInfo().isTLSSupported()) return VDAddr; - llvm::OpenMPIRBuilder *OMPBuilder = CGM.getOpenMPIRBuilder(); - assert(OMPBuilder && "OpenMPIRBuilder is not initialized or used."); + llvm::OpenMPIRBuilder 
&OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder(); llvm::Type *VarTy = VDAddr.getElementType(); llvm::Value *Data = @@ -1640,7 +1638,7 @@ Address CodeGenFunction::OMPBuilderCBHelpers::getAddrOfThreadPrivate( llvm::Twine CacheName = Twine(CGM.getMangledName(VD)).concat(Suffix); llvm::CallInst *ThreadPrivateCacheCall = - OMPBuilder->CreateCachedThreadPrivate(CGF.Builder, Data, Size, CacheName); + OMPBuilder.CreateCachedThreadPrivate(CGF.Builder, Data, Size, CacheName); return Address(ThreadPrivateCacheCall, VDAddr.getAlignment()); } @@ -1657,7 +1655,8 @@ std::string CodeGenFunction::OMPBuilderCBHelpers::getNameWithSeparators( return OS.str().str(); } void CodeGenFunction::EmitOMPParallelDirective(const OMPParallelDirective &S) { - if (llvm::OpenMPIRBuilder *OMPBuilder = CGM.getOpenMPIRBuilder()) { + if (CGM.getLangOpts().OpenMPIRBuilder) { + llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder(); // Check if we have any if clause associated with the directive. llvm::Value *IfCond = nullptr; if (const auto *C = S.getSingleClause()) @@ -1708,9 +1707,9 @@ void CodeGenFunction::EmitOMPParallelDirective(const OMPParallelDirective &S) { CGCapturedStmtInfo CGSI(*CS, CR_OpenMP); CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI); - Builder.restoreIP(OMPBuilder->CreateParallel(Builder, BodyGenCB, PrivCB, - FiniCB, IfCond, NumThreads, - ProcBind, S.hasCancel())); + Builder.restoreIP(OMPBuilder.CreateParallel(Builder, BodyGenCB, PrivCB, + FiniCB, IfCond, NumThreads, + ProcBind, S.hasCancel())); return; } @@ -3615,7 +3614,8 @@ static void emitMaster(CodeGenFunction &CGF, const OMPExecutableDirective &S) { } void CodeGenFunction::EmitOMPMasterDirective(const OMPMasterDirective &S) { - if (llvm::OpenMPIRBuilder *OMPBuilder = CGM.getOpenMPIRBuilder()) { + if (CGM.getLangOpts().OpenMPIRBuilder) { + llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder(); using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy; const 
CapturedStmt *CS = S.getInnermostCapturedStmt(); @@ -3635,7 +3635,7 @@ void CodeGenFunction::EmitOMPMasterDirective(const OMPMasterDirective &S) { CGCapturedStmtInfo CGSI(*CS, CR_OpenMP); CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI); - Builder.restoreIP(OMPBuilder->CreateMaster(Builder, BodyGenCB, FiniCB)); + Builder.restoreIP(OMPBuilder.CreateMaster(Builder, BodyGenCB, FiniCB)); return; } @@ -3644,7 +3644,8 @@ void CodeGenFunction::EmitOMPMasterDirective(const OMPMasterDirective &S) { } void CodeGenFunction::EmitOMPCriticalDirective(const OMPCriticalDirective &S) { - if (llvm::OpenMPIRBuilder *OMPBuilder = CGM.getOpenMPIRBuilder()) { + if (CGM.getLangOpts().OpenMPIRBuilder) { + llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder(); using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy; const CapturedStmt *CS = S.getInnermostCapturedStmt(); @@ -3675,7 +3676,7 @@ void CodeGenFunction::EmitOMPCriticalDirective(const OMPCriticalDirective &S) { CGCapturedStmtInfo CGSI(*CS, CR_OpenMP); CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI); - Builder.restoreIP(OMPBuilder->CreateCritical( + Builder.restoreIP(OMPBuilder.CreateCritical( Builder, BodyGenCB, FiniCB, S.getDirectiveName().getAsString(), HintInst)); @@ -5876,7 +5877,8 @@ void CodeGenFunction::EmitOMPCancelDirective(const OMPCancelDirective &S) { break; } } - if (llvm::OpenMPIRBuilder *OMPBuilder = CGM.getOpenMPIRBuilder()) { + if (CGM.getLangOpts().OpenMPIRBuilder) { + llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder(); // TODO: This check is necessary as we only generate `omp parallel` through // the OpenMPIRBuilder for now. 
if (S.getCancelRegion() == OMPD_parallel) { @@ -5885,7 +5887,7 @@ void CodeGenFunction::EmitOMPCancelDirective(const OMPCancelDirective &S) { IfCondition = EmitScalarExpr(IfCond, /*IgnoreResultAssign=*/true); return Builder.restoreIP( - OMPBuilder->CreateCancel(Builder, IfCondition, S.getCancelRegion())); + OMPBuilder.CreateCancel(Builder, IfCondition, S.getCancelRegion())); } } diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp index babb0ea76aba..8ce488f35dd3 100644 --- a/clang/lib/CodeGen/CodeGenFunction.cpp +++ b/clang/lib/CodeGen/CodeGenFunction.cpp @@ -87,8 +87,8 @@ CodeGenFunction::~CodeGenFunction() { // seems to be a reasonable spot. We do it here, as opposed to the deletion // time of the CodeGenModule, because we have to ensure the IR has not yet // been "emitted" to the outside, thus, modifications are still sensible. - if (llvm::OpenMPIRBuilder *OMPBuilder = CGM.getOpenMPIRBuilder()) - OMPBuilder->finalize(); + if (CGM.getLangOpts().OpenMPIRBuilder) + CGM.getOpenMPRuntime().getOMPBuilder().finalize(); } // Map the LangOption for exception behavior into diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index 71778ac31660..4ae8ce7e5ccf 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -222,14 +222,6 @@ void CodeGenModule::createOpenMPRuntime() { OpenMPRuntime.reset(new CGOpenMPRuntime(*this)); break; } - - // The OpenMP-IR-Builder should eventually replace the above runtime codegens - // but we are not there yet so they both reside in CGModule for now and the - // OpenMP-IR-Builder is opt-in only. 
- if (LangOpts.OpenMPIRBuilder) { - OMPBuilder.reset(new llvm::OpenMPIRBuilder(TheModule)); - OMPBuilder->initialize(); - } } void CodeGenModule::createCUDARuntime() { diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h index 1ebc907bbf65..a6c4a1f7b278 100644 --- a/clang/lib/CodeGen/CodeGenModule.h +++ b/clang/lib/CodeGen/CodeGenModule.h @@ -324,7 +324,6 @@ class CodeGenModule : public CodeGenTypeCache { std::unique_ptr ObjCRuntime; std::unique_ptr OpenCLRuntime; std::unique_ptr OpenMPRuntime; - std::unique_ptr OMPBuilder; std::unique_ptr CUDARuntime; std::unique_ptr DebugInfo; std::unique_ptr ObjCData; @@ -597,9 +596,6 @@ class CodeGenModule : public CodeGenTypeCache { return *OpenMPRuntime; } - /// Return a pointer to the configured OpenMPIRBuilder, if any. - llvm::OpenMPIRBuilder *getOpenMPIRBuilder() { return OMPBuilder.get(); } - /// Return a reference to the configured CUDA runtime. CGCUDARuntime &getCUDARuntime() { assert(CUDARuntime != nullptr); diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h b/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h index bfdecdd5d711..d171d0a2b6c4 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h @@ -89,35 +89,6 @@ enum class IdentFlag { #define OMP_IDENT_FLAG(Enum, ...) constexpr auto Enum = omp::IdentFlag::Enum; #include "llvm/Frontend/OpenMP/OMPKinds.def" -/// Forward declarations for LLVM-IR types (simple, function and structure) are -/// generated below. Their names are defined and used in OpenMP/OMPKinds.def. -/// Here we provide the forward declarations, the initializeTypes function will -/// provide the values. -/// -///{ -namespace types { - -#define OMP_TYPE(VarName, InitValue) extern Type *VarName; -#define OMP_ARRAY_TYPE(VarName, ElemTy, ArraySize) \ - extern ArrayType *VarName##Ty; \ - extern PointerType *VarName##PtrTy; -#define OMP_FUNCTION_TYPE(VarName, IsVarArg, ReturnType, ...) 
\ - extern FunctionType *VarName; \ - extern PointerType *VarName##Ptr; -#define OMP_STRUCT_TYPE(VarName, StrName, ...) \ - extern StructType *VarName; \ - extern PointerType *VarName##Ptr; -#include "llvm/Frontend/OpenMP/OMPKinds.def" - -/// Helper to initialize all types defined in OpenMP/OMPKinds.def. -void initializeTypes(Module &M); - -/// Helper to uninitialize all types defined in OpenMP/OMPKinds.def. -void uninitializeTypes(); - -} // namespace types -///} - } // end namespace omp } // end namespace llvm diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 3c2dd6ed4860..2a3a64a5f4ac 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -39,7 +39,7 @@ class OpenMPIRBuilder { void finalize(); /// Add attributes known for \p FnID to \p Fn. - static void addAttributes(omp::RuntimeFunction FnID, Function &Fn); + void addAttributes(omp::RuntimeFunction FnID, Function &Fn); /// Type used throughout for insertion points. using InsertPointTy = IRBuilder<>::InsertPoint; @@ -199,8 +199,8 @@ class OpenMPIRBuilder { } /// Return the function declaration for the runtime function with \p FnID. - static FunctionCallee getOrCreateRuntimeFunction(Module &M, - omp::RuntimeFunction FnID); + FunctionCallee getOrCreateRuntimeFunction(Module &M, + omp::RuntimeFunction FnID); Function *getOrCreateRuntimeFunctionPtr(omp::RuntimeFunction FnID); @@ -381,7 +381,31 @@ class OpenMPIRBuilder { llvm::ConstantInt *Size, const llvm::Twine &Name = Twine("")); + /// Declarations for LLVM-IR types (simple, array, function and structure) are + /// generated below. Their names are defined and used in OpenMPKinds.def. Here + /// we provide the declarations, the initializeTypes function will provide the + /// values. 
+ /// + ///{ +#define OMP_TYPE(VarName, InitValue) Type *VarName = nullptr; +#define OMP_ARRAY_TYPE(VarName, ElemTy, ArraySize) \ + ArrayType *VarName##Ty = nullptr; \ + PointerType *VarName##PtrTy = nullptr; +#define OMP_FUNCTION_TYPE(VarName, IsVarArg, ReturnType, ...) \ + FunctionType *VarName = nullptr; \ + PointerType *VarName##Ptr = nullptr; +#define OMP_STRUCT_TYPE(VarName, StrName, ...) \ + StructType *VarName = nullptr; \ + PointerType *VarName##Ptr = nullptr; +#include "llvm/Frontend/OpenMP/OMPKinds.def" + + ///} + private: + /// Create all simple and struct types exposed by the runtime and remember + /// the llvm::PointerTypes of them for easy access later. + void initializeTypes(Module &M); + /// Common interface for generating entry calls for OMP Directives. /// if the directive has a region/body, It will set the insertion /// point to the body diff --git a/llvm/lib/Frontend/OpenMP/OMPConstants.cpp b/llvm/lib/Frontend/OpenMP/OMPConstants.cpp index 471f0361191e..fdee3c5ef658 100644 --- a/llvm/lib/Frontend/OpenMP/OMPConstants.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPConstants.cpp @@ -17,61 +17,5 @@ using namespace llvm; using namespace omp; -using namespace types; #include "llvm/Frontend/OpenMP/OMP.cpp.inc" - -/// Declarations for LLVM-IR types (simple, array, function and structure) are -/// generated below. Their names are defined and used in OpenMPKinds.def. Here -/// we provide the declarations, the initializeTypes function will provide the -/// values. -/// -///{ -#define OMP_TYPE(VarName, InitValue) Type *llvm::omp::types::VarName = nullptr; -#define OMP_ARRAY_TYPE(VarName, ElemTy, ArraySize) \ - ArrayType *llvm::omp::types::VarName##Ty = nullptr; \ - PointerType *llvm::omp::types::VarName##PtrTy = nullptr; -#define OMP_FUNCTION_TYPE(VarName, IsVarArg, ReturnType, ...) \ - FunctionType *llvm::omp::types::VarName = nullptr; \ - PointerType *llvm::omp::types::VarName##Ptr = nullptr; -#define OMP_STRUCT_TYPE(VarName, StrName, ...) 
\ - StructType *llvm::omp::types::VarName = nullptr; \ - PointerType *llvm::omp::types::VarName##Ptr = nullptr; -#include "llvm/Frontend/OpenMP/OMPKinds.def" - -///} - -void llvm::omp::types::initializeTypes(Module &M) { - if (Void) - return; - - LLVMContext &Ctx = M.getContext(); - // Create all simple and struct types exposed by the runtime and remember - // the llvm::PointerTypes of them for easy access later. - StructType *T; -#define OMP_TYPE(VarName, InitValue) VarName = InitValue; -#define OMP_ARRAY_TYPE(VarName, ElemTy, ArraySize) \ - VarName##Ty = ArrayType::get(ElemTy, ArraySize); \ - VarName##PtrTy = PointerType::getUnqual(VarName##Ty); -#define OMP_FUNCTION_TYPE(VarName, IsVarArg, ReturnType, ...) \ - VarName = FunctionType::get(ReturnType, {__VA_ARGS__}, IsVarArg); \ - VarName##Ptr = PointerType::getUnqual(VarName); -#define OMP_STRUCT_TYPE(VarName, StructName, ...) \ - T = M.getTypeByName(StructName); \ - if (!T) \ - T = StructType::create(Ctx, {__VA_ARGS__}, StructName); \ - VarName = T; \ - VarName##Ptr = PointerType::getUnqual(T); -#include "llvm/Frontend/OpenMP/OMPKinds.def" -} - -void llvm::omp::types::uninitializeTypes() { -#define OMP_TYPE(VarName, InitValue) VarName = nullptr; -#define OMP_FUNCTION_TYPE(VarName, IsVarArg, ReturnType, ...) \ - VarName = nullptr; \ - VarName##Ptr = nullptr; -#define OMP_STRUCT_TYPE(VarName, StrName, ...) 
\ - VarName = nullptr; \ - VarName##Ptr = nullptr; -#include "llvm/Frontend/OpenMP/OMPKinds.def" -} diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 3e945544170c..b7212edab6ab 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -31,7 +31,6 @@ using namespace llvm; using namespace omp; -using namespace types; static cl::opt OptimisticAttributes("openmp-ir-builder-optimistic-attributes", cl::Hidden, @@ -1092,3 +1091,24 @@ Value *OpenMPIRBuilder::getOMPCriticalRegionLock(StringRef CriticalName) { std::string Name = getNameWithSeparators({Prefix, "var"}, ".", "."); return getOrCreateOMPInternalVariable(KmpCriticalNameTy, Name); } + +// Create all simple and struct types exposed by the runtime and remember +// the llvm::PointerTypes of them for easy access later. +void OpenMPIRBuilder::initializeTypes(Module &M) { + LLVMContext &Ctx = M.getContext(); + StructType *T; +#define OMP_TYPE(VarName, InitValue) VarName = InitValue; +#define OMP_ARRAY_TYPE(VarName, ElemTy, ArraySize) \ + VarName##Ty = ArrayType::get(ElemTy, ArraySize); \ + VarName##PtrTy = PointerType::getUnqual(VarName##Ty); +#define OMP_FUNCTION_TYPE(VarName, IsVarArg, ReturnType, ...) \ + VarName = FunctionType::get(ReturnType, {__VA_ARGS__}, IsVarArg); \ + VarName##Ptr = PointerType::getUnqual(VarName); +#define OMP_STRUCT_TYPE(VarName, StructName, ...) 
\ + T = M.getTypeByName(StructName); \ + if (!T) \ + T = StructType::create(Ctx, {__VA_ARGS__}, StructName); \ + VarName = T; \ + VarName##Ptr = PointerType::getUnqual(T); +#include "llvm/Frontend/OpenMP/OMPKinds.def" +} diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index 60813500359b..8ad562f513e4 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -29,7 +29,6 @@ using namespace llvm; using namespace omp; -using namespace types; #define DEBUG_TYPE "openmp-opt" @@ -263,11 +262,11 @@ struct OMPInformationCache : public InformationCache { ICV.InitValue = nullptr; \ break; \ case ICV_ZERO: \ - ICV.InitValue = \ - ConstantInt::get(Type::getInt32Ty(Int32->getContext()), 0); \ + ICV.InitValue = ConstantInt::get( \ + Type::getInt32Ty(OMPBuilder.Int32->getContext()), 0); \ break; \ case ICV_FALSE: \ - ICV.InitValue = ConstantInt::getFalse(Int1->getContext()); \ + ICV.InitValue = ConstantInt::getFalse(OMPBuilder.Int1->getContext()); \ break; \ case ICV_LAST: \ break; \ @@ -332,16 +331,39 @@ struct OMPInformationCache : public InformationCache { Module &M = *((*ModuleSlice.begin())->getParent()); + // Helper macros for handling __VA_ARGS__ in OMP_RTL +#define OMP_TYPE(VarName, ...) \ + Type *VarName = OMPBuilder.VarName; \ + (void)VarName; + +#define OMP_ARRAY_TYPE(VarName, ...) \ + ArrayType *VarName##Ty = OMPBuilder.VarName##Ty; \ + (void)VarName##Ty; \ + PointerType *VarName##PtrTy = OMPBuilder.VarName##PtrTy; \ + (void)VarName##PtrTy; + +#define OMP_FUNCTION_TYPE(VarName, ...) \ + FunctionType *VarName = OMPBuilder.VarName; \ + (void)VarName; \ + PointerType *VarName##Ptr = OMPBuilder.VarName##Ptr; \ + (void)VarName##Ptr; + +#define OMP_STRUCT_TYPE(VarName, ...) \ + StructType *VarName = OMPBuilder.VarName; \ + (void)VarName; \ + PointerType *VarName##Ptr = OMPBuilder.VarName##Ptr; \ + (void)VarName##Ptr; + #define OMP_RTL(_Enum, _Name, _IsVarArg, _ReturnType, ...) 
\ { \ SmallVector ArgsTypes({__VA_ARGS__}); \ Function *F = M.getFunction(_Name); \ - if (declMatchesRTFTypes(F, _ReturnType, ArgsTypes)) { \ + if (declMatchesRTFTypes(F, OMPBuilder._ReturnType, ArgsTypes)) { \ auto &RFI = RFIs[_Enum]; \ RFI.Kind = _Enum; \ RFI.Name = _Name; \ RFI.IsVarArg = _IsVarArg; \ - RFI.ReturnType = _ReturnType; \ + RFI.ReturnType = OMPBuilder._ReturnType; \ RFI.ArgumentTypes = std::move(ArgsTypes); \ RFI.Declaration = F; \ unsigned NumUses = CollectUses(RFI); \ @@ -593,11 +615,11 @@ struct OpenMPOpt { "Unexpected replacement value!"); // TODO: Use dominance to find a good position instead. - auto CanBeMoved = [](CallBase &CB) { + auto CanBeMoved = [this](CallBase &CB) { unsigned NumArgs = CB.getNumArgOperands(); if (NumArgs == 0) return true; - if (CB.getArgOperand(0)->getType() != IdentPtr) + if (CB.getArgOperand(0)->getType() != OMPInfoCache.OMPBuilder.IdentPtr) return false; for (unsigned u = 1; u < NumArgs; ++u) if (isa(CB.getArgOperand(u))) @@ -632,7 +654,7 @@ struct OpenMPOpt { // existing and used by one of the calls, or created from scratch. 
if (CallBase *CI = dyn_cast(ReplVal)) { if (CI->getNumArgOperands() > 0 && - CI->getArgOperand(0)->getType() == IdentPtr) { + CI->getArgOperand(0)->getType() == OMPInfoCache.OMPBuilder.IdentPtr) { Value *Ident = getCombinedIdentFromCallUsesIn(RFI, F, /* GlobalOnly */ true); CI->setArgOperand(0, Ident); diff --git a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp index ae0dff43b94c..2ba9d85a0f9e 100644 --- a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp +++ b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp @@ -19,7 +19,6 @@ using namespace llvm; using namespace omp; -using namespace types; namespace { @@ -50,7 +49,6 @@ class OpenMPIRBuilderTest : public testing::Test { void TearDown() override { BB = nullptr; M.reset(); - uninitializeTypes(); } LLVMContext Ctx; From llvm-commits at lists.llvm.org Wed Jul 8 08:25:50 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:25:50 +0000 (UTC) Subject: [PATCH] D83176: [OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. In-Reply-To: References: Message-ID: <4bfcf1805ee8544dce8c60675209d86c@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG6aab27ba851f: [OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. (authored by sstefan1). 
Changed prior to commit: https://reviews.llvm.org/D83176?vs=275549&id=276445#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83176/new/ https://reviews.llvm.org/D83176 Files: clang/lib/CodeGen/CGDecl.cpp clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGOpenMPRuntime.cpp clang/lib/CodeGen/CGOpenMPRuntime.h clang/lib/CodeGen/CGStmtOpenMP.cpp clang/lib/CodeGen/CodeGenFunction.cpp clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.h llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83176.276445.patch Type: text/x-patch Size: 59148 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:28:43 2020 From: llvm-commits at lists.llvm.org (Andrew Ng via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:28:43 +0000 (UTC) Subject: [PATCH] D83321: [Support] Fix utf16 path's index upper bound In-Reply-To: References: Message-ID: <4e25f02b7eec443d38b0dbfb56fd8c08@localhost.localdomain> andrewng added a comment. The usual practice is to include the entire context in the diff. However, the change itself LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83321/new/ https://reviews.llvm.org/D83321 From llvm-commits at lists.llvm.org Wed Jul 8 08:28:50 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:28:50 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: jdenny added inline comments. 
================
Comment at: llvm/test/TableGen/directive1.td:124
+// IMPL-NEXT: }
+// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind");
 // IMPL-NEXT: }
----------------
clementval wrote:
> jdenny wrote:
> > The unreachable message doesn't make sense given the `default` in the directive switch. If that switch covers all directives, `default` isn't needed anyway.
> Will remove it.

Is the default useful? Are all directives covered by cases?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83363/new/

https://reviews.llvm.org/D83363

From llvm-commits at lists.llvm.org Wed Jul 8 08:30:03 2020
From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:30:03 +0000 (UTC)
Subject: [PATCH] D82919: [SampleFDO] Enable sample-profile-top-down-load by default.
In-Reply-To:
References:
Message-ID:

davidxl accepted this revision.
davidxl added a comment.

lgtm


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82919/new/

https://reviews.llvm.org/D82919

From llvm-commits at lists.llvm.org Wed Jul 8 08:30:17 2020
From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:30:17 +0000 (UTC)
Subject: [PATCH] D57779: [SLP] Add support for throttling.
In-Reply-To:
References:
Message-ID:

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4097
+bool BoUpSLP::findSubTree(int UserCost) {
+  auto cmp = [](const TreeEntry *LHS, const TreeEntry *RHS) {
+    return LHS->Cost > RHS->Cost;
----------------
`Cmp`

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4108-4111
+  int i = 0;
+  for (auto It = Vec.begin(), E = Vec.end(); It != E; ++It, i++)
+    if (i>MaxCostsRecalculations)
+      Vec.erase(It);
----------------
Just `Vec.erase(Vec.rbegin(), Vec.rbegin() + (Vec.size() - MaxCostsRecalculations))`?
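
As a rough illustration of the point in this comment (not code from the patch): erasing elements one at a time while iterating over the same container invalidates the iterator being advanced, so trimming a cost-sorted list to a maximum size is usually done with a single range erase. The sketch below assumes `Vec` behaves like a `std::vector`; `Entry` and `keepMostExpensive` are hypothetical names, and because `std::vector::erase` takes forward iterators, the range is expressed from `begin()` rather than with `rbegin()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tree entry that carries a cost.
struct Entry {
  int Cost;
};

// Keep at most MaxEntries entries, ordered from highest to lowest cost.
// A single range erase on the tail avoids calling erase() inside a loop
// that is still advancing an iterator into the same vector.
inline void keepMostExpensive(std::vector<Entry> &Vec, std::size_t MaxEntries) {
  std::sort(Vec.begin(), Vec.end(), [](const Entry &LHS, const Entry &RHS) {
    return LHS.Cost > RHS.Cost;
  });
  if (Vec.size() > MaxEntries)
    Vec.erase(Vec.begin() + static_cast<std::ptrdiff_t>(MaxEntries),
              Vec.end());
}
```

Whether the real container in D57779 supports this form depends on its type, which is not visible in the quoted hunk.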

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4118
+  // Avoid reducing the tree if there is no potential room to reduce.
+  if ((Tree->TreeCost - UserCost - Sum) > -SLPCostThreshold)
+    return false;
----------------
`>=`

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7259
+        Throttling = true;
+        Cost = V.getTreeCost() + ReductionCost;
+      }
----------------
Looks like you missed the comparison of `Cost` with `-SLPCostThreshold` here. You vectorized the tree after throttling unconditionally. Plus, the `Cost` is calculated here, but not used later except for the debug prints.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6185-6187
+    } else if (SLPThrottling && R.findSubTree())
+      R.saveTree();
----------------
dtemirbulatov wrote:
> ABataev wrote:
> > Actually, `else` is not required here at all. Just make it a standalone `if` statement since there is an early exit in the previous `if`
> hmm, No I think it is required here, we don't want to reduce already decided full-tree vectorization.
No, you don't need it.
Read https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57779/new/ https://reviews.llvm.org/D57779 From llvm-commits at lists.llvm.org Wed Jul 8 08:36:35 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Wed, 08 Jul 2020 08:36:35 -0700 (PDT) Subject: [llvm] 37afd99 - Double check that passes correctly set their Modified status Message-ID: <5f05e803.1c69fb81.e240a.09a5@mx.google.com> Author: serge-sans-paille Date: 2020-07-08T17:36:13+02:00 New Revision: 37afd99c768b29c7df7c5f2eb645362fb61f9915 URL: https://github.com/llvm/llvm-project/commit/37afd99c768b29c7df7c5f2eb645362fb61f9915 DIFF: https://github.com/llvm/llvm-project/commit/37afd99c768b29c7df7c5f2eb645362fb61f9915.diff LOG: Double check that passes correctly set their Modified status The approach is simple: if a pass reports that it's not modifying a Function/Module, compute a loose hash of that Function/Module and compare it with the original one. If we report no change but there's a hash change, then we have an error. This approach misses a lot of change but it's not super intrusive and can detect most of the simple mistakes. Differential Revision: https://reviews.llvm.org/D80916 Added: Modified: llvm/lib/IR/LegacyPassManager.cpp llvm/unittests/IR/LegacyPassManagerTest.cpp Removed: ################################################################################ diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index 1d9c44f385fb..ae0604432c2a 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -1443,6 +1443,74 @@ void FPPassManager::dumpPassStructure(unsigned Offset) { } } +#ifdef EXPENSIVE_CHECKS +namespace { +namespace details { + +// Basic hashing mechanism to detect structural change to the IR, used to verify +// pass return status consistency with actual change. 
Loosely copied from +// llvm/lib/Transforms/Utils/FunctionComparator.cpp + +class StructuralHash { + uint64_t Hash = 0x6acaa36bef8325c5ULL; + + void update(uint64_t V) { Hash = hashing::detail::hash_16_bytes(Hash, V); } + +public: + StructuralHash() = default; + + void update(Function &F) { + if (F.empty()) + return; + + update(F.isVarArg()); + update(F.arg_size()); + + SmallVector BBs; + SmallPtrSet VisitedBBs; + + BBs.push_back(&F.getEntryBlock()); + VisitedBBs.insert(BBs[0]); + while (!BBs.empty()) { + const BasicBlock *BB = BBs.pop_back_val(); + update(45798); // Block header + for (auto &Inst : *BB) + update(Inst.getOpcode()); + + const Instruction *Term = BB->getTerminator(); + for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) { + if (!VisitedBBs.insert(Term->getSuccessor(i)).second) + continue; + BBs.push_back(Term->getSuccessor(i)); + } + } + } + + void update(Module &M) { + for (Function &F : M) + update(F); + } + + uint64_t getHash() const { return Hash; } +}; + +} // namespace details + +uint64_t StructuralHash(Function &F) { + details::StructuralHash H; + H.update(F); + return H.getHash(); +} + +uint64_t StructuralHash(Module &M) { + details::StructuralHash H; + H.update(M); + return H.getHash(); +} + +} // end anonymous namespace + +#endif /// Execute all of the passes scheduled for execution by invoking /// runOnFunction method. 
Keep track of whether any of the passes modifies @@ -1481,7 +1549,16 @@ bool FPPassManager::runOnFunction(Function &F) { { PassManagerPrettyStackEntry X(FP, F); TimeRegion PassTimer(getPassTimer(FP)); +#ifdef EXPENSIVE_CHECKS + uint64_t RefHash = StructuralHash(F); +#endif LocalChanged |= FP->runOnFunction(F); + +#ifdef EXPENSIVE_CHECKS + assert((LocalChanged || (RefHash == StructuralHash(F))) && + "Pass modifies its input and doesn't report it."); +#endif + if (EmitICRemark) { unsigned NewSize = F.getInstructionCount(); @@ -1582,7 +1659,17 @@ MPPassManager::runOnModule(Module &M) { PassManagerPrettyStackEntry X(MP, M); TimeRegion PassTimer(getPassTimer(MP)); +#ifdef EXPENSIVE_CHECKS + uint64_t RefHash = StructuralHash(M); +#endif + LocalChanged |= MP->runOnModule(M); + +#ifdef EXPENSIVE_CHECKS + assert((LocalChanged || (RefHash == StructuralHash(M))) && + "Pass modifies its input and doesn't report it."); +#endif + if (EmitICRemark) { // Update the size of the module. unsigned ModuleCount = M.getInstructionCount(); diff --git a/llvm/unittests/IR/LegacyPassManagerTest.cpp b/llvm/unittests/IR/LegacyPassManagerTest.cpp index b7801b52481d..8dda94b1b032 100644 --- a/llvm/unittests/IR/LegacyPassManagerTest.cpp +++ b/llvm/unittests/IR/LegacyPassManagerTest.cpp @@ -680,7 +680,7 @@ namespace llvm { ASSERT_EQ(M->getFunctionList().size(), 4U); Function *F = M->getFunction("test2"); Function *SF = splitSimpleFunction(*F); - CallInst::Create(F, "", &SF->getEntryBlock()); + CallInst::Create(F, "", &*SF->getEntryBlock().getFirstInsertionPt()); ASSERT_EQ(M->getFunctionList().size(), 5U); CGModifierPass *P = new CGModifierPass(); legacy::PassManager Passes; From llvm-commits at lists.llvm.org Wed Jul 8 08:36:43 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:36:43 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: 
<61fd8c838148d73bc097bd8495b2a1c2@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG37afd99c768b: Double check that passes correctly set their Modified status (authored by serge-sans-paille).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80916/new/

https://reviews.llvm.org/D80916

Files:
  llvm/lib/IR/LegacyPassManager.cpp
  llvm/unittests/IR/LegacyPassManagerTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D80916.276447.patch
Type: text/x-patch
Size: 3685 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 08:39:35 2020
From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:39:35 +0000 (UTC)
Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways
In-Reply-To:
References:
Message-ID:

foad marked an inline comment as done.
foad added a comment.

> The mirror change is needed for globalisel

AMDGPUInstructionSelector::selectFlatOffsetImpl doesn't attempt to split offsets so I don't think there's anything to fix.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705
+  // Use signed division by a power of two to truncate towards 0.
+  int64_t D = 1LL << (NumBits - 1);
+  RemainderOffset = (static_cast(COffsetVal) / D) * D;
----------------
arsenm wrote:
> arsenm wrote:
> > foad wrote:
> > > arsenm wrote:
> > > > This limitation also only needs to be applied if AS == FLAT_ADDRESS
> > > The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing??
> > Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions
> Actually pre-gfx9 also didn't have flat offsets. However gfx10 does have a bug with flat offsets, so I think it would still be correct to model this correctly. The instruction patterns do accept either (and global instructions are only preferred through pattern priority)
> This limitation also only needs to be applied if AS == FLAT_ADDRESS

I still don't get this. Surely if we're using a FLAT instruction, even if we know which specific address space the programmer is trying to access, we still have to avoid setting vaddr to an address that might point into the wrong aperture.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83394/new/

https://reviews.llvm.org/D83394

From llvm-commits at lists.llvm.org Wed Jul 8 08:41:01 2020
From: llvm-commits at lists.llvm.org (Jakub Kuderski via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:41:01 +0000 (UTC)
Subject: [PATCH] D83087: DomTree: remove explicit use of DomTreeNodeBase::iterator
In-Reply-To:
References:
Message-ID:

kuhar added a comment.

In D83087#2139211, @nhaehnle wrote:

> In D83087#2134881, @kuhar wrote:
>
> > modulo accidental formatting changes.
>
>
> I'm not aware of any. Some line breaks changed because "const_iterator" is longer than "iterator".


Ack.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83087/new/

https://reviews.llvm.org/D83087

From llvm-commits at lists.llvm.org Wed Jul 8 08:41:29 2020
From: llvm-commits at lists.llvm.org (Paul Walker via llvm-commits)
Date: Wed, 08 Jul 2020 08:41:29 -0700 (PDT)
Subject: [llvm] bb35f0f - [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS.
Message-ID: <5f05e929.1c69fb81.1ff27.0a43@mx.google.com> Author: Paul Walker Date: 2020-07-08T15:39:25Z New Revision: bb35f0fd89ff4904cc954f1578b6bbe28a6795f1 URL: https://github.com/llvm/llvm-project/commit/bb35f0fd89ff4904cc954f1578b6bbe28a6795f1 DIFF: https://github.com/llvm/llvm-project/commit/bb35f0fd89ff4904cc954f1578b6bbe28a6795f1.diff LOG: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS. However, when calculating the offsets for each of the operands we incorrectly use the element size rather than actual size and thus the stores overlap. Differential Revision: https://reviews.llvm.org/D83303 Added: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Modified: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp index ff6cc2521661..6a6004c158bb 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp @@ -1390,12 +1390,17 @@ SDValue SelectionDAGLegalize::ExpandInsertToVectorThroughStack(SDValue Op) { } SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) { + assert((Node->getOpcode() == ISD::BUILD_VECTOR || + Node->getOpcode() == ISD::CONCAT_VECTORS) && + "Unexpected opcode!"); + // We can't handle this case efficiently. Allocate a sufficiently - // aligned object on the stack, store each element into it, then load + // aligned object on the stack, store each operand into it, then load // the result as a vector. // Create the stack frame object. EVT VT = Node->getValueType(0); - EVT EltVT = VT.getVectorElementType(); + EVT MemVT = isa(Node) ? 
VT.getVectorElementType() + : Node->getOperand(0).getValueType(); SDLoc dl(Node); SDValue FIPtr = DAG.CreateStackTemporary(VT); int FI = cast(FIPtr.getNode())->getIndex(); @@ -1404,7 +1409,7 @@ SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) { // Emit a store of each element to the stack slot. SmallVector Stores; - unsigned TypeByteSize = EltVT.getSizeInBits() / 8; + unsigned TypeByteSize = MemVT.getSizeInBits() / 8; assert(TypeByteSize > 0 && "Vector element type too small for stack store!"); // Store (in the right endianness) the elements to memory. for (unsigned i = 0, e = Node->getNumOperands(); i != e; ++i) { @@ -1417,11 +1422,11 @@ SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) { // If the destination vector element type is narrower than the source // element type, only store the bits necessary. - if (EltVT.bitsLT(Node->getOperand(i).getValueType().getScalarType())) { + if (MemVT.bitsLT(Node->getOperand(i).getValueType())) Stores.push_back(DAG.getTruncStore(DAG.getEntryNode(), dl, Node->getOperand(i), Idx, - PtrInfo.getWithOffset(Offset), EltVT)); - } else + PtrInfo.getWithOffset(Offset), MemVT)); + else Stores.push_back(DAG.getStore(DAG.getEntryNode(), dl, Node->getOperand(i), Idx, PtrInfo.getWithOffset(Offset))); } diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll new file mode 100644 index 000000000000..52574fad8210 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -0,0 +1,38 @@ +; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s +; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 + +target triple = "aarch64-unknown-linux-gnu" + +; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in +; validating all combinations of vector type. 
+ +define void @concat_vectors_v4i64(<2 x i64> %a, <2 x i64> %b, <4 x i64> *%c.addr) #0 { +; CHECK-LABEL: concat_vectors_v4i64: +; CHECK: stp q0, q1, [sp] +; CHECK: ptrue [[OUT_PG:p[0-9]+]].d, vl4 +; CHECK: mov x[[LO_ADDR:[0-9]+]], sp +; CHECK: ld1d { z{{[0-9]+}}.d }, [[OUT_PG]]/z, [x[[LO_ADDR]]] + %concat = shufflevector <2 x i64> %a, <2 x i64> %b, <4 x i32> + store <4 x i64> %concat, <4 x i64>* %c.addr + ret void +} + +define void @concat_vectors_v8i64(<4 x i64> *%a.addr, <4 x i64> *%b.addr, <8 x i64> *%c.addr) #0 { +; VBITS_GE_512-LABEL: concat_vectors_v8i64: +; VBITS_GE_512: ptrue [[IN_PG:p[0-9]+]].d, vl4 +; VBITS_GE_512: ld1d { [[LO:z[0-9]+]].d }, [[IN_PG]]/z, [x0] +; VBITS_GE_512: ld1d { [[HI:z[0-9]+]].d }, [[IN_PG]]/z, [x1] +; VBITS_GE_512: mov x[[LO_ADDR:[0-9]+]], sp +; VBITS_GE_512: orr x[[HI_ADDR:[0-9]+]], x[[LO_ADDR]], #0x20 +; VBITS_GE_512: st1d { [[LO]].d }, [[IN_PG]], [x[[LO_ADDR]]] +; VBITS_GE_512: st1d { [[HI]].d }, [[IN_PG]], [x[[HI_ADDR]]] +; VBITS_GE_512: ptrue [[OUT_PG:p[0-9]+]].d, vl8 +; VBITS_GE_512: ld1d { z{{[0-9]+}}.d }, [[OUT_PG]]/z, [x8] + %a = load <4 x i64>, <4 x i64>* %a.addr + %b = load <4 x i64>, <4 x i64>* %b.addr + %concat = shufflevector <4 x i64> %a, <4 x i64> %b, <8 x i32> + store <8 x i64> %concat, <8 x i64>* %c.addr + ret void +} + +attributes #0 = { nounwind "target-features"="+sve" } From llvm-commits at lists.llvm.org Wed Jul 8 08:41:45 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:41:45 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. In-Reply-To: References: Message-ID: <500ee439285f15c08fea9cd862996f39@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbb35f0fd89ff: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. (authored by paulwalker-arm). 
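For readers following the CONCAT_VECTORS fix above, here is a minimal sketch of the offset arithmetic only; it is not the LegalizeDAG code. The helper names `storeOffsets` and `storesOverlap` are hypothetical, and the sketch assumes operand `i` is stored at `i * stride` bytes. It shows why a stride equal to the element size makes whole-operand stores overlap, while a stride equal to the operand size does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offset of each operand's store when operand i is placed at
// i * strideBytes (hypothetical helper, for illustration only).
std::vector<std::uint64_t> storeOffsets(unsigned numOperands,
                                        std::uint64_t strideBytes) {
  std::vector<std::uint64_t> offsets;
  for (unsigned i = 0; i != numOperands; ++i)
    offsets.push_back(i * strideBytes);
  return offsets;
}

// True if any store of storeBytes bytes runs past the start of the next
// store. Offsets are expected in increasing order, as storeOffsets
// produces them.
bool storesOverlap(const std::vector<std::uint64_t> &offsets,
                   std::uint64_t storeBytes) {
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i)
    if (offsets[i] + storeBytes > offsets[i + 1])
      return true;
  return false;
}
```

For a concatenation of two `<2 x i64>` operands (16-byte operands, 8-byte elements), a stride of 8 gives offsets 0 and 8, so the two 16-byte stores overlap; a stride of 16 gives offsets 0 and 16. With two `<4 x i64>` operands the second offset is 32, which matches the `#0x20` in the test's `orr` check.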
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83303/new/ https://reviews.llvm.org/D83303 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -0,0 +1,38 @@ +; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s +; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 + +target triple = "aarch64-unknown-linux-gnu" + +; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in +; validating all combinations of vector type. + +define void @concat_vectors_v4i64(<2 x i64> %a, <2 x i64> %b, <4 x i64> *%c.addr) #0 { +; CHECK-LABEL: concat_vectors_v4i64: +; CHECK: stp q0, q1, [sp] +; CHECK: ptrue [[OUT_PG:p[0-9]+]].d, vl4 +; CHECK: mov x[[LO_ADDR:[0-9]+]], sp +; CHECK: ld1d { z{{[0-9]+}}.d }, [[OUT_PG]]/z, [x[[LO_ADDR]]] + %concat = shufflevector <2 x i64> %a, <2 x i64> %b, <4 x i32> + store <4 x i64> %concat, <4 x i64>* %c.addr + ret void +} + +define void @concat_vectors_v8i64(<4 x i64> *%a.addr, <4 x i64> *%b.addr, <8 x i64> *%c.addr) #0 { +; VBITS_GE_512-LABEL: concat_vectors_v8i64: +; VBITS_GE_512: ptrue [[IN_PG:p[0-9]+]].d, vl4 +; VBITS_GE_512: ld1d { [[LO:z[0-9]+]].d }, [[IN_PG]]/z, [x0] +; VBITS_GE_512: ld1d { [[HI:z[0-9]+]].d }, [[IN_PG]]/z, [x1] +; VBITS_GE_512: mov x[[LO_ADDR:[0-9]+]], sp +; VBITS_GE_512: orr x[[HI_ADDR:[0-9]+]], x[[LO_ADDR]], #0x20 +; VBITS_GE_512: st1d { [[LO]].d }, [[IN_PG]], [x[[LO_ADDR]]] +; VBITS_GE_512: st1d { [[HI]].d }, [[IN_PG]], [x[[HI_ADDR]]] +; VBITS_GE_512: ptrue [[OUT_PG:p[0-9]+]].d, vl8 +; VBITS_GE_512: ld1d { z{{[0-9]+}}.d }, [[OUT_PG]]/z, [x8] + %a = load <4 x i64>, <4 x i64>* %a.addr + %b = load <4 x i64>, <4 x i64>* %b.addr + %concat = shufflevector <4 x i64> 
%a, <4 x i64> %b, <8 x i32> + store <8 x i64> %concat, <8 x i64>* %c.addr + ret void +} + +attributes #0 = { nounwind "target-features"="+sve" } Index: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp @@ -1390,12 +1390,17 @@ } SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) { + assert((Node->getOpcode() == ISD::BUILD_VECTOR || + Node->getOpcode() == ISD::CONCAT_VECTORS) && + "Unexpected opcode!"); + // We can't handle this case efficiently. Allocate a sufficiently - // aligned object on the stack, store each element into it, then load + // aligned object on the stack, store each operand into it, then load // the result as a vector. // Create the stack frame object. EVT VT = Node->getValueType(0); - EVT EltVT = VT.getVectorElementType(); + EVT MemVT = isa(Node) ? VT.getVectorElementType() + : Node->getOperand(0).getValueType(); SDLoc dl(Node); SDValue FIPtr = DAG.CreateStackTemporary(VT); int FI = cast(FIPtr.getNode())->getIndex(); @@ -1404,7 +1409,7 @@ // Emit a store of each element to the stack slot. SmallVector Stores; - unsigned TypeByteSize = EltVT.getSizeInBits() / 8; + unsigned TypeByteSize = MemVT.getSizeInBits() / 8; assert(TypeByteSize > 0 && "Vector element type too small for stack store!"); // Store (in the right endianness) the elements to memory. for (unsigned i = 0, e = Node->getNumOperands(); i != e; ++i) { @@ -1417,11 +1422,11 @@ // If the destination vector element type is narrower than the source // element type, only store the bits necessary. 
- if (EltVT.bitsLT(Node->getOperand(i).getValueType().getScalarType())) { + if (MemVT.bitsLT(Node->getOperand(i).getValueType())) Stores.push_back(DAG.getTruncStore(DAG.getEntryNode(), dl, Node->getOperand(i), Idx, - PtrInfo.getWithOffset(Offset), EltVT)); - } else + PtrInfo.getWithOffset(Offset), MemVT)); + else Stores.push_back(DAG.getStore(DAG.getEntryNode(), dl, Node->getOperand(i), Idx, PtrInfo.getWithOffset(Offset))); }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83303.276448.patch
Type: text/x-patch
Size: 4320 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 08:42:00 2020
From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:42:00 +0000 (UTC)
Subject: [PATCH] D82886: [DebugInfo] Fix a possible crash when reading a malformed .debug_*lists section.
In-Reply-To:
References:
Message-ID:

ikudrin added a comment.

In D82886#2137907, @dblaikie wrote:

> Ah, thanks! Would be handy to have a test case for that & perhaps some other way to communicate "end of list" that's a bit more explicit?

As far as I understand, that is not broken yet, so it does not need to be fixed.

> Hmm, I'm not sure why this produce the repetition - if length() accurately returned the length that was read rather than zero, then it'd go to the end and stop, right?

`0xffffffff` is the DWARF64 marker, so when it is read, the library expects to read the next 8 bytes.

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82886/new/

https://reviews.llvm.org/D82886

From llvm-commits at lists.llvm.org Wed Jul 8 08:42:24 2020
From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:42:24 +0000 (UTC)
Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header
In-Reply-To:
References:
Message-ID: <42804ee5d234f9928beaf2b51a24069f@localhost.localdomain>

jasonliu added inline comments.
================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:77 + support::ubig32_t BssDataSize; + support::ubig32_t EntryPointVirtualAddr; + support::ubig32_t TextStartAddr; ---------------- EntryPointVirtualAddr -> EntryPointAddr. ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:90 + support::ubig16_t ModuleType; + char CpuFlag; + char CpuType; ---------------- Why do we use `char` and not `uint8_t`? ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:107 + ///< default value is 0 (system-selected page size). + char Flag; + support::ubig16_t SecNumOfTData; ---------------- Flag -> FlagAndTDataAlignment ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:142 + support::ubig16_t SecNumOfTBSS; + support::ubig16_t X64bitsFlag; +}; ---------------- X64bitsFlag -> XCOFF64Flags ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:229 const XCOFFSectionHeader64 *sectionHeaderTable64() const; - size_t getFileHeaderSize() const; ---------------- Looks like the blank line here is for separating the functions. Why do we want to remove it? ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:139 +const XCOFFAuxiliaryHeader64 *XCOFFObjectFile::AuxiliaryHeader64() const { + assert(is64Bit() && "64-bit interface called on a 64-bit object file."); + return static_cast(AuxiliaryHeader); ---------------- 64-bit interface called on a 32-bit object file. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:105 + const XCOFFAuxiliaryHeader64 *AuxHeader64Prt = Obj.AuxiliaryHeader64(); + printAuxiliaryHeaders(AuxHeader64Prt); + } else { ---------------- I don't think you need to define an extra variable here. 
================
Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:494
+
+void XCOFFDumper::printAuxiliaryHeaders(const XCOFFAuxiliaryHeader32 *AuxHeader) {
+  if (AuxHeader == nullptr) {
----------------
Please consider combining the 32-bit and 64-bit versions of this function using a template, as most of the fields have the same name.

================
Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:501
+  DictScope DS(W, "AuxiliaryHeader");
+  PrintAuxMember(Hex, "Magic", AuxHeader->AuxMagic, AuxSize);
+  PrintAuxMember(Hex, "Version", AuxHeader->Version, AuxSize);
----------------
Why do you need to pass `AuxSize` to the macro function when it is the same for every call?

================
Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:594
+                 AuxSize);
+  PrintAuxMember(Number, "64 bits XCOFF Flag", AuxHeader->X64bitsFlag, AuxSize);
+}
----------------
We should print this as hex.

================
Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:178
+  // --auxiliary-headers
+  cl::opt
+      XCOFFAuxiliaryHeaders("auxiliary-headers",
----------------
I assume this new option needs to be documented somewhere in the LLVM docs.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82549/new/

https://reviews.llvm.org/D82549

From llvm-commits at lists.llvm.org Wed Jul 8 08:45:47 2020
From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:45:47 +0000 (UTC)
Subject: [PATCH] D83176: [OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder.
In-Reply-To:
References:
Message-ID: <4becfdf965d80e4601a147745050f086@localhost.localdomain>

This revision was not accepted when it landed; it landed in state "Needs Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rG6aab27ba851f: [OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. (authored by sstefan1).
Changed prior to commit: https://reviews.llvm.org/D83176?vs=275549&id=275711#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83176/new/ https://reviews.llvm.org/D83176 Files: clang/lib/CodeGen/CGDecl.cpp clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGOpenMPRuntime.cpp clang/lib/CodeGen/CGOpenMPRuntime.h clang/lib/CodeGen/CGStmtOpenMP.cpp clang/lib/CodeGen/CodeGenFunction.cpp clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.h llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83176.275711.patch Type: text/x-patch Size: 59148 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:45:58 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:45:58 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <8136bd87530c0444f6241bb5c1a9ac6a@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG37afd99c768b: Double check that passes correctly set their Modified status (authored by serge-sans-paille). Changed prior to commit: https://reviews.llvm.org/D80916?vs=268675&id=275712#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 Files: llvm/lib/IR/LegacyPassManager.cpp llvm/unittests/IR/LegacyPassManagerTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D80916.275712.patch
Type: text/x-patch
Size: 3685 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 08:47:43 2020
From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:47:43 +0000 (UTC)
Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if
In-Reply-To:
References:
Message-ID: <1e1d03365e5fbd6f0311e4a4c5716f33@localhost.localdomain>

jdenny added inline comments.

================
Comment at: llvm/test/TableGen/directive1.td:124
+// IMPL-NEXT: }
+// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind");
 // IMPL-NEXT: }
----------------
jdenny wrote:
> clementval wrote:
> > jdenny wrote:
> > > The unreachable message doesn't make sense given the `default` in the directive switch. If that switch covers all directives, `default` isn't needed anyway.
> > Will remove it.
> Is the default useful? Are all directives covered by cases?

This is what I'm thinking of:

http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83363/new/

https://reviews.llvm.org/D83363

From llvm-commits at lists.llvm.org Wed Jul 8 08:48:20 2020
From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:48:20 +0000 (UTC)
Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap.
In-Reply-To:
References:
Message-ID: <73c23083c5959213271a9e87c6e52d6f@localhost.localdomain>

fhahn updated this revision to Diff 276450.
fhahn added a comment.

I updated the patch to limit the use of the heap to the source order comparator (which is used in combination with the MachineScheduler) and added extra verification to ensure that the heap remains properly ordered and that we pick the same candidate as with the existing heuristic.
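As a rough illustration of the approach described in this message (not the ScheduleDAGRRList code), the sketch below keeps a candidate queue as a heap with the standard `std::*_heap()` functions and double-checks every pop against a linear scan. The `Candidate`, `PickedLater` and `CandidateQueue` names are hypothetical. It assumes the comparator is a total order (a unique ID breaks ties) and that a candidate's score does not change while it sits in the queue.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scheduling unit: a score plus a unique queue ID.
struct Candidate {
  int score;
  unsigned queueId;
};

// Returns true if a should be picked after b. The unique queueId breaks
// ties, so no two distinct candidates compare as equivalent.
struct PickedLater {
  bool operator()(const Candidate &a, const Candidate &b) const {
    if (a.score != b.score)
      return a.score < b.score;
    return a.queueId > b.queueId;
  }
};

class CandidateQueue {
  std::vector<Candidate> heap;

public:
  bool empty() const { return heap.empty(); }

  void push(Candidate c) {
    heap.push_back(c);
    std::push_heap(heap.begin(), heap.end(), PickedLater{});
  }

  // Removes and returns the best candidate. Precondition: !empty().
  Candidate pop() {
    // Extra verification: a linear scan must agree with the heap.
    Candidate scanned =
        *std::max_element(heap.begin(), heap.end(), PickedLater{});
    std::pop_heap(heap.begin(), heap.end(), PickedLater{});
    Candidate best = heap.back();
    heap.pop_back();
    assert(best.queueId == scanned.queueId && "heap and scan disagree");
    assert(std::is_heap(heap.begin(), heap.end(), PickedLater{}));
    (void)scanned; // Silence unused-variable warnings when asserts are off.
    return best;
  }
};
```

If scores could change after insertion, the heap property would silently break, which is the kind of problem the verification asserts are meant to catch.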
With those verifications enabled, I managed to do a bootstrap build on X86 and built SPEC2000, SPEC2006 and MultiSource on X86 and AArch64 without a crash.

After taking a closer look at the source order comparator, it looks like it guarantees a total ordering (it falls back to the NodeQueueId if all other criteria are equal, and this ID is unique), so I don't think we would run into problems with multiple SUs having the same score.

There is a different potential issue though. For some comparators, the scoring of a candidate can change if a different node is scheduled. This does not seem to be an issue in practice for the source order comparator, but it is for some of the other comparators, which is why this patch now limits the change to the source order comparator. After looking at the code for the source order comparator, it looks like the score could also change after units are scheduled in some edge cases. This is not a big problem, as it would happen in a deterministic way and should have only a very minor impact on the generated code, because the MachineScheduler has the main responsibility for scheduling.

What do you think? We might even go further and limit the source order comparator to just the IR ordering and the queue IDs, because the real scheduling should happen in the MachineScheduler.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83335/new/

https://reviews.llvm.org/D83335

Files:
  llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83335.276450.patch Type: text/x-patch Size: 4158 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:49:12 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Wed, 08 Jul 2020 08:49:12 -0700 (PDT) Subject: [llvm] 470bf7b - [Preallocated] Add @llvm.call.preallocated.teardown Message-ID: <5f05eaf8.1c69fb81.b160d.09b8@mx.google.com> Author: Arthur Eubanks Date: 2020-07-08T08:48:44-07:00 New Revision: 470bf7b5a2976b5792a97b2d053a59d4b1082a5f URL: https://github.com/llvm/llvm-project/commit/470bf7b5a2976b5792a97b2d053a59d4b1082a5f DIFF: https://github.com/llvm/llvm-project/commit/470bf7b5a2976b5792a97b2d053a59d4b1082a5f.diff LOG: [Preallocated] Add @llvm.call.preallocated.teardown This cleans up the stack allocated by a @llvm.call.preallocated.setup. Should either call the teardown or the preallocated call to clean up the stack. Calling both is UB. Add LangRef. Add verifier check that the token argument is a @llvm.call.preallocated.setup. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D83354 Added: Modified: llvm/docs/LangRef.rst llvm/include/llvm/IR/Intrinsics.td llvm/lib/IR/Verifier.cpp llvm/test/Verifier/preallocated-invalid.ll llvm/test/Verifier/preallocated-valid.ll Removed: ################################################################################ diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 9e99f4daa90a..cc2f6d1b3a09 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -12103,6 +12103,65 @@ It is undefined behavior if this is called with a token from an preallocated call corresponding to the '``llvm.call.preallocated.setup``' has already been called. +.. 
_int_call_preallocated_teardown: + +'``llvm.call.preallocated.teardown``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.call.preallocated.teardown(token %setup_token) + +Overview: +""""""""" + +The '``llvm.call.preallocated.teardown``' intrinsic cleans up the stack +created by a '``llvm.call.preallocated.setup``'. + +Semantics: +"""""""""" + +The token argument must be a '``llvm.call.preallocated.setup``'. + +The '``llvm.call.preallocated.teardown``' intrinsic cleans up the stack +allocated by the corresponding '``llvm.call.preallocated.setup``'. Exactly +one of this or the preallocated call must be called to prevent stack leaks. +It is undefined behavior to call both a '``llvm.call.preallocated.teardown``' +and the preallocated call for a given '``llvm.call.preallocated.setup``'. + +For example, if the stack is allocated for a preallocated call by a +'``llvm.call.preallocated.setup``', then an initializer function called on an +allocated argument throws an exception, there should be a +'``llvm.call.preallocated.teardown``' in the exception handler to prevent +stack leaks. + +Following the nesting rules in '``llvm.call.preallocated.setup``', nested +calls to '``llvm.call.preallocated.setup``' and +'``llvm.call.preallocated.teardown``' are allowed but must be properly +nested. + +Example: +"""""""" + +.. 
code-block:: llvm + + %cs = call token @llvm.call.preallocated.setup(i32 1) + %x = call i8* @llvm.call.preallocated.arg(token %cs, i32 0) preallocated(i32) + %y = bitcast i8* %x to i32* + invoke void @constructor(i32* %y) to label %conta unwind label %contb + conta: + call void @foo1(i32* preallocated(i32) %y) ["preallocated"(token %cs)] + ret void + contb: + %s = catchswitch within none [label %catch] unwind to caller + catch: + %p = catchpad within %s [] + call void @llvm.call.preallocated.teardown(token %cs) + ret void + Standard C Library Intrinsics ----------------------------- diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td index 433e24979ab9..94741229a2a7 100644 --- a/llvm/include/llvm/IR/Intrinsics.td +++ b/llvm/include/llvm/IR/Intrinsics.td @@ -534,6 +534,7 @@ def int_instrprof_value_profile : Intrinsic<[], def int_call_preallocated_setup : Intrinsic<[llvm_token_ty], [llvm_i32_ty]>; def int_call_preallocated_arg : Intrinsic<[llvm_ptr_ty], [llvm_token_ty, llvm_i32_ty]>; +def int_call_preallocated_teardown : Intrinsic<[], [llvm_token_ty]>; //===------------------- Standard C Library Intrinsics --------------------===// // diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index b5e4ce9f44b3..8fa87b748901 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -4566,6 +4566,9 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) { "llvm.call.preallocated.alloc arg index must be between 0 and " "corresponding " "llvm.call.preallocated.setup's argument count"); + } else if (Fn && Fn->getIntrinsicID() == + Intrinsic::call_preallocated_teardown) { + // nothing to do } else { Assert(!FoundCall, "Can have at most one call corresponding to a " "llvm.call.preallocated.setup"); @@ -4614,6 +4617,14 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) { "call site attribute"); break; } + case Intrinsic::call_preallocated_teardown: { + auto *Token = 
dyn_cast(Call.getArgOperand(0)); + Assert(Token && Token->getCalledFunction()->getIntrinsicID() == + Intrinsic::call_preallocated_setup, + "llvm.call.preallocated.teardown token argument must be a " + "llvm.call.preallocated.setup"); + break; + } case Intrinsic::gcroot: case Intrinsic::gcwrite: case Intrinsic::gcread: diff --git a/llvm/test/Verifier/preallocated-invalid.ll b/llvm/test/Verifier/preallocated-invalid.ll index 7fdab33167e5..879d4ed8a24f 100644 --- a/llvm/test/Verifier/preallocated-invalid.ll +++ b/llvm/test/Verifier/preallocated-invalid.ll @@ -2,6 +2,7 @@ declare token @llvm.call.preallocated.setup(i32) declare i8* @llvm.call.preallocated.arg(token, i32) +declare void @llvm.call.preallocated.teardown(token) ; Fake LLVM intrinsic to return a token declare token @llvm.what() @@ -136,3 +137,10 @@ define void @musttail_attr_no_match(i32* preallocated(i32) %a) { musttail call void @musttail_and_bundle(i32* %a) ret void } + +; CHECK: token argument must be a llvm.call.preallocated.setup +define void @teardown_token_not_from_setup() { + %cs = call token @llvm.what() + call void @llvm.call.preallocated.teardown(token %cs) + ret void +} diff --git a/llvm/test/Verifier/preallocated-valid.ll b/llvm/test/Verifier/preallocated-valid.ll index 483493c0c747..bbb663b94ebf 100644 --- a/llvm/test/Verifier/preallocated-valid.ll +++ b/llvm/test/Verifier/preallocated-valid.ll @@ -2,11 +2,16 @@ declare token @llvm.call.preallocated.setup(i32) declare i8* @llvm.call.preallocated.arg(token, i32) +declare void @llvm.call.preallocated.teardown(token) + +declare i32 @__CxxFrameHandler3(...) 
declare void @foo1(i32* preallocated(i32)) declare i64 @foo1_i64(i32* preallocated(i32)) declare void @foo2(i32* preallocated(i32), i32*, i32* preallocated(i32)) +declare void @constructor(i32*) + define void @preallocated() { %cs = call token @llvm.call.preallocated.setup(i32 1) %x = call i8* @llvm.call.preallocated.arg(token %cs, i32 0) preallocated(i32) @@ -40,12 +45,34 @@ define void @preallocated_num_args() { ret void } -define void @preallocate_musttail(i32* preallocated(i32) %a) { +define void @preallocated_musttail(i32* preallocated(i32) %a) { musttail call void @foo1(i32* preallocated(i32) %a) ret void } -define i64 @preallocate_musttail_i64(i32* preallocated(i32) %a) { +define i64 @preallocated_musttail_i64(i32* preallocated(i32) %a) { %r = musttail call i64 @foo1_i64(i32* preallocated(i32) %a) ret i64 %r } + +define void @preallocated_teardown() { + %cs = call token @llvm.call.preallocated.setup(i32 1) + call void @llvm.call.preallocated.teardown(token %cs) + ret void +} + +define void @preallocated_teardown_invoke() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) { + %cs = call token @llvm.call.preallocated.setup(i32 1) + %x = call i8* @llvm.call.preallocated.arg(token %cs, i32 0) preallocated(i32) + %y = bitcast i8* %x to i32* + invoke void @constructor(i32* %y) to label %conta unwind label %contb +conta: + call void @foo1(i32* preallocated(i32) %y) ["preallocated"(token %cs)] + ret void +contb: + %s = catchswitch within none [label %catch] unwind to caller +catch: + %p = catchpad within %s [] + call void @llvm.call.preallocated.teardown(token %cs) + ret void +} From llvm-commits at lists.llvm.org Wed Jul 8 08:49:26 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:49:26 +0000 (UTC) Subject: [PATCH] D83354: [Preallocated] Add @llvm.call.preallocated.teardown In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. 
Closed by commit rG470bf7b5a297: [Preallocated] Add @llvm.call.preallocated.teardown (authored by aeubanks). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83354/new/ https://reviews.llvm.org/D83354 Files: llvm/docs/LangRef.rst llvm/include/llvm/IR/Intrinsics.td llvm/lib/IR/Verifier.cpp llvm/test/Verifier/preallocated-invalid.ll llvm/test/Verifier/preallocated-valid.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83354.276451.patch Type: text/x-patch Size: 6934 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:50:20 2020 From: llvm-commits at lists.llvm.org (Evgeny Leviant via llvm-commits) Date: Wed, 08 Jul 2020 08:50:20 -0700 (PDT) Subject: [llvm] a074984 - [MIR] Speedup parsing of function with large number of basic blocks Message-ID: <5f05eb3c.1c69fb81.c9b98.0988@mx.google.com> Author: Evgeny Leviant Date: 2020-07-08T18:50:00+03:00 New Revision: a07498425099adbae38d3e8b01a0097fd6791c68 URL: https://github.com/llvm/llvm-project/commit/a07498425099adbae38d3e8b01a0097fd6791c68 DIFF: https://github.com/llvm/llvm-project/commit/a07498425099adbae38d3e8b01a0097fd6791c68.diff LOG: [MIR] Speedup parsing of function with large number of basic blocks Patch eliminates string length calculation when lexing a token. Speedup can be up to 1000x. 
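A note on why this can matter so much: the sketch below is an assumption-laden illustration, not the MIParser code. It uses standard `std::string_view` as a stand-in for `StringRef`, and the helper names are hypothetical. Building a view from a bare `const char *` needs a scan for the terminating NUL, so doing that once per token makes lexing n tokens cost on the order of n squared character reads; slicing an existing view reuses the length it already carries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Rebuilds the view from a raw pointer. The string_view(const char *)
// constructor has to scan for the terminating NUL, so every call costs time
// proportional to the remaining input. Requires a NUL-terminated buffer and
// skip <= source.size().
std::string_view restViaPointer(std::string_view source, std::size_t skip) {
  return std::string_view(source.data() + skip);
}

// Reuses the length the view already carries: constant time per call.
// The skip count is clamped so an oversized skip yields an empty view.
std::string_view restViaSlice(std::string_view source, std::size_t skip) {
  return source.substr(std::min(skip, source.size()));
}
```

Both helpers return the same text for a NUL-terminated buffer; only the cost per call differs.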
Differential revision: https://reviews.llvm.org/D83389 Added: Modified: llvm/lib/CodeGen/MIRParser/MIParser.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/MIRParser/MIParser.cpp b/llvm/lib/CodeGen/MIRParser/MIParser.cpp index d6bf273433a2..ded31cd08fb5 100644 --- a/llvm/lib/CodeGen/MIRParser/MIParser.cpp +++ b/llvm/lib/CodeGen/MIRParser/MIParser.cpp @@ -563,7 +563,7 @@ MIParser::MIParser(PerFunctionMIParsingState &PFS, SMDiagnostic &Error, void MIParser::lex(unsigned SkipChar) { CurrentSource = lexMIToken( - CurrentSource.data() + SkipChar, Token, + CurrentSource.slice(SkipChar, StringRef::npos), Token, [this](StringRef::iterator Loc, const Twine &Msg) { error(Loc, Msg); }); } From llvm-commits at lists.llvm.org Wed Jul 8 08:50:29 2020 From: llvm-commits at lists.llvm.org (Eugene Leviant via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:50:29 +0000 (UTC) Subject: [PATCH] D83389: [MIR] Speedup parsing (up to 1000x+) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGa07498425099: [MIR] Speedup parsing of function with large number of basic blocks (authored by evgeny777). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83389/new/ https://reviews.llvm.org/D83389 Files: llvm/lib/CodeGen/MIRParser/MIParser.cpp Index: llvm/lib/CodeGen/MIRParser/MIParser.cpp =================================================================== --- llvm/lib/CodeGen/MIRParser/MIParser.cpp +++ llvm/lib/CodeGen/MIRParser/MIParser.cpp @@ -563,7 +563,7 @@ void MIParser::lex(unsigned SkipChar) { CurrentSource = lexMIToken( - CurrentSource.data() + SkipChar, Token, + CurrentSource.slice(SkipChar, StringRef::npos), Token, [this](StringRef::iterator Loc, const Twine &Msg) { error(Loc, Msg); }); } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83389.276453.patch
Type: text/x-patch
Size: 491 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 08:51:37 2020
From: llvm-commits at lists.llvm.org (John Reagan via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:51:37 +0000 (UTC)
Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields.
In-Reply-To:
References:
Message-ID: <6982d2bac4fc28532d827b2c02305800@localhost.localdomain>

JohnReagan added a comment.

Yes, my BLISS language allows structures to have "negative" offsets. You end up passing around a pointer to the "middle" and have some fields in either direction (often private in one direction, public in the other). It was also used as a crude form of polymorphism back in the day, before the word existed.

My BLISS and Pascal also let you provide explicit field offsets, which can leave alignment holes scattered around, along with alignment attributes on fields as well as on the overall structure itself.

For Pascal's PACKED RECORDs, subrange values get packed into a small set of bit positions and, unlike C, their alignment isn't derived from the underlying base type.

  PACKED RECORD
    F1 : 1..10;
    F2 : 'A'..'L';
    F3 : (RED, WHITE, BLUE);
  END;

This ends up with a size of 2 bytes for the overall type (and any variables of that type): F1 is 4 bits wide at offset 0 bits from the start, F2 is 7 bits wide at offset 4 bits from the start, and F3 is 2 bits wide at offset 11 bits from the start.

Of course, removing PACKED ends up with a much more pleasant layout, but I have a legacy PACKED layout that started back in the VAX days, when bytes were precious, and I need to describe it.
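The numbers in the PACKED RECORD example can be reproduced with a few lines of arithmetic. This is a minimal sketch, not how any Pascal compiler is implemented; the helper names are hypothetical. It assumes each field is stored unbiased, so its width is the number of bits needed for the largest ordinal it can hold, which is what the sizes quoted in the message imply (7 bits for 'A'..'L', since the ordinal of 'L' is 76).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of bits needed to hold every value in 0..maxValue.
unsigned bitsFor(std::uint64_t maxValue) {
  unsigned bits = 1;
  while (maxValue >>= 1)
    ++bits;
  return bits;
}

struct BitField {
  unsigned offsetBits;
  unsigned sizeBits;
};

// Lays fields out back to back with no alignment padding. Each field is
// described by the largest ordinal value it can hold.
std::vector<BitField> layoutPacked(const std::vector<std::uint64_t> &maxOrdinals) {
  std::vector<BitField> fields;
  unsigned offset = 0;
  for (std::uint64_t maxOrdinal : maxOrdinals) {
    unsigned size = bitsFor(maxOrdinal);
    fields.push_back({offset, size});
    offset += size;
  }
  return fields;
}

// Total size in whole bytes of a layout produced by layoutPacked.
unsigned sizeInBytes(const std::vector<BitField> &fields) {
  if (fields.empty())
    return 0;
  const BitField &last = fields.back();
  return (last.offsetBits + last.sizeBits + 7) / 8;
}
```

Calling `layoutPacked({10, 'L', 2})` models F1 (1..10), F2 ('A'..'L') and F3 (three enumerators, ordinals 0..2): 13 bits in total, rounded up to 2 bytes.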
(The first VAX shipped with a base of 256Kbytes [yes 'K'] but most customers splurged for an entire 1MB - the system max'd out at 8MB) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Wed Jul 8 08:52:55 2020 From: llvm-commits at lists.llvm.org (Sean Fertile via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:52:55 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: sfertile accepted this revision. sfertile marked an inline comment as done. sfertile added a comment. This revision is now accepted and ready to land. LGTM. ================ Comment at: lld/ELF/Thunks.cpp:982 + if ((s.stOther >> 5) == 1 && type == R_PPC64_REL24) + return make(s); ---------------- stefanp wrote: > MaskRay wrote: > > This needs a comment. > I've added a comment. > Adding the comment forced me to re-assess the condition statement so I've also changed the condition to be just: > ``` > if ((s.stOther >> 5) == 1) > ``` > There was no reason to exclude R_PPC64_REL14 from this condition. > I have also added a test for R_PPC64_REL14. 
👍 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 From llvm-commits at lists.llvm.org Wed Jul 8 08:54:19 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Wed, 08 Jul 2020 08:54:19 -0700 (PDT) Subject: [llvm] 481709e - [NewPM][opt] Share -disable-loop-unrolling between pass managers Message-ID: <5f05ec2b.1c69fb81.8ce8c.0ad3@mx.google.com> Author: Arthur Eubanks Date: 2020-07-08T08:50:56-07:00 New Revision: 481709e831b9e14793dea0a825ecc9cd5f1950ca URL: https://github.com/llvm/llvm-project/commit/481709e831b9e14793dea0a825ecc9cd5f1950ca DIFF: https://github.com/llvm/llvm-project/commit/481709e831b9e14793dea0a825ecc9cd5f1950ca.diff LOG: [NewPM][opt] Share -disable-loop-unrolling between pass managers There's no reason to introduce a new option for the NPM. The various PGO options are shared in this manner. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D83368 Added: Modified: llvm/test/Transforms/LoopUnroll/FullUnroll.ll llvm/tools/opt/NewPMDriver.cpp llvm/tools/opt/opt.cpp Removed: ################################################################################ diff --git a/llvm/test/Transforms/LoopUnroll/FullUnroll.ll b/llvm/test/Transforms/LoopUnroll/FullUnroll.ll index 01936e487682..5dee20be3325 100644 --- a/llvm/test/Transforms/LoopUnroll/FullUnroll.ll +++ b/llvm/test/Transforms/LoopUnroll/FullUnroll.ll @@ -1,4 +1,4 @@ -; RUN: opt -passes='default' -disable-verify --mtriple x86_64-pc-linux-gnu -new-pm-disable-loop-unrolling=true \ +; RUN: opt -passes='default' -disable-verify --mtriple x86_64-pc-linux-gnu -disable-loop-unrolling=true \ ; RUN: -S -o - %s | FileCheck %s ; This checks that the loop full unroller will fire in the new pass manager diff --git a/llvm/tools/opt/NewPMDriver.cpp b/llvm/tools/opt/NewPMDriver.cpp index 0b572efc9010..8f8ca352dcff 100644 --- a/llvm/tools/opt/NewPMDriver.cpp +++ 
b/llvm/tools/opt/NewPMDriver.cpp
@@ -102,9 +102,7 @@ static cl::opt OptimizerLastEPPipeline(
     cl::Hidden);
 
 // Individual pipeline tuning options.
-static cl::opt<bool> DisableLoopUnrolling(
-    "new-pm-disable-loop-unrolling",
-    cl::desc("Disable loop unrolling in all relevant passes"), cl::init(false));
+extern cl::opt<bool> DisableLoopUnrolling;
 
 extern cl::opt PGOKindFlag;
 extern cl::opt ProfileFile;
diff --git a/llvm/tools/opt/opt.cpp b/llvm/tools/opt/opt.cpp
index d31b985dbdde..936cf1081f99 100644
--- a/llvm/tools/opt/opt.cpp
+++ b/llvm/tools/opt/opt.cpp
@@ -183,10 +183,9 @@ CodeGenOptLevel("codegen-opt-level",
 static cl::opt
 TargetTriple("mtriple", cl::desc("Override target triple for module"));
 
-static cl::opt<bool>
-DisableLoopUnrolling("disable-loop-unrolling",
-                     cl::desc("Disable loop unrolling in all relevant passes"),
-                     cl::init(false));
+cl::opt<bool> DisableLoopUnrolling(
+    "disable-loop-unrolling",
+    cl::desc("Disable loop unrolling in all relevant passes"), cl::init(false));
 
 static cl::opt EmitSummaryIndex("module-summary",
                                 cl::desc("Emit module summary index"),

From llvm-commits at lists.llvm.org Wed Jul 8 08:54:29 2020
From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:54:29 +0000 (UTC)
Subject: [PATCH] D83368: [NewPM][opt] Share -disable-loop-unrolling between pass managers
In-Reply-To:
References:
Message-ID:

This revision was automatically updated to reflect the committed changes.
Closed by commit rG481709e831b9: [NewPM][opt] Share -disable-loop-unrolling between pass managers (authored by aeubanks).
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83368/new/ https://reviews.llvm.org/D83368 Files: llvm/test/Transforms/LoopUnroll/FullUnroll.ll llvm/tools/opt/NewPMDriver.cpp llvm/tools/opt/opt.cpp Index: llvm/tools/opt/opt.cpp =================================================================== --- llvm/tools/opt/opt.cpp +++ llvm/tools/opt/opt.cpp @@ -183,10 +183,9 @@ static cl::opt TargetTriple("mtriple", cl::desc("Override target triple for module")); -static cl::opt -DisableLoopUnrolling("disable-loop-unrolling", - cl::desc("Disable loop unrolling in all relevant passes"), - cl::init(false)); +cl::opt DisableLoopUnrolling( + "disable-loop-unrolling", + cl::desc("Disable loop unrolling in all relevant passes"), cl::init(false)); static cl::opt EmitSummaryIndex("module-summary", cl::desc("Emit module summary index"), Index: llvm/tools/opt/NewPMDriver.cpp =================================================================== --- llvm/tools/opt/NewPMDriver.cpp +++ llvm/tools/opt/NewPMDriver.cpp @@ -102,9 +102,7 @@ cl::Hidden); // Individual pipeline tuning options. 
-static cl::opt DisableLoopUnrolling( - "new-pm-disable-loop-unrolling", - cl::desc("Disable loop unrolling in all relevant passes"), cl::init(false)); +extern cl::opt DisableLoopUnrolling; extern cl::opt PGOKindFlag; extern cl::opt ProfileFile; Index: llvm/test/Transforms/LoopUnroll/FullUnroll.ll =================================================================== --- llvm/test/Transforms/LoopUnroll/FullUnroll.ll +++ llvm/test/Transforms/LoopUnroll/FullUnroll.ll @@ -1,4 +1,4 @@ -; RUN: opt -passes='default' -disable-verify --mtriple x86_64-pc-linux-gnu -new-pm-disable-loop-unrolling=true \ +; RUN: opt -passes='default' -disable-verify --mtriple x86_64-pc-linux-gnu -disable-loop-unrolling=true \ ; RUN: -S -o - %s | FileCheck %s ; This checks that the loop full unroller will fire in the new pass manager -------------- next part -------------- A non-text attachment was scrubbed... Name: D83368.276455.patch Type: text/x-patch Size: 1880 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:55:16 2020 From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:55:16 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <12c254ea240a4501b522e90fcb11094b@localhost.localdomain> schweitz resigned from this revision. schweitz added a comment. This revision is now accepted and ready to land. I've reverted these models from the upstreamed code for now. It will avoid the warnings. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 08:55:59 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:55:59 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. 
In-Reply-To: References: Message-ID: <601c25de8c6c80fec5a8ab9e3b4c41b5@localhost.localdomain> ikudrin updated this revision to Diff 276456. ikudrin added a comment. - Updated the wording for the reporting of premature termination. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 Files: lld/ELF/SyntheticSections.cpp lld/test/ELF/Inputs/gdb-index.s lld/test/ELF/gdb-index-invalid-pubnames.s lld/test/ELF/gdb-index.s llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83050.276456.patch Type: text/x-patch Size: 15755 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:58:13 2020 From: llvm-commits at lists.llvm.org (Eric Schweitz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:58:13 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <55613c84d409bb40a971119ec3bec7f3@localhost.localdomain> schweitz added a comment. In D83397#2139202 , @echristo wrote: > You also still haven't replied with "what bot/compiler/etc is going to break by making this change". Would you please do that? Locally, we build flang with our changes with g++ and clang++ on a variety of Linux and MacOS. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 08:58:34 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:58:34 +0000 (UTC) Subject: [PATCH] D83303: [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. 
In-Reply-To:
References:
Message-ID:

JonChesterfield added subscribers: davidb, JonChesterfield.
JonChesterfield added a comment.

Nice, thank you.

@davidb this probably means you can drop the TargetConcatVectors ISD node from your llc.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83303/new/

https://reviews.llvm.org/D83303

From llvm-commits at lists.llvm.org Wed Jul 8 08:59:03 2020
From: llvm-commits at lists.llvm.org (Alexey Lapshin via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 15:59:03 +0000 (UTC)
Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls.
In-Reply-To:
References:
Message-ID: <1ad09a58e825afc0e0c9fa8a09b15be8@localhost.localdomain>

avl marked 3 inline comments as done.
avl added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:94
 /// If it contains any dynamic allocas, returns false.
 static bool canTRE(Function &F) {
   // Because of PR962, we don't TRE dynamic allocas.
----------------
efriedma wrote:
> If we're not going to try to do TRE at all on calls not marked "tail", we can probably drop this check.
It looks to me that the original idea (PR962) was to avoid the inefficient code that is generated for dynamic allocas.

Inefficient code would still be generated today:

Doing TRE with a dynamic alloca requires correct stack adjustment to avoid extra stack usage,
i.e. the dynamic stack reservation made for the alloca should be restored
at the end of the current iteration. The current TRE implementation does not do this.
Please consider this test case:

```
#include
int count;
__attribute__((noinline)) void globalIncrement(const int* param) {
  count += *param;
}

void test(int recurseCount)
{
    if (recurseCount == 0) return;
    {
    int *temp = (int*)alloca(100);
    globalIncrement(temp);
    }
    test(recurseCount - 1);
}
```

The following x86 asm is generated for the above test case, assuming that dynamic allocas are allowed:

```
.LBB1_2:
  movq  %rsp, %rdi
  addq  $-112, %rdi   <<<<<<<<<<<<<< dynamic stack reservation, need to be restored before "jne .LBB1_2"
  movq  %rdi, %rsp
  callq _Z15globalIncrementPKi
  addl  $-1, %ebx
  jne   .LBB1_2
```

So it looks like we would still have inefficient code here, and that was the reason for avoiding TRE.

================
Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:808
   // Until this is resolved, disable this transformation if that would ever
   // happen. This bug is PR962.
   for (Function::iterator BBI = F.begin(), E = F.end(); BBI != E; /*in loop*/) {
----------------
efriedma wrote:
> Can you move this FIXME into a more appropriate spot?
OK.

================
Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:335
+        II->getIntrinsicID() == Intrinsic::assume)
+      return true;
+
----------------
efriedma wrote:
> avl wrote:
> > efriedma wrote:
> > > What is the new handling for lifetime.end/assume doing?
> > They are just skipped. In following test case:
> >
> >
> > ```
> >   call void @_Z5test5i(i32 %sub)
> >   call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %1) #5
> >   call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5
> >   br label %return
> >
> > ```
> >
> > they are generated in between call and ret. It is safe to ignore them while checking whether transformation is possible.
> It makes sense we can ignore lifetime.end on an alloca: we know the call doesn't refer to the alloca. (Maybe we should check that the pointer argument is pointing at an alloca? That should usually be true anyway, but better to be on the safe side, I guess.)
> > I don't think it's safe to hoist assume without additional checks; I think we'd need to check that the call is marked "willreturn"? > > Since this is sort of tricky, I'd prefer to split this off into a followup. >It makes sense we can ignore lifetime.end on an alloca: we know the call doesn't refer to the alloca. (Maybe we should check that the pointer argument is pointing at an alloca? That should usually be true anyway, but better to be on the safe side, I guess.) OK, I would add checking that the pointer argument of lifetime.end is pointing to an alloca. >I don't think it's safe to hoist assume without additional checks; I think we'd need to check that the call is marked "willreturn"? >Since this is sort of tricky, I'd prefer to split this off into a followup. Ok, I would split Intrinsic::assume into another review. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 From llvm-commits at lists.llvm.org Wed Jul 8 08:59:32 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:59:32 +0000 (UTC) Subject: [PATCH] D78133: [PredicateInfo] Add additional RenamedOp field to PB. In-Reply-To: References: Message-ID: nikic accepted this revision. nikic added a comment. This revision is now accepted and ready to land. 
LG Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78133/new/ https://reviews.llvm.org/D78133 From llvm-commits at lists.llvm.org Wed Jul 8 09:01:33 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Wed, 08 Jul 2020 09:01:33 -0700 (PDT) Subject: [llvm] 3f17332 - [NewPM][opt] Translate "-O#" to NPM's "default" Message-ID: <5f05eddd.1c69fb81.7839f.0adc@mx.google.com> Author: Arthur Eubanks Date: 2020-07-08T09:01:20-07:00 New Revision: 3f17332aa71542842ceb76e77b45315e6f3ff819 URL: https://github.com/llvm/llvm-project/commit/3f17332aa71542842ceb76e77b45315e6f3ff819 DIFF: https://github.com/llvm/llvm-project/commit/3f17332aa71542842ceb76e77b45315e6f3ff819.diff LOG: [NewPM][opt] Translate "-O#" to NPM's "default" Fixes 52 check-llvm tests under NPM. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D83367 Added: Modified: llvm/test/Other/opt-hot-cold-split.ll llvm/tools/opt/opt.cpp Removed: ################################################################################ diff --git a/llvm/test/Other/opt-hot-cold-split.ll b/llvm/test/Other/opt-hot-cold-split.ll index 971fe130b11c..f43f3a3d893c 100644 --- a/llvm/test/Other/opt-hot-cold-split.ll +++ b/llvm/test/Other/opt-hot-cold-split.ll @@ -1,8 +1,8 @@ ; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=DEFAULT-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-PRELINK-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='thinlto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os -; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true 
-passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os
+; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-PRELINK-Os
+; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os
+; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os
+; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os
 
 ; REQUIRES: asserts
 
diff --git a/llvm/tools/opt/opt.cpp b/llvm/tools/opt/opt.cpp
index 936cf1081f99..0e52134f0100 100644
--- a/llvm/tools/opt/opt.cpp
+++ b/llvm/tools/opt/opt.cpp
@@ -746,6 +746,18 @@ int main(int argc, char **argv) {
     for (const auto &P : PassList) {
       Passes.push_back(P->getPassArgument());
     }
+    if (OptLevelO0)
+      Passes.push_back("default<O0>");
+    if (OptLevelO1)
+      Passes.push_back("default<O1>");
+    if (OptLevelO2)
+      Passes.push_back("default<O2>");
+    if (OptLevelO3)
+      Passes.push_back("default<O3>");
+    if (OptLevelOs)
+      Passes.push_back("default<Os>");
+    if (OptLevelOz)
+      Passes.push_back("default<Oz>");
     OutputKind OK = OK_NoOutput;
     if (!NoOutput)
       OK = OutputAssembly

From llvm-commits at lists.llvm.org Wed Jul 8 09:01:37 2020
From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 16:01:37 +0000 (UTC)
Subject: [PATCH] D83367: [NewPM][opt] Translate "-O#" to NPM's "default<O#>"
In-Reply-To:
References:
Message-ID: <421cff8dd7aaf7bdc499b27d37c57d92@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG3f17332aa715: [NewPM][opt] Translate "-O#" to NPM's "default<O#>" (authored by aeubanks).
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83367/new/

https://reviews.llvm.org/D83367

Files:
  llvm/test/Other/opt-hot-cold-split.ll
  llvm/tools/opt/opt.cpp

Index: llvm/tools/opt/opt.cpp
===================================================================
--- llvm/tools/opt/opt.cpp
+++ llvm/tools/opt/opt.cpp
@@ -746,6 +746,18 @@
     for (const auto &P : PassList) {
       Passes.push_back(P->getPassArgument());
     }
+    if (OptLevelO0)
+      Passes.push_back("default<O0>");
+    if (OptLevelO1)
+      Passes.push_back("default<O1>");
+    if (OptLevelO2)
+      Passes.push_back("default<O2>");
+    if (OptLevelO3)
+      Passes.push_back("default<O3>");
+    if (OptLevelOs)
+      Passes.push_back("default<Os>");
+    if (OptLevelOz)
+      Passes.push_back("default<Oz>");
     OutputKind OK = OK_NoOutput;
     if (!NoOutput)
       OK = OutputAssembly
Index: llvm/test/Other/opt-hot-cold-split.ll
===================================================================
--- llvm/test/Other/opt-hot-cold-split.ll
+++ llvm/test/Other/opt-hot-cold-split.ll
@@ -1,8 +1,8 @@
 ; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=DEFAULT-Os
-; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-PRELINK-Os
-; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='thinlto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os
-; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os
-; RUN: opt -mtriple=x86_64-- -Os -hot-cold-split=true -passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os
+; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-PRELINK-Os
+; RUN: opt
-mtriple=x86_64-- -hot-cold-split=true -passes='thinlto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os ; REQUIRES: asserts -------------- next part -------------- A non-text attachment was scrubbed... Name: D83367.276458.patch Type: text/x-patch Size: 2426 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:01:39 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:01:39 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <173d7acc1f2accb2498bdbd0a8b24838@localhost.localdomain> echristo added a comment. In D83397#2139319 , @schweitz wrote: > I've reverted these models from the upstreamed code for now. It will avoid the warnings. OK, thanks! There are a few more unused ones too, but that's not a warning that's currently enabled for flang (should be eventually of course). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83397/new/ https://reviews.llvm.org/D83397 From llvm-commits at lists.llvm.org Wed Jul 8 09:02:16 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:02:16 +0000 (UTC) Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex In-Reply-To: References: Message-ID: <4904c75d49ce4471d6fe86842c7a7b45@localhost.localdomain> echristo added a comment. 
In D83397#2139336, @schweitz wrote:

> In D83397#2139202, @echristo wrote:
>
> > You also still haven't replied with "what bot/compiler/etc is going to break by making this change". Would you please do that?
>
>
> Locally, we build flang with our changes with g++ and clang++ on a variety of Linux and MacOS.

So do I, as do a bunch of the builders, so I'm still unsure what issues you're going to see with the change. Can you elaborate?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83397/new/

https://reviews.llvm.org/D83397

From llvm-commits at lists.llvm.org Wed Jul 8 09:12:04 2020
From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 16:12:04 +0000 (UTC)
Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend
In-Reply-To:
References:
Message-ID:

lei added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:431
+let mayLoad = 1, mayStore = 0, Predicates = [IsISA3_1] in {
+  // The XFormMemOp flag is set on the instruction format.
----------------
Instead of creating a new section like this, why not add to the existing one on line 469? I realize that one does not have `Predicates = [IsISA3_1]`, but I think that is an oversight from a previous patch, and it should be added since those instructions are also part of ISA3.1.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:439
+
+let mayLoad = 0, mayStore = 1, Predicates = [IsISA3_1] in {
+  // The XFormMemOp flag is set on the instruction format.
----------------
same.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 From llvm-commits at lists.llvm.org Wed Jul 8 09:13:10 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via llvm-commits) Date: Wed, 08 Jul 2020 09:13:10 -0700 (PDT) Subject: [llvm] 0fc17e9 - [matrix] Add some more Verifier negative tests. NFC. Message-ID: <5f05f096.1c69fb81.4b096.0c21@mx.google.com> Author: Sjoerd Meijer Date: 2020-07-08T17:11:53+01:00 New Revision: 0fc17e9edc8f95e2f52966b2187fa04002f30f3d URL: https://github.com/llvm/llvm-project/commit/0fc17e9edc8f95e2f52966b2187fa04002f30f3d DIFF: https://github.com/llvm/llvm-project/commit/0fc17e9edc8f95e2f52966b2187fa04002f30f3d.diff LOG: [matrix] Add some more Verifier negative tests. NFC. Added: Modified: llvm/test/Verifier/matrix-intrinsics.ll Removed: ################################################################################ diff --git a/llvm/test/Verifier/matrix-intrinsics.ll b/llvm/test/Verifier/matrix-intrinsics.ll index 5f5af03da101..6b2a4c501c66 100644 --- a/llvm/test/Verifier/matrix-intrinsics.ll +++ b/llvm/test/Verifier/matrix-intrinsics.ll @@ -1,40 +1,66 @@ ; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s declare <4 x float> @llvm.matrix.transpose.v4f32(<4 x float>, i32, i32) -define <4 x float> @transpose(<4 x float> %m) { +define <4 x float> @transpose(<4 x float> %m, i32 %arg) { ; CHECK: assembly parsed, but does not verify as correct! 
; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: result of a matrix operation does not fit in the returned vector - %result.1 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %m, i32 3, i32 2) +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: immarg operand has non-immediate parameter +; CHECK-NEXT: i32 %arg +; CHECK-NEXT: %result.3 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.2, i32 %arg, i32 2) +; CHECK-NEXT: immarg operand has non-immediate parameter +; CHECK-NEXT: i32 %arg +; CHECK-NEXT: %result.4 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.3, i32 2, i32 %arg) + %result.0 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %m, i32 0, i32 0) + %result.1 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.0, i32 3, i32 2) %result.2 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.1, i32 2, i32 1) - ret <4 x float> %result.2 + %result.3 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.2, i32 %arg, i32 2) + %result.4 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.3, i32 2, i32 %arg) + ret <4 x float> %result.4 } declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32) -define <4 x float> @multiply(<4 x float> %m) { +define <4 x float> @multiply(<4 x float> %m, i32 %arg) { +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: result of a matrix operation does not fit in the returned vector - %result.1 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %m, <4 x float> %m, i32 3, i32 2, i32 2) +; CHECK-NEXT: immarg operand has non-immediate parameter +; CHECK-NEXT: i32 %arg +; CHECK-NEXT: %result.3 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x 
float> %result.2, <4 x float> %m, i32 %arg, i32 2, i32 1) + %result.0 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %m, <4 x float> %m, i32 0, i32 0, i32 0) + %result.1 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %result.0, <4 x float> %m, i32 3, i32 2, i32 2) %result.2 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %result.1, <4 x float> %m, i32 2, i32 2, i32 1) - ret <4 x float> %result.2 + %result.3 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %result.2, <4 x float> %m, i32 %arg, i32 2, i32 1) + ret <4 x float> %result.3 } declare <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>*, i64, i1, i32, i32) declare <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>*, i64, i1, i32, i32) -define <4 x float> @column.major_load(<4 x float>* %m, <6 x float>* %n) { +define <4 x float> @column.major_load(<4 x float>* %m, <6 x float>* %n, i32 %arg) { +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: immarg operand has non-immediate parameter +; CHECK-NEXT: i32 %arg +; CHECK-NEXT: %result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>* %n, i64 2, i1 true, i32 3, i32 %arg) + %result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>* %m, i64 0, i1 false, i32 0, i32 0) %result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>* %m, i64 2, i1 false, i32 1, i32 2) %result.2 = call <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>* %n, i64 2, i1 true, i32 3, i32 3) + %result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>* %n, i64 2, i1 true, i32 3, i32 %arg) ret <4 x float> %result.1 } declare void 
@llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float>, <4 x float>*, i64, i1, i32, i32) declare void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float>, <6 x float>*, i64, i1, i32, i32) -define void @column.major_store(<4 x float>* %m, <6 x float>* %n) { +define void @column.major_store(<4 x float>* %m, <6 x float>* %n, i64 %arg) { +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: result of a matrix operation does not fit in the returned vector + call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 0, i1 false, i32 0, i32 0) call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 2, i1 false, i32 1, i32 2) call void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float> zeroinitializer, <6 x float>* %n, i64 2, i1 false, i32 3, i32 3) + call void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float> zeroinitializer, <6 x float>* %n, i64 %arg, i1 false, i32 3, i32 3) ret void } From llvm-commits at lists.llvm.org Wed Jul 8 09:15:38 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Wed, 08 Jul 2020 09:15:38 -0700 (PDT) Subject: [llvm] bf9a940 - Revert "Double check that passes correctly set their Modified status" Message-ID: <5f05f12a.1c69fb81.41061.0c3b@mx.google.com> Author: serge-sans-paille Date: 2020-07-08T18:14:40+02:00 New Revision: bf9a940c3f1b460420b1106fe5b1565fd60be5a2 URL: https://github.com/llvm/llvm-project/commit/bf9a940c3f1b460420b1106fe5b1565fd60be5a2 DIFF: https://github.com/llvm/llvm-project/commit/bf9a940c3f1b460420b1106fe5b1565fd60be5a2.diff LOG: Revert "Double check that passes correctly set their Modified status" This reverts commit 37afd99c768b29c7df7c5f2eb645362fb61f9915. 
Added: Modified: llvm/lib/IR/LegacyPassManager.cpp llvm/unittests/IR/LegacyPassManagerTest.cpp Removed: ################################################################################ diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index ae0604432c2a..1d9c44f385fb 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -1443,74 +1443,6 @@ void FPPassManager::dumpPassStructure(unsigned Offset) { } } -#ifdef EXPENSIVE_CHECKS -namespace { -namespace details { - -// Basic hashing mechanism to detect structural change to the IR, used to verify -// pass return status consistency with actual change. Loosely copied from -// llvm/lib/Transforms/Utils/FunctionComparator.cpp - -class StructuralHash { - uint64_t Hash = 0x6acaa36bef8325c5ULL; - - void update(uint64_t V) { Hash = hashing::detail::hash_16_bytes(Hash, V); } - -public: - StructuralHash() = default; - - void update(Function &F) { - if (F.empty()) - return; - - update(F.isVarArg()); - update(F.arg_size()); - - SmallVector BBs; - SmallPtrSet VisitedBBs; - - BBs.push_back(&F.getEntryBlock()); - VisitedBBs.insert(BBs[0]); - while (!BBs.empty()) { - const BasicBlock *BB = BBs.pop_back_val(); - update(45798); // Block header - for (auto &Inst : *BB) - update(Inst.getOpcode()); - - const Instruction *Term = BB->getTerminator(); - for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) { - if (!VisitedBBs.insert(Term->getSuccessor(i)).second) - continue; - BBs.push_back(Term->getSuccessor(i)); - } - } - } - - void update(Module &M) { - for (Function &F : M) - update(F); - } - - uint64_t getHash() const { return Hash; } -}; - -} // namespace details - -uint64_t StructuralHash(Function &F) { - details::StructuralHash H; - H.update(F); - return H.getHash(); -} - -uint64_t StructuralHash(Module &M) { - details::StructuralHash H; - H.update(M); - return H.getHash(); -} - -} // end anonymous namespace - -#endif /// Execute all of the passes scheduled 
for execution by invoking /// runOnFunction method. Keep track of whether any of the passes modifies @@ -1549,16 +1481,7 @@ bool FPPassManager::runOnFunction(Function &F) { { PassManagerPrettyStackEntry X(FP, F); TimeRegion PassTimer(getPassTimer(FP)); -#ifdef EXPENSIVE_CHECKS - uint64_t RefHash = StructuralHash(F); -#endif LocalChanged |= FP->runOnFunction(F); - -#ifdef EXPENSIVE_CHECKS - assert((LocalChanged || (RefHash == StructuralHash(F))) && - "Pass modifies its input and doesn't report it."); -#endif - if (EmitICRemark) { unsigned NewSize = F.getInstructionCount(); @@ -1659,17 +1582,7 @@ MPPassManager::runOnModule(Module &M) { PassManagerPrettyStackEntry X(MP, M); TimeRegion PassTimer(getPassTimer(MP)); -#ifdef EXPENSIVE_CHECKS - uint64_t RefHash = StructuralHash(M); -#endif - LocalChanged |= MP->runOnModule(M); - -#ifdef EXPENSIVE_CHECKS - assert((LocalChanged || (RefHash == StructuralHash(M))) && - "Pass modifies its input and doesn't report it."); -#endif - if (EmitICRemark) { // Update the size of the module. 
unsigned ModuleCount = M.getInstructionCount(); diff --git a/llvm/unittests/IR/LegacyPassManagerTest.cpp b/llvm/unittests/IR/LegacyPassManagerTest.cpp index 8dda94b1b032..b7801b52481d 100644 --- a/llvm/unittests/IR/LegacyPassManagerTest.cpp +++ b/llvm/unittests/IR/LegacyPassManagerTest.cpp @@ -680,7 +680,7 @@ namespace llvm { ASSERT_EQ(M->getFunctionList().size(), 4U); Function *F = M->getFunction("test2"); Function *SF = splitSimpleFunction(*F); - CallInst::Create(F, "", &*SF->getEntryBlock().getFirstInsertionPt()); + CallInst::Create(F, "", &SF->getEntryBlock()); ASSERT_EQ(M->getFunctionList().size(), 5U); CGModifierPass *P = new CGModifierPass(); legacy::PassManager Passes; From llvm-commits at lists.llvm.org Wed Jul 8 09:15:50 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:15:50 +0000 (UTC) Subject: [PATCH] D77129: [Verifier] Verify matrix dimensions operands match vector size. In-Reply-To: References: Message-ID: SjoerdMeijer added inline comments. ================ Comment at: llvm/lib/IR/Verifier.cpp:4805 + NumColumns = cast(Call.getArgOperand(4)); + TypeToCheck = cast(Call.getType()); + break; ---------------- fhahn wrote: > SjoerdMeijer wrote: > > fhahn wrote: > > > SjoerdMeijer wrote: > > > > Quick query on this and the semantics: > > > > > > > > declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 , i32 , i32 ) > > > > > > > > do we expect the element types of vectors %A and %B to be same, and do we need to check this? > > > Yes, the element types of all types must match currently, but I think it is neither checked in the verifier nor explicit in the LangRef. > > > > > > To generate code for llvm.aarch64.neon.udot & co, there probably needs to be a way to have different element type widths for result and source operands. > > > Yes, the element types of all types must match currently, but I think it is neither checked in the verifier nor explicit in the LangRef. 
> >
> > I started looking at the matrix support, getting up to speed with it, and this is where I started and the first thing I noticed. Was just asking about that here as a sanity check. I wouldn't mind putting up a patch for that if that's helpful. Probably the least we can do for now is to check that we are not mixing integer and float types, and then we also need to add that to LangRef and be explicit about that.
> > I wouldn't mind putting up a patch for that if that's helpful.
>
> That would be great. I think things will fall apart/miscompile if the element types differ at the moment.

cool, will do.
FYI: I started with committing a NFC patch adding some more negative tests: rG0fc17e9edc8f


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77129/new/

https://reviews.llvm.org/D77129

From llvm-commits at lists.llvm.org Wed Jul 8 09:17:00 2020
From: llvm-commits at lists.llvm.org (Xun Li via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 16:17:00 +0000 (UTC)
Subject: [PATCH] D83379: [Coroutines] Refactor sinkLifetimeStartMarkers
In-Reply-To:
References:
Message-ID: <77eee840b1b1ff2a0032fa43878718bb@localhost.localdomain>

lxfind added a comment.

Thank you for looking into the fix!


================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1578
+  auto isUsedByLifetimeStart = [&](Instruction *I) {
+    if (isa(I) && I->hasOneUse())
+      if (auto *IT = dyn_cast(I->user_back()))
----------------
If I is a BitCastInst, wouldn't it be used by both lifetime.start and lifetime.end intrinsics, and hence have more than one user?
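
To make the question concrete, here is a small standalone model of the use-count concern. `ToyValue` and both helper functions are hypothetical names that only mimic the idea of a one-use query; nothing here is LLVM's API, and whether the bitcasts in the patch really feed both markers is exactly what the comment asks.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an IR value; not LLVM's Value/Instruction API.
struct ToyValue {
  std::string Name;
  std::vector<std::string> Users; // names of the instructions using this value

  // Mirrors the idea behind a "has exactly one use" query.
  bool hasOneUse() const { return Users.size() == 1; }
};

// A toy bitcast that feeds both lifetime markers: two users.
inline ToyValue makeBitcastWithBothMarkers() {
  return ToyValue{"cast", {"lifetime.start", "lifetime.end"}};
}

// A toy bitcast that feeds only the start marker: one user.
inline ToyValue makeBitcastWithStartOnly() {
  return ToyValue{"cast", {"lifetime.start"}};
}
```

In this model a one-use filter accepts only the second value, so a cast shared by both markers would be skipped by such a filter.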


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83379/new/

https://reviews.llvm.org/D83379

From llvm-commits at lists.llvm.org Wed Jul 8 09:19:01 2020
From: llvm-commits at lists.llvm.org (Nicolai Hähnle via llvm-commits)
Date: Wed, 08 Jul 2020 09:19:01 -0700 (PDT)
Subject: [llvm] 3fa989d - DomTree: remove explicit use of DomTreeNodeBase::iterator
Message-ID: <5f05f1f5.1c69fb81.aeec7.0c2c@mx.google.com>

Author: Nicolai Hähnle
Date: 2020-07-08T18:18:49+02:00
New Revision: 3fa989d4fd6b854209ba4e950d96b91d6d5797b4

URL: https://github.com/llvm/llvm-project/commit/3fa989d4fd6b854209ba4e950d96b91d6d5797b4
DIFF: https://github.com/llvm/llvm-project/commit/3fa989d4fd6b854209ba4e950d96b91d6d5797b4.diff

LOG: DomTree: remove explicit use of DomTreeNodeBase::iterator

Summary:
Almost all uses of these iterators, including implicit ones, really
only need the const variant (as it should be). The only exception is
in NewGVN, which changes the order of dominator tree child nodes.
Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4 Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits Tags: #clang, #mlir, #llvm Differential Revision: https://reviews.llvm.org/D83087 Added: Modified: clang/include/clang/Analysis/Analyses/Dominators.h llvm/include/llvm/CodeGen/MachineDominators.h llvm/include/llvm/IR/Dominators.h llvm/lib/Target/X86/X86InstrInfo.cpp llvm/lib/Transforms/Scalar/EarlyCSE.cpp llvm/lib/Transforms/Scalar/Sink.cpp llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h mlir/include/mlir/IR/Dominance.h mlir/lib/Transforms/CSE.cpp Removed: ################################################################################ diff --git a/clang/include/clang/Analysis/Analyses/Dominators.h b/clang/include/clang/Analysis/Analyses/Dominators.h index 51d86f6e4540..95a661138df4 100644 --- a/clang/include/clang/Analysis/Analyses/Dominators.h +++ b/clang/include/clang/Analysis/Analyses/Dominators.h @@ -349,7 +349,7 @@ ClangCFGPostDomReverseChildrenGetter::Get( /// template <> struct GraphTraits { using NodeRef = ::clang::DomTreeNode *; - using ChildIteratorType = ::clang::DomTreeNode::iterator; + using ChildIteratorType = ::clang::DomTreeNode::const_iterator; static NodeRef getEntryNode(NodeRef N) { return N; } static ChildIteratorType child_begin(NodeRef N) { return N->begin(); } diff --git a/llvm/include/llvm/CodeGen/MachineDominators.h b/llvm/include/llvm/CodeGen/MachineDominators.h index 2d26163a76aa..cf3af4d38223 100644 --- a/llvm/include/llvm/CodeGen/MachineDominators.h +++ b/llvm/include/llvm/CodeGen/MachineDominators.h @@ -261,7 +261,8 @@ template struct GraphTraits; template <> struct GraphTraits : public MachineDomTreeGraphTraitsBase {}; + 
MachineDomTreeNode::const_iterator> { +}; template <> struct GraphTraits diff --git a/llvm/include/llvm/IR/Dominators.h b/llvm/include/llvm/IR/Dominators.h index 0084ac0b655a..71595cb15df4 100644 --- a/llvm/include/llvm/IR/Dominators.h +++ b/llvm/include/llvm/IR/Dominators.h @@ -208,7 +208,8 @@ template struct DomTreeGraphTraitsBase { template <> struct GraphTraits - : public DomTreeGraphTraitsBase {}; + : public DomTreeGraphTraitsBase { +}; template <> struct GraphTraits diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp index 46ff62f7a4ed..42c111173570 100644 --- a/llvm/lib/Target/X86/X86InstrInfo.cpp +++ b/llvm/lib/Target/X86/X86InstrInfo.cpp @@ -8660,8 +8660,7 @@ namespace { } // Visit the children of this block in the dominator tree. - for (MachineDomTreeNode::iterator I = Node->begin(), E = Node->end(); - I != E; ++I) { + for (auto I = Node->begin(), E = Node->end(); I != E; ++I) { Changed |= VisitNode(*I, TLSBaseAddrReg); } diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp index 1dac64a12d3e..ddfc8555b0a0 100644 --- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp +++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp @@ -620,8 +620,8 @@ class EarlyCSE { public: StackNode(ScopedHTType &AvailableValues, LoadHTType &AvailableLoads, InvariantHTType &AvailableInvariants, CallHTType &AvailableCalls, - unsigned cg, DomTreeNode *n, DomTreeNode::iterator child, - DomTreeNode::iterator end) + unsigned cg, DomTreeNode *n, DomTreeNode::const_iterator child, + DomTreeNode::const_iterator end) : CurrentGeneration(cg), ChildGeneration(cg), Node(n), ChildIter(child), EndIter(end), Scopes(AvailableValues, AvailableLoads, AvailableInvariants, @@ -635,7 +635,7 @@ class EarlyCSE { unsigned childGeneration() { return ChildGeneration; } void childGeneration(unsigned generation) { ChildGeneration = generation; } DomTreeNode *node() { return Node; } - DomTreeNode::iterator childIter() { return ChildIter; } + 
DomTreeNode::const_iterator childIter() { return ChildIter; } DomTreeNode *nextChild() { DomTreeNode *child = *ChildIter; @@ -643,7 +643,7 @@ class EarlyCSE { return child; } - DomTreeNode::iterator end() { return EndIter; } + DomTreeNode::const_iterator end() { return EndIter; } bool isProcessed() { return Processed; } void process() { Processed = true; } @@ -651,8 +651,8 @@ class EarlyCSE { unsigned CurrentGeneration; unsigned ChildGeneration; DomTreeNode *Node; - DomTreeNode::iterator ChildIter; - DomTreeNode::iterator EndIter; + DomTreeNode::const_iterator ChildIter; + DomTreeNode::const_iterator EndIter; NodeScope Scopes; bool Processed = false; }; diff --git a/llvm/lib/Transforms/Scalar/Sink.cpp b/llvm/lib/Transforms/Scalar/Sink.cpp index 677d86f8c7b4..48f289c8f17d 100644 --- a/llvm/lib/Transforms/Scalar/Sink.cpp +++ b/llvm/lib/Transforms/Scalar/Sink.cpp @@ -166,8 +166,8 @@ static bool SinkInstruction(Instruction *Inst, // dominated by one of the successors. // Look at all the dominated blocks and see if we can sink it in one. DomTreeNode *DTN = DT.getNode(Inst->getParent()); - for (DomTreeNode::iterator I = DTN->begin(), E = DTN->end(); - I != E && SuccToSinkTo == nullptr; ++I) { + for (auto I = DTN->begin(), E = DTN->end(); I != E && SuccToSinkTo == nullptr; + ++I) { BasicBlock *Candidate = (*I)->getBlock(); // A node always immediate-dominates its children on the dominator // tree. diff --git a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h index 19f5d2c00c60..a42ebc9ee955 100644 --- a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h +++ b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h @@ -30,7 +30,8 @@ using VPDomTreeNode = DomTreeNodeBase; /// Template specializations of GraphTraits for VPDomTreeNode. 
template <> struct GraphTraits - : public DomTreeGraphTraitsBase {}; + : public DomTreeGraphTraitsBase {}; template <> struct GraphTraits diff --git a/mlir/include/mlir/IR/Dominance.h b/mlir/include/mlir/IR/Dominance.h index 97ec99007708..9d9a19996765 100644 --- a/mlir/include/mlir/IR/Dominance.h +++ b/mlir/include/mlir/IR/Dominance.h @@ -141,7 +141,7 @@ namespace llvm { /// DominatorTree GraphTraits specialization so the DominatorTree can be /// iterated by generic graph iterators. template <> struct GraphTraits { - using ChildIteratorType = mlir::DominanceInfoNode::iterator; + using ChildIteratorType = mlir::DominanceInfoNode::const_iterator; using NodeRef = mlir::DominanceInfoNode *; static NodeRef getEntryNode(NodeRef N) { return N; } diff --git a/mlir/lib/Transforms/CSE.cpp b/mlir/lib/Transforms/CSE.cpp index eb31fd207d79..5f4791450c7b 100644 --- a/mlir/lib/Transforms/CSE.cpp +++ b/mlir/lib/Transforms/CSE.cpp @@ -64,7 +64,7 @@ struct CSE : public CSEBase { ScopedMapTy::ScopeTy scope; DominanceInfoNode *node; - DominanceInfoNode::iterator childIterator; + DominanceInfoNode::const_iterator childIterator; /// If this node has been fully processed yet or not. bool processed; From llvm-commits at lists.llvm.org Wed Jul 8 09:19:05 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 16:19:05 +0000 (UTC) Subject: [PATCH] D83087: DomTree: remove explicit use of DomTreeNodeBase::iterator In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG3fa989d4fd6b: DomTree: remove explicit use of DomTreeNodeBase::iterator (authored by nhaehnle). 
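
A minimal sketch of why read-only traversal only needs the const iterator, as the commit summary for this change states. `ToyNode` and `countNodes` are hypothetical names used for illustration; this is not `DomTreeNodeBase` and it uses no LLVM header.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tree node, only loosely modelled on a dominator tree node.
class ToyNode {
  std::vector<ToyNode *> Children;

public:
  using iterator = std::vector<ToyNode *>::iterator;
  using const_iterator = std::vector<ToyNode *>::const_iterator;

  void addChild(ToyNode *C) { Children.push_back(C); }

  // Mutable access: only needed by code that reorders the child list.
  iterator begin() { return Children.begin(); }
  iterator end() { return Children.end(); }

  // Read-only access: enough for plain traversal.
  const_iterator begin() const { return Children.begin(); }
  const_iterator end() const { return Children.end(); }
};

// Counts the nodes in the subtree rooted at N using only const iterators.
inline std::size_t countNodes(const ToyNode &N) {
  std::size_t Count = 1; // count N itself
  for (ToyNode::const_iterator I = N.begin(), E = N.end(); I != E; ++I)
    Count += countNodes(**I);
  return Count;
}
```

Because `countNodes` takes its argument by const reference, it can only reach the const overloads of `begin()` and `end()`, which is the situation the summary describes for most callers.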

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83087/new/

https://reviews.llvm.org/D83087

Files:
  clang/include/clang/Analysis/Analyses/Dominators.h
  llvm/include/llvm/CodeGen/MachineDominators.h
  llvm/include/llvm/IR/Dominators.h
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Transforms/Scalar/EarlyCSE.cpp
  llvm/lib/Transforms/Scalar/Sink.cpp
  llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
  mlir/include/mlir/IR/Dominance.h
  mlir/lib/Transforms/CSE.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83087.276460.patch
Type: text/x-patch
Size: 6404 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 09:20:52 2020
From: llvm-commits at lists.llvm.org (Ulrich Weigand via llvm-commits)
Date: Wed, 08 Jul 2020 09:20:52 -0700 (PDT)
Subject: [llvm] cca8578 - [SystemZ] Allow specifying integer registers as part of the address calculation
Message-ID: <5f05f264.1c69fb81.a768b.0b6c@mx.google.com>

Author: Ulrich Weigand
Date: 2020-07-08T18:20:24+02:00
New Revision: cca8578efab096fddcb0134b28b17f4758e9afa0

URL: https://github.com/llvm/llvm-project/commit/cca8578efab096fddcb0134b28b17f4758e9afa0
DIFF: https://github.com/llvm/llvm-project/commit/cca8578efab096fddcb0134b28b17f4758e9afa0.diff

LOG: [SystemZ] Allow specifying integer registers as part of the address calculation

Revision e1de2773a534957305d7a559c6d88c4b5ac354e2 provided support for accepting integer registers in inline asm i.e.

__asm("lhi %r0, 5") -> lhi %r0, 5
__asm("lhi 0, 5") -> lhi 0,5

This patch aims to extend this support to instructions which compute addresses as well.
(i.e. instructions of type BDMem and BD[X|R|V|L]Mem)

Author: anirudhp

Differential Revision: https://reviews.llvm.org/D83251

Added:

Modified:
    llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
    llvm/test/MC/SystemZ/insn-bad.s
    llvm/test/MC/SystemZ/insn-good-z13.s
    llvm/test/MC/SystemZ/insn-good-z14.s
    llvm/test/MC/SystemZ/insn-good-z15.s
    llvm/test/MC/SystemZ/insn-good.s
    llvm/test/MC/SystemZ/regs-good.s
    llvm/test/MC/SystemZ/tokens.s

Removed:


################################################################################
diff --git a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
index a3110248e8e0..d5a3a19446c7 100644
--- a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
+++ b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
@@ -405,14 +405,16 @@ class SystemZAsmParser : public MCTargetAsmParser {
   bool parseRegister(Register &Reg, bool RestoreOnFailure = false);
+  bool parseIntegerRegister(Register &Reg, RegisterGroup Group);
+
   OperandMatchResultTy parseRegister(OperandVector &Operands,
                                      RegisterKind Kind);
   OperandMatchResultTy parseAnyRegister(OperandVector &Operands);
-  bool parseAddress(bool &HaveReg1, Register &Reg1,
-                    bool &HaveReg2, Register &Reg2,
-                    const MCExpr *&Disp, const MCExpr *&Length);
+  bool parseAddress(bool &HaveReg1, Register &Reg1, bool &HaveReg2,
+                    Register &Reg2, const MCExpr *&Disp, const MCExpr *&Length,
+                    bool HasLength = false, bool HasVectorIndex = false);
   bool parseAddressRegister(Register &Reg);
   bool ParseDirectiveInsn(SMLoc L);
@@ -748,82 +750,60 @@ bool SystemZAsmParser::parseRegister(Register &Reg, bool RestoreOnFailure) {
 // Parse a register of kind Kind and add it to Operands.
 OperandMatchResultTy
 SystemZAsmParser::parseRegister(OperandVector &Operands, RegisterKind Kind) {
-  SMLoc StartLoc, EndLoc;
-  unsigned RegNum;
+  Register Reg;
+  RegisterGroup Group;
+  switch (Kind) {
+  case GR32Reg:
+  case GRH32Reg:
+  case GR64Reg:
+  case GR128Reg:
+    Group = RegGR;
+    break;
+  case FP32Reg:
+  case FP64Reg:
+  case FP128Reg:
+    Group = RegFP;
+    break;
+  case VR32Reg:
+  case VR64Reg:
+  case VR128Reg:
+    Group = RegV;
+    break;
+  case AR32Reg:
+    Group = RegAR;
+    break;
+  case CR64Reg:
+    Group = RegCR;
+    break;
+  }
-  // Handle register names of the form %.
+  // Handle register names of the form %
   if (Parser.getTok().is(AsmToken::Percent)) {
-    Register Reg;
     if (parseRegister(Reg))
       return MatchOperand_ParseFail;
-    // Verify that a register prefix appropriate for Kind was used.
-    bool PrefixMatch;
-    switch (Kind) {
-    case GR32Reg:
-    case GRH32Reg:
-    case GR64Reg:
-    case GR128Reg:
-      PrefixMatch = Reg.Group == RegGR;
-      break;
-    case FP32Reg:
-    case FP64Reg:
-    case FP128Reg:
-      PrefixMatch = Reg.Group == RegFP;
-      break;
-    case VR32Reg:
-    case VR64Reg:
-    case VR128Reg:
-      // It is OK to use the %f prefix with vector instructions that
-      // expect some VR..Reg kind, so accept the RegFP group as well.
-      PrefixMatch = Reg.Group == RegV || Reg.Group == RegFP;
-      break;
-    case AR32Reg:
-      PrefixMatch = Reg.Group == RegAR;
-      break;
-    case CR64Reg:
-      PrefixMatch = Reg.Group == RegCR;
-      break;
-    }
-    if (!PrefixMatch) {
-      Error(Reg.StartLoc, "invalid operand for instruction");
-      return MatchOperand_ParseFail;
-    }
-
-    RegNum = Reg.Num;
-    StartLoc = Reg.StartLoc;
-    EndLoc = Reg.EndLoc;
-  }
-  // Also allow specifying just a plain register number as integer.
-  else if (Parser.getTok().is(AsmToken::Integer)) {
-    const MCExpr *Register;
-    StartLoc = Parser.getTok().getLoc();
-    if (Parser.parseExpression(Register))
-      return MatchOperand_ParseFail;
-
-    auto *CE = dyn_cast(Register);
-    if (!CE)
-      return MatchOperand_ParseFail;
-
-    int64_t MaxRegNum;
-    switch (Kind) {
-    case VR32Reg:
-    case VR64Reg:
-    case VR128Reg:
-      MaxRegNum = 31;
+    // Check the parsed register group "Reg.Group" with the expected "Group"
+    // Have to error out if user specified wrong prefix.
+    switch (Group) {
+    case RegGR:
+    case RegFP:
+    case RegAR:
+    case RegCR:
+      if (Group != Reg.Group) {
+        Error(Reg.StartLoc, "invalid operand for instruction");
+        return MatchOperand_ParseFail;
+      }
       break;
-    default:
-      MaxRegNum = 15;
+    case RegV:
+      if (Reg.Group != RegV && Reg.Group != RegFP) {
+        Error(Reg.StartLoc, "invalid operand for instruction");
+        return MatchOperand_ParseFail;
+      }
       break;
     }
-    int64_t Value = CE->getValue();
-    if (Value < 0 || Value > MaxRegNum) {
-      Error(StartLoc, "invalid register");
+  } else if (Parser.getTok().is(AsmToken::Integer)) {
+    if (parseIntegerRegister(Reg, Group))
       return MatchOperand_ParseFail;
-    }
-    RegNum = (unsigned) Value;
-
-    EndLoc = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
   }
   // Otherwise we didn't match a register operand.
   else
@@ -845,13 +825,13 @@ SystemZAsmParser::parseRegister(OperandVector &Operands, RegisterKind Kind) {
   case AR32Reg: Regs = SystemZMC::AR32Regs; break;
   case CR64Reg: Regs = SystemZMC::CR64Regs; break;
   }
-  if (Regs[RegNum] == 0) {
-    Error(StartLoc, "invalid register pair");
+  if (Regs[Reg.Num] == 0) {
+    Error(Reg.StartLoc, "invalid register pair");
     return MatchOperand_ParseFail;
   }
-  Operands.push_back(SystemZOperand::createReg(Kind, Regs[RegNum],
-                                               StartLoc, EndLoc));
+  Operands.push_back(
+      SystemZOperand::createReg(Kind, Regs[Reg.Num], Reg.StartLoc, Reg.EndLoc));
   return MatchOperand_Success;
 }
@@ -916,11 +896,39 @@ SystemZAsmParser::parseAnyRegister(OperandVector &Operands) {
   return MatchOperand_Success;
 }
+bool SystemZAsmParser::parseIntegerRegister(Register &Reg,
+                                            RegisterGroup Group) {
+  Reg.StartLoc = Parser.getTok().getLoc();
+  // We have an integer token
+  const MCExpr *Register;
+  if (Parser.parseExpression(Register))
+    return true;
+
+  const auto *CE = dyn_cast(Register);
+  if (!CE)
+    return true;
+
+  int64_t MaxRegNum = (Group == RegV) ? 31 : 15;
+  int64_t Value = CE->getValue();
+  if (Value < 0 || Value > MaxRegNum) {
+    Error(Parser.getTok().getLoc(), "invalid register");
+    return true;
+  }
+
+  // Assign the Register Number
+  Reg.Num = (unsigned)Value;
+  Reg.Group = Group;
+  Reg.EndLoc = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
+
+  // At this point, successfully parsed an integer register.
+  return false;
+}
+
 // Parse a memory operand into Reg1, Reg2, Disp, and Length.
 bool SystemZAsmParser::parseAddress(bool &HaveReg1, Register &Reg1,
                                     bool &HaveReg2, Register &Reg2,
-                                    const MCExpr *&Disp,
-                                    const MCExpr *&Length) {
+                                    const MCExpr *&Disp, const MCExpr *&Length,
+                                    bool HasLength, bool HasVectorIndex) {
   // Parse the displacement, which must always be present.
   if (getParser().parseExpression(Disp))
     return true;
@@ -929,6 +937,27 @@ bool SystemZAsmParser::parseAddress(bool &HaveReg1, Register &Reg1,
   HaveReg1 = false;
   HaveReg2 = false;
   Length = nullptr;
+
+  // If we have a scenario as below:
+  //   vgef %v0, 0(0), 0
+  // This is an example of a "BDVMem" instruction type.
+  //
+  // So when we parse this as an integer register, the register group
+  // needs to be tied to "RegV". Usually when the prefix is passed in
+  // as % its easy to check which group it should belong to
+  // However, if we're passing in just the integer there's no real way to
+  // "check" what register group it should belong to.
+  //
+  // When the user passes in the register as an integer, the user assumes that
+  // the compiler is responsible for substituting it as the right kind of
+  // register. Whereas, when the user specifies a "prefix", the onus is on
+  // the user to make sure they pass in the right kind of register.
+  //
+  // The restriction only applies to the first Register (i.e. Reg1). Reg2 is
+  // always a general register. Reg1 should be of group RegV if "HasVectorIndex"
+  // (i.e. insn is of type BDVMem) is true.
+  RegisterGroup RegGroup = HasVectorIndex ? RegV : RegGR;
+
   if (getLexer().is(AsmToken::LParen)) {
     Parser.Lex();
@@ -937,18 +966,47 @@ bool SystemZAsmParser::parseAddress(bool &HaveReg1, Register &Reg1,
       HaveReg1 = true;
       if (parseRegister(Reg1))
         return true;
+    }
+    // So if we have an integer as the first token in ([tok1], ..), it could:
+    // 1. Refer to a "Register" (i.e X,R,V fields in BD[X|R|V]Mem type of
+    // instructions)
+    // 2. Refer to a "Length" field (i.e L field in BDLMem type of instructions)
+    else if (getLexer().is(AsmToken::Integer)) {
+      if (HasLength) {
+        // Instruction has a "Length" field, safe to parse the first token as
+        // the "Length" field
+        if (getParser().parseExpression(Length))
+          return true;
+      } else {
+        // Otherwise, if the instruction has no "Length" field, parse the
+        // token as a "Register". We don't have to worry about whether the
+        // instruction is invalid here, because the caller will take care of
+        // error reporting.
+        HaveReg1 = true;
+        if (parseIntegerRegister(Reg1, RegGroup))
+          return true;
+      }
     } else {
-      // Parse the length.
-      if (getParser().parseExpression(Length))
-        return true;
+      // If its not an integer or a percent token, then if the instruction
+      // is reported to have a "Length" then, parse it as "Length".
+      if (HasLength) {
+        if (getParser().parseExpression(Length))
+          return true;
+      }
     }
     // Check whether there's a second register.
     if (getLexer().is(AsmToken::Comma)) {
       Parser.Lex();
       HaveReg2 = true;
-      if (parseRegister(Reg2))
-        return true;
+
+      if (getLexer().is(AsmToken::Integer)) {
+        if (parseIntegerRegister(Reg2, RegGR))
+          return true;
+      } else {
+        if (parseRegister(Reg2))
+          return true;
+      }
     }
     // Consume the closing bracket.
@@ -983,7 +1041,11 @@ SystemZAsmParser::parseAddress(OperandVector &Operands, MemoryKind MemKind,
   bool HaveReg1, HaveReg2;
   const MCExpr *Disp;
   const MCExpr *Length;
-  if (parseAddress(HaveReg1, Reg1, HaveReg2, Reg2, Disp, Length))
+
+  bool HasLength = (MemKind == BDLMem) ? true : false;
+  bool HasVectorIndex = (MemKind == BDVMem) ? true : false;
+  if (parseAddress(HaveReg1, Reg1, HaveReg2, Reg2, Disp, Length, HasLength,
+                   HasVectorIndex))
     return MatchOperand_ParseFail;
   const unsigned *Regs;
@@ -1001,11 +1063,7 @@ SystemZAsmParser::parseAddress(OperandVector &Operands, MemoryKind MemKind,
         return MatchOperand_ParseFail;
       Base = Regs[Reg1.Num];
     }
-    // There must be no Reg2 or length.
-    if (Length) {
-      Error(StartLoc, "invalid use of length addressing");
-      return MatchOperand_ParseFail;
-    }
+    // There must be no Reg2.
     if (HaveReg2) {
       Error(StartLoc, "invalid use of indexed addressing");
       return MatchOperand_ParseFail;
@@ -1029,11 +1087,6 @@ SystemZAsmParser::parseAddress(OperandVector &Operands, MemoryKind MemKind,
         return MatchOperand_ParseFail;
       Base = Regs[Reg2.Num];
     }
-    // There must be no length.
-    if (Length) {
-      Error(StartLoc, "invalid use of length addressing");
-      return MatchOperand_ParseFail;
-    }
     break;
   case BDLMem:
     // If we have Reg2, it must be an address register.
@@ -1066,11 +1119,6 @@ SystemZAsmParser::parseAddress(OperandVector &Operands, MemoryKind MemKind,
         return MatchOperand_ParseFail;
       Base = Regs[Reg2.Num];
     }
-    // There must be no length.
-    if (Length) {
-      Error(StartLoc, "invalid use of length addressing");
-      return MatchOperand_ParseFail;
-    }
     break;
   case BDVMem:
     // We must have Reg1, and it must be a vector register.
@@ -1085,16 +1133,11 @@ SystemZAsmParser::parseAddress(OperandVector &Operands, MemoryKind MemKind,
         return MatchOperand_ParseFail;
       Base = Regs[Reg2.Num];
     }
-    // There must be no length.
-    if (Length) {
-      Error(StartLoc, "invalid use of length addressing");
-      return MatchOperand_ParseFail;
-    }
     break;
   }
   SMLoc EndLoc =
-    SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
+      SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
   Operands.push_back(SystemZOperand::createMem(MemKind, RegKind, Base, Disp,
                                                Index, Length, LengthReg,
                                                StartLoc, EndLoc));
@@ -1323,7 +1366,8 @@ bool SystemZAsmParser::parseOperand(OperandVector &Operands,
     bool HaveReg1, HaveReg2;
     const MCExpr *Expr;
     const MCExpr *Length;
-    if (parseAddress(HaveReg1, Reg1, HaveReg2, Reg2, Expr, Length))
+    if (parseAddress(HaveReg1, Reg1, HaveReg2, Reg2, Expr, Length,
+                     /*HasLength*/ true, /*HasVectorIndex*/ true))
       return true;
     // If the register combination is not valid for any instruction, reject it.
     // Otherwise, fall back to reporting an unrecognized instruction.
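
The range check that `parseIntegerRegister` applies to a plain integer operand can be sketched in isolation. This is a standalone illustration, not the code from the patch: `RegGroup` and `checkIntegerRegister` are hypothetical names, no LLVM types are used, and the function returns true on success, whereas the parser routines in the patch return true on error.

```cpp
#include <cstdint>

// Hypothetical register groups, named after the ones used in the patch.
enum class RegGroup { GR, FP, V, AR, CR };

// Returns true and stores the register number in Num when Value is a valid
// register index for Group. Vector registers run from 0 to 31; every other
// group runs from 0 to 15, matching the MaxRegNum choice in the patch.
inline bool checkIntegerRegister(std::int64_t Value, RegGroup Group,
                                 unsigned &Num) {
  const std::int64_t MaxRegNum = (Group == RegGroup::V) ? 31 : 15;
  if (Value < 0 || Value > MaxRegNum)
    return false; // out of range for this group; Num is left untouched
  Num = static_cast<unsigned>(Value);
  return true;
}
```

Under this sketch the operand `31` is accepted where a vector index is expected (as in `vgef 0, 0(31,1), 0`) and rejected where a general register is expected.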
diff --git a/llvm/test/MC/SystemZ/insn-bad.s b/llvm/test/MC/SystemZ/insn-bad.s index c330fa496a25..59174067c4b2 100644 --- a/llvm/test/MC/SystemZ/insn-bad.s +++ b/llvm/test/MC/SystemZ/insn-bad.s @@ -1343,7 +1343,7 @@ #CHECK: clc 0, 0 #CHECK: error: missing length in address #CHECK: clc 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: clc 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: clc 0(0,%r1), 0(%r1) @@ -2573,7 +2573,7 @@ #CHECK: ed 0, 0 #CHECK: error: missing length in address #CHECK: ed 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: ed 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: ed 0(0,%r1), 0(%r1) @@ -2611,7 +2611,7 @@ #CHECK: edmk 0, 0 #CHECK: error: missing length in address #CHECK: edmk 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: edmk 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: edmk 0(0,%r1), 0(%r1) @@ -4373,7 +4373,7 @@ #CHECK: mvc 0, 0 #CHECK: error: missing length in address #CHECK: mvc 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: mvc 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: mvc 0(0,%r1), 0(%r1) @@ -4428,7 +4428,7 @@ #CHECK: mvcin 0, 0 #CHECK: error: missing length in address #CHECK: mvcin 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: mvcin 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: mvcin 0(0,%r1), 0(%r1) @@ -4462,7 +4462,7 @@ mvcin 0(1,%r2), 0(%r1,%r2) mvcin 0(-), 0 -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: mvck 0(%r1,%r1), 0(2,%r1), %r3 #CHECK: error: invalid operand #CHECK: mvck -1(%r1,%r1), 0(%r1), %r3 @@ -4474,7 +4474,7 @@ #CHECK: mvck 
0(%r1,%r1), 4096(%r1), %r3 #CHECK: error: invalid use of indexed addressing #CHECK: mvck 0(%r1,%r2), 0(%r1,%r2), %r3 -#CHECK: error: unknown token in expression +#CHECK: error: unexpected token in address #CHECK: mvck 0(-), 0, %r3 mvck 0(%r1,%r1), 0(2,%r1), %r3 @@ -4538,7 +4538,7 @@ mvcos 0(%r1), -1(%r15), %r2 mvcos 0(%r1), 4096(%r15), %r2 -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: mvcp 0(%r1,%r1), 0(2,%r1), %r3 #CHECK: error: invalid operand #CHECK: mvcp -1(%r1,%r1), 0(%r1), %r3 @@ -4550,7 +4550,7 @@ #CHECK: mvcp 0(%r1,%r1), 4096(%r1), %r3 #CHECK: error: invalid use of indexed addressing #CHECK: mvcp 0(%r1,%r2), 0(%r1,%r2), %r3 -#CHECK: error: unknown token in expression +#CHECK: error: unexpected token in address #CHECK: mvcp 0(-), 0, %r3 mvcp 0(%r1,%r1), 0(2,%r1), %r3 @@ -4561,7 +4561,7 @@ mvcp 0(%r1,%r2), 0(%r1,%r2), %r3 mvcp 0(-), 0, %r3 -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: mvcs 0(%r1,%r1), 0(2,%r1), %r3 #CHECK: error: invalid operand #CHECK: mvcs -1(%r1,%r1), 0(%r1), %r3 @@ -4573,7 +4573,7 @@ #CHECK: mvcs 0(%r1,%r1), 4096(%r1), %r3 #CHECK: error: invalid use of indexed addressing #CHECK: mvcs 0(%r1,%r2), 0(%r1,%r2), %r3 -#CHECK: error: unknown token in expression +#CHECK: error: unexpected token in address #CHECK: mvcs 0(-), 0, %r3 mvcs 0(%r1,%r1), 0(2,%r1), %r3 @@ -4690,7 +4690,7 @@ #CHECK: mvn 0, 0 #CHECK: error: missing length in address #CHECK: mvn 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: mvn 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: mvn 0(0,%r1), 0(%r1) @@ -4775,7 +4775,7 @@ #CHECK: mvz 0, 0 #CHECK: error: missing length in address #CHECK: mvz 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: mvz 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: mvz 
0(0,%r1), 0(%r1) @@ -4917,7 +4917,7 @@ #CHECK: nc 0, 0 #CHECK: error: missing length in address #CHECK: nc 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: nc 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: nc 0(0,%r1), 0(%r1) @@ -5071,7 +5071,7 @@ #CHECK: oc 0, 0 #CHECK: error: missing length in address #CHECK: oc 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: oc 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: oc 0(0,%r1), 0(%r1) @@ -5319,7 +5319,7 @@ #CHECK: pka 0, 0 #CHECK: error: missing length in address #CHECK: pka 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: pka 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: pka 0(%r1), 0(0,%r1) @@ -5357,7 +5357,7 @@ #CHECK: pku 0, 0 #CHECK: error: missing length in address #CHECK: pku 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: pku 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: pku 0(%r1), 0(0,%r1) @@ -6232,7 +6232,7 @@ #CHECK: srp 0, 0, 0 #CHECK: error: missing length in address #CHECK: srp 0(%r1), 0(%r1), 0 -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: srp 0(1,%r1), 0(2,%r1), 0 #CHECK: error: invalid operand #CHECK: srp 0(0,%r1), 0(%r1), 0 @@ -7118,7 +7118,7 @@ #CHECK: tr 0, 0 #CHECK: error: missing length in address #CHECK: tr 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: tr 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: tr 0(0,%r1), 0(%r1) @@ -7216,7 +7216,7 @@ #CHECK: trt 0, 0 #CHECK: error: missing length in address #CHECK: trt 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed 
addressing #CHECK: trt 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: trt 0(0,%r1), 0(%r1) @@ -7276,7 +7276,7 @@ #CHECK: trtr 0, 0 #CHECK: error: missing length in address #CHECK: trtr 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: trtr 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: trtr 0(0,%r1), 0(%r1) @@ -7405,7 +7405,7 @@ #CHECK: unpka 0, 0 #CHECK: error: missing length in address #CHECK: unpka 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: unpka 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: unpka 0(0,%r1), 0(%r1) @@ -7443,7 +7443,7 @@ #CHECK: unpku 0, 0 #CHECK: error: missing length in address #CHECK: unpku 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: unpku 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: unpku 0(0,%r1), 0(%r1) @@ -7489,7 +7489,7 @@ #CHECK: xc 0, 0 #CHECK: error: missing length in address #CHECK: xc 0(%r1), 0(%r1) -#CHECK: error: invalid use of length addressing +#CHECK: error: invalid use of indexed addressing #CHECK: xc 0(1,%r1), 0(2,%r1) #CHECK: error: invalid operand #CHECK: xc 0(0,%r1), 0(%r1) diff --git a/llvm/test/MC/SystemZ/insn-good-z13.s b/llvm/test/MC/SystemZ/insn-good-z13.s index df99a1f48951..4e3ba50ad7f9 100644 --- a/llvm/test/MC/SystemZ/insn-good-z13.s +++ b/llvm/test/MC/SystemZ/insn-good-z13.s @@ -2879,6 +2879,20 @@ vgbm %v31, 0 vgbm %v17, 0x1234 +#CHECK: vgef %v0, 0(%v0), 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0x13] +#CHECK: vgef %v0, 0(%v0,%r1), 0 # encoding: [0xe7,0x00,0x10,0x00,0x00,0x13] +#CHECK: vgef %v0, 0(%v0,%r1), 3 # encoding: [0xe7,0x00,0x10,0x00,0x30,0x13] +#CHECK: vgef %v0, 0(%v0,%r15), 0 # encoding: [0xe7,0x00,0xf0,0x00,0x00,0x13] +#CHECK: vgef %v0, 0(%v15,%r1), 0 # encoding: [0xe7,0x0f,0x10,0x00,0x00,0x13] +#CHECK: vgef %v0, 0(%v31,%r1), 0 # 
encoding: [0xe7,0x0f,0x10,0x00,0x04,0x13] +#CHECK: vgef %v0, 4095(%v0,%r1), 0 # encoding: [0xe7,0x00,0x1f,0xff,0x00,0x13] +#CHECK: vgef %v15, 0(%v0,%r1), 0 # encoding: [0xe7,0xf0,0x10,0x00,0x00,0x13] +#CHECK: vgef %v31, 0(%v0,%r1), 0 # encoding: [0xe7,0xf0,0x10,0x00,0x08,0x13] +#CHECK: vgef %v10, 1000(%v19,%r7), 1 # encoding: [0xe7,0xa3,0x73,0xe8,0x14,0x13] +#CHECK: vgef %v0, 0(%v0,%r1), 0 # encoding: [0xe7,0x00,0x10,0x00,0x00,0x13] +#CHECK: vgef %v0, 0(%v0,%r1), 3 # encoding: [0xe7,0x00,0x10,0x00,0x30,0x13] +#CHECK: vgef %v0, 0(%v0,%r15), 0 # encoding: [0xe7,0x00,0xf0,0x00,0x00,0x13] +#CHECK: vgef %v0, 0(%v15,%r1), 0 # encoding: [0xe7,0x0f,0x10,0x00,0x00,0x13] #CHECK: vgef %v0, 0(%v0), 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0x13] #CHECK: vgef %v0, 0(%v0,%r1), 0 # encoding: [0xe7,0x00,0x10,0x00,0x00,0x13] #CHECK: vgef %v0, 0(%v0,%r1), 3 # encoding: [0xe7,0x00,0x10,0x00,0x30,0x13] @@ -2900,7 +2914,35 @@ vgef %v15, 0(%v0,%r1), 0 vgef %v31, 0(%v0,%r1), 0 vgef %v10, 1000(%v19,%r7), 1 + vgef %v0, 0(0,%r1), 0 + vgef %v0, 0(%v0,1), 3 + vgef %v0, 0(0,%r15), 0 + vgef %v0, 0(%v15,1), 0 + vgef 0, 0(0), 0 + vgef 0, 0(0,1), 0 + vgef 0, 0(0,1), 3 + vgef 0, 0(0,15), 0 + vgef 0, 0(15,1), 0 + vgef 0, 0(31,1), 0 + vgef 0, 4095(0, 1), 0 + vgef 15, 0(0,1), 0 + vgef 31, 0(0,1), 0 + vgef 10, 1000(19,7), 1 +#CHECK: vgeg %v0, 0(%v0), 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0x12] +#CHECK: vgeg %v0, 0(%v0,%r1), 0 # encoding: [0xe7,0x00,0x10,0x00,0x00,0x12] +#CHECK: vgeg %v0, 0(%v0,%r1), 1 # encoding: [0xe7,0x00,0x10,0x00,0x10,0x12] +#CHECK: vgeg %v0, 0(%v0,%r15), 0 # encoding: [0xe7,0x00,0xf0,0x00,0x00,0x12] +#CHECK: vgeg %v0, 0(%v15,%r1), 0 # encoding: [0xe7,0x0f,0x10,0x00,0x00,0x12] +#CHECK: vgeg %v0, 0(%v31,%r1), 0 # encoding: [0xe7,0x0f,0x10,0x00,0x04,0x12] +#CHECK: vgeg %v0, 4095(%v0,%r1), 0 # encoding: [0xe7,0x00,0x1f,0xff,0x00,0x12] +#CHECK: vgeg %v15, 0(%v0,%r1), 0 # encoding: [0xe7,0xf0,0x10,0x00,0x00,0x12] +#CHECK: vgeg %v31, 0(%v0,%r1), 0 # encoding: 
[0xe7,0xf0,0x10,0x00,0x08,0x12] +#CHECK: vgeg %v10, 1000(%v19,%r7), 1 # encoding: [0xe7,0xa3,0x73,0xe8,0x14,0x12] +#CHECK: vgeg %v0, 0(%v0,%r1), 0 # encoding: [0xe7,0x00,0x10,0x00,0x00,0x12] +#CHECK: vgeg %v0, 0(%v0,%r1), 1 # encoding: [0xe7,0x00,0x10,0x00,0x10,0x12] +#CHECK: vgeg %v0, 0(%v0,%r15), 0 # encoding: [0xe7,0x00,0xf0,0x00,0x00,0x12] +#CHECK: vgeg %v0, 0(%v15,%r1), 0 # encoding: [0xe7,0x0f,0x10,0x00,0x00,0x12] #CHECK: vgeg %v0, 0(%v0), 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0x12] #CHECK: vgeg %v0, 0(%v0,%r1), 0 # encoding: [0xe7,0x00,0x10,0x00,0x00,0x12] #CHECK: vgeg %v0, 0(%v0,%r1), 1 # encoding: [0xe7,0x00,0x10,0x00,0x10,0x12] @@ -2922,6 +2964,21 @@ vgeg %v15, 0(%v0,%r1), 0 vgeg %v31, 0(%v0,%r1), 0 vgeg %v10, 1000(%v19,%r7), 1 + vgeg %v0, 0(0,%r1), 0 + vgeg %v0, 0(%v0,1), 1 + vgeg %v0, 0(0,%r15), 0 + vgeg %v0, 0(%v15,1), 0 + vgeg 0, 0(0), 0 + vgeg 0, 0(0,1), 0 + vgeg 0, 0(0,1), 1 + vgeg 0, 0(0,15), 0 + vgeg 0, 0(15,1), 0 + vgeg 0, 0(31,1), 0 + vgeg 0, 4095(0,1), 0 + vgeg 15, 0(0,1), 0 + vgeg 31, 0(0,1), 0 + vgeg 10, 1000(19,7), 1 + #CHECK: vgfm %v0, %v0, %v0, 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0xb4] #CHECK: vgfm %v0, %v0, %v0, 15 # encoding: [0xe7,0x00,0x00,0x00,0xf0,0xb4] @@ -3229,6 +3286,16 @@ vl %v31, 0 vl %v18, 0x567(%r3,%r4), 3 +#CHECK: vlbb %v0, 0, 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0x07] +#CHECK: vlbb %v0, 0, 15 # encoding: [0xe7,0x00,0x00,0x00,0xf0,0x07] +#CHECK: vlbb %v0, 4095, 0 # encoding: [0xe7,0x00,0x0f,0xff,0x00,0x07] +#CHECK: vlbb %v0, 0(%r15), 0 # encoding: [0xe7,0x00,0xf0,0x00,0x00,0x07] +#CHECK: vlbb %v0, 0(%r15,%r1), 0 # encoding: [0xe7,0x0f,0x10,0x00,0x00,0x07] +#CHECK: vlbb %v15, 0, 0 # encoding: [0xe7,0xf0,0x00,0x00,0x00,0x07] +#CHECK: vlbb %v31, 0, 0 # encoding: [0xe7,0xf0,0x00,0x00,0x08,0x07] +#CHECK: vlbb %v18, 1383(%r3,%r4), 8 # encoding: [0xe7,0x23,0x45,0x67,0x88,0x07] +#CHECK: vlbb %v0, 0(%r15,%r1), 0 # encoding: [0xe7,0x0f,0x10,0x00,0x00,0x07] +#CHECK: vlbb %v0, 0(%r15,%r1), 0 # encoding: 
[0xe7,0x0f,0x10,0x00,0x00,0x07] #CHECK: vlbb %v0, 0, 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0x07] #CHECK: vlbb %v0, 0, 15 # encoding: [0xe7,0x00,0x00,0x00,0xf0,0x07] #CHECK: vlbb %v0, 4095, 0 # encoding: [0xe7,0x00,0x0f,0xff,0x00,0x07] @@ -3246,6 +3313,17 @@ vlbb %v15, 0, 0 vlbb %v31, 0, 0 vlbb %v18, 1383(%r3,%r4), 8 + vlbb %v18, 1383(%r3, 4), 8 + vlbb %v0, 0(15,%r1), 0 + vlbb %v0, 0(%r15,1), 0 + vlbb 0, 0, 0 + vlbb 0, 0, 15 + vlbb 0, 4095, 0 + vlbb 0, 0(15), 0 + vlbb 0, 0(15,1), 0 + vlbb 15, 0, 0 + vlbb 31, 0, 0 + vlbb 18, 1383(3,4), 8 #CHECK: vlc %v0, %v0, 0 # encoding: [0xe7,0x00,0x00,0x00,0x00,0xde] #CHECK: vlc %v0, %v0, 15 # encoding: [0xe7,0x00,0x00,0x00,0xf0,0xde] @@ -3922,6 +4000,7 @@ #CHECK: vlvgf %v0, %r0, 0 # encoding: [0xe7,0x00,0x00,0x00,0x20,0x22] #CHECK: vlvgf %v0, %r0, 4095 # encoding: [0xe7,0x00,0x0f,0xff,0x20,0x22] #CHECK: vlvgf %v0, %r0, 0(%r15) # encoding: [0xe7,0x00,0xf0,0x00,0x20,0x22] +#CHECK: vlvgf %v0, %r0, 0(%r15) # encoding: [0xe7,0x00,0xf0,0x00,0x20,0x22] #CHECK: vlvgf %v0, %r15, 0 # encoding: [0xe7,0x0f,0x00,0x00,0x20,0x22] #CHECK: vlvgf %v15, %r0, 0 # encoding: [0xe7,0xf0,0x00,0x00,0x20,0x22] #CHECK: vlvgf %v31, %r0, 0 # encoding: [0xe7,0xf0,0x00,0x00,0x28,0x22] @@ -3930,6 +4009,7 @@ vlvgf %v0, %r0, 0 vlvgf %v0, %r0, 4095 vlvgf %v0, %r0, 0(%r15) + vlvgf %v0, %r0, 0(15) vlvgf %v0, %r15, 0 vlvgf %v15, %r0, 0 vlvgf %v31, %r0, 0 diff --git a/llvm/test/MC/SystemZ/insn-good-z14.s b/llvm/test/MC/SystemZ/insn-good-z14.s index 1fcdcb4ccab0..ec12283ecbae 100644 --- a/llvm/test/MC/SystemZ/insn-good-z14.s +++ b/llvm/test/MC/SystemZ/insn-good-z14.s @@ -4,6 +4,16 @@ # RUN: llvm-mc -triple s390x-linux-gnu -mcpu=arch12 -show-encoding %s \ # RUN: | FileCheck %s +#CHECK: agh %r0, -524288 # encoding: [0xe3,0x00,0x00,0x00,0x80,0x38] +#CHECK: agh %r0, -1 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x38] +#CHECK: agh %r0, 0 # encoding: [0xe3,0x00,0x00,0x00,0x00,0x38] +#CHECK: agh %r0, 1 # encoding: [0xe3,0x00,0x00,0x01,0x00,0x38] +#CHECK: agh %r0, 524287 # 
encoding: [0xe3,0x00,0x0f,0xff,0x7f,0x38] +#CHECK: agh %r0, 0(%r1) # encoding: [0xe3,0x00,0x10,0x00,0x00,0x38] +#CHECK: agh %r0, 0(%r15) # encoding: [0xe3,0x00,0xf0,0x00,0x00,0x38] +#CHECK: agh %r0, 524287(%r1,%r15) # encoding: [0xe3,0x01,0xff,0xff,0x7f,0x38] +#CHECK: agh %r0, 524287(%r15,%r1) # encoding: [0xe3,0x0f,0x1f,0xff,0x7f,0x38] +#CHECK: agh %r15, 0 # encoding: [0xe3,0xf0,0x00,0x00,0x00,0x38] #CHECK: agh %r0, -524288 # encoding: [0xe3,0x00,0x00,0x00,0x80,0x38] #CHECK: agh %r0, -1 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x38] #CHECK: agh %r0, 0 # encoding: [0xe3,0x00,0x00,0x00,0x00,0x38] @@ -25,6 +35,16 @@ agh %r0, 524287(%r1,%r15) agh %r0, 524287(%r15,%r1) agh %r15, 0 + agh 0, -524288 + agh 0, -1 + agh 0, 0 + agh 0, 1 + agh 0, 524287 + agh 0, 0(1) + agh 0, 0(15) + agh 0, 524287(1,15) + agh 0, 524287(15,1) + agh 15, 0 #CHECK: bi -524288 # encoding: [0xe3,0xf0,0x00,0x00,0x80,0x47] #CHECK: bi -1 # encoding: [0xe3,0xf0,0x0f,0xff,0xff,0x47] @@ -1200,6 +1220,13 @@ vlip %v31, 0, 0 vlip %v17, 0x1234, 7 +#CHECK: vllezlf %v0, 0 # encoding: [0xe7,0x00,0x00,0x00,0x60,0x04] +#CHECK: vllezlf %v0, 4095 # encoding: [0xe7,0x00,0x0f,0xff,0x60,0x04] +#CHECK: vllezlf %v0, 0(%r15) # encoding: [0xe7,0x00,0xf0,0x00,0x60,0x04] +#CHECK: vllezlf %v0, 0(%r15,%r1) # encoding: [0xe7,0x0f,0x10,0x00,0x60,0x04] +#CHECK: vllezlf %v15, 0 # encoding: [0xe7,0xf0,0x00,0x00,0x60,0x04] +#CHECK: vllezlf %v31, 0 # encoding: [0xe7,0xf0,0x00,0x00,0x68,0x04] +#CHECK: vllezlf %v18, 1383(%r3,%r4) # encoding: [0xe7,0x23,0x45,0x67,0x68,0x04] #CHECK: vllezlf %v0, 0 # encoding: [0xe7,0x00,0x00,0x00,0x60,0x04] #CHECK: vllezlf %v0, 4095 # encoding: [0xe7,0x00,0x0f,0xff,0x60,0x04] #CHECK: vllezlf %v0, 0(%r15) # encoding: [0xe7,0x00,0xf0,0x00,0x60,0x04] @@ -1215,7 +1242,21 @@ vllezlf %v15, 0 vllezlf %v31, 0 vllezlf %v18, 0x567(%r3,%r4) + vllezlf 0, 0 + vllezlf 0, 4095 + vllezlf 0, 0(15) + vllezlf 0, 0(15,1) + vllezlf 15, 0 + vllezlf 31, 0 + vllezlf 18, 0x567(3,4) +#CHECK: vlrl %v0, 0, 0 # encoding: 
[0xe6,0x00,0x00,0x00,0x00,0x35] +#CHECK: vlrl %v0, 4095, 0 # encoding: [0xe6,0x00,0x0f,0xff,0x00,0x35] +#CHECK: vlrl %v0, 0(%r15), 0 # encoding: [0xe6,0x00,0xf0,0x00,0x00,0x35] +#CHECK: vlrl %v0, 0, 255 # encoding: [0xe6,0xff,0x00,0x00,0x00,0x35] +#CHECK: vlrl %v15, 0, 0 # encoding: [0xe6,0x00,0x00,0x00,0xf0,0x35] +#CHECK: vlrl %v31, 0, 0 # encoding: [0xe6,0x00,0x00,0x00,0xf1,0x35] +#CHECK: vlrl %v18, 1383(%r4), 3 # encoding: [0xe6,0x03,0x45,0x67,0x21,0x35] #CHECK: vlrl %v0, 0, 0 # encoding: [0xe6,0x00,0x00,0x00,0x00,0x35] #CHECK: vlrl %v0, 4095, 0 # encoding: [0xe6,0x00,0x0f,0xff,0x00,0x35] #CHECK: vlrl %v0, 0(%r15), 0 # encoding: [0xe6,0x00,0xf0,0x00,0x00,0x35] @@ -1231,6 +1272,13 @@ vlrl %v15, 0, 0 vlrl %v31, 0, 0 vlrl %v18, 1383(%r4), 3 + vlrl 0, 0, 0 + vlrl 0, 4095, 0 + vlrl 0, 0(15), 0 + vlrl 0, 0, 255 + vlrl 15, 0, 0 + vlrl 31, 0, 0 + vlrl 18, 1383(4), 3 #CHECK: vlrlr %v0, %r0, 0 # encoding: [0xe6,0x00,0x00,0x00,0x00,0x37] #CHECK: vlrlr %v0, %r0, 4095 # encoding: [0xe6,0x00,0x0f,0xff,0x00,0x37] diff --git a/llvm/test/MC/SystemZ/insn-good-z15.s b/llvm/test/MC/SystemZ/insn-good-z15.s index 1eb3b743cc61..36476161ea46 100644 --- a/llvm/test/MC/SystemZ/insn-good-z15.s +++ b/llvm/test/MC/SystemZ/insn-good-z15.s @@ -24,6 +24,13 @@ kdsa %r15, %r2 kdsa %r7, %r10 +#CHECK: vllebrzg %v0, 0 # encoding: [0xe6,0x00,0x00,0x00,0x30,0x04] +#CHECK: vllebrzg %v0, 4095 # encoding: [0xe6,0x00,0x0f,0xff,0x30,0x04] +#CHECK: vllebrzg %v0, 0(%r15) # encoding: [0xe6,0x00,0xf0,0x00,0x30,0x04] +#CHECK: vllebrzg %v0, 0(%r15,%r1) # encoding: [0xe6,0x0f,0x10,0x00,0x30,0x04] +#CHECK: vllebrzg %v15, 0 # encoding: [0xe6,0xf0,0x00,0x00,0x30,0x04] +#CHECK: vllebrzg %v31, 0 # encoding: [0xe6,0xf0,0x00,0x00,0x38,0x04] +#CHECK: vllebrzg %v18, 1383(%r3,%r4) # encoding: [0xe6,0x23,0x45,0x67,0x38,0x04] #CHECK: vllebrzg %v0, 0 # encoding: [0xe6,0x00,0x00,0x00,0x30,0x04] #CHECK: vllebrzg %v0, 4095 # encoding: [0xe6,0x00,0x0f,0xff,0x30,0x04] #CHECK: vllebrzg %v0, 0(%r15) # encoding: 
[0xe6,0x00,0xf0,0x00,0x30,0x04] @@ -39,6 +46,14 @@ ldrv %f15, 0 ldrv %v31, 0 ldrv %v18, 0x567(%r3,%r4) + ldrv 0, 0 + ldrv 0, 4095 + ldrv 0, 0(15) + ldrv 0, 0(15,1) + ldrv 15, 0 + ldrv 31, 0 + ldrv 18, 0x567(3,4) + #CHECK: vllebrze %v0, 0 # encoding: [0xe6,0x00,0x00,0x00,0x60,0x04] #CHECK: vllebrze %v0, 4095 # encoding: [0xe6,0x00,0x0f,0xff,0x60,0x04] diff --git a/llvm/test/MC/SystemZ/insn-good.s b/llvm/test/MC/SystemZ/insn-good.s index 3c5b34abe0a9..07f721bfa5e4 100644 --- a/llvm/test/MC/SystemZ/insn-good.s +++ b/llvm/test/MC/SystemZ/insn-good.s @@ -1,6 +1,13 @@ # For z10 and above. # RUN: llvm-mc -triple s390x-linux-gnu -show-encoding %s | FileCheck %s +#CHECK: a %r0, 0 # encoding: [0x5a,0x00,0x00,0x00] +#CHECK: a %r0, 4095 # encoding: [0x5a,0x00,0x0f,0xff] +#CHECK: a %r0, 0(%r1) # encoding: [0x5a,0x00,0x10,0x00] +#CHECK: a %r0, 0(%r15) # encoding: [0x5a,0x00,0xf0,0x00] +#CHECK: a %r0, 4095(%r1,%r15) # encoding: [0x5a,0x01,0xff,0xff] +#CHECK: a %r0, 4095(%r15,%r1) # encoding: [0x5a,0x0f,0x1f,0xff] +#CHECK: a %r15, 0 # encoding: [0x5a,0xf0,0x00,0x00] #CHECK: a %r0, 0 # encoding: [0x5a,0x00,0x00,0x00] #CHECK: a %r0, 4095 # encoding: [0x5a,0x00,0x0f,0xff] #CHECK: a %r0, 0(%r1) # encoding: [0x5a,0x00,0x10,0x00] @@ -16,6 +23,14 @@ a %r0, 4095(%r1,%r15) a %r0, 4095(%r15,%r1) a %r15, 0 + a 0, 0 + a 0, 4095 + a 0, 0(1) + a 0, 0(15) + a 0, 4095(1,15) + a 0, 4095(15,1) + a 15, 0 + #CHECK: ad %f0, 0 # encoding: [0x6a,0x00,0x00,0x00] #CHECK: ad %f0, 4095 # encoding: [0x6a,0x00,0x0f,0xff] @@ -319,6 +334,13 @@ ahy %r0, 524287(%r15,%r1) ahy %r15, 0 +#CHECK: al %r0, 0 # encoding: [0x5e,0x00,0x00,0x00] +#CHECK: al %r0, 4095 # encoding: [0x5e,0x00,0x0f,0xff] +#CHECK: al %r0, 0(%r1) # encoding: [0x5e,0x00,0x10,0x00] +#CHECK: al %r0, 0(%r15) # encoding: [0x5e,0x00,0xf0,0x00] +#CHECK: al %r0, 4095(%r1,%r15) # encoding: [0x5e,0x01,0xff,0xff] +#CHECK: al %r0, 4095(%r15,%r1) # encoding: [0x5e,0x0f,0x1f,0xff] +#CHECK: al %r15, 0 # encoding: [0x5e,0xf0,0x00,0x00] #CHECK: al %r0, 0 # 
encoding: [0x5e,0x00,0x00,0x00] #CHECK: al %r0, 4095 # encoding: [0x5e,0x00,0x0f,0xff] #CHECK: al %r0, 0(%r1) # encoding: [0x5e,0x00,0x10,0x00] @@ -334,6 +356,13 @@ al %r0, 4095(%r1,%r15) al %r0, 4095(%r15,%r1) al %r15, 0 + al 0, 0 + al 0, 4095 + al 0, 0(1) + al 0, 0(15) + al 0, 4095(1,15) + al 0, 4095(15,1) + al 15, 0 #CHECK: alc %r0, -524288 # encoding: [0xe3,0x00,0x00,0x00,0x80,0x98] #CHECK: alc %r0, -1 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x98] @@ -407,6 +436,16 @@ alfi %r0, (1 << 32) - 1 alfi %r15, 0 +#CHECK: alg %r0, -524288 # encoding: [0xe3,0x00,0x00,0x00,0x80,0x0a] +#CHECK: alg %r0, -1 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x0a] +#CHECK: alg %r0, 0 # encoding: [0xe3,0x00,0x00,0x00,0x00,0x0a] +#CHECK: alg %r0, 1 # encoding: [0xe3,0x00,0x00,0x01,0x00,0x0a] +#CHECK: alg %r0, 524287 # encoding: [0xe3,0x00,0x0f,0xff,0x7f,0x0a] +#CHECK: alg %r0, 0(%r1) # encoding: [0xe3,0x00,0x10,0x00,0x00,0x0a] +#CHECK: alg %r0, 0(%r15) # encoding: [0xe3,0x00,0xf0,0x00,0x00,0x0a] +#CHECK: alg %r0, 524287(%r1,%r15) # encoding: [0xe3,0x01,0xff,0xff,0x7f,0x0a] +#CHECK: alg %r0, 524287(%r15,%r1) # encoding: [0xe3,0x0f,0x1f,0xff,0x7f,0x0a] +#CHECK: alg %r15, 0 # encoding: [0xe3,0xf0,0x00,0x00,0x00,0x0a] #CHECK: alg %r0, -524288 # encoding: [0xe3,0x00,0x00,0x00,0x80,0x0a] #CHECK: alg %r0, -1 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x0a] #CHECK: alg %r0, 0 # encoding: [0xe3,0x00,0x00,0x00,0x00,0x0a] @@ -428,6 +467,16 @@ alg %r0, 524287(%r1,%r15) alg %r0, 524287(%r15,%r1) alg %r15, 0 + alg 0, -524288 + alg 0, -1 + alg 0, 0 + alg 0, 1 + alg 0, 524287 + alg 0, 0(1) + alg 0, 0(15) + alg 0, 524287(1,15) + alg 0, 524287(15,1) + alg 15, 0 #CHECK: algf %r0, -524288 # encoding: [0xe3,0x00,0x00,0x00,0x80,0x1a] #CHECK: algf %r0, -1 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x1a] @@ -479,6 +528,19 @@ algr %r15,%r0 algr %r7,%r8 +#CHECK: algsi -524288, 0 # encoding: [0xeb,0x00,0x00,0x00,0x80,0x7e] +#CHECK: algsi -1, 0 # encoding: [0xeb,0x00,0x0f,0xff,0xff,0x7e] +#CHECK: algsi 0, 0 # encoding: 
[0xeb,0x00,0x00,0x00,0x00,0x7e] +#CHECK: algsi 1, 0 # encoding: [0xeb,0x00,0x00,0x01,0x00,0x7e] +#CHECK: algsi 524287, 0 # encoding: [0xeb,0x00,0x0f,0xff,0x7f,0x7e] +#CHECK: algsi 0, -128 # encoding: [0xeb,0x80,0x00,0x00,0x00,0x7e] +#CHECK: algsi 0, -1 # encoding: [0xeb,0xff,0x00,0x00,0x00,0x7e] +#CHECK: algsi 0, 1 # encoding: [0xeb,0x01,0x00,0x00,0x00,0x7e] +#CHECK: algsi 0, 127 # encoding: [0xeb,0x7f,0x00,0x00,0x00,0x7e] +#CHECK: algsi 0(%r1), 42 # encoding: [0xeb,0x2a,0x10,0x00,0x00,0x7e] +#CHECK: algsi 0(%r15), 42 # encoding: [0xeb,0x2a,0xf0,0x00,0x00,0x7e] +#CHECK: algsi 524287(%r1), 42 # encoding: [0xeb,0x2a,0x1f,0xff,0x7f,0x7e] +#CHECK: algsi 524287(%r15), 42 # encoding: [0xeb,0x2a,0xff,0xff,0x7f,0x7e] #CHECK: algsi -524288, 0 # encoding: [0xeb,0x00,0x00,0x00,0x80,0x7e] #CHECK: algsi -1, 0 # encoding: [0xeb,0x00,0x0f,0xff,0xff,0x7e] #CHECK: algsi 0, 0 # encoding: [0xeb,0x00,0x00,0x00,0x00,0x7e] @@ -506,6 +568,19 @@ algsi 0(%r15), 42 algsi 524287(%r1), 42 algsi 524287(%r15), 42 + algsi -524288, 0 + algsi -1, 0 + algsi 0, 0 + algsi 1, 0 + algsi 524287, 0 + algsi 0, -128 + algsi 0, -1 + algsi 0, 1 + algsi 0, 127 + algsi 0(1), 42 + algsi 0(15), 42 + algsi 524287(1), 42 + algsi 524287(15), 42 #CHECK: alr %r0, %r0 # encoding: [0x1e,0x00] #CHECK: alr %r0, %r15 # encoding: [0x1e,0x0f] @@ -596,6 +671,20 @@ ap 0(16,%r15), 0(1) ap 0(1), 0(16,%r1) ap 0(1), 0(16,%r15) + ap 0(1), 0(1) + ap 0(1), 0(1,1) + ap 0(1), 0(1,15) + ap 0(1), 4095(1) + ap 0(1), 4095(1,1) + ap 0(1), 4095(1,15) + ap 0(1,1), 0(1) + ap 0(1,15), 0(1) + ap 4095(1,1), 0(1) + ap 4095(1,15), 0(1) + ap 0(16,1), 0(1) + ap 0(16,15), 0(1) + ap 0(1), 0(16,1) + ap 0(1), 0(16,15) #CHECK: ar %r0, %r0 # encoding: [0x1a,0x00] #CHECK: ar %r0, %r15 # encoding: [0x1a,0x0f] diff --git a/llvm/test/MC/SystemZ/regs-good.s b/llvm/test/MC/SystemZ/regs-good.s index c6157b5e2c85..b4c1edd1b591 100644 --- a/llvm/test/MC/SystemZ/regs-good.s +++ b/llvm/test/MC/SystemZ/regs-good.s @@ -152,6 +152,29 @@ lctl %c14,%c15,0 lctl 0,15,0 
+#CHECK: st %r0, 0 # encoding: [0x50,0x00,0x00,0x00] +#CHECK: st %r0, 4095 # encoding: [0x50,0x00,0x0f,0xff] +#CHECK: st %r0, 0(%r1) # encoding: [0x50,0x00,0x10,0x00] +#CHECK: st %r0, 0(%r15) # encoding: [0x50,0x00,0xf0,0x00] +#CHECK: st %r0, 4095(%r1,%r15) # encoding: [0x50,0x01,0xff,0xff] +#CHECK: st %r0, 4095(%r15,%r1) # encoding: [0x50,0x0f,0x1f,0xff] +#CHECK: st %r15, 0 # encoding: [0x50,0xf0,0x00,0x00] +#CHECK: st %r0, 0(%r1) # encoding: [0x50,0x00,0x10,0x00] +#CHECK: st %r0, 0(%r15) # encoding: [0x50,0x00,0xf0,0x00] +#CHECK: st %r0, 4095(%r1,%r15) # encoding: [0x50,0x01,0xff,0xff] +#CHECK: st %r0, 4095(%r15,%r1) # encoding: [0x50,0x0f,0x1f,0xff] + + st %r0, 0 + st %r0, 4095 + st %r0, 0(%r1) + st %r0, 0(%r15) + st %r0, 4095(%r1,%r15) + st %r0, 4095(%r15,%r1) + st %r15, 0 + st 0, 0(1) + st 0, 0(15) + st 0, 4095(1,15) + st 0, 4095(15,1) #CHECK: .cfi_offset %r0, 0 #CHECK: .cfi_offset %r1, 8 diff --git a/llvm/test/MC/SystemZ/tokens.s b/llvm/test/MC/SystemZ/tokens.s index 177378615ed4..bf8c4e906785 100644 --- a/llvm/test/MC/SystemZ/tokens.s +++ b/llvm/test/MC/SystemZ/tokens.s @@ -57,6 +57,14 @@ #CHECK: foo %, 200 #CHECK: error: unknown token in expression #CHECK: foo {, 200 +#CHECK: error: invalid instruction +#CHECK: foo 100(15), 300 +#CHECK: error: register expected +#CHECK: foo 100(15,), 300 +#CHECK: error: invalid instruction +#CHECK: foo 100(15,%r1), 300 +#CHECK: error: invalid instruction +#CHECK: foo 100(%v20,10), 300 foo 100, 200 foo 100(, 200 @@ -86,3 +94,7 @@ foo %c, 200 foo %, 200 foo {, 200 + foo 100(15), 300 + foo 100(15,), 300 + foo 100(15,%r1), 300 + foo 100(%v20,10), 300 From llvm-commits at lists.llvm.org Wed Jul 8 09:20:57 2020 From: llvm-commits at lists.llvm.org (Ulrich Weigand via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:20:57 +0000 (UTC) Subject: [PATCH] D83251: [SystemZ] Allow specifying integer registers as part of the address calculation In-Reply-To: References: Message-ID: 
<62448dc57a976fae452fdb7a53342886@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGcca8578efab0: [SystemZ] Allow specifying integer registers as part of the address calculation (authored by uweigand). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83251/new/ https://reviews.llvm.org/D83251 Files: llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/test/MC/SystemZ/insn-bad.s llvm/test/MC/SystemZ/insn-good-z13.s llvm/test/MC/SystemZ/insn-good-z14.s llvm/test/MC/SystemZ/insn-good-z15.s llvm/test/MC/SystemZ/insn-good.s llvm/test/MC/SystemZ/regs-good.s llvm/test/MC/SystemZ/tokens.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83251.276461.patch Type: text/x-patch Size: 40228 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:20:58 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:20:58 +0000 (UTC) Subject: [PATCH] D83404: [PowerPC][PCRelative] Thread Local Storage Support for Local Exec Message-ID: kamaub created this revision. kamaub added reviewers: stefanp, nemanjai, NeHuang. Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya. Herald added a project: LLVM. This patch adds initial support for the Local Exec Thread Local Storage model, producing the code sequence and relocations that the ABI requires for this model when PC-relative memory operations are used.
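For readers unfamiliar with the model named in the description above, here is a minimal sketch of source code that requests local-exec TLS. It is not taken from D83404. The variable and function names are hypothetical, and the `tls_model` attribute is the GCC/Clang variable attribute, which is assumed to be available. Local exec is only valid for thread-local variables defined in the executable itself, which is why the access can be resolved as a fixed offset from the thread pointer.

```cpp
// Hypothetical per-thread counter (not part of the patch). The attribute
// explicitly selects the local-exec TLS model instead of leaving the choice
// to the compiler's defaults for the current code model.
thread_local int HitCount __attribute__((tls_model("local-exec"))) = 0;

// Each thread increments and returns its own copy of HitCount.
int recordHit() { return ++HitCount; }
```

Compiling a file like this for a PC-relative-capable PowerPC target is one way to look at the code sequence the patch is concerned with. The exact instructions and relocations are defined by the patch and the ABI, not by this sketch.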
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83404 Files: llvm/include/llvm/BinaryFormat/ELFRelocs/PowerPC64.def llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/lib/Target/PowerPC/PPCMCInstLower.cpp llvm/test/CodeGen/PowerPC/pcrel-tls-local-exec.ll llvm/test/MC/PowerPC/pcrel-tls-local-exec-address-load-reloc.s llvm/test/MC/PowerPC/pcrel-tls-local-exec-value-load-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83404.276462.patch Type: text/x-patch Size: 10288 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:23:58 2020 From: llvm-commits at lists.llvm.org (Wei Mi via llvm-commits) Date: Wed, 08 Jul 2020 09:23:58 -0700 (PDT) Subject: [llvm] e32469a - [SampleFDO] Enable sample-profile-top-down-load and sample-profile-merge-inlinee Message-ID: <5f05f31e.1c69fb81.85e9c.0c52@mx.google.com> Author: Wei Mi Date: 2020-07-08T09:23:18-07:00 New Revision: e32469a140374737ad0ece395d5b52444bd94cd1 URL: https://github.com/llvm/llvm-project/commit/e32469a140374737ad0ece395d5b52444bd94cd1 DIFF: https://github.com/llvm/llvm-project/commit/e32469a140374737ad0ece395d5b52444bd94cd1.diff LOG: [SampleFDO] Enable sample-profile-top-down-load and sample-profile-merge-inlinee by default. sample-profile-top-down-load is an internal option which can enable top-down order of inlining and profile annotation in sample profile load pass. It was found to be beneficial for better profile annotation. Recently we found it could also solve some build time issue. Suppose function A has many callsites in function B. In the last release binary where sample profile was collected, the outline copy of A is large because there are many other functions inlined into A. 
However, although all the callsites of A in B were inlined, every inlined body was small (A was inlined into B before other functions were inlined into A), so there was no build-time issue in the last release. In an optimized build using the sample profile collected from the last release, without top-down inlining, we saw a case where A grew very large because of inlining, and then multiple callsites of A were inlined into B; that led to a huge B, which caused a significant build-time issue in addition to the profile annotation issue. To solve that problem, the patch enables the flag sample-profile-top-down-load by default. sample-profile-top-down-load performs better when it is enabled together with sample-profile-merge-inlinee, so this patch also enables sample-profile-merge-inlinee by default. Differential Revision: https://reviews.llvm.org/D82919 Added: Modified: llvm/lib/Transforms/IPO/SampleProfile.cpp llvm/test/Transforms/SampleProfile/inline-mergeprof.ll llvm/test/Transforms/SampleProfile/inline-topdown.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp index 51fdf8f0db29..b6871e260532 100644 --- a/llvm/lib/Transforms/IPO/SampleProfile.cpp +++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp @@ -149,14 +149,17 @@ static cl::opt<bool> ProfileAccurateForSymsInList( "be accurate. It may be overriden by profile-sample-accurate. ")); static cl::opt<bool> ProfileMergeInlinee( - "sample-profile-merge-inlinee", cl::Hidden, cl::init(false), + "sample-profile-merge-inlinee", cl::Hidden, cl::init(true), cl::desc("Merge past inlinee's profile to outline version if sample " - "profile loader decided not to inline a call site.")); + "profile loader decided not to inline a call site. It will " + "only be enabled when top-down order of profile loading is " + "enabled. 
")); static cl::opt<bool> ProfileTopDownLoad( - "sample-profile-top-down-load", cl::Hidden, cl::init(false), + "sample-profile-top-down-load", cl::Hidden, cl::init(true), cl::desc("Do profile annotation and inlining for functions in top-down " - "order of call graph during sample profile loading.")); + "order of call graph during sample profile loading. It only " + "works for new pass manager. ")); static cl::opt<bool> ProfileSizeInline( "sample-profile-inline-size", cl::Hidden, cl::init(false), @@ -1785,6 +1788,15 @@ SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) { FunctionOrderList.reserve(M.size()); if (!ProfileTopDownLoad || CG == nullptr) { + if (ProfileMergeInlinee) { + // Disable ProfileMergeInlinee if profile is not loaded in top down order, + // because the profile for a function may be used for the profile + // annotation of its outline copy before the profile merging of its + // non-inlined inline instances, and that is not the way how + // ProfileMergeInlinee is supposed to work. 
+ ProfileMergeInlinee = false; + } + for (Function &F : M) if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile")) FunctionOrderList.push_back(&F); diff --git a/llvm/test/Transforms/SampleProfile/inline-mergeprof.ll b/llvm/test/Transforms/SampleProfile/inline-mergeprof.ll index 08e9fa008fa8..d83fd23c33d3 100644 --- a/llvm/test/Transforms/SampleProfile/inline-mergeprof.ll +++ b/llvm/test/Transforms/SampleProfile/inline-mergeprof.ll @@ -1,10 +1,10 @@ ; Test we lose details of not inlined profile without '-sample-profile-merge-inlinee' -; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s ; Test we properly merge not inlined profile properly with '-sample-profile-merge-inlinee' -; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 diff --git 
a/llvm/test/Transforms/SampleProfile/inline-topdown.ll b/llvm/test/Transforms/SampleProfile/inline-topdown.ll index 3781f6ad2497..8e8e07696030 100644 --- a/llvm/test/Transforms/SampleProfile/inline-topdown.ll +++ b/llvm/test/Transforms/SampleProfile/inline-topdown.ll @@ -1,10 +1,10 @@ ; Note that this needs new pass manager for now. Passing `-sample-profile-top-down-load` to legacy pass manager is a no-op. ; Test we aren't doing specialization for inlining with default source order -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -S | FileCheck -check-prefix=DEFAULT %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-top-down-load=false -S | FileCheck -check-prefix=DEFAULT %s ; Test we specialize based on call path with context-sensitive profile while inlining with '-sample-profile-top-down-load' -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load -S | FileCheck -check-prefix=TOPDOWN %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load=true -S | FileCheck -check-prefix=TOPDOWN %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 From llvm-commits at lists.llvm.org Wed Jul 8 09:24:07 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:24:07 +0000 (UTC) Subject: [PATCH] D82919: [SampleFDO] Enable sample-profile-top-down-load by default. In-Reply-To: References: Message-ID: <161a7c47d343e427b470165c824abe7a@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGe32469a14037: [SampleFDO] Enable sample-profile-top-down-load and sample-profile-merge… (authored by wmi). 
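The interplay between the two options that this revision sets up can be sketched in isolation. The following is a simplified model, not the code in SampleProfile.cpp: `LoaderConfig` and `effectiveMergeInlinee` are hypothetical names, and plain booleans stand in for the `cl::opt` flags and for the call-graph pointer. It only illustrates the rule that inlinee-profile merging stays active solely when functions are processed in top-down order.

```cpp
// Hypothetical stand-ins for the inputs the guard in the patch depends on.
struct LoaderConfig {
  bool TopDownLoad = true;   // models -sample-profile-top-down-load
  bool MergeInlinee = true;  // models -sample-profile-merge-inlinee
  bool HasCallGraph = true;  // models "CG != nullptr"
};

// Returns whether merging of non-inlined inlinee profiles remains enabled.
// Without top-down order (or without a call graph) merging is forced off,
// because a function's profile could be consumed for annotation before the
// profiles of its non-inlined instances have been merged into it.
bool effectiveMergeInlinee(const LoaderConfig &C) {
  if (!C.TopDownLoad || !C.HasCallGraph)
    return false;
  return C.MergeInlinee;
}
```

With the new defaults (both flags true) and a call graph available, merging stays on. Passing `-sample-profile-top-down-load=false`, as the updated inline-topdown.ll test does, corresponds to `TopDownLoad = false` here and turns merging off as well.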
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82919/new/ https://reviews.llvm.org/D82919 Files: llvm/lib/Transforms/IPO/SampleProfile.cpp llvm/test/Transforms/SampleProfile/inline-mergeprof.ll llvm/test/Transforms/SampleProfile/inline-topdown.ll Index: llvm/test/Transforms/SampleProfile/inline-topdown.ll =================================================================== --- llvm/test/Transforms/SampleProfile/inline-topdown.ll +++ llvm/test/Transforms/SampleProfile/inline-topdown.ll @@ -1,10 +1,10 @@ ; Note that this needs new pass manager for now. Passing `-sample-profile-top-down-load` to legacy pass manager is a no-op. ; Test we aren't doing specialization for inlining with default source order -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -S | FileCheck -check-prefix=DEFAULT %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-top-down-load=false -S | FileCheck -check-prefix=DEFAULT %s ; Test we specialize based on call path with context-sensitive profile while inlining with '-sample-profile-top-down-load' -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load -S | FileCheck -check-prefix=TOPDOWN %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load=true -S | FileCheck -check-prefix=TOPDOWN %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 Index: llvm/test/Transforms/SampleProfile/inline-mergeprof.ll =================================================================== --- llvm/test/Transforms/SampleProfile/inline-mergeprof.ll +++ llvm/test/Transforms/SampleProfile/inline-mergeprof.ll @@ -1,10 +1,10 @@ ; Test we lose details of not inlined profile without '-sample-profile-merge-inlinee' -; RUN: opt < 
%s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=SCALE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s ; Test we properly merge not inlined profile properly with '-sample-profile-merge-inlinee' -; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s -; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s +; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1 Index: llvm/lib/Transforms/IPO/SampleProfile.cpp =================================================================== --- llvm/lib/Transforms/IPO/SampleProfile.cpp +++ llvm/lib/Transforms/IPO/SampleProfile.cpp @@ -149,14 +149,17 @@ "be accurate. It may be overriden by profile-sample-accurate. 
")); static cl::opt<bool> ProfileMergeInlinee( - "sample-profile-merge-inlinee", cl::Hidden, cl::init(false), + "sample-profile-merge-inlinee", cl::Hidden, cl::init(true), cl::desc("Merge past inlinee's profile to outline version if sample " - "profile loader decided not to inline a call site.")); + "profile loader decided not to inline a call site. It will " + "only be enabled when top-down order of profile loading is " + "enabled. ")); static cl::opt<bool> ProfileTopDownLoad( - "sample-profile-top-down-load", cl::Hidden, cl::init(false), + "sample-profile-top-down-load", cl::Hidden, cl::init(true), cl::desc("Do profile annotation and inlining for functions in top-down " - "order of call graph during sample profile loading.")); + "order of call graph during sample profile loading. It only " + "works for new pass manager. ")); static cl::opt<bool> ProfileSizeInline( "sample-profile-inline-size", cl::Hidden, cl::init(false), @@ -1785,6 +1788,15 @@ FunctionOrderList.reserve(M.size()); if (!ProfileTopDownLoad || CG == nullptr) { + if (ProfileMergeInlinee) { + // Disable ProfileMergeInlinee if profile is not loaded in top down order, + // because the profile for a function may be used for the profile + // annotation of its outline copy before the profile merging of its + // non-inlined inline instances, and that is not the way how + // ProfileMergeInlinee is supposed to work. + ProfileMergeInlinee = false; + } + for (Function &F : M) if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile")) FunctionOrderList.push_back(&F); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82919.276463.patch Type: text/x-patch Size: 5156 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:25:04 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:25:04 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: <6d5349ca9342792a49e405d36a73e70b@localhost.localdomain> nikic requested changes to this revision. nikic added inline comments. This revision now requires changes to proceed. ================ Comment at: llvm/lib/Transforms/Scalar/BDCE.cpp:127 + Changed = true; + NumSExt2ZExt++; + continue; ---------------- You probably need to `clearAssumptionsOfUsers()` here. Please check this test case: https://alive2.llvm.org/ce/z/caMis2 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Wed Jul 8 09:27:42 2020 From: llvm-commits at lists.llvm.org (Vy Nguyen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:27:42 +0000 (UTC) Subject: [PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements. In-Reply-To: References: Message-ID: <844cc23286c4b3624ce7d23c9c50ab11@localhost.localdomain> oontvoo updated this revision to Diff 276464. oontvoo marked an inline comment as done. oontvoo added a comment. 
Return error for empty function Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77422/new/ https://reviews.llvm.org/D77422 Files: llvm/docs/CommandGuide/llvm-exegesis.rst llvm/test/tools/llvm-exegesis/X86/lbr/Inputs/mov_add.att llvm/test/tools/llvm-exegesis/X86/lbr/lit.local.cfg llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.h llvm/tools/llvm-exegesis/lib/X86/CMakeLists.txt llvm/tools/llvm-exegesis/lib/X86/Target.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.h llvm/tools/llvm-exegesis/llvm-exegesis.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D77422.276464.patch Type: text/x-patch Size: 21475 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:32:12 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 16:32:12 +0000 (UTC) Subject: [PATCH] D81675: SILoadStoreOptimizer: add support for GFX10 image instructions In-Reply-To: References: Message-ID: <21b4e73fede34472fb7b1f10d6a66c15@localhost.localdomain> nhaehnle accepted this revision. nhaehnle added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81675/new/ https://reviews.llvm.org/D81675 From llvm-commits at lists.llvm.org Wed Jul 8 09:33:05 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Wed, 08 Jul 2020 09:33:05 -0700 (PDT) Subject: [llvm] 0b2536d - [NewPM] Add PredicateInfoPrinterPass to PassRegistry.def Message-ID: <5f05f541.1c69fb81.bdc9.0d91@mx.google.com> Author: Arthur Eubanks Date: 2020-07-08T09:32:46-07:00 New Revision: 0b2536d0bdb4cba2a0305067cc0d2ff988ab909d URL: https://github.com/llvm/llvm-project/commit/0b2536d0bdb4cba2a0305067cc0d2ff988ab909d DIFF: https://github.com/llvm/llvm-project/commit/0b2536d0bdb4cba2a0305067cc0d2ff988ab909d.diff LOG: [NewPM] Add PredicateInfoPrinterPass to PassRegistry.def Fixes tests under NPM in Transforms/Util/PredicateInfo. Added: Modified: llvm/lib/Passes/PassRegistry.def Removed: ################################################################################ diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def index ecb532ee5553..eb2b740db561 100644 --- a/llvm/lib/Passes/PassRegistry.def +++ b/llvm/lib/Passes/PassRegistry.def @@ -240,6 +240,7 @@ FUNCTION_PASS("print", PhiValuesPrinterPass(dbgs())) FUNCTION_PASS("print", RegionInfoPrinterPass(dbgs())) FUNCTION_PASS("print", ScalarEvolutionPrinterPass(dbgs())) FUNCTION_PASS("print", StackSafetyPrinterPass(dbgs())) +FUNCTION_PASS("print-predicateinfo", PredicateInfoPrinterPass(dbgs())) FUNCTION_PASS("reassociate", ReassociatePass()) FUNCTION_PASS("scalarizer", ScalarizerPass()) FUNCTION_PASS("sccp", SCCPPass()) From llvm-commits at lists.llvm.org Wed Jul 8 09:33:19 2020 From: llvm-commits at lists.llvm.org (dmajor via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:33:19 +0000 (UTC) Subject: [PATCH] D76885: [lld][COFF][ELF][WebAssembly] Replace --[no-]threads /threads[:no] with --threads={1,2,...} /threads:{1,2,...} In-Reply-To: References: Message-ID: 
<32d94b9c835c1bf51efeffd070418081@localhost.localdomain>

dmajor added a comment.

I filed https://bugs.llvm.org/show_bug.cgi?id=46641 to make sure it doesn't get lost.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76885/new/

https://reviews.llvm.org/D76885



From llvm-commits at lists.llvm.org Wed Jul 8 09:38:55 2020
From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 16:38:55 +0000 (UTC)
Subject: [PATCH] D83406: Remove NormalizerRetTy and use the decltype of the KeyPath instead
Message-ID:

dang created this revision.
dang added a reviewer: Bigcheese.
Herald added subscribers: llvm-commits, cfe-commits, dexonsmith.
Herald added projects: clang, LLVM.

Depends on D83315


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83406

Files:
  clang/include/clang/Driver/Options.td
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/Option/OptParser.td
  llvm/utils/TableGen/OptParserEmitter.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83406.276467.patch
Type: text/x-patch
Size: 8594 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 09:43:27 2020
From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 16:43:27 +0000 (UTC)
Subject: [PATCH] D83297: [Attributor][WIP] Attribute scheduling visualization.
In-Reply-To:
References:
Message-ID:

jdoerfert added a comment.

Do you think it makes sense to merge the two graphs, as @bbn mentioned?

Can you create a script that, given a source file and maybe some options, creates the video? Maybe place it in `llvm/utils/Attributor`.

CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83297/new/ https://reviews.llvm.org/D83297 From llvm-commits at lists.llvm.org Wed Jul 8 09:47:18 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Wed, 08 Jul 2020 09:47:18 -0700 (PDT) Subject: [llvm] 08a2c9c - [X86] Fix copy+paste typo in combineVectorPack assert message. NFC. Message-ID: <5f05f896.1c69fb81.e3581.0fea@mx.google.com> Author: Simon Pilgrim Date: 2020-07-08T17:42:42+01:00 New Revision: 08a2c9ce5c86d1754f71b24c5c83b4a07de00749 URL: https://github.com/llvm/llvm-project/commit/08a2c9ce5c86d1754f71b24c5c83b4a07de00749 DIFF: https://github.com/llvm/llvm-project/commit/08a2c9ce5c86d1754f71b24c5c83b4a07de00749.diff LOG: [X86] Fix copy+paste typo in combineVectorPack assert message. NFC. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 5ac94be28adf..cd52684fc263 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -41887,7 +41887,7 @@ static SDValue combineVectorPack(SDNode *N, SelectionDAG &DAG, const X86Subtarget &Subtarget) { unsigned Opcode = N->getOpcode(); assert((X86ISD::PACKSS == Opcode || X86ISD::PACKUS == Opcode) && - "Unexpected shift opcode"); + "Unexpected pack opcode"); EVT VT = N->getValueType(0); SDValue N0 = N->getOperand(0); From llvm-commits at lists.llvm.org Wed Jul 8 09:47:20 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Wed, 08 Jul 2020 09:47:20 -0700 (PDT) Subject: [llvm] 800fb68 - [X86][SSE] Pull out PACK(SHUFFLE(), SHUFFLE()) folds into its own function. NFC. 
Message-ID: <5f05f898.1c69fb81.5d2fc.0df2@mx.google.com> Author: Simon Pilgrim Date: 2020-07-08T17:42:42+01:00 New Revision: 800fb68420681d14f7690b76421c57ff474349a1 URL: https://github.com/llvm/llvm-project/commit/800fb68420681d14f7690b76421c57ff474349a1 DIFF: https://github.com/llvm/llvm-project/commit/800fb68420681d14f7690b76421c57ff474349a1.diff LOG: [X86][SSE] Pull out PACK(SHUFFLE(),SHUFFLE()) folds into its own function. NFC. Future patches will extend this so declutter combineVectorPack before we start. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index cd52684fc263..017bfba94b61 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -41882,6 +41882,50 @@ static SDValue combineShiftRightLogical(SDNode *N, SelectionDAG &DAG, return SDValue(); } +static SDValue combineVectorPackWithShuffle(SDNode *N, SelectionDAG &DAG) { + unsigned Opcode = N->getOpcode(); + assert((X86ISD::PACKSS == Opcode || X86ISD::PACKUS == Opcode) && + "Unexpected pack opcode"); + + EVT VT = N->getValueType(0); + SDValue N0 = N->getOperand(0); + SDValue N1 = N->getOperand(1); + unsigned NumDstElts = VT.getVectorNumElements(); + + // Attempt to fold PACK(LOSUBVECTOR(SHUFFLE(X)),HISUBVECTOR(SHUFFLE(X))) + // to SHUFFLE(PACK(LOSUBVECTOR(X),HISUBVECTOR(X))), this is mainly for + // truncation trees that help us avoid lane crossing shuffles. + // TODO: There's a lot more we can do for PACK/HADD style shuffle combines. 
+ if (N0.getOpcode() == ISD::EXTRACT_SUBVECTOR && + N1.getOpcode() == ISD::EXTRACT_SUBVECTOR && + N0.getConstantOperandAPInt(1) == 0 && + N1.getConstantOperandAPInt(1) == (NumDstElts / 2) && + N0.getOperand(0) == N1.getOperand(0) && VT.is128BitVector() && + N0.getOperand(0).getValueType().is256BitVector()) { + // TODO - support target/faux shuffles. + SDValue Vec = peekThroughBitcasts(N0.getOperand(0)); + if (auto *SVN = dyn_cast(Vec)) { + // To keep the PACK LHS/RHS coherency, we must be able to scale the unary + // shuffle to a vXi64 width - we can probably relax this in the future. + SmallVector ShuffleMask; + if (SVN->getOperand(1).isUndef() && + scaleShuffleElements(SVN->getMask(), 4, ShuffleMask)) { + SDLoc DL(N); + SDValue Lo, Hi; + std::tie(Lo, Hi) = DAG.SplitVector(SVN->getOperand(0), DL); + Lo = DAG.getBitcast(N0.getValueType(), Lo); + Hi = DAG.getBitcast(N1.getValueType(), Hi); + SDValue Res = DAG.getNode(Opcode, DL, VT, Lo, Hi); + Res = DAG.getBitcast(MVT::v4i32, Res); + Res = DAG.getVectorShuffle(MVT::v4i32, DL, Res, Res, ShuffleMask); + return DAG.getBitcast(VT, Res); + } + } + } + + return SDValue(); +} + static SDValue combineVectorPack(SDNode *N, SelectionDAG &DAG, TargetLowering::DAGCombinerInfo &DCI, const X86Subtarget &Subtarget) { @@ -41955,36 +41999,9 @@ static SDValue combineVectorPack(SDNode *N, SelectionDAG &DAG, return getConstVector(Bits, Undefs, VT.getSimpleVT(), DAG, SDLoc(N)); } - // Attempt to fold PACK(LOSUBVECTOR(SHUFFLE(X)),HISUBVECTOR(SHUFFLE(X))) - // to SHUFFLE(PACK(LOSUBVECTOR(X),HISUBVECTOR(X))), this is mainly for - // truncation trees that help us avoid lane crossing shuffles. - // TODO: There's a lot more we can do for PACK/HADD style shuffle combines. 
- if (N0.getOpcode() == ISD::EXTRACT_SUBVECTOR && - N1.getOpcode() == ISD::EXTRACT_SUBVECTOR && - N0.getConstantOperandAPInt(1) == 0 && - N1.getConstantOperandAPInt(1) == (NumDstElts / 2) && - N0.getOperand(0) == N1.getOperand(0) && VT.is128BitVector() && - N0.getOperand(0).getValueType().is256BitVector()) { - // TODO - support target/faux shuffles. - SDValue Vec = peekThroughBitcasts(N0.getOperand(0)); - if (auto *SVN = dyn_cast(Vec)) { - // To keep the PACK LHS/RHS coherency, we must be able to scale the unary - // shuffle to a vXi64 width - we can probably relax this in the future. - SmallVector ShuffleMask; - if (SVN->getOperand(1).isUndef() && - scaleShuffleElements(SVN->getMask(), 4, ShuffleMask)) { - SDLoc DL(N); - SDValue Lo, Hi; - std::tie(Lo, Hi) = DAG.SplitVector(SVN->getOperand(0), DL); - Lo = DAG.getBitcast(N0.getValueType(), Lo); - Hi = DAG.getBitcast(N1.getValueType(), Hi); - SDValue Res = DAG.getNode(Opcode, DL, VT, Lo, Hi); - Res = DAG.getBitcast(MVT::v4i32, Res); - Res = DAG.getVectorShuffle(MVT::v4i32, DL, Res, Res, ShuffleMask); - return DAG.getBitcast(VT, Res); - } - } - } + // Try to fold PACK(SHUFFLE(),SHUFFLE()) -> SHUFFLE(PACK()). + if (SDValue V = combineVectorPackWithShuffle(N, DAG)) + return V; // Try to combine a PACKUSWB/PACKSSWB implemented truncate with a regular // truncate to create a larger truncate. From llvm-commits at lists.llvm.org Wed Jul 8 09:49:07 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:49:07 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: wmi added a comment. https://reviews.llvm.org/D82123 to always instrument function entry BB has been committed guarded by a flag. https://reviews.llvm.org/D83024 to enable the flag by default is under review. Can you take another look at the patch? 
Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81981/new/ https://reviews.llvm.org/D81981 From llvm-commits at lists.llvm.org Wed Jul 8 09:50:15 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Wed, 08 Jul 2020 09:50:15 -0700 (PDT) Subject: [llvm] 39329d5 - [DAGCombiner] add enum for store source value; NFC Message-ID: <5f05f947.1c69fb81.9f63.0669@mx.google.com> Author: Sanjay Patel Date: 2020-07-08T12:49:59-04:00 New Revision: 39329d5724d94737fda0212f8e89ca240f14474a URL: https://github.com/llvm/llvm-project/commit/39329d5724d94737fda0212f8e89ca240f14474a DIFF: https://github.com/llvm/llvm-project/commit/39329d5724d94737fda0212f8e89ca240f14474a.diff LOG: [DAGCombiner] add enum for store source value; NFC This removes existing code duplication and allows us to assert that we are handling the expected cases. We have a list of outstanding bugs that could benefit by handling truncated source values, so that's a possible addition going forward. Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index a1d5769369bb..d12cf74e5cd7 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -638,6 +638,19 @@ namespace { : MemNode(N), OffsetFromBase(Offset) {} }; + // Classify the origin of a stored value. 
+ enum class StoreSource { Unknown, Constant, Extract, Load }; + StoreSource getStoreSource(SDValue StoreVal) { + if (isa(StoreVal) || isa(StoreVal)) + return StoreSource::Constant; + if (StoreVal.getOpcode() == ISD::EXTRACT_VECTOR_ELT || + StoreVal.getOpcode() == ISD::EXTRACT_SUBVECTOR) + return StoreSource::Extract; + if (isa(StoreVal)) + return StoreSource::Load; + return StoreSource::Unknown; + } + /// This is a helper function for visitMUL to check the profitability /// of folding (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2). /// MulNode is the original multiply, AddNode is (add x, c1), @@ -16024,14 +16037,12 @@ void DAGCombiner::getStoreMergeCandidates( if (BasePtr.getBase().isUndef()) return; - bool IsConstantSrc = isa(Val) || isa(Val); - bool IsExtractVecSrc = (Val.getOpcode() == ISD::EXTRACT_VECTOR_ELT || - Val.getOpcode() == ISD::EXTRACT_SUBVECTOR); - bool IsLoadSrc = isa(Val); + StoreSource StoreSrc = getStoreSource(Val); + assert(StoreSrc != StoreSource::Unknown && "Expected known source for store"); BaseIndexOffset LBasePtr; // Match on loadbaseptr if relevant. EVT LoadVT; - if (IsLoadSrc) { + if (StoreSrc == StoreSource::Load) { auto *Ld = cast(Val); LBasePtr = BaseIndexOffset::match(Ld, DAG); LoadVT = Ld->getMemoryVT(); @@ -16059,7 +16070,7 @@ void DAGCombiner::getStoreMergeCandidates( // Allow merging constants of different types as integers. bool NoTypeMatch = (MemVT.isInteger()) ? !MemVT.bitsEq(Other->getMemoryVT()) : Other->getMemoryVT() != MemVT; - if (IsLoadSrc) { + if (StoreSrc == StoreSource::Load) { if (NoTypeMatch) return false; // The Load's Base Ptr must also match @@ -16083,13 +16094,13 @@ void DAGCombiner::getStoreMergeCandidates( } else return false; } - if (IsConstantSrc) { + if (StoreSrc == StoreSource::Constant) { if (NoTypeMatch) return false; if (!(isa(OtherBC) || isa(OtherBC))) return false; } - if (IsExtractVecSrc) { + if (StoreSrc == StoreSource::Extract) { // Do not merge truncated stores here.
if (Other->isTruncatingStore()) return false; @@ -16261,16 +16272,12 @@ bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) { // Perform an early exit check. Do not bother looking at stored values that // are not constants, loads, or extracted vector elements. SDValue StoredVal = peekThroughBitcasts(St->getValue()); - bool IsLoadSrc = isa(StoredVal); - bool IsConstantSrc = isa(StoredVal) || - isa(StoredVal); - bool IsExtractVecSrc = (StoredVal.getOpcode() == ISD::EXTRACT_VECTOR_ELT || - StoredVal.getOpcode() == ISD::EXTRACT_SUBVECTOR); + StoreSource StoreSrc = getStoreSource(StoredVal); bool IsNonTemporalStore = St->isNonTemporal(); - bool IsNonTemporalLoad = - IsLoadSrc && cast(StoredVal)->isNonTemporal(); + bool IsNonTemporalLoad = StoreSrc == StoreSource::Load && + cast(StoredVal)->isNonTemporal(); - if (!IsConstantSrc && !IsLoadSrc && !IsExtractVecSrc) + if (StoreSrc == StoreSource::Unknown) return false; SmallVector StoreNodes; @@ -16335,7 +16342,7 @@ bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) { const DataLayout &DL = DAG.getDataLayout(); // Store the constants into memory as one consecutive store. - if (IsConstantSrc) { + if (StoreSrc == StoreSource::Constant) { while (NumConsecutiveStores >= 2) { LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; unsigned FirstStoreAS = FirstInChain->getAddressSpace(); @@ -16454,7 +16461,7 @@ bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) { // When extracting multiple vector elements, try to store them // in one vector store rather than a sequence of scalar stores. - if (IsExtractVecSrc) { + if (StoreSrc == StoreSource::Extract) { // Loop on Consecutive Stores on success. while (NumConsecutiveStores >= 2) { LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; @@ -16522,6 +16529,7 @@ bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) { // Below we handle the case of multiple consecutive stores that // come from multiple consecutive loads. 
We merge them into a single // wide load and a single wide store. + assert(StoreSrc == StoreSource::Load && "Expected load source for store"); // Look for load nodes which are used by the stored values. SmallVector LoadNodes; From llvm-commits at lists.llvm.org Wed Jul 8 09:50:19 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Wed, 08 Jul 2020 09:50:19 -0700 (PDT) Subject: [llvm] 683a7f7 - [DAGCombiner] fix function-name formatting; NFC Message-ID: <5f05f94b.1c69fb81.ba785.0eeb@mx.google.com> Author: Sanjay Patel Date: 2020-07-08T12:49:59-04:00 New Revision: 683a7f7025b3115053449a76463e11916ecf350f URL: https://github.com/llvm/llvm-project/commit/683a7f7025b3115053449a76463e11916ecf350f DIFF: https://github.com/llvm/llvm-project/commit/683a7f7025b3115053449a76463e11916ecf350f.diff LOG: [DAGCombiner] fix function-name formatting; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index d12cf74e5cd7..27b340ffcb7e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -678,34 +678,31 @@ namespace { /// can be combined into narrow loads. bool BackwardsPropagateMask(SDNode *N); - /// Helper function for MergeConsecutiveStores which merges the - /// component store chains. + /// Helper function for mergeConsecutiveStores which merges the component + /// store chains. SDValue getMergeStoreChains(SmallVectorImpl &StoreNodes, unsigned NumStores); - /// This is a helper function for MergeConsecutiveStores. When the - /// source elements of the consecutive stores are all constants or - /// all extracted vector elements, try to merge them into one - /// larger store introducing bitcasts if necessary. \return True - /// if a merged store was created. 
- bool MergeStoresOfConstantsOrVecElts(SmallVectorImpl &StoreNodes, + /// This is a helper function for mergeConsecutiveStores. When the source + /// elements of the consecutive stores are all constants or all extracted + /// vector elements, try to merge them into one larger store introducing + /// bitcasts if necessary. \return True if a merged store was created. + bool mergeStoresOfConstantsOrVecElts(SmallVectorImpl &StoreNodes, EVT MemVT, unsigned NumStores, bool IsConstantSrc, bool UseVector, bool UseTrunc); - /// This is a helper function for MergeConsecutiveStores. Stores - /// that potentially may be merged with St are placed in - /// StoreNodes. RootNode is a chain predecessor to all store - /// candidates. + /// This is a helper function for mergeConsecutiveStores. Stores that + /// potentially may be merged with St are placed in StoreNodes. RootNode is + /// a chain predecessor to all store candidates. void getStoreMergeCandidates(StoreSDNode *St, SmallVectorImpl &StoreNodes, SDNode *&Root); - /// Helper function for MergeConsecutiveStores. Checks if - /// candidate stores have indirect dependency through their - /// operands. RootNode is the predecessor to all stores calculated - /// by getStoreMergeCandidates and is used to prune the dependency check. - /// \return True if safe to merge. + /// Helper function for mergeConsecutiveStores. Checks if candidate stores + /// have indirect dependency through their operands. RootNode is the + /// predecessor to all stores calculated by getStoreMergeCandidates and is + /// used to prune the dependency check. \return True if safe to merge. bool checkMergeStoreCandidatesForDependencies( SmallVectorImpl &StoreNodes, unsigned NumStores, SDNode *RootNode); @@ -714,7 +711,7 @@ namespace { /// This optimization uses wide integers or vectors when possible. /// \return number of stores that were merged into a merged store (the /// affected nodes are stored as a prefix in \p StoreNodes). 
- bool MergeConsecutiveStores(StoreSDNode *St); + bool mergeConsecutiveStores(StoreSDNode *St); /// Try to transform a truncation where C is a constant: /// (trunc (and X, C)) -> (and (trunc X), (trunc C)) @@ -15863,7 +15860,7 @@ SDValue DAGCombiner::getMergeStoreChains(SmallVectorImpl &StoreNodes, return DAG.getTokenFactor(StoreDL, Chains); } -bool DAGCombiner::MergeStoresOfConstantsOrVecElts( +bool DAGCombiner::mergeStoresOfConstantsOrVecElts( SmallVectorImpl &StoreNodes, EVT MemVT, unsigned NumStores, bool IsConstantSrc, bool UseVector, bool UseTrunc) { // Make sure we have something to merge. @@ -16241,7 +16238,7 @@ bool DAGCombiner::checkMergeStoreCandidatesForDependencies( return true; } -bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) { +bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { if (OptLevel == CodeGenOpt::None || !EnableStoreMerging) return false; @@ -16449,7 +16446,7 @@ bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) { continue; } - RV |= MergeStoresOfConstantsOrVecElts(StoreNodes, MemVT, NumElem, true, + RV |= mergeStoresOfConstantsOrVecElts(StoreNodes, MemVT, NumElem, true, UseVector, LastIntegerTrunc); // Remove merged stores for next iteration. @@ -16516,7 +16513,7 @@ bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) { continue; } - RV |= MergeStoresOfConstantsOrVecElts( + RV |= mergeStoresOfConstantsOrVecElts( StoreNodes, MemVT, NumStoresToMerge, false, true, false); StoreNodes.erase(StoreNodes.begin(), @@ -17046,7 +17043,7 @@ SDValue DAGCombiner::visitSTORE(SDNode *N) { // There can be multiple store sequences on the same chain. // Keep trying to merge store sequences until we are unable to do so // or until we merge the last store on the chain. - bool Changed = MergeConsecutiveStores(ST); + bool Changed = mergeConsecutiveStores(ST); if (!Changed) break; // Return N as merge only uses CombineTo and no worklist clean // up is necessary. 
@@ -21890,10 +21887,10 @@ bool operator!=(const UnitT &, const UnitT &) { return false; }
 // redundant, as this function gets called when visiting every store
 // node, so why not let the work be done on each store as it's visited?
 //
-// I believe this is mainly important because MergeConsecutiveStores
+// I believe this is mainly important because mergeConsecutiveStores
 // is unable to deal with merging stores of different sizes, so unless
 // we improve the chains of all the potential candidates up-front
-// before running MergeConsecutiveStores, it might only see some of
+// before running mergeConsecutiveStores, it might only see some of
 // the nodes that will eventually be candidates, and then not be able
 // to go from a partially-merged state to the desired final
 // fully-merged state.



From llvm-commits at lists.llvm.org Wed Jul 8 09:56:30 2020
From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 16:56:30 +0000 (UTC)
Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs.
In-Reply-To:
References:
Message-ID: <3d2f34ad87a56d40edd41c7e6fc170e1@localhost.localdomain>

dantrushin updated this revision to Diff 276470.
dantrushin added a comment.

Add (hand crafted) test for shared landing pad;
Slightly change cache handling code to better handle shared landing pads;
Improve debug output;


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81647/new/

https://reviews.llvm.org/D81647

Files:
  llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp
  llvm/test/CodeGen/X86/statepoint-fixup-call.mir
  llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir
  llvm/test/CodeGen/X86/statepoint-fixup-shared-ehpad.mir
  llvm/test/CodeGen/X86/statepoint-vreg.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D81647.276470.patch Type: text/x-patch Size: 41533 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:58:02 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:58:02 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. Message-ID: paulwalker-arm created this revision. Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett. Herald added a reviewer: rengolin. Herald added a reviewer: efriedma. Herald added a project: LLVM. Fixed length vector code generation for SVE does not yet custom lower BUILD_VECTOR and instead relies on expansion. At the same time, custom lowering for VECTOR_SHUFFLE is also not available, so this patch disables using VECTOR_SHUFFLE to expand BUILD_VECTOR. Related to this, it also prevents the merging of stores after legalisation because this only works when BUILD_VECTOR is either legal or can be eliminated. When this is not the case, the code generator enters an infinite legalisation loop. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83408 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll +++ llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -3,6 +3,18 @@ target triple = "aarch64-unknown-linux-gnu" +; Currently there is no custom lowering for vector shuffles operating on types +; bigger than NEON. However, having no support opens us up to a code generator +; hang when expanding BUILD_VECTOR. Here we just validate the promblematic case +; successfully exits code generation.
+define void @hang_when_merging_stores_after_legalisation(<8 x i32>* %a, <2 x i32> %b) #0 { +; CHECK-LABEL: hang_when_merging_stores_after_legalisation: + %splat = shufflevector <2 x i32> %b, <2 x i32> undef, <8 x i32> zeroinitializer + %interleaved.vec = shufflevector <8 x i32> %splat, <8 x i32> undef, <8 x i32> + store <8 x i32> %interleaved.vec, <8 x i32>* %a, align 4 + ret void +} + ; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in ; validating all combinations of vector type. Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -734,6 +734,18 @@ bool fallBackToDAGISel(const Instruction &Inst) const override; + bool + shouldExpandBuildVectorWithShuffles(EVT VT, + unsigned DefinedValues) const override; + + /// SVE code generation for fixed length vectors does not custom lower + /// BUILD_VECTOR. This makes BUILD_VECTOR legalisation a source of stores to + /// merge. However, merging them creates a BUILD_VECTOR that is just as + /// illegal as the original, thus leading to an infinite legalisation loop. + bool mergeStoresAfterLegalization(EVT VT) const override { + return !useSVEForFixedLengthVectors(); + } + private: /// Keep a pointer to the AArch64Subtarget around so that we can /// make the right decision when generating code for different targets. Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -3564,6 +3564,16 @@ } } +// VECTOR_SHUFFLE is not legal for vectors bigger than NEON, so we cannot use +// them to expand BUILD_VECTOR. 
+bool AArch64TargetLowering::shouldExpandBuildVectorWithShuffles( + EVT VT, unsigned DefinedValues) const { + if (useSVEForFixedLengthVectorVT(VT)) + return false; + + return TargetLowering::shouldExpandBuildVectorWithShuffles(VT, DefinedValues); +} + bool AArch64TargetLowering::useSVEForFixedLengthVectors() const { // Prefer NEON unless larger SVE registers are available. return Subtarget->hasSVE() && Subtarget->getMinSVEVectorSizeInBits() >= 256; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83408.276471.patch Type: text/x-patch Size: 3035 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:58:37 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:58:37 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: clementval updated this revision to Diff 276472. clementval marked 3 inline comments as done. clementval added a comment. Remove default in directive switch Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 Files: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83363.276472.patch Type: text/x-patch Size: 7029 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 09:58:44 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:58:44 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <1b51101c73d932e10b8e89ee363800f8@localhost.localdomain> clementval added inline comments. 
================
Comment at: llvm/test/TableGen/directive1.td:124
+// IMPL-NEXT: }
+// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind");
 // IMPL-NEXT: }
----------------
jdenny wrote:
> jdenny wrote:
> > clementval wrote:
> > > jdenny wrote:
> > > > The unreachable message doesn't make sense given the `default` in the directive switch. If that switch covers all directives, `default` isn't needed anyway.
> > > Will remove it.
> > Is the default useful? Are all directives covered by cases?
> This is what I'm thinking of:
>
> http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations
Yeah, for the directive switch we can remove it since it's fully covered. Just pushed an update.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83363/new/

https://reviews.llvm.org/D83363



From llvm-commits at lists.llvm.org Wed Jul 8 09:59:12 2020
From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 16:59:12 +0000 (UTC)
Subject: [PATCH] D82316: [LangRef] Add `noundef` attribute to documentation
In-Reply-To:
References:
Message-ID: <137336fd0275af8cb92aea71fb45f32b@localhost.localdomain>

nikic added a comment.

In D82316#2137939, @guiand wrote:

> Should I land this without waiting for the other two patches?

I'd suggest landing this together with the LLVM part of D81678 (which LGTM), and landing the clang part that actually starts using this and requires the massive test changes separately.

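
Returning to the D83363 inline comment on directive1.td:124: the coding standard jdenny links asks that a switch covering every enumerator carry no `default` label. A minimal sketch of that shape follows, written in plain C++ so that it stands alone. The `Directive` and `Clause` enums are hypothetical stand-ins, not the TableGen-generated ones, and `std::abort()` is used in place of `llvm_unreachable`; this is not the code DirectiveEmitter actually produces.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the generated enums (assumption, for illustration only).
enum class Directive { Parallel, For };
enum class Clause { Private, NoWait, Ordered };

bool isAllowedClauseForDirective(Directive D, Clause C) {
  // Outer switch: every Directive enumerator has a case and there is no
  // `default`, so the switch is fully covered.
  switch (D) {
  case Directive::Parallel:
    // Inner switch: only the allowed clauses are listed, so `default`
    // is still needed here to reject the rest.
    switch (C) {
    case Clause::Private:
      return true;
    default:
      return false;
    }
  case Directive::For:
    switch (C) {
    case Clause::Private:
    case Clause::NoWait:
      return true;
    default:
      return false;
    }
  }
  // Only reached if D holds a value outside the enum. std::abort() is a
  // stand-in for llvm_unreachable("Invalid Tdl Directive kind").
  std::abort();
}
```

With no `default` on the outer switch, compilers that implement `-Wswitch` (Clang and GCC both do) can warn when a directive is added without a matching case. The inner switches keep `default` because they deliberately list only a subset of clauses.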
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82316/new/ https://reviews.llvm.org/D82316 From llvm-commits at lists.llvm.org Wed Jul 8 09:59:53 2020 From: llvm-commits at lists.llvm.org (Jessica Paquette via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 16:59:53 +0000 (UTC) Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for inputs In-Reply-To: References: Message-ID: <4650fc697a147a6b5bb1f181327d615f@localhost.localdomain> paquette added inline comments. ================ Comment at: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp:240 +static bool buildAnyextOrCopy(Register Dst, Register Src, + MachineIRBuilder &MIRBuilder) { ---------------- Would `MachineIRBuilder::buildExtOrTrunc` work here? If not, maybe it would make sense to move this to `MachineIRBuilder` for the sake of consistency? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83384/new/ https://reviews.llvm.org/D83384 From llvm-commits at lists.llvm.org Wed Jul 8 10:02:59 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:02:59 +0000 (UTC) Subject: [PATCH] D83409: [opt] Remove obsolete --quiet option Message-ID: aeubanks created this revision. aeubanks added reviewers: echristo, hans. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. git blame shows these were last touched in 2004? Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83409 Files: llvm/include/llvm/Support/SystemUtils.h llvm/lib/Support/SystemUtils.cpp llvm/tools/llvm-as/llvm-as.cpp llvm/tools/llvm-extract/llvm-extract.cpp llvm/tools/llvm-link/llvm-link.cpp llvm/tools/opt/PassPrinters.cpp llvm/tools/opt/PassPrinters.h llvm/tools/opt/opt.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83409.276473.patch Type: text/x-patch Size: 12390 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:03:31 2020 From: llvm-commits at lists.llvm.org (Tim Renouf via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:03:31 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: tpr added a reviewer: beanz. tpr added a comment. +Chris Bieneman I'm a bit worried that there is a reason why the removed code was there, rather than using a static initializer. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 From llvm-commits at lists.llvm.org Wed Jul 8 10:03:43 2020 From: llvm-commits at lists.llvm.org (Luofan Chen via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:03:43 +0000 (UTC) Subject: [PATCH] D83297: [Attributor][WIP] Attribute scheduling visualization. In-Reply-To: References: Message-ID: bbn added inline comments. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:906 + InfoCache(InfoCache), CGUpdater(CGUpdater), SG(&DG), Allowed(Allowed), SeedingPeriod(true) {} ---------------- Is this patch based on D83185 ? But I think those are 2 irrelevant patches, right? Maybe you should create a branch from master and apply changes to that. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83297/new/ https://reviews.llvm.org/D83297 From llvm-commits at lists.llvm.org Wed Jul 8 10:04:43 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:04:43 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: <6a561efccc3aa356aa3557c0d57f708b@localhost.localdomain> thakis added a comment. Cool. Could you check this in, please? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Wed Jul 8 10:06:46 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:06:46 +0000 (UTC) Subject: [PATCH] D83410: [flang] Fix a crash when cosubscript list is empty Message-ID: PeteSteinfeld created this revision. PeteSteinfeld added reviewers: klausler, tskeith. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. When there are errors in the evaluation of every cosubscript expression in a coindexed object, the compiler would crash. I fixed this by just checking to see if there were errors in the evaluation of the cosubscripts before constructing the `DataRef` for the coindexed object. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83410 Files: flang/lib/Semantics/expression.cpp flang/test/Semantics/resolve94.f90 Index: flang/test/Semantics/resolve94.f90 =================================================================== --- flang/test/Semantics/resolve94.f90 +++ flang/test/Semantics/resolve94.f90 @@ -13,6 +13,7 @@ integer, dimension(4) :: intArray integer :: intScalarCoarray[*] integer :: intCoarray[3, 4, *] + integer :: smallIntCoarray[4, *] intCoVar = 343 ! OK rVar1 = rCoarray[1,2,3] @@ -20,6 +21,8 @@ rVar1 = rCoarray[1,2] !ERROR: Must have INTEGER type, but is REAL(4) rVar1 = rCoarray[1,2,3.4] + !ERROR: Must have INTEGER type, but is REAL(4) + iVar1 = smallIntCoarray[3.4] !ERROR: Must be a scalar value, but is a rank-1 array rVar1 = rCoarray[1,intArray,3] ! 
OK Index: flang/lib/Semantics/expression.cpp =================================================================== --- flang/lib/Semantics/expression.cpp +++ flang/lib/Semantics/expression.cpp @@ -1089,15 +1089,17 @@ std::get>(x.imageSelector.t)) { std::visit( common::visitors{ - [&](const auto &x) {Analyze(x.v); }, + [&](const auto &x) { Analyze(x.v); }, }, imageSelSpec.u); } // Reverse the chain of symbols so that the base is first and coarray // ultimate component is last. - return Designate( - DataRef{CoarrayRef{SymbolVector{reversed.crbegin(), reversed.crend()}, - std::move(subscripts), std::move(cosubscripts)}}); + if (cosubsOk) { + return Designate( + DataRef{CoarrayRef{SymbolVector{reversed.crbegin(), reversed.crend()}, + std::move(subscripts), std::move(cosubscripts)}}); + } } return std::nullopt; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83410.276474.patch Type: text/x-patch Size: 1706 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:12:00 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:12:00 +0000 (UTC) Subject: [PATCH] D83410: [flang] Fix a crash when cosubscript list is empty In-Reply-To: References: Message-ID: <9bf9a3ff57bee9d9e163cb14783a0f00@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG9520b6c8ab63: [flang] Fix a crash when cosubscript list is empty (authored by PeteSteinfeld). 
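For readers skimming the archive, the shape of this fix can be sketched outside of flang. This is only an illustration under stated assumptions: `analyzeCosubscript`, `CoarrayRefSketch`, and `analyzeImageSelector` are hypothetical names, not flang APIs, and "analysis" is reduced to accepting short unsigned integer literals. Only the `cosubsOk` guard before constructing the result mirrors the patch.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for analyzing one cosubscript expression: it succeeds
// only for short, non-empty strings of decimal digits (an "INTEGER" value).
std::optional<int> analyzeCosubscript(const std::string &text) {
  if (text.empty() || text.size() > 9)
    return std::nullopt;
  for (char c : text)
    if (c < '0' || c > '9')
      return std::nullopt; // e.g. "3.4" is REAL, not INTEGER
  return std::stoi(text);
}

// Hypothetical result type standing in for the coindexed-object reference.
struct CoarrayRefSketch {
  std::vector<int> cosubscripts;
};

std::optional<CoarrayRefSketch>
analyzeImageSelector(const std::vector<std::string> &texts) {
  std::vector<int> cosubscripts;
  bool cosubsOk = true;
  for (const std::string &text : texts) {
    if (std::optional<int> value = analyzeCosubscript(text))
      cosubscripts.push_back(*value);
    else
      cosubsOk = false; // keep going so every bad cosubscript is visited
  }
  // The guard: build the result only if every cosubscript analyzed cleanly.
  if (cosubsOk)
    return CoarrayRefSketch{std::move(cosubscripts)};
  return std::nullopt;
}
```

With this sketch, a selector like `[3.4]` (the new `smallIntCoarray[3.4]` test case) yields no result rather than a reference built from an empty cosubscript list.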
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83410/new/ https://reviews.llvm.org/D83410 Files: flang/lib/Semantics/expression.cpp flang/test/Semantics/resolve94.f90 -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83410.276477.patch Type: text/x-patch Size: 1706 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:14:09 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:14:09 +0000 (UTC) Subject: [PATCH] D83330: [PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. In-Reply-To: References: Message-ID: <4f0eac625725c4b14c210c5923e6e46f@localhost.localdomain> yamauchi updated this revision to Diff 276478. yamauchi added a comment. Address comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83330/new/ https://reviews.llvm.org/D83330 Files: llvm/lib/Target/X86/X86FixupLEAs.cpp llvm/lib/Target/X86/X86PadShortFunction.cpp llvm/test/CodeGen/X86/fixup-lea.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83330.276478.patch Type: text/x-patch Size: 5477 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:14:43 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:14:43 +0000 (UTC) Subject: [PATCH] D83330: [PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. In-Reply-To: References: Message-ID: yamauchi marked an inline comment as done. yamauchi added inline comments. ================ Comment at: llvm/test/CodeGen/X86/opt-pipeline.ll:187 ; CHECK-NEXT: X86 Atom pad short functions +; CHECK-NEXT: Lazy Machine Block Frequency Analysis ; CHECK-NEXT: X86 LEA Fixup ---------------- nikic wrote: > Side note: You might want to mark LazyMBFI as preserved in X86PadShortFunction, I doubt that pass changes anything related to block frequency. Done. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83330/new/ https://reviews.llvm.org/D83330 From llvm-commits at lists.llvm.org Wed Jul 8 10:15:24 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Wed, 08 Jul 2020 10:15:24 -0700 (PDT) Subject: [lld] 4ce56b8 - [ELF] Add -z dead-reloc-in-nonalloc== Message-ID: <5f05ff2c.1c69fb81.307aa.0f33@mx.google.com> Author: Fangrui Song Date: 2020-07-08T10:15:16-07:00 New Revision: 4ce56b8122219f7b79d75d33184e3ec890a6e222 URL: https://github.com/llvm/llvm-project/commit/4ce56b8122219f7b79d75d33184e3ec890a6e222 DIFF: https://github.com/llvm/llvm-project/commit/4ce56b8122219f7b79d75d33184e3ec890a6e222.diff LOG: [ELF] Add -z dead-reloc-in-nonalloc== ... to customize the tombstone value we use for an absolute relocation referencing a discarded symbol. This can be used as a workaround when some debug processing tool has trouble with current -1 tombstone value (https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c11 ) For example, to get the current built-in rules (not considering the .debug_line special case for ICF): ``` -z dead-reloc-in-nonalloc='.debug_*=0xffffffffffffffff' -z dead-reloc-in-nonalloc=.debug_loc=0xfffffffffffffffe -z dead-reloc-in-nonalloc=.debug_ranges=0xfffffffffffffffe ``` To get GNU ld (as of binutils 2.35)'s behavior: ``` -z dead-reloc-in-nonalloc='*=0' -z dead-reloc-in-nonalloc=.debug_ranges=1 ``` This option has other use cases. For example, if we want to check whether a non-SHF_ALLOC section has dead relocations. With this patch, we can run a regular LLD and run another with a special -z dead-reloc-in-nonalloc=, then compare their output. 
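A minimal, self-contained sketch of the "last matching option wins" lookup described above. It is not the lld implementation: `simpleGlobMatch` is a hypothetical stand-in for `GlobPattern` that understands only `*`, and `DeadRelocOption` and `pickTombstone` are names invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, simplified glob: '*' matches any run of characters, every
// other character must match literally (no '?', no '[...]' classes).
bool simpleGlobMatch(std::string_view pat, std::string_view s) {
  std::size_t p = 0, i = 0, mark = 0;
  std::size_t star = std::string_view::npos;
  while (i < s.size()) {
    if (p < pat.size() && pat[p] == '*') {
      star = p++; // remember the wildcard and try matching zero characters
      mark = i;
    } else if (p < pat.size() && pat[p] == s[i]) {
      ++p;
      ++i;
    } else if (star != std::string_view::npos) {
      p = star + 1; // backtrack: let the last '*' absorb one more character
      i = ++mark;
    } else {
      return false;
    }
  }
  while (p < pat.size() && pat[p] == '*')
    ++p;
  return p == pat.size();
}

// One "-z dead-reloc-in-nonalloc=<section_glob>=<value>" option.
using DeadRelocOption = std::pair<std::string, std::uint64_t>;

// Walk the options from last to first so that a later option overrides an
// earlier one when both match the section name.
std::optional<std::uint64_t>
pickTombstone(const std::vector<DeadRelocOption> &options,
              std::string_view sectionName) {
  for (auto it = options.rbegin(); it != options.rend(); ++it)
    if (simpleGlobMatch(it->first, sectionName))
      return it->second;
  return std::nullopt; // no option matched: fall back to the built-in rules
}
```

Fed the three options from the "built-in rules" example, this sketch picks `0xfffffffffffffffe` for `.debug_loc` and `0xffffffffffffffff` for `.debug_info`, which is why listing the least specific pattern first is the recommended order.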
Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D83264 Added: lld/test/ELF/dead-reloc-in-nonalloc.s Modified: lld/ELF/Config.h lld/ELF/Driver.cpp lld/ELF/InputSection.cpp lld/docs/ld.lld.1 lld/test/ELF/debug-dead-reloc.s Removed: ################################################################################ diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h index 9486ef233037..e74a4a0c5b22 100644 --- a/lld/ELF/Config.h +++ b/lld/ELF/Config.h @@ -145,6 +145,7 @@ struct Configuration { bool checkSections; bool compressDebugSections; bool cref; + std::vector> deadRelocInNonAlloc; bool defineCommon; bool demangle = true; bool dependentLibraries; diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp index 4e025071fc1e..301f11359823 100644 --- a/lld/ELF/Driver.cpp +++ b/lld/ELF/Driver.cpp @@ -444,6 +444,7 @@ static bool isKnownZFlag(StringRef s) { s == "rela" || s == "relro" || s == "retpolineplt" || s == "rodynamic" || s == "shstk" || s == "text" || s == "undefs" || s == "wxneeded" || s.startswith("common-page-size=") || + s.startswith("dead-reloc-in-nonalloc=") || s.startswith("max-page-size=") || s.startswith("stack-size=") || s.startswith("start-stop-visibility="); } @@ -1069,6 +1070,27 @@ static void readConfigs(opt::InputArgList &args) { config->zText = getZFlag(args, "text", "notext", true); config->zWxneeded = hasZOption(args, "wxneeded"); + for (opt::Arg *arg : args.filtered(OPT_z)) { + std::pair option = + StringRef(arg->getValue()).split('='); + if (option.first != "dead-reloc-in-nonalloc") + continue; + constexpr StringRef errPrefix = "-z dead-reloc-in-nonalloc=: "; + std::pair kv = option.second.split('='); + if (kv.first.empty() || kv.second.empty()) { + error(errPrefix + "expected ="); + continue; + } + uint64_t v; + if (!to_integer(kv.second, v)) + error(errPrefix + "expected a non-negative integer, but got '" + + kv.second + "'"); + else if (Expected pat = GlobPattern::create(kv.first)) + 
config->deadRelocInNonAlloc.emplace_back(std::move(*pat), v); + else + error(errPrefix + toString(pat.takeError())); + } + // Parse LTO options. if (auto *arg = args.getLastArg(OPT_plugin_opt_mcpu_eq)) parseClangOption(saver.save("-mcpu=" + StringRef(arg->getValue())), diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp index fa7c0fb9b4c1..7a7ebd974909 100644 --- a/lld/ELF/InputSection.cpp +++ b/lld/ELF/InputSection.cpp @@ -857,6 +857,12 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef rels) { const bool isDebugLocOrRanges = isDebug && (name == ".debug_loc" || name == ".debug_ranges"); const bool isDebugLine = isDebug && name == ".debug_line"; + Optional tombstone; + for (const auto &patAndValue : llvm::reverse(config->deadRelocInNonAlloc)) + if (patAndValue.first.match(this->name)) { + tombstone = patAndValue.second; + break; + } for (const RelTy &rel : rels) { RelType type = rel.getType(config->isMips64EL); @@ -907,7 +913,8 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef rels) { continue; } - if (isDebug && (type == target->symbolicRel || expr == R_DTPREL)) { + if (tombstone || + (isDebug && (type == target->symbolicRel || expr == R_DTPREL))) { // Resolve relocations in .debug_* referencing (discarded symbols or ICF // folded section symbols) to a tombstone value. Resolving to addend is // unsatisfactory because the result address range may collide with a @@ -935,8 +942,11 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef rels) { auto *ds = dyn_cast(&sym); if (!sym.getOutputSection() || (ds && ds->section->repl != ds->section && !isDebugLine)) { - target->relocateNoSym(bufLoc, type, - isDebugLocOrRanges ? UINT64_MAX - 1 : UINT64_MAX); + // If -z dead-reloc-in-nonalloc= is specified, respect it. + const uint64_t value = + tombstone ? SignExtend64(*tombstone) + : (isDebugLocOrRanges ? 
UINT64_MAX - 1 : UINT64_MAX); + target->relocateNoSym(bufLoc, type, value); continue; } } diff --git a/lld/docs/ld.lld.1 b/lld/docs/ld.lld.1 index 3acc818afa22..5edeaf85f93f 100644 --- a/lld/docs/ld.lld.1 +++ b/lld/docs/ld.lld.1 @@ -625,6 +625,13 @@ Use wrapper functions for symbol. Linker option extensions. .Bl -tag -width indent -compact .Pp +.It Cm dead-reloc-in-nonalloc Ns = Ns Ar section_glob=value +Resolve a relocation in a matched non-SHF_ALLOC section referencing a discarded symbol to +.Ar value +Accepts globs, in the event of a section matching more than one option, the last +option takes precedence. An order of least specific to most specific match is +recommended. +.Pp .It Cm execstack Make the main stack executable. Stack permissions are recorded in the diff --git a/lld/test/ELF/dead-reloc-in-nonalloc.s b/lld/test/ELF/dead-reloc-in-nonalloc.s new file mode 100644 index 000000000000..00d3d2cbc4a8 --- /dev/null +++ b/lld/test/ELF/dead-reloc-in-nonalloc.s @@ -0,0 +1,69 @@ +# REQUIRES: x86 +## Test that -z dead-reloc-in-nonalloc= can customize the tombstone value we +## use for an absolute relocation referencing a discarded symbol. + +# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t.o +# RUN: ld.lld --icf=all -z dead-reloc-in-nonalloc=.debug_info=0xaaaaaaaa \ +# RUN: -z dead-reloc-in-nonalloc=.not_debug=0xbbbbbbbb %t.o -o %t +# RUN: llvm-objdump -s %t | FileCheck %s --check-prefixes=COMMON,AA +## 0xaaaaaaaa == 2863311530 +# RUN: ld.lld --icf=all -z dead-reloc-in-nonalloc=.debug_info=2863311530 \ +# RUN: -z dead-reloc-in-nonalloc=.not_debug=0xbbbbbbbb %t.o -o - | cmp %t - + +# COMMON: Contents of section .debug_addr: +# COMMON-NEXT: 0000 [[ADDR:[0-9a-f]+]] 00000000 ffffffff ffffffff + +# AA: Contents of section .debug_info: +# AA-NEXT: 0000 [[ADDR]] 00000000 aaaaaaaa 00000000 +# AA: Contents of section .not_debug: +# AA-NEXT: 0000 bbbbbbbb + +## Specifying zero can get a behavior similar to GNU ld. 
+# RUN: ld.lld --icf=all -z dead-reloc-in-nonalloc=.debug_info=0 %t.o -o %tzero +# RUN: llvm-objdump -s %tzero | FileCheck %s --check-prefixes=COMMON,ZERO + +# ZERO: Contents of section .debug_info: +# ZERO-NEXT: 0000 {{[0-9a-f]+}}000 00000000 00000000 00000000 + +## Glob works. +# RUN: ld.lld --icf=all -z dead-reloc-in-nonalloc='.debug_i*=0xaaaaaaaa' \ +# RUN: -z dead-reloc-in-nonalloc='[.]not_debug=0xbbbbbbbb' %t.o -o - | cmp %t - + +## If a section matches multiple option. The last option wins. +# RUN: ld.lld --icf=all -z dead-reloc-in-nonalloc='.debug_info=1' \ +# RUN: -z dead-reloc-in-nonalloc='.debug_i*=0' %t.o -o - | cmp %tzero - + +## Test all possible invalid cases. +# RUN: not ld.lld -z dead-reloc-in-nonalloc= 2>&1 | FileCheck %s --check-prefix=USAGE +# RUN: not ld.lld -z dead-reloc-in-nonalloc=a= 2>&1 | FileCheck %s --check-prefix=USAGE +# RUN: not ld.lld -z dead-reloc-in-nonalloc==0 2>&1 | FileCheck %s --check-prefix=USAGE + +# USAGE: error: -z dead-reloc-in-nonalloc=: expected = + +# RUN: not ld.lld -z dead-reloc-in-nonalloc=a=-1 2>&1 | FileCheck %s --check-prefix=NON-INTEGER + +# NON-INTEGER: error: -z dead-reloc-in-nonalloc=: expected a non-negative integer, but got '-1' + +# RUN: not ld.lld -z dead-reloc-in-nonalloc='['=0 2>&1 | FileCheck %s --check-prefix=INVALID + +# INVALID: error: -z dead-reloc-in-nonalloc=: invalid glob pattern: [ + +.globl _start +_start: + ret + +## .text.1 will be folded by ICF. +.section .text.1,"ax" + ret + +.section .debug_addr + .quad .text+8 + .quad .text.1+8 + +.section .debug_info + .quad .text+8 + .quad .text.1+8 + +## Test a non-.debug_ section. 
+.section .not_debug + .long .text.1+8 diff --git a/lld/test/ELF/debug-dead-reloc.s b/lld/test/ELF/debug-dead-reloc.s index 7e6dc8d52b37..d784519e9af4 100644 --- a/lld/test/ELF/debug-dead-reloc.s +++ b/lld/test/ELF/debug-dead-reloc.s @@ -19,6 +19,13 @@ # CHECK-NEXT: 0000 ffffffff ffffffff 08000000 00000000 # CHECK-NEXT: 0010 ffffffff ffffffff 08000000 00000000 +## -z dead-reloc-in-nonalloc= can override the tombstone value. +# RUN: ld.lld --gc-sections -z dead-reloc-in-nonalloc=.debug_loc=42 %t.o %t1.o %t1.o -o %t42 +# RUN: llvm-objdump -s %t42 | FileCheck %s --check-prefix=OVERRIDE + +# OVERRIDE: Contents of section .debug_loc: +# OVERRIDE-NEXT: 0000 2a000000 00000000 2a000000 00000000 + .section .text.1,"ax" .byte 0 .section .text.2,"axe" From llvm-commits at lists.llvm.org Wed Jul 8 10:15:27 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:15:27 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: <481034ec2057397f3da2ee13e716fb7b@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG4ce56b812221: [ELF] Add -z dead-reloc-in-nonalloc=<section_glob>=<value> (authored by MaskRay). Changed prior to commit: https://reviews.llvm.org/D83264?vs=276098&id=276479#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 Files: lld/ELF/Config.h lld/ELF/Driver.cpp lld/ELF/InputSection.cpp lld/docs/ld.lld.1 lld/test/ELF/dead-reloc-in-nonalloc.s lld/test/ELF/debug-dead-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83264.276479.patch Type: text/x-patch Size: 8037 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:15:39 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:15:39 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: nickdesaulniers added inline comments. ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:68-70 + if (!AS.getNumAttributes()) + return; // No attributes to begin with. + visitAttributeSet(AS, GlobalVariablesToRefine[&GV]); ---------------- ``` if (AS.getNumAttributes()) visitAttributeSet(AS, GlobalVariablesToRefine[&GV]); ``` ================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:60-61 + void visitModule(Module &M) { + for_each(M.getGlobalList(), + [&](GlobalVariable &GV) { visitGlobalVariable(GV); }); + } ---------------- lebedev.ri wrote: > dblaikie wrote: > > range-based-for loop, probably? > Hm, can use either. Why not `for_each` ? I find the range-for more concise, FWIW. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 10:18:20 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:18:20 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: jdenny accepted this revision. jdenny added a comment. This revision is now accepted and ready to land. Other than the last bit of cleanup I commented on, LGTM. 
================ Comment at: llvm/test/TableGen/directive1.td:124 +// IMPL-NEXT: } +// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } ---------------- clementval wrote: > jdenny wrote: > > jdenny wrote: > > > clementval wrote: > > > > jdenny wrote: > > > > > The unreachable message doesn't make sense given the `default` in the directive switch. If that switch covers all directives, `default` isn't needed anyway. > > > > Will remove it. > > > Is the default useful? Are all directives covered by cases? > > This is what I'm thinking of: > > > > http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations > Yeah, for directive we can remove it since it's fully covered. Just pushed an update. But it needs `llvm_unreachable`, whose message makes sense now that `default` is removed. Also, the last update removed the wrong `default` from the emitter. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Wed Jul 8 10:22:42 2020 From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:22:42 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: <1b00da44641bd0a52397bf1e77fb7281@localhost.localdomain> davidxl added a comment. Can you first split out the NFC part (the refactoring, such as GetEntryForPercentile)? ================ Comment at: llvm/include/llvm/ProfileData/InstrProf.h:682 /// Scale up value profile data counts. - void scale(uint64_t Weight, function_ref Warn); + void scale(uint64_t Norm, uint64_t DeNorm, + function_ref Warn); ---------------- Document the parameters.
Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81981/new/ https://reviews.llvm.org/D81981 From llvm-commits at lists.llvm.org Wed Jul 8 10:24:30 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Wed, 08 Jul 2020 10:24:30 -0700 (PDT) Subject: [llvm] e89c075 - [test] Run llvm/test/**/*.yaml & don't run llvm/test/**/*.cxx (not exist) Message-ID: <5f06014e.1c69fb81.e3e56.08ca@mx.google.com> Author: Fangrui Song Date: 2020-07-08T10:22:49-07:00 New Revision: e89c075f3251bc4778dceb890388483151f24659 URL: https://github.com/llvm/llvm-project/commit/e89c075f3251bc4778dceb890388483151f24659 DIFF: https://github.com/llvm/llvm-project/commit/e89c075f3251bc4778dceb890388483151f24659.diff LOG: [test] Run llvm/test/**/*.yaml & don't run llvm/test/**/*.cxx (not exist) This patch extends D58439 (`llvm/test/{yaml2obj,obj2yaml}/**/*.yaml`) and runs all `llvm/test/**/*.yaml` Many directories have configured `.yaml` (see the deleted lit.local.cfg files). Yet still some don't configure .yaml and have caused stale tests: * 8c5825befb7bbb2e76f7eccedc6d3bf26e9b2a6a test/llvm-readobj * bdc3134e237737dd46b51cd1ecd41ecbbe9f921a test/ExecutionEngine Just hoist .yaml to `llvm/test/lit.cfg.py`. Also delete .cxx which is not used. The number of tests running on my machine increases from 38304 to 38309. 
The list of new tests: ``` ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml Object/archive-error-tmp.txt tools/llvm-ar/coff-weak.yaml tools/llvm-readobj/ELF/verneed-flags.yaml tools/obj2yaml/COFF/bss.s ``` Reviewed By: grimar, jhenderson, rupprecht Differential Revision: https://reviews.llvm.org/D83350 Added: Modified: llvm/test/lit.cfg.py llvm/test/tools/llvm-as/lit.local.cfg llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg llvm/test/tools/llvm-nm/lit.local.cfg Removed: llvm/test/Object/lit.local.cfg llvm/test/ObjectYAML/lit.local.cfg llvm/test/tools/llvm-dwarfdump/lit.local.cfg llvm/test/tools/llvm-objdump/lit.local.cfg llvm/test/tools/llvm-readobj/COFF/lit.local.cfg llvm/test/tools/llvm-xray/X86/lit.local.cfg llvm/test/tools/obj2yaml/lit.local.cfg llvm/test/tools/yaml2obj/lit.local.cfg ################################################################################ diff --git a/llvm/test/Object/lit.local.cfg b/llvm/test/Object/lit.local.cfg deleted file mode 100644 index ec8ad451d2da..000000000000 --- a/llvm/test/Object/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] diff --git a/llvm/test/ObjectYAML/lit.local.cfg b/llvm/test/ObjectYAML/lit.local.cfg deleted file mode 100644 index 8169b9f95e11..000000000000 --- a/llvm/test/ObjectYAML/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml'] diff --git a/llvm/test/lit.cfg.py b/llvm/test/lit.cfg.py index a3a97dd3b5c8..4eaa6cb77c82 100644 --- a/llvm/test/lit.cfg.py +++ b/llvm/test/lit.cfg.py @@ -22,7 +22,7 @@ # suffixes: A list of file extensions to treat as test files. This is overriden # by individual lit.local.cfg files in the test subdirectories. -config.suffixes = ['.ll', '.c', '.cxx', '.test', '.txt', '.s', '.mir'] +config.suffixes = ['.ll', '.c', '.test', '.txt', '.s', '.mir', '.yaml'] # excludes: A list of directories to exclude from the testsuite. 
The 'Inputs' # subdirectories contain auxiliary inputs for various tests in their parent diff --git a/llvm/test/tools/llvm-as/lit.local.cfg b/llvm/test/tools/llvm-as/lit.local.cfg index 1fc0bea084ca..c8625f4d9d24 100644 --- a/llvm/test/tools/llvm-as/lit.local.cfg +++ b/llvm/test/tools/llvm-as/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', '.s', '.test', '.yaml'] diff --git a/llvm/test/tools/llvm-dwarfdump/lit.local.cfg b/llvm/test/tools/llvm-dwarfdump/lit.local.cfg deleted file mode 100644 index ec8ad451d2da..000000000000 --- a/llvm/test/tools/llvm-dwarfdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] diff --git a/llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg b/llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg index e06c15ef1413..8a995e36a127 100644 --- a/llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg +++ b/llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg @@ -1,4 +1,2 @@ if not ('ARM' in config.root.targets and 'AArch64' in config.root.targets): config.unsupported = True - -config.suffixes = ['.test', '.yaml'] diff --git a/llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg b/llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg index 52c762f5cfb8..c8625f4d9d24 100644 --- a/llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg +++ b/llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.test', '.yaml'] diff --git a/llvm/test/tools/llvm-nm/lit.local.cfg b/llvm/test/tools/llvm-nm/lit.local.cfg index 1fc0bea084ca..c8625f4d9d24 100644 --- a/llvm/test/tools/llvm-nm/lit.local.cfg +++ b/llvm/test/tools/llvm-nm/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', '.s', '.test', '.yaml'] diff --git a/llvm/test/tools/llvm-objdump/lit.local.cfg 
b/llvm/test/tools/llvm-objdump/lit.local.cfg deleted file mode 100644 index c3e092a7ba89..000000000000 --- a/llvm/test/tools/llvm-objdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml', '.txt'] diff --git a/llvm/test/tools/llvm-readobj/COFF/lit.local.cfg b/llvm/test/tools/llvm-readobj/COFF/lit.local.cfg deleted file mode 100644 index 38f335368f17..000000000000 --- a/llvm/test/tools/llvm-readobj/COFF/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes.add('.yaml') diff --git a/llvm/test/tools/llvm-xray/X86/lit.local.cfg b/llvm/test/tools/llvm-xray/X86/lit.local.cfg deleted file mode 100644 index 4f00369e13d7..000000000000 --- a/llvm/test/tools/llvm-xray/X86/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml', '.ll', '.txt'] diff --git a/llvm/test/tools/obj2yaml/lit.local.cfg b/llvm/test/tools/obj2yaml/lit.local.cfg deleted file mode 100644 index db82cc231003..000000000000 --- a/llvm/test/tools/obj2yaml/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] diff --git a/llvm/test/tools/yaml2obj/lit.local.cfg b/llvm/test/tools/yaml2obj/lit.local.cfg deleted file mode 100644 index db82cc231003..000000000000 --- a/llvm/test/tools/yaml2obj/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] From llvm-commits at lists.llvm.org Wed Jul 8 10:24:42 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:24:42 +0000 (UTC) Subject: [PATCH] D83350: [test] Run llvm/test/**/*.yaml & don't run llvm/test/**/*.cxx (not exist) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGe89c075f3251: [test] Run llvm/test/**/*.yaml & don't run llvm/test/**/*.cxx (not exist) (authored by MaskRay). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83350/new/ https://reviews.llvm.org/D83350 Files: llvm/test/Object/lit.local.cfg llvm/test/ObjectYAML/lit.local.cfg llvm/test/lit.cfg.py llvm/test/tools/llvm-as/lit.local.cfg llvm/test/tools/llvm-dwarfdump/lit.local.cfg llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg llvm/test/tools/llvm-nm/lit.local.cfg llvm/test/tools/llvm-objdump/lit.local.cfg llvm/test/tools/llvm-readobj/COFF/lit.local.cfg llvm/test/tools/llvm-xray/X86/lit.local.cfg llvm/test/tools/obj2yaml/lit.local.cfg llvm/test/tools/yaml2obj/lit.local.cfg Index: llvm/test/tools/yaml2obj/lit.local.cfg =================================================================== --- llvm/test/tools/yaml2obj/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/obj2yaml/lit.local.cfg =================================================================== --- llvm/test/tools/obj2yaml/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-xray/X86/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-xray/X86/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml', '.ll', '.txt'] Index: llvm/test/tools/llvm-readobj/COFF/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-readobj/COFF/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes.add('.yaml') Index: llvm/test/tools/llvm-objdump/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-objdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml', '.txt'] Index: llvm/test/tools/llvm-nm/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-nm/lit.local.cfg +++ 
llvm/test/tools/llvm-nm/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', '.s', '.test', '.yaml'] Index: llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg +++ llvm/test/tools/llvm-gsymutil/X86/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg +++ llvm/test/tools/llvm-gsymutil/ARM_AArch64/lit.local.cfg @@ -1,4 +1,2 @@ if not ('ARM' in config.root.targets and 'AArch64' in config.root.targets): config.unsupported = True - -config.suffixes = ['.test', '.yaml'] Index: llvm/test/tools/llvm-dwarfdump/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-dwarfdump/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] Index: llvm/test/tools/llvm-as/lit.local.cfg =================================================================== --- llvm/test/tools/llvm-as/lit.local.cfg +++ llvm/test/tools/llvm-as/lit.local.cfg @@ -1,4 +1,2 @@ if not 'X86' in config.root.targets: config.unsupported = True - -config.suffixes = ['.ll', '.s', '.test', '.yaml'] Index: llvm/test/lit.cfg.py =================================================================== --- llvm/test/lit.cfg.py +++ llvm/test/lit.cfg.py @@ -22,7 +22,7 @@ # suffixes: A list of file extensions to treat as test files. This is overriden # by individual lit.local.cfg files in the test subdirectories. 
-config.suffixes = ['.ll', '.c', '.cxx', '.test', '.txt', '.s', '.mir'] +config.suffixes = ['.ll', '.c', '.test', '.txt', '.s', '.mir', '.yaml'] # excludes: A list of directories to exclude from the testsuite. The 'Inputs' # subdirectories contain auxiliary inputs for various tests in their parent Index: llvm/test/ObjectYAML/lit.local.cfg =================================================================== --- llvm/test/ObjectYAML/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.yaml'] Index: llvm/test/Object/lit.local.cfg =================================================================== --- llvm/test/Object/lit.local.cfg +++ /dev/null @@ -1 +0,0 @@ -config.suffixes = ['.test', '.ll', '.s', '.yaml'] -------------- next part -------------- A non-text attachment was scrubbed... Name: D83350.276481.patch Type: text/x-patch Size: 3896 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:25:58 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:25:58 +0000 (UTC) Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header In-Reply-To: References: Message-ID: <06dcd2c5d419a57e0a144d283515eef2@localhost.localdomain> hubert.reinterpretcast added inline comments. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:491 + W.print##H(S, T); \ + if ((X = X - sizeof(T)) == 0) \ + return ---------------- This strikes me as extremely hazardous. What if we get a length value that is reflective of a partial field? ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:501 + DictScope DS(W, "AuxiliaryHeader"); + PrintAuxMember(Hex, "Magic", AuxHeader->AuxMagic, AuxSize); + PrintAuxMember(Hex, "Version", AuxHeader->Version, AuxSize); ---------------- jasonliu wrote: > Why do you need to pass in `AuxSize` to the macro function when all inputs are the same? `AuxSize` is modified by each macro(!) invocation... 
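Editor's sketch of the hazard raised above, under stated assumptions: the remaining-size counter is taken to be an unsigned 16-bit value (the actual type of `AuxSize` is not shown in this thread), and both helper names below are hypothetical and do not appear in the patch. With an unsigned counter, a length that covers only part of a field makes the subtraction wrap instead of reaching zero, so an "== 0" test never fires and printing would continue past the real end of the header. A bounds check before consuming each field avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: mirrors the macro's bookkeeping, i.e. subtract
// unconditionally and stop only when the counter is exactly zero.
// Returns true when the caller would keep printing further fields.
inline bool hazardousStep(uint16_t &Remaining, size_t FieldSize) {
  Remaining = static_cast<uint16_t>(Remaining - FieldSize); // may wrap around
  return Remaining != 0;
}

// Hypothetical safer variant: only consume a field that fits completely.
// Returns false (leaving Remaining untouched) for a partial field, so the
// caller can stop and, ideally, report the malformed size.
inline bool checkedStep(uint16_t &Remaining, size_t FieldSize) {
  if (Remaining < FieldSize)
    return false;
  Remaining = static_cast<uint16_t>(Remaining - FieldSize);
  return true;
}
```

For example, with 3 bytes left and a 4-byte field, `hazardousStep` leaves the counter at 65535 and reports "keep going", while `checkedStep` returns false and leaves the counter at 3.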
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82549/new/ https://reviews.llvm.org/D82549 From llvm-commits at lists.llvm.org Wed Jul 8 10:25:59 2020 From: llvm-commits at lists.llvm.org (Dimitry Andric via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:25:59 +0000 (UTC) Subject: [PATCH] D83411: Use md5 -q for HashProgramOutput.sh, to show only the checksum. Message-ID: dim created this revision. dim added reviewers: ddunbar, jdoerfert, serge-sans-paille, tra. This option is supported on both BSDs and macOS, and ensures the command also works if GNU coreutils are not installed. Repository: rT test-suite https://reviews.llvm.org/D83411 Files: HashProgramOutput.sh Index: HashProgramOutput.sh =================================================================== --- HashProgramOutput.sh +++ HashProgramOutput.sh @@ -18,7 +18,7 @@ mv $1 $1.bak if [ $is_md5sum = "0" ]; then - $md5cmd < $1.bak > $1 + $md5cmd -q < $1.bak > $1 else $md5cmd < $1.bak | cut -d' ' -f 1 > $1 fi -------------- next part -------------- A non-text attachment was scrubbed... Name: D83411.276482.patch Type: text/x-patch Size: 322 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:26:51 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:26:51 +0000 (UTC) Subject: [PATCH] D83243: [ELF] Rename canRelax to sharedToExecRelax. NFC In-Reply-To: References: Message-ID: <2c24a0e945204722326fddbd3e935da1@localhost.localdomain> MaskRay marked 2 inline comments as done. MaskRay added inline comments. ================ Comment at: lld/ELF/Relocations.cpp:311 R_TLSIE_HINT>(expr) && - canRelax && isLocalInExecutable) { + sharedToExecRelax && isLocalInExecutable) { c.relocations.push_back({R_RELAX_TLS_IE_TO_LE, type, offset, addend, &sym}); ---------------- psmith wrote: > MaskRay wrote: > > Initial-Exec -> Local-Exec is a relaxation from executable to executable. 
`sharedToExecRelax` is not an appropriate name. Shall we rename the variable? > > > > Technically, a shared object can use Initial-Exec as well if it is part of initial modules (via transitive DT_NEEDED; `DF_STATIC_TLS`). > Perhaps just toExecRelax or canExecRelax? Thanks. toExecRelax sounds good. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83243/new/ https://reviews.llvm.org/D83243 From llvm-commits at lists.llvm.org Wed Jul 8 10:27:23 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:27:23 +0000 (UTC) Subject: [PATCH] D83243: [ELF] Rename canRelax to toExecRelax. NFC In-Reply-To: References: Message-ID: MaskRay updated this revision to Diff 276483. MaskRay marked an inline comment as done. MaskRay retitled this revision from "[ELF] Rename canRelax to sharedToExecRelax. NFC" to "[ELF] Rename canRelax to toExecRelax. NFC". MaskRay edited the summary of this revision. MaskRay added a comment. sharedToExecRelax -> toExecRelax Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83243/new/ https://reviews.llvm.org/D83243 Files: lld/ELF/Relocations.cpp Index: lld/ELF/Relocations.cpp =================================================================== --- lld/ELF/Relocations.cpp +++ lld/ELF/Relocations.cpp @@ -197,9 +197,9 @@ return 1; } - bool canRelax = config->emachine != EM_ARM && - config->emachine != EM_HEXAGON && - config->emachine != EM_RISCV; + bool toExecRelax = !config->shared && config->emachine != EM_ARM && + config->emachine != EM_HEXAGON && + config->emachine != EM_RISCV; // If we are producing an executable and the symbol is non-preemptable, it // must be defined and the code sequence can be relaxed to use Local-Exec. @@ -217,7 +217,7 @@ if (oneof( expr)) { // Local-Dynamic relocs can be relaxed to Local-Exec. 
- if (canRelax && !config->shared) { + if (toExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -238,7 +238,7 @@ } // Local-Dynamic relocs can be relaxed to Local-Exec. - if (expr == R_DTPREL && canRelax && !config->shared) { + if (expr == R_DTPREL && toExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -260,7 +260,7 @@ if (oneof(expr)) { - if (!canRelax || config->shared) { + if (!toExecRelax) { if (in.got->addDynTlsEntry(sym)) { uint64_t off = in.got->getGlobalDynOffset(sym); @@ -308,7 +308,7 @@ // defined. if (oneof(expr) && - canRelax && isLocalInExecutable) { + toExecRelax && isLocalInExecutable) { c.relocations.push_back({R_RELAX_TLS_IE_TO_LE, type, offset, addend, &sym}); return 1; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83243.276483.patch Type: text/x-patch Size: 2054 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:28:39 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Wed, 08 Jul 2020 10:28:39 -0700 (PDT) Subject: [lld] 169ec2d - [ELF] Rename canRelax to toExecRelax. NFC Message-ID: <5f060247.1c69fb81.ffe97.080c@mx.google.com> Author: Fangrui Song Date: 2020-07-08T10:27:31-07:00 New Revision: 169ec2d6b006ea31114a7d6ddc3f002d3cb4acb3 URL: https://github.com/llvm/llvm-project/commit/169ec2d6b006ea31114a7d6ddc3f002d3cb4acb3 DIFF: https://github.com/llvm/llvm-project/commit/169ec2d6b006ea31114a7d6ddc3f002d3cb4acb3.diff LOG: [ELF] Rename canRelax to toExecRelax. NFC In the absence of TLS relaxation (rewrite of code sequences), there is still an applicable optimization: [gd]: General Dynamic: resolve DTPMOD to 1 and/or resolve DTPOFF statically All the other relaxations are only performed when transiting to executable (`!config->shared`). 
Since [gd] is handled differently, we can fold `!config->shared` into canRelax and simplify its use sites. Rename the variable to reflect to new semantics. Reviewed By: grimar, psmith Differential Revision: https://reviews.llvm.org/D83243 Added: Modified: lld/ELF/Relocations.cpp Removed: ################################################################################ diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp index 42341f67afee..751ded397768 100644 --- a/lld/ELF/Relocations.cpp +++ b/lld/ELF/Relocations.cpp @@ -197,9 +197,9 @@ handleTlsRelocation(RelType type, Symbol &sym, InputSectionBase &c, return 1; } - bool canRelax = config->emachine != EM_ARM && - config->emachine != EM_HEXAGON && - config->emachine != EM_RISCV; + bool toExecRelax = !config->shared && config->emachine != EM_ARM && + config->emachine != EM_HEXAGON && + config->emachine != EM_RISCV; // If we are producing an executable and the symbol is non-preemptable, it // must be defined and the code sequence can be relaxed to use Local-Exec. @@ -217,7 +217,7 @@ handleTlsRelocation(RelType type, Symbol &sym, InputSectionBase &c, if (oneof( expr)) { // Local-Dynamic relocs can be relaxed to Local-Exec. - if (canRelax && !config->shared) { + if (toExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -238,7 +238,7 @@ handleTlsRelocation(RelType type, Symbol &sym, InputSectionBase &c, } // Local-Dynamic relocs can be relaxed to Local-Exec. 
- if (expr == R_DTPREL && canRelax && !config->shared) { + if (expr == R_DTPREL && toExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -260,7 +260,7 @@ handleTlsRelocation(RelType type, Symbol &sym, InputSectionBase &c, if (oneof(expr)) { - if (!canRelax || config->shared) { + if (!toExecRelax) { if (in.got->addDynTlsEntry(sym)) { uint64_t off = in.got->getGlobalDynOffset(sym); @@ -308,7 +308,7 @@ handleTlsRelocation(RelType type, Symbol &sym, InputSectionBase &c, // defined. if (oneof(expr) && - canRelax && isLocalInExecutable) { + toExecRelax && isLocalInExecutable) { c.relocations.push_back({R_RELAX_TLS_IE_TO_LE, type, offset, addend, &sym}); return 1; } From llvm-commits at lists.llvm.org Wed Jul 8 10:28:41 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:28:41 +0000 (UTC) Subject: [PATCH] D83243: [ELF] Rename canRelax to toExecRelax. NFC In-Reply-To: References: Message-ID: <416d17396149418b3199e8bbf65e3e77@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG169ec2d6b006: [ELF] Rename canRelax to toExecRelax. NFC (authored by MaskRay). 
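Editor's sketch, not part of the commit: the NFC claim for the canRelax to toExecRelax rename can be sanity-checked by modelling the inputs as plain booleans and comparing each rewritten condition before and after. The helper names below are hypothetical, and the model assumes (based on the comment in the diff, "If we are producing an executable and the symbol is non-preemptable") that `isLocalInExecutable` already implies a non-shared link. It needs C++14 or later for the `constexpr` loop.

```cpp
#include <cassert>

// Illustrative model only: the booleans stand in for `config->shared`, the
// `emachine != EM_ARM / EM_HEXAGON / EM_RISCV` test, and `sym.isPreemptible`.
// None of these helper names exist in lld.
constexpr bool toExecRelax(bool Shared, bool ArchCanRelax) {
  return !Shared && ArchCanRelax;
}

// True when every rewritten use site evaluates identically before and after
// the rename for the given inputs.
constexpr bool useSitesAgree(bool Shared, bool ArchCanRelax, bool Preemptible) {
  const bool CanRelax = ArchCanRelax;                     // old flag
  const bool ToExec = toExecRelax(Shared, ArchCanRelax);  // new flag
  // Assumption: "local in executable" implies we are not linking a shared
  // object.
  const bool IsLocalInExecutable = !Preemptible && !Shared;
  return ((CanRelax && !Shared) == ToExec) &&    // LD -> LE sites
         ((!CanRelax || Shared) == !ToExec) &&   // GD fallback site
         ((CanRelax && IsLocalInExecutable) ==
          (ToExec && IsLocalInExecutable));      // IE -> LE site
}

// Exhaustively checks all eight combinations of the three inputs.
constexpr bool renameIsNFC() {
  for (int Bits = 0; Bits < 8; ++Bits)
    if (!useSitesAgree(Bits & 1, Bits & 2, Bits & 4))
      return false;
  return true;
}

static_assert(renameIsNFC(), "old and new conditions should agree");
```

The second comparison is De Morgan's law applied to the new flag: `!(A && !S)` is `!A || S`, which is why `!canRelax || config->shared` collapses to `!toExecRelax`.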
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83243/new/ https://reviews.llvm.org/D83243 Files: lld/ELF/Relocations.cpp Index: lld/ELF/Relocations.cpp =================================================================== --- lld/ELF/Relocations.cpp +++ lld/ELF/Relocations.cpp @@ -197,9 +197,9 @@ return 1; } - bool canRelax = config->emachine != EM_ARM && - config->emachine != EM_HEXAGON && - config->emachine != EM_RISCV; + bool toExecRelax = !config->shared && config->emachine != EM_ARM && + config->emachine != EM_HEXAGON && + config->emachine != EM_RISCV; // If we are producing an executable and the symbol is non-preemptable, it // must be defined and the code sequence can be relaxed to use Local-Exec. @@ -217,7 +217,7 @@ if (oneof( expr)) { // Local-Dynamic relocs can be relaxed to Local-Exec. - if (canRelax && !config->shared) { + if (toExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -238,7 +238,7 @@ } // Local-Dynamic relocs can be relaxed to Local-Exec. - if (expr == R_DTPREL && canRelax && !config->shared) { + if (expr == R_DTPREL && toExecRelax) { c.relocations.push_back( {target->adjustRelaxExpr(type, nullptr, R_RELAX_TLS_LD_TO_LE), type, offset, addend, &sym}); @@ -260,7 +260,7 @@ if (oneof(expr)) { - if (!canRelax || config->shared) { + if (!toExecRelax) { if (in.got->addDynTlsEntry(sym)) { uint64_t off = in.got->getGlobalDynOffset(sym); @@ -308,7 +308,7 @@ // defined. if (oneof(expr) && - canRelax && isLocalInExecutable) { + toExecRelax && isLocalInExecutable) { c.relocations.push_back({R_RELAX_TLS_IE_TO_LE, type, offset, addend, &sym}); return 1; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83243.276485.patch Type: text/x-patch Size: 2054 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:28:54 2020 From: llvm-commits at lists.llvm.org (Sean Fertile via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:28:54 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <332e956a737ca2da7a8e1ac31b5ddcae@localhost.localdomain> sfertile added a comment. I think we are really close now. I would suggest having a second lit test, which we link into either a static exec or pie exec which has calls using the R_PPC64_REL24_NOTOC relocation to symbols with global linkage and default visibility. I think it can be done with a single file (ie define all the callers and callees in the same file, no need for a second input file). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Wed Jul 8 10:30:10 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:30:10 +0000 (UTC) Subject: [PATCH] D83412: [LLVM] Accept `noundef` attribute in function definitions/calls Message-ID: guiand created this revision. guiand added reviewers: jdoerfert, nikic, efriedma, eugenis. Herald added subscribers: llvm-commits, dexonsmith, steven_wu, hiraditya. Herald added a project: LLVM. The `noundef` attribute indicates an argument or return value which may never have an undef value representation. This patch allows LLVM to parse the attribute. Isolated out of D82316 . 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83412 Files: llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/include/llvm/IR/Attributes.td llvm/lib/AsmParser/LLLexer.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/AsmParser/LLToken.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/IR/Attributes.cpp llvm/lib/Transforms/Utils/CodeExtractor.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83412.276484.patch Type: text/x-patch Size: 4790 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:30:39 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:30:39 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <56539d6972507a630f48b3047f6b6c9e@localhost.localdomain> lebedev.ri updated this revision to Diff 276486. lebedev.ri marked an inline comment as done. lebedev.ri added a comment. Addressing review nit. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 Files: llvm/test/Reduce/remove-args.ll llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-funcs.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/test/Reduce/remove-global-vars.ll llvm/test/Reduce/remove-metadata.ll llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83351.276486.patch Type: text/x-patch Size: 17882 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:34:13 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:34:13 +0000 (UTC) Subject: [PATCH] D83297: [Attributor][WIP] Attribute scheduling visualization. In-Reply-To: References: Message-ID: <22cfed0ed508a674e7536d2bceb06cb0@localhost.localdomain> sstefan1 added inline comments. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:906 + InfoCache(InfoCache), CGUpdater(CGUpdater), SG(&DG), Allowed(Allowed), SeedingPeriod(true) {} ---------------- bbn wrote: > Is this patch based on D83185 ? But I think those are 2 irrelevant patches, right? Maybe you should create a branch from master and apply changes to that. That is fine, as long as allow list goes in first. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83297/new/ https://reviews.llvm.org/D83297 From llvm-commits at lists.llvm.org Wed Jul 8 10:35:14 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:35:14 +0000 (UTC) Subject: [PATCH] D83387: [llvm-readobj] - Add a generic test for --dyn-relocations and fix an issue. In-Reply-To: References: Message-ID: <88969735b22f5dd3cf4a76e46039328b@localhost.localdomain> MaskRay added inline comments. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test:59 + +--- !ELF +FileHeader: ---------------- Can the test be merged with dynamic-reloc-no-section-headers.test? We also miss a warning: `'[[FILE]]': section header string table index xxx does not exist` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83387/new/ https://reviews.llvm.org/D83387 From llvm-commits at lists.llvm.org Wed Jul 8 10:36:11 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:36:11 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <6e2b9adf141f85da1c9bd132c57a5757@localhost.localdomain> nickdesaulniers accepted this revision. nickdesaulniers added a comment. This revision is now accepted and ready to land. I'm not a fan of the inconsistent use of range-for and for-each; I would prefer range-for everywhere since it's more concise. But I don't feel strongly enough to block the patch based on that. Maybe LLVM's style guide should provide clarity and guidance on the difference of opinion? It might be polite to see if other reviewers have additional and timely feedback. 
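Editor's sketch of the two loop styles being compared, for readers without the patch at hand. It uses `std::for_each` from the standard library as a stand-in; the patch itself is not quoted here, so which for_each helper it uses is an assumption, and the function names below are hypothetical. Both functions compute the same result; the difference is purely one of style.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Range-based for: the loop body is written inline.
inline int sumRangeFor(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}

// for_each: the loop body becomes a callable passed to the algorithm.
inline int sumForEach(const std::vector<int> &Values) {
  int Sum = 0;
  std::for_each(Values.begin(), Values.end(), [&Sum](int V) { Sum += V; });
  return Sum;
}
```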
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 10:36:57 2020 From: llvm-commits at lists.llvm.org (Artem Belevich via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:36:57 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <900de9e56a50755c85530fef91b367b2@localhost.localdomain> tra added a comment. In D82881#2138942 , @ABataev wrote: > The bug I'm trying to fix is the incompatibility with NVPTX ptxas compiler. It does not allow signed integers in debug sections. Would it be good to emit bit_offset as `DW_FORM_udata` for NVPTX target to fix incompatibility? Checked that it works with ptxas. Looks like we'll need to teach MCAsmStreamer to handle 'unsigned-only' data directives. Right now it always prints a signed value. Making dwarf use unsigned values is just one of the ways to trigger the issue. I'll get it fixed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Wed Jul 8 10:37:05 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:37:05 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <05c121437d526727b05a8e4178a25456@localhost.localdomain> MaskRay accepted this revision. MaskRay added a comment. This revision is now accepted and ready to land. Thanks! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 From llvm-commits at lists.llvm.org Wed Jul 8 10:38:45 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:38:45 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <475818a97865a82931c13ce27ba200ad@localhost.localdomain> ABataev added a comment. In D82881#2139589 , @tra wrote: > In D82881#2138942 , @ABataev wrote: > > > The bug I'm trying to fix is the incompatibility with NVPTX ptxas compiler. It does not allow signed integers in debug sections. Would it be good to emit bit_offset as `DW_FORM_udata` for NVPTX target to fix incompatibility? Checked that it works with ptxas. > > > Looks like we'll need to teach MCAsmStreamer to handle 'unsigned-only' data directives. Right now it always prints a signed value. Making dwarf use unsigned values is just one of the ways to trigger the issue. > > I'll get it fixed. I have a patch for it already. It is quite simple: we just need to set the Form to `DW_FORM_udata` and everything works. I can update this patch, if you want to try the fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Wed Jul 8 10:41:55 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:41:55 +0000 (UTC) Subject: [PATCH] D83361: [LLVM] Add libatomic load/store functions to TargetLibraryInfo In-Reply-To: References: Message-ID: guiand added a comment. In D83361#2138409 , @rovka wrote: > Don't you also have to set as Available/Unavailable when initializing the TLI? Yes, thanks for the catch! It looks like it's set available by default, but there may be platforms that don't have these libatomic functions available.
From a look at RuntimeLibCalls, where these functions are also used (for emitting), it looks like the functions are only explicitly made unavailable on webassembly. (Which is pretty surprising! Could I be missing something here?) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83361/new/ https://reviews.llvm.org/D83361 From llvm-commits at lists.llvm.org Wed Jul 8 10:45:37 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:45:37 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: lebedev.ri added a comment. In D83351#2139586 , @nickdesaulniers wrote: > I'm not a fan of the inconsistent use of range-for and for-each; I would prefer range-for everywhere since it's more concise. My (consistently inconsistent) headcanon as to which to use when is that for_each should be used when in principle we don't care in which order each item will be processed. > But I don't feel strongly enough to block the patch based on that. Maybe LLVM's style guide should provide clarity and guidance on the difference of opinion? > It might be polite to see if other reviewers have additional and timely feedback. Sure, let's see if @dblaikie / etc have any other comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 10:47:52 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:47:52 +0000 (UTC) Subject: [PATCH] D83413: Tighten description of ISD::BUILD_VECTOR Message-ID: cameron.mcinally created this revision. cameron.mcinally added reviewers: craig.topper, kparzysz, lebedev.ri. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. VerifySDNode(...)
in SelectionDAG.cpp shows that the operands of a BUILD_VECTOR must all be the same type. This patch cleans up the comment in ISDOpcodes.h to make that more obvious. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83413 Files: llvm/include/llvm/CodeGen/ISDOpcodes.h Index: llvm/include/llvm/CodeGen/ISDOpcodes.h =================================================================== --- llvm/include/llvm/CodeGen/ISDOpcodes.h +++ llvm/include/llvm/CodeGen/ISDOpcodes.h @@ -450,9 +450,9 @@ /// BUILD_VECTOR(ELT0, ELT1, ELT2, ELT3,...) - Return a fixed-width vector /// with the specified, possibly variable, elements. The number of elements /// is required to be a power of two. The types of the operands must all be - /// the same and must match the vector element type, except that integer types - /// are allowed to be larger than the element type, in which case the operands - /// are implicitly truncated. + /// the same. The types of the operands must match the vector element type, + /// except that integer types are allowed to be larger than the element type, + /// in which case the operands are implicitly truncated. BUILD_VECTOR, /// INSERT_VECTOR_ELT(VECTOR, VAL, IDX) - Returns VECTOR with the element -------------- next part -------------- A non-text attachment was scrubbed... Name: D83413.276452.patch Type: text/x-patch Size: 970 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:49:30 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:49:30 +0000 (UTC) Subject: [PATCH] D77734: [AssumeBundles] Adapt RecursivelyDeleteTriviallyDeadInstructions and depending passes. In-Reply-To: References: Message-ID: <4b4930e34e343c6e593f15501fc5abe6@localhost.localdomain> Tyker added a comment. Herald added subscribers: okura, bbn. In D77734#2117317 , @jdoerfert wrote: > What is the status on this one? Are there other AssumeBundle patches pending or missing? 
since the creation of the patch many patches on assume bundles have landed and I don't think this patch is needed any more Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77734/new/ https://reviews.llvm.org/D77734 From llvm-commits at lists.llvm.org Wed Jul 8 10:51:29 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Wed, 08 Jul 2020 10:51:29 -0700 (PDT) Subject: [llvm] 4137ab6 - [Support] Define llvm::parallel::strategy for -DLLVM_ENABLE_THREADS=off builds after D76885 Message-ID: <5f0607a1.1c69fb81.15475.09c9@mx.google.com> Author: Fangrui Song Date: 2020-07-08T10:51:20-07:00 New Revision: 4137ab62cff268ed0de51ba2283143a6a992a932 URL: https://github.com/llvm/llvm-project/commit/4137ab62cff268ed0de51ba2283143a6a992a932 DIFF: https://github.com/llvm/llvm-project/commit/4137ab62cff268ed0de51ba2283143a6a992a932.diff LOG: [Support] Define llvm::parallel::strategy for -DLLVM_ENABLE_THREADS=off builds after D76885 Added: Modified: llvm/lib/Support/Parallel.cpp Removed: ################################################################################ diff --git a/llvm/lib/Support/Parallel.cpp b/llvm/lib/Support/Parallel.cpp index 7f6ce82763d6..9a2e1003da5a 100644 --- a/llvm/lib/Support/Parallel.cpp +++ b/llvm/lib/Support/Parallel.cpp @@ -9,9 +9,6 @@ #include "llvm/Support/Parallel.h" #include "llvm/Config/llvm-config.h" #include "llvm/Support/ManagedStatic.h" - -#if LLVM_ENABLE_THREADS - #include "llvm/Support/Threading.h" #include @@ -22,6 +19,8 @@ llvm::ThreadPoolStrategy llvm::parallel::strategy; +#if LLVM_ENABLE_THREADS + namespace llvm { namespace parallel { namespace detail { From llvm-commits at lists.llvm.org Wed Jul 8 10:52:27 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:52:27 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: 
<990a6b1501df5fd17a018076dd4ab808@localhost.localdomain> guiand updated this revision to Diff 276491. guiand edited the summary of this revision. guiand added a comment. Reworded comments to more closely match the reasoning Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 Files: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp llvm/test/Transforms/InstCombine/pow_fp_int.ll llvm/test/Transforms/InstCombine/simplify-libcalls.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82820.276491.patch Type: text/x-patch Size: 4803 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:53:52 2020 From: llvm-commits at lists.llvm.org (Artem Belevich via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:53:52 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <3852888f069aa7b6113eadad4bd7a7c6@localhost.localdomain> tra added a comment. In D82881#2139596 , @ABataev wrote: > I have a patch for it already. It is quite simple, just need to set the Form to `DW_FORM_udata` and everything work. I can update this patch, if you want to try the fix. If negative values are allowed by dwarf, then dwarf emitter works as intended. No need to fix it if it's not broken. :-) I appreciate the effort you've put into investigating the problem. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Wed Jul 8 10:53:53 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:53:53 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. 
In-Reply-To: References: Message-ID: <3737bd25594816f980ae64d166cd0508@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- jhenderson wrote: > dblaikie wrote: > > Higuoxing wrote: > > > jhenderson wrote: > > > > dblaikie wrote: > > > > > Higuoxing wrote: > > > > > > dblaikie wrote: > > > > > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > > > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > > > > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. > > > > > > > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > > > > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the exisiting llvm-dwarfdump tests require for example). 
Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > > > > > > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > > > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. > > > Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! > > > Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me. > > > > Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure. > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. 
> By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. What do you mean by "auto-generation aspects"? But, yeah, I'm not holding this patch up over this direction that's already got precedent, etc - but raising the question at least for consideration/thinking about over time. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Wed Jul 8 10:54:42 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Wed, 08 Jul 2020 10:54:42 -0700 (PDT) Subject: [llvm] e81c057 - [test] Add REQUIRES: x86-registered-target to tools/obj2yaml/COFF/bss.s Message-ID: <5f060862.1c69fb81.636ce.0b93@mx.google.com> Author: Fangrui Song Date: 2020-07-08T10:53:30-07:00 New Revision: e81c05777d67ec2dcbd55d34c7d2287e237bfbd1 URL: https://github.com/llvm/llvm-project/commit/e81c05777d67ec2dcbd55d34c7d2287e237bfbd1 DIFF: https://github.com/llvm/llvm-project/commit/e81c05777d67ec2dcbd55d34c7d2287e237bfbd1.diff LOG: [test] Add REQUIRES: x86-registered-target to tools/obj2yaml/COFF/bss.s Added: Modified: llvm/test/tools/obj2yaml/COFF/bss.s Removed: ################################################################################ diff --git a/llvm/test/tools/obj2yaml/COFF/bss.s b/llvm/test/tools/obj2yaml/COFF/bss.s index fed5d058714b..8e52ba6dddb3 100644 --- a/llvm/test/tools/obj2yaml/COFF/bss.s +++ b/llvm/test/tools/obj2yaml/COFF/bss.s @@ -1,3 +1,4 @@ +# REQUIRES: x86-registered-target # RUN: llvm-mc -filetype=obj -triple=x86_64-windows-msvc %s -o %t.obj # RUN: llvm-objdump -h %t.obj | FileCheck %s # RUN: obj2yaml %t.obj | yaml2obj -o %t.2.obj From llvm-commits at lists.llvm.org Wed Jul 8 10:57:02 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:57:02 +0000 (UTC) 
Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <030e991d289dcd84238cd39e99918395@localhost.localdomain> ggeorgakoudis added a comment. In D83370#2138377 , @lebedev.ri wrote: > I may be missing context, but this may be missing some wording. > *Why* should they not be in the callgraph? Thanks for the comment. The callback function is in the callgraph but it is not added to the ExternalCallingNode. I have updated the commit message to include more information. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Wed Jul 8 11:01:11 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:01:11 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: lebedev.ri added a comment. In D83370#2139632 , @ggeorgakoudis wrote: > In D83370#2138377 , @lebedev.ri wrote: > > > I may be missing context, but this may be missing some wording. > > *Why* should they not be in the callgraph? > > > Thanks for the comment. The callback function is in the callgraph but it is not added to the ExternalCallingNode. I have updated the commit message to include more information. Thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Wed Jul 8 11:01:17 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:01:17 +0000 (UTC) Subject: [PATCH] D81485: GlobalISel: Verify G_BITCAST changes the type In-Reply-To: References: Message-ID: <7c17d4d655f59920a16223bca920378f@localhost.localdomain> aemerson accepted this revision. aemerson added inline comments. This revision is now accepted and ready to land. 
================ Comment at: llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:410 +# FAST-NEXT: %1:fpr(<4 x s8>) = G_BITCAST %0 +# GREEDY-NEXT: %1:gpr(<4 x s8>) = G_BITCAST %0 body: | ---------------- arsenm wrote: > aemerson wrote: > > paquette wrote: > > > AFAIK we should only have vectors on FPRs, but maybe I'm wrong about that. > > > > > > @aemerson ? > > We never enable the greedy RBS mode. I think this behavior is wrong for vectors. > For the purpose of the patch, there just needs to be something here to test a bitcast. If it happens to produce the wrong output, that's a separate issue Please add a comment in this test to explain that the greedy checks are correct. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81485/new/ https://reviews.llvm.org/D81485 From llvm-commits at lists.llvm.org Wed Jul 8 11:01:30 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:01:30 +0000 (UTC) Subject: [PATCH] D83413: Tighten description of ISD::BUILD_VECTOR In-Reply-To: References: Message-ID: <392801dca9a964936e089a85f82743df@localhost.localdomain> craig.topper added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/ISDOpcodes.h:452 /// with the specified, possibly variable, elements. The number of elements /// is required to be a power of two. The types of the operands must all be + /// the same. The types of the operands must match the vector element type, ---------------- While you're here can you drop "The number of elements is required to be a power of two." That's not true. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83413/new/ https://reviews.llvm.org/D83413 From llvm-commits at lists.llvm.org Wed Jul 8 11:02:16 2020 From: llvm-commits at lists.llvm.org (Chris Lattner via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:02:16 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: lattner added a comment. Does this add a static constructor? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 From llvm-commits at lists.llvm.org Wed Jul 8 11:02:41 2020 From: llvm-commits at lists.llvm.org (Scott Linder via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:02:41 +0000 (UTC) Subject: [PATCH] D82858: [llvm-objdump] Detect note section for ELF objects In-Reply-To: References: Message-ID: <65b2874e303f95e8e869ec9e137fc017@localhost.localdomain> scott.linder added a comment. > In an ideal world, we'd merge all the binary tools (GNU and LLVM) into a single tool, or redistribute functionality somehow, so that we don't have duplicate functionality like we already do. This takes us further away from that ideal. I'm confused by this statement in particular. If the goal is to just have one tool, why did LLVM start re-implementing these tools to begin with? Wasn't the first commit of "llvm-objdump"/"llvm-readobj" a massive step away from the ideal? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82858/new/ https://reviews.llvm.org/D82858 From llvm-commits at lists.llvm.org Wed Jul 8 11:07:48 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:07:48 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <2f258bd28d6aebdcd3398941e0400379@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll:20-30 +; Exhaust caller-saved parameter registers and force callee saved registers to +; be used in the computation. This tests that CFI directives for callee saved +; registers are generated with basic block sections. +; extern int f1(int, int, int); +; +; int compute(bool k, int p1, int p2, int p3, int p4, int p5, int p6) { +; int result = p1; ---------------- this looks nicer - though I'd still like a bit more commentary on exactly how/why these constructs are here? Why two function calls with interleaved parameters rather than one, etc? Mostly I'm hoping the test would explain why these constructs are used and which parts are relevant. (does the function need a non-void return type? or could the function calls be void-returning but conditional? (it's not like they can be optimized away, since they might have side effects anyway)) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Wed Jul 8 11:08:38 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:08:38 +0000 (UTC) Subject: [PATCH] D83393: [llvm-readelf] - Stop using 'unwrapOrError()' in 'ELFDumper::getSymbolVersion'. In-Reply-To: References: Message-ID: <213b088a697777fd02d11d7ca4aae4eb@localhost.localdomain> MaskRay accepted this revision. 
MaskRay added a comment. This revision is now accepted and ready to land. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83393/new/ https://reviews.llvm.org/D83393 From llvm-commits at lists.llvm.org Wed Jul 8 11:08:38 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via llvm-commits) Date: Wed, 08 Jul 2020 11:08:38 -0700 (PDT) Subject: [llvm] c444b1b - [SVE] Remove calls to VectorType::getNumElements from Scalar Message-ID: <5f060ba6.1c69fb81.19005.1319@mx.google.com> Author: Christopher Tetreault Date: 2020-07-08T11:08:20-07:00 New Revision: c444b1b904b11356c57980a41a19f4ef361b80a8 URL: https://github.com/llvm/llvm-project/commit/c444b1b904b11356c57980a41a19f4ef361b80a8 DIFF: https://github.com/llvm/llvm-project/commit/c444b1b904b11356c57980a41a19f4ef361b80a8.diff LOG: [SVE] Remove calls to VectorType::getNumElements from Scalar Reviewers: efriedma, fhahn, reames, kmclaughlin, sdesmalen Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, rkruppe, psnobl, dantrushin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82243 Added: Modified: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp llvm/lib/Transforms/Scalar/SROA.cpp llvm/lib/Transforms/Scalar/Scalarizer.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp b/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp index ebe8aeb01fac..90314b17b5e2 100644 --- a/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp +++ b/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp @@ -254,13 +254,13 @@ class LowerMatrixIntrinsics { return Vectors.size(); else { assert(Vectors.size() > 0 && "Cannot call getNumRows without columns"); - return cast(Vectors[0]->getType())->getNumElements(); + return cast(Vectors[0]->getType())->getNumElements(); } } unsigned getNumRows() const { if (isColumnMajor()) { 
assert(Vectors.size() > 0 && "Cannot call getNumRows without columns"); - return cast(Vectors[0]->getType())->getNumElements(); + return cast(Vectors[0]->getType())->getNumElements(); } else return Vectors.size(); } @@ -401,7 +401,7 @@ class LowerMatrixIntrinsics { unsigned getNumOps(Type *VT) { assert(isa(VT) && "Expected vector type"); return getNumOps(VT->getScalarType(), - cast(VT)->getNumElements()); + cast(VT)->getNumElements()); } // @@ -421,7 +421,8 @@ class LowerMatrixIntrinsics { IRBuilder<> &Builder) { VectorType *VType = dyn_cast(MatrixVal->getType()); assert(VType && "MatrixVal must be a vector type"); - assert(VType->getNumElements() == SI.NumRows * SI.NumColumns && + assert(cast(VType)->getNumElements() == + SI.NumRows * SI.NumColumns && "The vector size must match the number of matrix elements"); // Check if we lowered MatrixVal using shape information. In that case, @@ -442,7 +443,8 @@ class LowerMatrixIntrinsics { // Otherwise split MatrixVal. SmallVector SplitVecs; Value *Undef = UndefValue::get(VType); - for (unsigned MaskStart = 0; MaskStart < VType->getNumElements(); + for (unsigned MaskStart = 0; + MaskStart < cast(VType)->getNumElements(); MaskStart += SI.getStride()) { Value *V = Builder.CreateShuffleVector( MatrixVal, Undef, createSequentialMask(MaskStart, SI.getStride(), 0), @@ -928,8 +930,8 @@ class LowerMatrixIntrinsics { // First, bring Block to the same size as Col unsigned BlockNumElts = - cast(Block->getType())->getNumElements(); - unsigned NumElts = cast(Col->getType())->getNumElements(); + cast(Block->getType())->getNumElements(); + unsigned NumElts = cast(Col->getType())->getNumElements(); assert(NumElts >= BlockNumElts && "Too few elements for current block"); Value *Undef = UndefValue::get(Block->getType()); @@ -944,7 +946,8 @@ class LowerMatrixIntrinsics { for (i = 0; i < I; i++) Mask.push_back(i); - unsigned VecNumElts = cast(Col->getType())->getNumElements(); + unsigned VecNumElts = + cast(Col->getType())->getNumElements(); 
for (; i < I + BlockNumElts; i++) Mask.push_back(i - I + VecNumElts); diff --git a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp index dbccfda246b8..dc2ad14ae61e 100644 --- a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp +++ b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp @@ -1352,7 +1352,8 @@ static void CreateGCRelocates(ArrayRef LiveVariables, auto AS = Ty->getScalarType()->getPointerAddressSpace(); Type *NewTy = Type::getInt8PtrTy(M->getContext(), AS); if (auto *VT = dyn_cast(Ty)) - NewTy = FixedVectorType::get(NewTy, VT->getNumElements()); + NewTy = FixedVectorType::get(NewTy, + cast(VT)->getNumElements()); return Intrinsic::getDeclaration(M, Intrinsic::experimental_gc_relocate, {NewTy}); }; @@ -2667,8 +2668,9 @@ bool RewriteStatepointsForGC::runOnFunction(Function &F, DominatorTree &DT, unsigned VF = 0; for (unsigned i = 0; i < I.getNumOperands(); i++) if (auto *OpndVTy = dyn_cast(I.getOperand(i)->getType())) { - assert(VF == 0 || VF == OpndVTy->getNumElements()); - VF = OpndVTy->getNumElements(); + assert(VF == 0 || + VF == cast(OpndVTy)->getNumElements()); + VF = cast(OpndVTy)->getNumElements(); } // It's the vector to scalar traversal through the pointer operand which diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp index 32d5dc68b709..89f324deef9f 100644 --- a/llvm/lib/Transforms/Scalar/SROA.cpp +++ b/llvm/lib/Transforms/Scalar/SROA.cpp @@ -1470,7 +1470,7 @@ static Value *getNaturalGEPRecursively(IRBuilderTy &IRB, const DataLayout &DL, } APInt ElementSize(Offset.getBitWidth(), ElementSizeInBits / 8); APInt NumSkippedElements = Offset.sdiv(ElementSize); - if (NumSkippedElements.ugt(VecTy->getNumElements())) + if (NumSkippedElements.ugt(cast(VecTy)->getNumElements())) return nullptr; Offset -= NumSkippedElements * ElementSize; Indices.push_back(IRB.getInt(NumSkippedElements)); @@ -1799,12 +1799,13 @@ static bool 
isVectorPromotionViableForSlice(Partition &P, const Slice &S, std::max(S.beginOffset(), P.beginOffset()) - P.beginOffset(); uint64_t BeginIndex = BeginOffset / ElementSize; if (BeginIndex * ElementSize != BeginOffset || - BeginIndex >= Ty->getNumElements()) + BeginIndex >= cast(Ty)->getNumElements()) return false; uint64_t EndOffset = std::min(S.endOffset(), P.endOffset()) - P.beginOffset(); uint64_t EndIndex = EndOffset / ElementSize; - if (EndIndex * ElementSize != EndOffset || EndIndex > Ty->getNumElements()) + if (EndIndex * ElementSize != EndOffset || + EndIndex > cast(Ty)->getNumElements()) return false; assert(EndIndex > BeginIndex && "Empty vector!"); @@ -1930,7 +1931,8 @@ static VectorType *isVectorPromotionViable(Partition &P, const DataLayout &DL) { "All non-integer types eliminated!"); assert(LHSTy->getElementType()->isIntegerTy() && "All non-integer types eliminated!"); - return RHSTy->getNumElements() < LHSTy->getNumElements(); + return cast(RHSTy)->getNumElements() < + cast(LHSTy)->getNumElements(); }; llvm::sort(CandidateTys, RankVectorTypes); CandidateTys.erase( @@ -2173,7 +2175,7 @@ static Value *insertInteger(const DataLayout &DL, IRBuilderTy &IRB, Value *Old, static Value *extractVector(IRBuilderTy &IRB, Value *V, unsigned BeginIndex, unsigned EndIndex, const Twine &Name) { - VectorType *VecTy = cast(V->getType()); + auto *VecTy = cast(V->getType()); unsigned NumElements = EndIndex - BeginIndex; assert(NumElements <= VecTy->getNumElements() && "Too many elements!"); @@ -2211,21 +2213,23 @@ static Value *insertVector(IRBuilderTy &IRB, Value *Old, Value *V, return V; } - assert(Ty->getNumElements() <= VecTy->getNumElements() && + assert(cast(Ty)->getNumElements() <= + cast(VecTy)->getNumElements() && "Too many elements!"); - if (Ty->getNumElements() == VecTy->getNumElements()) { + if (cast(Ty)->getNumElements() == + cast(VecTy)->getNumElements()) { assert(V->getType() == VecTy && "Vector type mismatch"); return V; } - unsigned EndIndex = 
BeginIndex + Ty->getNumElements(); + unsigned EndIndex = BeginIndex + cast(Ty)->getNumElements(); // When inserting a smaller vector into the larger to store, we first // use a shuffle vector to widen it with undef elements, and then // a second shuffle vector to select between the loaded vector and the // incoming vector. SmallVector Mask; - Mask.reserve(VecTy->getNumElements()); - for (unsigned i = 0; i != VecTy->getNumElements(); ++i) + Mask.reserve(cast(VecTy)->getNumElements()); + for (unsigned i = 0; i != cast(VecTy)->getNumElements(); ++i) if (i >= BeginIndex && i < EndIndex) Mask.push_back(IRB.getInt32(i - BeginIndex)); else @@ -2235,7 +2239,7 @@ static Value *insertVector(IRBuilderTy &IRB, Value *Old, Value *V, LLVM_DEBUG(dbgs() << " shuffle: " << *V << "\n"); Mask.clear(); - for (unsigned i = 0; i != VecTy->getNumElements(); ++i) + for (unsigned i = 0; i != cast(VecTy)->getNumElements(); ++i) Mask.push_back(IRB.getInt1(i >= BeginIndex && i < EndIndex)); V = IRB.CreateSelect(ConstantVector::get(Mask), V, Old, Name + "blend"); @@ -2595,7 +2599,8 @@ class llvm::sroa::AllocaSliceRewriter unsigned EndIndex = getIndex(NewEndOffset); assert(EndIndex > BeginIndex && "Empty vector!"); unsigned NumElements = EndIndex - BeginIndex; - assert(NumElements <= VecTy->getNumElements() && "Too many elements!"); + assert(NumElements <= cast(VecTy)->getNumElements() && + "Too many elements!"); Type *SliceTy = (NumElements == 1) ? 
ElementTy : FixedVectorType::get(ElementTy, NumElements); @@ -2819,7 +2824,8 @@ class llvm::sroa::AllocaSliceRewriter unsigned EndIndex = getIndex(NewEndOffset); assert(EndIndex > BeginIndex && "Empty vector!"); unsigned NumElements = EndIndex - BeginIndex; - assert(NumElements <= VecTy->getNumElements() && "Too many elements!"); + assert(NumElements <= cast(VecTy)->getNumElements() && + "Too many elements!"); Value *Splat = getIntegerSplat( II.getValue(), DL.getTypeSizeInBits(ElementTy).getFixedSize() / 8); @@ -2858,7 +2864,8 @@ class llvm::sroa::AllocaSliceRewriter V = getIntegerSplat(II.getValue(), DL.getTypeSizeInBits(ScalarTy).getFixedSize() / 8); if (VectorType *AllocaVecTy = dyn_cast(AllocaTy)) - V = getVectorSplat(V, AllocaVecTy->getNumElements()); + V = getVectorSplat( + V, cast(AllocaVecTy)->getNumElements()); V = convertValue(DL, IRB, V, AllocaTy); } @@ -3617,7 +3624,7 @@ static Type *getTypePartition(const DataLayout &DL, Type *Ty, uint64_t Offset, } else { // FIXME: This isn't right for vectors with non-byte-sized or // non-power-of-two sized elements. 
- auto *VT = cast(Ty); + auto *VT = cast(Ty); ElementTy = VT->getElementType(); TyNumElements = VT->getNumElements(); } diff --git a/llvm/lib/Transforms/Scalar/Scalarizer.cpp b/llvm/lib/Transforms/Scalar/Scalarizer.cpp index 3d650c66a862..851bd79cd6d8 100644 --- a/llvm/lib/Transforms/Scalar/Scalarizer.cpp +++ b/llvm/lib/Transforms/Scalar/Scalarizer.cpp @@ -262,7 +262,7 @@ Scatterer::Scatterer(BasicBlock *bb, BasicBlock::iterator bbi, Value *v, PtrTy = dyn_cast(Ty); if (PtrTy) Ty = PtrTy->getElementType(); - Size = cast(Ty)->getNumElements(); + Size = cast(Ty)->getNumElements(); if (!CachePtr) Tmp.resize(Size, nullptr); else if (CachePtr->empty()) @@ -465,7 +465,7 @@ bool ScalarizerVisitor::splitUnary(Instruction &I, const Splitter &Split) { if (!VT) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); IRBuilder<> Builder(&I); Scatterer Op = scatter(&I, I.getOperand(0)); assert(Op.size() == NumElems && "Mismatched unary operation"); @@ -485,7 +485,7 @@ bool ScalarizerVisitor::splitBinary(Instruction &I, const Splitter &Split) { if (!VT) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); IRBuilder<> Builder(&I); Scatterer VOp0 = scatter(&I, I.getOperand(0)); Scatterer VOp1 = scatter(&I, I.getOperand(1)); @@ -528,7 +528,7 @@ bool ScalarizerVisitor::splitCall(CallInst &CI) { if (ID == Intrinsic::not_intrinsic || !isTriviallyScalariable(ID)) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); unsigned NumArgs = CI.getNumArgOperands(); ValueVector ScalarOperands(NumArgs); @@ -578,7 +578,7 @@ bool ScalarizerVisitor::visitSelectInst(SelectInst &SI) { if (!VT) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); IRBuilder<> Builder(&SI); Scatterer VOp1 = scatter(&SI, SI.getOperand(1)); Scatterer VOp2 = scatter(&SI, SI.getOperand(2)); @@ -632,7 +632,7 @@ 
bool ScalarizerVisitor::visitGetElementPtrInst(GetElementPtrInst &GEPI) { return false; IRBuilder<> Builder(&GEPI); - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); unsigned NumIndices = GEPI.getNumIndices(); // The base pointer might be scalar even if it's a vector GEP. In those cases, @@ -677,7 +677,7 @@ bool ScalarizerVisitor::visitCastInst(CastInst &CI) { if (!VT) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); IRBuilder<> Builder(&CI); Scatterer Op0 = scatter(&CI, CI.getOperand(0)); assert(Op0.size() == NumElems && "Mismatched cast"); @@ -696,8 +696,8 @@ bool ScalarizerVisitor::visitBitCastInst(BitCastInst &BCI) { if (!DstVT || !SrcVT) return false; - unsigned DstNumElems = DstVT->getNumElements(); - unsigned SrcNumElems = SrcVT->getNumElements(); + unsigned DstNumElems = cast(DstVT)->getNumElements(); + unsigned SrcNumElems = cast(SrcVT)->getNumElements(); IRBuilder<> Builder(&BCI); Scatterer Op0 = scatter(&BCI, BCI.getOperand(0)); ValueVector Res; @@ -750,7 +750,7 @@ bool ScalarizerVisitor::visitInsertElementInst(InsertElementInst &IEI) { if (!VT) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); IRBuilder<> Builder(&IEI); Scatterer Op0 = scatter(&IEI, IEI.getOperand(0)); Value *NewElt = IEI.getOperand(1); @@ -785,7 +785,7 @@ bool ScalarizerVisitor::visitExtractElementInst(ExtractElementInst &EEI) { if (!VT) return false; - unsigned NumSrcElems = VT->getNumElements(); + unsigned NumSrcElems = cast(VT)->getNumElements(); IRBuilder<> Builder(&EEI); Scatterer Op0 = scatter(&EEI, EEI.getOperand(0)); Value *ExtIdx = EEI.getOperand(1); @@ -817,7 +817,7 @@ bool ScalarizerVisitor::visitShuffleVectorInst(ShuffleVectorInst &SVI) { if (!VT) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); Scatterer Op0 = scatter(&SVI, SVI.getOperand(0)); Scatterer Op1 
= scatter(&SVI, SVI.getOperand(1)); ValueVector Res; @@ -841,7 +841,7 @@ bool ScalarizerVisitor::visitPHINode(PHINode &PHI) { if (!VT) return false; - unsigned NumElems = VT->getNumElements(); + unsigned NumElems = cast(VT)->getNumElements(); IRBuilder<> Builder(&PHI); ValueVector Res; Res.resize(NumElems); @@ -872,7 +872,7 @@ bool ScalarizerVisitor::visitLoadInst(LoadInst &LI) { if (!Layout) return false; - unsigned NumElems = Layout->VecTy->getNumElements(); + unsigned NumElems = cast(Layout->VecTy)->getNumElements(); IRBuilder<> Builder(&LI); Scatterer Ptr = scatter(&LI, LI.getPointerOperand()); ValueVector Res; @@ -898,7 +898,7 @@ bool ScalarizerVisitor::visitStoreInst(StoreInst &SI) { if (!Layout) return false; - unsigned NumElems = Layout->VecTy->getNumElements(); + unsigned NumElems = cast(Layout->VecTy)->getNumElements(); IRBuilder<> Builder(&SI); Scatterer VPtr = scatter(&SI, SI.getPointerOperand()); Scatterer VVal = scatter(&SI, FullValue); @@ -934,7 +934,7 @@ bool ScalarizerVisitor::finish() { Value *Res = UndefValue::get(Op->getType()); if (auto *Ty = dyn_cast(Op->getType())) { BasicBlock *BB = Op->getParent(); - unsigned Count = Ty->getNumElements(); + unsigned Count = cast(Ty)->getNumElements(); IRBuilder<> Builder(Op); if (isa(Op)) Builder.SetInsertPoint(BB, BB->getFirstInsertionPt()); From llvm-commits at lists.llvm.org Wed Jul 8 11:08:42 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:08:42 +0000 (UTC) Subject: [PATCH] D82243: [SVE] Remove calls to VectorType::getNumElements from Scalar In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGc444b1b904b1: [SVE] Remove calls to VectorType::getNumElements from Scalar (authored by ctetreau). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82243/new/ https://reviews.llvm.org/D82243 Files: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp llvm/lib/Transforms/Scalar/SROA.cpp llvm/lib/Transforms/Scalar/Scalarizer.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82243.276492.patch Type: text/x-patch Size: 14942 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:10:51 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:10:51 +0000 (UTC) Subject: [PATCH] D82886: [DebugInfo] Fix a possible crash when reading a malformed .debug_*lists section. In-Reply-To: References: Message-ID: <656b2c346d6cca5ef1d719776352196f@localhost.localdomain> dblaikie added a comment. In D82886#2139279, @ikudrin wrote: > In D82886#2137907, @dblaikie wrote: > > > Ah, thanks! Would be handy to have a test case for that & perhaps some other way to communicate "end of list" that's a bit more explicit? > > > To my understanding, that is not yet broken, so does not need to be fixed. I'm not suggesting it needs to be fixed - but that that codepath (the one that returns zero) is untested - so when it was committed, it was committed without test coverage. It'd be good to add test coverage where it is missing like this. >> Hmm, I'm not sure why this produces the repetition - if length() accurately returned the length that was read rather than zero, then it'd go to the end and stop, right? > > `0xffffffff` is a DWARF64 mark, so when it is read, the library expects to read the next 8 bytes.
Right - but what I mean is if there's only 10 bytes, as in your example - it reads the 4 bytes of DWARF64 mark, then 6 bytes out of the desired 8 - if the length was then reported as 10 (with an error saying the length was garbled/the contents terminated earlier than expected), would that be adequate to no longer need the zero length special case? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82886/new/ https://reviews.llvm.org/D82886 From llvm-commits at lists.llvm.org Wed Jul 8 11:13:50 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:13:50 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: clementval updated this revision to Diff 276493. clementval marked 2 inline comments as done. clementval added a comment. Remove correct default case and revert llvm_unreachable Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 Files: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83363.276493.patch Type: text/x-patch Size: 7130 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:14:05 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:14:05 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <9844f4ab77cf67cfe27b2f63f53468b2@localhost.localdomain> clementval added a comment. In D83363#2139526 , @jdenny wrote: > Other than the last bit of cleanup I commented on, LGTM. Should be good now. 
================ Comment at: llvm/test/TableGen/directive1.td:124 +// IMPL-NEXT: } +// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } ---------------- jdenny wrote: > clementval wrote: > > jdenny wrote: > > > jdenny wrote: > > > > clementval wrote: > > > > > jdenny wrote: > > > > > > The unreachable message doesn't make sense given the `default` in the directive switch. If that switch covers all directives, `default` isn't needed anyway. > > > > > Will remove it. > > > > Is the default useful? Are all directives covered by cases? > > > This is what I'm thinking of: > > > > > > http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations > > Yeah, for directives we can remove it since it's fully covered. Just pushed an update. > But it needs `llvm_unreachable`, whose message makes sense now that `default` is removed. > > Also, the last update removed the wrong `default` from the emitter. Of course! Shouldn't try to do two things at the same time ... sorry for the mix-up.
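The pattern discussed in this thread (no `default` in the fully covered outer switch over directives, a `default` only in the inner clause switch, and an unreachable marker after the outer switch) can be sketched roughly as follows. This is only an illustration, not the code that DirectiveEmitter.cpp generates: the enum names and enumerators are hypothetical, and `unreachableStandIn` is a hypothetical stand-in for `llvm_unreachable` so that the sketch builds without LLVM headers.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for llvm_unreachable (assumption: not the LLVM macro).
[[noreturn]] inline void unreachableStandIn(const char *Msg) {
  std::fprintf(stderr, "UNREACHABLE executed: %s\n", Msg);
  std::abort();
}

// Hypothetical enums; the real ones are generated by TableGen.
enum class Directive { DirA, DirB };
enum class Clause { ClauseA, ClauseB, ClauseC };

bool isAllowedClauseForDirective(Directive D, Clause C) {
  // Outer switch: every Directive enumerator has a case, so there is no
  // default label and the compiler can warn if an enumerator is added later.
  switch (D) {
  case Directive::DirA:
    // Inner switch: only the allowed clauses are listed, so a default
    // that rejects everything else is appropriate here.
    switch (C) {
    case Clause::ClauseA:
    case Clause::ClauseB:
      return true;
    default:
      return false;
    }
  case Directive::DirB:
    switch (C) {
    case Clause::ClauseC:
      return true;
    default:
      return false;
    }
  }
  // Only reached if D holds a value outside the enumeration.
  unreachableStandIn("Invalid Directive kind");
}
```

With the `default` gone from the outer switch, the message after it is meaningful: control can only get there with an invalid directive value.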
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Wed Jul 8 11:15:26 2020 From: llvm-commits at lists.llvm.org (Jay Foad via llvm-commits) Date: Wed, 08 Jul 2020 11:15:26 -0700 (PDT) Subject: [llvm] f4bd01c - [AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 Message-ID: <5f060d3e.1c69fb81.8dd19.152d@mx.google.com> Author: Jay Foad Date: 2020-07-08T19:14:48+01:00 New Revision: f4bd01c1918e90f232a098b4878b52c6f7d4a215 URL: https://github.com/llvm/llvm-project/commit/f4bd01c1918e90f232a098b4878b52c6f7d4a215 DIFF: https://github.com/llvm/llvm-project/commit/f4bd01c1918e90f232a098b4878b52c6f7d4a215.diff LOG: [AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 Fix the division/remainder algorithm by adding a second quotient refinement step, which is required in some cases like 0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212). Also document, rewrite and simplify it by ensuring that we always have a lower bound on inv(y), which simplifies the UNR step and the quotient refinement steps. 
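For readers who want to see the shape of the algorithm the log message describes, here is a rough host-side sketch in plain C++: an initial reciprocal estimate, one unsigned Newton-Raphson (UNR) step, a quotient/remainder estimate, and two refinement steps. It is an illustration under assumptions, not the code from the patch: `udivrem32`, `mulhu` and `DivRem32` are hypothetical names, the scale factor 2^32 - 512 is an assumed value intended to keep the estimate from exceeding the true reciprocal, and an ordinary float division stands in for the hardware reciprocal instruction.

```cpp
#include <cassert>
#include <cstdint>

struct DivRem32 {
  uint32_t Quot;
  uint32_t Rem;
};

// High 32 bits of the 64-bit product A * B.
static uint32_t mulhu(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>((static_cast<uint64_t>(A) * B) >> 32);
}

// Hypothetical sketch of reciprocal-based unsigned 32-bit division.
DivRem32 udivrem32(uint32_t X, uint32_t Y) {
  assert(Y != 0 && "division by zero");

  // Initial estimate z of inv(y) = 2^32 / y, computed in float.
  // Assumed scale: 2^32 - 512, slightly below 2^32.
  const float Scale = 4294966784.0f;
  float RcpY = 1.0f / static_cast<float>(Y);
  float Scaled = RcpY * Scale;
  uint32_t Z = static_cast<uint32_t>(Scaled);

  // One UNR step: z += mulhu(z, -y * z), with 32-bit wraparound.
  uint32_t NegYZ = (0u - Y) * Z;
  Z += mulhu(Z, NegYZ);

  // Quotient and remainder estimate.
  uint32_t Q = mulhu(X, Z);
  uint32_t R = X - Q * Y;

  // First quotient refinement.
  if (R >= Y) {
    ++Q;
    R -= Y;
  }
  // Second quotient refinement.
  if (R >= Y) {
    ++Q;
    R -= Y;
  }
  return {Q, R};
}
```

In this sketch, the case cited in the log, 0xFFFFFFFFu / 0x11111111u, yields z = 14 and an initial quotient of 13 with remainder 2 * y, so a single refinement step would stop at 14 instead of the correct 15.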
Differential Revision: https://reviews.llvm.org/D83381 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll llvm/test/CodeGen/AMDGPU/bypass-div.ll llvm/test/CodeGen/AMDGPU/idiv-licm.ll llvm/test/CodeGen/AMDGPU/sdiv.ll llvm/test/CodeGen/AMDGPU/udivrem.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp index 99297f1b3e34..a79549301740 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp @@ -1017,9 +1017,9 @@ static Value *getSign32(Value *V, IRBuilder<> &Builder, const DataLayout *DL) { return Builder.CreateAShr(V, Builder.getInt32(31)); } -Value* AMDGPUCodeGenPrepare::expandDivRem32(IRBuilder<> &Builder, - BinaryOperator &I, - Value *Num, Value *Den) const { +Value *AMDGPUCodeGenPrepare::expandDivRem32(IRBuilder<> &Builder, + BinaryOperator &I, Value *X, + Value *Y) const { Instruction::BinaryOps Opc = I.getOpcode(); assert(Opc == Instruction::URem || Opc == Instruction::UDiv || Opc == Instruction::SRem || Opc == Instruction::SDiv); @@ -1028,27 +1028,27 @@ Value* AMDGPUCodeGenPrepare::expandDivRem32(IRBuilder<> &Builder, FMF.setFast(); Builder.setFastMathFlags(FMF); - if (divHasSpecialOptimization(I, Num, Den)) + if (divHasSpecialOptimization(I, X, Y)) return nullptr; // Keep it for later optimization. 
bool IsDiv = Opc == Instruction::UDiv || Opc == Instruction::SDiv; bool IsSigned = Opc == Instruction::SRem || Opc == Instruction::SDiv; - Type *Ty = Num->getType(); + Type *Ty = X->getType(); Type *I32Ty = Builder.getInt32Ty(); Type *F32Ty = Builder.getFloatTy(); if (Ty->getScalarSizeInBits() < 32) { if (IsSigned) { - Num = Builder.CreateSExt(Num, I32Ty); - Den = Builder.CreateSExt(Den, I32Ty); + X = Builder.CreateSExt(X, I32Ty); + Y = Builder.CreateSExt(Y, I32Ty); } else { - Num = Builder.CreateZExt(Num, I32Ty); - Den = Builder.CreateZExt(Den, I32Ty); + X = Builder.CreateZExt(X, I32Ty); + Y = Builder.CreateZExt(Y, I32Ty); } } - if (Value *Res = expandDivRem24(Builder, I, Num, Den, IsDiv, IsSigned)) { + if (Value *Res = expandDivRem24(Builder, I, X, Y, IsDiv, IsSigned)) { return IsSigned ? Builder.CreateSExtOrTrunc(Res, Ty) : Builder.CreateZExtOrTrunc(Res, Ty); } @@ -1058,97 +1058,79 @@ Value* AMDGPUCodeGenPrepare::expandDivRem32(IRBuilder<> &Builder, Value *Sign = nullptr; if (IsSigned) { - Value *LHSign = getSign32(Num, Builder, DL); - Value *RHSign = getSign32(Den, Builder, DL); + Value *SignX = getSign32(X, Builder, DL); + Value *SignY = getSign32(Y, Builder, DL); // Remainder sign is the same as LHS - Sign = IsDiv ? Builder.CreateXor(LHSign, RHSign) : LHSign; + Sign = IsDiv ? Builder.CreateXor(SignX, SignY) : SignX; - Num = Builder.CreateAdd(Num, LHSign); - Den = Builder.CreateAdd(Den, RHSign); + X = Builder.CreateAdd(X, SignX); + Y = Builder.CreateAdd(Y, SignY); - Num = Builder.CreateXor(Num, LHSign); - Den = Builder.CreateXor(Den, RHSign); + X = Builder.CreateXor(X, SignX); + Y = Builder.CreateXor(Y, SignY); } - // RCP = URECIP(Den) = 2^32 / Den + e - // e is rounding error. 
- Value *DEN_F32 = Builder.CreateUIToFP(Den, F32Ty); - - Function *RcpDecl = Intrinsic::getDeclaration(Mod, Intrinsic::amdgcn_rcp, - Builder.getFloatTy()); - Value *RCP_F32 = Builder.CreateCall(RcpDecl, { DEN_F32 }); - Constant *UINT_MAX_PLUS_1 = ConstantFP::get(F32Ty, BitsToFloat(0x4f800000)); - Value *RCP_SCALE = Builder.CreateFMul(RCP_F32, UINT_MAX_PLUS_1); - Value *RCP = Builder.CreateFPToUI(RCP_SCALE, I32Ty); - - // RCP_LO, RCP_HI = mul(RCP, Den) */ - Value *RCP_LO, *RCP_HI; - std::tie(RCP_LO, RCP_HI) = getMul64(Builder, RCP, Den); - - // NEG_RCP_LO = -RCP_LO - Value *NEG_RCP_LO = Builder.CreateNeg(RCP_LO); - - // ABS_RCP_LO = (RCP_HI == 0 ? NEG_RCP_LO : RCP_LO) - Value *RCP_HI_0_CC = Builder.CreateICmpEQ(RCP_HI, Zero); - Value *ABS_RCP_LO = Builder.CreateSelect(RCP_HI_0_CC, NEG_RCP_LO, RCP_LO); - - // Calculate the rounding error from the URECIP instruction - // E = mulhu(ABS_RCP_LO, RCP) - Value *E = getMulHu(Builder, ABS_RCP_LO, RCP); - - // RCP_A_E = RCP + E - Value *RCP_A_E = Builder.CreateAdd(RCP, E); - - // RCP_S_E = RCP - E - Value *RCP_S_E = Builder.CreateSub(RCP, E); - - // Tmp0 = (RCP_HI == 0 ? RCP_A_E : RCP_SUB_E) - Value *Tmp0 = Builder.CreateSelect(RCP_HI_0_CC, RCP_A_E, RCP_S_E); - - // Quotient = mulhu(Tmp0, Num) - Value *Quotient = getMulHu(Builder, Tmp0, Num); - - // Num_S_Remainder = Quotient * Den - Value *Num_S_Remainder = Builder.CreateMul(Quotient, Den); - - // Remainder = Num - Num_S_Remainder - Value *Remainder = Builder.CreateSub(Num, Num_S_Remainder); - - // Remainder_GE_Den = Remainder >= Den; - Value *Remainder_GE_Den = Builder.CreateICmpUGE(Remainder, Den); - - // Remainder_GE_Zero = Num >= Num_S_Remainder - Value *Remainder_GE_Zero = Builder.CreateICmpUGE(Num, Num_S_Remainder); - - // Tmp1 = Remainder_GE_Den & Remainder_GE_Zero - Value *Tmp1 = Builder.CreateAnd(Remainder_GE_Den, Remainder_GE_Zero); - + // The algorithm here is based on ideas from "Software Integer Division", Tom + // Rodeheffer, August 2008. 
+ // + // unsigned udiv(unsigned x, unsigned y) { + // // Initial estimate of inv(y). The constant is less than 2^32 to ensure + // // that this is a lower bound on inv(y), even if some of the calculations + // // round up. + // unsigned z = (unsigned)((4294967296.0 - 512.0) * v_rcp_f32((float)y)); + // + // // One round of UNR (Unsigned integer Newton-Raphson) to improve z. + // // Empirically this is guaranteed to give a "two-y" lower bound on + // // inv(y). + // z += umulh(z, -y * z); + // + // // Quotient/remainder estimate. + // unsigned q = umulh(x, z); + // unsigned r = x - q * y; + // + // // Two rounds of quotient/remainder refinement. + // if (r >= y) { + // ++q; + // r -= y; + // } + // if (r >= y) { + // ++q; + // r -= y; + // } + // + // return q; + // } + + // Initial estimate of inv(y). + Value *FloatY = Builder.CreateUIToFP(Y, F32Ty); + Function *Rcp = Intrinsic::getDeclaration(Mod, Intrinsic::amdgcn_rcp, F32Ty); + Value *RcpY = Builder.CreateCall(Rcp, {FloatY}); + Constant *Scale = ConstantFP::get(F32Ty, BitsToFloat(0x4F7FFFFE)); + Value *ScaledY = Builder.CreateFMul(RcpY, Scale); + Value *Z = Builder.CreateFPToUI(ScaledY, I32Ty); + + // One round of UNR. + Value *NegY = Builder.CreateSub(Zero, Y); + Value *NegYZ = Builder.CreateMul(NegY, Z); + Z = Builder.CreateAdd(Z, getMulHu(Builder, Z, NegYZ)); + + // Quotient/remainder estimate. + Value *Q = getMulHu(Builder, X, Z); + Value *R = Builder.CreateSub(X, Builder.CreateMul(Q, Y)); + + // First quotient/remainder refinement. + Value *Cond = Builder.CreateICmpUGE(R, Y); + if (IsDiv) + Q = Builder.CreateSelect(Cond, Builder.CreateAdd(Q, One), Q); + R = Builder.CreateSelect(Cond, Builder.CreateSub(R, Y), R); + + // Second quotient/remainder refinement. 
+ Cond = Builder.CreateICmpUGE(R, Y); Value *Res; - if (IsDiv) { - // Quotient_A_One = Quotient + 1 - Value *Quotient_A_One = Builder.CreateAdd(Quotient, One); - - // Quotient_S_One = Quotient - 1 - Value *Quotient_S_One = Builder.CreateSub(Quotient, One); - - // Div = (Tmp1 ? Quotient_A_One : Quotient) - Value *Div = Builder.CreateSelect(Tmp1, Quotient_A_One, Quotient); - - // Div = (Remainder_GE_Zero ? Div : Quotient_S_One) - Res = Builder.CreateSelect(Remainder_GE_Zero, Div, Quotient_S_One); - } else { - // Remainder_S_Den = Remainder - Den - Value *Remainder_S_Den = Builder.CreateSub(Remainder, Den); - - // Remainder_A_Den = Remainder + Den - Value *Remainder_A_Den = Builder.CreateAdd(Remainder, Den); - - // Rem = (Tmp1 ? Remainder_S_Den : Remainder) - Value *Rem = Builder.CreateSelect(Tmp1, Remainder_S_Den, Remainder); - - // Rem = (Remainder_GE_Zero ? Rem : Remainder_A_Den) - Res = Builder.CreateSelect(Remainder_GE_Zero, Rem, Remainder_A_Den); - } + if (IsDiv) + Res = Builder.CreateSelect(Cond, Builder.CreateAdd(Q, One), Q); + else + Res = Builder.CreateSelect(Cond, Builder.CreateSub(R, Y), R); if (IsSigned) { Res = Builder.CreateXor(Res, Sign); diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll index 19c359b9a4f7..dc9910e9ed21 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll @@ -53,39 +53,32 @@ define i32 @v_sdiv_i32(i32 %num, i32 %den) { ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v3 ; CGP-NEXT: v_cvt_f32_u32_e32 v2, v1 -; CGP-NEXT: v_mul_lo_u32 v3, 0, v1 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v5, v0, 0 ; CGP-NEXT: v_rcp_f32_e32 v2, v2 -; CGP-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CGP-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CGP-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CGP-NEXT: v_mul_lo_u32 v6, v2, v1 -; CGP-NEXT: v_mul_lo_u32 v7, v2, 0 -; 
CGP-NEXT: v_mul_hi_u32 v8, v2, v1 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7 -; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v6 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v8 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CGP-NEXT: v_cndmask_b32_e32 v3, v6, v9, vcc -; CGP-NEXT: v_mul_lo_u32 v6, v3, 0 -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 -; CGP-NEXT: v_add_i32_e64 v6, s[4:5], v7, v6 -; CGP-NEXT: v_add_i32_e64 v3, s[4:5], v6, v3 -; CGP-NEXT: v_add_i32_e64 v6, s[4:5], v2, v3 -; CGP-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v6, vcc -; CGP-NEXT: v_mul_lo_u32 v3, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v2, v2, v0 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3 +; CGP-NEXT: v_mul_lo_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v6, v2, 0 +; CGP-NEXT: v_mul_lo_u32 v7, 0, v3 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v6, v3 +; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_mul_lo_u32 v3, 0, v2 +; CGP-NEXT: v_mul_hi_u32 v2, v0, v2 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v5 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2 ; CGP-NEXT: v_mul_lo_u32 v3, v2, v1 ; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v2 -; CGP-NEXT: v_subrev_i32_e32 v6, vcc, 1, v2 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v0, v3 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v1 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v2, v5, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CGP-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; CGP-NEXT: v_xor_b32_e32 v0, v0, v4 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 ; CGP-NEXT: s_setpc_b64 s[30:31] @@ -138,50 +131,41 @@ define 
amdgpu_ps i32 @s_sdiv_i32(i32 inreg %num, i32 inreg %den) { ; CGP: ; %bb.0: ; CGP-NEXT: s_ashr_i32 s2, s0, 31 ; CGP-NEXT: s_ashr_i32 s3, s1, 31 -; CGP-NEXT: s_xor_b32 s5, s2, s3 +; CGP-NEXT: s_xor_b32 s4, s2, s3 ; CGP-NEXT: s_add_i32 s0, s0, s2 ; CGP-NEXT: s_add_i32 s1, s1, s3 -; CGP-NEXT: s_xor_b32 s2, s0, s2 -; CGP-NEXT: s_xor_b32 s4, s1, s3 -; CGP-NEXT: v_cvt_f32_u32_e32 v0, s4 -; CGP-NEXT: s_bfe_u64 s[0:1], s[4:5], 0x200000 -; CGP-NEXT: s_bfe_u64 s[6:7], s[2:3], 0x200000 +; CGP-NEXT: s_xor_b32 s0, s0, s2 +; CGP-NEXT: s_xor_b32 s5, s1, s3 +; CGP-NEXT: v_cvt_f32_u32_e32 v0, s5 +; CGP-NEXT: s_sub_i32 s1, 0, s5 +; CGP-NEXT: s_bfe_u64 s[2:3], s[0:1], 0x200000 ; CGP-NEXT: v_rcp_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v1, 0, s0 -; CGP-NEXT: v_mul_lo_u32 v2, 0, s6 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_lo_u32 v1, s2, 0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v3, v0, s0 -; CGP-NEXT: v_mul_lo_u32 v4, v0, s1 -; CGP-NEXT: v_mul_hi_u32 v5, v0, s0 -; CGP-NEXT: v_mul_lo_u32 v6, 0, v0 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4 -; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v3 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc -; CGP-NEXT: v_mul_lo_u32 v3, v1, 0 -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v3, s[0:1], v6, v3 -; CGP-NEXT: v_add_i32_e64 v1, s[0:1], v3, v1 -; CGP-NEXT: v_add_i32_e64 v3, s[0:1], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[0:1], v0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc -; CGP-NEXT: v_mul_lo_u32 v1, v0, s7 -; CGP-NEXT: v_mul_hi_u32 v0, v0, s6 +; CGP-NEXT: v_mul_lo_u32 v2, s1, v0 +; CGP-NEXT: v_mul_lo_u32 v3, v0, 0 +; CGP-NEXT: v_mul_lo_u32 v4, 0, v2 +; CGP-NEXT: v_mul_hi_u32 v2, v0, v2 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2 +; CGP-NEXT: v_mul_lo_u32 v2, s3, v0 +; 
CGP-NEXT: v_mul_hi_u32 v0, s2, v0 ; CGP-NEXT: v_add_i32_e32 v1, vcc, v2, v1 ; CGP-NEXT: v_add_i32_e32 v0, vcc, v1, v0 -; CGP-NEXT: v_mul_lo_u32 v1, v0, s4 +; CGP-NEXT: v_mul_lo_u32 v1, v0, s5 +; CGP-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, s0, v1 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s5, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CGP-NEXT: v_subrev_i32_e64 v2, s[0:1], s5, v1 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc ; CGP-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; CGP-NEXT: v_subrev_i32_e32 v3, vcc, 1, v0 -; CGP-NEXT: v_sub_i32_e32 v4, vcc, s2, v1 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, s2, v1 -; CGP-NEXT: v_cmp_le_u32_e64 s[0:1], s4, v4 -; CGP-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; CGP-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CGP-NEXT: v_xor_b32_e32 v0, s5, v0 -; CGP-NEXT: v_subrev_i32_e32 v0, vcc, s5, v0 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s5, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CGP-NEXT: v_xor_b32_e32 v0, s4, v0 +; CGP-NEXT: v_subrev_i32_e32 v0, vcc, s4, v0 ; CGP-NEXT: v_readfirstlane_b32 s0, v0 ; CGP-NEXT: ; return to shader part epilog %result = sdiv i32 %num, %den @@ -277,73 +261,59 @@ define <2 x i32> @v_sdiv_v2i32(<2 x i32> %num, <2 x i32> %den) { ; CGP-NEXT: v_xor_b32_e32 v1, v1, v6 ; CGP-NEXT: v_xor_b32_e32 v3, v3, v7 ; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v2 -; CGP-NEXT: v_mul_lo_u32 v6, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; CGP-NEXT: v_mul_lo_u32 v6, v0, 0 ; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 -; CGP-NEXT: v_mul_lo_u32 v10, 0, v3 -; CGP-NEXT: v_mul_lo_u32 v11, 0, v1 +; CGP-NEXT: v_sub_i32_e32 v10, vcc, 0, v3 +; CGP-NEXT: v_mul_lo_u32 v11, v1, 0 ; CGP-NEXT: v_rcp_f32_e32 v4, v4 ; CGP-NEXT: v_rcp_f32_e32 v7, v7 -; CGP-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CGP-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: 
v_cvt_u32_f32_e32 v4, v4 ; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 -; CGP-NEXT: v_mul_lo_u32 v12, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v13, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v14, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v15, v7, v3 -; CGP-NEXT: v_mul_lo_u32 v16, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v17, v7, v3 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v13 -; CGP-NEXT: v_sub_i32_e32 v18, vcc, 0, v12 -; CGP-NEXT: v_add_i32_e32 v10, vcc, v10, v16 -; CGP-NEXT: v_sub_i32_e32 v19, vcc, 0, v15 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v14 -; CGP-NEXT: v_add_i32_e32 v10, vcc, v10, v17 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v5, v12, v18, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v10 -; CGP-NEXT: v_cndmask_b32_e64 v10, v15, v19, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v12, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v4 -; CGP-NEXT: v_mul_lo_u32 v14, v10, 0 -; CGP-NEXT: v_mul_hi_u32 v10, v10, v7 -; CGP-NEXT: v_add_i32_e64 v12, s[6:7], v13, v12 -; CGP-NEXT: v_add_i32_e64 v13, s[6:7], v16, v14 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v12, v5 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v13, v10 -; CGP-NEXT: v_add_i32_e64 v12, s[6:7], v4, v5 -; CGP-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v5 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v7, v10 -; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v10 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v12, vcc -; CGP-NEXT: v_cndmask_b32_e64 v5, v7, v5, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v7, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v4, v4, v0 -; CGP-NEXT: v_mul_lo_u32 v10, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v1 -; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v11, v10 +; CGP-NEXT: v_mul_lo_u32 v5, v5, v4 +; CGP-NEXT: v_mul_lo_u32 v12, v4, 0 +; CGP-NEXT: v_mul_lo_u32 v10, v10, v7 +; CGP-NEXT: v_mul_lo_u32 v13, v7, 0 +; CGP-NEXT: v_mul_lo_u32 v14, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v4, v5 +; CGP-NEXT: v_mul_lo_u32 v15, 0, v10 +; CGP-NEXT: v_mul_hi_u32 v10, v7, v10 +; CGP-NEXT: v_add_i32_e32 v12, vcc, v14, v12 +; CGP-NEXT: v_add_i32_e32 v13, vcc, v15, 
v13 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v12, v5 +; CGP-NEXT: v_add_i32_e32 v10, vcc, v13, v10 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v10 +; CGP-NEXT: v_mul_lo_u32 v7, 0, v4 +; CGP-NEXT: v_mul_hi_u32 v4, v0, v4 +; CGP-NEXT: v_mul_lo_u32 v10, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v1, v5 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v10, v11 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v5 ; CGP-NEXT: v_mul_lo_u32 v6, v4, v2 ; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; CGP-NEXT: v_subrev_i32_e32 v10, vcc, 1, v4 -; CGP-NEXT: v_mul_lo_u32 v11, v5, v3 -; CGP-NEXT: v_add_i32_e32 v12, vcc, 1, v5 -; CGP-NEXT: v_subrev_i32_e32 v13, vcc, 1, v5 -; CGP-NEXT: v_sub_i32_e32 v14, vcc, v0, v6 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v11 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v11 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v14, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v5, v12, s[6:7] -; CGP-NEXT: v_cndmask_b32_e32 v0, v10, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v1, v13, v1, s[4:5] +; CGP-NEXT: v_mul_lo_u32 v10, v5, v3 +; CGP-NEXT: v_add_i32_e32 v11, vcc, 1, v5 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v10 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; CGP-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v5, v5, v11, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: 
v_cndmask_b32_e32 v0, v4, v6, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; CGP-NEXT: v_xor_b32_e32 v0, v0, v8 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v9 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v8 @@ -815,78 +785,64 @@ define <2 x i32> @v_sdiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) { ; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 ; CGP-NEXT: v_xor_b32_e32 v4, v4, v6 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v6 -; CGP-NEXT: v_mul_lo_u32 v8, 0, v0 +; CGP-NEXT: v_mul_lo_u32 v8, v0, 0 ; CGP-NEXT: v_xor_b32_e32 v5, v5, v7 ; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7 -; CGP-NEXT: v_mul_lo_u32 v9, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v9, v1, 0 ; CGP-NEXT: v_xor_b32_e32 v2, v2, v6 ; CGP-NEXT: v_xor_b32_e32 v3, v3, v7 ; CGP-NEXT: v_cvt_f32_u32_e32 v6, v2 -; CGP-NEXT: v_mul_lo_u32 v7, 0, v2 +; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v2 ; CGP-NEXT: v_cvt_f32_u32_e32 v10, v3 -; CGP-NEXT: v_mul_lo_u32 v11, 0, v3 +; CGP-NEXT: v_sub_i32_e32 v11, vcc, 0, v3 ; CGP-NEXT: v_rcp_f32_e32 v6, v6 ; CGP-NEXT: v_rcp_f32_e32 v10, v10 -; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; CGP-NEXT: v_mul_f32_e32 v10, 0x4f800000, v10 +; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; CGP-NEXT: v_mul_f32_e32 v10, 0x4f7ffffe, v10 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6 ; CGP-NEXT: v_cvt_u32_f32_e32 v10, v10 -; CGP-NEXT: v_mul_lo_u32 v12, v6, v2 -; CGP-NEXT: v_mul_lo_u32 v13, v6, 0 -; CGP-NEXT: v_mul_hi_u32 v14, v6, v2 -; CGP-NEXT: v_mul_lo_u32 v15, v10, v3 -; CGP-NEXT: v_mul_lo_u32 v16, v10, 0 -; CGP-NEXT: v_mul_hi_u32 v17, v10, v3 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v13 -; CGP-NEXT: v_sub_i32_e32 v18, vcc, 0, v12 -; CGP-NEXT: v_add_i32_e32 v11, vcc, v11, v16 -; CGP-NEXT: v_sub_i32_e32 v19, vcc, 0, v15 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v14 -; CGP-NEXT: v_add_i32_e32 v11, vcc, v11, v17 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; CGP-NEXT: v_cndmask_b32_e32 v7, v12, v18, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11 -; CGP-NEXT: v_cndmask_b32_e64 v11, v15, v19, s[4:5] 
-; CGP-NEXT: v_mul_lo_u32 v12, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v7, v6 -; CGP-NEXT: v_mul_lo_u32 v14, v11, 0 -; CGP-NEXT: v_mul_hi_u32 v11, v11, v10 -; CGP-NEXT: v_add_i32_e64 v12, s[6:7], v13, v12 -; CGP-NEXT: v_add_i32_e64 v13, s[6:7], v16, v14 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v12, v7 -; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v13, v11 -; CGP-NEXT: v_add_i32_e64 v12, s[6:7], v6, v7 -; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v7 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v10, v11 -; CGP-NEXT: v_sub_i32_e64 v10, s[6:7], v10, v11 -; CGP-NEXT: v_cndmask_b32_e32 v6, v6, v12, vcc -; CGP-NEXT: v_cndmask_b32_e64 v7, v10, v7, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v10, v6, 0 -; CGP-NEXT: v_mul_hi_u32 v6, v6, v0 -; CGP-NEXT: v_mul_lo_u32 v11, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v7, v1 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v10 -; CGP-NEXT: v_add_i32_e32 v9, vcc, v9, v11 +; CGP-NEXT: v_mul_lo_u32 v7, v7, v6 +; CGP-NEXT: v_mul_lo_u32 v12, v6, 0 +; CGP-NEXT: v_mul_lo_u32 v11, v11, v10 +; CGP-NEXT: v_mul_lo_u32 v13, v10, 0 +; CGP-NEXT: v_mul_lo_u32 v14, 0, v7 +; CGP-NEXT: v_mul_hi_u32 v7, v6, v7 +; CGP-NEXT: v_mul_lo_u32 v15, 0, v11 +; CGP-NEXT: v_mul_hi_u32 v11, v10, v11 +; CGP-NEXT: v_add_i32_e32 v12, vcc, v14, v12 +; CGP-NEXT: v_add_i32_e32 v13, vcc, v15, v13 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v12, v7 +; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v10, v11 +; CGP-NEXT: v_mul_lo_u32 v10, 0, v6 +; CGP-NEXT: v_mul_hi_u32 v6, v0, v6 +; CGP-NEXT: v_mul_lo_u32 v11, 0, v7 +; CGP-NEXT: v_mul_hi_u32 v7, v1, v7 +; CGP-NEXT: v_add_i32_e32 v8, vcc, v10, v8 +; CGP-NEXT: v_add_i32_e32 v9, vcc, v11, v9 ; CGP-NEXT: v_add_i32_e32 v6, vcc, v8, v6 ; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v7 ; CGP-NEXT: v_mul_lo_u32 v8, v6, v2 ; CGP-NEXT: v_add_i32_e32 v9, vcc, 1, v6 -; CGP-NEXT: v_subrev_i32_e32 v10, vcc, 1, v6 -; CGP-NEXT: v_mul_lo_u32 v11, v7, v3 -; CGP-NEXT: v_add_i32_e32 v12, vcc, 1, v7 -; CGP-NEXT: v_subrev_i32_e32 
v13, vcc, 1, v7 -; CGP-NEXT: v_sub_i32_e32 v14, vcc, v0, v8 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v8 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v11 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v11 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v14, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v6, v9, s[6:7] -; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v7, v12, s[6:7] -; CGP-NEXT: v_cndmask_b32_e32 v0, v10, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v1, v13, v1, s[4:5] +; CGP-NEXT: v_mul_lo_u32 v10, v7, v3 +; CGP-NEXT: v_add_i32_e32 v11, vcc, 1, v7 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v8 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v10 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v6, v6, v9, vcc +; CGP-NEXT: v_sub_i32_e64 v8, s[4:5], v0, v2 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v7, v7, v11, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v9, s[6:7], v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v8, vcc +; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v6 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v9, s[4:5] +; CGP-NEXT: v_add_i32_e32 v9, vcc, 1, v7 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v8, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v7, v9, vcc ; CGP-NEXT: v_xor_b32_e32 v0, v0, v4 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 @@ -945,39 +901,32 @@ define i32 @v_sdiv_i32_24bit(i32 %num, i32 %den) { ; CGP-NEXT: v_and_b32_e32 v0, s4, v0 ; CGP-NEXT: v_and_b32_e32 v1, s4, v1 ; CGP-NEXT: v_cvt_f32_u32_e32 v2, v1 -; CGP-NEXT: v_mul_lo_u32 v3, 0, v1 -; CGP-NEXT: v_mul_lo_u32 v4, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v4, v0, 0 ; CGP-NEXT: v_rcp_f32_e32 v2, v2 -; CGP-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CGP-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CGP-NEXT: v_cvt_u32_f32_e32 v2, v2 -; 
CGP-NEXT: v_mul_lo_u32 v5, v2, v1 -; CGP-NEXT: v_mul_lo_u32 v6, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v2, v1 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v5 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CGP-NEXT: v_cndmask_b32_e32 v3, v5, v8, vcc -; CGP-NEXT: v_mul_lo_u32 v5, v3, 0 -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v6, v5 -; CGP-NEXT: v_add_i32_e64 v3, s[4:5], v5, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v2, v3 -; CGP-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CGP-NEXT: v_mul_lo_u32 v3, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v2, v2, v0 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; CGP-NEXT: v_mul_lo_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v5, v2, 0 +; CGP-NEXT: v_mul_lo_u32 v6, 0, v3 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v6, v5 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3 +; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_mul_lo_u32 v3, 0, v2 +; CGP-NEXT: v_mul_hi_u32 v2, v0, v2 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2 ; CGP-NEXT: v_mul_lo_u32 v3, v2, v1 ; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; CGP-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CGP-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %num.mask = and i32 %num, 16777215 
%den.mask = and i32 %den, 16777215 @@ -1069,73 +1018,59 @@ define <2 x i32> @v_sdiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) { ; CGP-NEXT: v_and_b32_e32 v2, s4, v2 ; CGP-NEXT: v_and_b32_e32 v3, s4, v3 ; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v2 -; CGP-NEXT: v_mul_lo_u32 v6, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; CGP-NEXT: v_mul_lo_u32 v6, v0, 0 ; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 -; CGP-NEXT: v_mul_lo_u32 v8, 0, v3 -; CGP-NEXT: v_mul_lo_u32 v9, 0, v1 +; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v3 +; CGP-NEXT: v_mul_lo_u32 v9, v1, 0 ; CGP-NEXT: v_rcp_f32_e32 v4, v4 ; CGP-NEXT: v_rcp_f32_e32 v7, v7 -; CGP-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CGP-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4 ; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 -; CGP-NEXT: v_mul_lo_u32 v10, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v11, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v12, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v13, v7, v3 -; CGP-NEXT: v_mul_lo_u32 v14, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v15, v7, v3 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v11 -; CGP-NEXT: v_sub_i32_e32 v16, vcc, 0, v10 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v14 -; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v12 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v15 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v5, v10, v16, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; CGP-NEXT: v_cndmask_b32_e64 v8, v13, v17, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v10, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v4 -; CGP-NEXT: v_mul_lo_u32 v12, v8, 0 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v7 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v11, v10 -; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v10, v5 -; CGP-NEXT: v_add_i32_e64 v8, s[6:7], v11, v8 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v4, v5 -; CGP-NEXT: v_sub_i32_e64 v4, 
s[6:7], v4, v5 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v7, v8 -; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v8 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc -; CGP-NEXT: v_cndmask_b32_e64 v5, v7, v5, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v7, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v4, v4, v0 -; CGP-NEXT: v_mul_lo_u32 v8, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v1 -; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v8 +; CGP-NEXT: v_mul_lo_u32 v5, v5, v4 +; CGP-NEXT: v_mul_lo_u32 v10, v4, 0 +; CGP-NEXT: v_mul_lo_u32 v8, v8, v7 +; CGP-NEXT: v_mul_lo_u32 v11, v7, 0 +; CGP-NEXT: v_mul_lo_u32 v12, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v4, v5 +; CGP-NEXT: v_mul_lo_u32 v13, 0, v8 +; CGP-NEXT: v_mul_hi_u32 v8, v7, v8 +; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10 +; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v10, v5 +; CGP-NEXT: v_add_i32_e32 v8, vcc, v11, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v8 +; CGP-NEXT: v_mul_lo_u32 v7, 0, v4 +; CGP-NEXT: v_mul_hi_u32 v4, v0, v4 +; CGP-NEXT: v_mul_lo_u32 v8, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v1, v5 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v8, v9 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v5 ; CGP-NEXT: v_mul_lo_u32 v6, v4, v2 ; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; CGP-NEXT: v_subrev_i32_e32 v8, vcc, 1, v4 -; CGP-NEXT: v_mul_lo_u32 v9, v5, v3 -; CGP-NEXT: v_add_i32_e32 v10, vcc, 1, v5 -; CGP-NEXT: v_subrev_i32_e32 v11, vcc, 1, v5 -; CGP-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v12, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CGP-NEXT: 
v_cndmask_b32_e64 v1, v5, v10, s[6:7] -; CGP-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; CGP-NEXT: v_mul_lo_u32 v8, v5, v3 +; CGP-NEXT: v_add_i32_e32 v9, vcc, 1, v5 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; CGP-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %num.mask = and <2 x i32> %num, %den.mask = and <2 x i32> %den, diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll index c53d5627d9fe..1d2d57669976 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll @@ -51,39 +51,30 @@ define i32 @v_srem_i32(i32 %num, i32 %den) { ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v3 ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v1 -; CGP-NEXT: v_mul_lo_u32 v4, 0, v1 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v5, v0, 0 ; CGP-NEXT: v_rcp_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v6, v3, v1 -; CGP-NEXT: v_mul_lo_u32 v7, v3, 0 -; CGP-NEXT: v_mul_hi_u32 v8, v3, v1 -; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v7 -; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v6 -; CGP-NEXT: 
v_add_i32_e32 v4, vcc, v4, v8
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4
-; CGP-NEXT: v_cndmask_b32_e32 v4, v6, v9, vcc
-; CGP-NEXT: v_mul_lo_u32 v6, v4, 0
-; CGP-NEXT: v_mul_hi_u32 v4, v4, v3
-; CGP-NEXT: v_add_i32_e64 v6, s[4:5], v7, v6
-; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v6, v4
-; CGP-NEXT: v_add_i32_e64 v6, s[4:5], v3, v4
-; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4
-; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v6, vcc
-; CGP-NEXT: v_mul_lo_u32 v4, v3, 0
-; CGP-NEXT: v_mul_hi_u32 v3, v3, v0
-; CGP-NEXT: v_add_i32_e32 v4, vcc, v5, v4
+; CGP-NEXT: v_mul_lo_u32 v4, v4, v3
+; CGP-NEXT: v_mul_lo_u32 v6, v3, 0
+; CGP-NEXT: v_mul_lo_u32 v7, 0, v4
+; CGP-NEXT: v_mul_hi_u32 v4, v3, v4
+; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6
+; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4
+; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4
+; CGP-NEXT: v_mul_lo_u32 v4, 0, v3
+; CGP-NEXT: v_mul_hi_u32 v3, v0, v3
+; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5
 ; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3
 ; CGP-NEXT: v_mul_lo_u32 v3, v3, v1
-; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v3
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v4, v1
-; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v4, v1
-; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3
-; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v4, v1
-; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5]
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v3
+; CGP-NEXT: v_sub_i32_e32 v3, vcc, v0, v1
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc
+; CGP-NEXT: v_sub_i32_e32 v3, vcc, v0, v1
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc
 ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2
 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2
 ; CGP-NEXT: s_setpc_b64 s[30:31]
@@ -133,51 +124,40 @@ define amdgpu_ps i32 @s_srem_i32(i32 inreg %num, i32 inreg %den) {
 ;
 ; CGP-LABEL: s_srem_i32:
 ; CGP: ; %bb.0:
-; CGP-NEXT: s_ashr_i32 s5, s0, 31
-; CGP-NEXT: s_ashr_i32 s3, s1, 31
-; CGP-NEXT:
s_add_i32 s0, s0, s5
-; CGP-NEXT: s_add_i32 s1, s1, s3
-; CGP-NEXT: s_xor_b32 s2, s0, s5
-; CGP-NEXT: s_xor_b32 s4, s1, s3
-; CGP-NEXT: v_cvt_f32_u32_e32 v0, s4
-; CGP-NEXT: s_bfe_u64 s[0:1], s[4:5], 0x200000
-; CGP-NEXT: s_bfe_u64 s[6:7], s[2:3], 0x200000
+; CGP-NEXT: s_ashr_i32 s4, s0, 31
+; CGP-NEXT: s_ashr_i32 s2, s1, 31
+; CGP-NEXT: s_add_i32 s0, s0, s4
+; CGP-NEXT: s_add_i32 s1, s1, s2
+; CGP-NEXT: s_xor_b32 s0, s0, s4
+; CGP-NEXT: s_xor_b32 s1, s1, s2
+; CGP-NEXT: v_cvt_f32_u32_e32 v0, s1
+; CGP-NEXT: s_sub_i32 s5, 0, s1
+; CGP-NEXT: s_bfe_u64 s[2:3], s[0:1], 0x200000
 ; CGP-NEXT: v_rcp_f32_e32 v0, v0
-; CGP-NEXT: v_mul_lo_u32 v1, 0, s0
-; CGP-NEXT: v_mul_lo_u32 v2, 0, s6
-; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0
+; CGP-NEXT: v_mul_lo_u32 v1, s2, 0
+; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0
 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0
-; CGP-NEXT: v_mul_lo_u32 v3, v0, s0
-; CGP-NEXT: v_mul_lo_u32 v4, v0, s1
-; CGP-NEXT: v_mul_hi_u32 v5, v0, s0
-; CGP-NEXT: v_mul_lo_u32 v6, 0, v0
-; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4
-; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v3
-; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
-; CGP-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
-; CGP-NEXT: v_mul_lo_u32 v3, v1, 0
-; CGP-NEXT: v_mul_hi_u32 v1, v1, v0
-; CGP-NEXT: v_add_i32_e64 v3, s[0:1], v6, v3
-; CGP-NEXT: v_add_i32_e64 v1, s[0:1], v3, v1
-; CGP-NEXT: v_add_i32_e64 v3, s[0:1], v0, v1
-; CGP-NEXT: v_sub_i32_e64 v0, s[0:1], v0, v1
-; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc
-; CGP-NEXT: v_mul_lo_u32 v1, v0, s7
-; CGP-NEXT: v_mul_hi_u32 v0, v0, s6
+; CGP-NEXT: v_mul_lo_u32 v2, s5, v0
+; CGP-NEXT: v_mul_lo_u32 v3, v0, 0
+; CGP-NEXT: v_mul_lo_u32 v4, 0, v2
+; CGP-NEXT: v_mul_hi_u32 v2, v0, v2
+; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3
+; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2
+; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2
+; CGP-NEXT: v_mul_lo_u32 v2, s3, v0
+; CGP-NEXT: v_mul_hi_u32 v0, s2, v0
 ; CGP-NEXT: v_add_i32_e32 v1, vcc, v2, v1
 ; CGP-NEXT:
v_add_i32_e32 v0, vcc, v1, v0
-; CGP-NEXT: v_mul_lo_u32 v0, v0, s4
-; CGP-NEXT: v_sub_i32_e32 v1, vcc, s2, v0
-; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v1
-; CGP-NEXT: v_add_i32_e64 v2, s[0:1], s4, v1
-; CGP-NEXT: v_cmp_ge_u32_e64 s[0:1], s2, v0
-; CGP-NEXT: v_subrev_i32_e64 v0, s[2:3], s4, v1
-; CGP-NEXT: s_and_b64 vcc, vcc, s[0:1]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[0:1]
-; CGP-NEXT: v_xor_b32_e32 v0, s5, v0
-; CGP-NEXT: v_subrev_i32_e32 v0, vcc, s5, v0
+; CGP-NEXT: v_mul_lo_u32 v0, v0, s1
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, s0, v0
+; CGP-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0
+; CGP-NEXT: v_cmp_le_u32_e32 vcc, s1, v0
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc
+; CGP-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0
+; CGP-NEXT: v_cmp_le_u32_e32 vcc, s1, v0
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc
+; CGP-NEXT: v_xor_b32_e32 v0, s4, v0
+; CGP-NEXT: v_subrev_i32_e32 v0, vcc, s4, v0
 ; CGP-NEXT: v_readfirstlane_b32 s0, v0
 ; CGP-NEXT: ; return to shader part epilog
 %result = srem i32 %num, %den
@@ -269,73 +249,55 @@ define <2 x i32> @v_srem_v2i32(<2 x i32> %num, <2 x i32> %den) {
 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v6
 ; CGP-NEXT: v_xor_b32_e32 v3, v3, v7
 ; CGP-NEXT: v_cvt_f32_u32_e32 v5, v2
-; CGP-NEXT: v_mul_lo_u32 v7, 0, v2
-; CGP-NEXT: v_mul_lo_u32 v8, 0, v0
+; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v2
+; CGP-NEXT: v_mul_lo_u32 v8, v0, 0
 ; CGP-NEXT: v_cvt_f32_u32_e32 v9, v3
-; CGP-NEXT: v_mul_lo_u32 v10, 0, v3
-; CGP-NEXT: v_mul_lo_u32 v11, 0, v1
+; CGP-NEXT: v_sub_i32_e32 v10, vcc, 0, v3
+; CGP-NEXT: v_mul_lo_u32 v11, v1, 0
 ; CGP-NEXT: v_rcp_f32_e32 v5, v5
 ; CGP-NEXT: v_rcp_f32_e32 v9, v9
-; CGP-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5
-; CGP-NEXT: v_mul_f32_e32 v9, 0x4f800000, v9
+; CGP-NEXT: v_mul_f32_e32 v5, 0x4f7ffffe, v5
+; CGP-NEXT: v_mul_f32_e32 v9, 0x4f7ffffe, v9
 ; CGP-NEXT: v_cvt_u32_f32_e32 v5, v5
 ; CGP-NEXT: v_cvt_u32_f32_e32 v9, v9
-; CGP-NEXT: v_mul_lo_u32 v12, v5, v2
-; CGP-NEXT: v_mul_lo_u32 v13, v5, 0
-;
CGP-NEXT: v_mul_hi_u32 v14, v5, v2
-; CGP-NEXT: v_mul_lo_u32 v15, v9, v3
-; CGP-NEXT: v_mul_lo_u32 v16, v9, 0
-; CGP-NEXT: v_mul_hi_u32 v17, v9, v3
-; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v13
-; CGP-NEXT: v_sub_i32_e32 v18, vcc, 0, v12
-; CGP-NEXT: v_add_i32_e32 v10, vcc, v10, v16
-; CGP-NEXT: v_sub_i32_e32 v19, vcc, 0, v15
-; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v14
-; CGP-NEXT: v_add_i32_e32 v10, vcc, v10, v17
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7
-; CGP-NEXT: v_cndmask_b32_e32 v7, v12, v18, vcc
-; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v10
-; CGP-NEXT: v_cndmask_b32_e64 v10, v15, v19, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v12, v7, 0
-; CGP-NEXT: v_mul_hi_u32 v7, v7, v5
-; CGP-NEXT: v_mul_lo_u32 v14, v10, 0
-; CGP-NEXT: v_mul_hi_u32 v10, v10, v9
-; CGP-NEXT: v_add_i32_e64 v12, s[6:7], v13, v12
-; CGP-NEXT: v_add_i32_e64 v13, s[6:7], v16, v14
-; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v12, v7
-; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v13, v10
-; CGP-NEXT: v_add_i32_e64 v12, s[6:7], v5, v7
-; CGP-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7
-; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v9, v10
-; CGP-NEXT: v_sub_i32_e64 v9, s[6:7], v9, v10
-; CGP-NEXT: v_cndmask_b32_e32 v5, v5, v12, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v7, v9, v7, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v9, v5, 0
-; CGP-NEXT: v_mul_hi_u32 v5, v5, v0
-; CGP-NEXT: v_mul_lo_u32 v10, v7, 0
-; CGP-NEXT: v_mul_hi_u32 v7, v7, v1
-; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v9
-; CGP-NEXT: v_add_i32_e32 v9, vcc, v11, v10
+; CGP-NEXT: v_mul_lo_u32 v7, v7, v5
+; CGP-NEXT: v_mul_lo_u32 v12, v5, 0
+; CGP-NEXT: v_mul_lo_u32 v10, v10, v9
+; CGP-NEXT: v_mul_lo_u32 v13, v9, 0
+; CGP-NEXT: v_mul_lo_u32 v14, 0, v7
+; CGP-NEXT: v_mul_hi_u32 v7, v5, v7
+; CGP-NEXT: v_mul_lo_u32 v15, 0, v10
+; CGP-NEXT: v_mul_hi_u32 v10, v9, v10
+; CGP-NEXT: v_add_i32_e32 v12, vcc, v14, v12
+; CGP-NEXT: v_add_i32_e32 v13, vcc, v15, v13
+; CGP-NEXT: v_add_i32_e32 v7, vcc, v12, v7
+; CGP-NEXT: v_add_i32_e32 v10, vcc, v13, v10
+; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v7
+;
CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v10
+; CGP-NEXT: v_mul_lo_u32 v9, 0, v5
+; CGP-NEXT: v_mul_hi_u32 v5, v0, v5
+; CGP-NEXT: v_mul_lo_u32 v10, 0, v7
+; CGP-NEXT: v_mul_hi_u32 v7, v1, v7
+; CGP-NEXT: v_add_i32_e32 v8, vcc, v9, v8
+; CGP-NEXT: v_add_i32_e32 v9, vcc, v10, v11
 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v8, v5
 ; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v7
 ; CGP-NEXT: v_mul_lo_u32 v5, v5, v2
 ; CGP-NEXT: v_mul_lo_u32 v7, v7, v3
-; CGP-NEXT: v_sub_i32_e32 v8, vcc, v0, v5
-; CGP-NEXT: v_sub_i32_e32 v9, vcc, v1, v7
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v8, v2
-; CGP-NEXT: v_add_i32_e64 v10, s[4:5], v8, v2
-; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v5
-; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v8, v2
-; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v9, v3
-; CGP-NEXT: v_add_i32_e64 v2, s[8:9], v9, v3
-; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v7
-; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v9, v3
-; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc
-; CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9]
-; CGP-NEXT: v_cndmask_b32_e32 v1, v9, v1, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9]
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v5
+; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v7
+; CGP-NEXT: v_sub_i32_e32 v5, vcc, v0, v2
+; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc
+; CGP-NEXT: v_sub_i32_e32 v5, vcc, v0, v2
+; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc
 ; CGP-NEXT: v_xor_b32_e32 v0, v0, v4
 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v6
 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4
@@ -805,77 +767,59 @@ define <2 x i32> @v_srem_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) {
 ; CGP-NEXT:
v_ashrrev_i32_e32 v7, 31, v3
 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v5
 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v6
-; CGP-NEXT: v_mul_lo_u32 v8, 0, v0
+; CGP-NEXT: v_mul_lo_u32 v8, v0, 0
 ; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7
-; CGP-NEXT: v_mul_lo_u32 v9, 0, v1
+; CGP-NEXT: v_mul_lo_u32 v9, v1, 0
 ; CGP-NEXT: v_xor_b32_e32 v2, v2, v6
 ; CGP-NEXT: v_xor_b32_e32 v3, v3, v7
 ; CGP-NEXT: v_cvt_f32_u32_e32 v6, v2
-; CGP-NEXT: v_mul_lo_u32 v7, 0, v2
+; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v2
 ; CGP-NEXT: v_cvt_f32_u32_e32 v10, v3
-; CGP-NEXT: v_mul_lo_u32 v11, 0, v3
+; CGP-NEXT: v_sub_i32_e32 v11, vcc, 0, v3
 ; CGP-NEXT: v_rcp_f32_e32 v6, v6
 ; CGP-NEXT: v_rcp_f32_e32 v10, v10
-; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6
-; CGP-NEXT: v_mul_f32_e32 v10, 0x4f800000, v10
+; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6
+; CGP-NEXT: v_mul_f32_e32 v10, 0x4f7ffffe, v10
 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6
 ; CGP-NEXT: v_cvt_u32_f32_e32 v10, v10
-; CGP-NEXT: v_mul_lo_u32 v12, v6, v2
-; CGP-NEXT: v_mul_lo_u32 v13, v6, 0
-; CGP-NEXT: v_mul_hi_u32 v14, v6, v2
-; CGP-NEXT: v_mul_lo_u32 v15, v10, v3
-; CGP-NEXT: v_mul_lo_u32 v16, v10, 0
-; CGP-NEXT: v_mul_hi_u32 v17, v10, v3
-; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v13
-; CGP-NEXT: v_sub_i32_e32 v18, vcc, 0, v12
-; CGP-NEXT: v_add_i32_e32 v11, vcc, v11, v16
-; CGP-NEXT: v_sub_i32_e32 v19, vcc, 0, v15
-; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v14
-; CGP-NEXT: v_add_i32_e32 v11, vcc, v11, v17
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7
-; CGP-NEXT: v_cndmask_b32_e32 v7, v12, v18, vcc
-; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11
-; CGP-NEXT: v_cndmask_b32_e64 v11, v15, v19, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v12, v7, 0
-; CGP-NEXT: v_mul_hi_u32 v7, v7, v6
-; CGP-NEXT: v_mul_lo_u32 v14, v11, 0
-; CGP-NEXT: v_mul_hi_u32 v11, v11, v10
-; CGP-NEXT: v_add_i32_e64 v12, s[6:7], v13, v12
-; CGP-NEXT: v_add_i32_e64 v13, s[6:7], v16, v14
-; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v12, v7
-; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v13, v11
-; CGP-NEXT: v_add_i32_e64 v12,
s[6:7], v6, v7
-; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v7
-; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v10, v11
-; CGP-NEXT: v_sub_i32_e64 v10, s[6:7], v10, v11
-; CGP-NEXT: v_cndmask_b32_e32 v6, v6, v12, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v7, v10, v7, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v10, v6, 0
-; CGP-NEXT: v_mul_hi_u32 v6, v6, v0
-; CGP-NEXT: v_mul_lo_u32 v11, v7, 0
-; CGP-NEXT: v_mul_hi_u32 v7, v7, v1
-; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v10
-; CGP-NEXT: v_add_i32_e32 v9, vcc, v9, v11
+; CGP-NEXT: v_mul_lo_u32 v7, v7, v6
+; CGP-NEXT: v_mul_lo_u32 v12, v6, 0
+; CGP-NEXT: v_mul_lo_u32 v11, v11, v10
+; CGP-NEXT: v_mul_lo_u32 v13, v10, 0
+; CGP-NEXT: v_mul_lo_u32 v14, 0, v7
+; CGP-NEXT: v_mul_hi_u32 v7, v6, v7
+; CGP-NEXT: v_mul_lo_u32 v15, 0, v11
+; CGP-NEXT: v_mul_hi_u32 v11, v10, v11
+; CGP-NEXT: v_add_i32_e32 v12, vcc, v14, v12
+; CGP-NEXT: v_add_i32_e32 v13, vcc, v15, v13
+; CGP-NEXT: v_add_i32_e32 v7, vcc, v12, v7
+; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11
+; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7
+; CGP-NEXT: v_add_i32_e32 v7, vcc, v10, v11
+; CGP-NEXT: v_mul_lo_u32 v10, 0, v6
+; CGP-NEXT: v_mul_hi_u32 v6, v0, v6
+; CGP-NEXT: v_mul_lo_u32 v11, 0, v7
+; CGP-NEXT: v_mul_hi_u32 v7, v1, v7
+; CGP-NEXT: v_add_i32_e32 v8, vcc, v10, v8
+; CGP-NEXT: v_add_i32_e32 v9, vcc, v11, v9
 ; CGP-NEXT: v_add_i32_e32 v6, vcc, v8, v6
 ; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v7
 ; CGP-NEXT: v_mul_lo_u32 v6, v6, v2
 ; CGP-NEXT: v_mul_lo_u32 v7, v7, v3
-; CGP-NEXT: v_sub_i32_e32 v8, vcc, v0, v6
-; CGP-NEXT: v_sub_i32_e32 v9, vcc, v1, v7
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v8, v2
-; CGP-NEXT: v_add_i32_e64 v10, s[4:5], v8, v2
-; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v6
-; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v8, v2
-; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v9, v3
-; CGP-NEXT: v_add_i32_e64 v2, s[8:9], v9, v3
-; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v7
-; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v9, v3
-; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc
-;
CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9]
-; CGP-NEXT: v_cndmask_b32_e32 v1, v9, v1, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9]
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6
+; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v7
+; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v2
+; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc
+; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v2
+; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc
 ; CGP-NEXT: v_xor_b32_e32 v0, v0, v4
 ; CGP-NEXT: v_xor_b32_e32 v1, v1, v5
 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4
@@ -933,39 +877,30 @@ define i32 @v_srem_i32_24bit(i32 %num, i32 %den) {
 ; CGP-NEXT: v_and_b32_e32 v0, s4, v0
 ; CGP-NEXT: v_and_b32_e32 v1, s4, v1
 ; CGP-NEXT: v_cvt_f32_u32_e32 v2, v1
-; CGP-NEXT: v_mul_lo_u32 v3, 0, v1
-; CGP-NEXT: v_mul_lo_u32 v4, 0, v0
+; CGP-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
+; CGP-NEXT: v_mul_lo_u32 v4, v0, 0
 ; CGP-NEXT: v_rcp_f32_e32 v2, v2
-; CGP-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2
+; CGP-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2
 ; CGP-NEXT: v_cvt_u32_f32_e32 v2, v2
-; CGP-NEXT: v_mul_lo_u32 v5, v2, v1
-; CGP-NEXT: v_mul_lo_u32 v6, v2, 0
-; CGP-NEXT: v_mul_hi_u32 v7, v2, v1
-; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v6
-; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v5
-; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3
-; CGP-NEXT: v_cndmask_b32_e32 v3, v5, v8, vcc
-; CGP-NEXT: v_mul_lo_u32 v5, v3, 0
-; CGP-NEXT: v_mul_hi_u32 v3, v3, v2
-; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v6, v5
-; CGP-NEXT: v_add_i32_e64 v3, s[4:5], v5, v3
-; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v2, v3
-; CGP-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3
-;
CGP-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc
-; CGP-NEXT: v_mul_lo_u32 v3, v2, 0
-; CGP-NEXT: v_mul_hi_u32 v2, v2, v0
-; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3
+; CGP-NEXT: v_mul_lo_u32 v3, v3, v2
+; CGP-NEXT: v_mul_lo_u32 v5, v2, 0
+; CGP-NEXT: v_mul_lo_u32 v6, 0, v3
+; CGP-NEXT: v_mul_hi_u32 v3, v2, v3
+; CGP-NEXT: v_add_i32_e32 v5, vcc, v6, v5
+; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3
+; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v3
+; CGP-NEXT: v_mul_lo_u32 v3, 0, v2
+; CGP-NEXT: v_mul_hi_u32 v2, v0, v2
+; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4
 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2
 ; CGP-NEXT: v_mul_lo_u32 v2, v2, v1
-; CGP-NEXT: v_sub_i32_e32 v3, vcc, v0, v2
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1
-; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1
-; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2
-; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1
-; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5]
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2
+; CGP-NEXT: v_sub_i32_e32 v2, vcc, v0, v1
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
+; CGP-NEXT: v_sub_i32_e32 v2, vcc, v0, v1
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
 ; CGP-NEXT: s_setpc_b64 s[30:31]
 %num.mask = and i32 %num, 16777215
 %den.mask = and i32 %den, 16777215
@@ -1055,73 +990,55 @@ define <2 x i32> @v_srem_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) {
 ; CGP-NEXT: v_and_b32_e32 v2, s4, v2
 ; CGP-NEXT: v_and_b32_e32 v3, s4, v3
 ; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2
-; CGP-NEXT: v_mul_lo_u32 v5, 0, v2
-; CGP-NEXT: v_mul_lo_u32 v6, 0, v0
+; CGP-NEXT: v_sub_i32_e32 v5, vcc, 0, v2
+; CGP-NEXT: v_mul_lo_u32 v6, v0, 0
 ; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3
-; CGP-NEXT: v_mul_lo_u32 v8, 0, v3
-; CGP-NEXT: v_mul_lo_u32 v9, 0, v1
+; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v3
+; CGP-NEXT: v_mul_lo_u32 v9, v1, 0
 ; CGP-NEXT: v_rcp_f32_e32 v4, v4
 ; CGP-NEXT: v_rcp_f32_e32
v7, v7
-; CGP-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4
-; CGP-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7
+; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4
+; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7
 ; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4
 ; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7
-; CGP-NEXT: v_mul_lo_u32 v10, v4, v2
-; CGP-NEXT: v_mul_lo_u32 v11, v4, 0
-; CGP-NEXT: v_mul_hi_u32 v12, v4, v2
-; CGP-NEXT: v_mul_lo_u32 v13, v7, v3
-; CGP-NEXT: v_mul_lo_u32 v14, v7, 0
-; CGP-NEXT: v_mul_hi_u32 v15, v7, v3
-; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v11
-; CGP-NEXT: v_sub_i32_e32 v16, vcc, 0, v10
-; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v14
-; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13
-; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v12
-; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v15
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5
-; CGP-NEXT: v_cndmask_b32_e32 v5, v10, v16, vcc
-; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8
-; CGP-NEXT: v_cndmask_b32_e64 v8, v13, v17, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v10, v5, 0
-; CGP-NEXT: v_mul_hi_u32 v5, v5, v4
-; CGP-NEXT: v_mul_lo_u32 v12, v8, 0
-; CGP-NEXT: v_mul_hi_u32 v8, v8, v7
-; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v11, v10
-; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12
-; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v10, v5
-; CGP-NEXT: v_add_i32_e64 v8, s[6:7], v11, v8
-; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v4, v5
-; CGP-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v5
-; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v7, v8
-; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v8
-; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v5, v7, v5, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v7, v4, 0
-; CGP-NEXT: v_mul_hi_u32 v4, v4, v0
-; CGP-NEXT: v_mul_lo_u32 v8, v5, 0
-; CGP-NEXT: v_mul_hi_u32 v5, v5, v1
-; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7
-; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v8
+; CGP-NEXT: v_mul_lo_u32 v5, v5, v4
+; CGP-NEXT: v_mul_lo_u32 v10, v4, 0
+; CGP-NEXT: v_mul_lo_u32 v8, v8, v7
+; CGP-NEXT: v_mul_lo_u32 v11, v7, 0
+; CGP-NEXT: v_mul_lo_u32 v12, 0, v5
+; CGP-NEXT:
v_mul_hi_u32 v5, v4, v5
+; CGP-NEXT: v_mul_lo_u32 v13, 0, v8
+; CGP-NEXT: v_mul_hi_u32 v8, v7, v8
+; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10
+; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11
+; CGP-NEXT: v_add_i32_e32 v5, vcc, v10, v5
+; CGP-NEXT: v_add_i32_e32 v8, vcc, v11, v8
+; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5
+; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v8
+; CGP-NEXT: v_mul_lo_u32 v7, 0, v4
+; CGP-NEXT: v_mul_hi_u32 v4, v0, v4
+; CGP-NEXT: v_mul_lo_u32 v8, 0, v5
+; CGP-NEXT: v_mul_hi_u32 v5, v1, v5
+; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6
+; CGP-NEXT: v_add_i32_e32 v7, vcc, v8, v9
 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4
 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v5
 ; CGP-NEXT: v_mul_lo_u32 v4, v4, v2
 ; CGP-NEXT: v_mul_lo_u32 v5, v5, v3
-; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v4
-; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v5
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v6, v2
-; CGP-NEXT: v_add_i32_e64 v8, s[4:5], v6, v2
-; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v4
-; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v6, v2
-; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v7, v3
-; CGP-NEXT: v_add_i32_e64 v2, s[8:9], v7, v3
-; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v5
-; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v7, v3
-; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc
-; CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9]
-; CGP-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9]
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4
+; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5
+; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2
+; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc
+; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2
+; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4,
vcc
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc
 ; CGP-NEXT: s_setpc_b64 s[30:31]
 %num.mask = and <2 x i32> %num,
 %den.mask = and <2 x i32> %den,
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll
index 31ce2d033eea..336305347f53 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll
@@ -37,39 +37,32 @@ define i32 @v_udiv_i32(i32 %num, i32 %den) {
 ; CGP: ; %bb.0:
 ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CGP-NEXT: v_cvt_f32_u32_e32 v2, v1
-; CGP-NEXT: v_mul_lo_u32 v3, 0, v1
-; CGP-NEXT: v_mul_lo_u32 v4, 0, v0
+; CGP-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
+; CGP-NEXT: v_mul_lo_u32 v4, v0, 0
 ; CGP-NEXT: v_rcp_f32_e32 v2, v2
-; CGP-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2
+; CGP-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2
 ; CGP-NEXT: v_cvt_u32_f32_e32 v2, v2
-; CGP-NEXT: v_mul_lo_u32 v5, v2, v1
-; CGP-NEXT: v_mul_lo_u32 v6, v2, 0
-; CGP-NEXT: v_mul_hi_u32 v7, v2, v1
-; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v6
-; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v5
-; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3
-; CGP-NEXT: v_cndmask_b32_e32 v3, v5, v8, vcc
-; CGP-NEXT: v_mul_lo_u32 v5, v3, 0
-; CGP-NEXT: v_mul_hi_u32 v3, v3, v2
-; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v6, v5
-; CGP-NEXT: v_add_i32_e64 v3, s[4:5], v5, v3
-; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v2, v3
-; CGP-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3
-; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc
-; CGP-NEXT: v_mul_lo_u32 v3, v2, 0
-; CGP-NEXT: v_mul_hi_u32 v2, v2, v0
-; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3
+; CGP-NEXT: v_mul_lo_u32 v3, v3, v2
+; CGP-NEXT: v_mul_lo_u32 v5, v2, 0
+; CGP-NEXT: v_mul_lo_u32 v6, 0, v3
+; CGP-NEXT: v_mul_hi_u32 v3, v2, v3
+; CGP-NEXT: v_add_i32_e32 v5, vcc, v6, v5
+; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3
+; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v3
+; CGP-NEXT: v_mul_lo_u32 v3, 0, v2
+;
CGP-NEXT: v_mul_hi_u32 v2, v0, v2
+; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4
 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2
 ; CGP-NEXT: v_mul_lo_u32 v3, v2, v1
 ; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v2
-; CGP-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2
-; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v3
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3
-; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1
-; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v3
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1
+; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc
+; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc
+; CGP-NEXT: v_add_i32_e32 v3, vcc, 1, v2
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
 ; CGP-NEXT: s_setpc_b64 s[30:31]
 %result = udiv i32 %num, %den
 ret i32 %result
@@ -109,44 +102,34 @@ define amdgpu_ps i32 @s_udiv_i32(i32 inreg %num, i32 inreg %den) {
 ;
 ; CGP-LABEL: s_udiv_i32:
 ; CGP: ; %bb.0:
-; CGP-NEXT: s_mov_b32 s4, s1
-; CGP-NEXT: v_cvt_f32_u32_e32 v0, s4
-; CGP-NEXT: s_bfe_u64 s[2:3], s[4:5], 0x200000
-; CGP-NEXT: s_bfe_u64 s[6:7], s[0:1], 0x200000
+; CGP-NEXT: v_cvt_f32_u32_e32 v0, s1
+; CGP-NEXT: s_sub_i32 s4, 0, s1
+; CGP-NEXT: s_bfe_u64 s[2:3], s[0:1], 0x200000
 ; CGP-NEXT: v_rcp_f32_e32 v0, v0
-; CGP-NEXT: v_mul_lo_u32 v1, 0, s2
-; CGP-NEXT: v_mul_lo_u32 v2, 0, s6
-; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0
+; CGP-NEXT: v_mul_lo_u32 v1, s2, 0
+; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0
 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0
-; CGP-NEXT: v_mul_lo_u32 v3, v0, s2
-; CGP-NEXT: v_mul_lo_u32 v4, v0, s3
-; CGP-NEXT: v_mul_hi_u32 v5, v0, s2
-; CGP-NEXT: v_mul_lo_u32 v6, 0, v0
-; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4
-; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v3
-; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
-; CGP-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
-;
CGP-NEXT: v_mul_lo_u32 v3, v1, 0
-; CGP-NEXT: v_mul_hi_u32 v1, v1, v0
-; CGP-NEXT: v_add_i32_e64 v3, s[2:3], v6, v3
-; CGP-NEXT: v_add_i32_e64 v1, s[2:3], v3, v1
-; CGP-NEXT: v_add_i32_e64 v3, s[2:3], v0, v1
-; CGP-NEXT: v_sub_i32_e64 v0, s[2:3], v0, v1
-; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc
-; CGP-NEXT: v_mul_lo_u32 v1, v0, s7
-; CGP-NEXT: v_mul_hi_u32 v0, v0, s6
+; CGP-NEXT: v_mul_lo_u32 v2, s4, v0
+; CGP-NEXT: v_mul_lo_u32 v3, v0, 0
+; CGP-NEXT: v_mul_lo_u32 v4, 0, v2
+; CGP-NEXT: v_mul_hi_u32 v2, v0, v2
+; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3
+; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2
+; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2
+; CGP-NEXT: v_mul_lo_u32 v2, s3, v0
+; CGP-NEXT: v_mul_hi_u32 v0, s2, v0
 ; CGP-NEXT: v_add_i32_e32 v1, vcc, v2, v1
 ; CGP-NEXT: v_add_i32_e32 v0, vcc, v1, v0
-; CGP-NEXT: v_mul_lo_u32 v1, v0, s4
+; CGP-NEXT: v_mul_lo_u32 v1, v0, s1
+; CGP-NEXT: v_add_i32_e32 v2, vcc, 1, v0
+; CGP-NEXT: v_sub_i32_e32 v1, vcc, s0, v1
+; CGP-NEXT: v_cmp_le_u32_e32 vcc, s1, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
+; CGP-NEXT: v_subrev_i32_e64 v2, s[2:3], s1, v1
+; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc
 ; CGP-NEXT: v_add_i32_e32 v2, vcc, 1, v0
-; CGP-NEXT: v_subrev_i32_e32 v3, vcc, 1, v0
-; CGP-NEXT: v_sub_i32_e32 v4, vcc, s0, v1
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, s0, v1
-; CGP-NEXT: v_cmp_le_u32_e64 s[0:1], s4, v4
-; CGP-NEXT: s_and_b64 s[0:1], s[0:1], vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
+; CGP-NEXT: v_cmp_le_u32_e32 vcc, s1, v1
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
 ; CGP-NEXT: v_readfirstlane_b32 s0, v0
 ; CGP-NEXT: ; return to shader part epilog
 %result = udiv i32 %num, %den
@@ -210,73 +193,59 @@ define <2 x i32> @v_udiv_v2i32(<2 x i32> %num, <2 x i32> %den) {
 ; CGP: ; %bb.0:
 ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2
-; CGP-NEXT: v_mul_lo_u32 v5, 0, v2
-; CGP-NEXT: v_mul_lo_u32 v6, 0, v0
+; CGP-NEXT: v_sub_i32_e32
v5, vcc, 0, v2
+; CGP-NEXT: v_mul_lo_u32 v6, v0, 0
 ; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3
-; CGP-NEXT: v_mul_lo_u32 v8, 0, v3
-; CGP-NEXT: v_mul_lo_u32 v9, 0, v1
+; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v3
+; CGP-NEXT: v_mul_lo_u32 v9, v1, 0
 ; CGP-NEXT: v_rcp_f32_e32 v4, v4
 ; CGP-NEXT: v_rcp_f32_e32 v7, v7
-; CGP-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4
-; CGP-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7
+; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4
+; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7
 ; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4
 ; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7
-; CGP-NEXT: v_mul_lo_u32 v10, v4, v2
-; CGP-NEXT: v_mul_lo_u32 v11, v4, 0
-; CGP-NEXT: v_mul_hi_u32 v12, v4, v2
-; CGP-NEXT: v_mul_lo_u32 v13, v7, v3
-; CGP-NEXT: v_mul_lo_u32 v14, v7, 0
-; CGP-NEXT: v_mul_hi_u32 v15, v7, v3
-; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v11
-; CGP-NEXT: v_sub_i32_e32 v16, vcc, 0, v10
-; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v14
-; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13
-; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v12
-; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v15
-; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5
-; CGP-NEXT: v_cndmask_b32_e32 v5, v10, v16, vcc
-; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8
-; CGP-NEXT: v_cndmask_b32_e64 v8, v13, v17, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v10, v5, 0
-; CGP-NEXT: v_mul_hi_u32 v5, v5, v4
-; CGP-NEXT: v_mul_lo_u32 v12, v8, 0
-; CGP-NEXT: v_mul_hi_u32 v8, v8, v7
-; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v11, v10
-; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12
-; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v10, v5
-; CGP-NEXT: v_add_i32_e64 v8, s[6:7], v11, v8
-; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v4, v5
-; CGP-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v5
-; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v7, v8
-; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v8
-; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v5, v7, v5, s[4:5]
-; CGP-NEXT: v_mul_lo_u32 v7, v4, 0
-; CGP-NEXT: v_mul_hi_u32 v4, v4, v0
-; CGP-NEXT: v_mul_lo_u32 v8, v5, 0
-; CGP-NEXT: v_mul_hi_u32 v5,
v5, v1
-; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7
-; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v8
+; CGP-NEXT: v_mul_lo_u32 v5, v5, v4
+; CGP-NEXT: v_mul_lo_u32 v10, v4, 0
+; CGP-NEXT: v_mul_lo_u32 v8, v8, v7
+; CGP-NEXT: v_mul_lo_u32 v11, v7, 0
+; CGP-NEXT: v_mul_lo_u32 v12, 0, v5
+; CGP-NEXT: v_mul_hi_u32 v5, v4, v5
+; CGP-NEXT: v_mul_lo_u32 v13, 0, v8
+; CGP-NEXT: v_mul_hi_u32 v8, v7, v8
+; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10
+; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11
+; CGP-NEXT: v_add_i32_e32 v5, vcc, v10, v5
+; CGP-NEXT: v_add_i32_e32 v8, vcc, v11, v8
+; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5
+; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v8
+; CGP-NEXT: v_mul_lo_u32 v7, 0, v4
+; CGP-NEXT: v_mul_hi_u32 v4, v0, v4
+; CGP-NEXT: v_mul_lo_u32 v8, 0, v5
+; CGP-NEXT: v_mul_hi_u32 v5, v1, v5
+; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6
+; CGP-NEXT: v_add_i32_e32 v7, vcc, v8, v9
 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4
 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v5
 ; CGP-NEXT: v_mul_lo_u32 v6, v4, v2
 ; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v4
-; CGP-NEXT: v_subrev_i32_e32 v8, vcc, 1, v4
-; CGP-NEXT: v_mul_lo_u32 v9, v5, v3
-; CGP-NEXT: v_add_i32_e32 v10, vcc, 1, v5
-; CGP-NEXT: v_subrev_i32_e32 v11, vcc, 1, v5
-; CGP-NEXT: v_sub_i32_e32 v12, vcc, v0, v6
-; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6
-; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9
-; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9
-; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v12, v2
-; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3
-; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc
-; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7]
-; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5]
-; CGP-NEXT: v_cndmask_b32_e64 v1, v5, v10, s[6:7]
-; CGP-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc
-; CGP-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5]
+; CGP-NEXT: v_mul_lo_u32 v8, v5, v3
+; CGP-NEXT: v_add_i32_e32 v9, vcc, 1, v5
+; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6
+; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v8
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT:
v_cndmask_b32_e32 v4, v4, v7, vcc
+; CGP-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2
+; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3
+; CGP-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5]
+; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v4
+; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v5
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2
+; CGP-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc
+; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3
+; CGP-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc
 ; CGP-NEXT: s_setpc_b64 s[30:31]
 %result = udiv <2 x i32> %num, %den
 ret <2 x i32> %result
@@ -556,76 +525,62 @@ define <2 x i32> @v_udiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) {
 ; CGP: ; %bb.0:
 ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CGP-NEXT: s_movk_i32 s4, 0x1000
-; CGP-NEXT: v_mul_lo_u32 v4, 0, v0
-; CGP-NEXT: v_mul_lo_u32 v5, 0, v1
+; CGP-NEXT: v_mul_lo_u32 v4, v0, 0
+; CGP-NEXT: v_mul_lo_u32 v5, v1, 0
 ; CGP-NEXT: v_lshl_b32_e32 v2, s4, v2
 ; CGP-NEXT: v_lshl_b32_e32 v3, s4, v3
 ; CGP-NEXT: v_cvt_f32_u32_e32 v6, v2
-; CGP-NEXT: v_mul_lo_u32 v7, 0, v2
+; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v2
 ; CGP-NEXT: v_cvt_f32_u32_e32 v8, v3
-; CGP-NEXT: v_mul_lo_u32 v9, 0, v3
+; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v3
 ; CGP-NEXT: v_rcp_f32_e32 v6, v6
 ; CGP-NEXT: v_rcp_f32_e32 v8, v8
-; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6
-; CGP-NEXT: v_mul_f32_e32 v8, 0x4f800000, v8
+; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6
+; CGP-NEXT: v_mul_f32_e32 v8, 0x4f7ffffe, v8
 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6
 ; CGP-NEXT: v_cvt_u32_f32_e32 v8, v8
-; CGP-NEXT: v_mul_lo_u32 v10, v6, v2
-; CGP-NEXT: v_mul_lo_u32 v11, v6, 0
-; CGP-NEXT: v_mul_hi_u32 v12, v6, v2
-; CGP-NEXT: v_mul_lo_u32 v13, v8, v3
-; CGP-NEXT: v_mul_lo_u32 v14, v8, 0
-; CGP-NEXT: v_mul_hi_u32 v15, v8, v3
-; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v11
-; CGP-NEXT: v_sub_i32_e32 v16, vcc, 0, v10
-; CGP-NEXT: v_add_i32_e32 v9, vcc, v9,
v14 -; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v12 -; CGP-NEXT: v_add_i32_e32 v9, vcc, v9, v15 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; CGP-NEXT: v_cndmask_b32_e32 v7, v10, v16, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; CGP-NEXT: v_cndmask_b32_e64 v9, v13, v17, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v10, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v7, v6 -; CGP-NEXT: v_mul_lo_u32 v12, v9, 0 -; CGP-NEXT: v_mul_hi_u32 v9, v9, v8 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v11, v10 -; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v10, v7 -; CGP-NEXT: v_add_i32_e64 v9, s[6:7], v11, v9 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v6, v7 -; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v7 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v8, v9 -; CGP-NEXT: v_sub_i32_e64 v8, s[6:7], v8, v9 -; CGP-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; CGP-NEXT: v_cndmask_b32_e64 v7, v8, v7, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v8, v6, 0 -; CGP-NEXT: v_mul_hi_u32 v6, v6, v0 -; CGP-NEXT: v_mul_lo_u32 v9, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v7, v1 -; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v8 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v9 +; CGP-NEXT: v_mul_lo_u32 v7, v7, v6 +; CGP-NEXT: v_mul_lo_u32 v10, v6, 0 +; CGP-NEXT: v_mul_lo_u32 v9, v9, v8 +; CGP-NEXT: v_mul_lo_u32 v11, v8, 0 +; CGP-NEXT: v_mul_lo_u32 v12, 0, v7 +; CGP-NEXT: v_mul_hi_u32 v7, v6, v7 +; CGP-NEXT: v_mul_lo_u32 v13, 0, v9 +; CGP-NEXT: v_mul_hi_u32 v9, v8, v9 +; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10 +; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v10, v7 +; CGP-NEXT: v_add_i32_e32 v9, vcc, v11, v9 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v8, v9 +; CGP-NEXT: v_mul_lo_u32 v8, 0, v6 +; CGP-NEXT: v_mul_hi_u32 v6, v0, v6 +; CGP-NEXT: v_mul_lo_u32 v9, 0, v7 +; CGP-NEXT: v_mul_hi_u32 v7, v1, v7 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v8, v4 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v9, v5 ; CGP-NEXT: v_add_i32_e32 
v4, vcc, v4, v6 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v7 ; CGP-NEXT: v_mul_lo_u32 v6, v4, v2 ; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; CGP-NEXT: v_subrev_i32_e32 v8, vcc, 1, v4 -; CGP-NEXT: v_mul_lo_u32 v9, v5, v3 -; CGP-NEXT: v_add_i32_e32 v10, vcc, 1, v5 -; CGP-NEXT: v_subrev_i32_e32 v11, vcc, 1, v5 -; CGP-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v12, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v5, v10, s[6:7] -; CGP-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; CGP-NEXT: v_mul_lo_u32 v8, v5, v3 +; CGP-NEXT: v_add_i32_e32 v9, vcc, 1, v5 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; CGP-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %shl.y = shl <2 x i32> , %y %r = udiv <2 x i32> %x, %shl.y @@ -671,39 +626,32 @@ define i32 @v_udiv_i32_24bit(i32 %num, i32 %den) { ; CGP-NEXT: v_and_b32_e32 v0, s4, v0 ; CGP-NEXT: v_and_b32_e32 v1, s4, v1 ; CGP-NEXT: v_cvt_f32_u32_e32 v2, v1 -; CGP-NEXT: v_mul_lo_u32 v3, 0, v1 -; CGP-NEXT: v_mul_lo_u32 
v4, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v4, v0, 0 ; CGP-NEXT: v_rcp_f32_e32 v2, v2 -; CGP-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CGP-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CGP-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CGP-NEXT: v_mul_lo_u32 v5, v2, v1 -; CGP-NEXT: v_mul_lo_u32 v6, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v2, v1 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v5 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CGP-NEXT: v_cndmask_b32_e32 v3, v5, v8, vcc -; CGP-NEXT: v_mul_lo_u32 v5, v3, 0 -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v6, v5 -; CGP-NEXT: v_add_i32_e64 v3, s[4:5], v5, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v2, v3 -; CGP-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CGP-NEXT: v_mul_lo_u32 v3, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v2, v2, v0 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; CGP-NEXT: v_mul_lo_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v5, v2, 0 +; CGP-NEXT: v_mul_lo_u32 v6, 0, v3 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v6, v5 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3 +; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_mul_lo_u32 v3, 0, v2 +; CGP-NEXT: v_mul_hi_u32 v2, v0, v2 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2 ; CGP-NEXT: v_mul_lo_u32 v3, v2, v1 ; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; CGP-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 
+; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CGP-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %num.mask = and i32 %num, 16777215 %den.mask = and i32 %den, 16777215 @@ -777,73 +725,59 @@ define <2 x i32> @v_udiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) { ; CGP-NEXT: v_and_b32_e32 v2, s4, v2 ; CGP-NEXT: v_and_b32_e32 v3, s4, v3 ; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v2 -; CGP-NEXT: v_mul_lo_u32 v6, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; CGP-NEXT: v_mul_lo_u32 v6, v0, 0 ; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 -; CGP-NEXT: v_mul_lo_u32 v8, 0, v3 -; CGP-NEXT: v_mul_lo_u32 v9, 0, v1 +; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v3 +; CGP-NEXT: v_mul_lo_u32 v9, v1, 0 ; CGP-NEXT: v_rcp_f32_e32 v4, v4 ; CGP-NEXT: v_rcp_f32_e32 v7, v7 -; CGP-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CGP-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4 ; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 -; CGP-NEXT: v_mul_lo_u32 v10, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v11, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v12, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v13, v7, v3 -; CGP-NEXT: v_mul_lo_u32 v14, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v15, v7, v3 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v11 -; CGP-NEXT: v_sub_i32_e32 v16, vcc, 0, v10 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v14 -; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v12 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v15 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v5, v10, v16, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; CGP-NEXT: v_cndmask_b32_e64 v8, v13, v17, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v10, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v4 -; CGP-NEXT: v_mul_lo_u32 v12, v8, 0 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v7 -; CGP-NEXT: v_add_i32_e64 
v10, s[6:7], v11, v10 -; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v10, v5 -; CGP-NEXT: v_add_i32_e64 v8, s[6:7], v11, v8 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v4, v5 -; CGP-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v5 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v7, v8 -; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v8 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc -; CGP-NEXT: v_cndmask_b32_e64 v5, v7, v5, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v7, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v4, v4, v0 -; CGP-NEXT: v_mul_lo_u32 v8, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v1 -; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v8 +; CGP-NEXT: v_mul_lo_u32 v5, v5, v4 +; CGP-NEXT: v_mul_lo_u32 v10, v4, 0 +; CGP-NEXT: v_mul_lo_u32 v8, v8, v7 +; CGP-NEXT: v_mul_lo_u32 v11, v7, 0 +; CGP-NEXT: v_mul_lo_u32 v12, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v4, v5 +; CGP-NEXT: v_mul_lo_u32 v13, 0, v8 +; CGP-NEXT: v_mul_hi_u32 v8, v7, v8 +; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10 +; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v10, v5 +; CGP-NEXT: v_add_i32_e32 v8, vcc, v11, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v8 +; CGP-NEXT: v_mul_lo_u32 v7, 0, v4 +; CGP-NEXT: v_mul_hi_u32 v4, v0, v4 +; CGP-NEXT: v_mul_lo_u32 v8, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v1, v5 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v8, v9 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v5 ; CGP-NEXT: v_mul_lo_u32 v6, v4, v2 ; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; CGP-NEXT: v_subrev_i32_e32 v8, vcc, 1, v4 -; CGP-NEXT: v_mul_lo_u32 v9, v5, v3 -; CGP-NEXT: v_add_i32_e32 v10, vcc, 1, v5 -; CGP-NEXT: v_subrev_i32_e32 v11, vcc, 1, v5 -; CGP-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9 -; CGP-NEXT: 
v_cmp_ge_u32_e64 s[6:7], v12, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v5, v10, s[6:7] -; CGP-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; CGP-NEXT: v_mul_lo_u32 v8, v5, v3 +; CGP-NEXT: v_add_i32_e32 v9, vcc, 1, v5 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; CGP-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %num.mask = and <2 x i32> %num, %den.mask = and <2 x i32> %den, diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll index 45ce6cdf4210..265246c5e8ec 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll @@ -37,39 +37,30 @@ define i32 @v_urem_i32(i32 %num, i32 %den) { ; CGP: ; %bb.0: ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CGP-NEXT: v_cvt_f32_u32_e32 v2, v1 -; CGP-NEXT: v_mul_lo_u32 v3, 0, v1 -; CGP-NEXT: v_mul_lo_u32 v4, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v4, v0, 0 ; CGP-NEXT: v_rcp_f32_e32 v2, v2 -; CGP-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CGP-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CGP-NEXT: v_cvt_u32_f32_e32 
v2, v2 -; CGP-NEXT: v_mul_lo_u32 v5, v2, v1 -; CGP-NEXT: v_mul_lo_u32 v6, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v2, v1 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v5 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CGP-NEXT: v_cndmask_b32_e32 v3, v5, v8, vcc -; CGP-NEXT: v_mul_lo_u32 v5, v3, 0 -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v6, v5 -; CGP-NEXT: v_add_i32_e64 v3, s[4:5], v5, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v2, v3 -; CGP-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CGP-NEXT: v_mul_lo_u32 v3, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v2, v2, v0 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; CGP-NEXT: v_mul_lo_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v5, v2, 0 +; CGP-NEXT: v_mul_lo_u32 v6, 0, v3 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v6, v5 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3 +; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_mul_lo_u32 v3, 0, v2 +; CGP-NEXT: v_mul_hi_u32 v2, v0, v2 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 ; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2 ; CGP-NEXT: v_mul_lo_u32 v2, v2, v1 -; CGP-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %result = urem i32 %num, %den ret i32 %result @@ -109,44 +100,32 @@ define 
amdgpu_ps i32 @s_urem_i32(i32 inreg %num, i32 inreg %den) { ; ; CGP-LABEL: s_urem_i32: ; CGP: ; %bb.0: -; CGP-NEXT: s_mov_b32 s4, s1 -; CGP-NEXT: v_cvt_f32_u32_e32 v0, s4 -; CGP-NEXT: s_bfe_u64 s[2:3], s[4:5], 0x200000 -; CGP-NEXT: s_bfe_u64 s[6:7], s[0:1], 0x200000 +; CGP-NEXT: v_cvt_f32_u32_e32 v0, s1 +; CGP-NEXT: s_sub_i32 s4, 0, s1 +; CGP-NEXT: s_bfe_u64 s[2:3], s[0:1], 0x200000 ; CGP-NEXT: v_rcp_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v1, 0, s2 -; CGP-NEXT: v_mul_lo_u32 v2, 0, s6 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_lo_u32 v1, s2, 0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v3, v0, s2 -; CGP-NEXT: v_mul_lo_u32 v4, v0, s3 -; CGP-NEXT: v_mul_hi_u32 v5, v0, s2 -; CGP-NEXT: v_mul_lo_u32 v6, 0, v0 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4 -; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v3 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc -; CGP-NEXT: v_mul_lo_u32 v3, v1, 0 -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v3, s[2:3], v6, v3 -; CGP-NEXT: v_add_i32_e64 v1, s[2:3], v3, v1 -; CGP-NEXT: v_add_i32_e64 v3, s[2:3], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[2:3], v0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc -; CGP-NEXT: v_mul_lo_u32 v1, v0, s7 -; CGP-NEXT: v_mul_hi_u32 v0, v0, s6 +; CGP-NEXT: v_mul_lo_u32 v2, s4, v0 +; CGP-NEXT: v_mul_lo_u32 v3, v0, 0 +; CGP-NEXT: v_mul_lo_u32 v4, 0, v2 +; CGP-NEXT: v_mul_hi_u32 v2, v0, v2 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; CGP-NEXT: v_add_i32_e32 v2, vcc, v3, v2 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2 +; CGP-NEXT: v_mul_lo_u32 v2, s3, v0 +; CGP-NEXT: v_mul_hi_u32 v0, s2, v0 ; CGP-NEXT: v_add_i32_e32 v1, vcc, v2, v1 ; CGP-NEXT: v_add_i32_e32 v0, vcc, v1, v0 -; CGP-NEXT: v_mul_lo_u32 v0, v0, s4 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, s0, v0 -; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 -; CGP-NEXT: v_add_i32_e64 v2, s[2:3], s4, v1 
-; CGP-NEXT: v_cmp_ge_u32_e64 s[0:1], s0, v0 -; CGP-NEXT: v_subrev_i32_e64 v0, s[2:3], s4, v1 -; CGP-NEXT: s_and_b64 vcc, vcc, s[0:1] -; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[0:1] +; CGP-NEXT: v_mul_lo_u32 v0, v0, s1 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, s0, v0 +; CGP-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s1, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CGP-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s1, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CGP-NEXT: v_readfirstlane_b32 s0, v0 ; CGP-NEXT: ; return to shader part epilog %result = urem i32 %num, %den @@ -210,73 +189,55 @@ define <2 x i32> @v_urem_v2i32(<2 x i32> %num, <2 x i32> %den) { ; CGP: ; %bb.0: ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v2 -; CGP-NEXT: v_mul_lo_u32 v6, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; CGP-NEXT: v_mul_lo_u32 v6, v0, 0 ; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 -; CGP-NEXT: v_mul_lo_u32 v8, 0, v3 -; CGP-NEXT: v_mul_lo_u32 v9, 0, v1 +; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v3 +; CGP-NEXT: v_mul_lo_u32 v9, v1, 0 ; CGP-NEXT: v_rcp_f32_e32 v4, v4 ; CGP-NEXT: v_rcp_f32_e32 v7, v7 -; CGP-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CGP-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4 ; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 -; CGP-NEXT: v_mul_lo_u32 v10, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v11, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v12, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v13, v7, v3 -; CGP-NEXT: v_mul_lo_u32 v14, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v15, v7, v3 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v11 -; CGP-NEXT: v_sub_i32_e32 v16, vcc, 0, v10 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v14 -; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v12 -; 
CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v15 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v5, v10, v16, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; CGP-NEXT: v_cndmask_b32_e64 v8, v13, v17, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v10, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v4 -; CGP-NEXT: v_mul_lo_u32 v12, v8, 0 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v7 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v11, v10 -; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v10, v5 -; CGP-NEXT: v_add_i32_e64 v8, s[6:7], v11, v8 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v4, v5 -; CGP-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v5 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v7, v8 -; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v8 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc -; CGP-NEXT: v_cndmask_b32_e64 v5, v7, v5, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v7, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v4, v4, v0 -; CGP-NEXT: v_mul_lo_u32 v8, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v1 -; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v8 +; CGP-NEXT: v_mul_lo_u32 v5, v5, v4 +; CGP-NEXT: v_mul_lo_u32 v10, v4, 0 +; CGP-NEXT: v_mul_lo_u32 v8, v8, v7 +; CGP-NEXT: v_mul_lo_u32 v11, v7, 0 +; CGP-NEXT: v_mul_lo_u32 v12, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v4, v5 +; CGP-NEXT: v_mul_lo_u32 v13, 0, v8 +; CGP-NEXT: v_mul_hi_u32 v8, v7, v8 +; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10 +; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v10, v5 +; CGP-NEXT: v_add_i32_e32 v8, vcc, v11, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v8 +; CGP-NEXT: v_mul_lo_u32 v7, 0, v4 +; CGP-NEXT: v_mul_hi_u32 v4, v0, v4 +; CGP-NEXT: v_mul_lo_u32 v8, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v1, v5 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v8, v9 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v5 ; CGP-NEXT: v_mul_lo_u32 v4, v4, v2 ; 
CGP-NEXT: v_mul_lo_u32 v5, v5, v3 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v5 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v6, v2 -; CGP-NEXT: v_add_i32_e64 v8, s[4:5], v6, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v4 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v6, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v7, v3 -; CGP-NEXT: v_add_i32_e64 v2, s[8:9], v7, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v5 -; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v7, v3 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; CGP-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %result = urem <2 x i32> %num, %den ret <2 x i32> %result @@ -557,76 +518,58 @@ define <2 x i32> @v_urem_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) { ; CGP: ; %bb.0: ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CGP-NEXT: s_movk_i32 s4, 0x1000 -; CGP-NEXT: v_mul_lo_u32 v4, 0, v0 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v4, v0, 0 +; CGP-NEXT: v_mul_lo_u32 v5, v1, 0 ; CGP-NEXT: v_lshl_b32_e32 v2, s4, v2 ; CGP-NEXT: v_lshl_b32_e32 v3, s4, v3 ; CGP-NEXT: v_cvt_f32_u32_e32 v6, v2 -; CGP-NEXT: v_mul_lo_u32 v7, 0, v2 +; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v2 
; CGP-NEXT: v_cvt_f32_u32_e32 v8, v3 -; CGP-NEXT: v_mul_lo_u32 v9, 0, v3 +; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v3 ; CGP-NEXT: v_rcp_f32_e32 v6, v6 ; CGP-NEXT: v_rcp_f32_e32 v8, v8 -; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; CGP-NEXT: v_mul_f32_e32 v8, 0x4f800000, v8 +; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; CGP-NEXT: v_mul_f32_e32 v8, 0x4f7ffffe, v8 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6 ; CGP-NEXT: v_cvt_u32_f32_e32 v8, v8 -; CGP-NEXT: v_mul_lo_u32 v10, v6, v2 -; CGP-NEXT: v_mul_lo_u32 v11, v6, 0 -; CGP-NEXT: v_mul_hi_u32 v12, v6, v2 -; CGP-NEXT: v_mul_lo_u32 v13, v8, v3 -; CGP-NEXT: v_mul_lo_u32 v14, v8, 0 -; CGP-NEXT: v_mul_hi_u32 v15, v8, v3 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v11 -; CGP-NEXT: v_sub_i32_e32 v16, vcc, 0, v10 -; CGP-NEXT: v_add_i32_e32 v9, vcc, v9, v14 -; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v7, v12 -; CGP-NEXT: v_add_i32_e32 v9, vcc, v9, v15 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; CGP-NEXT: v_cndmask_b32_e32 v7, v10, v16, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; CGP-NEXT: v_cndmask_b32_e64 v9, v13, v17, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v10, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v7, v6 -; CGP-NEXT: v_mul_lo_u32 v12, v9, 0 -; CGP-NEXT: v_mul_hi_u32 v9, v9, v8 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v11, v10 -; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v10, v7 -; CGP-NEXT: v_add_i32_e64 v9, s[6:7], v11, v9 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v6, v7 -; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v7 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v8, v9 -; CGP-NEXT: v_sub_i32_e64 v8, s[6:7], v8, v9 -; CGP-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; CGP-NEXT: v_cndmask_b32_e64 v7, v8, v7, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v8, v6, 0 -; CGP-NEXT: v_mul_hi_u32 v6, v6, v0 -; CGP-NEXT: v_mul_lo_u32 v9, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v7, v1 -; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v8 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v9 +; CGP-NEXT: v_mul_lo_u32 v7, 
v7, v6 +; CGP-NEXT: v_mul_lo_u32 v10, v6, 0 +; CGP-NEXT: v_mul_lo_u32 v9, v9, v8 +; CGP-NEXT: v_mul_lo_u32 v11, v8, 0 +; CGP-NEXT: v_mul_lo_u32 v12, 0, v7 +; CGP-NEXT: v_mul_hi_u32 v7, v6, v7 +; CGP-NEXT: v_mul_lo_u32 v13, 0, v9 +; CGP-NEXT: v_mul_hi_u32 v9, v8, v9 +; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10 +; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v10, v7 +; CGP-NEXT: v_add_i32_e32 v9, vcc, v11, v9 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 +; CGP-NEXT: v_add_i32_e32 v7, vcc, v8, v9 +; CGP-NEXT: v_mul_lo_u32 v8, 0, v6 +; CGP-NEXT: v_mul_hi_u32 v6, v0, v6 +; CGP-NEXT: v_mul_lo_u32 v9, 0, v7 +; CGP-NEXT: v_mul_hi_u32 v7, v1, v7 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v8, v4 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v9, v5 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v6 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v7 ; CGP-NEXT: v_mul_lo_u32 v4, v4, v2 ; CGP-NEXT: v_mul_lo_u32 v5, v5, v3 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v5 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v6, v2 -; CGP-NEXT: v_add_i32_e64 v8, s[4:5], v6, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v4 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v6, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v7, v3 -; CGP-NEXT: v_add_i32_e64 v2, s[8:9], v7, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v5 -; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v7, v3 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; CGP-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; 
CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %shl.y = shl <2 x i32> , %y %r = urem <2 x i32> %x, %shl.y @@ -672,39 +615,30 @@ define i32 @v_urem_i32_24bit(i32 %num, i32 %den) { ; CGP-NEXT: v_and_b32_e32 v0, s4, v0 ; CGP-NEXT: v_and_b32_e32 v1, s4, v1 ; CGP-NEXT: v_cvt_f32_u32_e32 v2, v1 -; CGP-NEXT: v_mul_lo_u32 v3, 0, v1 -; CGP-NEXT: v_mul_lo_u32 v4, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 +; CGP-NEXT: v_mul_lo_u32 v4, v0, 0 ; CGP-NEXT: v_rcp_f32_e32 v2, v2 -; CGP-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CGP-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CGP-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CGP-NEXT: v_mul_lo_u32 v5, v2, v1 -; CGP-NEXT: v_mul_lo_u32 v6, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v7, v2, v1 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v5 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v7 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CGP-NEXT: v_cndmask_b32_e32 v3, v5, v8, vcc -; CGP-NEXT: v_mul_lo_u32 v5, v3, 0 -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v6, v5 -; CGP-NEXT: v_add_i32_e64 v3, s[4:5], v5, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v2, v3 -; CGP-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CGP-NEXT: v_mul_lo_u32 v3, v2, 0 -; CGP-NEXT: v_mul_hi_u32 v2, v2, v0 -; CGP-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; CGP-NEXT: v_mul_lo_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v5, v2, 0 +; CGP-NEXT: v_mul_lo_u32 v6, 0, v3 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v6, v5 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3 +; CGP-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_mul_lo_u32 v3, 0, v2 +; CGP-NEXT: v_mul_hi_u32 v2, v0, v2 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 ; CGP-NEXT: 
v_add_i32_e32 v2, vcc, v3, v2 ; CGP-NEXT: v_mul_lo_u32 v2, v2, v1 -; CGP-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %num.mask = and i32 %num, 16777215 %den.mask = and i32 %den, 16777215 @@ -778,73 +712,55 @@ define <2 x i32> @v_urem_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) { ; CGP-NEXT: v_and_b32_e32 v2, s4, v2 ; CGP-NEXT: v_and_b32_e32 v3, s4, v3 ; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CGP-NEXT: v_mul_lo_u32 v5, 0, v2 -; CGP-NEXT: v_mul_lo_u32 v6, 0, v0 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; CGP-NEXT: v_mul_lo_u32 v6, v0, 0 ; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 -; CGP-NEXT: v_mul_lo_u32 v8, 0, v3 -; CGP-NEXT: v_mul_lo_u32 v9, 0, v1 +; CGP-NEXT: v_sub_i32_e32 v8, vcc, 0, v3 +; CGP-NEXT: v_mul_lo_u32 v9, v1, 0 ; CGP-NEXT: v_rcp_f32_e32 v4, v4 ; CGP-NEXT: v_rcp_f32_e32 v7, v7 -; CGP-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CGP-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4 ; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 -; CGP-NEXT: v_mul_lo_u32 v10, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v11, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v12, v4, v2 -; CGP-NEXT: v_mul_lo_u32 v13, v7, v3 -; CGP-NEXT: v_mul_lo_u32 v14, v7, 0 -; CGP-NEXT: v_mul_hi_u32 v15, v7, v3 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v11 -; CGP-NEXT: v_sub_i32_e32 
v16, vcc, 0, v10 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v14 -; CGP-NEXT: v_sub_i32_e32 v17, vcc, 0, v13 -; CGP-NEXT: v_add_i32_e32 v5, vcc, v5, v12 -; CGP-NEXT: v_add_i32_e32 v8, vcc, v8, v15 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v5, v10, v16, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; CGP-NEXT: v_cndmask_b32_e64 v8, v13, v17, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v10, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v4 -; CGP-NEXT: v_mul_lo_u32 v12, v8, 0 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v7 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v11, v10 -; CGP-NEXT: v_add_i32_e64 v11, s[6:7], v14, v12 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v10, v5 -; CGP-NEXT: v_add_i32_e64 v8, s[6:7], v11, v8 -; CGP-NEXT: v_add_i32_e64 v10, s[6:7], v4, v5 -; CGP-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v5 -; CGP-NEXT: v_add_i32_e64 v5, s[6:7], v7, v8 -; CGP-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v8 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc -; CGP-NEXT: v_cndmask_b32_e64 v5, v7, v5, s[4:5] -; CGP-NEXT: v_mul_lo_u32 v7, v4, 0 -; CGP-NEXT: v_mul_hi_u32 v4, v4, v0 -; CGP-NEXT: v_mul_lo_u32 v8, v5, 0 -; CGP-NEXT: v_mul_hi_u32 v5, v5, v1 -; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v7 -; CGP-NEXT: v_add_i32_e32 v7, vcc, v9, v8 +; CGP-NEXT: v_mul_lo_u32 v5, v5, v4 +; CGP-NEXT: v_mul_lo_u32 v10, v4, 0 +; CGP-NEXT: v_mul_lo_u32 v8, v8, v7 +; CGP-NEXT: v_mul_lo_u32 v11, v7, 0 +; CGP-NEXT: v_mul_lo_u32 v12, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v4, v5 +; CGP-NEXT: v_mul_lo_u32 v13, 0, v8 +; CGP-NEXT: v_mul_hi_u32 v8, v7, v8 +; CGP-NEXT: v_add_i32_e32 v10, vcc, v12, v10 +; CGP-NEXT: v_add_i32_e32 v11, vcc, v13, v11 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v10, v5 +; CGP-NEXT: v_add_i32_e32 v8, vcc, v11, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v8 +; CGP-NEXT: v_mul_lo_u32 v7, 0, v4 +; CGP-NEXT: v_mul_hi_u32 v4, v0, v4 +; CGP-NEXT: v_mul_lo_u32 v8, 0, v5 +; CGP-NEXT: v_mul_hi_u32 v5, v1, v5 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v7, v6 +; CGP-NEXT: 
v_add_i32_e32 v7, vcc, v8, v9 ; CGP-NEXT: v_add_i32_e32 v4, vcc, v6, v4 ; CGP-NEXT: v_add_i32_e32 v5, vcc, v7, v5 ; CGP-NEXT: v_mul_lo_u32 v4, v4, v2 ; CGP-NEXT: v_mul_lo_u32 v5, v5, v3 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v1, v5 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v6, v2 -; CGP-NEXT: v_add_i32_e64 v8, s[4:5], v6, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v4 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v6, v2 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v7, v3 -; CGP-NEXT: v_add_i32_e64 v2, s[8:9], v7, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v5 -; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v7, v3 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; CGP-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; CGP-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; CGP-NEXT: s_setpc_b64 s[30:31] %num.mask = and <2 x i32> %num, %den.mask = and <2 x i32> %den, diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll index 8b1ec0b013a6..4d1731d55ad5 100644 --- a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll +++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll @@ -93,43 +93,35 @@ define i32 
@select_sdiv_lhs_opaque_const0_i32(i1 %cond) { ; IR-NEXT: [[TMP4:%.*]] = xor i32 [[TMP3]], [[TMP1]] ; IR-NEXT: [[TMP5:%.*]] = uitofp i32 [[TMP4]] to float ; IR-NEXT: [[TMP6:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP5]]) -; IR-NEXT: [[TMP7:%.*]] = fmul fast float [[TMP6]], 0x41F0000000000000 +; IR-NEXT: [[TMP7:%.*]] = fmul fast float [[TMP6]], 0x41EFFFFFC0000000 ; IR-NEXT: [[TMP8:%.*]] = fptoui float [[TMP7]] to i32 -; IR-NEXT: [[TMP9:%.*]] = zext i32 [[TMP8]] to i64 -; IR-NEXT: [[TMP10:%.*]] = zext i32 [[TMP4]] to i64 -; IR-NEXT: [[TMP11:%.*]] = mul i64 [[TMP9]], [[TMP10]] -; IR-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 -; IR-NEXT: [[TMP13:%.*]] = lshr i64 [[TMP11]], 32 +; IR-NEXT: [[TMP9:%.*]] = sub i32 0, [[TMP4]] +; IR-NEXT: [[TMP10:%.*]] = mul i32 [[TMP9]], [[TMP8]] +; IR-NEXT: [[TMP11:%.*]] = zext i32 [[TMP8]] to i64 +; IR-NEXT: [[TMP12:%.*]] = zext i32 [[TMP10]] to i64 +; IR-NEXT: [[TMP13:%.*]] = mul i64 [[TMP11]], [[TMP12]] ; IR-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP13]] to i32 -; IR-NEXT: [[TMP15:%.*]] = sub i32 0, [[TMP12]] -; IR-NEXT: [[TMP16:%.*]] = icmp eq i32 [[TMP14]], 0 -; IR-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP15]], i32 [[TMP12]] +; IR-NEXT: [[TMP15:%.*]] = lshr i64 [[TMP13]], 32 +; IR-NEXT: [[TMP16:%.*]] = trunc i64 [[TMP15]] to i32 +; IR-NEXT: [[TMP17:%.*]] = add i32 [[TMP8]], [[TMP16]] ; IR-NEXT: [[TMP18:%.*]] = zext i32 [[TMP17]] to i64 -; IR-NEXT: [[TMP19:%.*]] = zext i32 [[TMP8]] to i64 -; IR-NEXT: [[TMP20:%.*]] = mul i64 [[TMP18]], [[TMP19]] -; IR-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 -; IR-NEXT: [[TMP22:%.*]] = lshr i64 [[TMP20]], 32 -; IR-NEXT: [[TMP23:%.*]] = trunc i64 [[TMP22]] to i32 -; IR-NEXT: [[TMP24:%.*]] = add i32 [[TMP8]], [[TMP23]] -; IR-NEXT: [[TMP25:%.*]] = sub i32 [[TMP8]], [[TMP23]] -; IR-NEXT: [[TMP26:%.*]] = select i1 [[TMP16]], i32 [[TMP24]], i32 [[TMP25]] -; IR-NEXT: [[TMP27:%.*]] = zext i32 [[TMP26]] to i64 -; IR-NEXT: [[TMP28:%.*]] = mul i64 [[TMP27]], 1000000 -; IR-NEXT: 
[[TMP29:%.*]] = trunc i64 [[TMP28]] to i32 -; IR-NEXT: [[TMP30:%.*]] = lshr i64 [[TMP28]], 32 -; IR-NEXT: [[TMP31:%.*]] = trunc i64 [[TMP30]] to i32 -; IR-NEXT: [[TMP32:%.*]] = mul i32 [[TMP31]], [[TMP4]] -; IR-NEXT: [[TMP33:%.*]] = sub i32 1000000, [[TMP32]] -; IR-NEXT: [[TMP34:%.*]] = icmp uge i32 [[TMP33]], [[TMP4]] -; IR-NEXT: [[TMP35:%.*]] = icmp uge i32 1000000, [[TMP32]] -; IR-NEXT: [[TMP36:%.*]] = and i1 [[TMP34]], [[TMP35]] -; IR-NEXT: [[TMP37:%.*]] = add i32 [[TMP31]], 1 -; IR-NEXT: [[TMP38:%.*]] = sub i32 [[TMP31]], 1 -; IR-NEXT: [[TMP39:%.*]] = select i1 [[TMP36]], i32 [[TMP37]], i32 [[TMP31]] -; IR-NEXT: [[TMP40:%.*]] = select i1 [[TMP35]], i32 [[TMP39]], i32 [[TMP38]] -; IR-NEXT: [[TMP41:%.*]] = xor i32 [[TMP40]], [[TMP2]] -; IR-NEXT: [[TMP42:%.*]] = sub i32 [[TMP41]], [[TMP2]] -; IR-NEXT: ret i32 [[TMP42]] +; IR-NEXT: [[TMP19:%.*]] = mul i64 1000000, [[TMP18]] +; IR-NEXT: [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32 +; IR-NEXT: [[TMP21:%.*]] = lshr i64 [[TMP19]], 32 +; IR-NEXT: [[TMP22:%.*]] = trunc i64 [[TMP21]] to i32 +; IR-NEXT: [[TMP23:%.*]] = mul i32 [[TMP22]], [[TMP4]] +; IR-NEXT: [[TMP24:%.*]] = sub i32 1000000, [[TMP23]] +; IR-NEXT: [[TMP25:%.*]] = icmp uge i32 [[TMP24]], [[TMP4]] +; IR-NEXT: [[TMP26:%.*]] = add i32 [[TMP22]], 1 +; IR-NEXT: [[TMP27:%.*]] = select i1 [[TMP25]], i32 [[TMP26]], i32 [[TMP22]] +; IR-NEXT: [[TMP28:%.*]] = sub i32 [[TMP24]], [[TMP4]] +; IR-NEXT: [[TMP29:%.*]] = select i1 [[TMP25]], i32 [[TMP28]], i32 [[TMP24]] +; IR-NEXT: [[TMP30:%.*]] = icmp uge i32 [[TMP29]], [[TMP4]] +; IR-NEXT: [[TMP31:%.*]] = add i32 [[TMP27]], 1 +; IR-NEXT: [[TMP32:%.*]] = select i1 [[TMP30]], i32 [[TMP31]], i32 [[TMP27]] +; IR-NEXT: [[TMP33:%.*]] = xor i32 [[TMP32]], [[TMP2]] +; IR-NEXT: [[TMP34:%.*]] = sub i32 [[TMP33]], [[TMP2]] +; IR-NEXT: ret i32 [[TMP34]] ; ; GCN-LABEL: select_sdiv_lhs_opaque_const0_i32: ; GCN: ; %bb.0: @@ -140,7 +132,6 @@ define i32 @select_sdiv_lhs_opaque_const0_i32(i1 %cond) { ; GCN-NEXT: s_load_dword s4, s[4:5], 0x0 ; 
GCN-NEXT: v_and_b32_e32 v0, 1, v0 ; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0 -; GCN-NEXT: s_mov_b32 s6, 0xf4240 ; GCN-NEXT: s_waitcnt lgkmcnt(0) ; GCN-NEXT: v_mov_b32_e32 v1, s4 ; GCN-NEXT: v_cndmask_b32_e32 v0, 5, v1, vcc @@ -148,29 +139,25 @@ define i32 @select_sdiv_lhs_opaque_const0_i32(i1 %cond) { ; GCN-NEXT: v_add_u32_e32 v0, vcc, v0, v1 ; GCN-NEXT: v_xor_b32_e32 v0, v0, v1 ; GCN-NEXT: v_cvt_f32_u32_e32 v2, v0 +; GCN-NEXT: v_sub_u32_e32 v3, vcc, 0, v0 +; GCN-NEXT: s_mov_b32 s4, 0xf4240 ; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GCN-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GCN-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GCN-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GCN-NEXT: v_mul_lo_u32 v3, v3, v2 +; GCN-NEXT: v_mul_hi_u32 v3, v2, v3 +; GCN-NEXT: v_add_u32_e32 v2, vcc, v2, v3 +; GCN-NEXT: v_mul_hi_u32 v2, v2, s4 ; GCN-NEXT: v_mul_lo_u32 v3, v2, v0 -; GCN-NEXT: v_mul_hi_u32 v4, v2, v0 -; GCN-NEXT: v_sub_u32_e32 v5, vcc, 0, v3 -; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GCN-NEXT: v_mul_hi_u32 v3, v3, v2 -; GCN-NEXT: v_add_u32_e64 v4, s[4:5], v2, v3 -; GCN-NEXT: v_sub_u32_e64 v2, s[4:5], v2, v3 +; GCN-NEXT: v_add_u32_e32 v4, vcc, 1, v2 +; GCN-NEXT: v_sub_u32_e32 v3, vcc, s4, v3 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v3, v0 ; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GCN-NEXT: v_mul_hi_u32 v2, v2, s6 -; GCN-NEXT: s_mov_b32 s4, 0xf4241 -; GCN-NEXT: v_mul_lo_u32 v3, v2, v0 +; GCN-NEXT: v_sub_u32_e64 v4, s[4:5], v3, v0 +; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc ; GCN-NEXT: v_add_u32_e32 v4, vcc, 1, v2 -; GCN-NEXT: v_add_u32_e32 v5, vcc, -1, v2 -; GCN-NEXT: v_sub_u32_e32 v6, vcc, s6, v3 -; GCN-NEXT: v_cmp_gt_u32_e32 vcc, s4, v3 -; GCN-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v0 -; GCN-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; GCN-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v3, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v4, vcc ; GCN-NEXT: v_xor_b32_e32 v0, v0, 
v1 ; GCN-NEXT: v_sub_u32_e32 v0, vcc, v0, v1 ; GCN-NEXT: s_setpc_b64 s[30:31] @@ -188,43 +175,35 @@ define i32 @select_sdiv_lhs_opaque_const1_i32(i1 %cond) { ; IR-NEXT: [[TMP4:%.*]] = xor i32 [[TMP3]], [[TMP1]] ; IR-NEXT: [[TMP5:%.*]] = uitofp i32 [[TMP4]] to float ; IR-NEXT: [[TMP6:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP5]]) -; IR-NEXT: [[TMP7:%.*]] = fmul fast float [[TMP6]], 0x41F0000000000000 +; IR-NEXT: [[TMP7:%.*]] = fmul fast float [[TMP6]], 0x41EFFFFFC0000000 ; IR-NEXT: [[TMP8:%.*]] = fptoui float [[TMP7]] to i32 -; IR-NEXT: [[TMP9:%.*]] = zext i32 [[TMP8]] to i64 -; IR-NEXT: [[TMP10:%.*]] = zext i32 [[TMP4]] to i64 -; IR-NEXT: [[TMP11:%.*]] = mul i64 [[TMP9]], [[TMP10]] -; IR-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 -; IR-NEXT: [[TMP13:%.*]] = lshr i64 [[TMP11]], 32 +; IR-NEXT: [[TMP9:%.*]] = sub i32 0, [[TMP4]] +; IR-NEXT: [[TMP10:%.*]] = mul i32 [[TMP9]], [[TMP8]] +; IR-NEXT: [[TMP11:%.*]] = zext i32 [[TMP8]] to i64 +; IR-NEXT: [[TMP12:%.*]] = zext i32 [[TMP10]] to i64 +; IR-NEXT: [[TMP13:%.*]] = mul i64 [[TMP11]], [[TMP12]] ; IR-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP13]] to i32 -; IR-NEXT: [[TMP15:%.*]] = sub i32 0, [[TMP12]] -; IR-NEXT: [[TMP16:%.*]] = icmp eq i32 [[TMP14]], 0 -; IR-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP15]], i32 [[TMP12]] +; IR-NEXT: [[TMP15:%.*]] = lshr i64 [[TMP13]], 32 +; IR-NEXT: [[TMP16:%.*]] = trunc i64 [[TMP15]] to i32 +; IR-NEXT: [[TMP17:%.*]] = add i32 [[TMP8]], [[TMP16]] ; IR-NEXT: [[TMP18:%.*]] = zext i32 [[TMP17]] to i64 -; IR-NEXT: [[TMP19:%.*]] = zext i32 [[TMP8]] to i64 -; IR-NEXT: [[TMP20:%.*]] = mul i64 [[TMP18]], [[TMP19]] -; IR-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 -; IR-NEXT: [[TMP22:%.*]] = lshr i64 [[TMP20]], 32 -; IR-NEXT: [[TMP23:%.*]] = trunc i64 [[TMP22]] to i32 -; IR-NEXT: [[TMP24:%.*]] = add i32 [[TMP8]], [[TMP23]] -; IR-NEXT: [[TMP25:%.*]] = sub i32 [[TMP8]], [[TMP23]] -; IR-NEXT: [[TMP26:%.*]] = select i1 [[TMP16]], i32 [[TMP24]], i32 [[TMP25]] -; IR-NEXT: 
[[TMP27:%.*]] = zext i32 [[TMP26]] to i64 -; IR-NEXT: [[TMP28:%.*]] = mul i64 [[TMP27]], 1000000 -; IR-NEXT: [[TMP29:%.*]] = trunc i64 [[TMP28]] to i32 -; IR-NEXT: [[TMP30:%.*]] = lshr i64 [[TMP28]], 32 -; IR-NEXT: [[TMP31:%.*]] = trunc i64 [[TMP30]] to i32 -; IR-NEXT: [[TMP32:%.*]] = mul i32 [[TMP31]], [[TMP4]] -; IR-NEXT: [[TMP33:%.*]] = sub i32 1000000, [[TMP32]] -; IR-NEXT: [[TMP34:%.*]] = icmp uge i32 [[TMP33]], [[TMP4]] -; IR-NEXT: [[TMP35:%.*]] = icmp uge i32 1000000, [[TMP32]] -; IR-NEXT: [[TMP36:%.*]] = and i1 [[TMP34]], [[TMP35]] -; IR-NEXT: [[TMP37:%.*]] = add i32 [[TMP31]], 1 -; IR-NEXT: [[TMP38:%.*]] = sub i32 [[TMP31]], 1 -; IR-NEXT: [[TMP39:%.*]] = select i1 [[TMP36]], i32 [[TMP37]], i32 [[TMP31]] -; IR-NEXT: [[TMP40:%.*]] = select i1 [[TMP35]], i32 [[TMP39]], i32 [[TMP38]] -; IR-NEXT: [[TMP41:%.*]] = xor i32 [[TMP40]], [[TMP2]] -; IR-NEXT: [[TMP42:%.*]] = sub i32 [[TMP41]], [[TMP2]] -; IR-NEXT: ret i32 [[TMP42]] +; IR-NEXT: [[TMP19:%.*]] = mul i64 1000000, [[TMP18]] +; IR-NEXT: [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32 +; IR-NEXT: [[TMP21:%.*]] = lshr i64 [[TMP19]], 32 +; IR-NEXT: [[TMP22:%.*]] = trunc i64 [[TMP21]] to i32 +; IR-NEXT: [[TMP23:%.*]] = mul i32 [[TMP22]], [[TMP4]] +; IR-NEXT: [[TMP24:%.*]] = sub i32 1000000, [[TMP23]] +; IR-NEXT: [[TMP25:%.*]] = icmp uge i32 [[TMP24]], [[TMP4]] +; IR-NEXT: [[TMP26:%.*]] = add i32 [[TMP22]], 1 +; IR-NEXT: [[TMP27:%.*]] = select i1 [[TMP25]], i32 [[TMP26]], i32 [[TMP22]] +; IR-NEXT: [[TMP28:%.*]] = sub i32 [[TMP24]], [[TMP4]] +; IR-NEXT: [[TMP29:%.*]] = select i1 [[TMP25]], i32 [[TMP28]], i32 [[TMP24]] +; IR-NEXT: [[TMP30:%.*]] = icmp uge i32 [[TMP29]], [[TMP4]] +; IR-NEXT: [[TMP31:%.*]] = add i32 [[TMP27]], 1 +; IR-NEXT: [[TMP32:%.*]] = select i1 [[TMP30]], i32 [[TMP31]], i32 [[TMP27]] +; IR-NEXT: [[TMP33:%.*]] = xor i32 [[TMP32]], [[TMP2]] +; IR-NEXT: [[TMP34:%.*]] = sub i32 [[TMP33]], [[TMP2]] +; IR-NEXT: ret i32 [[TMP34]] ; ; GCN-LABEL: select_sdiv_lhs_opaque_const1_i32: ; GCN: ; %bb.0: @@ -235,7 
+214,6 @@ define i32 @select_sdiv_lhs_opaque_const1_i32(i1 %cond) { ; GCN-NEXT: s_load_dword s4, s[4:5], 0x0 ; GCN-NEXT: v_and_b32_e32 v0, 1, v0 ; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0 -; GCN-NEXT: s_mov_b32 s6, 0xf4240 ; GCN-NEXT: s_waitcnt lgkmcnt(0) ; GCN-NEXT: v_mov_b32_e32 v1, s4 ; GCN-NEXT: v_cndmask_b32_e64 v0, v1, 5, vcc @@ -243,29 +221,25 @@ define i32 @select_sdiv_lhs_opaque_const1_i32(i1 %cond) { ; GCN-NEXT: v_add_u32_e32 v0, vcc, v0, v1 ; GCN-NEXT: v_xor_b32_e32 v0, v0, v1 ; GCN-NEXT: v_cvt_f32_u32_e32 v2, v0 +; GCN-NEXT: v_sub_u32_e32 v3, vcc, 0, v0 +; GCN-NEXT: s_mov_b32 s4, 0xf4240 ; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GCN-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GCN-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GCN-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GCN-NEXT: v_mul_lo_u32 v3, v3, v2 +; GCN-NEXT: v_mul_hi_u32 v3, v2, v3 +; GCN-NEXT: v_add_u32_e32 v2, vcc, v2, v3 +; GCN-NEXT: v_mul_hi_u32 v2, v2, s4 ; GCN-NEXT: v_mul_lo_u32 v3, v2, v0 -; GCN-NEXT: v_mul_hi_u32 v4, v2, v0 -; GCN-NEXT: v_sub_u32_e32 v5, vcc, 0, v3 -; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GCN-NEXT: v_mul_hi_u32 v3, v3, v2 -; GCN-NEXT: v_add_u32_e64 v4, s[4:5], v2, v3 -; GCN-NEXT: v_sub_u32_e64 v2, s[4:5], v2, v3 +; GCN-NEXT: v_add_u32_e32 v4, vcc, 1, v2 +; GCN-NEXT: v_sub_u32_e32 v3, vcc, s4, v3 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v3, v0 ; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GCN-NEXT: v_mul_hi_u32 v2, v2, s6 -; GCN-NEXT: s_mov_b32 s4, 0xf4241 -; GCN-NEXT: v_mul_lo_u32 v3, v2, v0 +; GCN-NEXT: v_sub_u32_e64 v4, s[4:5], v3, v0 +; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc ; GCN-NEXT: v_add_u32_e32 v4, vcc, 1, v2 -; GCN-NEXT: v_add_u32_e32 v5, vcc, -1, v2 -; GCN-NEXT: v_sub_u32_e32 v6, vcc, s6, v3 -; GCN-NEXT: v_cmp_gt_u32_e32 vcc, s4, v3 -; GCN-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v0 -; GCN-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; GCN-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; GCN-NEXT: 
v_cmp_ge_u32_e32 vcc, v3, v0
+; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v4, vcc
 ; GCN-NEXT: v_xor_b32_e32 v0, v0, v1
 ; GCN-NEXT: v_sub_u32_e32 v0, vcc, v0, v1
 ; GCN-NEXT: s_setpc_b64 s[30:31]
@@ -357,6 +331,7 @@ define float @select_fadd_lhs_const_i32_fmf(i1 %cond) {
 ; IR-LABEL: @select_fadd_lhs_const_i32_fmf(
 ; IR-NEXT: [[OP:%.*]] = select nnan nsz i1 [[COND:%.*]], float 3.000000e+00, float 5.000000e+00
 ; IR-NEXT: ret float [[OP]]
+;
 ; GCN-LABEL: select_fadd_lhs_const_i32_fmf:
 ; GCN: ; %bb.0:
 ; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
@@ -385,6 +360,7 @@ define i32 @select_mul_lhs_const_i32(i1 %cond) {
 ; IR-LABEL: @select_mul_lhs_const_i32(
 ; IR-NEXT: [[OP:%.*]] = select i1 [[COND:%.*]], i32 5000, i32 8000
 ; IR-NEXT: ret i32 [[OP]]
+;
 %select = select i1 %cond, i32 5, i32 8
 %op = mul i32 1000, %select
 ret i32 %op
@@ -404,6 +380,7 @@ define i32 @select_mul_rhs_const_i32(i1 %cond) {
 ; IR-LABEL: @select_mul_rhs_const_i32(
 ; IR-NEXT: [[OP:%.*]] = select i1 [[COND:%.*]], i32 5000, i32 8000
 ; IR-NEXT: ret i32 [[OP]]
+;
 %select = select i1 %cond, i32 5, i32 8
 %op = mul i32 %select, 1000
 ret i32 %op
@@ -412,8 +389,9 @@ define i32 @select_mul_rhs_const_i32(i1 %cond) {
 define amdgpu_kernel void @select_add_lhs_const_i16(i1 %cond) {
 ; IR-LABEL: @select_add_lhs_const_i16(
 ; IR-NEXT: [[OP:%.*]] = select i1 [[COND:%.*]], i16 128, i16 131
-; IR-NEXT: store i16 [[OP]], i16 addrspace(1)* undef
+; IR-NEXT: store i16 [[OP]], i16 addrspace(1)* undef, align 2
 ; IR-NEXT: ret void
+;
 ; GCN-LABEL: select_add_lhs_const_i16:
 ; GCN: ; %bb.0:
 ; GCN-NEXT: s_load_dword s0, s[4:5], 0x0
@@ -442,6 +420,7 @@ define i16 @select_add_trunc_select(i1 %cond) {
 ; IR-LABEL: @select_add_trunc_select(
 ; IR-NEXT: [[OP:%.*]] = select i1 [[COND:%.*]], i16 47, i16 50
 ; IR-NEXT: ret i16 [[OP]]
+;
 %select = select i1 %cond, i32 5, i32 8
 %trunc = trunc i32 %select to i16
 %op = add i16 %trunc, 42
@@ -452,6 +431,7 @@ define i32 @select_add_sext_select(i1 %cond) {
 ; IR-LABEL: @select_add_sext_select(
 ; IR-NEXT:
[[OP:%.*]] = select i1 [[COND:%.*]], i32 29, i32 50
 ; IR-NEXT: ret i32 [[OP]]
+;
 ; GCN-LABEL: select_add_sext_select:
 ; GCN: ; %bb.0:
 ; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
@@ -469,6 +449,7 @@ define i32 @select_add_zext_select(i1 %cond) {
 ; IR-LABEL: @select_add_zext_select(
 ; IR-NEXT: [[OP:%.*]] = select i1 [[COND:%.*]], i32 47, i32 50
 ; IR-NEXT: ret i32 [[OP]]
+;
 ; GCN-LABEL: select_add_zext_select:
 ; GCN: ; %bb.0:
 ; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
index 19e6c5907967..76f3a4989635 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
@@ -7,74 +7,63 @@ define amdgpu_kernel void @udiv_i32(i32 addrspace(1)* %out, i32 %x, i32 %y) {
 ; CHECK-LABEL: @udiv_i32(
 ; CHECK-NEXT: [[TMP1:%.*]] = uitofp i32 [[Y:%.*]] to float
 ; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP1]])
-; CHECK-NEXT: [[TMP3:%.*]] = fmul fast float [[TMP2]], 0x41F0000000000000
+; CHECK-NEXT: [[TMP3:%.*]] = fmul fast float [[TMP2]], 0x41EFFFFFC0000000
 ; CHECK-NEXT: [[TMP4:%.*]] = fptoui float [[TMP3]] to i32
-; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
-; CHECK-NEXT: [[TMP6:%.*]] = zext i32 [[Y]] to i64
-; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP5]], [[TMP6]]
-; CHECK-NEXT: [[TMP8:%.*]] = trunc i64 [[TMP7]] to i32
-; CHECK-NEXT: [[TMP9:%.*]] = lshr i64 [[TMP7]], 32
+; CHECK-NEXT: [[TMP5:%.*]] = sub i32 0, [[Y]]
+; CHECK-NEXT: [[TMP6:%.*]] = mul i32 [[TMP5]], [[TMP4]]
+; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP4]] to i64
+; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[TMP6]] to i64
+; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP7]], [[TMP8]]
 ; CHECK-NEXT: [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; CHECK-NEXT: [[TMP11:%.*]] = sub i32 0, [[TMP8]]
-; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i32 [[TMP10]], 0
-; CHECK-NEXT: [[TMP13:%.*]] = select i1
[[TMP12]], i32 [[TMP11]], i32 [[TMP8]] -; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP13]] to i64 -; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP4]] to i64 +; CHECK-NEXT: [[TMP11:%.*]] = lshr i64 [[TMP9]], 32 +; CHECK-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 +; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[TMP4]], [[TMP12]] +; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[X:%.*]] to i64 +; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP13]] to i64 ; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP14]], [[TMP15]] ; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[TMP16]] to i32 ; CHECK-NEXT: [[TMP18:%.*]] = lshr i64 [[TMP16]], 32 ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 -; CHECK-NEXT: [[TMP20:%.*]] = add i32 [[TMP4]], [[TMP19]] -; CHECK-NEXT: [[TMP21:%.*]] = sub i32 [[TMP4]], [[TMP19]] -; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP12]], i32 [[TMP20]], i32 [[TMP21]] -; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP22]] to i64 -; CHECK-NEXT: [[TMP24:%.*]] = zext i32 [[X:%.*]] to i64 -; CHECK-NEXT: [[TMP25:%.*]] = mul i64 [[TMP23]], [[TMP24]] -; CHECK-NEXT: [[TMP26:%.*]] = trunc i64 [[TMP25]] to i32 -; CHECK-NEXT: [[TMP27:%.*]] = lshr i64 [[TMP25]], 32 -; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = mul i32 [[TMP28]], [[Y]] -; CHECK-NEXT: [[TMP30:%.*]] = sub i32 [[X]], [[TMP29]] -; CHECK-NEXT: [[TMP31:%.*]] = icmp uge i32 [[TMP30]], [[Y]] -; CHECK-NEXT: [[TMP32:%.*]] = icmp uge i32 [[X]], [[TMP29]] -; CHECK-NEXT: [[TMP33:%.*]] = and i1 [[TMP31]], [[TMP32]] -; CHECK-NEXT: [[TMP34:%.*]] = add i32 [[TMP28]], 1 -; CHECK-NEXT: [[TMP35:%.*]] = sub i32 [[TMP28]], 1 -; CHECK-NEXT: [[TMP36:%.*]] = select i1 [[TMP33]], i32 [[TMP34]], i32 [[TMP28]] -; CHECK-NEXT: [[TMP37:%.*]] = select i1 [[TMP32]], i32 [[TMP36]], i32 [[TMP35]] -; CHECK-NEXT: store i32 [[TMP37]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP20:%.*]] = mul i32 [[TMP19]], [[Y]] +; CHECK-NEXT: [[TMP21:%.*]] = sub i32 [[X]], [[TMP20]] +; CHECK-NEXT: [[TMP22:%.*]] = icmp uge i32 [[TMP21]], 
[[Y]] +; CHECK-NEXT: [[TMP23:%.*]] = add i32 [[TMP19]], 1 +; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP22]], i32 [[TMP23]], i32 [[TMP19]] +; CHECK-NEXT: [[TMP25:%.*]] = sub i32 [[TMP21]], [[Y]] +; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP22]], i32 [[TMP25]], i32 [[TMP21]] +; CHECK-NEXT: [[TMP27:%.*]] = icmp uge i32 [[TMP26]], [[Y]] +; CHECK-NEXT: [[TMP28:%.*]] = add i32 [[TMP24]], 1 +; CHECK-NEXT: [[TMP29:%.*]] = select i1 [[TMP27]], i32 [[TMP28]], i32 [[TMP24]] +; CHECK-NEXT: store i32 [[TMP29]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xb -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 +; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0xb ; GCN-NEXT: s_mov_b32 s7, 0xf000 ; GCN-NEXT: s_mov_b32 s6, -1 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s9 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s3 +; GCN-NEXT: s_sub_i32 s4, 0, s3 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GCN-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s9 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s9 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s8 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s9 +; GCN-NEXT: v_mul_lo_u32 v1, s4, v0 +; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 +; GCN-NEXT: v_mul_hi_u32 v1, v0, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s2, v0 +; GCN-NEXT: v_mul_lo_u32 v1, v0, s3 ; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v0 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s8, v1 -; GCN-NEXT: v_cmp_ge_u32_e32 vcc, s8, v1 -; 
GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s9, v4 -; GCN-NEXT: s_and_b64 s[0:1], s[0:1], vcc +; GCN-NEXT: v_sub_i32_e32 v1, vcc, s2, v1 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s3, v1 ; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s3, v1 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v2, s[0:1] +; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s3, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GCN-NEXT: s_waitcnt lgkmcnt(0) ; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 ; GCN-NEXT: s_endpgm %r = udiv i32 %x, %y @@ -86,75 +75,59 @@ define amdgpu_kernel void @urem_i32(i32 addrspace(1)* %out, i32 %x, i32 %y) { ; CHECK-LABEL: @urem_i32( ; CHECK-NEXT: [[TMP1:%.*]] = uitofp i32 [[Y:%.*]] to float ; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP1]]) -; CHECK-NEXT: [[TMP3:%.*]] = fmul fast float [[TMP2]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP3:%.*]] = fmul fast float [[TMP2]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP4:%.*]] = fptoui float [[TMP3]] to i32 -; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64 -; CHECK-NEXT: [[TMP6:%.*]] = zext i32 [[Y]] to i64 -; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP5]], [[TMP6]] -; CHECK-NEXT: [[TMP8:%.*]] = trunc i64 [[TMP7]] to i32 -; CHECK-NEXT: [[TMP9:%.*]] = lshr i64 [[TMP7]], 32 +; CHECK-NEXT: [[TMP5:%.*]] = sub i32 0, [[Y]] +; CHECK-NEXT: [[TMP6:%.*]] = mul i32 [[TMP5]], [[TMP4]] +; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP4]] to i64 +; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP7]], [[TMP8]] ; CHECK-NEXT: [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32 -; CHECK-NEXT: [[TMP11:%.*]] = sub i32 0, [[TMP8]] -; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i32 [[TMP10]], 0 -; CHECK-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]] -; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP13]] to i64 -; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP4]] to i64 +; 
CHECK-NEXT: [[TMP11:%.*]] = lshr i64 [[TMP9]], 32 +; CHECK-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 +; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[TMP4]], [[TMP12]] +; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[X:%.*]] to i64 +; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP13]] to i64 ; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP14]], [[TMP15]] ; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[TMP16]] to i32 ; CHECK-NEXT: [[TMP18:%.*]] = lshr i64 [[TMP16]], 32 ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 -; CHECK-NEXT: [[TMP20:%.*]] = add i32 [[TMP4]], [[TMP19]] -; CHECK-NEXT: [[TMP21:%.*]] = sub i32 [[TMP4]], [[TMP19]] -; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP12]], i32 [[TMP20]], i32 [[TMP21]] -; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP22]] to i64 -; CHECK-NEXT: [[TMP24:%.*]] = zext i32 [[X:%.*]] to i64 -; CHECK-NEXT: [[TMP25:%.*]] = mul i64 [[TMP23]], [[TMP24]] -; CHECK-NEXT: [[TMP26:%.*]] = trunc i64 [[TMP25]] to i32 -; CHECK-NEXT: [[TMP27:%.*]] = lshr i64 [[TMP25]], 32 -; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = mul i32 [[TMP28]], [[Y]] -; CHECK-NEXT: [[TMP30:%.*]] = sub i32 [[X]], [[TMP29]] -; CHECK-NEXT: [[TMP31:%.*]] = icmp uge i32 [[TMP30]], [[Y]] -; CHECK-NEXT: [[TMP32:%.*]] = icmp uge i32 [[X]], [[TMP29]] -; CHECK-NEXT: [[TMP33:%.*]] = and i1 [[TMP31]], [[TMP32]] -; CHECK-NEXT: [[TMP34:%.*]] = sub i32 [[TMP30]], [[Y]] -; CHECK-NEXT: [[TMP35:%.*]] = add i32 [[TMP30]], [[Y]] -; CHECK-NEXT: [[TMP36:%.*]] = select i1 [[TMP33]], i32 [[TMP34]], i32 [[TMP30]] -; CHECK-NEXT: [[TMP37:%.*]] = select i1 [[TMP32]], i32 [[TMP36]], i32 [[TMP35]] -; CHECK-NEXT: store i32 [[TMP37]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP20:%.*]] = mul i32 [[TMP19]], [[Y]] +; CHECK-NEXT: [[TMP21:%.*]] = sub i32 [[X]], [[TMP20]] +; CHECK-NEXT: [[TMP22:%.*]] = icmp uge i32 [[TMP21]], [[Y]] +; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP21]], [[Y]] +; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP22]], i32 [[TMP23]], i32 [[TMP21]] +; 
CHECK-NEXT: [[TMP25:%.*]] = icmp uge i32 [[TMP24]], [[Y]] +; CHECK-NEXT: [[TMP26:%.*]] = sub i32 [[TMP24]], [[Y]] +; CHECK-NEXT: [[TMP27:%.*]] = select i1 [[TMP25]], i32 [[TMP26]], i32 [[TMP24]] +; CHECK-NEXT: store i32 [[TMP27]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xb -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 -; GCN-NEXT: s_mov_b32 s7, 0xf000 -; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xb +; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 +; GCN-NEXT: s_mov_b32 s3, 0xf000 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s9 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s5 +; GCN-NEXT: s_sub_i32 s2, 0, s5 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GCN-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s9 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s9 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s8 -; GCN-NEXT: v_mul_lo_u32 v0, v0, s9 -; GCN-NEXT: v_sub_i32_e32 v1, vcc, s8, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s8, v0 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s9, v1 -; GCN-NEXT: v_add_i32_e32 v2, vcc, s9, v1 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s9, v1 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[2:3] -; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 +; GCN-NEXT: v_mul_lo_u32 v1, s2, v0 +; GCN-NEXT: s_mov_b32 s2, -1 +; GCN-NEXT: v_mul_hi_u32 v1, v0, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s4, v0 +; 
GCN-NEXT: v_mul_lo_u32 v0, v0, s5 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, s4, v0 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s5, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s5, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s5, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s5, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0 ; GCN-NEXT: s_endpgm %r = urem i32 %x, %y store i32 %r, i32 addrspace(1)* %out @@ -172,44 +145,36 @@ define amdgpu_kernel void @sdiv_i32(i32 addrspace(1)* %out, i32 %x, i32 %y) { ; CHECK-NEXT: [[TMP7:%.*]] = xor i32 [[TMP5]], [[TMP2]] ; CHECK-NEXT: [[TMP8:%.*]] = uitofp i32 [[TMP7]] to float ; CHECK-NEXT: [[TMP9:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP8]]) -; CHECK-NEXT: [[TMP10:%.*]] = fmul fast float [[TMP9]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP10:%.*]] = fmul fast float [[TMP9]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP11:%.*]] = fptoui float [[TMP10]] to i32 -; CHECK-NEXT: [[TMP12:%.*]] = zext i32 [[TMP11]] to i64 -; CHECK-NEXT: [[TMP13:%.*]] = zext i32 [[TMP7]] to i64 -; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[TMP12]], [[TMP13]] -; CHECK-NEXT: [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32 -; CHECK-NEXT: [[TMP16:%.*]] = lshr i64 [[TMP14]], 32 +; CHECK-NEXT: [[TMP12:%.*]] = sub i32 0, [[TMP7]] +; CHECK-NEXT: [[TMP13:%.*]] = mul i32 [[TMP12]], [[TMP11]] +; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP11]] to i64 +; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP13]] to i64 +; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP14]], [[TMP15]] ; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[TMP16]] to i32 -; CHECK-NEXT: [[TMP18:%.*]] = sub i32 0, [[TMP15]] -; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i32 [[TMP17]], 0 -; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP18]], i32 [[TMP15]] -; CHECK-NEXT: [[TMP21:%.*]] = zext i32 [[TMP20]] to i64 -; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[TMP11]] to i64 +; CHECK-NEXT: [[TMP18:%.*]] = lshr i64 [[TMP16]], 32 +; CHECK-NEXT: [[TMP19:%.*]] = 
trunc i64 [[TMP18]] to i32 +; CHECK-NEXT: [[TMP20:%.*]] = add i32 [[TMP11]], [[TMP19]] +; CHECK-NEXT: [[TMP21:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[TMP20]] to i64 ; CHECK-NEXT: [[TMP23:%.*]] = mul i64 [[TMP21]], [[TMP22]] ; CHECK-NEXT: [[TMP24:%.*]] = trunc i64 [[TMP23]] to i32 ; CHECK-NEXT: [[TMP25:%.*]] = lshr i64 [[TMP23]], 32 ; CHECK-NEXT: [[TMP26:%.*]] = trunc i64 [[TMP25]] to i32 -; CHECK-NEXT: [[TMP27:%.*]] = add i32 [[TMP11]], [[TMP26]] -; CHECK-NEXT: [[TMP28:%.*]] = sub i32 [[TMP11]], [[TMP26]] -; CHECK-NEXT: [[TMP29:%.*]] = select i1 [[TMP19]], i32 [[TMP27]], i32 [[TMP28]] -; CHECK-NEXT: [[TMP30:%.*]] = zext i32 [[TMP29]] to i64 -; CHECK-NEXT: [[TMP31:%.*]] = zext i32 [[TMP6]] to i64 -; CHECK-NEXT: [[TMP32:%.*]] = mul i64 [[TMP30]], [[TMP31]] -; CHECK-NEXT: [[TMP33:%.*]] = trunc i64 [[TMP32]] to i32 -; CHECK-NEXT: [[TMP34:%.*]] = lshr i64 [[TMP32]], 32 -; CHECK-NEXT: [[TMP35:%.*]] = trunc i64 [[TMP34]] to i32 -; CHECK-NEXT: [[TMP36:%.*]] = mul i32 [[TMP35]], [[TMP7]] -; CHECK-NEXT: [[TMP37:%.*]] = sub i32 [[TMP6]], [[TMP36]] -; CHECK-NEXT: [[TMP38:%.*]] = icmp uge i32 [[TMP37]], [[TMP7]] -; CHECK-NEXT: [[TMP39:%.*]] = icmp uge i32 [[TMP6]], [[TMP36]] -; CHECK-NEXT: [[TMP40:%.*]] = and i1 [[TMP38]], [[TMP39]] -; CHECK-NEXT: [[TMP41:%.*]] = add i32 [[TMP35]], 1 -; CHECK-NEXT: [[TMP42:%.*]] = sub i32 [[TMP35]], 1 -; CHECK-NEXT: [[TMP43:%.*]] = select i1 [[TMP40]], i32 [[TMP41]], i32 [[TMP35]] -; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP39]], i32 [[TMP43]], i32 [[TMP42]] -; CHECK-NEXT: [[TMP45:%.*]] = xor i32 [[TMP44]], [[TMP3]] -; CHECK-NEXT: [[TMP46:%.*]] = sub i32 [[TMP45]], [[TMP3]] -; CHECK-NEXT: store i32 [[TMP46]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP27:%.*]] = mul i32 [[TMP26]], [[TMP7]] +; CHECK-NEXT: [[TMP28:%.*]] = sub i32 [[TMP6]], [[TMP27]] +; CHECK-NEXT: [[TMP29:%.*]] = icmp uge i32 [[TMP28]], [[TMP7]] +; CHECK-NEXT: [[TMP30:%.*]] = add i32 [[TMP26]], 1 +; CHECK-NEXT: [[TMP31:%.*]] = select i1 
[[TMP29]], i32 [[TMP30]], i32 [[TMP26]] +; CHECK-NEXT: [[TMP32:%.*]] = sub i32 [[TMP28]], [[TMP7]] +; CHECK-NEXT: [[TMP33:%.*]] = select i1 [[TMP29]], i32 [[TMP32]], i32 [[TMP28]] +; CHECK-NEXT: [[TMP34:%.*]] = icmp uge i32 [[TMP33]], [[TMP7]] +; CHECK-NEXT: [[TMP35:%.*]] = add i32 [[TMP31]], 1 +; CHECK-NEXT: [[TMP36:%.*]] = select i1 [[TMP34]], i32 [[TMP35]], i32 [[TMP31]] +; CHECK-NEXT: [[TMP37:%.*]] = xor i32 [[TMP36]], [[TMP3]] +; CHECK-NEXT: [[TMP38:%.*]] = sub i32 [[TMP37]], [[TMP3]] +; CHECK-NEXT: store i32 [[TMP38]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i32: @@ -223,34 +188,30 @@ define amdgpu_kernel void @sdiv_i32(i32 addrspace(1)* %out, i32 %x, i32 %y) { ; GCN-NEXT: s_add_i32 s3, s3, s8 ; GCN-NEXT: s_xor_b32 s9, s3, s8 ; GCN-NEXT: v_cvt_f32_u32_e32 v0, s9 -; GCN-NEXT: s_ashr_i32 s3, s2, 31 -; GCN-NEXT: s_add_i32 s2, s2, s3 -; GCN-NEXT: s_xor_b32 s2, s2, s3 +; GCN-NEXT: s_sub_i32 s3, 0, s9 +; GCN-NEXT: s_ashr_i32 s0, s2, 31 +; GCN-NEXT: s_add_i32 s1, s2, s0 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_xor_b32 s3, s3, s8 -; GCN-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GCN-NEXT: s_xor_b32 s1, s1, s0 +; GCN-NEXT: s_xor_b32 s2, s0, s8 +; GCN-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s9 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s9 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s2 +; GCN-NEXT: v_mul_lo_u32 v1, s3, v0 +; GCN-NEXT: v_mul_hi_u32 v1, v0, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s1, v0 ; GCN-NEXT: v_mul_lo_u32 v1, v0, s9 ; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v0 -; 
GCN-NEXT: v_sub_i32_e32 v4, vcc, s2, v1 -; GCN-NEXT: v_cmp_ge_u32_e32 vcc, s2, v1 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s9, v4 -; GCN-NEXT: s_and_b64 s[0:1], s[0:1], vcc +; GCN-NEXT: v_sub_i32_e32 v1, vcc, s1, v1 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s9, v1 ; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GCN-NEXT: v_xor_b32_e32 v0, s3, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s3, v0 +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s9, v1 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v2, s[0:1] +; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GCN-NEXT: v_xor_b32_e32 v0, s2, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s2, v0 ; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 ; GCN-NEXT: s_endpgm %r = sdiv i32 %x, %y @@ -268,86 +229,69 @@ define amdgpu_kernel void @srem_i32(i32 addrspace(1)* %out, i32 %x, i32 %y) { ; CHECK-NEXT: [[TMP6:%.*]] = xor i32 [[TMP4]], [[TMP2]] ; CHECK-NEXT: [[TMP7:%.*]] = uitofp i32 [[TMP6]] to float ; CHECK-NEXT: [[TMP8:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP7]]) -; CHECK-NEXT: [[TMP9:%.*]] = fmul fast float [[TMP8]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP9:%.*]] = fmul fast float [[TMP8]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP10:%.*]] = fptoui float [[TMP9]] to i32 -; CHECK-NEXT: [[TMP11:%.*]] = zext i32 [[TMP10]] to i64 -; CHECK-NEXT: [[TMP12:%.*]] = zext i32 [[TMP6]] to i64 -; CHECK-NEXT: [[TMP13:%.*]] = mul i64 [[TMP11]], [[TMP12]] -; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP13]] to i32 -; CHECK-NEXT: [[TMP15:%.*]] = lshr i64 [[TMP13]], 32 +; CHECK-NEXT: [[TMP11:%.*]] = sub i32 0, [[TMP6]] +; CHECK-NEXT: [[TMP12:%.*]] = mul i32 [[TMP11]], [[TMP10]] +; CHECK-NEXT: [[TMP13:%.*]] = zext i32 [[TMP10]] to i64 +; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP12]] to i64 +; CHECK-NEXT: [[TMP15:%.*]] = mul i64 [[TMP13]], [[TMP14]] ; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[TMP15]] to i32 -; CHECK-NEXT: 
[[TMP17:%.*]] = sub i32 0, [[TMP14]] -; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i32 [[TMP16]], 0 -; CHECK-NEXT: [[TMP19:%.*]] = select i1 [[TMP18]], i32 [[TMP17]], i32 [[TMP14]] -; CHECK-NEXT: [[TMP20:%.*]] = zext i32 [[TMP19]] to i64 -; CHECK-NEXT: [[TMP21:%.*]] = zext i32 [[TMP10]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = lshr i64 [[TMP15]], 32 +; CHECK-NEXT: [[TMP18:%.*]] = trunc i64 [[TMP17]] to i32 +; CHECK-NEXT: [[TMP19:%.*]] = add i32 [[TMP10]], [[TMP18]] +; CHECK-NEXT: [[TMP20:%.*]] = zext i32 [[TMP5]] to i64 +; CHECK-NEXT: [[TMP21:%.*]] = zext i32 [[TMP19]] to i64 ; CHECK-NEXT: [[TMP22:%.*]] = mul i64 [[TMP20]], [[TMP21]] ; CHECK-NEXT: [[TMP23:%.*]] = trunc i64 [[TMP22]] to i32 ; CHECK-NEXT: [[TMP24:%.*]] = lshr i64 [[TMP22]], 32 ; CHECK-NEXT: [[TMP25:%.*]] = trunc i64 [[TMP24]] to i32 -; CHECK-NEXT: [[TMP26:%.*]] = add i32 [[TMP10]], [[TMP25]] -; CHECK-NEXT: [[TMP27:%.*]] = sub i32 [[TMP10]], [[TMP25]] -; CHECK-NEXT: [[TMP28:%.*]] = select i1 [[TMP18]], i32 [[TMP26]], i32 [[TMP27]] -; CHECK-NEXT: [[TMP29:%.*]] = zext i32 [[TMP28]] to i64 -; CHECK-NEXT: [[TMP30:%.*]] = zext i32 [[TMP5]] to i64 -; CHECK-NEXT: [[TMP31:%.*]] = mul i64 [[TMP29]], [[TMP30]] -; CHECK-NEXT: [[TMP32:%.*]] = trunc i64 [[TMP31]] to i32 -; CHECK-NEXT: [[TMP33:%.*]] = lshr i64 [[TMP31]], 32 -; CHECK-NEXT: [[TMP34:%.*]] = trunc i64 [[TMP33]] to i32 -; CHECK-NEXT: [[TMP35:%.*]] = mul i32 [[TMP34]], [[TMP6]] -; CHECK-NEXT: [[TMP36:%.*]] = sub i32 [[TMP5]], [[TMP35]] -; CHECK-NEXT: [[TMP37:%.*]] = icmp uge i32 [[TMP36]], [[TMP6]] -; CHECK-NEXT: [[TMP38:%.*]] = icmp uge i32 [[TMP5]], [[TMP35]] -; CHECK-NEXT: [[TMP39:%.*]] = and i1 [[TMP37]], [[TMP38]] -; CHECK-NEXT: [[TMP40:%.*]] = sub i32 [[TMP36]], [[TMP6]] -; CHECK-NEXT: [[TMP41:%.*]] = add i32 [[TMP36]], [[TMP6]] -; CHECK-NEXT: [[TMP42:%.*]] = select i1 [[TMP39]], i32 [[TMP40]], i32 [[TMP36]] -; CHECK-NEXT: [[TMP43:%.*]] = select i1 [[TMP38]], i32 [[TMP42]], i32 [[TMP41]] -; CHECK-NEXT: [[TMP44:%.*]] = xor i32 [[TMP43]], [[TMP1]] -; 
CHECK-NEXT: [[TMP45:%.*]] = sub i32 [[TMP44]], [[TMP1]] -; CHECK-NEXT: store i32 [[TMP45]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP26:%.*]] = mul i32 [[TMP25]], [[TMP6]] +; CHECK-NEXT: [[TMP27:%.*]] = sub i32 [[TMP5]], [[TMP26]] +; CHECK-NEXT: [[TMP28:%.*]] = icmp uge i32 [[TMP27]], [[TMP6]] +; CHECK-NEXT: [[TMP29:%.*]] = sub i32 [[TMP27]], [[TMP6]] +; CHECK-NEXT: [[TMP30:%.*]] = select i1 [[TMP28]], i32 [[TMP29]], i32 [[TMP27]] +; CHECK-NEXT: [[TMP31:%.*]] = icmp uge i32 [[TMP30]], [[TMP6]] +; CHECK-NEXT: [[TMP32:%.*]] = sub i32 [[TMP30]], [[TMP6]] +; CHECK-NEXT: [[TMP33:%.*]] = select i1 [[TMP31]], i32 [[TMP32]], i32 [[TMP30]] +; CHECK-NEXT: [[TMP34:%.*]] = xor i32 [[TMP33]], [[TMP1]] +; CHECK-NEXT: [[TMP35:%.*]] = sub i32 [[TMP34]], [[TMP1]] +; CHECK-NEXT: store i32 [[TMP35]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xb -; GCN-NEXT: s_mov_b32 s7, 0xf000 -; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0xb +; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_ashr_i32 s2, s5, 31 -; GCN-NEXT: s_add_i32 s3, s5, s2 -; GCN-NEXT: s_xor_b32 s10, s3, s2 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s10 -; GCN-NEXT: s_ashr_i32 s8, s4, 31 -; GCN-NEXT: s_add_i32 s4, s4, s8 -; GCN-NEXT: s_xor_b32 s9, s4, s8 +; GCN-NEXT: s_ashr_i32 s4, s3, 31 +; GCN-NEXT: s_add_i32 s3, s3, s4 +; GCN-NEXT: s_xor_b32 s4, s3, s4 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s4 +; GCN-NEXT: s_sub_i32 s3, 0, s4 +; GCN-NEXT: s_ashr_i32 s5, s2, 31 +; GCN-NEXT: s_add_i32 s2, s2, s5 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 -; GCN-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GCN-NEXT: s_xor_b32 s6, s2, s5 +; GCN-NEXT: s_mov_b32 s2, -1 +; GCN-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s10 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s10 -; 
GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s9 -; GCN-NEXT: v_mul_lo_u32 v0, v0, s10 -; GCN-NEXT: v_sub_i32_e32 v1, vcc, s9, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v0 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s10, v1 -; GCN-NEXT: v_add_i32_e32 v2, vcc, s10, v1 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s10, v1 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[2:3] -; GCN-NEXT: v_xor_b32_e32 v0, s8, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s8, v0 -; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 +; GCN-NEXT: v_mul_lo_u32 v1, s3, v0 +; GCN-NEXT: s_mov_b32 s3, 0xf000 +; GCN-NEXT: v_mul_hi_u32 v1, v0, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s6, v0 +; GCN-NEXT: v_mul_lo_u32 v0, v0, s4 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, s6, v0 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GCN-NEXT: v_xor_b32_e32 v0, s5, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s5, v0 +; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0 ; GCN-NEXT: s_endpgm %r = srem i32 %x, %y store i32 %r, i32 addrspace(1)* %out @@ -373,7 +317,7 @@ define amdgpu_kernel void @udiv_i16(i16 addrspace(1)* %out, i16 %x, i16 %y) { ; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP10]], [[TMP14]] ; CHECK-NEXT: [[TMP16:%.*]] = and i32 [[TMP15]], 65535 ; CHECK-NEXT: [[TMP17:%.*]] = trunc i32 [[TMP16]] to i16 -; CHECK-NEXT: store i16 [[TMP17]], i16 addrspace(1)* 
[[OUT:%.*]] +; CHECK-NEXT: store i16 [[TMP17]], i16 addrspace(1)* [[OUT:%.*]], align 2 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i16: @@ -422,7 +366,7 @@ define amdgpu_kernel void @urem_i16(i16 addrspace(1)* %out, i16 %x, i16 %y) { ; CHECK-NEXT: [[TMP17:%.*]] = sub i32 [[TMP1]], [[TMP16]] ; CHECK-NEXT: [[TMP18:%.*]] = and i32 [[TMP17]], 65535 ; CHECK-NEXT: [[TMP19:%.*]] = trunc i32 [[TMP18]] to i16 -; CHECK-NEXT: store i16 [[TMP19]], i16 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i16 [[TMP19]], i16 addrspace(1)* [[OUT:%.*]], align 2 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i16: @@ -475,7 +419,7 @@ define amdgpu_kernel void @sdiv_i16(i16 addrspace(1)* %out, i16 %x, i16 %y) { ; CHECK-NEXT: [[TMP19:%.*]] = shl i32 [[TMP18]], 16 ; CHECK-NEXT: [[TMP20:%.*]] = ashr i32 [[TMP19]], 16 ; CHECK-NEXT: [[TMP21:%.*]] = trunc i32 [[TMP20]] to i16 -; CHECK-NEXT: store i16 [[TMP21]], i16 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i16 [[TMP21]], i16 addrspace(1)* [[OUT:%.*]], align 2 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i16: @@ -533,7 +477,7 @@ define amdgpu_kernel void @srem_i16(i16 addrspace(1)* %out, i16 %x, i16 %y) { ; CHECK-NEXT: [[TMP21:%.*]] = shl i32 [[TMP20]], 16 ; CHECK-NEXT: [[TMP22:%.*]] = ashr i32 [[TMP21]], 16 ; CHECK-NEXT: [[TMP23:%.*]] = trunc i32 [[TMP22]] to i16 -; CHECK-NEXT: store i16 [[TMP23]], i16 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i16 [[TMP23]], i16 addrspace(1)* [[OUT:%.*]], align 2 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i16: @@ -587,7 +531,7 @@ define amdgpu_kernel void @udiv_i8(i8 addrspace(1)* %out, i8 %x, i8 %y) { ; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP10]], [[TMP14]] ; CHECK-NEXT: [[TMP16:%.*]] = and i32 [[TMP15]], 255 ; CHECK-NEXT: [[TMP17:%.*]] = trunc i32 [[TMP16]] to i8 -; CHECK-NEXT: store i8 [[TMP17]], i8 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i8 [[TMP17]], i8 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i8: @@ -634,7 +578,7 @@ define amdgpu_kernel void @urem_i8(i8 
addrspace(1)* %out, i8 %x, i8 %y) { ; CHECK-NEXT: [[TMP17:%.*]] = sub i32 [[TMP1]], [[TMP16]] ; CHECK-NEXT: [[TMP18:%.*]] = and i32 [[TMP17]], 255 ; CHECK-NEXT: [[TMP19:%.*]] = trunc i32 [[TMP18]] to i8 -; CHECK-NEXT: store i8 [[TMP19]], i8 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i8 [[TMP19]], i8 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i8: @@ -686,7 +630,7 @@ define amdgpu_kernel void @sdiv_i8(i8 addrspace(1)* %out, i8 %x, i8 %y) { ; CHECK-NEXT: [[TMP19:%.*]] = shl i32 [[TMP18]], 24 ; CHECK-NEXT: [[TMP20:%.*]] = ashr i32 [[TMP19]], 24 ; CHECK-NEXT: [[TMP21:%.*]] = trunc i32 [[TMP20]] to i8 -; CHECK-NEXT: store i8 [[TMP21]], i8 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i8 [[TMP21]], i8 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i8: @@ -744,7 +688,7 @@ define amdgpu_kernel void @srem_i8(i8 addrspace(1)* %out, i8 %x, i8 %y) { ; CHECK-NEXT: [[TMP21:%.*]] = shl i32 [[TMP20]], 24 ; CHECK-NEXT: [[TMP22:%.*]] = ashr i32 [[TMP21]], 24 ; CHECK-NEXT: [[TMP23:%.*]] = trunc i32 [[TMP22]] to i8 -; CHECK-NEXT: store i8 [[TMP23]], i8 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i8 [[TMP23]], i8 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i8: @@ -786,266 +730,219 @@ define amdgpu_kernel void @udiv_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %x ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[Y:%.*]], i64 0 ; CHECK-NEXT: [[TMP3:%.*]] = uitofp i32 [[TMP2]] to float ; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP3]]) -; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP6:%.*]] = fptoui float [[TMP5]] to i32 -; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64 -; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[TMP2]] to i64 -; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP7]], [[TMP8]] -; CHECK-NEXT: [[TMP10:%.*]] = trunc i64 
[[TMP9]] to i32 -; CHECK-NEXT: [[TMP11:%.*]] = lshr i64 [[TMP9]], 32 +; CHECK-NEXT: [[TMP7:%.*]] = sub i32 0, [[TMP2]] +; CHECK-NEXT: [[TMP8:%.*]] = mul i32 [[TMP7]], [[TMP6]] +; CHECK-NEXT: [[TMP9:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = zext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP9]], [[TMP10]] ; CHECK-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 -; CHECK-NEXT: [[TMP13:%.*]] = sub i32 0, [[TMP10]] -; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[TMP12]], 0 -; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP10]] -; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP15]] to i64 -; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP13:%.*]] = lshr i64 [[TMP11]], 32 +; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP13]] to i32 +; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP6]], [[TMP14]] +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP1]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP15]] to i64 ; CHECK-NEXT: [[TMP18:%.*]] = mul i64 [[TMP16]], [[TMP17]] ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 ; CHECK-NEXT: [[TMP20:%.*]] = lshr i64 [[TMP18]], 32 ; CHECK-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 -; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP14]], i32 [[TMP22]], i32 [[TMP23]] -; CHECK-NEXT: [[TMP25:%.*]] = zext i32 [[TMP24]] to i64 -; CHECK-NEXT: [[TMP26:%.*]] = zext i32 [[TMP1]] to i64 -; CHECK-NEXT: [[TMP27:%.*]] = mul i64 [[TMP25]], [[TMP26]] -; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = lshr i64 [[TMP27]], 32 -; CHECK-NEXT: [[TMP30:%.*]] = trunc i64 [[TMP29]] to i32 -; CHECK-NEXT: [[TMP31:%.*]] = mul i32 [[TMP30]], [[TMP2]] -; CHECK-NEXT: [[TMP32:%.*]] = sub i32 [[TMP1]], [[TMP31]] -; CHECK-NEXT: [[TMP33:%.*]] = icmp uge i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP34:%.*]] = icmp uge i32 [[TMP1]], 
[[TMP31]] -; CHECK-NEXT: [[TMP35:%.*]] = and i1 [[TMP33]], [[TMP34]] -; CHECK-NEXT: [[TMP36:%.*]] = add i32 [[TMP30]], 1 -; CHECK-NEXT: [[TMP37:%.*]] = sub i32 [[TMP30]], 1 -; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP35]], i32 [[TMP36]], i32 [[TMP30]] -; CHECK-NEXT: [[TMP39:%.*]] = select i1 [[TMP34]], i32 [[TMP38]], i32 [[TMP37]] -; CHECK-NEXT: [[TMP40:%.*]] = insertelement <4 x i32> undef, i32 [[TMP39]], i64 0 -; CHECK-NEXT: [[TMP41:%.*]] = extractelement <4 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP42:%.*]] = extractelement <4 x i32> [[Y]], i64 1 -; CHECK-NEXT: [[TMP43:%.*]] = uitofp i32 [[TMP42]] to float -; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP43]]) -; CHECK-NEXT: [[TMP45:%.*]] = fmul fast float [[TMP44]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP46:%.*]] = fptoui float [[TMP45]] to i32 -; CHECK-NEXT: [[TMP47:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP48:%.*]] = zext i32 [[TMP42]] to i64 -; CHECK-NEXT: [[TMP49:%.*]] = mul i64 [[TMP47]], [[TMP48]] -; CHECK-NEXT: [[TMP50:%.*]] = trunc i64 [[TMP49]] to i32 -; CHECK-NEXT: [[TMP51:%.*]] = lshr i64 [[TMP49]], 32 -; CHECK-NEXT: [[TMP52:%.*]] = trunc i64 [[TMP51]] to i32 -; CHECK-NEXT: [[TMP53:%.*]] = sub i32 0, [[TMP50]] -; CHECK-NEXT: [[TMP54:%.*]] = icmp eq i32 [[TMP52]], 0 -; CHECK-NEXT: [[TMP55:%.*]] = select i1 [[TMP54]], i32 [[TMP53]], i32 [[TMP50]] -; CHECK-NEXT: [[TMP56:%.*]] = zext i32 [[TMP55]] to i64 -; CHECK-NEXT: [[TMP57:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP58:%.*]] = mul i64 [[TMP56]], [[TMP57]] -; CHECK-NEXT: [[TMP59:%.*]] = trunc i64 [[TMP58]] to i32 -; CHECK-NEXT: [[TMP60:%.*]] = lshr i64 [[TMP58]], 32 -; CHECK-NEXT: [[TMP61:%.*]] = trunc i64 [[TMP60]] to i32 -; CHECK-NEXT: [[TMP62:%.*]] = add i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP63:%.*]] = sub i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP64:%.*]] = select i1 [[TMP54]], i32 [[TMP62]], i32 [[TMP63]] -; CHECK-NEXT: [[TMP65:%.*]] = zext i32 [[TMP64]] to i64 -; CHECK-NEXT: 
[[TMP66:%.*]] = zext i32 [[TMP41]] to i64 -; CHECK-NEXT: [[TMP67:%.*]] = mul i64 [[TMP65]], [[TMP66]] -; CHECK-NEXT: [[TMP68:%.*]] = trunc i64 [[TMP67]] to i32 -; CHECK-NEXT: [[TMP69:%.*]] = lshr i64 [[TMP67]], 32 -; CHECK-NEXT: [[TMP70:%.*]] = trunc i64 [[TMP69]] to i32 -; CHECK-NEXT: [[TMP71:%.*]] = mul i32 [[TMP70]], [[TMP42]] -; CHECK-NEXT: [[TMP72:%.*]] = sub i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP73:%.*]] = icmp uge i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP74:%.*]] = icmp uge i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP75:%.*]] = and i1 [[TMP73]], [[TMP74]] -; CHECK-NEXT: [[TMP76:%.*]] = add i32 [[TMP70]], 1 -; CHECK-NEXT: [[TMP77:%.*]] = sub i32 [[TMP70]], 1 -; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP75]], i32 [[TMP76]], i32 [[TMP70]] -; CHECK-NEXT: [[TMP79:%.*]] = select i1 [[TMP74]], i32 [[TMP78]], i32 [[TMP77]] -; CHECK-NEXT: [[TMP80:%.*]] = insertelement <4 x i32> [[TMP40]], i32 [[TMP79]], i64 1 -; CHECK-NEXT: [[TMP81:%.*]] = extractelement <4 x i32> [[X]], i64 2 -; CHECK-NEXT: [[TMP82:%.*]] = extractelement <4 x i32> [[Y]], i64 2 -; CHECK-NEXT: [[TMP83:%.*]] = uitofp i32 [[TMP82]] to float -; CHECK-NEXT: [[TMP84:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP83]]) -; CHECK-NEXT: [[TMP85:%.*]] = fmul fast float [[TMP84]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP86:%.*]] = fptoui float [[TMP85]] to i32 -; CHECK-NEXT: [[TMP87:%.*]] = zext i32 [[TMP86]] to i64 -; CHECK-NEXT: [[TMP88:%.*]] = zext i32 [[TMP82]] to i64 -; CHECK-NEXT: [[TMP89:%.*]] = mul i64 [[TMP87]], [[TMP88]] -; CHECK-NEXT: [[TMP90:%.*]] = trunc i64 [[TMP89]] to i32 -; CHECK-NEXT: [[TMP91:%.*]] = lshr i64 [[TMP89]], 32 -; CHECK-NEXT: [[TMP92:%.*]] = trunc i64 [[TMP91]] to i32 -; CHECK-NEXT: [[TMP93:%.*]] = sub i32 0, [[TMP90]] -; CHECK-NEXT: [[TMP94:%.*]] = icmp eq i32 [[TMP92]], 0 -; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP93]], i32 [[TMP90]] -; CHECK-NEXT: [[TMP96:%.*]] = zext i32 [[TMP95]] to i64 -; CHECK-NEXT: [[TMP97:%.*]] = zext i32 [[TMP86]] to 
i64 -; CHECK-NEXT: [[TMP98:%.*]] = mul i64 [[TMP96]], [[TMP97]] -; CHECK-NEXT: [[TMP99:%.*]] = trunc i64 [[TMP98]] to i32 -; CHECK-NEXT: [[TMP100:%.*]] = lshr i64 [[TMP98]], 32 -; CHECK-NEXT: [[TMP101:%.*]] = trunc i64 [[TMP100]] to i32 -; CHECK-NEXT: [[TMP102:%.*]] = add i32 [[TMP86]], [[TMP101]] -; CHECK-NEXT: [[TMP103:%.*]] = sub i32 [[TMP86]], [[TMP101]] -; CHECK-NEXT: [[TMP104:%.*]] = select i1 [[TMP94]], i32 [[TMP102]], i32 [[TMP103]] -; CHECK-NEXT: [[TMP105:%.*]] = zext i32 [[TMP104]] to i64 -; CHECK-NEXT: [[TMP106:%.*]] = zext i32 [[TMP81]] to i64 +; CHECK-NEXT: [[TMP22:%.*]] = mul i32 [[TMP21]], [[TMP2]] +; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP1]], [[TMP22]] +; CHECK-NEXT: [[TMP24:%.*]] = icmp uge i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP25:%.*]] = add i32 [[TMP21]], 1 +; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP24]], i32 [[TMP25]], i32 [[TMP21]] +; CHECK-NEXT: [[TMP27:%.*]] = sub i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP28:%.*]] = select i1 [[TMP24]], i32 [[TMP27]], i32 [[TMP23]] +; CHECK-NEXT: [[TMP29:%.*]] = icmp uge i32 [[TMP28]], [[TMP2]] +; CHECK-NEXT: [[TMP30:%.*]] = add i32 [[TMP26]], 1 +; CHECK-NEXT: [[TMP31:%.*]] = select i1 [[TMP29]], i32 [[TMP30]], i32 [[TMP26]] +; CHECK-NEXT: [[TMP32:%.*]] = insertelement <4 x i32> undef, i32 [[TMP31]], i64 0 +; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i32> [[Y]], i64 1 +; CHECK-NEXT: [[TMP35:%.*]] = uitofp i32 [[TMP34]] to float +; CHECK-NEXT: [[TMP36:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP35]]) +; CHECK-NEXT: [[TMP37:%.*]] = fmul fast float [[TMP36]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP38:%.*]] = fptoui float [[TMP37]] to i32 +; CHECK-NEXT: [[TMP39:%.*]] = sub i32 0, [[TMP34]] +; CHECK-NEXT: [[TMP40:%.*]] = mul i32 [[TMP39]], [[TMP38]] +; CHECK-NEXT: [[TMP41:%.*]] = zext i32 [[TMP38]] to i64 +; CHECK-NEXT: [[TMP42:%.*]] = zext i32 [[TMP40]] to i64 +; CHECK-NEXT: [[TMP43:%.*]] = mul i64 [[TMP41]], 
[[TMP42]] +; CHECK-NEXT: [[TMP44:%.*]] = trunc i64 [[TMP43]] to i32 +; CHECK-NEXT: [[TMP45:%.*]] = lshr i64 [[TMP43]], 32 +; CHECK-NEXT: [[TMP46:%.*]] = trunc i64 [[TMP45]] to i32 +; CHECK-NEXT: [[TMP47:%.*]] = add i32 [[TMP38]], [[TMP46]] +; CHECK-NEXT: [[TMP48:%.*]] = zext i32 [[TMP33]] to i64 +; CHECK-NEXT: [[TMP49:%.*]] = zext i32 [[TMP47]] to i64 +; CHECK-NEXT: [[TMP50:%.*]] = mul i64 [[TMP48]], [[TMP49]] +; CHECK-NEXT: [[TMP51:%.*]] = trunc i64 [[TMP50]] to i32 +; CHECK-NEXT: [[TMP52:%.*]] = lshr i64 [[TMP50]], 32 +; CHECK-NEXT: [[TMP53:%.*]] = trunc i64 [[TMP52]] to i32 +; CHECK-NEXT: [[TMP54:%.*]] = mul i32 [[TMP53]], [[TMP34]] +; CHECK-NEXT: [[TMP55:%.*]] = sub i32 [[TMP33]], [[TMP54]] +; CHECK-NEXT: [[TMP56:%.*]] = icmp uge i32 [[TMP55]], [[TMP34]] +; CHECK-NEXT: [[TMP57:%.*]] = add i32 [[TMP53]], 1 +; CHECK-NEXT: [[TMP58:%.*]] = select i1 [[TMP56]], i32 [[TMP57]], i32 [[TMP53]] +; CHECK-NEXT: [[TMP59:%.*]] = sub i32 [[TMP55]], [[TMP34]] +; CHECK-NEXT: [[TMP60:%.*]] = select i1 [[TMP56]], i32 [[TMP59]], i32 [[TMP55]] +; CHECK-NEXT: [[TMP61:%.*]] = icmp uge i32 [[TMP60]], [[TMP34]] +; CHECK-NEXT: [[TMP62:%.*]] = add i32 [[TMP58]], 1 +; CHECK-NEXT: [[TMP63:%.*]] = select i1 [[TMP61]], i32 [[TMP62]], i32 [[TMP58]] +; CHECK-NEXT: [[TMP64:%.*]] = insertelement <4 x i32> [[TMP32]], i32 [[TMP63]], i64 1 +; CHECK-NEXT: [[TMP65:%.*]] = extractelement <4 x i32> [[X]], i64 2 +; CHECK-NEXT: [[TMP66:%.*]] = extractelement <4 x i32> [[Y]], i64 2 +; CHECK-NEXT: [[TMP67:%.*]] = uitofp i32 [[TMP66]] to float +; CHECK-NEXT: [[TMP68:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP67]]) +; CHECK-NEXT: [[TMP69:%.*]] = fmul fast float [[TMP68]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP70:%.*]] = fptoui float [[TMP69]] to i32 +; CHECK-NEXT: [[TMP71:%.*]] = sub i32 0, [[TMP66]] +; CHECK-NEXT: [[TMP72:%.*]] = mul i32 [[TMP71]], [[TMP70]] +; CHECK-NEXT: [[TMP73:%.*]] = zext i32 [[TMP70]] to i64 +; CHECK-NEXT: [[TMP74:%.*]] = zext i32 [[TMP72]] to i64 +; CHECK-NEXT: 
[[TMP75:%.*]] = mul i64 [[TMP73]], [[TMP74]] +; CHECK-NEXT: [[TMP76:%.*]] = trunc i64 [[TMP75]] to i32 +; CHECK-NEXT: [[TMP77:%.*]] = lshr i64 [[TMP75]], 32 +; CHECK-NEXT: [[TMP78:%.*]] = trunc i64 [[TMP77]] to i32 +; CHECK-NEXT: [[TMP79:%.*]] = add i32 [[TMP70]], [[TMP78]] +; CHECK-NEXT: [[TMP80:%.*]] = zext i32 [[TMP65]] to i64 +; CHECK-NEXT: [[TMP81:%.*]] = zext i32 [[TMP79]] to i64 +; CHECK-NEXT: [[TMP82:%.*]] = mul i64 [[TMP80]], [[TMP81]] +; CHECK-NEXT: [[TMP83:%.*]] = trunc i64 [[TMP82]] to i32 +; CHECK-NEXT: [[TMP84:%.*]] = lshr i64 [[TMP82]], 32 +; CHECK-NEXT: [[TMP85:%.*]] = trunc i64 [[TMP84]] to i32 +; CHECK-NEXT: [[TMP86:%.*]] = mul i32 [[TMP85]], [[TMP66]] +; CHECK-NEXT: [[TMP87:%.*]] = sub i32 [[TMP65]], [[TMP86]] +; CHECK-NEXT: [[TMP88:%.*]] = icmp uge i32 [[TMP87]], [[TMP66]] +; CHECK-NEXT: [[TMP89:%.*]] = add i32 [[TMP85]], 1 +; CHECK-NEXT: [[TMP90:%.*]] = select i1 [[TMP88]], i32 [[TMP89]], i32 [[TMP85]] +; CHECK-NEXT: [[TMP91:%.*]] = sub i32 [[TMP87]], [[TMP66]] +; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP88]], i32 [[TMP91]], i32 [[TMP87]] +; CHECK-NEXT: [[TMP93:%.*]] = icmp uge i32 [[TMP92]], [[TMP66]] +; CHECK-NEXT: [[TMP94:%.*]] = add i32 [[TMP90]], 1 +; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP93]], i32 [[TMP94]], i32 [[TMP90]] +; CHECK-NEXT: [[TMP96:%.*]] = insertelement <4 x i32> [[TMP64]], i32 [[TMP95]], i64 2 +; CHECK-NEXT: [[TMP97:%.*]] = extractelement <4 x i32> [[X]], i64 3 +; CHECK-NEXT: [[TMP98:%.*]] = extractelement <4 x i32> [[Y]], i64 3 +; CHECK-NEXT: [[TMP99:%.*]] = uitofp i32 [[TMP98]] to float +; CHECK-NEXT: [[TMP100:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP99]]) +; CHECK-NEXT: [[TMP101:%.*]] = fmul fast float [[TMP100]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP102:%.*]] = fptoui float [[TMP101]] to i32 +; CHECK-NEXT: [[TMP103:%.*]] = sub i32 0, [[TMP98]] +; CHECK-NEXT: [[TMP104:%.*]] = mul i32 [[TMP103]], [[TMP102]] +; CHECK-NEXT: [[TMP105:%.*]] = zext i32 [[TMP102]] to i64 +; CHECK-NEXT: [[TMP106:%.*]] = 
zext i32 [[TMP104]] to i64 ; CHECK-NEXT: [[TMP107:%.*]] = mul i64 [[TMP105]], [[TMP106]] ; CHECK-NEXT: [[TMP108:%.*]] = trunc i64 [[TMP107]] to i32 ; CHECK-NEXT: [[TMP109:%.*]] = lshr i64 [[TMP107]], 32 ; CHECK-NEXT: [[TMP110:%.*]] = trunc i64 [[TMP109]] to i32 -; CHECK-NEXT: [[TMP111:%.*]] = mul i32 [[TMP110]], [[TMP82]] -; CHECK-NEXT: [[TMP112:%.*]] = sub i32 [[TMP81]], [[TMP111]] -; CHECK-NEXT: [[TMP113:%.*]] = icmp uge i32 [[TMP112]], [[TMP82]] -; CHECK-NEXT: [[TMP114:%.*]] = icmp uge i32 [[TMP81]], [[TMP111]] -; CHECK-NEXT: [[TMP115:%.*]] = and i1 [[TMP113]], [[TMP114]] -; CHECK-NEXT: [[TMP116:%.*]] = add i32 [[TMP110]], 1 -; CHECK-NEXT: [[TMP117:%.*]] = sub i32 [[TMP110]], 1 -; CHECK-NEXT: [[TMP118:%.*]] = select i1 [[TMP115]], i32 [[TMP116]], i32 [[TMP110]] -; CHECK-NEXT: [[TMP119:%.*]] = select i1 [[TMP114]], i32 [[TMP118]], i32 [[TMP117]] -; CHECK-NEXT: [[TMP120:%.*]] = insertelement <4 x i32> [[TMP80]], i32 [[TMP119]], i64 2 -; CHECK-NEXT: [[TMP121:%.*]] = extractelement <4 x i32> [[X]], i64 3 -; CHECK-NEXT: [[TMP122:%.*]] = extractelement <4 x i32> [[Y]], i64 3 -; CHECK-NEXT: [[TMP123:%.*]] = uitofp i32 [[TMP122]] to float -; CHECK-NEXT: [[TMP124:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP123]]) -; CHECK-NEXT: [[TMP125:%.*]] = fmul fast float [[TMP124]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP126:%.*]] = fptoui float [[TMP125]] to i32 -; CHECK-NEXT: [[TMP127:%.*]] = zext i32 [[TMP126]] to i64 -; CHECK-NEXT: [[TMP128:%.*]] = zext i32 [[TMP122]] to i64 -; CHECK-NEXT: [[TMP129:%.*]] = mul i64 [[TMP127]], [[TMP128]] -; CHECK-NEXT: [[TMP130:%.*]] = trunc i64 [[TMP129]] to i32 -; CHECK-NEXT: [[TMP131:%.*]] = lshr i64 [[TMP129]], 32 -; CHECK-NEXT: [[TMP132:%.*]] = trunc i64 [[TMP131]] to i32 -; CHECK-NEXT: [[TMP133:%.*]] = sub i32 0, [[TMP130]] -; CHECK-NEXT: [[TMP134:%.*]] = icmp eq i32 [[TMP132]], 0 -; CHECK-NEXT: [[TMP135:%.*]] = select i1 [[TMP134]], i32 [[TMP133]], i32 [[TMP130]] -; CHECK-NEXT: [[TMP136:%.*]] = zext i32 [[TMP135]] to i64 -; 
CHECK-NEXT: [[TMP137:%.*]] = zext i32 [[TMP126]] to i64 -; CHECK-NEXT: [[TMP138:%.*]] = mul i64 [[TMP136]], [[TMP137]] -; CHECK-NEXT: [[TMP139:%.*]] = trunc i64 [[TMP138]] to i32 -; CHECK-NEXT: [[TMP140:%.*]] = lshr i64 [[TMP138]], 32 -; CHECK-NEXT: [[TMP141:%.*]] = trunc i64 [[TMP140]] to i32 -; CHECK-NEXT: [[TMP142:%.*]] = add i32 [[TMP126]], [[TMP141]] -; CHECK-NEXT: [[TMP143:%.*]] = sub i32 [[TMP126]], [[TMP141]] -; CHECK-NEXT: [[TMP144:%.*]] = select i1 [[TMP134]], i32 [[TMP142]], i32 [[TMP143]] -; CHECK-NEXT: [[TMP145:%.*]] = zext i32 [[TMP144]] to i64 -; CHECK-NEXT: [[TMP146:%.*]] = zext i32 [[TMP121]] to i64 -; CHECK-NEXT: [[TMP147:%.*]] = mul i64 [[TMP145]], [[TMP146]] -; CHECK-NEXT: [[TMP148:%.*]] = trunc i64 [[TMP147]] to i32 -; CHECK-NEXT: [[TMP149:%.*]] = lshr i64 [[TMP147]], 32 -; CHECK-NEXT: [[TMP150:%.*]] = trunc i64 [[TMP149]] to i32 -; CHECK-NEXT: [[TMP151:%.*]] = mul i32 [[TMP150]], [[TMP122]] -; CHECK-NEXT: [[TMP152:%.*]] = sub i32 [[TMP121]], [[TMP151]] -; CHECK-NEXT: [[TMP153:%.*]] = icmp uge i32 [[TMP152]], [[TMP122]] -; CHECK-NEXT: [[TMP154:%.*]] = icmp uge i32 [[TMP121]], [[TMP151]] -; CHECK-NEXT: [[TMP155:%.*]] = and i1 [[TMP153]], [[TMP154]] -; CHECK-NEXT: [[TMP156:%.*]] = add i32 [[TMP150]], 1 -; CHECK-NEXT: [[TMP157:%.*]] = sub i32 [[TMP150]], 1 -; CHECK-NEXT: [[TMP158:%.*]] = select i1 [[TMP155]], i32 [[TMP156]], i32 [[TMP150]] -; CHECK-NEXT: [[TMP159:%.*]] = select i1 [[TMP154]], i32 [[TMP158]], i32 [[TMP157]] -; CHECK-NEXT: [[TMP160:%.*]] = insertelement <4 x i32> [[TMP120]], i32 [[TMP159]], i64 3 -; CHECK-NEXT: store <4 x i32> [[TMP160]], <4 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP111:%.*]] = add i32 [[TMP102]], [[TMP110]] +; CHECK-NEXT: [[TMP112:%.*]] = zext i32 [[TMP97]] to i64 +; CHECK-NEXT: [[TMP113:%.*]] = zext i32 [[TMP111]] to i64 +; CHECK-NEXT: [[TMP114:%.*]] = mul i64 [[TMP112]], [[TMP113]] +; CHECK-NEXT: [[TMP115:%.*]] = trunc i64 [[TMP114]] to i32 +; CHECK-NEXT: [[TMP116:%.*]] = lshr i64 [[TMP114]], 32 +; 
CHECK-NEXT: [[TMP117:%.*]] = trunc i64 [[TMP116]] to i32 +; CHECK-NEXT: [[TMP118:%.*]] = mul i32 [[TMP117]], [[TMP98]] +; CHECK-NEXT: [[TMP119:%.*]] = sub i32 [[TMP97]], [[TMP118]] +; CHECK-NEXT: [[TMP120:%.*]] = icmp uge i32 [[TMP119]], [[TMP98]] +; CHECK-NEXT: [[TMP121:%.*]] = add i32 [[TMP117]], 1 +; CHECK-NEXT: [[TMP122:%.*]] = select i1 [[TMP120]], i32 [[TMP121]], i32 [[TMP117]] +; CHECK-NEXT: [[TMP123:%.*]] = sub i32 [[TMP119]], [[TMP98]] +; CHECK-NEXT: [[TMP124:%.*]] = select i1 [[TMP120]], i32 [[TMP123]], i32 [[TMP119]] +; CHECK-NEXT: [[TMP125:%.*]] = icmp uge i32 [[TMP124]], [[TMP98]] +; CHECK-NEXT: [[TMP126:%.*]] = add i32 [[TMP122]], 1 +; CHECK-NEXT: [[TMP127:%.*]] = select i1 [[TMP125]], i32 [[TMP126]], i32 [[TMP122]] +; CHECK-NEXT: [[TMP128:%.*]] = insertelement <4 x i32> [[TMP96]], i32 [[TMP127]], i64 3 +; CHECK-NEXT: store <4 x i32> [[TMP128]], <4 x i32> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v4i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx8 s[8:15], s[0:1], 0xd -; GCN-NEXT: s_mov_b32 s6, 0x4f800000 -; GCN-NEXT: s_load_dwordx2 s[16:17], s[0:1], 0x9 -; GCN-NEXT: s_mov_b32 s19, 0xf000 -; GCN-NEXT: s_mov_b32 s18, -1 +; GCN-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0xd +; GCN-NEXT: s_mov_b32 s12, 0x4f7ffffe +; GCN-NEXT: s_mov_b32 s15, 0xf000 +; GCN-NEXT: s_mov_b32 s14, -1 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s12 -; GCN-NEXT: v_cvt_f32_u32_e32 v1, s13 -; GCN-NEXT: v_cvt_f32_u32_e32 v7, s15 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s8 +; GCN-NEXT: v_cvt_f32_u32_e32 v1, s9 +; GCN-NEXT: s_sub_i32 s2, 0, s8 +; GCN-NEXT: v_cvt_f32_u32_e32 v3, s10 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 ; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GCN-NEXT: v_mul_f32_e32 v0, s6, v0 +; GCN-NEXT: v_mul_f32_e32 v0, s12, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_f32_e32 v1, s6, v1 +; GCN-NEXT: v_mul_f32_e32 v1, s12, v1 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s12 -; GCN-NEXT: 
v_mul_lo_u32 v3, v0, s12 -; GCN-NEXT: v_mul_hi_u32 v4, v1, s13 -; GCN-NEXT: v_mul_lo_u32 v5, v1, s13 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_sub_i32_e32 v6, vcc, 0, v3 -; GCN-NEXT: v_cndmask_b32_e64 v2, v3, v6, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v0 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v5 -; GCN-NEXT: v_add_i32_e32 v6, vcc, v2, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v2, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v6, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s8 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v4 -; GCN-NEXT: v_cndmask_b32_e64 v2, v5, v3, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v1 -; GCN-NEXT: v_mul_lo_u32 v3, v0, s12 -; GCN-NEXT: v_add_i32_e32 v4, vcc, -1, v0 -; GCN-NEXT: v_sub_i32_e32 v5, vcc, s8, v3 -; GCN-NEXT: v_cmp_le_u32_e64 s[4:5], s12, v5 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v2, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v2, v1 -; GCN-NEXT: v_cvt_f32_u32_e32 v2, s14 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s9 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s8, v3 -; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v0 -; GCN-NEXT: s_and_b64 vcc, s[4:5], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc -; GCN-NEXT: v_mul_f32_e32 v2, s6, v2 +; GCN-NEXT: v_mul_lo_u32 v2, s2, v0 +; GCN-NEXT: s_sub_i32 s2, 0, s9 +; GCN-NEXT: v_mul_lo_u32 v4, s2, v1 +; GCN-NEXT: v_mul_hi_u32 v2, v0, v2 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v2, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s4, v0 +; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v3 +; GCN-NEXT: v_mul_hi_u32 v3, v1, v4 +; GCN-NEXT: v_mul_lo_u32 v4, v0, s8 +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v0 +; GCN-NEXT: v_mul_f32_e32 v2, s12, v2 ; GCN-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GCN-NEXT: v_mul_lo_u32 v3, v1, s13 -; GCN-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v6, v2, s14 -; GCN-NEXT: v_mul_lo_u32 v5, v2, s14 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s9, v3 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v3 -; GCN-NEXT: 
v_cmp_eq_u32_e64 s[4:5], 0, v6 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v5 -; GCN-NEXT: v_cndmask_b32_e64 v3, v5, v3, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v3, v3, v2 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v4 -; GCN-NEXT: v_add_i32_e32 v4, vcc, -1, v1 -; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v1 -; GCN-NEXT: v_add_i32_e32 v6, vcc, v3, v2 -; GCN-NEXT: v_subrev_i32_e32 v2, vcc, v3, v2 -; GCN-NEXT: v_rcp_iflag_f32_e32 v3, v7 -; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v6, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v2, v2, s10 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_mul_f32_e32 v3, s6, v3 -; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc -; GCN-NEXT: v_mul_lo_u32 v5, v2, s14 -; GCN-NEXT: v_cndmask_b32_e64 v1, v4, v1, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v7, v3, s15 -; GCN-NEXT: v_mul_lo_u32 v6, v3, s15 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s10, v5 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s14, v4 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v7 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v6 -; GCN-NEXT: v_cndmask_b32_e64 v4, v6, v4, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v4, v4, v3 -; GCN-NEXT: v_add_i32_e32 v6, vcc, -1, v2 -; GCN-NEXT: v_add_i32_e32 v7, vcc, v4, v3 -; GCN-NEXT: v_subrev_i32_e32 v3, vcc, v4, v3 -; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v7, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v3, v3, s11 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s10, v5 -; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_mul_lo_u32 v5, v3, s15 -; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GCN-NEXT: v_cndmask_b32_e64 v2, v6, v2, s[2:3] -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s11, v5 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s15, v4 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s11, v5 -; GCN-NEXT: v_add_i32_e32 v4, vcc, -1, v3 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, s4, v4 +; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s8, v4 +; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v5, s[2:3] +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s8, v4 +; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v5, s[2:3] 
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s8, v4 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GCN-NEXT: v_add_i32_e32 v1, vcc, v3, v1 +; GCN-NEXT: v_mul_hi_u32 v1, s5, v1 +; GCN-NEXT: s_sub_i32 s4, 0, s10 +; GCN-NEXT: v_mul_lo_u32 v5, s4, v2 +; GCN-NEXT: s_sub_i32 s4, 0, s11 +; GCN-NEXT: v_mul_lo_u32 v3, v1, s9 +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v1 +; GCN-NEXT: v_mul_hi_u32 v5, v2, v5 +; GCN-NEXT: v_sub_i32_e32 v3, vcc, s5, v3 +; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s9, v3 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[2:3] +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s9, v3 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v4, s[2:3] +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s9, v3 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; GCN-NEXT: v_cvt_f32_u32_e32 v4, s11 +; GCN-NEXT: v_add_i32_e32 v2, vcc, v5, v2 +; GCN-NEXT: v_mul_hi_u32 v2, s6, v2 +; GCN-NEXT: v_rcp_iflag_f32_e32 v4, v4 +; GCN-NEXT: v_mul_lo_u32 v3, v2, s10 +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v2 +; GCN-NEXT: v_mul_f32_e32 v4, s12, v4 +; GCN-NEXT: v_cvt_u32_f32_e32 v4, v4 +; GCN-NEXT: v_sub_i32_e32 v3, vcc, s6, v3 +; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s10, v3 +; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v5, s[2:3] +; GCN-NEXT: v_mul_lo_u32 v6, s4, v4 +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s10, v3 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[2:3] +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v2 +; GCN-NEXT: v_mul_hi_u32 v6, v4, v6 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s10, v3 +; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GCN-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x9 +; GCN-NEXT: v_add_i32_e32 v3, vcc, v6, v4 +; GCN-NEXT: v_mul_hi_u32 v3, s7, v3 +; GCN-NEXT: v_mul_lo_u32 v4, v3, s11 ; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v3 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] +; GCN-NEXT: v_sub_i32_e32 v4, vcc, s7, v4 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s11, v4 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s11, v4 +; 
GCN-NEXT: v_cndmask_b32_e64 v4, v4, v5, s[0:1] +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v3 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s11, v4 ; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GCN-NEXT: v_cndmask_b32_e64 v3, v4, v3, s[2:3] -; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[16:19], 0 +; GCN-NEXT: s_waitcnt lgkmcnt(0) +; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[12:15], 0 ; GCN-NEXT: s_endpgm %r = udiv <4 x i32> %x, %y store <4 x i32> %r, <4 x i32> addrspace(1)* %out @@ -1058,266 +955,202 @@ define amdgpu_kernel void @urem_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %x ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[Y:%.*]], i64 0 ; CHECK-NEXT: [[TMP3:%.*]] = uitofp i32 [[TMP2]] to float ; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP3]]) -; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP6:%.*]] = fptoui float [[TMP5]] to i32 -; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64 -; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[TMP2]] to i64 -; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP7]], [[TMP8]] -; CHECK-NEXT: [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32 -; CHECK-NEXT: [[TMP11:%.*]] = lshr i64 [[TMP9]], 32 +; CHECK-NEXT: [[TMP7:%.*]] = sub i32 0, [[TMP2]] +; CHECK-NEXT: [[TMP8:%.*]] = mul i32 [[TMP7]], [[TMP6]] +; CHECK-NEXT: [[TMP9:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = zext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP9]], [[TMP10]] ; CHECK-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 -; CHECK-NEXT: [[TMP13:%.*]] = sub i32 0, [[TMP10]] -; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[TMP12]], 0 -; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP10]] -; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP15]] to i64 -; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP13:%.*]] = lshr i64 [[TMP11]], 32 +; CHECK-NEXT: [[TMP14:%.*]] = 
trunc i64 [[TMP13]] to i32 +; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP6]], [[TMP14]] +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP1]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP15]] to i64 ; CHECK-NEXT: [[TMP18:%.*]] = mul i64 [[TMP16]], [[TMP17]] ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 ; CHECK-NEXT: [[TMP20:%.*]] = lshr i64 [[TMP18]], 32 ; CHECK-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 -; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP14]], i32 [[TMP22]], i32 [[TMP23]] -; CHECK-NEXT: [[TMP25:%.*]] = zext i32 [[TMP24]] to i64 -; CHECK-NEXT: [[TMP26:%.*]] = zext i32 [[TMP1]] to i64 -; CHECK-NEXT: [[TMP27:%.*]] = mul i64 [[TMP25]], [[TMP26]] -; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = lshr i64 [[TMP27]], 32 -; CHECK-NEXT: [[TMP30:%.*]] = trunc i64 [[TMP29]] to i32 -; CHECK-NEXT: [[TMP31:%.*]] = mul i32 [[TMP30]], [[TMP2]] -; CHECK-NEXT: [[TMP32:%.*]] = sub i32 [[TMP1]], [[TMP31]] -; CHECK-NEXT: [[TMP33:%.*]] = icmp uge i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP34:%.*]] = icmp uge i32 [[TMP1]], [[TMP31]] -; CHECK-NEXT: [[TMP35:%.*]] = and i1 [[TMP33]], [[TMP34]] -; CHECK-NEXT: [[TMP36:%.*]] = sub i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP37:%.*]] = add i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP35]], i32 [[TMP36]], i32 [[TMP32]] -; CHECK-NEXT: [[TMP39:%.*]] = select i1 [[TMP34]], i32 [[TMP38]], i32 [[TMP37]] -; CHECK-NEXT: [[TMP40:%.*]] = insertelement <4 x i32> undef, i32 [[TMP39]], i64 0 -; CHECK-NEXT: [[TMP41:%.*]] = extractelement <4 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP42:%.*]] = extractelement <4 x i32> [[Y]], i64 1 -; CHECK-NEXT: [[TMP43:%.*]] = uitofp i32 [[TMP42]] to float -; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP43]]) -; CHECK-NEXT: [[TMP45:%.*]] = fmul fast float [[TMP44]], 0x41F0000000000000 -; 
CHECK-NEXT: [[TMP46:%.*]] = fptoui float [[TMP45]] to i32 -; CHECK-NEXT: [[TMP47:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP48:%.*]] = zext i32 [[TMP42]] to i64 -; CHECK-NEXT: [[TMP49:%.*]] = mul i64 [[TMP47]], [[TMP48]] -; CHECK-NEXT: [[TMP50:%.*]] = trunc i64 [[TMP49]] to i32 -; CHECK-NEXT: [[TMP51:%.*]] = lshr i64 [[TMP49]], 32 -; CHECK-NEXT: [[TMP52:%.*]] = trunc i64 [[TMP51]] to i32 -; CHECK-NEXT: [[TMP53:%.*]] = sub i32 0, [[TMP50]] -; CHECK-NEXT: [[TMP54:%.*]] = icmp eq i32 [[TMP52]], 0 -; CHECK-NEXT: [[TMP55:%.*]] = select i1 [[TMP54]], i32 [[TMP53]], i32 [[TMP50]] -; CHECK-NEXT: [[TMP56:%.*]] = zext i32 [[TMP55]] to i64 -; CHECK-NEXT: [[TMP57:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP58:%.*]] = mul i64 [[TMP56]], [[TMP57]] -; CHECK-NEXT: [[TMP59:%.*]] = trunc i64 [[TMP58]] to i32 -; CHECK-NEXT: [[TMP60:%.*]] = lshr i64 [[TMP58]], 32 -; CHECK-NEXT: [[TMP61:%.*]] = trunc i64 [[TMP60]] to i32 -; CHECK-NEXT: [[TMP62:%.*]] = add i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP63:%.*]] = sub i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP64:%.*]] = select i1 [[TMP54]], i32 [[TMP62]], i32 [[TMP63]] -; CHECK-NEXT: [[TMP65:%.*]] = zext i32 [[TMP64]] to i64 -; CHECK-NEXT: [[TMP66:%.*]] = zext i32 [[TMP41]] to i64 -; CHECK-NEXT: [[TMP67:%.*]] = mul i64 [[TMP65]], [[TMP66]] -; CHECK-NEXT: [[TMP68:%.*]] = trunc i64 [[TMP67]] to i32 -; CHECK-NEXT: [[TMP69:%.*]] = lshr i64 [[TMP67]], 32 -; CHECK-NEXT: [[TMP70:%.*]] = trunc i64 [[TMP69]] to i32 -; CHECK-NEXT: [[TMP71:%.*]] = mul i32 [[TMP70]], [[TMP42]] -; CHECK-NEXT: [[TMP72:%.*]] = sub i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP73:%.*]] = icmp uge i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP74:%.*]] = icmp uge i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP75:%.*]] = and i1 [[TMP73]], [[TMP74]] -; CHECK-NEXT: [[TMP76:%.*]] = sub i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP77:%.*]] = add i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP75]], i32 [[TMP76]], i32 [[TMP72]] -; 
CHECK-NEXT: [[TMP79:%.*]] = select i1 [[TMP74]], i32 [[TMP78]], i32 [[TMP77]] -; CHECK-NEXT: [[TMP80:%.*]] = insertelement <4 x i32> [[TMP40]], i32 [[TMP79]], i64 1 -; CHECK-NEXT: [[TMP81:%.*]] = extractelement <4 x i32> [[X]], i64 2 -; CHECK-NEXT: [[TMP82:%.*]] = extractelement <4 x i32> [[Y]], i64 2 -; CHECK-NEXT: [[TMP83:%.*]] = uitofp i32 [[TMP82]] to float -; CHECK-NEXT: [[TMP84:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP83]]) -; CHECK-NEXT: [[TMP85:%.*]] = fmul fast float [[TMP84]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP86:%.*]] = fptoui float [[TMP85]] to i32 -; CHECK-NEXT: [[TMP87:%.*]] = zext i32 [[TMP86]] to i64 -; CHECK-NEXT: [[TMP88:%.*]] = zext i32 [[TMP82]] to i64 -; CHECK-NEXT: [[TMP89:%.*]] = mul i64 [[TMP87]], [[TMP88]] -; CHECK-NEXT: [[TMP90:%.*]] = trunc i64 [[TMP89]] to i32 -; CHECK-NEXT: [[TMP91:%.*]] = lshr i64 [[TMP89]], 32 -; CHECK-NEXT: [[TMP92:%.*]] = trunc i64 [[TMP91]] to i32 -; CHECK-NEXT: [[TMP93:%.*]] = sub i32 0, [[TMP90]] -; CHECK-NEXT: [[TMP94:%.*]] = icmp eq i32 [[TMP92]], 0 -; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP93]], i32 [[TMP90]] -; CHECK-NEXT: [[TMP96:%.*]] = zext i32 [[TMP95]] to i64 -; CHECK-NEXT: [[TMP97:%.*]] = zext i32 [[TMP86]] to i64 -; CHECK-NEXT: [[TMP98:%.*]] = mul i64 [[TMP96]], [[TMP97]] -; CHECK-NEXT: [[TMP99:%.*]] = trunc i64 [[TMP98]] to i32 -; CHECK-NEXT: [[TMP100:%.*]] = lshr i64 [[TMP98]], 32 -; CHECK-NEXT: [[TMP101:%.*]] = trunc i64 [[TMP100]] to i32 -; CHECK-NEXT: [[TMP102:%.*]] = add i32 [[TMP86]], [[TMP101]] -; CHECK-NEXT: [[TMP103:%.*]] = sub i32 [[TMP86]], [[TMP101]] -; CHECK-NEXT: [[TMP104:%.*]] = select i1 [[TMP94]], i32 [[TMP102]], i32 [[TMP103]] -; CHECK-NEXT: [[TMP105:%.*]] = zext i32 [[TMP104]] to i64 -; CHECK-NEXT: [[TMP106:%.*]] = zext i32 [[TMP81]] to i64 -; CHECK-NEXT: [[TMP107:%.*]] = mul i64 [[TMP105]], [[TMP106]] -; CHECK-NEXT: [[TMP108:%.*]] = trunc i64 [[TMP107]] to i32 -; CHECK-NEXT: [[TMP109:%.*]] = lshr i64 [[TMP107]], 32 -; CHECK-NEXT: 
[[TMP110:%.*]] = trunc i64 [[TMP109]] to i32 -; CHECK-NEXT: [[TMP111:%.*]] = mul i32 [[TMP110]], [[TMP82]] -; CHECK-NEXT: [[TMP112:%.*]] = sub i32 [[TMP81]], [[TMP111]] -; CHECK-NEXT: [[TMP113:%.*]] = icmp uge i32 [[TMP112]], [[TMP82]] -; CHECK-NEXT: [[TMP114:%.*]] = icmp uge i32 [[TMP81]], [[TMP111]] -; CHECK-NEXT: [[TMP115:%.*]] = and i1 [[TMP113]], [[TMP114]] -; CHECK-NEXT: [[TMP116:%.*]] = sub i32 [[TMP112]], [[TMP82]] -; CHECK-NEXT: [[TMP117:%.*]] = add i32 [[TMP112]], [[TMP82]] -; CHECK-NEXT: [[TMP118:%.*]] = select i1 [[TMP115]], i32 [[TMP116]], i32 [[TMP112]] -; CHECK-NEXT: [[TMP119:%.*]] = select i1 [[TMP114]], i32 [[TMP118]], i32 [[TMP117]] -; CHECK-NEXT: [[TMP120:%.*]] = insertelement <4 x i32> [[TMP80]], i32 [[TMP119]], i64 2 -; CHECK-NEXT: [[TMP121:%.*]] = extractelement <4 x i32> [[X]], i64 3 -; CHECK-NEXT: [[TMP122:%.*]] = extractelement <4 x i32> [[Y]], i64 3 -; CHECK-NEXT: [[TMP123:%.*]] = uitofp i32 [[TMP122]] to float -; CHECK-NEXT: [[TMP124:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP123]]) -; CHECK-NEXT: [[TMP125:%.*]] = fmul fast float [[TMP124]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP126:%.*]] = fptoui float [[TMP125]] to i32 -; CHECK-NEXT: [[TMP127:%.*]] = zext i32 [[TMP126]] to i64 -; CHECK-NEXT: [[TMP128:%.*]] = zext i32 [[TMP122]] to i64 -; CHECK-NEXT: [[TMP129:%.*]] = mul i64 [[TMP127]], [[TMP128]] -; CHECK-NEXT: [[TMP130:%.*]] = trunc i64 [[TMP129]] to i32 -; CHECK-NEXT: [[TMP131:%.*]] = lshr i64 [[TMP129]], 32 -; CHECK-NEXT: [[TMP132:%.*]] = trunc i64 [[TMP131]] to i32 -; CHECK-NEXT: [[TMP133:%.*]] = sub i32 0, [[TMP130]] -; CHECK-NEXT: [[TMP134:%.*]] = icmp eq i32 [[TMP132]], 0 -; CHECK-NEXT: [[TMP135:%.*]] = select i1 [[TMP134]], i32 [[TMP133]], i32 [[TMP130]] -; CHECK-NEXT: [[TMP136:%.*]] = zext i32 [[TMP135]] to i64 -; CHECK-NEXT: [[TMP137:%.*]] = zext i32 [[TMP126]] to i64 -; CHECK-NEXT: [[TMP138:%.*]] = mul i64 [[TMP136]], [[TMP137]] -; CHECK-NEXT: [[TMP139:%.*]] = trunc i64 [[TMP138]] to i32 -; CHECK-NEXT: 
[[TMP140:%.*]] = lshr i64 [[TMP138]], 32 -; CHECK-NEXT: [[TMP141:%.*]] = trunc i64 [[TMP140]] to i32 -; CHECK-NEXT: [[TMP142:%.*]] = add i32 [[TMP126]], [[TMP141]] -; CHECK-NEXT: [[TMP143:%.*]] = sub i32 [[TMP126]], [[TMP141]] -; CHECK-NEXT: [[TMP144:%.*]] = select i1 [[TMP134]], i32 [[TMP142]], i32 [[TMP143]] -; CHECK-NEXT: [[TMP145:%.*]] = zext i32 [[TMP144]] to i64 -; CHECK-NEXT: [[TMP146:%.*]] = zext i32 [[TMP121]] to i64 -; CHECK-NEXT: [[TMP147:%.*]] = mul i64 [[TMP145]], [[TMP146]] -; CHECK-NEXT: [[TMP148:%.*]] = trunc i64 [[TMP147]] to i32 -; CHECK-NEXT: [[TMP149:%.*]] = lshr i64 [[TMP147]], 32 -; CHECK-NEXT: [[TMP150:%.*]] = trunc i64 [[TMP149]] to i32 -; CHECK-NEXT: [[TMP151:%.*]] = mul i32 [[TMP150]], [[TMP122]] -; CHECK-NEXT: [[TMP152:%.*]] = sub i32 [[TMP121]], [[TMP151]] -; CHECK-NEXT: [[TMP153:%.*]] = icmp uge i32 [[TMP152]], [[TMP122]] -; CHECK-NEXT: [[TMP154:%.*]] = icmp uge i32 [[TMP121]], [[TMP151]] -; CHECK-NEXT: [[TMP155:%.*]] = and i1 [[TMP153]], [[TMP154]] -; CHECK-NEXT: [[TMP156:%.*]] = sub i32 [[TMP152]], [[TMP122]] -; CHECK-NEXT: [[TMP157:%.*]] = add i32 [[TMP152]], [[TMP122]] -; CHECK-NEXT: [[TMP158:%.*]] = select i1 [[TMP155]], i32 [[TMP156]], i32 [[TMP152]] -; CHECK-NEXT: [[TMP159:%.*]] = select i1 [[TMP154]], i32 [[TMP158]], i32 [[TMP157]] -; CHECK-NEXT: [[TMP160:%.*]] = insertelement <4 x i32> [[TMP120]], i32 [[TMP159]], i64 3 -; CHECK-NEXT: store <4 x i32> [[TMP160]], <4 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP22:%.*]] = mul i32 [[TMP21]], [[TMP2]] +; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP1]], [[TMP22]] +; CHECK-NEXT: [[TMP24:%.*]] = icmp uge i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP25:%.*]] = sub i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP24]], i32 [[TMP25]], i32 [[TMP23]] +; CHECK-NEXT: [[TMP27:%.*]] = icmp uge i32 [[TMP26]], [[TMP2]] +; CHECK-NEXT: [[TMP28:%.*]] = sub i32 [[TMP26]], [[TMP2]] +; CHECK-NEXT: [[TMP29:%.*]] = select i1 [[TMP27]], i32 [[TMP28]], i32 [[TMP26]] +; CHECK-NEXT: 
[[TMP30:%.*]] = insertelement <4 x i32> undef, i32 [[TMP29]], i64 0 +; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i32> [[Y]], i64 1 +; CHECK-NEXT: [[TMP33:%.*]] = uitofp i32 [[TMP32]] to float +; CHECK-NEXT: [[TMP34:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP33]]) +; CHECK-NEXT: [[TMP35:%.*]] = fmul fast float [[TMP34]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP36:%.*]] = fptoui float [[TMP35]] to i32 +; CHECK-NEXT: [[TMP37:%.*]] = sub i32 0, [[TMP32]] +; CHECK-NEXT: [[TMP38:%.*]] = mul i32 [[TMP37]], [[TMP36]] +; CHECK-NEXT: [[TMP39:%.*]] = zext i32 [[TMP36]] to i64 +; CHECK-NEXT: [[TMP40:%.*]] = zext i32 [[TMP38]] to i64 +; CHECK-NEXT: [[TMP41:%.*]] = mul i64 [[TMP39]], [[TMP40]] +; CHECK-NEXT: [[TMP42:%.*]] = trunc i64 [[TMP41]] to i32 +; CHECK-NEXT: [[TMP43:%.*]] = lshr i64 [[TMP41]], 32 +; CHECK-NEXT: [[TMP44:%.*]] = trunc i64 [[TMP43]] to i32 +; CHECK-NEXT: [[TMP45:%.*]] = add i32 [[TMP36]], [[TMP44]] +; CHECK-NEXT: [[TMP46:%.*]] = zext i32 [[TMP31]] to i64 +; CHECK-NEXT: [[TMP47:%.*]] = zext i32 [[TMP45]] to i64 +; CHECK-NEXT: [[TMP48:%.*]] = mul i64 [[TMP46]], [[TMP47]] +; CHECK-NEXT: [[TMP49:%.*]] = trunc i64 [[TMP48]] to i32 +; CHECK-NEXT: [[TMP50:%.*]] = lshr i64 [[TMP48]], 32 +; CHECK-NEXT: [[TMP51:%.*]] = trunc i64 [[TMP50]] to i32 +; CHECK-NEXT: [[TMP52:%.*]] = mul i32 [[TMP51]], [[TMP32]] +; CHECK-NEXT: [[TMP53:%.*]] = sub i32 [[TMP31]], [[TMP52]] +; CHECK-NEXT: [[TMP54:%.*]] = icmp uge i32 [[TMP53]], [[TMP32]] +; CHECK-NEXT: [[TMP55:%.*]] = sub i32 [[TMP53]], [[TMP32]] +; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP54]], i32 [[TMP55]], i32 [[TMP53]] +; CHECK-NEXT: [[TMP57:%.*]] = icmp uge i32 [[TMP56]], [[TMP32]] +; CHECK-NEXT: [[TMP58:%.*]] = sub i32 [[TMP56]], [[TMP32]] +; CHECK-NEXT: [[TMP59:%.*]] = select i1 [[TMP57]], i32 [[TMP58]], i32 [[TMP56]] +; CHECK-NEXT: [[TMP60:%.*]] = insertelement <4 x i32> [[TMP30]], i32 [[TMP59]], i64 1 +; CHECK-NEXT: [[TMP61:%.*]] = 
extractelement <4 x i32> [[X]], i64 2 +; CHECK-NEXT: [[TMP62:%.*]] = extractelement <4 x i32> [[Y]], i64 2 +; CHECK-NEXT: [[TMP63:%.*]] = uitofp i32 [[TMP62]] to float +; CHECK-NEXT: [[TMP64:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP63]]) +; CHECK-NEXT: [[TMP65:%.*]] = fmul fast float [[TMP64]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP66:%.*]] = fptoui float [[TMP65]] to i32 +; CHECK-NEXT: [[TMP67:%.*]] = sub i32 0, [[TMP62]] +; CHECK-NEXT: [[TMP68:%.*]] = mul i32 [[TMP67]], [[TMP66]] +; CHECK-NEXT: [[TMP69:%.*]] = zext i32 [[TMP66]] to i64 +; CHECK-NEXT: [[TMP70:%.*]] = zext i32 [[TMP68]] to i64 +; CHECK-NEXT: [[TMP71:%.*]] = mul i64 [[TMP69]], [[TMP70]] +; CHECK-NEXT: [[TMP72:%.*]] = trunc i64 [[TMP71]] to i32 +; CHECK-NEXT: [[TMP73:%.*]] = lshr i64 [[TMP71]], 32 +; CHECK-NEXT: [[TMP74:%.*]] = trunc i64 [[TMP73]] to i32 +; CHECK-NEXT: [[TMP75:%.*]] = add i32 [[TMP66]], [[TMP74]] +; CHECK-NEXT: [[TMP76:%.*]] = zext i32 [[TMP61]] to i64 +; CHECK-NEXT: [[TMP77:%.*]] = zext i32 [[TMP75]] to i64 +; CHECK-NEXT: [[TMP78:%.*]] = mul i64 [[TMP76]], [[TMP77]] +; CHECK-NEXT: [[TMP79:%.*]] = trunc i64 [[TMP78]] to i32 +; CHECK-NEXT: [[TMP80:%.*]] = lshr i64 [[TMP78]], 32 +; CHECK-NEXT: [[TMP81:%.*]] = trunc i64 [[TMP80]] to i32 +; CHECK-NEXT: [[TMP82:%.*]] = mul i32 [[TMP81]], [[TMP62]] +; CHECK-NEXT: [[TMP83:%.*]] = sub i32 [[TMP61]], [[TMP82]] +; CHECK-NEXT: [[TMP84:%.*]] = icmp uge i32 [[TMP83]], [[TMP62]] +; CHECK-NEXT: [[TMP85:%.*]] = sub i32 [[TMP83]], [[TMP62]] +; CHECK-NEXT: [[TMP86:%.*]] = select i1 [[TMP84]], i32 [[TMP85]], i32 [[TMP83]] +; CHECK-NEXT: [[TMP87:%.*]] = icmp uge i32 [[TMP86]], [[TMP62]] +; CHECK-NEXT: [[TMP88:%.*]] = sub i32 [[TMP86]], [[TMP62]] +; CHECK-NEXT: [[TMP89:%.*]] = select i1 [[TMP87]], i32 [[TMP88]], i32 [[TMP86]] +; CHECK-NEXT: [[TMP90:%.*]] = insertelement <4 x i32> [[TMP60]], i32 [[TMP89]], i64 2 +; CHECK-NEXT: [[TMP91:%.*]] = extractelement <4 x i32> [[X]], i64 3 +; CHECK-NEXT: [[TMP92:%.*]] = extractelement <4 x i32> 
[[Y]], i64 3 +; CHECK-NEXT: [[TMP93:%.*]] = uitofp i32 [[TMP92]] to float +; CHECK-NEXT: [[TMP94:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP93]]) +; CHECK-NEXT: [[TMP95:%.*]] = fmul fast float [[TMP94]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP96:%.*]] = fptoui float [[TMP95]] to i32 +; CHECK-NEXT: [[TMP97:%.*]] = sub i32 0, [[TMP92]] +; CHECK-NEXT: [[TMP98:%.*]] = mul i32 [[TMP97]], [[TMP96]] +; CHECK-NEXT: [[TMP99:%.*]] = zext i32 [[TMP96]] to i64 +; CHECK-NEXT: [[TMP100:%.*]] = zext i32 [[TMP98]] to i64 +; CHECK-NEXT: [[TMP101:%.*]] = mul i64 [[TMP99]], [[TMP100]] +; CHECK-NEXT: [[TMP102:%.*]] = trunc i64 [[TMP101]] to i32 +; CHECK-NEXT: [[TMP103:%.*]] = lshr i64 [[TMP101]], 32 +; CHECK-NEXT: [[TMP104:%.*]] = trunc i64 [[TMP103]] to i32 +; CHECK-NEXT: [[TMP105:%.*]] = add i32 [[TMP96]], [[TMP104]] +; CHECK-NEXT: [[TMP106:%.*]] = zext i32 [[TMP91]] to i64 +; CHECK-NEXT: [[TMP107:%.*]] = zext i32 [[TMP105]] to i64 +; CHECK-NEXT: [[TMP108:%.*]] = mul i64 [[TMP106]], [[TMP107]] +; CHECK-NEXT: [[TMP109:%.*]] = trunc i64 [[TMP108]] to i32 +; CHECK-NEXT: [[TMP110:%.*]] = lshr i64 [[TMP108]], 32 +; CHECK-NEXT: [[TMP111:%.*]] = trunc i64 [[TMP110]] to i32 +; CHECK-NEXT: [[TMP112:%.*]] = mul i32 [[TMP111]], [[TMP92]] +; CHECK-NEXT: [[TMP113:%.*]] = sub i32 [[TMP91]], [[TMP112]] +; CHECK-NEXT: [[TMP114:%.*]] = icmp uge i32 [[TMP113]], [[TMP92]] +; CHECK-NEXT: [[TMP115:%.*]] = sub i32 [[TMP113]], [[TMP92]] +; CHECK-NEXT: [[TMP116:%.*]] = select i1 [[TMP114]], i32 [[TMP115]], i32 [[TMP113]] +; CHECK-NEXT: [[TMP117:%.*]] = icmp uge i32 [[TMP116]], [[TMP92]] +; CHECK-NEXT: [[TMP118:%.*]] = sub i32 [[TMP116]], [[TMP92]] +; CHECK-NEXT: [[TMP119:%.*]] = select i1 [[TMP117]], i32 [[TMP118]], i32 [[TMP116]] +; CHECK-NEXT: [[TMP120:%.*]] = insertelement <4 x i32> [[TMP90]], i32 [[TMP119]], i64 3 +; CHECK-NEXT: store <4 x i32> [[TMP120]], <4 x i32> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v4i32: ; GCN: ; %bb.0: -; GCN-NEXT: 
s_load_dwordx8 s[8:15], s[0:1], 0xd -; GCN-NEXT: s_mov_b32 s6, 0x4f800000 -; GCN-NEXT: s_load_dwordx2 s[16:17], s[0:1], 0x9 -; GCN-NEXT: s_mov_b32 s19, 0xf000 -; GCN-NEXT: s_mov_b32 s18, -1 +; GCN-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0xd +; GCN-NEXT: s_mov_b32 s12, 0x4f7ffffe +; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s12 -; GCN-NEXT: v_cvt_f32_u32_e32 v1, s13 -; GCN-NEXT: v_cvt_f32_u32_e32 v7, s15 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s8 +; GCN-NEXT: s_sub_i32 s2, 0, s8 +; GCN-NEXT: v_cvt_f32_u32_e32 v1, s9 +; GCN-NEXT: v_cvt_f32_u32_e32 v4, s11 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GCN-NEXT: s_sub_i32 s3, 0, s9 ; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GCN-NEXT: v_mul_f32_e32 v0, s6, v0 +; GCN-NEXT: v_cvt_f32_u32_e32 v2, s10 +; GCN-NEXT: v_mul_f32_e32 v0, s12, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_f32_e32 v1, s6, v1 +; GCN-NEXT: v_mul_f32_e32 v1, s12, v1 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_mul_lo_u32 v2, v0, s12 -; GCN-NEXT: v_mul_hi_u32 v3, v0, s12 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v2 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v3 -; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v0 -; GCN-NEXT: v_mul_lo_u32 v3, v1, s13 -; GCN-NEXT: v_add_i32_e32 v4, vcc, v2, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v2, v0 -; GCN-NEXT: v_mul_hi_u32 v2, v1, s13 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[0:1] -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v3 -; GCN-NEXT: v_mul_hi_u32 v0, v0, s8 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v1 -; GCN-NEXT: v_mul_lo_u32 v0, v0, s12 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v2, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v2, v1 -; GCN-NEXT: v_cvt_f32_u32_e32 v2, s14 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s9 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s8, v0 ; GCN-NEXT: 
v_rcp_iflag_f32_e32 v2, v2 -; GCN-NEXT: v_cmp_ge_u32_e64 s[4:5], s8, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v1, s13 -; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s12, v3 -; GCN-NEXT: v_mul_f32_e32 v2, s6, v2 +; GCN-NEXT: v_mul_lo_u32 v3, s2, v0 +; GCN-NEXT: s_sub_i32 s2, 0, s10 +; GCN-NEXT: v_mul_f32_e32 v2, s12, v2 +; GCN-NEXT: v_mul_hi_u32 v3, v0, v3 ; GCN-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GCN-NEXT: v_add_i32_e32 v4, vcc, s12, v3 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s12, v3 -; GCN-NEXT: s_and_b64 vcc, s[2:3], s[4:5] -; GCN-NEXT: v_mul_lo_u32 v5, v2, s14 -; GCN-NEXT: v_mul_hi_u32 v6, v2, s14 -; GCN-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s9, v1 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v1 -; GCN-NEXT: v_sub_i32_e32 v1, vcc, 0, v5 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v6 -; GCN-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v2 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v3 -; GCN-NEXT: v_add_i32_e32 v4, vcc, s13, v3 -; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s13, v3 -; GCN-NEXT: v_add_i32_e32 v6, vcc, v1, v2 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v1, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v6, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s10 -; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v7 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GCN-NEXT: v_mul_lo_u32 v5, v1, s14 -; GCN-NEXT: v_mul_f32_e32 v1, s6, v2 -; GCN-NEXT: v_cvt_u32_f32_e32 v2, v1 -; GCN-NEXT: v_cndmask_b32_e64 v1, v4, v3, s[2:3] -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s10, v5 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s14, v3 -; GCN-NEXT: v_mul_lo_u32 v4, v2, s15 -; GCN-NEXT: v_mul_hi_u32 v6, v2, s15 -; GCN-NEXT: v_sub_i32_e32 v7, vcc, 0, v4 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v6 -; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v7, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v4, v4, v2 -; GCN-NEXT: v_add_i32_e32 v6, vcc, s14, v3 -; GCN-NEXT: v_add_i32_e32 v7, vcc, v4, v2 -; GCN-NEXT: 
v_subrev_i32_e32 v2, vcc, v4, v2 -; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v7, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v2, v2, s11 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s10, v5 -; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s14, v3 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_mul_lo_u32 v5, v2, s15 -; GCN-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc -; GCN-NEXT: v_cndmask_b32_e64 v2, v6, v2, s[2:3] -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s11, v5 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s11, v5 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s15, v3 -; GCN-NEXT: v_add_i32_e32 v4, vcc, s15, v3 -; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s15, v3 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GCN-NEXT: v_cndmask_b32_e64 v3, v4, v3, s[2:3] -; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[16:19], 0 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v3, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s4, v0 +; GCN-NEXT: v_rcp_iflag_f32_e32 v3, v4 +; GCN-NEXT: v_mul_lo_u32 v4, s3, v1 +; GCN-NEXT: s_mov_b32 s3, 0xf000 +; GCN-NEXT: v_mul_lo_u32 v0, v0, s8 +; GCN-NEXT: v_mul_f32_e32 v3, s12, v3 +; GCN-NEXT: v_mul_hi_u32 v4, v1, v4 +; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, s4, v0 +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s8, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s8, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GCN-NEXT: v_add_i32_e32 v1, vcc, v4, v1 +; GCN-NEXT: v_mul_hi_u32 v1, s5, v1 +; GCN-NEXT: v_mul_lo_u32 v4, s2, v2 +; GCN-NEXT: s_sub_i32 s2, 0, s11 +; GCN-NEXT: v_mul_lo_u32 v1, v1, s9 +; GCN-NEXT: v_mul_hi_u32 v4, v2, v4 +; GCN-NEXT: v_sub_i32_e32 v1, vcc, s5, v1 +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s9, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s9, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GCN-NEXT: v_cndmask_b32_e32 v1, 
v1, v5, vcc +; GCN-NEXT: v_add_i32_e32 v2, vcc, v4, v2 +; GCN-NEXT: v_mul_hi_u32 v2, s6, v2 +; GCN-NEXT: v_mul_lo_u32 v4, s2, v3 +; GCN-NEXT: s_mov_b32 s2, -1 +; GCN-NEXT: v_mul_lo_u32 v2, v2, s10 +; GCN-NEXT: v_mul_hi_u32 v4, v3, v4 +; GCN-NEXT: v_sub_i32_e32 v2, vcc, s6, v2 +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s10, v2 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s10, v2 +; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s10, v2 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s10, v2 +; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GCN-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; GCN-NEXT: v_mul_hi_u32 v3, s7, v3 +; GCN-NEXT: v_mul_lo_u32 v3, v3, s11 +; GCN-NEXT: v_sub_i32_e32 v3, vcc, s7, v3 +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s11, v3 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s11, v3 +; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s11, v3 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s11, v3 +; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 ; GCN-NEXT: s_endpgm %r = urem <4 x i32> %x, %y store <4 x i32> %r, <4 x i32> addrspace(1)* %out @@ -1337,331 +1170,284 @@ define amdgpu_kernel void @sdiv_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %x ; CHECK-NEXT: [[TMP9:%.*]] = xor i32 [[TMP7]], [[TMP4]] ; CHECK-NEXT: [[TMP10:%.*]] = uitofp i32 [[TMP9]] to float ; CHECK-NEXT: [[TMP11:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP10]]) -; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[TMP11]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[TMP11]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP13:%.*]] = fptoui float [[TMP12]] to i32 -; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP13]] to i64 -; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP9]] to i64 -; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP14]], [[TMP15]] -; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[TMP16]] to i32 -; CHECK-NEXT: [[TMP18:%.*]] = lshr i64 [[TMP16]], 32 +; CHECK-NEXT: [[TMP14:%.*]] = sub i32 0, [[TMP9]] +; 
CHECK-NEXT: [[TMP15:%.*]] = mul i32 [[TMP14]], [[TMP13]] +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP13]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP15]] to i64 +; CHECK-NEXT: [[TMP18:%.*]] = mul i64 [[TMP16]], [[TMP17]] ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 -; CHECK-NEXT: [[TMP20:%.*]] = sub i32 0, [[TMP17]] -; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i32 [[TMP19]], 0 -; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 [[TMP17]] -; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP22]] to i64 -; CHECK-NEXT: [[TMP24:%.*]] = zext i32 [[TMP13]] to i64 +; CHECK-NEXT: [[TMP20:%.*]] = lshr i64 [[TMP18]], 32 +; CHECK-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 +; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[TMP13]], [[TMP21]] +; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP24:%.*]] = zext i32 [[TMP22]] to i64 ; CHECK-NEXT: [[TMP25:%.*]] = mul i64 [[TMP23]], [[TMP24]] ; CHECK-NEXT: [[TMP26:%.*]] = trunc i64 [[TMP25]] to i32 ; CHECK-NEXT: [[TMP27:%.*]] = lshr i64 [[TMP25]], 32 ; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = add i32 [[TMP13]], [[TMP28]] -; CHECK-NEXT: [[TMP30:%.*]] = sub i32 [[TMP13]], [[TMP28]] -; CHECK-NEXT: [[TMP31:%.*]] = select i1 [[TMP21]], i32 [[TMP29]], i32 [[TMP30]] -; CHECK-NEXT: [[TMP32:%.*]] = zext i32 [[TMP31]] to i64 -; CHECK-NEXT: [[TMP33:%.*]] = zext i32 [[TMP8]] to i64 -; CHECK-NEXT: [[TMP34:%.*]] = mul i64 [[TMP32]], [[TMP33]] -; CHECK-NEXT: [[TMP35:%.*]] = trunc i64 [[TMP34]] to i32 -; CHECK-NEXT: [[TMP36:%.*]] = lshr i64 [[TMP34]], 32 -; CHECK-NEXT: [[TMP37:%.*]] = trunc i64 [[TMP36]] to i32 -; CHECK-NEXT: [[TMP38:%.*]] = mul i32 [[TMP37]], [[TMP9]] -; CHECK-NEXT: [[TMP39:%.*]] = sub i32 [[TMP8]], [[TMP38]] -; CHECK-NEXT: [[TMP40:%.*]] = icmp uge i32 [[TMP39]], [[TMP9]] -; CHECK-NEXT: [[TMP41:%.*]] = icmp uge i32 [[TMP8]], [[TMP38]] -; CHECK-NEXT: [[TMP42:%.*]] = and i1 [[TMP40]], [[TMP41]] -; CHECK-NEXT: [[TMP43:%.*]] = add i32 
[[TMP37]], 1 -; CHECK-NEXT: [[TMP44:%.*]] = sub i32 [[TMP37]], 1 -; CHECK-NEXT: [[TMP45:%.*]] = select i1 [[TMP42]], i32 [[TMP43]], i32 [[TMP37]] -; CHECK-NEXT: [[TMP46:%.*]] = select i1 [[TMP41]], i32 [[TMP45]], i32 [[TMP44]] -; CHECK-NEXT: [[TMP47:%.*]] = xor i32 [[TMP46]], [[TMP5]] -; CHECK-NEXT: [[TMP48:%.*]] = sub i32 [[TMP47]], [[TMP5]] -; CHECK-NEXT: [[TMP49:%.*]] = insertelement <4 x i32> undef, i32 [[TMP48]], i64 0 -; CHECK-NEXT: [[TMP50:%.*]] = extractelement <4 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP51:%.*]] = extractelement <4 x i32> [[Y]], i64 1 -; CHECK-NEXT: [[TMP52:%.*]] = ashr i32 [[TMP50]], 31 -; CHECK-NEXT: [[TMP53:%.*]] = ashr i32 [[TMP51]], 31 -; CHECK-NEXT: [[TMP54:%.*]] = xor i32 [[TMP52]], [[TMP53]] -; CHECK-NEXT: [[TMP55:%.*]] = add i32 [[TMP50]], [[TMP52]] -; CHECK-NEXT: [[TMP56:%.*]] = add i32 [[TMP51]], [[TMP53]] -; CHECK-NEXT: [[TMP57:%.*]] = xor i32 [[TMP55]], [[TMP52]] -; CHECK-NEXT: [[TMP58:%.*]] = xor i32 [[TMP56]], [[TMP53]] -; CHECK-NEXT: [[TMP59:%.*]] = uitofp i32 [[TMP58]] to float -; CHECK-NEXT: [[TMP60:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP59]]) -; CHECK-NEXT: [[TMP61:%.*]] = fmul fast float [[TMP60]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP62:%.*]] = fptoui float [[TMP61]] to i32 -; CHECK-NEXT: [[TMP63:%.*]] = zext i32 [[TMP62]] to i64 -; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[TMP58]] to i64 -; CHECK-NEXT: [[TMP65:%.*]] = mul i64 [[TMP63]], [[TMP64]] -; CHECK-NEXT: [[TMP66:%.*]] = trunc i64 [[TMP65]] to i32 -; CHECK-NEXT: [[TMP67:%.*]] = lshr i64 [[TMP65]], 32 -; CHECK-NEXT: [[TMP68:%.*]] = trunc i64 [[TMP67]] to i32 -; CHECK-NEXT: [[TMP69:%.*]] = sub i32 0, [[TMP66]] -; CHECK-NEXT: [[TMP70:%.*]] = icmp eq i32 [[TMP68]], 0 -; CHECK-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP69]], i32 [[TMP66]] -; CHECK-NEXT: [[TMP72:%.*]] = zext i32 [[TMP71]] to i64 -; CHECK-NEXT: [[TMP73:%.*]] = zext i32 [[TMP62]] to i64 -; CHECK-NEXT: [[TMP74:%.*]] = mul i64 [[TMP72]], [[TMP73]] -; CHECK-NEXT: [[TMP75:%.*]] = 
trunc i64 [[TMP74]] to i32 -; CHECK-NEXT: [[TMP76:%.*]] = lshr i64 [[TMP74]], 32 -; CHECK-NEXT: [[TMP77:%.*]] = trunc i64 [[TMP76]] to i32 -; CHECK-NEXT: [[TMP78:%.*]] = add i32 [[TMP62]], [[TMP77]] -; CHECK-NEXT: [[TMP79:%.*]] = sub i32 [[TMP62]], [[TMP77]] -; CHECK-NEXT: [[TMP80:%.*]] = select i1 [[TMP70]], i32 [[TMP78]], i32 [[TMP79]] -; CHECK-NEXT: [[TMP81:%.*]] = zext i32 [[TMP80]] to i64 -; CHECK-NEXT: [[TMP82:%.*]] = zext i32 [[TMP57]] to i64 -; CHECK-NEXT: [[TMP83:%.*]] = mul i64 [[TMP81]], [[TMP82]] -; CHECK-NEXT: [[TMP84:%.*]] = trunc i64 [[TMP83]] to i32 -; CHECK-NEXT: [[TMP85:%.*]] = lshr i64 [[TMP83]], 32 -; CHECK-NEXT: [[TMP86:%.*]] = trunc i64 [[TMP85]] to i32 -; CHECK-NEXT: [[TMP87:%.*]] = mul i32 [[TMP86]], [[TMP58]] -; CHECK-NEXT: [[TMP88:%.*]] = sub i32 [[TMP57]], [[TMP87]] -; CHECK-NEXT: [[TMP89:%.*]] = icmp uge i32 [[TMP88]], [[TMP58]] -; CHECK-NEXT: [[TMP90:%.*]] = icmp uge i32 [[TMP57]], [[TMP87]] -; CHECK-NEXT: [[TMP91:%.*]] = and i1 [[TMP89]], [[TMP90]] -; CHECK-NEXT: [[TMP92:%.*]] = add i32 [[TMP86]], 1 -; CHECK-NEXT: [[TMP93:%.*]] = sub i32 [[TMP86]], 1 -; CHECK-NEXT: [[TMP94:%.*]] = select i1 [[TMP91]], i32 [[TMP92]], i32 [[TMP86]] -; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP90]], i32 [[TMP94]], i32 [[TMP93]] -; CHECK-NEXT: [[TMP96:%.*]] = xor i32 [[TMP95]], [[TMP54]] -; CHECK-NEXT: [[TMP97:%.*]] = sub i32 [[TMP96]], [[TMP54]] -; CHECK-NEXT: [[TMP98:%.*]] = insertelement <4 x i32> [[TMP49]], i32 [[TMP97]], i64 1 -; CHECK-NEXT: [[TMP99:%.*]] = extractelement <4 x i32> [[X]], i64 2 -; CHECK-NEXT: [[TMP100:%.*]] = extractelement <4 x i32> [[Y]], i64 2 -; CHECK-NEXT: [[TMP101:%.*]] = ashr i32 [[TMP99]], 31 -; CHECK-NEXT: [[TMP102:%.*]] = ashr i32 [[TMP100]], 31 -; CHECK-NEXT: [[TMP103:%.*]] = xor i32 [[TMP101]], [[TMP102]] -; CHECK-NEXT: [[TMP104:%.*]] = add i32 [[TMP99]], [[TMP101]] -; CHECK-NEXT: [[TMP105:%.*]] = add i32 [[TMP100]], [[TMP102]] -; CHECK-NEXT: [[TMP106:%.*]] = xor i32 [[TMP104]], [[TMP101]] -; CHECK-NEXT: [[TMP107:%.*]] = 
xor i32 [[TMP105]], [[TMP102]] -; CHECK-NEXT: [[TMP108:%.*]] = uitofp i32 [[TMP107]] to float -; CHECK-NEXT: [[TMP109:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP108]]) -; CHECK-NEXT: [[TMP110:%.*]] = fmul fast float [[TMP109]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP111:%.*]] = fptoui float [[TMP110]] to i32 -; CHECK-NEXT: [[TMP112:%.*]] = zext i32 [[TMP111]] to i64 -; CHECK-NEXT: [[TMP113:%.*]] = zext i32 [[TMP107]] to i64 -; CHECK-NEXT: [[TMP114:%.*]] = mul i64 [[TMP112]], [[TMP113]] -; CHECK-NEXT: [[TMP115:%.*]] = trunc i64 [[TMP114]] to i32 -; CHECK-NEXT: [[TMP116:%.*]] = lshr i64 [[TMP114]], 32 -; CHECK-NEXT: [[TMP117:%.*]] = trunc i64 [[TMP116]] to i32 -; CHECK-NEXT: [[TMP118:%.*]] = sub i32 0, [[TMP115]] -; CHECK-NEXT: [[TMP119:%.*]] = icmp eq i32 [[TMP117]], 0 -; CHECK-NEXT: [[TMP120:%.*]] = select i1 [[TMP119]], i32 [[TMP118]], i32 [[TMP115]] -; CHECK-NEXT: [[TMP121:%.*]] = zext i32 [[TMP120]] to i64 -; CHECK-NEXT: [[TMP122:%.*]] = zext i32 [[TMP111]] to i64 -; CHECK-NEXT: [[TMP123:%.*]] = mul i64 [[TMP121]], [[TMP122]] -; CHECK-NEXT: [[TMP124:%.*]] = trunc i64 [[TMP123]] to i32 -; CHECK-NEXT: [[TMP125:%.*]] = lshr i64 [[TMP123]], 32 -; CHECK-NEXT: [[TMP126:%.*]] = trunc i64 [[TMP125]] to i32 -; CHECK-NEXT: [[TMP127:%.*]] = add i32 [[TMP111]], [[TMP126]] -; CHECK-NEXT: [[TMP128:%.*]] = sub i32 [[TMP111]], [[TMP126]] -; CHECK-NEXT: [[TMP129:%.*]] = select i1 [[TMP119]], i32 [[TMP127]], i32 [[TMP128]] -; CHECK-NEXT: [[TMP130:%.*]] = zext i32 [[TMP129]] to i64 -; CHECK-NEXT: [[TMP131:%.*]] = zext i32 [[TMP106]] to i64 -; CHECK-NEXT: [[TMP132:%.*]] = mul i64 [[TMP130]], [[TMP131]] -; CHECK-NEXT: [[TMP133:%.*]] = trunc i64 [[TMP132]] to i32 -; CHECK-NEXT: [[TMP134:%.*]] = lshr i64 [[TMP132]], 32 -; CHECK-NEXT: [[TMP135:%.*]] = trunc i64 [[TMP134]] to i32 -; CHECK-NEXT: [[TMP136:%.*]] = mul i32 [[TMP135]], [[TMP107]] -; CHECK-NEXT: [[TMP137:%.*]] = sub i32 [[TMP106]], [[TMP136]] -; CHECK-NEXT: [[TMP138:%.*]] = icmp uge i32 [[TMP137]], [[TMP107]] -; 
CHECK-NEXT: [[TMP139:%.*]] = icmp uge i32 [[TMP106]], [[TMP136]] -; CHECK-NEXT: [[TMP140:%.*]] = and i1 [[TMP138]], [[TMP139]] -; CHECK-NEXT: [[TMP141:%.*]] = add i32 [[TMP135]], 1 -; CHECK-NEXT: [[TMP142:%.*]] = sub i32 [[TMP135]], 1 -; CHECK-NEXT: [[TMP143:%.*]] = select i1 [[TMP140]], i32 [[TMP141]], i32 [[TMP135]] -; CHECK-NEXT: [[TMP144:%.*]] = select i1 [[TMP139]], i32 [[TMP143]], i32 [[TMP142]] -; CHECK-NEXT: [[TMP145:%.*]] = xor i32 [[TMP144]], [[TMP103]] -; CHECK-NEXT: [[TMP146:%.*]] = sub i32 [[TMP145]], [[TMP103]] -; CHECK-NEXT: [[TMP147:%.*]] = insertelement <4 x i32> [[TMP98]], i32 [[TMP146]], i64 2 -; CHECK-NEXT: [[TMP148:%.*]] = extractelement <4 x i32> [[X]], i64 3 -; CHECK-NEXT: [[TMP149:%.*]] = extractelement <4 x i32> [[Y]], i64 3 -; CHECK-NEXT: [[TMP150:%.*]] = ashr i32 [[TMP148]], 31 -; CHECK-NEXT: [[TMP151:%.*]] = ashr i32 [[TMP149]], 31 -; CHECK-NEXT: [[TMP152:%.*]] = xor i32 [[TMP150]], [[TMP151]] -; CHECK-NEXT: [[TMP153:%.*]] = add i32 [[TMP148]], [[TMP150]] -; CHECK-NEXT: [[TMP154:%.*]] = add i32 [[TMP149]], [[TMP151]] -; CHECK-NEXT: [[TMP155:%.*]] = xor i32 [[TMP153]], [[TMP150]] -; CHECK-NEXT: [[TMP156:%.*]] = xor i32 [[TMP154]], [[TMP151]] -; CHECK-NEXT: [[TMP157:%.*]] = uitofp i32 [[TMP156]] to float -; CHECK-NEXT: [[TMP158:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP157]]) -; CHECK-NEXT: [[TMP159:%.*]] = fmul fast float [[TMP158]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP160:%.*]] = fptoui float [[TMP159]] to i32 -; CHECK-NEXT: [[TMP161:%.*]] = zext i32 [[TMP160]] to i64 -; CHECK-NEXT: [[TMP162:%.*]] = zext i32 [[TMP156]] to i64 -; CHECK-NEXT: [[TMP163:%.*]] = mul i64 [[TMP161]], [[TMP162]] -; CHECK-NEXT: [[TMP164:%.*]] = trunc i64 [[TMP163]] to i32 -; CHECK-NEXT: [[TMP165:%.*]] = lshr i64 [[TMP163]], 32 -; CHECK-NEXT: [[TMP166:%.*]] = trunc i64 [[TMP165]] to i32 -; CHECK-NEXT: [[TMP167:%.*]] = sub i32 0, [[TMP164]] -; CHECK-NEXT: [[TMP168:%.*]] = icmp eq i32 [[TMP166]], 0 -; CHECK-NEXT: [[TMP169:%.*]] = select i1 
[[TMP168]], i32 [[TMP167]], i32 [[TMP164]] -; CHECK-NEXT: [[TMP170:%.*]] = zext i32 [[TMP169]] to i64 -; CHECK-NEXT: [[TMP171:%.*]] = zext i32 [[TMP160]] to i64 -; CHECK-NEXT: [[TMP172:%.*]] = mul i64 [[TMP170]], [[TMP171]] -; CHECK-NEXT: [[TMP173:%.*]] = trunc i64 [[TMP172]] to i32 -; CHECK-NEXT: [[TMP174:%.*]] = lshr i64 [[TMP172]], 32 -; CHECK-NEXT: [[TMP175:%.*]] = trunc i64 [[TMP174]] to i32 -; CHECK-NEXT: [[TMP176:%.*]] = add i32 [[TMP160]], [[TMP175]] -; CHECK-NEXT: [[TMP177:%.*]] = sub i32 [[TMP160]], [[TMP175]] -; CHECK-NEXT: [[TMP178:%.*]] = select i1 [[TMP168]], i32 [[TMP176]], i32 [[TMP177]] -; CHECK-NEXT: [[TMP179:%.*]] = zext i32 [[TMP178]] to i64 -; CHECK-NEXT: [[TMP180:%.*]] = zext i32 [[TMP155]] to i64 -; CHECK-NEXT: [[TMP181:%.*]] = mul i64 [[TMP179]], [[TMP180]] -; CHECK-NEXT: [[TMP182:%.*]] = trunc i64 [[TMP181]] to i32 -; CHECK-NEXT: [[TMP183:%.*]] = lshr i64 [[TMP181]], 32 -; CHECK-NEXT: [[TMP184:%.*]] = trunc i64 [[TMP183]] to i32 -; CHECK-NEXT: [[TMP185:%.*]] = mul i32 [[TMP184]], [[TMP156]] -; CHECK-NEXT: [[TMP186:%.*]] = sub i32 [[TMP155]], [[TMP185]] -; CHECK-NEXT: [[TMP187:%.*]] = icmp uge i32 [[TMP186]], [[TMP156]] -; CHECK-NEXT: [[TMP188:%.*]] = icmp uge i32 [[TMP155]], [[TMP185]] -; CHECK-NEXT: [[TMP189:%.*]] = and i1 [[TMP187]], [[TMP188]] -; CHECK-NEXT: [[TMP190:%.*]] = add i32 [[TMP184]], 1 -; CHECK-NEXT: [[TMP191:%.*]] = sub i32 [[TMP184]], 1 -; CHECK-NEXT: [[TMP192:%.*]] = select i1 [[TMP189]], i32 [[TMP190]], i32 [[TMP184]] -; CHECK-NEXT: [[TMP193:%.*]] = select i1 [[TMP188]], i32 [[TMP192]], i32 [[TMP191]] -; CHECK-NEXT: [[TMP194:%.*]] = xor i32 [[TMP193]], [[TMP152]] -; CHECK-NEXT: [[TMP195:%.*]] = sub i32 [[TMP194]], [[TMP152]] -; CHECK-NEXT: [[TMP196:%.*]] = insertelement <4 x i32> [[TMP147]], i32 [[TMP195]], i64 3 -; CHECK-NEXT: store <4 x i32> [[TMP196]], <4 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP29:%.*]] = mul i32 [[TMP28]], [[TMP9]] +; CHECK-NEXT: [[TMP30:%.*]] = sub i32 [[TMP8]], [[TMP29]] +; CHECK-NEXT: 
[[TMP31:%.*]] = icmp uge i32 [[TMP30]], [[TMP9]] +; CHECK-NEXT: [[TMP32:%.*]] = add i32 [[TMP28]], 1 +; CHECK-NEXT: [[TMP33:%.*]] = select i1 [[TMP31]], i32 [[TMP32]], i32 [[TMP28]] +; CHECK-NEXT: [[TMP34:%.*]] = sub i32 [[TMP30]], [[TMP9]] +; CHECK-NEXT: [[TMP35:%.*]] = select i1 [[TMP31]], i32 [[TMP34]], i32 [[TMP30]] +; CHECK-NEXT: [[TMP36:%.*]] = icmp uge i32 [[TMP35]], [[TMP9]] +; CHECK-NEXT: [[TMP37:%.*]] = add i32 [[TMP33]], 1 +; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP36]], i32 [[TMP37]], i32 [[TMP33]] +; CHECK-NEXT: [[TMP39:%.*]] = xor i32 [[TMP38]], [[TMP5]] +; CHECK-NEXT: [[TMP40:%.*]] = sub i32 [[TMP39]], [[TMP5]] +; CHECK-NEXT: [[TMP41:%.*]] = insertelement <4 x i32> undef, i32 [[TMP40]], i64 0 +; CHECK-NEXT: [[TMP42:%.*]] = extractelement <4 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP43:%.*]] = extractelement <4 x i32> [[Y]], i64 1 +; CHECK-NEXT: [[TMP44:%.*]] = ashr i32 [[TMP42]], 31 +; CHECK-NEXT: [[TMP45:%.*]] = ashr i32 [[TMP43]], 31 +; CHECK-NEXT: [[TMP46:%.*]] = xor i32 [[TMP44]], [[TMP45]] +; CHECK-NEXT: [[TMP47:%.*]] = add i32 [[TMP42]], [[TMP44]] +; CHECK-NEXT: [[TMP48:%.*]] = add i32 [[TMP43]], [[TMP45]] +; CHECK-NEXT: [[TMP49:%.*]] = xor i32 [[TMP47]], [[TMP44]] +; CHECK-NEXT: [[TMP50:%.*]] = xor i32 [[TMP48]], [[TMP45]] +; CHECK-NEXT: [[TMP51:%.*]] = uitofp i32 [[TMP50]] to float +; CHECK-NEXT: [[TMP52:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP51]]) +; CHECK-NEXT: [[TMP53:%.*]] = fmul fast float [[TMP52]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP54:%.*]] = fptoui float [[TMP53]] to i32 +; CHECK-NEXT: [[TMP55:%.*]] = sub i32 0, [[TMP50]] +; CHECK-NEXT: [[TMP56:%.*]] = mul i32 [[TMP55]], [[TMP54]] +; CHECK-NEXT: [[TMP57:%.*]] = zext i32 [[TMP54]] to i64 +; CHECK-NEXT: [[TMP58:%.*]] = zext i32 [[TMP56]] to i64 +; CHECK-NEXT: [[TMP59:%.*]] = mul i64 [[TMP57]], [[TMP58]] +; CHECK-NEXT: [[TMP60:%.*]] = trunc i64 [[TMP59]] to i32 +; CHECK-NEXT: [[TMP61:%.*]] = lshr i64 [[TMP59]], 32 +; CHECK-NEXT: [[TMP62:%.*]] = trunc i64 [[TMP61]] 
to i32 +; CHECK-NEXT: [[TMP63:%.*]] = add i32 [[TMP54]], [[TMP62]] +; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[TMP49]] to i64 +; CHECK-NEXT: [[TMP65:%.*]] = zext i32 [[TMP63]] to i64 +; CHECK-NEXT: [[TMP66:%.*]] = mul i64 [[TMP64]], [[TMP65]] +; CHECK-NEXT: [[TMP67:%.*]] = trunc i64 [[TMP66]] to i32 +; CHECK-NEXT: [[TMP68:%.*]] = lshr i64 [[TMP66]], 32 +; CHECK-NEXT: [[TMP69:%.*]] = trunc i64 [[TMP68]] to i32 +; CHECK-NEXT: [[TMP70:%.*]] = mul i32 [[TMP69]], [[TMP50]] +; CHECK-NEXT: [[TMP71:%.*]] = sub i32 [[TMP49]], [[TMP70]] +; CHECK-NEXT: [[TMP72:%.*]] = icmp uge i32 [[TMP71]], [[TMP50]] +; CHECK-NEXT: [[TMP73:%.*]] = add i32 [[TMP69]], 1 +; CHECK-NEXT: [[TMP74:%.*]] = select i1 [[TMP72]], i32 [[TMP73]], i32 [[TMP69]] +; CHECK-NEXT: [[TMP75:%.*]] = sub i32 [[TMP71]], [[TMP50]] +; CHECK-NEXT: [[TMP76:%.*]] = select i1 [[TMP72]], i32 [[TMP75]], i32 [[TMP71]] +; CHECK-NEXT: [[TMP77:%.*]] = icmp uge i32 [[TMP76]], [[TMP50]] +; CHECK-NEXT: [[TMP78:%.*]] = add i32 [[TMP74]], 1 +; CHECK-NEXT: [[TMP79:%.*]] = select i1 [[TMP77]], i32 [[TMP78]], i32 [[TMP74]] +; CHECK-NEXT: [[TMP80:%.*]] = xor i32 [[TMP79]], [[TMP46]] +; CHECK-NEXT: [[TMP81:%.*]] = sub i32 [[TMP80]], [[TMP46]] +; CHECK-NEXT: [[TMP82:%.*]] = insertelement <4 x i32> [[TMP41]], i32 [[TMP81]], i64 1 +; CHECK-NEXT: [[TMP83:%.*]] = extractelement <4 x i32> [[X]], i64 2 +; CHECK-NEXT: [[TMP84:%.*]] = extractelement <4 x i32> [[Y]], i64 2 +; CHECK-NEXT: [[TMP85:%.*]] = ashr i32 [[TMP83]], 31 +; CHECK-NEXT: [[TMP86:%.*]] = ashr i32 [[TMP84]], 31 +; CHECK-NEXT: [[TMP87:%.*]] = xor i32 [[TMP85]], [[TMP86]] +; CHECK-NEXT: [[TMP88:%.*]] = add i32 [[TMP83]], [[TMP85]] +; CHECK-NEXT: [[TMP89:%.*]] = add i32 [[TMP84]], [[TMP86]] +; CHECK-NEXT: [[TMP90:%.*]] = xor i32 [[TMP88]], [[TMP85]] +; CHECK-NEXT: [[TMP91:%.*]] = xor i32 [[TMP89]], [[TMP86]] +; CHECK-NEXT: [[TMP92:%.*]] = uitofp i32 [[TMP91]] to float +; CHECK-NEXT: [[TMP93:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP92]]) +; CHECK-NEXT: [[TMP94:%.*]] = 
fmul fast float [[TMP93]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP95:%.*]] = fptoui float [[TMP94]] to i32 +; CHECK-NEXT: [[TMP96:%.*]] = sub i32 0, [[TMP91]] +; CHECK-NEXT: [[TMP97:%.*]] = mul i32 [[TMP96]], [[TMP95]] +; CHECK-NEXT: [[TMP98:%.*]] = zext i32 [[TMP95]] to i64 +; CHECK-NEXT: [[TMP99:%.*]] = zext i32 [[TMP97]] to i64 +; CHECK-NEXT: [[TMP100:%.*]] = mul i64 [[TMP98]], [[TMP99]] +; CHECK-NEXT: [[TMP101:%.*]] = trunc i64 [[TMP100]] to i32 +; CHECK-NEXT: [[TMP102:%.*]] = lshr i64 [[TMP100]], 32 +; CHECK-NEXT: [[TMP103:%.*]] = trunc i64 [[TMP102]] to i32 +; CHECK-NEXT: [[TMP104:%.*]] = add i32 [[TMP95]], [[TMP103]] +; CHECK-NEXT: [[TMP105:%.*]] = zext i32 [[TMP90]] to i64 +; CHECK-NEXT: [[TMP106:%.*]] = zext i32 [[TMP104]] to i64 +; CHECK-NEXT: [[TMP107:%.*]] = mul i64 [[TMP105]], [[TMP106]] +; CHECK-NEXT: [[TMP108:%.*]] = trunc i64 [[TMP107]] to i32 +; CHECK-NEXT: [[TMP109:%.*]] = lshr i64 [[TMP107]], 32 +; CHECK-NEXT: [[TMP110:%.*]] = trunc i64 [[TMP109]] to i32 +; CHECK-NEXT: [[TMP111:%.*]] = mul i32 [[TMP110]], [[TMP91]] +; CHECK-NEXT: [[TMP112:%.*]] = sub i32 [[TMP90]], [[TMP111]] +; CHECK-NEXT: [[TMP113:%.*]] = icmp uge i32 [[TMP112]], [[TMP91]] +; CHECK-NEXT: [[TMP114:%.*]] = add i32 [[TMP110]], 1 +; CHECK-NEXT: [[TMP115:%.*]] = select i1 [[TMP113]], i32 [[TMP114]], i32 [[TMP110]] +; CHECK-NEXT: [[TMP116:%.*]] = sub i32 [[TMP112]], [[TMP91]] +; CHECK-NEXT: [[TMP117:%.*]] = select i1 [[TMP113]], i32 [[TMP116]], i32 [[TMP112]] +; CHECK-NEXT: [[TMP118:%.*]] = icmp uge i32 [[TMP117]], [[TMP91]] +; CHECK-NEXT: [[TMP119:%.*]] = add i32 [[TMP115]], 1 +; CHECK-NEXT: [[TMP120:%.*]] = select i1 [[TMP118]], i32 [[TMP119]], i32 [[TMP115]] +; CHECK-NEXT: [[TMP121:%.*]] = xor i32 [[TMP120]], [[TMP87]] +; CHECK-NEXT: [[TMP122:%.*]] = sub i32 [[TMP121]], [[TMP87]] +; CHECK-NEXT: [[TMP123:%.*]] = insertelement <4 x i32> [[TMP82]], i32 [[TMP122]], i64 2 +; CHECK-NEXT: [[TMP124:%.*]] = extractelement <4 x i32> [[X]], i64 3 +; CHECK-NEXT: [[TMP125:%.*]] = 
extractelement <4 x i32> [[Y]], i64 3 +; CHECK-NEXT: [[TMP126:%.*]] = ashr i32 [[TMP124]], 31 +; CHECK-NEXT: [[TMP127:%.*]] = ashr i32 [[TMP125]], 31 +; CHECK-NEXT: [[TMP128:%.*]] = xor i32 [[TMP126]], [[TMP127]] +; CHECK-NEXT: [[TMP129:%.*]] = add i32 [[TMP124]], [[TMP126]] +; CHECK-NEXT: [[TMP130:%.*]] = add i32 [[TMP125]], [[TMP127]] +; CHECK-NEXT: [[TMP131:%.*]] = xor i32 [[TMP129]], [[TMP126]] +; CHECK-NEXT: [[TMP132:%.*]] = xor i32 [[TMP130]], [[TMP127]] +; CHECK-NEXT: [[TMP133:%.*]] = uitofp i32 [[TMP132]] to float +; CHECK-NEXT: [[TMP134:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP133]]) +; CHECK-NEXT: [[TMP135:%.*]] = fmul fast float [[TMP134]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP136:%.*]] = fptoui float [[TMP135]] to i32 +; CHECK-NEXT: [[TMP137:%.*]] = sub i32 0, [[TMP132]] +; CHECK-NEXT: [[TMP138:%.*]] = mul i32 [[TMP137]], [[TMP136]] +; CHECK-NEXT: [[TMP139:%.*]] = zext i32 [[TMP136]] to i64 +; CHECK-NEXT: [[TMP140:%.*]] = zext i32 [[TMP138]] to i64 +; CHECK-NEXT: [[TMP141:%.*]] = mul i64 [[TMP139]], [[TMP140]] +; CHECK-NEXT: [[TMP142:%.*]] = trunc i64 [[TMP141]] to i32 +; CHECK-NEXT: [[TMP143:%.*]] = lshr i64 [[TMP141]], 32 +; CHECK-NEXT: [[TMP144:%.*]] = trunc i64 [[TMP143]] to i32 +; CHECK-NEXT: [[TMP145:%.*]] = add i32 [[TMP136]], [[TMP144]] +; CHECK-NEXT: [[TMP146:%.*]] = zext i32 [[TMP131]] to i64 +; CHECK-NEXT: [[TMP147:%.*]] = zext i32 [[TMP145]] to i64 +; CHECK-NEXT: [[TMP148:%.*]] = mul i64 [[TMP146]], [[TMP147]] +; CHECK-NEXT: [[TMP149:%.*]] = trunc i64 [[TMP148]] to i32 +; CHECK-NEXT: [[TMP150:%.*]] = lshr i64 [[TMP148]], 32 +; CHECK-NEXT: [[TMP151:%.*]] = trunc i64 [[TMP150]] to i32 +; CHECK-NEXT: [[TMP152:%.*]] = mul i32 [[TMP151]], [[TMP132]] +; CHECK-NEXT: [[TMP153:%.*]] = sub i32 [[TMP131]], [[TMP152]] +; CHECK-NEXT: [[TMP154:%.*]] = icmp uge i32 [[TMP153]], [[TMP132]] +; CHECK-NEXT: [[TMP155:%.*]] = add i32 [[TMP151]], 1 +; CHECK-NEXT: [[TMP156:%.*]] = select i1 [[TMP154]], i32 [[TMP155]], i32 [[TMP151]] +; CHECK-NEXT: 
[[TMP157:%.*]] = sub i32 [[TMP153]], [[TMP132]] +; CHECK-NEXT: [[TMP158:%.*]] = select i1 [[TMP154]], i32 [[TMP157]], i32 [[TMP153]] +; CHECK-NEXT: [[TMP159:%.*]] = icmp uge i32 [[TMP158]], [[TMP132]] +; CHECK-NEXT: [[TMP160:%.*]] = add i32 [[TMP156]], 1 +; CHECK-NEXT: [[TMP161:%.*]] = select i1 [[TMP159]], i32 [[TMP160]], i32 [[TMP156]] +; CHECK-NEXT: [[TMP162:%.*]] = xor i32 [[TMP161]], [[TMP128]] +; CHECK-NEXT: [[TMP163:%.*]] = sub i32 [[TMP162]], [[TMP128]] +; CHECK-NEXT: [[TMP164:%.*]] = insertelement <4 x i32> [[TMP123]], i32 [[TMP163]], i64 3 +; CHECK-NEXT: store <4 x i32> [[TMP164]], <4 x i32> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v4i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx8 s[12:19], s[0:1], 0xd -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9 -; GCN-NEXT: s_mov_b32 s11, 0xf000 -; GCN-NEXT: s_mov_b32 s10, -1 +; GCN-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0xd +; GCN-NEXT: s_mov_b32 s16, 0x4f7ffffe ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_ashr_i32 s2, s16, 31 -; GCN-NEXT: s_add_i32 s3, s16, s2 -; GCN-NEXT: s_xor_b32 s5, s3, s2 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s5 -; GCN-NEXT: s_mov_b32 s16, 0x4f800000 -; GCN-NEXT: s_ashr_i32 s6, s17, 31 -; GCN-NEXT: s_add_i32 s0, s17, s6 +; GCN-NEXT: s_ashr_i32 s14, s8, 31 +; GCN-NEXT: s_add_i32 s2, s8, s14 +; GCN-NEXT: s_xor_b32 s12, s2, s14 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s12 +; GCN-NEXT: s_ashr_i32 s8, s9, 31 +; GCN-NEXT: s_add_i32 s2, s9, s8 +; GCN-NEXT: s_xor_b32 s15, s2, s8 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_xor_b32 s17, s0, s6 -; GCN-NEXT: v_cvt_f32_u32_e32 v3, s17 -; GCN-NEXT: s_ashr_i32 s3, s12, 31 +; GCN-NEXT: v_cvt_f32_u32_e32 v1, s15 +; GCN-NEXT: s_sub_i32 s3, 0, s12 +; GCN-NEXT: s_ashr_i32 s9, s4, 31 ; GCN-NEXT: v_mul_f32_e32 v0, s16, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: s_add_i32 s4, s12, s3 -; GCN-NEXT: s_xor_b32 s4, s4, s3 -; GCN-NEXT: s_xor_b32 s7, s3, s2 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s5 -; GCN-NEXT: v_mul_hi_u32 
v2, v0, s5 -; GCN-NEXT: s_ashr_i32 s12, s13, 31 -; GCN-NEXT: s_add_i32 s13, s13, s12 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v3 -; GCN-NEXT: s_xor_b32 s13, s13, s12 -; GCN-NEXT: v_add_i32_e32 v3, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s4 -; GCN-NEXT: v_mul_f32_e32 v1, s16, v2 +; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 +; GCN-NEXT: s_add_i32 s2, s4, s9 +; GCN-NEXT: s_xor_b32 s2, s2, s9 +; GCN-NEXT: v_mul_lo_u32 v2, s3, v0 +; GCN-NEXT: v_mul_f32_e32 v1, s16, v1 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_mul_lo_u32 v2, v0, s5 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v0 -; GCN-NEXT: v_mul_hi_u32 v5, v1, s17 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s4, v2 -; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s5, v4 -; GCN-NEXT: v_mul_lo_u32 v4, v1, s17 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], s4, v2 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v5 -; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GCN-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v4, v4, v1 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v4, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v4, v1 -; GCN-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[4:5] -; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GCN-NEXT: s_ashr_i32 s5, s18, 31 -; GCN-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[0:1] -; GCN-NEXT: s_add_i32 s0, s18, s5 -; GCN-NEXT: s_xor_b32 s4, s12, s6 -; GCN-NEXT: s_xor_b32 s12, s0, s5 -; GCN-NEXT: v_cvt_f32_u32_e32 v4, s12 -; GCN-NEXT: v_mul_hi_u32 v1, v1, s13 -; GCN-NEXT: v_xor_b32_e32 v0, s7, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s7, v0 -; GCN-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GCN-NEXT: v_mul_lo_u32 v2, v1, s17 -; GCN-NEXT: s_ashr_i32 s6, s19, 31 -; GCN-NEXT: v_mul_f32_e32 
v4, s16, v4 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s13, v2 -; GCN-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s17, v3 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s13, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v1 -; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v1 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[2:3] -; GCN-NEXT: v_mul_lo_u32 v2, v4, s12 -; GCN-NEXT: v_mul_hi_u32 v3, v4, s12 -; GCN-NEXT: s_ashr_i32 s2, s14, 31 -; GCN-NEXT: s_add_i32 s3, s14, s2 -; GCN-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v3 +; GCN-NEXT: s_sub_i32 s3, 0, s15 +; GCN-NEXT: v_mul_hi_u32 v2, v0, v2 +; GCN-NEXT: s_ashr_i32 s4, s5, 31 +; GCN-NEXT: v_mul_lo_u32 v3, s3, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v2, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s2, v0 +; GCN-NEXT: v_mul_hi_u32 v2, v1, v3 +; GCN-NEXT: v_mul_lo_u32 v3, v0, s12 +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v0 +; GCN-NEXT: v_sub_i32_e32 v3, vcc, s2, v3 +; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s12, v3 +; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[2:3] +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s12, v3 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v4, s[2:3] +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s12, v3 +; GCN-NEXT: s_add_i32 s2, s5, s4 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GCN-NEXT: s_xor_b32 s2, s2, s4 +; GCN-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; GCN-NEXT: v_mul_hi_u32 v1, s2, v1 +; GCN-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x9 +; GCN-NEXT: s_xor_b32 s0, s9, s14 +; GCN-NEXT: v_xor_b32_e32 v0, s0, v0 +; GCN-NEXT: v_mul_lo_u32 v2, v1, s15 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s0, v0 +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v1 +; GCN-NEXT: s_ashr_i32 s3, s6, 31 +; GCN-NEXT: v_sub_i32_e32 v2, vcc, s2, v2 +; GCN-NEXT: s_ashr_i32 s2, s10, 31 +; GCN-NEXT: s_add_i32 s0, s10, s2 +; GCN-NEXT: s_xor_b32 s5, s0, s2 +; GCN-NEXT: v_cvt_f32_u32_e32 v3, s5 +; GCN-NEXT: v_cmp_le_u32_e64 
s[0:1], s15, v2 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s15, v2 +; GCN-NEXT: v_rcp_iflag_f32_e32 v3, v3 +; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] +; GCN-NEXT: s_sub_i32 s0, 0, s5 +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v1 +; GCN-NEXT: v_mul_f32_e32 v3, s16, v3 +; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s15, v2 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; GCN-NEXT: s_xor_b32 s1, s4, s8 +; GCN-NEXT: v_mul_lo_u32 v5, s0, v3 +; GCN-NEXT: s_add_i32 s0, s6, s3 +; GCN-NEXT: s_xor_b32 s0, s0, s3 +; GCN-NEXT: s_ashr_i32 s4, s11, 31 +; GCN-NEXT: v_mul_hi_u32 v2, v3, v5 +; GCN-NEXT: v_xor_b32_e32 v1, s1, v1 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s1, v1 +; GCN-NEXT: s_xor_b32 s2, s3, s2 +; GCN-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GCN-NEXT: v_mul_hi_u32 v2, s0, v2 +; GCN-NEXT: s_mov_b32 s15, 0xf000 +; GCN-NEXT: s_mov_b32 s14, -1 +; GCN-NEXT: v_mul_lo_u32 v3, v2, s5 +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v2 +; GCN-NEXT: v_sub_i32_e32 v3, vcc, s0, v3 +; GCN-NEXT: s_add_i32 s0, s11, s4 +; GCN-NEXT: s_xor_b32 s6, s0, s4 +; GCN-NEXT: v_cvt_f32_u32_e32 v4, s6 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s5, v3 ; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v5, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v4 -; GCN-NEXT: s_xor_b32 s3, s3, s2 -; GCN-NEXT: v_xor_b32_e32 v1, s4, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s4, v1 -; GCN-NEXT: v_add_i32_e32 v3, vcc, v2, v4 -; GCN-NEXT: v_subrev_i32_e32 v2, vcc, v2, v4 -; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v3, s[0:1] -; GCN-NEXT: s_add_i32 s0, s19, s6 -; GCN-NEXT: s_xor_b32 s14, s0, s6 -; GCN-NEXT: v_cvt_f32_u32_e32 v4, s14 -; GCN-NEXT: v_mul_hi_u32 v2, v2, s3 -; GCN-NEXT: s_xor_b32 s7, s2, s5 +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s5, v3 ; GCN-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GCN-NEXT: v_mul_lo_u32 v3, v2, s12 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] +; GCN-NEXT: s_sub_i32 s0, 0, s6 +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v2 ; GCN-NEXT: v_mul_f32_e32 v4, 
s16, v4 ; GCN-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GCN-NEXT: v_sub_i32_e32 v5, vcc, s3, v3 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s12, v5 -; GCN-NEXT: s_ashr_i32 s12, s15, 31 -; GCN-NEXT: v_mul_lo_u32 v6, v4, s14 -; GCN-NEXT: v_mul_hi_u32 v7, v4, s14 -; GCN-NEXT: s_add_i32 s13, s15, s12 -; GCN-NEXT: s_xor_b32 s13, s13, s12 -; GCN-NEXT: v_sub_i32_e32 v8, vcc, 0, v6 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v7 -; GCN-NEXT: v_cndmask_b32_e64 v6, v6, v8, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v6, v6, v4 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s3, v3 -; GCN-NEXT: v_add_i32_e32 v5, vcc, -1, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v2 -; GCN-NEXT: v_add_i32_e32 v7, vcc, v6, v4 -; GCN-NEXT: v_subrev_i32_e32 v4, vcc, v6, v4 -; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v7, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v4, v4, s13 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v3, vcc -; GCN-NEXT: v_cndmask_b32_e64 v2, v5, v2, s[2:3] -; GCN-NEXT: v_mul_lo_u32 v3, v4, s14 -; GCN-NEXT: v_xor_b32_e32 v2, s7, v2 -; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s7, v2 -; GCN-NEXT: s_xor_b32 s4, s12, s6 -; GCN-NEXT: v_sub_i32_e32 v5, vcc, s13, v3 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s14, v5 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s13, v3 -; GCN-NEXT: v_add_i32_e32 v5, vcc, -1, v4 -; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v4 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v3, v4, v3, vcc -; GCN-NEXT: v_cndmask_b32_e64 v3, v5, v3, s[2:3] -; GCN-NEXT: v_xor_b32_e32 v3, s4, v3 -; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s4, v3 -; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s5, v3 +; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GCN-NEXT: v_xor_b32_e32 v2, s2, v2 +; GCN-NEXT: v_mul_lo_u32 v6, s0, v4 +; GCN-NEXT: s_ashr_i32 s0, s7, 31 +; GCN-NEXT: s_add_i32 s1, s7, s0 +; GCN-NEXT: s_xor_b32 s1, s1, s0 +; GCN-NEXT: v_mul_hi_u32 v3, v4, v6 +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s2, v2 +; GCN-NEXT: s_xor_b32 s2, s0, s4 +; GCN-NEXT: 
v_add_i32_e32 v3, vcc, v3, v4 +; GCN-NEXT: v_mul_hi_u32 v3, s1, v3 +; GCN-NEXT: v_mul_lo_u32 v4, v3, s6 +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v3 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, s1, v4 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s6, v4 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v5, vcc, s6, v4 +; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v5, s[0:1] +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v3 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s6, v4 +; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GCN-NEXT: v_xor_b32_e32 v3, s2, v3 +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s2, v3 +; GCN-NEXT: s_waitcnt lgkmcnt(0) +; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[12:15], 0 ; GCN-NEXT: s_endpgm %r = sdiv <4 x i32> %x, %y store <4 x i32> %r, <4 x i32> addrspace(1)* %out @@ -1680,324 +1466,260 @@ define amdgpu_kernel void @srem_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %x ; CHECK-NEXT: [[TMP8:%.*]] = xor i32 [[TMP6]], [[TMP4]] ; CHECK-NEXT: [[TMP9:%.*]] = uitofp i32 [[TMP8]] to float ; CHECK-NEXT: [[TMP10:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP9]]) -; CHECK-NEXT: [[TMP11:%.*]] = fmul fast float [[TMP10]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP11:%.*]] = fmul fast float [[TMP10]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP12:%.*]] = fptoui float [[TMP11]] to i32 -; CHECK-NEXT: [[TMP13:%.*]] = zext i32 [[TMP12]] to i64 -; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP8]] to i64 -; CHECK-NEXT: [[TMP15:%.*]] = mul i64 [[TMP13]], [[TMP14]] -; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[TMP15]] to i32 -; CHECK-NEXT: [[TMP17:%.*]] = lshr i64 [[TMP15]], 32 +; CHECK-NEXT: [[TMP13:%.*]] = sub i32 0, [[TMP8]] +; CHECK-NEXT: [[TMP14:%.*]] = mul i32 [[TMP13]], [[TMP12]] +; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP12]] to i64 +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP14]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = mul i64 [[TMP15]], [[TMP16]] ; CHECK-NEXT: [[TMP18:%.*]] = trunc i64 [[TMP17]] to i32 -; CHECK-NEXT: [[TMP19:%.*]] = sub i32 0, [[TMP16]] -; CHECK-NEXT: 
[[TMP20:%.*]] = icmp eq i32 [[TMP18]], 0 -; CHECK-NEXT: [[TMP21:%.*]] = select i1 [[TMP20]], i32 [[TMP19]], i32 [[TMP16]] -; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[TMP21]] to i64 -; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP12]] to i64 +; CHECK-NEXT: [[TMP19:%.*]] = lshr i64 [[TMP17]], 32 +; CHECK-NEXT: [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32 +; CHECK-NEXT: [[TMP21:%.*]] = add i32 [[TMP12]], [[TMP20]] +; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[TMP7]] to i64 +; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP21]] to i64 ; CHECK-NEXT: [[TMP24:%.*]] = mul i64 [[TMP22]], [[TMP23]] ; CHECK-NEXT: [[TMP25:%.*]] = trunc i64 [[TMP24]] to i32 ; CHECK-NEXT: [[TMP26:%.*]] = lshr i64 [[TMP24]], 32 ; CHECK-NEXT: [[TMP27:%.*]] = trunc i64 [[TMP26]] to i32 -; CHECK-NEXT: [[TMP28:%.*]] = add i32 [[TMP12]], [[TMP27]] -; CHECK-NEXT: [[TMP29:%.*]] = sub i32 [[TMP12]], [[TMP27]] -; CHECK-NEXT: [[TMP30:%.*]] = select i1 [[TMP20]], i32 [[TMP28]], i32 [[TMP29]] -; CHECK-NEXT: [[TMP31:%.*]] = zext i32 [[TMP30]] to i64 -; CHECK-NEXT: [[TMP32:%.*]] = zext i32 [[TMP7]] to i64 -; CHECK-NEXT: [[TMP33:%.*]] = mul i64 [[TMP31]], [[TMP32]] -; CHECK-NEXT: [[TMP34:%.*]] = trunc i64 [[TMP33]] to i32 -; CHECK-NEXT: [[TMP35:%.*]] = lshr i64 [[TMP33]], 32 -; CHECK-NEXT: [[TMP36:%.*]] = trunc i64 [[TMP35]] to i32 -; CHECK-NEXT: [[TMP37:%.*]] = mul i32 [[TMP36]], [[TMP8]] -; CHECK-NEXT: [[TMP38:%.*]] = sub i32 [[TMP7]], [[TMP37]] -; CHECK-NEXT: [[TMP39:%.*]] = icmp uge i32 [[TMP38]], [[TMP8]] -; CHECK-NEXT: [[TMP40:%.*]] = icmp uge i32 [[TMP7]], [[TMP37]] -; CHECK-NEXT: [[TMP41:%.*]] = and i1 [[TMP39]], [[TMP40]] -; CHECK-NEXT: [[TMP42:%.*]] = sub i32 [[TMP38]], [[TMP8]] -; CHECK-NEXT: [[TMP43:%.*]] = add i32 [[TMP38]], [[TMP8]] -; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP41]], i32 [[TMP42]], i32 [[TMP38]] -; CHECK-NEXT: [[TMP45:%.*]] = select i1 [[TMP40]], i32 [[TMP44]], i32 [[TMP43]] -; CHECK-NEXT: [[TMP46:%.*]] = xor i32 [[TMP45]], [[TMP3]] -; CHECK-NEXT: [[TMP47:%.*]] = sub i32 [[TMP46]], [[TMP3]] -; 
CHECK-NEXT: [[TMP48:%.*]] = insertelement <4 x i32> undef, i32 [[TMP47]], i64 0 -; CHECK-NEXT: [[TMP49:%.*]] = extractelement <4 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP50:%.*]] = extractelement <4 x i32> [[Y]], i64 1 -; CHECK-NEXT: [[TMP51:%.*]] = ashr i32 [[TMP49]], 31 -; CHECK-NEXT: [[TMP52:%.*]] = ashr i32 [[TMP50]], 31 -; CHECK-NEXT: [[TMP53:%.*]] = add i32 [[TMP49]], [[TMP51]] -; CHECK-NEXT: [[TMP54:%.*]] = add i32 [[TMP50]], [[TMP52]] -; CHECK-NEXT: [[TMP55:%.*]] = xor i32 [[TMP53]], [[TMP51]] -; CHECK-NEXT: [[TMP56:%.*]] = xor i32 [[TMP54]], [[TMP52]] -; CHECK-NEXT: [[TMP57:%.*]] = uitofp i32 [[TMP56]] to float -; CHECK-NEXT: [[TMP58:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP57]]) -; CHECK-NEXT: [[TMP59:%.*]] = fmul fast float [[TMP58]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP60:%.*]] = fptoui float [[TMP59]] to i32 -; CHECK-NEXT: [[TMP61:%.*]] = zext i32 [[TMP60]] to i64 -; CHECK-NEXT: [[TMP62:%.*]] = zext i32 [[TMP56]] to i64 -; CHECK-NEXT: [[TMP63:%.*]] = mul i64 [[TMP61]], [[TMP62]] -; CHECK-NEXT: [[TMP64:%.*]] = trunc i64 [[TMP63]] to i32 -; CHECK-NEXT: [[TMP65:%.*]] = lshr i64 [[TMP63]], 32 -; CHECK-NEXT: [[TMP66:%.*]] = trunc i64 [[TMP65]] to i32 -; CHECK-NEXT: [[TMP67:%.*]] = sub i32 0, [[TMP64]] -; CHECK-NEXT: [[TMP68:%.*]] = icmp eq i32 [[TMP66]], 0 -; CHECK-NEXT: [[TMP69:%.*]] = select i1 [[TMP68]], i32 [[TMP67]], i32 [[TMP64]] -; CHECK-NEXT: [[TMP70:%.*]] = zext i32 [[TMP69]] to i64 -; CHECK-NEXT: [[TMP71:%.*]] = zext i32 [[TMP60]] to i64 -; CHECK-NEXT: [[TMP72:%.*]] = mul i64 [[TMP70]], [[TMP71]] -; CHECK-NEXT: [[TMP73:%.*]] = trunc i64 [[TMP72]] to i32 -; CHECK-NEXT: [[TMP74:%.*]] = lshr i64 [[TMP72]], 32 -; CHECK-NEXT: [[TMP75:%.*]] = trunc i64 [[TMP74]] to i32 -; CHECK-NEXT: [[TMP76:%.*]] = add i32 [[TMP60]], [[TMP75]] -; CHECK-NEXT: [[TMP77:%.*]] = sub i32 [[TMP60]], [[TMP75]] -; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP68]], i32 [[TMP76]], i32 [[TMP77]] -; CHECK-NEXT: [[TMP79:%.*]] = zext i32 [[TMP78]] to i64 -; CHECK-NEXT: 
[[TMP80:%.*]] = zext i32 [[TMP55]] to i64 -; CHECK-NEXT: [[TMP81:%.*]] = mul i64 [[TMP79]], [[TMP80]] -; CHECK-NEXT: [[TMP82:%.*]] = trunc i64 [[TMP81]] to i32 -; CHECK-NEXT: [[TMP83:%.*]] = lshr i64 [[TMP81]], 32 -; CHECK-NEXT: [[TMP84:%.*]] = trunc i64 [[TMP83]] to i32 -; CHECK-NEXT: [[TMP85:%.*]] = mul i32 [[TMP84]], [[TMP56]] -; CHECK-NEXT: [[TMP86:%.*]] = sub i32 [[TMP55]], [[TMP85]] -; CHECK-NEXT: [[TMP87:%.*]] = icmp uge i32 [[TMP86]], [[TMP56]] -; CHECK-NEXT: [[TMP88:%.*]] = icmp uge i32 [[TMP55]], [[TMP85]] -; CHECK-NEXT: [[TMP89:%.*]] = and i1 [[TMP87]], [[TMP88]] -; CHECK-NEXT: [[TMP90:%.*]] = sub i32 [[TMP86]], [[TMP56]] -; CHECK-NEXT: [[TMP91:%.*]] = add i32 [[TMP86]], [[TMP56]] -; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP89]], i32 [[TMP90]], i32 [[TMP86]] -; CHECK-NEXT: [[TMP93:%.*]] = select i1 [[TMP88]], i32 [[TMP92]], i32 [[TMP91]] -; CHECK-NEXT: [[TMP94:%.*]] = xor i32 [[TMP93]], [[TMP51]] -; CHECK-NEXT: [[TMP95:%.*]] = sub i32 [[TMP94]], [[TMP51]] -; CHECK-NEXT: [[TMP96:%.*]] = insertelement <4 x i32> [[TMP48]], i32 [[TMP95]], i64 1 -; CHECK-NEXT: [[TMP97:%.*]] = extractelement <4 x i32> [[X]], i64 2 -; CHECK-NEXT: [[TMP98:%.*]] = extractelement <4 x i32> [[Y]], i64 2 -; CHECK-NEXT: [[TMP99:%.*]] = ashr i32 [[TMP97]], 31 -; CHECK-NEXT: [[TMP100:%.*]] = ashr i32 [[TMP98]], 31 -; CHECK-NEXT: [[TMP101:%.*]] = add i32 [[TMP97]], [[TMP99]] -; CHECK-NEXT: [[TMP102:%.*]] = add i32 [[TMP98]], [[TMP100]] -; CHECK-NEXT: [[TMP103:%.*]] = xor i32 [[TMP101]], [[TMP99]] -; CHECK-NEXT: [[TMP104:%.*]] = xor i32 [[TMP102]], [[TMP100]] -; CHECK-NEXT: [[TMP105:%.*]] = uitofp i32 [[TMP104]] to float -; CHECK-NEXT: [[TMP106:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP105]]) -; CHECK-NEXT: [[TMP107:%.*]] = fmul fast float [[TMP106]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP108:%.*]] = fptoui float [[TMP107]] to i32 -; CHECK-NEXT: [[TMP109:%.*]] = zext i32 [[TMP108]] to i64 -; CHECK-NEXT: [[TMP110:%.*]] = zext i32 [[TMP104]] to i64 -; CHECK-NEXT: 
[[TMP111:%.*]] = mul i64 [[TMP109]], [[TMP110]] -; CHECK-NEXT: [[TMP112:%.*]] = trunc i64 [[TMP111]] to i32 -; CHECK-NEXT: [[TMP113:%.*]] = lshr i64 [[TMP111]], 32 -; CHECK-NEXT: [[TMP114:%.*]] = trunc i64 [[TMP113]] to i32 -; CHECK-NEXT: [[TMP115:%.*]] = sub i32 0, [[TMP112]] -; CHECK-NEXT: [[TMP116:%.*]] = icmp eq i32 [[TMP114]], 0 -; CHECK-NEXT: [[TMP117:%.*]] = select i1 [[TMP116]], i32 [[TMP115]], i32 [[TMP112]] -; CHECK-NEXT: [[TMP118:%.*]] = zext i32 [[TMP117]] to i64 -; CHECK-NEXT: [[TMP119:%.*]] = zext i32 [[TMP108]] to i64 -; CHECK-NEXT: [[TMP120:%.*]] = mul i64 [[TMP118]], [[TMP119]] -; CHECK-NEXT: [[TMP121:%.*]] = trunc i64 [[TMP120]] to i32 -; CHECK-NEXT: [[TMP122:%.*]] = lshr i64 [[TMP120]], 32 -; CHECK-NEXT: [[TMP123:%.*]] = trunc i64 [[TMP122]] to i32 -; CHECK-NEXT: [[TMP124:%.*]] = add i32 [[TMP108]], [[TMP123]] -; CHECK-NEXT: [[TMP125:%.*]] = sub i32 [[TMP108]], [[TMP123]] -; CHECK-NEXT: [[TMP126:%.*]] = select i1 [[TMP116]], i32 [[TMP124]], i32 [[TMP125]] -; CHECK-NEXT: [[TMP127:%.*]] = zext i32 [[TMP126]] to i64 -; CHECK-NEXT: [[TMP128:%.*]] = zext i32 [[TMP103]] to i64 -; CHECK-NEXT: [[TMP129:%.*]] = mul i64 [[TMP127]], [[TMP128]] -; CHECK-NEXT: [[TMP130:%.*]] = trunc i64 [[TMP129]] to i32 -; CHECK-NEXT: [[TMP131:%.*]] = lshr i64 [[TMP129]], 32 +; CHECK-NEXT: [[TMP28:%.*]] = mul i32 [[TMP27]], [[TMP8]] +; CHECK-NEXT: [[TMP29:%.*]] = sub i32 [[TMP7]], [[TMP28]] +; CHECK-NEXT: [[TMP30:%.*]] = icmp uge i32 [[TMP29]], [[TMP8]] +; CHECK-NEXT: [[TMP31:%.*]] = sub i32 [[TMP29]], [[TMP8]] +; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP30]], i32 [[TMP31]], i32 [[TMP29]] +; CHECK-NEXT: [[TMP33:%.*]] = icmp uge i32 [[TMP32]], [[TMP8]] +; CHECK-NEXT: [[TMP34:%.*]] = sub i32 [[TMP32]], [[TMP8]] +; CHECK-NEXT: [[TMP35:%.*]] = select i1 [[TMP33]], i32 [[TMP34]], i32 [[TMP32]] +; CHECK-NEXT: [[TMP36:%.*]] = xor i32 [[TMP35]], [[TMP3]] +; CHECK-NEXT: [[TMP37:%.*]] = sub i32 [[TMP36]], [[TMP3]] +; CHECK-NEXT: [[TMP38:%.*]] = insertelement <4 x i32> undef, i32 
[[TMP37]], i64 0 +; CHECK-NEXT: [[TMP39:%.*]] = extractelement <4 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP40:%.*]] = extractelement <4 x i32> [[Y]], i64 1 +; CHECK-NEXT: [[TMP41:%.*]] = ashr i32 [[TMP39]], 31 +; CHECK-NEXT: [[TMP42:%.*]] = ashr i32 [[TMP40]], 31 +; CHECK-NEXT: [[TMP43:%.*]] = add i32 [[TMP39]], [[TMP41]] +; CHECK-NEXT: [[TMP44:%.*]] = add i32 [[TMP40]], [[TMP42]] +; CHECK-NEXT: [[TMP45:%.*]] = xor i32 [[TMP43]], [[TMP41]] +; CHECK-NEXT: [[TMP46:%.*]] = xor i32 [[TMP44]], [[TMP42]] +; CHECK-NEXT: [[TMP47:%.*]] = uitofp i32 [[TMP46]] to float +; CHECK-NEXT: [[TMP48:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP47]]) +; CHECK-NEXT: [[TMP49:%.*]] = fmul fast float [[TMP48]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP50:%.*]] = fptoui float [[TMP49]] to i32 +; CHECK-NEXT: [[TMP51:%.*]] = sub i32 0, [[TMP46]] +; CHECK-NEXT: [[TMP52:%.*]] = mul i32 [[TMP51]], [[TMP50]] +; CHECK-NEXT: [[TMP53:%.*]] = zext i32 [[TMP50]] to i64 +; CHECK-NEXT: [[TMP54:%.*]] = zext i32 [[TMP52]] to i64 +; CHECK-NEXT: [[TMP55:%.*]] = mul i64 [[TMP53]], [[TMP54]] +; CHECK-NEXT: [[TMP56:%.*]] = trunc i64 [[TMP55]] to i32 +; CHECK-NEXT: [[TMP57:%.*]] = lshr i64 [[TMP55]], 32 +; CHECK-NEXT: [[TMP58:%.*]] = trunc i64 [[TMP57]] to i32 +; CHECK-NEXT: [[TMP59:%.*]] = add i32 [[TMP50]], [[TMP58]] +; CHECK-NEXT: [[TMP60:%.*]] = zext i32 [[TMP45]] to i64 +; CHECK-NEXT: [[TMP61:%.*]] = zext i32 [[TMP59]] to i64 +; CHECK-NEXT: [[TMP62:%.*]] = mul i64 [[TMP60]], [[TMP61]] +; CHECK-NEXT: [[TMP63:%.*]] = trunc i64 [[TMP62]] to i32 +; CHECK-NEXT: [[TMP64:%.*]] = lshr i64 [[TMP62]], 32 +; CHECK-NEXT: [[TMP65:%.*]] = trunc i64 [[TMP64]] to i32 +; CHECK-NEXT: [[TMP66:%.*]] = mul i32 [[TMP65]], [[TMP46]] +; CHECK-NEXT: [[TMP67:%.*]] = sub i32 [[TMP45]], [[TMP66]] +; CHECK-NEXT: [[TMP68:%.*]] = icmp uge i32 [[TMP67]], [[TMP46]] +; CHECK-NEXT: [[TMP69:%.*]] = sub i32 [[TMP67]], [[TMP46]] +; CHECK-NEXT: [[TMP70:%.*]] = select i1 [[TMP68]], i32 [[TMP69]], i32 [[TMP67]] +; CHECK-NEXT: 
[[TMP71:%.*]] = icmp uge i32 [[TMP70]], [[TMP46]] +; CHECK-NEXT: [[TMP72:%.*]] = sub i32 [[TMP70]], [[TMP46]] +; CHECK-NEXT: [[TMP73:%.*]] = select i1 [[TMP71]], i32 [[TMP72]], i32 [[TMP70]] +; CHECK-NEXT: [[TMP74:%.*]] = xor i32 [[TMP73]], [[TMP41]] +; CHECK-NEXT: [[TMP75:%.*]] = sub i32 [[TMP74]], [[TMP41]] +; CHECK-NEXT: [[TMP76:%.*]] = insertelement <4 x i32> [[TMP38]], i32 [[TMP75]], i64 1 +; CHECK-NEXT: [[TMP77:%.*]] = extractelement <4 x i32> [[X]], i64 2 +; CHECK-NEXT: [[TMP78:%.*]] = extractelement <4 x i32> [[Y]], i64 2 +; CHECK-NEXT: [[TMP79:%.*]] = ashr i32 [[TMP77]], 31 +; CHECK-NEXT: [[TMP80:%.*]] = ashr i32 [[TMP78]], 31 +; CHECK-NEXT: [[TMP81:%.*]] = add i32 [[TMP77]], [[TMP79]] +; CHECK-NEXT: [[TMP82:%.*]] = add i32 [[TMP78]], [[TMP80]] +; CHECK-NEXT: [[TMP83:%.*]] = xor i32 [[TMP81]], [[TMP79]] +; CHECK-NEXT: [[TMP84:%.*]] = xor i32 [[TMP82]], [[TMP80]] +; CHECK-NEXT: [[TMP85:%.*]] = uitofp i32 [[TMP84]] to float +; CHECK-NEXT: [[TMP86:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP85]]) +; CHECK-NEXT: [[TMP87:%.*]] = fmul fast float [[TMP86]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP88:%.*]] = fptoui float [[TMP87]] to i32 +; CHECK-NEXT: [[TMP89:%.*]] = sub i32 0, [[TMP84]] +; CHECK-NEXT: [[TMP90:%.*]] = mul i32 [[TMP89]], [[TMP88]] +; CHECK-NEXT: [[TMP91:%.*]] = zext i32 [[TMP88]] to i64 +; CHECK-NEXT: [[TMP92:%.*]] = zext i32 [[TMP90]] to i64 +; CHECK-NEXT: [[TMP93:%.*]] = mul i64 [[TMP91]], [[TMP92]] +; CHECK-NEXT: [[TMP94:%.*]] = trunc i64 [[TMP93]] to i32 +; CHECK-NEXT: [[TMP95:%.*]] = lshr i64 [[TMP93]], 32 +; CHECK-NEXT: [[TMP96:%.*]] = trunc i64 [[TMP95]] to i32 +; CHECK-NEXT: [[TMP97:%.*]] = add i32 [[TMP88]], [[TMP96]] +; CHECK-NEXT: [[TMP98:%.*]] = zext i32 [[TMP83]] to i64 +; CHECK-NEXT: [[TMP99:%.*]] = zext i32 [[TMP97]] to i64 +; CHECK-NEXT: [[TMP100:%.*]] = mul i64 [[TMP98]], [[TMP99]] +; CHECK-NEXT: [[TMP101:%.*]] = trunc i64 [[TMP100]] to i32 +; CHECK-NEXT: [[TMP102:%.*]] = lshr i64 [[TMP100]], 32 +; CHECK-NEXT: 
[[TMP103:%.*]] = trunc i64 [[TMP102]] to i32 +; CHECK-NEXT: [[TMP104:%.*]] = mul i32 [[TMP103]], [[TMP84]] +; CHECK-NEXT: [[TMP105:%.*]] = sub i32 [[TMP83]], [[TMP104]] +; CHECK-NEXT: [[TMP106:%.*]] = icmp uge i32 [[TMP105]], [[TMP84]] +; CHECK-NEXT: [[TMP107:%.*]] = sub i32 [[TMP105]], [[TMP84]] +; CHECK-NEXT: [[TMP108:%.*]] = select i1 [[TMP106]], i32 [[TMP107]], i32 [[TMP105]] +; CHECK-NEXT: [[TMP109:%.*]] = icmp uge i32 [[TMP108]], [[TMP84]] +; CHECK-NEXT: [[TMP110:%.*]] = sub i32 [[TMP108]], [[TMP84]] +; CHECK-NEXT: [[TMP111:%.*]] = select i1 [[TMP109]], i32 [[TMP110]], i32 [[TMP108]] +; CHECK-NEXT: [[TMP112:%.*]] = xor i32 [[TMP111]], [[TMP79]] +; CHECK-NEXT: [[TMP113:%.*]] = sub i32 [[TMP112]], [[TMP79]] +; CHECK-NEXT: [[TMP114:%.*]] = insertelement <4 x i32> [[TMP76]], i32 [[TMP113]], i64 2 +; CHECK-NEXT: [[TMP115:%.*]] = extractelement <4 x i32> [[X]], i64 3 +; CHECK-NEXT: [[TMP116:%.*]] = extractelement <4 x i32> [[Y]], i64 3 +; CHECK-NEXT: [[TMP117:%.*]] = ashr i32 [[TMP115]], 31 +; CHECK-NEXT: [[TMP118:%.*]] = ashr i32 [[TMP116]], 31 +; CHECK-NEXT: [[TMP119:%.*]] = add i32 [[TMP115]], [[TMP117]] +; CHECK-NEXT: [[TMP120:%.*]] = add i32 [[TMP116]], [[TMP118]] +; CHECK-NEXT: [[TMP121:%.*]] = xor i32 [[TMP119]], [[TMP117]] +; CHECK-NEXT: [[TMP122:%.*]] = xor i32 [[TMP120]], [[TMP118]] +; CHECK-NEXT: [[TMP123:%.*]] = uitofp i32 [[TMP122]] to float +; CHECK-NEXT: [[TMP124:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP123]]) +; CHECK-NEXT: [[TMP125:%.*]] = fmul fast float [[TMP124]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP126:%.*]] = fptoui float [[TMP125]] to i32 +; CHECK-NEXT: [[TMP127:%.*]] = sub i32 0, [[TMP122]] +; CHECK-NEXT: [[TMP128:%.*]] = mul i32 [[TMP127]], [[TMP126]] +; CHECK-NEXT: [[TMP129:%.*]] = zext i32 [[TMP126]] to i64 +; CHECK-NEXT: [[TMP130:%.*]] = zext i32 [[TMP128]] to i64 +; CHECK-NEXT: [[TMP131:%.*]] = mul i64 [[TMP129]], [[TMP130]] ; CHECK-NEXT: [[TMP132:%.*]] = trunc i64 [[TMP131]] to i32 -; CHECK-NEXT: [[TMP133:%.*]] = mul 
i32 [[TMP132]], [[TMP104]] -; CHECK-NEXT: [[TMP134:%.*]] = sub i32 [[TMP103]], [[TMP133]] -; CHECK-NEXT: [[TMP135:%.*]] = icmp uge i32 [[TMP134]], [[TMP104]] -; CHECK-NEXT: [[TMP136:%.*]] = icmp uge i32 [[TMP103]], [[TMP133]] -; CHECK-NEXT: [[TMP137:%.*]] = and i1 [[TMP135]], [[TMP136]] -; CHECK-NEXT: [[TMP138:%.*]] = sub i32 [[TMP134]], [[TMP104]] -; CHECK-NEXT: [[TMP139:%.*]] = add i32 [[TMP134]], [[TMP104]] -; CHECK-NEXT: [[TMP140:%.*]] = select i1 [[TMP137]], i32 [[TMP138]], i32 [[TMP134]] -; CHECK-NEXT: [[TMP141:%.*]] = select i1 [[TMP136]], i32 [[TMP140]], i32 [[TMP139]] -; CHECK-NEXT: [[TMP142:%.*]] = xor i32 [[TMP141]], [[TMP99]] -; CHECK-NEXT: [[TMP143:%.*]] = sub i32 [[TMP142]], [[TMP99]] -; CHECK-NEXT: [[TMP144:%.*]] = insertelement <4 x i32> [[TMP96]], i32 [[TMP143]], i64 2 -; CHECK-NEXT: [[TMP145:%.*]] = extractelement <4 x i32> [[X]], i64 3 -; CHECK-NEXT: [[TMP146:%.*]] = extractelement <4 x i32> [[Y]], i64 3 -; CHECK-NEXT: [[TMP147:%.*]] = ashr i32 [[TMP145]], 31 -; CHECK-NEXT: [[TMP148:%.*]] = ashr i32 [[TMP146]], 31 -; CHECK-NEXT: [[TMP149:%.*]] = add i32 [[TMP145]], [[TMP147]] -; CHECK-NEXT: [[TMP150:%.*]] = add i32 [[TMP146]], [[TMP148]] -; CHECK-NEXT: [[TMP151:%.*]] = xor i32 [[TMP149]], [[TMP147]] -; CHECK-NEXT: [[TMP152:%.*]] = xor i32 [[TMP150]], [[TMP148]] -; CHECK-NEXT: [[TMP153:%.*]] = uitofp i32 [[TMP152]] to float -; CHECK-NEXT: [[TMP154:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP153]]) -; CHECK-NEXT: [[TMP155:%.*]] = fmul fast float [[TMP154]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP156:%.*]] = fptoui float [[TMP155]] to i32 -; CHECK-NEXT: [[TMP157:%.*]] = zext i32 [[TMP156]] to i64 -; CHECK-NEXT: [[TMP158:%.*]] = zext i32 [[TMP152]] to i64 -; CHECK-NEXT: [[TMP159:%.*]] = mul i64 [[TMP157]], [[TMP158]] -; CHECK-NEXT: [[TMP160:%.*]] = trunc i64 [[TMP159]] to i32 -; CHECK-NEXT: [[TMP161:%.*]] = lshr i64 [[TMP159]], 32 -; CHECK-NEXT: [[TMP162:%.*]] = trunc i64 [[TMP161]] to i32 -; CHECK-NEXT: [[TMP163:%.*]] = sub i32 0, 
[[TMP160]] -; CHECK-NEXT: [[TMP164:%.*]] = icmp eq i32 [[TMP162]], 0 -; CHECK-NEXT: [[TMP165:%.*]] = select i1 [[TMP164]], i32 [[TMP163]], i32 [[TMP160]] -; CHECK-NEXT: [[TMP166:%.*]] = zext i32 [[TMP165]] to i64 -; CHECK-NEXT: [[TMP167:%.*]] = zext i32 [[TMP156]] to i64 -; CHECK-NEXT: [[TMP168:%.*]] = mul i64 [[TMP166]], [[TMP167]] -; CHECK-NEXT: [[TMP169:%.*]] = trunc i64 [[TMP168]] to i32 -; CHECK-NEXT: [[TMP170:%.*]] = lshr i64 [[TMP168]], 32 -; CHECK-NEXT: [[TMP171:%.*]] = trunc i64 [[TMP170]] to i32 -; CHECK-NEXT: [[TMP172:%.*]] = add i32 [[TMP156]], [[TMP171]] -; CHECK-NEXT: [[TMP173:%.*]] = sub i32 [[TMP156]], [[TMP171]] -; CHECK-NEXT: [[TMP174:%.*]] = select i1 [[TMP164]], i32 [[TMP172]], i32 [[TMP173]] -; CHECK-NEXT: [[TMP175:%.*]] = zext i32 [[TMP174]] to i64 -; CHECK-NEXT: [[TMP176:%.*]] = zext i32 [[TMP151]] to i64 -; CHECK-NEXT: [[TMP177:%.*]] = mul i64 [[TMP175]], [[TMP176]] -; CHECK-NEXT: [[TMP178:%.*]] = trunc i64 [[TMP177]] to i32 -; CHECK-NEXT: [[TMP179:%.*]] = lshr i64 [[TMP177]], 32 -; CHECK-NEXT: [[TMP180:%.*]] = trunc i64 [[TMP179]] to i32 -; CHECK-NEXT: [[TMP181:%.*]] = mul i32 [[TMP180]], [[TMP152]] -; CHECK-NEXT: [[TMP182:%.*]] = sub i32 [[TMP151]], [[TMP181]] -; CHECK-NEXT: [[TMP183:%.*]] = icmp uge i32 [[TMP182]], [[TMP152]] -; CHECK-NEXT: [[TMP184:%.*]] = icmp uge i32 [[TMP151]], [[TMP181]] -; CHECK-NEXT: [[TMP185:%.*]] = and i1 [[TMP183]], [[TMP184]] -; CHECK-NEXT: [[TMP186:%.*]] = sub i32 [[TMP182]], [[TMP152]] -; CHECK-NEXT: [[TMP187:%.*]] = add i32 [[TMP182]], [[TMP152]] -; CHECK-NEXT: [[TMP188:%.*]] = select i1 [[TMP185]], i32 [[TMP186]], i32 [[TMP182]] -; CHECK-NEXT: [[TMP189:%.*]] = select i1 [[TMP184]], i32 [[TMP188]], i32 [[TMP187]] -; CHECK-NEXT: [[TMP190:%.*]] = xor i32 [[TMP189]], [[TMP147]] -; CHECK-NEXT: [[TMP191:%.*]] = sub i32 [[TMP190]], [[TMP147]] -; CHECK-NEXT: [[TMP192:%.*]] = insertelement <4 x i32> [[TMP144]], i32 [[TMP191]], i64 3 -; CHECK-NEXT: store <4 x i32> [[TMP192]], <4 x i32> addrspace(1)* [[OUT:%.*]] +; 
CHECK-NEXT: [[TMP133:%.*]] = lshr i64 [[TMP131]], 32 +; CHECK-NEXT: [[TMP134:%.*]] = trunc i64 [[TMP133]] to i32 +; CHECK-NEXT: [[TMP135:%.*]] = add i32 [[TMP126]], [[TMP134]] +; CHECK-NEXT: [[TMP136:%.*]] = zext i32 [[TMP121]] to i64 +; CHECK-NEXT: [[TMP137:%.*]] = zext i32 [[TMP135]] to i64 +; CHECK-NEXT: [[TMP138:%.*]] = mul i64 [[TMP136]], [[TMP137]] +; CHECK-NEXT: [[TMP139:%.*]] = trunc i64 [[TMP138]] to i32 +; CHECK-NEXT: [[TMP140:%.*]] = lshr i64 [[TMP138]], 32 +; CHECK-NEXT: [[TMP141:%.*]] = trunc i64 [[TMP140]] to i32 +; CHECK-NEXT: [[TMP142:%.*]] = mul i32 [[TMP141]], [[TMP122]] +; CHECK-NEXT: [[TMP143:%.*]] = sub i32 [[TMP121]], [[TMP142]] +; CHECK-NEXT: [[TMP144:%.*]] = icmp uge i32 [[TMP143]], [[TMP122]] +; CHECK-NEXT: [[TMP145:%.*]] = sub i32 [[TMP143]], [[TMP122]] +; CHECK-NEXT: [[TMP146:%.*]] = select i1 [[TMP144]], i32 [[TMP145]], i32 [[TMP143]] +; CHECK-NEXT: [[TMP147:%.*]] = icmp uge i32 [[TMP146]], [[TMP122]] +; CHECK-NEXT: [[TMP148:%.*]] = sub i32 [[TMP146]], [[TMP122]] +; CHECK-NEXT: [[TMP149:%.*]] = select i1 [[TMP147]], i32 [[TMP148]], i32 [[TMP146]] +; CHECK-NEXT: [[TMP150:%.*]] = xor i32 [[TMP149]], [[TMP117]] +; CHECK-NEXT: [[TMP151:%.*]] = sub i32 [[TMP150]], [[TMP117]] +; CHECK-NEXT: [[TMP152:%.*]] = insertelement <4 x i32> [[TMP114]], i32 [[TMP151]], i64 3 +; CHECK-NEXT: store <4 x i32> [[TMP152]], <4 x i32> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v4i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx8 s[12:19], s[0:1], 0xd -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9 -; GCN-NEXT: s_mov_b32 s11, 0xf000 -; GCN-NEXT: s_mov_b32 s10, -1 +; GCN-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0xd +; GCN-NEXT: s_mov_b32 s14, 0x4f7ffffe +; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_ashr_i32 s2, s16, 31 -; GCN-NEXT: s_add_i32 s3, s16, s2 -; GCN-NEXT: s_xor_b32 s5, s3, s2 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s5 -; GCN-NEXT: s_mov_b32 s16, 0x4f800000 -; GCN-NEXT: 
s_ashr_i32 s6, s12, 31 -; GCN-NEXT: s_ashr_i32 s2, s17, 31 +; GCN-NEXT: s_ashr_i32 s2, s8, 31 +; GCN-NEXT: s_add_i32 s3, s8, s2 +; GCN-NEXT: s_xor_b32 s2, s3, s2 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s2 +; GCN-NEXT: s_sub_i32 s13, 0, s2 +; GCN-NEXT: s_ashr_i32 s12, s9, 31 +; GCN-NEXT: s_add_i32 s9, s9, s12 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_add_i32 s0, s12, s6 -; GCN-NEXT: s_add_i32 s3, s17, s2 -; GCN-NEXT: s_xor_b32 s4, s0, s6 -; GCN-NEXT: v_mul_f32_e32 v0, s16, v0 +; GCN-NEXT: s_xor_b32 s9, s9, s12 +; GCN-NEXT: v_cvt_f32_u32_e32 v1, s9 +; GCN-NEXT: s_ashr_i32 s3, s4, 31 +; GCN-NEXT: v_mul_f32_e32 v0, s14, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: s_xor_b32 s17, s3, s2 -; GCN-NEXT: s_ashr_i32 s7, s13, 31 -; GCN-NEXT: s_add_i32 s12, s13, s7 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s5 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s5 -; GCN-NEXT: s_xor_b32 s12, s12, s7 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_cvt_f32_u32_e32 v2, s17 -; GCN-NEXT: v_add_i32_e32 v3, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v2 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s4 -; GCN-NEXT: v_mul_f32_e32 v1, s16, v1 +; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 +; GCN-NEXT: s_add_i32 s4, s4, s3 +; GCN-NEXT: s_xor_b32 s4, s4, s3 +; GCN-NEXT: v_mul_lo_u32 v2, s13, v0 +; GCN-NEXT: v_mul_f32_e32 v1, s14, v1 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_mul_lo_u32 v0, v0, s5 -; GCN-NEXT: v_mul_lo_u32 v4, v1, s17 -; GCN-NEXT: v_mul_hi_u32 v5, v1, s17 -; GCN-NEXT: v_sub_i32_e32 v2, vcc, s4, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s4, v0 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s5, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, s5, v2 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s5, v2 -; GCN-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v5 
-; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v4, v4, v1 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v4, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v4, v1 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: s_ashr_i32 s0, s18, 31 -; GCN-NEXT: s_add_i32 s1, s18, s0 -; GCN-NEXT: s_xor_b32 s13, s1, s0 -; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc -; GCN-NEXT: v_cvt_f32_u32_e32 v2, s13 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s12 -; GCN-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[2:3] +; GCN-NEXT: s_sub_i32 s13, 0, s9 +; GCN-NEXT: v_mul_hi_u32 v2, v0, v2 +; GCN-NEXT: s_ashr_i32 s12, s10, 31 +; GCN-NEXT: s_ashr_i32 s8, s5, 31 +; GCN-NEXT: s_add_i32 s5, s5, s8 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v2, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s4, v0 +; GCN-NEXT: v_mul_lo_u32 v2, s13, v1 +; GCN-NEXT: s_xor_b32 s5, s5, s8 +; GCN-NEXT: v_mul_lo_u32 v0, v0, s2 +; GCN-NEXT: v_mul_hi_u32 v2, v1, v2 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, s4, v0 +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s2, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s2, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; GCN-NEXT: s_add_i32 s2, s10, s12 +; GCN-NEXT: s_xor_b32 s2, s2, s12 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; GCN-NEXT: v_cvt_f32_u32_e32 v2, s2 +; GCN-NEXT: v_mul_hi_u32 v1, s5, v1 +; GCN-NEXT: v_xor_b32_e32 v0, s3, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s3, v0 ; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GCN-NEXT: v_xor_b32_e32 v0, s6, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v1, s17 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s6, v0 -; GCN-NEXT: v_mul_f32_e32 v2, s16, v2 +; GCN-NEXT: v_mul_lo_u32 v1, v1, s9 +; GCN-NEXT: s_sub_i32 s3, 0, s2 +; GCN-NEXT: s_ashr_i32 s4, s6, 31 +; GCN-NEXT: v_mul_f32_e32 v2, s14, v2 ; GCN-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s12, v1 -; GCN-NEXT: v_cmp_ge_u32_e64 
s[2:3], s12, v1 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s17, v3 -; GCN-NEXT: v_mul_lo_u32 v5, v2, s13 -; GCN-NEXT: v_mul_hi_u32 v6, v2, s13 -; GCN-NEXT: v_add_i32_e32 v4, vcc, s17, v3 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s17, v3 -; GCN-NEXT: v_sub_i32_e32 v7, vcc, 0, v5 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v6 -; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v7, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v5, v5, v2 -; GCN-NEXT: s_ashr_i32 s6, s14, 31 -; GCN-NEXT: s_add_i32 s12, s14, s6 -; GCN-NEXT: s_xor_b32 s12, s12, s6 -; GCN-NEXT: v_add_i32_e32 v6, vcc, v5, v2 -; GCN-NEXT: v_subrev_i32_e32 v2, vcc, v5, v2 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: s_ashr_i32 s0, s19, 31 -; GCN-NEXT: s_add_i32 s1, s19, s0 -; GCN-NEXT: s_xor_b32 s14, s1, s0 -; GCN-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc -; GCN-NEXT: v_cvt_f32_u32_e32 v3, s14 -; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v6, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v2, v2, s12 -; GCN-NEXT: v_cndmask_b32_e64 v1, v4, v1, s[2:3] +; GCN-NEXT: v_sub_i32_e32 v1, vcc, s5, v1 +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s9, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GCN-NEXT: v_mul_lo_u32 v4, s3, v2 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s9, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; GCN-NEXT: v_mul_hi_u32 v3, v2, v4 +; GCN-NEXT: s_ashr_i32 s5, s11, 31 +; GCN-NEXT: s_add_i32 s3, s6, s4 +; GCN-NEXT: s_add_i32 s6, s11, s5 +; GCN-NEXT: s_xor_b32 s5, s6, s5 +; GCN-NEXT: v_add_i32_e32 v2, vcc, v3, v2 +; GCN-NEXT: v_cvt_f32_u32_e32 v3, s5 +; GCN-NEXT: s_xor_b32 s3, s3, s4 +; GCN-NEXT: v_mul_hi_u32 v2, s3, v2 +; GCN-NEXT: v_xor_b32_e32 v1, s8, v1 ; GCN-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; GCN-NEXT: v_xor_b32_e32 v1, s7, v1 -; GCN-NEXT: v_mul_lo_u32 v2, v2, s13 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s7, v1 -; GCN-NEXT: v_mul_f32_e32 v3, s16, v3 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s8, v1 +; GCN-NEXT: v_mul_lo_u32 v2, v2, s2 +; GCN-NEXT: s_ashr_i32 s6, 
s7, 31 +; GCN-NEXT: v_mul_f32_e32 v3, s14, v3 ; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GCN-NEXT: s_ashr_i32 s7, s15, 31 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s12, v2 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s12, v2 -; GCN-NEXT: v_mul_lo_u32 v6, v3, s14 -; GCN-NEXT: v_mul_hi_u32 v7, v3, s14 -; GCN-NEXT: s_add_i32 s12, s15, s7 -; GCN-NEXT: s_xor_b32 s12, s12, s7 -; GCN-NEXT: v_sub_i32_e32 v8, vcc, 0, v6 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v7 -; GCN-NEXT: v_cndmask_b32_e64 v6, v6, v8, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v6, v6, v3 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v4 -; GCN-NEXT: v_add_i32_e32 v5, vcc, s13, v4 -; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s13, v4 -; GCN-NEXT: v_add_i32_e32 v7, vcc, v6, v3 -; GCN-NEXT: v_subrev_i32_e32 v3, vcc, v6, v3 -; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v7, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v3, v3, s12 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc -; GCN-NEXT: v_cndmask_b32_e64 v2, v5, v2, s[2:3] -; GCN-NEXT: v_mul_lo_u32 v3, v3, s14 -; GCN-NEXT: v_xor_b32_e32 v2, s6, v2 -; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s6, v2 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s12, v3 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s12, v3 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s14, v4 -; GCN-NEXT: v_add_i32_e32 v5, vcc, s14, v4 -; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s14, v4 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v3, v4, v3, vcc -; GCN-NEXT: v_cndmask_b32_e64 v3, v5, v3, s[2:3] -; GCN-NEXT: v_xor_b32_e32 v3, s7, v3 -; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s7, v3 -; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0 +; GCN-NEXT: v_sub_i32_e32 v2, vcc, s3, v2 +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s2, v2 +; GCN-NEXT: s_sub_i32 s3, 0, s5 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v2 +; GCN-NEXT: v_mul_lo_u32 v5, s3, v3 +; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s2, v2 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v2 +; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc 
+; GCN-NEXT: v_mul_hi_u32 v4, v3, v5 +; GCN-NEXT: s_add_i32 s2, s7, s6 +; GCN-NEXT: s_xor_b32 s7, s2, s6 +; GCN-NEXT: v_xor_b32_e32 v2, s4, v2 +; GCN-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; GCN-NEXT: v_mul_hi_u32 v3, s7, v3 +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s4, v2 +; GCN-NEXT: s_mov_b32 s3, 0xf000 +; GCN-NEXT: s_mov_b32 s2, -1 +; GCN-NEXT: v_mul_lo_u32 v3, v3, s5 +; GCN-NEXT: v_sub_i32_e32 v3, vcc, s7, v3 +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s5, v3 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s5, v3 +; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s5, v3 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s5, v3 +; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GCN-NEXT: v_xor_b32_e32 v3, s6, v3 +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s6, v3 +; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 ; GCN-NEXT: s_endpgm %r = srem <4 x i32> %x, %y store <4 x i32> %r, <4 x i32> addrspace(1)* %out @@ -2086,7 +1808,7 @@ define amdgpu_kernel void @udiv_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> %x ; CHECK-NEXT: [[TMP78:%.*]] = and i32 [[TMP77]], 65535 ; CHECK-NEXT: [[TMP79:%.*]] = trunc i32 [[TMP78]] to i16 ; CHECK-NEXT: [[TMP80:%.*]] = insertelement <4 x i16> [[TMP60]], i16 [[TMP79]], i64 3 -; CHECK-NEXT: store <4 x i16> [[TMP80]], <4 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <4 x i16> [[TMP80]], <4 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v4i16: @@ -2244,7 +1966,7 @@ define amdgpu_kernel void @urem_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> %x ; CHECK-NEXT: [[TMP86:%.*]] = and i32 [[TMP85]], 65535 ; CHECK-NEXT: [[TMP87:%.*]] = trunc i32 [[TMP86]] to i16 ; CHECK-NEXT: [[TMP88:%.*]] = insertelement <4 x i16> [[TMP66]], i16 [[TMP87]], i64 3 -; CHECK-NEXT: store <4 x i16> [[TMP88]], <4 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <4 x i16> [[TMP88]], <4 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v4i16: @@ -2418,7 +2140,7 @@ define amdgpu_kernel 
void @sdiv_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> %x ; CHECK-NEXT: [[TMP94:%.*]] = ashr i32 [[TMP93]], 16 ; CHECK-NEXT: [[TMP95:%.*]] = trunc i32 [[TMP94]] to i16 ; CHECK-NEXT: [[TMP96:%.*]] = insertelement <4 x i16> [[TMP72]], i16 [[TMP95]], i64 3 -; CHECK-NEXT: store <4 x i16> [[TMP96]], <4 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <4 x i16> [[TMP96]], <4 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v4i16: @@ -2612,7 +2334,7 @@ define amdgpu_kernel void @srem_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> %x ; CHECK-NEXT: [[TMP102:%.*]] = ashr i32 [[TMP101]], 16 ; CHECK-NEXT: [[TMP103:%.*]] = trunc i32 [[TMP102]] to i16 ; CHECK-NEXT: [[TMP104:%.*]] = insertelement <4 x i16> [[TMP78]], i16 [[TMP103]], i64 3 -; CHECK-NEXT: store <4 x i16> [[TMP104]], <4 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <4 x i16> [[TMP104]], <4 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v4i16: @@ -2727,7 +2449,7 @@ define amdgpu_kernel void @udiv_i3(i3 addrspace(1)* %out, i3 %x, i3 %y) { ; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP10]], [[TMP14]] ; CHECK-NEXT: [[TMP16:%.*]] = and i32 [[TMP15]], 7 ; CHECK-NEXT: [[TMP17:%.*]] = trunc i32 [[TMP16]] to i3 -; CHECK-NEXT: store i3 [[TMP17]], i3 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i3 [[TMP17]], i3 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i3: @@ -2777,7 +2499,7 @@ define amdgpu_kernel void @urem_i3(i3 addrspace(1)* %out, i3 %x, i3 %y) { ; CHECK-NEXT: [[TMP17:%.*]] = sub i32 [[TMP1]], [[TMP16]] ; CHECK-NEXT: [[TMP18:%.*]] = and i32 [[TMP17]], 7 ; CHECK-NEXT: [[TMP19:%.*]] = trunc i32 [[TMP18]] to i3 -; CHECK-NEXT: store i3 [[TMP19]], i3 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i3 [[TMP19]], i3 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i3: @@ -2832,7 +2554,7 @@ define amdgpu_kernel void @sdiv_i3(i3 addrspace(1)* %out, i3 %x, i3 %y) { ; CHECK-NEXT: 
[[TMP19:%.*]] = shl i32 [[TMP18]], 29 ; CHECK-NEXT: [[TMP20:%.*]] = ashr i32 [[TMP19]], 29 ; CHECK-NEXT: [[TMP21:%.*]] = trunc i32 [[TMP20]] to i3 -; CHECK-NEXT: store i3 [[TMP21]], i3 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i3 [[TMP21]], i3 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i3: @@ -2891,7 +2613,7 @@ define amdgpu_kernel void @srem_i3(i3 addrspace(1)* %out, i3 %x, i3 %y) { ; CHECK-NEXT: [[TMP21:%.*]] = shl i32 [[TMP20]], 29 ; CHECK-NEXT: [[TMP22:%.*]] = ashr i32 [[TMP21]], 29 ; CHECK-NEXT: [[TMP23:%.*]] = trunc i32 [[TMP22]] to i3 -; CHECK-NEXT: store i3 [[TMP23]], i3 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i3 [[TMP23]], i3 addrspace(1)* [[OUT:%.*]], align 1 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i3: @@ -2990,7 +2712,7 @@ define amdgpu_kernel void @udiv_v3i16(<3 x i16> addrspace(1)* %out, <3 x i16> %x ; CHECK-NEXT: [[TMP58:%.*]] = and i32 [[TMP57]], 65535 ; CHECK-NEXT: [[TMP59:%.*]] = trunc i32 [[TMP58]] to i16 ; CHECK-NEXT: [[TMP60:%.*]] = insertelement <3 x i16> [[TMP40]], i16 [[TMP59]], i64 2 -; CHECK-NEXT: store <3 x i16> [[TMP60]], <3 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i16> [[TMP60]], <3 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v3i16: @@ -3114,7 +2836,7 @@ define amdgpu_kernel void @urem_v3i16(<3 x i16> addrspace(1)* %out, <3 x i16> %x ; CHECK-NEXT: [[TMP64:%.*]] = and i32 [[TMP63]], 65535 ; CHECK-NEXT: [[TMP65:%.*]] = trunc i32 [[TMP64]] to i16 ; CHECK-NEXT: [[TMP66:%.*]] = insertelement <3 x i16> [[TMP44]], i16 [[TMP65]], i64 2 -; CHECK-NEXT: store <3 x i16> [[TMP66]], <3 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i16> [[TMP66]], <3 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v3i16: @@ -3254,7 +2976,7 @@ define amdgpu_kernel void @sdiv_v3i16(<3 x i16> addrspace(1)* %out, <3 x i16> %x ; CHECK-NEXT: [[TMP70:%.*]] = ashr i32 [[TMP69]], 16 ; CHECK-NEXT: [[TMP71:%.*]] = trunc i32 
[[TMP70]] to i16 ; CHECK-NEXT: [[TMP72:%.*]] = insertelement <3 x i16> [[TMP48]], i16 [[TMP71]], i64 2 -; CHECK-NEXT: store <3 x i16> [[TMP72]], <3 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i16> [[TMP72]], <3 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v3i16: @@ -3404,7 +3126,7 @@ define amdgpu_kernel void @srem_v3i16(<3 x i16> addrspace(1)* %out, <3 x i16> %x ; CHECK-NEXT: [[TMP76:%.*]] = ashr i32 [[TMP75]], 16 ; CHECK-NEXT: [[TMP77:%.*]] = trunc i32 [[TMP76]] to i16 ; CHECK-NEXT: [[TMP78:%.*]] = insertelement <3 x i16> [[TMP52]], i16 [[TMP77]], i64 2 -; CHECK-NEXT: store <3 x i16> [[TMP78]], <3 x i16> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i16> [[TMP78]], <3 x i16> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v3i16: @@ -3545,7 +3267,7 @@ define amdgpu_kernel void @udiv_v3i15(<3 x i15> addrspace(1)* %out, <3 x i15> %x ; CHECK-NEXT: [[TMP58:%.*]] = and i32 [[TMP57]], 32767 ; CHECK-NEXT: [[TMP59:%.*]] = trunc i32 [[TMP58]] to i15 ; CHECK-NEXT: [[TMP60:%.*]] = insertelement <3 x i15> [[TMP40]], i15 [[TMP59]], i64 2 -; CHECK-NEXT: store <3 x i15> [[TMP60]], <3 x i15> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i15> [[TMP60]], <3 x i15> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v3i15: @@ -3677,7 +3399,7 @@ define amdgpu_kernel void @urem_v3i15(<3 x i15> addrspace(1)* %out, <3 x i15> %x ; CHECK-NEXT: [[TMP64:%.*]] = and i32 [[TMP63]], 32767 ; CHECK-NEXT: [[TMP65:%.*]] = trunc i32 [[TMP64]] to i15 ; CHECK-NEXT: [[TMP66:%.*]] = insertelement <3 x i15> [[TMP44]], i15 [[TMP65]], i64 2 -; CHECK-NEXT: store <3 x i15> [[TMP66]], <3 x i15> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i15> [[TMP66]], <3 x i15> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v3i15: @@ -3823,7 +3545,7 @@ define amdgpu_kernel void @sdiv_v3i15(<3 x i15> addrspace(1)* %out, <3 x i15> %x ; CHECK-NEXT: [[TMP70:%.*]] = 
ashr i32 [[TMP69]], 17 ; CHECK-NEXT: [[TMP71:%.*]] = trunc i32 [[TMP70]] to i15 ; CHECK-NEXT: [[TMP72:%.*]] = insertelement <3 x i15> [[TMP48]], i15 [[TMP71]], i64 2 -; CHECK-NEXT: store <3 x i15> [[TMP72]], <3 x i15> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i15> [[TMP72]], <3 x i15> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v3i15: @@ -3981,7 +3703,7 @@ define amdgpu_kernel void @srem_v3i15(<3 x i15> addrspace(1)* %out, <3 x i15> %x ; CHECK-NEXT: [[TMP76:%.*]] = ashr i32 [[TMP75]], 17 ; CHECK-NEXT: [[TMP77:%.*]] = trunc i32 [[TMP76]] to i15 ; CHECK-NEXT: [[TMP78:%.*]] = insertelement <3 x i15> [[TMP52]], i15 [[TMP77]], i64 2 -; CHECK-NEXT: store <3 x i15> [[TMP78]], <3 x i15> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <3 x i15> [[TMP78]], <3 x i15> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v3i15: @@ -4076,7 +3798,7 @@ define amdgpu_kernel void @srem_v3i15(<3 x i15> addrspace(1)* %out, <3 x i15> %x define amdgpu_kernel void @udiv_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @udiv_i32_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = udiv i32 [[X:%.*]], 1235195 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i32_oddk_denom: @@ -4102,7 +3824,7 @@ define amdgpu_kernel void @udiv_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { define amdgpu_kernel void @udiv_i32_pow2k_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @udiv_i32_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = udiv i32 [[X:%.*]], 4096 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i32_pow2k_denom: @@ -4125,7 +3847,7 @@ define amdgpu_kernel void @udiv_i32_pow2_shl_denom(i32 addrspace(1)* %out, i32 % ; CHECK-LABEL: @udiv_i32_pow2_shl_denom( ; CHECK-NEXT: 
[[SHL_Y:%.*]] = shl i32 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = udiv i32 [[X:%.*]], [[SHL_Y]] -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i32_pow2_shl_denom: @@ -4154,7 +3876,7 @@ define amdgpu_kernel void @udiv_v2i32_pow2k_denom(<2 x i32> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = udiv i32 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v2i32_pow2k_denom: @@ -4183,7 +3905,7 @@ define amdgpu_kernel void @udiv_v2i32_mixed_pow2k_denom(<2 x i32> addrspace(1)* ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = udiv i32 [[TMP4]], 4095 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v2i32_mixed_pow2k_denom: @@ -4215,144 +3937,120 @@ define amdgpu_kernel void @udiv_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 0 ; CHECK-NEXT: [[TMP3:%.*]] = uitofp i32 [[TMP2]] to float ; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP3]]) -; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP6:%.*]] = fptoui float [[TMP5]] to i32 -; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64 -; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[TMP2]] to i64 -; 
CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP7]], [[TMP8]] -; CHECK-NEXT: [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32 -; CHECK-NEXT: [[TMP11:%.*]] = lshr i64 [[TMP9]], 32 +; CHECK-NEXT: [[TMP7:%.*]] = sub i32 0, [[TMP2]] +; CHECK-NEXT: [[TMP8:%.*]] = mul i32 [[TMP7]], [[TMP6]] +; CHECK-NEXT: [[TMP9:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = zext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP9]], [[TMP10]] ; CHECK-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 -; CHECK-NEXT: [[TMP13:%.*]] = sub i32 0, [[TMP10]] -; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[TMP12]], 0 -; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP10]] -; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP15]] to i64 -; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP13:%.*]] = lshr i64 [[TMP11]], 32 +; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP13]] to i32 +; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP6]], [[TMP14]] +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP1]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP15]] to i64 ; CHECK-NEXT: [[TMP18:%.*]] = mul i64 [[TMP16]], [[TMP17]] ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 ; CHECK-NEXT: [[TMP20:%.*]] = lshr i64 [[TMP18]], 32 ; CHECK-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 -; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP14]], i32 [[TMP22]], i32 [[TMP23]] -; CHECK-NEXT: [[TMP25:%.*]] = zext i32 [[TMP24]] to i64 -; CHECK-NEXT: [[TMP26:%.*]] = zext i32 [[TMP1]] to i64 -; CHECK-NEXT: [[TMP27:%.*]] = mul i64 [[TMP25]], [[TMP26]] -; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = lshr i64 [[TMP27]], 32 -; CHECK-NEXT: [[TMP30:%.*]] = trunc i64 [[TMP29]] to i32 -; CHECK-NEXT: [[TMP31:%.*]] = mul i32 [[TMP30]], [[TMP2]] -; CHECK-NEXT: [[TMP32:%.*]] = sub i32 [[TMP1]], [[TMP31]] -; CHECK-NEXT: [[TMP33:%.*]] 
= icmp uge i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP34:%.*]] = icmp uge i32 [[TMP1]], [[TMP31]] -; CHECK-NEXT: [[TMP35:%.*]] = and i1 [[TMP33]], [[TMP34]] -; CHECK-NEXT: [[TMP36:%.*]] = add i32 [[TMP30]], 1 -; CHECK-NEXT: [[TMP37:%.*]] = sub i32 [[TMP30]], 1 -; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP35]], i32 [[TMP36]], i32 [[TMP30]] -; CHECK-NEXT: [[TMP39:%.*]] = select i1 [[TMP34]], i32 [[TMP38]], i32 [[TMP37]] -; CHECK-NEXT: [[TMP40:%.*]] = insertelement <2 x i32> undef, i32 [[TMP39]], i64 0 -; CHECK-NEXT: [[TMP41:%.*]] = extractelement <2 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 -; CHECK-NEXT: [[TMP43:%.*]] = uitofp i32 [[TMP42]] to float -; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP43]]) -; CHECK-NEXT: [[TMP45:%.*]] = fmul fast float [[TMP44]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP46:%.*]] = fptoui float [[TMP45]] to i32 -; CHECK-NEXT: [[TMP47:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP48:%.*]] = zext i32 [[TMP42]] to i64 -; CHECK-NEXT: [[TMP49:%.*]] = mul i64 [[TMP47]], [[TMP48]] -; CHECK-NEXT: [[TMP50:%.*]] = trunc i64 [[TMP49]] to i32 -; CHECK-NEXT: [[TMP51:%.*]] = lshr i64 [[TMP49]], 32 -; CHECK-NEXT: [[TMP52:%.*]] = trunc i64 [[TMP51]] to i32 -; CHECK-NEXT: [[TMP53:%.*]] = sub i32 0, [[TMP50]] -; CHECK-NEXT: [[TMP54:%.*]] = icmp eq i32 [[TMP52]], 0 -; CHECK-NEXT: [[TMP55:%.*]] = select i1 [[TMP54]], i32 [[TMP53]], i32 [[TMP50]] -; CHECK-NEXT: [[TMP56:%.*]] = zext i32 [[TMP55]] to i64 -; CHECK-NEXT: [[TMP57:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP58:%.*]] = mul i64 [[TMP56]], [[TMP57]] -; CHECK-NEXT: [[TMP59:%.*]] = trunc i64 [[TMP58]] to i32 -; CHECK-NEXT: [[TMP60:%.*]] = lshr i64 [[TMP58]], 32 -; CHECK-NEXT: [[TMP61:%.*]] = trunc i64 [[TMP60]] to i32 -; CHECK-NEXT: [[TMP62:%.*]] = add i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP63:%.*]] = sub i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP64:%.*]] = select i1 [[TMP54]], i32 [[TMP62]], 
i32 [[TMP63]] -; CHECK-NEXT: [[TMP65:%.*]] = zext i32 [[TMP64]] to i64 -; CHECK-NEXT: [[TMP66:%.*]] = zext i32 [[TMP41]] to i64 -; CHECK-NEXT: [[TMP67:%.*]] = mul i64 [[TMP65]], [[TMP66]] -; CHECK-NEXT: [[TMP68:%.*]] = trunc i64 [[TMP67]] to i32 -; CHECK-NEXT: [[TMP69:%.*]] = lshr i64 [[TMP67]], 32 -; CHECK-NEXT: [[TMP70:%.*]] = trunc i64 [[TMP69]] to i32 -; CHECK-NEXT: [[TMP71:%.*]] = mul i32 [[TMP70]], [[TMP42]] -; CHECK-NEXT: [[TMP72:%.*]] = sub i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP73:%.*]] = icmp uge i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP74:%.*]] = icmp uge i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP75:%.*]] = and i1 [[TMP73]], [[TMP74]] -; CHECK-NEXT: [[TMP76:%.*]] = add i32 [[TMP70]], 1 -; CHECK-NEXT: [[TMP77:%.*]] = sub i32 [[TMP70]], 1 -; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP75]], i32 [[TMP76]], i32 [[TMP70]] -; CHECK-NEXT: [[TMP79:%.*]] = select i1 [[TMP74]], i32 [[TMP78]], i32 [[TMP77]] -; CHECK-NEXT: [[TMP80:%.*]] = insertelement <2 x i32> [[TMP40]], i32 [[TMP79]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP80]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP22:%.*]] = mul i32 [[TMP21]], [[TMP2]] +; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP1]], [[TMP22]] +; CHECK-NEXT: [[TMP24:%.*]] = icmp uge i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP25:%.*]] = add i32 [[TMP21]], 1 +; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP24]], i32 [[TMP25]], i32 [[TMP21]] +; CHECK-NEXT: [[TMP27:%.*]] = sub i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP28:%.*]] = select i1 [[TMP24]], i32 [[TMP27]], i32 [[TMP23]] +; CHECK-NEXT: [[TMP29:%.*]] = icmp uge i32 [[TMP28]], [[TMP2]] +; CHECK-NEXT: [[TMP30:%.*]] = add i32 [[TMP26]], 1 +; CHECK-NEXT: [[TMP31:%.*]] = select i1 [[TMP29]], i32 [[TMP30]], i32 [[TMP26]] +; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x i32> undef, i32 [[TMP31]], i64 0 +; CHECK-NEXT: [[TMP33:%.*]] = extractelement <2 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 +; CHECK-NEXT: 
[[TMP35:%.*]] = uitofp i32 [[TMP34]] to float +; CHECK-NEXT: [[TMP36:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP35]]) +; CHECK-NEXT: [[TMP37:%.*]] = fmul fast float [[TMP36]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP38:%.*]] = fptoui float [[TMP37]] to i32 +; CHECK-NEXT: [[TMP39:%.*]] = sub i32 0, [[TMP34]] +; CHECK-NEXT: [[TMP40:%.*]] = mul i32 [[TMP39]], [[TMP38]] +; CHECK-NEXT: [[TMP41:%.*]] = zext i32 [[TMP38]] to i64 +; CHECK-NEXT: [[TMP42:%.*]] = zext i32 [[TMP40]] to i64 +; CHECK-NEXT: [[TMP43:%.*]] = mul i64 [[TMP41]], [[TMP42]] +; CHECK-NEXT: [[TMP44:%.*]] = trunc i64 [[TMP43]] to i32 +; CHECK-NEXT: [[TMP45:%.*]] = lshr i64 [[TMP43]], 32 +; CHECK-NEXT: [[TMP46:%.*]] = trunc i64 [[TMP45]] to i32 +; CHECK-NEXT: [[TMP47:%.*]] = add i32 [[TMP38]], [[TMP46]] +; CHECK-NEXT: [[TMP48:%.*]] = zext i32 [[TMP33]] to i64 +; CHECK-NEXT: [[TMP49:%.*]] = zext i32 [[TMP47]] to i64 +; CHECK-NEXT: [[TMP50:%.*]] = mul i64 [[TMP48]], [[TMP49]] +; CHECK-NEXT: [[TMP51:%.*]] = trunc i64 [[TMP50]] to i32 +; CHECK-NEXT: [[TMP52:%.*]] = lshr i64 [[TMP50]], 32 +; CHECK-NEXT: [[TMP53:%.*]] = trunc i64 [[TMP52]] to i32 +; CHECK-NEXT: [[TMP54:%.*]] = mul i32 [[TMP53]], [[TMP34]] +; CHECK-NEXT: [[TMP55:%.*]] = sub i32 [[TMP33]], [[TMP54]] +; CHECK-NEXT: [[TMP56:%.*]] = icmp uge i32 [[TMP55]], [[TMP34]] +; CHECK-NEXT: [[TMP57:%.*]] = add i32 [[TMP53]], 1 +; CHECK-NEXT: [[TMP58:%.*]] = select i1 [[TMP56]], i32 [[TMP57]], i32 [[TMP53]] +; CHECK-NEXT: [[TMP59:%.*]] = sub i32 [[TMP55]], [[TMP34]] +; CHECK-NEXT: [[TMP60:%.*]] = select i1 [[TMP56]], i32 [[TMP59]], i32 [[TMP55]] +; CHECK-NEXT: [[TMP61:%.*]] = icmp uge i32 [[TMP60]], [[TMP34]] +; CHECK-NEXT: [[TMP62:%.*]] = add i32 [[TMP58]], 1 +; CHECK-NEXT: [[TMP63:%.*]] = select i1 [[TMP61]], i32 [[TMP62]], i32 [[TMP58]] +; CHECK-NEXT: [[TMP64:%.*]] = insertelement <2 x i32> [[TMP32]], i32 [[TMP63]], i64 1 +; CHECK-NEXT: store <2 x i32> [[TMP64]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: 
udiv_v2i32_pow2_shl_denom: ; GCN: ; %bb.0: ; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0xd ; GCN-NEXT: s_movk_i32 s4, 0x1000 +; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xb ; GCN-NEXT: s_mov_b32 s7, 0xf000 ; GCN-NEXT: s_mov_b32 s6, -1 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_lshl_b32 s2, s4, s2 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s2 +; GCN-NEXT: s_lshl_b32 s5, s4, s2 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s5 ; GCN-NEXT: s_lshl_b32 s10, s4, s3 -; GCN-NEXT: s_mov_b32 s3, 0x4f800000 +; GCN-NEXT: s_mov_b32 s3, 0x4f7ffffe ; GCN-NEXT: v_cvt_f32_u32_e32 v1, s10 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xb +; GCN-NEXT: s_sub_i32 s2, 0, s5 ; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 ; GCN-NEXT: v_mul_f32_e32 v0, s3, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 ; GCN-NEXT: v_mul_f32_e32 v1, s3, v1 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_mul_lo_u32 v2, v0, s2 -; GCN-NEXT: v_mul_hi_u32 v3, v0, s2 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v2 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v3 -; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v0 -; GCN-NEXT: v_mul_lo_u32 v3, v1, s10 -; GCN-NEXT: v_add_i32_e32 v4, vcc, v2, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v2, v0 -; GCN-NEXT: v_mul_hi_u32 v2, v1, s10 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[0:1] -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v3 -; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: v_mul_hi_u32 v0, v0, s8 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v1 -; GCN-NEXT: v_mul_lo_u32 v5, v0, s2 -; GCN-NEXT: v_add_i32_e32 v4, vcc, v2, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v2, v1 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s9 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s8, v5 -; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s2, v3 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v0 -; GCN-NEXT: v_mul_lo_u32 v4, 
v1, s10 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], s8, v5 -; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GCN-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GCN-NEXT: v_sub_i32_e32 v2, vcc, s9, v4 -; GCN-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[0:1] +; GCN-NEXT: v_mul_lo_u32 v2, s2, v0 +; GCN-NEXT: s_sub_i32 s2, 0, s10 +; GCN-NEXT: v_mul_lo_u32 v3, s2, v1 +; GCN-NEXT: v_mul_hi_u32 v2, v0, v2 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v2, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s8, v0 +; GCN-NEXT: v_mul_hi_u32 v2, v1, v3 +; GCN-NEXT: v_mul_lo_u32 v3, v0, s5 +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v0 +; GCN-NEXT: v_sub_i32_e32 v3, vcc, s8, v3 +; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s5, v3 +; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[2:3] +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, s5, v3 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v4, s[2:3] +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s5, v3 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GCN-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; GCN-NEXT: v_mul_hi_u32 v1, s9, v1 +; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 +; GCN-NEXT: v_mul_lo_u32 v2, v1, s10 +; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v1 +; GCN-NEXT: v_sub_i32_e32 v2, vcc, s9, v2 ; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s10, v2 -; GCN-NEXT: v_add_i32_e32 v2, vcc, -1, v1 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v4 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s10, v2 +; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v3, s[0:1] ; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v1 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s10, v2 ; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[2:3] +; GCN-NEXT: s_waitcnt lgkmcnt(0) ; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GCN-NEXT: s_endpgm %shl.y = shl <2 x i32> , %y @@ -4364,7 +4062,7 @@ define amdgpu_kernel void @udiv_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou define 
amdgpu_kernel void @urem_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @urem_i32_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = urem i32 [[X:%.*]], 1235195 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i32_oddk_denom: @@ -4392,7 +4090,7 @@ define amdgpu_kernel void @urem_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { define amdgpu_kernel void @urem_i32_pow2k_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @urem_i32_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = urem i32 [[X:%.*]], 4096 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i32_pow2k_denom: @@ -4415,7 +4113,7 @@ define amdgpu_kernel void @urem_i32_pow2_shl_denom(i32 addrspace(1)* %out, i32 % ; CHECK-LABEL: @urem_i32_pow2_shl_denom( ; CHECK-NEXT: [[SHL_Y:%.*]] = shl i32 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = urem i32 [[X:%.*]], [[SHL_Y]] -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i32_pow2_shl_denom: @@ -4445,7 +4143,7 @@ define amdgpu_kernel void @urem_v2i32_pow2k_denom(<2 x i32> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = urem i32 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v2i32_pow2k_denom: @@ -4474,145 +4172,113 @@ define amdgpu_kernel void @urem_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 0 ; CHECK-NEXT: 
[[TMP3:%.*]] = uitofp i32 [[TMP2]] to float ; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP3]]) -; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP6:%.*]] = fptoui float [[TMP5]] to i32 -; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64 -; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[TMP2]] to i64 -; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP7]], [[TMP8]] -; CHECK-NEXT: [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32 -; CHECK-NEXT: [[TMP11:%.*]] = lshr i64 [[TMP9]], 32 +; CHECK-NEXT: [[TMP7:%.*]] = sub i32 0, [[TMP2]] +; CHECK-NEXT: [[TMP8:%.*]] = mul i32 [[TMP7]], [[TMP6]] +; CHECK-NEXT: [[TMP9:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = zext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP9]], [[TMP10]] ; CHECK-NEXT: [[TMP12:%.*]] = trunc i64 [[TMP11]] to i32 -; CHECK-NEXT: [[TMP13:%.*]] = sub i32 0, [[TMP10]] -; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[TMP12]], 0 -; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP10]] -; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP15]] to i64 -; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP13:%.*]] = lshr i64 [[TMP11]], 32 +; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP13]] to i32 +; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP6]], [[TMP14]] +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP1]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP15]] to i64 ; CHECK-NEXT: [[TMP18:%.*]] = mul i64 [[TMP16]], [[TMP17]] ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 ; CHECK-NEXT: [[TMP20:%.*]] = lshr i64 [[TMP18]], 32 ; CHECK-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 -; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP6]], [[TMP21]] -; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP14]], i32 [[TMP22]], i32 [[TMP23]] -; CHECK-NEXT: [[TMP25:%.*]] = zext i32 
[[TMP24]] to i64 -; CHECK-NEXT: [[TMP26:%.*]] = zext i32 [[TMP1]] to i64 -; CHECK-NEXT: [[TMP27:%.*]] = mul i64 [[TMP25]], [[TMP26]] -; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = lshr i64 [[TMP27]], 32 -; CHECK-NEXT: [[TMP30:%.*]] = trunc i64 [[TMP29]] to i32 -; CHECK-NEXT: [[TMP31:%.*]] = mul i32 [[TMP30]], [[TMP2]] -; CHECK-NEXT: [[TMP32:%.*]] = sub i32 [[TMP1]], [[TMP31]] -; CHECK-NEXT: [[TMP33:%.*]] = icmp uge i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP34:%.*]] = icmp uge i32 [[TMP1]], [[TMP31]] -; CHECK-NEXT: [[TMP35:%.*]] = and i1 [[TMP33]], [[TMP34]] -; CHECK-NEXT: [[TMP36:%.*]] = sub i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP37:%.*]] = add i32 [[TMP32]], [[TMP2]] -; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP35]], i32 [[TMP36]], i32 [[TMP32]] -; CHECK-NEXT: [[TMP39:%.*]] = select i1 [[TMP34]], i32 [[TMP38]], i32 [[TMP37]] -; CHECK-NEXT: [[TMP40:%.*]] = insertelement <2 x i32> undef, i32 [[TMP39]], i64 0 -; CHECK-NEXT: [[TMP41:%.*]] = extractelement <2 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 -; CHECK-NEXT: [[TMP43:%.*]] = uitofp i32 [[TMP42]] to float -; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP43]]) -; CHECK-NEXT: [[TMP45:%.*]] = fmul fast float [[TMP44]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP46:%.*]] = fptoui float [[TMP45]] to i32 -; CHECK-NEXT: [[TMP47:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP48:%.*]] = zext i32 [[TMP42]] to i64 -; CHECK-NEXT: [[TMP49:%.*]] = mul i64 [[TMP47]], [[TMP48]] -; CHECK-NEXT: [[TMP50:%.*]] = trunc i64 [[TMP49]] to i32 -; CHECK-NEXT: [[TMP51:%.*]] = lshr i64 [[TMP49]], 32 -; CHECK-NEXT: [[TMP52:%.*]] = trunc i64 [[TMP51]] to i32 -; CHECK-NEXT: [[TMP53:%.*]] = sub i32 0, [[TMP50]] -; CHECK-NEXT: [[TMP54:%.*]] = icmp eq i32 [[TMP52]], 0 -; CHECK-NEXT: [[TMP55:%.*]] = select i1 [[TMP54]], i32 [[TMP53]], i32 [[TMP50]] -; CHECK-NEXT: [[TMP56:%.*]] = zext i32 [[TMP55]] to i64 -; 
CHECK-NEXT: [[TMP57:%.*]] = zext i32 [[TMP46]] to i64 -; CHECK-NEXT: [[TMP58:%.*]] = mul i64 [[TMP56]], [[TMP57]] -; CHECK-NEXT: [[TMP59:%.*]] = trunc i64 [[TMP58]] to i32 -; CHECK-NEXT: [[TMP60:%.*]] = lshr i64 [[TMP58]], 32 -; CHECK-NEXT: [[TMP61:%.*]] = trunc i64 [[TMP60]] to i32 -; CHECK-NEXT: [[TMP62:%.*]] = add i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP63:%.*]] = sub i32 [[TMP46]], [[TMP61]] -; CHECK-NEXT: [[TMP64:%.*]] = select i1 [[TMP54]], i32 [[TMP62]], i32 [[TMP63]] -; CHECK-NEXT: [[TMP65:%.*]] = zext i32 [[TMP64]] to i64 -; CHECK-NEXT: [[TMP66:%.*]] = zext i32 [[TMP41]] to i64 -; CHECK-NEXT: [[TMP67:%.*]] = mul i64 [[TMP65]], [[TMP66]] -; CHECK-NEXT: [[TMP68:%.*]] = trunc i64 [[TMP67]] to i32 -; CHECK-NEXT: [[TMP69:%.*]] = lshr i64 [[TMP67]], 32 -; CHECK-NEXT: [[TMP70:%.*]] = trunc i64 [[TMP69]] to i32 -; CHECK-NEXT: [[TMP71:%.*]] = mul i32 [[TMP70]], [[TMP42]] -; CHECK-NEXT: [[TMP72:%.*]] = sub i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP73:%.*]] = icmp uge i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP74:%.*]] = icmp uge i32 [[TMP41]], [[TMP71]] -; CHECK-NEXT: [[TMP75:%.*]] = and i1 [[TMP73]], [[TMP74]] -; CHECK-NEXT: [[TMP76:%.*]] = sub i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP77:%.*]] = add i32 [[TMP72]], [[TMP42]] -; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP75]], i32 [[TMP76]], i32 [[TMP72]] -; CHECK-NEXT: [[TMP79:%.*]] = select i1 [[TMP74]], i32 [[TMP78]], i32 [[TMP77]] -; CHECK-NEXT: [[TMP80:%.*]] = insertelement <2 x i32> [[TMP40]], i32 [[TMP79]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP80]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP22:%.*]] = mul i32 [[TMP21]], [[TMP2]] +; CHECK-NEXT: [[TMP23:%.*]] = sub i32 [[TMP1]], [[TMP22]] +; CHECK-NEXT: [[TMP24:%.*]] = icmp uge i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP25:%.*]] = sub i32 [[TMP23]], [[TMP2]] +; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP24]], i32 [[TMP25]], i32 [[TMP23]] +; CHECK-NEXT: [[TMP27:%.*]] = icmp uge i32 [[TMP26]], [[TMP2]] +; CHECK-NEXT: [[TMP28:%.*]] 
= sub i32 [[TMP26]], [[TMP2]] +; CHECK-NEXT: [[TMP29:%.*]] = select i1 [[TMP27]], i32 [[TMP28]], i32 [[TMP26]] +; CHECK-NEXT: [[TMP30:%.*]] = insertelement <2 x i32> undef, i32 [[TMP29]], i64 0 +; CHECK-NEXT: [[TMP31:%.*]] = extractelement <2 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP32:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 +; CHECK-NEXT: [[TMP33:%.*]] = uitofp i32 [[TMP32]] to float +; CHECK-NEXT: [[TMP34:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP33]]) +; CHECK-NEXT: [[TMP35:%.*]] = fmul fast float [[TMP34]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP36:%.*]] = fptoui float [[TMP35]] to i32 +; CHECK-NEXT: [[TMP37:%.*]] = sub i32 0, [[TMP32]] +; CHECK-NEXT: [[TMP38:%.*]] = mul i32 [[TMP37]], [[TMP36]] +; CHECK-NEXT: [[TMP39:%.*]] = zext i32 [[TMP36]] to i64 +; CHECK-NEXT: [[TMP40:%.*]] = zext i32 [[TMP38]] to i64 +; CHECK-NEXT: [[TMP41:%.*]] = mul i64 [[TMP39]], [[TMP40]] +; CHECK-NEXT: [[TMP42:%.*]] = trunc i64 [[TMP41]] to i32 +; CHECK-NEXT: [[TMP43:%.*]] = lshr i64 [[TMP41]], 32 +; CHECK-NEXT: [[TMP44:%.*]] = trunc i64 [[TMP43]] to i32 +; CHECK-NEXT: [[TMP45:%.*]] = add i32 [[TMP36]], [[TMP44]] +; CHECK-NEXT: [[TMP46:%.*]] = zext i32 [[TMP31]] to i64 +; CHECK-NEXT: [[TMP47:%.*]] = zext i32 [[TMP45]] to i64 +; CHECK-NEXT: [[TMP48:%.*]] = mul i64 [[TMP46]], [[TMP47]] +; CHECK-NEXT: [[TMP49:%.*]] = trunc i64 [[TMP48]] to i32 +; CHECK-NEXT: [[TMP50:%.*]] = lshr i64 [[TMP48]], 32 +; CHECK-NEXT: [[TMP51:%.*]] = trunc i64 [[TMP50]] to i32 +; CHECK-NEXT: [[TMP52:%.*]] = mul i32 [[TMP51]], [[TMP32]] +; CHECK-NEXT: [[TMP53:%.*]] = sub i32 [[TMP31]], [[TMP52]] +; CHECK-NEXT: [[TMP54:%.*]] = icmp uge i32 [[TMP53]], [[TMP32]] +; CHECK-NEXT: [[TMP55:%.*]] = sub i32 [[TMP53]], [[TMP32]] +; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP54]], i32 [[TMP55]], i32 [[TMP53]] +; CHECK-NEXT: [[TMP57:%.*]] = icmp uge i32 [[TMP56]], [[TMP32]] +; CHECK-NEXT: [[TMP58:%.*]] = sub i32 [[TMP56]], [[TMP32]] +; CHECK-NEXT: [[TMP59:%.*]] = select i1 [[TMP57]], i32 [[TMP58]], 
i32 [[TMP56]] +; CHECK-NEXT: [[TMP60:%.*]] = insertelement <2 x i32> [[TMP30]], i32 [[TMP59]], i64 1 +; CHECK-NEXT: store <2 x i32> [[TMP60]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v2i32_pow2_shl_denom: ; GCN: ; %bb.0: ; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0xd ; GCN-NEXT: s_movk_i32 s4, 0x1000 -; GCN-NEXT: s_mov_b32 s7, 0xf000 -; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: s_mov_b32 s7, 0x4f7ffffe ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_lshl_b32 s10, s4, s2 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s10 -; GCN-NEXT: s_mov_b32 s2, 0x4f800000 -; GCN-NEXT: s_lshl_b32 s11, s4, s3 -; GCN-NEXT: v_cvt_f32_u32_e32 v1, s11 +; GCN-NEXT: s_lshl_b32 s2, s4, s2 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s2 +; GCN-NEXT: s_lshl_b32 s6, s4, s3 +; GCN-NEXT: s_sub_i32 s3, 0, s2 +; GCN-NEXT: v_cvt_f32_u32_e32 v1, s6 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xb +; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xb +; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 ; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GCN-NEXT: v_mul_f32_e32 v0, s2, v0 +; GCN-NEXT: v_mul_f32_e32 v0, s7, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_f32_e32 v1, s2, v1 +; GCN-NEXT: v_mul_f32_e32 v1, s7, v1 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_mul_lo_u32 v2, v0, s10 -; GCN-NEXT: v_mul_hi_u32 v3, v0, s10 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v2 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v3 -; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v0 -; GCN-NEXT: v_mul_lo_u32 v3, v1, s11 -; GCN-NEXT: v_add_i32_e32 v4, vcc, v2, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v2, v0 -; GCN-NEXT: v_mul_hi_u32 v2, v1, s11 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[0:1] -; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v3 +; GCN-NEXT: v_mul_lo_u32 v2, s3, v0 +; GCN-NEXT: s_sub_i32 s3, 0, s6 +; GCN-NEXT: v_mul_hi_u32 v2, v0, v2 +; GCN-NEXT: v_add_i32_e32 v0, vcc, 
v2, v0 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: v_mul_hi_u32 v0, v0, s8 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v2, v2, v1 -; GCN-NEXT: v_mul_lo_u32 v0, v0, s10 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v2, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v2, v1 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s9 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s8, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], s8, v0 -; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s10, v3 -; GCN-NEXT: v_mul_lo_u32 v1, v1, s11 -; GCN-NEXT: v_add_i32_e32 v4, vcc, s10, v3 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s10, v3 -; GCN-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GCN-NEXT: v_sub_i32_e32 v2, vcc, s9, v1 -; GCN-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[0:1] -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v1 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s11, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, s11, v2 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s11, v2 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[2:3] -; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 +; GCN-NEXT: v_mul_hi_u32 v0, s4, v0 +; GCN-NEXT: v_mul_lo_u32 v2, s3, v1 +; GCN-NEXT: s_mov_b32 s3, 0xf000 +; GCN-NEXT: v_mul_lo_u32 v0, v0, s2 +; GCN-NEXT: v_mul_hi_u32 v2, v1, v2 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, s4, v0 +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s2, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s2, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; GCN-NEXT: v_mul_hi_u32 v1, s5, v1 +; GCN-NEXT: s_mov_b32 s2, -1 +; GCN-NEXT: v_mul_lo_u32 v1, v1, s6 +; GCN-NEXT: v_sub_i32_e32 v1, vcc, s5, v1 +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s6, v1 +; GCN-NEXT: 
v_cmp_le_u32_e32 vcc, s6, v1 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s6, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s6, v1 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0 ; GCN-NEXT: s_endpgm %shl.y = shl <2 x i32> , %y %r = urem <2 x i32> %x, %shl.y @@ -4623,7 +4289,7 @@ define amdgpu_kernel void @urem_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou define amdgpu_kernel void @sdiv_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @sdiv_i32_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = sdiv i32 [[X:%.*]], 1235195 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i32_oddk_denom: @@ -4649,7 +4315,7 @@ define amdgpu_kernel void @sdiv_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { define amdgpu_kernel void @sdiv_i32_pow2k_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @sdiv_i32_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = sdiv i32 [[X:%.*]], 4096 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i32_pow2k_denom: @@ -4675,7 +4341,7 @@ define amdgpu_kernel void @sdiv_i32_pow2_shl_denom(i32 addrspace(1)* %out, i32 % ; CHECK-LABEL: @sdiv_i32_pow2_shl_denom( ; CHECK-NEXT: [[SHL_Y:%.*]] = shl i32 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = sdiv i32 [[X:%.*]], [[SHL_Y]] -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i32_pow2_shl_denom: @@ -4734,7 +4400,7 @@ define amdgpu_kernel void @sdiv_v2i32_pow2k_denom(<2 x i32> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = sdiv i32 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 
x i32> [[TMP3]], i32 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v2i32_pow2k_denom: @@ -4769,7 +4435,7 @@ define amdgpu_kernel void @ssdiv_v2i32_mixed_pow2k_denom(<2 x i32> addrspace(1)* ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = sdiv i32 [[TMP4]], 4095 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: ssdiv_v2i32_mixed_pow2k_denom: @@ -4811,173 +4477,149 @@ define amdgpu_kernel void @sdiv_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou ; CHECK-NEXT: [[TMP9:%.*]] = xor i32 [[TMP7]], [[TMP4]] ; CHECK-NEXT: [[TMP10:%.*]] = uitofp i32 [[TMP9]] to float ; CHECK-NEXT: [[TMP11:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP10]]) -; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[TMP11]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[TMP11]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP13:%.*]] = fptoui float [[TMP12]] to i32 -; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP13]] to i64 -; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP9]] to i64 -; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP14]], [[TMP15]] -; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[TMP16]] to i32 -; CHECK-NEXT: [[TMP18:%.*]] = lshr i64 [[TMP16]], 32 +; CHECK-NEXT: [[TMP14:%.*]] = sub i32 0, [[TMP9]] +; CHECK-NEXT: [[TMP15:%.*]] = mul i32 [[TMP14]], [[TMP13]] +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP13]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = zext i32 [[TMP15]] to i64 +; CHECK-NEXT: [[TMP18:%.*]] = mul i64 [[TMP16]], [[TMP17]] ; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[TMP18]] to i32 -; CHECK-NEXT: [[TMP20:%.*]] = sub i32 0, [[TMP17]] 
-; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i32 [[TMP19]], 0 -; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 [[TMP17]] -; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP22]] to i64 -; CHECK-NEXT: [[TMP24:%.*]] = zext i32 [[TMP13]] to i64 +; CHECK-NEXT: [[TMP20:%.*]] = lshr i64 [[TMP18]], 32 +; CHECK-NEXT: [[TMP21:%.*]] = trunc i64 [[TMP20]] to i32 +; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[TMP13]], [[TMP21]] +; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP24:%.*]] = zext i32 [[TMP22]] to i64 ; CHECK-NEXT: [[TMP25:%.*]] = mul i64 [[TMP23]], [[TMP24]] ; CHECK-NEXT: [[TMP26:%.*]] = trunc i64 [[TMP25]] to i32 ; CHECK-NEXT: [[TMP27:%.*]] = lshr i64 [[TMP25]], 32 ; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[TMP27]] to i32 -; CHECK-NEXT: [[TMP29:%.*]] = add i32 [[TMP13]], [[TMP28]] -; CHECK-NEXT: [[TMP30:%.*]] = sub i32 [[TMP13]], [[TMP28]] -; CHECK-NEXT: [[TMP31:%.*]] = select i1 [[TMP21]], i32 [[TMP29]], i32 [[TMP30]] -; CHECK-NEXT: [[TMP32:%.*]] = zext i32 [[TMP31]] to i64 -; CHECK-NEXT: [[TMP33:%.*]] = zext i32 [[TMP8]] to i64 -; CHECK-NEXT: [[TMP34:%.*]] = mul i64 [[TMP32]], [[TMP33]] -; CHECK-NEXT: [[TMP35:%.*]] = trunc i64 [[TMP34]] to i32 -; CHECK-NEXT: [[TMP36:%.*]] = lshr i64 [[TMP34]], 32 -; CHECK-NEXT: [[TMP37:%.*]] = trunc i64 [[TMP36]] to i32 -; CHECK-NEXT: [[TMP38:%.*]] = mul i32 [[TMP37]], [[TMP9]] -; CHECK-NEXT: [[TMP39:%.*]] = sub i32 [[TMP8]], [[TMP38]] -; CHECK-NEXT: [[TMP40:%.*]] = icmp uge i32 [[TMP39]], [[TMP9]] -; CHECK-NEXT: [[TMP41:%.*]] = icmp uge i32 [[TMP8]], [[TMP38]] -; CHECK-NEXT: [[TMP42:%.*]] = and i1 [[TMP40]], [[TMP41]] -; CHECK-NEXT: [[TMP43:%.*]] = add i32 [[TMP37]], 1 -; CHECK-NEXT: [[TMP44:%.*]] = sub i32 [[TMP37]], 1 -; CHECK-NEXT: [[TMP45:%.*]] = select i1 [[TMP42]], i32 [[TMP43]], i32 [[TMP37]] -; CHECK-NEXT: [[TMP46:%.*]] = select i1 [[TMP41]], i32 [[TMP45]], i32 [[TMP44]] -; CHECK-NEXT: [[TMP47:%.*]] = xor i32 [[TMP46]], [[TMP5]] -; CHECK-NEXT: [[TMP48:%.*]] = sub i32 [[TMP47]], [[TMP5]] -; 
CHECK-NEXT: [[TMP49:%.*]] = insertelement <2 x i32> undef, i32 [[TMP48]], i64 0 -; CHECK-NEXT: [[TMP50:%.*]] = extractelement <2 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP51:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 -; CHECK-NEXT: [[TMP52:%.*]] = ashr i32 [[TMP50]], 31 -; CHECK-NEXT: [[TMP53:%.*]] = ashr i32 [[TMP51]], 31 -; CHECK-NEXT: [[TMP54:%.*]] = xor i32 [[TMP52]], [[TMP53]] -; CHECK-NEXT: [[TMP55:%.*]] = add i32 [[TMP50]], [[TMP52]] -; CHECK-NEXT: [[TMP56:%.*]] = add i32 [[TMP51]], [[TMP53]] -; CHECK-NEXT: [[TMP57:%.*]] = xor i32 [[TMP55]], [[TMP52]] -; CHECK-NEXT: [[TMP58:%.*]] = xor i32 [[TMP56]], [[TMP53]] -; CHECK-NEXT: [[TMP59:%.*]] = uitofp i32 [[TMP58]] to float -; CHECK-NEXT: [[TMP60:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP59]]) -; CHECK-NEXT: [[TMP61:%.*]] = fmul fast float [[TMP60]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP62:%.*]] = fptoui float [[TMP61]] to i32 -; CHECK-NEXT: [[TMP63:%.*]] = zext i32 [[TMP62]] to i64 -; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[TMP58]] to i64 -; CHECK-NEXT: [[TMP65:%.*]] = mul i64 [[TMP63]], [[TMP64]] -; CHECK-NEXT: [[TMP66:%.*]] = trunc i64 [[TMP65]] to i32 -; CHECK-NEXT: [[TMP67:%.*]] = lshr i64 [[TMP65]], 32 -; CHECK-NEXT: [[TMP68:%.*]] = trunc i64 [[TMP67]] to i32 -; CHECK-NEXT: [[TMP69:%.*]] = sub i32 0, [[TMP66]] -; CHECK-NEXT: [[TMP70:%.*]] = icmp eq i32 [[TMP68]], 0 -; CHECK-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP69]], i32 [[TMP66]] -; CHECK-NEXT: [[TMP72:%.*]] = zext i32 [[TMP71]] to i64 -; CHECK-NEXT: [[TMP73:%.*]] = zext i32 [[TMP62]] to i64 -; CHECK-NEXT: [[TMP74:%.*]] = mul i64 [[TMP72]], [[TMP73]] -; CHECK-NEXT: [[TMP75:%.*]] = trunc i64 [[TMP74]] to i32 -; CHECK-NEXT: [[TMP76:%.*]] = lshr i64 [[TMP74]], 32 -; CHECK-NEXT: [[TMP77:%.*]] = trunc i64 [[TMP76]] to i32 -; CHECK-NEXT: [[TMP78:%.*]] = add i32 [[TMP62]], [[TMP77]] -; CHECK-NEXT: [[TMP79:%.*]] = sub i32 [[TMP62]], [[TMP77]] -; CHECK-NEXT: [[TMP80:%.*]] = select i1 [[TMP70]], i32 [[TMP78]], i32 [[TMP79]] -; 
CHECK-NEXT: [[TMP81:%.*]] = zext i32 [[TMP80]] to i64 -; CHECK-NEXT: [[TMP82:%.*]] = zext i32 [[TMP57]] to i64 -; CHECK-NEXT: [[TMP83:%.*]] = mul i64 [[TMP81]], [[TMP82]] -; CHECK-NEXT: [[TMP84:%.*]] = trunc i64 [[TMP83]] to i32 -; CHECK-NEXT: [[TMP85:%.*]] = lshr i64 [[TMP83]], 32 -; CHECK-NEXT: [[TMP86:%.*]] = trunc i64 [[TMP85]] to i32 -; CHECK-NEXT: [[TMP87:%.*]] = mul i32 [[TMP86]], [[TMP58]] -; CHECK-NEXT: [[TMP88:%.*]] = sub i32 [[TMP57]], [[TMP87]] -; CHECK-NEXT: [[TMP89:%.*]] = icmp uge i32 [[TMP88]], [[TMP58]] -; CHECK-NEXT: [[TMP90:%.*]] = icmp uge i32 [[TMP57]], [[TMP87]] -; CHECK-NEXT: [[TMP91:%.*]] = and i1 [[TMP89]], [[TMP90]] -; CHECK-NEXT: [[TMP92:%.*]] = add i32 [[TMP86]], 1 -; CHECK-NEXT: [[TMP93:%.*]] = sub i32 [[TMP86]], 1 -; CHECK-NEXT: [[TMP94:%.*]] = select i1 [[TMP91]], i32 [[TMP92]], i32 [[TMP86]] -; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP90]], i32 [[TMP94]], i32 [[TMP93]] -; CHECK-NEXT: [[TMP96:%.*]] = xor i32 [[TMP95]], [[TMP54]] -; CHECK-NEXT: [[TMP97:%.*]] = sub i32 [[TMP96]], [[TMP54]] -; CHECK-NEXT: [[TMP98:%.*]] = insertelement <2 x i32> [[TMP49]], i32 [[TMP97]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP98]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP29:%.*]] = mul i32 [[TMP28]], [[TMP9]] +; CHECK-NEXT: [[TMP30:%.*]] = sub i32 [[TMP8]], [[TMP29]] +; CHECK-NEXT: [[TMP31:%.*]] = icmp uge i32 [[TMP30]], [[TMP9]] +; CHECK-NEXT: [[TMP32:%.*]] = add i32 [[TMP28]], 1 +; CHECK-NEXT: [[TMP33:%.*]] = select i1 [[TMP31]], i32 [[TMP32]], i32 [[TMP28]] +; CHECK-NEXT: [[TMP34:%.*]] = sub i32 [[TMP30]], [[TMP9]] +; CHECK-NEXT: [[TMP35:%.*]] = select i1 [[TMP31]], i32 [[TMP34]], i32 [[TMP30]] +; CHECK-NEXT: [[TMP36:%.*]] = icmp uge i32 [[TMP35]], [[TMP9]] +; CHECK-NEXT: [[TMP37:%.*]] = add i32 [[TMP33]], 1 +; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP36]], i32 [[TMP37]], i32 [[TMP33]] +; CHECK-NEXT: [[TMP39:%.*]] = xor i32 [[TMP38]], [[TMP5]] +; CHECK-NEXT: [[TMP40:%.*]] = sub i32 [[TMP39]], [[TMP5]] +; CHECK-NEXT: [[TMP41:%.*]] 
= insertelement <2 x i32> undef, i32 [[TMP40]], i64 0 +; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP43:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 +; CHECK-NEXT: [[TMP44:%.*]] = ashr i32 [[TMP42]], 31 +; CHECK-NEXT: [[TMP45:%.*]] = ashr i32 [[TMP43]], 31 +; CHECK-NEXT: [[TMP46:%.*]] = xor i32 [[TMP44]], [[TMP45]] +; CHECK-NEXT: [[TMP47:%.*]] = add i32 [[TMP42]], [[TMP44]] +; CHECK-NEXT: [[TMP48:%.*]] = add i32 [[TMP43]], [[TMP45]] +; CHECK-NEXT: [[TMP49:%.*]] = xor i32 [[TMP47]], [[TMP44]] +; CHECK-NEXT: [[TMP50:%.*]] = xor i32 [[TMP48]], [[TMP45]] +; CHECK-NEXT: [[TMP51:%.*]] = uitofp i32 [[TMP50]] to float +; CHECK-NEXT: [[TMP52:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP51]]) +; CHECK-NEXT: [[TMP53:%.*]] = fmul fast float [[TMP52]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP54:%.*]] = fptoui float [[TMP53]] to i32 +; CHECK-NEXT: [[TMP55:%.*]] = sub i32 0, [[TMP50]] +; CHECK-NEXT: [[TMP56:%.*]] = mul i32 [[TMP55]], [[TMP54]] +; CHECK-NEXT: [[TMP57:%.*]] = zext i32 [[TMP54]] to i64 +; CHECK-NEXT: [[TMP58:%.*]] = zext i32 [[TMP56]] to i64 +; CHECK-NEXT: [[TMP59:%.*]] = mul i64 [[TMP57]], [[TMP58]] +; CHECK-NEXT: [[TMP60:%.*]] = trunc i64 [[TMP59]] to i32 +; CHECK-NEXT: [[TMP61:%.*]] = lshr i64 [[TMP59]], 32 +; CHECK-NEXT: [[TMP62:%.*]] = trunc i64 [[TMP61]] to i32 +; CHECK-NEXT: [[TMP63:%.*]] = add i32 [[TMP54]], [[TMP62]] +; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[TMP49]] to i64 +; CHECK-NEXT: [[TMP65:%.*]] = zext i32 [[TMP63]] to i64 +; CHECK-NEXT: [[TMP66:%.*]] = mul i64 [[TMP64]], [[TMP65]] +; CHECK-NEXT: [[TMP67:%.*]] = trunc i64 [[TMP66]] to i32 +; CHECK-NEXT: [[TMP68:%.*]] = lshr i64 [[TMP66]], 32 +; CHECK-NEXT: [[TMP69:%.*]] = trunc i64 [[TMP68]] to i32 +; CHECK-NEXT: [[TMP70:%.*]] = mul i32 [[TMP69]], [[TMP50]] +; CHECK-NEXT: [[TMP71:%.*]] = sub i32 [[TMP49]], [[TMP70]] +; CHECK-NEXT: [[TMP72:%.*]] = icmp uge i32 [[TMP71]], [[TMP50]] +; CHECK-NEXT: [[TMP73:%.*]] = add i32 [[TMP69]], 1 +; CHECK-NEXT: 
[[TMP74:%.*]] = select i1 [[TMP72]], i32 [[TMP73]], i32 [[TMP69]] +; CHECK-NEXT: [[TMP75:%.*]] = sub i32 [[TMP71]], [[TMP50]] +; CHECK-NEXT: [[TMP76:%.*]] = select i1 [[TMP72]], i32 [[TMP75]], i32 [[TMP71]] +; CHECK-NEXT: [[TMP77:%.*]] = icmp uge i32 [[TMP76]], [[TMP50]] +; CHECK-NEXT: [[TMP78:%.*]] = add i32 [[TMP74]], 1 +; CHECK-NEXT: [[TMP79:%.*]] = select i1 [[TMP77]], i32 [[TMP78]], i32 [[TMP74]] +; CHECK-NEXT: [[TMP80:%.*]] = xor i32 [[TMP79]], [[TMP46]] +; CHECK-NEXT: [[TMP81:%.*]] = sub i32 [[TMP80]], [[TMP46]] +; CHECK-NEXT: [[TMP82:%.*]] = insertelement <2 x i32> [[TMP41]], i32 [[TMP81]], i64 1 +; CHECK-NEXT: store <2 x i32> [[TMP82]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v2i32_pow2_shl_denom: ; GCN: ; %bb.0: ; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0xd -; GCN-NEXT: s_movk_i32 s4, 0x1000 -; GCN-NEXT: s_mov_b32 s14, 0x4f800000 -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9 -; GCN-NEXT: s_load_dwordx2 s[6:7], s[0:1], 0xb -; GCN-NEXT: s_mov_b32 s11, 0xf000 +; GCN-NEXT: s_movk_i32 s6, 0x1000 +; GCN-NEXT: s_mov_b32 s12, 0x4f7ffffe +; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 +; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xb +; GCN-NEXT: s_mov_b32 s7, 0xf000 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_lshl_b32 s2, s4, s2 -; GCN-NEXT: s_ashr_i32 s5, s2, 31 -; GCN-NEXT: s_add_i32 s2, s2, s5 -; GCN-NEXT: s_xor_b32 s13, s2, s5 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s13 -; GCN-NEXT: s_ashr_i32 s2, s6, 31 -; GCN-NEXT: s_lshl_b32 s0, s4, s3 -; GCN-NEXT: s_add_i32 s1, s6, s2 +; GCN-NEXT: s_lshl_b32 s2, s6, s2 +; GCN-NEXT: s_ashr_i32 s10, s2, 31 +; GCN-NEXT: s_add_i32 s2, s2, s10 +; GCN-NEXT: s_xor_b32 s11, s2, s10 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s11 +; GCN-NEXT: s_sub_i32 s1, 0, s11 +; GCN-NEXT: s_lshl_b32 s0, s6, s3 +; GCN-NEXT: s_ashr_i32 s3, s0, 31 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_ashr_i32 s6, s0, 31 -; GCN-NEXT: s_add_i32 s4, s0, s6 -; GCN-NEXT: s_xor_b32 s3, s1, s2 -; GCN-NEXT: 
v_mul_f32_e32 v0, s14, v0 +; GCN-NEXT: s_add_i32 s0, s0, s3 +; GCN-NEXT: s_xor_b32 s13, s0, s3 +; GCN-NEXT: v_cvt_f32_u32_e32 v2, s13 +; GCN-NEXT: v_mul_f32_e32 v0, s12, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: s_xor_b32 s15, s4, s6 -; GCN-NEXT: s_xor_b32 s12, s2, s5 -; GCN-NEXT: s_mov_b32 s10, -1 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s13 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s13 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_cvt_f32_u32_e32 v2, s15 -; GCN-NEXT: v_add_i32_e32 v3, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[0:1] -; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v2 -; GCN-NEXT: v_mul_hi_u32 v0, v0, s3 -; GCN-NEXT: v_mul_f32_e32 v1, s14, v1 -; GCN-NEXT: v_mul_lo_u32 v2, v0, s13 +; GCN-NEXT: s_ashr_i32 s2, s8, 31 +; GCN-NEXT: s_add_i32 s0, s8, s2 +; GCN-NEXT: s_xor_b32 s0, s0, s2 +; GCN-NEXT: v_mul_lo_u32 v1, s1, v0 +; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; GCN-NEXT: s_xor_b32 s2, s2, s10 +; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: v_mul_hi_u32 v1, v0, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s0, v0 +; GCN-NEXT: v_mul_f32_e32 v1, s12, v2 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v0 -; GCN-NEXT: v_sub_i32_e32 v4, vcc, s3, v2 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v4 -; GCN-NEXT: v_mul_lo_u32 v4, v1, s15 -; GCN-NEXT: v_mul_hi_u32 v5, v1, s15 -; GCN-NEXT: s_ashr_i32 s13, s7, 31 -; GCN-NEXT: s_add_i32 s7, s7, s13 -; GCN-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v5 -; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v4, v4, v1 -; GCN-NEXT: s_xor_b32 s7, s7, s13 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s3, v2 -; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v4, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v4, v1 -; 
GCN-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s7 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[2:3] -; GCN-NEXT: v_mul_lo_u32 v2, v1, s15 -; GCN-NEXT: v_xor_b32_e32 v0, s12, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s12, v0 -; GCN-NEXT: s_xor_b32 s4, s13, s6 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, s7, v2 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s15, v3 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s7, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v1 -; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v1 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[2:3] -; GCN-NEXT: v_xor_b32_e32 v1, s4, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s4, v1 -; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[8:11], 0 +; GCN-NEXT: v_mul_lo_u32 v2, v0, s11 +; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v0 +; GCN-NEXT: v_sub_i32_e32 v2, vcc, s0, v2 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s11, v2 +; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s11, v2 +; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v3, s[0:1] +; GCN-NEXT: s_sub_i32 s0, 0, s13 +; GCN-NEXT: v_mul_lo_u32 v4, s0, v1 +; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s11, v2 +; GCN-NEXT: s_ashr_i32 s0, s9, 31 +; GCN-NEXT: v_mul_hi_u32 v2, v1, v4 +; GCN-NEXT: s_add_i32 s1, s9, s0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: s_xor_b32 s1, s1, s0 +; GCN-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; GCN-NEXT: v_mul_hi_u32 v1, s1, v1 +; GCN-NEXT: v_xor_b32_e32 v0, s2, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s2, v0 +; GCN-NEXT: s_xor_b32 s2, s0, s3 +; GCN-NEXT: v_mul_lo_u32 v2, v1, s13 +; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v1 +; GCN-NEXT: v_sub_i32_e32 v2, vcc, s1, v2 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v2 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, 
s13, v2 +; GCN-NEXT: v_cndmask_b32_e64 v2, v2, v3, s[0:1] +; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s13, v2 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; GCN-NEXT: v_xor_b32_e32 v1, s2, v1 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s2, v1 +; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GCN-NEXT: s_endpgm %shl.y = shl <2 x i32> , %y %r = sdiv <2 x i32> %x, %shl.y @@ -4988,7 +4630,7 @@ define amdgpu_kernel void @sdiv_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou define amdgpu_kernel void @srem_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @srem_i32_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = srem i32 [[X:%.*]], 1235195 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i32_oddk_denom: @@ -5016,7 +4658,7 @@ define amdgpu_kernel void @srem_i32_oddk_denom(i32 addrspace(1)* %out, i32 %x) { define amdgpu_kernel void @srem_i32_pow2k_denom(i32 addrspace(1)* %out, i32 %x) { ; CHECK-LABEL: @srem_i32_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = srem i32 [[X:%.*]], 4096 -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i32_pow2k_denom: @@ -5043,7 +4685,7 @@ define amdgpu_kernel void @srem_i32_pow2_shl_denom(i32 addrspace(1)* %out, i32 % ; CHECK-LABEL: @srem_i32_pow2_shl_denom( ; CHECK-NEXT: [[SHL_Y:%.*]] = shl i32 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = srem i32 [[X:%.*]], [[SHL_Y]] -; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i32 [[R]], i32 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i32_pow2_shl_denom: @@ -5102,7 +4744,7 @@ define amdgpu_kernel void @srem_v2i32_pow2k_denom(<2 x i32> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = srem 
i32 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v2i32_pow2k_denom: @@ -5145,170 +4787,139 @@ define amdgpu_kernel void @srem_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou ; CHECK-NEXT: [[TMP8:%.*]] = xor i32 [[TMP6]], [[TMP4]] ; CHECK-NEXT: [[TMP9:%.*]] = uitofp i32 [[TMP8]] to float ; CHECK-NEXT: [[TMP10:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP9]]) -; CHECK-NEXT: [[TMP11:%.*]] = fmul fast float [[TMP10]], 0x41F0000000000000 +; CHECK-NEXT: [[TMP11:%.*]] = fmul fast float [[TMP10]], 0x41EFFFFFC0000000 ; CHECK-NEXT: [[TMP12:%.*]] = fptoui float [[TMP11]] to i32 -; CHECK-NEXT: [[TMP13:%.*]] = zext i32 [[TMP12]] to i64 -; CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP8]] to i64 -; CHECK-NEXT: [[TMP15:%.*]] = mul i64 [[TMP13]], [[TMP14]] -; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[TMP15]] to i32 -; CHECK-NEXT: [[TMP17:%.*]] = lshr i64 [[TMP15]], 32 +; CHECK-NEXT: [[TMP13:%.*]] = sub i32 0, [[TMP8]] +; CHECK-NEXT: [[TMP14:%.*]] = mul i32 [[TMP13]], [[TMP12]] +; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP12]] to i64 +; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP14]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = mul i64 [[TMP15]], [[TMP16]] ; CHECK-NEXT: [[TMP18:%.*]] = trunc i64 [[TMP17]] to i32 -; CHECK-NEXT: [[TMP19:%.*]] = sub i32 0, [[TMP16]] -; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i32 [[TMP18]], 0 -; CHECK-NEXT: [[TMP21:%.*]] = select i1 [[TMP20]], i32 [[TMP19]], i32 [[TMP16]] -; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[TMP21]] to i64 -; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP12]] to i64 +; CHECK-NEXT: [[TMP19:%.*]] = lshr i64 [[TMP17]], 32 +; CHECK-NEXT: [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32 +; CHECK-NEXT: [[TMP21:%.*]] = add i32 [[TMP12]], [[TMP20]] +; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[TMP7]] to 
i64 +; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP21]] to i64 ; CHECK-NEXT: [[TMP24:%.*]] = mul i64 [[TMP22]], [[TMP23]] ; CHECK-NEXT: [[TMP25:%.*]] = trunc i64 [[TMP24]] to i32 ; CHECK-NEXT: [[TMP26:%.*]] = lshr i64 [[TMP24]], 32 ; CHECK-NEXT: [[TMP27:%.*]] = trunc i64 [[TMP26]] to i32 -; CHECK-NEXT: [[TMP28:%.*]] = add i32 [[TMP12]], [[TMP27]] -; CHECK-NEXT: [[TMP29:%.*]] = sub i32 [[TMP12]], [[TMP27]] -; CHECK-NEXT: [[TMP30:%.*]] = select i1 [[TMP20]], i32 [[TMP28]], i32 [[TMP29]] -; CHECK-NEXT: [[TMP31:%.*]] = zext i32 [[TMP30]] to i64 -; CHECK-NEXT: [[TMP32:%.*]] = zext i32 [[TMP7]] to i64 -; CHECK-NEXT: [[TMP33:%.*]] = mul i64 [[TMP31]], [[TMP32]] -; CHECK-NEXT: [[TMP34:%.*]] = trunc i64 [[TMP33]] to i32 -; CHECK-NEXT: [[TMP35:%.*]] = lshr i64 [[TMP33]], 32 -; CHECK-NEXT: [[TMP36:%.*]] = trunc i64 [[TMP35]] to i32 -; CHECK-NEXT: [[TMP37:%.*]] = mul i32 [[TMP36]], [[TMP8]] -; CHECK-NEXT: [[TMP38:%.*]] = sub i32 [[TMP7]], [[TMP37]] -; CHECK-NEXT: [[TMP39:%.*]] = icmp uge i32 [[TMP38]], [[TMP8]] -; CHECK-NEXT: [[TMP40:%.*]] = icmp uge i32 [[TMP7]], [[TMP37]] -; CHECK-NEXT: [[TMP41:%.*]] = and i1 [[TMP39]], [[TMP40]] -; CHECK-NEXT: [[TMP42:%.*]] = sub i32 [[TMP38]], [[TMP8]] -; CHECK-NEXT: [[TMP43:%.*]] = add i32 [[TMP38]], [[TMP8]] -; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP41]], i32 [[TMP42]], i32 [[TMP38]] -; CHECK-NEXT: [[TMP45:%.*]] = select i1 [[TMP40]], i32 [[TMP44]], i32 [[TMP43]] -; CHECK-NEXT: [[TMP46:%.*]] = xor i32 [[TMP45]], [[TMP3]] -; CHECK-NEXT: [[TMP47:%.*]] = sub i32 [[TMP46]], [[TMP3]] -; CHECK-NEXT: [[TMP48:%.*]] = insertelement <2 x i32> undef, i32 [[TMP47]], i64 0 -; CHECK-NEXT: [[TMP49:%.*]] = extractelement <2 x i32> [[X]], i64 1 -; CHECK-NEXT: [[TMP50:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 -; CHECK-NEXT: [[TMP51:%.*]] = ashr i32 [[TMP49]], 31 -; CHECK-NEXT: [[TMP52:%.*]] = ashr i32 [[TMP50]], 31 -; CHECK-NEXT: [[TMP53:%.*]] = add i32 [[TMP49]], [[TMP51]] -; CHECK-NEXT: [[TMP54:%.*]] = add i32 [[TMP50]], [[TMP52]] -; 
CHECK-NEXT: [[TMP55:%.*]] = xor i32 [[TMP53]], [[TMP51]] -; CHECK-NEXT: [[TMP56:%.*]] = xor i32 [[TMP54]], [[TMP52]] -; CHECK-NEXT: [[TMP57:%.*]] = uitofp i32 [[TMP56]] to float -; CHECK-NEXT: [[TMP58:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP57]]) -; CHECK-NEXT: [[TMP59:%.*]] = fmul fast float [[TMP58]], 0x41F0000000000000 -; CHECK-NEXT: [[TMP60:%.*]] = fptoui float [[TMP59]] to i32 -; CHECK-NEXT: [[TMP61:%.*]] = zext i32 [[TMP60]] to i64 -; CHECK-NEXT: [[TMP62:%.*]] = zext i32 [[TMP56]] to i64 -; CHECK-NEXT: [[TMP63:%.*]] = mul i64 [[TMP61]], [[TMP62]] -; CHECK-NEXT: [[TMP64:%.*]] = trunc i64 [[TMP63]] to i32 -; CHECK-NEXT: [[TMP65:%.*]] = lshr i64 [[TMP63]], 32 -; CHECK-NEXT: [[TMP66:%.*]] = trunc i64 [[TMP65]] to i32 -; CHECK-NEXT: [[TMP67:%.*]] = sub i32 0, [[TMP64]] -; CHECK-NEXT: [[TMP68:%.*]] = icmp eq i32 [[TMP66]], 0 -; CHECK-NEXT: [[TMP69:%.*]] = select i1 [[TMP68]], i32 [[TMP67]], i32 [[TMP64]] -; CHECK-NEXT: [[TMP70:%.*]] = zext i32 [[TMP69]] to i64 -; CHECK-NEXT: [[TMP71:%.*]] = zext i32 [[TMP60]] to i64 -; CHECK-NEXT: [[TMP72:%.*]] = mul i64 [[TMP70]], [[TMP71]] -; CHECK-NEXT: [[TMP73:%.*]] = trunc i64 [[TMP72]] to i32 -; CHECK-NEXT: [[TMP74:%.*]] = lshr i64 [[TMP72]], 32 -; CHECK-NEXT: [[TMP75:%.*]] = trunc i64 [[TMP74]] to i32 -; CHECK-NEXT: [[TMP76:%.*]] = add i32 [[TMP60]], [[TMP75]] -; CHECK-NEXT: [[TMP77:%.*]] = sub i32 [[TMP60]], [[TMP75]] -; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP68]], i32 [[TMP76]], i32 [[TMP77]] -; CHECK-NEXT: [[TMP79:%.*]] = zext i32 [[TMP78]] to i64 -; CHECK-NEXT: [[TMP80:%.*]] = zext i32 [[TMP55]] to i64 -; CHECK-NEXT: [[TMP81:%.*]] = mul i64 [[TMP79]], [[TMP80]] -; CHECK-NEXT: [[TMP82:%.*]] = trunc i64 [[TMP81]] to i32 -; CHECK-NEXT: [[TMP83:%.*]] = lshr i64 [[TMP81]], 32 -; CHECK-NEXT: [[TMP84:%.*]] = trunc i64 [[TMP83]] to i32 -; CHECK-NEXT: [[TMP85:%.*]] = mul i32 [[TMP84]], [[TMP56]] -; CHECK-NEXT: [[TMP86:%.*]] = sub i32 [[TMP55]], [[TMP85]] -; CHECK-NEXT: [[TMP87:%.*]] = icmp uge i32 [[TMP86]], 
[[TMP56]] -; CHECK-NEXT: [[TMP88:%.*]] = icmp uge i32 [[TMP55]], [[TMP85]] -; CHECK-NEXT: [[TMP89:%.*]] = and i1 [[TMP87]], [[TMP88]] -; CHECK-NEXT: [[TMP90:%.*]] = sub i32 [[TMP86]], [[TMP56]] -; CHECK-NEXT: [[TMP91:%.*]] = add i32 [[TMP86]], [[TMP56]] -; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP89]], i32 [[TMP90]], i32 [[TMP86]] -; CHECK-NEXT: [[TMP93:%.*]] = select i1 [[TMP88]], i32 [[TMP92]], i32 [[TMP91]] -; CHECK-NEXT: [[TMP94:%.*]] = xor i32 [[TMP93]], [[TMP51]] -; CHECK-NEXT: [[TMP95:%.*]] = sub i32 [[TMP94]], [[TMP51]] -; CHECK-NEXT: [[TMP96:%.*]] = insertelement <2 x i32> [[TMP48]], i32 [[TMP95]], i64 1 -; CHECK-NEXT: store <2 x i32> [[TMP96]], <2 x i32> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: [[TMP28:%.*]] = mul i32 [[TMP27]], [[TMP8]] +; CHECK-NEXT: [[TMP29:%.*]] = sub i32 [[TMP7]], [[TMP28]] +; CHECK-NEXT: [[TMP30:%.*]] = icmp uge i32 [[TMP29]], [[TMP8]] +; CHECK-NEXT: [[TMP31:%.*]] = sub i32 [[TMP29]], [[TMP8]] +; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP30]], i32 [[TMP31]], i32 [[TMP29]] +; CHECK-NEXT: [[TMP33:%.*]] = icmp uge i32 [[TMP32]], [[TMP8]] +; CHECK-NEXT: [[TMP34:%.*]] = sub i32 [[TMP32]], [[TMP8]] +; CHECK-NEXT: [[TMP35:%.*]] = select i1 [[TMP33]], i32 [[TMP34]], i32 [[TMP32]] +; CHECK-NEXT: [[TMP36:%.*]] = xor i32 [[TMP35]], [[TMP3]] +; CHECK-NEXT: [[TMP37:%.*]] = sub i32 [[TMP36]], [[TMP3]] +; CHECK-NEXT: [[TMP38:%.*]] = insertelement <2 x i32> undef, i32 [[TMP37]], i64 0 +; CHECK-NEXT: [[TMP39:%.*]] = extractelement <2 x i32> [[X]], i64 1 +; CHECK-NEXT: [[TMP40:%.*]] = extractelement <2 x i32> [[SHL_Y]], i64 1 +; CHECK-NEXT: [[TMP41:%.*]] = ashr i32 [[TMP39]], 31 +; CHECK-NEXT: [[TMP42:%.*]] = ashr i32 [[TMP40]], 31 +; CHECK-NEXT: [[TMP43:%.*]] = add i32 [[TMP39]], [[TMP41]] +; CHECK-NEXT: [[TMP44:%.*]] = add i32 [[TMP40]], [[TMP42]] +; CHECK-NEXT: [[TMP45:%.*]] = xor i32 [[TMP43]], [[TMP41]] +; CHECK-NEXT: [[TMP46:%.*]] = xor i32 [[TMP44]], [[TMP42]] +; CHECK-NEXT: [[TMP47:%.*]] = uitofp i32 [[TMP46]] to float +; CHECK-NEXT: 
[[TMP48:%.*]] = call fast float @llvm.amdgcn.rcp.f32(float [[TMP47]]) +; CHECK-NEXT: [[TMP49:%.*]] = fmul fast float [[TMP48]], 0x41EFFFFFC0000000 +; CHECK-NEXT: [[TMP50:%.*]] = fptoui float [[TMP49]] to i32 +; CHECK-NEXT: [[TMP51:%.*]] = sub i32 0, [[TMP46]] +; CHECK-NEXT: [[TMP52:%.*]] = mul i32 [[TMP51]], [[TMP50]] +; CHECK-NEXT: [[TMP53:%.*]] = zext i32 [[TMP50]] to i64 +; CHECK-NEXT: [[TMP54:%.*]] = zext i32 [[TMP52]] to i64 +; CHECK-NEXT: [[TMP55:%.*]] = mul i64 [[TMP53]], [[TMP54]] +; CHECK-NEXT: [[TMP56:%.*]] = trunc i64 [[TMP55]] to i32 +; CHECK-NEXT: [[TMP57:%.*]] = lshr i64 [[TMP55]], 32 +; CHECK-NEXT: [[TMP58:%.*]] = trunc i64 [[TMP57]] to i32 +; CHECK-NEXT: [[TMP59:%.*]] = add i32 [[TMP50]], [[TMP58]] +; CHECK-NEXT: [[TMP60:%.*]] = zext i32 [[TMP45]] to i64 +; CHECK-NEXT: [[TMP61:%.*]] = zext i32 [[TMP59]] to i64 +; CHECK-NEXT: [[TMP62:%.*]] = mul i64 [[TMP60]], [[TMP61]] +; CHECK-NEXT: [[TMP63:%.*]] = trunc i64 [[TMP62]] to i32 +; CHECK-NEXT: [[TMP64:%.*]] = lshr i64 [[TMP62]], 32 +; CHECK-NEXT: [[TMP65:%.*]] = trunc i64 [[TMP64]] to i32 +; CHECK-NEXT: [[TMP66:%.*]] = mul i32 [[TMP65]], [[TMP46]] +; CHECK-NEXT: [[TMP67:%.*]] = sub i32 [[TMP45]], [[TMP66]] +; CHECK-NEXT: [[TMP68:%.*]] = icmp uge i32 [[TMP67]], [[TMP46]] +; CHECK-NEXT: [[TMP69:%.*]] = sub i32 [[TMP67]], [[TMP46]] +; CHECK-NEXT: [[TMP70:%.*]] = select i1 [[TMP68]], i32 [[TMP69]], i32 [[TMP67]] +; CHECK-NEXT: [[TMP71:%.*]] = icmp uge i32 [[TMP70]], [[TMP46]] +; CHECK-NEXT: [[TMP72:%.*]] = sub i32 [[TMP70]], [[TMP46]] +; CHECK-NEXT: [[TMP73:%.*]] = select i1 [[TMP71]], i32 [[TMP72]], i32 [[TMP70]] +; CHECK-NEXT: [[TMP74:%.*]] = xor i32 [[TMP73]], [[TMP41]] +; CHECK-NEXT: [[TMP75:%.*]] = sub i32 [[TMP74]], [[TMP41]] +; CHECK-NEXT: [[TMP76:%.*]] = insertelement <2 x i32> [[TMP38]], i32 [[TMP75]], i64 1 +; CHECK-NEXT: store <2 x i32> [[TMP76]], <2 x i32> addrspace(1)* [[OUT:%.*]], align 8 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v2i32_pow2_shl_denom: ; GCN: ; %bb.0: ; GCN-NEXT: 
s_load_dwordx2 s[2:3], s[0:1], 0xd -; GCN-NEXT: s_movk_i32 s4, 0x1000 -; GCN-NEXT: s_mov_b32 s14, 0x4f800000 -; GCN-NEXT: s_load_dwordx2 s[6:7], s[0:1], 0xb -; GCN-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9 +; GCN-NEXT: s_movk_i32 s6, 0x1000 +; GCN-NEXT: s_mov_b32 s7, 0x4f7ffffe ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_lshl_b32 s2, s4, s2 -; GCN-NEXT: s_ashr_i32 s5, s2, 31 -; GCN-NEXT: s_add_i32 s2, s2, s5 -; GCN-NEXT: s_xor_b32 s13, s2, s5 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s13 -; GCN-NEXT: s_lshl_b32 s2, s4, s3 -; GCN-NEXT: s_ashr_i32 s12, s6, 31 -; GCN-NEXT: s_add_i32 s3, s6, s12 -; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GCN-NEXT: s_lshl_b32 s2, s6, s2 ; GCN-NEXT: s_ashr_i32 s4, s2, 31 -; GCN-NEXT: s_add_i32 s6, s2, s4 -; GCN-NEXT: s_xor_b32 s5, s3, s12 -; GCN-NEXT: v_mul_f32_e32 v0, s14, v0 +; GCN-NEXT: s_add_i32 s2, s2, s4 +; GCN-NEXT: s_xor_b32 s2, s2, s4 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s2 +; GCN-NEXT: s_lshl_b32 s3, s6, s3 +; GCN-NEXT: s_ashr_i32 s6, s3, 31 +; GCN-NEXT: s_add_i32 s3, s3, s6 +; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GCN-NEXT: s_xor_b32 s3, s3, s6 +; GCN-NEXT: s_sub_i32 s6, 0, s2 +; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 +; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xb +; GCN-NEXT: v_mul_f32_e32 v0, s7, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: s_xor_b32 s15, s6, s4 -; GCN-NEXT: s_ashr_i32 s6, s7, 31 -; GCN-NEXT: s_add_i32 s7, s7, s6 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s13 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s13 -; GCN-NEXT: s_xor_b32 s7, s7, s6 -; GCN-NEXT: s_mov_b32 s11, 0xf000 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_cvt_f32_u32_e32 v2, s15 -; GCN-NEXT: s_mov_b32 s10, -1 -; GCN-NEXT: v_add_i32_e32 v3, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v2 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[2:3] -; GCN-NEXT: v_mul_hi_u32 
v0, v0, s5 -; GCN-NEXT: v_mul_f32_e32 v1, s14, v1 +; GCN-NEXT: v_cvt_f32_u32_e32 v1, s3 +; GCN-NEXT: s_waitcnt lgkmcnt(0) +; GCN-NEXT: s_ashr_i32 s8, s0, 31 +; GCN-NEXT: v_mul_lo_u32 v2, s6, v0 +; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 +; GCN-NEXT: s_add_i32 s0, s0, s8 +; GCN-NEXT: s_xor_b32 s0, s0, s8 +; GCN-NEXT: v_mul_hi_u32 v2, v0, v2 +; GCN-NEXT: v_mul_f32_e32 v1, s7, v1 ; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GCN-NEXT: v_mul_lo_u32 v0, v0, s13 -; GCN-NEXT: v_mul_lo_u32 v4, v1, s15 -; GCN-NEXT: v_mul_hi_u32 v5, v1, s15 -; GCN-NEXT: v_sub_i32_e32 v2, vcc, s5, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s5, v0 -; GCN-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v5 -; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v4, v4, v1 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, s13, v2 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s13, v2 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v4, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v4, v1 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v1, v1, s7 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[2:3] -; GCN-NEXT: v_mul_lo_u32 v1, v1, s15 -; GCN-NEXT: v_xor_b32_e32 v0, s12, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s12, v0 -; GCN-NEXT: v_sub_i32_e32 v2, vcc, s7, v1 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], s7, v1 -; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s15, v2 -; GCN-NEXT: v_add_i32_e32 v3, vcc, s15, v2 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s15, v2 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[2:3] -; GCN-NEXT: v_xor_b32_e32 v1, s6, v1 -; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s6, v1 -; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[8:11], 0 +; GCN-NEXT: s_sub_i32 s6, 0, s3 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v2, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s0, v0 +; GCN-NEXT: 
v_mul_lo_u32 v2, s6, v1 +; GCN-NEXT: s_ashr_i32 s9, s1, 31 +; GCN-NEXT: s_add_i32 s1, s1, s9 +; GCN-NEXT: v_mul_lo_u32 v0, v0, s2 +; GCN-NEXT: v_mul_hi_u32 v2, v1, v2 +; GCN-NEXT: s_mov_b32 s7, 0xf000 +; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, s0, v0 +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s2, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, s2, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: s_xor_b32 s0, s1, s9 +; GCN-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; GCN-NEXT: v_mul_hi_u32 v1, s0, v1 +; GCN-NEXT: v_xor_b32_e32 v0, s8, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s8, v0 +; GCN-NEXT: v_mul_lo_u32 v1, v1, s3 +; GCN-NEXT: v_sub_i32_e32 v1, vcc, s0, v1 +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s3, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s3, v1 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s3, v1 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s3, v1 +; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GCN-NEXT: v_xor_b32_e32 v1, s9, v1 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s9, v1 +; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GCN-NEXT: s_endpgm %shl.y = shl <2 x i32> , %y %r = srem <2 x i32> %x, %shl.y @@ -5319,7 +4930,7 @@ define amdgpu_kernel void @srem_v2i32_pow2_shl_denom(<2 x i32> addrspace(1)* %ou define amdgpu_kernel void @udiv_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { ; CHECK-LABEL: @udiv_i64_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = udiv i64 [[X:%.*]], 1235195949943 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i64_oddk_denom: @@ -5454,7 +5065,7 @@ define amdgpu_kernel void @udiv_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { define amdgpu_kernel void @udiv_i64_pow2k_denom(i64 addrspace(1)* %out, i64 %x) { ; CHECK-LABEL: 
@udiv_i64_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = udiv i64 [[X:%.*]], 4096 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i64_pow2k_denom: @@ -5479,7 +5090,7 @@ define amdgpu_kernel void @udiv_i64_pow2_shl_denom(i64 addrspace(1)* %out, i64 % ; CHECK-LABEL: @udiv_i64_pow2_shl_denom( ; CHECK-NEXT: [[SHL_Y:%.*]] = shl i64 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = udiv i64 [[X:%.*]], [[SHL_Y]] -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_i64_pow2_shl_denom: @@ -5511,7 +5122,7 @@ define amdgpu_kernel void @udiv_v2i64_pow2k_denom(<2 x i64> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = udiv i64 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v2i64_pow2k_denom: @@ -5542,7 +5153,7 @@ define amdgpu_kernel void @udiv_v2i64_mixed_pow2k_denom(<2 x i64> addrspace(1)* ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = udiv i64 [[TMP4]], 4095 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v2i64_mixed_pow2k_denom: @@ -5672,7 +5283,7 @@ define amdgpu_kernel void @udiv_v2i64_pow2_shl_denom(<2 x i64> addrspace(1)* %ou ; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[SHL_Y]], i64 1 ; CHECK-NEXT: [[TMP7:%.*]] = udiv i64 
[[TMP5]], [[TMP6]] ; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP4]], i64 [[TMP7]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: udiv_v2i64_pow2_shl_denom: @@ -5702,7 +5313,7 @@ define amdgpu_kernel void @udiv_v2i64_pow2_shl_denom(<2 x i64> addrspace(1)* %ou define amdgpu_kernel void @urem_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { ; CHECK-LABEL: @urem_i64_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = urem i64 [[X:%.*]], 1235195393993 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i64_oddk_denom: @@ -5836,7 +5447,7 @@ define amdgpu_kernel void @urem_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { define amdgpu_kernel void @urem_i64_pow2k_denom(i64 addrspace(1)* %out, i64 %x) { ; CHECK-LABEL: @urem_i64_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = urem i64 [[X:%.*]], 4096 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i64_pow2k_denom: @@ -5861,7 +5472,7 @@ define amdgpu_kernel void @urem_i64_pow2_shl_denom(i64 addrspace(1)* %out, i64 % ; CHECK-LABEL: @urem_i64_pow2_shl_denom( ; CHECK-NEXT: [[SHL_Y:%.*]] = shl i64 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = urem i64 [[X:%.*]], [[SHL_Y]] -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_i64_pow2_shl_denom: @@ -5897,7 +5508,7 @@ define amdgpu_kernel void @urem_v2i64_pow2k_denom(<2 x i64> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = urem i64 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> 
[[TMP3]], i64 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v2i64_pow2k_denom: @@ -5932,7 +5543,7 @@ define amdgpu_kernel void @urem_v2i64_pow2_shl_denom(<2 x i64> addrspace(1)* %ou ; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[SHL_Y]], i64 1 ; CHECK-NEXT: [[TMP7:%.*]] = urem i64 [[TMP5]], [[TMP6]] ; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP4]], i64 [[TMP7]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: urem_v2i64_pow2_shl_denom: @@ -5968,7 +5579,7 @@ define amdgpu_kernel void @urem_v2i64_pow2_shl_denom(<2 x i64> addrspace(1)* %ou define amdgpu_kernel void @sdiv_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { ; CHECK-LABEL: @sdiv_i64_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = sdiv i64 [[X:%.*]], 1235195 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i64_oddk_denom: @@ -6098,7 +5709,7 @@ define amdgpu_kernel void @sdiv_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { define amdgpu_kernel void @sdiv_i64_pow2k_denom(i64 addrspace(1)* %out, i64 %x) { ; CHECK-LABEL: @sdiv_i64_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = sdiv i64 [[X:%.*]], 4096 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i64_pow2k_denom: @@ -6127,7 +5738,7 @@ define amdgpu_kernel void @sdiv_i64_pow2_shl_denom(i64 addrspace(1)* %out, i64 % ; CHECK-LABEL: @sdiv_i64_pow2_shl_denom( ; CHECK-NEXT: [[SHL_Y:%.*]] = shl i64 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = sdiv i64 [[X:%.*]], [[SHL_Y]] -; 
CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_i64_pow2_shl_denom: @@ -6284,7 +5895,7 @@ define amdgpu_kernel void @sdiv_v2i64_pow2k_denom(<2 x i64> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = sdiv i64 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v2i64_pow2k_denom: @@ -6323,7 +5934,7 @@ define amdgpu_kernel void @ssdiv_v2i64_mixed_pow2k_denom(<2 x i64> addrspace(1)* ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = sdiv i64 [[TMP4]], 4095 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: ssdiv_v2i64_mixed_pow2k_denom: @@ -6468,7 +6079,7 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(<2 x i64> addrspace(1)* %ou ; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[SHL_Y]], i64 1 ; CHECK-NEXT: [[TMP7:%.*]] = sdiv i64 [[TMP5]], [[TMP6]] ; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP4]], i64 [[TMP7]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: sdiv_v2i64_pow2_shl_denom: @@ -6749,7 +6360,7 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(<2 x i64> addrspace(1)* %ou define amdgpu_kernel void @srem_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { ; 
CHECK-LABEL: @srem_i64_oddk_denom( ; CHECK-NEXT: [[R:%.*]] = srem i64 [[X:%.*]], 1235195 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i64_oddk_denom: @@ -6877,7 +6488,7 @@ define amdgpu_kernel void @srem_i64_oddk_denom(i64 addrspace(1)* %out, i64 %x) { define amdgpu_kernel void @srem_i64_pow2k_denom(i64 addrspace(1)* %out, i64 %x) { ; CHECK-LABEL: @srem_i64_pow2k_denom( ; CHECK-NEXT: [[R:%.*]] = srem i64 [[X:%.*]], 4096 -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i64_pow2k_denom: @@ -6908,7 +6519,7 @@ define amdgpu_kernel void @srem_i64_pow2_shl_denom(i64 addrspace(1)* %out, i64 % ; CHECK-LABEL: @srem_i64_pow2_shl_denom( ; CHECK-NEXT: [[SHL_Y:%.*]] = shl i64 4096, [[Y:%.*]] ; CHECK-NEXT: [[R:%.*]] = srem i64 [[X:%.*]], [[SHL_Y]] -; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store i64 [[R]], i64 addrspace(1)* [[OUT:%.*]], align 4 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_i64_pow2_shl_denom: @@ -7063,7 +6674,7 @@ define amdgpu_kernel void @srem_v2i64_pow2k_denom(<2 x i64> addrspace(1)* %out, ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[X]], i64 1 ; CHECK-NEXT: [[TMP5:%.*]] = srem i64 [[TMP4]], 4096 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[TMP5]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v2i64_pow2k_denom: @@ -7110,7 +6721,7 @@ define amdgpu_kernel void @srem_v2i64_pow2_shl_denom(<2 x i64> addrspace(1)* %ou ; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[SHL_Y]], i64 1 ; CHECK-NEXT: [[TMP7:%.*]] = srem i64 [[TMP5]], [[TMP6]] ; CHECK-NEXT: [[TMP8:%.*]] = 
insertelement <2 x i64> [[TMP4]], i64 [[TMP7]], i64 1 -; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]] +; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64> addrspace(1)* [[OUT:%.*]], align 16 ; CHECK-NEXT: ret void ; ; GCN-LABEL: srem_v2i64_pow2_shl_denom: diff --git a/llvm/test/CodeGen/AMDGPU/bypass-div.ll b/llvm/test/CodeGen/AMDGPU/bypass-div.ll index 5cc320a3658b..9fcd97721ee7 100644 --- a/llvm/test/CodeGen/AMDGPU/bypass-div.ll +++ b/llvm/test/CodeGen/AMDGPU/bypass-div.ll @@ -661,32 +661,28 @@ define i32 @sdiv32(i32 %a, i32 %b) { ; GFX9-NEXT: v_add_u32_e32 v1, v1, v2 ; GFX9-NEXT: v_xor_b32_e32 v1, v1, v2 ; GFX9-NEXT: v_cvt_f32_u32_e32 v3, v1 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; GFX9-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 -; GFX9-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GFX9-NEXT: v_mul_lo_u32 v4, v3, v1 -; GFX9-NEXT: v_mul_hi_u32 v5, v3, v1 -; GFX9-NEXT: v_sub_u32_e32 v6, 0, v4 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; GFX9-NEXT: v_mul_hi_u32 v4, v4, v3 +; GFX9-NEXT: v_sub_u32_e32 v4, 0, v1 ; GFX9-NEXT: v_ashrrev_i32_e32 v5, 31, v0 ; GFX9-NEXT: v_add_u32_e32 v0, v0, v5 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v3, v3 ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v5 -; GFX9-NEXT: v_add_u32_e32 v6, v3, v4 -; GFX9-NEXT: v_sub_u32_e32 v3, v3, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v6, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v0 ; GFX9-NEXT: v_xor_b32_e32 v2, v5, v2 +; GFX9-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 +; GFX9-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GFX9-NEXT: v_mul_lo_u32 v4, v4, v3 +; GFX9-NEXT: v_mul_hi_u32 v4, v3, v4 +; GFX9-NEXT: v_add_u32_e32 v3, v3, v4 +; GFX9-NEXT: v_mul_hi_u32 v3, v0, v3 ; GFX9-NEXT: v_mul_lo_u32 v4, v3, v1 ; GFX9-NEXT: v_add_u32_e32 v5, 1, v3 -; GFX9-NEXT: v_add_u32_e32 v6, -1, v3 -; GFX9-NEXT: v_sub_u32_e32 v7, v0, v4 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v1 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: 
v_cndmask_b32_e64 v0, v3, v5, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v4 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_sub_u32_e32 v4, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GFX9-NEXT: v_add_u32_e32 v4, 1, v3 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v2 ; GFX9-NEXT: v_sub_u32_e32 v0, v0, v2 ; GFX9-NEXT: s_setpc_b64 s[30:31] @@ -699,28 +695,24 @@ define i32 @udiv32(i32 %a, i32 %b) { ; GFX9: ; %bb.0: ; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GFX9-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v1 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GFX9-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GFX9-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GFX9-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GFX9-NEXT: v_mul_lo_u32 v3, v2, v1 -; GFX9-NEXT: v_mul_hi_u32 v4, v2, v1 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v2 -; GFX9-NEXT: v_add_u32_e32 v4, v2, v3 -; GFX9-NEXT: v_sub_u32_e32 v2, v2, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v2, v2, v0 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v2 +; GFX9-NEXT: v_mul_hi_u32 v3, v2, v3 +; GFX9-NEXT: v_add_u32_e32 v2, v2, v3 +; GFX9-NEXT: v_mul_hi_u32 v2, v0, v2 ; GFX9-NEXT: v_mul_lo_u32 v3, v2, v1 ; GFX9-NEXT: v_add_u32_e32 v4, 1, v2 -; GFX9-NEXT: v_add_u32_e32 v5, -1, v2 -; GFX9-NEXT: v_sub_u32_e32 v6, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v3 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_sub_u32_e32 v3, v0, v1 +; GFX9-NEXT: 
v_cndmask_b32_e32 v2, v2, v4, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX9-NEXT: v_add_u32_e32 v3, 1, v2 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; GFX9-NEXT: s_setpc_b64 s[30:31] %d = udiv i32 %a, %b ret i32 %d @@ -734,31 +726,25 @@ define i32 @srem32(i32 %a, i32 %b) { ; GFX9-NEXT: v_add_u32_e32 v1, v1, v2 ; GFX9-NEXT: v_xor_b32_e32 v1, v1, v2 ; GFX9-NEXT: v_cvt_f32_u32_e32 v2, v1 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GFX9-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 -; GFX9-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GFX9-NEXT: v_mul_lo_u32 v3, v2, v1 -; GFX9-NEXT: v_mul_hi_u32 v4, v2, v1 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v1 ; GFX9-NEXT: v_ashrrev_i32_e32 v4, 31, v0 ; GFX9-NEXT: v_add_u32_e32 v0, v0, v4 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v2, v2 ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v4 -; GFX9-NEXT: v_add_u32_e32 v5, v2, v3 -; GFX9-NEXT: v_sub_u32_e32 v2, v2, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v2, v2, v0 +; GFX9-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 +; GFX9-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v2 +; GFX9-NEXT: v_mul_hi_u32 v3, v2, v3 +; GFX9-NEXT: v_add_u32_e32 v2, v2, v3 +; GFX9-NEXT: v_mul_hi_u32 v2, v0, v2 ; GFX9-NEXT: v_mul_lo_u32 v2, v2, v1 -; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; GFX9-NEXT: v_sub_u32_e32 v0, v3, v1 -; GFX9-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GFX9-NEXT: v_add_u32_e32 v5, v3, v1 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v2 +; GFX9-NEXT: v_sub_u32_e32 v2, v0, v1 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GFX9-NEXT: v_sub_u32_e32 
v2, v0, v1 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v4 ; GFX9-NEXT: v_sub_u32_e32 v0, v0, v4 ; GFX9-NEXT: s_setpc_b64 s[30:31] @@ -771,28 +757,22 @@ define i32 @urem32(i32 %a, i32 %b) { ; GFX9: ; %bb.0: ; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GFX9-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v1 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GFX9-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GFX9-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GFX9-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GFX9-NEXT: v_mul_lo_u32 v3, v2, v1 -; GFX9-NEXT: v_mul_hi_u32 v4, v2, v1 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v2 -; GFX9-NEXT: v_add_u32_e32 v4, v2, v3 -; GFX9-NEXT: v_sub_u32_e32 v2, v2, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v2, v2, v0 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v2 +; GFX9-NEXT: v_mul_hi_u32 v3, v2, v3 +; GFX9-NEXT: v_add_u32_e32 v2, v2, v3 +; GFX9-NEXT: v_mul_hi_u32 v2, v0, v2 ; GFX9-NEXT: v_mul_lo_u32 v2, v2, v1 -; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; GFX9-NEXT: v_sub_u32_e32 v0, v3, v1 -; GFX9-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GFX9-NEXT: v_add_u32_e32 v4, v3, v1 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v2 +; GFX9-NEXT: v_sub_u32_e32 v2, v0, v1 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GFX9-NEXT: v_sub_u32_e32 v2, v0, v1 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; GFX9-NEXT: s_setpc_b64 s[30:31] %d = urem i32 %a, %b ret i32 %d diff --git a/llvm/test/CodeGen/AMDGPU/idiv-licm.ll b/llvm/test/CodeGen/AMDGPU/idiv-licm.ll index 
d9699fe4ce66..cf17589f135d 100644 --- a/llvm/test/CodeGen/AMDGPU/idiv-licm.ll +++ b/llvm/test/CodeGen/AMDGPU/idiv-licm.ll @@ -5,45 +5,41 @@ define amdgpu_kernel void @udiv32_invariant_denom(i32 addrspace(1)* nocapture %a ; GFX9-LABEL: udiv32_invariant_denom: ; GFX9: ; %bb.0: ; %bb ; GFX9-NEXT: s_load_dword s2, s[0:1], 0x2c -; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24 -; GFX9-NEXT: s_mov_b64 s[6:7], 0 +; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 +; GFX9-NEXT: s_mov_b64 s[4:5], 0 ; GFX9-NEXT: s_waitcnt lgkmcnt(0) ; GFX9-NEXT: v_cvt_f32_u32_e32 v0, s2 ; GFX9-NEXT: s_sub_i32 s3, 0, s2 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GFX9-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_lo_u32 v1, v0, s2 -; GFX9-NEXT: v_mul_hi_u32 v2, v0, s2 -; GFX9-NEXT: v_sub_u32_e32 v3, 0, v1 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX9-NEXT: v_add_u32_e32 v2, v0, v1 -; GFX9-NEXT: v_sub_u32_e32 v0, v0, v1 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GFX9-NEXT: v_mul_lo_u32 v1, s3, v0 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 +; GFX9-NEXT: v_add_u32_e32 v0, v0, v1 ; GFX9-NEXT: BB0_1: ; %bb3 ; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1 -; GFX9-NEXT: v_mul_lo_u32 v3, v0, s7 -; GFX9-NEXT: v_mul_hi_u32 v4, v0, s6 -; GFX9-NEXT: v_mov_b32_e32 v1, s4 -; GFX9-NEXT: v_mov_b32_e32 v2, s5 +; GFX9-NEXT: v_mul_lo_u32 v3, s5, v0 +; GFX9-NEXT: v_mul_hi_u32 v4, s4, v0 +; GFX9-NEXT: v_mov_b32_e32 v2, s1 +; GFX9-NEXT: v_mov_b32_e32 v1, s0 ; GFX9-NEXT: v_add_u32_e32 v3, v4, v3 ; GFX9-NEXT: v_mul_lo_u32 v4, s3, v3 -; GFX9-NEXT: v_mul_lo_u32 v5, v3, s2 -; GFX9-NEXT: v_add_u32_e32 v6, 1, v3 -; GFX9-NEXT: v_add_u32_e32 v7, -1, v3 -; GFX9-NEXT: v_add_u32_e32 v4, s6, v4 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, s6, v5 -; GFX9-NEXT: v_cmp_le_u32_e64 s[0:1], s2, v4 -; GFX9-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GFX9-NEXT: 
s_add_u32 s6, s6, 1 -; GFX9-NEXT: s_addc_u32 s7, s7, 0 -; GFX9-NEXT: s_add_u32 s4, s4, 4 -; GFX9-NEXT: v_cndmask_b32_e64 v3, v3, v6, s[0:1] +; GFX9-NEXT: v_not_b32_e32 v6, v3 +; GFX9-NEXT: v_mul_lo_u32 v6, s2, v6 +; GFX9-NEXT: v_add_u32_e32 v5, 1, v3 +; GFX9-NEXT: v_add_u32_e32 v4, s4, v4 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v4 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GFX9-NEXT: v_add_u32_e32 v5, s4, v6 +; GFX9-NEXT: s_add_u32 s4, s4, 1 ; GFX9-NEXT: s_addc_u32 s5, s5, 0 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v3, vcc -; GFX9-NEXT: s_cmpk_eq_i32 s6, 0x400 +; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v5, vcc +; GFX9-NEXT: s_add_u32 s0, s0, 4 +; GFX9-NEXT: s_addc_u32 s1, s1, 0 +; GFX9-NEXT: v_add_u32_e32 v5, 1, v3 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v4 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GFX9-NEXT: s_cmpk_eq_i32 s4, 0x400 ; GFX9-NEXT: global_store_dword v[1:2], v3, off ; GFX9-NEXT: s_cbranch_scc0 BB0_1 ; GFX9-NEXT: ; %bb.2: ; %bb2 @@ -69,49 +65,39 @@ define amdgpu_kernel void @urem32_invariant_denom(i32 addrspace(1)* nocapture %a ; GFX9-LABEL: urem32_invariant_denom: ; GFX9: ; %bb.0: ; %bb ; GFX9-NEXT: s_load_dword s2, s[0:1], 0x2c -; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24 -; GFX9-NEXT: s_mov_b64 s[6:7], 0 +; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 +; GFX9-NEXT: s_mov_b64 s[4:5], 0 ; GFX9-NEXT: s_waitcnt lgkmcnt(0) ; GFX9-NEXT: v_cvt_f32_u32_e32 v0, s2 ; GFX9-NEXT: s_sub_i32 s3, 0, s2 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GFX9-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_lo_u32 v1, v0, s2 -; GFX9-NEXT: v_mul_hi_u32 v2, v0, s2 -; GFX9-NEXT: v_sub_u32_e32 v3, 0, v1 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX9-NEXT: v_add_u32_e32 v2, v0, v1 -; GFX9-NEXT: v_sub_u32_e32 v0, v0, v1 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; 
GFX9-NEXT: v_mul_lo_u32 v1, s3, v0 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 +; GFX9-NEXT: v_add_u32_e32 v0, v0, v1 ; GFX9-NEXT: BB1_1: ; %bb3 ; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1 -; GFX9-NEXT: v_mul_lo_u32 v3, v0, s7 -; GFX9-NEXT: v_mul_hi_u32 v4, v0, s6 -; GFX9-NEXT: v_mov_b32_e32 v1, s4 -; GFX9-NEXT: v_mov_b32_e32 v2, s5 +; GFX9-NEXT: v_mul_lo_u32 v3, s5, v0 +; GFX9-NEXT: v_mul_hi_u32 v4, s4, v0 +; GFX9-NEXT: v_mov_b32_e32 v2, s1 +; GFX9-NEXT: v_mov_b32_e32 v1, s0 ; GFX9-NEXT: v_add_u32_e32 v3, v4, v3 -; GFX9-NEXT: v_mul_lo_u32 v5, s3, v3 -; GFX9-NEXT: v_mul_lo_u32 v4, v3, s2 -; GFX9-NEXT: v_not_b32_e32 v6, v3 -; GFX9-NEXT: v_sub_u32_e32 v3, 1, v3 +; GFX9-NEXT: v_mul_lo_u32 v4, s3, v3 +; GFX9-NEXT: v_not_b32_e32 v3, v3 ; GFX9-NEXT: v_mul_lo_u32 v3, s2, v3 -; GFX9-NEXT: v_mul_lo_u32 v6, s2, v6 -; GFX9-NEXT: v_add_u32_e32 v5, s6, v5 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], s6, v4 -; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v5 -; GFX9-NEXT: s_and_b64 vcc, vcc, s[0:1] -; GFX9-NEXT: v_add_u32_e32 v4, s6, v6 -; GFX9-NEXT: v_add_u32_e32 v3, s6, v3 -; GFX9-NEXT: s_add_u32 s6, s6, 1 -; GFX9-NEXT: s_addc_u32 s7, s7, 0 -; GFX9-NEXT: s_add_u32 s4, s4, 4 +; GFX9-NEXT: v_add_u32_e32 v4, s4, v4 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v4 +; GFX9-NEXT: v_add_u32_e32 v3, s4, v3 +; GFX9-NEXT: s_add_u32 s4, s4, 1 ; GFX9-NEXT: s_addc_u32 s5, s5, 0 -; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v4, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v3, v3, v4, s[0:1] -; GFX9-NEXT: s_cmpk_eq_i32 s6, 0x400 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v4, v3, vcc +; GFX9-NEXT: s_add_u32 s0, s0, 4 +; GFX9-NEXT: s_addc_u32 s1, s1, 0 +; GFX9-NEXT: v_subrev_u32_e32 v4, s2, v3 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v3 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GFX9-NEXT: s_cmpk_eq_i32 s4, 0x400 ; GFX9-NEXT: global_store_dword v[1:2], v3, off ; GFX9-NEXT: s_cbranch_scc0 BB1_1 ; GFX9-NEXT: ; %bb.2: ; %bb2 @@ -137,45 +123,41 @@ define amdgpu_kernel void @sdiv32_invariant_denom(i32 addrspace(1)* nocapture %a ; 
GFX9-LABEL: sdiv32_invariant_denom: ; GFX9: ; %bb.0: ; %bb ; GFX9-NEXT: s_load_dword s3, s[0:1], 0x2c -; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24 -; GFX9-NEXT: s_mov_b32 s6, 0 +; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 ; GFX9-NEXT: s_waitcnt lgkmcnt(0) ; GFX9-NEXT: s_ashr_i32 s2, s3, 31 ; GFX9-NEXT: s_add_i32 s3, s3, s2 ; GFX9-NEXT: s_xor_b32 s3, s3, s2 ; GFX9-NEXT: v_cvt_f32_u32_e32 v0, s3 +; GFX9-NEXT: s_sub_i32 s4, 0, s3 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GFX9-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_lo_u32 v1, v0, s3 -; GFX9-NEXT: v_mul_hi_u32 v2, v0, s3 -; GFX9-NEXT: v_sub_u32_e32 v3, 0, v1 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX9-NEXT: v_add_u32_e32 v2, v0, v1 -; GFX9-NEXT: v_sub_u32_e32 v0, v0, v1 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GFX9-NEXT: v_mul_lo_u32 v1, s4, v0 +; GFX9-NEXT: s_mov_b32 s4, 0 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 +; GFX9-NEXT: v_add_u32_e32 v0, v0, v1 ; GFX9-NEXT: BB2_1: ; %bb3 ; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1 -; GFX9-NEXT: v_mul_hi_u32 v3, v0, s6 -; GFX9-NEXT: v_mov_b32_e32 v1, s4 -; GFX9-NEXT: v_mov_b32_e32 v2, s5 +; GFX9-NEXT: v_mul_hi_u32 v3, s4, v0 +; GFX9-NEXT: v_mov_b32_e32 v2, s1 +; GFX9-NEXT: v_mov_b32_e32 v1, s0 ; GFX9-NEXT: v_mul_lo_u32 v4, v3, s3 ; GFX9-NEXT: v_add_u32_e32 v5, 1, v3 -; GFX9-NEXT: v_add_u32_e32 v6, -1, v3 -; GFX9-NEXT: v_sub_u32_e32 v7, s6, v4 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, s6, v4 -; GFX9-NEXT: v_cmp_le_u32_e64 s[0:1], s3, v7 -; GFX9-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GFX9-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] -; GFX9-NEXT: s_add_i32 s6, s6, 1 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v6, v3, vcc -; GFX9-NEXT: s_add_u32 s4, s4, 4 +; GFX9-NEXT: v_sub_u32_e32 v4, s4, v4 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s3, v4 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc 
+; GFX9-NEXT: v_subrev_u32_e32 v5, s3, v4 +; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v5, vcc +; GFX9-NEXT: s_add_i32 s4, s4, 1 +; GFX9-NEXT: v_add_u32_e32 v5, 1, v3 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s3, v4 +; GFX9-NEXT: s_add_u32 s0, s0, 4 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GFX9-NEXT: s_addc_u32 s1, s1, 0 ; GFX9-NEXT: v_xor_b32_e32 v3, s2, v3 -; GFX9-NEXT: s_addc_u32 s5, s5, 0 +; GFX9-NEXT: s_cmpk_eq_i32 s4, 0x400 ; GFX9-NEXT: v_subrev_u32_e32 v3, s2, v3 -; GFX9-NEXT: s_cmpk_eq_i32 s6, 0x400 ; GFX9-NEXT: global_store_dword v[1:2], v3, off ; GFX9-NEXT: s_cbranch_scc0 BB2_1 ; GFX9-NEXT: ; %bb.2: ; %bb2 @@ -201,43 +183,37 @@ define amdgpu_kernel void @srem32_invariant_denom(i32 addrspace(1)* nocapture %a ; GFX9-LABEL: srem32_invariant_denom: ; GFX9: ; %bb.0: ; %bb ; GFX9-NEXT: s_load_dword s2, s[0:1], 0x2c -; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24 +; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 ; GFX9-NEXT: s_waitcnt lgkmcnt(0) ; GFX9-NEXT: s_ashr_i32 s3, s2, 31 ; GFX9-NEXT: s_add_i32 s2, s2, s3 ; GFX9-NEXT: s_xor_b32 s2, s2, s3 ; GFX9-NEXT: v_cvt_f32_u32_e32 v0, s2 -; GFX9-NEXT: s_mov_b32 s3, 0 +; GFX9-NEXT: s_sub_i32 s3, 0, s2 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GFX9-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GFX9-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX9-NEXT: v_mul_lo_u32 v1, v0, s2 -; GFX9-NEXT: v_mul_hi_u32 v2, v0, s2 -; GFX9-NEXT: v_sub_u32_e32 v3, 0, v1 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX9-NEXT: v_add_u32_e32 v2, v0, v1 -; GFX9-NEXT: v_sub_u32_e32 v0, v0, v1 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GFX9-NEXT: v_mul_lo_u32 v1, s3, v0 +; GFX9-NEXT: s_mov_b32 s3, 0 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 +; GFX9-NEXT: v_add_u32_e32 v0, v0, v1 ; GFX9-NEXT: BB3_1: ; %bb3 ; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1 -; GFX9-NEXT: v_mul_hi_u32 v3, v0, s3 -; GFX9-NEXT: v_mov_b32_e32 v1, 
s4 -; GFX9-NEXT: v_mov_b32_e32 v2, s5 +; GFX9-NEXT: v_mul_hi_u32 v3, s3, v0 +; GFX9-NEXT: v_mov_b32_e32 v2, s1 +; GFX9-NEXT: v_mov_b32_e32 v1, s0 ; GFX9-NEXT: v_mul_lo_u32 v3, v3, s2 -; GFX9-NEXT: v_sub_u32_e32 v4, s3, v3 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], s3, v3 -; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v4 +; GFX9-NEXT: v_sub_u32_e32 v3, s3, v3 ; GFX9-NEXT: s_add_i32 s3, s3, 1 -; GFX9-NEXT: s_and_b64 vcc, vcc, s[0:1] -; GFX9-NEXT: v_subrev_u32_e32 v3, s2, v4 -; GFX9-NEXT: s_add_u32 s4, s4, 4 -; GFX9-NEXT: s_addc_u32 s5, s5, 0 -; GFX9-NEXT: v_add_u32_e32 v5, s2, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v4, v3, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v3, v5, v3, s[0:1] +; GFX9-NEXT: v_subrev_u32_e32 v4, s2, v3 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v3 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GFX9-NEXT: s_add_u32 s0, s0, 4 +; GFX9-NEXT: s_addc_u32 s1, s1, 0 +; GFX9-NEXT: v_subrev_u32_e32 v4, s2, v3 +; GFX9-NEXT: v_cmp_le_u32_e32 vcc, s2, v3 ; GFX9-NEXT: s_cmpk_eq_i32 s3, 0x400 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc ; GFX9-NEXT: global_store_dword v[1:2], v3, off ; GFX9-NEXT: s_cbranch_scc0 BB3_1 ; GFX9-NEXT: ; %bb.2: ; %bb2 diff --git a/llvm/test/CodeGen/AMDGPU/sdiv.ll b/llvm/test/CodeGen/AMDGPU/sdiv.ll index dd87d23481ce..bb932b403f31 100644 --- a/llvm/test/CodeGen/AMDGPU/sdiv.ll +++ b/llvm/test/CodeGen/AMDGPU/sdiv.ll @@ -16,48 +16,44 @@ define amdgpu_kernel void @sdiv_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) { ; GCN-LABEL: sdiv_i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9 +; GCN-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9 ; GCN-NEXT: s_mov_b32 s7, 0xf000 ; GCN-NEXT: s_mov_b32 s6, -1 -; GCN-NEXT: s_mov_b32 s2, s6 -; GCN-NEXT: s_mov_b32 s3, s7 +; GCN-NEXT: s_mov_b32 s10, s6 +; GCN-NEXT: s_mov_b32 s11, s7 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_mov_b32 s0, s10 -; GCN-NEXT: s_mov_b32 s1, s11 -; GCN-NEXT: buffer_load_dwordx2 v[0:1], off, s[0:3], 0 -; GCN-NEXT: s_mov_b32 s4, s8 -; GCN-NEXT: s_mov_b32 s5, s9 +; 
GCN-NEXT: s_mov_b32 s8, s2 +; GCN-NEXT: s_mov_b32 s9, s3 +; GCN-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 +; GCN-NEXT: s_mov_b32 s4, s0 +; GCN-NEXT: s_mov_b32 s5, s1 ; GCN-NEXT: s_waitcnt vmcnt(0) ; GCN-NEXT: v_ashrrev_i32_e32 v2, 31, v1 ; GCN-NEXT: v_add_i32_e32 v1, vcc, v2, v1 ; GCN-NEXT: v_xor_b32_e32 v1, v1, v2 ; GCN-NEXT: v_cvt_f32_u32_e32 v3, v1 -; GCN-NEXT: v_ashrrev_i32_e32 v6, 31, v0 -; GCN-NEXT: v_add_i32_e32 v0, vcc, v6, v0 -; GCN-NEXT: v_xor_b32_e32 v0, v0, v6 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v1 +; GCN-NEXT: v_ashrrev_i32_e32 v5, 31, v0 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v5, v0 ; GCN-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; GCN-NEXT: v_xor_b32_e32 v2, v6, v2 -; GCN-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; GCN-NEXT: v_xor_b32_e32 v0, v0, v5 +; GCN-NEXT: v_xor_b32_e32 v2, v5, v2 +; GCN-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GCN-NEXT: v_mul_lo_u32 v4, v3, v1 -; GCN-NEXT: v_mul_hi_u32 v5, v3, v1 -; GCN-NEXT: v_sub_i32_e32 v7, vcc, 0, v4 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v5 -; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v7, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v4, v4, v3 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v4, v3 -; GCN-NEXT: v_subrev_i32_e32 v3, vcc, v4, v3 -; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v3, v3, v0 +; GCN-NEXT: v_mul_lo_u32 v4, v4, v3 +; GCN-NEXT: v_mul_hi_u32 v4, v3, v4 +; GCN-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; GCN-NEXT: v_mul_hi_u32 v3, v0, v3 ; GCN-NEXT: v_mul_lo_u32 v4, v3, v1 ; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v3 -; GCN-NEXT: v_add_i32_e32 v6, vcc, -1, v3 -; GCN-NEXT: v_subrev_i32_e32 v7, vcc, v4, v0 -; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v7, v1 -; GCN-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v3, v5, s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v4, v0 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v1 +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, v1, 
v0 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] +; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[0:1] +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc ; GCN-NEXT: v_xor_b32_e32 v0, v0, v2 ; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 ; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 @@ -65,48 +61,44 @@ define amdgpu_kernel void @sdiv_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %i ; ; TONGA-LABEL: sdiv_i32: ; TONGA: ; %bb.0: -; TONGA-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x24 +; TONGA-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24 ; TONGA-NEXT: s_mov_b32 s7, 0xf000 ; TONGA-NEXT: s_mov_b32 s6, -1 -; TONGA-NEXT: s_mov_b32 s2, s6 -; TONGA-NEXT: s_mov_b32 s3, s7 +; TONGA-NEXT: s_mov_b32 s10, s6 +; TONGA-NEXT: s_mov_b32 s11, s7 ; TONGA-NEXT: s_waitcnt lgkmcnt(0) -; TONGA-NEXT: s_mov_b32 s0, s10 -; TONGA-NEXT: s_mov_b32 s1, s11 -; TONGA-NEXT: buffer_load_dwordx2 v[0:1], off, s[0:3], 0 -; TONGA-NEXT: s_mov_b32 s4, s8 -; TONGA-NEXT: s_mov_b32 s5, s9 +; TONGA-NEXT: s_mov_b32 s8, s2 +; TONGA-NEXT: s_mov_b32 s9, s3 +; TONGA-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 +; TONGA-NEXT: s_mov_b32 s4, s0 +; TONGA-NEXT: s_mov_b32 s5, s1 ; TONGA-NEXT: s_waitcnt vmcnt(0) ; TONGA-NEXT: v_ashrrev_i32_e32 v2, 31, v1 ; TONGA-NEXT: v_add_u32_e32 v1, vcc, v2, v1 ; TONGA-NEXT: v_xor_b32_e32 v1, v1, v2 ; TONGA-NEXT: v_cvt_f32_u32_e32 v3, v1 -; TONGA-NEXT: v_ashrrev_i32_e32 v6, 31, v0 -; TONGA-NEXT: v_add_u32_e32 v0, vcc, v6, v0 -; TONGA-NEXT: v_xor_b32_e32 v0, v0, v6 +; TONGA-NEXT: v_sub_u32_e32 v4, vcc, 0, v1 +; TONGA-NEXT: v_ashrrev_i32_e32 v5, 31, v0 +; TONGA-NEXT: v_add_u32_e32 v0, vcc, v5, v0 ; TONGA-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; TONGA-NEXT: v_xor_b32_e32 v2, v6, v2 -; TONGA-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; TONGA-NEXT: v_xor_b32_e32 v0, v0, v5 +; TONGA-NEXT: v_xor_b32_e32 v2, v5, v2 +; TONGA-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; TONGA-NEXT: v_cvt_u32_f32_e32 v3, v3 -; TONGA-NEXT: 
v_mul_lo_u32 v4, v3, v1 -; TONGA-NEXT: v_mul_hi_u32 v5, v3, v1 -; TONGA-NEXT: v_sub_u32_e32 v7, vcc, 0, v4 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v5 -; TONGA-NEXT: v_cndmask_b32_e64 v4, v4, v7, s[0:1] -; TONGA-NEXT: v_mul_hi_u32 v4, v4, v3 -; TONGA-NEXT: v_add_u32_e32 v5, vcc, v4, v3 -; TONGA-NEXT: v_subrev_u32_e32 v3, vcc, v4, v3 -; TONGA-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] -; TONGA-NEXT: v_mul_hi_u32 v3, v3, v0 +; TONGA-NEXT: v_mul_lo_u32 v4, v4, v3 +; TONGA-NEXT: v_mul_hi_u32 v4, v3, v4 +; TONGA-NEXT: v_add_u32_e32 v3, vcc, v4, v3 +; TONGA-NEXT: v_mul_hi_u32 v3, v0, v3 ; TONGA-NEXT: v_mul_lo_u32 v4, v3, v1 ; TONGA-NEXT: v_add_u32_e32 v5, vcc, 1, v3 -; TONGA-NEXT: v_add_u32_e32 v6, vcc, -1, v3 -; TONGA-NEXT: v_subrev_u32_e32 v7, vcc, v4, v0 -; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v7, v1 -; TONGA-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; TONGA-NEXT: v_cndmask_b32_e64 v0, v3, v5, s[0:1] -; TONGA-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc +; TONGA-NEXT: v_subrev_u32_e32 v0, vcc, v4, v0 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v1 +; TONGA-NEXT: v_subrev_u32_e32 v4, vcc, v1, v0 +; TONGA-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] +; TONGA-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v4, vcc, 1, v3 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; TONGA-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc ; TONGA-NEXT: v_xor_b32_e32 v0, v0, v2 ; TONGA-NEXT: v_sub_u32_e32 v0, vcc, v0, v2 ; TONGA-NEXT: buffer_store_dword v0, off, s[4:7], 0 @@ -114,51 +106,47 @@ define amdgpu_kernel void @sdiv_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %i ; ; GFX9-LABEL: sdiv_i32: ; GFX9: ; %bb.0: -; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24 -; GFX9-NEXT: s_mov_b32 s7, 0xf000 -; GFX9-NEXT: s_mov_b32 s6, -1 -; GFX9-NEXT: s_mov_b32 s10, s6 -; GFX9-NEXT: s_mov_b32 s11, s7 +; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24 +; GFX9-NEXT: s_mov_b32 s3, 0xf000 +; GFX9-NEXT: s_mov_b32 s2, -1 +; GFX9-NEXT: s_mov_b32 s10, s2 +; 
GFX9-NEXT: s_mov_b32 s11, s3 ; GFX9-NEXT: s_waitcnt lgkmcnt(0) -; GFX9-NEXT: s_mov_b32 s8, s2 -; GFX9-NEXT: s_mov_b32 s9, s3 +; GFX9-NEXT: s_mov_b32 s8, s6 +; GFX9-NEXT: s_mov_b32 s9, s7 ; GFX9-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 -; GFX9-NEXT: s_mov_b32 s4, s0 -; GFX9-NEXT: s_mov_b32 s5, s1 +; GFX9-NEXT: s_mov_b32 s0, s4 +; GFX9-NEXT: s_mov_b32 s1, s5 ; GFX9-NEXT: s_waitcnt vmcnt(0) ; GFX9-NEXT: v_ashrrev_i32_e32 v2, 31, v1 ; GFX9-NEXT: v_add_u32_e32 v1, v1, v2 ; GFX9-NEXT: v_xor_b32_e32 v1, v1, v2 ; GFX9-NEXT: v_cvt_f32_u32_e32 v3, v1 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; GFX9-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 -; GFX9-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GFX9-NEXT: v_mul_lo_u32 v4, v3, v1 -; GFX9-NEXT: v_mul_hi_u32 v5, v3, v1 -; GFX9-NEXT: v_sub_u32_e32 v6, 0, v4 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; GFX9-NEXT: v_mul_hi_u32 v4, v4, v3 +; GFX9-NEXT: v_sub_u32_e32 v4, 0, v1 ; GFX9-NEXT: v_ashrrev_i32_e32 v5, 31, v0 ; GFX9-NEXT: v_add_u32_e32 v0, v0, v5 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v3, v3 ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v5 -; GFX9-NEXT: v_add_u32_e32 v6, v3, v4 -; GFX9-NEXT: v_sub_u32_e32 v3, v3, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v6, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v0 ; GFX9-NEXT: v_xor_b32_e32 v2, v5, v2 +; GFX9-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 +; GFX9-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GFX9-NEXT: v_mul_lo_u32 v4, v4, v3 +; GFX9-NEXT: v_mul_hi_u32 v4, v3, v4 +; GFX9-NEXT: v_add_u32_e32 v3, v3, v4 +; GFX9-NEXT: v_mul_hi_u32 v3, v0, v3 ; GFX9-NEXT: v_mul_lo_u32 v4, v3, v1 ; GFX9-NEXT: v_add_u32_e32 v5, 1, v3 -; GFX9-NEXT: v_add_u32_e32 v6, -1, v3 -; GFX9-NEXT: v_sub_u32_e32 v7, v0, v4 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], v7, v1 -; GFX9-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v3, v5, s[0:1] -; GFX9-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v4 +; GFX9-NEXT: 
v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_sub_u32_e32 v4, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GFX9-NEXT: v_add_u32_e32 v4, 1, v3 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v2 ; GFX9-NEXT: v_sub_u32_e32 v0, v0, v2 -; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0 +; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 ; GFX9-NEXT: s_endpgm ; ; EG-LABEL: sdiv_i32: @@ -408,248 +396,226 @@ define amdgpu_kernel void @slow_sdiv_i32_3435(i32 addrspace(1)* %out, i32 addrsp define amdgpu_kernel void @sdiv_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) { ; GCN-LABEL: sdiv_v2i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9 -; GCN-NEXT: s_mov_b32 s11, 0xf000 -; GCN-NEXT: s_mov_b32 s10, -1 -; GCN-NEXT: s_mov_b32 s4, 0x4f800000 +; GCN-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9 +; GCN-NEXT: s_mov_b32 s7, 0xf000 +; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: s_mov_b32 s2, s6 +; GCN-NEXT: s_mov_b32 s3, s7 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_mov_b32 s8, s0 -; GCN-NEXT: s_mov_b32 s9, s1 -; GCN-NEXT: s_mov_b32 s0, s2 -; GCN-NEXT: s_mov_b32 s1, s3 -; GCN-NEXT: s_mov_b32 s2, s10 -; GCN-NEXT: s_mov_b32 s3, s11 +; GCN-NEXT: s_mov_b32 s0, s10 +; GCN-NEXT: s_mov_b32 s1, s11 ; GCN-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 +; GCN-NEXT: s_mov_b32 s2, 0x4f7ffffe +; GCN-NEXT: s_mov_b32 s4, s8 +; GCN-NEXT: s_mov_b32 s5, s9 ; GCN-NEXT: s_waitcnt vmcnt(0) -; GCN-NEXT: v_ashrrev_i32_e32 v5, 31, v2 -; GCN-NEXT: v_ashrrev_i32_e32 v7, 31, v3 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v5, v2 -; GCN-NEXT: v_ashrrev_i32_e32 v4, 31, v0 -; GCN-NEXT: v_add_i32_e32 v3, vcc, v7, v3 -; GCN-NEXT: v_xor_b32_e32 v2, v2, v5 -; GCN-NEXT: v_ashrrev_i32_e32 v6, 31, v1 -; GCN-NEXT: v_xor_b32_e32 v8, v4, v5 +; GCN-NEXT: v_ashrrev_i32_e32 v4, 31, v2 +; GCN-NEXT: v_add_i32_e32 v2, vcc, v4, v2 +; GCN-NEXT: v_xor_b32_e32 v2, v2, 
v4 ; GCN-NEXT: v_cvt_f32_u32_e32 v5, v2 -; GCN-NEXT: v_xor_b32_e32 v3, v3, v7 -; GCN-NEXT: v_xor_b32_e32 v9, v6, v7 -; GCN-NEXT: v_cvt_f32_u32_e32 v7, v3 +; GCN-NEXT: v_sub_i32_e32 v6, vcc, 0, v2 +; GCN-NEXT: v_ashrrev_i32_e32 v7, 31, v0 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v7, v0 ; GCN-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GCN-NEXT: v_add_i32_e32 v0, vcc, v4, v0 -; GCN-NEXT: v_xor_b32_e32 v0, v0, v4 -; GCN-NEXT: v_rcp_iflag_f32_e32 v7, v7 -; GCN-NEXT: v_mul_f32_e32 v4, s4, v5 -; GCN-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GCN-NEXT: v_add_i32_e32 v1, vcc, v6, v1 -; GCN-NEXT: v_mul_f32_e32 v5, s4, v7 +; GCN-NEXT: v_xor_b32_e32 v0, v0, v7 +; GCN-NEXT: v_xor_b32_e32 v4, v7, v4 +; GCN-NEXT: v_mul_f32_e32 v5, s2, v5 ; GCN-NEXT: v_cvt_u32_f32_e32 v5, v5 +; GCN-NEXT: v_mul_lo_u32 v6, v6, v5 +; GCN-NEXT: v_mul_hi_u32 v6, v5, v6 +; GCN-NEXT: v_add_i32_e32 v5, vcc, v6, v5 +; GCN-NEXT: v_mul_hi_u32 v5, v0, v5 +; GCN-NEXT: v_ashrrev_i32_e32 v6, 31, v3 +; GCN-NEXT: v_mul_lo_u32 v8, v5, v2 +; GCN-NEXT: v_add_i32_e32 v9, vcc, 1, v5 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v8, v0 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v2 +; GCN-NEXT: v_subrev_i32_e32 v8, vcc, v2, v0 +; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[0:1] +; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v8, s[0:1] +; GCN-NEXT: v_add_i32_e32 v8, vcc, 1, v5 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GCN-NEXT: s_mov_b64 s[0:1], vcc +; GCN-NEXT: v_add_i32_e32 v0, vcc, v6, v3 +; GCN-NEXT: v_xor_b32_e32 v2, v0, v6 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, v2 +; GCN-NEXT: v_sub_i32_e32 v9, vcc, 0, v2 +; GCN-NEXT: v_ashrrev_i32_e32 v3, 31, v1 +; GCN-NEXT: v_add_i32_e32 v1, vcc, v3, v1 +; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GCN-NEXT: v_xor_b32_e32 v1, v1, v3 +; GCN-NEXT: v_xor_b32_e32 v6, v3, v6 +; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v8, s[0:1] +; GCN-NEXT: v_mul_f32_e32 v0, s2, v0 +; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 +; GCN-NEXT: v_mul_lo_u32 v9, v9, v0 +; GCN-NEXT: v_mul_hi_u32 v7, v0, v9 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v7, v0 +; GCN-NEXT: 
v_mul_hi_u32 v3, v1, v0 +; GCN-NEXT: v_xor_b32_e32 v0, v5, v4 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; GCN-NEXT: v_mul_lo_u32 v4, v3, v2 +; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v3 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v4, v1 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v1, v2 +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, v2, v1 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[0:1] +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; GCN-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc ; GCN-NEXT: v_xor_b32_e32 v1, v1, v6 -; GCN-NEXT: v_mul_hi_u32 v6, v4, v2 -; GCN-NEXT: v_mul_lo_u32 v7, v4, v2 -; GCN-NEXT: v_mul_hi_u32 v10, v5, v3 -; GCN-NEXT: v_mul_lo_u32 v11, v5, v3 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v6 -; GCN-NEXT: v_sub_i32_e32 v12, vcc, 0, v7 -; GCN-NEXT: v_cndmask_b32_e64 v6, v7, v12, s[0:1] -; GCN-NEXT: v_sub_i32_e32 v13, vcc, 0, v11 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v10 -; GCN-NEXT: v_cndmask_b32_e64 v7, v11, v13, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v6, v6, v4 -; GCN-NEXT: v_mul_hi_u32 v7, v7, v5 -; GCN-NEXT: v_add_i32_e32 v10, vcc, v6, v4 -; GCN-NEXT: v_subrev_i32_e32 v4, vcc, v6, v4 -; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v10, s[0:1] -; GCN-NEXT: v_add_i32_e32 v6, vcc, v7, v5 -; GCN-NEXT: v_subrev_i32_e32 v5, vcc, v7, v5 -; GCN-NEXT: v_mul_hi_u32 v4, v4, v0 -; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v5, v5, v1 -; GCN-NEXT: v_mul_lo_u32 v6, v4, v2 -; GCN-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; GCN-NEXT: v_mul_lo_u32 v11, v5, v3 -; GCN-NEXT: v_add_i32_e32 v10, vcc, -1, v4 -; GCN-NEXT: v_subrev_i32_e32 v14, vcc, v6, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v6 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], v14, v2 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v11, v1 -; GCN-NEXT: v_add_i32_e32 v12, vcc, 1, v5 -; GCN-NEXT: v_add_i32_e32 v13, vcc, -1, v5 -; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v1, v11 -; GCN-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; GCN-NEXT: s_and_b64 s[2:3], 
s[2:3], s[0:1] -; GCN-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[2:3] -; GCN-NEXT: s_and_b64 s[2:3], s[4:5], vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v5, v12, s[2:3] -; GCN-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v1, v13, v1, vcc -; GCN-NEXT: v_xor_b32_e32 v0, v0, v8 -; GCN-NEXT: v_xor_b32_e32 v1, v1, v9 -; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v8 -; GCN-NEXT: v_sub_i32_e32 v1, vcc, v1, v9 -; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[8:11], 0 +; GCN-NEXT: v_sub_i32_e32 v1, vcc, v1, v6 +; GCN-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GCN-NEXT: s_endpgm ; ; TONGA-LABEL: sdiv_v2i32: ; TONGA: ; %bb.0: -; TONGA-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24 -; TONGA-NEXT: s_mov_b32 s11, 0xf000 -; TONGA-NEXT: s_mov_b32 s10, -1 -; TONGA-NEXT: s_mov_b32 s4, 0x4f800000 +; TONGA-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x24 +; TONGA-NEXT: s_mov_b32 s7, 0xf000 +; TONGA-NEXT: s_mov_b32 s6, -1 +; TONGA-NEXT: s_mov_b32 s2, s6 +; TONGA-NEXT: s_mov_b32 s3, s7 ; TONGA-NEXT: s_waitcnt lgkmcnt(0) -; TONGA-NEXT: s_mov_b32 s8, s0 -; TONGA-NEXT: s_mov_b32 s9, s1 -; TONGA-NEXT: s_mov_b32 s0, s2 -; TONGA-NEXT: s_mov_b32 s1, s3 -; TONGA-NEXT: s_mov_b32 s2, s10 -; TONGA-NEXT: s_mov_b32 s3, s11 +; TONGA-NEXT: s_mov_b32 s0, s10 +; TONGA-NEXT: s_mov_b32 s1, s11 ; TONGA-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 +; TONGA-NEXT: s_mov_b32 s2, 0x4f7ffffe +; TONGA-NEXT: s_mov_b32 s4, s8 +; TONGA-NEXT: s_mov_b32 s5, s9 ; TONGA-NEXT: s_waitcnt vmcnt(0) -; TONGA-NEXT: v_ashrrev_i32_e32 v5, 31, v2 -; TONGA-NEXT: v_ashrrev_i32_e32 v7, 31, v3 -; TONGA-NEXT: v_add_u32_e32 v2, vcc, v5, v2 -; TONGA-NEXT: v_ashrrev_i32_e32 v4, 31, v0 -; TONGA-NEXT: v_add_u32_e32 v3, vcc, v7, v3 -; TONGA-NEXT: v_xor_b32_e32 v2, v2, v5 -; TONGA-NEXT: v_ashrrev_i32_e32 v6, 31, v1 -; TONGA-NEXT: v_xor_b32_e32 v8, v4, v5 +; TONGA-NEXT: v_ashrrev_i32_e32 v4, 31, v2 +; TONGA-NEXT: v_add_u32_e32 v2, vcc, v4, v2 +; TONGA-NEXT: v_xor_b32_e32 v2, v2, v4 ; TONGA-NEXT: v_cvt_f32_u32_e32 v5, v2 -; 
TONGA-NEXT: v_xor_b32_e32 v3, v3, v7 -; TONGA-NEXT: v_xor_b32_e32 v9, v6, v7 -; TONGA-NEXT: v_cvt_f32_u32_e32 v7, v3 +; TONGA-NEXT: v_sub_u32_e32 v6, vcc, 0, v2 +; TONGA-NEXT: v_ashrrev_i32_e32 v7, 31, v0 +; TONGA-NEXT: v_add_u32_e32 v0, vcc, v7, v0 ; TONGA-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; TONGA-NEXT: v_add_u32_e32 v0, vcc, v4, v0 -; TONGA-NEXT: v_xor_b32_e32 v0, v0, v4 -; TONGA-NEXT: v_rcp_iflag_f32_e32 v7, v7 -; TONGA-NEXT: v_mul_f32_e32 v4, s4, v5 -; TONGA-NEXT: v_cvt_u32_f32_e32 v4, v4 -; TONGA-NEXT: v_add_u32_e32 v1, vcc, v6, v1 -; TONGA-NEXT: v_mul_f32_e32 v5, s4, v7 +; TONGA-NEXT: v_xor_b32_e32 v0, v0, v7 +; TONGA-NEXT: v_xor_b32_e32 v4, v7, v4 +; TONGA-NEXT: v_mul_f32_e32 v5, s2, v5 ; TONGA-NEXT: v_cvt_u32_f32_e32 v5, v5 +; TONGA-NEXT: v_mul_lo_u32 v6, v6, v5 +; TONGA-NEXT: v_mul_hi_u32 v6, v5, v6 +; TONGA-NEXT: v_add_u32_e32 v5, vcc, v6, v5 +; TONGA-NEXT: v_mul_hi_u32 v5, v0, v5 +; TONGA-NEXT: v_ashrrev_i32_e32 v6, 31, v3 +; TONGA-NEXT: v_mul_lo_u32 v8, v5, v2 +; TONGA-NEXT: v_add_u32_e32 v9, vcc, 1, v5 +; TONGA-NEXT: v_subrev_u32_e32 v0, vcc, v8, v0 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v2 +; TONGA-NEXT: v_subrev_u32_e32 v8, vcc, v2, v0 +; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[0:1] +; TONGA-NEXT: v_cndmask_b32_e64 v0, v0, v8, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v8, vcc, 1, v5 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; TONGA-NEXT: s_mov_b64 s[0:1], vcc +; TONGA-NEXT: v_add_u32_e32 v0, vcc, v6, v3 +; TONGA-NEXT: v_xor_b32_e32 v2, v0, v6 +; TONGA-NEXT: v_cvt_f32_u32_e32 v0, v2 +; TONGA-NEXT: v_sub_u32_e32 v9, vcc, 0, v2 +; TONGA-NEXT: v_ashrrev_i32_e32 v3, 31, v1 +; TONGA-NEXT: v_add_u32_e32 v1, vcc, v3, v1 +; TONGA-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; TONGA-NEXT: v_xor_b32_e32 v1, v1, v3 +; TONGA-NEXT: v_xor_b32_e32 v6, v3, v6 +; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v8, s[0:1] +; TONGA-NEXT: v_mul_f32_e32 v0, s2, v0 +; TONGA-NEXT: v_cvt_u32_f32_e32 v0, v0 +; TONGA-NEXT: v_mul_lo_u32 v9, v9, v0 +; TONGA-NEXT: v_mul_hi_u32 v7, v0, v9 +; 
TONGA-NEXT: v_add_u32_e32 v0, vcc, v7, v0 +; TONGA-NEXT: v_mul_hi_u32 v3, v1, v0 +; TONGA-NEXT: v_xor_b32_e32 v0, v5, v4 +; TONGA-NEXT: v_sub_u32_e32 v0, vcc, v0, v4 +; TONGA-NEXT: v_mul_lo_u32 v4, v3, v2 +; TONGA-NEXT: v_add_u32_e32 v5, vcc, 1, v3 +; TONGA-NEXT: v_subrev_u32_e32 v1, vcc, v4, v1 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v1, v2 +; TONGA-NEXT: v_subrev_u32_e32 v4, vcc, v2, v1 +; TONGA-NEXT: v_cndmask_b32_e64 v3, v3, v5, s[0:1] +; TONGA-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v4, vcc, 1, v3 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; TONGA-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc ; TONGA-NEXT: v_xor_b32_e32 v1, v1, v6 -; TONGA-NEXT: v_mul_hi_u32 v6, v4, v2 -; TONGA-NEXT: v_mul_lo_u32 v7, v4, v2 -; TONGA-NEXT: v_mul_hi_u32 v10, v5, v3 -; TONGA-NEXT: v_mul_lo_u32 v11, v5, v3 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v6 -; TONGA-NEXT: v_sub_u32_e32 v12, vcc, 0, v7 -; TONGA-NEXT: v_cndmask_b32_e64 v6, v7, v12, s[0:1] -; TONGA-NEXT: v_sub_u32_e32 v13, vcc, 0, v11 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v10 -; TONGA-NEXT: v_cndmask_b32_e64 v7, v11, v13, s[2:3] -; TONGA-NEXT: v_mul_hi_u32 v6, v6, v4 -; TONGA-NEXT: v_mul_hi_u32 v7, v7, v5 -; TONGA-NEXT: v_add_u32_e32 v10, vcc, v6, v4 -; TONGA-NEXT: v_subrev_u32_e32 v4, vcc, v6, v4 -; TONGA-NEXT: v_cndmask_b32_e64 v4, v4, v10, s[0:1] -; TONGA-NEXT: v_add_u32_e32 v6, vcc, v7, v5 -; TONGA-NEXT: v_subrev_u32_e32 v5, vcc, v7, v5 -; TONGA-NEXT: v_mul_hi_u32 v4, v4, v0 -; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[2:3] -; TONGA-NEXT: v_mul_hi_u32 v5, v5, v1 -; TONGA-NEXT: v_mul_lo_u32 v6, v4, v2 -; TONGA-NEXT: v_add_u32_e32 v7, vcc, 1, v4 -; TONGA-NEXT: v_mul_lo_u32 v11, v5, v3 -; TONGA-NEXT: v_add_u32_e32 v10, vcc, -1, v4 -; TONGA-NEXT: v_subrev_u32_e32 v14, vcc, v6, v0 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v6 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[2:3], v14, v2 -; TONGA-NEXT: v_subrev_u32_e32 v0, vcc, v11, v1 -; TONGA-NEXT: v_add_u32_e32 v12, vcc, 1, v5 -; TONGA-NEXT: 
v_add_u32_e32 v13, vcc, -1, v5 -; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v1, v11 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; TONGA-NEXT: s_and_b64 s[2:3], s[2:3], s[0:1] -; TONGA-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[2:3] -; TONGA-NEXT: s_and_b64 s[2:3], s[4:5], vcc -; TONGA-NEXT: v_cndmask_b32_e64 v1, v5, v12, s[2:3] -; TONGA-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[0:1] -; TONGA-NEXT: v_cndmask_b32_e32 v1, v13, v1, vcc -; TONGA-NEXT: v_xor_b32_e32 v0, v0, v8 -; TONGA-NEXT: v_xor_b32_e32 v1, v1, v9 -; TONGA-NEXT: v_sub_u32_e32 v0, vcc, v0, v8 -; TONGA-NEXT: v_sub_u32_e32 v1, vcc, v1, v9 -; TONGA-NEXT: buffer_store_dwordx2 v[0:1], off, s[8:11], 0 +; TONGA-NEXT: v_sub_u32_e32 v1, vcc, v1, v6 +; TONGA-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; TONGA-NEXT: s_endpgm ; ; GFX9-LABEL: sdiv_v2i32: ; GFX9: ; %bb.0: ; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24 -; GFX9-NEXT: s_mov_b32 s11, 0xf000 -; GFX9-NEXT: s_mov_b32 s10, -1 -; GFX9-NEXT: s_mov_b32 s4, 0x4f800000 +; GFX9-NEXT: s_mov_b32 s7, 0xf000 +; GFX9-NEXT: s_mov_b32 s6, -1 +; GFX9-NEXT: s_mov_b32 s10, s6 +; GFX9-NEXT: s_mov_b32 s11, s7 ; GFX9-NEXT: s_waitcnt lgkmcnt(0) -; GFX9-NEXT: s_mov_b32 s8, s0 -; GFX9-NEXT: s_mov_b32 s9, s1 -; GFX9-NEXT: s_mov_b32 s0, s2 -; GFX9-NEXT: s_mov_b32 s1, s3 -; GFX9-NEXT: s_mov_b32 s2, s10 -; GFX9-NEXT: s_mov_b32 s3, s11 -; GFX9-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 +; GFX9-NEXT: s_mov_b32 s8, s2 +; GFX9-NEXT: s_mov_b32 s9, s3 +; GFX9-NEXT: buffer_load_dwordx4 v[0:3], off, s[8:11], 0 +; GFX9-NEXT: s_mov_b32 s2, 0x4f7ffffe +; GFX9-NEXT: s_mov_b32 s4, s0 +; GFX9-NEXT: s_mov_b32 s5, s1 ; GFX9-NEXT: s_waitcnt vmcnt(0) -; GFX9-NEXT: v_ashrrev_i32_e32 v5, 31, v2 -; GFX9-NEXT: v_ashrrev_i32_e32 v6, 31, v3 -; GFX9-NEXT: v_add_u32_e32 v2, v2, v5 -; GFX9-NEXT: v_add_u32_e32 v3, v3, v6 -; GFX9-NEXT: v_xor_b32_e32 v2, v2, v5 -; GFX9-NEXT: v_cvt_f32_u32_e32 v7, v2 -; GFX9-NEXT: v_xor_b32_e32 v3, v3, v6 -; GFX9-NEXT: v_cvt_f32_u32_e32 v8, v3 -; GFX9-NEXT: v_ashrrev_i32_e32 v4, 
31, v0 +; GFX9-NEXT: v_ashrrev_i32_e32 v4, 31, v2 +; GFX9-NEXT: v_ashrrev_i32_e32 v5, 31, v3 +; GFX9-NEXT: v_add_u32_e32 v2, v2, v4 +; GFX9-NEXT: v_add_u32_e32 v3, v3, v5 +; GFX9-NEXT: v_xor_b32_e32 v2, v2, v4 +; GFX9-NEXT: v_xor_b32_e32 v3, v3, v5 +; GFX9-NEXT: v_cvt_f32_u32_e32 v6, v2 +; GFX9-NEXT: v_cvt_f32_u32_e32 v7, v3 +; GFX9-NEXT: v_sub_u32_e32 v10, 0, v2 +; GFX9-NEXT: v_sub_u32_e32 v11, 0, v3 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v6, v6 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v7, v7 -; GFX9-NEXT: v_add_u32_e32 v0, v0, v4 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v8, v8 -; GFX9-NEXT: v_xor_b32_e32 v5, v4, v5 -; GFX9-NEXT: v_mul_f32_e32 v7, s4, v7 -; GFX9-NEXT: v_cvt_u32_f32_e32 v7, v7 -; GFX9-NEXT: v_mul_f32_e32 v8, s4, v8 -; GFX9-NEXT: v_cvt_u32_f32_e32 v8, v8 -; GFX9-NEXT: v_xor_b32_e32 v0, v0, v4 -; GFX9-NEXT: v_mul_lo_u32 v4, v7, v2 -; GFX9-NEXT: v_mul_hi_u32 v11, v7, v2 -; GFX9-NEXT: v_mul_lo_u32 v10, v8, v3 -; GFX9-NEXT: v_mul_hi_u32 v12, v8, v3 -; GFX9-NEXT: v_sub_u32_e32 v13, 0, v4 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v11 -; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v13, vcc -; GFX9-NEXT: v_sub_u32_e32 v14, 0, v10 -; GFX9-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v12 -; GFX9-NEXT: v_cndmask_b32_e64 v10, v10, v14, s[0:1] -; GFX9-NEXT: v_mul_hi_u32 v4, v4, v7 -; GFX9-NEXT: v_mul_hi_u32 v10, v10, v8 +; GFX9-NEXT: v_ashrrev_i32_e32 v8, 31, v0 ; GFX9-NEXT: v_ashrrev_i32_e32 v9, 31, v1 +; GFX9-NEXT: v_mul_f32_e32 v6, s2, v6 +; GFX9-NEXT: v_mul_f32_e32 v7, s2, v7 +; GFX9-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GFX9-NEXT: v_cvt_u32_f32_e32 v7, v7 +; GFX9-NEXT: v_add_u32_e32 v0, v0, v8 ; GFX9-NEXT: v_add_u32_e32 v1, v1, v9 -; GFX9-NEXT: v_xor_b32_e32 v6, v9, v6 +; GFX9-NEXT: v_mul_lo_u32 v10, v10, v6 +; GFX9-NEXT: v_mul_lo_u32 v11, v11, v7 +; GFX9-NEXT: v_xor_b32_e32 v0, v0, v8 ; GFX9-NEXT: v_xor_b32_e32 v1, v1, v9 -; GFX9-NEXT: v_add_u32_e32 v9, v7, v4 -; GFX9-NEXT: v_sub_u32_e32 v4, v7, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v9, vcc -; GFX9-NEXT: v_add_u32_e32 v7, v8, v10 -; GFX9-NEXT: 
v_sub_u32_e32 v8, v8, v10 -; GFX9-NEXT: v_mul_hi_u32 v4, v4, v0 -; GFX9-NEXT: v_cndmask_b32_e64 v7, v8, v7, s[0:1] -; GFX9-NEXT: v_mul_hi_u32 v7, v7, v1 -; GFX9-NEXT: v_mul_lo_u32 v8, v4, v2 -; GFX9-NEXT: v_add_u32_e32 v9, 1, v4 -; GFX9-NEXT: v_mul_lo_u32 v11, v7, v3 -; GFX9-NEXT: v_add_u32_e32 v12, 1, v7 -; GFX9-NEXT: v_sub_u32_e32 v14, v0, v8 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v8 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[2:3], v14, v2 -; GFX9-NEXT: v_sub_u32_e32 v0, v1, v11 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], v1, v11 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; GFX9-NEXT: s_and_b64 s[2:3], s[2:3], vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v4, v9, s[2:3] -; GFX9-NEXT: s_and_b64 s[2:3], s[4:5], s[0:1] -; GFX9-NEXT: v_add_u32_e32 v10, -1, v4 -; GFX9-NEXT: v_add_u32_e32 v13, -1, v7 -; GFX9-NEXT: v_cndmask_b32_e64 v1, v7, v12, s[2:3] -; GFX9-NEXT: v_cndmask_b32_e32 v0, v10, v0, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v1, v13, v1, s[0:1] -; GFX9-NEXT: v_xor_b32_e32 v0, v0, v5 -; GFX9-NEXT: v_xor_b32_e32 v1, v1, v6 -; GFX9-NEXT: v_sub_u32_e32 v0, v0, v5 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v6 -; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[8:11], 0 +; GFX9-NEXT: v_mul_hi_u32 v10, v6, v10 +; GFX9-NEXT: v_mul_hi_u32 v11, v7, v11 +; GFX9-NEXT: v_xor_b32_e32 v4, v8, v4 +; GFX9-NEXT: v_xor_b32_e32 v5, v9, v5 +; GFX9-NEXT: v_add_u32_e32 v6, v6, v10 +; GFX9-NEXT: v_add_u32_e32 v7, v7, v11 +; GFX9-NEXT: v_mul_hi_u32 v6, v0, v6 +; GFX9-NEXT: v_mul_hi_u32 v7, v1, v7 +; GFX9-NEXT: v_mul_lo_u32 v8, v6, v2 +; GFX9-NEXT: v_mul_lo_u32 v9, v7, v3 +; GFX9-NEXT: v_add_u32_e32 v10, 1, v6 +; GFX9-NEXT: v_add_u32_e32 v11, 1, v7 +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v8 +; GFX9-NEXT: v_sub_u32_e32 v1, v1, v9 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_sub_u32_e32 v8, v0, v2 +; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], v1, v3 +; GFX9-NEXT: v_sub_u32_e32 v9, v1, v3 +; GFX9-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v8, vcc +; GFX9-NEXT: 
v_cndmask_b32_e64 v7, v7, v11, s[0:1] +; GFX9-NEXT: v_cndmask_b32_e64 v1, v1, v9, s[0:1] +; GFX9-NEXT: v_add_u32_e32 v8, 1, v6 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v6, v8, vcc +; GFX9-NEXT: v_add_u32_e32 v9, 1, v7 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v9, vcc +; GFX9-NEXT: v_xor_b32_e32 v0, v0, v4 +; GFX9-NEXT: v_xor_b32_e32 v1, v1, v5 +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v4 +; GFX9-NEXT: v_sub_u32_e32 v1, v1, v5 +; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX9-NEXT: s_endpgm ; ; EG-LABEL: sdiv_v2i32: @@ -846,446 +812,404 @@ define amdgpu_kernel void @sdiv_v2i32_4(<2 x i32> addrspace(1)* %out, <2 x i32> define amdgpu_kernel void @sdiv_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) { ; GCN-LABEL: sdiv_v4i32: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx4 s[12:15], s[0:1], 0x9 -; GCN-NEXT: s_mov_b32 s11, 0xf000 -; GCN-NEXT: s_mov_b32 s10, -1 -; GCN-NEXT: s_mov_b32 s2, s10 -; GCN-NEXT: s_mov_b32 s3, s11 +; GCN-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9 +; GCN-NEXT: s_mov_b32 s7, 0xf000 +; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: s_mov_b32 s2, s6 +; GCN-NEXT: s_mov_b32 s3, s7 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_mov_b32 s0, s14 -; GCN-NEXT: s_mov_b32 s1, s15 -; GCN-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 -; GCN-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:16 -; GCN-NEXT: s_mov_b32 s14, 0x4f800000 -; GCN-NEXT: s_mov_b32 s8, s12 -; GCN-NEXT: s_mov_b32 s9, s13 -; GCN-NEXT: s_waitcnt vmcnt(1) -; GCN-NEXT: v_ashrrev_i32_e32 v8, 31, v0 +; GCN-NEXT: s_mov_b32 s0, s10 +; GCN-NEXT: s_mov_b32 s1, s11 +; GCN-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 offset:16 +; GCN-NEXT: s_mov_b32 s10, 0x4f7ffffe +; GCN-NEXT: s_mov_b32 s4, s8 +; GCN-NEXT: s_mov_b32 s5, s9 ; GCN-NEXT: s_waitcnt vmcnt(0) -; GCN-NEXT: v_ashrrev_i32_e32 v9, 31, v4 -; GCN-NEXT: v_add_i32_e32 v4, vcc, v9, v4 -; GCN-NEXT: v_xor_b32_e32 v4, v4, v9 -; GCN-NEXT: 
v_xor_b32_e32 v15, v8, v9 -; GCN-NEXT: v_cvt_f32_u32_e32 v9, v4 -; GCN-NEXT: v_ashrrev_i32_e32 v11, 31, v5 -; GCN-NEXT: v_add_i32_e32 v5, vcc, v11, v5 +; GCN-NEXT: v_ashrrev_i32_e32 v8, 31, v0 ; GCN-NEXT: v_add_i32_e32 v0, vcc, v8, v0 -; GCN-NEXT: v_rcp_iflag_f32_e32 v9, v9 -; GCN-NEXT: v_xor_b32_e32 v5, v5, v11 ; GCN-NEXT: v_xor_b32_e32 v0, v0, v8 -; GCN-NEXT: v_cvt_f32_u32_e32 v8, v5 -; GCN-NEXT: v_mul_f32_e32 v9, s14, v9 -; GCN-NEXT: v_cvt_u32_f32_e32 v9, v9 +; GCN-NEXT: v_cvt_f32_u32_e32 v4, v0 +; GCN-NEXT: v_ashrrev_i32_e32 v14, 31, v2 +; GCN-NEXT: v_rcp_iflag_f32_e32 v4, v4 +; GCN-NEXT: v_mul_f32_e32 v4, s10, v4 +; GCN-NEXT: v_cvt_u32_f32_e32 v9, v4 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v0 +; GCN-NEXT: v_mul_lo_u32 v10, v4, v9 +; GCN-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 +; GCN-NEXT: v_mul_hi_u32 v10, v9, v10 +; GCN-NEXT: v_add_i32_e32 v9, vcc, v10, v9 ; GCN-NEXT: v_ashrrev_i32_e32 v10, 31, v1 -; GCN-NEXT: v_rcp_iflag_f32_e32 v8, v8 -; GCN-NEXT: v_add_i32_e32 v1, vcc, v10, v1 -; GCN-NEXT: v_xor_b32_e32 v16, v10, v11 -; GCN-NEXT: v_xor_b32_e32 v1, v1, v10 -; GCN-NEXT: v_mul_f32_e32 v8, s14, v8 -; GCN-NEXT: v_mul_hi_u32 v11, v9, v4 -; GCN-NEXT: v_mul_lo_u32 v10, v9, v4 -; GCN-NEXT: v_cvt_u32_f32_e32 v8, v8 -; GCN-NEXT: v_ashrrev_i32_e32 v12, 31, v2 -; GCN-NEXT: v_ashrrev_i32_e32 v13, 31, v6 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v12, v2 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v11 -; GCN-NEXT: v_xor_b32_e32 v17, v12, v13 -; GCN-NEXT: v_xor_b32_e32 v2, v2, v12 -; GCN-NEXT: v_sub_i32_e32 v12, vcc, 0, v10 -; GCN-NEXT: v_cndmask_b32_e64 v10, v10, v12, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v12, v8, v5 -; GCN-NEXT: v_add_i32_e32 v6, vcc, v13, v6 -; GCN-NEXT: v_xor_b32_e32 v6, v6, v13 -; GCN-NEXT: v_mul_lo_u32 v11, v8, v5 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v12 -; GCN-NEXT: v_cvt_f32_u32_e32 v12, v6 -; GCN-NEXT: v_mul_hi_u32 v10, v10, v9 -; GCN-NEXT: v_sub_i32_e32 v13, vcc, 0, v11 -; GCN-NEXT: v_cndmask_b32_e64 v11, v11, v13, s[2:3] -; GCN-NEXT: 
v_rcp_iflag_f32_e32 v12, v12 -; GCN-NEXT: v_ashrrev_i32_e32 v14, 31, v7 -; GCN-NEXT: v_add_i32_e32 v7, vcc, v14, v7 -; GCN-NEXT: v_xor_b32_e32 v7, v7, v14 -; GCN-NEXT: v_mul_f32_e32 v12, s14, v12 -; GCN-NEXT: v_cvt_u32_f32_e32 v12, v12 -; GCN-NEXT: v_mul_hi_u32 v18, v12, v6 -; GCN-NEXT: v_mul_lo_u32 v13, v12, v6 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v18 -; GCN-NEXT: v_add_i32_e32 v18, vcc, v10, v9 -; GCN-NEXT: v_subrev_i32_e32 v9, vcc, v10, v9 -; GCN-NEXT: v_mul_hi_u32 v10, v11, v8 -; GCN-NEXT: v_cndmask_b32_e64 v9, v9, v18, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v9, v9, v0 -; GCN-NEXT: v_sub_i32_e32 v19, vcc, 0, v13 -; GCN-NEXT: v_add_i32_e32 v11, vcc, v10, v8 -; GCN-NEXT: v_subrev_i32_e32 v8, vcc, v10, v8 -; GCN-NEXT: v_cndmask_b32_e64 v13, v13, v19, s[4:5] -; GCN-NEXT: v_cndmask_b32_e64 v8, v8, v11, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v10, v13, v12 -; GCN-NEXT: v_mul_lo_u32 v11, v9, v4 -; GCN-NEXT: v_mul_hi_u32 v8, v8, v1 -; GCN-NEXT: v_add_i32_e32 v13, vcc, v10, v12 -; GCN-NEXT: v_subrev_i32_e32 v10, vcc, v10, v12 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v11 -; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v11 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], v0, v4 -; GCN-NEXT: v_cndmask_b32_e64 v10, v10, v13, s[4:5] -; GCN-NEXT: v_mul_lo_u32 v0, v8, v5 -; GCN-NEXT: v_mul_hi_u32 v4, v10, v2 -; GCN-NEXT: v_add_i32_e32 v12, vcc, -1, v9 -; GCN-NEXT: v_add_i32_e32 v10, vcc, -1, v8 -; GCN-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v0 -; GCN-NEXT: v_sub_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[6:7], v0, v5 -; GCN-NEXT: v_mul_lo_u32 v5, v4, v6 -; GCN-NEXT: v_add_i32_e32 v1, vcc, 1, v9 -; GCN-NEXT: v_add_i32_e32 v0, vcc, 1, v8 -; GCN-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v1, v9, v1, vcc -; GCN-NEXT: v_sub_i32_e32 v9, vcc, v2, v5 -; GCN-NEXT: s_and_b64 vcc, s[6:7], s[4:5] -; GCN-NEXT: v_cvt_f32_u32_e32 v11, v7 -; GCN-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v12, v1, s[0:1] -; GCN-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[4:5] -; 
GCN-NEXT: v_xor_b32_e32 v1, v1, v15 -; GCN-NEXT: v_xor_b32_e32 v8, v0, v16 -; GCN-NEXT: v_sub_i32_e32 v0, vcc, v1, v15 -; GCN-NEXT: v_sub_i32_e32 v1, vcc, v8, v16 -; GCN-NEXT: v_rcp_iflag_f32_e32 v8, v11 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v9, v6 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], v2, v5 -; GCN-NEXT: v_ashrrev_i32_e32 v10, 31, v3 -; GCN-NEXT: v_mul_f32_e32 v8, s14, v8 -; GCN-NEXT: v_cvt_u32_f32_e32 v8, v8 -; GCN-NEXT: v_add_i32_e32 v3, vcc, v10, v3 -; GCN-NEXT: v_xor_b32_e32 v3, v3, v10 -; GCN-NEXT: v_add_i32_e32 v6, vcc, -1, v4 -; GCN-NEXT: v_mul_lo_u32 v5, v8, v7 -; GCN-NEXT: v_mul_hi_u32 v9, v8, v7 -; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v4 -; GCN-NEXT: v_sub_i32_e32 v11, vcc, 0, v5 -; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v11, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v5, v5, v8 -; GCN-NEXT: v_add_i32_e32 v9, vcc, v5, v8 -; GCN-NEXT: v_subrev_i32_e32 v5, vcc, v5, v8 -; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] -; GCN-NEXT: v_mul_hi_u32 v5, v5, v3 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc -; GCN-NEXT: v_cndmask_b32_e64 v2, v6, v2, s[2:3] -; GCN-NEXT: v_mul_lo_u32 v4, v5, v7 -; GCN-NEXT: v_xor_b32_e32 v2, v2, v17 -; GCN-NEXT: v_sub_i32_e32 v2, vcc, v2, v17 -; GCN-NEXT: v_xor_b32_e32 v6, v10, v14 -; GCN-NEXT: v_sub_i32_e32 v8, vcc, v3, v4 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v8, v7 -; GCN-NEXT: v_cmp_ge_u32_e64 s[2:3], v3, v4 -; GCN-NEXT: v_add_i32_e32 v7, vcc, -1, v5 -; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v5 -; GCN-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GCN-NEXT: v_cndmask_b32_e32 v3, v5, v3, vcc -; GCN-NEXT: v_cndmask_b32_e64 v3, v7, v3, s[2:3] -; GCN-NEXT: v_xor_b32_e32 v3, v3, v6 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, v3, v6 -; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0 +; GCN-NEXT: s_waitcnt vmcnt(0) +; GCN-NEXT: v_ashrrev_i32_e32 v11, 31, v4 +; GCN-NEXT: v_add_i32_e32 v4, vcc, v11, v4 +; GCN-NEXT: v_xor_b32_e32 v4, v4, v11 +; GCN-NEXT: v_mul_hi_u32 v9, v4, v9 
+; GCN-NEXT: v_xor_b32_e32 v8, v11, v8 +; GCN-NEXT: v_mul_lo_u32 v12, v9, v0 +; GCN-NEXT: v_add_i32_e32 v13, vcc, 1, v9 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, v4, v12 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v4, v0 +; GCN-NEXT: v_sub_i32_e32 v12, vcc, v4, v0 +; GCN-NEXT: v_cndmask_b32_e64 v9, v9, v13, s[0:1] +; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v12, s[0:1] +; GCN-NEXT: v_add_i32_e32 v12, vcc, 1, v9 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v4, v0 +; GCN-NEXT: s_mov_b64 s[0:1], vcc +; GCN-NEXT: v_add_i32_e32 v0, vcc, v10, v1 +; GCN-NEXT: v_xor_b32_e32 v1, v0, v10 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, v1 +; GCN-NEXT: v_sub_i32_e32 v13, vcc, 0, v1 +; GCN-NEXT: v_ashrrev_i32_e32 v4, 31, v5 +; GCN-NEXT: v_add_i32_e32 v5, vcc, v4, v5 +; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GCN-NEXT: v_xor_b32_e32 v5, v5, v4 +; GCN-NEXT: v_cndmask_b32_e64 v9, v9, v12, s[0:1] +; GCN-NEXT: v_xor_b32_e32 v4, v4, v10 +; GCN-NEXT: v_mul_f32_e32 v0, s10, v0 +; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 +; GCN-NEXT: v_ashrrev_i32_e32 v10, 31, v6 +; GCN-NEXT: v_mul_lo_u32 v13, v13, v0 +; GCN-NEXT: v_mul_hi_u32 v11, v0, v13 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v11, v0 +; GCN-NEXT: v_mul_hi_u32 v11, v5, v0 +; GCN-NEXT: v_xor_b32_e32 v0, v9, v8 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v8, v0 +; GCN-NEXT: v_mul_lo_u32 v8, v11, v1 +; GCN-NEXT: v_add_i32_e32 v9, vcc, 1, v11 +; GCN-NEXT: v_sub_i32_e32 v5, vcc, v5, v8 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v5, v1 +; GCN-NEXT: v_cndmask_b32_e64 v8, v11, v9, s[0:1] +; GCN-NEXT: v_sub_i32_e32 v9, vcc, v5, v1 +; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[0:1] +; GCN-NEXT: v_add_i32_e32 v9, vcc, 1, v8 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v5, v1 +; GCN-NEXT: s_mov_b64 s[0:1], vcc +; GCN-NEXT: v_add_i32_e32 v1, vcc, v14, v2 +; GCN-NEXT: v_xor_b32_e32 v2, v1, v14 +; GCN-NEXT: v_cvt_f32_u32_e32 v1, v2 +; GCN-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GCN-NEXT: v_cndmask_b32_e64 v8, v8, v9, s[0:1] +; GCN-NEXT: v_ashrrev_i32_e32 v9, 31, v3 +; GCN-NEXT: v_rcp_iflag_f32_e32 v1, v1 +; GCN-NEXT: 
v_mul_f32_e32 v1, s10, v1 +; GCN-NEXT: v_cvt_u32_f32_e32 v1, v1 +; GCN-NEXT: v_mul_lo_u32 v5, v5, v1 +; GCN-NEXT: v_mul_hi_u32 v5, v1, v5 +; GCN-NEXT: v_add_i32_e32 v1, vcc, v5, v1 +; GCN-NEXT: v_add_i32_e32 v5, vcc, v10, v6 +; GCN-NEXT: v_xor_b32_e32 v5, v5, v10 +; GCN-NEXT: v_mul_hi_u32 v6, v5, v1 +; GCN-NEXT: v_xor_b32_e32 v1, v8, v4 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v4, v1 +; GCN-NEXT: v_xor_b32_e32 v10, v10, v14 +; GCN-NEXT: v_mul_lo_u32 v4, v6, v2 +; GCN-NEXT: v_add_i32_e32 v8, vcc, 1, v6 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, v5, v4 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v4, v2 +; GCN-NEXT: v_cndmask_b32_e64 v5, v6, v8, s[0:1] +; GCN-NEXT: v_sub_i32_e32 v6, vcc, v4, v2 +; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[0:1] +; GCN-NEXT: v_add_i32_e32 v6, vcc, 1, v5 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v4, v2 +; GCN-NEXT: s_mov_b64 s[0:1], vcc +; GCN-NEXT: v_add_i32_e32 v2, vcc, v9, v3 +; GCN-NEXT: v_xor_b32_e32 v3, v2, v9 +; GCN-NEXT: v_cvt_f32_u32_e32 v2, v3 +; GCN-NEXT: v_sub_i32_e32 v8, vcc, 0, v3 +; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[0:1] +; GCN-NEXT: v_ashrrev_i32_e32 v4, 31, v7 +; GCN-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; GCN-NEXT: v_add_i32_e32 v7, vcc, v4, v7 +; GCN-NEXT: v_xor_b32_e32 v9, v4, v9 +; GCN-NEXT: v_xor_b32_e32 v4, v7, v4 +; GCN-NEXT: v_mul_f32_e32 v2, s10, v2 +; GCN-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GCN-NEXT: v_mul_lo_u32 v8, v8, v2 +; GCN-NEXT: v_mul_hi_u32 v6, v2, v8 +; GCN-NEXT: v_add_i32_e32 v2, vcc, v6, v2 +; GCN-NEXT: v_mul_hi_u32 v6, v4, v2 +; GCN-NEXT: v_xor_b32_e32 v2, v5, v10 +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, v10, v2 +; GCN-NEXT: v_mul_lo_u32 v5, v6, v3 +; GCN-NEXT: v_add_i32_e32 v7, vcc, 1, v6 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, v4, v5 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v4, v3 +; GCN-NEXT: v_cndmask_b32_e64 v5, v6, v7, s[0:1] +; GCN-NEXT: v_sub_i32_e32 v6, vcc, v4, v3 +; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[0:1] +; GCN-NEXT: v_add_i32_e32 v6, vcc, 1, v5 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v4, v3 +; GCN-NEXT: 
v_cndmask_b32_e32 v3, v5, v6, vcc +; GCN-NEXT: v_xor_b32_e32 v3, v3, v9 +; GCN-NEXT: v_subrev_i32_e32 v3, vcc, v9, v3 +; GCN-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0 ; GCN-NEXT: s_endpgm ; ; TONGA-LABEL: sdiv_v4i32: ; TONGA: ; %bb.0: -; TONGA-NEXT: s_load_dwordx4 s[12:15], s[0:1], 0x24 -; TONGA-NEXT: s_mov_b32 s11, 0xf000 -; TONGA-NEXT: s_mov_b32 s10, -1 -; TONGA-NEXT: s_mov_b32 s2, s10 -; TONGA-NEXT: s_mov_b32 s3, s11 +; TONGA-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x24 +; TONGA-NEXT: s_mov_b32 s7, 0xf000 +; TONGA-NEXT: s_mov_b32 s6, -1 +; TONGA-NEXT: s_mov_b32 s2, s6 +; TONGA-NEXT: s_mov_b32 s3, s7 ; TONGA-NEXT: s_waitcnt lgkmcnt(0) -; TONGA-NEXT: s_mov_b32 s0, s14 -; TONGA-NEXT: s_mov_b32 s1, s15 -; TONGA-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 -; TONGA-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:16 -; TONGA-NEXT: s_mov_b32 s14, 0x4f800000 -; TONGA-NEXT: s_mov_b32 s8, s12 -; TONGA-NEXT: s_mov_b32 s9, s13 -; TONGA-NEXT: s_waitcnt vmcnt(1) -; TONGA-NEXT: v_ashrrev_i32_e32 v8, 31, v0 +; TONGA-NEXT: s_mov_b32 s0, s10 +; TONGA-NEXT: s_mov_b32 s1, s11 +; TONGA-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 offset:16 +; TONGA-NEXT: s_mov_b32 s10, 0x4f7ffffe +; TONGA-NEXT: s_mov_b32 s4, s8 +; TONGA-NEXT: s_mov_b32 s5, s9 ; TONGA-NEXT: s_waitcnt vmcnt(0) -; TONGA-NEXT: v_ashrrev_i32_e32 v9, 31, v4 -; TONGA-NEXT: v_add_u32_e32 v4, vcc, v9, v4 -; TONGA-NEXT: v_xor_b32_e32 v4, v4, v9 -; TONGA-NEXT: v_xor_b32_e32 v15, v8, v9 -; TONGA-NEXT: v_cvt_f32_u32_e32 v9, v4 -; TONGA-NEXT: v_ashrrev_i32_e32 v11, 31, v5 -; TONGA-NEXT: v_add_u32_e32 v5, vcc, v11, v5 +; TONGA-NEXT: v_ashrrev_i32_e32 v8, 31, v0 ; TONGA-NEXT: v_add_u32_e32 v0, vcc, v8, v0 -; TONGA-NEXT: v_rcp_iflag_f32_e32 v9, v9 -; TONGA-NEXT: v_xor_b32_e32 v5, v5, v11 ; TONGA-NEXT: v_xor_b32_e32 v0, v0, v8 -; TONGA-NEXT: v_cvt_f32_u32_e32 v8, v5 -; TONGA-NEXT: v_mul_f32_e32 v9, s14, v9 -; TONGA-NEXT: v_cvt_u32_f32_e32 v9, v9 +; TONGA-NEXT: v_cvt_f32_u32_e32 v4, v0 +; TONGA-NEXT: 
v_ashrrev_i32_e32 v14, 31, v2 +; TONGA-NEXT: v_rcp_iflag_f32_e32 v4, v4 +; TONGA-NEXT: v_mul_f32_e32 v4, s10, v4 +; TONGA-NEXT: v_cvt_u32_f32_e32 v9, v4 +; TONGA-NEXT: v_sub_u32_e32 v4, vcc, 0, v0 +; TONGA-NEXT: v_mul_lo_u32 v10, v4, v9 +; TONGA-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 +; TONGA-NEXT: v_mul_hi_u32 v10, v9, v10 +; TONGA-NEXT: v_add_u32_e32 v9, vcc, v10, v9 ; TONGA-NEXT: v_ashrrev_i32_e32 v10, 31, v1 -; TONGA-NEXT: v_rcp_iflag_f32_e32 v8, v8 -; TONGA-NEXT: v_add_u32_e32 v1, vcc, v10, v1 -; TONGA-NEXT: v_xor_b32_e32 v16, v10, v11 -; TONGA-NEXT: v_xor_b32_e32 v1, v1, v10 -; TONGA-NEXT: v_mul_f32_e32 v8, s14, v8 -; TONGA-NEXT: v_mul_hi_u32 v11, v9, v4 -; TONGA-NEXT: v_mul_lo_u32 v10, v9, v4 -; TONGA-NEXT: v_cvt_u32_f32_e32 v8, v8 -; TONGA-NEXT: v_ashrrev_i32_e32 v12, 31, v2 -; TONGA-NEXT: v_ashrrev_i32_e32 v13, 31, v6 -; TONGA-NEXT: v_add_u32_e32 v2, vcc, v12, v2 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v11 -; TONGA-NEXT: v_xor_b32_e32 v17, v12, v13 -; TONGA-NEXT: v_xor_b32_e32 v2, v2, v12 -; TONGA-NEXT: v_sub_u32_e32 v12, vcc, 0, v10 -; TONGA-NEXT: v_cndmask_b32_e64 v10, v10, v12, s[0:1] -; TONGA-NEXT: v_mul_hi_u32 v12, v8, v5 -; TONGA-NEXT: v_add_u32_e32 v6, vcc, v13, v6 -; TONGA-NEXT: v_xor_b32_e32 v6, v6, v13 -; TONGA-NEXT: v_mul_lo_u32 v11, v8, v5 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v12 -; TONGA-NEXT: v_cvt_f32_u32_e32 v12, v6 -; TONGA-NEXT: v_mul_hi_u32 v10, v10, v9 -; TONGA-NEXT: v_sub_u32_e32 v13, vcc, 0, v11 -; TONGA-NEXT: v_cndmask_b32_e64 v11, v11, v13, s[2:3] -; TONGA-NEXT: v_rcp_iflag_f32_e32 v12, v12 -; TONGA-NEXT: v_ashrrev_i32_e32 v14, 31, v7 -; TONGA-NEXT: v_add_u32_e32 v7, vcc, v14, v7 -; TONGA-NEXT: v_xor_b32_e32 v7, v7, v14 -; TONGA-NEXT: v_mul_f32_e32 v12, s14, v12 -; TONGA-NEXT: v_cvt_u32_f32_e32 v12, v12 -; TONGA-NEXT: v_mul_hi_u32 v18, v12, v6 -; TONGA-NEXT: v_mul_lo_u32 v13, v12, v6 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v18 -; TONGA-NEXT: v_add_u32_e32 v18, vcc, v10, v9 -; TONGA-NEXT: v_subrev_u32_e32 v9, 
vcc, v10, v9 -; TONGA-NEXT: v_mul_hi_u32 v10, v11, v8 -; TONGA-NEXT: v_cndmask_b32_e64 v9, v9, v18, s[0:1] -; TONGA-NEXT: v_mul_hi_u32 v9, v9, v0 -; TONGA-NEXT: v_sub_u32_e32 v19, vcc, 0, v13 -; TONGA-NEXT: v_add_u32_e32 v11, vcc, v10, v8 -; TONGA-NEXT: v_subrev_u32_e32 v8, vcc, v10, v8 -; TONGA-NEXT: v_cndmask_b32_e64 v13, v13, v19, s[4:5] -; TONGA-NEXT: v_cndmask_b32_e64 v8, v8, v11, s[2:3] -; TONGA-NEXT: v_mul_hi_u32 v10, v13, v12 -; TONGA-NEXT: v_mul_lo_u32 v11, v9, v4 -; TONGA-NEXT: v_mul_hi_u32 v8, v8, v1 -; TONGA-NEXT: v_add_u32_e32 v13, vcc, v10, v12 -; TONGA-NEXT: v_subrev_u32_e32 v10, vcc, v10, v12 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v11 -; TONGA-NEXT: v_sub_u32_e32 v0, vcc, v0, v11 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[2:3], v0, v4 -; TONGA-NEXT: v_cndmask_b32_e64 v10, v10, v13, s[4:5] -; TONGA-NEXT: v_mul_lo_u32 v0, v8, v5 -; TONGA-NEXT: v_mul_hi_u32 v4, v10, v2 -; TONGA-NEXT: v_add_u32_e32 v12, vcc, -1, v9 -; TONGA-NEXT: v_add_u32_e32 v10, vcc, -1, v8 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v0 -; TONGA-NEXT: v_sub_u32_e32 v0, vcc, v1, v0 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[6:7], v0, v5 -; TONGA-NEXT: v_mul_lo_u32 v5, v4, v6 -; TONGA-NEXT: v_add_u32_e32 v1, vcc, 1, v9 -; TONGA-NEXT: v_add_u32_e32 v0, vcc, 1, v8 -; TONGA-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; TONGA-NEXT: v_cndmask_b32_e32 v1, v9, v1, vcc -; TONGA-NEXT: v_sub_u32_e32 v9, vcc, v2, v5 -; TONGA-NEXT: s_and_b64 vcc, s[6:7], s[4:5] -; TONGA-NEXT: v_cvt_f32_u32_e32 v11, v7 -; TONGA-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; TONGA-NEXT: v_cndmask_b32_e64 v1, v12, v1, s[0:1] -; TONGA-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[4:5] -; TONGA-NEXT: v_xor_b32_e32 v1, v1, v15 -; TONGA-NEXT: v_xor_b32_e32 v8, v0, v16 -; TONGA-NEXT: v_sub_u32_e32 v0, vcc, v1, v15 -; TONGA-NEXT: v_sub_u32_e32 v1, vcc, v8, v16 -; TONGA-NEXT: v_rcp_iflag_f32_e32 v8, v11 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v9, v6 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[2:3], v2, v5 -; TONGA-NEXT: v_ashrrev_i32_e32 v10, 31, v3 -; TONGA-NEXT: 
v_mul_f32_e32 v8, s14, v8 -; TONGA-NEXT: v_cvt_u32_f32_e32 v8, v8 -; TONGA-NEXT: v_add_u32_e32 v3, vcc, v10, v3 -; TONGA-NEXT: v_xor_b32_e32 v3, v3, v10 -; TONGA-NEXT: v_add_u32_e32 v6, vcc, -1, v4 -; TONGA-NEXT: v_mul_lo_u32 v5, v8, v7 -; TONGA-NEXT: v_mul_hi_u32 v9, v8, v7 -; TONGA-NEXT: v_add_u32_e32 v2, vcc, 1, v4 -; TONGA-NEXT: v_sub_u32_e32 v11, vcc, 0, v5 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v11, s[4:5] -; TONGA-NEXT: v_mul_hi_u32 v5, v5, v8 -; TONGA-NEXT: v_add_u32_e32 v9, vcc, v5, v8 -; TONGA-NEXT: v_subrev_u32_e32 v5, vcc, v5, v8 -; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] -; TONGA-NEXT: v_mul_hi_u32 v5, v5, v3 -; TONGA-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; TONGA-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc -; TONGA-NEXT: v_cndmask_b32_e64 v2, v6, v2, s[2:3] -; TONGA-NEXT: v_mul_lo_u32 v4, v5, v7 -; TONGA-NEXT: v_xor_b32_e32 v2, v2, v17 -; TONGA-NEXT: v_sub_u32_e32 v2, vcc, v2, v17 -; TONGA-NEXT: v_xor_b32_e32 v6, v10, v14 -; TONGA-NEXT: v_sub_u32_e32 v8, vcc, v3, v4 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v8, v7 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[2:3], v3, v4 -; TONGA-NEXT: v_add_u32_e32 v7, vcc, -1, v5 -; TONGA-NEXT: v_add_u32_e32 v3, vcc, 1, v5 -; TONGA-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; TONGA-NEXT: v_cndmask_b32_e32 v3, v5, v3, vcc -; TONGA-NEXT: v_cndmask_b32_e64 v3, v7, v3, s[2:3] -; TONGA-NEXT: v_xor_b32_e32 v3, v3, v6 -; TONGA-NEXT: v_sub_u32_e32 v3, vcc, v3, v6 -; TONGA-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0 +; TONGA-NEXT: s_waitcnt vmcnt(0) +; TONGA-NEXT: v_ashrrev_i32_e32 v11, 31, v4 +; TONGA-NEXT: v_add_u32_e32 v4, vcc, v11, v4 +; TONGA-NEXT: v_xor_b32_e32 v4, v4, v11 +; TONGA-NEXT: v_mul_hi_u32 v9, v4, v9 +; TONGA-NEXT: v_xor_b32_e32 v8, v11, v8 +; TONGA-NEXT: v_mul_lo_u32 v12, v9, v0 +; TONGA-NEXT: v_add_u32_e32 v13, vcc, 1, v9 +; TONGA-NEXT: v_sub_u32_e32 v4, vcc, v4, v12 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v4, v0 +; TONGA-NEXT: v_sub_u32_e32 v12, vcc, v4, v0 +; 
TONGA-NEXT: v_cndmask_b32_e64 v9, v9, v13, s[0:1] +; TONGA-NEXT: v_cndmask_b32_e64 v4, v4, v12, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v12, vcc, 1, v9 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v4, v0 +; TONGA-NEXT: s_mov_b64 s[0:1], vcc +; TONGA-NEXT: v_add_u32_e32 v0, vcc, v10, v1 +; TONGA-NEXT: v_xor_b32_e32 v1, v0, v10 +; TONGA-NEXT: v_cvt_f32_u32_e32 v0, v1 +; TONGA-NEXT: v_sub_u32_e32 v13, vcc, 0, v1 +; TONGA-NEXT: v_ashrrev_i32_e32 v4, 31, v5 +; TONGA-NEXT: v_add_u32_e32 v5, vcc, v4, v5 +; TONGA-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; TONGA-NEXT: v_xor_b32_e32 v5, v5, v4 +; TONGA-NEXT: v_cndmask_b32_e64 v9, v9, v12, s[0:1] +; TONGA-NEXT: v_xor_b32_e32 v4, v4, v10 +; TONGA-NEXT: v_mul_f32_e32 v0, s10, v0 +; TONGA-NEXT: v_cvt_u32_f32_e32 v0, v0 +; TONGA-NEXT: v_ashrrev_i32_e32 v10, 31, v6 +; TONGA-NEXT: v_mul_lo_u32 v13, v13, v0 +; TONGA-NEXT: v_mul_hi_u32 v11, v0, v13 +; TONGA-NEXT: v_add_u32_e32 v0, vcc, v11, v0 +; TONGA-NEXT: v_mul_hi_u32 v11, v5, v0 +; TONGA-NEXT: v_xor_b32_e32 v0, v9, v8 +; TONGA-NEXT: v_subrev_u32_e32 v0, vcc, v8, v0 +; TONGA-NEXT: v_mul_lo_u32 v8, v11, v1 +; TONGA-NEXT: v_add_u32_e32 v9, vcc, 1, v11 +; TONGA-NEXT: v_sub_u32_e32 v5, vcc, v5, v8 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v5, v1 +; TONGA-NEXT: v_cndmask_b32_e64 v8, v11, v9, s[0:1] +; TONGA-NEXT: v_sub_u32_e32 v9, vcc, v5, v1 +; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v9, vcc, 1, v8 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v5, v1 +; TONGA-NEXT: s_mov_b64 s[0:1], vcc +; TONGA-NEXT: v_add_u32_e32 v1, vcc, v14, v2 +; TONGA-NEXT: v_xor_b32_e32 v2, v1, v14 +; TONGA-NEXT: v_cvt_f32_u32_e32 v1, v2 +; TONGA-NEXT: v_sub_u32_e32 v5, vcc, 0, v2 +; TONGA-NEXT: v_cndmask_b32_e64 v8, v8, v9, s[0:1] +; TONGA-NEXT: v_ashrrev_i32_e32 v9, 31, v3 +; TONGA-NEXT: v_rcp_iflag_f32_e32 v1, v1 +; TONGA-NEXT: v_mul_f32_e32 v1, s10, v1 +; TONGA-NEXT: v_cvt_u32_f32_e32 v1, v1 +; TONGA-NEXT: v_mul_lo_u32 v5, v5, v1 +; TONGA-NEXT: v_mul_hi_u32 v5, v1, v5 +; TONGA-NEXT: v_add_u32_e32 
v1, vcc, v5, v1 +; TONGA-NEXT: v_add_u32_e32 v5, vcc, v10, v6 +; TONGA-NEXT: v_xor_b32_e32 v5, v5, v10 +; TONGA-NEXT: v_mul_hi_u32 v6, v5, v1 +; TONGA-NEXT: v_xor_b32_e32 v1, v8, v4 +; TONGA-NEXT: v_subrev_u32_e32 v1, vcc, v4, v1 +; TONGA-NEXT: v_xor_b32_e32 v10, v10, v14 +; TONGA-NEXT: v_mul_lo_u32 v4, v6, v2 +; TONGA-NEXT: v_add_u32_e32 v8, vcc, 1, v6 +; TONGA-NEXT: v_sub_u32_e32 v4, vcc, v5, v4 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v4, v2 +; TONGA-NEXT: v_cndmask_b32_e64 v5, v6, v8, s[0:1] +; TONGA-NEXT: v_sub_u32_e32 v6, vcc, v4, v2 +; TONGA-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v6, vcc, 1, v5 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v4, v2 +; TONGA-NEXT: s_mov_b64 s[0:1], vcc +; TONGA-NEXT: v_add_u32_e32 v2, vcc, v9, v3 +; TONGA-NEXT: v_xor_b32_e32 v3, v2, v9 +; TONGA-NEXT: v_cvt_f32_u32_e32 v2, v3 +; TONGA-NEXT: v_sub_u32_e32 v8, vcc, 0, v3 +; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[0:1] +; TONGA-NEXT: v_ashrrev_i32_e32 v4, 31, v7 +; TONGA-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; TONGA-NEXT: v_add_u32_e32 v7, vcc, v4, v7 +; TONGA-NEXT: v_xor_b32_e32 v9, v4, v9 +; TONGA-NEXT: v_xor_b32_e32 v4, v7, v4 +; TONGA-NEXT: v_mul_f32_e32 v2, s10, v2 +; TONGA-NEXT: v_cvt_u32_f32_e32 v2, v2 +; TONGA-NEXT: v_mul_lo_u32 v8, v8, v2 +; TONGA-NEXT: v_mul_hi_u32 v6, v2, v8 +; TONGA-NEXT: v_add_u32_e32 v2, vcc, v6, v2 +; TONGA-NEXT: v_mul_hi_u32 v6, v4, v2 +; TONGA-NEXT: v_xor_b32_e32 v2, v5, v10 +; TONGA-NEXT: v_subrev_u32_e32 v2, vcc, v10, v2 +; TONGA-NEXT: v_mul_lo_u32 v5, v6, v3 +; TONGA-NEXT: v_add_u32_e32 v7, vcc, 1, v6 +; TONGA-NEXT: v_sub_u32_e32 v4, vcc, v4, v5 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v4, v3 +; TONGA-NEXT: v_cndmask_b32_e64 v5, v6, v7, s[0:1] +; TONGA-NEXT: v_sub_u32_e32 v6, vcc, v4, v3 +; TONGA-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v6, vcc, 1, v5 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v4, v3 +; TONGA-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc +; TONGA-NEXT: v_xor_b32_e32 v3, v3, v9 +; 
TONGA-NEXT: v_subrev_u32_e32 v3, vcc, v9, v3 +; TONGA-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0 ; TONGA-NEXT: s_endpgm ; ; GFX9-LABEL: sdiv_v4i32: ; GFX9: ; %bb.0: -; GFX9-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x24 -; GFX9-NEXT: s_mov_b32 s15, 0xf000 -; GFX9-NEXT: s_mov_b32 s14, -1 -; GFX9-NEXT: s_mov_b32 s2, s14 -; GFX9-NEXT: s_mov_b32 s3, s15 +; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24 +; GFX9-NEXT: s_mov_b32 s11, 0xf000 +; GFX9-NEXT: s_mov_b32 s10, -1 +; GFX9-NEXT: s_mov_b32 s4, 0x4f7ffffe ; GFX9-NEXT: s_waitcnt lgkmcnt(0) -; GFX9-NEXT: s_mov_b32 s0, s10 -; GFX9-NEXT: s_mov_b32 s1, s11 +; GFX9-NEXT: s_mov_b32 s8, s0 +; GFX9-NEXT: s_mov_b32 s9, s1 +; GFX9-NEXT: s_mov_b32 s0, s2 +; GFX9-NEXT: s_mov_b32 s1, s3 +; GFX9-NEXT: s_mov_b32 s2, s10 +; GFX9-NEXT: s_mov_b32 s3, s11 ; GFX9-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 ; GFX9-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:16 -; GFX9-NEXT: s_mov_b32 s4, 0x4f800000 -; GFX9-NEXT: s_mov_b32 s12, s8 -; GFX9-NEXT: s_mov_b32 s13, s9 ; GFX9-NEXT: s_waitcnt vmcnt(1) ; GFX9-NEXT: v_ashrrev_i32_e32 v8, 31, v0 ; GFX9-NEXT: s_waitcnt vmcnt(0) ; GFX9-NEXT: v_ashrrev_i32_e32 v9, 31, v4 ; GFX9-NEXT: v_add_u32_e32 v4, v4, v9 +; GFX9-NEXT: v_ashrrev_i32_e32 v11, 31, v5 ; GFX9-NEXT: v_add_u32_e32 v0, v0, v8 ; GFX9-NEXT: v_xor_b32_e32 v4, v4, v9 +; GFX9-NEXT: v_ashrrev_i32_e32 v10, 31, v1 +; GFX9-NEXT: v_ashrrev_i32_e32 v13, 31, v6 +; GFX9-NEXT: v_add_u32_e32 v5, v5, v11 ; GFX9-NEXT: v_xor_b32_e32 v16, v8, v9 ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v8 ; GFX9-NEXT: v_cvt_f32_u32_e32 v8, v4 -; GFX9-NEXT: v_ashrrev_i32_e32 v11, 31, v5 -; GFX9-NEXT: v_add_u32_e32 v5, v5, v11 -; GFX9-NEXT: v_xor_b32_e32 v5, v5, v11 -; GFX9-NEXT: v_cvt_f32_u32_e32 v9, v5 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v8, v8 -; GFX9-NEXT: v_ashrrev_i32_e32 v13, 31, v6 -; GFX9-NEXT: v_ashrrev_i32_e32 v10, 31, v1 -; GFX9-NEXT: v_add_u32_e32 v6, v6, v13 +; GFX9-NEXT: v_ashrrev_i32_e32 v12, 31, v2 +; GFX9-NEXT: v_ashrrev_i32_e32 v15, 31, v7 ; 
GFX9-NEXT: v_add_u32_e32 v1, v1, v10 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v9, v9 -; GFX9-NEXT: v_mul_f32_e32 v8, s4, v8 -; GFX9-NEXT: v_xor_b32_e32 v6, v6, v13 +; GFX9-NEXT: v_add_u32_e32 v6, v6, v13 +; GFX9-NEXT: v_xor_b32_e32 v5, v5, v11 +; GFX9-NEXT: v_ashrrev_i32_e32 v14, 31, v3 +; GFX9-NEXT: v_add_u32_e32 v2, v2, v12 +; GFX9-NEXT: v_add_u32_e32 v7, v7, v15 ; GFX9-NEXT: v_xor_b32_e32 v17, v10, v11 ; GFX9-NEXT: v_xor_b32_e32 v1, v1, v10 -; GFX9-NEXT: v_cvt_f32_u32_e32 v10, v6 -; GFX9-NEXT: v_cvt_u32_f32_e32 v8, v8 -; GFX9-NEXT: v_ashrrev_i32_e32 v12, 31, v2 -; GFX9-NEXT: v_add_u32_e32 v2, v2, v12 -; GFX9-NEXT: v_mul_f32_e32 v9, s4, v9 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v10, v10 +; GFX9-NEXT: v_cvt_f32_u32_e32 v10, v5 +; GFX9-NEXT: v_xor_b32_e32 v6, v6, v13 +; GFX9-NEXT: v_add_u32_e32 v3, v3, v14 ; GFX9-NEXT: v_xor_b32_e32 v18, v12, v13 ; GFX9-NEXT: v_xor_b32_e32 v2, v2, v12 -; GFX9-NEXT: v_cvt_u32_f32_e32 v9, v9 -; GFX9-NEXT: v_mul_hi_u32 v12, v8, v4 -; GFX9-NEXT: v_mul_lo_u32 v11, v8, v4 -; GFX9-NEXT: v_mul_f32_e32 v10, s4, v10 -; GFX9-NEXT: v_mul_lo_u32 v13, v9, v5 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v12 -; GFX9-NEXT: v_mul_hi_u32 v12, v9, v5 -; GFX9-NEXT: v_cvt_u32_f32_e32 v10, v10 -; GFX9-NEXT: v_sub_u32_e32 v19, 0, v11 -; GFX9-NEXT: v_cndmask_b32_e32 v11, v11, v19, vcc -; GFX9-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v12 -; GFX9-NEXT: v_sub_u32_e32 v19, 0, v13 -; GFX9-NEXT: v_cndmask_b32_e64 v13, v13, v19, s[0:1] -; GFX9-NEXT: v_mul_hi_u32 v19, v10, v6 -; GFX9-NEXT: v_ashrrev_i32_e32 v15, 31, v7 -; GFX9-NEXT: v_add_u32_e32 v7, v7, v15 +; GFX9-NEXT: v_cvt_f32_u32_e32 v12, v6 ; GFX9-NEXT: v_xor_b32_e32 v7, v7, v15 -; GFX9-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v19 -; GFX9-NEXT: v_cvt_f32_u32_e32 v19, v7 -; GFX9-NEXT: v_mul_hi_u32 v11, v11, v8 -; GFX9-NEXT: v_mul_lo_u32 v12, v10, v6 -; GFX9-NEXT: v_ashrrev_i32_e32 v14, 31, v3 -; GFX9-NEXT: v_rcp_iflag_f32_e32 v19, v19 -; GFX9-NEXT: v_add_u32_e32 v3, v3, v14 -; GFX9-NEXT: v_sub_u32_e32 v20, 0, v12 -; GFX9-NEXT: 
v_cndmask_b32_e64 v12, v12, v20, s[2:3] -; GFX9-NEXT: v_mul_f32_e32 v19, s4, v19 -; GFX9-NEXT: v_cvt_u32_f32_e32 v19, v19 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v8, v8 +; GFX9-NEXT: v_xor_b32_e32 v19, v14, v15 ; GFX9-NEXT: v_xor_b32_e32 v3, v3, v14 -; GFX9-NEXT: v_mul_hi_u32 v21, v19, v7 -; GFX9-NEXT: v_mul_lo_u32 v20, v19, v7 -; GFX9-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v21 -; GFX9-NEXT: v_add_u32_e32 v21, v8, v11 -; GFX9-NEXT: v_sub_u32_e32 v8, v8, v11 -; GFX9-NEXT: v_mul_hi_u32 v11, v13, v9 -; GFX9-NEXT: v_cndmask_b32_e32 v8, v8, v21, vcc -; GFX9-NEXT: v_mul_hi_u32 v8, v8, v0 -; GFX9-NEXT: v_sub_u32_e32 v22, 0, v20 -; GFX9-NEXT: v_add_u32_e32 v13, v9, v11 -; GFX9-NEXT: v_sub_u32_e32 v9, v9, v11 -; GFX9-NEXT: v_mul_hi_u32 v11, v12, v10 -; GFX9-NEXT: v_cndmask_b32_e64 v9, v9, v13, s[0:1] -; GFX9-NEXT: v_mul_hi_u32 v9, v9, v1 -; GFX9-NEXT: v_cndmask_b32_e64 v20, v20, v22, s[4:5] -; GFX9-NEXT: v_add_u32_e32 v12, v10, v11 -; GFX9-NEXT: v_sub_u32_e32 v10, v10, v11 -; GFX9-NEXT: v_cndmask_b32_e64 v10, v10, v12, s[2:3] +; GFX9-NEXT: v_cvt_f32_u32_e32 v14, v7 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v10, v10 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v12, v12 +; GFX9-NEXT: v_mul_f32_e32 v8, s4, v8 +; GFX9-NEXT: v_rcp_iflag_f32_e32 v14, v14 +; GFX9-NEXT: v_cvt_u32_f32_e32 v8, v8 +; GFX9-NEXT: v_mul_f32_e32 v10, s4, v10 +; GFX9-NEXT: v_mul_f32_e32 v12, s4, v12 +; GFX9-NEXT: v_cvt_u32_f32_e32 v10, v10 +; GFX9-NEXT: v_sub_u32_e32 v9, 0, v4 +; GFX9-NEXT: v_mul_f32_e32 v14, s4, v14 +; GFX9-NEXT: v_cvt_u32_f32_e32 v12, v12 +; GFX9-NEXT: v_mul_lo_u32 v9, v9, v8 +; GFX9-NEXT: v_cvt_u32_f32_e32 v14, v14 +; GFX9-NEXT: v_sub_u32_e32 v11, 0, v5 +; GFX9-NEXT: v_sub_u32_e32 v13, 0, v6 +; GFX9-NEXT: v_mul_lo_u32 v11, v11, v10 +; GFX9-NEXT: v_sub_u32_e32 v15, 0, v7 +; GFX9-NEXT: v_mul_lo_u32 v13, v13, v12 +; GFX9-NEXT: v_mul_lo_u32 v15, v15, v14 +; GFX9-NEXT: v_mul_hi_u32 v9, v8, v9 +; GFX9-NEXT: v_mul_hi_u32 v11, v10, v11 +; GFX9-NEXT: v_mul_hi_u32 v13, v12, v13 +; GFX9-NEXT: v_mul_hi_u32 v15, v14, v15 +; 
GFX9-NEXT: v_add_u32_e32 v8, v8, v9 +; GFX9-NEXT: v_mul_hi_u32 v8, v0, v8 +; GFX9-NEXT: v_add_u32_e32 v9, v10, v11 +; GFX9-NEXT: v_add_u32_e32 v10, v12, v13 +; GFX9-NEXT: v_mul_hi_u32 v9, v1, v9 +; GFX9-NEXT: v_add_u32_e32 v11, v14, v15 +; GFX9-NEXT: v_mul_hi_u32 v10, v2, v10 ; GFX9-NEXT: v_mul_lo_u32 v12, v8, v4 -; GFX9-NEXT: v_mul_hi_u32 v11, v20, v19 -; GFX9-NEXT: v_mul_hi_u32 v10, v10, v2 -; GFX9-NEXT: v_add_u32_e32 v13, 1, v8 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v12 +; GFX9-NEXT: v_mul_hi_u32 v11, v3, v11 +; GFX9-NEXT: v_mul_lo_u32 v14, v9, v5 +; GFX9-NEXT: v_mul_lo_u32 v15, v10, v6 ; GFX9-NEXT: v_sub_u32_e32 v0, v0, v12 -; GFX9-NEXT: v_mul_lo_u32 v12, v9, v5 -; GFX9-NEXT: v_add_u32_e32 v20, v19, v11 -; GFX9-NEXT: v_sub_u32_e32 v11, v19, v11 -; GFX9-NEXT: v_cndmask_b32_e64 v11, v11, v20, s[4:5] -; GFX9-NEXT: v_cmp_ge_u32_e64 s[2:3], v1, v12 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v12 -; GFX9-NEXT: v_mul_lo_u32 v12, v10, v6 -; GFX9-NEXT: v_mul_hi_u32 v11, v11, v3 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], v0, v4 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v5 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[6:7], v2, v12 -; GFX9-NEXT: v_sub_u32_e32 v2, v2, v12 -; GFX9-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GFX9-NEXT: v_cmp_ge_u32_e64 s[8:9], v2, v6 -; GFX9-NEXT: v_cndmask_b32_e64 v2, v8, v13, s[0:1] -; GFX9-NEXT: v_add_u32_e32 v0, 1, v9 -; GFX9-NEXT: s_and_b64 s[0:1], s[4:5], s[2:3] -; GFX9-NEXT: v_cndmask_b32_e64 v0, v9, v0, s[0:1] -; GFX9-NEXT: v_add_u32_e32 v1, 1, v10 -; GFX9-NEXT: s_and_b64 s[0:1], s[8:9], s[6:7] ; GFX9-NEXT: v_mul_lo_u32 v12, v11, v7 -; GFX9-NEXT: v_add_u32_e32 v19, -1, v8 -; GFX9-NEXT: v_cndmask_b32_e64 v1, v10, v1, s[0:1] -; GFX9-NEXT: v_add_u32_e32 v5, -1, v10 -; GFX9-NEXT: v_cndmask_b32_e32 v2, v19, v2, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[6:7] -; GFX9-NEXT: v_add_u32_e32 v4, -1, v9 -; GFX9-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[2:3] -; GFX9-NEXT: v_xor_b32_e32 v2, v2, v16 -; GFX9-NEXT: v_xor_b32_e32 v5, v1, v18 -; GFX9-NEXT: v_xor_b32_e32 v4, 
v0, v17 -; GFX9-NEXT: v_sub_u32_e32 v0, v2, v16 -; GFX9-NEXT: v_sub_u32_e32 v2, v5, v18 -; GFX9-NEXT: v_sub_u32_e32 v5, v3, v12 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v5, v7 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], v3, v12 -; GFX9-NEXT: v_add_u32_e32 v3, 1, v11 -; GFX9-NEXT: s_and_b64 vcc, vcc, s[0:1] -; GFX9-NEXT: v_add_u32_e32 v5, -1, v11 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v11, v3, vcc -; GFX9-NEXT: v_sub_u32_e32 v1, v4, v17 -; GFX9-NEXT: v_xor_b32_e32 v4, v14, v15 -; GFX9-NEXT: v_cndmask_b32_e64 v3, v5, v3, s[0:1] -; GFX9-NEXT: v_xor_b32_e32 v3, v3, v4 -; GFX9-NEXT: v_sub_u32_e32 v3, v3, v4 -; GFX9-NEXT: buffer_store_dwordx4 v[0:3], off, s[12:15], 0 +; GFX9-NEXT: v_sub_u32_e32 v1, v1, v14 +; GFX9-NEXT: v_add_u32_e32 v13, 1, v8 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; GFX9-NEXT: v_sub_u32_e32 v2, v2, v15 +; GFX9-NEXT: v_cndmask_b32_e32 v8, v8, v13, vcc +; GFX9-NEXT: v_sub_u32_e32 v13, v0, v4 +; GFX9-NEXT: v_add_u32_e32 v14, 1, v9 +; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], v1, v5 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v13, vcc +; GFX9-NEXT: v_sub_u32_e32 v3, v3, v12 +; GFX9-NEXT: v_cndmask_b32_e64 v9, v9, v14, s[0:1] +; GFX9-NEXT: v_sub_u32_e32 v14, v1, v5 +; GFX9-NEXT: v_add_u32_e32 v15, 1, v10 +; GFX9-NEXT: v_cmp_ge_u32_e64 s[2:3], v2, v6 +; GFX9-NEXT: v_cndmask_b32_e64 v10, v10, v15, s[2:3] +; GFX9-NEXT: v_sub_u32_e32 v15, v2, v6 +; GFX9-NEXT: v_cndmask_b32_e64 v1, v1, v14, s[0:1] +; GFX9-NEXT: v_add_u32_e32 v12, 1, v11 +; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v3, v7 +; GFX9-NEXT: v_add_u32_e32 v13, 1, v8 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v8, v13, vcc +; GFX9-NEXT: v_cndmask_b32_e64 v11, v11, v12, s[4:5] +; GFX9-NEXT: v_sub_u32_e32 v12, v3, v7 +; GFX9-NEXT: v_cndmask_b32_e64 v2, v2, v15, s[2:3] +; GFX9-NEXT: v_add_u32_e32 v14, 1, v9 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v1, v5 +; GFX9-NEXT: v_cndmask_b32_e32 v1, v9, v14, vcc +; GFX9-NEXT: v_cndmask_b32_e64 v3, v3, v12, s[4:5] +; GFX9-NEXT: v_add_u32_e32 v15, 1, v10 +; 
GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; GFX9-NEXT: v_cndmask_b32_e32 v2, v10, v15, vcc +; GFX9-NEXT: v_add_u32_e32 v12, 1, v11 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v3, v7 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v11, v12, vcc +; GFX9-NEXT: v_xor_b32_e32 v0, v0, v16 +; GFX9-NEXT: v_xor_b32_e32 v1, v1, v17 +; GFX9-NEXT: v_xor_b32_e32 v2, v2, v18 +; GFX9-NEXT: v_xor_b32_e32 v3, v3, v19 +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v16 +; GFX9-NEXT: v_sub_u32_e32 v1, v1, v17 +; GFX9-NEXT: v_sub_u32_e32 v2, v2, v18 +; GFX9-NEXT: v_sub_u32_e32 v3, v3, v19 +; GFX9-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0 ; GFX9-NEXT: s_endpgm ; ; EG-LABEL: sdiv_v4i32: @@ -2091,158 +2015,146 @@ define amdgpu_kernel void @v_sdiv_i24(i32 addrspace(1)* %out, i24 addrspace(1)* define amdgpu_kernel void @v_sdiv_i25(i32 addrspace(1)* %out, i25 addrspace(1)* %in) { ; GCN-LABEL: v_sdiv_i25: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9 +; GCN-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9 ; GCN-NEXT: s_mov_b32 s7, 0xf000 ; GCN-NEXT: s_mov_b32 s6, -1 -; GCN-NEXT: s_mov_b32 s2, s6 -; GCN-NEXT: s_mov_b32 s3, s7 +; GCN-NEXT: s_mov_b32 s10, s6 +; GCN-NEXT: s_mov_b32 s11, s7 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_mov_b32 s0, s10 -; GCN-NEXT: s_mov_b32 s1, s11 -; GCN-NEXT: buffer_load_dwordx2 v[0:1], off, s[0:3], 0 -; GCN-NEXT: s_mov_b32 s4, s8 -; GCN-NEXT: s_mov_b32 s5, s9 +; GCN-NEXT: s_mov_b32 s8, s2 +; GCN-NEXT: s_mov_b32 s9, s3 +; GCN-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 +; GCN-NEXT: s_mov_b32 s4, s0 +; GCN-NEXT: s_mov_b32 s5, s1 ; GCN-NEXT: s_waitcnt vmcnt(0) ; GCN-NEXT: v_bfe_i32 v2, v1, 0, 25 ; GCN-NEXT: v_bfe_i32 v1, v1, 24, 1 ; GCN-NEXT: v_add_i32_e32 v2, vcc, v1, v2 ; GCN-NEXT: v_xor_b32_e32 v2, v2, v1 ; GCN-NEXT: v_cvt_f32_u32_e32 v3, v2 -; GCN-NEXT: v_bfe_i32 v4, v0, 0, 25 +; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v2 +; GCN-NEXT: v_bfe_i32 v5, v0, 0, 25 ; GCN-NEXT: v_bfe_i32 v0, v0, 24, 1 -; GCN-NEXT: v_add_i32_e32 v4, vcc, v0, v4 ; GCN-NEXT: v_rcp_iflag_f32_e32 
v3, v3 -; GCN-NEXT: v_xor_b32_e32 v4, v4, v0 +; GCN-NEXT: v_add_i32_e32 v5, vcc, v0, v5 +; GCN-NEXT: v_xor_b32_e32 v5, v5, v0 ; GCN-NEXT: v_xor_b32_e32 v0, v0, v1 -; GCN-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; GCN-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GCN-NEXT: v_mul_lo_u32 v5, v3, v2 -; GCN-NEXT: v_mul_hi_u32 v6, v3, v2 -; GCN-NEXT: v_sub_i32_e32 v7, vcc, 0, v5 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v6 -; GCN-NEXT: v_cndmask_b32_e64 v5, v5, v7, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v5, v5, v3 -; GCN-NEXT: v_add_i32_e32 v6, vcc, v5, v3 -; GCN-NEXT: v_subrev_i32_e32 v3, vcc, v5, v3 -; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v6, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v3, v3, v4 +; GCN-NEXT: v_mul_lo_u32 v4, v4, v3 +; GCN-NEXT: v_mul_hi_u32 v4, v3, v4 +; GCN-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; GCN-NEXT: v_mul_hi_u32 v3, v5, v3 ; GCN-NEXT: v_mul_lo_u32 v1, v3, v2 -; GCN-NEXT: v_add_i32_e32 v5, vcc, 1, v3 -; GCN-NEXT: v_add_i32_e32 v6, vcc, -1, v3 -; GCN-NEXT: v_subrev_i32_e32 v7, vcc, v1, v4 -; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v4, v1 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v7, v2 -; GCN-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GCN-NEXT: v_cndmask_b32_e64 v1, v3, v5, s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, v1, v5 +; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], v1, v2 +; GCN-NEXT: v_cndmask_b32_e64 v3, v3, v4, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v4, vcc, v2, v1 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[0:1] +; GCN-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; GCN-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; GCN-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc ; GCN-NEXT: v_xor_b32_e32 v1, v1, v0 -; GCN-NEXT: v_sub_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v0, v1 ; GCN-NEXT: v_bfe_i32 v0, v0, 0, 25 ; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 ; GCN-NEXT: s_endpgm ; ; TONGA-LABEL: v_sdiv_i25: ; TONGA: ; %bb.0: -; TONGA-NEXT: s_load_dwordx4 s[8:11], s[0:1], 
0x24 +; TONGA-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24 ; TONGA-NEXT: s_mov_b32 s7, 0xf000 ; TONGA-NEXT: s_mov_b32 s6, -1 -; TONGA-NEXT: s_mov_b32 s2, s6 -; TONGA-NEXT: s_mov_b32 s3, s7 +; TONGA-NEXT: s_mov_b32 s10, s6 +; TONGA-NEXT: s_mov_b32 s11, s7 ; TONGA-NEXT: s_waitcnt lgkmcnt(0) -; TONGA-NEXT: s_mov_b32 s0, s10 -; TONGA-NEXT: s_mov_b32 s1, s11 -; TONGA-NEXT: buffer_load_dwordx2 v[0:1], off, s[0:3], 0 -; TONGA-NEXT: s_mov_b32 s4, s8 -; TONGA-NEXT: s_mov_b32 s5, s9 +; TONGA-NEXT: s_mov_b32 s8, s2 +; TONGA-NEXT: s_mov_b32 s9, s3 +; TONGA-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 +; TONGA-NEXT: s_mov_b32 s4, s0 +; TONGA-NEXT: s_mov_b32 s5, s1 ; TONGA-NEXT: s_waitcnt vmcnt(0) ; TONGA-NEXT: v_bfe_i32 v2, v1, 0, 25 ; TONGA-NEXT: v_bfe_i32 v1, v1, 24, 1 ; TONGA-NEXT: v_add_u32_e32 v2, vcc, v1, v2 ; TONGA-NEXT: v_xor_b32_e32 v2, v2, v1 ; TONGA-NEXT: v_cvt_f32_u32_e32 v3, v2 -; TONGA-NEXT: v_bfe_i32 v4, v0, 0, 25 +; TONGA-NEXT: v_sub_u32_e32 v4, vcc, 0, v2 +; TONGA-NEXT: v_bfe_i32 v5, v0, 0, 25 ; TONGA-NEXT: v_bfe_i32 v0, v0, 24, 1 -; TONGA-NEXT: v_add_u32_e32 v4, vcc, v0, v4 ; TONGA-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; TONGA-NEXT: v_xor_b32_e32 v4, v4, v0 +; TONGA-NEXT: v_add_u32_e32 v5, vcc, v0, v5 +; TONGA-NEXT: v_xor_b32_e32 v5, v5, v0 ; TONGA-NEXT: v_xor_b32_e32 v0, v0, v1 -; TONGA-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; TONGA-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; TONGA-NEXT: v_cvt_u32_f32_e32 v3, v3 -; TONGA-NEXT: v_mul_lo_u32 v5, v3, v2 -; TONGA-NEXT: v_mul_hi_u32 v6, v3, v2 -; TONGA-NEXT: v_sub_u32_e32 v7, vcc, 0, v5 -; TONGA-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v6 -; TONGA-NEXT: v_cndmask_b32_e64 v5, v5, v7, s[0:1] -; TONGA-NEXT: v_mul_hi_u32 v5, v5, v3 -; TONGA-NEXT: v_add_u32_e32 v6, vcc, v5, v3 -; TONGA-NEXT: v_subrev_u32_e32 v3, vcc, v5, v3 -; TONGA-NEXT: v_cndmask_b32_e64 v3, v3, v6, s[0:1] -; TONGA-NEXT: v_mul_hi_u32 v3, v3, v4 +; TONGA-NEXT: v_mul_lo_u32 v4, v4, v3 +; TONGA-NEXT: v_mul_hi_u32 v4, v3, v4 +; TONGA-NEXT: v_add_u32_e32 v3, vcc, 
v4, v3 +; TONGA-NEXT: v_mul_hi_u32 v3, v5, v3 ; TONGA-NEXT: v_mul_lo_u32 v1, v3, v2 -; TONGA-NEXT: v_add_u32_e32 v5, vcc, 1, v3 -; TONGA-NEXT: v_add_u32_e32 v6, vcc, -1, v3 -; TONGA-NEXT: v_subrev_u32_e32 v7, vcc, v1, v4 -; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v4, v1 -; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v7, v2 -; TONGA-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; TONGA-NEXT: v_cndmask_b32_e64 v1, v3, v5, s[0:1] -; TONGA-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc +; TONGA-NEXT: v_add_u32_e32 v4, vcc, 1, v3 +; TONGA-NEXT: v_subrev_u32_e32 v1, vcc, v1, v5 +; TONGA-NEXT: v_cmp_ge_u32_e64 s[0:1], v1, v2 +; TONGA-NEXT: v_cndmask_b32_e64 v3, v3, v4, s[0:1] +; TONGA-NEXT: v_subrev_u32_e32 v4, vcc, v2, v1 +; TONGA-NEXT: v_cndmask_b32_e64 v1, v1, v4, s[0:1] +; TONGA-NEXT: v_add_u32_e32 v4, vcc, 1, v3 +; TONGA-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; TONGA-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc ; TONGA-NEXT: v_xor_b32_e32 v1, v1, v0 -; TONGA-NEXT: v_sub_u32_e32 v0, vcc, v1, v0 +; TONGA-NEXT: v_subrev_u32_e32 v0, vcc, v0, v1 ; TONGA-NEXT: v_bfe_i32 v0, v0, 0, 25 ; TONGA-NEXT: buffer_store_dword v0, off, s[4:7], 0 ; TONGA-NEXT: s_endpgm ; ; GFX9-LABEL: v_sdiv_i25: ; GFX9: ; %bb.0: -; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24 -; GFX9-NEXT: s_mov_b32 s7, 0xf000 -; GFX9-NEXT: s_mov_b32 s6, -1 -; GFX9-NEXT: s_mov_b32 s10, s6 -; GFX9-NEXT: s_mov_b32 s11, s7 +; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24 +; GFX9-NEXT: s_mov_b32 s3, 0xf000 +; GFX9-NEXT: s_mov_b32 s2, -1 +; GFX9-NEXT: s_mov_b32 s10, s2 +; GFX9-NEXT: s_mov_b32 s11, s3 ; GFX9-NEXT: s_waitcnt lgkmcnt(0) -; GFX9-NEXT: s_mov_b32 s8, s2 -; GFX9-NEXT: s_mov_b32 s9, s3 +; GFX9-NEXT: s_mov_b32 s8, s6 +; GFX9-NEXT: s_mov_b32 s9, s7 ; GFX9-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 -; GFX9-NEXT: s_mov_b32 s4, s0 -; GFX9-NEXT: s_mov_b32 s5, s1 +; GFX9-NEXT: s_mov_b32 s0, s4 +; GFX9-NEXT: s_mov_b32 s1, s5 ; GFX9-NEXT: s_waitcnt vmcnt(0) ; GFX9-NEXT: v_bfe_i32 v2, v1, 0, 25 ; GFX9-NEXT: v_bfe_i32 v1, v1, 24, 1 ; GFX9-NEXT: 
v_add_u32_e32 v2, v2, v1 ; GFX9-NEXT: v_xor_b32_e32 v2, v2, v1 ; GFX9-NEXT: v_cvt_f32_u32_e32 v3, v2 -; GFX9-NEXT: v_bfe_i32 v6, v0, 0, 25 +; GFX9-NEXT: v_sub_u32_e32 v4, 0, v2 +; GFX9-NEXT: v_bfe_i32 v5, v0, 0, 25 ; GFX9-NEXT: v_bfe_i32 v0, v0, 24, 1 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; GFX9-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 -; GFX9-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GFX9-NEXT: v_mul_lo_u32 v4, v3, v2 -; GFX9-NEXT: v_mul_hi_u32 v5, v3, v2 -; GFX9-NEXT: v_sub_u32_e32 v7, 0, v4 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc -; GFX9-NEXT: v_mul_hi_u32 v4, v4, v3 -; GFX9-NEXT: v_add_u32_e32 v5, v6, v0 +; GFX9-NEXT: v_add_u32_e32 v5, v5, v0 ; GFX9-NEXT: v_xor_b32_e32 v5, v5, v0 ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v1 -; GFX9-NEXT: v_add_u32_e32 v6, v3, v4 -; GFX9-NEXT: v_sub_u32_e32 v3, v3, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v6, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v5 +; GFX9-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 +; GFX9-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GFX9-NEXT: v_mul_lo_u32 v4, v4, v3 +; GFX9-NEXT: v_mul_hi_u32 v4, v3, v4 +; GFX9-NEXT: v_add_u32_e32 v3, v3, v4 +; GFX9-NEXT: v_mul_hi_u32 v3, v5, v3 ; GFX9-NEXT: v_mul_lo_u32 v4, v3, v2 ; GFX9-NEXT: v_add_u32_e32 v1, 1, v3 -; GFX9-NEXT: v_add_u32_e32 v6, -1, v3 -; GFX9-NEXT: v_sub_u32_e32 v7, v5, v4 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v5, v4 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[0:1], v7, v2 -; GFX9-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GFX9-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[0:1] -; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc +; GFX9-NEXT: v_sub_u32_e32 v4, v5, v4 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v4, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc +; GFX9-NEXT: v_sub_u32_e32 v3, v4, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v4, v3, vcc +; GFX9-NEXT: v_add_u32_e32 v4, 1, v1 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v3, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc ; GFX9-NEXT: v_xor_b32_e32 v1, v1, v0 ; GFX9-NEXT: v_sub_u32_e32 v0, v1, v0 ; GFX9-NEXT: 
v_bfe_i32 v0, v0, 0, 25 -; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0 +; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 ; GFX9-NEXT: s_endpgm ; ; EG-LABEL: v_sdiv_i25: diff --git a/llvm/test/CodeGen/AMDGPU/udivrem.ll b/llvm/test/CodeGen/AMDGPU/udivrem.ll index f581c4709de7..be06c3d10431 100644 --- a/llvm/test/CodeGen/AMDGPU/udivrem.ll +++ b/llvm/test/CodeGen/AMDGPU/udivrem.ll @@ -42,46 +42,40 @@ define amdgpu_kernel void @test_udivrem(i32 addrspace(1)* %out0, [8 x i32], i32 ; ; GFX6-LABEL: test_udivrem: ; GFX6: ; %bb.0: -; GFX6-NEXT: s_load_dword s12, s[0:1], 0x26 -; GFX6-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 -; GFX6-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x13 -; GFX6-NEXT: s_load_dword s0, s[0:1], 0x1d -; GFX6-NEXT: s_mov_b32 s7, 0xf000 -; GFX6-NEXT: s_mov_b32 s6, -1 -; GFX6-NEXT: s_mov_b32 s10, s6 -; GFX6-NEXT: s_waitcnt lgkmcnt(0) -; GFX6-NEXT: v_cvt_f32_u32_e32 v0, s12 -; GFX6-NEXT: s_mov_b32 s11, s7 -; GFX6-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GFX6-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 -; GFX6-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX6-NEXT: v_mul_lo_u32 v1, v0, s12 -; GFX6-NEXT: v_mul_hi_u32 v2, v0, s12 -; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GFX6-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GFX6-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[2:3] -; GFX6-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX6-NEXT: v_add_i32_e32 v2, vcc, v1, v0 -; GFX6-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GFX6-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[2:3] -; GFX6-NEXT: v_mul_hi_u32 v0, v0, s0 -; GFX6-NEXT: v_mul_lo_u32 v1, v0, s12 -; GFX6-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GFX6-NEXT: v_add_i32_e32 v3, vcc, -1, v0 -; GFX6-NEXT: v_sub_i32_e32 v4, vcc, s0, v1 -; GFX6-NEXT: v_cmp_ge_u32_e64 s[0:1], s0, v1 -; GFX6-NEXT: v_cmp_le_u32_e64 s[2:3], s12, v4 -; GFX6-NEXT: v_subrev_i32_e32 v1, vcc, s12, v4 -; GFX6-NEXT: v_add_i32_e32 v5, vcc, s12, v4 -; GFX6-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GFX6-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GFX6-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[0:1] -; 
GFX6-NEXT: v_cndmask_b32_e32 v1, v4, v1, vcc -; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0 -; GFX6-NEXT: s_waitcnt expcnt(0) -; GFX6-NEXT: v_cndmask_b32_e64 v0, v5, v1, s[0:1] -; GFX6-NEXT: buffer_store_dword v0, off, s[8:11], 0 -; GFX6-NEXT: s_endpgm +; GFX6-NEXT: s_load_dword s3, s[0:1], 0x26 +; GFX6-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 +; GFX6-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x13 +; GFX6-NEXT: s_load_dword s0, s[0:1], 0x1d +; GFX6-NEXT: s_mov_b32 s7, 0xf000 +; GFX6-NEXT: s_mov_b32 s6, -1 +; GFX6-NEXT: s_mov_b32 s10, s6 +; GFX6-NEXT: s_waitcnt lgkmcnt(0) +; GFX6-NEXT: v_cvt_f32_u32_e32 v0, s3 +; GFX6-NEXT: s_sub_i32 s2, 0, s3 +; GFX6-NEXT: s_mov_b32 s11, s7 +; GFX6-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GFX6-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 +; GFX6-NEXT: v_cvt_u32_f32_e32 v0, v0 +; GFX6-NEXT: v_mul_lo_u32 v1, s2, v0 +; GFX6-NEXT: v_mul_hi_u32 v1, v0, v1 +; GFX6-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GFX6-NEXT: v_mul_hi_u32 v0, s0, v0 +; GFX6-NEXT: v_mul_lo_u32 v1, v0, s3 +; GFX6-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; GFX6-NEXT: v_sub_i32_e32 v1, vcc, s0, v1 +; GFX6-NEXT: v_cmp_le_u32_e64 s[0:1], s3, v1 +; GFX6-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] +; GFX6-NEXT: v_subrev_i32_e32 v2, vcc, s3, v1 +; GFX6-NEXT: v_cndmask_b32_e64 v1, v1, v2, s[0:1] +; GFX6-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; GFX6-NEXT: v_cmp_le_u32_e64 s[0:1], s3, v1 +; GFX6-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] +; GFX6-NEXT: v_subrev_i32_e32 v2, vcc, s3, v1 +; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0 +; GFX6-NEXT: s_waitcnt expcnt(0) +; GFX6-NEXT: v_cndmask_b32_e64 v0, v1, v2, s[0:1] +; GFX6-NEXT: buffer_store_dword v0, off, s[8:11], 0 +; GFX6-NEXT: s_endpgm ; ; GFX8-LABEL: test_udivrem: ; GFX8: ; %bb.0: @@ -89,39 +83,33 @@ define amdgpu_kernel void @test_udivrem(i32 addrspace(1)* %out0, [8 x i32], i32 ; GFX8-NEXT: s_load_dword s6, s[0:1], 0x74 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) ; GFX8-NEXT: v_cvt_f32_u32_e32 v0, s7 +; GFX8-NEXT: s_sub_i32 s2, 0, s7 ; GFX8-NEXT: 
v_rcp_iflag_f32_e32 v0, v0 -; GFX8-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GFX8-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GFX8-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX8-NEXT: v_mul_lo_u32 v1, v0, s7 -; GFX8-NEXT: v_mul_hi_u32 v2, v0, s7 -; GFX8-NEXT: v_sub_u32_e32 v3, vcc, 0, v1 -; GFX8-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GFX8-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[2:3] -; GFX8-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX8-NEXT: v_add_u32_e32 v2, vcc, v1, v0 -; GFX8-NEXT: v_subrev_u32_e32 v0, vcc, v1, v0 -; GFX8-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[2:3] -; GFX8-NEXT: v_mul_hi_u32 v2, v0, s6 +; GFX8-NEXT: v_mul_lo_u32 v1, s2, v0 ; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24 ; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x4c -; GFX8-NEXT: v_mul_lo_u32 v3, v2, s7 +; GFX8-NEXT: v_mul_hi_u32 v1, v0, v1 +; GFX8-NEXT: v_add_u32_e32 v0, vcc, v1, v0 +; GFX8-NEXT: v_mul_hi_u32 v2, s6, v0 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) ; GFX8-NEXT: v_mov_b32_e32 v0, s2 ; GFX8-NEXT: v_mov_b32_e32 v1, s3 +; GFX8-NEXT: v_mul_lo_u32 v3, v2, s7 +; GFX8-NEXT: v_add_u32_e32 v4, vcc, 1, v2 +; GFX8-NEXT: v_sub_u32_e32 v3, vcc, s6, v3 +; GFX8-NEXT: v_cmp_le_u32_e64 s[0:1], s7, v3 +; GFX8-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] +; GFX8-NEXT: v_subrev_u32_e32 v4, vcc, s7, v3 +; GFX8-NEXT: v_cndmask_b32_e64 v3, v3, v4, s[0:1] ; GFX8-NEXT: v_add_u32_e32 v4, vcc, 1, v2 -; GFX8-NEXT: v_sub_u32_e32 v6, vcc, s6, v3 -; GFX8-NEXT: v_cmp_ge_u32_e64 s[0:1], s6, v3 -; GFX8-NEXT: v_add_u32_e32 v5, vcc, -1, v2 -; GFX8-NEXT: v_cmp_le_u32_e64 s[2:3], s7, v6 -; GFX8-NEXT: v_subrev_u32_e32 v3, vcc, s7, v6 -; GFX8-NEXT: v_add_u32_e32 v7, vcc, s7, v6 -; GFX8-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GFX8-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GFX8-NEXT: v_cndmask_b32_e64 v2, v5, v2, s[0:1] +; GFX8-NEXT: v_cmp_le_u32_e64 s[0:1], s7, v3 +; GFX8-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] ; GFX8-NEXT: flat_store_dword v[0:1], v2 -; GFX8-NEXT: v_cndmask_b32_e32 v3, v6, v3, vcc +; GFX8-NEXT: v_subrev_u32_e32 v4, vcc, s7, v3 ; 
GFX8-NEXT: v_mov_b32_e32 v0, s4 -; GFX8-NEXT: v_cndmask_b32_e64 v2, v7, v3, s[0:1] +; GFX8-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1] ; GFX8-NEXT: v_mov_b32_e32 v1, s5 ; GFX8-NEXT: flat_store_dword v[0:1], v2 ; GFX8-NEXT: s_endpgm @@ -184,114 +172,90 @@ define amdgpu_kernel void @test_udivrem_v2(<2 x i32> addrspace(1)* %out, <2 x i3 ; GFX6-LABEL: test_udivrem_v2: ; GFX6: ; %bb.0: ; GFX6-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0xb -; GFX6-NEXT: s_mov_b32 s2, 0x4f800000 -; GFX6-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9 -; GFX6-NEXT: s_mov_b32 s11, 0xf000 -; GFX6-NEXT: s_mov_b32 s10, -1 +; GFX6-NEXT: s_mov_b32 s3, 0x4f7ffffe +; GFX6-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 ; GFX6-NEXT: s_waitcnt lgkmcnt(0) ; GFX6-NEXT: v_cvt_f32_u32_e32 v0, s6 +; GFX6-NEXT: s_sub_i32 s2, 0, s6 ; GFX6-NEXT: v_cvt_f32_u32_e32 v1, s7 ; GFX6-NEXT: v_rcp_iflag_f32_e32 v0, v0 ; GFX6-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX6-NEXT: v_mul_f32_e32 v0, s2, v0 +; GFX6-NEXT: v_mul_f32_e32 v0, s3, v0 ; GFX6-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX6-NEXT: v_mul_f32_e32 v1, s2, v1 +; GFX6-NEXT: v_mul_f32_e32 v1, s3, v1 ; GFX6-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX6-NEXT: v_mul_lo_u32 v2, v0, s6 -; GFX6-NEXT: v_mul_hi_u32 v3, v0, s6 -; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v2 -; GFX6-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v3 -; GFX6-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] -; GFX6-NEXT: v_mul_hi_u32 v2, v2, v0 -; GFX6-NEXT: v_mul_lo_u32 v3, v1, s7 -; GFX6-NEXT: v_add_i32_e32 v4, vcc, v2, v0 -; GFX6-NEXT: v_subrev_i32_e32 v0, vcc, v2, v0 -; GFX6-NEXT: v_mul_hi_u32 v2, v1, s7 -; GFX6-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[0:1] -; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v3 -; GFX6-NEXT: v_mul_hi_u32 v0, v0, s4 -; GFX6-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GFX6-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1] -; GFX6-NEXT: v_mul_hi_u32 v2, v2, v1 +; GFX6-NEXT: s_mov_b32 s3, 0xf000 +; GFX6-NEXT: v_mul_lo_u32 v2, s2, v0 +; GFX6-NEXT: s_sub_i32 s2, 0, s7 +; GFX6-NEXT: v_mul_hi_u32 v2, v0, v2 +; GFX6-NEXT: v_add_i32_e32 v0, vcc, 
v2, v0 +; GFX6-NEXT: v_mul_hi_u32 v0, s4, v0 +; GFX6-NEXT: v_mul_lo_u32 v2, s2, v1 +; GFX6-NEXT: s_mov_b32 s2, -1 ; GFX6-NEXT: v_mul_lo_u32 v0, v0, s6 -; GFX6-NEXT: v_add_i32_e32 v5, vcc, v2, v1 -; GFX6-NEXT: v_subrev_i32_e32 v1, vcc, v2, v1 -; GFX6-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[0:1] -; GFX6-NEXT: v_mul_hi_u32 v1, v1, s5 -; GFX6-NEXT: v_sub_i32_e32 v3, vcc, s4, v0 -; GFX6-NEXT: v_cmp_ge_u32_e64 s[0:1], s4, v0 -; GFX6-NEXT: v_cmp_le_u32_e64 s[2:3], s6, v3 +; GFX6-NEXT: v_mul_hi_u32 v2, v1, v2 +; GFX6-NEXT: v_sub_i32_e32 v0, vcc, s4, v0 +; GFX6-NEXT: v_subrev_i32_e32 v3, vcc, s6, v0 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; GFX6-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX6-NEXT: v_subrev_i32_e32 v3, vcc, s6, v0 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; GFX6-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX6-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; GFX6-NEXT: v_mul_hi_u32 v1, s5, v1 ; GFX6-NEXT: v_mul_lo_u32 v1, v1, s7 -; GFX6-NEXT: v_add_i32_e32 v4, vcc, s6, v3 -; GFX6-NEXT: v_subrev_i32_e32 v0, vcc, s6, v3 -; GFX6-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GFX6-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GFX6-NEXT: v_sub_i32_e32 v2, vcc, s5, v1 -; GFX6-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[0:1] -; GFX6-NEXT: v_cmp_ge_u32_e64 s[2:3], s5, v1 -; GFX6-NEXT: v_cmp_le_u32_e64 s[0:1], s7, v2 -; GFX6-NEXT: v_add_i32_e32 v3, vcc, s7, v2 -; GFX6-NEXT: v_subrev_i32_e32 v1, vcc, s7, v2 -; GFX6-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GFX6-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc -; GFX6-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[2:3] -; GFX6-NEXT: buffer_store_dwordx2 v[0:1], off, s[8:11], 0 +; GFX6-NEXT: v_sub_i32_e32 v1, vcc, s5, v1 +; GFX6-NEXT: v_subrev_i32_e32 v2, vcc, s7, v1 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s7, v1 +; GFX6-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GFX6-NEXT: v_subrev_i32_e32 v2, vcc, s7, v1 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s7, v1 +; GFX6-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GFX6-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0 ; GFX6-NEXT: 
s_endpgm ; ; GFX8-LABEL: test_udivrem_v2: ; GFX8: ; %bb.0: -; GFX8-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x2c -; GFX8-NEXT: s_mov_b32 s2, 0x4f800000 -; GFX8-NEXT: s_load_dwordx2 s[6:7], s[0:1], 0x24 +; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x2c +; GFX8-NEXT: s_mov_b32 s3, 0x4f7ffffe +; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) -; GFX8-NEXT: v_cvt_f32_u32_e32 v0, s10 -; GFX8-NEXT: v_cvt_f32_u32_e32 v1, s11 +; GFX8-NEXT: v_cvt_f32_u32_e32 v0, s6 +; GFX8-NEXT: s_sub_i32 s2, 0, s6 +; GFX8-NEXT: v_cvt_f32_u32_e32 v1, s7 ; GFX8-NEXT: v_rcp_iflag_f32_e32 v0, v0 ; GFX8-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX8-NEXT: v_mul_f32_e32 v0, s2, v0 +; GFX8-NEXT: v_mul_f32_e32 v0, s3, v0 ; GFX8-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX8-NEXT: v_mul_f32_e32 v1, s2, v1 +; GFX8-NEXT: v_mul_f32_e32 v1, s3, v1 ; GFX8-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX8-NEXT: v_mul_hi_u32 v2, v0, s10 -; GFX8-NEXT: v_mul_lo_u32 v3, v0, s10 -; GFX8-NEXT: v_mul_hi_u32 v4, v1, s11 -; GFX8-NEXT: v_mul_lo_u32 v5, v1, s11 -; GFX8-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GFX8-NEXT: v_sub_u32_e32 v6, vcc, 0, v3 -; GFX8-NEXT: v_cndmask_b32_e64 v2, v3, v6, s[2:3] -; GFX8-NEXT: v_mul_hi_u32 v2, v2, v0 -; GFX8-NEXT: v_sub_u32_e32 v3, vcc, 0, v5 -; GFX8-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v4 -; GFX8-NEXT: v_add_u32_e32 v6, vcc, v2, v0 -; GFX8-NEXT: v_subrev_u32_e32 v0, vcc, v2, v0 -; GFX8-NEXT: v_cndmask_b32_e64 v2, v5, v3, s[4:5] -; GFX8-NEXT: v_mul_hi_u32 v2, v2, v1 -; GFX8-NEXT: v_cndmask_b32_e64 v0, v0, v6, s[2:3] -; GFX8-NEXT: v_mul_hi_u32 v0, v0, s8 -; GFX8-NEXT: v_add_u32_e32 v3, vcc, v2, v1 -; GFX8-NEXT: v_subrev_u32_e32 v1, vcc, v2, v1 -; GFX8-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[4:5] -; GFX8-NEXT: v_mul_hi_u32 v1, v1, s9 -; GFX8-NEXT: v_mul_lo_u32 v0, v0, s10 -; GFX8-NEXT: v_mul_lo_u32 v1, v1, s11 -; GFX8-NEXT: v_sub_u32_e32 v4, vcc, s8, v0 -; GFX8-NEXT: v_cmp_ge_u32_e64 s[0:1], s8, v0 -; GFX8-NEXT: v_cmp_le_u32_e64 s[2:3], s10, v4 -; GFX8-NEXT: v_add_u32_e32 v5, vcc, 
s10, v4 -; GFX8-NEXT: v_subrev_u32_e32 v0, vcc, s10, v4 -; GFX8-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GFX8-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc -; GFX8-NEXT: v_sub_u32_e32 v2, vcc, s9, v1 -; GFX8-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[0:1] -; GFX8-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v1 -; GFX8-NEXT: v_cmp_le_u32_e64 s[0:1], s11, v2 -; GFX8-NEXT: v_add_u32_e32 v3, vcc, s11, v2 -; GFX8-NEXT: v_subrev_u32_e32 v1, vcc, s11, v2 -; GFX8-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GFX8-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc -; GFX8-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[2:3] -; GFX8-NEXT: v_mov_b32_e32 v2, s6 -; GFX8-NEXT: v_mov_b32_e32 v3, s7 +; GFX8-NEXT: v_mul_lo_u32 v2, s2, v0 +; GFX8-NEXT: s_sub_i32 s2, 0, s7 +; GFX8-NEXT: v_mul_hi_u32 v2, v0, v2 +; GFX8-NEXT: v_add_u32_e32 v0, vcc, v2, v0 +; GFX8-NEXT: v_mul_hi_u32 v0, s4, v0 +; GFX8-NEXT: v_mul_lo_u32 v2, s2, v1 +; GFX8-NEXT: v_mul_lo_u32 v0, v0, s6 +; GFX8-NEXT: v_mul_hi_u32 v2, v1, v2 +; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s4, v0 +; GFX8-NEXT: v_subrev_u32_e32 v3, vcc, s6, v0 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; GFX8-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX8-NEXT: v_subrev_u32_e32 v3, vcc, s6, v0 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; GFX8-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX8-NEXT: v_add_u32_e32 v1, vcc, v2, v1 +; GFX8-NEXT: v_mul_hi_u32 v1, s5, v1 +; GFX8-NEXT: v_mul_lo_u32 v1, v1, s7 +; GFX8-NEXT: v_sub_u32_e32 v1, vcc, s5, v1 +; GFX8-NEXT: v_subrev_u32_e32 v2, vcc, s7, v1 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s7, v1 +; GFX8-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GFX8-NEXT: v_subrev_u32_e32 v2, vcc, s7, v1 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s7, v1 +; GFX8-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc +; GFX8-NEXT: v_mov_b32_e32 v3, s1 +; GFX8-NEXT: v_mov_b32_e32 v2, s0 ; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1] ; GFX8-NEXT: s_endpgm %result0 = udiv <2 x i32> %x, %y @@ -390,207 +354,159 @@ define amdgpu_kernel void @test_udivrem_v4(<4 x i32> addrspace(1)* %out, <4 x i3 ; ; 
GFX6-LABEL: test_udivrem_v4: ; GFX6: ; %bb.0: -; GFX6-NEXT: s_load_dwordx8 s[8:15], s[0:1], 0xd -; GFX6-NEXT: s_mov_b32 s6, 0x4f800000 -; GFX6-NEXT: s_load_dwordx2 s[16:17], s[0:1], 0x9 -; GFX6-NEXT: s_mov_b32 s19, 0xf000 -; GFX6-NEXT: s_mov_b32 s18, -1 +; GFX6-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0xd +; GFX6-NEXT: s_mov_b32 s12, 0x4f7ffffe +; GFX6-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 ; GFX6-NEXT: s_waitcnt lgkmcnt(0) -; GFX6-NEXT: v_cvt_f32_u32_e32 v0, s12 -; GFX6-NEXT: v_cvt_f32_u32_e32 v1, s13 -; GFX6-NEXT: v_cvt_f32_u32_e32 v7, s15 +; GFX6-NEXT: v_cvt_f32_u32_e32 v0, s8 +; GFX6-NEXT: s_sub_i32 s2, 0, s8 +; GFX6-NEXT: v_cvt_f32_u32_e32 v1, s9 +; GFX6-NEXT: v_cvt_f32_u32_e32 v4, s11 ; GFX6-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GFX6-NEXT: s_sub_i32 s3, 0, s9 ; GFX6-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX6-NEXT: v_mul_f32_e32 v0, s6, v0 +; GFX6-NEXT: v_cvt_f32_u32_e32 v2, s10 +; GFX6-NEXT: v_mul_f32_e32 v0, s12, v0 ; GFX6-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX6-NEXT: v_mul_f32_e32 v1, s6, v1 +; GFX6-NEXT: v_mul_f32_e32 v1, s12, v1 ; GFX6-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX6-NEXT: v_mul_lo_u32 v2, v0, s12 -; GFX6-NEXT: v_mul_hi_u32 v3, v0, s12 -; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v2 -; GFX6-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v3 -; GFX6-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[0:1] -; GFX6-NEXT: v_mul_hi_u32 v2, v2, v0 -; GFX6-NEXT: v_mul_lo_u32 v3, v1, s13 -; GFX6-NEXT: v_add_i32_e32 v4, vcc, v2, v0 -; GFX6-NEXT: v_subrev_i32_e32 v0, vcc, v2, v0 -; GFX6-NEXT: v_mul_hi_u32 v2, v1, s13 -; GFX6-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[0:1] -; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v3 -; GFX6-NEXT: v_mul_hi_u32 v0, v0, s8 -; GFX6-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GFX6-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1] -; GFX6-NEXT: v_mul_hi_u32 v2, v2, v1 -; GFX6-NEXT: v_mul_lo_u32 v0, v0, s12 -; GFX6-NEXT: v_add_i32_e32 v5, vcc, v2, v1 -; GFX6-NEXT: v_subrev_i32_e32 v1, vcc, v2, v1 -; GFX6-NEXT: v_cvt_f32_u32_e32 v2, s14 -; GFX6-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[0:1] 
-; GFX6-NEXT: v_mul_hi_u32 v1, v1, s9 -; GFX6-NEXT: v_sub_i32_e32 v3, vcc, s8, v0 ; GFX6-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GFX6-NEXT: v_cmp_ge_u32_e64 s[4:5], s8, v0 -; GFX6-NEXT: v_mul_lo_u32 v1, v1, s13 -; GFX6-NEXT: v_cmp_le_u32_e64 s[2:3], s12, v3 -; GFX6-NEXT: v_mul_f32_e32 v2, s6, v2 +; GFX6-NEXT: v_mul_lo_u32 v3, s2, v0 +; GFX6-NEXT: s_sub_i32 s2, 0, s10 +; GFX6-NEXT: v_mul_f32_e32 v2, s12, v2 +; GFX6-NEXT: v_mul_hi_u32 v3, v0, v3 ; GFX6-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GFX6-NEXT: v_add_i32_e32 v4, vcc, s12, v3 -; GFX6-NEXT: v_subrev_i32_e32 v0, vcc, s12, v3 -; GFX6-NEXT: s_and_b64 vcc, s[2:3], s[4:5] -; GFX6-NEXT: v_mul_lo_u32 v5, v2, s14 -; GFX6-NEXT: v_mul_hi_u32 v6, v2, s14 -; GFX6-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GFX6-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] -; GFX6-NEXT: v_sub_i32_e32 v3, vcc, s9, v1 -; GFX6-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v1 -; GFX6-NEXT: v_sub_i32_e32 v1, vcc, 0, v5 -; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v6 -; GFX6-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[4:5] -; GFX6-NEXT: v_mul_hi_u32 v1, v1, v2 -; GFX6-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v3 -; GFX6-NEXT: v_add_i32_e32 v4, vcc, s13, v3 -; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s13, v3 -; GFX6-NEXT: v_add_i32_e32 v6, vcc, v1, v2 -; GFX6-NEXT: v_subrev_i32_e32 v1, vcc, v1, v2 -; GFX6-NEXT: v_cndmask_b32_e64 v1, v1, v6, s[4:5] -; GFX6-NEXT: v_mul_hi_u32 v1, v1, s10 -; GFX6-NEXT: v_rcp_iflag_f32_e32 v2, v7 -; GFX6-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GFX6-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX6-NEXT: v_mul_lo_u32 v5, v1, s14 -; GFX6-NEXT: v_mul_f32_e32 v1, s6, v2 -; GFX6-NEXT: v_cvt_u32_f32_e32 v2, v1 -; GFX6-NEXT: v_cndmask_b32_e64 v1, v4, v3, s[2:3] -; GFX6-NEXT: v_sub_i32_e32 v3, vcc, s10, v5 -; GFX6-NEXT: v_cmp_le_u32_e64 s[0:1], s14, v3 -; GFX6-NEXT: v_mul_lo_u32 v4, v2, s15 -; GFX6-NEXT: v_mul_hi_u32 v6, v2, s15 -; GFX6-NEXT: v_sub_i32_e32 v7, vcc, 0, v4 -; GFX6-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v6 -; GFX6-NEXT: v_cndmask_b32_e64 v4, v4, v7, s[2:3] -; 
GFX6-NEXT: v_mul_hi_u32 v4, v4, v2 -; GFX6-NEXT: v_add_i32_e32 v6, vcc, s14, v3 -; GFX6-NEXT: v_add_i32_e32 v7, vcc, v4, v2 -; GFX6-NEXT: v_subrev_i32_e32 v2, vcc, v4, v2 -; GFX6-NEXT: v_cndmask_b32_e64 v2, v2, v7, s[2:3] -; GFX6-NEXT: v_mul_hi_u32 v2, v2, s11 -; GFX6-NEXT: v_cmp_ge_u32_e64 s[2:3], s10, v5 -; GFX6-NEXT: v_subrev_i32_e32 v4, vcc, s14, v3 -; GFX6-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GFX6-NEXT: v_mul_lo_u32 v5, v2, s15 -; GFX6-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc -; GFX6-NEXT: v_cndmask_b32_e64 v2, v6, v2, s[2:3] -; GFX6-NEXT: v_sub_i32_e32 v3, vcc, s11, v5 -; GFX6-NEXT: v_cmp_ge_u32_e64 s[2:3], s11, v5 -; GFX6-NEXT: v_cmp_le_u32_e64 s[0:1], s15, v3 -; GFX6-NEXT: v_add_i32_e32 v4, vcc, s15, v3 -; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s15, v3 -; GFX6-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GFX6-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX6-NEXT: v_cndmask_b32_e64 v3, v4, v3, s[2:3] -; GFX6-NEXT: buffer_store_dwordx4 v[0:3], off, s[16:19], 0 +; GFX6-NEXT: v_add_i32_e32 v0, vcc, v3, v0 +; GFX6-NEXT: v_mul_hi_u32 v0, s4, v0 +; GFX6-NEXT: v_rcp_iflag_f32_e32 v3, v4 +; GFX6-NEXT: v_mul_lo_u32 v4, s3, v1 +; GFX6-NEXT: s_mov_b32 s3, 0xf000 +; GFX6-NEXT: v_mul_lo_u32 v0, v0, s8 +; GFX6-NEXT: v_mul_f32_e32 v3, s12, v3 +; GFX6-NEXT: v_mul_hi_u32 v4, v1, v4 +; GFX6-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GFX6-NEXT: v_sub_i32_e32 v0, vcc, s4, v0 +; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s8, v0 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GFX6-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s8, v0 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GFX6-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GFX6-NEXT: v_add_i32_e32 v1, vcc, v4, v1 +; GFX6-NEXT: v_mul_hi_u32 v1, s5, v1 +; GFX6-NEXT: v_mul_lo_u32 v4, s2, v2 +; GFX6-NEXT: s_sub_i32 s2, 0, s11 +; GFX6-NEXT: v_mul_lo_u32 v1, v1, s9 +; GFX6-NEXT: v_mul_hi_u32 v4, v2, v4 +; GFX6-NEXT: v_sub_i32_e32 v1, vcc, s5, v1 +; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s9, v1 +; GFX6-NEXT: v_cmp_le_u32_e32 
vcc, s9, v1 +; GFX6-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s9, v1 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GFX6-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GFX6-NEXT: v_add_i32_e32 v2, vcc, v4, v2 +; GFX6-NEXT: v_mul_hi_u32 v2, s6, v2 +; GFX6-NEXT: v_mul_lo_u32 v4, s2, v3 +; GFX6-NEXT: s_mov_b32 s2, -1 +; GFX6-NEXT: v_mul_lo_u32 v2, v2, s10 +; GFX6-NEXT: v_mul_hi_u32 v4, v3, v4 +; GFX6-NEXT: v_sub_i32_e32 v2, vcc, s6, v2 +; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s10, v2 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s10, v2 +; GFX6-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GFX6-NEXT: v_subrev_i32_e32 v5, vcc, s10, v2 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s10, v2 +; GFX6-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GFX6-NEXT: v_add_i32_e32 v3, vcc, v4, v3 +; GFX6-NEXT: v_mul_hi_u32 v3, s7, v3 +; GFX6-NEXT: v_mul_lo_u32 v3, v3, s11 +; GFX6-NEXT: v_sub_i32_e32 v3, vcc, s7, v3 +; GFX6-NEXT: v_subrev_i32_e32 v4, vcc, s11, v3 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s11, v3 +; GFX6-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GFX6-NEXT: v_subrev_i32_e32 v4, vcc, s11, v3 +; GFX6-NEXT: v_cmp_le_u32_e32 vcc, s11, v3 +; GFX6-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GFX6-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 ; GFX6-NEXT: s_endpgm ; ; GFX8-LABEL: test_udivrem_v4: ; GFX8: ; %bb.0: -; GFX8-NEXT: s_load_dwordx8 s[8:15], s[0:1], 0x34 -; GFX8-NEXT: s_mov_b32 s16, 0x4f800000 -; GFX8-NEXT: s_load_dwordx2 s[6:7], s[0:1], 0x24 +; GFX8-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x34 +; GFX8-NEXT: s_mov_b32 s12, 0x4f7ffffe +; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) -; GFX8-NEXT: v_cvt_f32_u32_e32 v0, s12 -; GFX8-NEXT: v_cvt_f32_u32_e32 v1, s13 -; GFX8-NEXT: v_cvt_f32_u32_e32 v7, s15 +; GFX8-NEXT: v_cvt_f32_u32_e32 v0, s8 +; GFX8-NEXT: s_sub_i32 s2, 0, s8 +; GFX8-NEXT: v_cvt_f32_u32_e32 v1, s9 +; GFX8-NEXT: v_cvt_f32_u32_e32 v4, s11 ; GFX8-NEXT: v_rcp_iflag_f32_e32 v0, v0 +; GFX8-NEXT: s_sub_i32 s3, 0, s9 ; GFX8-NEXT: 
v_rcp_iflag_f32_e32 v1, v1 -; GFX8-NEXT: v_mul_f32_e32 v0, s16, v0 +; GFX8-NEXT: v_cvt_f32_u32_e32 v2, s10 +; GFX8-NEXT: v_mul_f32_e32 v0, s12, v0 ; GFX8-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GFX8-NEXT: v_mul_f32_e32 v1, s16, v1 +; GFX8-NEXT: v_mul_f32_e32 v1, s12, v1 ; GFX8-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX8-NEXT: v_mul_lo_u32 v2, v0, s12 -; GFX8-NEXT: v_mul_hi_u32 v3, v0, s12 -; GFX8-NEXT: v_sub_u32_e32 v4, vcc, 0, v2 -; GFX8-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v3 -; GFX8-NEXT: v_cndmask_b32_e64 v2, v2, v4, s[2:3] -; GFX8-NEXT: v_mul_hi_u32 v2, v2, v0 -; GFX8-NEXT: v_mul_lo_u32 v3, v1, s13 -; GFX8-NEXT: v_add_u32_e32 v4, vcc, v2, v0 -; GFX8-NEXT: v_subrev_u32_e32 v0, vcc, v2, v0 -; GFX8-NEXT: v_mul_hi_u32 v2, v1, s13 -; GFX8-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[2:3] -; GFX8-NEXT: v_sub_u32_e32 v4, vcc, 0, v3 -; GFX8-NEXT: v_mul_hi_u32 v0, v0, s8 -; GFX8-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GFX8-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1] -; GFX8-NEXT: v_mul_hi_u32 v2, v2, v1 -; GFX8-NEXT: v_mul_lo_u32 v0, v0, s12 -; GFX8-NEXT: v_add_u32_e32 v5, vcc, v2, v1 -; GFX8-NEXT: v_subrev_u32_e32 v1, vcc, v2, v1 -; GFX8-NEXT: v_cvt_f32_u32_e32 v2, s14 -; GFX8-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[0:1] -; GFX8-NEXT: v_mul_hi_u32 v1, v1, s9 -; GFX8-NEXT: v_sub_u32_e32 v3, vcc, s8, v0 ; GFX8-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GFX8-NEXT: v_cmp_ge_u32_e64 s[4:5], s8, v0 -; GFX8-NEXT: v_mul_lo_u32 v1, v1, s13 -; GFX8-NEXT: v_cmp_le_u32_e64 s[2:3], s12, v3 -; GFX8-NEXT: v_mul_f32_e32 v2, s16, v2 +; GFX8-NEXT: v_mul_lo_u32 v3, s2, v0 +; GFX8-NEXT: s_sub_i32 s2, 0, s10 +; GFX8-NEXT: v_mul_f32_e32 v2, s12, v2 +; GFX8-NEXT: v_mul_hi_u32 v3, v0, v3 ; GFX8-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GFX8-NEXT: v_add_u32_e32 v4, vcc, s12, v3 -; GFX8-NEXT: v_subrev_u32_e32 v0, vcc, s12, v3 -; GFX8-NEXT: s_and_b64 vcc, s[2:3], s[4:5] -; GFX8-NEXT: v_mul_lo_u32 v5, v2, s14 -; GFX8-NEXT: v_mul_hi_u32 v6, v2, s14 -; GFX8-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GFX8-NEXT: v_cndmask_b32_e64 v0, v4, 
v0, s[4:5] -; GFX8-NEXT: v_sub_u32_e32 v3, vcc, s9, v1 -; GFX8-NEXT: v_cmp_ge_u32_e64 s[2:3], s9, v1 -; GFX8-NEXT: v_sub_u32_e32 v1, vcc, 0, v5 -; GFX8-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v6 -; GFX8-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[4:5] -; GFX8-NEXT: v_mul_hi_u32 v1, v1, v2 -; GFX8-NEXT: v_cmp_le_u32_e64 s[0:1], s13, v3 -; GFX8-NEXT: v_add_u32_e32 v4, vcc, s13, v3 -; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s13, v3 -; GFX8-NEXT: v_add_u32_e32 v6, vcc, v1, v2 -; GFX8-NEXT: v_subrev_u32_e32 v1, vcc, v1, v2 -; GFX8-NEXT: v_cndmask_b32_e64 v1, v1, v6, s[4:5] -; GFX8-NEXT: v_mul_hi_u32 v1, v1, s10 -; GFX8-NEXT: v_rcp_iflag_f32_e32 v2, v7 -; GFX8-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GFX8-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX8-NEXT: v_mul_lo_u32 v5, v1, s14 -; GFX8-NEXT: v_mul_f32_e32 v1, s16, v2 -; GFX8-NEXT: v_cvt_u32_f32_e32 v2, v1 -; GFX8-NEXT: v_cndmask_b32_e64 v1, v4, v3, s[2:3] -; GFX8-NEXT: v_sub_u32_e32 v3, vcc, s10, v5 -; GFX8-NEXT: v_cmp_le_u32_e64 s[0:1], s14, v3 -; GFX8-NEXT: v_mul_lo_u32 v4, v2, s15 -; GFX8-NEXT: v_mul_hi_u32 v6, v2, s15 -; GFX8-NEXT: v_sub_u32_e32 v7, vcc, 0, v4 -; GFX8-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v6 -; GFX8-NEXT: v_cndmask_b32_e64 v4, v4, v7, s[2:3] -; GFX8-NEXT: v_mul_hi_u32 v4, v4, v2 -; GFX8-NEXT: v_add_u32_e32 v6, vcc, s14, v3 -; GFX8-NEXT: v_add_u32_e32 v7, vcc, v4, v2 -; GFX8-NEXT: v_subrev_u32_e32 v2, vcc, v4, v2 -; GFX8-NEXT: v_cndmask_b32_e64 v2, v2, v7, s[2:3] -; GFX8-NEXT: v_mul_hi_u32 v2, v2, s11 -; GFX8-NEXT: v_cmp_ge_u32_e64 s[2:3], s10, v5 -; GFX8-NEXT: v_subrev_u32_e32 v4, vcc, s14, v3 -; GFX8-NEXT: s_and_b64 vcc, s[0:1], s[2:3] -; GFX8-NEXT: v_mul_lo_u32 v5, v2, s15 -; GFX8-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc -; GFX8-NEXT: v_cndmask_b32_e64 v2, v6, v2, s[2:3] -; GFX8-NEXT: v_sub_u32_e32 v3, vcc, s11, v5 -; GFX8-NEXT: v_cmp_ge_u32_e64 s[2:3], s11, v5 -; GFX8-NEXT: v_cmp_le_u32_e64 s[0:1], s15, v3 -; GFX8-NEXT: v_add_u32_e32 v4, vcc, s15, v3 -; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s15, v3 -; GFX8-NEXT: 
s_and_b64 vcc, s[0:1], s[2:3] -; GFX8-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX8-NEXT: v_cndmask_b32_e64 v3, v4, v3, s[2:3] -; GFX8-NEXT: v_mov_b32_e32 v4, s6 -; GFX8-NEXT: v_mov_b32_e32 v5, s7 +; GFX8-NEXT: v_add_u32_e32 v0, vcc, v3, v0 +; GFX8-NEXT: v_mul_hi_u32 v0, s4, v0 +; GFX8-NEXT: v_rcp_iflag_f32_e32 v3, v4 +; GFX8-NEXT: v_mul_lo_u32 v4, s3, v1 +; GFX8-NEXT: v_mul_lo_u32 v0, v0, s8 +; GFX8-NEXT: v_mul_f32_e32 v3, s12, v3 +; GFX8-NEXT: v_mul_hi_u32 v4, v1, v4 +; GFX8-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s4, v0 +; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s8, v0 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GFX8-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s8, v0 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GFX8-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GFX8-NEXT: v_add_u32_e32 v1, vcc, v4, v1 +; GFX8-NEXT: v_mul_hi_u32 v1, s5, v1 +; GFX8-NEXT: v_mul_lo_u32 v4, s2, v2 +; GFX8-NEXT: s_sub_i32 s2, 0, s11 +; GFX8-NEXT: v_mul_lo_u32 v1, v1, s9 +; GFX8-NEXT: v_mul_hi_u32 v4, v2, v4 +; GFX8-NEXT: v_sub_u32_e32 v1, vcc, s5, v1 +; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s9, v1 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GFX8-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s9, v1 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 +; GFX8-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GFX8-NEXT: v_add_u32_e32 v2, vcc, v4, v2 +; GFX8-NEXT: v_mul_hi_u32 v2, s6, v2 +; GFX8-NEXT: v_mul_lo_u32 v4, s2, v3 +; GFX8-NEXT: v_mul_lo_u32 v2, v2, s10 +; GFX8-NEXT: v_mul_hi_u32 v4, v3, v4 +; GFX8-NEXT: v_sub_u32_e32 v2, vcc, s6, v2 +; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s10, v2 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s10, v2 +; GFX8-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GFX8-NEXT: v_subrev_u32_e32 v5, vcc, s10, v2 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s10, v2 +; GFX8-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc +; GFX8-NEXT: v_add_u32_e32 v3, vcc, v4, v3 +; GFX8-NEXT: v_mul_hi_u32 v3, s7, v3 +; 
GFX8-NEXT: v_mul_lo_u32 v3, v3, s11 +; GFX8-NEXT: v_sub_u32_e32 v3, vcc, s7, v3 +; GFX8-NEXT: v_subrev_u32_e32 v4, vcc, s11, v3 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s11, v3 +; GFX8-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GFX8-NEXT: v_subrev_u32_e32 v4, vcc, s11, v3 +; GFX8-NEXT: v_cmp_le_u32_e32 vcc, s11, v3 +; GFX8-NEXT: v_cndmask_b32_e32 v3, v3, v4, vcc +; GFX8-NEXT: v_mov_b32_e32 v5, s1 +; GFX8-NEXT: v_mov_b32_e32 v4, s0 ; GFX8-NEXT: flat_store_dwordx4 v[4:5], v[0:3] ; GFX8-NEXT: s_endpgm %result0 = udiv <4 x i32> %x, %y From llvm-commits at lists.llvm.org Wed Jul 8 11:15:29 2020 From: llvm-commits at lists.llvm.org (Jay Foad via llvm-commits) Date: Wed, 08 Jul 2020 11:15:29 -0700 (PDT) Subject: [llvm] ecac951 - [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM Message-ID: <5f060d41.1c69fb81.1c798.14b7@mx.google.com> Author: Jay Foad Date: 2020-07-08T19:14:49+01:00 New Revision: ecac951be92b71e5ec887a9fc768f202e4a8ab69 URL: https://github.com/llvm/llvm-project/commit/ecac951be92b71e5ec887a9fc768f202e4a8ab69 DIFF: https://github.com/llvm/llvm-project/commit/ecac951be92b71e5ec887a9fc768f202e4a8ab69.diff LOG: [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. 
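For readers without the tree at hand, the lowering this commit switches to can be sketched in plain C++ as below. This is only an illustration under stated assumptions, not the code from the patch: `urecipEstimate`, `mulhu`, `DivRem` and `udivrem32` are hypothetical names, ordinary IEEE float division stands in for the hardware reciprocal instruction, and `y` is assumed to be non-zero. The constant 0x4f7ffffe is the one the diff introduces as FP_4294966784 (2^32 - 2^9).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for AMDGPUISD::URECIP: an estimate of 2^32 / y that
// is kept slightly low by scaling with 4294966784.0f instead of 2^32.
// Assumption: plain float division replaces the hardware reciprocal.
static uint32_t urecipEstimate(uint32_t y) {
  const uint32_t scaleBits = 0x4f7ffffe; // bit pattern of 4294966784.0f
  float scale;
  std::memcpy(&scale, &scaleBits, sizeof scale);
  return static_cast<uint32_t>(scale * (1.0f / static_cast<float>(y)));
}

// High 32 bits of the unsigned 64-bit product (the role of ISD::MULHU).
static uint32_t mulhu(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

struct DivRem {
  uint32_t quot;
  uint32_t rem;
};

// Sketch of the 32-bit unsigned divide/remainder sequence; requires y != 0.
static DivRem udivrem32(uint32_t x, uint32_t y) {
  // Initial estimate of inv(y).
  uint32_t z = urecipEstimate(y);

  // One round of unsigned Newton-Raphson: z += mulhu(z, -y * z).
  uint32_t negY = 0u - y;
  uint32_t negYZ = negY * z; // wraps modulo 2^32 by design
  z += mulhu(z, negYZ);

  // Quotient/remainder estimate.
  uint32_t q = mulhu(x, z);
  uint32_t r = x - q * y;

  // Two refinement steps, each correcting the estimate by at most one.
  for (int i = 0; i < 2; ++i) {
    if (r >= y) {
      q += 1;
      r -= y;
    }
  }
  return {q, r};
}
```

The sketch writes the two refinement steps as branches for readability; the DAG lowering in the diff expresses the same steps as SETUGE compares feeding SELECT nodes.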
Differential Revision: https://reviews.llvm.org/D83382 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructions.td llvm/lib/Target/AMDGPU/CaymanInstructions.td llvm/lib/Target/AMDGPU/SIInstructions.td llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll llvm/test/CodeGen/AMDGPU/bypass-div.ll llvm/test/CodeGen/AMDGPU/sdiv.ll llvm/test/CodeGen/AMDGPU/udivrem.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp index 60be877a42c3..9f49136c986f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp @@ -1976,104 +1976,43 @@ SDValue AMDGPUTargetLowering::LowerUDIVREM(SDValue Op, return Res; } - SDValue Num = Op.getOperand(0); - SDValue Den = Op.getOperand(1); - - // RCP = URECIP(Den) = 2^32 / Den + e - // e is rounding error. - SDValue RCP = DAG.getNode(AMDGPUISD::URECIP, DL, VT, Den); - - // RCP_LO = mul(RCP, Den) */ - SDValue RCP_LO = DAG.getNode(ISD::MUL, DL, VT, RCP, Den); - - // RCP_HI = mulhu (RCP, Den) */ - SDValue RCP_HI = DAG.getNode(ISD::MULHU, DL, VT, RCP, Den); - - // NEG_RCP_LO = -RCP_LO - SDValue NEG_RCP_LO = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), - RCP_LO); - - const SDValue Zero = DAG.getConstant(0, DL, VT); - const EVT CCVT = getSetCCResultType(DAG.getDataLayout(), - *DAG.getContext(), VT); - - // ABS_RCP_LO = (RCP_HI == 0 ? 
NEG_RCP_LO : RCP_LO) - SDValue CmpRcpHiZero = DAG.getSetCC(DL, CCVT, RCP_HI, Zero, ISD::SETEQ); - SDValue ABS_RCP_LO = DAG.getNode(ISD::SELECT, - DL, VT, CmpRcpHiZero, NEG_RCP_LO, RCP_LO); - - // Calculate the rounding error from the URECIP instruction - // E = mulhu(ABS_RCP_LO, RCP) - SDValue E = DAG.getNode(ISD::MULHU, DL, VT, ABS_RCP_LO, RCP); - - // RCP_A_E = RCP + E - SDValue RCP_A_E = DAG.getNode(ISD::ADD, DL, VT, RCP, E); - - // RCP_S_E = RCP - E - SDValue RCP_S_E = DAG.getNode(ISD::SUB, DL, VT, RCP, E); - - // Tmp0 = (RCP_HI == 0 ? RCP_A_E : RCP_SUB_E) - SDValue Tmp0 = DAG.getNode(ISD::SELECT, DL, VT, - CmpRcpHiZero, RCP_A_E, RCP_S_E); - - // Quotient = mulhu(Tmp0, Num) - SDValue Quotient = DAG.getNode(ISD::MULHU, DL, VT, Tmp0, Num); - - // Num_S_Remainder = Quotient * Den - SDValue Num_S_Remainder = DAG.getNode(ISD::MUL, DL, VT, Quotient, Den); - - // Remainder = Num - Num_S_Remainder - SDValue Remainder = DAG.getNode(ISD::SUB, DL, VT, Num, Num_S_Remainder); - - // Remainder_GE_Den = (Remainder >= Den) - SDValue Remainder_GE_Den = DAG.getSetCC(DL, CCVT, Remainder, Den, ISD::SETUGE); - - // Remainder_GE_Zero = (Num >= Num_S_Remainder) - SDValue Remainder_GE_Zero = DAG.getSetCC(DL, CCVT, Num, Num_S_Remainder, - ISD::SETUGE); - - // Tmp1 = Remainder_GE_Den & Remainder_GE_Zero - SDValue Tmp1 = DAG.getNode(ISD::AND, DL, CCVT, Remainder_GE_Den, - Remainder_GE_Zero); - - // Calculate Division result: - - // Quotient_A_One = Quotient + 1 - SDValue Quotient_A_One = DAG.getNode(ISD::ADD, DL, VT, Quotient, - DAG.getConstant(1, DL, VT)); - - // Quotient_S_One = Quotient - 1 - SDValue Quotient_S_One = DAG.getNode(ISD::SUB, DL, VT, Quotient, - DAG.getConstant(1, DL, VT)); - - // Div = (Tmp1 ? Quotient_A_One : Quotient) - SDValue Div = DAG.getNode(ISD::SELECT, DL, VT, Tmp1, - Quotient_A_One, Quotient); - - // Div = (Remainder_GE_Zero ? 
Div : Quotient_S_One) - Div = DAG.getNode(ISD::SELECT, DL, VT, Remainder_GE_Zero, - Div, Quotient_S_One); - - // Calculate Rem result: - - // Remainder_S_Den = Remainder - Den - SDValue Remainder_S_Den = DAG.getNode(ISD::SUB, DL, VT, Remainder, Den); - - // Remainder_A_Den = Remainder + Den - SDValue Remainder_A_Den = DAG.getNode(ISD::ADD, DL, VT, Remainder, Den); - - // Rem = (Tmp1 ? Remainder_S_Den : Remainder) - SDValue Rem = DAG.getNode(ISD::SELECT, DL, VT, Tmp1, - Remainder_S_Den, Remainder); + SDValue X = Op.getOperand(0); + SDValue Y = Op.getOperand(1); - // Rem = (Remainder_GE_Zero ? Rem : Remainder_A_Den) - Rem = DAG.getNode(ISD::SELECT, DL, VT, - Remainder_GE_Zero, Rem, Remainder_A_Den); - SDValue Ops[2] = { - Div, - Rem - }; - return DAG.getMergeValues(Ops, DL); + // See AMDGPUCodeGenPrepare::expandDivRem32 for a description of the + // algorithm used here. + + // Initial estimate of inv(y). + SDValue Z = DAG.getNode(AMDGPUISD::URECIP, DL, VT, Y); + + // One round of UNR. + SDValue NegY = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Y); + SDValue NegYZ = DAG.getNode(ISD::MUL, DL, VT, NegY, Z); + Z = DAG.getNode(ISD::ADD, DL, VT, Z, + DAG.getNode(ISD::MULHU, DL, VT, Z, NegYZ)); + + // Quotient/remainder estimate. + SDValue Q = DAG.getNode(ISD::MULHU, DL, VT, X, Z); + SDValue R = + DAG.getNode(ISD::SUB, DL, VT, X, DAG.getNode(ISD::MUL, DL, VT, Q, Y)); + + // First quotient/remainder refinement. + EVT CCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); + SDValue One = DAG.getConstant(1, DL, VT); + SDValue Cond = DAG.getSetCC(DL, CCVT, R, Y, ISD::SETUGE); + Q = DAG.getNode(ISD::SELECT, DL, VT, Cond, + DAG.getNode(ISD::ADD, DL, VT, Q, One), Q); + R = DAG.getNode(ISD::SELECT, DL, VT, Cond, + DAG.getNode(ISD::SUB, DL, VT, R, Y), R); + + // Second quotient/remainder refinement. 
+ Cond = DAG.getSetCC(DL, CCVT, R, Y, ISD::SETUGE); + Q = DAG.getNode(ISD::SELECT, DL, VT, Cond, + DAG.getNode(ISD::ADD, DL, VT, Q, One), Q); + R = DAG.getNode(ISD::SELECT, DL, VT, Cond, + DAG.getNode(ISD::SUB, DL, VT, R, Y), R); + + return DAG.getMergeValues({Q, R}, DL); } SDValue AMDGPUTargetLowering::LowerSDIVREM(SDValue Op, diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td index 16b061d9251f..5cb7ac320d2f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td @@ -528,7 +528,7 @@ class Constants { int TWO_PI = 0x40c90fdb; int PI = 0x40490fdb; int TWO_PI_INV = 0x3e22f983; -int FP_UINT_MAX_PLUS_1 = 0x4f800000; // 1 << 32 in floating point encoding +int FP_4294966784 = 0x4f7ffffe; // 4294966784 = 4294967296 - 512 = 2^32 - 2^9 int FP16_ONE = 0x3C00; int FP16_NEG_ONE = 0xBC00; int FP32_ONE = 0x3f800000; diff --git a/llvm/lib/Target/AMDGPU/CaymanInstructions.td b/llvm/lib/Target/AMDGPU/CaymanInstructions.td index e2978624811d..f4ddbf1131c3 100644 --- a/llvm/lib/Target/AMDGPU/CaymanInstructions.td +++ b/llvm/lib/Target/AMDGPU/CaymanInstructions.td @@ -57,11 +57,12 @@ def : POW_Common ; defm DIV_cm : DIV_Common; // RECIP_UINT emulation for Cayman -// The multiplication scales from [0,1] to the unsigned integer range +// The multiplication scales from [0,1) to the unsigned integer range, +// rounding down a bit to avoid unwanted overflow. 
def : R600Pat < (AMDGPUurecip i32:$src0), (FLT_TO_UINT_eg (MUL_IEEE (RECIP_IEEE_cm (UINT_TO_FLT_eg $src0)), - (MOV_IMM_I32 CONST.FP_UINT_MAX_PLUS_1))) + (MOV_IMM_I32 CONST.FP_4294966784))) >; def CF_END_CM : CF_CLAUSE_EG<32, (ins), "CF_END"> { diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td index ec378379ca92..0c4c9e0e9df2 100644 --- a/llvm/lib/Target/AMDGPU/SIInstructions.td +++ b/llvm/lib/Target/AMDGPU/SIInstructions.td @@ -1552,11 +1552,12 @@ class Ext32Pat : GCNPat < def : Ext32Pat ; def : Ext32Pat ; -// The multiplication scales from [0,1] to the unsigned integer range +// The multiplication scales from [0,1) to the unsigned integer range, +// rounding down a bit to avoid unwanted overflow. def : GCNPat < (AMDGPUurecip i32:$src0), (V_CVT_U32_F32_e32 - (V_MUL_F32_e32 (i32 CONST.FP_UINT_MAX_PLUS_1), + (V_MUL_F32_e32 (i32 CONST.FP_4294966784), (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0)))) >; diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll index 76f3a4989635..53e3005910cd 100644 --- a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll +++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll @@ -4356,34 +4356,30 @@ define amdgpu_kernel void @sdiv_i32_pow2_shl_denom(i32 addrspace(1)* %out, i32 % ; GCN-NEXT: s_add_i32 s3, s3, s8 ; GCN-NEXT: s_xor_b32 s9, s3, s8 ; GCN-NEXT: v_cvt_f32_u32_e32 v0, s9 -; GCN-NEXT: s_ashr_i32 s3, s2, 31 -; GCN-NEXT: s_add_i32 s2, s2, s3 -; GCN-NEXT: s_xor_b32 s2, s2, s3 +; GCN-NEXT: s_sub_i32 s3, 0, s9 +; GCN-NEXT: s_ashr_i32 s0, s2, 31 +; GCN-NEXT: s_add_i32 s1, s2, s0 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_xor_b32 s3, s3, s8 -; GCN-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GCN-NEXT: s_xor_b32 s1, s1, s0 +; GCN-NEXT: s_xor_b32 s2, s0, s8 +; GCN-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s9 -; GCN-NEXT: v_mul_hi_u32 v2, 
v0, s9 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[0:1], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s2 +; GCN-NEXT: v_mul_lo_u32 v1, s3, v0 +; GCN-NEXT: v_mul_hi_u32 v1, v0, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s1, v0 ; GCN-NEXT: v_mul_lo_u32 v1, v0, s9 ; GCN-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GCN-NEXT: v_add_i32_e32 v3, vcc, -1, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], s2, v1 -; GCN-NEXT: v_sub_i32_e32 v1, vcc, s2, v1 +; GCN-NEXT: v_sub_i32_e32 v1, vcc, s1, v1 +; GCN-NEXT: v_cmp_le_u32_e64 s[0:1], s9, v1 +; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] +; GCN-NEXT: v_subrev_i32_e32 v2, vcc, s9, v1 +; GCN-NEXT: v_add_i32_e32 v3, vcc, 1, v0 +; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v2, s[0:1] ; GCN-NEXT: v_cmp_le_u32_e32 vcc, s9, v1 -; GCN-NEXT: s_and_b64 vcc, vcc, s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[0:1] -; GCN-NEXT: v_xor_b32_e32 v0, s3, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s3, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GCN-NEXT: v_xor_b32_e32 v0, s2, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s2, v0 ; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 ; GCN-NEXT: s_endpgm %shl.y = shl i32 4096, %y @@ -4690,45 +4686,38 @@ define amdgpu_kernel void @srem_i32_pow2_shl_denom(i32 addrspace(1)* %out, i32 % ; ; GCN-LABEL: srem_i32_pow2_shl_denom: ; GCN: ; %bb.0: -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xb -; GCN-NEXT: s_mov_b32 s7, 0xf000 -; GCN-NEXT: s_mov_b32 s6, -1 +; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0xb +; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9 ; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: s_lshl_b32 s2, 0x1000, s5 -; GCN-NEXT: s_ashr_i32 s3, s2, 31 -; GCN-NEXT: s_add_i32 s2, s2, s3 -; 
GCN-NEXT: s_xor_b32 s10, s2, s3 -; GCN-NEXT: v_cvt_f32_u32_e32 v0, s10 -; GCN-NEXT: s_ashr_i32 s8, s4, 31 -; GCN-NEXT: s_add_i32 s4, s4, s8 -; GCN-NEXT: s_xor_b32 s9, s4, s8 +; GCN-NEXT: s_lshl_b32 s3, 0x1000, s3 +; GCN-NEXT: s_ashr_i32 s4, s3, 31 +; GCN-NEXT: s_add_i32 s3, s3, s4 +; GCN-NEXT: s_xor_b32 s4, s3, s4 +; GCN-NEXT: v_cvt_f32_u32_e32 v0, s4 +; GCN-NEXT: s_sub_i32 s3, 0, s4 +; GCN-NEXT: s_ashr_i32 s5, s2, 31 +; GCN-NEXT: s_add_i32 s2, s2, s5 ; GCN-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9 -; GCN-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GCN-NEXT: s_xor_b32 s6, s2, s5 +; GCN-NEXT: s_mov_b32 s2, -1 +; GCN-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GCN-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GCN-NEXT: v_mul_lo_u32 v1, v0, s10 -; GCN-NEXT: v_mul_hi_u32 v2, v0, s10 -; GCN-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GCN-NEXT: v_cmp_eq_u32_e64 s[2:3], 0, v2 -; GCN-NEXT: v_cndmask_b32_e64 v1, v1, v3, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v1, v1, v0 -; GCN-NEXT: v_add_i32_e32 v2, vcc, v1, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, v1, v0 -; GCN-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[2:3] -; GCN-NEXT: v_mul_hi_u32 v0, v0, s9 -; GCN-NEXT: v_mul_lo_u32 v0, v0, s10 -; GCN-NEXT: v_sub_i32_e32 v1, vcc, s9, v0 -; GCN-NEXT: v_cmp_ge_u32_e64 s[0:1], s9, v0 -; GCN-NEXT: v_add_i32_e32 v2, vcc, s10, v1 -; GCN-NEXT: v_cmp_le_u32_e64 s[2:3], s10, v1 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s10, v1 -; GCN-NEXT: s_and_b64 vcc, s[2:3], s[0:1] -; GCN-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; GCN-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[0:1] -; GCN-NEXT: v_xor_b32_e32 v0, s8, v0 -; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s8, v0 -; GCN-NEXT: s_waitcnt lgkmcnt(0) -; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0 +; GCN-NEXT: v_mul_lo_u32 v1, s3, v0 +; GCN-NEXT: s_mov_b32 s3, 0xf000 +; GCN-NEXT: v_mul_hi_u32 v1, v0, v1 +; GCN-NEXT: v_add_i32_e32 v0, vcc, v1, v0 +; GCN-NEXT: v_mul_hi_u32 v0, s6, v0 +; GCN-NEXT: v_mul_lo_u32 v0, v0, s4 +; GCN-NEXT: v_sub_i32_e32 v0, vcc, s6, 
v0 +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GCN-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; GCN-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GCN-NEXT: v_xor_b32_e32 v0, s5, v0 +; GCN-NEXT: v_subrev_i32_e32 v0, vcc, s5, v0 +; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0 ; GCN-NEXT: s_endpgm %shl.y = shl i32 4096, %y %r = srem i32 %x, %shl.y diff --git a/llvm/test/CodeGen/AMDGPU/bypass-div.ll b/llvm/test/CodeGen/AMDGPU/bypass-div.ll index 9fcd97721ee7..d10d911aeac0 100644 --- a/llvm/test/CodeGen/AMDGPU/bypass-div.ll +++ b/llvm/test/CodeGen/AMDGPU/bypass-div.ll @@ -138,36 +138,32 @@ define i64 @sdiv64(i64 %a, i64 %b) { ; GFX9-NEXT: v_sub_co_u32_e32 v3, vcc, v3, v5 ; GFX9-NEXT: v_subb_co_u32_e32 v4, vcc, v1, v5, vcc ; GFX9-NEXT: BB0_2: ; %Flow -; GFX9-NEXT: s_or_saveexec_b64 s[6:7], s[6:7] -; GFX9-NEXT: s_xor_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; GFX9-NEXT: s_xor_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_cbranch_execz BB0_4 ; GFX9-NEXT: ; %bb.3: ; GFX9-NEXT: v_cvt_f32_u32_e32 v1, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v2 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; GFX9-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v1 -; GFX9-NEXT: v_add_u32_e32 v4, v1, v3 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v1 +; GFX9-NEXT: v_mul_hi_u32 v3, v1, v3 +; GFX9-NEXT: v_add_u32_e32 v1, v1, v3 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 ; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 ; GFX9-NEXT: v_add_u32_e32 v4, 
1, v1 -; GFX9-NEXT: v_add_u32_e32 v5, -1, v1 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 ; GFX9-NEXT: v_sub_u32_e32 v0, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v1, v4, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v3, v5, v0, vcc +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX9-NEXT: v_add_u32_e32 v3, 1, v1 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v1, v3, vcc ; GFX9-NEXT: v_mov_b32_e32 v4, 0 ; GFX9-NEXT: BB0_4: -; GFX9-NEXT: s_or_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: v_mov_b32_e32 v0, v3 ; GFX9-NEXT: v_mov_b32_e32 v1, v4 ; GFX9-NEXT: s_setpc_b64 s[30:31] @@ -293,36 +289,32 @@ define i64 @udiv64(i64 %a, i64 %b) { ; GFX9-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc ; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v1, vcc ; GFX9-NEXT: BB1_2: ; %Flow -; GFX9-NEXT: s_or_saveexec_b64 s[6:7], s[6:7] -; GFX9-NEXT: s_xor_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; GFX9-NEXT: s_xor_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_cbranch_execz BB1_4 ; GFX9-NEXT: ; %bb.3: ; GFX9-NEXT: v_cvt_f32_u32_e32 v1, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v2 +; GFX9-NEXT: v_mov_b32_e32 v5, 0 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; GFX9-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v1 -; GFX9-NEXT: v_add_u32_e32 v4, v1, v3 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 +; GFX9-NEXT: v_mul_lo_u32 v3, 
v3, v1 +; GFX9-NEXT: v_mul_hi_u32 v3, v1, v3 +; GFX9-NEXT: v_add_u32_e32 v1, v1, v3 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 ; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 ; GFX9-NEXT: v_add_u32_e32 v4, 1, v1 -; GFX9-NEXT: v_add_u32_e32 v5, -1, v1 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 ; GFX9-NEXT: v_sub_u32_e32 v0, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v1, v4, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v0, vcc -; GFX9-NEXT: v_mov_b32_e32 v5, 0 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; GFX9-NEXT: v_add_u32_e32 v3, 1, v1 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v4, v1, v3, vcc ; GFX9-NEXT: BB1_4: -; GFX9-NEXT: s_or_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: v_mov_b32_e32 v0, v4 ; GFX9-NEXT: v_mov_b32_e32 v1, v5 ; GFX9-NEXT: s_setpc_b64 s[30:31] @@ -462,36 +454,30 @@ define i64 @srem64(i64 %a, i64 %b) { ; GFX9-NEXT: v_sub_co_u32_e32 v3, vcc, v3, v7 ; GFX9-NEXT: v_subb_co_u32_e32 v4, vcc, v1, v7, vcc ; GFX9-NEXT: BB2_2: ; %Flow -; GFX9-NEXT: s_or_saveexec_b64 s[6:7], s[8:9] -; GFX9-NEXT: s_xor_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_saveexec_b64 s[4:5], s[8:9] +; GFX9-NEXT: s_xor_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_cbranch_execz BB2_4 ; GFX9-NEXT: ; %bb.3: ; GFX9-NEXT: v_cvt_f32_u32_e32 v1, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v2 +; GFX9-NEXT: v_mov_b32_e32 v4, 0 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; GFX9-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v1 
-; GFX9-NEXT: v_add_u32_e32 v4, v1, v3 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v1 +; GFX9-NEXT: v_mul_hi_u32 v3, v1, v3 +; GFX9-NEXT: v_add_u32_e32 v1, v1, v3 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 ; GFX9-NEXT: v_mul_lo_u32 v1, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v3, v0, v1 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v3, v2 -; GFX9-NEXT: v_sub_u32_e32 v0, v3, v2 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: v_add_u32_e32 v4, v3, v2 -; GFX9-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v3, v4, v0, vcc -; GFX9-NEXT: v_mov_b32_e32 v4, 0 +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v1 +; GFX9-NEXT: v_sub_u32_e32 v1, v0, v2 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GFX9-NEXT: v_sub_u32_e32 v1, v0, v2 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v3, v0, v1, vcc ; GFX9-NEXT: BB2_4: -; GFX9-NEXT: s_or_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: v_mov_b32_e32 v0, v3 ; GFX9-NEXT: v_mov_b32_e32 v1, v4 ; GFX9-NEXT: s_setpc_b64 s[30:31] @@ -616,36 +602,30 @@ define i64 @urem64(i64 %a, i64 %b) { ; GFX9-NEXT: v_cndmask_b32_e64 v1, v7, v10, s[4:5] ; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v1, vcc ; GFX9-NEXT: BB3_2: ; %Flow -; GFX9-NEXT: s_or_saveexec_b64 s[6:7], s[8:9] -; GFX9-NEXT: s_xor_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_saveexec_b64 s[4:5], s[8:9] +; GFX9-NEXT: s_xor_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_cbranch_execz BB3_4 ; GFX9-NEXT: ; %bb.3: ; GFX9-NEXT: v_cvt_f32_u32_e32 v1, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v2 +; GFX9-NEXT: v_mov_b32_e32 v5, 0 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; GFX9-NEXT: v_cvt_u32_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 -; GFX9-NEXT: 
v_mul_hi_u32 v4, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v1 -; GFX9-NEXT: v_mov_b32_e32 v5, 0 -; GFX9-NEXT: v_add_u32_e32 v4, v1, v3 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v1 +; GFX9-NEXT: v_mul_hi_u32 v3, v1, v3 +; GFX9-NEXT: v_add_u32_e32 v1, v1, v3 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 ; GFX9-NEXT: v_mul_lo_u32 v1, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v3, v0, v1 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v3, v2 -; GFX9-NEXT: v_sub_u32_e32 v0, v3, v2 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: v_add_u32_e32 v4, v3, v2 -; GFX9-NEXT: v_cndmask_b32_e64 v0, v3, v0, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v4, v4, v0, vcc +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v1 +; GFX9-NEXT: v_sub_u32_e32 v1, v0, v2 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GFX9-NEXT: v_sub_u32_e32 v1, v0, v2 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v4, v0, v1, vcc ; GFX9-NEXT: BB3_4: -; GFX9-NEXT: s_or_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: v_mov_b32_e32 v0, v4 ; GFX9-NEXT: v_mov_b32_e32 v1, v5 ; GFX9-NEXT: s_setpc_b64 s[30:31] @@ -924,41 +904,35 @@ define <2 x i64> @sdivrem64(i64 %a, i64 %b) { ; GFX9-NEXT: v_subb_co_u32_e64 v4, s[8:9], v7, v10, s[8:9] ; GFX9-NEXT: v_subb_co_u32_e32 v6, vcc, v1, v8, vcc ; GFX9-NEXT: BB8_2: ; %Flow -; GFX9-NEXT: s_or_saveexec_b64 s[6:7], s[10:11] -; GFX9-NEXT: s_xor_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_saveexec_b64 s[4:5], s[10:11] +; GFX9-NEXT: s_xor_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_cbranch_execz BB8_4 ; GFX9-NEXT: ; %bb.3: ; GFX9-NEXT: v_cvt_f32_u32_e32 v1, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v2 +; GFX9-NEXT: v_mov_b32_e32 v4, 0 ; 
GFX9-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; GFX9-NEXT: v_cvt_u32_f32_e32 v1, v1 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v1 +; GFX9-NEXT: v_mul_hi_u32 v3, v1, v3 +; GFX9-NEXT: v_add_u32_e32 v1, v1, v3 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 ; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v1 -; GFX9-NEXT: v_add_u32_e32 v4, v1, v3 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v3 -; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX9-NEXT: v_mov_b32_e32 v4, 0 -; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 +; GFX9-NEXT: v_add_u32_e32 v5, 1, v1 +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v3 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 ; GFX9-NEXT: v_add_u32_e32 v6, 1, v1 -; GFX9-NEXT: v_add_u32_e32 v7, -1, v1 -; GFX9-NEXT: v_sub_u32_e32 v5, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v5, v2 -; GFX9-NEXT: v_sub_u32_e32 v0, v5, v2 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: v_add_u32_e32 v8, v5, v2 -; GFX9-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v5, v8, v0, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v1, v6, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v0, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v5, v0, v3, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v3, v1, v6, vcc ; GFX9-NEXT: v_mov_b32_e32 v6, v4 ; GFX9-NEXT: BB8_4: -; GFX9-NEXT: s_or_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: v_mov_b32_e32 v0, v3 ; GFX9-NEXT: v_mov_b32_e32 v1, v4 ; GFX9-NEXT: v_mov_b32_e32 v2, v5 @@ 
-1097,41 +1071,35 @@ define <2 x i64> @udivrem64(i64 %a, i64 %b) { ; GFX9-NEXT: v_cndmask_b32_e32 v5, v5, v11, vcc ; GFX9-NEXT: v_cndmask_b32_e32 v6, v8, v1, vcc ; GFX9-NEXT: BB9_2: ; %Flow -; GFX9-NEXT: s_or_saveexec_b64 s[6:7], s[8:9] -; GFX9-NEXT: s_xor_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_saveexec_b64 s[4:5], s[8:9] +; GFX9-NEXT: s_xor_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_cbranch_execz BB9_4 ; GFX9-NEXT: ; %bb.3: ; GFX9-NEXT: v_cvt_f32_u32_e32 v1, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, 0, v2 +; GFX9-NEXT: v_mov_b32_e32 v5, 0 +; GFX9-NEXT: v_mov_b32_e32 v7, v5 ; GFX9-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; GFX9-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; GFX9-NEXT: v_cvt_u32_f32_e32 v1, v1 +; GFX9-NEXT: v_mul_lo_u32 v3, v3, v1 +; GFX9-NEXT: v_mul_hi_u32 v3, v1, v3 +; GFX9-NEXT: v_add_u32_e32 v1, v1, v3 +; GFX9-NEXT: v_mul_hi_u32 v1, v0, v1 ; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, v1, v2 -; GFX9-NEXT: v_sub_u32_e32 v5, 0, v3 -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GFX9-NEXT: v_mul_hi_u32 v3, v3, v1 -; GFX9-NEXT: v_mov_b32_e32 v5, 0 -; GFX9-NEXT: v_add_u32_e32 v4, v1, v3 -; GFX9-NEXT: v_sub_u32_e32 v1, v1, v3 +; GFX9-NEXT: v_add_u32_e32 v4, 1, v1 +; GFX9-NEXT: v_sub_u32_e32 v0, v0, v3 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc ; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; GFX9-NEXT: v_mul_hi_u32 v1, v1, v0 -; GFX9-NEXT: v_mul_lo_u32 v3, v1, v2 +; GFX9-NEXT: v_sub_u32_e32 v3, v0, v2 +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 ; GFX9-NEXT: v_add_u32_e32 v4, 1, v1 -; GFX9-NEXT: v_add_u32_e32 v7, -1, v1 -; GFX9-NEXT: v_sub_u32_e32 v6, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v2 -; GFX9-NEXT: v_sub_u32_e32 v0, v6, v2 -; GFX9-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GFX9-NEXT: v_add_u32_e32 v8, v6, v2 -; 
GFX9-NEXT: v_cndmask_b32_e64 v0, v6, v0, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v6, v8, v0, vcc -; GFX9-NEXT: v_cndmask_b32_e64 v0, v1, v4, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v4, v7, v0, vcc -; GFX9-NEXT: v_mov_b32_e32 v7, v5 +; GFX9-NEXT: v_cndmask_b32_e32 v6, v0, v3, vcc +; GFX9-NEXT: v_cndmask_b32_e32 v4, v1, v4, vcc ; GFX9-NEXT: BB9_4: -; GFX9-NEXT: s_or_b64 exec, exec, s[6:7] +; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: v_mov_b32_e32 v0, v4 ; GFX9-NEXT: v_mov_b32_e32 v1, v5 ; GFX9-NEXT: v_mov_b32_e32 v2, v6 diff --git a/llvm/test/CodeGen/AMDGPU/sdiv.ll b/llvm/test/CodeGen/AMDGPU/sdiv.ll index bb932b403f31..f51d152fa157 100644 --- a/llvm/test/CodeGen/AMDGPU/sdiv.ll +++ b/llvm/test/CodeGen/AMDGPU/sdiv.ll @@ -153,7 +153,7 @@ define amdgpu_kernel void @sdiv_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %i ; EG: ; %bb.0: ; EG-NEXT: ALU 0, @8, KC0[CB0:0-32], KC1[] ; EG-NEXT: TEX 0 @6 -; EG-NEXT: ALU 30, @9, KC0[CB0:0-32], KC1[] +; EG-NEXT: ALU 26, @9, KC0[CB0:0-32], KC1[] ; EG-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1 ; EG-NEXT: CF_END ; EG-NEXT: PAD @@ -165,29 +165,25 @@ define amdgpu_kernel void @sdiv_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %i ; EG-NEXT: SETGT_INT * T0.W, 0.0, T0.Y, ; EG-NEXT: ADD_INT * T1.W, T0.Y, PV.W, ; EG-NEXT: XOR_INT * T1.W, PV.W, T0.W, +; EG-NEXT: SUB_INT T2.W, 0.0, PV.W, ; EG-NEXT: RECIP_UINT * T0.Y, PV.W, -; EG-NEXT: MULLO_INT * T0.Z, PS, T1.W, -; EG-NEXT: SUB_INT T2.W, 0.0, PS, -; EG-NEXT: MULHI * T1.X, T0.Y, T1.W, -; EG-NEXT: CNDE_INT T2.W, PS, PV.W, T0.Z, -; EG-NEXT: SETGT_INT * T3.W, 0.0, T0.X, -; EG-NEXT: MULHI * T0.Z, PV.W, T0.Y, -; EG-NEXT: ADD_INT T1.Z, T0.X, T3.W, -; EG-NEXT: ADD_INT T2.W, T0.Y, PS, -; EG-NEXT: SUB_INT * T4.W, T0.Y, PS, -; EG-NEXT: CNDE_INT T2.W, T1.X, PV.W, PS, -; EG-NEXT: XOR_INT * T4.W, PV.Z, T3.W, -; EG-NEXT: MULHI * T0.X, PV.W, PS, +; EG-NEXT: SETGT_INT T3.W, 0.0, T0.X, +; EG-NEXT: MULLO_INT * T0.Z, PV.W, PS, +; EG-NEXT: ADD_INT T2.W, T0.X, PV.W, +; EG-NEXT: MULHI * T0.X, T0.Y, PS, 
+; EG-NEXT: ADD_INT T4.W, T0.Y, PS, +; EG-NEXT: XOR_INT * T2.W, PV.W, T3.W, +; EG-NEXT: MULHI * T0.X, PS, PV.W, ; EG-NEXT: MULLO_INT * T0.Y, PS, T1.W, -; EG-NEXT: SUB_INT * T2.W, T4.W, PS, -; EG-NEXT: SETGE_UINT T1.W, PV.W, T1.W, -; EG-NEXT: SETGE_UINT * T2.W, T4.W, T0.Y, -; EG-NEXT: AND_INT T1.W, PV.W, PS, -; EG-NEXT: ADD_INT * T4.W, T0.X, 1, -; EG-NEXT: CNDE_INT T1.W, PV.W, T0.X, PS, -; EG-NEXT: ADD_INT * T4.W, T0.X, literal.x, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T1.W, T2.W, PS, PV.W, +; EG-NEXT: SUB_INT * T2.W, T2.W, PS, +; EG-NEXT: ADD_INT T0.Z, T0.X, 1, +; EG-NEXT: SETGE_UINT T4.W, PV.W, T1.W, +; EG-NEXT: SUB_INT * T5.W, PV.W, T1.W, +; EG-NEXT: CNDE_INT T2.W, PV.W, T2.W, PS, +; EG-NEXT: CNDE_INT * T4.W, PV.W, T0.X, PV.Z, +; EG-NEXT: ADD_INT T5.W, PS, 1, +; EG-NEXT: SETGE_UINT * T1.W, PV.W, T1.W, +; EG-NEXT: CNDE_INT T1.W, PS, T4.W, PV.W, BS:VEC_102/SCL_221 ; EG-NEXT: XOR_INT * T0.W, T3.W, T0.W, ; EG-NEXT: XOR_INT * T1.W, PV.W, PS, ; EG-NEXT: SUB_INT T0.X, PV.W, T0.W, @@ -622,7 +618,7 @@ define amdgpu_kernel void @sdiv_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> ad ; EG: ; %bb.0: ; EG-NEXT: ALU 0, @10, KC0[CB0:0-32], KC1[] ; EG-NEXT: TEX 1 @6 -; EG-NEXT: ALU 59, @11, KC0[CB0:0-32], KC1[] +; EG-NEXT: ALU 51, @11, KC0[CB0:0-32], KC1[] ; EG-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.XY, T1.X, 1 ; EG-NEXT: CF_END ; EG-NEXT: PAD @@ -633,61 +629,53 @@ define amdgpu_kernel void @sdiv_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> ad ; EG-NEXT: MOV * T0.X, KC0[2].Z, ; EG-NEXT: ALU clause starting at 11: ; EG-NEXT: SETGT_INT * T0.W, 0.0, T1.Y, -; EG-NEXT: ADD_INT * T1.W, T1.Y, PV.W, -; EG-NEXT: XOR_INT T1.W, PV.W, T0.W, +; EG-NEXT: ADD_INT T1.W, T1.Y, PV.W, ; EG-NEXT: SETGT_INT * T2.W, 0.0, T1.X, -; EG-NEXT: ADD_INT T3.W, T1.X, PS, -; EG-NEXT: RECIP_UINT * T0.Z, PV.W, -; EG-NEXT: XOR_INT T3.W, PV.W, T2.W, BS:VEC_021/SCL_122 -; EG-NEXT: MULLO_INT * T1.X, PS, T1.W, +; EG-NEXT: XOR_INT * T1.W, PV.W, T0.W, +; EG-NEXT: SUB_INT T0.Z, 0.0, PV.W, +; EG-NEXT: 
ADD_INT T3.W, T1.X, T2.W, +; EG-NEXT: RECIP_UINT * T1.X, PV.W, +; EG-NEXT: XOR_INT T3.W, PV.W, T2.W, +; EG-NEXT: MULLO_INT * T0.Z, PV.Z, PS, +; EG-NEXT: SUB_INT T4.W, 0.0, PV.W, ; EG-NEXT: RECIP_UINT * T1.Y, PV.W, -; EG-NEXT: MULLO_INT * T1.Z, PS, T3.W, -; EG-NEXT: SUB_INT T4.W, 0.0, PS, -; EG-NEXT: MULHI * T2.X, T1.Y, T3.W, -; EG-NEXT: CNDE_INT T1.Z, PS, PV.W, T1.Z, BS:VEC_021/SCL_122 -; EG-NEXT: SUB_INT T4.W, 0.0, T1.X, -; EG-NEXT: MULHI * T2.Y, T0.Z, T1.W, -; EG-NEXT: CNDE_INT T2.Z, PS, PV.W, T1.X, -; EG-NEXT: SETGT_INT T4.W, 0.0, T0.X, -; EG-NEXT: MULHI * T1.X, PV.Z, T1.Y, -; EG-NEXT: SETGT_INT T3.X, 0.0, T0.Y, -; EG-NEXT: ADD_INT T3.Y, T0.X, PV.W, -; EG-NEXT: ADD_INT T1.Z, T1.Y, PS, -; EG-NEXT: SUB_INT T5.W, T1.Y, PS, -; EG-NEXT: MULHI * T0.X, PV.Z, T0.Z, -; EG-NEXT: CNDE_INT T1.X, T2.X, PV.Z, PV.W, -; EG-NEXT: XOR_INT T1.Y, PV.Y, T4.W, -; EG-NEXT: ADD_INT T1.Z, T0.Y, PV.X, -; EG-NEXT: ADD_INT T5.W, T0.Z, PS, -; EG-NEXT: SUB_INT * T6.W, T0.Z, PS, -; EG-NEXT: CNDE_INT T0.Z, T2.Y, PV.W, PS, -; EG-NEXT: XOR_INT T5.W, PV.Z, T3.X, -; EG-NEXT: MULHI * T0.X, PV.X, PV.Y, -; EG-NEXT: MULHI * T0.Y, PV.Z, PV.W, +; EG-NEXT: SETGT_INT T5.W, 0.0, T0.X, +; EG-NEXT: MULLO_INT * T1.Z, PV.W, PS, +; EG-NEXT: SETGT_INT T2.Z, 0.0, T0.Y, +; EG-NEXT: ADD_INT T4.W, T0.X, PV.W, +; EG-NEXT: MULHI * T0.X, T1.Y, PS, +; EG-NEXT: ADD_INT T1.Y, T1.Y, PS, +; EG-NEXT: XOR_INT T1.Z, PV.W, T5.W, +; EG-NEXT: ADD_INT T4.W, T0.Y, PV.Z, BS:VEC_120/SCL_212 +; EG-NEXT: MULHI * T0.X, T1.X, T0.Z, +; EG-NEXT: ADD_INT T0.Z, T1.X, PS, +; EG-NEXT: XOR_INT T4.W, PV.W, T2.Z, +; EG-NEXT: MULHI * T0.X, PV.Z, PV.Y, +; EG-NEXT: MULHI * T0.Y, PV.W, PV.Z, ; EG-NEXT: MULLO_INT * T0.Z, PS, T1.W, -; EG-NEXT: SUB_INT T6.W, T5.W, PS, -; EG-NEXT: MULLO_INT * T1.X, T0.X, T3.W, -; EG-NEXT: SUB_INT T1.Z, T1.Y, PS, -; EG-NEXT: SETGE_UINT T1.W, PV.W, T1.W, -; EG-NEXT: SETGE_UINT * T5.W, T5.W, T0.Z, -; EG-NEXT: AND_INT T2.Y, PV.W, PS, +; EG-NEXT: SUB_INT T4.W, T4.W, PS, +; EG-NEXT: MULLO_INT * T0.Z, T0.X, T3.W, +; EG-NEXT: 
SUB_INT T1.Y, T1.Z, PS, ; EG-NEXT: ADD_INT T0.Z, T0.Y, 1, -; EG-NEXT: SETGE_UINT T1.W, PV.Z, T3.W, -; EG-NEXT: SETGE_UINT * T3.W, T1.Y, T1.X, -; EG-NEXT: AND_INT T1.Y, PV.W, PS, -; EG-NEXT: ADD_INT T1.Z, T0.X, 1, -; EG-NEXT: CNDE_INT T1.W, PV.Y, T0.Y, PV.Z, -; EG-NEXT: ADD_INT * T6.W, T0.Y, literal.x, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T0.Y, T5.W, PS, PV.W, -; EG-NEXT: XOR_INT T0.Z, T3.X, T0.W, -; EG-NEXT: CNDE_INT T0.W, PV.Y, T0.X, PV.Z, -; EG-NEXT: ADD_INT * T1.W, T0.X, literal.x, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T1.Z, T3.W, PS, PV.W, -; EG-NEXT: XOR_INT T0.W, T4.W, T2.W, BS:VEC_120/SCL_212 +; EG-NEXT: SETGE_UINT T6.W, PV.W, T1.W, +; EG-NEXT: SUB_INT * T7.W, PV.W, T1.W, +; EG-NEXT: CNDE_INT T1.X, PV.W, T4.W, PS, BS:VEC_021/SCL_122 +; EG-NEXT: CNDE_INT T0.Y, PV.W, T0.Y, PV.Z, +; EG-NEXT: ADD_INT T0.Z, T0.X, 1, +; EG-NEXT: SETGE_UINT T4.W, PV.Y, T3.W, +; EG-NEXT: SUB_INT * T6.W, PV.Y, T3.W, +; EG-NEXT: CNDE_INT T1.Y, PV.W, T1.Y, PS, +; EG-NEXT: CNDE_INT T0.Z, PV.W, T0.X, PV.Z, +; EG-NEXT: ADD_INT T4.W, PV.Y, 1, +; EG-NEXT: SETGE_UINT * T1.W, PV.X, T1.W, +; EG-NEXT: CNDE_INT T0.Y, PS, T0.Y, PV.W, +; EG-NEXT: XOR_INT T1.Z, T2.Z, T0.W, BS:VEC_021/SCL_122 +; EG-NEXT: ADD_INT T0.W, PV.Z, 1, +; EG-NEXT: SETGE_UINT * T1.W, PV.Y, T3.W, +; EG-NEXT: CNDE_INT T0.Z, PS, T0.Z, PV.W, +; EG-NEXT: XOR_INT T0.W, T5.W, T2.W, ; EG-NEXT: XOR_INT * T1.W, PV.Y, PV.Z, -; EG-NEXT: SUB_INT T0.Y, PS, T0.Z, +; EG-NEXT: SUB_INT T0.Y, PS, T1.Z, ; EG-NEXT: XOR_INT * T1.W, PV.Z, PV.W, ; EG-NEXT: SUB_INT T0.X, PV.W, T0.W, ; EG-NEXT: LSHR * T1.X, KC0[2].Y, literal.x, @@ -1214,138 +1202,118 @@ define amdgpu_kernel void @sdiv_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> ad ; ; EG-LABEL: sdiv_v4i32: ; EG: ; %bb.0: -; EG-NEXT: ALU 0, @12, KC0[CB0:0-32], KC1[] -; EG-NEXT: TEX 0 @8 -; EG-NEXT: ALU 2, @13, KC0[], KC1[] -; EG-NEXT: TEX 0 @10 -; EG-NEXT: ALU 114, @16, KC0[CB0:0-32], KC1[] -; EG-NEXT: MEM_RAT_CACHELESS STORE_RAW T3.XYZW, T0.X, 1 +; EG-NEXT: 
ALU 0, @10, KC0[CB0:0-32], KC1[] +; EG-NEXT: TEX 1 @6 +; EG-NEXT: ALU 101, @11, KC0[CB0:0-32], KC1[] +; EG-NEXT: MEM_RAT_CACHELESS STORE_RAW T2.XYZW, T0.X, 1 ; EG-NEXT: CF_END ; EG-NEXT: PAD -; EG-NEXT: Fetch clause starting at 8: +; EG-NEXT: Fetch clause starting at 6: ; EG-NEXT: VTX_READ_128 T1.XYZW, T0.X, 16, #1 -; EG-NEXT: Fetch clause starting at 10: -; EG-NEXT: VTX_READ_128 T3.XYZW, T0.X, 0, #1 -; EG-NEXT: ALU clause starting at 12: +; EG-NEXT: VTX_READ_128 T0.XYZW, T0.X, 0, #1 +; EG-NEXT: ALU clause starting at 10: ; EG-NEXT: MOV * T0.X, KC0[2].Z, -; EG-NEXT: ALU clause starting at 13: -; EG-NEXT: SETGT_INT * T0.W, 0.0, T1.Z, -; EG-NEXT: ADD_INT * T2.W, T1.Z, PV.W, -; EG-NEXT: XOR_INT * T2.W, PV.W, T0.W, -; EG-NEXT: ALU clause starting at 16: -; EG-NEXT: RECIP_UINT * T0.X, T2.W, -; EG-NEXT: MULLO_INT * T0.Y, PS, T2.W, -; EG-NEXT: SUB_INT T4.W, 0.0, PS, -; EG-NEXT: MULHI * T0.Z, T0.X, T2.W, -; EG-NEXT: CNDE_INT T4.W, PS, PV.W, T0.Y, -; EG-NEXT: SETGT_INT * T5.W, 0.0, T3.Z, -; EG-NEXT: MULHI * T0.Y, PV.W, T0.X, -; EG-NEXT: SETGT_INT T2.Y, 0.0, T1.W, -; EG-NEXT: ADD_INT T1.Z, T3.Z, T5.W, BS:VEC_021/SCL_122 -; EG-NEXT: ADD_INT T4.W, T0.X, PS, -; EG-NEXT: SUB_INT * T6.W, T0.X, PS, -; EG-NEXT: CNDE_INT T0.Z, T0.Z, PV.W, PS, -; EG-NEXT: XOR_INT T4.W, PV.Z, T5.W, -; EG-NEXT: ADD_INT * T1.W, T1.W, PV.Y, -; EG-NEXT: XOR_INT T1.W, PS, T2.Y, -; EG-NEXT: MULHI * T0.X, PV.Z, PV.W, -; EG-NEXT: SETGT_INT T6.W, 0.0, T1.Y, -; EG-NEXT: RECIP_UINT * T0.Y, PV.W, -; EG-NEXT: ADD_INT T7.W, T1.Y, PV.W, -; EG-NEXT: MULLO_INT * T0.Z, PS, T1.W, -; EG-NEXT: XOR_INT T1.Z, PV.W, T6.W, BS:VEC_021/SCL_122 -; EG-NEXT: SUB_INT T7.W, 0.0, PS, -; EG-NEXT: MULHI * T1.Y, T0.Y, T1.W, -; EG-NEXT: CNDE_INT T7.W, PS, PV.W, T0.Z, -; EG-NEXT: RECIP_UINT * T0.Z, PV.Z, -; EG-NEXT: SETGT_INT T8.W, 0.0, T3.W, -; EG-NEXT: MULHI * T2.X, PV.W, T0.Y, -; EG-NEXT: ADD_INT T4.Y, T3.W, PV.W, -; EG-NEXT: ADD_INT T2.Z, T0.Y, PS, -; EG-NEXT: SUB_INT T3.W, T0.Y, PS, -; EG-NEXT: MULLO_INT * T0.Y, T0.Z, T1.Z, -; 
EG-NEXT: CNDE_INT T2.X, T1.Y, PV.Z, PV.W, -; EG-NEXT: XOR_INT T1.Y, PV.Y, T8.W, -; EG-NEXT: SETGT_INT T2.Z, 0.0, T1.X, -; EG-NEXT: SUB_INT T3.W, 0.0, PS, -; EG-NEXT: MULHI * T3.Z, T0.Z, T1.Z, -; EG-NEXT: CNDE_INT T4.Z, PS, PV.W, T0.Y, -; EG-NEXT: ADD_INT T3.W, T1.X, PV.Z, -; EG-NEXT: MULHI * T0.Y, PV.X, PV.Y, -; EG-NEXT: XOR_INT T3.W, PV.W, T2.Z, BS:VEC_021/SCL_122 -; EG-NEXT: MULHI * T1.X, PV.Z, T0.Z, +; EG-NEXT: ALU clause starting at 11: +; EG-NEXT: SETGT_INT * T2.W, 0.0, T1.W, +; EG-NEXT: ADD_INT * T1.W, T1.W, PV.W, +; EG-NEXT: XOR_INT * T1.W, PV.W, T2.W, +; EG-NEXT: SUB_INT T3.W, 0.0, PV.W, ; EG-NEXT: RECIP_UINT * T2.X, PV.W, -; EG-NEXT: MULLO_INT * T4.X, PS, T3.W, -; EG-NEXT: SETGT_INT T4.Z, 0.0, T3.Y, -; EG-NEXT: SUB_INT T7.W, 0.0, PS, -; EG-NEXT: MULHI * T4.Y, T2.X, T3.W, -; EG-NEXT: CNDE_INT T4.X, PS, PV.W, T4.X, -; EG-NEXT: ADD_INT T3.Y, T3.Y, PV.Z, -; EG-NEXT: ADD_INT T5.Z, T0.Z, T1.X, -; EG-NEXT: SUB_INT T7.W, T0.Z, T1.X, -; EG-NEXT: MULLO_INT * T0.Z, T0.Y, T1.W, -; EG-NEXT: CNDE_INT T5.Y, T3.Z, PV.Z, PV.W, -; EG-NEXT: XOR_INT T3.Z, PV.Y, T4.Z, -; EG-NEXT: SUB_INT T7.W, T1.Y, PS, -; EG-NEXT: MULHI * T1.X, PV.X, T2.X, -; EG-NEXT: SETGE_UINT T5.Z, PV.W, T1.W, -; EG-NEXT: SETGE_UINT T1.W, T1.Y, T0.Z, -; EG-NEXT: MULHI * T0.Z, PV.Y, PV.Z, -; EG-NEXT: AND_INT T1.Y, PV.Z, PV.W, -; EG-NEXT: ADD_INT T5.Z, T0.Y, 1, -; EG-NEXT: SETGT_INT T7.W, 0.0, T3.X, -; EG-NEXT: MULLO_INT * T3.Y, PS, T1.Z, -; EG-NEXT: SUB_INT T4.X, T3.Z, PS, -; EG-NEXT: ADD_INT T5.Y, T3.X, PV.W, -; EG-NEXT: ADD_INT T6.Z, T2.X, T1.X, BS:VEC_120/SCL_212 -; EG-NEXT: SUB_INT * T9.W, T2.X, T1.X, BS:VEC_120/SCL_212 -; EG-NEXT: MULLO_INT * T1.X, T0.X, T2.W, -; EG-NEXT: CNDE_INT T2.X, T4.Y, T6.Z, T9.W, -; EG-NEXT: XOR_INT T4.Y, T5.Y, T7.W, BS:VEC_201 -; EG-NEXT: SUB_INT T6.Z, T4.W, PS, BS:VEC_120/SCL_212 -; EG-NEXT: SETGE_UINT T9.W, T4.X, T1.Z, BS:VEC_102/SCL_221 -; EG-NEXT: SETGE_UINT * T10.W, T3.Z, T3.Y, -; EG-NEXT: AND_INT T3.X, PV.W, PS, -; EG-NEXT: ADD_INT T3.Y, T0.Z, 1, -; EG-NEXT: SETGE_UINT 
T1.Z, PV.Z, T2.W, -; EG-NEXT: SETGE_UINT T2.W, T4.W, T1.X, -; EG-NEXT: MULHI * T1.X, PV.X, PV.Y, -; EG-NEXT: AND_INT T2.X, PV.Z, PV.W, -; EG-NEXT: ADD_INT T5.Y, T0.X, 1, -; EG-NEXT: CNDE_INT T1.Z, PV.X, T0.Z, PV.Y, -; EG-NEXT: ADD_INT T4.W, T0.Z, literal.x, -; EG-NEXT: MULLO_INT * T0.Z, PS, T3.W, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T3.X, T10.W, PV.W, PV.Z, -; EG-NEXT: CNDE_INT T3.Y, PV.X, T0.X, PV.Y, -; EG-NEXT: CNDE_INT T1.Z, T1.Y, T0.Y, T5.Z, -; EG-NEXT: ADD_INT T4.W, T0.Y, literal.x, BS:VEC_120/SCL_212 -; EG-NEXT: SUB_INT * T9.W, T4.Y, PS, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: ADD_INT T0.X, T0.X, literal.x, -; EG-NEXT: SETGE_UINT T0.Y, PS, T3.W, -; EG-NEXT: SETGE_UINT T0.Z, T4.Y, T0.Z, -; EG-NEXT: CNDE_INT T1.W, T1.W, PV.W, PV.Z, -; EG-NEXT: XOR_INT * T3.W, T8.W, T2.Y, -; EG-NEXT: -1(nan), 0(0.000000e+00) +; EG-NEXT: SETGT_INT T4.W, 0.0, T0.W, +; EG-NEXT: MULLO_INT * T2.Y, PV.W, PS, +; EG-NEXT: SETGT_INT T2.Z, 0.0, T1.Y, +; EG-NEXT: ADD_INT T0.W, T0.W, PV.W, +; EG-NEXT: MULHI * T2.Y, T2.X, PS, +; EG-NEXT: ADD_INT T3.Z, T2.X, PS, +; EG-NEXT: XOR_INT T0.W, PV.W, T4.W, +; EG-NEXT: ADD_INT * T3.W, T1.Y, PV.Z, +; EG-NEXT: XOR_INT T3.W, PS, T2.Z, +; EG-NEXT: MULHI * T1.Y, PV.W, PV.Z, +; EG-NEXT: SUB_INT T5.W, 0.0, PV.W, +; EG-NEXT: RECIP_UINT * T2.X, PV.W, +; EG-NEXT: SETGT_INT T6.W, 0.0, T0.Y, +; EG-NEXT: MULLO_INT * T2.Y, PV.W, PS, +; EG-NEXT: ADD_INT T5.W, T0.Y, PV.W, +; EG-NEXT: MULHI * T0.Y, T2.X, PS, +; EG-NEXT: ADD_INT T0.Y, T2.X, PS, +; EG-NEXT: XOR_INT T3.Z, PV.W, T6.W, BS:VEC_021/SCL_122 +; EG-NEXT: SETGT_INT T5.W, 0.0, T1.Z, +; EG-NEXT: MULLO_INT * T2.X, T1.Y, T1.W, +; EG-NEXT: ADD_INT T7.W, T1.Z, PV.W, +; EG-NEXT: MULHI * T0.Y, PV.Z, PV.Y, +; EG-NEXT: XOR_INT T7.W, PV.W, T5.W, BS:VEC_021/SCL_122 +; EG-NEXT: MULLO_INT * T1.Z, PS, T3.W, +; EG-NEXT: SUB_INT T4.Z, 0.0, PV.W, +; EG-NEXT: SETGT_INT T8.W, 0.0, T1.X, +; EG-NEXT: RECIP_UINT * T2.Y, PV.W, +; EG-NEXT: ADD_INT T9.W, T1.X, PV.W, +; EG-NEXT: MULLO_INT * T1.X, PV.Z, PS, 
+; EG-NEXT: SETGT_INT T4.Z, 0.0, T0.Z, +; EG-NEXT: XOR_INT T9.W, PV.W, T8.W, +; EG-NEXT: MULHI * T1.X, T2.Y, PS, +; EG-NEXT: ADD_INT T1.X, T2.Y, PS, +; EG-NEXT: SUB_INT T2.Y, 0.0, PV.W, +; EG-NEXT: SUB_INT T1.Z, T3.Z, T1.Z, +; EG-NEXT: ADD_INT T10.W, T0.Z, PV.Z, BS:VEC_201 +; EG-NEXT: RECIP_UINT * T0.Z, PV.W, +; EG-NEXT: XOR_INT T3.X, PV.W, T4.Z, +; EG-NEXT: ADD_INT T3.Y, T0.Y, 1, +; EG-NEXT: SETGE_UINT T3.Z, PV.Z, T3.W, +; EG-NEXT: SUB_INT T10.W, PV.Z, T3.W, +; EG-NEXT: MULLO_INT * T2.Y, PV.Y, PS, +; EG-NEXT: CNDE_INT T1.Z, PV.Z, T1.Z, PV.W, +; EG-NEXT: CNDE_INT T10.W, PV.Z, T0.Y, PV.Y, +; EG-NEXT: MULHI * T0.Y, PV.X, T1.X, +; EG-NEXT: SETGT_INT T3.Y, 0.0, T0.X, +; EG-NEXT: ADD_INT T3.Z, PV.W, 1, +; EG-NEXT: SETGE_UINT T3.W, PV.Z, T3.W, BS:VEC_021/SCL_122 +; EG-NEXT: MULLO_INT * T1.X, PS, T7.W, +; EG-NEXT: CNDE_INT T4.Y, PV.W, T10.W, PV.Z, +; EG-NEXT: ADD_INT T1.Z, T0.X, PV.Y, +; EG-NEXT: SUB_INT T3.W, T3.X, PS, BS:VEC_120/SCL_212 +; EG-NEXT: MULHI * T0.X, T0.Z, T2.Y, +; EG-NEXT: ADD_INT T1.X, T0.Y, 1, +; EG-NEXT: SETGE_UINT T2.Y, PV.W, T7.W, +; EG-NEXT: ADD_INT T0.Z, T0.Z, PS, +; EG-NEXT: XOR_INT T10.W, PV.Z, T3.Y, +; EG-NEXT: SUB_INT * T0.W, T0.W, T2.X, +; EG-NEXT: SUB_INT T0.X, T3.W, T7.W, +; EG-NEXT: ADD_INT T5.Y, T1.Y, 1, +; EG-NEXT: SETGE_UINT T1.Z, PS, T1.W, BS:VEC_021/SCL_122 +; EG-NEXT: SUB_INT T11.W, PS, T1.W, BS:VEC_021/SCL_122 +; EG-NEXT: MULHI * T0.Z, PV.W, PV.Z, +; EG-NEXT: CNDE_INT T2.X, PV.Z, T0.W, PV.W, BS:VEC_021/SCL_122 +; EG-NEXT: CNDE_INT T1.Y, PV.Z, T1.Y, PV.Y, +; EG-NEXT: CNDE_INT T1.Z, T2.Y, T3.W, PV.X, BS:VEC_201 +; EG-NEXT: CNDE_INT T0.W, T2.Y, T0.Y, T1.X, BS:VEC_201 +; EG-NEXT: MULLO_INT * T0.X, PS, T9.W, +; EG-NEXT: ADD_INT T1.X, PV.W, 1, +; EG-NEXT: SETGE_UINT T0.Y, PV.Z, T7.W, +; EG-NEXT: ADD_INT T1.Z, PV.Y, 1, +; EG-NEXT: SETGE_UINT T1.W, PV.X, T1.W, BS:VEC_102/SCL_221 +; EG-NEXT: SUB_INT * T3.W, T10.W, PS, +; EG-NEXT: ADD_INT T0.X, T0.Z, 1, +; EG-NEXT: SETGE_UINT T2.Y, PS, T9.W, BS:VEC_102/SCL_221 +; EG-NEXT: SUB_INT T3.Z, PS, T9.W, 
BS:VEC_102/SCL_221 +; EG-NEXT: CNDE_INT T1.W, PV.W, T1.Y, PV.Z, +; EG-NEXT: XOR_INT * T2.W, T4.W, T2.W, ; EG-NEXT: XOR_INT T2.X, PV.W, PS, -; EG-NEXT: AND_INT T0.Y, PV.Y, PV.Z, -; EG-NEXT: ADD_INT T1.Z, T1.X, 1, -; EG-NEXT: CNDE_INT T1.W, T2.W, PV.X, T3.Y, -; EG-NEXT: XOR_INT * T0.W, T5.W, T0.W, -; EG-NEXT: XOR_INT T0.X, T4.Z, T6.W, BS:VEC_021/SCL_122 -; EG-NEXT: XOR_INT T1.Y, PV.W, PS, -; EG-NEXT: CNDE_INT T1.Z, PV.Y, T1.X, PV.Z, -; EG-NEXT: ADD_INT T1.W, T1.X, literal.x, -; EG-NEXT: SUB_INT * T3.W, PV.X, T3.W, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T0.Y, T0.Z, PV.W, PV.Z, -; EG-NEXT: SUB_INT T3.Z, PV.Y, T0.W, -; EG-NEXT: XOR_INT T0.W, T7.W, T2.Z, -; EG-NEXT: XOR_INT * T1.W, T3.X, PV.X, -; EG-NEXT: SUB_INT T3.Y, PS, T0.X, +; EG-NEXT: CNDE_INT T1.Y, PV.Y, T3.W, PV.Z, BS:VEC_021/SCL_122 +; EG-NEXT: CNDE_INT T0.Z, PV.Y, T0.Z, PV.X, +; EG-NEXT: CNDE_INT T0.W, T0.Y, T0.W, T1.X, BS:VEC_102/SCL_221 +; EG-NEXT: XOR_INT * T1.W, T4.Z, T5.W, +; EG-NEXT: XOR_INT T0.X, T6.W, T2.Z, +; EG-NEXT: XOR_INT T0.Y, PV.W, PS, +; EG-NEXT: ADD_INT T1.Z, PV.Z, 1, +; EG-NEXT: SETGE_UINT T0.W, PV.Y, T9.W, BS:VEC_021/SCL_122 +; EG-NEXT: SUB_INT * T2.W, PV.X, T2.W, +; EG-NEXT: CNDE_INT T1.Y, PV.W, T0.Z, PV.Z, +; EG-NEXT: SUB_INT T2.Z, PV.Y, T1.W, +; EG-NEXT: XOR_INT T0.W, T3.Y, T8.W, BS:VEC_021/SCL_122 +; EG-NEXT: XOR_INT * T1.W, T4.Y, PV.X, +; EG-NEXT: SUB_INT T2.Y, PS, T0.X, ; EG-NEXT: XOR_INT * T1.W, PV.Y, PV.W, -; EG-NEXT: SUB_INT T3.X, PV.W, T0.W, +; EG-NEXT: SUB_INT T2.X, PV.W, T0.W, ; EG-NEXT: LSHR * T0.X, KC0[2].Y, literal.x, ; EG-NEXT: 2(2.802597e-45), 0(0.000000e+00) %den_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1 @@ -1947,7 +1915,7 @@ define amdgpu_kernel void @v_sdiv_i24(i32 addrspace(1)* %out, i24 addrspace(1)* ; EG: ; %bb.0: ; EG-NEXT: ALU 0, @14, KC0[CB0:0-32], KC1[] ; EG-NEXT: TEX 3 @6 -; EG-NEXT: ALU 43, @15, KC0[CB0:0-32], KC1[] +; EG-NEXT: ALU 39, @15, KC0[CB0:0-32], KC1[] ; EG-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1 ; 
EG-NEXT: CF_END ; EG-NEXT: PAD @@ -1965,37 +1933,33 @@ define amdgpu_kernel void @v_sdiv_i24(i32 addrspace(1)* %out, i24 addrspace(1)* ; EG-NEXT: 16(2.242078e-44), 0(0.000000e+00) ; EG-NEXT: OR_INT * T0.W, T0.X, PV.W, ; EG-NEXT: SETGT_INT * T1.W, 0.0, PV.W, -; EG-NEXT: ADD_INT * T0.W, T0.W, PV.W, -; EG-NEXT: XOR_INT * T0.W, PV.W, T1.W, -; EG-NEXT: RECIP_UINT * T0.X, PV.W, ; EG-NEXT: BFE_INT T2.W, T3.X, 0.0, literal.x, -; EG-NEXT: MULLO_INT * T0.Y, PS, T0.W, +; EG-NEXT: ADD_INT * T0.W, T0.W, PV.W, ; EG-NEXT: 8(1.121039e-44), 0(0.000000e+00) -; EG-NEXT: LSHL T0.Z, PV.W, literal.x, -; EG-NEXT: SUB_INT T2.W, 0.0, PS, -; EG-NEXT: MULHI * T1.X, T0.X, T0.W, +; EG-NEXT: LSHL T2.W, PV.W, literal.x, +; EG-NEXT: XOR_INT * T0.W, PS, T1.W, ; EG-NEXT: 16(2.242078e-44), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T2.W, PS, PV.W, T0.Y, -; EG-NEXT: OR_INT * T3.W, T2.X, PV.Z, -; EG-NEXT: SETGT_INT T4.W, 0.0, PS, -; EG-NEXT: MULHI * T0.Y, PV.W, T0.X, -; EG-NEXT: ADD_INT T0.Z, T3.W, PV.W, -; EG-NEXT: ADD_INT T2.W, T0.X, PS, -; EG-NEXT: SUB_INT * T3.W, T0.X, PS, -; EG-NEXT: CNDE_INT T2.W, T1.X, PV.W, PS, -; EG-NEXT: XOR_INT * T3.W, PV.Z, T4.W, -; EG-NEXT: MULHI * T0.X, PV.W, PS, +; EG-NEXT: SUB_INT T0.Z, 0.0, PS, +; EG-NEXT: OR_INT T2.W, T2.X, PV.W, +; EG-NEXT: RECIP_UINT * T0.X, PS, +; EG-NEXT: SETGT_INT T3.W, 0.0, PV.W, +; EG-NEXT: MULLO_INT * T0.Y, PV.Z, PS, +; EG-NEXT: ADD_INT T2.W, T2.W, PV.W, +; EG-NEXT: MULHI * T0.Y, T0.X, PS, +; EG-NEXT: ADD_INT T4.W, T0.X, PS, +; EG-NEXT: XOR_INT * T2.W, PV.W, T3.W, +; EG-NEXT: MULHI * T0.X, PS, PV.W, ; EG-NEXT: MULLO_INT * T0.Y, PS, T0.W, -; EG-NEXT: SUB_INT * T2.W, T3.W, PS, -; EG-NEXT: SETGE_UINT T0.W, PV.W, T0.W, -; EG-NEXT: SETGE_UINT * T2.W, T3.W, T0.Y, -; EG-NEXT: AND_INT T0.W, PV.W, PS, -; EG-NEXT: ADD_INT * T3.W, T0.X, 1, -; EG-NEXT: CNDE_INT T0.W, PV.W, T0.X, PS, -; EG-NEXT: ADD_INT * T3.W, T0.X, literal.x, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T0.W, T2.W, PS, PV.W, -; EG-NEXT: XOR_INT * T1.W, T4.W, T1.W, +; EG-NEXT: 
SUB_INT * T2.W, T2.W, PS, +; EG-NEXT: ADD_INT T0.Z, T0.X, 1, +; EG-NEXT: SETGE_UINT T4.W, PV.W, T0.W, +; EG-NEXT: SUB_INT * T5.W, PV.W, T0.W, +; EG-NEXT: CNDE_INT T2.W, PV.W, T2.W, PS, +; EG-NEXT: CNDE_INT * T4.W, PV.W, T0.X, PV.Z, +; EG-NEXT: ADD_INT T5.W, PS, 1, +; EG-NEXT: SETGE_UINT * T0.W, PV.W, T0.W, +; EG-NEXT: CNDE_INT T0.W, PS, T4.W, PV.W, BS:VEC_102/SCL_221 +; EG-NEXT: XOR_INT * T1.W, T3.W, T1.W, ; EG-NEXT: XOR_INT * T0.W, PV.W, PS, ; EG-NEXT: SUB_INT * T0.W, PV.W, T1.W, ; EG-NEXT: LSHL * T0.W, PV.W, literal.x, @@ -2161,7 +2125,7 @@ define amdgpu_kernel void @v_sdiv_i25(i32 addrspace(1)* %out, i25 addrspace(1)* ; EG: ; %bb.0: ; EG-NEXT: ALU 1, @10, KC0[CB0:0-32], KC1[] ; EG-NEXT: TEX 1 @6 -; EG-NEXT: ALU 41, @12, KC0[CB0:0-32], KC1[] +; EG-NEXT: ALU 37, @12, KC0[CB0:0-32], KC1[] ; EG-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1 ; EG-NEXT: CF_END ; EG-NEXT: PAD @@ -2177,36 +2141,32 @@ define amdgpu_kernel void @v_sdiv_i25(i32 addrspace(1)* %out, i25 addrspace(1)* ; EG-NEXT: ASHR * T0.W, PV.W, literal.x, ; EG-NEXT: 7(9.809089e-45), 0(0.000000e+00) ; EG-NEXT: SETGT_INT * T1.W, 0.0, PV.W, -; EG-NEXT: ADD_INT * T0.W, T0.W, PV.W, +; EG-NEXT: ADD_INT T0.W, T0.W, PV.W, +; EG-NEXT: LSHL * T2.W, T1.X, literal.x, +; EG-NEXT: 7(9.809089e-45), 0(0.000000e+00) ; EG-NEXT: XOR_INT * T0.W, PV.W, T1.W, +; EG-NEXT: SUB_INT T0.Z, 0.0, PV.W, +; EG-NEXT: ASHR T2.W, T2.W, literal.x, ; EG-NEXT: RECIP_UINT * T0.X, PV.W, -; EG-NEXT: MULLO_INT * T0.Y, PS, T0.W, -; EG-NEXT: LSHL T0.Z, T1.X, literal.x, -; EG-NEXT: SUB_INT T2.W, 0.0, PS, -; EG-NEXT: MULHI * T1.X, T0.X, T0.W, -; EG-NEXT: 7(9.809089e-45), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T2.W, PS, PV.W, T0.Y, -; EG-NEXT: ASHR * T3.W, PV.Z, literal.x, ; EG-NEXT: 7(9.809089e-45), 0(0.000000e+00) -; EG-NEXT: SETGT_INT T4.W, 0.0, PS, -; EG-NEXT: MULHI * T0.Y, PV.W, T0.X, -; EG-NEXT: ADD_INT T0.Z, T3.W, PV.W, -; EG-NEXT: ADD_INT T2.W, T0.X, PS, -; EG-NEXT: SUB_INT * T3.W, T0.X, PS, -; EG-NEXT: CNDE_INT T2.W, T1.X, PV.W, PS, -; 
EG-NEXT: XOR_INT * T3.W, PV.Z, T4.W, -; EG-NEXT: MULHI * T0.X, PV.W, PS, +; EG-NEXT: SETGT_INT T3.W, 0.0, PV.W, +; EG-NEXT: MULLO_INT * T0.Y, PV.Z, PS, +; EG-NEXT: ADD_INT T2.W, T2.W, PV.W, +; EG-NEXT: MULHI * T0.Y, T0.X, PS, +; EG-NEXT: ADD_INT T4.W, T0.X, PS, +; EG-NEXT: XOR_INT * T2.W, PV.W, T3.W, +; EG-NEXT: MULHI * T0.X, PS, PV.W, ; EG-NEXT: MULLO_INT * T0.Y, PS, T0.W, -; EG-NEXT: SUB_INT * T2.W, T3.W, PS, -; EG-NEXT: SETGE_UINT T0.W, PV.W, T0.W, -; EG-NEXT: SETGE_UINT * T2.W, T3.W, T0.Y, -; EG-NEXT: AND_INT T0.W, PV.W, PS, -; EG-NEXT: ADD_INT * T3.W, T0.X, 1, -; EG-NEXT: CNDE_INT T0.W, PV.W, T0.X, PS, -; EG-NEXT: ADD_INT * T3.W, T0.X, literal.x, -; EG-NEXT: -1(nan), 0(0.000000e+00) -; EG-NEXT: CNDE_INT T0.W, T2.W, PS, PV.W, -; EG-NEXT: XOR_INT * T1.W, T4.W, T1.W, +; EG-NEXT: SUB_INT * T2.W, T2.W, PS, +; EG-NEXT: ADD_INT T0.Z, T0.X, 1, +; EG-NEXT: SETGE_UINT T4.W, PV.W, T0.W, +; EG-NEXT: SUB_INT * T5.W, PV.W, T0.W, +; EG-NEXT: CNDE_INT T2.W, PV.W, T2.W, PS, +; EG-NEXT: CNDE_INT * T4.W, PV.W, T0.X, PV.Z, +; EG-NEXT: ADD_INT T5.W, PS, 1, +; EG-NEXT: SETGE_UINT * T0.W, PV.W, T0.W, +; EG-NEXT: CNDE_INT T0.W, PS, T4.W, PV.W, BS:VEC_102/SCL_221 +; EG-NEXT: XOR_INT * T1.W, T3.W, T1.W, ; EG-NEXT: XOR_INT * T0.W, PV.W, PS, ; EG-NEXT: SUB_INT * T0.W, PV.W, T1.W, ; EG-NEXT: LSHL * T0.W, PV.W, literal.x, diff --git a/llvm/test/CodeGen/AMDGPU/udivrem.ll b/llvm/test/CodeGen/AMDGPU/udivrem.ll index be06c3d10431..10299b314e83 100644 --- a/llvm/test/CodeGen/AMDGPU/udivrem.ll +++ b/llvm/test/CodeGen/AMDGPU/udivrem.ll @@ -6,37 +6,31 @@ define amdgpu_kernel void @test_udivrem(i32 addrspace(1)* %out0, [8 x i32], i32 addrspace(1)* %out1, [8 x i32], i32 %x, [8 x i32], i32 %y) { ; R600-LABEL: test_udivrem: ; R600: ; %bb.0: -; R600-NEXT: ALU 27, @4, KC0[CB0:0-32], KC1[] -; R600-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.X, T3.X, 0 -; R600-NEXT: MEM_RAT_CACHELESS STORE_RAW T1.X, T2.X, 1 +; R600-NEXT: ALU 21, @4, KC0[CB0:0-32], KC1[] +; R600-NEXT: MEM_RAT_CACHELESS STORE_RAW T2.X, T3.X, 0 +; 
R600-NEXT: MEM_RAT_CACHELESS STORE_RAW T1.X, T0.X, 1 ; R600-NEXT: CF_END ; R600-NEXT: ALU clause starting at 4: +; R600-NEXT: SUB_INT T0.W, 0.0, KC0[9].X, ; R600-NEXT: RECIP_UINT * T0.X, KC0[9].X, -; R600-NEXT: MULLO_INT * T0.Y, PS, KC0[9].X, -; R600-NEXT: SUB_INT T0.W, 0.0, PS, -; R600-NEXT: MULHI * T0.Z, T0.X, KC0[9].X, -; R600-NEXT: CNDE_INT * T0.W, PS, PV.W, T0.Y, -; R600-NEXT: MULHI * T0.Y, PV.W, T0.X, -; R600-NEXT: ADD_INT T0.W, T0.X, PS, -; R600-NEXT: SUB_INT * T1.W, T0.X, PS, -; R600-NEXT: CNDE_INT * T0.W, T0.Z, PV.W, PS, -; R600-NEXT: MULHI * T0.X, PV.W, KC0[6].W, +; R600-NEXT: MULLO_INT * T0.Y, PV.W, PS, +; R600-NEXT: MULHI * T0.Y, T0.X, PS, +; R600-NEXT: ADD_INT * T0.W, T0.X, PS, +; R600-NEXT: MULHI * T0.X, KC0[6].W, PV.W, ; R600-NEXT: MULLO_INT * T0.Y, PS, KC0[9].X, ; R600-NEXT: SUB_INT * T0.W, KC0[6].W, PS, -; R600-NEXT: SETGE_UINT T1.W, PV.W, KC0[9].X, -; R600-NEXT: SETGE_UINT * T2.W, KC0[6].W, T0.Y, -; R600-NEXT: AND_INT T1.W, PV.W, PS, -; R600-NEXT: SUB_INT * T3.W, T0.W, KC0[9].X, -; R600-NEXT: CNDE_INT T3.W, PV.W, T0.W, PS, -; R600-NEXT: ADD_INT * T0.W, T0.W, KC0[9].X, -; R600-NEXT: CNDE_INT T1.X, T2.W, PS, PV.W, -; R600-NEXT: ADD_INT T0.W, T0.X, 1, -; R600-NEXT: LSHR * T2.X, KC0[4].Z, literal.x, +; R600-NEXT: SUB_INT T1.W, PV.W, KC0[9].X, +; R600-NEXT: SETGE_UINT * T2.W, PV.W, KC0[9].X, +; R600-NEXT: CNDE_INT * T0.W, PS, T0.W, PV.W, +; R600-NEXT: ADD_INT T0.Z, T0.X, 1, +; R600-NEXT: SUB_INT T1.W, PV.W, KC0[9].X, +; R600-NEXT: SETGE_UINT * T3.W, PV.W, KC0[9].X, +; R600-NEXT: CNDE_INT T1.X, PS, T0.W, PV.W, +; R600-NEXT: CNDE_INT T0.W, T2.W, T0.X, PV.Z, +; R600-NEXT: LSHR * T0.X, KC0[4].Z, literal.x, ; R600-NEXT: 2(2.802597e-45), 0(0.000000e+00) -; R600-NEXT: CNDE_INT T0.W, T1.W, T0.X, PV.W, -; R600-NEXT: ADD_INT * T1.W, T0.X, literal.x, -; R600-NEXT: -1(nan), 0(0.000000e+00) -; R600-NEXT: CNDE_INT T0.X, T2.W, PS, PV.W, +; R600-NEXT: ADD_INT * T1.W, PV.W, 1, +; R600-NEXT: CNDE_INT T2.X, T3.W, T0.W, PV.W, ; R600-NEXT: LSHR * T3.X, KC0[2].Y, literal.x, 
; R600-NEXT: 2(2.802597e-45), 0(0.000000e+00) ; @@ -123,49 +117,39 @@ define amdgpu_kernel void @test_udivrem(i32 addrspace(1)* %out0, [8 x i32], i32 define amdgpu_kernel void @test_udivrem_v2(<2 x i32> addrspace(1)* %out, <2 x i32> %x, <2 x i32> %y) { ; R600-LABEL: test_udivrem_v2: ; R600: ; %bb.0: -; R600-NEXT: ALU 39, @4, KC0[CB0:0-32], KC1[] +; R600-NEXT: ALU 29, @4, KC0[CB0:0-32], KC1[] ; R600-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.XY, T1.X, 1 ; R600-NEXT: CF_END ; R600-NEXT: PAD ; R600-NEXT: ALU clause starting at 4: +; R600-NEXT: SUB_INT T0.W, 0.0, KC0[3].Z, ; R600-NEXT: RECIP_UINT * T0.X, KC0[3].Z, -; R600-NEXT: MULLO_INT * T0.Y, PS, KC0[3].Z, +; R600-NEXT: MULLO_INT * T0.Y, PV.W, PS, +; R600-NEXT: SUB_INT T0.W, 0.0, KC0[3].Y, ; R600-NEXT: RECIP_UINT * T0.Z, KC0[3].Y, -; R600-NEXT: MULLO_INT * T0.W, PS, KC0[3].Y, -; R600-NEXT: SUB_INT T1.W, 0.0, PS, -; R600-NEXT: MULHI * T1.X, T0.Z, KC0[3].Y, -; R600-NEXT: CNDE_INT T1.Z, PS, PV.W, T0.W, -; R600-NEXT: SUB_INT T0.W, 0.0, T0.Y, -; R600-NEXT: MULHI * T1.Y, T0.X, KC0[3].Z, -; R600-NEXT: CNDE_INT T0.W, PS, PV.W, T0.Y, -; R600-NEXT: MULHI * T0.Y, PV.Z, T0.Z, -; R600-NEXT: ADD_INT T1.Z, T0.Z, PS, -; R600-NEXT: SUB_INT T1.W, T0.Z, PS, -; R600-NEXT: MULHI * T0.Y, PV.W, T0.X, -; R600-NEXT: CNDE_INT T0.Z, T1.X, PV.Z, PV.W, -; R600-NEXT: ADD_INT T0.W, T0.X, PS, BS:VEC_120/SCL_212 -; R600-NEXT: SUB_INT * T1.W, T0.X, PS, -; R600-NEXT: CNDE_INT T0.W, T1.Y, PV.W, PS, -; R600-NEXT: MULHI * T0.X, PV.Z, KC0[2].W, -; R600-NEXT: MULHI * T0.Y, PV.W, KC0[3].X, +; R600-NEXT: MULLO_INT * T0.W, PV.W, PS, +; R600-NEXT: MULHI * T0.W, T0.Z, PS, +; R600-NEXT: ADD_INT T0.W, T0.Z, PS, +; R600-NEXT: MULHI * T0.Y, T0.X, T0.Y, +; R600-NEXT: ADD_INT T1.W, T0.X, PS, +; R600-NEXT: MULHI * T0.X, KC0[2].W, PV.W, +; R600-NEXT: MULHI * T0.Y, KC0[3].X, PV.W, ; R600-NEXT: MULLO_INT * T0.Y, PS, KC0[3].Z, ; R600-NEXT: SUB_INT T0.W, KC0[3].X, PS, ; R600-NEXT: MULLO_INT * T0.X, T0.X, KC0[3].Y, ; R600-NEXT: SUB_INT T0.Z, KC0[2].W, PS, -; R600-NEXT: 
SETGE_UINT * T1.W, PV.W, KC0[3].Z, -; R600-NEXT: SETGE_UINT * T2.W, KC0[3].X, T0.Y, -; R600-NEXT: AND_INT T0.Y, T1.W, PV.W, -; R600-NEXT: SUB_INT T1.Z, T0.W, KC0[3].Z, BS:VEC_120/SCL_212 -; R600-NEXT: SETGE_UINT * T1.W, T0.Z, KC0[3].Y, -; R600-NEXT: SETGE_UINT * T3.W, KC0[2].W, T0.X, -; R600-NEXT: AND_INT T1.Y, T1.W, PV.W, -; R600-NEXT: SUB_INT T2.Z, T0.Z, KC0[3].Y, -; R600-NEXT: CNDE_INT T1.W, T0.Y, T0.W, T1.Z, -; R600-NEXT: ADD_INT * T0.W, T0.W, KC0[3].Z, -; R600-NEXT: CNDE_INT T0.Y, T2.W, PS, PV.W, -; R600-NEXT: CNDE_INT T0.W, PV.Y, T0.Z, PV.Z, -; R600-NEXT: ADD_INT * T1.W, T0.Z, KC0[3].Y, -; R600-NEXT: CNDE_INT T0.X, T3.W, PS, PV.W, +; R600-NEXT: SETGE_UINT T1.W, PV.W, KC0[3].Z, +; R600-NEXT: SUB_INT * T2.W, PV.W, KC0[3].Z, +; R600-NEXT: CNDE_INT T1.Z, PV.W, T0.W, PS, +; R600-NEXT: SETGE_UINT T0.W, PV.Z, KC0[3].Y, +; R600-NEXT: SUB_INT * T1.W, PV.Z, KC0[3].Y, +; R600-NEXT: CNDE_INT T0.Z, PV.W, T0.Z, PS, +; R600-NEXT: SETGE_UINT T0.W, PV.Z, KC0[3].Z, +; R600-NEXT: SUB_INT * T1.W, PV.Z, KC0[3].Z, +; R600-NEXT: CNDE_INT T0.Y, PV.W, T1.Z, PS, +; R600-NEXT: SETGE_UINT T0.W, PV.Z, KC0[3].Y, +; R600-NEXT: SUB_INT * T1.W, PV.Z, KC0[3].Y, +; R600-NEXT: CNDE_INT T0.X, PV.W, T0.Z, PS, ; R600-NEXT: LSHR * T1.X, KC0[2].Y, literal.x, ; R600-NEXT: 2(2.802597e-45), 0(0.000000e+00) ; @@ -268,88 +252,68 @@ define amdgpu_kernel void @test_udivrem_v2(<2 x i32> addrspace(1)* %out, <2 x i3 define amdgpu_kernel void @test_udivrem_v4(<4 x i32> addrspace(1)* %out, <4 x i32> %x, <4 x i32> %y) { ; R600-LABEL: test_udivrem_v4: ; R600: ; %bb.0: -; R600-NEXT: ALU 77, @4, KC0[CB0:0-32], KC1[] -; R600-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.XYZW, T1.X, 1 +; R600-NEXT: ALU 57, @4, KC0[CB0:0-32], KC1[] +; R600-NEXT: MEM_RAT_CACHELESS STORE_RAW T3.XYZW, T0.X, 1 ; R600-NEXT: CF_END ; R600-NEXT: PAD ; R600-NEXT: ALU clause starting at 4: +; R600-NEXT: SUB_INT T0.W, 0.0, KC0[5].X, ; R600-NEXT: RECIP_UINT * T0.X, KC0[5].X, -; R600-NEXT: MULLO_INT * T0.Y, PS, KC0[5].X, -; R600-NEXT: SUB_INT T0.W, 0.0, 
PS, -; R600-NEXT: MULHI * T0.Z, T0.X, KC0[5].X, -; R600-NEXT: CNDE_INT * T0.W, PS, PV.W, T0.Y, -; R600-NEXT: MULHI * T0.Y, PV.W, T0.X, -; R600-NEXT: RECIP_UINT * T0.W, KC0[4].Y, -; R600-NEXT: MULLO_INT * T1.X, PS, KC0[4].Y, -; R600-NEXT: SUB_INT T1.W, 0.0, PS, -; R600-NEXT: MULHI * T1.Y, T0.W, KC0[4].Y, -; R600-NEXT: CNDE_INT T1.Z, PS, PV.W, T1.X, BS:VEC_021/SCL_122 -; R600-NEXT: ADD_INT T1.W, T0.X, T0.Y, -; R600-NEXT: SUB_INT * T2.W, T0.X, T0.Y, -; R600-NEXT: CNDE_INT T1.W, T0.Z, PV.W, PS, -; R600-NEXT: MULHI * T0.X, PV.Z, T0.W, -; R600-NEXT: MULHI * T0.Y, PV.W, KC0[4].X, +; R600-NEXT: MULLO_INT * T0.Y, PV.W, PS, +; R600-NEXT: SUB_INT T0.W, 0.0, KC0[4].Z, ; R600-NEXT: RECIP_UINT * T0.Z, KC0[4].Z, -; R600-NEXT: MULLO_INT * T1.X, PS, KC0[4].Z, -; R600-NEXT: SUB_INT T1.W, 0.0, PS, -; R600-NEXT: MULHI * T1.Z, T0.Z, KC0[4].Z, -; R600-NEXT: CNDE_INT T1.W, PS, PV.W, T1.X, +; R600-NEXT: MULLO_INT * T0.W, PV.W, PS, +; R600-NEXT: MULHI * T0.W, T0.Z, PS, +; R600-NEXT: ADD_INT T0.W, T0.Z, PS, +; R600-NEXT: MULHI * T0.Y, T0.X, T0.Y, +; R600-NEXT: ADD_INT T1.W, T0.X, PS, +; R600-NEXT: MULHI * T0.X, KC0[3].Z, PV.W, +; R600-NEXT: MULHI * T0.Y, KC0[4].X, PV.W, +; R600-NEXT: MULLO_INT * T0.Y, PS, KC0[5].X, +; R600-NEXT: RECIP_UINT * T0.Z, KC0[4].Y, +; R600-NEXT: SUB_INT T0.W, 0.0, KC0[4].W, ; R600-NEXT: RECIP_UINT * T1.X, KC0[4].W, -; R600-NEXT: MULHI * T1.W, PV.W, T0.Z, -; R600-NEXT: ADD_INT T2.Z, T0.Z, PS, -; R600-NEXT: SUB_INT T1.W, T0.Z, PS, -; R600-NEXT: MULLO_INT * T0.Z, T1.X, KC0[4].W, -; R600-NEXT: CNDE_INT T1.Z, T1.Z, PV.Z, PV.W, -; R600-NEXT: SUB_INT T1.W, 0.0, PS, -; R600-NEXT: MULHI * T2.X, T1.X, KC0[4].W, -; R600-NEXT: CNDE_INT T1.W, PS, PV.W, T0.Z, -; R600-NEXT: MULHI * T0.Z, PV.Z, KC0[3].Z, -; R600-NEXT: MULHI * T1.Z, PV.W, T1.X, -; R600-NEXT: ADD_INT T2.Z, T1.X, PS, -; R600-NEXT: SUB_INT T1.W, T1.X, PS, -; R600-NEXT: MULLO_INT * T0.Z, T0.Z, KC0[4].Z, -; R600-NEXT: CNDE_INT T1.Z, T2.X, PV.Z, PV.W, -; R600-NEXT: SUB_INT T1.W, KC0[3].Z, PS, -; R600-NEXT: MULLO_INT * 
T0.Y, T0.Y, KC0[5].X, -; R600-NEXT: SUB_INT T1.X, PV.W, KC0[4].Z, -; R600-NEXT: SUB_INT T2.Y, KC0[4].X, PS, -; R600-NEXT: ADD_INT T2.Z, T0.W, T0.X, -; R600-NEXT: SUB_INT * T0.W, T0.W, T0.X, -; R600-NEXT: MULHI * T0.X, T1.Z, KC0[3].W, -; R600-NEXT: CNDE_INT T1.Y, T1.Y, T2.Z, T0.W, -; R600-NEXT: SETGE_UINT T1.Z, T2.Y, KC0[5].X, BS:VEC_120/SCL_212 -; R600-NEXT: SETGE_UINT * T0.W, KC0[4].X, T0.Y, BS:VEC_021/SCL_122 -; R600-NEXT: MULLO_INT * T0.X, T0.X, KC0[4].W, -; R600-NEXT: ADD_INT T2.X, T2.Y, KC0[5].X, -; R600-NEXT: AND_INT T0.Y, T1.Z, T0.W, -; R600-NEXT: SUB_INT T1.Z, T2.Y, KC0[5].X, -; R600-NEXT: SUB_INT * T2.W, KC0[3].W, PS, -; R600-NEXT: MULHI * T1.Y, T1.Y, KC0[3].Y, -; R600-NEXT: ADD_INT T3.X, T2.W, KC0[4].W, -; R600-NEXT: CNDE_INT T0.Y, T0.Y, T2.Y, T1.Z, -; R600-NEXT: SETGE_UINT T1.Z, T2.W, KC0[4].W, -; R600-NEXT: SETGE_UINT * T3.W, KC0[3].W, T0.X, -; R600-NEXT: MULLO_INT * T0.X, T1.Y, KC0[4].Y, -; R600-NEXT: SUB_INT T4.X, KC0[3].Y, PS, -; R600-NEXT: AND_INT T1.Y, T1.Z, T3.W, -; R600-NEXT: SUB_INT T1.Z, T2.W, KC0[4].W, -; R600-NEXT: SETGE_UINT * T4.W, T1.W, KC0[4].Z, BS:VEC_201 -; R600-NEXT: SETGE_UINT * T5.W, KC0[3].Z, T0.Z, -; R600-NEXT: AND_INT T5.X, T4.W, PV.W, -; R600-NEXT: CNDE_INT T1.Y, T1.Y, T2.W, T1.Z, BS:VEC_210 -; R600-NEXT: SETGE_UINT T0.Z, T4.X, KC0[4].Y, -; R600-NEXT: SETGE_UINT T2.W, KC0[3].Y, T0.X, BS:VEC_021/SCL_122 -; R600-NEXT: CNDE_INT * T0.W, T0.W, T2.X, T0.Y, -; R600-NEXT: AND_INT T0.X, PV.Z, PV.W, -; R600-NEXT: SUB_INT T2.Y, T4.X, KC0[4].Y, -; R600-NEXT: CNDE_INT T0.Z, T3.W, T3.X, PV.Y, -; R600-NEXT: CNDE_INT T3.W, PV.X, T1.W, T1.X, -; R600-NEXT: ADD_INT * T1.W, T1.W, KC0[4].Z, -; R600-NEXT: CNDE_INT T0.Y, T5.W, PS, PV.W, -; R600-NEXT: CNDE_INT T1.W, PV.X, T4.X, PV.Y, -; R600-NEXT: ADD_INT * T3.W, T4.X, KC0[4].Y, -; R600-NEXT: CNDE_INT T0.X, T2.W, PS, PV.W, -; R600-NEXT: LSHR * T1.X, KC0[2].Y, literal.x, +; R600-NEXT: MULLO_INT * T0.W, PV.W, PS, +; R600-NEXT: SUB_INT T1.W, 0.0, KC0[4].Y, +; R600-NEXT: MULHI * T0.W, T1.X, PS, +; 
R600-NEXT: ADD_INT T0.W, T1.X, PS, +; R600-NEXT: MULLO_INT * T1.X, PV.W, T0.Z, +; R600-NEXT: MULHI * T0.W, KC0[3].W, PV.W, +; R600-NEXT: MULLO_INT * T0.W, PS, KC0[4].W, +; R600-NEXT: SUB_INT T0.W, KC0[3].W, PS, +; R600-NEXT: MULHI * T1.X, T0.Z, T1.X, +; R600-NEXT: SETGE_UINT T1.Y, PV.W, KC0[4].W, +; R600-NEXT: ADD_INT T0.Z, T0.Z, PS, +; R600-NEXT: SUB_INT T1.W, KC0[4].X, T0.Y, +; R600-NEXT: MULLO_INT * T0.X, T0.X, KC0[4].Z, +; R600-NEXT: SUB_INT T0.Y, KC0[3].Z, PS, +; R600-NEXT: SETGE_UINT T1.Z, PV.W, KC0[5].X, +; R600-NEXT: SUB_INT * T2.W, PV.W, KC0[5].X, +; R600-NEXT: MULHI * T0.X, KC0[3].Y, T0.Z, +; R600-NEXT: SUB_INT T1.X, T0.W, KC0[4].W, +; R600-NEXT: CNDE_INT T2.Y, T1.Z, T1.W, T2.W, +; R600-NEXT: SETGE_UINT T0.Z, T0.Y, KC0[4].Z, +; R600-NEXT: SUB_INT T1.W, T0.Y, KC0[4].Z, +; R600-NEXT: MULLO_INT * T0.X, PS, KC0[4].Y, +; R600-NEXT: CNDE_INT T2.X, PV.Z, T0.Y, PV.W, +; R600-NEXT: SETGE_UINT T0.Y, PV.Y, KC0[5].X, +; R600-NEXT: SUB_INT T0.Z, PV.Y, KC0[5].X, +; R600-NEXT: SUB_INT T1.W, KC0[3].Y, PS, +; R600-NEXT: CNDE_INT * T0.W, T1.Y, T0.W, PV.X, +; R600-NEXT: SETGE_UINT T0.X, PS, KC0[4].W, +; R600-NEXT: SUB_INT T1.Y, PS, KC0[4].W, +; R600-NEXT: SETGE_UINT T1.Z, PV.W, KC0[4].Y, +; R600-NEXT: SUB_INT T2.W, PV.W, KC0[4].Y, +; R600-NEXT: CNDE_INT * T3.W, PV.Y, T2.Y, PV.Z, +; R600-NEXT: CNDE_INT T0.Y, PV.Z, T1.W, PV.W, +; R600-NEXT: CNDE_INT T3.Z, PV.X, T0.W, PV.Y, BS:VEC_021/SCL_122 +; R600-NEXT: SETGE_UINT T0.W, T2.X, KC0[4].Z, +; R600-NEXT: SUB_INT * T1.W, T2.X, KC0[4].Z, +; R600-NEXT: CNDE_INT T3.Y, PV.W, T2.X, PS, +; R600-NEXT: SETGE_UINT T0.W, PV.Y, KC0[4].Y, +; R600-NEXT: SUB_INT * T1.W, PV.Y, KC0[4].Y, +; R600-NEXT: CNDE_INT T3.X, PV.W, T0.Y, PS, +; R600-NEXT: LSHR * T0.X, KC0[2].Y, literal.x, ; R600-NEXT: 2(2.802597e-45), 0(0.000000e+00) ; ; GFX6-LABEL: test_udivrem_v4: From llvm-commits at lists.llvm.org Wed Jul 8 11:15:36 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:15:36 +0000 (UTC) 
Subject: [PATCH] D83381: [AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 In-Reply-To: References: Message-ID: <37692b6535745f8f925c3a3cce486b12@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf4bd01c1918e: [AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 (authored by foad). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83381/new/ https://reviews.llvm.org/D83381 Files: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll llvm/test/CodeGen/AMDGPU/bypass-div.ll llvm/test/CodeGen/AMDGPU/idiv-licm.ll llvm/test/CodeGen/AMDGPU/sdiv.ll llvm/test/CodeGen/AMDGPU/udivrem.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83381.276494.patch Type: text/x-patch Size: 475691 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:15:32 2020 From: llvm-commits at lists.llvm.org (Jay Foad via llvm-commits) Date: Wed, 08 Jul 2020 11:15:32 -0700 (PDT) Subject: [llvm] a8816eb - [AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl Message-ID: <5f060d44.1c69fb81.dc73a.0bb7@mx.google.com> Author: Jay Foad Date: 2020-07-08T19:14:49+01:00 New Revision: a8816ebee01c1f923d928617bb4e55dcc1d7d6da URL: https://github.com/llvm/llvm-project/commit/a8816ebee01c1f923d928617bb4e55dcc1d7d6da DIFF: https://github.com/llvm/llvm-project/commit/a8816ebee01c1f923d928617bb4e55dcc1d7d6da.diff LOG: [AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. 
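For readers following the log message, the 32-bit unsigned algorithm that the new legalizeUDIV_UREM32Impl body builds can be sketched on the host as below. This is only an illustration under stated assumptions, not the code in the commit: the names DivRem32, umulh32, bitsToFloat and udivrem32Sketch are hypothetical, and the correctly rounded host division 1.0f / y stands in for G_AMDGPU_RCP_IFLAG, which is an approximate hardware reciprocal. The structure follows the steps named in the diff: initial estimate of inv(y), one round of UNR, a quotient/remainder estimate, then two refinements.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical result type for this sketch (not part of LLVM).
struct DivRem32 {
  uint32_t quot;
  uint32_t rem;
};

// High 32 bits of the 64-bit product, i.e. what G_UMULH computes for s32.
static uint32_t umulh32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Reinterpret a 32-bit pattern as a float, like BitsToFloat in the diff.
static float bitsToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Sketch of the sequence emitted by the new code. Assumption: the host's
// 1.0f / y replaces the hardware reciprocal, so intermediate values of z
// may differ from what a GPU would produce.
DivRem32 udivrem32Sketch(uint32_t x, uint32_t y) {
  assert(y != 0 && "division by zero is not handled in this sketch");

  // Initial estimate of inv(y), scaled by a constant just below 2^32.
  float scale = bitsToFloat(0x4f7ffffeu);
  uint32_t z = static_cast<uint32_t>((1.0f / static_cast<float>(y)) * scale);

  // One round of UNR: z += umulh(z, -y * z), with wrapping 32-bit arithmetic.
  uint32_t negY = 0u - y;
  uint32_t negYZ = negY * z;
  z += umulh32(z, negYZ);

  // Quotient/remainder estimate.
  uint32_t q = umulh32(x, z);
  uint32_t r = x - q * y;

  // First quotient/remainder refinement.
  if (r >= y) {
    q += 1;
    r -= y;
  }

  // Second quotient/remainder refinement.
  if (r >= y) {
    q += 1;
    r -= y;
  }

  return {q, r};
}
```

For example, udivrem32Sketch(100, 7) yields quotient 14 and remainder 2. Unlike this sketch, the committed code selects either the quotient or the remainder for DstReg depending on IsDiv, rather than returning both.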
Differential Revision: https://reviews.llvm.org/D83383 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sdiv.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-srem.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-udiv.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-urem.mir llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index bcad30a117e6..0802f2a2d08a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2523,104 +2523,46 @@ bool AMDGPULegalizerInfo::legalizeFDIV(MachineInstr &MI, return false; } -static Register buildDivRCP(MachineIRBuilder &B, Register Src) { - const LLT S32 = LLT::scalar(32); - - auto Cvt0 = B.buildUITOFP(S32, Src); - auto RcpIFlag = B.buildInstr(AMDGPU::G_AMDGPU_RCP_IFLAG, {S32}, {Cvt0}); - auto FPUIntMaxPlus1 = B.buildFConstant(S32, BitsToFloat(0x4f800000)); - auto Mul = B.buildFMul(S32, RcpIFlag, FPUIntMaxPlus1); - return B.buildFPTOUI(S32, Mul).getReg(0); -} - void AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl(MachineIRBuilder &B, Register DstReg, - Register Num, - Register Den, + Register X, + Register Y, bool IsDiv) const { const LLT S1 = LLT::scalar(1); const LLT S32 = LLT::scalar(32); - // RCP = URECIP(Den) = 2^32 / Den + e - // e is rounding error. 
- auto RCP = buildDivRCP(B, Den); - - // RCP_LO = mul(RCP, Den) - auto RCP_LO = B.buildMul(S32, RCP, Den); - - // RCP_HI = mulhu (RCP, Den) */ - auto RCP_HI = B.buildUMulH(S32, RCP, Den); - - // NEG_RCP_LO = -RCP_LO - auto Zero = B.buildConstant(S32, 0); - auto NEG_RCP_LO = B.buildSub(S32, Zero, RCP_LO); - - // ABS_RCP_LO = (RCP_HI == 0 ? NEG_RCP_LO : RCP_LO) - auto CmpRcpHiZero = B.buildICmp(CmpInst::ICMP_EQ, S1, RCP_HI, Zero); - auto ABS_RCP_LO = B.buildSelect(S32, CmpRcpHiZero, NEG_RCP_LO, RCP_LO); - - // Calculate the rounding error from the URECIP instruction - // E = mulhu(ABS_RCP_LO, RCP) - auto E = B.buildUMulH(S32, ABS_RCP_LO, RCP); - - // RCP_A_E = RCP + E - auto RCP_A_E = B.buildAdd(S32, RCP, E); - - // RCP_S_E = RCP - E - auto RCP_S_E = B.buildSub(S32, RCP, E); - - // Tmp0 = (RCP_HI == 0 ? RCP_A_E : RCP_SUB_E) - auto Tmp0 = B.buildSelect(S32, CmpRcpHiZero, RCP_A_E, RCP_S_E); - - // Quotient = mulhu(Tmp0, Num)stmp - auto Quotient = B.buildUMulH(S32, Tmp0, Num); - - // Num_S_Remainder = Quotient * Den - auto Num_S_Remainder = B.buildMul(S32, Quotient, Den); + // See AMDGPUCodeGenPrepare::expandDivRem32 for a description of the + // algorithm used here. - // Remainder = Num - Num_S_Remainder - auto Remainder = B.buildSub(S32, Num, Num_S_Remainder); + // Initial estimate of inv(y). + auto FloatY = B.buildUITOFP(S32, Y); + auto RcpIFlag = B.buildInstr(AMDGPU::G_AMDGPU_RCP_IFLAG, {S32}, {FloatY}); + auto Scale = B.buildFConstant(S32, BitsToFloat(0x4f7ffffe)); + auto ScaledY = B.buildFMul(S32, RcpIFlag, Scale); + auto Z = B.buildFPTOUI(S32, ScaledY); - // Remainder_GE_Den = Remainder >= Den - auto Remainder_GE_Den = B.buildICmp(CmpInst::ICMP_UGE, S1, Remainder, Den); + // One round of UNR. 
+ auto NegY = B.buildSub(S32, B.buildConstant(S32, 0), Y); + auto NegYZ = B.buildMul(S32, NegY, Z); + Z = B.buildAdd(S32, Z, B.buildUMulH(S32, Z, NegYZ)); - // Remainder_GE_Zero = Num >= Num_S_Remainder; - auto Remainder_GE_Zero = B.buildICmp(CmpInst::ICMP_UGE, S1, - Num, Num_S_Remainder); + // Quotient/remainder estimate. + auto Q = B.buildUMulH(S32, X, Z); + auto R = B.buildSub(S32, X, B.buildMul(S32, Q, Y)); - // Tmp1 = Remainder_GE_Den & Remainder_GE_Zero - auto Tmp1 = B.buildAnd(S1, Remainder_GE_Den, Remainder_GE_Zero); - - // Calculate Division result: - - // Quotient_A_One = Quotient + 1 + // First quotient/remainder refinement. auto One = B.buildConstant(S32, 1); - auto Quotient_A_One = B.buildAdd(S32, Quotient, One); - - // Quotient_S_One = Quotient - 1 - auto Quotient_S_One = B.buildSub(S32, Quotient, One); - - // Div = (Tmp1 ? Quotient_A_One : Quotient) - auto Div = B.buildSelect(S32, Tmp1, Quotient_A_One, Quotient); - - // Div = (Remainder_GE_Zero ? Div : Quotient_S_One) - if (IsDiv) { - B.buildSelect(DstReg, Remainder_GE_Zero, Div, Quotient_S_One); - } else { - Div = B.buildSelect(S32, Remainder_GE_Zero, Div, Quotient_S_One); - - // Calculate Rem result: - auto Remainder_S_Den = B.buildSub(S32, Remainder, Den); - - // Remainder_A_Den = Remainder + Den - auto Remainder_A_Den = B.buildAdd(S32, Remainder, Den); - - // Rem = (Tmp1 ? Remainder_S_Den : Remainder) - auto Rem = B.buildSelect(S32, Tmp1, Remainder_S_Den, Remainder); + auto Cond = B.buildICmp(CmpInst::ICMP_UGE, S1, R, Y); + if (IsDiv) + Q = B.buildSelect(S32, Cond, B.buildAdd(S32, Q, One), Q); + R = B.buildSelect(S32, Cond, B.buildSub(S32, R, Y), R); - // Rem = (Remainder_GE_Zero ? Rem : Remainder_A_Den) - B.buildSelect(DstReg, Remainder_GE_Zero, Rem, Remainder_A_Den); - } + // Second quotient/remainder refinement. 
+ Cond = B.buildICmp(CmpInst::ICMP_UGE, S1, R, Y); + if (IsDiv) + B.buildSelect(DstReg, Cond, B.buildAdd(S32, Q, One), Q); + else + B.buildSelect(DstReg, Cond, B.buildSub(S32, R, Y), R); } bool AMDGPULegalizerInfo::legalizeUDIV_UREM32(MachineInstr &MI, diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sdiv.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sdiv.mir index 1e821b69b1ab..8290a38bcb89 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sdiv.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sdiv.mir @@ -21,34 +21,30 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: 
[[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX6: $vgpr0 = COPY [[SUB4]](s32) + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX6: $vgpr0 = COPY [[SUB3]](s32) ; GFX8-LABEL: name: test_sdiv_s32 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX8: [[COPY1:%[0-9]+]]:_(s32) = 
COPY $vgpr1 @@ -61,34 +57,30 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: 
[[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX8: $vgpr0 = COPY [[SUB4]](s32) + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX8: $vgpr0 = COPY [[SUB3]](s32) ; GFX9-LABEL: name: test_sdiv_s32 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 @@ -101,34 +93,30 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; 
GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; 
GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX9: $vgpr0 = COPY [[SUB4]](s32) + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX9: $vgpr0 = COPY [[SUB3]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 %2:_(s32) = G_SDIV %0, %1 @@ -155,67 +143,59 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] 
- ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX6: 
[[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] ; GFX6: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[UV1]], [[C]](s32) ; GFX6: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[UV3]], [[C]](s32) - ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] - ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] - ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] + ; GFX6: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] + ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR2]] + ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD6]], [[ASHR3]] ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR5]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX6: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR5]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR5]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C2]] - ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR4]] - ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR5]] - ; GFX6: 
[[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR5]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR4]](s32), [[MUL3]] - ; GFX6: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C3]] - ; GFX6: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C3]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD7]], [[UMULH5]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB8]] + ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR5]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR4]], [[ADD7]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR5]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[XOR5]] + ; GFX6: [[ADD8:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C3]] + ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD8]], [[UMULH3]] + ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[XOR5]] + ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[XOR5]] + ; GFX6: [[ADD9:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C3]] + ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD9]], [[SELECT3]] ; GFX6: [[XOR6:%[0-9]+]]:_(s32) = G_XOR [[ASHR2]], [[ASHR3]] - ; GFX6: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[XOR6]] - ; GFX6: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] - ; GFX6: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB4]](s32), [[SUB9]](s32) + ; GFX6: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT5]], [[XOR6]] + ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB 
[[XOR7]], [[XOR6]] + ; GFX6: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB3]](s32), [[SUB7]](s32) ; GFX6: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) ; GFX8-LABEL: name: test_sdiv_v2s32 ; GFX8: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1 @@ -231,67 +211,59 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) 
= G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] ; GFX8: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[UV1]], [[C]](s32) ; GFX8: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[UV3]], [[C]](s32) - ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] - ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] - ; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] + ; GFX8: [[ADD6:%[0-9]+]]:_(s32) = G_ADD 
[[UV3]], [[ASHR3]] + ; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR2]] + ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD6]], [[ASHR3]] ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR5]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR5]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR5]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C2]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR4]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR5]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR5]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR4]](s32), [[MUL3]] - ; GFX8: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C3]] - ; GFX8: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C3]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD7]], [[UMULH5]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB8]] + ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR5]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = 
G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR4]], [[ADD7]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR5]] + ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[XOR5]] + ; GFX8: [[ADD8:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C3]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD8]], [[UMULH3]] + ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[XOR5]] + ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[XOR5]] + ; GFX8: [[ADD9:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C3]] + ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD9]], [[SELECT3]] ; GFX8: [[XOR6:%[0-9]+]]:_(s32) = G_XOR [[ASHR2]], [[ASHR3]] - ; GFX8: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[XOR6]] - ; GFX8: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] - ; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB4]](s32), [[SUB9]](s32) + ; GFX8: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT5]], [[XOR6]] + ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] + ; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB3]](s32), [[SUB7]](s32) ; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) ; GFX9-LABEL: name: test_sdiv_v2s32 ; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1 @@ -307,67 +279,59 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: 
[[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: 
[[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] ; GFX9: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[UV1]], [[C]](s32) ; GFX9: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[UV3]], [[C]](s32) - ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] - ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] - ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX9: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] + ; GFX9: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] + ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR2]] + ; GFX9: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD6]], [[ASHR3]] ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR5]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR5]] - ; 
GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR5]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C2]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR4]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR5]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR5]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR4]](s32), [[MUL3]] - ; GFX9: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C3]] - ; GFX9: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C3]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD7]], [[UMULH5]] - ; GFX9: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB8]] + ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR5]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX9: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR4]], [[ADD7]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR5]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[XOR5]] + ; GFX9: [[ADD8:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C3]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD8]], [[UMULH3]] + ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], 
[[XOR5]] + ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[XOR5]] + ; GFX9: [[ADD9:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C3]] + ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD9]], [[SELECT3]] ; GFX9: [[XOR6:%[0-9]+]]:_(s32) = G_XOR [[ASHR2]], [[ASHR3]] - ; GFX9: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[XOR6]] - ; GFX9: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] - ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB4]](s32), [[SUB9]](s32) + ; GFX9: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT5]], [[XOR6]] + ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] + ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB3]](s32), [[SUB7]](s32) ; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) %0:_(<2 x s32>) = COPY $vgpr0_vgpr1 %1:_(<2 x s32>) = COPY $vgpr2_vgpr3 @@ -1981,34 +1945,30 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: 
[[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX6: 
[[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) ; GFX8-LABEL: name: test_sdiv_s16 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2026,34 +1986,30 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; 
GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) ; GFX9-LABEL: name: test_sdiv_s16 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2071,34 +2027,30 @@ body: | ; 
GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX9: 
[[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 @@ -2136,77 +2088,69 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; 
GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C4]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C4]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C4]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), 
[[ADD3]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C4]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) ; GFX6: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16 ; GFX6: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) ; GFX6: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16 ; GFX6: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG2]], [[C1]](s32) ; GFX6: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG3]], [[C1]](s32) - ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] - ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] - ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] + ; GFX6: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] + ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR2]] + ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD6]], [[ASHR3]] ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR5]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C5]] ; GFX6: 
[[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR5]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR5]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR4]] - ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR5]] - ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR5]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR4]](s32), [[MUL3]] - ; GFX6: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C4]] - ; GFX6: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C4]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD7]], [[UMULH5]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB8]] + ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR5]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR4]], [[ADD7]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR5]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[XOR5]] + ; GFX6: [[ADD8:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C4]] + ; GFX6: 
[[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD8]], [[UMULH3]] + ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[XOR5]] + ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[XOR5]] + ; GFX6: [[ADD9:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C4]] + ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD9]], [[SELECT3]] ; GFX6: [[XOR6:%[0-9]+]]:_(s32) = G_XOR [[ASHR2]], [[ASHR3]] - ; GFX6: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[XOR6]] - ; GFX6: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] + ; GFX6: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT5]], [[XOR6]] + ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] ; GFX6: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535 - ; GFX6: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX6: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C6]] - ; GFX6: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB9]](s32) - ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C6]] - ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND3]], [[C]](s32) - ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND2]], [[SHL]] + ; GFX6: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) + ; GFX6: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C6]] + ; GFX6: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB7]](s32) + ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C6]] + ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32) + ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]] ; GFX6: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX6: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; GFX8-LABEL: name: test_sdiv_v2s16 @@ -2230,77 +2174,69 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C2:%[0-9]+]]:_(s32) = 
G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C4]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C4]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), 
[[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C4]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C4]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) ; GFX8: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16 ; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) ; GFX8: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16 ; GFX8: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG2]], [[C1]](s32) ; GFX8: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG3]], [[C1]](s32) - ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] - ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] - ; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] + ; GFX8: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] + ; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR2]] + ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD6]], [[ASHR3]] ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR5]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: 
[[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C5]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR5]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR5]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR4]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR5]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR5]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR4]](s32), [[MUL3]] - ; GFX8: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C4]] - ; GFX8: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C4]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD7]], [[UMULH5]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB8]] + ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR5]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR4]], [[ADD7]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR5]] + ; 
GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[XOR5]] + ; GFX8: [[ADD8:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C4]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD8]], [[UMULH3]] + ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[XOR5]] + ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[XOR5]] + ; GFX8: [[ADD9:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C4]] + ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD9]], [[SELECT3]] ; GFX8: [[XOR6:%[0-9]+]]:_(s32) = G_XOR [[ASHR2]], [[ASHR3]] - ; GFX8: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[XOR6]] - ; GFX8: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] + ; GFX8: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT5]], [[XOR6]] + ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] ; GFX8: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535 - ; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C6]] - ; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB9]](s32) - ; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C6]] - ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND3]], [[C]](s32) - ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND2]], [[SHL]] + ; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) + ; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C6]] + ; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB7]](s32) + ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C6]] + ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32) + ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]] ; GFX8: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX8: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; GFX9-LABEL: name: test_sdiv_v2s16 @@ -2324,72 +2260,64 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = 
G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], 
[[C4]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C4]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C4]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C4]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) ; GFX9: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16 ; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) ; GFX9: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16 ; GFX9: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG2]], [[C1]](s32) ; GFX9: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG3]], [[C1]](s32) - ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] - ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] - ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX9: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] + ; GFX9: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] + ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR2]] + ; GFX9: 
[[XOR5:%[0-9]+]]:_(s32) = G_XOR [[ADD6]], [[ASHR3]] ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR5]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C5]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR5]] - ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR5]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR4]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR5]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR5]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR4]](s32), [[MUL3]] - ; GFX9: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C4]] - ; GFX9: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C4]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD7]], [[UMULH5]] - ; GFX9: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB8]] + ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR5]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], 
[[MUL2]] + ; GFX9: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR4]], [[ADD7]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR5]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[XOR4]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[XOR5]] + ; GFX9: [[ADD8:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C4]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD8]], [[UMULH3]] + ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[XOR5]] + ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[XOR5]] + ; GFX9: [[ADD9:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C4]] + ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD9]], [[SELECT3]] ; GFX9: [[XOR6:%[0-9]+]]:_(s32) = G_XOR [[ASHR2]], [[ASHR3]] - ; GFX9: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[XOR6]] - ; GFX9: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] - ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB9]](s32) + ; GFX9: [[XOR7:%[0-9]+]]:_(s32) = G_XOR [[SELECT5]], [[XOR6]] + ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR7]], [[XOR6]] + ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) + ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB7]](s32) ; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY6]](s32), [[COPY7]](s32) ; GFX9: $vgpr0 = COPY [[BUILD_VECTOR_TRUNC]](<2 x s16>) %0:_(<2 x s16>) = COPY $vgpr0 @@ -2420,34 +2348,30 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = 
G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP 
intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) ; GFX8-LABEL: name: test_sdiv_s7 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2465,34 +2389,30 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: 
[[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: 
[[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) ; GFX9-LABEL: name: test_sdiv_s7 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2510,34 +2430,30 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: 
[[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY 
[[SUB3]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 @@ -2570,34 +2486,30 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; 
GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) ; GFX8-LABEL: name: test_sdiv_s17 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2615,34 +2527,30 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = 
G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP 
intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) ; GFX9-LABEL: name: test_sdiv_s17 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2660,34 +2568,30 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: 
[[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD3]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD3]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[XOR1]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: 
[[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD4]], [[SELECT]] ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[ASHR]], [[ASHR1]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[XOR2]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[SELECT2]], [[XOR2]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[XOR2]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB3]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-srem.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-srem.mir index 653d38f88cd3..74c6ce7db0e0 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-srem.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-srem.mir @@ -21,30 +21,24 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT 
[[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX6: $vgpr0 = COPY [[SUB4]](s32) ; GFX8-LABEL: name: test_srem_s32 @@ -59,30 +53,24 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = 
G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; 
GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX8: $vgpr0 = COPY [[SUB4]](s32) ; GFX9-LABEL: name: test_srem_s32 @@ -97,30 +85,24 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: 
[[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX9: $vgpr0 = COPY [[SUB4]](s32) %0:_(s32) = COPY $vgpr0 @@ -149,62 +131,50 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; 
GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = 
G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX6: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[UV1]], [[C]](s32) ; GFX6: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[UV3]], [[C]](s32) - ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] - ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] - ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD3]], [[ASHR2]] + ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR3]] ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR4]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C3]] ; GFX6: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR4]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH 
[[FPTOUI1]], [[XOR4]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C2]] - ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR3]] - ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR4]] - ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR4]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR3]](s32), [[MUL3]] - ; GFX6: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SUB7]], [[XOR4]] - ; GFX6: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[SUB7]], [[XOR4]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB8]], [[SUB7]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD7]] - ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[ASHR2]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR4]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB5]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR3]], [[ADD5]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR4]] + ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[XOR4]] + ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[XOR4]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB7]], [[SUB6]] + ; GFX6: 
[[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[XOR4]] + ; GFX6: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[XOR4]] + ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB8]], [[SELECT2]] + ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR2]] ; GFX6: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR5]], [[ASHR2]] ; GFX6: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB4]](s32), [[SUB9]](s32) ; GFX6: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) @@ -222,62 +192,50 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), 
[[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX8: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[UV1]], [[C]](s32) ; GFX8: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[UV3]], [[C]](s32) - ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] - ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD3]], [[ASHR2]] + 
; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR3]] ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR4]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C3]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR4]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR4]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C2]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR3]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR4]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR4]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR3]](s32), [[MUL3]] - ; GFX8: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SUB7]], [[XOR4]] - ; GFX8: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[SUB7]], [[XOR4]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB8]], [[SUB7]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD7]] - ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[ASHR2]] + ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR4]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB5]], 
[[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR3]], [[ADD5]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR4]] + ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[XOR4]] + ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[XOR4]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB7]], [[SUB6]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[XOR4]] + ; GFX8: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[XOR4]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB8]], [[SELECT2]] + ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR2]] ; GFX8: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR5]], [[ASHR2]] ; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB4]](s32), [[SUB9]](s32) ; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) @@ -295,62 +253,50 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = 
G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB 
[[XOR2]], [[ASHR]] ; GFX9: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[UV1]], [[C]](s32) ; GFX9: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[UV3]], [[C]](s32) - ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] - ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[ASHR2]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UV3]], [[ASHR3]] + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD3]], [[ASHR2]] + ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR3]] ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR4]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C3]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR4]] - ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR4]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C2]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR3]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR4]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR4]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = 
G_ICMP intpred(uge), [[XOR3]](s32), [[MUL3]] - ; GFX9: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SUB7]], [[XOR4]] - ; GFX9: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[SUB7]], [[XOR4]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB8]], [[SUB7]] - ; GFX9: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD7]] - ; GFX9: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[ASHR2]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR4]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB5]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR3]], [[ADD5]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR4]] + ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[XOR4]] + ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[XOR4]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB7]], [[SUB6]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[XOR4]] + ; GFX9: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[XOR4]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB8]], [[SELECT2]] + ; GFX9: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR2]] ; GFX9: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR5]], [[ASHR2]] ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SUB4]](s32), [[SUB9]](s32) ; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) @@ -1897,35 +1843,29 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 
0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB 
[[XOR]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535 ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]] - ; GFX6: $vgpr0 = COPY [[AND1]](s32) + ; GFX6: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]] + ; GFX6: $vgpr0 = COPY [[AND]](s32) ; GFX8-LABEL: name: test_srem_s16 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 @@ -1942,35 +1882,29 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: 
[[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX8: 
[[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535 ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]] - ; GFX8: $vgpr0 = COPY [[AND1]](s32) + ; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]] + ; GFX8: $vgpr0 = COPY [[AND]](s32) ; GFX9-LABEL: name: test_srem_s16 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 @@ -1987,35 +1921,29 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], 
[[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535 ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]] - ; GFX9: $vgpr0 = COPY [[AND1]](s32) + ; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]] + ; GFX9: $vgpr0 = COPY [[AND]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 %2:_(s16) = G_TRUNC %0 @@ -2052,30 +1980,24 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: 
[[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], 
[[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) ; GFX6: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16 @@ -2083,43 +2005,37 @@ body: | ; GFX6: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16 ; GFX6: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG2]], [[C1]](s32) ; GFX6: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG3]], [[C1]](s32) - ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] - ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] - ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] + ; GFX6: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD3]], [[ASHR2]] + ; GFX6: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR3]] ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR4]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX6: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI 
[[FMUL1]](s32) - ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR4]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR4]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR3]] - ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR4]] - ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR4]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR3]](s32), [[MUL3]] - ; GFX6: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SUB7]], [[XOR4]] - ; GFX6: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[SUB7]], [[XOR4]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB8]], [[SUB7]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD7]] - ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[ASHR2]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR4]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB5]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR3]], [[ADD5]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR4]] + ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[XOR4]] + ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB 
[[SUB6]], [[XOR4]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB7]], [[SUB6]] + ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[XOR4]] + ; GFX6: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[XOR4]] + ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB8]], [[SELECT2]] + ; GFX6: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR2]] ; GFX6: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR5]], [[ASHR2]] ; GFX6: [[C5:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535 ; GFX6: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX6: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C5]] + ; GFX6: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C5]] ; GFX6: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB9]](s32) - ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C5]] - ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND3]], [[C]](s32) - ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND2]], [[SHL]] + ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C5]] + ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32) + ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]] ; GFX6: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX6: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; GFX8-LABEL: name: test_srem_v2s16 @@ -2143,30 +2059,24 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX8: 
[[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; 
GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) ; GFX8: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16 @@ -2174,43 +2084,37 @@ body: | ; GFX8: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16 ; GFX8: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG2]], [[C1]](s32) ; GFX8: [[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG3]], [[C1]](s32) - ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] - ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] - ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] + ; GFX8: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD3]], [[ASHR2]] + ; GFX8: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR3]] ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR4]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR4]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR4]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] 
- ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR3]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR4]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR4]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR3]](s32), [[MUL3]] - ; GFX8: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SUB7]], [[XOR4]] - ; GFX8: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[SUB7]], [[XOR4]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB8]], [[SUB7]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD7]] - ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[ASHR2]] + ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR4]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB5]], [[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR3]], [[ADD5]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR4]] + ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[XOR4]] + ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[XOR4]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB7]], [[SUB6]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[XOR4]] + ; GFX8: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[XOR4]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB8]], [[SELECT2]] + ; GFX8: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR2]] ; GFX8: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR5]], [[ASHR2]] ; GFX8: [[C5:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535 ; 
GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) - ; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C5]] + ; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C5]] ; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB9]](s32) - ; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C5]] - ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND3]], [[C]](s32) - ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND2]], [[SHL]] + ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C5]] + ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32) + ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]] ; GFX8: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX8: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; GFX9-LABEL: name: test_srem_v2s16 @@ -2234,30 +2138,24 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: 
[[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) ; GFX9: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16 @@ -2265,35 +2163,29 @@ body: | ; GFX9: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16 ; GFX9: [[ASHR2:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG2]], [[C1]](s32) ; GFX9: 
[[ASHR3:%[0-9]+]]:_(s32) = G_ASHR [[SEXT_INREG3]], [[C1]](s32) - ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] - ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] - ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR2]] - ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD5]], [[ASHR3]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG2]], [[ASHR2]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG3]], [[ASHR3]] + ; GFX9: [[XOR3:%[0-9]+]]:_(s32) = G_XOR [[ADD3]], [[ASHR2]] + ; GFX9: [[XOR4:%[0-9]+]]:_(s32) = G_XOR [[ADD4]], [[ASHR3]] ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[XOR4]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[XOR4]] - ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[XOR4]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB5]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD6:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD6]], [[SUB6]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[XOR3]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[XOR4]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB7]](s32), [[XOR4]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR3]](s32), [[MUL3]] - 
; GFX9: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SUB7]], [[XOR4]] - ; GFX9: [[ADD7:%[0-9]+]]:_(s32) = G_ADD [[SUB7]], [[XOR4]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB8]], [[SUB7]] - ; GFX9: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD7]] - ; GFX9: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT7]], [[ASHR2]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[XOR4]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB5]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[XOR3]], [[ADD5]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[XOR4]] + ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[XOR3]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[XOR4]] + ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[XOR4]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB7]], [[SUB6]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[XOR4]] + ; GFX9: [[SUB8:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[XOR4]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB8]], [[SELECT2]] + ; GFX9: [[XOR5:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR2]] ; GFX9: [[SUB9:%[0-9]+]]:_(s32) = G_SUB [[XOR5]], [[ASHR2]] ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SUB9]](s32) @@ -2327,30 +2219,24 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; 
GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), 
[[XOR1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) @@ -2370,30 +2256,24 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP 
intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) @@ -2413,30 +2293,24 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 
0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB 
[[XOR]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) @@ -2471,30 +2345,24 @@ body: | ; GFX6: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX6: 
[[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) @@ -2514,30 +2382,24 @@ body: | ; GFX8: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = 
G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[XOR]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX8: 
[[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) @@ -2557,30 +2419,24 @@ body: | ; GFX9: [[XOR1:%[0-9]+]]:_(s32) = G_XOR [[ADD1]], [[ASHR1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[XOR1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[XOR1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[XOR1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD2]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH 
[[SELECT1]], [[XOR]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[XOR1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[XOR1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[XOR]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[XOR1]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[XOR1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD3]] - ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT3]], [[ASHR]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[XOR1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[XOR]], [[ADD2]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[XOR1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[XOR]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[XOR1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[XOR1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[XOR1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[XOR1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[XOR2:%[0-9]+]]:_(s32) = G_XOR [[SELECT1]], [[ASHR]] ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[XOR2]], [[ASHR]] ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SUB4]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-udiv.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-udiv.mir index b0615686427b..08e01280204a 100644 --- 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-udiv.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-udiv.mir @@ -14,91 +14,79 @@ body: | ; GFX6: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[COPY1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[COPY1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[COPY1]] ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[COPY]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[COPY1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[COPY1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[COPY1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[ADD]] 
+ ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[COPY1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C2]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD1]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX6: $vgpr0 = COPY [[SELECT3]](s32) + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[COPY1]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C2]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[COPY1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[COPY1]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C2]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX6: $vgpr0 = COPY [[SELECT2]](s32) ; GFX8-LABEL: name: test_udiv_s32 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[COPY1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[COPY1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[COPY1]] ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; 
GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[COPY]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[COPY1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[COPY1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[COPY1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[COPY1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C2]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD1]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX8: $vgpr0 = COPY [[SELECT3]](s32) + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[COPY1]] + ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C2]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[COPY1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP 
intpred(uge), [[SELECT1]](s32), [[COPY1]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C2]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX8: $vgpr0 = COPY [[SELECT2]](s32) ; GFX9-LABEL: name: test_udiv_s32 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[COPY1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[COPY1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[COPY1]] ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[COPY]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[COPY1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[COPY1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[COPY1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: 
[[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[COPY1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C2]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD1]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX9: $vgpr0 = COPY [[SELECT3]](s32) + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[COPY1]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C2]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[COPY1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[COPY1]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C2]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX9: $vgpr0 = COPY [[SELECT2]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 %2:_(s32) = G_UDIV %0, %1 @@ -118,55 +106,47 @@ body: | ; GFX6: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>) ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[UV2]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[UV2]] - ; 
GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[UV2]] ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[UV]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[UV2]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[UV2]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV2]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[UV2]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C2]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD1]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[UV2]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C2]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + 
; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[UV2]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[UV2]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C2]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[UV3]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C3]] ; GFX6: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[UV3]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[UV3]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C1]] - ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[UV1]] - ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[UV3]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[UV3]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV1]](s32), [[MUL3]] - ; GFX6: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C2]] - ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C2]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = 
G_SELECT [[AND1]](s1), [[ADD3]], [[UMULH5]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB7]] - ; GFX6: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT3]](s32), [[SELECT7]](s32) + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV3]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB3]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[ADD3]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[UV3]] + ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB4]](s32), [[UV3]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C2]] + ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD4]], [[UMULH3]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[SUB4]], [[UV3]] + ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB5]], [[SUB4]] + ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[UV3]] + ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C2]] + ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD5]], [[SELECT3]] + ; GFX6: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT2]](s32), [[SELECT5]](s32) ; GFX6: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) ; GFX8-LABEL: name: test_udiv_v2s32 ; GFX8: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1 @@ -175,55 +155,47 @@ body: | ; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>) ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[UV2]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX8: 
[[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[UV2]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[UV2]] ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[UV]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[UV2]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[UV2]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV2]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[UV2]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C2]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD1]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[UV2]] + ; GFX8: 
[[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C2]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[UV2]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[UV2]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C2]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[UV3]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C3]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[UV3]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[UV3]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C1]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[UV1]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[UV3]] - ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[UV3]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV1]](s32), [[MUL3]] - ; GFX8: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) 
= G_ADD [[UMULH5]], [[C2]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C2]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD3]], [[UMULH5]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB7]] - ; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT3]](s32), [[SELECT7]](s32) + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV3]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB3]], [[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[ADD3]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[UV3]] + ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB4]](s32), [[UV3]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C2]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD4]], [[UMULH3]] + ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[SUB4]], [[UV3]] + ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB5]], [[SUB4]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[UV3]] + ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C2]] + ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD5]], [[SELECT3]] + ; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT2]](s32), [[SELECT5]](s32) ; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) ; GFX9-LABEL: name: test_udiv_v2s32 ; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1 @@ -232,55 +204,47 @@ body: | ; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>) ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[UV2]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: 
[[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[UV2]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[UV2]] ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[UV]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[UV2]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[UV2]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV2]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[UV2]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C2]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[ADD1]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT 
[[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[UV2]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C2]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[UV2]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[UV2]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C2]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[UV3]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C3]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[UV3]] - ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[UV3]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C1]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[UV1]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[UV3]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[UV3]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP 
intpred(uge), [[UV1]](s32), [[MUL3]] - ; GFX9: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C2]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C2]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[ADD3]], [[UMULH5]] - ; GFX9: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB7]] - ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT3]](s32), [[SELECT7]](s32) + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV3]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB3]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[ADD3]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[UV3]] + ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB4]](s32), [[UV3]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C2]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD4]], [[UMULH3]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[SUB4]], [[UV3]] + ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB5]], [[SUB4]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[UV3]] + ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C2]] + ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD5]], [[SELECT3]] + ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT2]](s32), [[SELECT5]](s32) ; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) %0:_(<2 x s32>) = COPY $vgpr0_vgpr1 %1:_(<2 x s32>) = COPY $vgpr2_vgpr3 @@ -1693,33 +1657,29 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: 
[[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT 
[[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] - ; GFX6: $vgpr0 = COPY [[AND3]](s32) + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) + ; GFX6: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] + ; GFX6: $vgpr0 = COPY [[AND2]](s32) ; GFX8-LABEL: name: test_udiv_s16 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 @@ -1730,33 +1690,29 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: 
[[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] - ; GFX8: $vgpr0 = COPY [[AND3]](s32) + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: 
[[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) + ; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] + ; GFX8: $vgpr0 = COPY [[AND2]](s32) ; GFX9-LABEL: name: test_udiv_s16 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 @@ -1767,33 +1723,29 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), 
[[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] - ; GFX9: $vgpr0 = COPY [[AND3]](s32) + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) + ; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] + ; GFX9: $vgpr0 = COPY [[AND2]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 %2:_(s16) = G_TRUNC %0 @@ -1824,64 +1776,56 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], 
[[C1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: 
[[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C4]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C4]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C4]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C4]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) - ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] + ; GFX6: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] ; GFX6: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) - ; GFX6: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] - ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND4]](s32) + ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] + ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND3]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C5]] ; GFX6: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[AND4]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[AND4]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX6: 
[[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[AND3]] - ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[AND4]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[AND3]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[AND4]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND3]](s32), [[MUL3]] - ; GFX6: [[AND5:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C4]] - ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C4]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND5]](s1), [[ADD3]], [[UMULH5]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB7]] - ; GFX6: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX6: [[AND6:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C1]] - ; GFX6: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT7]](s32) - ; GFX6: [[AND7:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] - ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND7]], [[C]](s32) - ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND6]], [[SHL]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND3]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB3]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[AND2]], [[ADD3]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[AND3]] + ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[AND2]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB4]](s32), [[AND3]] + ; GFX6: [[ADD4:%[0-9]+]]:_(s32) = G_ADD 
[[UMULH3]], [[C4]] + ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD4]], [[UMULH3]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[SUB4]], [[AND3]] + ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB5]], [[SUB4]] + ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[AND3]] + ; GFX6: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C4]] + ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD5]], [[SELECT3]] + ; GFX6: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) + ; GFX6: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C1]] + ; GFX6: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT5]](s32) + ; GFX6: [[AND5:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] + ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND5]], [[C]](s32) + ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND4]], [[SHL]] ; GFX6: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX6: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; GFX8-LABEL: name: test_udiv_v2s16 @@ -1899,64 +1843,56 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], 
[[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C4]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C4]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C4]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C4]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) - ; GFX8: 
[[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] + ; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] ; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) - ; GFX8: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] - ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND4]](s32) + ; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] + ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND3]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C5]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[AND4]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[AND4]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[AND3]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[AND4]] - ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[AND3]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[AND4]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND3]](s32), [[MUL3]] - ; GFX8: [[AND5:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C4]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C4]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT 
[[AND5]](s1), [[ADD3]], [[UMULH5]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB7]] - ; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX8: [[AND6:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C1]] - ; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT7]](s32) - ; GFX8: [[AND7:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] - ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND7]], [[C]](s32) - ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND6]], [[SHL]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND3]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB3]], [[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[AND2]], [[ADD3]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[AND3]] + ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[AND2]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB4]](s32), [[AND3]] + ; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C4]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD4]], [[UMULH3]] + ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[SUB4]], [[AND3]] + ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB5]], [[SUB4]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[AND3]] + ; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C4]] + ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD5]], [[SELECT3]] + ; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) + ; GFX8: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C1]] + ; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT5]](s32) + ; GFX8: [[AND5:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] + ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND5]], [[C]](s32) + ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND4]], [[SHL]] ; GFX8: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX8: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; 
GFX9-LABEL: name: test_udiv_v2s16 @@ -1974,60 +1910,52 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: 
[[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C4]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C4]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C4]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C4]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) - ; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] + ; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] ; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) - ; GFX9: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] - ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND4]](s32) + ; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] + ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND3]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C5:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C5]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[AND4]] - ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[AND4]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], 
[[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[AND3]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[AND4]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[AND3]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[AND4]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND3]](s32), [[MUL3]] - ; GFX9: [[AND5:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH5]], [[C4]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[UMULH5]], [[C4]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND5]](s1), [[ADD3]], [[UMULH5]] - ; GFX9: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[SUB7]] - ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT7]](s32) + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND3]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB3]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[AND2]], [[ADD3]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[AND3]] + ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[AND2]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB4]](s32), [[AND3]] + ; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[C4]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[ADD4]], [[UMULH3]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB 
[[SUB4]], [[AND3]] + ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB5]], [[SUB4]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT4]](s32), [[AND3]] + ; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[SELECT3]], [[C4]] + ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD5]], [[SELECT3]] + ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) + ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT5]](s32) ; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY6]](s32), [[COPY7]](s32) ; GFX9: $vgpr0 = COPY [[BUILD_VECTOR_TRUNC]](<2 x s16>) %0:_(<2 x s16>) = COPY $vgpr0 @@ -2052,31 +1980,27 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] 
- ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) ; GFX8-LABEL: name: test_udiv_s7 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2088,31 +2012,27 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: 
[[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: 
[[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) ; GFX9-LABEL: name: test_udiv_s7 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2124,31 +2044,27 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], 
[[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: 
[[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 @@ -2175,31 +2091,27 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: 
[[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) ; GFX8-LABEL: name: test_udiv_s17 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2211,31 +2123,27 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: 
[[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX8: 
[[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) ; GFX9-LABEL: name: test_udiv_s17 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2247,31 +2155,27 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: 
[[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1 - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH2]], [[C3]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[UMULH2]], [[C3]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[ADD1]], [[UMULH2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[SUB3]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UMULH1]], [[C3]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD1]], [[UMULH1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT1]](s32), [[AND1]] + ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[SELECT]], [[C3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[ADD2]], [[SELECT]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT2]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-urem.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-urem.mir index e42fe1400477..988d75459e01 100644 --- 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-urem.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-urem.mir @@ -14,88 +14,70 @@ body: | ; GFX6: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[COPY1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[COPY1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[COPY1]] ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[COPY]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[COPY1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[COPY1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[COPY1]] - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[COPY1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX6: $vgpr0 = COPY 
[[SELECT3]](s32) + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[COPY1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[COPY1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[COPY1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[COPY1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[COPY1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[COPY1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: $vgpr0 = COPY [[SELECT1]](s32) ; GFX8-LABEL: name: test_urem_s32 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[COPY1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[COPY1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[COPY1]] ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], 
[[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[COPY]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[COPY1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[COPY1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[COPY1]] - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[COPY1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX8: $vgpr0 = COPY [[SELECT3]](s32) + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[COPY1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[COPY1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[COPY1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[COPY1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[COPY1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[COPY1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: $vgpr0 = COPY [[SELECT1]](s32) ; GFX9-LABEL: name: test_urem_s32 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = 
G_UITOFP [[COPY1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[COPY1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[COPY1]] ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[COPY]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[COPY1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[COPY1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[COPY1]] - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[COPY1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX9: $vgpr0 = COPY [[SELECT3]](s32) + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[COPY1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: 
[[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[COPY1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[COPY]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[COPY1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[COPY1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[COPY1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[COPY1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: $vgpr0 = COPY [[SELECT1]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 %2:_(s32) = G_UREM %0, %1 @@ -115,54 +97,42 @@ body: | ; GFX6: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>) ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[UV2]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[UV2]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[UV2]] ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], 
[[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[UV]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[UV2]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[UV2]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV]](s32), [[MUL1]] - ; GFX6: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[UV2]] - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[UV2]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV2]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[UV2]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[UV2]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[UV2]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[UV2]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[UV2]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[UV3]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C2]] ; GFX6: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; 
GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[UV3]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[UV3]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C1]] - ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[UV1]] - ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[UV3]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[UV3]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV1]](s32), [[MUL3]] - ; GFX6: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[UV3]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB6]], [[UV3]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB7]], [[SUB6]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD3]] - ; GFX6: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT3]](s32), [[SELECT7]](s32) + ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV3]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[ADD1]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[UV3]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[UV3]] + ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB 
[[SUB5]], [[UV3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[UV3]] + ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[UV3]] + ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB7]], [[SELECT2]] + ; GFX6: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT1]](s32), [[SELECT3]](s32) ; GFX6: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) ; GFX8-LABEL: name: test_urem_v2s32 ; GFX8: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1 @@ -171,54 +141,42 @@ body: | ; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>) ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[UV2]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[UV2]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[UV2]] ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[UV]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[UV2]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) 
= G_ICMP intpred(uge), [[SUB2]](s32), [[UV2]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV]](s32), [[MUL1]] - ; GFX8: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[UV2]] - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[UV2]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV2]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[UV2]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[UV2]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[UV2]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[UV2]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[UV2]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[UV3]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C2]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[UV3]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[UV3]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP 
intpred(eq), [[UMULH3]](s32), [[C1]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[UV1]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[UV3]] - ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[UV3]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV1]](s32), [[MUL3]] - ; GFX8: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[UV3]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB6]], [[UV3]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB7]], [[SUB6]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD3]] - ; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT3]](s32), [[SELECT7]](s32) + ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV3]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[ADD1]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[UV3]] + ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[UV3]] + ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[UV3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[UV3]] + ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB 
[[SELECT2]], [[UV3]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB7]], [[SELECT2]] + ; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT1]](s32), [[SELECT3]](s32) ; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>) ; GFX9-LABEL: name: test_urem_v2s32 ; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1 @@ -227,54 +185,42 @@ body: | ; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>) ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[UV2]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[UV2]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[UV2]] ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C1]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[UV]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[UV2]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[UV2]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV]](s32), [[MUL1]] - ; GFX9: [[AND:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = 
G_SUB [[SUB2]], [[UV2]] - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[UV2]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV2]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[UV2]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[UV]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[UV2]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[UV2]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[UV2]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[UV2]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[UV3]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C2]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[UV3]] - ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[UV3]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C1]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = 
G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[UV1]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[UV3]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[UV3]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[UV1]](s32), [[MUL3]] - ; GFX9: [[AND1:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[UV3]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB6]], [[UV3]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND1]](s1), [[SUB7]], [[SUB6]] - ; GFX9: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD3]] - ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT3]](s32), [[SELECT7]](s32) + ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C1]], [[UV3]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[ADD1]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[UV3]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[UV1]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[UV3]] + ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[UV3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[UV3]] + ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[UV3]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB7]], [[SELECT2]] + ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SELECT1]](s32), [[SELECT3]](s32) ; GFX9: $vgpr0_vgpr1 = COPY 
[[BUILD_VECTOR]](<2 x s32>) %0:_(<2 x s32>) = COPY $vgpr0_vgpr1 %1:_(<2 x s32>) = COPY $vgpr2_vgpr3 @@ -1627,32 +1573,26 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX6: 
[[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] - ; GFX6: $vgpr0 = COPY [[AND3]](s32) + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) + ; GFX6: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] + ; GFX6: $vgpr0 = COPY [[AND2]](s32) ; GFX8-LABEL: name: test_urem_s16 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 @@ -1663,32 +1603,26 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: 
[[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] - ; GFX8: $vgpr0 = COPY [[AND3]](s32) + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; 
GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) + ; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] + ; GFX8: $vgpr0 = COPY [[AND2]](s32) ; GFX9-LABEL: name: test_urem_s16 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 @@ -1699,32 +1633,26 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP 
intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] - ; GFX9: $vgpr0 = COPY [[AND3]](s32) + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) + ; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]] + ; GFX9: $vgpr0 = COPY [[AND2]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 %2:_(s16) = G_TRUNC %0 @@ -1755,63 +1683,51 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C1]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C2:%[0-9]+]]:_(s32) = 
G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX6: 
[[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) - ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] + ; GFX6: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] ; GFX6: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) - ; GFX6: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] - ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND4]](s32) + ; GFX6: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] + ; GFX6: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND3]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX6: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[AND4]] - ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[AND4]] - ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX6: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX6: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX6: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX6: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX6: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[AND3]] - ; GFX6: 
[[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[AND4]] - ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[AND3]], [[MUL3]] - ; GFX6: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[AND4]] - ; GFX6: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND3]](s32), [[MUL3]] - ; GFX6: [[AND5:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[AND4]] - ; GFX6: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB6]], [[AND4]] - ; GFX6: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND5]](s1), [[SUB7]], [[SUB6]] - ; GFX6: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD3]] - ; GFX6: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX6: [[AND6:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C1]] - ; GFX6: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT7]](s32) - ; GFX6: [[AND7:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] - ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND7]], [[C]](s32) - ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND6]], [[SHL]] + ; GFX6: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND3]] + ; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX6: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[AND2]], [[ADD1]] + ; GFX6: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[AND3]] + ; GFX6: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[AND2]], [[MUL3]] + ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[AND3]] + ; GFX6: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[AND3]] + ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX6: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[AND3]] + ; GFX6: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[AND3]] + ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB7]], [[SELECT2]] + ; GFX6: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) + ; GFX6: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY6]], 
[[C1]] + ; GFX6: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX6: [[AND5:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] + ; GFX6: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND5]], [[C]](s32) + ; GFX6: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND4]], [[SHL]] ; GFX6: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX6: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; GFX8-LABEL: name: test_urem_v2s16 @@ -1829,63 +1745,51 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C1]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: 
[[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) - ; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] + ; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] ; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) - ; GFX8: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] - ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND4]](s32) + ; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] + ; GFX8: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND3]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX8: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; 
GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[AND4]] - ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[AND4]] - ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX8: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX8: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX8: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[AND3]] - ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[AND4]] - ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[AND3]], [[MUL3]] - ; GFX8: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[AND4]] - ; GFX8: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND3]](s32), [[MUL3]] - ; GFX8: [[AND5:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[AND4]] - ; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB6]], [[AND4]] - ; GFX8: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND5]](s1), [[SUB7]], [[SUB6]] - ; GFX8: [[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD3]] - ; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX8: [[AND6:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C1]] - ; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT7]](s32) - ; GFX8: [[AND7:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] - ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND7]], [[C]](s32) - ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND6]], [[SHL]] + ; GFX8: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND3]] + ; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH 
[[AND2]], [[ADD1]] + ; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[AND3]] + ; GFX8: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[AND2]], [[MUL3]] + ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[AND3]] + ; GFX8: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[AND3]] + ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[AND3]] + ; GFX8: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[AND3]] + ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB7]], [[SELECT2]] + ; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) + ; GFX8: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C1]] + ; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX8: [[AND5:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C1]] + ; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND5]], [[C]](s32) + ; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND4]], [[SHL]] ; GFX8: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32) ; GFX8: $vgpr0 = COPY [[BITCAST2]](<2 x s16>) ; GFX9-LABEL: name: test_urem_v2s16 @@ -1903,59 +1807,47 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C1]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C2]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C3]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; 
GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32) - ; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] + ; GFX9: 
[[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]] ; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32) - ; GFX9: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] - ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND4]](s32) + ; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C1]] + ; GFX9: [[UITOFP1:%[0-9]+]]:_(s32) = G_UITOFP [[AND3]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG1:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP1]](s32) - ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C4:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG1]], [[C4]] ; GFX9: [[FPTOUI1:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL1]](s32) - ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI1]], [[AND4]] - ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[AND4]] - ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[MUL2]] - ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH3]](s32), [[C3]] - ; GFX9: [[SELECT4:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB4]], [[MUL2]] - ; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[SELECT4]], [[FPTOUI1]] - ; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI1]], [[UMULH4]] - ; GFX9: [[SELECT5:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[ADD2]], [[SUB5]] - ; GFX9: [[UMULH5:%[0-9]+]]:_(s32) = G_UMULH [[SELECT5]], [[AND3]] - ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH5]], [[AND4]] - ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[AND3]], [[MUL3]] - ; GFX9: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB6]](s32), [[AND4]] - ; GFX9: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND3]](s32), [[MUL3]] - ; GFX9: [[AND5:%[0-9]+]]:_(s1) = G_AND [[ICMP4]], [[ICMP5]] - ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SUB6]], [[AND4]] - ; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[SUB6]], [[AND4]] - ; GFX9: [[SELECT6:%[0-9]+]]:_(s32) = G_SELECT [[AND5]](s1), [[SUB7]], [[SUB6]] - ; GFX9: 
[[SELECT7:%[0-9]+]]:_(s32) = G_SELECT [[ICMP5]](s1), [[SELECT6]], [[ADD3]] - ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) - ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT7]](s32) + ; GFX9: [[SUB4:%[0-9]+]]:_(s32) = G_SUB [[C3]], [[AND3]] + ; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SUB4]], [[FPTOUI1]] + ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI1]], [[MUL2]] + ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI1]], [[UMULH2]] + ; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[AND2]], [[ADD1]] + ; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UMULH3]], [[AND3]] + ; GFX9: [[SUB5:%[0-9]+]]:_(s32) = G_SUB [[AND2]], [[MUL3]] + ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB5]](s32), [[AND3]] + ; GFX9: [[SUB6:%[0-9]+]]:_(s32) = G_SUB [[SUB5]], [[AND3]] + ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SUB6]], [[SUB5]] + ; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT2]](s32), [[AND3]] + ; GFX9: [[SUB7:%[0-9]+]]:_(s32) = G_SUB [[SELECT2]], [[AND3]] + ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP3]](s1), [[SUB7]], [[SELECT2]] + ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) + ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) ; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY6]](s32), [[COPY7]](s32) ; GFX9: $vgpr0 = COPY [[BUILD_VECTOR_TRUNC]](<2 x s16>) %0:_(<2 x s16>) = COPY $vgpr0 @@ -1980,30 +1872,24 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: 
[[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), 
[[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) ; GFX8-LABEL: name: test_urem_s7 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2015,30 +1901,24 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], 
[[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) ; GFX9-LABEL: name: test_urem_s7 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2050,30 +1930,24 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: 
[[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), 
[[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 @@ -2100,30 +1974,24 @@ body: | ; GFX6: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX6: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX6: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX6: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX6: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX6: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX6: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX6: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX6: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX6: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX6: 
[[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX6: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX6: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX6: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX6: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX6: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX6: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX6: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX6: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX6: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX6: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX6: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX6: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) ; GFX6: $vgpr0 = COPY [[COPY4]](s32) ; GFX8-LABEL: name: test_urem_s17 ; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2135,30 +2003,24 @@ body: | ; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX8: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX8: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX8: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX8: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX8: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = 
G_UMULH [[FPTOUI]], [[AND1]] ; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX8: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX8: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX8: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX8: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX8: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX8: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX8: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX8: 
[[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX8: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX8: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) ; GFX8: $vgpr0 = COPY [[COPY4]](s32) ; GFX9-LABEL: name: test_urem_s17 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 @@ -2170,30 +2032,24 @@ body: | ; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]] ; GFX9: [[UITOFP:%[0-9]+]]:_(s32) = G_UITOFP [[AND1]](s32) ; GFX9: [[AMDGPU_RCP_IFLAG:%[0-9]+]]:_(s32) = G_AMDGPU_RCP_IFLAG [[UITOFP]](s32) - ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41F0000000000000 + ; GFX9: [[C1:%[0-9]+]]:_(s32) = G_FCONSTANT float 0x41EFFFFFC0000000 ; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[AMDGPU_RCP_IFLAG]], [[C1]] ; GFX9: [[FPTOUI:%[0-9]+]]:_(s32) = G_FPTOUI [[FMUL]](s32) - ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[FPTOUI]], [[AND1]] - ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[AND1]] ; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 - ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[MUL]] - ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UMULH]](s32), [[C2]] - ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB]], [[MUL]] - ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[SELECT]], [[FPTOUI]] - ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[FPTOUI]], [[UMULH1]] - ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[ADD]], [[SUB1]] - ; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[SELECT1]], [[AND]] - ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH2]], [[AND1]] - ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] - ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB2]](s32), [[AND1]] - ; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[AND]](s32), [[MUL1]] - ; GFX9: [[AND2:%[0-9]+]]:_(s1) = G_AND [[ICMP1]], [[ICMP2]] - ; GFX9: 
[[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SUB2]], [[AND1]] - ; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[SUB2]], [[AND1]] - ; GFX9: [[SELECT2:%[0-9]+]]:_(s32) = G_SELECT [[AND2]](s1), [[SUB3]], [[SUB2]] - ; GFX9: [[SELECT3:%[0-9]+]]:_(s32) = G_SELECT [[ICMP2]](s1), [[SELECT2]], [[ADD1]] - ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT3]](s32) + ; GFX9: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C2]], [[AND1]] + ; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SUB]], [[FPTOUI]] + ; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[FPTOUI]], [[MUL]] + ; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[FPTOUI]], [[UMULH]] + ; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[AND]], [[ADD]] + ; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UMULH1]], [[AND1]] + ; GFX9: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[AND]], [[MUL1]] + ; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SUB1]](s32), [[AND1]] + ; GFX9: [[SUB2:%[0-9]+]]:_(s32) = G_SUB [[SUB1]], [[AND1]] + ; GFX9: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s1), [[SUB2]], [[SUB1]] + ; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[SELECT]](s32), [[AND1]] + ; GFX9: [[SUB3:%[0-9]+]]:_(s32) = G_SUB [[SELECT]], [[AND1]] + ; GFX9: [[SELECT1:%[0-9]+]]:_(s32) = G_SELECT [[ICMP1]](s1), [[SUB3]], [[SELECT]] + ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SELECT1]](s32) ; GFX9: $vgpr0 = COPY [[COPY4]](s32) %0:_(s32) = COPY $vgpr0 %1:_(s32) = COPY $vgpr1 diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll index dc9910e9ed21..f68465faf61c 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll @@ -15,28 +15,24 @@ define i32 @v_sdiv_i32(i32 %num, i32 %den) { ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v1 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 ; 
GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_lo_u32 v5, v4, v1 -; GISEL-NEXT: v_mul_hi_u32 v6, v4, v1 -; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v5 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v6 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v4 -; GISEL-NEXT: v_add_i32_e64 v6, s[4:5], v4, v5 -; GISEL-NEXT: v_sub_i32_e64 v4, s[4:5], v4, v5 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 ; GISEL-NEXT: v_mul_lo_u32 v5, v4, v1 ; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 v7, vcc, 1, v4 -; GISEL-NEXT: v_sub_i32_e32 v8, vcc, v0, v5 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v5 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v8, v1 -; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v6, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v7, v0, vcc +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc +; GISEL-NEXT: v_sub_i32_e64 v5, s[4:5], v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GISEL-NEXT: v_add_i32_e32 v5, vcc, 1, v4 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v5, vcc ; GISEL-NEXT: v_xor_b32_e32 v1, v2, v3 ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v1 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 @@ -96,31 +92,27 @@ define amdgpu_ps i32 @s_sdiv_i32(i32 inreg %num, i32 inreg %den) { ; GISEL-NEXT: s_ashr_i32 s3, s1, 31 ; GISEL-NEXT: s_add_i32 s0, s0, s2 ; GISEL-NEXT: s_add_i32 s1, s1, s3 -; GISEL-NEXT: s_xor_b32 s4, s0, s2 -; GISEL-NEXT: s_xor_b32 s5, s1, s3 -; GISEL-NEXT: v_cvt_f32_u32_e32 v0, s5 +; GISEL-NEXT: s_xor_b32 s0, s0, s2 +; GISEL-NEXT: s_xor_b32 s4, s1, s3 +; GISEL-NEXT: v_cvt_f32_u32_e32 v0, s4 +; GISEL-NEXT: s_sub_i32 s1, 0, s4 ; 
GISEL-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GISEL-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GISEL-NEXT: v_mul_lo_u32 v1, v0, s5 -; GISEL-NEXT: v_mul_hi_u32 v2, v0, s5 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GISEL-NEXT: v_mul_hi_u32 v1, v1, v0 -; GISEL-NEXT: v_add_i32_e64 v2, s[0:1], v0, v1 -; GISEL-NEXT: v_sub_i32_e64 v0, s[0:1], v0, v1 +; GISEL-NEXT: v_mul_lo_u32 v1, s1, v0 +; GISEL-NEXT: v_mul_hi_u32 v1, v0, v1 +; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GISEL-NEXT: v_mul_hi_u32 v0, s0, v0 +; GISEL-NEXT: v_mul_lo_u32 v1, v0, s4 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, s0, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 ; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GISEL-NEXT: v_mul_hi_u32 v0, v0, s4 -; GISEL-NEXT: v_mul_lo_u32 v1, v0, s5 +; GISEL-NEXT: v_subrev_i32_e64 v2, s[0:1], s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc ; GISEL-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GISEL-NEXT: v_subrev_i32_e32 v3, vcc, 1, v0 -; GISEL-NEXT: v_sub_i32_e32 v4, vcc, s4, v1 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, s4, v1 -; GISEL-NEXT: v_cmp_le_u32_e64 s[0:1], s5, v4 -; GISEL-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; GISEL-NEXT: s_xor_b32 s0, s2, s3 ; GISEL-NEXT: v_xor_b32_e32 v0, s0, v0 ; GISEL-NEXT: v_subrev_i32_e32 v0, vcc, s0, v0 @@ -192,51 +184,43 @@ define <2 x i32> @v_sdiv_v2i32(<2 x i32> %num, <2 x i32> %den) { ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v6 ; GISEL-NEXT: v_xor_b32_e32 v3, v3, v7 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v3 +; 
GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v10, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v11, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v13, vcc, 0, v10 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v12, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v10, v13, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v10, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 ; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 v10, vcc, 1, v4 -; GISEL-NEXT: v_mul_lo_u32 v11, v5, v3 -; GISEL-NEXT: v_add_i32_e32 v12, vcc, 1, v5 -; GISEL-NEXT: v_subrev_i32_e32 v13, vcc, 1, v5 -; GISEL-NEXT: v_sub_i32_e32 v14, vcc, v0, 
v6 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v11 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v11 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v14, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v5, v12, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v10, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v13, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v10, v5, v3 +; GISEL-NEXT: v_add_i32_e32 v11, vcc, 1, v5 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v10 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; GISEL-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v11, s[4:5] +; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v8 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v9 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v8 @@ -329,31 +313,27 @@ define i32 @v_sdiv_i32_pow2k_denom(i32 %num) { ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT: s_movk_i32 s6, 0x1000 ; CHECK-NEXT: v_ashrrev_i32_e32 v1, 31, v0 +; CHECK-NEXT: v_mov_b32_e32 v2, 0xfffff000 ; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 -; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s6 +; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s6 ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f800000, 
v2 -; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CHECK-NEXT: v_lshlrev_b32_e32 v3, 12, v2 -; CHECK-NEXT: v_mul_hi_u32 v4, v2, s6 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v0 +; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 +; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 +; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v3 +; CHECK-NEXT: v_mul_hi_u32 v2, v3, v2 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, v3, v2 +; CHECK-NEXT: v_mul_hi_u32 v2, v0, v2 ; CHECK-NEXT: v_lshlrev_b32_e32 v3, 12, v2 ; CHECK-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CHECK-NEXT: v_cmp_le_u32_e64 s[4:5], s6, v6 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CHECK-NEXT: v_subrev_i32_e64 v3, s[4:5], s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 ; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -370,52 +350,43 @@ define <2 x i32> @v_sdiv_v2i32_pow2k_denom(<2 x i32> %num) { ; GISEL-NEXT: v_ashrrev_i32_e32 v3, 31, v1 ; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v2 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, s8 +; GISEL-NEXT: s_sub_i32 s4, 0, s8 ; GISEL-NEXT: v_add_i32_e32 v1, vcc, v1, v3 ; GISEL-NEXT: 
v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 +; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 ; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_lo_u32 v6, v5, s8 -; GISEL-NEXT: v_mul_hi_u32 v7, v5, s8 -; GISEL-NEXT: v_mul_lo_u32 v8, v4, s8 -; GISEL-NEXT: v_mul_hi_u32 v9, v4, s8 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v5 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v4 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v5, v6 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v4, v7 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v0 -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v1 +; GISEL-NEXT: v_mul_lo_u32 v6, s4, v5 +; GISEL-NEXT: v_mul_lo_u32 v7, s4, v4 +; GISEL-NEXT: v_mul_hi_u32 v6, v5, v6 +; GISEL-NEXT: v_mul_hi_u32 v7, v4, v7 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v5, v6 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v7 +; GISEL-NEXT: v_mul_hi_u32 v5, v0, v5 +; GISEL-NEXT: v_mul_hi_u32 v4, v1, v4 ; GISEL-NEXT: v_mul_lo_u32 v6, v5, s8 ; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v5 -; GISEL-NEXT: v_subrev_i32_e32 v8, vcc, 1, v5 -; GISEL-NEXT: v_mul_lo_u32 v9, v4, s8 -; GISEL-NEXT: v_add_i32_e32 v10, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 v11, vcc, 1, v4 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; GISEL-NEXT: 
v_cmp_ge_u32_e64 s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_le_u32_e64 s[6:7], s8, v12 -; GISEL-NEXT: v_cmp_le_u32_e64 s[8:9], s8, v0 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v5, v7, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v4, v10, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v8, v4, s8 +; GISEL-NEXT: v_add_i32_e32 v9, vcc, 1, v4 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc +; GISEL-NEXT: v_subrev_i32_e64 v6, s[4:5], s8, v0 +; GISEL-NEXT: v_cmp_le_u32_e64 s[4:5], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e64 v4, v4, v9, s[4:5] +; GISEL-NEXT: v_subrev_i32_e64 v7, s[6:7], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v5 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v4 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v4, v7, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 @@ -428,61 +399,53 @@ define <2 x i32> @v_sdiv_v2i32_pow2k_denom(<2 x i32> %num) { ; CGP-NEXT: s_movk_i32 s4, 0x1000 ; CGP-NEXT: v_ashrrev_i32_e32 v2, 31, v0 ; CGP-NEXT: v_mov_b32_e32 v3, 0x1000 -; CGP-NEXT: v_ashrrev_i32_e32 v4, 31, v1 +; CGP-NEXT: s_mov_b32 s5, 0xfffff000 +; CGP-NEXT: v_mov_b32_e32 v4, 0xfffff000 +; CGP-NEXT: v_ashrrev_i32_e32 v5, 31, v1 ; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2 -; CGP-NEXT: v_cvt_f32_u32_e32 v5, s4 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4 -; CGP-NEXT: v_cvt_f32_u32_e32 v6, v3 +; CGP-NEXT: v_cvt_f32_u32_e32 v6, s4 +; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5 +; 
CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v6, v6 -; CGP-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 -; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; CGP-NEXT: v_cvt_u32_f32_e32 v5, v5 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 +; CGP-NEXT: v_rcp_iflag_f32_e32 v7, v7 +; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6 -; CGP-NEXT: v_lshlrev_b32_e32 v7, 12, v5 -; CGP-NEXT: v_mul_hi_u32 v8, v5, s4 -; CGP-NEXT: v_lshlrev_b32_e32 v9, 12, v6 -; CGP-NEXT: v_mul_hi_u32 v10, v6, v3 -; CGP-NEXT: v_sub_i32_e32 v11, vcc, 0, v7 -; CGP-NEXT: v_sub_i32_e32 v12, vcc, 0, v9 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v8 -; CGP-NEXT: v_cndmask_b32_e32 v7, v7, v11, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v10 -; CGP-NEXT: v_cndmask_b32_e64 v8, v9, v12, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v7, v7, v5 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v6 -; CGP-NEXT: v_add_i32_e64 v9, s[6:7], v5, v7 -; CGP-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v6, v8 -; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v8 -; CGP-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CGP-NEXT: v_cndmask_b32_e64 v6, v6, v7, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v5, v5, v0 -; CGP-NEXT: v_mul_hi_u32 v6, v6, v1 -; CGP-NEXT: v_lshlrev_b32_e32 v7, 12, v5 -; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v5 -; CGP-NEXT: v_subrev_i32_e32 v9, vcc, 1, v5 -; CGP-NEXT: v_lshlrev_b32_e32 v10, 12, v6 -; CGP-NEXT: v_add_i32_e32 v11, vcc, 1, v6 -; CGP-NEXT: v_subrev_i32_e32 v12, vcc, 1, v6 -; CGP-NEXT: v_sub_i32_e32 v13, vcc, v0, v7 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v7 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v10 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v10 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v13, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v5, 
v8, s[6:7] -; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v6, v11, s[6:7] -; CGP-NEXT: v_cndmask_b32_e32 v0, v9, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v1, v12, v1, s[4:5] +; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 +; CGP-NEXT: v_mul_lo_u32 v8, s5, v6 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v7 +; CGP-NEXT: v_mul_hi_u32 v8, v6, v8 +; CGP-NEXT: v_mul_hi_u32 v4, v7, v4 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v7, v4 +; CGP-NEXT: v_mul_hi_u32 v6, v0, v6 +; CGP-NEXT: v_mul_hi_u32 v4, v1, v4 +; CGP-NEXT: v_lshlrev_b32_e32 v7, 12, v6 +; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v6 +; CGP-NEXT: v_lshlrev_b32_e32 v9, 12, v4 +; CGP-NEXT: v_add_i32_e32 v10, vcc, 1, v4 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v7 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v9 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v6, v6, v8, vcc +; CGP-NEXT: v_subrev_i32_e64 v7, s[4:5], s4, v0 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v4, v4, v10, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v8, s[6:7], v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v7, vcc +; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v6 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v8, s[4:5] +; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v7, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v4, v8, vcc ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 ; CGP-NEXT: s_setpc_b64 s[30:31] %result = sdiv <2 x i32> %num, ret <2 x i32> %result @@ -494,31 +457,27 @@ define i32 @v_sdiv_i32_oddk_denom(i32 %num) { ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT: s_mov_b32 s6, 0x12d8fb ; CHECK-NEXT: v_ashrrev_i32_e32 v1, 31, v0 +; CHECK-NEXT: 
v_mov_b32_e32 v2, 0xffed2705 ; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 -; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s6 +; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s6 ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 -; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_lo_u32 v3, v2, s6 -; CHECK-NEXT: v_mul_hi_u32 v4, v2, s6 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v0 +; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 +; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 +; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v3 +; CHECK-NEXT: v_mul_hi_u32 v2, v3, v2 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, v3, v2 +; CHECK-NEXT: v_mul_hi_u32 v2, v0, v2 ; CHECK-NEXT: v_mul_lo_u32 v3, v2, s6 ; CHECK-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CHECK-NEXT: v_cmp_le_u32_e64 s[4:5], s6, v6 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CHECK-NEXT: v_subrev_i32_e64 v3, s[4:5], s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 ; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -535,52 +494,43 @@ define <2 x i32> 
@v_sdiv_v2i32_oddk_denom(<2 x i32> %num) { ; GISEL-NEXT: v_ashrrev_i32_e32 v3, 31, v1 ; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v2 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, s8 +; GISEL-NEXT: s_sub_i32 s4, 0, s8 ; GISEL-NEXT: v_add_i32_e32 v1, vcc, v1, v3 ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 +; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 ; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_lo_u32 v6, v5, s8 -; GISEL-NEXT: v_mul_hi_u32 v7, v5, s8 -; GISEL-NEXT: v_mul_lo_u32 v8, v4, s8 -; GISEL-NEXT: v_mul_hi_u32 v9, v4, s8 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v5 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v4 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v5, v6 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v4, v7 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v0 -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v1 +; GISEL-NEXT: v_mul_lo_u32 v6, s4, v5 +; GISEL-NEXT: v_mul_lo_u32 v7, s4, v4 +; GISEL-NEXT: v_mul_hi_u32 v6, v5, v6 +; GISEL-NEXT: v_mul_hi_u32 v7, v4, v7 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v5, v6 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v7 +; GISEL-NEXT: v_mul_hi_u32 v5, v0, v5 +; GISEL-NEXT: v_mul_hi_u32 v4, v1, v4 ; GISEL-NEXT: v_mul_lo_u32 v6, v5, s8 ; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v5 -; GISEL-NEXT: v_subrev_i32_e32 v8, vcc, 1, v5 -; GISEL-NEXT: 
v_mul_lo_u32 v9, v4, s8 -; GISEL-NEXT: v_add_i32_e32 v10, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 v11, vcc, 1, v4 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_le_u32_e64 s[6:7], s8, v12 -; GISEL-NEXT: v_cmp_le_u32_e64 s[8:9], s8, v0 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v5, v7, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v4, v10, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v8, v4, s8 +; GISEL-NEXT: v_add_i32_e32 v9, vcc, 1, v4 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc +; GISEL-NEXT: v_subrev_i32_e64 v6, s[4:5], s8, v0 +; GISEL-NEXT: v_cmp_le_u32_e64 s[4:5], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e64 v4, v4, v9, s[4:5] +; GISEL-NEXT: v_subrev_i32_e64 v7, s[6:7], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v5 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v4 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v4, v7, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 @@ -590,64 +540,56 @@ define <2 x i32> @v_sdiv_v2i32_oddk_denom(<2 x i32> %num) { ; CGP-LABEL: v_sdiv_v2i32_oddk_denom: ; CGP: ; %bb.0: ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CGP-NEXT: s_mov_b32 s8, 0x12d8fb +; CGP-NEXT: s_mov_b32 s4, 0x12d8fb ; CGP-NEXT: v_ashrrev_i32_e32 v2, 31, v0 ; CGP-NEXT: v_mov_b32_e32 v3, 
0x12d8fb -; CGP-NEXT: v_ashrrev_i32_e32 v4, 31, v1 +; CGP-NEXT: s_mov_b32 s5, 0xffed2705 +; CGP-NEXT: v_mov_b32_e32 v4, 0xffed2705 +; CGP-NEXT: v_ashrrev_i32_e32 v5, 31, v1 ; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2 -; CGP-NEXT: v_cvt_f32_u32_e32 v5, s8 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4 -; CGP-NEXT: v_cvt_f32_u32_e32 v6, v3 +; CGP-NEXT: v_cvt_f32_u32_e32 v6, s4 +; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5 +; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v6, v6 -; CGP-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 -; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; CGP-NEXT: v_cvt_u32_f32_e32 v5, v5 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 +; CGP-NEXT: v_rcp_iflag_f32_e32 v7, v7 +; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6 -; CGP-NEXT: v_mul_lo_u32 v7, v5, s8 -; CGP-NEXT: v_mul_hi_u32 v8, v5, s8 -; CGP-NEXT: v_mul_lo_u32 v9, v6, v3 -; CGP-NEXT: v_mul_hi_u32 v10, v6, v3 -; CGP-NEXT: v_sub_i32_e32 v11, vcc, 0, v7 -; CGP-NEXT: v_sub_i32_e32 v12, vcc, 0, v9 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v8 -; CGP-NEXT: v_cndmask_b32_e32 v7, v7, v11, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v10 -; CGP-NEXT: v_cndmask_b32_e64 v8, v9, v12, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v7, v7, v5 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v6 -; CGP-NEXT: v_add_i32_e64 v9, s[6:7], v5, v7 -; CGP-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v6, v8 -; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v8 -; CGP-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CGP-NEXT: v_cndmask_b32_e64 v6, v6, v7, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v5, v5, v0 -; CGP-NEXT: v_mul_hi_u32 v6, v6, v1 -; CGP-NEXT: v_mul_lo_u32 v7, v5, s8 -; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v5 -; CGP-NEXT: v_subrev_i32_e32 v9, vcc, 1, v5 -; CGP-NEXT: v_mul_lo_u32 v10, v6, v3 -; CGP-NEXT: v_add_i32_e32 v11, vcc, 1, v6 
-; CGP-NEXT: v_subrev_i32_e32 v12, vcc, 1, v6 -; CGP-NEXT: v_sub_i32_e32 v13, vcc, v0, v7 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v7 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v10 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v10 -; CGP-NEXT: v_cmp_le_u32_e64 s[6:7], s8, v13 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; CGP-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v5, v8, s[6:7] -; CGP-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v6, v11, s[6:7] -; CGP-NEXT: v_cndmask_b32_e32 v0, v9, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v1, v12, v1, s[4:5] +; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 +; CGP-NEXT: v_mul_lo_u32 v8, s5, v6 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v7 +; CGP-NEXT: v_mul_hi_u32 v8, v6, v8 +; CGP-NEXT: v_mul_hi_u32 v4, v7, v4 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v7, v4 +; CGP-NEXT: v_mul_hi_u32 v6, v0, v6 +; CGP-NEXT: v_mul_hi_u32 v4, v1, v4 +; CGP-NEXT: v_mul_lo_u32 v7, v6, s4 +; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v6 +; CGP-NEXT: v_mul_lo_u32 v9, v4, v3 +; CGP-NEXT: v_add_i32_e32 v10, vcc, 1, v4 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v7 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v9 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v6, v6, v8, vcc +; CGP-NEXT: v_subrev_i32_e64 v7, s[4:5], s4, v0 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v4, v4, v10, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v8, s[6:7], v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v7, vcc +; CGP-NEXT: v_add_i32_e32 v7, vcc, 1, v6 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v8, s[4:5] +; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 +; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v7, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v4, v8, vcc ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 -; 
CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 ; CGP-NEXT: s_setpc_b64 s[30:31] %result = sdiv <2 x i32> %num, ret <2 x i32> %result @@ -665,28 +607,24 @@ define i32 @v_sdiv_i32_pow2_shl_denom(i32 %x, i32 %y) { ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v2 ; CHECK-NEXT: v_xor_b32_e32 v1, v1, v3 ; CHECK-NEXT: v_cvt_f32_u32_e32 v4, v1 +; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v1 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 +; CHECK-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 ; CHECK-NEXT: v_cvt_u32_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_lo_u32 v5, v4, v1 -; CHECK-NEXT: v_mul_hi_u32 v6, v4, v1 -; CHECK-NEXT: v_sub_i32_e32 v7, vcc, 0, v5 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc -; CHECK-NEXT: v_mul_hi_u32 v5, v5, v4 -; CHECK-NEXT: v_add_i32_e64 v6, s[4:5], v4, v5 -; CHECK-NEXT: v_sub_i32_e64 v4, s[4:5], v4, v5 -; CHECK-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v0 +; CHECK-NEXT: v_mul_lo_u32 v5, v5, v4 +; CHECK-NEXT: v_mul_hi_u32 v5, v4, v5 +; CHECK-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; CHECK-NEXT: v_mul_hi_u32 v4, v0, v4 ; CHECK-NEXT: v_mul_lo_u32 v5, v4, v1 ; CHECK-NEXT: v_add_i32_e32 v6, vcc, 1, v4 -; CHECK-NEXT: v_subrev_i32_e32 v7, vcc, 1, v4 -; CHECK-NEXT: v_sub_i32_e32 v8, vcc, v0, v5 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v5 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v8, v1 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v4, v6, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v7, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc +; CHECK-NEXT: v_sub_i32_e64 v5, s[4:5], v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; CHECK-NEXT: v_add_i32_e32 v5, vcc, 1, v4 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v5, vcc ; CHECK-NEXT: v_xor_b32_e32 v1, 
v2, v3 ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 ; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 @@ -718,51 +656,43 @@ define <2 x i32> @v_sdiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) { ; GISEL-NEXT: v_xor_b32_e32 v2, v2, v6 ; GISEL-NEXT: v_xor_b32_e32 v3, v3, v7 ; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v7, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v8, v3 +; GISEL-NEXT: v_sub_i32_e32 v9, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; GISEL-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; GISEL-NEXT: v_mul_f32_e32 v8, 0x4f7ffffe, v8 ; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 -; GISEL-NEXT: v_cvt_u32_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_lo_u32 v8, v6, v2 -; GISEL-NEXT: v_mul_hi_u32 v9, v6, v2 -; GISEL-NEXT: v_mul_lo_u32 v10, v7, v3 -; GISEL-NEXT: v_mul_hi_u32 v11, v7, v3 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, 0, v8 -; GISEL-NEXT: v_sub_i32_e32 v13, vcc, 0, v10 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v8, v8, v12, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11 -; GISEL-NEXT: v_cndmask_b32_e64 v9, v10, v13, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v8, v8, v6 -; GISEL-NEXT: v_mul_hi_u32 v9, v9, v7 -; GISEL-NEXT: v_add_i32_e64 v10, s[6:7], v6, v8 -; GISEL-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v8 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v7, v9 -; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v7, v7, v8, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v0 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_lo_u32 v9, v9, v8 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v9, v8, v9 +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 
v6, v7 +; GISEL-NEXT: v_add_i32_e32 v7, vcc, v8, v9 +; GISEL-NEXT: v_mul_hi_u32 v6, v0, v6 +; GISEL-NEXT: v_mul_hi_u32 v7, v1, v7 ; GISEL-NEXT: v_mul_lo_u32 v8, v6, v2 ; GISEL-NEXT: v_add_i32_e32 v9, vcc, 1, v6 -; GISEL-NEXT: v_subrev_i32_e32 v10, vcc, 1, v6 -; GISEL-NEXT: v_mul_lo_u32 v11, v7, v3 -; GISEL-NEXT: v_add_i32_e32 v12, vcc, 1, v7 -; GISEL-NEXT: v_subrev_i32_e32 v13, vcc, 1, v7 -; GISEL-NEXT: v_sub_i32_e32 v14, vcc, v0, v8 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v8 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v11 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v11 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v14, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v6, v9, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v7, v12, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v10, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v13, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v10, v7, v3 +; GISEL-NEXT: v_add_i32_e32 v11, vcc, 1, v7 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v8 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v10 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v9, vcc +; GISEL-NEXT: v_sub_i32_e64 v8, s[4:5], v0, v2 +; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v7, v7, v11, s[4:5] +; GISEL-NEXT: v_sub_i32_e64 v9, s[6:7], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v8, vcc +; GISEL-NEXT: v_add_i32_e32 v8, vcc, 1, v6 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v9, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v9, vcc, 1, v7 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v6, v8, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v7, v9, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v5 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 @@ -867,28 +797,24 @@ define i32 @v_sdiv_i32_24bit(i32 %num, 
i32 %den) { ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v1 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_lo_u32 v5, v4, v1 -; GISEL-NEXT: v_mul_hi_u32 v6, v4, v1 -; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v5 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v6 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v4 -; GISEL-NEXT: v_add_i32_e64 v6, s[4:5], v4, v5 -; GISEL-NEXT: v_sub_i32_e64 v4, s[4:5], v4, v5 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 ; GISEL-NEXT: v_mul_lo_u32 v5, v4, v1 ; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 v7, vcc, 1, v4 -; GISEL-NEXT: v_sub_i32_e32 v8, vcc, v0, v5 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v5 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v8, v1 -; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v6, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v7, v0, vcc +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc +; GISEL-NEXT: v_sub_i32_e64 v5, s[4:5], v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GISEL-NEXT: v_add_i32_e32 v5, vcc, 1, v4 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v5, vcc ; GISEL-NEXT: v_xor_b32_e32 v1, v2, v3 ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v1 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 @@ -958,51 +884,43 @@ define <2 x i32> @v_sdiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) { ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v6 ; GISEL-NEXT: 
v_xor_b32_e32 v3, v3, v7 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v10, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v11, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v13, vcc, 0, v10 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v12, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v10, v13, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v10, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v10, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 ; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 
v10, vcc, 1, v4 -; GISEL-NEXT: v_mul_lo_u32 v11, v5, v3 -; GISEL-NEXT: v_add_i32_e32 v12, vcc, 1, v5 -; GISEL-NEXT: v_subrev_i32_e32 v13, vcc, 1, v5 -; GISEL-NEXT: v_sub_i32_e32 v14, vcc, v0, v6 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v11 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v11 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v14, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v5, v12, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v10, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v13, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v10, v5, v3 +; GISEL-NEXT: v_add_i32_e32 v11, vcc, 1, v5 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v10 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; GISEL-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v11, s[4:5] +; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v8 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v9 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v8 diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll index cc70c96c18c3..1813c33019ae 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll @@ -166,29 +166,25 @@ define i64 
@v_sdiv_i64(i64 %num, i64 %den) { ; CHECK-NEXT: s_cbranch_execz BB0_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v2 +; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v2 +; CHECK-NEXT: v_mov_b32_e32 v5, 0 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v3, v1, v2 -; CHECK-NEXT: v_mul_hi_u32 v4, v1, v2 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v1 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v1, v3 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v3, v3, v1 +; CHECK-NEXT: v_mul_hi_u32 v3, v1, v3 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v3 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v3, v1, v2 ; CHECK-NEXT: v_add_i32_e32 v4, vcc, 1, v1 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v1 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v2 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v1, v4, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v4, v5, v0, vcc -; CHECK-NEXT: v_mov_b32_e32 v5, 0 +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v4, v1, v3, vcc ; CHECK-NEXT: BB0_4: ; CHECK-NEXT: s_or_b64 exec, exec, s[6:7] ; CHECK-NEXT: v_mov_b32_e32 v0, v4 @@ -369,28 +365,24 @@ define amdgpu_ps i64 @s_sdiv_i64(i64 inreg %num, i64 inreg %den) { ; 
CHECK-NEXT: s_cbranch_scc0 BB1_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v0, s4 +; CHECK-NEXT: s_sub_i32 s0, 0, s4 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CHECK-NEXT: v_cvt_u32_f32_e32 v0, v0 +; CHECK-NEXT: v_mul_lo_u32 v1, s0, v0 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 +; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_mul_hi_u32 v0, s2, v0 ; CHECK-NEXT: v_mul_lo_u32 v1, v0, s4 -; CHECK-NEXT: v_mul_hi_u32 v2, v0, s4 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 -; CHECK-NEXT: v_add_i32_e64 v2, s[0:1], v0, v1 -; CHECK-NEXT: v_sub_i32_e64 v0, s[0:1], v0, v1 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, s2, v1 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 ; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; CHECK-NEXT: v_mul_hi_u32 v0, v0, s2 -; CHECK-NEXT: v_mul_lo_u32 v1, v0, s4 +; CHECK-NEXT: v_subrev_i32_e64 v2, s[0:1], s4, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc ; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; CHECK-NEXT: v_subrev_i32_e32 v3, vcc, 1, v0 -; CHECK-NEXT: v_sub_i32_e32 v4, vcc, s2, v1 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, s2, v1 -; CHECK-NEXT: v_cmp_le_u32_e64 s[0:1], s4, v4 -; CHECK-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; CHECK-NEXT: BB1_4: ; CHECK-NEXT: v_readfirstlane_b32 s0, v0 ; CHECK-NEXT: s_mov_b32 s1, s0 @@ -860,28 +852,24 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: s_cbranch_execz BB2_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; 
CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v8, v0 ; CGP-NEXT: v_mul_lo_u32 v1, v0, v4 -; CGP-NEXT: v_mul_hi_u32 v5, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v9, vcc -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 +; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v8, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v4 ; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v8 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v4 +; CGP-NEXT: v_sub_i32_e64 v5, s[4:5], v1, v4 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v0 -; CGP-NEXT: v_subrev_i32_e32 v9, vcc, 1, v0 -; CGP-NEXT: v_sub_i32_e32 v10, vcc, v8, v1 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v8, v1 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v10, v4 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v0, v5, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v9, v0, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v4 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB2_4: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -1043,28 +1031,24 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: s_cbranch_execz BB2_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v6 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v6 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v6 -; CGP-NEXT: v_mul_hi_u32 v5, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v4 -; 
CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v4, v3, v6 ; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v3 -; CGP-NEXT: v_subrev_i32_e32 v7, vcc, 1, v3 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v8, v6 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v2, v3, v5, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v4, v7, v2, vcc +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_sub_i32_e64 v4, s[4:5], v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v4, v3, v4, vcc ; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: BB2_8: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -2686,28 +2670,24 @@ define i64 @v_sdiv_i64_pow2_shl_denom(i64 %x, i64 %y) { ; CHECK-NEXT: s_cbranch_execz BB7_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v4 +; CHECK-NEXT: v_sub_i32_e32 v2, vcc, 0, v4 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v2, v1, v4 -; CHECK-NEXT: v_mul_hi_u32 v3, v1, v4 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v1 -; CHECK-NEXT: v_add_i32_e64 v3, s[4:5], v1, v2 -; 
CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v2, v1, v2 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v2, v1, v4 ; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v1 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v1 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v2 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v4 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v1, v3, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v2, v5, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v2, v1, v2, vcc ; CHECK-NEXT: v_mov_b32_e32 v3, 0 ; CHECK-NEXT: BB7_4: ; CHECK-NEXT: s_or_b64 exec, exec, s[6:7] @@ -3180,28 +3160,24 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: s_cbranch_execz BB8_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v10 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v10 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v5, v0 ; CGP-NEXT: v_mul_lo_u32 v1, v0, v10 -; CGP-NEXT: v_mul_hi_u32 v4, v0, v10 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: 
v_add_i32_e64 v4, s[4:5], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 +; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v5, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v10 ; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v5 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v10 +; CGP-NEXT: v_sub_i32_e64 v4, s[4:5], v1, v10 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc ; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v0 -; CGP-NEXT: v_subrev_i32_e32 v6, vcc, 1, v0 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v5, v1 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v5, v1 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v10 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v10 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB8_4: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -3363,28 +3339,24 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: s_cbranch_execz BB8_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v8 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v8 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v8 -; CGP-NEXT: v_mul_hi_u32 v5, v3, v8 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v4, v3, v8 ; CGP-NEXT: v_add_i32_e32 v5, 
vcc, 1, v3 -; CGP-NEXT: v_subrev_i32_e32 v6, vcc, 1, v3 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v8 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v2, v3, v5, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v4, v6, v2, vcc +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; CGP-NEXT: v_sub_i32_e64 v4, s[4:5], v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v4, v3, v4, vcc ; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: BB8_8: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -3403,29 +3375,25 @@ define i64 @v_sdiv_i64_24bit(i64 %num, i64 %den) { ; GISEL-NEXT: s_mov_b32 s4, 0xffffff ; GISEL-NEXT: v_and_b32_e32 v1, s4, v2 ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_and_b32_e32 v0, s4, v0 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 ; GISEL-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; 
GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GISEL-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; GISEL-NEXT: v_mov_b32_e32 v1, 0 ; GISEL-NEXT: s_setpc_b64 s[30:31] ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll index 1d2d57669976..43f79f4b207d 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll @@ -15,28 +15,22 @@ define i32 @v_srem_i32(i32 %num, i32 %den) { ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v3, v1 +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; GISEL-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; GISEL-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; GISEL-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GISEL-NEXT: v_mul_lo_u32 v4, v3, v1 -; GISEL-NEXT: v_mul_hi_u32 v5, v3, v1 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v3 -; GISEL-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; GISEL-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v0 +; GISEL-NEXT: v_mul_lo_u32 v4, v4, v3 +; GISEL-NEXT: v_mul_hi_u32 v4, v3, v4 +; GISEL-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; GISEL-NEXT: v_mul_hi_u32 v3, v0, v3 ; GISEL-NEXT: v_mul_lo_u32 v3, v3, v1 -; GISEL-NEXT: 
v_sub_i32_e32 v4, vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v4, v1 -; GISEL-NEXT: v_add_i32_e64 v5, s[4:5], v4, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v4, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 ; GISEL-NEXT: s_setpc_b64 s[30:31] @@ -88,37 +82,31 @@ declare i32 @llvm.amdgcn.readfirstlane(i32) define amdgpu_ps i32 @s_srem_i32(i32 inreg %num, i32 inreg %den) { ; GISEL-LABEL: s_srem_i32: ; GISEL: ; %bb.0: -; GISEL-NEXT: s_ashr_i32 s4, s0, 31 -; GISEL-NEXT: s_ashr_i32 s2, s1, 31 -; GISEL-NEXT: s_add_i32 s0, s0, s4 -; GISEL-NEXT: s_add_i32 s1, s1, s2 -; GISEL-NEXT: s_xor_b32 s3, s0, s4 -; GISEL-NEXT: s_xor_b32 s2, s1, s2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v0, s2 +; GISEL-NEXT: s_ashr_i32 s2, s0, 31 +; GISEL-NEXT: s_ashr_i32 s3, s1, 31 +; GISEL-NEXT: s_add_i32 s0, s0, s2 +; GISEL-NEXT: s_add_i32 s1, s1, s3 +; GISEL-NEXT: s_xor_b32 s0, s0, s2 +; GISEL-NEXT: s_xor_b32 s1, s1, s3 +; GISEL-NEXT: v_cvt_f32_u32_e32 v0, s1 +; GISEL-NEXT: s_sub_i32 s3, 0, s1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GISEL-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GISEL-NEXT: v_mul_lo_u32 v1, v0, s2 -; GISEL-NEXT: v_mul_hi_u32 v2, v0, s2 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GISEL-NEXT: v_mul_hi_u32 v1, v1, v0 -; GISEL-NEXT: v_add_i32_e64 v2, s[0:1], v0, 
v1 -; GISEL-NEXT: v_sub_i32_e64 v0, s[0:1], v0, v1 -; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GISEL-NEXT: v_mul_hi_u32 v0, v0, s3 -; GISEL-NEXT: v_mul_lo_u32 v0, v0, s2 -; GISEL-NEXT: v_sub_i32_e32 v1, vcc, s3, v0 -; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s2, v1 -; GISEL-NEXT: v_add_i32_e64 v2, s[0:1], s2, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[0:1], s3, v0 -; GISEL-NEXT: v_subrev_i32_e64 v0, s[2:3], s2, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[0:1] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[0:1] -; GISEL-NEXT: v_xor_b32_e32 v0, s4, v0 -; GISEL-NEXT: v_subrev_i32_e32 v0, vcc, s4, v0 +; GISEL-NEXT: v_mul_lo_u32 v1, s3, v0 +; GISEL-NEXT: v_mul_hi_u32 v1, v0, v1 +; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GISEL-NEXT: v_mul_hi_u32 v0, s0, v0 +; GISEL-NEXT: v_mul_lo_u32 v0, v0, s1 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, s0, v0 +; GISEL-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s1, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GISEL-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s1, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GISEL-NEXT: v_xor_b32_e32 v0, s2, v0 +; GISEL-NEXT: v_subrev_i32_e32 v0, vcc, s2, v0 ; GISEL-NEXT: v_readfirstlane_b32 s0, v0 ; GISEL-NEXT: ; return to shader part epilog ; @@ -182,51 +170,39 @@ define <2 x i32> @v_srem_v2i32(<2 x i32> %num, <2 x i32> %den) { ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v6 ; GISEL-NEXT: v_xor_b32_e32 v3, v3, v7 ; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v7, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v8, v3 +; GISEL-NEXT: v_sub_i32_e32 v9, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 -; GISEL-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f7ffffe, v5 +; 
GISEL-NEXT: v_mul_f32_e32 v8, 0x4f7ffffe, v8 ; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_cvt_u32_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v2 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v2 -; GISEL-NEXT: v_mul_lo_u32 v10, v7, v3 -; GISEL-NEXT: v_mul_hi_u32 v11, v7, v3 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, 0, v8 -; GISEL-NEXT: v_sub_i32_e32 v13, vcc, 0, v10 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v8, v8, v12, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11 -; GISEL-NEXT: v_cndmask_b32_e64 v9, v10, v13, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v8, v8, v5 -; GISEL-NEXT: v_mul_hi_u32 v9, v9, v7 -; GISEL-NEXT: v_add_i32_e64 v10, s[6:7], v5, v8 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v8 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v7, v9 -; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v10, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v7, v7, v8, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v0 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v5 +; GISEL-NEXT: v_mul_lo_u32 v9, v9, v8 +; GISEL-NEXT: v_mul_hi_u32 v7, v5, v7 +; GISEL-NEXT: v_mul_hi_u32 v9, v8, v9 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v5, v7 +; GISEL-NEXT: v_add_i32_e32 v7, vcc, v8, v9 +; GISEL-NEXT: v_mul_hi_u32 v5, v0, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v1, v7 ; GISEL-NEXT: v_mul_lo_u32 v5, v5, v2 ; GISEL-NEXT: v_mul_lo_u32 v7, v7, v3 -; GISEL-NEXT: v_sub_i32_e32 v8, vcc, v0, v5 -; GISEL-NEXT: v_sub_i32_e32 v9, vcc, v1, v7 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v8, v2 -; GISEL-NEXT: v_add_i32_e64 v10, s[4:5], v8, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v5 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v8, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v9, v3 -; GISEL-NEXT: v_add_i32_e64 v2, s[8:9], v9, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v7 -; GISEL-NEXT: v_sub_i32_e64 v1, s[10:11], v9, v3 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: 
v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, v9, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v7 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v6 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 @@ -311,34 +287,27 @@ define i32 @v_srem_i32_pow2k_denom(i32 %num) { ; CHECK-LABEL: v_srem_i32_pow2k_denom: ; CHECK: ; %bb.0: ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_movk_i32 s6, 0x1000 +; CHECK-NEXT: s_movk_i32 s4, 0x1000 ; CHECK-NEXT: v_ashrrev_i32_e32 v1, 31, v0 -; CHECK-NEXT: v_mov_b32_e32 v2, 0x1000 +; CHECK-NEXT: v_mov_b32_e32 v2, 0xfffff000 ; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 -; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s6 +; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s4 ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CHECK-NEXT: v_lshlrev_b32_e32 v4, 12, v3 -; CHECK-NEXT: v_mul_hi_u32 v5, v3, s6 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CHECK-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v3 -; CHECK-NEXT: 
v_add_i32_e64 v5, s[4:5], v3, v4 -; CHECK-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v0 -; CHECK-NEXT: v_lshlrev_b32_e32 v3, 12, v3 -; CHECK-NEXT: v_sub_i32_e32 v4, vcc, v0, v3 -; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v4 -; CHECK-NEXT: v_add_i32_e64 v5, s[4:5], v4, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v4, v2 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v3 +; CHECK-NEXT: v_mul_hi_u32 v2, v3, v2 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, v3, v2 +; CHECK-NEXT: v_mul_hi_u32 v2, v0, v2 +; CHECK-NEXT: v_lshlrev_b32_e32 v2, 12, v2 +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CHECK-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CHECK-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 ; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -351,56 +320,43 @@ define <2 x i32> @v_srem_v2i32_pow2k_denom(<2 x i32> %num) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: v_ashrrev_i32_e32 v2, 31, v0 -; GISEL-NEXT: s_add_i32 s10, 0x1000, 0 +; GISEL-NEXT: s_add_i32 s4, 0x1000, 0 ; GISEL-NEXT: v_ashrrev_i32_e32 v3, 31, v1 ; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v4, s10 +; GISEL-NEXT: v_cvt_f32_u32_e32 v4, s4 +; GISEL-NEXT: s_sub_i32 s5, 0, s4 ; GISEL-NEXT: v_add_i32_e32 v1, vcc, v1, v3 ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 +; GISEL-NEXT: 
v_mul_f32_e32 v5, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 ; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_lo_u32 v6, v5, s10 -; GISEL-NEXT: v_mul_hi_u32 v7, v5, s10 -; GISEL-NEXT: v_mul_lo_u32 v8, v4, s10 -; GISEL-NEXT: v_mul_hi_u32 v9, v4, s10 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v5 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v4 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v5, v6 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v4, v7 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v0 -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v1 -; GISEL-NEXT: v_mul_lo_u32 v5, v5, s10 -; GISEL-NEXT: v_mul_lo_u32 v4, v4, s10 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v5 -; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v4 -; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s10, v6 -; GISEL-NEXT: v_add_i32_e64 v8, s[4:5], s10, v6 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v5 -; GISEL-NEXT: v_subrev_i32_e64 v0, s[6:7], s10, v6 -; GISEL-NEXT: v_cmp_le_u32_e64 s[6:7], s10, v7 -; GISEL-NEXT: v_add_i32_e64 v5, s[8:9], s10, v7 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v4 -; GISEL-NEXT: v_subrev_i32_e64 v1, s[10:11], s10, v7 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[8:9] +; GISEL-NEXT: v_mul_lo_u32 v6, s5, v5 +; GISEL-NEXT: v_mul_lo_u32 v7, s5, v4 +; GISEL-NEXT: 
v_mul_hi_u32 v6, v5, v6 +; GISEL-NEXT: v_mul_hi_u32 v7, v4, v7 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v5, v6 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v7 +; GISEL-NEXT: v_mul_hi_u32 v5, v0, v5 +; GISEL-NEXT: v_mul_hi_u32 v4, v1, v4 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, s4 +; GISEL-NEXT: v_mul_lo_u32 v4, v4, s4 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; GISEL-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GISEL-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 @@ -413,61 +369,49 @@ define <2 x i32> @v_srem_v2i32_pow2k_denom(<2 x i32> %num) { ; CGP-NEXT: s_movk_i32 s4, 0x1000 ; CGP-NEXT: v_ashrrev_i32_e32 v2, 31, v0 ; CGP-NEXT: v_mov_b32_e32 v3, 0x1000 -; CGP-NEXT: v_ashrrev_i32_e32 v4, 31, v1 +; CGP-NEXT: s_mov_b32 s5, 0xfffff000 +; CGP-NEXT: v_mov_b32_e32 v4, 0xfffff000 +; CGP-NEXT: v_ashrrev_i32_e32 v5, 31, v1 ; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2 -; CGP-NEXT: v_cvt_f32_u32_e32 v5, s4 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4 -; CGP-NEXT: v_cvt_f32_u32_e32 v6, v3 +; CGP-NEXT: v_cvt_f32_u32_e32 v6, s4 +; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5 +; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v6, v6 -; CGP-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 -; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; CGP-NEXT: 
v_cvt_u32_f32_e32 v5, v5 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 +; CGP-NEXT: v_rcp_iflag_f32_e32 v7, v7 +; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6 -; CGP-NEXT: v_lshlrev_b32_e32 v7, 12, v5 -; CGP-NEXT: v_mul_hi_u32 v8, v5, s4 -; CGP-NEXT: v_lshlrev_b32_e32 v9, 12, v6 -; CGP-NEXT: v_mul_hi_u32 v10, v6, v3 -; CGP-NEXT: v_sub_i32_e32 v11, vcc, 0, v7 -; CGP-NEXT: v_sub_i32_e32 v12, vcc, 0, v9 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v8 -; CGP-NEXT: v_cndmask_b32_e32 v7, v7, v11, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v10 -; CGP-NEXT: v_cndmask_b32_e64 v8, v9, v12, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v7, v7, v5 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v6 -; CGP-NEXT: v_add_i32_e64 v9, s[6:7], v5, v7 -; CGP-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v6, v8 -; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v8 -; CGP-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CGP-NEXT: v_cndmask_b32_e64 v6, v6, v7, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v5, v5, v0 -; CGP-NEXT: v_mul_hi_u32 v6, v6, v1 -; CGP-NEXT: v_lshlrev_b32_e32 v5, 12, v5 +; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 +; CGP-NEXT: v_mul_lo_u32 v8, s5, v6 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v7 +; CGP-NEXT: v_mul_hi_u32 v8, v6, v8 +; CGP-NEXT: v_mul_hi_u32 v4, v7, v4 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v7, v4 +; CGP-NEXT: v_mul_hi_u32 v6, v0, v6 +; CGP-NEXT: v_mul_hi_u32 v4, v1, v4 ; CGP-NEXT: v_lshlrev_b32_e32 v6, 12, v6 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v0, v5 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, v1, v6 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v7, v3 -; CGP-NEXT: v_add_i32_e64 v9, s[4:5], v7, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v5 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v7, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v8, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[8:9], v8, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v6 -; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v8, v3 -; CGP-NEXT: s_and_b64 vcc, 
vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v7, v0, vcc -; CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; CGP-NEXT: v_cndmask_b32_e32 v1, v8, v1, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v9, v0, s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[8:9] +; CGP-NEXT: v_lshlrev_b32_e32 v4, 12, v4 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; CGP-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v6, vcc, v1, v3 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc +; CGP-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v6, vcc, v1, v3 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 ; CGP-NEXT: s_setpc_b64 s[30:31] %result = srem <2 x i32> %num, ret <2 x i32> %result @@ -477,34 +421,27 @@ define i32 @v_srem_i32_oddk_denom(i32 %num) { ; CHECK-LABEL: v_srem_i32_oddk_denom: ; CHECK: ; %bb.0: ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_mov_b32 s6, 0x12d8fb +; CHECK-NEXT: s_mov_b32 s4, 0x12d8fb ; CHECK-NEXT: v_ashrrev_i32_e32 v1, 31, v0 -; CHECK-NEXT: v_mov_b32_e32 v2, 0x12d8fb +; CHECK-NEXT: v_mov_b32_e32 v2, 0xffed2705 ; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 -; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s6 +; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s4 ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CHECK-NEXT: v_mul_lo_u32 v4, v3, s6 
-; CHECK-NEXT: v_mul_hi_u32 v5, v3, s6 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CHECK-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v3 -; CHECK-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CHECK-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v0 -; CHECK-NEXT: v_mul_lo_u32 v3, v3, s6 -; CHECK-NEXT: v_sub_i32_e32 v4, vcc, v0, v3 -; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v4 -; CHECK-NEXT: v_add_i32_e64 v5, s[4:5], v4, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v4, v2 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v3 +; CHECK-NEXT: v_mul_hi_u32 v2, v3, v2 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, v3, v2 +; CHECK-NEXT: v_mul_hi_u32 v2, v0, v2 +; CHECK-NEXT: v_mul_lo_u32 v2, v2, s4 +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CHECK-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CHECK-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v1 ; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -517,56 +454,43 @@ define <2 x i32> @v_srem_v2i32_oddk_denom(<2 x i32> %num) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: v_ashrrev_i32_e32 v2, 31, v0 -; GISEL-NEXT: s_add_i32 s10, 0x12d8fb, 0 +; GISEL-NEXT: s_add_i32 s4, 0x12d8fb, 0 ; GISEL-NEXT: v_ashrrev_i32_e32 v3, 31, v1 ; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v4, s10 +; GISEL-NEXT: v_cvt_f32_u32_e32 v4, s4 +; GISEL-NEXT: s_sub_i32 s5, 0, s4 ; GISEL-NEXT: v_add_i32_e32 v1, vcc, v1, v3 ; GISEL-NEXT: 
v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 +; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 ; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_mul_lo_u32 v6, v5, s10 -; GISEL-NEXT: v_mul_hi_u32 v7, v5, s10 -; GISEL-NEXT: v_mul_lo_u32 v8, v4, s10 -; GISEL-NEXT: v_mul_hi_u32 v9, v4, s10 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v5 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v4 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v5, v6 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v4, v7 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v4, v4, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v0 -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v1 -; GISEL-NEXT: v_mul_lo_u32 v5, v5, s10 -; GISEL-NEXT: v_mul_lo_u32 v4, v4, s10 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v5 -; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v4 -; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s10, v6 -; GISEL-NEXT: v_add_i32_e64 v8, s[4:5], s10, v6 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v5 -; GISEL-NEXT: v_subrev_i32_e64 v0, s[6:7], s10, v6 -; GISEL-NEXT: v_cmp_le_u32_e64 s[6:7], s10, v7 -; GISEL-NEXT: v_add_i32_e64 v5, s[8:9], s10, v7 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v4 -; GISEL-NEXT: v_subrev_i32_e64 v1, s[10:11], s10, v7 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, 
v7, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[8:9] +; GISEL-NEXT: v_mul_lo_u32 v6, s5, v5 +; GISEL-NEXT: v_mul_lo_u32 v7, s5, v4 +; GISEL-NEXT: v_mul_hi_u32 v6, v5, v6 +; GISEL-NEXT: v_mul_hi_u32 v7, v4, v7 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v5, v6 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v7 +; GISEL-NEXT: v_mul_hi_u32 v5, v0, v5 +; GISEL-NEXT: v_mul_hi_u32 v4, v1, v4 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, s4 +; GISEL-NEXT: v_mul_lo_u32 v4, v4, s4 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; GISEL-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GISEL-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 @@ -576,64 +500,52 @@ define <2 x i32> @v_srem_v2i32_oddk_denom(<2 x i32> %num) { ; CGP-LABEL: v_srem_v2i32_oddk_denom: ; CGP: ; %bb.0: ; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CGP-NEXT: s_mov_b32 s8, 0x12d8fb +; CGP-NEXT: s_mov_b32 s4, 0x12d8fb ; CGP-NEXT: v_ashrrev_i32_e32 v2, 31, v0 ; CGP-NEXT: v_mov_b32_e32 v3, 0x12d8fb -; CGP-NEXT: v_ashrrev_i32_e32 v4, 31, v1 +; CGP-NEXT: s_mov_b32 s5, 0xffed2705 +; CGP-NEXT: v_mov_b32_e32 v4, 0xffed2705 +; CGP-NEXT: v_ashrrev_i32_e32 v5, 31, v1 ; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v2 -; CGP-NEXT: v_cvt_f32_u32_e32 v5, s8 -; CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v4 -; CGP-NEXT: v_cvt_f32_u32_e32 v6, v3 +; CGP-NEXT: v_cvt_f32_u32_e32 v6, s4 +; 
CGP-NEXT: v_add_i32_e32 v1, vcc, v1, v5 +; CGP-NEXT: v_cvt_f32_u32_e32 v7, v3 ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v6, v6 -; CGP-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 -; CGP-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; CGP-NEXT: v_cvt_u32_f32_e32 v5, v5 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 +; CGP-NEXT: v_rcp_iflag_f32_e32 v7, v7 +; CGP-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; CGP-NEXT: v_mul_f32_e32 v7, 0x4f7ffffe, v7 ; CGP-NEXT: v_cvt_u32_f32_e32 v6, v6 -; CGP-NEXT: v_mul_lo_u32 v7, v5, s8 -; CGP-NEXT: v_mul_hi_u32 v8, v5, s8 -; CGP-NEXT: v_mul_lo_u32 v9, v6, v3 -; CGP-NEXT: v_mul_hi_u32 v10, v6, v3 -; CGP-NEXT: v_sub_i32_e32 v11, vcc, 0, v7 -; CGP-NEXT: v_sub_i32_e32 v12, vcc, 0, v9 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v8 -; CGP-NEXT: v_cndmask_b32_e32 v7, v7, v11, vcc -; CGP-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v10 -; CGP-NEXT: v_cndmask_b32_e64 v8, v9, v12, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v7, v7, v5 -; CGP-NEXT: v_mul_hi_u32 v8, v8, v6 -; CGP-NEXT: v_add_i32_e64 v9, s[6:7], v5, v7 -; CGP-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; CGP-NEXT: v_add_i32_e64 v7, s[6:7], v6, v8 -; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v8 -; CGP-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CGP-NEXT: v_cndmask_b32_e64 v6, v6, v7, s[4:5] -; CGP-NEXT: v_mul_hi_u32 v5, v5, v0 -; CGP-NEXT: v_mul_hi_u32 v6, v6, v1 -; CGP-NEXT: v_mul_lo_u32 v5, v5, s8 -; CGP-NEXT: v_mul_lo_u32 v6, v6, v3 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v0, v5 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, v1, v6 -; CGP-NEXT: v_cmp_le_u32_e32 vcc, s8, v7 -; CGP-NEXT: v_add_i32_e64 v9, s[4:5], v7, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v5 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v7, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[6:7], v8, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[8:9], v8, v3 -; CGP-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v6 -; CGP-NEXT: v_sub_i32_e64 v1, s[10:11], v8, v3 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: 
v_cndmask_b32_e32 v0, v7, v0, vcc -; CGP-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; CGP-NEXT: v_cndmask_b32_e32 v1, v8, v1, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v9, v0, s[4:5] -; CGP-NEXT: v_cndmask_b32_e64 v1, v5, v1, s[8:9] +; CGP-NEXT: v_cvt_u32_f32_e32 v7, v7 +; CGP-NEXT: v_mul_lo_u32 v8, s5, v6 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v7 +; CGP-NEXT: v_mul_hi_u32 v8, v6, v8 +; CGP-NEXT: v_mul_hi_u32 v4, v7, v4 +; CGP-NEXT: v_add_i32_e32 v6, vcc, v6, v8 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v7, v4 +; CGP-NEXT: v_mul_hi_u32 v6, v0, v6 +; CGP-NEXT: v_mul_hi_u32 v4, v1, v4 +; CGP-NEXT: v_mul_lo_u32 v6, v6, s4 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; CGP-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v6, vcc, v1, v3 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc +; CGP-NEXT: v_subrev_i32_e32 v4, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v6, vcc, v1, v3 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc ; CGP-NEXT: v_xor_b32_e32 v0, v0, v2 -; CGP-NEXT: v_xor_b32_e32 v1, v1, v4 +; CGP-NEXT: v_xor_b32_e32 v1, v1, v5 ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 ; CGP-NEXT: s_setpc_b64 s[30:31] %result = srem <2 x i32> %num, ret <2 x i32> %result @@ -651,28 +563,22 @@ define i32 @v_srem_i32_pow2_shl_denom(i32 %x, i32 %y) { ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v2 ; CHECK-NEXT: v_xor_b32_e32 v1, v1, v3 ; CHECK-NEXT: v_cvt_f32_u32_e32 v3, v1 +; CHECK-NEXT: v_sub_i32_e32 v4, vcc, 0, v1 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CHECK-NEXT: v_cvt_u32_f32_e32 
v3, v3 -; CHECK-NEXT: v_mul_lo_u32 v4, v3, v1 -; CHECK-NEXT: v_mul_hi_u32 v5, v3, v1 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CHECK-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v3 -; CHECK-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CHECK-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v0 +; CHECK-NEXT: v_mul_lo_u32 v4, v4, v3 +; CHECK-NEXT: v_mul_hi_u32 v4, v3, v4 +; CHECK-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CHECK-NEXT: v_mul_hi_u32 v3, v0, v3 ; CHECK-NEXT: v_mul_lo_u32 v3, v3, v1 -; CHECK-NEXT: v_sub_i32_e32 v4, vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v4, v1 -; CHECK-NEXT: v_add_i32_e64 v5, s[4:5], v4, v1 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v4, v1 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CHECK-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CHECK-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc ; CHECK-NEXT: v_xor_b32_e32 v0, v0, v2 ; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -701,51 +607,39 @@ define <2 x i32> @v_srem_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) { ; GISEL-NEXT: v_xor_b32_e32 v2, v2, v6 ; GISEL-NEXT: v_xor_b32_e32 v3, v3, v7 ; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v7, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v8, v3 +; GISEL-NEXT: v_sub_i32_e32 v9, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f800000, v6 -; GISEL-NEXT: 
v_mul_f32_e32 v7, 0x4f800000, v7 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 +; GISEL-NEXT: v_mul_f32_e32 v8, 0x4f7ffffe, v8 ; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 -; GISEL-NEXT: v_cvt_u32_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_lo_u32 v8, v6, v2 -; GISEL-NEXT: v_mul_hi_u32 v9, v6, v2 -; GISEL-NEXT: v_mul_lo_u32 v10, v7, v3 -; GISEL-NEXT: v_mul_hi_u32 v11, v7, v3 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, 0, v8 -; GISEL-NEXT: v_sub_i32_e32 v13, vcc, 0, v10 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v8, v8, v12, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11 -; GISEL-NEXT: v_cndmask_b32_e64 v9, v10, v13, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v8, v8, v6 -; GISEL-NEXT: v_mul_hi_u32 v9, v9, v7 -; GISEL-NEXT: v_add_i32_e64 v10, s[6:7], v6, v8 -; GISEL-NEXT: v_sub_i32_e64 v6, s[6:7], v6, v8 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v7, v9 -; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v7, v7, v8, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v0 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_lo_u32 v9, v9, v8 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v9, v8, v9 +; GISEL-NEXT: v_add_i32_e32 v6, vcc, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v7, vcc, v8, v9 +; GISEL-NEXT: v_mul_hi_u32 v6, v0, v6 +; GISEL-NEXT: v_mul_hi_u32 v7, v1, v7 ; GISEL-NEXT: v_mul_lo_u32 v6, v6, v2 ; GISEL-NEXT: v_mul_lo_u32 v7, v7, v3 -; GISEL-NEXT: v_sub_i32_e32 v8, vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e32 v9, vcc, v1, v7 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v8, v2 -; GISEL-NEXT: v_add_i32_e64 v10, s[4:5], v8, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v8, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v9, v3 -; GISEL-NEXT: v_add_i32_e64 v2, s[8:9], v9, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], 
v1, v7 -; GISEL-NEXT: v_sub_i32_e64 v1, s[10:11], v9, v3 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, v9, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v7 +; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc +; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v5 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 @@ -844,28 +738,22 @@ define i32 @v_srem_i32_24bit(i32 %num, i32 %den) { ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v3, v1 +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; GISEL-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; GISEL-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; GISEL-NEXT: v_cvt_u32_f32_e32 v3, v3 -; GISEL-NEXT: v_mul_lo_u32 v4, v3, v1 -; GISEL-NEXT: v_mul_hi_u32 v5, v3, v1 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v3 -; GISEL-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; GISEL-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v0 +; GISEL-NEXT: v_mul_lo_u32 v4, v4, v3 +; 
GISEL-NEXT: v_mul_hi_u32 v4, v3, v4 +; GISEL-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; GISEL-NEXT: v_mul_hi_u32 v3, v0, v3 ; GISEL-NEXT: v_mul_lo_u32 v3, v3, v1 -; GISEL-NEXT: v_sub_i32_e32 v4, vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v4, v1 -; GISEL-NEXT: v_add_i32_e64 v5, s[4:5], v4, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v4, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v2 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 ; GISEL-NEXT: s_setpc_b64 s[30:31] @@ -930,51 +818,39 @@ define <2 x i32> @v_srem_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) { ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v6 ; GISEL-NEXT: v_xor_b32_e32 v3, v3, v7 ; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v7, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v8, v3 +; GISEL-NEXT: v_sub_i32_e32 v9, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 -; GISEL-NEXT: v_mul_f32_e32 v7, 0x4f800000, v7 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f7ffffe, v5 +; GISEL-NEXT: v_mul_f32_e32 v8, 0x4f7ffffe, v8 ; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_cvt_u32_f32_e32 v7, v7 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v2 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v2 -; GISEL-NEXT: v_mul_lo_u32 v10, v7, v3 -; GISEL-NEXT: v_mul_hi_u32 v11, v7, v3 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, 0, v8 -; GISEL-NEXT: v_sub_i32_e32 v13, vcc, 0, v10 -; 
GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v8, v8, v12, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v11 -; GISEL-NEXT: v_cndmask_b32_e64 v9, v10, v13, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v8, v8, v5 -; GISEL-NEXT: v_mul_hi_u32 v9, v9, v7 -; GISEL-NEXT: v_add_i32_e64 v10, s[6:7], v5, v8 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v8 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v7, v9 -; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v7, v9 -; GISEL-NEXT: v_cndmask_b32_e32 v5, v5, v10, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v7, v7, v8, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v0 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v8, v8 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v5 +; GISEL-NEXT: v_mul_lo_u32 v9, v9, v8 +; GISEL-NEXT: v_mul_hi_u32 v7, v5, v7 +; GISEL-NEXT: v_mul_hi_u32 v9, v8, v9 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v5, v7 +; GISEL-NEXT: v_add_i32_e32 v7, vcc, v8, v9 +; GISEL-NEXT: v_mul_hi_u32 v5, v0, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v1, v7 ; GISEL-NEXT: v_mul_lo_u32 v5, v5, v2 ; GISEL-NEXT: v_mul_lo_u32 v7, v7, v3 -; GISEL-NEXT: v_sub_i32_e32 v8, vcc, v0, v5 -; GISEL-NEXT: v_sub_i32_e32 v9, vcc, v1, v7 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v8, v2 -; GISEL-NEXT: v_add_i32_e64 v10, s[4:5], v8, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v5 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v8, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v9, v3 -; GISEL-NEXT: v_add_i32_e64 v2, s[8:9], v9, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v7 -; GISEL-NEXT: v_sub_i32_e64 v1, s[10:11], v9, v3 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, v9, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v10, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v7 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v0, v2 +; GISEL-NEXT: 
v_sub_i32_e32 v7, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v7, vcc ; GISEL-NEXT: v_xor_b32_e32 v0, v0, v4 ; GISEL-NEXT: v_xor_b32_e32 v1, v1, v6 ; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll index e9a808dce646..438388ebf713 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll @@ -159,36 +159,30 @@ define i64 @v_srem_i64(i64 %num, i64 %den) { ; CHECK-NEXT: v_sub_i32_e32 v4, vcc, v3, v7 ; CHECK-NEXT: v_subb_u32_e32 v5, vcc, v1, v7, vcc ; CHECK-NEXT: BB0_2: ; %Flow -; CHECK-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CHECK-NEXT: s_xor_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CHECK-NEXT: s_xor_b64 exec, exec, s[4:5] ; CHECK-NEXT: s_cbranch_execz BB0_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v2 +; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v2 +; CHECK-NEXT: v_mov_b32_e32 v5, 0 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v3, v1, v2 -; CHECK-NEXT: v_mul_hi_u32 v4, v1, v2 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v1 -; CHECK-NEXT: v_mov_b32_e32 v5, 0 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v1, v3 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v3 -; CHECK-NEXT: 
v_cndmask_b32_e32 v1, v1, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v3, v3, v1 +; CHECK-NEXT: v_mul_hi_u32 v3, v1, v3 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v3 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v1, v1, v2 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v3, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v1 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v3, v2 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v2 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v4, v4, v0, s[4:5] +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v2 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v2 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v4, v0, v1, vcc ; CHECK-NEXT: BB0_4: -; CHECK-NEXT: s_or_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_b64 exec, exec, s[4:5] ; CHECK-NEXT: v_mov_b32_e32 v0, v4 ; CHECK-NEXT: v_mov_b32_e32 v1, v5 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -365,28 +359,22 @@ define amdgpu_ps i64 @s_srem_i64(i64 inreg %num, i64 inreg %den) { ; CHECK-NEXT: s_cbranch_scc0 BB1_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v0, s4 +; CHECK-NEXT: s_sub_i32 s0, 0, s4 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CHECK-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CHECK-NEXT: v_mul_lo_u32 v1, v0, s4 -; CHECK-NEXT: v_mul_hi_u32 v2, v0, s4 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 -; CHECK-NEXT: v_add_i32_e64 v2, s[0:1], v0, v1 -; CHECK-NEXT: v_sub_i32_e64 v0, s[0:1], v0, v1 -; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; CHECK-NEXT: 
v_mul_hi_u32 v0, v0, s2 +; CHECK-NEXT: v_mul_lo_u32 v1, s0, v0 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 +; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_mul_hi_u32 v0, s2, v0 ; CHECK-NEXT: v_mul_lo_u32 v0, v0, s4 -; CHECK-NEXT: v_sub_i32_e32 v1, vcc, s2, v0 -; CHECK-NEXT: v_add_i32_e64 v2, s[0:1], s4, v1 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[0:1], s2, v0 -; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 -; CHECK-NEXT: v_subrev_i32_e64 v0, s[2:3], s4, v1 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[0:1] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[0:1] +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, s2, v0 +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CHECK-NEXT: BB1_4: ; CHECK-NEXT: v_readfirstlane_b32 s0, v0 ; CHECK-NEXT: s_mov_b32 s1, s0 @@ -845,36 +833,30 @@ define <2 x i64> @v_srem_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v1, v11 ; CGP-NEXT: v_subb_u32_e32 v1, vcc, v5, v11, vcc ; CGP-NEXT: BB2_2: ; %Flow2 -; CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, s[4:5] ; CGP-NEXT: s_cbranch_execz BB2_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v4 -; CGP-NEXT: v_mul_hi_u32 v5, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v9, vcc -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v0, v1 -; 
CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v8 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v8, v0 ; CGP-NEXT: v_mul_lo_u32 v0, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, v8, v0 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v1, v4 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v8, v0 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v4 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v1, v4 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v8, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB2_4: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_or_b32_e32 v5, v3, v7 ; CGP-NEXT: v_mov_b32_e32 v4, 0 ; CGP-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[4:5] @@ -1026,36 +1008,30 @@ define <2 x i64> @v_srem_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: v_sub_i32_e32 v4, vcc, v4, v9 ; CGP-NEXT: v_subb_u32_e32 v5, vcc, v3, v9, vcc ; CGP-NEXT: BB2_6: ; %Flow -; CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, s[4:5] ; CGP-NEXT: s_cbranch_execz BB2_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v6 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v6 +; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v6 -; CGP-NEXT: 
v_mul_hi_u32 v5, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v4 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v3, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v4, vcc, v2, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v4, v6 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v2, v3 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v4, v6 -; CGP-NEXT: v_sub_i32_e64 v2, s[6:7], v4, v6 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc -; CGP-NEXT: v_cndmask_b32_e64 v4, v5, v2, s[4:5] -; CGP-NEXT: v_mov_b32_e32 v5, 0 +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v6 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v3, vcc +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v6 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v4, v2, v3, vcc ; CGP-NEXT: BB2_8: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_mov_b32_e32 v2, v4 ; CGP-NEXT: v_mov_b32_e32 v3, v5 ; CGP-NEXT: s_setpc_b64 s[30:31] @@ -2651,36 +2627,30 @@ define i64 @v_srem_i64_pow2_shl_denom(i64 %x, i64 %y) { ; CHECK-NEXT: v_sub_i32_e32 v2, vcc, v2, v7 ; CHECK-NEXT: v_subb_u32_e32 v3, vcc, v1, v7, vcc ; CHECK-NEXT: BB7_2: ; %Flow -; CHECK-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CHECK-NEXT: s_xor_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CHECK-NEXT: s_xor_b64 exec, exec, s[4:5] ; CHECK-NEXT: s_cbranch_execz BB7_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v4 +; CHECK-NEXT: v_sub_i32_e32 v2, vcc, 0, v4 +; CHECK-NEXT: 
v_mov_b32_e32 v3, 0 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v2, v1, v4 -; CHECK-NEXT: v_mul_hi_u32 v3, v1, v4 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v1 -; CHECK-NEXT: v_add_i32_e64 v3, s[4:5], v1, v2 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v2, v1, v2 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v1, v1, v4 -; CHECK-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 -; CHECK-NEXT: v_add_i32_e64 v3, s[4:5], v2, v4 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v1 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v2, v4 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v2, v4 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v2, v3, v0, s[4:5] -; CHECK-NEXT: v_mov_b32_e32 v3, 0 +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v2, v0, v1, vcc ; CHECK-NEXT: BB7_4: -; CHECK-NEXT: s_or_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_b64 exec, exec, s[4:5] ; CHECK-NEXT: v_mov_b32_e32 v0, v2 ; CHECK-NEXT: v_mov_b32_e32 v1, v3 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -3139,36 +3109,30 @@ define <2 x i64> @v_srem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: v_sub_i32_e32 v0, vcc, v1, v11 ; CGP-NEXT: v_subb_u32_e32 v1, vcc, v4, v11, vcc ; CGP-NEXT: BB8_2: ; %Flow2 -; 
CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, s[4:5] ; CGP-NEXT: s_cbranch_execz BB8_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v10 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v10 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v10 -; CGP-NEXT: v_mul_hi_u32 v4, v0, v10 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v5 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v5, v0 ; CGP-NEXT: v_mul_lo_u32 v0, v0, v10 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, v5, v0 -; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v1, v10 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v5, v0 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v10 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v1, v10 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v5, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v10 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v10 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v10 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v10 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB8_4: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_or_b32_e32 v5, v3, v9 ; CGP-NEXT: v_mov_b32_e32 v4, 0 ; CGP-NEXT: v_cmp_eq_u64_e32 vcc, 0, 
v[4:5] @@ -3320,36 +3284,30 @@ define <2 x i64> @v_srem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: v_sub_i32_e32 v4, vcc, v4, v9 ; CGP-NEXT: v_subb_u32_e32 v5, vcc, v3, v9, vcc ; CGP-NEXT: BB8_6: ; %Flow -; CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, s[4:5] ; CGP-NEXT: s_cbranch_execz BB8_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v8 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v8 +; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v8 -; CGP-NEXT: v_mul_hi_u32 v5, v3, v8 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v3, v3, v8 -; CGP-NEXT: v_sub_i32_e32 v4, vcc, v2, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v4, v8 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v2, v3 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v4, v8 -; CGP-NEXT: v_sub_i32_e64 v2, s[6:7], v4, v8 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc -; CGP-NEXT: v_cndmask_b32_e64 v4, v5, v2, s[4:5] -; CGP-NEXT: v_mov_b32_e32 v5, 0 +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v8 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v3, vcc +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v8 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; 
CGP-NEXT: v_cndmask_b32_e32 v4, v2, v3, vcc ; CGP-NEXT: BB8_8: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_mov_b32_e32 v2, v4 ; CGP-NEXT: v_mov_b32_e32 v3, v5 ; CGP-NEXT: s_setpc_b64 s[30:31] @@ -3365,29 +3323,23 @@ define i64 @v_srem_i64_24bit(i64 %num, i64 %den) { ; GISEL-NEXT: s_mov_b32 s4, 0xffffff ; GISEL-NEXT: v_and_b32_e32 v1, s4, v2 ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_and_b32_e32 v0, s4, v0 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v2, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: 
v_cndmask_b32_e32 v0, v0, v2, vcc ; GISEL-NEXT: v_mov_b32_e32 v1, 0 ; GISEL-NEXT: s_setpc_b64 s[30:31] ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll index 336305347f53..25eafb45f930 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll @@ -9,28 +9,24 @@ define i32 @v_udiv_i32(i32 %num, i32 %den) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 ; GISEL-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; GISEL-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, 
vcc +; GISEL-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_udiv_i32: @@ -75,28 +71,24 @@ define amdgpu_ps i32 @s_udiv_i32(i32 inreg %num, i32 inreg %den) { ; GISEL-LABEL: s_udiv_i32: ; GISEL: ; %bb.0: ; GISEL-NEXT: v_cvt_f32_u32_e32 v0, s1 +; GISEL-NEXT: s_sub_i32 s2, 0, s1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GISEL-NEXT: v_cvt_u32_f32_e32 v0, v0 +; GISEL-NEXT: v_mul_lo_u32 v1, s2, v0 +; GISEL-NEXT: v_mul_hi_u32 v1, v0, v1 +; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GISEL-NEXT: v_mul_hi_u32 v0, s0, v0 ; GISEL-NEXT: v_mul_lo_u32 v1, v0, s1 -; GISEL-NEXT: v_mul_hi_u32 v2, v0, s1 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GISEL-NEXT: v_mul_hi_u32 v1, v1, v0 -; GISEL-NEXT: v_add_i32_e64 v2, s[2:3], v0, v1 -; GISEL-NEXT: v_sub_i32_e64 v0, s[2:3], v0, v1 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, s0, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s1, v1 ; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GISEL-NEXT: v_mul_hi_u32 v0, v0, s0 -; GISEL-NEXT: v_mul_lo_u32 v1, v0, s1 +; GISEL-NEXT: v_subrev_i32_e64 v2, s[2:3], s1, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc ; GISEL-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; GISEL-NEXT: v_subrev_i32_e32 v3, vcc, 1, v0 -; GISEL-NEXT: v_sub_i32_e32 v4, vcc, s0, v1 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, s0, v1 -; GISEL-NEXT: v_cmp_le_u32_e64 s[0:1], s1, v4 -; GISEL-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s1, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; GISEL-NEXT: v_readfirstlane_b32 s0, v0 ; GISEL-NEXT: ; return to shader part 
epilog ; @@ -142,51 +134,43 @@ define <2 x i32> @v_udiv_v2i32(<2 x i32> %num, <2 x i32> %den) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 +; GISEL-NEXT: 
v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 ; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 v8, vcc, 1, v4 -; GISEL-NEXT: v_mul_lo_u32 v9, v5, v3 -; GISEL-NEXT: v_add_i32_e32 v10, vcc, 1, v5 -; GISEL-NEXT: v_subrev_i32_e32 v11, vcc, 1, v5 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v12, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v5, v10, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 +; GISEL-NEXT: v_add_i32_e32 v9, vcc, 1, v5 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; GISEL-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] +; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_udiv_v2i32: @@ -255,89 +239,117 @@ define i32 @v_udiv_i32_pow2k_denom(i32 %num) { ; CHECK-LABEL: v_udiv_i32_pow2k_denom: ; CHECK: ; %bb.0: ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_movk_i32 s4, 0x1000 -; CHECK-NEXT: 
v_mov_b32_e32 v1, 0x1000 -; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s4 +; CHECK-NEXT: s_movk_i32 s6, 0x1000 +; CHECK-NEXT: v_mov_b32_e32 v1, 0xfffff000 +; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s6 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CHECK-NEXT: v_lshlrev_b32_e32 v3, 12, v2 -; CHECK-NEXT: v_mul_hi_u32 v4, v2, s4 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v0 -; CHECK-NEXT: v_lshlrev_b32_e32 v3, 12, v2 -; CHECK-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; CHECK-NEXT: v_mul_lo_u32 v1, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v2, v1 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 +; CHECK-NEXT: v_lshlrev_b32_e32 v2, 12, v1 +; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v1 +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; CHECK-NEXT: v_subrev_i32_e64 v2, s[4:5], s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v1 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc ; CHECK-NEXT: s_setpc_b64 s[30:31] %result = udiv i32 %num, 4096 ret i32 %result } define <2 x i32> @v_udiv_v2i32_pow2k_denom(<2 x i32> %num) { -; 
CHECK-LABEL: v_udiv_v2i32_pow2k_denom: -; CHECK: ; %bb.0: -; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_movk_i32 s4, 0x1000 -; CHECK-NEXT: v_mov_b32_e32 v2, 0x1000 -; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s4 -; CHECK-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 -; CHECK-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CHECK-NEXT: v_cvt_u32_f32_e32 v4, v4 -; CHECK-NEXT: v_lshlrev_b32_e32 v5, 12, v3 -; CHECK-NEXT: v_mul_hi_u32 v6, v3, s4 -; CHECK-NEXT: v_lshlrev_b32_e32 v7, 12, v4 -; CHECK-NEXT: v_mul_hi_u32 v8, v4, v2 -; CHECK-NEXT: v_sub_i32_e32 v9, vcc, 0, v5 -; CHECK-NEXT: v_sub_i32_e32 v10, vcc, 0, v7 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; CHECK-NEXT: v_cndmask_b32_e64 v6, v7, v10, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v5, v5, v3 -; CHECK-NEXT: v_mul_hi_u32 v6, v6, v4 -; CHECK-NEXT: v_add_i32_e64 v7, s[6:7], v3, v5 -; CHECK-NEXT: v_sub_i32_e64 v3, s[6:7], v3, v5 -; CHECK-NEXT: v_add_i32_e64 v5, s[6:7], v4, v6 -; CHECK-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v7, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v4, v4, v5, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v0 -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v1 -; CHECK-NEXT: v_lshlrev_b32_e32 v5, 12, v3 -; CHECK-NEXT: v_add_i32_e32 v6, vcc, 1, v3 -; CHECK-NEXT: v_subrev_i32_e32 v7, vcc, 1, v3 -; CHECK-NEXT: v_lshlrev_b32_e32 v8, 12, v4 -; CHECK-NEXT: v_add_i32_e32 v9, vcc, 1, v4 -; CHECK-NEXT: v_subrev_i32_e32 v10, vcc, 1, v4 -; CHECK-NEXT: v_sub_i32_e32 v11, vcc, v0, v5 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v5 -; CHECK-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v8 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v8 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[6:7], v11, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v2 -; CHECK-NEXT: 
s_and_b64 s[6:7], s[6:7], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v3, v6, s[6:7] -; CHECK-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CHECK-NEXT: v_cndmask_b32_e64 v1, v4, v9, s[6:7] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v7, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v1, v10, v1, s[4:5] -; CHECK-NEXT: s_setpc_b64 s[30:31] +; GISEL-LABEL: v_udiv_v2i32_pow2k_denom: +; GISEL: ; %bb.0: +; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GISEL-NEXT: s_movk_i32 s8, 0x1000 +; GISEL-NEXT: v_cvt_f32_u32_e32 v2, s8 +; GISEL-NEXT: s_sub_i32 s4, 0, s8 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 +; GISEL-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_lo_u32 v4, s4, v3 +; GISEL-NEXT: v_mul_lo_u32 v5, s4, v2 +; GISEL-NEXT: v_mul_hi_u32 v4, v3, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v2, v5 +; GISEL-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v5 +; GISEL-NEXT: v_mul_hi_u32 v3, v0, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v1, v2 +; GISEL-NEXT: v_lshlrev_b32_e32 v4, 12, v3 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, 1, v3 +; GISEL-NEXT: v_lshlrev_b32_e32 v6, 12, v2 +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v2 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v6 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GISEL-NEXT: v_subrev_i32_e64 v4, s[4:5], s8, v0 +; GISEL-NEXT: v_cmp_le_u32_e64 s[4:5], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e64 v2, v2, v7, s[4:5] +; GISEL-NEXT: v_subrev_i32_e64 v5, s[6:7], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v5, vcc, 1, v2 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v1 +; GISEL-NEXT: 
v_cndmask_b32_e32 v1, v2, v5, vcc +; GISEL-NEXT: s_setpc_b64 s[30:31] +; +; CGP-LABEL: v_udiv_v2i32_pow2k_denom: +; CGP: ; %bb.0: +; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; CGP-NEXT: s_movk_i32 s4, 0x1000 +; CGP-NEXT: v_mov_b32_e32 v2, 0x1000 +; CGP-NEXT: s_mov_b32 s5, 0xfffff000 +; CGP-NEXT: v_mov_b32_e32 v3, 0xfffff000 +; CGP-NEXT: v_cvt_f32_u32_e32 v4, s4 +; CGP-NEXT: v_cvt_f32_u32_e32 v5, v2 +; CGP-NEXT: v_rcp_iflag_f32_e32 v4, v4 +; CGP-NEXT: v_rcp_iflag_f32_e32 v5, v5 +; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; CGP-NEXT: v_mul_f32_e32 v5, 0x4f7ffffe, v5 +; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4 +; CGP-NEXT: v_cvt_u32_f32_e32 v5, v5 +; CGP-NEXT: v_mul_lo_u32 v6, s5, v4 +; CGP-NEXT: v_mul_lo_u32 v3, v3, v5 +; CGP-NEXT: v_mul_hi_u32 v6, v4, v6 +; CGP-NEXT: v_mul_hi_u32 v3, v5, v3 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v6 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v5, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v0, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v1, v3 +; CGP-NEXT: v_lshlrev_b32_e32 v5, 12, v4 +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; CGP-NEXT: v_lshlrev_b32_e32 v7, 12, v3 +; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v3 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v7 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc +; CGP-NEXT: v_subrev_i32_e64 v5, s[4:5], s4, v0 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v2 +; CGP-NEXT: v_cndmask_b32_e64 v3, v3, v8, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v4 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v6, s[4:5] +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v4, v5, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v1, v3, v6, vcc +; CGP-NEXT: s_setpc_b64 s[30:31] %result = udiv <2 x i32> %num, ret <2 x i32> %result } @@ -347,87 +359,115 @@ define i32 
@v_udiv_i32_oddk_denom(i32 %num) { ; CHECK: ; %bb.0: ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT: s_mov_b32 s6, 0x12d8fb -; CHECK-NEXT: v_cvt_f32_u32_e32 v1, s6 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 -; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v2, v1, s6 -; CHECK-NEXT: v_mul_hi_u32 v3, v1, s6 -; CHECK-NEXT: v_sub_i32_e32 v4, vcc, 0, v2 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v1 -; CHECK-NEXT: v_add_i32_e64 v3, s[4:5], v1, v2 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mov_b32_e32 v1, 0xffed2705 +; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s6 +; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 +; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 +; CHECK-NEXT: v_mul_lo_u32 v1, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v2, v1 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v2, v1, s6 ; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v1 -; CHECK-NEXT: v_subrev_i32_e32 v4, vcc, 1, v1 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, v0, v2 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 -; CHECK-NEXT: v_cmp_le_u32_e64 s[4:5], s6, v5 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v1, v3, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; CHECK-NEXT: v_subrev_i32_e64 v2, s[4:5], s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v1 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc ; CHECK-NEXT: s_setpc_b64 s[30:31] %result = udiv i32 %num, 1235195 ret i32 %result } 
define <2 x i32> @v_udiv_v2i32_oddk_denom(<2 x i32> %num) { -; CHECK-LABEL: v_udiv_v2i32_oddk_denom: -; CHECK: ; %bb.0: -; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_mov_b32 s8, 0x12d8fb -; CHECK-NEXT: v_mov_b32_e32 v2, 0x12d8fb -; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s8 -; CHECK-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 -; CHECK-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CHECK-NEXT: v_cvt_u32_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_lo_u32 v5, v3, s8 -; CHECK-NEXT: v_mul_hi_u32 v6, v3, s8 -; CHECK-NEXT: v_mul_lo_u32 v7, v4, v2 -; CHECK-NEXT: v_mul_hi_u32 v8, v4, v2 -; CHECK-NEXT: v_sub_i32_e32 v9, vcc, 0, v5 -; CHECK-NEXT: v_sub_i32_e32 v10, vcc, 0, v7 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; CHECK-NEXT: v_cndmask_b32_e64 v6, v7, v10, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v5, v5, v3 -; CHECK-NEXT: v_mul_hi_u32 v6, v6, v4 -; CHECK-NEXT: v_add_i32_e64 v7, s[6:7], v3, v5 -; CHECK-NEXT: v_sub_i32_e64 v3, s[6:7], v3, v5 -; CHECK-NEXT: v_add_i32_e64 v5, s[6:7], v4, v6 -; CHECK-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v7, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v4, v4, v5, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v0 -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v1 -; CHECK-NEXT: v_mul_lo_u32 v5, v3, s8 -; CHECK-NEXT: v_add_i32_e32 v6, vcc, 1, v3 -; CHECK-NEXT: v_subrev_i32_e32 v7, vcc, 1, v3 -; CHECK-NEXT: v_mul_lo_u32 v8, v4, v2 -; CHECK-NEXT: v_add_i32_e32 v9, vcc, 1, v4 -; CHECK-NEXT: v_subrev_i32_e32 v10, vcc, 1, v4 -; CHECK-NEXT: v_sub_i32_e32 v11, vcc, v0, v5 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v5 -; CHECK-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v8 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v8 -; CHECK-NEXT: v_cmp_le_u32_e64 s[6:7], s8, v11 -; CHECK-NEXT: 
v_cmp_ge_u32_e64 s[8:9], v0, v2 -; CHECK-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v3, v6, s[6:7] -; CHECK-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; CHECK-NEXT: v_cndmask_b32_e64 v1, v4, v9, s[6:7] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v7, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v1, v10, v1, s[4:5] -; CHECK-NEXT: s_setpc_b64 s[30:31] +; GISEL-LABEL: v_udiv_v2i32_oddk_denom: +; GISEL: ; %bb.0: +; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GISEL-NEXT: s_mov_b32 s8, 0x12d8fb +; GISEL-NEXT: v_cvt_f32_u32_e32 v2, s8 +; GISEL-NEXT: s_sub_i32 s4, 0, s8 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 +; GISEL-NEXT: v_cvt_u32_f32_e32 v3, v3 +; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_lo_u32 v4, s4, v3 +; GISEL-NEXT: v_mul_lo_u32 v5, s4, v2 +; GISEL-NEXT: v_mul_hi_u32 v4, v3, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v2, v5 +; GISEL-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v5 +; GISEL-NEXT: v_mul_hi_u32 v3, v0, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v1, v2 +; GISEL-NEXT: v_mul_lo_u32 v4, v3, s8 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, 1, v3 +; GISEL-NEXT: v_mul_lo_u32 v6, v2, s8 +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v2 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v6 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; GISEL-NEXT: v_subrev_i32_e64 v4, s[4:5], s8, v0 +; GISEL-NEXT: v_cmp_le_u32_e64 s[4:5], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e64 v2, v2, v7, s[4:5] +; GISEL-NEXT: v_subrev_i32_e64 v5, s[6:7], s8, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v5, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v5, vcc, 1, v2 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s8, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc +; GISEL-NEXT: 
v_cmp_le_u32_e32 vcc, s8, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v2, v5, vcc +; GISEL-NEXT: s_setpc_b64 s[30:31] +; +; CGP-LABEL: v_udiv_v2i32_oddk_denom: +; CGP: ; %bb.0: +; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; CGP-NEXT: s_mov_b32 s4, 0x12d8fb +; CGP-NEXT: v_mov_b32_e32 v2, 0x12d8fb +; CGP-NEXT: s_mov_b32 s5, 0xffed2705 +; CGP-NEXT: v_cvt_f32_u32_e32 v3, s4 +; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2 +; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 +; CGP-NEXT: v_rcp_iflag_f32_e32 v4, v4 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 +; CGP-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 +; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4 +; CGP-NEXT: v_mul_lo_u32 v5, s5, v3 +; CGP-NEXT: v_mul_lo_u32 v6, s5, v4 +; CGP-NEXT: v_mul_hi_u32 v5, v3, v5 +; CGP-NEXT: v_mul_hi_u32 v6, v4, v6 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v5 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v6 +; CGP-NEXT: v_mul_hi_u32 v3, v0, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v1, v4 +; CGP-NEXT: v_mul_lo_u32 v5, v3, s4 +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v3 +; CGP-NEXT: v_mul_lo_u32 v7, v4, v2 +; CGP-NEXT: v_add_i32_e32 v8, vcc, 1, v4 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v5 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v7 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v6, vcc +; CGP-NEXT: v_subrev_i32_e64 v5, s[4:5], s4, v0 +; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v2 +; CGP-NEXT: v_cndmask_b32_e64 v4, v4, v8, s[4:5] +; CGP-NEXT: v_sub_i32_e64 v6, s[6:7], v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc +; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v3 +; CGP-NEXT: v_cndmask_b32_e64 v1, v1, v6, s[4:5] +; CGP-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CGP-NEXT: v_cndmask_b32_e32 v0, v3, v5, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v1, v4, v6, vcc +; CGP-NEXT: s_setpc_b64 s[30:31] %result = udiv <2 x i32> %num, ret <2 x i32> %result } @@ -438,28 +478,24 @@ define i32 
@v_udiv_i32_pow2_shl_denom(i32 %x, i32 %y) { ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT: v_lshl_b32_e32 v1, 0x1000, v1 ; CHECK-NEXT: v_cvt_f32_u32_e32 v2, v1 +; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_lo_u32 v3, v2, v1 -; CHECK-NEXT: v_mul_hi_u32 v4, v2, v1 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v0 +; CHECK-NEXT: v_mul_lo_u32 v3, v3, v2 +; CHECK-NEXT: v_mul_hi_u32 v3, v2, v3 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; CHECK-NEXT: v_mul_hi_u32 v2, v0, v2 ; CHECK-NEXT: v_mul_lo_u32 v3, v2, v1 ; CHECK-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CHECK-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; CHECK-NEXT: s_setpc_b64 s[30:31] %shl.y = shl i32 4096, %y %r = udiv i32 %x, %shl.y @@ -474,51 +510,43 @@ define <2 x i32> @v_udiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) { ; GISEL-NEXT: v_lshl_b32_e32 v2, 
s4, v2 ; GISEL-NEXT: v_lshl_b32_e32 v3, s4, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 ; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; GISEL-NEXT: 
v_subrev_i32_e32 v8, vcc, 1, v4 -; GISEL-NEXT: v_mul_lo_u32 v9, v5, v3 -; GISEL-NEXT: v_add_i32_e32 v10, vcc, 1, v5 -; GISEL-NEXT: v_subrev_i32_e32 v11, vcc, 1, v5 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v12, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v5, v10, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 +; GISEL-NEXT: v_add_i32_e32 v9, vcc, 1, v5 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; GISEL-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] +; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_udiv_v2i32_pow2_shl_denom: @@ -595,28 +623,24 @@ define i32 @v_udiv_i32_24bit(i32 %num, i32 %den) { ; GISEL-NEXT: v_and_b32_e32 v0, s4, v0 ; GISEL-NEXT: v_and_b32_e32 v1, s4, v1 ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: 
v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 ; GISEL-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; GISEL-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GISEL-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_udiv_i32_24bit: @@ -669,51 +693,43 @@ define <2 x i32> @v_udiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) { ; GISEL-NEXT: v_and_b32_e32 v2, s4, v2 ; GISEL-NEXT: v_and_b32_e32 v3, s4, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: 
v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f7ffffe, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, 0x4f7ffffe, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 ; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v4 -; GISEL-NEXT: v_subrev_i32_e32 v8, vcc, 1, v4 -; GISEL-NEXT: v_mul_lo_u32 v9, v5, v3 -; GISEL-NEXT: v_add_i32_e32 v10, vcc, 1, v5 -; GISEL-NEXT: v_subrev_i32_e32 v11, vcc, 1, v5 -; GISEL-NEXT: v_sub_i32_e32 v12, vcc, v0, v6 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v6 -; GISEL-NEXT: v_sub_i32_e64 v0, s[4:5], v1, v9 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v9 -; 
GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v12, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v0, v3 -; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v7, s[6:7] -; GISEL-NEXT: s_and_b64 s[6:7], s[8:9], s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v5, v10, s[6:7] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v1, v11, v1, s[4:5] +; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 +; GISEL-NEXT: v_add_i32_e32 v9, vcc, 1, v5 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v6 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v8 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc +; GISEL-NEXT: v_sub_i32_e64 v6, s[4:5], v0, v2 +; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v9, s[4:5] +; GISEL-NEXT: v_sub_i32_e64 v7, s[6:7], v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc +; GISEL-NEXT: v_add_i32_e32 v6, vcc, 1, v4 +; GISEL-NEXT: v_cndmask_b32_e64 v1, v1, v7, s[4:5] +; GISEL-NEXT: v_add_i32_e32 v7, vcc, 1, v5 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v4, v6, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v5, v7, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_udiv_v2i32_24bit: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll index 219bcce04da1..e956af93bc6f 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll @@ -151,28 +151,24 @@ define i64 @v_udiv_i64(i64 %num, i64 %den) { ; CHECK-NEXT: s_cbranch_execz BB0_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v2 +; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v2 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v3, v1, v2 -; CHECK-NEXT: 
v_mul_hi_u32 v4, v1, v2 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v1 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v1, v3 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v3, v3, v1 +; CHECK-NEXT: v_mul_hi_u32 v3, v1, v3 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v3 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v3, v1, v2 ; CHECK-NEXT: v_add_i32_e32 v4, vcc, 1, v1 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v1 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v2 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v1, v4, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v4, v5, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; CHECK-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v4, v1, v3, vcc ; CHECK-NEXT: v_mov_b32_e32 v5, 0 ; CHECK-NEXT: BB0_4: ; CHECK-NEXT: s_or_b64 exec, exec, s[6:7] @@ -335,28 +331,24 @@ define amdgpu_ps i64 @s_udiv_i64(i64 inreg %num, i64 inreg %den) { ; CHECK-NEXT: s_cbranch_scc0 BB1_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v0, s2 +; CHECK-NEXT: s_sub_i32 s1, 0, s2 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CHECK-NEXT: v_cvt_u32_f32_e32 v0, v0 +; CHECK-NEXT: v_mul_lo_u32 v1, s1, v0 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 +; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_mul_hi_u32 v0, s0, v0 ; CHECK-NEXT: 
v_mul_lo_u32 v1, v0, s2 -; CHECK-NEXT: v_mul_hi_u32 v2, v0, s2 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 -; CHECK-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 -; CHECK-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v0 +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, s0, v1 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s2, v1 ; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; CHECK-NEXT: v_mul_hi_u32 v0, v0, s0 -; CHECK-NEXT: v_mul_lo_u32 v1, v0, s2 +; CHECK-NEXT: v_subrev_i32_e64 v2, s[0:1], s2, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc ; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v0 -; CHECK-NEXT: v_subrev_i32_e32 v3, vcc, 1, v0 -; CHECK-NEXT: v_sub_i32_e32 v4, vcc, s0, v1 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, s0, v1 -; CHECK-NEXT: v_cmp_le_u32_e64 s[0:1], s2, v4 -; CHECK-NEXT: s_and_b64 s[0:1], s[0:1], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v0, v2, s[0:1] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s2, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; CHECK-NEXT: BB1_4: ; CHECK-NEXT: v_readfirstlane_b32 s0, v0 ; CHECK-NEXT: s_mov_b32 s1, s0 @@ -781,28 +773,24 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: s_cbranch_execz BB2_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v8, v0 ; CGP-NEXT: v_mul_lo_u32 v1, v0, v4 -; CGP-NEXT: v_mul_hi_u32 v5, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v9, vcc -; 
CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 +; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v8, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v4 ; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v8 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v4 +; CGP-NEXT: v_sub_i32_e64 v5, s[4:5], v1, v4 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v0 -; CGP-NEXT: v_subrev_i32_e32 v9, vcc, 1, v0 -; CGP-NEXT: v_sub_i32_e32 v10, vcc, v8, v1 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v8, v1 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v10, v4 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v0, v5, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v9, v0, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v4 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB2_4: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -949,28 +937,24 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: s_cbranch_execz BB2_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v6 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v6 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v6 -; CGP-NEXT: v_mul_hi_u32 v5, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v4 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v4, v3, v6 
; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v3 -; CGP-NEXT: v_subrev_i32_e32 v7, vcc, 1, v3 -; CGP-NEXT: v_sub_i32_e32 v8, vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v8, v6 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v2, v3, v5, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v4, v7, v2, vcc +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; CGP-NEXT: v_sub_i32_e64 v4, s[4:5], v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v4, v3, v4, vcc ; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: BB2_8: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -2453,28 +2437,24 @@ define i64 @v_udiv_i64_pow2_shl_denom(i64 %x, i64 %y) { ; CHECK-NEXT: s_cbranch_execz BB7_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v4 +; CHECK-NEXT: v_sub_i32_e32 v2, vcc, 0, v4 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v2, v1, v4 -; CHECK-NEXT: v_mul_hi_u32 v3, v1, v4 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v1 -; CHECK-NEXT: v_add_i32_e64 v3, s[4:5], v1, v2 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v2, v1, v2 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v2, v1, v4 ; CHECK-NEXT: v_add_i32_e32 v3, vcc, 1, v1 -; CHECK-NEXT: v_subrev_i32_e32 v5, vcc, 1, v1 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v0, v2 -; CHECK-NEXT: 
v_cmp_ge_u32_e32 vcc, v0, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v4 -; CHECK-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v1, v3, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v2, v5, v0, vcc +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CHECK-NEXT: v_add_i32_e32 v2, vcc, 1, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v2, v1, v2, vcc ; CHECK-NEXT: v_mov_b32_e32 v3, 0 ; CHECK-NEXT: BB7_4: ; CHECK-NEXT: s_or_b64 exec, exec, s[6:7] @@ -2902,28 +2882,24 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: s_cbranch_execz BB8_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v10 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v10 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v5, v0 ; CGP-NEXT: v_mul_lo_u32 v1, v0, v10 -; CGP-NEXT: v_mul_hi_u32 v4, v0, v10 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 +; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v5, v1 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v10 ; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v5 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v10 +; CGP-NEXT: v_sub_i32_e64 v4, s[4:5], v1, v10 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc ; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v0 -; CGP-NEXT: v_subrev_i32_e32 
v6, vcc, 1, v0 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v5, v1 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v5, v1 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v10 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v0, v4, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v10 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB8_4: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -3070,28 +3046,24 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: s_cbranch_execz BB8_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v8 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v8 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v8 -; CGP-NEXT: v_mul_hi_u32 v5, v3, v8 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v4, v3, v8 ; CGP-NEXT: v_add_i32_e32 v5, vcc, 1, v3 -; CGP-NEXT: v_subrev_i32_e32 v6, vcc, 1, v3 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v4 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v8 -; CGP-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; CGP-NEXT: v_cndmask_b32_e64 v2, v3, v5, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v4, v6, v2, vcc +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc +; CGP-NEXT: v_sub_i32_e64 v4, 
s[4:5], v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; CGP-NEXT: v_add_i32_e32 v4, vcc, 1, v3 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v4, v3, v4, vcc ; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: BB8_8: ; CGP-NEXT: s_or_b64 exec, exec, s[6:7] @@ -3111,28 +3083,24 @@ define i64 @v_udiv_i64_24bit(i64 %num, i64 %den) { ; GISEL-NEXT: v_and_b32_e32 v0, s4, v0 ; GISEL-NEXT: v_and_b32_e32 v1, s4, v2 ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 ; GISEL-NEXT: v_add_i32_e32 v4, vcc, 1, v2 -; GISEL-NEXT: v_subrev_i32_e32 v5, vcc, 1, v2 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1 -; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v2, v4, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc +; GISEL-NEXT: v_sub_i32_e64 v3, s[4:5], v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GISEL-NEXT: v_add_i32_e32 v3, vcc, 1, v2 +; GISEL-NEXT: 
v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc ; GISEL-NEXT: v_mov_b32_e32 v1, 0 ; GISEL-NEXT: s_setpc_b64 s[30:31] ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll index 265246c5e8ec..68a83a91c62f 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll @@ -9,28 +9,22 @@ define i32 @v_urem_i32(i32 %num, i32 %den) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v2, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; 
GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_urem_i32: @@ -73,28 +67,22 @@ define amdgpu_ps i32 @s_urem_i32(i32 inreg %num, i32 inreg %den) { ; GISEL-LABEL: s_urem_i32: ; GISEL: ; %bb.0: ; GISEL-NEXT: v_cvt_f32_u32_e32 v0, s1 +; GISEL-NEXT: s_sub_i32 s2, 0, s1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; GISEL-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; GISEL-NEXT: v_cvt_u32_f32_e32 v0, v0 -; GISEL-NEXT: v_mul_lo_u32 v1, v0, s1 -; GISEL-NEXT: v_mul_hi_u32 v2, v0, s1 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; GISEL-NEXT: v_mul_hi_u32 v1, v1, v0 -; GISEL-NEXT: v_add_i32_e64 v2, s[2:3], v0, v1 -; GISEL-NEXT: v_sub_i32_e64 v0, s[2:3], v0, v1 -; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; GISEL-NEXT: v_mul_hi_u32 v0, v0, s0 +; GISEL-NEXT: v_mul_lo_u32 v1, s2, v0 +; GISEL-NEXT: v_mul_hi_u32 v1, v0, v1 +; GISEL-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GISEL-NEXT: v_mul_hi_u32 v0, s0, v0 ; GISEL-NEXT: v_mul_lo_u32 v0, v0, s1 -; GISEL-NEXT: v_sub_i32_e32 v1, vcc, s0, v0 -; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s1, v1 -; GISEL-NEXT: v_add_i32_e64 v2, s[2:3], s1, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], s0, v0 -; GISEL-NEXT: v_subrev_i32_e64 v0, s[2:3], s1, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[4:5] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, s0, v0 +; GISEL-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s1, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; GISEL-NEXT: v_subrev_i32_e32 v1, vcc, s1, v0 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s1, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GISEL-NEXT: v_readfirstlane_b32 s0, v0 ; GISEL-NEXT: ; return to shader part 
epilog ; @@ -138,51 +126,40 @@ define <2 x i32> @v_urem_v2i32(<2 x i32> %num, <2 x i32> %den) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: s_mov_b32 s4, 0x4f7ffffe +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, s4, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, s4, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, 
v0, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v4, v4, v2 ; GISEL-NEXT: v_mul_lo_u32 v5, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v4 -; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v5 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v6, v2 -; GISEL-NEXT: v_add_i32_e64 v8, s[4:5], v6, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v4 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v6, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v7, v3 -; GISEL-NEXT: v_add_i32_e64 v2, s[8:9], v7, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v5 -; GISEL-NEXT: v_sub_i32_e64 v1, s[10:11], v7, v3 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_urem_v2i32: @@ -248,88 +225,101 @@ define i32 @v_urem_i32_pow2k_denom(i32 %num) { ; CHECK: ; %bb.0: ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT: s_movk_i32 s4, 0x1000 -; CHECK-NEXT: v_mov_b32_e32 v1, 0x1000 +; CHECK-NEXT: v_mov_b32_e32 v1, 0xfffff000 ; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s4 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CHECK-NEXT: 
v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CHECK-NEXT: v_lshlrev_b32_e32 v3, 12, v2 -; CHECK-NEXT: v_mul_hi_u32 v4, v2, s4 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v0 -; CHECK-NEXT: v_lshlrev_b32_e32 v2, 12, v2 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; CHECK-NEXT: v_mul_lo_u32 v1, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v2, v1 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 +; CHECK-NEXT: v_lshlrev_b32_e32 v1, 12, v1 +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CHECK-NEXT: s_setpc_b64 s[30:31] %result = urem i32 %num, 4096 ret i32 %result } define <2 x i32> @v_urem_v2i32_pow2k_denom(<2 x i32> %num) { -; CHECK-LABEL: v_urem_v2i32_pow2k_denom: -; CHECK: ; %bb.0: -; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_movk_i32 s4, 0x1000 -; CHECK-NEXT: v_mov_b32_e32 v2, 0x1000 -; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s4 -; CHECK-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_f32_e32 v3, 
0x4f800000, v3 -; CHECK-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CHECK-NEXT: v_cvt_u32_f32_e32 v4, v4 -; CHECK-NEXT: v_lshlrev_b32_e32 v5, 12, v3 -; CHECK-NEXT: v_mul_hi_u32 v6, v3, s4 -; CHECK-NEXT: v_lshlrev_b32_e32 v7, 12, v4 -; CHECK-NEXT: v_mul_hi_u32 v8, v4, v2 -; CHECK-NEXT: v_sub_i32_e32 v9, vcc, 0, v5 -; CHECK-NEXT: v_sub_i32_e32 v10, vcc, 0, v7 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; CHECK-NEXT: v_cndmask_b32_e64 v6, v7, v10, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v5, v5, v3 -; CHECK-NEXT: v_mul_hi_u32 v6, v6, v4 -; CHECK-NEXT: v_add_i32_e64 v7, s[6:7], v3, v5 -; CHECK-NEXT: v_sub_i32_e64 v3, s[6:7], v3, v5 -; CHECK-NEXT: v_add_i32_e64 v5, s[6:7], v4, v6 -; CHECK-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v7, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v4, v4, v5, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v0 -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v1 -; CHECK-NEXT: v_lshlrev_b32_e32 v3, 12, v3 -; CHECK-NEXT: v_lshlrev_b32_e32 v4, 12, v4 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, v0, v3 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v1, v4 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v5, v2 -; CHECK-NEXT: v_add_i32_e64 v7, s[4:5], v5, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v5, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[6:7], v6, v2 -; CHECK-NEXT: v_add_i32_e64 v3, s[8:9], v6, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v4 -; CHECK-NEXT: v_sub_i32_e64 v1, s[10:11], v6, v2 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc -; CHECK-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; CHECK-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v7, v0, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[8:9] -; CHECK-NEXT: s_setpc_b64 s[30:31] +; GISEL-LABEL: v_urem_v2i32_pow2k_denom: +; GISEL: ; %bb.0: +; GISEL-NEXT: 
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GISEL-NEXT: s_movk_i32 s4, 0x1000 +; GISEL-NEXT: v_cvt_f32_u32_e32 v2, s4 +; GISEL-NEXT: s_sub_i32 s5, 0, s4 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 +; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_lo_u32 v3, s5, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v3, v0, v2 +; GISEL-NEXT: v_mul_hi_u32 v2, v1, v2 +; GISEL-NEXT: v_lshlrev_b32_e32 v3, 12, v3 +; GISEL-NEXT: v_lshlrev_b32_e32 v2, 12, v2 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v2 +; GISEL-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v3, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; GISEL-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v3, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; GISEL-NEXT: s_setpc_b64 s[30:31] +; +; CGP-LABEL: v_urem_v2i32_pow2k_denom: +; CGP: ; %bb.0: +; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; CGP-NEXT: s_movk_i32 s4, 0x1000 +; CGP-NEXT: v_mov_b32_e32 v2, 0x1000 +; CGP-NEXT: s_mov_b32 s5, 0x4f7ffffe +; CGP-NEXT: s_mov_b32 s6, 0xfffff000 +; CGP-NEXT: v_cvt_f32_u32_e32 v3, s4 +; CGP-NEXT: v_cvt_f32_u32_e32 v4, v2 +; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 +; CGP-NEXT: v_rcp_iflag_f32_e32 v4, v4 +; CGP-NEXT: v_mul_f32_e32 v3, s5, v3 +; CGP-NEXT: v_mul_f32_e32 v4, s5, v4 +; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 +; CGP-NEXT: v_cvt_u32_f32_e32 v4, v4 +; CGP-NEXT: v_mul_lo_u32 v5, s6, v3 +; CGP-NEXT: v_mul_lo_u32 v6, s6, v4 +; CGP-NEXT: v_mul_hi_u32 v5, v3, v5 +; CGP-NEXT: v_mul_hi_u32 v6, v4, v6 +; CGP-NEXT: 
v_add_i32_e32 v3, vcc, v3, v5 +; CGP-NEXT: v_add_i32_e32 v4, vcc, v4, v6 +; CGP-NEXT: v_mul_hi_u32 v3, v0, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v1, v4 +; CGP-NEXT: v_lshlrev_b32_e32 v3, 12, v3 +; CGP-NEXT: v_lshlrev_b32_e32 v4, 12, v4 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v4 +; CGP-NEXT: v_subrev_i32_e32 v3, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v1, v2 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; CGP-NEXT: v_subrev_i32_e32 v3, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v1, v2 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; CGP-NEXT: s_setpc_b64 s[30:31] %result = urem <2 x i32> %num, ret <2 x i32> %result } @@ -338,89 +328,94 @@ define i32 @v_urem_i32_oddk_denom(i32 %num) { ; CHECK-LABEL: v_urem_i32_oddk_denom: ; CHECK: ; %bb.0: ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_mov_b32 s6, 0x12d8fb -; CHECK-NEXT: v_mov_b32_e32 v1, 0x12d8fb -; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s6 +; CHECK-NEXT: s_mov_b32 s4, 0x12d8fb +; CHECK-NEXT: v_mov_b32_e32 v1, 0xffed2705 +; CHECK-NEXT: v_cvt_f32_u32_e32 v2, s4 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_lo_u32 v3, v2, s6 -; CHECK-NEXT: v_mul_hi_u32 v4, v2, s6 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v0 -; CHECK-NEXT: 
v_mul_lo_u32 v2, v2, s6 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s6, v3 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; CHECK-NEXT: v_mul_lo_u32 v1, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v2, v1 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 +; CHECK-NEXT: v_mul_lo_u32 v1, v1, s4 +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s4, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CHECK-NEXT: s_setpc_b64 s[30:31] %result = urem i32 %num, 1235195 ret i32 %result } define <2 x i32> @v_urem_v2i32_oddk_denom(<2 x i32> %num) { -; CHECK-LABEL: v_urem_v2i32_oddk_denom: -; CHECK: ; %bb.0: -; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT: s_mov_b32 s8, 0x12d8fb -; CHECK-NEXT: v_mov_b32_e32 v2, 0x12d8fb -; CHECK-NEXT: v_cvt_f32_u32_e32 v3, s8 -; CHECK-NEXT: v_cvt_f32_u32_e32 v4, v2 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CHECK-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 -; CHECK-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; CHECK-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CHECK-NEXT: v_cvt_u32_f32_e32 v4, v4 -; CHECK-NEXT: v_mul_lo_u32 v5, v3, s8 -; CHECK-NEXT: v_mul_hi_u32 v6, v3, s8 -; CHECK-NEXT: v_mul_lo_u32 v7, v4, v2 -; CHECK-NEXT: v_mul_hi_u32 v8, v4, v2 -; CHECK-NEXT: v_sub_i32_e32 v9, vcc, 0, v5 -; CHECK-NEXT: v_sub_i32_e32 v10, vcc, 0, v7 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v5, v5, v9, vcc -; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v8 -; 
CHECK-NEXT: v_cndmask_b32_e64 v6, v7, v10, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v5, v5, v3 -; CHECK-NEXT: v_mul_hi_u32 v6, v6, v4 -; CHECK-NEXT: v_add_i32_e64 v7, s[6:7], v3, v5 -; CHECK-NEXT: v_sub_i32_e64 v3, s[6:7], v3, v5 -; CHECK-NEXT: v_add_i32_e64 v5, s[6:7], v4, v6 -; CHECK-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v7, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v4, v4, v5, s[4:5] -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v0 -; CHECK-NEXT: v_mul_hi_u32 v4, v4, v1 -; CHECK-NEXT: v_mul_lo_u32 v3, v3, s8 -; CHECK-NEXT: v_mul_lo_u32 v4, v4, v2 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, v0, v3 -; CHECK-NEXT: v_sub_i32_e32 v6, vcc, v1, v4 -; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s8, v5 -; CHECK-NEXT: v_add_i32_e64 v7, s[4:5], v5, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v3 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v5, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[6:7], v6, v2 -; CHECK-NEXT: v_add_i32_e64 v3, s[8:9], v6, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v4 -; CHECK-NEXT: v_sub_i32_e64 v1, s[10:11], v6, v2 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc -; CHECK-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; CHECK-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v7, v0, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e64 v1, v3, v1, s[8:9] -; CHECK-NEXT: s_setpc_b64 s[30:31] +; GISEL-LABEL: v_urem_v2i32_oddk_denom: +; GISEL: ; %bb.0: +; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GISEL-NEXT: s_mov_b32 s4, 0x12d8fb +; GISEL-NEXT: v_cvt_f32_u32_e32 v2, s4 +; GISEL-NEXT: s_sub_i32 s5, 0, s4 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 +; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 +; GISEL-NEXT: v_mul_lo_u32 v3, s5, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v3, v0, v2 +; GISEL-NEXT: v_mul_hi_u32 v2, v1, v2 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, s4 +; GISEL-NEXT: v_mul_lo_u32 
v2, v2, s4 +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v3 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v2 +; GISEL-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v3, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; GISEL-NEXT: v_subrev_i32_e32 v2, vcc, s4, v0 +; GISEL-NEXT: v_subrev_i32_e32 v3, vcc, s4, v1 +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GISEL-NEXT: v_cmp_le_u32_e32 vcc, s4, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc +; GISEL-NEXT: s_setpc_b64 s[30:31] +; +; CGP-LABEL: v_urem_v2i32_oddk_denom: +; CGP: ; %bb.0: +; CGP-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; CGP-NEXT: s_mov_b32 s4, 0x12d8fb +; CGP-NEXT: v_mov_b32_e32 v2, 0x12d8fb +; CGP-NEXT: s_mov_b32 s5, 0xffed2705 +; CGP-NEXT: v_cvt_f32_u32_e32 v3, s4 +; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 +; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 +; CGP-NEXT: v_mul_lo_u32 v4, s5, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v4, v0, v3 +; CGP-NEXT: v_mul_hi_u32 v3, v1, v3 +; CGP-NEXT: v_mul_lo_u32 v4, v4, s4 +; CGP-NEXT: v_mul_lo_u32 v3, v3, v2 +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v1, v3 +; CGP-NEXT: v_subrev_i32_e32 v3, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v1, v2 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; CGP-NEXT: v_subrev_i32_e32 v3, vcc, s4, v0 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, v1, v2 +; CGP-NEXT: v_cmp_le_u32_e32 vcc, s4, v0 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v2 +; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc +; CGP-NEXT: s_setpc_b64 
s[30:31] %result = urem <2 x i32> %num, ret <2 x i32> %result } @@ -431,28 +426,22 @@ define i32 @v_urem_i32_pow2_shl_denom(i32 %x, i32 %y) { ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT: v_lshl_b32_e32 v1, 0x1000, v1 ; CHECK-NEXT: v_cvt_f32_u32_e32 v2, v1 +; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; CHECK-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; CHECK-NEXT: v_cvt_u32_f32_e32 v2, v2 -; CHECK-NEXT: v_mul_lo_u32 v3, v2, v1 -; CHECK-NEXT: v_mul_hi_u32 v4, v2, v1 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; CHECK-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v0 +; CHECK-NEXT: v_mul_lo_u32 v3, v3, v2 +; CHECK-NEXT: v_mul_hi_u32 v3, v2, v3 +; CHECK-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; CHECK-NEXT: v_mul_hi_u32 v2, v0, v2 ; CHECK-NEXT: v_mul_lo_u32 v2, v2, v1 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; CHECK-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; CHECK-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; CHECK-NEXT: s_setpc_b64 s[30:31] %shl.y = shl i32 4096, %y %r = urem i32 %x, %shl.y @@ -464,54 +453,43 @@ define <2 x i32> @v_urem_v2i32_pow2_shl_denom(<2 x i32> %x, 
<2 x i32> %y) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: s_movk_i32 s4, 0x1000 +; GISEL-NEXT: s_mov_b32 s5, 0x4f7ffffe ; GISEL-NEXT: v_lshl_b32_e32 v2, s4, v2 ; GISEL-NEXT: v_lshl_b32_e32 v3, s4, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: v_cvt_f32_u32_e32 v6, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, s5, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, s5, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, 
v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v4, v4, v2 ; GISEL-NEXT: v_mul_lo_u32 v5, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v4 -; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v5 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v6, v2 -; GISEL-NEXT: v_add_i32_e64 v8, s[4:5], v6, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v4 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v6, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v7, v3 -; GISEL-NEXT: v_add_i32_e64 v2, s[8:9], v7, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v5 -; GISEL-NEXT: v_sub_i32_e64 v1, s[10:11], v7, v3 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_urem_v2i32_pow2_shl_denom: @@ -584,28 +562,22 @@ define i32 @v_urem_i32_24bit(i32 %num, i32 %den) { ; GISEL-NEXT: v_and_b32_e32 v0, s4, v0 ; GISEL-NEXT: v_and_b32_e32 v1, s4, v1 ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 
0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v2, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_urem_i32_24bit: @@ -651,56 +623,45 @@ define <2 x i32> @v_urem_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) { ; GISEL: ; %bb.0: ; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GISEL-NEXT: s_mov_b32 s4, 0xffffff +; GISEL-NEXT: s_mov_b32 s5, 0x4f7ffffe ; GISEL-NEXT: v_and_b32_e32 v0, s4, v0 ; GISEL-NEXT: v_and_b32_e32 v1, s4, v1 ; GISEL-NEXT: v_and_b32_e32 v2, s4, v2 ; GISEL-NEXT: v_and_b32_e32 v3, s4, v3 ; GISEL-NEXT: v_cvt_f32_u32_e32 v4, v2 -; GISEL-NEXT: v_cvt_f32_u32_e32 v5, v3 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 +; GISEL-NEXT: 
v_cvt_f32_u32_e32 v6, v3 +; GISEL-NEXT: v_sub_i32_e32 v7, vcc, 0, v3 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v4, v4 -; GISEL-NEXT: v_rcp_iflag_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_f32_e32 v4, 0x4f800000, v4 -; GISEL-NEXT: v_mul_f32_e32 v5, 0x4f800000, v5 +; GISEL-NEXT: v_rcp_iflag_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_f32_e32 v4, s5, v4 +; GISEL-NEXT: v_mul_f32_e32 v6, s5, v6 ; GISEL-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GISEL-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GISEL-NEXT: v_mul_lo_u32 v6, v4, v2 -; GISEL-NEXT: v_mul_hi_u32 v7, v4, v2 -; GISEL-NEXT: v_mul_lo_u32 v8, v5, v3 -; GISEL-NEXT: v_mul_hi_u32 v9, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v10, vcc, 0, v6 -; GISEL-NEXT: v_sub_i32_e32 v11, vcc, 0, v8 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v6, v6, v10, vcc -; GISEL-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v9 -; GISEL-NEXT: v_cndmask_b32_e64 v7, v8, v11, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v6, v6, v4 -; GISEL-NEXT: v_mul_hi_u32 v7, v7, v5 -; GISEL-NEXT: v_add_i32_e64 v8, s[6:7], v4, v6 -; GISEL-NEXT: v_sub_i32_e64 v4, s[6:7], v4, v6 -; GISEL-NEXT: v_add_i32_e64 v6, s[6:7], v5, v7 -; GISEL-NEXT: v_sub_i32_e64 v5, s[6:7], v5, v7 -; GISEL-NEXT: v_cndmask_b32_e32 v4, v4, v8, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v5, v5, v6, s[4:5] -; GISEL-NEXT: v_mul_hi_u32 v4, v4, v0 -; GISEL-NEXT: v_mul_hi_u32 v5, v5, v1 +; GISEL-NEXT: v_cvt_u32_f32_e32 v6, v6 +; GISEL-NEXT: v_mul_lo_u32 v5, v5, v4 +; GISEL-NEXT: v_mul_lo_u32 v7, v7, v6 +; GISEL-NEXT: v_mul_hi_u32 v5, v4, v5 +; GISEL-NEXT: v_mul_hi_u32 v7, v6, v7 +; GISEL-NEXT: v_add_i32_e32 v4, vcc, v4, v5 +; GISEL-NEXT: v_add_i32_e32 v5, vcc, v6, v7 +; GISEL-NEXT: v_mul_hi_u32 v4, v0, v4 +; GISEL-NEXT: v_mul_hi_u32 v5, v1, v5 ; GISEL-NEXT: v_mul_lo_u32 v4, v4, v2 ; GISEL-NEXT: v_mul_lo_u32 v5, v5, v3 -; GISEL-NEXT: v_sub_i32_e32 v6, vcc, v0, v4 -; GISEL-NEXT: v_sub_i32_e32 v7, vcc, v1, v5 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v6, v2 -; GISEL-NEXT: v_add_i32_e64 v8, s[4:5], v6, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], 
v0, v4 -; GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v6, v2 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[6:7], v7, v3 -; GISEL-NEXT: v_add_i32_e64 v2, s[8:9], v7, v3 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[8:9], v1, v5 -; GISEL-NEXT: v_sub_i32_e64 v1, s[10:11], v7, v3 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v6, v0, vcc -; GISEL-NEXT: s_and_b64 vcc, s[6:7], s[8:9] -; GISEL-NEXT: v_cndmask_b32_e32 v1, v7, v1, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v8, v0, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e64 v1, v2, v1, s[8:9] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v4 +; GISEL-NEXT: v_sub_i32_e32 v1, vcc, v1, v5 +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc +; GISEL-NEXT: v_sub_i32_e32 v4, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v5, vcc, v1, v3 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 +; GISEL-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc ; GISEL-NEXT: s_setpc_b64 s[30:31] ; ; CGP-LABEL: v_urem_v2i32_24bit: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll index 3e43bcf0409c..6b9357043b3c 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll @@ -145,36 +145,30 @@ define i64 @v_urem_i64(i64 %num, i64 %den) { ; CHECK-NEXT: v_cndmask_b32_e32 v4, v5, v3, vcc ; CHECK-NEXT: v_cndmask_b32_e32 v5, v6, v1, vcc ; CHECK-NEXT: BB0_2: ; %Flow -; CHECK-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CHECK-NEXT: s_xor_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CHECK-NEXT: s_xor_b64 exec, exec, s[4:5] ; CHECK-NEXT: s_cbranch_execz BB0_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v2 +; CHECK-NEXT: 
v_sub_i32_e32 v3, vcc, 0, v2 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v3, v1, v2 -; CHECK-NEXT: v_mul_hi_u32 v4, v1, v2 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CHECK-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v3, v3, v1 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v1, v3 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v4, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v3, v3, v1 +; CHECK-NEXT: v_mul_hi_u32 v3, v1, v3 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v3 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v1, v1, v2 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, v0, v1 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v3, v2 -; CHECK-NEXT: v_add_i32_e64 v4, s[4:5], v3, v2 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v1 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v2 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v4, v4, v0, s[4:5] +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v2 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v2 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 +; CHECK-NEXT: v_cndmask_b32_e32 v4, v0, v1, vcc ; CHECK-NEXT: v_mov_b32_e32 v5, 0 ; CHECK-NEXT: BB0_4: -; CHECK-NEXT: s_or_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_b64 exec, exec, s[4:5] ; CHECK-NEXT: v_mov_b32_e32 v0, v4 ; CHECK-NEXT: v_mov_b32_e32 v1, v5 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -333,28 +327,22 @@ define amdgpu_ps i64 @s_urem_i64(i64 inreg %num, i64 inreg %den) { ; CHECK-NEXT: s_cbranch_scc0 BB1_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v0, s2 +; CHECK-NEXT: s_sub_i32 s1, 
0, s2 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CHECK-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CHECK-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CHECK-NEXT: v_mul_lo_u32 v1, v0, s2 -; CHECK-NEXT: v_mul_hi_u32 v2, v0, s2 -; CHECK-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 -; CHECK-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 -; CHECK-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 -; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc -; CHECK-NEXT: v_mul_hi_u32 v0, v0, s0 +; CHECK-NEXT: v_mul_lo_u32 v1, s1, v0 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 +; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_mul_hi_u32 v0, s0, v0 ; CHECK-NEXT: v_mul_lo_u32 v0, v0, s2 -; CHECK-NEXT: v_sub_i32_e32 v1, vcc, s0, v0 -; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s2, v1 -; CHECK-NEXT: v_add_i32_e64 v2, s[4:5], s2, v1 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[0:1], s0, v0 -; CHECK-NEXT: v_subrev_i32_e64 v0, s[2:3], s2, v1 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[0:1] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[0:1] +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, s0, v0 +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s2, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_subrev_i32_e32 v1, vcc, s2, v0 +; CHECK-NEXT: v_cmp_le_u32_e32 vcc, s2, v0 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CHECK-NEXT: BB1_4: ; CHECK-NEXT: v_readfirstlane_b32 s0, v0 ; CHECK-NEXT: s_mov_b32 s1, s0 @@ -771,36 +759,30 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v5, vcc ; CGP-NEXT: v_cndmask_b32_e32 v1, v10, v11, vcc ; CGP-NEXT: BB2_2: ; %Flow2 -; CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, 
s[4:5] ; CGP-NEXT: s_cbranch_execz BB2_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v4 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v4 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v4 -; CGP-NEXT: v_mul_hi_u32 v5, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v9, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v9, vcc -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v8 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v8, v0 ; CGP-NEXT: v_mul_lo_u32 v0, v0, v4 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, v8, v0 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v4 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v1, v4 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v8, v0 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v1, v4 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v5, v0, s[4:5] +; CGP-NEXT: v_sub_i32_e32 v0, vcc, v8, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB2_4: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_or_b32_e32 v5, v3, v7 ; CGP-NEXT: v_mov_b32_e32 v4, 0 ; CGP-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[4:5] @@ -938,36 +920,30 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) { ; CGP-NEXT: v_cndmask_b32_e32 v4, v5, v7, vcc ; CGP-NEXT: v_cndmask_b32_e32 v5, v8, 
v3, vcc ; CGP-NEXT: BB2_6: ; %Flow -; CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, s[4:5] ; CGP-NEXT: s_cbranch_execz BB2_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v6 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v6 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v6 -; CGP-NEXT: v_mul_hi_u32 v5, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v7, vcc, 0, v4 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v7, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v3, v3, v6 -; CGP-NEXT: v_sub_i32_e32 v4, vcc, v2, v3 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v4, v6 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v4, v6 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v2, v3 -; CGP-NEXT: v_sub_i32_e64 v2, s[6:7], v4, v6 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc -; CGP-NEXT: v_cndmask_b32_e64 v4, v5, v2, s[4:5] +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v6 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v3, vcc +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v6 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v6 +; CGP-NEXT: v_cndmask_b32_e32 v4, v2, v3, vcc ; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: BB2_8: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_mov_b32_e32 v2, v4 ; CGP-NEXT: v_mov_b32_e32 v3, v5 ; CGP-NEXT: 
s_setpc_b64 s[30:31] @@ -2421,36 +2397,30 @@ define i64 @v_urem_i64_pow2_shl_denom(i64 %x, i64 %y) { ; CHECK-NEXT: v_cndmask_b32_e32 v2, v3, v5, vcc ; CHECK-NEXT: v_cndmask_b32_e32 v3, v6, v1, vcc ; CHECK-NEXT: BB7_2: ; %Flow -; CHECK-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CHECK-NEXT: s_xor_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CHECK-NEXT: s_xor_b64 exec, exec, s[4:5] ; CHECK-NEXT: s_cbranch_execz BB7_4 ; CHECK-NEXT: ; %bb.3: ; CHECK-NEXT: v_cvt_f32_u32_e32 v1, v4 +; CHECK-NEXT: v_sub_i32_e32 v2, vcc, 0, v4 ; CHECK-NEXT: v_rcp_iflag_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f800000, v1 +; CHECK-NEXT: v_mul_f32_e32 v1, 0x4f7ffffe, v1 ; CHECK-NEXT: v_cvt_u32_f32_e32 v1, v1 -; CHECK-NEXT: v_mul_lo_u32 v2, v1, v4 -; CHECK-NEXT: v_mul_hi_u32 v3, v1, v4 -; CHECK-NEXT: v_sub_i32_e32 v5, vcc, 0, v2 -; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3 -; CHECK-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc -; CHECK-NEXT: v_mul_hi_u32 v2, v2, v1 -; CHECK-NEXT: v_add_i32_e64 v3, s[4:5], v1, v2 -; CHECK-NEXT: v_sub_i32_e64 v1, s[4:5], v1, v2 -; CHECK-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc -; CHECK-NEXT: v_mul_hi_u32 v1, v1, v0 +; CHECK-NEXT: v_mul_lo_u32 v2, v2, v1 +; CHECK-NEXT: v_mul_hi_u32 v2, v1, v2 +; CHECK-NEXT: v_add_i32_e32 v1, vcc, v1, v2 +; CHECK-NEXT: v_mul_hi_u32 v1, v0, v1 ; CHECK-NEXT: v_mul_lo_u32 v1, v1, v4 -; CHECK-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 -; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v2, v4 -; CHECK-NEXT: v_add_i32_e64 v3, s[4:5], v2, v4 -; CHECK-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v1 -; CHECK-NEXT: v_sub_i32_e64 v0, s[6:7], v2, v4 -; CHECK-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CHECK-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc -; CHECK-NEXT: v_cndmask_b32_e64 v2, v3, v0, s[4:5] +; CHECK-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CHECK-NEXT: v_cmp_ge_u32_e32 vcc, v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CHECK-NEXT: v_sub_i32_e32 v1, vcc, v0, v4 +; CHECK-NEXT: v_cmp_ge_u32_e32 
vcc, v0, v4 +; CHECK-NEXT: v_cndmask_b32_e32 v2, v0, v1, vcc ; CHECK-NEXT: v_mov_b32_e32 v3, 0 ; CHECK-NEXT: BB7_4: -; CHECK-NEXT: s_or_b64 exec, exec, s[8:9] +; CHECK-NEXT: s_or_b64 exec, exec, s[4:5] ; CHECK-NEXT: v_mov_b32_e32 v0, v2 ; CHECK-NEXT: v_mov_b32_e32 v1, v3 ; CHECK-NEXT: s_setpc_b64 s[30:31] @@ -2867,36 +2837,30 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v7, vcc ; CGP-NEXT: v_cndmask_b32_e32 v1, v4, v11, vcc ; CGP-NEXT: BB8_2: ; %Flow2 -; CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, s[4:5] ; CGP-NEXT: s_cbranch_execz BB8_4 ; CGP-NEXT: ; %bb.3: ; CGP-NEXT: v_cvt_f32_u32_e32 v0, v10 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, 0, v10 ; CGP-NEXT: v_rcp_iflag_f32_e32 v0, v0 -; CGP-NEXT: v_mul_f32_e32 v0, 0x4f800000, v0 +; CGP-NEXT: v_mul_f32_e32 v0, 0x4f7ffffe, v0 ; CGP-NEXT: v_cvt_u32_f32_e32 v0, v0 -; CGP-NEXT: v_mul_lo_u32 v1, v0, v10 -; CGP-NEXT: v_mul_hi_u32 v4, v0, v10 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v1 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; CGP-NEXT: v_cndmask_b32_e32 v1, v1, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v1, v1, v0 -; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v0, v1 -; CGP-NEXT: v_sub_i32_e64 v0, s[4:5], v0, v1 -; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc -; CGP-NEXT: v_mul_hi_u32 v0, v0, v5 +; CGP-NEXT: v_mul_lo_u32 v1, v1, v0 +; CGP-NEXT: v_mul_hi_u32 v1, v0, v1 +; CGP-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; CGP-NEXT: v_mul_hi_u32 v0, v5, v0 ; CGP-NEXT: v_mul_lo_u32 v0, v0, v10 -; CGP-NEXT: v_sub_i32_e32 v1, vcc, v5, v0 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v1, v10 -; CGP-NEXT: v_add_i32_e64 v4, s[4:5], v1, v10 -; CGP-NEXT: v_cmp_ge_u32_e64 s[4:5], v5, v0 -; CGP-NEXT: v_sub_i32_e64 v0, s[6:7], v1, v10 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc -; CGP-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; CGP-NEXT: 
v_sub_i32_e32 v0, vcc, v5, v0 +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v10 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v10 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc +; CGP-NEXT: v_sub_i32_e32 v1, vcc, v0, v10 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v0, v10 +; CGP-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; CGP-NEXT: v_mov_b32_e32 v1, 0 ; CGP-NEXT: BB8_4: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_or_b32_e32 v5, v3, v9 ; CGP-NEXT: v_mov_b32_e32 v4, 0 ; CGP-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[4:5] @@ -3034,36 +2998,30 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) { ; CGP-NEXT: v_cndmask_b32_e32 v4, v5, v7, vcc ; CGP-NEXT: v_cndmask_b32_e32 v5, v6, v3, vcc ; CGP-NEXT: BB8_6: ; %Flow -; CGP-NEXT: s_or_saveexec_b64 s[8:9], s[6:7] -; CGP-NEXT: s_xor_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_saveexec_b64 s[4:5], s[6:7] +; CGP-NEXT: s_xor_b64 exec, exec, s[4:5] ; CGP-NEXT: s_cbranch_execz BB8_8 ; CGP-NEXT: ; %bb.7: ; CGP-NEXT: v_cvt_f32_u32_e32 v3, v8 +; CGP-NEXT: v_sub_i32_e32 v4, vcc, 0, v8 ; CGP-NEXT: v_rcp_iflag_f32_e32 v3, v3 -; CGP-NEXT: v_mul_f32_e32 v3, 0x4f800000, v3 +; CGP-NEXT: v_mul_f32_e32 v3, 0x4f7ffffe, v3 ; CGP-NEXT: v_cvt_u32_f32_e32 v3, v3 -; CGP-NEXT: v_mul_lo_u32 v4, v3, v8 -; CGP-NEXT: v_mul_hi_u32 v5, v3, v8 -; CGP-NEXT: v_sub_i32_e32 v6, vcc, 0, v4 -; CGP-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5 -; CGP-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc -; CGP-NEXT: v_mul_hi_u32 v4, v4, v3 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v3, v4 -; CGP-NEXT: v_sub_i32_e64 v3, s[4:5], v3, v4 -; CGP-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; CGP-NEXT: v_mul_hi_u32 v3, v3, v2 +; CGP-NEXT: v_mul_lo_u32 v4, v4, v3 +; CGP-NEXT: v_mul_hi_u32 v4, v3, v4 +; CGP-NEXT: v_add_i32_e32 v3, vcc, v3, v4 +; CGP-NEXT: v_mul_hi_u32 v3, v2, v3 ; CGP-NEXT: v_mul_lo_u32 v3, v3, v8 -; CGP-NEXT: v_sub_i32_e32 v4, vcc, v2, v3 -; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v4, v8 -; CGP-NEXT: v_add_i32_e64 v5, s[4:5], v4, v8 -; CGP-NEXT: 
v_cmp_ge_u32_e64 s[4:5], v2, v3 -; CGP-NEXT: v_sub_i32_e64 v2, s[6:7], v4, v8 -; CGP-NEXT: s_and_b64 vcc, vcc, s[4:5] -; CGP-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc -; CGP-NEXT: v_cndmask_b32_e64 v4, v5, v2, s[4:5] +; CGP-NEXT: v_sub_i32_e32 v2, vcc, v2, v3 +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v8 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v2, v2, v3, vcc +; CGP-NEXT: v_sub_i32_e32 v3, vcc, v2, v8 +; CGP-NEXT: v_cmp_ge_u32_e32 vcc, v2, v8 +; CGP-NEXT: v_cndmask_b32_e32 v4, v2, v3, vcc ; CGP-NEXT: v_mov_b32_e32 v5, 0 ; CGP-NEXT: BB8_8: -; CGP-NEXT: s_or_b64 exec, exec, s[8:9] +; CGP-NEXT: s_or_b64 exec, exec, s[4:5] ; CGP-NEXT: v_mov_b32_e32 v2, v4 ; CGP-NEXT: v_mov_b32_e32 v3, v5 ; CGP-NEXT: s_setpc_b64 s[30:31] @@ -3080,28 +3038,22 @@ define i64 @v_urem_i64_24bit(i64 %num, i64 %den) { ; GISEL-NEXT: v_and_b32_e32 v0, s4, v0 ; GISEL-NEXT: v_and_b32_e32 v1, s4, v2 ; GISEL-NEXT: v_cvt_f32_u32_e32 v2, v1 +; GISEL-NEXT: v_sub_i32_e32 v3, vcc, 0, v1 ; GISEL-NEXT: v_rcp_iflag_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f800000, v2 +; GISEL-NEXT: v_mul_f32_e32 v2, 0x4f7ffffe, v2 ; GISEL-NEXT: v_cvt_u32_f32_e32 v2, v2 -; GISEL-NEXT: v_mul_lo_u32 v3, v2, v1 -; GISEL-NEXT: v_mul_hi_u32 v4, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v5, vcc, 0, v3 -; GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4 -; GISEL-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc -; GISEL-NEXT: v_mul_hi_u32 v3, v3, v2 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v2, v3 -; GISEL-NEXT: v_sub_i32_e64 v2, s[4:5], v2, v3 -; GISEL-NEXT: v_cndmask_b32_e32 v2, v2, v4, vcc -; GISEL-NEXT: v_mul_hi_u32 v2, v2, v0 +; GISEL-NEXT: v_mul_lo_u32 v3, v3, v2 +; GISEL-NEXT: v_mul_hi_u32 v3, v2, v3 +; GISEL-NEXT: v_add_i32_e32 v2, vcc, v2, v3 +; GISEL-NEXT: v_mul_hi_u32 v2, v0, v2 ; GISEL-NEXT: v_mul_lo_u32 v2, v2, v1 -; GISEL-NEXT: v_sub_i32_e32 v3, vcc, v0, v2 -; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v3, v1 -; GISEL-NEXT: v_add_i32_e64 v4, s[4:5], v3, v1 -; GISEL-NEXT: v_cmp_ge_u32_e64 s[4:5], v0, v2 -; 
GISEL-NEXT: v_sub_i32_e64 v0, s[6:7], v3, v1 -; GISEL-NEXT: s_and_b64 vcc, vcc, s[4:5] -; GISEL-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc -; GISEL-NEXT: v_cndmask_b32_e64 v0, v4, v0, s[4:5] +; GISEL-NEXT: v_sub_i32_e32 v0, vcc, v0, v2 +; GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc +; GISEL-NEXT: v_sub_i32_e32 v2, vcc, v0, v1 +; GISEL-NEXT: v_cmp_ge_u32_e32 vcc, v0, v1 +; GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc ; GISEL-NEXT: v_mov_b32_e32 v1, 0 ; GISEL-NEXT: s_setpc_b64 s[30:31] ; From llvm-commits at lists.llvm.org Wed Jul 8 11:15:39 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:15:39 +0000 (UTC) Subject: [PATCH] D83382: [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGecac951be92b: [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM (authored by foad). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83382/new/ https://reviews.llvm.org/D83382 Files: llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructions.td llvm/lib/Target/AMDGPU/CaymanInstructions.td llvm/lib/Target/AMDGPU/SIInstructions.td llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll llvm/test/CodeGen/AMDGPU/bypass-div.ll llvm/test/CodeGen/AMDGPU/sdiv.ll llvm/test/CodeGen/AMDGPU/udivrem.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83382.276495.patch Type: text/x-patch Size: 65713 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:15:44 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:15:44 +0000 (UTC) Subject: [PATCH] D83383: [AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl In-Reply-To: References: Message-ID: <8f3d41ab6e43c5a9bdd22b7340299ad9@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGa8816ebee01c: [AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl (authored by foad). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83383/new/ https://reviews.llvm.org/D83383 Files: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sdiv.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-srem.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-udiv.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-urem.mir llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll From llvm-commits at lists.llvm.org Wed Jul 8 11:17:31 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:17:31 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <2497a8eedded1e00c7c0c225bb40ffac@localhost.localdomain> clementval updated this revision to Diff 276497. clementval added a comment. 
Revert llvm_unreachable in test as well Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 Files: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83363.276497.patch Type: text/x-patch Size: 7262 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:24:18 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:24:18 +0000 (UTC) Subject: [PATCH] D83361: [LLVM] Add libatomic load/store functions to TargetLibraryInfo In-Reply-To: References: Message-ID: <0becd795537e86b487ccdeeb97bf64ae@localhost.localdomain> guiand added a comment. I had gotten the impression that these calls aren't supported on WebAssembly by lines in `WebAssemblyRuntimeLibcallSignatures.cpp` like: Table[RTLIB::ATOMIC_LOAD] = unsupported; But it turns out clang has no problem generating calls to `__atomic_load` anyway: https://gcc.godbolt.org/z/kMDLJb And it looks like LLVM looks to provide general support for these functions: https://llvm.org/docs/Atomics.html#libcalls-atomic Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83361/new/ https://reviews.llvm.org/D83361 From llvm-commits at lists.llvm.org Wed Jul 8 11:32:26 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:32:26 +0000 (UTC) Subject: [PATCH] D71739: [AssumeBundles] Use operand bundles to encode alignment assumptions In-Reply-To: References: Message-ID: <663dd694f8103ddfa8e899cb0326edf2@localhost.localdomain> Tyker updated this revision to Diff 276498. Tyker added a comment. addressed comment. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71739/new/ https://reviews.llvm.org/D71739 Files: clang/lib/CodeGen/CodeGenFunction.cpp clang/test/CodeGen/align_value.cpp clang/test/CodeGen/alloc-align-attr.c clang/test/CodeGen/assume-aligned-and-alloc-align-attributes.c clang/test/CodeGen/builtin-align-array.c clang/test/CodeGen/builtin-align.c clang/test/CodeGen/builtin-assume-aligned.c clang/test/CodeGen/catch-alignment-assumption-attribute-align_value-on-lvalue.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-align_value-on-paramvar.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-alloc_align-on-function-variable.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-alloc_align-on-function.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-assume_aligned-on-function-two-params.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-assume_aligned-on-function.cpp clang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-three-params-variable.cpp clang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-three-params.cpp clang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-two-params.cpp clang/test/CodeGen/catch-alignment-assumption-openmp.cpp clang/test/CodeGen/non-power-of-2-alignment-assumptions.c clang/test/OpenMP/simd_codegen.cpp clang/test/OpenMP/simd_metadata.c clang/test/OpenMP/target_teams_distribute_parallel_for_simd_codegen.cpp llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/Transforms/Scalar/AlignmentFromAssumptions.h llvm/lib/Analysis/AssumeBundleQueries.cpp llvm/lib/IR/IRBuilder.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp llvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp llvm/test/Transforms/AlignmentFromAssumptions/simple.ll llvm/test/Transforms/AlignmentFromAssumptions/simple32.ll llvm/test/Transforms/Inline/align.ll llvm/test/Transforms/InstCombine/assume.ll 
llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll llvm/test/Verifier/assume-bundles.ll llvm/unittests/Analysis/AssumeBundleQueriesTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D71739.276498.patch Type: text/x-patch Size: 105578 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:33:24 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:33:24 +0000 (UTC) Subject: [PATCH] D83412: [LLVM] Accept `noundef` attribute in function definitions/calls In-Reply-To: References: Message-ID: <1a559569a5d37453200b775304ca99a1@localhost.localdomain> nikic accepted this revision. nikic added a comment. This revision is now accepted and ready to land. LGTM ================ Comment at: llvm/include/llvm/IR/Attributes.td:42 +/// Parameter or return value may not contain uninitialized or poison bits +def NoUndef : EnumAttr<"noundef">; ---------------- nit: Missing period at the end. ================ Comment at: llvm/lib/AsmParser/LLLexer.cpp:699 KEYWORD(immarg); + KEYWORD(noundef); ---------------- It looks like apart from `immarg` this is sorted alphabetically, so I'd move `noundef` before `nounwind`. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83412/new/ https://reviews.llvm.org/D83412 From llvm-commits at lists.llvm.org Wed Jul 8 11:35:17 2020 From: llvm-commits at lists.llvm.org (Jay Foad via llvm-commits) Date: Wed, 08 Jul 2020 11:35:17 -0700 (PDT) Subject: [llvm] 47788b9 - SILoadStoreOptimizer: add support for GFX10 image instructions Message-ID: <5f0611e5.1c69fb81.9244d.161f@mx.google.com> Author: Jay Foad Date: 2020-07-08T19:15:46+01:00 New Revision: 47788b97a9eb1215d0ac01826f51fbe286f56c0b URL: https://github.com/llvm/llvm-project/commit/47788b97a9eb1215d0ac01826f51fbe286f56c0b DIFF: https://github.com/llvm/llvm-project/commit/47788b97a9eb1215d0ac01826f51fbe286f56c0b.diff LOG: SILoadStoreOptimizer: add support for GFX10 image instructions GFX10 image instructions use one or more address operands starting at vaddr0, instead of a single vaddr operand, to allow for NSA forms. Differential Revision: https://reviews.llvm.org/D81675 Added: llvm/test/CodeGen/AMDGPU/merge-image-load-gfx10.mir llvm/test/CodeGen/AMDGPU/merge-image-sample-gfx10.mir Modified: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp index 140e5509b87b..2eb1c52f1b59 100644 --- a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp +++ b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp @@ -103,15 +103,19 @@ enum InstClassEnum { TBUFFER_STORE, }; -enum RegisterEnum { - SBASE = 0x1, - SRSRC = 0x2, - SOFFSET = 0x4, - VADDR = 0x8, - ADDR = 0x10, - SSAMP = 0x20, +struct AddressRegs { + unsigned char NumVAddrs = 0; + bool SBase = false; + bool SRsrc = false; + bool SOffset = false; + bool VAddr = false; + bool Addr = false; + bool SSamp = false; }; +// GFX10 image_sample instructions can have 12 vaddrs + srsrc + ssamp. 
+const unsigned MaxAddressRegs = 12 + 1 + 1; + class SILoadStoreOptimizer : public MachineFunctionPass { struct CombineInfo { MachineBasicBlock::iterator I; @@ -126,8 +130,8 @@ class SILoadStoreOptimizer : public MachineFunctionPass { bool SLC; bool DLC; bool UseST64; - int AddrIdx[5]; - const MachineOperand *AddrReg[5]; + int AddrIdx[MaxAddressRegs]; + const MachineOperand *AddrReg[MaxAddressRegs]; unsigned NumAddresses; unsigned Order; @@ -349,7 +353,8 @@ static InstClassEnum getInstClass(unsigned Opc, const SIInstrInfo &TII) { } if (TII.isMIMG(Opc)) { // Ignore instructions encoded without vaddr. - if (AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr) == -1) + if (AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr) == -1 && + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr0) == -1) return UNKNOWN; // TODO: Support IMAGE_GET_RESINFO and IMAGE_GET_LOD. if (TII.get(Opc).mayStore() || !TII.get(Opc).mayLoad() || @@ -422,58 +427,54 @@ static unsigned getInstSubclass(unsigned Opc, const SIInstrInfo &TII) { } } -static unsigned getRegs(unsigned Opc, const SIInstrInfo &TII) { - if (TII.isMUBUF(Opc)) { - unsigned result = 0; +static AddressRegs getRegs(unsigned Opc, const SIInstrInfo &TII) { + AddressRegs Result; - if (AMDGPU::getMUBUFHasVAddr(Opc)) { - result |= VADDR; - } - - if (AMDGPU::getMUBUFHasSrsrc(Opc)) { - result |= SRSRC; - } - - if (AMDGPU::getMUBUFHasSoffset(Opc)) { - result |= SOFFSET; - } - - return result; + if (TII.isMUBUF(Opc)) { + if (AMDGPU::getMUBUFHasVAddr(Opc)) + Result.VAddr = true; + if (AMDGPU::getMUBUFHasSrsrc(Opc)) + Result.SRsrc = true; + if (AMDGPU::getMUBUFHasSoffset(Opc)) + Result.SOffset = true; + + return Result; } if (TII.isMIMG(Opc)) { - unsigned result = VADDR | SRSRC; + int VAddr0Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr0); + if (VAddr0Idx >= 0) { + int SRsrcIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::srsrc); + Result.NumVAddrs = SRsrcIdx - VAddr0Idx; + } else { + Result.VAddr = true; + } + 
Result.SRsrc = true; const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(Opc); if (Info && AMDGPU::getMIMGBaseOpcodeInfo(Info->BaseOpcode)->Sampler) - result |= SSAMP; + Result.SSamp = true; - return result; + return Result; } if (TII.isMTBUF(Opc)) { - unsigned result = 0; - - if (AMDGPU::getMTBUFHasVAddr(Opc)) { - result |= VADDR; - } - - if (AMDGPU::getMTBUFHasSrsrc(Opc)) { - result |= SRSRC; - } - - if (AMDGPU::getMTBUFHasSoffset(Opc)) { - result |= SOFFSET; - } - - return result; + if (AMDGPU::getMTBUFHasVAddr(Opc)) + Result.VAddr = true; + if (AMDGPU::getMTBUFHasSrsrc(Opc)) + Result.SRsrc = true; + if (AMDGPU::getMTBUFHasSoffset(Opc)) + Result.SOffset = true; + + return Result; } switch (Opc) { default: - return 0; + return Result; case AMDGPU::S_BUFFER_LOAD_DWORD_IMM: case AMDGPU::S_BUFFER_LOAD_DWORDX2_IMM: case AMDGPU::S_BUFFER_LOAD_DWORDX4_IMM: - return SBASE; + Result.SBase = true; + return Result; case AMDGPU::DS_READ_B32: case AMDGPU::DS_READ_B64: case AMDGPU::DS_READ_B32_gfx9: @@ -482,7 +483,8 @@ static unsigned getRegs(unsigned Opc, const SIInstrInfo &TII) { case AMDGPU::DS_WRITE_B64: case AMDGPU::DS_WRITE_B32_gfx9: case AMDGPU::DS_WRITE_B64_gfx9: - return ADDR; + Result.Addr = true; + return Result; } } @@ -539,38 +541,34 @@ void SILoadStoreOptimizer::CombineInfo::setMI(MachineBasicBlock::iterator MI, DLC = TII.getNamedOperand(*I, AMDGPU::OpName::dlc)->getImm(); } - unsigned AddrOpName[5] = {0}; - NumAddresses = 0; - const unsigned Regs = getRegs(I->getOpcode(), TII); - - if (Regs & ADDR) { - AddrOpName[NumAddresses++] = AMDGPU::OpName::addr; - } - - if (Regs & SBASE) { - AddrOpName[NumAddresses++] = AMDGPU::OpName::sbase; - } - - if (Regs & SRSRC) { - AddrOpName[NumAddresses++] = AMDGPU::OpName::srsrc; - } - - if (Regs & SOFFSET) { - AddrOpName[NumAddresses++] = AMDGPU::OpName::soffset; - } + AddressRegs Regs = getRegs(Opc, TII); - if (Regs & VADDR) { - AddrOpName[NumAddresses++] = AMDGPU::OpName::vaddr; - } - - if (Regs & SSAMP) { - 
AddrOpName[NumAddresses++] = AMDGPU::OpName::ssamp; - } - - for (unsigned i = 0; i < NumAddresses; i++) { - AddrIdx[i] = AMDGPU::getNamedOperandIdx(I->getOpcode(), AddrOpName[i]); - AddrReg[i] = &I->getOperand(AddrIdx[i]); - } + NumAddresses = 0; + for (unsigned J = 0; J < Regs.NumVAddrs; J++) + AddrIdx[NumAddresses++] = + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr0) + J; + if (Regs.Addr) + AddrIdx[NumAddresses++] = + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::addr); + if (Regs.SBase) + AddrIdx[NumAddresses++] = + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::sbase); + if (Regs.SRsrc) + AddrIdx[NumAddresses++] = + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::srsrc); + if (Regs.SOffset) + AddrIdx[NumAddresses++] = + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::soffset); + if (Regs.VAddr) + AddrIdx[NumAddresses++] = + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr); + if (Regs.SSamp) + AddrIdx[NumAddresses++] = + AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::ssamp); + assert(NumAddresses <= MaxAddressRegs); + + for (unsigned J = 0; J < NumAddresses; J++) + AddrReg[J] = &I->getOperand(AddrIdx[J]); } } // end anonymous namespace. 
@@ -694,7 +692,7 @@ bool SILoadStoreOptimizer::dmasksCanBeCombined(const CombineInfo &CI, unsigned OperandsToMatch[] = {AMDGPU::OpName::glc, AMDGPU::OpName::slc, AMDGPU::OpName::d16, AMDGPU::OpName::unorm, AMDGPU::OpName::da, AMDGPU::OpName::r128, - AMDGPU::OpName::a16}; + AMDGPU::OpName::a16, AMDGPU::OpName::dlc}; for (auto op : OperandsToMatch) { int Idx = AMDGPU::getNamedOperandIdx(CI.I->getOpcode(), op); @@ -1288,9 +1286,9 @@ MachineBasicBlock::iterator SILoadStoreOptimizer::mergeBufferLoadPair( auto MIB = BuildMI(*MBB, Paired.I, DL, TII->get(Opcode), DestReg); - const unsigned Regs = getRegs(Opcode, *TII); + AddressRegs Regs = getRegs(Opcode, *TII); - if (Regs & VADDR) + if (Regs.VAddr) MIB.add(*TII->getNamedOperand(*CI.I, AMDGPU::OpName::vaddr)); // It shouldn't be possible to get this far if the two instructions @@ -1351,9 +1349,9 @@ MachineBasicBlock::iterator SILoadStoreOptimizer::mergeTBufferLoadPair( auto MIB = BuildMI(*MBB, Paired.I, DL, TII->get(Opcode), DestReg); - const unsigned Regs = getRegs(Opcode, *TII); + AddressRegs Regs = getRegs(Opcode, *TII); - if (Regs & VADDR) + if (Regs.VAddr) MIB.add(*TII->getNamedOperand(*CI.I, AMDGPU::OpName::vaddr)); unsigned JoinedFormat = @@ -1431,9 +1429,9 @@ MachineBasicBlock::iterator SILoadStoreOptimizer::mergeTBufferStorePair( auto MIB = BuildMI(*MBB, Paired.I, DL, TII->get(Opcode)) .addReg(SrcReg, RegState::Kill); - const unsigned Regs = getRegs(Opcode, *TII); + AddressRegs Regs = getRegs(Opcode, *TII); - if (Regs & VADDR) + if (Regs.VAddr) MIB.add(*TII->getNamedOperand(*CI.I, AMDGPU::OpName::vaddr)); unsigned JoinedFormat = @@ -1594,9 +1592,9 @@ MachineBasicBlock::iterator SILoadStoreOptimizer::mergeBufferStorePair( auto MIB = BuildMI(*MBB, Paired.I, DL, TII->get(Opcode)) .addReg(SrcReg, RegState::Kill); - const unsigned Regs = getRegs(Opcode, *TII); + AddressRegs Regs = getRegs(Opcode, *TII); - if (Regs & VADDR) + if (Regs.VAddr) MIB.add(*TII->getNamedOperand(*CI.I, AMDGPU::OpName::vaddr)); @@ -1991,6 
+1989,8 @@ SILoadStoreOptimizer::collectMergeableInsts( if (!CI.hasMergeableAddress(*MRI)) continue; + LLVM_DEBUG(dbgs() << "Mergeable: " << MI); + addInstToMergeableList(CI, MergeableInsts); } @@ -2087,6 +2087,8 @@ SILoadStoreOptimizer::optimizeInstsWithSameBaseAddr( Modified = true; + LLVM_DEBUG(dbgs() << "Merging: " << *CI.I << " with: " << *Paired.I); + switch (CI.InstClass) { default: llvm_unreachable("unknown InstClass"); diff --git a/llvm/test/CodeGen/AMDGPU/merge-image-load-gfx10.mir b/llvm/test/CodeGen/AMDGPU/merge-image-load-gfx10.mir new file mode 100644 index 000000000000..c7d297de04a2 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/merge-image-load-gfx10.mir @@ -0,0 +1,490 @@ +# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass si-load-store-opt -o - %s | FileCheck -check-prefix=GFX10 %s + +# GFX10-LABEL: name: image_load_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_load_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- +# GFX10-LABEL: name: image_load_merged_v1v3_reversed +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub3 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub0_sub1_sub2 + +name: image_load_merged_v1v3_reversed +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_merged_v2v2 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_64 = COPY %8.sub0_sub1 +# GFX10: %{{[0-9]+}}:vreg_64 = COPY killed %8.sub2_sub3 + +name: image_load_merged_v2v2 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5:vreg_64, %3:sgpr_256, 3, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) + %7:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5:vreg_64, %3:sgpr_256, 12, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_merged_v2v2_reversed +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_64 = COPY %8.sub2_sub3 +# GFX10: %{{[0-9]+}}:vreg_64 = COPY killed %8.sub0_sub1 + +name: image_load_merged_v2v2_reversed +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5:vreg_64, %3:sgpr_256, 12, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) + %7:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5:vreg_64, %3:sgpr_256, 3, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_merged_v3v1 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = COPY %8.sub0_sub1_sub2 +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY killed %8.sub3 + +name: image_load_merged_v3v1 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + %7:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_merged_v3v1_reversed +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = COPY %8.sub1_sub2_sub3 +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY killed %8.sub0 + +name: image_load_merged_v3v1_reversed +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + %7:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_divided_merged +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) + +name: image_load_divided_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %8:vreg_128 = BUFFER_LOAD_DWORDX4_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %9:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %7:vreg_64, %3:sgpr_256, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + %10:vreg_128 = BUFFER_LOAD_DWORDX4_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %11:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_divided_not_merged +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_divided_not_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vreg_128 = COPY %2 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + IMAGE_STORE_V4_V2 %4:vreg_128, %5:vreg_64, %3:sgpr_256, 15, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 16) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_dmask_overlapped_not_merged +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_dmask_overlapped_not_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_dmask_not_disjoint_not_merged +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 11, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_dmask_not_disjoint_not_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 11, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_0 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %6, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_0 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %7:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %8:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %6, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_1 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %6, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %6, %4, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_1 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %5:vgpr_32 = COPY %2.sub3 + %6:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %7:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %6, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %8:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %6, %4, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_3 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_4 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_4 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_5 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_5 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_6 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 1, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_6 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 1, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_7 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_7 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_8 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V1_gfx10 %6, %3, 8, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_8 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = COPY %5.sub0 + %7:vgpr_32 = IMAGE_LOAD_V1_V1_gfx10 %6, %3, 8, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) + %8:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_9 +# GFX10: %{{[0-9]+}}:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 1, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_9 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 1, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_10 +# GFX10: %{{[0-9]+}}:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 1, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_10 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_LOAD_V2_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 1, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_not_merged_11 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_load_not_merged_11 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_V1_V2_gfx10 %5, %3, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_V3_V2_gfx10 %5, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_mip_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_MIP_V4_V3_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_load_mip_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_96 = BUFFER_LOAD_DWORDX3_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_MIP_V1_V3_gfx10 %5:vreg_96, %3:sgpr_256, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_MIP_V3_V3_gfx10 %5:vreg_96, %3:sgpr_256, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + + +# GFX10-LABEL: name: image_load_mip_pck_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_MIP_PCK_V4_V3_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_load_mip_pck_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_96 = BUFFER_LOAD_DWORDX3_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_MIP_PCK_V1_V3_gfx10 %5:vreg_96, %3:sgpr_256, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_MIP_PCK_V3_V3_gfx10 %5:vreg_96, %3:sgpr_256, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + + +# GFX10-LABEL: name: image_load_mip_pck_sgn_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_MIP_PCK_SGN_V4_V3_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_load_mip_pck_sgn_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_96 = BUFFER_LOAD_DWORDX3_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_MIP_PCK_SGN_V1_V3_gfx10 %5:vreg_96, %3:sgpr_256, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_MIP_PCK_SGN_V3_V3_gfx10 %5:vreg_96, %3:sgpr_256, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_pck_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_PCK_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_load_pck_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_PCK_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_PCK_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_load_pck_sgn_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_LOAD_PCK_SGN_V4_V2_gfx10 %5, %3, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_load_pck_sgn_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_LOAD_PCK_SGN_V1_V2_gfx10 %5:vreg_64, %3:sgpr_256, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_LOAD_PCK_SGN_V3_V2_gfx10 %5:vreg_64, %3:sgpr_256, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- diff --git a/llvm/test/CodeGen/AMDGPU/merge-image-sample-gfx10.mir b/llvm/test/CodeGen/AMDGPU/merge-image-sample-gfx10.mir new file mode 100644 index 000000000000..70923145dd2d --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/merge-image-sample-gfx10.mir @@ -0,0 +1,1173 @@ +# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass si-load-store-opt -o - %s | FileCheck -check-prefix=GFX10 %s + +# GFX10-LABEL: name: image_sample_l_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_l_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- +# GFX10-LABEL: name: image_sample_l_merged_v1v3_reversed +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub3 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub0_sub1_sub2 + +name: image_sample_l_merged_v1v3_reversed +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_merged_v2v2 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_64 = COPY %8.sub0_sub1 +# GFX10: %{{[0-9]+}}:vreg_64 = COPY killed %8.sub2_sub3 + +name: image_sample_l_merged_v2v2 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 3, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) + %7:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 12, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_merged_v2v2_reversed +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_64 = COPY %8.sub2_sub3 +# GFX10: %{{[0-9]+}}:vreg_64 = COPY killed %8.sub0_sub1 + +name: image_sample_l_merged_v2v2_reversed +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 12, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) + %7:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 3, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 8, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_merged_v3v1 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = COPY %8.sub0_sub1_sub2 +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY killed %8.sub3 + +name: image_sample_l_merged_v3v1 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + %7:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_merged_v3v1_reversed +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = COPY %8.sub1_sub2_sub3 +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY killed %8.sub0 + +name: image_sample_l_merged_v3v1_reversed +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + %7:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_divided_merged +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) + +name: image_sample_l_divided_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %8:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %9:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %7:vgpr_32, %7:vgpr_32, %7:vgpr_32, %3:sgpr_256, %2:sgpr_128, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + %10:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %11:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_divided_not_merged +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_divided_not_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vreg_128 = COPY %2 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + IMAGE_STORE_V4_V2_nsa_gfx10 %4:vreg_128, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 16) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_dmask_overlapped_not_merged +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_dmask_overlapped_not_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_dmask_not_disjoint_not_merged +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 11, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_dmask_not_disjoint_not_merged +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 4, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 11, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_0 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %6, %6, %6, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_0 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %7:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %8:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %6, %6, %6, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_1 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %6, %6, %6, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %6, %6, %6, %4, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_1 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %5:vgpr_32 = COPY %2.sub3 + %6:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %7:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %6, %6, %6, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %8:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %6, %6, %6, %4, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_2 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %6, %6, %6, %4, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %6, %6, %6, %4, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_2 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_128 = COPY $sgpr92_sgpr93_sgpr94_sgpr95 + %4:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %5:vgpr_32 = COPY %2.sub3 + %6:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %7:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %6, %6, %6, %4, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %8:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %6, %6, %6, %4, %3, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_3 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_4 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_4 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_5 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 1, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_5 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 1, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_6 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_6 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_7 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V2_nsa_gfx10 %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_7 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V2_nsa_gfx10 %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_8 +# GFX10: %{{[0-9]+}}:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 1, 0, 0, implicit $exec :: (dereferenceable load 8, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_8 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 1, 0, 0, implicit $exec :: (dereferenceable load 8, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_9 +# GFX10: %{{[0-9]+}}:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 1, 0, implicit $exec :: (dereferenceable load 8, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_9 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vreg_64 = IMAGE_SAMPLE_L_V2_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 1, 0, implicit $exec :: (dereferenceable load 8, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + +# GFX10-LABEL: name: image_sample_l_not_merged_10 +# GFX10: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) + +name: image_sample_l_not_merged_10 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V3_nsa_gfx10 %5, %5, %5, %3, %2, 8, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_V3_V3_nsa_gfx10 %5, %5, %5, %3, %2, 7, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + + + +# GFX10-LABEL: name: image_sample_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_V4_V2_nsa_gfx10 %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_V1_V2_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_V3_V2_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_b_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_B_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_b_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_B_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_B_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_b_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_B_CL_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_b_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_B_CL_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_B_CL_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_b_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_B_CL_O_V4_V5_nsa_gfx10 %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_b_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_B_CL_O_V1_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_B_CL_O_V3_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_b_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_B_O_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_b_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_B_O_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_B_O_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_cd_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_CD_V4_V6_nsa_gfx10 %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_cd_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_CD_V1_V6_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_CD_V3_V6_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_cd_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_CD_CL_V4_V7_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_cd_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_CD_CL_V1_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_CD_CL_V3_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_cd_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_CD_CL_O_V4_V8_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_cd_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_CD_CL_O_V1_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_CD_CL_O_V3_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_cd_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_CD_O_V4_V7_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_cd_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_CD_O_V1_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_CD_O_V3_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_CL_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_CL_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_CL_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_CL_O_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_CL_O_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_CL_O_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_b_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_B_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_b_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_B_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_B_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_b_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_B_CL_V4_V5_nsa_gfx10 %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_b_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_B_CL_V1_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_B_CL_V3_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_b_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_B_CL_O_V4_V6_nsa_gfx10 %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_b_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_B_CL_O_V1_V6_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_B_CL_O_V3_V6_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_b_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_B_O_V4_V5_nsa_gfx10 %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_b_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_B_O_V1_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_B_O_V3_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_cd_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_CD_V4_V7_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_cd_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_CD_V1_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_CD_V3_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_cd_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_CD_CL_V4_V8_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_cd_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_CD_CL_V1_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_CD_CL_V3_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_cd_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_CD_CL_O_V4_V9_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_cd_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_CD_CL_O_V1_V9_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_CD_CL_O_V3_V9_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_cd_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_CD_O_V4_V8_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_cd_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_CD_O_V1_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_CD_O_V3_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_CL_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_CL_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_CL_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_CL_O_V4_V5_nsa_gfx10 %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_CL_O_V1_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_CL_O_V3_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_d_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_D_V4_V7_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_d_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_D_V1_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_D_V3_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_d_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_D_CL_V4_V8_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_d_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_D_CL_V1_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_D_CL_V3_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_d_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_D_CL_O_V4_V9_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_d_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_D_CL_O_V1_V9_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_D_CL_O_V3_V9_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_d_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_D_O_V4_V8_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_d_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_D_O_V1_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_D_O_V3_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_l_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_L_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_l_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_L_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_L_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_lz_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_LZ_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_lz_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_LZ_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_LZ_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_lz_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_LZ_O_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_lz_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_LZ_O_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_LZ_O_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_l_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_L_O_V4_V5_nsa_gfx10 %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_l_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_L_O_V1_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_L_O_V3_V5_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_c_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_C_O_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_c_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_C_O_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_C_O_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_d_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_D_V4_V6_nsa_gfx10 %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_d_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_D_V1_V6_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_D_V3_V6_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_d_cl_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_D_CL_V4_V7_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_d_cl_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_D_CL_V1_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_D_CL_V3_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_d_cl_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_D_CL_O_V4_V8_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_d_cl_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_D_CL_O_V1_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_D_CL_O_V3_V8_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_d_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_D_O_V4_V7_nsa_gfx10 %5, %5, %5, %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_d_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_D_O_V1_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_D_O_V3_V7_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_lz_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_LZ_V4_V2_nsa_gfx10 %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_lz_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_LZ_V3_V2_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_lz_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_LZ_O_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_lz_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_LZ_O_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_LZ_O_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_l_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_L_O_V4_V4_nsa_gfx10 %5, %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_l_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_L_O_V1_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_L_O_V3_V4_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... 
+--- + + +# GFX10-LABEL: name: image_sample_o_merged_v1v3 +# GFX10: %{{[0-9]+}}:vreg_128 = IMAGE_SAMPLE_O_V4_V3_nsa_gfx10 %5, %5, %5, %3, %2, 15, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec, implicit $exec :: (dereferenceable load 16, align 4, addrspace 4) +# GFX10: %{{[0-9]+}}:vgpr_32 = COPY %8.sub0 +# GFX10: %{{[0-9]+}}:vreg_96 = COPY killed %8.sub1_sub2_sub3 + +name: image_sample_o_merged_v1v3 +body: | + bb.0.entry: + %0:sgpr_64 = COPY $sgpr0_sgpr1 + %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0, 0 + %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99 + %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0, 0 + %4:vgpr_32 = COPY %2.sub3 + %5:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET %2:sgpr_128, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load 16) + %6:vgpr_32 = IMAGE_SAMPLE_O_V1_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4, addrspace 4) + %7:vreg_96 = IMAGE_SAMPLE_O_V3_V3_nsa_gfx10 %5:vgpr_32, %5:vgpr_32, %5:vgpr_32, %3:sgpr_256, %2:sgpr_128, 14, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16, addrspace 4) +... +--- From llvm-commits at lists.llvm.org Wed Jul 8 11:35:27 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:35:27 +0000 (UTC) Subject: [PATCH] D81675: SILoadStoreOptimizer: add support for GFX10 image instructions In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG47788b97a9eb: SILoadStoreOptimizer: add support for GFX10 image instructions (authored by foad). 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81675/new/

https://reviews.llvm.org/D81675

Files:
  llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
  llvm/test/CodeGen/AMDGPU/merge-image-load-gfx10.mir
  llvm/test/CodeGen/AMDGPU/merge-image-sample-gfx10.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D81675.276499.patch
Type: text/x-patch
Size: 115344 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 11:35:47 2020
From: llvm-commits at lists.llvm.org (Julian Lettner via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:35:47 +0000 (UTC)
Subject: [PATCH] D82233: [lit] Add --show command line option
In-Reply-To:
References:
Message-ID:

yln marked 3 inline comments as done.
yln added inline comments.

================
Comment at: llvm/docs/CommandGuide/lit.rst:120-121
+
+ Show the names of the specified tests. Choose from:
+ all, excluded, skipped, unsupported, pass, flakypass, xfail.
+
----------------
jdenny wrote:
> yln wrote:
> > varungandhi-apple wrote:
> > > Could you add an example here with how multiple items should be selected? For example, one might wonder
> > >
> > > 1. Can you do something like `--show skipped,xfail`?
> > > 2. What are the semantics of `--show skipped --show xfail` (does it mean skipped AND xfail or does it only mean xfail)?
> > >
> > > You might also want to add a sentence to `--help` text on how to use multiple options.
> > Good point, I will add an example here.
> > 1. See test.
> > 2. Last one wins.
> > 3. The `--help` part is auto-generated by argparse (because we use the `choices` parameter).
> Why not a comma-separated list?
>
> For example, I downloaded your patch and tried this:
>
> ```
> $ ./bin/llvm-lit test/Support
> -- Testing: 2 tests, 2 workers --
> PASS: LLVM :: Support/check-default-options.txt (1 of 2)
> PASS: LLVM :: Support/interrupts.test (2 of 2)
>
> Testing Time: 0.11s
> Passed: 2
> $ ./bin/llvm-lit --show all test/Support
> usage: lit [-h] [--version] [-j N] [--config-prefix NAME] [-D NAME=VAL] [-q]
>            [-s] [-v] [-vv] [-a] [-o PATH] [--no-progress-bar]
>            [--show-unsupported] [--show-xfail]
>            [--show {all,excluded,skipped,unsupported,pass,flakypass,xfail} [{all,excluded,skipped,unsupported,pass,flakypass,xfail} ...]]
>            [--path PATH] [--vg] [--vg-leak] [--vg-arg ARG] [--time-tests]
>            [--no-execute] [--xunit-xml-output XUNIT_XML_OUTPUT]
>            [--timeout MAXINDIVIDUALTESTTIME] [--max-failures MAX_FAILURES]
>            [--allow-empty-runs] [--max-tests N] [--max-time N] [--shuffle]
>            [-i] [--filter REGEX] [--num-shards M] [--run-shard N] [--debug]
>            [--show-suites] [--show-tests] [--show-used-features]
>            TEST_PATH [TEST_PATH ...]
> lit: error: argument --show: invalid choice: 'test/Support' (choose from 'all', 'excluded', 'skipped', 'unsupported', 'pass', 'flakypass', 'xfail')
> ```
>
> The usage message shows that `--show` can be used before `TEST_PATH`, but it doesn't work unless I add `--` in between, which isn't listed in the options. Alternatively, I can specify `--show` after `TEST_PATH`, but that also isn't mentioned in the usage summary above.
>
> If the values were comma-separated, this wouldn't be an issue.
> Why not a comma-separated list?
Yes, I think this would also "feel" more natural than the current space-separated list and avoid the oddities about parameter ordering. Surprisingly, argparse does not support this. I see two options:

* Implement this ourselves (it's actually not too bad): https://stackoverflow.com/a/60205263/271968
* We could generate flags, e.g., `--show-xfail`, for each result code.
  Pro: this would be in line with the existing flags for unsupported and xfail.
  Con: Extending this scheme to accept user-defined codes would be harder.

What do you think? Do you have a preference or additional ideas?

================
Comment at: llvm/utils/lit/lit/cl_arguments.py:204
+ else:
+ opts.shown_codes.add(lit.Test.ResultCode._instances[code.upper()])
----------------
jdenny wrote:
> What happens if there are user-defined result codes that are spelled the same except for case?
Unfortunately, this can't be used (yet) to specify user-defined result codes at all. User codes are usually registered in config files and this code executes before we evaluate configs, i.e., it will print `argument --show: invalid choice: 'user-code' (choose from ...)`.

If we think it's worth it, then we could push "choice validation" to a later point, after we have processed the user configs.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82233/new/

https://reviews.llvm.org/D82233

From llvm-commits at lists.llvm.org Wed Jul 8 11:36:01 2020
From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:36:01 +0000 (UTC)
Subject: [PATCH] D83412: [LLVM] Accept `noundef` attribute in function definitions/calls
In-Reply-To:
References:
Message-ID:

eugenis added a comment.

This needs to include tests for bitcode writing and parsing.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83412/new/

https://reviews.llvm.org/D83412

From llvm-commits at lists.llvm.org Wed Jul 8 11:36:08 2020
From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:36:08 +0000 (UTC)
Subject: [PATCH] D71739: [AssumeBundles] Use operand bundles to encode alignment assumptions
In-Reply-To:
References:
Message-ID: <9d5e447709ee9d1fe0d81d4917153ae2@localhost.localdomain>

Tyker updated this revision to Diff 276500.
Tyker added a comment.
fixed Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71739/new/ https://reviews.llvm.org/D71739 Files: clang/lib/CodeGen/CodeGenFunction.cpp clang/test/CodeGen/align_value.cpp clang/test/CodeGen/alloc-align-attr.c clang/test/CodeGen/assume-aligned-and-alloc-align-attributes.c clang/test/CodeGen/builtin-align-array.c clang/test/CodeGen/builtin-align.c clang/test/CodeGen/builtin-assume-aligned.c clang/test/CodeGen/catch-alignment-assumption-attribute-align_value-on-lvalue.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-align_value-on-paramvar.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-alloc_align-on-function-variable.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-alloc_align-on-function.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-assume_aligned-on-function-two-params.cpp clang/test/CodeGen/catch-alignment-assumption-attribute-assume_aligned-on-function.cpp clang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-three-params-variable.cpp clang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-three-params.cpp clang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-two-params.cpp clang/test/CodeGen/catch-alignment-assumption-openmp.cpp clang/test/CodeGen/non-power-of-2-alignment-assumptions.c clang/test/OpenMP/simd_codegen.cpp clang/test/OpenMP/simd_metadata.c clang/test/OpenMP/target_teams_distribute_parallel_for_simd_codegen.cpp llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/Transforms/Scalar/AlignmentFromAssumptions.h llvm/lib/Analysis/AssumeBundleQueries.cpp llvm/lib/IR/IRBuilder.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp llvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp llvm/test/Transforms/AlignmentFromAssumptions/simple.ll llvm/test/Transforms/AlignmentFromAssumptions/simple32.ll llvm/test/Transforms/Inline/align.ll llvm/test/Transforms/InstCombine/assume.ll 
llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll llvm/test/Verifier/assume-bundles.ll llvm/unittests/Analysis/AssumeBundleQueriesTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D71739.276500.patch Type: text/x-patch Size: 105632 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:36:21 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:36:21 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: <976bdd683ab8388eddc15a2ebd8a3524@localhost.localdomain> dblaikie accepted this revision. dblaikie added a comment. Great, thanks! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 From llvm-commits at lists.llvm.org Wed Jul 8 11:38:48 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:38:48 +0000 (UTC) Subject: [PATCH] D82510: [PowerPC][Power10] Implement low-order Vector Multiply, Modulus and Divide Instructions In-Reply-To: References: Message-ID: <08a7636e52e5c8429167f26d852d9607@localhost.localdomain> lei added a comment. This LGTM, just wondering why you have not included testing for BE. ================ Comment at: llvm/test/CodeGen/PowerPC/p10-vector-divide.ll:3 +; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \ +; RUN: -mcpu=pwr10 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \ +; RUN: FileCheck %s ---------------- BE tests? ================ Comment at: llvm/test/CodeGen/PowerPC/p10-vector-modulo.ll:4 +; RUN: -mcpu=pwr10 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \ +; RUN: FileCheck %s + ---------------- BE tests? 
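A big-endian counterpart of the RUN line quoted above might look like the sketch below. It assumes that the same options are valid for the big-endian `powerpc64-unknown-linux-gnu` triple and that the existing `CHECK` lines match the output for both endiannesses; neither assumption is confirmed in this thread.

```
; Hypothetical extra RUN line: same flags as the little-endian run,
; only the triple is switched to big-endian powerpc64.
; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu \
; RUN:   -mcpu=pwr10 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \
; RUN:   FileCheck %s
```

If the generated code differed between endiannesses, separate check prefixes would presumably be needed instead of a shared `CHECK`.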
================ Comment at: llvm/test/CodeGen/PowerPC/p10-vector-multiply.ll:3 +; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \ +; RUN: -mcpu=pwr10 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \ +; RUN: FileCheck %s ---------------- BE? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82510/new/ https://reviews.llvm.org/D82510 From llvm-commits at lists.llvm.org Wed Jul 8 11:39:13 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:39:13 +0000 (UTC) Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. In-Reply-To: References: Message-ID: <9bffe8dfdfb06469da11df760f7fe285@localhost.localdomain> efriedma added inline comments. ================ Comment at: llvm/docs/LangRef.rst:14269 +bit width, but they must have the same bit width. ``%a`` is the value to be +shifted, and ``%b`` is the amount to shift by. ``%b`` must be less than the bit +width. ---------------- Not sure what "must be" means in this context; the shift amount is a variable, so we can't enforce anything about it statically. Is it poison? Or undefined behavior? Or does the shift clamp to the min/max value? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83216/new/ https://reviews.llvm.org/D83216 From llvm-commits at lists.llvm.org Wed Jul 8 11:40:01 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:40:01 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: dblaikie added a comment. 
In D82975#2138869, @dstenb wrote:

> In D82975#2135150, @SouraVX wrote:
>
> > In D82975#2134626, @dblaikie wrote:
> >
> > > In D82975#2134347, @probinson wrote:
> > >
> > > > In D82975#2134132, @dblaikie wrote:
> > > >
> > > > > In D82975#2128353, @dstenb wrote:
> > > > >
> > > > > > In D82975#2127201, @SouraVX wrote:
> > > > > >
> > > > > > > I think if it's about compatibility (analogous behavior with GCC), existing infra is Okay/Fine (since same encodings are used). We just need to emit the `.debug_macro` section with `version` 4 and teach the `llvm-dwarfdump` to parse it correctly.
> > > > > >
> > > > > >
> > > > > > One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?
> > > > >
> > > > >
> > > > > Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.
> > > >
> > > >
> > > > My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's `-fdebug-macro` with v4 is emitting a .debug_str_offsets section.
> > > >
> > > > LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.
> > >
> > >
> > > GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand:
> > >
> > > $ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug"
> > > 1 .debug_info.dwo 0000003c 0000000000000000
> > > 2 .debug_abbrev.dwo 0000003e 0000000000000000
> > > 3 .debug_macro.dwo 0000001e 0000000000000000
> > > 4 .debug_macro.dwo 00000364 0000000000000000
> > > 5 .debug_macro.dwo 00000013 0000000000000000
> > > 6 .debug_line.dwo 00000048 0000000000000000
> > > 7 .debug_str_offsets.dwo 000002d5 0000000000000000
> > > 8 .debug_str.dwo 00000e05 0000000000000000
> > > $ llvm-dwarfdump-tot main.dwo -debug-macro
> > > main.dwo: file format elf64-x86-64
> > >
> > > .debug_macro.dwo contents:
> > > 0x00000000:
> > > - lineno: 19 macro:
> > > DW_MACINFO_invalid
> > >
> > >
> > > I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off.
> >
> >
> > One more deciding factor to consider here (previously missed) is that `GDB(trunk)` also doesn't understand `GNU macro extensions` (if you wish to call it) in split case.
> > i.e.
> > `gcc -g3 -gsplit-dwarf test.c`
> > `test.dwo` contains `.debug_macro.dwo` forms which no tool can dump (as of now).
> > if you load `a.out` in GDB and try expanding macro (defined in source).
> > GDB will report
> >
> > (gdb) info macro FOO
> > The symbol `FOO' has no definition as a C/C++ preprocessor macro
> > at :-1
> >
> >
> > on the other hand, if you try with `-gstrict-dwarf -gsplit-dwarf`. GDB is happy.
> > So at the end of the day, even if we allow `GNU macro` extension, things will still be broken for `-gsplit-dwarf` case.
> > Or we have to teach the debugger to understand this? This also hinges on the fact, what kinda form GCC uses in split-case in `.debug_macro.dwo` section.
> > That itself is unclear, right?
>
>
> (Sorry, I don't have a GCC trunk build readily available, so I used GCC 9.3.0 here.)
>
> When using those flags, GCC seems to emit DW_MACRO_define_strp (DW_MACRO_GNU_define_indirect) entries, but with indexed strings as operands. Neither binutils nor GDB considers that such entries may hold indexed strings, and just treats those operands as indirect strings, which is why they are not properly handled. "Overloading" those indirect operands with indexed strings seems very weird to me. Perhaps that is just a bug in GCC, rather than a limitation in the consumers?

Perhaps - though there was some thought put into supporting GNU debug_macro in v4/pre-standard Fission, given the DWP format had columns for both debug_macro and debug_macinfo ( https://gcc.gnu.org/wiki/DebugFissionDWP ).

Don't think it's a big deal either way - if someone comes along wanting to add debug_macro support for pre-standard Fission, we can discuss what that format looks like at that point - happy enough for it to be unimplemented (& as I said before, have "-ggdb -gdwarf-4 -fdebug-macro -> debug_macro" and "-ggdb -gdwarf-4 -fdebug-macro -gsplit-dwarf -> debug_macinfo.dwo").

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82975/new/

https://reviews.llvm.org/D82975

From llvm-commits at lists.llvm.org Wed Jul 8 11:41:46 2020
From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:41:46 +0000 (UTC)
Subject: [PATCH] D83330: [PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup.
In-Reply-To:
References:
Message-ID:

yamauchi updated this revision to Diff 276501.
yamauchi added a comment.

Rebase.
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83330/new/

https://reviews.llvm.org/D83330

Files:
  llvm/lib/Target/X86/X86FixupLEAs.cpp
  llvm/lib/Target/X86/X86PadShortFunction.cpp
  llvm/test/CodeGen/X86/fixup-lea.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83330.276501.patch
Type: text/x-patch
Size: 5477 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 11:43:02 2020
From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:43:02 +0000 (UTC)
Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X
In-Reply-To:
References:
Message-ID:

craig.topper updated this revision to Diff 276502.
craig.topper added a comment.
Herald added a subscriber: dmgreen.

Add tests for InstCombine and InstSimplify. Update frontend test.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83360/new/

https://reviews.llvm.org/D83360

Files:
  clang/test/CodeGen/arm-mve-intrinsics/dup.c
  llvm/lib/Analysis/InstructionSimplify.cpp
  llvm/test/Transforms/InstCombine/select.ll
  llvm/test/Transforms/InstSimplify/select.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83360.276502.patch
Type: text/x-patch
Size: 8854 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 11:44:03 2020
From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:44:03 +0000 (UTC)
Subject: [PATCH] D83412: [LLVM] Accept `noundef` attribute in function definitions/calls
In-Reply-To:
References:
Message-ID: <4b28bff0434abb8f87d8d8e14b0d5b66@localhost.localdomain>

guiand updated this revision to Diff 276503.
guiand added a comment.

Added a test to attributes.ll, which seems to be where other attributes are tested. Is this sufficient?
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83412/new/

https://reviews.llvm.org/D83412

Files:
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/IR/Attributes.td
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/AsmParser/LLParser.cpp
  llvm/lib/AsmParser/LLToken.h
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/IR/Attributes.cpp
  llvm/lib/Transforms/Utils/CodeExtractor.cpp
  llvm/test/Bitcode/attributes.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83412.276503.patch
Type: text/x-patch
Size: 5285 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Wed Jul 8 11:45:06 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:45:06 +0000 (UTC)
Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
In-Reply-To:
References:
Message-ID: <90c5f8c4a0446c5bc3dab75dc714ed7c@localhost.localdomain>

lebedev.ri added inline comments.

================
Comment at: llvm/docs/LangRef.rst:14269
+bit width, but they must have the same bit width. ``%a`` is the value to be
+shifted, and ``%b`` is the amount to shift by. ``%b`` must be less than the bit
+width.
----------------
efriedma wrote:
> Not sure what "must be" means in this context; the shift amount is a variable, so we can't enforce anything about it statically.
>
> Is it poison? Or undefined behavior? Or does the shift clamp to the min/max value?
Right, we should spell that out.
IMO this should be consistent with normal shifts, i.e. poison:
```
If ``b`` is (statically or dynamically) equal to or larger than the number of
bits in ``a``, this instruction returns a :ref:`poison value `.
If the arguments are vectors, each vector element of ``a`` is shifted
by the corresponding shift amount in ``b``.
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83216/new/

https://reviews.llvm.org/D83216

From llvm-commits at lists.llvm.org Wed Jul 8 11:45:15 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits)
Date: Wed, 08 Jul 2020 11:45:15 -0700 (PDT)
Subject: [lld] f86d96a - [ELF] Enforce double-dash form for --warn-backrefs-exclude
Message-ID: <5f06143b.1c69fb81.fd8ed.15af@mx.google.com>

Author: Fangrui Song
Date: 2020-07-08T11:45:01-07:00
New Revision: f86d96a96441eb9f83c2dedb0ba5d7e4dc8dc089

URL: https://github.com/llvm/llvm-project/commit/f86d96a96441eb9f83c2dedb0ba5d7e4dc8dc089
DIFF: https://github.com/llvm/llvm-project/commit/f86d96a96441eb9f83c2dedb0ba5d7e4dc8dc089.diff

LOG: [ELF] Enforce double-dash form for --warn-backrefs-exclude

This is an LLD-specific option. We have enforced double-dash forms for
other options (reduce collision with short options) but missed this one.

Added:

Modified:
    lld/ELF/Options.td

Removed:

################################################################################
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index bc12f4d45546..c3c1309aca1a 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -436,7 +436,7 @@ defm warn_backrefs: BB<"warn-backrefs",
     "Do not warn about backward symbol references to fetch archive members (default)">;
 
 defm warn_backrefs_exclude
-    : Eq<"warn-backrefs-exclude",
+    : EEq<"warn-backrefs-exclude",
          "Glob describing an archive (or an object file within --start-lib) "
          "which should be ignored for --warn-backrefs.">,
       MetaVarName<"">;

From llvm-commits at lists.llvm.org Wed Jul 8 11:46:02 2020
From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:46:02 +0000 (UTC)
Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X
In-Reply-To:
References:
Message-ID: <2b7dab55ddfe828985d0896e0978093c@localhost.localdomain>

spatel accepted this revision.
spatel added a comment.

LGTM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83360/new/

https://reviews.llvm.org/D83360

From llvm-commits at lists.llvm.org Wed Jul 8 11:46:18 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:46:18 +0000 (UTC)
Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X
In-Reply-To:
References:
Message-ID: <1c99c250bac5694dbedf5891b54c241b@localhost.localdomain>

lebedev.ri accepted this revision.
lebedev.ri added a comment.

LG, thank you.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83360/new/

https://reviews.llvm.org/D83360

From llvm-commits at lists.llvm.org Wed Jul 8 11:48:25 2020
From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 18:48:25 +0000 (UTC)
Subject: [PATCH] D83412: [LLVM] Accept `noundef` attribute in function definitions/calls
In-Reply-To:
References:
Message-ID: <825a0afe2052345f0eefd7df4fb0e17b@localhost.localdomain>

eugenis accepted this revision.
eugenis added a comment.
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83412/new/ https://reviews.llvm.org/D83412 From llvm-commits at lists.llvm.org Wed Jul 8 11:48:29 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Wed, 08 Jul 2020 11:48:29 -0700 (PDT) Subject: [llvm] 12c2271 - [DAGCombiner] fix code comment and improve readability; NFC Message-ID: <5f0614fd.1c69fb81.1c798.16b8@mx.google.com> Author: Sanjay Patel Date: 2020-07-08T14:48:05-04:00 New Revision: 12c2271e534c297b71e52c3a25b53f3d475db78d URL: https://github.com/llvm/llvm-project/commit/12c2271e534c297b71e52c3a25b53f3d475db78d DIFF: https://github.com/llvm/llvm-project/commit/12c2271e534c297b71e52c3a25b53f3d475db78d.diff LOG: [DAGCombiner] fix code comment and improve readability; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 27b340ffcb7e..c682a2051ba3 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -709,8 +709,7 @@ namespace { /// Merge consecutive store operations into a wide store. /// This optimization uses wide integers or vectors when possible. - /// \return number of stores that were merged into a merged store (the - /// affected nodes are stored as a prefix in \p StoreNodes). + /// \return true if stores were merged. bool mergeConsecutiveStores(StoreSDNode *St); /// Try to transform a truncation where C is a constant: @@ -16300,7 +16299,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // mergeable cases. To prevent this, we prune such stores from the // front of StoreNodes here. 
- bool RV = false; + bool MadeChange = false; while (StoreNodes.size() > 1) { size_t StartIdx = 0; while ((StartIdx + 1 < StoreNodes.size()) && @@ -16310,7 +16309,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // Bail if we don't have enough candidates to merge. if (StartIdx + 1 >= StoreNodes.size()) - return RV; + return MadeChange; if (StartIdx) StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + StartIdx); @@ -16446,8 +16445,8 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { continue; } - RV |= mergeStoresOfConstantsOrVecElts(StoreNodes, MemVT, NumElem, true, - UseVector, LastIntegerTrunc); + MadeChange |= mergeStoresOfConstantsOrVecElts( + StoreNodes, MemVT, NumElem, true, UseVector, LastIntegerTrunc); // Remove merged stores for next iteration. StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); @@ -16513,7 +16512,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { continue; } - RV |= mergeStoresOfConstantsOrVecElts( + MadeChange |= mergeStoresOfConstantsOrVecElts( StoreNodes, MemVT, NumStoresToMerge, false, true, false); StoreNodes.erase(StoreNodes.begin(), @@ -16759,8 +16758,8 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { SDValue(NewLoad.getNode(), 1)); } - // Replace the all stores with the new store. Recursively remove - // corresponding value if its no longer used. + // Replace all stores with the new store. Recursively remove corresponding + // values if they are no longer used. 
for (unsigned i = 0; i < NumElem; ++i) { SDValue Val = StoreNodes[i].MemNode->getOperand(1); CombineTo(StoreNodes[i].MemNode, NewStore); @@ -16768,13 +16767,13 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { recursivelyDeleteUnusedNodes(Val.getNode()); } - RV = true; + MadeChange = true; StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + NumElem); NumConsecutiveStores -= NumElem; } } - return RV; + return MadeChange; } SDValue DAGCombiner::replaceStoreChain(StoreSDNode *ST, SDValue BetterChain) { From llvm-commits at lists.llvm.org Wed Jul 8 11:48:32 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Wed, 08 Jul 2020 11:48:32 -0700 (PDT) Subject: [llvm] 1265eb2 - [DAGCombiner] clean up in mergeConsecutiveStores(); NFC Message-ID: <5f061500.1c69fb81.eda90.0e00@mx.google.com> Author: Sanjay Patel Date: 2020-07-08T14:48:05-04:00 New Revision: 1265eb2d5f7e9cdf6a557a2a1c338f370d730917 URL: https://github.com/llvm/llvm-project/commit/1265eb2d5f7e9cdf6a557a2a1c338f370d730917 DIFF: https://github.com/llvm/llvm-project/commit/1265eb2d5f7e9cdf6a557a2a1c338f370d730917.diff LOG: [DAGCombiner] clean up in mergeConsecutiveStores(); NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index c682a2051ba3..dd869f98b5bc 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -16248,31 +16248,18 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { EVT MemVT = St->getMemoryVT(); if (MemVT.isScalableVector()) return false; - - int64_t ElementSizeBytes = MemVT.getStoreSize(); - unsigned NumMemElts = MemVT.isVector() ? 
MemVT.getVectorNumElements() : 1; - - if (MemVT.getSizeInBits() * 2 > MaximumLegalStoreInBits) + if (!MemVT.isSimple() || MemVT.getSizeInBits() * 2 > MaximumLegalStoreInBits) return false; - bool NoVectors = DAG.getMachineFunction().getFunction().hasFnAttribute( - Attribute::NoImplicitFloat); - // This function cannot currently deal with non-byte-sized memory sizes. + int64_t ElementSizeBytes = MemVT.getStoreSize(); if (ElementSizeBytes * 8 != (int64_t)MemVT.getSizeInBits()) return false; - if (!MemVT.isSimple()) - return false; - - // Perform an early exit check. Do not bother looking at stored values that - // are not constants, loads, or extracted vector elements. + // Do not bother looking at stored values that are not constants, loads, or + // extracted vector elements. SDValue StoredVal = peekThroughBitcasts(St->getValue()); StoreSource StoreSrc = getStoreSource(StoredVal); - bool IsNonTemporalStore = St->isNonTemporal(); - bool IsNonTemporalLoad = StoreSrc == StoreSource::Load && - cast(StoredVal)->isNonTemporal(); - if (StoreSrc == StoreSource::Unknown) return false; @@ -16291,6 +16278,16 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { return LHS.OffsetFromBase < RHS.OffsetFromBase; }); + unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1; + bool AllowVectors = !DAG.getMachineFunction().getFunction().hasFnAttribute( + Attribute::NoImplicitFloat); + + bool IsNonTemporalStore = St->isNonTemporal(); + bool IsNonTemporalLoad = StoreSrc == StoreSource::Load && + cast(StoredVal)->isNonTemporal(); + LLVMContext &Context = *DAG.getContext(); + const DataLayout &DL = DAG.getDataLayout(); + // Store Merge attempts to merge the lowest stores. This generally // works out as if successful, as the remaining stores are checked // after the first collection of stores is merged. 
However, in the @@ -16298,7 +16295,6 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // p[0], p[1], p[2], p[3]}, we would fail and miss the subsequent // mergeable cases. To prevent this, we prune such stores from the // front of StoreNodes here. - bool MadeChange = false; while (StoreNodes.size() > 1) { size_t StartIdx = 0; @@ -16333,12 +16329,8 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { continue; } - // The node with the lowest store address. - LLVMContext &Context = *DAG.getContext(); - const DataLayout &DL = DAG.getDataLayout(); - - // Store the constants into memory as one consecutive store. if (StoreSrc == StoreSource::Constant) { + // Store the constants into memory as one consecutive store. while (NumConsecutiveStores >= 2) { LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; unsigned FirstStoreAS = FirstInChain->getAddressSpace(); @@ -16399,7 +16391,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // noimplicitfloat attribute. if ((!NonZero || TLI.storeOfVectorConstantIsCheap(MemVT, i + 1, FirstStoreAS)) && - !NoVectors) { + AllowVectors) { // Find a legal type for the vector store. unsigned Elts = (i + 1) * NumMemElts; EVT Ty = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts); @@ -16412,7 +16404,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { } } - bool UseVector = (LastLegalVectorType > LastLegalType) && !NoVectors; + bool UseVector = (LastLegalVectorType > LastLegalType) && AllowVectors; unsigned NumElem = (UseVector) ? LastLegalVectorType : LastLegalType; // Check if we found a legal integer type that creates a meaningful @@ -16659,7 +16651,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // Only use vector types if the vector type is larger than the integer // type. If they are the same, use integers. 
bool UseVectorTy = - LastLegalVectorType > LastLegalIntegerType && !NoVectors; + LastLegalVectorType > LastLegalIntegerType && AllowVectors; unsigned LastLegalType = std::max(LastLegalVectorType, LastLegalIntegerType); From llvm-commits at lists.llvm.org Wed Jul 8 11:51:14 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:51:14 +0000 (UTC) Subject: [PATCH] D82210: [SVE] Remove calls to VectorType::getNumElements from CodeGen In-Reply-To: References: Message-ID: <10447f074289ef0339648986a29b1cca@localhost.localdomain> ctetreau updated this revision to Diff 276506. ctetreau added a comment. address code review issues Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82210/new/ https://reviews.llvm.org/D82210 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/ExpandReductions.cpp llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/InterleavedAccessPass.cpp llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/CodeGen/ValueTypes.cpp From llvm-commits at lists.llvm.org Wed Jul 8 11:51:55 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:51:55 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: dblaikie added a comment. In D83351#2139599 , @lebedev.ri wrote: > In D83351#2139586 , @nickdesaulniers wrote: > > > I'm not a fan of the inconsistent use of range-for and for-each; I would prefer range-for everywhere since it's more concise.
> > > My (consistently inconsistent) headcanon as to which to use when is that for_each should be used > when in principle we don't care in which order each item will be processed. I don't think that's how the rest of LLVM is written, nor probably a great model. std::for_each is guaranteed to visit the elements in order, so it doesn't have a different contract to a range-based-for loop & adds some extra syntax (the lambda introducers, etc), complications to error messages, etc. >> But I don't feel strongly enough to block the patch based on that. Maybe LLVM's style guide should provide clarity and guidance on the difference of opinion? I think this is one I'm willing to say LLVM convention's pretty clear - there's 30 calls to std::for_each across LLVM and subprojects - and about 25,000 range-based-for loops... - so please change these to range based for loops. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 11:54:06 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:54:06 +0000 (UTC) Subject: [PATCH] D83409: [opt] Remove obsolete --quiet option In-Reply-To: References: Message-ID: hans accepted this revision. hans added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/lib/Support/SystemUtils.cpp:20 if (stream_to_check.is_displayed()) { - if (print_warning) { errs() << "WARNING: You're attempting to print out a bitcode file.\n" "This is inadvisable as it may cause display problems. 
If\n" ---------------- nit: the indentation needs readjusting ================ Comment at: llvm/tools/opt/PassPrinters.cpp:75 bool runOnSCC(CallGraphSCC &SCC) override { - if (!QuietPass) Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; ---------------- nit: indent ================ Comment at: llvm/tools/opt/PassPrinters.cpp:110 bool runOnModule(Module &M) override { - if (!QuietPass) Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; ---------------- indent ================ Comment at: llvm/tools/opt/PassPrinters.cpp:140 bool runOnLoop(Loop *L, LPPassManager &LPM) override { - if (!QuietPass) Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; ---------------- indent ================ Comment at: llvm/tools/opt/PassPrinters.cpp:171 bool runOnRegion(Region *R, RGPassManager &RGM) override { - if (!QuietPass) { Out << "Printing analysis '" << PassToPrint->getPassName() << "' for " << "region: '" << R->getNameStr() << "' in function '" ---------------- indent ================ Comment at: llvm/tools/opt/opt.cpp:208 -static cl::opt -Quiet("q", cl::desc("Obsolete option"), cl::Hidden); - ---------------- Might be worth mentioning the option was "neutered" in r13844 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83409/new/ https://reviews.llvm.org/D83409 From llvm-commits at lists.llvm.org Wed Jul 8 11:55:15 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:55:15 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <261d8e49b91e68f7bc6b0ed3daf9b898@localhost.localdomain> NeHuang updated this revision to Diff 276505. NeHuang added a comment. Thanks Sean. Added a new lit test to check for global linkage and default visibility callee. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 Files: lld/ELF/Arch/PPC64.cpp lld/test/ELF/Inputs/ppc64-callee-global-hidden.s lld/test/ELF/ppc64-pcrel-call-to-pcrel-callee-global.s lld/test/ELF/ppc64-pcrel-call-to-pcrel.s From llvm-commits at lists.llvm.org Wed Jul 8 11:56:15 2020 From: llvm-commits at lists.llvm.org (Anh Tuyen Tran via llvm-commits) Date: Wed, 08 Jul 2020 11:56:15 -0700 (PDT) Subject: [llvm] fead250 - [NFC] Separate Peeling Properties into its own struct Message-ID: <5f0616cf.1c69fb81.14da2.1757@mx.google.com> Author: Anh Tuyen Tran Date: 2020-07-08T18:56:03Z New Revision: fead250b439bbd4ec0f21e6a52d0c174e5fcdf5a URL: https://github.com/llvm/llvm-project/commit/fead250b439bbd4ec0f21e6a52d0c174e5fcdf5a DIFF: https://github.com/llvm/llvm-project/commit/fead250b439bbd4ec0f21e6a52d0c174e5fcdf5a.diff LOG: [NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations.
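For readers skimming the log, a rough sketch of the layering this change sets up may help: target defaults are filled in first, then caller-provided overrides are applied on top. The struct below is a simplified stand-in whose field names and default values follow the patch; `getDefaultPeelingPreferences` and `gatherPeelingPreferencesSketch` are hypothetical names made up for this illustration, the `cl::opt` override step is left out, and `std::optional` (C++17) is assumed in place of `llvm::Optional`. It is not the LLVM API itself.

```cpp
#include <optional>

// Simplified stand-in for TargetTransformInfo::PeelingPreferences.
// Field names follow the patch; the rest of this file is hypothetical.
struct PeelingPreferences {
  unsigned PeelCount;          // forced peel count, 0 means "let the cost model pick"
  bool AllowPeeling;           // allow peeling off loop iterations
  bool AllowLoopNestsPeeling;  // allow peeling for loop nests
  bool PeelProfiledIterations; // allow profile-based peeling
};

// Hypothetical stand-in for the target hook. The values are the defaults
// that BasicTTIImplBase::getPeelingPreferences assigns in the patch.
void getDefaultPeelingPreferences(PeelingPreferences &PP) {
  PP.PeelCount = 0;
  PP.AllowPeeling = true;
  PP.AllowLoopNestsPeeling = false;
  PP.PeelProfiledIterations = true;
}

// Sketch of the layering: target defaults first, then overrides passed in
// by the caller. The command-line (cl::opt) step of the real function is
// intentionally omitted here.
PeelingPreferences gatherPeelingPreferencesSketch(
    std::optional<bool> UserAllowPeeling,
    std::optional<bool> UserAllowProfileBasedPeeling) {
  PeelingPreferences PP;
  getDefaultPeelingPreferences(PP);
  if (UserAllowPeeling.has_value())
    PP.AllowPeeling = *UserAllowPeeling;
  if (UserAllowProfileBasedPeeling.has_value())
    PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling;
  return PP;
}
```

Under these assumptions, a transformation other than the unroller could call the gather function on its own and read `PP.PeelCount` without touching the unrolling preferences, which appears to be the point of the split.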
Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580 Added: Modified: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index 695b7d6061c0..b6698eefdb01 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -450,11 +450,6 @@ class TargetTransformInfo { /// transformation will select an unrolling factor based on the current cost /// threshold and other 
factors. unsigned Count; - /// A forced peeling factor (the number of bodied of the original loop - /// that should be peeled off before the loop body). When set to 0, the - /// unrolling transformation will select a peeling factor based on profile - /// information and other factors. - unsigned PeelCount; /// Default unroll count for loops with run-time trip count. unsigned DefaultUnrollRuntimeCount; // Set the maximum unrolling factor. The unrolling factor may be selected @@ -488,19 +483,10 @@ class TargetTransformInfo { bool Force; /// Allow using trip count upper bound to unroll loops. bool UpperBound; - /// Allow peeling off loop iterations. - bool AllowPeeling; - /// Allow peeling off loop iterations for loop nests. - bool AllowLoopNestsPeeling; /// Allow unrolling of all the iterations of the runtime loop remainder. bool UnrollRemainder; /// Allow unroll and jam. Used to enable unroll and jam for the target. bool UnrollAndJam; - /// Allow peeling basing on profile. Uses to enable peeling off all - /// iterations basing on provided profile. - /// If the value is true the peeling cost model can decide to peel only - /// some iterations and in this case it will set this to false. - bool PeelProfiledIterations; /// Threshold for unroll and jam, for inner loop size. The 'Threshold' /// value above is used during unroll and jam for the outer loop size. /// This value is used in the same manner to limit the size of the inner @@ -534,6 +520,28 @@ class TargetTransformInfo { /// intrinsic is supported. bool emitGetActiveLaneMask() const; + // Parameters that control the loop peeling transformation + struct PeelingPreferences { + /// A forced peeling factor (the number of bodied of the original loop + /// that should be peeled off before the loop body). When set to 0, the + /// a peeling factor based on profile information and other factors. + unsigned PeelCount; + /// Allow peeling off loop iterations. 
+ bool AllowPeeling; + /// Allow peeling off loop iterations for loop nests. + bool AllowLoopNestsPeeling; + /// Allow peeling basing on profile. Uses to enable peeling off all + /// iterations basing on provided profile. + /// If the value is true the peeling cost model can decide to peel only + /// some iterations and in this case it will set this to false. + bool PeelProfiledIterations; + }; + + /// Get target-customized preferences for the generic loop peeling + /// transformation. The caller will initialize \p PP with the current + /// target-independent defaults with information from \p L and \p SE. + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) const; /// @} /// \name Scalar Target Information @@ -1282,6 +1290,8 @@ class TargetTransformInfo::Concept { virtual bool isLoweredToCall(const Function *F) = 0; virtual void getUnrollingPreferences(Loop *L, ScalarEvolution &, UnrollingPreferences &UP) = 0; + virtual void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) = 0; virtual bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, @@ -1560,6 +1570,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept { UnrollingPreferences &UP) override { return Impl.getUnrollingPreferences(L, SE, UP); } + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) override { + return Impl.getPeelingPreferences(L, SE, PP); + } bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) override { diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index ca7106ab98aa..0ce975d6d4b5 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -150,6 +150,9 @@ class TargetTransformInfoImplBase { void 
getUnrollingPreferences(Loop *, ScalarEvolution &, TTI::UnrollingPreferences &) {} + void getPeelingPreferences(Loop *, ScalarEvolution &, + TTI::PeelingPreferences &) {} + bool isLegalAddImmediate(int64_t Imm) { return false; } bool isLegalICmpImmediate(int64_t Imm) { return false; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index c6a9a65ae6c1..f9d32eadd23e 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -451,6 +451,14 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { UP.BEInsns = 2; } + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + PP.PeelCount = 0; + PP.AllowPeeling = true; + PP.AllowLoopNestsPeeling = false; + PP.PeelProfiledIterations = true; + } + bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h index 1970cefcefba..bb3d02b95956 100644 --- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h +++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h @@ -94,6 +94,7 @@ bool UnrollRuntimeLoopRemainder( void computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE); bool canPeel(Loop *L); @@ -119,6 +120,8 @@ bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, + bool &UseUpperBound); void simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI, @@ -133,9 +136,13 @@ TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional 
UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling, - Optional UserFullUnrollMaxCount); + Optional UserUpperBound, Optional UserFullUnrollMaxCount); + +TargetTransformInfo::PeelingPreferences +gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, + const TargetTransformInfo &TTI, + Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling); unsigned ApproximateLoopSize(const Loop *L, unsigned &NumCalls, bool &NotDuplicatable, bool &Convergent, diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 87c6f83938ed..2f051e53790b 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -327,6 +327,11 @@ void TargetTransformInfo::getUnrollingPreferences( return TTIImpl->getUnrollingPreferences(L, SE, UP); } +void TargetTransformInfo::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) const { + return TTIImpl->getPeelingPreferences(L, SE, PP); +} + bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const { return TTIImpl->isLegalAddImmediate(Imm); } diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index be0c51b83a25..cf6de797727b 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -859,6 +859,11 @@ void AArch64TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, getFalkorUnrollingPreferences(L, SE, UP); } +void AArch64TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} + Value *AArch64TTIImpl::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType) { switch (Inst->getIntrinsicID()) { diff --git 
a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h index 27afb2e5a7d6..094b04c95db4 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h @@ -153,6 +153,9 @@ class AArch64TTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp index 24f079ffe929..46051ac14b59 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -236,6 +236,10 @@ void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, } } +void AMDGPUTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} unsigned GCNTTIImpl::getHardwareNumberOfRegisters(bool Vec) const { // The concept of vector registers doesn't really exist. Some packed vector // operations operate on the normal 32-bit registers. @@ -990,6 +994,11 @@ void GCNTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, CommonTTI.getUnrollingPreferences(L, SE, UP); } +void GCNTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + CommonTTI.getPeelingPreferences(L, SE, PP); +} + unsigned R600TTIImpl::getHardwareNumberOfRegisters(bool Vec) const { return 4 * 128; // XXX - 4 channels. Should these count as vector instead? 
} @@ -1096,3 +1105,8 @@ void R600TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { CommonTTI.getUnrollingPreferences(L, SE, UP); } + +void R600TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + CommonTTI.getPeelingPreferences(L, SE, PP); +} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h index 508ed061e935..b913f5194e40 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h @@ -61,6 +61,9 @@ class AMDGPUTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); }; class GCNTTIImpl final : public BasicTTIImplBase { @@ -141,6 +144,9 @@ class GCNTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); return TTI::PSK_FastHardware; @@ -258,6 +264,8 @@ class R600TTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); unsigned getHardwareNumberOfRegisters(bool Vec) const; unsigned getNumberOfRegisters(bool Vec) const; unsigned getRegisterBitWidth(bool Vector) const; diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index 44dfb9e8c129..74b1331216a0 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -1582,6 
+1582,11 @@ void ARMTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } +void ARMTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} + bool ARMTTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty, TTI::ReductionFlags Flags) const { return ST->hasMVEIntegerOps(); diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h index 5d914227c968..537a546361ee 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h @@ -251,6 +251,8 @@ class ARMTTIImpl : public BasicTTIImplBase { bool emitGetActiveLaneMask() const; + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); bool shouldBuildLookupTablesForConstant(Constant *C) const { // In the ROPI and RWPI relocation models we can't have pointers to global // variables or functions in constant data, so don't convert switches to diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp index 76df4e8e1931..80c8736cb74a 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp @@ -78,12 +78,17 @@ HexagonTTIImpl::getPopcntSupport(unsigned IntTyWidthInBit) const { void HexagonTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { UP.Runtime = UP.Partial = true; +} + +void HexagonTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); // Only try to peel innermost loops with small runtime trip counts. 
if (L && L->empty() && canPeel(L) && SE.getSmallConstantTripCount(L) == 0 && SE.getSmallConstantMaxTripCount(L) > 0 && SE.getSmallConstantMaxTripCount(L) <= 5) { - UP.PeelCount = 2; + PP.PeelCount = 2; } } diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h index 3365c5bf1cb1..5fe397486402 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h @@ -64,6 +64,9 @@ class HexagonTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + /// Bias LSR towards creating post-increment opportunities. bool shouldFavorPostInc() const; diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp index 5c14d0f1a24d..3873c73fb2e0 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp @@ -155,3 +155,8 @@ void NVPTXTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Partial = UP.Runtime = true; UP.PartialThreshold = UP.Threshold / 4; } + +void NVPTXTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h index 88156f687284..cb832031f1ad 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h @@ -95,6 +95,10 @@ class NVPTXTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) { // 
Volatile loads/stores are only supported for shared and global address // spaces, or for generic AS that maps to them. diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp index f2c746a14299..53556ffc267d 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp @@ -568,6 +568,10 @@ void PPCTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, BaseT::getUnrollingPreferences(L, SE, UP); } +void PPCTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} // This function returns true to allow using coldcc calling convention. // Returning true results in coldcc being used for functions which are cold at // all call sites when the callers of the functions are not calling any other diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h index b831789d3e6e..d998521084e1 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h @@ -66,6 +66,8 @@ class PPCTTIImpl : public BasicTTIImplBase { TargetLibraryInfo *LibInfo); void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp index 36141426e27d..864200e5f71c 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp @@ -294,6 +294,10 @@ void SystemZTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } +void SystemZTTIImpl::getPeelingPreferences(Loop *L, 
ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} bool SystemZTTIImpl::isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2) { diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h index d20541774da1..7f8f7f6f923f 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h @@ -50,6 +50,9 @@ class SystemZTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); /// @} diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp index f0ece1faa5fd..285cba6ee205 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp @@ -158,7 +158,8 @@ static bool computeUnrollAndJamCount( const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned OuterTripCount, unsigned OuterTripMultiple, unsigned OuterLoopSize, unsigned InnerTripCount, - unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP) { + unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP) { // First up use computeUnrollCount from the loop unroller to get a count // for unrolling the outer loop, plus any loops requiring explicit // unrolling we leave to the unroller. 
This uses UP.Threshold / @@ -168,7 +169,8 @@ static bool computeUnrollAndJamCount( bool UseUpperBound = false; bool ExplicitUnroll = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, ORE, OuterTripCount, MaxTripCount, - /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, UseUpperBound); + /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, PP, + UseUpperBound); if (ExplicitUnroll || UseUpperBound) { // If the user explicitly set the loop as unrolled, dont UnJ it. Leave it // for the unroller instead. @@ -282,7 +284,9 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, OptimizationRemarkEmitter &ORE, int OptLevel) { TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(L, SE, TTI, nullptr, nullptr, OptLevel, None, - None, None, None, None, None, None, None); + None, None, None, None, None); + TargetTransformInfo::PeelingPreferences PP = + gatherPeelingPreferences(L, SE, TTI, None, None); if (AllowUnrollAndJam.getNumOccurrences() > 0) UP.UnrollAndJam = AllowUnrollAndJam; if (UnrollAndJamThreshold.getNumOccurrences() > 0) @@ -367,7 +371,7 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, // Decide if, and by how much, to unroll bool IsCountSetExplicitly = computeUnrollAndJamCount( L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount, - OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP); + OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP, PP); if (UP.Count <= 1) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp index ec56610e41e5..88845cde8d4f 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp @@ -193,9 +193,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling, - Optional UserFullUnrollMaxCount) { + Optional UserUpperBound, Optional UserFullUnrollMaxCount) { TargetTransformInfo::UnrollingPreferences UP; // Set up the defaults @@ -206,7 +204,6 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.PartialThreshold = 150; UP.PartialOptSizeThreshold = 0; UP.Count = 0; - UP.PeelCount = 0; UP.DefaultUnrollRuntimeCount = 8; UP.MaxCount = std::numeric_limits::max(); UP.FullUnrollMaxCount = std::numeric_limits::max(); @@ -218,10 +215,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.AllowExpensiveTripCount = false; UP.Force = false; UP.UpperBound = false; - UP.AllowPeeling = true; - UP.AllowLoopNestsPeeling = false; UP.UnrollAndJam = false; - UP.PeelProfiledIterations = true; UP.UnrollAndJamInnerLoopThreshold = 60; UP.MaxIterationsCountToAnalyze = UnrollMaxIterationsCountToAnalyze; @@ -249,8 +243,6 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.MaxCount = UnrollMaxCount; if (UnrollFullMaxCount.getNumOccurrences() > 0) UP.FullUnrollMaxCount = UnrollFullMaxCount; - if (UnrollPeelCount.getNumOccurrences() > 0) - UP.PeelCount = UnrollPeelCount; if (UnrollAllowPartial.getNumOccurrences() > 0) UP.Partial = UnrollAllowPartial; if (UnrollAllowRemainder.getNumOccurrences() > 0) @@ -259,10 +251,6 @@ TargetTransformInfo::UnrollingPreferences 
llvm::gatherUnrollingPreferences( UP.Runtime = UnrollRuntime; if (UnrollMaxUpperBound == 0) UP.UpperBound = false; - if (UnrollAllowPeeling.getNumOccurrences() > 0) - UP.AllowPeeling = UnrollAllowPeeling; - if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) - UP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; if (UnrollUnrollRemainder.getNumOccurrences() > 0) UP.UnrollRemainder = UnrollUnrollRemainder; if (UnrollMaxIterationsCountToAnalyze.getNumOccurrences() > 0) @@ -281,16 +269,39 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.Runtime = *UserRuntime; if (UserUpperBound.hasValue()) UP.UpperBound = *UserUpperBound; - if (UserAllowPeeling.hasValue()) - UP.AllowPeeling = *UserAllowPeeling; - if (UserAllowProfileBasedPeeling.hasValue()) - UP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; if (UserFullUnrollMaxCount.hasValue()) UP.FullUnrollMaxCount = *UserFullUnrollMaxCount; return UP; } +TargetTransformInfo::PeelingPreferences +llvm::gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, + const TargetTransformInfo &TTI, + Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling) { + TargetTransformInfo::PeelingPreferences PP; + + // Get Target Specifc Values + TTI.getPeelingPreferences(L, SE, PP); + + // User Specified Values using cl::opt + if (UnrollPeelCount.getNumOccurrences() > 0) + PP.PeelCount = UnrollPeelCount; + if (UnrollAllowPeeling.getNumOccurrences() > 0) + PP.AllowPeeling = UnrollAllowPeeling; + if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) + PP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; + + // User Specifed values provided by argument + if (UserAllowPeeling.hasValue()) + PP.AllowPeeling = *UserAllowPeeling; + if (UserAllowProfileBasedPeeling.hasValue()) + PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; + + return PP; +} + namespace { /// A struct to densely store the state of an instruction after unrolling at @@ -761,7 +772,8 @@ bool 
llvm::computeUnrollCount( ScalarEvolution &SE, const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, - TargetTransformInfo::UnrollingPreferences &UP, bool &UseUpperBound) { + TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, bool &UseUpperBound) { // Check for explicit Count. // 1st priority is unroll count set by "unroll-count" option. @@ -863,8 +875,8 @@ bool llvm::computeUnrollCount( } // 4th priority is loop peeling. - computePeelCount(L, LoopSize, UP, TripCount, SE); - if (UP.PeelCount) { + computePeelCount(L, LoopSize, UP, PP, TripCount, SE); + if (PP.PeelCount) { UP.Runtime = false; UP.Count = 1; return ExplicitUnroll; @@ -1067,8 +1079,9 @@ static LoopUnrollResult tryToUnrollLoop( TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences( L, SE, TTI, BFI, PSI, OptLevel, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial, ProvidedRuntime, ProvidedUpperBound, - ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling, ProvidedFullUnrollMaxCount); + TargetTransformInfo::PeelingPreferences PP = gatherPeelingPreferences( + L, SE, TTI, ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling); // Exit early if unrolling is disabled. For OptForSize, we pick the loop size // as threshold later on. @@ -1142,7 +1155,7 @@ static LoopUnrollResult tryToUnrollLoop( bool UseUpperBound = false; bool IsCountSetExplicitly = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero, - TripMultiple, LoopSize, UP, UseUpperBound); + TripMultiple, LoopSize, UP, PP, UseUpperBound); if (!UP.Count) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
@@ -1157,7 +1170,7 @@ static LoopUnrollResult tryToUnrollLoop( LoopUnrollResult UnrollResult = UnrollLoop( L, {UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount, - UseUpperBound, MaxOrZero, TripMultiple, UP.PeelCount, UP.UnrollRemainder, + UseUpperBound, MaxOrZero, TripMultiple, PP.PeelCount, UP.UnrollRemainder, ForgetAllSCEV}, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop); if (UnrollResult == LoopUnrollResult::Unmodified) @@ -1189,7 +1202,7 @@ static LoopUnrollResult tryToUnrollLoop( // If the loop was peeled, we already "used up" the profile information // we had, so we don't want to unroll or peel again. if (UnrollResult != LoopUnrollResult::FullyUnrolled && - (IsCountSetExplicitly || (UP.PeelProfiledIterations && UP.PeelCount))) + (IsCountSetExplicitly || (PP.PeelProfiledIterations && PP.PeelCount))) L->setLoopAlreadyUnrolled(); return UnrollResult; diff --git a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp index 43dfaf3e50dc..c653aacbee6c 100644 --- a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp @@ -279,19 +279,20 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount, // Return the number of iterations we want to peel off. void llvm::computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE) { assert(LoopSize > 0 && "Zero loop size is not allowed!"); - // Save the UP.PeelCount value set by the target in - // TTI.getUnrollingPreferences or by the flag -unroll-peel-count. - unsigned TargetPeelCount = UP.PeelCount; - UP.PeelCount = 0; + // Save the PP.PeelCount value set by the target in + // TTI.getPeelingPreferences or by the flag -unroll-peel-count. + unsigned TargetPeelCount = PP.PeelCount; + PP.PeelCount = 0; if (!canPeel(L)) return; // Only try to peel innermost loops by default. 
// The constraint can be relaxed by the target in TTI.getUnrollingPreferences // or by the flag -unroll-allow-loop-nests-peeling. - if (!UP.AllowLoopNestsPeeling && !L->empty()) + if (!PP.AllowLoopNestsPeeling && !L->empty()) return; // If the user provided a peel count, use that. @@ -299,13 +300,13 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, if (UserPeelCount) { LLVM_DEBUG(dbgs() << "Force-peeling first " << UnrollForcePeelCount << " iterations.\n"); - UP.PeelCount = UnrollForcePeelCount; - UP.PeelProfiledIterations = true; + PP.PeelCount = UnrollForcePeelCount; + PP.PeelProfiledIterations = true; return; } // Skip peeling if it's disabled. - if (!UP.AllowPeeling) + if (!PP.AllowPeeling) return; unsigned AlreadyPeeled = 0; @@ -354,8 +355,8 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount << " iteration(s) to turn" << " some Phis into invariants.\n"); - UP.PeelCount = DesiredPeelCount; - UP.PeelProfiledIterations = false; + PP.PeelCount = DesiredPeelCount; + PP.PeelProfiledIterations = false; return; } } @@ -367,7 +368,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, return; // Do not apply profile base peeling if it is disabled. 
- if (!UP.PeelProfiledIterations) + if (!PP.PeelProfiledIterations) return; // If we don't know the trip count, but have reason to believe the average // trip count is low, peeling should be beneficial, since we will usually @@ -387,7 +388,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, (LoopSize * (*PeelCount + 1) <= UP.Threshold)) { LLVM_DEBUG(dbgs() << "Peeling first " << *PeelCount << " iterations.\n"); - UP.PeelCount = *PeelCount; + PP.PeelCount = *PeelCount; return; } LLVM_DEBUG(dbgs() << "Requested peel count: " << *PeelCount << "\n"); From llvm-commits at lists.llvm.org Wed Jul 8 11:56:24 2020 From: llvm-commits at lists.llvm.org (Anh Tuyen Tran via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:56:24 +0000 (UTC) Subject: [PATCH] D80580: [NFC] Separate Peeling Properties into its own struct In-Reply-To: References: Message-ID: <0a5155cc8988d3fb768eddb0b1b221e4@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGfead250b439b: [NFC] Separate Peeling Properties into its own struct (authored by anhtuyen). 
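The precedence that `gatherPeelingPreferences` applies in the diff above (target defaults first, then command-line flags, then values passed in by the caller) can be sketched without any LLVM dependency. This is only an illustration under stated assumptions: the `Flag` and `CommandLine` types are hypothetical stand-ins for `cl::opt` and its `getNumOccurrences()` check, `std::optional` stands in for `llvm::Optional`, and the struct below mirrors the four fields of `PeelingPreferences` rather than being the real type.

```cpp
#include <optional>

// Hypothetical mirror of the four fields of TTI::PeelingPreferences.
struct PeelingPreferences {
  unsigned PeelCount = 0;
  bool AllowPeeling = true;
  bool AllowLoopNestsPeeling = false;
  bool PeelProfiledIterations = true;
};

// Hypothetical stand-in for a cl::opt: the value plus whether the flag
// appeared on the command line (getNumOccurrences() > 0 in the patch).
template <typename T> struct Flag {
  T Value{};
  bool Seen = false;
};

// Hypothetical bundle of the three command-line flags the patch consults.
struct CommandLine {
  Flag<unsigned> UnrollPeelCount;
  Flag<bool> UnrollAllowPeeling;
  Flag<bool> UnrollAllowLoopNestsPeeling;
};

// Layering: target defaults < command-line flags < caller-provided values.
PeelingPreferences
gatherPeelingPreferences(const PeelingPreferences &TargetDefaults,
                         const CommandLine &CL,
                         std::optional<bool> UserAllowPeeling,
                         std::optional<bool> UserAllowProfileBasedPeeling) {
  // Start from what the target reported.
  PeelingPreferences PP = TargetDefaults;

  // Command-line flags override the target, but only if actually given.
  if (CL.UnrollPeelCount.Seen)
    PP.PeelCount = CL.UnrollPeelCount.Value;
  if (CL.UnrollAllowPeeling.Seen)
    PP.AllowPeeling = CL.UnrollAllowPeeling.Value;
  if (CL.UnrollAllowLoopNestsPeeling.Seen)
    PP.AllowLoopNestsPeeling = CL.UnrollAllowLoopNestsPeeling.Value;

  // Values provided by the caller have the final say.
  if (UserAllowPeeling)
    PP.AllowPeeling = *UserAllowPeeling;
  if (UserAllowProfileBasedPeeling)
    PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling;

  return PP;
}
```

With this layering, a caller that passes `false` for `UserAllowPeeling` disables peeling even when the allow-peeling flag was set to `true` on the command line, while a peel count given on the command line still replaces the target's value.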
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80580/new/ https://reviews.llvm.org/D80580 Files: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D80580.276508.patch Type: text/x-patch Size: 31069 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:57:04 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:57:04 +0000 (UTC) Subject: [PATCH] D82210: [SVE] Remove calls to VectorType::getNumElements from CodeGen In-Reply-To: References: Message-ID: <3958e1367945570838f00bfda2065f4d@localhost.localdomain> ctetreau added a comment. @sdesmalen I went ahead and made the requested changes. 
In ValueType.cpp, rather than leaving the code alone, I went ahead and returned getElementCount().Min along with adding a warning if the vector was scalable. This way getNumElements() can be removed in the future without having to touch this file again. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82210/new/ https://reviews.llvm.org/D82210 From llvm-commits at lists.llvm.org Wed Jul 8 11:58:09 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:58:09 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: <9817bc51f68f01eab5d152026017e7f0@localhost.localdomain> eugenis accepted this revision. eugenis added a comment. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Wed Jul 8 11:59:14 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:59:14 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <58851bf31a70510b9c88c8f1e8417ade@localhost.localdomain> lebedev.ri added a comment. There's also `llvm::for_each()` with 40 uses. Can you please quote the specific part of whatever documentation you believe dictates this? If there isn't one, I'd like to see a `ProgrammersManual` patch. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 11:59:56 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:59:56 +0000 (UTC) Subject: [PATCH] D83413: Tighten description of ISD::BUILD_VECTOR In-Reply-To: References: Message-ID: cameron.mcinally updated this revision to Diff 276509. cameron.mcinally added a comment. Remove power-of-2 requirement. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83413/new/ https://reviews.llvm.org/D83413 Files: llvm/include/llvm/CodeGen/ISDOpcodes.h Index: llvm/include/llvm/CodeGen/ISDOpcodes.h =================================================================== --- llvm/include/llvm/CodeGen/ISDOpcodes.h +++ llvm/include/llvm/CodeGen/ISDOpcodes.h @@ -448,11 +448,11 @@ FCANONICALIZE, /// BUILD_VECTOR(ELT0, ELT1, ELT2, ELT3,...) - Return a fixed-width vector - /// with the specified, possibly variable, elements. The number of elements - /// is required to be a power of two. The types of the operands must all be - /// the same and must match the vector element type, except that integer types - /// are allowed to be larger than the element type, in which case the operands - /// are implicitly truncated. + /// with the specified, possibly variable, elements. The types of the + /// operands must all be the same. The types of the operands must match the + /// vector element type, except that integer types are allowed to be larger + /// than the element type, in which case the operands are implicitly + /// truncated. BUILD_VECTOR, /// INSERT_VECTOR_ELT(VECTOR, VAL, IDX) - Returns VECTOR with the element -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83413.276509.patch Type: text/x-patch Size: 1093 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 11:59:58 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 18:59:58 +0000 (UTC) Subject: [PATCH] D83392: Strlen loop idiom recognition In-Reply-To: References: Message-ID: efriedma added a comment. I don't like the way the pattern-matching is written here; analyzing the exact instruction patterns is complicated, error-prone, and likely to miss cases. I think I'd structure the analysis like this: 1. Is the latch condition of the form "%loadedval = load i8, i8* %p; icmp eq i8 %loadedval, 0" 2. Query SCEV to check %p is an AddRec with step 1. 3. Query SCEV to check all the LCSSA PHI nodes are AddRecs. 4. Check that the loop doesn't contain any operations with side-effects. Then to apply the transform, you take the SCEV for the LCSSA PHI node, transform the AddRec to an Add (for example, `{n,+,1}` to `n + strlen(p)`), then use SCEVExpander to expand it. Leveraging SCEV like this will make the transform a lot more flexible and easier to read. ================ Comment at: llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1606 + if (LoopBody->size() > MaxLoopSize) + return false; + ---------------- "BasicBlock::size()" should never be used as a threshold; it's sensitive to debug info. 
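The rewrite described in the comment above, where the exit value of an AddRec such as `{n,+,1}` becomes `n + strlen(p)`, can be illustrated at the source level. This is a plain C++ sketch and not the LLVM implementation: it uses no SCEV or SCEVExpander calls, and the function names `countWithLoop` and `countWithStrlen` are hypothetical. It only shows why the two forms agree when the loop has the shape the four checks look for (a latch that tests a loaded byte against zero, a pointer that advances by one, and no side effects).

```cpp
#include <cstddef>
#include <cstring>

// Loop shape the analysis would accept: P is the recurrence {P,+,1},
// N is the recurrence {N,+,1}, and the loop exits when the loaded byte is 0.
// The body has no side effects beyond updating P and N.
std::size_t countWithLoop(const char *P, std::size_t N) {
  while (*P != 0) {
    ++P;
    ++N;
  }
  return N; // value seen by the LCSSA PHI after the loop
}

// Same result with the recurrence replaced by a sum: the loop runs
// strlen(P) times, so {N,+,1} evaluates to N + strlen(P) at the exit.
std::size_t countWithStrlen(const char *P, std::size_t N) {
  return N + std::strlen(P);
}
```

Both functions return the same value for any NUL-terminated string and any starting count; for example, a starting count of 3 and the string "hello" give 8 in either form.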
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83392/new/ https://reviews.llvm.org/D83392 From llvm-commits at lists.llvm.org Wed Jul 8 12:00:19 2020 From: llvm-commits at lists.llvm.org (Anh Tuyen Tran via llvm-commits) Date: Wed, 08 Jul 2020 12:00:19 -0700 (PDT) Subject: [llvm] 6965af4 - Revert "[NFC] Separate Peeling Properties into its own struct" Message-ID: <5f0617c3.1c69fb81.3d0d9.1747@mx.google.com> Author: Anh Tuyen Tran Date: 2020-07-08T18:58:05Z New Revision: 6965af43e6b83fda2c32663f55b1568ffe6d67f9 URL: https://github.com/llvm/llvm-project/commit/6965af43e6b83fda2c32663f55b1568ffe6d67f9 DIFF: https://github.com/llvm/llvm-project/commit/6965af43e6b83fda2c32663f55b1568ffe6d67f9.diff LOG: Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit fead250b439bbd4ec0f21e6a52d0c174e5fcdf5a. Added: Modified: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp Removed: 
################################################################################ diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index b6698eefdb01..695b7d6061c0 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -450,6 +450,11 @@ class TargetTransformInfo { /// transformation will select an unrolling factor based on the current cost /// threshold and other factors. unsigned Count; + /// A forced peeling factor (the number of bodied of the original loop + /// that should be peeled off before the loop body). When set to 0, the + /// unrolling transformation will select a peeling factor based on profile + /// information and other factors. + unsigned PeelCount; /// Default unroll count for loops with run-time trip count. unsigned DefaultUnrollRuntimeCount; // Set the maximum unrolling factor. The unrolling factor may be selected @@ -483,10 +488,19 @@ class TargetTransformInfo { bool Force; /// Allow using trip count upper bound to unroll loops. bool UpperBound; + /// Allow peeling off loop iterations. + bool AllowPeeling; + /// Allow peeling off loop iterations for loop nests. + bool AllowLoopNestsPeeling; /// Allow unrolling of all the iterations of the runtime loop remainder. bool UnrollRemainder; /// Allow unroll and jam. Used to enable unroll and jam for the target. bool UnrollAndJam; + /// Allow peeling basing on profile. Uses to enable peeling off all + /// iterations basing on provided profile. + /// If the value is true the peeling cost model can decide to peel only + /// some iterations and in this case it will set this to false. + bool PeelProfiledIterations; /// Threshold for unroll and jam, for inner loop size. The 'Threshold' /// value above is used during unroll and jam for the outer loop size. 
/// This value is used in the same manner to limit the size of the inner @@ -520,28 +534,6 @@ class TargetTransformInfo { /// intrinsic is supported. bool emitGetActiveLaneMask() const; - // Parameters that control the loop peeling transformation - struct PeelingPreferences { - /// A forced peeling factor (the number of bodied of the original loop - /// that should be peeled off before the loop body). When set to 0, the - /// a peeling factor based on profile information and other factors. - unsigned PeelCount; - /// Allow peeling off loop iterations. - bool AllowPeeling; - /// Allow peeling off loop iterations for loop nests. - bool AllowLoopNestsPeeling; - /// Allow peeling basing on profile. Uses to enable peeling off all - /// iterations basing on provided profile. - /// If the value is true the peeling cost model can decide to peel only - /// some iterations and in this case it will set this to false. - bool PeelProfiledIterations; - }; - - /// Get target-customized preferences for the generic loop peeling - /// transformation. The caller will initialize \p PP with the current - /// target-independent defaults with information from \p L and \p SE. 
- void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) const; /// @} /// \name Scalar Target Information @@ -1290,8 +1282,6 @@ class TargetTransformInfo::Concept { virtual bool isLoweredToCall(const Function *F) = 0; virtual void getUnrollingPreferences(Loop *L, ScalarEvolution &, UnrollingPreferences &UP) = 0; - virtual void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) = 0; virtual bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, @@ -1570,10 +1560,6 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept { UnrollingPreferences &UP) override { return Impl.getUnrollingPreferences(L, SE, UP); } - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) override { - return Impl.getPeelingPreferences(L, SE, PP); - } bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) override { diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index 0ce975d6d4b5..ca7106ab98aa 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -150,9 +150,6 @@ class TargetTransformInfoImplBase { void getUnrollingPreferences(Loop *, ScalarEvolution &, TTI::UnrollingPreferences &) {} - void getPeelingPreferences(Loop *, ScalarEvolution &, - TTI::PeelingPreferences &) {} - bool isLegalAddImmediate(int64_t Imm) { return false; } bool isLegalICmpImmediate(int64_t Imm) { return false; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index f9d32eadd23e..c6a9a65ae6c1 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -451,14 +451,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { UP.BEInsns 
= 2; } - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - PP.PeelCount = 0; - PP.AllowPeeling = true; - PP.AllowLoopNestsPeeling = false; - PP.PeelProfiledIterations = true; - } - bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h index bb3d02b95956..1970cefcefba 100644 --- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h +++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h @@ -94,7 +94,6 @@ bool UnrollRuntimeLoopRemainder( void computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE); bool canPeel(Loop *L); @@ -120,8 +119,6 @@ bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, - bool &UseUpperBound); void simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI, @@ -136,13 +133,9 @@ TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserFullUnrollMaxCount); - -TargetTransformInfo::PeelingPreferences -gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, - const TargetTransformInfo &TTI, - Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling); + Optional UserUpperBound, Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling, + Optional UserFullUnrollMaxCount); unsigned ApproximateLoopSize(const Loop *L, unsigned &NumCalls, bool &NotDuplicatable, bool &Convergent, diff --git 
a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 2f051e53790b..87c6f83938ed 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -327,11 +327,6 @@ void TargetTransformInfo::getUnrollingPreferences( return TTIImpl->getUnrollingPreferences(L, SE, UP); } -void TargetTransformInfo::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) const { - return TTIImpl->getPeelingPreferences(L, SE, PP); -} - bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const { return TTIImpl->isLegalAddImmediate(Imm); } diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index cf6de797727b..be0c51b83a25 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -859,11 +859,6 @@ void AArch64TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, getFalkorUnrollingPreferences(L, SE, UP); } -void AArch64TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} - Value *AArch64TTIImpl::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType) { switch (Inst->getIntrinsicID()) { diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h index 094b04c95db4..27afb2e5a7d6 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h @@ -153,9 +153,6 @@ class AArch64TTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType); diff --git 
a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp index 46051ac14b59..24f079ffe929 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -236,10 +236,6 @@ void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, } } -void AMDGPUTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} unsigned GCNTTIImpl::getHardwareNumberOfRegisters(bool Vec) const { // The concept of vector registers doesn't really exist. Some packed vector // operations operate on the normal 32-bit registers. @@ -994,11 +990,6 @@ void GCNTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, CommonTTI.getUnrollingPreferences(L, SE, UP); } -void GCNTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - CommonTTI.getPeelingPreferences(L, SE, PP); -} - unsigned R600TTIImpl::getHardwareNumberOfRegisters(bool Vec) const { return 4 * 128; // XXX - 4 channels. Should these count as vector instead? 
} @@ -1105,8 +1096,3 @@ void R600TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { CommonTTI.getUnrollingPreferences(L, SE, UP); } - -void R600TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - CommonTTI.getPeelingPreferences(L, SE, PP); -} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h index b913f5194e40..508ed061e935 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h @@ -61,9 +61,6 @@ class AMDGPUTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); }; class GCNTTIImpl final : public BasicTTIImplBase { @@ -144,9 +141,6 @@ class GCNTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); return TTI::PSK_FastHardware; @@ -264,8 +258,6 @@ class R600TTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); unsigned getHardwareNumberOfRegisters(bool Vec) const; unsigned getNumberOfRegisters(bool Vec) const; unsigned getRegisterBitWidth(bool Vector) const; diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index 74b1331216a0..44dfb9e8c129 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -1582,11 
+1582,6 @@ void ARMTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } -void ARMTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} - bool ARMTTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty, TTI::ReductionFlags Flags) const { return ST->hasMVEIntegerOps(); diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h index 537a546361ee..5d914227c968 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h @@ -251,8 +251,6 @@ class ARMTTIImpl : public BasicTTIImplBase { bool emitGetActiveLaneMask() const; - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); bool shouldBuildLookupTablesForConstant(Constant *C) const { // In the ROPI and RWPI relocation models we can't have pointers to global // variables or functions in constant data, so don't convert switches to diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp index 80c8736cb74a..76df4e8e1931 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp @@ -78,17 +78,12 @@ HexagonTTIImpl::getPopcntSupport(unsigned IntTyWidthInBit) const { void HexagonTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { UP.Runtime = UP.Partial = true; -} - -void HexagonTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); // Only try to peel innermost loops with small runtime trip counts. 
if (L && L->empty() && canPeel(L) && SE.getSmallConstantTripCount(L) == 0 && SE.getSmallConstantMaxTripCount(L) > 0 && SE.getSmallConstantMaxTripCount(L) <= 5) { - PP.PeelCount = 2; + UP.PeelCount = 2; } } diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h index 5fe397486402..3365c5bf1cb1 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h @@ -64,9 +64,6 @@ class HexagonTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - /// Bias LSR towards creating post-increment opportunities. bool shouldFavorPostInc() const; diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp index 3873c73fb2e0..5c14d0f1a24d 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp @@ -155,8 +155,3 @@ void NVPTXTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Partial = UP.Runtime = true; UP.PartialThreshold = UP.Threshold / 4; } - -void NVPTXTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h index cb832031f1ad..88156f687284 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h @@ -95,10 +95,6 @@ class NVPTXTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) { // 
Volatile loads/stores are only supported for shared and global address // spaces, or for generic AS that maps to them. diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp index 53556ffc267d..f2c746a14299 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp @@ -568,10 +568,6 @@ void PPCTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, BaseT::getUnrollingPreferences(L, SE, UP); } -void PPCTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} // This function returns true to allow using coldcc calling convention. // Returning true results in coldcc being used for functions which are cold at // all call sites when the callers of the functions are not calling any other diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h index d998521084e1..b831789d3e6e 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h @@ -66,8 +66,6 @@ class PPCTTIImpl : public BasicTTIImplBase { TargetLibraryInfo *LibInfo); void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp index 864200e5f71c..36141426e27d 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp @@ -294,10 +294,6 @@ void SystemZTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } -void SystemZTTIImpl::getPeelingPreferences(Loop *L, 
ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} bool SystemZTTIImpl::isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2) { diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h index 7f8f7f6f923f..d20541774da1 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h @@ -50,9 +50,6 @@ class SystemZTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); /// @} diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp index 285cba6ee205..f0ece1faa5fd 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp @@ -158,8 +158,7 @@ static bool computeUnrollAndJamCount( const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned OuterTripCount, unsigned OuterTripMultiple, unsigned OuterLoopSize, unsigned InnerTripCount, - unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP) { + unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP) { // First up use computeUnrollCount from the loop unroller to get a count // for unrolling the outer loop, plus any loops requiring explicit // unrolling we leave to the unroller. 
This uses UP.Threshold / @@ -169,8 +168,7 @@ static bool computeUnrollAndJamCount( bool UseUpperBound = false; bool ExplicitUnroll = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, ORE, OuterTripCount, MaxTripCount, - /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, PP, - UseUpperBound); + /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, UseUpperBound); if (ExplicitUnroll || UseUpperBound) { // If the user explicitly set the loop as unrolled, dont UnJ it. Leave it // for the unroller instead. @@ -284,9 +282,7 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, OptimizationRemarkEmitter &ORE, int OptLevel) { TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(L, SE, TTI, nullptr, nullptr, OptLevel, None, - None, None, None, None, None); - TargetTransformInfo::PeelingPreferences PP = - gatherPeelingPreferences(L, SE, TTI, None, None); + None, None, None, None, None, None, None); if (AllowUnrollAndJam.getNumOccurrences() > 0) UP.UnrollAndJam = AllowUnrollAndJam; if (UnrollAndJamThreshold.getNumOccurrences() > 0) @@ -371,7 +367,7 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, // Decide if, and by how much, to unroll bool IsCountSetExplicitly = computeUnrollAndJamCount( L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount, - OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP, PP); + OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP); if (UP.Count <= 1) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp index 88845cde8d4f..ec56610e41e5 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp @@ -193,7 +193,9 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserFullUnrollMaxCount) { + Optional UserUpperBound, Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling, + Optional UserFullUnrollMaxCount) { TargetTransformInfo::UnrollingPreferences UP; // Set up the defaults @@ -204,6 +206,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.PartialThreshold = 150; UP.PartialOptSizeThreshold = 0; UP.Count = 0; + UP.PeelCount = 0; UP.DefaultUnrollRuntimeCount = 8; UP.MaxCount = std::numeric_limits::max(); UP.FullUnrollMaxCount = std::numeric_limits::max(); @@ -215,7 +218,10 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.AllowExpensiveTripCount = false; UP.Force = false; UP.UpperBound = false; + UP.AllowPeeling = true; + UP.AllowLoopNestsPeeling = false; UP.UnrollAndJam = false; + UP.PeelProfiledIterations = true; UP.UnrollAndJamInnerLoopThreshold = 60; UP.MaxIterationsCountToAnalyze = UnrollMaxIterationsCountToAnalyze; @@ -243,6 +249,8 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.MaxCount = UnrollMaxCount; if (UnrollFullMaxCount.getNumOccurrences() > 0) UP.FullUnrollMaxCount = UnrollFullMaxCount; + if (UnrollPeelCount.getNumOccurrences() > 0) + UP.PeelCount = UnrollPeelCount; if (UnrollAllowPartial.getNumOccurrences() > 0) UP.Partial = UnrollAllowPartial; if (UnrollAllowRemainder.getNumOccurrences() > 0) @@ -251,6 +259,10 @@ TargetTransformInfo::UnrollingPreferences 
llvm::gatherUnrollingPreferences( UP.Runtime = UnrollRuntime; if (UnrollMaxUpperBound == 0) UP.UpperBound = false; + if (UnrollAllowPeeling.getNumOccurrences() > 0) + UP.AllowPeeling = UnrollAllowPeeling; + if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) + UP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; if (UnrollUnrollRemainder.getNumOccurrences() > 0) UP.UnrollRemainder = UnrollUnrollRemainder; if (UnrollMaxIterationsCountToAnalyze.getNumOccurrences() > 0) @@ -269,39 +281,16 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.Runtime = *UserRuntime; if (UserUpperBound.hasValue()) UP.UpperBound = *UserUpperBound; + if (UserAllowPeeling.hasValue()) + UP.AllowPeeling = *UserAllowPeeling; + if (UserAllowProfileBasedPeeling.hasValue()) + UP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; if (UserFullUnrollMaxCount.hasValue()) UP.FullUnrollMaxCount = *UserFullUnrollMaxCount; return UP; } -TargetTransformInfo::PeelingPreferences -llvm::gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, - const TargetTransformInfo &TTI, - Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling) { - TargetTransformInfo::PeelingPreferences PP; - - // Get Target Specifc Values - TTI.getPeelingPreferences(L, SE, PP); - - // User Specified Values using cl::opt - if (UnrollPeelCount.getNumOccurrences() > 0) - PP.PeelCount = UnrollPeelCount; - if (UnrollAllowPeeling.getNumOccurrences() > 0) - PP.AllowPeeling = UnrollAllowPeeling; - if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) - PP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; - - // User Specifed values provided by argument - if (UserAllowPeeling.hasValue()) - PP.AllowPeeling = *UserAllowPeeling; - if (UserAllowProfileBasedPeeling.hasValue()) - PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; - - return PP; -} - namespace { /// A struct to densely store the state of an instruction after unrolling at @@ -772,8 +761,7 @@ bool 
llvm::computeUnrollCount( ScalarEvolution &SE, const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, - TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, bool &UseUpperBound) { + TargetTransformInfo::UnrollingPreferences &UP, bool &UseUpperBound) { // Check for explicit Count. // 1st priority is unroll count set by "unroll-count" option. @@ -875,8 +863,8 @@ bool llvm::computeUnrollCount( } // 4th priority is loop peeling. - computePeelCount(L, LoopSize, UP, PP, TripCount, SE); - if (PP.PeelCount) { + computePeelCount(L, LoopSize, UP, TripCount, SE); + if (UP.PeelCount) { UP.Runtime = false; UP.Count = 1; return ExplicitUnroll; @@ -1079,9 +1067,8 @@ static LoopUnrollResult tryToUnrollLoop( TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences( L, SE, TTI, BFI, PSI, OptLevel, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial, ProvidedRuntime, ProvidedUpperBound, + ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling, ProvidedFullUnrollMaxCount); - TargetTransformInfo::PeelingPreferences PP = gatherPeelingPreferences( - L, SE, TTI, ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling); // Exit early if unrolling is disabled. For OptForSize, we pick the loop size // as threshold later on. @@ -1155,7 +1142,7 @@ static LoopUnrollResult tryToUnrollLoop( bool UseUpperBound = false; bool IsCountSetExplicitly = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero, - TripMultiple, LoopSize, UP, PP, UseUpperBound); + TripMultiple, LoopSize, UP, UseUpperBound); if (!UP.Count) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
@@ -1170,7 +1157,7 @@ static LoopUnrollResult tryToUnrollLoop( LoopUnrollResult UnrollResult = UnrollLoop( L, {UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount, - UseUpperBound, MaxOrZero, TripMultiple, PP.PeelCount, UP.UnrollRemainder, + UseUpperBound, MaxOrZero, TripMultiple, UP.PeelCount, UP.UnrollRemainder, ForgetAllSCEV}, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop); if (UnrollResult == LoopUnrollResult::Unmodified) @@ -1202,7 +1189,7 @@ static LoopUnrollResult tryToUnrollLoop( // If the loop was peeled, we already "used up" the profile information // we had, so we don't want to unroll or peel again. if (UnrollResult != LoopUnrollResult::FullyUnrolled && - (IsCountSetExplicitly || (PP.PeelProfiledIterations && PP.PeelCount))) + (IsCountSetExplicitly || (UP.PeelProfiledIterations && UP.PeelCount))) L->setLoopAlreadyUnrolled(); return UnrollResult; diff --git a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp index c653aacbee6c..43dfaf3e50dc 100644 --- a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp @@ -279,20 +279,19 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount, // Return the number of iterations we want to peel off. void llvm::computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE) { assert(LoopSize > 0 && "Zero loop size is not allowed!"); - // Save the PP.PeelCount value set by the target in - // TTI.getPeelingPreferences or by the flag -unroll-peel-count. - unsigned TargetPeelCount = PP.PeelCount; - PP.PeelCount = 0; + // Save the UP.PeelCount value set by the target in + // TTI.getUnrollingPreferences or by the flag -unroll-peel-count. + unsigned TargetPeelCount = UP.PeelCount; + UP.PeelCount = 0; if (!canPeel(L)) return; // Only try to peel innermost loops by default. 
// The constraint can be relaxed by the target in TTI.getUnrollingPreferences // or by the flag -unroll-allow-loop-nests-peeling. - if (!PP.AllowLoopNestsPeeling && !L->empty()) + if (!UP.AllowLoopNestsPeeling && !L->empty()) return; // If the user provided a peel count, use that. @@ -300,13 +299,13 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, if (UserPeelCount) { LLVM_DEBUG(dbgs() << "Force-peeling first " << UnrollForcePeelCount << " iterations.\n"); - PP.PeelCount = UnrollForcePeelCount; - PP.PeelProfiledIterations = true; + UP.PeelCount = UnrollForcePeelCount; + UP.PeelProfiledIterations = true; return; } // Skip peeling if it's disabled. - if (!PP.AllowPeeling) + if (!UP.AllowPeeling) return; unsigned AlreadyPeeled = 0; @@ -355,8 +354,8 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount << " iteration(s) to turn" << " some Phis into invariants.\n"); - PP.PeelCount = DesiredPeelCount; - PP.PeelProfiledIterations = false; + UP.PeelCount = DesiredPeelCount; + UP.PeelProfiledIterations = false; return; } } @@ -368,7 +367,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, return; // Do not apply profile base peeling if it is disabled. 
- if (!PP.PeelProfiledIterations) + if (!UP.PeelProfiledIterations) return; // If we don't know the trip count, but have reason to believe the average // trip count is low, peeling should be beneficial, since we will usually @@ -388,7 +387,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, (LoopSize * (*PeelCount + 1) <= UP.Threshold)) { LLVM_DEBUG(dbgs() << "Peeling first " << *PeelCount << " iterations.\n"); - PP.PeelCount = *PeelCount; + UP.PeelCount = *PeelCount; return; } LLVM_DEBUG(dbgs() << "Requested peel count: " << *PeelCount << "\n"); From llvm-commits at lists.llvm.org Wed Jul 8 12:00:22 2020 From: llvm-commits at lists.llvm.org (Anh Tuyen Tran via llvm-commits) Date: Wed, 08 Jul 2020 12:00:22 -0700 (PDT) Subject: [llvm] 0369dc9 - [NFC] Separate Peeling Properties into its own struct Message-ID: <5f0617c6.1c69fb81.85b5.17f2@mx.google.com> Author: Sidharth Baveja Date: 2020-07-08T18:59:59Z New Revision: 0369dc98f958a1ca2ec05f1897f091129bb16e8a URL: https://github.com/llvm/llvm-project/commit/0369dc98f958a1ca2ec05f1897f091129bb16e8a DIFF: https://github.com/llvm/llvm-project/commit/0369dc98f958a1ca2ec05f1897f091129bb16e8a.diff LOG: [NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. 
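For readers skimming the diff, the shape of the change can be sketched roughly as follows. This is a simplified, standalone illustration and not the code from the patch: `BasicTarget` is a hypothetical stand-in for a TTI implementation, the `Loop` and `ScalarEvolution` parameters are dropped, `std::optional` replaces `llvm::Optional`, and the `cl::opt` command-line overrides (`-unroll-peel-count` and friends) are omitted. The field names and default values follow the `PeelingPreferences` struct and the `BasicTTIImplBase` defaults shown in the diff.

```cpp
#include <optional>

// Simplified stand-in for TargetTransformInfo::PeelingPreferences.
// Field names follow the struct introduced by the patch.
struct PeelingPreferences {
  unsigned PeelCount;          // forced peel count; 0 lets the cost model decide
  bool AllowPeeling;           // allow peeling off loop iterations
  bool AllowLoopNestsPeeling;  // allow peeling for loop nests
  bool PeelProfiledIterations; // allow profile-based peeling
};

// Hypothetical target (assumption, not an LLVM class). The defaults mirror
// what BasicTTIImplBase::getPeelingPreferences sets in the diff.
struct BasicTarget {
  void getPeelingPreferences(PeelingPreferences &PP) const {
    PP.PeelCount = 0;
    PP.AllowPeeling = true;
    PP.AllowLoopNestsPeeling = false;
    PP.PeelProfiledIterations = true;
  }
};

// Sketch of the gather step: target-specific values first, then values the
// caller passed explicitly. The real function also applies cl::opt flags
// between these two steps; that part is left out here.
PeelingPreferences
gatherPeelingPreferences(const BasicTarget &TTI,
                         std::optional<bool> UserAllowPeeling,
                         std::optional<bool> UserAllowProfileBasedPeeling) {
  PeelingPreferences PP;
  TTI.getPeelingPreferences(PP);

  if (UserAllowPeeling)
    PP.AllowPeeling = *UserAllowPeeling;
  if (UserAllowProfileBasedPeeling)
    PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling;

  return PP;
}
```

Under these assumptions, a caller that only wants to switch off profile-based peeling would call `gatherPeelingPreferences(TTI, std::nullopt, false)` and get the target defaults for every other field. Keeping the peeling fields out of `UnrollingPreferences` is what lets a pass consult them without gathering the unrolling preferences as well.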
Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580 Added: Modified: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index 695b7d6061c0..b6698eefdb01 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -450,11 +450,6 @@ class TargetTransformInfo { /// transformation will select an unrolling factor based on the current cost /// threshold and other 
factors. unsigned Count; - /// A forced peeling factor (the number of bodied of the original loop - /// that should be peeled off before the loop body). When set to 0, the - /// unrolling transformation will select a peeling factor based on profile - /// information and other factors. - unsigned PeelCount; /// Default unroll count for loops with run-time trip count. unsigned DefaultUnrollRuntimeCount; // Set the maximum unrolling factor. The unrolling factor may be selected @@ -488,19 +483,10 @@ class TargetTransformInfo { bool Force; /// Allow using trip count upper bound to unroll loops. bool UpperBound; - /// Allow peeling off loop iterations. - bool AllowPeeling; - /// Allow peeling off loop iterations for loop nests. - bool AllowLoopNestsPeeling; /// Allow unrolling of all the iterations of the runtime loop remainder. bool UnrollRemainder; /// Allow unroll and jam. Used to enable unroll and jam for the target. bool UnrollAndJam; - /// Allow peeling basing on profile. Uses to enable peeling off all - /// iterations basing on provided profile. - /// If the value is true the peeling cost model can decide to peel only - /// some iterations and in this case it will set this to false. - bool PeelProfiledIterations; /// Threshold for unroll and jam, for inner loop size. The 'Threshold' /// value above is used during unroll and jam for the outer loop size. /// This value is used in the same manner to limit the size of the inner @@ -534,6 +520,28 @@ class TargetTransformInfo { /// intrinsic is supported. bool emitGetActiveLaneMask() const; + // Parameters that control the loop peeling transformation + struct PeelingPreferences { + /// A forced peeling factor (the number of bodied of the original loop + /// that should be peeled off before the loop body). When set to 0, the + /// a peeling factor based on profile information and other factors. + unsigned PeelCount; + /// Allow peeling off loop iterations. 
+ bool AllowPeeling; + /// Allow peeling off loop iterations for loop nests. + bool AllowLoopNestsPeeling; + /// Allow peeling basing on profile. Uses to enable peeling off all + /// iterations basing on provided profile. + /// If the value is true the peeling cost model can decide to peel only + /// some iterations and in this case it will set this to false. + bool PeelProfiledIterations; + }; + + /// Get target-customized preferences for the generic loop peeling + /// transformation. The caller will initialize \p PP with the current + /// target-independent defaults with information from \p L and \p SE. + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) const; /// @} /// \name Scalar Target Information @@ -1282,6 +1290,8 @@ class TargetTransformInfo::Concept { virtual bool isLoweredToCall(const Function *F) = 0; virtual void getUnrollingPreferences(Loop *L, ScalarEvolution &, UnrollingPreferences &UP) = 0; + virtual void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) = 0; virtual bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, @@ -1560,6 +1570,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept { UnrollingPreferences &UP) override { return Impl.getUnrollingPreferences(L, SE, UP); } + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) override { + return Impl.getPeelingPreferences(L, SE, PP); + } bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) override { diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index ca7106ab98aa..0ce975d6d4b5 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -150,6 +150,9 @@ class TargetTransformInfoImplBase { void 
getUnrollingPreferences(Loop *, ScalarEvolution &, TTI::UnrollingPreferences &) {} + void getPeelingPreferences(Loop *, ScalarEvolution &, + TTI::PeelingPreferences &) {} + bool isLegalAddImmediate(int64_t Imm) { return false; } bool isLegalICmpImmediate(int64_t Imm) { return false; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index c6a9a65ae6c1..f9d32eadd23e 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -451,6 +451,14 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { UP.BEInsns = 2; } + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + PP.PeelCount = 0; + PP.AllowPeeling = true; + PP.AllowLoopNestsPeeling = false; + PP.PeelProfiledIterations = true; + } + bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h index 1970cefcefba..bb3d02b95956 100644 --- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h +++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h @@ -94,6 +94,7 @@ bool UnrollRuntimeLoopRemainder( void computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE); bool canPeel(Loop *L); @@ -119,6 +120,8 @@ bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, + bool &UseUpperBound); void simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI, @@ -133,9 +136,13 @@ TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional 
UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling, - Optional UserFullUnrollMaxCount); + Optional UserUpperBound, Optional UserFullUnrollMaxCount); + +TargetTransformInfo::PeelingPreferences +gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, + const TargetTransformInfo &TTI, + Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling); unsigned ApproximateLoopSize(const Loop *L, unsigned &NumCalls, bool &NotDuplicatable, bool &Convergent, diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 87c6f83938ed..2f051e53790b 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -327,6 +327,11 @@ void TargetTransformInfo::getUnrollingPreferences( return TTIImpl->getUnrollingPreferences(L, SE, UP); } +void TargetTransformInfo::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) const { + return TTIImpl->getPeelingPreferences(L, SE, PP); +} + bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const { return TTIImpl->isLegalAddImmediate(Imm); } diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index be0c51b83a25..cf6de797727b 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -859,6 +859,11 @@ void AArch64TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, getFalkorUnrollingPreferences(L, SE, UP); } +void AArch64TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} + Value *AArch64TTIImpl::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType) { switch (Inst->getIntrinsicID()) { diff --git 
a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h index 27afb2e5a7d6..094b04c95db4 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h @@ -153,6 +153,9 @@ class AArch64TTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp index 24f079ffe929..46051ac14b59 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -236,6 +236,10 @@ void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, } } +void AMDGPUTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} unsigned GCNTTIImpl::getHardwareNumberOfRegisters(bool Vec) const { // The concept of vector registers doesn't really exist. Some packed vector // operations operate on the normal 32-bit registers. @@ -990,6 +994,11 @@ void GCNTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, CommonTTI.getUnrollingPreferences(L, SE, UP); } +void GCNTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + CommonTTI.getPeelingPreferences(L, SE, PP); +} + unsigned R600TTIImpl::getHardwareNumberOfRegisters(bool Vec) const { return 4 * 128; // XXX - 4 channels. Should these count as vector instead? 
} @@ -1096,3 +1105,8 @@ void R600TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { CommonTTI.getUnrollingPreferences(L, SE, UP); } + +void R600TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + CommonTTI.getPeelingPreferences(L, SE, PP); +} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h index 508ed061e935..b913f5194e40 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h @@ -61,6 +61,9 @@ class AMDGPUTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); }; class GCNTTIImpl final : public BasicTTIImplBase { @@ -141,6 +144,9 @@ class GCNTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); return TTI::PSK_FastHardware; @@ -258,6 +264,8 @@ class R600TTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); unsigned getHardwareNumberOfRegisters(bool Vec) const; unsigned getNumberOfRegisters(bool Vec) const; unsigned getRegisterBitWidth(bool Vector) const; diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index 44dfb9e8c129..74b1331216a0 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -1582,6 
+1582,11 @@ void ARMTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } +void ARMTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} + bool ARMTTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty, TTI::ReductionFlags Flags) const { return ST->hasMVEIntegerOps(); diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h index 5d914227c968..537a546361ee 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h @@ -251,6 +251,8 @@ class ARMTTIImpl : public BasicTTIImplBase { bool emitGetActiveLaneMask() const; + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); bool shouldBuildLookupTablesForConstant(Constant *C) const { // In the ROPI and RWPI relocation models we can't have pointers to global // variables or functions in constant data, so don't convert switches to diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp index 76df4e8e1931..80c8736cb74a 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp @@ -78,12 +78,17 @@ HexagonTTIImpl::getPopcntSupport(unsigned IntTyWidthInBit) const { void HexagonTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { UP.Runtime = UP.Partial = true; +} + +void HexagonTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); // Only try to peel innermost loops with small runtime trip counts. 
if (L && L->empty() && canPeel(L) && SE.getSmallConstantTripCount(L) == 0 && SE.getSmallConstantMaxTripCount(L) > 0 && SE.getSmallConstantMaxTripCount(L) <= 5) { - UP.PeelCount = 2; + PP.PeelCount = 2; } } diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h index 3365c5bf1cb1..5fe397486402 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h @@ -64,6 +64,9 @@ class HexagonTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + /// Bias LSR towards creating post-increment opportunities. bool shouldFavorPostInc() const; diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp index 5c14d0f1a24d..3873c73fb2e0 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp @@ -155,3 +155,8 @@ void NVPTXTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Partial = UP.Runtime = true; UP.PartialThreshold = UP.Threshold / 4; } + +void NVPTXTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h index 88156f687284..cb832031f1ad 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h @@ -95,6 +95,10 @@ class NVPTXTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) { // 
Volatile loads/stores are only supported for shared and global address // spaces, or for generic AS that maps to them. diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp index f2c746a14299..53556ffc267d 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp @@ -568,6 +568,10 @@ void PPCTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, BaseT::getUnrollingPreferences(L, SE, UP); } +void PPCTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} // This function returns true to allow using coldcc calling convention. // Returning true results in coldcc being used for functions which are cold at // all call sites when the callers of the functions are not calling any other diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h index b831789d3e6e..d998521084e1 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h @@ -66,6 +66,8 @@ class PPCTTIImpl : public BasicTTIImplBase { TargetLibraryInfo *LibInfo); void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp index 36141426e27d..864200e5f71c 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp @@ -294,6 +294,10 @@ void SystemZTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } +void SystemZTTIImpl::getPeelingPreferences(Loop *L, 
ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} bool SystemZTTIImpl::isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2) { diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h index d20541774da1..7f8f7f6f923f 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h @@ -50,6 +50,9 @@ class SystemZTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); /// @} diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp index f0ece1faa5fd..285cba6ee205 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp @@ -158,7 +158,8 @@ static bool computeUnrollAndJamCount( const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned OuterTripCount, unsigned OuterTripMultiple, unsigned OuterLoopSize, unsigned InnerTripCount, - unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP) { + unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP) { // First up use computeUnrollCount from the loop unroller to get a count // for unrolling the outer loop, plus any loops requiring explicit // unrolling we leave to the unroller. 
This uses UP.Threshold / @@ -168,7 +169,8 @@ static bool computeUnrollAndJamCount( bool UseUpperBound = false; bool ExplicitUnroll = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, ORE, OuterTripCount, MaxTripCount, - /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, UseUpperBound); + /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, PP, + UseUpperBound); if (ExplicitUnroll || UseUpperBound) { // If the user explicitly set the loop as unrolled, dont UnJ it. Leave it // for the unroller instead. @@ -282,7 +284,9 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, OptimizationRemarkEmitter &ORE, int OptLevel) { TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(L, SE, TTI, nullptr, nullptr, OptLevel, None, - None, None, None, None, None, None, None); + None, None, None, None, None); + TargetTransformInfo::PeelingPreferences PP = + gatherPeelingPreferences(L, SE, TTI, None, None); if (AllowUnrollAndJam.getNumOccurrences() > 0) UP.UnrollAndJam = AllowUnrollAndJam; if (UnrollAndJamThreshold.getNumOccurrences() > 0) @@ -367,7 +371,7 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, // Decide if, and by how much, to unroll bool IsCountSetExplicitly = computeUnrollAndJamCount( L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount, - OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP); + OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP, PP); if (UP.Count <= 1) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp index ec56610e41e5..88845cde8d4f 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp @@ -193,9 +193,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling, - Optional UserFullUnrollMaxCount) { + Optional UserUpperBound, Optional UserFullUnrollMaxCount) { TargetTransformInfo::UnrollingPreferences UP; // Set up the defaults @@ -206,7 +204,6 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.PartialThreshold = 150; UP.PartialOptSizeThreshold = 0; UP.Count = 0; - UP.PeelCount = 0; UP.DefaultUnrollRuntimeCount = 8; UP.MaxCount = std::numeric_limits::max(); UP.FullUnrollMaxCount = std::numeric_limits::max(); @@ -218,10 +215,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.AllowExpensiveTripCount = false; UP.Force = false; UP.UpperBound = false; - UP.AllowPeeling = true; - UP.AllowLoopNestsPeeling = false; UP.UnrollAndJam = false; - UP.PeelProfiledIterations = true; UP.UnrollAndJamInnerLoopThreshold = 60; UP.MaxIterationsCountToAnalyze = UnrollMaxIterationsCountToAnalyze; @@ -249,8 +243,6 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.MaxCount = UnrollMaxCount; if (UnrollFullMaxCount.getNumOccurrences() > 0) UP.FullUnrollMaxCount = UnrollFullMaxCount; - if (UnrollPeelCount.getNumOccurrences() > 0) - UP.PeelCount = UnrollPeelCount; if (UnrollAllowPartial.getNumOccurrences() > 0) UP.Partial = UnrollAllowPartial; if (UnrollAllowRemainder.getNumOccurrences() > 0) @@ -259,10 +251,6 @@ TargetTransformInfo::UnrollingPreferences 
llvm::gatherUnrollingPreferences( UP.Runtime = UnrollRuntime; if (UnrollMaxUpperBound == 0) UP.UpperBound = false; - if (UnrollAllowPeeling.getNumOccurrences() > 0) - UP.AllowPeeling = UnrollAllowPeeling; - if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) - UP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; if (UnrollUnrollRemainder.getNumOccurrences() > 0) UP.UnrollRemainder = UnrollUnrollRemainder; if (UnrollMaxIterationsCountToAnalyze.getNumOccurrences() > 0) @@ -281,16 +269,39 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.Runtime = *UserRuntime; if (UserUpperBound.hasValue()) UP.UpperBound = *UserUpperBound; - if (UserAllowPeeling.hasValue()) - UP.AllowPeeling = *UserAllowPeeling; - if (UserAllowProfileBasedPeeling.hasValue()) - UP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; if (UserFullUnrollMaxCount.hasValue()) UP.FullUnrollMaxCount = *UserFullUnrollMaxCount; return UP; } +TargetTransformInfo::PeelingPreferences +llvm::gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, + const TargetTransformInfo &TTI, + Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling) { + TargetTransformInfo::PeelingPreferences PP; + + // Get Target Specifc Values + TTI.getPeelingPreferences(L, SE, PP); + + // User Specified Values using cl::opt + if (UnrollPeelCount.getNumOccurrences() > 0) + PP.PeelCount = UnrollPeelCount; + if (UnrollAllowPeeling.getNumOccurrences() > 0) + PP.AllowPeeling = UnrollAllowPeeling; + if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) + PP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; + + // User Specifed values provided by argument + if (UserAllowPeeling.hasValue()) + PP.AllowPeeling = *UserAllowPeeling; + if (UserAllowProfileBasedPeeling.hasValue()) + PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; + + return PP; +} + namespace { /// A struct to densely store the state of an instruction after unrolling at @@ -761,7 +772,8 @@ bool 
llvm::computeUnrollCount( ScalarEvolution &SE, const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, - TargetTransformInfo::UnrollingPreferences &UP, bool &UseUpperBound) { + TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, bool &UseUpperBound) { // Check for explicit Count. // 1st priority is unroll count set by "unroll-count" option. @@ -863,8 +875,8 @@ bool llvm::computeUnrollCount( } // 4th priority is loop peeling. - computePeelCount(L, LoopSize, UP, TripCount, SE); - if (UP.PeelCount) { + computePeelCount(L, LoopSize, UP, PP, TripCount, SE); + if (PP.PeelCount) { UP.Runtime = false; UP.Count = 1; return ExplicitUnroll; @@ -1067,8 +1079,9 @@ static LoopUnrollResult tryToUnrollLoop( TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences( L, SE, TTI, BFI, PSI, OptLevel, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial, ProvidedRuntime, ProvidedUpperBound, - ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling, ProvidedFullUnrollMaxCount); + TargetTransformInfo::PeelingPreferences PP = gatherPeelingPreferences( + L, SE, TTI, ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling); // Exit early if unrolling is disabled. For OptForSize, we pick the loop size // as threshold later on. @@ -1142,7 +1155,7 @@ static LoopUnrollResult tryToUnrollLoop( bool UseUpperBound = false; bool IsCountSetExplicitly = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero, - TripMultiple, LoopSize, UP, UseUpperBound); + TripMultiple, LoopSize, UP, PP, UseUpperBound); if (!UP.Count) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
@@ -1157,7 +1170,7 @@ static LoopUnrollResult tryToUnrollLoop( LoopUnrollResult UnrollResult = UnrollLoop( L, {UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount, - UseUpperBound, MaxOrZero, TripMultiple, UP.PeelCount, UP.UnrollRemainder, + UseUpperBound, MaxOrZero, TripMultiple, PP.PeelCount, UP.UnrollRemainder, ForgetAllSCEV}, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop); if (UnrollResult == LoopUnrollResult::Unmodified) @@ -1189,7 +1202,7 @@ static LoopUnrollResult tryToUnrollLoop( // If the loop was peeled, we already "used up" the profile information // we had, so we don't want to unroll or peel again. if (UnrollResult != LoopUnrollResult::FullyUnrolled && - (IsCountSetExplicitly || (UP.PeelProfiledIterations && UP.PeelCount))) + (IsCountSetExplicitly || (PP.PeelProfiledIterations && PP.PeelCount))) L->setLoopAlreadyUnrolled(); return UnrollResult; diff --git a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp index 43dfaf3e50dc..c653aacbee6c 100644 --- a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp @@ -279,19 +279,20 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount, // Return the number of iterations we want to peel off. void llvm::computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE) { assert(LoopSize > 0 && "Zero loop size is not allowed!"); - // Save the UP.PeelCount value set by the target in - // TTI.getUnrollingPreferences or by the flag -unroll-peel-count. - unsigned TargetPeelCount = UP.PeelCount; - UP.PeelCount = 0; + // Save the PP.PeelCount value set by the target in + // TTI.getPeelingPreferences or by the flag -unroll-peel-count. + unsigned TargetPeelCount = PP.PeelCount; + PP.PeelCount = 0; if (!canPeel(L)) return; // Only try to peel innermost loops by default. 
// The constraint can be relaxed by the target in TTI.getUnrollingPreferences // or by the flag -unroll-allow-loop-nests-peeling. - if (!UP.AllowLoopNestsPeeling && !L->empty()) + if (!PP.AllowLoopNestsPeeling && !L->empty()) return; // If the user provided a peel count, use that. @@ -299,13 +300,13 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, if (UserPeelCount) { LLVM_DEBUG(dbgs() << "Force-peeling first " << UnrollForcePeelCount << " iterations.\n"); - UP.PeelCount = UnrollForcePeelCount; - UP.PeelProfiledIterations = true; + PP.PeelCount = UnrollForcePeelCount; + PP.PeelProfiledIterations = true; return; } // Skip peeling if it's disabled. - if (!UP.AllowPeeling) + if (!PP.AllowPeeling) return; unsigned AlreadyPeeled = 0; @@ -354,8 +355,8 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount << " iteration(s) to turn" << " some Phis into invariants.\n"); - UP.PeelCount = DesiredPeelCount; - UP.PeelProfiledIterations = false; + PP.PeelCount = DesiredPeelCount; + PP.PeelProfiledIterations = false; return; } } @@ -367,7 +368,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, return; // Do not apply profile base peeling if it is disabled. 
- if (!UP.PeelProfiledIterations) + if (!PP.PeelProfiledIterations) return; // If we don't know the trip count, but have reason to believe the average // trip count is low, peeling should be beneficial, since we will usually @@ -387,7 +388,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, (LoopSize * (*PeelCount + 1) <= UP.Threshold)) { LLVM_DEBUG(dbgs() << "Peeling first " << *PeelCount << " iterations.\n"); - UP.PeelCount = *PeelCount; + PP.PeelCount = *PeelCount; return; } LLVM_DEBUG(dbgs() << "Requested peel count: " << *PeelCount << "\n"); From llvm-commits at lists.llvm.org Wed Jul 8 12:00:49 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:00:49 +0000 (UTC) Subject: [PATCH] D83413: Tighten description of ISD::BUILD_VECTOR In-Reply-To: References: Message-ID: <3ef819de786af720521203c089f51bf4@localhost.localdomain> craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83413/new/ https://reviews.llvm.org/D83413 From llvm-commits at lists.llvm.org Wed Jul 8 12:00:52 2020 From: llvm-commits at lists.llvm.org (Jakub Kuderski via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:00:52 +0000 (UTC) Subject: [PATCH] D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes In-Reply-To: References: Message-ID: <2f4816da3a6acddb3e4cd6be392ea2eb@localhost.localdomain> kuhar accepted this revision. kuhar added a comment. This revision is now accepted and ready to land. Thanks for checking this. I dug up some old whole-program bitcode and uploaded it here in case it helps with future code reviews: https://drive.google.com/drive/folders/1VJpym19cW-8BVgdtl2MsD3zB4CoEQ93O?usp=sharing I did a quick experiment: I ran trunk and your patch on a few binaries from each project and got the numbers below.
I only set the CPU governor to performance but didn't do anything fancy to fix the clocks, disable SMT, or pin cores. My machine has two Xeon 6154 Skylake CPUs.

| Input       | Branch  | Avg. [s] | Std    | Run 1 [s] | Run 2 [s] | Run 3 [s] | Run 4 [s] | Run 5 [s] | Delta [%] |
| ----------- | ------- | -------- | ------ | --------- | --------- | --------- | --------- | --------- | --------- |
| sqlite3.bc  | trunk   | 15.922   | 0.1011 | 15.82     | 15.82     | 15.98     | 15.94     | 16.05     |           |
|             | nicolai | 15.958   | 0.0522 | 15.97     | 16.03     | 15.97     | 15.93     | 15.89     | 0.23%     |
| llvm-as.bc  | trunk   | 207.868  | 2.7855 | 209.82    | 210.73    | 206.72    | 203.69    | 208.38    |           |
|             | nicolai | 207.01   | 1.7279 | 209.37    | 205.98    | 207.75    | 207.12    | 204.83    | -0.41%    |
| wasm-as.bc  | trunk   | 43.596   | 0.2885 | 43.08     | 43.73     | 43.72     | 43.72     | 43.73     |           |
|             | nicolai | 43.696   | 0.1689 | 43.82     | 43.49     | 43.91     | 43.66     | 43.6      | 0.23%     |
| wasm-opt.bc | trunk   | 46.956   | 0.1383 | 47.06     | 47.14     | 46.91     | 46.82     | 46.85     |           |
|             | nicolai | 46.998   | 0.4558 | 46.46     | 46.59     | 47.29     | 47.13     | 47.52     | 0.09%     |

Seems performance-neutral to me.
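For anyone re-checking the table, the Avg., Std, and Delta columns can be recomputed from the five per-run timings. The following is a minimal C++ sketch, assuming Std is the sample standard deviation (dividing by n - 1) and Delta is the relative change of the patched average against the trunk average. The helper names are made up for illustration, and these assumptions reproduce the sqlite3.bc and llvm-as.bc rows.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of the per-run wall-clock times, in seconds.
double mean(const std::vector<double> &Runs) {
  assert(!Runs.empty() && "need at least one run");
  double Sum = 0.0;
  for (double R : Runs)
    Sum += R;
  return Sum / Runs.size();
}

// Sample standard deviation (divides by n - 1).
// Assumption: this is what the "Std" column reports.
double sampleStdDev(const std::vector<double> &Runs) {
  assert(Runs.size() > 1 && "need at least two runs");
  double M = mean(Runs);
  double SqSum = 0.0;
  for (double R : Runs)
    SqSum += (R - M) * (R - M);
  return std::sqrt(SqSum / (Runs.size() - 1));
}

// Relative change of the patched average against trunk, in percent.
// Assumption: this is what the "Delta [%]" column reports.
double deltaPercent(const std::vector<double> &Trunk,
                    const std::vector<double> &Patched) {
  return (mean(Patched) - mean(Trunk)) / mean(Trunk) * 100.0;
}
```

With only five runs per configuration and deltas well inside one standard deviation, a check like this supports the "performance-neutral" reading rather than proving it.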
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83089/new/ https://reviews.llvm.org/D83089 From llvm-commits at lists.llvm.org Wed Jul 8 12:02:13 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via llvm-commits) Date: Wed, 08 Jul 2020 12:02:13 -0700 (PDT) Subject: [llvm] 89f1ad8 - [LangRef] Introduce `noundef` attribute for fully defined function params Message-ID: <5f061835.1c69fb81.2cb63.17b9@mx.google.com> Author: Gui Andrade Date: 2020-07-08T19:02:04Z New Revision: 89f1ad88b3f1ecf32e797247b9eab5662ed4bcf4 URL: https://github.com/llvm/llvm-project/commit/89f1ad88b3f1ecf32e797247b9eab5662ed4bcf4 DIFF: https://github.com/llvm/llvm-project/commit/89f1ad88b3f1ecf32e797247b9eab5662ed4bcf4.diff LOG: [LangRef] Introduce `noundef` attribute for fully defined function params LLVM currently does not require function parameters or return values to be fully initialized, and does not care if they are poison. This can be useful if the frontend ABI makes no such demands, but may prevent helpful backend transformations in case they do. Specifically, the C and C++ languages require all scalar function operands to be fully determined. Introducing this attribute is of particular use to MemorySanitizer today, although other transformations may benefit from it as well. We can modify MemorySanitizer instrumentation to provide modest (17%) space savings where `frozen` is present. This commit only adds the attribute to the Language Reference, and the actual implementation of the attribute will follow in a separate commit. 
Differential Revision: https://reviews.llvm.org/D82316 Added: Modified: llvm/docs/BitCodeFormat.rst llvm/docs/LangRef.rst Removed: ################################################################################ diff --git a/llvm/docs/BitCodeFormat.rst b/llvm/docs/BitCodeFormat.rst index 065f9c3d49d4..6e491c6e2854 100644 --- a/llvm/docs/BitCodeFormat.rst +++ b/llvm/docs/BitCodeFormat.rst @@ -1067,6 +1067,7 @@ The integer codes are mapped to well-known attributes as follows. * code 65: ``preallocated`` * code 66: ``no_merge`` * code 67: ``null_pointer_is_valid`` +* code 68: ``noundef`` .. note:: The ``allocsize`` attribute has a special encoding for its arguments. Its two diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index cc2f6d1b3a09..566d761d3072 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -1252,6 +1252,12 @@ Currently, only the following parameter attributes are defined: only valid on intrinsic declarations and cannot be applied to a call site or arbitrary function. +``noundef`` + This attribute applies to parameters and return values. If the value + representation contains any undefined or poison bits, the behavior is + undefined. Note that this does not refer to padding introduced by the + type's storage representation. + .. _gc: Garbage Collector Strategy Names @@ -3657,6 +3663,11 @@ behavior. Notably this includes (but is not limited to): - The condition operand of a :ref:`br ` instruction. - The callee operand of a :ref:`call ` or :ref:`invoke ` instruction. +- The parameter operand of a :ref:`call ` or :ref:`invoke ` + instruction, when the function or invoking call site has a ``noundef`` + attribute in the corresponding position. +- The operand of a :ref:`ret ` instruction if the function or invoking + call site has a `noundef` attribute in the return value position. 
Here are some examples: From llvm-commits at lists.llvm.org Wed Jul 8 12:02:16 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:02:16 +0000 (UTC) Subject: [PATCH] D82316: [LangRef] Add `noundef` attribute to documentation In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG89f1ad88b3f1: [LangRef] Introduce `noundef` attribute for fully defined function params (authored by guiand). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82316/new/ https://reviews.llvm.org/D82316 Files: llvm/docs/BitCodeFormat.rst llvm/docs/LangRef.rst Index: llvm/docs/LangRef.rst =================================================================== --- llvm/docs/LangRef.rst +++ llvm/docs/LangRef.rst @@ -1252,6 +1252,12 @@ only valid on intrinsic declarations and cannot be applied to a call site or arbitrary function. +``noundef`` + This attribute applies to parameters and return values. If the value + representation contains any undefined or poison bits, the behavior is + undefined. Note that this does not refer to padding introduced by the + type's storage representation. + .. _gc: Garbage Collector Strategy Names @@ -3657,6 +3663,11 @@ - The condition operand of a :ref:`br ` instruction. - The callee operand of a :ref:`call ` or :ref:`invoke ` instruction. +- The parameter operand of a :ref:`call ` or :ref:`invoke ` + instruction, when the function or invoking call site has a ``noundef`` + attribute in the corresponding position. +- The operand of a :ref:`ret ` instruction if the function or invoking + call site has a `noundef` attribute in the return value position. 
Here are some examples: Index: llvm/docs/BitCodeFormat.rst =================================================================== --- llvm/docs/BitCodeFormat.rst +++ llvm/docs/BitCodeFormat.rst @@ -1067,6 +1067,7 @@ * code 65: ``preallocated`` * code 66: ``no_merge`` * code 67: ``null_pointer_is_valid`` +* code 68: ``noundef`` .. note:: The ``allocsize`` attribute has a special encoding for its arguments. Its two From llvm-commits at lists.llvm.org Wed Jul 8 12:02:15 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via llvm-commits) Date: Wed, 08 Jul 2020 12:02:15 -0700 (PDT) Subject: [llvm] ff7900d - [LLVM] Accept `noundef` attribute in function definitions/calls Message-ID: <5f061837.1c69fb81.5f98a.166c@mx.google.com> Author: Gui Andrade Date: 2020-07-08T19:02:04Z New Revision: ff7900d5def4f645a6675d99ad39a38d8a468a63 URL: https://github.com/llvm/llvm-project/commit/ff7900d5def4f645a6675d99ad39a38d8a468a63 DIFF: https://github.com/llvm/llvm-project/commit/ff7900d5def4f645a6675d99ad39a38d8a468a63.diff LOG: [LLVM] Accept `noundef` attribute in function definitions/calls The `noundef` attribute indicates an argument or return value which may never have an undef value representation. This patch allows LLVM to parse the attribute.
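As a rough illustration of the bitcode side of this change: each attribute kind gets a stable numeric code (68 for `noundef`, per BitCodeFormat.rst), and the writer and the reader each need a matching case for it. The standalone sketch below uses hypothetical `AttrKind`, `encodeAttrKind`, and `decodeAttrKind` names. It is not LLVM's actual API, only a simplified model of the round-trip under those assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration; this is NOT LLVM's Attribute::AttrKind.
// Only the four most recent codes listed in BitCodeFormat.rst are modelled.
enum class AttrKind { Preallocated, NoMerge, NullPointerIsValid, NoUndef };

// Writer side: map an attribute kind to its stable on-disk code.
uint64_t encodeAttrKind(AttrKind Kind) {
  switch (Kind) {
  case AttrKind::Preallocated:
    return 65;
  case AttrKind::NoMerge:
    return 66;
  case AttrKind::NullPointerIsValid:
    return 67;
  case AttrKind::NoUndef:
    return 68; // the newly assigned code
  }
  return 0; // not reached for valid kinds
}

// Reader side: map a code back to a kind.
// Returns false for codes this sketch does not know about.
bool decodeAttrKind(uint64_t Code, AttrKind &Kind) {
  switch (Code) {
  case 65:
    Kind = AttrKind::Preallocated;
    return true;
  case 66:
    Kind = AttrKind::NoMerge;
    return true;
  case 67:
    Kind = AttrKind::NullPointerIsValid;
    return true;
  case 68:
    Kind = AttrKind::NoUndef;
    return true;
  default:
    return false;
  }
}
```

If only one side knew about code 68, a module using `noundef` could not survive a write/read round-trip in this model, which is why the two switches have to be extended together.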
Differential Revision: https://reviews.llvm.org/D83412 Added: Modified: llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/include/llvm/IR/Attributes.td llvm/lib/AsmParser/LLLexer.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/AsmParser/LLToken.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/IR/Attributes.cpp llvm/lib/Transforms/Utils/CodeExtractor.cpp llvm/test/Bitcode/attributes.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index 092730242d07..de4fe6630324 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -643,6 +643,7 @@ enum AttributeKindCodes { ATTR_KIND_PREALLOCATED = 65, ATTR_KIND_NO_MERGE = 66, ATTR_KIND_NULL_POINTER_IS_VALID = 67, + ATTR_KIND_NOUNDEF = 68, }; enum ComdatSelectionKindCodes { diff --git a/llvm/include/llvm/IR/Attributes.td b/llvm/include/llvm/IR/Attributes.td index 3516ce20e988..395f9dbfb176 100644 --- a/llvm/include/llvm/IR/Attributes.td +++ b/llvm/include/llvm/IR/Attributes.td @@ -39,6 +39,9 @@ def Builtin : EnumAttr<"builtin">; /// Pass structure by value. def ByVal : TypeAttr<"byval">; +/// Parameter or return value may not contain uninitialized or poison bits. +def NoUndef : EnumAttr<"noundef">; + /// Marks function as being in a cold path. 
def Cold : EnumAttr<"cold">; diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp index 2a39e19a68b7..777ce3abdddd 100644 --- a/llvm/lib/AsmParser/LLLexer.cpp +++ b/llvm/lib/AsmParser/LLLexer.cpp @@ -664,6 +664,7 @@ lltok::Kind LLLexer::LexIdentifier() { KEYWORD(noreturn); KEYWORD(nosync); KEYWORD(nocf_check); + KEYWORD(noundef); KEYWORD(nounwind); KEYWORD(null_pointer_is_valid); KEYWORD(optforfuzzing); diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp index 85105f2c4b49..e3a52c7882a2 100644 --- a/llvm/lib/AsmParser/LLParser.cpp +++ b/llvm/lib/AsmParser/LLParser.cpp @@ -1374,6 +1374,7 @@ bool LLParser::ParseFnAttributeValuePairs(AttrBuilder &B, case lltok::kw_inalloca: case lltok::kw_nest: case lltok::kw_noalias: + case lltok::kw_noundef: case lltok::kw_nocapture: case lltok::kw_nonnull: case lltok::kw_returned: @@ -1677,6 +1678,9 @@ bool LLParser::ParseOptionalParamAttrs(AttrBuilder &B) { case lltok::kw_inalloca: B.addAttribute(Attribute::InAlloca); break; case lltok::kw_inreg: B.addAttribute(Attribute::InReg); break; case lltok::kw_nest: B.addAttribute(Attribute::Nest); break; + case lltok::kw_noundef: + B.addAttribute(Attribute::NoUndef); + break; case lltok::kw_noalias: B.addAttribute(Attribute::NoAlias); break; case lltok::kw_nocapture: B.addAttribute(Attribute::NoCapture); break; case lltok::kw_nofree: B.addAttribute(Attribute::NoFree); break; @@ -1774,6 +1778,9 @@ bool LLParser::ParseOptionalReturnAttrs(AttrBuilder &B) { } case lltok::kw_inreg: B.addAttribute(Attribute::InReg); break; case lltok::kw_noalias: B.addAttribute(Attribute::NoAlias); break; + case lltok::kw_noundef: + B.addAttribute(Attribute::NoUndef); + break; case lltok::kw_nonnull: B.addAttribute(Attribute::NonNull); break; case lltok::kw_signext: B.addAttribute(Attribute::SExt); break; case lltok::kw_zeroext: B.addAttribute(Attribute::ZExt); break; diff --git a/llvm/lib/AsmParser/LLToken.h b/llvm/lib/AsmParser/LLToken.h index 
0c190bfb63b3..0fb3bae77dd3 100644 --- a/llvm/lib/AsmParser/LLToken.h +++ b/llvm/lib/AsmParser/LLToken.h @@ -196,6 +196,7 @@ enum Kind { kw_naked, kw_nest, kw_noalias, + kw_noundef, kw_nobuiltin, kw_nocapture, kw_noduplicate, diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index dceb492c9120..659e26c2bd25 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -1530,6 +1530,8 @@ static Attribute::AttrKind getAttrFromCode(uint64_t Code) { return Attribute::SanitizeMemTag; case bitc::ATTR_KIND_PREALLOCATED: return Attribute::Preallocated; + case bitc::ATTR_KIND_NOUNDEF: + return Attribute::NoUndef; } } diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp index a23f39be6b2a..9c15a5f9f193 100644 --- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp +++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp @@ -731,6 +731,8 @@ static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) { return bitc::ATTR_KIND_SANITIZE_MEMTAG; case Attribute::Preallocated: return bitc::ATTR_KIND_PREALLOCATED; + case Attribute::NoUndef: + return bitc::ATTR_KIND_NOUNDEF; case Attribute::EndAttrKinds: llvm_unreachable("Can not encode end-attribute kinds marker."); case Attribute::None: diff --git a/llvm/lib/IR/Attributes.cpp b/llvm/lib/IR/Attributes.cpp index 4c4aa51d9e80..f67d96a854f4 100644 --- a/llvm/lib/IR/Attributes.cpp +++ b/llvm/lib/IR/Attributes.cpp @@ -443,6 +443,8 @@ std::string Attribute::getAsString(bool InAttrGrp) const { return "cold"; if (hasAttribute(Attribute::ImmArg)) return "immarg"; + if (hasAttribute(Attribute::NoUndef)) + return "noundef"; if (hasAttribute(Attribute::ByVal)) { std::string Result; diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp index 594590a746b8..8cdbb9d35652 100644 --- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp +++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp @@ 
-877,6 +877,7 @@ Function *CodeExtractor::constructFunction(const ValueSet &inputs, case Attribute::NoMerge: case Attribute::NoReturn: case Attribute::NoSync: + case Attribute::NoUndef: case Attribute::None: case Attribute::NonNull: case Attribute::Preallocated: diff --git a/llvm/test/Bitcode/attributes.ll b/llvm/test/Bitcode/attributes.ll index 71c1fb59ebaa..fbbe4a80f31e 100644 --- a/llvm/test/Bitcode/attributes.ll +++ b/llvm/test/Bitcode/attributes.ll @@ -386,6 +386,12 @@ define void @f65() null_pointer_is_valid ret void; } +; CHECK: define noundef i32 @f66(i32 noundef %a) +define noundef i32 @f66(i32 noundef %a) +{ + ret i32 %a +} + ; CHECK: attributes #0 = { noreturn } ; CHECK: attributes #1 = { nounwind } ; CHECK: attributes #2 = { readnone } From llvm-commits at lists.llvm.org Wed Jul 8 12:02:19 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:02:19 +0000 (UTC) Subject: [PATCH] D83412: [LLVM] Accept `noundef` attribute in function definitions/calls In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGff7900d5def4: [LLVM] Accept `noundef` attribute in function definitions/calls (authored by guiand). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83412/new/ https://reviews.llvm.org/D83412 Files: llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/include/llvm/IR/Attributes.td llvm/lib/AsmParser/LLLexer.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/AsmParser/LLToken.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/IR/Attributes.cpp llvm/lib/Transforms/Utils/CodeExtractor.cpp llvm/test/Bitcode/attributes.ll
From llvm-commits at lists.llvm.org Wed Jul 8 12:02:33 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Wed, 08 Jul 2020 20:02:33 +0100 Subject: [llvm] 6965af4 - Revert "[NFC] Separate Peeling Properties into its own struct" In-Reply-To: <5f0617c3.1c69fb81.3d0d9.1747@mx.google.com> References: <5f0617c3.1c69fb81.3d0d9.1747@mx.google.com> Message-ID: <3523453D-3F37-47A5-9679-5C4E8C9D0874@apple.com> > On 8 Jul 2020, at 20:00, Anh Tuyen Tran via llvm-commits wrote: > > > Author: Anh Tuyen Tran > Date: 2020-07-08T18:58:05Z > New Revision: 6965af43e6b83fda2c32663f55b1568ffe6d67f9 > > URL: https://github.com/llvm/llvm-project/commit/6965af43e6b83fda2c32663f55b1568ffe6d67f9 > DIFF: https://github.com/llvm/llvm-project/commit/6965af43e6b83fda2c32663f55b1568ffe6d67f9.diff > > LOG: Revert "[NFC] Separate Peeling Properties into its own struct" > > This reverts commit fead250b439bbd4ec0f21e6a52d0c174e5fcdf5a. It would be good to include the reason for the revert in the commit message, so it is clear in the git log.
Cheers, Florian From llvm-commits at lists.llvm.org Wed Jul 8 12:03:29 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Wed, 08 Jul 2020 20:03:29 +0100 Subject: [llvm] bf9a940 - Revert "Double check that passes correctly set their Modified status" In-Reply-To: <5f05f12a.1c69fb81.41061.0c3b@mx.google.com> References: <5f05f12a.1c69fb81.41061.0c3b@mx.google.com> Message-ID: > On 8 Jul 2020, at 17:15, via llvm-commits wrote: > > > Author: serge-sans-paille > Date: 2020-07-08T18:14:40+02:00 > New Revision: bf9a940c3f1b460420b1106fe5b1565fd60be5a2 > > URL: https://github.com/llvm/llvm-project/commit/bf9a940c3f1b460420b1106fe5b1565fd60be5a2 > DIFF: https://github.com/llvm/llvm-project/commit/bf9a940c3f1b460420b1106fe5b1565fd60be5a2.diff > > LOG: Revert "Double check that passes correctly set their Modified status" > > This reverts commit 37afd99c768b29c7df7c5f2eb645362fb61f9915. Hi It would be good to include the reason for the revert in the commit message, so it is clear in the log. Cheers, Florian From llvm-commits at lists.llvm.org Wed Jul 8 12:05:28 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:05:28 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <785156da5ff0e5da30013bd171496797@localhost.localdomain> dblaikie added a comment. In D83351#2139751 , @lebedev.ri wrote: > There's also `llvm::for_each()` with 40 uses. Still a very tiny fraction of all iterations. I expect most (or at least an order of magnitude or two more than the <100 for_each()s) of those iterations don't depend on the order of iteration - so I don't think it provides the signal you're thinking of, at least not to enough other developers to be useful - making the code quirky/different in a way that I think is likely to be confusing to other readers ("Why isn't this a range-based for loop? 
is there some subtle difference in behavior that the author intended that I'm not understanding?" & there isn't) > Can you please quote specific part of the whatever documentation you believe dictates this? > If there isn't one, i'd like to see `ProgrammersManual` patch. Perhaps this is sufficient: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 12:06:38 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via llvm-commits) Date: Wed, 08 Jul 2020 12:06:38 -0700 (PDT) Subject: [llvm] 05ce9aa - Tighten description of ISD::BUILD_VECTOR Message-ID: <5f06193e.1c69fb81.4a405.1aa4@mx.google.com> Author: Cameron McInally Date: 2020-07-08T14:06:23-05:00 New Revision: 05ce9aaa69f9f58598730895bdb5535fea17213f URL: https://github.com/llvm/llvm-project/commit/05ce9aaa69f9f58598730895bdb5535fea17213f DIFF: https://github.com/llvm/llvm-project/commit/05ce9aaa69f9f58598730895bdb5535fea17213f.diff LOG: Tighten description of ISD::BUILD_VECTOR VerifySDNode(...) in SelectionDAG.cpp shows that the operands of a BUILD_VECTOR must all be the same type. This patch cleans up the comment in ISDOpcodes.h to make that more obvious. Also, remove the requirement that the number of elements must be a power-of-2. That's not true. Differential Revision: https://reviews.llvm.org/D83413 Added: Modified: llvm/include/llvm/CodeGen/ISDOpcodes.h Removed: ################################################################################ diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h index d3ff99dee37f..d121a4d5427f 100644 --- a/llvm/include/llvm/CodeGen/ISDOpcodes.h +++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h @@ -448,11 +448,11 @@ enum NodeType { FCANONICALIZE, /// BUILD_VECTOR(ELT0, ELT1, ELT2, ELT3,...) 
- Return a fixed-width vector - /// with the specified, possibly variable, elements. The number of elements - /// is required to be a power of two. The types of the operands must all be - /// the same and must match the vector element type, except that integer types - /// are allowed to be larger than the element type, in which case the operands - /// are implicitly truncated. + /// with the specified, possibly variable, elements. The types of the + /// operands must all be the same. The types of the operands must match the + /// vector element type, except that integer types are allowed to be larger + /// than the element type, in which case the operands are implicitly + /// truncated. BUILD_VECTOR, /// INSERT_VECTOR_ELT(VECTOR, VAL, IDX) - Returns VECTOR with the element From llvm-commits at lists.llvm.org Wed Jul 8 12:06:44 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:06:44 +0000 (UTC) Subject: [PATCH] D83413: Tighten description of ISD::BUILD_VECTOR In-Reply-To: References: Message-ID: <5490f51cdb31a81c9e809635b369edd9@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG05ce9aaa69f9: Tighten description of ISD::BUILD_VECTOR (authored by cameron.mcinally). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83413/new/ https://reviews.llvm.org/D83413 Files: llvm/include/llvm/CodeGen/ISDOpcodes.h Index: llvm/include/llvm/CodeGen/ISDOpcodes.h =================================================================== --- llvm/include/llvm/CodeGen/ISDOpcodes.h +++ llvm/include/llvm/CodeGen/ISDOpcodes.h @@ -448,11 +448,11 @@ FCANONICALIZE, /// BUILD_VECTOR(ELT0, ELT1, ELT2, ELT3,...) - Return a fixed-width vector - /// with the specified, possibly variable, elements. The number of elements - /// is required to be a power of two. 
The types of the operands must all be - /// the same and must match the vector element type, except that integer types - /// are allowed to be larger than the element type, in which case the operands - /// are implicitly truncated. + /// with the specified, possibly variable, elements. The types of the + /// operands must all be the same. The types of the operands must match the + /// vector element type, except that integer types are allowed to be larger + /// than the element type, in which case the operands are implicitly + /// truncated. BUILD_VECTOR, /// INSERT_VECTOR_ELT(VECTOR, VAL, IDX) - Returns VECTOR with the element -------------- next part -------------- A non-text attachment was scrubbed... Name: D83413.276513.patch Type: text/x-patch Size: 1093 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:06:51 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:06:51 +0000 (UTC) Subject: [PATCH] D83331: [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. In-Reply-To: References: Message-ID: yamauchi updated this revision to Diff 276514. yamauchi added a comment. Rebase. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83331/new/ https://reviews.llvm.org/D83331 Files: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/test/CodeGen/X86/popcnt.ll llvm/test/CodeGen/X86/pr27202.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83331.276514.patch Type: text/x-patch Size: 19323 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:07:51 2020 From: llvm-commits at lists.llvm.org (Krzysztof Parzyszek via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:07:51 +0000 (UTC) Subject: [PATCH] D83413: Tighten description of ISD::BUILD_VECTOR In-Reply-To: References: Message-ID: kparzysz added a comment. 
I'd move the "all types must be the same" to the end, so that it doesn't look like the exception overrides it, i.e. something like The types of the operands must match the vector element type, except that for integer types the operands are allowed to be of a larger type than the element type, in which case the operands are implicitly truncated. In any case, all operands must have the same type. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83413/new/ https://reviews.llvm.org/D83413 From llvm-commits at lists.llvm.org Wed Jul 8 12:08:24 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:08:24 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: efriedma added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:94 /// If it contains any dynamic allocas, returns false. static bool canTRE(Function &F) { // Because of PR962, we don't TRE dynamic allocas. ---------------- avl wrote: > efriedma wrote: > > If we're not going to try to do TRE at all on calls not marked "tail", we can probably drop this check. > It looks to me that original idea(PR962) was to avoid inefficient code which is generated for dynamic alloca. > > Currently there would still be generated inefficient code: > > Doing TRE for dynamic alloca requires correct stack adjustment to avoid extra stack usage. > i.e. dynamic stack reservation done for alloca should be restored > in the end of the current iteration. Current TRE implementation does not do this. 
> > Please, consider the test case: > > > ``` > #include > > int count; > __attribute__((noinline)) void globalIncrement(const int* param) { > count += *param; > } > > void test(int recurseCount) > { > if (recurseCount == 0) return; > { > int *temp = (int*)alloca(100); > globalIncrement(temp); > } > test(recurseCount - 1); > } > > > ``` > Following is the x86 asm generated for the above test case in assumption that dynamic allocas are possible: > > ``` > > .LBB1_2: > movq %rsp, %rdi > addq $-112, %rdi <<<<<<<<<<<<<< dynamic stack reservation, need to be restored before "jne .LBB1_2" > movq %rdi, %rsp > callq _Z15globalIncrementPKi > addl $-1, %ebx > jne .LBB1_2 > ``` > > So, it looks like we still have inefficient code here and it was a reason for avoiding TRE. I guess we can leave this for a later patch. This isn't really any worse than the stack usage before TRE, assuming we can't emit a sibling call in the backend. And we could avoid this by making TRE insert stacksave/stackrestore intrinsics. But better to do one thing at a time. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 From llvm-commits at lists.llvm.org Wed Jul 8 12:10:39 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:10:39 +0000 (UTC) Subject: [PATCH] D83415: [Solaris] Fix Solaris build bots Message-ID: ctetreau created this revision. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83415 Files: llvm/unittests/IR/ConstantsTest.cpp Index: llvm/unittests/IR/ConstantsTest.cpp =================================================================== --- llvm/unittests/IR/ConstantsTest.cpp +++ llvm/unittests/IR/ConstantsTest.cpp @@ -646,10 +646,10 @@ Type *Int8Ty = Type::getInt8Ty(Context); for (unsigned Min : {1, 2, 8}) { - ElementCount SEC = {Min, true}; - ElementCount FEC = {Min, false}; + ElementCount ScalableEC = {Min, true}; + ElementCount FixedEC = {Min, false}; - for (auto EC : {SEC, FEC}) { + for (auto EC : {ScalableEC, FixedEC}) { for (auto *Ty : {FloatTy, Int32Ty, Int8Ty}) { Constant *Zero = Constant::getNullValue(Ty); Constant *One = Constant::getAllOnesValue(Ty); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83415.276516.patch Type: text/x-patch Size: 700 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:12:17 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:12:17 +0000 (UTC) Subject: [PATCH] D81631: Fix undefined behavior in Dwarf. In-Reply-To: References: Message-ID: <3fcbdfd593e5f03b7f4776ca3fc11776@localhost.localdomain> dblaikie added a comment. Ah, I see - only reproduced in an optimized build. Sorry for the delays/noise. 
I'd perhaps feel /marginally/ better about fixing this issue this way: diff --git llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp index 296c380ae55..8be0c19bbc4 100644 --- llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp +++ llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp @@ -333,13 +333,13 @@ DIE *DwarfCompileUnit::getOrCreateCommonBlock( void DwarfCompileUnit::addRange(RangeSpan Range) { DD->insertSectionLabel(Range.Begin); - bool SameAsPrevCU = this == DD->getPrevCU(); + const DwarfCompileUnit *PrevCU = DD->getPrevCU(); DD->setPrevCU(this); // If we have no current ranges just add the range and return, otherwise, // check the current section and CU against the previous section and CU we // emitted into and the subprogram was contained within. If these are the // same then extend our current range, otherwise add this as a new range. - if (CURanges.empty() || !SameAsPrevCU || + if (CURanges.empty() || this != PrevCU || (&CURanges.back().End->getSection() != &Range.End->getSection())) { CURanges.push_back(Range); At least I /think/ that should work & probably better reflect when this value should be used. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81631/new/ https://reviews.llvm.org/D81631 From llvm-commits at lists.llvm.org Wed Jul 8 12:12:51 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:12:51 +0000 (UTC) Subject: [PATCH] D82416: [SVE] Make Constant::getSplatValue work for scalable vector splats In-Reply-To: References: Message-ID: ctetreau added a comment. @ro That's unfortunate. I pushed D83415 , I'll submit without review when my local tests pass. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82416/new/ https://reviews.llvm.org/D82416 From llvm-commits at lists.llvm.org Wed Jul 8 12:16:46 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:16:46 +0000 (UTC) Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap. In-Reply-To: References: Message-ID: efriedma added a comment. > After looking at the code for the source order comparator, it looks like the score could change after units are scheduled as well in some edge cases. So AssertisHeap might fail? I'm not really comfortable with that... > We might even go further and limit the source order comparator to just the IR ordering and the queue IDs, because the real scheduling should happen in the machine scheduler. Make this a separate patch, in case it has some unexpected side-effect, but sure, that makes sense. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83335/new/ https://reviews.llvm.org/D83335 From llvm-commits at lists.llvm.org Wed Jul 8 12:18:07 2020 From: llvm-commits at lists.llvm.org (Michele Scandale via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:18:07 +0000 (UTC) Subject: [PATCH] D82659: Fix missing build dependency on omp_gen. In-Reply-To: References: Message-ID: <449c2aac74b03fd0cf2e1a7be36f8851@localhost.localdomain> michele.scandale added a comment. Uhm.. it looks like it is not needed anymore. In the installed `LLVMConfig.cmake`, the `intrinsics_gen` and `omp_gen` custom targets are created for exactly the purpose of allowing out-of-tree or standalone builds to freely depend on them. The Clang code where `intrinsics_gen` is conditionally added as a dependency is from 2014, while the change in `LLVMConfig.cmake.in` is from 2017.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82659/new/ https://reviews.llvm.org/D82659 From llvm-commits at lists.llvm.org Wed Jul 8 12:19:09 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:19:09 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <4d491784fba09a4fe2d96fe26da456a2@localhost.localdomain> MaskRay added a comment. One thought. If you want to merge two test files, ifdef global .globl foo .endif foo: 1. llvm-mc -filetype=obj 2. llvm-mc -filetype=obj --defsym global=1 ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:45 + bl callee1_stother0_local at notoc + add 3, 3, 30 + b callee1_stother0_local at notoc ---------------- These `add 3, 3, 30` & `mullw 3, 3, 30` instructions are not needed. Probably worth adding a comment before the first `bl ` explaining that the next instruction does not need to be `nop` as in the R_PPC64_REL24 case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Wed Jul 8 12:19:38 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:19:38 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: jdenny accepted this revision. jdenny added a comment. Still LGTM. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 From llvm-commits at lists.llvm.org Wed Jul 8 12:20:17 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:20:17 +0000 (UTC) Subject: [PATCH] D83331: [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. In-Reply-To: References: Message-ID: <11f8b932f70105d38f4b4da3c6b0b44e@localhost.localdomain> yamauchi updated this revision to Diff 276519. yamauchi added a comment. Turn OptForSize into a local variable. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83331/new/ https://reviews.llvm.org/D83331 Files: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/test/CodeGen/X86/popcnt.ll llvm/test/CodeGen/X86/pr27202.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83331.276519.patch Type: text/x-patch Size: 20466 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:21:38 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via llvm-commits) Date: Wed, 08 Jul 2020 12:21:38 -0700 (PDT) Subject: [llvm] d2eb409 - [Solaris] Fix Solaris build bots Message-ID: <5f061cc2.1c69fb81.3a60c.18ef@mx.google.com> Author: Christopher Tetreault Date: 2020-07-08T12:21:21-07:00 New Revision: d2eb40937976d858807faee6fbc3e016fd3a4108 URL: https://github.com/llvm/llvm-project/commit/d2eb40937976d858807faee6fbc3e016fd3a4108 DIFF: https://github.com/llvm/llvm-project/commit/d2eb40937976d858807faee6fbc3e016fd3a4108.diff LOG: [Solaris] Fix Solaris build bots Reviewers: ro Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83415 Added: Modified: llvm/unittests/IR/ConstantsTest.cpp Removed: ################################################################################ diff --git a/llvm/unittests/IR/ConstantsTest.cpp 
b/llvm/unittests/IR/ConstantsTest.cpp index 3fed395daee4..f1c1c86293c8 100644 --- a/llvm/unittests/IR/ConstantsTest.cpp +++ b/llvm/unittests/IR/ConstantsTest.cpp @@ -646,10 +646,10 @@ TEST(ConstantsTest, GetSplatValueRoundTrip) { Type *Int8Ty = Type::getInt8Ty(Context); for (unsigned Min : {1, 2, 8}) { - ElementCount SEC = {Min, true}; - ElementCount FEC = {Min, false}; + ElementCount ScalableEC = {Min, true}; + ElementCount FixedEC = {Min, false}; - for (auto EC : {SEC, FEC}) { + for (auto EC : {ScalableEC, FixedEC}) { for (auto *Ty : {FloatTy, Int32Ty, Int8Ty}) { Constant *Zero = Constant::getNullValue(Ty); Constant *One = Constant::getAllOnesValue(Ty); From llvm-commits at lists.llvm.org Wed Jul 8 12:21:41 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:21:41 +0000 (UTC) Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap. In-Reply-To: References: Message-ID: <4dfaae3cf1905ded09183615390afc8e@localhost.localdomain> efriedma added a comment. Also, maybe we could change the way we compute scheduling priority based on the size of the queue. So keep the current scheduling for common cases, but switch to a simpler heuristic if the queue gets too large. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83335/new/ https://reviews.llvm.org/D83335 From llvm-commits at lists.llvm.org Wed Jul 8 12:21:42 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:21:42 +0000 (UTC) Subject: [PATCH] D83415: [Solaris] Fix Solaris build bots In-Reply-To: References: Message-ID: <09706be24e477488cb40dd4fe87315af@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. 
Closed by commit rGd2eb40937976: [Solaris] Fix Solaris build bots (authored by ctetreau). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83415/new/ https://reviews.llvm.org/D83415 Files: llvm/unittests/IR/ConstantsTest.cpp Index: llvm/unittests/IR/ConstantsTest.cpp =================================================================== --- llvm/unittests/IR/ConstantsTest.cpp +++ llvm/unittests/IR/ConstantsTest.cpp @@ -646,10 +646,10 @@ Type *Int8Ty = Type::getInt8Ty(Context); for (unsigned Min : {1, 2, 8}) { - ElementCount SEC = {Min, true}; - ElementCount FEC = {Min, false}; + ElementCount ScalableEC = {Min, true}; + ElementCount FixedEC = {Min, false}; - for (auto EC : {SEC, FEC}) { + for (auto EC : {ScalableEC, FixedEC}) { for (auto *Ty : {FloatTy, Int32Ty, Int8Ty}) { Constant *Zero = Constant::getNullValue(Ty); Constant *One = Constant::getAllOnesValue(Ty); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83415.276520.patch Type: text/x-patch Size: 700 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:23:16 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:23:16 +0000 (UTC) Subject: [PATCH] D83331: [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. In-Reply-To: References: Message-ID: <6d62c5a5857afadfc6f91641d014ca6d@localhost.localdomain> yamauchi marked an inline comment as done. yamauchi added inline comments. ================ Comment at: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:165 /// performance. bool OptForSize; ---------------- craig.topper wrote: > Are more changes needed to get rid of this member? The only remaining use is for the "OptForMinSize implies OptForSize" assert in runOnMachineFunction. I turned it into a local variable there to keep the assert. We could also remove it along with the assert if it'd be better that way.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83331/new/ https://reviews.llvm.org/D83331 From llvm-commits at lists.llvm.org Wed Jul 8 12:23:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:23:27 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <66350ecb83c77ebc424dae00007419a3@localhost.localdomain> lebedev.ri added a comment. In D83351#2139769, @dblaikie wrote: > In D83351#2139751, @lebedev.ri wrote: > > > There's also `llvm::for_each()` with 40 uses. > > > Still a very tiny fraction of all iterations. I expect most (or at least an order of magnitude or two more than the <100 for_each()s) of those iterations don't depend on the order of iteration - so I don't think it provides the signal you're thinking of, at least not to enough other developers to be useful - making the code quirky/different in a way that I think is likely to be confusing to other readers ("Why isn't this a range-based for loop? is there some subtle difference in behavior that the author intended that I'm not understanding?" & there isn't) > > > Can you please quote specific part of the whatever documentation you believe dictates this? > > If there isn't one, i'd like to see `ProgrammersManual` patch. > > Perhaps this is sufficient: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible I'm afraid it is not. It only speaks about range-based for loops vs. old-style for loops. If `for_each` is so bad, we really should proactively document it, so I'm still waiting on the link/patch. Thanks.
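For readers comparing the two loop styles debated in D83351, a minimal sketch follows. It is not taken from the patch: it uses `std::for_each` from the standard library as a stand-in for `llvm::for_each()` (assumed here to be a range-taking wrapper with the same visiting order), and the helper names `sumWithRangeFor`, `sumWithForEach` and `checkEquivalent` are hypothetical. Both forms visit the elements front to back and compute the same result; the difference under discussion is readability, not behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: sums the elements with a range-based for loop,
// the form the LLVM coding standards prefer over explicit iterator loops.
int sumWithRangeFor(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}

// Hypothetical helper: the same traversal written with std::for_each,
// used here only as a stand-in for llvm::for_each().
int sumWithForEach(const std::vector<int> &Values) {
  int Sum = 0;
  std::for_each(Values.begin(), Values.end(), [&Sum](int V) { Sum += V; });
  return Sum;
}

// Asserts that both forms agree on the given input.
void checkEquivalent(const std::vector<int> &Values) {
  assert(sumWithRangeFor(Values) == sumWithForEach(Values));
}
```

Calling `checkEquivalent` on any vector should pass, which is the point made in the thread: the callback form carries no extra semantics that a reader needs to decode.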
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 12:23:46 2020 From: llvm-commits at lists.llvm.org (Mitch Phillips via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:23:46 +0000 (UTC) Subject: [PATCH] D83416: [NFC] Fix some docs warnings. Message-ID: hctim created this revision. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. Fixes two minor issues in the docs: 1 - A header is too small: Warning, treated as error: llvm/llvm/docs/Passes.rst:70:Title underline too short. ``-basic-aa``: Basic Alias Analysis (stateless AA impl) ------------------------------------------------------ 2 - Multiple definitions on a non-anonymous target (llvm-dev mailing list): Warning, treated as error: llvm/llvm/docs/DeveloperPolicy.rst:3:Duplicate explicit target name: "llvm-dev mailing list". Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83416 Files: llvm/docs/DeveloperPolicy.rst llvm/docs/Passes.rst Index: llvm/docs/Passes.rst =================================================================== --- llvm/docs/Passes.rst +++ llvm/docs/Passes.rst @@ -67,7 +67,7 @@ Spadini, and Wojciech Stryjewski. ``-basic-aa``: Basic Alias Analysis (stateless AA impl) ------------------------------------------------------- +------------------------------------------------------- A basic alias analysis pass that implements identities (two different globals cannot alias, etc), but does no stateful analysis. Index: llvm/docs/DeveloperPolicy.rst =================================================================== --- llvm/docs/DeveloperPolicy.rst +++ llvm/docs/DeveloperPolicy.rst @@ -535,7 +535,7 @@ at a minimum. This time-based guideline is not strict: we may support much older compilers, or decide to support fewer versions. 
- * An RFC is sent to the `llvm-dev mailing list `_ + * An RFC is sent to the `llvm-dev mailing list`_ - Detail upsides of the version increase (e.g. which newer C++ language or library features LLVM should use; avoid miscompiles in particular compiler @@ -580,15 +580,15 @@ the LLVM world. However, this is really only intended to cover common cases that we have seen arise: different situations are different, and we are open to discussing unusual cases as well - just start an RFC thread on the -`llvm-dev mailing list `_. +`llvm-dev mailing list`_. Adding a New Target ------------------- LLVM is very receptive to new targets, even experimental ones, but a number of problems can appear when adding new large portions of code, and back-ends are -normally added in bulk. We have found that landing large pieces of new code -and then trying to fix emergent problems in-tree is problematic for a variety +normally added in bulk. We have found that landing large pieces of new code +and then trying to fix emergent problems in-tree is problematic for a variety of reasons. For these reasons, new targets are *always* added as *experimental* until @@ -627,8 +627,8 @@ * The target should have either reasonable documentation on how it works (ISA, ABI, etc.) or a publicly available simulator/hardware (either free or cheap enough) - preferably both. This allows - developers to validate assumptions, understand constraints and review code - that can affect the target. + developers to validate assumptions, understand constraints and review code + that can affect the target. In addition, the rules for a back-end to be promoted to **official** are: @@ -699,7 +699,7 @@ "should" concerns above. If you have a project that you think would make sense to add to the LLVM -monorepo, please start an RFC thread on the llvm-dev mailing list to kick off +monorepo, please start an RFC thread on the `llvm-dev mailing list`_ to kick off the discussion. 
This process can take some time and iteration - please don’t be discouraged or intimidated by that! @@ -761,8 +761,7 @@ and when this comes up, please start an RFC discussion on llvm-dev. This process is very new - please expect the details to change, it is always -safe to ask on the `llvm-dev mailing list -`_ about this. +safe to ask on the `llvm-dev mailing list`_ about this. Suggested disclaimer for the project README and the main project web page: @@ -1033,3 +1032,5 @@ to move code from (e.g.) libc++ to the LLVM core without concern, but that code cannot be moved from the LLVM core to libc++ without the copyright owner's permission. + +.. _llvm-dev mailing list: http://lists.llvm.org/mailman/listinfo/llvm-dev -------------- next part -------------- A non-text attachment was scrubbed... Name: D83416.276521.patch Type: text/x-patch Size: 3737 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:24:10 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:24:10 +0000 (UTC) Subject: [PATCH] D82609: [PowerPC][Power10] Implement Vector Multiply High/Divide Extended Builtins in LLVM/Clang In-Reply-To: References: Message-ID: <29cbd3b309a27c2b936bfa4a4a4b461f@localhost.localdomain> lei added inline comments. ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:79 +vector signed int test_vec_dive_si(void) { + // CHECK: @llvm.ppc.altivec.vdivesw(<4 x i32> + // CHECK-NEXT: ret <4 x i32> ---------------- Why does the check stop matching at the first param? Shouldn't we check that the remaining param types and the number of params are correct as well? ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:854 + [(set v4i32:$vD, + (int_ppc_altivec_vdivesw v4i32:$vA, v4i32:$vB))]>; def VDIVEUW : VXForm_1<651, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB), ---------------- nit: indent to match up with `v4i32` on the previous line.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82609/new/ https://reviews.llvm.org/D82609 From llvm-commits at lists.llvm.org Wed Jul 8 12:24:30 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:24:30 +0000 (UTC) Subject: [PATCH] D82838: Parse section ranges when verifying DWARF so we can exclude addresses that should have been stripped from DWARF. In-Reply-To: References: Message-ID: <6452a656bbe6d97bf9adebba512ed4d8@localhost.localdomain> dblaikie added a comment. In D82838#2135038 , @clayborg wrote: > In D82838#2134725 , @dblaikie wrote: > > > I think maybe this is sort of orthogonal to 46453... maybe not, but kind of. > > > > Seems like we should filter out known-tombstoned ranges (the only ones we can know for sure are the new -1/-2 tombstones - all the others have ambiguities). Then we should maybe flag maybe-tombstones with a little "eh, maybe?". Then we should warn for anything left that's even partially outside the .text range (this patch), then we should warn for overlaps/etc on the remaining ones? > > > So for this patch, anything that isn't in text becomes a warning and not an error? Or do we want to add an option to "llvm-dwarfdump --verify" to enforce the text ranges as feature that is disabled by default? --ignore-invalid-text-ranges? I think my goal was to suggest implementing filter known-tombstones first (now we have a good/known tombstone) so that "is not in .text" doesn't unduly warn on correctly tombstoned ranges/addresses (honestly bfd's tombstoning should be fairly good - since it creates empty ranges at least in debug_ranges that don't use base address selection entries - . Then we could maybe warn or error to varying degrees on the things in the middle (not certainly tombstoned, not entirely in .text... 
) Sorry, didn't mean to muddy the waters with "warning V error" discussion or need to add more flags, etc - folks who implemented/have more ownership over "verify" should chime in on this, but for myself - yeah, I think I'm coming around to "let's just ignore anything that's even partially outside .text for now" & eventually maybe someone implements the specific tombstone support - and then we warn/error/something on "it's not tombstone, but it's outside .text" which would be a separate issue & a problem, even if it's non-overlapping. Then only the "is in .text" bits would be tested for overlapping. >> But as @jhenderson said, maybe those first ones come later & we use the .text range to determine which things to look at for overlap first, then add new verifier checks for "things outside .text that aren't clearly tombstoned" knowing that some of those are expected limitations of (at least gold's) previous tombstoning strategies. >> >> (I'd sort of like to avoid actually looking at the object's executable sections - but I can't really fault the strategy & even if we added all the other verifier checks/warnings/etc, it'd still be super reasonable to warn about ranges that are otherwise totally valid, but extend beyond/are entirely outside the actual executable .text) > > Since zero is so prevalent, it is nice to get that noise out of the error checking since it creates so many false errors at the moment. It makes the --verify option less useful and way too noisy if we don't do something. We can also just not do the .text ranges for object files since they typically have relocations on each address. We already avoid looking at ranges in many cases for .o files. Yup. In theory all this stuff should be supported for object files too. 
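The filtering order discussed in this comment can be sketched roughly as below. This is an illustration only, not the DWARFVerifier code from D82838: the types `AddressRange` and `RangeKind` and the functions `isKnownTombstone`, `isInsideAny` and `classify` are hypothetical, addresses are assumed to be 64-bit, ranges are assumed half-open, and the "known tombstone" test assumes the -1/-2 values mentioned in the thread are the two largest representable addresses. Only ranges classified as `InText` would then go on to the overlap checks.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical half-open address range [Low, High).
struct AddressRange {
  uint64_t Low;
  uint64_t High;
};

// Hypothetical classification used to decide which checks apply.
enum class RangeKind { Tombstoned, OutsideText, InText };

constexpr uint64_t MaxAddr = std::numeric_limits<uint64_t>::max();

// Assumption: a range is a known tombstone when its start address is
// -1 or -2 in the address size (here: the two largest 64-bit values).
bool isKnownTombstone(const AddressRange &R) {
  return R.Low == MaxAddr || R.Low == MaxAddr - 1;
}

// True only if R lies entirely within one of the text ranges.
bool isInsideAny(const AddressRange &R,
                 const std::vector<AddressRange> &TextRanges) {
  for (const AddressRange &T : TextRanges)
    if (R.Low >= T.Low && R.High <= T.High)
      return true;
  return false;
}

// Filter order from the discussion: drop known tombstones first, then
// anything even partially outside .text; the rest is checked for overlap.
RangeKind classify(const AddressRange &R,
                   const std::vector<AddressRange> &TextRanges) {
  if (isKnownTombstone(R))
    return RangeKind::Tombstoned;
  if (!isInsideAny(R, TextRanges))
    return RangeKind::OutsideText;
  return RangeKind::InText;
}
```

With a single text range of 0x1000 to 0x2000, a range starting at zero or one straddling 0x2000 would be classified as `OutsideText` and skipped, which is how the noisy zero-address false errors mentioned above would be kept out of the overlap report.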
================ Comment at: llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h:365 +static inline bool operator<(const DWARFVerifier::SortedRanges &LHS, + const DWARFVerifier::SortedRanges &RHS) { ---------------- (non-member static shouldn't be used in headers - I fixed the op< to not do this) ================ Comment at: llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:743 + TextRanges.emplace_back(DWARFAddressRange( + StartAddr, StartAddr + Size, SectionedAddress::UndefSection)); + } ---------------- Not sure why this uses UndefSection - be nice to support this in object files and track distinct .text sections. Might be that pre-building a table doesn't suit that strategy - not sure. (& maybe even non-.text sections, in case someone decides to use __attribute__((section(".my_section"))) for their functions, for instance) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82838/new/ https://reviews.llvm.org/D82838 From llvm-commits at lists.llvm.org Wed Jul 8 12:24:49 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:24:49 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: ctetreau updated this revision to Diff 276522. ctetreau added a comment. Apparently `SEC` is defined in sys/time.h on solaris Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 Files: llvm/include/llvm/IR/PatternMatch.h llvm/test/Transforms/InstCombine/fmul.ll llvm/test/Transforms/InstCombine/mul.ll llvm/unittests/IR/PatternMatch.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83001.276522.patch Type: text/x-patch Size: 12657 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:26:38 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:26:38 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: <1281789f057a1031d1d102201038d021@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 From llvm-commits at lists.llvm.org Wed Jul 8 12:27:33 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:27:33 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: <7ac4c2de01f60920ae61ef1f8b4adec1@localhost.localdomain> ctetreau updated this revision to Diff 276523. ctetreau added a comment. fix comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 Files: llvm/include/llvm/IR/PatternMatch.h llvm/test/Transforms/InstCombine/fmul.ll llvm/test/Transforms/InstCombine/mul.ll llvm/unittests/IR/PatternMatch.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83001.276523.patch Type: text/x-patch Size: 12683 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:30:26 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:30:26 +0000 (UTC) Subject: [PATCH] D83417: GlobalISel: Restructure argument lowering loop in handleAssignments Message-ID: arsenm created this revision. arsenm added reviewers: aemerson, paquette, rovka, aditya_nandakumar, dsanders. 
Herald added subscribers: hiraditya, kristof.beyls, tpr, wdng. Herald added a project: LLVM. This was structured in a way that implied every split argument is either entirely in memory or entirely in registers. It is possible to pass an original argument partially in registers and partially in memory. Transpose the logic here to only consider a single piece at a time. Every individual CCValAssign should be treated independently, and any merge back to the original value needs to be handled later. This is in preparation for merging some preprocessing hacks in the AMDGPU calling convention lowering into the generic code. This was intended to be NFC, but it does partially address a FIXME in the memloc handling. As a result, this does slightly change AArch64 handling of some promoted arguments passed on the stack. The store will be emitted as the smaller piece type rather than a wider store of an anyext value. I think this exposes a failure to merge stores later, as the change in swifterror replaces a single 64-bit stp with 2 4-byte str. I'm also not sure what the correct behavior is for memlocs where the promoted size is larger than the original value. I've opted to clamp the memory access size to not exceed the value register to avoid the explicit trunc/extend/vector widen/vector extract instruction. This happens for AMDGPU for i8 arguments that end up stack passed, which are promoted to i16 (I think this is a preexisting DAG bug though, and they should not really be promoted when in memory). https://reviews.llvm.org/D83417 Files: llvm/lib/CodeGen/GlobalISel/CallLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/arm64-callingconv-ios.ll llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-exceptions.ll llvm/test/CodeGen/AArch64/GlobalISel/swifterror.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83417.276524.patch Type: text/x-patch Size: 13226 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:31:23 2020 From: llvm-commits at lists.llvm.org (Leonard Chan via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:31:23 +0000 (UTC) Subject: [PATCH] D76389: [NewPM] Run the Speculative Execution Pass only if the target has divergent branches In-Reply-To: References: Message-ID: <158684e87d894a73b6f83eabecbe320a@localhost.localdomain> leonardchan abandoned this revision. leonardchan added a comment. In D76389#2120868 , @arsenm wrote: > This seems like it covers a different case than D82735 ? I haven't rebased in a while, but D82735 covers this. Abandoning since this is addressed in that patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76389/new/ https://reviews.llvm.org/D76389 From llvm-commits at lists.llvm.org Wed Jul 8 12:31:30 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:31:30 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <92f2d375a5481f4ec17761bd31d00701@localhost.localdomain> arsenm added a comment. This should skip trying to remove attributes from intrinsics Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 12:31:51 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:31:51 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: guiand added a comment. @eugenis and I discussed the possibility that removing attributes like this could cause problems if, for some reason, the ABI of a libcall changes as a result. 
The test I added actually demonstrates this: the `inreg` attribute is removed from the calls, which could change the behavior. But this is only a concern if there's some reason to specify ABI-changing attributes to libcalls in the callsite like this, and not have LLVM put them in the declaration. Which I'm not sure there is. What are peoples' thoughts? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Wed Jul 8 12:34:54 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:34:54 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: dblaikie added a comment. In D83351#2139818 , @lebedev.ri wrote: > In D83351#2139769 , @dblaikie wrote: > > > In D83351#2139751 , @lebedev.ri wrote: > > > > > There's also `llvm::for_each()` with 40 uses. > > > > > > Still a very tiny fraction of all iterations. I expect most (or at least an order of magnitude or two more than the <100 for_each()s) of those iterations don't depend on the order of iteration - so I don't think it provides the signal you're thinking of, at least not to enough other developers to be useful - making the code quirky/different in a way that I think is likely to be confusing to other readers ("Why isn't this a range-based for loop? is there some subtle difference in behavior that the author intended that I'm not understanding?" & there isn't) > > > > > Can you please quote specific part of the whatever documentation you believe dictates this? > > > If there isn't one, i'd like to see `ProgrammersManual` patch. > > > > Perhaps this is sufficient: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible > > > I'm afraid it is not. It only speaks about range-based for loop vs. old-style for loops. 
> If `for_each` is so bad, we really should proactively document it, so i'm still waiting on the link/patch. > Thanks. I don't think this is an LLVM-specific C++ stylistic issue & there's lots of ways that we could write less than ideal code that we don't document in the style guide. (general guidance like "don't use dynamic allocation when static allocation will do" for instance) & I think this fits under that kind of category. The existing uses of llvm::for_each look like mostly cases where an existing lambda is used in a few different places and it's slightly shorter/easier to use std::for_each than a range-based-for. I think those are OK/wouldn't object to them. But in these cases where a lambda expression is being passed immediately to for_each, it doesn't seem beneficial to me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 12:35:50 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:35:50 +0000 (UTC) Subject: [PATCH] D83418: AMDGPU/GlobalISel: Start cleaning up calling convention lowering Message-ID: arsenm created this revision. arsenm added reviewers: nhaehnle, foad, kerbowa, rampitec, cdevadas, hsmhsm. Herald added subscribers: hiraditya, t-tye, tpr, dstuttard, rovka, yaxunl, wdng, jvesely, kzhuravl. Herald added a project: LLVM. There are various hacks working around limitations in handleAssignments, and the logical split between different parts isn't correct. Start separating the type legalization to satisfy going through the DAG infrastructure from the code required to split into register types. The type splitting should be moved to generic code. 
https://reviews.llvm.org/D83418 Files: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp llvm/lib/Target/AMDGPU/AMDGPUCallLowering.h llvm/test/CodeGen/AMDGPU/GlobalISel/function-returns.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83418.276525.patch Type: text/x-patch Size: 18240 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:36:28 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:36:28 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: zequanwu updated this revision to Diff 276526. zequanwu added a comment. - Remove the "enable-call-graph-profile" option and enable CGProfilePass by default, unless `-no-integrated-as` is given in clang. - Use `LazyBlockFrequencyInfoPass` instead of `BlockFrequencyInfoWrapperPass` and check `F.getEntryCount` before getting `BFI` to reduce cost. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 Files: clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/new-pm-cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83013.276526.patch Type: text/x-patch Size: 17743 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:38:27 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:38:27 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. In-Reply-To: References: Message-ID: <57f414fd3bc38b2f75dfa0a017345d2b@localhost.localdomain> efriedma added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3569 +// them to expand BUILD_VECTOR. +bool AArch64TargetLowering::shouldExpandBuildVectorWithShuffles( + EVT VT, unsigned DefinedValues) const { ---------------- Would it be enough to just fix isShuffleMaskLegal, instead of overriding this? ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.h:745 + /// illegal as the original, thus leading to an infinite legalisation loop. + bool mergeStoresAfterLegalization(EVT VT) const override { + return !useSVEForFixedLengthVectors(); ---------------- This affects code that isn't using wide vectors, right? If this is supposed to be a temporary hack, I guess it's fine, but please explicitly state that in the comment. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 From llvm-commits at lists.llvm.org Wed Jul 8 12:38:37 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:38:37 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <18a1ace7cbf0b362de2be9ae5a43028d@localhost.localdomain> NeHuang marked an inline comment as done. NeHuang added inline comments. 
================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:45 + bl callee1_stother0_local@notoc + add 3, 3, 30 + b callee1_stother0_local@notoc ---------------- MaskRay wrote: > These `add 3, 3, 30` & `mullw 3, 3, 30` instructions are not needed. > > Probably worth adding a comment before the first `bl ` explaining that the next instruction does not need to be `nop` as in the R_PPC64_REL24 case. Just want to confirm: your suggestion is to remove the instructions after `bl ..@notoc` and add a comment that `nop` is not needed in the `R_PPC64_REL24_NOTOC` case. In that case, we will only keep the `blr`, `bl` and `b` instructions in the test case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Wed Jul 8 12:41:11 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:41:11 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: nickdesaulniers requested changes to this revision. nickdesaulniers added a comment. This revision now requires changes to proceed. In D83351#2139818, @lebedev.ri wrote: > so i'm still waiting on the link/patch. Treat your fellow contributors with more respect, please. I know style disagreements aren't exciting, but we're all on the same team. I much prefer LLVM's community to the Linux kernel's for a reason, and I think it's worthwhile to speak up in defense of it, lest it decay. I'm not the best at maintaining this myself, so if you see me break my own standards, please feel empowered to call me out. Another benefit of range for here is we don't need the braces, so these can be 2 lines instead of 3. Please change them to be consistent. 
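For readers following the style point, here is a small sketch of the two forms being compared. It is plain C++ rather than code from the patch: `std::for_each` stands in for `llvm::for_each` so that the snippet is self-contained (an assumption for illustration; `llvm::for_each` takes the range directly instead of an iterator pair), and `sumWithForEach` and `sumWithRangeFor` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// A lambda passed immediately to for_each: three lines for the loop, plus a
// closure that the reader has to look through.
int sumWithForEach(const std::vector<int> &Values) {
  int Sum = 0;
  std::for_each(Values.begin(), Values.end(), [&](int V) {
    Sum += V;
  });
  return Sum;
}

// The range-based for equivalent: two lines, no braces needed.
int sumWithRangeFor(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```

Both functions compute the same result; the difference is purely in how the iteration reads. The `for_each` form tends to pay off only when the callable already exists and is reused in several places, which is the case described earlier in the thread.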
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 12:41:42 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via llvm-commits) Date: Wed, 08 Jul 2020 12:41:42 -0700 (PDT) Subject: [llvm] 898065a - Reword description of ISD::BUILD_VECTOR Message-ID: <5f062176.1c69fb81.b6c78.19f9@mx.google.com> Author: Cameron McInally Date: 2020-07-08T14:41:26-05:00 New Revision: 898065a7b879f204874820f16e4e16ea2a961de0 URL: https://github.com/llvm/llvm-project/commit/898065a7b879f204874820f16e4e16ea2a961de0 DIFF: https://github.com/llvm/llvm-project/commit/898065a7b879f204874820f16e4e16ea2a961de0.diff LOG: Reword description of ISD::BUILD_VECTOR Move operand type restriction to the end of the description. This hopefully makes the intention more clear. Differential Revision: https://reviews.llvm.org/D83413 Added: Modified: llvm/include/llvm/CodeGen/ISDOpcodes.h Removed: ################################################################################ diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h index d121a4d5427f..534f988c5e96 100644 --- a/llvm/include/llvm/CodeGen/ISDOpcodes.h +++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h @@ -449,10 +449,10 @@ enum NodeType { /// BUILD_VECTOR(ELT0, ELT1, ELT2, ELT3,...) - Return a fixed-width vector /// with the specified, possibly variable, elements. The types of the - /// operands must all be the same. The types of the operands must match the - /// vector element type, except that integer types are allowed to be larger - /// than the element type, in which case the operands are implicitly - /// truncated. + /// operands must match the vector element type, except that integer types + /// are allowed to be larger than the element type, in which case the + /// operands are implicitly truncated. The types of the operands must all + /// be the same. 
BUILD_VECTOR, /// INSERT_VECTOR_ELT(VECTOR, VAL, IDX) - Returns VECTOR with the element From llvm-commits at lists.llvm.org Wed Jul 8 12:41:53 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:41:53 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: zequanwu marked an inline comment as done. zequanwu added a comment. > The alternative of using LazyBlockFrequencyInfoPass and checking PSI->hasProfileSummary() first would also work I guess. If you think that's cleaner, maybe that's the better way to go. Since `PSI->hasProfileSummary()` is not necessary for this pass, it relies on function entry count. So, I check for `F.getEntryCount()` before getting BFI. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Wed Jul 8 12:42:52 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:42:52 +0000 (UTC) Subject: [PATCH] D82609: [PowerPC][Power10] Implement Vector Multiply High/Divide Extended Builtins in LLVM/Clang In-Reply-To: References: Message-ID: amyk marked 2 inline comments as done. amyk added inline comments. ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:79 +vector signed int test_vec_dive_si(void) { + // CHECK: @llvm.ppc.altivec.vdivesw(<4 x i32> + // CHECK-NEXT: ret <4 x i32> ---------------- lei wrote: > why does the ck stops matching at the first param? Shouldn't we check the remaining param type and number of param are correct as well? Yes, thanks for pointing that out. Will be fixing the CHECKs. 
================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:854 + [(set v4i32:$vD, + (int_ppc_altivec_vdivesw v4i32:$vA, v4i32:$vB))]>; def VDIVEUW : VXForm_1<651, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB), ---------------- lei wrote: > nit: indent to match up with `v4i32` on the previous line. I will update with the proper indentation. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82609/new/ https://reviews.llvm.org/D82609 From llvm-commits at lists.llvm.org Wed Jul 8 12:43:31 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:43:31 +0000 (UTC) Subject: [PATCH] D82233: [lit] Add --show command line option In-Reply-To: References: Message-ID: <056fff8a7d4ff00c10b3a12048364278@localhost.localdomain> jdenny marked an inline comment as done. jdenny added inline comments. ================ Comment at: llvm/docs/CommandGuide/lit.rst:120-121 + + Show the names of the specified tests. Choose from: + all, excluded, skipped, unsupported, pass, flakypass, xfail. + ---------------- yln wrote: > jdenny wrote: > > yln wrote: > > > varungandhi-apple wrote: > > > > Could you add an example here with how multiple items should be selected? For example, one might wonder > > > > > > > > 1. Can you do something like `--show skipped,xfail`? > > > > 2. What are the semantics of `--show skipped --show xfail` (does it mean skipped AND xfail or does it only mean xfail)? > > > > > > > > You might also want to add a sentence to `--help` text on how to use multiple options. > > > Good point, I will add an example here. > > > 1. See test. > > > 2. Last one wins. > > > 3. The `--help` part is auto-generated by argparse (because we use the `choices` parameter). > > Why not a comma-separated list? 
> > > > For example, I downloaded your patch and tried this: > > > > ``` > > $ ./bin/llvm-lit test/Support > > -- Testing: 2 tests, 2 workers -- > > PASS: LLVM :: Support/check-default-options.txt (1 of 2) > > PASS: LLVM :: Support/interrupts.test (2 of 2) > > > > Testing Time: 0.11s > > Passed: 2 > > $ ./bin/llvm-lit --show all test/Support > > usage: lit [-h] [--version] [-j N] [--config-prefix NAME] [-D NAME=VAL] [-q] > > [-s] [-v] [-vv] [-a] [-o PATH] [--no-progress-bar] > > [--show-unsupported] [--show-xfail] > > [--show {all,excluded,skipped,unsupported,pass,flakypass,xfail} [{all,excluded,skipped,unsupported,pass,flakypass,xfail} ...]] > > [--path PATH] [--vg] [--vg-leak] [--vg-arg ARG] [--time-tests] > > [--no-execute] [--xunit-xml-output XUNIT_XML_OUTPUT] > > [--timeout MAXINDIVIDUALTESTTIME] [--max-failures MAX_FAILURES] > > [--allow-empty-runs] [--max-tests N] [--max-time N] [--shuffle] > > [-i] [--filter REGEX] [--num-shards M] [--run-shard N] [--debug] > > [--show-suites] [--show-tests] [--show-used-features] > > TEST_PATH [TEST_PATH ...] > > lit: error: argument --show: invalid choice: 'test/Support' (choose from 'all', 'excluded', 'skipped', 'unsupported', 'pass', 'flakypass', 'xfail') > > ``` > > > > The usage message shows that `--show` can be used before `TEST_PATH`, but it doesn't work unless I add `--` in between, which isn't listed in the options. Alternatively, I can specify `--show` after `TEST_PATH`, but that also isn't mentioned in the usage summary above. > > > > If the values were comma-separated, this wouldn't be an issue. > > Why not a comma-separated list? > > Yes, I think this would also "feel" more natural than the current space-separated list and avoid the oddities about parameter ordering. > Surprisingly, argparse does not support this. I see two options: > > * Implement this ourselves (it's actually not too bad): https://stackoverflow.com/a/60205263/271968 > * We could generate flags, e.g., `--show-xfail`, for each result code. 
Pro: this would be in line with the existing flags for unsupported and fail. Con: Extending this scheme to accept user-defined codes would be harder. > > What do you think? Do you have a preference or additional ideas? One aspect I like about the second option (`--show-*`) is that the semantics of multiple occurrences are more intuitive. The first option (comma-separated list) raises the question of accumulate vs. override. But this is a minor point. If I had to choose right now, I'd go with the second option, primarily because it's consistent with the existing interface. However, I don't have strong feelings about this, so I could be happy with either. ================ Comment at: llvm/utils/lit/lit/cl_arguments.py:204 + else: + opts.shown_codes.add(lit.Test.ResultCode._instances[code.upper()]) ---------------- yln wrote: > jdenny wrote: > > What happens if there are user-defined result codes that are spelled the same except for case? > Unfortunately, this can't be used (yet) to specify user-defined result codes at all. User codes are usually registered in config files and this code executes before we evaluate configs, i.e., it will print `argument --show: invalid choice: 'user-code' (choose from ...)` > > If we think it's worth it then we could push "choice validation" to a later point after we processed the user configs. I don't know that `--show` support for user-defined result codes needs to be implemented in this patch. In that case, the case-insensitivity issue I raised is not relevant yet, right? That can be addressed later then. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82233/new/ https://reviews.llvm.org/D82233 From llvm-commits at lists.llvm.org Wed Jul 8 12:45:47 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:45:47 +0000 (UTC) Subject: [PATCH] D83413: Tighten description of ISD::BUILD_VECTOR In-Reply-To: References: Message-ID: <4d63c5ab4200415a4d6e4e652b4af103@localhost.localdomain> cameron.mcinally added a comment. Thanks, @kparzysz . I like that idea. Pushed 898065a7b879f204874820f16e4e16ea2a961de0 . Would you review post-commit? Unfortunately, I missed your quoted recommendation until just now. My email client hid it. Are you okay with it as-is? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83413/new/ https://reviews.llvm.org/D83413 From llvm-commits at lists.llvm.org Wed Jul 8 12:45:56 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:45:56 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: efriedma added a comment. Probably worth adding a testcase for truncating from `<4 x i64>` to `<4 x i8>`. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1063 setOperationAction(ISD::STORE, VT, Custom); + setOperationAction(ISD::TRUNCATE, VT, Custom); } ---------------- This specifically applies to the result type. You might want to note that you're implicitly depending on the fact that we do custom legalization for NEON TRUNCATE operations for other reasons. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 From llvm-commits at lists.llvm.org Wed Jul 8 12:46:03 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Wed, 08 Jul 2020 12:46:03 -0700 (PDT) Subject: [llvm] a48cf72 - [InstSimplify] Handle not inserted instruction gracefully (PR46638) Message-ID: <5f06227b.1c69fb81.fd8ed.19ce@mx.google.com> Author: Nikita Popov Date: 2020-07-08T21:43:32+02:00 New Revision: a48cf72238e740adb2a45012736c0c655070fb8f URL: https://github.com/llvm/llvm-project/commit/a48cf72238e740adb2a45012736c0c655070fb8f DIFF: https://github.com/llvm/llvm-project/commit/a48cf72238e740adb2a45012736c0c655070fb8f.diff LOG: [InstSimplify] Handle not inserted instruction gracefully (PR46638) When simplifying comparisons using a dominating assume, bail out if the context instruction is not inserted. Added: llvm/test/Transforms/SimplifyCFG/pr46638.ll Modified: llvm/lib/Analysis/InstructionSimplify.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp index df4abe09797c..d3bdf9d6aafd 100644 --- a/llvm/lib/Analysis/InstructionSimplify.cpp +++ b/llvm/lib/Analysis/InstructionSimplify.cpp @@ -3284,7 +3284,8 @@ static Value *simplifyICmpWithMinMax(CmpInst::Predicate Pred, Value *LHS, static Value *simplifyICmpWithDominatingAssume(CmpInst::Predicate Predicate, Value *LHS, Value *RHS, const SimplifyQuery &Q) { - if (!Q.AC || !Q.CxtI) + // Gracefully handle instructions that have not been inserted yet. 
+ if (!Q.AC || !Q.CxtI || !Q.CxtI->getParent()) return nullptr; for (Value *AssumeBaseOp : {LHS, RHS}) { diff --git a/llvm/test/Transforms/SimplifyCFG/pr46638.ll b/llvm/test/Transforms/SimplifyCFG/pr46638.ll new file mode 100644 index 000000000000..ba7ce88cf6ad --- /dev/null +++ b/llvm/test/Transforms/SimplifyCFG/pr46638.ll @@ -0,0 +1,45 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -S -simplifycfg < %s | FileCheck %s + +define void @pr46638(i1 %c, i32 %x) { +; CHECK-LABEL: @pr46638( +; CHECK-NEXT: [[CMP1:%.*]] = icmp slt i32 [[X:%.*]], 0 +; CHECK-NEXT: call void @llvm.assume(i1 [[CMP1]]) +; CHECK-NEXT: br i1 [[C:%.*]], label [[TRUE2_CRITEDGE:%.*]], label [[FALSE1:%.*]] +; CHECK: false1: +; CHECK-NEXT: call void @dummy(i32 1) +; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[X]], 0 +; CHECK-NEXT: [[EXT:%.*]] = zext i1 [[CMP2]] to i32 +; CHECK-NEXT: call void @dummy(i32 [[EXT]]) +; CHECK-NEXT: ret void +; CHECK: true2.critedge: +; CHECK-NEXT: [[CMP2_C:%.*]] = icmp sgt i32 [[X]], 0 +; CHECK-NEXT: [[EXT_C:%.*]] = zext i1 [[CMP2_C]] to i32 +; CHECK-NEXT: call void @dummy(i32 [[EXT_C]]) +; CHECK-NEXT: call void @dummy(i32 2) +; CHECK-NEXT: ret void +; + %cmp1 = icmp slt i32 %x, 0 + call void @llvm.assume(i1 %cmp1) + br i1 %c, label %true1, label %false1 + +true1: + %cmp2 = icmp sgt i32 %x, 0 + %ext = zext i1 %cmp2 to i32 + call void @dummy(i32 %ext) + br i1 %c, label %true2, label %false2 + +false1: + call void @dummy(i32 1) + br label %true1 + +true2: + call void @dummy(i32 2) + ret void + +false2: + ret void +} + +declare void @dummy(i32) +declare void @llvm.assume(i1) From llvm-commits at lists.llvm.org Wed Jul 8 12:46:04 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Wed, 08 Jul 2020 12:46:04 -0700 (PDT) Subject: [llvm] 0b39d2d - Revert "[NFC] Separate Peeling Properties into its own struct" Message-ID: <5f06227c.1c69fb81.537eb.111b@mx.google.com> Author: Nikita Popov Date: 
2020-07-08T21:43:32+02:00 New Revision: 0b39d2d75275b80994dac06b7ad05031cbd09393 URL: https://github.com/llvm/llvm-project/commit/0b39d2d75275b80994dac06b7ad05031cbd09393 DIFF: https://github.com/llvm/llvm-project/commit/0b39d2d75275b80994dac06b7ad05031cbd09393.diff LOG: Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit 0369dc98f958a1ca2ec05f1897f091129bb16e8a. Many failing tests. Added: Modified: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index b6698eefdb01..695b7d6061c0 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -450,6 +450,11 @@ class TargetTransformInfo { /// transformation will select an unrolling factor based on the 
current cost /// threshold and other factors. unsigned Count; + /// A forced peeling factor (the number of bodied of the original loop + /// that should be peeled off before the loop body). When set to 0, the + /// unrolling transformation will select a peeling factor based on profile + /// information and other factors. + unsigned PeelCount; /// Default unroll count for loops with run-time trip count. unsigned DefaultUnrollRuntimeCount; // Set the maximum unrolling factor. The unrolling factor may be selected @@ -483,10 +488,19 @@ class TargetTransformInfo { bool Force; /// Allow using trip count upper bound to unroll loops. bool UpperBound; + /// Allow peeling off loop iterations. + bool AllowPeeling; + /// Allow peeling off loop iterations for loop nests. + bool AllowLoopNestsPeeling; /// Allow unrolling of all the iterations of the runtime loop remainder. bool UnrollRemainder; /// Allow unroll and jam. Used to enable unroll and jam for the target. bool UnrollAndJam; + /// Allow peeling basing on profile. Uses to enable peeling off all + /// iterations basing on provided profile. + /// If the value is true the peeling cost model can decide to peel only + /// some iterations and in this case it will set this to false. + bool PeelProfiledIterations; /// Threshold for unroll and jam, for inner loop size. The 'Threshold' /// value above is used during unroll and jam for the outer loop size. /// This value is used in the same manner to limit the size of the inner @@ -520,28 +534,6 @@ class TargetTransformInfo { /// intrinsic is supported. bool emitGetActiveLaneMask() const; - // Parameters that control the loop peeling transformation - struct PeelingPreferences { - /// A forced peeling factor (the number of bodied of the original loop - /// that should be peeled off before the loop body). When set to 0, the - /// a peeling factor based on profile information and other factors. - unsigned PeelCount; - /// Allow peeling off loop iterations. 
- bool AllowPeeling; - /// Allow peeling off loop iterations for loop nests. - bool AllowLoopNestsPeeling; - /// Allow peeling basing on profile. Uses to enable peeling off all - /// iterations basing on provided profile. - /// If the value is true the peeling cost model can decide to peel only - /// some iterations and in this case it will set this to false. - bool PeelProfiledIterations; - }; - - /// Get target-customized preferences for the generic loop peeling - /// transformation. The caller will initialize \p PP with the current - /// target-independent defaults with information from \p L and \p SE. - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) const; /// @} /// \name Scalar Target Information @@ -1290,8 +1282,6 @@ class TargetTransformInfo::Concept { virtual bool isLoweredToCall(const Function *F) = 0; virtual void getUnrollingPreferences(Loop *L, ScalarEvolution &, UnrollingPreferences &UP) = 0; - virtual void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) = 0; virtual bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, @@ -1570,10 +1560,6 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept { UnrollingPreferences &UP) override { return Impl.getUnrollingPreferences(L, SE, UP); } - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) override { - return Impl.getPeelingPreferences(L, SE, PP); - } bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) override { diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index 0ce975d6d4b5..ca7106ab98aa 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -150,9 +150,6 @@ class TargetTransformInfoImplBase { void 
getUnrollingPreferences(Loop *, ScalarEvolution &, TTI::UnrollingPreferences &) {} - void getPeelingPreferences(Loop *, ScalarEvolution &, - TTI::PeelingPreferences &) {} - bool isLegalAddImmediate(int64_t Imm) { return false; } bool isLegalICmpImmediate(int64_t Imm) { return false; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index f9d32eadd23e..c6a9a65ae6c1 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -451,14 +451,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { UP.BEInsns = 2; } - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - PP.PeelCount = 0; - PP.AllowPeeling = true; - PP.AllowLoopNestsPeeling = false; - PP.PeelProfiledIterations = true; - } - bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h index bb3d02b95956..1970cefcefba 100644 --- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h +++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h @@ -94,7 +94,6 @@ bool UnrollRuntimeLoopRemainder( void computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE); bool canPeel(Loop *L); @@ -120,8 +119,6 @@ bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, - bool &UseUpperBound); void simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI, @@ -136,13 +133,9 @@ TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional 
UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserFullUnrollMaxCount); - -TargetTransformInfo::PeelingPreferences -gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, - const TargetTransformInfo &TTI, - Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling); + Optional UserUpperBound, Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling, + Optional UserFullUnrollMaxCount); unsigned ApproximateLoopSize(const Loop *L, unsigned &NumCalls, bool &NotDuplicatable, bool &Convergent, diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 2f051e53790b..87c6f83938ed 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -327,11 +327,6 @@ void TargetTransformInfo::getUnrollingPreferences( return TTIImpl->getUnrollingPreferences(L, SE, UP); } -void TargetTransformInfo::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - PeelingPreferences &PP) const { - return TTIImpl->getPeelingPreferences(L, SE, PP); -} - bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const { return TTIImpl->isLegalAddImmediate(Imm); } diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index cf6de797727b..be0c51b83a25 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -859,11 +859,6 @@ void AArch64TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, getFalkorUnrollingPreferences(L, SE, UP); } -void AArch64TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} - Value *AArch64TTIImpl::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType) { switch (Inst->getIntrinsicID()) { diff --git 
a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h index 094b04c95db4..27afb2e5a7d6 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h @@ -153,9 +153,6 @@ class AArch64TTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp index 46051ac14b59..24f079ffe929 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -236,10 +236,6 @@ void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, } } -void AMDGPUTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} unsigned GCNTTIImpl::getHardwareNumberOfRegisters(bool Vec) const { // The concept of vector registers doesn't really exist. Some packed vector // operations operate on the normal 32-bit registers. @@ -994,11 +990,6 @@ void GCNTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, CommonTTI.getUnrollingPreferences(L, SE, UP); } -void GCNTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - CommonTTI.getPeelingPreferences(L, SE, PP); -} - unsigned R600TTIImpl::getHardwareNumberOfRegisters(bool Vec) const { return 4 * 128; // XXX - 4 channels. Should these count as vector instead? 
} @@ -1105,8 +1096,3 @@ void R600TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { CommonTTI.getUnrollingPreferences(L, SE, UP); } - -void R600TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - CommonTTI.getPeelingPreferences(L, SE, PP); -} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h index b913f5194e40..508ed061e935 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h @@ -61,9 +61,6 @@ class AMDGPUTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); }; class GCNTTIImpl final : public BasicTTIImplBase { @@ -144,9 +141,6 @@ class GCNTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); return TTI::PSK_FastHardware; @@ -264,8 +258,6 @@ class R600TTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); unsigned getHardwareNumberOfRegisters(bool Vec) const; unsigned getNumberOfRegisters(bool Vec) const; unsigned getRegisterBitWidth(bool Vector) const; diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index 74b1331216a0..44dfb9e8c129 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -1582,11 
+1582,6 @@ void ARMTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } -void ARMTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} - bool ARMTTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty, TTI::ReductionFlags Flags) const { return ST->hasMVEIntegerOps(); diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h index 537a546361ee..5d914227c968 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h @@ -251,8 +251,6 @@ class ARMTTIImpl : public BasicTTIImplBase { bool emitGetActiveLaneMask() const; - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); bool shouldBuildLookupTablesForConstant(Constant *C) const { // In the ROPI and RWPI relocation models we can't have pointers to global // variables or functions in constant data, so don't convert switches to diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp index 80c8736cb74a..76df4e8e1931 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp @@ -78,17 +78,12 @@ HexagonTTIImpl::getPopcntSupport(unsigned IntTyWidthInBit) const { void HexagonTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { UP.Runtime = UP.Partial = true; -} - -void HexagonTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); // Only try to peel innermost loops with small runtime trip counts. 
if (L && L->empty() && canPeel(L) && SE.getSmallConstantTripCount(L) == 0 && SE.getSmallConstantMaxTripCount(L) > 0 && SE.getSmallConstantMaxTripCount(L) <= 5) { - PP.PeelCount = 2; + UP.PeelCount = 2; } } diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h index 5fe397486402..3365c5bf1cb1 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h @@ -64,9 +64,6 @@ class HexagonTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - /// Bias LSR towards creating post-increment opportunities. bool shouldFavorPostInc() const; diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp index 3873c73fb2e0..5c14d0f1a24d 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp @@ -155,8 +155,3 @@ void NVPTXTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Partial = UP.Runtime = true; UP.PartialThreshold = UP.Threshold / 4; } - -void NVPTXTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h index cb832031f1ad..88156f687284 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h @@ -95,10 +95,6 @@ class NVPTXTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) { // 
Volatile loads/stores are only supported for shared and global address // spaces, or for generic AS that maps to them. diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp index 53556ffc267d..f2c746a14299 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp @@ -568,10 +568,6 @@ void PPCTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, BaseT::getUnrollingPreferences(L, SE, UP); } -void PPCTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} // This function returns true to allow using coldcc calling convention. // Returning true results in coldcc being used for functions which are cold at // all call sites when the callers of the functions are not calling any other diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h index d998521084e1..b831789d3e6e 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h @@ -66,8 +66,6 @@ class PPCTTIImpl : public BasicTTIImplBase { TargetLibraryInfo *LibInfo); void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp index 864200e5f71c..36141426e27d 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp @@ -294,10 +294,6 @@ void SystemZTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } -void SystemZTTIImpl::getPeelingPreferences(Loop *L, 
ScalarEvolution &SE, - TTI::PeelingPreferences &PP) { - BaseT::getPeelingPreferences(L, SE, PP); -} bool SystemZTTIImpl::isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2) { diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h index 7f8f7f6f923f..d20541774da1 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h @@ -50,9 +50,6 @@ class SystemZTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); - void getPeelingPreferences(Loop *L, ScalarEvolution &SE, - TTI::PeelingPreferences &PP); - bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); /// @} diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp index 285cba6ee205..f0ece1faa5fd 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp @@ -158,8 +158,7 @@ static bool computeUnrollAndJamCount( const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned OuterTripCount, unsigned OuterTripMultiple, unsigned OuterLoopSize, unsigned InnerTripCount, - unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP) { + unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP) { // First up use computeUnrollCount from the loop unroller to get a count // for unrolling the outer loop, plus any loops requiring explicit // unrolling we leave to the unroller. 
This uses UP.Threshold / @@ -169,8 +168,7 @@ static bool computeUnrollAndJamCount( bool UseUpperBound = false; bool ExplicitUnroll = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, ORE, OuterTripCount, MaxTripCount, - /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, PP, - UseUpperBound); + /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, UseUpperBound); if (ExplicitUnroll || UseUpperBound) { // If the user explicitly set the loop as unrolled, dont UnJ it. Leave it // for the unroller instead. @@ -284,9 +282,7 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, OptimizationRemarkEmitter &ORE, int OptLevel) { TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(L, SE, TTI, nullptr, nullptr, OptLevel, None, - None, None, None, None, None); - TargetTransformInfo::PeelingPreferences PP = - gatherPeelingPreferences(L, SE, TTI, None, None); + None, None, None, None, None, None, None); if (AllowUnrollAndJam.getNumOccurrences() > 0) UP.UnrollAndJam = AllowUnrollAndJam; if (UnrollAndJamThreshold.getNumOccurrences() > 0) @@ -371,7 +367,7 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, // Decide if, and by how much, to unroll bool IsCountSetExplicitly = computeUnrollAndJamCount( L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount, - OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP, PP); + OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP); if (UP.Count <= 1) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp index 88845cde8d4f..ec56610e41e5 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp @@ -193,7 +193,9 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserFullUnrollMaxCount) { + Optional UserUpperBound, Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling, + Optional UserFullUnrollMaxCount) { TargetTransformInfo::UnrollingPreferences UP; // Set up the defaults @@ -204,6 +206,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.PartialThreshold = 150; UP.PartialOptSizeThreshold = 0; UP.Count = 0; + UP.PeelCount = 0; UP.DefaultUnrollRuntimeCount = 8; UP.MaxCount = std::numeric_limits::max(); UP.FullUnrollMaxCount = std::numeric_limits::max(); @@ -215,7 +218,10 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.AllowExpensiveTripCount = false; UP.Force = false; UP.UpperBound = false; + UP.AllowPeeling = true; + UP.AllowLoopNestsPeeling = false; UP.UnrollAndJam = false; + UP.PeelProfiledIterations = true; UP.UnrollAndJamInnerLoopThreshold = 60; UP.MaxIterationsCountToAnalyze = UnrollMaxIterationsCountToAnalyze; @@ -243,6 +249,8 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.MaxCount = UnrollMaxCount; if (UnrollFullMaxCount.getNumOccurrences() > 0) UP.FullUnrollMaxCount = UnrollFullMaxCount; + if (UnrollPeelCount.getNumOccurrences() > 0) + UP.PeelCount = UnrollPeelCount; if (UnrollAllowPartial.getNumOccurrences() > 0) UP.Partial = UnrollAllowPartial; if (UnrollAllowRemainder.getNumOccurrences() > 0) @@ -251,6 +259,10 @@ TargetTransformInfo::UnrollingPreferences 
llvm::gatherUnrollingPreferences( UP.Runtime = UnrollRuntime; if (UnrollMaxUpperBound == 0) UP.UpperBound = false; + if (UnrollAllowPeeling.getNumOccurrences() > 0) + UP.AllowPeeling = UnrollAllowPeeling; + if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) + UP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; if (UnrollUnrollRemainder.getNumOccurrences() > 0) UP.UnrollRemainder = UnrollUnrollRemainder; if (UnrollMaxIterationsCountToAnalyze.getNumOccurrences() > 0) @@ -269,39 +281,16 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.Runtime = *UserRuntime; if (UserUpperBound.hasValue()) UP.UpperBound = *UserUpperBound; + if (UserAllowPeeling.hasValue()) + UP.AllowPeeling = *UserAllowPeeling; + if (UserAllowProfileBasedPeeling.hasValue()) + UP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; if (UserFullUnrollMaxCount.hasValue()) UP.FullUnrollMaxCount = *UserFullUnrollMaxCount; return UP; } -TargetTransformInfo::PeelingPreferences -llvm::gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, - const TargetTransformInfo &TTI, - Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling) { - TargetTransformInfo::PeelingPreferences PP; - - // Get Target Specifc Values - TTI.getPeelingPreferences(L, SE, PP); - - // User Specified Values using cl::opt - if (UnrollPeelCount.getNumOccurrences() > 0) - PP.PeelCount = UnrollPeelCount; - if (UnrollAllowPeeling.getNumOccurrences() > 0) - PP.AllowPeeling = UnrollAllowPeeling; - if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) - PP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; - - // User Specifed values provided by argument - if (UserAllowPeeling.hasValue()) - PP.AllowPeeling = *UserAllowPeeling; - if (UserAllowProfileBasedPeeling.hasValue()) - PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; - - return PP; -} - namespace { /// A struct to densely store the state of an instruction after unrolling at @@ -772,8 +761,7 @@ bool 
llvm::computeUnrollCount( ScalarEvolution &SE, const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, - TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, bool &UseUpperBound) { + TargetTransformInfo::UnrollingPreferences &UP, bool &UseUpperBound) { // Check for explicit Count. // 1st priority is unroll count set by "unroll-count" option. @@ -875,8 +863,8 @@ bool llvm::computeUnrollCount( } // 4th priority is loop peeling. - computePeelCount(L, LoopSize, UP, PP, TripCount, SE); - if (PP.PeelCount) { + computePeelCount(L, LoopSize, UP, TripCount, SE); + if (UP.PeelCount) { UP.Runtime = false; UP.Count = 1; return ExplicitUnroll; @@ -1079,9 +1067,8 @@ static LoopUnrollResult tryToUnrollLoop( TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences( L, SE, TTI, BFI, PSI, OptLevel, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial, ProvidedRuntime, ProvidedUpperBound, + ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling, ProvidedFullUnrollMaxCount); - TargetTransformInfo::PeelingPreferences PP = gatherPeelingPreferences( - L, SE, TTI, ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling); // Exit early if unrolling is disabled. For OptForSize, we pick the loop size // as threshold later on. @@ -1155,7 +1142,7 @@ static LoopUnrollResult tryToUnrollLoop( bool UseUpperBound = false; bool IsCountSetExplicitly = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero, - TripMultiple, LoopSize, UP, PP, UseUpperBound); + TripMultiple, LoopSize, UP, UseUpperBound); if (!UP.Count) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
@@ -1170,7 +1157,7 @@ static LoopUnrollResult tryToUnrollLoop( LoopUnrollResult UnrollResult = UnrollLoop( L, {UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount, - UseUpperBound, MaxOrZero, TripMultiple, PP.PeelCount, UP.UnrollRemainder, + UseUpperBound, MaxOrZero, TripMultiple, UP.PeelCount, UP.UnrollRemainder, ForgetAllSCEV}, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop); if (UnrollResult == LoopUnrollResult::Unmodified) @@ -1202,7 +1189,7 @@ static LoopUnrollResult tryToUnrollLoop( // If the loop was peeled, we already "used up" the profile information // we had, so we don't want to unroll or peel again. if (UnrollResult != LoopUnrollResult::FullyUnrolled && - (IsCountSetExplicitly || (PP.PeelProfiledIterations && PP.PeelCount))) + (IsCountSetExplicitly || (UP.PeelProfiledIterations && UP.PeelCount))) L->setLoopAlreadyUnrolled(); return UnrollResult; diff --git a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp index c653aacbee6c..43dfaf3e50dc 100644 --- a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp @@ -279,20 +279,19 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount, // Return the number of iterations we want to peel off. void llvm::computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, - TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE) { assert(LoopSize > 0 && "Zero loop size is not allowed!"); - // Save the PP.PeelCount value set by the target in - // TTI.getPeelingPreferences or by the flag -unroll-peel-count. - unsigned TargetPeelCount = PP.PeelCount; - PP.PeelCount = 0; + // Save the UP.PeelCount value set by the target in + // TTI.getUnrollingPreferences or by the flag -unroll-peel-count. + unsigned TargetPeelCount = UP.PeelCount; + UP.PeelCount = 0; if (!canPeel(L)) return; // Only try to peel innermost loops by default. 
// The constraint can be relaxed by the target in TTI.getUnrollingPreferences // or by the flag -unroll-allow-loop-nests-peeling. - if (!PP.AllowLoopNestsPeeling && !L->empty()) + if (!UP.AllowLoopNestsPeeling && !L->empty()) return; // If the user provided a peel count, use that. @@ -300,13 +299,13 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, if (UserPeelCount) { LLVM_DEBUG(dbgs() << "Force-peeling first " << UnrollForcePeelCount << " iterations.\n"); - PP.PeelCount = UnrollForcePeelCount; - PP.PeelProfiledIterations = true; + UP.PeelCount = UnrollForcePeelCount; + UP.PeelProfiledIterations = true; return; } // Skip peeling if it's disabled. - if (!PP.AllowPeeling) + if (!UP.AllowPeeling) return; unsigned AlreadyPeeled = 0; @@ -355,8 +354,8 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount << " iteration(s) to turn" << " some Phis into invariants.\n"); - PP.PeelCount = DesiredPeelCount; - PP.PeelProfiledIterations = false; + UP.PeelCount = DesiredPeelCount; + UP.PeelProfiledIterations = false; return; } } @@ -368,7 +367,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, return; // Do not apply profile base peeling if it is disabled. 
- if (!PP.PeelProfiledIterations) + if (!UP.PeelProfiledIterations) return; // If we don't know the trip count, but have reason to believe the average // trip count is low, peeling should be beneficial, since we will usually @@ -388,7 +387,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, (LoopSize * (*PeelCount + 1) <= UP.Threshold)) { LLVM_DEBUG(dbgs() << "Peeling first " << *PeelCount << " iterations.\n"); - PP.PeelCount = *PeelCount; + UP.PeelCount = *PeelCount; return; } LLVM_DEBUG(dbgs() << "Requested peel count: " << *PeelCount << "\n"); From llvm-commits at lists.llvm.org Wed Jul 8 12:51:16 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:51:16 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: <308ba905f221184cc9b19c6ad5acf717@localhost.localdomain> efriedma added a comment. If a libcall requires some ABI-changing attribute, presumably we'd make emitBinaryFloatFnCall take care of it, or something like that. I don't think float libcalls currently require ABI attributes on any target, but if they did, there isn't any reason to expect the attributes to be related. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Wed Jul 8 12:53:33 2020 From: llvm-commits at lists.llvm.org (Chris Bieneman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:53:33 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: <74a10e6c0e10d9534eaa9fe9c8aaf35f@localhost.localdomain> beanz added a comment. In D83372#2139641 , @lattner wrote: > Does this add a static constructor? Yes it does. `std::recursive_mutex` has non-trivial constructors and destructors. 
The code that was there knowingly leaked the mutex to avoid the static constructors and destructors. It should be safe to destroy it in `llvm_shutdown`, but making it a global is bad.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83372/new/

https://reviews.llvm.org/D83372

From llvm-commits at lists.llvm.org Wed Jul 8 12:53:56 2020
From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits)
Date: Wed, 08 Jul 2020 12:53:56 -0700 (PDT)
Subject: [llvm] 9b1e953 - [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Message-ID: <5f062454.1c69fb81.403da.1a35@mx.google.com>

Author: Craig Topper
Date: 2020-07-08T12:53:05-07:00
New Revision: 9b1e95329af7bb005275f18225b2c130ec3ea98d

URL: https://github.com/llvm/llvm-project/commit/9b1e95329af7bb005275f18225b2c130ec3ea98d
DIFF: https://github.com/llvm/llvm-project/commit/9b1e95329af7bb005275f18225b2c130ec3ea98d.diff

LOG: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms

As noted here https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html and by alive2, this transform isn't valid. If X is poison this potentially propagates poison when it shouldn't. This same transform still exists in DAGCombiner.
Differential Revision: https://reviews.llvm.org/D83360 Added: Modified: clang/test/CodeGen/arm-mve-intrinsics/dup.c llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstCombine/select.ll llvm/test/Transforms/InstSimplify/select.ll Removed: ################################################################################ diff --git a/clang/test/CodeGen/arm-mve-intrinsics/dup.c b/clang/test/CodeGen/arm-mve-intrinsics/dup.c index 283c08257005..b443917cb258 100644 --- a/clang/test/CodeGen/arm-mve-intrinsics/dup.c +++ b/clang/test/CodeGen/arm-mve-intrinsics/dup.c @@ -242,7 +242,8 @@ uint32x4_t test_vdupq_m_n_u32(uint32x4_t inactive, uint32_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> undef, half [[A:%.*]], i32 0 // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> undef, <8 x i32> zeroinitializer -// CHECK-NEXT: ret <8 x half> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x half> [[DOTSPLAT]], <8 x half> undef +// CHECK-NEXT: ret <8 x half> [[TMP2]] // float16x8_t test_vdupq_x_n_f16(float16_t a, mve_pred16_t p) { @@ -255,7 +256,8 @@ float16x8_t test_vdupq_x_n_f16(float16_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> undef, float [[A:%.*]], i32 0 // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer -// CHECK-NEXT: ret <4 x float> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x float> [[DOTSPLAT]], <4 x float> undef +// CHECK-NEXT: ret <4 x float> [[TMP2]] // float32x4_t test_vdupq_x_n_f32(float32_t a, mve_pred16_t p) { @@ -268,7 +270,8 @@ float32x4_t test_vdupq_x_n_f32(float32_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> 
@llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> undef, i8 [[A:%.*]], i32 0 // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> undef, <16 x i32> zeroinitializer -// CHECK-NEXT: ret <16 x i8> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[DOTSPLAT]], <16 x i8> undef +// CHECK-NEXT: ret <16 x i8> [[TMP2]] // int8x16_t test_vdupq_x_n_s8(int8_t a, mve_pred16_t p) { @@ -281,7 +284,8 @@ int8x16_t test_vdupq_x_n_s8(int8_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> undef, i16 [[A:%.*]], i32 0 // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> undef, <8 x i32> zeroinitializer -// CHECK-NEXT: ret <8 x i16> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[DOTSPLAT]], <8 x i16> undef +// CHECK-NEXT: ret <8 x i16> [[TMP2]] // int16x8_t test_vdupq_x_n_s16(int16_t a, mve_pred16_t p) { @@ -294,7 +298,8 @@ int16x8_t test_vdupq_x_n_s16(int16_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[A:%.*]], i32 0 // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer -// CHECK-NEXT: ret <4 x i32> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[DOTSPLAT]], <4 x i32> undef +// CHECK-NEXT: ret <4 x i32> [[TMP2]] // int32x4_t test_vdupq_x_n_s32(int32_t a, mve_pred16_t p) { @@ -307,7 +312,8 @@ int32x4_t test_vdupq_x_n_s32(int32_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> undef, i8 [[A:%.*]], i32 0 // CHECK-NEXT: 
[[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> undef, <16 x i32> zeroinitializer -// CHECK-NEXT: ret <16 x i8> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[DOTSPLAT]], <16 x i8> undef +// CHECK-NEXT: ret <16 x i8> [[TMP2]] // uint8x16_t test_vdupq_x_n_u8(uint8_t a, mve_pred16_t p) { @@ -320,7 +326,8 @@ uint8x16_t test_vdupq_x_n_u8(uint8_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> undef, i16 [[A:%.*]], i32 0 // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> undef, <8 x i32> zeroinitializer -// CHECK-NEXT: ret <8 x i16> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[DOTSPLAT]], <8 x i16> undef +// CHECK-NEXT: ret <8 x i16> [[TMP2]] // uint16x8_t test_vdupq_x_n_u16(uint16_t a, mve_pred16_t p) { @@ -333,7 +340,8 @@ uint16x8_t test_vdupq_x_n_u16(uint16_t a, mve_pred16_t p) // CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[A:%.*]], i32 0 // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer -// CHECK-NEXT: ret <4 x i32> [[DOTSPLAT]] +// CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[DOTSPLAT]], <4 x i32> undef +// CHECK-NEXT: ret <4 x i32> [[TMP2]] // uint32x4_t test_vdupq_x_n_u32(uint32_t a, mve_pred16_t p) { diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp index d3bdf9d6aafd..8cd5d2034586 100644 --- a/llvm/lib/Analysis/InstructionSimplify.cpp +++ b/llvm/lib/Analysis/InstructionSimplify.cpp @@ -4118,11 +4118,6 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal, if (TrueVal == FalseVal) return TrueVal; - if (isa<UndefValue>(TrueVal)) // select ?, 
undef, X -> X - return FalseVal; - if (isa<UndefValue>(FalseVal)) // select ?, X, undef -> X - return TrueVal; - // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll index 381a77bb8d78..f990a58f984c 100644 --- a/llvm/test/Transforms/InstCombine/select.ll +++ b/llvm/test/Transforms/InstCombine/select.ll @@ -2273,3 +2273,43 @@ exit: %sel = select i1 %cond, i32 %phi, i32 %A ret i32 %sel } + +; Negative tests to ensure we don't remove selects with undef true/false values. +; See https://bugs.llvm.org/show_bug.cgi?id=31633 +; https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html +; https://reviews.llvm.org/D83360 +define i32 @false_undef(i1 %cond, i32 %x) { +; CHECK-LABEL: @false_undef( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[X:%.*]], i32 undef +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 %cond, i32 %x, i32 undef + ret i32 %s +} + +define i32 @true_undef(i1 %cond, i32 %x) { +; CHECK-LABEL: @true_undef( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[X:%.*]] +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 %cond, i32 undef, i32 %x + ret i32 %s +} + +define <2 x i32> @false_undef_vec(i1 %cond, <2 x i32> %x) { +; CHECK-LABEL: @false_undef_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> [[X:%.*]], <2 x i32> undef +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 %cond, <2 x i32> %x, <2 x i32> undef + ret <2 x i32> %s +} + +define <2 x i32> @true_undef_vec(i1 %cond, <2 x i32> %x) { +; CHECK-LABEL: @true_undef_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> undef, <2 x i32> [[X:%.*]] +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 %cond, <2 x i32> undef, <2 x i32> %x + ret <2 x i32> %s +} diff --git a/llvm/test/Transforms/InstSimplify/select.ll 
b/llvm/test/Transforms/InstSimplify/select.ll index 139f5e3c3c23..81fc3ff186cd 100644 --- a/llvm/test/Transforms/InstSimplify/select.ll +++ b/llvm/test/Transforms/InstSimplify/select.ll @@ -750,3 +750,43 @@ define i1 @y_might_be_poison(float %x, float %y) { %c3 = select i1 %c1, i1 %c2, i1 false ret i1 %c3 } + +; Negative tests to ensure we don't remove selects with undef true/false values. +; See https://bugs.llvm.org/show_bug.cgi?id=31633 +; https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html +; https://reviews.llvm.org/D83360 +define i32 @false_undef(i1 %cond, i32 %x) { +; CHECK-LABEL: @false_undef( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[X:%.*]], i32 undef +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 %cond, i32 %x, i32 undef + ret i32 %s +} + +define i32 @true_undef(i1 %cond, i32 %x) { +; CHECK-LABEL: @true_undef( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[X:%.*]] +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 %cond, i32 undef, i32 %x + ret i32 %s +} + +define <2 x i32> @false_undef_vec(i1 %cond, <2 x i32> %x) { +; CHECK-LABEL: @false_undef_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> [[X:%.*]], <2 x i32> undef +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 %cond, <2 x i32> %x, <2 x i32> undef + ret <2 x i32> %s +} + +define <2 x i32> @true_undef_vec(i1 %cond, <2 x i32> %x) { +; CHECK-LABEL: @true_undef_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> undef, <2 x i32> [[X:%.*]] +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 %cond, <2 x i32> undef, <2 x i32> %x + ret <2 x i32> %s +} From llvm-commits at lists.llvm.org Wed Jul 8 12:54:00 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:54:00 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: This revision was automatically updated to 
reflect the committed changes. Closed by commit rG9b1e95329af7: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X… (authored by craig.topper). Herald added a project: clang. Herald added a subscriber: cfe-commits. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 Files: clang/test/CodeGen/arm-mve-intrinsics/dup.c llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstCombine/select.ll llvm/test/Transforms/InstSimplify/select.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83360.276527.patch Type: text/x-patch Size: 8854 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 12:55:52 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:55:52 +0000 (UTC) Subject: [PATCH] D82510: [PowerPC][Power10] Implement low-order Vector Multiply, Modulus and Divide Instructions In-Reply-To: References: Message-ID: amyk added a comment. Will address the comment of adding BE tests on the commit. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82510/new/ https://reviews.llvm.org/D82510 From llvm-commits at lists.llvm.org Wed Jul 8 12:57:38 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 19:57:38 +0000 (UTC) Subject: [PATCH] D83001: [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats In-Reply-To: References: Message-ID: <3c82bd0cf4355a2692905757b0b8f9f8@localhost.localdomain> ctetreau updated this revision to Diff 276530. ctetreau added a comment. 
rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83001/new/ https://reviews.llvm.org/D83001 Files: llvm/include/llvm/IR/PatternMatch.h llvm/test/Transforms/InstCombine/fmul.ll llvm/test/Transforms/InstCombine/mul.ll llvm/unittests/IR/PatternMatch.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83001.276530.patch Type: text/x-patch Size: 12665 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:00:00 2020 From: llvm-commits at lists.llvm.org (Muhammad Usman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:00:00 +0000 (UTC) Subject: [PATCH] D83392: Strlen loop idiom recognition In-Reply-To: References: Message-ID: musman added a comment. @efriedma Thanks for the review, I didn't think to use SCEV for this. I will modify the code as you suggested. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83392/new/ https://reviews.llvm.org/D83392 From llvm-commits at lists.llvm.org Wed Jul 8 13:05:32 2020 From: llvm-commits at lists.llvm.org (Hal Finkel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:05:32 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <47e6cf33dfb588156378e8c11c66ceae@localhost.localdomain> hfinkel added a comment. In D82998#2138980 , @dmgreen wrote: > It should be triggering relatively often, from looking at the tests/benchmarks I was seeing. Any code that uses pointer induction variables can trigger it. Whether that will be common will be a matter of the style of the original code, and whether it actually proves more NoAlias will be variable. But it seems to come up enough to make code improvements in a fair number of cases. > > From what I can tell it should be almost free though, in terms of compile time. Just an extra check of whether a phi is a simple recursion. 
Baseline checks are generally self-hosting and running the test suite. Make sure that everything looks clean. Check for regressions (compile time or execution time). Have you already done that? Also, checking on how many times this change triggers during that process is useful information. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Wed Jul 8 13:06:30 2020 From: llvm-commits at lists.llvm.org (Philip Reames via llvm-commits) Date: Wed, 8 Jul 2020 13:06:30 -0700 Subject: [llvm] 9b1e953 - [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms In-Reply-To: <5f062454.1c69fb81.403da.1a35@mx.google.com> References: <5f062454.1c69fb81.403da.1a35@mx.google.com> Message-ID: <390b4bd3-5637-f5a5-2f4e-89dd05ccdde9@philipreames.com> Rather than removing them, could we make the transform explicitly dependent on X being provably non-poison? If nothing else, having the conditional transform gives an obvious place for a comment explaining the subtlety herein. :) Philip On 7/8/20 12:53 PM, Craig Topper via llvm-commits wrote: > Author: Craig Topper > Date: 2020-07-08T12:53:05-07:00 > New Revision: 9b1e95329af7bb005275f18225b2c130ec3ea98d > > URL: https://github.com/llvm/llvm-project/commit/9b1e95329af7bb005275f18225b2c130ec3ea98d > DIFF: https://github.com/llvm/llvm-project/commit/9b1e95329af7bb005275f18225b2c130ec3ea98d.diff > > LOG: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms > > As noted here https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html and by alive2, this transform isn't valid. If X is poison this potentially propagates poison when it shouldn't. > > This same transform still exists in DAGCombiner. 
> > Differential Revision: https://reviews.llvm.org/D83360 > > Added: > > > Modified: > clang/test/CodeGen/arm-mve-intrinsics/dup.c > llvm/lib/Analysis/InstructionSimplify.cpp > llvm/test/Transforms/InstCombine/select.ll > llvm/test/Transforms/InstSimplify/select.ll > > Removed: > > > > ################################################################################ > diff --git a/clang/test/CodeGen/arm-mve-intrinsics/dup.c b/clang/test/CodeGen/arm-mve-intrinsics/dup.c > index 283c08257005..b443917cb258 100644 > --- a/clang/test/CodeGen/arm-mve-intrinsics/dup.c > +++ b/clang/test/CodeGen/arm-mve-intrinsics/dup.c > @@ -242,7 +242,8 @@ uint32x4_t test_vdupq_m_n_u32(uint32x4_t inactive, uint32_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <8 x half> undef, half [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <8 x half> [[DOTSPLATINSERT]], <8 x half> undef, <8 x i32> zeroinitializer > -// CHECK-NEXT: ret <8 x half> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x half> [[DOTSPLAT]], <8 x half> undef > +// CHECK-NEXT: ret <8 x half> [[TMP2]] > // > float16x8_t test_vdupq_x_n_f16(float16_t a, mve_pred16_t p) > { > @@ -255,7 +256,8 @@ float16x8_t test_vdupq_x_n_f16(float16_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> undef, float [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer > -// CHECK-NEXT: ret <4 x float> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x float> [[DOTSPLAT]], <4 x float> undef > +// CHECK-NEXT: ret <4 x float> [[TMP2]] > // > float32x4_t test_vdupq_x_n_f32(float32_t a, mve_pred16_t p) > { > @@ -268,7 +270,8 @@ float32x4_t 
test_vdupq_x_n_f32(float32_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> undef, i8 [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> undef, <16 x i32> zeroinitializer > -// CHECK-NEXT: ret <16 x i8> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[DOTSPLAT]], <16 x i8> undef > +// CHECK-NEXT: ret <16 x i8> [[TMP2]] > // > int8x16_t test_vdupq_x_n_s8(int8_t a, mve_pred16_t p) > { > @@ -281,7 +284,8 @@ int8x16_t test_vdupq_x_n_s8(int8_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> undef, i16 [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> undef, <8 x i32> zeroinitializer > -// CHECK-NEXT: ret <8 x i16> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[DOTSPLAT]], <8 x i16> undef > +// CHECK-NEXT: ret <8 x i16> [[TMP2]] > // > int16x8_t test_vdupq_x_n_s16(int16_t a, mve_pred16_t p) > { > @@ -294,7 +298,8 @@ int16x8_t test_vdupq_x_n_s16(int16_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer > -// CHECK-NEXT: ret <4 x i32> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[DOTSPLAT]], <4 x i32> undef > +// CHECK-NEXT: ret <4 x i32> [[TMP2]] > // > int32x4_t test_vdupq_x_n_s32(int32_t a, mve_pred16_t p) > { > @@ -307,7 +312,8 @@ int32x4_t test_vdupq_x_n_s32(int32_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> 
@llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <16 x i8> undef, i8 [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <16 x i8> [[DOTSPLATINSERT]], <16 x i8> undef, <16 x i32> zeroinitializer > -// CHECK-NEXT: ret <16 x i8> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[DOTSPLAT]], <16 x i8> undef > +// CHECK-NEXT: ret <16 x i8> [[TMP2]] > // > uint8x16_t test_vdupq_x_n_u8(uint8_t a, mve_pred16_t p) > { > @@ -320,7 +326,8 @@ uint8x16_t test_vdupq_x_n_u8(uint8_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <8 x i16> undef, i16 [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <8 x i16> [[DOTSPLATINSERT]], <8 x i16> undef, <8 x i32> zeroinitializer > -// CHECK-NEXT: ret <8 x i16> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[DOTSPLAT]], <8 x i16> undef > +// CHECK-NEXT: ret <8 x i16> [[TMP2]] > // > uint16x8_t test_vdupq_x_n_u16(uint16_t a, mve_pred16_t p) > { > @@ -333,7 +340,8 @@ uint16x8_t test_vdupq_x_n_u16(uint16_t a, mve_pred16_t p) > // CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]]) > // CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[A:%.*]], i32 0 > // CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer > -// CHECK-NEXT: ret <4 x i32> [[DOTSPLAT]] > +// CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[DOTSPLAT]], <4 x i32> undef > +// CHECK-NEXT: ret <4 x i32> [[TMP2]] > // > uint32x4_t test_vdupq_x_n_u32(uint32_t a, mve_pred16_t p) > { > > diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp > index d3bdf9d6aafd..8cd5d2034586 100644 > --- a/llvm/lib/Analysis/InstructionSimplify.cpp > +++ 
b/llvm/lib/Analysis/InstructionSimplify.cpp > @@ -4118,11 +4118,6 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal, > if (TrueVal == FalseVal) > return TrueVal; > > - if (isa<UndefValue>(TrueVal)) // select ?, undef, X -> X > - return FalseVal; > - if (isa<UndefValue>(FalseVal)) // select ?, X, undef -> X > - return TrueVal; > - > // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' > Constant *TrueC, *FalseC; > if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && > > diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll > index 381a77bb8d78..f990a58f984c 100644 > --- a/llvm/test/Transforms/InstCombine/select.ll > +++ b/llvm/test/Transforms/InstCombine/select.ll > @@ -2273,3 +2273,43 @@ exit: > %sel = select i1 %cond, i32 %phi, i32 %A > ret i32 %sel > } > + > +; Negative tests to ensure we don't remove selects with undef true/false values. > +; See https://bugs.llvm.org/show_bug.cgi?id=31633 > +; https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html > +; https://reviews.llvm.org/D83360 > +define i32 @false_undef(i1 %cond, i32 %x) { > +; CHECK-LABEL: @false_undef( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[X:%.*]], i32 undef > +; CHECK-NEXT: ret i32 [[S]] > +; > + %s = select i1 %cond, i32 %x, i32 undef > + ret i32 %s > +} > + > +define i32 @true_undef(i1 %cond, i32 %x) { > +; CHECK-LABEL: @true_undef( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[X:%.*]] > +; CHECK-NEXT: ret i32 [[S]] > +; > + %s = select i1 %cond, i32 undef, i32 %x > + ret i32 %s > +} > + > +define <2 x i32> @false_undef_vec(i1 %cond, <2 x i32> %x) { > +; CHECK-LABEL: @false_undef_vec( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> [[X:%.*]], <2 x i32> undef > +; CHECK-NEXT: ret <2 x i32> [[S]] > +; > + %s = select i1 %cond, <2 x i32> %x, <2 x i32> undef > + ret <2 x i32> %s > +} > + > +define <2 x i32> @true_undef_vec(i1 
%cond, <2 x i32> %x) { > +; CHECK-LABEL: @true_undef_vec( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> undef, <2 x i32> [[X:%.*]] > +; CHECK-NEXT: ret <2 x i32> [[S]] > +; > + %s = select i1 %cond, <2 x i32> undef, <2 x i32> %x > + ret <2 x i32> %s > +} > > diff --git a/llvm/test/Transforms/InstSimplify/select.ll b/llvm/test/Transforms/InstSimplify/select.ll > index 139f5e3c3c23..81fc3ff186cd 100644 > --- a/llvm/test/Transforms/InstSimplify/select.ll > +++ b/llvm/test/Transforms/InstSimplify/select.ll > @@ -750,3 +750,43 @@ define i1 @y_might_be_poison(float %x, float %y) { > %c3 = select i1 %c1, i1 %c2, i1 false > ret i1 %c3 > } > + > +; Negative tests to ensure we don't remove selects with undef true/false values. > +; See https://bugs.llvm.org/show_bug.cgi?id=31633 > +; https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html > +; https://reviews.llvm.org/D83360 > +define i32 @false_undef(i1 %cond, i32 %x) { > +; CHECK-LABEL: @false_undef( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[X:%.*]], i32 undef > +; CHECK-NEXT: ret i32 [[S]] > +; > + %s = select i1 %cond, i32 %x, i32 undef > + ret i32 %s > +} > + > +define i32 @true_undef(i1 %cond, i32 %x) { > +; CHECK-LABEL: @true_undef( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[X:%.*]] > +; CHECK-NEXT: ret i32 [[S]] > +; > + %s = select i1 %cond, i32 undef, i32 %x > + ret i32 %s > +} > + > +define <2 x i32> @false_undef_vec(i1 %cond, <2 x i32> %x) { > +; CHECK-LABEL: @false_undef_vec( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> [[X:%.*]], <2 x i32> undef > +; CHECK-NEXT: ret <2 x i32> [[S]] > +; > + %s = select i1 %cond, <2 x i32> %x, <2 x i32> undef > + ret <2 x i32> %s > +} > + > +define <2 x i32> @true_undef_vec(i1 %cond, <2 x i32> %x) { > +; CHECK-LABEL: @true_undef_vec( > +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> undef, <2 x i32> [[X:%.*]] > +; CHECK-NEXT: ret <2 x i32> [[S]] > +; > + %s = select i1 %cond, 
<2 x i32> undef, <2 x i32> %x > + ret <2 x i32> %s > +} > > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Wed Jul 8 13:06:58 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:06:58 +0000 (UTC) Subject: [PATCH] D83005: [NFC] Combine cstfp_pred_ty and cst_pred_ty In-Reply-To: References: Message-ID: <5c1ce7306b1a90ad3b25d6ba21792691@localhost.localdomain> ctetreau updated this revision to Diff 276532. ctetreau added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83005/new/ https://reviews.llvm.org/D83005 Files: llvm/include/llvm/IR/Constants.h llvm/include/llvm/IR/PatternMatch.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83005.276532.patch Type: text/x-patch Size: 4717 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:08:51 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:08:51 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <57a7070f1ecb984b35ccb1360091907e@localhost.localdomain> efriedma added a comment. Please also add testcases with select constant expressions, to test constant folding. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 13:09:25 2020 From: llvm-commits at lists.llvm.org (Rodrigo Dominguez via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:09:25 +0000 (UTC) Subject: [PATCH] D81172: [AMDGPU] Implement hardware bug workaround for image instructions In-Reply-To: References: Message-ID: rdomingu updated this revision to Diff 276533. rdomingu added a comment. Updating D81172 : [AMDGPU] Implement hardware bug workaround for image instructions Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81172/new/ https://reviews.llvm.org/D81172 Files: llvm/lib/Target/AMDGPU/AMDGPU.td llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIISelLowering.h llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.store.2d.d16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.store.2d.d16.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.d16.dim.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.d16.dim.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81172.276533.patch Type: text/x-patch Size: 33277 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:09:27 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:09:27 +0000 (UTC) Subject: [PATCH] D83390: [GlobalISel][InlineAsm] Extend input operands when register class size does not match type In-Reply-To: References: Message-ID: arsenm added inline comments. 
================ Comment at: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp:537-539 + Register ScratchReg = + MRI->createGenericVirtualRegister(LLT::scalar(TargetSize)); + InputReg = MIRBuilder.buildAnyExt(ScratchReg, InputReg).getReg(0); ---------------- Scratch reg and InputReg are the same thing. You can just use the type as the DstOp to buildAnyExt. This may also be a problem for non-scalar sources. What if you have <2 x i4>? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83390/new/ https://reviews.llvm.org/D83390 From llvm-commits at lists.llvm.org Wed Jul 8 13:12:22 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:12:22 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: jdoerfert added inline comments. ================ Comment at: llvm/include/llvm/IR/Function.h:834 + /// Optionally passes back an offending user for diagnostic purposes and + /// ignores callback uses /// ---------------- Nit: missing `.`. ================ Comment at: llvm/lib/IR/Function.cpp:1459 +/// other than direct calls or invokes to it. Optionally ignores callback +/// uses +bool Function::hasAddressTaken(const User **PutOffender, ---------------- Nit: missing `.`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Wed Jul 8 13:13:31 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:13:31 +0000 (UTC) Subject: [PATCH] D81500: [SVE] Remove calls to VectorType::getNumElements from IR In-Reply-To: References: Message-ID: ctetreau updated this revision to Diff 276536. ctetreau added a comment. 
rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81500/new/ https://reviews.llvm.org/D81500 Files: llvm/include/llvm/IR/GetElementPtrTypeIterator.h llvm/include/llvm/IR/Instructions.h llvm/include/llvm/IR/MatrixBuilder.h llvm/lib/IR/AsmWriter.cpp llvm/lib/IR/AutoUpgrade.cpp llvm/lib/IR/Constants.cpp llvm/lib/IR/Core.cpp llvm/lib/IR/Function.cpp llvm/lib/IR/IRBuilder.cpp llvm/lib/IR/Instructions.cpp llvm/lib/IR/Verifier.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D81500.276536.patch Type: text/x-patch Size: 35508 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:14:21 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 20:14:21 +0000 (UTC) Subject: [PATCH] D83421: [RFC] MemorySSAUpdater: Simplify applyUpdates Message-ID: nhaehnle created this revision. nhaehnle added reviewers: asbirlea, kuhar. Herald added subscribers: george.burgess.iv, hiraditya, Prazek. Herald added a project: LLVM. The alternative constructor of DominatorTree does not actually compute a dominator tree that is any different from the standard constructor, and callers of MemorySSAUpdater::applyUpdates have already applied updates to the dominator tree. Change-Id: I311b6e019763b26996e5181f4d43ccedb67d6e88 Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83421 Files: llvm/lib/Analysis/MemorySSAUpdater.cpp Index: llvm/lib/Analysis/MemorySSAUpdater.cpp =================================================================== --- llvm/lib/Analysis/MemorySSAUpdater.cpp +++ llvm/lib/Analysis/MemorySSAUpdater.cpp @@ -792,20 +792,8 @@ DeleteUpdates.push_back({DT.Delete, Update.getFrom(), Update.getTo()}); } - if (!DeleteUpdates.empty()) { - // Update for inserted edges: use newDT and snapshot CFG as if deletes had - // not occurred. 
- // FIXME: This creates a new DT, so it's more expensive to do mix - // delete/inserts vs just inserts. We can do an incremental update on the DT - // to revert deletes, than re-delete the edges. Teaching DT to do this, is - // part of a pending cleanup. - DominatorTree NewDT(DT, DeleteUpdates); - GraphDiff GD(DeleteUpdates, /*ReverseApplyUpdates=*/true); - applyInsertUpdates(InsertUpdates, NewDT, &GD); - } else { - GraphDiff GD; - applyInsertUpdates(InsertUpdates, DT, &GD); - } + GraphDiff GD(DeleteUpdates, /*ReverseApplyUpdates=*/true); + applyInsertUpdates(InsertUpdates, DT, &GD); // Update for deleted edges for (auto &Update : DeleteUpdates) -------------- next part -------------- A non-text attachment was scrubbed... Name: D83421.276537.patch Type: text/x-patch Size: 1178 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:15:14 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:15:14 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <8b5a1f91de3bdce94825cc3deec8dfb5@localhost.localdomain> craig.topper added a comment. In D83360#2139933 , @efriedma wrote: > Please also add testcases with select constant expressions, to test constant folding. Should we remove the handling from llvm::ConstantFoldSelectInstruction Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 13:15:59 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Wed, 08 Jul 2020 20:15:59 +0000 (UTC) Subject: [PATCH] D83421: [RFC] MemorySSAUpdater: Simplify applyUpdates In-Reply-To: References: Message-ID: <295e5e509b026434f26900ec8194335a@localhost.localdomain> nhaehnle added a comment. 
This is a weird one that I stumbled upon while looking at dominator tree construction. CalculateWithUpdates doesn't actually *do* anything with the passed-in updates, so this change should be NFC. Either this is a very useful optimization, or something is rather broken in the MemorySSAUpdater, but I simply don't know it well enough to tell which one it is, so any illumination would be very much appreciated. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83421/new/ https://reviews.llvm.org/D83421 From llvm-commits at lists.llvm.org Wed Jul 8 13:17:04 2020 From: llvm-commits at lists.llvm.org (Ondrej Sykora via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:17:04 +0000 (UTC) Subject: [PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements. In-Reply-To: References: Message-ID: <59a930f2ca42279976a92f575812e12b@localhost.localdomain> ondrasej added a comment. Looks good overall. ================ Comment at: llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:58 llvm::SmallVector *Result) { const size_t NumValues = std::max(NewValues.size(), Result->size()); ---------------- [style] Remove the empty line. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77422/new/ https://reviews.llvm.org/D77422 From llvm-commits at lists.llvm.org Wed Jul 8 13:18:48 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:18:48 +0000 (UTC) Subject: [PATCH] D82820: [InstCombine] Fix mismatched attribute lists for combined calls In-Reply-To: References: Message-ID: <557214ba22e5a492b2e905b20a569613@localhost.localdomain> guiand added a comment. Good to hear. Should this be good to go in that case? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82820/new/ https://reviews.llvm.org/D82820 From llvm-commits at lists.llvm.org Wed Jul 8 13:20:10 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:20:10 +0000 (UTC) Subject: [PATCH] D83409: [opt] Remove obsolete --quiet option In-Reply-To: References: Message-ID: aeubanks updated this revision to Diff 276539. aeubanks added a comment. Format (not sure why git clang-format didn't work) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83409/new/ https://reviews.llvm.org/D83409 Files: llvm/include/llvm/Support/SystemUtils.h llvm/lib/Support/SystemUtils.cpp llvm/tools/llvm-as/llvm-as.cpp llvm/tools/llvm-extract/llvm-extract.cpp llvm/tools/llvm-link/llvm-link.cpp llvm/tools/opt/PassPrinters.cpp llvm/tools/opt/PassPrinters.h llvm/tools/opt/opt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83409.276539.patch Type: text/x-patch Size: 13466 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:23:49 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:23:49 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: <873ac752b3b806bd2a4b5be708ae433b@localhost.localdomain> paulwalker-arm marked an inline comment as done. paulwalker-arm added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1063 setOperationAction(ISD::STORE, VT, Custom); + setOperationAction(ISD::TRUNCATE, VT, Custom); } ---------------- efriedma wrote: > This specifically applies to the result type. You might want to note that you're implicitly depending on the fact that we do custom legalization for NEON TRUNCATE operations for other reasons. I wondered about that. 
Do you think it would be better if I was just explicit and add the necessary setOperation calls even though they're duplicates? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 From llvm-commits at lists.llvm.org Wed Jul 8 13:25:28 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Wed, 08 Jul 2020 13:25:28 -0700 (PDT) Subject: [llvm] 930eaad - [opt] Remove obsolete --quiet option Message-ID: <5f062bb8.1c69fb81.4a405.1fb3@mx.google.com> Author: Arthur Eubanks Date: 2020-07-08T13:21:20-07:00 New Revision: 930eaadacfd11273af2f9c3ae21664648dc1e26f URL: https://github.com/llvm/llvm-project/commit/930eaadacfd11273af2f9c3ae21664648dc1e26f DIFF: https://github.com/llvm/llvm-project/commit/930eaadacfd11273af2f9c3ae21664648dc1e26f.diff LOG: [opt] Remove obsolete --quiet option git blame shows these were last touched in 2004? Obsoleted in r13844. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D83409 Added: Modified: llvm/include/llvm/Support/SystemUtils.h llvm/lib/Support/SystemUtils.cpp llvm/tools/llvm-as/llvm-as.cpp llvm/tools/llvm-extract/llvm-extract.cpp llvm/tools/llvm-link/llvm-link.cpp llvm/tools/opt/PassPrinters.cpp llvm/tools/opt/PassPrinters.h llvm/tools/opt/opt.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/SystemUtils.h b/llvm/include/llvm/Support/SystemUtils.h index 77deddb9ee1c..786bea3fcfae 100644 --- a/llvm/include/llvm/Support/SystemUtils.h +++ b/llvm/include/llvm/Support/SystemUtils.h @@ -15,17 +15,16 @@ #define LLVM_SUPPORT_SYSTEMUTILS_H namespace llvm { - class raw_ostream; +class raw_ostream; /// Determine if the raw_ostream provided is connected to a terminal. If so, /// generate a warning message to errs() advising against display of bitcode /// and return true. Otherwise just return false. 
/// Check for output written to a console bool CheckBitcodeOutputToConsole( - raw_ostream &stream_to_check, ///< The stream to be checked - bool print_warning = true ///< Control whether warnings are printed + raw_ostream &stream_to_check ///< The stream to be checked ); -} // End llvm namespace +} // namespace llvm #endif diff --git a/llvm/lib/Support/SystemUtils.cpp b/llvm/lib/Support/SystemUtils.cpp index 47e0c72ec7c1..f1149e48dce5 100644 --- a/llvm/lib/Support/SystemUtils.cpp +++ b/llvm/lib/Support/SystemUtils.cpp @@ -15,15 +15,12 @@ #include "llvm/Support/raw_ostream.h" using namespace llvm; -bool llvm::CheckBitcodeOutputToConsole(raw_ostream &stream_to_check, - bool print_warning) { +bool llvm::CheckBitcodeOutputToConsole(raw_ostream &stream_to_check) { if (stream_to_check.is_displayed()) { - if (print_warning) { - errs() << "WARNING: You're attempting to print out a bitcode file.\n" - "This is inadvisable as it may cause display problems. If\n" - "you REALLY want to taste LLVM bitcode first-hand, you\n" - "can force output with the `-f' option.\n\n"; - } + errs() << "WARNING: You're attempting to print out a bitcode file.\n" + "This is inadvisable as it may cause display problems. If\n" + "you REALLY want to taste LLVM bitcode first-hand, you\n" + "can force output with the `-f' option.\n\n"; return true; } return false; diff --git a/llvm/tools/llvm-as/llvm-as.cpp b/llvm/tools/llvm-as/llvm-as.cpp index 699e9121b88a..f2b52890a7f5 100644 --- a/llvm/tools/llvm-as/llvm-as.cpp +++ b/llvm/tools/llvm-as/llvm-as.cpp @@ -88,7 +88,7 @@ static void WriteOutputFile(const Module *M, const ModuleSummaryIndex *Index) { exit(1); } - if (Force || !CheckBitcodeOutputToConsole(Out->os(), true)) { + if (Force || !CheckBitcodeOutputToConsole(Out->os())) { const ModuleSummaryIndex *IndexToWrite = nullptr; // Don't attempt to write a summary index unless it contains any entries or // has non-zero flags. 
The latter is used to assemble dummy index files for diff --git a/llvm/tools/llvm-extract/llvm-extract.cpp b/llvm/tools/llvm-extract/llvm-extract.cpp index 9be4627e2ed7..cb1c4116ff19 100644 --- a/llvm/tools/llvm-extract/llvm-extract.cpp +++ b/llvm/tools/llvm-extract/llvm-extract.cpp @@ -381,7 +381,7 @@ int main(int argc, char **argv) { if (OutputAssembly) Passes.add( createPrintModulePass(Out.os(), "", PreserveAssemblyUseListOrder)); - else if (Force || !CheckBitcodeOutputToConsole(Out.os(), true)) + else if (Force || !CheckBitcodeOutputToConsole(Out.os())) Passes.add(createBitcodeWriterPass(Out.os(), PreserveBitcodeUseListOrder)); Passes.run(*M.get()); diff --git a/llvm/tools/llvm-link/llvm-link.cpp b/llvm/tools/llvm-link/llvm-link.cpp index d99659f3d50a..a7cda24bbe0a 100644 --- a/llvm/tools/llvm-link/llvm-link.cpp +++ b/llvm/tools/llvm-link/llvm-link.cpp @@ -399,7 +399,7 @@ int main(int argc, char **argv) { errs() << "Writing bitcode...\n"; if (OutputAssembly) { Composite->print(Out.os(), nullptr, PreserveAssemblyUseListOrder); - } else if (Force || !CheckBitcodeOutputToConsole(Out.os(), true)) + } else if (Force || !CheckBitcodeOutputToConsole(Out.os())) WriteBitcodeToFile(*Composite, Out.os(), PreserveBitcodeUseListOrder); // Declare success. 
diff --git a/llvm/tools/opt/PassPrinters.cpp b/llvm/tools/opt/PassPrinters.cpp index ed4fc1a8174b..4e81b5d29c4d 100644 --- a/llvm/tools/opt/PassPrinters.cpp +++ b/llvm/tools/opt/PassPrinters.cpp @@ -33,18 +33,16 @@ struct FunctionPassPrinter : public FunctionPass { raw_ostream &Out; static char ID; std::string PassName; - bool QuietPass; - FunctionPassPrinter(const PassInfo *PI, raw_ostream &out, bool Quiet) - : FunctionPass(ID), PassToPrint(PI), Out(out), QuietPass(Quiet) { + FunctionPassPrinter(const PassInfo *PI, raw_ostream &out) + : FunctionPass(ID), PassToPrint(PI), Out(out) { std::string PassToPrintName = std::string(PassToPrint->getPassName()); PassName = "FunctionPass Printer: " + PassToPrintName; } bool runOnFunction(Function &F) override { - if (!QuietPass) - Out << "Printing analysis '" << PassToPrint->getPassName() - << "' for function '" << F.getName() << "':\n"; + Out << "Printing analysis '" << PassToPrint->getPassName() + << "' for function '" << F.getName() << "':\n"; // Get and print pass... getAnalysisID(PassToPrint->getTypeInfo()).print(Out, F.getParent()); @@ -66,17 +64,15 @@ struct CallGraphSCCPassPrinter : public CallGraphSCCPass { const PassInfo *PassToPrint; raw_ostream &Out; std::string PassName; - bool QuietPass; - CallGraphSCCPassPrinter(const PassInfo *PI, raw_ostream &out, bool Quiet) - : CallGraphSCCPass(ID), PassToPrint(PI), Out(out), QuietPass(Quiet) { + CallGraphSCCPassPrinter(const PassInfo *PI, raw_ostream &out) + : CallGraphSCCPass(ID), PassToPrint(PI), Out(out) { std::string PassToPrintName = std::string(PassToPrint->getPassName()); PassName = "CallGraphSCCPass Printer: " + PassToPrintName; } bool runOnSCC(CallGraphSCC &SCC) override { - if (!QuietPass) - Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; + Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; // Get and print pass... 
for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I) { @@ -103,17 +99,15 @@ struct ModulePassPrinter : public ModulePass { const PassInfo *PassToPrint; raw_ostream &Out; std::string PassName; - bool QuietPass; - ModulePassPrinter(const PassInfo *PI, raw_ostream &out, bool Quiet) - : ModulePass(ID), PassToPrint(PI), Out(out), QuietPass(Quiet) { + ModulePassPrinter(const PassInfo *PI, raw_ostream &out) + : ModulePass(ID), PassToPrint(PI), Out(out) { std::string PassToPrintName = std::string(PassToPrint->getPassName()); PassName = "ModulePass Printer: " + PassToPrintName; } bool runOnModule(Module &M) override { - if (!QuietPass) - Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; + Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; // Get and print pass... getAnalysisID(PassToPrint->getTypeInfo()).print(Out, &M); @@ -135,17 +129,15 @@ struct LoopPassPrinter : public LoopPass { const PassInfo *PassToPrint; raw_ostream &Out; std::string PassName; - bool QuietPass; - LoopPassPrinter(const PassInfo *PI, raw_ostream &out, bool Quiet) - : LoopPass(ID), PassToPrint(PI), Out(out), QuietPass(Quiet) { + LoopPassPrinter(const PassInfo *PI, raw_ostream &out) + : LoopPass(ID), PassToPrint(PI), Out(out) { std::string PassToPrintName = std::string(PassToPrint->getPassName()); PassName = "LoopPass Printer: " + PassToPrintName; } bool runOnLoop(Loop *L, LPPassManager &LPM) override { - if (!QuietPass) - Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; + Out << "Printing analysis '" << PassToPrint->getPassName() << "':\n"; // Get and print pass... 
getAnalysisID(PassToPrint->getTypeInfo()) @@ -168,20 +160,17 @@ struct RegionPassPrinter : public RegionPass { const PassInfo *PassToPrint; raw_ostream &Out; std::string PassName; - bool QuietPass; - RegionPassPrinter(const PassInfo *PI, raw_ostream &out, bool Quiet) - : RegionPass(ID), PassToPrint(PI), Out(out), QuietPass(Quiet) { + RegionPassPrinter(const PassInfo *PI, raw_ostream &out) + : RegionPass(ID), PassToPrint(PI), Out(out) { std::string PassToPrintName = std::string(PassToPrint->getPassName()); PassName = "RegionPass Printer: " + PassToPrintName; } bool runOnRegion(Region *R, RGPassManager &RGM) override { - if (!QuietPass) { - Out << "Printing analysis '" << PassToPrint->getPassName() << "' for " - << "region: '" << R->getNameStr() << "' in function '" - << R->getEntry()->getParent()->getName() << "':\n"; - } + Out << "Printing analysis '" << PassToPrint->getPassName() << "' for " + << "region: '" << R->getNameStr() << "' in function '" + << R->getEntry()->getParent()->getName() << "':\n"; // Get and print pass... 
getAnalysisID(PassToPrint->getTypeInfo()) .print(Out, R->getEntry()->getParent()->getParent()); @@ -201,28 +190,23 @@ char RegionPassPrinter::ID = 0; } // end anonymous namespace FunctionPass *llvm::createFunctionPassPrinter(const PassInfo *PI, - raw_ostream &OS, bool Quiet) { - return new FunctionPassPrinter(PI, OS, Quiet); + raw_ostream &OS) { + return new FunctionPassPrinter(PI, OS); } CallGraphSCCPass *llvm::createCallGraphPassPrinter(const PassInfo *PI, - raw_ostream &OS, - bool Quiet) { - return new CallGraphSCCPassPrinter(PI, OS, Quiet); + raw_ostream &OS) { + return new CallGraphSCCPassPrinter(PI, OS); } -ModulePass *llvm::createModulePassPrinter(const PassInfo *PI, raw_ostream &OS, - bool Quiet) { - return new ModulePassPrinter(PI, OS, Quiet); +ModulePass *llvm::createModulePassPrinter(const PassInfo *PI, raw_ostream &OS) { + return new ModulePassPrinter(PI, OS); } -LoopPass *llvm::createLoopPassPrinter(const PassInfo *PI, raw_ostream &OS, - bool Quiet) { - return new LoopPassPrinter(PI, OS, Quiet); +LoopPass *llvm::createLoopPassPrinter(const PassInfo *PI, raw_ostream &OS) { + return new LoopPassPrinter(PI, OS); } -RegionPass *llvm::createRegionPassPrinter(const PassInfo *PI, raw_ostream &OS, - bool Quiet) { - return new RegionPassPrinter(PI, OS, Quiet); +RegionPass *llvm::createRegionPassPrinter(const PassInfo *PI, raw_ostream &OS) { + return new RegionPassPrinter(PI, OS); } - diff --git a/llvm/tools/opt/PassPrinters.h b/llvm/tools/opt/PassPrinters.h index 9342c46f2ff6..a4e1921399fc 100644 --- a/llvm/tools/opt/PassPrinters.h +++ b/llvm/tools/opt/PassPrinters.h @@ -24,20 +24,16 @@ class PassInfo; class raw_ostream; class RegionPass; -FunctionPass *createFunctionPassPrinter(const PassInfo *PI, raw_ostream &out, - bool Quiet); +FunctionPass *createFunctionPassPrinter(const PassInfo *PI, raw_ostream &out); CallGraphSCCPass *createCallGraphPassPrinter(const PassInfo *PI, - raw_ostream &out, bool Quiet); + raw_ostream &out); -ModulePass 
*createModulePassPrinter(const PassInfo *PI, raw_ostream &out, - bool Quiet); +ModulePass *createModulePassPrinter(const PassInfo *PI, raw_ostream &out); -LoopPass *createLoopPassPrinter(const PassInfo *PI, raw_ostream &out, - bool Quiet); +LoopPass *createLoopPassPrinter(const PassInfo *PI, raw_ostream &out); -RegionPass *createRegionPassPrinter(const PassInfo *PI, raw_ostream &out, - bool Quiet); +RegionPass *createRegionPassPrinter(const PassInfo *PI, raw_ostream &out); } // end namespace llvm diff --git a/llvm/tools/opt/opt.cpp b/llvm/tools/opt/opt.cpp index 0e52134f0100..c250eefb8c43 100644 --- a/llvm/tools/opt/opt.cpp +++ b/llvm/tools/opt/opt.cpp @@ -203,13 +203,6 @@ DisableBuiltins("disable-builtin", cl::desc("Disable specific target library builtin function"), cl::ZeroOrMore); - -static cl::opt -Quiet("q", cl::desc("Obsolete option"), cl::Hidden); - -static cl::alias -QuietA("quiet", cl::desc("Alias for -q"), cl::aliasopt(Quiet)); - static cl::opt AnalyzeOnly("analyze", cl::desc("Only perform analysis, no optimization")); @@ -730,7 +723,7 @@ int main(int argc, char **argv) { // console, print out a warning message and refuse to do it. We don't // impress anyone by spewing tons of binary goo to a terminal. 
if (!Force && !NoOutput && !AnalyzeOnly && !OutputAssembly) - if (CheckBitcodeOutputToConsole(Out->os(), !Quiet)) + if (CheckBitcodeOutputToConsole(Out->os())) NoOutput = true; if (OutputThinLTOBC) @@ -900,19 +893,19 @@ int main(int argc, char **argv) { if (AnalyzeOnly) { switch (Kind) { case PT_Region: - Passes.add(createRegionPassPrinter(PassInf, Out->os(), Quiet)); + Passes.add(createRegionPassPrinter(PassInf, Out->os())); break; case PT_Loop: - Passes.add(createLoopPassPrinter(PassInf, Out->os(), Quiet)); + Passes.add(createLoopPassPrinter(PassInf, Out->os())); break; case PT_Function: - Passes.add(createFunctionPassPrinter(PassInf, Out->os(), Quiet)); + Passes.add(createFunctionPassPrinter(PassInf, Out->os())); break; case PT_CallGraphSCC: - Passes.add(createCallGraphPassPrinter(PassInf, Out->os(), Quiet)); + Passes.add(createCallGraphPassPrinter(PassInf, Out->os())); break; default: - Passes.add(createModulePassPrinter(PassInf, Out->os(), Quiet)); + Passes.add(createModulePassPrinter(PassInf, Out->os())); break; } } From llvm-commits at lists.llvm.org Wed Jul 8 13:25:33 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:25:33 +0000 (UTC) Subject: [PATCH] D83409: [opt] Remove obsolete --quiet option In-Reply-To: References: Message-ID: <6e96011aa58f71aff306c68bc5e24636@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG930eaadacfd1: [opt] Remove obsolete --quiet option (authored by aeubanks). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83409/new/ https://reviews.llvm.org/D83409 Files: llvm/include/llvm/Support/SystemUtils.h llvm/lib/Support/SystemUtils.cpp llvm/tools/llvm-as/llvm-as.cpp llvm/tools/llvm-extract/llvm-extract.cpp llvm/tools/llvm-link/llvm-link.cpp llvm/tools/opt/PassPrinters.cpp llvm/tools/opt/PassPrinters.h llvm/tools/opt/opt.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83409.276540.patch Type: text/x-patch Size: 13466 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:29:09 2020 From: llvm-commits at lists.llvm.org (Chris Lattner via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:29:09 +0000 (UTC) Subject: [PATCH] D83416: [NFC] Fix some docs warnings In-Reply-To: References: Message-ID: <733aa80f8f1f6bbf981bcdfd1e6a9409@localhost.localdomain> lattner accepted this revision. lattner added a comment. This revision is now accepted and ready to land. Thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83416/new/ https://reviews.llvm.org/D83416 From llvm-commits at lists.llvm.org Wed Jul 8 13:30:50 2020 From: llvm-commits at lists.llvm.org (Cameron McInally via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:30:50 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: <90f8b4443597623c6d98886668e714c5@localhost.localdomain> cameron.mcinally marked an inline comment as done. cameron.mcinally added inline comments. ================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll:213 +; VBITS_GE_512: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_512: uzp1 z0.b, [[A_HALFS]].b, [[A_HALFS]].b +; CHECK: ret ---------------- Just passing by and had to comment that this is an expensive truncate. 
It's around 5x slower than the equivalent on x86, with 1/2 the throughput. Might be a good candidate for a dedicated hardware instruction on future SVE revisions... Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 From llvm-commits at lists.llvm.org Wed Jul 8 13:36:07 2020 From: llvm-commits at lists.llvm.org (Artem Belevich via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:36:07 +0000 (UTC) Subject: [PATCH] D83423: [MC, NVPTX] Add MCAsmPrinter support for unsigned-only data directives. Message-ID: tra created this revision. tra added reviewers: ABataev, grosbach. Herald added subscribers: sanjoy.google, bixia, hiraditya, jholewinski. Herald added a project: LLVM. PTX does not support negative values in .bNN data directives and we must typecast such values to unsigned before printing them. MCAsmInfo can now specify whether such casting is necessary for particular target. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83423 Files: llvm/include/llvm/MC/MCAsmInfo.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp llvm/test/CodeGen/NVPTX/data-direcitve-negative-values.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83423.276541.patch Type: text/x-patch Size: 6201 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:36:10 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:36:10 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <03af9140a3d57d007e4ab0b1a1f6270a@localhost.localdomain> jdoerfert accepted this revision. jdoerfert added a comment. This revision is now accepted and ready to land. LGTM (forgot to accept earlier, sorry). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Wed Jul 8 13:40:21 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:40:21 +0000 (UTC) Subject: [PATCH] D83424: [PGO][PGSO] Add profile guided size optimization tests to X86 ISel Lowering. Message-ID: yamauchi created this revision. yamauchi added reviewers: davidxl, RKSimon. Herald added a project: LLVM. These tests are split from D83332 and in the state where the profile guided size optimization is not yet enabled. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83424 Files: llvm/test/CodeGen/X86/avx-vperm2x128.ll llvm/test/CodeGen/X86/phaddsub-extract.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83424.276544.patch Type: text/x-patch Size: 4787 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 13:40:36 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:40:36 +0000 (UTC) Subject: [PATCH] D83316: [OpenMPOpt][WIP] Structure for unittests In-Reply-To: References: Message-ID: <3546526ec292900d46c3c965e4b49cfa@localhost.localdomain> jdoerfert added a comment. Should we merge the test header into the test cpp? We need an actual unit test that runs :) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83316/new/ https://reviews.llvm.org/D83316 From llvm-commits at lists.llvm.org Wed Jul 8 13:43:20 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:43:20 +0000 (UTC) Subject: [PATCH] D81378: GlobalISel: Handle more cases in getGCDType In-Reply-To: References: Message-ID: arsenm marked 2 inline comments as done. arsenm added inline comments. 
================ Comment at: llvm/lib/CodeGen/GlobalISel/Utils.cpp:551 + + if (OrigTy.isVector()) { + LLT OrigElt = OrigTy.getElementType(); ---------------- aemerson wrote: > Can we reorganize this to not have so much nesting? > > Maybe duplicate `greatestCommonDivisor(OrigSize, TargetSize);` above this for the scalar case and early exit. I'm not sure what you mean; the fallthroughs here are significant. I don't see where duplicating greatestCommonDivisor(OrigSize, TargetSize) would help. I also wouldn't call anything here the scalar case; care is taken to manage the behavior of vector and "not vector" to preserve pointers where appropriate. I think isScalar not being the same as !isVector is a constant source of confusion. ================ Comment at: llvm/lib/CodeGen/GlobalISel/Utils.cpp:560 + } + } else { + // If the source is a vector of pointers, return a pointer element. ---------------- aemerson wrote: > Invert this so the else early exits? These don't return though? It falls through. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81378/new/ https://reviews.llvm.org/D81378 From llvm-commits at lists.llvm.org Wed Jul 8 13:47:53 2020 From: llvm-commits at lists.llvm.org (Artem Belevich via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:47:53 +0000 (UTC) Subject: [PATCH] D82881: [DEBUGINFO]Fix debug info for packed bitfields. In-Reply-To: References: Message-ID: <9bcd548977659c0b4dcc8538fc78cd92@localhost.localdomain> tra added a comment. I've sent D83423 to make sure NVPTX can handle negative values. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82881/new/ https://reviews.llvm.org/D82881 From llvm-commits at lists.llvm.org Wed Jul 8 13:56:08 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 20:56:08 +0000 (UTC) Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap.
In-Reply-To: References: Message-ID: fhahn added a comment. In D83335#2139796, @efriedma wrote: > > After looking at the code for the source order comparator, it looks like the score could change after units are scheduled as well in some edge cases. > > So AssertisHeap might fail? I'm not really comfortable with that... I am not sure if we want to leave them in either. The main reason to include them in the patch was to show how I tried to verify things behave sanely for a wide range of inputs. As mentioned earlier, not picking the best candidate here should not be a big deal and it should happen very rarely (did not happen during bootstrap on X86 and various SPEC & MultiSource benchmarks). The selection should be deterministic across different compilers/C++ STLs because the comparator enforces a total order. Does that make sense? > > >> We might even go further and limit the source order comparator to just the IR ordering and the queue IDs, because the real scheduling should happen in the machine scheduler. > > Make this a separate patch, in case it has some unexpected side-effect, but sure, that makes sense. Yes, that definitely needs to be separate. I'll need to do a more careful evaluation there, as changing the heuristic unfortunately impacts a bunch of test cases in small ways. In D83335#2139810, @efriedma wrote: > Also, maybe we could change the way we compute scheduling priority based on the size of the queue. So keep the current scheduling for common cases, but switch to a simpler heuristic if the queue gets too large. Are you referring to using the heap only once the queue grows larger than a threshold or deciding what scheduling heuristics to enable based on the size? I'll add the original threshold back to the patch. I removed it to ensure the heap & assertions are applied as broadly as possible for verification.
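For readers following the heap discussion above, here is a minimal sketch of keeping a candidate queue as a heap with the standard `std::push_heap`/`std::pop_heap` algorithms and checking the invariant with `std::is_heap`. It is not the code from D83335: `Candidate`, `CandidateLess` and `CandidateQueue` are hypothetical names, and the priority rule (higher `Priority` first, lower `QueueID` on ties) is only an assumption chosen so that the comparator is a total order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scheduling unit; not the real SUnit.
struct Candidate {
  unsigned Priority; // assumption: larger values are scheduled first
  unsigned QueueID;  // assumption: unique per candidate, breaks ties
};

// Returns true when A should be popped after B. Because QueueID is unique,
// no two distinct candidates compare as equivalent, so the order is total
// and the pop sequence does not depend on the STL implementation.
struct CandidateLess {
  bool operator()(const Candidate &A, const Candidate &B) const {
    if (A.Priority != B.Priority)
      return A.Priority < B.Priority;
    return A.QueueID > B.QueueID; // the lower ID wins a tie
  }
};

class CandidateQueue {
  std::vector<Candidate> Queue;
  CandidateLess Less;

public:
  void push(Candidate C) {
    Queue.push_back(C);
    // Sift the new element up so the vector stays a heap.
    std::push_heap(Queue.begin(), Queue.end(), Less);
  }

  Candidate pop() {
    assert(!Queue.empty() && "pop from empty queue");
    // Verification only: fails if priorities changed behind our back.
    assert(std::is_heap(Queue.begin(), Queue.end(), Less));
    // Moves the best candidate to the back and re-heapifies the rest.
    std::pop_heap(Queue.begin(), Queue.end(), Less);
    Candidate Best = Queue.back();
    Queue.pop_back();
    return Best;
  }

  bool empty() const { return Queue.empty(); }
};
```

If a candidate's priority can change while it sits in the queue, as the review notes for some edge cases, the `std::is_heap` assertion can fire; a real implementation would have to either tolerate a slightly suboptimal pick or re-heapify.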
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83335/new/ https://reviews.llvm.org/D83335 From llvm-commits at lists.llvm.org Wed Jul 8 14:01:27 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:01:27 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <2cd7906111de0717a09a16ed7ec47c1b@localhost.localdomain> MaskRay added a comment. In D83013#2139882 , @zequanwu wrote: > > The alternative of using LazyBlockFrequencyInfoPass and checking PSI->hasProfileSummary() first would also work I guess. If you think that's cleaner, maybe that's the better way to go. > > Since `PSI->hasProfileSummary()` is not necessary for this pass, it relies on function entry count. So, I check for `F.getEntryCount()` before getting BFI. Thanks. The last update looks good to me. I'll defer the approval to @nikic and folks who have expressed concerns about deleting legacy PM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Wed Jul 8 14:01:39 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:01:39 +0000 (UTC) Subject: [PATCH] D83424: [PGO][PGSO] Add profile guided size optimization tests to X86 ISel Lowering. In-Reply-To: References: Message-ID: yamauchi updated this revision to Diff 276547. yamauchi added a comment. Rebase. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83424/new/ https://reviews.llvm.org/D83424 Files: llvm/test/CodeGen/X86/avx-vperm2x128.ll llvm/test/CodeGen/X86/phaddsub-extract.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83424.276547.patch Type: text/x-patch Size: 4787 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:06:58 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:06:58 +0000 (UTC) Subject: [PATCH] D81901: GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT In-Reply-To: References: Message-ID: arsenm updated this revision to Diff 276549. arsenm added a comment. Remove unnecessary check CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81901/new/ https://reviews.llvm.org/D81901 Files: llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i8.ll llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-extract-vector-elt.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.dim.a16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-shuffle-vector.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-shuffle-vector.s16.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D81901.276549.patch Type: text/x-patch Size: 486161 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:09:04 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:09:04 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: <94b26253ee48dfc38f96ebf1a46164f5@localhost.localdomain> efriedma added inline comments. 
================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll:213 +; VBITS_GE_512: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_512: uzp1 z0.b, [[A_HALFS]].b, [[A_HALFS]].b +; CHECK: ret ---------------- cameron.mcinally wrote: > Just passing by and had to comment that this is an expensive truncate. It's around 5x slower than the equivalent on x86, with 1/2 the throughput. > > Might be a good candidate for a dedicated hardware instruction on future SVE revisions... We could experiment with using `tbl` or `compact`, if this comes up in practice. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 From llvm-commits at lists.llvm.org Wed Jul 8 14:11:11 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:11:11 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <920bf8956295d963839e4d2ea3c4b65b@localhost.localdomain> DiggerLin marked 48 inline comments as done. DiggerLin added inline comments. ================ Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:310 + static constexpr uint32_t IsEprolMask = 0x0000'4000; + static constexpr uint32_t HasCodeLenMask = 0x0000'2000; + static constexpr uint32_t IntProcMask = 0x0000'1000; ---------------- jasonliu wrote: > In AIX OS header "/usr/include/sys/debug.h", we have this field as > ``` > unsigned has_tboff:1; /* Set if offset from start of proc stored */ > ``` > It might be better to rename HasCodeLenMask so that it has a bit association with the original name? So that people do not need to reason about if this is the correct field or not. I think I would like to keep HasFunctionCodeLenMask, because the function code length is read later depending on this bit.
Changing the code to `if (!Err && HasTraceBackTableOffset()) CodeLen = DE.getU32(&OffsetPtr, &Err);` is not better than `if (!Err && hasFunctionCodeLen()) CodeLen = DE.getU32(&OffsetPtr, &Err);` ================ Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:334 + // Byte 6 + static constexpr uint32_t HasVecInfoMask = 0x0080'0000; + static constexpr uint32_t Spare4Mask = 0x0040'0000; ---------------- jasonliu wrote: > I find the 6th byte here is a bit different than what we have in the OS headers: > ``` > /* Byte 6 */ > unsigned longtbtable:1; /* Set if xtbtable extension exists. */ > unsigned has_vec:1; /* Set if optional vector info is present */ > unsigned gpr_saved:6; /* Number of GPRs saved, max of 32 */ > ``` Yes, there are two other documents, and both differ from /usr/include/sys/debug.h (AIX OS). I keep the patch the same as /usr/include/sys/debug.h for now. ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:396 + const uint8_t *TBPtr; + const uint64_t Size; + Optional ParaType; ---------------- jasonliu wrote: > Do you actually need the Size as data member? We need Size to know whether the traceback table is long enough for all the fields. ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:400 + Optional HandMask; + Optional CtlInfo; + Optional FunctionNameLen; ---------------- jasonliu wrote: > TODO: no number of CTL anchors? or Displacement into stack of each anchor? Since most functions do not have controlled storage, decoding of controlled storage will be put in another patch. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:846 + +static std::string parseParaType(uint32_t Value, unsigned int ParaNum) { + std::string ParaType; ---------------- jhenderson wrote: > Any particular reason you're using `unsigned int` here instead of just `unsigned` like you do below? Should it actually be a `size_t` in both cases? We know ParaNum is less than 512, so I do not think we need size_t for it.
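The pattern discussed in the comments above (read an optional field only when its flag bit is set, and use the table size to reject short input) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `BEReader`, `Decoded`, and `decode` are hypothetical names, the reader is a stand-in and not `llvm::DataExtractor`, and the layout (one 32-bit flags word followed by an optional 32-bit code length) is a simplification rather than the real traceback table layout. Only the `HasCodeLenMask` value is taken from the quoted patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical cursor-style reader; NOT llvm::DataExtractor.
class BEReader {
public:
  BEReader(const uint8_t *Data, size_t Size) : Data(Data), Size(Size) {}

  // Reads a big-endian 32-bit value at Offset and advances Offset by 4.
  // If an earlier read failed or fewer than 4 bytes remain, sets Err and
  // returns 0 without moving Offset.
  uint32_t getU32(size_t &Offset, bool &Err) const {
    if (Err || Offset > Size || Size - Offset < 4) {
      Err = true;
      return 0;
    }
    uint32_t V = (uint32_t(Data[Offset]) << 24) |
                 (uint32_t(Data[Offset + 1]) << 16) |
                 (uint32_t(Data[Offset + 2]) << 8) |
                 uint32_t(Data[Offset + 3]);
    Offset += 4;
    return V;
  }

private:
  const uint8_t *Data;
  size_t Size; // the bound that tells us whether a field can still be read
};

// Mask value as quoted from the patch under review.
constexpr uint32_t HasCodeLenMask = 0x00002000;

// Hypothetical result type for this simplified layout.
struct Decoded {
  uint32_t Flags = 0;
  std::optional<uint32_t> CodeLen; // present only if the flag bit is set
  bool Err = false;
};

// Decodes the simplified layout: flags word, then an optional code length.
Decoded decode(const std::vector<uint8_t> &Buf) {
  Decoded D;
  BEReader R(Buf.data(), Buf.size());
  size_t Offset = 0;
  D.Flags = R.getU32(Offset, D.Err);
  if (!D.Err && (D.Flags & HasCodeLenMask)) {
    uint32_t V = R.getU32(Offset, D.Err);
    if (!D.Err)
      D.CodeLen = V;
  }
  return D;
}
```

With this shape, a truncated buffer leaves `CodeLen` empty and sets `Err`, instead of reading past the end of the table.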
================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:850 + if (I != 0) + ParaType += ", "; + if ((Value & TracebackTable::FixedParaTypeBit) == 0) { ---------------- jasonliu wrote: > Consider doing ParaType += "i, " and ParaType += "f, " ... > and do a removal of ", " after parsing all parameters. Since we will use SmallString, deleting the last ", " costs more than the current implementation. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:885 + DataExtractor DE(ArrayRef(Ptr, S), false, 4); + uint64_t offset_ptr = 0; + ---------------- jhenderson wrote: > Please use LLVM style for variable names. thanks ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:902 + CtlInfo = DE.getU32(&offset_ptr, &Err); + + if (!Err && isFuncNamePresent()) { ---------------- jasonliu wrote: > Are we missing something between CtlInfo and FunctionNameLen? > > ``` > * ctl_info exists if has_ctl bit is set. > * ctl_info_disp exists if ctl_info exists. > * name_len exists if name_present bit is set. > ``` > i.e. the ctl_info_disp? Controlled storage info will be handled in another patch. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:904 + if (!Err && isFuncNamePresent()) { + uint16_t Len = DE.getU16(&offset_ptr, &Err); + if (!Err) ---------------- jasonliu wrote: > Why do we need to declare a new variable? Yes, we need it. It is used here: `FunctionName = DE.getBytes(&offset_ptr, Len, &Err);`. After we read a value, offset_ptr has moved, so we cannot read the value a second time. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:916 + (support::endian::read32be(TBPtr + P) & TracebackTable::X) +#define GETBITWITHMASKSHIFT(P, X, S) \ + (support::endian::read32be(TBPtr + P) & TracebackTable::X) >> \ ---------------- jasonliu wrote: > Macros are missing undefs.
> thanks ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:85 + XCOFFTracebackTable::create(V1, sizeof(V1)); + ASSERT_TRUE(!!TTOrErr1) << "Parse error"; + XCOFFTracebackTable TT1 = TTOrErr1.get(); ---------------- jhenderson wrote: > Here and in the equivalent cases elsewhere, use `ASSERT_THAT_EXPECTED(TTOrErr1, Succeeded());` thanks ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:86 + ASSERT_TRUE(!!TTOrErr1) << "Parse error"; + XCOFFTracebackTable TT1 = TTOrErr1.get(); + EXPECT_EQ(TT1.getVersion(), 1); ---------------- jhenderson wrote: > `XCOFFTracebackTable TT1 = *TTOrErr1;` is more traditional usage. thanks ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:95 + + EXPECT_TRUE(TT1.getParaType()); + EXPECT_STREQ(TT1.getParaType().getValue().data(), "i, i, f, f, d"); ---------------- jhenderson wrote: > `ASSERT_TRUE` or the next check will crash if it ever fails. thanks ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:96 + EXPECT_TRUE(TT1.getParaType()); + EXPECT_STREQ(TT1.getParaType().getValue().data(), "i, i, f, f, d"); + ---------------- jhenderson wrote: > Does `EXPECT_EQ(TT1.getParaType().getValue(), "i, i, f, f, d");` not work? according to https://github.com/google/googletest/blob/master/googletest/docs/primer.md EXPECT_EQ(val1, val2); compare two values. val1 == val2 EXPECT_STREQ(str1,str2); compare two C strings the two C strings have the same content ================ Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:101-103 + ASSERT_TRUE(!!TTOrErr2) << "Parse error"; + XCOFFTracebackTable TT2 = TTOrErr2.get(); + EXPECT_STREQ(TT2.getParaType().getValue().data(), "f, f, d, i, i"); ---------------- jhenderson wrote: > For this and the ones below, same comments as above, but you also need an `ASSERT_TRUE(TT2.getParaType())` to avoid a crash in case the `Optional` is empty. 
thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Wed Jul 8 14:11:15 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:11:15 +0000 (UTC) Subject: [PATCH] D81699: MemorySanitizer: Add option to insert init checks at call site In-Reply-To: References: Message-ID: <212c113fab5a9c8d0d46430e3d016542@localhost.localdomain> guiand updated this revision to Diff 276550. guiand marked 2 inline comments as done. guiand added a comment. I'm splitting this patch into this new one, which depends on nothing but `noundef` correctly parsed by LLVM, and the rest which relies on clang emitting `noundef` everywhere. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81699/new/ https://reviews.llvm.org/D81699 Files: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81699.276550.patch Type: text/x-patch Size: 8235 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:12:00 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:12:00 +0000 (UTC) Subject: [PATCH] D81699: MemorySanitizer: Add option to insert init checks at call site In-Reply-To: References: Message-ID: <15a8657349089a302d2cd977b3c1a0cb@localhost.localdomain> guiand marked an inline comment as done. guiand added inline comments. 
================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3497 + + if (ClEagerChecks && !CB.hasRetAttr(Attribute::PartialInit)) { + setShadow(&CB, getCleanShadow(&CB)); ---------------- vitalybuka wrote: > ClEagerChecks && !PartialInit; This one is for the return attribute, the other was for the arguments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81699/new/ https://reviews.llvm.org/D81699 From llvm-commits at lists.llvm.org Wed Jul 8 14:12:36 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:12:36 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types In-Reply-To: References: Message-ID: stefanp updated this revision to Diff 276553. stefanp marked an inline comment as done. stefanp added a comment. Fixed the alignment that was auto-fixed. Removed the switch that was not needed. Fixed the types in the TD file to be s34imm. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83255/new/ https://reviews.llvm.org/D83255 Files: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83255.276553.patch Type: text/x-patch Size: 8319 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:13:42 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:13:42 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. 
In-Reply-To: References: Message-ID: <7baa28a5b0ba0a78612c8bb9f41c53ca@localhost.localdomain> paulwalker-arm marked 2 inline comments as done. paulwalker-arm added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3569 +// them to expand BUILD_VECTOR. +bool AArch64TargetLowering::shouldExpandBuildVectorWithShuffles( + EVT VT, unsigned DefinedValues) const { ---------------- efriedma wrote: > Would it be enough to just fix isShuffleMaskLegal, instead of overriding this? We expand VECTOR_SHUFFLE (using BUILD_VECTOR) so at this stage I think it's better to just prevent extra VECTOR_SHUFFLE instances/code paths as early as possible. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.h:745 + /// illegal as the original, thus leading to an infinite legalisation loop. + bool mergeStoresAfterLegalization(EVT VT) const override { + return !useSVEForFixedLengthVectors(); ---------------- efriedma wrote: > This affects code that isn't using wide vectors, right? > > If this is supposed to be a temporary hack, I guess it's fine, but please explicitly state that in the comment. Sadly it affects all vectors but the interface doesn't have the necessary information. I can see other targets have hit the same issue but I've restricted our version as best I can (only taking affect when wide vectors are enabled). I've added a comment saying we can revert the change once we fully support BUILD_VECTOR when using the wide vectors. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 From llvm-commits at lists.llvm.org Wed Jul 8 14:14:09 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:14:09 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <9a267a60519255df4c0d059d66fd6727@localhost.localdomain> DiggerLin updated this revision to Diff 276551. DiggerLin marked 15 inline comments as done. DiggerLin added a comment. address comment. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 Files: llvm/include/llvm/BinaryFormat/XCOFF.h llvm/include/llvm/Object/XCOFFObjectFile.h llvm/lib/Object/XCOFFObjectFile.cpp llvm/unittests/Object/CMakeLists.txt llvm/unittests/Object/XCOFFObjectFileTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D81585.276551.patch Type: text/x-patch Size: 16242 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:14:34 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:14:34 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. In-Reply-To: References: Message-ID: paulwalker-arm updated this revision to Diff 276554. paulwalker-arm added a comment. Added comment saying when we can revert the mergeStoresAfterLegalization change. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll +++ llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -3,6 +3,18 @@ target triple = "aarch64-unknown-linux-gnu" +; Currently there is no custom lowering for vector shuffles operating on types +; bigger than NEON. However, having no support opens us up to a code generator +; hang when expanding BUILD_VECTOR. Here we just validate the promblematic case +; successfully exits code generation. +define void @hang_when_merging_stores_after_legalisation(<8 x i32>* %a, <2 x i32> %b) #0 { +; CHECK-LABEL: hang_when_merging_stores_after_legalisation: + %splat = shufflevector <2 x i32> %b, <2 x i32> undef, <8 x i32> zeroinitializer + %interleaved.vec = shufflevector <8 x i32> %splat, <8 x i32> undef, <8 x i32> + store <8 x i32> %interleaved.vec, <8 x i32>* %a, align 4 + ret void +} + ; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in ; validating all combinations of vector type. Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -734,6 +734,20 @@ bool fallBackToDAGISel(const Instruction &Inst) const override; + bool + shouldExpandBuildVectorWithShuffles(EVT VT, + unsigned DefinedValues) const override; + + /// SVE code generation for fixed length vectors does not custom lower + /// BUILD_VECTOR. This makes BUILD_VECTOR legalisation a source of stores to + /// merge. 
However, merging them creates a BUILD_VECTOR that is just as + /// illegal as the original, thus leading to an infinite legalisation loop. + /// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal + /// vector types this override can be removed. + bool mergeStoresAfterLegalization(EVT VT) const override { + return !useSVEForFixedLengthVectors(); + } + private: /// Keep a pointer to the AArch64Subtarget around so that we can /// make the right decision when generating code for different targets. Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -3564,6 +3564,16 @@ } } +// VECTOR_SHUFFLE is not legal for vectors bigger than NEON, so we cannot use +// them to expand BUILD_VECTOR. +bool AArch64TargetLowering::shouldExpandBuildVectorWithShuffles( + EVT VT, unsigned DefinedValues) const { + if (useSVEForFixedLengthVectorVT(VT)) + return false; + + return TargetLowering::shouldExpandBuildVectorWithShuffles(VT, DefinedValues); +} + bool AArch64TargetLowering::useSVEForFixedLengthVectors() const { // Prefer NEON unless larger SVE registers are available. return Subtarget->hasSVE() && Subtarget->getMinSVEVectorSizeInBits() >= 256; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83408.276554.patch Type: text/x-patch Size: 3164 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:14:41 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:14:41 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <9748eaa25044d5f860525fdfc4dce979@localhost.localdomain> efriedma added a comment. 
> Should we remove the handling from llvm::ConstantFoldSelectInstruction It seems silly to remove the handling from InstSimplify, but not constant folding. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 14:14:48 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:14:48 +0000 (UTC) Subject: [PATCH] D83297: [Attributor][WIP] Attribute scheduling visualization. In-Reply-To: References: Message-ID: kuter added a comment. In D83297#2138744, @bbn wrote: > Can we just merge the dependency graph and the "SchedulingGraph" because I think they are quite similar. Thank you for the idea. There are some problems with how the end result looks, and to fix that I will change this significantly. My answer is late because it took me some time to decide what to do with this. This is how it looks right now: https://streamable.com/qguha8 The layout is not consistent across the frames. To fix this I will record the state of scheduling (or the changes that are happening to the state) at each step, and output the scheduling graphs after the fixpoint iteration is complete, hiding the nodes and edges that are not visible at each step.
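The record-then-replay idea described above could look roughly like the following. This is only a sketch under stated assumptions, not the Attributor implementation: `FrameRecorder` and its methods are hypothetical names, edges are left out for brevity, and it assumes that Graphviz still reserves layout space for elements marked `style=invis`, which is what would keep node positions stable from frame to frame.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical recorder: remembers the first step at which each node was seen.
class FrameRecorder {
public:
  // Advances to the next scheduling step.
  void nextStep() { ++Step; }

  // Records a node; emplace keeps the earliest step if it was seen before.
  void sawNode(const std::string &Name) { FirstSeen.emplace(Name, Step); }

  unsigned numSteps() const { return Step + 1; }

  // Emits one DOT graph for the given step. Every recorded node appears in
  // every frame; nodes first seen after AtStep are marked invisible.
  std::string frame(unsigned AtStep) const {
    std::string Out = "digraph frame" + std::to_string(AtStep) + " {\n";
    for (const auto &[Name, Seen] : FirstSeen) {
      Out += "  \"" + Name + "\"";
      if (Seen > AtStep)
        Out += " [style=invis]";
      Out += ";\n";
    }
    Out += "}\n";
    return Out;
  }

private:
  unsigned Step = 0;
  std::map<std::string, unsigned> FirstSeen; // ordered, so output is stable
};
```

Because all frames are emitted only after recording has finished, each frame lists the same node set, and only the visibility attribute changes between frames.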
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83297/new/ https://reviews.llvm.org/D83297 From llvm-commits at lists.llvm.org Wed Jul 8 14:16:35 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Wed, 08 Jul 2020 14:16:35 -0700 (PDT) Subject: [llvm] 74a148a - GlobalISel: Verify G_BITCAST changes the type Message-ID: <5f0637b3.1c69fb81.ca6ef.1e3a@mx.google.com> Author: Matt Arsenault Date: 2020-07-08T17:16:27-04:00 New Revision: 74a148ad39ab32317948a2d6291264acd84bfa00 URL: https://github.com/llvm/llvm-project/commit/74a148ad39ab32317948a2d6291264acd84bfa00 DIFF: https://github.com/llvm/llvm-project/commit/74a148ad39ab32317948a2d6291264acd84bfa00.diff LOG: GlobalISel: Verify G_BITCAST changes the type Updated the AArch64 tests the best I could with my vague, inferred understanding of AArch64 register banks. As far as I can tell, there is only one 32-bit/64-bit type which will use the gpr register bank, so we have to use the fpr bank for the other operand. 
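The rule this commit adds to the verifier can be illustrated with a small standalone sketch. It is a simplification under stated assumptions: `ToyType` and `checkBitcast` are hypothetical names, not LLVM APIs, pointer types are ignored, and a scalar is modelled as a one-element type, which the real low-level types do not do. The two diagnostic strings are the ones used in the MachineVerifier change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified type: NumElts == 1 stands for a scalar,
// anything larger for a vector of ScalarBits-wide elements.
struct ToyType {
  unsigned NumElts;
  unsigned ScalarBits;

  unsigned sizeInBits() const { return NumElts * ScalarBits; }

  bool operator==(const ToyType &O) const {
    return NumElts == O.NumElts && ScalarBits == O.ScalarBits;
  }
};

// Returns the diagnostics a bitcast from Src to Dst would trigger:
// the total size must be preserved, and the type must actually change.
std::vector<std::string> checkBitcast(const ToyType &Src, const ToyType &Dst) {
  std::vector<std::string> Errors;
  if (Src.sizeInBits() != Dst.sizeInBits())
    Errors.push_back("bitcast sizes must match");
  if (Src == Dst)
    Errors.push_back("bitcast must change the type");
  return Errors;
}
```

Under this model, s32 to <2 x s16> passes, while s32 to s32 is rejected, which is why the tests in this commit replace same-type bitcasts with casts to a vector type of equal size.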
Added: Modified: llvm/lib/CodeGen/MachineVerifier.cpp llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-simple.mir llvm/test/CodeGen/AArch64/GlobalISel/select-bitcast.mir llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-bitcast.mir llvm/test/MachineVerifier/test_g_bitcast.mir Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp index cfc6e38d7cde..c1a2c4e0bc6e 100644 --- a/llvm/lib/CodeGen/MachineVerifier.cpp +++ b/llvm/lib/CodeGen/MachineVerifier.cpp @@ -1018,6 +1018,10 @@ void MachineVerifier::verifyPreISelGenericInstruction(const MachineInstr *MI) { if (SrcTy.getSizeInBits() != DstTy.getSizeInBits()) report("bitcast sizes must match", MI); + + if (SrcTy == DstTy) + report("bitcast must change the type", MI); + break; } case TargetOpcode::G_INTTOPTR: diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir index 463d6c1ae76f..37d00dfb3174 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir @@ -398,20 +398,23 @@ legalized: true # CHECK: registers: # CHECK-NEXT: - { id: 0, class: gpr, preferred-register: '' } -# CHECK-NEXT: - { id: 1, class: gpr, preferred-register: '' } +# FAST-NEXT: - { id: 1, class: fpr, preferred-register: '' } +# GREEDY-NEXT: - { id: 1, class: gpr, preferred-register: '' } registers: - { id: 0, class: _ } - { id: 1, class: _ } # CHECK: body: # CHECK: %0:gpr(s32) = COPY $w0 -# CHECK: %1:gpr(s32) = G_BITCAST %0 +# FAST-NEXT: %1:fpr(<4 x s8>) = G_BITCAST %0 +# GREEDY-NEXT: %1:gpr(<4 x s8>) = G_BITCAST %0 +# The greedy check is incorrect and should produce fpr. 
body: | bb.0: liveins: $w0 %0(s32) = COPY $w0 - %1(s32) = G_BITCAST %0 + %1(<4 x s8>) = G_BITCAST %0 ... --- @@ -421,20 +424,22 @@ legalized: true # CHECK: registers: # CHECK-NEXT: - { id: 0, class: fpr, preferred-register: '' } -# CHECK-NEXT: - { id: 1, class: fpr, preferred-register: '' } +# FAST-NEXT: - { id: 1, class: gpr, preferred-register: '' } +# GREEDY-NEXT: - { id: 1, class: fpr, preferred-register: '' } registers: - { id: 0, class: _ } - { id: 1, class: _ } # CHECK: body: # CHECK: %0:fpr(<2 x s16>) = COPY $s0 -# CHECK: %1:fpr(<2 x s16>) = G_BITCAST %0 +# FAST: %1:gpr(s32) = G_BITCAST %0 +# GREEDY: %1:fpr(s32) = G_BITCAST %0 body: | bb.0: liveins: $s0 %0(<2 x s16>) = COPY $s0 - %1(<2 x s16>) = G_BITCAST %0 + %1(s32) = G_BITCAST %0 ... --- @@ -490,13 +495,14 @@ registers: - { id: 1, class: _ } # CHECK: body: # CHECK: %0:gpr(s64) = COPY $x0 -# CHECK: %1:gpr(s64) = G_BITCAST %0 +# FAST: %1:fpr(<2 x s32>) = G_BITCAST %0 +# GREEDY: %1:gpr(<2 x s32>) = G_BITCAST %0 body: | bb.0: liveins: $x0 %0(s64) = COPY $x0 - %1(s64) = G_BITCAST %0 + %1(<2 x s32>) = G_BITCAST %0 ... --- @@ -508,13 +514,14 @@ registers: - { id: 1, class: _ } # CHECK: body: # CHECK: %0:fpr(<2 x s32>) = COPY $d0 -# CHECK: %1:fpr(<2 x s32>) = G_BITCAST %0 +# FAST: %1:gpr(s64) = G_BITCAST %0 +# GREEDY: %1:fpr(s64) = G_BITCAST %0 body: | bb.0: liveins: $d0 %0(<2 x s32>) = COPY $d0 - %1(<2 x s32>) = G_BITCAST %0 + %1(s64) = G_BITCAST %0 ... 
--- diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-simple.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-simple.mir index 5b76737edc2b..89e310cc9ecd 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-simple.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-simple.mir @@ -35,8 +35,8 @@ body: | ; CHECK: [[BITCAST:%[0-9]+]]:_(<2 x s32>) = G_BITCAST [[COPY]](s64) ; CHECK: [[BITCAST1:%[0-9]+]]:_(s64) = G_BITCAST [[BITCAST]](<2 x s32>) ; CHECK: $x0 = COPY [[BITCAST1]](s64) - ; CHECK: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[SELECT3]](s32) - ; CHECK: $w0 = COPY [[BITCAST2]](s32) + ; CHECK: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[SELECT3]](s32) + ; CHECK: $w0 = COPY [[BITCAST2]](<2 x s16>) ; CHECK: [[BITCAST3:%[0-9]+]]:_(<4 x s8>) = G_BITCAST [[TRUNC1]](s32) ; CHECK: [[BITCAST4:%[0-9]+]]:_(s32) = G_BITCAST [[BITCAST3]](<4 x s8>) ; CHECK: $w0 = COPY [[BITCAST4]](s32) @@ -72,8 +72,8 @@ body: | %12:_(<2 x s32>) = G_BITCAST %0(s64) %13:_(s64) = G_BITCAST %12(<2 x s32>) $x0 = COPY %13(s64) - %14:_(s32) = G_BITCAST %10(s32) - $w0 = COPY %14(s32) + %14:_(<2 x s16>) = G_BITCAST %10(s32) + $w0 = COPY %14 %15:_(<4 x s8>) = G_BITCAST %4(s32) %20:_(s32) = G_BITCAST %15(<4 x s8>) $w0 = COPY %20(s32) diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-bitcast.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-bitcast.mir index 0846f54289d7..d9ee37e312b9 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/select-bitcast.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-bitcast.mir @@ -22,18 +22,21 @@ legalized: true regBankSelected: true registers: - { id: 0, class: gpr } - - { id: 1, class: gpr } - + - { id: 1, class: fpr } + - { id: 2, class: gpr } body: | bb.0: liveins: $w0 ; CHECK-LABEL: name: bitcast_s32_gpr ; CHECK: [[COPY:%[0-9]+]]:gpr32all = COPY $w0 - ; CHECK: $w0 = COPY [[COPY]] + ; CHECK: [[COPY1:%[0-9]+]]:fpr32 = COPY [[COPY]] + ; CHECK: [[COPY2:%[0-9]+]]:gpr32all = COPY [[COPY1]] + ; CHECK: $w0 = COPY [[COPY2]] %0(s32) = COPY $w0 
- %1(s32) = G_BITCAST %0 - $w0 = COPY %1(s32) + %1(<2 x s16>) = G_BITCAST %0 + %2(s32) = G_BITCAST %1 + $w0 = COPY %2 ... --- @@ -43,18 +46,21 @@ regBankSelected: true registers: - { id: 0, class: fpr } - - { id: 1, class: fpr } - + - { id: 1, class: gpr } + - { id: 2, class: fpr } body: | bb.0: liveins: $s0 ; CHECK-LABEL: name: bitcast_s32_fpr ; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0 - ; CHECK: $s0 = COPY [[COPY]] + ; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[COPY]] + ; CHECK: [[COPY2:%[0-9]+]]:fpr32 = COPY [[COPY1]] + ; CHECK: $s0 = COPY [[COPY2]] %0(s32) = COPY $s0 - %1(s32) = G_BITCAST %0 - $s0 = COPY %1(s32) + %1(<2 x s16>) = G_BITCAST %0 + %2(s32) = G_BITCAST %1 + $s0 = COPY %2 ... --- @@ -75,8 +81,8 @@ body: | ; CHECK: [[COPY1:%[0-9]+]]:fpr32 = COPY [[COPY]] ; CHECK: $s0 = COPY [[COPY1]] %0(s32) = COPY $w0 - %1(s32) = G_BITCAST %0 - $s0 = COPY %1(s32) + %1(<2 x s16>) = G_BITCAST %0 + $s0 = COPY %1 ... --- @@ -94,9 +100,9 @@ body: | ; CHECK-LABEL: name: bitcast_s32_fpr_gpr ; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0 - ; CHECK: [[COPY1:%[0-9]+]]:gpr32 = COPY [[COPY]] + ; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[COPY]] ; CHECK: $w0 = COPY [[COPY1]] - %0(s32) = COPY $s0 + %0(<2 x s16>) = COPY $s0 %1(s32) = G_BITCAST %0 $w0 = COPY %1(s32) ... @@ -108,7 +114,8 @@ regBankSelected: true registers: - { id: 0, class: gpr } - - { id: 1, class: gpr } + - { id: 1, class: fpr } + - { id: 2, class: gpr } body: | bb.0: @@ -116,10 +123,13 @@ body: | ; CHECK-LABEL: name: bitcast_s64_gpr ; CHECK: [[COPY:%[0-9]+]]:gpr64all = COPY $x0 - ; CHECK: $x0 = COPY [[COPY]] + ; CHECK: [[COPY1:%[0-9]+]]:fpr64 = COPY [[COPY]] + ; CHECK: [[COPY2:%[0-9]+]]:gpr64 = COPY [[COPY1]] + ; CHECK: $x0 = COPY [[COPY2]] %0(s64) = COPY $x0 - %1(s64) = G_BITCAST %0 - $x0 = COPY %1(s64) + %1(<2 x s32>) = G_BITCAST %0 + %2(s64) = G_BITCAST %1 + $x0 = COPY %2(s64) ... 
--- @@ -139,8 +149,8 @@ body: | ; CHECK: [[COPY:%[0-9]+]]:fpr64 = COPY $d0 ; CHECK: $d0 = COPY [[COPY]] %0(s64) = COPY $d0 - %1(s64) = G_BITCAST %0 - $d0 = COPY %1(s64) + %1(<2 x s32>) = G_BITCAST %0 + $d0 = COPY %1 ... --- @@ -160,8 +170,8 @@ body: | ; CHECK: [[COPY1:%[0-9]+]]:fpr64 = COPY [[COPY]] ; CHECK: $d0 = COPY [[COPY1]] %0(s64) = COPY $x0 - %1(s64) = G_BITCAST %0 - $d0 = COPY %1(s64) + %1(<2 x s32>) = G_BITCAST %0 + $d0 = COPY %1 ... --- @@ -179,11 +189,11 @@ body: | ; CHECK-LABEL: name: bitcast_s64_fpr_gpr ; CHECK: [[COPY:%[0-9]+]]:fpr64 = COPY $d0 - ; CHECK: [[COPY1:%[0-9]+]]:gpr64 = COPY [[COPY]] + ; CHECK: [[COPY1:%[0-9]+]]:gpr64all = COPY [[COPY]] ; CHECK: $x0 = COPY [[COPY1]] %0(s64) = COPY $d0 - %1(s64) = G_BITCAST %0 - $x0 = COPY %1(s64) + %1(<2 x s32>) = G_BITCAST %0 + $x0 = COPY %1 ... --- diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir index cda64271c0af..36132e0badbc 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir @@ -93,14 +93,15 @@ body: | ; CHECK-LABEL: name: dont_fold_bitcast ; CHECK: liveins: $w0 ; CHECK: %copy:gpr32all = COPY $w0 - ; CHECK: %bitcast:gpr32 = COPY %copy - ; CHECK: [[SUBREG_TO_REG:%[0-9]+]]:gpr64 = SUBREG_TO_REG 0, %bitcast, %subreg.sub_32 + ; CHECK: %bitcast1:gpr32 = COPY %copy + ; CHECK: [[SUBREG_TO_REG:%[0-9]+]]:gpr64 = SUBREG_TO_REG 0, %bitcast1, %subreg.sub_32 ; CHECK: %zext:gpr64 = UBFMXri [[SUBREG_TO_REG]], 0, 31 ; CHECK: $x0 = COPY %zext ; CHECK: RET_ReallyLR implicit $x0 %copy:gpr(s32) = COPY $w0 - %bitcast:gpr(s32) = G_BITCAST %copy(s32) - %zext:gpr(s64) = G_ZEXT %bitcast(s32) + %bitcast0:gpr(<4 x s8>) = G_BITCAST %copy(s32) + %bitcast1:gpr(s32) = G_BITCAST %bitcast0 + %zext:gpr(s64) = G_ZEXT %bitcast1(s32) $x0 = COPY %zext(s64) RET_ReallyLR implicit $x0 diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-bitcast.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-bitcast.mir index aedcb37e758c..6dcb28b9826d 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-bitcast.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-bitcast.mir @@ -11,9 +11,9 @@ body: | liveins: $sgpr0 ; CHECK-LABEL: name: bitcast_s ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0 - ; CHECK: [[BITCAST:%[0-9]+]]:sgpr(s32) = G_BITCAST [[COPY]](s32) + ; CHECK: [[BITCAST:%[0-9]+]]:sgpr(<2 x s16>) = G_BITCAST [[COPY]](s32) %0:_(s32) = COPY $sgpr0 - %1:_(s32) = G_BITCAST %0 + %1:_(<2 x s16>) = G_BITCAST %0 ... --- @@ -25,7 +25,7 @@ body: | liveins: $vgpr0 ; CHECK-LABEL: name: bitcast_v ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0 - ; CHECK: [[BITCAST:%[0-9]+]]:vgpr(s32) = G_BITCAST [[COPY]](s32) + ; CHECK: [[BITCAST:%[0-9]+]]:vgpr(<2 x s16>) = G_BITCAST [[COPY]](s32) %0:_(s32) = COPY $vgpr0 - %1:_(s32) = G_BITCAST %0 + %1:_(<2 x s16>) = G_BITCAST %0 ... diff --git a/llvm/test/MachineVerifier/test_g_bitcast.mir b/llvm/test/MachineVerifier/test_g_bitcast.mir index a399c859404f..24ee95ba4b63 100644 --- a/llvm/test/MachineVerifier/test_g_bitcast.mir +++ b/llvm/test/MachineVerifier/test_g_bitcast.mir @@ -34,4 +34,6 @@ body: | %10:_(p1) = G_IMPLICIT_DEF %11:_(p3) = G_BITCAST %8 + ; CHECK: Bad machine code: bitcast must change the type + %12:_(s64) = G_BITCAST %0 ... From llvm-commits at lists.llvm.org Wed Jul 8 14:16:52 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:16:52 +0000 (UTC) Subject: [PATCH] D81485: GlobalISel: Verify G_BITCAST changes the type In-Reply-To: References: Message-ID: <028cce326ca600240f689a45474da138@localhost.localdomain> arsenm closed this revision. arsenm added a comment. 
74a148ad39ab32317948a2d6291264acd84bfa00 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81485/new/ https://reviews.llvm.org/D81485 From llvm-commits at lists.llvm.org Wed Jul 8 14:17:08 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:17:08 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. In-Reply-To: References: Message-ID: <7b2f305384c5cfa40a914d7247b348a8@localhost.localdomain> efriedma added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3569 +// them to expand BUILD_VECTOR. +bool AArch64TargetLowering::shouldExpandBuildVectorWithShuffles( + EVT VT, unsigned DefinedValues) const { ---------------- paulwalker-arm wrote: > efriedma wrote: > > Would it be enough to just fix isShuffleMaskLegal, instead of overriding this? > We expand VECTOR_SHUFFLE (using BUILD_VECTOR) so at this stage I think it's better to just prevent extra VECTOR_SHUFFLE instances/code paths as early as possible. I'm not sure what "early" means in this context; the LegalizeDAG code that checks shouldExpandBuildVectorWithShuffles also checks isShuffleMaskLegal. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 From llvm-commits at lists.llvm.org Wed Jul 8 14:18:20 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:18:20 +0000 (UTC) Subject: [PATCH] D83297: [Attributor][WIP] Attribute scheduling visualization. In-Reply-To: References: Message-ID: kuter marked an inline comment as done. kuter added inline comments. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:906 + InfoCache(InfoCache), CGUpdater(CGUpdater), SG(&DG), Allowed(Allowed), SeedingPeriod(true) {} ---------------- sstefan1 wrote: > bbn wrote: > > Is this patch based on D83185 ? 
But I think those are 2 irrelevant patches, right? Maybe you should create a branch from master and apply changes to that. > That is fine, as long as allow list goes in first. @bbn Seeding allow list was originally meant for this patch 😄 . CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83297/new/ https://reviews.llvm.org/D83297 From llvm-commits at lists.llvm.org Wed Jul 8 14:18:20 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:18:20 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <6141720c5f00c16374fcca2ee48288be@localhost.localdomain> lebedev.ri updated this revision to Diff 276552. lebedev.ri added a reviewer: arsenm. lebedev.ri added a comment. Herald added a subscriber: wdng. Addressing nits: - It turns out, we can't add attributes to true intrinsics, so indeed, there is no point in trying to reduce them. Originally, i thought that wasn't the case and didn't want to deal with all that (with having to understand which attribute will just reappear if it is deleted) right away. So just ignore true intrinsics. - for_each changes I will split this up into several commits when landing. In D83351#2139879 , @nickdesaulniers wrote: > In D83351#2139818 , @lebedev.ri wrote: > > > so i'm still waiting on the link/patch. > > > Treat your fellow contributors with more respect, please.  I know style disagreements aren't exciting, but we're all on the same team. Please remember that any [self-respecting] community is diverse, and almost by definition that includes people with different 'base' languages. American speak of sugar-coating very message is neither universally-used nor is required, and not always following it does not mean ill-intent. > I much prefer LLVM's community to the Linux kernel's for a reason, and I think it's worthwhile to speak up in defense of it, lest it decay.  
I'm not the best at maintaining this myself, so if you see me break my own standards, please feel empowered to call me out. > > Another benefit of range for here is we don't need the braces, so these can be 2 lines instead of 3. Please change them to be consistent. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 Files: llvm/docs/CodingStandards.rst llvm/test/Reduce/remove-args.ll llvm/test/Reduce/remove-attributes-from-intrinsic-like-functions.ll llvm/test/Reduce/remove-attributes-from-intrinsics.ll llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-funcs.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/test/Reduce/remove-global-vars.ll llvm/test/Reduce/remove-metadata.ll llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll llvm/test/Reduce/remove-operand-bundles.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83351.276552.patch Type: text/x-patch Size: 22549 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:18:52 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:18:52 +0000 (UTC) Subject: [PATCH] D83427: [MSAN] Update tests due to widespread eager checking Message-ID: guiand created this revision. guiand added reviewers: eugenis, vitalybuka. Herald added subscribers: llvm-commits, Sanitizers, hiraditya, mgorny. Herald added projects: Sanitizers, LLVM. 
Split off from D81699 Excludes `__sanitizer_unaligned_{load,store}` functions from checks. This is because they absolutely expect shadows passed over TLS. Adds lit configuration to be able to test MSAN's interaction with widespread eager checking due to clang emitting noundef attrs. Additionally introduces tests for: - Bitfield handling - Different sized function parameters - MSAN and struct padding (when passing structs by value) Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83427 Files: compiler-rt/test/msan/CMakeLists.txt compiler-rt/test/msan/bitfield.cpp compiler-rt/test/msan/chained_origin.cpp compiler-rt/test/msan/chained_origin_empty_stack.cpp compiler-rt/test/msan/chained_origin_memcpy.cpp compiler-rt/test/msan/cxa_atexit.cpp compiler-rt/test/msan/in-struct-padding.cpp compiler-rt/test/msan/insertvalue_origin.cpp compiler-rt/test/msan/lit.site.cfg.py.in compiler-rt/test/msan/no_sanitize_memory_prop.cpp compiler-rt/test/msan/param_tls_limit.cpp compiler-rt/test/msan/parameter-mixing.cpp compiler-rt/test/msan/qsort.cpp compiler-rt/test/msan/signal_stress_test.cpp compiler-rt/test/msan/stack-origin2.cpp compiler-rt/test/msan/unpoison_param.cpp compiler-rt/test/msan/vararg.cpp compiler-rt/test/msan/vector_cvt.cpp llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83427.276555.patch Type: text/x-patch Size: 18321 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:20:38 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:20:38 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types In-Reply-To: References: Message-ID: <97221d8c79b3a74d29b6397450911037@localhost.localdomain> stefanp added inline comments. 
================
Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:399
 def PLI8 : MLS_DForm_SI34_RT5<14, (outs g8rc:$RT),
- (ins s34imm:$SI),
+ (ins s34imm_pcrel:$SI),
 "pli $RT, $SI", IIC_IntSimple, []>;
----------------
nemanjai wrote:
> It seems very odd to me that we would use the `_pcrel` version here. There should be no way to do anything PC-relative with this instruction since it will necessarily set the PC-Rel bit to zero. The immediate should always be a real immediate (never any fixup).
>
> So although it doesn't matter, we should probably not use the `_pcrel` version because it will be confusing. I was certainly confused and wrote about 3 versions of this comment :)

You are correct. For some reason I added this here thinking that `MLS_DForm_SI34_RT5` was a multi-class just like the one above it...
Anyway, I'll fix it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83255/new/

https://reviews.llvm.org/D83255


From llvm-commits at lists.llvm.org Wed Jul 8 14:21:45 2020
From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:21:45 +0000 (UTC)
Subject: [PATCH] D83299: [Attributor] [WIP] Introduce callbase context bridges.
In-Reply-To:
References:
Message-ID:

jdoerfert added a comment.

Can we include some code that triggers this so we can test it?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83299/new/

https://reviews.llvm.org/D83299


From llvm-commits at lists.llvm.org Wed Jul 8 14:22:49 2020
From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:22:49 +0000 (UTC)
Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap.
In-Reply-To:
References:
Message-ID: <803f361e04930f96532cab6137716d1a@localhost.localdomain>

efriedma added a comment.

> Are you referring to using the heap only once the queue grows larger than a threshold or deciding what scheduling heuristics to enable based on the size?

The scheduling heuristics.

> The selection should be deterministic across different compilers/C++ STLs because the comparator enforces a total order.

It's undefined behavior to call std::push_heap/std::pop_heap on an array that isn't a heap. If the total order changes, that can break the heap property. Not sure what the practical consequence would be on common STL implementations, but that seems scary enough that we want to ensure that can't happen.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83335/new/

https://reviews.llvm.org/D83335


From llvm-commits at lists.llvm.org Wed Jul 8 14:23:00 2020
From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:23:00 +0000 (UTC)
Subject: [PATCH] D83332: [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering.
In-Reply-To:
References:
Message-ID:

yamauchi updated this revision to Diff 276556.
yamauchi added a comment.

Rebased on the split tests (D83424).

Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83332/new/ https://reviews.llvm.org/D83332 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avx-vperm2x128.ll llvm/test/CodeGen/X86/phaddsub-extract.ll Index: llvm/test/CodeGen/X86/phaddsub-extract.ll =================================================================== --- llvm/test/CodeGen/X86/phaddsub-extract.ll +++ llvm/test/CodeGen/X86/phaddsub-extract.ll @@ -2095,35 +2095,19 @@ } define i32 @hadd32_4_pgso(<4 x i32> %x225) !prof !14 { -; SSE3-SLOW-LABEL: hadd32_4_pgso: -; SSE3-SLOW: # %bb.0: -; SSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] -; SSE3-SLOW-NEXT: paddd %xmm0, %xmm1 -; SSE3-SLOW-NEXT: phaddd %xmm1, %xmm1 -; SSE3-SLOW-NEXT: movd %xmm1, %eax -; SSE3-SLOW-NEXT: retq -; -; SSE3-FAST-LABEL: hadd32_4_pgso: -; SSE3-FAST: # %bb.0: -; SSE3-FAST-NEXT: phaddd %xmm0, %xmm0 -; SSE3-FAST-NEXT: phaddd %xmm0, %xmm0 -; SSE3-FAST-NEXT: movd %xmm0, %eax -; SSE3-FAST-NEXT: retq -; -; AVX-SLOW-LABEL: hadd32_4_pgso: -; AVX-SLOW: # %bb.0: -; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] -; AVX-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0 -; AVX-SLOW-NEXT: vphaddd %xmm0, %xmm0, %xmm0 -; AVX-SLOW-NEXT: vmovd %xmm0, %eax -; AVX-SLOW-NEXT: retq +; SSE3-LABEL: hadd32_4_pgso: +; SSE3: # %bb.0: +; SSE3-NEXT: phaddd %xmm0, %xmm0 +; SSE3-NEXT: phaddd %xmm0, %xmm0 +; SSE3-NEXT: movd %xmm0, %eax +; SSE3-NEXT: retq ; -; AVX-FAST-LABEL: hadd32_4_pgso: -; AVX-FAST: # %bb.0: -; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0 -; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0 -; AVX-FAST-NEXT: vmovd %xmm0, %eax -; AVX-FAST-NEXT: retq +; AVX-LABEL: hadd32_4_pgso: +; AVX: # %bb.0: +; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0 +; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0 +; AVX-NEXT: vmovd %xmm0, %eax +; AVX-NEXT: retq %x226 = shufflevector <4 x i32> %x225, <4 x i32> undef, <4 x i32> %x227 = add <4 x i32> %x225, %x226 %x228 = shufflevector <4 x i32> %x227, <4 x i32> undef, <4 x i32> Index: 
llvm/test/CodeGen/X86/avx-vperm2x128.ll =================================================================== --- llvm/test/CodeGen/X86/avx-vperm2x128.ll +++ llvm/test/CodeGen/X86/avx-vperm2x128.ll @@ -397,8 +397,7 @@ define <4 x double> @shuffle_v4f64_zz23_pgso(<4 x double> %a) !prof !14 { ; ALL-LABEL: shuffle_v4f64_zz23_pgso: ; ALL: # %bb.0: -; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1 -; ALL-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] +; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = zero,zero,ymm0[2,3] ; ALL-NEXT: retq %s = shufflevector <4 x double> %a, <4 x double> , <4 x i32> ret <4 x double> %s @@ -441,8 +440,7 @@ define <4 x double> @shuffle_v4f64_zz67_pgso(<4 x double> %a) !prof !14 { ; ALL-LABEL: shuffle_v4f64_zz67_pgso: ; ALL: # %bb.0: -; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1 -; ALL-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] +; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = zero,zero,ymm0[2,3] ; ALL-NEXT: retq %s = shufflevector <4 x double> , <4 x double> %a, <4 x i32> ret <4 x double> %s Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -34445,7 +34445,7 @@ return DAG.getBitcast(RootVT, V1); } - bool OptForSize = DAG.getMachineFunction().getFunction().hasOptSize(); + bool OptForSize = DAG.shouldOptForSize(); unsigned RootSizeInBits = RootVT.getSizeInBits(); unsigned NumRootElts = RootVT.getVectorNumElements(); unsigned BaseMaskEltSizeInBits = RootSizeInBits / NumBaseMaskElts; @@ -39287,7 +39287,7 @@ } // Only use (F)HADD opcodes if they aren't microcoded or minimizes codesize. - bool OptForSize = DAG.getMachineFunction().getFunction().hasOptSize(); + bool OptForSize = DAG.shouldOptForSize(); if (!Subtarget.hasFastHorizontalOps() && !OptForSize) return SDValue(); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83332.276556.patch Type: text/x-patch Size: 4060 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:23:34 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:23:34 +0000 (UTC) Subject: [PATCH] D82679: OpaquePtr: Don't check pointee type for byval/preallocated In-Reply-To: References: Message-ID: jdoerfert accepted this revision. jdoerfert added a comment. This revision is now accepted and ready to land. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82679/new/ https://reviews.llvm.org/D82679 From llvm-commits at lists.llvm.org Wed Jul 8 14:31:10 2020 From: llvm-commits at lists.llvm.org (David Majnemer via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:31:10 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <33588ea26985141853527613809a6cbc@localhost.localdomain> majnemer added inline comments. ================ Comment at: llvm/lib/Analysis/InstructionSimplify.cpp:4121-4125 - if (isa(TrueVal)) // select ?, undef, X -> X - return FalseVal; - if (isa(FalseVal)) // select ?, X, undef -> X - return TrueVal; - ---------------- Can we still do these optimizations when `X` is a frozen value? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 14:32:43 2020 From: llvm-commits at lists.llvm.org (Adam Nemet via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:32:43 +0000 (UTC) Subject: [PATCH] D72770: Add matrix types extension tests . In-Reply-To: References: Message-ID: anemet accepted this revision. anemet added a comment. This revision is now accepted and ready to land. LGTM. 

Repository:
  rT test-suite

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72770/new/

https://reviews.llvm.org/D72770


From llvm-commits at lists.llvm.org Wed Jul 8 14:33:04 2020
From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:33:04 +0000 (UTC)
Subject: [PATCH] D83421: [RFC] MemorySSAUpdater: Simplify applyUpdates
In-Reply-To:
References:
Message-ID: <9bcbef027251e174badfffb0ab553e6b@localhost.localdomain>

asbirlea added a comment.

This is an issue I am aware of and the main motivation for the DominatorTree refactoring work. We do need an updated DT here, and currently we're not computing the correct one. This isn't causing issues at the moment because of the scenarios where the updater is being used, but using the current DT in general is not correct.

You're right, it's a very useful optimization now, but some of the cost will be re-added as soon as the infrastructure to update the DT to a PostCFG view is available.

I opted for not removing this now to avoid the confusion of potential performance regressions for the next release, since re-adding the infrastructure needed for correctness will present as a regression.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83421/new/

https://reviews.llvm.org/D83421


From llvm-commits at lists.llvm.org Wed Jul 8 14:33:43 2020
From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:33:43 +0000 (UTC)
Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d
In-Reply-To:
References:
Message-ID: <70cb60a0652c78f8d3d4060d44f41e23@localhost.localdomain>

jasonliu added inline comments.

================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:310
+ static constexpr uint32_t IsEprolMask = 0x0000'4000;
+ static constexpr uint32_t HasCodeLenMask = 0x0000'2000;
+ static constexpr uint32_t IntProcMask = 0x0000'1000;
----------------
DiggerLin wrote:
> jasonliu wrote:
> > In AIX OS header "/usr/include/sys/debug.h", we have this field as
> > ```
> > unsigned has_tboff:1; /* Set if offset from start of proc stored */
> > ```
> > It might be better to rename HasCodeLenMask so that it has a bit association with the original name? So that people do not need to reason about if this is the correct field or not.
> I think I will like to keep the HasFunctionCodeLenMask.
> for the we will give the function code length later depend in the bit.
> Changing the code to
> if (!Err & HasTraceBackTableOffset()
> CodeLen = DE.getU32(&OffsetPtr, &Err);
>
> is not better than
>
> if (!Err && hasFunctionCodeLen())
> CodeLen = DE.getU32(&OffsetPtr, &Err);
>

I think my point is to have association with the original name, and hasFunctionCodeLen does not have association with the name in system header which could cause confusion.

================
Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:396
+ const uint8_t *TBPtr;
+ const uint64_t Size;
+ Optional ParaType;
----------------
DiggerLin wrote:
> jasonliu wrote:
> > Do you actually need the Size as data member?
> we need to Size to know whether the traceback table is long enough for the all for the fields.

But right now you only need to use it in the constructor and it was passed in as a parameter. So why do you need to save it as a data member? Or did I miss any other usage of `Size` as data member?

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:849
+static SmallString<32> parseParaType(uint32_t Value, unsigned ParaNum) {
+ SmallString<32> ParaType;
+ for (unsigned I = 0; I < ParaNum; ++I) {
----------------
Why always 32?
As I mentioned in the other comment, ParaNum has implications for how large your SmallString could be.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:896
+ report_fatal_error("vector info, controlled storage info and extension "
+ "table of traceback table not yet implemented");
+
----------------
I would hope we could gracefully skip the sections we do not want to parse for now, instead of just calling report_fatal_error and stopping parsing altogether.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:850
+ if (I != 0)
+ ParaType += ", ";
+ if ((Value & TracebackTable::FixedParaTypeBit) == 0) {
----------------
DiggerLin wrote:
> jasonliu wrote:
> > Consider doing ParaType += "i, " and ParaType += "f, " ...
> > and do a removal of ", " after parsing all parameters.
> since we will use SmallString, The cost of deleting last "," is more expense than currently implement.

But you could avoid so many "if (I != 0)" conditions, which are not that efficient.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:904
+ if (!Err && isFuncNamePresent()) {
+ uint16_t Len = DE.getU16(&offset_ptr, &Err);
+ if (!Err)
----------------
DiggerLin wrote:
> jasonliu wrote:
> > Why do we need to declare a new variable?
> yes , we need it . it been use here
> FunctionName = DE.getBytes(&offset_ptr, Len, &Err);
>
> since after we get a value the point offset_ptr moved, we can not get it second time.

What's wrong with
```
FunctionNameLen = DE.getU16(&OffsetPtr, &Err);
if (!Err)
  FunctionName = DE.getBytes(&OffsetPtr, Len, &Err);
```
?
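
For readers following this exchange, the point about the offset moving on every read can be sketched as below. This is only an illustration under stated assumptions, not code from the patch: `ByteCursor`, `readU16BE`, `readBytes` and `readLengthPrefixedName` are hypothetical stand-ins for the extractor and the parsing logic, and big-endian byte order is assumed. It shows that the length must be held somewhere (a local or a member) before the name bytes are read, because the same two bytes cannot be read again once the cursor has advanced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cursor over a byte buffer: each successful read advances
// Offset, so a value that was read cannot be fetched a second time.
struct ByteCursor {
  explicit ByteCursor(const std::vector<uint8_t> &Bytes) : Data(Bytes) {}
  const std::vector<uint8_t> &Data;
  size_t Offset = 0;
  bool Failed = false;
};

// Reads a big-endian 16-bit value (assumed byte order) and advances by two.
// On a short buffer it sets Failed and leaves the cursor where it was.
uint16_t readU16BE(ByteCursor &C) {
  if (C.Failed || C.Data.size() - C.Offset < 2) {
    C.Failed = true;
    return 0;
  }
  uint16_t V = static_cast<uint16_t>((C.Data[C.Offset] << 8) |
                                     C.Data[C.Offset + 1]);
  C.Offset += 2;
  return V;
}

// Reads Len raw bytes as a string and advances by Len.
std::string readBytes(ByteCursor &C, size_t Len) {
  if (C.Failed || C.Data.size() - C.Offset < Len) {
    C.Failed = true;
    return std::string();
  }
  std::string S(C.Data.begin() + C.Offset, C.Data.begin() + C.Offset + Len);
  C.Offset += Len;
  return S;
}

// Reads a 16-bit length followed by that many name bytes. The length is
// stored in NameLen first and then reused for the byte read.
bool readLengthPrefixedName(ByteCursor &C, uint16_t &NameLen,
                            std::string &Name) {
  NameLen = readU16BE(C);
  if (C.Failed)
    return false;
  Name = readBytes(C, NameLen);
  return !C.Failed;
}
```

With a buffer holding the bytes `00 03 'f' 'o' 'o'`, `readLengthPrefixedName` stores the length in `NameLen`, reads three name bytes, and leaves the cursor five bytes in; whether the length lives in a local or in a member is then only a question of naming.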

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81585/new/

https://reviews.llvm.org/D81585


From llvm-commits at lists.llvm.org Wed Jul 8 14:35:27 2020
From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:35:27 +0000 (UTC)
Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d
In-Reply-To:
References:
Message-ID: <16f30eac775c595f69522d6d2dd12bef@localhost.localdomain>

jasonliu added inline comments.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:904
+ if (!Err && isFuncNamePresent()) {
+ uint16_t Len = DE.getU16(&offset_ptr, &Err);
+ if (!Err)
----------------
jasonliu wrote:
> DiggerLin wrote:
> > jasonliu wrote:
> > > Why do we need to declare a new variable?
> > yes , we need it . it been use here
> > FunctionName = DE.getBytes(&offset_ptr, Len, &Err);
> >
> > since after we get a value the point offset_ptr moved, we can not get it second time.
> What's wrong with
> ```
> FunctionNameLen = DE.getU16(&OffsetPtr, &Err);
> if (!Err)
>   FunctionName = DE.getBytes(&OffsetPtr, Len, &Err);
> ```
> ?

I meant
```
FunctionNameLen = DE.getU16(&OffsetPtr, &Err);
if (!Err)
  FunctionName = DE.getBytes(&OffsetPtr, FunctionNameLen, &Err);
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81585/new/

https://reviews.llvm.org/D81585


From llvm-commits at lists.llvm.org Wed Jul 8 14:36:26 2020
From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:36:26 +0000 (UTC)
Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes
In-Reply-To:
References:
Message-ID:

nickdesaulniers added a comment.

In D83351#2140113, @lebedev.ri wrote:

> I will split this up into several commits when landing.

What's the plan?

================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:74-76 + if (F.getIntrinsicID() != Intrinsic::not_intrinsic) + return; // We can neither add nor remove attributes from intrinsics. + visitAttributeList(F.getAttributes(), FunctionsToRefine[&F]); ---------------- ``` // We can neither add nor remove attributes from intrinsics. if (F.getIntrinsicID() == Intrinsic::not_intrinsic) visitAttributeList(F.getAttributes(), FunctionsToRefine[&F]); ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 14:38:03 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:38:03 +0000 (UTC) Subject: [PATCH] D81682: [PGO] Extend the value profile buckets for mem op sizes. In-Reply-To: References: Message-ID: <71b6212c5ce52514c529e3366b2d0d09@localhost.localdomain> yamauchi marked 3 inline comments as done. yamauchi added inline comments. ================ Comment at: compiler-rt/include/profile/InstrProfData.inc:839 +#if defined(_MSC_VER) && !defined(__clang__) + +#include ---------------- davidxl wrote: > There is __popcnt etc. Can they be used? > > https://docs.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=vs-2019 Unfortunately, no. I already tried this. (note "popcnt and popcnt64 aren't available on arm/arm64 and not on all x86/x86-64 CPUs" in the older update message.) I can confirm it on https://godbolt.org/z/bn5PEy ================ Comment at: compiler-rt/include/profile/InstrProfData.inc:842 +INSTR_PROF_VISIBILITY INSTR_PROF_INLINE +int InstProfClzll(unsigned long long X) { + unsigned long LeadZeroIdx = 0; ---------------- davidxl wrote: > Since these helpers are only used by runtime on target, not by the host compiler, they should be moved to InstrProfilingUtil.c instead as InstrProfData.Inc is shared by runtime and compiler. 
InstrProfIsSingleValRange (hence InstProfPopcountll) is used by the compiler in lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp. ================ Comment at: compiler-rt/lib/profile/InstrProfilingValue.c:274 +COMPILER_RT_VISIBILITY void +__llvm_profile_instrument_memop(uint64_t TargetValue, void *Data, + uint32_t CounterIndex) { ---------------- davidxl wrote: > Ideally, this function should be inline expanded by the compiler at instrumentation time -- but that can be done separately. True. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81682/new/ https://reviews.llvm.org/D81682 From llvm-commits at lists.llvm.org Wed Jul 8 14:40:07 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:40:07 +0000 (UTC) Subject: [PATCH] D83332: [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering. In-Reply-To: References: Message-ID: <8d444b1ce4cdc4458642dfd6efa39c8e@localhost.localdomain> yamauchi added a comment. In D83332#2137359 , @RKSimon wrote: > You should probably commit the additional tests to trunk with its current codegen and then rebase the patch to show the diff. Done. The tests are separated into D83424 and this patch is now rebased on top of it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83332/new/ https://reviews.llvm.org/D83332 From llvm-commits at lists.llvm.org Wed Jul 8 14:40:20 2020 From: llvm-commits at lists.llvm.org (John Regehr via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:40:20 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <70f942a0b3520a7e681f620502ebf19d@localhost.localdomain> regehr added a comment. 
@majnemer should work: https://alive2.llvm.org/ce/z/vL4yn4


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83360/new/

https://reviews.llvm.org/D83360


From llvm-commits at lists.llvm.org Wed Jul 8 14:42:29 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:42:29 +0000 (UTC)
Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes
In-Reply-To:
References:
Message-ID: <486e25d0a856d2424f50e1d3e82d0db9@localhost.localdomain>

lebedev.ri added a comment.

In D83351#2140177, @nickdesaulniers wrote:

> In D83351#2140113, @lebedev.ri wrote:
>
> > I will split this up into several commits when landing.
>
>
> What's the plan?

langref(?), test cleanup, for_each cleanup, rest of the patch


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83351/new/

https://reviews.llvm.org/D83351


From llvm-commits at lists.llvm.org Wed Jul 8 14:43:15 2020
From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits)
Date: Wed, 08 Jul 2020 21:43:15 +0000 (UTC)
Subject: [PATCH] D81699: MemorySanitizer: Add option to insert init checks at call site
In-Reply-To:
References:
Message-ID:

vitalybuka accepted this revision.
vitalybuka added inline comments.
This revision is now accepted and ready to land.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:1687
+
+ if (!FArgEagerCheck)
 ArgOffset += alignTo(Size, kShadowTLSAlignment);
----------------
clang-format this

================
Comment at: llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll:1
+; RUN: opt < %s -msan-check-access-address=0 -msan-track-origins=1 -msan-eager-checks -S -passes='module(msan-module),function(msan)' 2>&1 | \
+; RUN: FileCheck -allow-deprecated-dag-overlap -check-prefixes=CHECK,CHECK-ORIGINS %s
----------------
vitalybuka wrote:
> would you like to try to generate test with llvm/utils/update_analyze_test_checks.py ?

Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81699/new/ https://reviews.llvm.org/D81699 From llvm-commits at lists.llvm.org Wed Jul 8 14:44:03 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:44:03 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <918817a903e8fd373ee5d84a3746815e@localhost.localdomain> craig.topper added a comment. Wasn't @majnemer asking about define i32 @src(i1 %cond, i32 %x) { %xf = freeze i32 %x %s = select i1 %cond, i32 %xf, i32 undef ret i32 %s } which is legal. I'm going to work on supporting known non-poison cases. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 14:44:50 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via llvm-commits) Date: Wed, 08 Jul 2020 14:44:50 -0700 (PDT) Subject: [compiler-rt] 339f1b4 - sanitizers: Add interceptors for getproto{ent,byname,bynumber}_r Message-ID: <5f063e52.1c69fb81.80483.2180@mx.google.com> Author: Gui Andrade Date: 2020-07-08T21:41:18Z New Revision: 339f1b49037bd7fbd1454c872bcfd1bb6c380f5d URL: https://github.com/llvm/llvm-project/commit/339f1b49037bd7fbd1454c872bcfd1bb6c380f5d DIFF: https://github.com/llvm/llvm-project/commit/339f1b49037bd7fbd1454c872bcfd1bb6c380f5d.diff LOG: sanitizers: Add interceptors for getproto{ent,byname,bynumber}_r This also allows intercepting these getprotoent functions on Linux as well, since Linux exposes them. 
Differential Revision: https://reviews.llvm.org/D82424 Added: compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp Modified: compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.h compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h Removed: ################################################################################ diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc index 053e559a9f90..ea9c71ba8803 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc +++ b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc @@ -7299,23 +7299,26 @@ INTERCEPTOR(int, setttyentpath, char *path) { #endif #if SANITIZER_INTERCEPT_PROTOENT -INTERCEPTOR(struct __sanitizer_protoent *, getprotoent) { - void *ctx; - COMMON_INTERCEPTOR_ENTER(ctx, getprotoent); - struct __sanitizer_protoent *p = REAL(getprotoent)(); - if (p) { - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p, sizeof(*p)); +static void write_protoent(void *ctx, struct __sanitizer_protoent *p) { + COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p, sizeof(*p)); - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_name, REAL(strlen)(p->p_name) + 1); + COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_name, REAL(strlen)(p->p_name) + 1); - SIZE_T pp_size = 1; // One handles the trailing \0 + SIZE_T pp_size = 1; // One handles the trailing \0 - for (char **pp = p->p_aliases; *pp; ++pp, ++pp_size) - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, *pp, REAL(strlen)(*pp) + 1); + for (char **pp = p->p_aliases; *pp; ++pp, ++pp_size) + COMMON_INTERCEPTOR_WRITE_RANGE(ctx, *pp, REAL(strlen)(*pp) + 1); - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_aliases, - pp_size * sizeof(char **)); - } + COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_aliases, + pp_size * sizeof(char **)); +} + 
+INTERCEPTOR(struct __sanitizer_protoent *, getprotoent) { + void *ctx; + COMMON_INTERCEPTOR_ENTER(ctx, getprotoent); + struct __sanitizer_protoent *p = REAL(getprotoent)(); + if (p) + write_protoent(ctx, p); return p; } @@ -7325,19 +7328,8 @@ INTERCEPTOR(struct __sanitizer_protoent *, getprotobyname, const char *name) { if (name) COMMON_INTERCEPTOR_READ_RANGE(ctx, name, REAL(strlen)(name) + 1); struct __sanitizer_protoent *p = REAL(getprotobyname)(name); - if (p) { - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p, sizeof(*p)); - - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_name, REAL(strlen)(p->p_name) + 1); - - SIZE_T pp_size = 1; // One handles the trailing \0 - - for (char **pp = p->p_aliases; *pp; ++pp, ++pp_size) - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, *pp, REAL(strlen)(*pp) + 1); - - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_aliases, - pp_size * sizeof(char **)); - } + if (p) + write_protoent(ctx, p); return p; } @@ -7345,19 +7337,8 @@ INTERCEPTOR(struct __sanitizer_protoent *, getprotobynumber, int proto) { void *ctx; COMMON_INTERCEPTOR_ENTER(ctx, getprotobynumber, proto); struct __sanitizer_protoent *p = REAL(getprotobynumber)(proto); - if (p) { - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p, sizeof(*p)); - - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_name, REAL(strlen)(p->p_name) + 1); - - SIZE_T pp_size = 1; // One handles the trailing \0 - - for (char **pp = p->p_aliases; *pp; ++pp, ++pp_size) - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, *pp, REAL(strlen)(*pp) + 1); - - COMMON_INTERCEPTOR_WRITE_RANGE(ctx, p->p_aliases, - pp_size * sizeof(char **)); - } + if (p) + write_protoent(ctx, p); return p; } #define INIT_PROTOENT \ @@ -7368,6 +7349,58 @@ INTERCEPTOR(struct __sanitizer_protoent *, getprotobynumber, int proto) { #define INIT_PROTOENT #endif +#if SANITIZER_INTERCEPT_PROTOENT_R +INTERCEPTOR(int, getprotoent_r, struct __sanitizer_protoent *result_buf, + char *buf, SIZE_T buflen, struct __sanitizer_protoent **result) { + void *ctx; + COMMON_INTERCEPTOR_ENTER(ctx, getprotoent_r, 
result_buf, buf, buflen, + result); + int res = REAL(getprotoent_r)(result_buf, buf, buflen, result); + + COMMON_INTERCEPTOR_WRITE_RANGE(ctx, result, sizeof *result); + if (!res && *result) + write_protoent(ctx, *result); + return res; +} + +INTERCEPTOR(int, getprotobyname_r, const char *name, + struct __sanitizer_protoent *result_buf, char *buf, SIZE_T buflen, + struct __sanitizer_protoent **result) { + void *ctx; + COMMON_INTERCEPTOR_ENTER(ctx, getprotobyname_r, name, result_buf, buf, + buflen, result); + if (name) + COMMON_INTERCEPTOR_READ_RANGE(ctx, name, REAL(strlen)(name) + 1); + int res = REAL(getprotobyname_r)(name, result_buf, buf, buflen, result); + + COMMON_INTERCEPTOR_WRITE_RANGE(ctx, result, sizeof *result); + if (!res && *result) + write_protoent(ctx, *result); + return res; +} + +INTERCEPTOR(int, getprotobynumber_r, int num, + struct __sanitizer_protoent *result_buf, char *buf, + SIZE_T buflen, struct __sanitizer_protoent **result) { + void *ctx; + COMMON_INTERCEPTOR_ENTER(ctx, getprotobynumber_r, num, result_buf, buf, + buflen, result); + int res = REAL(getprotobynumber_r)(num, result_buf, buf, buflen, result); + + COMMON_INTERCEPTOR_WRITE_RANGE(ctx, result, sizeof *result); + if (!res && *result) + write_protoent(ctx, *result); + return res; +} + +#define INIT_PROTOENT_R \ + COMMON_INTERCEPT_FUNCTION(getprotoent_r); \ + COMMON_INTERCEPT_FUNCTION(getprotobyname_r); \ + COMMON_INTERCEPT_FUNCTION(getprotobynumber_r); +#else +#define INIT_PROTOENT_R +#endif + #if SANITIZER_INTERCEPT_NETENT INTERCEPTOR(struct __sanitizer_netent *, getnetent) { void *ctx; @@ -10071,6 +10104,7 @@ static void InitializeCommonInterceptors() { INIT_STRMODE; INIT_TTYENT; INIT_PROTOENT; + INIT_PROTOENT_R; INIT_NETENT; INIT_GETMNTINFO; INIT_MI_VECTOR_HASH; diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h index a5fcbadb2597..2d48e9d0ae1a 100644 --- 
a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h @@ -545,7 +545,8 @@ #define SANITIZER_INTERCEPT_FGETLN (SI_NETBSD || SI_FREEBSD) #define SANITIZER_INTERCEPT_STRMODE (SI_NETBSD || SI_FREEBSD) #define SANITIZER_INTERCEPT_TTYENT SI_NETBSD -#define SANITIZER_INTERCEPT_PROTOENT SI_NETBSD +#define SANITIZER_INTERCEPT_PROTOENT (SI_NETBSD || SI_LINUX) +#define SANITIZER_INTERCEPT_PROTOENT_R (SI_LINUX_NOT_ANDROID) #define SANITIZER_INTERCEPT_NETENT SI_NETBSD #define SANITIZER_INTERCEPT_SETVBUF (SI_NETBSD || SI_FREEBSD || \ SI_LINUX || SI_MAC) diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.h index d80280d9bf8c..ae54a8cf105e 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.h @@ -129,12 +129,6 @@ struct __sanitizer_shmid_ds { void *_shm_internal; }; -struct __sanitizer_protoent { - char *p_name; - char **p_aliases; - int p_proto; -}; - struct __sanitizer_netent { char *n_name; char **n_aliases; diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h index f6c8a1450a93..658b0abaece8 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h @@ -704,6 +704,12 @@ struct __sanitizer_dl_phdr_info { extern unsigned struct_ElfW_Phdr_sz; #endif +struct __sanitizer_protoent { + char *p_name; + char **p_aliases; + int p_proto; +}; + struct __sanitizer_addrinfo { int ai_flags; int ai_family; diff --git a/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp b/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp new file mode 100644 index 000000000000..1b4e90a407ac --- /dev/null +++ 
b/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp @@ -0,0 +1,62 @@ +// RUN: %clangxx -std=c++11 -O0 -g %s -o %t && %run %t 2>&1 | FileCheck %s + +#include +#include +#include +#include + +void print_protoent(protoent *curr_entry) { + fprintf(stderr, "%s (%d)\n", curr_entry->p_name, curr_entry->p_proto); + + char **aliases = curr_entry->p_aliases; + while (char *alias = *aliases++) { + fprintf(stderr, " alias %s\n", alias); + } +} + +void print_all_protoent() { + protoent entry; + char buf[1024]; + protoent *curr_entry; + + while (getprotoent_r(&entry, buf, sizeof(buf), &curr_entry) != ENOENT && curr_entry) { + print_protoent(curr_entry); + } +} + +void print_protoent_by_name(const char *name) { + protoent entry; + char buf[1024]; + protoent *curr_entry; + + int res = getprotobyname_r(name, &entry, buf, sizeof(buf), &curr_entry); + assert(!res && curr_entry); + print_protoent(curr_entry); +} + +void print_protoent_by_num(int num) { + protoent entry; + char buf[1024]; + protoent *curr_entry; + + int res = getprotobynumber_r(num, &entry, buf, sizeof(buf), &curr_entry); + assert(!res && curr_entry); + print_protoent(curr_entry); +} + +int main() { + // CHECK: ip (0) + // CHECK-NEXT: alias IP + // CHECK: ipv6 (41) + // CHECK-NEXT: alias IPv6 + print_all_protoent(); + + // CHECK: rdp (27) + // CHECK-NEXT: alias RDP + print_protoent_by_name("rdp"); + + // CHECK: udp (17) + // CHECK-NEXT: alias UDP + print_protoent_by_num(17); + return 0; +} From llvm-commits at lists.llvm.org Wed Jul 8 14:47:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:47:27 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: lebedev.ri added inline comments. 
================ Comment at: llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp:74-76 + if (F.getIntrinsicID() != Intrinsic::not_intrinsic) + return; // We can neither add nor remove attributes from intrinsics. + visitAttributeList(F.getAttributes(), FunctionsToRefine[&F]); ---------------- nickdesaulniers wrote: > ``` > // We can neither add nor remove attributes from intrinsics. > if (F.getIntrinsicID() == Intrinsic::not_intrinsic) > visitAttributeList(F.getAttributes(), FunctionsToRefine[&F]); > ``` We can do that, but there is no line count difference, and doesn't the comment lose context? Right now it sits accurately on the early return. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 14:48:15 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:48:15 +0000 (UTC) Subject: [PATCH] D83428: [flang] Fix negative unit number hashing Message-ID: klausler created this revision. klausler added reviewers: tskeith, sscalpone. klausler added a project: Flang. Herald added a reviewer: jdoerfert. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. Ensure that external unit number hashing produces a valid index for a negative unit number, viz. a NEWUNIT=.
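A minimal sketch of the hashing issue, not taken from the patch itself: in C++ the `%` operator truncates toward zero, so a negative unit number yields a negative remainder that cannot be used as a bucket index. The names `kBuckets`, `HashOld` and `HashNew` are illustrative stand-ins for `buckets_` and the two versions of `Hash`.

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative stand-in for buckets_ in unit-map.h (must be prime).
constexpr int kBuckets = 1031;

// Before the change: a negative n gives a negative result, which is not a
// valid index into an array of kBuckets chains.
int HashOld(int n) { return n % kBuckets; }

// After the change: taking the absolute value first keeps the result in
// the range [0, kBuckets).
// Caveat (an observation, not something the patch discusses): std::abs is
// undefined for the most negative int value.
int HashNew(int n) { return std::abs(n) % kBuckets; }
```

With these definitions, `HashOld(-7)` evaluates to `-7`, while `HashNew(-7)` evaluates to `7`; non-negative unit numbers hash identically under both.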
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83428 Files: flang/runtime/unit-map.h Index: flang/runtime/unit-map.h =================================================================== --- flang/runtime/unit-map.h +++ flang/runtime/unit-map.h @@ -15,6 +15,7 @@ #include "lock.h" #include "memory.h" #include "unit.h" +#include namespace Fortran::runtime::io { @@ -59,7 +60,7 @@ }; static constexpr int buckets_{1031}; // must be prime - int Hash(int n) { return n % buckets_; } + int Hash(int n) { return std::abs(n) % buckets_; } ExternalFileUnit *Find(int n) { Chain *previous{nullptr}; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83428.276561.patch Type: text/x-patch Size: 542 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:48:37 2020 From: llvm-commits at lists.llvm.org (John Regehr via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:48:37 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <3c3ea310eb9a62f90af45ea5db372499@localhost.localdomain> regehr added a comment. @craig.topper ok, I agree that should work. alive doesn't like it -- is this an alive bug @nlopes? a freeze should not yield undef. https://alive2.llvm.org/ce/z/mWAsYv Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 14:51:06 2020 From: llvm-commits at lists.llvm.org (Konstantin Schwarz via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:51:06 +0000 (UTC) Subject: [PATCH] D83390: [GlobalISel][InlineAsm] Extend input operands when register class size does not match type In-Reply-To: References: Message-ID: kschwarz added a comment. 
A similar issue (for tied input operands) is handled in https://reviews.llvm.org/D83384 The function introduced there should be extended to handle the vector case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83390/new/ https://reviews.llvm.org/D83390 From llvm-commits at lists.llvm.org Wed Jul 8 14:51:26 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:51:26 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <4731c870002abaaff841ee421b6377e1@localhost.localdomain> nickdesaulniers added a comment. In D83351#2140189 , @lebedev.ri wrote: > In D83351#2140177 , @nickdesaulniers wrote: > > > In D83351#2140113 , @lebedev.ri wrote: > > > > > I will split this up into several commits when landing. > > > > > > What's the plan? > > > langref(?), test cleanup, for_each cleanup, rest of the patch SGTM, if you want to split those off in child revisions/phab reviews I'd be happy to review/approve test cleanup, for_each, and the rebased patch for this feature. I think you meant coding standard (as opposed to langref)? (Just checking, or was there a LangRef change, too, and I missed it?) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Wed Jul 8 14:51:37 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:51:37 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: efriedma added a comment. In D83360#2140224 , @regehr wrote: > @craig.topper ok, I agree that should work. alive doesn't like it -- is this an alive bug @nlopes? a freeze should not yield undef. 
> https://alive2.llvm.org/ce/z/mWAsYv Did you mean to check something like the following? define i32 @src(i1 %cond, i32 %x) { %x2 = freeze i32 %x %s = select i1 %cond, i32 %x2, i32 undef ret i32 %s } define i32 @tgt(i1 %cond, i32 %x) { %x2 = freeze i32 %x ret i32 %x2 } Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 14:51:57 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:51:57 +0000 (UTC) Subject: [PATCH] D83427: [MSAN] Update tests due to widespread eager checking In-Reply-To: References: Message-ID: vitalybuka added a comment. Each of these can be a separate patch Additionally introduces tests for: - Bitfield handling - Different sized function parameters - MSAN and struct padding (when passing structs by value) And It would be nice if you cut off some other pieces Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83427/new/ https://reviews.llvm.org/D83427 From llvm-commits at lists.llvm.org Wed Jul 8 14:52:11 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:52:11 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: craig.topper added a comment. Alive does like this https://alive2.llvm.org/ce/z/yhibbe which is what I was going to implement. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 14:52:28 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:52:28 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. 
In-Reply-To: References: Message-ID: <8a06de762d573ded751b811060e8fe3d@localhost.localdomain> paulwalker-arm updated this revision to Diff 276563. paulwalker-arm added a comment. Removed shouldExpandBuildVectorWithShuffles override and updated isShuffleMaskLegal instead. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll +++ llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -3,6 +3,18 @@ target triple = "aarch64-unknown-linux-gnu" +; Currently there is no custom lowering for vector shuffles operating on types +; bigger than NEON. However, having no support opens us up to a code generator +; hang when expanding BUILD_VECTOR. Here we just validate the promblematic case +; successfully exits code generation. +define void @hang_when_merging_stores_after_legalisation(<8 x i32>* %a, <2 x i32> %b) #0 { +; CHECK-LABEL: hang_when_merging_stores_after_legalisation: + %splat = shufflevector <2 x i32> %b, <2 x i32> undef, <8 x i32> zeroinitializer + %interleaved.vec = shufflevector <8 x i32> %splat, <8 x i32> undef, <8 x i32> + store <8 x i32> %interleaved.vec, <8 x i32>* %a, align 4 + ret void +} + ; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in ; validating all combinations of vector type. 
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -734,6 +734,16 @@ bool fallBackToDAGISel(const Instruction &Inst) const override; + /// SVE code generation for fixed length vectors does not custom lower + /// BUILD_VECTOR. This makes BUILD_VECTOR legalisation a source of stores to + /// merge. However, merging them creates a BUILD_VECTOR that is just as + /// illegal as the original, thus leading to an infinite legalisation loop. + /// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal + /// vector types this override can be removed. + bool mergeStoresAfterLegalization(EVT VT) const override { + return !useSVEForFixedLengthVectors(); + } + private: /// Keep a pointer to the AArch64Subtarget around so that we can /// make the right decision when generating code for different targets. Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -8741,6 +8741,10 @@ } bool AArch64TargetLowering::isShuffleMaskLegal(ArrayRef M, EVT VT) const { + // Currently no fixed length shuffles that require SVE are legal. + if (useSVEForFixedLengthVectorVT(VT)) + return false; + if (VT.getVectorNumElements() == 4 && (VT.is128BitVector() || VT.is64BitVector())) { unsigned PFIndexes[4]; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83408.276563.patch Type: text/x-patch Size: 2778 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:53:42 2020 From: llvm-commits at lists.llvm.org (Kostya Serebryany via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:53:42 +0000 (UTC) Subject: [PATCH] D76665: [asan] Stop instrumenting user-defined ELF sections In-Reply-To: References: Message-ID: <627952bdaae38827192164121d3ed974@localhost.localdomain> kcc added a comment. Will adding __attribute__((no_sanitize("address"))) to your global solve the problem you are trying to solve? (sorry for being too terse last time) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76665/new/ https://reviews.llvm.org/D76665 From llvm-commits at lists.llvm.org Wed Jul 8 14:56:15 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:56:15 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. In-Reply-To: References: Message-ID: <3ff65a9ea69e19ff4b45413742936b62@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 From llvm-commits at lists.llvm.org Wed Jul 8 14:56:20 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:56:20 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. In-Reply-To: References: Message-ID: paulwalker-arm marked an inline comment as done. paulwalker-arm added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3569 +// them to expand BUILD_VECTOR. 
+bool AArch64TargetLowering::shouldExpandBuildVectorWithShuffles( + EVT VT, unsigned DefinedValues) const { ---------------- efriedma wrote: > paulwalker-arm wrote: > > efriedma wrote: > > > Would it be enough to just fix isShuffleMaskLegal, instead of overriding this? > > We expand VECTOR_SHUFFLE (using BUILD_VECTOR) so at this stage I think it's better to just prevent extra VECTOR_SHUFFLE instances/code paths as early as possible. > I'm not sure what "early" means in this context; the LegalizeDAG code that checks shouldExpandBuildVectorWithShuffles also checks isShuffleMaskLegal. I was worried about the else clause but see that's unlikely to affect vectors >128bit. Updating isShuffleMaskLegal looks to work and check-llvm didn't highlight any unwanted side effects so this works for me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 From llvm-commits at lists.llvm.org Wed Jul 8 14:56:30 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:56:30 +0000 (UTC) Subject: [PATCH] D83430: [AliasSetTracker] More precise AAInfo intersection check Message-ID: nikic created this revision. nikic added a reviewer: asbirlea. Herald added subscribers: llvm-commits, kosarev. Herald added a project: LLVM. The code currently checks whether the intersection has one of TBAA, Scope or NoAlias unset -- however, those might have already been unset in the first place, in which case we will unnecessarily report a change. Instead, compare the intersection result to the original AAInfo. 
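A simplified sketch of the difference between the two checks, under stated assumptions: `Info` is a hypothetical stand-in for `AAMDNodes`, with plain integers in place of metadata pointers and `0` meaning "unset", and `intersect` is assumed to keep a field only when both sides agree. `changedOld` and `changedNew` are illustrative names, not functions in the tree.

```cpp
#include <cassert>

// Hypothetical stand-in for AAMDNodes; 0 plays the role of a null MDNode.
struct Info {
  int TBAA;
  int Scope;
  int NoAlias;

  // Keep a field only if both sides carry the same value.
  Info intersect(const Info &O) const {
    return Info{TBAA == O.TBAA ? TBAA : 0, Scope == O.Scope ? Scope : 0,
                NoAlias == O.NoAlias ? NoAlias : 0};
  }
  bool operator==(const Info &O) const {
    return TBAA == O.TBAA && Scope == O.Scope && NoAlias == O.NoAlias;
  }
  bool operator!=(const Info &O) const { return !(*this == O); }
};

// Old-style check: reports a change whenever any field of the intersection
// is unset, even if that field was already unset before.
bool changedOld(const Info &Cur, const Info &New) {
  Info I = Cur.intersect(New);
  return !I.TBAA || !I.Scope || !I.NoAlias;
}

// New-style check: reports a change only if the intersection actually
// differs from the current info.
bool changedNew(const Info &Cur, const Info &New) {
  return Cur.intersect(New) != Cur;
}
```

For example, with `Cur = {1, 0, 0}` and an identical `New`, the old check reports a change because `Scope` and `NoAlias` are unset, whereas the new check does not, since nothing was lost in the intersection.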
This makes for a 0.5% geomean compile-time saving: https://llvm-compile-time-tracker.com/compare.php?from=0b39d2d75275b80994dac06b7ad05031cbd09393&to=7e6dc7d267393489b12e4884641f411d375d000e&stat=instructions The current form of the check was introduced in https://github.com/llvm/llvm-project/commit/35548e80d67dd0d6e61c489432cfb1dafe0ddb65, which is how I ran into this. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83430 Files: llvm/include/llvm/Analysis/AliasSetTracker.h Index: llvm/include/llvm/Analysis/AliasSetTracker.h =================================================================== --- llvm/include/llvm/Analysis/AliasSetTracker.h +++ llvm/include/llvm/Analysis/AliasSetTracker.h @@ -87,12 +87,7 @@ AAInfo = NewAAInfo; else { AAMDNodes Intersection(AAInfo.intersect(NewAAInfo)); - if (!Intersection.TBAA || !Intersection.Scope || - !Intersection.NoAlias) { - // NewAAInfo conflicts with AAInfo. - AAInfo = DenseMapInfo::getTombstoneKey(); - SizeChanged = true; - } + SizeChanged |= Intersection != AAInfo; AAInfo = Intersection; } return SizeChanged; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83430.276562.patch Type: text/x-patch Size: 707 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:58:13 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:58:13 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: <91346f6cc1016a2e27e1753897c2979d@localhost.localdomain> Conanap updated this revision to Diff 276564. Conanap added a comment. Relocated some of the instructions to a more appropriate place. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 Files: llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s Index: llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s =================================================================== --- llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s +++ llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s @@ -405,3 +405,27 @@ # CHECK-BE: vinsdrx 1, 2, 3 # encoding: [0x10,0x22,0x1b,0xcf] # CHECK-LE: vinsdrx 1, 2, 3 # encoding: [0xcf,0x1b,0x22,0x10] vinsdrx 1, 2, 3 +# CHECK-BE: lxvrbx 32, 1, 2 # encoding: [0x7c,0x01,0x10,0x1b] +# CHECK-LE: lxvrbx 32, 1, 2 # encoding: [0x1b,0x10,0x01,0x7c] + lxvrbx 32, 1, 2 +# CHECK-BE: lxvrhx 33, 1, 2 # encoding: [0x7c,0x21,0x10,0x5b] +# CHECK-LE: lxvrhx 33, 1, 2 # encoding: [0x5b,0x10,0x21,0x7c] + lxvrhx 33, 1, 2 +# CHECK-BE: lxvrdx 34, 1, 2 # encoding: [0x7c,0x41,0x10,0xdb] +# CHECK-LE: lxvrdx 34, 1, 2 # encoding: [0xdb,0x10,0x41,0x7c] + lxvrdx 34, 1, 2 +# CHECK-BE: lxvrwx 35, 1, 2 # encoding: [0x7c,0x61,0x10,0x9b] +# CHECK-LE: lxvrwx 35, 1, 2 # encoding: [0x9b,0x10,0x61,0x7c] + lxvrwx 35, 1, 2 +# CHECK-BE: stxvrbx 32, 3, 1 # encoding: [0x7c,0x03,0x09,0x1b] +# CHECK-LE: stxvrbx 32, 3, 1 # encoding: [0x1b,0x09,0x03,0x7c] + stxvrbx 32, 3, 1 +# CHECK-BE: stxvrhx 33, 3, 1 # encoding: [0x7c,0x23,0x09,0x5b] +# CHECK-LE: stxvrhx 33, 3, 1 # encoding: [0x5b,0x09,0x23,0x7c] + stxvrhx 33, 3, 1 +# CHECK-BE: stxvrwx 34, 3, 1 # encoding: [0x7c,0x43,0x09,0x9b] +# CHECK-LE: stxvrwx 34, 3, 1 # encoding: [0x9b,0x09,0x43,0x7c] + stxvrwx 34, 3, 1 +# CHECK-BE: stxvrdx 35, 3, 1 # encoding: [0x7c,0x63,0x09,0xdb] +# CHECK-LE: stxvrdx 35, 3, 1 # encoding: [0xdb,0x09,0x63,0x7c] + stxvrdx 35, 3, 1 Index: llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt =================================================================== --- 
llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt +++ llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt @@ -278,3 +278,27 @@ # CHECK: vinsdrx 1, 2, 3 0x10 0x22 0x1b 0xcf + +# CHECK: lxvrbx 32, 1, 2 +0x7c 0x01 0x10 0x1b + +# CHECK: lxvrhx 33, 1, 2 +0x7c 0x21 0x10 0x5b + +# CHECK: lxvrdx 34, 1, 2 +0x7c 0x41 0x10 0xdb + +# CHECK: lxvrwx 35, 1, 2 +0x7c 0x61 0x10 0x9b + +# CHECK: stxvrbx 32, 3, 1 +0x7c 0x03 0x09 0x1b + +# CHECK: stxvrhx 33, 3, 1 +0x7c 0x23 0x09 0x5b + +# CHECK: stxvrwx 34, 3, 1 +0x7c 0x43 0x09 0x9b + +# CHECK: stxvrdx 35, 3, 1 +0x7c 0x63 0x09 0xdb Index: llvm/lib/Target/PowerPC/PPCInstrPrefix.td =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -934,8 +934,25 @@ "vclrrb $vD, $vA, $rB", IIC_VecGeneral, [(set v16i8:$vD, (int_ppc_altivec_vclrrb v16i8:$vA, i32:$rB))]>; + + // The XFormMemOp flag for the following 8 insts is set on the instruction format. + let mayLoad = 1, mayStore = 1 in { + def LXVRBX : X_XT6_RA5_RB5<31, 13, "lxvrbx", vsrc, []>; + def LXVRHX : X_XT6_RA5_RB5<31, 45, "lxvrhx", vsrc, []>; + def LXVRWX : X_XT6_RA5_RB5<31, 77, "lxvrwx", vsrc, []>; + def LXVRDX : X_XT6_RA5_RB5<31, 109, "lxvrdx", vsrc, []>; + } + + let mayLoad = 0, mayStore = 1 in { + def STXVRBX : X_XS6_RA5_RB5<31, 141, "stxvrbx", vsrc, []>; + def STXVRHX : X_XS6_RA5_RB5<31, 173, "stxvrhx", vsrc, []>; + def STXVRWX : X_XS6_RA5_RB5<31, 205, "stxvrwx", vsrc, []>; + def STXVRDX : X_XS6_RA5_RB5<31, 237, "stxvrdx", vsrc, []>; + } } + + //---------------------------- Anonymous Patterns ----------------------------// let Predicates = [IsISA3_1] in { def : Pat<(v16i8 (int_ppc_vsx_xxgenpcvbm v16i8:$VRB, imm:$IMM)), -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83364.276564.patch Type: text/x-patch Size: 4046 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 14:58:48 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 21:58:48 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: <03fbe773f5d8615e2a55a97228fc041f@localhost.localdomain> Conanap marked 2 inline comments as done. Conanap added a comment. Relocated the instruction definitions to a more appropriate place. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 From llvm-commits at lists.llvm.org Wed Jul 8 15:00:10 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:00:10 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged Message-ID: lebedev.ri created this revision. lebedev.ri added reviewers: dblaikie, nickdesaulniers. lebedev.ri added a project: LLVM. As per the lengthy/heated discussion in D83351, using `for_each` is potentially confusing and should therefore be avoided. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83431 Files: llvm/docs/CodingStandards.rst Index: llvm/docs/CodingStandards.rst =================================================================== --- llvm/docs/CodingStandards.rst +++ llvm/docs/CodingStandards.rst @@ -1302,6 +1302,8 @@ for (Instruction &I : *BB) ... use I ... +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged. + Don't evaluate ``end()`` every time through a loop ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83431.276565.patch Type: text/x-patch Size: 431 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:01:23 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:01:23 +0000 (UTC) Subject: [PATCH] D83432: [NFC][llvm-reduce] Don't `rm -rf` in tests, `rm -f` is enough Message-ID: lebedev.ri created this revision. lebedev.ri added reviewers: nickdesaulniers, dblaikie, MaskRay. lebedev.ri added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83432 Files: llvm/test/Reduce/remove-args.ll llvm/test/Reduce/remove-funcs.ll llvm/test/Reduce/remove-global-vars.ll llvm/test/Reduce/remove-metadata.ll llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll llvm/test/Reduce/remove-operand-bundles.ll Index: llvm/test/Reduce/remove-operand-bundles.ll =================================================================== --- llvm/test/Reduce/remove-operand-bundles.ll +++ llvm/test/Reduce/remove-operand-bundles.ll @@ -1,6 +1,6 @@ ; Test that llvm-reduce can remove uninteresting operand bundles from calls. ; -; RUN: rm -rf %t +; RUN: rm -f %t ; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t ; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s Index: llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll =================================================================== --- llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll +++ llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll @@ -1,6 +1,6 @@ ; Test that llvm-reduce can remove uninteresting function arguments from function definitions as well as their calls. 
; -; RUN: rm -rf %t +; RUN: rm -f %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-multiple-use-of-global-vars-in-same-instruction.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s Index: llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll =================================================================== --- llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll +++ llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll @@ -1,6 +1,6 @@ ; Test that llvm-reduce can remove uninteresting function arguments from function definitions as well as their calls. ; -; RUN: rm -rf %t +; RUN: rm -f %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-multiple-use-of-args-in-same-instruction.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s Index: llvm/test/Reduce/remove-metadata.ll =================================================================== --- llvm/test/Reduce/remove-metadata.ll +++ llvm/test/Reduce/remove-metadata.ll @@ -1,7 +1,7 @@ ; Test that llvm-reduce can remove uninteresting metadata from an IR file. ; The Metadata pass erases named & unnamed metadata nodes. ; -; RUN: rm -rf %t +; RUN: rm -f %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-metadata.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=! %s Index: llvm/test/Reduce/remove-global-vars.ll =================================================================== --- llvm/test/Reduce/remove-global-vars.ll +++ llvm/test/Reduce/remove-global-vars.ll @@ -1,7 +1,7 @@ ; Test that llvm-reduce can remove uninteresting Global Variables as well as ; their direct uses (which in turn are replaced with 'undef'). 
; -; RUN: rm -rf %t +; RUN: rm -f %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-global-vars.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s Index: llvm/test/Reduce/remove-funcs.ll =================================================================== --- llvm/test/Reduce/remove-funcs.ll +++ llvm/test/Reduce/remove-funcs.ll @@ -1,7 +1,7 @@ ; Test that llvm-reduce can remove uninteresting functions as well as ; their InstCalls. ; -; RUN: rm -rf %t +; RUN: rm -f %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-funcs.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s Index: llvm/test/Reduce/remove-args.ll =================================================================== --- llvm/test/Reduce/remove-args.ll +++ llvm/test/Reduce/remove-args.ll @@ -1,6 +1,6 @@ ; Test that llvm-reduce can remove uninteresting function arguments from function definitions as well as their calls. ; -; RUN: rm -rf %t +; RUN: rm -f %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-args.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83432.276567.patch Type: text/x-patch Size: 3911 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:03:09 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:03:09 +0000 (UTC) Subject: [PATCH] D83434: [NFC][llvm-reduce] Purify for_each usage in Operand Bundles into range-based for loop Message-ID: lebedev.ri created this revision. lebedev.ri added reviewers: dblaikie, nickdesaulniers. lebedev.ri added a project: LLVM. As per lengthy/heated disscussion in D83351 , and CodingStandards D83431 . 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83434 Files: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Index: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp =================================================================== --- llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -56,10 +56,9 @@ OperandBundlesToKeepIndexes.reserve(Call.getNumOperandBundles()); // Enumerate every operand bundle on this call. - for_each(seq(0U, Call.getNumOperandBundles()), [&](unsigned BundleIndex) { + for (unsigned BundleIndex : seq(0U, Call.getNumOperandBundles())) if (O.shouldKeep()) // Should we keep this one? OperandBundlesToKeepIndexes.emplace_back(BundleIndex); - }); } }; @@ -104,9 +103,8 @@ OperandBundleRemapper R(ChunksToKeep); R.visit(Program); - for_each(R.CallsToRefine, [](const auto &P) { - return maybeRewriteCallWithDifferentBundles(P.first, P.second); - }); + for (const auto &I : R.CallsToRefine) + maybeRewriteCallWithDifferentBundles(I.first, I.second); } /// Counts the amount of operand bundles. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83434.276568.patch Type: text/x-patch Size: 1044 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:04:20 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:04:20 +0000 (UTC) Subject: [PATCH] D83435: [NFCI][llvm-reduce] OperandBundleCounter: drop pointless constructor Message-ID: lebedev.ri created this revision. lebedev.ri added reviewers: nickdesaulniers, dblaikie. lebedev.ri added a project: LLVM. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83435 Files: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Index: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp =================================================================== --- llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -66,8 +66,6 @@ /// How many features (in this case, operand bundles) did we count, total? int OperandBundeCount = 0; - OperandBundleCounter() {} - /// So far only CallBase sub-classes can have operand bundles. void visitCallBase(CallBase &Call) { // Just accumulate the total number of operand bundles. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83435.276571.patch Type: text/x-patch Size: 574 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:06:05 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:06:05 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <99f12c1b90df0138856d19078ed8c096@localhost.localdomain> lebedev.ri updated this revision to Diff 276573. lebedev.ri added a comment. In D83351#2140238 , @nickdesaulniers wrote: > In D83351#2140189 , @lebedev.ri wrote: > > > In D83351#2140177 , @nickdesaulniers wrote: > > > > > In D83351#2140113 , @lebedev.ri wrote: > > > > > > > I will split this up into several commits when landing. > > > > > > > > > What's the plan? > > > > > > langref(?), test cleanup, for_each cleanup, rest of the patch > > > SGTM, if you want to split those off in child revisions/phab reviews I'd be happy to review/approve test cleanup, for_each, and the rebased patch for this feature. Since that wasn't paired with an accept, i can only surmise that it wasn’t an offer. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 Files: llvm/test/Reduce/remove-attributes-from-intrinsic-like-functions.ll llvm/test/Reduce/remove-attributes-from-intrinsics.ll llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83351.276573.patch Type: text/x-patch Size: 16837 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:08:04 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:08:04 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: NeHuang updated this revision to Diff 276572. NeHuang added a comment. - Merged two lit tests. - Removed the instruction after `bl .. at noc` and added comments.. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 Files: lld/ELF/Arch/PPC64.cpp lld/test/ELF/Inputs/ppc64-callee-global-hidden.s lld/test/ELF/ppc64-pcrel-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82816.276572.patch Type: text/x-patch Size: 8412 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:17:00 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:17:00 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: <275880f0ec0aad8cf6d06f59c2098555@localhost.localdomain> MaskRay accepted this revision. MaskRay added a comment. This revision is now accepted and ready to land. LGTM. Hope @sfertile can confirm. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:60 + .localentry caller1, 1 + # nop is not needed after bl for R_PPC64_REL24_NOTOC + bl callee1_stother0_default at notoc ---------------- We use `## ` for comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Wed Jul 8 15:17:14 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:17:14 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <1a1b7156641a3d505167cc5daedc90c2@localhost.localdomain> hubert.reinterpretcast added inline comments. ================ Comment at: llvm/docs/CodingStandards.rst:1305 +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged. + ---------------- Even if what's available in the context of the code is a pair of iterators and a callable (and using a range-based for loop would involve extra boilerplate)? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 From llvm-commits at lists.llvm.org Wed Jul 8 15:18:23 2020 From: llvm-commits at lists.llvm.org (Nuno Lopes via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:18:23 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: nlopes added a comment. In D83360#2140241 , @craig.topper wrote: > Alive does like this https://alive2.llvm.org/ce/z/yhibbe which is what I was going to implement. right. There's a guaranteedNonPoison (or similar name) in ValueTracking that can be used I guess. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 15:19:03 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:19:03 +0000 (UTC) Subject: [PATCH] D83437: [PowerPC] Enable default support of quad precision operations Message-ID: lei created this revision. Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya, nemanjai. Herald added a project: LLVM. Remove option guarding support of quad precision operations. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83437 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll llvm/test/CodeGen/PowerPC/constant-pool.ll llvm/test/CodeGen/PowerPC/f128-aggregates.ll llvm/test/CodeGen/PowerPC/f128-arith.ll llvm/test/CodeGen/PowerPC/f128-bitcast.ll llvm/test/CodeGen/PowerPC/f128-compare.ll llvm/test/CodeGen/PowerPC/f128-conv.ll llvm/test/CodeGen/PowerPC/f128-fma.ll llvm/test/CodeGen/PowerPC/f128-passByValue.ll llvm/test/CodeGen/PowerPC/f128-rounding.ll llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll llvm/test/CodeGen/PowerPC/f128-vecExtractNconv.ll llvm/test/CodeGen/PowerPC/float-load-store-pair.ll llvm/test/CodeGen/PowerPC/fp-strict-f128.ll llvm/test/CodeGen/PowerPC/global-address-non-got-indirect-access.ll llvm/test/CodeGen/PowerPC/pcrel-got-indirect.ll llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll llvm/test/CodeGen/PowerPC/recipest.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83437.276579.patch Type: text/x-patch Size: 24112 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:19:06 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:19:06 +0000 (UTC) Subject: [PATCH] D83432: [NFC][llvm-reduce] Don't `rm -rf` in tests, `rm -f` is enough In-Reply-To: References: Message-ID: <63fe20f34f55718b3c552d4bbbaec5eb@localhost.localdomain> MaskRay accepted this revision. MaskRay added a comment. Thanks! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83432/new/ https://reviews.llvm.org/D83432 From llvm-commits at lists.llvm.org Wed Jul 8 15:20:27 2020 From: llvm-commits at lists.llvm.org (Andrew Litteken via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:20:27 +0000 (UTC) Subject: [PATCH] D82730: [SimplifyCFG] Merge identical basic blocks (WIP) In-Reply-To: References: Message-ID: <3b2cd05aa133e4c4739dd60b932d5dc3@localhost.localdomain> AndrewLitteken added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp:210 + + // Map from instructions in one block to instructions in the other. + SmallDenseMap Map; ---------------- I have a similar checking mechanism for checking if two sections of IR are similar, so it might make sense to create a shared utility function for the instruction mapping/comparison section. Where a basic block or range of instructions is passed in and the "`isSameOperationAs`" check for the range is performed, then if we develop new mechanisms for whether these sections are the same operations both passes get the same gains. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82730/new/ https://reviews.llvm.org/D82730 From llvm-commits at lists.llvm.org Wed Jul 8 15:20:29 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:20:29 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <94f50c30f54fbe7ae5d9c14fe22bc529@localhost.localdomain> lebedev.ri added inline comments. ================ Comment at: llvm/docs/CodingStandards.rst:1305 +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged. + ---------------- hubert.reinterpretcast wrote: > Even if what's available in the context of the code is a pair of iterators and a callable (and using a range-based for loop would involve extra boilerplate)? 
That is my understanding, yes. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 From llvm-commits at lists.llvm.org Wed Jul 8 15:25:12 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:25:12 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <20ec4eb17defee92aa65b7a29bbe3ff2@localhost.localdomain> hubert.reinterpretcast added inline comments. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:849 +static SmallString<32> parseParaType(uint32_t Value, unsigned ParaNum) { + SmallString<32> ParaType; + for (unsigned I = 0; I < ParaNum; ++I) { ---------------- jasonliu wrote: > Why always 32? As I mentioned in the other comment, ParaNum have implication for how large your SmallString could be. The template argument needs to be a compile time constant. 
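To illustrate the point about the template argument: the inline capacity is part of the type, so it has to be known at compile time, while the number of parameters is only known at run time. The sketch below is a minimal stand-in, not the XCOFF code under review. ToySmallString, buildParaTypes, and the "i" placeholder are hypothetical names, and the toy class only tracks whether the contents would still fit in the inline capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a small-buffer string. InlineCapacity is a
// template argument, so it must be a compile-time constant.
template <std::size_t InlineCapacity> struct ToySmallString {
  std::string Data; // Toy storage; a real small string keeps short data inline.

  void push_back(char C) { Data.push_back(C); }

  // True while the contents would still fit in the inline buffer.
  bool fitsInline() const { return Data.size() <= InlineCapacity; }
};

// ParaNum is a run-time value, so it cannot be used as the template argument.
// The capacity is a fixed guess; longer results are still handled, they just
// no longer fit inline.
inline ToySmallString<32> buildParaTypes(unsigned ParaNum) {
  ToySmallString<32> Result;
  for (unsigned I = 0; I < ParaNum; ++I) {
    if (I != 0) {
      Result.push_back(',');
      Result.push_back(' ');
    }
    Result.push_back('i'); // Placeholder type tag, purely illustrative.
  }
  return Result;
}
```

Under these assumptions, a call with a small ParaNum stays within the inline capacity and a large one exceeds it, which is the trade-off behind picking a fixed number such as 32.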
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Wed Jul 8 15:27:03 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Wed, 08 Jul 2020 15:27:03 -0700 (PDT) Subject: [llvm] ac0af12 - [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison Message-ID: <5f064837.1c69fb81.98439.25ba@mx.google.com> Author: Craig Topper Date: 2020-07-08T15:24:55-07:00 New Revision: ac0af12ed2fc60cba494a7e5df1778fd6dedd481 URL: https://github.com/llvm/llvm-project/commit/ac0af12ed2fc60cba494a7e5df1778fd6dedd481 DIFF: https://github.com/llvm/llvm-project/commit/ac0af12ed2fc60cba494a7e5df1778fd6dedd481.diff LOG: [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison Part of addressing post-commit feedback from D83360 Added: Modified: llvm/test/Transforms/InstSimplify/select.ll Removed: ################################################################################ diff --git a/llvm/test/Transforms/InstSimplify/select.ll b/llvm/test/Transforms/InstSimplify/select.ll index 81fc3ff186cd..0f43c8f61945 100644 --- a/llvm/test/Transforms/InstSimplify/select.ll +++ b/llvm/test/Transforms/InstSimplify/select.ll @@ -790,3 +790,61 @@ define <2 x i32> @true_undef_vec(i1 %cond, <2 x i32> %x) { %s = select i1 %cond, <2 x i32> undef, <2 x i32> %x ret <2 x i32> %s } + +; These can be folded because the other value is guaranteed not to be poison. 
+define i32 @false_undef_true_constant(i1 %cond) { +; CHECK-LABEL: @false_undef_true_constant( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 10, i32 undef +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 %cond, i32 10, i32 undef + ret i32 %s +} + +define i32 @true_undef_false_constant(i1 %cond) { +; CHECK-LABEL: @true_undef_false_constant( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 20 +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 %cond, i32 undef, i32 20 + ret i32 %s +} + +define <2 x i32> @false_undef_true_constant_vec(i1 %cond) { +; CHECK-LABEL: @false_undef_true_constant_vec( +; CHECK-NEXT: ret <2 x i32> +; + %s = select i1 %cond, <2 x i32> , <2 x i32> undef + ret <2 x i32> %s +} + +define <2 x i32> @true_undef_false_constant_vec(i1 %cond) { +; CHECK-LABEL: @true_undef_false_constant_vec( +; CHECK-NEXT: ret <2 x i32> +; + %s = select i1 %cond, <2 x i32> undef, <2 x i32> + ret <2 x i32> %s +} + +; If one input is undef and the other is freeze, we can fold it to the freeze. 
+define i32 @false_undef_true_freeze(i1 %cond, i32 %x) { +; CHECK-LABEL: @false_undef_true_freeze( +; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[XF]], i32 undef +; CHECK-NEXT: ret i32 [[S]] +; + %xf = freeze i32 %x + %s = select i1 %cond, i32 %xf, i32 undef + ret i32 %s +} + +define i32 @false_undef_false_freeze(i1 %cond, i32 %x) { +; CHECK-LABEL: @false_undef_false_freeze( +; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[XF]] +; CHECK-NEXT: ret i32 [[S]] +; + %xf = freeze i32 %x + %s = select i1 %cond, i32 undef, i32 %xf + ret i32 %s +} From llvm-commits at lists.llvm.org Wed Jul 8 15:27:07 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:27:07 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <371582704abf44b1e9052c1c81897879@localhost.localdomain> nickdesaulniers added a comment. > As per lengthy/heated disscussion in D83351 , Probably could drop those adjectives. =P Also, s/disscussion/discussion/. > using for_each is potentially confusing, I guess @dblaikie did use the term confusing , but it might be useful to better reflect his point about confusion in regards to inconsistent style in the commit message. It certainly begs the question otherwise. ================ Comment at: llvm/docs/CodingStandards.rst:1305 +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged. + ---------------- lebedev.ri wrote: > hubert.reinterpretcast wrote: > > Even if what's available in the context of the code is a pair of iterators and a callable (and using a range-based for loop would involve extra boilerplate)? > That is my understanding, yes. it might be useful to provide more information here. Use of X is discouraged because ... 
IIUC @dblaikie 's points in https://reviews.llvm.org/D83351#2139727 and https://reviews.llvm.org/D83351#2139861 were that: 1. more concise error messages. 2. more concise unless an existing function/method/lambda already exists. I love functional programming styles, but if an inline lambda definition isn't shorter than a range-for, I agree with @dblaikie and feel a lambda is overkill. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 From llvm-commits at lists.llvm.org Wed Jul 8 15:27:46 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:27:46 +0000 (UTC) Subject: [PATCH] D83439: [NFC] Change getEntryForPercentile to be a static function in ProfileSummaryBuilder Message-ID: wmi created this revision. wmi added reviewers: davidxl, wenlei, xur. Herald added subscribers: hiraditya, eraman. Herald added a project: LLVM. Change file static function getEntryForPercentile to be a static member function in ProfileSummaryBuilder so it can be used in other files. Repository: rL LLVM https://reviews.llvm.org/D83439 Files: llvm/include/llvm/ProfileData/ProfileCommon.h llvm/lib/Analysis/ProfileSummaryInfo.cpp llvm/lib/ProfileData/ProfileSummaryBuilder.cpp Index: llvm/lib/ProfileData/ProfileSummaryBuilder.cpp =================================================================== --- llvm/lib/ProfileData/ProfileSummaryBuilder.cpp +++ llvm/lib/ProfileData/ProfileSummaryBuilder.cpp @@ -31,6 +31,19 @@ const ArrayRef ProfileSummaryBuilder::DefaultCutoffs = DefaultCutoffsData; +const ProfileSummaryEntry & +ProfileSummaryBuilder::getEntryForPercentile(SummaryEntryVector &DS, + uint64_t Percentile) { + auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) { + return Entry.Cutoff < Percentile; + }); + // The required percentile has to be <= one of the percentiles in the + // detailed summary. 
+ if (It == DS.end()) + report_fatal_error("Desired percentile exceeds the maximum cutoff"); + return *It; +} + void InstrProfSummaryBuilder::addRecord(const InstrProfRecord &R) { // The first counter is not necessarily an entry count for IR // instrumentation profiles. Index: llvm/lib/Analysis/ProfileSummaryInfo.cpp =================================================================== --- llvm/lib/Analysis/ProfileSummaryInfo.cpp +++ llvm/lib/Analysis/ProfileSummaryInfo.cpp @@ -19,6 +19,7 @@ #include "llvm/IR/Module.h" #include "llvm/IR/ProfileSummary.h" #include "llvm/InitializePasses.h" +#include "llvm/ProfileData/ProfileCommon.h" #include "llvm/Support/CommandLine.h" using namespace llvm; @@ -70,19 +71,6 @@ "partial-profile", cl::Hidden, cl::init(false), cl::desc("Specify the current profile is used as a partial profile.")); -// Find the summary entry for a desired percentile of counts. -static const ProfileSummaryEntry &getEntryForPercentile(SummaryEntryVector &DS, - uint64_t Percentile) { - auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) { - return Entry.Cutoff < Percentile; - }); - // The required percentile has to be <= one of the percentiles in the - // detailed summary. - if (It == DS.end()) - report_fatal_error("Desired percentile exceeds the maximum cutoff"); - return *It; -} - // The profile summary metadata may be attached either by the frontend or by // any backend passes (IR level instrumentation, for example). 
This method // checks if the Summary is null and if so checks if the summary metadata is now @@ -270,13 +258,13 @@ if (!computeSummary()) return; auto &DetailedSummary = Summary->getDetailedSummary(); - auto &HotEntry = - getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffHot); + auto &HotEntry = ProfileSummaryBuilder::getEntryForPercentile( + DetailedSummary, ProfileSummaryCutoffHot); HotCountThreshold = HotEntry.MinCount; if (ProfileSummaryHotCount.getNumOccurrences() > 0) HotCountThreshold = ProfileSummaryHotCount; - auto &ColdEntry = - getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffCold); + auto &ColdEntry = ProfileSummaryBuilder::getEntryForPercentile( + DetailedSummary, ProfileSummaryCutoffCold); ColdCountThreshold = ColdEntry.MinCount; if (ProfileSummaryColdCount.getNumOccurrences() > 0) ColdCountThreshold = ProfileSummaryColdCount; @@ -296,8 +284,8 @@ return iter->second; } auto &DetailedSummary = Summary->getDetailedSummary(); - auto &Entry = - getEntryForPercentile(DetailedSummary, PercentileCutoff); + auto &Entry = ProfileSummaryBuilder::getEntryForPercentile(DetailedSummary, + PercentileCutoff); uint64_t CountThreshold = Entry.MinCount; ThresholdCache[PercentileCutoff] = CountThreshold; return CountThreshold; Index: llvm/include/llvm/ProfileData/ProfileCommon.h =================================================================== --- llvm/include/llvm/ProfileData/ProfileCommon.h +++ llvm/include/llvm/ProfileData/ProfileCommon.h @@ -62,6 +62,10 @@ public: /// A vector of useful cutoff values for detailed summary. static const ArrayRef DefaultCutoffs; + + /// Find the summary entry for a desired percentile of counts. + static const ProfileSummaryEntry & + getEntryForPercentile(SummaryEntryVector &DS, uint64_t Percentile); }; class InstrProfSummaryBuilder final : public ProfileSummaryBuilder { -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83439.276580.patch Type: text/x-patch Size: 4309 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:31:10 2020 From: llvm-commits at lists.llvm.org (Sam Elliott via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:31:10 +0000 (UTC) Subject: [PATCH] D71124: [RISCV] support clang driver to select cpu In-Reply-To: References: Message-ID: <1ab0eb6f9b2630e7a1dc29d1772f241b@localhost.localdomain> lenary added a comment. I realise this is almost certainly something we want to land before the LLVM 11 branch date, as we included schedules in LLVM 10 with no way to use them, and would like users to be able to use them. I'll bring it up on the call tomorrow - I hope this PR implements what we agreed from the previous calls. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71124/new/ https://reviews.llvm.org/D71124 From llvm-commits at lists.llvm.org Wed Jul 8 15:33:00 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Wed, 08 Jul 2020 15:33:00 -0700 (PDT) Subject: [llvm] 2ec5fc0 - DAG: Remove redundant handling of reg fixups Message-ID: <5f06499c.1c69fb81.c351e.25b7@mx.google.com> Author: Matt Arsenault Date: 2020-07-08T18:32:43-04:00 New Revision: 2ec5fc0c61fb4472bd5f9ea71130cdba215ed9a8 URL: https://github.com/llvm/llvm-project/commit/2ec5fc0c61fb4472bd5f9ea71130cdba215ed9a8 DIFF: https://github.com/llvm/llvm-project/commit/2ec5fc0c61fb4472bd5f9ea71130cdba215ed9a8.diff LOG: DAG: Remove redundant handling of reg fixups It looks like 9cac4e6d1403554b06ec2fc9d834087b1234b695 accidentally added a second copy of this from a bad rebase or something. This second copy was added, and the finalizeLowering call was not deleted as intended. 
Added: Modified: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp index 51afccdcb645..df0ce502a059 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp @@ -661,34 +661,6 @@ bool SelectionDAGISel::runOnMachineFunction(MachineFunction &mf) { // Determine if floating point is used for msvc computeUsesMSVCFloatingPoint(TM.getTargetTriple(), Fn, MF->getMMI()); - // Replace forward-declared registers with the registers containing - // the desired value. - for (DenseMap::iterator - I = FuncInfo->RegFixups.begin(), E = FuncInfo->RegFixups.end(); - I != E; ++I) { - Register From = I->first; - Register To = I->second; - // If To is also scheduled to be replaced, find what its ultimate - // replacement is. - while (true) { - DenseMap::iterator J = FuncInfo->RegFixups.find(To); - if (J == E) break; - To = J->second; - } - // Make sure the new register has a sufficiently constrained register class. - if (Register::isVirtualRegister(From) && Register::isVirtualRegister(To)) - MRI.constrainRegClass(To, MRI.getRegClass(From)); - // Replace it. - - - // Replacing one register with another won't touch the kill flags. - // We need to conservatively clear the kill flags as a kill on the old - // register might dominate existing uses of the new register. - if (!MRI.use_empty(To)) - MRI.clearKillFlags(From); - MRI.replaceRegWith(From, To); - } - TLI->finalizeLowering(*MF); // Release function-specific state. 
SDB and CurDAG are already cleared From llvm-commits at lists.llvm.org Wed Jul 8 15:34:36 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:34:36 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: <680eff364532c218d8e8dc3748326246@localhost.localdomain> wmi marked an inline comment as done. wmi added a comment. refactor GetEntryForPercentile out in https://reviews.llvm.org/D83439 Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81981/new/ https://reviews.llvm.org/D81981 From llvm-commits at lists.llvm.org Wed Jul 8 15:39:58 2020 From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:39:58 +0000 (UTC) Subject: [PATCH] D83439: [NFC] Change getEntryForPercentile to be a static function in ProfileSummaryBuilder In-Reply-To: References: Message-ID: <5f54d04749a41f476343aba66524a819@localhost.localdomain> davidxl accepted this revision. davidxl added a comment. This revision is now accepted and ready to land. 
lgtm Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83439/new/ https://reviews.llvm.org/D83439 From llvm-commits at lists.llvm.org Wed Jul 8 15:40:37 2020 From: llvm-commits at lists.llvm.org (LLVM GN Syncbot via llvm-commits) Date: Wed, 08 Jul 2020 15:40:37 -0700 (PDT) Subject: [llvm] 3101fc6 - [gn build] Port d999cbc9883 Message-ID: <5f064b65.1c69fb81.14da2.2698@mx.google.com> Author: LLVM GN Syncbot Date: 2020-07-08T22:37:03Z New Revision: 3101fc692d2443226e749b8a643603efee695acc URL: https://github.com/llvm/llvm-project/commit/3101fc692d2443226e749b8a643603efee695acc DIFF: https://github.com/llvm/llvm-project/commit/3101fc692d2443226e749b8a643603efee695acc.diff LOG: [gn build] Port d999cbc9883 Added: Modified: llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn b/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn index 38bbb68d64f3..d1fc6ad4d979 100644 --- a/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn +++ b/llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn @@ -158,6 +158,7 @@ copy("Headers") { "opencl-c.h", "openmp_wrappers/__clang_openmp_device_functions.h", "openmp_wrappers/cmath", + "openmp_wrappers/complex.h", "openmp_wrappers/math.h", "pconfigintrin.h", "pkuintrin.h", From llvm-commits at lists.llvm.org Wed Jul 8 15:41:05 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:41:05 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <5b132ecd1ac176263944c690871b9dec@localhost.localdomain> lebedev.ri updated this revision to Diff 276585. lebedev.ri added a comment. It's okay if no new lambda is needed though.
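A minimal sketch of the distinction being discussed in this review, using only the standard library. The function names and the Accumulator type are hypothetical and are not taken from the patch. The first function writes a lambda only to feed for_each, the second is the equivalent range-based for loop, and the third shows the case where a callable already exists.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Discouraged shape: a new lambda written only to be passed to for_each.
inline unsigned sumWithForEach(const std::vector<unsigned> &Values) {
  unsigned Sum = 0;
  std::for_each(Values.begin(), Values.end(), [&](unsigned V) { Sum += V; });
  return Sum;
}

// Preferred shape: the same loop as a range-based for.
inline unsigned sumWithRangeFor(const std::vector<unsigned> &Values) {
  unsigned Sum = 0;
  for (unsigned V : Values)
    Sum += V;
  return Sum;
}

// Hypothetical pre-existing callable object.
struct Accumulator {
  unsigned Sum = 0;
  void operator()(unsigned V) { Sum += V; }
};

// When the callable already exists, for_each adds no extra boilerplate.
// std::for_each returns the function object it was given.
inline unsigned sumWithExistingCallable(const std::vector<unsigned> &Values) {
  return std::for_each(Values.begin(), Values.end(), Accumulator{}).Sum;
}
```

All three compute the same result; the difference is only in how much new code has to be written at the call site.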
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 Files: llvm/docs/CodingStandards.rst Index: llvm/docs/CodingStandards.rst =================================================================== --- llvm/docs/CodingStandards.rst +++ llvm/docs/CodingStandards.rst @@ -1302,6 +1302,9 @@ for (Instruction &I : *BB) ... use I ... +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged, +unless the the callable object already exists. + Don't evaluate ``end()`` every time through a loop ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -------------- next part -------------- A non-text attachment was scrubbed... Name: D83431.276585.patch Type: text/x-patch Size: 479 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:41:12 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:41:12 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <1c55d099e6ec5e6d2485452b91d3088e@localhost.localdomain> lebedev.ri added inline comments. ================ Comment at: llvm/docs/CodingStandards.rst:1305 +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged. + ---------------- nickdesaulniers wrote: > lebedev.ri wrote: > > hubert.reinterpretcast wrote: > > > Even if what's available in the context of the code is a pair of iterators and a callable (and using a range-based for loop would involve extra boilerplate)? > > That is my understanding, yes. > it might be useful to provide more information here. > > Use of X is discouraged because ... > > IIUC @dblaikie 's points in https://reviews.llvm.org/D83351#2139727 and https://reviews.llvm.org/D83351#2139861 were that: > 1. more concise error messages. > 2. more concise unless an existing function/method/lambda already exists. 
> > I love functional programming styles, but if an inline lambda definition isn't shorter than a range-for, I agree with @dblaikie and feel a lambda is overkill. I'm not really sure I understand the point about error message. Is this better? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 From llvm-commits at lists.llvm.org Wed Jul 8 15:41:14 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:41:14 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison Message-ID: craig.topper created this revision. craig.topper added reviewers: spatel, reames, nlopes, lebedev.ri, efriedma, majnemer. Herald added a subscriber: hiraditya. Herald added a project: LLVM. Follow up from the transform being removed in D83360 . If X is provably not poison, then the transform is safe. Still plan to remove or adjust the code from ConstantFolding after this. Also need to audit the partial undef vector handling in InstSimplify just below this code. https://reviews.llvm.org/D83440 Files: llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstSimplify/select.ll Index: llvm/test/Transforms/InstSimplify/select.ll =================================================================== --- llvm/test/Transforms/InstSimplify/select.ll +++ llvm/test/Transforms/InstSimplify/select.ll @@ -794,8 +794,7 @@ ; These can be folded because the other value is guaranteed not to be poison.
define i32 @false_undef_true_constant(i1 %cond) { ; CHECK-LABEL: @false_undef_true_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 10, i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 10 ; %s = select i1 %cond, i32 10, i32 undef ret i32 %s @@ -803,8 +802,7 @@ define i32 @true_undef_false_constant(i1 %cond) { ; CHECK-LABEL: @true_undef_false_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 20 -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 20 ; %s = select i1 %cond, i32 undef, i32 20 ret i32 %s @@ -830,8 +828,7 @@ define i32 @false_undef_true_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_true_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[XF]], i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 %xf, i32 undef @@ -841,8 +838,7 @@ define i32 @false_undef_false_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_false_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[XF]] -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 undef, i32 %xf Index: llvm/lib/Analysis/InstructionSimplify.cpp =================================================================== --- llvm/lib/Analysis/InstructionSimplify.cpp +++ llvm/lib/Analysis/InstructionSimplify.cpp @@ -4118,6 +4118,15 @@ if (TrueVal == FalseVal) return TrueVal; + // If the true or false value is undef, we can fold to the other value as + // long as the other value isn't poison. 
+ // select ?, undef, X -> X + if (isa<UndefValue>(TrueVal) && isGuaranteedNotToBeUndefOrPoison(FalseVal)) + return FalseVal; + // select ?, X, undef -> X + if (isa<UndefValue>(FalseVal) && isGuaranteedNotToBeUndefOrPoison(TrueVal)) + return TrueVal; + // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && -------------- next part -------------- A non-text attachment was scrubbed... Name: D83440.276583.patch Type: text/x-patch Size: 2535 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:46:17 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:46:17 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: <8cc88a15a40b6111edc8be7d5eff9db4@localhost.localdomain> craig.topper marked an inline comment as done. craig.topper added inline comments. ================ Comment at: llvm/lib/Analysis/InstructionSimplify.cpp:4124 + // select ?, undef, X -> X + if (isa<UndefValue>(TrueVal) && isGuaranteedNotToBeUndefOrPoison(FalseVal)) + return FalseVal; ---------------- Should I be passing CxtI and DT here? I based this off the code in SimplifyInsertElementInst, but maybe I should have based it off of SimplifyFreezeInst? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 From llvm-commits at lists.llvm.org Wed Jul 8 15:46:27 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:46:27 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: wmi updated this revision to Diff 276586. wmi added a comment. Address David's comment. Adjust comments, function names and flag names.
Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81981/new/ https://reviews.llvm.org/D81981 Files: llvm/include/llvm/ProfileData/InstrProf.h llvm/include/llvm/ProfileData/InstrProfWriter.h llvm/lib/ProfileData/InstrProf.cpp llvm/lib/ProfileData/InstrProfWriter.cpp llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test llvm/tools/llvm-profdata/llvm-profdata.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D81981.276586.patch Type: text/x-patch Size: 17661 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:48:27 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Wed, 08 Jul 2020 15:48:27 -0700 (PDT) Subject: [llvm] 18bd821 - DAG: Remove redundant finalizeLowering call Message-ID: <5f064d3b.1c69fb81.5d2fc.268c@mx.google.com> Author: Matt Arsenault Date: 2020-07-08T18:48:20-04:00 New Revision: 18bd821f02261065a2235e43c7290b57d55224c8 URL: https://github.com/llvm/llvm-project/commit/18bd821f02261065a2235e43c7290b57d55224c8 DIFF: https://github.com/llvm/llvm-project/commit/18bd821f02261065a2235e43c7290b57d55224c8.diff LOG: DAG: Remove redundant finalizeLowering call 9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate this, and move all isel pseudo expansion to FinalizeISel. This was a bad rebase or something, and failed to actually delete this call. GlobalISel also has a redundant call of finalizeLowering. However, it requires more work to remove it since it currently triggers a lot of verifier errors in tests. 
Added: Modified: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp index df0ce502a059..1f0432196a2d 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp @@ -661,8 +661,6 @@ bool SelectionDAGISel::runOnMachineFunction(MachineFunction &mf) { // Determine if floating point is used for msvc computeUsesMSVCFloatingPoint(TM.getTargetTriple(), Fn, MF->getMMI()); - TLI->finalizeLowering(*MF); - // Release function-specific state. SDB and CurDAG are already cleared // at this point. FuncInfo->clear(); From llvm-commits at lists.llvm.org Wed Jul 8 15:54:27 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:54:27 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: wmi updated this revision to Diff 276588. wmi added a comment. Fix a wrong flag name in test. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81981/new/ https://reviews.llvm.org/D81981 Files: llvm/include/llvm/ProfileData/InstrProf.h llvm/include/llvm/ProfileData/InstrProfWriter.h llvm/lib/ProfileData/InstrProf.cpp llvm/lib/ProfileData/InstrProfWriter.cpp llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test llvm/tools/llvm-profdata/llvm-profdata.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81981.276588.patch Type: text/x-patch Size: 17664 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:57:58 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Wed, 8 Jul 2020 15:57:58 -0700 Subject: [PATCH] D83355: [flang] upstream intrinsic call lowering In-Reply-To: References: Message-ID: So this caused a dozen more _Complex warnings and other warnings. -eric On Wed, Jul 8, 2020 at 7:34 AM Eric Schweitz via Phabricator via llvm-commits wrote: > This revision was automatically updated to reflect the committed changes. > Closed by commit rG24b62f28c5da: [flang] Upstreaming intrinsic call > lowering. (authored by schweitz). > > Repository: > rG LLVM Github Monorepo > > CHANGES SINCE LAST ACTION > https://reviews.llvm.org/D83355/new/ > > https://reviews.llvm.org/D83355 > > Files: > flang/include/flang/Lower/CharacterExpr.h > flang/include/flang/Lower/IntrinsicCall.h > flang/include/flang/Lower/Mangler.h > flang/include/flang/Optimizer/Dialect/FIRType.h > flang/lib/Lower/CMakeLists.txt > flang/lib/Lower/CharacterExpr.cpp > flang/lib/Lower/IntrinsicCall.cpp > flang/lib/Lower/Mangler.cpp > flang/lib/Optimizer/Dialect/FIRType.cpp > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:06:20 2020 From: llvm-commits at lists.llvm.org (Alexey Lapshin via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:06:20 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: <06c1cb72cadb32a1027d5c38511dc651@localhost.localdomain> avl updated this revision to Diff 276591. avl added a comment. addressed comments. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 Files: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp llvm/test/Transforms/TailCallElim/basic.ll llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82085.276591.patch Type: text/x-patch Size: 13610 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:07:22 2020 From: llvm-commits at lists.llvm.org (Artem Belevich via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:07:22 +0000 (UTC) Subject: [PATCH] D83423: [MC, NVPTX] Add MCAsmPrinter support for unsigned-only data directives. In-Reply-To: References: Message-ID: <416f96f7cddafffe22c5a37ed4ca7a7e@localhost.localdomain> tra updated this revision to Diff 276592. tra edited the summary of this revision. tra added a comment. Updated existing test which produced data directive w/ negative value. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83423/new/ https://reviews.llvm.org/D83423 Files: llvm/include/llvm/MC/MCAsmInfo.h llvm/lib/MC/MCExpr.cpp llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp llvm/test/DebugInfo/NVPTX/packed_bitfields.ll Index: llvm/test/DebugInfo/NVPTX/packed_bitfields.ll =================================================================== --- llvm/test/DebugInfo/NVPTX/packed_bitfields.ll +++ llvm/test/DebugInfo/NVPTX/packed_bitfields.ll @@ -14,7 +14,8 @@ ; CHECK: .b8 3 // DW_AT_decl_line ; CHECK-NEXT: .b8 1 // DW_AT_byte_size ; CHECK-NEXT: .b8 6 // DW_AT_bit_size -; CHECK-NEXT: .b64 -1 // DW_AT_bit_offset +; Negative offset must be encoded as an unsigned integer. 
+; CHECK-NEXT: .b64 18446744073709551615 // DW_AT_bit_offset ; CHECK-NEXT: .b8 2 // DW_AT_data_member_location %struct.anon = type { i16 } Index: llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp =================================================================== --- llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp +++ llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp @@ -47,6 +47,7 @@ AscizDirective = nullptr; // not supported SupportsQuotedNames = false; SupportsExtendedDwarfLocDirective = false; + SupportsSignedData = false; // @TODO: Can we just disable this? WeakDirective = "\t// .weak\t"; Index: llvm/lib/MC/MCExpr.cpp =================================================================== --- llvm/lib/MC/MCExpr.cpp +++ llvm/lib/MC/MCExpr.cpp @@ -65,6 +65,8 @@ OS << format("0x%016" PRIx64, Value); break; } + else if (MAI && !MAI->supportsSignedData()) + OS << static_cast<uint64_t>(Value); else OS << Value; return; Index: llvm/include/llvm/MC/MCAsmInfo.h =================================================================== --- llvm/include/llvm/MC/MCAsmInfo.h +++ llvm/include/llvm/MC/MCAsmInfo.h @@ -209,6 +209,9 @@ const char *Data32bitsDirective; const char *Data64bitsDirective; + /// True if data directives support signed values + bool SupportsSignedData = true; + /// If non-null, a directive that is used to emit a word which should be /// relocated as a 64-bit GP-relative offset, e.g. .gpdword on Mips. Defaults /// to nullptr.
@@ -436,6 +439,7 @@ const char *getData16bitsDirective() const { return Data16bitsDirective; } const char *getData32bitsDirective() const { return Data32bitsDirective; } const char *getData64bitsDirective() const { return Data64bitsDirective; } + bool supportsSignedData() const { return SupportsSignedData; } const char *getGPRel64Directive() const { return GPRel64Directive; } const char *getGPRel32Directive() const { return GPRel32Directive; } const char *getDTPRel64Directive() const { return DTPRel64Directive; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83423.276592.patch Type: text/x-patch Size: 2568 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:08:11 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via llvm-commits) Date: Wed, 08 Jul 2020 16:08:11 -0700 (PDT) Subject: [compiler-rt] 158feab - [Sanitizer]: Require !android for protoent test Message-ID: <5f0651db.1c69fb81.436ee.1f54@mx.google.com> Author: Gui Andrade Date: 2020-07-08T23:07:59Z New Revision: 158feabde4cb98021469ed4126682d8ee57456eb URL: https://github.com/llvm/llvm-project/commit/158feabde4cb98021469ed4126682d8ee57456eb DIFF: https://github.com/llvm/llvm-project/commit/158feabde4cb98021469ed4126682d8ee57456eb.diff LOG: [Sanitizer]: Require !android for protoent test Added: Modified: compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp Removed: ################################################################################ diff --git a/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp b/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp index 1b4e90a407ac..defc38ae2b1e 100644 --- a/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp +++ b/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp @@ -1,4 +1,5 @@ // RUN: %clangxx -std=c++11 -O0 -g %s -o %t && %run %t 2>&1 | FileCheck %s +// REQUIRES: !android #include #include From llvm-commits at lists.llvm.org Wed Jul 
8 16:15:17 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:15:17 +0000 (UTC) Subject: [PATCH] D82233: [lit] Add --show command line option In-Reply-To: References: Message-ID: yln updated this revision to Diff 276593. yln marked an inline comment as done. yln added a comment. Add `--show-xxx` for all result codes in addition to unsupported and xfail. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82233/new/ https://reviews.llvm.org/D82233 Files: llvm/utils/lit/lit/cl_arguments.py llvm/utils/lit/lit/main.py llvm/utils/lit/tests/Inputs/show-result-codes/fail.txt llvm/utils/lit/tests/Inputs/show-result-codes/lit.cfg llvm/utils/lit/tests/Inputs/show-result-codes/pass.txt llvm/utils/lit/tests/Inputs/show-result-codes/unsupported.txt llvm/utils/lit/tests/Inputs/show-result-codes/xfail.txt llvm/utils/lit/tests/show-result-codes.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D82233.276593.patch Type: text/x-patch Size: 4741 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:20:51 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:20:51 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: efriedma added a comment. I think I'd like to see a testcase where there are multiple recursive calls, but only one is a tail call that can be eliminated. ================ Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:474 + // Operand Bundles or not marked as TailCall. + if (CI->isNoTailCall() || CI->hasOperandBundles() || !CI->isTailCall()) return nullptr; ---------------- The hasOperandBundles() check looks completely new; is there some test for it? The `isNoTailCall()` check is currently redundant; it isn't legal to write "tail notail".
I guess it makes sense to guard against that, though. ================ Comment at: llvm/test/Transforms/TailCallElim/basic.ll:23 +; CHECK: call i32 @test1 + %X = call i32 @test1() ; [#uses=1] ret i32 %X ---------------- I'm not sure this is testing what it was originally supposed to. I guess that's okay, but please fix the comment at least. ================ Comment at: llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll:20 +; Function Attrs: nofree noinline norecurse nounwind uwtable +define dso_local void @_Z15globalIncrementPKi(i32* nocapture readonly %param) local_unnamed_addr #0 { +entry: ---------------- For the purpose of this testcase, we don't need the definition of _Z15globalIncrementPKi. ================ Comment at: llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll:37 +; CHECK: br label %tailrecurse +; CHECK-NOT: call void @_Z4testi +; CHECK: ret ---------------- I think I'd prefer to just generate this with update_test_checks.py Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 From llvm-commits at lists.llvm.org Wed Jul 8 16:23:34 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:23:34 +0000 (UTC) Subject: [PATCH] D82233: [lit] Add --show-xxx command line options In-Reply-To: References: Message-ID: <9b321211900c939c0f3e3b856df3859f@localhost.localdomain> yln marked an inline comment as done. yln added a comment. @jdenny I changed our approach to just adding `--show-` flags for all non-failure result codes, just as we already do for `--show-xfail` and `--show-unsupported`. 
The help entries for the flags look like this: --show-excluded Show excluded tests (EXCLUDED) --show-skipped Show skipped tests (SKIPPED) --show-unsupported Show unsupported tests (UNSUPPORTED) --show-pass Show passed tests (PASS) --show-flakypass Show passed with retry tests (FLAKYPASS) --show-xfail Show expectedly failed tests (XFAIL) ================ Comment at: llvm/utils/lit/lit/cl_arguments.py:204 + else: + opts.shown_codes.add(lit.Test.ResultCode._instances[code.upper()]) ---------------- jdenny wrote: > yln wrote: > > jdenny wrote: > > > What happens if there are user-defined result codes that are spelled the same except for case? > > Unfortunately, this can't be used (yet) to specify user-defined result codes at all. User codes are usually registered in config files and this code executes before we evaluate configs, i.e., it will print `argument --show: invalid choice: 'user-code' (choose from ...)` > > > > If we think it's worth it then we could push "choice validation" to a later point after we processed the user configs. > I don't know that `--show` support for user-defined result codes needs to be implemented in this patch. > > In that case, the case-insensitivity issue I raised is not relevant yet, right? That can be addressed later then. > > In that case, the case-insensitivity issue I raised is not relevant yet, right? That's right! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82233/new/ https://reviews.llvm.org/D82233 From llvm-commits at lists.llvm.org Wed Jul 8 16:29:27 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:29:27 +0000 (UTC) Subject: [PATCH] D83432: [NFC][llvm-reduce] Don't `rm -rf` in tests, `rm -f` is enough In-Reply-To: References: Message-ID: <35f57070fcae1052f42fa999f6274169@localhost.localdomain> dblaikie added a comment. If the test is writing the output file anyway - is the rm necessary? 
(lots of tests write to output files via "-o %t" from some tool or another and most don't delete %t before doing so) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83432/new/ https://reviews.llvm.org/D83432 From llvm-commits at lists.llvm.org Wed Jul 8 16:31:03 2020 From: llvm-commits at lists.llvm.org (Mitch Phillips via llvm-commits) Date: Wed, 08 Jul 2020 16:31:03 -0700 (PDT) Subject: [llvm] 5a98581 - [NFC] Fix some docs warnings Message-ID: <5f065737.1c69fb81.fd0df.2ac5@mx.google.com> Author: Mitch Phillips Date: 2020-07-08T16:30:12-07:00 New Revision: 5a98581d196ba1ad9edaf36b1d1db122287b01eb URL: https://github.com/llvm/llvm-project/commit/5a98581d196ba1ad9edaf36b1d1db122287b01eb DIFF: https://github.com/llvm/llvm-project/commit/5a98581d196ba1ad9edaf36b1d1db122287b01eb.diff LOG: [NFC] Fix some docs warnings Summary: Fixes two minor issues in the docs present under `ninja docs-llvm-html`: 1 - A header is too small: ``` Warning, treated as error: llvm/llvm/docs/Passes.rst:70:Title underline too short. ``-basic-aa``: Basic Alias Analysis (stateless AA impl) ------------------------------------------------------ ``` 2 - Multiple definitions on a non-anonymous target (llvm-dev mailing list): ``` Warning, treated as error: llvm/llvm/docs/DeveloperPolicy.rst:3:Duplicate explicit target name: "llvm-dev mailing list". ``` Reviewers: lattner Reviewed By: lattner Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83416 Added: Modified: llvm/docs/DeveloperPolicy.rst llvm/docs/Passes.rst Removed: ################################################################################ diff --git a/llvm/docs/DeveloperPolicy.rst b/llvm/docs/DeveloperPolicy.rst index 8d424e63372a..b6e43ce0534e 100644 --- a/llvm/docs/DeveloperPolicy.rst +++ b/llvm/docs/DeveloperPolicy.rst @@ -535,7 +535,7 @@ will only be done through the following process: at a minimum. 
This time-based guideline is not strict: we may support much older compilers, or decide to support fewer versions. - * An RFC is sent to the `llvm-dev mailing list `_ + * An RFC is sent to the `llvm-dev mailing list`_ - Detail upsides of the version increase (e.g. which newer C++ language or library features LLVM should use; avoid miscompiles in particular compiler @@ -580,15 +580,15 @@ have the following general policies for introducing major new components into the LLVM world. However, this is really only intended to cover common cases that we have seen arise: different situations are different, and we are open to discussing unusual cases as well - just start an RFC thread on the -`llvm-dev mailing list `_. +`llvm-dev mailing list`_. Adding a New Target ------------------- LLVM is very receptive to new targets, even experimental ones, but a number of problems can appear when adding new large portions of code, and back-ends are -normally added in bulk. We have found that landing large pieces of new code -and then trying to fix emergent problems in-tree is problematic for a variety +normally added in bulk. We have found that landing large pieces of new code +and then trying to fix emergent problems in-tree is problematic for a variety of reasons. For these reasons, new targets are *always* added as *experimental* until @@ -627,8 +627,8 @@ The basic rules for a back-end to be upstreamed in **experimental** mode are: * The target should have either reasonable documentation on how it works (ISA, ABI, etc.) or a publicly available simulator/hardware (either free or cheap enough) - preferably both. This allows - developers to validate assumptions, understand constraints and review code - that can affect the target. + developers to validate assumptions, understand constraints and review code + that can affect the target. In addition, the rules for a back-end to be promoted to **official** are: @@ -699,7 +699,7 @@ targets", they: "should" concerns above.
If you have a project that you think would make sense to add to the LLVM -monorepo, please start an RFC thread on the llvm-dev mailing list to kick off +monorepo, please start an RFC thread on the `llvm-dev mailing list`_ to kick off the discussion. This process can take some time and iteration - please don’t be discouraged or intimidated by that! @@ -761,8 +761,7 @@ may be eventually retired, but no process has been established for that yet. If and when this comes up, please start an RFC discussion on llvm-dev. This process is very new - please expect the details to change, it is always -safe to ask on the `llvm-dev mailing list -`_ about this. +safe to ask on the `llvm-dev mailing list`_ about this. Suggested disclaimer for the project README and the main project web page: @@ -1033,3 +1032,5 @@ applications to the binary redistribution clause. This also means that it is ok to move code from (e.g.) libc++ to the LLVM core without concern, but that code cannot be moved from the LLVM core to libc++ without the copyright owner's permission. + +.. _llvm-dev mailing list: http://lists.llvm.org/mailman/listinfo/llvm-dev diff --git a/llvm/docs/Passes.rst b/llvm/docs/Passes.rst index 216b87a925d2..9a6c6944b96e 100644 --- a/llvm/docs/Passes.rst +++ b/llvm/docs/Passes.rst @@ -67,7 +67,7 @@ This is inspired and adapted from code by: Naveen Neelakantam, Francesco Spadini, and Wojciech Stryjewski. ``-basic-aa``: Basic Alias Analysis (stateless AA impl) ------------------------------------------------------- +------------------------------------------------------- A basic alias analysis pass that implements identities (two different globals cannot alias, etc), but does no stateful analysis.
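The first warning fixed by this commit ("Title underline too short" in Passes.rst) can in principle be caught before running `ninja docs-llvm-html`. What follows is only a minimal sketch and not part of the commit: the helper names `is_adornment` and `find_short_underlines` and the chosen set of adornment characters are assumptions made for illustration, and the check looks only at adjacent title/underline pairs rather than parsing reStructuredText properly.

```python
# Hypothetical helper, not from the LLVM tree: flags RST section titles
# whose underline is shorter than the title text.

# Assumed subset of punctuation commonly used for section adornments.
ADORNMENT_CHARS = set("=-~^\"'`#*+.:_")


def is_adornment(line):
    """True if the line is a run of one repeated adornment character."""
    return len(line) >= 2 and len(set(line)) == 1 and line[0] in ADORNMENT_CHARS


def find_short_underlines(text):
    """Return (underline_line_number, title) pairs with a too-short underline."""
    lines = text.splitlines()
    problems = []
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        # Skip blank titles, non-adornment lines, and overline/underline pairs.
        if not title or not is_adornment(underline) or is_adornment(title):
            continue
        if len(underline) < len(title):
            problems.append((i + 1, title))  # 1-based line number
    return problems


if __name__ == "__main__":
    title = "``-basic-aa``: Basic Alias Analysis (stateless AA impl)"
    before = title + "\n" + "-" * (len(title) - 1) + "\n"
    after = title + "\n" + "-" * len(title) + "\n"
    print(find_short_underlines(before))
    print(find_short_underlines(after))
```

Because it only compares neighbouring lines, the sketch can report false positives (for example a short run of dashes directly below ordinary paragraph text), so it is best treated as a quick pre-check rather than a replacement for the Sphinx build.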
From llvm-commits at lists.llvm.org Wed Jul 8 16:31:12 2020 From: llvm-commits at lists.llvm.org (Mitch Phillips via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:31:12 +0000 (UTC) Subject: [PATCH] D83416: [NFC] Fix some docs warnings In-Reply-To: References: Message-ID: <9f98881902f45c129001e81c5e144a07@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG5a98581d196b: [NFC] Fix some docs warnings (authored by hctim). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83416/new/ https://reviews.llvm.org/D83416 Files: llvm/docs/DeveloperPolicy.rst llvm/docs/Passes.rst Index: llvm/docs/Passes.rst =================================================================== --- llvm/docs/Passes.rst +++ llvm/docs/Passes.rst @@ -67,7 +67,7 @@ Spadini, and Wojciech Stryjewski. ``-basic-aa``: Basic Alias Analysis (stateless AA impl) ------------------------------------------------------- +------------------------------------------------------- A basic alias analysis pass that implements identities (two different globals cannot alias, etc), but does no stateful analysis. Index: llvm/docs/DeveloperPolicy.rst =================================================================== --- llvm/docs/DeveloperPolicy.rst +++ llvm/docs/DeveloperPolicy.rst @@ -535,7 +535,7 @@ at a minimum. This time-based guideline is not strict: we may support much older compilers, or decide to support fewer versions. - * An RFC is sent to the `llvm-dev mailing list `_ + * An RFC is sent to the `llvm-dev mailing list`_ - Detail upsides of the version increase (e.g. which newer C++ language or library features LLVM should use; avoid miscompiles in particular compiler @@ -580,15 +580,15 @@ the LLVM world. 
However, this is really only intended to cover common cases that we have seen arise: different situations are different, and we are open to discussing unusual cases as well - just start an RFC thread on the -`llvm-dev mailing list `_. +`llvm-dev mailing list`_. Adding a New Target ------------------- LLVM is very receptive to new targets, even experimental ones, but a number of problems can appear when adding new large portions of code, and back-ends are -normally added in bulk. We have found that landing large pieces of new code -and then trying to fix emergent problems in-tree is problematic for a variety +normally added in bulk. We have found that landing large pieces of new code +and then trying to fix emergent problems in-tree is problematic for a variety of reasons. For these reasons, new targets are *always* added as *experimental* until @@ -627,8 +627,8 @@ * The target should have either reasonable documentation on how it works (ISA, ABI, etc.) or a publicly available simulator/hardware (either free or cheap enough) - preferably both. This allows - developers to validate assumptions, understand constraints and review code - that can affect the target. + developers to validate assumptions, understand constraints and review code + that can affect the target. In addition, the rules for a back-end to be promoted to **official** are: @@ -699,7 +699,7 @@ "should" concerns above. If you have a project that you think would make sense to add to the LLVM -monorepo, please start an RFC thread on the llvm-dev mailing list to kick off +monorepo, please start an RFC thread on the `llvm-dev mailing list`_ to kick off the discussion. This process can take some time and iteration - please don’t be discouraged or intimidated by that! @@ -761,8 +761,7 @@ and when this comes up, please start an RFC discussion on llvm-dev. This process is very new - please expect the details to change, it is always -safe to ask on the `llvm-dev mailing list -`_ about this. 
+safe to ask on the `llvm-dev mailing list`_ about this. Suggested disclaimer for the project README and the main project web page: @@ -1033,3 +1032,5 @@ to move code from (e.g.) libc++ to the LLVM core without concern, but that code cannot be moved from the LLVM core to libc++ without the copyright owner's permission. + +.. _llvm-dev mailing list: http://lists.llvm.org/mailman/listinfo/llvm-dev -------------- next part -------------- A non-text attachment was scrubbed... Name: D83416.276598.patch Type: text/x-patch Size: 3737 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:34:01 2020 From: llvm-commits at lists.llvm.org (=?UTF-8?B?RsSBbmctcnXDrCBTw7JuZw==?= via llvm-commits) Date: Wed, 8 Jul 2020 16:34:01 -0700 Subject: [PATCH] D83037: [llvm-readobj] - Fix a crash scenario in GNUStyle::printHashSymbols(). In-Reply-To: References: Message-ID: Perhaps information wasn't propagated in time. For the two patches I landed today, the emails are good: """ This revision was automatically updated to reflect the committed changes. Closed by commit rG169ec2d6b006: [ELF] Rename canRelax to toExecRelax. NFC (authored by MaskRay). """ """ This revision was automatically updated to reflect the committed changes. Closed by commit rGe89c075f3251: [test] Run llvm/test/**/*.yaml & don't run llvm/test/**/*.cxx (not exist) (authored by MaskRay). """ On Tue, Jul 7, 2020 at 2:47 AM James Henderson wrote: > > Hi @Mehdi/MaskRay, > > Looks like something else odd is going on with Phabricator - I accepted this patch, but when it landed, I got this email saying it landed in a "Needs Review" state (see below). Probably there's something wrong with Phabricator again? > > James > > > On Tue, 7 Jul 2020 at 10:41, George Rimar via Phabricator wrote: >> >> This revision was not accepted when it landed; it landed in state "Needs Review". >> This revision was automatically updated to reflect the committed changes. 
>> Closed by commit rGd5cbf7ba3252: [llvm-readobj] - Fix a crash scenario in GNUStyle<ELFT>::printHashSymbols(). (authored by grimar). >> >> Repository: >> rG LLVM Github Monorepo >> >> CHANGES SINCE LAST ACTION >> https://reviews.llvm.org/D83037/new/ >> >> https://reviews.llvm.org/D83037 >> >> Files: >> llvm/test/tools/llvm-readobj/ELF/hash-symbols.test >> llvm/tools/llvm-readobj/ELFDumper.cpp >> -- 宋方睿 From llvm-commits at lists.llvm.org Wed Jul 8 16:41:48 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:41:48 +0000 (UTC) Subject: [PATCH] D83442: [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison Message-ID: craig.topper created this revision. craig.topper added reviewers: spatel, lebedev.ri, nlopes, eli.friedman, reames, majnemer. Herald added a subscriber: hiraditya. Herald added a project: LLVM. We can't fold to the non-undef value unless we know it isn't poison. So check each element with isGuaranteedNotToBeUndefOrPoison. This currently rules out all constant expressions. https://reviews.llvm.org/D83442 Files: llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstSimplify/select.ll Index: llvm/test/Transforms/InstSimplify/select.ll =================================================================== --- llvm/test/Transforms/InstSimplify/select.ll +++ llvm/test/Transforms/InstSimplify/select.ll @@ -848,3 +848,17 @@ %s = select i1 %cond, i32 undef, i32 %xf ret i32 %s } + + at g = external global i32, align 1 + +; Make sure we don't fold partial undef vectors when constexprs are involved. +; We would need to prove the constexpr doesn't result in poison which we aren't +; equiped to do yet. 
+define <2 x i32> @false_undef_true_constextpr_vec(i1 %cond) { +; CHECK-LABEL: @false_undef_true_constextpr_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> , <2 x i32> +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 %cond, <2 x i32> , <2 x i32> + ret <2 x i32> %s +} Index: llvm/lib/Analysis/InstructionSimplify.cpp =================================================================== --- llvm/lib/Analysis/InstructionSimplify.cpp +++ llvm/lib/Analysis/InstructionSimplify.cpp @@ -4135,9 +4135,11 @@ // one element is undef, choose the defined element as the safe result. if (TEltC == FEltC) NewC.push_back(TEltC); - else if (isa(TEltC)) + else if (isa(TEltC) && + isGuaranteedNotToBeUndefOrPoison(FEltC)) NewC.push_back(FEltC); - else if (isa(FEltC)) + else if (isa(FEltC) && + isGuaranteedNotToBeUndefOrPoison(TEltC)) NewC.push_back(TEltC); else break; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83442.276599.patch Type: text/x-patch Size: 1728 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:43:58 2020 From: llvm-commits at lists.llvm.org (Alexey Lapshin via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:43:58 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: <2c11f1be9e559eec1ea6ec3efd956ef7@localhost.localdomain> avl marked an inline comment as done. avl added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:474 + // Operand Bundles or not marked as TailCall. + if (CI->isNoTailCall() || CI->hasOperandBundles() || !CI->isTailCall()) return nullptr; ---------------- efriedma wrote: > The hasOperandBundles() check looks completely new; is there some test for it? > > The `isNoTailCall()` check is currently redundant; it isn't legal to write "tail notail". I guess it makes sense to guard against that, though. 
> The hasOperandBundles() check looks completely new; is there some test for it? It is not new; it is copied from line 245. Now that the patch has changed from its original state, all of the above conditions could be reduced to just `if (!CI->isTailCall())`. The test is Transforms/TailCallElim/deopt-bundle.ll. > The isNoTailCall() check is currently redundant; it isn't legal to write "tail notail". I guess it makes sense to guard against that, though. I will add a check for that. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 From llvm-commits at lists.llvm.org Wed Jul 8 16:45:18 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:45:18 +0000 (UTC) Subject: [PATCH] D82233: [lit] Add --show-xxx command line options In-Reply-To: References: Message-ID: <8de2ac87ace49407a1cb2a2524ea7619@localhost.localdomain> jdenny accepted this revision. jdenny added a comment. This revision is now accepted and ready to land. The last update lost a drive-by fix for `MAX_FAILURES`. I didn't check to see if that was fixed elsewhere. Other than the comment suggestion I just added, LGTM. ================ Comment at: llvm/utils/lit/lit/cl_arguments.py:71 + if not c.isFailure] + for code in success_codes: + format_group.add_argument( ---------------- I'd appreciate a comment here to clarify that user-defined result codes are not supported by `--show-*`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82233/new/ https://reviews.llvm.org/D82233 From llvm-commits at lists.llvm.org Wed Jul 8 16:53:53 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:53:53 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: ggeorgakoudis updated this revision to Diff 276601. ggeorgakoudis added a comment.
Update for comment Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 Files: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp llvm/test/Analysis/CallGraph/ignore-callback-uses.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83370.276601.patch Type: text/x-patch Size: 5336 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:55:08 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Wed, 08 Jul 2020 16:55:08 -0700 (PDT) Subject: [llvm] 2308487 - [openmp] Use switch in isAllowedClauseForDirective instead of multiple if Message-ID: <5f065cdc.1c69fb81.4dd6f.200e@mx.google.com> Author: Valentin Clement Date: 2020-07-08T19:54:59-04:00 New Revision: 23084878e96cadba4ade809b08229f3ee908aee9 URL: https://github.com/llvm/llvm-project/commit/23084878e96cadba4ade809b08229f3ee908aee9 DIFF: https://github.com/llvm/llvm-project/commit/23084878e96cadba4ade809b08229f3ee908aee9.diff LOG: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if Summary: Change the test in isAllowedClauseForDirective from if with multiple conditions to a main switch on directive and then switches on clause for each directive. Version check is still done with a condition in the return statment. 
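As a rough illustration of the restructuring this summary describes (a hand-written sketch, not the TableGen-generated code), the nested-switch shape might look like the following. The enum names `Directive::DirA`, `Clause::ClauseA` and so on are hypothetical stand-ins for the generated `TDLD_*`/`TDLC_*` values, and the version bounds are borrowed from the directive2.td test touched by this commit. The sketch ends with a plain `return false` where the generated code uses `llvm_unreachable`, so that it compiles without LLVM headers.

```cpp
// Hypothetical stand-ins for the enums the TableGen backend would emit.
enum class Directive { DirA, DirB };
enum class Clause { ClauseA, ClauseB, ClauseC };

// Outer switch selects the directive, inner switch selects the clause, and
// the version range check lives in the return statement of each clause case.
constexpr bool isAllowedClauseForDirective(Directive D, Clause C,
                                           unsigned Version) {
  switch (D) {
  case Directive::DirA:
    switch (C) {
    case Clause::ClauseA:
      return 2u <= Version && 4u >= Version;
    case Clause::ClauseB:
      return 2u <= Version && 2147483647u >= Version;
    default:
      return false; // clause not listed for this directive
    }
  case Directive::DirB:
    return false; // assumed here: this directive allows no clauses
  }
  // The generated code calls llvm_unreachable at this point; the sketch
  // returns false instead to stay free of LLVM dependencies.
  return false;
}

// Compile-time spot checks of the sketch.
static_assert(isAllowedClauseForDirective(Directive::DirA, Clause::ClauseA, 3),
              "version 3 is inside [2, 4]");
static_assert(!isAllowedClauseForDirective(Directive::DirA, Clause::ClauseA, 5),
              "version 5 is outside [2, 4]");
```

Compared with a chain of `if (D == ... && C == ... && ...)` tests, each (directive, clause) pair is reached through two jumps rather than a linear scan, and a clause that is not listed falls into the inner `default`.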
Reviewers: jdoerfert, jdenny Reviewed By: jdenny Subscribers: yaxunl, guansong, sstefan1, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83363 Added: Modified: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp Removed: ################################################################################ diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index b4d1a6ed2026..43b7ec399b99 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -106,9 +106,17 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: bool llvm::tdl::isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) { // IMPL-NEXT: assert(unsigned(D) <= llvm::tdl::Directive_enumSize); // IMPL-NEXT: assert(unsigned(C) <= llvm::tdl::Clause_enumSize); -// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clausea && 1 <= Version && 2147483647 >= Version) -// IMPL-NEXT: return true; -// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clauseb && 1 <= Version && 2147483647 >= Version) -// IMPL-NEXT: return true; -// IMPL-NEXT: return false; +// IMPL-NEXT: switch (D) { +// IMPL-NEXT: case TDLD_dira: +// IMPL-NEXT: switch (C) { +// IMPL-NEXT: case TDLC_clausea: +// IMPL-NEXT: return 1 <= Version && 2147483647 >= Version; +// IMPL-NEXT: case TDLC_clauseb: +// IMPL-NEXT: return 1 <= Version && 2147483647 >= Version; +// IMPL-NEXT: default: +// IMPL-NEXT: return false; +// IMPL-NEXT: } +// IMPL-NEXT: break; +// IMPL-NEXT: } +// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } diff --git a/llvm/test/TableGen/directive2.td b/llvm/test/TableGen/directive2.td index 8e180e20df1f..10f48c2a3ceb 100644 --- a/llvm/test/TableGen/directive2.td +++ b/llvm/test/TableGen/directive2.td @@ -97,10 +97,17 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: bool llvm::tdl::isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) { // IMPL-NEXT: 
assert(unsigned(D) <= llvm::tdl::Directive_enumSize); // IMPL-NEXT: assert(unsigned(C) <= llvm::tdl::Clause_enumSize); -// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clausea && 2 <= Version && 4 >= Version) -// IMPL-NEXT: return true; -// IMPL-NEXT: if (D == TDLD_dira && C == TDLC_clauseb && 2 <= Version && 2147483647 >= Version) -// IMPL-NEXT: return true; -// IMPL-NEXT: return false; +// IMPL-NEXT: switch (D) { +// IMPL-NEXT: case TDLD_dira: +// IMPL-NEXT: switch (C) { +// IMPL-NEXT: case TDLC_clausea: +// IMPL-NEXT: return 2 <= Version && 4 >= Version; +// IMPL-NEXT: case TDLC_clauseb: +// IMPL-NEXT: return 2 <= Version && 2147483647 >= Version; +// IMPL-NEXT: default: +// IMPL-NEXT: return false; +// IMPL-NEXT: } +// IMPL-NEXT: break; +// IMPL-NEXT: } +// IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } - diff --git a/llvm/utils/TableGen/DirectiveEmitter.cpp b/llvm/utils/TableGen/DirectiveEmitter.cpp index a9f3569c07a2..d4d2b7965420 100644 --- a/llvm/utils/TableGen/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/DirectiveEmitter.cpp @@ -202,29 +202,27 @@ void GenerateGetKind(const std::vector &Records, raw_ostream &OS, OS << "}\n"; } -void GenerateTestForAllowedClauses(const std::vector &Clauses, - raw_ostream &OS, StringRef DirectiveName, - StringRef DirectivePrefix, - StringRef ClausePrefix) { - - const auto FormattedDirectiveName = getFormattedName(DirectiveName); +void GenerateCaseForVersionedClauses(const std::vector &Clauses, + raw_ostream &OS, StringRef DirectiveName, + StringRef DirectivePrefix, + StringRef ClausePrefix) { for (const auto &C : Clauses) { const auto MinVersion = C->getValueAsInt("minVersion"); const auto MaxVersion = C->getValueAsInt("maxVersion"); const auto SpecificClause = C->getValueAsDef("clause"); const auto ClauseName = SpecificClause->getValueAsString("name"); - - OS << " if (D == " << DirectivePrefix << FormattedDirectiveName - << " && C == " << ClausePrefix << getFormattedName(ClauseName) << " && " - << 
MinVersion << " <= Version && " << MaxVersion << " >= Version)\n"; - OS << " return true;\n"; + OS << " case " << ClausePrefix << getFormattedName(ClauseName) + << ":\n"; + OS << " return " << MinVersion << " <= Version && " << MaxVersion + << " >= Version;\n"; } } // Generate the isAllowedClauseForDirective function implementation. void GenerateIsAllowedClause(const std::vector &Directives, - raw_ostream &OS, StringRef DirectivePrefix, - StringRef ClausePrefix, StringRef CppNamespace) { + raw_ostream &OS, StringRef LanguageName, + StringRef DirectivePrefix, StringRef ClausePrefix, + StringRef CppNamespace) { OS << "\n"; OS << "bool llvm::" << CppNamespace << "::isAllowedClauseForDirective(" << "Directive D, Clause C, unsigned Version) {\n"; @@ -233,24 +231,39 @@ void GenerateIsAllowedClause(const std::vector &Directives, OS << " assert(unsigned(C) <= llvm::" << CppNamespace << "::Clause_enumSize);\n"; + OS << " switch (D) {\n"; + for (const auto &D : Directives) { + const auto DirectiveName = D->getValueAsString("name"); + OS << " case " << DirectivePrefix << getFormattedName(DirectiveName) + << ":\n"; + OS << " switch (C) {\n"; + const auto &AllowedClauses = D->getValueAsListOfDefs("allowedClauses"); - GenerateTestForAllowedClauses(AllowedClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + GenerateCaseForVersionedClauses(AllowedClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); const auto &AllowedOnceClauses = D->getValueAsListOfDefs("allowedOnceClauses"); - GenerateTestForAllowedClauses(AllowedOnceClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + GenerateCaseForVersionedClauses(AllowedOnceClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); - GenerateTestForAllowedClauses(RequiredClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + GenerateCaseForVersionedClauses(RequiredClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); + 
+ OS << " default:\n"; + OS << " return false;\n"; + OS << " }\n"; // End of clauses switch + OS << " break;\n"; } - OS << " return false;\n"; - OS << "}\n"; + + OS << " }\n"; // End of directives switch + OS << " llvm_unreachable(\"Invalid " << LanguageName + << " Directive kind\");\n"; + OS << "}\n"; // End of function isAllowedClauseForDirective } // Generate the implemenation section for the enumeration in the directive @@ -291,8 +304,8 @@ void EmitDirectivesImpl(RecordKeeper &Records, raw_ostream &OS) { GenerateGetName(Clauses, OS, "Clause", ClausePrefix, LanguageName, CppNamespace); - GenerateIsAllowedClause(Directives, OS, DirectivePrefix, ClausePrefix, - CppNamespace); + GenerateIsAllowedClause(Directives, OS, LanguageName, DirectivePrefix, + ClausePrefix, CppNamespace); } } // namespace llvm From llvm-commits at lists.llvm.org Wed Jul 8 16:55:09 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:55:09 +0000 (UTC) Subject: [PATCH] D83363: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if In-Reply-To: References: Message-ID: <5815d62d75bdf8ca190cf9e71b9e5725@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG23084878e96c: [openmp] Use switch in isAllowedClauseForDirective instead of multiple if (authored by clementval). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83363/new/ https://reviews.llvm.org/D83363 Files: llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83363.276602.patch Type: text/x-patch Size: 7262 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 16:58:16 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 23:58:16 +0000 (UTC) Subject: [PATCH] D82233: [lit] Add --show-xxx command line options In-Reply-To: References: Message-ID: <8c0bb3f060a7d2ee0245fedf4b8cdb51@localhost.localdomain> yln added a comment. In D82233#2140554 , @jdenny wrote: > The last update lost a drive-by fix for `MAX_FAILURES`. I didn't check to see if that was fixed elsewhere. Fixed by author: https://github.com/llvm/llvm-project/commit/8cd117c24f48428e01f88cf18480e5af7eb20c0c > Other than the comment suggestion I just added, LGTM. I will add a comment before committing. Thanks for the quick reviews! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82233/new/ https://reviews.llvm.org/D82233 From llvm-commits at lists.llvm.org Wed Jul 8 17:00:17 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:00:17 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <4b88752ce325fc43d76fd1ec2c822bf6@localhost.localdomain> tmsriram marked 4 inline comments as done. tmsriram added a comment. In D79978#2138208 , @MaskRay wrote: > I have studied CFIInstrInserter in May. If you don't mind, please give me some time to review as well. > > For `basic-block-sections-cfiinstr_1.ll`, have you considered places like `CodeGen/X86/cfi-inserter-*`? You may even create a subdirectory there. > `_1` is not very common. `-1` is more common. Done. > `curl -L 'https://reviews.llvm.org/D79978?download=1'` does not have a/ or b/ prefix. I think that may be why `arc patch D79978` cannot apply the patch. > Can you upload a diff with either `arc diff`, git format-patch -1 or `git diff 'HEAD^'`? 
Thanks. Uploaded after git diff HEAD^. Please see if this works. ================ Comment at: llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll:6 +; CFI_INSTR: _Z7computebiiiiii +; CFI_INSTR: bb.0 +; CFI_INSTR: bb.1 ---------------- MaskRay wrote: > I think these labels may need `:` suffix and a `# `prefix to make them unique. I added the full name now, not sure what you mean by '#' prefix. ================ Comment at: llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll:20-30 +; Exhaust caller-saved parameter registers and force callee saved registers to +; be used in the computation. This tests that CFI directives for callee saved +; registers are generated with basic block sections. +; extern int f1(int, int, int); +; +; int compute(bool k, int p1, int p2, int p3, int p4, int p5, int p6) { +; int result = p1; ---------------- dblaikie wrote: > this looks nicer - though I'd still like a bit more commentary on exactly how/why these constructs are here? Why two function calls with interleaved parameters rather than one, etc? > > Mostly I'm hoping the test would explain why these constructs are used and which parts are relevant. (does the function need a non-void return type? or could the function calls be void-returning but conditional? (it's not like they can be optimized away, since they might have side effects anyway)) I cleaned this up a bit more adding more comments and changing it to a void func. A single func call is not utilizing callee saved registers but using two calls like this is forcing it. PTAL. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Wed Jul 8 17:01:26 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via llvm-commits) Date: Wed, 08 Jul 2020 17:01:26 -0700 (PDT) Subject: [llvm] f06d242 - [lit] Add --show-xxx command line options Message-ID: <5f065e56.1c69fb81.3b217.2ded@mx.google.com> Author: Julian Lettner Date: 2020-07-08T17:01:05-07:00 New Revision: f06d2420b738adef6cea80812fdde0bc36c4ea41 URL: https://github.com/llvm/llvm-project/commit/f06d2420b738adef6cea80812fdde0bc36c4ea41 DIFF: https://github.com/llvm/llvm-project/commit/f06d2420b738adef6cea80812fdde0bc36c4ea41.diff LOG: [lit] Add --show-xxx command line options Provide `--show-xxx` flags for all non-failure result codes, just as we already do for `--show-xfail` and `--show-unsupported`. Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D82233 Added: llvm/utils/lit/tests/Inputs/show-result-codes/fail.txt llvm/utils/lit/tests/Inputs/show-result-codes/lit.cfg llvm/utils/lit/tests/Inputs/show-result-codes/pass.txt llvm/utils/lit/tests/Inputs/show-result-codes/unsupported.txt llvm/utils/lit/tests/Inputs/show-result-codes/xfail.txt llvm/utils/lit/tests/show-result-codes.py Modified: llvm/utils/lit/lit/cl_arguments.py llvm/utils/lit/lit/main.py Removed: ################################################################################ diff --git a/llvm/utils/lit/lit/cl_arguments.py b/llvm/utils/lit/lit/cl_arguments.py index 803e5f9ba02b..966863624feb 100644 --- a/llvm/utils/lit/lit/cl_arguments.py +++ b/llvm/utils/lit/lit/cl_arguments.py @@ -65,12 +65,18 @@ def parse_args(): dest="useProgressBar", help="Do not use curses based progress bar", action="store_false") - format_group.add_argument("--show-unsupported", - help="Show unsupported tests", - action="store_true") - format_group.add_argument("--show-xfail", - help="Show tests that were expected to fail", - action="store_true") + + # 
Note: this does not generate flags for user-defined result codes. + success_codes = [c for c in lit.Test.ResultCode.all_codes() + if not c.isFailure] + for code in success_codes: + format_group.add_argument( + "--show-{}".format(code.name.lower()), + dest="shown_codes", + help="Show {} tests ({})".format(code.label.lower(), code.name), + action="append_const", + const=code, + default=[]) execution_group = parser.add_argument_group("Test Execution") execution_group.add_argument("--path", @@ -187,12 +193,6 @@ def parse_args(): else: opts.shard = None - opts.show_results = set() - if opts.show_unsupported: - opts.show_results.add(lit.Test.UNSUPPORTED) - if opts.show_xfail: - opts.show_results.add(lit.Test.XFAIL) - opts.reports = filter(None, [opts.output, opts.xunit_xml_output]) return opts diff --git a/llvm/utils/lit/lit/main.py b/llvm/utils/lit/lit/main.py index 860c584fbdf4..c47bdede3176 100755 --- a/llvm/utils/lit/lit/main.py +++ b/llvm/utils/lit/lit/main.py @@ -265,15 +265,15 @@ def print_results(tests, elapsed, opts): tests_by_code[test.result.code].append(test) for code in lit.Test.ResultCode.all_codes(): - print_group(tests_by_code[code], code, opts.show_results) + print_group(tests_by_code[code], code, opts.shown_codes) print_summary(tests_by_code, opts.quiet, elapsed) -def print_group(tests, code, show_results): +def print_group(tests, code, shown_codes): if not tests: return - if not code.isFailure and code not in show_results: + if not code.isFailure and code not in shown_codes: return print('*' * 20) print('{} Tests ({}):'.format(code.label, len(tests))) diff --git a/llvm/utils/lit/tests/Inputs/show-result-codes/fail.txt b/llvm/utils/lit/tests/Inputs/show-result-codes/fail.txt new file mode 100644 index 000000000000..15eb81a5f5e9 --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/show-result-codes/fail.txt @@ -0,0 +1 @@ +RUN: false diff --git a/llvm/utils/lit/tests/Inputs/show-result-codes/lit.cfg b/llvm/utils/lit/tests/Inputs/show-result-codes/lit.cfg new 
file mode 100644 index 000000000000..2aa84326bcea --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/show-result-codes/lit.cfg @@ -0,0 +1,6 @@ +import lit.formats +config.name = 'show-result-codes' +config.suffixes = ['.txt'] +config.test_format = lit.formats.ShTest() +config.test_source_root = None +config.test_exec_root = None diff --git a/llvm/utils/lit/tests/Inputs/show-result-codes/pass.txt b/llvm/utils/lit/tests/Inputs/show-result-codes/pass.txt new file mode 100644 index 000000000000..18efe9e49e95 --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/show-result-codes/pass.txt @@ -0,0 +1 @@ +RUN: true diff --git a/llvm/utils/lit/tests/Inputs/show-result-codes/unsupported.txt b/llvm/utils/lit/tests/Inputs/show-result-codes/unsupported.txt new file mode 100644 index 000000000000..b1f70207f1fd --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/show-result-codes/unsupported.txt @@ -0,0 +1,2 @@ +REQUIRES: missing-feature +RUN: true diff --git a/llvm/utils/lit/tests/Inputs/show-result-codes/xfail.txt b/llvm/utils/lit/tests/Inputs/show-result-codes/xfail.txt new file mode 100644 index 000000000000..6f2e4e08ba18 --- /dev/null +++ b/llvm/utils/lit/tests/Inputs/show-result-codes/xfail.txt @@ -0,0 +1,2 @@ +XFAIL: * +RUN: false diff --git a/llvm/utils/lit/tests/show-result-codes.py b/llvm/utils/lit/tests/show-result-codes.py new file mode 100644 index 000000000000..5d8cd0d9eb9f --- /dev/null +++ b/llvm/utils/lit/tests/show-result-codes.py @@ -0,0 +1,21 @@ +# Test the --show- {pass,unsupported,xfail,...} options. 
+# +# RUN: not %{lit} %{inputs}/show-result-codes | FileCheck %s --check-prefix=NONE +# RUN: not %{lit} %{inputs}/show-result-codes --show-unsupported | FileCheck %s --check-prefix=ONE +# RUN: not %{lit} %{inputs}/show-result-codes --show-pass --show-xfail | FileCheck %s --check-prefix=MULTIPLE + +# Failing tests are always shown +# NONE-NOT: Unsupported Tests (1) +# NONE-NOT: Passed Tests (1) +# NONE-NOT: Expectedly Failed Tests (1) +# NONE: Failed Tests (1) + +# ONE: Unsupported Tests (1) +# ONE-NOT: Passed Tests (1) +# ONE-NOT: Expectedly Failed Tests (1) +# ONE: Failed Tests (1) + +# MULTIPLE-NOT: Unsupported Tests (1) +# MULTIPLE: Passed Tests (1) +# MULTIPLE: Expectedly Failed Tests (1) +# MULTIPLE: Failed Tests (1) From llvm-commits at lists.llvm.org Wed Jul 8 17:01:29 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:01:29 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <924fea6552b64bf9d72c906f5220dd2d@localhost.localdomain> tmsriram updated this revision to Diff 276603. tmsriram marked an inline comment as done. tmsriram added a comment. Address reviewer comments: - Rename tests. - SImplify tests. - Remove braces. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 Files: llvm/include/llvm/CodeGen/TargetFrameLowering.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCFIException.cpp llvm/lib/CodeGen/AsmPrinter/DwarfException.h llvm/lib/CodeGen/CFIInstrInserter.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/X86/X86FrameLowering.cpp llvm/lib/Target/X86/X86FrameLowering.h llvm/test/DebugInfo/X86/basic-block-sections-cfi-1.ll llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr-1.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D79978.276603.patch Type: text/x-patch Size: 15311 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 17:01:32 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:01:32 +0000 (UTC) Subject: [PATCH] D82233: [lit] Add --show-xxx command line options In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGf06d2420b738: [lit] Add --show-xxx command line options (authored by yln). Changed prior to commit: https://reviews.llvm.org/D82233?vs=276593&id=276604#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82233/new/ https://reviews.llvm.org/D82233 Files: llvm/utils/lit/lit/cl_arguments.py llvm/utils/lit/lit/main.py llvm/utils/lit/tests/Inputs/show-result-codes/fail.txt llvm/utils/lit/tests/Inputs/show-result-codes/lit.cfg llvm/utils/lit/tests/Inputs/show-result-codes/pass.txt llvm/utils/lit/tests/Inputs/show-result-codes/unsupported.txt llvm/utils/lit/tests/Inputs/show-result-codes/xfail.txt llvm/utils/lit/tests/show-result-codes.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D82233.276604.patch Type: text/x-patch Size: 4818 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 17:13:19 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:13:19 +0000 (UTC) Subject: [PATCH] D83444: [AArch64][SVE] Add lowering for llvm.fma. Message-ID: efriedma created this revision. efriedma added reviewers: sdesmalen, paulwalker-arm. Herald added subscribers: danielkiss, psnobl, hiraditya, kristof.beyls, tschuett. Herald added a project: LLVM. This is currently bare-bones; we aren't taking advantage of any of the FMA variant instructions. But it's enough to generate code. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83444 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td llvm/test/CodeGen/AArch64/sve-fp.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83444.276606.patch Type: text/x-patch Size: 6344 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 17:15:07 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:15:07 +0000 (UTC) Subject: [PATCH] D81699: MemorySanitizer: Add option to insert init checks at call site In-Reply-To: References: Message-ID: <05a8e621ce6fbe9f194fe10fa6aa322b@localhost.localdomain> guiand updated this revision to Diff 276607. guiand marked an inline comment as done. guiand added a comment. Fix test checking wrong `__msan_warning` variant, clang-format Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81699/new/ https://reviews.llvm.org/D81699 Files: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81699.276607.patch Type: text/x-patch Size: 8215 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 17:30:40 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:30:40 +0000 (UTC) Subject: [PATCH] D83430: [AliasSetTracker] More precise AAInfo intersection check In-Reply-To: References: Message-ID: asbirlea accepted this revision. asbirlea added a comment. This revision is now accepted and ready to land. LGTM, thank you! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83430/new/ https://reviews.llvm.org/D83430 From llvm-commits at lists.llvm.org Wed Jul 8 17:35:47 2020 From: llvm-commits at lists.llvm.org (Ding Fei via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:35:47 +0000 (UTC) Subject: [PATCH] D83321: [Support] Fix utf16 path's index upper bound In-Reply-To: References: Message-ID: danix800 updated this revision to Diff 276609. danix800 added a comment. Full context attached. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83321/new/ https://reviews.llvm.org/D83321 Files: llvm/lib/Support/Windows/Path.inc Index: llvm/lib/Support/Windows/Path.inc =================================================================== --- llvm/lib/Support/Windows/Path.inc +++ llvm/lib/Support/Windows/Path.inc @@ -958,8 +958,8 @@ // Convert path to the format that Windows is happy with. if (PathUTF16.size() > 0 && - !is_separator(PathUTF16[Path.size() - 1]) && - PathUTF16[Path.size() - 1] != L':') { + !is_separator(PathUTF16[PathUTF16.size() - 1]) && + PathUTF16[PathUTF16.size() - 1] != L':') { PathUTF16.push_back(L'\\'); PathUTF16.push_back(L'*'); } else { -------------- next part -------------- A non-text attachment was scrubbed... Name: D83321.276609.patch Type: text/x-patch Size: 580 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 17:59:55 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 00:59:55 +0000 (UTC) Subject: [PATCH] D83428: [flang] Fix negative unit number hashing In-Reply-To: References: Message-ID: <9fe62c174200f3ea27ae8a4a9e5b13a1@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGcffc6036173d: [flang] Fix negative unit number hashing (authored by klausler). 
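The hashing problem named in this commit's title can be sketched in isolation. In C++ the `%` operator truncates toward zero, so a negative unit number yields a negative remainder, which is not a usable bucket index. The function names `HashNaive` and `HashAbs` below are hypothetical and chosen for this illustration; only the bucket count 1031 comes from the patch. The absolute value is spelled out by hand so the functions can be `constexpr`, whereas the patch itself calls `std::abs`.

```cpp
// Bucket count taken from the patch; it is required to be prime there.
constexpr int kBuckets = 1031;

// Remainder of a possibly negative n: the result keeps the sign of n.
constexpr int HashNaive(int n) { return n % kBuckets; }

// Take the magnitude first so the remainder lands in [0, kBuckets).
// Assumption: n is never INT_MIN, for which negation would overflow.
constexpr int HashAbs(int n) { return (n < 0 ? -n : n) % kBuckets; }

static_assert(HashNaive(-1) == -1, "negative input gives a negative index");
static_assert(HashAbs(-1) == 1, "magnitude-based hash stays non-negative");
static_assert(HashAbs(-1032) == 1, "-1032 and -1 share a bucket");
```

One consequence of this design is that `n` and `-n` always map to the same bucket. That only lengthens a chain slightly; lookups remain correct as long as the chain compares the full unit number.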
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83428/new/ https://reviews.llvm.org/D83428 Files: flang/runtime/unit-map.h Index: flang/runtime/unit-map.h =================================================================== --- flang/runtime/unit-map.h +++ flang/runtime/unit-map.h @@ -15,6 +15,7 @@ #include "lock.h" #include "memory.h" #include "unit.h" +#include namespace Fortran::runtime::io { @@ -59,7 +60,7 @@ }; static constexpr int buckets_{1031}; // must be prime - int Hash(int n) { return n % buckets_; } + int Hash(int n) { return std::abs(n) % buckets_; } ExternalFileUnit *Find(int n) { Chain *previous{nullptr}; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83428.276612.patch Type: text/x-patch Size: 542 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 18:17:57 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 01:17:57 +0000 (UTC) Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining In-Reply-To: References: Message-ID: <7d68e4e1a47e348401e4508dbff64e23@localhost.localdomain> MaskRay added a comment. In D83262#2135629 , @jhenderson wrote: > I'm personally fine with dropping this if it's not actually useful for you, as I don't have any use-case for it at the current time. Re. --no-inlining, I have a slight preference for not adding it, but I'm also okay with it being added, if you'd find it less confusing. I didn't know about the functionality of `=0` to disable a flag in LLVM tools when I first came to the project myself, so it could be a little confusing. I actually added --no-demangle precisely for that reason. I have a slightly stronger opinion that we should not add it. We could improve help messages for `cl::opt` to mention the default value. 
> Originally I was interested in mapping a list of addresses to the names of the addresses of functions that appear in the binary from which the addresses came (these addresses are coming from instrumentation, e.g. -finstrument-function-entry-bare). After a bit of thought (and trial & error) I think I've concluded I actually do want source file information... it seems symbol names are duplicated across compilation units more often than I had originally expected. Do you still have a need for output without source info? We add options if there is a reasonable use case, not that "we add it just to customize a behavior". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83262/new/ https://reviews.llvm.org/D83262 From llvm-commits at lists.llvm.org Wed Jul 8 18:41:47 2020 From: llvm-commits at lists.llvm.org (Kostya Serebryany via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 01:41:47 +0000 (UTC) Subject: [PATCH] D83247: [compiler-rt][asan][hwasan] Refactor shadow setup into sanitizer_common (NFCI) In-Reply-To: References: Message-ID: <44af5d3c142361d058747afedd3d2cf2@localhost.localdomain> kcc added a comment. No strong opinion on whether this needs to be done. If you feel strong, and if it will help, sure. (you may indeed have to test on various platforms, or rely on the post-commit bots) OTOH, the new profiler should not require all of these functions, you can probably get away with a custom-tailored variant of MapDynamicShadow. (Vitaly, please do the review) ================ Comment at: compiler-rt/lib/asan/asan_linux.cpp:122 - uptr granularity = GetMmapGranularity(); - uptr alignment = granularity * 8; - uptr left_padding = granularity; ---------------- tejohnson wrote: > The code in asan is multiplying the mmap granularity by 8, whereas the hwasan version shifts it by kShadowScale. 
I wasn't sure if the 8 here is supposed to be equivalent to a left shift by the shadow scale (which is typically 3 in asan), or is specifically hardcoded separately not using SHADOW_SCALE since it could be something other than 3 in some cases (e.g. 5 for myriad, or user set via ASAN_SHADOW_SCALE). Depending on what was intended here, I would keep the hardcoding of "3" passed to my refactored MapDynamicShadow, or change that to SHADOW_SCALE. I frankly don't remember :( Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83247/new/ https://reviews.llvm.org/D83247 From llvm-commits at lists.llvm.org Wed Jul 8 18:42:30 2020 From: llvm-commits at lists.llvm.org (John Regehr via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 01:42:30 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <15d8647afc7778fd0acb83f50de8b879@localhost.localdomain> regehr added a comment. > Did you mean to check something like the following? > > define i32 @src(i1 %cond, i32 %x) { > %x2 = freeze i32 %x > %s = select i1 %cond, i32 %x2, i32 undef > ret i32 %s > } > > define i32 @tgt(i1 %cond, i32 %x) { > %x2 = freeze i32 %x > ret i32 %x2 > } that's fine but I still don't understand why the counterexample to my version says %x2 in @src can be undef Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Wed Jul 8 18:45:47 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 01:45:47 +0000 (UTC) Subject: [PATCH] D83365: [PowerPC] start and end parameters for fixupIsDeadOrKill may exist in different block before RA In-Reply-To: References: Message-ID: shchenz updated this revision to Diff 276614. shchenz added a comment. 
update according to @nemanjai offline comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83365/new/ https://reviews.llvm.org/D83365 Files: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir Index: llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir =================================================================== --- /dev/null +++ llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir @@ -0,0 +1,21 @@ +# RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs -start-before ppc-mi-peepholes \ +# RUN: -stop-after ppc-mi-peepholes %s -o - | FileCheck %s + +--- +name: test +#CHECK : name : test +tracksRegLiveness: true +body: | + bb.0.entry: + liveins: $x3 + %0:g8rc = COPY $x3 + %1:gprc = COPY %0.sub_32:g8rc + %2:g8rc = LI8 63 + + bb.1: + %3:gprc = COPY %2.sub_32:g8rc + ; CHECK: %4:gprc = LI 0 + %4:gprc = XORI killed %3:gprc, 63 + STW killed %4:gprc, %4:gprc, 100 + BLR8 implicit $lr8, implicit $rm +... Index: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrInfo.cpp +++ llvm/lib/Target/PowerPC/PPCInstrInfo.cpp @@ -2655,10 +2655,15 @@ void PPCInstrInfo::fixupIsDeadOrKill(MachineInstr &StartMI, MachineInstr &EndMI, unsigned RegNo) const { - - // Instructions between [StartMI, EndMI] should be in same basic block. - assert((StartMI.getParent() == EndMI.getParent()) && - "Instructions are not in same basic block"); + // Conservatively clear kill flag for the register if the instructions are in + // different basic blocks, because the kill flag may no longer be right. There + // is no need to bother with dead flags since defs with no uses will be + // handled by DCE. 
+ MachineRegisterInfo &MRI = StartMI.getParent()->getParent()->getRegInfo(); + if ((StartMI.getParent() != EndMI.getParent())) { + MRI.clearKillFlags(RegNo); + return; + } bool IsKillSet = false; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83365.276614.patch Type: text/x-patch Size: 1812 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 18:47:17 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 01:47:17 +0000 (UTC) Subject: [PATCH] D82552: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand In-Reply-To: References: Message-ID: plotfi added a comment. @pratlucas gentle ping Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82552/new/ https://reviews.llvm.org/D82552 From llvm-commits at lists.llvm.org Wed Jul 8 18:56:24 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 01:56:24 +0000 (UTC) Subject: [PATCH] D83447: [X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. Message-ID: craig.topper created this revision. craig.topper added reviewers: spatel, RKSimon. Herald added a subscriber: hiraditya. Herald added a project: LLVM. Technically a VSELECT expects a vector of all 1s or 0s elements for its condition. But we aren't guaranteeing that the sign bit and the non sign bits match in these locations. So we should use BLENDV which is more relaxed. 
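To illustrate the distinction drawn above, here is a minimal scalar sketch. It is not LLVM code: `blendBySignBit` and `isAllOnesOrZeroMask` are hypothetical helper names, and the operand order (mask, value taken when the sign bit is set, value taken otherwise) is an assumption chosen to mirror the `Sel, V0, V1` order in the patch. It models a per-byte blend that looks only at the sign bit of each mask byte, next to a check for the stricter "every element is all 1s or all 0s" condition that a VSELECT expects.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a byte blend that inspects only the sign bit
// (bit 7) of each mask byte. The remaining seven bits are ignored.
template <std::size_t N>
std::array<std::uint8_t, N>
blendBySignBit(const std::array<std::uint8_t, N> &Sel,
               const std::array<std::uint8_t, N> &V0,
               const std::array<std::uint8_t, N> &V1) {
  std::array<std::uint8_t, N> Out{};
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = (Sel[I] & 0x80) ? V0[I] : V1[I];
  return Out;
}

// Hypothetical check for the stricter condition: every mask element must be
// either all ones (0xFF) or all zeros (0x00).
template <std::size_t N>
bool isAllOnesOrZeroMask(const std::array<std::uint8_t, N> &Sel) {
  for (std::uint8_t B : Sel)
    if (B != 0x00 && B != 0xFF)
      return false;
  return true;
}
```

A mask byte such as 0x80 or 0x7F is well defined for the sign-bit blend but fails the all-ones-or-zeros check, which is the situation the description says is not guaranteed at these call sites.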
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83447 Files: llvm/lib/Target/X86/X86ISelLowering.cpp Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -27681,7 +27681,8 @@ V0 = DAG.getBitcast(ExtVT, V0); V1 = DAG.getBitcast(ExtVT, V1); Sel = DAG.getBitcast(ExtVT, Sel); - return DAG.getBitcast(VT, DAG.getNode(X86ISD::BLENDV, dl, ExtVT, Sel, V0, V1)); + return DAG.getBitcast( + VT, DAG.getNode(X86ISD::BLENDV, dl, ExtVT, Sel, V0, V1)); } // On pre-SSE41 targets we splat the sign bit - a negative value will // set all bits of the lanes to true and VSELECT uses that in @@ -27826,7 +27827,8 @@ V0 = DAG.getBitcast(VT, V0); V1 = DAG.getBitcast(VT, V1); Sel = DAG.getBitcast(VT, Sel); - return DAG.getBitcast(SelVT, DAG.getNode(X86ISD::BLENDV, DL, VT, Sel, V0, V1)); + return DAG.getBitcast(SelVT, + DAG.getNode(X86ISD::BLENDV, DL, VT, Sel, V0, V1)); } // On pre-SSE41 targets we test for the sign bit by comparing to // zero - a negative value will set all bits of the lanes to true -------------- next part -------------- A non-text attachment was scrubbed... Name: D83447.276616.patch Type: text/x-patch Size: 1202 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 19:00:23 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:00:23 +0000 (UTC) Subject: [PATCH] D83447: [X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. In-Reply-To: References: Message-ID: <6ff67cdd886ebdee4a99a97f8b5419da@localhost.localdomain> craig.topper updated this revision to Diff 276617. craig.topper added a comment. Full patch. 
Previous was just the difference from running clang-format Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83447/new/ https://reviews.llvm.org/D83447 Files: llvm/lib/Target/X86/X86ISelLowering.cpp Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -27558,12 +27558,13 @@ ISD::SETGT); return DAG.getBitcast(SelVT, DAG.getSelect(dl, VT, Sel, V0, V1)); } else if (Subtarget.hasSSE41()) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just + // on the sign bit. V0 = DAG.getBitcast(VT, V0); V1 = DAG.getBitcast(VT, V1); Sel = DAG.getBitcast(VT, Sel); - return DAG.getBitcast(SelVT, DAG.getSelect(dl, VT, Sel, V0, V1)); + return DAG.getBitcast(SelVT, + DAG.getNode(X86ISD::BLENDV, dl, VT, Sel, V0, V1)); } // On pre-SSE41 targets we test for the sign bit by comparing to // zero - a negative value will set all bits of the lanes to true @@ -27673,14 +27674,15 @@ !ISD::isBuildVectorOfConstantSDNodes(Amt.getNode()); auto SignBitSelect = [&](SDValue Sel, SDValue V0, SDValue V1) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just on + // the sign bit. 
if (UseSSE41) { MVT ExtVT = MVT::getVectorVT(MVT::i8, VT.getVectorNumElements() * 2); V0 = DAG.getBitcast(ExtVT, V0); V1 = DAG.getBitcast(ExtVT, V1); Sel = DAG.getBitcast(ExtVT, Sel); - return DAG.getBitcast(VT, DAG.getSelect(dl, ExtVT, Sel, V0, V1)); + return DAG.getBitcast( + VT, DAG.getNode(X86ISD::BLENDV, dl, ExtVT, Sel, V0, V1)); } // On pre-SSE41 targets we splat the sign bit - a negative value will // set all bits of the lanes to true and VSELECT uses that in @@ -27820,12 +27822,13 @@ auto SignBitSelect = [&](MVT SelVT, SDValue Sel, SDValue V0, SDValue V1) { if (Subtarget.hasSSE41()) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just + // on the sign bit. V0 = DAG.getBitcast(VT, V0); V1 = DAG.getBitcast(VT, V1); Sel = DAG.getBitcast(VT, Sel); - return DAG.getBitcast(SelVT, DAG.getSelect(DL, VT, Sel, V0, V1)); + return DAG.getBitcast(SelVT, + DAG.getNode(X86ISD::BLENDV, DL, VT, Sel, V0, V1)); } // On pre-SSE41 targets we test for the sign bit by comparing to // zero - a negative value will set all bits of the lanes to true -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83447.276617.patch Type: text/x-patch Size: 2945 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 19:01:10 2020 From: llvm-commits at lists.llvm.org (Qiu Chaofan via llvm-commits) Date: Wed, 08 Jul 2020 19:01:10 -0700 (PDT) Subject: [llvm] 4254ed5 - [Legalizer] Fix wrong operand in split vector helper Message-ID: <5f067a66.1c69fb81.6f166.36e6@mx.google.com> Author: Qiu Chaofan Date: 2020-07-09T09:57:29+08:00 New Revision: 4254ed5c325c4a366a5f763487822414df6a0de4 URL: https://github.com/llvm/llvm-project/commit/4254ed5c325c4a366a5f763487822414df6a0de4 DIFF: https://github.com/llvm/llvm-project/commit/4254ed5c325c4a366a5f763487822414df6a0de4.diff LOG: [Legalizer] Fix wrong operand in split vector helper This should be a typo introduced in D69275, which may cause an unknown segment fault in getNode. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D83376 Added: Modified: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 15d88eb5811f..550174f0df72 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -2610,9 +2610,9 @@ SDValue DAGTypeLegalizer::SplitVecOp_TruncateHelper(SDNode *N) { SDValue Chain; if (N->isStrictFPOpcode()) { HalfLo = DAG.getNode(N->getOpcode(), DL, {HalfVT, MVT::Other}, - {N->getOperand(0), HalfLo}); + {N->getOperand(0), InLoVec}); HalfHi = DAG.getNode(N->getOpcode(), DL, {HalfVT, MVT::Other}, - {N->getOperand(0), HalfHi}); + {N->getOperand(0), InHiVec}); // Legalize the chain result - switch anything that used the old chain to // use the new one. 
Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, HalfLo.getValue(1), From llvm-commits at lists.llvm.org Wed Jul 8 19:01:23 2020 From: llvm-commits at lists.llvm.org (Qiu Chaofan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:01:23 +0000 (UTC) Subject: [PATCH] D83376: [Legalizer] Fix wrong operand in split vector helper In-Reply-To: References: Message-ID: <23b081efe84cb7bd63558f0aa8191342@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG4254ed5c325c: [Legalizer] Fix wrong operand in split vector helper (authored by qiucf). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83376/new/ https://reviews.llvm.org/D83376 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -2610,9 +2610,9 @@ SDValue Chain; if (N->isStrictFPOpcode()) { HalfLo = DAG.getNode(N->getOpcode(), DL, {HalfVT, MVT::Other}, - {N->getOperand(0), HalfLo}); + {N->getOperand(0), InLoVec}); HalfHi = DAG.getNode(N->getOpcode(), DL, {HalfVT, MVT::Other}, - {N->getOperand(0), HalfHi}); + {N->getOperand(0), InHiVec}); // Legalize the chain result - switch anything that used the old chain to // use the new one. Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, HalfLo.getValue(1), -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83376.276618.patch Type: text/x-patch Size: 858 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 19:06:05 2020 From: llvm-commits at lists.llvm.org (Bing Yu via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:06:05 +0000 (UTC) Subject: [PATCH] D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks. In-Reply-To: References: Message-ID: <76ded0d4cb5f78cdbfc5679af0d01bee@localhost.localdomain> yubing added a comment. In D81791#2138692 , @RKSimon wrote: > @yubing @pengfei @craig.topper Please can you confirm the regressions have now been addressed? Thanks, Simon~ Your patch can solve our bug. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81791/new/ https://reviews.llvm.org/D81791 From llvm-commits at lists.llvm.org Wed Jul 8 19:06:26 2020 From: llvm-commits at lists.llvm.org (Qiu Chaofan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:06:26 +0000 (UTC) Subject: [PATCH] D83437: [PowerPC] Enable default support of quad precision operations In-Reply-To: References: Message-ID: <55ae7e966e2ff37696cc3a3b1ffa39d4@localhost.localdomain> qiucf added inline comments. ================ Comment at: llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll:57 define fp128 @testSubOdd(fp128 %a, fp128 %b) { -entry: - %0 = call fp128 @llvm.ppc.subf128.round.to.odd(fp128 %a, fp128 %b) +entry: %0 = call fp128 @llvm.ppc.subf128.round.to.odd(fp128 %a, fp128 %b) ret fp128 %0 ---------------- Misindent? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83437/new/ https://reviews.llvm.org/D83437 From llvm-commits at lists.llvm.org Wed Jul 8 19:12:55 2020 From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:12:55 +0000 (UTC) Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner In-Reply-To: References: Message-ID: mtrofin updated this revision to Diff 276619. mtrofin added a comment. removed the lit unnecessary (yet) changes, and added a name for the analysis in the registry Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 Files: llvm/CMakeLists.txt llvm/include/llvm/Analysis/InlineSizeEstimatorAnalysis.h llvm/include/llvm/Analysis/Utils/TFUtils.h llvm/lib/Analysis/CMakeLists.txt llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp llvm/lib/Analysis/TFUtils.cpp llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassRegistry.def llvm/unittests/Analysis/CMakeLists.txt llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/saved_model.pb llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.data-00000-of-00001 llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.index llvm/unittests/Analysis/TFUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82817.276619.patch Type: text/x-patch Size: 34759 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 19:31:52 2020 From: llvm-commits at lists.llvm.org (JunMa via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:31:52 +0000 (UTC) Subject: [PATCH] D83379: [Coroutines] Refactor sinkLifetimeStartMarkers In-Reply-To: References: Message-ID: <0b04cf15125a4931044dfb35130ca157@localhost.localdomain> junparser marked an inline comment as done. 
junparser added inline comments. ================ Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1578 + auto isUsedByLifetimeStart = [&](Instruction *I) { + if (isa(I) && I->hasOneUse()) + if (auto *IT = dyn_cast(I->user_back())) ---------------- lxfind wrote: > If I is a BitCastInst, wouldn't it be used by both lifetime.start and lifetime.end intrinsics, and hence has more than one user? Since sinkLifetimeStartMarkers is called after rewriteMaterializableInstructions, so if BitCastInst both used by lifetime.start and lifetime.end, then I should not cross the suspend point. So I believe that we can use hasOneUse here Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83379/new/ https://reviews.llvm.org/D83379 From llvm-commits at lists.llvm.org Wed Jul 8 19:32:44 2020 From: llvm-commits at lists.llvm.org (Varun Gandhi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:32:44 +0000 (UTC) Subject: [PATCH] D83449: [llvm] Add contains(KeyType) -> bool methods to Set types. Message-ID: varungandhi-apple created this revision. varungandhi-apple added a reviewer: rjmccall. Herald added subscribers: llvm-commits, dexonsmith. Herald added a project: LLVM. Add C++20-esque contains method for sets. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83449 Files: llvm/include/llvm/ADT/DenseSet.h llvm/include/llvm/ADT/SetVector.h llvm/include/llvm/ADT/SmallPtrSet.h llvm/include/llvm/ADT/SmallSet.h llvm/include/llvm/ADT/SparseSet.h llvm/include/llvm/ADT/StringSet.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83449.276623.patch Type: text/x-patch Size: 3276 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 19:35:08 2020 From: llvm-commits at lists.llvm.org (Varun Gandhi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:35:08 +0000 (UTC) Subject: [PATCH] D83449: [llvm] Add contains(KeyType) -> bool methods to Set types. 
In-Reply-To: References: Message-ID: <71f138f7f80aaee11ac3a0fd4f68e7cd@localhost.localdomain> varungandhi-apple added a comment. Prior art: - C++20: https://en.cppreference.com/w/cpp/container/set/contains - Rust: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.contains - Swift: https://developer.apple.com/documentation/swift/set Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83449/new/ https://reviews.llvm.org/D83449 From llvm-commits at lists.llvm.org Wed Jul 8 19:40:22 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:40:22 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: <6bf833aed0ecbe341ee881688b5ea9c0@localhost.localdomain> plotfi added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1515 // in the epilogue, the residual adjustment is executed first. - uint64_t ArgumentPopSize = 0; - if (IsTailCallReturn) { - MachineOperand &StackAdjust = MBBI->getOperand(1); - - // For a tail-call in a callee-pops-arguments environment, some or all of - // the stack may actually be in use for the call's arguments, this is - // calculated during LowerCall and consumed here... - ArgumentPopSize = StackAdjust.getImm(); - } else { - // ... otherwise the amount to pop is *all* of the argument space, - // conveniently stored in the MachineFunctionInfo by - // LowerFormalArguments. This will, of course, be zero for the C calling - // convention. - ArgumentPopSize = AFI->getArgumentStackToRestore(); - } + uint64_t ArgumentPopSize = getArgumentPopSize(MF, MBB); ---------------- Is the bit that was removed a non-functional change here? If so, can this be a separate NFC commit? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 From llvm-commits at lists.llvm.org Wed Jul 8 19:59:14 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:59:14 +0000 (UTC) Subject: [PATCH] D82941: [StackSafety,NFC] Update documentation In-Reply-To: References: Message-ID: vitalybuka updated this revision to Diff 276624. vitalybuka marked 4 inline comments as done. vitalybuka added a comment. update Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82941/new/ https://reviews.llvm.org/D82941 Files: llvm/docs/LangRef.rst llvm/include/llvm/Analysis/StackSafetyAnalysis.h llvm/include/llvm/IR/ModuleSummaryIndex.h llvm/lib/Analysis/StackSafetyAnalysis.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82941.276624.patch Type: text/x-patch Size: 6346 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 19:59:27 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 02:59:27 +0000 (UTC) Subject: [PATCH] D82941: [StackSafety,NFC] Update documentation In-Reply-To: References: Message-ID: vitalybuka marked an inline comment as done. vitalybuka added inline comments. ================ Comment at: llvm/docs/LangRef.rst:6857 +parameter. ``calls`` is empty, so the parameter is either not used for function +calls or ``offset`` already covers all accesses from nested function calls. Then +function itself can access just a single byte of the parameter #3. Additional ---------------- tejohnson wrote: > nit: s/Then/The/ > > I assume if it is passed directly to another call and we don't know the accesses within that callee (this is the per-module summary) that it would have a calls entry with offset [0,0]? 
> I assume if it is passed directly to another call and we don't know the accesses within that callee (this is the per-module summary) that it would have a calls entry with offset [0,0]? calls list is only needed when we don't know accesses inside of callee. If we know accesses we can directly apply them to offset of parameter (offset: [5, 5]) if we don't know accesses we still know offset used to pass the parameter I've added function body into the example. ================ Comment at: llvm/include/llvm/IR/ModuleSummaryIndex.h:557 + /// Describes the uses of a parameter by the range of byte offsets of direct + /// access in in the function and by all of the call targets it is passed to. struct ParamAccess { ---------------- tejohnson wrote: > nit: s/in in/in/ > > The "by all of the call targets it is passed to" sounds ambiguous to me, as if we already have knowledge of how it is accessed within those callees. I guess this is the situation after the thin link, but not initially. > > Probably best to be more explicit. I.e., from my understanding reading your LangRef writeup: > - In the per-module summary, the Calls vector summarize the byte offset applied to each pointer parameter before passing to each corresponding callee. I.e. this structure describes offsets computed from the pointer parameter within the function only > - In the combined summary, the offsets may include the offsets within each called function consuming this pointer parameter. Two questions: I assume that happens after some kind of inter-procedural propagation across the combined summary? After that propagation, is the Calls list empty? I moved this description close to fields. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82941/new/ https://reviews.llvm.org/D82941 From llvm-commits at lists.llvm.org Wed Jul 8 20:01:13 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:01:13 +0000 (UTC) Subject: [PATCH] D82941: [StackSafety,NFC] Update documentation In-Reply-To: References: Message-ID: <1cf20fb44738f0ec40011584bc247128@localhost.localdomain> vitalybuka updated this revision to Diff 276626. vitalybuka marked an inline comment as done. vitalybuka added a comment. update Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82941/new/ https://reviews.llvm.org/D82941 Files: llvm/docs/LangRef.rst llvm/include/llvm/Analysis/StackSafetyAnalysis.h llvm/include/llvm/IR/ModuleSummaryIndex.h llvm/lib/Analysis/StackSafetyAnalysis.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82941.276626.patch Type: text/x-patch Size: 6378 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 20:04:43 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:04:43 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <821de3a7a28883ce91dc73f0f7fa1c63@localhost.localdomain> smeenai added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:32 +# FORMAT-NEXT: [[PREFIX]]-input2.o +# FORMAT_NOT: {{.}} + ---------------- sameerarora101 wrote: > smeenai wrote: > > You have an underscore instead of a dash :) > > > > Is the purpose to ensure that there's no other members? I assume a -EMPTY would work for that. We should also check for the "Archive : " header, to ensure there's no members before the table of contents member. > Thanks for catching the underscore. 
> > I added a check for "Archive : " header now. However, using `FORMAT-EMPTY` would just check that the next line (after 2nd member) has nothing on it. What I thought we wanted to check was that there is nothing at all after the second member. For eg, > > ``` > Archive : ... > ... __.SYMDEF > ...-input1.o > ...-input2.o > > something here > ``` > would pass with `FORMAT-EMPTY` just below the check for second member, right? But we want it to fail. You're right, that makes sense. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Wed Jul 8 20:06:24 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:06:24 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: smeenai accepted this revision. smeenai added a comment. This revision is now accepted and ready to land. LGTM, but please wait for @jhenderson too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Wed Jul 8 20:09:10 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:09:10 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <3a47f088df3250528ef27fec968dceb5@localhost.localdomain> smeenai added inline comments. ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:18 + +static cl::opt OutputFile("output", + cl::desc("Specify output filename"), ---------------- sameerarora101 wrote: > smeenai wrote: > > As far as I can see, cctools libtool doesn't support the `-output` spelling, only `-o`. Is there any reason for us to support it? > Yup, that is true. 
I was just looking at other llvm tools and they have `-output` in addition to `-o`. So I thought of adding both. I can remove `-output` if you guys prefer that? I'd prefer to remove it, to mimic cctools libtool's interface as closely as possible. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Wed Jul 8 20:16:20 2020 From: llvm-commits at lists.llvm.org (Varun Gandhi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:16:20 +0000 (UTC) Subject: [PATCH] D83449: [llvm] Add contains(KeyType) -> bool methods to Set types. In-Reply-To: References: Message-ID: varungandhi-apple added a comment. I don't understand the `clang-tidy`/`clang-format` complaints. I've formatted the code based on how the surrounding code is formatted. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83449/new/ https://reviews.llvm.org/D83449 From llvm-commits at lists.llvm.org Wed Jul 8 20:24:21 2020 From: llvm-commits at lists.llvm.org (Jason Liu via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:24:21 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <647da0754f01621d7096175fc172dd63@localhost.localdomain> jasonliu added inline comments. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:849 +static SmallString<32> parseParaType(uint32_t Value, unsigned ParaNum) { + SmallString<32> ParaType; + for (unsigned I = 0; I < ParaNum; ++I) { ---------------- hubert.reinterpretcast wrote: > jasonliu wrote: > > Why always 32? As I mentioned in the other comment, ParaNum have implication for how large your SmallString could be. > The template argument needs to be a compile time constant. Yes, sorry I missed that. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Wed Jul 8 20:43:52 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:43:52 +0000 (UTC) Subject: [PATCH] D82941: [StackSafety,NFC] Update documentation In-Reply-To: References: Message-ID: <2f9261257ee10eafa68754f8a7af103a@localhost.localdomain> tejohnson accepted this revision. tejohnson added a comment. This revision is now accepted and ready to land. LGTM, thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82941/new/ https://reviews.llvm.org/D82941 From llvm-commits at lists.llvm.org Wed Jul 8 20:46:34 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Wed, 08 Jul 2020 20:46:34 -0700 (PDT) Subject: [compiler-rt] 371c94f - Fix a typo in an error message. Message-ID: <5f06931a.1c69fb81.16e7.3657@mx.google.com> Author: Eric Christopher Date: 2020-07-08T20:43:05-07:00 New Revision: 371c94fca039bb85298756305758a56af129a1ce URL: https://github.com/llvm/llvm-project/commit/371c94fca039bb85298756305758a56af129a1ce DIFF: https://github.com/llvm/llvm-project/commit/371c94fca039bb85298756305758a56af129a1ce.diff LOG: Fix a typo in an error message. 
Added: Modified: compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp Removed: ################################################################################ diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp index dffe2c9c4737..91caa6a35693 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp @@ -2210,7 +2210,7 @@ void CheckNoDeepBind(const char *filename, int flag) { if (flag & RTLD_DEEPBIND) { Report( "You are trying to dlopen a %s shared library with RTLD_DEEPBIND flag" - " which is incompatibe with sanitizer runtime " + " which is incompatible with sanitizer runtime " "(see https://github.com/google/sanitizers/issues/611 for details" "). If you want to run %s library under sanitizers please remove " "RTLD_DEEPBIND from dlopen flags.\n", From llvm-commits at lists.llvm.org Wed Jul 8 20:53:10 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:53:10 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class. Message-ID: Higuoxing created this revision. Higuoxing added reviewers: jhenderson, grimar, MaskRay. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Virtual functions should be overridden in the derived class DIEFixupVisitor rather than declare a new set of virtual functions. 
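As a rough illustration of the pattern described above, the sketch below uses hypothetical stand-in classes (`Visitor` and `LengthVisitor`, not the real `DWARFYAML::Visitor` and `DIEFixupVisitor`). It shows a derived class marking its hooks `override`, so the compiler verifies that each one matches a virtual function in the base class; without `override`, a mismatched signature would quietly declare a new, unrelated function.

```cpp
#include <cstdint>

// Hypothetical stand-in for a visitor base class with overloaded hooks.
class Visitor {
public:
  virtual ~Visitor() = default;

  // Drives the hooks once each; a real visitor would walk a data structure.
  void traverse() {
    onValue(std::uint8_t{0});
    onValue(std::uint32_t{0});
  }

protected:
  virtual void onValue(const std::uint8_t) {}
  virtual void onValue(const std::uint32_t) {}
};

// Hypothetical derived visitor that accumulates an encoded length.
class LengthVisitor : public Visitor {
public:
  std::uint64_t Length = 0;

protected:
  // 'override' makes the compiler reject these declarations if no matching
  // virtual function exists in Visitor.
  void onValue(const std::uint8_t) override { Length += 1; }
  void onValue(const std::uint32_t) override { Length += 4; }
};
```

Calling `traverse()` on a `LengthVisitor` dispatches to the derived hooks, so `Length` ends up as 1 + 4 = 5 in this sketch.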
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83452 Files: llvm/lib/ObjectYAML/DWARFEmitter.cpp Index: llvm/lib/ObjectYAML/DWARFEmitter.cpp =================================================================== --- llvm/lib/ObjectYAML/DWARFEmitter.cpp +++ llvm/lib/ObjectYAML/DWARFEmitter.cpp @@ -440,36 +440,36 @@ public: DIEFixupVisitor(DWARFYAML::Data &DI) : DWARFYAML::Visitor(DI){}; -private: - virtual void onStartCompileUnit(DWARFYAML::Unit &CU) { +protected: + void onStartCompileUnit(DWARFYAML::Unit &CU) override { // Size of the unit header, excluding the length field itself. Length = CU.Version >= 5 ? 8 : 7; } - virtual void onEndCompileUnit(DWARFYAML::Unit &CU) { CU.Length = Length; } + void onEndCompileUnit(DWARFYAML::Unit &CU) override { CU.Length = Length; } - virtual void onStartDIE(DWARFYAML::Unit &CU, DWARFYAML::Entry &DIE) { + void onStartDIE(DWARFYAML::Unit &CU, DWARFYAML::Entry &DIE) override { Length += getULEB128Size(DIE.AbbrCode); } - virtual void onValue(const uint8_t U) { Length += 1; } - virtual void onValue(const uint16_t U) { Length += 2; } - virtual void onValue(const uint32_t U) { Length += 4; } - virtual void onValue(const uint64_t U, const bool LEB = false) { + void onValue(const uint8_t U) override { Length += 1; } + void onValue(const uint16_t U) override { Length += 2; } + void onValue(const uint32_t U) override { Length += 4; } + void onValue(const uint64_t U, const bool LEB = false) override { if (LEB) Length += getULEB128Size(U); else Length += 8; } - virtual void onValue(const int64_t S, const bool LEB = false) { + void onValue(const int64_t S, const bool LEB = false) override { if (LEB) Length += getSLEB128Size(S); else Length += 8; } - virtual void onValue(const StringRef String) { Length += String.size() + 1; } + void onValue(const StringRef String) override { Length += String.size() + 1; } - virtual void onValue(const MemoryBufferRef MBR) { + void onValue(const MemoryBufferRef MBR) override { Length += 
MBR.getBufferSize(); } }; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83452.276629.patch Type: text/x-patch Size: 2020 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 20:56:19 2020 From: llvm-commits at lists.llvm.org (Xing GUO via llvm-commits) Date: Wed, 08 Jul 2020 20:56:19 -0700 (PDT) Subject: [llvm] 683a1bb - [DWARFYAML][unittest] Refactor parseDWARFYAML(). Message-ID: <5f069563.1c69fb81.c8cf9.3911@mx.google.com> Author: Xing GUO Date: 2020-07-09T12:00:22+08:00 New Revision: 683a1bb253ef47ece27aad93812f22e8c51260fa URL: https://github.com/llvm/llvm-project/commit/683a1bb253ef47ece27aad93812f22e8c51260fa DIFF: https://github.com/llvm/llvm-project/commit/683a1bb253ef47ece27aad93812f22e8c51260fa.diff LOG: [DWARFYAML][unittest] Refactor parseDWARFYAML(). In this change, `parseDWARFYAML()` is refactored to be able to parse a YAML description into different data structures. We don't have to craft the whole DWARF structure for a small test in the future.
Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D83220 Added: Modified: llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp Removed: ################################################################################ diff --git a/llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp b/llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp index e93773999ad3..a2addc8f2fa5 100644 --- a/llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp +++ b/llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp @@ -16,13 +16,7 @@ using namespace llvm; -static Expected parseDWARFYAML(StringRef Yaml, - bool IsLittleEndian = false, - bool Is64bit = true) { - DWARFYAML::Data Data; - Data.IsLittleEndian = IsLittleEndian; - Data.Is64bit = Is64bit; - +template static Error parseDWARFYAML(StringRef Yaml, T &Data) { SMDiagnostic GenerateDiag; yaml::Input YIn( Yaml, /*Ctxt=*/nullptr, @@ -35,7 +29,7 @@ static Expected parseDWARFYAML(StringRef Yaml, if (YIn.error()) return createStringError(YIn.error(), GenerateDiag.getMessage()); - return Data; + return Error::success(); } TEST(DebugAddrSection, TestParseDebugAddrYAML) { @@ -45,31 +39,29 @@ TEST(DebugAddrSection, TestParseDebugAddrYAML) { Length: 0x1234 Version: 5 )"; - auto DWARFOrErr = parseDWARFYAML(Yaml); - EXPECT_THAT_EXPECTED(DWARFOrErr, Succeeded()); + DWARFYAML::Data Data; + EXPECT_THAT_ERROR(parseDWARFYAML(Yaml, Data), Succeeded()); } TEST(DebugAddrSection, TestMissingVersion) { StringRef Yaml = R"( -debug_addr: - - Format: DWARF64 - Length: 0x1234 +Format: DWARF64 +Length: 0x1234 )"; - auto DWARFOrErr = parseDWARFYAML(Yaml); - EXPECT_THAT_ERROR(DWARFOrErr.takeError(), + DWARFYAML::AddrTableEntry AddrTableEntry; + EXPECT_THAT_ERROR(parseDWARFYAML(Yaml, AddrTableEntry), FailedWithMessage("missing required key 'Version'")); } TEST(DebugAddrSection, TestUnexpectedKey) { StringRef Yaml = R"( -debug_addr: - - Format: DWARF64 - Length: 0x1234 - Version: 5 - Blah: unexpected +Format: DWARF64 +Length: 0x1234 +Version: 5 +Blah: unexpected )"; - auto DWARFOrErr = 
parseDWARFYAML(Yaml); - EXPECT_THAT_ERROR(DWARFOrErr.takeError(), + DWARFYAML::AddrTableEntry AddrTableEntry; + EXPECT_THAT_ERROR(parseDWARFYAML(Yaml, AddrTableEntry), FailedWithMessage("unknown key 'Blah'")); } @@ -98,11 +90,11 @@ TEST(DebugPubSection, TestDebugPubSection) { - DieOffset: 0x4321 Name: def )"; - auto DWARFOrErr = parseDWARFYAML(Yaml); - ASSERT_THAT_EXPECTED(DWARFOrErr, Succeeded()); + DWARFYAML::Data Data; + ASSERT_THAT_ERROR(parseDWARFYAML(Yaml, Data), Succeeded()); - ASSERT_TRUE(DWARFOrErr->PubNames.hasValue()); - DWARFYAML::PubSection PubNames = DWARFOrErr->PubNames.getValue(); + ASSERT_TRUE(Data.PubNames.hasValue()); + DWARFYAML::PubSection PubNames = Data.PubNames.getValue(); ASSERT_EQ(PubNames.Entries.size(), 2u); EXPECT_EQ((uint32_t)PubNames.Entries[0].DieOffset, 0x1234u); @@ -110,8 +102,8 @@ TEST(DebugPubSection, TestDebugPubSection) { EXPECT_EQ((uint32_t)PubNames.Entries[1].DieOffset, 0x4321u); EXPECT_EQ(PubNames.Entries[1].Name, "def"); - ASSERT_TRUE(DWARFOrErr->PubTypes.hasValue()); - DWARFYAML::PubSection PubTypes = DWARFOrErr->PubTypes.getValue(); + ASSERT_TRUE(Data.PubTypes.hasValue()); + DWARFYAML::PubSection PubTypes = Data.PubTypes.getValue(); ASSERT_EQ(PubTypes.Entries.size(), 2u); EXPECT_EQ((uint32_t)PubTypes.Entries[0].DieOffset, 0x1234u); @@ -133,8 +125,8 @@ TEST(DebugPubSection, TestUnexpectedDescriptor) { Descriptor: 0x12 Name: abcd )"; - auto DWARFOrErr = parseDWARFYAML(Yaml); - EXPECT_THAT_ERROR(DWARFOrErr.takeError(), + DWARFYAML::Data Data; + EXPECT_THAT_ERROR(parseDWARFYAML(Yaml, Data), FailedWithMessage("unknown key 'Descriptor'")); } @@ -167,11 +159,11 @@ TEST(DebugGNUPubSection, TestDebugGNUPubSections) { Descriptor: 0x34 Name: def )"; - auto DWARFOrErr = parseDWARFYAML(Yaml); - ASSERT_THAT_EXPECTED(DWARFOrErr, Succeeded()); + DWARFYAML::Data Data; + ASSERT_THAT_ERROR(parseDWARFYAML(Yaml, Data), Succeeded()); - ASSERT_TRUE(DWARFOrErr->GNUPubNames.hasValue()); - DWARFYAML::PubSection GNUPubNames = 
DWARFOrErr->GNUPubNames.getValue(); + ASSERT_TRUE(Data.GNUPubNames.hasValue()); + DWARFYAML::PubSection GNUPubNames = Data.GNUPubNames.getValue(); ASSERT_EQ(GNUPubNames.Entries.size(), 2u); EXPECT_EQ((uint32_t)GNUPubNames.Entries[0].DieOffset, 0x1234u); @@ -181,8 +173,8 @@ TEST(DebugGNUPubSection, TestDebugGNUPubSections) { EXPECT_EQ((uint8_t)GNUPubNames.Entries[1].Descriptor, 0x34); EXPECT_EQ(GNUPubNames.Entries[1].Name, "def"); - ASSERT_TRUE(DWARFOrErr->GNUPubTypes.hasValue()); - DWARFYAML::PubSection GNUPubTypes = DWARFOrErr->GNUPubTypes.getValue(); + ASSERT_TRUE(Data.GNUPubTypes.hasValue()); + DWARFYAML::PubSection GNUPubTypes = Data.GNUPubTypes.getValue(); ASSERT_EQ(GNUPubTypes.Entries.size(), 2u); EXPECT_EQ((uint32_t)GNUPubTypes.Entries[0].DieOffset, 0x1234u); @@ -205,7 +197,7 @@ TEST(DebugGNUPubSection, TestMissingDescriptor) { - DieOffset: 0x1234 Name: abcd )"; - auto DWARFOrErr = parseDWARFYAML(Yaml); - EXPECT_THAT_ERROR(DWARFOrErr.takeError(), + DWARFYAML::Data Data; + EXPECT_THAT_ERROR(parseDWARFYAML(Yaml, Data), FailedWithMessage("missing required key 'Descriptor'")); } From llvm-commits at lists.llvm.org Wed Jul 8 20:56:29 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 03:56:29 +0000 (UTC) Subject: [PATCH] D83220: [DWARFYAML][unittest] Refactor parseDWARFYAML(). In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG683a1bb253ef: [DWARFYAML][unittest] Refactor parseDWARFYAML(). (authored by Higuoxing). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83220/new/ https://reviews.llvm.org/D83220 Files: llvm/unittests/ObjectYAML/DWARFYAMLTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83220.276630.patch Type: text/x-patch Size: 4898 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 21:14:15 2020 From: llvm-commits at lists.llvm.org (Lang Hames via llvm-commits) Date: Wed, 08 Jul 2020 21:14:15 -0700 (PDT) Subject: [llvm] 6709150 - [ORC] Modify LazyCallThroughManager to support asynchronous resolution. Message-ID: <5f069997.1c69fb81.f9d47.455f@mx.google.com> Author: Lang Hames Date: 2020-07-08T21:13:55-07:00 New Revision: 670915094462d831e3733e5b01a76471b8cf6dd8 URL: https://github.com/llvm/llvm-project/commit/670915094462d831e3733e5b01a76471b8cf6dd8 DIFF: https://github.com/llvm/llvm-project/commit/670915094462d831e3733e5b01a76471b8cf6dd8.diff LOG: [ORC] Modify LazyCallThroughManager to support asynchronous resolution. Asynchronous resolution is a better fit for handling reentry over IPC/RPC where we want to avoid blocking a communication handler/thread. Added: Modified: llvm/include/llvm/ExecutionEngine/Orc/IndirectionUtils.h llvm/include/llvm/ExecutionEngine/Orc/LazyReexports.h llvm/lib/ExecutionEngine/Orc/LazyReexports.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/ExecutionEngine/Orc/IndirectionUtils.h b/llvm/include/llvm/ExecutionEngine/Orc/IndirectionUtils.h index 4f1393296279..b3e2bddd716b 100644 --- a/llvm/include/llvm/ExecutionEngine/Orc/IndirectionUtils.h +++ b/llvm/include/llvm/ExecutionEngine/Orc/IndirectionUtils.h @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -53,6 +54,13 @@ namespace orc { /// are used by various ORC APIs to support lazy compilation class TrampolinePool { public: + using NotifyLandingResolvedFunction = + unique_function; + + using ResolveLandingFunction = unique_function; + virtual ~TrampolinePool() {} /// Get an available trampoline address. @@ -66,18 +74,15 @@ class TrampolinePool { /// A trampoline pool for trampolines within the current process. 
template class LocalTrampolinePool : public TrampolinePool { public: - using GetTrampolineLandingFunction = - std::function; - /// Creates a LocalTrampolinePool with the given RunCallback function. /// Returns an error if this function is unable to correctly allocate, write /// and protect the resolver code block. static Expected> - Create(GetTrampolineLandingFunction GetTrampolineLanding) { + Create(ResolveLandingFunction ResolveLanding) { Error Err = Error::success(); auto LTP = std::unique_ptr( - new LocalTrampolinePool(std::move(GetTrampolineLanding), Err)); + new LocalTrampolinePool(std::move(ResolveLanding), Err)); if (Err) return std::move(Err); @@ -108,13 +113,19 @@ template class LocalTrampolinePool : public TrampolinePool { static JITTargetAddress reenter(void *TrampolinePoolPtr, void *TrampolineId) { LocalTrampolinePool *TrampolinePool = static_cast(TrampolinePoolPtr); - return TrampolinePool->GetTrampolineLanding(static_cast( - reinterpret_cast(TrampolineId))); + + std::promise LandingAddressP; + auto LandingAddressF = LandingAddressP.get_future(); + + TrampolinePool->ResolveLanding(pointerToJITTargetAddress(TrampolineId), + [&](JITTargetAddress LandingAddress) { + LandingAddressP.set_value(LandingAddress); + }); + return LandingAddressF.get(); } - LocalTrampolinePool(GetTrampolineLandingFunction GetTrampolineLanding, - Error &Err) - : GetTrampolineLanding(std::move(GetTrampolineLanding)) { + LocalTrampolinePool(ResolveLandingFunction ResolveLanding, Error &Err) + : ResolveLanding(std::move(ResolveLanding)) { ErrorAsOutParameter _(&Err); @@ -173,7 +184,7 @@ template class LocalTrampolinePool : public TrampolinePool { return Error::success(); } - GetTrampolineLandingFunction GetTrampolineLanding; + ResolveLandingFunction ResolveLanding; std::mutex LTPMutex; sys::OwningMemoryBlock ResolverBlock; @@ -241,10 +252,14 @@ class LocalJITCompileCallbackManager : public JITCompileCallbackManager { JITTargetAddress ErrorHandlerAddress, Error &Err) : 
JITCompileCallbackManager(nullptr, ES, ErrorHandlerAddress) { + using NotifyLandingResolvedFunction = + TrampolinePool::NotifyLandingResolvedFunction; + ErrorAsOutParameter _(&Err); auto TP = LocalTrampolinePool::Create( - [this](JITTargetAddress TrampolineAddr) { - return executeCompileCallback(TrampolineAddr); + [this](JITTargetAddress TrampolineAddr, + NotifyLandingResolvedFunction NotifyLandingResolved) { + NotifyLandingResolved(executeCompileCallback(TrampolineAddr)); }); if (!TP) { diff --git a/llvm/include/llvm/ExecutionEngine/Orc/LazyReexports.h b/llvm/include/llvm/ExecutionEngine/Orc/LazyReexports.h index 407dc6b7e34f..01a2b9712e9a 100644 --- a/llvm/include/llvm/ExecutionEngine/Orc/LazyReexports.h +++ b/llvm/include/llvm/ExecutionEngine/Orc/LazyReexports.h @@ -47,6 +47,9 @@ class LazyCallThroughManager { NotifyResolvedFunction NotifyResolved); protected: + using NotifyLandingResolvedFunction = + TrampolinePool::NotifyLandingResolvedFunction; + LazyCallThroughManager(ExecutionSession &ES, JITTargetAddress ErrorHandlerAddr, std::unique_ptr TP); @@ -56,16 +59,13 @@ class LazyCallThroughManager { SymbolStringPtr SymbolName; }; + JITTargetAddress reportCallThroughError(Error Err); Expected findReexport(JITTargetAddress TrampolineAddr); - Expected resolveSymbol(const ReexportsEntry &RE); - Error notifyResolved(JITTargetAddress TrampolineAddr, JITTargetAddress ResolvedAddr); - - JITTargetAddress reportCallThroughError(Error Err) { - ES.reportError(std::move(Err)); - return ErrorHandlerAddr; - } + void resolveTrampolineLandingAddress( + JITTargetAddress TrampolineAddr, + NotifyLandingResolvedFunction NotifyLandingResolved); void setTrampolinePool(std::unique_ptr TP) { this->TP = std::move(TP); @@ -87,14 +87,19 @@ class LazyCallThroughManager { /// A lazy call-through manager that builds trampolines in the current process. 
class LocalLazyCallThroughManager : public LazyCallThroughManager { private: + using NotifyTargetResolved = unique_function; + LocalLazyCallThroughManager(ExecutionSession &ES, JITTargetAddress ErrorHandlerAddr) : LazyCallThroughManager(ES, ErrorHandlerAddr, nullptr) {} template Error init() { auto TP = LocalTrampolinePool::Create( - [this](JITTargetAddress TrampolineAddr) { - return callThroughToSymbol(TrampolineAddr); + [this](JITTargetAddress TrampolineAddr, + TrampolinePool::NotifyLandingResolvedFunction + NotifyLandingResolved) { + resolveTrampolineLandingAddress(TrampolineAddr, + std::move(NotifyLandingResolved)); }); if (!TP) @@ -104,21 +109,6 @@ class LocalLazyCallThroughManager : public LazyCallThroughManager { return Error::success(); } - JITTargetAddress callThroughToSymbol(JITTargetAddress TrampolineAddr) { - auto Entry = findReexport(TrampolineAddr); - if (!Entry) - return reportCallThroughError(Entry.takeError()); - - auto ResolvedAddr = resolveSymbol(std::move(*Entry)); - if (!ResolvedAddr) - return reportCallThroughError(ResolvedAddr.takeError()); - - if (Error Err = notifyResolved(TrampolineAddr, *ResolvedAddr)) - return reportCallThroughError(std::move(Err)); - - return *ResolvedAddr; - } - public: /// Create a LocalLazyCallThroughManager using the given ABI. See /// createLocalLazyCallThroughManager. 
diff --git a/llvm/lib/ExecutionEngine/Orc/LazyReexports.cpp b/llvm/lib/ExecutionEngine/Orc/LazyReexports.cpp index 2812159b0076..ff66955082d8 100644 --- a/llvm/lib/ExecutionEngine/Orc/LazyReexports.cpp +++ b/llvm/lib/ExecutionEngine/Orc/LazyReexports.cpp @@ -35,6 +35,11 @@ Expected LazyCallThroughManager::getCallThroughTrampoline( return *Trampoline; } +JITTargetAddress LazyCallThroughManager::reportCallThroughError(Error Err) { + ES.reportError(std::move(Err)); + return ErrorHandlerAddr; +} + Expected LazyCallThroughManager::findReexport(JITTargetAddress TrampolineAddr) { std::lock_guard Lock(LCTMMutex); @@ -46,19 +51,6 @@ LazyCallThroughManager::findReexport(JITTargetAddress TrampolineAddr) { return I->second; } -Expected -LazyCallThroughManager::resolveSymbol(const ReexportsEntry &RE) { - auto LookupResult = - ES.lookup(makeJITDylibSearchOrder(RE.SourceJD, - JITDylibLookupFlags::MatchAllSymbols), - RE.SymbolName, SymbolState::Ready); - - if (!LookupResult) - return LookupResult.takeError(); - - return LookupResult->getAddress(); -} - Error LazyCallThroughManager::notifyResolved(JITTargetAddress TrampolineAddr, JITTargetAddress ResolvedAddr) { NotifyResolvedFunction NotifyResolved; @@ -74,6 +66,37 @@ Error LazyCallThroughManager::notifyResolved(JITTargetAddress TrampolineAddr, return NotifyResolved ? 
NotifyResolved(ResolvedAddr) : Error::success(); } +void LazyCallThroughManager::resolveTrampolineLandingAddress( + JITTargetAddress TrampolineAddr, + NotifyLandingResolvedFunction NotifyLandingResolved) { + + auto Entry = findReexport(TrampolineAddr); + if (!Entry) + return NotifyLandingResolved(reportCallThroughError(Entry.takeError())); + + ES.lookup( + LookupKind::Static, + makeJITDylibSearchOrder(Entry->SourceJD, + JITDylibLookupFlags::MatchAllSymbols), + SymbolLookupSet({Entry->SymbolName}), SymbolState::Ready, + [this, TrampolineAddr, SymbolName = Entry->SymbolName, + NotifyLandingResolved = std::move(NotifyLandingResolved)]( + Expected Result) mutable { + if (Result) { + assert(Result->size() == 1 && "Unexpected result size"); + assert(Result->count(SymbolName) && "Unexpected result value"); + JITTargetAddress LandingAddr = (*Result)[SymbolName].getAddress(); + + if (auto Err = notifyResolved(TrampolineAddr, LandingAddr)) + NotifyLandingResolved(reportCallThroughError(std::move(Err))); + else + NotifyLandingResolved(LandingAddr); + } else + NotifyLandingResolved(reportCallThroughError(Result.takeError())); + }, + NoDependenciesToRegister); +} + Expected> createLocalLazyCallThroughManager(const Triple &T, ExecutionSession &ES, JITTargetAddress ErrorHandlerAddr) { From llvm-commits at lists.llvm.org Wed Jul 8 21:52:20 2020 From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 04:52:20 +0000 (UTC) Subject: [PATCH] D71124: [RISCV] support clang driver to select cpu In-Reply-To: References: Message-ID: <6ea7dcc9837ea1d188531c6621dbd523@localhost.localdomain> asb added a comment. This has been hanging around for a while, but I think we'd basically agreed this is the right logic. The comments have ended up referring to flags that don't exist on Clang making it a little hard to follow, and I've added a request to slightly expand testing. 
If you make those cleanups I think it should be ready for a final review and merge. As Sam says, let's flag this in today's RISC-V LLVM call to double-check everyone is happy. ================ Comment at: clang/lib/Driver/ToolChains/Arch/RISCV.cpp:622 + // 1. Explicit choices using `--with-arch=` + // 2. Based on `-mcpu` if target cpu has default isa extension feature + // 3. A default based on `--with-abi=`, if provided ---------------- As clang has no with-arch or with-abi, this comment seems inaccurate? ================ Comment at: clang/test/Driver/riscv-cpus.c:2 +// Check target CPUs are correctly passed. + +// RUN: %clang -target riscv32 -### -c %s 2>&1 -mcpu=rocket-rv32 | FileCheck -check-prefix=MCPU-ROCKETCHIP32 %s ---------------- I think for completeness this test should be validating the interaction of the ABI choosing logic with CPU selection as well. With the implemented logic I believe it should show that lp64d is selected for -mcpu=sifive-u54 and that -mcpu=sifive-u54 -mabi=lp64 will respect the ABI choice. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71124/new/ https://reviews.llvm.org/D71124 From llvm-commits at lists.llvm.org Wed Jul 8 21:54:11 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 04:54:11 +0000 (UTC) Subject: [PATCH] D82886: [DebugInfo] Fix a possible crash when reading a malformed .debug_*lists section. In-Reply-To: References: Message-ID: ikudrin added a comment. In D82886#2139657 , @dblaikie wrote: > I'm not suggesting it needs to be fixed - but that that codepath (the one that returns zero) is untested - so when it was committed, it was committed without test coverage. It'd be good to add test coverage where it is missing like this. Isn't adding that test coverage orthogonal to this particular patch?
> Right - but what I mean is if there's only 10 bytes, as in your example - it reads the 4 bytes of DWARF64 mark, then 6 bytes out of the desired 8 - if the length was then reported as 10 (with an error saying the length was garbled/the contents terminated earlier than expected), would that be adequate to no longer need the zero length special case? I am OK with the current convention that if it is not possible to read the length field the code returns zero as the total length. It might be better to make the result `Optional` and return `None` in that case, but I really doubt it is worth investing time in that. Reporting something that was not read from the section (10 in your example) smells not good for me, but you may provide the patch if you feel like the code will be improved. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82886/new/ https://reviews.llvm.org/D82886 From llvm-commits at lists.llvm.org Wed Jul 8 22:08:12 2020 From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 05:08:12 +0000 (UTC) Subject: [PATCH] D83159: [RISCV] Add a new codegen test In-Reply-To: References: Message-ID: <1a10aa1127d38b4a6c108754ee5e2519@localhost.localdomain> asb added inline comments. ================ Comment at: llvm/test/CodeGen/RISCV/addimm-mulimm.ll:1 +; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV32IM %s ---------------- MaskRay wrote: > It'd be better adding a file-level comment. 
+1 on adding a comment to this file explaining what it's aiming to demonstrate. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83159/new/ https://reviews.llvm.org/D83159 From llvm-commits at lists.llvm.org Wed Jul 8 22:18:11 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 05:18:11 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: <81d153ebd2646014df0994abb36ceb20@localhost.localdomain> plotfi added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:241 + +bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const { + return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF); ---------------- This is so small that I feel it would be more descriptive at the call site of SavedRegs.set/test to have: ``` /// true if CSRs should be paired const bool producePairRegisters = produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF); ``` With some additional comments on the register pairing in the context of homogeneous-prolog-epilog. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 From llvm-commits at lists.llvm.org Wed Jul 8 22:19:55 2020 From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 05:19:55 +0000 (UTC) Subject: [PATCH] D80802: [RISCV] Upgrade RVV MC to v0.9. In-Reply-To: References: Message-ID: <6a4e8ed0230210c6194995de93626fe1@localhost.localdomain> asb added a comment. I've gone through and can't see any obvious issues. I defer to one of the RISC-V Vector extension usual suspects for giving a LGTM on the detail of the altered instructions etc. Once we have that, this looks good to land IMHO.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80802/new/ https://reviews.llvm.org/D80802 From llvm-commits at lists.llvm.org Wed Jul 8 22:25:55 2020 From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 05:25:55 +0000 (UTC) Subject: [PATCH] D77443: [RISCV] Fix RISCVInstrInfo::getInstSizeInBytes for atomics pseudos In-Reply-To: References: Message-ID: asb accepted this revision. asb added a comment. LGTM, +1 on adding a comment to the expansion functions noting the need to update getInstSizeInBytes. Thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77443/new/ https://reviews.llvm.org/D77443 From llvm-commits at lists.llvm.org Wed Jul 8 22:34:33 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 05:34:33 +0000 (UTC) Subject: [PATCH] D83455: [X86] Immediately call LowerShift from lowerBuildVectorToBitOp. Message-ID: craig.topper created this revision. craig.topper added reviewers: spatel, RKSimon. Herald added a subscriber: hiraditya. Herald added a project: LLVM. If we don't immediately lower the vector shift, the splat constant vector we created may get turned into a constant pool load before we get around to lowering the shift. This makes it a lot more difficult to create a shift by constant. Sometimes we fail to see through the constant pool at all and end up trying to lower as if it were a variable shift. This requires custom handling and may create an unsupported vselect on pre-SSE4.1 targets. Since we're after LegalizeVectorOps, we are unable to legalize the unsupported vselect, as that code is in LegalizeDAG. So calling LowerShift immediately ensures that we get to see the splat constant. Fixes PR46527.
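The ordering problem described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the code in X86ISelLowering.cpp: `Operand`, `OperandKind`, `materializeConstant`, `lowerShift`, `lowerDeferred` and `lowerImmediately` are hypothetical names invented for this sketch and are not LLVM APIs. It shows only why lowering the shift while the amount is still visibly a splat constant picks the shift-by-immediate form, whereas lowering after the constant has become an opaque constant pool load falls back to the variable-shift path.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-ins for DAG operands. These are hypothetical and NOT LLVM types.
enum class OperandKind { SplatConstant, ConstantPoolLoad };

struct Operand {
  OperandKind Kind;
  uint64_t SplatValue; // Only meaningful while Kind == SplatConstant.
};

// Models a later lowering step that replaces a constant build vector with a
// load from the constant pool, which hides the splat value from later code.
Operand materializeConstant(Operand Op) {
  if (Op.Kind == OperandKind::SplatConstant)
    return {OperandKind::ConstantPoolLoad, 0};
  return Op;
}

// Models shift lowering: the shift-by-immediate form is only chosen when the
// shift amount is still recognizable as a splat constant.
std::string lowerShift(const Operand &Amt) {
  if (Amt.Kind == OperandKind::SplatConstant)
    return "shift-by-immediate " + std::to_string(Amt.SplatValue);
  return "variable-shift";
}

// Deferred ordering: the constant is materialized before the shift is lowered.
std::string lowerDeferred(Operand Amt) {
  return lowerShift(materializeConstant(Amt));
}

// Immediate ordering: the shift is lowered right where the splat is created.
std::string lowerImmediately(Operand Amt) { return lowerShift(Amt); }

// Small self-check contrasting the two orderings for a splat of 1.
void checkOrdering() {
  const Operand One{OperandKind::SplatConstant, 1};
  assert(lowerImmediately(One) == "shift-by-immediate 1");
  assert(lowerDeferred(One) == "variable-shift");
}
```

In this toy model the only difference between the two paths is when `lowerShift` runs relative to `materializeConstant`, which mirrors the motivation given for calling LowerShift directly from lowerBuildVectorToBitOp.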
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83455 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/pr46527.ll Index: llvm/test/CodeGen/X86/pr46527.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/X86/pr46527.ll @@ -0,0 +1,36 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +;RUN: llc < %s -mtriple=i686-unknown -mattr=sse2 -relocation-model=pic | FileCheck %s + +define void @f(<16 x i8>* %out, <16 x i8> %in, i1 %flag) { +; CHECK-LABEL: f: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: calll .L0$pb +; CHECK-NEXT: .cfi_adjust_cfa_offset 4 +; CHECK-NEXT: .L0$pb: +; CHECK-NEXT: popl %eax +; CHECK-NEXT: .cfi_adjust_cfa_offset -4 +; CHECK-NEXT: .Ltmp0: +; CHECK-NEXT: addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax +; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx +; CHECK-NEXT: movb {{[0-9]+}}(%esp), %dl +; CHECK-NEXT: notb %dl +; CHECK-NEXT: andb $1, %dl +; CHECK-NEXT: movzbl %dl, %edx +; CHECK-NEXT: movd %edx, %xmm1 +; CHECK-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7] +; CHECK-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[0,0,2,3,4,5,6,7] +; CHECK-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0] +; CHECK-NEXT: paddb %xmm1, %xmm1 +; CHECK-NEXT: pxor %xmm0, %xmm1 +; CHECK-NEXT: pxor {{\.LCPI.*}}@GOTOFF(%eax), %xmm1 +; CHECK-NEXT: movdqa %xmm1, (%ecx) +; CHECK-NEXT: retl +entry: + %0 = select i1 %flag, i8 0, i8 2 + %1 = insertelement <16 x i8> undef, i8 %0, i32 0 + %2 = shufflevector <16 x i8> %1, <16 x i8> undef, <16 x i32> zeroinitializer + %3 = xor <16 x i8> %2, %in + %4 = xor <16 x i8> %3, + store <16 x i8> %4, <16 x i8>* %out, align 16 + ret void +} Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -9689,6 +9689,9 @@ return SDValue(); } +static SDValue LowerShift(SDValue Op, const 
X86Subtarget &Subtarget, + SelectionDAG &DAG); + /// If a BUILD_VECTOR's source elements all apply the same bit operation and /// one of their operands is constant, lower to a pair of BUILD_VECTOR and /// just apply the bit to the vectors. @@ -9696,6 +9699,7 @@ /// from this, but enough scalar bit operations are created from the later /// legalization + scalarization stages to need basic support. static SDValue lowerBuildVectorToBitOp(BuildVectorSDNode *Op, + const X86Subtarget &Subtarget, SelectionDAG &DAG) { SDLoc DL(Op); MVT VT = Op->getSimpleValueType(0); @@ -9759,7 +9763,14 @@ SDValue LHS = DAG.getBuildVector(VT, DL, LHSElts); SDValue RHS = DAG.getBuildVector(VT, DL, RHSElts); - return DAG.getNode(Opcode, DL, VT, LHS, RHS); + SDValue Res = DAG.getNode(Opcode, DL, VT, LHS, RHS); + + if (!IsShift) + return Res; + + // Immediately lower the shift to ensure the constant build vector doesn't + // get converted to a constant pool before the shift is lowered. + return LowerShift(Res, Subtarget, DAG); } /// Create a vector constant without a load. SSE/AVX provide the bare minimum @@ -10115,7 +10126,7 @@ return HorizontalOp; if (SDValue Broadcast = lowerBuildVectorAsBroadcast(BV, Subtarget, DAG)) return Broadcast; - if (SDValue BitOp = lowerBuildVectorToBitOp(BV, DAG)) + if (SDValue BitOp = lowerBuildVectorToBitOp(BV, Subtarget, DAG)) return BitOp; unsigned EVTBits = EltVT.getSizeInBits(); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83455.276635.patch Type: text/x-patch Size: 3589 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 22:37:58 2020 From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 05:37:58 +0000 (UTC) Subject: [PATCH] D82988: [RISCV] Avoid Splitting MBB in RISCVExpandPseudo In-Reply-To: References: Message-ID: asb requested changes to this revision. asb added a comment. This revision now requires changes to proceed. 
This is a nice simplification, thanks. My only request before committing is to split out the RISCVTargetMachine to a separate pass, as that is logically distinct. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82988/new/ https://reviews.llvm.org/D82988 From llvm-commits at lists.llvm.org Wed Jul 8 22:39:23 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 05:39:23 +0000 (UTC) Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining In-Reply-To: References: Message-ID: <0d194514c148652b0900821e79cdb584@localhost.localdomain> MaskRay added a comment. In D83262#2140642 , @MaskRay wrote: > In D83262#2135629 , @jhenderson wrote: > > > I'm personally fine with dropping this if it's not actually useful for you, as I don't have any use-case for it at the current time. Re. --no-inlining, I have a slight preference for not adding it, but I'm also okay with it being added, if you'd find it less confusing. I didn't know about the functionality of `=0` to disable a flag in LLVM tools when I first came to the project myself, so it could be a little confusing. I actually added --no-demangle precisely for that reason. > > > I have a slightly stronger opinion that we should not add it. We could improve help messages for `cl::opt` to mention the default value. Having given it more thought, perhaps we should switch llvm-symbolizer to llvm-objcopy style `OptTable`. Many `llvm::cl::opt` based tools are not user facing (llc/opt). `llvm::cl::opt` is quick and easy. For user facing utilities (clang/lld/objcopy), OptTable may be more suitable as OptTable can be customized to be similar to the most common GNU-style getopt_long behavior. llvm-readobj/llvm-objdump are a bit special: they don't have defaulted-to-true `llvm::cl::opt`. If they do, we may face a similar conundrum to `--no-demangle`.
>> Originally I was interested in mapping a list of addresses to the names of the addresses of functions that appear in the binary from which the addresses came (these addresses are coming from instrumentation, e.g. -finstrument-function-entry-bare). After a bit of thought (and trial & error) I think I've concluded I actually do want source file information... it seems symbol names are duplicated across compilation units more often than I had originally expected. > > Do you still have a need for output without source info? We add options if there is a reasonable use case, not that "we add it just to customize a behavior". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83262/new/ https://reviews.llvm.org/D83262 From llvm-commits at lists.llvm.org Wed Jul 8 23:24:49 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:24:49 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: MaskRay added a comment. In D79978#2140575 , @tmsriram wrote: > In D79978#2138208 , @MaskRay wrote: > > > I have studied CFIInstrInserter in May. If you don't mind, please give me some time to review as well. > > > > For `basic-block-sections-cfiinstr_1.ll`, have you considered places like `CodeGen/X86/cfi-inserter-*`? You may even create a subdirectory there. > > `_1` is not very common. `-1` is more common. > > > Done. > > > `curl -L 'https://reviews.llvm.org/D79978?download=1'` does not have a/ or b/ prefix. I think that may be why `arc patch D79978` cannot apply the patch. > > Can you upload a diff with either `arc diff`, git format-patch -1 or `git diff 'HEAD^'`? Thanks. > > Uploaded after git diff HEAD^. Please see if this works. 
Your diff includes: --- llvm/include/llvm/CodeGen/TargetFrameLowering.h +++ llvm/include/llvm/CodeGen/TargetFrameLowering.h `git diff 'HEAD^'` includes a prefix: --- c/llvm/include/llvm/CodeGen/TargetFrameLowering.h +++ w/llvm/include/llvm/CodeGen/TargetFrameLowering.h CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Wed Jul 8 23:27:23 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:27:23 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <862cd480cd27a0c858a1526a313a2f3a@localhost.localdomain> MaskRay requested changes to this revision. MaskRay added a comment. This revision now requires changes to proceed. I haven't looked into the details, but the test suggests that the patch is wrong:

    .section .text,"ax",@progbits,unique,2
  _Z2f3b.2:                  # %if.end
    .cfi_startproc
    .cfi_def_cfa %rbp, 16    # this should be inserted after addq $16, %rsp
    .cfi_offset %rbp, -16    # this should be after .cfi_def_cfa %rbp, 16
    addq $16, %rsp
    popq %rbp
    .cfi_def_cfa %rsp, 8
    retq

CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Wed Jul 8 23:35:50 2020 From: llvm-commits at lists.llvm.org (Xun Li via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:35:50 +0000 (UTC) Subject: [PATCH] D83379: [Coroutines] Refactor sinkLifetimeStartMarkers In-Reply-To: References: Message-ID: <1e1fcb07e67a43c5f8c461a09e99f10f@localhost.localdomain> lxfind accepted this revision. lxfind added a comment. This revision is now accepted and ready to land. Thank you!
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83379/new/ https://reviews.llvm.org/D83379 From llvm-commits at lists.llvm.org Wed Jul 8 23:38:18 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:38:18 +0000 (UTC) Subject: [PATCH] D83242: [RFC][BPF] support expr with typedef type for FIELD_EXISTENCE reloc In-Reply-To: References: Message-ID: yonghong-song added a comment. Sounds good. Will try to add support for union/struct type as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83242/new/ https://reviews.llvm.org/D83242 From llvm-commits at lists.llvm.org Wed Jul 8 23:38:32 2020 From: llvm-commits at lists.llvm.org (Shawn Landden via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:38:32 +0000 (UTC) Subject: [PATCH] D83392: Strlen loop idiom recognition In-Reply-To: References: Message-ID: <37aabcaca3866cbd11c93894cd73a42f@localhost.localdomain> shawnl added a comment. Strlen on NULL is undefined, so looking for a NULL check makes no sense. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83392/new/ https://reviews.llvm.org/D83392 From llvm-commits at lists.llvm.org Wed Jul 8 23:45:06 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Wed, 08 Jul 2020 23:45:06 -0700 (PDT) Subject: [llvm] c96877f - [X86] Remove unnecessary union from getHostCPUFeatures. NFC Message-ID: <5f06bcf2.1c69fb81.1c798.56f3@mx.google.com> Author: Craig Topper Date: 2020-07-08T23:42:05-07:00 New Revision: c96877ff62253aeeaba4ea165861045263d25207 URL: https://github.com/llvm/llvm-project/commit/c96877ff62253aeeaba4ea165861045263d25207 DIFF: https://github.com/llvm/llvm-project/commit/c96877ff62253aeeaba4ea165861045263d25207.diff LOG: [X86] Remove unnecessary union from getHostCPUFeatures. 
NFC This seems to be leftover copied from an older implementation of getHostCPUName where we needed this to check the name of CPU vendor. We don't check the CPU vendor at all in getHostCPUFeatures so this union and the variable are unneeded. Added: Modified: llvm/lib/Support/Host.cpp Removed: ################################################################################ diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index db99612c97b5..adfb599f55ff 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -1361,13 +1361,8 @@ int sys::getHostNumPhysicalCores() { bool sys::getHostCPUFeatures(StringMap<bool> &Features) { unsigned EAX = 0, EBX = 0, ECX = 0, EDX = 0; unsigned MaxLevel; - union { - unsigned u[3]; - char c[12]; - } text; - if (getX86CpuIDAndInfo(0, &MaxLevel, text.u + 0, text.u + 2, text.u + 1) || - MaxLevel < 1) + if (getX86CpuIDAndInfo(0, &MaxLevel, &EBX, &ECX, &EDX) || MaxLevel < 1) return false; getX86CpuIDAndInfo(1, &EAX, &EBX, &ECX, &EDX); From llvm-commits at lists.llvm.org Wed Jul 8 23:45:31 2020 From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:45:31 +0000 (UTC) Subject: [PATCH] D81805: [RISCV] Fix isStoreToStackSlot In-Reply-To: References: Message-ID: asb accepted this revision. asb added a comment. This looks good to me, good catch. I do see a codegen change on regstack-1.c from the GCC Torture Suite, so it might be worth having a quick look to see if it's easy to make a test case based on that.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81805/new/ https://reviews.llvm.org/D81805 From llvm-commits at lists.llvm.org Wed Jul 8 23:50:23 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:50:23 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <16e57ccfcecbefd01643f13f575dbf25@localhost.localdomain> serge-sans-paille added a subscriber: fhahn. serge-sans-paille added a comment. Note: I had to revert this because I only tested the X86 target, and other targets suffer from a lot of « don't update return status » errors. cc @fhahn Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Wed Jul 8 23:57:52 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via llvm-commits) Date: Wed, 08 Jul 2020 23:57:52 -0700 (PDT) Subject: [llvm] e38727a - [StackSafety,NFC] Update documentation Message-ID: <5f06bff0.1c69fb81.2a6cc.5882@mx.google.com> Author: Vitaly Buka Date: 2020-07-08T23:57:13-07:00 New Revision: e38727a0bbbf2a0ee8f29458163a56f3c821f010 URL: https://github.com/llvm/llvm-project/commit/e38727a0bbbf2a0ee8f29458163a56f3c821f010 DIFF: https://github.com/llvm/llvm-project/commit/e38727a0bbbf2a0ee8f29458163a56f3c821f010.diff LOG: [StackSafety,NFC] Update documentation It's follow up for D80908 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D82941 Added: Modified: llvm/docs/LangRef.rst llvm/include/llvm/Analysis/StackSafetyAnalysis.h llvm/include/llvm/IR/ModuleSummaryIndex.h llvm/lib/Analysis/StackSafetyAnalysis.cpp Removed: ################################################################################ diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 566d761d3072..c2d6200e67fa 100644 --- a/llvm/docs/LangRef.rst +++ 
b/llvm/docs/LangRef.rst @@ -6843,7 +6843,9 @@ function and looks like: param: 4, offset: [0, 5][, calls: ((Callee)[, (Callee)]*)]? where the first ``param`` is the number of the parameter it describes, -``offset`` is the known access range of the paramenter inside of the function. +``offset`` is the inclusive range of offsets from the pointer parameter to bytes +which can be accessed by the function. This range does not include accesses by +function calls from ``calls`` list. where each ``Callee`` decribes how parameter is forwared into other functions and looks like: @@ -6854,7 +6856,44 @@ functions and looks like: The ``callee`` refers to the summary entry id of the callee, ``param`` is the number of the callee parameter which points into the callers parameter -with offset known to be inside of the ``offset`` range. +with offset known to be inside of the ``offset`` range. ``calls`` will be +consumed and removed by thin link stage to update ``Param::offset`` so it +covers all accesses possible by ``calls``. + +Pointer parameter without corresponding ``Param`` is considered unsafe and we +assume that access with any offset is possible. + +Example: + +If we have the following function: + +.. code-block:: text + + define i64 @foo(i64* %0, i32* %1, i8* %2, i8 %3) { + store i32* %1, i32** @x + %5 = getelementptr inbounds i8, i8* %2, i64 5 + %6 = load i8, i8* %5 + %7 = getelementptr inbounds i8, i8* %2, i8 %3 + tail call void @bar(i8 %3, i8* %7) + %8 = load i64, i64* %0 + ret i64 %8 + } + +We can expect the record like this: + +.. code-block:: text + + params: ((param: 0, offset: [0, 7]),(param: 2, offset: [5, 5], calls: ((callee: ^3, param: 1, offset: [-128, 127])))) + +The function may access just 8 bytes of the paramenter %0 . ``calls`` is empty, +so the parameter is either not used for function calls or ``offset`` already +covers all accesses from nested function calls. +Parameter %1 escapes, so access is unknown. 
+The function itself can access just a single byte of the parameter %2. Additional +access is possible inside of the ``@bar`` or ``^3``. The function adds signed +offset to the pointer and passes the result as the argument %1 into ``^3``. +This record itself does not tell us how ``^3`` will access the parameter. +Parameter %3 is not a pointer. .. _refs_summary: diff --git a/llvm/include/llvm/Analysis/StackSafetyAnalysis.h b/llvm/include/llvm/Analysis/StackSafetyAnalysis.h index 3ee520eb0411..846c2e6f7e91 100644 --- a/llvm/include/llvm/Analysis/StackSafetyAnalysis.h +++ b/llvm/include/llvm/Analysis/StackSafetyAnalysis.h @@ -45,6 +45,12 @@ class StackSafetyInfo { void print(raw_ostream &O) const; /// Parameters use for a FunctionSummary. + /// Function collects access information of all pointer parameters. + /// Information includes a range of direct access of parameters by the + /// functions and all call sites accepting the parameter. + /// StackSafety assumes that missing parameter information means possibility + /// of access to the parameter with any offset, so we can correctly link + /// code without StackSafety information, e.g. non-ThinLTO. std::vector getParamAccesses() const; }; diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h index 595c7b7d4da0..9adaf5dfc3d3 100644 --- a/llvm/include/llvm/IR/ModuleSummaryIndex.h +++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -553,8 +553,7 @@ class FunctionSummary : public GlobalValueSummary { unsigned AlwaysInline : 1; }; - /// Describes the uses of a parameter by the range of offsets accessed in the - /// function and all of the call targets it is passed to. + /// Describes the uses of a parameter by the function. 
struct ParamAccess { static constexpr uint32_t RangeWidth = 64; @@ -564,7 +563,7 @@ class FunctionSummary : public GlobalValueSummary { struct Call { uint64_t ParamNo = 0; GlobalValue::GUID Callee = 0; - ConstantRange Offsets{RangeWidth, true}; + ConstantRange Offsets{/*BitWidth=*/RangeWidth, /*isFullSet=*/true}; Call() = default; Call(uint64_t ParamNo, GlobalValue::GUID Callee, @@ -573,7 +572,15 @@ class FunctionSummary : public GlobalValueSummary { }; uint64_t ParamNo = 0; - ConstantRange Use{RangeWidth, true}; + /// The range contains byte offsets from the parameter pointer which + /// accessed by the function. In the per-module summary, it only includes + /// accesses made by the function instructions. In the combined summary, it + /// also includes accesses by nested function calls. + ConstantRange Use{/*BitWidth=*/RangeWidth, /*isFullSet=*/true}; + /// In the per-module summary, it summarizes the byte offset applied to each + /// pointer parameter before passing to each corresponding callee. + /// In the combined summary, it's empty and information is propagated by + /// inter-procedural analysis and applied to the Use field. std::vector Calls; ParamAccess() = default; diff --git a/llvm/lib/Analysis/StackSafetyAnalysis.cpp b/llvm/lib/Analysis/StackSafetyAnalysis.cpp index 793090cc9763..c737cf013608 100644 --- a/llvm/lib/Analysis/StackSafetyAnalysis.cpp +++ b/llvm/lib/Analysis/StackSafetyAnalysis.cpp @@ -748,13 +748,16 @@ const StackSafetyGlobalInfo::InfoTy &StackSafetyGlobalInfo::getInfo() const { return *Info; } -// Converts a StackSafetyFunctionInfo to the relevant FunctionSummary -// constructor fields std::vector StackSafetyInfo::getParamAccesses() const { + // Implementation transforms internal representation of parameter information + // into FunctionSummary format. 
std::vector ParamAccesses; for (const auto &KV : getInfo().Info.Params) { auto &PS = KV.second; + // Parameter accessed by any or unknown offset, represented as FullSet by + // StackSafety, is handled as the parameter for which we have no + // StackSafety info at all. So drop it to reduce summary size. if (PS.Range.isFullSet()) continue; @@ -763,6 +766,10 @@ StackSafetyInfo::getParamAccesses() const { Param.Calls.reserve(PS.Calls.size()); for (auto &C : PS.Calls) { + // Parameter forwarded into another function by any or unknown offset + // will make ParamAccess::Range as FullSet anyway. So we can drop the + // entire parameter like we did above. + // TODO(vitalybuka): Return already filtered parameters from getInfo(). if (C.Offset.isFullSet()) { ParamAccesses.pop_back(); break; From llvm-commits at lists.llvm.org Wed Jul 8 23:58:00 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 06:58:00 +0000 (UTC) Subject: [PATCH] D82941: [StackSafety,NFC] Update documentation In-Reply-To: References: Message-ID: <9072289b86a2392b750e43243a381e6f@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGe38727a0bbbf: [StackSafety,NFC] Update documentation (authored by vitalybuka). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82941/new/ https://reviews.llvm.org/D82941 Files: llvm/docs/LangRef.rst llvm/include/llvm/Analysis/StackSafetyAnalysis.h llvm/include/llvm/IR/ModuleSummaryIndex.h llvm/lib/Analysis/StackSafetyAnalysis.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82941.276636.patch Type: text/x-patch Size: 6378 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:08:29 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:08:29 +0000 (UTC) Subject: [PATCH] D83456: [NFC][AArch64] Refactor getArgumentPopSize Message-ID: kyulee created this revision. Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls. Herald added a project: LLVM. This refactors getArgumentPopSize() to prepare for D76570 Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83456 Files: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp Index: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp =================================================================== --- llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -177,6 +177,38 @@ STATISTIC(NumRedZoneFunctions, "Number of functions using red zone"); +/// Returns the argument pop size. +static uint64_t getArgumentPopSize(MachineFunction &MF, + MachineBasicBlock &MBB) { + MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr(); + bool IsTailCallReturn = false; + if (MBB.end() != MBBI) { + unsigned RetOpcode = MBBI->getOpcode(); + IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi || + RetOpcode == AArch64::TCRETURNri || + RetOpcode == AArch64::TCRETURNriBTI; + } + AArch64FunctionInfo *AFI = MF.getInfo(); + + uint64_t ArgumentPopSize = 0; + if (IsTailCallReturn) { + MachineOperand &StackAdjust = MBBI->getOperand(1); + + // For a tail-call in a callee-pops-arguments environment, some or all of + // the stack may actually be in use for the call's arguments, this is + // calculated during LowerCall and consumed here... + ArgumentPopSize = StackAdjust.getImm(); + } else { + // ... otherwise the amount to pop is *all* of the argument space, + // conveniently stored in the MachineFunctionInfo by + // LowerFormalArguments. 
This will, of course, be zero for the C calling + // convention. + ArgumentPopSize = AFI->getArgumentStackToRestore(); + } + + return ArgumentPopSize; +} + /// This is the biggest offset to the stack pointer we can encode in aarch64 /// instructions (without using a separate calculation and a temp register). /// Note that the exception here are vector stores/loads which cannot encode any @@ -1416,7 +1448,6 @@ const AArch64Subtarget &Subtarget = MF.getSubtarget(); const TargetInstrInfo *TII = Subtarget.getInstrInfo(); DebugLoc DL; - bool IsTailCallReturn = false; bool NeedsWinCFI = needsWinCFI(MF); bool HasWinCFI = false; bool IsFunclet = false; @@ -1427,10 +1458,6 @@ if (MBB.end() != MBBI) { DL = MBBI->getDebugLoc(); - unsigned RetOpcode = MBBI->getOpcode(); - IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi || - RetOpcode == AArch64::TCRETURNri || - RetOpcode == AArch64::TCRETURNriBTI; IsFunclet = isFuncletReturnInstr(*MBBI); } @@ -1445,21 +1472,7 @@ // Initial and residual are named for consistency with the prologue. Note that // in the epilogue, the residual adjustment is executed first. - uint64_t ArgumentPopSize = 0; - if (IsTailCallReturn) { - MachineOperand &StackAdjust = MBBI->getOperand(1); - - // For a tail-call in a callee-pops-arguments environment, some or all of - // the stack may actually be in use for the call's arguments, this is - // calculated during LowerCall and consumed here... - ArgumentPopSize = StackAdjust.getImm(); - } else { - // ... otherwise the amount to pop is *all* of the argument space, - // conveniently stored in the MachineFunctionInfo by - // LowerFormalArguments. This will, of course, be zero for the C calling - // convention. - ArgumentPopSize = AFI->getArgumentStackToRestore(); - } + uint64_t ArgumentPopSize = getArgumentPopSize(MF, MBB); // The stack frame should be like below, // -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83456.276639.patch Type: text/x-patch Size: 3477 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:08:29 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:08:29 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <12edc62a671eae2506106a565c52af62@localhost.localdomain> yonghong-song added a comment. Herald added a subscriber: ormris. @iii Thanks for the patch. Looks like you also addressed the issue for larger vlen for struct/union/enum/func_proto. Currently, we mostly just ignored these types. We issued fatal error for unhandled types (mostly non-C types). Such a compiler sanitization should be fine. I am thinking whether we should issue a compiler warning for such unsupported type or not. If we did, we will need an option to silence them. Not sure whether it is worth it or not since we did not issue warning right now for these types. For the code itself, is it possible to keep all visit function as `void` type and do unknown type generation whenever it hits (e.g., float/double, large vlen)? For unknown types, I still like to keep it as fatal error. If it did happen, we will see how to deal with it. I just do not want to silently convert it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Thu Jul 9 00:18:15 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:18:15 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: kyulee updated this revision to Diff 276640. kyulee added a comment. 
Refactor getArgumentPopSize() to D83456 Add comment on producePairRegisters() Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 Files: llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/AArch64/AArch64InstrInfo.td llvm/lib/Target/AArch64/AArch64LowerHomogeneousPrologEpilog.cpp llvm/lib/Target/AArch64/AArch64TargetMachine.cpp llvm/lib/Target/AArch64/CMakeLists.txt llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-frame-tail.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-no-helper.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D76570.276640.patch Type: text/x-patch Size: 44502 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:22:30 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:22:30 +0000 (UTC) Subject: [PATCH] D83456: [NFC][AArch64] Refactor getArgumentPopSize In-Reply-To: References: Message-ID: <28d6d2efa0ba79e715242cf2359eb259@localhost.localdomain> plotfi accepted this revision. plotfi added subscribers: thegameg, aemerson. plotfi added a comment. This revision is now accepted and ready to land. LGTM, but I'd like @thegameg or @aemerson to take a look as well. Thanks! 
================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:190 + RetOpcode == AArch64::TCRETURNriBTI; + } + AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>(); ---------------- I think this could be a nicer refactoring, for a future NFC, but ignore this for now (in the interest of straightforward NFC reviewing): ``` const bool IsTailCallReturn = (MBB.end() != MBBI) && [](unsigned RetOpcode) { return (RetOpcode == AArch64::TCRETURNdi || RetOpcode == AArch64::TCRETURNri || RetOpcode == AArch64::TCRETURNriBTI); } (MBBI->getOpcode()); ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83456/new/ https://reviews.llvm.org/D83456 From llvm-commits at lists.llvm.org Thu Jul 9 00:26:50 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:26:50 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: jhenderson accepted this revision. jhenderson added a comment. LGTM, with one optional suggestion. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:51 +## The warning is not yet implemented for llvm-libtool-darwin. +# RUN: llvm-libtool-darwin -static -o %t.lib %t-input1.o %t-input2.o %t-input1.o +# RUN: llvm-ar t %t.lib | \ ---------------- It might be worth adding a `2>&1` and checking that the output is empty here, to flag up if a warning starts getting emitted. That way, it points to where to add testing for the warning.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Thu Jul 9 00:27:13 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:27:13 +0000 (UTC) Subject: [PATCH] D66029: llvm-canon In-Reply-To: References: Message-ID: <6a63b0ad6d0b8c6b235e60017ad21e5f@localhost.localdomain> plotfi added a comment. In D66029#2052030, @mpaszkowski wrote: > @plotfi Should I create a new review so that the HarborMaster will be able to run the builds after the fix? If updating this review did not trigger it, go ahead and create a new one. Sorry for the late reply @mpaszkowski. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D66029/new/ https://reviews.llvm.org/D66029 From llvm-commits at lists.llvm.org Tue Jul 7 05:07:21 2020 From: llvm-commits at lists.llvm.org (Valery Pykhtin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 12:07:21 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: <6ca7cdaea22d9110fdf3a8b3af45a662@localhost.localdomain> vpykhtin added a comment. Can you please add a LiveInterval dump (possibly truncated) for the test before and after each move to this review? It's a bit hard to follow what happens there. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 From llvm-commits at lists.llvm.org Tue Jul 7 07:34:37 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 14:34:37 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: <59e9ebcccdad85532da1804a474777c0@localhost.localdomain> rampitec added a comment.
In D82916#2135752, @vpykhtin wrote: > Can you please add a LiveInterval dump (possibly truncated) for the test before and after each move to this review? It's a bit hard to follow what happens there. Sure! This is the LIS at each step w/o this patch:

Before 1st move:

%1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[96r,112r:1)[112r,160B:2) 0@16r 1@96r 2@112r 3@80r 4@32B-phi L000000000000000C [16r,32B:0)[32B,64r:1) 0@16r 1@32B-phi L00000000000000C0 [16r,16d:0) 0@16r L0000000000000003 [16r,16d:1)[96r,96d:0) 0@96r 1@16r L0000000000000030 [16r,32B:2)[32B,48r:3)[80r,80d:1)[112r,160B:0) 0@112r 1@80r 2@16r 3@32B-phi weight:0.000000e+00
%2 [48r,80r:0) 0@48r weight:0.000000e+00
%3 [64r,64d:0) 0@64r weight:0.000000e+00
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function func: NoPHIs

0B    bb.0:
        successors: %bb.1(0x80000000); %bb.1(100.00%)
16B     %1:sgpr_128 = IMPLICIT_DEF
32B   bb.1:
      ; predecessors: %bb.0, %bb.1
        successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
48B     %2:sgpr_32 = COPY %1.sub2:sgpr_128
64B     dead %3:sgpr_32 = COPY %1.sub1:sgpr_128
80B     dead %1.sub2:sgpr_128 = COPY %2:sgpr_32
96B     %1.sub0:sgpr_128 = IMPLICIT_DEF
112B    %1.sub2:sgpr_128 = IMPLICIT_DEF
128B    S_CBRANCH_SCC1 %bb.1, implicit undef $scc
144B    S_BRANCH %bb.2
160B  bb.2:
      ; predecessors: %bb.1

Before 2nd move (still correct):

%1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[88r,96r:1)[96r,160B:2) 0@16r 1@88r 2@96r 3@80r 4@32B-phi L000000000000000C [16r,32B:0)[32B,64r:1) 0@16r 1@32B-phi L00000000000000C0 [16r,16d:0) 0@16r L0000000000000003 [16r,16d:1)[96r,96d:0) 0@96r 1@16r L0000000000000030 [16r,32B:2)[32B,48r:3)[80r,80d:1)[88r,160B:0) 0@88r 1@80r 2@16r 3@32B-phi weight:0.000000e+00
%2 [48r,80r:0) 0@48r weight:0.000000e+00
%3 [64r,64d:0) 0@64r weight:0.000000e+00
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function func: NoPHIs

0B    bb.0:
        successors: %bb.1(0x80000000);
%bb.1(100.00%)
16B     %1:sgpr_128 = IMPLICIT_DEF
32B   bb.1:
      ; predecessors: %bb.0, %bb.1
        successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
48B     %2:sgpr_32 = COPY %1.sub2:sgpr_128
64B     dead %3:sgpr_32 = COPY %1.sub1:sgpr_128
80B     dead %1.sub2:sgpr_128 = COPY %2:sgpr_32
88B     %1.sub2:sgpr_128 = IMPLICIT_DEF
96B     %1.sub0:sgpr_128 = IMPLICIT_DEF
128B    S_CBRANCH_SCC1 %bb.1, implicit undef $scc
144B    S_BRANCH %bb.2
160B  bb.2:
      ; predecessors: %bb.1

After 2nd move from 64B to 92B, LIS is broken:

%1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[88r,96r:1)[96r,160B:2) 0@16r 1@88r 2@96r 3@80r 4@32B-phi L000000000000000C [16r,32B:0)[32B,92r:1) 0@16r 1@32B-phi L00000000000000C0 [16r,16d:0) 0@16r L0000000000000003 [16r,16d:1)[96r,96d:0) 0@96r 1@16r L0000000000000030 [16r,32B:2)[32B,48r:3)[80r,80d:1)[88r,160B:0) 0@88r 1@80r 2@16r 3@32B-phi weight:0.000000e+00
%2 [48r,80r:0) 0@48r weight:0.000000e+00
%3 [92r,92d:0) 0@92r weight:0.000000e+00
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function func: NoPHIs

0B    bb.0:
        successors: %bb.1(0x80000000); %bb.1(100.00%)
16B     %1:sgpr_128 = IMPLICIT_DEF
32B   bb.1:
      ; predecessors: %bb.0, %bb.1
        successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
48B     %2:sgpr_32 = COPY %1.sub2:sgpr_128
80B     dead %1.sub2:sgpr_128 = COPY %2:sgpr_32
88B     %1.sub2:sgpr_128 = IMPLICIT_DEF
92B     dead %3:sgpr_32 = COPY %1.sub1:sgpr_128
96B     %1.sub0:sgpr_128 = IMPLICIT_DEF
128B    S_CBRANCH_SCC1 %bb.1, implicit undef $scc
144B    S_BRANCH %bb.2
160B  bb.2:
      ; predecessors: %bb.1

*** Bad machine code: A Subrange is not covered by the main range ***
- function: func
- interval: %1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[88r,96r:1)[96r,160B:2) 0@16r 1@88r 2@96r 3@80r 4@32B-phi L000000000000000C [16r,32B:0)[32B,92r:1) 0@16r 1@32B-phi L00000000000000C0 [16r,16d:0) 0@16r L0000000000000003 [16r,16d:1)[96r,96d:0) 0@96r 1@16r L0000000000000030 
[16r,32B:2)[32B,48r:3)[80r,80d:1)[88r,160B:0) 0@88r 1@80r 2@16r 3@32B-phi weight:0.000000e+00
LLVM ERROR: Found 1 machine code errors.

The difference after the patch which occurs after the 2nd move:

%1 [16r,32B:0)[32B,80r:4)[80r,88r:3)[88r,96r:2)[96r,160B:1) 0@16r 1@96r 2@88r 3@80r 4@32B-phi

CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 From llvm-commits at lists.llvm.org Tue Jul 7 09:15:52 2020 From: llvm-commits at lists.llvm.org (Valery Pykhtin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:15:52 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: <09570da8a9402701090c418a4447e0f1@localhost.localdomain> vpykhtin added a comment. I understood your patch. Generally I think patching LIS with cleared undef flags isn't right in the first place because the semantics of register lifetime are ruined. After the first move there should be an undef flag set on the %1.sub2 = IMPLICIT_DEF instruction, which would break the live ranges of all subregs live at this point. Can we drop updating LIS during scheduling and recreate it from scratch? Or maybe fully recreate only the intervals for the registers involved in moves when the undef flag is patched after scheduling? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 From llvm-commits at lists.llvm.org Tue Jul 7 09:36:53 2020 From: llvm-commits at lists.llvm.org (Valery Pykhtin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:36:53 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: <6ff110eae96ef9965d04dc1ccfb9ee97@localhost.localdomain> vpykhtin accepted this revision. vpykhtin added a comment. This revision is now accepted and ready to land.
Given that the condition here is rare, I'm ok with the patch; reworking undef handling in the scheduler is a massive amount of work. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 From llvm-commits at lists.llvm.org Tue Jul 7 09:49:09 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 16:49:09 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: rampitec added a comment. In D82916#2136518, @vpykhtin wrote: > Given that the condition here is rare, I'm ok with the patch; reworking undef handling in the scheduler is a massive amount of work. Right. The scheduler deliberately drops undef flags before scheduling. To recreate them in the middle of scheduling would be against the concept. I assume there is nothing after the scheduler which needs them. Moreover, it looks like a complete recompute of an LI in the middle of scheduling is expensive, and a recompute of just some will simply result in an inconsistent LIS. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 From llvm-commits at lists.llvm.org Tue Jul 7 10:33:32 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 17:33:32 +0000 (UTC) Subject: [PATCH] D82227: SLP: honor requested max vector size merging PHIs In-Reply-To: References: Message-ID: <4d5c371b4df3c261401a7f87ec526a03@localhost.localdomain> spatel accepted this revision. spatel added a comment. This revision is now accepted and ready to land. We can't expect the backend to lower arbitrary vector IR and/or unlimited register pressure efficiently, so there's always going to be a need to limit IR in ways like this, so LGTM, but wait a bit to commit in case there are more comments.
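For readers following the D82916 thread above, the "subrange is not covered by the main range" condition and the rebuild-on-demand repair can be sketched in isolation. This is only a rough model, not LLVM code: `Segment`, `Range`, `unionOf`, `covers` and `repairMainRange` are hypothetical names, and slot indexes such as 80r or 92r are flattened to plain integers, so the distinction between register, dead and block slots is lost.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for a live range: half-open segments [start, end)
// over integer slot indexes. This is NOT LLVM's LiveRange class.
using Segment = std::pair<unsigned, unsigned>;
using Range = std::vector<Segment>;

// Union of several ranges, merging segments that overlap or touch.
Range unionOf(const std::vector<Range> &Ranges) {
  Range All;
  for (const Range &R : Ranges)
    All.insert(All.end(), R.begin(), R.end());
  std::sort(All.begin(), All.end());
  Range Out;
  for (const Segment &S : All) {
    if (!Out.empty() && S.first <= Out.back().second)
      Out.back().second = std::max(Out.back().second, S.second);
    else
      Out.push_back(S);
  }
  return Out;
}

// True if every segment of Sub lies inside Main. Adjacent segments of Main
// are merged first, so a hole in Main is the only way to fail.
bool covers(const Range &Main, const Range &Sub) {
  Range Merged = unionOf(std::vector<Range>{Main});
  for (const Segment &S : Sub) {
    bool Found = false;
    for (const Segment &M : Merged) {
      if (M.first <= S.first && S.second <= M.second) {
        Found = true;
        break;
      }
    }
    if (!Found)
      return false;
  }
  return true;
}

// Same shape as the loop added by the patch: leave Main alone in the common
// case, and rebuild it from the subranges only when one is not covered.
void repairMainRange(Range &Main, const std::vector<Range> &Subs) {
  for (const Range &S : Subs) {
    if (covers(Main, S))
      continue;
    Main = unionOf(Subs);
    break;
  }
}
```

With a main range of [16,32)[32,80)[88,96)[96,160) and a subrange [16,32)[32,92), loosely mirroring the broken dump, `covers` fails because of the hole between 80 and 88, and `repairMainRange` rebuilds the main range from the subranges. The cost argument in the thread is visible here too: the rebuild only runs when the cheap check fails.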
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82227/new/ https://reviews.llvm.org/D82227 From llvm-commits at lists.llvm.org Tue Jul 7 12:22:53 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 19:22:53 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: <2be4aae4f5a490655c66699bb2311a89@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG7c038726453b: LIS: fix handleMove to properly extend main range (authored by rampitec). Changed prior to commit: https://reviews.llvm.org/D82916?vs=274624&id=275681#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 Files: llvm/lib/CodeGen/LiveIntervals.cpp llvm/unittests/MI/LiveIntervalTest.cpp Index: llvm/unittests/MI/LiveIntervalTest.cpp =================================================================== --- llvm/unittests/MI/LiveIntervalTest.cpp +++ llvm/unittests/MI/LiveIntervalTest.cpp @@ -499,6 +499,26 @@ }); } +TEST(LiveIntervalTest, TestMoveSubRegUseAcrossMainRangeHole) { + liveIntervalTest(R"MIR( + %1:sgpr_128 = IMPLICIT_DEF + bb.1: + %2:sgpr_32 = COPY %1.sub2 + %3:sgpr_32 = COPY %1.sub1 + %1.sub2 = COPY %2 + undef %1.sub0 = IMPLICIT_DEF + %1.sub2 = IMPLICIT_DEF + S_CBRANCH_SCC1 %bb.1, implicit undef $scc + S_BRANCH %bb.2 + bb.2: +)MIR", [](MachineFunction &MF, LiveIntervals &LIS) { + MachineInstr &MI = getMI(MF, 3, /*BlockNum=*/1); + MI.getOperand(0).setIsUndef(false); + testHandleMove(MF, LIS, 4, 3, 1); + testHandleMove(MF, LIS, 1, 4, 1); + }); +} + TEST(LiveIntervalTest, BundleUse) { liveIntervalTest(R"MIR( %0 = IMPLICIT_DEF Index: llvm/lib/CodeGen/LiveIntervals.cpp =================================================================== --- 
llvm/lib/CodeGen/LiveIntervals.cpp +++ llvm/lib/CodeGen/LiveIntervals.cpp @@ -1011,6 +1011,20 @@ } } updateRange(LI, Reg, LaneBitmask::getNone()); + // If main range has a hole and we are moving a subrange use across + // the hole updateRange() cannot properly handle it since it only + // gets the LiveRange and not the whole LiveInterval. As a result + // we may end up with a main range not covering all subranges. + // This is extremely rare case, so let's check and reconstruct the + // main range. + for (LiveInterval::SubRange &S : LI.subranges()) { + if (LI.covers(S)) + continue; + LI.clear(); + LIS.constructMainRangeFromSubranges(LI); + break; + } + continue; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D82916.275681.patch Type: text/x-patch Size: 1851 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 16:13:10 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 23:13:10 +0000 (UTC) Subject: [PATCH] D80249: CodeGen: Don't lazily construct MachineFunctionInfo In-Reply-To: References: Message-ID: <6f44cbb3ed876d49b3b428433dd3b554@localhost.localdomain> aemerson added a comment. In D80249#2109204 , @arsenm wrote: > In D80249#2107516 , @kparzysz wrote: > > > A "create" function shouldn't return a null pointer. I think there should be a default MFI that has nothing in it, and it would be what the `createMachineFunctionInfo` returns a pointer to in the absence of overrides. It could even be statically allocated since is has no members: > > > > virtual MachineFunctionInfo * > > createMachineFunctionInfo(BumpPtrAllocator &Allocator, const Function &F, > > const TargetSubtargetInfo *STI) const { > > static MachineFunctionInfo default; > > return &default; > > } > > > > > > Edit: fix indentation in the code > > > Alternatively, make it pure virtual or make it llvm_unreachable? This is ok with me. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80249/new/ https://reviews.llvm.org/D80249 From llvm-commits at lists.llvm.org Wed Jul 8 08:19:48 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:19:48 +0000 (UTC) Subject: [PATCH] D82227: SLP: honor requested max vector size merging PHIs In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG64030099c378: SLP: honor requested max vector size merging PHIs (authored by rampitec). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82227/new/ https://reviews.llvm.org/D82227 Files: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/X86/remark_unsupported.ll llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82227.276444.patch Type: text/x-patch Size: 57072 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 08:45:50 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 15:45:50 +0000 (UTC) Subject: [PATCH] D82227: SLP: honor requested max vector size merging PHIs In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG64030099c378: SLP: honor requested max vector size merging PHIs (authored by rampitec). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82227/new/ https://reviews.llvm.org/D82227 Files: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/X86/remark_unsupported.ll llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82227.275710.patch Type: text/x-patch Size: 57072 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Tue Jul 7 11:52:46 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Tue, 07 Jul 2020 18:52:46 +0000 (UTC) Subject: [PATCH] D82916: LIS: fix handleMove to properly extend main range In-Reply-To: References: Message-ID: <3446a7dfa74e3ade156ea42fe413d770@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG7c038726453b: LIS: fix handleMove to properly extend main range (authored by rampitec). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82916/new/ https://reviews.llvm.org/D82916 Files: llvm/lib/CodeGen/LiveIntervals.cpp llvm/unittests/MI/LiveIntervalTest.cpp Index: llvm/unittests/MI/LiveIntervalTest.cpp =================================================================== --- llvm/unittests/MI/LiveIntervalTest.cpp +++ llvm/unittests/MI/LiveIntervalTest.cpp @@ -499,6 +499,26 @@ }); } +TEST(LiveIntervalTest, TestMoveSubRegUseAcrossMainRangeHole) { + liveIntervalTest(R"MIR( + %1:sgpr_128 = IMPLICIT_DEF + bb.1: + %2:sgpr_32 = COPY %1.sub2 + %3:sgpr_32 = COPY %1.sub1 + %1.sub2 = COPY %2 + undef %1.sub0 = IMPLICIT_DEF + %1.sub2 = IMPLICIT_DEF + S_CBRANCH_SCC1 %bb.1, implicit undef $scc + S_BRANCH %bb.2 + bb.2: +)MIR", [](MachineFunction &MF, LiveIntervals &LIS) { + MachineInstr &MI = getMI(MF, 3, /*BlockNum=*/1); + MI.getOperand(0).setIsUndef(false); + testHandleMove(MF, LIS, 4, 3, 1); + testHandleMove(MF, LIS, 1, 4, 1); + }); +} + TEST(LiveIntervalTest, BundleUse) { liveIntervalTest(R"MIR( %0 = IMPLICIT_DEF Index: llvm/lib/CodeGen/LiveIntervals.cpp =================================================================== --- llvm/lib/CodeGen/LiveIntervals.cpp +++ llvm/lib/CodeGen/LiveIntervals.cpp @@ -1011,6 +1011,20 @@ } } updateRange(LI, Reg, LaneBitmask::getNone()); + // If main range has a hole and we are moving a subrange use across + // the hole updateRange() cannot properly handle it since it only + // gets the LiveRange and not the whole LiveInterval. As a result + // we may end up with a main range not covering all subranges. + // This is extremely rare case, so let's check and reconstruct the + // main range. + for (LiveInterval::SubRange &S : LI.subranges()) { + if (LI.covers(S)) + continue; + LI.clear(); + LIS.constructMainRangeFromSubranges(LI); + break; + } + continue; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82916.276165.patch Type: text/x-patch Size: 1851 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 10:32:18 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 17:32:18 +0000 (UTC) Subject: [PATCH] D81678: Introduce noundef attribute at call sites for stricter poison analysis In-Reply-To: References: Message-ID: guiand updated this revision to Diff 276487. guiand added a comment. Per @nikic's suggestion, I isolated the LLVM side of the changes to a separate revision D83412 , which should be good to go. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81678/new/ https://reviews.llvm.org/D81678 Files: clang/include/clang/Basic/CodeGenOptions.def clang/include/clang/Driver/CC1Options.td clang/lib/CodeGen/CGCall.cpp clang/lib/Frontend/CompilerInvocation.cpp clang/test/CodeGen/indirect-noundef.c -------------- next part -------------- A non-text attachment was scrubbed... Name: D81678.276487.patch Type: text/x-patch Size: 5753 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Wed Jul 8 15:35:45 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Wed, 08 Jul 2020 22:35:45 +0000 (UTC) Subject: [PATCH] D70720: [llvm-objdump] Display locations of variables alongside disassembly In-Reply-To: References: Message-ID: MaskRay reopened this revision. MaskRay added a comment. This revision is now accepted and ready to land. I think this is still in a reverted state. `--debug-vars` is useful and I'd like to see it come back. Adding @dblaikie for some opinions. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70720/new/ https://reviews.llvm.org/D70720 From llvm-commits at lists.llvm.org Thu Jul 9 00:33:24 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:33:24 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <817cdf3a4d4da7b8fb6922e5cee8b35d@localhost.localdomain> nikic added a comment. Compile-time on CTMark is positive: https://llvm-compile-time-tracker.com/compare.php?from=0b39d2d75275b80994dac06b7ad05031cbd09393&to=7d4fbdce1f0c6bd62fedb6e6d462fceb1d34fd33&stat=instructions Text size mostly gets smaller as well: https://llvm-compile-time-tracker.com/compare.php?from=0b39d2d75275b80994dac06b7ad05031cbd09393&to=7d4fbdce1f0c6bd62fedb6e6d462fceb1d34fd33&stat=size-text No idea whether that's a good or bad sign in this case. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Thu Jul 9 00:34:08 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:34:08 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <3e44cfe788acd7303e7a2ccce3407ee0@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/hide-unrelated-options.test:1 +## This test checks that unrelated options are hidden in help text. + ---------------- This is probably fine, but an alternative that might be more consistent is to expand help-message.test to show that the option categories that should be supported are printed (by checking the e.g. "Color options:" text), and that the unrelated options aren't (by similarly checking that the header for them isn't included). There are some examples of this for some tools like llvm-size. 
You might also want to add a --help-list test case as in llvm-size's help.test. ================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:18 + +static cl::opt OutputFile("output", + cl::desc("Specify output filename"), ---------------- smeenai wrote: > sameerarora101 wrote: > > smeenai wrote: > > > As far as I can see, cctools libtool doesn't support the `-output` spelling, only `-o`. Is there any reason for us to support it? > > Yup, that is true. I was just looking at other llvm tools and they have `-output` in addition to `-o`. So I thought of adding both. I can remove `-output` if you guys prefer that? > I'd prefer to remove it, to mimic cctools libtool's interface as closely as possible. No issues removing -output from me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Thu Jul 9 00:34:35 2020 From: llvm-commits at lists.llvm.org (Maksym Wezdecki via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:34:35 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: <65a2203af5366023da004cb00c242719@localhost.localdomain> mwezdeck updated this revision to Diff 276641. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 Files: llvm/lib/Support/ManagedStatic.cpp Index: llvm/lib/Support/ManagedStatic.cpp =================================================================== --- llvm/lib/Support/ManagedStatic.cpp +++ llvm/lib/Support/ManagedStatic.cpp @@ -76,8 +76,12 @@ /// llvm_shutdown - Deallocate and destroy all ManagedStatic variables. 
void llvm::llvm_shutdown() { - std::lock_guard Lock(*getManagedStaticMutex()); + std::recursive_mutex *mtx = getManagedStaticMutex(); + mtx->lock(); while (StaticList) StaticList->destroy(); + mtx->unlock(); + + delete mtx; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83372.276641.patch Type: text/x-patch Size: 547 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:36:10 2020 From: llvm-commits at lists.llvm.org (Maksym Wezdecki via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:36:10 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: mwezdeck added a comment. Can anybody point me out why static constructors are bad? Is there some kind of idiom? Any resource where I can read about that will be helpful. Thanks. I'm updating the patch CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 From llvm-commits at lists.llvm.org Thu Jul 9 00:36:50 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:36:50 +0000 (UTC) Subject: [PATCH] D83393: [llvm-readelf] - Stop using 'unwrapOrError()' in 'ELFDumper::getSymbolVersion'. In-Reply-To: References: Message-ID: <383d6ab3d4ccaa7002e433043498de3b@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. LGTM too. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83393/new/ https://reviews.llvm.org/D83393 From llvm-commits at lists.llvm.org Thu Jul 9 00:37:22 2020 From: llvm-commits at lists.llvm.org (Rainer Orth via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:37:22 +0000 (UTC) Subject: [PATCH] D83415: [Solaris] Fix Solaris build bots In-Reply-To: References: Message-ID: <51194e09c1992ce1dd0b93832a7aa042@localhost.localdomain> ro added a comment. The Solaris buildbots are back to normal again. Thanks for the quick fix. 
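On the D83372 diff above: the posted change switches from std::lock_guard to manual lock()/unlock() so that the mutex is released before it is deleted. One possible variation, shown here only as a sketch and not as what was committed, keeps the guard but confines it to an inner scope. `getGlobalMutex`, `registerObject`, `shutdownAll` and `LiveObjects` are hypothetical stand-ins for the ManagedStatic machinery, and the lazy initialisation is deliberately simplistic (it is not thread-safe).

```cpp
#include <mutex>

// Hypothetical stand-ins for the ManagedStatic machinery.
static std::recursive_mutex *GlobalMutex = nullptr;
static int LiveObjects = 0;

// Lazily creates the mutex. Assumption: called from one thread at a time;
// real code would need a thread-safe initialisation scheme.
std::recursive_mutex *getGlobalMutex() {
  if (!GlobalMutex)
    GlobalMutex = new std::recursive_mutex();
  return GlobalMutex;
}

void registerObject() {
  std::lock_guard<std::recursive_mutex> Lock(*getGlobalMutex());
  ++LiveObjects;
}

void shutdownAll() {
  std::recursive_mutex *Mtx = getGlobalMutex();
  {
    // The guard releases the mutex at the closing brace of this block,
    // so the mutex is unlocked before it is deleted below.
    std::lock_guard<std::recursive_mutex> Lock(*Mtx);
    while (LiveObjects > 0)
      --LiveObjects; // stands in for StaticList->destroy()
  }
  delete Mtx;
  GlobalMutex = nullptr; // allow re-creation after shutdown
}
```

The inner scope keeps the unlock exception-safe, which the manual lock()/unlock() pair is not if destruction throws. Either way, deleting a mutex is only safe if no other thread can still be waiting on it.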
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83415/new/ https://reviews.llvm.org/D83415 From llvm-commits at lists.llvm.org Thu Jul 9 00:39:23 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:39:23 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: <332a0cd6eaa7195a1d6c8ead178bdaf3@localhost.localdomain> kyulee updated this revision to Diff 276644. kyulee marked 2 inline comments as done. kyulee added a comment. Fix for merge Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 Files: llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/AArch64/AArch64InstrInfo.td llvm/lib/Target/AArch64/AArch64LowerHomogeneousPrologEpilog.cpp llvm/lib/Target/AArch64/AArch64TargetMachine.cpp llvm/lib/Target/AArch64/CMakeLists.txt llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-frame-tail.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-no-helper.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D76570.276644.patch Type: text/x-patch Size: 43538 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:39:49 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:39:49 +0000 (UTC) Subject: [PATCH] D83457: Fix return status of AtomicExpandPass Message-ID: serge-sans-paille created this revision. serge-sans-paille added reviewers: arsenm, asb, foad, jdoerfert. Herald added subscribers: llvm-commits, hiraditya, wdng. Herald added a project: LLVM. Correctly reflect change in the return status. 
Patch needed to land https://reviews.llvm.org/D80916 https://reviews.llvm.org/D83457 Files: llvm/lib/CodeGen/AtomicExpandPass.cpp Index: llvm/lib/CodeGen/AtomicExpandPass.cpp =================================================================== --- llvm/lib/CodeGen/AtomicExpandPass.cpp +++ llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -1447,8 +1447,10 @@ default: llvm_unreachable("Unhandled case in tryExpandAtomicCmpXchg"); case TargetLoweringBase::AtomicExpansionKind::None: - if (ValueSize < MinCASSize) + if (ValueSize < MinCASSize) { expandPartwordCmpXchg(CI); + return true; + } return false; case TargetLoweringBase::AtomicExpansionKind::LLSC: { return expandAtomicCmpXchg(CI); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83457.276643.patch Type: text/x-patch Size: 596 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:39:57 2020 From: llvm-commits at lists.llvm.org (Kai Luo via llvm-commits) Date: Thu, 09 Jul 2020 00:39:57 -0700 (PDT) Subject: [llvm] e2b9318 - [PowerPC] Only make copies of registers on stack in variadic function when va_start is called Message-ID: <5f06c9cd.1c69fb81.5fdde.5803@mx.google.com> Author: Kai Luo Date: 2020-07-09T07:18:17Z New Revision: e2b93185b84bd88264377f785465933a89faa4f8 URL: https://github.com/llvm/llvm-project/commit/e2b93185b84bd88264377f785465933a89faa4f8 DIFF: https://github.com/llvm/llvm-project/commit/e2b93185b84bd88264377f785465933a89faa4f8.diff LOG: [PowerPC] Only make copies of registers on stack in variadic function when va_start is called On PPC64, for a variadic function, if va_start is not called, it won't access any variadic argument on stack, thus we can save stores of registers used to pass arguments. 
Differential Revision: https://reviews.llvm.org/D82361 Added: Modified: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/legalize-vaarg.ll llvm/test/CodeGen/PowerPC/ppc64-varargs.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp index ff8e2382ec65..229c5a76010c 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp @@ -4299,7 +4299,11 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4( // If the function takes variable number of arguments, make a frame index for // the start of the first vararg value... for expansion of llvm.va_start. - if (isVarArg) { + // On ELFv2ABI spec, it writes: + // C programs that are intended to be *portable* across diff erent compilers + // and architectures must use the header file to deal with variable + // argument lists. + if (isVarArg && MFI.hasVAStart()) { int Depth = ArgOffset; FuncInfo->setVarArgsFrameIndex( diff --git a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll index 296dea2f1f21..d937acc09c64 100644 --- a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll +++ b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll @@ -6,13 +6,6 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) { ; BE-LABEL: test_large_vec_vaarg: ; BE: # %bb.0: ; BE-NEXT: ld 3, -8(1) -; BE-NEXT: std 4, 56(1) -; BE-NEXT: std 5, 64(1) -; BE-NEXT: std 6, 72(1) -; BE-NEXT: std 7, 80(1) -; BE-NEXT: std 8, 88(1) -; BE-NEXT: std 9, 96(1) -; BE-NEXT: std 10, 104(1) ; BE-NEXT: addi 3, 3, 15 ; BE-NEXT: rldicr 3, 3, 0, 59 ; BE-NEXT: addi 4, 3, 16 @@ -28,15 +21,8 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) 
{ ; LE-LABEL: test_large_vec_vaarg: ; LE: # %bb.0: ; LE-NEXT: ld 3, -8(1) -; LE-NEXT: std 4, 40(1) -; LE-NEXT: std 5, 48(1) -; LE-NEXT: std 6, 56(1) -; LE-NEXT: std 7, 64(1) ; LE-NEXT: addi 3, 3, 15 ; LE-NEXT: rldicr 3, 3, 0, 59 -; LE-NEXT: std 8, 72(1) -; LE-NEXT: std 9, 80(1) -; LE-NEXT: std 10, 88(1) ; LE-NEXT: addi 4, 3, 31 ; LE-NEXT: addi 5, 3, 16 ; LE-NEXT: rldicr 4, 4, 0, 59 diff --git a/llvm/test/CodeGen/PowerPC/ppc64-varargs.ll b/llvm/test/CodeGen/PowerPC/ppc64-varargs.ll index 56816aee6704..5aeedbf13e0b 100644 --- a/llvm/test/CodeGen/PowerPC/ppc64-varargs.ll +++ b/llvm/test/CodeGen/PowerPC/ppc64-varargs.ll @@ -7,29 +7,12 @@ define i32 @f(...) nounwind { ; BE-LABEL: f: ; BE: # %bb.0: # %entry -; BE-NEXT: mr r11, r3 ; BE-NEXT: li r3, 0 -; BE-NEXT: std r11, 48(r1) -; BE-NEXT: std r4, 56(r1) -; BE-NEXT: std r5, 64(r1) -; BE-NEXT: std r6, 72(r1) -; BE-NEXT: std r7, 80(r1) -; BE-NEXT: std r8, 88(r1) -; BE-NEXT: std r9, 96(r1) -; BE-NEXT: std r10, 104(r1) ; BE-NEXT: blr ; ; LE-LABEL: f: ; LE: # %bb.0: # %entry -; LE-NEXT: std r3, 32(r1) ; LE-NEXT: li r3, 0 -; LE-NEXT: std r4, 40(r1) -; LE-NEXT: std r5, 48(r1) -; LE-NEXT: std r6, 56(r1) -; LE-NEXT: std r7, 64(r1) -; LE-NEXT: std r8, 72(r1) -; LE-NEXT: std r9, 80(r1) -; LE-NEXT: std r10, 88(r1) ; LE-NEXT: blr entry: ret i32 0 From llvm-commits at lists.llvm.org Thu Jul 9 00:40:08 2020 From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:40:08 +0000 (UTC) Subject: [PATCH] D82361: [PowerPC] Only make copies of registers on stack in variadic function when va_start is called In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGe2b93185b84b: [PowerPC] Only make copies of registers on stack in variadic function when… (authored by lkail). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82361/new/ https://reviews.llvm.org/D82361 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/legalize-vaarg.ll llvm/test/CodeGen/PowerPC/ppc64-varargs.ll Index: llvm/test/CodeGen/PowerPC/ppc64-varargs.ll =================================================================== --- llvm/test/CodeGen/PowerPC/ppc64-varargs.ll +++ llvm/test/CodeGen/PowerPC/ppc64-varargs.ll @@ -7,29 +7,12 @@ define i32 @f(...) nounwind { ; BE-LABEL: f: ; BE: # %bb.0: # %entry -; BE-NEXT: mr r11, r3 ; BE-NEXT: li r3, 0 -; BE-NEXT: std r11, 48(r1) -; BE-NEXT: std r4, 56(r1) -; BE-NEXT: std r5, 64(r1) -; BE-NEXT: std r6, 72(r1) -; BE-NEXT: std r7, 80(r1) -; BE-NEXT: std r8, 88(r1) -; BE-NEXT: std r9, 96(r1) -; BE-NEXT: std r10, 104(r1) ; BE-NEXT: blr ; ; LE-LABEL: f: ; LE: # %bb.0: # %entry -; LE-NEXT: std r3, 32(r1) ; LE-NEXT: li r3, 0 -; LE-NEXT: std r4, 40(r1) -; LE-NEXT: std r5, 48(r1) -; LE-NEXT: std r6, 56(r1) -; LE-NEXT: std r7, 64(r1) -; LE-NEXT: std r8, 72(r1) -; LE-NEXT: std r9, 80(r1) -; LE-NEXT: std r10, 88(r1) ; LE-NEXT: blr entry: ret i32 0 Index: llvm/test/CodeGen/PowerPC/legalize-vaarg.ll =================================================================== --- llvm/test/CodeGen/PowerPC/legalize-vaarg.ll +++ llvm/test/CodeGen/PowerPC/legalize-vaarg.ll @@ -6,13 +6,6 @@ ; BE-LABEL: test_large_vec_vaarg: ; BE: # %bb.0: ; BE-NEXT: ld 3, -8(1) -; BE-NEXT: std 4, 56(1) -; BE-NEXT: std 5, 64(1) -; BE-NEXT: std 6, 72(1) -; BE-NEXT: std 7, 80(1) -; BE-NEXT: std 8, 88(1) -; BE-NEXT: std 9, 96(1) -; BE-NEXT: std 10, 104(1) ; BE-NEXT: addi 3, 3, 15 ; BE-NEXT: rldicr 3, 3, 0, 59 ; BE-NEXT: addi 4, 3, 16 @@ -28,15 +21,8 @@ ; LE-LABEL: test_large_vec_vaarg: ; LE: # %bb.0: ; LE-NEXT: ld 3, -8(1) -; LE-NEXT: std 4, 40(1) -; LE-NEXT: std 5, 48(1) -; LE-NEXT: std 6, 56(1) -; LE-NEXT: std 7, 64(1) ; LE-NEXT: addi 3, 3, 15 ; LE-NEXT: rldicr 3, 3, 0, 59 -; LE-NEXT: std 8, 72(1) -; LE-NEXT: 
std 9, 80(1) -; LE-NEXT: std 10, 88(1) ; LE-NEXT: addi 4, 3, 31 ; LE-NEXT: addi 5, 3, 16 ; LE-NEXT: rldicr 4, 4, 0, 59 Index: llvm/lib/Target/PowerPC/PPCISelLowering.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCISelLowering.cpp +++ llvm/lib/Target/PowerPC/PPCISelLowering.cpp @@ -4299,7 +4299,11 @@ // If the function takes variable number of arguments, make a frame index for // the start of the first vararg value... for expansion of llvm.va_start. - if (isVarArg) { + // On ELFv2ABI spec, it writes: + // C programs that are intended to be *portable* across different compilers + // and architectures must use the header file to deal with variable + // argument lists. + if (isVarArg && MFI.hasVAStart()) { int Depth = ArgOffset; FuncInfo->setVarArgsFrameIndex( -------------- next part -------------- A non-text attachment was scrubbed... Name: D82361.276645.patch Type: text/x-patch Size: 2777 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:41:28 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:41:28 +0000 (UTC) Subject: [PATCH] D83387: [llvm-readobj] - Add a generic test for --dyn-relocations and fix an issue. In-Reply-To: References: Message-ID: jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test:5 +# RUN: yaml2obj --docnum=1 %s -o %t1 +# RUN: llvm-readobj --dyn-relocations %t1 2>&1 | FileCheck %s --check-prefix=LLVM-EMPTY +# RUN: llvm-readelf --dyn-relocations %t1 2>&1 | FileCheck %s --implicit-check-not={{.}} --allow-empty ---------------- I'd avoid using -EMPTY as a custom prefix name as that might get confused with -EMPTY as the FileCheck check suffix. Perhaps "LLVM-NONE"? ================ Comment at: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test:19 + +## Check how we dump all possbile dynamic relocations sections. 
+# RUN: yaml2obj --docnum=2 %s -o %t2.1 ---------------- how -> that relocations -> relocation ================ Comment at: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test:104 + Value: 0x10 +## 0x28 == offset of .rel.dyn section in the segment. + - Tag: DT_RELR ---------------- `.relr.dyn`? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83387/new/ https://reviews.llvm.org/D83387 From llvm-commits at lists.llvm.org Thu Jul 9 00:41:41 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:41:41 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: kyulee marked an inline comment as done. kyulee added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:241 + +bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const { + return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF); ---------------- plotfi wrote: > This is so small that I feel it would be more descriptive at the call site of SavedRegs.set/test to have: > > ``` > /// true if CSRs should be paired > const bool producePairRegisters = produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF); > ``` > > With some additional comments on the register paring in the context of homogenous-prolog-epilog. Updated the comment. ================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1515 // in the epilogue, the residual adjustment is executed first. - uint64_t ArgumentPopSize = 0; - if (IsTailCallReturn) { - MachineOperand &StackAdjust = MBBI->getOperand(1); - - // For a tail-call in a callee-pops-arguments environment, some or all of - // the stack may actually be in use for the call's arguments, this is - // calculated during LowerCall and consumed here... - ArgumentPopSize = StackAdjust.getImm(); - } else { - // ... 
otherwise the amount to pop is *all* of the argument space, - // conveniently stored in the MachineFunctionInfo by - // LowerFormalArguments. This will, of course, be zero for the C calling - // convention. - ArgumentPopSize = AFI->getArgumentStackToRestore(); - } + uint64_t ArgumentPopSize = getArgumentPopSize(MF, MBB); ---------------- plotfi wrote: > Is the bit that was removed a non-functional change here? If so, can this be a separate NFC commit? It's refactored to D83456. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 From llvm-commits at lists.llvm.org Thu Jul 9 00:44:57 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:44:57 +0000 (UTC) Subject: [PATCH] D83447: [X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. In-Reply-To: References: Message-ID: <3c779a349d4430f9e7af6f332651a8dc@localhost.localdomain> RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83447/new/ https://reviews.llvm.org/D83447 From llvm-commits at lists.llvm.org Thu Jul 9 00:45:09 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:45:09 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: <11ecbc8f3586d42a4abfc8156efc09fa@localhost.localdomain> grimar added a comment. In D83264#2135482 , @grimar wrote: > Probably there are other names rather than "dead-reloc-in-nonalloc" which we might want to consider? > > `-z dead-noalloc-reloc-val` > `-z tombstone-reloc` > `-z resolve-dead-reloc` This remained unanswered. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Thu Jul 9 00:46:27 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:46:27 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <372dbdb259847e65b9f6f3b3b27594f2@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- dblaikie wrote: > jhenderson wrote: > > dblaikie wrote: > > > Higuoxing wrote: > > > > jhenderson wrote: > > > > > dblaikie wrote: > > > > > > Higuoxing wrote: > > > > > > > dblaikie wrote: > > > > > > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > > > > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > > > > > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. 
> > > > > > > > > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > > > > > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the exisiting llvm-dwarfdump tests require for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > > > > > > > > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > > > > > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. > > > > Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! > > > > Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? 
Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me. > > > > > > Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure. > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > What do you mean by "auto-generation aspects"? > > But, yeah, I'm not holding this patch up over this direction that's already got precedent, etc - but raising the question at least for consideration/thinking about over time. At the moment, to use yaml2obj to generate DWARF, you have to specify pretty much every detail of the DWARF, including the details of the abbrev table and the string table for example. Ideally, we should be able to describe the DWARF in a higher level manner (e.g. by just specifying the attributes and values in the .debug_info description, letting yaml2obj do all the leg work of selecting a form, populating the abbrev and string tables etc). You'll see details of this in @Higuoxing's mailing list posts about his GSOC project. We can use the basic-level testing for "bootstrapping". yaml2obj can generate valid raw sections, tested via hex -> allows testing of llvm-dwarfdump section dumping -> allows testing of yaml2obj higher-level functionality (because we know that llvm-dwarfdump section dumping now works). 
Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82367/new/

https://reviews.llvm.org/D82367

From llvm-commits at lists.llvm.org Thu Jul 9 00:46:54 2020
From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 07:46:54 +0000 (UTC)
Subject: [PATCH] D83458: [StackSafety,NFC] Reduce FunctionSummary size
Message-ID:

vitalybuka created this revision.
vitalybuka added a reviewer: tejohnson.
Herald added subscribers: llvm-commits, arphaman.
Herald added a project: LLVM.

Most compiler invocations will not need ParamAccess, so we can optimize memory usage there with a smaller unique_ptr instead of an empty vector.

Suggested in D80908 review.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D83458

Files:
llvm/include/llvm/IR/ModuleSummaryIndex.h

Index: llvm/include/llvm/IR/ModuleSummaryIndex.h
===================================================================
--- llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -629,7 +629,8 @@
   std::unique_ptr TIdInfo;
   /// Uses for every parameter to this function.
- std::vector ParamAccesses; + using ParamAccessesTy = std::vector; + std::unique_ptr ParamAccesses; public: FunctionSummary(GVFlags Flags, unsigned NumInsts, FFlags FunFlags, @@ -640,19 +641,20 @@ std::vector TypeCheckedLoadVCalls, std::vector TypeTestAssumeConstVCalls, std::vector TypeCheckedLoadConstVCalls, - std::vector ParamAccesses) + std::vector Params) : GlobalValueSummary(FunctionKind, Flags, std::move(Refs)), InstCount(NumInsts), FunFlags(FunFlags), EntryCount(EntryCount), - CallGraphEdgeList(std::move(CGEdges)), - ParamAccesses(std::move(ParamAccesses)) { + CallGraphEdgeList(std::move(CGEdges)) { if (!TypeTests.empty() || !TypeTestAssumeVCalls.empty() || !TypeCheckedLoadVCalls.empty() || !TypeTestAssumeConstVCalls.empty() || !TypeCheckedLoadConstVCalls.empty()) - TIdInfo = std::make_unique(TypeIdInfo{ - std::move(TypeTests), std::move(TypeTestAssumeVCalls), - std::move(TypeCheckedLoadVCalls), - std::move(TypeTestAssumeConstVCalls), - std::move(TypeCheckedLoadConstVCalls)}); + TIdInfo = std::make_unique( + TypeIdInfo{std::move(TypeTests), std::move(TypeTestAssumeVCalls), + std::move(TypeCheckedLoadVCalls), + std::move(TypeTestAssumeConstVCalls), + std::move(TypeCheckedLoadConstVCalls)}); + if (!Params.empty()) + ParamAccesses = std::make_unique(std::move(Params)); } // Gets the number of readonly and writeonly refs in RefEdgeList std::pair specialRefCounts() const; @@ -724,11 +726,20 @@ } /// Returns the list of known uses of pointer parameters. - ArrayRef paramAccesses() const { return ParamAccesses; } + ArrayRef paramAccesses() const { + if (ParamAccesses) + return *ParamAccesses; + return {}; + } /// Sets the list of known uses of pointer parameters. void setParamAccesses(std::vector NewParams) { - ParamAccesses = std::move(NewParams); + if (NewParams.empty()) + ParamAccesses.reset(); + else if (ParamAccesses) + *ParamAccesses = std::move(NewParams); + else + ParamAccesses = std::make_unique(std::move(NewParams)); } /// Add a type test to the summary. 
This is used by WholeProgramDevirt if we -------------- next part -------------- A non-text attachment was scrubbed... Name: D83458.276648.patch Type: text/x-patch Size: 2937 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 00:48:13 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 07:48:13 +0000 (UTC) Subject: [PATCH] D83459: Correctly update return status for MVEGatherScatterLowering Message-ID: serge-sans-paille created this revision. serge-sans-paille added reviewers: anwel, foad, jdoerfert. Herald added subscribers: llvm-commits, dmgreen, hiraditya. Herald added a project: LLVM. Correctly reflect change in the return status. Patch needed to land https://reviews.llvm.org/D80916 https://reviews.llvm.org/D83459 Files: llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp Index: llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp =================================================================== --- llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp +++ llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp @@ -990,26 +990,27 @@ SmallVector Gathers; SmallVector Scatters; + bool Changed = false; + for (BasicBlock &BB : F) { for (Instruction &I : BB) { IntrinsicInst *II = dyn_cast(&I); if (II && II->getIntrinsicID() == Intrinsic::masked_gather) { Gathers.push_back(II); if (isa(II->getArgOperand(0))) - optimiseOffsets( + Changed |= optimiseOffsets( cast(II->getArgOperand(0))->getOperand(1), II->getParent(), LI); } else if (II && II->getIntrinsicID() == Intrinsic::masked_scatter) { Scatters.push_back(II); if (isa(II->getArgOperand(1))) - optimiseOffsets( + Changed |= optimiseOffsets( cast(II->getArgOperand(1))->getOperand(1), II->getParent(), LI); } } } - bool Changed = false; for (unsigned i = 0; i < Gathers.size(); i++) { IntrinsicInst *I = Gathers[i]; Value *L = lowerGather(I); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83459.276649.patch
Type: text/x-patch
Size: 1335 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 01:11:25 2020
From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 08:11:25 +0000 (UTC)
Subject: [PATCH] D82858: [llvm-objdump] Detect note section for ELF objects
In-Reply-To:
References:
Message-ID:

jhenderson added a comment.

In D82858#2139642, @scott.linder wrote:

> > In an ideal world, we'd merge all the binary tools (GNU and LLVM) into a single tool, or redistribute functionality somehow, so that we don't have duplicate functionality like we already do. This takes us further away from that ideal.
>
> I'm confused by this statement in particular. If the goal is to just have one tool, why did LLVM start re-implementing these tools to begin with? Wasn't the first commit of "llvm-objdump"/"llvm-readobj" a massive step away from the ideal?

I'm afraid I can't answer that question. I joined LLVM development quite some time after both llvm-readobj and llvm-objdump were initially created. My suspicion is that llvm-readobj was created to provide a generic testing facility, llvm-readelf (i.e. GNU output style for llvm-readobj) was later added for GNU compatibility, and llvm-objdump was created for disassembly, with GNU compatibility features such as section header printing added later on. However, I haven't attempted to research any of this in depth, so I could easily see this being wrong.

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82858/new/

https://reviews.llvm.org/D82858

From llvm-commits at lists.llvm.org Thu Jul 9 01:11:26 2020
From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 08:11:26 +0000 (UTC)
Subject: [PATCH] D83460: Fix HexagonGenExtract return statu
Message-ID:

serge-sans-paille created this revision.
serge-sans-paille added reviewers: kparzysz, foad, jdoerfert.
Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Correctly reflect change in the return status. Patch needed to land https://reviews.llvm.org/D80916 https://reviews.llvm.org/D83460 Files: llvm/lib/Target/Hexagon/HexagonGenExtract.cpp Index: llvm/lib/Target/Hexagon/HexagonGenExtract.cpp =================================================================== --- llvm/lib/Target/Hexagon/HexagonGenExtract.cpp +++ llvm/lib/Target/Hexagon/HexagonGenExtract.cpp @@ -221,15 +221,16 @@ } bool HexagonGenExtract::visitBlock(BasicBlock *B) { + bool Changed = false; + // Depth-first, bottom-up traversal. for (auto *DTN : children(DT->getNode(B))) - visitBlock(DTN->getBlock()); + Changed |= visitBlock(DTN->getBlock()); // Allow limiting the number of generated extracts for debugging purposes. bool HasCutoff = ExtractCutoff.getPosition(); unsigned Cutoff = ExtractCutoff; - bool Changed = false; BasicBlock::iterator I = std::prev(B->end()), NextI, Begin = B->begin(); while (true) { if (HasCutoff && (ExtractCount >= Cutoff)) -------------- next part -------------- A non-text attachment was scrubbed... Name: D83460.276651.patch Type: text/x-patch Size: 841 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:11:26 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:11:26 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: jhenderson accepted this revision. jhenderson added a comment. Latest update LGTM. 
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83050/new/

https://reviews.llvm.org/D83050

From llvm-commits at lists.llvm.org Thu Jul 9 01:11:26 2020
From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 08:11:26 +0000 (UTC)
Subject: [PATCH] D83461: [SVE] Fix implicit TypeSize->uint64_t conversion in getCastInstrCost
Message-ID:

david-arm created this revision.
david-arm added reviewers: sdesmalen, ctetreau, kmclaughlin.
Herald added subscribers: llvm-commits, psnobl, tschuett.
Herald added a reviewer: efriedma.
Herald added a project: LLVM.

In getCastInstrCost(), when comparing the sizes of the src and dst types, we should use the TypeSize comparison operators instead of relying upon TypeSize being implicitly converted to a uint64_t. Previously this meant we were dropping the scalable property and treating fixed and scalable vector types the same.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D83461

Files:
llvm/include/llvm/CodeGen/BasicTTIImpl.h
llvm/test/Analysis/CostModel/AArch64/sve-bitcast.ll

Index: llvm/test/Analysis/CostModel/AArch64/sve-bitcast.ll
===================================================================
--- /dev/null
+++ llvm/test/Analysis/CostModel/AArch64/sve-bitcast.ll
@@ -0,0 +1,11 @@
+; RUN: opt -mtriple=aarch64-linux-gnu -mattr=+sve -cost-model -analyze < %s 2>%t | FileCheck %s
+; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t
+
+; WARN-NOT: warning
+
+; CHECK: Found an estimated cost of 0 for instruction: %b = bitcast %a to
+
+define @foo( %a, i32 %x) {
+  %b = bitcast %a to
+  ret %b
+}
Index: llvm/include/llvm/CodeGen/BasicTTIImpl.h
===================================================================
--- llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -694,8 +694,8 @@
     std::pair SrcLT = TLI->getTypeLegalizationCost(DL, Src);
     std::pair DstLT = TLI->getTypeLegalizationCost(DL, Dst);
-    unsigned SrcSize =
SrcLT.second.getSizeInBits(); - unsigned DstSize = DstLT.second.getSizeInBits(); + TypeSize SrcSize = SrcLT.second.getSizeInBits(); + TypeSize DstSize = DstLT.second.getSizeInBits(); bool IntOrPtrSrc = Src->isIntegerTy() || Src->isPointerTy(); bool IntOrPtrDst = Dst->isIntegerTy() || Dst->isPointerTy(); @@ -769,8 +769,7 @@ // Check vector-to-vector casts. if (DstVTy && SrcVTy) { // If the cast is between same-sized registers, then the check is simple. - if (SrcLT.first == DstLT.first && - SrcLT.second.getSizeInBits() == DstLT.second.getSizeInBits()) { + if (SrcLT.first == DstLT.first && SrcSize == DstSize) { // Assume that Zext is done using AND. if (Opcode == Instruction::ZExt) -------------- next part -------------- A non-text attachment was scrubbed... Name: D83461.276652.patch Type: text/x-patch Size: 1849 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:11:27 2020 From: llvm-commits at lists.llvm.org (Dominik Montada via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:11:27 +0000 (UTC) Subject: [PATCH] D83390: [GlobalISel][InlineAsm] Extend input operands when register class size does not match type In-Reply-To: References: Message-ID: <0fe92e38d0b3e5552d26ec637c400e22@localhost.localdomain> gargaroff added a comment. In D83390#2140237 , @kschwarz wrote: > A similar issue (for tied input operands) is handled in https://reviews.llvm.org/D83384 > The function introduced there should be extended to handle the vector case. In that case I'll hold off on this patch until the one you linked is landed. If that one will introduce the extension handling already, I'll abandon this patch. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83390/new/ https://reviews.llvm.org/D83390 From llvm-commits at lists.llvm.org Thu Jul 9 01:11:27 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Mikael_Holm=C3=A9n_via_Phabricator?= via llvm-commits) Date: Thu, 09 Jul 2020 08:11:27 +0000 (UTC) Subject: [PATCH] D81345: [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz In-Reply-To: References: Message-ID: <6cb68cff50239469853c2c53f9734284@localhost.localdomain> uabelho added a comment. Hi, I start seeing a crash with this patch: https://bugs.llvm.org/show_bug.cgi?id=46652 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81345/new/ https://reviews.llvm.org/D81345 From llvm-commits at lists.llvm.org Thu Jul 9 01:11:28 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:11:28 +0000 (UTC) Subject: [PATCH] D83457: Fix return status of AtomicExpandPass In-Reply-To: References: Message-ID: <005e957cae06908abd342d864993c5e8@localhost.localdomain> foad accepted this revision. foad added a comment. This revision is now accepted and ready to land. Looks obviously correct. Might be slightly neater to change expandPartwordCmpXchg to return true, like expandAtomicCmpXchg does? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83457/new/ https://reviews.llvm.org/D83457 From llvm-commits at lists.llvm.org Thu Jul 9 01:11:28 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:11:28 +0000 (UTC) Subject: [PATCH] D83432: [NFC][llvm-reduce] Don't `rm -rf` in tests, `rm -f` is enough In-Reply-To: References: Message-ID: <5ba53acb70b01f9d505e4e8c19e41e6f@localhost.localdomain> lebedev.ri added a comment. In D83432#2140532 , @dblaikie wrote: > If the test is writing the output file anyway - is the rm necessary? 
(lots of tests write to output files via "-o %t" from some tool or another and most don't delete %t before doing so)

They were added by you in rL372054, so indeed reverting that commit would be the best solution now.

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83432/new/

https://reviews.llvm.org/D83432

From llvm-commits at lists.llvm.org Thu Jul 9 01:11:29 2020
From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 08:11:29 +0000 (UTC)
Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE
Message-ID:

djtodoro created this revision.
djtodoro added reviewers: probinson, aprantl, vsk, dstenb.
djtodoro added projects: debug-info, LLVM.
Herald added subscribers: llvm-commits, hiraditya.

The SONY debugger does not support the debug entry values feature, so the plan is to avoid producing entry values by default when the debugger tuning is SCE. The feature can still be enabled with the `-debug-entry-values` option for testing/development purposes.

This patch addresses PR46643.
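
The default described above can be sketched as a small predicate. This is a minimal, self-contained model rather than the LLVM source: the `Options` struct and the free function `shouldEmitDebugEntryValues` are hypothetical stand-ins, and the field names are only assumed to mirror the target options involved.

```cpp
// Hypothetical stand-ins; these are not the LLVM declarations.
enum class DebuggerKind { Default, GDB, LLDB, SCE };

struct Options {
  bool SupportsDebugEntryValues = false; // target is able to produce entry values
  bool EnableDebugEntryValues = false;   // models the -debug-entry-values option
  DebuggerKind DebuggerTuning = DebuggerKind::Default;
};

// Entry values are produced by default on supporting targets unless the
// tuning is SCE; the explicit option turns them on regardless.
constexpr bool shouldEmitDebugEntryValues(const Options &O) {
  return (O.SupportsDebugEntryValues &&
          O.DebuggerTuning != DebuggerKind::SCE) ||
         O.EnableDebugEntryValues;
}

// SCE tuning suppresses the default...
static_assert(!shouldEmitDebugEntryValues(
                  Options{true, false, DebuggerKind::SCE}),
              "SCE tuning should disable entry values by default");
// ...but the explicit option still wins.
static_assert(shouldEmitDebugEntryValues(
                  Options{true, true, DebuggerKind::SCE}),
              "the explicit option should re-enable entry values");
```

In this model the explicit option also enables entry values on a target that does not advertise support, which matches the stated intent that the flag exists for testing and development.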
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83462 Files: llvm/lib/CodeGen/TargetOptionsImpl.cpp llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir Index: llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir =================================================================== --- llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir +++ llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir @@ -1,16 +1,22 @@ # RUN: llc -start-before=livedebugvalues -mtriple=x86_64-apple-darwin -o %t %s -filetype=obj # RUN: llvm-dwarfdump %t | FileCheck %s -# -# int global; -# int foo(int p, int q, int r) { -# global = p + 1; -# asm __volatile("" : : : "edi", "esi", "edx"); -# return 123; -# } + +# RUN: llc -start-before=livedebugvalues -debugger-tune=sce -mtriple=x86_64-sce-ps4 -o %t1 %s -filetype=obj +# RUN: llvm-dwarfdump %t1 | FileCheck %s -check-prefix=SCE + +## Based on: +## int global; +## int foo(int p, int q, int r) { +## global = p + 1; +## asm __volatile("" : : : "edi", "esi", "edx"); +## return 123; +## } # CHECK: DW_TAG_formal_parameter # CHECK: DW_OP_entry_value +# SCE-NOT: DW_OP_GNU_entry_value + --- | ; ModuleID = 'multiple-param-dbg-value-entry.ll' source_filename = "multiple-param-dbg-value-entry.c" Index: llvm/lib/CodeGen/TargetOptionsImpl.cpp =================================================================== --- llvm/lib/CodeGen/TargetOptionsImpl.cpp +++ llvm/lib/CodeGen/TargetOptionsImpl.cpp @@ -49,5 +49,6 @@ /// NOTE: There are targets that still do not support the debug entry values /// production. bool TargetOptions::ShouldEmitDebugEntryValues() const { - return SupportsDebugEntryValues || EnableDebugEntryValues; + return (SupportsDebugEntryValues && DebuggerTuning != DebuggerKind::SCE) || + EnableDebugEntryValues; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83462.276653.patch Type: text/x-patch Size: 1636 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:13:12 2020 From: llvm-commits at lists.llvm.org (Nuno Lopes via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:13:12 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: nlopes added inline comments. ================ Comment at: llvm/lib/Analysis/InstructionSimplify.cpp:4124 + // select ?, undef, X -> X + if (isa(TrueVal) && isGuaranteedNotToBeUndefOrPoison(FalseVal)) + return FalseVal; ---------------- craig.topper wrote: > Should I be passing CxtI and DT here? I based this off the code in SimplifyInsertElementInst, but maybe i should have based it off of SimplifyFreezeInst? Yes, you can; it's correct to do so. The only issue is that `SimplifySelectInst()` doesn't seem to receive the select instruction as argument to be passed as CxtI. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 From llvm-commits at lists.llvm.org Thu Jul 9 01:14:02 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:14:02 +0000 (UTC) Subject: [PATCH] D83463: [DWARF][EntryValues] Emit GNU extensions in the case of DWARF 4 + SCE Message-ID: djtodoro created this revision. djtodoro added reviewers: probinson, dstenb, vsk, aprantl. djtodoro added projects: debug-info, LLVM. Herald added subscribers: llvm-commits, hiraditya. Emit DWARF 5 call-site symbols even though DWARF 4 is set, only in the case of LLDB tuning. This patch addresses PR46643. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83463 Files: llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83463.276655.patch Type: text/x-patch Size: 4243 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:15:09 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:15:09 +0000 (UTC) Subject: [PATCH] D83442: [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison In-Reply-To: References: Message-ID: lebedev.ri accepted this revision. lebedev.ri added a comment. This revision is now accepted and ready to land. lg CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83442/new/ https://reviews.llvm.org/D83442 From llvm-commits at lists.llvm.org Thu Jul 9 01:15:49 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:15:49 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <4e8bce77c1374fc27a730f0fb5a94ccb@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:348 + + // Using for the fixed or float point parameter type identification. + static constexpr uint32_t FixedParaTypeBit = 0x8000'0000; ---------------- Using -> Used (?) ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:408 + uint64_t Size); + XCOFFTracebackTable(const uint8_t *Ptr, uint64_t Size, Error &Err); + ---------------- Make this private so that all users must use the `create` function. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:883 +XCOFFTracebackTable::XCOFFTracebackTable(const uint8_t *Ptr, + const uint64_t Size, Error &Err) + : TBPtr(Ptr), Size(Size) { ---------------- Delete the `const` for `Size`. 
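
The earlier suggestion in this review, to make the constructor private so that every caller must go through `create`, can be sketched in plain C++ as below. This is only an illustration under stated assumptions: `Table`, its `Version` field, the `std::string` error channel and the `std::optional` return type are hypothetical stand-ins, not the types used by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for a parsed table; not the XCOFF class itself.
class Table {
  std::uint8_t Version = 0;

  // Private: outside code can never hold an object whose parse failed.
  Table(const std::uint8_t *Ptr, std::size_t Size, std::string &Err) {
    if (Size < 1) {
      Err = "table is empty";
      return;
    }
    Version = Ptr[0];
  }

public:
  // The only way to obtain a Table. On failure returns std::nullopt and
  // leaves a message in Err; on success Err is empty.
  static std::optional<Table> create(const std::uint8_t *Ptr, std::size_t Size,
                                     std::string &Err) {
    assert((Ptr != nullptr || Size == 0) && "null buffer with non-zero size");
    Err.clear();
    Table T(Ptr, Size, Err);
    if (!Err.empty())
      return std::nullopt;
    return T;
  }

  std::uint8_t getVersion() const { return Version; }
};
```

With the constructor hidden, forgetting to check the error cannot leave a caller holding a partially initialised object, since the only public entry point reports failure in its return value.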
================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:886
+  ErrorAsOutParameter EAO(&Err);
+  DataExtractor DE(ArrayRef(Ptr, Size), /* IsLittleEndian */ false,
+                   /* AddressSize */ 0);
----------------
Preferred style is `/*IsLittleEndian=*/false` (same with `AddressSize` below). This plays well with clang-format too.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:888
+                   /* AddressSize */ 0);
+  uint64_t OffsetPtr = 0;
+
----------------
This isn't a pointer, so remove the `Ptr`!

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:890
+
+  DE.getU64(&OffsetPtr, &Err);
+
----------------
Not sure if you've come across it yet, but you could simplify all these calls using `DataExtractor::Cursor` instead. This stores the error internally, whilst also allowing you to continue parsing to the end, with no harmful effects (because nothing is read if `Cursor` is in an error state), similar to `Err` being passed in everywhere. It also tracks the offset. Usage is:

```
DataExtractor::Cursor C(/*Offset=*/0);
DE.getU64(C); // By the way, what is this value for?
...
// No need for ifs here.
CodeLen = DE.getU32(C);
HandlerMask = DE.getU32(C);
...
Err = C.takeError();
```

================
Comment at: llvm/unittests/Object/XCOFFObjectFileTest.cpp:96
+  EXPECT_TRUE(TT1.getParaType());
+  EXPECT_STREQ(TT1.getParaType().getValue().data(), "i, i, f, f, d");
+
----------------
DiggerLin wrote:
> jhenderson wrote:
> > Does `EXPECT_EQ(TT1.getParaType().getValue(), "i, i, f, f, d");` not work?
>
> according to https://github.com/google/googletest/blob/master/googletest/docs/primer.md
>
> EXPECT_EQ(val1, val2); compare two values. val1 == val2
>
> EXPECT_STREQ(str1,str2); compare two C strings the two C strings have the same content
>

Emphasis on "C strings". You're not using a C string on the left, so you can rely on the standard implicit conversion from a C string to `std::string` on the right instead of calling `c_str` or `data`, so `EXPECT_EQ` is better (it simplifies the code).
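
The sticky-error behaviour described for `DataExtractor::Cursor` above (nothing is read once the cursor holds an error, and the offset is tracked for you) can be illustrated with a toy model. This is a sketch under stated assumptions, not LLVM's implementation: `Cursor`, `ByteReader` and their members are hypothetical names, and a `std::string` stands in for LLVM's error type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy cursor: an offset plus an error that, once set, stays set.
struct Cursor {
  std::size_t Offset = 0;
  std::string Error; // empty means "no error so far"
  bool ok() const { return Error.empty(); }
};

// Toy big-endian reader over an owned byte buffer.
class ByteReader {
  std::vector<std::uint8_t> Data;

public:
  explicit ByteReader(std::vector<std::uint8_t> Bytes)
      : Data(std::move(Bytes)) {}

  // Reads a 32-bit big-endian value at the cursor. If the cursor already
  // holds an error, or fewer than four bytes remain, this returns 0 and
  // leaves the offset untouched, so later reads become harmless no-ops.
  std::uint32_t getU32(Cursor &C) const {
    if (!C.ok())
      return 0;
    assert(C.Offset <= Data.size() && "cursor offset out of range");
    if (Data.size() - C.Offset < 4) {
      C.Error = "unexpected end of data at offset " + std::to_string(C.Offset);
      return 0;
    }
    std::uint32_t V = 0;
    for (std::size_t I = 0; I < 4; ++I)
      V = (V << 8) | Data[C.Offset + I];
    C.Offset += 4;
    return V;
  }
};
```

A caller can issue a run of `getU32` calls without an `if` after each one and inspect `C.ok()` once at the end, which is the simplification the review comment is pointing at.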
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Thu Jul 9 01:15:59 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:15:59 +0000 (UTC) Subject: [PATCH] D83459: Correctly update return status for MVEGatherScatterLowering In-Reply-To: References: Message-ID: <30f9a8bafd0d0ce1cb108c868efffd5b@localhost.localdomain> foad added a comment. Looks obvious to me but I'd prefer an ARM maintainer to review it. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83459/new/ https://reviews.llvm.org/D83459 From llvm-commits at lists.llvm.org Thu Jul 9 01:17:30 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:17:30 +0000 (UTC) Subject: [PATCH] D83460: Fix HexagonGenExtract return statu In-Reply-To: References: Message-ID: foad added a comment. Looks obviously correct to me but I'd prefer a Hexagon maintainer to review it. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83460/new/ https://reviews.llvm.org/D83460 From llvm-commits at lists.llvm.org Thu Jul 9 01:19:15 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:19:15 +0000 (UTC) Subject: [PATCH] D83464: [MachineOutliner][AArch64] Fix for noreturn functions Message-ID: kyulee created this revision. Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls. Herald added a project: LLVM. Noreturn functions conservatively require save + restore LR. When there is no available register, the stack is used to handle LR. If the block contains a call, the outlined function also needs to set a frame on the stack. This caused an assertion failure for . This checks whether a candidate is MachineOutlinerDefault while requiring the modification of the stack due to a call. 
If that is the case, outlining is bailed-out to avoid modifying the stack twice. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83464 Files: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp llvm/test/CodeGen/AArch64/machine-outliner-noreturn-no-stack.mir Index: llvm/test/CodeGen/AArch64/machine-outliner-noreturn-no-stack.mir =================================================================== --- /dev/null +++ llvm/test/CodeGen/AArch64/machine-outliner-noreturn-no-stack.mir @@ -0,0 +1,69 @@ +# RUN: llc -mtriple=arm64-apple-ios -run-pass=prologepilog -run-pass=machine-outliner -verify-machineinstrs %s -o - | FileCheck %s + +# Noreturn functions conservatively need to save and restore lr. +# When there is no available register, the stack is used at call site. +# If the stack also needs to be set up for a call in the outlined function, +# bail-out this case since we do not handle adjusting the stack twice. + +# CHECK-NOT: OUTLINED_FUNCTION + +--- | + @g = external global i32 + define void @stack_1() #0 { ret void } + define void @stack_2() #0 { ret void } + define void @baz() { + ret void + } + attributes #0 = { noredzone noreturn } +... 
+--- +name: stack_1 +tracksRegLiveness: true +body: | + bb.0: + liveins: $x4, $x0, $x1, $x2, $x3 + $w8 = MOVZWi 259, 0 + $x9 = ADRP target-flags(aarch64-page) @g + $x9 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-nc) @g, 0 + STRXui $x9, $x1, 0 + STRHHui $w8, $x1, 8 + $w8 = ORRWrs $wzr, $w4, 0, implicit-def $x8 + STRXui $x8, $x3, 0 + STPXi $x3, $xzr, $x2, 0 + $w8 = MOVZWi 271, 0 + STRHHui $w8, $x2, 8 + $x8 = ORRXrs $xzr, $x0, 0 + $x0 = ORRXrs $xzr, $x1, 0 + $x1 = ORRXrs $xzr, $x2, 0 + BL @baz, implicit-def dead $lr, implicit $sp, implicit $x8, implicit $x0, implicit $x1, implicit $x3, implicit $x4, implicit-def $sp, implicit-def $x5, implicit-def $x6, implicit-def $x7, implicit-def $x8, implicit-def $x9, implicit-def $x10, implicit-def $x11, implicit-def $x12, implicit-def $x13, implicit-def $x14, implicit-def $x15, implicit-def $x18 + BRK 1 +... +--- +name: stack_2 +tracksRegLiveness: true +body: | + bb.0: + liveins: $x4, $x0, $x1, $x2, $x3 + $w8 = MOVZWi 259, 0 + $x9 = ADRP target-flags(aarch64-page) @g + $x9 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-nc) @g, 0 + STRXui $x9, $x1, 0 + STRHHui $w8, $x1, 8 + $w8 = ORRWrs $wzr, $w4, 0, implicit-def $x8 + STRXui $x8, $x3, 0 + STPXi $x3, $xzr, $x2, 0 + $w8 = MOVZWi 271, 0 + STRHHui $w8, $x2, 8 + $x8 = ORRXrs $xzr, $x0, 0 + $x0 = ORRXrs $xzr, $x1, 0 + $x1 = ORRXrs $xzr, $x2, 0 + BL @baz, implicit-def dead $lr, implicit $sp, implicit $x8, implicit $x0, implicit $x1, implicit $x3, implicit $x4, implicit-def $sp, implicit-def $x5, implicit-def $x6, implicit-def $x7, implicit-def $x8, implicit-def $x9, implicit-def $x10, implicit-def $x11, implicit-def $x12, implicit-def $x13, implicit-def $x14, implicit-def $x15, implicit-def $x18 + BRK 1 +... 
+--- +name: baz +tracksRegLiveness: true +body: | + bb.0: + liveins: $w0, $lr, $w8 + RET undef $lr Index: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp =================================================================== --- llvm/lib/Target/AArch64/AArch64InstrInfo.cpp +++ llvm/lib/Target/AArch64/AArch64InstrInfo.cpp @@ -6195,6 +6195,15 @@ return outliner::OutlinedFunction(); } + // We don't fix up the stack twice. + if (std::any_of(RepeatedSequenceLocs.begin(), RepeatedSequenceLocs.end(), + [](const outliner::Candidate &C) { + return C.CallConstructionID == MachineOutlinerDefault; + })) { + RepeatedSequenceLocs.clear(); + return outliner::OutlinedFunction(); + } + // Save + restore LR. NumBytesToCreateFrame += 8; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83464.276659.patch Type: text/x-patch Size: 3673 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:21:34 2020 From: llvm-commits at lists.llvm.org (Anna Welker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:21:34 +0000 (UTC) Subject: [PATCH] D83459: Correctly update return status for MVEGatherScatterLowering In-Reply-To: References: Message-ID: <6c36f18b934110a1a465eef38a154485@localhost.localdomain> anwel accepted this revision. anwel added a comment. This revision is now accepted and ready to land. LGTM - `Changed` should indeed reflect possible changes by that function, too. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83459/new/ https://reviews.llvm.org/D83459 From llvm-commits at lists.llvm.org Thu Jul 9 01:24:27 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:24:27 +0000 (UTC) Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for inputs In-Reply-To: References: Message-ID: <6e07d0bc05e0a93d5f5b0529564d2aac@localhost.localdomain> Petar.Avramovic marked an inline comment as done. Petar.Avramovic added inline comments. 
================ Comment at: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp:240 +static bool buildAnyextOrCopy(Register Dst, Register Src, + MachineIRBuilder &MIRBuilder) { ---------------- paquette wrote: > Would `MachineIRBuilder::buildExtOrTrunc` work here? > > If not, maybe it would make sense to move this to `MachineIRBuilder` for the sake of consistency? The destination is a vreg with a register class and the source is a generic vreg with an LLT, so `buildExtOrTrunc` would not work for the anyext case, since it requires both source and destination to be generic virtual registers. Here we anyext to a new generic vreg with the same size as Dst and then copy to Dst. MachineIRBuilder specializes in generic vregs, so should it be something like "buildExtOrTruncToVRegWithRegClass"? Should I also cover vectors? I don't know if there is a way to find the LLT of a vector type that would fit into a reg class. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83384/new/ https://reviews.llvm.org/D83384 From llvm-commits at lists.llvm.org Thu Jul 9 01:25:45 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:25:45 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: dmgreen added a comment. Thanks for the info. That's very useful. I think I need to find a way to get better compile time numbers. Whenever I try to run them they come up so noisy as to be almost useless. I am in the very lucky position where all of the benchmarks I run are bare-metal and deterministic, so I can very accurately get performance numbers with zero noise in them (at least of the "I run the same binary twice and get the same result" kind). Those benchmarks showed this to be an improvement, and the codesize tests we run at -Oz were essentially flat with some small ups and downs. I had run the llvm-test-suite for testing, but not for performance. That was what pointed at the problems in D82987.
Luckily there was a gcc torture test that pointed out concisely what the problem was without having to go find out which part of cjpeg was broken. With that fixed the rest of the suite passed, along with a bootstrap. I also ran some csmith testing overnight some time back with this and a couple of other patches in it, but that didn't show any problems even when D82987 wasn't fixed, so it probably doesn't mean much. I can try and get some statistics. I usually trust the "zero noise" benchmarks more, because they show performance more directly than statistics or noisy results usually do. But I acknowledge that they may be different to the test-suite. Plus I guess for basic-aa, checking the number of noalias vs mayalias calls can be a more direct indication of how AA on its own is performing. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Thu Jul 9 01:26:57 2020 From: llvm-commits at lists.llvm.org (Luofan Chen via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:26:57 +0000 (UTC) Subject: [PATCH] D78861: [Attributor] [WIP] Track AA dependency using dependency graph In-Reply-To: References: Message-ID: <3bdf0159c69ac1531ab980ace7f4821a@localhost.localdomain> bbn updated this revision to Diff 276658. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78861/new/ https://reviews.llvm.org/D78861 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/Attributor.cpp llvm/lib/Transforms/IPO/AttributorAttributes.cpp llvm/test/Transforms/Attributor/depgraph.ll -------------- next part -------------- A non-text attachment was scrubbed...
Name: D78861.276658.patch Type: text/x-patch Size: 30384 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:29:35 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Thu, 09 Jul 2020 01:29:35 -0700 (PDT) Subject: [llvm] a60c31f - Fix return status of AtomicExpandPass Message-ID: <5f06d56f.1c69fb81.c351e.5c28@mx.google.com> Author: serge-sans-paille Date: 2020-07-09T10:27:48+02:00 New Revision: a60c31fd6229d2b2e578fa7e192e98303e69223c URL: https://github.com/llvm/llvm-project/commit/a60c31fd6229d2b2e578fa7e192e98303e69223c DIFF: https://github.com/llvm/llvm-project/commit/a60c31fd6229d2b2e578fa7e192e98303e69223c.diff LOG: Fix return status of AtomicExpandPass Correctly reflect change in the return status. Differential Revision: https://reviews.llvm.org/D83457 Added: Modified: llvm/lib/CodeGen/AtomicExpandPass.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index 2b51e8c84a53..a5030305435c 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -89,7 +89,7 @@ namespace { AtomicRMWInst *I, TargetLoweringBase::AtomicExpansionKind ExpansionKind); AtomicRMWInst *widenPartwordAtomicRMW(AtomicRMWInst *AI); - void expandPartwordCmpXchg(AtomicCmpXchgInst *I); + bool expandPartwordCmpXchg(AtomicCmpXchgInst *I); void expandAtomicRMWToMaskedIntrinsic(AtomicRMWInst *AI); void expandAtomicCmpXchgToMaskedIntrinsic(AtomicCmpXchgInst *CI); @@ -826,7 +826,7 @@ AtomicRMWInst *AtomicExpand::widenPartwordAtomicRMW(AtomicRMWInst *AI) { return NewAI; } -void AtomicExpand::expandPartwordCmpXchg(AtomicCmpXchgInst *CI) { +bool AtomicExpand::expandPartwordCmpXchg(AtomicCmpXchgInst *CI) { // The basic idea here is that we're expanding a cmpxchg of a // smaller memory size up to a word-sized cmpxchg. 
To do this, we // need to add a retry-loop for strong cmpxchg, so that @@ -949,6 +949,7 @@ void AtomicExpand::expandPartwordCmpXchg(AtomicCmpXchgInst *CI) { CI->replaceAllUsesWith(Res); CI->eraseFromParent(); + return true; } void AtomicExpand::expandAtomicOpToLLSC( @@ -1448,7 +1449,7 @@ bool AtomicExpand::tryExpandAtomicCmpXchg(AtomicCmpXchgInst *CI) { llvm_unreachable("Unhandled case in tryExpandAtomicCmpXchg"); case TargetLoweringBase::AtomicExpansionKind::None: if (ValueSize < MinCASSize) - expandPartwordCmpXchg(CI); + return expandPartwordCmpXchg(CI); return false; case TargetLoweringBase::AtomicExpansionKind::LLSC: { return expandAtomicCmpXchg(CI); From llvm-commits at lists.llvm.org Thu Jul 9 01:29:43 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:29:43 +0000 (UTC) Subject: [PATCH] D83457: Fix return status of AtomicExpandPass In-Reply-To: References: Message-ID: <5208d5296c19940f1009cfe0483dba58@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGa60c31fd6229: Fix return status of AtomicExpandPass (authored by serge-sans-paille). Herald added a subscriber: jfb. 
Changed prior to commit: https://reviews.llvm.org/D83457?vs=276643&id=276661#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83457/new/ https://reviews.llvm.org/D83457 Files: llvm/lib/CodeGen/AtomicExpandPass.cpp Index: llvm/lib/CodeGen/AtomicExpandPass.cpp =================================================================== --- llvm/lib/CodeGen/AtomicExpandPass.cpp +++ llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -89,7 +89,7 @@ AtomicRMWInst *I, TargetLoweringBase::AtomicExpansionKind ExpansionKind); AtomicRMWInst *widenPartwordAtomicRMW(AtomicRMWInst *AI); - void expandPartwordCmpXchg(AtomicCmpXchgInst *I); + bool expandPartwordCmpXchg(AtomicCmpXchgInst *I); void expandAtomicRMWToMaskedIntrinsic(AtomicRMWInst *AI); void expandAtomicCmpXchgToMaskedIntrinsic(AtomicCmpXchgInst *CI); @@ -826,7 +826,7 @@ return NewAI; } -void AtomicExpand::expandPartwordCmpXchg(AtomicCmpXchgInst *CI) { +bool AtomicExpand::expandPartwordCmpXchg(AtomicCmpXchgInst *CI) { // The basic idea here is that we're expanding a cmpxchg of a // smaller memory size up to a word-sized cmpxchg. To do this, we // need to add a retry-loop for strong cmpxchg, so that @@ -949,6 +949,7 @@ CI->replaceAllUsesWith(Res); CI->eraseFromParent(); + return true; } void AtomicExpand::expandAtomicOpToLLSC( @@ -1448,7 +1449,7 @@ llvm_unreachable("Unhandled case in tryExpandAtomicCmpXchg"); case TargetLoweringBase::AtomicExpansionKind::None: if (ValueSize < MinCASSize) - expandPartwordCmpXchg(CI); + return expandPartwordCmpXchg(CI); return false; case TargetLoweringBase::AtomicExpansionKind::LLSC: { return expandAtomicCmpXchg(CI); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83457.276661.patch Type: text/x-patch Size: 1493 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:29:56 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:29:56 +0000 (UTC) Subject: [PATCH] D83465: Encode alignment attribute for `atomicrmw` Message-ID: gchatelet created this revision. gchatelet added reviewers: jfb, jyknight. Herald added subscribers: llvm-commits, dexonsmith, hiraditya. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. This is a follow-up patch to D83136, adding the align attribute to `atomicrmw`. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83465 Files: llvm/docs/LangRef.rst llvm/lib/AsmParser/LLParser.cpp llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/IR/AsmWriter.cpp llvm/test/Bitcode/compatibility.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83465.276662.patch Type: text/x-patch Size: 11391 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:37:31 2020 From: llvm-commits at lists.llvm.org (Qiu Chaofan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:37:31 +0000 (UTC) Subject: [PATCH] D83466: [PowerPC] Exploit type-J min/max for maximum/minimum intrinsic Message-ID: qiucf created this revision. qiucf added reviewers: PowerPC, hfinkel, nemanjai, jsji, steven.zhang. Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya. Herald added a project: LLVM. According to LangRef, LLVM has `llvm.minimum.*` and `llvm.maximum.*` intrinsics, introduced in D52764 (in addition to `llvm.maxnum.*` and `llvm.minnum.*`). Their semantics are (taking minimum as an example): > If either operand is a NaN, returns NaN. Otherwise returns the lesser of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2018.
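As a rough illustration of the LangRef wording quoted above, the `llvm.minimum.*` semantics could be modelled in plain C++ roughly as follows. This is only a sketch and not how LLVM or the PowerPC backend implements it: the function name `minimumSketch` is made up for this example, and signaling-NaN handling is not modelled.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not an LLVM API) modelling the quoted
// llvm.minimum.* wording for doubles.
double minimumSketch(double A, double B) {
  // If either operand is a NaN, the result is a NaN.
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN();
  // -0.0 is treated as less than +0.0, although the two compare equal,
  // so the sign bit has to be inspected explicitly.
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? A : B;
  // Otherwise return the lesser of the two arguments.
  return A < B ? A : B;
}
```

The explicit zero check is the part that a plain `A < B ? A : B` would get wrong, since `-0.0 < +0.0` is false in IEEE comparisons.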
PowerPC has had type-J max/min instructions (`xs(max|min)jdp`) since ISA 3.0; their semantics are: > If src1 or src2 is a SNaN, an Invalid Operation exception occurs. > > If src1 is a NaN, result is src1. Otherwise, if src2 is a NaN, result is src2. > > Otherwise, if src1 is a Zero and src2 is a Zero and either src1 or src2 is a -Zero, the result is -Zero. > > Otherwise, if src1 is a +Zero and src2 is a +Zero, the result is +Zero. > > Otherwise, if src1 is less than src2, result is src1. Otherwise, result is src2. This instruction returns the first operand that is a NaN, and respects the sign of zero. One open question is whether `llvm.maximum.*` should quiet the result if it is an SNaN. This instruction won't. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83466 Files: llvm/lib/Target/PowerPC/PPCInstrVSX.td llvm/test/CodeGen/PowerPC/fminmax-type-j.ll Index: llvm/test/CodeGen/PowerPC/fminmax-type-j.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/PowerPC/fminmax-type-j.ll @@ -0,0 +1,27 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \ +; RUN: -mcpu=pwr9 < %s | FileCheck %s + +declare double @llvm.maximum.f64(double, double) +declare double @llvm.minimum.f64(double, double) + +define double @test_max(double %a, double %b) { +; CHECK-LABEL: test_max: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: xsmaxjdp 1, 1, 2 +; CHECK-NEXT: blr +entry: + %0 = call double @llvm.maximum.f64(double %a, double %b) + ret double %0 +} + +define double @test_min(double %a, double %b) { +; CHECK-LABEL: test_min: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: xsminjdp 1, 1, 2 +; CHECK-NEXT: blr +entry: + %0 = call double @llvm.minimum.f64(double %a, double %b) + ret double %0 +} + Index: llvm/lib/Target/PowerPC/PPCInstrVSX.td =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrVSX.td +++
llvm/lib/Target/PowerPC/PPCInstrVSX.td @@ -1616,10 +1616,12 @@ // FIXME: Setting the hasSideEffects flag here to match current behaviour. let hasSideEffects = 1 in { - def XSMAXJDP : XX3_XT5_XA5_XB5<60, 144, "xsmaxjdp", vsrc, vsfrc, vsfrc, - IIC_VecFP, []>; - def XSMINJDP : XX3_XT5_XA5_XB5<60, 152, "xsminjdp", vsrc, vsfrc, vsfrc, - IIC_VecFP, []>; + def XSMAXJDP : XX3_XT5_XA5_XB5<60, 144, "xsmaxjdp", vsfrc, vsfrc, vsfrc, + IIC_VecFP, + [(set f64:$XT, (fmaximum f64:$XA, f64:$XB))]>; + def XSMINJDP : XX3_XT5_XA5_XB5<60, 152, "xsminjdp", vsfrc, vsfrc, vsfrc, + IIC_VecFP, + [(set f64:$XT, (fminimum f64:$XA, f64:$XB))]>; } // Vector Byte-Reverse H/W/D/Q Word -------------- next part -------------- A non-text attachment was scrubbed... Name: D83466.276657.patch Type: text/x-patch Size: 2064 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:39:27 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:39:27 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class. In-Reply-To: References: Message-ID: <8962370fe046a20a4c30c2b5de8b5302@localhost.localdomain> grimar added a comment. Changing `virtual` -> `override` looks fine to me, but why did you change `private` -> `protected`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 01:39:45 2020 From: llvm-commits at lists.llvm.org (Kan Shengchen via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:39:45 +0000 (UTC) Subject: [PATCH] D83366: [MC] Simplify the logic of applying fixup for fragments, NFCI In-Reply-To: References: Message-ID: <47a9dcb203258fdf73f51962185bb4ff@localhost.localdomain> This revision was automatically updated to reflect the committed changes. 
Closed by commit rGe59e39b7c409: [MC] Simplify the logic of applying fixup for fragments, NFCI (authored by skan). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83366/new/ https://reviews.llvm.org/D83366 Files: llvm/lib/MC/MCAssembler.cpp Index: llvm/lib/MC/MCAssembler.cpp =================================================================== --- llvm/lib/MC/MCAssembler.cpp +++ llvm/lib/MC/MCAssembler.cpp @@ -820,48 +820,57 @@ // Evaluate and apply the fixups, generating relocation entries as necessary. for (MCSection &Sec : *this) { for (MCFragment &Frag : Sec) { - // Data and relaxable fragments both have fixups. So only process - // those here. - // FIXME: Is there a better way to do this? MCEncodedFragmentWithFixups - // being templated makes this tricky. - if (isa(&Frag) && - isa(&Frag)) - continue; - if (!isa(&Frag) && !isa(&Frag) && - !isa(&Frag)) - continue; ArrayRef Fixups; MutableArrayRef Contents; const MCSubtargetInfo *STI = nullptr; - if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *AF = dyn_cast(&Frag)) { + + // Process MCAlignFragment and MCEncodedFragmentWithFixups here. 
+ switch (Frag.getKind()) { + default: + continue; + case MCFragment::FT_Align: { + MCAlignFragment &AF = cast(Frag); // Insert fixup type for code alignment if the target define // shouldInsertFixupForCodeAlign target hook. - if (Sec.UseCodeAlign() && AF->hasEmitNops()) { - getBackend().shouldInsertFixupForCodeAlign(*this, Layout, *AF); - } + if (Sec.UseCodeAlign() && AF.hasEmitNops()) + getBackend().shouldInsertFixupForCodeAlign(*this, Layout, AF); continue; - } else if (auto *FragWithFixups = - dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else - llvm_unreachable("Unknown fragment with fixups!"); + } + case MCFragment::FT_Data: { + MCDataFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + STI = DF.getSubtargetInfo(); + assert(!DF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_Relaxable: { + MCRelaxableFragment &RF = cast(Frag); + Fixups = RF.getFixups(); + Contents = RF.getContents(); + STI = RF.getSubtargetInfo(); + assert(!RF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_CVDefRange: { + MCCVDefRangeFragment &CF = cast(Frag); + Fixups = CF.getFixups(); + Contents = CF.getContents(); + break; + } + case MCFragment::FT_Dwarf: { + MCDwarfLineAddrFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + case MCFragment::FT_DwarfFrame: { + MCDwarfCallFrameFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + } for (const MCFixup &Fixup : Fixups) { uint64_t FixedValue; bool IsResolved; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83366.276663.patch Type: text/x-patch Size: 4224 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 01:39:46 2020 From: llvm-commits at lists.llvm.org (Shengchen Kan via llvm-commits) Date: Thu, 09 Jul 2020 01:39:46 -0700 (PDT) Subject: [llvm] e59e39b - [MC] Simplify the logic of applying fixup for fragments, NFCI Message-ID: <5f06d7d2.1c69fb81.c12e3.5b1e@mx.google.com> Author: Shengchen Kan Date: 2020-07-09T16:39:13+08:00 New Revision: e59e39b7c4092ead733d25e7801429fd9dab7007 URL: https://github.com/llvm/llvm-project/commit/e59e39b7c4092ead733d25e7801429fd9dab7007 DIFF: https://github.com/llvm/llvm-project/commit/e59e39b7c4092ead733d25e7801429fd9dab7007.diff LOG: [MC] Simplify the logic of applying fixup for fragments, NFCI Replace multiple `if else` clauses with a `switch` clause and remove redundant checks. Before this patch, we needed to add a statement like `if(!isa(Frag)) ` here each time we added a new kind of `MCEncodedFragment`, even if it had no fixups. After this patch, we don't need to do that. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D83366 Added: Modified: llvm/lib/MC/MCAssembler.cpp Removed: ################################################################################ diff --git a/llvm/lib/MC/MCAssembler.cpp b/llvm/lib/MC/MCAssembler.cpp index c1a39cada7e2..3ca8714b7817 100644 --- a/llvm/lib/MC/MCAssembler.cpp +++ b/llvm/lib/MC/MCAssembler.cpp @@ -820,48 +820,57 @@ void MCAssembler::layout(MCAsmLayout &Layout) { // Evaluate and apply the fixups, generating relocation entries as necessary. for (MCSection &Sec : *this) { for (MCFragment &Frag : Sec) { - // Data and relaxable fragments both have fixups. So only process - // those here. - // FIXME: Is there a better way to do this? MCEncodedFragmentWithFixups - // being templated makes this tricky.
- if (isa(&Frag) && - isa(&Frag)) - continue; - if (!isa(&Frag) && !isa(&Frag) && - !isa(&Frag)) - continue; ArrayRef Fixups; MutableArrayRef Contents; const MCSubtargetInfo *STI = nullptr; - if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - STI = FragWithFixups->getSubtargetInfo(); - assert(!FragWithFixups->hasInstructions() || STI != nullptr); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *FragWithFixups = dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else if (auto *AF = dyn_cast(&Frag)) { + + // Process MCAlignFragment and MCEncodedFragmentWithFixups here. + switch (Frag.getKind()) { + default: + continue; + case MCFragment::FT_Align: { + MCAlignFragment &AF = cast(Frag); // Insert fixup type for code alignment if the target define // shouldInsertFixupForCodeAlign target hook. 
- if (Sec.UseCodeAlign() && AF->hasEmitNops()) { - getBackend().shouldInsertFixupForCodeAlign(*this, Layout, *AF); - } + if (Sec.UseCodeAlign() && AF.hasEmitNops()) + getBackend().shouldInsertFixupForCodeAlign(*this, Layout, AF); continue; - } else if (auto *FragWithFixups = - dyn_cast(&Frag)) { - Fixups = FragWithFixups->getFixups(); - Contents = FragWithFixups->getContents(); - } else - llvm_unreachable("Unknown fragment with fixups!"); + } + case MCFragment::FT_Data: { + MCDataFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + STI = DF.getSubtargetInfo(); + assert(!DF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_Relaxable: { + MCRelaxableFragment &RF = cast(Frag); + Fixups = RF.getFixups(); + Contents = RF.getContents(); + STI = RF.getSubtargetInfo(); + assert(!RF.hasInstructions() || STI != nullptr); + break; + } + case MCFragment::FT_CVDefRange: { + MCCVDefRangeFragment &CF = cast(Frag); + Fixups = CF.getFixups(); + Contents = CF.getContents(); + break; + } + case MCFragment::FT_Dwarf: { + MCDwarfLineAddrFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + case MCFragment::FT_DwarfFrame: { + MCDwarfCallFrameFragment &DF = cast(Frag); + Fixups = DF.getFixups(); + Contents = DF.getContents(); + break; + } + } for (const MCFixup &Fixup : Fixups) { uint64_t FixedValue; bool IsResolved; From llvm-commits at lists.llvm.org Thu Jul 9 01:43:05 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:43:05 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class. In-Reply-To: References: Message-ID: <7a3aba1e4ef80c8bb7addf64a34217b7@localhost.localdomain> grimar added a comment. 
Also, I find the description a bit confusing: > Virtual functions should be overridden in the derived class Isn't this change (virtual->override) here a no-op to improve the code style? All these functions are overridden anyway, with the `override` word or without it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 01:43:57 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:43:57 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class. In-Reply-To: References: Message-ID: <70b736ec08528dc4022c292ac2e00f06@localhost.localdomain> jhenderson added a comment. Some points: 1. The `virtual` on the old version didn't declare a new virtual function. In fact, it was actually superfluous from a strict point of view. `virtual` is propagated to all sub-class versions of functions with the same signature, regardless of access specifier. Prior to C++11, it was traditionally used to show that a sub-class function was an implementation of a virtual function in the parent. 2. C++11 introduced the `override` specifier. It doesn't make a function any more or less virtual than it would have been before. All it does is require that the function overrides a virtual function in the parent class. 3. There's no need to change from `private` to `protected`. This just affects how functions can be called from the outside world - it doesn't affect the `virtual` nature of functions. That all being said, there's nothing wrong with this change, since you're working in the area, in my opinion. Using `override` is always a good idea since it provides safety guarantees (rather than potentially accidentally creating a new virtual function).
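A minimal sketch of the three points above, using made-up class names (`Base`, `Derived`) rather than the actual DWARFYAML visitor classes; it is meant as an illustration of the language rules, not of the patch itself:

```cpp
// Hypothetical classes for illustration only; they are not taken from
// the DWARFYAML sources.
struct Base {
  virtual ~Base() = default;
  virtual int size() const { return 0; }
};

class Derived : public Base {
  // Private access only restricts direct calls on a Derived object; the
  // function is still virtual and is still reached through a Base reference.
private:
  // `override` does not make the function "more virtual". It only asks the
  // compiler to check that Base really has a matching virtual function.
  // Dropping `const` here would be a compile error instead of silently
  // declaring an unrelated new function.
  int size() const override { return 42; }
};

// Dispatches dynamically on the run-time type of B.
int sizeThroughBase(const Base &B) { return B.size(); }
```

Calling `sizeThroughBase` with a `Derived` object reaches `Derived::size` even though that function is private, because access is checked against the static type (`Base`) at the call site.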
You just need to update your description and summary accordingly (something like "use override instead of virtual for better safety"). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 01:44:25 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:44:25 +0000 (UTC) Subject: [PATCH] D83455: [X86] Immediately call LowerShift from lowerBuildVectorToBitOp. In-Reply-To: References: Message-ID: <4ec83c15a03f4c346995125b6c7b9fd1@localhost.localdomain> RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land. LGTM - cheers Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83455/new/ https://reviews.llvm.org/D83455 From llvm-commits at lists.llvm.org Thu Jul 9 01:44:57 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:44:57 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class. In-Reply-To: References: Message-ID: <0e1cbac58624a596a89e85961a660de6@localhost.localdomain> jhenderson added a comment. In D83452#2141125, @grimar wrote: > Also, I find the description a bit confusing: > > > Virtual functions should be overridden in the derived class > > Isn't this change (virtual->override) here a no-op to improve the code style? > All these functions are overridden anyway, with the `override` word or without it.
(I'm working on the assumption that @Higuoxing misunderstood the meaning of virtual and override - see my comment that landed at the same time as yours) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 01:46:25 2020 From: llvm-commits at lists.llvm.org (Lucas Prates via llvm-commits) Date: Thu, 09 Jul 2020 01:46:25 -0700 (PDT) Subject: [llvm] fc39a9c - [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand Message-ID: <5f06d961.1c69fb81.d8682.5c6d@mx.google.com> Author: Lucas Prates Date: 2020-07-09T09:46:17+01:00 New Revision: fc39a9ca0ef4f7b07c485e0d3c61ec0776f7a38c URL: https://github.com/llvm/llvm-project/commit/fc39a9ca0ef4f7b07c485e0d3c61ec0776f7a38c DIFF: https://github.com/llvm/llvm-project/commit/fc39a9ca0ef4f7b07c485e0d3c61ec0776f7a38c.diff LOG: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand Summary: When legalizing a bitcast operation from an fp16 operand to an i16 on a target that requires both input and output types to be promoted to 32 bits, an assertion can fail when building the new node due to a mismatch between the operation's result size and the type specified to the node. This patch fixes the issue by making sure the bit widths of the types match for the FP_TO_FP16 node, covering the difference with an extra ANYEXTEND operation.
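As a rough standalone illustration of the width mismatch described in the summary (this is not SelectionDAG code), the constant-folding part of the commit amounts to carrying a 16-bit half-precision bit pattern in a wider integer. The sketch below assumes a hypothetical helper `widenHalfBits` and a 32-bit promoted type; the diff in this commit uses `getZExtValue()` for the constant case, which is why the upper bits are zero here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the promoted constant: the 16 half-precision
// bits stay in the low half of a 32-bit integer and the upper half is zero.
// Because the source type is unsigned, this is a zero-extension and never a
// sign-extension, even when the half's sign bit (bit 15) is set.
std::uint32_t widenHalfBits(std::uint16_t HalfBits) {
  return static_cast<std::uint32_t>(HalfBits);
}
```

For example, the half-precision encoding of 0.5 is `0x3800` and that of -1.0 is `0xBC00`; both keep their low 16 bits unchanged and gain 16 zero bits on top.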
Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82552 Added: llvm/test/CodeGen/ARM/arm-half-promote.ll Modified: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index d1411c4b6060..0b80173cb419 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -4554,7 +4554,7 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, // FIXME need to be more flexible about rounding mode. (void)V.convert(APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored); - return getConstant(V.bitcastToAPInt(), DL, VT); + return getConstant(V.bitcastToAPInt().getZExtValue(), DL, VT); } } } diff --git a/llvm/test/CodeGen/ARM/arm-half-promote.ll b/llvm/test/CodeGen/ARM/arm-half-promote.ll new file mode 100644 index 000000000000..1d81273a6fd5 --- /dev/null +++ b/llvm/test/CodeGen/ARM/arm-half-promote.ll @@ -0,0 +1,53 @@ +; RUN: llc < %s -mtriple=thumbv7s-apple-ios7.0.0 | FileCheck %s + +define arm_aapcs_vfpcc { <8 x half>, <8 x half> } @f1() { +; CHECK-LABEL: _f1 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop 
{d8} +; CHECK-NEXT: bx lr + ret { <8 x half>, <8 x half> } zeroinitializer +} + +define swiftcc { <8 x half>, <8 x half> } @f2() { +; CHECK-LABEL: _f2 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + ret { <8 x half>, <8 x half> } zeroinitializer +} From llvm-commits at lists.llvm.org Thu Jul 9 01:46:27 2020 From: llvm-commits at lists.llvm.org (Lucas Prates via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:46:27 +0000 (UTC) Subject: [PATCH] D82552: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGfc39a9ca0ef4: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand (authored by pratlucas). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82552/new/ https://reviews.llvm.org/D82552 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/test/CodeGen/ARM/arm-half-promote.ll Index: llvm/test/CodeGen/ARM/arm-half-promote.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/ARM/arm-half-promote.ll @@ -0,0 +1,53 @@ +; RUN: llc < %s -mtriple=thumbv7s-apple-ios7.0.0 | FileCheck %s + +define arm_aapcs_vfpcc { <8 x half>, <8 x half> } @f1() { +; CHECK-LABEL: _f1 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + ret { <8 x half>, <8 x half> } zeroinitializer +} + +define swiftcc { <8 x half>, <8 x half> } @f2() { +; CHECK-LABEL: _f2 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop 
{d8} +; CHECK-NEXT: bx lr + ret { <8 x half>, <8 x half> } zeroinitializer +} Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -4554,7 +4554,7 @@ // FIXME need to be more flexible about rounding mode. (void)V.convert(APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored); - return getConstant(V.bitcastToAPInt(), DL, VT); + return getConstant(V.bitcastToAPInt().getZExtValue(), DL, VT); } } } From llvm-commits at lists.llvm.org Thu Jul 9 01:47:49 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:47:49 +0000 (UTC) Subject: [PATCH] D81500: [SVE] Remove calls to VectorType::getNumElements from IR In-Reply-To: References: Message-ID: <1a3d27204b899b1c77a5ac34ec88c1a7@localhost.localdomain> RKSimon added inline comments. ================ Comment at: llvm/include/llvm/IR/MatrixBuilder.h:47 LHS = B.CreateVectorSplat( - cast(RHS->getType())->getNumElements(), LHS, + cast(RHS->getType())->getNumElements(), LHS, "scalar.splat"); ---------------- Do we not have a method for creating splat vectors that works for fixed and non-fixed vector types? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81500/new/ https://reviews.llvm.org/D81500 From llvm-commits at lists.llvm.org Thu Jul 9 01:51:06 2020 From: llvm-commits at lists.llvm.org (Thorsten via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:51:06 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class. In-Reply-To: References: Message-ID: tschuett added a comment. 
See the C++ Core Guidelines rule on override: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rh-override Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 01:51:43 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:51:43 +0000 (UTC) Subject: [PATCH] D83332: [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering. In-Reply-To: References: Message-ID: <60d9f4c7d98a2f59e0c400433f62f574@localhost.localdomain> RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land. LGTM - cheers Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83332/new/ https://reviews.llvm.org/D83332 From llvm-commits at lists.llvm.org Thu Jul 9 01:51:56 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Thu, 09 Jul 2020 01:51:56 -0700 (PDT) Subject: [llvm] b805e94 - [PredicateInfo] Add additional RenamedOp field to PB. Message-ID: <5f06daac.1c69fb81.10050.6271@mx.google.com> Author: Florian Hahn Date: 2020-07-09T09:51:18+01:00 New Revision: b805e944773e119461903e5140389072c02796bf URL: https://github.com/llvm/llvm-project/commit/b805e944773e119461903e5140389072c02796bf DIFF: https://github.com/llvm/llvm-project/commit/b805e944773e119461903e5140389072c02796bf.diff LOG: [PredicateInfo] Add additional RenamedOp field to PB. OriginalOp of a predicate always refers to the original IR value that was renamed. So for nested predicates of the same value, it will always refer to the original IR value. For the use in SCCP however, we need to find the renamed value that is currently used in the condition associated with the predicate. This patch adds a new RenamedOp field to do exactly that. NewGVN currently relies on the existing behavior to merge instruction metadata. 
A test case to check for exactly that has been added in 195fa4bfae10. Reviewers: efriedma, davide, nikic Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D78133 Added: Modified: llvm/include/llvm/Transforms/Utils/PredicateInfo.h llvm/lib/Transforms/Utils/PredicateInfo.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Transforms/Utils/PredicateInfo.h b/llvm/include/llvm/Transforms/Utils/PredicateInfo.h index 16f776f7d292..657b97c67a8b 100644 --- a/llvm/include/llvm/Transforms/Utils/PredicateInfo.h +++ b/llvm/include/llvm/Transforms/Utils/PredicateInfo.h @@ -79,6 +79,10 @@ class PredicateBase : public ilist_node { // This can be use by passes, when destroying predicateinfo, to know // whether they can just drop the intrinsic, or have to merge metadata. Value *OriginalOp; + // The renamed operand in the condition used for this predicate. For nested + // predicates, this is different to OriginalOp which refers to the initial + // operand. + Value *RenamedOp; PredicateBase(const PredicateBase &) = delete; PredicateBase &operator=(const PredicateBase &) = delete; PredicateBase() = delete; diff --git a/llvm/lib/Transforms/Utils/PredicateInfo.cpp b/llvm/lib/Transforms/Utils/PredicateInfo.cpp index 6fba0bc13da6..d320f488c5c5 100644 --- a/llvm/lib/Transforms/Utils/PredicateInfo.cpp +++ b/llvm/lib/Transforms/Utils/PredicateInfo.cpp @@ -600,6 +600,9 @@ Value *PredicateInfoBuilder::materializeStack(unsigned int &Counter, RenameIter == RenameStack.begin() ? OrigOp : (RenameIter - 1)->Def; ValueDFS &Result = *RenameIter; auto *ValInfo = Result.PInfo; + ValInfo->RenamedOp = (RenameStack.end() - Start) == RenameStack.begin() + ? OrigOp + : (RenameStack.end() - Start - 1)->Def; // For edge predicates, we can just place the operand in the block before // the terminator. For assume, we have to place it right before the assume // to ensure we dominate all of our uses. 
Always insert right before the From llvm-commits at lists.llvm.org Thu Jul 9 01:52:01 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:52:01 +0000 (UTC) Subject: [PATCH] D78133: [PredicateInfo] Add additional RenamedOp field to PB. In-Reply-To: References: Message-ID: <01a61205b66265022b83a2baba092ecb@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGb805e944773e: [PredicateInfo] Add additional RenamedOp field to PB. (authored by fhahn). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78133/new/ https://reviews.llvm.org/D78133 Files: llvm/include/llvm/Transforms/Utils/PredicateInfo.h llvm/lib/Transforms/Utils/PredicateInfo.cpp Index: llvm/lib/Transforms/Utils/PredicateInfo.cpp =================================================================== --- llvm/lib/Transforms/Utils/PredicateInfo.cpp +++ llvm/lib/Transforms/Utils/PredicateInfo.cpp @@ -600,6 +600,9 @@ RenameIter == RenameStack.begin() ? OrigOp : (RenameIter - 1)->Def; ValueDFS &Result = *RenameIter; auto *ValInfo = Result.PInfo; + ValInfo->RenamedOp = (RenameStack.end() - Start) == RenameStack.begin() + ? OrigOp + : (RenameStack.end() - Start - 1)->Def; // For edge predicates, we can just place the operand in the block before // the terminator. For assume, we have to place it right before the assume // to ensure we dominate all of our uses. Always insert right before the Index: llvm/include/llvm/Transforms/Utils/PredicateInfo.h =================================================================== --- llvm/include/llvm/Transforms/Utils/PredicateInfo.h +++ llvm/include/llvm/Transforms/Utils/PredicateInfo.h @@ -79,6 +79,10 @@ // This can be use by passes, when destroying predicateinfo, to know // whether they can just drop the intrinsic, or have to merge metadata. 
Value *OriginalOp; + // The renamed operand in the condition used for this predicate. For nested + // predicates, this is different to OriginalOp which refers to the initial + // operand. + Value *RenamedOp; PredicateBase(const PredicateBase &) = delete; PredicateBase &operator=(const PredicateBase &) = delete; PredicateBase() = delete; From llvm-commits at lists.llvm.org Thu Jul 9 01:52:13 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:52:13 +0000 (UTC) Subject: [PATCH] D83424: [PGO][PGSO] Add profile guided size optimization tests to X86 ISel Lowering. In-Reply-To: References: Message-ID: <48b5bbf193eb8edcf7f578f81070f900@localhost.localdomain> RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land. LGTM - cheers Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83424/new/ https://reviews.llvm.org/D83424 From llvm-commits at lists.llvm.org Thu Jul 9 01:54:44 2020 From: llvm-commits at lists.llvm.org (Evgeniy via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:54:44 +0000 (UTC) Subject: [PATCH] D79485: [BPI] Improve static heuristics for "cold" paths. In-Reply-To: References: Message-ID: <743c7b6ac9a10127864c6f44f8f54494@localhost.localdomain> ebrevnov added a comment. I measured performance on a bunch of Java-related benchmarks we have in-house, including SPECJVM2008, SPECJbb2015, DaCapo9.12 and others. I don't see any noticeable impact on performance. Please let me know if you need more details on these experiments or some additional perf data. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79485/new/ https://reviews.llvm.org/D79485 From llvm-commits at lists.llvm.org Thu Jul 9 01:58:18 2020 From: llvm-commits at lists.llvm.org (Oliver Stannard via llvm-commits) Date: Thu, 09 Jul 2020 01:58:18 -0700 (PDT) Subject: [llvm] dc4a6f5 - [llvm-objdump] Display locations of variables alongside disassembly Message-ID: <5f06dc2a.1c69fb81.3cf5b.5fe0@mx.google.com> Author: Oliver Stannard Date: 2020-07-09T09:58:00+01:00 New Revision: dc4a6f5db4f0178bae43ef615cc8902c759d6195 URL: https://github.com/llvm/llvm-project/commit/dc4a6f5db4f0178bae43ef615cc8902c759d6195 DIFF: https://github.com/llvm/llvm-project/commit/dc4a6f5db4f0178bae43ef615cc8902c759d6195.diff LOG: [llvm-objdump] Display locations of variables alongside disassembly This adds the --debug-vars option to llvm-objdump, which prints locations (registers/memory) of source-level variables alongside the disassembly based on DWARF info. A vertical line is printed for each live-range, with a label at the top giving the variable name and location, and the position and length of the line indicating the program counter range in which it is valid. 
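As a rough illustration of the display described above (this is not the llvm-objdump implementation), the following C++ sketch draws one column per live range next to each instruction. The `LiveRange` struct and the `markersFor` and `labelFor` helpers are hypothetical names invented for this example. The marker scheme is deliberately simplified: it uses only the ASCII characters `|` and `v`, never reuses a column, and does not model ranges that begin at the end of an instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one live range: the variable Name is held in
// Location for every address in the half-open interval [Begin, End).
struct LiveRange {
  std::string Name;     // source-level variable name, e.g. "a"
  std::string Location; // printed location, e.g. "R0"
  uint64_t Begin;
  uint64_t End;
};

// Builds the marker text for one instruction occupying [Addr, Addr + Size).
// Each range gets a two-character cell:
//   '|' the range covers this instruction and continues after it,
//   'v' the range covers this instruction and ends with it,
//   ' ' the range does not cover this instruction.
std::string markersFor(const std::vector<LiveRange> &Ranges, uint64_t Addr,
                       uint64_t Size) {
  assert(Size > 0 && "instructions must occupy at least one byte");
  std::string Out;
  for (const LiveRange &R : Ranges) {
    bool Covers = R.Begin <= Addr && Addr < R.End;
    char C = ' ';
    if (Covers)
      C = (R.End <= Addr + Size) ? 'v' : '|';
    Out += C;
    Out += ' ';
  }
  return Out;
}

// Builds the label line that announces range Index at address Addr, for
// example "| |- b = R1". Ranges in earlier columns that are live at Addr
// keep their vertical line so the columns stay aligned.
std::string labelFor(const std::vector<LiveRange> &Ranges, std::size_t Index,
                     uint64_t Addr) {
  assert(Index < Ranges.size() && "label requested for unknown range");
  std::string Out;
  for (std::size_t I = 0; I < Index; ++I) {
    const LiveRange &R = Ranges[I];
    Out += (R.Begin <= Addr && Addr < R.End) ? "| " : "  ";
  }
  const LiveRange &R = Ranges[Index];
  Out += "|- " + R.Name + " = " + R.Location;
  return Out;
}
```

With ranges `a = R0` over [0, 4) and `b = R1`, `c = R2` over [0, 12), the call `markersFor(Ranges, 0, 4)` gives `v | | `: `a` ends with the first 4-byte instruction while `b` and `c` continue.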
Differential revision: https://reviews.llvm.org/D70720 Added: llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s Modified: llvm/docs/CommandGuide/llvm-objdump.rst llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp llvm/tools/llvm-objdump/llvm-objdump.cpp Removed: ################################################################################ diff --git a/llvm/docs/CommandGuide/llvm-objdump.rst b/llvm/docs/CommandGuide/llvm-objdump.rst index 2de48d66536f..df4cf746abbb 100644 --- a/llvm/docs/CommandGuide/llvm-objdump.rst +++ b/llvm/docs/CommandGuide/llvm-objdump.rst @@ -123,6 +123,17 @@ OPTIONS Demangle symbol names in the output. +.. option:: --debug-vars= + + Print the locations (in registers or memory) of source-level variables + alongside disassembly. ``format`` may be ``unicode`` or ``ascii``, defaulting + to ``unicode`` if omitted. + +.. option:: --debug-vars-indent= + + Distance to indent the source-level variable display, relative to the start + of the disassembly. Defaults to 40 characters. + .. option:: -j, --section= Perform commands on the specified sections only. 
For Mach-O use diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h index 28929306ca92..1aff2624990f 100644 --- a/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h +++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h @@ -142,6 +142,12 @@ class DWARFExpression { void print(raw_ostream &OS, const MCRegisterInfo *RegInfo, DWARFUnit *U, bool IsEH = false) const; + /// Print the expression in a format intended to be compact and useful to a + /// user, but not perfectly unambiguous, or capable of representing every + /// valid DWARF expression. Returns true if the expression was sucessfully + /// printed. + bool printCompact(raw_ostream &OS, const MCRegisterInfo &RegInfo); + bool verify(DWARFUnit *U); private: diff --git a/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp b/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp index 28b24b3baaab..d3c1cd5bb88f 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp @@ -374,4 +374,74 @@ bool DWARFExpression::verify(DWARFUnit *U) { return true; } +/// A user-facing string representation of a DWARF expression. This might be an +/// Address expression, in which case it will be implicitly dereferenced, or a +/// Value expression. +struct PrintedExpr { + enum ExprKind { + Address, + Value, + }; + ExprKind Kind; + SmallString<16> String; + + PrintedExpr(ExprKind K = Address) : Kind(K) {} +}; + +static bool printCompactDWARFExpr(raw_ostream &OS, DWARFExpression::iterator I, + const DWARFExpression::iterator E, + const MCRegisterInfo &MRI) { + SmallVector Stack; + + while (I != E) { + DWARFExpression::Operation &Op = *I; + uint8_t Opcode = Op.getCode(); + switch (Opcode) { + case dwarf::DW_OP_regx: { + // DW_OP_regx: A register, with the register num given as an operand. + // Printed as the plain register name. 
+ uint64_t DwarfRegNum = Op.getRawOperand(0); + Optional LLVMRegNum = MRI.getLLVMRegNum(DwarfRegNum, false); + if (!LLVMRegNum) { + OS << ""; + return false; + } + raw_svector_ostream S(Stack.emplace_back(PrintedExpr::Value).String); + S << MRI.getName(*LLVMRegNum); + break; + } + default: + if (Opcode >= dwarf::DW_OP_reg0 && Opcode <= dwarf::DW_OP_reg31) { + // DW_OP_reg: A register, with the register num implied by the + // opcode. Printed as the plain register name. + uint64_t DwarfRegNum = Opcode - dwarf::DW_OP_reg0; + Optional LLVMRegNum = MRI.getLLVMRegNum(DwarfRegNum, false); + if (!LLVMRegNum) { + OS << ""; + return false; + } + raw_svector_ostream S(Stack.emplace_back(PrintedExpr::Value).String); + S << MRI.getName(*LLVMRegNum); + } else { + // If we hit an unknown operand, we don't know its effect on the stack, + // so bail out on the whole expression. + OS << ""; + return false; + } + break; + } + ++I; + } + + assert(Stack.size() == 1 && "expected one value on stack"); + OS << Stack.front().String; + + return true; +} + +bool DWARFExpression::printCompact(raw_ostream &OS, const MCRegisterInfo &MRI) { + return printCompactDWARFExpr(OS, begin(), end(), MRI); +} + } // namespace llvm diff --git a/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c b/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c new file mode 100644 index 000000000000..20c17f7d9762 --- /dev/null +++ b/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c @@ -0,0 +1,10 @@ +int foo(int a, int b, int c) { + int x = a + b; + int y = x + c; + return y; +} + +int bar(int a) { + a++; + return a; +} diff --git a/llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c b/llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c new file mode 100644 index 000000000000..8d923be01328 --- /dev/null +++ b/llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c @@ -0,0 +1,3 @@ +int foo(int *喵) { + return *喵; +} diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s 
b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s new file mode 100644 index 000000000000..9bbb36f14e3a --- /dev/null +++ b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s @@ -0,0 +1,351 @@ +## Check that the --debug-vars option works for simple register locations, when +## using DWARF4 debug info, with functions in multiple sections. + +## Generated with this compile command, with the source code in Inputs/debug.c: +## clang --target=arm--none-eabi -march=armv7-a -c debug.c -O1 -gdwarf-4 -S -o - -ffunction-sections + +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --no-show-raw-insn | \ +# RUN: FileCheck %s + +# CHECK: Disassembly of section .text.foo: +# CHECK-EMPTY: +# CHECK-NEXT: 00000000 : +# CHECK-NEXT: ┠─ a = R0 +# CHECK-NEXT: ┃ ┠─ b = R1 +# CHECK-NEXT: ┃ ┃ ┠─ c = R2 +# CHECK-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# CHECK-NEXT: 0: add r0, r1, r0 ┻ ┃ ┃ ╈ +# CHECK-NEXT: ┌─ y = R0 +# CHECK-NEXT: 4: add r0, r0, r2 ╈ ┃ ┃ ┻ +# CHECK-NEXT: 8: bx lr ┻ ┻ ┻ +# CHECK-EMPTY: +# CHECK-NEXT: Disassembly of section .text.bar: +# CHECK-EMPTY: +# CHECK-NEXT: 00000000 : +# CHECK-NEXT: ┠─ a = R0 +# CHECK-NEXT: 0: add r0, r0, #1 ┃ +# CHECK-NEXT: 4: bx lr ┻ + + .text + .syntax unified + .eabi_attribute 67, "2.09" + .eabi_attribute 6, 10 + .eabi_attribute 7, 65 + .eabi_attribute 8, 1 + .eabi_attribute 9, 2 + .fpu neon + .eabi_attribute 34, 0 + .eabi_attribute 17, 1 + .eabi_attribute 20, 1 + .eabi_attribute 21, 1 + .eabi_attribute 23, 3 + .eabi_attribute 24, 1 + .eabi_attribute 25, 1 + .eabi_attribute 38, 1 + .eabi_attribute 18, 4 + .eabi_attribute 26, 2 + .eabi_attribute 14, 0 + .file "debug.c" + .section .text.foo,"ax",%progbits + .globl foo + .p2align 2 + .type foo,%function + .code 32 +foo: +.Lfunc_begin0: + .file 1 "/work" "llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" + .loc 1 1 0 + .fnstart + .cfi_sections .debug_frame + .cfi_startproc + .loc 1 2 13 prologue_end + add r0, r1, r0 +.Ltmp0: + .loc 
1 3 13 + add r0, r0, r2 +.Ltmp1: + .loc 1 4 3 + bx lr +.Ltmp2: +.Lfunc_end0: + .size foo, .Lfunc_end0-foo + .cfi_endproc + .cantunwind + .fnend + + .section .text.bar,"ax",%progbits + .globl bar + .p2align 2 + .type bar,%function + .code 32 +bar: +.Lfunc_begin1: + .loc 1 7 0 + .fnstart + .cfi_startproc + .loc 1 8 4 prologue_end + add r0, r0, #1 +.Ltmp3: + .loc 1 9 3 + bx lr +.Ltmp4: +.Lfunc_end1: + .size bar, .Lfunc_end1-bar + .cfi_endproc + .cantunwind + .fnend + + .section .debug_str,"MS",%progbits,1 +.Linfo_string0: + .asciz "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" +.Linfo_string1: + .asciz "/work/llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" +.Linfo_string2: + .asciz "/work/scratch" +.Linfo_string3: + .asciz "foo" +.Linfo_string4: + .asciz "int" +.Linfo_string5: + .asciz "bar" +.Linfo_string6: + .asciz "a" +.Linfo_string7: + .asciz "b" +.Linfo_string8: + .asciz "c" +.Linfo_string9: + .asciz "x" +.Linfo_string10: + .asciz "y" + .section .debug_loc,"",%progbits +.Ldebug_loc0: + .long -1 + .long .Lfunc_begin0 + .long .Lfunc_begin0-.Lfunc_begin0 + .long .Ltmp0-.Lfunc_begin0 + .short 1 + .byte 80 + .long 0 + .long 0 +.Ldebug_loc1: + .long -1 + .long .Lfunc_begin0 + .long .Ltmp0-.Lfunc_begin0 + .long .Ltmp1-.Lfunc_begin0 + .short 1 + .byte 80 + .long 0 + .long 0 +.Ldebug_loc2: + .long -1 + .long .Lfunc_begin0 + .long .Ltmp1-.Lfunc_begin0 + .long .Lfunc_end0-.Lfunc_begin0 + .short 1 + .byte 80 + .long 0 + .long 0 + .section .debug_abbrev,"",%progbits + .byte 1 + .byte 17 + .byte 1 + .byte 37 + .byte 14 + .byte 19 + .byte 5 + .byte 3 + .byte 14 + .byte 16 + .byte 23 + .byte 27 + .byte 14 + .byte 17 + .byte 1 + .byte 85 + .byte 23 + .byte 0 + .byte 0 + .byte 2 + .byte 46 + .byte 1 + .byte 17 + .byte 1 + .byte 18 + .byte 6 + .byte 64 + .byte 24 + .ascii "\227B" + .byte 25 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 39 + .byte 25 + .byte 73 + .byte 19 + .byte 63 + 
.byte 25 + .byte 0 + .byte 0 + .byte 3 + .byte 5 + .byte 0 + .byte 2 + .byte 23 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 4 + .byte 5 + .byte 0 + .byte 2 + .byte 24 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 5 + .byte 52 + .byte 0 + .byte 2 + .byte 23 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 6 + .byte 36 + .byte 0 + .byte 3 + .byte 14 + .byte 62 + .byte 11 + .byte 11 + .byte 11 + .byte 0 + .byte 0 + .byte 0 + .section .debug_info,"",%progbits +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 +.Ldebug_info_start0: + .short 4 + .long .debug_abbrev + .byte 4 + .byte 1 + .long .Linfo_string0 + .short 12 + .long .Linfo_string1 + .long .Lline_table_start0 + .long .Linfo_string2 + .long 0 + .long .Ldebug_ranges0 + .byte 2 + .long .Lfunc_begin0 + .long .Lfunc_end0-.Lfunc_begin0 + .byte 1 + .byte 91 + + .long .Linfo_string3 + .byte 1 + .byte 1 + + .long 166 + + .byte 3 + .long .Ldebug_loc0 + .long .Linfo_string6 + .byte 1 + .byte 1 + .long 166 + .byte 4 + .byte 1 + .byte 81 + .long .Linfo_string7 + .byte 1 + .byte 1 + .long 166 + .byte 4 + .byte 1 + .byte 82 + .long .Linfo_string8 + .byte 1 + .byte 1 + .long 166 + .byte 5 + .long .Ldebug_loc1 + .long .Linfo_string9 + .byte 1 + .byte 2 + .long 166 + .byte 5 + .long .Ldebug_loc2 + .long .Linfo_string10 + .byte 1 + .byte 3 + .long 166 + .byte 0 + .byte 2 + .long .Lfunc_begin1 + .long .Lfunc_end1-.Lfunc_begin1 + .byte 1 + .byte 91 + + .long .Linfo_string5 + .byte 1 + .byte 7 + + .long 166 + + .byte 4 + .byte 1 + .byte 80 + .long .Linfo_string6 + .byte 1 + .byte 7 + .long 166 + .byte 0 + .byte 6 + .long .Linfo_string4 + .byte 5 + .byte 4 + .byte 0 +.Ldebug_info_end0: + .section .debug_ranges,"",%progbits +.Ldebug_ranges0: + .long .Lfunc_begin0 + .long .Lfunc_end0 + .long .Lfunc_begin1 + .long .Lfunc_end1 
+ .long 0 + .long 0 + .ident "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" + .section ".note.GNU-stack","",%progbits + .addrsig + .eabi_attribute 30, 1 + .section .debug_line,"",%progbits +.Lline_table_start0: diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s new file mode 100644 index 000000000000..bf0c7bd52feb --- /dev/null +++ b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s @@ -0,0 +1,454 @@ +## Check that the --debug-vars option works for simple register locations, when +## using DWARF4 debug info, with multiple functions in one section. Check that +## the live-range lines are rendered correctly when using the --no-show-raw-insn, +## --line-numbers and --source options. These do not affect the DWARF parsing +## used by --debug-vars, but do add extra lines or columns to the output, so we +## test to make sure the live ranges are still displayed correctly. + +## Generated with this compile command, with the source code in Inputs/debug.c: +## clang --target=arm--none-eabi -march=armv7-a -c debug.c -O1 -gdwarf-4 -S -o - + +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars | \ +# RUN: FileCheck %s --check-prefix=RAW --strict-whitespace + +## Check that passing the default value for --debug-vars-indent (40) makes no +## change to the output. 
+# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --debug-vars-indent=40 | \ +# RUN: FileCheck %s --check-prefix=RAW --strict-whitespace + +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --debug-vars-indent=30 | \ +# RUN: FileCheck %s --check-prefix=INDENT --strict-whitespace + +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --no-show-raw-insn | \ +# RUN: FileCheck %s --check-prefix=NO-RAW --strict-whitespace + +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --no-show-raw-insn --line-numbers | \ +# RUN: FileCheck %s --check-prefix=LINE-NUMS --strict-whitespace + +# RUN: mkdir -p %t/a +# RUN: cp %p/Inputs/debug.c %t/a/debug.c +# RUN: sed -e "s,SRC_COMPDIR,%/t/a,g" %s > %t.s +# RUN: llvm-mc -triple armv8a--none-eabi < %t.s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --no-show-raw-insn --source | \ +# RUN: FileCheck %s --check-prefix=SOURCE --strict-whitespace + +## An optional argument to the --debug-vars= option can be used to switch +## between unicode and ascii output (with unicode being the default). +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars=unicode | \ +# RUN: FileCheck %s --check-prefix=RAW --strict-whitespace +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars=ascii | \ +# RUN: FileCheck %s --check-prefix=ASCII --strict-whitespace + +## Note that llvm-objdump emits tab characters in the disassembly, assuming an +## 8-byte tab stop, so these might not look aligned in a text editor. 
+ +# RAW: 00000000 : +# RAW-NEXT: ┠─ a = R0 +# RAW-NEXT: ┃ ┠─ b = R1 +# RAW-NEXT: ┃ ┃ ┠─ c = R2 +# RAW-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# RAW-NEXT: 0: 00 00 81 e0 add r0, r1, r0 ┻ ┃ ┃ ╈ +# RAW-NEXT: ┌─ y = R0 +# RAW-NEXT: 4: 02 00 80 e0 add r0, r0, r2 ╈ ┃ ┃ ┻ +# RAW-NEXT: 8: 1e ff 2f e1 bx lr ┻ ┻ ┻ +# RAW-EMPTY: +# RAW-NEXT: 0000000c : +# RAW-NEXT: ┠─ a = R0 +# RAW-NEXT: c: 01 00 80 e2 add r0, r0, #1 ┃ +# RAW-NEXT: 10: 1e ff 2f e1 bx lr ┻ + + +# INDENT: 00000000 : +# INDENT-NEXT: ┠─ a = R0 +# INDENT-NEXT: ┃ ┠─ b = R1 +# INDENT-NEXT: ┃ ┃ ┠─ c = R2 +# INDENT-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# INDENT-NEXT: 0: 00 00 81 e0 add r0, r1, r0 ┻ ┃ ┃ ╈ +# INDENT-NEXT: ┌─ y = R0 +# INDENT-NEXT: 4: 02 00 80 e0 add r0, r0, r2 ╈ ┃ ┃ ┻ +# INDENT-NEXT: 8: 1e ff 2f e1 bx lr ┻ ┻ ┻ +# INDENT-EMPTY: +# INDENT-NEXT: 0000000c : +# INDENT-NEXT: ┠─ a = R0 +# INDENT-NEXT: c: 01 00 80 e2 add r0, r0, #1 ┃ +# INDENT-NEXT: 10: 1e ff 2f e1 bx lr ┻ + +# NO-RAW: 00000000 : +# NO-RAW-NEXT: ┠─ a = R0 +# NO-RAW-NEXT: ┃ ┠─ b = R1 +# NO-RAW-NEXT: ┃ ┃ ┠─ c = R2 +# NO-RAW-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# NO-RAW-NEXT: 0: add r0, r1, r0 ┻ ┃ ┃ ╈ +# NO-RAW-NEXT: ┌─ y = R0 +# NO-RAW-NEXT: 4: add r0, r0, r2 ╈ ┃ ┃ ┻ +# NO-RAW-NEXT: 8: bx lr ┻ ┻ ┻ +# NO-RAW-EMPTY: +# NO-RAW-NEXT: 0000000c : +# NO-RAW-NEXT: ┠─ a = R0 +# NO-RAW-NEXT: c: add r0, r0, #1 ┃ +# NO-RAW-NEXT: 10: bx lr ┻ + +# LINE-NUMS: 00000000 : +# LINE-NUMS-NEXT: ; foo(): +# LINE-NUMS-NEXT: ; SRC_COMPDIR{{[\\/]}}debug.c:2 ┠─ a = R0 +# LINE-NUMS-NEXT: ┃ ┠─ b = R1 +# LINE-NUMS-NEXT: ┃ ┃ ┠─ c = R2 +# LINE-NUMS-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# LINE-NUMS-NEXT: 0: add r0, r1, r0 ┻ ┃ ┃ ╈ +# LINE-NUMS-NEXT: ; SRC_COMPDIR{{[\\/]}}debug.c:3 ┌─ y = R0 +# LINE-NUMS-NEXT: 4: add r0, r0, r2 ╈ ┃ ┃ ┻ +# LINE-NUMS-NEXT: ; SRC_COMPDIR{{[\\/]}}debug.c:4 ┃ ┃ ┃ +# LINE-NUMS-NEXT: 8: bx lr ┻ ┻ ┻ +# LINE-NUMS-EMPTY: +# LINE-NUMS-NEXT: 0000000c : +# LINE-NUMS-NEXT: ; bar(): +# LINE-NUMS-NEXT: ; SRC_COMPDIR{{[\\/]}}debug.c:8 ┠─ a = R0 +# LINE-NUMS-NEXT: c: add r0, r0, #1 ┃ +# LINE-NUMS-NEXT: ; 
SRC_COMPDIR{{[\\/]}}debug.c:9 ┃ +# LINE-NUMS-NEXT: 10: bx lr ┻ + +# SOURCE: 00000000 : +# SOURCE-NEXT: ; int x = a + b; ┠─ a = R0 +# SOURCE-NEXT: ┃ ┠─ b = R1 +# SOURCE-NEXT: ┃ ┃ ┠─ c = R2 +# SOURCE-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# SOURCE-NEXT: 0: add r0, r1, r0 ┻ ┃ ┃ ╈ +# SOURCE-NEXT: ; int y = x + c; ┌─ y = R0 +# SOURCE-NEXT: 4: add r0, r0, r2 ╈ ┃ ┃ ┻ +# SOURCE-NEXT: ; return y; ┃ ┃ ┃ +# SOURCE-NEXT: 8: bx lr ┻ ┻ ┻ +# SOURCE-EMPTY: +# SOURCE-NEXT: 0000000c : +# SOURCE-NEXT: ; a++; ┠─ a = R0 +# SOURCE-NEXT: c: add r0, r0, #1 ┃ +# SOURCE-NEXT: ; return a; ┃ +# SOURCE-NEXT: 10: bx lr ┻ + +# ASCII: 00000000 : +# ASCII-NEXT: |- a = R0 +# ASCII-NEXT: | |- b = R1 +# ASCII-NEXT: | | |- c = R2 +# ASCII-NEXT: | | | /- x = R0 +# ASCII-NEXT: 0: 00 00 81 e0 add r0, r1, r0 v | | ^ +# ASCII-NEXT: /- y = R0 +# ASCII-NEXT: 4: 02 00 80 e0 add r0, r0, r2 ^ | | v +# ASCII-NEXT: 8: 1e ff 2f e1 bx lr v v v +# ASCII-EMPTY: +# ASCII-NEXT: 0000000c : +# ASCII-NEXT: |- a = R0 +# ASCII-NEXT: c: 01 00 80 e2 add r0, r0, #1 | +# ASCII-NEXT: 10: 1e ff 2f e1 bx lr v + + .text + .syntax unified + .eabi_attribute 67, "2.09" + .eabi_attribute 6, 10 + .eabi_attribute 7, 65 + .eabi_attribute 8, 1 + .eabi_attribute 9, 2 + .fpu neon + .eabi_attribute 34, 0 + .eabi_attribute 17, 1 + .eabi_attribute 20, 1 + .eabi_attribute 21, 1 + .eabi_attribute 23, 3 + .eabi_attribute 24, 1 + .eabi_attribute 25, 1 + .eabi_attribute 38, 1 + .eabi_attribute 18, 4 + .eabi_attribute 26, 2 + .eabi_attribute 14, 0 + .file "debug.c" + .globl foo + .p2align 2 + .type foo,%function + .code 32 +foo: +.Lfunc_begin0: + .file 1 "" "SRC_COMPDIR/debug.c" + .loc 1 1 0 + .fnstart + .cfi_sections .debug_frame + .cfi_startproc + .loc 1 2 13 prologue_end + add r0, r1, r0 +.Ltmp0: + .loc 1 3 13 + add r0, r0, r2 +.Ltmp1: + .loc 1 4 3 + bx lr +.Ltmp2: +.Lfunc_end0: + .size foo, .Lfunc_end0-foo + .cfi_endproc + .cantunwind + .fnend + + .globl bar + .p2align 2 + .type bar,%function + .code 32 +bar: +.Lfunc_begin1: + .loc 1 7 0 + .fnstart + 
.cfi_startproc + .loc 1 8 4 prologue_end + add r0, r0, #1 +.Ltmp3: + .loc 1 9 3 + bx lr +.Ltmp4: +.Lfunc_end1: + .size bar, .Lfunc_end1-bar + .cfi_endproc + .cantunwind + .fnend + + .section .debug_str,"MS",%progbits,1 +.Linfo_string0: + .asciz "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" +.Linfo_string1: + .asciz "SRC_COMPDIR/debug.c" +.Linfo_string2: + .asciz "" +.Linfo_string3: + .asciz "foo" +.Linfo_string4: + .asciz "int" +.Linfo_string5: + .asciz "bar" +.Linfo_string6: + .asciz "a" +.Linfo_string7: + .asciz "b" +.Linfo_string8: + .asciz "c" +.Linfo_string9: + .asciz "x" +.Linfo_string10: + .asciz "y" + .section .debug_loc,"",%progbits +.Ldebug_loc0: + .long .Lfunc_begin0-.Lfunc_begin0 + .long .Ltmp0-.Lfunc_begin0 + .short 1 + .byte 80 + .long 0 + .long 0 +.Ldebug_loc1: + .long .Ltmp0-.Lfunc_begin0 + .long .Ltmp1-.Lfunc_begin0 + .short 1 + .byte 80 + .long 0 + .long 0 +.Ldebug_loc2: + .long .Ltmp1-.Lfunc_begin0 + .long .Lfunc_end0-.Lfunc_begin0 + .short 1 + .byte 80 + .long 0 + .long 0 + .section .debug_abbrev,"",%progbits + .byte 1 + .byte 17 + .byte 1 + .byte 37 + .byte 14 + .byte 19 + .byte 5 + .byte 3 + .byte 14 + .byte 16 + .byte 23 + .byte 27 + .byte 14 + .byte 17 + .byte 1 + .byte 18 + .byte 6 + .byte 0 + .byte 0 + .byte 2 + .byte 46 + .byte 1 + .byte 17 + .byte 1 + .byte 18 + .byte 6 + .byte 64 + .byte 24 + .ascii "\227B" + .byte 25 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 39 + .byte 25 + .byte 73 + .byte 19 + .byte 63 + .byte 25 + .byte 0 + .byte 0 + .byte 3 + .byte 5 + .byte 0 + .byte 2 + .byte 23 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 4 + .byte 5 + .byte 0 + .byte 2 + .byte 24 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 5 + .byte 52 + .byte 0 + .byte 2 + .byte 23 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 
59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 6 + .byte 36 + .byte 0 + .byte 3 + .byte 14 + .byte 62 + .byte 11 + .byte 11 + .byte 11 + .byte 0 + .byte 0 + .byte 0 + .section .debug_info,"",%progbits +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 +.Ldebug_info_start0: + .short 4 + .long .debug_abbrev + .byte 4 + .byte 1 + .long .Linfo_string0 + .short 12 + .long .Linfo_string1 + .long .Lline_table_start0 + .long .Linfo_string2 + .long .Lfunc_begin0 + .long .Lfunc_end1-.Lfunc_begin0 + .byte 2 + .long .Lfunc_begin0 + .long .Lfunc_end0-.Lfunc_begin0 + .byte 1 + .byte 91 + + .long .Linfo_string3 + .byte 1 + .byte 1 + + .long 166 + + .byte 3 + .long .Ldebug_loc0 + .long .Linfo_string6 + .byte 1 + .byte 1 + .long 166 + .byte 4 + .byte 1 + .byte 81 + .long .Linfo_string7 + .byte 1 + .byte 1 + .long 166 + .byte 4 + .byte 1 + .byte 82 + .long .Linfo_string8 + .byte 1 + .byte 1 + .long 166 + .byte 5 + .long .Ldebug_loc1 + .long .Linfo_string9 + .byte 1 + .byte 2 + .long 166 + .byte 5 + .long .Ldebug_loc2 + .long .Linfo_string10 + .byte 1 + .byte 3 + .long 166 + .byte 0 + .byte 2 + .long .Lfunc_begin1 + .long .Lfunc_end1-.Lfunc_begin1 + .byte 1 + .byte 91 + + .long .Linfo_string5 + .byte 1 + .byte 7 + + .long 166 + + .byte 4 + .byte 1 + .byte 80 + .long .Linfo_string6 + .byte 1 + .byte 7 + .long 166 + .byte 0 + .byte 6 + .long .Linfo_string4 + .byte 5 + .byte 4 + .byte 0 +.Ldebug_info_end0: + .ident "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" + .section ".note.GNU-stack","",%progbits + .addrsig + .eabi_attribute 30, 1 + .section .debug_line,"",%progbits +.Lline_table_start0: diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s new file mode 100644 index 000000000000..a9e08ff95556 --- /dev/null +++ b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s @@ -0,0 +1,411 @@ +## Check that the 
--debug-vars option works for simple register locations, when +## using DWARF4 debug info, with functions in multiple sections. + +## Generated with this compile command, with the source code in Inputs/debug.c: +## clang --target=arm--none-eabi -march=armv7-a -c debug.c -O1 -gdwarf-5 -S -o - -ffunction-sections + +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj --dwarf-version=5 | \ +# RUN: llvm-objdump - -d --debug-vars --no-show-raw-insn | \ +# RUN: FileCheck %s + +# CHECK: Disassembly of section .text.foo: +# CHECK-EMPTY: +# CHECK-NEXT: 00000000 : +# CHECK-NEXT: ┠─ a = R0 +# CHECK-NEXT: ┃ ┠─ b = R1 +# CHECK-NEXT: ┃ ┃ ┠─ c = R2 +# CHECK-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# CHECK-NEXT: 0: add r0, r1, r0 ┻ ┃ ┃ ╈ +# CHECK-NEXT: ┌─ y = R0 +# CHECK-NEXT: 4: add r0, r0, r2 ╈ ┃ ┃ ┻ +# CHECK-NEXT: 8: bx lr ┻ ┻ ┻ +# CHECK-EMPTY: +# CHECK-NEXT: Disassembly of section .text.bar: +# CHECK-EMPTY: +# CHECK-NEXT: 00000000 : +# CHECK-NEXT: ┠─ a = R0 +# CHECK-NEXT: 0: add r0, r0, #1 ┃ +# CHECK-NEXT: 4: bx lr ┻ + + .text + .syntax unified + .eabi_attribute 67, "2.09" + .eabi_attribute 6, 10 + .eabi_attribute 7, 65 + .eabi_attribute 8, 1 + .eabi_attribute 9, 2 + .fpu neon + .eabi_attribute 34, 0 + .eabi_attribute 17, 1 + .eabi_attribute 20, 1 + .eabi_attribute 21, 1 + .eabi_attribute 23, 3 + .eabi_attribute 24, 1 + .eabi_attribute 25, 1 + .eabi_attribute 38, 1 + .eabi_attribute 18, 4 + .eabi_attribute 26, 2 + .eabi_attribute 14, 0 + .file "debug.c" + .section .text.foo,"ax",%progbits + .globl foo + .p2align 2 + .type foo,%function + .code 32 +foo: +.Lfunc_begin0: + .file 0 "/work/scratch" "/work/llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" md5 0x07374f01ab24ec7c07db73bc13bd778e + .file 1 "/work" "llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" md5 0x07374f01ab24ec7c07db73bc13bd778e + .loc 1 1 0 + .fnstart + .cfi_sections .debug_frame + .cfi_startproc + .loc 1 2 13 prologue_end + add r0, r1, r0 +.Ltmp0: + .loc 1 3 13 + add r0, r0, r2 +.Ltmp1: + .loc 1 4 3 + bx 
lr +.Ltmp2: +.Lfunc_end0: + .size foo, .Lfunc_end0-foo + .cfi_endproc + .cantunwind + .fnend + + .section .text.bar,"ax",%progbits + .globl bar + .p2align 2 + .type bar,%function + .code 32 +bar: +.Lfunc_begin1: + .loc 1 7 0 + .fnstart + .cfi_startproc + .loc 1 8 4 prologue_end + add r0, r0, #1 +.Ltmp3: + .loc 1 9 3 + bx lr +.Ltmp4: +.Lfunc_end1: + .size bar, .Lfunc_end1-bar + .cfi_endproc + .cantunwind + .fnend + + .section .debug_str_offsets,"",%progbits + .long 48 + .short 5 + .short 0 +.Lstr_offsets_base0: + .section .debug_str,"MS",%progbits,1 +.Linfo_string0: + .asciz "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" +.Linfo_string1: + .asciz "/work/llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" +.Linfo_string2: + .asciz "/work/scratch" +.Linfo_string3: + .asciz "foo" +.Linfo_string4: + .asciz "int" +.Linfo_string5: + .asciz "bar" +.Linfo_string6: + .asciz "a" +.Linfo_string7: + .asciz "b" +.Linfo_string8: + .asciz "c" +.Linfo_string9: + .asciz "x" +.Linfo_string10: + .asciz "y" + .section .debug_str_offsets,"",%progbits + .long .Linfo_string0 + .long .Linfo_string1 + .long .Linfo_string2 + .long .Linfo_string3 + .long .Linfo_string4 + .long .Linfo_string5 + .long .Linfo_string6 + .long .Linfo_string7 + .long .Linfo_string8 + .long .Linfo_string9 + .long .Linfo_string10 + .section .debug_loclists,"",%progbits + .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 +.Ldebug_loclist_table_start0: + .short 5 + .byte 4 + .byte 0 + .long 3 +.Lloclists_table_base0: + .long .Ldebug_loc0-.Lloclists_table_base0 + .long .Ldebug_loc1-.Lloclists_table_base0 + .long .Ldebug_loc2-.Lloclists_table_base0 +.Ldebug_loc0: + .byte 3 + .byte 0 + .uleb128 .Ltmp0-.Lfunc_begin0 + .byte 1 + .byte 80 + .byte 0 +.Ldebug_loc1: + .byte 1 + .byte 0 + .byte 4 + .uleb128 .Ltmp0-.Lfunc_begin0 + .uleb128 .Ltmp1-.Lfunc_begin0 + .byte 1 + .byte 80 + .byte 0 +.Ldebug_loc2: + .byte 1 + .byte 0 + .byte 4 + .uleb128 
.Ltmp1-.Lfunc_begin0 + .uleb128 .Lfunc_end0-.Lfunc_begin0 + .byte 1 + .byte 80 + .byte 0 +.Ldebug_loclist_table_end0: + .section .debug_abbrev,"",%progbits + .byte 1 + .byte 17 + .byte 1 + .byte 37 + .byte 37 + .byte 19 + .byte 5 + .byte 3 + .byte 37 + .byte 114 + .byte 23 + .byte 16 + .byte 23 + .byte 27 + .byte 37 + .byte 17 + .byte 1 + .byte 85 + .byte 35 + .byte 115 + .byte 23 + .byte 116 + .byte 23 + .ascii "\214\001" + .byte 23 + .byte 0 + .byte 0 + .byte 2 + .byte 46 + .byte 1 + .byte 17 + .byte 27 + .byte 18 + .byte 6 + .byte 64 + .byte 24 + .byte 122 + .byte 25 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 39 + .byte 25 + .byte 73 + .byte 19 + .byte 63 + .byte 25 + .byte 0 + .byte 0 + .byte 3 + .byte 5 + .byte 0 + .byte 2 + .byte 34 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 4 + .byte 5 + .byte 0 + .byte 2 + .byte 24 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 5 + .byte 52 + .byte 0 + .byte 2 + .byte 34 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 6 + .byte 36 + .byte 0 + .byte 3 + .byte 37 + .byte 62 + .byte 11 + .byte 11 + .byte 11 + .byte 0 + .byte 0 + .byte 0 + .section .debug_info,"",%progbits +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 +.Ldebug_info_start0: + .short 5 + .byte 1 + .byte 4 + .long .debug_abbrev + .byte 1 + .byte 0 + .short 12 + .byte 1 + .long .Lstr_offsets_base0 + .long .Lline_table_start0 + .byte 2 + .long 0 + .byte 0 + .long .Laddr_table_base0 + .long .Lrnglists_table_base0 + .long .Lloclists_table_base0 + .byte 2 + .byte 0 + .long .Lfunc_end0-.Lfunc_begin0 + .byte 1 + .byte 91 + + .byte 3 + .byte 1 + .byte 1 + + .long 132 + + .byte 3 + .byte 0 + .byte 6 + .byte 1 + .byte 1 + .long 132 + .byte 4 + .byte 1 + .byte 81 + .byte 7 + .byte 1 + .byte 1 + .long 132 + .byte 4 + .byte 1 + 
.byte 82 + .byte 8 + .byte 1 + .byte 1 + .long 132 + .byte 5 + .byte 1 + .byte 9 + .byte 1 + .byte 2 + .long 132 + .byte 5 + .byte 2 + .byte 10 + .byte 1 + .byte 3 + .long 132 + .byte 0 + .byte 2 + .byte 1 + .long .Lfunc_end1-.Lfunc_begin1 + .byte 1 + .byte 91 + + .byte 5 + .byte 1 + .byte 7 + + .long 132 + + .byte 4 + .byte 1 + .byte 80 + .byte 6 + .byte 1 + .byte 7 + .long 132 + .byte 0 + .byte 6 + .byte 4 + .byte 5 + .byte 4 + .byte 0 +.Ldebug_info_end0: + .section .debug_rnglists,"",%progbits + .long .Ldebug_rnglist_table_end0-.Ldebug_rnglist_table_start0 +.Ldebug_rnglist_table_start0: + .short 5 + .byte 4 + .byte 0 + .long 1 +.Lrnglists_table_base0: + .long .Ldebug_ranges0-.Lrnglists_table_base0 +.Ldebug_ranges0: + .byte 3 + .byte 0 + .uleb128 .Lfunc_end0-.Lfunc_begin0 + .byte 3 + .byte 1 + .uleb128 .Lfunc_end1-.Lfunc_begin1 + .byte 0 +.Ldebug_rnglist_table_end0: + .section .debug_addr,"",%progbits + .long .Ldebug_addr_end0-.Ldebug_addr_start0 +.Ldebug_addr_start0: + .short 5 + .byte 4 + .byte 0 +.Laddr_table_base0: + .long .Lfunc_begin0 + .long .Lfunc_begin1 +.Ldebug_addr_end0: + .ident "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" + .section ".note.GNU-stack","",%progbits + .addrsig + .eabi_attribute 30, 1 + .section .debug_line,"",%progbits +.Lline_table_start0: diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s new file mode 100644 index 000000000000..8a63a4ee9ec0 --- /dev/null +++ b/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s @@ -0,0 +1,382 @@ +## Check that the --debug-vars option works for simple register locations, when +## using DWARF5 debug info, with multiple functions in one section. 
+ +## Generated with this compile command, with the source code in Inputs/debug.c: +## clang --target=arm--none-eabi -march=armv7-a -c debug.c -O1 -gdwarf-3 -S -o - + +# RUN: llvm-mc -triple armv8a--none-eabi < %s -filetype=obj --dwarf-version=5 | \ +# RUN: llvm-objdump - -d --debug-vars --no-show-raw-insn | \ +# RUN: FileCheck %s + +# CHECK: Disassembly of section .text: +# CHECK-EMPTY: +# CHECK-NEXT: 00000000 : +# CHECK-NEXT: ┠─ a = R0 +# CHECK-NEXT: ┃ ┠─ b = R1 +# CHECK-NEXT: ┃ ┃ ┠─ c = R2 +# CHECK-NEXT: ┃ ┃ ┃ ┌─ x = R0 +# CHECK-NEXT: 0: add r0, r1, r0 ┻ ┃ ┃ ╈ +# CHECK-NEXT: ┌─ y = R0 +# CHECK-NEXT: 4: add r0, r0, r2 ╈ ┃ ┃ ┻ +# CHECK-NEXT: 8: bx lr ┻ ┻ ┻ +# CHECK-EMPTY: +# CHECK-NEXT: 0000000c : +# CHECK-NEXT: ┠─ a = R0 +# CHECK-NEXT: c: add r0, r0, #1 ┃ +# CHECK-NEXT: 10: bx lr ┻ + + .text + .syntax unified + .eabi_attribute 67, "2.09" + .eabi_attribute 6, 10 + .eabi_attribute 7, 65 + .eabi_attribute 8, 1 + .eabi_attribute 9, 2 + .fpu neon + .eabi_attribute 34, 0 + .eabi_attribute 17, 1 + .eabi_attribute 20, 1 + .eabi_attribute 21, 1 + .eabi_attribute 23, 3 + .eabi_attribute 24, 1 + .eabi_attribute 25, 1 + .eabi_attribute 38, 1 + .eabi_attribute 18, 4 + .eabi_attribute 26, 2 + .eabi_attribute 14, 0 + .file "debug.c" + .globl foo + .p2align 2 + .type foo,%function + .code 32 +foo: +.Lfunc_begin0: + .file 0 "/work/scratch" "/work/llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" md5 0x07374f01ab24ec7c07db73bc13bd778e + .file 1 "/work" "llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" md5 0x07374f01ab24ec7c07db73bc13bd778e + .loc 1 1 0 + .fnstart + .cfi_sections .debug_frame + .cfi_startproc + .loc 1 2 13 prologue_end + add r0, r1, r0 +.Ltmp0: + .loc 1 3 13 + add r0, r0, r2 +.Ltmp1: + .loc 1 4 3 + bx lr +.Ltmp2: +.Lfunc_end0: + .size foo, .Lfunc_end0-foo + .cfi_endproc + .cantunwind + .fnend + + .globl bar + .p2align 2 + .type bar,%function + .code 32 +bar: +.Lfunc_begin1: + .loc 1 7 0 + .fnstart + .cfi_startproc + .loc 1 8 4 prologue_end + add 
r0, r0, #1 +.Ltmp3: + .loc 1 9 3 + bx lr +.Ltmp4: +.Lfunc_end1: + .size bar, .Lfunc_end1-bar + .cfi_endproc + .cantunwind + .fnend + + .section .debug_str_offsets,"",%progbits + .long 48 + .short 5 + .short 0 +.Lstr_offsets_base0: + .section .debug_str,"MS",%progbits,1 +.Linfo_string0: + .asciz "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" +.Linfo_string1: + .asciz "/work/llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" +.Linfo_string2: + .asciz "/work/scratch" +.Linfo_string3: + .asciz "foo" +.Linfo_string4: + .asciz "int" +.Linfo_string5: + .asciz "bar" +.Linfo_string6: + .asciz "a" +.Linfo_string7: + .asciz "b" +.Linfo_string8: + .asciz "c" +.Linfo_string9: + .asciz "x" +.Linfo_string10: + .asciz "y" + .section .debug_str_offsets,"",%progbits + .long .Linfo_string0 + .long .Linfo_string1 + .long .Linfo_string2 + .long .Linfo_string3 + .long .Linfo_string4 + .long .Linfo_string5 + .long .Linfo_string6 + .long .Linfo_string7 + .long .Linfo_string8 + .long .Linfo_string9 + .long .Linfo_string10 + .section .debug_loclists,"",%progbits + .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 +.Ldebug_loclist_table_start0: + .short 5 + .byte 4 + .byte 0 + .long 3 +.Lloclists_table_base0: + .long .Ldebug_loc0-.Lloclists_table_base0 + .long .Ldebug_loc1-.Lloclists_table_base0 + .long .Ldebug_loc2-.Lloclists_table_base0 +.Ldebug_loc0: + .byte 4 + .uleb128 .Lfunc_begin0-.Lfunc_begin0 + .uleb128 .Ltmp0-.Lfunc_begin0 + .byte 1 + .byte 80 + .byte 0 +.Ldebug_loc1: + .byte 4 + .uleb128 .Ltmp0-.Lfunc_begin0 + .uleb128 .Ltmp1-.Lfunc_begin0 + .byte 1 + .byte 80 + .byte 0 +.Ldebug_loc2: + .byte 4 + .uleb128 .Ltmp1-.Lfunc_begin0 + .uleb128 .Lfunc_end0-.Lfunc_begin0 + .byte 1 + .byte 80 + .byte 0 +.Ldebug_loclist_table_end0: + .section .debug_abbrev,"",%progbits + .byte 1 + .byte 17 + .byte 1 + .byte 37 + .byte 37 + .byte 19 + .byte 5 + .byte 3 + .byte 37 + .byte 114 + .byte 23 + .byte 16 + .byte 23 + .byte 
27 + .byte 37 + .byte 17 + .byte 27 + .byte 18 + .byte 6 + .byte 115 + .byte 23 + .ascii "\214\001" + .byte 23 + .byte 0 + .byte 0 + .byte 2 + .byte 46 + .byte 1 + .byte 17 + .byte 27 + .byte 18 + .byte 6 + .byte 64 + .byte 24 + .byte 122 + .byte 25 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 39 + .byte 25 + .byte 73 + .byte 19 + .byte 63 + .byte 25 + .byte 0 + .byte 0 + .byte 3 + .byte 5 + .byte 0 + .byte 2 + .byte 34 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 4 + .byte 5 + .byte 0 + .byte 2 + .byte 24 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 5 + .byte 52 + .byte 0 + .byte 2 + .byte 34 + .byte 3 + .byte 37 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 6 + .byte 36 + .byte 0 + .byte 3 + .byte 37 + .byte 62 + .byte 11 + .byte 11 + .byte 11 + .byte 0 + .byte 0 + .byte 0 + .section .debug_info,"",%progbits +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 +.Ldebug_info_start0: + .short 5 + .byte 1 + .byte 4 + .long .debug_abbrev + .byte 1 + .byte 0 + .short 12 + .byte 1 + .long .Lstr_offsets_base0 + .long .Lline_table_start0 + .byte 2 + .byte 0 + .long .Lfunc_end1-.Lfunc_begin0 + .long .Laddr_table_base0 + .long .Lloclists_table_base0 + .byte 2 + .byte 0 + .long .Lfunc_end0-.Lfunc_begin0 + .byte 1 + .byte 91 + + .byte 3 + .byte 1 + .byte 1 + + .long 128 + + .byte 3 + .byte 0 + .byte 6 + .byte 1 + .byte 1 + .long 128 + .byte 4 + .byte 1 + .byte 81 + .byte 7 + .byte 1 + .byte 1 + .long 128 + .byte 4 + .byte 1 + .byte 82 + .byte 8 + .byte 1 + .byte 1 + .long 128 + .byte 5 + .byte 1 + .byte 9 + .byte 1 + .byte 2 + .long 128 + .byte 5 + .byte 2 + .byte 10 + .byte 1 + .byte 3 + .long 128 + .byte 0 + .byte 2 + .byte 1 + .long .Lfunc_end1-.Lfunc_begin1 + .byte 1 + .byte 91 + + .byte 5 + .byte 1 + .byte 7 + + .long 128 + + .byte 4 + .byte 1 + .byte 
80 + .byte 6 + .byte 1 + .byte 7 + .long 128 + .byte 0 + .byte 6 + .byte 4 + .byte 5 + .byte 4 + .byte 0 +.Ldebug_info_end0: + .section .debug_addr,"",%progbits + .long .Ldebug_addr_end0-.Ldebug_addr_start0 +.Ldebug_addr_start0: + .short 5 + .byte 4 + .byte 0 +.Laddr_table_base0: + .long .Lfunc_begin0 + .long .Lfunc_begin1 +.Ldebug_addr_end0: + .ident "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" + .section ".note.GNU-stack","",%progbits + .addrsig + .eabi_attribute 30, 1 + .section .debug_line,"",%progbits +.Lline_table_start0: diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s b/llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s new file mode 100644 index 000000000000..2573dc63513e --- /dev/null +++ b/llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s @@ -0,0 +1,232 @@ +# RUN: mkdir -p %t/a +# RUN: cp %p/Inputs/wide-char.c %t/a/wide-char.c +# RUN: sed -e "s,SRC_COMPDIR,%/t/a,g" %s > %t.s +# RUN: llvm-mc -triple armv8a--none-eabi < %t.s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --source | \ +# RUN: FileCheck %s --strict-whitespace + +## The Chinese character in the source does not print correctly on Windows. +# UNSUPPORTED: system-windows + +## Check that the --debug-vars option correctly aligns the variable display when +## the source code (printed by the -S option) includes East Asian wide +## characters. 
+ +# CHECK: 00000000 : +# CHECK-NEXT: ; return *喵; ┠─ 喵 = R0 +# CHECK-NEXT: 0: 00 00 90 e5 ldr r0, [r0] ┻ +# CHECK-NEXT: 4: 1e ff 2f e1 bx lr + + .text + .syntax unified + .eabi_attribute 67, "2.09" + .eabi_attribute 6, 10 + .eabi_attribute 7, 65 + .eabi_attribute 8, 1 + .eabi_attribute 9, 2 + .fpu vfpv3 + .eabi_attribute 34, 0 + .eabi_attribute 17, 1 + .eabi_attribute 20, 1 + .eabi_attribute 21, 1 + .eabi_attribute 23, 3 + .eabi_attribute 24, 1 + .eabi_attribute 25, 1 + .eabi_attribute 38, 1 + .eabi_attribute 18, 4 + .eabi_attribute 26, 2 + .eabi_attribute 14, 0 + .file "wide.c" + .globl foo + .p2align 2 + .type foo,%function + .code 32 +foo: +.Lfunc_begin0: + .file 1 "SRC_COMPDIR/wide-char.c" + .loc 1 1 0 + .fnstart + .cfi_sections .debug_frame + .cfi_startproc + .loc 1 2 10 prologue_end + ldr r0, [r0] +.Ltmp0: + .loc 1 2 3 is_stmt 0 + bx lr +.Ltmp1: +.Lfunc_end0: + .size foo, .Lfunc_end0-foo + .cfi_endproc + .cantunwind + .fnend + + .section .debug_str,"MS",%progbits,1 +.Linfo_string0: + .asciz "clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)" +.Linfo_string1: + .asciz "wide-char.c" +.Linfo_string2: + .asciz "SRC_COMPDIR" +.Linfo_string3: + .asciz "foo" +.Linfo_string4: + .asciz "int" +.Linfo_string5: + .asciz "\345\226\265" + .section .debug_loc,"",%progbits +.Ldebug_loc0: + .long .Lfunc_begin0-.Lfunc_begin0 + .long .Ltmp0-.Lfunc_begin0 + .short 1 + .byte 80 + .long 0 + .long 0 + .section .debug_abbrev,"",%progbits + .byte 1 + .byte 17 + .byte 1 + .byte 37 + .byte 14 + .byte 19 + .byte 5 + .byte 3 + .byte 14 + .byte 16 + .byte 23 + .byte 27 + .byte 14 + .ascii "\264B" + .byte 25 + .byte 17 + .byte 1 + .byte 18 + .byte 6 + .byte 0 + .byte 0 + .byte 2 + .byte 46 + .byte 1 + .byte 17 + .byte 1 + .byte 18 + .byte 6 + .byte 64 + .byte 24 + .byte 3 + .byte 14 + .byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 39 + .byte 25 + .byte 73 + .byte 19 + .byte 63 + .byte 25 + .byte 0 + .byte 0 + .byte 3 + .byte 5 + .byte 0 + .byte 2 + .byte 23 + .byte 3 + .byte 14 + 
.byte 58 + .byte 11 + .byte 59 + .byte 11 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 4 + .byte 36 + .byte 0 + .byte 3 + .byte 14 + .byte 62 + .byte 11 + .byte 11 + .byte 11 + .byte 0 + .byte 0 + .byte 5 + .byte 15 + .byte 0 + .byte 73 + .byte 19 + .byte 0 + .byte 0 + .byte 0 + .section .debug_info,"",%progbits +.Lcu_begin0: + .long 84 + .short 4 + .long .debug_abbrev + .byte 4 + .byte 1 + .long .Linfo_string0 + .short 12 + .long .Linfo_string1 + .long .Lline_table_start0 + .long .Linfo_string2 + + .long .Lfunc_begin0 + .long .Lfunc_end0-.Lfunc_begin0 + .byte 2 + .long .Lfunc_begin0 + .long .Lfunc_end0-.Lfunc_begin0 + .byte 1 + .byte 91 + .long .Linfo_string3 + .byte 1 + .byte 1 + + .long 75 + + .byte 3 + .long .Ldebug_loc0 + .long .Linfo_string5 + .byte 1 + .byte 1 + .long 82 + .byte 0 + .byte 4 + .long .Linfo_string4 + .byte 5 + .byte 4 + .byte 5 + .long 75 + .byte 0 + .section .debug_ranges,"",%progbits + .section .debug_macinfo,"",%progbits +.Lcu_macro_begin0: + .byte 0 + .section .debug_pubnames,"",%progbits + .long .LpubNames_end0-.LpubNames_begin0 +.LpubNames_begin0: + .short 2 + .long .Lcu_begin0 + .long 88 + .long 38 + .asciz "foo" + .long 0 +.LpubNames_end0: + .section .debug_pubtypes,"",%progbits + .long .LpubTypes_end0-.LpubTypes_begin0 +.LpubTypes_begin0: + .short 2 + .long .Lcu_begin0 + .long 88 + .long 75 + .asciz "int" + .long 0 +.LpubTypes_end0: + + .ident "clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)" + .section ".note.GNU-stack","",%progbits + .eabi_attribute 30, 1 + .section .debug_line,"",%progbits +.Lline_table_start0: diff --git a/llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s b/llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s new file mode 100644 index 000000000000..6c3d326843c7 --- /dev/null +++ b/llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s @@ -0,0 +1,372 @@ +## Check that the --debug-vars option works for simple register locations, when +## using DWARF4 debug info, with multiple functions in one section. 
+ +## Generated with this compile command and source code: +## clang --target=arm--none-eabi -march=armv7-a -c debug.c -O1 -gdwarf-3 -S -o - + +## clang --target=powerpc64-unknown-linux -c debug.c -O1 -S -o - + +## int foo(int a, int b, int c) { +## int x = a + b; +## int y = x + c; +## return y; +## } +## +## int bar(int a) { +## a++; +## return a; +## } + +# RUN: llvm-mc -triple powerpc64-unknown-linux < %s -filetype=obj | \ +# RUN: llvm-objdump - -d --debug-vars --no-show-raw-insn | \ +# RUN: FileCheck %s + +# CHECK: Disassembly of section .text: +# CHECK-EMPTY: +# CHECK-NEXT: 0000000000000000 <.text>: +# CHECK-NEXT: ┠─ a = S3 +# CHECK-NEXT: ┃ ┠─ b = S4 +# CHECK-NEXT: ┃ ┃ ┠─ c = S5 +# CHECK-NEXT: ┃ ┃ ┃ ┌─ x = S3 +# CHECK-NEXT: 0: add 3, 4, 3 ┻ ┃ ┃ ╈ +# CHECK-NEXT: ┌─ y = S3 +# CHECK-NEXT: 4: add 3, 3, 5 ╈ ┃ ┃ ┻ +# CHECK-NEXT: 8: extsw 3, 3 ┻ ┃ ┃ +# CHECK-NEXT: c: blr ┃ ┃ +# CHECK-NEXT: ... +# CHECK-NEXT: ┠─ a = S3 +# CHECK-NEXT: 1c: addi 3, 3, 1 ┃ +# CHECK-NEXT: 20: extsw 3, 3 ┻ +# CHECK-NEXT: 24: blr +# CHECK-NEXT: ... + + .text + .file "debug.c" + .globl foo # -- Begin function foo + .p2align 2 + .type foo, at function + .section .opd,"aw", at progbits +foo: # @foo + .p2align 3 + .quad .Lfunc_begin0 + .quad .TOC. 
at tocbase + .quad 0 + .text +.Lfunc_begin0: + .file 1 "/work" "llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" + .loc 1 1 0 # llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c:1:0 + .cfi_sections .debug_frame + .cfi_startproc +# %bb.0: # %entry + #DEBUG_VALUE: foo:a <- $x3 + #DEBUG_VALUE: foo:a <- $r3 + #DEBUG_VALUE: foo:b <- $x4 + #DEBUG_VALUE: foo:b <- $x4 + #DEBUG_VALUE: foo:b <- $r4 + #DEBUG_VALUE: foo:c <- $x5 + #DEBUG_VALUE: foo:c <- $x5 + #DEBUG_VALUE: foo:c <- $r5 + .loc 1 2 13 prologue_end # llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c:2:13 + add 3, 4, 3 +.Ltmp0: + #DEBUG_VALUE: foo:x <- $r3 + .loc 1 3 13 # llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c:3:13 + add 3, 3, 5 +.Ltmp1: + #DEBUG_VALUE: foo:y <- $r3 + .loc 1 4 3 # llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c:4:3 + extsw 3, 3 +.Ltmp2: + blr +.Ltmp3: + .long 0 + .quad 0 +.Lfunc_end0: + .size foo, .Lfunc_end0-.Lfunc_begin0 + .cfi_endproc + # -- End function + .globl bar # -- Begin function bar + .p2align 2 + .type bar, at function + .section .opd,"aw", at progbits +bar: # @bar + .p2align 3 + .quad .Lfunc_begin1 + .quad .TOC. 
at tocbase + .quad 0 + .text +.Lfunc_begin1: + .loc 1 7 0 # llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c:7:0 + .cfi_startproc +# %bb.0: # %entry + #DEBUG_VALUE: bar:a <- $x3 + #DEBUG_VALUE: bar:a <- $r3 + .loc 1 8 4 prologue_end # llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c:8:4 + addi 3, 3, 1 +.Ltmp4: + #DEBUG_VALUE: bar:a <- $r3 + .loc 1 9 3 # llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c:9:3 + extsw 3, 3 +.Ltmp5: + blr +.Ltmp6: + .long 0 + .quad 0 +.Lfunc_end1: + .size bar, .Lfunc_end1-.Lfunc_begin1 + .cfi_endproc + # -- End function + .section .debug_str,"MS", at progbits,1 +.Linfo_string0: + .asciz "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" # string offset=0 +.Linfo_string1: + .asciz "/work/llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c" # string offset=101 +.Linfo_string2: + .asciz "/work/scratch" # string offset=164 +.Linfo_string3: + .asciz "foo" # string offset=178 +.Linfo_string4: + .asciz "int" # string offset=182 +.Linfo_string5: + .asciz "bar" # string offset=186 +.Linfo_string6: + .asciz "a" # string offset=190 +.Linfo_string7: + .asciz "b" # string offset=192 +.Linfo_string8: + .asciz "c" # string offset=194 +.Linfo_string9: + .asciz "x" # string offset=196 +.Linfo_string10: + .asciz "y" # string offset=198 + .section .debug_loc,"", at progbits +.Ldebug_loc0: + .quad .Lfunc_begin0-.Lfunc_begin0 + .quad .Ltmp0-.Lfunc_begin0 + .short 3 # Loc expr size + .byte 144 # super-register DW_OP_regx + .byte 179 # 1203 + .byte 9 # + .quad 0 + .quad 0 +.Ldebug_loc1: + .quad .Ltmp0-.Lfunc_begin0 + .quad .Ltmp1-.Lfunc_begin0 + .short 3 # Loc expr size + .byte 144 # super-register DW_OP_regx + .byte 179 # 1203 + .byte 9 # + .quad 0 + .quad 0 +.Ldebug_loc2: + .quad .Ltmp1-.Lfunc_begin0 + .quad .Ltmp2-.Lfunc_begin0 + .short 3 # Loc expr size + .byte 144 # super-register DW_OP_regx + .byte 179 # 1203 + .byte 9 # + .quad 0 + .quad 0 +.Ldebug_loc3: + .quad 
.Lfunc_begin1-.Lfunc_begin0 + .quad .Ltmp5-.Lfunc_begin0 + .short 3 # Loc expr size + .byte 144 # super-register DW_OP_regx + .byte 179 # 1203 + .byte 9 # + .quad 0 + .quad 0 + .section .debug_abbrev,"", at progbits + .byte 1 # Abbreviation Code + .byte 17 # DW_TAG_compile_unit + .byte 1 # DW_CHILDREN_yes + .byte 37 # DW_AT_producer + .byte 14 # DW_FORM_strp + .byte 19 # DW_AT_language + .byte 5 # DW_FORM_data2 + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 16 # DW_AT_stmt_list + .byte 23 # DW_FORM_sec_offset + .byte 27 # DW_AT_comp_dir + .byte 14 # DW_FORM_strp + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 2 # Abbreviation Code + .byte 46 # DW_TAG_subprogram + .byte 1 # DW_CHILDREN_yes + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 64 # DW_AT_frame_base + .byte 24 # DW_FORM_exprloc + .ascii "\227B" # DW_AT_GNU_all_call_sites + .byte 25 # DW_FORM_flag_present + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 11 # DW_FORM_data1 + .byte 39 # DW_AT_prototyped + .byte 25 # DW_FORM_flag_present + .byte 73 # DW_AT_type + .byte 19 # DW_FORM_ref4 + .byte 63 # DW_AT_external + .byte 25 # DW_FORM_flag_present + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 3 # Abbreviation Code + .byte 5 # DW_TAG_formal_parameter + .byte 0 # DW_CHILDREN_no + .byte 2 # DW_AT_location + .byte 23 # DW_FORM_sec_offset + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 11 # DW_FORM_data1 + .byte 73 # DW_AT_type + .byte 19 # DW_FORM_ref4 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 4 # Abbreviation Code + .byte 5 # DW_TAG_formal_parameter + .byte 0 # DW_CHILDREN_no + .byte 2 # DW_AT_location + .byte 24 # DW_FORM_exprloc + .byte 3 # DW_AT_name 
+ .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 11 # DW_FORM_data1 + .byte 73 # DW_AT_type + .byte 19 # DW_FORM_ref4 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 5 # Abbreviation Code + .byte 52 # DW_TAG_variable + .byte 0 # DW_CHILDREN_no + .byte 2 # DW_AT_location + .byte 23 # DW_FORM_sec_offset + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 11 # DW_FORM_data1 + .byte 73 # DW_AT_type + .byte 19 # DW_FORM_ref4 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 6 # Abbreviation Code + .byte 36 # DW_TAG_base_type + .byte 0 # DW_CHILDREN_no + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 62 # DW_AT_encoding + .byte 11 # DW_FORM_data1 + .byte 11 # DW_AT_byte_size + .byte 11 # DW_FORM_data1 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 0 # EOM(3) + .section .debug_info,"", at progbits +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit +.Ldebug_info_start0: + .short 4 # DWARF version number + .long .debug_abbrev # Offset Into Abbrev. 
Section + .byte 8 # Address Size (in bytes) + .byte 1 # Abbrev [1] 0xb:0xb5 DW_TAG_compile_unit + .long .Linfo_string0 # DW_AT_producer + .short 12 # DW_AT_language + .long .Linfo_string1 # DW_AT_name + .long .Lline_table_start0 # DW_AT_stmt_list + .long .Linfo_string2 # DW_AT_comp_dir + .quad .Lfunc_begin0 # DW_AT_low_pc + .long .Lfunc_end1-.Lfunc_begin0 # DW_AT_high_pc + .byte 2 # Abbrev [2] 0x2a:0x65 DW_TAG_subprogram + .quad .Lfunc_begin0 # DW_AT_low_pc + .long .Lfunc_end0-.Lfunc_begin0 # DW_AT_high_pc + .byte 1 # DW_AT_frame_base + .byte 81 + # DW_AT_GNU_all_call_sites + .long .Linfo_string3 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 1 # DW_AT_decl_line + # DW_AT_prototyped + .long 184 # DW_AT_type + # DW_AT_external + .byte 3 # Abbrev [3] 0x43:0xf DW_TAG_formal_parameter + .long .Ldebug_loc0 # DW_AT_location + .long .Linfo_string6 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 1 # DW_AT_decl_line + .long 184 # DW_AT_type + .byte 4 # Abbrev [4] 0x52:0xf DW_TAG_formal_parameter + .byte 3 # DW_AT_location + .byte 144 + .ascii "\264\t" + .long .Linfo_string7 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 1 # DW_AT_decl_line + .long 184 # DW_AT_type + .byte 4 # Abbrev [4] 0x61:0xf DW_TAG_formal_parameter + .byte 3 # DW_AT_location + .byte 144 + .ascii "\265\t" + .long .Linfo_string8 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 1 # DW_AT_decl_line + .long 184 # DW_AT_type + .byte 5 # Abbrev [5] 0x70:0xf DW_TAG_variable + .long .Ldebug_loc1 # DW_AT_location + .long .Linfo_string9 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 2 # DW_AT_decl_line + .long 184 # DW_AT_type + .byte 5 # Abbrev [5] 0x7f:0xf DW_TAG_variable + .long .Ldebug_loc2 # DW_AT_location + .long .Linfo_string10 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 3 # DW_AT_decl_line + .long 184 # DW_AT_type + .byte 0 # End Of Children Mark + .byte 2 # Abbrev [2] 0x8f:0x29 DW_TAG_subprogram + .quad .Lfunc_begin1 # DW_AT_low_pc + .long .Lfunc_end1-.Lfunc_begin1 # DW_AT_high_pc + .byte 1 # 
DW_AT_frame_base + .byte 81 + # DW_AT_GNU_all_call_sites + .long .Linfo_string5 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 7 # DW_AT_decl_line + # DW_AT_prototyped + .long 184 # DW_AT_type + # DW_AT_external + .byte 3 # Abbrev [3] 0xa8:0xf DW_TAG_formal_parameter + .long .Ldebug_loc3 # DW_AT_location + .long .Linfo_string6 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 7 # DW_AT_decl_line + .long 184 # DW_AT_type + .byte 0 # End Of Children Mark + .byte 6 # Abbrev [6] 0xb8:0x7 DW_TAG_base_type + .long .Linfo_string4 # DW_AT_name + .byte 5 # DW_AT_encoding + .byte 4 # DW_AT_byte_size + .byte 0 # End Of Children Mark +.Ldebug_info_end0: + .ident "clang version 10.0.0 (git at github.com:llvm/llvm-project.git e73f78acd34360f7450b81167d9dc858ccddc262)" + .section ".note.GNU-stack","", at progbits + .addrsig + .section .debug_line,"", at progbits +.Lline_table_start0: diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp index 81b3aac5c931..7d282074efa6 100644 --- a/llvm/tools/llvm-objdump/llvm-objdump.cpp +++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp @@ -21,7 +21,9 @@ #include "MachODump.h" #include "WasmDump.h" #include "XCOFFDump.h" +#include "llvm/ADT/IndexedMap.h" #include "llvm/ADT/Optional.h" +#include "llvm/ADT/SmallSet.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SetOperations.h" #include "llvm/ADT/StringExtras.h" @@ -79,6 +81,8 @@ using namespace llvm; using namespace llvm::object; using namespace llvm::objdump; +#define DEBUG_TYPE "objdump" + static cl::OptionCategory ObjdumpCat("llvm-objdump Options"); static cl::opt AdjustVMA( @@ -344,6 +348,28 @@ static cl::opt cl::cat(ObjdumpCat)); static cl::alias WideShort("w", cl::Grouping, cl::aliasopt(Wide)); +enum DebugVarsFormat { + DVDisabled, + DVUnicode, + DVASCII, +}; + +static cl::opt<DebugVarsFormat> DbgVariables( + "debug-vars", cl::init(DVDisabled), + cl::desc("Print the locations (in registers or memory) of " + "source-level variables alongside disassembly"), + 
cl::ValueOptional, + cl::values(clEnumValN(DVUnicode, "", "unicode"), + clEnumValN(DVUnicode, "unicode", "unicode"), + clEnumValN(DVASCII, "ascii", "unicode")), + cl::cat(ObjdumpCat)); + +static cl::opt + DbgIndent("debug-vars-indent", cl::init(40), + cl::desc("Distance to indent the source-level variable display, " + "relative to the start of the disassembly"), + cl::cat(ObjdumpCat)); + static cl::extrahelp HelpResponse("\nPass @FILE as argument to read options from FILE.\n"); @@ -548,6 +574,357 @@ static bool getHidden(RelocationRef RelRef) { } namespace { + +/// Get the column at which we want to start printing the instruction +/// disassembly, taking into account anything which appears to the left of it. +unsigned getInstStartColumn(const MCSubtargetInfo &STI) { + return NoShowRawInsn ? 16 : STI.getTargetTriple().isX86() ? 40 : 24; +} + +/// Stores a single expression representing the location of a source-level +/// variable, along with the PC range for which that expression is valid. +struct LiveVariable { + DWARFLocationExpression LocExpr; + const char *VarName; + DWARFUnit *Unit; + const DWARFDie FuncDie; + + LiveVariable(const DWARFLocationExpression &LocExpr, const char *VarName, + DWARFUnit *Unit, const DWARFDie FuncDie) + : LocExpr(LocExpr), VarName(VarName), Unit(Unit), FuncDie(FuncDie) {} + + bool liveAtAddress(object::SectionedAddress Addr) { + if (LocExpr.Range == None) + return false; + return LocExpr.Range->SectionIndex == Addr.SectionIndex && + LocExpr.Range->LowPC <= Addr.Address && + LocExpr.Range->HighPC > Addr.Address; + } + + void print(raw_ostream &OS, const MCRegisterInfo &MRI) const { + DataExtractor Data({LocExpr.Expr.data(), LocExpr.Expr.size()}, + Unit->getContext().isLittleEndian(), 0); + DWARFExpression Expression(Data, Unit->getAddressByteSize()); + Expression.printCompact(OS, MRI); + } +}; + +/// Helper class for printing source variable locations alongside disassembly. 
+class LiveVariablePrinter { + // Information we want to track about one column in which we are printing a + // variable live range. + struct Column { + unsigned VarIdx = NullVarIdx; + bool LiveIn = false; + bool LiveOut = false; + bool MustDrawLabel = false; + + bool isActive() const { return VarIdx != NullVarIdx; } + + static constexpr unsigned NullVarIdx = std::numeric_limits::max(); + }; + + // All live variables we know about in the object/image file. + std::vector LiveVariables; + + // The columns we are currently drawing. + IndexedMap ActiveCols; + + const MCRegisterInfo &MRI; + const MCSubtargetInfo &STI; + + void addVariable(DWARFDie FuncDie, DWARFDie VarDie) { + uint64_t FuncLowPC, FuncHighPC, SectionIndex; + FuncDie.getLowAndHighPC(FuncLowPC, FuncHighPC, SectionIndex); + const char *VarName = VarDie.getName(DINameKind::ShortName); + DWARFUnit *U = VarDie.getDwarfUnit(); + + Expected Locs = + VarDie.getLocations(dwarf::DW_AT_location); + if (!Locs) { + // If the variable doesn't have any locations, just ignore it. We don't + // report an error or warning here as that could be noisy on optimised + // code. + consumeError(Locs.takeError()); + return; + } + + for (const DWARFLocationExpression &LocExpr : *Locs) { + if (LocExpr.Range) { + LiveVariables.emplace_back(LocExpr, VarName, U, FuncDie); + } else { + // If the LocExpr does not have an associated range, it is valid for + // the whole of the function. + // TODO: technically it is not valid for any range covered by another + // LocExpr, does that happen in reality? 
+ DWARFLocationExpression WholeFuncExpr{ + DWARFAddressRange(FuncLowPC, FuncHighPC, SectionIndex), + LocExpr.Expr}; + LiveVariables.emplace_back(WholeFuncExpr, VarName, U, FuncDie); + } + } + } + + void addFunction(DWARFDie D) { + for (const DWARFDie &Child : D.children()) { + if (Child.getTag() == dwarf::DW_TAG_variable || + Child.getTag() == dwarf::DW_TAG_formal_parameter) + addVariable(D, Child); + else + addFunction(Child); + } + } + + // Get the column number (in characters) at which the first live variable + // line should be printed. + unsigned getIndentLevel() const { + return DbgIndent + getInstStartColumn(STI); + } + + // Indent to the first live-range column to the right of the currently + // printed line, and return the index of that column. + // TODO: formatted_raw_ostream uses "column" to mean a number of characters + // since the last \n, and we use it to mean the number of slots in which we + // put live variable lines. Pick a less overloaded word. + unsigned moveToFirstVarColumn(formatted_raw_ostream &OS) { + // Logical column number: column zero is the first column we print in, each + // logical column is 2 physical columns wide. + unsigned FirstUnprintedLogicalColumn = + std::max((int)(OS.getColumn() - getIndentLevel() + 1) / 2, 0); + // Physical column number: the actual column number in characters, with + // zero being the left-most side of the screen. 
+ unsigned FirstUnprintedPhysicalColumn = + getIndentLevel() + FirstUnprintedLogicalColumn * 2; + + if (FirstUnprintedPhysicalColumn > OS.getColumn()) + OS.PadToColumn(FirstUnprintedPhysicalColumn); + + return FirstUnprintedLogicalColumn; + } + + unsigned findFreeColumn() { + for (unsigned ColIdx = 0; ColIdx < ActiveCols.size(); ++ColIdx) + if (!ActiveCols[ColIdx].isActive()) + return ColIdx; + + size_t OldSize = ActiveCols.size(); + ActiveCols.grow(std::max(OldSize * 2, 1)); + return OldSize; + } + +public: + LiveVariablePrinter(const MCRegisterInfo &MRI, const MCSubtargetInfo &STI) + : LiveVariables(), ActiveCols(Column()), MRI(MRI), STI(STI) {} + + void dump() const { + for (const LiveVariable &LV : LiveVariables) { + dbgs() << LV.VarName << " @ " << LV.LocExpr.Range << ": "; + LV.print(dbgs(), MRI); + dbgs() << "\n"; + } + } + + void addCompileUnit(DWARFDie D) { + if (D.getTag() == dwarf::DW_TAG_subprogram) + addFunction(D); + else + for (const DWARFDie &Child : D.children()) + addFunction(Child); + } + + /// Update to match the state of the instruction between ThisAddr and + /// NextAddr. In the common case, any live range active at ThisAddr is + /// live-in to the instruction, and any live range active at NextAddr is + /// live-out of the instruction. If IncludeDefinedVars is false, then live + /// ranges starting at NextAddr will be ignored. + void update(object::SectionedAddress ThisAddr, + object::SectionedAddress NextAddr, bool IncludeDefinedVars) { + // First, check variables which have already been assigned a column, so + // that we don't change their order. 
+ SmallSet CheckedVarIdxs; + for (unsigned ColIdx = 0, End = ActiveCols.size(); ColIdx < End; ++ColIdx) { + if (!ActiveCols[ColIdx].isActive()) + continue; + CheckedVarIdxs.insert(ActiveCols[ColIdx].VarIdx); + LiveVariable &LV = LiveVariables[ActiveCols[ColIdx].VarIdx]; + ActiveCols[ColIdx].LiveIn = LV.liveAtAddress(ThisAddr); + ActiveCols[ColIdx].LiveOut = LV.liveAtAddress(NextAddr); + LLVM_DEBUG(dbgs() << "pass 1, " << ThisAddr.Address << "-" + << NextAddr.Address << ", " << LV.VarName << ", Col " + << ColIdx << ": LiveIn=" << ActiveCols[ColIdx].LiveIn + << ", LiveOut=" << ActiveCols[ColIdx].LiveOut << "\n"); + + if (!ActiveCols[ColIdx].LiveIn && !ActiveCols[ColIdx].LiveOut) + ActiveCols[ColIdx].VarIdx = Column::NullVarIdx; + } + + // Next, look for variables which don't already have a column, but which + // are now live. + if (IncludeDefinedVars) { + for (unsigned VarIdx = 0, End = LiveVariables.size(); VarIdx < End; + ++VarIdx) { + if (CheckedVarIdxs.count(VarIdx)) + continue; + LiveVariable &LV = LiveVariables[VarIdx]; + bool LiveIn = LV.liveAtAddress(ThisAddr); + bool LiveOut = LV.liveAtAddress(NextAddr); + if (!LiveIn && !LiveOut) + continue; + + unsigned ColIdx = findFreeColumn(); + LLVM_DEBUG(dbgs() << "pass 2, " << ThisAddr.Address << "-" + << NextAddr.Address << ", " << LV.VarName << ", Col " + << ColIdx << ": LiveIn=" << LiveIn + << ", LiveOut=" << LiveOut << "\n"); + ActiveCols[ColIdx].VarIdx = VarIdx; + ActiveCols[ColIdx].LiveIn = LiveIn; + ActiveCols[ColIdx].LiveOut = LiveOut; + ActiveCols[ColIdx].MustDrawLabel = true; + } + } + } + + enum class LineChar { + RangeStart, + RangeMid, + RangeEnd, + LabelVert, + LabelCornerNew, + LabelCornerActive, + LabelHoriz, + }; + const char *getLineChar(LineChar C) const { + bool IsASCII = DbgVariables == DVASCII; + switch (C) { + case LineChar::RangeStart: + return IsASCII ? "^" : u8"\u2548"; + case LineChar::RangeMid: + return IsASCII ? "|" : u8"\u2503"; + case LineChar::RangeEnd: + return IsASCII ? 
"v" : u8"\u253b"; + case LineChar::LabelVert: + return IsASCII ? "|" : u8"\u2502"; + case LineChar::LabelCornerNew: + return IsASCII ? "/" : u8"\u250c"; + case LineChar::LabelCornerActive: + return IsASCII ? "|" : u8"\u2520"; + case LineChar::LabelHoriz: + return IsASCII ? "-" : u8"\u2500"; + } + } + + /// Print live ranges to the right of an existing line. This assumes the + /// line is not an instruction, so doesn't start or end any live ranges, so + /// we only need to print active ranges or empty columns. If AfterInst is + /// true, this is being printed after the last instruction fed to update(), + /// otherwise this is being printed before it. + void printAfterOtherLine(formatted_raw_ostream &OS, bool AfterInst) { + if (ActiveCols.size()) { + unsigned FirstUnprintedColumn = moveToFirstVarColumn(OS); + for (size_t ColIdx = FirstUnprintedColumn, End = ActiveCols.size(); + ColIdx < End; ++ColIdx) { + if (ActiveCols[ColIdx].isActive()) { + if ((AfterInst && ActiveCols[ColIdx].LiveOut) || + (!AfterInst && ActiveCols[ColIdx].LiveIn)) + OS << getLineChar(LineChar::RangeMid); + else if (!AfterInst && ActiveCols[ColIdx].LiveOut) + OS << getLineChar(LineChar::LabelVert); + else + OS << " "; + } + OS << " "; + } + } + OS << "\n"; + } + + /// Print any live variable range info needed to the right of a + /// non-instruction line of disassembly. This is where we print the variable + /// names and expressions, with thin line-drawing characters connecting them + /// to the live range which starts at the next instruction. If MustPrint is + /// true, we have to print at least one line (with the continuation of any + /// already-active live ranges) because something has already been printed + /// earlier on this line. 
+ void printBetweenInsts(formatted_raw_ostream &OS, bool MustPrint) { + bool PrintedSomething = false; + for (unsigned ColIdx = 0, End = ActiveCols.size(); ColIdx < End; ++ColIdx) { + if (ActiveCols[ColIdx].isActive() && ActiveCols[ColIdx].MustDrawLabel) { + // First we need to print the live range markers for any active + // columns to the left of this one. + OS.PadToColumn(getIndentLevel()); + for (unsigned ColIdx2 = 0; ColIdx2 < ColIdx; ++ColIdx2) { + if (ActiveCols[ColIdx2].isActive()) { + if (ActiveCols[ColIdx2].MustDrawLabel && + !ActiveCols[ColIdx2].LiveIn) + OS << getLineChar(LineChar::LabelVert) << " "; + else + OS << getLineChar(LineChar::RangeMid) << " "; + } else + OS << " "; + } + + // Then print the variable name and location of the new live range, + // with box drawing characters joining it to the live range line. + OS << getLineChar(ActiveCols[ColIdx].LiveIn + ? LineChar::LabelCornerActive + : LineChar::LabelCornerNew) + << getLineChar(LineChar::LabelHoriz) << " "; + WithColor(OS, raw_ostream::GREEN) + << LiveVariables[ActiveCols[ColIdx].VarIdx].VarName; + OS << " = "; + { + WithColor ExprColor(OS, raw_ostream::CYAN); + LiveVariables[ActiveCols[ColIdx].VarIdx].print(OS, MRI); + } + + // If there are any columns to the right of the expression we just + // printed, then continue their live range lines. 
+ unsigned FirstUnprintedColumn = moveToFirstVarColumn(OS); + for (unsigned ColIdx2 = FirstUnprintedColumn, End = ActiveCols.size(); + ColIdx2 < End; ++ColIdx2) { + if (ActiveCols[ColIdx2].isActive() && ActiveCols[ColIdx2].LiveIn) + OS << getLineChar(LineChar::RangeMid) << " "; + else + OS << " "; + } + + OS << "\n"; + PrintedSomething = true; + } + } + + for (unsigned ColIdx = 0, End = ActiveCols.size(); ColIdx < End; ++ColIdx) + if (ActiveCols[ColIdx].isActive()) + ActiveCols[ColIdx].MustDrawLabel = false; + + // If we must print something (because we printed a line/column number), + // but don't have any new variables to print, then print a line which + // just continues any existing live ranges. + if (MustPrint && !PrintedSomething) + printAfterOtherLine(OS, false); + } + + /// Print the live variable ranges to the right of a disassembled instruction. + void printAfterInst(formatted_raw_ostream &OS) { + if (!ActiveCols.size()) + return; + unsigned FirstUnprintedColumn = moveToFirstVarColumn(OS); + for (unsigned ColIdx = FirstUnprintedColumn, End = ActiveCols.size(); + ColIdx < End; ++ColIdx) { + if (!ActiveCols[ColIdx].isActive()) + OS << " "; + else if (ActiveCols[ColIdx].LiveIn && ActiveCols[ColIdx].LiveOut) + OS << getLineChar(LineChar::RangeMid) << " "; + else if (ActiveCols[ColIdx].LiveOut) + OS << getLineChar(LineChar::RangeStart) << " "; + else if (ActiveCols[ColIdx].LiveIn) + OS << getLineChar(LineChar::RangeEnd) << " "; + else + llvm_unreachable("var must be live in or out!"); + } + } +}; + class SourcePrinter { protected: DILineInfo OldLineInfo; @@ -565,11 +942,12 @@ class SourcePrinter { private: bool cacheSource(const DILineInfo& LineInfoFile); - void printLines(raw_ostream &OS, const DILineInfo &LineInfo, - StringRef Delimiter); + void printLines(formatted_raw_ostream &OS, const DILineInfo &LineInfo, + StringRef Delimiter, LiveVariablePrinter &LVP); - void printSources(raw_ostream &OS, const DILineInfo &LineInfo, - StringRef ObjectFilename, 
StringRef Delimiter); + void printSources(formatted_raw_ostream &OS, const DILineInfo &LineInfo, + StringRef ObjectFilename, StringRef Delimiter, + LiveVariablePrinter &LVP); public: SourcePrinter() = default; @@ -583,9 +961,10 @@ class SourcePrinter { Symbolizer.reset(new symbolize::LLVMSymbolizer(SymbolizerOpts)); } virtual ~SourcePrinter() = default; - virtual void printSourceLine(raw_ostream &OS, + virtual void printSourceLine(formatted_raw_ostream &OS, object::SectionedAddress Address, StringRef ObjectFilename, + LiveVariablePrinter &LVP, StringRef Delimiter = "; "); }; @@ -619,9 +998,10 @@ bool SourcePrinter::cacheSource(const DILineInfo &LineInfo) { return true; } -void SourcePrinter::printSourceLine(raw_ostream &OS, +void SourcePrinter::printSourceLine(formatted_raw_ostream &OS, object::SectionedAddress Address, StringRef ObjectFilename, + LiveVariablePrinter &LVP, StringRef Delimiter) { if (!Symbolizer) return; @@ -646,14 +1026,15 @@ void SourcePrinter::printSourceLine(raw_ostream &OS, } if (PrintLines) - printLines(OS, LineInfo, Delimiter); + printLines(OS, LineInfo, Delimiter, LVP); if (PrintSource) - printSources(OS, LineInfo, ObjectFilename, Delimiter); + printSources(OS, LineInfo, ObjectFilename, Delimiter, LVP); OldLineInfo = LineInfo; } -void SourcePrinter::printLines(raw_ostream &OS, const DILineInfo &LineInfo, - StringRef Delimiter) { +void SourcePrinter::printLines(formatted_raw_ostream &OS, + const DILineInfo &LineInfo, StringRef Delimiter, + LiveVariablePrinter &LVP) { bool PrintFunctionName = LineInfo.FunctionName != DILineInfo::BadString && LineInfo.FunctionName != OldLineInfo.FunctionName; if (PrintFunctionName) { @@ -666,13 +1047,16 @@ void SourcePrinter::printLines(raw_ostream &OS, const DILineInfo &LineInfo, } if (LineInfo.FileName != DILineInfo::BadString && LineInfo.Line != 0 && (OldLineInfo.Line != LineInfo.Line || - OldLineInfo.FileName != LineInfo.FileName || PrintFunctionName)) - OS << Delimiter << LineInfo.FileName << ":" << 
LineInfo.Line << "\n"; + OldLineInfo.FileName != LineInfo.FileName || PrintFunctionName)) { + OS << Delimiter << LineInfo.FileName << ":" << LineInfo.Line; + LVP.printBetweenInsts(OS, true); + } } -void SourcePrinter::printSources(raw_ostream &OS, const DILineInfo &LineInfo, - StringRef ObjectFilename, - StringRef Delimiter) { +void SourcePrinter::printSources(formatted_raw_ostream &OS, + const DILineInfo &LineInfo, + StringRef ObjectFilename, StringRef Delimiter, + LiveVariablePrinter &LVP) { if (LineInfo.FileName == DILineInfo::BadString || LineInfo.Line == 0 || (OldLineInfo.Line == LineInfo.Line && OldLineInfo.FileName == LineInfo.FileName)) @@ -692,7 +1076,8 @@ void SourcePrinter::printSources(raw_ostream &OS, const DILineInfo &LineInfo, return; } // Vector begins at 0, line numbers are non-zero - OS << Delimiter << LineBuffer->second[LineInfo.Line - 1] << '\n'; + OS << Delimiter << LineBuffer->second[LineInfo.Line - 1]; + LVP.printBetweenInsts(OS, true); } } @@ -710,28 +1095,30 @@ static bool hasMappingSymbols(const ObjectFile *Obj) { return isArmElf(Obj) || isAArch64Elf(Obj); } -static void printRelocation(StringRef FileName, const RelocationRef &Rel, - uint64_t Address, bool Is64Bits) { +static void printRelocation(formatted_raw_ostream &OS, StringRef FileName, + const RelocationRef &Rel, uint64_t Address, + bool Is64Bits) { StringRef Fmt = Is64Bits ? 
"\t\t%016" PRIx64 ": " : "\t\t\t%08" PRIx64 ": "; SmallString<16> Name; SmallString<32> Val; Rel.getTypeName(Name); if (Error E = getRelocationValueString(Rel, Val)) reportError(std::move(E), FileName); - outs() << format(Fmt.data(), Address) << Name << "\t" << Val << "\n"; + OS << format(Fmt.data(), Address) << Name << "\t" << Val; } class PrettyPrinter { public: virtual ~PrettyPrinter() = default; - virtual void printInst(MCInstPrinter &IP, const MCInst *MI, - ArrayRef Bytes, - object::SectionedAddress Address, raw_ostream &OS, - StringRef Annot, MCSubtargetInfo const &STI, - SourcePrinter *SP, StringRef ObjectFilename, - std::vector *Rels = nullptr) { + virtual void + printInst(MCInstPrinter &IP, const MCInst *MI, ArrayRef Bytes, + object::SectionedAddress Address, formatted_raw_ostream &OS, + StringRef Annot, MCSubtargetInfo const &STI, SourcePrinter *SP, + StringRef ObjectFilename, std::vector *Rels, + LiveVariablePrinter &LVP) { if (SP && (PrintSource || PrintLines)) - SP->printSourceLine(OS, Address, ObjectFilename); + SP->printSourceLine(OS, Address, ObjectFilename, LVP); + LVP.printBetweenInsts(OS, false); size_t Start = OS.tell(); if (!NoLeadingAddr) @@ -741,11 +1128,9 @@ class PrettyPrinter { dumpBytes(Bytes, OS); } - // The output of printInst starts with a tab. Print some spaces so that the - // tab has 1 column and advances to the target tab stop. Give more columns - // to x86 which may encode an instruction with many bytes. - unsigned TabStop = - NoShowRawInsn ? 16 : STI.getTargetTriple().isX86() ? 40 : 24; + // The output of printInst starts with a tab. Print some spaces so that + // the tab has 1 column and advances to the target tab stop. + unsigned TabStop = getInstStartColumn(STI); unsigned Column = OS.tell() - Start; OS.indent(Column < TabStop - 1 ? 
TabStop - 1 - Column : 7 - Column % 8); @@ -766,7 +1151,7 @@ PrettyPrinter PrettyPrinterInst; class HexagonPrettyPrinter : public PrettyPrinter { public: void printLead(ArrayRef Bytes, uint64_t Address, - raw_ostream &OS) { + formatted_raw_ostream &OS) { uint32_t opcode = (Bytes[3] << 24) | (Bytes[2] << 16) | (Bytes[1] << 8) | Bytes[0]; if (!NoLeadingAddr) @@ -778,12 +1163,12 @@ class HexagonPrettyPrinter : public PrettyPrinter { } } void printInst(MCInstPrinter &IP, const MCInst *MI, ArrayRef Bytes, - object::SectionedAddress Address, raw_ostream &OS, + object::SectionedAddress Address, formatted_raw_ostream &OS, StringRef Annot, MCSubtargetInfo const &STI, SourcePrinter *SP, - StringRef ObjectFilename, - std::vector *Rels) override { + StringRef ObjectFilename, std::vector *Rels, + LiveVariablePrinter &LVP) override { if (SP && (PrintSource || PrintLines)) - SP->printSourceLine(OS, Address, ObjectFilename, ""); + SP->printSourceLine(OS, Address, ObjectFilename, LVP, ""); if (!MI) { printLead(Bytes, Address.Address, OS); OS << " "; @@ -809,7 +1194,7 @@ class HexagonPrettyPrinter : public PrettyPrinter { auto PrintReloc = [&]() -> void { while ((RelCur != RelEnd) && (RelCur->getOffset() <= Address.Address)) { if (RelCur->getOffset() == Address.Address) { - printRelocation(ObjectFilename, *RelCur, Address.Address, false); + printRelocation(OS, ObjectFilename, *RelCur, Address.Address, false); return; } ++RelCur; @@ -820,7 +1205,7 @@ class HexagonPrettyPrinter : public PrettyPrinter { OS << Separator; Separator = "\n"; if (SP && (PrintSource || PrintLines)) - SP->printSourceLine(OS, Address, ObjectFilename, ""); + SP->printSourceLine(OS, Address, ObjectFilename, LVP, ""); printLead(Bytes, Address.Address, OS); OS << Preamble; Preamble = " "; @@ -848,12 +1233,12 @@ HexagonPrettyPrinter HexagonPrettyPrinterInst; class AMDGCNPrettyPrinter : public PrettyPrinter { public: void printInst(MCInstPrinter &IP, const MCInst *MI, ArrayRef Bytes, - object::SectionedAddress 
Address, raw_ostream &OS, + object::SectionedAddress Address, formatted_raw_ostream &OS, StringRef Annot, MCSubtargetInfo const &STI, SourcePrinter *SP, - StringRef ObjectFilename, - std::vector *Rels) override { + StringRef ObjectFilename, std::vector *Rels, + LiveVariablePrinter &LVP) override { if (SP && (PrintSource || PrintLines)) - SP->printSourceLine(OS, Address, ObjectFilename); + SP->printSourceLine(OS, Address, ObjectFilename, LVP); if (MI) { SmallString<40> InstStr; @@ -900,12 +1285,12 @@ AMDGCNPrettyPrinter AMDGCNPrettyPrinterInst; class BPFPrettyPrinter : public PrettyPrinter { public: void printInst(MCInstPrinter &IP, const MCInst *MI, ArrayRef Bytes, - object::SectionedAddress Address, raw_ostream &OS, + object::SectionedAddress Address, formatted_raw_ostream &OS, StringRef Annot, MCSubtargetInfo const &STI, SourcePrinter *SP, - StringRef ObjectFilename, - std::vector *Rels) override { + StringRef ObjectFilename, std::vector *Rels, + LiveVariablePrinter &LVP) override { if (SP && (PrintSource || PrintLines)) - SP->printSourceLine(OS, Address, ObjectFilename); + SP->printSourceLine(OS, Address, ObjectFilename, LVP); if (!NoLeadingAddr) OS << format("%8" PRId64 ":", Address.Address / 8); if (!NoShowRawInsn) { @@ -1094,26 +1479,27 @@ static char getMappingSymbolKind(ArrayRef MappingSymbols, static uint64_t dumpARMELFData(uint64_t SectionAddr, uint64_t Index, uint64_t End, const ObjectFile *Obj, ArrayRef Bytes, - ArrayRef MappingSymbols) { + ArrayRef MappingSymbols, + raw_ostream &OS) { support::endianness Endian = Obj->isLittleEndian() ? 
support::little : support::big; - outs() << format("%8" PRIx64 ":\t", SectionAddr + Index); + OS << format("%8" PRIx64 ":\t", SectionAddr + Index); if (Index + 4 <= End) { - dumpBytes(Bytes.slice(Index, 4), outs()); - outs() << "\t.word\t" + dumpBytes(Bytes.slice(Index, 4), OS); + OS << "\t.word\t" << format_hex(support::endian::read32(Bytes.data() + Index, Endian), 10); return 4; } if (Index + 2 <= End) { - dumpBytes(Bytes.slice(Index, 2), outs()); - outs() << "\t\t.short\t" + dumpBytes(Bytes.slice(Index, 2), OS); + OS << "\t\t.short\t" << format_hex(support::endian::read16(Bytes.data() + Index, Endian), 6); return 2; } - dumpBytes(Bytes.slice(Index, 1), outs()); - outs() << "\t\t.byte\t" << format_hex(Bytes[0], 4); + dumpBytes(Bytes.slice(Index, 1), OS); + OS << "\t\t.byte\t" << format_hex(Bytes[0], 4); return 1; } @@ -1288,6 +1674,17 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj, stable_sort(SecSyms.second); stable_sort(AbsoluteSymbols); + std::unique_ptr DICtx; + LiveVariablePrinter LVP(*Ctx.getRegisterInfo(), *STI); + + if (DbgVariables != DVDisabled) { + DICtx = DWARFContext::create(*Obj); + for (const std::unique_ptr &CU : DICtx->compile_units()) + LVP.addCompileUnit(CU->getUnitDIE(false)); + } + + LLVM_DEBUG(LVP.dump()); + for (const SectionRef &Section : ToolSectionFilter(*Obj)) { if (FilterSections.empty() && !DisassembleAll && (!Section.isText() || Section.isVirtual())) @@ -1481,6 +1878,7 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj, Symbols[SI].Type != ELF::STT_OBJECT && !DisassembleAll; bool DumpARMELFData = false; + formatted_raw_ostream FOS(outs()); while (Index < End) { // ARM and AArch64 ELF binaries can interleave data and text in the // same section. 
We rely on the markers introduced to understand what @@ -1502,7 +1900,7 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj, if (DumpARMELFData) { Size = dumpARMELFData(SectionAddr, Index, End, Obj, Bytes, - MappingSymbols); + MappingSymbols, FOS); } else { // When -z or --disassemble-zeroes are given we always dissasemble // them. Otherwise we might want to skip zero bytes we see. @@ -1515,7 +1913,7 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj, if (size_t N = countSkippableZeroBytes(Bytes.slice(Index, MaxOffset))) { - outs() << "\t\t..." << '\n'; + FOS << "\t\t..." << '\n'; Index += N; continue; } @@ -1530,11 +1928,14 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj, if (Size == 0) Size = 1; + LVP.update({Index, Section.getIndex()}, + {Index + Size, Section.getIndex()}, Index + Size != End); + PIP.printInst( *IP, Disassembled ? &Inst : nullptr, Bytes.slice(Index, Size), - {SectionAddr + Index + VMAAdjustment, Section.getIndex()}, outs(), - "", *STI, &SP, Obj->getFileName(), &Rels); - outs() << CommentStream.str(); + {SectionAddr + Index + VMAAdjustment, Section.getIndex()}, FOS, + "", *STI, &SP, Obj->getFileName(), &Rels, LVP); + FOS << CommentStream.str(); Comments.clear(); // If disassembly has failed, avoid analysing invalid/incomplete @@ -1551,7 +1952,7 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj, Inst, SectionAddr + Index, Size)) { Target = *MaybeTarget; PrintTarget = true; - outs() << " # " << Twine::utohexstr(Target); + FOS << " # " << Twine::utohexstr(Target); } if (PrintTarget) { // In a relocatable object, the target's section must reside in @@ -1607,16 +2008,18 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj, if (Demangle) TargetName = demangle(TargetName); - outs() << " <" << TargetName; + FOS << " <" << TargetName; uint64_t Disp = Target - TargetAddress; if (Disp) - outs() << "+0x" << 
Twine::utohexstr(Disp);
-            outs() << '>';
+              FOS << "+0x" << Twine::utohexstr(Disp);
+            FOS << '>';
           }
         }
       }
     }
-    outs() << "\n";
+
+    LVP.printAfterInst(FOS);
+    FOS << "\n";

     // Hexagon does this in pretty printer
     if (Obj->getArch() != Triple::hexagon) {
@@ -1646,8 +2049,9 @@ static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj,
             Offset += AdjustVMA;
           }

-          printRelocation(Obj->getFileName(), *RelCur, SectionAddr + Offset,
-                          Is64Bits);
+          printRelocation(FOS, Obj->getFileName(), *RelCur,
+                          SectionAddr + Offset, Is64Bits);
+          LVP.printAfterOtherLine(FOS, true);
           ++RelCur;
         }
       }

From llvm-commits at lists.llvm.org Thu Jul 9 02:00:18 2020
From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:00:18 +0000 (UTC)
Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status
In-Reply-To:
References:
Message-ID: <445db43caa1145f6b330285b6581dcb4@localhost.localdomain>

fhahn added a comment.

In D80916#2140861, @serge-sans-paille wrote:

> Note: I had to revert this because I only tested the X86 target, and other targets suffer from a lot of « don't update return status » errors. cc @fhahn

Hmm, where do other targets suffer from those errors? In the various backend pipelines/passes?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80916/new/

https://reviews.llvm.org/D80916

From llvm-commits at lists.llvm.org Thu Jul 9 02:02:07 2020
From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:02:07 +0000 (UTC)
Subject: [PATCH] D83397: [flang] Replace uses of _Complex with std::complex
In-Reply-To:
References:
Message-ID: <749538964a0972df998488e0271c3771@localhost.localdomain>

DavidTruby added a comment.

I would actually say this is more serious than just a warning (although clang only has it as a warning for various reasons).
_Complex is //**not a keyword that exists in C++**//, so this code is non-conformant. It also isn't implemented in MSVC, which is a compiler we should try to support, as the rest of LLVM does. It's also just less ergonomic to use than `std::complex`, and the two are guaranteed to be layout compatible, so if we need to call into/out of C the following still works:

  // file.cpp
  #include <complex>
  extern "C" float real_part(std::complex<float> c) {
    return c.real();
  }

  // file.c
  #include <complex.h>
  #include <stdio.h>
  extern float real_part(float _Complex c);
  int main() {
    float _Complex c = 1+2*I;
    printf("%f\n", real_part(c));
  }

This isn't accidental; the C++ standard guarantees that this works.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83397/new/

https://reviews.llvm.org/D83397

From llvm-commits at lists.llvm.org Thu Jul 9 02:07:48 2020
From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:07:48 +0000 (UTC)
Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class.
In-Reply-To:
References:
Message-ID:

Higuoxing added a comment.

In D83452#2141125, @grimar wrote:

> Also, I find the description a bit confusing:
>
> > Virtual functions should be overridden in the derived class
>
> Isn't this change (virtual->override) here a no-op to improve the code style?
> All these functions are overridden anyway, with the `override` word or without it.

Yes, I know these functions are overridden with or without the `override` keyword. The `DIEFixupVisitor` class is derived from `DWARFYAML::Visitor`, and those `onXXXX()` functions are declared as virtual functions in the `DWARFYAML::Visitor` class. I want to make it stricter: if we put the `override` keyword on a function that does not override an existing virtual function, the compiler will complain about it. For example, if we change the signature of a function in the derived class by mistake, the compiler will spot it for us.
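
To illustrate the point about signature mistakes, here is a minimal sketch. The `Visitor`, `SilentMismatch`, and `Checked` classes and the `visit` helper are hypothetical stand-ins, not the real `DWARFYAML` declarations; the sketch only shows how `override` would turn a silent mismatch into a compile-time error.

```cpp
#include <string>

// Hypothetical stand-in for a visitor base class; the names are illustrative
// and are not the real DWARFYAML declarations.
struct Visitor {
  virtual ~Visitor() = default;
  virtual std::string onValue(unsigned) { return "base"; }
};

// Without `override`, a mistyped parameter type silently declares a new,
// unrelated virtual function, so the base version keeps being called.
struct SilentMismatch : Visitor {
  virtual std::string onValue(int) { return "derived"; }
};

// With `override`, the same mistake would be rejected by the compiler:
//   std::string onValue(int) override;  // error: does not override anything
struct Checked : Visitor {
  std::string onValue(unsigned) override { return "derived"; }
};

// Dispatch through the base interface, as a traversal driver would.
std::string visit(Visitor &V) { return V.onValue(1u); }
```

Calling `visit` on a `SilentMismatch` object runs the base implementation, while a `Checked` object runs the derived one.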
As for the change from `private:` to `protected:`, that's because these functions in `DWARFYAML::Visitor` and `DWARFYAML::DumpVisitor` are declared as `protected`, and I want to make them consistent. Sorry for my poor wording :( Does it make sense now?

In D83452#2141126, @jhenderson wrote:

> Some points:
>
> 1. The `virtual` on the old version didn't declare a new virtual class. In fact, it was actually superfluous from a strict point of view. `virtual` is propagated to all sub-class versions of functions with the same signature, regardless of accessor type. Prior to C++11, it was traditionally used to show that a sub-class function was an implementation of a virtual function in the parent.
> 2. C++11 introduced the `override` specifier. It doesn't make a function any more or less virtual than it would have been before. All it does is require that the function overrides a virtual function in the parent class.

Yes! That's exactly what I want to do! See my comments for @grimar.

> 3. There's no need to change from `private` to `protected`. This just affects how functions can be called from the outside world - it doesn't affect the `virtual` nature of functions.

I want to make the coding style consistent since these functions in `DWARFYAML::DIEFixupVisitor` and `DWARFYAML::Visitor` are declared as `protected`.

> That all being said, there's nothing wrong with this change, since you're working in the area, in my opinion. Using `override` is always a good idea since it provides safety guarantees (rather than potentially accidentally creating a new virtual function). You just need to update your description and summary accordingly (something like "use override instead of virtual for better safety").
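
Points 1 and 3 in the quoted list can be sketched as follows. `Base`, `Derived`, and `MoreDerived` are hypothetical names chosen for illustration, and the example assumes nothing about the actual DWARFYAML classes. It shows that a function with a matching signature stays virtual without the keyword, and that the access level does not change dispatch.

```cpp
#include <string>

// Hypothetical classes for illustration only.
class Base {
public:
  virtual ~Base() = default;
  // Public entry point that forwards to the virtual hook.
  std::string run() { return onEvent(); }

protected:
  virtual std::string onEvent() { return "Base"; }
};

class Derived : public Base {
private:
  // No `virtual` keyword and a different access level: this is still a
  // virtual override because the signature matches Base::onEvent().
  std::string onEvent() { return "Derived"; }
};

class MoreDerived : public Derived {
private:
  // `override` compiles here, which confirms Derived::onEvent() is virtual.
  std::string onEvent() override { return "MoreDerived"; }
};
```

Calling `run()` through a `Base` reference reaches the most derived `onEvent()` in both cases, even though the overrides are private.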


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83452/new/

https://reviews.llvm.org/D83452

From llvm-commits at lists.llvm.org Thu Jul 9 02:09:14 2020
From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:09:14 +0000 (UTC)
Subject: [PATCH] D83452: [DWARFYAML] Virtual functions should be overridden in derived class.
In-Reply-To:
References:
Message-ID: <6329e61e9fc4fd0b9e77e0eec1fa1a35@localhost.localdomain>

jhenderson added a comment.

Okay, please update the summary and description to clarify this, because it looks like a number of people were confused!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83452/new/

https://reviews.llvm.org/D83452

From llvm-commits at lists.llvm.org Thu Jul 9 02:18:16 2020
From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:18:16 +0000 (UTC)
Subject: [PATCH] D83452: [DWARFYAML] Use override instead of virtual for better safety.
In-Reply-To:
References:
Message-ID: <60598e76dea3728b23688b1248230b6e@localhost.localdomain>

Higuoxing added a comment.

In D83452#2141178, @jhenderson wrote:

> Okay, please update the summary and description to clarify this, because it looks like a number of people were confused!

Sorry for the inconvenience. Does it look better now?
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 02:19:22 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Thu, 09 Jul 2020 02:19:22 -0700 (PDT) Subject: [llvm] e4ec6d0 - Correctly update return status for MVEGatherScatterLowering Message-ID: <5f06e11a.1c69fb81.756d.61ad@mx.google.com> Author: serge-sans-paille Date: 2020-07-09T11:18:54+02:00 New Revision: e4ec6d0afe14ca4ba6cebd35c7f46f3ce0859ecf URL: https://github.com/llvm/llvm-project/commit/e4ec6d0afe14ca4ba6cebd35c7f46f3ce0859ecf DIFF: https://github.com/llvm/llvm-project/commit/e4ec6d0afe14ca4ba6cebd35c7f46f3ce0859ecf.diff LOG: Correctly update return status for MVEGatherScatterLowering `Changed` should reflect all possible changes. Differential Revision: https://reviews.llvm.org/D83459 Added: Modified: llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp b/llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp index d5fa92130e69..4d7ad6cd60cb 100644 --- a/llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp +++ b/llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp @@ -990,26 +990,27 @@ bool MVEGatherScatterLowering::runOnFunction(Function &F) { SmallVector Gathers; SmallVector Scatters; + bool Changed = false; + for (BasicBlock &BB : F) { for (Instruction &I : BB) { IntrinsicInst *II = dyn_cast(&I); if (II && II->getIntrinsicID() == Intrinsic::masked_gather) { Gathers.push_back(II); if (isa(II->getArgOperand(0))) - optimiseOffsets( + Changed |= optimiseOffsets( cast(II->getArgOperand(0))->getOperand(1), II->getParent(), LI); } else if (II && II->getIntrinsicID() == Intrinsic::masked_scatter) { Scatters.push_back(II); if (isa(II->getArgOperand(1))) - optimiseOffsets( + Changed |= optimiseOffsets( 
cast(II->getArgOperand(1))->getOperand(1), II->getParent(), LI); } } } - bool Changed = false; for (unsigned i = 0; i < Gathers.size(); i++) { IntrinsicInst *I = Gathers[i]; Value *L = lowerGather(I); From llvm-commits at lists.llvm.org Thu Jul 9 02:19:25 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:19:25 +0000 (UTC) Subject: [PATCH] D83459: Correctly update return status for MVEGatherScatterLowering In-Reply-To: References: Message-ID: <5297f388705dca26954940ed98bd473f@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGe4ec6d0afe14: Correctly update return status for MVEGatherScatterLowering (authored by serge-sans-paille). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83459/new/ https://reviews.llvm.org/D83459 Files: llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp Index: llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp =================================================================== --- llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp +++ llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp @@ -990,26 +990,27 @@ SmallVector Gathers; SmallVector Scatters; + bool Changed = false; + for (BasicBlock &BB : F) { for (Instruction &I : BB) { IntrinsicInst *II = dyn_cast(&I); if (II && II->getIntrinsicID() == Intrinsic::masked_gather) { Gathers.push_back(II); if (isa(II->getArgOperand(0))) - optimiseOffsets( + Changed |= optimiseOffsets( cast(II->getArgOperand(0))->getOperand(1), II->getParent(), LI); } else if (II && II->getIntrinsicID() == Intrinsic::masked_scatter) { Scatters.push_back(II); if (isa(II->getArgOperand(1))) - optimiseOffsets( + Changed |= optimiseOffsets( cast(II->getArgOperand(1))->getOperand(1), II->getParent(), LI); } } } - bool Changed = false; for (unsigned i = 0; i < Gathers.size(); i++) { IntrinsicInst *I = Gathers[i]; Value *L = lowerGather(I); -------------- next part 
-------------- A non-text attachment was scrubbed... Name: D83459.276670.patch Type: text/x-patch Size: 1335 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 02:20:26 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:20:26 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <9470de6fbd4bbcb4e212ea0e0aa7d27c@localhost.localdomain> serge-sans-paille added a comment. > Hmm, where do other targets suffer from those errors? In the various backend pipelines/passes? Yes, the culprits where https://reviews.llvm.org/D83457 https://reviews.llvm.org/D83459 and https://reviews.llvm.org/D83460 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Thu Jul 9 02:21:20 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:21:20 +0000 (UTC) Subject: [PATCH] D82756: Port some floating point options to new option marshalling infrastructure In-Reply-To: References: Message-ID: <14735891e78898afb9744092d97794b7@localhost.localdomain> dang added inline comments. ================ Comment at: clang/include/clang/Driver/Options.td:1176 +defm reciprocal_math : OptInFFlag< "reciprocal-math", "Allow division operations to be reassociated", "", "", [], "LangOpts->AllowRecip">; +def fapprox_func : Flag<["-"], "fapprox-func">, Group, Flags<[CC1Option, NoDriverOption]>, + MarshallingInfoFlag<"LangOpts->ApproxFunc", "false">; ---------------- Anastasia wrote: > could this also be OptInFFlag? The aim was to keep the driver semantics the same as before and this was not something you could control with the driver, so I left it as just a CC1 flag. 
However if it makes sense to be able to control this from the driver then we can definitely make this `OptInFFLag`. ================ Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2805 CmdArgs.push_back("-menable-unsafe-fp-math"); + ApproxFunc = true; + } ---------------- Anastasia wrote: > Is this a bug fix ? No, in current trunk approximating floating point functions was something that was implied by other optimization flags, i.e. disabling math errno, enabling associative/reciprocal math, disabling signed zeros and disabling trapping math and -ffast-math which does all the previously mentioned things. This patch moves this logic in the driver by introducing a new CC1 flag for this so that parsing CC1 options can be more easily automated. This just reflects the logic that was previously inside cc1. ================ Comment at: clang/test/CodeGen/fp-function-attrs.cpp:2 +// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffinite-math-only -menable-unsafe-fp-math \ +// RUN: -menable-no-infs -menable-no-nans -fno-signed-zeros -freciprocal-math \ +// RUN: -fapprox-func -mreassociate -ffp-contract=fast -emit-llvm -o - %s | FileCheck %s ---------------- Anastasia wrote: > Not clear why do you need to pass these extra flags now? Previously passing -ffast-math to CC1 implied all these other flags. I am trying to make CC1 option parsing as simple as possible, so that we can then make it easy to generate a command line from a CompilerInvocation instance. 
You can refer to [[ http://lists.llvm.org/pipermail/cfe-dev/2020-May/065421.html | http://lists.llvm.org/pipermail/cfe-dev/2020-May/065421.html ]] for more details on why we want to be able to do this Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82756/new/ https://reviews.llvm.org/D82756 From llvm-commits at lists.llvm.org Thu Jul 9 02:24:01 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:24:01 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Use override instead of virtual for better safety. In-Reply-To: References: Message-ID: <0d5a8dc306f5b3a281ba0bad97baca46@localhost.localdomain> jhenderson added a comment. > We should override those functions rather than declare a new set of virtual functions for better safety. We're still not declaring a new set of virtual functions. Just say something like "We should use the modern override keyword instead of virtual for virtual functions in subclasses." Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 02:27:38 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:27:38 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Use override instead of virtual for better safety. In-Reply-To: References: Message-ID: <0cbe8f08358a706ecbd33784a070be79@localhost.localdomain> Higuoxing added a comment. In D83452#2141201 , @jhenderson wrote: > > We should override those functions rather than declare a new set of virtual functions for better safety. > > We're still not declaring a new set of virtual functions. Just say something like "We should use the modern override keyword instead of virtual for virtual functions in subclasses." Thanks a lot! I've changed the description. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 02:30:54 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:30:54 +0000 (UTC) Subject: [PATCH] D83467: [SVE][CodeGen] Add README for SVE-related warnings in tests Message-ID: david-arm created this revision. david-arm added reviewers: sdesmalen, efriedma, c-rhodes. Herald added subscribers: llvm-commits, psnobl, arphaman, kristof.beyls, tschuett. Herald added a reviewer: rengolin. Herald added a project: LLVM. I have added a new file: llvm/test/CodeGen/AArch64/README that describes what to do in the event one of the SVE codegen tests fails the warnings check. In addition, I've added comments to all the relevant SVE tests pointing users at the README file. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83467 Files: llvm/test/CodeGen/AArch64/README llvm/test/CodeGen/AArch64/sve-alloca-stackid.ll llvm/test/CodeGen/AArch64/sve-bitcast.ll llvm/test/CodeGen/AArch64/sve-breakdown-scalable-vectortype.ll llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll llvm/test/CodeGen/AArch64/sve-calling-convention-tuple-types.ll llvm/test/CodeGen/AArch64/sve-calling-convention.ll llvm/test/CodeGen/AArch64/sve-extract-element.ll llvm/test/CodeGen/AArch64/sve-extract-subvector.ll llvm/test/CodeGen/AArch64/sve-fcmp.ll llvm/test/CodeGen/AArch64/sve-fp.ll llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll llvm/test/CodeGen/AArch64/sve-gep.ll llvm/test/CodeGen/AArch64/sve-insert-element.ll llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll llvm/test/CodeGen/AArch64/sve-int-arith-pred.ll llvm/test/CodeGen/AArch64/sve-int-arith.ll llvm/test/CodeGen/AArch64/sve-int-div-pred.ll llvm/test/CodeGen/AArch64/sve-int-imm.ll llvm/test/CodeGen/AArch64/sve-int-log-imm.ll 
llvm/test/CodeGen/AArch64/sve-int-log-pred.ll llvm/test/CodeGen/AArch64/sve-int-log.ll llvm/test/CodeGen/AArch64/sve-int-mad-pred.ll llvm/test/CodeGen/AArch64/sve-int-mul-pred.ll llvm/test/CodeGen/AArch64/sve-int-reduce-pred.ll llvm/test/CodeGen/AArch64/sve-intrinsic-opts-ptest.ll llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll llvm/test/CodeGen/AArch64/sve-intrinsics-adr.ll llvm/test/CodeGen/AArch64/sve-intrinsics-contiguous-prefetches.ll llvm/test/CodeGen/AArch64/sve-intrinsics-conversion.ll llvm/test/CodeGen/AArch64/sve-intrinsics-counting-bits.ll llvm/test/CodeGen/AArch64/sve-intrinsics-counting-elems.ll llvm/test/CodeGen/AArch64/sve-intrinsics-create-tuple.ll llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ffr-manipulation.ll llvm/test/CodeGen/AArch64/sve-intrinsics-fp-arith-merging.ll llvm/test/CodeGen/AArch64/sve-intrinsics-fp-arith.ll llvm/test/CodeGen/AArch64/sve-intrinsics-fp-compares.ll llvm/test/CodeGen/AArch64/sve-intrinsics-fp-converts.ll llvm/test/CodeGen/AArch64/sve-intrinsics-fp-reduce.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll 
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scalar-base-vector-indexes.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-vect-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-vect-base-invalid-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-index.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-compares-with-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-int-compares.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll llvm/test/CodeGen/AArch64/sve-intrinsics-logical.ll llvm/test/CodeGen/AArch64/sve-intrinsics-matmul-fp32.ll llvm/test/CodeGen/AArch64/sve-intrinsics-matmul-fp64.ll llvm/test/CodeGen/AArch64/sve-intrinsics-matmul-int8.ll llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll llvm/test/CodeGen/AArch64/sve-intrinsics-pred-operations.ll llvm/test/CodeGen/AArch64/sve-intrinsics-pred-testing.ll llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll llvm/test/CodeGen/AArch64/sve-intrinsics-reversal.ll llvm/test/CodeGen/AArch64/sve-intrinsics-scalar-to-vec.ll llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-32bit-scaled-offsets.ll llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-32bit-unscaled-offsets.ll 
llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-64bit-scaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-64bit-unscaled-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-vector-base-imm-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-vector-base-scalar-offset.ll llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll llvm/test/CodeGen/AArch64/sve-intrinsics-shifts-merging.ll llvm/test/CodeGen/AArch64/sve-intrinsics-shifts.ll llvm/test/CodeGen/AArch64/sve-intrinsics-sqdec.ll llvm/test/CodeGen/AArch64/sve-intrinsics-sqinc.ll llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll llvm/test/CodeGen/AArch64/sve-intrinsics-stN-reg-imm-addr-mode.ll llvm/test/CodeGen/AArch64/sve-intrinsics-stN-reg-reg-addr-mode.ll llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll llvm/test/CodeGen/AArch64/sve-intrinsics-uqdec.ll llvm/test/CodeGen/AArch64/sve-intrinsics-uqinc.ll llvm/test/CodeGen/AArch64/sve-intrinsics-while.ll llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll llvm/test/CodeGen/AArch64/sve-masked-ldst-sext.ll llvm/test/CodeGen/AArch64/sve-masked-ldst-trunc.ll llvm/test/CodeGen/AArch64/sve-masked-ldst-zext.ll llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-pred-log.ll llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll llvm/test/CodeGen/AArch64/sve-select.ll llvm/test/CodeGen/AArch64/sve-setcc.ll llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll llvm/test/CodeGen/AArch64/sve-trunc.ll llvm/test/CodeGen/AArch64/sve-vector-splat.ll 
llvm/test/CodeGen/AArch64/sve-vscale-combine.ll llvm/test/CodeGen/AArch64/sve-vscale.ll llvm/test/CodeGen/AArch64/sve-vselect-imm.ll llvm/test/CodeGen/AArch64/sve-zeroinit.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83467.276673.patch Type: text/x-patch Size: 69490 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 02:34:07 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:34:07 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: <6aeab75e07c3a9685dd7935fd40e2460@localhost.localdomain> dnsampaio updated this revision to Diff 276674. dnsampaio marked 3 inline comments as done. dnsampaio added a comment. Clear users assumptions (and test it) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 Files: llvm/lib/Transforms/Scalar/BDCE.cpp llvm/test/Transforms/AggressiveInstCombine/sext_multi_uses.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D60413.276674.patch Type: text/x-patch Size: 5911 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 02:36:57 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:36:57 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <90eb498fd24d30f801cd2fc39b0fe9e7@localhost.localdomain> fhahn added a comment. In D80916#2141194 , @serge-sans-paille wrote: > > Hmm, where do other targets suffer from those errors? In the various backend pipelines/passes? > > Yes, the culprits where https://reviews.llvm.org/D83457 https://reviews.llvm.org/D83459 and https://reviews.llvm.org/D83460 Sounds good. 
I can try a build on AArch64 in a bit. Cross-compiling the test-suite for example should be relatively straight-forward, if you have a linker & libraries for the target architecture (e.g. on linux it should be easy to get the required toolchains for platforms like ARM and AArch64) https://llvm.org/docs/lnt/tests.html#cross-compiling Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Thu Jul 9 02:40:22 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:40:22 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: <198294357e71d1e2922a5378da1b4a61@localhost.localdomain> dnsampaio added inline comments. ================ Comment at: llvm/lib/Transforms/Scalar/BDCE.cpp:127 + Changed = true; + NumSExt2ZExt++; + continue; ---------------- nikic wrote: > You probably need to `clearAssumptionsOfUsers()` here. Please check this test case: https://alive2.llvm.org/ce/z/caMis2 Indeed, many thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Thu Jul 9 02:41:50 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:41:50 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Use override instead of virtual for better safety. In-Reply-To: References: Message-ID: <02979735157af1ff3a876a10b955f138@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 From llvm-commits at lists.llvm.org Thu Jul 9 02:42:10 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:42:10 +0000 (UTC) Subject: [PATCH] D70174: [AArch64][SVE] Use FP for scavenging slot In-Reply-To: References: Message-ID: david-arm added a comment. This fixes an issue building SLEEF with SVE at -O0. It needs rebasing though as it doesn't apply cleanly to HEAD. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70174/new/ https://reviews.llvm.org/D70174 From llvm-commits at lists.llvm.org Thu Jul 9 02:42:23 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via llvm-commits) Date: Thu, 09 Jul 2020 02:42:23 -0700 (PDT) Subject: [llvm] 9e7fddb - [yaml][clang-tidy] Fix multiline YAML serialization Message-ID: <5f06e67f.1c69fb81.fbed5.664d@mx.google.com> Author: Dmitry Polukhin Date: 2020-07-09T02:41:58-07:00 New Revision: 9e7fddbd36f567217255c1df1cb816b79f0250af URL: https://github.com/llvm/llvm-project/commit/9e7fddbd36f567217255c1df1cb816b79f0250af DIFF: https://github.com/llvm/llvm-project/commit/9e7fddbd36f567217255c1df1cb816b79f0250af.diff LOG: [yaml][clang-tidy] Fix multiline YAML serialization Summary: New line duplication logic introduced in https://reviews.llvm.org/D63482 has two issues: (1) there is no logic that removes duplicate newlines when clang-apply-replacment reads YAML and (2) in general such logic should be applied to all strings and should happen on string serialization level instead in YAML parser. This diff changes multiline strings quotation from single quote `'` to double `"`. It solves problems with internal newlines because now they are escaped. Also double quotation solves the problem with leading whitespace after newline. 
In case of single quotation YAML parsers should remove leading whitespace according to specification. In case of double quotation these leading are internal space and they are preserved. There is no way to instruct YAML parsers to preserve leading whitespaces after newline so double quotation is the only viable option that solves all problems at once. Test Plan: check-all Reviewers: gribozavr, mgehre, yvvan Subscribers: xazax.hun, hiraditya, cfe-commits, llvm-commits Tags: #clang-tools-extra, #clang, #llvm Differential Revision: https://reviews.llvm.org/D80301 Added: Modified: clang/include/clang/Tooling/ReplacementsYaml.h clang/unittests/Tooling/ReplacementsYamlTest.cpp llvm/include/llvm/Support/YAMLTraits.h llvm/lib/Support/YAMLTraits.cpp llvm/test/Transforms/LowerMatrixIntrinsics/remarks-shared-subtrees.ll llvm/unittests/Support/YAMLIOTest.cpp Removed: ################################################################################ diff --git a/clang/include/clang/Tooling/ReplacementsYaml.h b/clang/include/clang/Tooling/ReplacementsYaml.h index 2e3e401652e2..83e35d623255 100644 --- a/clang/include/clang/Tooling/ReplacementsYaml.h +++ b/clang/include/clang/Tooling/ReplacementsYaml.h @@ -35,13 +35,7 @@ template <> struct MappingTraits { NormalizedReplacement(const IO &, const clang::tooling::Replacement &R) : FilePath(R.getFilePath()), Offset(R.getOffset()), - Length(R.getLength()), ReplacementText(R.getReplacementText()) { - size_t lineBreakPos = ReplacementText.find('\n'); - while (lineBreakPos != std::string::npos) { - ReplacementText.replace(lineBreakPos, 1, "\n\n"); - lineBreakPos = ReplacementText.find('\n', lineBreakPos + 2); - } - } + Length(R.getLength()), ReplacementText(R.getReplacementText()) {} clang::tooling::Replacement denormalize(const IO &) { return clang::tooling::Replacement(FilePath, Offset, Length, diff --git a/clang/unittests/Tooling/ReplacementsYamlTest.cpp b/clang/unittests/Tooling/ReplacementsYamlTest.cpp index c8fe9c4db412..3328d9bad55c 
100644 --- a/clang/unittests/Tooling/ReplacementsYamlTest.cpp +++ b/clang/unittests/Tooling/ReplacementsYamlTest.cpp @@ -65,7 +65,7 @@ TEST(ReplacementsYamlTest, serializesNewLines) { " - FilePath: '/path/to/file1.h'\n" " Offset: 0\n" " Length: 0\n" - " ReplacementText: '#include \n\n'\n" + " ReplacementText: \"#include \\n\"\n" "...\n", YamlContentStream.str().c_str()); } diff --git a/llvm/include/llvm/Support/YAMLTraits.h b/llvm/include/llvm/Support/YAMLTraits.h index f93f36037679..44e34a4a09b4 100644 --- a/llvm/include/llvm/Support/YAMLTraits.h +++ b/llvm/include/llvm/Support/YAMLTraits.h @@ -649,24 +649,25 @@ inline bool isBool(StringRef S) { inline QuotingType needsQuotes(StringRef S) { if (S.empty()) return QuotingType::Single; + + QuotingType MaxQuotingNeeded = QuotingType::None; if (isSpace(static_cast(S.front())) || isSpace(static_cast(S.back()))) - return QuotingType::Single; + MaxQuotingNeeded = QuotingType::Single; if (isNull(S)) - return QuotingType::Single; + MaxQuotingNeeded = QuotingType::Single; if (isBool(S)) - return QuotingType::Single; + MaxQuotingNeeded = QuotingType::Single; if (isNumeric(S)) - return QuotingType::Single; + MaxQuotingNeeded = QuotingType::Single; // 7.3.3 Plain Style // Plain scalars must not begin with most indicators, as this would cause // ambiguity with other YAML constructs. static constexpr char Indicators[] = R"(-?:\,[]{}#&*!|>'"%@`)"; if (S.find_first_of(Indicators) == 0) - return QuotingType::Single; + MaxQuotingNeeded = QuotingType::Single; - QuotingType MaxQuotingNeeded = QuotingType::None; for (unsigned char C : S) { // Alphanum is safe. if (isAlnum(C)) @@ -684,11 +685,11 @@ inline QuotingType needsQuotes(StringRef S) { case 0x9: continue; // LF(0xA) and CR(0xD) may delimit values and so require at least single - // quotes. + // quotes. LLVM YAML parser cannot handle single quoted multiline so use + // double quoting to produce valid YAML. 
case 0xA: case 0xD: - MaxQuotingNeeded = QuotingType::Single; - continue; + return QuotingType::Double; // DEL (0x7F) are excluded from the allowed character range. case 0x7F: return QuotingType::Double; diff --git a/llvm/lib/Support/YAMLTraits.cpp b/llvm/lib/Support/YAMLTraits.cpp index 752fab2be9b3..9ac7c65e19f7 100644 --- a/llvm/lib/Support/YAMLTraits.cpp +++ b/llvm/lib/Support/YAMLTraits.cpp @@ -878,12 +878,12 @@ StringRef ScalarTraits::input(StringRef Scalar, void *, } void ScalarTraits::output(const std::string &Val, void *, - raw_ostream &Out) { + raw_ostream &Out) { Out << Val; } StringRef ScalarTraits::input(StringRef Scalar, void *, - std::string &Val) { + std::string &Val) { Val = Scalar.str(); return StringRef(); } diff --git a/llvm/test/Transforms/LowerMatrixIntrinsics/remarks-shared-subtrees.ll b/llvm/test/Transforms/LowerMatrixIntrinsics/remarks-shared-subtrees.ll index e92733f9a81a..2846889bf239 100644 --- a/llvm/test/Transforms/LowerMatrixIntrinsics/remarks-shared-subtrees.ll +++ b/llvm/test/Transforms/LowerMatrixIntrinsics/remarks-shared-subtrees.ll @@ -18,8 +18,7 @@ ; YAML-NEXT: - String: ' loads, ' ; YAML-NEXT: - NumComputeOps: '0' ; YAML-NEXT: - String: ' compute ops' -; YAML-NEXT: - String: ', -; YAML-NEXT: additionally ' +; YAML-NEXT: - String: ",\nadditionally " ; YAML-NEXT: - NumStores: '0' ; YAML-NEXT: - String: ' stores, ' ; YAML-NEXT: - NumLoads: '4' @@ -47,8 +46,7 @@ ; YAML-NEXT: - String: ' loads, ' ; YAML-NEXT: - NumComputeOps: '120' ; YAML-NEXT: - String: ' compute ops' -; YAML-NEXT: - String: ', -; YAML-NEXT: additionally ' +; YAML-NEXT: - String: ",\nadditionally " ; YAML-NEXT: - NumStores: '0' ; YAML-NEXT: - String: ' stores, ' ; YAML-NEXT: - NumLoads: '4' diff --git a/llvm/unittests/Support/YAMLIOTest.cpp b/llvm/unittests/Support/YAMLIOTest.cpp index d86489cf7560..492d854ef812 100644 --- a/llvm/unittests/Support/YAMLIOTest.cpp +++ b/llvm/unittests/Support/YAMLIOTest.cpp @@ -285,10 +285,8 @@ TEST(YAMLIO, MultilineStrings) { YOut 
<< Original; } auto Expected = "---\n" - "str1: 'a multiline string\n" - "foobarbaz'\n" - "str2: 'another one\r" - "foobarbaz'\n" + "str1: \"a multiline string\\nfoobarbaz\"\n" + "str2: \"another one\\rfoobarbaz\"\n" "str3: a one-line string\n" "...\n"; ASSERT_EQ(Serialized, Expected); From llvm-commits at lists.llvm.org Thu Jul 9 02:42:27 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:42:27 +0000 (UTC) Subject: [PATCH] D80301: [yaml][clang-tidy] Fix multiline YAML serialization In-Reply-To: References: Message-ID: <5cc6df654594d2d6d4a58ae71c2b1e82@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG9e7fddbd36f5: [yaml][clang-tidy] Fix multiline YAML serialization (authored by DmitryPolukhin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80301/new/ https://reviews.llvm.org/D80301 Files: clang/include/clang/Tooling/ReplacementsYaml.h clang/unittests/Tooling/ReplacementsYamlTest.cpp llvm/include/llvm/Support/YAMLTraits.h llvm/lib/Support/YAMLTraits.cpp llvm/test/Transforms/LowerMatrixIntrinsics/remarks-shared-subtrees.ll llvm/unittests/Support/YAMLIOTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D80301.276679.patch Type: text/x-patch Size: 5853 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 02:43:32 2020 From: llvm-commits at lists.llvm.org (Dmitry Polukhin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 09:43:32 +0000 (UTC) Subject: [PATCH] D80301: [yaml][clang-tidy] Fix multiline YAML serialization In-Reply-To: References: Message-ID: <8c1f2b616e468bd2a3dc9395b93ce0ca@localhost.localdomain> DmitryPolukhin added a comment. @aaron.ballman - thank you for the review! 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80301/new/

https://reviews.llvm.org/D80301

From llvm-commits at lists.llvm.org Thu Jul 9 02:44:10 2020
From: llvm-commits at lists.llvm.org (Jaydeep Chauhan via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:44:10 +0000 (UTC)
Subject: [PATCH] D83468: [Debuginfo] Fix for PR46653
Message-ID:

Jac1494 created this revision.
Jac1494 added reviewers: JDevlieghere, vsk, aprantl.
Jac1494 added a project: debug-info.
Herald added subscribers: llvm-commits, ormris, hiraditya.
Herald added a project: LLVM.

This patch fixes https://bugs.llvm.org/show_bug.cgi?id=46653

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83468

Files:
  llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
  llvm/test/CodeGen/X86/stack-protector.ll
  llvm/test/DebugInfo/MIR/X86/debug-loc-0.mir
  llvm/test/DebugInfo/X86/dbg-prolog-end.ll
  llvm/test/DebugInfo/X86/tail-merge.ll
  llvm/test/DebugInfo/debugline-no-prologue_end.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83468.276676.patch
Type: text/x-patch
Size: 5933 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 02:44:33 2020
From: llvm-commits at lists.llvm.org (Mirko Brkusanin via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:44:33 +0000 (UTC)
Subject: [PATCH] D83240: [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping
In-Reply-To:
References:
Message-ID: <8849640029b4fde892a5313ec9721e79@localhost.localdomain>

mbrkusanin updated this revision to Diff 276677.
mbrkusanin added a comment.

- Updated tests.
- Added tests with waterfall loops.
- Changed them to -stop-after=instruction-select like others for GlobalISel.

I didn't know what waterfall loops were before. Let me know if all the necessary cases are covered.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83240/new/

https://reviews.llvm.org/D83240

Files:
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/BUFInstructions.td
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.f16.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.i8.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83240.276677.patch
Type: text/x-patch
Size: 115493 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 02:53:16 2020
From: llvm-commits at lists.llvm.org (Cullen Rhodes via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:53:16 +0000 (UTC)
Subject: [PATCH] D83196: [CodeGen] Fix a warning in DAGTypeLegalizer::SetSplitVector
In-Reply-To:
References:
Message-ID:

c-rhodes accepted this revision.
c-rhodes added a comment.
This revision is now accepted and ready to land.

LGTM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83196/new/

https://reviews.llvm.org/D83196

From llvm-commits at lists.llvm.org Thu Jul 9 02:53:25 2020
From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:53:25 +0000 (UTC)
Subject: [PATCH] D83247: [compiler-rt][asan][hwasan] Refactor shadow setup into sanitizer_common (NFCI)
In-Reply-To:
References:
Message-ID:

vitalybuka added inline comments.

================
Comment at: compiler-rt/lib/asan/asan_linux.cpp:103
 uptr FindDynamicShadowStart() {
+  uptr shadow_size_bytes = GetHighMemEnd(SHADOW_SCALE) >> SHADOW_SCALE;
 #if ASAN_PREMAP_SHADOW
----------------
MemToShadowSize(GetHighMemEnd(SHADOW_SCALE))

================
Comment at: compiler-rt/lib/hwasan/hwasan.cpp:289

-  MadviseShadow();
-
----------------
Why is it gone?

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_common.h:123
 void MprotectMallocZones(void *addr, int prot);

+// Get the max address, taking into account alignment due to the mmap
----------------
Shadow is specific to only some sanitizers, so I don't like to have it in /sanitizer_common/
Also we have msan/tsan/dfsan with different shadows, for which it is not clear if we can reuse these functions without exposing more particular sanitizer details.
Maybe keeping all shadow code with some redundancy but independent is going to be easier to maintain in the long term.

Anyway, if we still go this way, maybe we put code into sanitizer_common/sanitizer_shadow_* files?

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cpp:204
+  const uptr granularity = GetMmapGranularity();
+  const uptr alignment = shadow_base_alignment
+                             ? 1ULL << shadow_base_alignment
----------------
I think it's going to be cleaner if we replace
  uptr mmap_alignment_scale, uptr shadow_base_alignment
with
  uptr shadow_scale, uptr min_shadow_base_alignment
and adjust calculations accordingly:
  const uptr alignment = max(granularity << shadow_scale, min_shadow_base_alignment)

It should be copied into mac and win, even if they use 0 there, for consistency.

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp:1083
+                      largest_gap_found, max_occupied_addr);
+    uptr new_max_vm = RoundDownTo(largest_gap_found << SHADOW_SCALE, alignment);
+    if (new_max_vm < max_occupied_addr) {
----------------
SHADOW_SCALE is undefined here

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp:1092
+    RestrictMemoryToMaxAddress(new_max_vm);
+    kHighMemEnd = new_max_vm - 1;
+    space_size = kHighShadowEnd + left_padding;
----------------
kHighShadowEnd and kHighMemEnd are undefined here

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp:1093
+    kHighMemEnd = new_max_vm - 1;
+    space_size = kHighShadowEnd + left_padding;
+    VReport(2, "FindDynamicShadowStart, space_size = %p\n", space_size);
----------------
Because of kHighMemEnd here we need to return mapped size on all platforms :(

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_win.cpp:351

+uptr MapDynamicShadow(uptr shadow_size_bytes, uptr mmap_alignment_scale,
+                      uptr shadow_base_alignment) {
----------------
shadow_size_bytes is not used

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_win.cpp:353
+                      uptr shadow_base_alignment) {
+  CHECK(shadow_base_alignment == 0);
+  uptr granularity = GetMmapGranularity();
----------------
CHECK_NE

================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_win.cpp:357
+  uptr left_padding = granularity;
+  uptr space_size = kHighShadowEnd + left_padding;
+  uptr shadow_start = FindAvailableMemoryRange(space_size, alignment,
----------------
kHighShadowEnd is an asan constant.
Probably space_size should be calculated from shadow_size_bytes.
Same for the mac version.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83247/new/

https://reviews.llvm.org/D83247

From llvm-commits at lists.llvm.org Thu Jul 9 02:54:09 2020
From: llvm-commits at lists.llvm.org (Sam Elliott via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 09:54:09 +0000 (UTC)
Subject: [PATCH] D82988: [RISCV] Avoid Splitting MBB in RISCVExpandPseudo
In-Reply-To:
References:
Message-ID:

lenary marked an inline comment as done.
lenary added a comment.

The RISCVTargetMachine changes are not distinct. You can only use virtual registers (ScratchReg) if you are before register allocation, as I understand it.

================
Comment at: llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:74-76
     MachineBasicBlock::iterator NMBBI = std::next(MBBI);
-    Modified |= expandMI(MBB, MBBI, NMBBI);
+    Modified |= expandMI(MBB, MBBI);
     MBBI = NMBBI;
----------------
luismarques wrote:
> lenary wrote:
> > Oh this loop can be simplified - though I'm not sure I should be incrementing `MBBI` if we've inserted new instructions using `BuildMI`, which I think also increments the iterator automatically. Guidance here would be helpful.
> Maybe I misunderstood your issue, but I think the increment is correct:
> 1) It is what AArch64 does.
> 2) If you have two consecutive pseudo-instructions this code expands them both, so you aren't skipping over the second by performing the explicit increment.
> 3) You don't want to iterate over the expanded instructions, to recursively expand them, since we don't emit other pseudo-instructions in the expansions.
> BTW, what simplification were you considering for this loop?

Turning it back into a for loop like:

```
for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
     MBBI != E; MBBI++) {
  Modified |= expandMI(MBB, MBBI);
}
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82988/new/

https://reviews.llvm.org/D82988

From llvm-commits at lists.llvm.org Thu Jul 9 03:01:27 2020
From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits)
Date: Thu, 09 Jul 2020 03:01:27 -0700 (PDT)
Subject: [llvm] dbed9d5 - VersionPrinter - use const auto& iterator in for-range-loop.
Message-ID: <5f06eaf7.1c69fb81.3b07c.5997@mx.google.com>

Author: Simon Pilgrim
Date: 2020-07-09T10:56:38+01:00
New Revision: dbed9d5ce7f5ab870b3ff20a14ee6c366c803fdb

URL: https://github.com/llvm/llvm-project/commit/dbed9d5ce7f5ab870b3ff20a14ee6c366c803fdb
DIFF: https://github.com/llvm/llvm-project/commit/dbed9d5ce7f5ab870b3ff20a14ee6c366c803fdb.diff

LOG: VersionPrinter - use const auto& iterator in for-range-loop.

Avoids unnecessary copies and silences clang tidy warning.

Added:

Modified:
    llvm/lib/Support/CommandLine.cpp

Removed:

################################################################################
diff --git a/llvm/lib/Support/CommandLine.cpp b/llvm/lib/Support/CommandLine.cpp
index cee96083f700..12ef0d511b14 100644
--- a/llvm/lib/Support/CommandLine.cpp
+++ b/llvm/lib/Support/CommandLine.cpp
@@ -2537,7 +2537,7 @@ class VersionPrinter {
     // information.
     if (ExtraVersionPrinters != nullptr) {
       outs() << '\n';
-      for (auto I : *ExtraVersionPrinters)
+      for (const auto &I : *ExtraVersionPrinters)
         I(outs());
     }

From llvm-commits at lists.llvm.org Thu Jul 9 03:03:48 2020
From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 10:03:48 +0000 (UTC)
Subject: [PATCH] D82988: [RISCV] Avoid Splitting MBB in RISCVExpandPseudo
In-Reply-To:
References:
Message-ID: <1fab539e614e46b73982939c1bb9ed42@localhost.localdomain>

asb accepted this revision.
asb added a comment.
This revision is now accepted and ready to land.

Got it, thanks. In that case LGTM and please tweak the commit message (the last paragraph specifically) so it's clear that the two changes are interlinked.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82988/new/

https://reviews.llvm.org/D82988

From llvm-commits at lists.llvm.org Thu Jul 9 03:07:57 2020
From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 10:07:57 +0000 (UTC)
Subject: [PATCH] D83247: [compiler-rt][asan][hwasan] Refactor shadow setup into sanitizer_common (NFCI)
In-Reply-To:
References:
Message-ID:

vitalybuka added inline comments.

================
Comment at: compiler-rt/lib/asan/asan_linux.cpp:122
-  uptr granularity = GetMmapGranularity();
-  uptr alignment = granularity * 8;
-  uptr left_padding = granularity;
----------------
kcc wrote:
> tejohnson wrote:
> > The code in asan is multiplying the mmap granularity by 8, whereas the hwasan version shifts it by kShadowScale. I wasn't sure if the 8 here is supposed to be equivalent to a left shift by the shadow scale (which is typically 3 in asan), or is specifically hardcoded separately not using SHADOW_SCALE since it could be something other than 3 in some cases (e.g. 5 for myriad, or user set via ASAN_SHADOW_SCALE). Depending on what was intended here, I would keep the hardcoding of "3" passed to my refactored MapDynamicShadow, or change that to SHADOW_SCALE.
> I frankly don't remember :(
It should be SHADOW_SCALE; myriad works only because it does not use dynamic shadow.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83247/new/

https://reviews.llvm.org/D83247

From llvm-commits at lists.llvm.org Thu Jul 9 03:08:03 2020
From: llvm-commits at lists.llvm.org (Bevin Hansson via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 10:08:03 +0000 (UTC)
Subject: [PATCH] D83216: [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
In-Reply-To:
References:
Message-ID: <7ce224136c7949d039771556aafb1fd5@localhost.localdomain>

ebevhan updated this revision to Diff 276683.
ebevhan added a comment.

Addressed review comments.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83216/new/

https://reviews.llvm.org/D83216

Files:
  llvm/docs/LangRef.rst
  llvm/include/llvm/CodeGen/ISDOpcodes.h
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/Target/TargetSelectionDAG.td
  llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/IR/Verifier.cpp
  llvm/test/CodeGen/X86/sshl_sat.ll
  llvm/test/CodeGen/X86/ushl_sat.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83216.276683.patch
Type: text/x-patch
Size: 51593 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 03:10:17 2020
From: llvm-commits at lists.llvm.org (Simon Wallis via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 10:10:17 +0000 (UTC)
Subject: [PATCH] D82638: [MachineCopyPropagation] BackwardPropagatableCopy: add check for hasOverlappingMultipleDef
In-Reply-To:
References:
Message-ID: <1e53dc253caa6d80dae2af38297c499b@localhost.localdomain>

simonwallis2 added a comment.

Hi @lkail
Yes, I used
  llc -stop-before=machine-cp mcp-dest-regs-no-dup.ll
to create the MIR test mcp-dest-regs-no-dup.mir

I note that the RUN line you suggested produces this error:
  llc: warning: run-pass is for .mir file only.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82638/new/ https://reviews.llvm.org/D82638 From llvm-commits at lists.llvm.org Thu Jul 9 03:16:03 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:16:03 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay. Herald added subscribers: arichardson, emaste. Herald added a reviewer: espindola. It allows handling cases when we have SHT_REL[A] sections before target sections in objects. This fixes https://bugs.llvm.org/show_bug.cgi?id=46632 which says: "Normally it is not what compilers would emit. We have to support it, because some custom tools might want to use this feature, which is not restricted by ELF gABI" https://reviews.llvm.org/D83469 Files: lld/ELF/InputFiles.cpp lld/test/ELF/reloc-sec-before-target.test Index: lld/test/ELF/reloc-sec-before-target.test =================================================================== --- /dev/null +++ lld/test/ELF/reloc-sec-before-target.test @@ -0,0 +1,33 @@ +# RUN: yaml2obj %s -o %t.o +# RUN: ld.lld -shared %t.o -o %t +# RUN: llvm-readelf --relocs %t | FileCheck %s + +## In this case we have an object with a relocation section before +## the corresponding relocatable target section. Normally it is not what +## compilers would emit. We have to support it, because some custom tools might +## want to use this feature, which is not restricted by ELF gABI. + +## Check we handle the relocation properly. 
+# CHECK: Relocation section '.rela.dyn' at offset 0x238 contains 1 entries: +# CHECK-NEXT: Offset Info Type Symbol's Value Symbol's Name + Addend +# CHECK-NEXT: 00000000000022f0 0000000100000001 R_X86_64_64 0000000000000000 foo + 0 + +--- !ELF +FileHeader: + Class: ELFCLASS64 + Data: ELFDATA2LSB + Type: ET_REL + Machine: EM_X86_64 +Sections: + - Name: .rela.data + Type: SHT_RELA + Info: .data + Relocations: + - Symbol: foo + Type: R_X86_64_64 + - Name: .data + Type: SHT_PROGBITS + Flags: [ SHF_ALLOC, SHF_WRITE ] +Symbols: + - Name: foo + Binding: STB_GLOBAL Index: lld/ELF/InputFiles.cpp =================================================================== --- lld/ELF/InputFiles.cpp +++ lld/ELF/InputFiles.cpp @@ -632,6 +632,8 @@ break; case SHT_SYMTAB: case SHT_STRTAB: + case SHT_REL: + case SHT_RELA: case SHT_NULL: break; default: @@ -639,11 +641,23 @@ } } - // This block handles SHF_LINK_ORDER. + // We have the second loop. It is used to: + // 1) handle SHF_LINK_ORDER sections. + // 2) create SHT_REL[A] sections. In a specific case it might be possible + // to have a relocatable section that follows the corresponding relocation + // section. In this case the relocation section references the target + // section that is not yet created and we error out. For simplicity of + // implementation, we do not implement the creation of sections on demand. for (size_t i = 0, e = objSections.size(); i < e; ++i) { if (this->sections[i] == &InputSection::discarded) continue; const Elf_Shdr &sec = objSections[i]; + + // Create SHT_REL[A] sections. + if (sec.sh_type == SHT_REL || sec.sh_type == SHT_RELA) + this->sections[i] = createInputSection(sec); + + // This block handles SHF_LINK_ORDER. if (!(sec.sh_flags & SHF_LINK_ORDER)) continue; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83469.276685.patch Type: text/x-patch Size: 2595 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:17:14 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Thu, 09 Jul 2020 03:17:14 -0700 (PDT) Subject: [llvm] 03fe47a - ConstantFoldScalarCall3 - use const APInt& returned by getValue() Message-ID: <5f06eeaa.1c69fb81.10533.6555@mx.google.com> Author: Simon Pilgrim Date: 2020-07-09T11:16:47+01:00 New Revision: 03fe47a29c95dbda5ecd548e35627bb16f7dfc6d URL: https://github.com/llvm/llvm-project/commit/03fe47a29c95dbda5ecd548e35627bb16f7dfc6d DIFF: https://github.com/llvm/llvm-project/commit/03fe47a29c95dbda5ecd548e35627bb16f7dfc6d.diff LOG: ConstantFoldScalarCall3 - use const APInt& returned by getValue() Avoids unnecessary APInt copies and silences clang tidy warning. Added: Modified: llvm/lib/Analysis/ConstantFolding.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp index 76e7b2906c91..a414336fb21b 100644 --- a/llvm/lib/Analysis/ConstantFolding.cpp +++ b/llvm/lib/Analysis/ConstantFolding.cpp @@ -2607,8 +2607,8 @@ static Constant *ConstantFoldScalarCall3(StringRef Name, // how rounding should be done, and provide their own folding to be // consistent with rounding. This is the same approach as used by // DAGTypeLegalizer::ExpandIntRes_MULFIX. 
-      APInt Lhs = Op1->getValue();
-      APInt Rhs = Op2->getValue();
+      const APInt &Lhs = Op1->getValue();
+      const APInt &Rhs = Op2->getValue();
       unsigned Scale = Op3->getValue().getZExtValue();
       unsigned Width = Lhs.getBitWidth();
       assert(Scale < Width && "Illegal scale.");

From llvm-commits at lists.llvm.org Thu Jul 9 03:23:33 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 10:23:33 +0000 (UTC)
Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections.
In-Reply-To:
References:
Message-ID:

grimar planned changes to this revision.
grimar added a comment.

Going to update. I've missed a failure in a test.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83469/new/

https://reviews.llvm.org/D83469

From llvm-commits at lists.llvm.org Thu Jul 9 03:23:48 2020
From: llvm-commits at lists.llvm.org (Jun Ma via llvm-commits)
Date: Thu, 09 Jul 2020 03:23:48 -0700 (PDT)
Subject: [llvm] f0bfad2 - [Coroutines] Refactor sinkLifetimeStartMarkers
Message-ID: <5f06f034.1c69fb81.6774.69b5@mx.google.com>

Author: Jun Ma
Date: 2020-07-09T18:23:28+08:00
New Revision: f0bfad2ed9b4a0eec68b71c7df2ee588806788c2

URL: https://github.com/llvm/llvm-project/commit/f0bfad2ed9b4a0eec68b71c7df2ee588806788c2
DIFF: https://github.com/llvm/llvm-project/commit/f0bfad2ed9b4a0eec68b71c7df2ee588806788c2.diff

LOG: [Coroutines] Refactor sinkLifetimeStartMarkers

Differential Revision: https://reviews.llvm.org/D83379

Added:
    llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll
    llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll

Modified:
    llvm/lib/Transforms/Coroutines/CoroFrame.cpp
    llvm/lib/Transforms/Coroutines/CoroSplit.cpp

Removed:
    llvm/test/Transforms/Coroutines/coro-split-sink-lifetime.ll

################################################################################
diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index
2d398643a9c3..f55501a05d85 100644 --- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp +++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp @@ -1548,6 +1548,75 @@ static void sinkSpillUsesAfterCoroBegin(Function &F, const SpillInfo &Spills, return; } +/// For each local variable that all of its user are only used inside one of +/// suspended region, we sink their lifetime.start markers to the place where +/// after the suspend block. Doing so minimizes the lifetime of each variable, +/// hence minimizing the amount of data we end up putting on the frame. +static void sinkLifetimeStartMarkers(Function &F, coro::Shape &Shape, + SuspendCrossingInfo &Checker) { + DominatorTree DT(F); + + // Collect all possible basic blocks which may dominate all uses of allocas. + SmallPtrSet DomSet; + DomSet.insert(&F.getEntryBlock()); + for (auto *CSI : Shape.CoroSuspends) { + BasicBlock *SuspendBlock = CSI->getParent(); + assert(isSuspendBlock(SuspendBlock) && SuspendBlock->getSingleSuccessor() && + "should have split coro.suspend into its own block"); + DomSet.insert(SuspendBlock->getSingleSuccessor()); + } + + for (Instruction &I : instructions(F)) { + if (!isa(&I)) + continue; + + for (BasicBlock *DomBB : DomSet) { + bool Valid = true; + SmallVector BCInsts; + + auto isUsedByLifetimeStart = [&](Instruction *I) { + if (isa(I) && I->hasOneUse()) + if (auto *IT = dyn_cast(I->user_back())) + return IT->getIntrinsicID() == Intrinsic::lifetime_start; + return false; + }; + + for (User *U : I.users()) { + Instruction *UI = cast(U); + // For all users except lifetime.start markers, if they are all + // dominated by one of the basic blocks and do not cross + // suspend points as well, then there is no need to spill the + // instruction. + if (!DT.dominates(DomBB, UI->getParent()) || + Checker.isDefinitionAcrossSuspend(DomBB, U)) { + // Skip bitcast used by lifetime.start markers. 
+ if (isUsedByLifetimeStart(UI)) { + BCInsts.push_back(UI); + continue; + } + Valid = false; + break; + } + } + // Sink lifetime.start markers to dominate block when they are + // only used outside the region. + if (Valid && BCInsts.size() != 0) { + auto *NewBitcast = BCInsts[0]->clone(); + auto *NewLifetime = cast(BCInsts[0]->user_back())->clone(); + NewLifetime->replaceUsesOfWith(BCInsts[0], NewBitcast); + NewBitcast->insertBefore(DomBB->getTerminator()); + NewLifetime->insertBefore(DomBB->getTerminator()); + + // All the outsided lifetime.start markers are no longer necessary. + for (Instruction *S : BCInsts) { + S->user_back()->eraseFromParent(); + } + break; + } + } + } +} + void coro::buildCoroutineFrame(Function &F, Shape &Shape) { eliminateSwiftError(F, Shape); @@ -1598,6 +1667,7 @@ void coro::buildCoroutineFrame(Function &F, Shape &Shape) { Spills.clear(); } + sinkLifetimeStartMarkers(F, Shape, Checker); // Collect lifetime.start info for each alloca. using LifetimeStart = SmallPtrSet; llvm::DenseMap> LifetimeMap; diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp index 0841cebab51c..9c4392e7999b 100644 --- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp +++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp @@ -1239,103 +1239,6 @@ static void simplifySuspendPoints(coro::Shape &Shape) { S.resize(N); } -/// For every local variable that has lifetime intrinsics markers, we sink -/// their lifetime.start marker to the places where the variable is being -/// used for the first time. Doing so minimizes the lifetime of each variable, -/// hence minimizing the amount of data we end up putting on the frame. -static void sinkLifetimeStartMarkers(Function &F) { - DominatorTree Dom(F); - for (Instruction &I : instructions(F)) { - // We look for this particular pattern: - // %tmpX = alloca %.., align ... 
- // %0 = bitcast %...* %tmpX to i8* - // call void @llvm.lifetime.start.p0i8(i64 ..., i8* nonnull %0) #2 - if (!isa(&I)) - continue; - // There can be multiple lifetime start markers for the same variable. - SmallPtrSet LifetimeStartInsts; - // SinkBarriers stores all instructions that use this local variable. - // When sinking the lifetime start intrinsics, we can never sink past - // these barriers. - SmallPtrSet SinkBarriers; - bool Valid = true; - auto AddSinkBarrier = [&](Instruction *I) { - // When adding a new barrier to SinkBarriers, we maintain the case - // that no instruction in SinkBarriers dominates another instruction. - SmallPtrSet ToRemove; - bool ShouldAdd = true; - for (Instruction *S : SinkBarriers) { - if (I == S || Dom.dominates(S, I)) { - ShouldAdd = false; - break; - } else if (Dom.dominates(I, S)) { - ToRemove.insert(S); - } - } - if (ShouldAdd) { - SinkBarriers.insert(I); - for (Instruction *R : ToRemove) { - SinkBarriers.erase(R); - } - } - }; - for (User *U : I.users()) { - if (!isa(U)) - continue; - for (User *CU : U->users()) { - // If we see any user of CastInst that's not lifetime start/end - // intrinsics, give up because it's too complex. - if (auto *CUI = dyn_cast(CU)) { - if (CUI->getIntrinsicID() == Intrinsic::lifetime_start) - LifetimeStartInsts.insert(CUI); - else if (CUI->getIntrinsicID() == Intrinsic::lifetime_end) - AddSinkBarrier(CUI); - else - Valid = false; - } else { - Valid = false; - } - } - } - if (!Valid || LifetimeStartInsts.empty()) - continue; - - for (User *U : I.users()) { - if (isa(U)) - continue; - // Every user of the variable is also a sink barrier. - AddSinkBarrier(cast(U)); - } - - // For each sink barrier, we insert a lifetime start marker right - // before it. 
- for (Instruction *S : SinkBarriers) { - if (auto *IS = dyn_cast(S)) { - if (IS->getIntrinsicID() == Intrinsic::lifetime_end) { - // If we have a lifetime end marker in SinkBarriers, meaning it's - // not dominated by any other users, we can safely delete it. - IS->eraseFromParent(); - continue; - } - } - // We find an existing lifetime.start marker that domintes the barrier, - // clone it and insert it right before the barrier. We cannot clone an - // arbitrary lifetime.start marker because we want to make sure the - // BitCast instruction referred in the marker also dominates the barrier. - for (const IntrinsicInst *LifetimeStart : LifetimeStartInsts) { - if (Dom.dominates(LifetimeStart, S)) { - LifetimeStart->clone()->insertBefore(S); - break; - } - } - } - // All the old lifetime.start markers are no longer necessary. - for (IntrinsicInst *S : LifetimeStartInsts) { - S->eraseFromParent(); - } - } -} - static void splitSwitchCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones) { assert(Shape.ABI == coro::ABI::Switch); @@ -1525,7 +1428,6 @@ static coro::Shape splitCoroutine(Function &F, return Shape; simplifySuspendPoints(Shape); - sinkLifetimeStartMarkers(F); buildCoroutineFrame(F, Shape); replaceFrameSize(Shape); diff --git a/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime.ll b/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll similarity index 93% rename from llvm/test/Transforms/Coroutines/coro-split-sink-lifetime.ll rename to llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll index 2d6b28a2baf8..9f9c1661138c 100644 --- a/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime.ll +++ b/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll @@ -1,5 +1,5 @@ ; Tests that coro-split will optimize the lifetime.start maker of each local variable, -; sink them to the places closest to the actual use. +; sink them to the places after the suspend block. 
; RUN: opt < %s -coro-split -S | FileCheck %s ; RUN: opt < %s -passes=coro-split -S | FileCheck %s @@ -43,14 +43,14 @@ exit: ; CHECK-LABEL: @a.resume( ; CHECK: %testval = alloca i32, align 4 +; CHECK-NEXT: %0 = bitcast i32* %testval to i8* +; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) ; CHECK-NEXT: getelementptr inbounds %a.Frame ; CHECK-NEXT: getelementptr inbounds %"struct.lean_future::Awaiter" -; CHECK-NEXT: %cast1 = bitcast i32* %testval to i8* ; CHECK-NEXT: %val = load i32, i32* %Result -; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* %cast1) ; CHECK-NEXT: %test = load i32, i32* %testval ; CHECK-NEXT: call void @print(i32 %test) -; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* %cast1) +; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* %0) ; CHECK-NEXT: call void @print(i32 %val) ; CHECK-NEXT: ret void diff --git a/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll b/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll new file mode 100644 index 000000000000..6695c22a1f09 --- /dev/null +++ b/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll @@ -0,0 +1,80 @@ +; Tests that coro-split will optimize the lifetime.start maker of each local variable, +; sink them to the places after the suspend block. +; RUN: opt < %s -coro-split -S | FileCheck %s +; RUN: opt < %s -passes=coro-split -S | FileCheck %s + +%"struct.std::coroutine_handle" = type { i8* } +%"struct.std::coroutine_handle.0" = type { %"struct.std::coroutine_handle" } +%"struct.lean_future::Awaiter" = type { i32, %"struct.std::coroutine_handle.0" } + +declare i1 @getcond() +declare i8* @malloc(i64) +declare void @print(i32) + +define void @a() "coroutine.presplit"="1" { +entry: + %ref.tmp7 = alloca %"struct.lean_future::Awaiter", align 8 + %testval = alloca i32 + %cast = bitcast i32* %testval to i8* + ; lifetime of %testval starts here, but not used until await.ready. 
+ call void @llvm.lifetime.start.p0i8(i64 4, i8* %cast) + %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null) + %alloc = call i8* @malloc(i64 16) #3 + %vFrame = call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %alloc) + %testcond = call i1 @getcond() + br i1 %testcond, label %if.suspend, label %else.direct + +if.suspend: + %save = call token @llvm.coro.save(i8* null) + %Result.i19 = getelementptr inbounds %"struct.lean_future::Awaiter", %"struct.lean_future::Awaiter"* %ref.tmp7, i64 0, i32 0 + %suspend = call i8 @llvm.coro.suspend(token %save, i1 false) + switch i8 %suspend, label %exit [ + i8 0, label %await.ready + i8 1, label %exit + ] + +else.direct: + br label %after.await + +await.ready: + %StrayCoroSave = call token @llvm.coro.save(i8* null) + %val = load i32, i32* %Result.i19 + %test = load i32, i32* %testval + call void @print(i32 %test) + call void @print(i32 %val) + br label %after.await + +after.await: + %test1 = load i32, i32* %testval + call void @print(i32 %test1) + call void @llvm.lifetime.end.p0i8(i64 4, i8* %cast) + br label %exit + +exit: + call i1 @llvm.coro.end(i8* null, i1 false) + ret void +} + +; CHECK-LABEL: @a.resume( +; CHECK: %[[VAL:testval.+]] = getelementptr inbounds %a.Frame +; CHECK-NOT: %testval = alloca i32, align 4 +; CHECK-NOT: %[[CAST:.+]] = bitcast i32* %testval to i8* +; CHECK-NOT: call void @llvm.lifetime.start.p0i8(i64 4, i8* %[[CAST]]) +; CHECK: %test = load i32, i32* %[[VAL]] +; CHECK-NOT: %test = load i32, i32* %testval + +declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*) +declare i1 @llvm.coro.alloc(token) #3 +declare noalias nonnull i8* @"\01??2 at YAPEAX_K@Z"(i64) local_unnamed_addr +declare i64 @llvm.coro.size.i64() #5 +declare i8* @llvm.coro.begin(token, i8* writeonly) #3 +declare void @"\01?puts@@YAXZZ"(...) 
+declare token @llvm.coro.save(i8*) #3 +declare i8* @llvm.coro.frame() #5 +declare i8 @llvm.coro.suspend(token, i1) #3 +declare void @"\01??3 at YAXPEAX@Z"(i8*) local_unnamed_addr #10 +declare i8* @llvm.coro.free(token, i8* nocapture readonly) #2 +declare i1 @llvm.coro.end(i8*, i1) #3 +declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #4 +declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #4 + From llvm-commits at lists.llvm.org Thu Jul 9 03:24:04 2020 From: llvm-commits at lists.llvm.org (JunMa via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:24:04 +0000 (UTC) Subject: [PATCH] D83379: [Coroutines] Refactor sinkLifetimeStartMarkers In-Reply-To: References: Message-ID: <7af2bbdbaad684fb6b70bcd1136c0c7f@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf0bfad2ed9b4: [Coroutines] Refactor sinkLifetimeStartMarkers (authored by junparser). Changed prior to commit: https://reviews.llvm.org/D83379?vs=276357&id=276689#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83379/new/ https://reviews.llvm.org/D83379 Files: llvm/lib/Transforms/Coroutines/CoroFrame.cpp llvm/lib/Transforms/Coroutines/CoroSplit.cpp llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll llvm/test/Transforms/Coroutines/coro-split-sink-lifetime.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83379.276689.patch Type: text/x-patch Size: 12396 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:26:25 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:26:25 +0000 (UTC) Subject: [PATCH] D80916: [LegacyPM] Double check that passes correctly set their Modified status In-Reply-To: References: Message-ID: <5d4c89c8db064d4769d02496fac056cd@localhost.localdomain> serge-sans-paille added a comment. > Sounds good. I can try a build on AArch64 in a bit. > > Cross-compiling the test-suite for example should be relatively straight-forward, if you have a linker & libraries for the target architecture (e.g. on linux it should be easy to get the required toolchains for platforms like ARM and AArch64) https://llvm.org/docs/lnt/tests.html#cross-compiling Well, I can trigger the failure locally, it's just that I didn't enable all `LLVM_TARGETS` during my local builds. I hope cross compiling won't unveil more issues :-) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80916/new/ https://reviews.llvm.org/D80916 From llvm-commits at lists.llvm.org Thu Jul 9 03:28:47 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:28:47 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: <65bb7ba9ff7ff4a964d30b319cae051c@localhost.localdomain> grimar updated this revision to Diff 276690. grimar added a comment. - Remove outdated test. - Test `SHT_REL` case too. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 Files: lld/ELF/InputFiles.cpp lld/test/ELF/invalid/reloc-section-reordered.test lld/test/ELF/reloc-sec-before-target.test -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83469.276690.patch Type: text/x-patch Size: 3858 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:29:36 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:29:36 +0000 (UTC) Subject: [PATCH] D83470: [LV] Fix versioning-for-unit-stide of loops with small trip count Message-ID: Ayal created this revision. Ayal added reviewers: fhahn, gilr, uabelho. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. This patch fixes D81345 and PR46652. If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis still generates runtime checks for unit strides that will version the loop. In such cases, the loop vectorizer should either re-run the analysis or bail-out from vectorizing the loop, as done prior to D81345 . The latter is chosen for now as the former requires refactoring. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83470 Files: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/optsize.ll Index: llvm/test/Transforms/LoopVectorize/optsize.ll =================================================================== --- llvm/test/Transforms/LoopVectorize/optsize.ll +++ llvm/test/Transforms/LoopVectorize/optsize.ll @@ -221,6 +221,32 @@ ret void } +; PR46652: Check that the need for stride==1 check prevents vectorizing a loop +; having tiny trip count, when compiling w/o -Os/-Oz. 
+; CHECK-LABEL: @pr46652 +; CHECK-NOT: vector.scevcheck +; CHECK-NOT: vector.body +; CHECK-LABEL: for.body + +@g = external global [1 x i16], align 1 + +define void @pr46652() { +entry: + br label %for.body + +for.body: ; preds = %for.body, %entry + %l1.02 = phi i16 [ 1, %entry ], [ %inc9, %for.body ] + %mul = mul nsw i16 %l1.02, undef + %arrayidx6 = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 %mul + %0 = load i16, i16* %arrayidx6, align 1 + %inc9 = add nuw nsw i16 %l1.02, 1 + %exitcond.not = icmp eq i16 %inc9, 16 + br i1 %exitcond.not, label %for.end, label %for.body + +for.end: ; preds = %for.body + ret void +} + !llvm.module.flags = !{!0} !0 = !{i32 1, !"ProfileSummary", !1} !1 = !{!2, !3, !4, !5, !6, !7, !8, !9} Index: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp =================================================================== --- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -4937,8 +4937,14 @@ return true; } - assert(Legal->getLAI()->getSymbolicStrides().empty() && - "Specializing for stride == 1 under -Os/-Oz"); + // FIXME: Avoid specializing for stride==1 instead of bailing out. + if (!Legal->getLAI()->getSymbolicStrides().empty()) { + reportVectorizationFailure("Runtime stride check for small trip count", + "runtime stride == 1 checks needed. Enable vectorization of " + "this loop without such check by compiling with -Os/-Oz", + "CantVersionLoopWithOptForSize", ORE, TheLoop); + return true; + } return false; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83470.276687.patch Type: text/x-patch Size: 2061 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:29:57 2020 From: llvm-commits at lists.llvm.org (Qiu Chaofan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:29:57 +0000 (UTC) Subject: [PATCH] D83471: [PowerPC] Don't set use to RM for static rounding instructions Message-ID: qiucf created this revision. qiucf added reviewers: nemanjai, jsji, PowerPC, ZhangKang, steven.zhang, hfinkel. Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya. Herald added a project: LLVM. Instructions `x(v|s)r(d|s)pi[zmp]?` and `fri[npzm]` use fixed rounding directions without referencing current rounding mode. This patch removes their use to `RM` in instructions definition. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83471 Files: llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrVSX.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D83471.276688.patch Type: text/x-patch Size: 10736 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:30:41 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:30:41 +0000 (UTC) Subject: [PATCH] D81699: MemorySanitizer: Add option to insert init checks at call site In-Reply-To: References: Message-ID: vitalybuka added inline comments. ================ Comment at: llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll:1 +; RUN: opt < %s -msan-check-access-address=0 -msan-track-origins=1 -msan-eager-checks -S -passes='module(msan-module),function(msan)' 2>&1 | \ +; RUN: FileCheck -allow-deprecated-dag-overlap -check-prefixes=CHECK,CHECK-ORIGINS %s ---------------- vitalybuka wrote: > vitalybuka wrote: > > would you like to try go generate test with llvm/utils/update_analyze_test_checks.py > ? 
``` ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; RUN: opt < %s -msan-check-access-address=0 -msan-track-origins=1 -msan-eager-checks -S -passes='module(msan-module),function(msan)' 2>&1 | \ ; RUN: FileCheck -allow-deprecated-dag-overlap -check-prefixes=CHECK,CHECK-ORIGINS %s target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" define noundef i32 @NormalRet() nounwind uwtable sanitize_memory { ; CHECK-LABEL: @NormalRet( ; CHECK-NEXT: ret i32 123 ; ret i32 123 } define i32 @PartialRet() nounwind uwtable sanitize_memory { ; CHECK-LABEL: @PartialRet( ; CHECK-NEXT: store i32 0, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8 ; CHECK-NEXT: store i32 0, i32* @__msan_retval_origin_tls, align 4 ; CHECK-NEXT: ret i32 123 ; ret i32 123 } define noundef i32 @LoadedRet() nounwind uwtable sanitize_memory { ; CHECK-LABEL: @LoadedRet( ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* ; CHECK-NEXT: [[O:%.*]] = load i32, i32* [[P]], align 4 ; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64 ; CHECK-NEXT: [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080 ; CHECK-NEXT: [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32* ; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416 ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* ; CHECK-NEXT: [[_MSLD:%.*]] = load i32, i32* [[TMP3]], align 4 ; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4 ; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i32 [[_MSLD]], 0 ; CHECK-NEXT: br i1 [[_MSCMP]], label [[TMP7:%.*]], label [[TMP8:%.*]], !prof !0 ; CHECK: 7: ; CHECK-NEXT: call void @__msan_warning_with_origin_noreturn(i32 [[TMP6]]) #1 ; CHECK-NEXT: unreachable ; CHECK: 8: ; CHECK-NEXT: ret i32 [[O]] ; %p = inttoptr i64 0 to i32 * %o = load i32, i32 *%p ret i32 %o } define void @NormalArg(i32 noundef %a) nounwind uwtable sanitize_memory { ; 
CHECK-LABEL: @NormalArg( ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* ; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64 ; CHECK-NEXT: [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080 ; CHECK-NEXT: [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32* ; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416 ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* ; CHECK-NEXT: store i32 0, i32* [[TMP3]], align 4 ; CHECK-NEXT: store i32 [[A:%.*]], i32* [[P]], align 4 ; CHECK-NEXT: ret void ; %p = inttoptr i64 0 to i32 * store i32 %a, i32 *%p ret void } define void @PartialArg(i32 %a) nounwind uwtable sanitize_memory { ; CHECK-LABEL: @PartialArg( ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* bitcast ([100 x i64]* @__msan_param_tls to i32*), align 8 ; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([200 x i32], [200 x i32]* @__msan_param_origin_tls, i32 0, i32 0), align 4 ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* ; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint i32* [[P]] to i64 ; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[TMP3]], 87960930222080 ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* ; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], 17592186044416 ; CHECK-NEXT: [[TMP7:%.*]] = inttoptr i64 [[TMP6]] to i32* ; CHECK-NEXT: store i32 [[TMP1]], i32* [[TMP5]], align 4 ; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i32 [[TMP1]], 0 ; CHECK-NEXT: br i1 [[_MSCMP]], label [[TMP8:%.*]], label [[TMP9:%.*]], !prof !0 ; CHECK: 8: ; CHECK-NEXT: store i32 [[TMP2]], i32* [[TMP7]], align 4 ; CHECK-NEXT: br label [[TMP9]] ; CHECK: 9: ; CHECK-NEXT: store i32 [[A:%.*]], i32* [[P]], align 4 ; CHECK-NEXT: ret void ; %p = inttoptr i64 0 to i32 * store i32 %a, i32 *%p ret void } define void @CallNormal() nounwind uwtable sanitize_memory { ; CHECK-LABEL: @CallNormal( ; CHECK-NEXT: [[R:%.*]] = call i32 @NormalRet() #0 ; CHECK-NEXT: call void @NormalArg(i32 [[R]]) #0 ; CHECK-NEXT: ret void ; %r = call i32 @NormalRet() nounwind uwtable sanitize_memory call void @NormalArg(i32 
%r) nounwind uwtable sanitize_memory ret void } define void @CallWithLoaded() nounwind uwtable sanitize_memory { ; CHECK-LABEL: @CallWithLoaded( ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* ; CHECK-NEXT: [[O:%.*]] = load i32, i32* [[P]], align 4 ; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64 ; CHECK-NEXT: [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080 ; CHECK-NEXT: [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32* ; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416 ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* ; CHECK-NEXT: [[_MSLD:%.*]] = load i32, i32* [[TMP3]], align 4 ; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4 ; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i32 [[_MSLD]], 0 ; CHECK-NEXT: br i1 [[_MSCMP]], label [[TMP7:%.*]], label [[TMP8:%.*]], !prof !0 ; CHECK: 7: ; CHECK-NEXT: call void @__msan_warning_with_origin_noreturn(i32 [[TMP6]]) #1 ; CHECK-NEXT: unreachable ; CHECK: 8: ; CHECK-NEXT: call void @NormalArg(i32 [[O]]) #0 ; CHECK-NEXT: ret void ; %p = inttoptr i64 0 to i32 * %o = load i32, i32 *%p call void @NormalArg(i32 %o) nounwind uwtable sanitize_memory ret void } define void @CallPartial() nounwind uwtable sanitize_memory { ; CHECK-LABEL: @CallPartial( ; CHECK-NEXT: store i32 0, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8 ; CHECK-NEXT: [[R:%.*]] = call i32 @PartialRet() #0 ; CHECK-NEXT: [[_MSRET:%.*]] = load i32, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8 ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* @__msan_retval_origin_tls, align 4 ; CHECK-NEXT: store i32 [[_MSRET]], i32* bitcast ([100 x i64]* @__msan_param_tls to i32*), align 8 ; CHECK-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([200 x i32], [200 x i32]* @__msan_param_origin_tls, i32 0, i32 0), align 4 ; CHECK-NEXT: call void @PartialArg(i32 [[R]]) #0 ; CHECK-NEXT: ret void ; %r = call i32 @PartialRet() nounwind uwtable sanitize_memory call void @PartialArg(i32 %r) nounwind uwtable sanitize_memory ret void 
} ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81699/new/ https://reviews.llvm.org/D81699 From llvm-commits at lists.llvm.org Thu Jul 9 03:37:18 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:37:18 +0000 (UTC) Subject: [PATCH] D83468: [Debuginfo] Fix for PR46653 In-Reply-To: References: Message-ID: <04d46bee557d1db44e28b6556818ba92@localhost.localdomain> dstenb added a comment. Could you please add a paragraph to the summary that explains why this change is done? ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1987-1989 // We have an explicit location, different from the previous location. // Don't repeat a line-0 record, but otherwise emit the new location. // (The new location might be an explicit line 0, which we do emit.) ---------------- AFAICT, this comment is no longer valid. ================ Comment at: llvm/test/DebugInfo/X86/dbg-prolog-end.ll:29 ;CHECK-LABEL: main: -;CHECK: .loc 1 0 0 prologue_end define i32 @main() nounwind ssp !dbg !6 { ---------------- Since this test is called "dbg-prolog-end", perhaps this should still have a CHECK that verifies the position where prologue_end is emitted on? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83468/new/ https://reviews.llvm.org/D83468 From llvm-commits at lists.llvm.org Thu Jul 9 03:38:00 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Thu, 09 Jul 2020 03:38:00 -0700 (PDT) Subject: [llvm] 58a8571 - DebugCounterList::printOptionInfo - use const auto& iterator in for-range-loop. 
Message-ID: <5f06f388.1c69fb81.2a6cc.6e37@mx.google.com> Author: Simon Pilgrim Date: 2020-07-09T11:37:49+01:00 New Revision: 58a85717cce5166b0952aee4d13375dda94f7497 URL: https://github.com/llvm/llvm-project/commit/58a85717cce5166b0952aee4d13375dda94f7497 DIFF: https://github.com/llvm/llvm-project/commit/58a85717cce5166b0952aee4d13375dda94f7497.diff LOG: DebugCounterList::printOptionInfo - use const auto& iterator in for-range-loop. Avoids unnecessary copies and silences clang tidy warning. Added: Modified: llvm/lib/Support/DebugCounter.cpp Removed: ################################################################################ diff --git a/llvm/lib/Support/DebugCounter.cpp b/llvm/lib/Support/DebugCounter.cpp index 713b41969818..8c579f395282 100644 --- a/llvm/lib/Support/DebugCounter.cpp +++ b/llvm/lib/Support/DebugCounter.cpp @@ -31,7 +31,7 @@ class DebugCounterList : public cl::list { // width, so we do the same. Option::printHelpStr(HelpStr, GlobalWidth, ArgStr.size() + 6); const auto &CounterInstance = DebugCounter::instance(); - for (auto Name : CounterInstance) { + for (const auto &Name : CounterInstance) { const auto Info = CounterInstance.getCounterInfo(CounterInstance.getCounterId(Name)); size_t NumSpaces = GlobalWidth - Info.first.size() - 8; From llvm-commits at lists.llvm.org Thu Jul 9 03:44:48 2020 From: llvm-commits at lists.llvm.org (Ding Fei via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:44:48 +0000 (UTC) Subject: [PATCH] D83321: [Support] Fix utf16 path's index upper bound In-Reply-To: References: Message-ID: <6bd878d34717c00cb31b250b25a97783@localhost.localdomain> danix800 added a comment. Could anyone merge this revision if acceptable? I don't have commit access. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83321/new/ https://reviews.llvm.org/D83321 From llvm-commits at lists.llvm.org Thu Jul 9 03:45:56 2020 From: llvm-commits at lists.llvm.org (Georgii Rymar via llvm-commits) Date: Thu, 09 Jul 2020 03:45:56 -0700 (PDT) Subject: [llvm] 54bdde1 - [llvm-readelf] - Stop using 'unwrapOrError()' in 'ELFDumper::getSymbolVersion'. Message-ID: <5f06f564.1c69fb81.da110.5dc5@mx.google.com> Author: Georgii Rymar Date: 2020-07-09T13:43:52+03:00 New Revision: 54bdde1dc0cde8176ef5616c82ee793218173cab URL: https://github.com/llvm/llvm-project/commit/54bdde1dc0cde8176ef5616c82ee793218173cab DIFF: https://github.com/llvm/llvm-project/commit/54bdde1dc0cde8176ef5616c82ee793218173cab.diff LOG: [llvm-readelf] - Stop using 'unwrapOrError()' in 'ELFDumper::getSymbolVersion'. This allows to propagate an error and report a warning properly. Differential revision: https://reviews.llvm.org/D83393 Added: Modified: llvm/test/tools/llvm-readobj/ELF/versym-invalid.test llvm/tools/llvm-readobj/ELFDumper.cpp Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-readobj/ELF/versym-invalid.test b/llvm/test/tools/llvm-readobj/ELF/versym-invalid.test index 7151a905b1d1..d495b1cfd063 100644 --- a/llvm/test/tools/llvm-readobj/ELF/versym-invalid.test +++ b/llvm/test/tools/llvm-readobj/ELF/versym-invalid.test @@ -115,13 +115,39 @@ Sections: ## Check we report a warning when a SHT_GNU_versym section has an invalid entry size. 
# RUN: yaml2obj --docnum=5 %s -o %t5 -# RUN: llvm-readelf -V %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-GNU -# RUN: llvm-readobj -V %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-LLVM - +# RUN: llvm-readelf -V --dyn-syms %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-GNU +# RUN: llvm-readobj -V --dyn-syms %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-LLVM + +# INVALID-ENT-SIZE-GNU: Symbol table '.dynsym' contains 2 entries: +# INVALID-ENT-SIZE-GNU-NEXT: Num: Value Size Type Bind Vis Ndx Name +# INVALID-ENT-SIZE-GNU-NEXT: warning: '[[FILE]]': section [index 1] has invalid sh_entsize: expected 2, but got 3 +# INVALID-ENT-SIZE-GNU-NEXT: 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND @ +# INVALID-ENT-SIZE-GNU-NEXT: 1: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND foo@ # INVALID-ENT-SIZE-GNU: Version symbols section '.gnu.version' contains 1 entries: # INVALID-ENT-SIZE-GNU-NEXT: Addr: 0000000000000000 Offset: 0x000040 Link: 0 () # INVALID-ENT-SIZE-GNU-NEXT: warning: '[[FILE]]': cannot read content of SHT_GNU_versym section with index 1: section [index 1] has an invalid sh_entsize: 3 +# INVALID-ENT-SIZE-LLVM: DynamicSymbols [ +# INVALID-ENT-SIZE-LLVM-NEXT: warning: '[[FILE]]': section [index 1] has invalid sh_entsize: expected 2, but got 3 +# INVALID-ENT-SIZE-LLVM-NEXT: Symbol { +# INVALID-ENT-SIZE-LLVM-NEXT: Name: @ (0) +# INVALID-ENT-SIZE-LLVM-NEXT: Value: 0x0 +# INVALID-ENT-SIZE-LLVM-NEXT: Size: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Binding: Local (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Type: None (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Other: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Section: Undefined (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: } +# INVALID-ENT-SIZE-LLVM-NEXT: Symbol { +# INVALID-ENT-SIZE-LLVM-NEXT: Name: foo@ (1) +# INVALID-ENT-SIZE-LLVM-NEXT: Value: 0x0 +# INVALID-ENT-SIZE-LLVM-NEXT: Size: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Binding: Local (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Type: None (0x0) +# 
INVALID-ENT-SIZE-LLVM-NEXT: Other: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Section: Undefined (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: } +# INVALID-ENT-SIZE-LLVM-NEXT: ] # INVALID-ENT-SIZE-LLVM: VersionSymbols [ # INVALID-ENT-SIZE-LLVM-NEXT: warning: '[[FILE]]': cannot read content of SHT_GNU_versym section with index 1: section [index 1] has an invalid sh_entsize: 3 # INVALID-ENT-SIZE-LLVM-NEXT: ] @@ -137,6 +163,8 @@ Sections: Type: SHT_GNU_versym Entries: [ 0 ] EntSize: 3 +DynamicSymbols: + - Name: foo ## Check we report a warning when the number of version entries does not match the number of symbols in the associated symbol table. diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp index b8a5de27cb67..6a7f37e39a9a 100644 --- a/llvm/tools/llvm-readobj/ELFDumper.cpp +++ b/llvm/tools/llvm-readobj/ELFDumper.cpp @@ -1086,10 +1086,12 @@ Expected ELFDumper::getSymbolVersion(const Elf_Sym *Sym, sizeof(Elf_Sym); // Get the corresponding version index entry. - const Elf_Versym *Versym = unwrapOrError( - ObjF->getFileName(), ObjF->getELFFile()->template getEntry( - SymbolVersionSection, EntryIndex)); - return this->getSymbolVersionByIndex(Versym->vs_index, IsDefault); + if (Expected EntryOrErr = + ObjF->getELFFile()->template getEntry( + SymbolVersionSection, EntryIndex)) + return this->getSymbolVersionByIndex((*EntryOrErr)->vs_index, IsDefault); + else + return EntryOrErr.takeError(); } template From llvm-commits at lists.llvm.org Thu Jul 9 03:46:07 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:46:07 +0000 (UTC) Subject: [PATCH] D83393: [llvm-readelf] - Stop using 'unwrapOrError()' in 'ELFDumper::getSymbolVersion'. In-Reply-To: References: Message-ID: <03350058573fe9e4047b5abc89d1505d@localhost.localdomain> This revision was automatically updated to reflect the committed changes. 
Closed by commit rG54bdde1dc0cd: [llvm-readelf] - Stop using 'unwrapOrError()' in 'ELFDumper<ELFT>… (authored by grimar). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83393/new/ https://reviews.llvm.org/D83393 Files: llvm/test/tools/llvm-readobj/ELF/versym-invalid.test llvm/tools/llvm-readobj/ELFDumper.cpp Index: llvm/tools/llvm-readobj/ELFDumper.cpp =================================================================== --- llvm/tools/llvm-readobj/ELFDumper.cpp +++ llvm/tools/llvm-readobj/ELFDumper.cpp @@ -1086,10 +1086,12 @@ sizeof(Elf_Sym); // Get the corresponding version index entry. - const Elf_Versym *Versym = unwrapOrError( - ObjF->getFileName(), ObjF->getELFFile()->template getEntry( - SymbolVersionSection, EntryIndex)); - return this->getSymbolVersionByIndex(Versym->vs_index, IsDefault); + if (Expected EntryOrErr = + ObjF->getELFFile()->template getEntry( + SymbolVersionSection, EntryIndex)) + return this->getSymbolVersionByIndex((*EntryOrErr)->vs_index, IsDefault); + else + return EntryOrErr.takeError(); } template Index: llvm/test/tools/llvm-readobj/ELF/versym-invalid.test =================================================================== --- llvm/test/tools/llvm-readobj/ELF/versym-invalid.test +++ llvm/test/tools/llvm-readobj/ELF/versym-invalid.test @@ -115,13 +115,39 @@ ## Check we report a warning when a SHT_GNU_versym section has an invalid entry size. 
# RUN: yaml2obj --docnum=5 %s -o %t5 -# RUN: llvm-readelf -V %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-GNU -# RUN: llvm-readobj -V %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-LLVM - +# RUN: llvm-readelf -V --dyn-syms %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-GNU +# RUN: llvm-readobj -V --dyn-syms %t5 2>&1 | FileCheck -DFILE=%t5 %s --check-prefix=INVALID-ENT-SIZE-LLVM + +# INVALID-ENT-SIZE-GNU: Symbol table '.dynsym' contains 2 entries: +# INVALID-ENT-SIZE-GNU-NEXT: Num: Value Size Type Bind Vis Ndx Name +# INVALID-ENT-SIZE-GNU-NEXT: warning: '[[FILE]]': section [index 1] has invalid sh_entsize: expected 2, but got 3 +# INVALID-ENT-SIZE-GNU-NEXT: 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND @ +# INVALID-ENT-SIZE-GNU-NEXT: 1: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND foo@ # INVALID-ENT-SIZE-GNU: Version symbols section '.gnu.version' contains 1 entries: # INVALID-ENT-SIZE-GNU-NEXT: Addr: 0000000000000000 Offset: 0x000040 Link: 0 () # INVALID-ENT-SIZE-GNU-NEXT: warning: '[[FILE]]': cannot read content of SHT_GNU_versym section with index 1: section [index 1] has an invalid sh_entsize: 3 +# INVALID-ENT-SIZE-LLVM: DynamicSymbols [ +# INVALID-ENT-SIZE-LLVM-NEXT: warning: '[[FILE]]': section [index 1] has invalid sh_entsize: expected 2, but got 3 +# INVALID-ENT-SIZE-LLVM-NEXT: Symbol { +# INVALID-ENT-SIZE-LLVM-NEXT: Name: @ (0) +# INVALID-ENT-SIZE-LLVM-NEXT: Value: 0x0 +# INVALID-ENT-SIZE-LLVM-NEXT: Size: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Binding: Local (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Type: None (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Other: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Section: Undefined (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: } +# INVALID-ENT-SIZE-LLVM-NEXT: Symbol { +# INVALID-ENT-SIZE-LLVM-NEXT: Name: foo@ (1) +# INVALID-ENT-SIZE-LLVM-NEXT: Value: 0x0 +# INVALID-ENT-SIZE-LLVM-NEXT: Size: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Binding: Local (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: Type: None (0x0) +# 
INVALID-ENT-SIZE-LLVM-NEXT: Other: 0 +# INVALID-ENT-SIZE-LLVM-NEXT: Section: Undefined (0x0) +# INVALID-ENT-SIZE-LLVM-NEXT: } +# INVALID-ENT-SIZE-LLVM-NEXT: ] # INVALID-ENT-SIZE-LLVM: VersionSymbols [ # INVALID-ENT-SIZE-LLVM-NEXT: warning: '[[FILE]]': cannot read content of SHT_GNU_versym section with index 1: section [index 1] has an invalid sh_entsize: 3 # INVALID-ENT-SIZE-LLVM-NEXT: ] @@ -137,6 +163,8 @@ Type: SHT_GNU_versym Entries: [ 0 ] EntSize: 3 +DynamicSymbols: + - Name: foo ## Check we report a warning when the number of version entries does not match the number of symbols in the associated symbol table. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83393.276694.patch Type: text/x-patch Size: 4072 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:48:21 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:48:21 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: jhenderson added a comment. Looks fine to me, just comments that need updating. ================ Comment at: lld/ELF/InputFiles.cpp:644 - // This block handles SHF_LINK_ORDER. + // We have the second loop. It is used to: + // 1) handle SHF_LINK_ORDER sections. ---------------- the -> a ================ Comment at: lld/ELF/InputFiles.cpp:646 + // 1) handle SHF_LINK_ORDER sections. + // 2) create SHT_REL[A] sections. In a specific case it might be possible + // to have a relocatable section that follows the corresponding relocation ---------------- I would say "create SHT_REL[A] sections. In some cases relocated sections may follow the corresponding relocation section. In such a case, the relocation section would attempt to reference a target section that has not yet been created. For simplicity, delay creation of relocation sections until now." 
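For readers following the ordering problem described in the comment above, here is a minimal sketch of the "delay creation until a second loop" idea. It is an illustration only, not the code in lld/ELF/InputFiles.cpp: the names `Kind`, `RawSection`, `Created` and `createSections` are hypothetical stand-ins, and the `info` field merely plays the role that `sh_info` plays for SHT_REL[A] sections.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT LLD's real types.
enum class Kind { Regular, Reloc };

struct RawSection {
  Kind kind;
  std::size_t info; // for Reloc: index of the section being relocated
};

struct Created {
  bool exists = false;    // has this section been created yet?
  bool hasTarget = false; // true only for relocation sections
  std::size_t target = 0; // valid only when hasTarget is true
};

// First loop: create every non-relocation section.
// Second loop: create relocation sections. By then every possible target
// exists, so the order of sections in the input no longer matters.
std::vector<Created> createSections(const std::vector<RawSection> &raw) {
  std::vector<Created> out(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i)
    if (raw[i].kind != Kind::Reloc)
      out[i].exists = true;

  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i].kind != Kind::Reloc)
      continue;
    std::size_t t = raw[i].info;
    // A real linker would report an error here; this sketch simply skips
    // relocation sections whose target is out of range or is itself a
    // relocation section.
    if (t >= raw.size() || raw[t].kind == Kind::Reloc)
      continue;
    out[i].exists = true;
    out[i].hasTarget = true;
    out[i].target = t;
  }
  return out;
}
```

With a single loop, an input of `{Reloc -> 1, Regular}` would try to resolve index 1 before it had been created; with the two loops above, the relocation section at index 0 resolves to index 1 regardless of order.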
================ Comment at: lld/ELF/InputFiles.cpp:656 + + // Create SHT_REL[A] sections. + if (sec.sh_type == SHT_REL || sec.sh_type == SHT_RELA) ---------------- I think this comment is probably superfluous - the code is simple enough to show it. ================ Comment at: lld/ELF/InputFiles.cpp:660 + + // This block handles SHF_LINK_ORDER. if (!(sec.sh_flags & SHF_LINK_ORDER)) ---------------- I think you can delete this comment. It's fairly obvious that the rest is to do with SHF_LINK_ORDER, owing to the if statement. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 From llvm-commits at lists.llvm.org Thu Jul 9 03:49:04 2020 From: llvm-commits at lists.llvm.org (Paul Walker via llvm-commits) Date: Thu, 09 Jul 2020 03:49:04 -0700 (PDT) Subject: [llvm] 614fb09 - [SVE] Disable some BUILD_VECTOR related code generator features. Message-ID: <5f06f620.1c69fb81.daa83.6777@mx.google.com> Author: Paul Walker Date: 2020-07-09T10:47:04Z New Revision: 614fb09645c8710f106dedb5f244f75ef97a1acb URL: https://github.com/llvm/llvm-project/commit/614fb09645c8710f106dedb5f244f75ef97a1acb DIFF: https://github.com/llvm/llvm-project/commit/614fb09645c8710f106dedb5f244f75ef97a1acb.diff LOG: [SVE] Disable some BUILD_VECTOR related code generator features. Fixed length vector code generation for SVE does not yet custom lower BUILD_VECTOR and instead relies on expansion. At the same time custom lowering for VECTOR_SHUFFLE is also not available so this patch updates isShuffleMaskLegal to reject vector types that require SVE. Related to this it also prevents the merging of stores after legalisation because this only works when BUILD_VECTOR is either legal or can be elminated. When this is not the case the code generator enters an infinite legalisation loop. 
Differential Revision: https://reviews.llvm.org/D83408 Added: Modified: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index 729fb8f62912..c3735c8784ca 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -8741,6 +8741,10 @@ SDValue AArch64TargetLowering::LowerINSERT_SUBVECTOR(SDValue Op, } bool AArch64TargetLowering::isShuffleMaskLegal(ArrayRef M, EVT VT) const { + // Currently no fixed length shuffles that require SVE are legal. + if (useSVEForFixedLengthVectorVT(VT)) + return false; + if (VT.getVectorNumElements() == 4 && (VT.is128BitVector() || VT.is64BitVector())) { unsigned PFIndexes[4]; diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h index 60ce88576f91..4b395acd816d 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -734,6 +734,16 @@ class AArch64TargetLowering : public TargetLowering { bool fallBackToDAGISel(const Instruction &Inst) const override; + /// SVE code generation for fixed length vectors does not custom lower + /// BUILD_VECTOR. This makes BUILD_VECTOR legalisation a source of stores to + /// merge. However, merging them creates a BUILD_VECTOR that is just as + /// illegal as the original, thus leading to an infinite legalisation loop. + /// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal + /// vector types this override can be removed. 
+ bool mergeStoresAfterLegalization(EVT VT) const override { + return !useSVEForFixedLengthVectors(); + } + private: /// Keep a pointer to the AArch64Subtarget around so that we can /// make the right decision when generating code for different targets. diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll index 52574fad8210..6c01b51a4d81 100644 --- a/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -3,6 +3,18 @@ target triple = "aarch64-unknown-linux-gnu" +; Currently there is no custom lowering for vector shuffles operating on types +; bigger than NEON. However, having no support opens us up to a code generator +; hang when expanding BUILD_VECTOR. Here we just validate the promblematic case +; successfully exits code generation. +define void @hang_when_merging_stores_after_legalisation(<8 x i32>* %a, <2 x i32> %b) #0 { +; CHECK-LABEL: hang_when_merging_stores_after_legalisation: + %splat = shufflevector <2 x i32> %b, <2 x i32> undef, <8 x i32> zeroinitializer + %interleaved.vec = shufflevector <8 x i32> %splat, <8 x i32> undef, <8 x i32> + store <8 x i32> %interleaved.vec, <8 x i32>* %a, align 4 + ret void +} + ; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in ; validating all combinations of vector type. From llvm-commits at lists.llvm.org Thu Jul 9 03:49:06 2020 From: llvm-commits at lists.llvm.org (Paul Walker via llvm-commits) Date: Thu, 09 Jul 2020 03:49:06 -0700 (PDT) Subject: [llvm] 6b40331 - [SVE] Scalarize fixed length masked loads and stores. 
Message-ID: <5f06f622.1c69fb81.6774.6bd4@mx.google.com> Author: Paul Walker Date: 2020-07-09T10:47:04Z New Revision: 6b403319f86f55f1d6e16bde58d792d696e9fa8f URL: https://github.com/llvm/llvm-project/commit/6b403319f86f55f1d6e16bde58d792d696e9fa8f DIFF: https://github.com/llvm/llvm-project/commit/6b403319f86f55f1d6e16bde58d792d696e9fa8f.diff LOG: [SVE] Scalarize fixed length masked loads and stores. When adding support for scalable vector masked loads and stores we accidentally opened up the same support for fixed length vectors. This patch restricts support to scalable vectors only, thus ensuring fixed length vectors are treated the same regardless of SVE support. Differential Revision: https://reviews.llvm.org/D83341 Added: llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-load.ll llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-store.ll Modified: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h Removed: ################################################################################ diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h index 27afb2e5a7d6..6b1e5d5083e2 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h @@ -159,10 +159,10 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> { bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info); bool isLegalMaskedLoadStore(Type *DataType, Align Alignment) { - if (!isa<VectorType>(DataType) || !ST->hasSVE()) + if (!isa<ScalableVectorType>(DataType) || !ST->hasSVE()) return false; - Type *Ty = cast<VectorType>(DataType)->getElementType(); + Type *Ty = cast<ScalableVectorType>(DataType)->getElementType(); if (Ty->isBFloatTy() || Ty->isHalfTy() || Ty->isFloatTy() || Ty->isDoubleTy()) return true; diff --git a/llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-load.ll b/llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-load.ll new file mode 100644 index
000000000000..580c403ab719 --- /dev/null +++ b/llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-load.ll @@ -0,0 +1,129 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -S %s -scalarize-masked-mem-intrin -mtriple=aarch64-linux-gnu | FileCheck %s +; RUN: opt -S %s -scalarize-masked-mem-intrin -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s + +define <2 x i64> @scalarize_v2i64(<2 x i64>* %p, <2 x i1> %mask, <2 x i64> %passthru) { +; CHECK-LABEL: @scalarize_v2i64( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64>* [[P:%.*]] to i64* +; CHECK-NEXT: [[SCALAR_MASK:%.*]] = bitcast <2 x i1> [[MASK:%.*]] to i2 +; CHECK-NEXT: [[TMP2:%.*]] = and i2 [[SCALAR_MASK]], 1 +; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i2 [[TMP2]], 0 +; CHECK-NEXT: br i1 [[TMP3]], label [[COND_LOAD:%.*]], label [[ELSE:%.*]] +; CHECK: cond.load: +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, i64* [[TMP1]], i32 0 +; CHECK-NEXT: [[TMP5:%.*]] = load i64, i64* [[TMP4]], align 8 +; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[PASSTHRU:%.*]], i64 [[TMP5]], i64 0 +; CHECK-NEXT: br label [[ELSE]] +; CHECK: else: +; CHECK-NEXT: [[RES_PHI_ELSE:%.*]] = phi <2 x i64> [ [[TMP6]], [[COND_LOAD]] ], [ [[PASSTHRU]], [[TMP0:%.*]] ] +; CHECK-NEXT: [[TMP7:%.*]] = and i2 [[SCALAR_MASK]], -2 +; CHECK-NEXT: [[TMP8:%.*]] = icmp ne i2 [[TMP7]], 0 +; CHECK-NEXT: br i1 [[TMP8]], label [[COND_LOAD1:%.*]], label [[ELSE2:%.*]] +; CHECK: cond.load1: +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, i64* [[TMP1]], i32 1 +; CHECK-NEXT: [[TMP10:%.*]] = load i64, i64* [[TMP9]], align 8 +; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i64> [[RES_PHI_ELSE]], i64 [[TMP10]], i64 1 +; CHECK-NEXT: br label [[ELSE2]] +; CHECK: else2: +; CHECK-NEXT: [[RES_PHI_ELSE3:%.*]] = phi <2 x i64> [ [[TMP11]], [[COND_LOAD1]] ], [ [[RES_PHI_ELSE]], [[ELSE]] ] +; CHECK-NEXT: ret <2 x i64> [[RES_PHI_ELSE3]] +; + %ret = call <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>* 
%p, i32 128, <2 x i1> %mask, <2 x i64> %passthru) + ret <2 x i64> %ret +} + +define <2 x i64> @scalarize_v2i64_ones_mask(<2 x i64>* %p, <2 x i64> %passthru) { +; CHECK-LABEL: @scalarize_v2i64_ones_mask( +; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[P:%.*]], align 8 +; CHECK-NEXT: ret <2 x i64> [[TMP1]] +; + %ret = call <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>* %p, i32 8, <2 x i1> , <2 x i64> %passthru) + ret <2 x i64> %ret +} + +define <2 x i64> @scalarize_v2i64_zero_mask(<2 x i64>* %p, <2 x i64> %passthru) { +; CHECK-LABEL: @scalarize_v2i64_zero_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64>* [[P:%.*]] to i64* +; CHECK-NEXT: ret <2 x i64> [[PASSTHRU:%.*]] +; + %ret = call <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>* %p, i32 8, <2 x i1> , <2 x i64> %passthru) + ret <2 x i64> %ret +} + +define <2 x i64> @scalarize_v2i64_const_mask(<2 x i64>* %p, <2 x i64> %passthru) { +; CHECK-LABEL: @scalarize_v2i64_const_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64>* [[P:%.*]] to i64* +; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, i64* [[TMP1]], i32 1 +; CHECK-NEXT: [[TMP3:%.*]] = load i64, i64* [[TMP2]], align 8 +; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> [[PASSTHRU:%.*]], i64 [[TMP3]], i64 1 +; CHECK-NEXT: ret <2 x i64> [[TMP4]] +; + %ret = call <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>* %p, i32 8, <2 x i1> , <2 x i64> %passthru) + ret <2 x i64> %ret +} + +; This use a byte sized but non power of 2 element size. This used to crash due to bad alignment calculation. 
+define <2 x i24> @scalarize_v2i24(<2 x i24>* %p, <2 x i1> %mask, <2 x i24> %passthru) { +; CHECK-LABEL: @scalarize_v2i24( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i24>* [[P:%.*]] to i24* +; CHECK-NEXT: [[SCALAR_MASK:%.*]] = bitcast <2 x i1> [[MASK:%.*]] to i2 +; CHECK-NEXT: [[TMP2:%.*]] = and i2 [[SCALAR_MASK]], 1 +; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i2 [[TMP2]], 0 +; CHECK-NEXT: br i1 [[TMP3]], label [[COND_LOAD:%.*]], label [[ELSE:%.*]] +; CHECK: cond.load: +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i24, i24* [[TMP1]], i32 0 +; CHECK-NEXT: [[TMP5:%.*]] = load i24, i24* [[TMP4]], align 1 +; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i24> [[PASSTHRU:%.*]], i24 [[TMP5]], i64 0 +; CHECK-NEXT: br label [[ELSE]] +; CHECK: else: +; CHECK-NEXT: [[RES_PHI_ELSE:%.*]] = phi <2 x i24> [ [[TMP6]], [[COND_LOAD]] ], [ [[PASSTHRU]], [[TMP0:%.*]] ] +; CHECK-NEXT: [[TMP7:%.*]] = and i2 [[SCALAR_MASK]], -2 +; CHECK-NEXT: [[TMP8:%.*]] = icmp ne i2 [[TMP7]], 0 +; CHECK-NEXT: br i1 [[TMP8]], label [[COND_LOAD1:%.*]], label [[ELSE2:%.*]] +; CHECK: cond.load1: +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i24, i24* [[TMP1]], i32 1 +; CHECK-NEXT: [[TMP10:%.*]] = load i24, i24* [[TMP9]], align 1 +; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i24> [[RES_PHI_ELSE]], i24 [[TMP10]], i64 1 +; CHECK-NEXT: br label [[ELSE2]] +; CHECK: else2: +; CHECK-NEXT: [[RES_PHI_ELSE3:%.*]] = phi <2 x i24> [ [[TMP11]], [[COND_LOAD1]] ], [ [[RES_PHI_ELSE]], [[ELSE]] ] +; CHECK-NEXT: ret <2 x i24> [[RES_PHI_ELSE3]] +; + %ret = call <2 x i24> @llvm.masked.load.v2i24.p0v2i24(<2 x i24>* %p, i32 8, <2 x i1> %mask, <2 x i24> %passthru) + ret <2 x i24> %ret +} + +; This use a byte sized but non power of 2 element size. This used to crash due to bad alignment calculation. 
+define <2 x i48> @scalarize_v2i48(<2 x i48>* %p, <2 x i1> %mask, <2 x i48> %passthru) { +; CHECK-LABEL: @scalarize_v2i48( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i48>* [[P:%.*]] to i48* +; CHECK-NEXT: [[SCALAR_MASK:%.*]] = bitcast <2 x i1> [[MASK:%.*]] to i2 +; CHECK-NEXT: [[TMP2:%.*]] = and i2 [[SCALAR_MASK]], 1 +; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i2 [[TMP2]], 0 +; CHECK-NEXT: br i1 [[TMP3]], label [[COND_LOAD:%.*]], label [[ELSE:%.*]] +; CHECK: cond.load: +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i48, i48* [[TMP1]], i32 0 +; CHECK-NEXT: [[TMP5:%.*]] = load i48, i48* [[TMP4]], align 2 +; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i48> [[PASSTHRU:%.*]], i48 [[TMP5]], i64 0 +; CHECK-NEXT: br label [[ELSE]] +; CHECK: else: +; CHECK-NEXT: [[RES_PHI_ELSE:%.*]] = phi <2 x i48> [ [[TMP6]], [[COND_LOAD]] ], [ [[PASSTHRU]], [[TMP0:%.*]] ] +; CHECK-NEXT: [[TMP7:%.*]] = and i2 [[SCALAR_MASK]], -2 +; CHECK-NEXT: [[TMP8:%.*]] = icmp ne i2 [[TMP7]], 0 +; CHECK-NEXT: br i1 [[TMP8]], label [[COND_LOAD1:%.*]], label [[ELSE2:%.*]] +; CHECK: cond.load1: +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i48, i48* [[TMP1]], i32 1 +; CHECK-NEXT: [[TMP10:%.*]] = load i48, i48* [[TMP9]], align 2 +; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i48> [[RES_PHI_ELSE]], i48 [[TMP10]], i64 1 +; CHECK-NEXT: br label [[ELSE2]] +; CHECK: else2: +; CHECK-NEXT: [[RES_PHI_ELSE3:%.*]] = phi <2 x i48> [ [[TMP11]], [[COND_LOAD1]] ], [ [[RES_PHI_ELSE]], [[ELSE]] ] +; CHECK-NEXT: ret <2 x i48> [[RES_PHI_ELSE3]] +; + %ret = call <2 x i48> @llvm.masked.load.v2i48.p0v2i48(<2 x i48>* %p, i32 16, <2 x i1> %mask, <2 x i48> %passthru) + ret <2 x i48> %ret +} + +declare <2 x i24> @llvm.masked.load.v2i24.p0v2i24(<2 x i24>*, i32, <2 x i1>, <2 x i24>) +declare <2 x i48> @llvm.masked.load.v2i48.p0v2i48(<2 x i48>*, i32, <2 x i1>, <2 x i48>) +declare <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>*, i32, <2 x i1>, <2 x i64>) diff --git 
a/llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-store.ll b/llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-store.ll new file mode 100644 index 000000000000..e3245ca34d1e --- /dev/null +++ b/llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-store.ll @@ -0,0 +1,63 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -S %s -scalarize-masked-mem-intrin -mtriple=aarch64-linux-gnu | FileCheck %s +; RUN: opt -S %s -scalarize-masked-mem-intrin -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s + +define void @scalarize_v2i64(<2 x i64>* %p, <2 x i1> %mask, <2 x i64> %data) { +; CHECK-LABEL: @scalarize_v2i64( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64>* [[P:%.*]] to i64* +; CHECK-NEXT: [[SCALAR_MASK:%.*]] = bitcast <2 x i1> [[MASK:%.*]] to i2 +; CHECK-NEXT: [[TMP2:%.*]] = and i2 [[SCALAR_MASK]], 1 +; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i2 [[TMP2]], 0 +; CHECK-NEXT: br i1 [[TMP3]], label [[COND_STORE:%.*]], label [[ELSE:%.*]] +; CHECK: cond.store: +; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[DATA:%.*]], i64 0 +; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, i64* [[TMP1]], i32 0 +; CHECK-NEXT: store i64 [[TMP4]], i64* [[TMP5]], align 8 +; CHECK-NEXT: br label [[ELSE]] +; CHECK: else: +; CHECK-NEXT: [[TMP6:%.*]] = and i2 [[SCALAR_MASK]], -2 +; CHECK-NEXT: [[TMP7:%.*]] = icmp ne i2 [[TMP6]], 0 +; CHECK-NEXT: br i1 [[TMP7]], label [[COND_STORE1:%.*]], label [[ELSE2:%.*]] +; CHECK: cond.store1: +; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[DATA]], i64 1 +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, i64* [[TMP1]], i32 1 +; CHECK-NEXT: store i64 [[TMP8]], i64* [[TMP9]], align 8 +; CHECK-NEXT: br label [[ELSE2]] +; CHECK: else2: +; CHECK-NEXT: ret void +; + call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> %data, <2 x i64>* %p, i32 128, <2 x i1> %mask) + ret void +} + +define void @scalarize_v2i64_ones_mask(<2 x i64>* %p, <2 x i64> 
%data) { +; CHECK-LABEL: @scalarize_v2i64_ones_mask( +; CHECK-NEXT: store <2 x i64> [[DATA:%.*]], <2 x i64>* [[P:%.*]], align 8 +; CHECK-NEXT: ret void +; + call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> %data, <2 x i64>* %p, i32 8, <2 x i1> ) + ret void +} + +define void @scalarize_v2i64_zero_mask(<2 x i64>* %p, <2 x i64> %data) { +; CHECK-LABEL: @scalarize_v2i64_zero_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64>* [[P:%.*]] to i64* +; CHECK-NEXT: ret void +; + call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> %data, <2 x i64>* %p, i32 8, <2 x i1> ) + ret void +} + +define void @scalarize_v2i64_const_mask(<2 x i64>* %p, <2 x i64> %data) { +; CHECK-LABEL: @scalarize_v2i64_const_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64>* [[P:%.*]] to i64* +; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[DATA:%.*]], i64 1 +; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, i64* [[TMP1]], i32 1 +; CHECK-NEXT: store i64 [[TMP2]], i64* [[TMP3]], align 8 +; CHECK-NEXT: ret void +; + call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> %data, <2 x i64>* %p, i32 8, <2 x i1> ) + ret void +} + +declare void @llvm.masked.store.v2i64.p0v2i64(<2 x i64>, <2 x i64>*, i32, <2 x i1>) From llvm-commits at lists.llvm.org Thu Jul 9 03:49:16 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:49:16 +0000 (UTC) Subject: [PATCH] D83408: [SVE] Disable some BUILD_VECTOR related code generator features. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG614fb09645c8: [SVE] Disable some BUILD_VECTOR related code generator features. (authored by paulwalker-arm). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83408/new/ https://reviews.llvm.org/D83408 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll +++ llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll @@ -3,6 +3,18 @@ target triple = "aarch64-unknown-linux-gnu" +; Currently there is no custom lowering for vector shuffles operating on types +; bigger than NEON. However, having no support opens us up to a code generator +; hang when expanding BUILD_VECTOR. Here we just validate the promblematic case +; successfully exits code generation. +define void @hang_when_merging_stores_after_legalisation(<8 x i32>* %a, <2 x i32> %b) #0 { +; CHECK-LABEL: hang_when_merging_stores_after_legalisation: + %splat = shufflevector <2 x i32> %b, <2 x i32> undef, <8 x i32> zeroinitializer + %interleaved.vec = shufflevector <8 x i32> %splat, <8 x i32> undef, <8 x i32> + store <8 x i32> %interleaved.vec, <8 x i32>* %a, align 4 + ret void +} + ; NOTE: Currently all CONCAT_VECTORS get expanded so there's little point in ; validating all combinations of vector type. Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -734,6 +734,16 @@ bool fallBackToDAGISel(const Instruction &Inst) const override; + /// SVE code generation for fixed length vectors does not custom lower + /// BUILD_VECTOR. This makes BUILD_VECTOR legalisation a source of stores to + /// merge. However, merging them creates a BUILD_VECTOR that is just as + /// illegal as the original, thus leading to an infinite legalisation loop. 
+ /// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal + /// vector types this override can be removed. + bool mergeStoresAfterLegalization(EVT VT) const override { + return !useSVEForFixedLengthVectors(); + } + private: /// Keep a pointer to the AArch64Subtarget around so that we can /// make the right decision when generating code for different targets. Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp =================================================================== --- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -8741,6 +8741,10 @@ } bool AArch64TargetLowering::isShuffleMaskLegal(ArrayRef M, EVT VT) const { + // Currently no fixed length shuffles that require SVE are legal. + if (useSVEForFixedLengthVectorVT(VT)) + return false; + if (VT.getVectorNumElements() == 4 && (VT.is128BitVector() || VT.is64BitVector())) { unsigned PFIndexes[4]; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83408.276695.patch Type: text/x-patch Size: 2778 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:49:19 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:49:19 +0000 (UTC) Subject: [PATCH] D83341: [SVE] Scalarize fixed length masked loads and stores. In-Reply-To: References: Message-ID: <9135688c5de3e5e19431bcadc889fb68@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG6b403319f86f: [SVE] Scalarize fixed length masked loads and stores. (authored by paulwalker-arm). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83341/new/ https://reviews.llvm.org/D83341 Files: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-load.ll llvm/test/Transforms/ScalarizeMaskedMemIntrin/AArch64/expand-masked-store.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83341.276696.patch Type: text/x-patch Size: 11825 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:51:16 2020 From: llvm-commits at lists.llvm.org (Xing GUO via llvm-commits) Date: Thu, 09 Jul 2020 03:51:16 -0700 (PDT) Subject: [llvm] 47c4ce4 - [DWARFYAML] Use override instead of virtual for better safety. Message-ID: <5f06f6a4.1c69fb81.f4b7c.6989@mx.google.com> Author: Xing GUO Date: 2020-07-09T18:55:42+08:00 New Revision: 47c4ce41a16412efe49da1715be3144861cbf50a URL: https://github.com/llvm/llvm-project/commit/47c4ce41a16412efe49da1715be3144861cbf50a DIFF: https://github.com/llvm/llvm-project/commit/47c4ce41a16412efe49da1715be3144861cbf50a.diff LOG: [DWARFYAML] Use override instead of virtual for better safety. Functions in DWARFYAML::FixupVisitor are declared as virtual functions in its base class DWARFYAML::Visitor. We should use the modern "override" keyword instead of "virtual" for virtual functions in subclasses for better safety. Besides, the visibility is changed from private to protected to make it consistent with the DWARFYAML::FixupVisitor class and the DWARFYAML::Visitor class.
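To illustrate the safety argument in the log message, here is a minimal sketch. The class names `Visitor` and `LengthVisitor` and the helper `countBytes` are hypothetical stand-ins chosen for this example, not the actual DWARFYAML code; the point is only that `override` lets the compiler reject a signature that does not match any base-class virtual.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a visitor base class and a length-computing
// subclass; these are illustrative only, not the real LLVM classes.
class Visitor {
public:
  virtual ~Visitor() = default;
  // Public entry point that dispatches to the protected virtual hook.
  void visit(uint8_t V) { onValue(V); }

protected:
  virtual void onValue(const uint8_t U) {}
};

class LengthVisitor : public Visitor {
public:
  uint64_t Length = 0;

protected:
  // `override` makes the compiler verify that this signature matches a
  // virtual function declared in the base class.
  void onValue(const uint8_t U) override { Length += 1; }

  // Uncommenting the next line is a compile error, because no base-class
  // virtual takes `uint16_t`. With plain `virtual` it would instead silently
  // declare a new, unrelated function.
  // void onValue(const uint16_t U) override { Length += 2; }
};

// Drives the visitor through the base-class interface and returns the
// accumulated length (one byte per visited value).
inline uint64_t countBytes(int N) {
  LengthVisitor LV;
  Visitor &V = LV;
  for (int I = 0; I < N; ++I)
    V.visit(static_cast<uint8_t>(I));
  return LV.Length;
}
```

Calling `countBytes(3)` goes through `Visitor::visit`, so the length is only updated if the subclass function really overrides the base hook, which is exactly what `override` guarantees at compile time.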
Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D83452 Added: Modified: llvm/lib/ObjectYAML/DWARFEmitter.cpp Removed: ################################################################################ diff --git a/llvm/lib/ObjectYAML/DWARFEmitter.cpp b/llvm/lib/ObjectYAML/DWARFEmitter.cpp index 6cbb90bb10f2..a8b467af7b2d 100644 --- a/llvm/lib/ObjectYAML/DWARFEmitter.cpp +++ b/llvm/lib/ObjectYAML/DWARFEmitter.cpp @@ -440,36 +440,36 @@ class DIEFixupVisitor : public DWARFYAML::Visitor { public: DIEFixupVisitor(DWARFYAML::Data &DI) : DWARFYAML::Visitor(DI){}; -private: - virtual void onStartCompileUnit(DWARFYAML::Unit &CU) { +protected: + void onStartCompileUnit(DWARFYAML::Unit &CU) override { // Size of the unit header, excluding the length field itself. Length = CU.Version >= 5 ? 8 : 7; } - virtual void onEndCompileUnit(DWARFYAML::Unit &CU) { CU.Length = Length; } + void onEndCompileUnit(DWARFYAML::Unit &CU) override { CU.Length = Length; } - virtual void onStartDIE(DWARFYAML::Unit &CU, DWARFYAML::Entry &DIE) { + void onStartDIE(DWARFYAML::Unit &CU, DWARFYAML::Entry &DIE) override { Length += getULEB128Size(DIE.AbbrCode); } - virtual void onValue(const uint8_t U) { Length += 1; } - virtual void onValue(const uint16_t U) { Length += 2; } - virtual void onValue(const uint32_t U) { Length += 4; } - virtual void onValue(const uint64_t U, const bool LEB = false) { + void onValue(const uint8_t U) override { Length += 1; } + void onValue(const uint16_t U) override { Length += 2; } + void onValue(const uint32_t U) override { Length += 4; } + void onValue(const uint64_t U, const bool LEB = false) override { if (LEB) Length += getULEB128Size(U); else Length += 8; } - virtual void onValue(const int64_t S, const bool LEB = false) { + void onValue(const int64_t S, const bool LEB = false) override { if (LEB) Length += getSLEB128Size(S); else Length += 8; } - virtual void onValue(const StringRef String) { Length += String.size() + 1; } + void onValue(const 
StringRef String) override { Length += String.size() + 1; } - virtual void onValue(const MemoryBufferRef MBR) { + void onValue(const MemoryBufferRef MBR) override { Length += MBR.getBufferSize(); } }; From llvm-commits at lists.llvm.org Thu Jul 9 03:51:23 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:51:23 +0000 (UTC) Subject: [PATCH] D83452: [DWARFYAML] Use override instead of virtual for better safety. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG47c4ce41a164: [DWARFYAML] Use override instead of virtual for better safety. (authored by Higuoxing). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83452/new/ https://reviews.llvm.org/D83452 Files: llvm/lib/ObjectYAML/DWARFEmitter.cpp Index: llvm/lib/ObjectYAML/DWARFEmitter.cpp =================================================================== --- llvm/lib/ObjectYAML/DWARFEmitter.cpp +++ llvm/lib/ObjectYAML/DWARFEmitter.cpp @@ -440,36 +440,36 @@ public: DIEFixupVisitor(DWARFYAML::Data &DI) : DWARFYAML::Visitor(DI){}; -private: - virtual void onStartCompileUnit(DWARFYAML::Unit &CU) { +protected: + void onStartCompileUnit(DWARFYAML::Unit &CU) override { // Size of the unit header, excluding the length field itself. Length = CU.Version >= 5 ? 
8 : 7; } - virtual void onEndCompileUnit(DWARFYAML::Unit &CU) { CU.Length = Length; } + void onEndCompileUnit(DWARFYAML::Unit &CU) override { CU.Length = Length; } - virtual void onStartDIE(DWARFYAML::Unit &CU, DWARFYAML::Entry &DIE) { + void onStartDIE(DWARFYAML::Unit &CU, DWARFYAML::Entry &DIE) override { Length += getULEB128Size(DIE.AbbrCode); } - virtual void onValue(const uint8_t U) { Length += 1; } - virtual void onValue(const uint16_t U) { Length += 2; } - virtual void onValue(const uint32_t U) { Length += 4; } - virtual void onValue(const uint64_t U, const bool LEB = false) { + void onValue(const uint8_t U) override { Length += 1; } + void onValue(const uint16_t U) override { Length += 2; } + void onValue(const uint32_t U) override { Length += 4; } + void onValue(const uint64_t U, const bool LEB = false) override { if (LEB) Length += getULEB128Size(U); else Length += 8; } - virtual void onValue(const int64_t S, const bool LEB = false) { + void onValue(const int64_t S, const bool LEB = false) override { if (LEB) Length += getSLEB128Size(S); else Length += 8; } - virtual void onValue(const StringRef String) { Length += String.size() + 1; } + void onValue(const StringRef String) override { Length += String.size() + 1; } - virtual void onValue(const MemoryBufferRef MBR) { + void onValue(const MemoryBufferRef MBR) override { Length += MBR.getBufferSize(); } }; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83452.276697.patch Type: text/x-patch Size: 2020 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 03:53:49 2020 From: llvm-commits at lists.llvm.org (Andrew Ng via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 10:53:49 +0000 (UTC) Subject: [PATCH] D83321: [Support] Fix utf16 path's index upper bound In-Reply-To: References: Message-ID: <8f0d2c107e1e1024e2c10b4d79b8ec2c@localhost.localdomain> andrewng added a comment. In D83321#2140620 , @danix800 wrote: > Full context attached @andrewng . 
Thanks for doing that; it makes it easier for reviewers to look around the code without having to load up the file in question. Perhaps this particular use case needs to be added to the unit tests ("llvm/unittests/Support/Path.cpp")? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83321/new/ https://reviews.llvm.org/D83321 From llvm-commits at lists.llvm.org Thu Jul 9 04:03:48 2020 From: llvm-commits at lists.llvm.org (Sourabh Singh Tomar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:03:48 +0000 (UTC) Subject: [PATCH] D82084: [DebugInfo] Refactored out `debug_line.dwo` emission from `DwarfTypeUnit` to `DwarfDebug` In-Reply-To: References: Message-ID: SouraVX added a comment. Apologies if I'm pinging too frequently :). Gentle ping to all reviewers again! :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82084/new/ https://reviews.llvm.org/D82084 From llvm-commits at lists.llvm.org Thu Jul 9 04:06:17 2020 From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:06:17 +0000 (UTC) Subject: [PATCH] D82638: [MachineCopyPropagation] BackwardPropagatableCopy: add check for hasOverlappingMultipleDef In-Reply-To: References: Message-ID: lkail added a comment.
I used your previous `mcp-dest-regs-no-dup.ll` and run llc -simplify-mir -verify-machineinstrs -mtriple=arm-eabi -stop-before=machine-cp mcp-dest-regs-no-dup.ll -o mcp-dest-regs-no-dup.mir It produces --- | ; ModuleID = 'mcp-dest-regs-no-dup.ll' source_filename = "mcp-dest-regs-no-dup.ll" target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64" target triple = "arm-unknown-unknown-eabi" @a = hidden local_unnamed_addr global i32 0, align 4 @b = hidden local_unnamed_addr global i32 0, align 4 @f = hidden local_unnamed_addr global i32 0, align 4 @c = hidden local_unnamed_addr global i32 0, align 4 @d = hidden local_unnamed_addr global i32 0, align 4 @e = hidden local_unnamed_addr global i32 0, align 4 @m = hidden local_unnamed_addr global i32 0, align 4 @g = hidden local_unnamed_addr global i32* null, align 4 define hidden void @h() local_unnamed_addr #0 { call void asm sideeffect "", ""() #1 br label %11 1: ; preds = %13 call void asm sideeffect "", ""() #1 %2 = load i32, i32* @a, align 4 %3 = udiv i32 %2, 45 %4 = load i32, i32* @b, align 4 %5 = load i32, i32* @f, align 4 %6 = icmp ult i32 %4, %5 %7 = zext i1 %6 to i32 %8 = icmp ule i32 %3, %7 %9 = zext i1 %8 to i32 %10 = icmp ult i32 %14, %9 br i1 %10, label %18, label %11 11: ; preds = %1, %0 store i32 2, i32* @f, align 4 %12 = load i32*, i32** @g, align 4 br label %13 13: ; preds = %13, %11 %14 = load i32, i32* @c, align 4 store i32 11, i32* @d, align 4 store i32 0, i32* @e, align 4 store i32 1, i32* @b, align 4 %15 = load i32, i32* @m, align 4 store i32 %15, i32* %12, align 4 %16 = load i32, i32* @f, align 4 %17 = icmp eq i32 %16, 0 br i1 %17, label %1, label %13 18: ; preds = %1 ret void } ; Function Attrs: nounwind declare void @llvm.stackprotector(i8*, i8**) #1 attributes #0 = { "target-features"="+armv8-a,-fpregs" } attributes #1 = { nounwind } ... 
--- name: h alignment: 4 tracksRegLiveness: true frameInfo: maxAlignment: 1 maxCallFrameSize: 0 machineFunctionInfo: {} body: | bb.0 (%ir-block.0): INLINEASM &"", 1 /* sideeffect attdialect */ renamable $r10 = MOVi32imm @f renamable $r8 = MOVi32imm @d renamable $r11 = MOVi 11, 14 /* CC::al */, $noreg, $noreg renamable $r2 = MOVi32imm @e renamable $r4 = MOVi 0, 14 /* CC::al */, $noreg, $noreg renamable $r5 = MOVi32imm @b renamable $r6 = MOVi 1, 14 /* CC::al */, $noreg, $noreg renamable $r7 = MOVi32imm @c renamable $r3 = MOVi32imm @m B %bb.2 bb.1 (%ir-block.1): successors: %bb.4(0x04000000), %bb.2(0x7c000000) liveins: $r1, $r2, $r3, $r4, $r5, $r6, $r7, $r8, $r10, $r11 INLINEASM &"", 1 /* sideeffect attdialect */ renamable $r12 = LDRi12 renamable $r10, 0, 14 /* CC::al */, $noreg :: (dereferenceable load 4 from @f) renamable $lr = LDRi12 renamable $r5, 0, 14 /* CC::al */, $noreg :: (dereferenceable load 4 from @b) CMPrr killed renamable $lr, killed renamable $r12, 14 /* CC::al */, $noreg, implicit-def $cpsr renamable $r12 = MOVi 0, 14 /* CC::al */, $noreg, $noreg renamable $r12 = MOVCCi16 killed renamable $r12, 1, 3 /* CC::lo */, killed $cpsr renamable $r0 = MOVi32imm @a renamable $lr = LDRi12 killed renamable $r0, 0, 14 /* CC::al */, $noreg :: (dereferenceable load 4 from @a) renamable $r0 = MOVi32imm 1813430637 dead renamable $r9, renamable $r0 = UMULL renamable $lr, killed renamable $r0, 14 /* CC::al */, $noreg, $noreg renamable $r9 = COPY killed renamable $r0 renamable $r0 = SUBrr killed renamable $lr, renamable $r9, 14 /* CC::al */, $noreg, $noreg renamable $r0 = ADDrsi killed renamable $r9, killed renamable $r0, 11, 14 /* CC::al */, $noreg, $noreg CMPrsi killed renamable $r12, killed renamable $r0, 43, 14 /* CC::al */, $noreg, implicit-def $cpsr renamable $r0 = MOVi 0, 14 /* CC::al */, $noreg, $noreg renamable $r0 = MOVCCi16 killed renamable $r0, 1, 2 /* CC::hs */, killed $cpsr CMPrr killed renamable $r1, killed renamable $r0, 14 /* CC::al */, $noreg, 
implicit-def $cpsr Bcc %bb.4, 3 /* CC::lo */, killed $cpsr B %bb.2 bb.2 (%ir-block.11): liveins: $r2, $r3, $r4, $r5, $r6, $r7, $r8, $r10, $r11 renamable $r0 = MOVi 2, 14 /* CC::al */, $noreg, $noreg STRi12 killed renamable $r0, renamable $r10, 0, 14 /* CC::al */, $noreg :: (store 4 into @f) renamable $r0 = MOVi32imm @g renamable $r12 = LDRi12 killed renamable $r0, 0, 14 /* CC::al */, $noreg :: (dereferenceable load 4 from @g) bb.3 (%ir-block.13): successors: %bb.1(0x04000000), %bb.3(0x7c000000) liveins: $r2, $r3, $r4, $r5, $r6, $r7, $r8, $r10, $r11, $r12 STRi12 renamable $r11, renamable $r8, 0, 14 /* CC::al */, $noreg :: (store 4 into @d) STRi12 renamable $r4, renamable $r2, 0, 14 /* CC::al */, $noreg :: (store 4 into @e) STRi12 renamable $r6, renamable $r5, 0, 14 /* CC::al */, $noreg :: (store 4 into @b) renamable $r1 = LDRi12 renamable $r7, 0, 14 /* CC::al */, $noreg :: (dereferenceable load 4 from @c) renamable $r0 = LDRi12 renamable $r3, 0, 14 /* CC::al */, $noreg :: (dereferenceable load 4 from @m) STRi12 killed renamable $r0, renamable $r12, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.12) renamable $r0 = LDRi12 renamable $r10, 0, 14 /* CC::al */, $noreg :: (dereferenceable load 4 from @f) CMPri killed renamable $r0, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr Bcc %bb.1, 0 /* CC::eq */, killed $cpsr B %bb.3 bb.4 (%ir-block.18): BX_RET 14 /* CC::al */, $noreg ... Can you double-check it? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82638/new/ https://reviews.llvm.org/D82638 From llvm-commits at lists.llvm.org Thu Jul 9 04:15:01 2020 From: llvm-commits at lists.llvm.org (Kai Nacke via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:15:01 +0000 (UTC) Subject: [PATCH] D83472: [SystemZ/ZOS] Add header file to encapsulate use of <sysexits.h> Message-ID: Kai created this revision. Kai added reviewers: rnk, jfb, majnemer, ilya-biryukov, aganea, hubert.reinterpretcast, kbarton, yusra.syeda, uweigand.
Herald added subscribers: llvm-commits, dexonsmith, hiraditya, mgorny. Herald added a project: LLVM. The non-standard header file `<sysexits.h>` provides some return values. `EX_IOERR` is used as a special value to signal a broken pipe to the clang driver. On z/OS Unix System Services, this header file does not exist. This patch: - adds a check for `<sysexits.h>`, removing the dependency on `LLVM_ON_UNIX` - adds a new header file `llvm/Support/ExitCodes.h`, which either includes `<sysexits.h>` or defines `EX_IOERR` - updates the users of `EX_IOERR` to include the new header file https://reviews.llvm.org/D83472 Files: clang/lib/Driver/Driver.cpp llvm/cmake/config-ix.cmake llvm/include/llvm/Support/ExitCodes.h llvm/lib/Support/CrashRecoveryContext.cpp llvm/lib/Support/Unix/Signals.inc -------------- next part -------------- A non-text attachment was scrubbed... Name: D83472.276699.patch Type: text/x-patch Size: 3806 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 04:19:14 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:19:14 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: RithikSharma updated this revision to Diff 276701. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 Files: llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h llvm/lib/Transforms/Utils/CodeMoverUtils.cpp llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83311.276701.patch Type: text/x-patch Size: 18617 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 04:23:59 2020 From: llvm-commits at lists.llvm.org (zuojian lin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:23:59 +0000 (UTC) Subject: [PATCH] D81631: Fix undefined behavior in Dwarf.
In-Reply-To: References: Message-ID: linzj added a comment. > - bool SameAsPrevCU = this == DD->getPrevCU(); + const DwarfCompileUnit *PrevCU = DD->getPrevCU(); Err, once you invoke getPrevCU, a load of the undefined value will still be emitted, so valgrind will still report this problem. Valgrind doesn't like loads of undefined values. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81631/new/ https://reviews.llvm.org/D81631 From llvm-commits at lists.llvm.org Thu Jul 9 04:28:21 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:28:21 +0000 (UTC) Subject: [PATCH] D83387: [llvm-readobj] - Add a generic test for --dyn-relocations and fix an issue. In-Reply-To: References: Message-ID: <1916681d6b04b8b4f2462960839a0703@localhost.localdomain> grimar updated this revision to Diff 276702. grimar marked 5 inline comments as done. grimar added a comment. - Addressed review comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83387/new/ https://reviews.llvm.org/D83387 Files: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83387.276702.patch Type: text/x-patch Size: 6676 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 04:28:38 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:28:38 +0000 (UTC) Subject: [PATCH] D83387: [llvm-readobj] - Add a generic test for --dyn-relocations and fix an issue. In-Reply-To: References: Message-ID: <1feb267f9694b1e29ebce1484c84216a@localhost.localdomain> grimar added inline comments. ================ Comment at: llvm/test/tools/llvm-readobj/ELF/dynamic-reloc.test:59 + +--- !ELF +FileHeader: ---------------- MaskRay wrote: > Can the test be merged with dynamic-reloc-no-section-headers.test?
> > We also miss a warning: `'[[FILE]]': section header string table index xxx does not exist` > Can the test be merged with dynamic-reloc-no-section-headers.test? Yes, but I think we should merge `dynamic-reloc-no-section-headers.test` (specific case) -> `here` (general test) afterwards. `dynamic-reloc-no-section-headers.test` tests the case when there are no section headers. It is a sub-case of `--dyn-relocations` and it is implemented with: ``` ## We simulate no section header table by ## overriding the ELF header properties. SHOff: 0x0 SHNum: 0x0 ``` But now we have a different way to do this: ``` SectionHeaderTable: NoHeaders: true ``` So in a follow-up we should be able to remove the `dynamic-reloc-no-section-headers.test` and just extend this test to do something like: ``` SectionHeaderTable: NoHeaders: [[NOHEADERS]] ``` > We also miss a warning: '[[FILE]]': section header string table index xxx does not exist Why should we emit it? We have a `.shstrtab` section in this object (implicitly created). I've added `--implicit-check-not=warning:` just in case, to demonstrate that we expect no warnings here. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83387/new/ https://reviews.llvm.org/D83387 From llvm-commits at lists.llvm.org Thu Jul 9 04:31:24 2020 From: llvm-commits at lists.llvm.org (Hal Finkel via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:31:24 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <3c63907c76add4abb861044770f54a28@localhost.localdomain> hfinkel accepted this revision. hfinkel added a comment. This revision is now accepted and ready to land. Given that compile time looks okay (and it's clearly doing something, as it's causing more simplification (or, at least, fewer instructions)), as does correctness, this LGTM.
If we notice any significant performance problems, we can revert (but given that we're relatively close to branching, we should commit this sooner rather than later to give the maximum opportunity to spot such things). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Thu Jul 9 04:35:50 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:35:50 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: <3dd3cce5ea348b0f73c1325933952b62@localhost.localdomain> grimar updated this revision to Diff 276703. grimar marked 4 inline comments as done. grimar added a comment. - Addressed review comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 Files: lld/ELF/InputFiles.cpp lld/test/ELF/invalid/reloc-section-reordered.test lld/test/ELF/reloc-sec-before-target.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D83469.276703.patch Type: text/x-patch Size: 3731 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 04:39:48 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:39:48 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: <662a2510687e0595621582698118d227@localhost.localdomain> jhenderson accepted this revision. jhenderson added a comment. This revision is now accepted and ready to land. LGTM (with nits), but I'd like a second opinion from somebody with fresher LLD knowledge. ================ Comment at: lld/ELF/InputFiles.cpp:646 + // 1) handle SHF_LINK_ORDER sections. + // 2) create SHT_REL[A} sections. 
In some cases relocated sections may follow + // the corresponding relocation section. In such a case, the relocation ---------------- Typo? '}' -> ']' ================ Comment at: lld/ELF/InputFiles.cpp:647 + // 2) create SHT_REL[A} sections. In some cases relocated sections may follow + // the corresponding relocation section. In such a case, the relocation + // section would attempt to reference a target section that has not yet ---------------- Oops, my bad: "a case" -> "cases" CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 From llvm-commits at lists.llvm.org Thu Jul 9 04:45:50 2020 From: llvm-commits at lists.llvm.org (Ilya Leoshkevich via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:45:50 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <03240e00ca5b4f40ce6c205c7dfd7846@localhost.localdomain> iii updated this revision to Diff 276704. iii added a comment. Thanks for the review! I've changed the return types back to void and added addUnknownType() calls where needed. I've also restored llvm_unreachable() for unknown DIType subclasses. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 Files: llvm/lib/Target/BPF/BTFDebug.cpp llvm/lib/Target/BPF/BTFDebug.h llvm/test/CodeGen/BPF/BTF/double.ll llvm/test/CodeGen/BPF/BTF/float.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83289.276704.patch Type: text/x-patch Size: 11418 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 04:46:21 2020 From: llvm-commits at lists.llvm.org (Boris Brezillon via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:46:21 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition Message-ID: bbrezillon created this revision. Herald added subscribers: llvm-commits, jvesely, Anastasia. 
Herald added a project: LLVM. Fix FP_ILOGBNAN definition to match the opencl-c-base.h one and guarantee that FP_ILOGBNAN and FP_ILOGB0 are different. Doing that implies fixing ilogb() implementation to return the right value. Signed-off-by: Boris Brezillon Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83473 Files: libclc/generic/include/clc/float/definitions.h libclc/generic/lib/math/ilogb.cl Index: libclc/generic/lib/math/ilogb.cl =================================================================== --- libclc/generic/lib/math/ilogb.cl +++ libclc/generic/lib/math/ilogb.cl @@ -31,7 +31,15 @@ int rs = -118 - (int) clz(ux & MANTBITS_SP32); int r = (int) (ax >> EXPSHIFTBITS_SP32) - EXPBIAS_SP32; r = ax < 0x00800000U ? rs : r; - r = ax > EXPBITS_SP32 | ax == 0 ? 0x80000000 : r; + r = ax == 0 ? FP_ILOGBNAN : r; + + // We could merge those 2 tests and have: + // + // r = ax >= EXPBITS_SP32 ? 0x7fffffff : r + // + // since FP_ILOGBNAN is set to INT_MAX, but it's clearer this way and + // FP_ILOGBNAN can change without requiring changes to ilogb() code. + r = ax > EXPBITS_SP32 ? FP_ILOGBNAN : r; r = ax == EXPBITS_SP32 ? 0x7fffffff : r; return r; } Index: libclc/generic/include/clc/float/definitions.h =================================================================== --- libclc/generic/include/clc/float/definitions.h +++ libclc/generic/include/clc/float/definitions.h @@ -15,7 +15,7 @@ #define FLT_EPSILON 0x1.0p-23f #define FP_ILOGB0 (-2147483647 - 1) -#define FP_ILOGBNAN (-2147483647 - 1) +#define FP_ILOGBNAN 2147483647 #define M_E_F 0x1.5bf0a8p+1f #define M_LOG2E_F 0x1.715476p+0f -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83473.276705.patch Type: text/x-patch Size: 1286 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 04:52:08 2020 From: llvm-commits at lists.llvm.org (Benjamin Kramer via llvm-commits) Date: Thu, 09 Jul 2020 04:52:08 -0700 (PDT) Subject: [llvm] b444705 - Make helpers static. NFC. Message-ID: <5f0704e8.1c69fb81.ee268.6428@mx.google.com> Author: Benjamin Kramer Date: 2020-07-09T13:48:56+02:00 New Revision: b44470547e2ec8a52abb67c3f538ecc49ee27970 URL: https://github.com/llvm/llvm-project/commit/b44470547e2ec8a52abb67c3f538ecc49ee27970 DIFF: https://github.com/llvm/llvm-project/commit/b44470547e2ec8a52abb67c3f538ecc49ee27970.diff LOG: Make helpers static. NFC. Added: Modified: clang/lib/AST/ASTImporter.cpp clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp llvm/lib/IR/AutoUpgrade.cpp llvm/lib/MC/MCDisassembler/MCDisassembler.cpp llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Transforms/Scalar/LowerExpectIntrinsic.cpp llvm/lib/Transforms/Vectorize/VectorCombine.cpp mlir/lib/Conversion/SPIRVToLLVM/ConvertSPIRVToLLVM.cpp mlir/lib/Dialect/Affine/IR/AffineOps.cpp mlir/lib/Dialect/Linalg/Transforms/Loops.cpp mlir/test/lib/Transforms/TestLinalgTransforms.cpp Removed: ################################################################################ diff --git a/clang/lib/AST/ASTImporter.cpp b/clang/lib/AST/ASTImporter.cpp index 8ec6db622f0a..fa2421ee826e 100644 --- a/clang/lib/AST/ASTImporter.cpp +++ b/clang/lib/AST/ASTImporter.cpp @@ -3655,9 +3655,9 @@ struct FriendCountAndPosition { }; template -FriendCountAndPosition getFriendCountAndPosition( +static FriendCountAndPosition getFriendCountAndPosition( const FriendDecl *FD, - std::function GetCanTypeOrDecl) { + llvm::function_ref GetCanTypeOrDecl) { unsigned int FriendCount = 0; llvm::Optional FriendPosition; const auto *RD = cast(FD->getLexicalDeclContext()); @@ -3679,7 +3679,7 @@ FriendCountAndPosition getFriendCountAndPosition( return {FriendCount, *FriendPosition}; } 
-FriendCountAndPosition getFriendCountAndPosition(const FriendDecl *FD) { +static FriendCountAndPosition getFriendCountAndPosition(const FriendDecl *FD) { if (FD->getFriendType()) return getFriendCountAndPosition(FD, [](const FriendDecl *F) { if (TypeSourceInfo *TSI = F->getFriendType()) diff --git a/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp index 29393c2ca02b..8b575f4f4759 100644 --- a/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp @@ -726,7 +726,8 @@ StdLibraryFunctionsChecker::findFunctionSummary(const CallEvent &Call, return findFunctionSummary(FD, C); } -llvm::Optional lookupType(StringRef Name, const ASTContext &ACtx) { +static llvm::Optional lookupType(StringRef Name, + const ASTContext &ACtx) { IdentifierInfo &II = ACtx.Idents.get(Name); auto LookupRes = ACtx.getTranslationUnitDecl()->lookup(&II); if (LookupRes.size() == 0) diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp index 3179cb5b4e36..1e8fdb506619 100644 --- a/llvm/lib/IR/AutoUpgrade.cpp +++ b/llvm/lib/IR/AutoUpgrade.cpp @@ -4167,7 +4167,7 @@ void llvm::UpgradeSectionAttributes(Module &M) { } } - +namespace { // Prior to LLVM 10.0, the strictfp attribute could be used on individual // callsites within a function that did not also have the strictfp attribute. // Since 10.0, if strict FP semantics are needed within a function, the @@ -4185,7 +4185,7 @@ struct StrictFPUpgradeVisitor : public InstVisitor { void visitCallBase(CallBase &Call) { if (!Call.isStrictFP()) return; - if (dyn_cast(&Call)) + if (isa(&Call)) return; // If we get here, the caller doesn't have the strictfp attribute // but this callsite does. Replace the strictfp attribute with nobuiltin. 
@@ -4193,6 +4193,7 @@ struct StrictFPUpgradeVisitor : public InstVisitor { Call.addAttribute(AttributeList::FunctionIndex, Attribute::NoBuiltin); } }; +} // namespace void llvm::UpgradeFunctionAttributes(Function &F) { // If a function definition doesn't have the strictfp attribute, diff --git a/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp b/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp index 9cdacb64c4f4..a58e8f6d9bcc 100644 --- a/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp +++ b/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp @@ -47,7 +47,7 @@ void MCDisassembler::setSymbolizer(std::unique_ptr Symzer) { case XCOFF::XMC_##A: \ return P; -uint8_t getSMCPriority(XCOFF::StorageMappingClass SMC) { +static uint8_t getSMCPriority(XCOFF::StorageMappingClass SMC) { switch (SMC) { SMC_PCASE(PR, 1) SMC_PCASE(RO, 1) diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 017bfba94b61..9f3321922d6a 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -42429,7 +42429,7 @@ static SDValue PromoteMaskArithmetic(SDNode *N, SelectionDAG &DAG, } } -unsigned convertIntLogicToFPLogicOpcode(unsigned Opcode) { +static unsigned convertIntLogicToFPLogicOpcode(unsigned Opcode) { unsigned FPOpcode; switch (Opcode) { default: llvm_unreachable("Unexpected input node for FP logic conversion"); diff --git a/llvm/lib/Transforms/Scalar/LowerExpectIntrinsic.cpp b/llvm/lib/Transforms/Scalar/LowerExpectIntrinsic.cpp index 05db70c787bb..0fe7dd9cfb39 100644 --- a/llvm/lib/Transforms/Scalar/LowerExpectIntrinsic.cpp +++ b/llvm/lib/Transforms/Scalar/LowerExpectIntrinsic.cpp @@ -55,8 +55,8 @@ static cl::opt UnlikelyBranchWeight( "unlikely-branch-weight", cl::Hidden, cl::init(1), cl::desc("Weight of the branch unlikely to be taken (default = 1)")); -std::tuple getBranchWeight(Intrinsic::ID IntrinsicID, - CallInst *CI, int BranchCount) { +static std::tuple +getBranchWeight(Intrinsic::ID IntrinsicID, CallInst *CI, 
int BranchCount) { if (IntrinsicID == Intrinsic::expect) { // __builtin_expect return std::make_tuple(LikelyBranchWeight.getValue(), diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp index 32332ed5b02d..64b41bf9cefa 100644 --- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp +++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp @@ -50,6 +50,7 @@ static cl::opt DisableBinopExtractShuffle( static const unsigned InvalidIndex = std::numeric_limits::max(); +namespace { class VectorCombine { public: VectorCombine(Function &F, const TargetTransformInfo &TTI, @@ -80,6 +81,7 @@ class VectorCombine { bool scalarizeBinopOrCmp(Instruction &I); bool foldExtractedCmps(Instruction &I); }; +} // namespace static void replaceValue(Value &Old, Value &New) { Old.replaceAllUsesWith(&New); diff --git a/mlir/lib/Conversion/SPIRVToLLVM/ConvertSPIRVToLLVM.cpp b/mlir/lib/Conversion/SPIRVToLLVM/ConvertSPIRVToLLVM.cpp index 52fa2091389e..6d0028b38ec2 100644 --- a/mlir/lib/Conversion/SPIRVToLLVM/ConvertSPIRVToLLVM.cpp +++ b/mlir/lib/Conversion/SPIRVToLLVM/ConvertSPIRVToLLVM.cpp @@ -71,7 +71,7 @@ static unsigned getLLVMTypeBitWidth(LLVM::LLVMType type) { } /// Creates `IntegerAttribute` with all bits set for given type -IntegerAttr minusOneIntegerAttribute(Type type, Builder builder) { +static IntegerAttr minusOneIntegerAttribute(Type type, Builder builder) { if (auto vecType = type.dyn_cast()) { auto integerType = vecType.getElementType().cast(); return builder.getIntegerAttr(integerType, -1); diff --git a/mlir/lib/Dialect/Affine/IR/AffineOps.cpp b/mlir/lib/Dialect/Affine/IR/AffineOps.cpp index 3f10e744f419..4367fa39789c 100644 --- a/mlir/lib/Dialect/Affine/IR/AffineOps.cpp +++ b/mlir/lib/Dialect/Affine/IR/AffineOps.cpp @@ -1690,7 +1690,7 @@ void mlir::extractForInductionVars(ArrayRef forInsts, /// Builds an affine loop nest, using "loopCreatorFn" to create individual loop /// operations. 
template -void buildAffineLoopNestImpl( +static void buildAffineLoopNestImpl( OpBuilder &builder, Location loc, BoundListTy lbs, BoundListTy ubs, ArrayRef steps, function_ref bodyBuilderFn, diff --git a/mlir/lib/Dialect/Linalg/Transforms/Loops.cpp b/mlir/lib/Dialect/Linalg/Transforms/Loops.cpp index 6cbe947657a0..6a1d00fe620c 100644 --- a/mlir/lib/Dialect/Linalg/Transforms/Loops.cpp +++ b/mlir/lib/Dialect/Linalg/Transforms/Loops.cpp @@ -609,8 +609,8 @@ mlir::createConvertLinalgToAffineLoopsPass() { // TODO: gradually remove this layer as more ops become "named". template -Optional linalgOpToLoopsImplSwitch(Operation *op, - OpBuilder &builder) { +static Optional linalgOpToLoopsImplSwitch(Operation *op, + OpBuilder &builder) { assert(isa(op) && "LinalgOp expected"); if (isa(op)) return linalgOpToLoopsImpl(op, builder); diff --git a/mlir/test/lib/Transforms/TestLinalgTransforms.cpp b/mlir/test/lib/Transforms/TestLinalgTransforms.cpp index 44a0e8cbdb14..f93cd9faa504 100644 --- a/mlir/test/lib/Transforms/TestLinalgTransforms.cpp +++ b/mlir/test/lib/Transforms/TestLinalgTransforms.cpp @@ -244,8 +244,8 @@ static LogicalResult copyCallBackFn(OpBuilder &b, Value src, Value dst, return success(); } -void fillPromotionCallBackPatterns(MLIRContext *ctx, - OwningRewritePatternList &patterns) { +static void fillPromotionCallBackPatterns(MLIRContext *ctx, + OwningRewritePatternList &patterns) { patterns.insert>( ctx, LinalgTilingOptions().setTileSizes({16, 16, 16}), LinalgMarker(Identifier::get("START", ctx), From llvm-commits at lists.llvm.org Thu Jul 9 04:58:07 2020 From: llvm-commits at lists.llvm.org (Boris Brezillon via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 11:58:07 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: <085c909301e903c400a80fb60a834c7b@localhost.localdomain> bbrezillon marked an inline comment as done. bbrezillon added inline comments. 
================ Comment at: libclc/generic/lib/math/ilogb.cl:34 r = ax < 0x00800000U ? rs : r; - r = ax > EXPBITS_SP32 | ax == 0 ? 0x80000000 : r; + r = ax == 0 ? FP_ILOGBNAN : r; + ---------------- My bad, should be FP_ILOGB0 here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Thu Jul 9 05:05:28 2020 From: llvm-commits at lists.llvm.org (Boris Brezillon via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:05:28 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: bbrezillon updated this revision to Diff 276706. bbrezillon added a comment. Use FP_ILOGB0 where appropriate. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 Files: libclc/generic/include/clc/float/definitions.h libclc/generic/lib/math/ilogb.cl Index: libclc/generic/lib/math/ilogb.cl =================================================================== --- libclc/generic/lib/math/ilogb.cl +++ libclc/generic/lib/math/ilogb.cl @@ -31,7 +31,15 @@ int rs = -118 - (int) clz(ux & MANTBITS_SP32); int r = (int) (ax >> EXPSHIFTBITS_SP32) - EXPBIAS_SP32; r = ax < 0x00800000U ? rs : r; - r = ax > EXPBITS_SP32 | ax == 0 ? 0x80000000 : r; + r = ax == 0 ? FP_ILOGB0 : r; + + // We could merge those 2 tests and have: + // + // r = ax >= EXPBITS_SP32 ? 0x7fffffff : r + // + // since FP_ILOGBNAN is set to INT_MAX, but it's clearer this way and + // FP_ILOGBNAN can change without requiring changes to ilogb() code. + r = ax > EXPBITS_SP32 ? FP_ILOGBNAN : r; r = ax == EXPBITS_SP32 ? 
0x7fffffff : r; return r; } Index: libclc/generic/include/clc/float/definitions.h =================================================================== --- libclc/generic/include/clc/float/definitions.h +++ libclc/generic/include/clc/float/definitions.h @@ -15,7 +15,7 @@ #define FLT_EPSILON 0x1.0p-23f #define FP_ILOGB0 (-2147483647 - 1) -#define FP_ILOGBNAN (-2147483647 - 1) +#define FP_ILOGBNAN 2147483647 #define M_E_F 0x1.5bf0a8p+1f #define M_LOG2E_F 0x1.715476p+0f -------------- next part -------------- A non-text attachment was scrubbed... Name: D83473.276706.patch Type: text/x-patch Size: 1284 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 05:07:26 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Thu, 09 Jul 2020 05:07:26 -0700 (PDT) Subject: [llvm] a86ce06 - [SCCP] Use conditional info with AND/OR branch conditions. Message-ID: <5f07087e.1c69fb81.2a2cd.6647@mx.google.com> Author: Florian Hahn Date: 2020-07-09T12:59:24+01:00 New Revision: a86ce06fafaa051554c6a21d487fa70e998dcafe URL: https://github.com/llvm/llvm-project/commit/a86ce06fafaa051554c6a21d487fa70e998dcafe DIFF: https://github.com/llvm/llvm-project/commit/a86ce06fafaa051554c6a21d487fa70e998dcafe.diff LOG: [SCCP] Use conditional info with AND/OR branch conditions. Currently SCCP does not combine the information of conditions joined by AND in the true branch or OR in the false branch. For branches on AND, 2 copies will be inserted for the true branch, with one being the operand of the other as in the code below. We can combine the information using intersection. Note that for the OR case, the copies are inserted in the false branch, where using intersection is safe as well. 
define void @foo(i32 %a) { entry: %lt = icmp ult i32 %a, 100 %gt = icmp ugt i32 %a, 20 %and = and i1 %lt, %gt ; Has predicate info ; branch predicate info { TrueEdge: 1 Comparison: %lt = icmp ult i32 %a, 100 Edge: [label %entry,label %true] } %a.0 = call i32 @llvm.ssa.copy.140247425954880(i32 %a) ; Has predicate info ; branch predicate info { TrueEdge: 1 Comparison: %gt = icmp ugt i32 %a, 20 Edge: [label %entry,label %false] } %a.1 = call i32 @llvm.ssa.copy.140247425954880(i32 %a.0) br i1 %and, label %true, label %false true: ; preds = %entry call void @use(i32 %a.1) %true.1 = icmp ne i32 %a.1, 20 call void @use.i1(i1 %true.1) ret void false: ; preds = %entry call void @use(i32 %a.1) ret void } Reviewers: efriedma, davide, mssimpso, nikic Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D77808 Added: Modified: llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/conditions-ranges.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/SCCP.cpp b/llvm/lib/Transforms/Scalar/SCCP.cpp index 8ba118291206..5ebd3b71fe78 100644 --- a/llvm/lib/Transforms/Scalar/SCCP.cpp +++ b/llvm/lib/Transforms/Scalar/SCCP.cpp @@ -1258,7 +1258,7 @@ void SCCPSolver::handleCallResult(CallBase &CB) { return; Value *CopyOf = CB.getOperand(0); - ValueLatticeElement OriginalVal = getValueState(CopyOf); + ValueLatticeElement CopyOfVal = getValueState(CopyOf); auto *PI = getPredicateInfoFor(&CB); assert(PI && "Missing predicate info for ssa.copy"); @@ -1271,25 +1271,27 @@ void SCCPSolver::handleCallResult(CallBase &CB) { Cmp = dyn_cast(PAssume->Condition); TrueEdge = true; } else { - mergeInValue(ValueState[&CB], &CB, OriginalVal); + mergeInValue(ValueState[&CB], &CB, CopyOfVal); return; } // Everything below relies on the condition being a comparison. 
if (!Cmp) { - mergeInValue(ValueState[&CB], &CB, OriginalVal); + mergeInValue(ValueState[&CB], &CB, CopyOfVal); return; } + Value *RenamedOp = PI->RenamedOp; Value *CmpOp0 = Cmp->getOperand(0); Value *CmpOp1 = Cmp->getOperand(1); - if (CopyOf != CmpOp0 && CopyOf != CmpOp1) { - mergeInValue(ValueState[&CB], &CB, OriginalVal); + // Bail out if neither of the operands matches RenamedOp. + if (CmpOp0 != RenamedOp && CmpOp1 != RenamedOp) { + mergeInValue(ValueState[&CB], &CB, getValueState(CopyOf)); return; } auto Pred = Cmp->getPredicate(); - if (CmpOp0 != CopyOf) { + if (CmpOp1 == RenamedOp) { std::swap(CmpOp0, CmpOp1); Pred = Cmp->getSwappedPredicate(); } @@ -1300,27 +1302,37 @@ void SCCPSolver::handleCallResult(CallBase &CB) { return; } + // The code below relies on PredicateInfo only inserting copies for the + // true branch when the branch condition is an AND and only inserting + // copies for the false branch when the branch condition is an OR. This + // ensures we can intersect the range from the condition with the range of + // CopyOf. if (!TrueEdge) Pred = CmpInst::getInversePredicate(Pred); ValueLatticeElement CondVal = getValueState(CmpOp1); ValueLatticeElement &IV = ValueState[&CB]; - if (CondVal.isConstantRange() || OriginalVal.isConstantRange()) { - auto NewCR = + if (CondVal.isConstantRange() || CopyOfVal.isConstantRange()) { + auto ImposedCR = ConstantRange::getFull(DL.getTypeSizeInBits(CopyOf->getType())); // Get the range imposed by the condition. if (CondVal.isConstantRange()) - NewCR = ConstantRange::makeAllowedICmpRegion( + ImposedCR = ConstantRange::makeAllowedICmpRegion( Pred, CondVal.getConstantRange()); // Combine range info for the original value with the new range from the // condition. - auto OriginalCR = OriginalVal.isConstantRange() - ? OriginalVal.getConstantRange() - : ConstantRange::getFull( - DL.getTypeSizeInBits(CopyOf->getType())); - NewCR = NewCR.intersectWith(OriginalCR); + auto CopyOfCR = CopyOfVal.isConstantRange() + ? 
CopyOfVal.getConstantRange() + : ConstantRange::getFull( + DL.getTypeSizeInBits(CopyOf->getType())); + auto NewCR = ImposedCR.intersectWith(CopyOfCR); + // If the existing information is != x, do not use the information from + // a chained predicate, as the != x information is more likely to be + // helpful in practice. + if (!CopyOfCR.contains(NewCR) && CopyOfCR.getSingleMissingElement()) + NewCR = CopyOfCR; addAdditionalUser(CmpOp1, &CB); // TODO: Actually filp MayIncludeUndef for the created range to false, @@ -1344,7 +1356,7 @@ void SCCPSolver::handleCallResult(CallBase &CB) { return; } - return (void)mergeInValue(IV, &CB, OriginalVal); + return (void)mergeInValue(IV, &CB, CopyOfVal); } } diff --git a/llvm/test/Transforms/SCCP/conditions-ranges.ll b/llvm/test/Transforms/SCCP/conditions-ranges.ll index ea857a100854..612a38f008fc 100644 --- a/llvm/test/Transforms/SCCP/conditions-ranges.ll +++ b/llvm/test/Transforms/SCCP/conditions-ranges.ll @@ -814,14 +814,11 @@ define void @f16_conditions_and(i32 %a, i32 %b) { ; CHECK-NEXT: [[BC:%.*]] = and i1 [[LT]], [[GT]] ; CHECK-NEXT: br i1 [[BC]], label [[TRUE:%.*]], label [[FALSE:%.*]] ; CHECK: true: -; CHECK-NEXT: [[F_1:%.*]] = icmp eq i32 [[A]], 0 -; CHECK-NEXT: call void @use(i1 [[F_1]]) -; CHECK-NEXT: [[F_2:%.*]] = icmp eq i32 [[A]], 20 -; CHECK-NEXT: call void @use(i1 [[F_2]]) +; CHECK-NEXT: call void @use(i1 false) +; CHECK-NEXT: call void @use(i1 false) ; CHECK-NEXT: call void @use(i1 false) ; CHECK-NEXT: call void @use(i1 true) -; CHECK-NEXT: [[T_2:%.*]] = icmp ne i32 [[A]], 20 -; CHECK-NEXT: call void @use(i1 [[T_2]]) +; CHECK-NEXT: call void @use(i1 true) ; CHECK-NEXT: [[C_1:%.*]] = icmp eq i32 [[A]], 21 ; CHECK-NEXT: call void @use(i1 [[C_1]]) ; CHECK-NEXT: [[C_2:%.*]] = icmp ugt i32 [[A]], 21 @@ -899,10 +896,8 @@ define void @f17_conditions_or(i32 %a, i32 %b) { ; CHECK: false: ; CHECK-NEXT: call void @use(i1 false) ; CHECK-NEXT: call void @use(i1 false) -; CHECK-NEXT: [[F_3:%.*]] = icmp ugt i32 [[A]], 100 -; 
CHECK-NEXT: call void @use(i1 [[F_3]]) -; CHECK-NEXT: [[T_1:%.*]] = icmp ult i32 [[A]], 100 -; CHECK-NEXT: call void @use(i1 [[T_1]]) +; CHECK-NEXT: call void @use(i1 false) +; CHECK-NEXT: call void @use(i1 true) ; CHECK-NEXT: call void @use(i1 true) ; CHECK-NEXT: [[C_1:%.*]] = icmp eq i32 [[A]], 21 ; CHECK-NEXT: call void @use(i1 [[C_1]]) From llvm-commits at lists.llvm.org Thu Jul 9 05:07:34 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:07:34 +0000 (UTC) Subject: [PATCH] D77808: [SCCP] Use conditional info with AND/OR branch conditions. In-Reply-To: References: Message-ID: <797188269709827b4efab3685002336c@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGa86ce06fafaa: [SCCP] Use conditional info with AND/OR branch conditions. (authored by fhahn). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77808/new/ https://reviews.llvm.org/D77808 Files: llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/conditions-ranges.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D77808.276707.patch Type: text/x-patch Size: 5678 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 05:16:45 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via llvm-commits) Date: Thu, 09 Jul 2020 05:16:45 -0700 (PDT) Subject: [lld] 68f5a8b - [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. Message-ID: <5f070aad.1c69fb81.34b5.23fb@mx.google.com> Author: Igor Kudrin Date: 2020-07-09T19:15:11+07:00 New Revision: 68f5a8b2042b8c4dc83d1851b462a0570eb3410f URL: https://github.com/llvm/llvm-project/commit/68f5a8b2042b8c4dc83d1851b462a0570eb3410f DIFF: https://github.com/llvm/llvm-project/commit/68f5a8b2042b8c4dc83d1851b462a0570eb3410f.diff LOG: [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. 
The parsing method did not check reading errors and might easily fall into an infinite loop on an invalid input because of that. Differential Revision: https://reviews.llvm.org/D83049 Added: lld/test/ELF/gdb-index-invalid-pubnames.s llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s Modified: lld/ELF/DWARF.h lld/ELF/SyntheticSections.cpp llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp Removed: ################################################################################ diff --git a/lld/ELF/DWARF.h b/lld/ELF/DWARF.h index 8609e35faf95..a12dae6e9960 100644 --- a/lld/ELF/DWARF.h +++ b/lld/ELF/DWARF.h @@ -56,11 +56,11 @@ template class LLDDwarfObj final : public llvm::DWARFObject { return addrSection; } - const llvm::DWARFSection &getGnuPubnamesSection() const override { + const LLDDWARFSection &getGnuPubnamesSection() const override { return gnuPubnamesSection; } - const llvm::DWARFSection &getGnuPubtypesSection() const override { + const LLDDWARFSection &getGnuPubtypesSection() const override { return gnuPubtypesSection; } diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp index f6d66fff6d4b..33748f881576 100644 --- a/lld/ELF/SyntheticSections.cpp +++ b/lld/ELF/SyntheticSections.cpp @@ -2705,12 +2705,15 @@ template static std::vector readPubNamesAndTypes(const LLDDwarfObj &obj, const std::vector &cus) { - const DWARFSection &pubNames = obj.getGnuPubnamesSection(); - const DWARFSection &pubTypes = obj.getGnuPubtypesSection(); + const LLDDWARFSection &pubNames = obj.getGnuPubnamesSection(); + const LLDDWARFSection &pubTypes = obj.getGnuPubtypesSection(); std::vector ret; - for (const DWARFSection *pub : {&pubNames, &pubTypes}) { - DWARFDebugPubTable table(obj, *pub, config->isLE, true); + for (const LLDDWARFSection *pub : {&pubNames, &pubTypes}) { + DWARFDataExtractor data(obj, *pub, config->isLE, config->wordsize); + 
DWARFDebugPubTable table; + if (Error e = table.extract(data, /*GnuStyle=*/true)) + warn(toString(pub->sec) + ": " + toString(std::move(e))); for (const DWARFDebugPubTable::Set &set : table.getData()) { // The value written into the constant pool is kind << 24 | cuIndex. As we // don't know how many compilation units precede this object to compute diff --git a/lld/test/ELF/gdb-index-invalid-pubnames.s b/lld/test/ELF/gdb-index-invalid-pubnames.s new file mode 100644 index 000000000000..15eb86ee7c1e --- /dev/null +++ b/lld/test/ELF/gdb-index-invalid-pubnames.s @@ -0,0 +1,26 @@ +# REQUIRES: x86 +# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t +# RUN: ld.lld --gdb-index %t -o /dev/null 2>&1 | FileCheck %s + +# CHECK: warning: {{.*}}(.debug_gnu_pubnames): unexpected end of data at offset 0x1 while reading [0x0, 0x4) + + .section .debug_abbrev,"", at progbits + .byte 1 # Abbreviation Code + .byte 17 # DW_TAG_compile_unit + .byte 0 # DW_CHILDREN_no + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 0 # EOM(3) + + .section .debug_info,"", at progbits +.LCUBegin: + .long .LUnitEnd-.LUnitBegin # Length of Unit +.LUnitBegin: + .short 4 # DWARF version number + .long .debug_abbrev # Offset Into Abbrev. Section + .byte 8 # Address Size (in bytes) + .byte 1 # Abbrev [1] DW_TAG_compile_unit +.LUnitEnd: + + .section .debug_gnu_pubnames,"", at progbits + .byte 0 diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h index 60f80bb12aa9..80c2d75bdc2a 100644 --- a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h +++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h @@ -12,6 +12,7 @@ #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/StringRef.h" #include "llvm/BinaryFormat/Dwarf.h" +#include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h" #include "llvm/DebugInfo/DWARF/DWARFObject.h" #include #include @@ -67,11 +68,12 @@ class DWARFDebugPubTable { /// gnu styled tables contains additional information. 
/// This flag determines whether or not section we parse is debug_gnu* table. - bool GnuStyle; + bool GnuStyle = false; public: - DWARFDebugPubTable(const DWARFObject &Obj, const DWARFSection &Sec, - bool LittleEndian, bool GnuStyle); + DWARFDebugPubTable() = default; + + Error extract(DWARFDataExtractor Data, bool GnuStyle); void dump(raw_ostream &OS) const; diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp index ebe2600e0689..dba6b85e9104 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp @@ -336,6 +336,14 @@ static void dumpLoclistsSection(raw_ostream &OS, DIDumpOptions DumpOpts, } } +static void dumpPubTableSection(raw_ostream &OS, DIDumpOptions DumpOpts, + DWARFDataExtractor Data, bool GnuStyle) { + DWARFDebugPubTable Table; + if (Error E = Table.extract(Data, GnuStyle)) + DumpOpts.RecoverableErrorHandler(std::move(E)); + Table.dump(OS); +} + void DWARFContext::dump( raw_ostream &OS, DIDumpOptions DumpOpts, std::array, DIDT_ID_Count> DumpOffsets) { @@ -626,26 +634,32 @@ void DWARFContext::dump( } if (shouldDump(Explicit, ".debug_pubnames", DIDT_ID_DebugPubnames, - DObj->getPubnamesSection().Data)) - DWARFDebugPubTable(*DObj, DObj->getPubnamesSection(), isLittleEndian(), false) - .dump(OS); + DObj->getPubnamesSection().Data)) { + DWARFDataExtractor PubTableData(*DObj, DObj->getPubnamesSection(), + isLittleEndian(), 0); + dumpPubTableSection(OS, DumpOpts, PubTableData, /*GnuStyle=*/false); + } if (shouldDump(Explicit, ".debug_pubtypes", DIDT_ID_DebugPubtypes, - DObj->getPubtypesSection().Data)) - DWARFDebugPubTable(*DObj, DObj->getPubtypesSection(), isLittleEndian(), false) - .dump(OS); + DObj->getPubtypesSection().Data)) { + DWARFDataExtractor PubTableData(*DObj, DObj->getPubtypesSection(), + isLittleEndian(), 0); + dumpPubTableSection(OS, DumpOpts, PubTableData, /*GnuStyle=*/false); + } if (shouldDump(Explicit, ".debug_gnu_pubnames", DIDT_ID_DebugGnuPubnames, - 
DObj->getGnuPubnamesSection().Data)) - DWARFDebugPubTable(*DObj, DObj->getGnuPubnamesSection(), isLittleEndian(), - true /* GnuStyle */) - .dump(OS); + DObj->getGnuPubnamesSection().Data)) { + DWARFDataExtractor PubTableData(*DObj, DObj->getGnuPubnamesSection(), + isLittleEndian(), 0); + dumpPubTableSection(OS, DumpOpts, PubTableData, /*GnuStyle=*/true); + } if (shouldDump(Explicit, ".debug_gnu_pubtypes", DIDT_ID_DebugGnuPubtypes, - DObj->getGnuPubtypesSection().Data)) - DWARFDebugPubTable(*DObj, DObj->getGnuPubtypesSection(), isLittleEndian(), - true /* GnuStyle */) - .dump(OS); + DObj->getGnuPubtypesSection().Data)) { + DWARFDataExtractor PubTableData(*DObj, DObj->getGnuPubtypesSection(), + isLittleEndian(), 0); + dumpPubTableSection(OS, DumpOpts, PubTableData, /*GnuStyle=*/true); + } if (shouldDump(Explicit, ".debug_str_offsets", DIDT_ID_DebugStrOffsets, DObj->getStrOffsetsSection().Data)) diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp index eecca87d5270..45e16653420c 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp @@ -18,34 +18,32 @@ using namespace llvm; using namespace dwarf; -DWARFDebugPubTable::DWARFDebugPubTable(const DWARFObject &Obj, - const DWARFSection &Sec, - bool LittleEndian, bool GnuStyle) - : GnuStyle(GnuStyle) { - DWARFDataExtractor PubNames(Obj, Sec, LittleEndian, 0); - uint64_t Offset = 0; - while (PubNames.isValidOffset(Offset)) { +Error DWARFDebugPubTable::extract(DWARFDataExtractor Data, bool GnuStyle) { + this->GnuStyle = GnuStyle; + Sets.clear(); + DataExtractor::Cursor C(0); + while (C && Data.isValidOffset(C.tell())) { Sets.push_back({}); Set &SetData = Sets.back(); - std::tie(SetData.Length, SetData.Format) = - PubNames.getInitialLength(&Offset); + std::tie(SetData.Length, SetData.Format) = Data.getInitialLength(C); const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(SetData.Format); - SetData.Version 
= PubNames.getU16(&Offset); - SetData.Offset = PubNames.getRelocatedValue(OffsetSize, &Offset); - SetData.Size = PubNames.getUnsigned(&Offset, OffsetSize); + SetData.Version = Data.getU16(C); + SetData.Offset = Data.getRelocatedValue(C, OffsetSize); + SetData.Size = Data.getUnsigned(C, OffsetSize); - while (Offset < Sec.Data.size()) { - uint64_t DieRef = PubNames.getUnsigned(&Offset, OffsetSize); + while (C) { + uint64_t DieRef = Data.getUnsigned(C, OffsetSize); if (DieRef == 0) break; - uint8_t IndexEntryValue = GnuStyle ? PubNames.getU8(&Offset) : 0; - StringRef Name = PubNames.getCStrRef(&Offset); + uint8_t IndexEntryValue = GnuStyle ? Data.getU8(C) : 0; + StringRef Name = Data.getCStrRef(C); SetData.Entries.push_back( {DieRef, PubIndexEntryDescriptor(IndexEntryValue), Name}); } } + return C.takeError(); } void DWARFDebugPubTable::dump(raw_ostream &OS) const { diff --git a/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s b/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s new file mode 100644 index 000000000000..cfd75c2a9aca --- /dev/null +++ b/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s @@ -0,0 +1,26 @@ +# RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t +# RUN: not llvm-dwarfdump -v %t 2>&1 | FileCheck %s + +# CHECK: .debug_pubnames contents: +# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) + +# CHECK: .debug_pubtypes contents: +# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) + +# CHECK: .debug_gnu_pubnames contents: +# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) + +# CHECK: .debug_gnu_pubtypes contents: +# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) + + .section .debug_pubnames,"", at progbits + .byte 0 + + .section .debug_pubtypes,"", at progbits + .byte 0 + + .section .debug_gnu_pubnames,"", at progbits + .byte 0 + + .section .debug_gnu_pubtypes,"", at progbits + .byte 0 
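Note on the 68f5a8b change above: the log says the old parser did not check reading errors and could loop forever on invalid input, and the diff switches the loops to test a cursor (`while (C)`) and return `C.takeError()`. Below is a minimal standalone sketch of that general idea, not LLVM code. `ByteCursor`, `ParseResult`, and `readUntilZero` are hypothetical names made up for illustration; they only imitate the "sticky error" behaviour and make no claim about how `DataExtractor::Cursor` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cursor with a sticky error state.
// Once a read fails, every later read is a no-op that returns 0.
class ByteCursor {
public:
  explicit ByteCursor(std::vector<std::uint8_t> Data)
      : Bytes(std::move(Data)) {}

  // True while no read has failed.
  explicit operator bool() const { return !Failed; }
  std::size_t tell() const { return Offset; }
  const std::string &error() const { return Err; }

  // Read a little-endian unsigned integer of Size bytes (1..8).
  std::uint64_t getUnsigned(unsigned Size) {
    assert(Size >= 1 && Size <= 8 && "unsupported integer width");
    if (Failed)
      return 0;
    if (Bytes.size() - Offset < Size) {
      Failed = true;
      Err = "unexpected end of data at offset " + std::to_string(Offset);
      return 0;
    }
    std::uint64_t Value = 0;
    for (unsigned I = 0; I < Size; ++I)
      Value |= static_cast<std::uint64_t>(Bytes[Offset + I]) << (8 * I);
    Offset += Size;
    return Value;
  }

private:
  std::vector<std::uint8_t> Bytes;
  std::size_t Offset = 0;
  bool Failed = false;
  std::string Err;
};

struct ParseResult {
  std::vector<std::uint64_t> Values; // entries read before the terminator
  std::string Error;                 // empty when parsing succeeded
};

// Read 4-byte entries until a zero terminator. The loop condition tests the
// cursor, so truncated input ends the loop instead of spinning.
ParseResult readUntilZero(const std::vector<std::uint8_t> &Data) {
  ParseResult Result;
  ByteCursor C(Data);
  while (C) {
    std::uint64_t Entry = C.getUnsigned(4);
    if (Entry == 0)
      break; // real terminator, or the 0 returned by a failed read
    Result.Values.push_back(Entry);
  }
  if (!C)
    Result.Error = C.error();
  return Result;
}
```

In this sketch the caller decides what to do with `Error`, much as the lld hunk above turns the returned error into a warning and still iterates over whatever sets were parsed.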
From llvm-commits at lists.llvm.org Thu Jul 9 05:16:47 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via llvm-commits) Date: Thu, 09 Jul 2020 05:16:47 -0700 (PDT) Subject: [lld] ca4d8da - [DebugInfo] Add more checks to parsing .debug_pub* sections. Message-ID: <5f070aaf.1c69fb81.e33e7.65ca@mx.google.com> Author: Igor Kudrin Date: 2020-07-09T19:15:31+07:00 New Revision: ca4d8da0c33cd9bcd05f94b4b3ac125b72be2a2a URL: https://github.com/llvm/llvm-project/commit/ca4d8da0c33cd9bcd05f94b4b3ac125b72be2a2a DIFF: https://github.com/llvm/llvm-project/commit/ca4d8da0c33cd9bcd05f94b4b3ac125b72be2a2a.diff LOG: [DebugInfo] Add more checks to parsing .debug_pub* sections. The patch adds checking for various potential issues in parsing name lookup tables and reporting them as recoverable errors, similarly as we do for other tables. Differential Revision: https://reviews.llvm.org/D83050 Added: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s Modified: lld/ELF/SyntheticSections.cpp lld/test/ELF/Inputs/gdb-index.s lld/test/ELF/gdb-index-invalid-pubnames.s lld/test/ELF/gdb-index.s llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp Removed: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s ################################################################################ diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp index 33748f881576..731b9f658060 100644 --- a/lld/ELF/SyntheticSections.cpp +++ b/lld/ELF/SyntheticSections.cpp @@ -2712,8 +2712,9 @@ readPubNamesAndTypes(const LLDDwarfObj &obj, for (const LLDDWARFSection *pub : {&pubNames, &pubTypes}) { DWARFDataExtractor data(obj, *pub, config->isLE, config->wordsize); DWARFDebugPubTable table; - if (Error e = table.extract(data, /*GnuStyle=*/true)) + table.extract(data, /*GnuStyle=*/true, [&](Error e) { warn(toString(pub->sec) + ": " + toString(std::move(e))); + }); for (const 
DWARFDebugPubTable::Set &set : table.getData()) { // The value written into the constant pool is kind << 24 | cuIndex. As we // don't know how many compilation units precede this object to compute diff --git a/lld/test/ELF/Inputs/gdb-index.s b/lld/test/ELF/Inputs/gdb-index.s index 794995c150f9..88474e1fdb12 100644 --- a/lld/test/ELF/Inputs/gdb-index.s +++ b/lld/test/ELF/Inputs/gdb-index.s @@ -53,7 +53,7 @@ aaaaaaaaaaaaaaaa: .byte 0 .section .debug_gnu_pubnames,"", at progbits -.long 0x18 +.long 0x24 .value 0x2 .long 0 .long 0x33 diff --git a/lld/test/ELF/gdb-index-invalid-pubnames.s b/lld/test/ELF/gdb-index-invalid-pubnames.s index 15eb86ee7c1e..fc10dac487c5 100644 --- a/lld/test/ELF/gdb-index-invalid-pubnames.s +++ b/lld/test/ELF/gdb-index-invalid-pubnames.s @@ -2,7 +2,7 @@ # RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t # RUN: ld.lld --gdb-index %t -o /dev/null 2>&1 | FileCheck %s -# CHECK: warning: {{.*}}(.debug_gnu_pubnames): unexpected end of data at offset 0x1 while reading [0x0, 0x4) +# CHECK: warning: {{.*}}(.debug_gnu_pubnames): name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4) .section .debug_abbrev,"", at progbits .byte 1 # Abbreviation Code diff --git a/lld/test/ELF/gdb-index.s b/lld/test/ELF/gdb-index.s index bb8ecf34bb6a..546590ab359e 100644 --- a/lld/test/ELF/gdb-index.s +++ b/lld/test/ELF/gdb-index.s @@ -109,7 +109,7 @@ entrypoint: .byte 0 .section .debug_gnu_pubnames,"", at progbits -.long 0x18 +.long 0x1e .value 0x2 .long 0 .long 0x33 diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h index 80c2d75bdc2a..cb347615868b 100644 --- a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h +++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h @@ -73,7 +73,8 @@ class DWARFDebugPubTable { public: DWARFDebugPubTable() = default; - Error extract(DWARFDataExtractor Data, bool GnuStyle); + void 
extract(DWARFDataExtractor Data, bool GnuStyle, + function_ref RecoverableErrorHandler); void dump(raw_ostream &OS) const; diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp index dba6b85e9104..bf6219497770 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp @@ -339,8 +339,7 @@ static void dumpLoclistsSection(raw_ostream &OS, DIDumpOptions DumpOpts, static void dumpPubTableSection(raw_ostream &OS, DIDumpOptions DumpOpts, DWARFDataExtractor Data, bool GnuStyle) { DWARFDebugPubTable Table; - if (Error E = Table.extract(Data, GnuStyle)) - DumpOpts.RecoverableErrorHandler(std::move(E)); + Table.extract(Data, GnuStyle, DumpOpts.RecoverableErrorHandler); Table.dump(OS); } diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp index 45e16653420c..fea3b9ace8ca 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp @@ -11,6 +11,7 @@ #include "llvm/ADT/StringRef.h" #include "llvm/BinaryFormat/Dwarf.h" #include "llvm/Support/DataExtractor.h" +#include "llvm/Support/Errc.h" #include "llvm/Support/Format.h" #include "llvm/Support/raw_ostream.h" #include @@ -18,32 +19,75 @@ using namespace llvm; using namespace dwarf; -Error DWARFDebugPubTable::extract(DWARFDataExtractor Data, bool GnuStyle) { +void DWARFDebugPubTable::extract( + DWARFDataExtractor Data, bool GnuStyle, + function_ref RecoverableErrorHandler) { this->GnuStyle = GnuStyle; Sets.clear(); - DataExtractor::Cursor C(0); - while (C && Data.isValidOffset(C.tell())) { + uint64_t Offset = 0; + while (Data.isValidOffset(Offset)) { + uint64_t SetOffset = Offset; Sets.push_back({}); - Set &SetData = Sets.back(); + Set &NewSet = Sets.back(); - std::tie(SetData.Length, SetData.Format) = Data.getInitialLength(C); - const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(SetData.Format); + DataExtractor::Cursor 
C(Offset); + std::tie(NewSet.Length, NewSet.Format) = Data.getInitialLength(C); + if (!C) { + // Drop the newly added set because it does not contain anything useful + // to dump. + Sets.pop_back(); + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 " parsing failed: %s", + SetOffset, toString(C.takeError()).c_str())); + return; + } + + Offset = C.tell() + NewSet.Length; + DWARFDataExtractor SetData(Data, Offset); + const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(NewSet.Format); + + NewSet.Version = SetData.getU16(C); + NewSet.Offset = SetData.getRelocatedValue(C, OffsetSize); + NewSet.Size = SetData.getUnsigned(C, OffsetSize); - SetData.Version = Data.getU16(C); - SetData.Offset = Data.getRelocatedValue(C, OffsetSize); - SetData.Size = Data.getUnsigned(C, OffsetSize); + if (!C) { + // Preserve the newly added set because at least some fields of the header + // are read and can be dumped. + RecoverableErrorHandler( + createStringError(errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 + " does not have a complete header: %s", + SetOffset, toString(C.takeError()).c_str())); + continue; + } while (C) { - uint64_t DieRef = Data.getUnsigned(C, OffsetSize); + uint64_t DieRef = SetData.getUnsigned(C, OffsetSize); if (DieRef == 0) break; - uint8_t IndexEntryValue = GnuStyle ? Data.getU8(C) : 0; - StringRef Name = Data.getCStrRef(C); - SetData.Entries.push_back( - {DieRef, PubIndexEntryDescriptor(IndexEntryValue), Name}); + uint8_t IndexEntryValue = GnuStyle ? 
SetData.getU8(C) : 0; + StringRef Name = SetData.getCStrRef(C); + if (C) + NewSet.Entries.push_back( + {DieRef, PubIndexEntryDescriptor(IndexEntryValue), Name}); + } + + if (!C) { + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 " parsing failed: %s", + SetOffset, toString(std::move(C.takeError())).c_str())); + continue; } + if (C.tell() != Offset) + RecoverableErrorHandler(createStringError( + errc::invalid_argument, + "name lookup table at offset 0x%" PRIx64 + " has a terminator at offset 0x%" PRIx64 + " before the expected end at 0x%" PRIx64, + SetOffset, C.tell() - OffsetSize, Offset - OffsetSize)); } - return C.takeError(); } void DWARFDebugPubTable::dump(raw_ostream &OS) const { diff --git a/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s b/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s new file mode 100644 index 000000000000..8daea1e0f80d --- /dev/null +++ b/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s @@ -0,0 +1,150 @@ +# RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t + +## All four name lookup table sections share the same parser, but slightly +## different code paths are used to reach it. Do a comprehensive check for one +## of the sections and minimal checks for the others. + +# RUN: not llvm-dwarfdump -debug-gnu-pubnames %t 2> %t.err | FileCheck %s +# RUN: FileCheck %s --input-file=%t.err --check-prefix=ERR + +# RUN: not llvm-dwarfdump -debug-pubnames -debug-pubtypes -debug-gnu-pubtypes %t 2>&1 | \ +# RUN: FileCheck %s --check-prefix=ERR-MIN + + .section .debug_gnu_pubnames,"", at progbits +# CHECK: .debug_gnu_pubnames contents: + +## The next few sets do not contain all required fields in the header. 
+# ERR: error: name lookup table at offset 0x0 does not have a complete header: unexpected end of data at offset 0x5 while reading [0x4, 0x6) +# CHECK-NEXT: length = 0x00000001, format = DWARF32, version = 0x0000, unit_offset = 0x00000000, unit_size = 0x00000000 +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet0End-.LSet0 # Length +.LSet0: + .byte 1 # Version (truncated) +.LSet0End: + +# ERR: error: name lookup table at offset 0x5 does not have a complete header: unexpected end of data at offset 0xe while reading [0xb, 0xf) +# CHECK-NEXT: length = 0x00000005, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000000 +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet1End-.LSet1 # Length +.LSet1: + .short 2 # Version + .byte 1, 2, 3 # Debug Info Offset (truncated) +.LSet1End: + +# ERR: error: name lookup table at offset 0xe does not have a complete header: unexpected end of data at offset 0x1b while reading [0x18, 0x1c) +# CHECK-NEXT: length = 0x00000009, format = DWARF32, version = 0x0002, unit_offset = 0x00000032, unit_size = 0x00000000 +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet2End-.LSet2 # Length +.LSet2: + .short 2 # Version + .long 0x32 # Debug Info Offset + .byte 1, 2, 3 # Debug Info Length (truncated) +.LSet2End: + +## This set is terminated just after the header. +# ERR: error: name lookup table at offset 0x1b parsing failed: unexpected end of data at offset 0x29 while reading [0x29, 0x2d) +# CHECK-NEXT: length = 0x0000000a, format = DWARF32, version = 0x0002, unit_offset = 0x00000048, unit_size = 0x00000064 +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet3End-.LSet3 # Length +.LSet3: + .short 2 # Version + .long 0x48 # Debug Info Offset + .long 0x64 # Debug Info Length +.LSet3End: + +## The offset in the first pair is truncated. 
+# ERR: error: name lookup table at offset 0x29 parsing failed: unexpected end of data at offset 0x3a while reading [0x37, 0x3b) +# CHECK-NEXT: length = 0x0000000d, format = DWARF32, version = 0x0002, unit_offset = 0x000000ac, unit_size = 0x00000036 +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet4End-.LSet4 # Length +.LSet4: + .short 2 # Version + .long 0xac # Debug Info Offset + .long 0x36 # Debug Info Length + .byte 1, 2, 3 # Offset (truncated) +.LSet4End: + +## The set is truncated just after the offset of the first pair. +# ERR: error: name lookup table at offset 0x3a parsing failed: unexpected end of data at offset 0x4c while reading [0x4c, 0x4d) +# CHECK-NEXT: length = 0x0000000e, format = DWARF32, version = 0x0002, unit_offset = 0x000000e2, unit_size = 0x00000015 +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet5End-.LSet5 # Length +.LSet5: + .short 2 # Version + .long 0xe2 # Debug Info Offset + .long 0x15 # Debug Info Length + .long 0xf4 # Offset +.LSet5End: + +## The set is truncated just after the index entry field of the first pair. +# ERR: error: name lookup table at offset 0x4c parsing failed: no null terminated string at offset 0x5f +# CHECK-NEXT: length = 0x0000000f, format = DWARF32, version = 0x0002, unit_offset = 0x000000f7, unit_size = 0x00000010 +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet6End-.LSet6 # Length +.LSet6: + .short 2 # Version + .long 0xf7 # Debug Info Offset + .long 0x10 # Debug Info Length + .long 0xf4 # Offset + .byte 0x30 # Index Entry +.LSet6End: + +## This set contains a string which is not properly terminated. 
+# ERR: error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72 +# CHECK-NEXT: length = 0x00000012, format = DWARF32, version = 0x0002, unit_offset = 0x00000107, unit_size = 0x0000004b +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NOT: 0x + .long .LSet7End-.LSet7 # Length +.LSet7: + .short 2 # Version + .long 0x107 # Debug Info Offset + .long 0x4b # Debug Info Length + .long 0x111 # Offset + .byte 0x30 # Index Entry + .ascii "foo" # The string does not terminate before the set data ends. +.LSet7End: + +## This set occupies some space after the terminator. +# ERR: error: name lookup table at offset 0x75 has a terminator at offset 0x8c before the expected end at 0x8d +# CHECK-NEXT: length = 0x00000018, format = DWARF32, version = 0x0002, unit_offset = 0x00000154, unit_size = 0x000002ac +# CHECK-NEXT: Offset Linkage Kind Name +# CHECK-NEXT: 0x0000018e EXTERNAL FUNCTION "foo" +# CHECK-NOT: 0x + .long .LSet8End-.LSet8 # Length +.LSet8: + .short 2 # Version + .long 0x154 # Debug Info Offset + .long 0x2ac # Debug Info Length + .long 0x18e # Offset + .byte 0x30 # Index Entry + .asciz "foo" # Name + .long 0 # Terminator + .space 1 +.LSet8End: + +## The remaining space in the section is too short to even contain a unit length +## field. 
+# ERR: error: name lookup table at offset 0x91 parsing failed: unexpected end of data at offset 0x94 while reading [0x91, 0x95) +# CHECK-NOT: length = + .space 3 + +# ERR-MIN: .debug_pubnames contents: +# ERR-MIN-NEXT: error: name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4) +# ERR-MIN: .debug_pubtypes contents: +# ERR-MIN-NEXT: error: name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4) +# ERR-MIN: .debug_gnu_pubtypes contents: +# ERR-MIN-NEXT: error: name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4) + + .section .debug_pubnames,"", at progbits + .byte 0 + .section .debug_pubtypes,"", at progbits + .byte 0 + .section .debug_gnu_pubtypes,"", at progbits + .byte 0 diff --git a/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s b/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s deleted file mode 100644 index cfd75c2a9aca..000000000000 --- a/llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s +++ /dev/null @@ -1,26 +0,0 @@ -# RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t -# RUN: not llvm-dwarfdump -v %t 2>&1 | FileCheck %s - -# CHECK: .debug_pubnames contents: -# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) - -# CHECK: .debug_pubtypes contents: -# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) - -# CHECK: .debug_gnu_pubnames contents: -# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) - -# CHECK: .debug_gnu_pubtypes contents: -# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4) - - .section .debug_pubnames,"", at progbits - .byte 0 - - .section .debug_pubtypes,"", at progbits - .byte 0 - - .section .debug_gnu_pubnames,"", at progbits - .byte 0 - - .section .debug_gnu_pubtypes,"", at progbits - .byte 0 From llvm-commits at 
lists.llvm.org Thu Jul 9 05:16:51 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:16:51 +0000 (UTC) Subject: [PATCH] D83049: [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG68f5a8b2042b: [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. (authored by ikudrin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83049/new/ https://reviews.llvm.org/D83049 Files: lld/ELF/DWARF.h lld/ELF/SyntheticSections.cpp lld/test/ELF/gdb-index-invalid-pubnames.s llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83049.276708.patch Type: text/x-patch Size: 10275 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 05:16:54 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:16:54 +0000 (UTC) Subject: [PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGca4d8da0c33c: [DebugInfo] Add more checks to parsing .debug_pub* sections. (authored by ikudrin). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83050/new/ https://reviews.llvm.org/D83050 Files: lld/ELF/SyntheticSections.cpp lld/test/ELF/Inputs/gdb-index.s lld/test/ELF/gdb-index-invalid-pubnames.s lld/test/ELF/gdb-index.s llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h llvm/lib/DebugInfo/DWARF/DWARFContext.cpp llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83050.276709.patch Type: text/x-patch Size: 15755 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 05:18:17 2020 From: llvm-commits at lists.llvm.org (Daniel Grumberg via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:18:17 +0000 (UTC) Subject: [PATCH] D83474: Add support for specifying only a denormalizer Message-ID: dang created this revision. dang added a reviewer: Bigcheese. Herald added subscribers: llvm-commits, cfe-commits, dexonsmith. Herald added projects: clang, LLVM. This commit adds a denormalizer for the optimization level. Depends on D83406. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83474 Files: clang/include/clang/Driver/Options.td clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/Option/OptParser.td llvm/utils/TableGen/OptParserEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83474.276710.patch Type: text/x-patch Size: 13327 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 05:24:49 2020 From: llvm-commits at lists.llvm.org (Benjamin Kramer via llvm-commits) Date: Thu, 09 Jul 2020 05:24:49 -0700 (PDT) Subject: [llvm] d36b841 - [DebugInfo] Fix pessimizing move. NFC. 
Message-ID: <5f070c91.1c69fb81.1ec84.7752@mx.google.com> Author: Benjamin Kramer Date: 2020-07-09T14:23:46+02:00 New Revision: d36b8414bdde1f361c40e6f6d53788c43ffe53c1 URL: https://github.com/llvm/llvm-project/commit/d36b8414bdde1f361c40e6f6d53788c43ffe53c1 DIFF: https://github.com/llvm/llvm-project/commit/d36b8414bdde1f361c40e6f6d53788c43ffe53c1.diff LOG: [DebugInfo] Fix pessimizing move. NFC. DWARFDebugPubTable.cpp:80:31: warning: moving a temporary object prevents copy elision [-Wpessimizing-move] Added: Modified: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp Removed: ################################################################################ diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp index fea3b9ace8ca..5031acdb54ef 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp @@ -77,7 +77,7 @@ void DWARFDebugPubTable::extract( RecoverableErrorHandler(createStringError( errc::invalid_argument, "name lookup table at offset 0x%" PRIx64 " parsing failed: %s", - SetOffset, toString(std::move(C.takeError())).c_str())); + SetOffset, toString(C.takeError()).c_str())); continue; } if (C.tell() != Offset) From llvm-commits at lists.llvm.org Thu Jul 9 05:39:24 2020 From: llvm-commits at lists.llvm.org (Daniel Stone via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:39:24 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: daniels added a comment. 
The build failure is again unrelated; comment filed at https://github.com/google/llvm-premerge-checks/issues/207 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Thu Jul 9 05:41:44 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:41:44 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: aykevl added a comment. ping? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Thu Jul 9 05:43:12 2020 From: llvm-commits at lists.llvm.org (Daniel Stone via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:43:12 +0000 (UTC) Subject: [PATCH] D77589: libclc: Add Mesa/SPIR-V target In-Reply-To: References: Message-ID: <60ae297a16b1761da6c5892db14640b3@localhost.localdomain> daniels added a comment. That PR was merged to fix the build failure, and now we have a new failure ... https://github.com/google/llvm-premerge-checks/issues/207 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77589/new/ https://reviews.llvm.org/D77589 From llvm-commits at lists.llvm.org Thu Jul 9 05:45:40 2020 From: llvm-commits at lists.llvm.org (Daniil Fukalov via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:45:40 +0000 (UTC) Subject: [PATCH] D82761: SpeculativeExecution: Fix for logic change introduced in D81730. In-Reply-To: References: Message-ID: <26a6febccb53c7384c44fc070d82663a@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG167767a775f3: SpeculativeExecution: Fix for logic change introduced in D81730. (authored by dfukalov). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82761/new/ https://reviews.llvm.org/D82761 Files: llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp llvm/test/Transforms/SpeculativeExecution/PR46267.ll Index: llvm/test/Transforms/SpeculativeExecution/PR46267.ll =================================================================== --- llvm/test/Transforms/SpeculativeExecution/PR46267.ll +++ llvm/test/Transforms/SpeculativeExecution/PR46267.ll @@ -1,6 +1,36 @@ ; RUN: opt < %s -S -speculative-execution | FileCheck %s ; RUN: opt < %s -S -passes='speculative-execution' | FileCheck %s +%class.B = type { i32 (...)** } + +; Testing that two bitcasts are not hoisted to the first BB +define i8* @foo(%class.B* readonly %b) { +; CHECK-LABEL: foo +; CHECK-LABEL: entry +; CHECK-NEXT: %i = icmp eq %class.B* %b, null +; CHECK-NEXT: br i1 %i, label %end, label %notnull +entry: + %i = icmp eq %class.B* %b, null + br i1 %i, label %end, label %notnull + +; CHECK-LABEL: notnull: +; CHECK-NEXT: %i1 = bitcast %class.B* %b to i32** +; CHECK: %i3 = bitcast %class.B* %b to i8* +notnull: ; preds = %entry + %i1 = bitcast %class.B* %b to i32** + %vtable = load i32*, i32** %i1, align 8 + %i2 = getelementptr inbounds i32, i32* %vtable, i64 -2 + %offset.to.top = load i32, i32* %i2, align 4 + %i3 = bitcast %class.B* %b to i8* + %i4 = sext i32 %offset.to.top to i64 + %i5 = getelementptr inbounds i8, i8* %i3, i64 %i4 + br label %end + +end: ; preds = %notnull, %entry + %i6 = phi i8* [ %i5, %notnull ], [ null, %entry ] + ret i8* %i6 +} + define void @f(i32 %i) { entry: ; CHECK-LABEL: @f( Index: llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp =================================================================== --- llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp +++ llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp @@ -291,10 +291,8 @@ if (TotalSpeculationCost > SpecExecMaxSpeculationCost) return false; // too much to hoist } else { - // If the 
instruction cannot be hoisted but has zero cost suppose it's - // a special case e.g. debug info instrinsics that should not be counted - // for threshold. - if (Cost) + // Debug info instrinsics should not be counted for threshold. + if (!isa<DbgInfoIntrinsic>(I)) NotHoistedInstCount++; if (NotHoistedInstCount > SpecExecMaxNotHoisted) return false; // too much left behind -------------- next part -------------- A non-text attachment was scrubbed... Name: D82761.276714.patch Type: text/x-patch Size: 2278 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 05:45:41 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Thu, 09 Jul 2020 05:45:41 -0700 (PDT) Subject: [llvm] 167767a - SpeculativeExecution: Fix for logic change introduced in D81730. Message-ID: <5f071175.1c69fb81.14da2.6ec8@mx.google.com> Author: dfukalov Date: 2020-07-09T15:45:23+03:00 New Revision: 167767a775f3db5cd94053d4da6a4f419b6211cd URL: https://github.com/llvm/llvm-project/commit/167767a775f3db5cd94053d4da6a4f419b6211cd DIFF: https://github.com/llvm/llvm-project/commit/167767a775f3db5cd94053d4da6a4f419b6211cd.diff LOG: SpeculativeExecution: Fix for logic change introduced in D81730. Summary: The test case started to hoist bitcasts to upper BB after D81730. Reverted unintentional logic change. Some instructions may have zero cost but will not be hoisted by different limitation so should be counted for threshold.
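Illustration only, not part of the commit: a minimal C++ sketch of the counting rule the summary describes, under a toy model. `ToyInst`, `considerHoisting`, `MaxCost` and `MaxNotHoisted` are hypothetical stand-ins and not LLVM types or options; the real pass decides hoistability from the cost model and safety checks rather than from a flag.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an IR instruction; this is not an LLVM type.
struct ToyInst {
  bool Hoistable;   // assumed: the instruction could be speculated
  unsigned Cost;    // speculation cost, which may be zero
  bool IsDebugInfo; // stands in for a debug info intrinsic
};

// Returns true if the block stays a hoisting candidate under both thresholds.
// MaxCost and MaxNotHoisted are hypothetical names for the two limits.
bool considerHoisting(const std::vector<ToyInst> &Block, unsigned MaxCost,
                      unsigned MaxNotHoisted) {
  unsigned TotalCost = 0;
  unsigned NotHoisted = 0;
  for (const ToyInst &I : Block) {
    if (I.Hoistable) {
      TotalCost += I.Cost;
      if (TotalCost > MaxCost)
        return false; // too much to hoist
    } else {
      // Only debug info is exempt. A zero-cost instruction that cannot be
      // hoisted for some other reason still counts as left behind.
      if (!I.IsDebugInfo)
        ++NotHoisted;
      if (NotHoisted > MaxNotHoisted)
        return false; // too much left behind
    }
  }
  return true;
}
```

In this model two zero-cost, non-hoistable casts exceed a limit of one instruction left behind, while two debug-info entries do not, which matches the behavior the new test case expects.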
Reviewers: aprantl, arsenm, nhaehnle Reviewed By: aprantl Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82761 Added: Modified: llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp llvm/test/Transforms/SpeculativeExecution/PR46267.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp b/llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp index ea848f4e7a2a..f82a2936c762 100644 --- a/llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp +++ b/llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp @@ -291,10 +291,8 @@ bool SpeculativeExecutionPass::considerHoistingFromTo( if (TotalSpeculationCost > SpecExecMaxSpeculationCost) return false; // too much to hoist } else { - // If the instruction cannot be hoisted but has zero cost suppose it's - // a special case e.g. debug info instrinsics that should not be counted - // for threshold. - if (Cost) + // Debug info instrinsics should not be counted for threshold. 
+ if (!isa<DbgInfoIntrinsic>(I)) NotHoistedInstCount++; if (NotHoistedInstCount > SpecExecMaxNotHoisted) return false; // too much left behind diff --git a/llvm/test/Transforms/SpeculativeExecution/PR46267.ll b/llvm/test/Transforms/SpeculativeExecution/PR46267.ll index 5a2c7049f991..5c61b225a7f5 100644 --- a/llvm/test/Transforms/SpeculativeExecution/PR46267.ll +++ b/llvm/test/Transforms/SpeculativeExecution/PR46267.ll @@ -1,6 +1,36 @@ ; RUN: opt < %s -S -speculative-execution | FileCheck %s ; RUN: opt < %s -S -passes='speculative-execution' | FileCheck %s +%class.B = type { i32 (...)** } + +; Testing that two bitcasts are not hoisted to the first BB +define i8* @foo(%class.B* readonly %b) { +; CHECK-LABEL: foo +; CHECK-LABEL: entry +; CHECK-NEXT: %i = icmp eq %class.B* %b, null +; CHECK-NEXT: br i1 %i, label %end, label %notnull +entry: + %i = icmp eq %class.B* %b, null + br i1 %i, label %end, label %notnull + +; CHECK-LABEL: notnull: +; CHECK-NEXT: %i1 = bitcast %class.B* %b to i32** +; CHECK: %i3 = bitcast %class.B* %b to i8* +notnull: ; preds = %entry + %i1 = bitcast %class.B* %b to i32** + %vtable = load i32*, i32** %i1, align 8 + %i2 = getelementptr inbounds i32, i32* %vtable, i64 -2 + %offset.to.top = load i32, i32* %i2, align 4 + %i3 = bitcast %class.B* %b to i8* + %i4 = sext i32 %offset.to.top to i64 + %i5 = getelementptr inbounds i8, i8* %i3, i64 %i4 + br label %end + +end: ; preds = %notnull, %entry + %i6 = phi i8* [ %i5, %notnull ], [ null, %entry ] + ret i8* %i6 +} + define void @f(i32 %i) { entry: ; CHECK-LABEL: @f( From llvm-commits at lists.llvm.org Thu Jul 9 05:47:21 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:47:21 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. Message-ID: SjoerdMeijer created this revision. SjoerdMeijer added reviewers: fhahn, anemet, Gerolf. Herald added subscribers: tschuett, hiraditya. Herald added a reviewer: jdoerfert.
Herald added a project: LLVM. This tightens the matrix intrinsic definitions in LLVM LangRef and adds corresponding checks to the IR Verifier. https://reviews.llvm.org/D83477 Files: llvm/docs/LangRef.rst llvm/lib/IR/Verifier.cpp llvm/test/Verifier/matrix-intrinsics.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83477.276713.patch Type: text/x-patch Size: 15143 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 05:47:53 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:47:53 +0000 (UTC) Subject: [PATCH] D83203: [CodeGen] Fix warnings in SelectionDAG::SplitVector In-Reply-To: References: Message-ID: <4a92379c1e56c3776c8e1341f98ee223@localhost.localdomain> kmclaughlin accepted this revision. kmclaughlin added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83203/new/ https://reviews.llvm.org/D83203 From llvm-commits at lists.llvm.org Thu Jul 9 05:48:11 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:48:11 +0000 (UTC) Subject: [PATCH] D83046: [LiveDebugValues] 1/4 Install an implementation-picking LiveDebugValues pass In-Reply-To: References: Message-ID: <2e7647738cfa9cb6c2f9a2741b9c2698@localhost.localdomain> djtodoro added inline comments. 
================ Comment at: llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h:1 +//===- LiveDebugValues.cpp - Tracking Debug Value MIs ---------------------===// +// ---------------- And since this is a header file we need: `-*- C++ -*-` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83046/new/ https://reviews.llvm.org/D83046 From llvm-commits at lists.llvm.org Thu Jul 9 05:54:54 2020 From: llvm-commits at lists.llvm.org (Sam Elliott via llvm-commits) Date: Thu, 09 Jul 2020 05:54:54 -0700 (PDT) Subject: [llvm] 97106f9 - [RISCV] Avoid Splitting MBB in RISCVExpandPseudo Message-ID: <5f07139e.1c69fb81.2edbb.6cb4@mx.google.com> Author: Sam Elliott Date: 2020-07-09T13:54:13+01:00 New Revision: 97106f9d80f6ba1bf5eafbd5a6f88d72913ec5a1 URL: https://github.com/llvm/llvm-project/commit/97106f9d80f6ba1bf5eafbd5a6f88d72913ec5a1 DIFF: https://github.com/llvm/llvm-project/commit/97106f9d80f6ba1bf5eafbd5a6f88d72913ec5a1.diff LOG: [RISCV] Avoid Splitting MBB in RISCVExpandPseudo Since the `RISCVExpandPseudo` pass has been split from `RISCVExpandAtomicPseudo` pass, it would be nice to run the former as early as possible (The latter has to be run as late as possible to ensure correctness). Running earlier means we can reschedule these pairs as we see fit. Running earlier in the machine pass pipeline is good, but would mean teaching many more passes about `hasLabelMustBeEmitted`. Splitting the basic blocks also pessimises possible optimisations because some optimisations are MBB-local, and others are disabled if the block has its address taken (which is notionally what `hasLabelMustBeEmitted` means). This patch uses a new approach of setting the pre-instruction symbol on the AUIPC instruction to a temporary symbol and referencing that. This avoids splitting the basic block, but allows us to reference exactly the instruction that we need to. Notionally, this approach seems more correct because we do actually want to address a specific instruction. 
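Illustration only, not part of the commit: a small C++ sketch of the arithmetic behind an AUIPC pair, to show why the second instruction has to name the address of the AUIPC itself. `PcrelParts`, `splitPcrel` and `resolve` are hypothetical helper names, and the sketch assumes the offset fits the 32-bit range an AUIPC pair can reach.

```cpp
#include <cassert>
#include <cstdint>

// The two immediates of an AUIPC + (ADDI | load) pair.
struct PcrelParts {
  int64_t Hi20; // upper part, placed in the AUIPC immediate
  int64_t Lo12; // signed 12-bit immediate of the second instruction
};

// Split the distance from the AUIPC to the target into high and low parts.
PcrelParts splitPcrel(int64_t AuipcAddr, int64_t TargetAddr) {
  const int64_t Offset = TargetAddr - AuipcAddr;
  // Sign-extend the low 12 bits; the second instruction's immediate is signed.
  const int64_t Lo12 = ((Offset & 0xFFF) ^ 0x800) - 0x800;
  // What remains is an exact multiple of 4096 and goes into the AUIPC.
  const int64_t Hi20 = (Offset - Lo12) / 4096;
  assert(Lo12 >= -2048 && Lo12 <= 2047);
  return {Hi20, Lo12};
}

// Address produced when the pair executes with the AUIPC at AuipcAddr.
int64_t resolve(int64_t AuipcAddr, PcrelParts P) {
  return AuipcAddr + P.Hi20 * 4096 + P.Lo12;
}
```

Both halves are derived from the same offset, measured from the AUIPC. Resolving the low half against any other address gives a different result, which is why the expansion needs a label on exactly that instruction, whether a block label or a temporary symbol.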
This then allows the pass to be moved much earlier in the pass pipeline, before both scheduling and register allocation. However, to do so we must leave the MIR in SSA form (by not redefining registers), and so use a virtual register for the intermediate value. By using this virtual register, this pass now has to come before register allocation. Reviewed By: luismarques, asb Differential Revision: https://reviews.llvm.org/D82988 Added: Modified: llvm/include/llvm/CodeGen/MachineBasicBlock.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp llvm/lib/Target/RISCV/RISCVMCInstLower.cpp llvm/lib/Target/RISCV/RISCVTargetMachine.cpp llvm/test/CodeGen/RISCV/codemodel-lowering.ll llvm/test/CodeGen/RISCV/mir-target-flags.ll llvm/test/CodeGen/RISCV/pic-models.ll llvm/test/CodeGen/RISCV/tls-models.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/CodeGen/MachineBasicBlock.h b/llvm/include/llvm/CodeGen/MachineBasicBlock.h index d6cb7211cf70..b69f6584fe6c 100644 --- a/llvm/include/llvm/CodeGen/MachineBasicBlock.h +++ b/llvm/include/llvm/CodeGen/MachineBasicBlock.h @@ -143,10 +143,6 @@ class MachineBasicBlock /// branch. bool AddressTaken = false; - /// Indicate that this basic block needs its symbol be emitted regardless of - /// whether the flow just falls-through to it. - bool LabelMustBeEmitted = false; - /// Indicate that this basic block is the entry block of an EH scope, i.e., /// the block that used to have a catchpad or cleanuppad instruction in the /// LLVM IR. @@ -206,13 +202,6 @@ class MachineBasicBlock /// branch. void setHasAddressTaken() { AddressTaken = true; } - /// Test whether this block must have its label emitted. - bool hasLabelMustBeEmitted() const { return LabelMustBeEmitted; } - - /// Set this block to reflect that, regardless how we flow to it, we need - /// its label be emitted. 
- void setLabelMustBeEmitted() { LabelMustBeEmitted = true; } - /// Return the MachineFunction containing this basic block. const MachineFunction *getParent() const { return xParent; } MachineFunction *getParent() { return xParent; } diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp index 27e9ffe9ea07..4d7c36041398 100644 --- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp @@ -3057,16 +3057,13 @@ void AsmPrinter::emitBasicBlockStart(const MachineBasicBlock &MBB) { if (MBB.pred_empty() || (!MF->hasBBLabels() && isBlockOnlyReachableByFallthrough(&MBB) && - !MBB.isEHFuncletEntry() && !MBB.hasLabelMustBeEmitted())) { + !MBB.isEHFuncletEntry())) { if (isVerbose()) { // NOTE: Want this comment at start of line, don't emit with AddComment. OutStreamer->emitRawComment(" %bb." + Twine(MBB.getNumber()) + ":", false); } } else { - if (isVerbose() && MBB.hasLabelMustBeEmitted()) { - OutStreamer->AddComment("Label of block must be emitted"); - } // Switch to a new section if this basic block must begin a section. 
if (MBB.isBeginSection()) { OutStreamer->SwitchSection( diff --git a/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp index 5dcd294cef04..33db8f231c7d 100644 --- a/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp +++ b/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp @@ -19,6 +19,7 @@ #include "llvm/CodeGen/LivePhysRegs.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineInstrBuilder.h" +#include "llvm/MC/MCContext.h" using namespace llvm; @@ -41,24 +42,18 @@ class RISCVExpandPseudo : public MachineFunctionPass { private: bool expandMBB(MachineBasicBlock &MBB); - bool expandMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI); + bool expandMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI); bool expandAuipcInstPair(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI, unsigned FlagsHi, unsigned SecondOpcode); bool expandLoadLocalAddress(MachineBasicBlock &MBB, - MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI); + MachineBasicBlock::iterator MBBI); bool expandLoadAddress(MachineBasicBlock &MBB, - MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI); + MachineBasicBlock::iterator MBBI); bool expandLoadTLSIEAddress(MachineBasicBlock &MBB, - MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI); + MachineBasicBlock::iterator MBBI); bool expandLoadTLSGDAddress(MachineBasicBlock &MBB, - MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI); + MachineBasicBlock::iterator MBBI); }; char RISCVExpandPseudo::ID = 0; @@ -77,7 +72,7 @@ bool RISCVExpandPseudo::expandMBB(MachineBasicBlock &MBB) { MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end(); while (MBBI != E) { MachineBasicBlock::iterator NMBBI = std::next(MBBI); - Modified |= expandMI(MBB, MBBI, NMBBI); + Modified |= expandMI(MBB, MBBI); MBBI = 
NMBBI; } @@ -85,73 +80,56 @@ bool RISCVExpandPseudo::expandMBB(MachineBasicBlock &MBB) { } bool RISCVExpandPseudo::expandMI(MachineBasicBlock &MBB, - MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI) { + MachineBasicBlock::iterator MBBI) { switch (MBBI->getOpcode()) { case RISCV::PseudoLLA: - return expandLoadLocalAddress(MBB, MBBI, NextMBBI); + return expandLoadLocalAddress(MBB, MBBI); case RISCV::PseudoLA: - return expandLoadAddress(MBB, MBBI, NextMBBI); + return expandLoadAddress(MBB, MBBI); case RISCV::PseudoLA_TLS_IE: - return expandLoadTLSIEAddress(MBB, MBBI, NextMBBI); + return expandLoadTLSIEAddress(MBB, MBBI); case RISCV::PseudoLA_TLS_GD: - return expandLoadTLSGDAddress(MBB, MBBI, NextMBBI); + return expandLoadTLSGDAddress(MBB, MBBI); } return false; } -bool RISCVExpandPseudo::expandAuipcInstPair( - MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI, unsigned FlagsHi, - unsigned SecondOpcode) { +bool RISCVExpandPseudo::expandAuipcInstPair(MachineBasicBlock &MBB, + MachineBasicBlock::iterator MBBI, + unsigned FlagsHi, + unsigned SecondOpcode) { MachineFunction *MF = MBB.getParent(); MachineInstr &MI = *MBBI; DebugLoc DL = MI.getDebugLoc(); Register DestReg = MI.getOperand(0).getReg(); - const MachineOperand &Symbol = MI.getOperand(1); + Register ScratchReg = + MF->getRegInfo().createVirtualRegister(&RISCV::GPRRegClass); - MachineBasicBlock *NewMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock()); + MachineOperand &Symbol = MI.getOperand(1); + Symbol.setTargetFlags(FlagsHi); + MCSymbol *AUIPCSymbol = MF->getContext().createTempSymbol(false); - // Tell AsmPrinter that we unconditionally want the symbol of this label to be - // emitted. 
- NewMBB->setLabelMustBeEmitted(); + MachineInstr *MIAUIPC = + BuildMI(MBB, MBBI, DL, TII->get(RISCV::AUIPC), ScratchReg).add(Symbol); + MIAUIPC->setPreInstrSymbol(*MF, AUIPCSymbol); - MF->insert(++MBB.getIterator(), NewMBB); + BuildMI(MBB, MBBI, DL, TII->get(SecondOpcode), DestReg) + .addReg(ScratchReg) + .addSym(AUIPCSymbol, RISCVII::MO_PCREL_LO); - BuildMI(NewMBB, DL, TII->get(RISCV::AUIPC), DestReg) - .addDisp(Symbol, 0, FlagsHi); - BuildMI(NewMBB, DL, TII->get(SecondOpcode), DestReg) - .addReg(DestReg) - .addMBB(NewMBB, RISCVII::MO_PCREL_LO); - - // Move all the rest of the instructions to NewMBB. - NewMBB->splice(NewMBB->end(), &MBB, std::next(MBBI), MBB.end()); - // Update machine-CFG edges. - NewMBB->transferSuccessorsAndUpdatePHIs(&MBB); - // Make the original basic block fall-through to the new. - MBB.addSuccessor(NewMBB); - - // Make sure live-ins are correctly attached to this new basic block. - LivePhysRegs LiveRegs; - computeAndAddLiveIns(LiveRegs, *NewMBB); - - NextMBBI = MBB.end(); MI.eraseFromParent(); return true; } bool RISCVExpandPseudo::expandLoadLocalAddress( - MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI) { - return expandAuipcInstPair(MBB, MBBI, NextMBBI, RISCVII::MO_PCREL_HI, - RISCV::ADDI); + MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) { + return expandAuipcInstPair(MBB, MBBI, RISCVII::MO_PCREL_HI, RISCV::ADDI); } -bool RISCVExpandPseudo::expandLoadAddress( - MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI) { +bool RISCVExpandPseudo::expandLoadAddress(MachineBasicBlock &MBB, + MachineBasicBlock::iterator MBBI) { MachineFunction *MF = MBB.getParent(); unsigned SecondOpcode; @@ -164,25 +142,21 @@ bool RISCVExpandPseudo::expandLoadAddress( SecondOpcode = RISCV::ADDI; FlagsHi = RISCVII::MO_PCREL_HI; } - return expandAuipcInstPair(MBB, MBBI, NextMBBI, FlagsHi, SecondOpcode); + return expandAuipcInstPair(MBB, MBBI, FlagsHi, 
SecondOpcode); } bool RISCVExpandPseudo::expandLoadTLSIEAddress( - MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI) { + MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) { MachineFunction *MF = MBB.getParent(); const auto &STI = MF->getSubtarget<RISCVSubtarget>(); unsigned SecondOpcode = STI.is64Bit() ? RISCV::LD : RISCV::LW; - return expandAuipcInstPair(MBB, MBBI, NextMBBI, RISCVII::MO_TLS_GOT_HI, - SecondOpcode); + return expandAuipcInstPair(MBB, MBBI, RISCVII::MO_TLS_GOT_HI, SecondOpcode); } bool RISCVExpandPseudo::expandLoadTLSGDAddress( - MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, - MachineBasicBlock::iterator &NextMBBI) { - return expandAuipcInstPair(MBB, MBBI, NextMBBI, RISCVII::MO_TLS_GD_HI, - RISCV::ADDI); + MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) { + return expandAuipcInstPair(MBB, MBBI, RISCVII::MO_TLS_GD_HI, RISCV::ADDI); } } // end of anonymous namespace diff --git a/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp b/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp index b1dbcfa7f738..8ddcf757c97e 100644 --- a/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp +++ b/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp @@ -121,6 +121,9 @@ bool llvm::LowerRISCVMachineOperandToMCOperand(const MachineOperand &MO, case MachineOperand::MO_ConstantPoolIndex: MCOp = lowerSymbolOperand(MO, AP.GetCPISymbol(MO.getIndex()), AP); break; + case MachineOperand::MO_MCSymbol: + MCOp = lowerSymbolOperand(MO, MO.getMCSymbol(), AP); + break; } return true; } diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp index 75683e2fd8e9..63f607a9c352 100644 --- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp +++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp @@ -173,7 +173,6 @@ void RISCVPassConfig::addPreSched2() {} void RISCVPassConfig::addPreEmitPass() { addPass(&BranchRelaxationPassID); } void RISCVPassConfig::addPreEmitPass2() { - addPass(createRISCVExpandPseudoPass()); //
Schedule the expansion of AMOs at the last possible moment, avoiding the // possibility for other passes to break the requirements for forward // progress in the LR/SC block. @@ -181,5 +180,6 @@ void RISCVPassConfig::addPreEmitPass2() { } void RISCVPassConfig::addPreRegAlloc() { + addPass(createRISCVExpandPseudoPass()); addPass(createRISCVMergeBaseOffsetOptPass()); } diff --git a/llvm/test/CodeGen/RISCV/codemodel-lowering.ll b/llvm/test/CodeGen/RISCV/codemodel-lowering.ll index 6c172a26f050..84774feccf12 100644 --- a/llvm/test/CodeGen/RISCV/codemodel-lowering.ll +++ b/llvm/test/CodeGen/RISCV/codemodel-lowering.ll @@ -16,9 +16,9 @@ define i32 @lower_global(i32 %a) nounwind { ; ; RV32I-MEDIUM-LABEL: lower_global: ; RV32I-MEDIUM: # %bb.0: -; RV32I-MEDIUM-NEXT: .LBB0_1: # Label of block must be emitted +; RV32I-MEDIUM-NEXT: .Ltmp0: ; RV32I-MEDIUM-NEXT: auipc a0, %pcrel_hi(G) -; RV32I-MEDIUM-NEXT: addi a0, a0, %pcrel_lo(.LBB0_1) +; RV32I-MEDIUM-NEXT: addi a0, a0, %pcrel_lo(.Ltmp0) ; RV32I-MEDIUM-NEXT: lw a0, 0(a0) ; RV32I-MEDIUM-NEXT: ret %1 = load volatile i32, i32* @G @@ -39,9 +39,9 @@ define void @lower_blockaddress() nounwind { ; ; RV32I-MEDIUM-LABEL: lower_blockaddress: ; RV32I-MEDIUM: # %bb.0: -; RV32I-MEDIUM-NEXT: .LBB1_1: # Label of block must be emitted +; RV32I-MEDIUM-NEXT: .Ltmp1: ; RV32I-MEDIUM-NEXT: auipc a0, %pcrel_hi(addr) -; RV32I-MEDIUM-NEXT: addi a0, a0, %pcrel_lo(.LBB1_1) +; RV32I-MEDIUM-NEXT: addi a0, a0, %pcrel_lo(.Ltmp1) ; RV32I-MEDIUM-NEXT: addi a1, zero, 1 ; RV32I-MEDIUM-NEXT: sw a1, 0(a0) ; RV32I-MEDIUM-NEXT: ret @@ -82,17 +82,16 @@ define signext i32 @lower_blockaddress_displ(i32 signext %w) nounwind { ; RV32I-MEDIUM: # %bb.0: # %entry ; RV32I-MEDIUM-NEXT: addi sp, sp, -16 ; RV32I-MEDIUM-NEXT: sw ra, 12(sp) -; RV32I-MEDIUM-NEXT: .LBB2_5: # %entry -; RV32I-MEDIUM-NEXT: # Label of block must be emitted -; RV32I-MEDIUM-NEXT: auipc a1, %pcrel_hi(.Ltmp0) -; RV32I-MEDIUM-NEXT: addi a1, a1, %pcrel_lo(.LBB2_5) +; RV32I-MEDIUM-NEXT: .Ltmp2: +; 
RV32I-MEDIUM-NEXT: auipc a1, %pcrel_hi(.Ltmp3) +; RV32I-MEDIUM-NEXT: addi a1, a1, %pcrel_lo(.Ltmp2) ; RV32I-MEDIUM-NEXT: addi a2, zero, 101 ; RV32I-MEDIUM-NEXT: sw a1, 8(sp) ; RV32I-MEDIUM-NEXT: blt a0, a2, .LBB2_3 ; RV32I-MEDIUM-NEXT: # %bb.1: # %if.then ; RV32I-MEDIUM-NEXT: lw a0, 8(sp) ; RV32I-MEDIUM-NEXT: jr a0 -; RV32I-MEDIUM-NEXT: .Ltmp0: # Block address taken +; RV32I-MEDIUM-NEXT: .Ltmp3: # Block address taken ; RV32I-MEDIUM-NEXT: .LBB2_2: # %return ; RV32I-MEDIUM-NEXT: addi a0, zero, 4 ; RV32I-MEDIUM-NEXT: j .LBB2_4 @@ -140,9 +139,9 @@ define float @lower_constantpool(float %a) nounwind { ; ; RV32I-MEDIUM-LABEL: lower_constantpool: ; RV32I-MEDIUM: # %bb.0: -; RV32I-MEDIUM-NEXT: .LBB3_1: # Label of block must be emitted +; RV32I-MEDIUM-NEXT: .Ltmp4: ; RV32I-MEDIUM-NEXT: auipc a1, %pcrel_hi(.LCPI3_0) -; RV32I-MEDIUM-NEXT: addi a1, a1, %pcrel_lo(.LBB3_1) +; RV32I-MEDIUM-NEXT: addi a1, a1, %pcrel_lo(.Ltmp4) ; RV32I-MEDIUM-NEXT: flw ft0, 0(a1) ; RV32I-MEDIUM-NEXT: fmv.w.x ft1, a0 ; RV32I-MEDIUM-NEXT: fadd.s ft0, ft1, ft0 diff --git a/llvm/test/CodeGen/RISCV/mir-target-flags.ll b/llvm/test/CodeGen/RISCV/mir-target-flags.ll index f41fb77dbb00..b1bf935c4e3b 100644 --- a/llvm/test/CodeGen/RISCV/mir-target-flags.ll +++ b/llvm/test/CodeGen/RISCV/mir-target-flags.ll @@ -27,11 +27,11 @@ define i32 @caller(i32 %a) nounwind { ; RV32-SMALL-NEXT: target-flags(riscv-hi) @g_i ; RV32-SMALL-NEXT: target-flags(riscv-lo) @g_i ; RV32-SMALL: target-flags(riscv-tls-got-hi) @t_un -; RV32-SMALL-NEXT: target-flags(riscv-pcrel-lo) %bb.1 +; RV32-SMALL-NEXT: target-flags(riscv-pcrel-lo) ; RV32-SMALL: target-flags(riscv-tls-got-hi) @t_ld -; RV32-SMALL-NEXT: target-flags(riscv-pcrel-lo) %bb.2 +; RV32-SMALL-NEXT: target-flags(riscv-pcrel-lo) ; RV32-SMALL: target-flags(riscv-tls-got-hi) @t_ie -; RV32-SMALL-NEXT: target-flags(riscv-pcrel-lo) %bb.3 +; RV32-SMALL-NEXT: target-flags(riscv-pcrel-lo) ; RV32-SMALL: target-flags(riscv-tprel-hi) @t_le ; RV32-SMALL-NEXT: target-flags(riscv-tprel-add) 
@t_le ; RV32-SMALL-NEXT: target-flags(riscv-tprel-lo) @t_le @@ -39,17 +39,17 @@ define i32 @caller(i32 %a) nounwind { ; ; RV32-MED-LABEL: name: caller ; RV32-MED: target-flags(riscv-got-hi) @g_e -; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) %bb.1 +; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) ; RV32-MED: target-flags(riscv-pcrel-hi) @g_i -; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) %bb.2 +; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) ; RV32-MED: target-flags(riscv-tls-gd-hi) @t_un -; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) %bb.3 -; RV32-MED-NEXT: target-flags(riscv-plt) &__tls_get_addr +; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) +; RV32-MED: target-flags(riscv-plt) &__tls_get_addr ; RV32-MED: target-flags(riscv-tls-gd-hi) @t_ld -; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) %bb.4 -; RV32-MED-NEXT: target-flags(riscv-plt) &__tls_get_addr +; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) +; RV32-MED: target-flags(riscv-plt) &__tls_get_addr ; RV32-MED: target-flags(riscv-tls-got-hi) @t_ie -; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) %bb.5 +; RV32-MED-NEXT: target-flags(riscv-pcrel-lo) ; RV32-MED: target-flags(riscv-tprel-hi) @t_le ; RV32-MED-NEXT: target-flags(riscv-tprel-add) @t_le ; RV32-MED-NEXT: target-flags(riscv-tprel-lo) @t_le diff --git a/llvm/test/CodeGen/RISCV/pic-models.ll b/llvm/test/CodeGen/RISCV/pic-models.ll index 8d835ae99f40..46e9cee57d79 100644 --- a/llvm/test/CodeGen/RISCV/pic-models.ll +++ b/llvm/test/CodeGen/RISCV/pic-models.ll @@ -26,10 +26,9 @@ define i32* @f1() nounwind { ; ; RV32-PIC-LABEL: f1: ; RV32-PIC: # %bb.0: # %entry -; RV32-PIC-NEXT: .LBB0_1: # %entry -; RV32-PIC-NEXT: # Label of block must be emitted +; RV32-PIC-NEXT: .Ltmp0: ; RV32-PIC-NEXT: auipc a0, %got_pcrel_hi(external_var) -; RV32-PIC-NEXT: lw a0, %pcrel_lo(.LBB0_1)(a0) +; RV32-PIC-NEXT: lw a0, %pcrel_lo(.Ltmp0)(a0) ; RV32-PIC-NEXT: ret ; ; RV64-STATIC-LABEL: f1: @@ -40,10 +39,9 @@ define i32* @f1() nounwind { ; ; RV64-PIC-LABEL: f1: ; RV64-PIC: # %bb.0: # %entry -; 
RV64-PIC-NEXT: .LBB0_1: # %entry -; RV64-PIC-NEXT: # Label of block must be emitted +; RV64-PIC-NEXT: .Ltmp0: ; RV64-PIC-NEXT: auipc a0, %got_pcrel_hi(external_var) -; RV64-PIC-NEXT: ld a0, %pcrel_lo(.LBB0_1)(a0) +; RV64-PIC-NEXT: ld a0, %pcrel_lo(.Ltmp0)(a0) ; RV64-PIC-NEXT: ret entry: ret i32* @external_var @@ -61,10 +59,9 @@ define i32* @f2() nounwind { ; ; RV32-PIC-LABEL: f2: ; RV32-PIC: # %bb.0: # %entry -; RV32-PIC-NEXT: .LBB1_1: # %entry -; RV32-PIC-NEXT: # Label of block must be emitted +; RV32-PIC-NEXT: .Ltmp1: ; RV32-PIC-NEXT: auipc a0, %pcrel_hi(internal_var) -; RV32-PIC-NEXT: addi a0, a0, %pcrel_lo(.LBB1_1) +; RV32-PIC-NEXT: addi a0, a0, %pcrel_lo(.Ltmp1) ; RV32-PIC-NEXT: ret ; ; RV64-STATIC-LABEL: f2: @@ -75,10 +72,9 @@ define i32* @f2() nounwind { ; ; RV64-PIC-LABEL: f2: ; RV64-PIC: # %bb.0: # %entry -; RV64-PIC-NEXT: .LBB1_1: # %entry -; RV64-PIC-NEXT: # Label of block must be emitted +; RV64-PIC-NEXT: .Ltmp1: ; RV64-PIC-NEXT: auipc a0, %pcrel_hi(internal_var) -; RV64-PIC-NEXT: addi a0, a0, %pcrel_lo(.LBB1_1) +; RV64-PIC-NEXT: addi a0, a0, %pcrel_lo(.Ltmp1) ; RV64-PIC-NEXT: ret entry: ret i32* @internal_var diff --git a/llvm/test/CodeGen/RISCV/tls-models.ll b/llvm/test/CodeGen/RISCV/tls-models.ll index 25a2f71beb31..27f63ff33674 100644 --- a/llvm/test/CodeGen/RISCV/tls-models.ll +++ b/llvm/test/CodeGen/RISCV/tls-models.ll @@ -23,10 +23,9 @@ define i32* @f1() nounwind { ; RV32-PIC: # %bb.0: # %entry ; RV32-PIC-NEXT: addi sp, sp, -16 ; RV32-PIC-NEXT: sw ra, 12(sp) -; RV32-PIC-NEXT: .LBB0_1: # %entry -; RV32-PIC-NEXT: # Label of block must be emitted +; RV32-PIC-NEXT: .Ltmp0: ; RV32-PIC-NEXT: auipc a0, %tls_gd_pcrel_hi(unspecified) -; RV32-PIC-NEXT: addi a0, a0, %pcrel_lo(.LBB0_1) +; RV32-PIC-NEXT: addi a0, a0, %pcrel_lo(.Ltmp0) ; RV32-PIC-NEXT: call __tls_get_addr at plt ; RV32-PIC-NEXT: lw ra, 12(sp) ; RV32-PIC-NEXT: addi sp, sp, 16 @@ -36,10 +35,9 @@ define i32* @f1() nounwind { ; RV64-PIC: # %bb.0: # %entry ; RV64-PIC-NEXT: addi sp, sp, -16 ; 
RV64-PIC-NEXT: sd ra, 8(sp) -; RV64-PIC-NEXT: .LBB0_1: # %entry -; RV64-PIC-NEXT: # Label of block must be emitted +; RV64-PIC-NEXT: .Ltmp0: ; RV64-PIC-NEXT: auipc a0, %tls_gd_pcrel_hi(unspecified) -; RV64-PIC-NEXT: addi a0, a0, %pcrel_lo(.LBB0_1) +; RV64-PIC-NEXT: addi a0, a0, %pcrel_lo(.Ltmp0) ; RV64-PIC-NEXT: call __tls_get_addr at plt ; RV64-PIC-NEXT: ld ra, 8(sp) ; RV64-PIC-NEXT: addi sp, sp, 16 @@ -47,19 +45,17 @@ define i32* @f1() nounwind { ; ; RV32-NOPIC-LABEL: f1: ; RV32-NOPIC: # %bb.0: # %entry -; RV32-NOPIC-NEXT: .LBB0_1: # %entry -; RV32-NOPIC-NEXT: # Label of block must be emitted +; RV32-NOPIC-NEXT: .Ltmp0: ; RV32-NOPIC-NEXT: auipc a0, %tls_ie_pcrel_hi(unspecified) -; RV32-NOPIC-NEXT: lw a0, %pcrel_lo(.LBB0_1)(a0) +; RV32-NOPIC-NEXT: lw a0, %pcrel_lo(.Ltmp0)(a0) ; RV32-NOPIC-NEXT: add a0, a0, tp ; RV32-NOPIC-NEXT: ret ; ; RV64-NOPIC-LABEL: f1: ; RV64-NOPIC: # %bb.0: # %entry -; RV64-NOPIC-NEXT: .LBB0_1: # %entry -; RV64-NOPIC-NEXT: # Label of block must be emitted +; RV64-NOPIC-NEXT: .Ltmp0: ; RV64-NOPIC-NEXT: auipc a0, %tls_ie_pcrel_hi(unspecified) -; RV64-NOPIC-NEXT: ld a0, %pcrel_lo(.LBB0_1)(a0) +; RV64-NOPIC-NEXT: ld a0, %pcrel_lo(.Ltmp0)(a0) ; RV64-NOPIC-NEXT: add a0, a0, tp ; RV64-NOPIC-NEXT: ret entry: @@ -74,10 +70,9 @@ define i32* @f2() nounwind { ; RV32-PIC: # %bb.0: # %entry ; RV32-PIC-NEXT: addi sp, sp, -16 ; RV32-PIC-NEXT: sw ra, 12(sp) -; RV32-PIC-NEXT: .LBB1_1: # %entry -; RV32-PIC-NEXT: # Label of block must be emitted +; RV32-PIC-NEXT: .Ltmp1: ; RV32-PIC-NEXT: auipc a0, %tls_gd_pcrel_hi(ld) -; RV32-PIC-NEXT: addi a0, a0, %pcrel_lo(.LBB1_1) +; RV32-PIC-NEXT: addi a0, a0, %pcrel_lo(.Ltmp1) ; RV32-PIC-NEXT: call __tls_get_addr at plt ; RV32-PIC-NEXT: lw ra, 12(sp) ; RV32-PIC-NEXT: addi sp, sp, 16 @@ -87,10 +82,9 @@ define i32* @f2() nounwind { ; RV64-PIC: # %bb.0: # %entry ; RV64-PIC-NEXT: addi sp, sp, -16 ; RV64-PIC-NEXT: sd ra, 8(sp) -; RV64-PIC-NEXT: .LBB1_1: # %entry -; RV64-PIC-NEXT: # Label of block must be emitted +; 
RV64-PIC-NEXT: .Ltmp1: ; RV64-PIC-NEXT: auipc a0, %tls_gd_pcrel_hi(ld) -; RV64-PIC-NEXT: addi a0, a0, %pcrel_lo(.LBB1_1) +; RV64-PIC-NEXT: addi a0, a0, %pcrel_lo(.Ltmp1) ; RV64-PIC-NEXT: call __tls_get_addr at plt ; RV64-PIC-NEXT: ld ra, 8(sp) ; RV64-PIC-NEXT: addi sp, sp, 16 @@ -98,19 +92,17 @@ define i32* @f2() nounwind { ; ; RV32-NOPIC-LABEL: f2: ; RV32-NOPIC: # %bb.0: # %entry -; RV32-NOPIC-NEXT: .LBB1_1: # %entry -; RV32-NOPIC-NEXT: # Label of block must be emitted +; RV32-NOPIC-NEXT: .Ltmp1: ; RV32-NOPIC-NEXT: auipc a0, %tls_ie_pcrel_hi(ld) -; RV32-NOPIC-NEXT: lw a0, %pcrel_lo(.LBB1_1)(a0) +; RV32-NOPIC-NEXT: lw a0, %pcrel_lo(.Ltmp1)(a0) ; RV32-NOPIC-NEXT: add a0, a0, tp ; RV32-NOPIC-NEXT: ret ; ; RV64-NOPIC-LABEL: f2: ; RV64-NOPIC: # %bb.0: # %entry -; RV64-NOPIC-NEXT: .LBB1_1: # %entry -; RV64-NOPIC-NEXT: # Label of block must be emitted +; RV64-NOPIC-NEXT: .Ltmp1: ; RV64-NOPIC-NEXT: auipc a0, %tls_ie_pcrel_hi(ld) -; RV64-NOPIC-NEXT: ld a0, %pcrel_lo(.LBB1_1)(a0) +; RV64-NOPIC-NEXT: ld a0, %pcrel_lo(.Ltmp1)(a0) ; RV64-NOPIC-NEXT: add a0, a0, tp ; RV64-NOPIC-NEXT: ret entry: @@ -123,37 +115,33 @@ entry: define i32* @f3() nounwind { ; RV32-PIC-LABEL: f3: ; RV32-PIC: # %bb.0: # %entry -; RV32-PIC-NEXT: .LBB2_1: # %entry -; RV32-PIC-NEXT: # Label of block must be emitted +; RV32-PIC-NEXT: .Ltmp2: ; RV32-PIC-NEXT: auipc a0, %tls_ie_pcrel_hi(ie) -; RV32-PIC-NEXT: lw a0, %pcrel_lo(.LBB2_1)(a0) +; RV32-PIC-NEXT: lw a0, %pcrel_lo(.Ltmp2)(a0) ; RV32-PIC-NEXT: add a0, a0, tp ; RV32-PIC-NEXT: ret ; ; RV64-PIC-LABEL: f3: ; RV64-PIC: # %bb.0: # %entry -; RV64-PIC-NEXT: .LBB2_1: # %entry -; RV64-PIC-NEXT: # Label of block must be emitted +; RV64-PIC-NEXT: .Ltmp2: ; RV64-PIC-NEXT: auipc a0, %tls_ie_pcrel_hi(ie) -; RV64-PIC-NEXT: ld a0, %pcrel_lo(.LBB2_1)(a0) +; RV64-PIC-NEXT: ld a0, %pcrel_lo(.Ltmp2)(a0) ; RV64-PIC-NEXT: add a0, a0, tp ; RV64-PIC-NEXT: ret ; ; RV32-NOPIC-LABEL: f3: ; RV32-NOPIC: # %bb.0: # %entry -; RV32-NOPIC-NEXT: .LBB2_1: # %entry -; RV32-NOPIC-NEXT: # 
Label of block must be emitted +; RV32-NOPIC-NEXT: .Ltmp2: ; RV32-NOPIC-NEXT: auipc a0, %tls_ie_pcrel_hi(ie) -; RV32-NOPIC-NEXT: lw a0, %pcrel_lo(.LBB2_1)(a0) +; RV32-NOPIC-NEXT: lw a0, %pcrel_lo(.Ltmp2)(a0) ; RV32-NOPIC-NEXT: add a0, a0, tp ; RV32-NOPIC-NEXT: ret ; ; RV64-NOPIC-LABEL: f3: ; RV64-NOPIC: # %bb.0: # %entry -; RV64-NOPIC-NEXT: .LBB2_1: # %entry -; RV64-NOPIC-NEXT: # Label of block must be emitted +; RV64-NOPIC-NEXT: .Ltmp2: ; RV64-NOPIC-NEXT: auipc a0, %tls_ie_pcrel_hi(ie) -; RV64-NOPIC-NEXT: ld a0, %pcrel_lo(.LBB2_1)(a0) +; RV64-NOPIC-NEXT: ld a0, %pcrel_lo(.Ltmp2)(a0) ; RV64-NOPIC-NEXT: add a0, a0, tp ; RV64-NOPIC-NEXT: ret entry: From llvm-commits at lists.llvm.org Thu Jul 9 05:55:02 2020 From: llvm-commits at lists.llvm.org (Sam Elliott via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 12:55:02 +0000 (UTC) Subject: [PATCH] D82988: [RISCV] Avoid Splitting MBB in RISCVExpandPseudo In-Reply-To: References: Message-ID: <033057738b8af174c2024cfe91071ea4@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG97106f9d80f6: [RISCV] Avoid Splitting MBB in RISCVExpandPseudo (authored by lenary). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82988/new/ https://reviews.llvm.org/D82988 Files: llvm/include/llvm/CodeGen/MachineBasicBlock.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp llvm/lib/Target/RISCV/RISCVMCInstLower.cpp llvm/lib/Target/RISCV/RISCVTargetMachine.cpp llvm/test/CodeGen/RISCV/codemodel-lowering.ll llvm/test/CodeGen/RISCV/mir-target-flags.ll llvm/test/CodeGen/RISCV/pic-models.ll llvm/test/CodeGen/RISCV/tls-models.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82988.276718.patch
Type: text/x-patch
Size: 23228 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 05:56:24 2020
From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 12:56:24 +0000 (UTC)
Subject: [PATCH] D82363: [DebugInfo] Add new instruction and expression operator for variadic debug values
In-Reply-To:
References:
Message-ID:

StephenTozer updated this revision to Diff 276717.
StephenTozer added a comment.

Update: Duplicate registers in the debug operands could be an issue with the previous patch version, and duplicates aren't easy to prevent, so always treat DBG_VALUE_LISTs as potentially having them.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82363/new/

https://reviews.llvm.org/D82363

Files:
  llvm/include/llvm/BinaryFormat/Dwarf.h
  llvm/include/llvm/CodeGen/MachineInstr.h
  llvm/include/llvm/CodeGen/MachineInstrBuilder.h
  llvm/include/llvm/IR/DebugInfoMetadata.h
  llvm/include/llvm/Support/TargetOpcodes.def
  llvm/include/llvm/Target/Target.td
  llvm/lib/BinaryFormat/Dwarf.cpp
  llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
  llvm/lib/CodeGen/InlineSpiller.cpp
  llvm/lib/CodeGen/LiveRangeShrink.cpp
  llvm/lib/CodeGen/MIRParser/MIParser.cpp
  llvm/lib/CodeGen/MachineInstr.cpp
  llvm/lib/CodeGen/MachineRegisterInfo.cpp
  llvm/lib/CodeGen/PrologEpilogInserter.cpp
  llvm/lib/CodeGen/RegAllocFast.cpp
  llvm/lib/IR/DebugInfoMetadata.cpp
  llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
  llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp
  llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
  llvm/lib/Target/X86/X86OptimizeLEAs.cpp
  llvm/test/CodeGen/MIR/Generic/dbg-value-list-spill.mir
  llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D82363.276717.patch
Type: text/x-patch
Size: 49321 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 05:57:55 2020
From: llvm-commits at lists.llvm.org (Sander de Smalen via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 12:57:55 +0000 (UTC)
Subject: [PATCH] D83203: [CodeGen] Fix warnings in SelectionDAG::SplitVector
In-Reply-To:
References:
Message-ID:

sdesmalen added a comment.

Can you please clarify the title/summary a bit more before committing the patch?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:9628
   Hi = getNode(ISD::EXTRACT_SUBVECTOR, DL, HiVT, N,
-               getVectorIdxConstant(LoVT.getVectorNumElements(), DL));
+               getVectorIdxConstant(LoVT.getVectorMinNumElements(), DL));
   return std::make_pair(Lo, Hi);
----------------
nit: can you add a comment explaining why `getVectorMinNumElements()` is valid here?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83203/new/

https://reviews.llvm.org/D83203

From llvm-commits at lists.llvm.org Thu Jul 9 05:58:24 2020
From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 12:58:24 +0000 (UTC)
Subject: [PATCH] D83479: [COFF] Error on unexpected .pdata size
Message-ID:

hans created this revision.
hans added reviewers: thakis, MaskRay.

Previously, lld would crash if the .pdata size is not an even multiple of the expected .pdata entry size. This makes it error instead.

(We hit this in Chromium due to an assembler problem: https://crbug.com/1101577)

https://reviews.llvm.org/D83479

Files:
  lld/COFF/Writer.cpp
  lld/test/COFF/pdata-arm64-bad.yaml

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83479.276719.patch
Type: text/x-patch
Size: 4587 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 06:06:25 2020
From: llvm-commits at lists.llvm.org (Ding Fei via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 13:06:25 +0000 (UTC)
Subject: [PATCH] D83321: [Support] Fix utf16 path's index upper bound
In-Reply-To:
References:
Message-ID:

danix800 added a comment.

It's just so obvious. The unit-test `TEST_F(FileSystemTest, widenPath)` in `llvm/unittests/Support/Path.cpp:2080-2146` has already demonstrated this.

  #ifdef _WIN32
  TEST_F(FileSystemTest, widenPath) {
    const std::wstring LongPathPrefix(L"\\\\?\\");

    // Test that the length limit is checked against the UTF-16 length and not the
    // UTF-8 length.
    std::string Input("C:\\foldername\\");
    const std::string Pi("\xcf\x80"); // UTF-8 lower case pi.
    // Add Pi up to the MAX_PATH limit.
    const size_t NumChars = MAX_PATH - Input.size() - 1;
    for (size_t i = 0; i < NumChars; ++i)
      Input += Pi;
    // Check that UTF-8 length already exceeds MAX_PATH.
    EXPECT_TRUE(Input.size() > MAX_PATH);
    SmallVector Result;
    ASSERT_NO_ERROR(windows::widenPath(Input, Result));
    // Result should not start with the long path prefix.
    EXPECT_TRUE(std::wmemcmp(Result.data(), LongPathPrefix.c_str(),
                             LongPathPrefix.size()) != 0);
    EXPECT_EQ(Result.size(), (size_t)MAX_PATH - 1);

    // Add another Pi to exceed the MAX_PATH limit.

`Input.size() > MAX_PATH` is expected to be true and `Result.size()` is expected to be `MAX_PATH - 1`, meaning that `PathUTF16[Path.size() - 1]` is an out-of-bound array access.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83321/new/

https://reviews.llvm.org/D83321

From llvm-commits at lists.llvm.org Thu Jul 9 06:09:38 2020
From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits)
Date: Thu, 09 Jul 2020 06:09:38 -0700 (PDT)
Subject: [llvm] 4597bfd - BasicAAResult::constantOffsetHeuristic - pass APInt arg as const reference. NFCI.
Message-ID: <5f071712.1c69fb81.6df42.73cc@mx.google.com> Author: Simon Pilgrim Date: 2020-07-09T14:09:24+01:00 New Revision: 4597bfddf11b7d9dcf373525d0bc889bcc212b13 URL: https://github.com/llvm/llvm-project/commit/4597bfddf11b7d9dcf373525d0bc889bcc212b13 DIFF: https://github.com/llvm/llvm-project/commit/4597bfddf11b7d9dcf373525d0bc889bcc212b13.diff LOG: BasicAAResult::constantOffsetHeuristic - pass APInt arg as const reference. NFCI. Avoids unnecessary APInt copies and silences clang tidy warning. Added: Modified: llvm/include/llvm/Analysis/BasicAliasAnalysis.h llvm/lib/Analysis/BasicAliasAnalysis.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h index 403510dbb3a3..9214bfcd7a24 100644 --- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h +++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h @@ -189,7 +189,7 @@ class BasicAAResult : public AAResultBase { bool constantOffsetHeuristic(const SmallVectorImpl &VarIndices, LocationSize V1Size, LocationSize V2Size, - APInt BaseOffset, AssumptionCache *AC, + const APInt &BaseOffset, AssumptionCache *AC, DominatorTree *DT); bool isValueEqualInPotentialCycles(const Value *V1, const Value *V2); diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp index 86f8932490e6..74664098ce1d 100644 --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp @@ -433,7 +433,7 @@ static bool isObjectSize(const Value *V, uint64_t Size, const DataLayout &DL, /// an issue, for example, in particular for 32b pointers with negative indices /// that rely on two's complement wrap-arounds for precise alias information /// where the maximum pointer size is 64b. 
-static APInt adjustToPointerSize(APInt Offset, unsigned PointerSize) { +static APInt adjustToPointerSize(const APInt &Offset, unsigned PointerSize) { assert(PointerSize <= Offset.getBitWidth() && "Invalid PointerSize!"); unsigned ShiftBits = Offset.getBitWidth() - PointerSize; return (Offset << ShiftBits).ashr(ShiftBits); @@ -1993,7 +1993,7 @@ void BasicAAResult::GetIndexDifference( bool BasicAAResult::constantOffsetHeuristic( const SmallVectorImpl &VarIndices, - LocationSize MaybeV1Size, LocationSize MaybeV2Size, APInt BaseOffset, + LocationSize MaybeV1Size, LocationSize MaybeV2Size, const APInt &BaseOffset, AssumptionCache *AC, DominatorTree *DT) { if (VarIndices.size() != 2 || MaybeV1Size == LocationSize::unknown() || MaybeV2Size == LocationSize::unknown()) From llvm-commits at lists.llvm.org Thu Jul 9 06:09:40 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Thu, 09 Jul 2020 06:09:40 -0700 (PDT) Subject: [llvm] f54402b - [X86][AVX] Attempt to fold extract_subvector(shuffle(X)) -> extract_subvector(X) Message-ID: <5f071714.1c69fb81.3b07c.69d1@mx.google.com> Author: Simon Pilgrim Date: 2020-07-09T14:09:24+01:00 New Revision: f54402b63a4f5b0b4b15e0f82ce8ff8501b206e6 URL: https://github.com/llvm/llvm-project/commit/f54402b63a4f5b0b4b15e0f82ce8ff8501b206e6 DIFF: https://github.com/llvm/llvm-project/commit/f54402b63a4f5b0b4b15e0f82ce8ff8501b206e6.diff LOG: [X86][AVX] Attempt to fold extract_subvector(shuffle(X)) -> extract_subvector(X) If we're extracting a subvector from a shuffle that is shuffling entire subvectors we can peek through and extract the subvector from the shuffle source instead. This helps remove some cases where concat_vectors(extract_subvector(),extract_subvector()) legalizations has resulted in BLEND/VPERM2F128 shuffles of the subvectors. 
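
The index arithmetic behind the fold described in the log message can be sketched in isolation. The following is a standalone illustration, not the code from the patch: it assumes the shuffle mask has already been scaled so that every entry refers to a whole subvector, and that negative entries stand in for the undef/zero sentinels, which the real combine handles separately. `SubvecRef` and `peekThroughShuffle` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical result type: which shuffle input to read, and which
// subvector of that input.
struct SubvecRef {
  unsigned Input;
  unsigned SubVec;
};

// ScaledMask holds one entry per result subvector. A non-negative entry M
// selects subvector (M % NumSubVecs) of input (M / NumSubVecs). Negative
// entries model sentinel values and are not resolved here.
std::optional<SubvecRef> peekThroughShuffle(const std::vector<int> &ScaledMask,
                                            unsigned NumSubVecs,
                                            unsigned SubVecIdx) {
  assert(NumSubVecs != 0 && "need at least one subvector per input");
  assert(SubVecIdx < ScaledMask.size() && "subvector index out of range");
  int M = ScaledMask[SubVecIdx];
  if (M < 0)
    return std::nullopt; // Sentinel entry: no source subvector to name.
  unsigned U = static_cast<unsigned>(M);
  return SubvecRef{U / NumSubVecs, U % NumSubVecs};
}
```

For instance, a two-input shuffle of 256-bit vectors that takes the upper half of input 0 and the lower half of input 1 has the scaled mask {1, 2} with two subvectors per input; extracting result subvector 0 then resolves to subvector 1 of input 0, so the extract can read that input directly rather than the shuffle result.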
Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avx-vperm2x128.ll llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll llvm/test/CodeGen/X86/known-signbits-vector.ll llvm/test/CodeGen/X86/packss.ll llvm/test/CodeGen/X86/var-permute-256.ll llvm/test/CodeGen/X86/vector-pack-256.ll llvm/test/CodeGen/X86/x86-interleaved-access.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 9f3321922d6a..2d6a0c731862 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -48304,6 +48304,31 @@ static SDValue combineExtractSubvector(SDNode *N, SelectionDAG &DAG, InVec.getOperand(0).getValueType() == VT) return InVec.getOperand(0); + // Attempt to extract from the source of a shuffle vector. + if ((InVecVT.getSizeInBits() % VT.getSizeInBits()) == 0 && + (IdxVal % VT.getVectorNumElements()) == 0) { + SmallVector ShuffleMask; + SmallVector ScaledMask; + SmallVector ShuffleInputs; + unsigned NumSubVecs = InVecVT.getSizeInBits() / VT.getSizeInBits(); + // Decode the shuffle mask and scale it so its shuffling subvectors. 
+ if (getTargetShuffleInputs(InVecBC, ShuffleInputs, ShuffleMask, DAG) && + scaleShuffleElements(ShuffleMask, NumSubVecs, ScaledMask)) { + unsigned SubVecIdx = IdxVal / VT.getVectorNumElements(); + if (ScaledMask[SubVecIdx] == SM_SentinelUndef) + return DAG.getUNDEF(VT); + if (ScaledMask[SubVecIdx] == SM_SentinelZero) + return getZeroVector(VT, Subtarget, DAG, SDLoc(N)); + SDValue Src = ShuffleInputs[ScaledMask[SubVecIdx] / NumSubVecs]; + if (Src.getValueSizeInBits() == InVecVT.getSizeInBits()) { + unsigned SrcSubVecIdx = ScaledMask[SubVecIdx] % NumSubVecs; + unsigned SrcEltIdx = SrcSubVecIdx * VT.getVectorNumElements(); + return extractSubVector(DAG.getBitcast(InVecVT, Src), SrcEltIdx, DAG, + SDLoc(N), VT.getSizeInBits()); + } + } + } + // If we're extracting the lowest subvector and we're the only user, // we may be able to perform this with a smaller vector width. if (IdxVal == 0 && InVec.hasOneUse()) { diff --git a/llvm/test/CodeGen/X86/avx-vperm2x128.ll b/llvm/test/CodeGen/X86/avx-vperm2x128.ll index 27edb3155a39..2abca6ea7fe9 100644 --- a/llvm/test/CodeGen/X86/avx-vperm2x128.ll +++ b/llvm/test/CodeGen/X86/avx-vperm2x128.ll @@ -603,9 +603,8 @@ entry: define <4 x i64> @ld0_hi0_lo1_4i64(<4 x i64> * %pa, <4 x i64> %b) nounwind uwtable readnone ssp { ; AVX1-LABEL: ld0_hi0_lo1_4i64: ; AVX1: # %bb.0: # %entry -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = mem[2,3],ymm0[0,1] -; AVX1-NEXT: vpaddq {{.*}}(%rip), %xmm0, %xmm1 -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 +; AVX1-NEXT: vmovdqa 16(%rdi), %xmm1 +; AVX1-NEXT: vpaddq {{.*}}(%rip), %xmm1, %xmm1 ; AVX1-NEXT: vpaddq {{.*}}(%rip), %xmm0, %xmm0 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: retq @@ -647,12 +646,10 @@ entry: define <8 x i32> @ld0_hi0_lo1_8i32(<8 x i32> * %pa, <8 x i32> %b) nounwind uwtable readnone ssp { ; AVX1-LABEL: ld0_hi0_lo1_8i32: ; AVX1: # %bb.0: # %entry -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = mem[2,3],ymm0[0,1] -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1 -; AVX1-NEXT: vmovdqa 
{{.*#+}} xmm2 = [1,2,3,4] -; AVX1-NEXT: vpaddd %xmm2, %xmm1, %xmm1 -; AVX1-NEXT: vpaddd %xmm2, %xmm0, %xmm0 -; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 +; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1,2,3,4] +; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0 +; AVX1-NEXT: vpaddd 16(%rdi), %xmm1, %xmm1 +; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: retq ; ; AVX2-LABEL: ld0_hi0_lo1_8i32: @@ -670,9 +667,9 @@ entry: define <8 x i32> @ld1_hi0_hi1_8i32(<8 x i32> %a, <8 x i32> * %pb) nounwind uwtable readnone ssp { ; AVX1-LABEL: ld1_hi0_hi1_8i32: ; AVX1: # %bb.0: # %entry -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 ; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1,2,3,4] ; AVX1-NEXT: vpaddd 16(%rdi), %xmm1, %xmm2 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 ; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0 ; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0 ; AVX1-NEXT: retq diff --git a/llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll b/llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll index 295b5271ed0b..f115f9a6ef38 100644 --- a/llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll +++ b/llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll @@ -7740,7 +7740,7 @@ define i64 @test_mm512_reduce_max_epi64(<8 x i64> %__W) { ; X86: # %bb.0: # %entry ; X86-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X86-NEXT: vpmaxsq %zmm0, %zmm1, %zmm0 -; X86-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpmaxsq %zmm1, %zmm0, %zmm0 ; X86-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpmaxsq %zmm1, %zmm0, %zmm0 @@ -7753,7 +7753,7 @@ define i64 @test_mm512_reduce_max_epi64(<8 x i64> %__W) { ; X64: # %bb.0: # %entry ; X64-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X64-NEXT: vpmaxsq %zmm0, %zmm1, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpmaxsq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpmaxsq 
%zmm1, %zmm0, %zmm0 @@ -7779,7 +7779,7 @@ define i64 @test_mm512_reduce_max_epu64(<8 x i64> %__W) { ; X86: # %bb.0: # %entry ; X86-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X86-NEXT: vpmaxuq %zmm0, %zmm1, %zmm0 -; X86-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 ; X86-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 @@ -7792,7 +7792,7 @@ define i64 @test_mm512_reduce_max_epu64(<8 x i64> %__W) { ; X64: # %bb.0: # %entry ; X64-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X64-NEXT: vpmaxuq %zmm0, %zmm1, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 @@ -7865,7 +7865,7 @@ define i64 @test_mm512_reduce_min_epi64(<8 x i64> %__W) { ; X86: # %bb.0: # %entry ; X86-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X86-NEXT: vpminsq %zmm0, %zmm1, %zmm0 -; X86-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpminsq %zmm1, %zmm0, %zmm0 ; X86-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpminsq %zmm1, %zmm0, %zmm0 @@ -7878,7 +7878,7 @@ define i64 @test_mm512_reduce_min_epi64(<8 x i64> %__W) { ; X64: # %bb.0: # %entry ; X64-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X64-NEXT: vpminsq %zmm0, %zmm1, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpminsq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpminsq %zmm1, %zmm0, %zmm0 @@ -7904,7 +7904,7 @@ define i64 @test_mm512_reduce_min_epu64(<8 x i64> %__W) { ; X86: # %bb.0: # %entry ; X86-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X86-NEXT: vpminuq %zmm0, %zmm1, %zmm0 -; X86-NEXT: vpermq 
{{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpminuq %zmm1, %zmm0, %zmm0 ; X86-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpminuq %zmm1, %zmm0, %zmm0 @@ -7917,7 +7917,7 @@ define i64 @test_mm512_reduce_min_epu64(<8 x i64> %__W) { ; X64: # %bb.0: # %entry ; X64-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X64-NEXT: vpminuq %zmm0, %zmm1, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpminuq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpminuq %zmm1, %zmm0, %zmm0 @@ -7994,7 +7994,7 @@ define i64 @test_mm512_mask_reduce_max_epi64(i8 zeroext %__M, <8 x i64> %__W) { ; X86-NEXT: vmovdqa64 %zmm0, %zmm1 {%k1} ; X86-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[4,5,6,7,0,1,2,3] ; X86-NEXT: vpmaxsq %zmm0, %zmm1, %zmm0 -; X86-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpmaxsq %zmm1, %zmm0, %zmm0 ; X86-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpmaxsq %zmm1, %zmm0, %zmm0 @@ -8010,7 +8010,7 @@ define i64 @test_mm512_mask_reduce_max_epi64(i8 zeroext %__M, <8 x i64> %__W) { ; X64-NEXT: vmovdqa64 %zmm0, %zmm1 {%k1} ; X64-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[4,5,6,7,0,1,2,3] ; X64-NEXT: vpmaxsq %zmm0, %zmm1, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpmaxsq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpmaxsq %zmm1, %zmm0, %zmm0 @@ -8041,7 +8041,7 @@ define i64 @test_mm512_mask_reduce_max_epu64(i8 zeroext %__M, <8 x i64> %__W) { ; X86-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z} ; X86-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X86-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 -; X86-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 ; X86-NEXT: 
vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 @@ -8056,7 +8056,7 @@ define i64 @test_mm512_mask_reduce_max_epu64(i8 zeroext %__M, <8 x i64> %__W) { ; X64-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z} ; X64-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm0[4,5,6,7,0,1,2,3] ; X64-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpmaxuq %zmm1, %zmm0, %zmm0 @@ -8144,7 +8144,7 @@ define i64 @test_mm512_mask_reduce_min_epi64(i8 zeroext %__M, <8 x i64> %__W) { ; X86-NEXT: vmovdqa64 %zmm0, %zmm1 {%k1} ; X86-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[4,5,6,7,0,1,2,3] ; X86-NEXT: vpminsq %zmm0, %zmm1, %zmm0 -; X86-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpminsq %zmm1, %zmm0, %zmm0 ; X86-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpminsq %zmm1, %zmm0, %zmm0 @@ -8160,7 +8160,7 @@ define i64 @test_mm512_mask_reduce_min_epi64(i8 zeroext %__M, <8 x i64> %__W) { ; X64-NEXT: vmovdqa64 %zmm0, %zmm1 {%k1} ; X64-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[4,5,6,7,0,1,2,3] ; X64-NEXT: vpminsq %zmm0, %zmm1, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpminsq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpminsq %zmm1, %zmm0, %zmm0 @@ -8192,7 +8192,7 @@ define i64 @test_mm512_mask_reduce_min_epu64(i8 zeroext %__M, <8 x i64> %__W) { ; X86-NEXT: vmovdqa64 %zmm0, %zmm1 {%k1} ; X86-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[4,5,6,7,0,1,2,3] ; X86-NEXT: vpminuq %zmm0, %zmm1, %zmm0 -; X86-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X86-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X86-NEXT: vpminuq %zmm1, %zmm0, %zmm0 ; X86-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X86-NEXT: vpminuq %zmm1, %zmm0, %zmm0 @@ -8208,7 
+8208,7 @@ define i64 @test_mm512_mask_reduce_min_epu64(i8 zeroext %__M, <8 x i64> %__W) { ; X64-NEXT: vmovdqa64 %zmm0, %zmm1 {%k1} ; X64-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[4,5,6,7,0,1,2,3] ; X64-NEXT: vpminuq %zmm0, %zmm1, %zmm0 -; X64-NEXT: vpermq {{.*#+}} zmm1 = zmm0[2,3,0,1,6,7,4,5] +; X64-NEXT: vextracti128 $1, %ymm0, %xmm1 ; X64-NEXT: vpminuq %zmm1, %zmm0, %zmm0 ; X64-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] ; X64-NEXT: vpminuq %zmm1, %zmm0, %zmm0 diff --git a/llvm/test/CodeGen/X86/known-signbits-vector.ll b/llvm/test/CodeGen/X86/known-signbits-vector.ll index 1e054c178598..b18b8079fd23 100644 --- a/llvm/test/CodeGen/X86/known-signbits-vector.ll +++ b/llvm/test/CodeGen/X86/known-signbits-vector.ll @@ -256,9 +256,8 @@ define <4 x double> @signbits_sext_shuffle_sitofp(<4 x i32> %a0, <4 x i64> %a1) ; X86-NEXT: vpmovsxdq %xmm0, %xmm0 ; X86-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; X86-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,3,2] -; X86-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1] ; X86-NEXT: vextractf128 $1, %ymm0, %xmm1 -; X86-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; X86-NEXT: vshufps {{.*#+}} xmm0 = xmm1[0,2],xmm0[0,2] ; X86-NEXT: vcvtdq2pd %xmm0, %ymm0 ; X86-NEXT: retl ; @@ -269,9 +268,8 @@ define <4 x double> @signbits_sext_shuffle_sitofp(<4 x i32> %a0, <4 x i64> %a1) ; X64-AVX1-NEXT: vpmovsxdq %xmm0, %xmm0 ; X64-AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; X64-AVX1-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,3,2] -; X64-AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1] ; X64-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1 -; X64-AVX1-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; X64-AVX1-NEXT: vshufps {{.*#+}} xmm0 = xmm1[0,2],xmm0[0,2] ; X64-AVX1-NEXT: vcvtdq2pd %xmm0, %ymm0 ; X64-AVX1-NEXT: retq ; diff --git a/llvm/test/CodeGen/X86/packss.ll b/llvm/test/CodeGen/X86/packss.ll index 9a4025ab75e4..f4b601afb569 100644 --- a/llvm/test/CodeGen/X86/packss.ll +++ b/llvm/test/CodeGen/X86/packss.ll @@ -356,18 +356,12 @@ define <32 x i8> 
@packsswb_icmp_zero_trunc_256(<16 x i16> %a0) { ; ; AVX1-LABEL: packsswb_icmp_zero_trunc_256: ; AVX1: # %bb.0: -; AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1 -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2 -; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3 -; AVX1-NEXT: vpcmpeqw %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vpcmpeqw %xmm3, %xmm0, %xmm0 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0 -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm2 = zero,zero,ymm0[0,1] -; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1 -; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0 -; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm1 -; AVX1-NEXT: vpacksswb %xmm1, %xmm2, %xmm1 +; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1 +; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm2 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 +; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0 +; AVX1-NEXT: vpacksswb %xmm0, %xmm1, %xmm0 +; AVX1-NEXT: vpacksswb %xmm2, %xmm1, %xmm1 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: ret{{[l|q]}} ; diff --git a/llvm/test/CodeGen/X86/var-permute-256.ll b/llvm/test/CodeGen/X86/var-permute-256.ll index ff099c154a15..7590add145a0 100644 --- a/llvm/test/CodeGen/X86/var-permute-256.ll +++ b/llvm/test/CodeGen/X86/var-permute-256.ll @@ -1104,18 +1104,18 @@ entry: define <4 x i32> @var_shuffle_v4i32_from_v8i32(<8 x i32> %v, <4 x i32> %indices) unnamed_addr nounwind { ; XOP-LABEL: var_shuffle_v4i32_from_v8i32: ; XOP: # %bb.0: # %entry -; XOP-NEXT: vperm2f128 {{.*#+}} ymm2 = ymm0[2,3,2,3] +; XOP-NEXT: vextractf128 $1, %ymm0, %xmm2 ; XOP-NEXT: vpermil2ps $0, %xmm1, %xmm2, %xmm0, %xmm0 ; XOP-NEXT: vzeroupper ; XOP-NEXT: retq ; ; AVX1-LABEL: var_shuffle_v4i32_from_v8i32: ; AVX1: # %bb.0: # %entry -; AVX1-NEXT: vpermilps %xmm1, %xmm0, %xmm2 -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2 +; AVX1-NEXT: vpermilps %xmm1, %xmm2, %xmm2 ; AVX1-NEXT: vpermilps %xmm1, %xmm0, %xmm0 ; AVX1-NEXT: vpcmpgtd {{.*}}(%rip), %xmm1, %xmm1 -; 
AVX1-NEXT: vblendvps %xmm1, %xmm0, %xmm2, %xmm0 +; AVX1-NEXT: vblendvps %xmm1, %xmm2, %xmm0, %xmm0 ; AVX1-NEXT: vzeroupper ; AVX1-NEXT: retq ; diff --git a/llvm/test/CodeGen/X86/vector-pack-256.ll b/llvm/test/CodeGen/X86/vector-pack-256.ll index b1e72df9644e..fb0e5d063f61 100644 --- a/llvm/test/CodeGen/X86/vector-pack-256.ll +++ b/llvm/test/CodeGen/X86/vector-pack-256.ll @@ -49,16 +49,14 @@ define <16 x i16> @trunc_concat_packssdw_256(<8 x i32> %a0, <8 x i32> %a1) nounw define <16 x i16> @trunc_concat_packusdw_256(<8 x i32> %a0, <8 x i32> %a1) nounwind { ; AVX1-LABEL: trunc_concat_packusdw_256: ; AVX1: # %bb.0: -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2 -; AVX1-NEXT: vpsrld $17, %xmm2, %xmm2 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm2 +; AVX1-NEXT: vpsrld $17, %xmm0, %xmm2 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 ; AVX1-NEXT: vpsrld $17, %xmm0, %xmm0 ; AVX1-NEXT: vandps {{.*}}(%rip), %ymm1, %ymm1 -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm2 = ymm2[2,3],ymm1[2,3] -; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3 -; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0 +; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3 +; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0 +; AVX1-NEXT: vpackusdw %xmm1, %xmm2, %xmm1 +; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: retq ; ; AVX2-LABEL: trunc_concat_packusdw_256: @@ -91,16 +89,14 @@ define <16 x i16> @trunc_concat_packusdw_256(<8 x i32> %a0, <8 x i32> %a1) nounw define <32 x i8> @trunc_concat_packsswb_256(<16 x i16> %a0, <16 x i16> %a1) nounwind { ; AVX1-LABEL: trunc_concat_packsswb_256: ; AVX1: # %bb.0: -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2 -; AVX1-NEXT: vpsraw $15, %xmm2, %xmm2 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm2 +; AVX1-NEXT: vpsraw $15, %xmm0, %xmm2 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 ; AVX1-NEXT: vpsraw $15, %xmm0, %xmm0 ; AVX1-NEXT: vandps {{.*}}(%rip), %ymm1, %ymm1 -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm2 = 
ymm2[2,3],ymm1[2,3] -; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3 -; AVX1-NEXT: vpacksswb %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0 +; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3 +; AVX1-NEXT: vpacksswb %xmm3, %xmm0, %xmm0 +; AVX1-NEXT: vpacksswb %xmm1, %xmm2, %xmm1 +; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: retq ; ; AVX2-LABEL: trunc_concat_packsswb_256: @@ -145,16 +141,14 @@ define <32 x i8> @trunc_concat_packsswb_256(<16 x i16> %a0, <16 x i16> %a1) noun define <32 x i8> @trunc_concat_packuswb_256(<16 x i16> %a0, <16 x i16> %a1) nounwind { ; AVX1-LABEL: trunc_concat_packuswb_256: ; AVX1: # %bb.0: -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2 -; AVX1-NEXT: vpsrlw $15, %xmm2, %xmm2 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm2 +; AVX1-NEXT: vpsrlw $15, %xmm0, %xmm2 +; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0 ; AVX1-NEXT: vpsrlw $15, %xmm0, %xmm0 ; AVX1-NEXT: vandps {{.*}}(%rip), %ymm1, %ymm1 -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm2 = ymm2[2,3],ymm1[2,3] -; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3 -; AVX1-NEXT: vpackuswb %xmm3, %xmm2, %xmm2 -; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0 +; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3 +; AVX1-NEXT: vpackuswb %xmm3, %xmm0, %xmm0 +; AVX1-NEXT: vpackuswb %xmm1, %xmm2, %xmm1 +; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: retq ; ; AVX2-LABEL: trunc_concat_packuswb_256: diff --git a/llvm/test/CodeGen/X86/x86-interleaved-access.ll b/llvm/test/CodeGen/X86/x86-interleaved-access.ll index c323db1deeac..b80775ac7d57 100644 --- a/llvm/test/CodeGen/X86/x86-interleaved-access.ll +++ b/llvm/test/CodeGen/X86/x86-interleaved-access.ll @@ -584,279 +584,265 @@ define <16 x i1> @interleaved_load_vf16_i8_stride4(<64 x i8>* %ptr) { define <32 x i1> @interleaved_load_vf32_i8_stride4(<128 x i8>* %ptr) { ; AVX1-LABEL: interleaved_load_vf32_i8_stride4: ; AVX1: # %bb.0: -; AVX1-NEXT: vmovdqa {{.*#+}} 
xmm0 = -; AVX1-NEXT: vmovdqa 112(%rdi), %xmm11 -; AVX1-NEXT: vpshufb %xmm0, %xmm11, %xmm1 -; AVX1-NEXT: vmovdqa 96(%rdi), %xmm12 -; AVX1-NEXT: vpshufb %xmm0, %xmm12, %xmm3 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[1],xmm1[1] -; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = <0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX1-NEXT: vmovdqa 80(%rdi), %xmm14 -; AVX1-NEXT: vpshufb %xmm2, %xmm14, %xmm4 -; AVX1-NEXT: vmovdqa 64(%rdi), %xmm6 -; AVX1-NEXT: vpshufb %xmm2, %xmm6, %xmm5 +; AVX1-NEXT: vmovdqa {{.*#+}} xmm6 = +; AVX1-NEXT: vmovdqa (%rdi), %xmm10 +; AVX1-NEXT: vmovdqa 16(%rdi), %xmm11 +; AVX1-NEXT: vmovdqa 32(%rdi), %xmm12 +; AVX1-NEXT: vmovdqa 48(%rdi), %xmm13 +; AVX1-NEXT: vpshufb %xmm6, %xmm13, %xmm4 +; AVX1-NEXT: vpshufb %xmm6, %xmm12, %xmm5 ; AVX1-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm5[0],xmm4[0],xmm5[1],xmm4[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm4[0,1,2,3],xmm1[4,5,6,7] -; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm8 -; AVX1-NEXT: vmovdqa (%rdi), %xmm13 -; AVX1-NEXT: vmovdqa 16(%rdi), %xmm15 -; AVX1-NEXT: vmovdqa 32(%rdi), %xmm7 -; AVX1-NEXT: vmovdqa 48(%rdi), %xmm5 -; AVX1-NEXT: vpshufb %xmm0, %xmm5, %xmm1 +; AVX1-NEXT: vmovdqa {{.*#+}} xmm0 = <0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u> +; AVX1-NEXT: vpshufb %xmm0, %xmm11, %xmm5 +; AVX1-NEXT: vpshufb %xmm0, %xmm10, %xmm7 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm5 = xmm7[0],xmm5[0],xmm7[1],xmm5[1] +; AVX1-NEXT: vpblendw {{.*#+}} xmm8 = xmm5[0,1,2,3],xmm4[4,5,6,7] +; AVX1-NEXT: vmovdqa 112(%rdi), %xmm14 +; AVX1-NEXT: vpshufb %xmm6, %xmm14, %xmm7 +; AVX1-NEXT: vmovdqa 96(%rdi), %xmm5 +; AVX1-NEXT: vpshufb %xmm6, %xmm5, %xmm6 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm6[0],xmm7[0],xmm6[1],xmm7[1] +; AVX1-NEXT: vmovdqa 80(%rdi), %xmm6 +; AVX1-NEXT: vpshufb %xmm0, %xmm6, %xmm2 +; AVX1-NEXT: vmovdqa 64(%rdi), %xmm7 ; AVX1-NEXT: vpshufb %xmm0, %xmm7, %xmm0 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] -; AVX1-NEXT: vpshufb %xmm2, %xmm15, %xmm1 -; AVX1-NEXT: vpshufb %xmm2, %xmm13, %xmm2 -; 
AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7] -; AVX1-NEXT: vblendps {{.*#+}} ymm8 = ymm0[0,1,2,3],ymm8[4,5,6,7] -; AVX1-NEXT: vmovdqa {{.*#+}} xmm0 = -; AVX1-NEXT: vpshufb %xmm0, %xmm11, %xmm1 -; AVX1-NEXT: vpshufb %xmm0, %xmm12, %xmm2 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] +; AVX1-NEXT: vpblendw {{.*#+}} xmm9 = xmm0[0,1,2,3],xmm1[4,5,6,7] +; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = +; AVX1-NEXT: vpshufb %xmm1, %xmm13, %xmm2 +; AVX1-NEXT: vpshufb %xmm1, %xmm12, %xmm0 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] ; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX1-NEXT: vpshufb %xmm2, %xmm14, %xmm3 -; AVX1-NEXT: vpshufb %xmm2, %xmm6, %xmm4 +; AVX1-NEXT: vpshufb %xmm2, %xmm11, %xmm3 +; AVX1-NEXT: vpshufb %xmm2, %xmm10, %xmm4 ; AVX1-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm3[0,1,2,3],xmm1[4,5,6,7] -; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 -; AVX1-NEXT: vpshufb %xmm0, %xmm5, %xmm3 -; AVX1-NEXT: vpshufb %xmm0, %xmm7, %xmm0 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] -; AVX1-NEXT: vpshufb %xmm2, %xmm15, %xmm3 -; AVX1-NEXT: vpshufb %xmm2, %xmm13, %xmm2 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm2[0,1,2,3],xmm0[4,5,6,7] -; AVX1-NEXT: vblendps {{.*#+}} ymm9 = ymm0[0,1,2,3],ymm1[4,5,6,7] +; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm3[0,1,2,3],xmm0[4,5,6,7] +; AVX1-NEXT: vpcmpeqb %xmm0, %xmm8, %xmm0 +; AVX1-NEXT: vmovdqu %ymm0, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill +; AVX1-NEXT: vpshufb %xmm1, %xmm14, %xmm0 +; AVX1-NEXT: vpshufb %xmm1, %xmm5, %xmm1 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] +; AVX1-NEXT: vpshufb %xmm2, %xmm6, %xmm1 
+; AVX1-NEXT: vpshufb %xmm2, %xmm7, %xmm2 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] +; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7] +; AVX1-NEXT: vpcmpeqb %xmm0, %xmm9, %xmm9 ; AVX1-NEXT: vmovdqa {{.*#+}} xmm0 = -; AVX1-NEXT: vpshufb %xmm0, %xmm11, %xmm1 +; AVX1-NEXT: vpshufb %xmm0, %xmm13, %xmm1 ; AVX1-NEXT: vpshufb %xmm0, %xmm12, %xmm2 ; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] ; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = <2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX1-NEXT: vpshufb %xmm2, %xmm14, %xmm3 -; AVX1-NEXT: vpshufb %xmm2, %xmm6, %xmm4 +; AVX1-NEXT: vpshufb %xmm2, %xmm11, %xmm3 +; AVX1-NEXT: vpshufb %xmm2, %xmm10, %xmm4 ; AVX1-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm3[0,1,2,3],xmm1[4,5,6,7] -; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 -; AVX1-NEXT: vpshufb %xmm0, %xmm5, %xmm3 -; AVX1-NEXT: vpshufb %xmm0, %xmm7, %xmm0 +; AVX1-NEXT: vpblendw {{.*#+}} xmm8 = xmm3[0,1,2,3],xmm1[4,5,6,7] +; AVX1-NEXT: vpshufb %xmm0, %xmm14, %xmm3 +; AVX1-NEXT: vpshufb %xmm0, %xmm5, %xmm0 ; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] -; AVX1-NEXT: vpshufb %xmm2, %xmm15, %xmm3 -; AVX1-NEXT: vpshufb %xmm2, %xmm13, %xmm2 +; AVX1-NEXT: vpshufb %xmm2, %xmm6, %xmm3 +; AVX1-NEXT: vpshufb %xmm2, %xmm7, %xmm2 ; AVX1-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm2[0,1,2,3],xmm0[4,5,6,7] -; AVX1-NEXT: vblendps {{.*#+}} ymm10 = ymm0[0,1,2,3],ymm1[4,5,6,7] -; AVX1-NEXT: vmovdqa {{.*#+}} xmm0 = -; AVX1-NEXT: vpshufb %xmm0, %xmm11, %xmm1 -; AVX1-NEXT: vpshufb %xmm0, %xmm12, %xmm2 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] -; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX1-NEXT: vpshufb %xmm2, %xmm14, %xmm3 -; AVX1-NEXT: vpshufb %xmm2, %xmm6, %xmm4 +; AVX1-NEXT: vpblendw {{.*#+}} xmm15 = 
xmm2[0,1,2,3],xmm0[4,5,6,7] +; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = +; AVX1-NEXT: vpshufb %xmm2, %xmm13, %xmm3 +; AVX1-NEXT: vpshufb %xmm2, %xmm12, %xmm4 ; AVX1-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm3[0,1,2,3],xmm1[4,5,6,7] -; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 -; AVX1-NEXT: vpshufb %xmm0, %xmm5, %xmm3 -; AVX1-NEXT: vpshufb %xmm0, %xmm7, %xmm0 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] -; AVX1-NEXT: vpshufb %xmm2, %xmm15, %xmm3 -; AVX1-NEXT: vpshufb %xmm2, %xmm13, %xmm2 -; AVX1-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm2[0,1,2,3],xmm0[4,5,6,7] -; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7] -; AVX1-NEXT: vextractf128 $1, %ymm9, %xmm1 -; AVX1-NEXT: vextractf128 $1, %ymm8, %xmm2 -; AVX1-NEXT: vpcmpeqb %xmm1, %xmm2, %xmm1 -; AVX1-NEXT: vpcmpeqb %xmm9, %xmm8, %xmm2 -; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1 -; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2 -; AVX1-NEXT: vextractf128 $1, %ymm10, %xmm3 -; AVX1-NEXT: vpcmpeqb %xmm2, %xmm3, %xmm2 -; AVX1-NEXT: vpcmpeqb %xmm0, %xmm10, %xmm0 -; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0 -; AVX1-NEXT: vxorps %ymm0, %ymm1, %ymm0 +; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u> +; AVX1-NEXT: vpshufb %xmm4, %xmm11, %xmm0 +; AVX1-NEXT: vpshufb %xmm4, %xmm10, %xmm1 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] +; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm3[4,5,6,7] +; AVX1-NEXT: vpcmpeqb %xmm0, %xmm8, %xmm0 +; AVX1-NEXT: vpshufb %xmm2, %xmm14, %xmm1 +; AVX1-NEXT: vpshufb %xmm2, %xmm5, %xmm2 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] +; AVX1-NEXT: vpshufb %xmm4, %xmm6, %xmm2 +; AVX1-NEXT: vpshufb %xmm4, %xmm7, %xmm3 +; AVX1-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1] +; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = 
xmm2[0,1,2,3],xmm1[4,5,6,7] +; AVX1-NEXT: vpcmpeqb %xmm1, %xmm15, %xmm1 +; AVX1-NEXT: vmovups {{[-0-9]+}}(%r{{[sb]}}p), %ymm2 # 32-byte Reload +; AVX1-NEXT: vinsertf128 $1, %xmm9, %ymm2, %ymm2 +; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 +; AVX1-NEXT: vxorps %ymm0, %ymm2, %ymm0 ; AVX1-NEXT: vxorps {{.*}}(%rip), %ymm0, %ymm0 ; AVX1-NEXT: retq ; ; AVX2-LABEL: interleaved_load_vf32_i8_stride4: ; AVX2: # %bb.0: -; AVX2-NEXT: vmovdqa (%rdi), %xmm9 -; AVX2-NEXT: vmovdqa 16(%rdi), %xmm11 -; AVX2-NEXT: vmovdqa 32(%rdi), %xmm12 -; AVX2-NEXT: vmovdqa 48(%rdi), %xmm13 -; AVX2-NEXT: vmovdqa {{.*#+}} xmm6 = -; AVX2-NEXT: vpshufb %xmm6, %xmm13, %xmm4 -; AVX2-NEXT: vpshufb %xmm6, %xmm12, %xmm5 -; AVX2-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm5[0],xmm4[0],xmm5[1],xmm4[1] -; AVX2-NEXT: vmovdqa {{.*#+}} xmm0 = <0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX2-NEXT: vpshufb %xmm0, %xmm11, %xmm5 -; AVX2-NEXT: vpshufb %xmm0, %xmm9, %xmm7 -; AVX2-NEXT: vpunpckldq {{.*#+}} xmm5 = xmm7[0],xmm5[0],xmm7[1],xmm5[1] -; AVX2-NEXT: vpblendd {{.*#+}} xmm8 = xmm5[0,1],xmm4[2,3] -; AVX2-NEXT: vmovdqa 112(%rdi), %xmm14 -; AVX2-NEXT: vpshufb %xmm6, %xmm14, %xmm7 -; AVX2-NEXT: vpermq {{.*#+}} ymm5 = mem[2,3,0,1] -; AVX2-NEXT: vextracti128 $1, %ymm5, %xmm5 -; AVX2-NEXT: vpshufb %xmm6, %xmm5, %xmm6 -; AVX2-NEXT: vpunpckldq {{.*#+}} xmm6 = xmm6[0],xmm7[0],xmm6[1],xmm7[1] -; AVX2-NEXT: vinserti128 $1, %xmm6, %ymm0, %ymm10 -; AVX2-NEXT: vmovdqa 80(%rdi), %xmm6 -; AVX2-NEXT: vpshufb %xmm0, %xmm6, %xmm1 -; AVX2-NEXT: vpermq {{.*#+}} ymm7 = mem[2,3,0,1] -; AVX2-NEXT: vextracti128 $1, %ymm7, %xmm7 -; AVX2-NEXT: vpshufb %xmm0, %xmm7, %xmm0 +; AVX2-NEXT: vmovdqa {{.*#+}} xmm0 = +; AVX2-NEXT: vmovdqa 112(%rdi), %xmm9 +; AVX2-NEXT: vpshufb %xmm0, %xmm9, %xmm1 +; AVX2-NEXT: vmovdqa 96(%rdi), %xmm10 +; AVX2-NEXT: vpshufb %xmm0, %xmm10, %xmm3 +; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[1],xmm1[1] +; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm1 +; AVX2-NEXT: vmovdqa {{.*#+}} xmm2 = 
<0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u> +; AVX2-NEXT: vmovdqa 80(%rdi), %xmm12 +; AVX2-NEXT: vpshufb %xmm2, %xmm12, %xmm4 +; AVX2-NEXT: vmovdqa 64(%rdi), %xmm5 +; AVX2-NEXT: vpshufb %xmm2, %xmm5, %xmm6 +; AVX2-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm6[0],xmm4[0],xmm6[1],xmm4[1] +; AVX2-NEXT: vinserti128 $1, %xmm4, %ymm0, %ymm4 +; AVX2-NEXT: vpblendd {{.*#+}} ymm8 = ymm4[0,1,2,3,4,5],ymm1[6,7] +; AVX2-NEXT: vmovdqa (%rdi), %xmm11 +; AVX2-NEXT: vmovdqa 16(%rdi), %xmm13 +; AVX2-NEXT: vmovdqa 32(%rdi), %xmm6 +; AVX2-NEXT: vmovdqa 48(%rdi), %xmm7 +; AVX2-NEXT: vpshufb %xmm0, %xmm7, %xmm1 +; AVX2-NEXT: vpshufb %xmm0, %xmm6, %xmm0 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] -; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0 -; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3,4,5],ymm10[6,7] -; AVX2-NEXT: vpblendd {{.*#+}} ymm8 = ymm8[0,1,2,3],ymm0[4,5,6,7] +; AVX2-NEXT: vpshufb %xmm2, %xmm13, %xmm1 +; AVX2-NEXT: vpshufb %xmm2, %xmm11, %xmm2 +; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] +; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3] +; AVX2-NEXT: vpblendd {{.*#+}} ymm8 = ymm0[0,1,2,3],ymm8[4,5,6,7] ; AVX2-NEXT: vmovdqa {{.*#+}} xmm1 = -; AVX2-NEXT: vpshufb %xmm1, %xmm13, %xmm0 -; AVX2-NEXT: vpshufb %xmm1, %xmm12, %xmm2 -; AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[1],xmm0[1] +; AVX2-NEXT: vpshufb %xmm1, %xmm9, %xmm2 +; AVX2-NEXT: vpshufb %xmm1, %xmm10, %xmm0 +; AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] +; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0 ; AVX2-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX2-NEXT: vpshufb %xmm2, %xmm11, %xmm3 -; AVX2-NEXT: vpshufb %xmm2, %xmm9, %xmm4 +; AVX2-NEXT: vpshufb %xmm2, %xmm12, %xmm3 +; AVX2-NEXT: vpshufb %xmm2, %xmm5, %xmm4 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1] -; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm3[0,1],xmm0[2,3] -; AVX2-NEXT: vpshufb %xmm1, %xmm14, %xmm3 -; AVX2-NEXT: 
vpshufb %xmm1, %xmm5, %xmm1 +; AVX2-NEXT: vinserti128 $1, %xmm3, %ymm0, %ymm3 +; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm3[0,1,2,3,4,5],ymm0[6,7] +; AVX2-NEXT: vpshufb %xmm1, %xmm7, %xmm3 +; AVX2-NEXT: vpshufb %xmm1, %xmm6, %xmm1 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm1 -; AVX2-NEXT: vpshufb %xmm2, %xmm6, %xmm3 -; AVX2-NEXT: vpshufb %xmm2, %xmm7, %xmm2 +; AVX2-NEXT: vpshufb %xmm2, %xmm13, %xmm3 +; AVX2-NEXT: vpshufb %xmm2, %xmm11, %xmm2 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2 -; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm2[0,1,2,3,4,5],ymm1[6,7] -; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7] +; AVX2-NEXT: vpblendd {{.*#+}} xmm1 = xmm2[0,1],xmm1[2,3] +; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] ; AVX2-NEXT: vpcmpeqb %ymm0, %ymm8, %ymm8 ; AVX2-NEXT: vmovdqa {{.*#+}} xmm0 = -; AVX2-NEXT: vpshufb %xmm0, %xmm13, %xmm1 -; AVX2-NEXT: vpshufb %xmm0, %xmm12, %xmm2 +; AVX2-NEXT: vpshufb %xmm0, %xmm9, %xmm1 +; AVX2-NEXT: vpshufb %xmm0, %xmm10, %xmm2 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] +; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm1 ; AVX2-NEXT: vmovdqa {{.*#+}} xmm2 = <2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX2-NEXT: vpshufb %xmm2, %xmm11, %xmm3 -; AVX2-NEXT: vpshufb %xmm2, %xmm9, %xmm4 +; AVX2-NEXT: vpshufb %xmm2, %xmm12, %xmm3 +; AVX2-NEXT: vpshufb %xmm2, %xmm5, %xmm4 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1] -; AVX2-NEXT: vpblendd {{.*#+}} xmm1 = xmm3[0,1],xmm1[2,3] -; AVX2-NEXT: vpshufb %xmm0, %xmm14, %xmm3 -; AVX2-NEXT: vpshufb %xmm0, %xmm5, %xmm0 +; AVX2-NEXT: vinserti128 $1, %xmm3, %ymm0, %ymm3 +; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm3[0,1,2,3,4,5],ymm1[6,7] +; AVX2-NEXT: vpshufb %xmm0, %xmm7, %xmm3 +; AVX2-NEXT: vpshufb %xmm0, %xmm6, %xmm0 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = 
xmm0[0],xmm3[0],xmm0[1],xmm3[1] -; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0 -; AVX2-NEXT: vpshufb %xmm2, %xmm6, %xmm3 -; AVX2-NEXT: vpshufb %xmm2, %xmm7, %xmm2 +; AVX2-NEXT: vpshufb %xmm2, %xmm13, %xmm3 +; AVX2-NEXT: vpshufb %xmm2, %xmm11, %xmm2 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2 -; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3,4,5],ymm0[6,7] -; AVX2-NEXT: vpblendd {{.*#+}} ymm10 = ymm1[0,1,2,3],ymm0[4,5,6,7] +; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm2[0,1],xmm0[2,3] +; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7] ; AVX2-NEXT: vmovdqa {{.*#+}} xmm1 = -; AVX2-NEXT: vpshufb %xmm1, %xmm13, %xmm2 -; AVX2-NEXT: vpshufb %xmm1, %xmm12, %xmm3 -; AVX2-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1] -; AVX2-NEXT: vmovdqa {{.*#+}} xmm3 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX2-NEXT: vpshufb %xmm3, %xmm11, %xmm4 -; AVX2-NEXT: vpshufb %xmm3, %xmm9, %xmm0 -; AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1] -; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,3] -; AVX2-NEXT: vpshufb %xmm1, %xmm14, %xmm2 -; AVX2-NEXT: vpshufb %xmm1, %xmm5, %xmm1 -; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm1 -; AVX2-NEXT: vpshufb %xmm3, %xmm6, %xmm2 -; AVX2-NEXT: vpshufb %xmm3, %xmm7, %xmm3 +; AVX2-NEXT: vpshufb %xmm1, %xmm9, %xmm2 +; AVX2-NEXT: vpshufb %xmm1, %xmm10, %xmm3 ; AVX2-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1] ; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2 -; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm2[0,1,2,3,4,5],ymm1[6,7] -; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7] -; AVX2-NEXT: vpcmpeqb %ymm0, %ymm10, %ymm0 +; AVX2-NEXT: vmovdqa {{.*#+}} xmm3 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u> +; AVX2-NEXT: vpshufb %xmm3, %xmm12, %xmm4 +; AVX2-NEXT: vpshufb %xmm3, %xmm5, %xmm5 +; AVX2-NEXT: vpunpckldq {{.*#+}} xmm4 = 
xmm5[0],xmm4[0],xmm5[1],xmm4[1] +; AVX2-NEXT: vinserti128 $1, %xmm4, %ymm0, %ymm4 +; AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm4[0,1,2,3,4,5],ymm2[6,7] +; AVX2-NEXT: vpshufb %xmm1, %xmm7, %xmm4 +; AVX2-NEXT: vpshufb %xmm1, %xmm6, %xmm1 +; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1] +; AVX2-NEXT: vpshufb %xmm3, %xmm13, %xmm4 +; AVX2-NEXT: vpshufb %xmm3, %xmm11, %xmm3 +; AVX2-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1] +; AVX2-NEXT: vpblendd {{.*#+}} xmm1 = xmm3[0,1],xmm1[2,3] +; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm2[4,5,6,7] +; AVX2-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0 ; AVX2-NEXT: vpxor %ymm0, %ymm8, %ymm0 ; AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0 ; AVX2-NEXT: retq ; ; AVX512-LABEL: interleaved_load_vf32_i8_stride4: ; AVX512: # %bb.0: -; AVX512-NEXT: vmovdqa 112(%rdi), %xmm10 -; AVX512-NEXT: vmovdqa {{.*#+}} xmm2 = -; AVX512-NEXT: vpshufb %xmm2, %xmm10, %xmm3 -; AVX512-NEXT: vpermq {{.*#+}} ymm1 = mem[2,3,0,1] -; AVX512-NEXT: vextracti128 $1, %ymm1, %xmm11 -; AVX512-NEXT: vpshufb %xmm2, %xmm11, %xmm2 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; AVX512-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm4 -; AVX512-NEXT: vmovdqa 80(%rdi), %xmm12 +; AVX512-NEXT: vmovdqa 112(%rdi), %xmm11 +; AVX512-NEXT: vmovdqa {{.*#+}} xmm0 = +; AVX512-NEXT: vpshufb %xmm0, %xmm11, %xmm3 +; AVX512-NEXT: vmovdqa 96(%rdi), %xmm13 +; AVX512-NEXT: vpshufb %xmm0, %xmm13, %xmm0 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] +; AVX512-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0 +; AVX512-NEXT: vmovdqa 80(%rdi), %xmm14 ; AVX512-NEXT: vmovdqa {{.*#+}} xmm5 = <0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX512-NEXT: vpshufb %xmm5, %xmm12, %xmm6 -; AVX512-NEXT: vpermq {{.*#+}} ymm3 = mem[2,3,0,1] -; AVX512-NEXT: vextracti128 $1, %ymm3, %xmm13 -; AVX512-NEXT: vpshufb %xmm5, %xmm13, %xmm5 +; AVX512-NEXT: vpshufb %xmm5, %xmm14, %xmm6 +; AVX512-NEXT: vmovdqa 64(%rdi), %xmm4 +; AVX512-NEXT: vpshufb 
%xmm5, %xmm4, %xmm5 ; AVX512-NEXT: vpunpckldq {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[1],xmm6[1] ; AVX512-NEXT: vinserti128 $1, %xmm5, %ymm0, %ymm5 -; AVX512-NEXT: vpblendd {{.*#+}} ymm4 = ymm5[0,1,2,3,4,5],ymm4[6,7] +; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm5[0,1,2,3,4,5],ymm0[6,7] ; AVX512-NEXT: vmovdqa64 (%rdi), %zmm5 ; AVX512-NEXT: vpmovdb %zmm5, %xmm5 -; AVX512-NEXT: vpblendd {{.*#+}} ymm9 = ymm5[0,1,2,3],ymm4[4,5,6,7] -; AVX512-NEXT: vmovdqa {{.*#+}} xmm0 = -; AVX512-NEXT: vmovdqa (%rdi), %xmm14 -; AVX512-NEXT: vmovdqa 16(%rdi), %xmm6 -; AVX512-NEXT: vmovdqa 32(%rdi), %xmm7 -; AVX512-NEXT: vmovdqa 48(%rdi), %xmm4 -; AVX512-NEXT: vpshufb %xmm0, %xmm4, %xmm1 -; AVX512-NEXT: vpshufb %xmm0, %xmm7, %xmm2 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] -; AVX512-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX512-NEXT: vpshufb %xmm2, %xmm6, %xmm3 -; AVX512-NEXT: vpshufb %xmm2, %xmm14, %xmm5 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm5[0],xmm3[0],xmm5[1],xmm3[1] -; AVX512-NEXT: vpblendd {{.*#+}} xmm1 = xmm3[0,1],xmm1[2,3] -; AVX512-NEXT: vpshufb %xmm0, %xmm10, %xmm3 -; AVX512-NEXT: vpshufb %xmm0, %xmm11, %xmm0 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] +; AVX512-NEXT: vpblendd {{.*#+}} ymm9 = ymm5[0,1,2,3],ymm0[4,5,6,7] +; AVX512-NEXT: vmovdqa {{.*#+}} xmm5 = +; AVX512-NEXT: vpshufb %xmm5, %xmm11, %xmm0 +; AVX512-NEXT: vpshufb %xmm5, %xmm13, %xmm6 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm6[0],xmm0[0],xmm6[1],xmm0[1] ; AVX512-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0 -; AVX512-NEXT: vpshufb %xmm2, %xmm12, %xmm3 -; AVX512-NEXT: vpshufb %xmm2, %xmm13, %xmm2 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] +; AVX512-NEXT: vmovdqa {{.*#+}} xmm1 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u> +; AVX512-NEXT: vpshufb %xmm1, %xmm14, %xmm6 +; AVX512-NEXT: vpshufb %xmm1, %xmm4, %xmm7 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm6 = xmm7[0],xmm6[0],xmm7[1],xmm6[1] +; AVX512-NEXT: 
vinserti128 $1, %xmm6, %ymm0, %ymm6 +; AVX512-NEXT: vpblendd {{.*#+}} ymm8 = ymm6[0,1,2,3,4,5],ymm0[6,7] +; AVX512-NEXT: vmovdqa (%rdi), %xmm10 +; AVX512-NEXT: vmovdqa 16(%rdi), %xmm12 +; AVX512-NEXT: vmovdqa 32(%rdi), %xmm7 +; AVX512-NEXT: vmovdqa 48(%rdi), %xmm0 +; AVX512-NEXT: vpshufb %xmm5, %xmm0, %xmm6 +; AVX512-NEXT: vpshufb %xmm5, %xmm7, %xmm5 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[1],xmm6[1] +; AVX512-NEXT: vpshufb %xmm1, %xmm12, %xmm6 +; AVX512-NEXT: vpshufb %xmm1, %xmm10, %xmm1 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm6[0],xmm1[1],xmm6[1] +; AVX512-NEXT: vpblendd {{.*#+}} xmm1 = xmm1[0,1],xmm5[2,3] +; AVX512-NEXT: vpblendd {{.*#+}} ymm8 = ymm1[0,1,2,3],ymm8[4,5,6,7] +; AVX512-NEXT: vmovdqa {{.*#+}} xmm1 = +; AVX512-NEXT: vpshufb %xmm1, %xmm11, %xmm5 +; AVX512-NEXT: vpshufb %xmm1, %xmm13, %xmm6 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm5 = xmm6[0],xmm5[0],xmm6[1],xmm5[1] +; AVX512-NEXT: vinserti128 $1, %xmm5, %ymm0, %ymm5 +; AVX512-NEXT: vmovdqa {{.*#+}} xmm6 = <2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u> +; AVX512-NEXT: vpshufb %xmm6, %xmm14, %xmm2 +; AVX512-NEXT: vpshufb %xmm6, %xmm4, %xmm3 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1] ; AVX512-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2 -; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3,4,5],ymm0[6,7] -; AVX512-NEXT: vpblendd {{.*#+}} ymm8 = ymm1[0,1,2,3],ymm0[4,5,6,7] -; AVX512-NEXT: vmovdqa {{.*#+}} xmm0 = -; AVX512-NEXT: vpshufb %xmm0, %xmm4, %xmm1 -; AVX512-NEXT: vpshufb %xmm0, %xmm7, %xmm2 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] -; AVX512-NEXT: vmovdqa {{.*#+}} xmm2 = <2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX512-NEXT: vpshufb %xmm2, %xmm6, %xmm3 -; AVX512-NEXT: vpshufb %xmm2, %xmm14, %xmm5 +; AVX512-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3,4,5],ymm5[6,7] +; AVX512-NEXT: vpshufb %xmm1, %xmm0, %xmm3 +; AVX512-NEXT: vpshufb %xmm1, %xmm7, %xmm1 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm1 = 
xmm1[0],xmm3[0],xmm1[1],xmm3[1] +; AVX512-NEXT: vpshufb %xmm6, %xmm12, %xmm3 +; AVX512-NEXT: vpshufb %xmm6, %xmm10, %xmm5 ; AVX512-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm5[0],xmm3[0],xmm5[1],xmm3[1] ; AVX512-NEXT: vpblendd {{.*#+}} xmm1 = xmm3[0,1],xmm1[2,3] -; AVX512-NEXT: vpshufb %xmm0, %xmm10, %xmm3 -; AVX512-NEXT: vpshufb %xmm0, %xmm11, %xmm0 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] -; AVX512-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0 -; AVX512-NEXT: vpshufb %xmm2, %xmm12, %xmm3 -; AVX512-NEXT: vpshufb %xmm2, %xmm13, %xmm2 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; AVX512-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2 -; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3,4,5],ymm0[6,7] -; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] -; AVX512-NEXT: vmovdqa {{.*#+}} xmm1 = -; AVX512-NEXT: vpshufb %xmm1, %xmm4, %xmm2 -; AVX512-NEXT: vpshufb %xmm1, %xmm7, %xmm3 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1] -; AVX512-NEXT: vmovdqa {{.*#+}} xmm3 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u> -; AVX512-NEXT: vpshufb %xmm3, %xmm6, %xmm4 -; AVX512-NEXT: vpshufb %xmm3, %xmm14, %xmm5 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm5[0],xmm4[0],xmm5[1],xmm4[1] -; AVX512-NEXT: vpblendd {{.*#+}} xmm2 = xmm4[0,1],xmm2[2,3] -; AVX512-NEXT: vpshufb %xmm1, %xmm10, %xmm4 -; AVX512-NEXT: vpshufb %xmm1, %xmm11, %xmm1 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1] -; AVX512-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm1 -; AVX512-NEXT: vpshufb %xmm3, %xmm12, %xmm4 -; AVX512-NEXT: vpshufb %xmm3, %xmm13, %xmm3 -; AVX512-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1] +; AVX512-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm2[4,5,6,7] +; AVX512-NEXT: vmovdqa {{.*#+}} xmm2 = +; AVX512-NEXT: vpshufb %xmm2, %xmm11, %xmm3 +; AVX512-NEXT: vpshufb %xmm2, %xmm13, %xmm5 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm5[0],xmm3[0],xmm5[1],xmm3[1] ; 
AVX512-NEXT: vinserti128 $1, %xmm3, %ymm0, %ymm3 -; AVX512-NEXT: vpblendd {{.*#+}} ymm1 = ymm3[0,1,2,3,4,5],ymm1[6,7] -; AVX512-NEXT: vpblendd {{.*#+}} ymm1 = ymm2[0,1,2,3],ymm1[4,5,6,7] +; AVX512-NEXT: vmovdqa {{.*#+}} xmm5 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u> +; AVX512-NEXT: vpshufb %xmm5, %xmm14, %xmm6 +; AVX512-NEXT: vpshufb %xmm5, %xmm4, %xmm4 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm4[0],xmm6[0],xmm4[1],xmm6[1] +; AVX512-NEXT: vinserti128 $1, %xmm4, %ymm0, %ymm4 +; AVX512-NEXT: vpblendd {{.*#+}} ymm3 = ymm4[0,1,2,3,4,5],ymm3[6,7] +; AVX512-NEXT: vpshufb %xmm2, %xmm0, %xmm0 +; AVX512-NEXT: vpshufb %xmm2, %xmm7, %xmm2 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[1],xmm0[1] +; AVX512-NEXT: vpshufb %xmm5, %xmm12, %xmm2 +; AVX512-NEXT: vpshufb %xmm5, %xmm10, %xmm4 +; AVX512-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm4[0],xmm2[0],xmm4[1],xmm2[1] +; AVX512-NEXT: vpblendd {{.*#+}} xmm0 = xmm2[0,1],xmm0[2,3] +; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm3[4,5,6,7] ; AVX512-NEXT: vpcmpeqb %zmm8, %zmm9, %k0 -; AVX512-NEXT: vpcmpeqb %zmm1, %zmm0, %k1 +; AVX512-NEXT: vpcmpeqb %zmm0, %zmm1, %k1 ; AVX512-NEXT: kxnord %k1, %k0, %k0 ; AVX512-NEXT: vpmovm2b %k0, %zmm0 ; AVX512-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0 From llvm-commits at lists.llvm.org Thu Jul 9 06:13:34 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:13:34 +0000 (UTC) Subject: [PATCH] D82363: [DebugInfo] Add new instruction and expression operator for variadic debug values In-Reply-To: References: Message-ID: djtodoro added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/MachineInstr.h:492 + /// Returns a pointer to the operand corresponding to a debug use of Reg, or /// nullptr if Reg is not used in any debug operand. ---------------- ` \p Reg` ? 
================
Comment at: llvm/include/llvm/CodeGen/MachineInstr.h:503
+  }
+  ArrayRef getDebugOperandsForReg(Register Reg) {
+    assert(isDebugValue() && "Tried to get debug operands for non-debug_value");
----------------
Can we use templates to avoid duplicated code here for `getDebugOperandsForReg()`?

================
Comment at: llvm/include/llvm/CodeGen/MachineInstr.h:1165
+  }
+  bool isVariadicDebugValue() const {
+    return getOpcode() == TargetOpcode::DBG_VALUE_LIST;
----------------
Should the name of the method follow the new name of the meta instruction?

================
Comment at: llvm/lib/CodeGen/MachineInstr.cpp:2234
   // Propagate Reg to debug value instructions.
-  for (auto *DBI : DbgValues)
-    DBI->getDebugOperandForReg(DefReg)->setReg(Reg);
+  for (auto *DBI : DbgValues) {
+    for (MachineOperand *Op : DBI->getDebugOperandsForReg(DefReg)) {
----------------
No need for the extra curly brackets.

================
Comment at: llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir:1
+# RUN: llc -run-pass machineverifier -o - %s | FileCheck %s
+# CHECK: DBG_VALUE_LIST !14, !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus, DW_OP_stack_value), $edi, $esi, debug-location !15
----------------
Please add a top-level comment describing the purpose of the test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82363/new/

https://reviews.llvm.org/D82363

From llvm-commits at lists.llvm.org Thu Jul 9 06:14:21 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 13:14:21 +0000 (UTC)
Subject: [PATCH] D83481: [yaml2obj] - Refactor header-sh-fields.yaml test.
Message-ID:

grimar created this revision.
grimar added reviewers: jhenderson, MaskRay.
Herald added a subscriber: emaste.
Herald added a reviewer: espindola.
Herald added a project: LLVM.

This refines the test to use macros. It is needed for a follow-up
change that adds functionality to override more fields.
Also, it is just cleaner to test each key separately. https://reviews.llvm.org/D83481 Files: llvm/test/tools/yaml2obj/ELF/header-sh-fields.yaml Index: llvm/test/tools/yaml2obj/ELF/header-sh-fields.yaml =================================================================== --- llvm/test/tools/yaml2obj/ELF/header-sh-fields.yaml +++ llvm/test/tools/yaml2obj/ELF/header-sh-fields.yaml @@ -3,30 +3,13 @@ ## First we check the default values. -# RUN: yaml2obj --docnum=1 %s -o %t1 -# RUN: llvm-readelf --file-headers %t1 | FileCheck %s --check-prefix=DEFAULT +# RUN: yaml2obj %s -o %t-default +# RUN: llvm-readelf --file-headers %t-default | FileCheck %s --check-prefix=DEFAULT -# DEFAULT: Start of section headers: 88 (bytes into file) -# DEFAULT: Size of section headers: 64 (bytes) -# DEFAULT: Number of section headers: 3 -# DEFAULT: Section header string table index: 2 - ---- !ELF -FileHeader: - Class: ELFCLASS64 - Data: ELFDATA2LSB - Type: ET_REL - Machine: EM_X86_64 - -## Override 3 fields: e_shoff, e_shnum and e_shstrndx. Check the output. - -# RUN: yaml2obj --docnum=2 %s -o %t2 -# RUN: llvm-readelf --file-headers %t2 | FileCheck %s --check-prefix=CUSTOM - -# CUSTOM: Start of section headers: 2 (bytes into file) -# CUSTOM: Size of section headers: 64 (bytes) -# CUSTOM: Number of section headers: 3 -# CUSTOM: Section header string table index: 4 +# DEFAULT: Start of section headers: 88 (bytes into file) +# DEFAULT: Size of section headers: 64 (bytes) +# DEFAULT: Number of section headers: 3 +# DEFAULT: Section header string table index: 2 --- !ELF FileHeader: @@ -34,28 +17,37 @@ Data: ELFDATA2LSB Type: ET_REL Machine: EM_X86_64 - SHEntSize: 64 - SHOff: 2 - SHNum: 3 - SHStrNdx: 4 + SHEntSize: [[SHENTSIZE=64]] + SHOff: [[SHOFF=88]] + SHNum: [[SHNUM=3]] + SHStrNdx: [[SHSTRNDX=2]] + +## Override different fields to check the output produced. + +## Override the e_shoff field. 
+# RUN: yaml2obj %s -DSHOFF=3 -o %t2 +# RUN: llvm-readelf --file-headers %t2 | FileCheck %s --check-prefix=SHOFF + +# SHOFF: Start of section headers: 3 (bytes into file) + +## Override the e_shnum field. +# RUN: yaml2obj %s -DSHNUM=2 -o %t3 +# RUN: llvm-readelf --file-headers %t3 | FileCheck %s --check-prefix=SHNUM -## Finally, we use the same YAML as above, but set e_shentsize to 1. +# SHNUM: Number of section headers: 2{{$}} + +## Override the e_shstrndx field. +# RUN: yaml2obj %s -DSHSTRNDX=4 -o %t4 +# RUN: llvm-readelf --file-headers %t4 | FileCheck %s --check-prefix=SHSTRNDX + +# SHSTRNDX: Section header string table index: 4{{$}} + +## Override the e_shentsize field. ## Check the result using raw output from 'od' because llvm-readelf ## is unable to dump such headers. -# RUN: yaml2obj --docnum=3 %s -o %t3 -# RUN: od -A n -t x1 -v -j 0x3a -N 1 %t3 | FileCheck %s --check-prefix=NEWSIZE -# RUN: od -A n -t x1 -v -j 0x3a -N 1 %t2 | FileCheck %s --check-prefix=OLDSIZE +# RUN: yaml2obj %s -DSHENTSIZE=1 -o %t5 +# RUN: od -A n -t x1 -v -j 0x3a -N 1 %t5 | FileCheck %s --check-prefix=NEWSIZE +# RUN: od -A n -t x1 -v -j 0x3a -N 1 %t-default | FileCheck %s --check-prefix=OLDSIZE # NEWSIZE: 01 # OLDSIZE: 40 - ---- !ELF -FileHeader: - Class: ELFCLASS64 - Data: ELFDATA2LSB - Type: ET_REL - Machine: EM_X86_64 - SHEntSize: 1 - SHOff: 2 - SHNum: 3 - SHStrNdx: 4 -------------- next part -------------- A non-text attachment was scrubbed... Name: D83481.276723.patch Type: text/x-patch Size: 3320 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 06:14:58 2020 From: llvm-commits at lists.llvm.org (Yvan Roux via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:14:58 +0000 (UTC) Subject: [PATCH] D83313: [MachineOutliner] Fix liveness computing. In-Reply-To: References: Message-ID: <18963935a2bc21176c196dadd90c3453@localhost.localdomain> yroux added a comment. 
In D83313#2137190, @efriedma wrote:

> I'm not really happy with this approach; if LiveRegUnits isn't producing correct results, we should fix it, not try to hack around it.
>
> Maybe we should revive D40061 ...

My attempt wasn't to hack around it, but to calculate the LiveRegUnit accurately ;-) Sorry I wasn't working on LLVM when you proposed D40061, but it is indeed a better approach, can you re-open it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83313/new/

https://reviews.llvm.org/D83313

From llvm-commits at lists.llvm.org Thu Jul 9 06:18:54 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 13:18:54 +0000 (UTC)
Subject: [PATCH] D83482: [yaml2obj] - Add a syntax to override e_phoff, e_phentsize and e_phnum fields.
Message-ID:

grimar created this revision.
grimar added reviewers: jhenderson, MaskRay.
Herald added subscribers: hiraditya, emaste.
Herald added a reviewer: espindola.
Herald added a project: LLVM.

This adds `PhOff`, `PhEntSize` and `PhNum` keys.
Will be useful for creating broken objects for testing llvm-readelf.


https://reviews.llvm.org/D83482

Files:
  llvm/include/llvm/ObjectYAML/ELFYAML.h
  llvm/lib/ObjectYAML/ELFEmitter.cpp
  llvm/lib/ObjectYAML/ELFYAML.cpp
  llvm/test/tools/yaml2obj/ELF/header-sh-fields.yaml
  llvm/tools/obj2yaml/elf2yaml.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83482.276724.patch
Type: text/x-patch
Size: 5407 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 06:21:46 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 13:21:46 +0000 (UTC)
Subject: [PATCH] D83483: GlobalISel: Don't use virtual for distinguishing arg handlers
Message-ID:

arsenm created this revision.
arsenm added reviewers: aemerson, paquette, aditya_nandakumar, dsanders, rovka, petarj.
Herald added subscribers: kerbowa, atanasyan, hiraditya, arichardson, nhaehnle, wdng, jvesely, sdardis.
Herald added a project: LLVM.

There's no reason to involve the hassle of a virtual method targets
have to override for a simple boolean.

Not sure exactly what's going on with Mips, but it seems to define its
own totally separate handler classes.


https://reviews.llvm.org/D83483

Files:
  llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h
  llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
  llvm/lib/Target/ARM/ARMCallLowering.cpp
  llvm/lib/Target/Mips/MipsCallLowering.cpp
  llvm/lib/Target/X86/X86CallLowering.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83483.276725.patch
Type: text/x-patch
Size: 24895 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 06:28:53 2020
From: llvm-commits at lists.llvm.org (Yvan Roux via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 13:28:53 +0000 (UTC)
Subject: [PATCH] D83313: [MachineOutliner] Fix liveness computing.
In-Reply-To:
References:
Message-ID: <93979d3b862a5821ae9b54b9e31fee4f@localhost.localdomain>

yroux added a comment.

In D83313#2138334, @samparker wrote:

> I guess I just don't understand why the BX_RET wouldn't be marked with using the link register in the first place?

Yes, that's confusing, and the first thing I tried was to add a USES = [LR] to its definition, but then LR would need to be live-in everywhere. My understanding is that it is handled as needed by prolog/epilog emission.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83313/new/

https://reviews.llvm.org/D83313

From llvm-commits at lists.llvm.org Thu Jul 9 06:33:50 2020
From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 13:33:50 +0000 (UTC)
Subject: [PATCH] D83472: [SystemZ/ZOS] Add header file to encapsulate use of
In-Reply-To:
References:
Message-ID: <698a91b3887d31c64d27779cc672cbd3@localhost.localdomain>

hubert.reinterpretcast added inline comments.


================
Comment at: llvm/include/llvm/Support/ExitCodes.h:22
+#elif __MVS__
+// does not exists on z/OS. The only used value in LLVM is
+// EX_IOERR, which is used to signal a special error condition (broken pipe).
----------------
Minor nits:
s/exists/exist/;
s/used value/value used/;


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83472/new/

https://reviews.llvm.org/D83472

From llvm-commits at lists.llvm.org Thu Jul 9 06:37:47 2020
From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits)
Date: Thu, 09 Jul 2020 06:37:47 -0700 (PDT)
Subject: [llvm] 6f5d913 - OpaquePtr: Don't check pointee type for byval/preallocated
Message-ID: <5f071dab.1c69fb81.3a60c.7aeb@mx.google.com>

Author: Matt Arsenault
Date: 2020-07-09T09:37:41-04:00
New Revision: 6f5d9136b27eefc981333d8c23ea9c0a38033d7b

URL: https://github.com/llvm/llvm-project/commit/6f5d9136b27eefc981333d8c23ea9c0a38033d7b
DIFF: https://github.com/llvm/llvm-project/commit/6f5d9136b27eefc981333d8c23ea9c0a38033d7b.diff

LOG: OpaquePtr: Don't check pointee type for byval/preallocated

Since none of these users really care about the actual type, hide the
type under a new size-getting attribute to go along with
hasPassPointeeByValueAttr.
This will work better for the future byref attribute, which may end up only tracking the byte size and not the IR type. We currently have 3 parameter attributes that should carry the type (technically inalloca does not yet). The APIs are somewhat awkward since preallocated/inalloca piggyback on byval in some places, but in others are treated as distinct attributes. Since these are all mutually exclusive, we should probably just merge all the attribute infrastructure treating these as totally distinct attributes. Added: Modified: llvm/include/llvm/IR/Argument.h llvm/lib/IR/Function.cpp llvm/lib/IR/Mangler.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/Argument.h b/llvm/include/llvm/IR/Argument.h index 2bd8e99f12c4..af469e8a5d1a 100644 --- a/llvm/include/llvm/IR/Argument.h +++ b/llvm/include/llvm/IR/Argument.h @@ -75,6 +75,10 @@ class Argument final : public Value { /// attribute. These attributes represent arguments being passed by value. bool hasPassPointeeByValueAttr() const; + /// If this argument satisfies has hasPassPointeeByValueAttr, return the + /// in-memory ABI size copied to the stack for the call. Otherwise, return 0. + uint64_t getPassPointeeByValueCopySize(const DataLayout &DL) const; + /// If this is a byval or inalloca argument, return its alignment. /// FIXME: Remove this function once transition to Align is over. /// Use getParamAlign() instead. 
diff --git a/llvm/lib/IR/Function.cpp b/llvm/lib/IR/Function.cpp index 78092cd7077d..0ec0cce83a8c 100644 --- a/llvm/lib/IR/Function.cpp +++ b/llvm/lib/IR/Function.cpp @@ -128,6 +128,27 @@ bool Argument::hasPassPointeeByValueAttr() const { Attrs.hasParamAttribute(getArgNo(), Attribute::Preallocated); } +uint64_t Argument::getPassPointeeByValueCopySize(const DataLayout &DL) const { + AttributeSet ParamAttrs + = getParent()->getAttributes().getParamAttributes(getArgNo()); + + // FIXME: All the type carrying attributes are mutually exclusive, so there + // should be a single query to get the stored type that handles any of them. + if (Type *ByValTy = ParamAttrs.getByValType()) + return DL.getTypeAllocSize(ByValTy); + if (Type *PreAllocTy = ParamAttrs.getPreallocatedType()) + return DL.getTypeAllocSize(PreAllocTy); + + // FIXME: inalloca always depends on pointee element type. It's also possible + // for byval to miss it. + if (ParamAttrs.hasAttribute(Attribute::InAlloca) || + ParamAttrs.hasAttribute(Attribute::ByVal) || + ParamAttrs.hasAttribute(Attribute::Preallocated)) + return DL.getTypeAllocSize(cast<PointerType>(getType())->getElementType()); + + return 0; +} + unsigned Argument::getParamAlignment() const { assert(getType()->isPointerTy() && "Only pointers have alignments"); return getParent()->getParamAlignment(getArgNo()); diff --git a/llvm/lib/IR/Mangler.cpp b/llvm/lib/IR/Mangler.cpp index ba6ca7abae58..0d66e321c396 100644 --- a/llvm/lib/IR/Mangler.cpp +++ b/llvm/lib/IR/Mangler.cpp @@ -94,15 +94,18 @@ static void addByteCountSuffix(raw_ostream &OS, const Function *F, const DataLayout &DL) { // Calculate arguments size total. unsigned ArgWords = 0; + + const unsigned PtrSize = DL.getPointerSize(); + for (Function::const_arg_iterator AI = F->arg_begin(), AE = F->arg_end(); AI != AE; ++AI) { - Type *Ty = AI->getType(); // 'Dereference' type in case of byval or inalloca parameter attribute. 
- if (AI->hasPassPointeeByValueAttr()) - Ty = cast<PointerType>(Ty)->getElementType(); + uint64_t AllocSize = AI->hasPassPointeeByValueAttr() ? + AI->getPassPointeeByValueCopySize(DL) : + DL.getTypeAllocSize(AI->getType()); + // Size should be aligned to pointer size. - unsigned PtrSize = DL.getPointerSize(); - ArgWords += alignTo(DL.getTypeAllocSize(Ty), PtrSize); + ArgWords += alignTo(AllocSize, PtrSize); } OS << '@' << ArgWords; From llvm-commits at lists.llvm.org Thu Jul 9 06:38:11 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:38:11 +0000 (UTC) Subject: [PATCH] D82679: OpaquePtr: Don't check pointee type for byval/preallocated In-Reply-To: References: Message-ID: arsenm closed this revision. arsenm added a comment. 6f5d9136b27eefc981333d8c23ea9c0a38033d7b CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82679/new/ https://reviews.llvm.org/D82679 From llvm-commits at lists.llvm.org Thu Jul 9 06:39:43 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:39:43 +0000 (UTC) Subject: [PATCH] D83444: [AArch64][SVE] Add lowering for llvm.fma. In-Reply-To: References: Message-ID: <27738e0006beeb4b2806734bab00faa7@localhost.localdomain> paulwalker-arm accepted this revision. paulwalker-arm added inline comments. This revision is now accepted and ready to land. 
================ Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:405-410 + def : Pat<(nxv8f16 (AArch64fma_p nxv8i1:$P, nxv8f16:$Op1, nxv8f16:$Op2, nxv8f16:$Op3)), + (FMLA_ZPmZZ_H $P, $Op3, $Op1, $Op2)>; + def : Pat<(nxv4f32 (AArch64fma_p nxv4i1:$P, nxv4f32:$Op1, nxv4f32:$Op2, nxv4f32:$Op3)), + (FMLA_ZPmZZ_S $P, $Op3, $Op1, $Op2)>; + def : Pat<(nxv2f64 (AArch64fma_p nxv2i1:$P, nxv2f64:$Op1, nxv2f64:$Op2, nxv2f64:$Op3)), + (FMLA_ZPmZZ_D $P, $Op3, $Op1, $Op2)>; ---------------- I was going to say you're missing patterns for the other legal scalable vector types, but I can see that's a common theme across the floating point instructions so I'm happy enough. ================ Comment at: llvm/test/CodeGen/AArch64/sve-fp.ll:138 + +declare @llvm.fma.nxv2f64(, , ) +declare @llvm.fma.nxv4f32(, , ) ---------------- To be consistent these belong at the bottom of the file with the others. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83444/new/ https://reviews.llvm.org/D83444 From llvm-commits at lists.llvm.org Thu Jul 9 06:40:50 2020 From: llvm-commits at lists.llvm.org (Kuan Hsu Chen (Zakk) via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:40:50 +0000 (UTC) Subject: [PATCH] D71124: [RISCV] support clang driver to select cpu In-Reply-To: References: Message-ID: <2c0f390008a7180c1b57cf4e0d4bf808@localhost.localdomain> khchen planned changes to this revision. khchen marked an inline comment as done. khchen added a comment. BTW, this patch depends on D77030 , which aim to avoid the forcing of any ProcessorModel to have `FeatureRVCHints` feature. But if we decide to keep the `FeatureRVCHints`, I need to change implementation a little. ================ Comment at: clang/test/Driver/riscv-cpus.c:2 +// Check target CPUs are correctly passed. 
+ +// RUN: %clang -target riscv32 -### -c %s 2>&1 -mcpu=rocket-rv32 | FileCheck -check-prefix=MCPU-ROCKETCHIP32 %s ---------------- asb wrote: > I think for completeness this test should be validating the interaction of the ABI choosing logic with CPU selection as well. With the implemented logic I believe it should show that lp64d is selected for -mcpu=sifive-u54 and that -mcpu=sifive-u54 -mabi=lp64 will respect the ABI choice okay, it makes sense to me, I will update this patch soon. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71124/new/ https://reviews.llvm.org/D71124 From llvm-commits at lists.llvm.org Thu Jul 9 06:44:30 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:44:30 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: <4cfab9dee4262bd919522d43bd0f2239@localhost.localdomain> spatel added a comment. In D60413#2139023 , @dnsampaio wrote: > Fixed test location and command The file is not moved here in the review, so something may be out-of-sync. The best thing would be to commit that file first with the current CHECK lines, then update it after applying this code patch. That way, we will highlight the test diffs. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Thu Jul 9 06:51:46 2020 From: llvm-commits at lists.llvm.org (Lewis Revill via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:51:46 +0000 (UTC) Subject: [PATCH] D79870: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions In-Reply-To: References: Message-ID: <259666c0412fadc9938fb9edf2eddbb1@localhost.localdomain> lewis-revill added inline comments. 
================ Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:173 + +bool RISCVDAGToDAGISel::SelectSLOI(SDValue N, SDValue &RS1, SDValue &Shamt) { + MVT XLenVT = Subtarget->getXLenVT(); ---------------- Indentation within these Select functions is messed up, presumably due to a mix of tabs and spaces. ================ Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:262 +bool RISCVDAGToDAGISel::SelectSLOIW(SDValue N, SDValue &RS1, SDValue &Shamt) { + if (N.getOpcode() == ISD::SIGN_EXTEND_INREG && + cast<VTSDNode>(N.getOperand(1))->getVT() == MVT::i32) { ---------------- I'm not sure what convention the other select functions for W instructions follow, but perhaps an assert for IsRV64 should be added for completeness? ================ Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoB.td:641 +def SROIPat : ComplexPattern; +def SLOIWPat : ComplexPattern; +def SROIWPat : ComplexPattern; ---------------- Can these W selects be guarded for 64-bit only? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79870/new/ https://reviews.llvm.org/D79870 From llvm-commits at lists.llvm.org Thu Jul 9 06:54:22 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:54:22 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: dmgreen added a comment. Thanks. I had started collecting some statistics. It appears this patch lowers the number of calls to alias(..) by 0.8%, which matches up with what @nikic was seeing. The percentage of calls that end up as No/Must/Partial alias goes up from 65.7% to 66%. That probably doesn't tell you a lot on its own, considering all the things that could be going on, but it's in the right direction. Like I said, my benchmarks show this to be an improvement, and it should give better aliasing info. If it does cause problems for anyone I am of course happy to take a look. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Thu Jul 9 06:55:09 2020 From: llvm-commits at lists.llvm.org (David Green via llvm-commits) Date: Thu, 09 Jul 2020 06:55:09 -0700 (PDT) Subject: [llvm] af839a9 - [BasicAA] Enable -basic-aa-recphi by default Message-ID: <5f0721bd.1c69fb81.436ee.7085@mx.google.com> Author: David Green Date: 2020-07-09T14:54:53+01:00 New Revision: af839a96187e3538d63ad57571e4bdf01e2b15c5 URL: https://github.com/llvm/llvm-project/commit/af839a96187e3538d63ad57571e4bdf01e2b15c5 DIFF: https://github.com/llvm/llvm-project/commit/af839a96187e3538d63ad57571e4bdf01e2b15c5.diff LOG: [BasicAA] Enable -basic-aa-recphi by default This option was added a while back, to help improve AA around pointer phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can then prove more precise aliasing info. Differential Revision: https://reviews.llvm.org/D82998 Added: Modified: llvm/lib/Analysis/BasicAliasAnalysis.cpp llvm/test/Analysis/BasicAA/phi-loop.ll llvm/test/Analysis/BasicAA/recphi.ll Removed: ################################################################################ diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp index 74664098ce1d..5574d3f5db6a 100644 --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp @@ -66,7 +66,7 @@ using namespace llvm; /// Enable analysis of recursive PHI nodes. static cl::opt<bool> EnableRecPhiAnalysis("basic-aa-recphi", cl::Hidden, - cl::init(false)); + cl::init(true)); /// By default, even on 32-bit architectures we use 64-bit integers for /// calculations. 
This will allow us to more-aggressively decompose indexing diff --git a/llvm/test/Analysis/BasicAA/phi-loop.ll b/llvm/test/Analysis/BasicAA/phi-loop.ll index db3023c6560d..e54752a9223f 100644 --- a/llvm/test/Analysis/BasicAA/phi-loop.ll +++ b/llvm/test/Analysis/BasicAA/phi-loop.ll @@ -1,4 +1,4 @@ -; RUN: opt < %s -basic-aa -basic-aa-recphi=1 -gvn -S | FileCheck %s +; RUN: opt < %s -basic-aa -gvn -S | FileCheck %s ; ; Check that section->word_ofs doesn't get reloaded in every iteration of the ; for loop. diff --git a/llvm/test/Analysis/BasicAA/recphi.ll b/llvm/test/Analysis/BasicAA/recphi.ll index 130058c74560..bdd85c8f0e6c 100644 --- a/llvm/test/Analysis/BasicAA/recphi.ll +++ b/llvm/test/Analysis/BasicAA/recphi.ll @@ -1,4 +1,4 @@ -; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -basic-aa-recphi -disable-output 2>&1 | FileCheck %s +; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s ; CHECK-LABEL: Function: simple: 5 pointers, 0 call sites ; CHECK: NoAlias: float* %src1, float* %src2 From llvm-commits at lists.llvm.org Thu Jul 9 06:55:18 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:55:18 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGaf839a96187e: [BasicAA] Enable -basic-aa-recphi by default (authored by dmgreen). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 Files: llvm/lib/Analysis/BasicAliasAnalysis.cpp llvm/test/Analysis/BasicAA/phi-loop.ll llvm/test/Analysis/BasicAA/recphi.ll Index: llvm/test/Analysis/BasicAA/recphi.ll =================================================================== --- llvm/test/Analysis/BasicAA/recphi.ll +++ llvm/test/Analysis/BasicAA/recphi.ll @@ -1,4 +1,4 @@ -; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -basic-aa-recphi -disable-output 2>&1 | FileCheck %s +; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s ; CHECK-LABEL: Function: simple: 5 pointers, 0 call sites ; CHECK: NoAlias: float* %src1, float* %src2 Index: llvm/test/Analysis/BasicAA/phi-loop.ll =================================================================== --- llvm/test/Analysis/BasicAA/phi-loop.ll +++ llvm/test/Analysis/BasicAA/phi-loop.ll @@ -1,4 +1,4 @@ -; RUN: opt < %s -basic-aa -basic-aa-recphi=1 -gvn -S | FileCheck %s +; RUN: opt < %s -basic-aa -gvn -S | FileCheck %s ; ; Check that section->word_ofs doesn't get reloaded in every iteration of the ; for loop. Index: llvm/lib/Analysis/BasicAliasAnalysis.cpp =================================================================== --- llvm/lib/Analysis/BasicAliasAnalysis.cpp +++ llvm/lib/Analysis/BasicAliasAnalysis.cpp @@ -66,7 +66,7 @@ /// Enable analysis of recursive PHI nodes. static cl::opt<bool> EnableRecPhiAnalysis("basic-aa-recphi", cl::Hidden, - cl::init(false)); + cl::init(true)); /// By default, even on 32-bit architectures we use 64-bit integers for /// calculations. This will allow us to more-aggressively decompose indexing -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82998.276728.patch Type: text/x-patch Size: 1598 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 06:55:38 2020 From: llvm-commits at lists.llvm.org (Andrew Ng via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:55:38 +0000 (UTC) Subject: [PATCH] D83321: [Support] Fix utf16 path's index upper bound In-Reply-To: References: Message-ID: <3bf166ee1e75095984e523bb77b3b13f@localhost.localdomain> andrewng added a comment. That test is for `widenPath` and it does cover a similar situation (I know because I wrote it). The code you have changed relates to `directory_iterator_construct`, which does not have coverage of the situation where the UTF16 length is less than the UTF8 length, i.e. the issue that you are fixing. If another reviewer is happy to approve this change without any test coverage, then that's fine with me. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83321/new/ https://reviews.llvm.org/D83321 From llvm-commits at lists.llvm.org Thu Jul 9 06:59:33 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 13:59:33 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: <2224acde5b64de286559f69d6412ac50@localhost.localdomain> fhahn added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:228 + DepResult->isAnti())) + return true; + return false; ---------------- The if here doesn't add much, I think. It would be simpler to just `return DepResult && (DepResult->isOutput() || DepResult->isFlow() || DepResult->isAnti())`? (The parentheses are needed so that `DepResult` is not dereferenced when it is null.) ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:233 + +bool isDependenceSafe(Instruction &I, MemorySSAUpdater &MSSAU, + SmallPtrSet InstsToCheck) { ---------------- I don't think there is a reason to pass MemorySSAUpdater here, as you don't modify the IR. Just pass MemorySSA directly. 
Also, please add a comment explaining the logic behind the checks (same for the DI version). ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:245 + return false; + bool IsFlowOrOutput = false; + bool IsAnti = false; ---------------- What we are doing here is basically checking if 2 instructions may alias, right? Given that, the variable names seem a bit confusing. Also, the function returns true if either IsFlowOrOutput or IsAnti is true. Could you just return true directly? ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:368 // Skip tests when we don't have PDT or DI - if (!PDT || !DI) + if (!PDT || !(DI || MSSAU)) return false; ---------------- Does it make sense to even call this function if any of those is not available? I.e., if they are all required, wouldn't it make sense to assert that they are all provided, or turn them into references? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Thu Jul 9 07:00:37 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:00:37 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: fhahn added a comment. In D82998#2141649, @dmgreen wrote: > Thanks. > > I had started collecting some statistics. It appears this patch lowers the number of calls to alias(..) by 0.8%, which matches up with what @nikic was seeing. The percentage of calls that end up as No/Must/Partial alias goes up from 65.7% to 66%. Which probably doesn't tell you a lot on it's own considering all the things that could be going on, but it's in the right direction. > > Like I said, my benchmarks show this to be an improvement, and it should give better aliasing info. If it does cause problems for anyone I am of course happy to take look. 
yeah that should be fine, the main thing I was worried about is compile-time here, but it sounds like we are good there. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Thu Jul 9 07:01:38 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Thu, 09 Jul 2020 07:01:38 -0700 (PDT) Subject: [llvm] 3514f58 - Fix MSVC "not all control paths return a value" warning. NFC. Message-ID: <5f072342.1c69fb81.ae735.6508@mx.google.com> Author: Simon Pilgrim Date: 2020-07-09T15:01:13+01:00 New Revision: 3514f58fbea5967a967468633c901e9b2f241594 URL: https://github.com/llvm/llvm-project/commit/3514f58fbea5967a967468633c901e9b2f241594 DIFF: https://github.com/llvm/llvm-project/commit/3514f58fbea5967a967468633c901e9b2f241594.diff LOG: Fix MSVC "not all control paths return a value" warning. NFC. Added: Modified: llvm/tools/llvm-objdump/llvm-objdump.cpp Removed: ################################################################################ diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp index 7d282074efa6..320bbb5d358b 100644 --- a/llvm/tools/llvm-objdump/llvm-objdump.cpp +++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp @@ -811,6 +811,7 @@ class LiveVariablePrinter { case LineChar::LabelHoriz: return IsASCII ? "-" : u8"\u2500"; } + llvm_unreachable("Unhandled LineChar enum"); } /// Print live ranges to the right of an existing line. This assumes the From llvm-commits at lists.llvm.org Thu Jul 9 07:15:24 2020 From: llvm-commits at lists.llvm.org (Sam Elliott via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:15:24 +0000 (UTC) Subject: [PATCH] D65802: [DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1) In-Reply-To: References: Message-ID: lenary added a comment. Herald added a subscriber: ecnelises. Ping? 
This is mentioned in a RISC-V bug on llvm bugzilla, and seems like a nice improvement to multiple targets if we can land it. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D65802/new/ https://reviews.llvm.org/D65802 From llvm-commits at lists.llvm.org Thu Jul 9 07:16:47 2020 From: llvm-commits at lists.llvm.org (Sergej Jaskiewicz via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:16:47 +0000 (UTC) Subject: [PATCH] D83228: [llvm] [unittests] Remove some temporary files after they're not needed In-Reply-To: References: Message-ID: broadwaylamb added a comment. Gentle ping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83228/new/ https://reviews.llvm.org/D83228 From llvm-commits at lists.llvm.org Thu Jul 9 07:17:45 2020 From: llvm-commits at lists.llvm.org (Paul Robinson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:17:45 +0000 (UTC) Subject: [PATCH] D83463: [DWARF][EntryValues] Emit GNU extensions in the case of DWARF 4 + SCE In-Reply-To: References: Message-ID: <437c474733e74e84b48dfd5f9737b514@localhost.localdomain> probinson accepted this revision. probinson added a comment. This revision is now accepted and ready to land. Thanks! This will keep the debugger folks off my case. :-) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83463/new/ https://reviews.llvm.org/D83463 From llvm-commits at lists.llvm.org Thu Jul 9 07:19:13 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:19:13 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: <691d01cba5680cccfcf2cce12483032b@localhost.localdomain> dnsampaio added a comment. 
In D60413#2141645 , @spatel wrote: > In D60413#2139023 , @dnsampaio wrote: > > > Fixed test location and command > > > The file is not moved here in the review, so something may be out-of-sync. The best thing would be to commit that file first with the current CHECK lines, then update it after applying this code patch. That way, we will highlight the test diffs. @spatel Ups, sorry about that, I got two working environments now wfh, one was not fixed. Will do as you recommend. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Thu Jul 9 07:22:04 2020 From: llvm-commits at lists.llvm.org (Anirudh Prasad via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:22:04 +0000 (UTC) Subject: [PATCH] D83484: Use InitLLVM in llvm-stress, sancov and TableGen Message-ID: anirudhp created this revision. anirudhp added reviewers: uweigand, Kai, ruiu. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. This patch refactors the llvm tools namely, llvm-stress, sancov and TableGen, to use the new `InitLLVM` interface which encapsulates `PrettyStackTrace` Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83484 Files: llvm/tools/llvm-stress/llvm-stress.cpp llvm/tools/sancov/sancov.cpp llvm/utils/TableGen/TableGen.cpp Index: llvm/utils/TableGen/TableGen.cpp =================================================================== --- llvm/utils/TableGen/TableGen.cpp +++ llvm/utils/TableGen/TableGen.cpp @@ -12,9 +12,7 @@ #include "TableGenBackends.h" // Declares all backends. 
#include "llvm/Support/CommandLine.h" -#include "llvm/Support/ManagedStatic.h" -#include "llvm/Support/PrettyStackTrace.h" -#include "llvm/Support/Signals.h" +#include "llvm/Support/InitLLVM.h" #include "llvm/TableGen/Main.h" #include "llvm/TableGen/Record.h" #include "llvm/TableGen/SetTheory.h" @@ -272,12 +270,9 @@ } int main(int argc, char **argv) { - sys::PrintStackTraceOnErrorSignal(argv[0]); - PrettyStackTraceProgram X(argc, argv); + InitLLVM X(argc, argv); cl::ParseCommandLineOptions(argc, argv); - llvm_shutdown_obj Y; - return TableGenMain(argv[0], &LLVMTableGenMain); } Index: llvm/tools/sancov/sancov.cpp =================================================================== --- llvm/tools/sancov/sancov.cpp +++ llvm/tools/sancov/sancov.cpp @@ -32,15 +32,13 @@ #include "llvm/Support/Errc.h" #include "llvm/Support/ErrorOr.h" #include "llvm/Support/FileSystem.h" +#include "llvm/Support/InitLLVM.h" #include "llvm/Support/JSON.h" #include "llvm/Support/MD5.h" -#include "llvm/Support/ManagedStatic.h" #include "llvm/Support/MemoryBuffer.h" #include "llvm/Support/Path.h" -#include "llvm/Support/PrettyStackTrace.h" #include "llvm/Support/Regex.h" #include "llvm/Support/SHA1.h" -#include "llvm/Support/Signals.h" #include "llvm/Support/SourceMgr.h" #include "llvm/Support/SpecialCaseList.h" #include "llvm/Support/TargetRegistry.h" @@ -1134,10 +1132,7 @@ } // namespace int main(int Argc, char **Argv) { - // Print stack trace if we signal out. - sys::PrintStackTraceOnErrorSignal(Argv[0]); - PrettyStackTraceProgram X(Argc, Argv); - llvm_shutdown_obj Y; // Call llvm_shutdown() on exit. 
+ llvm::InitLLVM X(Argc, Argv); llvm::InitializeAllTargetInfos(); llvm::InitializeAllTargetMCs(); Index: llvm/tools/llvm-stress/llvm-stress.cpp =================================================================== --- llvm/tools/llvm-stress/llvm-stress.cpp +++ llvm/tools/llvm-stress/llvm-stress.cpp @@ -38,8 +38,7 @@ #include "llvm/Support/CommandLine.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/FileSystem.h" -#include "llvm/Support/ManagedStatic.h" -#include "llvm/Support/PrettyStackTrace.h" +#include "llvm/Support/InitLLVM.h" #include "llvm/Support/ToolOutputFile.h" #include "llvm/Support/raw_ostream.h" #include @@ -733,10 +732,8 @@ int main(int argc, char **argv) { using namespace llvm; - // Init LLVM, call llvm_shutdown() on exit, parse args, etc. - PrettyStackTraceProgram X(argc, argv); + InitLLVM X(argc, argv); cl::ParseCommandLineOptions(argc, argv, "llvm codegen stress-tester\n"); - llvm_shutdown_obj Y; auto M = std::make_unique("/tmp/autogen.bc", Context); Function *F = GenEmptyFunction(M.get()); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83484.276734.patch Type: text/x-patch Size: 2998 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 07:24:01 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:24:01 +0000 (UTC) Subject: [PATCH] D65802: [DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1) In-Reply-To: References: Message-ID: lebedev.ri added a comment. @rogfer01 reverse-ping? I'm not sure why this got stuck. Perhaps the code is fine and only the assert should go. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D65802/new/ https://reviews.llvm.org/D65802 From llvm-commits at lists.llvm.org Thu Jul 9 07:24:23 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Thu, 09 Jul 2020 07:24:23 -0700 (PDT) Subject: [llvm] fdde69a - AMDGPU/GlobalISel: Work around verifier error in test Message-ID: <5f072897.1c69fb81.325ec.7723@mx.google.com> Author: Matt Arsenault Date: 2020-07-09T10:24:16-04:00 New Revision: fdde69aac9b99a5cda49c5b738dc8dc67ea4cbbd URL: https://github.com/llvm/llvm-project/commit/fdde69aac9b99a5cda49c5b738dc8dc67ea4cbbd DIFF: https://github.com/llvm/llvm-project/commit/fdde69aac9b99a5cda49c5b738dc8dc67ea4cbbd.diff LOG: AMDGPU/GlobalISel: Work around verifier error in test The unfortunate split between finalizeLowering and the selector pass means there's a point where the verifier fails. The DAG selector pass skips the verifier, but this seems to not work when using the GlobalISel fallback. Added: Modified: llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll index df536962e1b2..256b95d222be 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-divergent.ll @@ -1,4 +1,4 @@ -; RUN: not llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -global-isel-abort=2 -pass-remarks-missed="gisel.*" -o /dev/null 2>&1 %s | FileCheck -check-prefix=ERR %s +; RUN: not llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -global-isel-abort=2 -pass-remarks-missed="gisel.*" -verify-machineinstrs -o /dev/null 2>&1 %s | FileCheck -check-prefix=ERR %s ; ERR: remark: :0:0: cannot select: %{{[0-9]+}}:sreg_32(p5) = G_DYN_STACKALLOC %{{[0-9]+}}:vgpr(s32), 1 (in function: 
kernel_dynamic_stackalloc_vgpr_align4) ; ERR-NEXT: warning: Instruction selection used fallback path for kernel_dynamic_stackalloc_vgpr_align4 @@ -13,13 +13,13 @@ define amdgpu_kernel void @kernel_dynamic_stackalloc_vgpr_align4(i32 addrspace(1 %gep = getelementptr i32, i32 addrspace(1)* %ptr, i32 %id %n = load i32, i32 addrspace(1)* %gep %alloca = alloca i32, i32 %n, align 4, addrspace(5) - store volatile i32 0, i32 addrspace(5)* %alloca + store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(1)* undef ret void } define void @func_dynamic_stackalloc_vgpr_align4(i32 %n) { %alloca = alloca i32, i32 %n, align 4, addrspace(5) - store volatile i32 0, i32 addrspace(5)* %alloca + store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(1)* undef ret void } From llvm-commits at lists.llvm.org Thu Jul 9 07:25:15 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:25:15 +0000 (UTC) Subject: [PATCH] D81296: [PDB] Defer public serialization until PDB writing In-Reply-To: References: Message-ID: grimar added inline comments. ================ Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:236 + auto B = &HashRecords[BucketStarts[I]]; + auto E = &HashRecords[BucketCursors[I]]; + auto BucketCmp = [Records](const PSHashRecord &LHash, ---------------- 69 lld/COFF tests fail for me with the same symptom here. The problem is that `BucketStarts[I] == 1` and `BucketCursors[I] == 1`, but `HashRecords` has a size of 1 too, i.e. `HashRecords[1]` indexes one past the end of `HashRecords`. 
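A minimal sketch of the failure mode described above, assuming a plain `std::vector` and a hypothetical `Record` type standing in for `PSHashRecord`; `bucketRange` is an illustrative helper, not code from the patch or from LLVM. Writing `&V[V.size()]` goes through `operator[]` with an out-of-range index, which checked standard library builds reject even when the resulting pointer is never dereferenced. Pointer arithmetic on `data()` can form the same one-past-the-end pointer without that call:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the hash record type; not the real PSHashRecord.
struct Record {
  unsigned Off;
  unsigned CRef;
};

// Returns [begin, end) pointers for the half-open index range [Start, End).
// operator[] is never called, so End == Records.size() is allowed and
// yields a valid one-past-the-end pointer.
std::pair<Record *, Record *> bucketRange(std::vector<Record> &Records,
                                          std::size_t Start,
                                          std::size_t End) {
  assert(Start <= End && End <= Records.size() && "bucket out of range");
  Record *Base = Records.data();
  return {Base + Start, Base + End};
}
```

With one record and a bucket spanning `[1, 1)`, `bucketRange(Records, 1, 1)` returns two equal pointers, i.e. an empty range, and no bounds check fires.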
e.g: ``` 218>******************** 218>FAIL: lld :: COFF/used-lto.ll (358 of 2419) 218>******************** TEST 'lld :: COFF/used-lto.ll' FAILED ******************** 218>Script: 218>-- 218>: 'RUN: at line 2'; d:\work3\llvm\llvm-project\build\debug\bin\llvm-as.exe -o D:\Work3\LLVM\llvm-project\build\tools\lld\test\COFF\Output\used-lto.ll.tmp.obj D:\Work3\LLVM\llvm-project\lld\test\COFF\used-lto.ll 218>: 'RUN: at line 3'; d:\work3\llvm\llvm-project\build\debug\bin\lld-link.exe -dll -debug -opt:ref -noentry -out:D:\Work3\LLVM\llvm-project\build\tools\lld\test\COFF\Output\used-lto.ll.tmp.dll D:\Work3\LLVM\llvm-project\build\tools\lld\test\COFF\Output\used-lto.ll.tmp.obj 218>: 'RUN: at line 4'; d:\work3\llvm\llvm-project\build\debug\bin\llvm-pdbutil.exe dump -publics D:\Work3\LLVM\llvm-project\build\tools\lld\test\COFF\Output\used-lto.ll.tmp.pdb | d:\work3\llvm\llvm-project\build\debug\bin\filecheck.exe D:\Work3\LLVM\llvm-project\lld\test\COFF\used-lto.ll 218>-- 218>Exit Code: 2147483651 218> 218>Command Output (stdout): 218>-- 218>$ ":" "RUN: at line 2" 218>$ "d:\work3\llvm\llvm-project\build\debug\bin\llvm-as.exe" "-o" "D:\Work3\LLVM\llvm-project\build\tools\lld\test\COFF\Output\used-lto.ll.tmp.obj" "D:\Work3\LLVM\llvm-project\lld\test\COFF\used-lto.ll" 218>$ ":" "RUN: at line 3" 218>$ "d:\work3\llvm\llvm-project\build\debug\bin\lld-link.exe" "-dll" "-debug" "-opt:ref" "-noentry" "-out:D:\Work3\LLVM\llvm-project\build\tools\lld\test\COFF\Output\used-lto.ll.tmp.dll" "D:\Work3\LLVM\llvm-project\build\tools\lld\test\COFF\Output\used-lto.ll.tmp.obj" 218># command stderr: 218>PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. 
218> 218>0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902354DE (0x000000945F786798 0x0000000000000001 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), std::vector >::operator[]() + 0x6E bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733 + 0x4B byte(s) 218> 218> + 0x4B byte(s) 218> 218>0x00007FF7902364D70x00007FF7902364D7 (0x000000945F6C1750 0x0000000000000FEC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC) (0x000000945F6C1750 0x0000000000000FF8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x57 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733, d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x4B 
byte(s) 218> 218> + 0x2D byte(s) 218> 218>0x00007FF790235F600x00007FF7902364D7 (0x000000945FA176D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC) (0x000000945F6C1750 0x0000000000000FF4 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x70 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733 + 0x4B byte(s) 218> 218>, d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152 + 0x12 byte(s) 218> 218>0x00007FF7902364D70x00007FF790227670 (0x000000945F6C1750 0x0000000000000FE4 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC) (0x000000945FA176D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x57 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733 + 0x4B byte(s) 218> 218>, ::operator()() + 0x57 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x2D byte(s) 218> 218>0x00007FF7902364D7 (0x000000945F6C1750 0x0000000000000FF0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF790235F60, ::operator()() + 0x57 bytes(s) (0x000000945FA168C0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x70 bytes(s), d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152 + 0x12 byte(s) 218> 218>0x00007FF790227670 (0x000000945FA168C0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733 + 0x4B byte(s) 218> 218>, c:\program files (x86)\microsoft visuabuginfo\pdb\native\gsistreambuilder.cpp, line 16707566, c:\program files (x86)\microsoft visuabuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x30 byte(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x2D byte(s) 218> 218> 218> 218> + 0x2D byte(s) 
218> 218>, d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 1733 + 0x4B byte(s) 218> 218>0x00007FF7902364D7 (0x000000945F6C1750 0x0000000000000FFC 0x000000945F6C1648 0x000000945F6C1558)0x00007FF79022EB700x00007FF790235F60 (0x000000945FA168C0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF790235F60 (0x000000945FA1DD30 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC) (0x000000945FA164D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902364D7 (0x000000945F6C1750 0x0000000000000FE0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x57 bytes(s), ::operator()() + 0x57 bytes(s), std::_Invob86082a9071349385d17558456a913>::operator()() + 0x30 bytes(s), ::operator()() + 0x70 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733, std::ia_7cb86082a9071349385d17558456a913>::operator()() + 0x30 bytes(s) + 0x4B byte(s) 218> 218>, d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152, ::operator()() + 0x57 bytes(s) + 0x12 byte(s) 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>, d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x2D byte(s) 218> 218>0x00007FF7902364D7 (0x000000945F6C1750 0x0000000000000FE8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF790227670 (0x000000945FA164D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x57 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x2D byte(s) 218> 218>0x00007FF790235F60 (0x000000945FA169E0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x70 bytes(s), d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152 + 0x12 byte(s) 218> 218>0x00007FF790235F60 (0x000000945FA16A70 
0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF7902276C0, ::operator()() + 0x70 bytes(s) (0x000000945FA168C0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF790227670 (0x000000945FA169E0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x2D byte(s) 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF790235F60 (0x000000945FA16710 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF79022EB70 (0x000000945FA176D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x70 bytes(s), std::invoke< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF79023807F (0x000000945FA168B8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s)0x00007FF7902276C0 (0x000000945FA176D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>, std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218>0x00007FF78F80DD73 (0x000000945FA168B8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF79023807F (0x000000945FA176C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_c271b34c55094de858467af() + 0x53 bytes(s), ::operator()() + 0x70 bytes(s), c:\program files 
(x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>0x00007FF795FE2D0B (0x000000945FA168B8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x2B bytes(s), d:\work3\llvm\llvm-project\llvm\lib\support\parallel.cpp, line 161 218> 218>0x00007FF795FDE3A0 (0x000000945FA168B8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF795FE01C0 (0x000000945FA168B8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::invoke< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF795FDE3F0 (0x000000945FA168B8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218>, d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152 + 0x12 byte(s) 218> 218>0x00007FF790227670 (0x000000945FA16A70 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF795FE3ABF (0x000000945FA168B0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF79022EB70 (0x000000945FA16A70 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>, c:\program files (x86)\microsoft visual 
studio\2017\community\vc\tools\msvc\14.16.27023\include\vector, line 1733 + 0x4B byte(s) 218> 218>, std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152 + 0x12 byte(s) 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF7902364D7 (0x000000945F6C1750 0x0000000000000C94 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF790227670, ::operator()() + 0x57 bytes(s) (0x000000945FA16710 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF79022EB70 (0x000000945FA169E0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x2D byte(s) 218> 218>, std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF790230865 (0x0000000000000000 0x0000000000001000 0x000000945F6C1750 0xCCCCCCCCCCCCCCCC)0x00007FF78F80DD73 (0x00000094625CF9A0 0x000000945F9C0270 0x000000945F7B9680 0xCCCCCCCCCCCCCCCC)0x00007FF79022EB70, std::_Func_class::operator()() + 0x53 bytes(s) (0x000000945FA164D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::invoke< &>() + 0x30 bytes(s), std::invoke< &>() + 0x30 bytes(s), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s), d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152, d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 16707566 + 0x12 byte(s), llvm::parallel::detail::parallel_for_each_n >() + 0x185 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218> 218> 218> + 0x30 byte(s) 218> 218>, 
d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 156 + 0x15 byte(s) 218> 218>0x00007FF790227670 (0x000000945FA1DD30 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF79023011D0x00007FF79022EB70 (0x0000000000000000 0x0000000000001000 0x000000945F6C98C0 0x000000945F6C5840) (0x000000945FA16710 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), std::invoke< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF78F80DD73 (0x000000945FA176C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_class::operator()() + 0x53 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>0x00007FF795FE8119 (0x000000945F7B9680 0x0000000100000000 0xCCCCCCCC00000002 0xCCCCCCCCCCCCCCCC)0x00007FF7902276C0 (0x000000945FA16710 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), llvm::parallel::detail::`anonymous namespace'::ThreadPoolExecutor::work() + 0x119 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\support\parallel.cpp, line 109 218> 218>0x00007FF795FE2D0B (0x000000945FA176C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x2B bytes(s), d:\work3\llvm\llvm-project\llvm\lib\support\parallel.cpp, line 161 218> 218>0x00007FF795FE2CBD (0x000000945F7C6FA0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF795FDE3A0 (0x000000945FA176C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x3D bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566, 
d:\work3\llvm\llvm-project\llvm\lib\support\parallel.cpp, line 52 + 0x30 byte(s) + 0x3D byte(s) 218> 218> 218> 218>0x00007FF795FDE490 (0x000000945F7C6FA0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_functor::_Call< >() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF7902276C0 (0x000000945FA164D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218>0x00007FF79023807F (0x000000945FA164C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s)0x00007FF795FE0260, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218> (0x000000945F7C6FA0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::invoke< >() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF78F80DD73 (0x000000945FA164C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF795FDECDC (0x000000945F7C6FA0 0x00000094612AF6CC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_class::operator()() + 0x53 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>0x00007FF795FE2D0B (0x000000945FA164C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x2B bytes(s), d:\work3\llvm\llvm-project\llvm\lib\support\parallel.cpp, line 161 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 
218>0x00007FF795FDE3A0 (0x000000945FA164C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF795FE01C0 (0x000000945FA164C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::invoke< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF7902276C00x00007FF795FDE3F0 (0x000000945FA164C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218> (0x000000945FA169E0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218>0x00007FF795FE3ABF (0x000000945FA164C0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>0x00007FF78F80DD73 (0x000000946434FB70 0x000000945F9BEC70 0x000000945F7B9680 0xCCCCCCCCCCCCCCCC), std::_Func_class::operator()() + 0x53 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>0x00007FF79023807F (0x000000945FA169D8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s)0x00007FF795FE8119 (0x000000945F7B9680 0x0000000100000000 0xCCCCCCCC00000004 0xCCCCCCCCCCCCCCCC), c:\program files (x86)\microsoft visual 
studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>, llvm::parallel::detail::`anonymous namespace'::ThreadPoolExecutor::work() + 0x119 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\support\parallel.cpp, line 109 218> 218>0x00007FF78F80DD73 (0x000000945FA169D8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF795FE2CBD (0x000000945F7C6640 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_class::operator()() + 0x53 bytes(s), ::operator()() + 0x3D bytes(s), std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), std::_Invoker_functor::_Call< &>() + 0x30 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 235 + 0x2D byte(s) 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF790235F60 (0x000000945F7E6DB0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), ::operator()() + 0x70 bytes(s), d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 152 + 0x12 byte(s) 218> 218>0x00007FF795FE01C0 (0x000000945FA176C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::invoke< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF790227670 (0x000000945F7E6DB0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_LaunchPad >,std::default_delete > > > >::_Execute<0>() + 0x3C bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\thr\xthread, line 239 218> 218>0x00007FF795FDE3F0 (0x000000945FA176C8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF79022EB70 (0x000000945FA1DD30 
0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::invoke< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF795FE5B8D (0x00000094612AF6A8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_LaunchPad >,std::default_delete > > > >::_Run() + 0x5D bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\thr\xthread, line 2470x00007FF7902276C0 218> 218> (0x000000945FA1DD30 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218>, llvm::parallelForEachN< >() + 0x5D bytes(s), d:\work3\llvm\llvm-project\llvm\include\llvm\support\parallel.h, line 194 218> 218>0x00007FF795FE45F8 (0x00000094612AF6A8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_LaunchPad >,std::default_delete > > > >::_Go() + 0x28 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\thr\xthread, line 231 218> 218>0x00007FF790224B60 (0x000000945F786790 0x0000009400000000 0x000000945F6C9970 0xCCCCCCCCCCCCCCCC)0x00007FF79023807F (0x000000945FA1DD28 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), llvm::pdb::GSIHashStreamBuilder::finalizeBuckets() + 0x390 bytes(s)0x00007FF795FE356D (0x00000094612AF6A8 0x00007FF97072AA1B 0x0000000000000000 0x0000000000000000), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 261 218> 218>0x00007FF7902239B1 (0x000000945F7420D0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Pad::_Call_func() + 0x2D bytes(s), llvm::pdb::GSIStreamBuilder::finalizePublicBuckets() + 0x71 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native, line 15732480, 
d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native, line 52 218> 218>, d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 171 + 0x3D byte(s) 218> 218> 218> 218>0x00007FF795FDE4900x00007FF795FE2D0B (0x000000945F7C6640 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC) (0x000000945FA169D8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF790222FA6 (0x000000945F7420D0 0x000000945F6C9A98 0x000000945F6C9CE0 0xCCCCCCCC00000000), std::_Invoker_functor::_Call< >() + 0x30 bytes(s), llvm::pdb::GSIStreamBuilder::finalizeMsfLayout() + 0x36 bytes(s), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\gsistreambuilder.cpp, line 306 218> 218>, std::_Invoker_functor:,1>::_Call<::_Call< &>() + 0x30 bytes(s), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s), std::invoke< &>() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 167075660x00007FF79023F880 + 0x30 byte(s) 218> 218> (0x000000945F6CA690 0x000000945F6C9DF8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), llvm::pdb::PDBFileBuilder::finalizeMsfLayout() + 0x180 bytes(s)0x00007FF7902276C0 (0x000000945FA16A70 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF78F80DD73 (0x000000945FA1DD28 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\pdbfilebuilder.cpp, line 141 + 0x21 byte(s) 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF79023E378 (0x000000945F6CA690 0x000000945F6CA5C8 0x000000945F6CA5F0 0x000000945F6CAC68), llvm::pdb::PDBFileBuilder::commit() + 0x88 bytes(s)0x00007FF795FE0260 (0x000000945F7C6640 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 
0xCCCCCCCCCCCCCCCC), d:\work3\llvm\llvm-project\llvm\lib\debuginfo\pdb\native\pdbfilebuilder.cpp, line 265 + 0x12 byte(s) 218> 218>, std::invoke< >() + 0x30 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF78F965956 (0x000000945F6CA688 0x000000945F6CAC68 0x000000945F6CACB0 0xCCCCCCCCCCCCCCCC)0x00007FF795FDECDC (0x000000945F7C6640 0x00000094612AF6CC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), `anonymous namespace'::PDBLinker::commit() + 0x116 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 16707566 + 0x30 byte(s) 218> 218>0x00007FF79022EB70 (0x000000945F7E6DB0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), d:\work3\llvm\llvm-project\lld\coff\pdb.cpp, line 1456 + 0x6E byte(s) 218> 218>, std::_Invoker_ret::_Call< &>() + 0x30 bytes(s), std::_Func_class::operator()() + 0x53 bytes(s), ::operator()() + 0x2B bytes(s), std::_LaunchPad >,std::default_delete > > > >::_Execute<0>() + 0x3C bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 15732480 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\thr\xthread, line 15732480 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\thr\xthread, line 209 218> 218>, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\thr\xthread, line 239 218> 218>0x00007FF970744FB8 (0x00007FF78F2FA8C9 0x00000094612AF6A8 0x0000000000000000 0x0000000000000000), _register_onexit_function() + 0x4A8 bytes(s) 218> 218>0x00007FF78F9632E3 (0x000000945F753FE0 0x000000945F6CAF80 0x000000945F6CAF90 0x000000945ED2061C), lld::coff::createPDB() + 0x143 bytes(s), d:\work3\llvm\llvm-project\lld\coff\pdb.cpp, line 1377 218> 218>0x00007FF970744BF1 
(0x000000945F9C8DA0 0x0000000000000000 0x0000000000000000 0x0000000000000000), _register_onexit_function() + 0xE1 bytes(s) 218> 218>0x00007FF79023807F0x00007FF795FE3ABF (0x000000945FA16708 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC) (0x000000945FA176C0 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC), std::_Func_impl_no_alloc<,void>::_Do_call() + 0x2F bytes(s)0x00007FF9A00A13D2 (0x00007FF9A00A13B0 0x0000000000000000 0x0000000000000000 0x0000000000000000), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\type_traits, line 157324800x00007FF78F90242F 218> 218>, BaseThreadInitThunk() + 0x22 bytes(s), c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480, c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.16.27023\include\functional, line 15732480 (0x000000945F6CB020 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC) 218> 218> 218> 218> 218> 218>0x00007FF795FE5B8D (0x00000094612AF6A8 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC 0xCCCCCCCCCCCCCCCC)0x00007FF9A28654E4 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x34 bytes(s) 218> 218> 218>CUSTOMBUILD : error : command failed with exit status: 2147483651 ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81296/new/ https://reviews.llvm.org/D81296 From llvm-commits at lists.llvm.org Thu Jul 9 07:28:52 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:28:52 +0000 (UTC) Subject: [PATCH] D82763: MIR: Infer not-SSA for subregister defs In-Reply-To: References: Message-ID: arsenm marked an inline comment as done. arsenm added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/MachineRegisterInfo.h:453 + /// specified register, otherwise nullptr. 
+ MachineOperand *getOneDef(Register Reg) const { + def_iterator DI = def_begin(Reg); ---------------- thegameg wrote: > There is a `MachineInstr *getUniqueVRegDef(Register Reg) const;`. Maybe name this one `getUniqueVRegDefOperand`? Are they actually vreg-only? I guess this would technically work for physical registers. I picked the name to mirror the existing hasOneDef just above CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82763/new/ https://reviews.llvm.org/D82763 From llvm-commits at lists.llvm.org Thu Jul 9 07:31:26 2020 From: llvm-commits at lists.llvm.org (Diogo Sampaio via llvm-commits) Date: Thu, 09 Jul 2020 07:31:26 -0700 (PDT) Subject: [llvm] a0e981c - [NFC] Add SExt multiuses test Message-ID: <5f072a3e.1c69fb81.afaa7.750f@mx.google.com> Author: Diogo Sampaio Date: 2020-07-09T15:31:16+01:00 New Revision: a0e981c190ffa0ad6b521222bc2fba504c9750ec URL: https://github.com/llvm/llvm-project/commit/a0e981c190ffa0ad6b521222bc2fba504c9750ec DIFF: https://github.com/llvm/llvm-project/commit/a0e981c190ffa0ad6b521222bc2fba504c9750ec.diff LOG: [NFC] Add SExt multiuses test Added: llvm/test/Transforms/BDCE/sext_multi_uses.ll Modified: Removed: ################################################################################ diff --git a/llvm/test/Transforms/BDCE/sext_multi_uses.ll b/llvm/test/Transforms/BDCE/sext_multi_uses.ll new file mode 100644 index 000000000000..97709357919e --- /dev/null +++ b/llvm/test/Transforms/BDCE/sext_multi_uses.ll @@ -0,0 +1,111 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -o - -bdce -S %s | FileCheck %s +define i32 @ZEXT_0(i16 %a) { +; CHECK-LABEL: @ZEXT_0( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 65280 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 +; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] +; CHECK-NEXT: ret i32 [[OR]] +; +entry: + %ext = 
sext i16 %a to i32 + %and = and i32 %ext, 65280 + %lsr = lshr i32 %ext, 8 + %and2 = and i32 %lsr, 255 + %or = or i32 %and, %and2 + ret i32 %or +} + +define i32 @ZEXT_1(i16 %a) { +; CHECK-LABEL: @ZEXT_1( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 +; CHECK-NEXT: [[AND:%.*]] = or i32 [[EXT]], -65536 +; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] +; CHECK-NEXT: ret i32 [[OR]] +; +entry: + %ext = sext i16 %a to i32 + %lsr = lshr i32 %ext, 8 + %and2 = and i32 %lsr, 255 + %and = or i32 %ext, 4294901760 + %or = or i32 %and, %and2 + ret i32 %or +} + +define i16 @NOT_ZEXT_0(i16 %a) { +; CHECK-LABEL: @NOT_ZEXT_0( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 65280 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 9 +; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 +; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] +; CHECK-NEXT: [[RET:%.*]] = trunc i32 [[OR]] to i16 +; CHECK-NEXT: ret i16 [[RET]] +; +entry: + %ext = sext i16 %a to i32 + %and = and i32 %ext, 65280 + %lsr = lshr i32 %ext, 9 + %and2 = and i32 %lsr, 255 + %or = or i32 %and, %and2 + %ret = trunc i32 %or to i16 + ret i16 %ret +} + +define i32 @NOT_ZEXT_1(i16 %a) { +; CHECK-LABEL: @NOT_ZEXT_1( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 85280 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 +; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] +; CHECK-NEXT: ret i32 [[OR]] +; +entry: + %ext = sext i16 %a to i32 + %and = and i32 %ext, 85280 + %lsr = lshr i32 %ext, 8 + %and2 = and i32 %lsr, 255 + %or = or i32 %and, %and2 + ret i32 %or +} + +define i32 @NOT_ZEXT_2(i16 %a) { +; CHECK-LABEL: @NOT_ZEXT_2( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 
+; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 +; CHECK-NEXT: [[AND:%.*]] = xor i32 [[EXT]], -65536 +; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] +; CHECK-NEXT: ret i32 [[OR]] +; +entry: + %ext = sext i16 %a to i32 + %lsr = lshr i32 %ext, 8 + %and2 = and i32 %lsr, 255 + %and = xor i32 %ext, 4294901760 + %or = or i32 %and, %and2 + ret i32 %or +} + +define i16 @clear_assumptions(i8 %x, i16 %y) { +; CHECK-LABEL: @clear_assumptions( +; CHECK-NEXT: [[EXT:%.*]] = sext i8 [[X:%.*]] to i16 +; CHECK-NEXT: [[ADD:%.*]] = add nsw i16 [[EXT]], [[Y:%.*]] +; CHECK-NEXT: [[AND:%.*]] = and i16 [[ADD]], 255 +; CHECK-NEXT: ret i16 [[AND]] +; + %ext = sext i8 %x to i16 + %add = add nsw i16 %ext, %y + %and = and i16 %add, 255 + ret i16 %and +} From llvm-commits at lists.llvm.org Thu Jul 9 07:32:33 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:32:33 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <0a03d72694fd8bc0f91f0ca7cca5d74d@localhost.localdomain> sameerarora101 marked 4 inline comments as done. sameerarora101 added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/hide-unrelated-options.test:1 +## This test checks that unrelated options are hidden in help text. + ---------------- jhenderson wrote: > This is probably fine, but an alternative that might be more consistent is to expand help-message.test to show that the option categories that should be supported are printed (by checking the e.g. "Color options:" text), and that the unrelated options aren't (by similarly checking that the header for them isn't included). There are some examples of this for some tools like llvm-size. You might also want to add a --help-list test case as in llvm-size's help.test. ok, I have modeled `help.test` under `llvm-libtool-darwin` in a similar fashion as `llvm-size`. Thanks! 
(I have also removed `hide-unrelated-options.test` now). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Thu Jul 9 07:33:28 2020 From: llvm-commits at lists.llvm.org (Paul Robinson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:33:28 +0000 (UTC) Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE In-Reply-To: References: Message-ID: <5d0bd0aa3037a6dace974e4f26fa18c9@localhost.localdomain> probinson added a comment. Between this code and DwarfDebug, I'm having a hard time understanding the various controls over emitting entry-values. - There's an existing tuning check in DwarfDebug, now you're adding the complementary check in TargetOptions. Does this make the DwarfDebug check redundant? - There are 3 different flags to sort out whether to emit this stuff. SupportsDebugEntryValues is strictly based on the target, I get that. Then there's EnableDebugEntryValues in the TargetOptions, which seems to be tied to a tool-command-line option; and separately there's EmitDwarfDebugEntryValues in DwarfDebug, with its own command-line option. Can we simplify this at all? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83462/new/ https://reviews.llvm.org/D83462 From llvm-commits at lists.llvm.org Thu Jul 9 07:36:27 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:36:27 +0000 (UTC) Subject: [PATCH] D83463: [DWARF][EntryValues] Emit GNU extensions in the case of DWARF 4 + SCE In-Reply-To: References: Message-ID: <53b41d36d4034961d80ace40462aee8a@localhost.localdomain> djtodoro added a comment. In D83463#2141678 , @probinson wrote: > Thanks! This will keep the debugger folks off my case. :-) :D Thanks for the review! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83463/new/ https://reviews.llvm.org/D83463 From llvm-commits at lists.llvm.org Thu Jul 9 07:38:25 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:38:25 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <3f3a5fe0cd471ad1a85d2f69d4c7517f@localhost.localdomain> sameerarora101 updated this revision to Diff 276735. sameerarora101 added a comment. Remove `--output` option (keeping `-o`) and update `help.test` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 Files: llvm/docs/CommandGuide/index.rst llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/CMakeLists.txt llvm/test/tools/llvm-libtool-darwin/Inputs/input1.yaml llvm/test/tools/llvm-libtool-darwin/Inputs/input2.yaml llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/help-message.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82923.276735.patch Type: text/x-patch Size: 11756 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 07:39:10 2020 From: llvm-commits at lists.llvm.org (Lewis Revill via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:39:10 +0000 (UTC) Subject: [PATCH] D79873: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbbp asm instructions In-Reply-To: References: Message-ID: <6fb3d0d6e7624618ed75cfd267e092b3@localhost.localdomain> lewis-revill added inline comments. 
================ Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoB.td:870 +let Predicates = [HasStdExtZbbOrZbp, IsRV64] in { +def : Pat<(or (riscv_sllw (assertsexti32 GPR:$rs1), (assertsexti32 GPR:$rs2)), ---------------- I'm not quite sure given the tests here how these patterns are used? Is `@llvm.fshl.i32` lowered to this pattern? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79873/new/ https://reviews.llvm.org/D79873 From llvm-commits at lists.llvm.org Thu Jul 9 07:39:21 2020 From: llvm-commits at lists.llvm.org (Chris Bieneman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:39:21 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: beanz added a comment. There are a lot of downsides to global constructors. One is that they are difficult to debug because the code runs before `main`, so if anything goes wrong callstacks are odd, debuggers sometimes struggle to attach, and general headaches ensue. Additionally global constructors are an always-paid performance hit. Local statics are only constructed if they are used, but global statics are always constructed, and transitively they construct anything else they use. This can severely impact the process launch time of a program and it is very difficult to optimize around (other than just not using global constructors). While process launch time may not be something most applications care about because it "only happens once" for a compiler that can be invoked tens or hundreds of thousands of times per day per user, that cost really adds up. The last big issue with global constructors in LLVM is that LLVM is used as a library in many contexts, and global constructors (and generally global variables) cause lots of issues with that. One of the big problems with using global constructors in library code is that it is really easy to get multiple copies of the globals embedded into the same process. 
This generally happens when you have a single executable that links two libraries that both include parts of LLVM (in particular libLLVMSupport which you usually want to be able to inline/LTO across). There have been periodic pushes to clean up the global variables, and most of the ones in LLVM today are `cl::opt` objects which need a re-design to clean up.

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83372/new/
https://reviews.llvm.org/D83372

From llvm-commits at lists.llvm.org Thu Jul 9 07:39:27 2020
From: llvm-commits at lists.llvm.org (Whitney Tsang via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 14:39:27 +0000 (UTC)
Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA
In-Reply-To: References: Message-ID:

Whitney added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:222
+ SmallPtrSet InstsToCheck) {
+ return !std::any_of(InstsToCheck.begin(), InstsToCheck.end(),
+ [&DI, &I](Instruction *CurInst) {
----------------
1. !any_of can change to none_of
2. use llvm::none_of instead of std::none_of
```
return none_of(InstsToCheck, [&DI, &I] ...
```

================
Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:250
+ if (isa(MemUseOrDef)) {
+ if (isa(DestMemUseOrDef) ||
+ isa(DestMemUseOrDef)) {
----------------
DestMemUseOrDef is of type MemoryUseOrDef, so it must be a MemoryUse or MemoryDef.

================
Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:582
+ EXPECT_TRUE(isSafeToMoveBefore(*LI2, *LI1, DT, &PDT, &DI, &MSSAU));
+ EXPECT_TRUE(isSafeToMoveBefore(*LI2, *LI1, DT, &PDT, nullptr, &MSSAU));
 });
----------------
bmahjour wrote:
> Please also add a check to make sure independent memory load/stores can be moved past each other. For example, `%load2` should be able to move before the store to B.
>
> store i32 %load1, i32* %arrayidx_B, align 4
> %load2 = load i32, i32* %arrayidx_A, align 4
Good idea.
The test should include all four types of dependence, and all should be considered safe. Also make sure they are bidirectional, so check both move forward and move backward. ================ Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:513 *CI_safecall->getNextNode(), DT, &PDT, - &DI)); + &DI, &MSSAU)); + EXPECT_TRUE(isSafeToMoveBefore(*CI_safecall->getPrevNode(), ---------------- RithikSharma wrote: > Whitney wrote: > > change all the existing ones to `&DI, nullptr))` to make sure you are testing `DI`. > Sure but even when we give preference to DI? > > ``` > if (DI) > return isDependenceSafe(I, *DI, InstsToCheck); > else if (MSSAU) > return isDependenceSafe(I, *MSSAU, InstsToCheck); > ``` Yes, because the code may change. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Thu Jul 9 07:41:29 2020 From: llvm-commits at lists.llvm.org (Chris Bieneman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:41:29 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: <72ef83addb669dd6ad123e0844d2bfe7@localhost.localdomain> beanz added a comment. Logically your patch here looks fine. I'd rather see it use a `std::lock_guard` as the original code did, with a nested scope added, but that is pretty nitpicky. Also the code doesn't conform to LLVM's coding standards for variable naming (https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly). 
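
For illustration only, a minimal sketch of the `std::lock_guard` with a nested scope mentioned above. It is not taken from the patch under review: `RegistryMutex`, `Registry` and `registerValue` are hypothetical names, chosen to follow the variable and function naming rules in the linked coding standards.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared state; these names do not come from D83372.
static std::mutex RegistryMutex;
static std::vector<int> Registry;

// Appends Value to the registry and returns the number of entries that were
// present right after the insertion.
static std::size_t registerValue(int Value) {
  std::size_t SizeAfterInsert = 0;
  {
    // The guard unlocks RegistryMutex at the closing brace of this nested
    // scope, on every exit path out of the scope.
    std::lock_guard<std::mutex> Lock(RegistryMutex);
    Registry.push_back(Value);
    SizeAfterInsert = Registry.size();
  }
  // Work that does not touch Registry can run here without holding the mutex.
  return SizeAfterInsert;
}
```

The nested braces bound the critical section, so no explicit unlock call is needed and the lock is not held longer than the shared state requires.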
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 From llvm-commits at lists.llvm.org Thu Jul 9 07:42:51 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:42:51 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <4cb7488bdf2e6526ed9dbccb71c03214@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/help-message.test:5 +# RUN: llvm-libtool-darwin -help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines +# RUN: llvm-libtool-darwin --help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines +# RUN: llvm-libtool-darwin --help-list | \ ---------------- It's quite possible the llvm-size test doesn't have testing for unrelated options. How is it tested in this version now? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Thu Jul 9 07:45:42 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:45:42 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: RithikSharma updated this revision to Diff 276737. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 Files: llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h llvm/lib/Transforms/Utils/CodeMoverUtils.cpp llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83311.276737.patch
Type: text/x-patch
Size: 17680 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 07:46:10 2020
From: llvm-commits at lists.llvm.org (Lewis Revill via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 14:46:10 +0000 (UTC)
Subject: [PATCH] D79874: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbs asm instructions
In-Reply-To: References: Message-ID:

lewis-revill added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoB.td:666
+let Predicates = [HasStdExtZbs, IsRV64] in
+def : Pat<(and (xor (riscv_sllw 1, GPR:$rs2), -1), GPR:$rs1),
+ (SBCLR GPR:$rs1, GPR:$rs2)>;
----------------
Why does this need to be `riscv_sllw` as opposed to `shl`? Isn't the former intended for matching patterns resulting from a 32 bit operation?

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79874/new/
https://reviews.llvm.org/D79874

From llvm-commits at lists.llvm.org Thu Jul 9 07:48:41 2020
From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 14:48:41 +0000 (UTC)
Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE
In-Reply-To: References: Message-ID: <56b134b0b668b23e60e58155ec0de565@localhost.localdomain>

djtodoro added a comment.

In D83462#2141719, @probinson wrote:
> Between this code and DwarfDebug, I'm having a hard time understanding the various controls over emitting entry-values.
I see.. I think we can/should simplify it.
> - There's an existing tuning check in DwarfDebug, now you're adding the complementary check in TargetOptions. Does this make the DwarfDebug check redundant?
It is redundant now.. Thanks!
> - There are 3 different flags to sort out whether to emit this stuff. SupportsDebugEntryValues is strictly based on the target, I get that.
Then there's EnableDebugEntryValues in the TargetOptions, which seems to be tied to a tool-command-line option; and separately there's EmitDwarfDebugEntryValues in DwarfDebug, with its own command-line option. Can we simplify this at all? I think yes. We needed the first two (SupportsDebugEntryValues and EnableDebugEntryValues from TargetOptions) in order to enable it by default for some targets and to leave an experimental option for developers who want to extend the support for the other architectures. There is a situation when different debuggers come into the game, so we thought to control it on the very end (within DwarfDebug with the EmitDwarfDebugEntryValues), but at that place we can control only caller-side-dwarf-symbols (dw_tag_call_sites) but not the dw_entry_values (callee side; since it is being produced during LiveDebugValues phase). So, having this patch, we can get rid of the DwarfDebug::EmitDwarfDebugEntryValues option, since we can experiment everything with the fields from TargetOptions. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83462/new/ https://reviews.llvm.org/D83462 From llvm-commits at lists.llvm.org Thu Jul 9 07:51:04 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via llvm-commits) Date: Thu, 09 Jul 2020 07:51:04 -0700 (PDT) Subject: [lld] beb52b1 - [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation Message-ID: <5f072ed8.1c69fb81.8958e.7cab@mx.google.com> Author: Stefan Pintilie Date: 2020-07-09T09:50:19-05:00 New Revision: beb52b12cb175d0df9bf168837d153f857f69eda URL: https://github.com/llvm/llvm-project/commit/beb52b12cb175d0df9bf168837d153f857f69eda DIFF: https://github.com/llvm/llvm-project/commit/beb52b12cb175d0df9bf168837d153f857f69eda.diff LOG: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation The R_PPC64_REL24 is used in function calls when the caller requires a valid TOC pointer. 
If the callee shares the same TOC or does not clobber the TOC pointer then a direct call can be made. If the callee does not share the TOC a thunk must be added to save the TOC pointer for the caller. Up until PC Relative was introduced all local calls on medium and large code models were assumed to share a TOC. This is no longer the case because if the caller requires a TOC and the callee is PC Relative then the callee can clobber the TOC even if it is in the same DSO. This patch is to add support for a TOC caller calling a PC Relative callee that clobbers the TOC. Reviewed By: sfertile, MaskRay Differential Revision: https://reviews.llvm.org/D82950 Added: lld/test/ELF/ppc64-error-toc-local-call.s lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s lld/test/ELF/ppc64-toc-call-to-pcrel.s Modified: lld/ELF/Arch/PPC64.cpp lld/ELF/Thunks.cpp Removed: ################################################################################ diff --git a/lld/ELF/Arch/PPC64.cpp b/lld/ELF/Arch/PPC64.cpp index 0764dabe45f8..cf58b322bb3a 100644 --- a/lld/ELF/Arch/PPC64.cpp +++ b/lld/ELF/Arch/PPC64.cpp @@ -1039,6 +1039,11 @@ bool PPC64::needsThunk(RelExpr expr, RelType type, const InputFile *file, if (s.isInPlt()) return true; + // This check looks at the st_other bits of the callee. If the value is 1 + // then the callee clobbers the TOC and we need an R2 save stub. + if ((s.stOther >> 5) == 1) + return true; + // If a symbol is a weak undefined and we are compiling an executable // it doesn't need a range-extending thunk since it can't be called. 
if (s.isUndefWeak() && !config->shared) diff --git a/lld/ELF/Thunks.cpp b/lld/ELF/Thunks.cpp index 744ceaf725cf..ea74d343ebb2 100644 --- a/lld/ELF/Thunks.cpp +++ b/lld/ELF/Thunks.cpp @@ -279,6 +279,20 @@ class PPC64PltCallStub final : public Thunk { void addSymbols(ThunkSection &isec) override; }; +// PPC64 R2 Save Stub +// When the caller requires a valid R2 TOC pointer but the callee does not +// require a TOC pointer and the callee cannot guarantee that it doesn't +// clobber R2 then we need to save R2. This stub: +// 1) Saves the TOC pointer to the stack. +// 2) Tail calls the callee. +class PPC64R2SaveStub final : public Thunk { +public: + PPC64R2SaveStub(Symbol &dest) : Thunk(dest, 0) {} + uint32_t size() override { return 8; } + void writeTo(uint8_t *buf) override; + void addSymbols(ThunkSection &isec) override; +}; + // A bl instruction uses a signed 24 bit offset, with an implicit 4 byte // alignment. This gives a possible 26 bits of 'reach'. If the call offset is // larger then that we need to emit a long-branch thunk. The target address @@ -822,6 +836,21 @@ void PPC64PltCallStub::addSymbols(ThunkSection &isec) { s->file = destination.file; } +void PPC64R2SaveStub::writeTo(uint8_t *buf) { + int64_t offset = destination.getVA() - (getThunkTargetSym()->getVA() + 4); + // The branch offset needs to fit in 26 bits. 
+ if (!isInt<26>(offset)) + fatal("R2 save stub branch offset is too large: " + Twine(offset)); + write32(buf + 0, 0xf8410018); // std r2,24(r1) + write32(buf + 4, 0x48000000 | (offset & 0x03fffffc)); // b +} + +void PPC64R2SaveStub::addSymbols(ThunkSection &isec) { + Defined *s = addSymbol(saver.save("__toc_save_" + destination.getName()), + STT_FUNC, 0, isec); + s->needsTocRestore = true; +} + void PPC64LongBranchThunk::writeTo(uint8_t *buf) { int64_t offset = in.ppc64LongBranchTarget->getEntryVA(&destination, addend) - getPPC64TocBase(); @@ -950,6 +979,11 @@ static Thunk *addThunkPPC64(RelType type, Symbol &s, int64_t a) { if (s.isInPlt()) return make(s); + // This check looks at the st_other bits of the callee. If the value is 1 + // then the callee clobbers the TOC and we need an R2 save stub. + if ((s.stOther >> 5) == 1) + return make(s); + if (config->picThunk) return make(s, a); diff --git a/lld/test/ELF/ppc64-error-toc-local-call.s b/lld/test/ELF/ppc64-error-toc-local-call.s new file mode 100644 index 000000000000..f23eba101209 --- /dev/null +++ b/lld/test/ELF/ppc64-error-toc-local-call.s @@ -0,0 +1,32 @@ +# RUN: llvm-mc -filetype=obj -triple=powerpc64le %s -o %t.o +# RUN: not ld.lld %t.o -o /dev/null 2>&1 | FileCheck %s + +# RUN: llvm-mc -filetype=obj -triple=powerpc64 %s -o %t.o +# RUN: not ld.lld %t.o -o /dev/null 2>&1 | FileCheck %s + +## This test checks that the linker produces errors when it is missing the nop +## after a local call to a callee with st_other=1. 
+ +# CHECK: (.text+0xC): call to save_callee lacks nop, can't restore toc +# CHECK: (.text+0x1C): call to save_callee lacks nop, can't restore toc + +callee: + .localentry callee, 1 + blr # 0x0 + +caller: +.Lfunc_gep1: + addis 2, 12, .TOC.-.Lfunc_gep1 at ha + addi 2, 2, .TOC.-.Lfunc_gep1 at l +.Lfunc_lep1: + .localentry caller, .Lfunc_lep1-.Lfunc_gep1 + bl callee # 0xC + blr + +caller_tail: +.Lfunc_gep2: + addis 2, 12, .TOC.-.Lfunc_gep2 at ha + addi 2, 2, .TOC.-.Lfunc_gep2 at l +.Lfunc_lep2: + .localentry caller_tail, .Lfunc_lep2-.Lfunc_gep2 + b callee # 0x1C diff --git a/lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s b/lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s new file mode 100644 index 000000000000..89f62c7de7ce --- /dev/null +++ b/lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s @@ -0,0 +1,33 @@ +# REQUIRES: ppc +# RUN: echo 'SECTIONS { \ +# RUN: .text_callee 0x10010000 : { *(.text_callee) } \ +# RUN: .text_caller 0x20020000 : { *(.text_caller) } \ +# RUN: }' > %t.script + +# RUN: llvm-mc -filetype=obj -triple=powerpc64le %s -o %t.o +# RUN: not ld.lld -T %t.script %t.o -o %t 2>&1 >/dev/null | FileCheck %s + +# RUN: llvm-mc -filetype=obj -triple=powerpc64 %s -o %t.o +# RUN: not ld.lld -T %t.script %t.o -o %t 2>&1 >/dev/null | FileCheck %s + +# CHECK: error: R2 save stub branch offset is too large: -268501028 + +.section .text_callee, "ax", %progbits +callee: + .localentry callee, 1 + blr + +.section .text_caller, "ax", %progbits +caller: +.Lfunc_gep1: + addis 2, 12, .TOC.-.Lfunc_gep1 at ha + addi 2, 2, .TOC.-.Lfunc_gep1 at l +.Lfunc_lep1: + .localentry caller, .Lfunc_lep1-.Lfunc_gep1 + addis 30, 2, global at toc@ha + lwz 3, global at toc@l(30) + bl callee + nop + blr +global: + .long 0 diff --git a/lld/test/ELF/ppc64-toc-call-to-pcrel.s b/lld/test/ELF/ppc64-toc-call-to-pcrel.s new file mode 100644 index 000000000000..1807895a1914 --- /dev/null +++ b/lld/test/ELF/ppc64-toc-call-to-pcrel.s @@ -0,0 +1,74 @@ +# REQUIRES: ppc +# RUN: echo 'SECTIONS { \ +# 
RUN: .text_callee 0x10010000 : { *(.text_callee) } \ +# RUN: .text_caller 0x10020000 : { *(.text_caller) } \ +# RUN: }' > %t.script + +# RUN: llvm-mc -filetype=obj -triple=powerpc64le %s -o %t.o +# RUN: ld.lld -T %t.script %t.o -o %t +# RUN: llvm-readelf -s %t | FileCheck %s --check-prefix=SYMBOL +# RUN: llvm-objdump -d --no-show-raw-insn --mcpu=future %t | FileCheck %s + +# RUN: llvm-mc -filetype=obj -triple=powerpc64 %s -o %t.o +# RUN: ld.lld -T %t.script %t.o -o %t +# RUN: llvm-readelf -s %t | FileCheck %s --check-prefix=SYMBOL +# RUN: llvm-objdump -d --no-show-raw-insn --mcpu=future %t | FileCheck %s + +# The point of this test is to make sure that when a function with TOC access +# a local function with st_other=1, a TOC save stub is inserted. + +# SYMBOL: Symbol table '.symtab' contains 7 entries +# SYMBOL: 10010000 0 NOTYPE LOCAL DEFAULT [] 1 callee +# SYMBOL: 10020000 0 NOTYPE LOCAL DEFAULT [] 2 caller +# SYMBOL: 10020020 0 NOTYPE LOCAL DEFAULT [] 2 caller_14 +# SYMBOL: 1002003c 8 FUNC LOCAL DEFAULT 2 __toc_save_callee + +# CHECK-LABEL: callee +# CHECK: blr + +# CHECK-LABEL: caller +# CHECK: bl 0x1002003c +# CHECK-NEXT: ld 2, 24(1) +# CHECK-NEXT: blr + +# CHECK-LABEL: caller_14 +# CHECK: bfl 0, 0x1002003c +# CHECK-NEXT: ld 2, 24(1) +# CHECK-NEXT: blr + +# CHECK-LABEL: __toc_save_callee +# CHECK-NEXT: std 2, 24(1) +# CHECK-NEXT: b 0x10010000 + + +.section .text_callee, "ax", %progbits +callee: + .localentry callee, 1 + blr + +.section .text_caller, "ax", %progbits +caller: +.Lfunc_gep1: + addis 2, 12, .TOC.-.Lfunc_gep1 at ha + addi 2, 2, .TOC.-.Lfunc_gep1 at l +.Lfunc_lep1: + .localentry caller, .Lfunc_lep1-.Lfunc_gep1 + addis 30, 2, global at toc@ha + lwz 3, global at toc@l(30) + bl callee + nop + blr +global: + .long 0 + +caller_14: +.Lfunc_gep2: + addis 2, 12, .TOC.-.Lfunc_gep1 at ha + addi 2, 2, .TOC.-.Lfunc_gep1 at l +.Lfunc_lep2: + .localentry caller_14, .Lfunc_lep2-.Lfunc_gep2 + addis 30, 2, global at toc@ha + lwz 3, global at toc@l(30) + bcl 4, 0, 
callee + nop + blr From llvm-commits at lists.llvm.org Thu Jul 9 07:51:16 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:51:16 +0000 (UTC) Subject: [PATCH] D82950: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation In-Reply-To: References: Message-ID: <7ac9ec1c2e7d32f71c53f3c5fe2f43cf@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbeb52b12cb17: [PowerPC] Support PCRelative Callees for R_PPC64_REL24 Relocation (authored by stefanp, committed by kamaub). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82950/new/ https://reviews.llvm.org/D82950 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Thunks.cpp lld/test/ELF/ppc64-error-toc-local-call.s lld/test/ELF/ppc64-toc-call-to-pcrel-long-jump.s lld/test/ELF/ppc64-toc-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82950.276738.patch Type: text/x-patch Size: 7586 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 07:52:34 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:52:34 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: <6e18458d29d76707c143be8d645e7d5c@localhost.localdomain> RithikSharma marked 3 inline comments as done. RithikSharma added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:228 + DepResult->isAnti())) + return true; + return false; ---------------- fhahn wrote: > The if here doesn't add much I think. It would be simpler to just `return DepResult && DepResult->isOutput() || DepResult->isFlow() || > DepResult->isAnti()`? Thanks, yeah I should directly return it. I missed it in this diff as well. I'll update in the next diff. 
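
For illustration only, a minimal sketch of the single-return form discussed above. One detail worth noting: `&&` binds tighter than `||`, so the one-line expression as quoted would still evaluate `DepResult->isFlow()` when `DepResult` is null; the sketch adds parentheses around the three kind checks. `FakeDependence` and `hasOrderingDependence` are hypothetical stand-ins and are not the real `llvm::Dependence` API or the code in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a dependence query result; only the three
// predicates used in the review comment are modelled.
struct FakeDependence {
  bool Output;
  bool Flow;
  bool Anti;
  bool isOutput() const { return Output; }
  bool isFlow() const { return Flow; }
  bool isAnti() const { return Anti; }
};

// Returns true when a dependence exists and is an output, flow or anti
// dependence. The parentheses keep the null check in front of all three
// member calls.
static bool hasOrderingDependence(const FakeDependence *DepResult) {
  return DepResult && (DepResult->isOutput() || DepResult->isFlow() ||
                       DepResult->isAnti());
}
```

A null result (no dependence found) short-circuits to false before any member function is called.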
================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:233 + +bool isDependenceSafe(Instruction &I, MemorySSAUpdater &MSSAU, + SmallPtrSet InstsToCheck) { ---------------- fhahn wrote: > I don't think there is a reason to pass MemorySSAUpdater here, as you don't modify the IR. Just pass MemorySSA directly. > > Also, please add a comment what the logic behind the checks is (same for the DI version) Acknowledged and updated to MemorySSA instead of MemorySSAUpdater. Will add the required comments in the next diff. ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:368 // Skip tests when we don't have PDT or DI - if (!PDT || !DI) + if (!PDT || !(DI || MSSAU)) return false; ---------------- fhahn wrote: > Does it make sense to even call this function if either of those are not available, i.e. if all those required wouldn't it make sense to assert that they are all provided or turn them into references? I'm sorry, I didn't understand. We need at least DI or MSSA to find dependency. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Thu Jul 9 07:52:53 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:52:53 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: sameerarora101 marked an inline comment as done. sameerarora101 added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/help-message.test:5 +# RUN: llvm-libtool-darwin -help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines +# RUN: llvm-libtool-darwin --help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines +# RUN: llvm-libtool-darwin --help-list | \ ---------------- jhenderson wrote: > It's quite possible the llvm-size test doesn't have testing for unrelated options. 
How is it tested in this version now? yup, you are right, `llvm-size` doesn't test that unrelated options are not present. For my case, the unrelated option `--safepoint-ir-verifier-print-only:` comes under `General Options:` when I add the support for `-static`. I can add `--implicit-check-not=General` in the second diff D83002 for that? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Thu Jul 9 07:55:45 2020 From: llvm-commits at lists.llvm.org (Francesco Petrogalli via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:55:45 +0000 (UTC) Subject: [PATCH] D70174: [AArch64][SVE] Use FP for scavenging slot In-Reply-To: References: Message-ID: <7a1fcc6fd18290830f5b619e67a2820d@localhost.localdomain> fpetrogalli added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:2522 // Min/MaxCSFrameIndex, respectively. // Returns the size of the stack. +static int64_t ---------------- I think you should update the description of the method to reflect the new interface. ================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:2584-2585 int MinCSFrameIndex, MaxCSFrameIndex; - return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false); + auto Assign = [](int FI, int64_t Offset) {}; + auto Align = [](int FI, unsigned Align) {}; + return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, ---------------- I think you can remove these two and pass them directly inside the call as `[](int, int64_t){}` and `[](int, unsigned){}` You could actually make these two values the default argument of the method, so you don't have to bother specifying them here? 
(up to you, though) ================ Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:2592-2598 + auto Assign = [&MFI](int FI, int64_t Offset) { + LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n"); + MFI.setObjectOffset(FI, Offset); + }; + auto Align = [&MFI](int FI, unsigned Align) { + MFI.setObjectAlignment(FI, Align); + }; ---------------- Nit: these should also be passed directly as argument to the call, no need to set up variables that are not used anywhere else. I say this is a nit because the first lambda is 4 lines long, so it might not look beautiful once placed inside the function parameters... :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70174/new/ https://reviews.llvm.org/D70174 From llvm-commits at lists.llvm.org Thu Jul 9 07:56:14 2020 From: llvm-commits at lists.llvm.org (James Henderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:56:14 +0000 (UTC) Subject: [PATCH] D82923: introducing llvm-libtool-darwin In-Reply-To: References: Message-ID: <2c2601ab93216ac66b6c7ab89e4e0cfd@localhost.localdomain> jhenderson added inline comments. ================ Comment at: llvm/test/tools/llvm-libtool-darwin/help-message.test:5 +# RUN: llvm-libtool-darwin -help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines +# RUN: llvm-libtool-darwin --help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines +# RUN: llvm-libtool-darwin --help-list | \ ---------------- sameerarora101 wrote: > jhenderson wrote: > > It's quite possible the llvm-size test doesn't have testing for unrelated options. How is it tested in this version now? > yup, you are right, `llvm-size` doesn't test that unrelated options are not present. For my case, the unrelated option `--safepoint-ir-verifier-print-only:` comes under `General Options:` when I add the support for `-static`. I can add `--implicit-check-not=General` in the second diff D83002 for that? 
I think we've done something similar elsewhere. Take a look around, and try seeing if there's a precedent.

I might be inclined for a more verbose --implicit-check-not for "General Options:" for safety (since "General" might appear in somebody's path).

Repository:
rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82923/new/
https://reviews.llvm.org/D82923

From llvm-commits at lists.llvm.org Thu Jul 9 07:57:42 2020
From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 14:57:42 +0000 (UTC)
Subject: [PATCH] D82923: introducing llvm-libtool-darwin
In-Reply-To: References: Message-ID: <4eeeb5280be2835e00b3fa17d8860bb2@localhost.localdomain>

sameerarora101 marked 2 inline comments as done.
sameerarora101 added inline comments.

================
Comment at: llvm/test/tools/llvm-libtool-darwin/help-message.test:5
+# RUN: llvm-libtool-darwin -help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines
+# RUN: llvm-libtool-darwin --help | FileCheck --check-prefixes=LIBTOOL-USAGE,CATEG %s --match-full-lines
+# RUN: llvm-libtool-darwin --help-list | \
----------------
jhenderson wrote:
> sameerarora101 wrote:
> > jhenderson wrote:
> > > It's quite possible the llvm-size test doesn't have testing for unrelated options. How is it tested in this version now?
> > yup, you are right, `llvm-size` doesn't test that unrelated options are not present. For my case, the unrelated option `--safepoint-ir-verifier-print-only:` comes under `General Options:` when I add the support for `-static`. I can add `--implicit-check-not=General` in the second diff D83002 for that?
> I think we've done something similar elsewhere. Take a look around, and try seeing if there's a precedent.
>
> I might be inclined for a more verbose --implicit-check-not for "General Options:" for safety (since "General" might appear in somebody's path).
ok got it, thanks!
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82923/new/ https://reviews.llvm.org/D82923 From llvm-commits at lists.llvm.org Thu Jul 9 07:58:10 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 14:58:10 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <7c916577f8c80c29c4478e82a543c33f@localhost.localdomain> wmi added a comment. In D79978#2140849 , @MaskRay wrote: > I haven't looked into the details, but the test suggests that the patch is wrong: > > # basic-block-sections-cfi-1.ll > .section .text,"ax", at progbits,unique,2 > _Z2f3b.2: # %if.end > .cfi_startproc > .cfi_def_cfa %rbp, 16 # this should be inserted after addq $16, %rsp > .cfi_offset %rbp, -16 # this should be after .cfi_def_cfa %rbp, 16 > addq $16, %rsp > popq %rbp > .cfi_def_cfa %rsp, 8 > retq > I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Thu Jul 9 08:03:19 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:03:19 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: fhahn added a comment. Thanks for improving the LangRef and verifier. I am not entirely sure about referring to the matrixes as linearized (see inline comment), otherwise looks great. 
================ Comment at: llvm/docs/LangRef.rst:15500 or the memory layout) can be expressed using the matrix intrinsics. Matrixes are -embedded in a flat vector and the intrinsics take the dimensions as arguments. +linearized in a vector and the intrinsics take the dimensions as arguments. Currently column-major layout is assumed. The intrinsics support both integer ---------------- When I read linearized here, I think of https://en.wikipedia.org/wiki/Linearization, so there might be potential for confusion. It might be worth defining exactly what we mean by embedding here, then further uses should be unambiguous: the columns of a matrix R x C are embedded into a vector such that the elements of subsequent columns are adjacent in the vector. Or, more formally, element `I` of column `J` is at index `J * R + I` in the vector (with indices starting at 0). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Thu Jul 9 08:08:30 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:08:30 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: dnsampaio updated this revision to Diff 276741. dnsampaio added a comment.
Re-fixed test file, now showing only differences Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 Files: llvm/lib/Transforms/Scalar/BDCE.cpp llvm/test/Transforms/BDCE/sext_multi_uses.ll Index: llvm/test/Transforms/BDCE/sext_multi_uses.ll =================================================================== --- llvm/test/Transforms/BDCE/sext_multi_uses.ll +++ llvm/test/Transforms/BDCE/sext_multi_uses.ll @@ -1,11 +1,11 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt -o - -bdce -S %s | FileCheck %s +; RUN: opt -S -bdce < %s | FileCheck %s define i32 @ZEXT_0(i16 %a) { ; CHECK-LABEL: @ZEXT_0( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 -; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 65280 -; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[AND:%.*]] = and i32 [[TMP0]], 65280 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[TMP0]], 8 ; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 ; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] ; CHECK-NEXT: ret i32 [[OR]] @@ -22,10 +22,10 @@ define i32 @ZEXT_1(i16 %a) { ; CHECK-LABEL: @ZEXT_1( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 -; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[TMP0]], 8 ; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 -; CHECK-NEXT: [[AND:%.*]] = or i32 [[EXT]], -65536 +; CHECK-NEXT: [[AND:%.*]] = or i32 [[TMP0]], -65536 ; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] ; CHECK-NEXT: ret i32 [[OR]] ; @@ -99,8 +99,8 @@ define i16 @clear_assumptions(i8 %x, i16 %y) { ; CHECK-LABEL: @clear_assumptions( -; CHECK-NEXT: [[EXT:%.*]] = sext i8 [[X:%.*]] to i16 -; CHECK-NEXT: [[ADD:%.*]] = add nsw i16 [[EXT]], [[Y:%.*]] +; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[X:%.*]] to i16 +; 
CHECK-NEXT: [[ADD:%.*]] = add i16 [[TMP1]], [[Y:%.*]] ; CHECK-NEXT: [[AND:%.*]] = and i16 [[ADD]], 255 ; CHECK-NEXT: ret i16 [[AND]] ; Index: llvm/lib/Transforms/Scalar/BDCE.cpp =================================================================== --- llvm/lib/Transforms/Scalar/BDCE.cpp +++ llvm/lib/Transforms/Scalar/BDCE.cpp @@ -9,7 +9,8 @@ // This file implements the Bit-Tracking Dead Code Elimination pass. Some // instructions (shifts, some ands, ors, etc.) kill some of their input bits. // We track these dead bits and remove instructions that compute only these -// dead bits. +// dead bits. We also simplify sext that generates unused extension bits, +// converting it to a zext. // //===----------------------------------------------------------------------===// @@ -19,6 +20,7 @@ #include "llvm/ADT/Statistic.h" #include "llvm/Analysis/DemandedBits.h" #include "llvm/Analysis/GlobalsModRef.h" +#include "llvm/IR/IRBuilder.h" #include "llvm/IR/InstIterator.h" #include "llvm/IR/Instructions.h" #include "llvm/InitializePasses.h" @@ -33,6 +35,8 @@ STATISTIC(NumRemoved, "Number of instructions removed (unused)"); STATISTIC(NumSimplified, "Number of instructions trivialized (dead bits)"); +STATISTIC(NumSExt2ZExt, + "Number of sign extension instructions converted to zero extension"); /// If an instruction is trivialized (dead), then the chain of users of that /// instruction may need to be cleared of assumptions that can no longer be @@ -109,6 +113,23 @@ continue; } + // Convert SExt into ZExt if none of the extension bits is required + if (SExtInst *SE = dyn_cast(&I)) { + APInt Demanded = DB.getDemandedBits(SE); + const uint32_t SrcBitSize = SE->getSrcTy()->getScalarSizeInBits(); + auto *const DstTy = SE->getDestTy(); + const uint32_t DestBitSize = DstTy->getScalarSizeInBits(); + if (Demanded.countLeadingZeros() >= (DestBitSize - SrcBitSize)) { + clearAssumptionsOfUsers(SE, DB); + IRBuilder<> Builder(SE); + I.replaceAllUsesWith(Builder.CreateZExt(SE->getOperand(0), 
DstTy)); + Worklist.push_back(SE); + Changed = true; + NumSExt2ZExt++; + continue; + } + } + for (Use &U : I.operands()) { // DemandedBits only detects dead integer uses. if (!U->getType()->isIntOrIntVectorTy()) -------------- next part -------------- A non-text attachment was scrubbed... Name: D60413.276741.patch Type: text/x-patch Size: 4189 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 08:11:25 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:11:25 +0000 (UTC) Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE In-Reply-To: References: Message-ID: <454e2dc67093a36903d053db6c415075@localhost.localdomain> djtodoro updated this revision to Diff 276742. djtodoro added a comment. - Remove unnecessary DwarfDebug entry-values option CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83462/new/ https://reviews.llvm.org/D83462 Files: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/CodeGen/TargetOptionsImpl.cpp llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir Index: llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir =================================================================== --- llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir +++ llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir @@ -21,7 +21,7 @@ # RUN: | llvm-dwarfdump - | FileCheck %s -check-prefixes=CHECK-DWARF5 -implicit-check-not=DW_AT_call # # RUN: llc -emit-call-site-info -dwarf-version 5 -filetype=obj -debugger-tune=sce \ -# RUN: -emit-debug-entry-values -debug-entry-values -mtriple=x86_64-unknown-unknown \ +# RUN: -debug-entry-values -mtriple=x86_64-unknown-unknown \ # RUN: -start-after=machineverifier -o - %s | llvm-dwarfdump - | FileCheck %s -check-prefixes=CHECK-DWARF5 # # This is based on the following reproducer: Index: llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir 
=================================================================== --- llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir +++ llvm/test/DebugInfo/MIR/X86/DW_OP_entry_value.mir @@ -1,16 +1,22 @@ # RUN: llc -start-before=livedebugvalues -mtriple=x86_64-apple-darwin -o %t %s -filetype=obj # RUN: llvm-dwarfdump %t | FileCheck %s -# -# int global; -# int foo(int p, int q, int r) { -# global = p + 1; -# asm __volatile("" : : : "edi", "esi", "edx"); -# return 123; -# } + +# RUN: llc -start-before=livedebugvalues -debugger-tune=sce -mtriple=x86_64-sce-ps4 -o %t1 %s -filetype=obj +# RUN: llvm-dwarfdump %t1 | FileCheck %s -check-prefix=SCE + +## Based on: +## int global; +## int foo(int p, int q, int r) { +## global = p + 1; +## asm __volatile("" : : : "edi", "esi", "edx"); +## return 123; +## } # CHECK: DW_TAG_formal_parameter # CHECK: DW_OP_entry_value +# SCE-NOT: DW_OP_GNU_entry_value + --- | ; ModuleID = 'multiple-param-dbg-value-entry.ll' source_filename = "multiple-param-dbg-value-entry.c" Index: llvm/lib/CodeGen/TargetOptionsImpl.cpp =================================================================== --- llvm/lib/CodeGen/TargetOptionsImpl.cpp +++ llvm/lib/CodeGen/TargetOptionsImpl.cpp @@ -49,5 +49,6 @@ /// NOTE: There are targets that still do not support the debug entry values /// production. 
bool TargetOptions::ShouldEmitDebugEntryValues() const { - return SupportsDebugEntryValues || EnableDebugEntryValues; + return (SupportsDebugEntryValues && DebuggerTuning != DebuggerKind::SCE) || + EnableDebugEntryValues; } Index: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp =================================================================== --- llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp +++ llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp @@ -95,10 +95,6 @@ "use-dwarf-ranges-base-address-specifier", cl::Hidden, cl::desc("Use base address specifiers in debug_ranges"), cl::init(false)); -static cl::opt EmitDwarfDebugEntryValues( - "emit-debug-entry-values", cl::Hidden, - cl::desc("Emit the debug entry values"), cl::init(false)); - static cl::opt GenerateARangeSection("generate-arange-section", cl::Hidden, cl::desc("Generate dwarf aranges"), @@ -425,9 +421,7 @@ // Emit call-site-param debug info for GDB and LLDB, if the target supports // the debug entry values feature. It can also be enabled explicitly. - EmitDebugEntryValues = (Asm->TM.Options.ShouldEmitDebugEntryValues() && - (tuneForGDB() || tuneForLLDB())) || - EmitDwarfDebugEntryValues; + EmitDebugEntryValues = Asm->TM.Options.ShouldEmitDebugEntryValues(); Asm->OutStreamer->getContext().setDwarfVersion(DwarfVersion); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83462.276742.patch Type: text/x-patch Size: 3681 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 08:12:07 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:12:07 +0000 (UTC) Subject: [PATCH] D83463: [DWARF][EntryValues] Emit GNU extensions in the case of DWARF 4 + SCE In-Reply-To: References: Message-ID: <5066cb94eb86ed4fa6d6f3fe22c13366@localhost.localdomain> djtodoro updated this revision to Diff 276743. djtodoro added a comment. 
- Rebase on top of D83462 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83463/new/ https://reviews.llvm.org/D83463 Files: llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D83463.276743.patch Type: text/x-patch Size: 4193 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 08:13:33 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:13:33 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: paulwalker-arm added a comment. In D83395#2139892 , @efriedma wrote: > Probably worth adding a testcase for truncating from `<4 x i64>` to `<4 x i8>`. I'm happy to add this but just wanted to query what it gives. `<4 x i8>` is not a legal type so the test just exercises the same truncate path as `<4 x i64>` to `<4 x i16>`, or is this what you want protected (i.e. ensure the bytes remain where they're expected to be). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 From llvm-commits at lists.llvm.org Thu Jul 9 08:15:30 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:15:30 +0000 (UTC) Subject: [PATCH] D83375: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) In-Reply-To: References: Message-ID: <3661257473ce509263aab77064aef164@localhost.localdomain> jfb added inline comments. 
================ Comment at: llvm/include/llvm/Bitcode/LLVMBitCodes.h:539 FUNC_CODE_INST_FENCE = 36, // FENCE: [ordering, synchscope] - FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty,ptr,cmp,new, align, vol, - // ordering, synchscope] + FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty, ptr, cmp, new, vol, + // success_ordering, ssid, ---------------- gchatelet wrote: > The documentation here was wrong. > alignment was never stored for `FUNC_CODE_INST_CMPXCHG_OLD` and `failure_ordering` and `weak` were optional. It used to only have "ordering", and didn't separate success / failure (so it wasn't optional as much as not there). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83375/new/ https://reviews.llvm.org/D83375 From llvm-commits at lists.llvm.org Thu Jul 9 08:18:48 2020 From: llvm-commits at lists.llvm.org (Hamilton Tobon-Mosquera via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:18:48 +0000 (UTC) Subject: [PATCH] D83316: [OpenMPOpt][WIP] Structure for unittests In-Reply-To: References: Message-ID: hamax97 added a comment. In D83316#2140024 , @jdoerfert wrote: > Should we merge the test header into the test cpp? > > We need an actual unit test that runs :) Sure sure, I had an external problem and couldn't finish uploading the patches. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83316/new/ https://reviews.llvm.org/D83316 From llvm-commits at lists.llvm.org Thu Jul 9 08:23:44 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:23:44 +0000 (UTC) Subject: [PATCH] D83465: Encode alignment attribute for `atomicrmw` In-Reply-To: References: Message-ID: <1c97382e23e7cc1e04f9cf9e33c74452@localhost.localdomain> jfb added inline comments. 
================ Comment at: llvm/lib/Bitcode/Reader/BitcodeReader.cpp:5083 + if (Record.size() == 7) + Alignment = Align(1ULL << Record[6]); + else ---------------- I think you want this instead: ``` MaybeAlign Align; if (Error Err = parseAlignmentValue(Record[6], Align)) return Err; ``` ? ================ Comment at: llvm/test/Bitcode/compatibility.ll:756 + %atomicrmw.umin = atomicrmw volatile umin i32* %word, i32 22 syncscope("singlethread") monotonic, align 4 + ; CHECK: %atomicrmw.umin = atomicrmw volatile umin i32* %word, i32 22 syncscope("singlethread") monotonic, align 4 fence acquire ---------------- I think you still need tests *without* alignment? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83465/new/ https://reviews.llvm.org/D83465 From llvm-commits at lists.llvm.org Thu Jul 9 08:25:29 2020 From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:25:29 +0000 (UTC) Subject: [PATCH] D82363: [DebugInfo] Add new instruction and expression operator for variadic debug values In-Reply-To: References: Message-ID: <8c01c60f79651ced100d38be79265659@localhost.localdomain> StephenTozer marked an inline comment as done. StephenTozer added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/MachineInstr.h:503 + } + ArrayRef getDebugOperandsForReg(Register Reg) { + assert(isDebugValue() && "Tried to get debug operands for non-debug_value"); ---------------- djtodoro wrote: > Can we use templates to avoid duplicated code here for `getDebugOperandsForReg()`? 
We can, as long as we use a static function to hold the common code (if there's a way to do so without a static function then I'd be happy to go with that instead); the solution looks something like this: ``` template static ArrayRef getDebugOperandsForReg(Instruction *MI, Register Reg) { assert(MI->isDebugValue() && "Tried to get debug operands for non-debug_value"); SmallVector Ops; for (Operand &Op : MI->debug_operands()) { if (Op.isReg() && Op.getReg() == Reg) Ops.push_back(&Op); } return Ops; } ArrayRef getDebugOperandsForReg(Register Reg) const { return MachineInstr::getDebugOperandsForReg(this, Reg); } ArrayRef getDebugOperandsForReg(Register Reg) { return MachineInstr::getDebugOperandsForReg(this, Reg); } ``` Does this look good? It removes the duplication, it's just a bit more verbose and leaves an otherwise useless static function hanging around, unless it's moved to a private block (which is also fine but reduces readability by moving it far away from the public functions). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82363/new/ https://reviews.llvm.org/D82363 From llvm-commits at lists.llvm.org Thu Jul 9 08:37:57 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:37:57 +0000 (UTC) Subject: [PATCH] D79873: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbbp asm instructions In-Reply-To: References: Message-ID: PaoloS marked an inline comment as done. PaoloS added inline comments. ================ Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoB.td:870 +let Predicates = [HasStdExtZbbOrZbp, IsRV64] in { +def : Pat<(or (riscv_sllw (assertsexti32 GPR:$rs1), (assertsexti32 GPR:$rs2)), ---------------- lewis-revill wrote: > I'm not quite sure given the tests here how these patterns are used? Is `@llvm.fshl.i32` lowered to this pattern? Precisely. 
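As background for the exchange above, the following sketch models what `@llvm.fshl.i32` computes, following the LangRef description of the funnel-shift intrinsics: concatenate the two operands, shift left by the amount modulo the bit width, and keep the upper half. The function names are hypothetical. Since the quoted pattern is truncated, this does not attempt to reproduce the TableGen pattern itself, only the semantics such a pattern has to implement.

```cpp
#include <cstdint>

// Reference model of @llvm.fshl.i32(A, B, C).
uint32_t fshl32(uint32_t A, uint32_t B, uint32_t C) {
  uint32_t Shamt = C % 32; // the shift amount is taken modulo the bit width
  if (Shamt == 0)
    return A; // avoids an undefined shift of B by 32
  return (A << Shamt) | (B >> (32 - Shamt));
}

// A rotate left is a funnel shift whose two data operands are the same value.
uint32_t rotl32(uint32_t X, uint32_t C) { return fshl32(X, X, C); }
```

The `or` of a left shift and a right shift in the model mirrors the general shape of the quoted pattern, which starts with `or (riscv_sllw ...)`.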
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79873/new/ https://reviews.llvm.org/D79873 From llvm-commits at lists.llvm.org Thu Jul 9 08:38:20 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:38:20 +0000 (UTC) Subject: [PATCH] D79870: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions In-Reply-To: References: Message-ID: <3fb5cd8ae4c951a0a014c352292e8981@localhost.localdomain> PaoloS marked 2 inline comments as done. PaoloS added inline comments. ================ Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:173 + +bool RISCVDAGToDAGISel::SelectSLOI(SDValue N, SDValue &RS1, SDValue &Shamt) { + MVT XLenVT = Subtarget->getXLenVT(); ---------------- lewis-revill wrote: > Indentation within these Select functions is messed up, presumably due to a mix of tabs and spaces. Yes, I was trying to use spaces only in the end. Must have missed these. ================ Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:262 +bool RISCVDAGToDAGISel::SelectSLOIW(SDValue N, SDValue &RS1, SDValue &Shamt) { + if (N.getOpcode() == ISD::SIGN_EXTEND_INREG && + cast(N.getOperand(1))->getVT() == MVT::i32) { ---------------- lewis-revill wrote: > I'm not sure the convention other select functions for W instructions follow but perhaps an assert for IsRV64 should be added for completeness? Well, SLOIW exists only on RV64. I could add it, but I think it would be a bit redundant if I guard the selects only for RV64. But yes, for completeness I probably should. 
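For readers unfamiliar with the instruction, here is a small reference model of SLOI and SLOIW. It assumes the draft bit-manipulation semantics of "shift left, filling the vacated low bits with ones", and that the W form works on the low 32 bits and sign-extends the result; both are assumptions to check against the specification. The function names are hypothetical.

```cpp
#include <cstdint>

// Assumed SLOI semantics on a 64-bit register: shift left, shifting in ones.
uint64_t sloi64(uint64_t Rs1, unsigned Shamt) {
  return ~(~Rs1 << (Shamt & 63));
}

// Assumed SLOIW semantics: same operation on the low 32 bits, with the
// 32-bit result sign-extended to 64 bits.
int64_t sloiw(uint64_t Rs1, unsigned Shamt) {
  uint32_t Low = static_cast<uint32_t>(Rs1);
  uint32_t Res = ~(~Low << (Shamt & 31));
  return static_cast<int64_t>(static_cast<int32_t>(Res));
}
```

The sign extension in `sloiw` is only meaningful when registers are wider than 32 bits, which is why the W form is tied to RV64.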
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79870/new/ https://reviews.llvm.org/D79870 From llvm-commits at lists.llvm.org Thu Jul 9 08:41:29 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:41:29 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: DiggerLin marked 25 inline comments as done. DiggerLin added inline comments. ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:408 + uint64_t Size); + XCOFFTracebackTable(const uint8_t *Ptr, uint64_t Size, Error &Err); + ---------------- jhenderson wrote: > Make this private so that all users must use the `create` function. thanks ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:396 + const uint8_t *TBPtr; + const uint64_t Size; + Optional ParaType; ---------------- jasonliu wrote: > DiggerLin wrote: > > jasonliu wrote: > > > Do you actually need the Size as data member? > > we need to Size to know whether the traceback table is long enough for the all for the fields. > But right now you only need to use it in the constructor and it was passed in as a parameter. So why do you need to save it as a data member? Or did I miss any other usage of `Size` as data member? thanks ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:883 +XCOFFTracebackTable::XCOFFTracebackTable(const uint8_t *Ptr, + const uint64_t Size, Error &Err) + : TBPtr(Ptr), Size(Size) { ---------------- jhenderson wrote: > Delete the `const` for `Size`. thanks ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:888 + /* AddressSize */ 0); + uint64_t OffsetPtr = 0; + ---------------- jhenderson wrote: > This isn't a pointer, so remove the `Ptr`! 
thanks ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:890 + + DE.getU64(&OffsetPtr, &Err); + ---------------- jhenderson wrote: > Not sure if you've come across it yet, but you could simplify all these calls using `DataExtractor::Cursor` instead. This stores the error internally, whilst also allowing you to continue parsing to the end, with no harmful effects (because nothing is read if `Cursor` is in an error state), similar to `Err` being passed in everywhere. It also tracks the offset. Usage is: > > ``` > DataExtractor::Cursor C(/*Offset=*/0); > DE.getU64(C); // By the way, what is this value for? > ... > // No need for ifs here. > CodeLen = DE.getU32(C); > HandlerMask = DE.getU32(C); > ... > > Err = C.takeError(); > ``` I considered using Cursor before, but most of our data members are Optional, so I think it may be better to keep the current implementation. With Cursor, each getter still returns a value (for example zero) after an error, and CodeLen, ParaType, FunctionName etc. are of Optional type, so it would call Optional &operator=(T &&y) for all the fields (CodeLen, HandlerMask, etc.) even if there was an error earlier, and it would always call parseParaType(). I do not think that is efficient. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:896 + report_fatal_error("vector info, controlled storage info and extension " + "table of traceback table not yet implemented"); + ---------------- jasonliu wrote: > I would hope we could skip the sections we do not want to parse for now gracefully instead of just report_fatal_error and stop parsing all together. I think report_fatal_error may be reasonable here. If we skip the sections, how does the user know whether an object file has them or not? If we report an error, the user will know that llvm-objdump does not support the vector etc. sections and can ask the developers for the functionality. I think we will also create a new patch to support vector etc.
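To make the trade-off around the cursor suggestion concrete, here is a self-contained toy model of the sticky-error idea. It is only a sketch: `ToyCursor`, `ToyExtractor`, `ToyTable`, and `parseTwoFields` are hypothetical stand-ins, not the real `llvm::DataExtractor` API. Once the cursor is in an error state, later reads return zero without touching the data, so the caller checks for failure once at the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Tracks the read position and remembers whether any read has failed.
struct ToyCursor {
  std::size_t Offset = 0;
  bool Failed = false;
};

class ToyExtractor {
  std::vector<uint8_t> Data;

public:
  explicit ToyExtractor(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {}

  // Reads a big-endian 32-bit value and advances the cursor. If the cursor
  // has already failed, or fewer than 4 bytes remain, returns 0 and leaves
  // the cursor in the failed state.
  uint32_t getU32(ToyCursor &C) const {
    if (C.Failed)
      return 0;
    if (C.Offset + 4 > Data.size()) {
      C.Failed = true;
      return 0;
    }
    uint32_t Value = 0;
    for (int I = 0; I < 4; ++I)
      Value = (Value << 8) | Data[C.Offset + I];
    C.Offset += 4;
    return Value;
  }
};

struct ToyTable {
  uint32_t CodeLen = 0;
  uint32_t HandlerMask = 0;
};

// Parses two consecutive fields with no per-field error checks; returns
// false if any read failed.
bool parseTwoFields(const ToyExtractor &DE, ToyTable &Out) {
  ToyCursor C;
  Out.CodeLen = DE.getU32(C);
  Out.HandlerMask = DE.getU32(C);
  return !C.Failed;
}
```

In this model the second field is still assigned (with zero) after a failed read, which is the extra work the reply objects to when the fields are Optional.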
================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:850 + if (I != 0) + ParaType += ", "; + if ((Value & TracebackTable::FixedParaTypeBit) == 0) { ---------------- jasonliu wrote: > DiggerLin wrote: > > jasonliu wrote: > > > Consider doing ParaType += "i, " and ParaType += "f, " ... > > > and do a removal of ", " after parsing all parameters. > > since we will use SmallString, The cost of deleting last "," is more expense than currently implement. > But you could avoid so many "if (I != 0)" condition which is not that efficient. I do not think it is a large number of loop. if it is a large number of loop, I will consider your suggestion. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:904 + if (!Err && isFuncNamePresent()) { + uint16_t Len = DE.getU16(&offset_ptr, &Err); + if (!Err) ---------------- jasonliu wrote: > jasonliu wrote: > > DiggerLin wrote: > > > jasonliu wrote: > > > > Why do we need to declare a new variable? > > > yes , we need it . it been use here > > > FunctionName = DE.getBytes(&offset_ptr, Len, &Err); > > > > > > since after we get a value the point offset_ptr moved, we can not get it second time. > > What's wrong with > > ``` > > FunctionNameLen = DE.getU16(&OffsetPtr, &Err); > > if (!Err) > > FunctionName = DE.getBytes(&OffsetPtr, Len, &Err); > > ``` > > ? > I meant > ``` > FunctionNameLen = DE.getU16(&OffsetPtr, &Err); > if (!Err) > FunctionName = DE.getBytes(&OffsetPtr, FunctionNameLen, &Err); > ``` what I think is "the FunctionNameLen is Optional type , when calling FunctionName = DE.getBytes(&OffsetPtr, FunctionNameLen, &Err); the FunctionNameLen need to convert to uint16_t . 
And I think adding a local variable Len is less cost than calling a function FunctionNameLen.getValue() " ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:904 + if (!Err && isFuncNamePresent()) { + uint16_t Len = DE.getU16(&offset_ptr, &Err); + if (!Err) ---------------- DiggerLin wrote: > jasonliu wrote: > > jasonliu wrote: > > > DiggerLin wrote: > > > > jasonliu wrote: > > > > > Why do we need to declare a new variable? > > > > yes , we need it . it been use here > > > > FunctionName = DE.getBytes(&offset_ptr, Len, &Err); > > > > > > > > since after we get a value the point offset_ptr moved, we can not get it second time. > > > What's wrong with > > > ``` > > > FunctionNameLen = DE.getU16(&OffsetPtr, &Err); > > > if (!Err) > > > FunctionName = DE.getBytes(&OffsetPtr, Len, &Err); > > > ``` > > > ? > > I meant > > ``` > > FunctionNameLen = DE.getU16(&OffsetPtr, &Err); > > if (!Err) > > FunctionName = DE.getBytes(&OffsetPtr, FunctionNameLen, &Err); > > ``` > what I think is "the FunctionNameLen is Optional type , when calling > FunctionName = DE.getBytes(&OffsetPtr, FunctionNameLen, &Err); > the FunctionNameLen need to convert to uint16_t . > And I think adding a local variable Len is less cost than calling a function FunctionNameLen.getValue() " > discuss offline , change as suggestion. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Thu Jul 9 08:43:19 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:43:19 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <9198837a4fad6e3199e06b6e52b68385@localhost.localdomain> sameerarora101 marked 2 inline comments as done. sameerarora101 added inline comments. 
================ Comment at: llvm/test/tools/llvm-libtool-darwin/create-static-lib.test:51 +## The warning is not yet implemented for llvm-libtool-darwin. +# RUN: llvm-libtool-darwin -static -o %t.lib %t-input1.o %t-input2.o %t-input1.o +# RUN: llvm-ar t %t.lib | \ ---------------- jhenderson wrote: > It might be worth adding a `2>&1` and checking that the output is empty here, to flag up if a warning starts getting emitted. That way, it points to where to add testing for the warning. ok, added ``` # RUN: llvm-libtool-darwin -static -o %t.lib %t-input1.o %t-input2.o %t-input1.o 2>&1 | \ # RUN: FileCheck %s --allow-empty --implicit-check-not={{.}} ``` I hope `--allow-empty` is the right way to go about empty input files (found it here http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140804/229905.html). The flag is not there in the documentation. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Thu Jul 9 08:44:40 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:44:40 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <25f8c60f0d38ad8a9c941698ce5d2024@localhost.localdomain> DiggerLin updated this revision to Diff 276749. DiggerLin marked 8 inline comments as done. DiggerLin added a comment. address comment Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 Files: llvm/include/llvm/BinaryFormat/XCOFF.h llvm/include/llvm/Object/XCOFFObjectFile.h llvm/lib/Object/XCOFFObjectFile.cpp llvm/unittests/Object/CMakeLists.txt llvm/unittests/Object/XCOFFObjectFileTest.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D81585.276749.patch Type: text/x-patch Size: 16107 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 08:45:32 2020 From: llvm-commits at lists.llvm.org (Daniel Stone via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:45:32 +0000 (UTC) Subject: [PATCH] D77589: libclc: Add Mesa/SPIR-V target In-Reply-To: References: Message-ID: <23a8aa1e6a4444d6dfd7df7e50a50c3d@localhost.localdomain> daniels added a comment. Third time's a charm - the CI now passes. Can someone please push this when you're ready? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77589/new/ https://reviews.llvm.org/D77589 From llvm-commits at lists.llvm.org Thu Jul 9 08:46:22 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:46:22 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: tmsriram added a comment. In D79978#2141768 , @wmi wrote: > In D79978#2140849 , @MaskRay wrote: > > > I haven't looked into the details, but the test suggests that the patch is wrong: > > > > # basic-block-sections-cfi-1.ll > > .section .text,"ax", at progbits,unique,2 > > _Z2f3b.2: # %if.end > > .cfi_startproc > > .cfi_def_cfa %rbp, 16 # this should be inserted after addq $16, %rsp > > .cfi_offset %rbp, -16 # this should be after .cfi_def_cfa %rbp, 16 > > addq $16, %rsp > > popq %rbp > > .cfi_def_cfa %rsp, 8 > > retq > > > > > I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around. Right. 
If you also compile the code without fbasicblock-sections, then you will see that there is no cfi directive for addq $16, %rsp at the point you are referring to. I use that myself to understand when bugs happen. I will double-check. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Thu Jul 9 08:49:15 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:49:15 +0000 (UTC) Subject: [PATCH] D83479: [COFF] Error on unexpected .pdata size In-Reply-To: References: Message-ID: thakis accepted this revision. thakis added a comment. This revision is now accepted and ready to land. Nice! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83479/new/ https://reviews.llvm.org/D83479 From llvm-commits at lists.llvm.org Thu Jul 9 08:52:13 2020 From: llvm-commits at lists.llvm.org (Nicolai Hähnle via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:52:13 +0000 (UTC) Subject: [PATCH] D83421: [RFC] MemorySSAUpdater: Simplify applyUpdates In-Reply-To: References: Message-ID: <8b934059c6f47fe9e9b76f475655dc75@localhost.localdomain> nhaehnle abandoned this revision. nhaehnle added a comment. Okay, thank you for that context, that makes sense. I'm going to drop this change then, obviously. FYI, I'm considering at least trying to base dominator tree construction on the CfgInterfaceImpl from D83088, so that GenericDomTreeConstruction is no longer one giant ball of templates. Do you think this would help you with your changes? Obviously there's the question of how it impacts compile time, but we won't really know for sure until we try it, which is one of my motivations here.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83421/new/ https://reviews.llvm.org/D83421 From llvm-commits at lists.llvm.org Thu Jul 9 08:54:04 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:54:04 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <9f447bf0dfb95562fd9184a16c071b7e@localhost.localdomain> sameerarora101 updated this revision to Diff 276751. sameerarora101 marked an inline comment as done. sameerarora101 added a comment. Updating test files Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 Files: llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/create-static-lib.test llvm/test/tools/llvm-libtool-darwin/help-message.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/test/tools/llvm-libtool-darwin/missing-library-type.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83002.276751.patch Type: text/x-patch Size: 12535 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 08:54:15 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:54:15 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: <5afc5b642eac3eaae7374ab1721eef60@localhost.localdomain> paulwalker-arm updated this revision to Diff 276752. paulwalker-arm added a comment. Made custom lowering for all truncates explicit. 
Added a test for trunc_v4i64_v4i8 and tightened up the register-based tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83395.276752.patch Type: text/x-patch Size: 20207 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 08:54:17 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:54:17 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <82a15517cac743bdb27f33a471a158ca@localhost.localdomain> hubert.reinterpretcast added inline comments. ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:896 + report_fatal_error("vector info, controlled storage info and extension " + "table of traceback table not yet implemented"); + ---------------- DiggerLin wrote: > jasonliu wrote: > > I would hope we could gracefully skip the sections we do not want to parse for now instead of just calling report_fatal_error and stopping parsing altogether. > I think report_fatal_error may be reasonable here. If we skip the section, how does the user know whether an object file has these sections or not? If we report an error, the user will know that llvm-objdump does not support the vector etc. sections and needs to ask the developers for the functionality. I also think we will create a new patch to support vector etc. It should be possible to create stubs for handling these that print something that makes it clear that the decoding has not been implemented.
Using `report_fatal_error` means that a user with an object file containing these is blocked even if they only need to inspect the traceback table for other purposes. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 From llvm-commits at lists.llvm.org Thu Jul 9 08:55:25 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:55:25 +0000 (UTC) Subject: [PATCH] D83479: [COFF] Error on unexpected .pdata size In-Reply-To: References: Message-ID: MaskRay added inline comments. ================ Comment at: lld/COFF/Writer.cpp:1868 + if ((end - begin) % sizeof(Entry) != 0) { + fatal("unexpected .pdata size: " + std::to_string(end - begin) + + " is not a multiple of " + std::to_string(sizeof(Entry))); ---------------- Nit: `Twine(end-begin)` ... to avoid a temporary std::string. ================ Comment at: lld/test/COFF/pdata-arm64-bad.yaml:1 +# RUN: yaml2obj < %s > %t.obj +# ---------------- `yaml2obj %s -o %t.obj` Drop the `#` on the empty line. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83479/new/ https://reviews.llvm.org/D83479 From llvm-commits at lists.llvm.org Thu Jul 9 08:56:37 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:56:37 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <7548120161f9f53645486c5546b22424@localhost.localdomain> sameerarora101 marked an inline comment as done. sameerarora101 added inline comments.
================ Comment at: llvm/test/tools/llvm-libtool-darwin/help-message.test:10 # RUN: llvm-libtool-darwin --help-list | \ -# RUN: FileCheck -check-prefixes=LIBTOOL-USAGE,LIST %s --match-full-lines +# RUN: FileCheck -check-prefixes=LIBTOOL-USAGE,LIST %s --match-full-lines --implicit-check-not="--safepoint-ir-verifier-print-only" ---------------- Added `--implicit-check-not="--safepoint-ir-verifier-print-only"`, as we know the headers won't be present because of the `LIST-NOT` checks below. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Thu Jul 9 08:56:45 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:56:45 +0000 (UTC) Subject: [PATCH] D83479: [COFF] Error on unexpected .pdata size In-Reply-To: References: Message-ID: <4f7eb4cb0e39425e7300e8a8c0eca2ae@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/test/COFF/pdata-arm64-bad.yaml:5 + +# This file is like pdata-arm64.yaml, except that .pdata has been extended with +# 4 bytes. This can happen due to for example bad assembler input. Check that ---------------- The yaml can be further simplified. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83479/new/ https://reviews.llvm.org/D83479 From llvm-commits at lists.llvm.org Thu Jul 9 08:58:33 2020 From: llvm-commits at lists.llvm.org (Daniel Stone via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 15:58:33 +0000 (UTC) Subject: [PATCH] D82078: libclc: Make all built-ins overloadable In-Reply-To: References: Message-ID: <3bf43c18ab167c8fed8b4b530e424fbe@localhost.localdomain> daniels added a comment. @jvesely @tstellar Hi, any thoughts on this change please? We need this in order to have a fully working SPIR-V backend.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82078/new/ https://reviews.llvm.org/D82078 From llvm-commits at lists.llvm.org Thu Jul 9 09:00:52 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via llvm-commits) Date: Thu, 09 Jul 2020 09:00:52 -0700 (PDT) Subject: [llvm] 9ecda9a - Revert 51b0da73 "Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def."" Message-ID: <5f073f34.1c69fb81.660e8.8302@mx.google.com> Author: Hans Wennborg Date: 2020-07-09T17:55:58+02:00 New Revision: 9ecda9aa804dcb49e30e46161b29976494b0a1f9 URL: https://github.com/llvm/llvm-project/commit/9ecda9aa804dcb49e30e46161b29976494b0a1f9 DIFF: https://github.com/llvm/llvm-project/commit/9ecda9aa804dcb49e30e46161b29976494b0a1f9.diff LOG: Revert 51b0da73 "Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def."" It gets miscompiled with GCC 5.3, causing Clang to crash with "error: unknown target CPU 'x86-64'" See the llvm-commits thread for reproduction steps. This reverts commit 51b0da731af75c68dd521e04cc576d5a611b1612. Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index 4b96c66b0e29..91feb146baaa 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -184,6 +184,10 @@ X86_FEATURE (CLWB, "clwb") X86_FEATURE (CLZERO, "clzero") X86_FEATURE (CMPXCHG16B, "cx16") X86_FEATURE (CMPXCHG8B, "cx8") +// FIXME: Merge with 64BIT? Currently separate to be used to tell if CPU is +// valid for 64-bit mode, but has empty string so it doesn't get added to +// target attributes in IR. 
+X86_FEATURE (EM64T, "") X86_FEATURE (ENQCMD, "enqcmd") X86_FEATURE (F16C, "f16c") X86_FEATURE (FSGSBASE, "fsgsbase") diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index adfb599f55ff..9f73f1ab1424 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -868,7 +868,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; } - if (testFeature(X86::FEATURE_64BIT)) { + if (testFeature(X86::FEATURE_EM64T)) { *Type = X86::INTEL_CORE2; // "core2" *Subtype = X86::INTEL_CORE2_65; break; @@ -894,7 +894,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; case 15: { - if (testFeature(X86::FEATURE_64BIT)) { + if (testFeature(X86::FEATURE_EM64T)) { *Type = X86::INTEL_NOCONA; break; } @@ -1140,7 +1140,7 @@ static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, setFeature(X86::FEATURE_FMA4); if (HasExtLeaf1 && ((EDX >> 29) & 1)) - setFeature(X86::FEATURE_64BIT); + setFeature(X86::FEATURE_EM64T); } StringRef sys::getHostCPUName() { diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index 7e87d65a7c56..261e296b9e5a 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -48,14 +48,6 @@ class FeatureBitset { return (Bits[I / 32] & Mask) != 0; } - constexpr FeatureBitset &operator&=(const FeatureBitset &RHS) { - for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { - uint32_t NewBits = Bits[I] & RHS.Bits[I]; - Bits[I] = NewBits; - } - return *this; - } - constexpr FeatureBitset &operator|=(const FeatureBitset &RHS) { for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { uint32_t NewBits = Bits[I] | RHS.Bits[I]; @@ -65,14 +57,16 @@ class FeatureBitset { } constexpr FeatureBitset operator&(const FeatureBitset &RHS) const { - FeatureBitset Result = *this; - Result &= RHS; + FeatureBitset Result; + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) + Result.Bits[I] = Bits[I] & 
RHS.Bits[I]; return Result; } constexpr FeatureBitset operator|(const FeatureBitset &RHS) const { - FeatureBitset Result = *this; - Result |= RHS; + FeatureBitset Result; + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) + Result.Bits[I] = Bits[I] | RHS.Bits[I]; return Result; } @@ -117,10 +111,10 @@ static constexpr FeatureBitset FeaturesPentium4 = static constexpr FeatureBitset FeaturesPrescott = FeaturesPentium4 | FeatureSSE3; static constexpr FeatureBitset FeaturesNocona = - FeaturesPrescott | Feature64BIT | FeatureCMPXCHG16B; + FeaturesPrescott | FeatureEM64T | FeatureCMPXCHG16B; // Basic 64-bit capable CPU. -static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | Feature64BIT; +static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | FeatureEM64T; // Intel Core CPUs static constexpr FeatureBitset FeaturesCore2 = @@ -207,7 +201,7 @@ static constexpr FeatureBitset FeaturesAthlon = static constexpr FeatureBitset FeaturesAthlonXP = FeaturesAthlon | FeatureFXSR | FeatureSSE; static constexpr FeatureBitset FeaturesK8 = - FeaturesAthlonXP | FeatureSSE2 | Feature64BIT; + FeaturesAthlonXP | FeatureSSE2 | FeatureEM64T; static constexpr FeatureBitset FeaturesK8SSE3 = FeaturesK8 | FeatureSSE3; static constexpr FeatureBitset FeaturesAMDFAM10 = FeaturesK8SSE3 | FeatureCMPXCHG16B | FeatureLZCNT | FeaturePOPCNT | @@ -215,7 +209,7 @@ static constexpr FeatureBitset FeaturesAMDFAM10 = // Bobcat architecture processors. static constexpr FeatureBitset FeaturesBTVER1 = - FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | + FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeaturePOPCNT | FeaturePRFCHW | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_A | FeatureSAHF; @@ -226,7 +220,7 @@ static constexpr FeatureBitset FeaturesBTVER2 = // AMD Bulldozer architecture processors. 
static constexpr FeatureBitset FeaturesBDVER1 = FeatureX87 | FeatureAES | FeatureAVX | FeatureCMPXCHG8B | - FeatureCMPXCHG16B | Feature64BIT | FeatureFMA4 | FeatureFXSR | FeatureLWP | + FeatureCMPXCHG16B | FeatureEM64T | FeatureFMA4 | FeatureFXSR | FeatureLWP | FeatureLZCNT | FeatureMMX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureSAHF | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4_A | FeatureXOP | FeatureXSAVE; @@ -242,7 +236,7 @@ static constexpr FeatureBitset FeaturesBDVER4 = static constexpr FeatureBitset FeaturesZNVER1 = FeatureX87 | FeatureADX | FeatureAES | FeatureAVX | FeatureAVX2 | FeatureBMI | FeatureBMI2 | FeatureCLFLUSHOPT | FeatureCLZERO | - FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureF16C | + FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureF16C | FeatureFMA | FeatureFSGSBASE | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeatureMOVBE | FeatureMWAITX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureRDRND | FeatureRDSEED | FeatureSAHF | FeatureSHA | @@ -369,7 +363,7 @@ static constexpr ProcInfo Processors[] = { X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { for (const auto &P : Processors) - if (P.Name == CPU && (P.Features[FEATURE_64BIT] || !Only64Bit)) + if (P.Name == CPU && (P.Features[FEATURE_EM64T] || !Only64Bit)) return P.Kind; return CK_None; @@ -378,7 +372,7 @@ X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { void llvm::X86::fillValidCPUArchList(SmallVectorImpl &Values, bool Only64Bit) { for (const auto &P : Processors) - if (!P.Name.empty() && (P.Features[FEATURE_64BIT] || !Only64Bit)) + if (!P.Name.empty() && (P.Features[FEATURE_EM64T] || !Only64Bit)) Values.emplace_back(P.Name); } @@ -407,6 +401,7 @@ static constexpr FeatureBitset ImpliedFeaturesCLZERO = {}; static constexpr FeatureBitset ImpliedFeaturesCMOV = {}; static constexpr FeatureBitset ImpliedFeaturesCMPXCHG16B = {}; static constexpr 
FeatureBitset ImpliedFeaturesCMPXCHG8B = {}; +static constexpr FeatureBitset ImpliedFeaturesEM64T = {}; static constexpr FeatureBitset ImpliedFeaturesENQCMD = {}; static constexpr FeatureBitset ImpliedFeaturesFSGSBASE = {}; static constexpr FeatureBitset ImpliedFeaturesFXSR = {}; @@ -532,14 +527,8 @@ void llvm::X86::getFeaturesForCPU(StringRef CPU, [&](const ProcInfo &P) { return P.Name == CPU; }); assert(I != std::end(Processors) && "Processor not found!"); - FeatureBitset Bits = I->Features; - - // Remove the 64-bit feature which we only use to validate if a CPU can - // be used with 64-bit mode. - Bits &= ~Feature64BIT; - // Add the string version of all set bits. - getFeatureBitsAsStrings(Bits, EnabledFeatures); + getFeatureBitsAsStrings(I->Features, EnabledFeatures); } // For each feature that is (transitively) implied by this feature, set it. From llvm-commits at lists.llvm.org Thu Jul 9 09:01:14 2020 From: llvm-commits at lists.llvm.org (Alexis Perry-Holby via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:01:14 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran Message-ID: AlexisPerry created this revision. AlexisPerry added reviewers: sscalpone, richard.barton.arm, DavidTruby. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. Changed default F18_FC from pgf90 to gfortran. Removed unnecessary references to pgf90 in favor of more generic naming. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83488 Files: flang/tools/f18/f18.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83488.276755.patch Type: text/x-patch Size: 4464 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 09:01:15 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:01:15 +0000 (UTC) Subject: [PATCH] D83264: [ELF] Add -z dead-reloc-in-nonalloc== In-Reply-To: References: Message-ID: MaskRay added a comment. In D83264#2140954, @grimar wrote: > In D83264#2135482, @grimar wrote: > > > Probably there are other names rather than "dead-reloc-in-nonalloc" which we might want to consider? > > > > `-z dead-noalloc-reloc-val` > > `-z tombstone-reloc` > > `-z resolve-dead-reloc` > > > This remained unanswered. Dead relocations in SHF_ALLOC sections are errors. `relocateNonAlloc` does not handle them, so `-z tombstone-reloc` and `-z resolve-dead-reloc` are excluded. For `-z dead-nonalloc-reloc-val`, the adjective `nonalloc` appears to describe `reloc`, which is not precise: `nonalloc` describes the section. I still consider `-z dead-reloc-in-nonalloc` the best. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83264/new/ https://reviews.llvm.org/D83264 From llvm-commits at lists.llvm.org Thu Jul 9 09:02:14 2020 From: llvm-commits at lists.llvm.org (Owen Anderson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:02:14 +0000 (UTC) Subject: [PATCH] D69668: Introduce a generic memset to bzero transformation in CodeGen In-Reply-To: References: Message-ID: resistor added inline comments. ================ Comment at: llvm/lib/Analysis/TargetLibraryInfo.cpp:55 -static bool hasBcmp(const Triple &TT) { - // Posix removed support from bcmp() in 2001, but the glibc and several - // implementations of the libc still have it.
+static bool hasPosix2001LibCFunctions(const Triple &TT) { + // Posix removed support from some libc functions in 2001 (bcmp, bzero), but ---------------- This name reads like it refers to functions that were added in 2001, not to functions that were removed after 2001. ================ Comment at: llvm/test/CodeGen/AArch64/arm64-memset-to-bzero.ll:11 ; CHECK-LINUX: {{b|bl}} memset -define void @fct1(i8* nocapture %ptr) minsize { +define void @fct1(i8* nocapture %ptr) minsize nounwind { entry: ---------------- Can you remove these unrelated changes? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69668/new/ https://reviews.llvm.org/D69668 From llvm-commits at lists.llvm.org Thu Jul 9 09:02:17 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:02:17 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: <9f3a757ffd28a30dbeab96391fc60d43@localhost.localdomain> fhahn added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:368 // Skip tests when we don't have PDT or DI - if (!PDT || !DI) + if (!PDT || !(DI || MSSAU)) return false; ---------------- RithikSharma wrote: > fhahn wrote: > > Does it make sense to even call this function if either of those is not available? I.e., if all of those are required, wouldn't it make sense to assert that they are all provided or turn them into references? > I'm sorry, I didn't understand. We need at least DI or MSSA to find dependency. I meant: does it make sense to call this function with `PDT == nullptr`, for example? It seems like it is required here, right?
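To make the two options in the comment above concrete, here is a minimal sketch of "bail out on a null pointer" versus "take the required analysis by reference and assert the rest". It is only an illustration under assumptions: `PostDomTree`, `DepInfo`, `MemSSA`, `isSafeToMovePtr` and `isSafeToMoveRef` are placeholder names with toy semantics, not the actual CodeMoverUtils signatures or LLVM analyses.

```cpp
#include <cassert>

// Placeholder analyses (hypothetical; they stand in for PDT, DI and MSSA).
struct PostDomTree {
  // Toy rule: a later position "post-dominates" an earlier one.
  bool postDominates(int A, int B) const { return A >= B; }
};
struct DepInfo {
  bool depends(int, int) const { return false; }
};
struct MemSSA {
  bool clobbers(int, int) const { return false; }
};

// Pointer form: a missing required analysis is silently reported as "unsafe",
// so a caller cannot tell API misuse apart from a genuine negative answer.
bool isSafeToMovePtr(int From, int To, const PostDomTree *PDT,
                     const DepInfo *DI, const MemSSA *MSSA) {
  if (!PDT || !(DI || MSSA))
    return false;
  if (!PDT->postDominates(To, From))
    return false;
  return DI ? !DI->depends(From, To) : !MSSA->clobbers(From, To);
}

// Reference form: PDT cannot be null by construction, and the
// "at least one of DI / MSSA" requirement is stated as an assertion.
bool isSafeToMoveRef(int From, int To, const PostDomTree &PDT,
                     const DepInfo *DI, const MemSSA *MSSA) {
  assert((DI || MSSA) && "need DI or MSSA to check dependences");
  if (!PDT.postDominates(To, From))
    return false;
  return DI ? !DI->depends(From, To) : !MSSA->clobbers(From, To);
}
```

With the reference form, passing no post-dominator tree is a compile error rather than a quiet `false`, which is the trade-off the question is pointing at.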
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Thu Jul 9 09:02:24 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via llvm-commits) Date: Thu, 9 Jul 2020 18:02:24 +0200 Subject: [llvm] 51b0da7 - Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def." In-Reply-To: <5f052995.1c69fb81.150c9.bb11@mx.google.com> References: <5f052995.1c69fb81.150c9.bb11@mx.google.com> Message-ID: Sadly it seems this gets miscompiled by GCC 5, making Clang fail to find the x86_64 target. Since that's the baseline supported compiler, I've reverted in 9ecda9aa804dcb49e30e46161b29976494b0a1f9 Here are repro instructions: $ mkdir /tmp/gcc5 $ curl https://commondatastorage.googleapis.com/chromium-browser-clang/tools/gcc530trusty.tgz | tar -C /tmp/gcc5 -zx $ cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_C_COMPILER=/tmp/gcc5/bin/gcc -DCMAKE_CXX_COMPILER=/tmp/gcc5/bin/g++ ../llvm $ ninja clang $ touch /tmp/a.c $ bin/clang -c /tmp/a.c error: unknown target CPU 'x86-64' PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. 
Program arguments: bin/clang -c /tmp/a.c bin/clang(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x1a)[0x18eb26a] bin/clang(_ZN4llvm3sys17RunSignalHandlersEv+0x3a)[0x18e932a] bin/clang(_ZN4llvm3sys15CleanupOnSignalEm+0x8a)[0x18e955a] bin/clang[0x186a820] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14110)[0x7f3721be6110] /lib/x86_64-linux-gnu/libc.so.6(+0x16182c)[0x7f37217e682c] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm+0x3c)[0x7f3721add92c] bin/clang(_ZN5clang10TargetInfo16CreateTargetInfoERNS_17DiagnosticsEngineERKSt10shared_ptrINS_13TargetOptionsEE+0xe9a)[0x3c4b37a] bin/clang(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x48)[0x1feef68] bin/clang(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x6ca)[0x20d4a1a] bin/clang(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0xf9c)[0xa1e58c] bin/clang[0xa1b627] bin/clang[0x1ed70d5] bin/clang(_ZN4llvm20CrashRecoveryContext9RunSafelyENS_12function_refIFvvEEE+0xa0)[0x186a9c0] bin/clang[0x1ed7dcb] bin/clang(_ZNK5clang6driver11Compilation14ExecuteCommandERKNS0_7CommandERPS3_+0x88)[0x1eb2938] bin/clang(_ZNK5clang6driver11Compilation11ExecuteJobsERKNS0_7JobListERN4llvm15SmallVectorImplISt4pairIiPKNS0_7CommandEEEE+0x107)[0x1eb3037] bin/clang(_ZN5clang6driver6Driver18ExecuteCompilationERNS0_11CompilationERN4llvm15SmallVectorImplISt4pairIiPKNS0_7CommandEEEE+0xba)[0x1ebac5a] bin/clang(main+0x111d)[0x9a0ebd] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f37216abe0b] bin/clang(_start+0x2a)[0xa1b07a] clang-11: error: clang frontend command failed due to signal (use -v to see invocation) clang version 11.0.0 (https://github.com/llvm/llvm-project 51b0da731af75c68dd521e04cc576d5a611b1612) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /work/llvm.monorepo/build.foo/bin clang-11: error: unable to execute command: Segmentation fault clang-11: note: diagnostic msg: Error generating preprocessed source(s). 
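As an aside for readers of this thread: the reverted commit expressed `operator&` and `operator|` through the compound assignments inside `constexpr` functions, and masked the 64-bit bit out with `Bits &= ~Feature64BIT` before turning the set into strings. The sketch below shows that pattern in isolation. It is only an illustration under assumptions: `MiniBitset`, `with`, `stripValidationBit`, `kFeatureSSE2` and `kFeature64Bit` are hypothetical names, not LLVM's `FeatureBitset`, and nothing in this thread establishes which construct GCC 5.3 miscompiles.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a constexpr feature bitset (not LLVM's class).
class MiniBitset {
  static constexpr std::size_t NumWords = 4;
  std::uint32_t Bits[NumWords] = {};

public:
  constexpr MiniBitset() = default;

  // Return a copy of this set with bit I additionally set.
  constexpr MiniBitset with(unsigned I) const {
    MiniBitset R = *this;
    R.Bits[I / 32] |= std::uint32_t(1) << (I % 32);
    return R;
  }

  constexpr bool operator[](unsigned I) const {
    return ((Bits[I / 32] >> (I % 32)) & 1u) != 0;
  }

  // Compound assignment written as a word-by-word loop (needs C++14).
  constexpr MiniBitset &operator&=(const MiniBitset &RHS) {
    for (std::size_t I = 0; I != NumWords; ++I)
      Bits[I] &= RHS.Bits[I];
    return *this;
  }

  // Binary operator expressed through the compound assignment.
  constexpr MiniBitset operator&(const MiniBitset &RHS) const {
    MiniBitset Result = *this;
    Result &= RHS;
    return Result;
  }

  constexpr MiniBitset operator~() const {
    MiniBitset R{};
    for (std::size_t I = 0; I != NumWords; ++I)
      R.Bits[I] = ~Bits[I];
    return R;
  }
};

// Hypothetical feature indices, chosen so that two different words are used.
constexpr unsigned kFeatureSSE2 = 3;
constexpr unsigned kFeature64Bit = 40;

// Drop the validation-only bit before the set is reported to a consumer.
constexpr MiniBitset stripValidationBit(MiniBitset Features) {
  Features &= ~MiniBitset().with(kFeature64Bit);
  return Features;
}

constexpr MiniBitset CPU = MiniBitset().with(kFeatureSSE2).with(kFeature64Bit);
static_assert(CPU[kFeature64Bit], "validation bit is set on the CPU entry");
static_assert(!stripValidationBit(CPU)[kFeature64Bit], "bit is masked out");
static_assert(stripValidationBit(CPU)[kFeatureSSE2], "other bits survive");
```

The `static_assert`s check the behaviour at compile time, which is a cheap way to spot a compiler that evaluates such `constexpr` code differently from what the source says.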
On Wed, Jul 8, 2020 at 4:04 AM Craig Topper via llvm-commits wrote: > > > Author: Craig Topper > Date: 2020-07-07T19:01:58-07:00 > New Revision: 51b0da731af75c68dd521e04cc576d5a611b1612 > > URL: https://github.com/llvm/llvm-project/commit/51b0da731af75c68dd521e04cc576d5a611b1612 > DIFF: https://github.com/llvm/llvm-project/commit/51b0da731af75c68dd521e04cc576d5a611b1612.diff > > LOG: Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def." > > These represent the same thing but 64BIT only showed up from > getHostCPUFeatures providing a list of featuers to clang. While > EM64T showed up from getting the features for a named CPU. > > EM64T didn't have a string specifically so it would not be passed > up to clang when getting features for a named CPU. While 64bit > needed a name since that's how it is index. > > Merge them by filtering 64bit out before sending features to clang > for named CPUs. > > Added: > > > Modified: > llvm/include/llvm/Support/X86TargetParser.def > llvm/lib/Support/Host.cpp > llvm/lib/Support/X86TargetParser.cpp > > Removed: > > > > ################################################################################ > diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def > index 9910fd615b1d..ed41295166b3 100644 > --- a/llvm/include/llvm/Support/X86TargetParser.def > +++ b/llvm/include/llvm/Support/X86TargetParser.def > @@ -184,10 +184,6 @@ X86_FEATURE (CLWB, "clwb") > X86_FEATURE (CLZERO, "clzero") > X86_FEATURE (CMPXCHG16B, "cx16") > X86_FEATURE (CMPXCHG8B, "cx8") > -// FIXME: Merge with 64BIT? Currently separate to be used to tell if CPU is > -// valid for 64-bit mode, but has empty string so it doesn't get added to > -// target attributes in IR. 
> -X86_FEATURE (EM64T, "") > X86_FEATURE (ENQCMD, "enqcmd") > X86_FEATURE (F16C, "f16c") > X86_FEATURE (FSGSBASE, "fsgsbase") > > diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp > index 3a7d9a0242fa..db99612c97b5 100644 > --- a/llvm/lib/Support/Host.cpp > +++ b/llvm/lib/Support/Host.cpp > @@ -868,7 +868,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, > } > break; > } > - if (testFeature(X86::FEATURE_EM64T)) { > + if (testFeature(X86::FEATURE_64BIT)) { > *Type = X86::INTEL_CORE2; // "core2" > *Subtype = X86::INTEL_CORE2_65; > break; > @@ -894,7 +894,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, > } > break; > case 15: { > - if (testFeature(X86::FEATURE_EM64T)) { > + if (testFeature(X86::FEATURE_64BIT)) { > *Type = X86::INTEL_NOCONA; > break; > } > @@ -1140,7 +1140,7 @@ static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, > setFeature(X86::FEATURE_FMA4); > > if (HasExtLeaf1 && ((EDX >> 29) & 1)) > - setFeature(X86::FEATURE_EM64T); > + setFeature(X86::FEATURE_64BIT); > } > > StringRef sys::getHostCPUName() { > > diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp > index df03f63e720e..cbb7f6186d0d 100644 > --- a/llvm/lib/Support/X86TargetParser.cpp > +++ b/llvm/lib/Support/X86TargetParser.cpp > @@ -48,6 +48,14 @@ class FeatureBitset { > return (Bits[I / 32] & Mask) != 0; > } > > + constexpr FeatureBitset &operator&=(const FeatureBitset &RHS) { > + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { > + uint32_t NewBits = Bits[I] & RHS.Bits[I]; > + Bits[I] = NewBits; > + } > + return *this; > + } > + > constexpr FeatureBitset &operator|=(const FeatureBitset &RHS) { > for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { > uint32_t NewBits = Bits[I] | RHS.Bits[I]; > @@ -57,16 +65,14 @@ class FeatureBitset { > } > > constexpr FeatureBitset operator&(const FeatureBitset &RHS) const { > - FeatureBitset Result; > - for (unsigned I 
= 0, E = array_lengthof(Bits); I != E; ++I) > - Result.Bits[I] = Bits[I] & RHS.Bits[I]; > + FeatureBitset Result = *this; > + Result &= RHS; > return Result; > } > > constexpr FeatureBitset operator|(const FeatureBitset &RHS) const { > - FeatureBitset Result; > - for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) > - Result.Bits[I] = Bits[I] | RHS.Bits[I]; > + FeatureBitset Result = *this; > + Result |= RHS; > return Result; > } > > @@ -111,10 +117,10 @@ static constexpr FeatureBitset FeaturesPentium4 = > static constexpr FeatureBitset FeaturesPrescott = > FeaturesPentium4 | FeatureSSE3; > static constexpr FeatureBitset FeaturesNocona = > - FeaturesPrescott | FeatureEM64T | FeatureCMPXCHG16B; > + FeaturesPrescott | Feature64BIT | FeatureCMPXCHG16B; > > // Basic 64-bit capable CPU. > -static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | FeatureEM64T; > +static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | Feature64BIT; > > // Intel Core CPUs > static constexpr FeatureBitset FeaturesCore2 = > @@ -201,7 +207,7 @@ static constexpr FeatureBitset FeaturesAthlon = > static constexpr FeatureBitset FeaturesAthlonXP = > FeaturesAthlon | FeatureFXSR | FeatureSSE; > static constexpr FeatureBitset FeaturesK8 = > - FeaturesAthlonXP | FeatureSSE2 | FeatureEM64T; > + FeaturesAthlonXP | FeatureSSE2 | Feature64BIT; > static constexpr FeatureBitset FeaturesK8SSE3 = FeaturesK8 | FeatureSSE3; > static constexpr FeatureBitset FeaturesAMDFAM10 = > FeaturesK8SSE3 | FeatureCMPXCHG16B | FeatureLZCNT | FeaturePOPCNT | > @@ -209,7 +215,7 @@ static constexpr FeatureBitset FeaturesAMDFAM10 = > > // Bobcat architecture processors. 
> static constexpr FeatureBitset FeaturesBTVER1 = > - FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | > + FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | > FeatureFXSR | FeatureLZCNT | FeatureMMX | FeaturePOPCNT | FeaturePRFCHW | > FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_A | > FeatureSAHF; > @@ -220,7 +226,7 @@ static constexpr FeatureBitset FeaturesBTVER2 = > // AMD Bulldozer architecture processors. > static constexpr FeatureBitset FeaturesBDVER1 = > FeatureX87 | FeatureAES | FeatureAVX | FeatureCMPXCHG8B | > - FeatureCMPXCHG16B | FeatureEM64T | FeatureFMA4 | FeatureFXSR | FeatureLWP | > + FeatureCMPXCHG16B | Feature64BIT | FeatureFMA4 | FeatureFXSR | FeatureLWP | > FeatureLZCNT | FeatureMMX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | > FeatureSAHF | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | > FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4_A | FeatureXOP | FeatureXSAVE; > @@ -236,7 +242,7 @@ static constexpr FeatureBitset FeaturesBDVER4 = > static constexpr FeatureBitset FeaturesZNVER1 = > FeatureX87 | FeatureADX | FeatureAES | FeatureAVX | FeatureAVX2 | > FeatureBMI | FeatureBMI2 | FeatureCLFLUSHOPT | FeatureCLZERO | > - FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureF16C | > + FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureF16C | > FeatureFMA | FeatureFSGSBASE | FeatureFXSR | FeatureLZCNT | FeatureMMX | > FeatureMOVBE | FeatureMWAITX | FeaturePCLMUL | FeaturePOPCNT | > FeaturePRFCHW | FeatureRDRND | FeatureRDSEED | FeatureSAHF | FeatureSHA | > @@ -363,7 +369,7 @@ static constexpr ProcInfo Processors[] = { > > X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { > for (const auto &P : Processors) > - if (P.Name == CPU && (P.Features[FEATURE_EM64T] || !Only64Bit)) > + if (P.Name == CPU && (P.Features[FEATURE_64BIT] || !Only64Bit)) > return P.Kind; > > return CK_None; > @@ -372,7 +378,7 @@ X86::CPUKind llvm::X86::parseArchX86(StringRef 
CPU, bool Only64Bit) { > void llvm::X86::fillValidCPUArchList(SmallVectorImpl &Values, > bool Only64Bit) { > for (const auto &P : Processors) > - if (!P.Name.empty() && (P.Features[FEATURE_EM64T] || !Only64Bit)) > + if (!P.Name.empty() && (P.Features[FEATURE_64BIT] || !Only64Bit)) > Values.emplace_back(P.Name); > } > > @@ -401,7 +407,6 @@ static constexpr FeatureBitset ImpliedFeaturesCLZERO = {}; > static constexpr FeatureBitset ImpliedFeaturesCMOV = {}; > static constexpr FeatureBitset ImpliedFeaturesCMPXCHG16B = {}; > static constexpr FeatureBitset ImpliedFeaturesCMPXCHG8B = {}; > -static constexpr FeatureBitset ImpliedFeaturesEM64T = {}; > static constexpr FeatureBitset ImpliedFeaturesENQCMD = {}; > static constexpr FeatureBitset ImpliedFeaturesFSGSBASE = {}; > static constexpr FeatureBitset ImpliedFeaturesFXSR = {}; > @@ -528,8 +533,14 @@ void llvm::X86::getFeaturesForCPU(StringRef CPU, > [&](const ProcInfo &P) { return P.Name == CPU; }); > assert(I != std::end(Processors) && "Processor not found!"); > > + FeatureBitset Bits = I->Features; > + > + // Remove the 64-bit feature which we only use to validate if a CPU can > + // be used with 64-bit mode. > + Bits &= ~Feature64BIT; > + > // Add the string version of all set bits. > - getFeatureBitsAsStrings(I->Features, EnabledFeatures); > + getFeatureBitsAsStrings(Bits, EnabledFeatures); > } > > // For each feature that is (transitively) implied by this feature, set it. 
> > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits From llvm-commits at lists.llvm.org Thu Jul 9 09:06:55 2020 From: llvm-commits at lists.llvm.org (Tom Stellard via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:06:55 +0000 (UTC) Subject: [PATCH] D82078: libclc: Make all built-ins overloadable In-Reply-To: References: Message-ID: <83bfe8c870b13c3cedaf7f0971a69ce8@localhost.localdomain> tstellar accepted this revision. tstellar added a comment. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82078/new/ https://reviews.llvm.org/D82078 From llvm-commits at lists.llvm.org Thu Jul 9 09:10:07 2020 From: llvm-commits at lists.llvm.org (Greg McGary via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:10:07 +0000 (UTC) Subject: [PATCH] D81413: lld: improve handling of `-platform_version` In-Reply-To: References: Message-ID: <5d8fbeb90ed0a539f22d7f70e958863e@localhost.localdomain> gkm added a comment. @compnerd, do you intend to complete & land this diff? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81413/new/ https://reviews.llvm.org/D81413 From llvm-commits at lists.llvm.org Thu Jul 9 09:12:14 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:12:14 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: MaskRay added a comment. 
In D79978#2141768, @wmi wrote:

> In D79978#2140849, @MaskRay wrote:
>
> > I haven't looked into the details, but the test suggests that the patch is wrong:
> >
> >   # basic-block-sections-cfi-1.ll
> >   .section .text,"ax",@progbits,unique,2
> >   _Z2f3b.2: # %if.end
> >   .cfi_startproc
> >   .cfi_def_cfa %rbp, 16 # this should be inserted after addq $16, %rsp
> >   .cfi_offset %rbp, -16 # this should be after .cfi_def_cfa %rbp, 16
> >   addq $16, %rsp
> >   popq %rbp
> >   .cfi_def_cfa %rsp, 8
> >   retq
> >
>
> I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.

Ack. Then what instructions should be placed at the top of these basic blocks? Should `.cfi_def_cfa_register %rbp` be placed as well? If you move these basic blocks around, `.cfi_def_cfa_register %rbp` is currently not tracked.

> For basic-block-sections-cfiinstr_1.ll, have you considered places like CodeGen/X86/cfi-inserter-*? You may even create a subdirectory there.

This question still stands. I haven't seen other CFI tests in `DebugInfo/X86/`


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978



From llvm-commits at lists.llvm.org Thu Jul 9 09:18:06 2020
From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits)
Date: Thu, 09 Jul 2020 09:18:06 -0700 (PDT)
Subject: [llvm] 9477d39 - [SCCP] Move tests using only ipsccp from IPConstantProp to SCCP (NFC).
Message-ID: <5f07433e.1c69fb81.1d11.874c@mx.google.com> Author: Florian Hahn Date: 2020-07-09T17:16:15+01:00 New Revision: 9477d39e61f8e0076e47fef81941e8a24e979d6f URL: https://github.com/llvm/llvm-project/commit/9477d39e61f8e0076e47fef81941e8a24e979d6f DIFF: https://github.com/llvm/llvm-project/commit/9477d39e61f8e0076e47fef81941e8a24e979d6f.diff LOG: [SCCP] Move tests using only ipsccp from IPConstantProp to SCCP (NFC). Some of the tests in the llvm/test/Transforms/IPConstantProp directory actually only use -ipsccp. Those tests belong to the other (IP)SCCP tests in llvm/test/Transforms/SCCP/ and this commits moves them there to avoid confusion with IPConstantProp. Added: llvm/test/Transforms/SCCP/2009-09-24-byval-ptr.ll llvm/test/Transforms/SCCP/PR16052.ll llvm/test/Transforms/SCCP/PR26044.ll llvm/test/Transforms/SCCP/dangling-block-address.ll llvm/test/Transforms/SCCP/fp-bc-icmp-const-fold.ll llvm/test/Transforms/SCCP/global.ll llvm/test/Transforms/SCCP/musttail-call.ll llvm/test/Transforms/SCCP/remove-call-inst.ll llvm/test/Transforms/SCCP/solve-after-each-resolving-undefs-for-function.ll llvm/test/Transforms/SCCP/user-with-multiple-uses.ll Modified: Removed: llvm/test/Transforms/IPConstantProp/2009-09-24-byval-ptr.ll llvm/test/Transforms/IPConstantProp/PR16052.ll llvm/test/Transforms/IPConstantProp/PR26044.ll llvm/test/Transforms/IPConstantProp/dangling-block-address.ll llvm/test/Transforms/IPConstantProp/fp-bc-icmp-const-fold.ll llvm/test/Transforms/IPConstantProp/global.ll llvm/test/Transforms/IPConstantProp/musttail-call.ll llvm/test/Transforms/IPConstantProp/remove-call-inst.ll llvm/test/Transforms/IPConstantProp/solve-after-each-resolving-undefs-for-function.ll llvm/test/Transforms/IPConstantProp/user-with-multiple-uses.ll ################################################################################ diff --git a/llvm/test/Transforms/IPConstantProp/2009-09-24-byval-ptr.ll b/llvm/test/Transforms/SCCP/2009-09-24-byval-ptr.ll similarity index 100% rename 
from llvm/test/Transforms/IPConstantProp/2009-09-24-byval-ptr.ll rename to llvm/test/Transforms/SCCP/2009-09-24-byval-ptr.ll diff --git a/llvm/test/Transforms/IPConstantProp/PR16052.ll b/llvm/test/Transforms/SCCP/PR16052.ll similarity index 100% rename from llvm/test/Transforms/IPConstantProp/PR16052.ll rename to llvm/test/Transforms/SCCP/PR16052.ll diff --git a/llvm/test/Transforms/IPConstantProp/PR26044.ll b/llvm/test/Transforms/SCCP/PR26044.ll similarity index 100% rename from llvm/test/Transforms/IPConstantProp/PR26044.ll rename to llvm/test/Transforms/SCCP/PR26044.ll diff --git a/llvm/test/Transforms/IPConstantProp/dangling-block-address.ll b/llvm/test/Transforms/SCCP/dangling-block-address.ll similarity index 100% rename from llvm/test/Transforms/IPConstantProp/dangling-block-address.ll rename to llvm/test/Transforms/SCCP/dangling-block-address.ll diff --git a/llvm/test/Transforms/IPConstantProp/fp-bc-icmp-const-fold.ll b/llvm/test/Transforms/SCCP/fp-bc-icmp-const-fold.ll similarity index 100% rename from llvm/test/Transforms/IPConstantProp/fp-bc-icmp-const-fold.ll rename to llvm/test/Transforms/SCCP/fp-bc-icmp-const-fold.ll diff --git a/llvm/test/Transforms/IPConstantProp/global.ll b/llvm/test/Transforms/SCCP/global.ll similarity index 100% rename from llvm/test/Transforms/IPConstantProp/global.ll rename to llvm/test/Transforms/SCCP/global.ll diff --git a/llvm/test/Transforms/IPConstantProp/musttail-call.ll b/llvm/test/Transforms/SCCP/musttail-call.ll similarity index 100% rename from llvm/test/Transforms/IPConstantProp/musttail-call.ll rename to llvm/test/Transforms/SCCP/musttail-call.ll diff --git a/llvm/test/Transforms/IPConstantProp/remove-call-inst.ll b/llvm/test/Transforms/SCCP/remove-call-inst.ll similarity index 100% rename from llvm/test/Transforms/IPConstantProp/remove-call-inst.ll rename to llvm/test/Transforms/SCCP/remove-call-inst.ll diff --git a/llvm/test/Transforms/IPConstantProp/solve-after-each-resolving-undefs-for-function.ll 
b/llvm/test/Transforms/SCCP/solve-after-each-resolving-undefs-for-function.ll
similarity index 100%
rename from llvm/test/Transforms/IPConstantProp/solve-after-each-resolving-undefs-for-function.ll
rename to llvm/test/Transforms/SCCP/solve-after-each-resolving-undefs-for-function.ll

diff --git a/llvm/test/Transforms/IPConstantProp/user-with-multiple-uses.ll b/llvm/test/Transforms/SCCP/user-with-multiple-uses.ll
similarity index 100%
rename from llvm/test/Transforms/IPConstantProp/user-with-multiple-uses.ll
rename to llvm/test/Transforms/SCCP/user-with-multiple-uses.ll



From llvm-commits at lists.llvm.org Thu Jul 9 09:20:02 2020
From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 16:20:02 +0000 (UTC)
Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks.
In-Reply-To:
References:
Message-ID: <2ad79f8a1fc07ac42542cd658da3c7fc@localhost.localdomain>

SjoerdMeijer marked an inline comment as done.
SjoerdMeijer added inline comments.


================
Comment at: llvm/docs/LangRef.rst:15500
 or the memory layout) can be expressed using the matrix intrinsics. Matrixes are
-embedded in a flat vector and the intrinsics take the dimensions as arguments.
+linearized in a vector and the intrinsics take the dimensions as arguments.
 Currently column-major layout is assumed. The intrinsics support both integer
----------------
fhahn wrote:
> When I read linearized here, I think about https://en.wikipedia.org/wiki/Linearization , so there might be potential for confusion.
>
> It might be worth defining exactly what we mean by embedding here, then further uses should be unambiguous: the columns of a matrix R x C are embedded into a vector such that the elements of subsequent columns are adjacent in the vector. Or more formally element `I` of column `J` is at index `J * R + I` in the vector (with indices starting at 0)

Yep, thanks.
I was looking at how to rephrase "embedded", but agree that "linearization" is perhaps equally vague, so yes this is the best we can do:

> Or more formally element I of column J is at index J * R + I in the vector (with indices starting at 0)

Will go for that one.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83477/new/

https://reviews.llvm.org/D83477



From llvm-commits at lists.llvm.org Thu Jul 9 09:23:05 2020
From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 16:23:05 +0000 (UTC)
Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks.
In-Reply-To:
References:
Message-ID: <116f19b75aa58035c14b222cf621eebf@localhost.localdomain>

fhahn added inline comments.


================
Comment at: llvm/docs/LangRef.rst:15500
 or the memory layout) can be expressed using the matrix intrinsics. Matrixes are
-embedded in a flat vector and the intrinsics take the dimensions as arguments.
+linearized in a vector and the intrinsics take the dimensions as arguments.
 Currently column-major layout is assumed. The intrinsics support both integer
----------------
SjoerdMeijer wrote:
> fhahn wrote:
> > When I read linearized here, I think about https://en.wikipedia.org/wiki/Linearization , so there might be potential for confusion.
> >
> > It might be worth defining exactly what we mean by embedding here, then further uses should be unambiguous: the columns of a matrix R x C are embedded into a vector such that the elements of subsequent columns are adjacent in the vector. Or more formally element `I` of column `J` is at index `J * R + I` in the vector (with indices starting at 0)
> Yep, thanks. I was looking at how to rephrase "embedded", but agree that "linearization" is perhaps equally vague, so yes this is the best we can do:
>
> > Or more formally element I of column J is at index J * R + I in the vector (with indices starting at 0)
>
> Will go for that one.

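A minimal sketch of the indexing rule quoted in this thread, not taken from the patch: assuming an R x C matrix stored column-major in a flat vector, element I of column J sits at index J * R + I. The names `columnMajorIndex`, `Rows`, `Row`, `Col`, and `exampleMatrix` are illustrative only and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): flat index of the element in row
// `Row` and column `Col` of a column-major matrix that has `Rows` rows.
// Indices start at 0, as in the quoted wording.
constexpr std::size_t columnMajorIndex(std::size_t Rows, std::size_t Row,
                                       std::size_t Col) {
  return Col * Rows + Row;
}

// Flat column-major storage for the 2 x 3 matrix
//   | 1 3 5 |
//   | 2 4 6 |
// Elements of one column are adjacent; columns follow each other.
inline std::vector<int> exampleMatrix() { return {1, 2, 3, 4, 5, 6}; }
```

For instance, `exampleMatrix()[columnMajorIndex(2, 0, 1)]` reads row 0 of column 1. A row-major layout would use `Row * Cols + Col` instead, which is why the text needs to state which layout is assumed.
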
It would also be good to say that layout defaults to column major currently. It can be changed globally during the lowering to row-major as well, but we probably do not want to mention actual pass specifics here. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Thu Jul 9 09:23:50 2020 From: llvm-commits at lists.llvm.org (Konstantin Zhuravlyov via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:23:50 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: <6919d7ad687d83c79d46cf6bdf3be804@localhost.localdomain> kzhuravl added a comment. Do we also want to remove it from v2 metadata? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 From llvm-commits at lists.llvm.org Thu Jul 9 09:25:02 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:25:02 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: <6f0a11b53607ed8bef6af4cd9e6930f3@localhost.localdomain> arsenm added a comment. In D82818#2141973 , @kzhuravl wrote: > Do we also want to remove it from v2 metadata? Probably, but I looked briefly and didn't actually see the direct equivalent CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 From llvm-commits at lists.llvm.org Thu Jul 9 09:25:17 2020 From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:25:17 +0000 (UTC) Subject: [PATCH] D82363: [DebugInfo] Add new instruction and expression operator for variadic debug values In-Reply-To: References: Message-ID: <790b616668bc06d53c355ac343e4b481@localhost.localdomain> StephenTozer updated this revision to Diff 276757. StephenTozer added a comment. Address latest review comments. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82363/new/ https://reviews.llvm.org/D82363 Files: llvm/include/llvm/BinaryFormat/Dwarf.h llvm/include/llvm/CodeGen/MachineInstr.h llvm/include/llvm/CodeGen/MachineInstrBuilder.h llvm/include/llvm/IR/DebugInfoMetadata.h llvm/include/llvm/Support/TargetOpcodes.def llvm/include/llvm/Target/Target.td llvm/lib/BinaryFormat/Dwarf.cpp llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/InlineSpiller.cpp llvm/lib/CodeGen/LiveRangeShrink.cpp llvm/lib/CodeGen/MIRParser/MIParser.cpp llvm/lib/CodeGen/MachineInstr.cpp llvm/lib/CodeGen/MachineRegisterInfo.cpp llvm/lib/CodeGen/PrologEpilogInserter.cpp llvm/lib/CodeGen/RegAllocFast.cpp llvm/lib/IR/DebugInfoMetadata.cpp llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp llvm/lib/Target/X86/X86OptimizeLEAs.cpp llvm/test/CodeGen/MIR/Generic/dbg-value-list-spill.mir llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D82363.276757.patch Type: text/x-patch Size: 49505 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 09:28:42 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:28:42 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: wmi added a comment. 
In D79978#2141959, @MaskRay wrote:

> In D79978#2141768, @wmi wrote:
>
> > In D79978#2140849, @MaskRay wrote:
> >
> > > I haven't looked into the details, but the test suggests that the patch is wrong:
> > >
> > >   # basic-block-sections-cfi-1.ll
> > >   .section .text,"ax",@progbits,unique,2
> > >   _Z2f3b.2: # %if.end
> > >   .cfi_startproc
> > >   .cfi_def_cfa %rbp, 16 # this should be inserted after addq $16, %rsp
> > >   .cfi_offset %rbp, -16 # this should be after .cfi_def_cfa %rbp, 16
> > >   addq $16, %rsp
> > >   popq %rbp
> > >   .cfi_def_cfa %rsp, 8
> > >   retq
> > >
> >
> >
> > I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.
> >
> Ack. Then what instructions should be placed at the top of these basic blocks? Should `.cfi_def_cfa_register %rbp` be placed as well? If you move these basic blocks around, `.cfi_def_cfa_register %rbp` is currently not tracked.

That is because .cfi_def_cfa %rbp, 16 is identical to the following:

  .cfi_def_cfa_register %rbp
  .cfi_def_cfa_offset 16


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978



From llvm-commits at lists.llvm.org Thu Jul 9 09:28:58 2020
From: llvm-commits at lists.llvm.org (Kamau Bridgeman via llvm-commits)
Date: Thu, 09 Jul 2020 09:28:58 -0700 (PDT)
Subject: [llvm] bd20680 - [PowerPC] Split s34imm into two types
Message-ID: <5f0745ca.1c69fb81.756d.82c6@mx.google.com>


Author: Stefan Pintilie
Date: 2020-07-09T11:28:32-05:00
New Revision: bd2068031121adf5a0e28d9306a1741d6f0bbd87

URL: https://github.com/llvm/llvm-project/commit/bd2068031121adf5a0e28d9306a1741d6f0bbd87
DIFF: https://github.com/llvm/llvm-project/commit/bd2068031121adf5a0e28d9306a1741d6f0bbd87.diff

LOG: [PowerPC] Split s34imm into two types

Currently the instruction paddi always takes s34imm as the type for the 34 bit immediate. However, the PC Relative form of the instruction should not produce the same fixup as the non PC Relative form. This patch splits the s34imm type into s34imm and s34imm_pcrel so that two different fixups can be emitted.

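The shape of this split can be sketched in isolation. This is a simplified, self-contained model and not the LLVM code: `FixupKind`, `Operand`, `Fixup`, and the `encodeImm34*` functions are hypothetical stand-ins. Only the idea follows the commit: one shared encoder parameterized by the fixup kind, plus two thin wrappers, one per operand type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the MC-layer types used by the real encoder.
enum class FixupKind { PCRel34, Imm34 };

struct Operand {
  bool IsImm;         // true: literal immediate; false: symbolic expression
  int64_t Imm;        // meaningful when IsImm is true
  std::string Symbol; // meaningful when IsImm is false
};

struct Fixup {
  std::string Symbol;
  FixupKind Kind;
};

// Shared encoder. A literal immediate is encoded directly; masking to 34 bits
// here is a simplification of this sketch. A symbolic operand records a fixup
// of the requested kind and encodes as 0 until the fixup is resolved.
inline uint64_t encodeImm34(const Operand &Op, std::vector<Fixup> &Fixups,
                            FixupKind Kind) {
  if (Op.IsImm)
    return static_cast<uint64_t>(Op.Imm) & 0x3ffffffffULL;
  Fixups.push_back({Op.Symbol, Kind});
  return 0;
}

// Wrapper for the non-PC-relative operand type.
inline uint64_t encodeImm34NoPCRel(const Operand &Op,
                                   std::vector<Fixup> &Fixups) {
  return encodeImm34(Op, Fixups, FixupKind::Imm34);
}

// Wrapper for the PC-relative operand type.
inline uint64_t encodeImm34PCRel(const Operand &Op,
                                 std::vector<Fixup> &Fixups) {
  return encodeImm34(Op, Fixups, FixupKind::PCRel34);
}
```

With this arrangement the operand type alone decides which fixup is recorded, so the two forms of the instruction can no longer end up with the same fixup by accident.
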
Reviewed By: kamaub, nemanjai Differential Revision: https://reviews.llvm.org/D83255 Added: llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s Modified: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td Removed: ################################################################################ diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp index dbaf221db9fc..59cb2b994a4b 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp @@ -46,6 +46,7 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t Value) { case PPC::fixup_ppc_half16ds: return Value & 0xfffc; case PPC::fixup_ppc_pcrel34: + case PPC::fixup_ppc_imm34: return Value & 0x3ffffffff; } } @@ -68,6 +69,7 @@ static unsigned getFixupKindNumBytes(unsigned Kind) { case PPC::fixup_ppc_br24_notoc: return 4; case PPC::fixup_ppc_pcrel34: + case PPC::fixup_ppc_imm34: case FK_Data_8: return 8; case PPC::fixup_ppc_nofixup: @@ -100,6 +102,7 @@ class PPCAsmBackend : public MCAsmBackend { { "fixup_ppc_half16", 0, 16, 0 }, { "fixup_ppc_half16ds", 0, 14, 0 }, { "fixup_ppc_pcrel34", 0, 34, MCFixupKindInfo::FKF_IsPCRel }, + { "fixup_ppc_imm34", 0, 34, 0 }, { "fixup_ppc_nofixup", 0, 0, 0 } }; const static MCFixupKindInfo InfosLE[PPC::NumTargetFixupKinds] = { @@ -112,6 +115,7 @@ class PPCAsmBackend : public MCAsmBackend { { "fixup_ppc_half16", 0, 16, 0 }, { "fixup_ppc_half16ds", 2, 14, 0 }, { "fixup_ppc_pcrel34", 0, 34, MCFixupKindInfo::FKF_IsPCRel }, + { "fixup_ppc_imm34", 0, 34, 0 }, { "fixup_ppc_nofixup", 0, 0, 0 } }; diff --git 
a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp index d8b3301e97f1..1af08ec5539d 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp @@ -409,6 +409,9 @@ unsigned PPCELFObjectWriter::getRelocType(MCContext &Ctx, const MCValue &Target, break; } break; + case PPC::fixup_ppc_imm34: + llvm_unreachable("Unsupported Modifier for fixup_ppc_imm34."); + break; case FK_Data_8: switch (Modifier) { default: llvm_unreachable("Unsupported Modifier"); diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h index 2fb8947fd4e0..73292f7b7938 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h @@ -43,6 +43,9 @@ enum Fixups { // A 34-bit fixup corresponding to PC-relative paddi. fixup_ppc_pcrel34, + // A 34-bit fixup corresponding to Non-PC-relative paddi. + fixup_ppc_imm34, + /// Not a true fixup, but ties a symbol to a call to __tls_get_addr for the /// TLS general and local dynamic models, or inserts the thread-pointer /// register number. 
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp index fb65e7320f2b..8c0e0a80b1e2 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp @@ -104,20 +104,36 @@ unsigned PPCMCCodeEmitter::getImm16Encoding(const MCInst &MI, unsigned OpNo, return 0; } -uint64_t -PPCMCCodeEmitter::getImm34Encoding(const MCInst &MI, unsigned OpNo, - SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI) const { +uint64_t PPCMCCodeEmitter::getImm34Encoding(const MCInst &MI, unsigned OpNo, + SmallVectorImpl &Fixups, + const MCSubtargetInfo &STI, + MCFixupKind Fixup) const { const MCOperand &MO = MI.getOperand(OpNo); - if (MO.isReg() || MO.isImm()) + assert(!MO.isReg() && "Not expecting a register for this operand."); + if (MO.isImm()) return getMachineOpValue(MI, MO, Fixups, STI); // Add a fixup for the immediate field. - Fixups.push_back(MCFixup::create(0, MO.getExpr(), - (MCFixupKind)PPC::fixup_ppc_pcrel34)); + Fixups.push_back(MCFixup::create(0, MO.getExpr(), Fixup)); return 0; } +uint64_t +PPCMCCodeEmitter::getImm34EncodingNoPCRel(const MCInst &MI, unsigned OpNo, + SmallVectorImpl &Fixups, + const MCSubtargetInfo &STI) const { + return getImm34Encoding(MI, OpNo, Fixups, STI, + (MCFixupKind)PPC::fixup_ppc_imm34); +} + +uint64_t +PPCMCCodeEmitter::getImm34EncodingPCRel(const MCInst &MI, unsigned OpNo, + SmallVectorImpl &Fixups, + const MCSubtargetInfo &STI) const { + return getImm34Encoding(MI, OpNo, Fixups, STI, + (MCFixupKind)PPC::fixup_ppc_pcrel34); +} + unsigned PPCMCCodeEmitter::getMemRIEncoding(const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, const MCSubtargetInfo &STI) const { diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h index 588aa76bd806..4504cc6a7405 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h +++ 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h @@ -52,7 +52,14 @@ class PPCMCCodeEmitter : public MCCodeEmitter { const MCSubtargetInfo &STI) const; uint64_t getImm34Encoding(const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI) const; + const MCSubtargetInfo &STI, + MCFixupKind Fixup) const; + uint64_t getImm34EncodingNoPCRel(const MCInst &MI, unsigned OpNo, + SmallVectorImpl &Fixups, + const MCSubtargetInfo &STI) const; + uint64_t getImm34EncodingPCRel(const MCInst &MI, unsigned OpNo, + SmallVectorImpl &Fixups, + const MCSubtargetInfo &STI) const; unsigned getMemRIEncoding(const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, const MCSubtargetInfo &STI) const; diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.td b/llvm/lib/Target/PowerPC/PPCInstrInfo.td index 673ab63039cf..39a90bf9b346 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrInfo.td +++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.td @@ -757,7 +757,13 @@ def PPCS34ImmAsmOperand : AsmOperandClass { } def s34imm : Operand { let PrintMethod = "printS34ImmOperand"; - let EncoderMethod = "getImm34Encoding"; + let EncoderMethod = "getImm34EncodingNoPCRel"; + let ParserMatchClass = PPCS34ImmAsmOperand; + let DecoderMethod = "decodeSImmOperand<34>"; +} +def s34imm_pcrel : Operand { + let PrintMethod = "printS34ImmOperand"; + let EncoderMethod = "getImm34EncodingPCRel"; let ParserMatchClass = PPCS34ImmAsmOperand; let DecoderMethod = "decodeSImmOperand<34>"; } diff --git a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td index 2c21d0a175ad..91bb912e5726 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -432,7 +432,7 @@ let Predicates = [PrefixInstrs] in { let Interpretation64Bit = 1, isCodeGenOnly = 1 in { defm PADDI8 : MLS_DForm_R_SI34_RTA5_p<14, (outs g8rc:$RT), (ins g8rc:$RA, s34imm:$SI), - (ins immZero:$RA, s34imm:$SI), + (ins immZero:$RA, s34imm_pcrel:$SI), "paddi $RT, $RA, $SI", 
IIC_LdStLFD>; let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in { def PLI8 : MLS_DForm_SI34_RT5<14, (outs g8rc:$RT), @@ -442,7 +442,7 @@ let Predicates = [PrefixInstrs] in { } defm PADDI : MLS_DForm_R_SI34_RTA5_p<14, (outs gprc:$RT), (ins gprc:$RA, s34imm:$SI), - (ins immZero:$RA, s34imm:$SI), + (ins immZero:$RA, s34imm_pcrel:$SI), "paddi $RT, $RA, $SI", IIC_LdStLFD>; let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in { def PLI : MLS_DForm_SI34_RT5<14, (outs gprc:$RT), diff --git a/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s b/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s new file mode 100644 index 000000000000..0d2c879380e0 --- /dev/null +++ b/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s @@ -0,0 +1,7 @@ +# RUN: not --crash llvm-mc -triple powerpc64-- --filetype=obj < %s 2> %t +# RUN: FileCheck < %t %s +# RUN: not --crash llvm-mc -triple powerpc64le-- --filetype=obj < %s 2> %t +# RUN: FileCheck < %t %s + +# CHECK: Unsupported Modifier for fixup_ppc_imm34. +paddi 3, 13, symbol at toc, 0 From llvm-commits at lists.llvm.org Thu Jul 9 09:29:01 2020 From: llvm-commits at lists.llvm.org (Kamau Bridgeman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:29:01 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types In-Reply-To: References: Message-ID: <6e54430785d6863d4965c7d5be65ddc6@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbd2068031121: [PowerPC] Split s34imm into two types (authored by stefanp, committed by kamaub). 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83255/new/

https://reviews.llvm.org/D83255

Files:
  llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
  llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp
  llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
  llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
  llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h
  llvm/lib/Target/PowerPC/PPCInstrInfo.td
  llvm/lib/Target/PowerPC/PPCInstrPrefix.td
  llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83255.276758.patch
Type: text/x-patch
Size: 8319 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 09:33:12 2020
From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 16:33:12 +0000 (UTC)
Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections
In-Reply-To:
References:
Message-ID:

tmsriram added a comment.

In D79978#2141959, @MaskRay wrote:

> In D79978#2141768, @wmi wrote:
>
> > In D79978#2140849, @MaskRay wrote:
> >
> > > I haven't looked into the details, but the test suggests that the patch is wrong:
> > >
> > >   # basic-block-sections-cfi-1.ll
> > >   .section .text,"ax",@progbits,unique,2
> > >   _Z2f3b.2: # %if.end
> > >   .cfi_startproc
> > >   .cfi_def_cfa %rbp, 16 # this should be inserted after addq $16, %rsp
> > >   .cfi_offset %rbp, -16 # this should be after .cfi_def_cfa %rbp, 16
> > >   addq $16, %rsp
> > >   popq %rbp
> > >   .cfi_def_cfa %rsp, 8
> > >   retq
> > >
> >
> >
> > I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.
> >
> Ack.
> Then what instructions should be placed at the top of these basic blocks? Should `.cfi_def_cfa_register %rbp` be placed as well? If you move these basic blocks around, `.cfi_def_cfa_register %rbp` is currently not tracked.
>
> > For basic-block-sections-cfiinstr_1.ll, have you considered places like CodeGen/X86/cfi-inserter-*? You may even create a subdirectory there.
>
> This question still stands. I haven't seen other CFI tests in `DebugInfo/X86/`

Sure, I will move it, np.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978



From llvm-commits at lists.llvm.org Thu Jul 9 09:36:47 2020
From: llvm-commits at lists.llvm.org (Jessica Paquette via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 16:36:47 +0000 (UTC)
Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for inputs
In-Reply-To:
References:
Message-ID:

paquette added inline comments.


================
Comment at: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp:240
+static bool buildAnyextOrCopy(Register Dst, Register Src,
+                              MachineIRBuilder &MIRBuilder) {
----------------
Petar.Avramovic wrote:
> paquette wrote:
> > Would `MachineIRBuilder::buildExtOrTrunc` work here?
> >
> > If not, maybe it would make sense to move this to `MachineIRBuilder` for the sake of consistency?
> The destination is vreg with reg class and source is generic vreg with LLT, it would not work for anyext since it requires both source and dest to be generic virtual registers. Here we anyext to new generic vreg with same size as Dst and then copy to Dst.
> MachineIRBuilder specializes for generic vregs so should it be something like: "buildExtOrTruncToVRegWithRegClass" ? Should I also cover vectors? I don't know if there is a way to know LLT of vector type that would fit into reg class.
>
I see. Considering that difference, I think this is fine. We can refactor later if it turns out to be a good idea.

As for vectors, I think that would be better in a follow-up.

================
Comment at: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp:254
+  if (DstSize > SrcSize) {
+    if (!MRI->getType(Src).isValid() || !MRI->getType(Src).isScalar()) {
+      LLVM_DEBUG(dbgs() << "Can't extend input to size of destination"
----------------
Can we check `MRI->getType(Src).isValid()` before here? I think we always want to check this, right?

e.g.

```
auto SrcTy = MRI->getType(Src);
if (!SrcTy.isValid()) {
  LLVM_DEBUG(dbgs() << "Source type for copy is not valid\n");
  return false;
}

if (DstSize < SrcSize) {
  ...
}

// Attempt to anyext small scalar sources.
if (DstSize > SrcSize) {
  ...
}
...
```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83384/new/

https://reviews.llvm.org/D83384



From llvm-commits at lists.llvm.org Thu Jul 9 09:38:10 2020
From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits)
Date: Thu, 09 Jul 2020 09:38:10 -0700 (PDT)
Subject: [llvm] afc1a70 - [AliasSetTracker] More precise AAInfo intersection check
Message-ID: <5f0747f2.1c69fb81.23342.7edd@mx.google.com>


Author: Nikita Popov
Date: 2020-07-09T18:29:41+02:00
New Revision: afc1a709433e3754ee3819efd9e144b657919131

URL: https://github.com/llvm/llvm-project/commit/afc1a709433e3754ee3819efd9e144b657919131
DIFF: https://github.com/llvm/llvm-project/commit/afc1a709433e3754ee3819efd9e144b657919131.diff

LOG: [AliasSetTracker] More precise AAInfo intersection check

The code currently checks whether the intersection has one of TBAA, Scope or NoAlias unset -- however, those might have already been unset in the first place, in which case we will unnecessarily report a change. Instead, compare the intersection result to the original AAInfo.

This makes for a 0.5% geomean compile-time saving on CTMark.

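To see why the old check over-reports, here is a small stand-alone model. It is a sketch under stated assumptions rather than the LLVM implementation: `AAInfoSketch`, `changedOld`, and `changedNew` are hypothetical names, tags are modeled as plain integers with 0 meaning "unset", and `intersect` is assumed to keep a tag only when both sides agree on it. Only the "was a change reported" decision is modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for alias-analysis metadata: three tags, 0 = unset.
struct AAInfoSketch {
  int TBAA;
  int Scope;
  int NoAlias;

  // Assumed semantics: a tag survives only if both sides carry the same one.
  AAInfoSketch intersect(const AAInfoSketch &Other) const {
    return AAInfoSketch{TBAA == Other.TBAA ? TBAA : 0,
                        Scope == Other.Scope ? Scope : 0,
                        NoAlias == Other.NoAlias ? NoAlias : 0};
  }

  bool operator==(const AAInfoSketch &Other) const {
    return TBAA == Other.TBAA && Scope == Other.Scope &&
           NoAlias == Other.NoAlias;
  }
  bool operator!=(const AAInfoSketch &Other) const { return !(*this == Other); }
};

// Old-style decision: any unset tag in the intersection counts as a change,
// even if that tag was already unset before intersecting.
inline bool changedOld(const AAInfoSketch &Cur, const AAInfoSketch &New) {
  AAInfoSketch I = Cur.intersect(New);
  return I.TBAA == 0 || I.Scope == 0 || I.NoAlias == 0;
}

// New-style decision: a change is reported only if intersecting actually
// produced something different from the current value.
inline bool changedNew(const AAInfoSketch &Cur, const AAInfoSketch &New) {
  return Cur.intersect(New) != Cur;
}
```

With a current value that has only TBAA set, intersecting with an identical value leaves it untouched, yet `changedOld` still reports a change because Scope and NoAlias are unset; `changedNew` does not.
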
Differential Revision: https://reviews.llvm.org/D83430 Added: Modified: llvm/include/llvm/Analysis/AliasSetTracker.h Removed: ################################################################################ diff --git a/llvm/include/llvm/Analysis/AliasSetTracker.h b/llvm/include/llvm/Analysis/AliasSetTracker.h index e94a758b06ba..690a94d9cf2c 100644 --- a/llvm/include/llvm/Analysis/AliasSetTracker.h +++ b/llvm/include/llvm/Analysis/AliasSetTracker.h @@ -87,12 +87,7 @@ class AliasSet : public ilist_node { AAInfo = NewAAInfo; else { AAMDNodes Intersection(AAInfo.intersect(NewAAInfo)); - if (!Intersection.TBAA || !Intersection.Scope || - !Intersection.NoAlias) { - // NewAAInfo conflicts with AAInfo. - AAInfo = DenseMapInfo::getTombstoneKey(); - SizeChanged = true; - } + SizeChanged |= Intersection != AAInfo; AAInfo = Intersection; } return SizeChanged; From llvm-commits at lists.llvm.org Thu Jul 9 09:38:19 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:38:19 +0000 (UTC) Subject: [PATCH] D83430: [AliasSetTracker] More precise AAInfo intersection check In-Reply-To: References: Message-ID: <6f039bf6d175f3fc9a86db8246c2df86@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGafc1a709433e: [AliasSetTracker] More precise AAInfo intersection check (authored by nikic). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83430/new/ https://reviews.llvm.org/D83430 Files: llvm/include/llvm/Analysis/AliasSetTracker.h Index: llvm/include/llvm/Analysis/AliasSetTracker.h =================================================================== --- llvm/include/llvm/Analysis/AliasSetTracker.h +++ llvm/include/llvm/Analysis/AliasSetTracker.h @@ -87,12 +87,7 @@ AAInfo = NewAAInfo; else { AAMDNodes Intersection(AAInfo.intersect(NewAAInfo)); - if (!Intersection.TBAA || !Intersection.Scope || - !Intersection.NoAlias) { - // NewAAInfo conflicts with AAInfo. - AAInfo = DenseMapInfo::getTombstoneKey(); - SizeChanged = true; - } + SizeChanged |= Intersection != AAInfo; AAInfo = Intersection; } return SizeChanged; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83430.276760.patch Type: text/x-patch Size: 707 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 09:39:02 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:39:02 +0000 (UTC) Subject: [PATCH] D81699: MemorySanitizer: Add option to insert init checks at call site In-Reply-To: References: Message-ID: guiand updated this revision to Diff 276761. guiand added a comment. Use autogenerated test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81699/new/ https://reviews.llvm.org/D81699 Files: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81699.276761.patch Type: text/x-patch Size: 11832 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 09:40:04 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:40:04 +0000 (UTC) Subject: [PATCH] D81699: MemorySanitizer: Add option to insert init checks at call site In-Reply-To: References: Message-ID: guiand marked 10 inline comments as done. guiand added inline comments. ================ Comment at: llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll:1 +; RUN: opt < %s -msan-check-access-address=0 -msan-track-origins=1 -msan-eager-checks -S -passes='module(msan-module),function(msan)' 2>&1 | \ +; RUN: FileCheck -allow-deprecated-dag-overlap -check-prefixes=CHECK,CHECK-ORIGINS %s ---------------- vitalybuka wrote: > vitalybuka wrote: > > vitalybuka wrote: > > > would you like to try go generate test with llvm/utils/update_analyze_test_checks.py > > ? > ``` > ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py > ; RUN: opt < %s -msan-check-access-address=0 -msan-track-origins=1 -msan-eager-checks -S -passes='module(msan-module),function(msan)' 2>&1 | \ > ; RUN: FileCheck -allow-deprecated-dag-overlap -check-prefixes=CHECK,CHECK-ORIGINS %s > > target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" > target triple = "x86_64-unknown-linux-gnu" > > define noundef i32 @NormalRet() nounwind uwtable sanitize_memory { > ; CHECK-LABEL: @NormalRet( > ; CHECK-NEXT: ret i32 123 > ; > ret i32 123 > } > > define i32 @PartialRet() nounwind uwtable sanitize_memory { > ; CHECK-LABEL: @PartialRet( > ; CHECK-NEXT: store i32 0, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8 > ; CHECK-NEXT: store i32 0, i32* @__msan_retval_origin_tls, align 4 > ; CHECK-NEXT: ret i32 123 > ; > ret i32 123 > } > > define noundef i32 @LoadedRet() nounwind 
uwtable sanitize_memory { > ; CHECK-LABEL: @LoadedRet( > ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* > ; CHECK-NEXT: [[O:%.*]] = load i32, i32* [[P]], align 4 > ; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64 > ; CHECK-NEXT: [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080 > ; CHECK-NEXT: [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32* > ; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416 > ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* > ; CHECK-NEXT: [[_MSLD:%.*]] = load i32, i32* [[TMP3]], align 4 > ; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4 > ; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i32 [[_MSLD]], 0 > ; CHECK-NEXT: br i1 [[_MSCMP]], label [[TMP7:%.*]], label [[TMP8:%.*]], !prof !0 > ; CHECK: 7: > ; CHECK-NEXT: call void @__msan_warning_with_origin_noreturn(i32 [[TMP6]]) #1 > ; CHECK-NEXT: unreachable > ; CHECK: 8: > ; CHECK-NEXT: ret i32 [[O]] > ; > %p = inttoptr i64 0 to i32 * > %o = load i32, i32 *%p > ret i32 %o > } > > > define void @NormalArg(i32 noundef %a) nounwind uwtable sanitize_memory { > ; CHECK-LABEL: @NormalArg( > ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* > ; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64 > ; CHECK-NEXT: [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080 > ; CHECK-NEXT: [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32* > ; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416 > ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* > ; CHECK-NEXT: store i32 0, i32* [[TMP3]], align 4 > ; CHECK-NEXT: store i32 [[A:%.*]], i32* [[P]], align 4 > ; CHECK-NEXT: ret void > ; > %p = inttoptr i64 0 to i32 * > store i32 %a, i32 *%p > ret void > } > > define void @PartialArg(i32 %a) nounwind uwtable sanitize_memory { > ; CHECK-LABEL: @PartialArg( > ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* bitcast ([100 x i64]* @__msan_param_tls to i32*), align 8 > ; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([200 x i32], [200 x i32]* @__msan_param_origin_tls, i32 0, i32 
0), align 4 > ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* > ; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint i32* [[P]] to i64 > ; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[TMP3]], 87960930222080 > ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* > ; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], 17592186044416 > ; CHECK-NEXT: [[TMP7:%.*]] = inttoptr i64 [[TMP6]] to i32* > ; CHECK-NEXT: store i32 [[TMP1]], i32* [[TMP5]], align 4 > ; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i32 [[TMP1]], 0 > ; CHECK-NEXT: br i1 [[_MSCMP]], label [[TMP8:%.*]], label [[TMP9:%.*]], !prof !0 > ; CHECK: 8: > ; CHECK-NEXT: store i32 [[TMP2]], i32* [[TMP7]], align 4 > ; CHECK-NEXT: br label [[TMP9]] > ; CHECK: 9: > ; CHECK-NEXT: store i32 [[A:%.*]], i32* [[P]], align 4 > ; CHECK-NEXT: ret void > ; > %p = inttoptr i64 0 to i32 * > store i32 %a, i32 *%p > ret void > } > > define void @CallNormal() nounwind uwtable sanitize_memory { > ; CHECK-LABEL: @CallNormal( > ; CHECK-NEXT: [[R:%.*]] = call i32 @NormalRet() #0 > ; CHECK-NEXT: call void @NormalArg(i32 [[R]]) #0 > ; CHECK-NEXT: ret void > ; > %r = call i32 @NormalRet() nounwind uwtable sanitize_memory > call void @NormalArg(i32 %r) nounwind uwtable sanitize_memory > ret void > } > > define void @CallWithLoaded() nounwind uwtable sanitize_memory { > ; CHECK-LABEL: @CallWithLoaded( > ; CHECK-NEXT: [[P:%.*]] = inttoptr i64 0 to i32* > ; CHECK-NEXT: [[O:%.*]] = load i32, i32* [[P]], align 4 > ; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64 > ; CHECK-NEXT: [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080 > ; CHECK-NEXT: [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32* > ; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416 > ; CHECK-NEXT: [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32* > ; CHECK-NEXT: [[_MSLD:%.*]] = load i32, i32* [[TMP3]], align 4 > ; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4 > ; CHECK-NEXT: [[_MSCMP:%.*]] = icmp ne i32 [[_MSLD]], 0 > ; CHECK-NEXT: br i1 [[_MSCMP]], label [[TMP7:%.*]], label [[TMP8:%.*]], 
!prof !0 > ; CHECK: 7: > ; CHECK-NEXT: call void @__msan_warning_with_origin_noreturn(i32 [[TMP6]]) #1 > ; CHECK-NEXT: unreachable > ; CHECK: 8: > ; CHECK-NEXT: call void @NormalArg(i32 [[O]]) #0 > ; CHECK-NEXT: ret void > ; > %p = inttoptr i64 0 to i32 * > %o = load i32, i32 *%p > call void @NormalArg(i32 %o) nounwind uwtable sanitize_memory > ret void > } > > define void @CallPartial() nounwind uwtable sanitize_memory { > ; CHECK-LABEL: @CallPartial( > ; CHECK-NEXT: store i32 0, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8 > ; CHECK-NEXT: [[R:%.*]] = call i32 @PartialRet() #0 > ; CHECK-NEXT: [[_MSRET:%.*]] = load i32, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8 > ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* @__msan_retval_origin_tls, align 4 > ; CHECK-NEXT: store i32 [[_MSRET]], i32* bitcast ([100 x i64]* @__msan_param_tls to i32*), align 8 > ; CHECK-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([200 x i32], [200 x i32]* @__msan_param_origin_tls, i32 0, i32 0), align 4 > ; CHECK-NEXT: call void @PartialArg(i32 [[R]]) #0 > ; CHECK-NEXT: ret void > ; > %r = call i32 @PartialRet() nounwind uwtable sanitize_memory > call void @PartialArg(i32 %r) nounwind uwtable sanitize_memory > ret void > } > > ``` Thanks for helping me out with this, Vitaly! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81699/new/ https://reviews.llvm.org/D81699 From llvm-commits at lists.llvm.org Thu Jul 9 09:40:38 2020 From: llvm-commits at lists.llvm.org (Sergej Jaskiewicz via llvm-commits) Date: Thu, 09 Jul 2020 09:40:38 -0700 (PDT) Subject: [compiler-rt] a89d54f - [compiler-rt] Better Windows support for running tests in external shell Message-ID: <5f074886.1c69fb81.c153f.86e9@mx.google.com> Author: Sergej Jaskiewicz Date: 2020-07-09T19:40:22+03:00 New Revision: a89d54fd61a6e7a05f7434491135e667306a22e7 URL: https://github.com/llvm/llvm-project/commit/a89d54fd61a6e7a05f7434491135e667306a22e7 DIFF: https://github.com/llvm/llvm-project/commit/a89d54fd61a6e7a05f7434491135e667306a22e7.diff LOG: [compiler-rt] Better Windows support for running tests in external shell Summary: These changes are necessary to support remotely running compiler-rt tests that were compiled on Windows. Most of the code here has been copy-pasted from other lit configs. Why do we remove the conversions to ASCII in the crt config? We set the `universal_newlines` argument to `True` in `Popen` instead. This is supported in both Python 2.7 and 3, is easier (no need to do the `str(dir.decode('ascii'))` dance) and less error-prone. Also, this is necessary because if the config is executed on Windows, and `execute_external` is `True`, we take the branch `if sys.platform in ['win32'] and execute_external`, and if we use Python 3, then the `dir` variable is a bytes-like object, not str, but the `replace` method on bytes-like objects requires its arguments to also be bytes-like objects, which is incompatible with Python 2, and so on. It is a lot simpler to just work with strings in the first place, which is achieved by setting `universal_newlines` to `True`. As far as I understand, this way wasn't taken because of the need to support Python <2.7, but this is not the case now.
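The `universal_newlines` point in the summary above can be tried out with a small, self-contained sketch. It is not the lit config itself: `run_and_read`, `to_shell_path`, and the child-process command are hypothetical names standing in for the `clang -print-file-name=...` call, and the behavior shown is that of Python 3.

```python
import subprocess
import sys


def run_and_read(args, text_mode):
    """Run a command and return its stripped stdout (str or bytes)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            universal_newlines=text_mode)
    out, _ = proc.communicate()
    return out.strip()


def to_shell_path(path):
    # Mirrors the config's replace('\\', '/') step for msys bash.
    return path.replace('\\', '/')


# Hypothetical stand-in for the compiler invocation: a child Python process
# that prints a Windows-style path.
fake_compiler = [sys.executable, "-c", "print(r'C:\\lib\\clang_rt.crtbegin.o')"]

as_text = run_and_read(fake_compiler, text_mode=True)    # str
as_bytes = run_and_read(fake_compiler, text_mode=False)  # bytes on Python 3

print(to_shell_path(as_text))

try:
    to_shell_path(as_bytes)
except TypeError as err:
    # On Python 3, bytes.replace() rejects str arguments.
    print("bytes output needs bytes arguments:", err)
```

With text mode enabled, the same `replace('\\', '/')` call works unchanged, which is the simplification the summary argues for.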
Reviewers: compnerd, phosek, weimingz Reviewed By: compnerd Subscribers: dberris, #sanitizers Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D83485 Added: Modified: compiler-rt/test/builtins/Unit/lit.cfg.py compiler-rt/test/crt/lit.cfg.py Removed: ################################################################################ diff --git a/compiler-rt/test/builtins/Unit/lit.cfg.py b/compiler-rt/test/builtins/Unit/lit.cfg.py index 8fdb1a216ee2..c8888078be50 100644 --- a/compiler-rt/test/builtins/Unit/lit.cfg.py +++ b/compiler-rt/test/builtins/Unit/lit.cfg.py @@ -5,6 +5,17 @@ import lit.formats +# Choose between lit's internal shell pipeline runner and a real shell. If +# LIT_USE_INTERNAL_SHELL is in the environment, we use that as an override. +use_lit_shell = os.environ.get("LIT_USE_INTERNAL_SHELL") +if use_lit_shell: + # 0 is external, "" is default, and everything else is internal. + execute_external = (use_lit_shell == "0") +else: + # Otherwise we default to internal on Windows and external elsewhere, as + # bash on Windows is usually very slow. + execute_external = (not sys.platform in ['win32']) + def get_required_attr(config, attr_name): attr_value = getattr(config, attr_name, None) if attr_value == None: @@ -35,10 +46,16 @@ def get_required_attr(config, attr_name): else: base_lib = os.path.join(config.compiler_rt_libdir, "libclang_rt.builtins%s.a" % config.target_suffix) + if sys.platform in ['win32'] and execute_external: + # Don't pass dosish path separator to msys bash.exe. + base_lib = base_lib.replace('\\', '/') config.substitutions.append( ("%librt ", base_lib + ' -lc -lm ') ) builtins_source_dir = os.path.join( get_required_attr(config, "compiler_rt_src_root"), "lib", "builtins") +if sys.platform in ['win32'] and execute_external: + # Don't pass dosish path separator to msys bash.exe. 
+ builtins_source_dir = builtins_source_dir.replace('\\', '/') builtins_lit_source_dir = get_required_attr(config, "builtins_lit_source_dir") extra_link_flags = ["-nodefaultlibs"] diff --git a/compiler-rt/test/crt/lit.cfg.py b/compiler-rt/test/crt/lit.cfg.py index dc15e456fe19..68e7eda7d59b 100644 --- a/compiler-rt/test/crt/lit.cfg.py +++ b/compiler-rt/test/crt/lit.cfg.py @@ -26,15 +26,15 @@ def get_library_path(file): config.target_cflags.strip(), '-print-file-name=%s' % file], stdout=subprocess.PIPE, - env=config.environment) + env=config.environment, + universal_newlines=True) if not cmd.stdout: lit_config.fatal("Couldn't find the library path for '%s'" % file) dir = cmd.stdout.read().strip() if sys.platform in ['win32'] and execute_external: # Don't pass dosish path separator to msys bash.exe. dir = dir.replace('\\', '/') - # Ensure the result is an ascii string, across Python2.5+ - Python3. - return str(dir.decode('ascii')) + return dir def get_libgcc_file_name(): @@ -42,15 +42,15 @@ def get_libgcc_file_name(): config.target_cflags.strip(), '-print-libgcc-file-name'], stdout=subprocess.PIPE, - env=config.environment) + env=config.environment, + universal_newlines=True) if not cmd.stdout: lit_config.fatal("Couldn't find the library path for '%s'" % file) dir = cmd.stdout.read().strip() if sys.platform in ['win32'] and execute_external: # Don't pass dosish path separator to msys bash.exe. dir = dir.replace('\\', '/') - # Ensure the result is an ascii string, across Python2.5+ - Python3. - return str(dir.decode('ascii')) + return dir def build_invocation(compile_flags): @@ -66,6 +66,11 @@ def build_invocation(compile_flags): base_lib = os.path.join( config.compiler_rt_libdir, "clang_rt.%%s%s.o" % config.target_suffix) + +if sys.platform in ['win32'] and execute_external: + # Don't pass dosish path separator to msys bash.exe. 
+ base_lib = base_lib.replace('\\', '/') + config.substitutions.append(('%crtbegin', base_lib % "crtbegin")) config.substitutions.append(('%crtend', base_lib % "crtend")) From llvm-commits at lists.llvm.org Thu Jul 9 09:40:55 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:40:55 +0000 (UTC) Subject: [PATCH] D83491: [flang] Fix a crash when creating generics from a copy Message-ID: PeteSteinfeld created this revision. PeteSteinfeld added reviewers: klausler, tskeith. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. When a program unit creates a generic based on one defined in a module, the function `CopyFrom()` is called to create the `GenericDetails`. This function copied the `specificProcs_` but failed to copy the `bindingNames_`. If the function `CheckGeneric()` then gets called, it tries to index into the empty binding names and causes the crash. I fixed this by adding code to `CopyFrom()` to copy the binding names. I also added a test that causes the crash. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83491 Files: flang/lib/Semantics/symbol.cpp flang/test/Semantics/resolve53.f90 Index: flang/test/Semantics/resolve53.f90 =================================================================== --- flang/test/Semantics/resolve53.f90 +++ flang/test/Semantics/resolve53.f90 @@ -457,3 +457,26 @@ integer :: i, j end end + +module m20 + interface operator(.not.) + real function f(x) + character(*),intent(in) :: x + end function + end interface + interface operator(+) + procedure f + end interface +end module + +subroutine s1() + use m20 + interface operator(.not.) 
+ !ERROR: Procedure 'f' is already specified in generic 'operator(.not.)' + procedure f + end interface + interface operator(+) + !ERROR: Procedure 'f' is already specified in generic 'operator(+)' + procedure f + end interface +end subroutine s1 Index: flang/lib/Semantics/symbol.cpp =================================================================== --- flang/lib/Semantics/symbol.cpp +++ flang/lib/Semantics/symbol.cpp @@ -201,6 +201,14 @@ specificProcs_.push_back(symbol); } } + for (const SourceName &sourceName : from.bindingNames_) { + if (std::find_if(bindingNames_.begin(), bindingNames_.end(), + [&](const SourceName &mySource) { + return &mySource == &sourceName; + }) == bindingNames_.end()) { + bindingNames_.push_back(sourceName); + } + } } // The name of the kind of details for this symbol. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83491.276763.patch Type: text/x-patch Size: 1400 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 09:44:25 2020 From: llvm-commits at lists.llvm.org (Fred Riss via llvm-commits) Date: Thu, 09 Jul 2020 09:44:25 -0700 (PDT) Subject: [llvm] e529d77 - [lldb] Use enum constant instead of raw value Message-ID: <5f074969.1c69fb81.ffac7.8676@mx.google.com> Author: Fred Riss Date: 2020-07-09T09:43:50-07:00 New Revision: e529d774c4d5d1eba13dcad5eef2195dd06741c6 URL: https://github.com/llvm/llvm-project/commit/e529d774c4d5d1eba13dcad5eef2195dd06741c6 DIFF: https://github.com/llvm/llvm-project/commit/e529d774c4d5d1eba13dcad5eef2195dd06741c6.diff LOG: [lldb] Use enum constant instead of raw value Added: Modified: lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp llvm/include/llvm/BinaryFormat/MachO.h Removed: ################################################################################ diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp index 6c2f22b837c5..2bb4b21adeae 100644 --- 
a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp +++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp @@ -2297,7 +2297,7 @@ size_t ObjectFileMachO::ParseSymtab() { #if defined(__APPLE__) && \ (defined(__arm__) || defined(__arm64__) || defined(__aarch64__)) - if (m_header.flags & 0x80000000u && + if (m_header.flags & MH_DYLIB_IN_CACHE && process->GetAddressByteSize() == sizeof(void *)) { // This mach-o memory file is in the dyld shared cache. If this // program is not remote and this is iOS, then this process will @@ -2379,7 +2379,7 @@ size_t ObjectFileMachO::ParseSymtab() { // problem. For binaries outside the shared cache, it's faster to // read the entire strtab at once instead of piece-by-piece as we // process the nlist records. - if ((m_header.flags & 0x80000000u) == 0) { + if ((m_header.flags & MH_DYLIB_IN_CACHE) == 0) { DataBufferSP strtab_data_sp( ReadMemory(process_sp, strtab_addr, strtab_data_byte_size)); if (strtab_data_sp) { @@ -2608,7 +2608,7 @@ size_t ObjectFileMachO::ParseSymtab() { // to parse any DSC unmapped symbol information. If we find any, we set a // flag that tells the normal nlist parser to ignore all LOCAL symbols. - if (m_header.flags & 0x80000000u) { + if (m_header.flags & MH_DYLIB_IN_CACHE) { // Before we can start mapping the DSC, we need to make certain the // target process is actually using the cache we can find. 
diff --git a/llvm/include/llvm/BinaryFormat/MachO.h b/llvm/include/llvm/BinaryFormat/MachO.h index 0010f36e8b89..e43fea0a2465 100644 --- a/llvm/include/llvm/BinaryFormat/MachO.h +++ b/llvm/include/llvm/BinaryFormat/MachO.h @@ -82,7 +82,8 @@ enum { MH_HAS_TLV_DESCRIPTORS = 0x00800000u, MH_NO_HEAP_EXECUTION = 0x01000000u, MH_APP_EXTENSION_SAFE = 0x02000000u, - MH_NLIST_OUTOFSYNC_WITH_DYLDINFO = 0x04000000u + MH_NLIST_OUTOFSYNC_WITH_DYLDINFO = 0x04000000u, + MH_DYLIB_IN_CACHE = 0x80000000u, }; enum : uint32_t { From llvm-commits at lists.llvm.org Thu Jul 9 09:44:50 2020 From: llvm-commits at lists.llvm.org (Sergej Jaskiewicz via llvm-commits) Date: Thu, 09 Jul 2020 09:44:50 -0700 (PDT) Subject: [compiler-rt] 5ab446c - [compiler-rt] [test] Use the parent process env as base env in tests Message-ID: <5f074982.1c69fb81.b3e06.90d2@mx.google.com> Author: Sergej Jaskiewicz Date: 2020-07-09T19:44:35+03:00 New Revision: 5ab446cfe5503fd4431a94db4d741cf3b5fdcd15 URL: https://github.com/llvm/llvm-project/commit/5ab446cfe5503fd4431a94db4d741cf3b5fdcd15 DIFF: https://github.com/llvm/llvm-project/commit/5ab446cfe5503fd4431a94db4d741cf3b5fdcd15.diff LOG: [compiler-rt] [test] Use the parent process env as base env in tests Summary: Right now the lit config builds up an environment that the tests will be run in. However, it does it from scratch instead of adding new variables to the parent process environment. This may (and does) result in strange behavior when running tests with an executor (i. e. with the `COMPILER_RT_EMULATOR` CMake variable set to something), since the executor may need some of the parent process's environment variables. Here this is fixed. 
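The approach described in this summary (start from the parent environment, then clear the variables that might affect Clang) can be sketched as follows. This is an illustration under assumptions rather than the lit config: `build_test_env` and `EXAMPLE_VAR` are hypothetical names, and the real config assigns to `config.environment` directly.

```python
import os


def build_test_env(parent_env, vars_to_clear, overrides):
    """Copy the parent environment, drop risky variables, apply overrides."""
    env = dict(parent_env)      # copy, so the parent mapping is not mutated
    for name in vars_to_clear:
        env.pop(name, None)     # absent variables are simply skipped
    env.update(overrides)
    return env


# Small fixed example so the result does not depend on the caller's shell.
parent = {"PATH": "/usr/bin", "ASAN_OPTIONS": "detect_leaks=0"}
env = build_test_env(parent, ["ASAN_OPTIONS", "MSAN_OPTIONS"],
                     {"EXAMPLE_VAR": "1"})
print(sorted(env))

# The same helper applied to the real process environment.
real_env = build_test_env(os.environ, ["ASAN_OPTIONS"], {})
print("ASAN_OPTIONS" in real_env)
```

Starting from a copy keeps whatever an executor may need from the parent process, such as `PATH`, while still removing the sanitizer option variables.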
Reviewers: compnerd, phosek Reviewed By: compnerd Subscribers: dberris, #sanitizers Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D83486 Added: Modified: compiler-rt/test/lit.common.cfg.py Removed: ################################################################################ diff --git a/compiler-rt/test/lit.common.cfg.py b/compiler-rt/test/lit.common.cfg.py index 98a2f3c03e60..7c98c387c870 100644 --- a/compiler-rt/test/lit.common.cfg.py +++ b/compiler-rt/test/lit.common.cfg.py @@ -67,6 +67,8 @@ # to link. In r19 and later we just use the default which is libc++. config.cxx_mode_flags.append('-stdlib=libstdc++') +config.environment = dict(os.environ) + # Clear some environment variables that might affect Clang. possibly_dangerous_env_vars = ['ASAN_OPTIONS', 'DFSAN_OPTIONS', 'LSAN_OPTIONS', 'MSAN_OPTIONS', 'UBSAN_OPTIONS', From llvm-commits at lists.llvm.org Thu Jul 9 09:46:29 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:46:29 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: <4de37a139e2c8b797ebfbaaaa245045d@localhost.localdomain> nikic accepted this revision. nikic added a comment. This revision is now accepted and ready to land. LG ================ Comment at: llvm/lib/Transforms/Scalar/BDCE.cpp:125 + IRBuilder<> Builder(SE); + I.replaceAllUsesWith(Builder.CreateZExt(SE->getOperand(0), DstTy)); + Worklist.push_back(SE); ---------------- Please pass `SE->getName()` here to preserve the instruction name. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 From llvm-commits at lists.llvm.org Thu Jul 9 09:49:31 2020 From: llvm-commits at lists.llvm.org (Sergej Jaskiewicz via llvm-commits) Date: Thu, 09 Jul 2020 09:49:31 -0700 (PDT) Subject: [compiler-rt] 8372d50 - [compiler-rt] [test] Allow expanding lit substitutions recursively Message-ID: <5f074a9b.1c69fb81.5cfd2.79ae@mx.google.com> Author: Sergej Jaskiewicz Date: 2020-07-09T19:49:18+03:00 New Revision: 8372d505082aceb38417e0b561cd32f2e227597b URL: https://github.com/llvm/llvm-project/commit/8372d505082aceb38417e0b561cd32f2e227597b DIFF: https://github.com/llvm/llvm-project/commit/8372d505082aceb38417e0b561cd32f2e227597b.diff LOG: [compiler-rt] [test] Allow expanding lit substitutions recursively Summary: This allows using lit substitutions in the `COMPILER_RT_EMULATOR` variable. (For reference, the ability to expand substitutions recursively has been introduced in https://reviews.llvm.org/D76178.) Reviewers: phosek, compnerd Reviewed By: compnerd Subscribers: dberris, #sanitizers Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D83489 Added: Modified: compiler-rt/test/lit.common.cfg.py Removed: ################################################################################ diff --git a/compiler-rt/test/lit.common.cfg.py b/compiler-rt/test/lit.common.cfg.py index 7c98c387c870..32a602bfb318 100644 --- a/compiler-rt/test/lit.common.cfg.py +++ b/compiler-rt/test/lit.common.cfg.py @@ -23,6 +23,9 @@ # bash on Windows is usually very slow. execute_external = (not sys.platform in ['win32']) +# Allow expanding substitutions that are based on other substitutions +config.recursiveExpansionLimit = 10 + # Setup test format. 
config.test_format = lit.formats.ShTest(execute_external) if execute_external: From llvm-commits at lists.llvm.org Thu Jul 9 09:49:39 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:49:39 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: <6c3f8e4bc3c13edaa134f3b0eb04396a@localhost.localdomain> SjoerdMeijer marked an inline comment as done. SjoerdMeijer added inline comments. ================ Comment at: llvm/docs/LangRef.rst:15574 +matrix, using a stride of %Stride between columns. For two consecutive columns +A and B, %Stride refers to the distance (the number of elements) between the +start of column A and the start of column B. The result matrix is linearized ---------------- I am actually now also interested in defining `%Stride` better. Using our new definition: > For a `R x C` matrix, element `i` of column `j` is at index `j * R + i` in its vector, with indices starting at 0. From the description of %Stride it follows that: %Stride = ((j + 1) * R + 0) - (j * R + 0) => %Stride = R So double checking: we can simplify the description of %Stride just by saying it is equal to the number of rows, is that correct? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Thu Jul 9 09:50:20 2020 From: llvm-commits at lists.llvm.org (Vedant Kumar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:50:20 +0000 (UTC) Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE In-Reply-To: References: Message-ID: vsk added a comment. +1 for removing EmitDwarfDebugEntryValues, that's a nice cleanup. Thanks!
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83462/new/ https://reviews.llvm.org/D83462 From llvm-commits at lists.llvm.org Thu Jul 9 09:52:00 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:52:00 +0000 (UTC) Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE In-Reply-To: References: Message-ID: <16502b81d063851b5268d47ac3e01505@localhost.localdomain> echristo added a comment. So the tuning here for SCE is also a "does not support" or something else? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83462/new/ https://reviews.llvm.org/D83462 From llvm-commits at lists.llvm.org Thu Jul 9 09:55:06 2020 From: llvm-commits at lists.llvm.org (Adrian Prantl via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:55:06 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: <53f1a5633aadfcbad9d0ee8752c157ed@localhost.localdomain> aprantl added a comment. lgtm from my point of view now. ================ Comment at: llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp:981 // Insert the value at the old position - RetVal = InsertValueInst::Create(RetVal, V, Ri, "oldret", InsertPt); + RetVal = IRB.CreateInsertValue(RetVal, V, Ri, "oldret"); } ---------------- Nice! ================ Comment at: llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll:5 +; RUN: opt < %s -deadargelim -check-debugify -S 2>&1 \ +; RUN: | FileCheck %s -check-prefix=DEBUG + ---------------- Why use a custom prefix if there is only one FileCheck invocation? ================ Comment at: llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll:219 +!93 = !DILocation(line: 37, column: 1, scope: !58) +!94 = !DILocation(line: 38, column: 1, scope: !58) + ---------------- We might as well delete all the `column: 1` fields, assuming that 0 is the default. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 From llvm-commits at lists.llvm.org Thu Jul 9 09:56:16 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via llvm-commits) Date: Thu, 09 Jul 2020 09:56:16 -0700 (PDT) Subject: [llvm] 06fc125 - [PGO][PGSO] Add profile guided size optimization tests to X86 ISel Lowering. Message-ID: <5f074c30.1c69fb81.3130e.87ef@mx.google.com> Author: Hiroshi Yamauchi Date: 2020-07-09T09:56:01-07:00 New Revision: 06fc125d8c5d7d1244ee2160fba52bc1b91ddb99 URL: https://github.com/llvm/llvm-project/commit/06fc125d8c5d7d1244ee2160fba52bc1b91ddb99 DIFF: https://github.com/llvm/llvm-project/commit/06fc125d8c5d7d1244ee2160fba52bc1b91ddb99.diff LOG: [PGO][PGSO] Add profile guided size optimization tests to X86 ISel Lowering. Added: Modified: llvm/test/CodeGen/X86/avx-vperm2x128.ll llvm/test/CodeGen/X86/phaddsub-extract.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/X86/avx-vperm2x128.ll b/llvm/test/CodeGen/X86/avx-vperm2x128.ll index 2abca6ea7fe9..8310cc9b50a3 100644 --- a/llvm/test/CodeGen/X86/avx-vperm2x128.ll +++ b/llvm/test/CodeGen/X86/avx-vperm2x128.ll @@ -394,6 +394,15 @@ define <4 x double> @shuffle_v4f64_zz23_optsize(<4 x double> %a) optsize { %s = shufflevector <4 x double> %a, <4 x double> , <4 x i32> ret <4 x double> %s } +define <4 x double> @shuffle_v4f64_zz23_pgso(<4 x double> %a) !prof !14 { +; ALL-LABEL: shuffle_v4f64_zz23_pgso: +; ALL: # %bb.0: +; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1 +; ALL-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] +; ALL-NEXT: retq + %s = shufflevector <4 x double> %a, <4 x double> , <4 x i32> + ret <4 x double> %s +} define <4 x double> @shuffle_v4f64_zz45(<4 x double> %a) { ; ALL-LABEL: shuffle_v4f64_zz45: @@ -429,6 +438,15 @@ define <4 x double> @shuffle_v4f64_zz67_optsize(<4 x double> %a) optsize { %s = shufflevector <4 x double> , <4 x double> %a, <4 x i32> 
ret <4 x double> %s } +define <4 x double> @shuffle_v4f64_zz67_pgso(<4 x double> %a) !prof !14 { +; ALL-LABEL: shuffle_v4f64_zz67_pgso: +; ALL: # %bb.0: +; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1 +; ALL-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] +; ALL-NEXT: retq + %s = shufflevector <4 x double> , <4 x double> %a, <4 x i32> + ret <4 x double> %s +} define <4 x double> @shuffle_v4f64_01zz(<4 x double> %a) { ; ALL-LABEL: shuffle_v4f64_01zz: @@ -685,3 +703,20 @@ entry: %res = add <8 x i32> %shuffle, ret <8 x i32> %res } + +!llvm.module.flags = !{!0} +!0 = !{i32 1, !"ProfileSummary", !1} +!1 = !{!2, !3, !4, !5, !6, !7, !8, !9} +!2 = !{!"ProfileFormat", !"InstrProf"} +!3 = !{!"TotalCount", i64 10000} +!4 = !{!"MaxCount", i64 10} +!5 = !{!"MaxInternalCount", i64 1} +!6 = !{!"MaxFunctionCount", i64 1000} +!7 = !{!"NumCounts", i64 3} +!8 = !{!"NumFunctions", i64 3} +!9 = !{!"DetailedSummary", !10} +!10 = !{!11, !12, !13} +!11 = !{i32 10000, i64 100, i32 1} +!12 = !{i32 999000, i64 100, i32 1} +!13 = !{i32 999999, i64 1, i32 2} +!14 = !{!"function_entry_count", i64 0} diff --git a/llvm/test/CodeGen/X86/phaddsub-extract.ll b/llvm/test/CodeGen/X86/phaddsub-extract.ll index b7af19b7b1e4..f475a31b7d29 100644 --- a/llvm/test/CodeGen/X86/phaddsub-extract.ll +++ b/llvm/test/CodeGen/X86/phaddsub-extract.ll @@ -2094,6 +2094,44 @@ define i32 @hadd32_4_optsize(<4 x i32> %x225) optsize { ret i32 %x230 } +define i32 @hadd32_4_pgso(<4 x i32> %x225) !prof !14 { +; SSE3-SLOW-LABEL: hadd32_4_pgso: +; SSE3-SLOW: # %bb.0: +; SSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] +; SSE3-SLOW-NEXT: paddd %xmm0, %xmm1 +; SSE3-SLOW-NEXT: phaddd %xmm1, %xmm1 +; SSE3-SLOW-NEXT: movd %xmm1, %eax +; SSE3-SLOW-NEXT: retq +; +; SSE3-FAST-LABEL: hadd32_4_pgso: +; SSE3-FAST: # %bb.0: +; SSE3-FAST-NEXT: phaddd %xmm0, %xmm0 +; SSE3-FAST-NEXT: phaddd %xmm0, %xmm0 +; SSE3-FAST-NEXT: movd %xmm0, %eax +; SSE3-FAST-NEXT: retq +; +; AVX-SLOW-LABEL: hadd32_4_pgso: +; AVX-SLOW: # %bb.0: +; 
AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] +; AVX-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0 +; AVX-SLOW-NEXT: vphaddd %xmm0, %xmm0, %xmm0 +; AVX-SLOW-NEXT: vmovd %xmm0, %eax +; AVX-SLOW-NEXT: retq +; +; AVX-FAST-LABEL: hadd32_4_pgso: +; AVX-FAST: # %bb.0: +; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0 +; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0 +; AVX-FAST-NEXT: vmovd %xmm0, %eax +; AVX-FAST-NEXT: retq + %x226 = shufflevector <4 x i32> %x225, <4 x i32> undef, <4 x i32> + %x227 = add <4 x i32> %x225, %x226 + %x228 = shufflevector <4 x i32> %x227, <4 x i32> undef, <4 x i32> + %x229 = add <4 x i32> %x227, %x228 + %x230 = extractelement <4 x i32> %x229, i32 0 + ret i32 %x230 +} + define i32 @hadd32_8_optsize(<8 x i32> %x225) optsize { ; SSE3-LABEL: hadd32_8_optsize: ; SSE3: # %bb.0: @@ -2141,3 +2179,20 @@ define i32 @hadd32_16_optsize(<16 x i32> %x225) optsize { %x230 = extractelement <16 x i32> %x229, i32 0 ret i32 %x230 } + +!llvm.module.flags = !{!0} +!0 = !{i32 1, !"ProfileSummary", !1} +!1 = !{!2, !3, !4, !5, !6, !7, !8, !9} +!2 = !{!"ProfileFormat", !"InstrProf"} +!3 = !{!"TotalCount", i64 10000} +!4 = !{!"MaxCount", i64 10} +!5 = !{!"MaxInternalCount", i64 1} +!6 = !{!"MaxFunctionCount", i64 1000} +!7 = !{!"NumCounts", i64 3} +!8 = !{!"NumFunctions", i64 3} +!9 = !{!"DetailedSummary", !10} +!10 = !{!11, !12, !13} +!11 = !{i32 10000, i64 100, i32 1} +!12 = !{i32 999000, i64 100, i32 1} +!13 = !{i32 999999, i64 1, i32 2} +!14 = !{!"function_entry_count", i64 0} From llvm-commits at lists.llvm.org Thu Jul 9 09:58:14 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:58:14 +0000 (UTC) Subject: [PATCH] D83470: [LV] Fix versioning-for-unit-stide of loops with small trip count In-Reply-To: References: Message-ID: <8f06a55e5b4cb92fabfbad0e17e1263b@localhost.localdomain> fhahn accepted this revision. fhahn added a comment. This revision is now accepted and ready to land. LGTM, thanks! 
> In such cases, the loop vectorizer should either re-run the analysis or bail-out from vectorizing the loop, as done prior to D81345 . The latter is chosen for now as the former requires refactoring. As already discussed in D81345 , ideally LV would have more flexibility to drive LAA, but this requires non-trivial refactoring. Which we should do, but until then the patch looks like a reasonable fix to the crash. ================ Comment at: llvm/test/Transforms/LoopVectorize/optsize.ll:239 + %l1.02 = phi i16 [ 1, %entry ], [ %inc9, %for.body ] + %mul = mul nsw i16 %l1.02, undef + %arrayidx6 = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 %mul ---------------- Better to use a non-undef constant/value? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83470/new/ https://reviews.llvm.org/D83470 From llvm-commits at lists.llvm.org Thu Jul 9 09:58:29 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 16:58:29 +0000 (UTC) Subject: [PATCH] D83421: [RFC] MemorySSAUpdater: Simplify applyUpdates In-Reply-To: References: Message-ID: asbirlea added a comment. In D83421#2141875 , @nhaehnle wrote: > Okay, thank you for that context, that makes sense. I'm going to drop this change then obviously. > > FYI, I'm considering to at least try out making dominator tree construction based on the CfgInterfaceImpl from D83088 , so that GenericDomTreeConstruction is no longer all a giant ball of templates. Do you think this would help you with your changes? Obviously there's the question of how it impacts compile time, but we won't really know for sure until we try it, which is one of my motivations here. I think it's a great idea to improve GenericDomTreeConstruction. Some of the already committed patches did some nice cleanups. I don't have a clear picture yet of what the overlap/conflict will be with D77341 , but we can work this out as you sent out more changes. 
I'll send out an update on the RFC as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83421/new/ https://reviews.llvm.org/D83421 From llvm-commits at lists.llvm.org Thu Jul 9 10:01:14 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:01:14 +0000 (UTC) Subject: [PATCH] D83311: [CodeMoverUtils] Add optional data dependence checks using MSSA In-Reply-To: References: Message-ID: RithikSharma marked an inline comment as done. RithikSharma added inline comments. ================ Comment at: llvm/lib/Transforms/Utils/CodeMoverUtils.cpp:368 // Skip tests when we don't have PDT or DI - if (!PDT || !DI) + if (!PDT || !(DI || MSSAU)) return false; ---------------- fhahn wrote: > RithikSharma wrote: > > fhahn wrote: > > > Does it make sense to even call this function if either of those are not available, i.e. if all those required wouldn't it make sense to assert that they are all provided or turn them into references? > > I'm sorry, I didn't understand. We need at least DI or MSSA to find dependency. > I meant does it make sense to call this function without `PDT == nullptr` for example? It seems like it is kind of required here, right? Got it: why is PDT not a reference if it is required, right? Most code motion clients, for example LICM, don't have PDT, so until we find a way to prove control flow equivalence with some other analysis we need to keep the !PDT check. We did change PDT into a pointer, as we will be expecting nullptr in the near future. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83311/new/ https://reviews.llvm.org/D83311 From llvm-commits at lists.llvm.org Thu Jul 9 10:07:31 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:07:31 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks.
In-Reply-To: References: Message-ID: <83373d3ed3659ca7ca1ca5205d2ed63d@localhost.localdomain> fhahn added inline comments. ================ Comment at: llvm/docs/LangRef.rst:15574 +matrix, using a stride of %Stride between columns. For two consecutive columns +A and B, %Stride refers to the distance (the number of elements) between the +start of column A and the start of column B. The result matrix is linearized ---------------- SjoerdMeijer wrote: > I am actually now also interested in defining `%Stride` better. Using our new definition: > > > For a `R x C` matrix, element `i` of column `j` is at index `j * R + i` in its vector, with indices starting at 0. > > From the description of %Stride it follows that: > > %Stride = ( (j+1) * R + 0) - (j * R + 0) > => > %Stride = R > > So double checking: we can simply the description of %Stride just by saying it is equal to the number of rows, is that correct? Stride can be greater than the number of rows. For example, if you want to load a 2x2 sub-matrix from a 4x4 matrix, you would use `llvm.matrix.column.major.load(%start, 4, false, 2, 2)`, where %start points to the first element of the sub-matrix. The function to compute column addresses has an extensive comment about how things work: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp#L92 It boils down to something like: the start address of column I in memory is computed as `getelementptr %Start, I * Stride`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Thu Jul 9 10:07:36 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:07:36 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: <03338290040f39af1116a77d1b89da38@localhost.localdomain> MaskRay added a comment. Looks good.
Some comment suggestions. ================ Comment at: lld/ELF/InputFiles.cpp:646 + // 1) handle SHF_LINK_ORDER sections. + // 2) create SHT_REL[A} sections. In some cases relocated sections may follow + // the corresponding relocation section. In such a case, the relocation ---------------- jhenderson wrote: > Typo? '}' -> ']' ... the section header index of a relocation section may be smaller than that of the relocated section ================ Comment at: lld/test/ELF/reloc-sec-before-target.test:1 +## In this case we have an object with a relocation section before +## the corresponding relocatable target section. Normally it is not what ---------------- If the section header index of a SHT_REL[A] section is smaller than the section header index of the relocated section ================ Comment at: lld/test/ELF/reloc-sec-before-target.test:3 +## the corresponding relocatable target section. Normally it is not what +## compilers would emit. We have to support it, because some custom tools might +## want to use this feature, which is not restricted by ELF gABI. ---------------- > We have to support it, because some custom tools might want to use this feature, which is not restricted by ELF gABI. Worth mentioning that GNU ld supports this as well. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 From llvm-commits at lists.llvm.org Thu Jul 9 10:08:25 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:08:25 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: MaskRay added inline comments. ================ Comment at: lld/test/ELF/reloc-sec-before-target.test:1 +## In this case we have an object with a relocation section before +## the corresponding relocatable target section. 
Normally it is not what ---------------- MaskRay wrote: > If the section header index of a SHT_REL[A] section is smaller than the section header index of the relocated section `reloc-sec-before-relocated.test` might be a better name. (target -> relocated) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 From llvm-commits at lists.llvm.org Thu Jul 9 10:09:10 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:09:10 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: <1f2c23ee82fb766dc05947de0ea8fb24@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/test/ELF/reloc-sec-before-target.test:3 +## the corresponding relocatable target section. Normally it is not what +## compilers would emit. We have to support it, because some custom tools might +## want to use this feature, which is not restricted by ELF gABI. ---------------- MaskRay wrote: > > We have to support it, because some custom tools might want to use this feature, which is not restricted by ELF gABI. > > Worth mentioning that GNU ld supports this as well. This should also be mentioned in the description. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 From llvm-commits at lists.llvm.org Thu Jul 9 10:13:39 2020 From: llvm-commits at lists.llvm.org (Francis Visoiu Mistrih via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:13:39 +0000 (UTC) Subject: [PATCH] D83456: [NFC][AArch64] Refactor getArgumentPopSize In-Reply-To: References: Message-ID: thegameg accepted this revision. thegameg added a comment. LGTM, thanks! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83456/new/ https://reviews.llvm.org/D83456 From llvm-commits at lists.llvm.org Thu Jul 9 10:15:16 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:15:16 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: dmgreen marked an inline comment as done. dmgreen added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:238 VPTransformState(unsigned VF, unsigned UF, LoopInfo *LI, DominatorTree *DT, - IRBuilder<> &Builder, VectorizerValueMap &ValueMap, - InnerLoopVectorizer *ILV, VPCallback &Callback) - : VF(VF), UF(UF), Instance(), LI(LI), DT(DT), Builder(Builder), + const TargetTransformInfo *TTI, IRBuilder<> &Builder, + VectorizerValueMap &ValueMap, InnerLoopVectorizer *ILV, ---------------- gilr wrote: > dmgreen wrote: > > Ayal wrote: > > > Too bad this requires passing TTI through the State everywhere. > > > Perhaps storing TTI in the recipe would be somewhat better. > > I've changed it to be stored there. It does mean multiple things are holding TTI. Let me know what you think. > It seems that TTI is only used later for deciding whether to use a shuffle sequence or an intrinsic based on data available during planning. If so, then it would be best if the Planner calls TTI->useReductionIntrinsic() and records that boolean decision in the Recipe. This is also required in order to estimate in-loop reduction cost. This could be done separately. Do you mean to change the interface to createTargetReduction, to take a bool instead? Yeah I think that sounds good. I'd prefer to do it as a separate review as it does involve changing the interface. I will put a patch together. I was imagining that we would change the cost to use getArithmeticReductionCost, which hopefully handles the details of how the target lowers reductions. 
I haven't looked deeply into the details yet though. That is on the list of things to do. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Thu Jul 9 10:16:03 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:16:03 +0000 (UTC) Subject: [PATCH] D83493: [WebAssembly][NFC] Simplify vector shift lowering and add tests Message-ID: tlively created this revision. tlively added reviewers: aheejin, dschuff. Herald added subscribers: llvm-commits, sunfish, hiraditya, jgravelle-google, sbc100. Herald added a project: LLVM. This patch builds on 0d7286a652 by simplifying the code for detecting splat values and adding new tests demonstrating the lowering of splatted absolute value shift amounts. The lowering is very bad right now, but subsequent patches will improve it considerably. The tests will be useful for evaluating the improvements in those patches. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83493 Files: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83493.276769.patch Type: text/x-patch Size: 5072 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 10:18:34 2020 From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:18:34 +0000 (UTC) Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner In-Reply-To: References: Message-ID: <5cb2a1e5828b4bdfd29a8adb46e50504@localhost.localdomain> davidxl added inline comments. 
================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:217 + +void IRToNativeSizeLearning::FunctionFeatures::fillTensor(int32_t *Ptr) const { + int Pos = 0; ---------------- To avoid this translation, the Features can be a wrapper of the tensor array -- with each field accessed with a symbolic index. ================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:263 + return None; + auto Features = IRToNativeSizeLearning::getFunctionFeatures( + const_cast(F), FAM); ---------------- Can we make getFunctionFeatures directly return the filled tensor -- or at least provide a wrapper? There is no need to expose the TF details with the inline sequence here. ================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:268 + std::vector Output{nullptr}; + if (!Evaluator->evaluate(Output)) + return None; ---------------- Code from line 268 to 271 can probably be wrapped in a single wrapper function to hide TF details including Tensor delete Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 From llvm-commits at lists.llvm.org Thu Jul 9 10:19:00 2020 From: llvm-commits at lists.llvm.org (Vedant Kumar via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:19:00 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: <29e3de8c56d2a3d7324e14d87f2d226c@localhost.localdomain> vsk added a comment. Are test{1,2,3,4,5,6} and main all necessary to exercise the changes in this patch? On the surface, it looks like there are two primary changes -- one that affects the case when deadargelim changes the function return type, and another that affects the case where deadargelim modifies a function that returns an array/struct. Can the test be pared down to just cover those two cases? 
================ Comment at: llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll:4 + +; RUN: opt < %s -deadargelim -check-debugify -S 2>&1 \ +; RUN: | FileCheck %s -check-prefix=DEBUG ---------------- It doesn't look like the -check-debugify output is important, so it shouldn't be necessary to run the pass. Also, since the dbg.values are also not important, please run -debugify with -debugify-level=locations to omit those intrinsics. ================ Comment at: llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll:7 + +; DEBUG: %oldret = extractvalue { i16, i32 } %B, 1, !dbg ![[RET1:.*]] +; DEBUG: %oldret = extractvalue { i32, i16 } %B, 0, !dbg ![[RET2:.*]] ---------------- Please restructure these checks so they have a clear correspondence to a test function. The typical way to write this is: ``` ; CHECK-LABEL: some_test1 ; CHECK: ... define void @some_test1 ; CHECK-LABEL: some_test2 ; CHECK: ... define void @some_test2 ``` etc. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 From llvm-commits at lists.llvm.org Thu Jul 9 10:24:54 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:24:54 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <57aebc092859da977e3de445ffae0c15@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/ELF/Arch/AVR.cpp:98 + case R_AVR_LO8_LDI_NEG: + writeLDI(loc, (~val + 1) & 0xff); + break; ---------------- ~val + 1 = -val So just use (-val) ================ Comment at: lld/test/ELF/avr-reloc.s:3 +# RUN: llvm-mc -filetype=obj -triple=avr -mcpu=atmega328p %s -o %t.o +# RUN: ld.lld %t.o --defsym=a=0x12345678 --defsym=b=30 -o %t.exe -Ttext=0 +# RUN: llvm-objdump -d %t.exe | FileCheck %s ---------------- `-o %t` Omit `.exe` I don't know why `-Ttext=0` is there. 
Use `--image-base=0` if you really want a 0 address text segment. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Thu Jul 9 10:33:34 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Thu, 09 Jul 2020 10:33:34 -0700 (PDT) Subject: [llvm] 0b72b9d - [ValueLattice] Simplify canTrackGlobalVariableInterprocedurally (NFC). Message-ID: <5f0754ee.1c69fb81.5159d.908b@mx.google.com> Author: Florian Hahn Date: 2020-07-09T18:33:09+01:00 New Revision: 0b72b9d07fcdf888a7020f16f8e497f1e83d2d90 URL: https://github.com/llvm/llvm-project/commit/0b72b9d07fcdf888a7020f16f8e497f1e83d2d90 DIFF: https://github.com/llvm/llvm-project/commit/0b72b9d07fcdf888a7020f16f8e497f1e83d2d90.diff LOG: [ValueLattice] Simplify canTrackGlobalVariableInterprocedurally (NFC). using all_of and checking for valid users in the lambda seems more straight forward. Also adds a comment explaining what we are checking. 
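The rewrite described in this log message rests on the identity "no user is bad" == "every user is good". A minimal standalone sketch of that equivalence follows. It does not use LLVM types: `FakeUser`, `canTrackOld`, `canTrackNew`, and `agreeOnAllSingleUsers` are hypothetical names invented for illustration, and the two lambdas only mirror the shape of the old and new code in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a user of a global variable (not an LLVM class).
struct FakeUser {
  enum Kind { Load, Store, Other } kind;
  bool isVolatile;
  bool storesGlobalItself; // only meaningful when kind == Store
};

// Old shape: the global is trackable if no user is "bad".
bool canTrackOld(const std::vector<FakeUser> &users) {
  return !std::any_of(users.begin(), users.end(), [](const FakeUser &u) {
    if (u.kind == FakeUser::Store) {
      if (u.storesGlobalItself || u.isVolatile)
        return true;
    } else if (u.kind == FakeUser::Load) {
      if (u.isVolatile)
        return true;
    } else {
      return true;
    }
    return false;
  });
}

// New shape: the global is trackable if every user is "good".
bool canTrackNew(const std::vector<FakeUser> &users) {
  return std::all_of(users.begin(), users.end(), [](const FakeUser &u) {
    if (u.kind == FakeUser::Store)
      return !u.storesGlobalItself && !u.isVolatile;
    if (u.kind == FakeUser::Load)
      return !u.isVolatile;
    return false;
  });
}

// Exhaustively compares both shapes on every possible single user.
bool agreeOnAllSingleUsers() {
  const FakeUser::Kind kinds[] = {FakeUser::Load, FakeUser::Store,
                                  FakeUser::Other};
  for (FakeUser::Kind k : kinds)
    for (bool vol : {false, true})
      for (bool self : {false, true}) {
        std::vector<FakeUser> users{FakeUser{k, vol, self}};
        if (canTrackOld(users) != canTrackNew(users))
          return false;
      }
  return true;
}
```

Because both predicates are evaluated per user, agreement on every single-user case implies agreement on any list of users; an empty user list is trackable under both shapes.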
Added: Modified: llvm/lib/Analysis/ValueLatticeUtils.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/ValueLatticeUtils.cpp b/llvm/lib/Analysis/ValueLatticeUtils.cpp index 3f9287e26ce7..53638c351f72 100644 --- a/llvm/lib/Analysis/ValueLatticeUtils.cpp +++ b/llvm/lib/Analysis/ValueLatticeUtils.cpp @@ -28,16 +28,14 @@ bool llvm::canTrackGlobalVariableInterprocedurally(GlobalVariable *GV) { if (GV->isConstant() || !GV->hasLocalLinkage() || !GV->hasDefinitiveInitializer()) return false; - return !any_of(GV->users(), [&](User *U) { - if (auto *Store = dyn_cast(U)) { - if (Store->getValueOperand() == GV || Store->isVolatile()) - return true; - } else if (auto *Load = dyn_cast(U)) { - if (Load->isVolatile()) - return true; - } else { - return true; - } + return all_of(GV->users(), [&](User *U) { + // Currently all users of a global variable have to be none-volatile loads + // or stores and the global cannot be stored itself. + if (auto *Store = dyn_cast(U)) + return Store->getValueOperand() != GV && !Store->isVolatile(); + if (auto *Load = dyn_cast(U)) + return !Load->isVolatile(); + return false; }); } From llvm-commits at lists.llvm.org Thu Jul 9 10:36:54 2020 From: llvm-commits at lists.llvm.org (Sean Fertile via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:36:54 +0000 (UTC) Subject: [PATCH] D82816: [LLD][PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC In-Reply-To: References: Message-ID: sfertile accepted this revision. sfertile added a comment. One comment but otherwise LGTM. ================ Comment at: lld/ELF/Arch/PPC64.cpp:1041 return false; // If a function is in the Plt it needs to be called with a call-stub. ---------------- NeHuang wrote: > sfertile wrote: > > We should probably insert a couple of fatal error here: > > 1) if the type in NOTOC and the symbols st_other indicates it needs the toc-pointer setup. 
> > 2) If the type is not NOTOC but the symbols st_other indicates it tramples the toc. > Thanks Sean for the advice. I also moved the fatal error check for the protocol "external call with R_PPC64_REL_NOTOC" here so that we are checking all unimplemented protocols in the same function. 👍 ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:13 +# RUN: ld.lld -T %t.script -shared %t1.o %t2.o -o %t.so +# RUN: ld.lld -T %t.script %t3.o -o %t +# RUN: llvm-readelf -s %t.so | FileCheck %s --check-prefix=SYMBOL ---------------- Nit: I suggest having a separate set of run steps for testing the exec since the steps are completely disjoint from the shared object test. ie separate out ``` # RUN: llvm-mc -filetype=obj -triple=powerpc64[le] -defsym GLOBAL=1 %s -o %t3.o # RUN: ld.lld -T %t.script %t3.o -o %t # RUN: llvm-readelf -s %t | FileCheck %s --check-prefix=SYMBOL-GLOBAL # RUN: llvm-objdump -d --no-show-raw-insn --mcpu=pwr10 %t | FileCheck %s ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Thu Jul 9 10:39:54 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via llvm-commits) Date: Thu, 09 Jul 2020 10:39:54 -0700 (PDT) Subject: [lld] 6f7727d - [PowerPC] Fix test case from beb52b12cb17 Message-ID: <5f07566a.1c69fb81.bf585.828d@mx.google.com> Author: Stefan Pintilie Date: 2020-07-09T12:39:24-05:00 New Revision: 6f7727db478b452a262b2beea2beceef096eb68c URL: https://github.com/llvm/llvm-project/commit/6f7727db478b452a262b2beea2beceef096eb68c DIFF: https://github.com/llvm/llvm-project/commit/6f7727db478b452a262b2beea2beceef096eb68c.diff LOG: [PowerPC] Fix test case from beb52b12cb17 Forgot to add the REQUIRES ppc line to the test. 
Added: Modified: lld/test/ELF/ppc64-error-toc-local-call.s Removed: ################################################################################ diff --git a/lld/test/ELF/ppc64-error-toc-local-call.s b/lld/test/ELF/ppc64-error-toc-local-call.s index f23eba101209..606f6ead5463 100644 --- a/lld/test/ELF/ppc64-error-toc-local-call.s +++ b/lld/test/ELF/ppc64-error-toc-local-call.s @@ -1,3 +1,4 @@ +# REQUIRES: ppc # RUN: llvm-mc -filetype=obj -triple=powerpc64le %s -o %t.o # RUN: not ld.lld %t.o -o /dev/null 2>&1 | FileCheck %s From llvm-commits at lists.llvm.org Thu Jul 9 10:40:33 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Thu, 09 Jul 2020 10:40:33 -0700 (PDT) Subject: [llvm] 3e75912 - [X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. Message-ID: <5f075691.1c69fb81.4dd6f.7729@mx.google.com> Author: Craig Topper Date: 2020-07-09T10:40:09-07:00 New Revision: 3e75912005cbbdc7c7244b73319cb7441e64682f URL: https://github.com/llvm/llvm-project/commit/3e75912005cbbdc7c7244b73319cb7441e64682f DIFF: https://github.com/llvm/llvm-project/commit/3e75912005cbbdc7c7244b73319cb7441e64682f.diff LOG: [X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. Technically a VSELECT expects a vector of all 1s or 0s elements for its condition. But we aren't guaranteeing that the sign bit and the non sign bits match in these locations. So we should use BLENDV which is more relaxed. 
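To make the distinction in this log message concrete, here is a rough one-byte model, written as a sketch rather than as a description of the actual lowering. `bitwiseSelect` is an assumed model of a select that is only well defined when each mask lane is all ones or all zeros, and `signBitSelect` is an assumed model of the PBLENDVB behavior mentioned in the diff comments (only the sign bit of each mask byte is consulted). Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of a bitwise select: it picks whole lanes cleanly only if
// the mask byte is 0x00 or 0xFF.
uint8_t bitwiseSelect(uint8_t mask, uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((mask & a) | (~mask & b));
}

// Assumed model of a sign-bit blend for one byte: only bit 7 of the mask
// decides which input is taken.
uint8_t signBitSelect(uint8_t mask, uint8_t a, uint8_t b) {
  return (mask & 0x80) ? a : b;
}

// True when the two models pick the same byte for this mask and inputs.
bool modelsAgree(uint8_t mask, uint8_t a, uint8_t b) {
  return bitwiseSelect(mask, a, b) == signBitSelect(mask, a, b);
}
```

With a mask byte of 0xFF or 0x00 the two models agree, but with 0x80 (sign bit set, remaining bits clear) they can disagree, which is the situation the log message describes as not guaranteed to match.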
Differential Revision: https://reviews.llvm.org/D83447 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 2d6a0c731862..afb356e2cf96 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -27558,12 +27558,13 @@ static SDValue LowerShift(SDValue Op, const X86Subtarget &Subtarget, ISD::SETGT); return DAG.getBitcast(SelVT, DAG.getSelect(dl, VT, Sel, V0, V1)); } else if (Subtarget.hasSSE41()) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just + // on the sign bit. V0 = DAG.getBitcast(VT, V0); V1 = DAG.getBitcast(VT, V1); Sel = DAG.getBitcast(VT, Sel); - return DAG.getBitcast(SelVT, DAG.getSelect(dl, VT, Sel, V0, V1)); + return DAG.getBitcast(SelVT, + DAG.getNode(X86ISD::BLENDV, dl, VT, Sel, V0, V1)); } // On pre-SSE41 targets we test for the sign bit by comparing to // zero - a negative value will set all bits of the lanes to true @@ -27673,14 +27674,15 @@ static SDValue LowerShift(SDValue Op, const X86Subtarget &Subtarget, !ISD::isBuildVectorOfConstantSDNodes(Amt.getNode()); auto SignBitSelect = [&](SDValue Sel, SDValue V0, SDValue V1) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just on + // the sign bit. 
if (UseSSE41) { MVT ExtVT = MVT::getVectorVT(MVT::i8, VT.getVectorNumElements() * 2); V0 = DAG.getBitcast(ExtVT, V0); V1 = DAG.getBitcast(ExtVT, V1); Sel = DAG.getBitcast(ExtVT, Sel); - return DAG.getBitcast(VT, DAG.getSelect(dl, ExtVT, Sel, V0, V1)); + return DAG.getBitcast( + VT, DAG.getNode(X86ISD::BLENDV, dl, ExtVT, Sel, V0, V1)); } // On pre-SSE41 targets we splat the sign bit - a negative value will // set all bits of the lanes to true and VSELECT uses that in @@ -27820,12 +27822,13 @@ static SDValue LowerRotate(SDValue Op, const X86Subtarget &Subtarget, auto SignBitSelect = [&](MVT SelVT, SDValue Sel, SDValue V0, SDValue V1) { if (Subtarget.hasSSE41()) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just + // on the sign bit. V0 = DAG.getBitcast(VT, V0); V1 = DAG.getBitcast(VT, V1); Sel = DAG.getBitcast(VT, Sel); - return DAG.getBitcast(SelVT, DAG.getSelect(DL, VT, Sel, V0, V1)); + return DAG.getBitcast(SelVT, + DAG.getNode(X86ISD::BLENDV, DL, VT, Sel, V0, V1)); } // On pre-SSE41 targets we test for the sign bit by comparing to // zero - a negative value will set all bits of the lanes to true From llvm-commits at lists.llvm.org Thu Jul 9 10:40:46 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:40:46 +0000 (UTC) Subject: [PATCH] D83447: [X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. In-Reply-To: References: Message-ID: <7a50f11a988aa8b2d4050fbc604fcce7@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG3e75912005cb: [X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were… (authored by craig.topper). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83447/new/ https://reviews.llvm.org/D83447 Files: llvm/lib/Target/X86/X86ISelLowering.cpp Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -27558,12 +27558,13 @@ ISD::SETGT); return DAG.getBitcast(SelVT, DAG.getSelect(dl, VT, Sel, V0, V1)); } else if (Subtarget.hasSSE41()) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just + // on the sign bit. V0 = DAG.getBitcast(VT, V0); V1 = DAG.getBitcast(VT, V1); Sel = DAG.getBitcast(VT, Sel); - return DAG.getBitcast(SelVT, DAG.getSelect(dl, VT, Sel, V0, V1)); + return DAG.getBitcast(SelVT, + DAG.getNode(X86ISD::BLENDV, dl, VT, Sel, V0, V1)); } // On pre-SSE41 targets we test for the sign bit by comparing to // zero - a negative value will set all bits of the lanes to true @@ -27673,14 +27674,15 @@ !ISD::isBuildVectorOfConstantSDNodes(Amt.getNode()); auto SignBitSelect = [&](SDValue Sel, SDValue V0, SDValue V1) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just on + // the sign bit. 
if (UseSSE41) { MVT ExtVT = MVT::getVectorVT(MVT::i8, VT.getVectorNumElements() * 2); V0 = DAG.getBitcast(ExtVT, V0); V1 = DAG.getBitcast(ExtVT, V1); Sel = DAG.getBitcast(ExtVT, Sel); - return DAG.getBitcast(VT, DAG.getSelect(dl, ExtVT, Sel, V0, V1)); + return DAG.getBitcast( + VT, DAG.getNode(X86ISD::BLENDV, dl, ExtVT, Sel, V0, V1)); } // On pre-SSE41 targets we splat the sign bit - a negative value will // set all bits of the lanes to true and VSELECT uses that in @@ -27820,12 +27822,13 @@ auto SignBitSelect = [&](MVT SelVT, SDValue Sel, SDValue V0, SDValue V1) { if (Subtarget.hasSSE41()) { - // On SSE41 targets we make use of the fact that VSELECT lowers - // to PBLENDVB which selects bytes based just on the sign bit. + // On SSE41 targets we can use PBLENDVB which selects bytes based just + // on the sign bit. V0 = DAG.getBitcast(VT, V0); V1 = DAG.getBitcast(VT, V1); Sel = DAG.getBitcast(VT, Sel); - return DAG.getBitcast(SelVT, DAG.getSelect(DL, VT, Sel, V0, V1)); + return DAG.getBitcast(SelVT, + DAG.getNode(X86ISD::BLENDV, DL, VT, Sel, V0, V1)); } // On pre-SSE41 targets we test for the sign bit by comparing to // zero - a negative value will set all bits of the lanes to true -------------- next part -------------- A non-text attachment was scrubbed... Name: D83447.276779.patch Type: text/x-patch Size: 2945 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 10:44:01 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via llvm-commits) Date: Thu, 09 Jul 2020 10:44:01 -0700 (PDT) Subject: [llvm] 2c1a900 - [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering. 
Message-ID: <5f075761.1c69fb81.28f27.87a7@mx.google.com> Author: Hiroshi Yamauchi Date: 2020-07-09T10:43:45-07:00 New Revision: 2c1a9006dd73e4e9f446fb6fa6cf2328eee1558f URL: https://github.com/llvm/llvm-project/commit/2c1a9006dd73e4e9f446fb6fa6cf2328eee1558f DIFF: https://github.com/llvm/llvm-project/commit/2c1a9006dd73e4e9f446fb6fa6cf2328eee1558f.diff LOG: [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avx-vperm2x128.ll llvm/test/CodeGen/X86/phaddsub-extract.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index afb356e2cf96..cfbf38c56bc9 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -34448,7 +34448,7 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, return DAG.getBitcast(RootVT, V1); } - bool OptForSize = DAG.getMachineFunction().getFunction().hasOptSize(); + bool OptForSize = DAG.shouldOptForSize(); unsigned RootSizeInBits = RootVT.getSizeInBits(); unsigned NumRootElts = RootVT.getVectorNumElements(); unsigned BaseMaskEltSizeInBits = RootSizeInBits / NumBaseMaskElts; @@ -39290,7 +39290,7 @@ static SDValue combineReductionToHorizontal(SDNode *ExtElt, SelectionDAG &DAG, } // Only use (F)HADD opcodes if they aren't microcoded or minimizes codesize. 
- bool OptForSize = DAG.getMachineFunction().getFunction().hasOptSize(); + bool OptForSize = DAG.shouldOptForSize(); if (!Subtarget.hasFastHorizontalOps() && !OptForSize) return SDValue(); diff --git a/llvm/test/CodeGen/X86/avx-vperm2x128.ll b/llvm/test/CodeGen/X86/avx-vperm2x128.ll index 8310cc9b50a3..26a5cd328d5c 100644 --- a/llvm/test/CodeGen/X86/avx-vperm2x128.ll +++ b/llvm/test/CodeGen/X86/avx-vperm2x128.ll @@ -397,8 +397,7 @@ define <4 x double> @shuffle_v4f64_zz23_optsize(<4 x double> %a) optsize { define <4 x double> @shuffle_v4f64_zz23_pgso(<4 x double> %a) !prof !14 { ; ALL-LABEL: shuffle_v4f64_zz23_pgso: ; ALL: # %bb.0: -; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1 -; ALL-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] +; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = zero,zero,ymm0[2,3] ; ALL-NEXT: retq %s = shufflevector <4 x double> %a, <4 x double> , <4 x i32> ret <4 x double> %s @@ -441,8 +440,7 @@ define <4 x double> @shuffle_v4f64_zz67_optsize(<4 x double> %a) optsize { define <4 x double> @shuffle_v4f64_zz67_pgso(<4 x double> %a) !prof !14 { ; ALL-LABEL: shuffle_v4f64_zz67_pgso: ; ALL: # %bb.0: -; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1 -; ALL-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] +; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = zero,zero,ymm0[2,3] ; ALL-NEXT: retq %s = shufflevector <4 x double> , <4 x double> %a, <4 x i32> ret <4 x double> %s diff --git a/llvm/test/CodeGen/X86/phaddsub-extract.ll b/llvm/test/CodeGen/X86/phaddsub-extract.ll index f475a31b7d29..dd258c5f424a 100644 --- a/llvm/test/CodeGen/X86/phaddsub-extract.ll +++ b/llvm/test/CodeGen/X86/phaddsub-extract.ll @@ -2095,35 +2095,19 @@ define i32 @hadd32_4_optsize(<4 x i32> %x225) optsize { } define i32 @hadd32_4_pgso(<4 x i32> %x225) !prof !14 { -; SSE3-SLOW-LABEL: hadd32_4_pgso: -; SSE3-SLOW: # %bb.0: -; SSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] -; SSE3-SLOW-NEXT: paddd %xmm0, %xmm1 -; SSE3-SLOW-NEXT: phaddd %xmm1, %xmm1 -; SSE3-SLOW-NEXT: movd %xmm1, %eax -; 
SSE3-SLOW-NEXT: retq -; -; SSE3-FAST-LABEL: hadd32_4_pgso: -; SSE3-FAST: # %bb.0: -; SSE3-FAST-NEXT: phaddd %xmm0, %xmm0 -; SSE3-FAST-NEXT: phaddd %xmm0, %xmm0 -; SSE3-FAST-NEXT: movd %xmm0, %eax -; SSE3-FAST-NEXT: retq -; -; AVX-SLOW-LABEL: hadd32_4_pgso: -; AVX-SLOW: # %bb.0: -; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1] -; AVX-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0 -; AVX-SLOW-NEXT: vphaddd %xmm0, %xmm0, %xmm0 -; AVX-SLOW-NEXT: vmovd %xmm0, %eax -; AVX-SLOW-NEXT: retq +; SSE3-LABEL: hadd32_4_pgso: +; SSE3: # %bb.0: +; SSE3-NEXT: phaddd %xmm0, %xmm0 +; SSE3-NEXT: phaddd %xmm0, %xmm0 +; SSE3-NEXT: movd %xmm0, %eax +; SSE3-NEXT: retq ; -; AVX-FAST-LABEL: hadd32_4_pgso: -; AVX-FAST: # %bb.0: -; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0 -; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0 -; AVX-FAST-NEXT: vmovd %xmm0, %eax -; AVX-FAST-NEXT: retq +; AVX-LABEL: hadd32_4_pgso: +; AVX: # %bb.0: +; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0 +; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0 +; AVX-NEXT: vmovd %xmm0, %eax +; AVX-NEXT: retq %x226 = shufflevector <4 x i32> %x225, <4 x i32> undef, <4 x i32> %x227 = add <4 x i32> %x225, %x226 %x228 = shufflevector <4 x i32> %x227, <4 x i32> undef, <4 x i32> From llvm-commits at lists.llvm.org Thu Jul 9 10:49:25 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Thu, 09 Jul 2020 10:49:25 -0700 (PDT) Subject: [llvm] 8769611 - Remove unnecessary 'rm' in llvm-reduce tests Message-ID: <5f0758a5.1c69fb81.660e8.8a5f@mx.google.com> Author: David Blaikie Date: 2020-07-09T10:49:11-07:00 New Revision: 8769611f0af2598177d8d03ad6dbbe064210bfed URL: https://github.com/llvm/llvm-project/commit/8769611f0af2598177d8d03ad6dbbe064210bfed DIFF: https://github.com/llvm/llvm-project/commit/8769611f0af2598177d8d03ad6dbbe064210bfed.diff LOG: Remove unnecessary 'rm' in llvm-reduce tests These were initially added to cleanup some transient/leftover files in r372054. Now that's all cleaned up, these are no longer needed. 
Added: Modified: llvm/test/Reduce/remove-args.ll llvm/test/Reduce/remove-funcs.ll llvm/test/Reduce/remove-global-vars.ll llvm/test/Reduce/remove-metadata.ll llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll llvm/test/Reduce/remove-operand-bundles.ll Removed: ################################################################################ diff --git a/llvm/test/Reduce/remove-args.ll b/llvm/test/Reduce/remove-args.ll index 8d6130262bf5..161a6fd3731b 100644 --- a/llvm/test/Reduce/remove-args.ll +++ b/llvm/test/Reduce/remove-args.ll @@ -1,6 +1,5 @@ ; Test that llvm-reduce can remove uninteresting function arguments from function definitions as well as their calls. ; -; RUN: rm -rf %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-args.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s diff --git a/llvm/test/Reduce/remove-funcs.ll b/llvm/test/Reduce/remove-funcs.ll index 8e9b2579a974..59ffd849193d 100644 --- a/llvm/test/Reduce/remove-funcs.ll +++ b/llvm/test/Reduce/remove-funcs.ll @@ -1,7 +1,6 @@ ; Test that llvm-reduce can remove uninteresting functions as well as ; their InstCalls. ; -; RUN: rm -rf %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-funcs.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s diff --git a/llvm/test/Reduce/remove-global-vars.ll b/llvm/test/Reduce/remove-global-vars.ll index 921083fc93b0..4fca4a1e6973 100644 --- a/llvm/test/Reduce/remove-global-vars.ll +++ b/llvm/test/Reduce/remove-global-vars.ll @@ -1,7 +1,6 @@ ; Test that llvm-reduce can remove uninteresting Global Variables as well as ; their direct uses (which in turn are replaced with 'undef'). 
; -; RUN: rm -rf %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-global-vars.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s diff --git a/llvm/test/Reduce/remove-metadata.ll b/llvm/test/Reduce/remove-metadata.ll index da7d5a2f16b4..51a50ca20a98 100644 --- a/llvm/test/Reduce/remove-metadata.ll +++ b/llvm/test/Reduce/remove-metadata.ll @@ -1,7 +1,6 @@ ; Test that llvm-reduce can remove uninteresting metadata from an IR file. ; The Metadata pass erases named & unnamed metadata nodes. ; -; RUN: rm -rf %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-metadata.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=! %s diff --git a/llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll b/llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll index 4d280bcf2f31..21a638f1e6bc 100644 --- a/llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll +++ b/llvm/test/Reduce/remove-multiple-use-of-args-in-same-instruction.ll @@ -1,6 +1,5 @@ ; Test that llvm-reduce can remove uninteresting function arguments from function definitions as well as their calls. ; -; RUN: rm -rf %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-multiple-use-of-args-in-same-instruction.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s diff --git a/llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll b/llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll index c8b639099fd3..4400bc818e55 100644 --- a/llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll +++ b/llvm/test/Reduce/remove-multiple-use-of-global-vars-in-same-instruction.ll @@ -1,6 +1,5 @@ ; Test that llvm-reduce can remove uninteresting function arguments from function definitions as well as their calls. 
; -; RUN: rm -rf %t ; RUN: llvm-reduce --test %python --test-arg %p/Inputs/remove-multiple-use-of-global-vars-in-same-instruction.py %s -o %t ; RUN: cat %t | FileCheck -implicit-check-not=uninteresting %s diff --git a/llvm/test/Reduce/remove-operand-bundles.ll b/llvm/test/Reduce/remove-operand-bundles.ll index 39c0af6c9ae5..b0f3af6dbc85 100644 --- a/llvm/test/Reduce/remove-operand-bundles.ll +++ b/llvm/test/Reduce/remove-operand-bundles.ll @@ -1,6 +1,5 @@ ; Test that llvm-reduce can remove uninteresting operand bundles from calls. ; -; RUN: rm -rf %t ; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t ; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s From llvm-commits at lists.llvm.org Thu Jul 9 10:49:41 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:49:41 +0000 (UTC) Subject: [PATCH] D83432: [NFC][llvm-reduce] Don't `rm -rf` in tests, `rm -f` is enough In-Reply-To: References: Message-ID: <2101c8d31be948edc619b2b376e5d81e@localhost.localdomain> dblaikie added a comment. In D83432#2141025 , @lebedev.ri wrote: > In D83432#2140532 , @dblaikie wrote: > > > If the test is writing the output file anyway - is the rm necessary? (lots of tests write to output files via "-o %t" from some tool or another and most don't delete %t before doing so) > > > They were added by you in rL372054 , so indeed the revert of that commit would be the best solution now. 
Thought it might've been my fault :) great - removed in 8769611f0af2598177d8d03ad6dbbe064210bfed


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83432/new/

https://reviews.llvm.org/D83432

From llvm-commits at lists.llvm.org Thu Jul 9 10:50:58 2020
From: llvm-commits at lists.llvm.org (Alex Bradbury via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 17:50:58 +0000 (UTC)
Subject: [PATCH] D77030: [RISCV] refactor FeatureRVCHints to make ProcessorModel more intuitive
In-Reply-To:
References:
Message-ID: <4aa9889ae59e43a4885173976e601869@localhost.localdomain>

asb accepted this revision.
asb added a comment.

Let's go for it! Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77030/new/

https://reviews.llvm.org/D77030

From llvm-commits at lists.llvm.org Thu Jul 9 10:52:00 2020
From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits)
Date: Thu, 09 Jul 2020 10:52:00 -0700 (PDT)
Subject: [llvm] 918e653 - [X86] Immediately call LowerShift from lowerBuildVectorToBitOp.
Message-ID: <5f075940.1c69fb81.38726.8238@mx.google.com>

Author: Craig Topper
Date: 2020-07-09T10:51:29-07:00
New Revision: 918e6531863187d65895fd68bbc622369b3d79f3

URL: https://github.com/llvm/llvm-project/commit/918e6531863187d65895fd68bbc622369b3d79f3
DIFF: https://github.com/llvm/llvm-project/commit/918e6531863187d65895fd68bbc622369b3d79f3.diff

LOG: [X86] Immediately call LowerShift from lowerBuildVectorToBitOp.

If we don't immediately lower the vector shift, the splat constant vector we created may get turned into a constant pool load before we get around to lowering the shift. This makes it a lot more difficult to create a shift by constant. Sometimes we fail to see through the constant pool at all and end up trying to lower as if it was a variable shift. This requires custom handling and may create an unsupported vselect on pre-sse-4.1 targets. Since we're after LegalizeVectorOps we are unable to legalize the unsupported vselect as that code is in LegalizeVectorOps rather than LegalizeDAG. So calling LowerShift immediately ensures that we get to see the splat constant.

Fixes PR46527.

Differential Revision: https://reviews.llvm.org/D83455

Added:
    llvm/test/CodeGen/X86/pr46527.ll

Modified:
    llvm/lib/Target/X86/X86ISelLowering.cpp

Removed:


################################################################################
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index cfbf38c56bc9..cdd7ba1ec432 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -9689,6 +9689,9 @@ static SDValue LowerToHorizontalOp(const BuildVectorSDNode *BV,
 return SDValue();
 }
+static SDValue LowerShift(SDValue Op, const X86Subtarget &Subtarget,
+ SelectionDAG &DAG);
+
 /// If a BUILD_VECTOR's source elements all apply the same bit operation and
 /// one of their operands is constant, lower to a pair of BUILD_VECTOR and
 /// just apply the bit to the vectors.
@@ -9696,6 +9699,7 @@ static SDValue LowerToHorizontalOp(const BuildVectorSDNode *BV,
 /// from this, but enough scalar bit operations are created from the later
 /// legalization + scalarization stages to need basic support.
 static SDValue lowerBuildVectorToBitOp(BuildVectorSDNode *Op,
+ const X86Subtarget &Subtarget,
 SelectionDAG &DAG) {
 SDLoc DL(Op);
 MVT VT = Op->getSimpleValueType(0);
@@ -9759,7 +9763,14 @@ static SDValue lowerBuildVectorToBitOp(BuildVectorSDNode *Op,
 SDValue LHS = DAG.getBuildVector(VT, DL, LHSElts);
 SDValue RHS = DAG.getBuildVector(VT, DL, RHSElts);
- return DAG.getNode(Opcode, DL, VT, LHS, RHS);
+ SDValue Res = DAG.getNode(Opcode, DL, VT, LHS, RHS);
+
+ if (!IsShift)
+ return Res;
+
+ // Immediately lower the shift to ensure the constant build vector doesn't
+ // get converted to a constant pool before the shift is lowered.
+ return LowerShift(Res, Subtarget, DAG); } /// Create a vector constant without a load. SSE/AVX provide the bare minimum @@ -10115,7 +10126,7 @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const { return HorizontalOp; if (SDValue Broadcast = lowerBuildVectorAsBroadcast(BV, Subtarget, DAG)) return Broadcast; - if (SDValue BitOp = lowerBuildVectorToBitOp(BV, DAG)) + if (SDValue BitOp = lowerBuildVectorToBitOp(BV, Subtarget, DAG)) return BitOp; unsigned EVTBits = EltVT.getSizeInBits(); diff --git a/llvm/test/CodeGen/X86/pr46527.ll b/llvm/test/CodeGen/X86/pr46527.ll new file mode 100644 index 000000000000..1d3f16f8c1ae --- /dev/null +++ b/llvm/test/CodeGen/X86/pr46527.ll @@ -0,0 +1,36 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +;RUN: llc < %s -mtriple=i686-unknown -mattr=sse2 -relocation-model=pic | FileCheck %s + +define void @f(<16 x i8>* %out, <16 x i8> %in, i1 %flag) { +; CHECK-LABEL: f: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: calll .L0$pb +; CHECK-NEXT: .cfi_adjust_cfa_offset 4 +; CHECK-NEXT: .L0$pb: +; CHECK-NEXT: popl %eax +; CHECK-NEXT: .cfi_adjust_cfa_offset -4 +; CHECK-NEXT: .Ltmp0: +; CHECK-NEXT: addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax +; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx +; CHECK-NEXT: movb {{[0-9]+}}(%esp), %dl +; CHECK-NEXT: notb %dl +; CHECK-NEXT: andb $1, %dl +; CHECK-NEXT: movzbl %dl, %edx +; CHECK-NEXT: movd %edx, %xmm1 +; CHECK-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7] +; CHECK-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[0,0,2,3,4,5,6,7] +; CHECK-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0] +; CHECK-NEXT: paddb %xmm1, %xmm1 +; CHECK-NEXT: pxor %xmm0, %xmm1 +; CHECK-NEXT: pxor {{\.LCPI.*}}@GOTOFF(%eax), %xmm1 +; CHECK-NEXT: movdqa %xmm1, (%ecx) +; CHECK-NEXT: retl +entry: + %0 = select i1 %flag, i8 0, i8 2 + %1 = insertelement <16 x i8> undef, i8 %0, i32 0 + %2 = shufflevector <16 x i8> %1, <16 x i8> undef, <16 x i32> zeroinitializer + %3 = xor <16 x 
i8> %2, %in + %4 = xor <16 x i8> %3, + store <16 x i8> %4, <16 x i8>* %out, align 16 + ret void +} From llvm-commits at lists.llvm.org Thu Jul 9 10:52:12 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 17:52:12 +0000 (UTC) Subject: [PATCH] D83455: [X86] Immediately call LowerShift from lowerBuildVectorToBitOp. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG918e65318631: [X86] Immediately call LowerShift from lowerBuildVectorToBitOp. (authored by craig.topper). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83455/new/ https://reviews.llvm.org/D83455 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/pr46527.ll Index: llvm/test/CodeGen/X86/pr46527.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/X86/pr46527.ll @@ -0,0 +1,36 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +;RUN: llc < %s -mtriple=i686-unknown -mattr=sse2 -relocation-model=pic | FileCheck %s + +define void @f(<16 x i8>* %out, <16 x i8> %in, i1 %flag) { +; CHECK-LABEL: f: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: calll .L0$pb +; CHECK-NEXT: .cfi_adjust_cfa_offset 4 +; CHECK-NEXT: .L0$pb: +; CHECK-NEXT: popl %eax +; CHECK-NEXT: .cfi_adjust_cfa_offset -4 +; CHECK-NEXT: .Ltmp0: +; CHECK-NEXT: addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax +; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx +; CHECK-NEXT: movb {{[0-9]+}}(%esp), %dl +; CHECK-NEXT: notb %dl +; CHECK-NEXT: andb $1, %dl +; CHECK-NEXT: movzbl %dl, %edx +; CHECK-NEXT: movd %edx, %xmm1 +; CHECK-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7] +; CHECK-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[0,0,2,3,4,5,6,7] +; CHECK-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0] +; CHECK-NEXT: paddb %xmm1, %xmm1 +; CHECK-NEXT: pxor %xmm0, %xmm1 +; CHECK-NEXT: pxor 
{{\.LCPI.*}}@GOTOFF(%eax), %xmm1 +; CHECK-NEXT: movdqa %xmm1, (%ecx) +; CHECK-NEXT: retl +entry: + %0 = select i1 %flag, i8 0, i8 2 + %1 = insertelement <16 x i8> undef, i8 %0, i32 0 + %2 = shufflevector <16 x i8> %1, <16 x i8> undef, <16 x i32> zeroinitializer + %3 = xor <16 x i8> %2, %in + %4 = xor <16 x i8> %3, + store <16 x i8> %4, <16 x i8>* %out, align 16 + ret void +} Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -9689,6 +9689,9 @@ return SDValue(); } +static SDValue LowerShift(SDValue Op, const X86Subtarget &Subtarget, + SelectionDAG &DAG); + /// If a BUILD_VECTOR's source elements all apply the same bit operation and /// one of their operands is constant, lower to a pair of BUILD_VECTOR and /// just apply the bit to the vectors. @@ -9696,6 +9699,7 @@ /// from this, but enough scalar bit operations are created from the later /// legalization + scalarization stages to need basic support. static SDValue lowerBuildVectorToBitOp(BuildVectorSDNode *Op, + const X86Subtarget &Subtarget, SelectionDAG &DAG) { SDLoc DL(Op); MVT VT = Op->getSimpleValueType(0); @@ -9759,7 +9763,14 @@ SDValue LHS = DAG.getBuildVector(VT, DL, LHSElts); SDValue RHS = DAG.getBuildVector(VT, DL, RHSElts); - return DAG.getNode(Opcode, DL, VT, LHS, RHS); + SDValue Res = DAG.getNode(Opcode, DL, VT, LHS, RHS); + + if (!IsShift) + return Res; + + // Immediately lower the shift to ensure the constant build vector doesn't + // get converted to a constant pool before the shift is lowered. + return LowerShift(Res, Subtarget, DAG); } /// Create a vector constant without a load. 
SSE/AVX provide the bare minimum
@@ -10115,7 +10126,7 @@
 return HorizontalOp;
 if (SDValue Broadcast = lowerBuildVectorAsBroadcast(BV, Subtarget, DAG))
 return Broadcast;
- if (SDValue BitOp = lowerBuildVectorToBitOp(BV, DAG))
+ if (SDValue BitOp = lowerBuildVectorToBitOp(BV, Subtarget, DAG))
 return BitOp;
 unsigned EVTBits = EltVT.getSizeInBits();

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83455.276780.patch
Type: text/x-patch
Size: 3589 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 10:52:17 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 17:52:17 +0000 (UTC)
Subject: [PATCH] D83432: [NFC][llvm-reduce] Don't `rm -rf` in tests, `rm -f` is enough
In-Reply-To:
References:
Message-ID: <20862b72786a64a4c7a0dcaac69710a4@localhost.localdomain>

lebedev.ri abandoned this revision.
lebedev.ri added a comment.

In D83432#2142154, @dblaikie wrote:

> In D83432#2141025, @lebedev.ri wrote:
>
> > In D83432#2140532, @dblaikie wrote:
> >
> > > If the test is writing the output file anyway - is the rm necessary? (lots of tests write to output files via "-o %t" from some tool or another and most don't delete %t before doing so)
> >
> >
> > They were added by you in rL372054, so indeed the revert of that commit would be the best solution now.
>
>
> Thought it might've been my fault :) great - removed in 8769611f0af2598177d8d03ad6dbbe064210bfed


Thanks, that'll do.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83432/new/ https://reviews.llvm.org/D83432 From llvm-commits at lists.llvm.org Thu Jul 9 11:01:38 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Thu, 09 Jul 2020 11:01:38 -0700 (PDT) Subject: [llvm] 122b064 - [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison Message-ID: <5f075b82.1c69fb81.af42.8cbb@mx.google.com> Author: Craig Topper Date: 2020-07-09T11:01:12-07:00 New Revision: 122b0640fc97202bacb630744dfc6da58f11af42 URL: https://github.com/llvm/llvm-project/commit/122b0640fc97202bacb630744dfc6da58f11af42 DIFF: https://github.com/llvm/llvm-project/commit/122b0640fc97202bacb630744dfc6da58f11af42.diff LOG: [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison We can't fold to the non-undef value unless we know it isn't poison. So check each element with isGuaranteedNotToBeUndefOrPoison. This currently rules out all constant expressions. Differential Revision: https://reviews.llvm.org/D83442 Added: Modified: llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstSimplify/select.ll Removed: ################################################################################ diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp index 8cd5d2034586..277e2907fa04 100644 --- a/llvm/lib/Analysis/InstructionSimplify.cpp +++ b/llvm/lib/Analysis/InstructionSimplify.cpp @@ -4135,9 +4135,11 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal, // one element is undef, choose the defined element as the safe result. 
if (TEltC == FEltC) NewC.push_back(TEltC); - else if (isa(TEltC)) + else if (isa(TEltC) && + isGuaranteedNotToBeUndefOrPoison(FEltC)) NewC.push_back(FEltC); - else if (isa(FEltC)) + else if (isa(FEltC) && + isGuaranteedNotToBeUndefOrPoison(TEltC)) NewC.push_back(TEltC); else break; diff --git a/llvm/test/Transforms/InstSimplify/select.ll b/llvm/test/Transforms/InstSimplify/select.ll index 0f43c8f61945..8b69badb32f3 100644 --- a/llvm/test/Transforms/InstSimplify/select.ll +++ b/llvm/test/Transforms/InstSimplify/select.ll @@ -848,3 +848,17 @@ define i32 @false_undef_false_freeze(i1 %cond, i32 %x) { %s = select i1 %cond, i32 undef, i32 %xf ret i32 %s } + + at g = external global i32, align 1 + +; Make sure we don't fold partial undef vectors when constexprs are involved. +; We would need to prove the constexpr doesn't result in poison which we aren't +; equiped to do yet. +define <2 x i32> @false_undef_true_constextpr_vec(i1 %cond) { +; CHECK-LABEL: @false_undef_true_constextpr_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> , <2 x i32> +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 %cond, <2 x i32> , <2 x i32> + ret <2 x i32> %s +} From llvm-commits at lists.llvm.org Thu Jul 9 11:01:44 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:01:44 +0000 (UTC) Subject: [PATCH] D82552: [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand In-Reply-To: References: Message-ID: <82fd770d440bb58d42bdf02a20d6b687@localhost.localdomain> plotfi added a comment. @pratlucas Awesome thank you! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82552/new/ https://reviews.llvm.org/D82552 From llvm-commits at lists.llvm.org Thu Jul 9 11:01:45 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:01:45 +0000 (UTC) Subject: [PATCH] D83442: [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG122b0640fc97: [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the… (authored by craig.topper). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83442/new/ https://reviews.llvm.org/D83442 Files: llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstSimplify/select.ll Index: llvm/test/Transforms/InstSimplify/select.ll =================================================================== --- llvm/test/Transforms/InstSimplify/select.ll +++ llvm/test/Transforms/InstSimplify/select.ll @@ -848,3 +848,17 @@ %s = select i1 %cond, i32 undef, i32 %xf ret i32 %s } + + at g = external global i32, align 1 + +; Make sure we don't fold partial undef vectors when constexprs are involved. +; We would need to prove the constexpr doesn't result in poison which we aren't +; equiped to do yet. 
+define <2 x i32> @false_undef_true_constextpr_vec(i1 %cond) { +; CHECK-LABEL: @false_undef_true_constextpr_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], <2 x i32> , <2 x i32> +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 %cond, <2 x i32> , <2 x i32> + ret <2 x i32> %s +} Index: llvm/lib/Analysis/InstructionSimplify.cpp =================================================================== --- llvm/lib/Analysis/InstructionSimplify.cpp +++ llvm/lib/Analysis/InstructionSimplify.cpp @@ -4135,9 +4135,11 @@ // one element is undef, choose the defined element as the safe result. if (TEltC == FEltC) NewC.push_back(TEltC); - else if (isa(TEltC)) + else if (isa(TEltC) && + isGuaranteedNotToBeUndefOrPoison(FEltC)) NewC.push_back(FEltC); - else if (isa(FEltC)) + else if (isa(FEltC) && + isGuaranteedNotToBeUndefOrPoison(TEltC)) NewC.push_back(TEltC); else break; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83442.276781.patch Type: text/x-patch Size: 1728 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:03:27 2020 From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:03:27 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: <43ae7d4881c58cdb4a2484dacc4194e1@localhost.localdomain> davidxl added inline comments. ================ Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:293 +static void writeInstrProfile(StringRef OutputFilename, + ProfileFormat OutputFormat, ---------------- this refactoring can also be committed independently ================ Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:553 + std::unique_ptr WC; + // Make sure Inputs[i] is sample profile and Inputs[i - 1] is + // instrumentation profile. ---------------- make sample file path as the part of the option, so there is no need to handle the ordering. 
================
Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:568
+ adjustInstrProfiles(WC, Reader,
+ Inputs[i].Weight / (double)Inputs[1 - i].Weight,
+ EarlyInlineSizeThreshold, BaseScaleFunction);
----------------
Are these two weights comparable?

================
Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:872
+ cl::opt BaseScaleFunction(
+ "base-scale-function", cl::init(""), cl::Hidden,
+ cl::desc("When supplementing an instrumentation profile with sample "
----------------
Is this flag tested?


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81981/new/

https://reviews.llvm.org/D81981

From llvm-commits at lists.llvm.org Thu Jul 9 11:07:03 2020
From: llvm-commits at lists.llvm.org (Tim Keith via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 18:07:03 +0000 (UTC)
Subject: [PATCH] D83491: [flang] Fix a crash when creating generics from a copy
In-Reply-To:
References:
Message-ID: <66085936a9f67a3b3bb58d2f898ff665@localhost.localdomain>

tskeith added inline comments.

================
Comment at: flang/lib/Semantics/symbol.cpp:211
+ }
+ }
 }
----------------
I think that `specificProcs_` and `bindingNames_` are supposed to be parallel vectors; at least that is the assumption in `CheckHelper::CheckGeneric`. So this should be written as a single loop that pushes onto the two lists at the same time. As it's written it looks like the two loops might push different numbers of elements on the two lists.

One thing that suggests that the above assumption is wrong is the existence of this constructor: `GenericDetails(const SymbolVector &specificProcs);`. But I'm not sure it is ever used, so it would be good if you can delete it as part of this change.
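
For readers following the review, the single-loop suggestion above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the flang code: `GenericLists`, `Add`, and `CopyFrom` are hypothetical names, plain `std::string` stands in for the real symbol and name types, and the "skip entries already present" condition is assumed for the sake of showing why two separate loops could drift apart.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a class holding two parallel vectors:
// element i of specificProcs belongs with element i of bindingNames.
struct GenericLists {
  std::vector<std::string> specificProcs;
  std::vector<std::string> bindingNames;

  // Push onto both vectors together so their sizes cannot diverge.
  void Add(const std::string &proc, const std::string &bindingName) {
    specificProcs.push_back(proc);
    bindingNames.push_back(bindingName);
  }

  // Copy with ONE loop over paired elements. The filter (assumed here:
  // skip procedures we already hold) is evaluated once per pair, so both
  // vectors always receive the same number of new elements.
  void CopyFrom(const GenericLists &from) {
    assert(from.specificProcs.size() == from.bindingNames.size());
    for (std::size_t i = 0; i < from.specificProcs.size(); ++i) {
      const bool alreadyPresent =
          std::find(specificProcs.begin(), specificProcs.end(),
                    from.specificProcs[i]) != specificProcs.end();
      if (!alreadyPresent) {
        Add(from.specificProcs[i], from.bindingNames[i]);
      }
    }
  }
};
```

With two independent loops, each applying its own filter, one list could skip an element that the other keeps; pairing the pushes inside one loop body removes that possibility by construction.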
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83491/new/ https://reviews.llvm.org/D83491 From llvm-commits at lists.llvm.org Thu Jul 9 11:08:22 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:08:22 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: <58305fd3a1deafa9ddc5d4ec89b7944d@localhost.localdomain> craig.topper updated this revision to Diff 276782. craig.topper added a comment. Use CxtI and DT if they are passed correctly in the SimplifyQuery CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 Files: llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstSimplify/select.ll Index: llvm/test/Transforms/InstSimplify/select.ll =================================================================== --- llvm/test/Transforms/InstSimplify/select.ll +++ llvm/test/Transforms/InstSimplify/select.ll @@ -794,8 +794,7 @@ ; These can be folded because the other value is guaranteed not to be poison. 
define i32 @false_undef_true_constant(i1 %cond) { ; CHECK-LABEL: @false_undef_true_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 10, i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 10 ; %s = select i1 %cond, i32 10, i32 undef ret i32 %s @@ -803,8 +802,7 @@ define i32 @true_undef_false_constant(i1 %cond) { ; CHECK-LABEL: @true_undef_false_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 20 -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 20 ; %s = select i1 %cond, i32 undef, i32 20 ret i32 %s @@ -830,8 +828,7 @@ define i32 @false_undef_true_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_true_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[XF]], i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 %xf, i32 undef @@ -841,8 +838,7 @@ define i32 @false_undef_false_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_false_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[XF]] -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 undef, i32 %xf Index: llvm/lib/Analysis/InstructionSimplify.cpp =================================================================== --- llvm/lib/Analysis/InstructionSimplify.cpp +++ llvm/lib/Analysis/InstructionSimplify.cpp @@ -4118,6 +4118,17 @@ if (TrueVal == FalseVal) return TrueVal; + // If the true or false value is undef, we can fold to the other value as + // long as the other value isn't poison. 
+ // select ?, undef, X -> X + if (isa<UndefValue>(TrueVal) && + isGuaranteedNotToBeUndefOrPoison(FalseVal, Q.CxtI, Q.DT)) + return FalseVal; + // select ?, X, undef -> X + if (isa<UndefValue>(FalseVal) && + isGuaranteedNotToBeUndefOrPoison(TrueVal, Q.CxtI, Q.DT)) + return TrueVal; + // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && -------------- next part -------------- A non-text attachment was scrubbed... Name: D83440.276782.patch Type: text/x-patch Size: 2577 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:09:25 2020 From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:09:25 +0000 (UTC) Subject: [PATCH] D83495: [DebugInfo] Add DWARF emission for DBG_VALUE_LIST Message-ID: StephenTozer created this revision. StephenTozer added reviewers: aprantl, probinson, vsk, djtodoro, dblaikie. StephenTozer added a project: debug-info. Herald added subscribers: llvm-commits, aheejin, hiraditya, dschuff. Herald added a project: LLVM. Continuing the work discussed in the RFC[0], this patch implements the actual emission of DWARF from a DBG_VALUE_LIST instruction. The logic for handling the new instruction is simple in most places; DbgEntityHistoryCalculator has a more complex set of changes since it's more involved with register tracking, and the code for producing DW_AT_location in both DwarfDebug and DwarfExpression also required some heftier work. Previously, the code in emitDebugLocEntry functioned along the lines of: 1. Emit any fragment info 2. Emit any entry value info 3. Emit the location specified in the DBG_VALUE, e.g. `DW_OP_reg X` or `DW_OP_constu X` 4.
Finally call DwarfExpression::addExpression(), which handles the DIExpression (except fragments) Since there may now be multiple locations scattered throughout the expression, rather than a single location at the front, `addExpression` has been modified to optionally take a lambda that is used to handle `DW_OP_LLVM_arg N`; the lambda is passed in from emitDebugLocEntry, and performs step 3 using the Nth debug operand. Non-list debug values follow the same behaviour as before. DwarfCompileUnit::constructVariableDIEImpl is similar, but simpler. The alternative to using the lambda would be to move some of the code in DwarfDebug::emitDebugLocEntry directly into DwarfExpr, and passing a list of locations to `addExpression`. The hard part with this is that DwarfDebug and DwarfCompileUnit perform step 3 differently, although it's possible their behaviour can be merged. The purpose of choosing the lambda was to minimize the amount of actual change made, but if the alternative option seems like an objectively good refactor then I'm happy to adjust. [0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83495 Files: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DbgEntityHistoryCalculator.cpp llvm/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp llvm/lib/CodeGen/AsmPrinter/DebugLocEntry.h llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h llvm/test/DebugInfo/X86/dbg_value_list_clobbers.mir llvm/test/DebugInfo/X86/dbg_value_list_emission.mir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83495.276774.patch Type: text/x-patch Size: 41044 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:15:06 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:15:06 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: lei accepted this revision. lei added a comment. LGTM thx. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 From llvm-commits at lists.llvm.org Thu Jul 9 11:16:21 2020 From: llvm-commits at lists.llvm.org (Martin KaFai Lau via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:16:21 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: iamkafai added a comment. I don't have a strong opinion on whether to use int for float/double, emit a byte array, or skip emitting BTF for those members entirely. I would slightly prefer the latter (skip emitting BTF) for now, until we figure out a real use case in which the bpf prog needs to read them. I think the bpf program is not supposed to access the float/double anyway, right? I would prefer a warning, especially for >64k members, which should not happen, right? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Thu Jul 9 11:17:56 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:17:56 +0000 (UTC) Subject: [PATCH] D83444: [AArch64][SVE] Add lowering for llvm.fma. In-Reply-To: References: Message-ID: paulwalker-arm added inline comments.
================ Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:405-410 + def : Pat<(nxv8f16 (AArch64fma_p nxv8i1:$P, nxv8f16:$Op1, nxv8f16:$Op2, nxv8f16:$Op3)), + (FMLA_ZPmZZ_H $P, $Op3, $Op1, $Op2)>; + def : Pat<(nxv4f32 (AArch64fma_p nxv4i1:$P, nxv4f32:$Op1, nxv4f32:$Op2, nxv4f32:$Op3)), + (FMLA_ZPmZZ_S $P, $Op3, $Op1, $Op2)>; + def : Pat<(nxv2f64 (AArch64fma_p nxv2i1:$P, nxv2f64:$Op1, nxv2f64:$Op2, nxv2f64:$Op3)), + (FMLA_ZPmZZ_D $P, $Op3, $Op1, $Op2)>; ---------------- paulwalker-arm wrote: > I was going to say you're missing patterns for the other legal scalable vector types, but I can see that's a common theme across the floating point instructions so I'm happy enough. FYI: I'll look into the missing patterns for the existing operations tomorrow as part of the fixed length enablement. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83444/new/ https://reviews.llvm.org/D83444 From llvm-commits at lists.llvm.org Thu Jul 9 11:18:47 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Thu, 09 Jul 2020 11:18:47 -0700 (PDT) Subject: [lld] c282708 - Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch. Message-ID: <5f075f87.1c69fb81.43098.87a8@mx.google.com> Author: Eric Christopher Date: 2020-07-09T11:14:00-07:00 New Revision: c2827083166cd5150232d8fd3ada3cf8fa8c9ac3 URL: https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3 DIFF: https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3.diff LOG: Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch. 
Added: Modified: lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp Removed: ################################################################################ diff --git a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp index aad5f8afcfdc..07c1d4242e03 100644 --- a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp +++ b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp @@ -75,7 +75,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64) { fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -106,7 +106,7 @@ TEST(BinaryReaderTest, empty_obj_x86) { fromBinary(fileBytes, sizeof(fileBytes), "i386"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -137,7 +137,7 @@ TEST(BinaryReaderTest, empty_obj_ppc) { fromBinary(fileBytes, sizeof(fileBytes), "ppc"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -168,7 +168,7 @@ TEST(BinaryReaderTest, empty_obj_armv7) { fromBinary(fileBytes, sizeof(fileBytes), "armv7"); 
EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -182,7 +182,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -191,7 +191,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { fromBinary(fileBytes, sizeof(fileBytes), "armv7"); EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f2->fileType), MH_OBJECT); - EXPECT_EQ((int)(f2->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f2->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f2->localSymbols.empty()); EXPECT_TRUE(f2->globalSymbols.empty()); EXPECT_TRUE(f2->undefinedSymbols.empty()); @@ -268,7 +268,7 @@ TEST(BinaryReaderTest, hello_obj_x86_64) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -393,7 +393,7 @@ TEST(BinaryReaderTest, hello_obj_x86) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; 
EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -525,7 +525,7 @@ TEST(BinaryReaderTest, hello_obj_armv7) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -669,7 +669,7 @@ TEST(BinaryReaderTest, hello_obj_ppc) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); diff --git a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp index 6ceb197b4b84..c1445ea7eacd 100644 --- a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp +++ b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp @@ -50,7 +50,7 @@ TEST(ObjectFileYAML, empty_ppc) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -66,7 +66,7 @@ TEST(ObjectFileYAML, empty_x86_64) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -82,7 +82,7 @@ TEST(ObjectFileYAML, 
empty_x86) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -98,7 +98,7 @@ TEST(ObjectFileYAML, empty_armv6) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -114,7 +114,7 @@ TEST(ObjectFileYAML, empty_armv7) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -130,7 +130,7 @@ TEST(ObjectFileYAML, empty_armv7s) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7s); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -151,7 +151,7 @@ TEST(ObjectFileYAML, roundTrip) { std::unique_ptr f2 = fromYAML(intermediate); EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f2->fileType), llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f2->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f2->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); 
EXPECT_TRUE(f2->sections.empty()); EXPECT_TRUE(f2->localSymbols.empty()); EXPECT_TRUE(f2->globalSymbols.empty()); @@ -275,7 +275,7 @@ TEST(ObjectFileYAML, hello_x86_64) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -405,7 +405,7 @@ TEST(ObjectFileYAML, hello_x86) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -533,7 +533,7 @@ TEST(ObjectFileYAML, hello_armv6) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -673,7 +673,7 @@ TEST(ObjectFileYAML, hello_armv7) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; From llvm-commits at lists.llvm.org Thu Jul 9 11:19:46 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Thu, 9 Jul 2020 11:19:46 -0700 Subject: [PATCH] D83355: [flang] upstream intrinsic call lowering In-Reply-To: References: Message-ID: singleIndirectionLevel is dead in this commit. 
-eric On Wed, Jul 8, 2020 at 3:57 PM Eric Christopher wrote: > So this caused a dozen more _Complex warnings and other warnings. > > -eric > > On Wed, Jul 8, 2020 at 7:34 AM Eric Schweitz via Phabricator via > llvm-commits wrote: > >> This revision was automatically updated to reflect the committed changes. >> Closed by commit rG24b62f28c5da: [flang] Upstreaming intrinsic call >> lowering. (authored by schweitz). >> >> Repository: >> rG LLVM Github Monorepo >> >> CHANGES SINCE LAST ACTION >> https://reviews.llvm.org/D83355/new/ >> >> https://reviews.llvm.org/D83355 >> >> Files: >> flang/include/flang/Lower/CharacterExpr.h >> flang/include/flang/Lower/IntrinsicCall.h >> flang/include/flang/Lower/Mangler.h >> flang/include/flang/Optimizer/Dialect/FIRType.h >> flang/lib/Lower/CMakeLists.txt >> flang/lib/Lower/CharacterExpr.cpp >> flang/lib/Lower/IntrinsicCall.cpp >> flang/lib/Lower/Mangler.cpp >> flang/lib/Optimizer/Dialect/FIRType.cpp >> >> _______________________________________________ >> llvm-commits mailing list >> llvm-commits at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:24:03 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:24:03 +0000 (UTC) Subject: [PATCH] D83497: [PowerPC][Power10] Fix the VINSW instruction to have an i32 argument. Message-ID: amyk created this revision. amyk added reviewers: power-llvm-team, PowerPC, nemanjai. amyk added projects: LLVM, PowerPC. Herald added subscribers: shchenz, hiraditya. Previously, the `vinsw` instruction and intrinsic were defined with an i64 second argument. As a result, the argument had to be either sign- or zero-extended before being passed to `vinsw`. This patch changes the second argument of the `vinsw` instruction and intrinsic to be an i32.
<4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32>, i32, i32) Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83497 Files: llvm/include/llvm/IR/IntrinsicsPowerPC.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll Index: llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll =================================================================== --- llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll +++ llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll @@ -232,16 +232,16 @@ } declare <4 x i32> @llvm.ppc.altivec.vinswvrx(<4 x i32>, i64, <4 x i32>) -define <4 x i32> @testVINSW(<4 x i32> %a, i64 %b) { +define <4 x i32> @testVINSW(<4 x i32> %a, i32 %b) { ; CHECK-LABEL: testVINSW: ; CHECK: # %bb.0: # %entry ; CHECK-NEXT: vinsw v2, r5, 1 ; CHECK-NEXT: blr entry: - %0 = tail call <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32> %a, i64 %b, i32 1) + %0 = tail call <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32> %a, i32 %b, i32 1) ret <4 x i32> %0 } -declare <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32>, i64, i32 immarg) +declare <4 x i32> @llvm.ppc.altivec.vinsw(<4 x i32>, i32, i32 immarg) define <2 x i64> @testVINSD(<2 x i64> %a, i64 %b) { ; CHECK-LABEL: testVINSD: Index: llvm/lib/Target/PowerPC/PPCInstrPrefix.td =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -242,15 +242,6 @@ } -// VX-Form: [PO VRT / UIM RB XO]. -// We use VXForm_1 to implement it, that is, we use "VRA" (5 bit) to represent -// "/ UIM" (unused bit followed by a 4-bit immediate) -// Destructive (insert) forms are suffixed with _ins. -class VXForm_VRT5_UIM5_RB5_ins xo, string opc, list pattern> - : VXForm_1, - RegConstraint<"$vDi = $vD">, NoEncode<"$vDi">; - // VX-Form: [PO VRT RA VRB XO]. // Destructive (insert) forms are suffixed with _ins. 
class VXForm_VTB5_RA5_ins<bits<11> xo, string opc, list<dag> pattern> @@ -794,16 +785,18 @@ (int_ppc_altivec_vsrdbi v16i8:$VRA, v16i8:$VRB, i32:$SH))]>; - def VINSW : - VXForm_VRT5_UIM5_RB5_ins<207, "vinsw", - [(set v4i32:$vD, - (int_ppc_altivec_vinsw v4i32:$vDi, i64:$rB, - timm:$UIM))]>; + def VINSW : + VXForm_1<207, (outs vrrc:$vD), (ins vrrc:$vDi, u4imm:$UIM, gprc:$rB), + "vinsw $vD, $rB, $UIM", IIC_VecGeneral, + [(set v4i32:$vD, + (int_ppc_altivec_vinsw v4i32:$vDi, i32:$rB, timm:$UIM))]>, + RegConstraint<"$vDi = $vD">, NoEncode<"$vDi">; def VINSD : - VXForm_VRT5_UIM5_RB5_ins<463, "vinsd", - [(set v2i64:$vD, - (int_ppc_altivec_vinsd v2i64:$vDi, i64:$rB, - timm:$UIM))]>; + VXForm_1<463, (outs vrrc:$vD), (ins vrrc:$vDi, u4imm:$UIM, g8rc:$rB), + "vinsd $vD, $rB, $UIM", IIC_VecGeneral, + [(set v2i64:$vD, + (int_ppc_altivec_vinsd v2i64:$vDi, i64:$rB, timm:$UIM))]>, + RegConstraint<"$vDi = $vD">, NoEncode<"$vDi">; def VINSBVLX : VXForm_VTB5_RA5_ins<15, "vinsbvlx", [(set v16i8:$vD, Index: llvm/include/llvm/IR/IntrinsicsPowerPC.td =================================================================== --- llvm/include/llvm/IR/IntrinsicsPowerPC.td +++ llvm/include/llvm/IR/IntrinsicsPowerPC.td @@ -525,7 +525,7 @@ // P10 Vector Insert with immediate. def int_ppc_altivec_vinsw : Intrinsic<[llvm_v4i32_ty], - [llvm_v4i32_ty, llvm_i64_ty, llvm_i32_ty], + [llvm_v4i32_ty, llvm_i32_ty, llvm_i32_ty], [IntrNoMem, ImmArg<ArgIndex<2>>]>; def int_ppc_altivec_vinsd : Intrinsic<[llvm_v2i64_ty], -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83497.276784.patch Type: text/x-patch Size: 3989 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:25:26 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:25:26 +0000 (UTC) Subject: [PATCH] D82502: [PowerPC][Power10] Implement Load VSX Vector and Sign Extend and Zero Extend In-Reply-To: References: Message-ID: Conanap updated this revision to Diff 276785. Conanap marked 3 inline comments as done. Conanap added a comment. Now depends on D83364 . Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82502/new/ https://reviews.llvm.org/D82502 Files: clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/lib/Target/PowerPC/PPCISelLowering.h llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/ISA31-vsx-builtins.ll llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82502.276785.patch Type: text/x-patch Size: 13415 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:25:54 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:25:54 +0000 (UTC) Subject: [PATCH] D82502: [PowerPC][Power10] Implement Load VSX Vector and Sign Extend and Zero Extend In-Reply-To: References: Message-ID: <8d8a4a8fdb7ddaaf5fc8a535d8bdeef1@localhost.localdomain> Conanap added a comment. Also removed unnecessary brackets and comments. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82502/new/ https://reviews.llvm.org/D82502 From llvm-commits at lists.llvm.org Thu Jul 9 11:29:49 2020 From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:29:49 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: <95b41ad96108e168eb917c7d5ca854a1@localhost.localdomain> reames accepted this revision. reames added a comment. This revision is now accepted and ready to land. LGTM And thank you for doing this. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 From llvm-commits at lists.llvm.org Thu Jul 9 11:29:55 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:29:55 +0000 (UTC) Subject: [PATCH] D83424: [PGO][PGSO] Add profile guided size optimization tests to X86 ISel Lowering. In-Reply-To: References: Message-ID: <1bfca5be1d29693fdd50f20800bb4dc7@localhost.localdomain> yamauchi closed this revision. yamauchi added a comment. Committed as https://reviews.llvm.org/rG06fc125d8c5d Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83424/new/ https://reviews.llvm.org/D83424 From llvm-commits at lists.llvm.org Thu Jul 9 11:31:00 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:31:00 +0000 (UTC) Subject: [PATCH] D83332: [PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering. In-Reply-To: References: Message-ID: <606b7e4ac482bf3ccc096c4331f3e2c6@localhost.localdomain> yamauchi closed this revision. yamauchi added a comment. 
Committed as https://reviews.llvm.org/rG2c1a9006dd73 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83332/new/ https://reviews.llvm.org/D83332 From llvm-commits at lists.llvm.org Thu Jul 9 11:33:59 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:33:59 +0000 (UTC) Subject: [PATCH] D82443: [ARM] Narrowing half-precision lowering to supported CCs In-Reply-To: References: Message-ID: <845934269fc8b953e851ef995492ae4c@localhost.localdomain> plotfi updated this revision to Diff 276786. plotfi added a comment. D82552 appears to have also fixed fastcc, so I will only land the test case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82443/new/ https://reviews.llvm.org/D82443 Files: llvm/test/CodeGen/ARM/arm-half-promote.ll Index: llvm/test/CodeGen/ARM/arm-half-promote.ll =================================================================== --- llvm/test/CodeGen/ARM/arm-half-promote.ll +++ llvm/test/CodeGen/ARM/arm-half-promote.ll @@ -51,3 +51,31 @@ ; CHECK-NEXT: bx lr ret { <8 x half>, <8 x half> } zeroinitializer } + +define fastcc { <8 x half>, <8 x half> } @f3() { +; CHECK-LABEL: _f3 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + + ret { <8 x half>, <8 x half> } zeroinitializer +} + -------------- next part 
-------------- A non-text attachment was scrubbed... Name: D82443.276786.patch Type: text/x-patch Size: 1237 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:34:26 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:34:26 +0000 (UTC) Subject: [PATCH] D82443: [ARM] Narrowing half-precision lowering to supported CCs In-Reply-To: References: Message-ID: plotfi updated this revision to Diff 276787. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82443/new/ https://reviews.llvm.org/D82443 Files: llvm/test/CodeGen/ARM/arm-half-promote.ll Index: llvm/test/CodeGen/ARM/arm-half-promote.ll =================================================================== --- llvm/test/CodeGen/ARM/arm-half-promote.ll +++ llvm/test/CodeGen/ARM/arm-half-promote.ll @@ -51,3 +51,31 @@ ; CHECK-NEXT: bx lr ret { <8 x half>, <8 x half> } zeroinitializer } + +define fastcc { <8 x half>, <8 x half> } @f3() { +; CHECK-LABEL: _f3 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + + ret { <8 x half>, <8 x half> } zeroinitializer +} + -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82443.276787.patch Type: text/x-patch Size: 1237 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:35:26 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:35:26 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: dstenb added a comment. In D82975#2139704 , @dblaikie wrote: > > (Sorry, I don't have a GCC trunk build readily available, so I used GCC 9.3.0 here.) > > > > When using those flags, GCC seems to emit DW_MACRO_define_strp (DW_MACRO_GNU_define_indirect) entries, but with indexed strings as operands. Neither binutils nor GDB does consider that such entries may hold indexed strings, and just treats those operands as indirect strings, which is why they are not properly handled. "Overloading" those indirect operands with indexed strings seems very weird to me. Perhaps that is just a bug in GCC, rather than a limitation in the consumers? > > Perhaps - though there was some thought put into supporting GNU debug_macro in v4/pre-standard Fission, given the DWP format had columns for both debug_macro and debug_macinfo ( https://gcc.gnu.org/wiki/DebugFissionDWP ). Don't think it's a big deal either way - if someone comes along wanting to add debug_macro support for pre-standard Fission, we can discuss what that format looks like at that point - happy enough for it to be unimplemented (& as I said before, have "-ggdb -gdwarf-4 -fdebug-macro -> debug_macro" and "-ggdb -gdwarf-4 -fdebug-macro -gsplit-dwarf -> debug_macinfo.dwo"). I'll leave the DWO parts out of this patch then, and later if we get to that, emitting macinfo in the `ggdb -gdwarf-4 -fdebug-macro -gsplit-dwarf` case seems reasonable to me. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Thu Jul 9 11:39:21 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via llvm-commits) Date: Thu, 09 Jul 2020 11:39:21 -0700 (PDT) Subject: [llvm] 7e169ce - [NFC][test] Adding fastcc test case for promoted 16-bit integer bitcasts. Message-ID: <5f076459.1c69fb81.b1cbf.7a8f@mx.google.com> Author: Puyan Lotfi Date: 2020-07-09T11:38:49-07:00 New Revision: 7e169cec74b09dc7a0eeafcb21e9f827314265ef URL: https://github.com/llvm/llvm-project/commit/7e169cec74b09dc7a0eeafcb21e9f827314265ef DIFF: https://github.com/llvm/llvm-project/commit/7e169cec74b09dc7a0eeafcb21e9f827314265ef.diff LOG: [NFC][test] Adding fastcc test case for promoted 16-bit integer bitcasts. The following: https://reviews.llvm.org/D82552 fixed an assert in the SelectionDag ISel legalizer for some CCs on armv7. I noticed that this fix also fixes the assert when using fastcc, so I am adding a fastcc regression test here. 
Differential Revision: https://reviews.llvm.org/D82443 Added: Modified: llvm/test/CodeGen/ARM/arm-half-promote.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/ARM/arm-half-promote.ll b/llvm/test/CodeGen/ARM/arm-half-promote.ll index 1d81273a6fd5..f3c9a9e081ba 100644 --- a/llvm/test/CodeGen/ARM/arm-half-promote.ll +++ b/llvm/test/CodeGen/ARM/arm-half-promote.ll @@ -51,3 +51,31 @@ define swiftcc { <8 x half>, <8 x half> } @f2() { ; CHECK-NEXT: bx lr ret { <8 x half>, <8 x half> } zeroinitializer } + +define fastcc { <8 x half>, <8 x half> } @f3() { +; CHECK-LABEL: _f3 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + + ret { <8 x half>, <8 x half> } zeroinitializer +} + From llvm-commits at lists.llvm.org Thu Jul 9 11:39:24 2020 From: llvm-commits at lists.llvm.org (Phabricator via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:39:24 +0000 (UTC) Subject: [PATCH] D82443: [ARM] Narrowing half-precision lowering to supported CCs In-Reply-To: References: Message-ID: <2ab7f52ded7e956c052454c4d9e576e0@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG7e169cec74b0: [NFC][test] Adding fastcc test case for promoted 16-bit integer bitcasts. (authored by Puyan Lotfi <plotfi at fb.com>). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82443/new/ https://reviews.llvm.org/D82443 Files: llvm/test/CodeGen/ARM/arm-half-promote.ll Index: llvm/test/CodeGen/ARM/arm-half-promote.ll =================================================================== --- llvm/test/CodeGen/ARM/arm-half-promote.ll +++ llvm/test/CodeGen/ARM/arm-half-promote.ll @@ -51,3 +51,31 @@ ; CHECK-NEXT: bx lr ret { <8 x half>, <8 x half> } zeroinitializer } + +define fastcc { <8 x half>, <8 x half> } @f3() { +; CHECK-LABEL: _f3 +; CHECK: vpush {d8} +; CHECK-NEXT: vmov.f64 d8, #5.000000e-01 +; CHECK-NEXT: vmov.i32 d8, #0x0 +; CHECK-NEXT: vmov.i32 d0, #0x0 +; CHECK-NEXT: vmov.i32 d1, #0x0 +; CHECK-NEXT: vmov.i32 d2, #0x0 +; CHECK-NEXT: vmov.i32 d3, #0x0 +; CHECK-NEXT: vmov.i32 d4, #0x0 +; CHECK-NEXT: vmov.i32 d5, #0x0 +; CHECK-NEXT: vmov.i32 d6, #0x0 +; CHECK-NEXT: vmov.i32 d7, #0x0 +; CHECK-NEXT: vmov.f32 s1, s16 +; CHECK-NEXT: vmov.f32 s3, s16 +; CHECK-NEXT: vmov.f32 s5, s16 +; CHECK-NEXT: vmov.f32 s7, s16 +; CHECK-NEXT: vmov.f32 s9, s16 +; CHECK-NEXT: vmov.f32 s11, s16 +; CHECK-NEXT: vmov.f32 s13, s16 +; CHECK-NEXT: vmov.f32 s15, s16 +; CHECK-NEXT: vpop {d8} +; CHECK-NEXT: bx lr + + ret { <8 x half>, <8 x half> } zeroinitializer +} + -------------- next part -------------- A non-text attachment was scrubbed... Name: D82443.276789.patch Type: text/x-patch Size: 1237 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:42:45 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:42:45 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <6cded06ee50af2a216d3531ad192f735@localhost.localdomain> nikic accepted this revision. nikic added a comment. LG from my side. 
New compile-time numbers: https://llvm-compile-time-tracker.com/compare.php?from=0b39d2d75275b80994dac06b7ad05031cbd09393&to=fd070b79e063fff2fad3cd4a467f64dfca83eb90&stat=instructions It's nearly neutral now. ================ Comment at: llvm/test/CodeGen/AMDGPU/opt-pipeline.ll:285 +; GCN-O1-NEXT: Branch Probability Analysis +; GCN-O1-NEXT: Block Frequency Analysis ; GCN-O1-NEXT: FunctionPass Manager ---------------- This test is out of date. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Thu Jul 9 11:42:52 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:42:52 +0000 (UTC) Subject: [PATCH] D73940: GlobalISel: Reimplement moreElementsVectorDst In-Reply-To: References: Message-ID: arsenm updated this revision to Diff 276791. arsenm added a comment. Fix backwards diff CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73940/new/ https://reviews.llvm.org/D73940 Files: llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp Index: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp =================================================================== --- llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp +++ llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp @@ -1272,10 +1272,8 @@ void LegalizerHelper::moreElementsVectorDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx) { MachineOperand &MO = MI.getOperand(OpIdx); - Register DstExt = MRI.createGenericVirtualRegister(WideTy); MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); - MIRBuilder.buildExtract(MO, DstExt, 0); - MO.setReg(DstExt); + MO.setReg(widenWithUnmerge(WideTy, MO.getReg())); } void LegalizerHelper::moreElementsVectorSrc(MachineInstr &MI, LLT MoreTy, @@ -1443,6 +1441,40 @@ return Legalized; } +Register LegalizerHelper::widenWithUnmerge(LLT WideTy, Register OrigReg) { 
+ Register WideReg = MRI.createGenericVirtualRegister(WideTy); + LLT OrigTy = MRI.getType(OrigReg); + LLT LCMTy = getLCMType(WideTy, OrigTy); + + const int NumMergeParts = LCMTy.getSizeInBits() / WideTy.getSizeInBits(); + const int NumUnmergeParts = LCMTy.getSizeInBits() / OrigTy.getSizeInBits(); + + Register UnmergeSrc = WideReg; + + // Create a merge to the LCM type, padding with undef + // %0:_(<3 x s32>) = G_FOO => <4 x s32> + // => + // %1:_(<4 x s32>) = G_FOO + // %2:_(<4 x s32>) = G_IMPLICIT_DEF + // %3:_(<12 x s32>) = G_CONCAT_VECTORS %1, %2, %2 + // %0:_(<3 x s32>), %4:_, %5:_, %6:_ = G_UNMERGE_VALUES %3 + if (NumMergeParts > 1) { + Register Undef = MIRBuilder.buildUndef(WideTy).getReg(0); + SmallVector MergeParts(NumMergeParts, Undef); + MergeParts[0] = WideReg; + UnmergeSrc = MIRBuilder.buildMerge(LCMTy, MergeParts).getReg(0); + } + + // Unmerge to the original register and pad with dead defs. + SmallVector UnmergeResults(NumUnmergeParts); + UnmergeResults[0] = OrigReg; + for (int I = 1; I != NumUnmergeParts; ++I) + UnmergeResults[I] = MRI.createGenericVirtualRegister(OrigTy); + + MIRBuilder.buildUnmerge(UnmergeResults, UnmergeSrc); + return WideReg; +} + LegalizerHelper::LegalizeResult LegalizerHelper::widenScalarUnmergeValues(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { Index: llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h =================================================================== --- llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h +++ llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h @@ -154,6 +154,10 @@ /// def by inserting a G_BITCAST from \p CastTy void bitcastDst(MachineInstr &MI, LLT CastTy, unsigned OpIdx); + /// Widen \p OrigReg to \p WideTy by merging to a wider type, padding with + /// G_IMPLICIT_DEF, and producing dead results. 
+ Register widenWithUnmerge(LLT WideTy, Register OrigReg); + private: LegalizeResult widenScalarMergeValues(MachineInstr &MI, unsigned TypeIdx, LLT WideTy); -------------- next part -------------- A non-text attachment was scrubbed... Name: D73940.276791.patch Type: text/x-patch Size: 2994 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:46:17 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:46:17 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: <8448b5db05ca02fab6c12c2ac1d3b04c@localhost.localdomain> foad updated this revision to Diff 276792. foad added a comment. Rebase. Fix silly mistake in checking for negative offsets. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 Files: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp llvm/test/CodeGen/AMDGPU/flat-address-space.ll llvm/test/CodeGen/AMDGPU/offset-split-flat.ll llvm/test/CodeGen/AMDGPU/offset-split-global.ll llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll llvm/test/CodeGen/AMDGPU/store-hi16.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83394.276792.patch Type: text/x-patch Size: 22580 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:48:02 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:48:02 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: arsenm added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705 + // Use signed division by a power of two to truncate towards 0. 
+ int64_t D = 1LL << (NumBits - 1); + RemainderOffset = (static_cast(COffsetVal) / D) * D; ---------------- foad wrote: > arsenm wrote: > > arsenm wrote: > > > foad wrote: > > > > arsenm wrote: > > > > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > > > > The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing?? > > > Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions > > Actually pre-gfx9 also didn't have flat offsets. However gfx10 does have a bug with flat offsets, so I think it would still be correct to model this correctly. The instruction patterns do accept either (and global instructions are only preferred through pattern priority) > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > > I still don't get this. Surely if we're using a FLAT instruction, even if we know which specific address space the programmer is trying to access, we still have to avoid setting vaddr to an address that might point into the wrong aperture. My understanding was the aperture only means anything for private or local. If it's a global address, it's neither aperture and behaves as a normal instruction (i.e. there's no aperture for global pointers) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Thu Jul 9 11:50:37 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:50:37 +0000 (UTC) Subject: [PATCH] D82230: [ADT] Specialize std::swap() for SetVector In-Reply-To: References: Message-ID: <9bb87a691a14edf514abedd89a013ecc@localhost.localdomain> nikic added a comment. 
Ping Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82230/new/ https://reviews.llvm.org/D82230 From llvm-commits at lists.llvm.org Thu Jul 9 11:51:51 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:51:51 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: <11a465cc7ec7c949bd2b34660986205b@localhost.localdomain> foad marked an inline comment as done. foad added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705 + // Use signed division by a power of two to truncate towards 0. + int64_t D = 1LL << (NumBits - 1); + RemainderOffset = (static_cast(COffsetVal) / D) * D; ---------------- arsenm wrote: > foad wrote: > > arsenm wrote: > > > arsenm wrote: > > > > foad wrote: > > > > > arsenm wrote: > > > > > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > > > > > The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing?? > > > > Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions > > > Actually pre-gfx9 also didn't have flat offsets. However gfx10 does have a bug with flat offsets, so I think it would still be correct to model this correctly. The instruction patterns do accept either (and global instructions are only preferred through pattern priority) > > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > > > > I still don't get this. 
Surely if we're using a FLAT instruction, even if we know which specific address space the programmer is trying to access, we still have to avoid setting vaddr to an address that might point into the wrong aperture. > My understanding was the aperture only means anything for private or local. If it's a global address, it's neither aperture and behaves as a normal instruction (i.e. there's no aperture for global pointers) But in that case, you should still avoid making drastic changes to vaddr in case it ends up accidentally pointing *into* one of the apertures, when you wanted a global access. E.g. if you're accessing a global that happens to be just past the end of the private or local aperture. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Thu Jul 9 11:52:01 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:52:01 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes Message-ID: aeubanks created this revision. aeubanks added reviewers: ychen, asbirlea, hans. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. PassInfoMixin should be used for all NPM passes, rather than a custom `name()`. This caused ambiguous references in LegacyPassManager.cpp, so I had to remove "using namespace llvm::legacy" and move some things around. The passes had to be moved to the llvm namespace, or else they would get printed as "(anonymous namespace)::FooPass". Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83498 Files: llvm/include/llvm/IR/IRPrintingPasses.h llvm/lib/IR/LegacyPassManager.cpp llvm/lib/Passes/PassBuilder.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83498.276796.patch Type: text/x-patch Size: 14677 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:53:01 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:53:01 +0000 (UTC) Subject: [PATCH] D83458: [StackSafety,NFC] Reduce FunctionSummary size In-Reply-To: References: Message-ID: <8189f5244d12532a327143e6efda4e7f@localhost.localdomain> tejohnson accepted this revision. tejohnson added a comment. This revision is now accepted and ready to land. lgtm Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83458/new/ https://reviews.llvm.org/D83458 From llvm-commits at lists.llvm.org Thu Jul 9 11:54:21 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:54:21 +0000 (UTC) Subject: [PATCH] D83499: [MSAN runtime] Add poison_stack function that also updates origin Message-ID: guiand created this revision. guiand added reviewers: eugenis, vitalybuka. Herald added subscribers: llvm-commits, Sanitizers, hiraditya. Herald added projects: Sanitizers, LLVM. With eager-checks and msan-poison-stack-with-call enabled, this saves ~1.5% instrumented binary size. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83499 Files: compiler-rt/lib/msan/msan_interceptors.cpp compiler-rt/lib/msan/msan_interface_internal.h llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/alloca.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83499.276797.patch Type: text/x-patch Size: 12589 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:55:13 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:55:13 +0000 (UTC) Subject: [PATCH] D83500: [PowerPC][Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. Message-ID: amyk created this revision. amyk added reviewers: PowerPC, power-llvm-team, nemanjai, lei. amyk added projects: LLVM, PowerPC, clang. Herald added a subscriber: shchenz. This patch implements custom codegen for the `vec_replace_elt` and `vec_replace_unaligned` builtins. These builtins map to the `@llvm.ppc.altivec.vinsw` and `@llvm.ppc.altivec.vinsd` intrinsics depending on the arguments. The main motivation for doing custom codegen for these intrinsics is that there are float and double versions of the builtin. Normally, converting the float to an integer would be done via `fptoui` in the IR; however, it is preferable to use `bitcast`. The original patch that implemented the front end did this by adding unions to altivec.h (https://reviews.llvm.org/D82359), but this patch uses custom codegen to emit `bitcast` for the float conversion instead. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83500 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/lib/CodeGen/CGBuiltin.cpp clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83500.276794.patch Type: text/x-patch Size: 12113 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 11:58:06 2020 From: llvm-commits at lists.llvm.org (Erich Keane via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:58:06 +0000 (UTC) Subject: [PATCH] D82230: [ADT] Specialize std::swap() for SetVector In-Reply-To: References: Message-ID: <41aa0df0c2855aa3a715953a5d6d7184@localhost.localdomain> erichkeane accepted this revision. erichkeane added a comment. This revision is now accepted and ready to land. Please fix the clang-format concerns, otherwise LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82230/new/ https://reviews.llvm.org/D82230 From llvm-commits at lists.llvm.org Thu Jul 9 11:58:34 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via llvm-commits) Date: Thu, 09 Jul 2020 11:58:34 -0700 (PDT) Subject: [llvm] 7af27b6 - [NFC][AArch64] Refactor getArgumentPopSize Message-ID: <5f0768da.1c69fb81.890ae.8afe@mx.google.com> Author: Kyungwoo Lee Date: 2020-07-09T11:58:15-07:00 New Revision: 7af27b65b3ce4690da97fd8048be7237e0358ef5 URL: https://github.com/llvm/llvm-project/commit/7af27b65b3ce4690da97fd8048be7237e0358ef5 DIFF: https://github.com/llvm/llvm-project/commit/7af27b65b3ce4690da97fd8048be7237e0358ef5.diff LOG: [NFC][AArch64] Refactor getArgumentPopSize Differential Revision: https://reviews.llvm.org/D83456 Added: Modified: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index 0ada039fa0a7..bd76855f7c64 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -177,6 +177,38 @@ static cl::opt StackTaggingMergeSetTag( STATISTIC(NumRedZoneFunctions, "Number of functions using red zone"); +/// Returns the 
argument pop size. +static uint64_t getArgumentPopSize(MachineFunction &MF, + MachineBasicBlock &MBB) { + MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr(); + bool IsTailCallReturn = false; + if (MBB.end() != MBBI) { + unsigned RetOpcode = MBBI->getOpcode(); + IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi || + RetOpcode == AArch64::TCRETURNri || + RetOpcode == AArch64::TCRETURNriBTI; + } + AArch64FunctionInfo *AFI = MF.getInfo(); + + uint64_t ArgumentPopSize = 0; + if (IsTailCallReturn) { + MachineOperand &StackAdjust = MBBI->getOperand(1); + + // For a tail-call in a callee-pops-arguments environment, some or all of + // the stack may actually be in use for the call's arguments, this is + // calculated during LowerCall and consumed here... + ArgumentPopSize = StackAdjust.getImm(); + } else { + // ... otherwise the amount to pop is *all* of the argument space, + // conveniently stored in the MachineFunctionInfo by + // LowerFormalArguments. This will, of course, be zero for the C calling + // convention. + ArgumentPopSize = AFI->getArgumentStackToRestore(); + } + + return ArgumentPopSize; +} + /// This is the biggest offset to the stack pointer we can encode in aarch64 /// instructions (without using a separate calculation and a temp register). 
/// Note that the exception here are vector stores/loads which cannot encode any @@ -1416,7 +1448,6 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF, const AArch64Subtarget &Subtarget = MF.getSubtarget(); const TargetInstrInfo *TII = Subtarget.getInstrInfo(); DebugLoc DL; - bool IsTailCallReturn = false; bool NeedsWinCFI = needsWinCFI(MF); bool HasWinCFI = false; bool IsFunclet = false; @@ -1427,10 +1458,6 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF, if (MBB.end() != MBBI) { DL = MBBI->getDebugLoc(); - unsigned RetOpcode = MBBI->getOpcode(); - IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi || - RetOpcode == AArch64::TCRETURNri || - RetOpcode == AArch64::TCRETURNriBTI; IsFunclet = isFuncletReturnInstr(*MBBI); } @@ -1445,21 +1472,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF, // Initial and residual are named for consistency with the prologue. Note that // in the epilogue, the residual adjustment is executed first. - uint64_t ArgumentPopSize = 0; - if (IsTailCallReturn) { - MachineOperand &StackAdjust = MBBI->getOperand(1); - - // For a tail-call in a callee-pops-arguments environment, some or all of - // the stack may actually be in use for the call's arguments, this is - // calculated during LowerCall and consumed here... - ArgumentPopSize = StackAdjust.getImm(); - } else { - // ... otherwise the amount to pop is *all* of the argument space, - // conveniently stored in the MachineFunctionInfo by - // LowerFormalArguments. This will, of course, be zero for the C calling - // convention. 
- ArgumentPopSize = AFI->getArgumentStackToRestore(); - } + uint64_t ArgumentPopSize = getArgumentPopSize(MF, MBB); // The stack frame should be like below, // From llvm-commits at lists.llvm.org Thu Jul 9 11:58:44 2020 From: llvm-commits at lists.llvm.org (Phabricator via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 18:58:44 +0000 (UTC) Subject: [PATCH] D83456: [NFC][AArch64] Refactor getArgumentPopSize In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG7af27b65b3ce: [NFC][AArch64] Refactor getArgumentPopSize (authored by Kyungwoo Lee <kyulee.llvm at gmail.com>, committed by Puyan Lotfi <plotfi at fb.com>). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83456/new/ https://reviews.llvm.org/D83456 Files: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp Index: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp =================================================================== --- llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -177,6 +177,38 @@ STATISTIC(NumRedZoneFunctions, "Number of functions using red zone"); +/// Returns the argument pop size. +static uint64_t getArgumentPopSize(MachineFunction &MF, + MachineBasicBlock &MBB) { + MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr(); + bool IsTailCallReturn = false; + if (MBB.end() != MBBI) { + unsigned RetOpcode = MBBI->getOpcode(); + IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi || + RetOpcode == AArch64::TCRETURNri || + RetOpcode == AArch64::TCRETURNriBTI; + } + AArch64FunctionInfo *AFI = MF.getInfo(); + + uint64_t ArgumentPopSize = 0; + if (IsTailCallReturn) { + MachineOperand &StackAdjust = MBBI->getOperand(1); + + // For a tail-call in a callee-pops-arguments environment, some or all of + // the stack may actually be in use for the call's arguments, this is + // calculated during LowerCall and consumed here... 
+ ArgumentPopSize = StackAdjust.getImm(); + } else { + // ... otherwise the amount to pop is *all* of the argument space, + // conveniently stored in the MachineFunctionInfo by + // LowerFormalArguments. This will, of course, be zero for the C calling + // convention. + ArgumentPopSize = AFI->getArgumentStackToRestore(); + } + + return ArgumentPopSize; +} + /// This is the biggest offset to the stack pointer we can encode in aarch64 /// instructions (without using a separate calculation and a temp register). /// Note that the exception here are vector stores/loads which cannot encode any @@ -1416,7 +1448,6 @@ const AArch64Subtarget &Subtarget = MF.getSubtarget(); const TargetInstrInfo *TII = Subtarget.getInstrInfo(); DebugLoc DL; - bool IsTailCallReturn = false; bool NeedsWinCFI = needsWinCFI(MF); bool HasWinCFI = false; bool IsFunclet = false; @@ -1427,10 +1458,6 @@ if (MBB.end() != MBBI) { DL = MBBI->getDebugLoc(); - unsigned RetOpcode = MBBI->getOpcode(); - IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi || - RetOpcode == AArch64::TCRETURNri || - RetOpcode == AArch64::TCRETURNriBTI; IsFunclet = isFuncletReturnInstr(*MBBI); } @@ -1445,21 +1472,7 @@ // Initial and residual are named for consistency with the prologue. Note that // in the epilogue, the residual adjustment is executed first. - uint64_t ArgumentPopSize = 0; - if (IsTailCallReturn) { - MachineOperand &StackAdjust = MBBI->getOperand(1); - - // For a tail-call in a callee-pops-arguments environment, some or all of - // the stack may actually be in use for the call's arguments, this is - // calculated during LowerCall and consumed here... - ArgumentPopSize = StackAdjust.getImm(); - } else { - // ... otherwise the amount to pop is *all* of the argument space, - // conveniently stored in the MachineFunctionInfo by - // LowerFormalArguments. This will, of course, be zero for the C calling - // convention. 
- ArgumentPopSize = AFI->getArgumentStackToRestore(); - } + uint64_t ArgumentPopSize = getArgumentPopSize(MF, MBB); // The stack frame should be like below, // -------------- next part -------------- A non-text attachment was scrubbed... Name: D83456.276798.patch Type: text/x-patch Size: 3477 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:01:45 2020 From: llvm-commits at lists.llvm.org (Alexey Lapshin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:01:45 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: <716538b21f33b96041c9ebc6af0dcbce@localhost.localdomain> avl updated this revision to Diff 276799. avl added a comment. addressed comments: added test for multiple recursive calls, removed duplicated check for operand bundles, simplified and commented tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 Files: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp llvm/test/Transforms/TailCallElim/basic.ll llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82085.276799.patch Type: text/x-patch Size: 19838 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:03:43 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:03:43 +0000 (UTC) Subject: [PATCH] D83500: [PowerPC][Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. In-Reply-To: References: Message-ID: <6f957a841d62bec4f7de16acb22b0b70@localhost.localdomain> amyk marked 2 inline comments as done. amyk added inline comments. 
================ Comment at: clang/include/clang/Basic/BuiltinsPPC.def:339 +BUILTIN(__builtin_altivec_vec_replace_elt, "V4UiV4UiULLiIi", "t") +BUILTIN(__builtin_altivec_vec_replace_unaligned, "V4UiV4UiULLiIi", "t") ---------------- I originally intended to implement this like the `xxpermdi` builtin: ``` BUILTIN(__builtin_vsx_xxpermdi, "v.", "t") ``` to use `v.` but I am not able to declare these builtins as void. For now, they're more or less an arbitrary signature that would match `vinsw`. ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:606 +vector float test_vec_replace_elt_f(void) { + // CHECK-BE: bitcast float %{{.+}} to i32 + // CHECK-BE-NEXT: @llvm.ppc.altivec.vinsw(<4 x i32> %{{.+}}, i32 %{{.+}}, i32 8 ---------------- I've utilized tests that were from Biplob's original patch (https://reviews.llvm.org/D82359), but added the `bitcasts` to the float/double cases. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83500/new/ https://reviews.llvm.org/D83500 From llvm-commits at lists.llvm.org Thu Jul 9 12:05:07 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:05:07 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <9f5a0cae9709827cd694a1032f718e63@localhost.localdomain> ychen added a comment. I was aware of this recently. Thanks for fixing this. Just one nit. Please wait for one other reviewer. ================ Comment at: llvm/lib/Passes/PassBuilder.cpp:300 -namespace { +namespace llvm { ---------------- How about keeping this local? These are only for testing. 
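For readers following the namespace question on D83498, here is a rough standalone sketch of why a type-derived pass name picks up the enclosing namespace. It is not LLVM's implementation: `typeNameOf`, `NameMixin`, `demo::NamedPass`, `LocalPass` and `checkPassNames` are hypothetical names made up for illustration. It assumes GCC or Clang, since it relies on the `__PRETTY_FUNCTION__` extension, and the exact spelling of the anonymous namespace differs between compilers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not LLVM's): pulls the spelling of T out of the
// compiler-generated signature string. __PRETTY_FUNCTION__ is a GCC/Clang
// extension, so this is an assumption about the toolchain.
template <typename T> std::string typeNameOf() {
  std::string Sig = __PRETTY_FUNCTION__;
  const std::string Key = "T = ";
  std::size_t Begin = Sig.find(Key);
  if (Begin == std::string::npos)
    return Sig; // Unknown format: return the whole signature unchanged.
  Begin += Key.size();
  // GCC ends the type with ';', Clang ends it with ']'.
  std::size_t End = Sig.find_first_of(";]", Begin);
  return Sig.substr(Begin, End - Begin);
}

// CRTP mixin in the spirit of PassInfoMixin: name() comes from the type
// itself, so whatever namespace the type lives in shows up in the name.
template <typename DerivedT> struct NameMixin {
  static std::string name() { return typeNameOf<DerivedT>(); }
};

namespace demo {
struct NamedPass : NameMixin<NamedPass> {};
} // namespace demo

namespace {
struct LocalPass : NameMixin<LocalPass> {};
} // namespace

// Small self-check that can be called from main() or a unit test.
inline void checkPassNames() {
  // A type in a named namespace gets a stable, readable name.
  assert(demo::NamedPass::name() == "demo::NamedPass");
  // A type in an anonymous namespace carries a compiler-specific prefix.
  assert(LocalPass::name() != "LocalPass");
}
```

With a type-derived name like this, keeping the passes in an anonymous namespace would need some extra step to strip the compiler-specific prefix, which is presumably the trade-off behind the suggestion above.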
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Thu Jul 9 12:05:29 2020 From: llvm-commits at lists.llvm.org (Baptiste Saleil via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:05:29 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: <68b538ab32311d9362ed0d3f0a2b48f3@localhost.localdomain> bsaleil added inline comments. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:939 + // The XFormMemOp flag for the following 8 insts is set on the instruction format. + let mayLoad = 1, mayStore = 1 in { + def LXVRBX : X_XT6_RA5_RB5<31, 13, "lxvrbx", vsrc, []>; ---------------- Shouldn't `mayStore` be 0 instead of 1 here ? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 From llvm-commits at lists.llvm.org Thu Jul 9 12:11:17 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:11:17 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: hans accepted this revision. hans added a comment. In D83013#2142271 , @nikic wrote: > New compile-time numbers: https://llvm-compile-time-tracker.com/compare.php?from=0b39d2d75275b80994dac06b7ad05031cbd09393&to=fd070b79e063fff2fad3cd4a467f64dfca83eb90&stat=instructions It's nearly neutral now. Sounds great! 
lgtm2 (with the test update Nikita mentioned) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Thu Jul 9 12:14:17 2020 From: llvm-commits at lists.llvm.org (Baptiste Saleil via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:14:17 +0000 (UTC) Subject: [PATCH] D83338: [PowerPC][Power10] Implemented Vector Shift Builtins In-Reply-To: References: Message-ID: bsaleil added a comment. Shouldn't we have test cases to test `vec_sl`, `vec_sr` and `vec_sra` ? ================ Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:800 def int_ppc_altivec_vsrw : PowerPC_Vec_WWW_Intrinsic<"vsrw">; +def int_ppc_altivec_vsrq : PowerPC_Vec_QQQ_Intrinsic<"vsrq">; def int_ppc_altivec_vsrab : PowerPC_Vec_BBB_Intrinsic<"vsrab">; ---------------- nit: indentation issue ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:919 + + def VSLQ : VX1_Int_Ty< 261, "vslq", int_ppc_altivec_vslq, v1i128>; + def VSRAQ : VX1_Int_Ty< 773, "vsraq", int_ppc_altivec_vsraq, v1i128>; ---------------- nit: extra spaces before `:` here and in the next two lines Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83338/new/ https://reviews.llvm.org/D83338 From llvm-commits at lists.llvm.org Thu Jul 9 12:15:27 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:15:27 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: ggeorgakoudis updated this revision to Diff 276802. ggeorgakoudis added a comment. 
Update regression test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 Files: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp llvm/test/Analysis/CallGraph/ignore-callback-uses.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83370.276802.patch Type: text/x-patch Size: 5058 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:16:30 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:16:30 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: <1156e4a1d4f38f2ccb3b32cd0d73f9c7@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM > I'm happy to add this but just wanted to query what it gives. <4 x i8> is not a legal type so the test just exercises the same truncate path as <4 x i64> to <4 x i16>, or is this what you want protected (i.e. ensure the bytes remain where they're expected to be). That's fine; I just want to test that it doesn't get caught in the custom lowering and crash somehow. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 From llvm-commits at lists.llvm.org Thu Jul 9 12:19:09 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:19:09 +0000 (UTC) Subject: [PATCH] D83500: [PowerPC][Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. In-Reply-To: References: Message-ID: <812091e7dfb97a8eb59764190bd37543@localhost.localdomain> amyk updated this revision to Diff 276804. amyk added a comment. Updated for clang format changes. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83500/new/ https://reviews.llvm.org/D83500 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/lib/CodeGen/CGBuiltin.cpp clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c -------------- next part -------------- A non-text attachment was scrubbed... Name: D83500.276804.patch Type: text/x-patch Size: 12237 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:22:38 2020 From: llvm-commits at lists.llvm.org (Kai Nacke via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:22:38 +0000 (UTC) Subject: [PATCH] D83484: Use InitLLVM in llvm-stress, sancov and TableGen In-Reply-To: References: Message-ID: <423ae8f0a312b246472d2e2857e9754a@localhost.localdomain> Kai accepted this revision. Kai added a comment. This revision is now accepted and ready to land. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83484/new/ https://reviews.llvm.org/D83484 From llvm-commits at lists.llvm.org Thu Jul 9 12:24:24 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Thu, 09 Jul 2020 12:24:24 -0700 (PDT) Subject: [llvm] 469da66 - [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison Message-ID: <5f076ee8.1c69fb81.aeec7.8f4c@mx.google.com> Author: Craig Topper Date: 2020-07-09T12:21:03-07:00 New Revision: 469da663f2df150629786df3f82c217062924f5e URL: https://github.com/llvm/llvm-project/commit/469da663f2df150629786df3f82c217062924f5e DIFF: https://github.com/llvm/llvm-project/commit/469da663f2df150629786df3f82c217062924f5e.diff LOG: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison Follow up from the transform being removed in D83360. If X is provably not poison, then the transform is safe. Still plan to remove or adjust the code from ConstantFolding after this.
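The rule in this log can be sketched with a toy model. This is only an illustration, not LLVM's API: `Kind`, `Val`, and `simplifySelect` are hypothetical names, and the `NotUndefOrPoison` tag stands in for whatever a real analysis was able to prove about an operand. The fold is withheld when the surviving operand might be poison, because replacing a result that could be undef with one that could be poison would not be a refinement.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in for what an analysis knows about an operand (hypothetical,
// not an LLVM type).
enum class Kind { Undef, MaybePoison, NotUndefOrPoison };

struct Val {
  Kind K;
  int Id; // identifies the operand so callers can see which one was chosen
};

// Returns the operand the select folds to, or std::nullopt when neither
// rule applies. The condition is ignored, as in "select ?, undef, X".
std::optional<Val> simplifySelect(const Val &TrueVal, const Val &FalseVal) {
  // select ?, undef, X -> X
  if (TrueVal.K == Kind::Undef && FalseVal.K == Kind::NotUndefOrPoison)
    return FalseVal;
  // select ?, X, undef -> X
  if (FalseVal.K == Kind::Undef && TrueVal.K == Kind::NotUndefOrPoison)
    return TrueVal;
  return std::nullopt;
}

// Usage sketch: a constant operand is trivially not poison, so the fold fires.
inline void demoConstantFold() {
  Val Ten{Kind::NotUndefOrPoison, 10};
  Val Undef{Kind::Undef, 0};
  std::optional<Val> R = simplifySelect(Ten, Undef);
  assert(R && R->Id == 10);
}
```

In this model a plain function argument would be tagged `MaybePoison` and left alone, while a constant or a frozen value would be tagged `NotUndefOrPoison` and folded.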
Differential Revision: https://reviews.llvm.org/D83440 Added: Modified: llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstSimplify/select.ll Removed: ################################################################################ diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp index 277e2907fa04..0975a65d183e 100644 --- a/llvm/lib/Analysis/InstructionSimplify.cpp +++ b/llvm/lib/Analysis/InstructionSimplify.cpp @@ -4118,6 +4118,17 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal, if (TrueVal == FalseVal) return TrueVal; + // If the true or false value is undef, we can fold to the other value as + // long as the other value isn't poison. + // select ?, undef, X -> X + if (isa<UndefValue>(TrueVal) && + isGuaranteedNotToBeUndefOrPoison(FalseVal, Q.CxtI, Q.DT)) + return FalseVal; + // select ?, X, undef -> X + if (isa<UndefValue>(FalseVal) && + isGuaranteedNotToBeUndefOrPoison(TrueVal, Q.CxtI, Q.DT)) + return TrueVal; + // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && diff --git a/llvm/test/Transforms/InstSimplify/select.ll b/llvm/test/Transforms/InstSimplify/select.ll index 8b69badb32f3..753d8fa64bdb 100644 --- a/llvm/test/Transforms/InstSimplify/select.ll +++ b/llvm/test/Transforms/InstSimplify/select.ll @@ -794,8 +794,7 @@ define <2 x i32> @true_undef_vec(i1 %cond, <2 x i32> %x) { ; These can be folded because the other value is guaranteed not to be poison.
define i32 @false_undef_true_constant(i1 %cond) { ; CHECK-LABEL: @false_undef_true_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 10, i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 10 ; %s = select i1 %cond, i32 10, i32 undef ret i32 %s @@ -803,8 +802,7 @@ define i32 @false_undef_true_constant(i1 %cond) { define i32 @true_undef_false_constant(i1 %cond) { ; CHECK-LABEL: @true_undef_false_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 20 -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 20 ; %s = select i1 %cond, i32 undef, i32 20 ret i32 %s @@ -830,8 +828,7 @@ define <2 x i32> @true_undef_false_constant_vec(i1 %cond) { define i32 @false_undef_true_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_true_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[XF]], i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 %xf, i32 undef @@ -841,8 +838,7 @@ define i32 @false_undef_true_freeze(i1 %cond, i32 %x) { define i32 @false_undef_false_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_false_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[XF]] -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 undef, i32 %xf From llvm-commits at lists.llvm.org Thu Jul 9 12:24:33 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:24:33 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: <08f5407df370d69e1d710dfa1f1fde57@localhost.localdomain> This revision was automatically updated to reflect the committed changes. 
Closed by commit rG469da663f2df: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably… (authored by craig.topper). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 Files: llvm/lib/Analysis/InstructionSimplify.cpp llvm/test/Transforms/InstSimplify/select.ll Index: llvm/test/Transforms/InstSimplify/select.ll =================================================================== --- llvm/test/Transforms/InstSimplify/select.ll +++ llvm/test/Transforms/InstSimplify/select.ll @@ -794,8 +794,7 @@ ; These can be folded because the other value is guaranteed not to be poison. define i32 @false_undef_true_constant(i1 %cond) { ; CHECK-LABEL: @false_undef_true_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 10, i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 10 ; %s = select i1 %cond, i32 10, i32 undef ret i32 %s @@ -803,8 +802,7 @@ define i32 @true_undef_false_constant(i1 %cond) { ; CHECK-LABEL: @true_undef_false_constant( -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 20 -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 20 ; %s = select i1 %cond, i32 undef, i32 20 ret i32 %s @@ -830,8 +828,7 @@ define i32 @false_undef_true_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_true_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 [[XF]], i32 undef -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 %xf, i32 undef @@ -841,8 +838,7 @@ define i32 @false_undef_false_freeze(i1 %cond, i32 %x) { ; CHECK-LABEL: @false_undef_false_freeze( ; CHECK-NEXT: [[XF:%.*]] = freeze i32 [[X:%.*]] -; CHECK-NEXT: [[S:%.*]] = select i1 [[COND:%.*]], i32 undef, i32 [[XF]] -; CHECK-NEXT: ret i32 [[S]] +; CHECK-NEXT: ret i32 [[XF]] ; %xf = freeze i32 %x %s = select i1 %cond, i32 undef, i32 %xf Index: 
llvm/lib/Analysis/InstructionSimplify.cpp =================================================================== --- llvm/lib/Analysis/InstructionSimplify.cpp +++ llvm/lib/Analysis/InstructionSimplify.cpp @@ -4118,6 +4118,17 @@ if (TrueVal == FalseVal) return TrueVal; + // If the true or false value is undef, we can fold to the other value as + // long as the other value isn't poison. + // select ?, undef, X -> X + if (isa<UndefValue>(TrueVal) && + isGuaranteedNotToBeUndefOrPoison(FalseVal, Q.CxtI, Q.DT)) + return FalseVal; + // select ?, X, undef -> X + if (isa<UndefValue>(FalseVal) && + isGuaranteedNotToBeUndefOrPoison(TrueVal, Q.CxtI, Q.DT)) + return TrueVal; + // Deal with partial undef vector constants: select ?, VecC, VecC' --> VecC'' Constant *TrueC, *FalseC; if (TrueVal->getType()->isVectorTy() && match(TrueVal, m_Constant(TrueC)) && -------------- next part -------------- A non-text attachment was scrubbed... Name: D83440.276805.patch Type: text/x-patch Size: 2577 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:25:51 2020 From: llvm-commits at lists.llvm.org (Brian Sumner via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:25:51 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: <581f33bd0b9908872edeb0c847e91496@localhost.localdomain> b-sumner added a comment. In D82818#2141981 , @arsenm wrote: > In D82818#2141973 , @kzhuravl wrote: > > > Do we also want to remove it from v2 metadata? > > > Probably, but I looked briefly and didn't actually see the direct equivalent It's "ValueType" in the v2 metadata.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 From llvm-commits at lists.llvm.org Thu Jul 9 12:30:29 2020 From: llvm-commits at lists.llvm.org (Puyan Lotfi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:30:29 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: <2cdddeebcb07e7afd777a99f54d00c52@localhost.localdomain> plotfi added a comment. @kyulee Update please (since the NFC has landed), so that Harbormaster can run again without conflict. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 From llvm-commits at lists.llvm.org Thu Jul 9 12:32:31 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:32:31 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: <99e3952d905e92e3a0154fa18242e1b6@localhost.localdomain> kyulee updated this revision to Diff 276806. kyulee added a comment. Rebase after D83456 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 Files: llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/AArch64/AArch64InstrInfo.td llvm/lib/Target/AArch64/AArch64LowerHomogeneousPrologEpilog.cpp llvm/lib/Target/AArch64/AArch64TargetMachine.cpp llvm/lib/Target/AArch64/CMakeLists.txt llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-frame-tail.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-no-helper.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D76570.276806.patch Type: text/x-patch Size: 43538 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:36:18 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:36:18 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: aeubanks marked an inline comment as done. aeubanks added inline comments. ================ Comment at: llvm/lib/Passes/PassBuilder.cpp:300 -namespace { +namespace llvm { ---------------- ychen wrote: > How about keeping this local? These are only for testing. Do you mean keeping this in an anonymous namespace? As mentioned in the commit, that makes the printed name messed up. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Thu Jul 9 12:40:27 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:40:27 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: zequanwu updated this revision to Diff 276808. zequanwu added a comment. Update test case. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 Files: clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/new-pm-cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83013.276808.patch Type: text/x-patch Size: 17602 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:40:35 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:40:35 +0000 (UTC) Subject: [PATCH] D83500: [PowerPC][Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. In-Reply-To: References: Message-ID: <58cd335bdccb4fceb4ce9beaf5fe9591@localhost.localdomain> lei added inline comments. ================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:14273 + // The third argument to vec_replace_elt will be emitted to either + // the vinsw or vinsd instruction. It must be a compile time constant. + ConstantInt *ArgCI = dyn_cast<ConstantInt>(Ops[2]); ---------------- Do you mean? ``` // The third argument of vec_replace_elt must be a compile time constant and will be emitted either // to the vinsw or vinsd instruction.
``` ================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:14289 + else + ConstArg = (ConstArg * 4); + Ops[2] = ConstantInt::getSigned(Int32Ty, ConstArg); ---------------- ``` ConstArg *= 4; // Fix the constant according to endianness. if (getTarget().isLittleEndian()) ConstArg = 12 - ConstArg; ``` ================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:14320 + Call = Builder.CreateCall(F, Ops); + } + return Call; ---------------- What are the chances of reaching the end of this if/else-if section with `Call` still null, i.e. `getPrimitiveSizeInBits() != [32|64]`? I feel like it would be better if we could structure it so that we avoid all this nesting of `if`s and just return from within the different if-conditions. Have you tried pulling out the different handling of the 32/64-bit arg and consolidating the code a bit? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83500/new/ https://reviews.llvm.org/D83500 From llvm-commits at lists.llvm.org Thu Jul 9 12:41:56 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:41:56 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: ychen added inline comments. ================ Comment at: llvm/lib/Passes/PassBuilder.cpp:300 -namespace { +namespace llvm { ---------------- aeubanks wrote: > ychen wrote: > > How about keeping this local? These are only for testing. > Do you mean keeping this in an anonymous namespace? > As mentioned in the commit, that makes the printed name messed up. Add some regex in lit tests?
Running pass: {{.*}}NoOpModulePass Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Thu Jul 9 12:42:19 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:42:19 +0000 (UTC) Subject: [PATCH] D82329: [SVE] Fix invalid Scalable to fixed width vetor type demotion in LLT In-Reply-To: References: Message-ID: ctetreau abandoned this revision. ctetreau added a comment. This patch (in the form of an unconditional cast to FixedVectorType on line 22) has been rolled into D82210 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82329/new/ https://reviews.llvm.org/D82329 From llvm-commits at lists.llvm.org Thu Jul 9 12:43:09 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:43:09 +0000 (UTC) Subject: [PATCH] D82210: [SVE] Remove calls to VectorType::getNumElements from CodeGen In-Reply-To: References: Message-ID: <54c0eaf1393087379b841ac3f858f65f@localhost.localdomain> ctetreau updated this revision to Diff 276810. ctetreau added a comment. account for abandoned D82329 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82210/new/ https://reviews.llvm.org/D82210 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/ExpandReductions.cpp llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/InterleavedAccessPass.cpp llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp llvm/lib/CodeGen/LowLevelType.cpp llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/CodeGen/ValueTypes.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82210.276810.patch Type: text/x-patch Size: 11591 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:43:51 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:43:51 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <66e66d4eb3c8296bef8d9c086cd13e5a@localhost.localdomain> lebedev.ri added a comment. @dblaikie does this look like the wording you'd expect? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 From llvm-commits at lists.llvm.org Thu Jul 9 12:43:53 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via llvm-commits) Date: Thu, 09 Jul 2020 12:43:53 -0700 (PDT) Subject: [llvm] ff5b9a7 - [SVE] Remove calls to VectorType::getNumElements from CodeGen Message-ID: <5f077379.1c69fb81.3b07c.83f6@mx.google.com> Author: Christopher Tetreault Date: 2020-07-09T12:43:36-07:00 New Revision: ff5b9a7b3b2736db02c6550bb4eae84ae65e294c URL: https://github.com/llvm/llvm-project/commit/ff5b9a7b3b2736db02c6550bb4eae84ae65e294c DIFF: https://github.com/llvm/llvm-project/commit/ff5b9a7b3b2736db02c6550bb4eae84ae65e294c.diff LOG: [SVE] Remove calls to VectorType::getNumElements from CodeGen Reviewers: efriedma, fpetrogalli, sdesmalen, RKSimon, arsenm Reviewed By: RKSimon Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82210 Added: Modified: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/ExpandReductions.cpp llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/InterleavedAccessPass.cpp llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp llvm/lib/CodeGen/LowLevelType.cpp llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp 
llvm/lib/CodeGen/ValueTypes.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp index 5fe8a092797b..e8b8e6c93cf0 100644 --- a/llvm/lib/CodeGen/CodeGenPrepare.cpp +++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -5329,7 +5329,7 @@ bool CodeGenPrepare::optimizeGatherScatterInst(Instruction *MemoryInst, if (!RewriteGEP && Ops.size() == 2) return false; - unsigned NumElts = cast<VectorType>(Ptr->getType())->getNumElements(); + unsigned NumElts = cast<FixedVectorType>(Ptr->getType())->getNumElements(); IRBuilder<> Builder(MemoryInst); @@ -6628,7 +6628,7 @@ bool CodeGenPrepare::optimizeShuffleVectorInst(ShuffleVectorInst *SVI) { if (!NewType) return false; - VectorType *SVIVecType = cast<VectorType>(SVI->getType()); + auto *SVIVecType = cast<FixedVectorType>(SVI->getType()); assert(!NewType->isVectorTy() && "Expected a scalar type!"); assert(NewType->getScalarSizeInBits() == SVIVecType->getScalarSizeInBits() && "Expected a type of the same size!"); diff --git a/llvm/lib/CodeGen/ExpandReductions.cpp b/llvm/lib/CodeGen/ExpandReductions.cpp index 294edc49fd40..45f21c1085dd 100644 --- a/llvm/lib/CodeGen/ExpandReductions.cpp +++ b/llvm/lib/CodeGen/ExpandReductions.cpp @@ -125,7 +125,8 @@ bool expandReductions(Function &F, const TargetTransformInfo *TTI) { if (!FMF.allowReassoc()) Rdx = getOrderedReduction(Builder, Acc, Vec, getOpcode(ID), MRK); else { - if (!isPowerOf2_32(cast<VectorType>(Vec->getType())->getNumElements())) + if (!isPowerOf2_32( + cast<FixedVectorType>(Vec->getType())->getNumElements())) continue; Rdx = getShuffleReduction(Builder, Vec, getOpcode(ID), MRK); @@ -146,7 +147,8 @@ bool expandReductions(Function &F, const TargetTransformInfo *TTI) { case Intrinsic::experimental_vector_reduce_fmax: case Intrinsic::experimental_vector_reduce_fmin: { Value *Vec = II->getArgOperand(0); - if (!isPowerOf2_32(cast<VectorType>(Vec->getType())->getNumElements())) + if (!isPowerOf2_32( + cast<FixedVectorType>(Vec->getType())->getNumElements())) continue; Rdx =
getShuffleReduction(Builder, Vec, getOpcode(ID), MRK); diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp index 0171d6cb18ca..bbdefe3e5ca4 100644 --- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp +++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp @@ -1059,7 +1059,7 @@ bool IRTranslator::translateGetElementPtr(const User &U, // splat vector. unsigned VectorWidth = 0; if (auto *VT = dyn_cast<VectorType>(U.getType())) - VectorWidth = VT->getNumElements(); + VectorWidth = cast<FixedVectorType>(VT)->getNumElements(); // We might need to splat the base pointer into a vector if the offsets // are vectors. @@ -1946,7 +1946,7 @@ bool IRTranslator::translateInsertElement(const User &U, MachineIRBuilder &MIRBuilder) { // If it is a <1 x Ty> vector, use the scalar as it is // not a legal vector type in LLT. - if (cast<VectorType>(U.getType())->getNumElements() == 1) + if (cast<FixedVectorType>(U.getType())->getNumElements() == 1) return translateCopy(U, *U.getOperand(1), MIRBuilder); Register Res = getOrCreateVReg(U); @@ -1961,7 +1961,7 @@ bool IRTranslator::translateExtractElement(const User &U, MachineIRBuilder &MIRBuilder) { // If it is a <1 x Ty> vector, use the scalar as it is // not a legal vector type in LLT.
- if (cast<VectorType>(U.getOperand(0)->getType())->getNumElements() == 1) + if (cast<FixedVectorType>(U.getOperand(0)->getType())->getNumElements() == 1) return translateCopy(U, *U.getOperand(0), MIRBuilder); Register Res = getOrCreateVReg(U); diff --git a/llvm/lib/CodeGen/InterleavedAccessPass.cpp b/llvm/lib/CodeGen/InterleavedAccessPass.cpp index d9ea1cac9574..c4d83547a06c 100644 --- a/llvm/lib/CodeGen/InterleavedAccessPass.cpp +++ b/llvm/lib/CodeGen/InterleavedAccessPass.cpp @@ -280,7 +280,7 @@ static bool isReInterleaveMask(ArrayRef<int> Mask, unsigned &Factor, bool InterleavedAccess::lowerInterleavedLoad( LoadInst *LI, SmallVector &DeadInsts) { - if (!LI->isSimple()) + if (!LI->isSimple() || isa<ScalableVectorType>(LI->getType())) return false; SmallVector Shuffles; @@ -308,7 +308,8 @@ bool InterleavedAccess::lowerInterleavedLoad( unsigned Factor, Index; - unsigned NumLoadElements = cast<VectorType>(LI->getType())->getNumElements(); + unsigned NumLoadElements = + cast<FixedVectorType>(LI->getType())->getNumElements(); // Check if the first shufflevector is DE-interleave shuffle. if (!isDeInterleaveMask(Shuffles[0]->getShuffleMask(), Factor, Index, MaxFactor, NumLoadElements)) @@ -421,13 +422,13 @@ bool InterleavedAccess::lowerInterleavedStore( return false; ShuffleVectorInst *SVI = dyn_cast<ShuffleVectorInst>(SI->getValueOperand()); - if (!SVI || !SVI->hasOneUse()) + if (!SVI || !SVI->hasOneUse() || isa<ScalableVectorType>(SVI->getType())) return false; // Check if the shufflevector is RE-interleave shuffle.
unsigned Factor; unsigned OpNumElts = - cast<VectorType>(SVI->getOperand(0)->getType())->getNumElements(); + cast<FixedVectorType>(SVI->getOperand(0)->getType())->getNumElements(); if (!isReInterleaveMask(SVI->getShuffleMask(), Factor, MaxFactor, OpNumElts)) return false; diff --git a/llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp b/llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp index 5b346aeffdbf..f7131926ee65 100644 --- a/llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp +++ b/llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp @@ -1200,7 +1200,8 @@ bool InterleavedLoadCombineImpl::combine(std::list &InterleavedLoad, IRBuilder<> Builder(InsertionPoint); Type *ETy = InterleavedLoad.front().SVI->getType()->getElementType(); unsigned ElementsPerSVI = - InterleavedLoad.front().SVI->getType()->getNumElements(); + cast<FixedVectorType>(InterleavedLoad.front().SVI->getType()) + ->getNumElements(); FixedVectorType *ILTy = FixedVectorType::get(ETy, Factor * ElementsPerSVI); SmallVector Indices; diff --git a/llvm/lib/CodeGen/LowLevelType.cpp b/llvm/lib/CodeGen/LowLevelType.cpp index 40dfa696a2b9..33752a1f9230 100644 --- a/llvm/lib/CodeGen/LowLevelType.cpp +++ b/llvm/lib/CodeGen/LowLevelType.cpp @@ -19,7 +19,7 @@ using namespace llvm; LLT llvm::getLLTForType(Type &Ty, const DataLayout &DL) { if (auto VTy = dyn_cast<VectorType>(&Ty)) { - auto NumElements = VTy->getNumElements(); + auto NumElements = cast<FixedVectorType>(VTy)->getNumElements(); LLT ScalarTy = getLLTForType(*VTy->getElementType(), DL); if (NumElements == 1) return ScalarTy; diff --git a/llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp b/llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp index 04772d2e0709..c93b29617438 100644 --- a/llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp +++ b/llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp @@ -83,7 +83,7 @@ static bool isConstantIntVector(Value *Mask) { if (!C) return false; - unsigned NumElts = cast<VectorType>(Mask->getType())->getNumElements(); + unsigned NumElts = cast<FixedVectorType>(Mask->getType())->getNumElements(); for (unsigned i = 0; i != NumElts; ++i) { Constant *CElt =
C->getAggregateElement(i); if (!CElt || !isa<ConstantInt>(CElt)) @@ -132,7 +132,7 @@ static void scalarizeMaskedLoad(CallInst *CI, bool &ModifiedDT) { Value *Src0 = CI->getArgOperand(3); const Align AlignVal = cast<ConstantInt>(Alignment)->getAlignValue(); - VectorType *VecType = cast<VectorType>(CI->getType()); + VectorType *VecType = cast<FixedVectorType>(CI->getType()); Type *EltTy = VecType->getElementType(); @@ -158,7 +158,7 @@ static void scalarizeMaskedLoad(CallInst *CI, bool &ModifiedDT) { Type *NewPtrType = EltTy->getPointerTo(Ptr->getType()->getPointerAddressSpace()); Value *FirstEltPtr = Builder.CreateBitCast(Ptr, NewPtrType); - unsigned VectorWidth = VecType->getNumElements(); + unsigned VectorWidth = cast<FixedVectorType>(VecType)->getNumElements(); // The result vector Value *VResult = Src0; @@ -271,7 +271,7 @@ static void scalarizeMaskedStore(CallInst *CI, bool &ModifiedDT) { Value *Mask = CI->getArgOperand(3); const Align AlignVal = cast<ConstantInt>(Alignment)->getAlignValue(); - VectorType *VecType = cast<VectorType>(Src->getType()); + auto *VecType = cast<FixedVectorType>(Src->getType()); Type *EltTy = VecType->getElementType(); @@ -295,7 +295,7 @@ static void scalarizeMaskedStore(CallInst *CI, bool &ModifiedDT) { Type *NewPtrType = EltTy->getPointerTo(Ptr->getType()->getPointerAddressSpace()); Value *FirstEltPtr = Builder.CreateBitCast(Ptr, NewPtrType); - unsigned VectorWidth = VecType->getNumElements(); + unsigned VectorWidth = cast<FixedVectorType>(VecType)->getNumElements(); if (isConstantIntVector(Mask)) { for (unsigned Idx = 0; Idx < VectorWidth; ++Idx) { @@ -396,7 +396,7 @@ static void scalarizeMaskedGather(CallInst *CI, bool &ModifiedDT) { Value *Mask = CI->getArgOperand(2); Value *Src0 = CI->getArgOperand(3); - VectorType *VecType = cast<VectorType>(CI->getType()); + auto *VecType = cast<FixedVectorType>(CI->getType()); Type *EltTy = VecType->getElementType(); IRBuilder<> Builder(CI->getContext()); @@ -520,8 +520,8 @@ static void scalarizeMaskedScatter(CallInst *CI, bool &ModifiedDT) { Value *Alignment = CI->getArgOperand(2); Value *Mask = CI->getArgOperand(3); - assert(isa<VectorType>(Src->getType()) && -
"Unexpected data type in masked scatter intrinsic"); + auto *SrcFVTy = cast(Src->getType()); + assert( isa(Ptrs->getType()) && isa(cast(Ptrs->getType())->getElementType()) && @@ -534,7 +534,7 @@ static void scalarizeMaskedScatter(CallInst *CI, bool &ModifiedDT) { Builder.SetCurrentDebugLocation(CI->getDebugLoc()); MaybeAlign AlignVal = cast(Alignment)->getMaybeAlignValue(); - unsigned VectorWidth = cast(Src->getType())->getNumElements(); + unsigned VectorWidth = SrcFVTy->getNumElements(); // Shorten the way if the mask is a vector of constants. if (isConstantIntVector(Mask)) { @@ -605,7 +605,7 @@ static void scalarizeMaskedExpandLoad(CallInst *CI, bool &ModifiedDT) { Value *Mask = CI->getArgOperand(1); Value *PassThru = CI->getArgOperand(2); - VectorType *VecType = cast(CI->getType()); + auto *VecType = cast(CI->getType()); Type *EltTy = VecType->getElementType(); @@ -718,7 +718,7 @@ static void scalarizeMaskedCompressStore(CallInst *CI, bool &ModifiedDT) { Value *Ptr = CI->getArgOperand(1); Value *Mask = CI->getArgOperand(2); - VectorType *VecType = cast(Src->getType()); + auto *VecType = cast(Src->getType()); IRBuilder<> Builder(CI->getContext()); Instruction *InsertPt = CI; diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index f5e12101c8e9..c8b72abb9b7d 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -4295,7 +4295,7 @@ static bool getUniformBase(const Value *Ptr, SDValue &Base, SDValue &Index, Base = SDB->getValue(C); - unsigned NumElts = cast(Ptr->getType())->getNumElements(); + unsigned NumElts = cast(Ptr->getType())->getNumElements(); EVT VT = EVT::getVectorVT(*DAG.getContext(), TLI.getPointerTy(DL), NumElts); Index = DAG.getConstant(0, SDB->getCurSDLoc(), VT); IndexType = ISD::SIGNED_SCALED; diff --git a/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp b/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp 
index eef5c1463fde..27bebe503ce6 100644 --- a/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp +++ b/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp @@ -1765,7 +1765,7 @@ static std::string scalarConstantToHexString(const Constant *C) { } else { unsigned NumElements; if (auto *VTy = dyn_cast(Ty)) - NumElements = VTy->getNumElements(); + NumElements = cast(VTy)->getNumElements(); else NumElements = Ty->getArrayNumElements(); std::string HexString; diff --git a/llvm/lib/CodeGen/ValueTypes.cpp b/llvm/lib/CodeGen/ValueTypes.cpp index 2ff596629a7e..66bcdd9b2c4a 100644 --- a/llvm/lib/CodeGen/ValueTypes.cpp +++ b/llvm/lib/CodeGen/ValueTypes.cpp @@ -122,7 +122,14 @@ EVT EVT::getExtendedVectorElementType() const { unsigned EVT::getExtendedVectorNumElements() const { assert(isExtended() && "Type is not extended!"); - return cast(LLVMTy)->getNumElements(); + ElementCount EC = cast(LLVMTy)->getElementCount(); + if (EC.Scalable) { + WithColor::warning() + << "The code that requested the fixed number of elements has made the " + "assumption that this vector is not scalable. This assumption was " + "not correct, and this may lead to broken code\n"; + } + return EC.Min; } ElementCount EVT::getExtendedVectorElementCount() const { From llvm-commits at lists.llvm.org Thu Jul 9 12:43:55 2020 From: llvm-commits at lists.llvm.org (Christopher Tetreault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:43:55 +0000 (UTC) Subject: [PATCH] D82210: [SVE] Remove calls to VectorType::getNumElements from CodeGen In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGff5b9a7b3b27: [SVE] Remove calls to VectorType::getNumElements from CodeGen (authored by ctetreau). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82210/new/ https://reviews.llvm.org/D82210 Files: llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/ExpandReductions.cpp llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/InterleavedAccessPass.cpp llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp llvm/lib/CodeGen/LowLevelType.cpp llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/CodeGen/ValueTypes.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82210.276811.patch Type: text/x-patch Size: 11591 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 12:59:49 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 19:59:49 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <79cb127463285c3c64781cfadfb170b6@localhost.localdomain> dblaikie added a comment. In D83431#2142420 , @lebedev.ri wrote: > @dblaikie does this look like the wording you'd expect? Yep, looks alright, thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 From llvm-commits at lists.llvm.org Thu Jul 9 13:00:25 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:00:25 +0000 (UTC) Subject: [PATCH] D83247: [compiler-rt][asan][hwasan] Refactor shadow setup into sanitizer_common (NFCI) In-Reply-To: References: Message-ID: <002258a9372504af3c084523c5360ea3@localhost.localdomain> tejohnson marked 6 inline comments as done. tejohnson added inline comments. 
================
Comment at: compiler-rt/lib/asan/asan_linux.cpp:103
 uptr FindDynamicShadowStart() {
+  uptr shadow_size_bytes = GetHighMemEnd(SHADOW_SCALE) >> SHADOW_SCALE;
 #if ASAN_PREMAP_SHADOW
----------------
vitalybuka wrote:
> MemToShadowSize(GetHighMemEnd(SHADOW_SCALE))

The MemToShadowSize facility currently only exists in hwasan, let me go ahead and add it to asan as well (I see I accidentally started referencing it in the asan_mac/win code!).


================
Comment at: compiler-rt/lib/asan/asan_linux.cpp:122
-  uptr granularity = GetMmapGranularity();
-  uptr alignment = granularity * 8;
-  uptr left_padding = granularity;
----------------
vitalybuka wrote:
> kcc wrote:
> > tejohnson wrote:
> > > The code in asan is multiplying the mmap granularity by 8, whereas the hwasan version shifts it by kShadowScale. I wasn't sure if the 8 here is supposed to be equivalent to a left shift by the shadow scale (which is typically 3 in asan), or is specifically hardcoded separately not using SHADOW_SCALE since it could be something other than 3 in some cases (e.g. 5 for myriad, or user set via ASAN_SHADOW_SCALE). Depending on what was intended here, I would keep the hardcoding of "3" passed to my refactored MapDynamicShadow, or change that to SHADOW_SCALE.
> > I frankly don't remember :(
> It should be SHADOW_SCALE, myriad works only because it does not use dynamic shadow

Ok thanks, I'll change the code to use that.


================
Comment at: compiler-rt/lib/hwasan/hwasan.cpp:289
-  MadviseShadow();
-
----------------
vitalybuka wrote:
> why it's gone?

It is now embedded in the refactored code (similar to the way asan was doing the madvise calls). See ReserveShadowMemoryRange (the refactored one and the original asan one).
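For the asan_linux.cpp:122 thread above, the two alignment formulas agree only when the shadow scale is 3. A small sketch of that arithmetic, assuming a 4096-byte mmap granularity (an illustrative value, not taken from the review) and using hypothetical helper names:

```cpp
#include <cstdint>

using uptr = std::uintptr_t;

// Old asan form quoted in the thread: granularity times a hard-coded 8.
constexpr uptr alignmentHardcoded(uptr granularity) { return granularity * 8; }

// hwasan-style form: shift the granularity by the shadow scale.
constexpr uptr alignmentScaled(uptr granularity, unsigned shadow_scale) {
  return granularity << shadow_scale;
}

// With the typical asan scale of 3 both forms give the same alignment.
static_assert(alignmentHardcoded(4096) == alignmentScaled(4096, 3),
              "same result at shadow scale 3");
// With a scale of 5 (the myriad case mentioned above) they diverge.
static_assert(alignmentHardcoded(4096) != alignmentScaled(4096, 5),
              "different result at shadow scale 5");
```

This is why the choice between a literal 3 and SHADOW_SCALE matters only for configurations with a non-default scale.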
================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_common.h:123
 void MprotectMallocZones(void *addr, int prot);
 
+// Get the max address, taking into account alignment due to the mmap
----------------
vitalybuka wrote:
> Shadow is specific to only some sanitizers, so I don't like to have it in /sanitizer_common/
> Also we have msan/tsan/dfsan with different shadows for which is not clear if we can reuse these functions without exposing more particular sanitizer details. Maybe keeping all shadow code with some redundancy but independent is going to be easier to maintain in long term.
>
> Anyway, if we still go this way, maybe we put code into sanitizer_common/sanitizer_shadow_* files ?

> Also we have msan/tsan/dfsan with different shadows for which is not clear if we can reuse these functions without exposing more particular sanitizer details. Maybe keeping all shadow code with some redundancy but independent is going to be easier to maintain in long term.

I started to look at trying to coalesce some of the other *san shadow setup code with the *asan versions, but decided to punt on that, at least for now. There are a lot of things that looked similar there too, but not as much as in *asan. As I mentioned in my reply on the RFC just now, it seemed like a good idea to at least common the *asan versions since they were structured somewhat different but essentially functionally the same in almost every way, and it seemed confusing to have multiple versions that aren't obviously duplicates but in fact are.

That being said, I could have the new heap profiler duplicate them (at least the parts that I need) if that is much preferred. I don't technically need ProtectGap for the heap profiler, but it seems like a good thing to use at least while debugging the instrumentation code.

> Anyway, if we still go this way, maybe we put code into sanitizer_common/sanitizer_shadow_* files ?

I can do that.
There is a small amount of shadow setup code already here, I could move that too.


================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cpp:204
+  const uptr granularity = GetMmapGranularity();
+  const uptr alignment = shadow_base_alignment
+                             ? 1ULL << shadow_base_alignment
----------------
vitalybuka wrote:
> I think it's going to be cleaner if we replace
> uptr mmap_alignment_scale, uptr shadow_base_alignment
> with
> uptr shadow_scale, uptr min_shadow_base_alignment
> and adjust calculations accordingly:
> const uptr alignment = max(granularity << shadow_scale, min_shadow_base_alignment)
>
> it should be copied into mac and win, even if they use 0 there, for consistency

Sounds good


================
Comment at: compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp:1083
+            largest_gap_found, max_occupied_addr);
+    uptr new_max_vm = RoundDownTo(largest_gap_found << SHADOW_SCALE, alignment);
+    if (new_max_vm < max_occupied_addr) {
----------------
vitalybuka wrote:
> SHADOW_SCALE is undefined here

For this and other comments on the win/mac version, obviously I didn't do a very careful job of moving this code from asan to sanitizer_common. =( Will work on fixing those and need to find a way to at least compile these codes for those platforms to flush out these issues. If you have any pointers on setting up a cross compile of this code on a linux system please let me know.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83247/new/

https://reviews.llvm.org/D83247

From llvm-commits at lists.llvm.org Thu Jul 9 13:03:32 2020
From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 20:03:32 +0000 (UTC)
Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM
In-Reply-To:
References:
Message-ID:

zequanwu updated this revision to Diff 276813.
zequanwu added a comment.

rebase.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 Files: clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/new-pm-cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83013.276813.patch Type: text/x-patch Size: 17602 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:03:52 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via llvm-commits) Date: Thu, 09 Jul 2020 13:03:52 -0700 (PDT) Subject: [llvm] c92a8c0 - [LPM] Port CGProfilePass from NPM to LPM Message-ID: <5f077828.1c69fb81.84699.8d6f@mx.google.com> Author: Zequan Wu Date: 2020-07-09T13:03:42-07:00 New Revision: c92a8c0a0f68fbbb23e3fdde071007e63a552e82 URL: https://github.com/llvm/llvm-project/commit/c92a8c0a0f68fbbb23e3fdde071007e63a552e82 DIFF: https://github.com/llvm/llvm-project/commit/c92a8c0a0f68fbbb23e3fdde071007e63a552e82.diff LOG: [LPM] Port CGProfilePass from NPM to LPM Reviewers: hans, chandlerc!, asbirlea, nikic Reviewed By: hans, nikic Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D83013 Added: Modified: 
clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll Removed: llvm/test/Other/new-pm-cgprofile.ll ################################################################################ diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def index d465e00d4c70..f3e43919eeca 100644 --- a/clang/include/clang/Basic/CodeGenOptions.def +++ b/clang/include/clang/Basic/CodeGenOptions.def @@ -252,7 +252,6 @@ CODEGENOPT(UnwindTables , 1, 0) ///< Emit unwind tables. CODEGENOPT(VectorizeLoop , 1, 0) ///< Run loop vectorizer. CODEGENOPT(VectorizeSLP , 1, 0) ///< Run SLP vectorizer. CODEGENOPT(ProfileSampleAccurate, 1, 0) ///< Sample profile is accurate. -CODEGENOPT(CallGraphProfile , 1, 0) ///< Run call graph profile. /// Attempt to use register sized accesses to bit-fields in structures, when /// possible. 
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index 9e6d5e4593d3..3ada1aaa4ed8 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -620,6 +620,7 @@ void EmitAssemblyHelper::CreatePasses(legacy::PassManager &MPM, PMBuilder.SizeLevel = CodeGenOpts.OptimizeSize; PMBuilder.SLPVectorize = CodeGenOpts.VectorizeSLP; PMBuilder.LoopVectorize = CodeGenOpts.VectorizeLoop; + PMBuilder.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; PMBuilder.DisableUnrollLoops = !CodeGenOpts.UnrollLoops; // Loop interleaving in the loop vectorizer has historically been set to be @@ -1144,7 +1145,7 @@ void EmitAssemblyHelper::EmitAssemblyWithNewPassManager( PTO.LoopInterleaving = CodeGenOpts.UnrollLoops; PTO.LoopVectorization = CodeGenOpts.VectorizeLoop; PTO.SLPVectorization = CodeGenOpts.VectorizeSLP; - PTO.CallGraphProfile = CodeGenOpts.CallGraphProfile; + PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; PTO.Coroutines = LangOpts.Coroutines; PassInstrumentationCallbacks PIC; @@ -1562,7 +1563,7 @@ static void runThinLTOBackend( Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops; Conf.PTO.LoopVectorization = CGOpts.VectorizeLoop; Conf.PTO.SLPVectorization = CGOpts.VectorizeSLP; - Conf.PTO.CallGraphProfile = CGOpts.CallGraphProfile; + Conf.PTO.CallGraphProfile = !CGOpts.DisableIntegratedAS; // Context sensitive profile. 
if (CGOpts.hasProfileCSIRInstr()) { diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp index 6f6af917e3a3..fd34c6b8a955 100644 --- a/clang/lib/Frontend/CompilerInvocation.cpp +++ b/clang/lib/Frontend/CompilerInvocation.cpp @@ -860,7 +860,6 @@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK, Opts.RerollLoops = Args.hasArg(OPT_freroll_loops); Opts.DisableIntegratedAS = Args.hasArg(OPT_fno_integrated_as); - Opts.CallGraphProfile = !Opts.DisableIntegratedAS; Opts.Autolink = !Args.hasArg(OPT_fno_autolink); Opts.SampleProfileFile = std::string(Args.getLastArgValue(OPT_fprofile_sample_use_EQ)); diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index f0d5accf13c5..06e8507036ac 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -103,6 +103,7 @@ void initializeCFGViewerLegacyPassPass(PassRegistry&); void initializeCFIInstrInserterPass(PassRegistry&); void initializeCFLAndersAAWrapperPassPass(PassRegistry&); void initializeCFLSteensAAWrapperPassPass(PassRegistry&); +void initializeCGProfileLegacyPassPass(PassRegistry &); void initializeCallGraphDOTPrinterPass(PassRegistry&); void initializeCallGraphPrinterLegacyPassPass(PassRegistry&); void initializeCallGraphViewerPass(PassRegistry&); diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index 28e454d3b0fc..d1b9f269d5d4 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -282,6 +282,8 @@ ModulePass *createSampleProfileLoaderPass(StringRef Name); ModulePass *createWriteThinLTOBitcodePass(raw_ostream &Str, raw_ostream *ThinLinkOS = nullptr); +ModulePass *createCGProfileLegacyPass(); + } // End llvm namespace #endif diff --git a/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h b/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h index 8b03bcba10e4..a9928c3f5a40 100644 --- 
a/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h +++ b/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h @@ -156,6 +156,7 @@ class PassManagerBuilder { bool DisableTailCalls; bool DisableUnrollLoops; + bool CallGraphProfile; bool SLPVectorize; bool LoopVectorize; bool LoopsInterleaved; diff --git a/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h b/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h index 28fd3804dec9..4cb45fd42f80 100644 --- a/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h +++ b/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h @@ -19,11 +19,6 @@ namespace llvm { class CGProfilePass : public PassInfoMixin { public: PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM); - -private: - void addModuleFlags( - Module &M, - MapVector, uint64_t> &Counts) const; }; } // end namespace llvm diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 58510609cf5e..4d6c30b87a99 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -248,10 +248,6 @@ static cl::opt EnableCHR("enable-chr-npm", cl::init(true), cl::Hidden, cl::desc("Enable control height reduction optimization (CHR)")); -static cl::opt EnableCallGraphProfile( - "enable-npm-call-graph-profile", cl::init(true), cl::Hidden, - cl::desc("Enable call graph profile pass for the new PM (default = on)")); - /// Flag to enable inline deferral during PGO. 
static cl::opt EnablePGOInlineDeferral("enable-npm-pgo-inline-deferral", cl::init(true), @@ -267,7 +263,7 @@ PipelineTuningOptions::PipelineTuningOptions() { Coroutines = false; LicmMssaOptCap = SetLicmMssaOptCap; LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap; - CallGraphProfile = EnableCallGraphProfile; + CallGraphProfile = true; } extern cl::opt EnableHotColdSplit; diff --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp index 9534fb874107..b65eb469a492 100644 --- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp +++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -195,6 +195,7 @@ PassManagerBuilder::PassManagerBuilder() { PrepareForThinLTO = EnablePrepareForThinLTO; PerformThinLTO = EnablePerformThinLTO; DivergentTarget = false; + CallGraphProfile = true; } PassManagerBuilder::~PassManagerBuilder() { @@ -834,6 +835,10 @@ void PassManagerBuilder::populateModulePassManager( if (MergeFunctions) MPM.add(createMergeFunctionsPass()); + // Add Module flag "CG Profile" based on Branch Frequency Information. + if (CallGraphProfile) + MPM.add(createCGProfileLegacyPass()); + // LoopSink pass sinks instructions hoisted by LICM, which serves as a // canonicalization pass that enables other optimizations. 
As a result, // LoopSink pass needs to be a very late IR pass to avoid undoing LICM diff --git a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp index 2d5bd9570940..e95731a2117b 100644 --- a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp +++ b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp @@ -10,22 +10,48 @@ #include "llvm/ADT/MapVector.h" #include "llvm/Analysis/BlockFrequencyInfo.h" +#include "llvm/Analysis/LazyBlockFrequencyInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/IR/Constants.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/MDBuilder.h" #include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" #include "llvm/ProfileData/InstrProf.h" +#include "llvm/Transforms/IPO.h" #include "llvm/Transforms/Instrumentation.h" #include using namespace llvm; -PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { +static bool +addModuleFlags(Module &M, + MapVector, uint64_t> &Counts) { + if (Counts.empty()) + return false; + + LLVMContext &Context = M.getContext(); + MDBuilder MDB(Context); + std::vector Nodes; + + for (auto E : Counts) { + Metadata *Vals[] = {ValueAsMetadata::get(E.first.first), + ValueAsMetadata::get(E.first.second), + MDB.createConstant(ConstantInt::get( + Type::getInt64Ty(Context), E.second))}; + Nodes.push_back(MDNode::get(Context, Vals)); + } + + M.addModuleFlag(Module::Append, "CG Profile", MDNode::get(Context, Nodes)); + return true; +} + +static bool +runCGProfilePass(Module &M, + function_ref GetBFI, + function_ref GetTTI) { MapVector, uint64_t> Counts; - FunctionAnalysisManager &FAM = - MAM.getResult(M).getManager(); InstrProfSymtab Symtab; auto UpdateCounts = [&](TargetTransformInfo &TTI, Function *F, Function *CalledF, uint64_t NewCount) { @@ -35,14 +61,14 @@ PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { Count = SaturatingAdd(Count, NewCount); }; // Ignore error here. 
Indirect calls are ignored if this fails. - (void)(bool)Symtab.create(M); + (void)(bool) Symtab.create(M); for (auto &F : M) { - if (F.isDeclaration()) + if (F.isDeclaration() || !F.getEntryCount()) continue; - auto &BFI = FAM.getResult(F); + auto &BFI = GetBFI(F); if (BFI.getEntryFreq() == 0) continue; - TargetTransformInfo &TTI = FAM.getResult(F); + TargetTransformInfo &TTI = GetTTI(F); for (auto &BB : F) { Optional BBCount = BFI.getBlockProfileCount(&BB); if (!BBCount) @@ -69,28 +95,56 @@ PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { } } - addModuleFlags(M, Counts); - - return PreservedAnalyses::all(); + return addModuleFlags(M, Counts); } -void CGProfilePass::addModuleFlags( - Module &M, - MapVector, uint64_t> &Counts) const { - if (Counts.empty()) - return; +namespace { +struct CGProfileLegacyPass final : public ModulePass { + static char ID; + CGProfileLegacyPass() : ModulePass(ID) { + initializeCGProfileLegacyPassPass(*PassRegistry::getPassRegistry()); + } - LLVMContext &Context = M.getContext(); - MDBuilder MDB(Context); - std::vector Nodes; + void getAnalysisUsage(AnalysisUsage &AU) const override { + AU.setPreservesCFG(); + AU.addRequired(); + AU.addRequired(); + } - for (auto E : Counts) { - Metadata *Vals[] = {ValueAsMetadata::get(E.first.first), - ValueAsMetadata::get(E.first.second), - MDB.createConstant(ConstantInt::get( - Type::getInt64Ty(Context), E.second))}; - Nodes.push_back(MDNode::get(Context, Vals)); + bool runOnModule(Module &M) override { + auto GetBFI = [this](Function &F) -> BlockFrequencyInfo & { + return this->getAnalysis(F).getBFI(); + }; + auto GetTTI = [this](Function &F) -> TargetTransformInfo & { + return this->getAnalysis().getTTI(F); + }; + + return runCGProfilePass(M, GetBFI, GetTTI); } +}; - M.addModuleFlag(Module::Append, "CG Profile", MDNode::get(Context, Nodes)); +} // namespace + +char CGProfileLegacyPass::ID = 0; + +INITIALIZE_PASS(CGProfileLegacyPass, "cg-profile", "Call Graph Profile", 
false, + false) + +ModulePass *llvm::createCGProfileLegacyPass() { + return new CGProfileLegacyPass(); +} + +PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { + FunctionAnalysisManager &FAM = + MAM.getResult(M).getManager(); + auto GetBFI = [&FAM](Function &F) -> BlockFrequencyInfo & { + return FAM.getResult(F); + }; + auto GetTTI = [&FAM](Function &F) -> TargetTransformInfo & { + return FAM.getResult(F); + }; + + runCGProfilePass(M, GetBFI, GetTTI); + + return PreservedAnalyses::all(); } diff --git a/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp b/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp index 64626225f23f..ad238f1357c6 100644 --- a/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp @@ -112,6 +112,7 @@ void llvm::initializeInstrumentation(PassRegistry &Registry) { initializePGOInstrumentationUseLegacyPassPass(Registry); initializePGOIndirectCallPromotionLegacyPassPass(Registry); initializePGOMemOPSizeOptLegacyPassPass(Registry); + initializeCGProfileLegacyPassPass(Registry); initializeInstrOrderFileLegacyPassPass(Registry); initializeInstrProfilingLegacyPassPass(Registry); initializeMemorySanitizerLegacyPassPass(Registry); diff --git a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll index 32d36f4e7280..85f9d8c867bf 100644 --- a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll +++ b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll @@ -276,6 +276,12 @@ ; GCN-O1-NEXT: Warn about non-applied transformations ; GCN-O1-NEXT: Alignment from assumptions ; GCN-O1-NEXT: Strip Unused Function Prototypes +; GCN-O1-NEXT: Call Graph Profile +; GCN-O1-NEXT: FunctionPass Manager +; GCN-O1-NEXT: Dominator Tree Construction +; GCN-O1-NEXT: Natural Loop Information +; GCN-O1-NEXT: Lazy Branch Probability Analysis +; GCN-O1-NEXT: Lazy Block Frequency Analysis ; GCN-O1-NEXT: FunctionPass Manager ; GCN-O1-NEXT: Dominator Tree Construction ; 
GCN-O1-NEXT: Natural Loop Information @@ -623,6 +629,12 @@ ; GCN-O2-NEXT: Strip Unused Function Prototypes ; GCN-O2-NEXT: Dead Global Elimination ; GCN-O2-NEXT: Merge Duplicate Global Constants +; GCN-O2-NEXT: Call Graph Profile +; GCN-O2-NEXT: FunctionPass Manager +; GCN-O2-NEXT: Dominator Tree Construction +; GCN-O2-NEXT: Natural Loop Information +; GCN-O2-NEXT: Lazy Branch Probability Analysis +; GCN-O2-NEXT: Lazy Block Frequency Analysis ; GCN-O2-NEXT: FunctionPass Manager ; GCN-O2-NEXT: Dominator Tree Construction ; GCN-O2-NEXT: Natural Loop Information @@ -975,6 +987,12 @@ ; GCN-O3-NEXT: Strip Unused Function Prototypes ; GCN-O3-NEXT: Dead Global Elimination ; GCN-O3-NEXT: Merge Duplicate Global Constants +; GCN-O3-NEXT: Call Graph Profile +; GCN-O3-NEXT: FunctionPass Manager +; GCN-O3-NEXT: Dominator Tree Construction +; GCN-O3-NEXT: Natural Loop Information +; GCN-O3-NEXT: Lazy Branch Probability Analysis +; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: FunctionPass Manager ; GCN-O3-NEXT: Dominator Tree Construction ; GCN-O3-NEXT: Natural Loop Information diff --git a/llvm/test/Instrumentation/cgprofile.ll b/llvm/test/Instrumentation/cgprofile.ll index 1edf3b6ec518..70a1f81aa53e 100644 --- a/llvm/test/Instrumentation/cgprofile.ll +++ b/llvm/test/Instrumentation/cgprofile.ll @@ -1,4 +1,5 @@ ; RUN: opt < %s -passes cg-profile -S | FileCheck %s +; RUN: opt < %s -cg-profile -S | FileCheck %s declare void @b() diff --git a/llvm/test/Other/new-pm-cgprofile.ll b/llvm/test/Other/new-pm-cgprofile.ll deleted file mode 100644 index c7fe31ab570f..000000000000 --- a/llvm/test/Other/new-pm-cgprofile.ll +++ /dev/null @@ -1,11 +0,0 @@ -; RUN: opt -debug-pass-manager -passes='default' %s 2>&1 |FileCheck %s --check-prefixes=DEFAULT -; RUN: opt -debug-pass-manager -passes='default' -enable-npm-call-graph-profile=0 %s 2>&1 |FileCheck %s --check-prefixes=OFF -; RUN: opt -debug-pass-manager -passes='default' -enable-npm-call-graph-profile=1 %s 2>&1 |FileCheck %s 
--check-prefixes=ON -; -; DEFAULT: Running pass: CGProfilePass -; OFF-NOT: Running pass: CGProfilePass -; ON: Running pass: CGProfilePass - -define void @foo() { - ret void -} diff --git a/llvm/test/Other/opt-O2-pipeline.ll b/llvm/test/Other/opt-O2-pipeline.ll index ca72ec1f7567..56f85d0fb9a8 100644 --- a/llvm/test/Other/opt-O2-pipeline.ll +++ b/llvm/test/Other/opt-O2-pipeline.ll @@ -280,6 +280,12 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants +; CHECK-NEXT: Call Graph Profile +; CHECK-NEXT: FunctionPass Manager +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information diff --git a/llvm/test/Other/opt-O3-pipeline.ll b/llvm/test/Other/opt-O3-pipeline.ll index f629bfc3444b..942f7d9dfead 100644 --- a/llvm/test/Other/opt-O3-pipeline.ll +++ b/llvm/test/Other/opt-O3-pipeline.ll @@ -285,6 +285,12 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants +; CHECK-NEXT: Call Graph Profile +; CHECK-NEXT: FunctionPass Manager +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information diff --git a/llvm/test/Other/opt-Os-pipeline.ll b/llvm/test/Other/opt-Os-pipeline.ll index dde9fbeb9950..d975cc48b629 100644 --- a/llvm/test/Other/opt-Os-pipeline.ll +++ b/llvm/test/Other/opt-Os-pipeline.ll @@ -266,6 +266,12 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants +; CHECK-NEXT: Call 
Graph Profile +; CHECK-NEXT: FunctionPass Manager +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information From llvm-commits at lists.llvm.org Thu Jul 9 13:04:02 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:04:02 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <1499635f61bf6b379c79ea0be082e0bf@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGc92a8c0a0f68: [LPM] Port CGProfilePass from NPM to LPM (authored by zequanwu). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 Files: clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/new-pm-cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83013.276814.patch Type: text/x-patch Size: 17602 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:05:38 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:05:38 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <089e563fe2b0ce0dabbbf2aa97f285b0@localhost.localdomain> efriedma added a subscriber: tgt. efriedma added a comment. > that's fine but I still don't understand why the counterexample to my version says %x2 in @src can be undef If I'm understanding correctly, this reduces to something like the following: define i32 @src() { %x2 = freeze i32 undef ret i32 %x2 } define i32 @tgt() { ret i32 undef } This seems a little suspect, yes. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Thu Jul 9 13:05:58 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:05:58 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: <1445c351a71b3a7f49577bd8b3447cae@localhost.localdomain> hubert.reinterpretcast added inline comments. ================ Comment at: llvm/docs/CodingStandards.rst:1306 +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged, +unless the the callable object already exists. 
+ ---------------- Typo: s/the the/if the/; Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 From llvm-commits at lists.llvm.org Thu Jul 9 13:08:04 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:08:04 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: echristo added a comment. Some inline nits. I see you've already committed and that's fine - I still don't think we should do it, but we can delete it again soon :) ================ Comment at: clang/lib/CodeGen/BackendUtil.cpp:623 PMBuilder.LoopVectorize = CodeGenOpts.VectorizeLoop; + PMBuilder.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; ---------------- Comment here as to why. ================ Comment at: clang/lib/CodeGen/BackendUtil.cpp:1148 PTO.SLPVectorization = CodeGenOpts.VectorizeSLP; - PTO.CallGraphProfile = CodeGenOpts.CallGraphProfile; + PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; PTO.Coroutines = LangOpts.Coroutines; ---------------- Comment here as to why. ================ Comment at: clang/lib/CodeGen/BackendUtil.cpp:1566 Conf.PTO.SLPVectorization = CGOpts.VectorizeSLP; - Conf.PTO.CallGraphProfile = CGOpts.CallGraphProfile; + Conf.PTO.CallGraphProfile = !CGOpts.DisableIntegratedAS; ---------------- Ditto :) ================ Comment at: llvm/lib/Transforms/Instrumentation/CGProfile.cpp:64 // Ignore error here. Indirect calls are ignored if this fails. - (void)(bool)Symtab.create(M); + (void)(bool) Symtab.create(M); for (auto &F : M) { ---------------- Extra space? Did clang-format put this in? ================ Comment at: llvm/lib/Transforms/Instrumentation/CGProfile.cpp:66 for (auto &F : M) { - if (F.isDeclaration()) + if (F.isDeclaration() || !F.getEntryCount()) continue; ---------------- Comment? What's the change for? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Thu Jul 9 13:11:11 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Thu, 09 Jul 2020 13:11:11 -0700 (PDT) Subject: [llvm] caa423e - Revert "[InstCombine] Lower infinite combine loop detection thresholds" Message-ID: <5f0779df.1c69fb81.b1193.94f3@mx.google.com> Author: Roman Lebedev Date: 2020-07-09T23:10:42+03:00 New Revision: caa423eef0d128f35ac11ddbce34964caafb61c1 URL: https://github.com/llvm/llvm-project/commit/caa423eef0d128f35ac11ddbce34964caafb61c1 DIFF: https://github.com/llvm/llvm-project/commit/caa423eef0d128f35ac11ddbce34964caafb61c1.diff LOG: Revert "[InstCombine] Lower infinite combine loop detection thresholds" And just after 3 days, we have a hit in `InstCombiner::mergeStoreIntoSuccessor()`: https://bugs.llvm.org/show_bug.cgi?id=46661 To be recommitted once that is addressed. This reverts commit cd7f8051ac7b6f08734102446482c1e5d951bfcc. Added: Modified: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp index e810b3de25bc..d1c1e5418825 100644 --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -123,13 +123,8 @@ STATISTIC(NumReassoc , "Number of reassociations"); DEBUG_COUNTER(VisitCounter, "instcombine-visit", "Controls which instructions are visited"); -// FIXME: these limits eventually should be as low as 2. 
static constexpr unsigned InstCombineDefaultMaxIterations = 1000; -#ifndef NDEBUG -static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 100; -#else static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 1000; -#endif static cl::opt<bool> EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"), From llvm-commits at lists.llvm.org Thu Jul 9 13:11:13 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Thu, 09 Jul 2020 13:11:13 -0700 (PDT) Subject: [llvm] 29a9dd5 - [Docs] CodingStandards: for_each is discouraged Message-ID: <5f0779e1.1c69fb81.54cff.93d2@mx.google.com> Author: Roman Lebedev Date: 2020-07-09T23:10:42+03:00 New Revision: 29a9dd5bfe50be9b6aecbe95c6670734e5ee29c5 URL: https://github.com/llvm/llvm-project/commit/29a9dd5bfe50be9b6aecbe95c6670734e5ee29c5 DIFF: https://github.com/llvm/llvm-project/commit/29a9dd5bfe50be9b6aecbe95c6670734e5ee29c5.diff LOG: [Docs] CodingStandards: for_each is discouraged Summary: As per discussion in D83351, using `for_each` is potentially confusing, at least in regards to inconsistent style (there are fewer than 100 `for_each` usages in LLVM, but ~100.000 `for` range-based loops). Therefore, it should be avoided. Reviewers: dblaikie, nickdesaulniers Reviewed By: dblaikie, nickdesaulniers Subscribers: hubert.reinterpretcast, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83431 Added: Modified: llvm/docs/CodingStandards.rst Removed: ################################################################################ diff --git a/llvm/docs/CodingStandards.rst b/llvm/docs/CodingStandards.rst index 861ab05420fb..99fb6af02a28 100644 --- a/llvm/docs/CodingStandards.rst +++ b/llvm/docs/CodingStandards.rst @@ -1302,6 +1302,9 @@ loops wherever possible for all newly added code. For example: for (Instruction &I : *BB) ... use I ...
+Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged, +unless the the callable object already exists. + Don't evaluate ``end()`` every time through a loop ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From llvm-commits at lists.llvm.org Thu Jul 9 13:11:15 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Thu, 09 Jul 2020 13:11:15 -0700 (PDT) Subject: [llvm] d8bf5e8 - [NFCI][llvm-reduce] OperandBundleCounter: drop pointless constructor Message-ID: <5f0779e3.1c69fb81.ba785.94c1@mx.google.com> Author: Roman Lebedev Date: 2020-07-09T23:10:42+03:00 New Revision: d8bf5e8048dbd1f726d50b43fa4f8ed4fa9a5178 URL: https://github.com/llvm/llvm-project/commit/d8bf5e8048dbd1f726d50b43fa4f8ed4fa9a5178 DIFF: https://github.com/llvm/llvm-project/commit/d8bf5e8048dbd1f726d50b43fa4f8ed4fa9a5178.diff LOG: [NFCI][llvm-reduce] OperandBundleCounter: drop pointless constructor Reviewers: nickdesaulniers, dblaikie Reviewed By: nickdesaulniers Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83435 Added: Modified: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Removed: ################################################################################ diff --git a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp index 3f1cb3740813..cc2aebf46df1 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -67,8 +67,6 @@ struct OperandBundleCounter : public InstVisitor { /// How many features (in this case, operand bundles) did we count, total? int OperandBundeCount = 0; - OperandBundleCounter() {} - /// So far only CallBase sub-classes can have operand bundles. void visitCallBase(CallBase &Call) { // Just accumulate the total number of operand bundles. 
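For illustration only, the guideline added to CodingStandards.rst in 29a9dd5 can be sketched outside of LLVM roughly as follows. This is a minimal standalone sketch, not code from any of the patches in this thread: the names `sumWithForEach`, `sumWithRangeFor`, `clampNegativeToZero` and `clampAll` are hypothetical, and `std::for_each` from the standard library stands in for `llvm::for_each` so that the example does not depend on LLVM headers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: sums the values by handing a one-off lambda to
// std::for_each. This is the style the guideline discourages, because the
// callable exists only for this single call.
int sumWithForEach(const std::vector<int> &Values) {
  int Sum = 0;
  std::for_each(Values.begin(), Values.end(), [&Sum](int V) { Sum += V; });
  return Sum;
}

// Hypothetical helper: the same computation written as a range-based for
// loop, which is the form the guideline prefers.
int sumWithRangeFor(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}

// Hypothetical callable that already exists independently of any loop.
void clampNegativeToZero(int &V) {
  if (V < 0)
    V = 0;
}

// The exception named in the guideline: the callable object already exists,
// so passing it to std::for_each does not require writing a new lambda.
void clampAll(std::vector<int> &Values) {
  std::for_each(Values.begin(), Values.end(), clampNegativeToZero);
  assert(std::none_of(Values.begin(), Values.end(),
                      [](int V) { return V < 0; }) &&
         "All values should be non-negative after clamping.");
}
```

The two summing functions compute the same result; the only difference is the loop form, which is the kind of rewrite the follow-up llvm-reduce change applies to its existing `for_each` calls.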
From llvm-commits at lists.llvm.org Thu Jul 9 13:11:17 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Thu, 09 Jul 2020 13:11:17 -0700 (PDT) Subject: [llvm] 6b82441 - [NFC][llvm-reduce] Purify for_each usage in Operand Bundles into range-based for loop Message-ID: <5f0779e5.1c69fb81.c055f.8834@mx.google.com> Author: Roman Lebedev Date: 2020-07-09T23:10:43+03:00 New Revision: 6b824415a21c188adfcabbb61ac8cf5d44b8e236 URL: https://github.com/llvm/llvm-project/commit/6b824415a21c188adfcabbb61ac8cf5d44b8e236 DIFF: https://github.com/llvm/llvm-project/commit/6b824415a21c188adfcabbb61ac8cf5d44b8e236.diff LOG: [NFC][llvm-reduce] Purify for_each usage in Operand Bundles into range-based for loop Summary: As per lengthy/heated discussion in D83351, and CodingStandards D83431. Reviewers: dblaikie, nickdesaulniers Reviewed By: nickdesaulniers Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83434 Added: Modified: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Removed: ################################################################################ diff --git a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp index cc2aebf46df1..77cb73837c82 100644 --- a/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ b/llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -56,10 +56,9 @@ class OperandBundleRemapper : public InstVisitor<OperandBundleRemapper> { OperandBundlesToKeepIndexes.reserve(Call.getNumOperandBundles()); // Enumerate every operand bundle on this call. - for_each(seq(0U, Call.getNumOperandBundles()), [&](unsigned BundleIndex) { + for (unsigned BundleIndex : seq(0U, Call.getNumOperandBundles())) if (O.shouldKeep()) // Should we keep this one?
OperandBundlesToKeepIndexes.emplace_back(BundleIndex); - }); } }; @@ -102,9 +101,8 @@ static void extractOperandBundesFromModule(std::vector ChunksToKeep, OperandBundleRemapper R(ChunksToKeep); R.visit(Program); - for_each(R.CallsToRefine, [](const auto &P) { - return maybeRewriteCallWithDifferentBundles(P.first, P.second); - }); + for (const auto &I : R.CallsToRefine) + maybeRewriteCallWithDifferentBundles(I.first, I.second); } /// Counts the amount of operand bundles. From llvm-commits at lists.llvm.org Thu Jul 9 13:11:19 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Thu, 09 Jul 2020 13:11:19 -0700 (PDT) Subject: [llvm] 03640ee - [llvm-reduce] Reducing attributes Message-ID: <5f0779e7.1c69fb81.1b195.9502@mx.google.com> Author: Roman Lebedev Date: 2020-07-09T23:10:43+03:00 New Revision: 03640ee0fa73c6eaf8cb12050203027239136789 URL: https://github.com/llvm/llvm-project/commit/03640ee0fa73c6eaf8cb12050203027239136789 DIFF: https://github.com/llvm/llvm-project/commit/03640ee0fa73c6eaf8cb12050203027239136789.diff LOG: [llvm-reduce] Reducing attributes Summary: This handles all three places where attributes could currently be - `GlobalVariable`, `Function` and `CallBase`. For last two, it correctly handles all three possible attribute locations (return value, arguments and function itself) There was a previous attempt at it D73853, which was committed in rGfc62b36a000681c01e993242b583c5ec4ab48a3c, but then reverted all the way back in rGb12176d2aafa0ccb2585aa218fc3b454ba84f2a9 due to some (osx?) test failures. 
Reviewers: nickdesaulniers, dblaikie, diegotf, george.burgess.iv, jdoerfert, Tyker, arsenm Reviewed By: nickdesaulniers Subscribers: wdng, MaskRay, arsenm, llvm-commits, mgorny Tags: #llvm Differential Revision: https://reviews.llvm.org/D83351 Added: llvm/test/Reduce/remove-attributes-from-intrinsic-like-functions.ll llvm/test/Reduce/remove-attributes-from-intrinsics.ll llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h Modified: llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn Removed: ################################################################################ diff --git a/llvm/test/Reduce/remove-attributes-from-intrinsic-like-functions.ll b/llvm/test/Reduce/remove-attributes-from-intrinsic-like-functions.ll new file mode 100644 index 000000000000..60df12e94feb --- /dev/null +++ b/llvm/test/Reduce/remove-attributes-from-intrinsic-like-functions.ll @@ -0,0 +1,40 @@ +; Just because a function is named like an intrinsic does not mean we should skip it's attributes. 
+; +; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t +; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s + +; CHECK-ALL: declare i32 @llvm.not.really.an.intrinsic(i32, i32) #0 +declare i32 @llvm.not.really.an.intrinsic(i32, i32) #0 + +define i32 @t(i32 %a) { +; CHECK-ALL-LABEL: @t( + +; CHECK-INTERESTINGNESS: %r = +; CHECK-INTERESTINGNESS-SAME: call +; CHECK-INTERESTINGNESS-SAME: "arg0" +; CHECK-INTERESTINGNESS-SAME: i32 @llvm.not.really.an.intrinsic(i32 +; CHECK-INTERESTINGNESS-SAME: "arg3" +; CHECK-INTERESTINGNESS-SAME: %a +; CHECK-INTERESTINGNESS-SAME: i32 +; CHECK-INTERESTINGNESS-SAME: %a +; CHECK-INTERESTINGNESS-SAME: #1 + +; CHECK-FINAL: %r = call "arg0" i32 @llvm.not.really.an.intrinsic(i32 "arg3" %a, i32 %a) #1 +; CHECK-ALL: ret i32 %r + + %r = call "arg0" "arg1" i32 @llvm.not.really.an.intrinsic(i32 "arg2" "arg3" %a, i32 %a) "arg4" "arg5" + ret i32 %r +} + +; CHECK-INTERESTINGNESS: attributes #0 = { +; CHECK-INTERESTINGNESS-SAME: "arg6" + +; CHECK-INTERESTINGNESS: attributes #1 = { +; CHECK-INTERESTINGNESS-SAME: "arg4" + +; CHECK-FINAL: attributes #0 = { "arg6" } +; CHECK-FINAL: attributes #1 = { "arg4" } + +; CHECK-ALL-NOT: attributes # + +attributes #0 = { "arg6" "arg7" } diff --git a/llvm/test/Reduce/remove-attributes-from-intrinsics.ll b/llvm/test/Reduce/remove-attributes-from-intrinsics.ll new file mode 100644 index 000000000000..7a8a8f0eb114 --- /dev/null +++ b/llvm/test/Reduce/remove-attributes-from-intrinsics.ll @@ -0,0 +1,38 @@ +; We can't actually put attributes on intrinsic declarations, only on call sites. 
+; +; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t +; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s + +define i32 @t(i32 %a) { +; CHECK-ALL-LABEL: @t( + +; CHECK-INTERESTINGNESS: %r = +; CHECK-INTERESTINGNESS-SAME: call +; CHECK-INTERESTINGNESS-SAME: "arg0" +; CHECK-INTERESTINGNESS-SAME: i32 @llvm.uadd.sat.i32(i32 +; CHECK-INTERESTINGNESS-SAME: "arg3" +; CHECK-INTERESTINGNESS-SAME: %a +; CHECK-INTERESTINGNESS-SAME: i32 +; CHECK-INTERESTINGNESS-SAME: %a +; CHECK-INTERESTINGNESS-SAME: #1 + +; CHECK-FINAL: %r = call "arg0" i32 @llvm.uadd.sat.i32(i32 "arg3" %a, i32 %a) #1 +; CHECK-ALL: ret i32 %r + + %r = call "arg0" "arg1" i32 @llvm.uadd.sat.i32(i32 "arg2" "arg3" %a, i32 %a) "arg4" "arg5" + ret i32 %r +} + +; CHECK-ALL: declare i32 @llvm.uadd.sat.i32(i32, i32) #0 +declare i32 @llvm.uadd.sat.i32(i32, i32) #0 + +; CHECK-ALL: attributes #0 = { nounwind readnone speculatable willreturn } + +; CHECK-INTERESTINGNESS: attributes #1 = { +; CHECK-INTERESTINGNESS-SAME: "arg4" + +; CHECK-FINAL: attributes #1 = { "arg4" } + +; CHECK-ALL-NOT: attributes # + +attributes #0 = { "arg6" "arg7" } diff --git a/llvm/test/Reduce/remove-call-site-attributes.ll b/llvm/test/Reduce/remove-call-site-attributes.ll new file mode 100644 index 000000000000..e8f50355812a --- /dev/null +++ b/llvm/test/Reduce/remove-call-site-attributes.ll @@ -0,0 +1,38 @@ +; Test that llvm-reduce can remove uninteresting operand bundles from calls. 
+; +; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t +; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s + +; CHECK-ALL: declare i32 @f1(i32, i32) +declare i32 @f1(i32, i32) + +; CHECK-FINAL-LABEL: define i32 @interesting(i32 %arg0, i32 %arg1) { +; CHECK-FINAL-NEXT: entry: +; CHECK-FINAL-NEXT: %r = call "attr0" i32 @f1(i32 "attr4" %arg0, i32 %arg1) #0 +; CHECK-FINAL-NEXT: ret i32 %r +; CHECK-FINAL-NEXT: } +define i32 @interesting(i32 %arg0, i32 %arg1) { +entry: +; CHECK-INTERESTINGNESS-LABEL: @interesting( + +; CHECK-INTERESTINGNESS: %r = call +; CHECK-INTERESTINGNESS-SAME: "attr0" +; CHECK-INTERESTINGNESS-SAME: i32 @f1( +; CHECK-INTERESTINGNESS-SAME: i32 +; CHECK-INTERESTINGNESS-SAME: "attr4" +; CHECK-INTERESTINGNESS-SAME: %arg0 +; CHECK-INTERESTINGNESS-SAME: i32 +; CHECK-INTERESTINGNESS-SAME: %arg1 +; CHECK-INTERESTINGNESS-SAME: #0 +; CHECK-INTERESTINGNESS: ret i32 %r + + %r = call "attr0" "attr1" "attr2" i32 @f1(i32 "attr3" "attr4" "attr5" %arg0, i32 "attr6" "attr7" "attr8" %arg1) #0 + ret i32 %r +} + +; CHECK-INTERESTINGNESS: attributes #0 = { +; CHECK-INTERESTINGNESS-SAME: "attr10" + +; CHECK-FINAL: attributes #0 = { "attr10" } + +attributes #0 = { "attr9" "attr10" "attr11" } diff --git a/llvm/test/Reduce/remove-function-attributes.ll b/llvm/test/Reduce/remove-function-attributes.ll new file mode 100644 index 000000000000..52bbda36f332 --- /dev/null +++ b/llvm/test/Reduce/remove-function-attributes.ll @@ -0,0 +1,23 @@ +; Test that llvm-reduce can remove uninteresting attributes. 
+; +; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t +; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s + +; CHECK-INTERESTINGNESS: declare +; CHECK-INTERESTINGNESS-SAME: "attr0" +; CHECK-INTERESTINGNESS-SAME: void @f0 +; CHECK-INTERESTINGNESS-SAME: i32 +; CHECK-INTERESTINGNESS-SAME: i32 +; CHECK-INTERESTINGNESS-SAME: "attr6" +; CHECK-INTERESTINGNESS-SAME: #0 + +; CHECK-FINAL: declare "attr0" void @f0(i32, i32 "attr6") #0 + +declare "attr0" "attr1" "attr2" void @f0(i32 "attr3" "attr4" "attr5", i32 "attr6" "attr7" "attr8") #0 + +; CHECK-INTERESTINGNESS: attributes #0 = { +; CHECK-INTERESTINGNESS-SAME: "attr10" + +; CHECK-FINAL: attributes #0 = { "attr10" } + +attributes #0 = { "attr9" "attr10" "attr11" } diff --git a/llvm/test/Reduce/remove-global-variable-attributes.ll b/llvm/test/Reduce/remove-global-variable-attributes.ll new file mode 100644 index 000000000000..bec3afd960e9 --- /dev/null +++ b/llvm/test/Reduce/remove-global-variable-attributes.ll @@ -0,0 +1,27 @@ +; Test that llvm-reduce can remove uninteresting attributes. 
+; +; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=CHECK-ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t +; RUN: cat %t | FileCheck --check-prefixes=CHECK-ALL,CHECK-FINAL %s + +; CHECK-ALL: @gv0 = global i32 0 #0 +; CHECK-ALL-NEXT: @gv1 = global i32 0 #1 +; CHECK-ALL-NEXT: @gv2 = global i32 0 +@gv0 = global i32 0 #0 +@gv1 = global i32 0 #1 +@gv2 = global i32 0 #2 + +; CHECK-INTERESTINGNESS: attributes #0 = { +; CHECK-INTERESTINGNESS-SAME: "attr0" +; CHECK-INTERESTINGNESS-SAME: "attr2" + +; CHECK-INTERESTINGNESS-NEXT: attributes #1 = { +; CHECK-INTERESTINGNESS-SAME: "attr4" + +; CHECK-FINAL: attributes #0 = { "attr0" "attr2" } +; CHECK-FINAL-NEXT: attributes #1 = { "attr4" } + +; CHECK-FINAL-NOT: attributes #2 + +attributes #0 = { "attr0" "attr1" "attr2"} +attributes #1 = { "attr3" "attr4" "attr5"} +attributes #2 = { "attr6" "attr7" "attr8"} diff --git a/llvm/tools/llvm-reduce/CMakeLists.txt b/llvm/tools/llvm-reduce/CMakeLists.txt index 24eedac613f5..01b9d0b4afe1 100644 --- a/llvm/tools/llvm-reduce/CMakeLists.txt +++ b/llvm/tools/llvm-reduce/CMakeLists.txt @@ -14,6 +14,7 @@ add_llvm_tool(llvm-reduce TestRunner.cpp deltas/Delta.cpp deltas/ReduceArguments.cpp + deltas/ReduceAttributes.cpp deltas/ReduceBasicBlocks.cpp deltas/ReduceFunctions.cpp deltas/ReduceGlobalVars.cpp diff --git a/llvm/tools/llvm-reduce/DeltaManager.h b/llvm/tools/llvm-reduce/DeltaManager.h index 5635352b43d8..b1a4ee0df4db 100644 --- a/llvm/tools/llvm-reduce/DeltaManager.h +++ b/llvm/tools/llvm-reduce/DeltaManager.h @@ -14,6 +14,7 @@ #include "TestRunner.h" #include "deltas/Delta.h" #include "deltas/ReduceArguments.h" +#include "deltas/ReduceAttributes.h" #include "deltas/ReduceBasicBlocks.h" #include "deltas/ReduceFunctions.h" #include "deltas/ReduceGlobalVars.h" @@ -32,6 +33,7 @@ inline void runDeltaPasses(TestRunner &Tester) { reduceArgumentsDeltaPass(Tester); reduceInstructionsDeltaPass(Tester); reduceOperandBundesDeltaPass(Tester); +
reduceAttributesDeltaPass(Tester); // TODO: Implement the remaining Delta Passes } diff --git a/llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp b/llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp new file mode 100644 index 000000000000..cbaf5d5efd34 --- /dev/null +++ b/llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp @@ -0,0 +1,200 @@ +//===- ReduceAttributes.cpp - Specialized Delta Pass -------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This file implements a function which calls the Generic Delta pass in order +// to reduce uninteresting attributes. +// +//===----------------------------------------------------------------------===// + +#include "ReduceAttributes.h" +#include "Delta.h" +#include "TestRunner.h" +#include "llvm/ADT/ArrayRef.h" +#include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/Sequence.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/ADT/iterator_range.h" +#include "llvm/IR/Attributes.h" +#include "llvm/IR/Function.h" +#include "llvm/IR/GlobalVariable.h" +#include "llvm/IR/InstVisitor.h" +#include "llvm/IR/InstrTypes.h" +#include "llvm/IR/Intrinsics.h" +#include "llvm/IR/Module.h" +#include "llvm/Support/raw_ostream.h" +#include +#include +#include +#include +#include + +namespace llvm { +class LLVMContext; +} // namespace llvm + +using namespace llvm; + +namespace { + +using AttrPtrVecTy = std::vector; +using AttrPtrIdxVecVecTy = std::pair; +using AttrPtrVecVecTy = SmallVector; + +/// Given ChunksToKeep, produce a map of global variables/functions/calls +/// and indexes of attributes to be preserved for each of them. 
+class AttributeRemapper : public InstVisitor { + Oracle O; + +public: + DenseMap GlobalVariablesToRefine; + DenseMap FunctionsToRefine; + DenseMap CallsToRefine; + + explicit AttributeRemapper(ArrayRef ChunksToKeep) : O(ChunksToKeep) {} + + void visitModule(Module &M) { + for (GlobalVariable &GV : M.getGlobalList()) + visitGlobalVariable(GV); + } + + void visitGlobalVariable(GlobalVariable &GV) { + // Global variables only have one attribute set. + const AttributeSet &AS = GV.getAttributes(); + if (AS.hasAttributes()) + visitAttributeSet(AS, GlobalVariablesToRefine[&GV]); + } + + void visitFunction(Function &F) { + if (F.getIntrinsicID() != Intrinsic::not_intrinsic) + return; // We can neither add nor remove attributes from intrinsics. + visitAttributeList(F.getAttributes(), FunctionsToRefine[&F]); + } + + void visitCallBase(CallBase &I) { + visitAttributeList(I.getAttributes(), CallsToRefine[&I]); + } + + void visitAttributeList(const AttributeList &AL, + AttrPtrVecVecTy &AttributeSetsToPreserve) { + assert(AttributeSetsToPreserve.empty() && "Should not be sharing vectors."); + AttributeSetsToPreserve.reserve(AL.getNumAttrSets()); + for (unsigned SetIdx : seq(AL.index_begin(), AL.index_end())) { + AttrPtrIdxVecVecTy AttributesToPreserve; + AttributesToPreserve.first = SetIdx; + visitAttributeSet(AL.getAttributes(AttributesToPreserve.first), + AttributesToPreserve.second); + if (!AttributesToPreserve.second.empty()) + AttributeSetsToPreserve.emplace_back(std::move(AttributesToPreserve)); + } + } + + void visitAttributeSet(const AttributeSet &AS, + AttrPtrVecTy &AttrsToPreserve) { + assert(AttrsToPreserve.empty() && "Should not be sharing vectors."); + AttrsToPreserve.reserve(AS.getNumAttributes()); + for (const Attribute &A : AS) + if (O.shouldKeep()) + AttrsToPreserve.emplace_back(&A); + } +}; + +struct AttributeCounter : public InstVisitor { + /// How many features (in this case, attributes) did we count, total? 
+ int AttributeCount = 0; + + void visitModule(Module &M) { + for (GlobalVariable &GV : M.getGlobalList()) + visitGlobalVariable(GV); + } + + void visitGlobalVariable(GlobalVariable &GV) { + // Global variables only have one attribute set. + visitAttributeSet(GV.getAttributes()); + } + + void visitFunction(Function &F) { + if (F.getIntrinsicID() != Intrinsic::not_intrinsic) + return; // We can neither add nor remove attributes from intrinsics. + visitAttributeList(F.getAttributes()); + } + + void visitCallBase(CallBase &I) { visitAttributeList(I.getAttributes()); } + + void visitAttributeList(const AttributeList &AL) { + for (const AttributeSet &AS : AL) + visitAttributeSet(AS); + } + + void visitAttributeSet(const AttributeSet &AS) { + AttributeCount += AS.getNumAttributes(); + } +}; + +} // namespace + +AttributeSet +convertAttributeRefToAttributeSet(LLVMContext &C, + ArrayRef Attributes) { + AttrBuilder B; + for (const Attribute *A : Attributes) + B.addAttribute(*A); + return AttributeSet::get(C, B); +} + +AttributeList convertAttributeRefVecToAttributeList( + LLVMContext &C, ArrayRef AttributeSets) { + std::vector> SetVec; + SetVec.reserve(AttributeSets.size()); + + transform(AttributeSets, std::back_inserter(SetVec), + [&C](const AttrPtrIdxVecVecTy &V) { + return std::make_pair( + V.first, convertAttributeRefToAttributeSet(C, V.second)); + }); + + sort(SetVec, [](const std::pair &LHS, + const std::pair &RHS) { + return LHS.first < RHS.first; // All values are unique. + }); + + return AttributeList::get(C, SetVec); +} + +/// Removes out-of-chunk attributes from module. 
+static void extractAttributesFromModule(std::vector ChunksToKeep, + Module *Program) { + AttributeRemapper R(ChunksToKeep); + R.visit(Program); + + LLVMContext &C = Program->getContext(); + for (const auto &I : R.GlobalVariablesToRefine) + I.first->setAttributes(convertAttributeRefToAttributeSet(C, I.second)); + for (const auto &I : R.FunctionsToRefine) + I.first->setAttributes(convertAttributeRefVecToAttributeList(C, I.second)); + for (const auto &I : R.CallsToRefine) + I.first->setAttributes(convertAttributeRefVecToAttributeList(C, I.second)); +} + +/// Counts the amount of attributes. +static int countAttributes(Module *Program) { + AttributeCounter C; + + // TODO: Silence index with --quiet flag + outs() << "----------------------------\n"; + C.visit(Program); + outs() << "Number of attributes: " << C.AttributeCount << "\n"; + + return C.AttributeCount; +} + +void llvm::reduceAttributesDeltaPass(TestRunner &Test) { + outs() << "*** Reducing Attributes...\n"; + int AttributeCount = countAttributes(Test.getProgram()); + runDeltaPass(Test, AttributeCount, extractAttributesFromModule); +} diff --git a/llvm/tools/llvm-reduce/deltas/ReduceAttributes.h b/llvm/tools/llvm-reduce/deltas/ReduceAttributes.h new file mode 100644 index 000000000000..f8deb045560f --- /dev/null +++ b/llvm/tools/llvm-reduce/deltas/ReduceAttributes.h @@ -0,0 +1,20 @@ +//===- ReduceAttributes.h - Specialized Delta Pass ------------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This file implements a function which calls the Generic Delta pass in order +// to reduce uninteresting attributes. 
+// +//===----------------------------------------------------------------------===// + +namespace llvm { + +class TestRunner; + +void reduceAttributesDeltaPass(TestRunner &Test); + +} // namespace llvm diff --git a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn index efb8e40850c3..a8648d73ca0d 100644 --- a/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn @@ -12,6 +12,7 @@ executable("llvm-reduce") { "TestRunner.cpp", "deltas/Delta.cpp", "deltas/ReduceArguments.cpp", + "deltas/ReduceAttributes.cpp", "deltas/ReduceBasicBlocks.cpp", "deltas/ReduceFunctions.cpp", "deltas/ReduceGlobalVars.cpp", From llvm-commits at lists.llvm.org Thu Jul 9 13:11:23 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:11:23 +0000 (UTC) Subject: [PATCH] D83431: [Docs] CodingStandards: for_each is discouraged In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG29a9dd5bfe50: [Docs] CodingStandards: for_each is discouraged (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83431/new/ https://reviews.llvm.org/D83431 Files: llvm/docs/CodingStandards.rst Index: llvm/docs/CodingStandards.rst =================================================================== --- llvm/docs/CodingStandards.rst +++ llvm/docs/CodingStandards.rst @@ -1302,6 +1302,9 @@ for (Instruction &I : *BB) ... use I ... +Usage of ``std::for_each()``/``llvm::for_each()`` functions is discouraged, +unless the the callable object already exists. + Don't evaluate ``end()`` every time through a loop ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83431.276817.patch Type: text/x-patch Size: 479 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:11:24 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:11:24 +0000 (UTC) Subject: [PATCH] D83435: [NFCI][llvm-reduce] OperandBundleCounter: drop pointless constructor In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGd8bf5e8048db: [NFCI][llvm-reduce] OperandBundleCounter: drop pointless constructor (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83435/new/ https://reviews.llvm.org/D83435 Files: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Index: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp =================================================================== --- llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -67,8 +67,6 @@ /// How many features (in this case, operand bundles) did we count, total? int OperandBundeCount = 0; - OperandBundleCounter() {} - /// So far only CallBase sub-classes can have operand bundles. void visitCallBase(CallBase &Call) { // Just accumulate the total number of operand bundles. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83435.276818.patch Type: text/x-patch Size: 574 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:11:25 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:11:25 +0000 (UTC) Subject: [PATCH] D83434: [NFC][llvm-reduce] Purify for_each usage in Operand Bundles into range-based for loop In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. 
Closed by commit rG6b824415a21c: [NFC][llvm-reduce] Purify for_each usage in Operand Bundles into range-based… (authored by lebedev.ri). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83434/new/ https://reviews.llvm.org/D83434 Files: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp Index: llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp =================================================================== --- llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp +++ llvm/tools/llvm-reduce/deltas/ReduceOperandBundles.cpp @@ -56,10 +56,9 @@ OperandBundlesToKeepIndexes.reserve(Call.getNumOperandBundles()); // Enumerate every operand bundle on this call. - for_each(seq(0U, Call.getNumOperandBundles()), [&](unsigned BundleIndex) { + for (unsigned BundleIndex : seq(0U, Call.getNumOperandBundles())) if (O.shouldKeep()) // Should we keep this one? OperandBundlesToKeepIndexes.emplace_back(BundleIndex); - }); } }; @@ -102,9 +101,8 @@ OperandBundleRemapper R(ChunksToKeep); R.visit(Program); - for_each(R.CallsToRefine, [](const auto &P) { - return maybeRewriteCallWithDifferentBundles(P.first, P.second); - }); + for (const auto &I : R.CallsToRefine) + maybeRewriteCallWithDifferentBundles(I.first, I.second); } /// Counts the amount of operand bundles. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83434.276819.patch Type: text/x-patch Size: 1044 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:11:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:11:27 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: <21027d72047d17231c3f7add7b895bb8@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG03640ee0fa73: [llvm-reduce] Reducing attributes (authored by lebedev.ri). 
Changed prior to commit: https://reviews.llvm.org/D83351?vs=276573&id=276820#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 Files: llvm/test/Reduce/remove-attributes-from-intrinsic-like-functions.ll llvm/test/Reduce/remove-attributes-from-intrinsics.ll llvm/test/Reduce/remove-call-site-attributes.ll llvm/test/Reduce/remove-function-attributes.ll llvm/test/Reduce/remove-global-variable-attributes.ll llvm/tools/llvm-reduce/CMakeLists.txt llvm/tools/llvm-reduce/DeltaManager.h llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp llvm/tools/llvm-reduce/deltas/ReduceAttributes.h llvm/utils/gn/secondary/llvm/tools/llvm-reduce/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83351.276820.patch Type: text/x-patch Size: 16752 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:13:36 2020 From: llvm-commits at lists.llvm.org (Artem Belevich via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:13:36 +0000 (UTC) Subject: [PATCH] D83503: [buildbot] Annotated builder tweaks Message-ID: tra created this revision. tra added a reviewer: gkistanova. Herald added subscribers: sanjoy.google, bixia. - Allow bypassing source code checkouts. Cloning complete LLVM tree takes 2-3 minutes and not all bots need it (e.g. some CUDA bots just need to run tests built somewhere else) - Allow using out-of-tree annotated scripts. This is useful for tinkering with bot operations without having to update buildmaster. 
Repository: rZORG LLVM Github Zorg https://reviews.llvm.org/D83503 Files: zorg/buildbot/builders/AnnotatedBuilder.py Index: zorg/buildbot/builders/AnnotatedBuilder.py =================================================================== --- zorg/buildbot/builders/AnnotatedBuilder.py +++ zorg/buildbot/builders/AnnotatedBuilder.py @@ -11,7 +11,8 @@ env=None, extra_args=None, timeout=1200, - is_legacy_mode=False): + is_legacy_mode=False, + checkout_llvm_sources=True): """ Returns a new build factory that uses AnnotatedCommand, which allows the build to be run by version-controlled scripts that do @@ -76,19 +77,22 @@ src_dir='llvm-zorg', alwaysUseLatest=True) - f.addGetSourcecodeSteps() - + if checkout_llvm_sources: + f.addGetSourcecodeSteps() extra_args_with_props = [WithProperties(arg) for arg in extra_args] # Explicitly use '/' as separator, because it works on *nix and Windows. - script_path = "../llvm-zorg/zorg/buildbot/builders/annotated/%s" % (script) + if script.startswith('/'): + command = [script] + else: + script_path = "../llvm-zorg/zorg/buildbot/builders/annotated/%s" % (script) + command = ["python", script_path, WithProperties("--jobs=%(jobs:-)s")] + command += extra_args_with_props + f.addStep(AnnotatedCommand(name="annotate", description="annotate", timeout=timeout, haltOnFailure=True, - command=["python", - script_path, - WithProperties("--jobs=%(jobs:-)s")] - + extra_args_with_props, + command=command, env=merged_env)) return f -------------- next part -------------- A non-text attachment was scrubbed... Name: D83503.276821.patch Type: text/x-patch Size: 1772 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:15:18 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:15:18 +0000 (UTC) Subject: [PATCH] D83504: [PowerPC] Implement R_PPC64_REL24_NOTOC local calls. callee has a TOC Message-ID: NeHuang created this revision. 
NeHuang added reviewers: nemanjai, sfertile, MaskRay, stefanp, hfinkel, power-llvm-team. NeHuang added a project: LLVM. Herald added subscribers: llvm-commits, shchenz, arichardson, emaste. Herald added a reviewer: espindola. The PC Relative code now allows for calls that are marked with the relocation R_PPC64_REL24_NOTOC. This indicates that the caller does not have a valid TOC pointer in R2 and does not require R2 to be restored after the call. This patch is added to support local calls to callees that require a TOC Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83504 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Target.h lld/ELF/Thunks.cpp lld/test/ELF/ppc64-pcrel-call-to-toc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83504.276812.patch Type: text/x-patch Size: 7305 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:15:22 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:15:22 +0000 (UTC) Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header In-Reply-To: References: Message-ID: DiggerLin marked 20 inline comments as done. DiggerLin added inline comments. ================ Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:90 + support::ubig16_t ModuleType; + char CpuFlag; + char CpuType; ---------------- jasonliu wrote: > Why do we use `char` and not `uint8_t`? thanks ================ Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:139 +const XCOFFAuxiliaryHeader64 *XCOFFObjectFile::AuxiliaryHeader64() const { + assert(is64Bit() && "64-bit interface called on a 64-bit object file."); + return static_cast(AuxiliaryHeader); ---------------- jasonliu wrote: > 64-bit interface called on a 32-bit object file. 
thanks ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:105 + const XCOFFAuxiliaryHeader64 *AuxHeader64Prt = Obj.AuxiliaryHeader64(); + printAuxiliaryHeaders(AuxHeader64Prt); + } else { ---------------- jasonliu wrote: > I don't think you need to define an extra variable here. thanks ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:491 + W.print##H(S, T); \ + if ((X = X - sizeof(T)) == 0) \ + return ---------------- hubert.reinterpretcast wrote: > This strikes me as extremely hazardous. What if we get a length value that is reflective of a partial field? thanks ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:494 + +void XCOFFDumper::printAuxiliaryHeaders(const XCOFFAuxiliaryHeader32 *AuxHeader) { + if (AuxHeader == nullptr) { ---------------- jasonliu wrote: > Please consider combining the 32-bit and 64-bit versions of this function using a template, as most of the fields have the same name. We print the information in the binary order of the auxiliary header. The same field sits at a different offset in the 32-bit and 64-bit layouts, so this is difficult to implement as a template. For example, o_tsize is at offset 4 in the 32-bit header but at offset 56 in the 64-bit header. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:501 + DictScope DS(W, "AuxiliaryHeader"); + PrintAuxMember(Hex, "Magic", AuxHeader->AuxMagic, AuxSize); + PrintAuxMember(Hex, "Version", AuxHeader->Version, AuxSize); ---------------- hubert.reinterpretcast wrote: > jasonliu wrote: > > Why do you need to pass in `AuxSize` to the macro function when all inputs are the same? > `AuxSize` is modified by each macro(!) invocation... Since AuxSize is modified, I just made the macro look like a function call.
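To make the partial-field concern raised on XCOFFDumper.cpp:491 concrete, here is a minimal sketch that does not depend on LLVM. It is not the code in the patch: `printAuxField` and `Remaining` are hypothetical names, and the sketch only assumes that the remaining auxiliary-header size is tracked as an unsigned byte count. The idea is to compare the remaining size against the field size before printing, so that a length ending partway through a field is reported instead of being subtracted past zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not part of the patch under review).
// Prints one fixed-width field only when the remaining header size covers
// the whole field, then shrinks the remaining size by the field width.
// Returns true if the field was printed, false if the caller should stop.
template <typename T>
bool printAuxField(const char *Name, T Value, std::size_t &Remaining) {
  if (Remaining < sizeof(T)) {
    // Remaining == 0 means the header ended cleanly on a field boundary.
    // Anything else means the declared length ends inside this field.
    if (Remaining != 0)
      std::printf("warning: %zu trailing byte(s) do not cover field '%s'\n",
                  Remaining, Name);
    return false;
  }
  std::printf("%s: 0x%llx\n", Name, static_cast<unsigned long long>(Value));
  Remaining -= sizeof(T);
  return true;
}
```

A caller would invoke the helper once per field, in layout order, and return as soon as it yields false. Whether a trailing partial field should be a warning or a hard error is a design choice left open here.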
================ Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:178 + // --auxiliary-headers + cl::opt + XCOFFAuxiliaryHeaders("auxiliary-headers", ---------------- jasonliu wrote: > I'm assuming we need to add it somewhere in the llvm docs about this new option. thanks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82549/new/ https://reviews.llvm.org/D82549 From llvm-commits at lists.llvm.org Thu Jul 9 13:18:56 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:18:56 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <9ff0489b5924290aa2677dd981f1161d@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG205dc0922d5f: [CallGraph] Ignore callback uses (authored by ggeorgakoudis). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 Files: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp llvm/test/Analysis/CallGraph/ignore-callback-uses.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83370.276823.patch Type: text/x-patch Size: 5058 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:18:56 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via llvm-commits) Date: Thu, 09 Jul 2020 13:18:56 -0700 (PDT) Subject: [llvm] 205dc09 - [CallGraph] Ignore callback uses Message-ID: <5f077bb0.1c69fb81.6f166.9610@mx.google.com> Author: Giorgis Georgakoudis Date: 2020-07-09T13:13:46-07:00 New Revision: 205dc0922d5f7305226f7457fcbcb4224c92530c URL: https://github.com/llvm/llvm-project/commit/205dc0922d5f7305226f7457fcbcb4224c92530c DIFF: https://github.com/llvm/llvm-project/commit/205dc0922d5f7305226f7457fcbcb4224c92530c.diff LOG: [CallGraph] Ignore callback uses Summary: Ignore callback uses when adding a callback function in the CallGraph. Callback functions are typically created when outlining, e.g. for OpenMP, so they have internal scope and linkage. They should not be added to the ExternalCallingNode since they are only callable by the specified caller function at creation time. A CGSCC pass, such as OpenMPOpt, may need to update the CallGraph by adding a new outlined callback function. Without ignoring callback uses, adding breaks CGSCC pass restrictions and results to a broken CallGraph. 
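For readers skimming the summary above, the rule can be sketched without LLVM as follows. This is an illustration only: `ToyFunction`, `UseKind`, and `isReachableFromExternalNode` are hypothetical stand-ins, not LLVM types. The actual change goes through `Function::hasAddressTaken` and `AbstractCallSite`, as shown in the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of how a function is used.
enum class UseKind { DirectCall, CallbackArgument, OtherAddressUse };

// Toy model of a function: its linkage and the kinds of its uses.
struct ToyFunction {
  bool LocalLinkage;
  std::vector<UseKind> Uses;

  // Mirrors the shape of the new query: direct calls never count as
  // address-taken, and callback uses are skipped on request.
  bool hasAddressTaken(bool IgnoreCallbackUses = false) const {
    for (UseKind K : Uses) {
      if (K == UseKind::DirectCall)
        continue;
      if (IgnoreCallbackUses && K == UseKind::CallbackArgument)
        continue;
      return true;
    }
    return false;
  }
};

// A function gets an edge from the external calling node only if it is not
// local, or if its address escapes through something other than a callback.
bool isReachableFromExternalNode(const ToyFunction &F) {
  return !F.LocalLinkage || F.hasAddressTaken(/*IgnoreCallbackUses=*/true);
}
```

In this model an internal outlined function that is only passed as a callback argument still reports its address as taken by default, but it is no longer attached to the external calling node.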
Reviewers: jdoerfert Subscribers: hiraditya, sstefan1, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83370 Added: llvm/test/Analysis/CallGraph/ignore-callback-uses.ll Modified: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/Function.h b/llvm/include/llvm/IR/Function.h index ee66abc3eaed..bb4ec13c7610 100644 --- a/llvm/include/llvm/IR/Function.h +++ b/llvm/include/llvm/IR/Function.h @@ -830,9 +830,11 @@ class Function : public GlobalObject, public ilist_node { /// hasAddressTaken - returns true if there are any uses of this function /// other than direct calls or invokes to it, or blockaddress expressions. - /// Optionally passes back an offending user for diagnostic purposes. + /// Optionally passes back an offending user for diagnostic purposes and + /// ignores callback uses. /// - bool hasAddressTaken(const User** = nullptr) const; + bool hasAddressTaken(const User ** = nullptr, + bool IgnoreCallbackUses = false) const; /// isDefTriviallyDead - Return true if it is trivially safe to remove /// this function definition from the module (because it isn't externally diff --git a/llvm/lib/Analysis/CallGraph.cpp b/llvm/lib/Analysis/CallGraph.cpp index d8abccfdb095..08264512f400 100644 --- a/llvm/lib/Analysis/CallGraph.cpp +++ b/llvm/lib/Analysis/CallGraph.cpp @@ -77,9 +77,11 @@ bool CallGraph::invalidate(Module &, const PreservedAnalyses &PA, void CallGraph::addToCallGraph(Function *F) { CallGraphNode *Node = getOrInsertFunction(F); - // If this function has external linkage or has its address taken, anything - // could call it. - if (!F->hasLocalLinkage() || F->hasAddressTaken()) + bool IgnoreCallbackUses = true; + + // If this function has external linkage or has its address taken and + // it is not a callback, then anything could call it. 
+ if (!F->hasLocalLinkage() || F->hasAddressTaken(nullptr, IgnoreCallbackUses)) ExternalCallingNode->addCalledFunction(nullptr, Node); populateCallGraphNode(Node); diff --git a/llvm/lib/IR/Function.cpp b/llvm/lib/IR/Function.cpp index 0ec0cce83a8c..995bc40c362f 100644 --- a/llvm/lib/IR/Function.cpp +++ b/llvm/lib/IR/Function.cpp @@ -20,6 +20,7 @@ #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/IR/AbstractCallSite.h" #include "llvm/IR/Argument.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/BasicBlock.h" @@ -1484,12 +1485,18 @@ Optional Intrinsic::remangleIntrinsicFunction(Function *F) { } /// hasAddressTaken - returns true if there are any uses of this function -/// other than direct calls or invokes to it. -bool Function::hasAddressTaken(const User* *PutOffender) const { +/// other than direct calls or invokes to it. Optionally ignores callback +/// uses. +bool Function::hasAddressTaken(const User **PutOffender, + bool IgnoreCallbackUses) const { for (const Use &U : uses()) { const User *FU = U.getUser(); if (isa(FU)) continue; + + if (IgnoreCallbackUses && AbstractCallSite(&U)) + continue; + const auto *Call = dyn_cast(FU); if (!Call) { if (PutOffender) diff --git a/llvm/test/Analysis/CallGraph/ignore-callback-uses.ll b/llvm/test/Analysis/CallGraph/ignore-callback-uses.ll new file mode 100644 index 000000000000..8964ca1efd86 --- /dev/null +++ b/llvm/test/Analysis/CallGraph/ignore-callback-uses.ll @@ -0,0 +1,51 @@ +; RUN: opt < %s -print-callgraph -disable-output 2>&1 | FileCheck %s +; CHECK: Call graph node <><<{{.*}}>> #uses=0 +; CHECK-NEXT: CS<{{.*}}> calls function 'f' +; CHECK-NEXT: CS<{{.*}}> calls function '__kmpc_fork_call' +; CHECK-EMPTY: + +%struct.ident_t = type { i32, i32, i32, i32, i8* } + + at 0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1 + at 1 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr 
inbounds ([23 x i8], [23 x i8]* @0, i32 0, i32 0) }, align 8 + +; Function Attrs: noinline nounwind optnone uwtable +define dso_local void @f() { +entry: + br label %omp_parallel + +omp_parallel: ; preds = %entry + call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @f..omp_par to void (i32*, i32*, ...)*)) + br label %omp.par.exit.split + +omp.par.exit.split: ; preds = %omp_parallel + ret void +} + +; Function Attrs: norecurse nounwind +define internal void @f..omp_par(i32* noalias %tid.addr, i32* noalias %zero.addr) { +omp.par.entry: + %tid.addr.local = alloca i32, align 4 + %0 = load i32, i32* %tid.addr, align 4 + store i32 %0, i32* %tid.addr.local, align 4 + %tid = load i32, i32* %tid.addr.local, align 4 + br label %omp.par.region + +omp.par.exit.split.exitStub: ; preds = %omp.par.outlined.exit + ret void + +omp.par.region: ; preds = %omp.par.entry + br label %omp.par.pre_finalize + +omp.par.pre_finalize: ; preds = %omp.par.region + br label %omp.par.outlined.exit + +omp.par.outlined.exit: ; preds = %omp.par.pre_finalize + br label %omp.par.exit.split.exitStub +} + +; Function Attrs: nounwind +declare !callback !2 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) #2 + +!2 = !{!3} +!3 = !{i64 2, i64 -1, i64 -1, i1 true} From llvm-commits at lists.llvm.org Thu Jul 9 13:19:03 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:19:03 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <3a25d36fa349b2f9d53ba99b079160ae@localhost.localdomain> aeubanks marked an inline comment as done. aeubanks added inline comments. 
================ Comment at: llvm/lib/Passes/PassBuilder.cpp:300 -namespace { +namespace llvm { ---------------- ychen wrote: > aeubanks wrote: > > ychen wrote: > > > How about keeping this local? These are only for testing. > > Do you mean keeping this in an anonymous namespace? > > As mentioned in the commit, that makes the printed name messed up. > Add some regex in lit tests? > Running pass: {{.*}}NoOpModulePass > I don't see any reason to distinguish it from other passes, even if it's only used for testing. It's a useful tool for sanity checks. Having a `(anonymous namespace)` printed anywhere doesn't look good. And it'd require updating more tests than I really want to. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Thu Jul 9 13:22:48 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:22:48 +0000 (UTC) Subject: [PATCH] D82703: [InstCombine] convert assumes to operand bundles In-Reply-To: References: Message-ID: <8ec208adc84b1dbd9d8908c5831b4542@localhost.localdomain> Tyker updated this revision to Diff 276824. Tyker added a comment. I had to merge the previous patch with others I had lying around and add some more changes to make the bundle mode at least as good as the default, so this revision got quite a bit bigger. I hope it's still fine.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82703/new/ https://reviews.llvm.org/D82703 Files: llvm/include/llvm/Analysis/AssumeBundleQueries.h llvm/include/llvm/Analysis/ValueTracking.h llvm/include/llvm/Transforms/Utils/AssumeBundleBuilder.h llvm/lib/Analysis/AssumeBundleQueries.cpp llvm/lib/Analysis/Loads.cpp llvm/lib/Analysis/ValueTracking.cpp llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp llvm/lib/Transforms/Utils/AssumeBundleBuilder.cpp llvm/test/Analysis/BasicAA/featuretest.ll llvm/test/Analysis/ValueTracking/assume.ll llvm/test/Transforms/InstCombine/assume.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82703.276824.patch Type: text/x-patch Size: 26240 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:27:16 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:27:16 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: tmsriram updated this revision to Diff 276825. tmsriram added a comment. Move tests to CodeGen/X86/cfi*. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 Files: llvm/include/llvm/CodeGen/TargetFrameLowering.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCFIException.cpp llvm/lib/CodeGen/AsmPrinter/DwarfException.h llvm/lib/CodeGen/CFIInstrInserter.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/X86/X86FrameLowering.cpp llvm/lib/Target/X86/X86FrameLowering.h llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D79978.276825.patch Type: text/x-patch Size: 15285 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:28:43 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:28:43 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <46b6b909600c0a46616df89961214d39@localhost.localdomain> lebedev.ri added a comment. This seems to have broken the build http://45.33.8.238/linux/22500/step_7.txt Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Thu Jul 9 13:31:48 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:31:48 +0000 (UTC) Subject: [PATCH] D83091: [FileCheck] Improve -dump-input documentation In-Reply-To: References: Message-ID: <9446759ab5091c69342204d6912c4dfb@localhost.localdomain> jdenny added a comment. Ping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83091/new/ https://reviews.llvm.org/D83091 From llvm-commits at lists.llvm.org Thu Jul 9 13:32:36 2020 From: llvm-commits at lists.llvm.org (Paul Robinson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:32:36 +0000 (UTC) Subject: [PATCH] D83468: [Debuginfo] Fix for PR46653 In-Reply-To: References: Message-ID: probinson added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1990 // (The new location might be an explicit line 0, which we do emit.) - if (DL.getLine() == 0 && LastAsmLine == 0) + if (DL.getLine() == 0 && LastAsmLine != 0) return; ---------------- It looks like this change will suppress emitting an explicit line-0 location. We definitely want to emit line-0 in many cases, so this patch is going too far. 
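The effect of flipping that comparison can be checked with a small truth-table sketch. It isolates only the quoted condition; `skipsOriginal` and `skipsProposed` are hypothetical names, and the rest of the DwarfDebug logic around the check is deliberately left out.

```cpp
#include <cassert>

// Condition before the patch: skip only when a line-0 record would repeat
// a line-0 record that was already emitted.
bool skipsOriginal(unsigned NewLine, unsigned LastAsmLine) {
  return NewLine == 0 && LastAsmLine == 0;
}

// Condition proposed in the patch: skip a line-0 record whenever the last
// emitted line was a real (non-zero) line.
bool skipsProposed(unsigned NewLine, unsigned LastAsmLine) {
  return NewLine == 0 && LastAsmLine != 0;
}
```

With the last emitted line at, say, 7 and a new explicit line 0, the original condition emits the line-0 record while the proposed one drops it. With the last emitted line already 0, the proposed condition would emit a repeated line-0 record. Both outcomes match the concern stated in the comment.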
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83468/new/ https://reviews.llvm.org/D83468 From llvm-commits at lists.llvm.org Thu Jul 9 13:32:47 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:32:47 +0000 (UTC) Subject: [PATCH] D83505: [NFC] Add utility to sum/merge stats files Message-ID: Tyker created this revision. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. Add a small script to sum *.stats file given as input and output the totals usage example: merge-stats.py $(find ./builddir/ -name "*.stats") > total.stats Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83505 Files: llvm/utils/merge-stats.py Index: llvm/utils/merge-stats.py =================================================================== --- /dev/null +++ llvm/utils/merge-stats.py @@ -0,0 +1,33 @@ +#!/usr/bin/env python +''' +Merge .stats files generated by llvm tools + +merge-stats.py takes as argument a list of stats files to merge +and output the result on stdout + +Usage: + merge-stats.py $(find ./builddir/ -name "*.stats") > total.stats +''' + +import json +import sys + +result = {} + +for arg in range(1, len(sys.argv)): + with open(sys.argv[arg], "r", encoding='utf-8', + errors='ignore') as f: + text = f.read() + try: + data = json.loads(text) + except: + print('ignored %s: failed to parse' % sys.argv[arg], file= sys.stderr) + continue + for key in data: + if key in result: + result[key] += data[key] + else: + result[key] = data[key] + +out = json.dumps(result, indent=2) +print(out) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83505.276828.patch Type: text/x-patch Size: 931 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:33:37 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:33:37 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: Conanap marked an inline comment as done. Conanap added inline comments. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:939 + // The XFormMemOp flag for the following 8 insts is set on the instruction format. + let mayLoad = 1, mayStore = 1 in { + def LXVRBX : X_XT6_RA5_RB5<31, 13, "lxvrbx", vsrc, []>; ---------------- bsaleil wrote: > Shouldn't `mayStore` be 0 instead of 1 here ? yes, thanks; will fix on the commit Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 From llvm-commits at lists.llvm.org Thu Jul 9 13:34:04 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:34:04 +0000 (UTC) Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header In-Reply-To: References: Message-ID: DiggerLin updated this revision to Diff 276827. DiggerLin marked 7 inline comments as done. DiggerLin added a comment. 
address comment Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82549/new/ https://reviews.llvm.org/D82549 Files: llvm/docs/CommandGuide/llvm-readobj.rst llvm/include/llvm/Object/XCOFFObjectFile.h llvm/lib/Object/XCOFFObjectFile.cpp llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-32-xlc-exec llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-32-xlc-obj.o llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-64-xlc-exec llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-64-xlc-obj.o llvm/test/tools/llvm-readobj/XCOFF/xcoff-auxiliary-header.test llvm/tools/llvm-readobj/ObjDumper.h llvm/tools/llvm-readobj/XCOFFDumper.cpp llvm/tools/llvm-readobj/llvm-readobj.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82549.276827.patch Type: text/x-patch Size: 20655 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:36:05 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Thu, 09 Jul 2020 13:36:05 -0700 (PDT) Subject: [llvm] c025bdf - Revert D83013 "[LPM] Port CGProfilePass from NPM to LPM" Message-ID: <5f077fb5.1c69fb81.e2e55.98ab@mx.google.com> Author: Fangrui Song Date: 2020-07-09T13:34:04-07:00 New Revision: c025bdf25a59a79d60a2e99962c8653547a825d8 URL: https://github.com/llvm/llvm-project/commit/c025bdf25a59a79d60a2e99962c8653547a825d8 DIFF: https://github.com/llvm/llvm-project/commit/c025bdf25a59a79d60a2e99962c8653547a825d8.diff LOG: Revert D83013 "[LPM] Port CGProfilePass from NPM to LPM" This reverts commit c92a8c0a0f68fbbb23e3fdde071007e63a552e82. It breaks builds and has unaddressed review comments. 
Added: llvm/test/Other/new-pm-cgprofile.ll Modified: clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll Removed: ################################################################################ diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def index f3e43919eeca..d465e00d4c70 100644 --- a/clang/include/clang/Basic/CodeGenOptions.def +++ b/clang/include/clang/Basic/CodeGenOptions.def @@ -252,6 +252,7 @@ CODEGENOPT(UnwindTables , 1, 0) ///< Emit unwind tables. CODEGENOPT(VectorizeLoop , 1, 0) ///< Run loop vectorizer. CODEGENOPT(VectorizeSLP , 1, 0) ///< Run SLP vectorizer. CODEGENOPT(ProfileSampleAccurate, 1, 0) ///< Sample profile is accurate. +CODEGENOPT(CallGraphProfile , 1, 0) ///< Run call graph profile. /// Attempt to use register sized accesses to bit-fields in structures, when /// possible. 
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index 3ada1aaa4ed8..9e6d5e4593d3 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -620,7 +620,6 @@ void EmitAssemblyHelper::CreatePasses(legacy::PassManager &MPM, PMBuilder.SizeLevel = CodeGenOpts.OptimizeSize; PMBuilder.SLPVectorize = CodeGenOpts.VectorizeSLP; PMBuilder.LoopVectorize = CodeGenOpts.VectorizeLoop; - PMBuilder.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; PMBuilder.DisableUnrollLoops = !CodeGenOpts.UnrollLoops; // Loop interleaving in the loop vectorizer has historically been set to be @@ -1145,7 +1144,7 @@ void EmitAssemblyHelper::EmitAssemblyWithNewPassManager( PTO.LoopInterleaving = CodeGenOpts.UnrollLoops; PTO.LoopVectorization = CodeGenOpts.VectorizeLoop; PTO.SLPVectorization = CodeGenOpts.VectorizeSLP; - PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; + PTO.CallGraphProfile = CodeGenOpts.CallGraphProfile; PTO.Coroutines = LangOpts.Coroutines; PassInstrumentationCallbacks PIC; @@ -1563,7 +1562,7 @@ static void runThinLTOBackend( Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops; Conf.PTO.LoopVectorization = CGOpts.VectorizeLoop; Conf.PTO.SLPVectorization = CGOpts.VectorizeSLP; - Conf.PTO.CallGraphProfile = !CGOpts.DisableIntegratedAS; + Conf.PTO.CallGraphProfile = CGOpts.CallGraphProfile; // Context sensitive profile. 
if (CGOpts.hasProfileCSIRInstr()) { diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp index fd34c6b8a955..6f6af917e3a3 100644 --- a/clang/lib/Frontend/CompilerInvocation.cpp +++ b/clang/lib/Frontend/CompilerInvocation.cpp @@ -860,6 +860,7 @@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK, Opts.RerollLoops = Args.hasArg(OPT_freroll_loops); Opts.DisableIntegratedAS = Args.hasArg(OPT_fno_integrated_as); + Opts.CallGraphProfile = !Opts.DisableIntegratedAS; Opts.Autolink = !Args.hasArg(OPT_fno_autolink); Opts.SampleProfileFile = std::string(Args.getLastArgValue(OPT_fprofile_sample_use_EQ)); diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 06e8507036ac..f0d5accf13c5 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -103,7 +103,6 @@ void initializeCFGViewerLegacyPassPass(PassRegistry&); void initializeCFIInstrInserterPass(PassRegistry&); void initializeCFLAndersAAWrapperPassPass(PassRegistry&); void initializeCFLSteensAAWrapperPassPass(PassRegistry&); -void initializeCGProfileLegacyPassPass(PassRegistry &); void initializeCallGraphDOTPrinterPass(PassRegistry&); void initializeCallGraphPrinterLegacyPassPass(PassRegistry&); void initializeCallGraphViewerPass(PassRegistry&); diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index d1b9f269d5d4..28e454d3b0fc 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -282,8 +282,6 @@ ModulePass *createSampleProfileLoaderPass(StringRef Name); ModulePass *createWriteThinLTOBitcodePass(raw_ostream &Str, raw_ostream *ThinLinkOS = nullptr); -ModulePass *createCGProfileLegacyPass(); - } // End llvm namespace #endif diff --git a/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h b/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h index a9928c3f5a40..8b03bcba10e4 100644 --- 
a/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h +++ b/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h @@ -156,7 +156,6 @@ class PassManagerBuilder { bool DisableTailCalls; bool DisableUnrollLoops; - bool CallGraphProfile; bool SLPVectorize; bool LoopVectorize; bool LoopsInterleaved; diff --git a/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h b/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h index 4cb45fd42f80..28fd3804dec9 100644 --- a/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h +++ b/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h @@ -19,6 +19,11 @@ namespace llvm { class CGProfilePass : public PassInfoMixin { public: PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM); + +private: + void addModuleFlags( + Module &M, + MapVector, uint64_t> &Counts) const; }; } // end namespace llvm diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 4d6c30b87a99..58510609cf5e 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -248,6 +248,10 @@ static cl::opt EnableCHR("enable-chr-npm", cl::init(true), cl::Hidden, cl::desc("Enable control height reduction optimization (CHR)")); +static cl::opt EnableCallGraphProfile( + "enable-npm-call-graph-profile", cl::init(true), cl::Hidden, + cl::desc("Enable call graph profile pass for the new PM (default = on)")); + /// Flag to enable inline deferral during PGO. 
static cl::opt EnablePGOInlineDeferral("enable-npm-pgo-inline-deferral", cl::init(true), @@ -263,7 +267,7 @@ PipelineTuningOptions::PipelineTuningOptions() { Coroutines = false; LicmMssaOptCap = SetLicmMssaOptCap; LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap; - CallGraphProfile = true; + CallGraphProfile = EnableCallGraphProfile; } extern cl::opt EnableHotColdSplit; diff --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp index b65eb469a492..9534fb874107 100644 --- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp +++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -195,7 +195,6 @@ PassManagerBuilder::PassManagerBuilder() { PrepareForThinLTO = EnablePrepareForThinLTO; PerformThinLTO = EnablePerformThinLTO; DivergentTarget = false; - CallGraphProfile = true; } PassManagerBuilder::~PassManagerBuilder() { @@ -835,10 +834,6 @@ void PassManagerBuilder::populateModulePassManager( if (MergeFunctions) MPM.add(createMergeFunctionsPass()); - // Add Module flag "CG Profile" based on Branch Frequency Information. - if (CallGraphProfile) - MPM.add(createCGProfileLegacyPass()); - // LoopSink pass sinks instructions hoisted by LICM, which serves as a // canonicalization pass that enables other optimizations. 
As a result, // LoopSink pass needs to be a very late IR pass to avoid undoing LICM diff --git a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp index e95731a2117b..2d5bd9570940 100644 --- a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp +++ b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp @@ -10,48 +10,22 @@ #include "llvm/ADT/MapVector.h" #include "llvm/Analysis/BlockFrequencyInfo.h" -#include "llvm/Analysis/LazyBlockFrequencyInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/IR/Constants.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/MDBuilder.h" #include "llvm/IR/PassManager.h" -#include "llvm/InitializePasses.h" #include "llvm/ProfileData/InstrProf.h" -#include "llvm/Transforms/IPO.h" #include "llvm/Transforms/Instrumentation.h" #include using namespace llvm; -static bool -addModuleFlags(Module &M, - MapVector, uint64_t> &Counts) { - if (Counts.empty()) - return false; - - LLVMContext &Context = M.getContext(); - MDBuilder MDB(Context); - std::vector Nodes; - - for (auto E : Counts) { - Metadata *Vals[] = {ValueAsMetadata::get(E.first.first), - ValueAsMetadata::get(E.first.second), - MDB.createConstant(ConstantInt::get( - Type::getInt64Ty(Context), E.second))}; - Nodes.push_back(MDNode::get(Context, Vals)); - } - - M.addModuleFlag(Module::Append, "CG Profile", MDNode::get(Context, Nodes)); - return true; -} - -static bool -runCGProfilePass(Module &M, - function_ref GetBFI, - function_ref GetTTI) { +PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { MapVector, uint64_t> Counts; + FunctionAnalysisManager &FAM = + MAM.getResult(M).getManager(); InstrProfSymtab Symtab; auto UpdateCounts = [&](TargetTransformInfo &TTI, Function *F, Function *CalledF, uint64_t NewCount) { @@ -61,14 +35,14 @@ runCGProfilePass(Module &M, Count = SaturatingAdd(Count, NewCount); }; // Ignore error here. Indirect calls are ignored if this fails. 
- (void)(bool) Symtab.create(M); + (void)(bool)Symtab.create(M); for (auto &F : M) { - if (F.isDeclaration() || !F.getEntryCount()) + if (F.isDeclaration()) continue; - auto &BFI = GetBFI(F); + auto &BFI = FAM.getResult(F); if (BFI.getEntryFreq() == 0) continue; - TargetTransformInfo &TTI = GetTTI(F); + TargetTransformInfo &TTI = FAM.getResult(F); for (auto &BB : F) { Optional BBCount = BFI.getBlockProfileCount(&BB); if (!BBCount) @@ -95,56 +69,28 @@ runCGProfilePass(Module &M, } } - return addModuleFlags(M, Counts); -} + addModuleFlags(M, Counts); -namespace { -struct CGProfileLegacyPass final : public ModulePass { - static char ID; - CGProfileLegacyPass() : ModulePass(ID) { - initializeCGProfileLegacyPassPass(*PassRegistry::getPassRegistry()); - } + return PreservedAnalyses::all(); +} - void getAnalysisUsage(AnalysisUsage &AU) const override { - AU.setPreservesCFG(); - AU.addRequired(); - AU.addRequired(); - } +void CGProfilePass::addModuleFlags( + Module &M, + MapVector, uint64_t> &Counts) const { + if (Counts.empty()) + return; - bool runOnModule(Module &M) override { - auto GetBFI = [this](Function &F) -> BlockFrequencyInfo & { - return this->getAnalysis(F).getBFI(); - }; - auto GetTTI = [this](Function &F) -> TargetTransformInfo & { - return this->getAnalysis().getTTI(F); - }; + LLVMContext &Context = M.getContext(); + MDBuilder MDB(Context); + std::vector Nodes; - return runCGProfilePass(M, GetBFI, GetTTI); + for (auto E : Counts) { + Metadata *Vals[] = {ValueAsMetadata::get(E.first.first), + ValueAsMetadata::get(E.first.second), + MDB.createConstant(ConstantInt::get( + Type::getInt64Ty(Context), E.second))}; + Nodes.push_back(MDNode::get(Context, Vals)); } -}; - -} // namespace - -char CGProfileLegacyPass::ID = 0; -INITIALIZE_PASS(CGProfileLegacyPass, "cg-profile", "Call Graph Profile", false, - false) - -ModulePass *llvm::createCGProfileLegacyPass() { - return new CGProfileLegacyPass(); -} - -PreservedAnalyses CGProfilePass::run(Module &M, 
ModuleAnalysisManager &MAM) { - FunctionAnalysisManager &FAM = - MAM.getResult(M).getManager(); - auto GetBFI = [&FAM](Function &F) -> BlockFrequencyInfo & { - return FAM.getResult(F); - }; - auto GetTTI = [&FAM](Function &F) -> TargetTransformInfo & { - return FAM.getResult(F); - }; - - runCGProfilePass(M, GetBFI, GetTTI); - - return PreservedAnalyses::all(); + M.addModuleFlag(Module::Append, "CG Profile", MDNode::get(Context, Nodes)); } diff --git a/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp b/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp index ad238f1357c6..64626225f23f 100644 --- a/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp @@ -112,7 +112,6 @@ void llvm::initializeInstrumentation(PassRegistry &Registry) { initializePGOInstrumentationUseLegacyPassPass(Registry); initializePGOIndirectCallPromotionLegacyPassPass(Registry); initializePGOMemOPSizeOptLegacyPassPass(Registry); - initializeCGProfileLegacyPassPass(Registry); initializeInstrOrderFileLegacyPassPass(Registry); initializeInstrProfilingLegacyPassPass(Registry); initializeMemorySanitizerLegacyPassPass(Registry); diff --git a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll index 85f9d8c867bf..32d36f4e7280 100644 --- a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll +++ b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll @@ -276,12 +276,6 @@ ; GCN-O1-NEXT: Warn about non-applied transformations ; GCN-O1-NEXT: Alignment from assumptions ; GCN-O1-NEXT: Strip Unused Function Prototypes -; GCN-O1-NEXT: Call Graph Profile -; GCN-O1-NEXT: FunctionPass Manager -; GCN-O1-NEXT: Dominator Tree Construction -; GCN-O1-NEXT: Natural Loop Information -; GCN-O1-NEXT: Lazy Branch Probability Analysis -; GCN-O1-NEXT: Lazy Block Frequency Analysis ; GCN-O1-NEXT: FunctionPass Manager ; GCN-O1-NEXT: Dominator Tree Construction ; GCN-O1-NEXT: Natural Loop Information @@ -629,12 +623,6 @@ ; GCN-O2-NEXT: Strip Unused 
Function Prototypes ; GCN-O2-NEXT: Dead Global Elimination ; GCN-O2-NEXT: Merge Duplicate Global Constants -; GCN-O2-NEXT: Call Graph Profile -; GCN-O2-NEXT: FunctionPass Manager -; GCN-O2-NEXT: Dominator Tree Construction -; GCN-O2-NEXT: Natural Loop Information -; GCN-O2-NEXT: Lazy Branch Probability Analysis -; GCN-O2-NEXT: Lazy Block Frequency Analysis ; GCN-O2-NEXT: FunctionPass Manager ; GCN-O2-NEXT: Dominator Tree Construction ; GCN-O2-NEXT: Natural Loop Information @@ -987,12 +975,6 @@ ; GCN-O3-NEXT: Strip Unused Function Prototypes ; GCN-O3-NEXT: Dead Global Elimination ; GCN-O3-NEXT: Merge Duplicate Global Constants -; GCN-O3-NEXT: Call Graph Profile -; GCN-O3-NEXT: FunctionPass Manager -; GCN-O3-NEXT: Dominator Tree Construction -; GCN-O3-NEXT: Natural Loop Information -; GCN-O3-NEXT: Lazy Branch Probability Analysis -; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: FunctionPass Manager ; GCN-O3-NEXT: Dominator Tree Construction ; GCN-O3-NEXT: Natural Loop Information diff --git a/llvm/test/Instrumentation/cgprofile.ll b/llvm/test/Instrumentation/cgprofile.ll index 70a1f81aa53e..1edf3b6ec518 100644 --- a/llvm/test/Instrumentation/cgprofile.ll +++ b/llvm/test/Instrumentation/cgprofile.ll @@ -1,5 +1,4 @@ ; RUN: opt < %s -passes cg-profile -S | FileCheck %s -; RUN: opt < %s -cg-profile -S | FileCheck %s declare void @b() diff --git a/llvm/test/Other/new-pm-cgprofile.ll b/llvm/test/Other/new-pm-cgprofile.ll new file mode 100644 index 000000000000..c7fe31ab570f --- /dev/null +++ b/llvm/test/Other/new-pm-cgprofile.ll @@ -0,0 +1,11 @@ +; RUN: opt -debug-pass-manager -passes='default' %s 2>&1 |FileCheck %s --check-prefixes=DEFAULT +; RUN: opt -debug-pass-manager -passes='default' -enable-npm-call-graph-profile=0 %s 2>&1 |FileCheck %s --check-prefixes=OFF +; RUN: opt -debug-pass-manager -passes='default' -enable-npm-call-graph-profile=1 %s 2>&1 |FileCheck %s --check-prefixes=ON +; +; DEFAULT: Running pass: CGProfilePass +; OFF-NOT: Running pass: 
CGProfilePass +; ON: Running pass: CGProfilePass + +define void @foo() { + ret void +} diff --git a/llvm/test/Other/opt-O2-pipeline.ll b/llvm/test/Other/opt-O2-pipeline.ll index 56f85d0fb9a8..ca72ec1f7567 100644 --- a/llvm/test/Other/opt-O2-pipeline.ll +++ b/llvm/test/Other/opt-O2-pipeline.ll @@ -280,12 +280,6 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants -; CHECK-NEXT: Call Graph Profile -; CHECK-NEXT: FunctionPass Manager -; CHECK-NEXT: Dominator Tree Construction -; CHECK-NEXT: Natural Loop Information -; CHECK-NEXT: Lazy Branch Probability Analysis -; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information diff --git a/llvm/test/Other/opt-O3-pipeline.ll b/llvm/test/Other/opt-O3-pipeline.ll index 942f7d9dfead..f629bfc3444b 100644 --- a/llvm/test/Other/opt-O3-pipeline.ll +++ b/llvm/test/Other/opt-O3-pipeline.ll @@ -285,12 +285,6 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants -; CHECK-NEXT: Call Graph Profile -; CHECK-NEXT: FunctionPass Manager -; CHECK-NEXT: Dominator Tree Construction -; CHECK-NEXT: Natural Loop Information -; CHECK-NEXT: Lazy Branch Probability Analysis -; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information diff --git a/llvm/test/Other/opt-Os-pipeline.ll b/llvm/test/Other/opt-Os-pipeline.ll index d975cc48b629..dde9fbeb9950 100644 --- a/llvm/test/Other/opt-Os-pipeline.ll +++ b/llvm/test/Other/opt-Os-pipeline.ll @@ -266,12 +266,6 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants -; CHECK-NEXT: Call Graph Profile -; CHECK-NEXT: FunctionPass Manager -; CHECK-NEXT: Dominator Tree 
Construction -; CHECK-NEXT: Natural Loop Information -; CHECK-NEXT: Lazy Branch Probability Analysis -; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information From llvm-commits at lists.llvm.org Thu Jul 9 13:36:42 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Thu, 09 Jul 2020 13:36:42 -0700 (PDT) Subject: [llvm] ce1e485 - Temporarily Revert "[PowerPC] Split s34imm into two types" Message-ID: <5f077fda.1c69fb81.8958e.94b5@mx.google.com> Author: Eric Christopher Date: 2020-07-09T13:36:32-07:00 New Revision: ce1e4853b5a15d679bd662ac5777a2390daf0391 URL: https://github.com/llvm/llvm-project/commit/ce1e4853b5a15d679bd662ac5777a2390daf0391 DIFF: https://github.com/llvm/llvm-project/commit/ce1e4853b5a15d679bd662ac5777a2390daf0391.diff LOG: Temporarily Revert "[PowerPC] Split s34imm into two types" as it was failing in Release+Asserts mode with an assert. This reverts commit bd2068031121adf5a0e28d9306a1741d6f0bbd87. 
Added: Modified: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h llvm/lib/Target/PowerPC/PPCInstrInfo.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td Removed: llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s ################################################################################ diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp index 59cb2b994a4b..dbaf221db9fc 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp @@ -46,7 +46,6 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t Value) { case PPC::fixup_ppc_half16ds: return Value & 0xfffc; case PPC::fixup_ppc_pcrel34: - case PPC::fixup_ppc_imm34: return Value & 0x3ffffffff; } } @@ -69,7 +68,6 @@ static unsigned getFixupKindNumBytes(unsigned Kind) { case PPC::fixup_ppc_br24_notoc: return 4; case PPC::fixup_ppc_pcrel34: - case PPC::fixup_ppc_imm34: case FK_Data_8: return 8; case PPC::fixup_ppc_nofixup: @@ -102,7 +100,6 @@ class PPCAsmBackend : public MCAsmBackend { { "fixup_ppc_half16", 0, 16, 0 }, { "fixup_ppc_half16ds", 0, 14, 0 }, { "fixup_ppc_pcrel34", 0, 34, MCFixupKindInfo::FKF_IsPCRel }, - { "fixup_ppc_imm34", 0, 34, 0 }, { "fixup_ppc_nofixup", 0, 0, 0 } }; const static MCFixupKindInfo InfosLE[PPC::NumTargetFixupKinds] = { @@ -115,7 +112,6 @@ class PPCAsmBackend : public MCAsmBackend { { "fixup_ppc_half16", 0, 16, 0 }, { "fixup_ppc_half16ds", 2, 14, 0 }, { "fixup_ppc_pcrel34", 0, 34, MCFixupKindInfo::FKF_IsPCRel }, - { "fixup_ppc_imm34", 0, 34, 0 }, { "fixup_ppc_nofixup", 0, 0, 0 } }; diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp index 
1af08ec5539d..d8b3301e97f1 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp @@ -409,9 +409,6 @@ unsigned PPCELFObjectWriter::getRelocType(MCContext &Ctx, const MCValue &Target, break; } break; - case PPC::fixup_ppc_imm34: - llvm_unreachable("Unsupported Modifier for fixup_ppc_imm34."); - break; case FK_Data_8: switch (Modifier) { default: llvm_unreachable("Unsupported Modifier"); diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h index 73292f7b7938..2fb8947fd4e0 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h @@ -43,9 +43,6 @@ enum Fixups { // A 34-bit fixup corresponding to PC-relative paddi. fixup_ppc_pcrel34, - // A 34-bit fixup corresponding to Non-PC-relative paddi. - fixup_ppc_imm34, - /// Not a true fixup, but ties a symbol to a call to __tls_get_addr for the /// TLS general and local dynamic models, or inserts the thread-pointer /// register number. 
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp index 8c0e0a80b1e2..fb65e7320f2b 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp @@ -104,36 +104,20 @@ unsigned PPCMCCodeEmitter::getImm16Encoding(const MCInst &MI, unsigned OpNo, return 0; } -uint64_t PPCMCCodeEmitter::getImm34Encoding(const MCInst &MI, unsigned OpNo, - SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI, - MCFixupKind Fixup) const { +uint64_t +PPCMCCodeEmitter::getImm34Encoding(const MCInst &MI, unsigned OpNo, + SmallVectorImpl &Fixups, + const MCSubtargetInfo &STI) const { const MCOperand &MO = MI.getOperand(OpNo); - assert(!MO.isReg() && "Not expecting a register for this operand."); - if (MO.isImm()) + if (MO.isReg() || MO.isImm()) return getMachineOpValue(MI, MO, Fixups, STI); // Add a fixup for the immediate field. - Fixups.push_back(MCFixup::create(0, MO.getExpr(), Fixup)); + Fixups.push_back(MCFixup::create(0, MO.getExpr(), + (MCFixupKind)PPC::fixup_ppc_pcrel34)); return 0; } -uint64_t -PPCMCCodeEmitter::getImm34EncodingNoPCRel(const MCInst &MI, unsigned OpNo, - SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI) const { - return getImm34Encoding(MI, OpNo, Fixups, STI, - (MCFixupKind)PPC::fixup_ppc_imm34); -} - -uint64_t -PPCMCCodeEmitter::getImm34EncodingPCRel(const MCInst &MI, unsigned OpNo, - SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI) const { - return getImm34Encoding(MI, OpNo, Fixups, STI, - (MCFixupKind)PPC::fixup_ppc_pcrel34); -} - unsigned PPCMCCodeEmitter::getMemRIEncoding(const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, const MCSubtargetInfo &STI) const { diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h index 4504cc6a7405..588aa76bd806 100644 --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h +++ 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h @@ -52,14 +52,7 @@ class PPCMCCodeEmitter : public MCCodeEmitter { const MCSubtargetInfo &STI) const; uint64_t getImm34Encoding(const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI, - MCFixupKind Fixup) const; - uint64_t getImm34EncodingNoPCRel(const MCInst &MI, unsigned OpNo, - SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI) const; - uint64_t getImm34EncodingPCRel(const MCInst &MI, unsigned OpNo, - SmallVectorImpl &Fixups, - const MCSubtargetInfo &STI) const; + const MCSubtargetInfo &STI) const; unsigned getMemRIEncoding(const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, const MCSubtargetInfo &STI) const; diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.td b/llvm/lib/Target/PowerPC/PPCInstrInfo.td index 39a90bf9b346..673ab63039cf 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrInfo.td +++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.td @@ -757,13 +757,7 @@ def PPCS34ImmAsmOperand : AsmOperandClass { } def s34imm : Operand { let PrintMethod = "printS34ImmOperand"; - let EncoderMethod = "getImm34EncodingNoPCRel"; - let ParserMatchClass = PPCS34ImmAsmOperand; - let DecoderMethod = "decodeSImmOperand<34>"; -} -def s34imm_pcrel : Operand { - let PrintMethod = "printS34ImmOperand"; - let EncoderMethod = "getImm34EncodingPCRel"; + let EncoderMethod = "getImm34Encoding"; let ParserMatchClass = PPCS34ImmAsmOperand; let DecoderMethod = "decodeSImmOperand<34>"; } diff --git a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td index 91bb912e5726..2c21d0a175ad 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -432,7 +432,7 @@ let Predicates = [PrefixInstrs] in { let Interpretation64Bit = 1, isCodeGenOnly = 1 in { defm PADDI8 : MLS_DForm_R_SI34_RTA5_p<14, (outs g8rc:$RT), (ins g8rc:$RA, s34imm:$SI), - (ins immZero:$RA, s34imm_pcrel:$SI), + (ins immZero:$RA, s34imm:$SI), "paddi $RT, $RA, $SI", 
IIC_LdStLFD>;
   let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in {
     def PLI8 : MLS_DForm_SI34_RT5<14, (outs g8rc:$RT),
@@ -442,7 +442,7 @@ let Predicates = [PrefixInstrs] in {
   }
   defm PADDI : MLS_DForm_R_SI34_RTA5_p<14, (outs gprc:$RT),
                (ins gprc:$RA, s34imm:$SI),
-               (ins immZero:$RA, s34imm_pcrel:$SI),
+               (ins immZero:$RA, s34imm:$SI),
                "paddi $RT, $RA, $SI", IIC_LdStLFD>;
   let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in {
     def PLI : MLS_DForm_SI34_RT5<14, (outs gprc:$RT),
diff --git a/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s b/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s
deleted file mode 100644
index 0d2c879380e0..000000000000
--- a/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s
+++ /dev/null
@@ -1,7 +0,0 @@
-# RUN: not --crash llvm-mc -triple powerpc64-- --filetype=obj < %s 2> %t
-# RUN: FileCheck < %t %s
-# RUN: not --crash llvm-mc -triple powerpc64le-- --filetype=obj < %s 2> %t
-# RUN: FileCheck < %t %s
-
-# CHECK: Unsupported Modifier for fixup_ppc_imm34.
-paddi 3, 13, symbol@toc, 0

From llvm-commits at lists.llvm.org Thu Jul 9 13:36:42 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits)
Date: Thu, 9 Jul 2020 23:36:42 +0300
Subject: [lld] c282708 - Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch.
In-Reply-To: <5f075f87.1c69fb81.43098.87a8@mx.google.com>
References: <5f075f87.1c69fb81.43098.87a8@mx.google.com>
Message-ID:

This seems to have broken http://45.33.8.238/win/19452/step_4.txt

On Thu, Jul 9, 2020 at 9:18 PM Eric Christopher via llvm-commits wrote:
>
>
> Author: Eric Christopher
> Date: 2020-07-09T11:14:00-07:00
> New Revision: c2827083166cd5150232d8fd3ada3cf8fa8c9ac3
>
> URL: https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3
> DIFF: https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3.diff
>
> LOG: Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch.
> > Added: > > > Modified: > lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > > Removed: > > > > ################################################################################ > diff --git a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > index aad5f8afcfdc..07c1d4242e03 100644 > --- a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > +++ b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > @@ -75,7 +75,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64) { > fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > EXPECT_TRUE(f->undefinedSymbols.empty()); > @@ -106,7 +106,7 @@ TEST(BinaryReaderTest, empty_obj_x86) { > fromBinary(fileBytes, sizeof(fileBytes), "i386"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > EXPECT_TRUE(f->undefinedSymbols.empty()); > @@ -137,7 +137,7 @@ TEST(BinaryReaderTest, empty_obj_ppc) { > fromBinary(fileBytes, sizeof(fileBytes), "ppc"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > EXPECT_TRUE(f->undefinedSymbols.empty()); > @@ -168,7 +168,7 @@ 
TEST(BinaryReaderTest, empty_obj_armv7) { > fromBinary(fileBytes, sizeof(fileBytes), "armv7"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > EXPECT_TRUE(f->undefinedSymbols.empty()); > @@ -182,7 +182,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { > fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > EXPECT_TRUE(f->undefinedSymbols.empty()); > @@ -191,7 +191,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { > fromBinary(fileBytes, sizeof(fileBytes), "armv7"); > EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_armv7); > EXPECT_EQ((int)(f2->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f2->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f2->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f2->localSymbols.empty()); > EXPECT_TRUE(f2->globalSymbols.empty()); > EXPECT_TRUE(f2->undefinedSymbols.empty()); > @@ -268,7 +268,7 @@ TEST(BinaryReaderTest, hello_obj_x86_64) { > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_EQ(f->sections.size(), 2UL); > const Section& text = f->sections[0]; > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > @@ -393,7 +393,7 @@ TEST(BinaryReaderTest, hello_obj_x86) { > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), 
MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_EQ(f->sections.size(), 2UL); > const Section& text = f->sections[0]; > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > @@ -525,7 +525,7 @@ TEST(BinaryReaderTest, hello_obj_armv7) { > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_EQ(f->sections.size(), 2UL); > const Section& text = f->sections[0]; > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > @@ -669,7 +669,7 @@ TEST(BinaryReaderTest, hello_obj_ppc) { > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_EQ(f->sections.size(), 2UL); > const Section& text = f->sections[0]; > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > > diff --git a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > index 6ceb197b4b84..c1445ea7eacd 100644 > --- a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > +++ b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > @@ -50,7 +50,7 @@ TEST(ObjectFileYAML, empty_ppc) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->sections.empty()); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > @@ -66,7 +66,7 @@ TEST(ObjectFileYAML, empty_x86_64) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), 
llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->sections.empty()); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > @@ -82,7 +82,7 @@ TEST(ObjectFileYAML, empty_x86) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->sections.empty()); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > @@ -98,7 +98,7 @@ TEST(ObjectFileYAML, empty_armv6) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->sections.empty()); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > @@ -114,7 +114,7 @@ TEST(ObjectFileYAML, empty_armv7) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->sections.empty()); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > @@ -130,7 +130,7 @@ TEST(ObjectFileYAML, empty_armv7s) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7s); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f->sections.empty()); > EXPECT_TRUE(f->localSymbols.empty()); > EXPECT_TRUE(f->globalSymbols.empty()); > @@ -151,7 +151,7 @@ 
TEST(ObjectFileYAML, roundTrip) { > std::unique_ptr f2 = fromYAML(intermediate); > EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_x86_64); > EXPECT_EQ((int)(f2->fileType), llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f2->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f2->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_TRUE(f2->sections.empty()); > EXPECT_TRUE(f2->localSymbols.empty()); > EXPECT_TRUE(f2->globalSymbols.empty()); > @@ -275,7 +275,7 @@ TEST(ObjectFileYAML, hello_x86_64) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_EQ(f->sections.size(), 2UL); > > const Section& sect1 = f->sections[0]; > @@ -405,7 +405,7 @@ TEST(ObjectFileYAML, hello_x86) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_EQ(f->sections.size(), 2UL); > > const Section& sect1 = f->sections[0]; > @@ -533,7 +533,7 @@ TEST(ObjectFileYAML, hello_armv6) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > EXPECT_EQ(f->sections.size(), 2UL); > > const Section& sect1 = f->sections[0]; > @@ -673,7 +673,7 @@ TEST(ObjectFileYAML, hello_armv7) { > "...\n"); > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > 
EXPECT_EQ(f->sections.size(), 2UL);
>
> const Section& sect1 = f->sections[0];
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

From llvm-commits at lists.llvm.org Thu Jul 9 13:39:33 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 20:39:33 +0000 (UTC)
Subject: [PATCH] D83505: [NFC] Add utility to sum/merge stats files
In-Reply-To:
References:
Message-ID: <7c1c519b8222da43bfeeb5c5534f0779@localhost.localdomain>

lebedev.ri added a comment.

Is there really no such existing script? (I don't know of one; I've always hacked around with bash.)
It's a non-blocker, but I just want to point out that aggregating via sum is not always correct for all stats: there's the `updateMax()` setter.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83505/new/

https://reviews.llvm.org/D83505

From llvm-commits at lists.llvm.org Thu Jul 9 13:40:44 2020
From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 20:40:44 +0000 (UTC)
Subject: [PATCH] D83506: [NFC] Add debug and stat counters to assume queries and assume builder
Message-ID:

Tyker created this revision.
Tyker added a reviewer: jdoerfert.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a project: LLVM.

Add debug counter and stats counter to assume queries and assume builder.
Here are the collected stats on a build of check-llvm + check-clang:

  "assume-builder.NumAssumeBuilt": 2720879,
  "assume-builder.NumAssumesMerged": 761396,
  "assume-builder.NumAssumesRemoved": 1576212,
  "assume-builder.NumBundlesInAssumes": 6518809,
  "assume-queries.NumAssumeQueries": 85566380,
  "assume-queries.NumUsefullAssumeQueries": 2727360,

The NumUsefullAssumeQueries stat is actually pessimistic, because in a few places queries ask to keep being provided with information in order to find better information, and this isn't counted as a useful query even though it can be useful.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83506

Files:
  llvm/lib/Analysis/AssumeBundleQueries.cpp
  llvm/lib/Transforms/Utils/AssumeBundleBuilder.cpp
  llvm/test/Analysis/ValueTracking/assume-queries-counter.ll
  llvm/test/Transforms/Util/assume-builder-counter.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83506.276829.patch
Type: text/x-patch
Size: 15525 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 13:40:31 2020
From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits)
Date: Thu, 9 Jul 2020 13:40:31 -0700
Subject: [llvm] bd20680 - [PowerPC] Split s34imm into two types
In-Reply-To: <5f0745ca.1c69fb81.756d.82c6@mx.google.com>
References: <5f0745ca.1c69fb81.756d.82c6@mx.google.com>
Message-ID:

Hi Kamau and Stefan,

This was failing with asserts in a release+asserts build so I've reverted it thusly:

echristo at athyra ~/s/llvm-project> git push
To github.com:llvm/llvm-project.git
   c025bdf25a5..ce1e4853b5a  master -> master

If you need any help reproducing or debugging let me know :)

Thanks!

-eric On Thu, Jul 9, 2020 at 9:29 AM Kamau Bridgeman via llvm-commits < llvm-commits at lists.llvm.org> wrote: > > Author: Stefan Pintilie > Date: 2020-07-09T11:28:32-05:00 > New Revision: bd2068031121adf5a0e28d9306a1741d6f0bbd87 > > URL: > https://github.com/llvm/llvm-project/commit/bd2068031121adf5a0e28d9306a1741d6f0bbd87 > DIFF: > https://github.com/llvm/llvm-project/commit/bd2068031121adf5a0e28d9306a1741d6f0bbd87.diff > > LOG: [PowerPC] Split s34imm into two types > > Currently the instruction paddi always takes s34imm as the type for the > 34 bit immediate. However, the PC Relative form of the instruction should > not produce the same fixup as the non PC Relative form. > This patch splits the s34imm type into s34imm and s34imm_pcrel so that two > different fixups can be emitted. > > Reviewed By: kamaub, nemanjai > > Differential Revision: https://reviews.llvm.org/D83255 > > Added: > llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s > > Modified: > llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp > llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp > llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h > llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp > llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h > llvm/lib/Target/PowerPC/PPCInstrInfo.td > llvm/lib/Target/PowerPC/PPCInstrPrefix.td > > Removed: > > > > > ################################################################################ > diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp > b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp > index dbaf221db9fc..59cb2b994a4b 100644 > --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp > +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp > @@ -46,6 +46,7 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t > Value) { > case PPC::fixup_ppc_half16ds: > return Value & 0xfffc; > case PPC::fixup_ppc_pcrel34: > + case PPC::fixup_ppc_imm34: > return Value & 0x3ffffffff; > } > } > @@ -68,6 
+69,7 @@ static unsigned getFixupKindNumBytes(unsigned Kind) { > case PPC::fixup_ppc_br24_notoc: > return 4; > case PPC::fixup_ppc_pcrel34: > + case PPC::fixup_ppc_imm34: > case FK_Data_8: > return 8; > case PPC::fixup_ppc_nofixup: > @@ -100,6 +102,7 @@ class PPCAsmBackend : public MCAsmBackend { > { "fixup_ppc_half16", 0, 16, 0 }, > { "fixup_ppc_half16ds", 0, 14, 0 }, > { "fixup_ppc_pcrel34", 0, 34, > MCFixupKindInfo::FKF_IsPCRel }, > + { "fixup_ppc_imm34", 0, 34, 0 }, > { "fixup_ppc_nofixup", 0, 0, 0 } > }; > const static MCFixupKindInfo InfosLE[PPC::NumTargetFixupKinds] = { > @@ -112,6 +115,7 @@ class PPCAsmBackend : public MCAsmBackend { > { "fixup_ppc_half16", 0, 16, 0 }, > { "fixup_ppc_half16ds", 2, 14, 0 }, > { "fixup_ppc_pcrel34", 0, 34, > MCFixupKindInfo::FKF_IsPCRel }, > + { "fixup_ppc_imm34", 0, 34, 0 }, > { "fixup_ppc_nofixup", 0, 0, 0 } > }; > > > diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp > b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp > index d8b3301e97f1..1af08ec5539d 100644 > --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp > +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFObjectWriter.cpp > @@ -409,6 +409,9 @@ unsigned PPCELFObjectWriter::getRelocType(MCContext > &Ctx, const MCValue &Target, > break; > } > break; > + case PPC::fixup_ppc_imm34: > + llvm_unreachable("Unsupported Modifier for fixup_ppc_imm34."); > + break; > case FK_Data_8: > switch (Modifier) { > default: llvm_unreachable("Unsupported Modifier"); > > diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h > b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h > index 2fb8947fd4e0..73292f7b7938 100644 > --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h > +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h > @@ -43,6 +43,9 @@ enum Fixups { > // A 34-bit fixup corresponding to PC-relative paddi. > fixup_ppc_pcrel34, > > + // A 34-bit fixup corresponding to Non-PC-relative paddi. 
> + fixup_ppc_imm34, > + > /// Not a true fixup, but ties a symbol to a call to __tls_get_addr for > the > /// TLS general and local dynamic models, or inserts the thread-pointer > /// register number. > > diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp > b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp > index fb65e7320f2b..8c0e0a80b1e2 100644 > --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp > +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp > @@ -104,20 +104,36 @@ unsigned PPCMCCodeEmitter::getImm16Encoding(const > MCInst &MI, unsigned OpNo, > return 0; > } > > -uint64_t > -PPCMCCodeEmitter::getImm34Encoding(const MCInst &MI, unsigned OpNo, > - SmallVectorImpl &Fixups, > - const MCSubtargetInfo &STI) const { > +uint64_t PPCMCCodeEmitter::getImm34Encoding(const MCInst &MI, unsigned > OpNo, > + SmallVectorImpl > &Fixups, > + const MCSubtargetInfo &STI, > + MCFixupKind Fixup) const { > const MCOperand &MO = MI.getOperand(OpNo); > - if (MO.isReg() || MO.isImm()) > + assert(!MO.isReg() && "Not expecting a register for this operand."); > + if (MO.isImm()) > return getMachineOpValue(MI, MO, Fixups, STI); > > // Add a fixup for the immediate field. 
> - Fixups.push_back(MCFixup::create(0, MO.getExpr(), > - (MCFixupKind)PPC::fixup_ppc_pcrel34)); > + Fixups.push_back(MCFixup::create(0, MO.getExpr(), Fixup)); > return 0; > } > > +uint64_t > +PPCMCCodeEmitter::getImm34EncodingNoPCRel(const MCInst &MI, unsigned OpNo, > + SmallVectorImpl > &Fixups, > + const MCSubtargetInfo &STI) > const { > + return getImm34Encoding(MI, OpNo, Fixups, STI, > + (MCFixupKind)PPC::fixup_ppc_imm34); > +} > + > +uint64_t > +PPCMCCodeEmitter::getImm34EncodingPCRel(const MCInst &MI, unsigned OpNo, > + SmallVectorImpl &Fixups, > + const MCSubtargetInfo &STI) const > { > + return getImm34Encoding(MI, OpNo, Fixups, STI, > + (MCFixupKind)PPC::fixup_ppc_pcrel34); > +} > + > unsigned PPCMCCodeEmitter::getMemRIEncoding(const MCInst &MI, unsigned > OpNo, > SmallVectorImpl > &Fixups, > const MCSubtargetInfo &STI) > const { > > diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h > b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h > index 588aa76bd806..4504cc6a7405 100644 > --- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h > +++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h > @@ -52,7 +52,14 @@ class PPCMCCodeEmitter : public MCCodeEmitter { > const MCSubtargetInfo &STI) const; > uint64_t getImm34Encoding(const MCInst &MI, unsigned OpNo, > SmallVectorImpl &Fixups, > - const MCSubtargetInfo &STI) const; > + const MCSubtargetInfo &STI, > + MCFixupKind Fixup) const; > + uint64_t getImm34EncodingNoPCRel(const MCInst &MI, unsigned OpNo, > + SmallVectorImpl &Fixups, > + const MCSubtargetInfo &STI) const; > + uint64_t getImm34EncodingPCRel(const MCInst &MI, unsigned OpNo, > + SmallVectorImpl &Fixups, > + const MCSubtargetInfo &STI) const; > unsigned getMemRIEncoding(const MCInst &MI, unsigned OpNo, > SmallVectorImpl &Fixups, > const MCSubtargetInfo &STI) const; > > diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.td > b/llvm/lib/Target/PowerPC/PPCInstrInfo.td > index 673ab63039cf..39a90bf9b346 100644 > --- 
a/llvm/lib/Target/PowerPC/PPCInstrInfo.td > +++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.td > @@ -757,7 +757,13 @@ def PPCS34ImmAsmOperand : AsmOperandClass { > } > def s34imm : Operand { > let PrintMethod = "printS34ImmOperand"; > - let EncoderMethod = "getImm34Encoding"; > + let EncoderMethod = "getImm34EncodingNoPCRel"; > + let ParserMatchClass = PPCS34ImmAsmOperand; > + let DecoderMethod = "decodeSImmOperand<34>"; > +} > +def s34imm_pcrel : Operand { > + let PrintMethod = "printS34ImmOperand"; > + let EncoderMethod = "getImm34EncodingPCRel"; > let ParserMatchClass = PPCS34ImmAsmOperand; > let DecoderMethod = "decodeSImmOperand<34>"; > } > > diff --git a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td > b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td > index 2c21d0a175ad..91bb912e5726 100644 > --- a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td > +++ b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td > @@ -432,7 +432,7 @@ let Predicates = [PrefixInstrs] in { > let Interpretation64Bit = 1, isCodeGenOnly = 1 in { > defm PADDI8 : > MLS_DForm_R_SI34_RTA5_p<14, (outs g8rc:$RT), (ins g8rc:$RA, > s34imm:$SI), > - (ins immZero:$RA, s34imm:$SI), > + (ins immZero:$RA, s34imm_pcrel:$SI), > "paddi $RT, $RA, $SI", IIC_LdStLFD>; > let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in { > def PLI8 : MLS_DForm_SI34_RT5<14, (outs g8rc:$RT), > @@ -442,7 +442,7 @@ let Predicates = [PrefixInstrs] in { > } > defm PADDI : > MLS_DForm_R_SI34_RTA5_p<14, (outs gprc:$RT), (ins gprc:$RA, > s34imm:$SI), > - (ins immZero:$RA, s34imm:$SI), > + (ins immZero:$RA, s34imm_pcrel:$SI), > "paddi $RT, $RA, $SI", IIC_LdStLFD>; > let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in { > def PLI : MLS_DForm_SI34_RT5<14, (outs gprc:$RT), > > diff --git a/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s > b/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s > new file mode 100644 > index 000000000000..0d2c879380e0 > --- /dev/null > +++ b/llvm/test/MC/PowerPC/ppc64-errors-emit-obj.s > @@ -0,0 +1,7 @@ > +# RUN: 
not --crash llvm-mc -triple powerpc64-- --filetype=obj < %s 2> %t > +# RUN: FileCheck < %t %s > +# RUN: not --crash llvm-mc -triple powerpc64le-- --filetype=obj < %s 2> %t > +# RUN: FileCheck < %t %s > + > +# CHECK: Unsupported Modifier for fixup_ppc_imm34. > +paddi 3, 13, symbol at toc, 0 > > > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:42:10 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Thu, 9 Jul 2020 13:42:10 -0700 Subject: [lld] c282708 - Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch. In-Reply-To: References: <5f075f87.1c69fb81.43098.87a8@mx.google.com> Message-ID: Well, heck. That's awesome and not showing up with my clang. I guess I'll revert mine and the original and we can figure out what's going on. Thanks for the heads up! -eric On Thu, Jul 9, 2020 at 1:37 PM Roman Lebedev wrote: > This seems to have broken http://45.33.8.238/win/19452/step_4.txt > > On Thu, Jul 9, 2020 at 9:18 PM Eric Christopher via llvm-commits > wrote: > > > > > > Author: Eric Christopher > > Date: 2020-07-09T11:14:00-07:00 > > New Revision: c2827083166cd5150232d8fd3ada3cf8fa8c9ac3 > > > > URL: > https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3 > > DIFF: > https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3.diff > > > > LOG: Fix [-Werror,-Wsign-compare] warnings arising from subsection > symbols patch. 
> > > > Added: > > > > > > Modified: > > lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > > lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > > > > Removed: > > > > > > > > > ################################################################################ > > diff --git > a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > > index aad5f8afcfdc..07c1d4242e03 100644 > > --- a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > > +++ b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp > > @@ -75,7 +75,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64) { > > fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > EXPECT_TRUE(f->undefinedSymbols.empty()); > > @@ -106,7 +106,7 @@ TEST(BinaryReaderTest, empty_obj_x86) { > > fromBinary(fileBytes, sizeof(fileBytes), "i386"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > EXPECT_TRUE(f->undefinedSymbols.empty()); > > @@ -137,7 +137,7 @@ TEST(BinaryReaderTest, empty_obj_ppc) { > > fromBinary(fileBytes, sizeof(fileBytes), "ppc"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->localSymbols.empty()); > > 
EXPECT_TRUE(f->globalSymbols.empty()); > > EXPECT_TRUE(f->undefinedSymbols.empty()); > > @@ -168,7 +168,7 @@ TEST(BinaryReaderTest, empty_obj_armv7) { > > fromBinary(fileBytes, sizeof(fileBytes), "armv7"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > EXPECT_TRUE(f->undefinedSymbols.empty()); > > @@ -182,7 +182,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { > > fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > EXPECT_TRUE(f->undefinedSymbols.empty()); > > @@ -191,7 +191,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { > > fromBinary(fileBytes, sizeof(fileBytes), "armv7"); > > EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_armv7); > > EXPECT_EQ((int)(f2->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f2->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f2->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f2->localSymbols.empty()); > > EXPECT_TRUE(f2->globalSymbols.empty()); > > EXPECT_TRUE(f2->undefinedSymbols.empty()); > > @@ -268,7 +268,7 @@ TEST(BinaryReaderTest, hello_obj_x86_64) { > > > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_EQ(f->sections.size(), 2UL); > > const Section& text = f->sections[0]; > > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > > @@ -393,7 +393,7 @@ 
TEST(BinaryReaderTest, hello_obj_x86) { > > > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_EQ(f->sections.size(), 2UL); > > const Section& text = f->sections[0]; > > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > > @@ -525,7 +525,7 @@ TEST(BinaryReaderTest, hello_obj_armv7) { > > > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_EQ(f->sections.size(), 2UL); > > const Section& text = f->sections[0]; > > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > > @@ -669,7 +669,7 @@ TEST(BinaryReaderTest, hello_obj_ppc) { > > > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); > > EXPECT_EQ((int)(f->fileType), MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_EQ(f->sections.size(), 2UL); > > const Section& text = f->sections[0]; > > EXPECT_TRUE(text.segmentName.equals("__TEXT")); > > > > diff --git a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > > index 6ceb197b4b84..c1445ea7eacd 100644 > > --- a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > > +++ b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp > > @@ -50,7 +50,7 @@ TEST(ObjectFileYAML, empty_ppc) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->sections.empty()); > > EXPECT_TRUE(f->localSymbols.empty()); > > 
EXPECT_TRUE(f->globalSymbols.empty()); > > @@ -66,7 +66,7 @@ TEST(ObjectFileYAML, empty_x86_64) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->sections.empty()); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > @@ -82,7 +82,7 @@ TEST(ObjectFileYAML, empty_x86) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->sections.empty()); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > @@ -98,7 +98,7 @@ TEST(ObjectFileYAML, empty_armv6) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->sections.empty()); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > @@ -114,7 +114,7 @@ TEST(ObjectFileYAML, empty_armv7) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->sections.empty()); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > @@ -130,7 +130,7 @@ TEST(ObjectFileYAML, empty_armv7s) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7s); > > 
EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f->sections.empty()); > > EXPECT_TRUE(f->localSymbols.empty()); > > EXPECT_TRUE(f->globalSymbols.empty()); > > @@ -151,7 +151,7 @@ TEST(ObjectFileYAML, roundTrip) { > > std::unique_ptr f2 = fromYAML(intermediate); > > EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_x86_64); > > EXPECT_EQ((int)(f2->fileType), llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f2->flags), > llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f2->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_TRUE(f2->sections.empty()); > > EXPECT_TRUE(f2->localSymbols.empty()); > > EXPECT_TRUE(f2->globalSymbols.empty()); > > @@ -275,7 +275,7 @@ TEST(ObjectFileYAML, hello_x86_64) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_EQ(f->sections.size(), 2UL); > > > > const Section& sect1 = f->sections[0]; > > @@ -405,7 +405,7 @@ TEST(ObjectFileYAML, hello_x86) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_EQ(f->sections.size(), 2UL); > > > > const Section& sect1 = f->sections[0]; > > @@ -533,7 +533,7 @@ TEST(ObjectFileYAML, hello_armv6) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > 
EXPECT_EQ(f->sections.size(), 2UL); > > > > const Section& sect1 = f->sections[0]; > > @@ -673,7 +673,7 @@ TEST(ObjectFileYAML, hello_armv7) { > > "...\n"); > > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); > > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); > > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); > > EXPECT_EQ(f->sections.size(), 2UL); > > > > const Section& sect1 = f->sections[0]; > > > > > > > > _______________________________________________ > > llvm-commits mailing list > > llvm-commits at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:46:45 2020 From: llvm-commits at lists.llvm.org (Vy Nguyen via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:46:45 +0000 (UTC) Subject: [PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements. In-Reply-To: References: Message-ID: <41655314d3691c17d2d333297f40e362@localhost.localdomain> oontvoo updated this revision to Diff 276830. oontvoo marked an inline comment as done. oontvoo added a comment. 
Removed empty line Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77422/new/ https://reviews.llvm.org/D77422 Files: llvm/docs/CommandGuide/llvm-exegesis.rst llvm/test/tools/llvm-exegesis/X86/lbr/Inputs/mov_add.att llvm/test/tools/llvm-exegesis/X86/lbr/lit.local.cfg llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.h llvm/tools/llvm-exegesis/lib/X86/CMakeLists.txt llvm/tools/llvm-exegesis/lib/X86/Target.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.h llvm/tools/llvm-exegesis/llvm-exegesis.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D77422.276830.patch Type: text/x-patch Size: 21801 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:47:47 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:47:47 +0000 (UTC) Subject: [PATCH] D83505: [NFC] Add utility to sum/merge stats files In-Reply-To: References: Message-ID: <45a4e81b4fab57648d716016d049a07f@localhost.localdomain> Tyker added a comment. In D83505#2142556 , @lebedev.ri wrote: > Is there really no such existing script? (i don't know of one, i've always hacked around with bash) i tried to find one the first time i needed one and didn't find any. maybe there is one i didn't find. > It's a non-blocker, but i just want to point out that aggregating via sum is not always correct for all stats - there's `updateMax()` setter. ok i wasn't aware of it, i don't think it is really possible to differentiate between them. but we could count both the sum and the max for all stats. 
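For illustration only, the "count both the sum and the max" idea could look roughly like the sketch below. This is not the script from D83505; the function name `merge_stats`, the output layout, and the counter `example.MaxDepth` are hypothetical, and the input is assumed to be flat JSON objects mapping stat names to integers.

```python
import json
from collections import defaultdict


def merge_stats(stat_dicts):
    """Merge per-run stat dictionaries, recording both sum and max per counter.

    Hypothetical helper: since the merged file cannot tell which counters
    were written with updateMax(), both aggregates are kept for every stat
    and the reader picks the one that is meaningful.
    """
    sums = defaultdict(int)
    maxes = {}
    for stats in stat_dicts:
        for name, value in stats.items():
            sums[name] += value
            # Seed the running max with the first value seen for this name.
            maxes[name] = max(maxes.get(name, value), value)
    return {name: {"sum": sums[name], "max": maxes[name]} for name in sorted(sums)}


# Two made-up runs; "example.MaxDepth" stands in for a max-style counter.
run_a = json.loads('{"assume-queries.NumAssumeQueries": 10, "example.MaxDepth": 4}')
run_b = json.loads('{"assume-queries.NumAssumeQueries": 5, "example.MaxDepth": 7}')
merged = merge_stats([run_a, run_b])
print(json.dumps(merged, indent=2))
```

For a counter that is incremented, the "sum" field is the useful one; for a counter maintained through `updateMax()`, the "max" field is.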
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83505/new/ https://reviews.llvm.org/D83505 From llvm-commits at lists.llvm.org Thu Jul 9 13:47:58 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:47:58 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <0f5fe12c74ad4201f9578bb02987fd1f@localhost.localdomain> lebedev.ri added a comment. I'm not sure if this is the commit to blame, but i think this might have broken http://45.33.8.238/linux/22501/step_12.txt Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Thu Jul 9 13:50:10 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:50:10 +0000 (UTC) Subject: [PATCH] D83505: [NFC] Add utility to sum/merge stats files In-Reply-To: References: Message-ID: <8302d5678faceab14a36464e8055359f@localhost.localdomain> lebedev.ri added a comment. In D83505#2142581 , @Tyker wrote: > In D83505#2142556 , @lebedev.ri wrote: > > > Is there really no such existing script? (i don't know of one, i've always hacked around with bash) > > > i tried to find one the first time i needed one and didn't find any. maybe there is one i didn't find. > > > It's a non-blocker, but i just want to point out that aggregating via sum is not always correct for all stats - there's `updateMax()` setter. > > ok > i wasn't aware of it, i don't think it is really possible to differentiate between them. Yep. Just feel like pointing out. > but we could count both the sum and the max for all stats. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83505/new/ https://reviews.llvm.org/D83505 From llvm-commits at lists.llvm.org Thu Jul 9 13:53:32 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:53:32 +0000 (UTC) Subject: [PATCH] D83497: [PowerPC][Power10] Fix the VINSW instruction to have an i32 argument. In-Reply-To: References: Message-ID: lei accepted this revision as: lei. lei added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83497/new/ https://reviews.llvm.org/D83497 From llvm-commits at lists.llvm.org Thu Jul 9 13:59:52 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Thu, 09 Jul 2020 13:59:52 -0700 (PDT) Subject: [lld] 98eec77 - Temporarily Revert "Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch." Message-ID: <5f078548.1c69fb81.a50cb.9940@mx.google.com> Author: Eric Christopher Date: 2020-07-09T13:46:59-07:00 New Revision: 98eec7700c3f397283a3937b1d3ddfe4e6d3b910 URL: https://github.com/llvm/llvm-project/commit/98eec7700c3f397283a3937b1d3ddfe4e6d3b910 DIFF: https://github.com/llvm/llvm-project/commit/98eec7700c3f397283a3937b1d3ddfe4e6d3b910.diff LOG: Temporarily Revert "Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch." as it's causing build errors with another clang so I'll need to approach this differently. This reverts commit c2827083166cd5150232d8fd3ada3cf8fa8c9ac3. 
Added: Modified: lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp Removed: ################################################################################ diff --git a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp index 07c1d4242e03..aad5f8afcfdc 100644 --- a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp +++ b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp @@ -75,7 +75,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64) { fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -106,7 +106,7 @@ TEST(BinaryReaderTest, empty_obj_x86) { fromBinary(fileBytes, sizeof(fileBytes), "i386"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -137,7 +137,7 @@ TEST(BinaryReaderTest, empty_obj_ppc) { fromBinary(fileBytes, sizeof(fileBytes), "ppc"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -168,7 +168,7 @@ TEST(BinaryReaderTest, empty_obj_armv7) { fromBinary(fileBytes, sizeof(fileBytes), "armv7"); 
EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -182,7 +182,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -191,7 +191,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { fromBinary(fileBytes, sizeof(fileBytes), "armv7"); EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f2->fileType), MH_OBJECT); - EXPECT_EQ(f2->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f2->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f2->localSymbols.empty()); EXPECT_TRUE(f2->globalSymbols.empty()); EXPECT_TRUE(f2->undefinedSymbols.empty()); @@ -268,7 +268,7 @@ TEST(BinaryReaderTest, hello_obj_x86_64) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -393,7 +393,7 @@ TEST(BinaryReaderTest, hello_obj_x86) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; 
EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -525,7 +525,7 @@ TEST(BinaryReaderTest, hello_obj_armv7) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -669,7 +669,7 @@ TEST(BinaryReaderTest, hello_obj_ppc) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); diff --git a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp index c1445ea7eacd..6ceb197b4b84 100644 --- a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp +++ b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp @@ -50,7 +50,7 @@ TEST(ObjectFileYAML, empty_ppc) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -66,7 +66,7 @@ TEST(ObjectFileYAML, empty_x86_64) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -82,7 +82,7 @@ TEST(ObjectFileYAML, 
empty_x86) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -98,7 +98,7 @@ TEST(ObjectFileYAML, empty_armv6) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -114,7 +114,7 @@ TEST(ObjectFileYAML, empty_armv7) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -130,7 +130,7 @@ TEST(ObjectFileYAML, empty_armv7s) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7s); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -151,7 +151,7 @@ TEST(ObjectFileYAML, roundTrip) { std::unique_ptr f2 = fromYAML(intermediate); EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f2->fileType), llvm::MachO::MH_OBJECT); - EXPECT_EQ(f2->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f2->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); 
EXPECT_TRUE(f2->sections.empty()); EXPECT_TRUE(f2->localSymbols.empty()); EXPECT_TRUE(f2->globalSymbols.empty()); @@ -275,7 +275,7 @@ TEST(ObjectFileYAML, hello_x86_64) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -405,7 +405,7 @@ TEST(ObjectFileYAML, hello_x86) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -533,7 +533,7 @@ TEST(ObjectFileYAML, hello_armv6) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -673,7 +673,7 @@ TEST(ObjectFileYAML, hello_armv7) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; From llvm-commits at lists.llvm.org Thu Jul 9 14:00:26 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Thu, 9 Jul 2020 14:00:26 -0700 Subject: [lld] c282708 - Fix [-Werror,-Wsign-compare] warnings arising from subsection symbols patch. 
In-Reply-To: References: <5f075f87.1c69fb81.43098.87a8@mx.google.com> Message-ID: I've reverted it thusly: echristo at athyra ~/s/llvm-project> git push To github.com:llvm/llvm-project.git ce1e4853b5a..98eec7700c3 master -> master and I'm not sure why it just started failing on my side as there are no other changes. I'll take a look. Thanks again! -eric On Thu, Jul 9, 2020 at 1:42 PM Eric Christopher wrote: > Well, heck. That's awesome and not showing up with my clang. > > I guess I'll revert mine and the original and we can figure out what's > going on. > > Thanks for the heads up! > > -eric > > On Thu, Jul 9, 2020 at 1:37 PM Roman Lebedev wrote: > >> This seems to have broken http://45.33.8.238/win/19452/step_4.txt >> >> On Thu, Jul 9, 2020 at 9:18 PM Eric Christopher via llvm-commits >> wrote: >> > >> > >> > Author: Eric Christopher >> > Date: 2020-07-09T11:14:00-07:00 >> > New Revision: c2827083166cd5150232d8fd3ada3cf8fa8c9ac3 >> > >> > URL: >> https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3 >> > DIFF: >> https://github.com/llvm/llvm-project/commit/c2827083166cd5150232d8fd3ada3cf8fa8c9ac3.diff >> > >> > LOG: Fix [-Werror,-Wsign-compare] warnings arising from subsection >> symbols patch. 
>> > >> > Added: >> > >> > >> > Modified: >> > lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp >> > lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp >> > >> > Removed: >> > >> > >> > >> > >> ################################################################################ >> > diff --git >> a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp >> b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp >> > index aad5f8afcfdc..07c1d4242e03 100644 >> > --- a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp >> > +++ b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp >> > @@ -75,7 +75,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64) { >> > fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > EXPECT_TRUE(f->undefinedSymbols.empty()); >> > @@ -106,7 +106,7 @@ TEST(BinaryReaderTest, empty_obj_x86) { >> > fromBinary(fileBytes, sizeof(fileBytes), "i386"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > EXPECT_TRUE(f->undefinedSymbols.empty()); >> > @@ -137,7 +137,7 @@ TEST(BinaryReaderTest, empty_obj_ppc) { >> > fromBinary(fileBytes, sizeof(fileBytes), "ppc"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > 
EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > EXPECT_TRUE(f->undefinedSymbols.empty()); >> > @@ -168,7 +168,7 @@ TEST(BinaryReaderTest, empty_obj_armv7) { >> > fromBinary(fileBytes, sizeof(fileBytes), "armv7"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > EXPECT_TRUE(f->undefinedSymbols.empty()); >> > @@ -182,7 +182,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { >> > fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > EXPECT_TRUE(f->undefinedSymbols.empty()); >> > @@ -191,7 +191,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { >> > fromBinary(fileBytes, sizeof(fileBytes), "armv7"); >> > EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_armv7); >> > EXPECT_EQ((int)(f2->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f2->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f2->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f2->localSymbols.empty()); >> > EXPECT_TRUE(f2->globalSymbols.empty()); >> > EXPECT_TRUE(f2->undefinedSymbols.empty()); >> > @@ -268,7 +268,7 @@ TEST(BinaryReaderTest, hello_obj_x86_64) { >> > >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > const Section& text = f->sections[0]; >> > 
EXPECT_TRUE(text.segmentName.equals("__TEXT")); >> > @@ -393,7 +393,7 @@ TEST(BinaryReaderTest, hello_obj_x86) { >> > >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > const Section& text = f->sections[0]; >> > EXPECT_TRUE(text.segmentName.equals("__TEXT")); >> > @@ -525,7 +525,7 @@ TEST(BinaryReaderTest, hello_obj_armv7) { >> > >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > const Section& text = f->sections[0]; >> > EXPECT_TRUE(text.segmentName.equals("__TEXT")); >> > @@ -669,7 +669,7 @@ TEST(BinaryReaderTest, hello_obj_ppc) { >> > >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); >> > EXPECT_EQ((int)(f->fileType), MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > const Section& text = f->sections[0]; >> > EXPECT_TRUE(text.segmentName.equals("__TEXT")); >> > >> > diff --git a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp >> b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp >> > index 6ceb197b4b84..c1445ea7eacd 100644 >> > --- a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp >> > +++ b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp >> > @@ -50,7 +50,7 @@ TEST(ObjectFileYAML, empty_ppc) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, 
llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->sections.empty()); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > @@ -66,7 +66,7 @@ TEST(ObjectFileYAML, empty_x86_64) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->sections.empty()); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > @@ -82,7 +82,7 @@ TEST(ObjectFileYAML, empty_x86) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->sections.empty()); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > @@ -98,7 +98,7 @@ TEST(ObjectFileYAML, empty_armv6) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->sections.empty()); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > @@ -114,7 +114,7 @@ TEST(ObjectFileYAML, empty_armv7) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->sections.empty()); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > 
EXPECT_TRUE(f->globalSymbols.empty()); >> > @@ -130,7 +130,7 @@ TEST(ObjectFileYAML, empty_armv7s) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7s); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f->sections.empty()); >> > EXPECT_TRUE(f->localSymbols.empty()); >> > EXPECT_TRUE(f->globalSymbols.empty()); >> > @@ -151,7 +151,7 @@ TEST(ObjectFileYAML, roundTrip) { >> > std::unique_ptr f2 = fromYAML(intermediate); >> > EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_x86_64); >> > EXPECT_EQ((int)(f2->fileType), llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f2->flags), >> llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f2->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_TRUE(f2->sections.empty()); >> > EXPECT_TRUE(f2->localSymbols.empty()); >> > EXPECT_TRUE(f2->globalSymbols.empty()); >> > @@ -275,7 +275,7 @@ TEST(ObjectFileYAML, hello_x86_64) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > >> > const Section& sect1 = f->sections[0]; >> > @@ -405,7 +405,7 @@ TEST(ObjectFileYAML, hello_x86) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > >> > const Section& sect1 = f->sections[0]; >> > @@ -533,7 +533,7 @@ TEST(ObjectFileYAML, hello_armv6) { >> > "...\n"); >> > EXPECT_EQ(f->arch, 
lld::MachOLinkingContext::arch_armv6); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > >> > const Section& sect1 = f->sections[0]; >> > @@ -673,7 +673,7 @@ TEST(ObjectFileYAML, hello_armv7) { >> > "...\n"); >> > EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); >> > EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); >> > - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > + EXPECT_EQ(f->flags, llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); >> > EXPECT_EQ(f->sections.size(), 2UL); >> > >> > const Section& sect1 = f->sections[0]; >> > >> > >> > >> > _______________________________________________ >> > llvm-commits mailing list >> > llvm-commits at lists.llvm.org >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Thu Jul 9 14:00:44 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:00:44 +0000 (UTC) Subject: [PATCH] D83506: [NFC] Add debug and stat counters to assume queries and assume builder In-Reply-To: References: Message-ID: <4210b5113983a7766e7e8270aa530b85@localhost.localdomain> Tyker updated this revision to Diff 276831. Tyker added a comment. Fixed formatting Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83506/new/ https://reviews.llvm.org/D83506 Files: llvm/lib/Analysis/AssumeBundleQueries.cpp llvm/lib/Transforms/Utils/AssumeBundleBuilder.cpp llvm/test/Analysis/ValueTracking/assume-queries-counter.ll llvm/test/Transforms/Util/assume-builder-counter.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83506.276831.patch Type: text/x-patch Size: 15661 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 14:01:07 2020 From: llvm-commits at lists.llvm.org (Paul Robinson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:01:07 +0000 (UTC) Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE In-Reply-To: References: Message-ID: probinson added a comment. In D83462#2142036, @echristo wrote: > So the tuning here for SCE is also a "does not support" or something else? I'm told our debugger currently does not support the entry-value opcode. Locations with that opcode would be dropped/ignored (presumably just the individual location-list elements, although I haven't verified). I am nudging our folks in the direction of supporting it, but need to demonstrate value first: that the additional location descriptions increase availability, and that the expressions can be evaluated usefully. I have to say a few preliminary experiments don't make me feel too positive about that second part, as the entry-value expressions tend to rely on registers that aren't saved by the ABI, so unwinding won't recover them. In the meantime I'd prefer that they weren't emitted, for the usual size reasons. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83462/new/ https://reviews.llvm.org/D83462 From llvm-commits at lists.llvm.org Thu Jul 9 14:02:51 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Thu, 09 Jul 2020 14:02:51 -0700 (PDT) Subject: [llvm] c2a61ef - Revert "[CallGraph] Ignore callback uses" Message-ID: <5f0785fb.1c69fb81.a5778.9550@mx.google.com> Author: Roman Lebedev Date: 2020-07-10T00:02:07+03:00 New Revision: c2a61ef3885019c5e0444d8789de63e1ce4d5003 URL: https://github.com/llvm/llvm-project/commit/c2a61ef3885019c5e0444d8789de63e1ce4d5003 DIFF: https://github.com/llvm/llvm-project/commit/c2a61ef3885019c5e0444d8789de63e1ce4d5003.diff LOG: Revert "[CallGraph] Ignore callback uses" This likely has broken test/Transforms/Attributor/IPConstantProp/ tests. http://45.33.8.238/linux/22502/step_12.txt This reverts commit 205dc0922d5f7305226f7457fcbcb4224c92530c. Added: Modified: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp Removed: llvm/test/Analysis/CallGraph/ignore-callback-uses.ll ################################################################################ diff --git a/llvm/include/llvm/IR/Function.h b/llvm/include/llvm/IR/Function.h index bb4ec13c7610..ee66abc3eaed 100644 --- a/llvm/include/llvm/IR/Function.h +++ b/llvm/include/llvm/IR/Function.h @@ -830,11 +830,9 @@ class Function : public GlobalObject, public ilist_node { /// hasAddressTaken - returns true if there are any uses of this function /// other than direct calls or invokes to it, or blockaddress expressions. - /// Optionally passes back an offending user for diagnostic purposes and - /// ignores callback uses. + /// Optionally passes back an offending user for diagnostic purposes. 
/// - bool hasAddressTaken(const User ** = nullptr, - bool IgnoreCallbackUses = false) const; + bool hasAddressTaken(const User** = nullptr) const; /// isDefTriviallyDead - Return true if it is trivially safe to remove /// this function definition from the module (because it isn't externally diff --git a/llvm/lib/Analysis/CallGraph.cpp b/llvm/lib/Analysis/CallGraph.cpp index 08264512f400..d8abccfdb095 100644 --- a/llvm/lib/Analysis/CallGraph.cpp +++ b/llvm/lib/Analysis/CallGraph.cpp @@ -77,11 +77,9 @@ bool CallGraph::invalidate(Module &, const PreservedAnalyses &PA, void CallGraph::addToCallGraph(Function *F) { CallGraphNode *Node = getOrInsertFunction(F); - bool IgnoreCallbackUses = true; - - // If this function has external linkage or has its address taken and - // it is not a callback, then anything could call it. - if (!F->hasLocalLinkage() || F->hasAddressTaken(nullptr, IgnoreCallbackUses)) + // If this function has external linkage or has its address taken, anything + // could call it. + if (!F->hasLocalLinkage() || F->hasAddressTaken()) ExternalCallingNode->addCalledFunction(nullptr, Node); populateCallGraphNode(Node); diff --git a/llvm/lib/IR/Function.cpp b/llvm/lib/IR/Function.cpp index 995bc40c362f..0ec0cce83a8c 100644 --- a/llvm/lib/IR/Function.cpp +++ b/llvm/lib/IR/Function.cpp @@ -20,7 +20,6 @@ #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" -#include "llvm/IR/AbstractCallSite.h" #include "llvm/IR/Argument.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/BasicBlock.h" @@ -1485,18 +1484,12 @@ Optional Intrinsic::remangleIntrinsicFunction(Function *F) { } /// hasAddressTaken - returns true if there are any uses of this function -/// other than direct calls or invokes to it. Optionally ignores callback -/// uses. -bool Function::hasAddressTaken(const User **PutOffender, - bool IgnoreCallbackUses) const { +/// other than direct calls or invokes to it. 
+bool Function::hasAddressTaken(const User* *PutOffender) const { for (const Use &U : uses()) { const User *FU = U.getUser(); if (isa(FU)) continue; - - if (IgnoreCallbackUses && AbstractCallSite(&U)) - continue; - const auto *Call = dyn_cast(FU); if (!Call) { if (PutOffender) diff --git a/llvm/test/Analysis/CallGraph/ignore-callback-uses.ll b/llvm/test/Analysis/CallGraph/ignore-callback-uses.ll deleted file mode 100644 index 8964ca1efd86..000000000000 --- a/llvm/test/Analysis/CallGraph/ignore-callback-uses.ll +++ /dev/null @@ -1,51 +0,0 @@ -; RUN: opt < %s -print-callgraph -disable-output 2>&1 | FileCheck %s -; CHECK: Call graph node <><<{{.*}}>> #uses=0 -; CHECK-NEXT: CS<{{.*}}> calls function 'f' -; CHECK-NEXT: CS<{{.*}}> calls function '__kmpc_fork_call' -; CHECK-EMPTY: - -%struct.ident_t = type { i32, i32, i32, i32, i8* } - - at 0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1 - at 1 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @0, i32 0, i32 0) }, align 8 - -; Function Attrs: noinline nounwind optnone uwtable -define dso_local void @f() { -entry: - br label %omp_parallel - -omp_parallel: ; preds = %entry - call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) 
@__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @f..omp_par to void (i32*, i32*, ...)*)) - br label %omp.par.exit.split - -omp.par.exit.split: ; preds = %omp_parallel - ret void -} - -; Function Attrs: norecurse nounwind -define internal void @f..omp_par(i32* noalias %tid.addr, i32* noalias %zero.addr) { -omp.par.entry: - %tid.addr.local = alloca i32, align 4 - %0 = load i32, i32* %tid.addr, align 4 - store i32 %0, i32* %tid.addr.local, align 4 - %tid = load i32, i32* %tid.addr.local, align 4 - br label %omp.par.region - -omp.par.exit.split.exitStub: ; preds = %omp.par.outlined.exit - ret void - -omp.par.region: ; preds = %omp.par.entry - br label %omp.par.pre_finalize - -omp.par.pre_finalize: ; preds = %omp.par.region - br label %omp.par.outlined.exit - -omp.par.outlined.exit: ; preds = %omp.par.pre_finalize - br label %omp.par.exit.split.exitStub -} - -; Function Attrs: nounwind -declare !callback !2 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) #2 - -!2 = !{!3} -!3 = !{i64 2, i64 -1, i64 -1, i1 true} From llvm-commits at lists.llvm.org Thu Jul 9 14:03:58 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:03:58 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <0efa030b00a3005a95f090f2a5abca51@localhost.localdomain> lebedev.ri reopened this revision. lebedev.ri added a comment. This revision is now accepted and ready to land. Temporarily reverted in c2a61ef3885019c5e0444d8789de63e1ce4d5003 . 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Thu Jul 9 14:09:51 2020 From: llvm-commits at lists.llvm.org (Paul Robinson via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:09:51 +0000 (UTC) Subject: [PATCH] D83091: [FileCheck] Improve -dump-input documentation In-Reply-To: References: Message-ID: <22f262a102a6ef19e2310aed2c02e5fd@localhost.localdomain> probinson accepted this revision. probinson added a comment. This revision is now accepted and ready to land. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83091/new/ https://reviews.llvm.org/D83091 From llvm-commits at lists.llvm.org Thu Jul 9 14:09:53 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:09:53 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <61f8418f1170e36628d3921b12f0237f@localhost.localdomain> lebedev.ri requested changes to this revision. lebedev.ri added a comment. This revision now requires changes to proceed. Yep, that was it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Thu Jul 9 14:11:22 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:11:22 +0000 (UTC) Subject: [PATCH] D83507: [AssumeBundles] Fix Bug in Assume Queries Message-ID: Tyker created this revision. Tyker added a reviewer: jdoerfert. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. this bug was causing miscompile. 
now clang cant properly selfhost with -mllvm --enable-knowledge-retention Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83507 Files: llvm/lib/Analysis/AssumeBundleQueries.cpp Index: llvm/lib/Analysis/AssumeBundleQueries.cpp =================================================================== --- llvm/lib/Analysis/AssumeBundleQueries.cpp +++ llvm/lib/Analysis/AssumeBundleQueries.cpp @@ -179,12 +179,15 @@ if (!II || Elem.Index == AssumptionCache::ExprResultIdx) continue; if (RetainedKnowledge RK = getKnowledgeFromBundle( - *II, II->bundle_op_info_begin()[Elem.Index])) + *II, II->bundle_op_info_begin()[Elem.Index])) { + if (V != RK.WasOn) + continue; if (is_contained(AttrKinds, RK.AttrKind) && Filter(RK, II, &II->bundle_op_info_begin()[Elem.Index])) { NumUsefullAssumeQueries++; return RK; } + } } return RetainedKnowledge::none(); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83507.276832.patch Type: text/x-patch Size: 801 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 14:12:43 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:12:43 +0000 (UTC) Subject: [PATCH] D83091: [FileCheck] Improve -dump-input documentation In-Reply-To: References: Message-ID: <4f9ef1322dc49589bf4509c5644eb6bd@localhost.localdomain> jdenny added a comment. Thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83091/new/ https://reviews.llvm.org/D83091 From llvm-commits at lists.llvm.org Thu Jul 9 14:13:27 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:13:27 +0000 (UTC) Subject: [PATCH] D83507: [AssumeBundles] Fix Bug in Assume Queries In-Reply-To: References: Message-ID: <8e9cb9e41e7c448dcdb66e2b1160e40a@localhost.localdomain> lebedev.ri added a comment. Test? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83507/new/ https://reviews.llvm.org/D83507 From llvm-commits at lists.llvm.org Thu Jul 9 14:15:33 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:15:33 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: zequanwu updated this revision to Diff 276834. zequanwu added a comment. Add comments and fix test failure in http://45.33.8.238/linux/22500/step_7.txt. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 Files: clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/new-pm-cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83013.276834.patch Type: text/x-patch Size: 18322 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 14:16:01 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Thu, 09 Jul 2020 14:16:01 -0700 (PDT) Subject: [llvm] c0308fd - [PredicateInfo] Print RenamedOp (NFC) Message-ID: <5f078911.1c69fb81.531f.865d@mx.google.com> Author: Nikita Popov Date: 2020-07-09T23:14:24+02:00 New Revision: c0308fd154f9a945608bd42630dc81dce5edfb40 URL: https://github.com/llvm/llvm-project/commit/c0308fd154f9a945608bd42630dc81dce5edfb40 DIFF: https://github.com/llvm/llvm-project/commit/c0308fd154f9a945608bd42630dc81dce5edfb40.diff LOG: [PredicateInfo] Print RenamedOp (NFC) Make it easier to debug renaming issues. Added: Modified: llvm/lib/Transforms/Utils/PredicateInfo.cpp llvm/test/Transforms/Util/PredicateInfo/condprop.ll llvm/test/Transforms/Util/PredicateInfo/unnamed-types.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Utils/PredicateInfo.cpp b/llvm/lib/Transforms/Utils/PredicateInfo.cpp index d320f488c5c5..c81efd77aa5f 100644 --- a/llvm/lib/Transforms/Utils/PredicateInfo.cpp +++ b/llvm/lib/Transforms/Utils/PredicateInfo.cpp @@ -896,18 +896,21 @@ class PredicateInfoAnnotatedWriter : public AssemblyAnnotationWriter { PB->From->printAsOperand(OS); OS << ","; PB->To->printAsOperand(OS); - OS << "] }\n"; + OS << "]"; } else if (const auto *PS = dyn_cast(PI)) { OS << "; switch predicate info { CaseValue: " << *PS->CaseValue << " Switch:" << *PS->Switch << " Edge: ["; PS->From->printAsOperand(OS); OS << ","; PS->To->printAsOperand(OS); - OS << "] }\n"; + OS << "]"; } else if (const auto *PA = dyn_cast(PI)) { OS << "; assume predicate info {" - << " Comparison:" << *PA->Condition << " }\n"; + << " Comparison:" << *PA->Condition; } + OS << ", RenamedOp: "; + PI->RenamedOp->printAsOperand(OS, false); + OS << " }\n"; } } }; diff --git 
a/llvm/test/Transforms/Util/PredicateInfo/condprop.ll b/llvm/test/Transforms/Util/PredicateInfo/condprop.ll index daf6bb8b40a8..756457ab7fa9 100644 --- a/llvm/test/Transforms/Util/PredicateInfo/condprop.ll +++ b/llvm/test/Transforms/Util/PredicateInfo/condprop.ll @@ -138,7 +138,7 @@ define void @test4(i1 %b, i32 %x) { ; CHECK-NEXT: i32 2, label [[CASE0]] ; CHECK-NEXT: i32 3, label [[CASE3]] ; CHECK-NEXT: i32 4, label [[DEFAULT:%.*]] -; CHECK-NEXT: ] Edge: [label [[SW]],label %case1] } +; CHECK-NEXT: ] Edge: [label [[SW]],label %case1] ; CHECK-NEXT: [[X_0:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[X:%.*]]) ; CHECK-NEXT: switch i32 [[X]], label [[DEFAULT]] [ ; CHECK-NEXT: i32 0, label [[CASE0]] diff --git a/llvm/test/Transforms/Util/PredicateInfo/unnamed-types.ll b/llvm/test/Transforms/Util/PredicateInfo/unnamed-types.ll index 21e702178fd7..d1e0f358fc9f 100644 --- a/llvm/test/Transforms/Util/PredicateInfo/unnamed-types.ll +++ b/llvm/test/Transforms/Util/PredicateInfo/unnamed-types.ll @@ -7,12 +7,12 @@ ; CHECK-LABEL: bb: ; CHECK: Has predicate info -; CHECK: branch predicate info { TrueEdge: 1 Comparison: %cmp1 = icmp ne %0* %arg, null Edge: [label %bb,label %bb1] } +; CHECK: branch predicate info { TrueEdge: 1 Comparison: %cmp1 = icmp ne %0* %arg, null Edge: [label %bb,label %bb1], RenamedOp: %arg } ; CHECK-NEXT: %arg.0 = call %0* @llvm.ssa.copy.{{.+}}(%0* %arg) ; CHECK-LABEL: bb1: ; CHECK: Has predicate info -; CHECK-NEXT: branch predicate info { TrueEdge: 0 Comparison: %cmp2 = icmp ne %1* null, %tmp Edge: [label %bb1,label %bb3] } +; CHECK-NEXT: branch predicate info { TrueEdge: 0 Comparison: %cmp2 = icmp ne %1* null, %tmp Edge: [label %bb1,label %bb3], RenamedOp: %tmp } ; CHECK-NEXT: %tmp.0 = call %1* @llvm.ssa.copy.{{.+}}(%1* %tmp) define void @f0(%0* %arg, %1* %tmp) { From llvm-commits at lists.llvm.org Thu Jul 9 14:16:25 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:16:25 +0000 (UTC) Subject: 
[PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM
In-Reply-To:
References:
Message-ID: <6884af0d21a3b98f5730e55e4ca4e729@localhost.localdomain>

zequanwu marked 5 inline comments as done.
zequanwu added inline comments.

================
Comment at: llvm/lib/Transforms/Instrumentation/CGProfile.cpp:64
   // Ignore error here. Indirect calls are ignored if this fails.
-  (void)(bool)Symtab.create(M);
+  (void)(bool) Symtab.create(M);
   for (auto &F : M) {
----------------
echristo wrote:
> Extra space? Did clang-format put this in?
Yes, `clang-format` put this in.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83013/new/

https://reviews.llvm.org/D83013

From llvm-commits at lists.llvm.org Thu Jul 9 14:18:26 2020
From: llvm-commits at lists.llvm.org (Nuno Lopes via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 21:18:26 +0000 (UTC)
Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X
In-Reply-To:
References:
Message-ID: <2434b323bab909f54e78abf9712899fc@localhost.localdomain>

nlopes added a subscriber: tgt.
nlopes added a comment.

In D83360#2142457, @efriedma wrote:

> > that's fine but I still don't understand why the counterexample to my version says %x2 in @src can be undef
>
> If I'm understanding correctly, this reduces to something like the following:
>
>   define i32 @src() {
>     %x2 = freeze i32 undef
>     ret i32 %x2
>   }
>
>   define i32 @tgt() {
>     ret i32 undef
>   }
>
> This seems a little suspect, yes.

This is a known bug: https://github.com/AliveToolkit/alive2/issues/3
gotta fix this soon.
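
For readers following the reduced `@src`/`@tgt` pair quoted above, here is a rough toy model of why that rewrite looks suspect. It is only a sketch under stated assumptions and is not how Alive2 works: values are shrunk to a 2-bit domain, undef is modelled as an empty `std::optional`, and every name in it (`Val`, `refines`, `srcOutcomes`, `tgtOutcomes`, `isRefinement`) is hypothetical. The assumed semantics are the usual ones: `freeze` of undef yields an arbitrary but fixed concrete value, and undef may be replaced by any value but not the other way round.

```cpp
#include <optional>
#include <vector>

// Toy value: a concrete 2-bit integer, or undef (modelled as std::nullopt).
// This encoding is an illustrative assumption only.
using Val = std::optional<unsigned>;

// A target value refines a source value if the source is undef (undef may
// be replaced by anything) or if both are the same concrete value.
bool refines(const Val &Tgt, const Val &Src) {
  if (!Src.has_value())
    return true;
  return Tgt.has_value() && *Tgt == *Src;
}

// @src returns `freeze undef`: some arbitrary but fixed concrete value,
// so its possible outcomes are exactly the concrete values.
std::vector<Val> srcOutcomes() {
  std::vector<Val> Out;
  for (unsigned V = 0; V < 4; ++V)
    Out.push_back(V);
  return Out;
}

// @tgt returns undef itself.
std::vector<Val> tgtOutcomes() {
  std::vector<Val> Out;
  Out.push_back(std::nullopt);
  return Out;
}

// In this model a rewrite Src -> Tgt is acceptable only if every target
// outcome refines at least one source outcome.
bool isRefinement(const std::vector<Val> &Src, const std::vector<Val> &Tgt) {
  for (const Val &T : Tgt) {
    bool Found = false;
    for (const Val &S : Src)
      Found = Found || refines(T, S);
    if (!Found)
      return false;
  }
  return true;
}
```

In this model `isRefinement(srcOutcomes(), tgtOutcomes())` is false, because a returned undef does not refine any frozen concrete value, while the opposite direction (replacing a returned undef with a frozen value) is accepted.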
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Thu Jul 9 14:26:13 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Thu, 09 Jul 2020 14:26:13 -0700 (PDT) Subject: [llvm] 6890e2a - [DAGCombiner] add helper function to manage list of consecutive stores; NFC Message-ID: <5f078b75.1c69fb81.6f076.8ff3@mx.google.com> Author: Sanjay Patel Date: 2020-07-09T17:20:03-04:00 New Revision: 6890e2a17b75211cf65fca597ada768bda348c4c URL: https://github.com/llvm/llvm-project/commit/6890e2a17b75211cf65fca597ada768bda348c4c DIFF: https://github.com/llvm/llvm-project/commit/6890e2a17b75211cf65fca597ada768bda348c4c.diff LOG: [DAGCombiner] add helper function to manage list of consecutive stores; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index dd869f98b5bc..a6293db22074 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -707,6 +707,12 @@ namespace { SmallVectorImpl &StoreNodes, unsigned NumStores, SDNode *RootNode); + /// This is a helper function for mergeConsecutiveStores. Given a list of + /// store candidates, find the first N that are consecutive in memory. + /// Returns 0 if there are not at least 2 consecutive stores to try merging. + unsigned getConsecutiveStores(SmallVectorImpl &StoreNodes, + int64_t ElementSizeBytes) const; + /// Merge consecutive store operations into a wide store. /// This optimization uses wide integers or vectors when possible. /// \return true if stores were merged. 
@@ -16237,6 +16243,46 @@ bool DAGCombiner::checkMergeStoreCandidatesForDependencies( return true; } +unsigned +DAGCombiner::getConsecutiveStores(SmallVectorImpl &StoreNodes, + int64_t ElementSizeBytes) const { + while (true) { + // Find a store past the width of the first store. + size_t StartIdx = 0; + while ((StartIdx + 1 < StoreNodes.size()) && + StoreNodes[StartIdx].OffsetFromBase + ElementSizeBytes != + StoreNodes[StartIdx + 1].OffsetFromBase) + ++StartIdx; + + // Bail if we don't have enough candidates to merge. + if (StartIdx + 1 >= StoreNodes.size()) + return 0; + + // Trim stores that overlapped with the first store. + if (StartIdx) + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + StartIdx); + + // Scan the memory operations on the chain and find the first + // non-consecutive store memory address. + unsigned NumConsecutiveStores = 1; + int64_t StartAddress = StoreNodes[0].OffsetFromBase; + // Check that the addresses are consecutive starting from the second + // element in the list of stores. + for (unsigned i = 1, e = StoreNodes.size(); i < e; ++i) { + int64_t CurrAddress = StoreNodes[i].OffsetFromBase; + if (CurrAddress - StartAddress != (ElementSizeBytes * i)) + break; + NumConsecutiveStores = i + 1; + } + if (NumConsecutiveStores > 1) + return NumConsecutiveStores; + + // There are no consecutive stores at the start of the list. + // Remove the first store and try again. + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + 1); + } +} + bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { if (OptLevel == CodeGenOpt::None || !EnableStoreMerging) return false; @@ -16297,38 +16343,14 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // front of StoreNodes here. 
bool MadeChange = false; while (StoreNodes.size() > 1) { - size_t StartIdx = 0; - while ((StartIdx + 1 < StoreNodes.size()) && - StoreNodes[StartIdx].OffsetFromBase + ElementSizeBytes != - StoreNodes[StartIdx + 1].OffsetFromBase) - ++StartIdx; - - // Bail if we don't have enough candidates to merge. - if (StartIdx + 1 >= StoreNodes.size()) + unsigned NumConsecutiveStores = + getConsecutiveStores(StoreNodes, ElementSizeBytes); + // There are no more stores in the list to examine. + if (NumConsecutiveStores == 0) return MadeChange; - if (StartIdx) - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + StartIdx); - - // Scan the memory operations on the chain and find the first - // non-consecutive store memory address. - unsigned NumConsecutiveStores = 1; - int64_t StartAddress = StoreNodes[0].OffsetFromBase; - // Check that the addresses are consecutive starting from the second - // element in the list of stores. - for (unsigned i = 1, e = StoreNodes.size(); i < e; ++i) { - int64_t CurrAddress = StoreNodes[i].OffsetFromBase; - if (CurrAddress - StartAddress != (ElementSizeBytes * i)) - break; - NumConsecutiveStores = i + 1; - } - - if (NumConsecutiveStores < 2) { - StoreNodes.erase(StoreNodes.begin(), - StoreNodes.begin() + NumConsecutiveStores); - continue; - } - + // We have at least 2 consecutive stores. Try to merge them. + assert(NumConsecutiveStores >= 2 && "Expected at least 2 stores"); if (StoreSrc == StoreSource::Constant) { // Store the constants into memory as one consecutive store. while (NumConsecutiveStores >= 2) { @@ -16518,6 +16540,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // come from multiple consecutive loads. We merge them into a single // wide load and a single wide store. assert(StoreSrc == StoreSource::Load && "Expected load source for store"); + int64_t StartAddress = StoreNodes[0].OffsetFromBase; // Look for load nodes which are used by the stored values. 
SmallVector LoadNodes; From llvm-commits at lists.llvm.org Thu Jul 9 14:26:15 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Thu, 09 Jul 2020 14:26:15 -0700 (PDT) Subject: [llvm] 8d74cb0 - [DAGCombiner] add helper function for store merging of constants; NFC Message-ID: <5f078b77.1c69fb81.23336.8485@mx.google.com> Author: Sanjay Patel Date: 2020-07-09T17:20:03-04:00 New Revision: 8d74cb01b732a55f0942a9fabe9f779820f2c06b URL: https://github.com/llvm/llvm-project/commit/8d74cb01b732a55f0942a9fabe9f779820f2c06b DIFF: https://github.com/llvm/llvm-project/commit/8d74cb01b732a55f0942a9fabe9f779820f2c06b.diff LOG: [DAGCombiner] add helper function for store merging of constants; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index a6293db22074..3e0f05a3738e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -713,6 +713,12 @@ namespace { unsigned getConsecutiveStores(SmallVectorImpl &StoreNodes, int64_t ElementSizeBytes) const; + /// This is a helper function for mergeConsecutiveStores. It is used for + /// store chains that are composed entirely of constant values. + bool tryStoreMergeOfConstants(SmallVectorImpl &StoreNodes, + unsigned NumConsecutiveStores, + EVT MemVT, bool AllowVectors, SDNode *Root); + /// Merge consecutive store operations into a wide store. /// This optimization uses wide integers or vectors when possible. /// \return true if stores were merged. 
@@ -16283,6 +16289,129 @@ DAGCombiner::getConsecutiveStores(SmallVectorImpl &StoreNodes, } } +bool DAGCombiner::tryStoreMergeOfConstants( + SmallVectorImpl &StoreNodes, unsigned NumConsecutiveStores, + EVT MemVT, bool AllowVectors, SDNode *RootNode) { + LLVMContext &Context = *DAG.getContext(); + const DataLayout &DL = DAG.getDataLayout(); + int64_t ElementSizeBytes = MemVT.getStoreSize(); + unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1; + bool MadeChange = false; + // Store the constants into memory as one consecutive store. + while (NumConsecutiveStores >= 2) { + LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; + unsigned FirstStoreAS = FirstInChain->getAddressSpace(); + unsigned FirstStoreAlign = FirstInChain->getAlignment(); + unsigned LastLegalType = 1; + unsigned LastLegalVectorType = 1; + bool LastIntegerTrunc = false; + bool NonZero = false; + unsigned FirstZeroAfterNonZero = NumConsecutiveStores; + for (unsigned i = 0; i < NumConsecutiveStores; ++i) { + StoreSDNode *ST = cast(StoreNodes[i].MemNode); + SDValue StoredVal = ST->getValue(); + bool IsElementZero = false; + if (ConstantSDNode *C = dyn_cast(StoredVal)) + IsElementZero = C->isNullValue(); + else if (ConstantFPSDNode *C = dyn_cast(StoredVal)) + IsElementZero = C->getConstantFPValue()->isNullValue(); + if (IsElementZero) { + if (NonZero && FirstZeroAfterNonZero == NumConsecutiveStores) + FirstZeroAfterNonZero = i; + } + NonZero |= !IsElementZero; + + // Find a legal type for the constant store. + unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8; + EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits); + bool IsFast = false; + + // Break early when size is too large to be legal. 
+ if (StoreTy.getSizeInBits() > MaximumLegalStoreInBits) + break; + + if (TLI.isTypeLegal(StoreTy) && + TLI.canMergeStoresTo(FirstStoreAS, StoreTy, DAG) && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstInChain->getMemOperand(), &IsFast) && + IsFast) { + LastIntegerTrunc = false; + LastLegalType = i + 1; + // Or check whether a truncstore is legal. + } else if (TLI.getTypeAction(Context, StoreTy) == + TargetLowering::TypePromoteInteger) { + EVT LegalizedStoredValTy = + TLI.getTypeToTransformTo(Context, StoredVal.getValueType()); + if (TLI.isTruncStoreLegal(LegalizedStoredValTy, StoreTy) && + TLI.canMergeStoresTo(FirstStoreAS, LegalizedStoredValTy, DAG) && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstInChain->getMemOperand(), &IsFast) && + IsFast) { + LastIntegerTrunc = true; + LastLegalType = i + 1; + } + } + + // We only use vectors if the constant is known to be zero or the + // target allows it and the function is not marked with the + // noimplicitfloat attribute. + if ((!NonZero || + TLI.storeOfVectorConstantIsCheap(MemVT, i + 1, FirstStoreAS)) && + AllowVectors) { + // Find a legal type for the vector store. + unsigned Elts = (i + 1) * NumMemElts; + EVT Ty = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts); + if (TLI.isTypeLegal(Ty) && TLI.isTypeLegal(MemVT) && + TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) && + TLI.allowsMemoryAccess(Context, DL, Ty, + *FirstInChain->getMemOperand(), &IsFast) && + IsFast) + LastLegalVectorType = i + 1; + } + } + + bool UseVector = (LastLegalVectorType > LastLegalType) && AllowVectors; + unsigned NumElem = (UseVector) ? LastLegalVectorType : LastLegalType; + + // Check if we found a legal integer type that creates a meaningful + // merge. + if (NumElem < 2) { + // We know that candidate stores are in order and of correct + // shape. While there is no mergeable sequence from the + // beginning one may start later in the sequence. 
The only + // reason a merge of size N could have failed where another of + // the same size would not have, is if the alignment has + // improved or we've dropped a non-zero value. Drop as many + // candidates as we can here. + unsigned NumSkip = 1; + while ((NumSkip < NumConsecutiveStores) && + (NumSkip < FirstZeroAfterNonZero) && + (StoreNodes[NumSkip].MemNode->getAlignment() <= FirstStoreAlign)) + NumSkip++; + + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumSkip); + NumConsecutiveStores -= NumSkip; + continue; + } + + // Check that we can merge these candidates without causing a cycle. + if (!checkMergeStoreCandidatesForDependencies(StoreNodes, NumElem, + RootNode)) { + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); + NumConsecutiveStores -= NumElem; + continue; + } + + MadeChange |= mergeStoresOfConstantsOrVecElts( + StoreNodes, MemVT, NumElem, true, UseVector, LastIntegerTrunc); + + // Remove merged stores for next iteration. + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); + NumConsecutiveStores -= NumElem; + } + return MadeChange; +} + bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { if (OptLevel == CodeGenOpt::None || !EnableStoreMerging) return false; @@ -16352,120 +16481,8 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // We have at least 2 consecutive stores. Try to merge them. assert(NumConsecutiveStores >= 2 && "Expected at least 2 stores"); if (StoreSrc == StoreSource::Constant) { - // Store the constants into memory as one consecutive store. 
- while (NumConsecutiveStores >= 2) { - LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; - unsigned FirstStoreAS = FirstInChain->getAddressSpace(); - unsigned FirstStoreAlign = FirstInChain->getAlignment(); - unsigned LastLegalType = 1; - unsigned LastLegalVectorType = 1; - bool LastIntegerTrunc = false; - bool NonZero = false; - unsigned FirstZeroAfterNonZero = NumConsecutiveStores; - for (unsigned i = 0; i < NumConsecutiveStores; ++i) { - StoreSDNode *ST = cast(StoreNodes[i].MemNode); - SDValue StoredVal = ST->getValue(); - bool IsElementZero = false; - if (ConstantSDNode *C = dyn_cast(StoredVal)) - IsElementZero = C->isNullValue(); - else if (ConstantFPSDNode *C = dyn_cast(StoredVal)) - IsElementZero = C->getConstantFPValue()->isNullValue(); - if (IsElementZero) { - if (NonZero && FirstZeroAfterNonZero == NumConsecutiveStores) - FirstZeroAfterNonZero = i; - } - NonZero |= !IsElementZero; - - // Find a legal type for the constant store. - unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8; - EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits); - bool IsFast = false; - - // Break early when size is too large to be legal. - if (StoreTy.getSizeInBits() > MaximumLegalStoreInBits) - break; - - if (TLI.isTypeLegal(StoreTy) && - TLI.canMergeStoresTo(FirstStoreAS, StoreTy, DAG) && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstInChain->getMemOperand(), &IsFast) && - IsFast) { - LastIntegerTrunc = false; - LastLegalType = i + 1; - // Or check whether a truncstore is legal. 
- } else if (TLI.getTypeAction(Context, StoreTy) == - TargetLowering::TypePromoteInteger) { - EVT LegalizedStoredValTy = - TLI.getTypeToTransformTo(Context, StoredVal.getValueType()); - if (TLI.isTruncStoreLegal(LegalizedStoredValTy, StoreTy) && - TLI.canMergeStoresTo(FirstStoreAS, LegalizedStoredValTy, DAG) && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstInChain->getMemOperand(), - &IsFast) && - IsFast) { - LastIntegerTrunc = true; - LastLegalType = i + 1; - } - } - - // We only use vectors if the constant is known to be zero or the - // target allows it and the function is not marked with the - // noimplicitfloat attribute. - if ((!NonZero || - TLI.storeOfVectorConstantIsCheap(MemVT, i + 1, FirstStoreAS)) && - AllowVectors) { - // Find a legal type for the vector store. - unsigned Elts = (i + 1) * NumMemElts; - EVT Ty = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts); - if (TLI.isTypeLegal(Ty) && TLI.isTypeLegal(MemVT) && - TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) && - TLI.allowsMemoryAccess( - Context, DL, Ty, *FirstInChain->getMemOperand(), &IsFast) && - IsFast) - LastLegalVectorType = i + 1; - } - } - - bool UseVector = (LastLegalVectorType > LastLegalType) && AllowVectors; - unsigned NumElem = (UseVector) ? LastLegalVectorType : LastLegalType; - - // Check if we found a legal integer type that creates a meaningful - // merge. - if (NumElem < 2) { - // We know that candidate stores are in order and of correct - // shape. While there is no mergeable sequence from the - // beginning one may start later in the sequence. The only - // reason a merge of size N could have failed where another of - // the same size would not have, is if the alignment has - // improved or we've dropped a non-zero value. Drop as many - // candidates as we can here. 
- unsigned NumSkip = 1; - while ( - (NumSkip < NumConsecutiveStores) && - (NumSkip < FirstZeroAfterNonZero) && - (StoreNodes[NumSkip].MemNode->getAlignment() <= FirstStoreAlign)) - NumSkip++; - - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumSkip); - NumConsecutiveStores -= NumSkip; - continue; - } - - // Check that we can merge these candidates without causing a cycle. - if (!checkMergeStoreCandidatesForDependencies(StoreNodes, NumElem, - RootNode)) { - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); - NumConsecutiveStores -= NumElem; - continue; - } - - MadeChange |= mergeStoresOfConstantsOrVecElts( - StoreNodes, MemVT, NumElem, true, UseVector, LastIntegerTrunc); - - // Remove merged stores for next iteration. - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); - NumConsecutiveStores -= NumElem; - } + MadeChange |= tryStoreMergeOfConstants(StoreNodes, NumConsecutiveStores, + MemVT, AllowVectors, RootNode); continue; } From llvm-commits at lists.llvm.org Thu Jul 9 14:26:18 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Thu, 09 Jul 2020 14:26:18 -0700 (PDT) Subject: [llvm] f98a602 - [DAGCombiner] add helper function for store merging of extracts; NFC Message-ID: <5f078b7a.1c69fb81.ffc83.8e34@mx.google.com> Author: Sanjay Patel Date: 2020-07-09T17:20:03-04:00 New Revision: f98a602c2e37648f637aac15cba7cbe86906e720 URL: https://github.com/llvm/llvm-project/commit/f98a602c2e37648f637aac15cba7cbe86906e720 DIFF: https://github.com/llvm/llvm-project/commit/f98a602c2e37648f637aac15cba7cbe86906e720.diff LOG: [DAGCombiner] add helper function for store merging of extracts; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 3e0f05a3738e..406af4f18549 100644 --- 
a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -719,6 +719,14 @@ namespace { unsigned NumConsecutiveStores, EVT MemVT, bool AllowVectors, SDNode *Root); + /// This is a helper function for mergeConsecutiveStores. It is used for + /// store chains that are composed entirely of extracted vector elements. + /// When extracting multiple vector elements, try to store them in one + /// vector store rather than a sequence of scalar stores. + bool tryStoreMergeOfExtracts(SmallVectorImpl &StoreNodes, + unsigned NumConsecutiveStores, EVT MemVT, + SDNode *Root); + /// Merge consecutive store operations into a wide store. /// This optimization uses wide integers or vectors when possible. /// \return true if stores were merged. @@ -16297,6 +16305,7 @@ bool DAGCombiner::tryStoreMergeOfConstants( int64_t ElementSizeBytes = MemVT.getStoreSize(); unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1; bool MadeChange = false; + // Store the constants into memory as one consecutive store. while (NumConsecutiveStores >= 2) { LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; @@ -16412,6 +16421,74 @@ bool DAGCombiner::tryStoreMergeOfConstants( return MadeChange; } +bool DAGCombiner::tryStoreMergeOfExtracts( + SmallVectorImpl &StoreNodes, unsigned NumConsecutiveStores, + EVT MemVT, SDNode *RootNode) { + LLVMContext &Context = *DAG.getContext(); + const DataLayout &DL = DAG.getDataLayout(); + unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1; + bool MadeChange = false; + + // Loop on Consecutive Stores on success. + while (NumConsecutiveStores >= 2) { + LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; + unsigned FirstStoreAS = FirstInChain->getAddressSpace(); + unsigned FirstStoreAlign = FirstInChain->getAlignment(); + unsigned NumStoresToMerge = 1; + for (unsigned i = 0; i < NumConsecutiveStores; ++i) { + // Find a legal type for the vector store. 
+ unsigned Elts = (i + 1) * NumMemElts; + EVT Ty = EVT::getVectorVT(*DAG.getContext(), MemVT.getScalarType(), Elts); + bool IsFast = false; + + // Break early when size is too large to be legal. + if (Ty.getSizeInBits() > MaximumLegalStoreInBits) + break; + + if (TLI.isTypeLegal(Ty) && TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) && + TLI.allowsMemoryAccess(Context, DL, Ty, + *FirstInChain->getMemOperand(), &IsFast) && + IsFast) + NumStoresToMerge = i + 1; + } + + // Check if we found a legal integer type creating a meaningful + // merge. + if (NumStoresToMerge < 2) { + // We know that candidate stores are in order and of correct + // shape. While there is no mergeable sequence from the + // beginning one may start later in the sequence. The only + // reason a merge of size N could have failed where another of + // the same size would not have, is if the alignment has + // improved. Drop as many candidates as we can here. + unsigned NumSkip = 1; + while ((NumSkip < NumConsecutiveStores) && + (StoreNodes[NumSkip].MemNode->getAlignment() <= FirstStoreAlign)) + NumSkip++; + + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumSkip); + NumConsecutiveStores -= NumSkip; + continue; + } + + // Check that we can merge these candidates without causing a cycle. 
+ if (!checkMergeStoreCandidatesForDependencies(StoreNodes, NumStoresToMerge, + RootNode)) { + StoreNodes.erase(StoreNodes.begin(), + StoreNodes.begin() + NumStoresToMerge); + NumConsecutiveStores -= NumStoresToMerge; + continue; + } + + MadeChange |= mergeStoresOfConstantsOrVecElts( + StoreNodes, MemVT, NumStoresToMerge, false, true, false); + + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumStoresToMerge); + NumConsecutiveStores -= NumStoresToMerge; + } + return MadeChange; +} + bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { if (OptLevel == CodeGenOpt::None || !EnableStoreMerging) return false; @@ -16485,71 +16562,9 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { MemVT, AllowVectors, RootNode); continue; } - - // When extracting multiple vector elements, try to store them - // in one vector store rather than a sequence of scalar stores. if (StoreSrc == StoreSource::Extract) { - // Loop on Consecutive Stores on success. - while (NumConsecutiveStores >= 2) { - LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; - unsigned FirstStoreAS = FirstInChain->getAddressSpace(); - unsigned FirstStoreAlign = FirstInChain->getAlignment(); - unsigned NumStoresToMerge = 1; - for (unsigned i = 0; i < NumConsecutiveStores; ++i) { - // Find a legal type for the vector store. - unsigned Elts = (i + 1) * NumMemElts; - EVT Ty = - EVT::getVectorVT(*DAG.getContext(), MemVT.getScalarType(), Elts); - bool IsFast = false; - - // Break early when size is too large to be legal. - if (Ty.getSizeInBits() > MaximumLegalStoreInBits) - break; - - if (TLI.isTypeLegal(Ty) && - TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) && - TLI.allowsMemoryAccess(Context, DL, Ty, - *FirstInChain->getMemOperand(), &IsFast) && - IsFast) - NumStoresToMerge = i + 1; - } - - // Check if we found a legal integer type creating a meaningful - // merge. - if (NumStoresToMerge < 2) { - // We know that candidate stores are in order and of correct - // shape. 
While there is no mergeable sequence from the - // beginning one may start later in the sequence. The only - // reason a merge of size N could have failed where another of - // the same size would not have, is if the alignment has - // improved. Drop as many candidates as we can here. - unsigned NumSkip = 1; - while ( - (NumSkip < NumConsecutiveStores) && - (StoreNodes[NumSkip].MemNode->getAlignment() <= FirstStoreAlign)) - NumSkip++; - - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumSkip); - NumConsecutiveStores -= NumSkip; - continue; - } - - // Check that we can merge these candidates without causing a cycle. - if (!checkMergeStoreCandidatesForDependencies( - StoreNodes, NumStoresToMerge, RootNode)) { - StoreNodes.erase(StoreNodes.begin(), - StoreNodes.begin() + NumStoresToMerge); - NumConsecutiveStores -= NumStoresToMerge; - continue; - } - - MadeChange |= mergeStoresOfConstantsOrVecElts( - StoreNodes, MemVT, NumStoresToMerge, false, true, false); - - StoreNodes.erase(StoreNodes.begin(), - StoreNodes.begin() + NumStoresToMerge); - NumConsecutiveStores -= NumStoresToMerge; - } + MadeChange |= tryStoreMergeOfExtracts(StoreNodes, NumConsecutiveStores, + MemVT, RootNode); continue; } From llvm-commits at lists.llvm.org Thu Jul 9 14:26:22 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Thu, 09 Jul 2020 14:26:22 -0700 (PDT) Subject: [llvm] b476e6a - [DAGCombiner] add helper function for store merging of loaded values; NFC Message-ID: <5f078b7e.1c69fb81.730ba.95de@mx.google.com> Author: Sanjay Patel Date: 2020-07-09T17:20:04-04:00 New Revision: b476e6a642d08aefaf391df026893078cd6ea9b2 URL: https://github.com/llvm/llvm-project/commit/b476e6a642d08aefaf391df026893078cd6ea9b2 DIFF: https://github.com/llvm/llvm-project/commit/b476e6a642d08aefaf391df026893078cd6ea9b2.diff LOG: [DAGCombiner] add helper function for store merging of loaded values; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: 
################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 406af4f18549..309125035d34 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -717,7 +717,7 @@ namespace { /// store chains that are composed entirely of constant values. bool tryStoreMergeOfConstants(SmallVectorImpl &StoreNodes, unsigned NumConsecutiveStores, - EVT MemVT, bool AllowVectors, SDNode *Root); + EVT MemVT, SDNode *Root, bool AllowVectors); /// This is a helper function for mergeConsecutiveStores. It is used for /// store chains that are composed entirely of extracted vector elements. @@ -727,6 +727,13 @@ namespace { unsigned NumConsecutiveStores, EVT MemVT, SDNode *Root); + /// This is a helper function for mergeConsecutiveStores. It is used for + /// store chains that are composed entirely of loaded values. + bool tryStoreMergeOfLoads(SmallVectorImpl &StoreNodes, + unsigned NumConsecutiveStores, EVT MemVT, + SDNode *Root, bool AllowVectors, + bool IsNonTemporalStore, bool IsNonTemporalLoad); + /// Merge consecutive store operations into a wide store. /// This optimization uses wide integers or vectors when possible. /// \return true if stores were merged. 
@@ -16299,7 +16306,7 @@ DAGCombiner::getConsecutiveStores(SmallVectorImpl &StoreNodes, bool DAGCombiner::tryStoreMergeOfConstants( SmallVectorImpl &StoreNodes, unsigned NumConsecutiveStores, - EVT MemVT, bool AllowVectors, SDNode *RootNode) { + EVT MemVT, SDNode *RootNode, bool AllowVectors) { LLVMContext &Context = *DAG.getContext(); const DataLayout &DL = DAG.getDataLayout(); int64_t ElementSizeBytes = MemVT.getStoreSize(); @@ -16489,6 +16496,261 @@ bool DAGCombiner::tryStoreMergeOfExtracts( return MadeChange; } +bool DAGCombiner::tryStoreMergeOfLoads(SmallVectorImpl &StoreNodes, + unsigned NumConsecutiveStores, EVT MemVT, + SDNode *RootNode, bool AllowVectors, + bool IsNonTemporalStore, + bool IsNonTemporalLoad) { + LLVMContext &Context = *DAG.getContext(); + const DataLayout &DL = DAG.getDataLayout(); + int64_t ElementSizeBytes = MemVT.getStoreSize(); + unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1; + bool MadeChange = false; + + int64_t StartAddress = StoreNodes[0].OffsetFromBase; + + // Look for load nodes which are used by the stored values. + SmallVector LoadNodes; + + // Find acceptable loads. Loads need to have the same chain (token factor), + // must not be zext, volatile, indexed, and they must be consecutive. + BaseIndexOffset LdBasePtr; + + for (unsigned i = 0; i < NumConsecutiveStores; ++i) { + StoreSDNode *St = cast(StoreNodes[i].MemNode); + SDValue Val = peekThroughBitcasts(St->getValue()); + LoadSDNode *Ld = cast(Val); + + BaseIndexOffset LdPtr = BaseIndexOffset::match(Ld, DAG); + // If this is not the first ptr that we check. + int64_t LdOffset = 0; + if (LdBasePtr.getBase().getNode()) { + // The base ptr must be the same. + if (!LdBasePtr.equalBaseIndex(LdPtr, DAG, LdOffset)) + break; + } else { + // Check that all other base pointers are the same as this one. + LdBasePtr = LdPtr; + } + + // We found a potential memory operand to merge. 
+ LoadNodes.push_back(MemOpLink(Ld, LdOffset)); + } + + while (NumConsecutiveStores >= 2 && LoadNodes.size() >= 2) { + // If we have load/store pair instructions and we only have two values, + // don't bother merging. + Align RequiredAlignment; + if (LoadNodes.size() == 2 && TLI.hasPairedLoad(MemVT, RequiredAlignment) && + StoreNodes[0].MemNode->getAlign() >= RequiredAlignment) { + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + 2); + LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + 2); + break; + } + LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; + unsigned FirstStoreAS = FirstInChain->getAddressSpace(); + unsigned FirstStoreAlign = FirstInChain->getAlignment(); + LoadSDNode *FirstLoad = cast(LoadNodes[0].MemNode); + unsigned FirstLoadAlign = FirstLoad->getAlignment(); + + // Scan the memory operations on the chain and find the first + // non-consecutive load memory address. These variables hold the index in + // the store node array. + + unsigned LastConsecutiveLoad = 1; + + // This variable refers to the size and not index in the array. + unsigned LastLegalVectorType = 1; + unsigned LastLegalIntegerType = 1; + bool isDereferenceable = true; + bool DoIntegerTruncate = false; + StartAddress = LoadNodes[0].OffsetFromBase; + SDValue FirstChain = FirstLoad->getChain(); + for (unsigned i = 1; i < LoadNodes.size(); ++i) { + // All loads must share the same chain. + if (LoadNodes[i].MemNode->getChain() != FirstChain) + break; + + int64_t CurrAddress = LoadNodes[i].OffsetFromBase; + if (CurrAddress - StartAddress != (ElementSizeBytes * i)) + break; + LastConsecutiveLoad = i; + + if (isDereferenceable && !LoadNodes[i].MemNode->isDereferenceable()) + isDereferenceable = false; + + // Find a legal type for the vector store. + unsigned Elts = (i + 1) * NumMemElts; + EVT StoreTy = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts); + + // Break early when size is too large to be legal. 
+ if (StoreTy.getSizeInBits() > MaximumLegalStoreInBits) + break; + + bool IsFastSt = false; + bool IsFastLd = false; + if (TLI.isTypeLegal(StoreTy) && + TLI.canMergeStoresTo(FirstStoreAS, StoreTy, DAG) && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstInChain->getMemOperand(), &IsFastSt) && + IsFastSt && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstLoad->getMemOperand(), &IsFastLd) && + IsFastLd) { + LastLegalVectorType = i + 1; + } + + // Find a legal type for the integer store. + unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8; + StoreTy = EVT::getIntegerVT(Context, SizeInBits); + if (TLI.isTypeLegal(StoreTy) && + TLI.canMergeStoresTo(FirstStoreAS, StoreTy, DAG) && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstInChain->getMemOperand(), &IsFastSt) && + IsFastSt && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstLoad->getMemOperand(), &IsFastLd) && + IsFastLd) { + LastLegalIntegerType = i + 1; + DoIntegerTruncate = false; + // Or check whether a truncstore and extload is legal. + } else if (TLI.getTypeAction(Context, StoreTy) == + TargetLowering::TypePromoteInteger) { + EVT LegalizedStoredValTy = TLI.getTypeToTransformTo(Context, StoreTy); + if (TLI.isTruncStoreLegal(LegalizedStoredValTy, StoreTy) && + TLI.canMergeStoresTo(FirstStoreAS, LegalizedStoredValTy, DAG) && + TLI.isLoadExtLegal(ISD::ZEXTLOAD, LegalizedStoredValTy, StoreTy) && + TLI.isLoadExtLegal(ISD::SEXTLOAD, LegalizedStoredValTy, StoreTy) && + TLI.isLoadExtLegal(ISD::EXTLOAD, LegalizedStoredValTy, StoreTy) && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstInChain->getMemOperand(), &IsFastSt) && + IsFastSt && + TLI.allowsMemoryAccess(Context, DL, StoreTy, + *FirstLoad->getMemOperand(), &IsFastLd) && + IsFastLd) { + LastLegalIntegerType = i + 1; + DoIntegerTruncate = true; + } + } + } + + // Only use vector types if the vector type is larger than the integer + // type. If they are the same, use integers. 
+ bool UseVectorTy = + LastLegalVectorType > LastLegalIntegerType && AllowVectors; + unsigned LastLegalType = + std::max(LastLegalVectorType, LastLegalIntegerType); + + // We add +1 here because the LastXXX variables refer to location while + // the NumElem refers to array/index size. + unsigned NumElem = std::min(NumConsecutiveStores, LastConsecutiveLoad + 1); + NumElem = std::min(LastLegalType, NumElem); + + if (NumElem < 2) { + // We know that candidate stores are in order and of correct + // shape. While there is no mergeable sequence from the + // beginning one may start later in the sequence. The only + // reason a merge of size N could have failed where another of + // the same size would not have is if the alignment or either + // the load or store has improved. Drop as many candidates as we + // can here. + unsigned NumSkip = 1; + while ((NumSkip < LoadNodes.size()) && + (LoadNodes[NumSkip].MemNode->getAlignment() <= FirstLoadAlign) && + (StoreNodes[NumSkip].MemNode->getAlignment() <= FirstStoreAlign)) + NumSkip++; + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumSkip); + LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + NumSkip); + NumConsecutiveStores -= NumSkip; + continue; + } + + // Check that we can merge these candidates without causing a cycle. + if (!checkMergeStoreCandidatesForDependencies(StoreNodes, NumElem, + RootNode)) { + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); + LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + NumElem); + NumConsecutiveStores -= NumElem; + continue; + } + + // Find if it is better to use vectors or integers to load and store + // to memory. + EVT JointMemOpVT; + if (UseVectorTy) { + // Find a legal type for the vector store. 
+ unsigned Elts = NumElem * NumMemElts; + JointMemOpVT = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts); + } else { + unsigned SizeInBits = NumElem * ElementSizeBytes * 8; + JointMemOpVT = EVT::getIntegerVT(Context, SizeInBits); + } + + SDLoc LoadDL(LoadNodes[0].MemNode); + SDLoc StoreDL(StoreNodes[0].MemNode); + + // The merged loads are required to have the same incoming chain, so + // using the first's chain is acceptable. + + SDValue NewStoreChain = getMergeStoreChains(StoreNodes, NumElem); + AddToWorklist(NewStoreChain.getNode()); + + MachineMemOperand::Flags LdMMOFlags = + isDereferenceable ? MachineMemOperand::MODereferenceable + : MachineMemOperand::MONone; + if (IsNonTemporalLoad) + LdMMOFlags |= MachineMemOperand::MONonTemporal; + + MachineMemOperand::Flags StMMOFlags = IsNonTemporalStore + ? MachineMemOperand::MONonTemporal + : MachineMemOperand::MONone; + + SDValue NewLoad, NewStore; + if (UseVectorTy || !DoIntegerTruncate) { + NewLoad = DAG.getLoad( + JointMemOpVT, LoadDL, FirstLoad->getChain(), FirstLoad->getBasePtr(), + FirstLoad->getPointerInfo(), FirstLoadAlign, LdMMOFlags); + NewStore = DAG.getStore( + NewStoreChain, StoreDL, NewLoad, FirstInChain->getBasePtr(), + FirstInChain->getPointerInfo(), FirstStoreAlign, StMMOFlags); + } else { // This must be the truncstore/extload case + EVT ExtendedTy = + TLI.getTypeToTransformTo(*DAG.getContext(), JointMemOpVT); + NewLoad = DAG.getExtLoad(ISD::EXTLOAD, LoadDL, ExtendedTy, + FirstLoad->getChain(), FirstLoad->getBasePtr(), + FirstLoad->getPointerInfo(), JointMemOpVT, + FirstLoadAlign, LdMMOFlags); + NewStore = DAG.getTruncStore(NewStoreChain, StoreDL, NewLoad, + FirstInChain->getBasePtr(), + FirstInChain->getPointerInfo(), JointMemOpVT, + FirstInChain->getAlignment(), + FirstInChain->getMemOperand()->getFlags()); + } + + // Transfer chain users from old loads to the new load. 
+ for (unsigned i = 0; i < NumElem; ++i) { + LoadSDNode *Ld = cast(LoadNodes[i].MemNode); + DAG.ReplaceAllUsesOfValueWith(SDValue(Ld, 1), + SDValue(NewLoad.getNode(), 1)); + } + + // Replace all stores with the new store. Recursively remove corresponding + // values if they are no longer used. + for (unsigned i = 0; i < NumElem; ++i) { + SDValue Val = StoreNodes[i].MemNode->getOperand(1); + CombineTo(StoreNodes[i].MemNode, NewStore); + if (Val.getNode()->use_empty()) + recursivelyDeleteUnusedNodes(Val.getNode()); + } + + MadeChange = true; + StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); + LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + NumElem); + NumConsecutiveStores -= NumElem; + } + return MadeChange; +} + bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { if (OptLevel == CodeGenOpt::None || !EnableStoreMerging) return false; @@ -16530,15 +16792,11 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { return LHS.OffsetFromBase < RHS.OffsetFromBase; }); - unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1; bool AllowVectors = !DAG.getMachineFunction().getFunction().hasFnAttribute( Attribute::NoImplicitFloat); - bool IsNonTemporalStore = St->isNonTemporal(); bool IsNonTemporalLoad = StoreSrc == StoreSource::Load && cast(StoredVal)->isNonTemporal(); - LLVMContext &Context = *DAG.getContext(); - const DataLayout &DL = DAG.getDataLayout(); // Store Merge attempts to merge the lowest stores. 
This generally // works out as if successful, as the remaining stores are checked @@ -16559,7 +16817,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { assert(NumConsecutiveStores >= 2 && "Expected at least 2 stores"); if (StoreSrc == StoreSource::Constant) { MadeChange |= tryStoreMergeOfConstants(StoreNodes, NumConsecutiveStores, - MemVT, AllowVectors, RootNode); + MemVT, RootNode, AllowVectors); continue; } if (StoreSrc == StoreSource::Extract) { @@ -16567,257 +16825,12 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { MemVT, RootNode); continue; } - - // Below we handle the case of multiple consecutive stores that - // come from multiple consecutive loads. We merge them into a single - // wide load and a single wide store. - assert(StoreSrc == StoreSource::Load && "Expected load source for store"); - int64_t StartAddress = StoreNodes[0].OffsetFromBase; - - // Look for load nodes which are used by the stored values. - SmallVector LoadNodes; - - // Find acceptable loads. Loads need to have the same chain (token factor), - // must not be zext, volatile, indexed, and they must be consecutive. - BaseIndexOffset LdBasePtr; - - for (unsigned i = 0; i < NumConsecutiveStores; ++i) { - StoreSDNode *St = cast(StoreNodes[i].MemNode); - SDValue Val = peekThroughBitcasts(St->getValue()); - LoadSDNode *Ld = cast(Val); - - BaseIndexOffset LdPtr = BaseIndexOffset::match(Ld, DAG); - // If this is not the first ptr that we check. - int64_t LdOffset = 0; - if (LdBasePtr.getBase().getNode()) { - // The base ptr must be the same. - if (!LdBasePtr.equalBaseIndex(LdPtr, DAG, LdOffset)) - break; - } else { - // Check that all other base pointers are the same as this one. - LdBasePtr = LdPtr; - } - - // We found a potential memory operand to merge. 
- LoadNodes.push_back(MemOpLink(Ld, LdOffset)); - } - - while (NumConsecutiveStores >= 2 && LoadNodes.size() >= 2) { - // If we have load/store pair instructions and we only have two values, - // don't bother merging. - Align RequiredAlignment; - if (LoadNodes.size() == 2 && - TLI.hasPairedLoad(MemVT, RequiredAlignment) && - StoreNodes[0].MemNode->getAlign() >= RequiredAlignment) { - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + 2); - LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + 2); - break; - } - LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; - unsigned FirstStoreAS = FirstInChain->getAddressSpace(); - unsigned FirstStoreAlign = FirstInChain->getAlignment(); - LoadSDNode *FirstLoad = cast(LoadNodes[0].MemNode); - unsigned FirstLoadAlign = FirstLoad->getAlignment(); - - // Scan the memory operations on the chain and find the first - // non-consecutive load memory address. These variables hold the index in - // the store node array. - - unsigned LastConsecutiveLoad = 1; - - // This variable refers to the size and not index in the array. - unsigned LastLegalVectorType = 1; - unsigned LastLegalIntegerType = 1; - bool isDereferenceable = true; - bool DoIntegerTruncate = false; - StartAddress = LoadNodes[0].OffsetFromBase; - SDValue FirstChain = FirstLoad->getChain(); - for (unsigned i = 1; i < LoadNodes.size(); ++i) { - // All loads must share the same chain. - if (LoadNodes[i].MemNode->getChain() != FirstChain) - break; - - int64_t CurrAddress = LoadNodes[i].OffsetFromBase; - if (CurrAddress - StartAddress != (ElementSizeBytes * i)) - break; - LastConsecutiveLoad = i; - - if (isDereferenceable && !LoadNodes[i].MemNode->isDereferenceable()) - isDereferenceable = false; - - // Find a legal type for the vector store. - unsigned Elts = (i + 1) * NumMemElts; - EVT StoreTy = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts); - - // Break early when size is too large to be legal. 
- if (StoreTy.getSizeInBits() > MaximumLegalStoreInBits) - break; - - bool IsFastSt = false; - bool IsFastLd = false; - if (TLI.isTypeLegal(StoreTy) && - TLI.canMergeStoresTo(FirstStoreAS, StoreTy, DAG) && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstInChain->getMemOperand(), &IsFastSt) && - IsFastSt && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstLoad->getMemOperand(), &IsFastLd) && - IsFastLd) { - LastLegalVectorType = i + 1; - } - - // Find a legal type for the integer store. - unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8; - StoreTy = EVT::getIntegerVT(Context, SizeInBits); - if (TLI.isTypeLegal(StoreTy) && - TLI.canMergeStoresTo(FirstStoreAS, StoreTy, DAG) && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstInChain->getMemOperand(), &IsFastSt) && - IsFastSt && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstLoad->getMemOperand(), &IsFastLd) && - IsFastLd) { - LastLegalIntegerType = i + 1; - DoIntegerTruncate = false; - // Or check whether a truncstore and extload is legal. - } else if (TLI.getTypeAction(Context, StoreTy) == - TargetLowering::TypePromoteInteger) { - EVT LegalizedStoredValTy = TLI.getTypeToTransformTo(Context, StoreTy); - if (TLI.isTruncStoreLegal(LegalizedStoredValTy, StoreTy) && - TLI.canMergeStoresTo(FirstStoreAS, LegalizedStoredValTy, DAG) && - TLI.isLoadExtLegal(ISD::ZEXTLOAD, LegalizedStoredValTy, - StoreTy) && - TLI.isLoadExtLegal(ISD::SEXTLOAD, LegalizedStoredValTy, - StoreTy) && - TLI.isLoadExtLegal(ISD::EXTLOAD, LegalizedStoredValTy, StoreTy) && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstInChain->getMemOperand(), - &IsFastSt) && - IsFastSt && - TLI.allowsMemoryAccess(Context, DL, StoreTy, - *FirstLoad->getMemOperand(), &IsFastLd) && - IsFastLd) { - LastLegalIntegerType = i + 1; - DoIntegerTruncate = true; - } - } - } - - // Only use vector types if the vector type is larger than the integer - // type. If they are the same, use integers. 
- bool UseVectorTy = - LastLegalVectorType > LastLegalIntegerType && AllowVectors; - unsigned LastLegalType = - std::max(LastLegalVectorType, LastLegalIntegerType); - - // We add +1 here because the LastXXX variables refer to location while - // the NumElem refers to array/index size. - unsigned NumElem = - std::min(NumConsecutiveStores, LastConsecutiveLoad + 1); - NumElem = std::min(LastLegalType, NumElem); - - if (NumElem < 2) { - // We know that candidate stores are in order and of correct - // shape. While there is no mergeable sequence from the - // beginning one may start later in the sequence. The only - // reason a merge of size N could have failed where another of - // the same size would not have is if the alignment or either - // the load or store has improved. Drop as many candidates as we - // can here. - unsigned NumSkip = 1; - while ((NumSkip < LoadNodes.size()) && - (LoadNodes[NumSkip].MemNode->getAlignment() <= FirstLoadAlign) && - (StoreNodes[NumSkip].MemNode->getAlignment() <= FirstStoreAlign)) - NumSkip++; - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumSkip); - LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + NumSkip); - NumConsecutiveStores -= NumSkip; - continue; - } - - // Check that we can merge these candidates without causing a cycle. - if (!checkMergeStoreCandidatesForDependencies(StoreNodes, NumElem, - RootNode)) { - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); - LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + NumElem); - NumConsecutiveStores -= NumElem; - continue; - } - - // Find if it is better to use vectors or integers to load and store - // to memory. - EVT JointMemOpVT; - if (UseVectorTy) { - // Find a legal type for the vector store. 
- unsigned Elts = NumElem * NumMemElts; - JointMemOpVT = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts); - } else { - unsigned SizeInBits = NumElem * ElementSizeBytes * 8; - JointMemOpVT = EVT::getIntegerVT(Context, SizeInBits); - } - - SDLoc LoadDL(LoadNodes[0].MemNode); - SDLoc StoreDL(StoreNodes[0].MemNode); - - // The merged loads are required to have the same incoming chain, so - // using the first's chain is acceptable. - - SDValue NewStoreChain = getMergeStoreChains(StoreNodes, NumElem); - AddToWorklist(NewStoreChain.getNode()); - - MachineMemOperand::Flags LdMMOFlags = - isDereferenceable ? MachineMemOperand::MODereferenceable - : MachineMemOperand::MONone; - if (IsNonTemporalLoad) - LdMMOFlags |= MachineMemOperand::MONonTemporal; - - MachineMemOperand::Flags StMMOFlags = - IsNonTemporalStore ? MachineMemOperand::MONonTemporal - : MachineMemOperand::MONone; - - SDValue NewLoad, NewStore; - if (UseVectorTy || !DoIntegerTruncate) { - NewLoad = - DAG.getLoad(JointMemOpVT, LoadDL, FirstLoad->getChain(), - FirstLoad->getBasePtr(), FirstLoad->getPointerInfo(), - FirstLoadAlign, LdMMOFlags); - NewStore = DAG.getStore( - NewStoreChain, StoreDL, NewLoad, FirstInChain->getBasePtr(), - FirstInChain->getPointerInfo(), FirstStoreAlign, StMMOFlags); - } else { // This must be the truncstore/extload case - EVT ExtendedTy = - TLI.getTypeToTransformTo(*DAG.getContext(), JointMemOpVT); - NewLoad = DAG.getExtLoad(ISD::EXTLOAD, LoadDL, ExtendedTy, - FirstLoad->getChain(), FirstLoad->getBasePtr(), - FirstLoad->getPointerInfo(), JointMemOpVT, - FirstLoadAlign, LdMMOFlags); - NewStore = DAG.getTruncStore(NewStoreChain, StoreDL, NewLoad, - FirstInChain->getBasePtr(), - FirstInChain->getPointerInfo(), - JointMemOpVT, FirstInChain->getAlignment(), - FirstInChain->getMemOperand()->getFlags()); - } - - // Transfer chain users from old loads to the new load. 
- for (unsigned i = 0; i < NumElem; ++i) { - LoadSDNode *Ld = cast(LoadNodes[i].MemNode); - DAG.ReplaceAllUsesOfValueWith(SDValue(Ld, 1), - SDValue(NewLoad.getNode(), 1)); - } - - // Replace all stores with the new store. Recursively remove corresponding - // values if they are no longer used. - for (unsigned i = 0; i < NumElem; ++i) { - SDValue Val = StoreNodes[i].MemNode->getOperand(1); - CombineTo(StoreNodes[i].MemNode, NewStore); - if (Val.getNode()->use_empty()) - recursivelyDeleteUnusedNodes(Val.getNode()); - } - - MadeChange = true; - StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem); - LoadNodes.erase(LoadNodes.begin(), LoadNodes.begin() + NumElem); - NumConsecutiveStores -= NumElem; + if (StoreSrc == StoreSource::Load) { + MadeChange |= tryStoreMergeOfLoads(StoreNodes, NumConsecutiveStores, + MemVT, RootNode, AllowVectors, + IsNonTemporalStore, + IsNonTemporalLoad); + continue; } } return MadeChange; From llvm-commits at lists.llvm.org Thu Jul 9 14:26:25 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Thu, 09 Jul 2020 14:26:25 -0700 (PDT) Subject: [llvm] a46cf40 - [DAGCombiner] convert if-chain in store merging to switch; NFC Message-ID: <5f078b81.1c69fb81.9fdcc.9255@mx.google.com> Author: Sanjay Patel Date: 2020-07-09T17:20:04-04:00 New Revision: a46cf40240adb3f3171f08705e65d7300b2719cb URL: https://github.com/llvm/llvm-project/commit/a46cf40240adb3f3171f08705e65d7300b2719cb DIFF: https://github.com/llvm/llvm-project/commit/a46cf40240adb3f3171f08705e65d7300b2719cb.diff LOG: [DAGCombiner] convert if-chain in store merging to switch; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 309125035d34..effd5d6ab7d8 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -16773,7 +16773,7 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // Do not bother looking at stored values that are not constants, loads, or // extracted vector elements. SDValue StoredVal = peekThroughBitcasts(St->getValue()); - StoreSource StoreSrc = getStoreSource(StoredVal); + const StoreSource StoreSrc = getStoreSource(StoredVal); if (StoreSrc == StoreSource::Unknown) return false; @@ -16815,22 +16815,25 @@ bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) { // We have at least 2 consecutive stores. Try to merge them. assert(NumConsecutiveStores >= 2 && "Expected at least 2 stores"); - if (StoreSrc == StoreSource::Constant) { + switch (StoreSrc) { + case StoreSource::Constant: MadeChange |= tryStoreMergeOfConstants(StoreNodes, NumConsecutiveStores, MemVT, RootNode, AllowVectors); - continue; - } - if (StoreSrc == StoreSource::Extract) { + break; + + case StoreSource::Extract: MadeChange |= tryStoreMergeOfExtracts(StoreNodes, NumConsecutiveStores, MemVT, RootNode); - continue; - } - if (StoreSrc == StoreSource::Load) { + break; + + case StoreSource::Load: MadeChange |= tryStoreMergeOfLoads(StoreNodes, NumConsecutiveStores, MemVT, RootNode, AllowVectors, - IsNonTemporalStore, - IsNonTemporalLoad); - continue; + IsNonTemporalStore, IsNonTemporalLoad); + break; + + default: + llvm_unreachable("Unhandled store source type"); } } return MadeChange; From llvm-commits at lists.llvm.org Thu Jul 9 14:27:38 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:27:38 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: <211fcc3f57a8adb2d8ee876f8ced9118@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1705 + // Use signed division by a power of two to truncate towards 0. 
+ int64_t D = 1LL << (NumBits - 1); + RemainderOffset = (static_cast<int64_t>(COffsetVal) / D) * D; ---------------- foad wrote: > arsenm wrote: > > foad wrote: > > > arsenm wrote: > > > > arsenm wrote: > > > > > foad wrote: > > > > > > arsenm wrote: > > > > > > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > > > > > > The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing?? > > > > > Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions > > > > Actually pre-gfx9 also didn't have flat offsets. However gfx10 does have a bug with flat offsets, so I think it would still be correct to model this correctly. The instruction patterns do accept either (and global instructions are only preferred through pattern priority) > > > > This limitation also only needs to be applied if AS == FLAT_ADDRESS > > > > > > I still don't get this. Surely if we're using a FLAT instruction, even if we know which specific address space the programmer is trying to access, we still have to avoid setting vaddr to an address that might point into the wrong aperture. > > My understanding was the aperture only means anything for private or local. If it's a global address, it's neither aperture and behaves as a normal instruction (i.e. there's no aperture for global pointers) > But in that case, you should still avoid making drastic changes to vaddr in case it ends up accidentally pointing *into* one of the apertures, when you wanted a global access. E.g. if you're accessing a global that happens to be just past the end of the private or local aperture. I thought Nicolai mentioned this might not be possible?
I guess I don't understand why the aperture is so complicated and not just a couple of flags in the high, unused bits Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Thu Jul 9 14:31:42 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via llvm-commits) Date: Thu, 09 Jul 2020 14:31:42 -0700 (PDT) Subject: [llvm] 77f8f81 - [AMDGPU] Return restricted number of regs from TTI Message-ID: <5f078cbe.1c69fb81.3c949.97ff@mx.google.com> Author: Stanislav Mekhanoshin Date: 2020-07-09T14:31:28-07:00 New Revision: 77f8f813a9ae20152129a8ebb9fea5fcec859194 URL: https://github.com/llvm/llvm-project/commit/77f8f813a9ae20152129a8ebb9fea5fcec859194 DIFF: https://github.com/llvm/llvm-project/commit/77f8f813a9ae20152129a8ebb9fea5fcec859194.diff LOG: [AMDGPU] Return restricted number of regs from TTI This is practically NFC at the moment because nothing really asks the real number or does anything useful with it. Differential Revision: https://reviews.llvm.org/D82202 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp index 24f079ffe929..8783427b5002 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -239,7 +239,7 @@ void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, unsigned GCNTTIImpl::getHardwareNumberOfRegisters(bool Vec) const { // The concept of vector registers doesn't really exist. Some packed vector // operations operate on the normal 32-bit registers. 
- return 256; + return MaxVGPRs; } unsigned GCNTTIImpl::getNumberOfRegisters(bool Vec) const { @@ -248,6 +248,13 @@ unsigned GCNTTIImpl::getNumberOfRegisters(bool Vec) const { return getHardwareNumberOfRegisters(Vec) >> 3; } +unsigned GCNTTIImpl::getNumberOfRegisters(unsigned RCID) const { + const SIRegisterInfo *TRI = ST->getRegisterInfo(); + const TargetRegisterClass *RC = TRI->getRegClass(RCID); + unsigned NumVGPRs = (TRI->getRegSizeInBits(*RC) + 31) / 32; + return getHardwareNumberOfRegisters(false) / NumVGPRs; +} + unsigned GCNTTIImpl::getRegisterBitWidth(bool Vector) const { return 32; } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h index 508ed061e935..b8a027c79bfc 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h @@ -74,6 +74,7 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> { AMDGPUTTIImpl CommonTTI; bool IsGraphicsShader; bool HasFP32Denormals; + unsigned MaxVGPRs; const FeatureBitset InlineFeatureIgnoreList = { // Codegen control options which don't matter.
@@ -133,7 +134,11 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> { TLI(ST->getTargetLowering()), CommonTTI(TM, F), IsGraphicsShader(AMDGPU::isShader(F.getCallingConv())), - HasFP32Denormals(AMDGPU::SIModeRegisterDefaults(F).allFP32Denormals()) {} + HasFP32Denormals(AMDGPU::SIModeRegisterDefaults(F).allFP32Denormals()), + MaxVGPRs(ST->getMaxNumVGPRs( + std::max(ST->getWavesPerEU(F).first, + ST->getWavesPerEUForWorkGroup( + ST->getFlatWorkGroupSizes(F).second)))) {} bool hasBranchDivergence() { return true; } bool useGPUDivergenceAnalysis() const; @@ -148,6 +153,7 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> { unsigned getHardwareNumberOfRegisters(bool Vector) const; unsigned getNumberOfRegisters(bool Vector) const; + unsigned getNumberOfRegisters(unsigned RCID) const; unsigned getRegisterBitWidth(bool Vector) const; unsigned getMinVectorRegisterBitWidth() const; unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize, From llvm-commits at lists.llvm.org Thu Jul 9 14:35:07 2020 From: llvm-commits at lists.llvm.org (Tyker via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:35:07 +0000 (UTC) Subject: [PATCH] D83507: [AssumeBundles] Fix Bug in Assume Queries In-Reply-To: References: Message-ID: <60e58f10e4606b05f7f6e7d64a8bf30e@localhost.localdomain> Tyker added a comment. In D83507#2142627, @lebedev.ri wrote: > Test? I would also like to add a test for it, but the smallest reproducer is still very big (30k+ lines of IR) and depends on what is present in the AssumptionCache, so it's an -O3 run. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83507/new/ https://reviews.llvm.org/D83507 From llvm-commits at lists.llvm.org Thu Jul 9 14:40:32 2020 From: llvm-commits at lists.llvm.org (Vy Nguyen via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:40:32 +0000 (UTC) Subject: [PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements.
In-Reply-To: References: Message-ID: <75b2cf8cbb551d80a9f55a13422b6b0c@localhost.localdomain> oontvoo updated this revision to Diff 276842. oontvoo added a comment. Fixed warning [-Wmissing-field-initializers] in code Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77422/new/ https://reviews.llvm.org/D77422 Files: llvm/docs/CommandGuide/llvm-exegesis.rst llvm/test/tools/llvm-exegesis/X86/lbr/Inputs/mov_add.att llvm/test/tools/llvm-exegesis/X86/lbr/lit.local.cfg llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.cpp llvm/tools/llvm-exegesis/lib/PerfHelper.h llvm/tools/llvm-exegesis/lib/X86/CMakeLists.txt llvm/tools/llvm-exegesis/lib/X86/Target.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp llvm/tools/llvm-exegesis/lib/X86/X86Counter.h llvm/tools/llvm-exegesis/llvm-exegesis.cpp From llvm-commits at lists.llvm.org Thu Jul 9 14:41:34 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:41:34 +0000 (UTC) Subject: [PATCH] D83500: [PowerPC][Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. In-Reply-To: References: Message-ID: amyk marked 3 inline comments as done. amyk added inline comments. ================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:14273 + // The third argument to vec_replace_elt will be emitted to either + // the vinsw or vinsd instruction. It must be a compile time constant. + ConstantInt *ArgCI = dyn_cast<ConstantInt>(Ops[2]); ---------------- lei wrote: > Do you mean? > ``` > // The third argument of vec_replace_elt must be a compile time constant and will be emitted either > // to the vinsw or vinsd instruction. > ``` Yes.
Thank you - I will update the wording here and in the other builtin. ================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:14320 + Call = Builder.CreateCall(F, Ops); + } + return Call; ---------------- lei wrote: > What are the chances of reaching the end of this if/else-if section with `Call` still null? i.e. `getPrimitiveSizeInBits() != [32|64]` > I feel like it would be better if we can structure it so that we are not doing all this nesting of `if`s and just do returns within the different if-conditions. > > Have you tried to pull out the different handling of the 32/64-bit arg and consolidate the code a bit? Thanks - I realize that I should probably pull the `Call` out. I'll update this. I've actually consolidated the code quite a bit already, but I'll see if I can make any further improvements on this. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83500/new/ https://reviews.llvm.org/D83500 From llvm-commits at lists.llvm.org Thu Jul 9 14:42:06 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:42:06 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: arsenm updated this revision to Diff 276843. arsenm added a comment.
Remove v2, and also accept parsing CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 Files: llvm/docs/AMDGPUUsage.rst llvm/include/llvm/Support/AMDGPUMetadata.h llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp llvm/lib/Support/AMDGPUMetadata.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s From llvm-commits at lists.llvm.org Thu Jul 9 14:42:20 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:42:20 +0000 (UTC) Subject: [PATCH] D83500: [PowerPC][Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. In-Reply-To: References: Message-ID: <257f42a44fb80db7ce07976b9eb15032@localhost.localdomain> amyk updated this revision to Diff 276844. amyk added a comment.
Address review comments - update comments - pull out common code Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83500/new/ https://reviews.llvm.org/D83500 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/lib/CodeGen/CGBuiltin.cpp clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c -------------- next part -------------- A non-text attachment was scrubbed... Name: D83500.276844.patch Type: text/x-patch Size: 12137 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 14:47:10 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:47:10 +0000 (UTC) Subject: [PATCH] D83491: [flang] Fix a crash when creating generics from a copy In-Reply-To: References: Message-ID: <15ecc7e1e1377282632490505a765a42@localhost.localdomain> PeteSteinfeld marked an inline comment as done. PeteSteinfeld added inline comments. ================ Comment at: flang/lib/Semantics/symbol.cpp:211 + } + } } ---------------- tskeith wrote: > I think that `specificProcs_` and `bindingNames_` are supposed to be parallel vectors; at least that is the assumption in `CheckHelper::CheckGeneric`. So this should be written as a single loop that pushes onto the two lists at the same time. As it's written it looks like the two loops might push different numbers of elements on the two lists. > > One thing that suggests that the above assumption is wrong is the existence of this constructor: `GenericDetails(const SymbolVector &specificProcs);`. But I'm not sure it is ever used, so it would be good if you can delete it as part of this change. After doing some testing, I believe that you're correct that they're supposed to be parallel vectors. I'll put some calls to `CHECK()` into `CopyFrom()` to verify that their sizes match and copy them as pairs. You're also correct that the constructor `GenericDetails(const SymbolVector &specificProcs)` is not used. 
I'll delete it. Thanks for the guidance! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83491/new/ https://reviews.llvm.org/D83491 From llvm-commits at lists.llvm.org Thu Jul 9 14:51:57 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:51:57 +0000 (UTC) Subject: [PATCH] D83240: [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping In-Reply-To: References: Message-ID: <480f28005b71002b7e5d15de63542e1a@localhost.localdomain> arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. LGTM. Apparently the cases to apply the mapping are somehow already there CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83240/new/ https://reviews.llvm.org/D83240 From llvm-commits at lists.llvm.org Thu Jul 9 14:53:03 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:53:03 +0000 (UTC) Subject: [PATCH] D73853: [llvm-reduce] add ReduceAttribute delta pass In-Reply-To: References: Message-ID: <22f68bd873cc3cad295cbd67edb064b3@localhost.localdomain> lebedev.ri added a comment. Clean-sheet implementation D83351 landed in rG03640ee0fa73c6eaf8cb12050203027239136789 , and bots appear to be happy with it. Abandon this? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73853/new/ https://reviews.llvm.org/D73853 From llvm-commits at lists.llvm.org Thu Jul 9 14:53:14 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Thu, 09 Jul 2020 14:53:14 -0700 (PDT) Subject: [llvm] f40b113 - Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def." 
Message-ID: <5f0791ca.1c69fb81.32166.92a6@mx.google.com> Author: Craig Topper Date: 2020-07-09T14:52:16-07:00 New Revision: f40b11325e368667cf1dd91922d57dcef8069c8a URL: https://github.com/llvm/llvm-project/commit/f40b11325e368667cf1dd91922d57dcef8069c8a DIFF: https://github.com/llvm/llvm-project/commit/f40b11325e368667cf1dd91922d57dcef8069c8a.diff LOG: Recommit "[X86] Merge the FEATURE_64BIT and FEATURE_EM64T bits in X86TargetParser.def." This time without the change to make operator| use operator&=. That seems to be the source of the gcc 5.3 miscompile. Original commit message: These represent the same thing but 64BIT only showed up from getHostCPUFeatures providing a list of featuers to clang. While EM64T showed up from getting the features for a named CPU. EM64T didn't have a string specifically so it would not be passed up to clang when getting features for a named CPU. While 64bit needed a name since that's how it is index. Merge them by filtering 64bit out before sending features to clang for named CPUs. Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Support/X86TargetParser.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index 91feb146baaa..4b96c66b0e29 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -184,10 +184,6 @@ X86_FEATURE (CLWB, "clwb") X86_FEATURE (CLZERO, "clzero") X86_FEATURE (CMPXCHG16B, "cx16") X86_FEATURE (CMPXCHG8B, "cx8") -// FIXME: Merge with 64BIT? Currently separate to be used to tell if CPU is -// valid for 64-bit mode, but has empty string so it doesn't get added to -// target attributes in IR. 
-X86_FEATURE (EM64T, "") X86_FEATURE (ENQCMD, "enqcmd") X86_FEATURE (F16C, "f16c") X86_FEATURE (FSGSBASE, "fsgsbase") diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index 9f73f1ab1424..adfb599f55ff 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -868,7 +868,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; } - if (testFeature(X86::FEATURE_EM64T)) { + if (testFeature(X86::FEATURE_64BIT)) { *Type = X86::INTEL_CORE2; // "core2" *Subtype = X86::INTEL_CORE2_65; break; @@ -894,7 +894,7 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; case 15: { - if (testFeature(X86::FEATURE_EM64T)) { + if (testFeature(X86::FEATURE_64BIT)) { *Type = X86::INTEL_NOCONA; break; } @@ -1140,7 +1140,7 @@ static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, setFeature(X86::FEATURE_FMA4); if (HasExtLeaf1 && ((EDX >> 29) & 1)) - setFeature(X86::FEATURE_EM64T); + setFeature(X86::FEATURE_64BIT); } StringRef sys::getHostCPUName() { diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp index 261e296b9e5a..572d1203aaf2 100644 --- a/llvm/lib/Support/X86TargetParser.cpp +++ b/llvm/lib/Support/X86TargetParser.cpp @@ -38,6 +38,7 @@ class FeatureBitset { } constexpr FeatureBitset &set(unsigned I) { + // GCC <6.2 crashes if this is written in a single statement. uint32_t NewBits = Bits[I / 32] | (uint32_t(1) << (I % 32)); Bits[I / 32] = NewBits; return *this; @@ -48,14 +49,25 @@ class FeatureBitset { return (Bits[I / 32] & Mask) != 0; } + constexpr FeatureBitset &operator&=(const FeatureBitset &RHS) { + for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { + // GCC <6.2 crashes if this is written in a single statement. 
+ uint32_t NewBits = Bits[I] & RHS.Bits[I]; + Bits[I] = NewBits; + } + return *this; + } + constexpr FeatureBitset &operator|=(const FeatureBitset &RHS) { for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) { + // GCC <6.2 crashes if this is written in a single statement. uint32_t NewBits = Bits[I] | RHS.Bits[I]; Bits[I] = NewBits; } return *this; } + // gcc 5.3 miscompiles this if we try to write this using operator&=. constexpr FeatureBitset operator&(const FeatureBitset &RHS) const { FeatureBitset Result; for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) @@ -63,6 +75,7 @@ class FeatureBitset { return Result; } + // gcc 5.3 miscompiles this if we try to write this using operator&=. constexpr FeatureBitset operator|(const FeatureBitset &RHS) const { FeatureBitset Result; for (unsigned I = 0, E = array_lengthof(Bits); I != E; ++I) @@ -111,10 +124,10 @@ static constexpr FeatureBitset FeaturesPentium4 = static constexpr FeatureBitset FeaturesPrescott = FeaturesPentium4 | FeatureSSE3; static constexpr FeatureBitset FeaturesNocona = - FeaturesPrescott | FeatureEM64T | FeatureCMPXCHG16B; + FeaturesPrescott | Feature64BIT | FeatureCMPXCHG16B; // Basic 64-bit capable CPU. 
-static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | FeatureEM64T; +static constexpr FeatureBitset FeaturesX86_64 = FeaturesPentium4 | Feature64BIT; // Intel Core CPUs static constexpr FeatureBitset FeaturesCore2 = @@ -201,7 +214,7 @@ static constexpr FeatureBitset FeaturesAthlon = static constexpr FeatureBitset FeaturesAthlonXP = FeaturesAthlon | FeatureFXSR | FeatureSSE; static constexpr FeatureBitset FeaturesK8 = - FeaturesAthlonXP | FeatureSSE2 | FeatureEM64T; + FeaturesAthlonXP | FeatureSSE2 | Feature64BIT; static constexpr FeatureBitset FeaturesK8SSE3 = FeaturesK8 | FeatureSSE3; static constexpr FeatureBitset FeaturesAMDFAM10 = FeaturesK8SSE3 | FeatureCMPXCHG16B | FeatureLZCNT | FeaturePOPCNT | @@ -209,7 +222,7 @@ static constexpr FeatureBitset FeaturesAMDFAM10 = // Bobcat architecture processors. static constexpr FeatureBitset FeaturesBTVER1 = - FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | + FeatureX87 | FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeaturePOPCNT | FeaturePRFCHW | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_A | FeatureSAHF; @@ -220,7 +233,7 @@ static constexpr FeatureBitset FeaturesBTVER2 = // AMD Bulldozer architecture processors. 
static constexpr FeatureBitset FeaturesBDVER1 = FeatureX87 | FeatureAES | FeatureAVX | FeatureCMPXCHG8B | - FeatureCMPXCHG16B | FeatureEM64T | FeatureFMA4 | FeatureFXSR | FeatureLWP | + FeatureCMPXCHG16B | Feature64BIT | FeatureFMA4 | FeatureFXSR | FeatureLWP | FeatureLZCNT | FeatureMMX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureSAHF | FeatureSSE | FeatureSSE2 | FeatureSSE3 | FeatureSSSE3 | FeatureSSE4_1 | FeatureSSE4_2 | FeatureSSE4_A | FeatureXOP | FeatureXSAVE; @@ -236,7 +249,7 @@ static constexpr FeatureBitset FeaturesBDVER4 = static constexpr FeatureBitset FeaturesZNVER1 = FeatureX87 | FeatureADX | FeatureAES | FeatureAVX | FeatureAVX2 | FeatureBMI | FeatureBMI2 | FeatureCLFLUSHOPT | FeatureCLZERO | - FeatureCMPXCHG8B | FeatureCMPXCHG16B | FeatureEM64T | FeatureF16C | + FeatureCMPXCHG8B | FeatureCMPXCHG16B | Feature64BIT | FeatureF16C | FeatureFMA | FeatureFSGSBASE | FeatureFXSR | FeatureLZCNT | FeatureMMX | FeatureMOVBE | FeatureMWAITX | FeaturePCLMUL | FeaturePOPCNT | FeaturePRFCHW | FeatureRDRND | FeatureRDSEED | FeatureSAHF | FeatureSHA | @@ -363,7 +376,7 @@ static constexpr ProcInfo Processors[] = { X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { for (const auto &P : Processors) - if (P.Name == CPU && (P.Features[FEATURE_EM64T] || !Only64Bit)) + if (P.Name == CPU && (P.Features[FEATURE_64BIT] || !Only64Bit)) return P.Kind; return CK_None; @@ -372,7 +385,7 @@ X86::CPUKind llvm::X86::parseArchX86(StringRef CPU, bool Only64Bit) { void llvm::X86::fillValidCPUArchList(SmallVectorImpl &Values, bool Only64Bit) { for (const auto &P : Processors) - if (!P.Name.empty() && (P.Features[FEATURE_EM64T] || !Only64Bit)) + if (!P.Name.empty() && (P.Features[FEATURE_64BIT] || !Only64Bit)) Values.emplace_back(P.Name); } @@ -401,7 +414,6 @@ static constexpr FeatureBitset ImpliedFeaturesCLZERO = {}; static constexpr FeatureBitset ImpliedFeaturesCMOV = {}; static constexpr FeatureBitset ImpliedFeaturesCMPXCHG16B = {}; static constexpr 
FeatureBitset ImpliedFeaturesCMPXCHG8B = {}; -static constexpr FeatureBitset ImpliedFeaturesEM64T = {}; static constexpr FeatureBitset ImpliedFeaturesENQCMD = {}; static constexpr FeatureBitset ImpliedFeaturesFSGSBASE = {}; static constexpr FeatureBitset ImpliedFeaturesFXSR = {}; @@ -527,8 +539,14 @@ void llvm::X86::getFeaturesForCPU(StringRef CPU, [&](const ProcInfo &P) { return P.Name == CPU; }); assert(I != std::end(Processors) && "Processor not found!"); + FeatureBitset Bits = I->Features; + + // Remove the 64-bit feature which we only use to validate if a CPU can + // be used with 64-bit mode. + Bits &= ~Feature64BIT; + // Add the string version of all set bits. - getFeatureBitsAsStrings(I->Features, EnabledFeatures); + getFeatureBitsAsStrings(Bits, EnabledFeatures); } // For each feature that is (transitively) implied by this feature, set it. From llvm-commits at lists.llvm.org Thu Jul 9 14:53:40 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:53:40 +0000 (UTC) Subject: [PATCH] D83094: Analysis: Add a GenericCycleInfo analysis In-Reply-To: References: Message-ID: arsenm added inline comments. ================ Comment at: llvm/lib/Analysis/GenericCycleInfo.cpp:420 +/// or that the right set of cycles in the CFG were found. +void GenericCycleInfoBase::validateTree() const { + DenseSet blocks; ---------------- If this is going to be assert-based, might as well disable the whole function ifndef NDEBUG? ================ Comment at: llvm/lib/Analysis/GenericCycleInfo.cpp:427 +/// or that the right set of cycles in the CFG were found. +void GenericCycleInfoBase::validateTree() const { + DenseSet blocks; ---------------- nhaehnle wrote: > arsenm wrote: > > I think it would be more helpful to have this return a bool for fail/pass, and not directly assert. 
The assert conditions could print more about why it's not valid (although there so many asserts, this might be annoying) > It's not clear to me why the fail/pass return would be more helpful? I sometimes have called these type of verifiers in the middle of passes in gdb but it's not that important Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83094/new/ https://reviews.llvm.org/D83094 From llvm-commits at lists.llvm.org Thu Jul 9 14:59:49 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:59:49 +0000 (UTC) Subject: [PATCH] D73853: [llvm-reduce] add ReduceAttribute delta pass In-Reply-To: References: Message-ID: nickdesaulniers abandoned this revision. nickdesaulniers added a comment. Thanks for fixing; I owe you that beer! 🍻 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73853/new/ https://reviews.llvm.org/D73853 From llvm-commits at lists.llvm.org Thu Jul 9 15:00:02 2020 From: llvm-commits at lists.llvm.org (Frederic Riss via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:00:02 +0000 (UTC) Subject: [PATCH] D83023: [lldb/ObjectFileMachO] Fetch shared cache images from our own shared cache In-Reply-To: References: Message-ID: <72277ddb64641e79fe12137762e0a80b@localhost.localdomain> friss added a comment. In D83023#2129985 , @labath wrote: > In D83023#2128475 , @friss wrote: > > > In D83023#2128298 , @labath wrote: > > > > > I think this is a very interesting feature (lldb being able to load modules from memory; the mac shared cache thingy is interesting too, but in a different way). We have a feature request from people who are downloading modules from a network (from a proprietary symbol server, etc.) and would like to pass them to lldb without having to serialize them to disk. This would be a step towards making that happen. 
It could also be useful for our own unit tests which now have to do a similar thing. > > > > > > However, I think this could use some clean up. There's a lot of juggling of data/file/object offsets going on, and it all seems inconsistent and avoidable to me. Please see inline comments for details. > > > > > > I'll see what can be done. My main goal while working on this was to avoid changing the semantics outside of the shared cache usecase. I understand fairly well the codepath that I added and then just moved some other bits around to keep the existing semantics for the rest. Happy to rework this. > > > So, if an object file needs to access some data which is outside of "its" image then my idea about using sliced data buffers will probably not work. It that case, using the "object offset" field to communicate the location of the "object" might not be a bad idea (it's still different than the use in .a files, but maybe we can stretch that definition). The part that bugs me then is having this functionality key off of the "data" field being set. Ideally, these would be two orthogonal features: > > - the "data" would control whether you read the file system to obtain the object contents > - the "object offset" would tell you where to locate the desired object inside these "jumbo" objects I think this will work. And I can hide the ugliness inside ObjectFileMachO. A shared cache image only ever needs to access data after its start, so I can model images a stretching from their starting point to the end of the shared cache. I remember doing it this way first, and the reason I changed my mind was because of some checks in the ObjectFileMachO::CreateSections which fired because the load commands were relative to the full shared cache instead of just the image. 
This can be dealt with locally in this function (the rest of the code has to deal with it anyway, because once an ObjectFile plugin claims an input, the data gets clamped to not have a starting offset anymore Take a look at https://reviews.llvm.org/D83512, it implements the generic part and it required basically no work to support ELF in-memory files. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83023/new/ https://reviews.llvm.org/D83023 From llvm-commits at lists.llvm.org Thu Jul 9 15:00:19 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:00:19 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: MaskRay added a comment. Hi, your git commit contains extra Phabricator tags. You can drop `Reviewers:` `Subscribers:` `Tags:` and the text `Summary:` from the git commit with the following script: arcfilter () { arc amend git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:$/ {sub(/^Summary: /,"");print}' | git commit --amend --date=now -F - } `Reviewed By: ` is considered important by some people. Please keep the tag. (`--date=now` is my personal preference (author dates are usually not useful. Using committer dates can make log almost monotonic in time)) `llvm/utils/git/pre-push.py` can validate the message does not include unneeded tags. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Thu Jul 9 15:00:49 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via llvm-commits) Date: Thu, 09 Jul 2020 15:00:49 -0700 (PDT) Subject: [llvm] 839f8e4 - [FileCheck] Improve -dump-input documentation Message-ID: <5f079391.1c69fb81.3b217.9d72@mx.google.com> Author: Joel E. 
Denny Date: 2020-07-09T18:00:30-04:00 New Revision: 839f8e4fe2dcf490a0972d7761f95e5a6b287faf URL: https://github.com/llvm/llvm-project/commit/839f8e4fe2dcf490a0972d7761f95e5a6b287faf DIFF: https://github.com/llvm/llvm-project/commit/839f8e4fe2dcf490a0972d7761f95e5a6b287faf.diff LOG: [FileCheck] Improve -dump-input documentation Document the default of `fail` in `-help`. Extend `-dump-input=help` to help users find related command-line options, but let `-help` provide their full documentation. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D83091 Added: Modified: llvm/test/FileCheck/dump-input-enable.txt llvm/utils/FileCheck/FileCheck.cpp Removed: ################################################################################ diff --git a/llvm/test/FileCheck/dump-input-enable.txt b/llvm/test/FileCheck/dump-input-enable.txt index 6f1f8e123207..b0aacfb2fed2 100644 --- a/llvm/test/FileCheck/dump-input-enable.txt +++ b/llvm/test/FileCheck/dump-input-enable.txt @@ -224,7 +224,7 @@ BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input option: Cannot find op ; HELP-NOT: {{.}} ; HELP: The following description was requested by -dump-input=help -; HELP: try{{.*}}-color +; HELP: - colors {{.*}} ; HELP-NOT: {{.}} ; Trace is sometimes suppressed. diff --git a/llvm/utils/FileCheck/FileCheck.cpp b/llvm/utils/FileCheck/FileCheck.cpp index e0037b596cdb..659491c89636 100644 --- a/llvm/utils/FileCheck/FileCheck.cpp +++ b/llvm/utils/FileCheck/FileCheck.cpp @@ -121,10 +121,9 @@ static cl::list DumpInputs( cl::desc("Dump input to stderr, adding annotations representing\n" "currently enabled diagnostics. When there are multiple\n" "occurrences of this option, the that appears earliest\n" - "in the list below has precedence.\n"), + "in the list below has precedence. 
The default is 'fail'.\n"), cl::value_desc("mode"), - cl::values(clEnumValN(DumpInputHelp, "help", - "Explain dump format and quit"), + cl::values(clEnumValN(DumpInputHelp, "help", "Explain input dump and quit"), clEnumValN(DumpInputAlways, "always", "Always dump input"), clEnumValN(DumpInputFail, "fail", "Dump input on failure"), clEnumValN(DumpInputNever, "never", "Never dump input"))); @@ -180,8 +179,15 @@ static MarkerStyle GetMarker(FileCheckDiag::MatchType MatchTy) { static void DumpInputAnnotationHelp(raw_ostream &OS) { OS << "The following description was requested by -dump-input=help to\n" - << "explain the input annotations printed by -dump-input=always and\n" - << "-dump-input=fail:\n\n"; + << "explain the input dump printed by FileCheck.\n" + << "\n" + << "Related command-line options:\n" + << " - -dump-input= enables or disables the input dump\n" + << " - -v and -vv add more annotations\n" + << " - -color forces colors to be enabled both in the dump and below\n" + << " - -help documents the above options in more detail\n" + << "\n" + << "Input dump annotation format:\n"; // Labels for input lines. OS << " - "; @@ -233,8 +239,7 @@ static void DumpInputAnnotationHelp(raw_ostream &OS) { WithColor(OS, raw_ostream::CYAN, true, false) << "discarded match"; OS << ", "; WithColor(OS, raw_ostream::CYAN, true, true) << "unmatched input"; - OS << "\n\n" - << "If you are not seeing color above or in input dumps, try: -color\n"; + OS << "\n"; } /// An annotation for a single input line. 
@@ -675,12 +680,10 @@ int main(int argc, char **argv) { if (DumpInput == DumpInputAlways || (ExitCode == 1 && DumpInput == DumpInputFail)) { errs() << "\n" - << "Input file: " - << InputFilename - << "\n" + << "Input file: " << InputFilename << "\n" << "Check file: " << CheckFilename << "\n" << "\n" - << "-dump-input=help describes the format of the following dump.\n" + << "-dump-input=help explains the following input dump.\n" << "\n"; std::vector Annotations; unsigned LabelWidth; From llvm-commits at lists.llvm.org Thu Jul 9 15:00:56 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:00:56 +0000 (UTC) Subject: [PATCH] D83091: [FileCheck] Improve -dump-input documentation In-Reply-To: References: Message-ID: <759a4dc5f3b75f680066eac72e01533a@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG839f8e4fe2dc: [FileCheck] Improve -dump-input documentation (authored by jdenny). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83091/new/ https://reviews.llvm.org/D83091 Files: llvm/test/FileCheck/dump-input-enable.txt llvm/utils/FileCheck/FileCheck.cpp Index: llvm/utils/FileCheck/FileCheck.cpp =================================================================== --- llvm/utils/FileCheck/FileCheck.cpp +++ llvm/utils/FileCheck/FileCheck.cpp @@ -121,10 +121,9 @@ cl::desc("Dump input to stderr, adding annotations representing\n" "currently enabled diagnostics. When there are multiple\n" "occurrences of this option, the that appears earliest\n" - "in the list below has precedence.\n"), + "in the list below has precedence. 
The default is 'fail'.\n"), cl::value_desc("mode"), - cl::values(clEnumValN(DumpInputHelp, "help", - "Explain dump format and quit"), + cl::values(clEnumValN(DumpInputHelp, "help", "Explain input dump and quit"), clEnumValN(DumpInputAlways, "always", "Always dump input"), clEnumValN(DumpInputFail, "fail", "Dump input on failure"), clEnumValN(DumpInputNever, "never", "Never dump input"))); @@ -180,8 +179,15 @@ static void DumpInputAnnotationHelp(raw_ostream &OS) { OS << "The following description was requested by -dump-input=help to\n" - << "explain the input annotations printed by -dump-input=always and\n" - << "-dump-input=fail:\n\n"; + << "explain the input dump printed by FileCheck.\n" + << "\n" + << "Related command-line options:\n" + << " - -dump-input= enables or disables the input dump\n" + << " - -v and -vv add more annotations\n" + << " - -color forces colors to be enabled both in the dump and below\n" + << " - -help documents the above options in more detail\n" + << "\n" + << "Input dump annotation format:\n"; // Labels for input lines. OS << " - "; @@ -233,8 +239,7 @@ WithColor(OS, raw_ostream::CYAN, true, false) << "discarded match"; OS << ", "; WithColor(OS, raw_ostream::CYAN, true, true) << "unmatched input"; - OS << "\n\n" - << "If you are not seeing color above or in input dumps, try: -color\n"; + OS << "\n"; } /// An annotation for a single input line. 
@@ -675,12 +680,10 @@ if (DumpInput == DumpInputAlways || (ExitCode == 1 && DumpInput == DumpInputFail)) { errs() << "\n" - << "Input file: " - << InputFilename - << "\n" + << "Input file: " << InputFilename << "\n" << "Check file: " << CheckFilename << "\n" << "\n" - << "-dump-input=help describes the format of the following dump.\n" + << "-dump-input=help explains the following input dump.\n" << "\n"; std::vector Annotations; unsigned LabelWidth; Index: llvm/test/FileCheck/dump-input-enable.txt =================================================================== --- llvm/test/FileCheck/dump-input-enable.txt +++ llvm/test/FileCheck/dump-input-enable.txt @@ -224,7 +224,7 @@ ; HELP-NOT: {{.}} ; HELP: The following description was requested by -dump-input=help -; HELP: try{{.*}}-color +; HELP: - colors {{.*}} ; HELP-NOT: {{.}} ; Trace is sometimes suppressed. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83091.276847.patch Type: text/x-patch Size: 3119 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:01:46 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:01:46 +0000 (UTC) Subject: [PATCH] D83491: [flang] Fix a crash when creating generics from a copy In-Reply-To: References: Message-ID: <20fa65b33b5c59b14eafac8f8fbc10dd@localhost.localdomain> PeteSteinfeld updated this revision to Diff 276848. PeteSteinfeld added a comment. With Tim's guidance, I changed the code in `CopyFrom()` to treat the procs and binding names as a pair and also removed the constructor for `GenericDetails` that took a vector of procs, since it wasn't used anywhere. 
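For readers following the parallel-vector point, here is a minimal, self-contained sketch of the pairwise copy described above. It is not the flang code: `Symbol` and `GenericSketch` are hypothetical stand-ins, raw pointers replace flang's symbol references, and plain `assert` replaces `CHECK()`. The only idea it is meant to show is that a procedure and its binding name are pushed together, so the two lists cannot drift apart in length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol; identity is the object's address.
struct Symbol {
  std::string name;
};

// Simplified, hypothetical model of a generic's two parallel lists.
struct GenericSketch {
  std::vector<const Symbol *> procs;     // specific procedures
  std::vector<std::string> bindingNames; // bindingNames[i] pairs with procs[i]

  void Add(const Symbol &proc, std::string bindingName) {
    procs.push_back(&proc);
    bindingNames.push_back(std::move(bindingName));
  }

  // Merge entries from `from`, skipping procedures that are already present.
  // Each procedure is copied together with its binding name.
  void CopyFrom(const GenericSketch &from) {
    assert(procs.size() == bindingNames.size());
    assert(from.procs.size() == from.bindingNames.size());
    for (std::size_t i = 0; i < from.procs.size(); ++i) {
      const Symbol *candidate = from.procs[i];
      bool present =
          std::find(procs.begin(), procs.end(), candidate) != procs.end();
      if (!present) {
        procs.push_back(candidate);
        bindingNames.push_back(from.bindingNames[i]);
      }
    }
  }
};
```

With two separate loops, a duplicate filtered out of one list but not the other would leave the lists with different lengths; indexing both lists with the same `i` avoids that by construction.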
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83491/new/ https://reviews.llvm.org/D83491 Files: flang/include/flang/Semantics/symbol.h flang/lib/Semantics/symbol.cpp flang/test/Semantics/resolve53.f90 Index: flang/test/Semantics/resolve53.f90 =================================================================== --- flang/test/Semantics/resolve53.f90 +++ flang/test/Semantics/resolve53.f90 @@ -457,3 +457,26 @@ integer :: i, j end end + +module m20 + interface operator(.not.) + real function f(x) + character(*),intent(in) :: x + end function + end interface + interface operator(+) + procedure f + end interface +end module + +subroutine s1() + use m20 + interface operator(.not.) + !ERROR: Procedure 'f' is already specified in generic 'operator(.not.)' + procedure f + end interface + interface operator(+) + !ERROR: Procedure 'f' is already specified in generic 'operator(+)' + procedure f + end interface +end subroutine s1 Index: flang/lib/Semantics/symbol.cpp =================================================================== --- flang/lib/Semantics/symbol.cpp +++ flang/lib/Semantics/symbol.cpp @@ -150,9 +150,6 @@ return *this; } -GenericDetails::GenericDetails(const SymbolVector &specificProcs) - : specificProcs_{specificProcs} {} - void GenericDetails::AddSpecificProc( const Symbol &proc, SourceName bindingName) { specificProcs_.push_back(proc); @@ -186,6 +183,8 @@ } void GenericDetails::CopyFrom(const GenericDetails &from) { + CHECK(specificProcs_.size() == bindingNames_.size()); + CHECK(from.specificProcs_.size() == from.bindingNames_.size()); if (from.specific_) { CHECK(!specific_ || specific_ == from.specific_); specific_ = from.specific_; @@ -194,11 +193,13 @@ CHECK(!derivedType_ || derivedType_ == from.derivedType_); derivedType_ = from.derivedType_; } - for (const Symbol &symbol : from.specificProcs_) { + for (std::size_t i{0}; i < from.specificProcs_.size(); ++i) { if (std::find_if(specificProcs_.begin(), 
specificProcs_.end(), - [&](const Symbol &mySymbol) { return &mySymbol == &symbol; }) == - specificProcs_.end()) { - specificProcs_.push_back(symbol); + [&](const Symbol &mySymbol) { + return &mySymbol == &*from.specificProcs_[i]; + }) == specificProcs_.end()) { + specificProcs_.push_back(from.specificProcs_[i]); + bindingNames_.push_back(from.bindingNames_[i]); } } } Index: flang/include/flang/Semantics/symbol.h =================================================================== --- flang/include/flang/Semantics/symbol.h +++ flang/include/flang/Semantics/symbol.h @@ -423,7 +423,6 @@ class GenericDetails { public: GenericDetails() {} - GenericDetails(const SymbolVector &specificProcs); GenericKind kind() const { return kind_; } void set_kind(GenericKind kind) { kind_ = kind; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83491.276848.patch Type: text/x-patch Size: 2721 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:02:28 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:02:28 +0000 (UTC) Subject: [PATCH] D73853: [llvm-reduce] add ReduceAttribute delta pass In-Reply-To: References: Message-ID: <31272e82f88515eaef11afc243179566@localhost.localdomain> lebedev.ri added a comment. In D73853#2142749 , @nickdesaulniers wrote: > Thanks for fixing; I owe you that beer! 🍻 Not really fixing, i didn't read what this code was doing Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D73853/new/ https://reviews.llvm.org/D73853 From llvm-commits at lists.llvm.org Thu Jul 9 15:04:47 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:04:47 +0000 (UTC) Subject: [PATCH] D83504: [PowerPC] Implement R_PPC64_REL24_NOTOC local calls. 
callee has a TOC In-Reply-To: References: Message-ID: <859ca30cfd237b8a77f4d7ca6f464e4d@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/ELF/Arch/PPC64.cpp:1053 + if (type == R_PPC64_REL24_NOTOC && (s.stOther >> 5) > 1) + return true; + ---------------- misaligned ================ Comment at: lld/ELF/Target.h:216 +// The prefixed instruction is always a 4 byte prefix followed by a 4 byte +// instruction. Therefore, the prefix is always in lower memory than the ---------------- Make it imperative, i.e. // Write a prefixed instruction, which is a 4-byte prefix followed by a 4-byte instruction (regardless of endianness). "As a result, we need to shift the pieces around on little endian machines." is not needed. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-toc.s:18 + +# The test is created to check that when a function without TOC access a +# local function using TOC, a r12 setup stub is inserted. ---------------- `## ` ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-toc.s:22 +# SYMBOL: 1: 0000000010020000 0 NOTYPE LOCAL DEFAULT [] 2 callee +# SYMBOL: 2: 0000000010030000 0 NOTYPE LOCAL DEFAULT [] 3 caller +# SYMBOL: 3: 0000000010010000 0 NOTYPE LOCAL DEFAULT 1 func ---------------- `-NEXT:` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83504/new/ https://reviews.llvm.org/D83504 From llvm-commits at lists.llvm.org Thu Jul 9 15:06:18 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via llvm-commits) Date: Thu, 09 Jul 2020 15:06:18 -0700 (PDT) Subject: [llvm] 5ffec46 - [PowerPC][Power10] Add Instruction definition/MC Tests for Load/Store Rightmost VSX Vector Message-ID: <5f0794da.1c69fb81.9bc78.9975@mx.google.com> Author: Albion Fung Date: 2020-07-09T17:06:03-05:00 New Revision: 5ffec467202808f92c378adae95d9972926aba7d URL: https://github.com/llvm/llvm-project/commit/5ffec467202808f92c378adae95d9972926aba7d DIFF: 
https://github.com/llvm/llvm-project/commit/5ffec467202808f92c378adae95d9972926aba7d.diff LOG: [PowerPC][Power10] Add Instruction definition/MC Tests for Load/Store Rightmost VSX Vector This patch adds the instruction definitions and the assembly/disassembly tests for the Load/Store VSX Vector Rightmost instructions. Differential Revision: https://reviews.llvm.org/D83364 Added: Modified: llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s Removed: ################################################################################ diff --git a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td index 2c21d0a175ad..2d12a72e29ae 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ b/llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -942,8 +942,26 @@ let Predicates = [IsISA3_1] in { "vclrrb $vD, $vA, $rB", IIC_VecGeneral, [(set v16i8:$vD, (int_ppc_altivec_vclrrb v16i8:$vA, i32:$rB))]>; + + // The XFormMemOp flag for the following 8 instructions is set on + // the instruction format.
+ let mayLoad = 1, mayStore = 0 in { + def LXVRBX : X_XT6_RA5_RB5<31, 13, "lxvrbx", vsrc, []>; + def LXVRHX : X_XT6_RA5_RB5<31, 45, "lxvrhx", vsrc, []>; + def LXVRWX : X_XT6_RA5_RB5<31, 77, "lxvrwx", vsrc, []>; + def LXVRDX : X_XT6_RA5_RB5<31, 109, "lxvrdx", vsrc, []>; + } + + let mayLoad = 0, mayStore = 1 in { + def STXVRBX : X_XS6_RA5_RB5<31, 141, "stxvrbx", vsrc, []>; + def STXVRHX : X_XS6_RA5_RB5<31, 173, "stxvrhx", vsrc, []>; + def STXVRWX : X_XS6_RA5_RB5<31, 205, "stxvrwx", vsrc, []>; + def STXVRDX : X_XS6_RA5_RB5<31, 237, "stxvrdx", vsrc, []>; + } } + + //---------------------------- Anonymous Patterns ----------------------------// let Predicates = [IsISA3_1] in { def : Pat<(v16i8 (int_ppc_vsx_xxgenpcvbm v16i8:$VRB, imm:$IMM)), diff --git a/llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt b/llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt index 77ec7793973c..f8d310fa7e14 100644 --- a/llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt +++ b/llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt @@ -278,3 +278,27 @@ # CHECK: vinsdrx 1, 2, 3 0x10 0x22 0x1b 0xcf + +# CHECK: lxvrbx 32, 1, 2 +0x7c 0x01 0x10 0x1b + +# CHECK: lxvrhx 33, 1, 2 +0x7c 0x21 0x10 0x5b + +# CHECK: lxvrdx 34, 1, 2 +0x7c 0x41 0x10 0xdb + +# CHECK: lxvrwx 35, 1, 2 +0x7c 0x61 0x10 0x9b + +# CHECK: stxvrbx 32, 3, 1 +0x7c 0x03 0x09 0x1b + +# CHECK: stxvrhx 33, 3, 1 +0x7c 0x23 0x09 0x5b + +# CHECK: stxvrwx 34, 3, 1 +0x7c 0x43 0x09 0x9b + +# CHECK: stxvrdx 35, 3, 1 +0x7c 0x63 0x09 0xdb diff --git a/llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s b/llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s index 4215725ea584..5ed6b14d38ae 100644 --- a/llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s +++ b/llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s @@ -405,3 +405,27 @@ # CHECK-BE: vinsdrx 1, 2, 3 # encoding: [0x10,0x22,0x1b,0xcf] # CHECK-LE: vinsdrx 1, 2, 3 # encoding: [0xcf,0x1b,0x22,0x10] vinsdrx 1, 2, 3 +# CHECK-BE: lxvrbx 32, 1, 2 # encoding: [0x7c,0x01,0x10,0x1b] +# CHECK-LE: lxvrbx 
32, 1, 2 # encoding: [0x1b,0x10,0x01,0x7c] + lxvrbx 32, 1, 2 +# CHECK-BE: lxvrhx 33, 1, 2 # encoding: [0x7c,0x21,0x10,0x5b] +# CHECK-LE: lxvrhx 33, 1, 2 # encoding: [0x5b,0x10,0x21,0x7c] + lxvrhx 33, 1, 2 +# CHECK-BE: lxvrdx 34, 1, 2 # encoding: [0x7c,0x41,0x10,0xdb] +# CHECK-LE: lxvrdx 34, 1, 2 # encoding: [0xdb,0x10,0x41,0x7c] + lxvrdx 34, 1, 2 +# CHECK-BE: lxvrwx 35, 1, 2 # encoding: [0x7c,0x61,0x10,0x9b] +# CHECK-LE: lxvrwx 35, 1, 2 # encoding: [0x9b,0x10,0x61,0x7c] + lxvrwx 35, 1, 2 +# CHECK-BE: stxvrbx 32, 3, 1 # encoding: [0x7c,0x03,0x09,0x1b] +# CHECK-LE: stxvrbx 32, 3, 1 # encoding: [0x1b,0x09,0x03,0x7c] + stxvrbx 32, 3, 1 +# CHECK-BE: stxvrhx 33, 3, 1 # encoding: [0x7c,0x23,0x09,0x5b] +# CHECK-LE: stxvrhx 33, 3, 1 # encoding: [0x5b,0x09,0x23,0x7c] + stxvrhx 33, 3, 1 +# CHECK-BE: stxvrwx 34, 3, 1 # encoding: [0x7c,0x43,0x09,0x9b] +# CHECK-LE: stxvrwx 34, 3, 1 # encoding: [0x9b,0x09,0x43,0x7c] + stxvrwx 34, 3, 1 +# CHECK-BE: stxvrdx 35, 3, 1 # encoding: [0x7c,0x63,0x09,0xdb] +# CHECK-LE: stxvrdx 35, 3, 1 # encoding: [0xdb,0x09,0x63,0x7c] + stxvrdx 35, 3, 1 From llvm-commits at lists.llvm.org Thu Jul 9 15:06:28 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:06:28 +0000 (UTC) Subject: [PATCH] D83364: [PowerPC][Power10] Implement Instruction definition and MC Tests for Load and Store VSX Vector with Zero or Sign Extend In-Reply-To: References: Message-ID: <03a5a5fc2cf660838a6b56764042e2e0@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG5ffec4672028: [PowerPC][Power10] Add Instruction definition/MC Tests for Load/Store Rightmost… (authored by Conanap, committed by amyk). 
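For readers checking the encodings in the tests above by hand, here is a minimal sketch of how the four bytes could be derived. It assumes the usual XX1-form field layout (6-bit primary opcode, low 5 bits of the VSX register, RA, RB, 10-bit extended opcode, high bit of the VSX register); the helper names `encodeXX1`, `bigEndianBytes` and `littleEndianBytes` are hypothetical and are not part of LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack XX1-form fields into one 32-bit instruction word.
// Assumed layout, most significant bits first:
//   opcode(6) | XT low bits(5) | RA(5) | RB(5) | XO(10) | XT high bit(1)
constexpr std::uint32_t encodeXX1(std::uint32_t opcode, std::uint32_t xo,
                                  std::uint32_t xt, std::uint32_t ra,
                                  std::uint32_t rb) {
  return ((opcode & 0x3Fu) << 26) | ((xt & 0x1Fu) << 21) |
         ((ra & 0x1Fu) << 16) | ((rb & 0x1Fu) << 11) |
         ((xo & 0x3FFu) << 1) | ((xt >> 5) & 0x1u);
}

// Bytes in the order the CHECK-BE lines list them (most significant first).
constexpr std::array<std::uint8_t, 4> bigEndianBytes(std::uint32_t word) {
  return {static_cast<std::uint8_t>(word >> 24),
          static_cast<std::uint8_t>(word >> 16),
          static_cast<std::uint8_t>(word >> 8),
          static_cast<std::uint8_t>(word)};
}

// Bytes in the order the CHECK-LE lines list them (least significant first).
constexpr std::array<std::uint8_t, 4> littleEndianBytes(std::uint32_t word) {
  return {static_cast<std::uint8_t>(word),
          static_cast<std::uint8_t>(word >> 8),
          static_cast<std::uint8_t>(word >> 16),
          static_cast<std::uint8_t>(word >> 24)};
}

// Re-check two encodings quoted in the tests above.
inline void checkQuotedEncodings() {
  assert(encodeXX1(31, 13, 32, 1, 2) == 0x7C01101Bu);   // lxvrbx 32, 1, 2
  assert(encodeXX1(31, 141, 32, 3, 1) == 0x7C03091Bu);  // stxvrbx 32, 3, 1
}
```

Under that assumed layout, the opcode and extended-opcode pairs from the .td definitions (31/13 for lxvrbx, 31/141 for stxvrbx) reproduce the bytes in the CHECK-BE lines, and the CHECK-LE lines are the same word with its bytes reversed.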
Changed prior to commit: https://reviews.llvm.org/D83364?vs=276564&id=276850#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83364/new/ https://reviews.llvm.org/D83364 Files: llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s Index: llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s =================================================================== --- llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s +++ llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s @@ -405,3 +405,27 @@ # CHECK-BE: vinsdrx 1, 2, 3 # encoding: [0x10,0x22,0x1b,0xcf] # CHECK-LE: vinsdrx 1, 2, 3 # encoding: [0xcf,0x1b,0x22,0x10] vinsdrx 1, 2, 3 +# CHECK-BE: lxvrbx 32, 1, 2 # encoding: [0x7c,0x01,0x10,0x1b] +# CHECK-LE: lxvrbx 32, 1, 2 # encoding: [0x1b,0x10,0x01,0x7c] + lxvrbx 32, 1, 2 +# CHECK-BE: lxvrhx 33, 1, 2 # encoding: [0x7c,0x21,0x10,0x5b] +# CHECK-LE: lxvrhx 33, 1, 2 # encoding: [0x5b,0x10,0x21,0x7c] + lxvrhx 33, 1, 2 +# CHECK-BE: lxvrdx 34, 1, 2 # encoding: [0x7c,0x41,0x10,0xdb] +# CHECK-LE: lxvrdx 34, 1, 2 # encoding: [0xdb,0x10,0x41,0x7c] + lxvrdx 34, 1, 2 +# CHECK-BE: lxvrwx 35, 1, 2 # encoding: [0x7c,0x61,0x10,0x9b] +# CHECK-LE: lxvrwx 35, 1, 2 # encoding: [0x9b,0x10,0x61,0x7c] + lxvrwx 35, 1, 2 +# CHECK-BE: stxvrbx 32, 3, 1 # encoding: [0x7c,0x03,0x09,0x1b] +# CHECK-LE: stxvrbx 32, 3, 1 # encoding: [0x1b,0x09,0x03,0x7c] + stxvrbx 32, 3, 1 +# CHECK-BE: stxvrhx 33, 3, 1 # encoding: [0x7c,0x23,0x09,0x5b] +# CHECK-LE: stxvrhx 33, 3, 1 # encoding: [0x5b,0x09,0x23,0x7c] + stxvrhx 33, 3, 1 +# CHECK-BE: stxvrwx 34, 3, 1 # encoding: [0x7c,0x43,0x09,0x9b] +# CHECK-LE: stxvrwx 34, 3, 1 # encoding: [0x9b,0x09,0x43,0x7c] + stxvrwx 34, 3, 1 +# CHECK-BE: stxvrdx 35, 3, 1 # encoding: [0x7c,0x63,0x09,0xdb] +# CHECK-LE: stxvrdx 35, 3, 1 # encoding: [0xdb,0x09,0x63,0x7c] + stxvrdx 35, 3, 1 Index: llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt 
=================================================================== --- llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt +++ llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt @@ -278,3 +278,27 @@ # CHECK: vinsdrx 1, 2, 3 0x10 0x22 0x1b 0xcf + +# CHECK: lxvrbx 32, 1, 2 +0x7c 0x01 0x10 0x1b + +# CHECK: lxvrhx 33, 1, 2 +0x7c 0x21 0x10 0x5b + +# CHECK: lxvrdx 34, 1, 2 +0x7c 0x41 0x10 0xdb + +# CHECK: lxvrwx 35, 1, 2 +0x7c 0x61 0x10 0x9b + +# CHECK: stxvrbx 32, 3, 1 +0x7c 0x03 0x09 0x1b + +# CHECK: stxvrhx 33, 3, 1 +0x7c 0x23 0x09 0x5b + +# CHECK: stxvrwx 34, 3, 1 +0x7c 0x43 0x09 0x9b + +# CHECK: stxvrdx 35, 3, 1 +0x7c 0x63 0x09 0xdb Index: llvm/lib/Target/PowerPC/PPCInstrPrefix.td =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrPrefix.td +++ llvm/lib/Target/PowerPC/PPCInstrPrefix.td @@ -942,8 +942,26 @@ "vclrrb $vD, $vA, $rB", IIC_VecGeneral, [(set v16i8:$vD, (int_ppc_altivec_vclrrb v16i8:$vA, i32:$rB))]>; + + // The XFormMemOp flag for the following 8 instructions is set on + // the instruction format. + let mayLoad = 1, mayStore = 0 in { + def LXVRBX : X_XT6_RA5_RB5<31, 13, "lxvrbx", vsrc, []>; + def LXVRHX : X_XT6_RA5_RB5<31, 45, "lxvrhx", vsrc, []>; + def LXVRWX : X_XT6_RA5_RB5<31, 77, "lxvrwx", vsrc, []>; + def LXVRDX : X_XT6_RA5_RB5<31, 109, "lxvrdx", vsrc, []>; + } + + let mayLoad = 0, mayStore = 1 in { + def STXVRBX : X_XS6_RA5_RB5<31, 141, "stxvrbx", vsrc, []>; + def STXVRHX : X_XS6_RA5_RB5<31, 173, "stxvrhx", vsrc, []>; + def STXVRWX : X_XS6_RA5_RB5<31, 205, "stxvrwx", vsrc, []>; + def STXVRDX : X_XS6_RA5_RB5<31, 237, "stxvrdx", vsrc, []>; + } } + + //---------------------------- Anonymous Patterns ----------------------------// let Predicates = [IsISA3_1] in { def : Pat<(v16i8 (int_ppc_vsx_xxgenpcvbm v16i8:$VRB, imm:$IMM)), -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83364.276850.patch Type: text/x-patch Size: 4059 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:20:07 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:20:07 +0000 (UTC) Subject: [PATCH] D83515: [flang] Fix frontend build with -DBUILD_SHARED_LIBS=On Message-ID: klausler created this revision. klausler added reviewers: tskeith, schweitz, sscalpone. klausler added a project: Flang. Herald added subscribers: llvm-commits, sstefan1, mgorny. Herald added a reviewer: jdoerfert. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Fix frontend shared library builds by eliminating dependences of the parser on other component libraries, moving some code around that wasn't in the right library, and making some dependences explicit in the CMakeLists.txt files. The lowering library does not yet build as a shared library due to some undefined names. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83515 Files: flang/include/flang/Common/indirection.h flang/include/flang/Evaluate/call.h flang/include/flang/Evaluate/expression.h flang/include/flang/Evaluate/tools.h flang/include/flang/Parser/parse-tree.h flang/include/flang/Semantics/expression.h flang/include/flang/Semantics/tools.h flang/lib/Evaluate/CMakeLists.txt flang/lib/Evaluate/call.cpp flang/lib/Evaluate/expression.cpp flang/lib/Evaluate/tools.cpp flang/lib/Lower/CMakeLists.txt flang/lib/Parser/parse-tree.cpp flang/lib/Semantics/CMakeLists.txt flang/lib/Semantics/expression.cpp flang/lib/Semantics/tools.cpp flang/tools/f18-parse-demo/stub-evaluate.cpp flang/tools/f18/CMakeLists.txt flang/unittests/Evaluate/CMakeLists.txt flang/unittests/Runtime/CMakeLists.txt -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83515.276852.patch Type: text/x-patch Size: 16370 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:20:28 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:20:28 +0000 (UTC) Subject: [PATCH] D83516: [PowerPC][Power10] RFC 2608 Instruction definitions and MC Tests Message-ID: Conanap created this revision. Conanap added reviewers: power-llvm-team, PowerPC, saghir, nemanjai, hfinkel. Conanap added projects: LLVM, clang, PowerPC. This implements instruction definitions and MC tests for RFC2608. Please note that some instrs have classes that will need to be changed later as their classes have not been implemented yet - they will be implemented in their respective patches. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83516 Files: llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83516.276849.patch Type: text/x-patch Size: 11994 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:25:59 2020 From: llvm-commits at lists.llvm.org (Scott Linder via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:25:59 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: scott.linder added inline comments. ================ Comment at: llvm/docs/AMDGPUUsage.rst:2321 "ValueType" string Required Kernel argument value type. Only present if "ValueKind" is ---------------- Should we delete this as well? Or at least mark it as non-Required? 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 From llvm-commits at lists.llvm.org Thu Jul 9 15:27:04 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:27:04 +0000 (UTC) Subject: [PATCH] D83500: [PowerPC][Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. In-Reply-To: References: Message-ID: <14f48b7d22bdcda2fae56bca63c89cd7@localhost.localdomain> amyk updated this revision to Diff 276853. amyk added a comment. Fix assignment of variable. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83500/new/ https://reviews.llvm.org/D83500 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/lib/CodeGen/CGBuiltin.cpp clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c -------------- next part -------------- A non-text attachment was scrubbed... Name: D83500.276853.patch Type: text/x-patch Size: 12110 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:31:19 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:31:19 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: <0129fb3a4317711269f58138dfa0873d@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1359-1368 + dwarf::Attribute MacrosAttr = getDwarfVersion() >= 5 + ? dwarf::DW_AT_macros + : dwarf::DW_AT_GNU_macros; if (useSplitDwarf()) TheCU.addSectionDelta( - TheCU.getUnitDie(), dwarf::DW_AT_macros, U.getMacroLabelBegin(), + TheCU.getUnitDie(), MacrosAttr, U.getMacroLabelBegin(), TLOF.getDwarfMacroDWOSection()->getBeginSymbol()); ---------------- Looks like this might be wrong for v4 + split DWARF + using macro? Or perhaps this code isn't reachable by that combination? 
Might be more clear, then, to sink the MacrosAttr choice down into the "else" clause here, and assert in the split DWARF case that the version >= 5? (possibly including a note about how the pre-v5, GCC debug_macro extension isn't supported with Split DWARF) ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3032-3035 + Asm->emitULEB128(Type); + Asm->OutStreamer->AddComment("Line Number"); + Asm->emitULEB128(M.getLine()); + Asm->OutStreamer->AddComment("Macro String"); ---------------- /might/ be worth pulling these 4 lines out as a lambda to use from the if/else branches, but probably not... ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3036-3048 + if (!Value.empty()) + // FIXME: Add support for DWARF64. + Asm->OutStreamer->emitSymbolValue( + this->InfoHolder.getStringPool() + .getEntry(*Asm, (Name + " " + Value).str()) + .getSymbol(), + /*Size=*/4); ---------------- Might be nice to refactor this in both the original codepath and the new codepath you're adding (either before or after this commit) to compute the string once & share the rest of this expression.. ``` std::string Str = Value.empty() ? Name.str() : (Name + ' ' + Value).str(); Asm->OutStreamer->emitSymbol(this->InfoHolder.getStringPool().getEntry(*Asm, Str).getSymbol(), 4); ``` ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3088-3091 + dwarf::DW_MACRO_end_file, [&](unsigned Form) { + return (getDwarfVersion() >= 5) + ? dwarf::MacroString(Form) + : dwarf::GnuMacroString(Form); ---------------- Looks like maybe this could skip the std::function_ref, and do this: ``` emitMacroFileImpl(F, U, dwarf::DW_MACRO_start_file, dwarf::DW_MACRO_end_file, (getDwarfVersion() >= 5) ? 
dwarf::MacroString : dwarf::GnuMacroString); ``` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Thu Jul 9 15:31:53 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:31:53 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: arsenm updated this revision to Diff 276854. arsenm added a comment. Update documentation CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 Files: llvm/docs/AMDGPUUsage.rst llvm/include/llvm/Support/AMDGPUMetadata.h llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp llvm/lib/Support/AMDGPUMetadata.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82818.276854.patch Type: text/x-patch Size: 191020 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:41:58 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:41:58 +0000 (UTC) Subject: [PATCH] D83518: IR: Define byref attribute Message-ID: arsenm created this revision. arsenm added reviewers: rjmccall, jdoerfert, efriedma, t-tye, yaxunl, scott.linder, rnk, spatel, lebedev.ri, nlopes, fhahn, hfinkel, Anastasia. Herald added subscribers: dexonsmith, steven_wu, hiraditya, wdng. Herald added a project: LLVM. This includes the base IR changes, and some tests for places where it should be treated similarly to byval. Codegen support will be in a future patch. This implements the variant only capturing the size, unlike the type in the original RFC. This also does not mandate the align attribute be present. If the align attribute is not present, an alignment of 1 is assumed. https://reviews.llvm.org/D83518 Files: llvm/docs/LangRef.rst llvm/docs/ReleaseNotes.rst llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/include/llvm/IR/Argument.h llvm/include/llvm/IR/Attributes.h llvm/include/llvm/IR/Attributes.td llvm/include/llvm/IR/Function.h llvm/lib/Analysis/MemoryBuiltins.cpp llvm/lib/AsmParser/LLLexer.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/AsmParser/LLParser.h llvm/lib/AsmParser/LLToken.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/IR/AttributeImpl.h llvm/lib/IR/Attributes.cpp llvm/lib/IR/Function.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Transforms/Utils/CodeExtractor.cpp llvm/test/Assembler/byref-parse-error-0.ll llvm/test/Assembler/byref-parse-error-1.ll llvm/test/Assembler/byref-parse-error-10.ll llvm/test/Assembler/byref-parse-error-2.ll llvm/test/Assembler/byref-parse-error-3.ll llvm/test/Assembler/byref-parse-error-4.ll llvm/test/Assembler/byref-parse-error-5.ll llvm/test/Assembler/byref-parse-error-6.ll 
llvm/test/Assembler/byref-parse-error-7.ll llvm/test/Assembler/byref-parse-error-8.ll llvm/test/Assembler/byref-parse-error-9.ll llvm/test/Bitcode/attributes.ll llvm/test/CodeGen/X86/byref.ll llvm/test/Instrumentation/AddressSanitizer/byref-args.ll llvm/test/Transforms/DeadArgElim/byref.ll llvm/test/Transforms/Inline/byref-align.ll llvm/test/Transforms/LowerConstantIntrinsics/objectsize_basic.ll llvm/test/Verifier/byref.ll llvm/unittests/IR/VerifierTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83518.276856.patch Type: text/x-patch Size: 41303 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:43:07 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:43:07 +0000 (UTC) Subject: [PATCH] D83338: [PowerPC][Power10] Implemented Vector Shift Builtins In-Reply-To: References: Message-ID: <0477a55aca3de69040ce49df62fb499e@localhost.localdomain> amyk requested changes to this revision. amyk added a comment. This revision now requires changes to proceed. This will need to be rebased against your 2608 instruction definitions patch. But yes, I believe you are missing the clang and llc test case for this patch. Requesting changes due to missing tests. 
================ Comment at: clang/lib/Headers/altivec.h:17099 + +/* vector shifts for quadwords */ +static __inline__ vector unsigned __int128 __ATTRS_o_ai ---------------- `/* vs[l | r | raq] */` (with a new line after the comment) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83338/new/ https://reviews.llvm.org/D83338 From llvm-commits at lists.llvm.org Thu Jul 9 15:50:58 2020 From: llvm-commits at lists.llvm.org (Pete Steinfeld via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:50:58 +0000 (UTC) Subject: [PATCH] D83491: [flang] Fix a crash when creating generics from a copy In-Reply-To: References: Message-ID: <706c992543b863fc51dfcc33ddc1b152@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG85d9745c83a1: [flang] Fix a crash when creating generics from a copy (authored by PeteSteinfeld). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83491/new/ https://reviews.llvm.org/D83491 Files: flang/include/flang/Semantics/symbol.h flang/lib/Semantics/symbol.cpp flang/test/Semantics/resolve53.f90 Index: flang/test/Semantics/resolve53.f90 =================================================================== --- flang/test/Semantics/resolve53.f90 +++ flang/test/Semantics/resolve53.f90 @@ -457,3 +457,26 @@ integer :: i, j end end + +module m20 + interface operator(.not.) + real function f(x) + character(*),intent(in) :: x + end function + end interface + interface operator(+) + procedure f + end interface +end module + +subroutine s1() + use m20 + interface operator(.not.) 
+ !ERROR: Procedure 'f' is already specified in generic 'operator(.not.)' + procedure f + end interface + interface operator(+) + !ERROR: Procedure 'f' is already specified in generic 'operator(+)' + procedure f + end interface +end subroutine s1 Index: flang/lib/Semantics/symbol.cpp =================================================================== --- flang/lib/Semantics/symbol.cpp +++ flang/lib/Semantics/symbol.cpp @@ -150,9 +150,6 @@ return *this; } -GenericDetails::GenericDetails(const SymbolVector &specificProcs) - : specificProcs_{specificProcs} {} - void GenericDetails::AddSpecificProc( const Symbol &proc, SourceName bindingName) { specificProcs_.push_back(proc); @@ -186,6 +183,8 @@ } void GenericDetails::CopyFrom(const GenericDetails &from) { + CHECK(specificProcs_.size() == bindingNames_.size()); + CHECK(from.specificProcs_.size() == from.bindingNames_.size()); if (from.specific_) { CHECK(!specific_ || specific_ == from.specific_); specific_ = from.specific_; @@ -194,11 +193,13 @@ CHECK(!derivedType_ || derivedType_ == from.derivedType_); derivedType_ = from.derivedType_; } - for (const Symbol &symbol : from.specificProcs_) { + for (std::size_t i{0}; i < from.specificProcs_.size(); ++i) { if (std::find_if(specificProcs_.begin(), specificProcs_.end(), - [&](const Symbol &mySymbol) { return &mySymbol == &symbol; }) == - specificProcs_.end()) { - specificProcs_.push_back(symbol); + [&](const Symbol &mySymbol) { + return &mySymbol == &*from.specificProcs_[i]; + }) == specificProcs_.end()) { + specificProcs_.push_back(from.specificProcs_[i]); + bindingNames_.push_back(from.bindingNames_[i]); } } } Index: flang/include/flang/Semantics/symbol.h =================================================================== --- flang/include/flang/Semantics/symbol.h +++ flang/include/flang/Semantics/symbol.h @@ -423,7 +423,6 @@ class GenericDetails { public: GenericDetails() {} - GenericDetails(const SymbolVector &specificProcs); GenericKind kind() const { return kind_; } 
void set_kind(GenericKind kind) { kind_ = kind; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83491.276859.patch Type: text/x-patch Size: 2721 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:52:15 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:52:15 +0000 (UTC) Subject: [PATCH] D83519: [NewPM] Support optnone under new pass manager Message-ID: aeubanks created this revision. Herald added a reviewer: bollu. Herald added subscribers: llvm-commits, cfe-commits, jfb, dexonsmith, steven_wu, hiraditya. Herald added projects: clang, LLVM. This uses pass instrumentation callbacks to skip optional passes. PassInfoMixin now declares that passes inheriting from it are by default optional. Using RequiredPassInfoMixin overrides the pass to be required. The new OptNoneInstrumentation is part of StandardInstrumentations. The feature of skipping optional passes for optnone functions under NPM is gated on a -enable-npm-optnone flag. Currently it is by default false. That is because we still need to mark all required passes to be required. Otherwise optnone functions will start behaving incorrectly. After that is done in following changes, we can remove the flag and always enable this. All adaptors/managers must be required, since the pass(es) they are wrapping may be required. In the future, opt-bisect will use this same mechanism of determining which passes are required/optional. Depends on D83498 .
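To illustrate the idea in the description above (a before-pass callback that lets required passes through and skips optional ones on optnone functions), here is a rough, self-contained sketch. It is a toy model only: `ToyPass`, `ToyFunction`, `ToyPassManager` and `registerOptNoneCallback` are hypothetical names invented for this example and do not correspond to the actual LLVM classes or signatures in the patch.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins; none of these are the real LLVM types.
struct ToyFunction {
  std::string name;
  bool optNone = false;  // models the optnone function attribute
};

struct ToyPass {
  std::string name;
  bool required = false;  // models a pass that must run even on optnone code
};

// A before-pass callback returns false to ask the manager to skip the pass.
using BeforePassCallback =
    std::function<bool(const ToyPass &, const ToyFunction &)>;

class ToyPassManager {
public:
  void addPass(ToyPass pass) { passes_.push_back(std::move(pass)); }

  void registerBeforePassCallback(BeforePassCallback callback) {
    callbacks_.push_back(std::move(callback));
  }

  // Returns the names of the passes that actually ran on fn.
  std::vector<std::string> run(const ToyFunction &fn) const {
    std::vector<std::string> ran;
    for (const ToyPass &pass : passes_) {
      bool shouldRun = true;
      // Every callback is consulted; any single "false" skips the pass.
      for (const BeforePassCallback &callback : callbacks_)
        shouldRun = callback(pass, fn) && shouldRun;
      if (shouldRun)
        ran.push_back(pass.name);
    }
    return ran;
  }

private:
  std::vector<ToyPass> passes_;
  std::vector<BeforePassCallback> callbacks_;
};

// Optional passes are skipped on optnone functions; required passes always run.
inline void registerOptNoneCallback(ToyPassManager &pm) {
  pm.registerBeforePassCallback(
      [](const ToyPass &pass, const ToyFunction &fn) {
        return pass.required || !fn.optNone;
      });
}
```

In this toy model, forgetting to mark a pass as required means it is silently skipped for optnone functions, which is the kind of misbehaviour the description gives as the reason for keeping the feature behind a flag for now.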
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83519 Files: clang/lib/CodeGen/BackendUtil.cpp llvm/include/llvm/Analysis/CGSCCPassManager.h llvm/include/llvm/IR/PassInstrumentation.h llvm/include/llvm/IR/PassManager.h llvm/include/llvm/IR/PassManagerInternal.h llvm/include/llvm/Passes/StandardInstrumentations.h llvm/include/llvm/Transforms/Scalar/LoopPassManager.h llvm/lib/IR/PassTimingInfo.cpp llvm/lib/LTO/LTOBackend.cpp llvm/lib/Passes/StandardInstrumentations.cpp llvm/test/Feature/optnone-opt.ll llvm/tools/opt/NewPMDriver.cpp llvm/unittests/IR/PassBuilderCallbacksTest.cpp polly/include/polly/ScopPass.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83519.276860.patch Type: text/x-patch Size: 23684 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:55:13 2020 From: llvm-commits at lists.llvm.org (Peter Klausler via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:55:13 +0000 (UTC) Subject: [PATCH] D83515: [flang] Fix frontend build with -DBUILD_SHARED_LIBS=On In-Reply-To: References: Message-ID: <8fe4e7a33d167f0bdf3ff8ccca2db8ff@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG8a8bb078a3c8: [flang] Fix frontend build with -DBUILD_SHARED_LIBS=On (authored by klausler). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83515/new/ https://reviews.llvm.org/D83515 Files: flang/include/flang/Common/indirection.h flang/include/flang/Evaluate/call.h flang/include/flang/Evaluate/expression.h flang/include/flang/Evaluate/tools.h flang/include/flang/Parser/parse-tree.h flang/include/flang/Semantics/expression.h flang/include/flang/Semantics/tools.h flang/lib/Evaluate/CMakeLists.txt flang/lib/Evaluate/call.cpp flang/lib/Evaluate/expression.cpp flang/lib/Evaluate/tools.cpp flang/lib/Lower/CMakeLists.txt flang/lib/Parser/parse-tree.cpp flang/lib/Semantics/CMakeLists.txt flang/lib/Semantics/expression.cpp flang/lib/Semantics/tools.cpp flang/tools/f18-parse-demo/stub-evaluate.cpp flang/tools/f18/CMakeLists.txt flang/unittests/Evaluate/CMakeLists.txt flang/unittests/Runtime/CMakeLists.txt -------------- next part -------------- A non-text attachment was scrubbed... Name: D83515.276861.patch Type: text/x-patch Size: 16370 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 15:55:39 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 22:55:39 +0000 (UTC) Subject: [PATCH] D83520: [llvm-libtool-darwin] Allow flatenning archives Message-ID: sameerarora101 created this revision. sameerarora101 added reviewers: alexshap, Ktwu, smeenai, jhenderson, MaskRay, mtrent. Herald added subscribers: llvm-commits, mgorny. Herald added a project: LLVM. Add support for flattening archives while creating static libraries. As per cctools' libtool's behavior, llvm-libtool-darwin does not flatten archives recursively. Depends on D83002 . 
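As a rough illustration of the "flatten, but not recursively" behaviour described above, the following toy sketch expands each input archive by exactly one level. The types and helpers (`ToyInput`, `object`, `archive`, `collectMembers`) are hypothetical and are not the llvm-libtool-darwin implementation; in particular, treating a nested archive as a single opaque member is an assumption of this sketch, not something the description states.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy model of a command-line input: an object file or an archive of inputs.
struct ToyInput {
  std::string name;
  bool isArchive = false;
  std::vector<ToyInput> members;  // only meaningful when isArchive is true
};

// Convenience constructors for the toy model.
inline ToyInput object(std::string name) {
  return ToyInput{std::move(name), false, {}};
}

inline ToyInput archive(std::string name, std::vector<ToyInput> members) {
  return ToyInput{std::move(name), true, std::move(members)};
}

// Collect the member names for the output static library.
// Archives given as inputs are expanded one level; anything nested deeper
// is kept as a single member (an assumption of this sketch).
inline std::vector<std::string>
collectMembers(const std::vector<ToyInput> &inputs) {
  std::vector<std::string> out;
  for (const ToyInput &input : inputs) {
    if (!input.isArchive) {
      out.push_back(input.name);
      continue;
    }
    for (const ToyInput &member : input.members)
      out.push_back(member.name);  // no recursion into nested archives
  }
  return out;
}
```

For example, passing `main.o` together with an archive that holds `a.o` and a nested `inner.a` yields `main.o`, `a.o` and `inner.a`; the contents of `inner.a` are not pulled up.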
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83520 Files: llvm/test/tools/llvm-libtool-darwin/Inputs/invalid-archive.lib llvm/test/tools/llvm-libtool-darwin/invalid-archive.test llvm/test/tools/llvm-libtool-darwin/valid-archive.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83520.276862.patch Type: text/x-patch Size: 7958 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 16:03:08 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:03:08 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <5dc0aa1e56ceb3e25f68b1d50a78d909@localhost.localdomain> ychen added inline comments. ================ Comment at: llvm/lib/Passes/PassBuilder.cpp:300 -namespace { +namespace llvm { ---------------- aeubanks wrote: > ychen wrote: > > aeubanks wrote: > > > ychen wrote: > > > > How about keeping this local? These are only for testing. > > > Do you mean keeping this in an anonymous namespace? > > > As mentioned in the commit, that makes the printed name messed up. > > Add some regex in lit tests? > > Running pass: {{.*}}NoOpModulePass > > > I don't see any reason to distinguish it from other passes, even if it's only used for testing. It's a useful tool for sanity checks. Having a `(anonymous namespace)` printed anywhere doesn't look good. > And it'd require updating more tests than I really want to. If we really treat them as normal passes, they should be moved to a header file. If we treat them as testing tools only, we put them in .cpp file in an anonymous namespace. It looks confusing to be not in header file and in `llvm` namespace. 
Or perhaps, we don't touch these no-op passes, and add a comment saying we're overriding the `name()` computing here to make tests cleaner? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Thu Jul 9 16:13:20 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via llvm-commits) Date: Thu, 09 Jul 2020 16:13:20 -0700 (PDT) Subject: [llvm] 56ae2ce - [AArch64][SVE] Add lowering for llvm.fma. Message-ID: <5f07a490.1c69fb81.aa776.a2c1@mx.google.com> Author: Eli Friedman Date: 2020-07-09T16:12:41-07:00 New Revision: 56ae2cebcdf7884470212ed2a04c1bce73d5c996 URL: https://github.com/llvm/llvm-project/commit/56ae2cebcdf7884470212ed2a04c1bce73d5c996 DIFF: https://github.com/llvm/llvm-project/commit/56ae2cebcdf7884470212ed2a04c1bce73d5c996.diff LOG: [AArch64][SVE] Add lowering for llvm.fma. This is currently bare-bones; we aren't taking advantage of any of the FMA variant instructions. But it's enough to at least generate code. 
Differential Revision: https://reviews.llvm.org/D83444 Added: Modified: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td llvm/test/CodeGen/AArch64/sve-fp.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index c3735c8784ca..1a3bbaf1832d 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -948,6 +948,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM, setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom); setOperationAction(ISD::SPLAT_VECTOR, VT, Custom); setOperationAction(ISD::SELECT, VT, Custom); + setOperationAction(ISD::FMA, VT, Custom); } } @@ -1470,6 +1471,7 @@ const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const { MAKE_CASE(AArch64ISD::FADD_PRED) MAKE_CASE(AArch64ISD::FADDA_PRED) MAKE_CASE(AArch64ISD::FADDV_PRED) + MAKE_CASE(AArch64ISD::FMA_PRED) MAKE_CASE(AArch64ISD::FMAXV_PRED) MAKE_CASE(AArch64ISD::FMAXNMV_PRED) MAKE_CASE(AArch64ISD::FMINV_PRED) @@ -3455,6 +3457,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op, return LowerF128Call(Op, DAG, RTLIB::SUB_F128); case ISD::FMUL: return LowerF128Call(Op, DAG, RTLIB::MUL_F128); + case ISD::FMA: + return LowerToPredicatedOp(Op, DAG, AArch64ISD::FMA_PRED); case ISD::FDIV: return LowerF128Call(Op, DAG, RTLIB::DIV_F128); case ISD::FP_ROUND: diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h index 4b395acd816d..1be44797aac7 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -77,6 +77,7 @@ enum NodeType : unsigned { FADD_PRED, SDIV_PRED, UDIV_PRED, + FMA_PRED, SMIN_MERGE_OP1, UMIN_MERGE_OP1, SMAX_MERGE_OP1, diff --git 
a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td index 825082230d1f..28a54e6f7d79 100644 --- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td +++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td @@ -167,9 +167,15 @@ def SDT_AArch64Arith : SDTypeProfile<1, 3, [ SDTCVecEltisVT<1,i1>, SDTCisSameAs<2,3> ]>; +def SDT_AArch64FMA : SDTypeProfile<1, 4, [ + SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisVec<3>, SDTCisVec<4>, + SDTCVecEltisVT<1,i1>, SDTCisSameAs<2,3>, SDTCisSameAs<3,4> +]>; + // Predicated operations with the result of inactive lanes being unspecified. def AArch64add_p : SDNode<"AArch64ISD::ADD_PRED", SDT_AArch64Arith>; def AArch64fadd_p : SDNode<"AArch64ISD::FADD_PRED", SDT_AArch64Arith>; +def AArch64fma_p : SDNode<"AArch64ISD::FMA_PRED", SDT_AArch64FMA>; def AArch64sdiv_p : SDNode<"AArch64ISD::SDIV_PRED", SDT_AArch64Arith>; def AArch64udiv_p : SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64Arith>; @@ -393,6 +399,16 @@ let Predicates = [HasSVE] in { defm FNMAD_ZPmZZ : sve_fp_3op_p_zds_b<0b10, "fnmad", int_aarch64_sve_fnmad>; defm FNMSB_ZPmZZ : sve_fp_3op_p_zds_b<0b11, "fnmsb", int_aarch64_sve_fnmsb>; + // Add patterns for FMA where disabled lanes are undef. + // FIXME: Implement a pseudo so we can choose a better instruction after + // regalloc. 
+ def : Pat<(nxv8f16 (AArch64fma_p nxv8i1:$P, nxv8f16:$Op1, nxv8f16:$Op2, nxv8f16:$Op3)), + (FMLA_ZPmZZ_H $P, $Op3, $Op1, $Op2)>; + def : Pat<(nxv4f32 (AArch64fma_p nxv4i1:$P, nxv4f32:$Op1, nxv4f32:$Op2, nxv4f32:$Op3)), + (FMLA_ZPmZZ_S $P, $Op3, $Op1, $Op2)>; + def : Pat<(nxv2f64 (AArch64fma_p nxv2i1:$P, nxv2f64:$Op1, nxv2f64:$Op2, nxv2f64:$Op3)), + (FMLA_ZPmZZ_D $P, $Op3, $Op1, $Op2)>; + defm FTMAD_ZZI : sve_fp_ftmad<"ftmad", int_aarch64_sve_ftmad_x>; defm FMLA_ZZZI : sve_fp_fma_by_indexed_elem<0b0, "fmla", int_aarch64_sve_fmla_lane>; diff --git a/llvm/test/CodeGen/AArch64/sve-fp.ll b/llvm/test/CodeGen/AArch64/sve-fp.ll index c7cf917b2e64..e3c0ba72bda1 100644 --- a/llvm/test/CodeGen/AArch64/sve-fp.ll +++ b/llvm/test/CodeGen/AArch64/sve-fp.ll @@ -85,6 +85,56 @@ define @fmul_d( %a, %res } +define @fma_half( %a, %b, %c) { +; CHECK-LABEL: fma_half: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.h +; CHECK-NEXT: fmla z2.h, p0/m, z0.h, z1.h +; CHECK-NEXT: mov z0.d, z2.d +; CHECK-NEXT: ret + %r = call @llvm.fma.nxv8f16( %a, %b, %c) + ret %r +} +define @fma_float( %a, %b, %c) { +; CHECK-LABEL: fma_float: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.s +; CHECK-NEXT: fmla z2.s, p0/m, z0.s, z1.s +; CHECK-NEXT: mov z0.d, z2.d +; CHECK-NEXT: ret + %r = call @llvm.fma.nxv4f32( %a, %b, %c) + ret %r +} +define @fma_double_1( %a, %b, %c) { +; CHECK-LABEL: fma_double_1: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.d +; CHECK-NEXT: fmla z2.d, p0/m, z0.d, z1.d +; CHECK-NEXT: mov z0.d, z2.d +; CHECK-NEXT: ret + %r = call @llvm.fma.nxv2f64( %a, %b, %c) + ret %r +} +define @fma_double_2( %a, %b, %c) { +; CHECK-LABEL: fma_double_2: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.d +; CHECK-NEXT: fmla z2.d, p0/m, z1.d, z0.d +; CHECK-NEXT: mov z0.d, z2.d +; CHECK-NEXT: ret + %r = call @llvm.fma.nxv2f64( %b, %a, %c) + ret %r +} +define @fma_double_3( %a, %b, %c) { +; CHECK-LABEL: fma_double_3: +; CHECK: // %bb.0: +; CHECK-NEXT: ptrue p0.d +; CHECK-NEXT: fmla z0.d, p0/m, z2.d, z1.d +; CHECK-NEXT: ret 
+ %r = call @llvm.fma.nxv2f64( %c, %b, %a) + ret %r +} + define @frecps_h( %a, %b) { ; CHECK-LABEL: frecps_h: ; CHECK: // %bb.0: @@ -166,5 +216,9 @@ declare @llvm.aarch64.sve.frsqrts.x.nxv8f16( @llvm.aarch64.sve.frsqrts.x.nxv4f32(, ) declare @llvm.aarch64.sve.frsqrts.x.nxv2f64(, ) +declare @llvm.fma.nxv2f64(, , ) +declare @llvm.fma.nxv4f32(, , ) +declare @llvm.fma.nxv8f16(, , ) + ; Function Attrs: nounwind readnone declare double @llvm.aarch64.sve.faddv.nxv2f64(, ) #2 From llvm-commits at lists.llvm.org Thu Jul 9 16:13:33 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:13:33 +0000 (UTC) Subject: [PATCH] D83444: [AArch64][SVE] Add lowering for llvm.fma. In-Reply-To: References: Message-ID: <2350856758e96217d5bf3270ca6ca729@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG56ae2cebcdf7: [AArch64][SVE] Add lowering for llvm.fma. (authored by efriedma). Changed prior to commit: https://reviews.llvm.org/D83444?vs=276606&id=276863#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83444/new/ https://reviews.llvm.org/D83444 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td llvm/test/CodeGen/AArch64/sve-fp.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83444.276863.patch Type: text/x-patch Size: 6719 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 16:15:09 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:15:09 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: <88370d3977f0d2e7fe3d5ee0eafbd143@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. 
This revision is now accepted and ready to land.

LGTM

Repository: rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82085/new/
https://reviews.llvm.org/D82085

From llvm-commits at lists.llvm.org Thu Jul 9 16:20:57 2020
From: llvm-commits at lists.llvm.org (Alexey Lapshin via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:20:57 +0000 (UTC)
Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls.
In-Reply-To:
References:
Message-ID: <477e11dfd7c23acd13219bcdbbf89a76@localhost.localdomain>

avl added a comment.

Thank you for the review.

Repository: rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82085/new/
https://reviews.llvm.org/D82085

From llvm-commits at lists.llvm.org Thu Jul 9 16:22:30 2020
From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:22:30 +0000 (UTC)
Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes
In-Reply-To:
References:
Message-ID: <25c51f3bc44f7191f8a0b350aa4a6920@localhost.localdomain>

aeubanks updated this revision to Diff 276865.
aeubanks added a comment.

Put passes/analyses in PassBuilder.cpp back into anonymous namespace, override name

Repository: rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83498/new/
https://reviews.llvm.org/D83498

Files:
  llvm/include/llvm/IR/IRPrintingPasses.h
  llvm/lib/IR/LegacyPassManager.cpp
  llvm/lib/Passes/PassBuilder.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83498.276865.patch
Type: text/x-patch
Size: 13697 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 16:22:49 2020
From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:22:49 +0000 (UTC)
Subject: [PATCH] D83521: [NFC] Extract the code to write instr profile into function writeInstrProfile
Message-ID:

wmi created this revision.
wmi added a reviewer: davidxl. Herald added a project: LLVM. So that the function writeInstrProfile can be used in other places, for example in https://reviews.llvm.org/D81981. Repository: rL LLVM https://reviews.llvm.org/D83521 Files: llvm/tools/llvm-profdata/llvm-profdata.cpp Index: llvm/tools/llvm-profdata/llvm-profdata.cpp =================================================================== --- llvm/tools/llvm-profdata/llvm-profdata.cpp +++ llvm/tools/llvm-profdata/llvm-profdata.cpp @@ -291,6 +291,22 @@ }); } +static void writeInstrProfile(StringRef OutputFilename, + ProfileFormat OutputFormat, + InstrProfWriter &Writer) { + std::error_code EC; + raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None); + if (EC) + exitWithErrorCode(EC, OutputFilename); + + if (OutputFormat == PF_Text) { + if (Error E = Writer.writeText(Output)) + exitWithError(std::move(E)); + } else { + Writer.write(Output); + } +} + static void mergeInstrProfile(const WeightedFileVector &Inputs, SymbolRemapper *Remapper, StringRef OutputFilename, @@ -366,18 +382,7 @@ (NumErrors > 0 && FailMode == failIfAnyAreInvalid)) exitWithError("No profiles could be merged."); - std::error_code EC; - raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None); - if (EC) - exitWithErrorCode(EC, OutputFilename); - - InstrProfWriter &Writer = Contexts[0]->Writer; - if (OutputFormat == PF_Text) { - if (Error E = Writer.writeText(Output)) - exitWithError(std::move(E)); - } else { - Writer.write(Output); - } + writeInstrProfile(OutputFilename, OutputFormat, Contexts[0]->Writer); } /// Make a copy of the given function samples with all symbol names remapped -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83521.276864.patch
Type: text/x-patch
Size: 1563 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 16:25:21 2020
From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:25:21 +0000 (UTC)
Subject: [PATCH] D83521: [NFC] Extract the code to write instr profile into function writeInstrProfile
In-Reply-To:
References:
Message-ID:

davidxl accepted this revision.
davidxl added a comment.
This revision is now accepted and ready to land.

lgtm

Repository: rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83521/new/
https://reviews.llvm.org/D83521

From llvm-commits at lists.llvm.org Thu Jul 9 16:26:08 2020
From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:26:08 +0000 (UTC)
Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner
In-Reply-To:
References:
Message-ID: <5da2399da0566d90098475558c66a77b@localhost.localdomain>

mtrofin updated this revision to Diff 276868.
mtrofin marked 5 inline comments as done.
mtrofin added a comment.
feedback Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 Files: llvm/CMakeLists.txt llvm/include/llvm/Analysis/InlineSizeEstimatorAnalysis.h llvm/include/llvm/Analysis/Utils/TFUtils.h llvm/lib/Analysis/CMakeLists.txt llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp llvm/lib/Analysis/TFUtils.cpp llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassRegistry.def llvm/unittests/Analysis/CMakeLists.txt llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/saved_model.pb llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.data-00000-of-00001 llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.index llvm/unittests/Analysis/TFUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82817.276868.patch Type: text/x-patch Size: 35891 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 16:26:24 2020 From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:26:24 +0000 (UTC) Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner In-Reply-To: References: Message-ID: <3bacb4d2d90bc07ceef28c5e9a81f325@localhost.localdomain> mtrofin added inline comments. ================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:263 + return None; + auto Features = IRToNativeSizeLearning::getFunctionFeatures( + const_cast(F), FAM); ---------------- davidxl wrote: > Can we make getFunctionFeatures directly return the filled tensor -- or at least provide a wrapper? There is no need to expose the TF details with the inline sequence here. I'm planning on reusing getFunctionFeatures for the other part of this functionality - extracting them to provide a training data set for the ir2native model. 
================ Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:268 + std::vector Output{nullptr}; + if (!Evaluator->evaluate(Output)) + return None; ---------------- davidxl wrote: > Code from line 268 to 271 can probably be wrapped in a single wrapper function to hide TF details including Tensor delete Ya, and it looks nicer - thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 From llvm-commits at lists.llvm.org Thu Jul 9 16:27:02 2020 From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:27:02 +0000 (UTC) Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner In-Reply-To: References: Message-ID: <87127caad34709d11465f0fcfa72d872@localhost.localdomain> mtrofin marked 2 inline comments as done. mtrofin added inline comments. ================ Comment at: llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp:97 +#if LLVM_HAVE_TF_API + EXPECT_GT(*SizeEstimate, 0); +#else ---------------- davidxl wrote: > why is the result 0? It's not, it's greater than 0. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 From llvm-commits at lists.llvm.org Thu Jul 9 16:27:54 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:27:54 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <590f8bd90973208da0d8c20151502123@localhost.localdomain> asbirlea accepted this revision. asbirlea added inline comments. ================ Comment at: llvm/lib/Passes/PassBuilder.cpp:300 -namespace { +namespace llvm { ---------------- ychen wrote: > aeubanks wrote: > > ychen wrote: > > > aeubanks wrote: > > > > ychen wrote: > > > > > How about keeping this local? These are only for testing. 
> > > > Do you mean keeping this in an anonymous namespace? > > > > As mentioned in the commit, that makes the printed name messed up. > > > Add some regex in lit tests? > > > Running pass: {{.*}}NoOpModulePass > > > > > I don't see any reason to distinguish it from other passes, even if it's only used for testing. It's a useful tool for sanity checks. Having a `(anonymous namespace)` printed anywhere doesn't look good. > > And it'd require updating more tests than I really want to. > If we really treat them as normal passes, they should be moved to a header file. If we treat them as testing tools only, we put them in .cpp file in an anonymous namespace. It looks confusing to be not in header file and in `llvm` namespace. > > Or perhaps, we don't touch these no-op passes, and add a comment saying we're overriding the `name()` computing here to make tests cleaner? IMO a comment clarifying these are a special case will work here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Thu Jul 9 16:30:14 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:30:14 +0000 (UTC) Subject: [PATCH] D81172: [AMDGPU] Implement hardware bug workaround for image instructions In-Reply-To: References: Message-ID: <899bb9c953f2b484712d6671fba9dc04@localhost.localdomain> arsenm added inline comments. 
================ Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3221-3222 + + if (ImageStore && ST.hasImageStoreD16Bug()) + { + SmallVector PackedRegs; ---------------- Brace formatting ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3226-3228 + // TODO Handle v3f16 + if (StoreVT.getNumElements() == 3) + return Reg; ---------------- There's no obstacle to handling v3 here, it should work in the other cases Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81172/new/ https://reviews.llvm.org/D81172 From llvm-commits at lists.llvm.org Thu Jul 9 16:30:46 2020 From: llvm-commits at lists.llvm.org (Wei Mi via llvm-commits) Date: Thu, 09 Jul 2020 16:30:46 -0700 (PDT) Subject: [llvm] 78fe6a3 - [NFC] Extract the code to write instr profile into function writeInstrProfile Message-ID: <5f07a8a6.1c69fb81.85c9e.9e77@mx.google.com> Author: Wei Mi Date: 2020-07-09T16:30:28-07:00 New Revision: 78fe6a3ee244cf1b590cd2a169c81ec00de08cb2 URL: https://github.com/llvm/llvm-project/commit/78fe6a3ee244cf1b590cd2a169c81ec00de08cb2 DIFF: https://github.com/llvm/llvm-project/commit/78fe6a3ee244cf1b590cd2a169c81ec00de08cb2.diff LOG: [NFC] Extract the code to write instr profile into function writeInstrProfile So that the function writeInstrProfile can be used in other places. 
Differential Revision: https://reviews.llvm.org/D83521 Added: Modified: llvm/tools/llvm-profdata/llvm-profdata.cpp Removed: ################################################################################ diff --git a/llvm/tools/llvm-profdata/llvm-profdata.cpp b/llvm/tools/llvm-profdata/llvm-profdata.cpp index 1eb4bc66d60c..843f072a61c3 100644 --- a/llvm/tools/llvm-profdata/llvm-profdata.cpp +++ b/llvm/tools/llvm-profdata/llvm-profdata.cpp @@ -291,6 +291,22 @@ static void mergeWriterContexts(WriterContext *Dst, WriterContext *Src) { }); } +static void writeInstrProfile(StringRef OutputFilename, + ProfileFormat OutputFormat, + InstrProfWriter &Writer) { + std::error_code EC; + raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None); + if (EC) + exitWithErrorCode(EC, OutputFilename); + + if (OutputFormat == PF_Text) { + if (Error E = Writer.writeText(Output)) + exitWithError(std::move(E)); + } else { + Writer.write(Output); + } +} + static void mergeInstrProfile(const WeightedFileVector &Inputs, SymbolRemapper *Remapper, StringRef OutputFilename, @@ -366,18 +382,7 @@ static void mergeInstrProfile(const WeightedFileVector &Inputs, (NumErrors > 0 && FailMode == failIfAnyAreInvalid)) exitWithError("No profiles could be merged."); - std::error_code EC; - raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None); - if (EC) - exitWithErrorCode(EC, OutputFilename); - - InstrProfWriter &Writer = Contexts[0]->Writer; - if (OutputFormat == PF_Text) { - if (Error E = Writer.writeText(Output)) - exitWithError(std::move(E)); - } else { - Writer.write(Output); - } + writeInstrProfile(OutputFilename, OutputFormat, Contexts[0]->Writer); } /// Make a copy of the given function samples with all symbol names remapped From llvm-commits at lists.llvm.org Thu Jul 9 16:31:00 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:31:00 +0000 (UTC) Subject: [PATCH] D83521: [NFC] Extract the code to write instr profile 
into function writeInstrProfile In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG78fe6a3ee244: [NFC] Extract the code to write instr profile into function writeInstrProfile (authored by wmi). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83521/new/ https://reviews.llvm.org/D83521 Files: llvm/tools/llvm-profdata/llvm-profdata.cpp Index: llvm/tools/llvm-profdata/llvm-profdata.cpp =================================================================== --- llvm/tools/llvm-profdata/llvm-profdata.cpp +++ llvm/tools/llvm-profdata/llvm-profdata.cpp @@ -291,6 +291,22 @@ }); } +static void writeInstrProfile(StringRef OutputFilename, + ProfileFormat OutputFormat, + InstrProfWriter &Writer) { + std::error_code EC; + raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None); + if (EC) + exitWithErrorCode(EC, OutputFilename); + + if (OutputFormat == PF_Text) { + if (Error E = Writer.writeText(Output)) + exitWithError(std::move(E)); + } else { + Writer.write(Output); + } +} + static void mergeInstrProfile(const WeightedFileVector &Inputs, SymbolRemapper *Remapper, StringRef OutputFilename, @@ -366,18 +382,7 @@ (NumErrors > 0 && FailMode == failIfAnyAreInvalid)) exitWithError("No profiles could be merged."); - std::error_code EC; - raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None); - if (EC) - exitWithErrorCode(EC, OutputFilename); - - InstrProfWriter &Writer = Contexts[0]->Writer; - if (OutputFormat == PF_Text) { - if (Error E = Writer.writeText(Output)) - exitWithError(std::move(E)); - } else { - Writer.write(Output); - } + writeInstrProfile(OutputFilename, OutputFormat, Contexts[0]->Writer); } /// Make a copy of the given function samples with all symbol names remapped -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83521.276870.patch
Type: text/x-patch
Size: 1563 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Thu Jul 9 16:33:39 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:33:39 +0000 (UTC)
Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner
In-Reply-To:
References:
Message-ID: <17e96208556185319635855326e23c3b@localhost.localdomain>

MaskRay added a comment.

One point I want to raise: the model files are larger than 100KiB. Doing this once and keeping it stable for, say, 6 months is probably an acceptable pace. If folks keep iterating on the model and checking in other model files as a result on a more regular basis, I'd be more wary, as I am not sure other folks would accept this. llvm/llvm-test-suite might be a suitable place if you want to add large model files.

Repository: rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82817/new/
https://reviews.llvm.org/D82817

From llvm-commits at lists.llvm.org Thu Jul 9 16:41:32 2020
From: llvm-commits at lists.llvm.org (Alexis Perry-Holby via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:41:32 +0000 (UTC)
Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran
In-Reply-To:
References:
Message-ID:

AlexisPerry updated this revision to Diff 276871.
AlexisPerry added a comment.

Extended the flang driver options to include gfortran equivalents to pgf90-specific options.
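
As a rough illustration of what accepting gfortran spellings alongside the pgf90 ones amounts to, here is a minimal standalone sketch. It is not the f18.cpp code: `canonicalOption` and its table are hypothetical, and only the option spellings are taken from the diff for this revision.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of f18.cpp): maps a gfortran-style spelling
// to the pgf90-style option that the driver already recognized. Any other
// argument is returned unchanged.
std::string canonicalOption(const std::string &arg) {
  static const std::unordered_map<std::string, std::string> aliases = {
      {"-ffixed-form", "-Mfixed"},
      {"-ffree-form", "-Mfree"},
      {"-ffixed-line-length-132", "-Mextend"},
      {"-ffree-line-length-none", "-Munlimited"},
      {"-ffree-line-length-0", "-Munlimited"},
      {"-ffixed-line-length-none", "-Munlimited"},
      {"-ffixed-line-length-0", "-Munlimited"},
      {"-std=f95", "-Mstandard"},
      {"-std=f2003", "-Mstandard"},
      {"-std=f2008", "-Mstandard"},
      {"-std=legacy", "-Mstandard"},
  };
  auto it = aliases.find(arg);
  return it == aliases.end() ? arg : it->second;
}
```

The patch itself extends each `arg == ...` comparison in place rather than going through a table; the table form is used here only because it makes the pairs easy to scan.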
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 Files: flang/tools/f18/f18.cpp Index: flang/tools/f18/f18.cpp =================================================================== --- flang/tools/f18/f18.cpp +++ flang/tools/f18/f18.cpp @@ -446,15 +446,17 @@ args.pop_front(); } break; - } else if (arg == "-Mfixed") { + } else if (arg == "-Mfixed" || arg == "-ffixed-form") { driver.forcedForm = true; options.isFixedForm = true; - } else if (arg == "-Mfree") { + } else if (arg == "-Mfree" || arg == "-ffree-form") { driver.forcedForm = true; options.isFixedForm = false; - } else if (arg == "-Mextend") { + } else if (arg == "-Mextend" || arg == "-ffixed-line-length-132") { options.fixedFormColumns = 132; - } else if (arg == "-Munlimited") { + } else if (arg == "-Munlimited" || arg == "-ffree-line-length-none" || + arg == "-ffree-line-length-0" || arg == "-ffixed-line-length-none" || + arg == "-ffixed-line-length-0") { // For reparsing f18's -E output of fixed-form cooked character stream options.fixedFormColumns = 1000000; } else if (arg == "-Mbackslash") { @@ -463,7 +465,8 @@ } else if (arg == "-Mnobackslash") { options.features.Enable( Fortran::common::LanguageFeature::BackslashEscapes, true); - } else if (arg == "-Mstandard") { + } else if (arg == "-Mstandard" || arg == "-std=f95" || + arg == "-std=f2003" || arg == "-std=f2008" || arg == "-std=legacy") { driver.warnOnNonstandardUsage = true; } else if (arg == "-fopenmp") { options.features.Enable(Fortran::common::LanguageFeature::OpenMP); @@ -530,6 +533,8 @@ } else if (arg.substr(0, 2) == "-U") { options.predefinitions.emplace_back( arg.substr(2), std::optional{}); + } else if (arg == "-fdefault-double-8") { + defaultKinds.set_defaultRealKind(4); } else if (arg == "-r8" || arg == "-fdefault-real-8") { defaultKinds.set_defaultRealKind(8); } else if (arg == "-i8" || arg == "-fdefault-integer-8") { @@ -580,15 +585,17 @@ } else if (arg == 
"-help" || arg == "--help" || arg == "-?") { llvm::errs() << "f18 options:\n" - << " -Mfixed | -Mfree force the source form\n" - << " -Mextend 132-column fixed form\n" + << " -Mfixed | -Mfree | -ffixed-form | -ffree-form force the " + "source form\n" + << " -Mextend | -ffixed-line-length-132 132-column fixed form\n" << " -f[no-]backslash enable[disable] \\escapes in literals\n" << " -M[no]backslash disable[enable] \\escapes in literals\n" << " -Mstandard enable conformance warnings\n" + << " -std= enable conformance warnings\n" << " -fenable= enable a language feature\n" << " -fdisable= disable a language feature\n" - << " -r8 | -fdefault-real-8 | -i8 | -fdefault-integer-8 " - "change default kinds of intrinsic types\n" + << " -r8 | -fdefault-real-8 | -i8 | -fdefault-integer-8 | " + "-fdefault-double-8 change default kinds of intrinsic types\n" << " -Werror treat warnings as errors\n" << " -ed enable fixed form D lines\n" << " -E prescan & preprocess only\n" -------------- next part -------------- A non-text attachment was scrubbed... Name: D83488.276871.patch Type: text/x-patch Size: 3390 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 16:42:24 2020 From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:42:24 +0000 (UTC) Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner In-Reply-To: References: Message-ID: <15bdef6ea80ca3996c9ab19ef162b1b5@localhost.localdomain> mtrofin marked an inline comment as done. mtrofin added a comment. In D82817#2142936 , @MaskRay wrote: > One point I want to raise: the model files are larger than 100KiB. Doing this once and keeping it stable for, say, 6 months is a probably an acceptable pace. If folks keep iterating on the model and checking in other model files as a result in a more regular basis, I'd be more wary as I am not sure other folks may accept this. 
llvm/llvm-test-suite might be suitable place if you want to add large model files. That makes sense - we won't want to churn these other than for large new changes, so that should be "rarely" - but I'll take a look at the llvm-test-suite as a next step, too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 From llvm-commits at lists.llvm.org Thu Jul 9 16:49:43 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:49:43 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <1a9dc1899d75642696a609bf2d4389ca@localhost.localdomain> aeubanks updated this revision to Diff 276872. aeubanks added a comment. Update comment Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 Files: llvm/include/llvm/IR/IRPrintingPasses.h llvm/lib/IR/LegacyPassManager.cpp llvm/lib/Passes/PassBuilder.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83498.276872.patch Type: text/x-patch Size: 13793 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 16:52:06 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:52:06 +0000 (UTC) Subject: [PATCH] D83518: IR: Define byref attribute In-Reply-To: References: Message-ID: <638a010946afc246971cf7413cf2d299@localhost.localdomain> arsenm planned changes to this revision. arsenm added a comment. 
I just realized a reason why, in the ultimate end state, the actual pointee type is still useful.

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83518/new/
https://reviews.llvm.org/D83518

From llvm-commits at lists.llvm.org Thu Jul 9 16:53:50 2020
From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:53:50 +0000 (UTC)
Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran
In-Reply-To:
References:
Message-ID:

clementval added a comment.

I don't see the previous change made to set gfortran as default. Are they gone with your diff update?

Repository: rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83488/new/
https://reviews.llvm.org/D83488

From llvm-commits at lists.llvm.org Thu Jul 9 16:55:22 2020
From: llvm-commits at lists.llvm.org (Alexis Perry-Holby via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 23:55:22 +0000 (UTC)
Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran
In-Reply-To:
References:
Message-ID: <9053f72145e86c5c05eed3b81a48aba6@localhost.localdomain>

AlexisPerry updated this revision to Diff 276873.
AlexisPerry added a comment.

Changed the default external compiler used by the flang temporary driver.

- Changed default F18_FC from pgf90 to gfortran.
- Removed unnecessary references to pgf90 in favor of more generic naming.
- Extended the flang driver options to include gfortran equivalents to pgf90-specific options.

Repository: rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83488/new/
https://reviews.llvm.org/D83488

Files:
  flang/tools/f18/f18.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83488.276873.patch Type: text/x-patch Size: 7699 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 16:57:17 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:57:17 +0000 (UTC) Subject: [PATCH] D83439: [NFC] Change getEntryForPercentile to be a static function in ProfileSummaryBuilder In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGe296e9dfd6ce: [NFC] Change getEntryForPercentile to be a static function in… (authored by wmi). Changed prior to commit: https://reviews.llvm.org/D83439?vs=276580&id=276874#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83439/new/ https://reviews.llvm.org/D83439 Files: llvm/include/llvm/ProfileData/ProfileCommon.h llvm/lib/Analysis/ProfileSummaryInfo.cpp llvm/lib/ProfileData/ProfileSummaryBuilder.cpp Index: llvm/lib/ProfileData/ProfileSummaryBuilder.cpp =================================================================== --- llvm/lib/ProfileData/ProfileSummaryBuilder.cpp +++ llvm/lib/ProfileData/ProfileSummaryBuilder.cpp @@ -31,6 +31,19 @@ const ArrayRef ProfileSummaryBuilder::DefaultCutoffs = DefaultCutoffsData; +const ProfileSummaryEntry & +ProfileSummaryBuilder::getEntryForPercentile(SummaryEntryVector &DS, + uint64_t Percentile) { + auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) { + return Entry.Cutoff < Percentile; + }); + // The required percentile has to be <= one of the percentiles in the + // detailed summary. + if (It == DS.end()) + report_fatal_error("Desired percentile exceeds the maximum cutoff"); + return *It; +} + void InstrProfSummaryBuilder::addRecord(const InstrProfRecord &R) { // The first counter is not necessarily an entry count for IR // instrumentation profiles. 
Index: llvm/lib/Analysis/ProfileSummaryInfo.cpp =================================================================== --- llvm/lib/Analysis/ProfileSummaryInfo.cpp +++ llvm/lib/Analysis/ProfileSummaryInfo.cpp @@ -19,6 +19,7 @@ #include "llvm/IR/Module.h" #include "llvm/IR/ProfileSummary.h" #include "llvm/InitializePasses.h" +#include "llvm/ProfileData/ProfileCommon.h" #include "llvm/Support/CommandLine.h" using namespace llvm; @@ -86,19 +87,6 @@ "and the factor to scale the working set size to use the same " "shared thresholds as PGO.")); -// Find the summary entry for a desired percentile of counts. -static const ProfileSummaryEntry &getEntryForPercentile(SummaryEntryVector &DS, - uint64_t Percentile) { - auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) { - return Entry.Cutoff < Percentile; - }); - // The required percentile has to be <= one of the percentiles in the - // detailed summary. - if (It == DS.end()) - report_fatal_error("Desired percentile exceeds the maximum cutoff"); - return *It; -} - // The profile summary metadata may be attached either by the frontend or by // any backend passes (IR level instrumentation, for example). This method // checks if the Summary is null and if so checks if the summary metadata is now @@ -284,13 +272,13 @@ /// Compute the hot and cold thresholds. 
void ProfileSummaryInfo::computeThresholds() { auto &DetailedSummary = Summary->getDetailedSummary(); - auto &HotEntry = - getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffHot); + auto &HotEntry = ProfileSummaryBuilder::getEntryForPercentile( + DetailedSummary, ProfileSummaryCutoffHot); HotCountThreshold = HotEntry.MinCount; if (ProfileSummaryHotCount.getNumOccurrences() > 0) HotCountThreshold = ProfileSummaryHotCount; - auto &ColdEntry = - getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffCold); + auto &ColdEntry = ProfileSummaryBuilder::getEntryForPercentile( + DetailedSummary, ProfileSummaryCutoffCold); ColdCountThreshold = ColdEntry.MinCount; if (ProfileSummaryColdCount.getNumOccurrences() > 0) ColdCountThreshold = ProfileSummaryColdCount; @@ -324,8 +312,8 @@ return iter->second; } auto &DetailedSummary = Summary->getDetailedSummary(); - auto &Entry = - getEntryForPercentile(DetailedSummary, PercentileCutoff); + auto &Entry = ProfileSummaryBuilder::getEntryForPercentile(DetailedSummary, + PercentileCutoff); uint64_t CountThreshold = Entry.MinCount; ThresholdCache[PercentileCutoff] = CountThreshold; return CountThreshold; Index: llvm/include/llvm/ProfileData/ProfileCommon.h =================================================================== --- llvm/include/llvm/ProfileData/ProfileCommon.h +++ llvm/include/llvm/ProfileData/ProfileCommon.h @@ -62,6 +62,10 @@ public: /// A vector of useful cutoff values for detailed summary. static const ArrayRef DefaultCutoffs; + + /// Find the summary entry for a desired percentile of counts. + static const ProfileSummaryEntry & + getEntryForPercentile(SummaryEntryVector &DS, uint64_t Percentile); }; class InstrProfSummaryBuilder final : public ProfileSummaryBuilder { -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83439.276874.patch Type: text/x-patch Size: 4353 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 16:57:18 2020 From: llvm-commits at lists.llvm.org (Wei Mi via llvm-commits) Date: Thu, 09 Jul 2020 16:57:18 -0700 (PDT) Subject: [llvm] e296e9d - [NFC] Change getEntryForPercentile to be a static function in ProfileSummaryBuilder. Message-ID: <5f07aede.1c69fb81.43090.951f@mx.google.com> Author: Wei Mi Date: 2020-07-09T16:38:19-07:00 New Revision: e296e9dfd6ceade1271e48a0afacd1a4826676be URL: https://github.com/llvm/llvm-project/commit/e296e9dfd6ceade1271e48a0afacd1a4826676be DIFF: https://github.com/llvm/llvm-project/commit/e296e9dfd6ceade1271e48a0afacd1a4826676be.diff LOG: [NFC] Change getEntryForPercentile to be a static function in ProfileSummaryBuilder. Change file static function getEntryForPercentile to be a static member function in ProfileSummaryBuilder so it can be used by other files. Differential Revision: https://reviews.llvm.org/D83439 Added: Modified: llvm/include/llvm/ProfileData/ProfileCommon.h llvm/lib/Analysis/ProfileSummaryInfo.cpp llvm/lib/ProfileData/ProfileSummaryBuilder.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/ProfileData/ProfileCommon.h b/llvm/include/llvm/ProfileData/ProfileCommon.h index f98a34387fdf..14c305b3d0c0 100644 --- a/llvm/include/llvm/ProfileData/ProfileCommon.h +++ b/llvm/include/llvm/ProfileData/ProfileCommon.h @@ -62,6 +62,10 @@ class ProfileSummaryBuilder { public: /// A vector of useful cutoff values for detailed summary. static const ArrayRef DefaultCutoffs; + + /// Find the summary entry for a desired percentile of counts. 
+ static const ProfileSummaryEntry & + getEntryForPercentile(SummaryEntryVector &DS, uint64_t Percentile); }; class InstrProfSummaryBuilder final : public ProfileSummaryBuilder { diff --git a/llvm/lib/Analysis/ProfileSummaryInfo.cpp b/llvm/lib/Analysis/ProfileSummaryInfo.cpp index 655fc244cb39..c9671d4f5c2e 100644 --- a/llvm/lib/Analysis/ProfileSummaryInfo.cpp +++ b/llvm/lib/Analysis/ProfileSummaryInfo.cpp @@ -19,6 +19,7 @@ #include "llvm/IR/Module.h" #include "llvm/IR/ProfileSummary.h" #include "llvm/InitializePasses.h" +#include "llvm/ProfileData/ProfileCommon.h" #include "llvm/Support/CommandLine.h" using namespace llvm; @@ -86,19 +87,6 @@ static cl::opt PartialSampleProfileWorkingSetSizeScaleFactor( "and the factor to scale the working set size to use the same " "shared thresholds as PGO.")); -// Find the summary entry for a desired percentile of counts. -static const ProfileSummaryEntry &getEntryForPercentile(SummaryEntryVector &DS, - uint64_t Percentile) { - auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) { - return Entry.Cutoff < Percentile; - }); - // The required percentile has to be <= one of the percentiles in the - // detailed summary. - if (It == DS.end()) - report_fatal_error("Desired percentile exceeds the maximum cutoff"); - return *It; -} - // The profile summary metadata may be attached either by the frontend or by // any backend passes (IR level instrumentation, for example). This method // checks if the Summary is null and if so checks if the summary metadata is now @@ -284,13 +272,13 @@ bool ProfileSummaryInfo::isFunctionEntryCold(const Function *F) const { /// Compute the hot and cold thresholds. 
void ProfileSummaryInfo::computeThresholds() { auto &DetailedSummary = Summary->getDetailedSummary(); - auto &HotEntry = - getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffHot); + auto &HotEntry = ProfileSummaryBuilder::getEntryForPercentile( + DetailedSummary, ProfileSummaryCutoffHot); HotCountThreshold = HotEntry.MinCount; if (ProfileSummaryHotCount.getNumOccurrences() > 0) HotCountThreshold = ProfileSummaryHotCount; - auto &ColdEntry = - getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffCold); + auto &ColdEntry = ProfileSummaryBuilder::getEntryForPercentile( + DetailedSummary, ProfileSummaryCutoffCold); ColdCountThreshold = ColdEntry.MinCount; if (ProfileSummaryColdCount.getNumOccurrences() > 0) ColdCountThreshold = ProfileSummaryColdCount; @@ -324,8 +312,8 @@ ProfileSummaryInfo::computeThreshold(int PercentileCutoff) const { return iter->second; } auto &DetailedSummary = Summary->getDetailedSummary(); - auto &Entry = - getEntryForPercentile(DetailedSummary, PercentileCutoff); + auto &Entry = ProfileSummaryBuilder::getEntryForPercentile(DetailedSummary, + PercentileCutoff); uint64_t CountThreshold = Entry.MinCount; ThresholdCache[PercentileCutoff] = CountThreshold; return CountThreshold; diff --git a/llvm/lib/ProfileData/ProfileSummaryBuilder.cpp b/llvm/lib/ProfileData/ProfileSummaryBuilder.cpp index 3299b5f92069..5d3a07640942 100644 --- a/llvm/lib/ProfileData/ProfileSummaryBuilder.cpp +++ b/llvm/lib/ProfileData/ProfileSummaryBuilder.cpp @@ -31,6 +31,19 @@ static const uint32_t DefaultCutoffsData[] = { const ArrayRef ProfileSummaryBuilder::DefaultCutoffs = DefaultCutoffsData; +const ProfileSummaryEntry & +ProfileSummaryBuilder::getEntryForPercentile(SummaryEntryVector &DS, + uint64_t Percentile) { + auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) { + return Entry.Cutoff < Percentile; + }); + // The required percentile has to be <= one of the percentiles in the + // detailed summary. 
+ if (It == DS.end()) + report_fatal_error("Desired percentile exceeds the maximum cutoff"); + return *It; +} + void InstrProfSummaryBuilder::addRecord(const InstrProfRecord &R) { // The first counter is not necessarily an entry count for IR // instrumentation profiles. From llvm-commits at lists.llvm.org Thu Jul 9 16:58:08 2020 From: llvm-commits at lists.llvm.org (Alexis Perry-Holby via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:58:08 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran In-Reply-To: References: Message-ID: AlexisPerry added a comment. In D83488#2142950, @clementval wrote: > I don't see the previous change made to set gfortran as default. Are they gone with your diff update? My mistake, sorry about that. They should be back now. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 From llvm-commits at lists.llvm.org Thu Jul 9 16:58:48 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Thu, 09 Jul 2020 16:58:48 -0700 (PDT) Subject: [llvm] 8039d2c - [NFC] Derive from PassInfoMixin for no-op/printing passes Message-ID: <5f07af38.1c69fb81.22853.9465@mx.google.com> Author: Arthur Eubanks Date: 2020-07-09T16:58:30-07:00 New Revision: 8039d2c3bf14585ef37dc9343bf393ecad9aead9 URL: https://github.com/llvm/llvm-project/commit/8039d2c3bf14585ef37dc9343bf393ecad9aead9 DIFF: https://github.com/llvm/llvm-project/commit/8039d2c3bf14585ef37dc9343bf393ecad9aead9.diff LOG: [NFC] Derive from PassInfoMixin for no-op/printing passes PassInfoMixin should be used for all NPM passes, rather than a custom `name()`. This caused ambiguous references in LegacyPassManager.cpp, so had to remove "using namespace llvm::legacy" and move some things around. The passes had to be moved to the llvm namespace, or else they would get printed as "(anonymous namespace)::FooPass".
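As a rough illustration of why a type in an anonymous namespace ends up with "(anonymous namespace)" in its printed name, and why a hand-written `name()` sidesteps that, here is a small CRTP sketch. It is not the LLVM implementation: `NameMixin`, `demo::PrintPass` and `NoOpPass` are hypothetical names, and the sketch assumes a GCC- or Clang-compatible compiler because it parses `__PRETTY_FUNCTION__`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical mixin, loosely modelled on the idea of deriving a pass name
// from the derived type. Assumes GCC or Clang, whose __PRETTY_FUNCTION__
// spells out the template argument as "DerivedT = <type>".
template <typename DerivedT> struct NameMixin {
  static std::string name() {
    std::string_view Pretty = __PRETTY_FUNCTION__;
    const std::string_view Key = "DerivedT = ";
    std::size_t Begin = Pretty.find(Key);
    if (Begin == std::string_view::npos)
      return "<unknown>";
    Begin += Key.size();
    // GCC terminates the type with ';' or ']', Clang with ']'.
    std::size_t End = Pretty.find_first_of(";]", Begin);
    return std::string(Pretty.substr(Begin, End - Begin));
  }
};

namespace demo {
// A named namespace gives a stable, readable derived name.
struct PrintPass : NameMixin<PrintPass> {};
} // namespace demo

namespace {
// In an anonymous namespace the derived name would mention "anonymous",
// so this type hides the mixin's name() with a fixed string instead.
struct NoOpPass : NameMixin<NoOpPass> {
  static std::string name() { return "NoOpPass"; }
};
} // namespace
```

Here `demo::PrintPass::name()` comes from the mixin, `NoOpPass::name()` is the custom override, and calling `NameMixin<NoOpPass>::name()` directly shows the compiler-specific anonymous-namespace spelling that the override avoids.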
Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D83498 Added: Modified: llvm/include/llvm/IR/IRPrintingPasses.h llvm/lib/IR/LegacyPassManager.cpp llvm/lib/Passes/PassBuilder.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/IRPrintingPasses.h b/llvm/include/llvm/IR/IRPrintingPasses.h index 230db988f737..3a1c489ee09f 100644 --- a/llvm/include/llvm/IR/IRPrintingPasses.h +++ b/llvm/include/llvm/IR/IRPrintingPasses.h @@ -19,17 +19,10 @@ #define LLVM_IR_IRPRINTINGPASSES_H #include "llvm/ADT/StringRef.h" +#include "llvm/IR/PassManager.h" #include namespace llvm { -class Pass; -class Function; -class FunctionPass; -class Module; -class ModulePass; -class PreservedAnalyses; -class raw_ostream; -template class AnalysisManager; /// Create and return a pass that writes the module to the specified /// \c raw_ostream. @@ -71,7 +64,7 @@ extern bool shouldPrintAfterPass(StringRef); /// /// Note: This pass is for use with the new pass manager. Use the create...Pass /// functions above to create passes for use with the legacy pass manager. -class PrintModulePass { +class PrintModulePass : public PassInfoMixin { raw_ostream &OS; std::string Banner; bool ShouldPreserveUseListOrder; @@ -82,15 +75,13 @@ class PrintModulePass { bool ShouldPreserveUseListOrder = false); PreservedAnalyses run(Module &M, AnalysisManager &); - - static StringRef name() { return "PrintModulePass"; } }; /// Pass for printing a Function as LLVM's text IR assembly. /// /// Note: This pass is for use with the new pass manager. Use the create...Pass /// functions above to create passes for use with the legacy pass manager. 
-class PrintFunctionPass { +class PrintFunctionPass : public PassInfoMixin { raw_ostream &OS; std::string Banner; @@ -99,8 +90,6 @@ class PrintFunctionPass { PrintFunctionPass(raw_ostream &OS, const std::string &Banner = ""); PreservedAnalyses run(Function &F, AnalysisManager &); - - static StringRef name() { return "PrintFunctionPass"; } }; } // End llvm namespace diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index 1d9c44f385fb..4189aea46294 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -33,7 +33,6 @@ #include #include using namespace llvm; -using namespace llvm::legacy; // See PassManagers.h for Pass Manager infrastructure overview. @@ -387,6 +386,66 @@ class FunctionPassManagerImpl : public Pass, void FunctionPassManagerImpl::anchor() {} char FunctionPassManagerImpl::ID = 0; + +//===----------------------------------------------------------------------===// +// FunctionPassManagerImpl implementation +// +bool FunctionPassManagerImpl::doInitialization(Module &M) { + bool Changed = false; + + dumpArguments(); + dumpPasses(); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doInitialization(M); + + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) + Changed |= getContainedManager(Index)->doInitialization(M); + + return Changed; +} + +bool FunctionPassManagerImpl::doFinalization(Module &M) { + bool Changed = false; + + for (int Index = getNumContainedManagers() - 1; Index >= 0; --Index) + Changed |= getContainedManager(Index)->doFinalization(M); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doFinalization(M); + + return Changed; +} + +void FunctionPassManagerImpl::releaseMemoryOnTheFly() { + if (!wasRun) + return; + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + FPPassManager *FPPM = getContainedManager(Index); + for (unsigned Index = 0; Index < FPPM->getNumContainedPasses(); ++Index) { 
+ FPPM->getContainedPass(Index)->releaseMemory(); + } + } + wasRun = false; +} + +// Execute all the passes managed by this top level manager. +// Return true if any function is modified by a pass. +bool FunctionPassManagerImpl::run(Function &F) { + bool Changed = false; + + initializeAllAnalysisInfo(); + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + Changed |= getContainedManager(Index)->runOnFunction(F); + F.getContext().yield(); + } + + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) + getContainedManager(Index)->cleanup(); + + wasRun = true; + return Changed; +} } // namespace legacy } // namespace llvm @@ -406,7 +465,7 @@ class MPPassManager : public Pass, public PMDataManager { // Delete on the fly managers. ~MPPassManager() override { for (auto &OnTheFlyManager : OnTheFlyManagers) { - FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; delete FPP; } } @@ -451,7 +510,7 @@ class MPPassManager : public Pass, public PMDataManager { for (unsigned Index = 0; Index < getNumContainedPasses(); ++Index) { ModulePass *MP = getContainedPass(Index); MP->dumpPassStructure(Offset + 1); - MapVector::const_iterator I = + MapVector::const_iterator I = OnTheFlyManagers.find(MP); if (I != OnTheFlyManagers.end()) I->second->dumpPassStructure(Offset + 2); @@ -471,7 +530,7 @@ class MPPassManager : public Pass, public PMDataManager { private: /// Collection of on the fly FPPassManagers. These managers manage /// function passes that are required by module passes. - MapVector OnTheFlyManagers; + MapVector OnTheFlyManagers; }; char MPPassManager::ID = 0; @@ -534,6 +593,33 @@ class PassManagerImpl : public Pass, void PassManagerImpl::anchor() {} char PassManagerImpl::ID = 0; + +//===----------------------------------------------------------------------===// +// PassManagerImpl implementation + +// +/// run - Execute all of the passes scheduled for execution. 
Keep track of +/// whether any of the passes modifies the module, and if so, return true. +bool PassManagerImpl::run(Module &M) { + bool Changed = false; + + dumpArguments(); + dumpPasses(); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doInitialization(M); + + initializeAllAnalysisInfo(); + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + Changed |= getContainedManager(Index)->runOnModule(M); + M.getContext().yield(); + } + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doFinalization(M); + + return Changed; +} } // namespace legacy } // namespace llvm @@ -1314,12 +1400,15 @@ AnalysisResolver::findImplPass(Pass *P, AnalysisID AnalysisPI, Function &F) { return PM.getOnTheFlyPass(P, AnalysisPI, F); } +namespace llvm { +namespace legacy { + //===----------------------------------------------------------------------===// // FunctionPassManager implementation /// Create new Function pass manager FunctionPassManager::FunctionPassManager(Module *m) : M(m) { - FPM = new FunctionPassManagerImpl(); + FPM = new legacy::FunctionPassManagerImpl(); // FPM is the top level manager. 
FPM->setTopLevelManager(FPM); @@ -1358,36 +1447,8 @@ bool FunctionPassManager::doInitialization() { bool FunctionPassManager::doFinalization() { return FPM->doFinalization(*M); } - -//===----------------------------------------------------------------------===// -// FunctionPassManagerImpl implementation -// -bool FunctionPassManagerImpl::doInitialization(Module &M) { - bool Changed = false; - - dumpArguments(); - dumpPasses(); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doInitialization(M); - - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) - Changed |= getContainedManager(Index)->doInitialization(M); - - return Changed; -} - -bool FunctionPassManagerImpl::doFinalization(Module &M) { - bool Changed = false; - - for (int Index = getNumContainedManagers() - 1; Index >= 0; --Index) - Changed |= getContainedManager(Index)->doFinalization(M); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doFinalization(M); - - return Changed; -} +} // namespace legacy +} // namespace llvm /// cleanup - After running all passes, clean up pass manager cache. void FPPassManager::cleanup() { @@ -1399,35 +1460,6 @@ void FPPassManager::cleanup() { } } -void FunctionPassManagerImpl::releaseMemoryOnTheFly() { - if (!wasRun) - return; - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - FPPassManager *FPPM = getContainedManager(Index); - for (unsigned Index = 0; Index < FPPM->getNumContainedPasses(); ++Index) { - FPPM->getContainedPass(Index)->releaseMemory(); - } - } - wasRun = false; -} - -// Execute all the passes managed by this top level manager. -// Return true if any function is modified by a pass. 
-bool FunctionPassManagerImpl::run(Function &F) { - bool Changed = false; - - initializeAllAnalysisInfo(); - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - Changed |= getContainedManager(Index)->runOnFunction(F); - F.getContext().yield(); - } - - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) - getContainedManager(Index)->cleanup(); - - wasRun = true; - return Changed; -} //===----------------------------------------------------------------------===// // FPPassManager implementation @@ -1554,7 +1586,7 @@ MPPassManager::runOnModule(Module &M) { // Initialize on-the-fly passes for (auto &OnTheFlyManager : OnTheFlyManagers) { - FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; Changed |= FPP->doInitialization(M); } @@ -1615,7 +1647,7 @@ MPPassManager::runOnModule(Module &M) { // Finalize on-the-fly passes for (auto &OnTheFlyManager : OnTheFlyManagers) { - FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; // We don't know when is the last time an on-the-fly pass is run, // so we need to releaseMemory / finalize here FPP->releaseMemoryOnTheFly(); @@ -1636,9 +1668,9 @@ void MPPassManager::addLowerLevelRequiredPass(Pass *P, Pass *RequiredPass) { RequiredPass->getPotentialPassManagerType()) && "Unable to handle Pass that requires lower level Analysis pass"); - FunctionPassManagerImpl *FPP = OnTheFlyManagers[P]; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManagers[P]; if (!FPP) { - FPP = new FunctionPassManagerImpl(); + FPP = new legacy::FunctionPassManagerImpl(); // FPP is the top level manager. FPP->setTopLevelManager(FPP); @@ -1669,7 +1701,7 @@ void MPPassManager::addLowerLevelRequiredPass(Pass *P, Pass *RequiredPass) { /// its runOnFunction() for function F. 
std::tuple MPPassManager::getOnTheFlyPass(Pass *MP, AnalysisID PI, Function &F) { - FunctionPassManagerImpl *FPP = OnTheFlyManagers[MP]; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManagers[MP]; assert(FPP && "Unable to find on the fly pass"); FPP->releaseMemoryOnTheFly(); @@ -1678,32 +1710,8 @@ std::tuple MPPassManager::getOnTheFlyPass(Pass *MP, AnalysisID PI, Changed); } -//===----------------------------------------------------------------------===// -// PassManagerImpl implementation - -// -/// run - Execute all of the passes scheduled for execution. Keep track of -/// whether any of the passes modifies the module, and if so, return true. -bool PassManagerImpl::run(Module &M) { - bool Changed = false; - - dumpArguments(); - dumpPasses(); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doInitialization(M); - - initializeAllAnalysisInfo(); - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - Changed |= getContainedManager(Index)->runOnModule(M); - M.getContext().yield(); - } - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doFinalization(M); - - return Changed; -} +namespace llvm { +namespace legacy { //===----------------------------------------------------------------------===// // PassManager implementation @@ -1728,6 +1736,8 @@ void PassManager::add(Pass *P) { bool PassManager::run(Module &M) { return PM->run(M); } +} // namespace legacy +} // namespace llvm //===----------------------------------------------------------------------===// // PMStack implementation @@ -1818,4 +1828,4 @@ void FunctionPass::assignPassManager(PMStack &PMS, PM->add(this); } -PassManagerBase::~PassManagerBase() {} +legacy::PassManagerBase::~PassManagerBase() {} diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 58510609cf5e..675511a542a1 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -299,11 +299,16 @@ const 
PassBuilder::OptimizationLevel PassBuilder::OptimizationLevel::Oz = { namespace { +// The following passes/analyses have custom names, otherwise their name will +// include `(anonymous namespace)`. These are special since they are only for +// testing purposes and don't live in a header file. + /// No-op module pass which does nothing. -struct NoOpModulePass { +struct NoOpModulePass : PassInfoMixin { PreservedAnalyses run(Module &M, ModuleAnalysisManager &) { return PreservedAnalyses::all(); } + static StringRef name() { return "NoOpModulePass"; } }; @@ -319,7 +324,7 @@ class NoOpModuleAnalysis : public AnalysisInfoMixin { }; /// No-op CGSCC pass which does nothing. -struct NoOpCGSCCPass { +struct NoOpCGSCCPass : PassInfoMixin { PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &, LazyCallGraph &, CGSCCUpdateResult &UR) { return PreservedAnalyses::all(); @@ -341,7 +346,7 @@ class NoOpCGSCCAnalysis : public AnalysisInfoMixin { }; /// No-op function pass which does nothing. -struct NoOpFunctionPass { +struct NoOpFunctionPass : PassInfoMixin { PreservedAnalyses run(Function &F, FunctionAnalysisManager &) { return PreservedAnalyses::all(); } @@ -360,7 +365,7 @@ class NoOpFunctionAnalysis : public AnalysisInfoMixin { }; /// No-op loop pass which does nothing. -struct NoOpLoopPass { +struct NoOpLoopPass : PassInfoMixin { PreservedAnalyses run(Loop &L, LoopAnalysisManager &, LoopStandardAnalysisResults &, LPMUpdater &) { return PreservedAnalyses::all(); @@ -386,7 +391,7 @@ AnalysisKey NoOpCGSCCAnalysis::Key; AnalysisKey NoOpFunctionAnalysis::Key; AnalysisKey NoOpLoopAnalysis::Key; -} // End anonymous namespace. 
+} // namespace void PassBuilder::invokePeepholeEPCallbacks( FunctionPassManager &FPM, PassBuilder::OptimizationLevel Level) { From llvm-commits at lists.llvm.org Thu Jul 9 16:58:54 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 23:58:54 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <5748579b139085e46dee5eed695d25bd@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG8039d2c3bf14: [NFC] Derive from PassInfoMixin for no-op/printing passes (authored by aeubanks). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 Files: llvm/include/llvm/IR/IRPrintingPasses.h llvm/lib/IR/LegacyPassManager.cpp llvm/lib/Passes/PassBuilder.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83498.276875.patch Type: text/x-patch Size: 13793 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:00:36 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:00:36 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran In-Reply-To: References: Message-ID: clementval added a comment. It's weird, the changes are in but both files are identical ... so it looks like there is nothing to review Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 From llvm-commits at lists.llvm.org Thu Jul 9 17:01:33 2020 From: llvm-commits at lists.llvm.org (Rahul Joshi via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:01:33 +0000 (UTC) Subject: [PATCH] D83522: [flang] Adopt NoRegionArguments (WhereOp) and ParentOneOf (ResultOp) traits Message-ID: jurahul created this revision. 
Herald added subscribers: llvm-commits, dexonsmith, inglorion. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83522 Files: flang/include/flang/Optimizer/Dialect/FIROps.td flang/lib/Optimizer/Dialect/FIROps.cpp Index: flang/lib/Optimizer/Dialect/FIROps.cpp =================================================================== --- flang/lib/Optimizer/Dialect/FIROps.cpp +++ flang/lib/Optimizer/Dialect/FIROps.cpp @@ -968,21 +968,14 @@ auto results = parentOp->getResults(); auto operands = op.getOperands(); - if (isa(parentOp) || isa(parentOp) || - isa(parentOp)) { - if (parentOp->getNumResults() != op.getNumOperands()) - return op.emitOpError() << "parent of result must have same arity"; - for (auto e : llvm::zip(results, operands)) { - if (std::get<0>(e).getType() != std::get<1>(e).getType()) - return op.emitOpError() - << "types mismatch between result op and its parent"; - } - } else { - return op.emitOpError() - << "result only terminates if, do_loop, or iterate_while regions"; + if (parentOp->getNumResults() != op.getNumOperands()) + return op.emitOpError() << "parent of result must have same arity"; + for (auto e : llvm::zip(results, operands)) { + if (std::get<0>(e).getType() != std::get<1>(e).getType()) + return op.emitOpError() + << "types mismatch between result op and its parent"; + return success(); } - return success(); -} //===----------------------------------------------------------------------===// // SelectOp @@ -1439,16 +1432,6 @@ } static LogicalResult verify(fir::WhereOp op) { - // Verify that the entry of each child region does not have arguments. 
- for (auto &region : op.getOperation()->getRegions()) { - if (region.empty()) - continue; - - for (auto &b : region) - if (b.getNumArguments() != 0) - return op.emitOpError( - "requires that child entry blocks have no arguments"); - } if (op.getNumResults() != 0 && op.otherRegion().empty()) return op.emitOpError("must have an else block if defining values"); Index: flang/include/flang/Optimizer/Dialect/FIROps.td =================================================================== --- flang/include/flang/Optimizer/Dialect/FIROps.td +++ flang/include/flang/Optimizer/Dialect/FIROps.td @@ -1853,7 +1853,9 @@ // Fortran loops //===----------------------------------------------------------------------===// -def fir_ResultOp : fir_Op<"result", [NoSideEffect, ReturnLike, Terminator]> { +def fir_ResultOp : fir_Op<"result", + [NoSideEffect, ReturnLike, Terminator, + ParentOneOf<["WhereOp", "LoopOp", "IterWhileOp"]>]> { let summary = "special terminator for use in fir region operations"; let description = [{ @@ -1970,7 +1972,7 @@ }]; } -def fir_WhereOp : region_Op<"if"> { +def fir_WhereOp : region_Op<"if", [NoRegionArguments]> { let summary = "if-then-else conditional operation"; let description = [{ Used to conditionally execute operations. This operation is the FIR -------------- next part -------------- A non-text attachment was scrubbed... Name: D83522.276876.patch Type: text/x-patch Size: 2891 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:10:20 2020 From: llvm-commits at lists.llvm.org (Alexis Perry-Holby via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:10:20 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran In-Reply-To: References: Message-ID: <629ae4d651371d9a656ab573b3eed7bc@localhost.localdomain> AlexisPerry added a comment.
I'll admit, this is my first time trying to commit to LLVM directly and using Phabricator (rather than on a fork that uses pull requests), and I'm definitely still learning the workflow. I had some git issues when first creating the patch that meant my first commit was lost locally, so when I went to update the patch it only had the second commit, which caused the original changes to disappear. I then created a new branch and re-did the changes and updated again, using only a single commit, so that the diff would now have everything in it. If this has resulted in an unreviewable state for this diff, then I will close it and submit a new one, which will hopefully go more smoothly now that I've learned a few things. Sorry for the confusion. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 From llvm-commits at lists.llvm.org Thu Jul 9 17:11:19 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:11:19 +0000 (UTC) Subject: [PATCH] D83519: [NewPM] Support optnone under new pass manager In-Reply-To: References: Message-ID: <3c797fc8f1f0bc83cf6a4edbf320c4c6@localhost.localdomain> ychen added inline comments. ================ Comment at: llvm/include/llvm/IR/PassInstrumentation.h:150 for (auto &C : Callbacks->BeforePassCallbacks) - ShouldRun &= C(Pass.name(), llvm::Any(&IR)); + ShouldRun &= C(Pass.name(), Pass.isRequired(), llvm::Any(&IR)); return ShouldRun; ---------------- Could we do this to avoid changing the callback API?
`ShouldRun &= C(Pass.name(), llvm::Any(&IR)) || Pass.isRequired();` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83519/new/ https://reviews.llvm.org/D83519 From llvm-commits at lists.llvm.org Thu Jul 9 17:15:23 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:15:23 +0000 (UTC) Subject: [PATCH] D83523: MachineSink: permit sinking into INLINEASM_BR indirect targets Message-ID: nickdesaulniers created this revision. nickdesaulniers added reviewers: jyknight, void. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Fixes a kernel panic for 4.4 LTS x86_64 Linux kernels. Fixes: D79794 Link: https://github.com/ClangBuiltLinux/linux/issues/1085 Signed-off-by: Nick Desaulniers Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83523 Files: llvm/lib/CodeGen/MachineSink.cpp llvm/test/CodeGen/X86/machine-sink-inlineasm-br.ll Index: llvm/test/CodeGen/X86/machine-sink-inlineasm-br.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/X86/machine-sink-inlineasm-br.ll @@ -0,0 +1,28 @@ +; RUN: llc -O2 $1 -print-after=machine-sink -stop-after=machine-sink -o /dev/null 2>&1 %s | FileCheck %s +%struct1 = type { i8*, %struct2, %struct3 } +%struct2 = type { %struct2*, %struct2* } +%struct3 = type { %struct4 } +%struct4 = type { i32 } +%struct5 = type { %struct6, %struct2, void (%struct1*)*, void (%struct1*)* } +%struct6 = type { i32 } + +define i32 @klist_dec_and_del(%struct1* %0) { + %2 = getelementptr inbounds %struct1, %struct1* %0, i64 0, i32 2 + %3 = getelementptr inbounds %struct3, %struct3* %2, i64 0, i32 0, i32 0 +; CHECK-NOT: %0:gr64 = nuw ADD64ri8 %1:gr64(tied-def 0), 24, implicit-def dead $eflags +; CHECK: INLINEASM_BR &"" [sideeffect] [mayload] [maystore] [attdialect], $0:[mem:m], %1:gr64, 1, $noreg, 24, $noreg, $1:[imm], 1, $2:[imm], blockaddress(@klist_dec_and_del, 
%ir-block.4), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags + callbr void asm sideeffect "", "*m,er,X,~{memory},~{dirflag},~{fpsr},~{flags}"(i32* %3, i32 1, i8* blockaddress(@klist_dec_and_del, %4)) + to label %8 [label %4] + +4: ; preds = %1 + %5 = getelementptr %struct3, %struct3* %2, i64 -6 + br label %6 + +6: ; preds = %4 + %7 = bitcast %struct3* %5 to %struct5** + store %struct5* null, %struct5** %7, align 8 + br label %8 + +8: ; preds = %6, %1 + ret i32 undef +} Index: llvm/lib/CodeGen/MachineSink.cpp =================================================================== --- llvm/lib/CodeGen/MachineSink.cpp +++ llvm/lib/CodeGen/MachineSink.cpp @@ -733,13 +733,6 @@ if (SuccToSinkTo && SuccToSinkTo->isEHPad()) return nullptr; - // It ought to be okay to sink instructions into an INLINEASM_BR target, but - // only if we make sure that MI occurs _before_ an INLINEASM_BR instruction in - // the source block (which this code does not yet do). So for now, forbid - // doing so. - if (SuccToSinkTo && SuccToSinkTo->isInlineAsmBrIndirectTarget()) - return nullptr; - return SuccToSinkTo; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83523.276879.patch Type: text/x-patch Size: 2394 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:16:39 2020 From: llvm-commits at lists.llvm.org (Nick Desaulniers via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:16:39 +0000 (UTC) Subject: [PATCH] D83523: MachineSink: permit sinking into INLINEASM_BR indirect targets In-Reply-To: References: Message-ID: <5e5db15c069d351ef4ab416138f6edbf@localhost.localdomain> nickdesaulniers added a comment. From the comment, there's obviously more we want to do here. Posting what I have for today since it's EOD. Will iterate on feedback tomorrow. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83523/new/ https://reviews.llvm.org/D83523 From llvm-commits at lists.llvm.org Thu Jul 9 17:18:20 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:18:20 +0000 (UTC) Subject: [PATCH] D82467: [PowerPC][Power10] Implement Truncate and Store VSX Vector Builtins In-Reply-To: References: Message-ID: <0c2fcdad947011531f92bc3c9dd3e4c2@localhost.localdomain> amyk updated this revision to Diff 276881. amyk added a comment. Rebased patch, and addressed review comments of having a single `CHECK`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82467/new/ https://reviews.llvm.org/D82467 Files: clang/lib/Headers/altivec.h clang/test/CodeGen/builtins-ppc-p10vector.c llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/builtins-ppc-p10vsx.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82467.276881.patch Type: text/x-patch Size: 10357 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:20:49 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:20:49 +0000 (UTC) Subject: [PATCH] D83524: Testing: [AArch64] Message-ID: kyulee created this revision. Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls, mgorny. Herald added a project: LLVM. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83524 Files: llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/AArch64/AArch64InstrInfo.td llvm/lib/Target/AArch64/AArch64LowerHomogeneousPrologEpilog.cpp llvm/lib/Target/AArch64/AArch64TargetMachine.cpp llvm/lib/Target/AArch64/CMakeLists.txt llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-frame-tail.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-no-helper.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83524.276882.patch Type: text/x-patch Size: 43538 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:22:16 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:22:16 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran In-Reply-To: References: Message-ID: clementval added a comment. In D83488#2142974 , @AlexisPerry wrote: > I'll admit, this is my first time trying to commit to LLVM directly and using Phabricator (rather than on a fork that uses pull requests), and I'm definitely still learning the workflow. I had some git issues when first creating the patch that meant my first commit was lost locally, so when I went to update the patch it only had the second commit which caused the original changes to disappear. I then created a new branch and re-did the changes and updated again, using only a single commit, so that the diff would now have everything in it. > > If this has resulted in an unreviewable state for this diff, then I will close it and submit a new one which will hopefully go more smoothly now that I've learned a few things. Sorry for the confusion. The diff looks fine now! 
If you don't use it already, I would recommend using arcanist to deal with the diff and Phabricator; it makes things easier. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 From llvm-commits at lists.llvm.org Thu Jul 9 17:23:03 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:23:03 +0000 (UTC) Subject: [PATCH] D83525: Remove optnone from FullUnroll.ll Message-ID: aeubanks created this revision. aeubanks added a reviewer: echristo. Herald added subscribers: llvm-commits, zzheng. Herald added a project: LLVM. The function shouldn't be optimized if it's marked with optnone. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83525 Files: llvm/test/Transforms/LoopUnroll/FullUnroll.ll Index: llvm/test/Transforms/LoopUnroll/FullUnroll.ll =================================================================== --- llvm/test/Transforms/LoopUnroll/FullUnroll.ll +++ llvm/test/Transforms/LoopUnroll/FullUnroll.ll @@ -15,7 +15,7 @@ ; CHECK: br label ; CHECK-NOT: br i1 -; Function Attrs: noinline nounwind optnone uwtable +; Function Attrs: noinline nounwind uwtable define void @foo() #0 { bb: %tmp = alloca [5 x i32*], align 16 @@ -68,7 +68,7 @@ ret void } -attributes #0 = { noinline nounwind optnone uwtable } +attributes #0 = { noinline nounwind uwtable } !llvm.module.flags = !{!0} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83525.276883.patch Type: text/x-patch Size: 615 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:23:20 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via llvm-commits) Date: Thu, 09 Jul 2020 17:23:20 -0700 (PDT) Subject: [llvm] ce22527 - [AArch64][GlobalISel] Add more specific debug info tests for 613f12dd8e2403f5630ab299d2a1bb2cb111ead1. 
Message-ID: <5f07b4f8.1c69fb81.ae735.8975@mx.google.com> Author: Amara Emerson Date: 2020-07-09T17:13:16-07:00 New Revision: ce22527c0c7a38ce0ac2037104a2a89443754836 URL: https://github.com/llvm/llvm-project/commit/ce22527c0c7a38ce0ac2037104a2a89443754836 DIFF: https://github.com/llvm/llvm-project/commit/ce22527c0c7a38ce0ac2037104a2a89443754836.diff LOG: [AArch64][GlobalISel] Add more specific debug info tests for 613f12dd8e2403f5630ab299d2a1bb2cb111ead1. As requested, these tests check for specific debug locs on the output of the legalizer. The only one that I couldn't write was for moreElementsVector, which AFAICT we don't trigger on AArch64. Added: llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr-debugloc.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-memlib-debug-loc.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift-imm-promote-dloc.mir Modified: llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir Removed: ################################################################################ diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr-debugloc.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr-debugloc.mir new file mode 100644 index 000000000000..4ab9a6c3ab06 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr-debugloc.mir @@ -0,0 +1,52 @@ +# RUN: llc -mtriple=aarch64-- -run-pass=legalizer -verify-machineinstrs -O0 %s -o - | FileCheck %s +--- | + target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" + target triple = "arm64-apple-ios13.0.0" + + define void @test_debugloc() { + ret void + } + + !llvm.module.flags = !{!0, !1, !2, !3, !4} + !llvm.dbg.cu = !{!5} + !llvm.ident = !{!8} + + !0 = !{i32 2, !"SDK Version", [2 x i32] [i32 14, i32 0]} + !1 = !{i32 7, !"Dwarf Version", i32 4} + !2 = !{i32 2, !"Debug Info Version", i32 3} + !3 = !{i32 1, !"wchar_size", i32 4} + !4 = !{i32 7, !"PIC Level", i32 2} + !5 = distinct 
!DICompileUnit(language: DW_LANG_C99, file: !6, producer: "clang") + !6 = !DIFile(filename: "foo.c", directory: "/") + !7 = !{} + !8 = !{!"clang"} + !9 = distinct !DISubprogram(name: "test_debugloc", scope: !6, file: !6, line: 3, type: !10, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !5, retainedNodes: !7) + !10 = !DISubroutineType(types: !7) + !11 = !DILocation(line: 4, column: 3, scope: !9) + !12 = !DILocation(line: 5, column: 1, scope: !9) + +... +--- +name: test_debugloc +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x0' } + - { reg: '$q0' } +body: | + bb.1: + liveins: $q0, $x0 + + ; CHECK-LABEL: name: test_debugloc + ; CHECK: liveins: $q0, $x0 + ; CHECK: [[COPY:%[0-9]+]]:_(<2 x p0>) = COPY $q0 + ; CHECK: [[COPY1:%[0-9]+]]:_(p0) = COPY $x0 + ; CHECK: [[BITCAST:%[0-9]+]]:_(<2 x s64>) = G_BITCAST [[COPY]](<2 x p0>), debug-location !DILocation(line: 4, column: 3 + ; CHECK: G_STORE [[BITCAST]](<2 x s64>), [[COPY1]](p0), debug-location !DILocation(line: 4, column: 3 + ; CHECK: RET_ReallyLR debug-location !DILocation(line: 5, column: 1 + %0:_(<2 x p0>) = COPY $q0 + %1:_(p0) = COPY $x0 + G_STORE %0(<2 x p0>), %1(p0), debug-location !11 :: (store 16) + RET_ReallyLR debug-location !12 + +... 
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-memlib-debug-loc.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-memlib-debug-loc.mir new file mode 100644 index 000000000000..6a5df883acd0 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-memlib-debug-loc.mir @@ -0,0 +1,60 @@ +# RUN: llc -mtriple=aarch64-- -run-pass=legalizer -verify-machineinstrs %s -o - | FileCheck %s +--- | + target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" + target triple = "arm64-apple-ios13.0.0" + + define void @test_memset_debug(i8* %ptr, i32 %c, i32 %len) local_unnamed_addr!dbg !9 { + entry: + %conv = zext i32 %len to i64, !dbg !11 + %0 = trunc i32 %c to i8, !dbg !11 + call void @llvm.memset.p0i8.i64(i8* align 1 %ptr, i8 %0, i64 %conv, i1 false) #3, !dbg !11 + ret void, !dbg !12 + } + + declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg) #1 + attributes #1 = { argmemonly nounwind willreturn writeonly } + + !llvm.module.flags = !{!0, !1, !2, !3, !4} + !llvm.dbg.cu = !{!5} + !llvm.ident = !{!8} + + !0 = !{i32 2, !"SDK Version", [2 x i32] [i32 14, i32 0]} + !1 = !{i32 7, !"Dwarf Version", i32 4} + !2 = !{i32 2, !"Debug Info Version", i32 3} + !3 = !{i32 1, !"wchar_size", i32 4} + !4 = !{i32 7, !"PIC Level", i32 2} + !5 = distinct !DICompileUnit(language: DW_LANG_C99, file: !6, producer: "clang") + !6 = !DIFile(filename: "foo.c", directory: "/") + !7 = !{} + !8 = !{!"clang"} + !9 = distinct !DISubprogram(name: "test_memset_debug", scope: !6, file: !6, line: 3, type: !10, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !5, retainedNodes: !7) + !10 = !DISubroutineType(types: !7) + !11 = !DILocation(line: 4, column: 3, scope: !9) + !12 = !DILocation(line: 5, column: 1, scope: !9) + +... 
+--- +name: test_memset_debug +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x0' } + - { reg: '$w1' } + - { reg: '$w2' } +body: | + bb.1.entry: + liveins: $w1, $w2, $x0 + + ; We're checking that the BL call has the debug loc of the original intrinsic call. + ; CHECK-LABEL: name: test_memset_debug + ; CHECK: BL &memset, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, debug-location !11 + ; CHECK: RET_ReallyLR debug-location !12 + %0:_(p0) = COPY $x0 + %1:_(s32) = COPY $w1 + %2:_(s32) = COPY $w2 + %3:_(s64) = G_ZEXT %2(s32), debug-location !11 + %4:_(s8) = G_TRUNC %1(s32), debug-location !11 + G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.memset), %0(p0), %4(s8), %3(s64), 0, debug-location !11 :: (store 1 into %ir.ptr) + RET_ReallyLR debug-location !12 + +... diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift-imm-promote-dloc.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift-imm-promote-dloc.mir new file mode 100644 index 000000000000..54464fe6a610 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift-imm-promote-dloc.mir @@ -0,0 +1,58 @@ +# RUN: llc -mtriple=aarch64-- -run-pass=legalizer -verify-machineinstrs -O0 %s -o - | FileCheck %s +--- | + target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" + target triple = "arm64-apple-ios13.0.0" + + define void @test_shl_imm_promote_debug() { + ret void + } + + !llvm.module.flags = !{!0, !1, !2, !3, !4} + !llvm.dbg.cu = !{!5} + !llvm.ident = !{!8} + + !0 = !{i32 2, !"SDK Version", [2 x i32] [i32 14, i32 0]} + !1 = !{i32 7, !"Dwarf Version", i32 4} + !2 = !{i32 2, !"Debug Info Version", i32 3} + !3 = !{i32 1, !"wchar_size", i32 4} + !4 = !{i32 7, !"PIC Level", i32 2} + !5 = distinct !DICompileUnit(language: DW_LANG_C99, file: !6, producer: "clang") + !6 = !DIFile(filename: "foo.c", directory: "/") + !7 = !{} + !8 = !{!"clang"} + !9 = distinct !DISubprogram(name: "test_shl_imm_promote_debug", scope: !6, file: !6, line: 3, type: 
!10, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !5, retainedNodes: !7) + !10 = !DISubroutineType(types: !7) + !11 = !DILocation(line: 4, column: 3, scope: !9) + !12 = !DILocation(line: 5, column: 1, scope: !9) + +... +--- +name: test_shl_imm_promote_debug +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x0' } + - { reg: '$w1' } + - { reg: '$w2' } +body: | + bb.1: + liveins: $w0, $w1 + + ; Check that the G_LSHR has the right debug loc. This should also check that the G_ZEXT of the constant + ; also has the right DL too, but it gets optimized away. + ; CHECK-LABEL: name: test_shl_imm_promote_debug + ; CHECK: liveins: $w0, $w1 + ; CHECK: [[COPY:%[0-9]+]]:_(p0) = COPY $x0 + ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1 + ; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 8 + ; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C]](s64), debug-location !DILocation(line: 4, column: 3 + ; CHECK: $w0 = COPY [[LSHR]](s32) + ; CHECK: RET_ReallyLR debug-location !DILocation(line: 5, column: 1 + %0:_(p0) = COPY $x0 + %1:_(s32) = COPY $w1 + %2:_(s32) = G_CONSTANT i32 8 + %3:_(s32) = G_LSHR %1(s32), %2(s32), debug-location !11 + $w0 = COPY %3(s32) + RET_ReallyLR debug-location !12 + +... 
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir index a8aac0210b18..12be076e14cb 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir @@ -1,10 +1,30 @@ # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py -# RUN: llc -O0 -run-pass=legalizer --debugify-and-strip-all-safe --debugify-level=locations %s -o - | FileCheck %s +# RUN: llc -O0 -run-pass=legalizer -verify-machineinstrs %s -o - | FileCheck %s --- | target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" target triple = "aarch64--" define void @test_vaarg() { ret void } + + + !llvm.module.flags = !{!0, !1, !2, !3, !4} + !llvm.dbg.cu = !{!5} + !llvm.ident = !{!8} + + !0 = !{i32 2, !"SDK Version", [2 x i32] [i32 14, i32 0]} + !1 = !{i32 7, !"Dwarf Version", i32 4} + !2 = !{i32 2, !"Debug Info Version", i32 3} + !3 = !{i32 1, !"wchar_size", i32 4} + !4 = !{i32 7, !"PIC Level", i32 2} + !5 = distinct !DICompileUnit(language: DW_LANG_C99, file: !6, producer: "clang") + !6 = !DIFile(filename: "foo.c", directory: "/") + !7 = !{} + !8 = !{!"clang"} + !9 = distinct !DISubprogram(name: "test_vaarg", scope: !6, file: !6, line: 3, type: !10, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !5, retainedNodes: !7) + !10 = !DISubroutineType(types: !7) + !11 = !DILocation(line: 4, column: 3, scope: !9) + !12 = !DILocation(line: 5, column: 1, scope: !9) + ... 
--- @@ -13,25 +33,25 @@ body: | bb.0: ; CHECK-LABEL: name: test_vaarg ; CHECK: [[COPY:%[0-9]+]]:_(p0) = COPY $x0 - ; CHECK: [[LOAD:%[0-9]+]]:_(p0) = G_LOAD [[COPY]](p0) :: (load 8) + ; CHECK: [[LOAD:%[0-9]+]]:_(p0) = G_LOAD [[COPY]](p0), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) :: (load 8) ; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 8 - ; CHECK: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[LOAD]], [[C]](s64) - ; CHECK: G_STORE [[PTR_ADD]](p0), [[COPY]](p0) :: (store 8) - ; CHECK: [[LOAD1:%[0-9]+]]:_(p0) = G_LOAD [[COPY]](p0) :: (load 8) - ; CHECK: [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[LOAD1]], [[C]](s64) - ; CHECK: G_STORE [[PTR_ADD1]](p0), [[COPY]](p0) :: (store 8) - ; CHECK: [[LOAD2:%[0-9]+]]:_(p0) = G_LOAD [[COPY]](p0) :: (load 8) + ; CHECK: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[LOAD]], [[C]](s64), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) + ; CHECK: G_STORE [[PTR_ADD]](p0), [[COPY]](p0), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) :: (store 8) + ; CHECK: [[LOAD1:%[0-9]+]]:_(p0) = G_LOAD [[COPY]](p0), debug-location !DILocation(line: 5, column: 1, scope: {{.*}}) :: (load 8) + ; CHECK: [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[LOAD1]], [[C]](s64), debug-location !DILocation(line: 5, column: 1, scope: {{.*}}) + ; CHECK: G_STORE [[PTR_ADD1]](p0), [[COPY]](p0), debug-location !DILocation(line: 5, column: 1, scope: {{.*}}) :: (store 8) + ; CHECK: [[LOAD2:%[0-9]+]]:_(p0) = G_LOAD [[COPY]](p0), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) :: (load 8) ; CHECK: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 15 - ; CHECK: [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[LOAD2]], [[C1]](s64) + ; CHECK: [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[LOAD2]], [[C1]](s64), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) ; CHECK: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 -16 - ; CHECK: [[PTRMASK:%[0-9]+]]:_(p0) = G_PTRMASK [[PTR_ADD2]], [[C2]](s64) - ; CHECK: [[PTR_ADD3:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTRMASK]], 
[[C]](s64) - ; CHECK: G_STORE [[PTR_ADD3]](p0), [[COPY]](p0) :: (store 8) + ; CHECK: [[PTRMASK:%[0-9]+]]:_(p0) = G_PTRMASK [[PTR_ADD2]], [[C2]](s64), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) + ; CHECK: [[PTR_ADD3:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTRMASK]], [[C]](s64), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) + ; CHECK: G_STORE [[PTR_ADD3]](p0), [[COPY]](p0), debug-location !DILocation(line: 4, column: 3, scope: {{.*}}) :: (store 8) %0:_(p0) = COPY $x0 - %1:_(s8) = G_VAARG %0(p0), 1 + %1:_(s8) = G_VAARG %0(p0), 1, debug-location !11 - %2:_(s64) = G_VAARG %0(p0), 8 + %2:_(s64) = G_VAARG %0(p0), 8, debug-location !12 - %3:_(s64) = G_VAARG %0(p0), 16 + %3:_(s64) = G_VAARG %0(p0), 16, debug-location !11 ... From llvm-commits at lists.llvm.org Thu Jul 9 17:23:45 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:23:45 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran In-Reply-To: References: Message-ID: clementval accepted this revision. clementval added a comment. This revision is now accepted and ready to land. LGTM. You might want to wait to see if anybody has something to say on this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 From llvm-commits at lists.llvm.org Thu Jul 9 17:25:56 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:25:56 +0000 (UTC) Subject: [PATCH] D83526: [FileCheck] In input dump, elide only if ellipsis is shorter Message-ID: jdenny created this revision. jdenny added reviewers: probinson, thopre, jhenderson, mehdi_amini. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. 
For example, given `-dump-input-context=3 -vv`, the following now shows more leading context for the error than requested because a leading ellipsis would occupy the same number of lines as it would elide: <<<<<< 1: foo6 2: foo5 3: foo4 4: foo3 5: foo2 6: foo1 7: hello world check:1 ^~~~~ check:2 X~~~~ error: no match found 8: foo1 check:2 ~~~~ 9: foo2 check:2 ~~~~ 10: foo3 check:2 ~~~~ . . . >>>>>> Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83526 Files: llvm/test/FileCheck/dump-input-context.txt llvm/utils/FileCheck/FileCheck.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83526.276884.patch Type: text/x-patch Size: 13234 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:26:38 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:26:38 +0000 (UTC) Subject: [PATCH] D82203: [FileCheck] Implement -dump-input-context In-Reply-To: References: Message-ID: <04340fd78e883717f290493e3d34af8d@localhost.localdomain> jdenny updated this revision to Diff 276880. jdenny added a comment. In an effort to facilitate the review, I've extracted the logic that elides input lines only when the ellipsis is shorter. That's now in D83526 . I've also rebased. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82203/new/ https://reviews.llvm.org/D82203 Files: llvm/test/FileCheck/dump-input-annotations.txt llvm/test/FileCheck/dump-input-context.txt llvm/test/FileCheck/dump-input-enable.txt llvm/test/FileCheck/dump-input-filter.txt llvm/utils/FileCheck/FileCheck.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82203.276880.patch Type: text/x-patch Size: 27157 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:28:06 2020 From: llvm-commits at lists.llvm.org (Joel E. 
Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:28:06 +0000 (UTC) Subject: [PATCH] D83097: [FileCheck] Implement -dump-input-filter In-Reply-To: References: Message-ID: <74d1fd6ef04fd1a84bb55985dd63f0f0@localhost.localdomain> jdenny updated this revision to Diff 276886. jdenny added a comment. Rebased. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83097/new/ https://reviews.llvm.org/D83097 Files: llvm/test/FileCheck/dump-input-filter.txt llvm/utils/FileCheck/FileCheck.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83097.276886.patch Type: text/x-patch Size: 19090 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 17:28:09 2020 From: llvm-commits at lists.llvm.org (Julian Lettner via llvm-commits) Date: Thu, 09 Jul 2020 17:28:09 -0700 (PDT) Subject: [compiler-rt] bed3e1a - [Sanitizer] Update macOS version checking Message-ID: <5f07b619.1c69fb81.486ff.a1e9@mx.google.com> Author: Julian Lettner Date: 2020-07-09T17:28:01-07:00 New Revision: bed3e1a99b41f5a9525bc0edf12ecbcf63aab0cf URL: https://github.com/llvm/llvm-project/commit/bed3e1a99b41f5a9525bc0edf12ecbcf63aab0cf DIFF: https://github.com/llvm/llvm-project/commit/bed3e1a99b41f5a9525bc0edf12ecbcf63aab0cf.diff LOG: [Sanitizer] Update macOS version checking Support macOS 11 in our runtime version checking code and update `GetMacosAlignedVersionInternal()` accordingly. This follows the implementation of `Triple::getMacOSXVersion()` in the Clang driver. 
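The kernel-to-macOS mapping described in this log message can be sketched on its own. What follows is only an illustration under stated assumptions, not the runtime's code: `MacosVersionSketch` and `AlignedMacosVersion` are hypothetical names, and the precondition that the real function enforces with `CHECK_GE(kernel_major, 4)` is left to the caller here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the runtime's MacosVersion type (assumption).
struct MacosVersionSketch {
  std::uint16_t major;
  std::uint16_t minor;
};

// Maps a Darwin kernel major version to a macOS version, per the log above:
//   Darwin 4-19 -> macOS 10.(kernel_major - 4)
//   Darwin 20+  -> macOS (11 + kernel_major - 20).0
// Precondition (not checked in this sketch): kernel_major >= 4.
constexpr MacosVersionSketch AlignedMacosVersion(std::uint16_t kernel_major) {
  return kernel_major < 20
             ? MacosVersionSketch{10,
                                  static_cast<std::uint16_t>(kernel_major - 4)}
             : MacosVersionSketch{
                   static_cast<std::uint16_t>(11 + kernel_major - 20), 0};
}

// Darwin 19 sits on the last 10.x release; Darwin 20 starts the 11+ scheme.
static_assert(AlignedMacosVersion(19).major == 10, "Darwin 19 is macOS 10.x");
static_assert(AlignedMacosVersion(19).minor == 15, "Darwin 19 is macOS 10.15");
static_assert(AlignedMacosVersion(20).major == 11, "Darwin 20 is macOS 11");
```

Writing the mapping as a `constexpr` function is a choice made for this sketch so the boundary cases can be checked at compile time; the committed code computes the same values at run time.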
Reviewed By: delcypher Differential Revision: https://reviews.llvm.org/D82918 Added: Modified: compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp compiler-rt/lib/sanitizer_common/tests/sanitizer_mac_test.cpp Removed: ################################################################################ diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp index c22e7517fc6f..7a3dfbcc2760 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp @@ -606,12 +606,22 @@ HandleSignalMode GetHandleSignalMode(int signum) { return result; } +// This corresponds to Triple::getMacOSXVersion() in the Clang driver. static MacosVersion GetMacosAlignedVersionInternal() { u16 kernel_major = GetDarwinKernelVersion().major; - const u16 version_offset = 4; - CHECK_GE(kernel_major, version_offset); - u16 macos_major = kernel_major - version_offset; - return MacosVersion(10, macos_major); + // Darwin 0-3 -> unsupported + // Darwin 4-19 -> macOS 10.x + // Darwin 20+ -> macOS 11+ + CHECK_GE(kernel_major, 4); + u16 major, minor; + if (kernel_major < 20) { + major = 10; + minor = kernel_major - 4; + } else { + major = 11 + kernel_major - 20; + minor = 0; + } + return MacosVersion(major, minor); } static_assert(sizeof(MacosVersion) == sizeof(atomic_uint32_t::Type), diff --git a/compiler-rt/lib/sanitizer_common/tests/sanitizer_mac_test.cpp b/compiler-rt/lib/sanitizer_common/tests/sanitizer_mac_test.cpp index b327ba96e223..c8658ea55d03 100644 --- a/compiler-rt/lib/sanitizer_common/tests/sanitizer_mac_test.cpp +++ b/compiler-rt/lib/sanitizer_common/tests/sanitizer_mac_test.cpp @@ -24,8 +24,12 @@ namespace __sanitizer { TEST(SanitizerMac, GetMacosAlignedVersion) { MacosVersion vers = GetMacosAlignedVersion(); - EXPECT_EQ(vers.major, 10); - EXPECT_EQ(vers.minor, GetDarwinKernelVersion().major - 4); + u16 kernel_major = GetDarwinKernelVersion().major; + bool macos_11 = 
(kernel_major >= 20); + u16 expected_major = macos_11 ? (kernel_major - 9) : 10; + u16 expected_minor = macos_11 ? 0 : (kernel_major - 4); + EXPECT_EQ(vers.major, expected_major); + EXPECT_EQ(vers.minor, expected_minor); } void ParseVersion(const char *vers, u16 *major, u16 *minor); From llvm-commits at lists.llvm.org Thu Jul 9 17:28:11 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:28:11 +0000 (UTC) Subject: [PATCH] D83519: [NewPM] Support optnone under new pass manager In-Reply-To: References: Message-ID: <07f39f202972dfe3681fe23c5bda2f45@localhost.localdomain> aeubanks marked an inline comment as done. aeubanks added inline comments. ================ Comment at: llvm/include/llvm/IR/PassInstrumentation.h:150 for (auto &C : Callbacks->BeforePassCallbacks) - ShouldRun &= C(Pass.name(), llvm::Any(&IR)); + ShouldRun &= C(Pass.name(), Pass.isRequired(), llvm::Any(&IR)); return ShouldRun; ---------------- ychen wrote: > Could we do this instead, to avoid changing the callback API? > `ShouldRun &= C(Pass.name(), llvm::Any(&IR)) || Pass.isRequired();` Each pass instrumentation should decide whether to run the pass based on whether the pass is required or optional. An optional pass may still be run (which should be the case in the vast majority of instances). For example, the optnone instrumentation would only care whether a pass is required if it sees that a function is marked optnone. Similarly, opt-bisect would only care whether a pass is required once it has hit the bisect limit. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83519/new/ https://reviews.llvm.org/D83519 From llvm-commits at lists.llvm.org Thu Jul 9 17:33:08 2020 From: llvm-commits at lists.llvm.org (Joel E. 
Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:33:08 +0000 (UTC) Subject: [PATCH] D82203: [FileCheck] Implement -dump-input-context In-Reply-To: References: Message-ID: <47f78afadec3c27c3be71a7f5167ed11@localhost.localdomain> jdenny added a comment. I have no more simplifications planned, and there are no outstanding dependencies. This patch is ready for a review. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82203/new/ https://reviews.llvm.org/D82203 From llvm-commits at lists.llvm.org Thu Jul 9 17:37:41 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:37:41 +0000 (UTC) Subject: [PATCH] D83525: Remove optnone from FullUnroll.ll In-Reply-To: References: Message-ID: echristo added a comment. I may need to comment it better, but in this case part of it is that it's designed to only have the forced loop unroller run on it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83525/new/ https://reviews.llvm.org/D83525 From llvm-commits at lists.llvm.org Thu Jul 9 17:37:42 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Thu, 9 Jul 2020 17:37:42 -0700 Subject: [llvm] 613f12d - [AArch64][GlobalISel] Set the current debug loc when missing in some cases. In-Reply-To: <44C39B43-D853-462D-81D7-47FE1500197C@apple.com> References: <5ea15350.1c69fb81.e6a9c.69b0@mx.google.com> <44C39B43-D853-462D-81D7-47FE1500197C@apple.com> Message-ID: Thanks! On Thu, Jul 9, 2020 at 5:24 PM Amara Emerson wrote: > > I’ve added some more tests in ce22527c0c7a (except moreElementsVector which I don’t think arm64 can currently trigger) > > Amara > > > On Jul 6, 2020, at 3:05 PM, Amara Emerson wrote: > > > > Hi David, > > > > Sorry I missed this. I’ll try get round to it this week. 
> > > > Cheers, > > Amara > > > >> On Jul 6, 2020, at 12:22 PM, David Blaikie wrote: > >> > >> Ping > >> > >> On Mon, Apr 27, 2020 at 9:17 PM David Blaikie wrote: > >>> > >>> Could you add more precise tests that validate the specific debug locations that are present on the resulting instructions are the desired ones? > >>> > >>> On Thu, Apr 23, 2020 at 1:36 AM Amara Emerson via llvm-commits wrote: > >>>> > >>>> > >>>> Author: Amara Emerson > >>>> Date: 2020-04-23T01:34:57-07:00 > >>>> New Revision: 613f12dd8e2403f5630ab299d2a1bb2cb111ead1 > >>>> > >>>> URL: https://github.com/llvm/llvm-project/commit/613f12dd8e2403f5630ab299d2a1bb2cb111ead1 > >>>> DIFF: https://github.com/llvm/llvm-project/commit/613f12dd8e2403f5630ab299d2a1bb2cb111ead1.diff > >>>> > >>>> LOG: [AArch64][GlobalISel] Set the current debug loc when missing in some cases. > >>>> > >>>> Added: > >>>> > >>>> > >>>> Modified: > >>>> llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp > >>>> llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp > >>>> llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir > >>>> llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir > >>>> llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir > >>>> > >>>> Removed: > >>>> > >>>> > >>>> > >>>> ################################################################################ > >>>> diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp > >>>> index 47c723cbf5a3..09e303eadd49 100644 > >>>> --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp > >>>> +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp > >>>> @@ -570,7 +570,7 @@ llvm::createMemLibcall(MachineIRBuilder &MIRBuilder, MachineRegisterInfo &MRI, > >>>> } > >>>> const char *Name = TLI.getLibcallName(RTLibcall); > >>>> > >>>> - MIRBuilder.setInstr(MI); > >>>> + MIRBuilder.setInstrAndDebugLoc(MI); > >>>> > >>>> CallLowering::CallLoweringInfo Info; > >>>> Info.CallConv = 
TLI.getLibcallCallingConv(RTLibcall); > >>>> @@ -3610,7 +3610,7 @@ LegalizerHelper::moreElementsVectorPhi(MachineInstr &MI, unsigned TypeIdx, > >>>> LegalizerHelper::LegalizeResult > >>>> LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx, > >>>> LLT MoreTy) { > >>>> - MIRBuilder.setInstr(MI); > >>>> + MIRBuilder.setInstrAndDebugLoc(MI); > >>>> unsigned Opc = MI.getOpcode(); > >>>> switch (Opc) { > >>>> case TargetOpcode::G_IMPLICIT_DEF: > >>>> > >>>> diff --git a/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp > >>>> index 60ccb3621a2e..cae5028f1925 100644 > >>>> --- a/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp > >>>> +++ b/llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp > >>>> @@ -675,7 +675,7 @@ bool AArch64LegalizerInfo::legalizeShlAshrLshr( > >>>> if (Amount > 31) > >>>> return true; // This will have to remain a register variant. > >>>> assert(MRI.getType(AmtReg).getSizeInBits() == 32); > >>>> - MIRBuilder.setInstr(MI); > >>>> + MIRBuilder.setInstrAndDebugLoc(MI); > >>>> auto ExtCst = MIRBuilder.buildZExt(LLT::scalar(64), AmtReg); > >>>> MI.getOperand(2).setReg(ExtCst.getReg(0)); > >>>> return true; > >>>> @@ -704,7 +704,7 @@ bool AArch64LegalizerInfo::legalizeLoadStore( > >>>> return false; > >>>> } > >>>> > >>>> - MIRBuilder.setInstr(MI); > >>>> + MIRBuilder.setInstrAndDebugLoc(MI); > >>>> unsigned PtrSize = ValTy.getElementType().getSizeInBits(); > >>>> const LLT NewTy = LLT::vector(ValTy.getNumElements(), PtrSize); > >>>> auto &MMO = **MI.memoperands_begin(); > >>>> @@ -722,7 +722,7 @@ bool AArch64LegalizerInfo::legalizeLoadStore( > >>>> bool AArch64LegalizerInfo::legalizeVaArg(MachineInstr &MI, > >>>> MachineRegisterInfo &MRI, > >>>> MachineIRBuilder &MIRBuilder) const { > >>>> - MIRBuilder.setInstr(MI); > >>>> + MIRBuilder.setInstrAndDebugLoc(MI); > >>>> MachineFunction &MF = MIRBuilder.getMF(); > >>>> Align Alignment(MI.getOperand(2).getImm()); > >>>> Register Dst = 
MI.getOperand(0).getReg(); > >>>> > >>>> diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir > >>>> index 6d50898117cd..5b32fd51f58c 100644 > >>>> --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir > >>>> +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store-vector-of-ptr.mir > >>>> @@ -1,5 +1,6 @@ > >>>> # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py > >>>> # RUN: llc -O0 -march=aarch64 -run-pass=legalizer %s -o - | FileCheck %s > >>>> +# RUN: llc -O0 -debugify-and-strip-all-safe -march=aarch64 -run-pass=legalizer -verify-machineinstrs %s -o - | FileCheck %s > >>>> --- | > >>>> target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" > >>>> target triple = "aarch64" > >>>> > >>>> diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir > >>>> index 7ccb5166e4a7..dc42d603d737 100644 > >>>> --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir > >>>> +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir > >>>> @@ -1,5 +1,6 @@ > >>>> # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py > >>>> # RUN: llc -O0 -march=aarch64 -run-pass=legalizer %s -o - | FileCheck %s > >>>> +# RUN: llc -O0 -debugify-and-strip-all-safe -march=aarch64 -run-pass=legalizer -verify-machineinstrs %s -o - | FileCheck %s > >>>> --- > >>>> name: test_shift > >>>> body: | > >>>> > >>>> diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir > >>>> index fe1d5a5002c9..7446fde7ba08 100644 > >>>> --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir > >>>> +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-vaarg.mir > >>>> @@ -1,5 +1,5 @@ > >>>> # NOTE: Assertions have been autogenerated by 
utils/update_mir_test_checks.py > >>>> -# RUN: llc -O0 -run-pass=legalizer %s -o - | FileCheck %s > >>>> +# RUN: llc -O0 -run-pass=legalizer --debugify-and-strip-all-safe --debugify-level=locations %s -o - | FileCheck %s > >>>> > >>>> --- | > >>>> target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> llvm-commits mailing list > >>>> llvm-commits at lists.llvm.org > >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits > > > From llvm-commits at lists.llvm.org Thu Jul 9 17:39:44 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:39:44 +0000 (UTC) Subject: [PATCH] D83247: [compiler-rt][asan][hwasan] Refactor shadow setup into sanitizer_common (NFCI) In-Reply-To: References: Message-ID: <5a5f1252611bcc1defd944a240551f5f@localhost.localdomain> vitalybuka added inline comments. ================ Comment at: compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp:1083 + largest_gap_found, max_occupied_addr); + uptr new_max_vm = RoundDownTo(largest_gap_found << SHADOW_SCALE, alignment); + if (new_max_vm < max_occupied_addr) { ---------------- tejohnson wrote: > vitalybuka wrote: > > SHADOW_SCALE is undefined here > For this and other comments on the win/mac version, obviously I didn't do a very careful job of moving this code from asan to sanitizer_common. =( Will work on fixing those and need to find a way to at least compile these codes for those platforms to flush out these issues. If you have any pointers on setting up a cross compile of this code on a linux system please let me know. > > I have setup to compile on Win. I can try updated patch and fix it up if necessary. Unfortunately I have no OSX but it should be possible to find someone on the team. Still there are other platforms where we can only rely on build bots. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83247/new/ https://reviews.llvm.org/D83247 From llvm-commits at lists.llvm.org Thu Jul 9 17:55:57 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 00:55:57 +0000 (UTC) Subject: [PATCH] D83499: [MSAN runtime] Add poison_stack function that also updates origin In-Reply-To: References: Message-ID: <315876800be7d4872d3a15d7ecd31356@localhost.localdomain> vitalybuka accepted this revision. vitalybuka added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/test/Instrumentation/MemorySanitizer/alloca.ll:9 + +; RUN: opt < %s -msan-check-access-address=0 -msan-poison-stack-with-call=1 \ +; RUN: -msan-track-origins=1 -S -passes=msan 2>&1 | FileCheck %s "--check-prefixes=CHECK,CALL-ORIGIN" ---------------- The flag order is inconsistent. Also, the line breaks with \ are unnecessary; long lines should be accepted by clang-format -style file in tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83499/new/ https://reviews.llvm.org/D83499 From llvm-commits at lists.llvm.org Thu Jul 9 18:01:53 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via llvm-commits) Date: Thu, 09 Jul 2020 18:01:53 -0700 (PDT) Subject: [llvm] 57f2a78 - [StackSafety,NFC] Reduce FunctionSummary size Message-ID: <5f07be01.1c69fb81.5e469.abcd@mx.google.com> Author: Vitaly Buka Date: 2020-07-09T18:01:39-07:00 New Revision: 57f2a789ca074165baa1c8eea3332007477c9f91 URL: https://github.com/llvm/llvm-project/commit/57f2a789ca074165baa1c8eea3332007477c9f91 DIFF: https://github.com/llvm/llvm-project/commit/57f2a789ca074165baa1c8eea3332007477c9f91.diff LOG: [StackSafety,NFC] Reduce FunctionSummary size Most compiler invocations will not need ParamAccess, so we can optimize memory usage there with a smaller unique_ptr instead of an empty vector. Suggested in D80908 review.
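The memory argument in this log message can be illustrated with a small standalone sketch. This is not the LLVM code: `Summary` and `ParamRecord` are hypothetical stand-ins, and the sketch only assumes the usual standard-library layout in which an empty `std::vector` still occupies several pointers while a null `std::unique_ptr` occupies one.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-parameter record; not the real LLVM type.
struct ParamRecord {
  int ParamNo;
};

// Hypothetical summary object that rarely carries any records.
class Summary {
  using RecordsTy = std::vector<ParamRecord>;
  // Null when there are no records, so the common case pays for one pointer
  // instead of a whole (empty) vector object.
  std::unique_ptr<RecordsTy> Records;

public:
  explicit Summary(RecordsTy Params = {}) { setRecords(std::move(Params)); }

  // Readers never see the null state: they get an empty vector instead.
  const RecordsTy &records() const {
    static const RecordsTy Empty;
    return Records ? *Records : Empty;
  }

  // Allocate only for non-empty input; release storage on empty input.
  void setRecords(RecordsTy New) {
    if (New.empty())
      Records.reset();
    else if (Records)
      *Records = std::move(New);
    else
      Records = std::make_unique<RecordsTy>(std::move(New));
  }

  bool hasAllocation() const { return Records != nullptr; }
};
```

The trade-off in this sketch is one extra indirection and one heap allocation for the rare summary that does carry records, in exchange for a smaller object in the common case.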
Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D83458 Added: Modified: llvm/include/llvm/IR/ModuleSummaryIndex.h Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h index 9adaf5dfc3d3..4cfd4e916200 100644 --- a/llvm/include/llvm/IR/ModuleSummaryIndex.h +++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -629,7 +629,8 @@ class FunctionSummary : public GlobalValueSummary { std::unique_ptr TIdInfo; /// Uses for every parameter to this function. - std::vector ParamAccesses; + using ParamAccessesTy = std::vector; + std::unique_ptr ParamAccesses; public: FunctionSummary(GVFlags Flags, unsigned NumInsts, FFlags FunFlags, @@ -640,19 +641,20 @@ class FunctionSummary : public GlobalValueSummary { std::vector TypeCheckedLoadVCalls, std::vector TypeTestAssumeConstVCalls, std::vector TypeCheckedLoadConstVCalls, - std::vector ParamAccesses) + std::vector Params) : GlobalValueSummary(FunctionKind, Flags, std::move(Refs)), InstCount(NumInsts), FunFlags(FunFlags), EntryCount(EntryCount), - CallGraphEdgeList(std::move(CGEdges)), - ParamAccesses(std::move(ParamAccesses)) { + CallGraphEdgeList(std::move(CGEdges)) { if (!TypeTests.empty() || !TypeTestAssumeVCalls.empty() || !TypeCheckedLoadVCalls.empty() || !TypeTestAssumeConstVCalls.empty() || !TypeCheckedLoadConstVCalls.empty()) - TIdInfo = std::make_unique(TypeIdInfo{ - std::move(TypeTests), std::move(TypeTestAssumeVCalls), - std::move(TypeCheckedLoadVCalls), - std::move(TypeTestAssumeConstVCalls), - std::move(TypeCheckedLoadConstVCalls)}); + TIdInfo = std::make_unique( + TypeIdInfo{std::move(TypeTests), std::move(TypeTestAssumeVCalls), + std::move(TypeCheckedLoadVCalls), + std::move(TypeTestAssumeConstVCalls), + std::move(TypeCheckedLoadConstVCalls)}); + if (!Params.empty()) + ParamAccesses = std::make_unique(std::move(Params)); } // Gets the number of readonly and writeonly 
refs in RefEdgeList std::pair specialRefCounts() const; @@ -724,11 +726,20 @@ class FunctionSummary : public GlobalValueSummary { } /// Returns the list of known uses of pointer parameters. - ArrayRef paramAccesses() const { return ParamAccesses; } + ArrayRef paramAccesses() const { + if (ParamAccesses) + return *ParamAccesses; + return {}; + } /// Sets the list of known uses of pointer parameters. void setParamAccesses(std::vector NewParams) { - ParamAccesses = std::move(NewParams); + if (NewParams.empty()) + ParamAccesses.reset(); + else if (ParamAccesses) + *ParamAccesses = std::move(NewParams); + else + ParamAccesses = std::make_unique(std::move(NewParams)); } /// Add a type test to the summary. This is used by WholeProgramDevirt if we From llvm-commits at lists.llvm.org Thu Jul 9 18:02:01 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:02:01 +0000 (UTC) Subject: [PATCH] D83458: [StackSafety,NFC] Reduce FunctionSummary size In-Reply-To: References: Message-ID: <3d04bbe729f2e9c0cfd1d81ef60f5105@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG57f2a789ca07: [StackSafety,NFC] Reduce FunctionSummary size (authored by vitalybuka). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83458/new/ https://reviews.llvm.org/D83458 Files: llvm/include/llvm/IR/ModuleSummaryIndex.h Index: llvm/include/llvm/IR/ModuleSummaryIndex.h =================================================================== --- llvm/include/llvm/IR/ModuleSummaryIndex.h +++ llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -629,7 +629,8 @@ std::unique_ptr TIdInfo; /// Uses for every parameter to this function. 
- std::vector ParamAccesses; + using ParamAccessesTy = std::vector; + std::unique_ptr ParamAccesses; public: FunctionSummary(GVFlags Flags, unsigned NumInsts, FFlags FunFlags, @@ -640,19 +641,20 @@ std::vector TypeCheckedLoadVCalls, std::vector TypeTestAssumeConstVCalls, std::vector TypeCheckedLoadConstVCalls, - std::vector ParamAccesses) + std::vector Params) : GlobalValueSummary(FunctionKind, Flags, std::move(Refs)), InstCount(NumInsts), FunFlags(FunFlags), EntryCount(EntryCount), - CallGraphEdgeList(std::move(CGEdges)), - ParamAccesses(std::move(ParamAccesses)) { + CallGraphEdgeList(std::move(CGEdges)) { if (!TypeTests.empty() || !TypeTestAssumeVCalls.empty() || !TypeCheckedLoadVCalls.empty() || !TypeTestAssumeConstVCalls.empty() || !TypeCheckedLoadConstVCalls.empty()) - TIdInfo = std::make_unique(TypeIdInfo{ - std::move(TypeTests), std::move(TypeTestAssumeVCalls), - std::move(TypeCheckedLoadVCalls), - std::move(TypeTestAssumeConstVCalls), - std::move(TypeCheckedLoadConstVCalls)}); + TIdInfo = std::make_unique( + TypeIdInfo{std::move(TypeTests), std::move(TypeTestAssumeVCalls), + std::move(TypeCheckedLoadVCalls), + std::move(TypeTestAssumeConstVCalls), + std::move(TypeCheckedLoadConstVCalls)}); + if (!Params.empty()) + ParamAccesses = std::make_unique(std::move(Params)); } // Gets the number of readonly and writeonly refs in RefEdgeList std::pair specialRefCounts() const; @@ -724,11 +726,20 @@ } /// Returns the list of known uses of pointer parameters. - ArrayRef paramAccesses() const { return ParamAccesses; } + ArrayRef paramAccesses() const { + if (ParamAccesses) + return *ParamAccesses; + return {}; + } /// Sets the list of known uses of pointer parameters. void setParamAccesses(std::vector NewParams) { - ParamAccesses = std::move(NewParams); + if (NewParams.empty()) + ParamAccesses.reset(); + else if (ParamAccesses) + *ParamAccesses = std::move(NewParams); + else + ParamAccesses = std::make_unique(std::move(NewParams)); } /// Add a type test to the summary. 
This is used by WholeProgramDevirt if we -------------- next part -------------- A non-text attachment was scrubbed... Name: D83458.276889.patch Type: text/x-patch Size: 2937 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 18:02:24 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:02:24 +0000 (UTC) Subject: [PATCH] D83525: Remove optnone from FullUnroll.ll In-Reply-To: References: Message-ID: aeubanks added a comment. In D83525#2143062 , @echristo wrote: > I may need to comment it better, but in this case part of it is that it's designed to only have the forced loop unroller run on it. Ah I see, thanks for the explanation Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83525/new/ https://reviews.llvm.org/D83525 From llvm-commits at lists.llvm.org Thu Jul 9 18:07:41 2020 From: llvm-commits at lists.llvm.org (Bill Wendling via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:07:41 +0000 (UTC) Subject: [PATCH] D83523: MachineSink: permit sinking into INLINEASM_BR indirect targets In-Reply-To: References: Message-ID: void added a comment. I'm confused by this change. The original code *should* be correct. That it's resulting in this issue is another thing. My first question is, why are we trying to sink instructions into the entry block? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83523/new/ https://reviews.llvm.org/D83523 From llvm-commits at lists.llvm.org Thu Jul 9 18:21:30 2020 From: llvm-commits at lists.llvm.org (Alexander Shaposhnikov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:21:30 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <25edab6129c0bc3770d470abd9840770@localhost.localdomain> alexshap added inline comments. 
================ Comment at: llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp:34 +enum Operation { Static }; +static cl::opt LibraryOperation( ---------------- enum class + i would add a blank line between the lines 34 and 35 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 From llvm-commits at lists.llvm.org Thu Jul 9 18:23:45 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:23:45 +0000 (UTC) Subject: [PATCH] D77341: [DomTree] Replace ChildrenGetter with GraphTraits over GraphDiff. In-Reply-To: References: Message-ID: <3a92b9b22118466058e74047ad300c70@localhost.localdomain> asbirlea updated this revision to Diff 276890. asbirlea added a comment. This revision is now accepted and ready to land. Updated to include the part of the patch that's moving the Updates to a CFGDiff object. Splitting off from the clean-up work merging the two branches when BUI is null. This patch does not exhibit the compile-time regression which caused it to be reverted previously. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77341/new/ https://reviews.llvm.org/D77341 Files: llvm/include/llvm/IR/Dominators.h llvm/include/llvm/Support/CFGDiff.h llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/IR/Dominators.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D77341.276890.patch Type: text/x-patch Size: 19105 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 18:29:43 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:29:43 +0000 (UTC) Subject: [PATCH] D77341: [DomTree] Replace ChildrenGetter with GraphTraits over GraphDiff. 
In-Reply-To: References: Message-ID: <933225f82a44a7a689725887b72382be@localhost.localdomain> asbirlea updated this revision to Diff 276891. asbirlea added a comment. Nit: re-add `const`s Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77341/new/ https://reviews.llvm.org/D77341 Files: llvm/include/llvm/IR/Dominators.h llvm/include/llvm/Support/CFGDiff.h llvm/include/llvm/Support/GenericDomTree.h llvm/include/llvm/Support/GenericDomTreeConstruction.h llvm/lib/IR/Dominators.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D77341.276891.patch Type: text/x-patch Size: 18743 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 18:43:36 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:43:36 +0000 (UTC) Subject: [PATCH] D83519: [NewPM] Support optnone under new pass manager In-Reply-To: References: Message-ID: <776e29c11c20f65196bb95282b3f6ed3@localhost.localdomain> aeubanks marked an inline comment as done. aeubanks added inline comments. ================ Comment at: llvm/include/llvm/IR/PassInstrumentation.h:150 for (auto &C : Callbacks->BeforePassCallbacks) - ShouldRun &= C(Pass.name(), llvm::Any(&IR)); + ShouldRun &= C(Pass.name(), Pass.isRequired(), llvm::Any(&IR)); return ShouldRun; ---------------- aeubanks wrote: > ychen wrote: > > Could we do this to not changing the callback API? > > `ShouldRun &= C(Pass.name(), llvm::Any(&IR)) || Pass.isRequired();` > Each pass instrumentation should decide whether or not to run the pass based on whether or not the pass is required or optional. An optional pass may still be run, (which should be the case for the vast majority of instances). > > For example, the optnone would only care if a pass is required or not if it sees that a function is marked optnone. > Similarly, opt-bisect would only care if a pass is required if it's hit the bisect limit. 
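The design point in the quoted comment, that each instrumentation receives the required/optional flag and decides for itself, could be sketched roughly as below. This is a toy model rather than the `PassInstrumentation` API: `BeforePassCallback`, `shouldRunPass`, `makeOptNoneCallback` and `makeBisectCallback` are hypothetical names, and the bisect counting rule is an assumption made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical callback type: returns true if the pass should run.
using BeforePassCallback =
    std::function<bool(const std::string &PassName, bool IsRequired)>;

// Every callback is consulted; the pass runs only if all of them agree.
bool shouldRunPass(const std::vector<BeforePassCallback> &Callbacks,
                   const std::string &PassName, bool IsRequired) {
  bool ShouldRun = true;
  for (const auto &C : Callbacks)
    ShouldRun &= C(PassName, IsRequired);
  return ShouldRun;
}

// optnone-style policy: the required flag only matters once the function is
// known to be optnone; otherwise every pass runs.
BeforePassCallback makeOptNoneCallback(bool FunctionIsOptNone) {
  return [FunctionIsOptNone](const std::string &, bool IsRequired) {
    return !FunctionIsOptNone || IsRequired;
  };
}

// bisect-style policy (assumed rule): optional passes are counted and skipped
// after the limit is reached; required passes always run and are not counted.
BeforePassCallback makeBisectCallback(int Limit) {
  auto Seen = std::make_shared<int>(0);
  return [Seen, Limit](const std::string &, bool IsRequired) {
    if (IsRequired)
      return true;
    return ++*Seen <= Limit;
  };
}
```

Under this shape a future instrumentation could still return false for a required pass, which a combined `C(...) || IsRequired` check in the caller would rule out.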
Sorry, now I understand what you mean; the ands and ors confused me. I don't want to rule out the possibility of some future pass instrumentation wanting to skip even a required pass. But I am open to discussion on this point. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83519/new/ https://reviews.llvm.org/D83519 From llvm-commits at lists.llvm.org Thu Jul 9 18:45:38 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:45:38 +0000 (UTC) Subject: [PATCH] D79874: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbs asm instructions In-Reply-To: References: Message-ID: <4afa9559f85cd458dea82a6200376bef@localhost.localdomain> PaoloS marked an inline comment as done. PaoloS added inline comments. ================ Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoB.td:666 +let Predicates = [HasStdExtZbs, IsRV64] in +def : Pat<(and (xor (riscv_sllw 1, GPR:$rs2), -1), GPR:$rs1), + (SBCLR GPR:$rs1, GPR:$rs2)>; ---------------- lewis-revill wrote: > Why does this need to be `riscv_sllw` as opposed to `shl`? Isn't the former intended for matching patterns resulting from a 32 bit operation? Indeed. That happened because some constants in the samples used to discover the patterns were set by default to 32 bit. Fixed, thank you. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79874/new/ https://reviews.llvm.org/D79874 From llvm-commits at lists.llvm.org Thu Jul 9 18:50:24 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:50:24 +0000 (UTC) Subject: [PATCH] D79874: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbs asm instructions In-Reply-To: References: Message-ID: PaoloS updated this revision to Diff 276892. PaoloS added a comment. Added pattern-matching for sbexti, sbclrw, sbsetw, sbinvw and sbextw. Added corresponding codegen tests.
Reorganized the tests so that both 32 and 64 bit files have both 32 and 64 bit versions of each test. Fixed some imprecise patterns due to unnecessary constant truncations. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79874/new/ https://reviews.llvm.org/D79874 Files: llvm/lib/Target/RISCV/RISCVInstrInfoB.td llvm/test/CodeGen/RISCV/rv32Zbs.ll llvm/test/CodeGen/RISCV/rv64Zbs.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79874.276892.patch Type: text/x-patch Size: 14975 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 18:56:07 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:56:07 +0000 (UTC) Subject: [PATCH] D79874: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbs asm instructions In-Reply-To: References: Message-ID: <8f9dfb2eee3815ea095a1e7c71790f50@localhost.localdomain> PaoloS added a comment. I had to exclude the test of the pattern of sbclr_i64 on RV32 because it caused a warning of asm conflicts. Also the cross tests i64 on RV32 are quite noisy here. I'd rather keep them though as they show quite efficient results on many other instructions in other subextensions. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79874/new/ https://reviews.llvm.org/D79874 From llvm-commits at lists.llvm.org Thu Jul 9 18:58:05 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:58:05 +0000 (UTC) Subject: [PATCH] D80975: [SCEV][IndVarSimplify] insert point should not be block front if the front instruction is a PHI In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGf1efb8bb4ba0: [SCEV][IndVarSimplify] insert point should not be block front. (authored by shchenz). 
Changed prior to commit: https://reviews.llvm.org/D80975?vs=268070&id=276893#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80975/new/ https://reviews.llvm.org/D80975 Files: llvm/lib/Transforms/Scalar/IndVarSimplify.cpp llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp llvm/test/Transforms/IndVarSimplify/widen-i32-i8ptr.ll Index: llvm/test/Transforms/IndVarSimplify/widen-i32-i8ptr.ll =================================================================== --- /dev/null +++ llvm/test/Transforms/IndVarSimplify/widen-i32-i8ptr.ll @@ -0,0 +1,24 @@ +; RUN: opt < %s -indvars -S | FileCheck %s + +target datalayout = "e-m:e-i64:64-n32:64" + +define dso_local void @Widen_i32_i8ptr() local_unnamed_addr { +; CHECK-LABEL: @Widen_i32_i8ptr( +; CHECK: phi i8* +; CHECK: phi i32 +entry: + %ptrids = alloca [15 x i8*], align 8 + %arraydecay2032 = getelementptr inbounds [15 x i8*], [15 x i8*]* %ptrids, i64 0, i64 0 + store i8** %arraydecay2032, i8*** inttoptr (i64 8 to i8***), align 8 + br label %for.cond2106 + +for.cond2106: ; preds = %for.cond2106, %entry + %gid.0 = phi i8* [ null, %entry ], [ %incdec.ptr, %for.cond2106 ] + %i.0 = phi i32 [ 0, %entry ], [ %inc2117, %for.cond2106 ] + %incdec.ptr = getelementptr inbounds i8, i8* %gid.0, i64 1 + %idxprom2114 = zext i32 %i.0 to i64 + %arrayidx2115 = getelementptr inbounds [15 x i8*], [15 x i8*]* %ptrids, i64 0, i64 %idxprom2114 + store i8* %gid.0, i8** %arrayidx2115, align 8 + %inc2117 = add nuw nsw i32 %i.0, 1 + br label %for.cond2106 +} Index: llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp =================================================================== --- llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp +++ llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp @@ -1292,7 +1292,8 @@ if (useSubtract) Step = SE.getNegativeSCEV(Step); // Expand the step somewhere that dominates the loop header. 
- Value *StepV = expandCodeFor(Step, IntTy, &L->getHeader()->front()); + Value *StepV = expandCodeFor(Step, IntTy, + &*L->getHeader()->getFirstInsertionPt()); // The no-wrap behavior proved by IsIncrement(NUW|NSW) is only applicable if // we actually do emit an addition. It does not apply if we emit a @@ -1438,7 +1439,8 @@ { // Expand the step somewhere that dominates the loop header. SCEVInsertPointGuard Guard(Builder, this); - StepV = expandCodeFor(Step, IntTy, &L->getHeader()->front()); + StepV = expandCodeFor(Step, IntTy, + &*L->getHeader()->getFirstInsertionPt()); } Result = expandIVInc(PN, StepV, L, ExpandTy, IntTy, useSubtract); } @@ -1870,11 +1872,6 @@ } } - // IndVarSimplify sometimes sets the insertion point at the block start, even - // when there are PHIs at that point. We must correct for this. - if (isa(*InsertPt)) - InsertPt = &*InsertPt->getParent()->getFirstInsertionPt(); - // Check to see if we already expanded this here. auto I = InsertedExpressions.find(std::make_pair(S, InsertPt)); if (I != InsertedExpressions.end()) @@ -1945,7 +1942,8 @@ // Emit code for it. SCEVInsertPointGuard Guard(Builder, this); PHINode *V = - cast(expandCodeFor(H, nullptr, &L->getHeader()->front())); + cast(expandCodeFor(H, nullptr, + &*L->getHeader()->getFirstInsertionPt())); return V; } Index: llvm/lib/Transforms/Scalar/IndVarSimplify.cpp =================================================================== --- llvm/lib/Transforms/Scalar/IndVarSimplify.cpp +++ llvm/lib/Transforms/Scalar/IndVarSimplify.cpp @@ -1435,8 +1435,12 @@ // either find an existing phi or materialize a new one. Either way, we // expect a well-formed cyclic phi-with-increments. i.e. any operand not part // of the phi-SCC dominates the loop entry. 
- Instruction *InsertPt = &L->getHeader()->front(); - WidePhi = cast(Rewriter.expandCodeFor(AddRec, WideType, InsertPt)); + Instruction *InsertPt = &*L->getHeader()->getFirstInsertionPt(); + WidePhi = dyn_cast(Rewriter.expandCodeFor(AddRec, WideType, InsertPt)); + // If the wide phi is not a phi node, for example a cast node, like bitcast, + // inttoptr, ptrtoint, just skip for now. + if (!WidePhi) + return nullptr; // Remembering the WideIV increment generated by SCEVExpander allows // widenIVUse to reuse it when widening the narrow IV's increment. We don't -------------- next part -------------- A non-text attachment was scrubbed... Name: D80975.276893.patch Type: text/x-patch Size: 4191 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 18:58:05 2020 From: llvm-commits at lists.llvm.org (Chen Zheng via llvm-commits) Date: Thu, 09 Jul 2020 18:58:05 -0700 (PDT) Subject: [llvm] f1efb8b - [SCEV][IndVarSimplify] insert point should not be block front. Message-ID: <5f07cb2d.1c69fb81.ef764.af33@mx.google.com> Author: Chen Zheng Date: 2020-07-09T21:56:57-04:00 New Revision: f1efb8bb4ba0584a9b994f3404a2c62920ce6652 URL: https://github.com/llvm/llvm-project/commit/f1efb8bb4ba0584a9b994f3404a2c62920ce6652 DIFF: https://github.com/llvm/llvm-project/commit/f1efb8bb4ba0584a9b994f3404a2c62920ce6652.diff LOG: [SCEV][IndVarSimplify] insert point should not be block front. The block front may be a PHI node, inserting a cast instructions like BitCast, PtrToInt, IntToPtr among PHIs is not right. 
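Why the block front is the wrong insertion point can be shown with a toy model. The sketch below does not use LLVM types: a block is modelled as a vector of hypothetical `Kind` values, and `firstInsertionIndex` only skips PHIs (the real `getFirstInsertionPt` also skips other leading instructions such as EH pads).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for a toy basic block.
enum class Kind { Phi, Cast, Add, Branch };

// Index of the first non-PHI instruction, i.e. the earliest legal position
// for a new non-PHI instruction in this toy model.
std::size_t firstInsertionIndex(const std::vector<Kind> &Block) {
  std::size_t I = 0;
  while (I < Block.size() && Block[I] == Kind::Phi)
    ++I;
  return I;
}

// Well-formedness rule being protected: all PHIs form one leading group.
bool phisAreGrouped(const std::vector<Kind> &Block) {
  bool SeenNonPhi = false;
  for (Kind K : Block) {
    if (K == Kind::Phi) {
      if (SeenNonPhi)
        return false; // A PHI after a non-PHI breaks the rule.
    } else {
      SeenNonPhi = true;
    }
  }
  return true;
}

// Returns a copy of Block with K inserted before position Index.
std::vector<Kind> insertAt(std::vector<Kind> Block, std::size_t Index, Kind K) {
  Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(Index), K);
  return Block;
}
```

Inserting a `Cast` at index 0 of a block that starts with PHIs leaves a cast ahead of the PHI group, while inserting at `firstInsertionIndex` keeps the group intact.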
Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D80975 Added: llvm/test/Transforms/IndVarSimplify/widen-i32-i8ptr.ll Modified: llvm/lib/Transforms/Scalar/IndVarSimplify.cpp llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp index f6a0b6ea4637..0357d905fde5 100644 --- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp +++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp @@ -1435,8 +1435,12 @@ PHINode *WidenIV::createWideIV(SCEVExpander &Rewriter) { // either find an existing phi or materialize a new one. Either way, we // expect a well-formed cyclic phi-with-increments. i.e. any operand not part // of the phi-SCC dominates the loop entry. - Instruction *InsertPt = &L->getHeader()->front(); - WidePhi = cast(Rewriter.expandCodeFor(AddRec, WideType, InsertPt)); + Instruction *InsertPt = &*L->getHeader()->getFirstInsertionPt(); + WidePhi = dyn_cast(Rewriter.expandCodeFor(AddRec, WideType, InsertPt)); + // If the wide phi is not a phi node, for example a cast node, like bitcast, + // inttoptr, ptrtoint, just skip for now. + if (!WidePhi) + return nullptr; // Remembering the WideIV increment generated by SCEVExpander allows // widenIVUse to reuse it when widening the narrow IV's increment. We don't diff --git a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp index 71b48482f26a..c54ae26b5323 100644 --- a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp +++ b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp @@ -1292,7 +1292,8 @@ SCEVExpander::getAddRecExprPHILiterally(const SCEVAddRecExpr *Normalized, if (useSubtract) Step = SE.getNegativeSCEV(Step); // Expand the step somewhere that dominates the loop header. 
- Value *StepV = expandCodeFor(Step, IntTy, &L->getHeader()->front()); + Value *StepV = expandCodeFor(Step, IntTy, + &*L->getHeader()->getFirstInsertionPt()); // The no-wrap behavior proved by IsIncrement(NUW|NSW) is only applicable if // we actually do emit an addition. It does not apply if we emit a @@ -1438,7 +1439,8 @@ Value *SCEVExpander::expandAddRecExprLiterally(const SCEVAddRecExpr *S) { { // Expand the step somewhere that dominates the loop header. SCEVInsertPointGuard Guard(Builder, this); - StepV = expandCodeFor(Step, IntTy, &L->getHeader()->front()); + StepV = expandCodeFor(Step, IntTy, + &*L->getHeader()->getFirstInsertionPt()); } Result = expandIVInc(PN, StepV, L, ExpandTy, IntTy, useSubtract); } @@ -1870,11 +1872,6 @@ Value *SCEVExpander::expand(const SCEV *S) { } } - // IndVarSimplify sometimes sets the insertion point at the block start, even - // when there are PHIs at that point. We must correct for this. - if (isa(*InsertPt)) - InsertPt = &*InsertPt->getParent()->getFirstInsertionPt(); - // Check to see if we already expanded this here. auto I = InsertedExpressions.find(std::make_pair(S, InsertPt)); if (I != InsertedExpressions.end()) @@ -1945,7 +1942,8 @@ SCEVExpander::getOrInsertCanonicalInductionVariable(const Loop *L, // Emit code for it. 
SCEVInsertPointGuard Guard(Builder, this); PHINode *V = - cast(expandCodeFor(H, nullptr, &L->getHeader()->front())); + cast(expandCodeFor(H, nullptr, + &*L->getHeader()->getFirstInsertionPt())); return V; } diff --git a/llvm/test/Transforms/IndVarSimplify/widen-i32-i8ptr.ll b/llvm/test/Transforms/IndVarSimplify/widen-i32-i8ptr.ll new file mode 100644 index 000000000000..80191d4e5b77 --- /dev/null +++ b/llvm/test/Transforms/IndVarSimplify/widen-i32-i8ptr.ll @@ -0,0 +1,24 @@ +; RUN: opt < %s -indvars -S | FileCheck %s + +target datalayout = "e-m:e-i64:64-n32:64" + +define dso_local void @Widen_i32_i8ptr() local_unnamed_addr { +; CHECK-LABEL: @Widen_i32_i8ptr( +; CHECK: phi i8* +; CHECK: phi i32 +entry: + %ptrids = alloca [15 x i8*], align 8 + %arraydecay2032 = getelementptr inbounds [15 x i8*], [15 x i8*]* %ptrids, i64 0, i64 0 + store i8** %arraydecay2032, i8*** inttoptr (i64 8 to i8***), align 8 + br label %for.cond2106 + +for.cond2106: ; preds = %for.cond2106, %entry + %gid.0 = phi i8* [ null, %entry ], [ %incdec.ptr, %for.cond2106 ] + %i.0 = phi i32 [ 0, %entry ], [ %inc2117, %for.cond2106 ] + %incdec.ptr = getelementptr inbounds i8, i8* %gid.0, i64 1 + %idxprom2114 = zext i32 %i.0 to i64 + %arrayidx2115 = getelementptr inbounds [15 x i8*], [15 x i8*]* %ptrids, i64 0, i64 %idxprom2114 + store i8* %gid.0, i8** %arrayidx2115, align 8 + %inc2117 = add nuw nsw i32 %i.0, 1 + br label %for.cond2106 +} From llvm-commits at lists.llvm.org Thu Jul 9 18:59:25 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 01:59:25 +0000 (UTC) Subject: [PATCH] D80975: [SCEV][IndVarSimplify] insert point should not be block front if the front instruction is a PHI In-Reply-To: References: Message-ID: shchenz added a comment. I will let this patch commit first since it was already approved by @lebedev.ri . 
Welcome your post commit comments @reames @mkazantsev Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80975/new/ https://reviews.llvm.org/D80975 From llvm-commits at lists.llvm.org Thu Jul 9 19:24:29 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 02:24:29 +0000 (UTC) Subject: [PATCH] D79870: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions In-Reply-To: References: Message-ID: <64111ff50177079aac66e43e55b365eb@localhost.localdomain> PaoloS marked an inline comment as done. PaoloS added inline comments. ================ Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoB.td:641 +def SROIPat : ComplexPattern; +def SLOIWPat : ComplexPattern; +def SROIWPat : ComplexPattern; ---------------- lewis-revill wrote: > Can these W selects be guarded for 64 bit only? Not sure how to do it, they can't be enclosed in Predicates like the instruction patterns. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79870/new/ https://reviews.llvm.org/D79870 From llvm-commits at lists.llvm.org Thu Jul 9 19:27:00 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 02:27:00 +0000 (UTC) Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header In-Reply-To: References: Message-ID: <80e47595ca446114f74b9833d50525fa@localhost.localdomain> hubert.reinterpretcast added inline comments. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:486 +#define PrintAuxMember(H, S, T, X) \ + W.print##H(S, T); \ ---------------- This macro does not operate within the confines of what a function can do with respect to its caller (it can cause the caller to return early). I do not believe that using a function-like naming style is appropriate. I also do not believe that using such a macro for control flow is desirable. 
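The hazard raised in this comment, a function-like macro that can make its caller return early, can be sketched in isolation. This is a minimal illustration only: `PRINT_FIELD_OR_RETURN`, `dumpWithMacro`, and the field names are hypothetical and are not taken from `XCOFFDumper.cpp`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical macro shaped like the one under review: it reads like an
// ordinary call, but the `return` inside it exits the *enclosing* function.
#define PRINT_FIELD_OR_RETURN(Out, Name, Size, Remaining)                      \
  do {                                                                         \
    (Out).push_back(Name);                                                     \
    if (((Remaining) -= (Size)) == 0)                                          \
      return;                                                                  \
  } while (false)

// Hypothetical dumper: nothing at the call sites hints that the trailing
// "done" marker may never be emitted.
void dumpWithMacro(std::vector<std::string> &Out, std::size_t Remaining) {
  PRINT_FIELD_OR_RETURN(Out, "Magic", 2, Remaining);
  PRINT_FIELD_OR_RETURN(Out, "Version", 2, Remaining);
  Out.push_back("done");
}
```

With `Remaining == 2` the function returns from inside the first macro use, so only `"Magic"` is recorded; a reader of `dumpWithMacro` cannot see that exit without expanding the macro.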
You can encode a table (yes, a macro is okay for that) with much the same information: (format, description, pointer-to-member, offset in the table past-the-end of the member) and use that table in the place where this macro is being invoked. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:491 + W.print##H(S, T); \ + if ((X = X - sizeof(T)) == 0) \ + return ---------------- DiggerLin wrote: > hubert.reinterpretcast wrote: > > This strikes me as extremely hazardous. What if we get a length value that is reflective of a partial field? > thanks We still have to build with C++14 compilers for the time being. Assigning a large 64-bit value to a 32-bit signed type is verboten. In any case, checking the table size against the last field of the table I described above would avoid this issue. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82549/new/ https://reviews.llvm.org/D82549 From llvm-commits at lists.llvm.org Thu Jul 9 19:33:55 2020 From: llvm-commits at lists.llvm.org (Paolo Savini via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 02:33:55 +0000 (UTC) Subject: [PATCH] D79870: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions In-Reply-To: References: Message-ID: PaoloS updated this revision to Diff 276897. PaoloS added a comment. Fixed indentation. Added architecture type control for complex pattern matching of sloiw, sroiw and slliuw. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79870/new/ https://reviews.llvm.org/D79870 Files: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVInstrInfoB.td llvm/test/CodeGen/RISCV/rv32Zbb.ll llvm/test/CodeGen/RISCV/rv64Zbb.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D79870.276897.patch Type: text/x-patch Size: 67050 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 19:47:50 2020 From: llvm-commits at lists.llvm.org (Sameer Sahasrabuddhe via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 02:47:50 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: <535d187696ad43a63c6a2adcf645d407@localhost.localdomain> sameerds added a comment. In D83394#2142277, @foad wrote: > Rebase. > Fix silly mistake in checking for negative offsets. It's hard to see through the rebase, but did fixing the negative offset check add more tests? I am assuming that the tests in the original patch did not capture this mistake, so it should warrant a new test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Thu Jul 9 20:17:39 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 03:17:39 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: <20dcbffe062a695919f16061c2c9dbbb@localhost.localdomain> sameerarora101 updated this revision to Diff 276901. sameerarora101 added a comment.
`enum` -> `enum class` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 Files: llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/create-static-lib.test llvm/test/tools/llvm-libtool-darwin/help-message.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/test/tools/llvm-libtool-darwin/missing-library-type.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83002.276901.patch Type: text/x-patch Size: 12565 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 20:27:02 2020 From: llvm-commits at lists.llvm.org (Jan Vesely via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 03:27:02 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: <9f7672483c1dacf58044f34447960584@localhost.localdomain> jvesely added a comment. What is the problem this patch is trying to address? The specs do not mandate these two values to be different. On the more practical side, this patch only changes the fp32 implementation to return the new value, leaving the fp64 implementation to return `INT_MIN` in both cases. The implementation now returns `FP_ILOGBNAN` even for `Inf` input, which is not correct. The CLC spec doesn't talk about `Inf` inputs, but the libm behaviour is to return `INT_MAX`, which might be useful. If `FP_ILOGBNAN` and `FP_ILOGB0` need to be different, it'd be better to use `FP_ILOGBNAN == INT_MIN` and `FP_ILOGB0 == -INT_MAX`.
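The value choice suggested above can be sketched on the host side. This is plain C++ rather than OpenCL C and is not the libclc implementation; `kIlogb0`, `kIlogbNan`, and `sketchIlogb` are hypothetical names used only for illustration.

```cpp
#include <climits>
#include <cmath>

// Hypothetical constants following the suggestion in the comment above:
// distinct values for the zero and NaN cases.
constexpr int kIlogb0 = -INT_MAX;
constexpr int kIlogbNan = INT_MIN;
static_assert(kIlogb0 != kIlogbNan, "the two special values must differ");

// Hypothetical reference ilogb for float that keeps 0, NaN and Inf apart.
// Returning INT_MAX for Inf mirrors the libm behaviour mentioned above.
int sketchIlogb(float X) {
  if (std::isnan(X))
    return kIlogbNan;
  if (std::isinf(X))
    return INT_MAX;
  if (X == 0.0f) // also true for -0.0f
    return kIlogb0;
  return std::ilogb(X); // unbiased exponent of a finite non-zero value
}
```

Under these assumptions a caller can tell the three special inputs apart from the return value alone, which a single shared `INT_MIN` does not allow.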
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Thu Jul 9 20:28:19 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 03:28:19 +0000 (UTC) Subject: [PATCH] D83526: [FileCheck] In input dump, elide only if ellipsis is shorter In-Reply-To: References: Message-ID: mehdi_amini accepted this revision. mehdi_amini added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/utils/FileCheck/FileCheck.cpp:432 +static void DumpEllipsisOrElidedLines(raw_ostream &OS, std::string &ElidedLines, + unsigned LabelWidth) { ---------------- Maybe you can add a comment to describe this function? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83526/new/ https://reviews.llvm.org/D83526 From llvm-commits at lists.llvm.org Thu Jul 9 20:30:59 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 03:30:59 +0000 (UTC) Subject: [PATCH] D83097: [FileCheck] Implement -dump-input-filter In-Reply-To: References: Message-ID: <0d57ba6256daf65c1b260a67aaa06cb7@localhost.localdomain> mehdi_amini accepted this revision. mehdi_amini added inline comments. This revision is now accepted and ready to land. 
================ Comment at: llvm/utils/FileCheck/FileCheck.cpp:455 + case DumpInputFilterAll: + llvm_unreachable("unexpected DumpInputFilterAll"); + break; ---------------- In a tool like FileCheck I'd rather err on the side of deterministically failing with a `report_fatal_error`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83097/new/ https://reviews.llvm.org/D83097 From llvm-commits at lists.llvm.org Thu Jul 9 20:36:17 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 03:36:17 +0000 (UTC) Subject: [PATCH] D83520: [llvm-libtool-darwin] Allow flattening archives In-Reply-To: References: Message-ID: <21eaa3561712ff900ab9772183af461c@localhost.localdomain> sameerarora101 updated this revision to Diff 276904. sameerarora101 added a comment. Updating type and comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83520/new/ https://reviews.llvm.org/D83520 Files: llvm/test/tools/llvm-libtool-darwin/Inputs/invalid-archive.lib llvm/test/tools/llvm-libtool-darwin/invalid-archive.test llvm/test/tools/llvm-libtool-darwin/valid-archive.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83520.276904.patch Type: text/x-patch Size: 8045 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 20:38:33 2020 From: llvm-commits at lists.llvm.org (Richard Smith via llvm-commits) Date: Thu, 09 Jul 2020 20:38:33 -0700 (PDT) Subject: [llvm] 553dbb6 - [demangler] Don't allow the template parameters from the in a Message-ID: <5f07e2b9.1c69fb81.20553.b977@mx.google.com> Author: Richard Smith Date: 2020-07-09T20:38:19-07:00 New Revision: 553dbb6d7b32cc786281dea2c58a420bcbc9bb82 URL: https://github.com/llvm/llvm-project/commit/553dbb6d7b32cc786281dea2c58a420bcbc9bb82 DIFF: https://github.com/llvm/llvm-project/commit/553dbb6d7b32cc786281dea2c58a420bcbc9bb82.diff LOG: [demangler] Don't allow the template parameters from the in a to leak out into later parts of the name. This caused us to fail to demangle certain constructs involving generic lambdas. Added: Modified: libcxxabi/src/demangle/ItaniumDemangle.h libcxxabi/test/test_demangle.pass.cpp llvm/include/llvm/Demangle/ItaniumDemangle.h Removed: ################################################################################ diff --git a/libcxxabi/src/demangle/ItaniumDemangle.h b/libcxxabi/src/demangle/ItaniumDemangle.h index dcece3899213..0f81675f244d 100644 --- a/libcxxabi/src/demangle/ItaniumDemangle.h +++ b/libcxxabi/src/demangle/ItaniumDemangle.h @@ -5096,6 +5096,8 @@ Node *AbstractManglingParser::parseSpecialName() { // ::= template Node *AbstractManglingParser::parseEncoding() { + ScopedTemplateParamList EncodingTemplateParams(this); + if (look() == 'G' || look() == 'T') return getDerived().parseSpecialName(); diff --git a/libcxxabi/test/test_demangle.pass.cpp b/libcxxabi/test/test_demangle.pass.cpp index c75b6974881e..b8b2ad51a994 100644 --- a/libcxxabi/test/test_demangle.pass.cpp +++ b/libcxxabi/test/test_demangle.pass.cpp @@ -29792,6 +29792,16 @@ const char* cases[][2] = // "auto inline_func()::'lambda'(int, int) const" {"_ZZ11inline_funcvENKUlTyTyT_T0_E_clIiiEEDaS_S0_", 
"auto inline_func()::'lambda'($T, $T0)::operator()($T, $T0) const"}, {"_ZZ11inline_funcvENKUlTyTyT_T1_T0_E_clIiiiEEDaS_S0_S1_", "auto inline_func()::'lambda'($T, auto, $T0)::operator()($T, auto, $T0) const"}, + {"_ZN1XIZ1fIiEvOT_EUlOT_DpT0_E_EclIJEEEvDpT_", "void X(int&&)::'lambda'(auto&&, auto...)>::operator()<>()"}, + // FIXME: This is wrong, should demangle to the same as the previous entry. + // See https://github.com/itanium-cxx-abi/cxx-abi/issues/106. + {"_ZN1XIZ1fIiEvOT_EUlS2_DpT0_E_EclIJEEEvDpT_", "void X(int&&)::'lambda'(int&&, auto...)>::operator()<>()"}, + + // FIXME: This is wrong; the S2_ backref should expand to OT_ and then to + // "double&&". But we can't cope with a substitution that represents a + // diff erent type the node it is a substitute for. + // See https://github.com/itanium-cxx-abi/cxx-abi/issues/106. + {"_Z1h1XIJZ1fIiEDaOT_E1AZ1gIdEDaS2_E1BEE", "h(X(int&&)::A, auto g(int&&)::B>)"}, {"_Z1fIL4Enumn1EEvv", "void f<(Enum)-1>()"}, diff --git a/llvm/include/llvm/Demangle/ItaniumDemangle.h b/llvm/include/llvm/Demangle/ItaniumDemangle.h index dcece3899213..0f81675f244d 100644 --- a/llvm/include/llvm/Demangle/ItaniumDemangle.h +++ b/llvm/include/llvm/Demangle/ItaniumDemangle.h @@ -5096,6 +5096,8 @@ Node *AbstractManglingParser::parseSpecialName() { // ::= template Node *AbstractManglingParser::parseEncoding() { + ScopedTemplateParamList EncodingTemplateParams(this); + if (look() == 'G' || look() == 'T') return getDerived().parseSpecialName(); From llvm-commits at lists.llvm.org Thu Jul 9 20:46:31 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 03:46:31 +0000 (UTC) Subject: [PATCH] D83002: [llvm-libtool-darwin] Add support for -static option In-Reply-To: References: Message-ID: sameerarora101 updated this revision to Diff 276905. sameerarora101 added a comment. 
As per @alexshap's recommendation, `verifyDarwinObject` -> `verifyMachOObject` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83002/new/ https://reviews.llvm.org/D83002 Files: llvm/docs/CommandGuide/llvm-libtool-darwin.rst llvm/test/tools/llvm-libtool-darwin/basic.test llvm/test/tools/llvm-libtool-darwin/create-static-lib.test llvm/test/tools/llvm-libtool-darwin/help-message.test llvm/test/tools/llvm-libtool-darwin/invalid-input-output-args.test llvm/test/tools/llvm-libtool-darwin/missing-library-type.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83002.276905.patch Type: text/x-patch Size: 12560 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 20:48:52 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 03:48:52 +0000 (UTC) Subject: [PATCH] D83365: [PowerPC] start and end parameters for fixupIsDeadOrKill may exist in different block before RA In-Reply-To: References: Message-ID: <4d1d6e3c8979d6e165527da6993de406@localhost.localdomain> shchenz updated this revision to Diff 276906. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83365/new/ https://reviews.llvm.org/D83365 Files: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/lib/Target/PowerPC/PPCInstrInfo.h llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir Index: llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir =================================================================== --- /dev/null +++ llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir @@ -0,0 +1,21 @@ +# RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs -start-before ppc-mi-peepholes \ +# RUN: -stop-after ppc-mi-peepholes %s -o - | FileCheck %s + +--- +name: test +#CHECK : name : test +tracksRegLiveness: true +body: | + bb.0.entry: + liveins: $x3 + %0:g8rc = COPY $x3 + %1:gprc = COPY %0.sub_32:g8rc + %2:g8rc = LI8 63 + + bb.1: + %3:gprc = COPY %2.sub_32:g8rc + ; CHECK: %4:gprc = LI 0 + %4:gprc = XORI killed %3:gprc, 63 + STW killed %4:gprc, %4:gprc, 100 + BLR8 implicit $lr8, implicit $rm +... Index: llvm/lib/Target/PowerPC/PPCInstrInfo.h =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrInfo.h +++ llvm/lib/Target/PowerPC/PPCInstrInfo.h @@ -565,14 +565,18 @@ int64_t OffsetImm) const; /// Fixup killed/dead flag for register \p RegNo between instructions [\p - /// StartMI, \p EndMI]. Some PostRA transformations may violate register - /// killed/dead flags semantics, this function can be called to fix up. Before - /// calling this function, + /// StartMI, \p EndMI]. Some pre-RA or post-RA transformations may violate + /// register killed/dead flags semantics, this function can be called to fix + /// up. Before calling this function, /// 1. Ensure that \p RegNo liveness is killed after instruction \p EndMI. /// 2. Ensure that there is no new definition between (\p StartMI, \p EndMI) /// and possible definition for \p RegNo is \p StartMI or \p EndMI. - /// 3. 
Ensure that all instructions between [\p StartMI, \p EndMI] are in same - /// basic block. + /// 3. We can do accurate fixup for the case when all instructions between + /// [\p StartMI, \p EndMI] are in same basic block. + /// 4. For the case when \p StartMI and \p EndMI are not in same basic block, + /// we conservatively clear kill flag for all uses of \p RegNo for pre-RA + /// and for post-RA, we give an assertion as without reaching definition + /// analysis post-RA, \p StartMI and \p EndMI are hard to keep right. void fixupIsDeadOrKill(MachineInstr &StartMI, MachineInstr &EndMI, unsigned RegNo) const; void replaceInstrWithLI(MachineInstr &MI, const LoadImmediateInfo &LII) const; Index: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrInfo.cpp +++ llvm/lib/Target/PowerPC/PPCInstrInfo.cpp @@ -2655,10 +2655,15 @@ void PPCInstrInfo::fixupIsDeadOrKill(MachineInstr &StartMI, MachineInstr &EndMI, unsigned RegNo) const { - - // Instructions between [StartMI, EndMI] should be in same basic block. - assert((StartMI.getParent() == EndMI.getParent()) && - "Instructions are not in same basic block"); + // Conservatively clear kill flag for the register if the instructions are in + // different basic blocks and in SSA form, because the kill flag may no longer + // be right. There is no need to bother with dead flags since defs with no + // uses will be handled by DCE. + MachineRegisterInfo &MRI = StartMI.getParent()->getParent()->getRegInfo(); + if (MRI.isSSA() && (StartMI.getParent() != EndMI.getParent())) { + MRI.clearKillFlags(RegNo); + return; + } bool IsKillSet = false; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83365.276906.patch Type: text/x-patch Size: 3577 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 21:00:03 2020 From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 04:00:03 +0000 (UTC) Subject: [PATCH] D83520: [llvm-libtool-darwin] Allow flattening archives In-Reply-To: References: Message-ID: <92653a20050f7d082d037a8f19bea48f@localhost.localdomain> sameerarora101 updated this revision to Diff 276908. sameerarora101 added a comment. `verifyDarwinObject` -> `verifyMachOObject` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83520/new/ https://reviews.llvm.org/D83520 Files: llvm/test/tools/llvm-libtool-darwin/Inputs/invalid-archive.lib llvm/test/tools/llvm-libtool-darwin/invalid-archive.test llvm/test/tools/llvm-libtool-darwin/valid-archive.test llvm/tools/llvm-libtool-darwin/CMakeLists.txt llvm/tools/llvm-libtool-darwin/LLVMBuild.txt llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83520.276908.patch Type: text/x-patch Size: 7990 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 21:08:08 2020 From: llvm-commits at lists.llvm.org (Kyungwoo Lee via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 04:08:08 +0000 (UTC) Subject: [PATCH] D76570: [AArch64] Homogeneous Prolog and Epilog for Size Optimization In-Reply-To: References: Message-ID: <6802c919e9d50269419c6424c48dee5f@localhost.localdomain> kyulee updated this revision to Diff 276909. kyulee added a comment. 
rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76570/new/ https://reviews.llvm.org/D76570 Files: llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/AArch64/AArch64InstrInfo.td llvm/lib/Target/AArch64/AArch64LowerHomogeneousPrologEpilog.cpp llvm/lib/Target/AArch64/AArch64TargetMachine.cpp llvm/lib/Target/AArch64/CMakeLists.txt llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-frame-tail.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog-no-helper.ll llvm/test/CodeGen/AArch64/arm64-homogeneous-prolog-epilog.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D76570.276909.patch Type: text/x-patch Size: 43538 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 21:13:08 2020 From: llvm-commits at lists.llvm.org (Richard Smith via llvm-commits) Date: Thu, 09 Jul 2020 21:13:08 -0700 (PDT) Subject: [llvm] b03f175 - [demangler] More properly save and restore the template parameter state Message-ID: <5f07ead4.1c69fb81.b0849.aaeb@mx.google.com> Author: Richard Smith Date: 2020-07-09T21:12:51-07:00 New Revision: b03f1756fb4fd5ac5d606a7e4fd8aea1d9f18541 URL: https://github.com/llvm/llvm-project/commit/b03f1756fb4fd5ac5d606a7e4fd8aea1d9f18541 DIFF: https://github.com/llvm/llvm-project/commit/b03f1756fb4fd5ac5d606a7e4fd8aea1d9f18541.diff LOG: [demangler] More properly save and restore the template parameter state when parsing an encoding. 
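The save-and-restore behaviour named in this log message can be sketched outside the demangler as a small RAII guard. This is a simplified illustration under stated assumptions: `ToyParser`, `SaveParamsGuard`, and `parseNestedEncoding` are hypothetical stand-ins, and the real parser's `TemplateParams` has a different type.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the parser; only the state being guarded is
// modelled here.
struct ToyParser {
  std::vector<std::string> TemplateParams;
};

// RAII guard: moves the current parameters aside on construction, leaves the
// parser with an empty list, and puts the saved list back on destruction.
class SaveParamsGuard {
  ToyParser &Parser;
  std::vector<std::string> OldParams;

public:
  explicit SaveParamsGuard(ToyParser &P)
      : Parser(P), OldParams(std::move(P.TemplateParams)) {
    Parser.TemplateParams.clear(); // a moved-from vector is valid but unspecified
  }
  ~SaveParamsGuard() { Parser.TemplateParams = std::move(OldParams); }

  SaveParamsGuard(const SaveParamsGuard &) = delete;
  SaveParamsGuard &operator=(const SaveParamsGuard &) = delete;
};

// Hypothetical nested parse: whatever it records is dropped when the guard
// goes out of scope, so nothing leaks into the enclosing context.
std::size_t parseNestedEncoding(ToyParser &P) {
  SaveParamsGuard Guard(P);
  P.TemplateParams.push_back("inner");
  return P.TemplateParams.size();
}
```

The nested parse starts from an empty list and the outer list is restored on every exit path, which is the property the log message describes.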
Added: Modified: libcxxabi/src/demangle/ItaniumDemangle.h libcxxabi/test/test_demangle.pass.cpp llvm/include/llvm/Demangle/ItaniumDemangle.h Removed: ################################################################################ diff --git a/libcxxabi/src/demangle/ItaniumDemangle.h b/libcxxabi/src/demangle/ItaniumDemangle.h index 0f81675f244d..6ab873218386 100644 --- a/libcxxabi/src/demangle/ItaniumDemangle.h +++ b/libcxxabi/src/demangle/ItaniumDemangle.h @@ -5096,7 +5096,21 @@ Node *AbstractManglingParser::parseSpecialName() { // ::= template Node *AbstractManglingParser::parseEncoding() { - ScopedTemplateParamList EncodingTemplateParams(this); + // The template parameters of an encoding are unrelated to those of the + // enclosing context. + class SaveTemplateParams { + AbstractManglingParser *Parser; + decltype(TemplateParams) OldParams; + + public: + SaveTemplateParams(AbstractManglingParser *Parser) : Parser(Parser) { + OldParams = std::move(Parser->TemplateParams); + Parser->TemplateParams.clear(); + } + ~SaveTemplateParams() { + Parser->TemplateParams = std::move(OldParams); + } + } SaveTemplateParams(this); if (look() == 'G' || look() == 'T') return getDerived().parseSpecialName(); diff --git a/libcxxabi/test/test_demangle.pass.cpp b/libcxxabi/test/test_demangle.pass.cpp index b8b2ad51a994..ef75b61a94af 100644 --- a/libcxxabi/test/test_demangle.pass.cpp +++ b/libcxxabi/test/test_demangle.pass.cpp @@ -29796,6 +29796,7 @@ const char* cases[][2] = // FIXME: This is wrong, should demangle to the same as the previous entry. // See https://github.com/itanium-cxx-abi/cxx-abi/issues/106. 
{"_ZN1XIZ1fIiEvOT_EUlS2_DpT0_E_EclIJEEEvDpT_", "void X(int&&)::'lambda'(int&&, auto...)>::operator()<>()"}, + {"_ZZZZN6abcdef9abcdefghi29abcdefabcdefabcdefabcefabcdef27xxxxxxxxxxxxxxxxxxxxxxxxxxxEN4absl8DurationERKNSt3__u12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEEPNS1_19yyyyyyyyyyyyyyyyyyyEENK3$_5clEvENKUlvE_clEvE6zzzzzz", "abcdef::abcdefghi::abcdefabcdefabcdefabcefabcdef::xxxxxxxxxxxxxxxxxxxxxxxxxxx(absl::Duration, std::__u::basic_string, std::__u::allocator > const&, abcdef::abcdefghi::abcdefabcdefabcdefabcefabcdef::yyyyyyyyyyyyyyyyyyy*)::$_5::operator()() const::'lambda'()::operator()() const::zzzzzz"}, // FIXME: This is wrong; the S2_ backref should expand to OT_ and then to // "double&&". But we can't cope with a substitution that represents a diff --git a/llvm/include/llvm/Demangle/ItaniumDemangle.h b/llvm/include/llvm/Demangle/ItaniumDemangle.h index 0f81675f244d..6ab873218386 100644 --- a/llvm/include/llvm/Demangle/ItaniumDemangle.h +++ b/llvm/include/llvm/Demangle/ItaniumDemangle.h @@ -5096,7 +5096,21 @@ Node *AbstractManglingParser::parseSpecialName() { // ::= template Node *AbstractManglingParser::parseEncoding() { - ScopedTemplateParamList EncodingTemplateParams(this); + // The template parameters of an encoding are unrelated to those of the + // enclosing context. 
+ class SaveTemplateParams { + AbstractManglingParser *Parser; + decltype(TemplateParams) OldParams; + + public: + SaveTemplateParams(AbstractManglingParser *Parser) : Parser(Parser) { + OldParams = std::move(Parser->TemplateParams); + Parser->TemplateParams.clear(); + } + ~SaveTemplateParams() { + Parser->TemplateParams = std::move(OldParams); + } + } SaveTemplateParams(this); if (look() == 'G' || look() == 'T') return getDerived().parseSpecialName(); From llvm-commits at lists.llvm.org Thu Jul 9 21:18:24 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 04:18:24 +0000 (UTC) Subject: [PATCH] D83375: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) In-Reply-To: References: Message-ID: <4ccda9997ba2506728aa1d1546fd92a0@localhost.localdomain> gchatelet marked 2 inline comments as done. gchatelet added inline comments. ================ Comment at: llvm/include/llvm/Bitcode/LLVMBitCodes.h:539 FUNC_CODE_INST_FENCE = 36, // FENCE: [ordering, synchscope] - FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty,ptr,cmp,new, align, vol, - // ordering, synchscope] + FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty, ptr, cmp, new, vol, + // success_ordering, ssid, ---------------- jfb wrote: > gchatelet wrote: > > The documentation here was wrong. > > alignment was never stored for `FUNC_CODE_INST_CMPXCHG_OLD` and `failure_ordering` and `weak` were optional. > It used to only have "ordering", and didn't separate success / failure (so it wasn't optional as much as not there). 
Thx for the explanation :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83375/new/ https://reviews.llvm.org/D83375 From llvm-commits at lists.llvm.org Thu Jul 9 21:27:50 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via llvm-commits) Date: Thu, 09 Jul 2020 21:27:50 -0700 (PDT) Subject: [llvm] 3058245 - [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) Message-ID: <5f07ee46.1c69fb81.3eb4d.b0f5@mx.google.com> Author: Guillaume Chatelet Date: 2020-07-10T04:27:39Z New Revision: 30582457b47004dec8a78144abc919a13ccbd08c URL: https://github.com/llvm/llvm-project/commit/30582457b47004dec8a78144abc919a13ccbd08c DIFF: https://github.com/llvm/llvm-project/commit/30582457b47004dec8a78144abc919a13ccbd08c.diff LOG: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) This is preparatory work to enable storing alignment for AtomicCmpXchgInst. See D83136 for context and bug: https://bugs.llvm.org/show_bug.cgi?id=27168 Differential Revision: https://reviews.llvm.org/D83375 Added: Modified: llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index de4fe6630324..a0c22a7d0905 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -536,8 +536,9 @@ enum FunctionCodes { FUNC_CODE_DEBUG_LOC = 35, // DEBUG_LOC: [Line,Col,ScopeVal, IAVal] FUNC_CODE_INST_FENCE = 36, // FENCE: [ordering, synchscope] - FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty,ptr,cmp,new, align, vol, - // ordering, synchscope] + FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty, ptr, cmp, new, vol, + // success_ordering, ssid, + // failure_ordering?, weak?]
FUNC_CODE_INST_ATOMICRMW = 38, // ATOMICRMW: [ptrty,ptr,val, operation, // align, vol, // ordering, synchscope] @@ -551,8 +552,9 @@ enum FunctionCodes { FUNC_CODE_INST_GEP = 43, // GEP: [inbounds, n x operands] FUNC_CODE_INST_STORE = 44, // STORE: [ptrty,ptr,valty,val, align, vol] FUNC_CODE_INST_STOREATOMIC = 45, // STORE: [ptrty,ptr,val, align, vol - FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty,ptr,valty,cmp,new, align, - // vol,ordering,synchscope] + FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty, ptr, cmp, newval, vol, + // success_ordering, ssid, + // failure_ordering, weak] FUNC_CODE_INST_LANDINGPAD = 47, // LANDINGPAD: [ty,val,num,id0,val0...] FUNC_CODE_INST_CLEANUPRET = 48, // CLEANUPRET: [val] or [val,bb#] FUNC_CODE_INST_CATCHRET = 49, // CATCHRET: [val,bb#] diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 659e26c2bd25..ad1e97540298 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -4982,63 +4982,120 @@ Error BitcodeReader::parseFunctionBody(Function *F) { InstructionList.push_back(I); break; } - case bitc::FUNC_CODE_INST_CMPXCHG_OLD: - case bitc::FUNC_CODE_INST_CMPXCHG: { - // CMPXCHG:[ptrty, ptr, cmp, new, vol, successordering, ssid, + case bitc::FUNC_CODE_INST_CMPXCHG_OLD: { + // CMPXCHG:[ptrty, ptr, cmp, new, vol, success_ordering, ssid, // failureordering?, isweak?] 
- unsigned OpNum = 0; - Value *Ptr, *Cmp, *New; - if (getValueTypePair(Record, OpNum, NextValueNo, Ptr, &FullTy)) + const size_t RecordCount = Record.size(); + unsigned Slot = 0; + Value *Ptr = nullptr; + if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy)) return error("Invalid record"); if (!isa<PointerType>(Ptr->getType())) return error("Cmpxchg operand is not a pointer type"); - if (BitCode == bitc::FUNC_CODE_INST_CMPXCHG) { - if (getValueTypePair(Record, OpNum, NextValueNo, Cmp, &FullTy)) - return error("Invalid record"); - } else if (popValue(Record, OpNum, NextValueNo, - getPointerElementFlatType(FullTy), Cmp)) + Value *Cmp = nullptr; + if (popValue(Record, Slot, NextValueNo, getPointerElementFlatType(FullTy), + Cmp)) return error("Invalid record"); - else - FullTy = cast<PointerType>(FullTy)->getElementType(); - if (popValue(Record, OpNum, NextValueNo, Cmp->getType(), New) || - Record.size() < OpNum + 3 || Record.size() > OpNum + 5) + if (!(RecordCount == 6 || RecordCount == 7 || RecordCount == 8)) return error("Invalid record"); - AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[OpNum + 1]); - if (SuccessOrdering == AtomicOrdering::NotAtomic || - SuccessOrdering == AtomicOrdering::Unordered) + Value *New = nullptr; + if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New)) return error("Invalid record"); - SyncScope::ID SSID = getDecodedSyncScopeID(Record[OpNum + 2]); if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), Ptr->getType())) return Err; + + const bool IsVol = Record[3]; + + const AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[4]); + if (SuccessOrdering == AtomicOrdering::NotAtomic || + SuccessOrdering == AtomicOrdering::Unordered) + return error("Invalid record"); + + const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]); + AtomicOrdering FailureOrdering; - if (Record.size() < 7) + if (RecordCount > 6) + FailureOrdering = getDecodedOrdering(Record[6]); + else FailureOrdering =
AtomicCmpXchgInst::getStrongestFailureOrdering(SuccessOrdering); - else - FailureOrdering = getDecodedOrdering(Record[OpNum + 3]); - Align Alignment( + const Align Alignment( TheModule->getDataLayout().getTypeStoreSize(Cmp->getType())); + + FullTy = cast<PointerType>(FullTy)->getElementType(); + FullTy = StructType::get(Context, {FullTy, Type::getInt1Ty(Context)}); I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID); - FullTy = StructType::get(Context, {FullTy, Type::getInt1Ty(Context)}); - cast<AtomicCmpXchgInst>(I)->setVolatile(Record[OpNum]); - if (Record.size() < 8) { + cast<AtomicCmpXchgInst>(I)->setVolatile(IsVol); + + if (RecordCount > 7) { + cast<AtomicCmpXchgInst>(I)->setWeak(Record[7]); + } else { // Before weak cmpxchgs existed, the instruction simply returned the // value loaded from memory, so bitcode files from that era will be // expecting the first component of a modern cmpxchg. CurBB->getInstList().push_back(I); I = ExtractValueInst::Create(I, 0); FullTy = cast<StructType>(FullTy)->getElementType(0); - } else { - cast<AtomicCmpXchgInst>(I)->setWeak(Record[OpNum+4]); } + InstructionList.push_back(I); + break; + } + case bitc::FUNC_CODE_INST_CMPXCHG: { + // CMPXCHG: [ptrty, ptr, cmp, newval, vol, success_ordering, ssid, + // failure_ordering, weak] + const size_t RecordCount = Record.size(); + unsigned Slot = 0; + Value *Ptr = nullptr; + if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy)) + return error("Invalid record"); + + if (!isa<PointerType>(Ptr->getType())) + return error("Cmpxchg operand is not a pointer type"); + + Value *Cmp = nullptr; + if (getValueTypePair(Record, Slot, NextValueNo, Cmp, &FullTy)) + return error("Invalid record"); + + if (RecordCount != 8) + return error("Invalid record"); + + Value *New = nullptr; + if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New)) + return error("Invalid record"); + + const bool IsVol = Record[3]; + + const AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[4]); + if (SuccessOrdering == AtomicOrdering::NotAtomic || + SuccessOrdering ==
AtomicOrdering::Unordered) + return error("Invalid record"); + + const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]); + + if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), Ptr->getType())) + return Err; + + const AtomicOrdering FailureOrdering = getDecodedOrdering(Record[6]); + + const bool IsWeak = Record[7]; + + const Align Alignment( + TheModule->getDataLayout().getTypeStoreSize(Cmp->getType())); + + FullTy = StructType::get(Context, {FullTy, Type::getInt1Ty(Context)}); + I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, SuccessOrdering, + FailureOrdering, SSID); + + cast(I)->setVolatile(IsVol); + cast(I)->setWeak(IsWeak); InstructionList.push_back(I); break; From llvm-commits at lists.llvm.org Thu Jul 9 21:28:04 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 04:28:04 +0000 (UTC) Subject: [PATCH] D83375: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. gchatelet marked an inline comment as done. Closed by commit rG30582457b470: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) (authored by gchatelet). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83375/new/ https://reviews.llvm.org/D83375 Files: llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83375.276910.patch Type: text/x-patch Size: 8009 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 21:34:37 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 04:34:37 +0000 (UTC) Subject: [PATCH] D82876: [Alignment][NFC] Migrate TargetTransformInfo::allowsMisalignedMemoryAccesses to Align In-Reply-To: References: Message-ID: <81b8fa3a1de04b2512fbd20df89af6cb@localhost.localdomain> gchatelet added a comment. @courbet I'm still willing to push this one. Shall I remove [NFC] and call it a day? AFAICT it's the only show stopper here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82876/new/ https://reviews.llvm.org/D82876 From llvm-commits at lists.llvm.org Thu Jul 9 21:48:57 2020 From: llvm-commits at lists.llvm.org (Rafik Zurob via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 04:48:57 +0000 (UTC) Subject: [PATCH] D83497: [PowerPC][Power10] Fix the VINSW instruction to have an i32 argument. In-Reply-To: References: Message-ID: <2e26afb3142ed8675c431e3f6e45012a@localhost.localdomain> rzurob requested changes to this revision. rzurob added a comment. This revision now requires changes to proceed. We should update vins[bhw][lr] too. They have the same problem. 
================ Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:523 Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_i64_ty, llvm_v4i32_ty], [IntrNoMem]>; ---------------- The same problem also occurs in vins[bhw][lr] Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83497/new/ https://reviews.llvm.org/D83497 From llvm-commits at lists.llvm.org Thu Jul 9 21:58:53 2020 From: llvm-commits at lists.llvm.org (Brian Yang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 04:58:53 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: <79decaef2fa04de427cdc331ab85d226@localhost.localdomain> ichoyjx added inline comments. ================ Comment at: flang/test/Semantics/omp-clause-validity01.f90:457 - !ERROR: REDUCTION clause is not allowed on the TASKLOOP SIMD directive !$omp taskloop simd reduction(+:a) ---------------- clementval wrote: > As a side note, This is supposed to be fine in Clang so I removed the check. I looked at the OpenMP 5.0 std and didn't see a restriction on `reduction` for `task loop simd`. What's the current plan? Are we trying to cover OpenMP 5.0 Spec for semantics (it appears so)? ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:229 VersionedClause, VersionedClause, VersionedClause ---------------- Bear with me, what does 50 mean? ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:427 VersionedClause, - VersionedClause, - VersionedClause + VersionedClause + ]; ---------------- For `target enter` and `target exit`, `nowait` is only allowed once. If it's allowed here, will this restriction be captured by the rules in `target` directive above? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 From llvm-commits at lists.llvm.org Thu Jul 9 22:06:54 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 05:06:54 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <3cc322180763cf48b0e7162ef61cebd1@localhost.localdomain> yonghong-song added a comment. Thanks for the change. LGTM. Could you add some warnings for these unsupported types (using llvm::errs())? Without this patch, BTF will be rejected. This patch will make them pass the verifier. A warning will let the user know that the program has unsupported types or certain limitations, and that these types and limitations are worked around by the compiler. People may report these types and limitations back to the bpf community so we can evaluate whether we need to extend BTF to have proper support. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Thu Jul 9 22:40:19 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Thu, 09 Jul 2020 22:40:19 -0700 (PDT) Subject: [llvm] 760bbda - [llvm-symbolizer][test] Fix options-from-env.test Message-ID: <5f07ff43.1c69fb81.a57ff.adb7@mx.google.com> Author: Fangrui Song Date: 2020-07-09T22:39:56-07:00 New Revision: 760bbda2d8200481d03a46a74587035059dd12cc URL: https://github.com/llvm/llvm-project/commit/760bbda2d8200481d03a46a74587035059dd12cc DIFF: https://github.com/llvm/llvm-project/commit/760bbda2d8200481d03a46a74587035059dd12cc.diff LOG: [llvm-symbolizer][test] Fix options-from-env.test options-from-env.test (D71668) does not test what it intended to test: `llvm-symbolizer 0x20112f` prints `0x20112f` in the absence of an environment variable. 
Added: Modified: llvm/test/tools/llvm-symbolizer/options-from-env.test Removed: ################################################################################ diff --git a/llvm/test/tools/llvm-symbolizer/options-from-env.test b/llvm/test/tools/llvm-symbolizer/options-from-env.test index 76a2987a64e6..5fb566f56a02 100644 --- a/llvm/test/tools/llvm-symbolizer/options-from-env.test +++ b/llvm/test/tools/llvm-symbolizer/options-from-env.test @@ -1,4 +1,6 @@ -RUN: env LLVM_SYMBOLIZER_OPTS=--print-address llvm-symbolizer 0x20112f | FileCheck %s -RUN: env LLVM_ADDR2LINE_OPTS=--print-address llvm-addr2line 0x20112f | FileCheck %s +# RUN: env LLVM_SYMBOLIZER_OPTS='0 1 --verbose' llvm-symbolizer 2 | FileCheck %s +# RUN: env LLVM_ADDR2LINE_OPTS='0 1 --verbose' llvm-addr2line 2 | FileCheck %s -CHECK: 0x20112f +# CHECK: 0 +# CHECK-NEXT: 1 +# CHECK-NEXT: 2 From llvm-commits at lists.llvm.org Thu Jul 9 22:53:15 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Thu, 09 Jul 2020 22:53:15 -0700 (PDT) Subject: [llvm] e71c7b5 - [CodeMoverUtils] Move OrderedInstructions to CodeMoverUtils Message-ID: <5f08024b.1c69fb81.b25d5.c005@mx.google.com> Author: SharmaRithik Date: 2020-07-10T11:22:43+05:30 New Revision: e71c7b593a2d1b7d60dc8aaa4b8ede03de7bbd00 URL: https://github.com/llvm/llvm-project/commit/e71c7b593a2d1b7d60dc8aaa4b8ede03de7bbd00 DIFF: https://github.com/llvm/llvm-project/commit/e71c7b593a2d1b7d60dc8aaa4b8ede03de7bbd00.diff LOG: [CodeMoverUtils] Move OrderedInstructions to CodeMoverUtils Summary: This patch moves OrderedInstructions to CodeMoverUtils as It was the only place where OrderedInstructions is required. 
Authored By: RithikSharma Reviewer: Whitney, bmahjour, etiotto, fhahn, nikic Reviewed By: Whitney, nikic Subscribers: mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D80643 Added: Modified: llvm/lib/Analysis/CMakeLists.txt llvm/lib/Transforms/Utils/CodeMoverUtils.cpp llvm/unittests/Analysis/CMakeLists.txt llvm/utils/gn/secondary/llvm/lib/Analysis/BUILD.gn llvm/utils/gn/secondary/llvm/unittests/Analysis/BUILD.gn Removed: llvm/include/llvm/Analysis/OrderedInstructions.h llvm/lib/Analysis/OrderedInstructions.cpp llvm/unittests/Analysis/OrderedInstructionsTest.cpp ################################################################################ diff --git a/llvm/include/llvm/Analysis/OrderedInstructions.h b/llvm/include/llvm/Analysis/OrderedInstructions.h deleted file mode 100644 index b2bf85750228..000000000000 --- a/llvm/include/llvm/Analysis/OrderedInstructions.h +++ /dev/null @@ -1,57 +0,0 @@ -//===- llvm/Transforms/Utils/OrderedInstructions.h -------------*- C++ -*-===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This file defines an efficient way to check for dominance relation between 2 -// instructions. -// -// FIXME: This is really just a convenience wrapper to check dominance between -// two arbitrary instructions in different basic blocks. We should fold it into -// DominatorTree, which is the more widely used interface. -// -//===----------------------------------------------------------------------===// - -#ifndef LLVM_ANALYSIS_ORDEREDINSTRUCTIONS_H -#define LLVM_ANALYSIS_ORDEREDINSTRUCTIONS_H - -namespace llvm { - -class DominatorTree; -class Instruction; - -class OrderedInstructions { - /// The dominator tree of the parent function. 
- DominatorTree *DT; - - /// Return true if the first instruction comes before the second in the - /// same basic block. It will create an ordered basic block, if it does - /// not yet exist in OBBMap. - bool localDominates(const Instruction *, const Instruction *) const; - -public: - /// Constructor. - OrderedInstructions(DominatorTree *DT) : DT(DT) {} - - /// Return true if first instruction dominates the second. - bool dominates(const Instruction *, const Instruction *) const; - - /// Return true if the first instruction comes before the second in the - /// dominator tree DFS traversal if they are in different basic blocks, - /// or if the first instruction comes before the second in the same basic - /// block. - bool dfsBefore(const Instruction *, const Instruction *) const; - - // Return true if the first instruction comes before the second in the - // dominator tree BFS traversal based on the level number of nodes in - // dominator tree if they are in different basic blocks else if the first - // instruction comes before the second in the same basic block. 
- bool domTreeLevelBefore(const Instruction *, const Instruction *) const; -}; - -} // end namespace llvm - -#endif // LLVM_ANALYSIS_ORDEREDINSTRUCTIONS_H diff --git a/llvm/lib/Analysis/CMakeLists.txt b/llvm/lib/Analysis/CMakeLists.txt index 9cc2576ae1ee..a317579ecc83 100644 --- a/llvm/lib/Analysis/CMakeLists.txt +++ b/llvm/lib/Analysis/CMakeLists.txt @@ -90,7 +90,6 @@ add_llvm_component_library(LLVMAnalysis ObjCARCAnalysisUtils.cpp ObjCARCInstKind.cpp OptimizationRemarkEmitter.cpp - OrderedInstructions.cpp PHITransAddr.cpp PhiValues.cpp PostDominators.cpp diff --git a/llvm/lib/Analysis/OrderedInstructions.cpp b/llvm/lib/Analysis/OrderedInstructions.cpp deleted file mode 100644 index 58d9a618184a..000000000000 --- a/llvm/lib/Analysis/OrderedInstructions.cpp +++ /dev/null @@ -1,59 +0,0 @@ -//===-- OrderedInstructions.cpp - Instruction dominance function ---------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -// This file defines utility to check dominance relation of 2 instructions. -// -//===----------------------------------------------------------------------===// - -#include "llvm/Analysis/OrderedInstructions.h" -#include "llvm/IR/Dominators.h" - -using namespace llvm; - -bool OrderedInstructions::localDominates(const Instruction *InstA, - const Instruction *InstB) const { - assert(InstA->getParent() == InstB->getParent() && - "Instructions must be in the same basic block"); - - return InstA->comesBefore(InstB); -} - -/// Given 2 instructions, check for dominance relation if the instructions are -/// in the same basic block. Otherwise, use dominator tree. 
-bool OrderedInstructions::dominates(const Instruction *InstA, - const Instruction *InstB) const { - // Use ordered basic block to do dominance check in case the 2 instructions - // are in the same basic block. - if (InstA->getParent() == InstB->getParent()) - return localDominates(InstA, InstB); - return DT->dominates(InstA->getParent(), InstB->getParent()); -} - -bool OrderedInstructions::dfsBefore(const Instruction *InstA, - const Instruction *InstB) const { - // Use ordered basic block in case the 2 instructions are in the same basic - // block. - if (InstA->getParent() == InstB->getParent()) - return localDominates(InstA, InstB); - - DomTreeNode *DA = DT->getNode(InstA->getParent()); - DomTreeNode *DB = DT->getNode(InstB->getParent()); - return DA->getDFSNumIn() < DB->getDFSNumIn(); -} - -bool OrderedInstructions::domTreeLevelBefore(const Instruction *InstA, - const Instruction *InstB) const { - // Use ordered basic block in case the 2 instructions are in the same basic - // block. 
- if (InstA->getParent() == InstB->getParent()) - return localDominates(InstA, InstB); - - DomTreeNode *DA = DT->getNode(InstA->getParent()); - DomTreeNode *DB = DT->getNode(InstB->getParent()); - return DA->getLevel() < DB->getLevel(); -} diff --git a/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp b/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp index 11a740f8285b..08047dc0f96e 100644 --- a/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp +++ b/llvm/lib/Transforms/Utils/CodeMoverUtils.cpp @@ -15,7 +15,6 @@ #include "llvm/ADT/Optional.h" #include "llvm/ADT/Statistic.h" #include "llvm/Analysis/DependenceAnalysis.h" -#include "llvm/Analysis/OrderedInstructions.h" #include "llvm/Analysis/PostDominators.h" #include "llvm/Analysis/ValueTracking.h" #include "llvm/IR/Dominators.h" @@ -94,6 +93,18 @@ class ControlConditions { }; } // namespace +static bool domTreeLevelBefore(DominatorTree *DT, const Instruction *InstA, + const Instruction *InstB) { + // Use ordered basic block in case the 2 instructions are in the same + // block. + if (InstA->getParent() == InstB->getParent()) + return InstA->comesBefore(InstB); + + DomTreeNode *DA = DT->getNode(InstA->getParent()); + DomTreeNode *DB = DT->getNode(InstB->getParent()); + return DA->getLevel() < DB->getLevel(); +} + const Optional ControlConditions::collectControlConditions( const BasicBlock &BB, const BasicBlock &Dominator, const DominatorTree &DT, const PostDominatorTree &PDT, unsigned MaxLookup) { @@ -332,9 +343,8 @@ bool llvm::isSafeToMoveBefore(Instruction &I, Instruction &InsertPoint, if (&InsertPoint == OpInst || !DT.dominates(OpInst, &InsertPoint)) return false; - OrderedInstructions OI(&DT); DT.updateDFSNumbers(); - const bool MoveForward = OI.domTreeLevelBefore(&I, &InsertPoint); + const bool MoveForward = domTreeLevelBefore(&DT, &I, &InsertPoint); Instruction &StartInst = (MoveForward ? I : InsertPoint); Instruction &EndInst = (MoveForward ? 
InsertPoint : I); SmallPtrSet InstsToCheck; diff --git a/llvm/unittests/Analysis/CMakeLists.txt b/llvm/unittests/Analysis/CMakeLists.txt index 9f28bc701b58..42f7dd3c0610 100644 --- a/llvm/unittests/Analysis/CMakeLists.txt +++ b/llvm/unittests/Analysis/CMakeLists.txt @@ -29,7 +29,6 @@ add_llvm_unittest(AnalysisTests LoopNestTest.cpp MemoryBuiltinsTest.cpp MemorySSATest.cpp - OrderedInstructionsTest.cpp PhiValuesTest.cpp ProfileSummaryInfoTest.cpp ScalarEvolutionTest.cpp @@ -41,4 +40,4 @@ add_llvm_unittest(AnalysisTests ValueLatticeTest.cpp ValueTrackingTest.cpp VectorUtilsTest.cpp - ) \ No newline at end of file + ) diff --git a/llvm/unittests/Analysis/OrderedInstructionsTest.cpp b/llvm/unittests/Analysis/OrderedInstructionsTest.cpp deleted file mode 100644 index 473fe7f50fc8..000000000000 --- a/llvm/unittests/Analysis/OrderedInstructionsTest.cpp +++ /dev/null @@ -1,64 +0,0 @@ -//===- OrderedInstructions.cpp - Unit tests for OrderedInstructions ------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -#include "llvm/Analysis/OrderedInstructions.h" -#include "llvm/IR/BasicBlock.h" -#include "llvm/IR/Dominators.h" -#include "llvm/IR/IRBuilder.h" -#include "llvm/IR/Instructions.h" -#include "llvm/IR/LLVMContext.h" -#include "llvm/IR/Module.h" -#include "gtest/gtest.h" - -using namespace llvm; - -/// Check intra-basicblock and inter-basicblock dominance using -/// OrderedInstruction. 
-TEST(OrderedInstructionsTest, DominanceTest) { - LLVMContext Ctx; - Module M("test", Ctx); - IRBuilder<> B(Ctx); - FunctionType *FTy = - FunctionType::get(Type::getVoidTy(Ctx), {B.getInt8PtrTy()}, false); - Function *F = Function::Create(FTy, Function::ExternalLinkage, "f", M); - - // Create the function as follow and check for dominance relation. - // - // test(): - // bbx: - // loadx; - // loady; - // bby: - // loadz; - // return; - // - // More specifically, check for loadx -> (dominates) loady, - // loady -> loadx and loady -> loadz. - // - // Create BBX with 2 loads. - BasicBlock *BBX = BasicBlock::Create(Ctx, "bbx", F); - B.SetInsertPoint(BBX); - Argument *PointerArg = &*F->arg_begin(); - LoadInst *LoadInstX = B.CreateLoad(B.getInt8Ty(), PointerArg); - LoadInst *LoadInstY = B.CreateLoad(B.getInt8Ty(), PointerArg); - - // Create BBY with 1 load. - BasicBlock *BBY = BasicBlock::Create(Ctx, "bby", F); - B.SetInsertPoint(BBY); - LoadInst *LoadInstZ = B.CreateLoad(B.getInt8Ty(), PointerArg); - B.CreateRet(LoadInstZ); - std::unique_ptr DT(new DominatorTree(*F)); - OrderedInstructions OI(&*DT); - - // Intra-BB dominance test. - EXPECT_TRUE(OI.dominates(LoadInstX, LoadInstY)); - EXPECT_FALSE(OI.dominates(LoadInstY, LoadInstX)); - - // Inter-BB dominance test. 
- EXPECT_TRUE(OI.dominates(LoadInstY, LoadInstZ)); -} diff --git a/llvm/utils/gn/secondary/llvm/lib/Analysis/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Analysis/BUILD.gn index e7b89b791714..11498ed60298 100644 --- a/llvm/utils/gn/secondary/llvm/lib/Analysis/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/lib/Analysis/BUILD.gn @@ -88,7 +88,6 @@ static_library("Analysis") { "ObjCARCAnalysisUtils.cpp", "ObjCARCInstKind.cpp", "OptimizationRemarkEmitter.cpp", - "OrderedInstructions.cpp", "PHITransAddr.cpp", "PhiValues.cpp", "PostDominators.cpp", diff --git a/llvm/utils/gn/secondary/llvm/unittests/Analysis/BUILD.gn b/llvm/utils/gn/secondary/llvm/unittests/Analysis/BUILD.gn index 191d84837804..b0dcd497d844 100644 --- a/llvm/utils/gn/secondary/llvm/unittests/Analysis/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/unittests/Analysis/BUILD.gn @@ -31,7 +31,6 @@ unittest("AnalysisTests") { "LoopNestTest.cpp", "MemoryBuiltinsTest.cpp", "MemorySSATest.cpp", - "OrderedInstructionsTest.cpp", "PhiValuesTest.cpp", "ProfileSummaryInfoTest.cpp", "ScalarEvolutionTest.cpp", From llvm-commits at lists.llvm.org Thu Jul 9 22:53:21 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 05:53:21 +0000 (UTC) Subject: [PATCH] D83530: WIP: [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable Message-ID: MaskRay created this revision. Herald added subscribers: llvm-commits, rupprecht, mgorny. Herald added a reviewer: jhenderson. Herald added a project: LLVM. - The way to turn off an `llvm::cl::opt` is `--inlines=false` or `--inlines=0`, which is different from prevailing --foo/--no-foo used by other user-facing utilities. 
- Handling --foo/--no-foo is cumbersome with llvm::cl (see --demangle/--no-demangle) Some behavior changes: - Added --no-inlines: replacement for -i=0 - --output-style= is ignored for llvm-addr2line This is a WIP because llvm::cl does not support grouped short options, POSIX.1-2017 12.2 Utility Syntax Guidelines: > One or more options without option-arguments, followed by at most one > option that takes an option-argument, should be accepted when grouped > behind one '-' delimiter. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83530 Files: llvm/test/tools/llvm-symbolizer/Inputs/flush-output.py llvm/test/tools/llvm-symbolizer/basic.s llvm/test/tools/llvm-symbolizer/functions.s llvm/test/tools/llvm-symbolizer/output-style-inlined.test llvm/test/tools/llvm-symbolizer/split-dwarf.test llvm/test/tools/llvm-symbolizer/untag-addresses.test llvm/tools/llvm-symbolizer/CMakeLists.txt llvm/tools/llvm-symbolizer/Opts.td llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83530.276912.patch Type: text/x-patch Size: 30549 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 22:59:35 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 05:59:35 +0000 (UTC) Subject: [PATCH] D83262: [llvm-symbolizer] Add options to disable printing source files & inlining In-Reply-To: References: Message-ID: <68c91b02dfd31f6e4578a8837899bf88@localhost.localdomain> MaskRay added a comment. Created D83530 to switch to OptTable. 
The only unimplemented feature is grouped short options (POSIX.1-2017 12.2 Utility Syntax Guidelines, Guideline 5) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83262/new/ https://reviews.llvm.org/D83262 From llvm-commits at lists.llvm.org Thu Jul 9 23:07:59 2020 From: llvm-commits at lists.llvm.org (Zakk Chen via llvm-commits) Date: Thu, 09 Jul 2020 23:07:59 -0700 (PDT) Subject: [llvm] 04b9a46 - [RISCV] Refactor FeatureRVCHints to make ProcessorModel more intuitive Message-ID: <5f0805bf.1c69fb81.ffc83.b5ca@mx.google.com> Author: Zakk Chen Date: 2020-07-09T23:07:39-07:00 New Revision: 04b9a46c842f793a2baedcad64de35fcbd3e93b7 URL: https://github.com/llvm/llvm-project/commit/04b9a46c842f793a2baedcad64de35fcbd3e93b7 DIFF: https://github.com/llvm/llvm-project/commit/04b9a46c842f793a2baedcad64de35fcbd3e93b7.diff LOG: [RISCV] Refactor FeatureRVCHints to make ProcessorModel more intuitive Reviewers: luismarques, asb, evandro Reviewed By: asb, evandro Tags: #llvm Differential Revision: https://reviews.llvm.org/D77030 Added: Modified: llvm/lib/Target/RISCV/RISCV.td llvm/lib/Target/RISCV/RISCVSubtarget.h llvm/test/MC/RISCV/rv32c-invalid.s Removed: ################################################################################ diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td index ec00b256eeab..f0583f691936 100644 --- a/llvm/lib/Target/RISCV/RISCV.td +++ b/llvm/lib/Target/RISCV/RISCV.td @@ -140,12 +140,12 @@ def HasStdExtB : Predicate<"Subtarget->hasStdExtB()">, AssemblerPredicate<(all_of FeatureStdExtB), "'B' (Bit Manipulation Instructions)">; -def FeatureRVCHints - : SubtargetFeature<"rvc-hints", "EnableRVCHintInstrs", "true", - "Enable RVC Hint Instructions.">; +def FeatureNoRVCHints + : SubtargetFeature<"no-rvc-hints", "EnableRVCHintInstrs", "false", + "Disable RVC Hint Instructions.">; def HasRVCHints : Predicate<"Subtarget->enableRVCHintInstrs()">, - AssemblerPredicate<(all_of FeatureRVCHints), - 
"RVC Hint Instructions">; + AssemblerPredicate<(all_of(not FeatureNoRVCHints)), + "RVC Hint Instructions">; def FeatureStdExtV : SubtargetFeature<"experimental-v", "HasStdExtV", "true", @@ -207,15 +207,13 @@ include "RISCVSchedRocket64.td" // RISC-V processors supported. //===----------------------------------------------------------------------===// -def : ProcessorModel<"generic-rv32", NoSchedModel, [FeatureRVCHints]>; +def : ProcessorModel<"generic-rv32", NoSchedModel, []>; -def : ProcessorModel<"generic-rv64", NoSchedModel, [Feature64Bit, - FeatureRVCHints]>; +def : ProcessorModel<"generic-rv64", NoSchedModel, [Feature64Bit]>; -def : ProcessorModel<"rocket-rv32", Rocket32Model, [FeatureRVCHints]>; +def : ProcessorModel<"rocket-rv32", Rocket32Model, []>; -def : ProcessorModel<"rocket-rv64", Rocket64Model, [Feature64Bit, - FeatureRVCHints]>; +def : ProcessorModel<"rocket-rv64", Rocket64Model, [Feature64Bit]>; //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h index 133542de2301..fe1285f23b15 100644 --- a/llvm/lib/Target/RISCV/RISCVSubtarget.h +++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h @@ -54,7 +54,7 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo { bool HasRV64 = false; bool IsRV32E = false; bool EnableLinkerRelax = false; - bool EnableRVCHintInstrs = false; + bool EnableRVCHintInstrs = true; bool EnableSaveRestore = false; unsigned XLen = 32; MVT XLenVT = MVT::i32; diff --git a/llvm/test/MC/RISCV/rv32c-invalid.s b/llvm/test/MC/RISCV/rv32c-invalid.s index 29cf0ac239fa..53b62c289e75 100644 --- a/llvm/test/MC/RISCV/rv32c-invalid.s +++ b/llvm/test/MC/RISCV/rv32c-invalid.s @@ -1,4 +1,4 @@ -# RUN: not llvm-mc -triple=riscv32 -mattr=+c -mattr=-rvc-hints < %s 2>&1 \ +# RUN: not llvm-mc -triple=riscv32 -mattr=+c -mattr=+no-rvc-hints < %s 2>&1 \ # RUN: | FileCheck %s ## GPRC From llvm-commits at lists.llvm.org Thu Jul 9 23:08:03 2020 From: 
llvm-commits at lists.llvm.org (Phabricator via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:08:03 +0000 (UTC) Subject: [PATCH] D77030: [RISCV] refactor FeatureRVCHints to make ProcessorModel more intuitive In-Reply-To: References: Message-ID: <050c49a244df2942f2b0263dbc421909@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG04b9a46c842f: [RISCV] Refactor FeatureRVCHints to make ProcessorModel more intuitive (authored by Zakk Chen <zakk.chen at sifive.com>). Changed prior to commit: https://reviews.llvm.org/D77030?vs=261187&id=276914#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77030/new/ https://reviews.llvm.org/D77030 Files: llvm/lib/Target/RISCV/RISCV.td llvm/lib/Target/RISCV/RISCVSubtarget.h llvm/test/MC/RISCV/rv32c-invalid.s Index: llvm/test/MC/RISCV/rv32c-invalid.s =================================================================== --- llvm/test/MC/RISCV/rv32c-invalid.s +++ llvm/test/MC/RISCV/rv32c-invalid.s @@ -1,4 +1,4 @@ -# RUN: not llvm-mc -triple=riscv32 -mattr=+c -mattr=-rvc-hints < %s 2>&1 \ +# RUN: not llvm-mc -triple=riscv32 -mattr=+c -mattr=+no-rvc-hints < %s 2>&1 \ # RUN: | FileCheck %s ## GPRC Index: llvm/lib/Target/RISCV/RISCVSubtarget.h =================================================================== --- llvm/lib/Target/RISCV/RISCVSubtarget.h +++ llvm/lib/Target/RISCV/RISCVSubtarget.h @@ -54,7 +54,7 @@ bool HasRV64 = false; bool IsRV32E = false; bool EnableLinkerRelax = false; - bool EnableRVCHintInstrs = false; + bool EnableRVCHintInstrs = true; bool EnableSaveRestore = false; unsigned XLen = 32; MVT XLenVT = MVT::i32; Index: llvm/lib/Target/RISCV/RISCV.td =================================================================== --- llvm/lib/Target/RISCV/RISCV.td +++ llvm/lib/Target/RISCV/RISCV.td @@ -140,12 +140,12 @@ AssemblerPredicate<(all_of FeatureStdExtB), "'B' (Bit Manipulation Instructions)">; -def FeatureRVCHints - 
: SubtargetFeature<"rvc-hints", "EnableRVCHintInstrs", "true", - "Enable RVC Hint Instructions.">; +def FeatureNoRVCHints + : SubtargetFeature<"no-rvc-hints", "EnableRVCHintInstrs", "false", + "Disable RVC Hint Instructions.">; def HasRVCHints : Predicate<"Subtarget->enableRVCHintInstrs()">, - AssemblerPredicate<(all_of FeatureRVCHints), - "RVC Hint Instructions">; + AssemblerPredicate<(all_of(not FeatureNoRVCHints)), + "RVC Hint Instructions">; def FeatureStdExtV : SubtargetFeature<"experimental-v", "HasStdExtV", "true", @@ -207,15 +207,13 @@ // RISC-V processors supported. //===----------------------------------------------------------------------===// -def : ProcessorModel<"generic-rv32", NoSchedModel, [FeatureRVCHints]>; +def : ProcessorModel<"generic-rv32", NoSchedModel, []>; -def : ProcessorModel<"generic-rv64", NoSchedModel, [Feature64Bit, - FeatureRVCHints]>; +def : ProcessorModel<"generic-rv64", NoSchedModel, [Feature64Bit]>; -def : ProcessorModel<"rocket-rv32", Rocket32Model, [FeatureRVCHints]>; +def : ProcessorModel<"rocket-rv32", Rocket32Model, []>; -def : ProcessorModel<"rocket-rv64", Rocket64Model, [Feature64Bit, - FeatureRVCHints]>; +def : ProcessorModel<"rocket-rv64", Rocket64Model, [Feature64Bit]>; //===----------------------------------------------------------------------===// -------------- next part -------------- A non-text attachment was scrubbed... Name: D77030.276914.patch Type: text/x-patch Size: 2757 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 23:17:03 2020 From: llvm-commits at lists.llvm.org (Serge Pavlov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:17:03 +0000 (UTC) Subject: [PATCH] D78896: [Support] Add file lock/unlock functions In-Reply-To: References: Message-ID: <62ca5ed55eaa57c14a985e31f2d866b8@localhost.localdomain> sepavloff updated this revision to Diff 276917. sepavloff added a comment. 
Rebased patch Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78896/new/ https://reviews.llvm.org/D78896 Files: llvm/include/llvm/Support/FileSystem.h llvm/lib/Support/Unix/Path.inc llvm/lib/Support/Windows/Path.inc llvm/unittests/Support/ProgramTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D78896.276917.patch Type: text/x-patch Size: 7922 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 23:18:34 2020 From: llvm-commits at lists.llvm.org (Jez Ng via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:18:34 +0000 (UTC) Subject: [PATCH] D83532: [lld-macho] Partial support for weak definitions Message-ID: int3 created this revision. int3 added a reviewer: lld-macho. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. This diff adds support for weak definitions, though it doesn't handle weak symbols in dylibs quite correctly -- we need to emit binding opcodes for them in the weak binding section rather than the lazy binding section. What *is* covered in this diff: 1. Reading the weak flag from symbol table / export trie, and writing it to the export trie 2. Refining the symbol table's rules for choosing one symbol definition over another. Wrote a few dozen test cases to make sure we were matching ld64's behavior. We can now link basic C++ programs. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83532 Files: lld/MachO/Arch/X86_64.cpp lld/MachO/ExportTrie.cpp lld/MachO/ExportTrie.h lld/MachO/InputFiles.cpp lld/MachO/SymbolTable.cpp lld/MachO/SymbolTable.h lld/MachO/Symbols.h lld/MachO/SyntheticSections.cpp lld/test/MachO/weak-definition-direct-fetch.s lld/test/MachO/weak-definition-indirect-fetch.s lld/test/MachO/weak-definition-order.s lld/test/MachO/weak-definition-over-dysym.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83532.276918.patch Type: text/x-patch Size: 23600 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 23:30:37 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:30:37 +0000 (UTC) Subject: [PATCH] D83533: [Alignment][NFC] Update Bitcodewriter to use Align Message-ID: gchatelet created this revision. gchatelet added a reviewer: courbet. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83533 Files: llvm/lib/Bitcode/Writer/BitcodeWriter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83533.276921.patch Type: text/x-patch Size: 5237 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 23:35:24 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:35:24 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks In-Reply-To: References: Message-ID: <6688fe82e3aee187d21c7510c157d804@localhost.localdomain> mkazantsev marked an inline comment as done. mkazantsev added inline comments. ================ Comment at: llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp:2518 + if (auto *PN = foldSelectToPhiImpl(Sel, I->getParent(), DT, Builder)) + return PN; + ---------------- nikic wrote: > It seems quite likely that some of the parents (or all of them) are going to be the same. Might it make sense to deduplicate? > > ``` > // Collect likely candidates for placing the phi node. > SmallPtrSet CandidateBlocks; > CandidateBlocks.insert(Sel.getParent(); > for (Value *V : Sel.operands()) > if (auto *I = dyn_cast(V)) > CandidateBlocks.insert(I->getParent()); > > for (BasicBlock *BB : CandidateBlocks) > if (auto *PN = foldSelectToPhiImpl(Sel, BB, DT, Builder)) > return PN; > ``` Agreed. 
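A standalone sketch of the deduplication idea discussed above, written in plain C++ rather than against the LLVM API. `Block` and `collectCandidates` are hypothetical names used only for illustration. The sketch keeps candidates in first-insertion order, which is the property an insertion-ordered set such as `llvm::SetVector` provides; iterating a purely pointer-keyed set would instead visit blocks in an address-dependent order.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a basic block; only its identity matters here.
struct Block {
  int Id;
};

// Collect candidate blocks in first-seen order, skipping duplicates and nulls.
// The vector fixes the visiting order; the set is used only for membership.
std::vector<const Block *>
collectCandidates(const std::vector<const Block *> &Parents) {
  std::vector<const Block *> Ordered;
  std::unordered_set<const Block *> Seen;
  for (const Block *B : Parents)
    if (B && Seen.insert(B).second)
      Ordered.push_back(B);
  return Ordered;
}
```

Feeding it the select's own parent first and then the operands' parents would try each distinct block once, in a reproducible order.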
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83284/new/ https://reviews.llvm.org/D83284 From llvm-commits at lists.llvm.org Thu Jul 9 23:43:01 2020 From: llvm-commits at lists.llvm.org (Pengfei Wang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:43:01 +0000 (UTC) Subject: [PATCH] D83534: [X86][MMX] Optimize MMX shift intrinsics. Message-ID: pengfei created this revision. pengfei added reviewers: craig.topper, LuoYuanke. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83534 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/mmx-intrinsics.ll Index: llvm/test/CodeGen/X86/mmx-intrinsics.ll =================================================================== --- llvm/test/CodeGen/X86/mmx-intrinsics.ll +++ llvm/test/CodeGen/X86/mmx-intrinsics.ll @@ -311,6 +311,19 @@ ret i64 %4 } +define i64 @test72_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test72_2 +; ALL-NOT: psraw +entry: + %0 = bitcast <1 x i64> %a to <4 x i16> + %mmx_var.i = bitcast <4 x i16> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.psrai.w(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <4 x i16> + %3 = bitcast <4 x i16> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psrli.q(x86_mmx, i32) nounwind readnone define i64 @test71(<1 x i64> %a) nounwind readnone optsize ssp { @@ -339,6 +352,19 @@ ret i64 %4 } +define i64 @test70_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test70_2 +; ALL-NOT: psrld +entry: + %0 = bitcast <1 x i64> %a to <2 x i32> + %mmx_var.i = bitcast <2 x i32> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.psrli.d(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <2 x i32> + %3 = bitcast <2 x i32> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psrli.w(x86_mmx, i32) 
nounwind readnone define i64 @test69(<1 x i64> %a) nounwind readnone optsize ssp { @@ -397,6 +423,19 @@ ret i64 %4 } +define i64 @test66_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test66_2 +; ALL-NOT: psllw +entry: + %0 = bitcast <1 x i64> %a to <4 x i16> + %mmx_var.i = bitcast <4 x i16> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.pslli.w(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <4 x i16> + %3 = bitcast <4 x i16> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psra.d(x86_mmx, x86_mmx) nounwind readnone define i64 @test65(<1 x i64> %a, <1 x i64> %b) nounwind readnone optsize ssp { Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -25236,6 +25236,9 @@ // Clamp out of bounds shift amounts since they will otherwise be masked // to 8-bits which may make it no longer out of bounds. unsigned ShiftAmount = C->getAPIntValue().getLimitedValue(255); + if (ShiftAmount == 0) + return Op.getOperand(1); + return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Op.getValueType(), Op.getOperand(0), Op.getOperand(1), DAG.getTargetConstant(ShiftAmount, DL, MVT::i32)); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83534.276922.patch Type: text/x-patch Size: 2784 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 23:45:55 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:45:55 +0000 (UTC) Subject: [PATCH] D83535: [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. Message-ID: craig.topper created this revision. craig.topper added reviewers: efriedma, lebedev.ri, nlopes, spatel, reames. Herald added subscribers: kerbowa, hiraditya, nhaehnle, jvesely. 
Herald added a project: LLVM. This matches the recent change to InstSimplify from D83440. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83535 Files: llvm/lib/IR/ConstantFold.cpp llvm/test/Transforms/InferAddressSpaces/AMDGPU/select.ll llvm/test/Transforms/InstSimplify/select.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83535.276923.patch Type: text/x-patch Size: 6217 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 23:47:52 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:47:52 +0000 (UTC) Subject: [PATCH] D83533: [Alignment][NFC] Update Bitcodewriter to use Align In-Reply-To: References: Message-ID: courbet accepted this revision. courbet added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:2951 + using AlignField = Bitfield::Element; // bits : 0-4 + using UsedWithInAllocaField = Bitfield::Element; // bits : 5 + using UnknownField = Bitfield::Element; // bits : 6 ---------------- s/5/AlignField::NextBit/ ? (same below) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83533/new/ https://reviews.llvm.org/D83533 From llvm-commits at lists.llvm.org Thu Jul 9 23:53:03 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 06:53:03 +0000 (UTC) Subject: [PATCH] D83534: [X86][MMX] Optimize MMX shift intrinsics. In-Reply-To: References: Message-ID: <1b6b465e936ced9b6c9e956a7ba75561@localhost.localdomain> craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land.
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83534/new/ https://reviews.llvm.org/D83534 From llvm-commits at lists.llvm.org Fri Jul 10 00:02:01 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:02:01 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: <3c76d736c45cff0695f37ac6958ed290@localhost.localdomain> dnsampaio updated this revision to Diff 276926. dnsampaio added a comment. Preserve name Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 Files: llvm/lib/Transforms/Scalar/BDCE.cpp llvm/test/Transforms/BDCE/sext_multi_uses.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D60413.276926.patch Type: text/x-patch Size: 4218 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 00:11:47 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Fri, 10 Jul 2020 00:11:47 -0700 (PDT) Subject: [llvm] 229dfb4 - [CodeGen] Replace calls to getVectorNumElements() in SelectionDAG::SplitVector Message-ID: <5f0814b3.1c69fb81.49193.c59a@mx.google.com> Author: David Sherwood Date: 2020-07-10T08:11:30+01:00 New Revision: 229dfb4728f45cf9607aaa564155c267f3a0f59c URL: https://github.com/llvm/llvm-project/commit/229dfb4728f45cf9607aaa564155c267f3a0f59c DIFF: https://github.com/llvm/llvm-project/commit/229dfb4728f45cf9607aaa564155c267f3a0f59c.diff LOG: [CodeGen] Replace calls to getVectorNumElements() in SelectionDAG::SplitVector This patch replaces some invalid calls to getVectorNumElements() with calls to getVectorMinNumElements() instead, since the code paths changed in this patch work for both fixed and scalable vector types. 
Fixes warnings in this test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83203 Added: Modified: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 0b80173cb419..806509120869 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -9639,14 +9639,22 @@ SelectionDAG::GetDependentSplitDestVTs(const EVT &VT, const EVT &EnvVT, std::pair SelectionDAG::SplitVector(const SDValue &N, const SDLoc &DL, const EVT &LoVT, const EVT &HiVT) { - assert(LoVT.getVectorNumElements() + HiVT.getVectorNumElements() <= - N.getValueType().getVectorNumElements() && + assert(LoVT.isScalableVector() == HiVT.isScalableVector() && + LoVT.isScalableVector() == N.getValueType().isScalableVector() && + "Splitting vector with an invalid mixture of fixed and scalable " + "vector types"); + assert(LoVT.getVectorMinNumElements() + HiVT.getVectorMinNumElements() <= + N.getValueType().getVectorMinNumElements() && "More vector elements requested than available!"); SDValue Lo, Hi; Lo = getNode(ISD::EXTRACT_SUBVECTOR, DL, LoVT, N, getVectorIdxConstant(0, DL)); + // For scalable vectors it is safe to use LoVT.getVectorMinNumElements() + // (rather than having to use ElementCount), because EXTRACT_SUBVECTOR scales + // IDX with the runtime scaling factor of the result vector type. For + // fixed-width result vectors, that runtime scaling factor is 1. 
Hi = getNode(ISD::EXTRACT_SUBVECTOR, DL, HiVT, N, - getVectorIdxConstant(LoVT.getVectorNumElements(), DL)); + getVectorIdxConstant(LoVT.getVectorMinNumElements(), DL)); return std::make_pair(Lo, Hi); } From llvm-commits at lists.llvm.org Fri Jul 10 00:11:52 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:11:52 +0000 (UTC) Subject: [PATCH] D83203: [CodeGen] Fix warnings in SelectionDAG::SplitVector In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG229dfb4728f4: [CodeGen] Replace calls to getVectorNumElements() in SelectionDAG::SplitVector (authored by david-arm). Changed prior to commit: https://reviews.llvm.org/D83203?vs=275924&id=276929#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83203/new/ https://reviews.llvm.org/D83203 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -9639,14 +9639,22 @@ std::pair SelectionDAG::SplitVector(const SDValue &N, const SDLoc &DL, const EVT &LoVT, const EVT &HiVT) { - assert(LoVT.getVectorNumElements() + HiVT.getVectorNumElements() <= - N.getValueType().getVectorNumElements() && + assert(LoVT.isScalableVector() == HiVT.isScalableVector() && + LoVT.isScalableVector() == N.getValueType().isScalableVector() && + "Splitting vector with an invalid mixture of fixed and scalable " + "vector types"); + assert(LoVT.getVectorMinNumElements() + HiVT.getVectorMinNumElements() <= + N.getValueType().getVectorMinNumElements() && "More vector elements requested than available!"); SDValue Lo, Hi; Lo = getNode(ISD::EXTRACT_SUBVECTOR, DL, LoVT, N, getVectorIdxConstant(0, DL)); + // For scalable vectors it is safe to use 
LoVT.getVectorMinNumElements() + // (rather than having to use ElementCount), because EXTRACT_SUBVECTOR scales + // IDX with the runtime scaling factor of the result vector type. For + // fixed-width result vectors, that runtime scaling factor is 1. Hi = getNode(ISD::EXTRACT_SUBVECTOR, DL, HiVT, N, - getVectorIdxConstant(LoVT.getVectorNumElements(), DL)); + getVectorIdxConstant(LoVT.getVectorMinNumElements(), DL)); return std::make_pair(Lo, Hi); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83203.276929.patch Type: text/x-patch Size: 1613 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 00:19:06 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via llvm-commits) Date: Fri, 10 Jul 2020 00:19:06 -0700 (PDT) Subject: [llvm] 043eaa9 - [WebAssembly][NFC] Simplify vector shift lowering and add tests Message-ID: <5f08166a.1c69fb81.4a405.ca20@mx.google.com> Author: Thomas Lively Date: 2020-07-10T00:18:59-07:00 New Revision: 043eaa9a4a0808fe4e82b2ef1823ccafa491c065 URL: https://github.com/llvm/llvm-project/commit/043eaa9a4a0808fe4e82b2ef1823ccafa491c065 DIFF: https://github.com/llvm/llvm-project/commit/043eaa9a4a0808fe4e82b2ef1823ccafa491c065.diff LOG: [WebAssembly][NFC] Simplify vector shift lowering and add tests This patch builds on 0d7286a652 by simplifying the code for detecting splat values and adding new tests demonstrating the lowering of splatted absolute value shift amounts, which are common in code generated by Halide. The lowering is very bad right now, but subsequent patches will improve it considerably. The tests will be useful for evaluating the improvements in those patches. 
Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D83493 Added: Modified: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp index 3f4ebd501595..a9b9eceb4130 100644 --- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp +++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp @@ -1677,12 +1677,12 @@ SDValue WebAssemblyTargetLowering::LowerShift(SDValue Op, // Only manually lower vector shifts assert(Op.getSimpleValueType().isVector()); - auto ShiftVal = Op.getOperand(1); - if (!DAG.isSplatValue(ShiftVal, /*AllowUndefs=*/true)) + auto ShiftVal = DAG.getSplatValue(Op.getOperand(1)); + if (!ShiftVal) return unrollVectorShift(Op, DAG); - auto SplatVal = DAG.getSplatValue(ShiftVal); - assert(SplatVal != SDValue()); + // Use anyext because none of the high bits can affect the shift + ShiftVal = DAG.getAnyExtOrTrunc(ShiftVal, DL, MVT::i32); unsigned Opcode; switch (Op.getOpcode()) { @@ -1699,10 +1699,7 @@ SDValue WebAssemblyTargetLowering::LowerShift(SDValue Op, llvm_unreachable("unexpected opcode"); } - // Use anyext because none of the high bits can affect the shift - auto ScalarShift = DAG.getAnyExtOrTrunc(SplatVal, DL, MVT::i32); - return DAG.getNode(Opcode, DL, Op.getValueType(), Op.getOperand(0), - ScalarShift); + return DAG.getNode(Opcode, DL, Op.getValueType(), Op.getOperand(0), ShiftVal); } //===----------------------------------------------------------------------===// diff --git a/llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll b/llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll index ded430f89545..2473f0b27b7e 100644 --- a/llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll +++ 
b/llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll @@ -25,3 +25,79 @@ define <16 x i8> @shl_add(<16 x i8> %v, i8 %a, i8 %b) { %r = shl <16 x i8> %v, %shift ret <16 x i8> %r } + +; CHECK-LABEL: shl_abs: +; CHECK-NEXT: .functype shl_abs (v128, i32) -> (v128) +; CHECK-NEXT: i8x16.extract_lane_u $push8=, $0, 0 +; CHECK-NEXT: i8x16.splat $push0=, $1 +; CHECK-NEXT: i8x16.abs $push98=, $pop0 +; CHECK-NEXT: local.tee $push97=, $2=, $pop98 +; CHECK-NEXT: i8x16.extract_lane_u $push6=, $pop97, 0 +; CHECK-NEXT: i32.const $push2=, 7 +; CHECK-NEXT: i32.and $push7=, $pop6, $pop2 +; CHECK-NEXT: i32.shl $push9=, $pop8, $pop7 +; CHECK-NEXT: i8x16.splat $push10=, $pop9 +; CHECK-NEXT: i8x16.extract_lane_u $push4=, $0, 1 +; CHECK-NEXT: i8x16.extract_lane_u $push1=, $2, 1 +; CHECK-NEXT: i32.const $push96=, 7 +; CHECK-NEXT: i32.and $push3=, $pop1, $pop96 +; CHECK-NEXT: i32.shl $push5=, $pop4, $pop3 +; CHECK-NEXT: i8x16.replace_lane $push11=, $pop10, 1, $pop5 +; ... +; CHECK: i8x16.extract_lane_u $push79=, $0, 15 +; CHECK-NEXT: i8x16.extract_lane_u $push77=, $2, 15 +; CHECK-NEXT: i32.const $push82=, 7 +; CHECK-NEXT: i32.and $push78=, $pop77, $pop82 +; CHECK-NEXT: i32.shl $push80=, $pop79, $pop78 +; CHECK-NEXT: i8x16.replace_lane $push81=, $pop76, 15, $pop80 +; CHECK-NEXT: return $pop81 +define <16 x i8> @shl_abs(<16 x i8> %v, i8 %a) { + %t1 = insertelement <16 x i8> undef, i8 %a, i32 0 + %va = shufflevector <16 x i8> %t1, <16 x i8> undef, <16 x i32> zeroinitializer + %nva = sub <16 x i8> zeroinitializer, %va + %c = icmp sgt <16 x i8> %va, zeroinitializer + %shift = select <16 x i1> %c, <16 x i8> %va, <16 x i8> %nva + %r = shl <16 x i8> %v, %shift + ret <16 x i8> %r +} + +; CHECK-LABEL: shl_abs_add: +; CHECK-NEXT: .functype shl_abs_add (v128, i32, i32) -> (v128) +; CHECK-NEXT: i8x16.extract_lane_u $push11=, $0, 0 +; CHECK-NEXT: i8x16.splat $push1=, $1 +; CHECK-NEXT: i8x16.splat $push0=, $2 +; CHECK-NEXT: i8x16.add $push2=, $pop1, $pop0 +; CHECK-NEXT: v8x16.shuffle $push3=, $pop2, 
$0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 +; CHECK-NEXT: i8x16.abs $push101=, $pop3 +; CHECK-NEXT: local.tee $push100=, $3=, $pop101 +; CHECK-NEXT: i8x16.extract_lane_u $push9=, $pop100, 0 +; CHECK-NEXT: i32.const $push5=, 7 +; CHECK-NEXT: i32.and $push10=, $pop9, $pop5 +; CHECK-NEXT: i32.shl $push12=, $pop11, $pop10 +; CHECK-NEXT: i8x16.splat $push13=, $pop12 +; CHECK-NEXT: i8x16.extract_lane_u $push7=, $0, 1 +; CHECK-NEXT: i8x16.extract_lane_u $push4=, $3, 1 +; CHECK-NEXT: i32.const $push99=, 7 +; CHECK-NEXT: i32.and $push6=, $pop4, $pop99 +; CHECK-NEXT: i32.shl $push8=, $pop7, $pop6 +; CHECK-NEXT: i8x16.replace_lane $push14=, $pop13, 1, $pop8 +; ... +; CHECK: i8x16.extract_lane_u $push82=, $0, 15 +; CHECK-NEXT: i8x16.extract_lane_u $push80=, $3, 15 +; CHECK-NEXT: i32.const $push85=, 7 +; CHECK-NEXT: i32.and $push81=, $pop80, $pop85 +; CHECK-NEXT: i32.shl $push83=, $pop82, $pop81 +; CHECK-NEXT: i8x16.replace_lane $push84=, $pop79, 15, $pop83 +; CHECK-NEXT: return $pop84 +define <16 x i8> @shl_abs_add(<16 x i8> %v, i8 %a, i8 %b) { + %t1 = insertelement <16 x i8> undef, i8 %a, i32 0 + %va = shufflevector <16 x i8> %t1, <16 x i8> undef, <16 x i32> zeroinitializer + %t2 = insertelement <16 x i8> undef, i8 %b, i32 0 + %vb = shufflevector <16 x i8> %t2, <16 x i8> undef, <16 x i32> zeroinitializer + %vadd = add <16 x i8> %va, %vb + %nvadd = sub <16 x i8> zeroinitializer, %vadd + %c = icmp sgt <16 x i8> %vadd, zeroinitializer + %shift = select <16 x i1> %c, <16 x i8> %vadd, <16 x i8> %nvadd + %r = shl <16 x i8> %v, %shift + ret <16 x i8> %r +} From llvm-commits at lists.llvm.org Fri Jul 10 00:19:17 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:19:17 +0000 (UTC) Subject: [PATCH] D83493: [WebAssembly][NFC] Simplify vector shift lowering and add tests In-Reply-To: References: Message-ID: <359b71adf4594033cba4dc774bb217b8@localhost.localdomain> This revision was automatically updated to reflect 
the committed changes. Closed by commit rG043eaa9a4a08: [WebAssembly][NFC] Simplify vector shift lowering and add tests (authored by tlively). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83493/new/ https://reviews.llvm.org/D83493 Files: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83493.276931.patch Type: text/x-patch Size: 5072 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 00:24:26 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:24:26 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks In-Reply-To: References: Message-ID: <7334c7002e7056fdb2e089fe3570121b@localhost.localdomain> mkazantsev updated this revision to Diff 276933. mkazantsev added a comment. Implemented deduplication (used SetVector to ensure that we have deterministic fold order). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83284/new/ https://reviews.llvm.org/D83284 Files: llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp llvm/test/Transforms/InstCombine/select.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83284.276933.patch Type: text/x-patch Size: 7110 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 00:25:09 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:25:09 +0000 (UTC) Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores In-Reply-To: References: Message-ID: <9007c245861f6177bd6440f22bfba13e@localhost.localdomain> david-arm accepted this revision. david-arm added a comment. This revision is now accepted and ready to land. LGTM! 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83137/new/ https://reviews.llvm.org/D83137 From llvm-commits at lists.llvm.org Fri Jul 10 00:27:16 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:27:16 +0000 (UTC) Subject: [PATCH] D83537: [WebAssembly] Use ISD::SPLAT_VECTOR for splats Message-ID: tlively created this revision. tlively added reviewers: aheejin, dschuff. Herald added subscribers: llvm-commits, sunfish, hiraditya, jgravelle-google, sbc100. Herald added a project: LLVM. This patch legalizes ISD::SPLAT_VECTOR to enable the generic DAG combines that create splat_vector nodes. This simplifies some ISel patterns, although ISD::isBuildVectorAllOnes has to be extended to recognize splat_vectors to keep the vnot pattern fragment working. The AddedComplexity for splats is also removed so that we no longer prefer constant splats over v128.const instructions. This is consistent with the instruction preferences used in BUILD_VECTOR lowering and reduces the instruction count in many tests. There is a small regression in that insert_vector_elts into undef vectors at constant indices that could previously have been turned into swizzles can no longer be simplified that way because those nodes are combined to splat_vector nodes instead of BUILD_VECTOR nodes. This change includes a custom target combine meant to fix this, but unfortunately the generic combine gets precedence over the custom combine. Fixing this is left as future work, and the custom combine is kept because it is still useful in the non-constant index case. See @swizzle_one_i8x16 and @swizzle_one_var_i8x16 in simd-build-vector.ll for details. The motivation for this change is that follow-on patches will introduce new combines that will greatly improve codegen for splatted vector shift values but rely on splats having no undef lanes. Unlike a splatting build_vector, a splat_vector node never has undef lanes. 
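To make the undef-lane point concrete, here is a toy model in plain C++. It is not the SelectionDAG API: `Lane` and `getSplat` are hypothetical names, and a lane with `Defined == false` stands in for an undef lane. The sketch shows how a build_vector-style splat check can succeed even when some lanes are undef, which is the situation the description says cannot arise with a splat_vector node.

```cpp
#include <cassert>
#include <vector>

// Toy lane: Defined == false models an undef lane.
struct Lane {
  bool Defined;
  int Value;
};

// Returns true and sets Out if all defined lanes agree on a single value.
// With AllowUndefs == false, any undef lane makes the check fail.
// Out is left untouched on failure.
bool getSplat(const std::vector<Lane> &Lanes, bool AllowUndefs, int &Out) {
  bool Found = false;
  int Splat = 0;
  for (const Lane &L : Lanes) {
    if (!L.Defined) {
      if (!AllowUndefs)
        return false;
      continue; // undef lanes are compatible with any splat value
    }
    if (Found && L.Value != Splat)
      return false; // two defined lanes disagree
    Splat = L.Value;
    Found = true;
  }
  if (Found)
    Out = Splat;
  return Found; // false when every lane is undef
}
```

In this model, a combine that needs every lane to hold the splat value has to call the check with `AllowUndefs == false`, or work on a representation that has no undef lanes to begin with.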
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83537 Files: llvm/include/llvm/Target/TargetSelectionDAG.td llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/Target/WebAssembly/WebAssemblyISD.def llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td llvm/test/CodeGen/WebAssembly/simd-arith.ll llvm/test/CodeGen/WebAssembly/simd-build-vector.ll llvm/test/CodeGen/WebAssembly/simd-load-splat.ll llvm/test/CodeGen/WebAssembly/simd.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83537.276934.patch Type: text/x-patch Size: 27977 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 00:29:36 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:29:36 +0000 (UTC) Subject: [PATCH] D83196: [CodeGen] Fix a warning in DAGTypeLegalizer::SetSplitVector In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGda731894a2fe: [CodeGen] Replace calls to getVectorNumElements() in DAGTypeLegalizer… (authored by david-arm). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83196/new/ https://reviews.llvm.org/D83196 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp @@ -835,9 +835,9 @@ void DAGTypeLegalizer::SetSplitVector(SDValue Op, SDValue Lo, SDValue Hi) { assert(Lo.getValueType().getVectorElementType() == - Op.getValueType().getVectorElementType() && - 2*Lo.getValueType().getVectorNumElements() == - Op.getValueType().getVectorNumElements() && + Op.getValueType().getVectorElementType() && + Lo.getValueType().getVectorElementCount() * 2 == + Op.getValueType().getVectorElementCount() && Hi.getValueType() == Lo.getValueType() && "Invalid type for split vector"); // Lo/Hi may have been newly allocated, if so, add nodeid's as relevant. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83196.276935.patch Type: text/x-patch Size: 929 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 00:29:36 2020 From: llvm-commits at lists.llvm.org (David Sherwood via llvm-commits) Date: Fri, 10 Jul 2020 00:29:36 -0700 (PDT) Subject: [llvm] da73189 - [CodeGen] Replace calls to getVectorNumElements() in DAGTypeLegalizer::SetSplitVector Message-ID: <5f0818e0.1c69fb81.7a6cc.cf2d@mx.google.com> Author: David Sherwood Date: 2020-07-10T08:29:17+01:00 New Revision: da731894a2fe45fd5bec9698f3206c1fdee2829a URL: https://github.com/llvm/llvm-project/commit/da731894a2fe45fd5bec9698f3206c1fdee2829a DIFF: https://github.com/llvm/llvm-project/commit/da731894a2fe45fd5bec9698f3206c1fdee2829a.diff LOG: [CodeGen] Replace calls to getVectorNumElements() in DAGTypeLegalizer::SetSplitVector In DAGTypeLegalizer::SetSplitVector I have changed calls in the assert from getVectorNumElements() to getVectorElementCount(), since this code path works for both fixed and scalable vectors. 
This fixes up one warning in the test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83196 Added: Modified: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp index 2e1377c2c173..ae087d3bbd8c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp @@ -835,9 +835,9 @@ void DAGTypeLegalizer::GetSplitVector(SDValue Op, SDValue &Lo, void DAGTypeLegalizer::SetSplitVector(SDValue Op, SDValue Lo, SDValue Hi) { assert(Lo.getValueType().getVectorElementType() == - Op.getValueType().getVectorElementType() && - 2*Lo.getValueType().getVectorNumElements() == - Op.getValueType().getVectorNumElements() && + Op.getValueType().getVectorElementType() && + Lo.getValueType().getVectorElementCount() * 2 == + Op.getValueType().getVectorElementCount() && Hi.getValueType() == Lo.getValueType() && "Invalid type for split vector"); // Lo/Hi may have been newly allocated, if so, add nodeid's as relevant. 
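The reason for comparing element counts rather than raw element numbers can be shown with a toy model. The `Count` struct below is a hypothetical simplification (a minimum element count plus a scalable flag) and is not LLVM's actual `ElementCount` class; `isValidHalf` is likewise an illustrative name. It only sketches why doubling and equality should take the scalable flag into account.

```cpp
#include <cassert>

// Hypothetical simplification of an element count: a vector has
// Min * vscale elements when Scalable is true, and exactly Min otherwise.
struct Count {
  unsigned Min;
  bool Scalable;
};

// Scaling multiplies the minimum and keeps the scalable flag.
inline Count operator*(Count C, unsigned Factor) {
  return Count{C.Min * Factor, C.Scalable};
}

// Two counts are equal only if both the minimum and the flag match.
inline bool operator==(Count A, Count B) {
  return A.Min == B.Min && A.Scalable == B.Scalable;
}

// Mirrors the shape of the assertion in the diff: the low half must hold
// exactly half the elements of the original, with matching scalability.
inline bool isValidHalf(Count Lo, Count Op) { return Lo * 2 == Op; }
```

Under this model a fixed 2-element half is not accepted for a scalable 4-element vector, a mismatch that a comparison of plain minimum counts would miss.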
From llvm-commits at lists.llvm.org Fri Jul 10 00:35:00 2020 From: llvm-commits at lists.llvm.org (Diogo Sampaio via llvm-commits) Date: Fri, 10 Jul 2020 00:35:00 -0700 (PDT) Subject: [llvm] 7bf1683 - [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses Message-ID: <5f081a24.1c69fb81.f9d86.c70c@mx.google.com> Author: Diogo Sampaio Date: 2020-07-10T08:34:53+01:00 New Revision: 7bf168390fd05460b1c0df3fa570758c6be718fd URL: https://github.com/llvm/llvm-project/commit/7bf168390fd05460b1c0df3fa570758c6be718fd DIFF: https://github.com/llvm/llvm-project/commit/7bf168390fd05460b1c0df3fa570758c6be718fd.diff LOG: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses Summary: This allows to convert any SExt to a ZExt when we know none of the extended bits are used, specially in cases where there are multiple uses of the value. Reviewers: dmgreen, eli.friedman, spatel, lebedev.ri, nikic Reviewed By: lebedev.ri, nikic Subscribers: hiraditya, dmgreen, craig.topper, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60413 Added: Modified: llvm/lib/Transforms/Scalar/BDCE.cpp llvm/test/Transforms/BDCE/sext_multi_uses.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/BDCE.cpp b/llvm/lib/Transforms/Scalar/BDCE.cpp index c22ea0b8271e..767c7656dcfa 100644 --- a/llvm/lib/Transforms/Scalar/BDCE.cpp +++ b/llvm/lib/Transforms/Scalar/BDCE.cpp @@ -9,7 +9,8 @@ // This file implements the Bit-Tracking Dead Code Elimination pass. Some // instructions (shifts, some ands, ors, etc.) kill some of their input bits. // We track these dead bits and remove instructions that compute only these -// dead bits. +// dead bits. We also simplify sext that generates unused extension bits, +// converting it to a zext. 
// //===----------------------------------------------------------------------===// @@ -19,6 +20,7 @@ #include "llvm/ADT/Statistic.h" #include "llvm/Analysis/DemandedBits.h" #include "llvm/Analysis/GlobalsModRef.h" +#include "llvm/IR/IRBuilder.h" #include "llvm/IR/InstIterator.h" #include "llvm/IR/Instructions.h" #include "llvm/InitializePasses.h" @@ -33,6 +35,8 @@ using namespace llvm; STATISTIC(NumRemoved, "Number of instructions removed (unused)"); STATISTIC(NumSimplified, "Number of instructions trivialized (dead bits)"); +STATISTIC(NumSExt2ZExt, + "Number of sign extension instructions converted to zero extension"); /// If an instruction is trivialized (dead), then the chain of users of that /// instruction may need to be cleared of assumptions that can no longer be @@ -109,6 +113,24 @@ static bool bitTrackingDCE(Function &F, DemandedBits &DB) { continue; } + // Convert SExt into ZExt if none of the extension bits is required + if (SExtInst *SE = dyn_cast(&I)) { + APInt Demanded = DB.getDemandedBits(SE); + const uint32_t SrcBitSize = SE->getSrcTy()->getScalarSizeInBits(); + auto *const DstTy = SE->getDestTy(); + const uint32_t DestBitSize = DstTy->getScalarSizeInBits(); + if (Demanded.countLeadingZeros() >= (DestBitSize - SrcBitSize)) { + clearAssumptionsOfUsers(SE, DB); + IRBuilder<> Builder(SE); + I.replaceAllUsesWith( + Builder.CreateZExt(SE->getOperand(0), DstTy, SE->getName())); + Worklist.push_back(SE); + Changed = true; + NumSExt2ZExt++; + continue; + } + } + for (Use &U : I.operands()) { // DemandedBits only detects dead integer uses. 
if (!U->getType()->isIntOrIntVectorTy()) diff --git a/llvm/test/Transforms/BDCE/sext_multi_uses.ll b/llvm/test/Transforms/BDCE/sext_multi_uses.ll index 97709357919e..fa8549e0dec3 100644 --- a/llvm/test/Transforms/BDCE/sext_multi_uses.ll +++ b/llvm/test/Transforms/BDCE/sext_multi_uses.ll @@ -1,11 +1,11 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt -o - -bdce -S %s | FileCheck %s +; RUN: opt -S -bdce < %s | FileCheck %s define i32 @ZEXT_0(i16 %a) { ; CHECK-LABEL: @ZEXT_0( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 -; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 65280 -; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[EXT1:%.*]] = zext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT1]], 65280 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT1]], 8 ; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 ; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] ; CHECK-NEXT: ret i32 [[OR]] @@ -22,10 +22,10 @@ entry: define i32 @ZEXT_1(i16 %a) { ; CHECK-LABEL: @ZEXT_1( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32 -; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8 +; CHECK-NEXT: [[EXT1:%.*]] = zext i16 [[A:%.*]] to i32 +; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT1]], 8 ; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255 -; CHECK-NEXT: [[AND:%.*]] = or i32 [[EXT]], -65536 +; CHECK-NEXT: [[AND:%.*]] = or i32 [[EXT1]], -65536 ; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]] ; CHECK-NEXT: ret i32 [[OR]] ; @@ -99,8 +99,8 @@ entry: define i16 @clear_assumptions(i8 %x, i16 %y) { ; CHECK-LABEL: @clear_assumptions( -; CHECK-NEXT: [[EXT:%.*]] = sext i8 [[X:%.*]] to i16 -; CHECK-NEXT: [[ADD:%.*]] = add nsw i16 [[EXT]], [[Y:%.*]] +; CHECK-NEXT: [[EXT1:%.*]] = zext i8 [[X:%.*]] to i16 +; CHECK-NEXT: [[ADD:%.*]] = add i16 [[EXT1]], [[Y:%.*]] ; CHECK-NEXT: [[AND:%.*]] = and i16 [[ADD]], 255 ; CHECK-NEXT: ret i16 [[AND]] ; From llvm-commits at lists.llvm.org Fri Jul 10 
00:35:03 2020 From: llvm-commits at lists.llvm.org (Diogo N. Sampaio via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:35:03 +0000 (UTC) Subject: [PATCH] D60413: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG7bf168390fd0: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses (authored by dnsampaio). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60413/new/ https://reviews.llvm.org/D60413 Files: llvm/lib/Transforms/Scalar/BDCE.cpp llvm/test/Transforms/BDCE/sext_multi_uses.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D60413.276936.patch Type: text/x-patch Size: 4218 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 00:38:49 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:38:49 +0000 (UTC) Subject: [PATCH] D83537: [WebAssembly] Use ISD::SPLAT_VECTOR for splats In-Reply-To: References: Message-ID: <65ee6aa57d65d9b518ada373687e7e0b@localhost.localdomain> tlively planned changes to this revision. tlively added a comment. This whole patch got a whole lot more complex than I thought it would be at first, so I'm going to split out some of the separately-useful parts and experiment with other ways of getting rid of the undef lanes in splat build_vectors. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83537/new/ https://reviews.llvm.org/D83537 From llvm-commits at lists.llvm.org Fri Jul 10 00:46:21 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:46:21 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks In-Reply-To: References: Message-ID: <8ce12043bfce5f0f6b31ce1878c901a2@localhost.localdomain> nikic added inline comments. ================ Comment at: llvm/test/Transforms/InstCombine/select.ll:2286 +; CHECK: exit: +; CHECK-NEXT: ret i32 [[B:%.*]] +; ---------------- I don't understand why this returns `%B` (and what the difference to the previous test is, for that matter). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83284/new/ https://reviews.llvm.org/D83284 From llvm-commits at lists.llvm.org Fri Jul 10 00:59:16 2020 From: llvm-commits at lists.llvm.org (Qing Shan Zhang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 07:59:16 +0000 (UTC) Subject: [PATCH] D83437: [PowerPC] Enable default support of quad precision operations In-Reply-To: References: Message-ID: <2af140474917c8e387397dca8558f45f@localhost.localdomain> steven.zhang accepted this revision. steven.zhang added a comment. This revision is now accepted and ready to land. LGTM as long as you fix the unintended change to that line of the test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83437/new/ https://reviews.llvm.org/D83437 From llvm-commits at lists.llvm.org Fri Jul 10 01:02:13 2020 From: llvm-commits at lists.llvm.org (Boris Brezillon via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:02:13 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: bbrezillon added a comment. In D83473#2143152 , @jvesely wrote: > What is the problem this patch is trying to address?
Well, the primary goal was to have consistent values in `clang/lib/Headers/opencl-c-base.h` and `libclc/generic/include/clc/float/definitions.h` to avoid the mess when one links against libclc but includes `opencl-c-base.h`. Not to mention that having 2 conflicting definitions in headers that both live in the same code base and are supposed to represent the same thing is confusing, to say the least. > The specs do not mandate these two values to be different. It might be me misunderstanding the spec here. I read > "The value of FP_ILOGB0 shall be either INT_MIN or -INT_MAX. The value of FP_ILOGBNAN shall be either INT_MAX or INT_MIN." as (`FP_ILOGB0=INT_MIN` and `FP_ILOGBNAN=INT_MAX`) or (`FP_ILOGB0=-INT_MAX` and `FP_ILOGBNAN=INT_MIN`). But you're probably right, there's nothing stating that `FP_ILOGB0` and `FP_ILOGBNAN` should map to different values, it's just the pattern I've seen in various libc implementations. > On the more practical side. > This patch only changes fp32 implementation to return the new value leaving the fp64 implementation to return `INT_MIN` in both cases. Oops. The fp64 version should definitely be patched accordingly. > The implementation now returns `FP_ILOGBNAN` even for `Inf` input, which is not correct. Hm, nope, it still returns `0x7fffffff`, which is `INT_MAX`. I think you're referring to my comment, where I'm floating the idea of merging the 2 tests into a single one since `FP_ILOGBNAN` is now also equal to `INT_MAX`, but as mentioned there, I think clarity prevails over optimization (especially since clang might optimize that for us anyway). > CLC spec doesn't talk about `Inf` inputs, but the libm behaviour is to return `INT_MAX`, which might be useful. Yep, and I didn't change that part. > If `FP_ILOGBNAN` and `FP_ILOGB0` need to be different it'd be better to use `FP_ILOGBNAN == INT_MIN` and `FP_ILOGB0 == -INT_MAX`.
Except you'd then have a mismatch between `clang/lib/Headers/opencl-c-base.h` and `libclc/generic/include/clc/float/definitions.h`. So maybe the answer is don't include `opencl-c-base.h` when you link against libclc, but as I mentioned above, the fact that both headers living in the same code base define 2 different values for the same thing is confusing. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Fri Jul 10 01:03:50 2020 From: llvm-commits at lists.llvm.org (Clement Courbet via llvm-commits) Date: Fri, 10 Jul 2020 01:03:50 -0700 (PDT) Subject: [compiler-rt] 68c011a - [builtins] Optimize udivmodti4 for many platforms. Message-ID: <5f0820e6.1c69fb81.28025.bf0a@mx.google.com> Author: Danila Kutenin Date: 2020-07-10T09:59:16+02:00 New Revision: 68c011aa085ab8ec198198e45c83de605a7dc31f URL: https://github.com/llvm/llvm-project/commit/68c011aa085ab8ec198198e45c83de605a7dc31f DIFF: https://github.com/llvm/llvm-project/commit/68c011aa085ab8ec198198e45c83de605a7dc31f.diff LOG: [builtins] Optimize udivmodti4 for many platforms. 
Summary: While benchmarking uint128 division we found out that it has huge latency for small divisors https://reviews.llvm.org/D83027 ``` Benchmark Time(ns) CPU(ns) Iterations -------------------------------------------------------------------------------------------------- BM_DivideIntrinsic128UniformDivisor 13.0 13.0 55000000 BM_DivideIntrinsic128UniformDivisor<__int128> 14.3 14.3 50000000 BM_RemainderIntrinsic128UniformDivisor 13.5 13.5 52000000 BM_RemainderIntrinsic128UniformDivisor<__int128> 14.1 14.1 50000000 BM_DivideIntrinsic128SmallDivisor 153 153 5000000 BM_DivideIntrinsic128SmallDivisor<__int128> 170 170 3000000 BM_RemainderIntrinsic128SmallDivisor 153 153 5000000 BM_RemainderIntrinsic128SmallDivisor<__int128> 155 155 5000000 ``` This patch suggests a more optimized version of the division: If the divisor is 64 bit, we can proceed with the divq instruction on x86 or constant multiplication mechanisms for other platforms. Once both divisor and dividend are not less than 2**64, we use branch free subtract algorithm, it has at most 64 cycles. 
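As an illustration of the branch-free subtract step described above, here is a minimal sketch scaled down to 64-bit operands so that it can be checked against the native `/` and `%` operators. The name `udivmod64_branchfree` is hypothetical and is not part of compiler-rt. The sketch assumes a GCC- or Clang-compatible compiler for `__builtin_clzll`, and it builds the mask with an unsigned shift instead of the signed arithmetic shift used in the patch.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of compiler-rt: shift-subtract division
// whose per-bit step uses a mask instead of an if statement.
// Effects: if rem != 0, *rem = dividend % divisor
// Returns: dividend / divisor
static inline uint64_t udivmod64_branchfree(uint64_t dividend,
                                            uint64_t divisor, uint64_t *rem) {
  assert(divisor != 0);
  if (divisor > dividend) {
    if (rem)
      *rem = dividend;
    return 0;
  }
  // Align the top set bit of the divisor with that of the dividend.
  // Both values are nonzero here, so 0 <= shift <= 63.
  int shift = __builtin_clzll(divisor) - __builtin_clzll(dividend);
  divisor <<= shift;
  uint64_t quotient = 0;
  for (; shift >= 0; --shift) {
    quotient <<= 1;
    // divisor - dividend - 1 wraps around (top bit set) exactly when
    // dividend >= divisor, so s is all ones in that case and zero otherwise.
    const uint64_t s = 0 - ((divisor - dividend - 1) >> 63);
    quotient |= s & 1;        // record the quotient bit
    dividend -= divisor & s;  // subtract only when the mask is set
    divisor >>= 1;
  }
  if (rem)
    *rem = dividend;
  return quotient;
}
```

The loop runs once per bit of distance between the leading bits of the two operands, which is where the bound of at most 64 iterations comes from.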
After that our benchmarks improved significantly ``` Benchmark Time(ns) CPU(ns) Iterations -------------------------------------------------------------------------------------------------- BM_DivideIntrinsic128UniformDivisor 11.0 11.0 64000000 BM_DivideIntrinsic128UniformDivisor<__int128> 13.8 13.8 51000000 BM_RemainderIntrinsic128UniformDivisor 11.6 11.6 61000000 BM_RemainderIntrinsic128UniformDivisor<__int128> 13.7 13.7 52000000 BM_DivideIntrinsic128SmallDivisor 27.1 27.1 26000000 BM_DivideIntrinsic128SmallDivisor<__int128> 29.4 29.4 24000000 BM_RemainderIntrinsic128SmallDivisor 27.9 27.8 26000000 BM_RemainderIntrinsic128SmallDivisor<__int128> 29.1 29.1 25000000 ``` If not using divq instrinsics, it is still much better ``` Benchmark Time(ns) CPU(ns) Iterations -------------------------------------------------------------------------------------------------- BM_DivideIntrinsic128UniformDivisor 12.2 12.2 58000000 BM_DivideIntrinsic128UniformDivisor<__int128> 13.5 13.5 52000000 BM_RemainderIntrinsic128UniformDivisor 12.7 12.7 56000000 BM_RemainderIntrinsic128UniformDivisor<__int128> 13.7 13.7 51000000 BM_DivideIntrinsic128SmallDivisor 30.2 30.2 24000000 BM_DivideIntrinsic128SmallDivisor<__int128> 33.2 33.2 22000000 BM_RemainderIntrinsic128SmallDivisor 31.4 31.4 23000000 BM_RemainderIntrinsic128SmallDivisor<__int128> 33.8 33.8 21000000 ``` PowerPC benchmarks: Was ``` BM_DivideIntrinsic128UniformDivisor 22.3 22.3 32000000 BM_DivideIntrinsic128UniformDivisor<__int128> 23.8 23.8 30000000 BM_RemainderIntrinsic128UniformDivisor 22.5 22.5 32000000 BM_RemainderIntrinsic128UniformDivisor<__int128> 24.9 24.9 29000000 BM_DivideIntrinsic128SmallDivisor 394 394 2000000 BM_DivideIntrinsic128SmallDivisor<__int128> 397 397 2000000 BM_RemainderIntrinsic128SmallDivisor 399 399 2000000 BM_RemainderIntrinsic128SmallDivisor<__int128> 397 397 2000000 ``` With this patch ``` BM_DivideIntrinsic128UniformDivisor 21.7 21.7 33000000 BM_DivideIntrinsic128UniformDivisor<__int128> 23.0 23.0 
31000000 BM_RemainderIntrinsic128UniformDivisor 21.9 21.9 33000000 BM_RemainderIntrinsic128UniformDivisor<__int128> 23.9 23.9 30000000 BM_DivideIntrinsic128SmallDivisor 32.7 32.6 23000000 BM_DivideIntrinsic128SmallDivisor<__int128> 33.4 33.4 21000000 BM_RemainderIntrinsic128SmallDivisor 31.1 31.1 22000000 BM_RemainderIntrinsic128SmallDivisor<__int128> 33.2 33.2 22000000 ``` My email: danilak at google.com, I don't have commit rights Reviewers: howard.hinnant, courbet, MaskRay Reviewed By: courbet Subscribers: steven.zhang, #sanitizers Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D81809 Added: Modified: compiler-rt/lib/builtins/udivmodti4.c Removed: ################################################################################ diff --git a/compiler-rt/lib/builtins/udivmodti4.c b/compiler-rt/lib/builtins/udivmodti4.c index dd14a8b579ca..55def37c9e1f 100644 --- a/compiler-rt/lib/builtins/udivmodti4.c +++ b/compiler-rt/lib/builtins/udivmodti4.c @@ -14,182 +14,145 @@ #ifdef CRT_HAS_128BIT +// Returns the 128 bit division result by 64 bit. Result must fit in 64 bits. +// Remainder stored in r. +// Taken and adjusted from libdivide libdivide_128_div_64_to_64 division +// fallback. For a correctness proof see the reference for this algorithm +// in Knuth, Volume 2, section 4.3.1, Algorithm D. +UNUSED +static inline du_int udiv128by64to64default(du_int u1, du_int u0, du_int v, + du_int *r) { + const unsigned n_udword_bits = sizeof(du_int) * CHAR_BIT; + const du_int b = (1ULL << (n_udword_bits / 2)); // Number base (32 bits) + du_int un1, un0; // Norm. dividend LSD's + du_int vn1, vn0; // Norm. divisor digits + du_int q1, q0; // Quotient digits + du_int un64, un21, un10; // Dividend digit pairs + du_int rhat; // A remainder + si_int s; // Shift amount for normalization + + s = __builtin_clzll(v); + if (s > 0) { + // Normalize the divisor. 
+ v = v << s; + un64 = (u1 << s) | (u0 >> (n_udword_bits - s)); + un10 = u0 << s; // Shift dividend left + } else { + // Avoid undefined behavior of (u0 >> 64). + un64 = u1; + un10 = u0; + } + + // Break divisor up into two 32-bit digits. + vn1 = v >> (n_udword_bits / 2); + vn0 = v & 0xFFFFFFFF; + + // Break right half of dividend into two digits. + un1 = un10 >> (n_udword_bits / 2); + un0 = un10 & 0xFFFFFFFF; + + // Compute the first quotient digit, q1. + q1 = un64 / vn1; + rhat = un64 - q1 * vn1; + + // q1 has at most error 2. No more than 2 iterations. + while (q1 >= b || q1 * vn0 > b * rhat + un1) { + q1 = q1 - 1; + rhat = rhat + vn1; + if (rhat >= b) + break; + } + + un21 = un64 * b + un1 - q1 * v; + + // Compute the second quotient digit. + q0 = un21 / vn1; + rhat = un21 - q0 * vn1; + + // q0 has at most error 2. No more than 2 iterations. + while (q0 >= b || q0 * vn0 > b * rhat + un0) { + q0 = q0 - 1; + rhat = rhat + vn1; + if (rhat >= b) + break; + } + + *r = (un21 * b + un0 - q0 * v) >> s; + return q1 * b + q0; +} + +static inline du_int udiv128by64to64(du_int u1, du_int u0, du_int v, + du_int *r) { +#if defined(__x86_64__) + du_int result; + __asm__("divq %[v]" + : "=a"(result), "=d"(*r) + : [ v ] "r"(v), "a"(u0), "d"(u1)); + return result; +#else + return udiv128by64to64default(u1, u0, v, r); +#endif +} + // Effects: if rem != 0, *rem = a % b // Returns: a / b -// Translated from Figure 3-40 of The PowerPC Compiler Writer's Guide - COMPILER_RT_ABI tu_int __udivmodti4(tu_int a, tu_int b, tu_int *rem) { - const unsigned n_udword_bits = sizeof(du_int) * CHAR_BIT; const unsigned n_utword_bits = sizeof(tu_int) * CHAR_BIT; - utwords n; - n.all = a; - utwords d; - d.all = b; - utwords q; - utwords r; - unsigned sr; - // special cases, X is unknown, K != 0 - if (n.s.high == 0) { - if (d.s.high == 0) { - // 0 X - // --- - // 0 X - if (rem) - *rem = n.s.low % d.s.low; - return n.s.low / d.s.low; - } - // 0 X - // --- - // K X + utwords dividend; + dividend.all = 
a; + utwords divisor; + divisor.all = b; + utwords quotient; + utwords remainder; + if (divisor.all > dividend.all) { if (rem) - *rem = n.s.low; + *rem = dividend.all; return 0; } - // n.s.high != 0 - if (d.s.low == 0) { - if (d.s.high == 0) { - // K X - // --- - // 0 0 - if (rem) - *rem = n.s.high % d.s.low; - return n.s.high / d.s.low; - } - // d.s.high != 0 - if (n.s.low == 0) { - // K 0 - // --- - // K 0 - if (rem) { - r.s.high = n.s.high % d.s.high; - r.s.low = 0; - *rem = r.all; - } - return n.s.high / d.s.high; - } - // K K - // --- - // K 0 - if ((d.s.high & (d.s.high - 1)) == 0) /* if d is a power of 2 */ { - if (rem) { - r.s.low = n.s.low; - r.s.high = n.s.high & (d.s.high - 1); - *rem = r.all; - } - return n.s.high >> __builtin_ctzll(d.s.high); - } - // K K - // --- - // K 0 - sr = __builtin_clzll(d.s.high) - __builtin_clzll(n.s.high); - // 0 <= sr <= n_udword_bits - 2 or sr large - if (sr > n_udword_bits - 2) { - if (rem) - *rem = n.all; - return 0; - } - ++sr; - // 1 <= sr <= n_udword_bits - 1 - // q.all = n.all << (n_utword_bits - sr); - q.s.low = 0; - q.s.high = n.s.low << (n_udword_bits - sr); - // r.all = n.all >> sr; - r.s.high = n.s.high >> sr; - r.s.low = (n.s.high << (n_udword_bits - sr)) | (n.s.low >> sr); - } else /* d.s.low != 0 */ { - if (d.s.high == 0) { - // K X - // --- - // 0 K - if ((d.s.low & (d.s.low - 1)) == 0) /* if d is a power of 2 */ { - if (rem) - *rem = n.s.low & (d.s.low - 1); - if (d.s.low == 1) - return n.all; - sr = __builtin_ctzll(d.s.low); - q.s.high = n.s.high >> sr; - q.s.low = (n.s.high << (n_udword_bits - sr)) | (n.s.low >> sr); - return q.all; - } - // K X - // --- - // 0 K - sr = 1 + n_udword_bits + __builtin_clzll(d.s.low) - - __builtin_clzll(n.s.high); - // 2 <= sr <= n_utword_bits - 1 - // q.all = n.all << (n_utword_bits - sr); - // r.all = n.all >> sr; - if (sr == n_udword_bits) { - q.s.low = 0; - q.s.high = n.s.low; - r.s.high = 0; - r.s.low = n.s.high; - } else if (sr < n_udword_bits) /* 2 <= sr <= 
n_udword_bits - 1 */ { - q.s.low = 0; - q.s.high = n.s.low << (n_udword_bits - sr); - r.s.high = n.s.high >> sr; - r.s.low = (n.s.high << (n_udword_bits - sr)) | (n.s.low >> sr); - } else /* n_udword_bits + 1 <= sr <= n_utword_bits - 1 */ { - q.s.low = n.s.low << (n_utword_bits - sr); - q.s.high = (n.s.high << (n_utword_bits - sr)) | - (n.s.low >> (sr - n_udword_bits)); - r.s.high = 0; - r.s.low = n.s.high >> (sr - n_udword_bits); - } + // When the divisor fits in 64 bits, we can use an optimized path. + if (divisor.s.high == 0) { + remainder.s.high = 0; + if (dividend.s.high < divisor.s.low) { + // The result fits in 64 bits. + quotient.s.low = udiv128by64to64(dividend.s.high, dividend.s.low, + divisor.s.low, &remainder.s.low); + quotient.s.high = 0; } else { - // K X - // --- - // K K - sr = __builtin_clzll(d.s.high) - __builtin_clzll(n.s.high); - // 0 <= sr <= n_udword_bits - 1 or sr large - if (sr > n_udword_bits - 1) { - if (rem) - *rem = n.all; - return 0; - } - ++sr; - // 1 <= sr <= n_udword_bits - // q.all = n.all << (n_utword_bits - sr); - // r.all = n.all >> sr; - q.s.low = 0; - if (sr == n_udword_bits) { - q.s.high = n.s.low; - r.s.high = 0; - r.s.low = n.s.high; - } else { - r.s.high = n.s.high >> sr; - r.s.low = (n.s.high << (n_udword_bits - sr)) | (n.s.low >> sr); - q.s.high = n.s.low << (n_udword_bits - sr); - } + // First, divide with the high part to get the remainder in dividend.s.high. + // After that dividend.s.high < divisor.s.low. 
+ quotient.s.high = dividend.s.high / divisor.s.low; + dividend.s.high = dividend.s.high % divisor.s.low; + quotient.s.low = udiv128by64to64(dividend.s.high, dividend.s.low, + divisor.s.low, &remainder.s.low); } + if (rem) + *rem = remainder.all; + return quotient.all; } - // Not a special case - // q and r are initialized with: - // q.all = n.all << (n_utword_bits - sr); - // r.all = n.all >> sr; - // 1 <= sr <= n_utword_bits - 1 - su_int carry = 0; - for (; sr > 0; --sr) { - // r:q = ((r:q) << 1) | carry - r.s.high = (r.s.high << 1) | (r.s.low >> (n_udword_bits - 1)); - r.s.low = (r.s.low << 1) | (q.s.high >> (n_udword_bits - 1)); - q.s.high = (q.s.high << 1) | (q.s.low >> (n_udword_bits - 1)); - q.s.low = (q.s.low << 1) | carry; - // carry = 0; - // if (r.all >= d.all) + // 0 <= shift <= 63. + si_int shift = + __builtin_clzll(divisor.s.high) - __builtin_clzll(dividend.s.high); + divisor.all <<= shift; + quotient.s.high = 0; + quotient.s.low = 0; + for (; shift >= 0; --shift) { + quotient.s.low <<= 1; + // Branch free version of. 
+ // if (dividend.all >= divisor.all) // { - // r.all -= d.all; - // carry = 1; + // dividend.all -= divisor.all; + // carry = 1; // } - const ti_int s = (ti_int)(d.all - r.all - 1) >> (n_utword_bits - 1); - carry = s & 1; - r.all -= d.all & s; + const ti_int s = + (ti_int)(divisor.all - dividend.all - 1) >> (n_utword_bits - 1); + quotient.s.low |= s & 1; + dividend.all -= divisor.all & s; + divisor.all >>= 1; } - q.all = (q.all << 1) | carry; if (rem) - *rem = r.all; - return q.all; + *rem = dividend.all; + return quotient.all; } #endif // CRT_HAS_128BIT From llvm-commits at lists.llvm.org Fri Jul 10 01:04:58 2020 From: llvm-commits at lists.llvm.org (Maksym Wezdecki via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:04:58 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: <55afbf182972852f6152d4db2e1a1ce6@localhost.localdomain> mwezdeck updated this revision to Diff 276941. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 Files: llvm/lib/Support/ManagedStatic.cpp Index: llvm/lib/Support/ManagedStatic.cpp =================================================================== --- llvm/lib/Support/ManagedStatic.cpp +++ llvm/lib/Support/ManagedStatic.cpp @@ -76,8 +76,11 @@ /// llvm_shutdown - Deallocate and destroy all ManagedStatic variables. void llvm::llvm_shutdown() { - std::lock_guard Lock(*getManagedStaticMutex()); + { + std::lock_guard Lock(*getManagedStaticMutex()); - while (StaticList) - StaticList->destroy(); + while (StaticList) + StaticList->destroy(); + } + delete getManagedStaticMutex(); } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83372.276941.patch Type: text/x-patch Size: 614 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 01:06:43 2020 From: llvm-commits at lists.llvm.org (Maksym Wezdecki via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:06:43 +0000 (UTC) Subject: [PATCH] D83372: Fix for memory leak reported by Valgrind In-Reply-To: References: Message-ID: <73e39f28d43e4c0a24645750c937169b@localhost.localdomain> mwezdeck added a comment. Thank you very much for the explanation. Indeed, the context where a user loads a .so file thousands of times a day makes the startup-time requirement very important. Thanks again for pointing this out to me. I've updated the patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83372/new/ https://reviews.llvm.org/D83372 From llvm-commits at lists.llvm.org Fri Jul 10 01:11:14 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:11:14 +0000 (UTC) Subject: [PATCH] D83533: [Alignment][NFC] Update Bitcodewriter to use Align In-Reply-To: References: Message-ID: gchatelet updated this revision to Diff 276942. gchatelet added a comment. - update reader as well and address comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83533/new/ https://reviews.llvm.org/D83533 Files: llvm/include/llvm/Bitstream/BitCodes.h llvm/include/llvm/Bitstream/BitcodeCommon.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83533.276942.patch Type: text/x-patch Size: 8984 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 01:12:44 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:12:44 +0000 (UTC) Subject: [PATCH] D83533: [Alignment][NFC] Update Bitcodewriter to use Align In-Reply-To: References: Message-ID: <25eb2b7781f2fcc11cd90eb691bd9a57@localhost.localdomain> gchatelet updated this revision to Diff 276943. gchatelet added a comment. - Remove newline in BitCodes.h Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83533/new/ https://reviews.llvm.org/D83533 Files: llvm/include/llvm/Bitstream/BitcodeCommon.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83533.276943.patch Type: text/x-patch Size: 8682 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 01:14:59 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:14:59 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: <48953a60ff60751b9f6e14f4c04233d4@localhost.localdomain> SjoerdMeijer marked an inline comment as done. SjoerdMeijer added inline comments. ================ Comment at: llvm/docs/LangRef.rst:15574 +matrix, using a stride of %Stride between columns. For two consecutive columns +A and B, %Stride refers to the distance (the number of elements) between the +start of column A and the start of column B. The result matrix is linearized ---------------- fhahn wrote: > SjoerdMeijer wrote: > > I am actually now also interested in defining `%Stride` better. 
Using our new definition: > > > > > For a `R x C` matrix, element `i` of column `j` is at index `j * R + i` in its vector, with indices starting at 0. > > > > From the description of %Stride it follows that: > > > > %Stride = ( (j+1) * R + 0) - (j * R + 0) > > => > > %Stride = R > > > > So double checking: we can simplify the description of %Stride just by saying it is equal to the number of rows, is that correct? > Stride can be > the number of rows. > > For example, if you want to load a 2x2 sub-matrix from a 4x4 matrix, you would use `llvm.matrix.column.major.load(%start, 4, false, 2, 2)`, where %start points to the first element of the sub-matrix. > > The function to compute column addresses has an extensive comment about how things work: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp#L92 > > It boils down to something like: the start address of column I in memory is computed as `getelementptr %Start, I * Stride`. Ah yes, thanks, I see now. I will add this, and we have at least one more condition, Stride >= Rows, to add to the verifier. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Fri Jul 10 01:15:45 2020 From: llvm-commits at lists.llvm.org (Jay Foad via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:15:45 +0000 (UTC) Subject: [PATCH] D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways In-Reply-To: References: Message-ID: <11b532e8ca8e50cbb75a5b07f0d2252c@localhost.localdomain> foad added a comment. In D83394#2143122 , @sameerds wrote: > In D83394#2142277 , @foad wrote: > > > Rebase. > > Fix silly mistake in checking for negative offsets. > > > It's hard to see through the rebase, but did fixing the negative offset check add more tests? I'm assuming that the tests in the original patch did not capture this mistake, so it should warrant a new test. There are no new tests.
All the testing comes from staring at the changes in offset-split-flat.ll and offset-split-global.ll. In retrospect I should have noticed that there was something wrong with the original patch, because it didn't cause any changes in offset-split-flat.ll. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83394/new/ https://reviews.llvm.org/D83394 From llvm-commits at lists.llvm.org Fri Jul 10 01:32:30 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Mikael_Holm=C3=A9n_via_Phabricator?= via llvm-commits) Date: Fri, 10 Jul 2020 08:32:30 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <28ca439881063df534b0f0799c69ec16@localhost.localdomain> uabelho added a comment. Hi, I noticed that we got a runtime failure with this patch for our out-of-tree target. I haven't gone to the bottom with it yet, but I saw that a colleague of mine wrote a PR about a miscompile with -basic-aa-recphi that at least hasn't been handled according to bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37952 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 01:34:18 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Fri, 10 Jul 2020 01:34:18 -0700 (PDT) Subject: [llvm] 77133cc - [X86][AVX] Attempt to fold PACK(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(PACK(X,Y)). Message-ID: <5f08280a.1c69fb81.1ff27.e220@mx.google.com> Author: Simon Pilgrim Date: 2020-07-10T09:33:27+01:00 New Revision: 77133cc1e2c91678082d2098b959757e72dfce60 URL: https://github.com/llvm/llvm-project/commit/77133cc1e2c91678082d2098b959757e72dfce60 DIFF: https://github.com/llvm/llvm-project/commit/77133cc1e2c91678082d2098b959757e72dfce60.diff LOG: [X86][AVX] Attempt to fold PACK(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(PACK(X,Y)). 
Truncations lowered as shuffles of multiple (concatenated) vectors often leave us with lane-crossing shuffles that feed a PACKSS/PACKUS, if both shuffles are fed from the same 2 vector sources, then we can PACK the sources directly and shuffle the result instead. This is currently limited to whole i128 lanes in a 256-bit vector, but we can extend this if the need arises (but I'm not seeing many examples in real world code). Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avg.ll llvm/test/CodeGen/X86/bitcast-and-setcc-512.ll llvm/test/CodeGen/X86/packss.ll llvm/test/CodeGen/X86/vector-compare-results.ll llvm/test/CodeGen/X86/vector-pack-256.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index cdd7ba1ec432..4d3b0eda58f2 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -41937,6 +41937,37 @@ static SDValue combineVectorPackWithShuffle(SDNode *N, SelectionDAG &DAG) { } } + // Attempt to fold PACK(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(PACK(X,Y)). + // TODO: Relax shuffle scaling to support sub-128-bit subvector shuffles. 
+ if (VT.is256BitVector()) { + if (auto *SVN0 = dyn_cast(N0)) { + if (auto *SVN1 = dyn_cast(N1)) { + SmallVector ShuffleMask0, ShuffleMask1; + if (scaleShuffleElements(SVN0->getMask(), 2, ShuffleMask0) && + scaleShuffleElements(SVN1->getMask(), 2, ShuffleMask1)) { + SDValue Op00 = SVN0->getOperand(0); + SDValue Op01 = SVN0->getOperand(1); + SDValue Op10 = SVN1->getOperand(0); + SDValue Op11 = SVN1->getOperand(1); + if ((Op00 == Op11) && (Op01 == Op10)) { + std::swap(Op10, Op11); + ShuffleVectorSDNode::commuteMask(ShuffleMask1); + } + if ((Op00 == Op10) && (Op01 == Op11)) { + SmallVector ShuffleMask; + ShuffleMask.append(ShuffleMask0.begin(), ShuffleMask0.end()); + ShuffleMask.append(ShuffleMask1.begin(), ShuffleMask1.end()); + SDLoc DL(N); + SDValue Res = DAG.getNode(Opcode, DL, VT, Op00, Op01); + Res = DAG.getBitcast(MVT::v4i64, Res); + Res = DAG.getVectorShuffle(MVT::v4i64, DL, Res, Res, ShuffleMask); + return DAG.getBitcast(VT, Res); + } + } + } + } + } + return SDValue(); } diff --git a/llvm/test/CodeGen/X86/avg.ll b/llvm/test/CodeGen/X86/avg.ll index c6c2230a1d77..d2638a1681e8 100644 --- a/llvm/test/CodeGen/X86/avg.ll +++ b/llvm/test/CodeGen/X86/avg.ll @@ -233,9 +233,8 @@ define void @avg_v24i8(<24 x i8>* %a, <24 x i8>* %b) nounwind { ; AVX2-NEXT: vpsrld $1, %ymm2, %ymm2 ; AVX2-NEXT: vpsrld $1, %ymm1, %ymm1 ; AVX2-NEXT: vpsrld $1, %ymm0, %ymm0 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm3 = ymm1[2,3],ymm0[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0 -; AVX2-NEXT: vpackusdw %ymm3, %ymm0, %ymm0 +; AVX2-NEXT: vpackusdw %ymm0, %ymm1, %ymm0 +; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] ; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0 ; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1 ; AVX2-NEXT: vpackuswb %xmm1, %xmm0, %xmm0 @@ -597,25 +596,22 @@ define void @avg_v48i8(<48 x i8>* %a, <48 x i8>* %b) nounwind { ; AVX2-NEXT: vpsubd %ymm6, %ymm5, %ymm5 ; AVX2-NEXT: vpsrld $1, %ymm5, %ymm5 ; AVX2-NEXT: vpsrld $1, %ymm4, %ymm4 +; AVX2-NEXT: vpackusdw %ymm4, %ymm5, %ymm4 ; 
AVX2-NEXT: vpsrld $1, %ymm3, %ymm3 ; AVX2-NEXT: vpsrld $1, %ymm2, %ymm2 +; AVX2-NEXT: vpackusdw %ymm2, %ymm3, %ymm2 ; AVX2-NEXT: vpsrld $1, %ymm1, %ymm1 ; AVX2-NEXT: vpsrld $1, %ymm0, %ymm0 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm6 = ymm1[2,3],ymm0[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0 -; AVX2-NEXT: vpackusdw %ymm6, %ymm0, %ymm0 +; AVX2-NEXT: vpackusdw %ymm0, %ymm1, %ymm0 +; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] ; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255] ; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm6 = ymm3[2,3],ymm2[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm3, %ymm2 -; AVX2-NEXT: vpackusdw %ymm6, %ymm2, %ymm2 +; AVX2-NEXT: vpermq {{.*#+}} ymm2 = ymm2[0,2,1,3] ; AVX2-NEXT: vpand %ymm1, %ymm2, %ymm2 ; AVX2-NEXT: vperm2i128 {{.*#+}} ymm3 = ymm2[2,3],ymm0[2,3] ; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm2, %ymm0 ; AVX2-NEXT: vpackuswb %ymm3, %ymm0, %ymm0 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm5[2,3],ymm4[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm4, %ymm5, %ymm3 -; AVX2-NEXT: vpackusdw %ymm2, %ymm3, %ymm2 +; AVX2-NEXT: vpermq {{.*#+}} ymm2 = ymm4[0,2,1,3] ; AVX2-NEXT: vpand %ymm1, %ymm2, %ymm1 ; AVX2-NEXT: vextracti128 $1, %ymm1, %xmm2 ; AVX2-NEXT: vpackuswb %xmm2, %xmm1, %xmm1 diff --git a/llvm/test/CodeGen/X86/bitcast-and-setcc-512.ll b/llvm/test/CodeGen/X86/bitcast-and-setcc-512.ll index 2424e1c3a74d..bf251635ffb0 100644 --- a/llvm/test/CodeGen/X86/bitcast-and-setcc-512.ll +++ b/llvm/test/CodeGen/X86/bitcast-and-setcc-512.ll @@ -251,15 +251,12 @@ define i32 @v32i16(<32 x i16> %a, <32 x i16> %b, <32 x i16> %c, <32 x i16> %d) { ; AVX2: # %bb.0: ; AVX2-NEXT: vpcmpgtw %ymm3, %ymm1, %ymm1 ; AVX2-NEXT: vpcmpgtw %ymm2, %ymm0, %ymm0 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX2-NEXT: vpacksswb %ymm2, %ymm0, %ymm0 +; AVX2-NEXT: vpacksswb %ymm1, %ymm0, %ymm0 ; AVX2-NEXT: vpcmpgtw %ymm7, 
%ymm5, %ymm1 ; AVX2-NEXT: vpcmpgtw %ymm6, %ymm4, %ymm2 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm3 = ymm2[2,3],ymm1[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm2, %ymm1 -; AVX2-NEXT: vpacksswb %ymm3, %ymm1, %ymm1 +; AVX2-NEXT: vpacksswb %ymm1, %ymm2, %ymm1 ; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0 +; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] ; AVX2-NEXT: vpmovmskb %ymm0, %eax ; AVX2-NEXT: vzeroupper ; AVX2-NEXT: retq diff --git a/llvm/test/CodeGen/X86/packss.ll b/llvm/test/CodeGen/X86/packss.ll index f4b601afb569..16349ae2c7f9 100644 --- a/llvm/test/CodeGen/X86/packss.ll +++ b/llvm/test/CodeGen/X86/packss.ll @@ -369,10 +369,8 @@ define <32 x i8> @packsswb_icmp_zero_trunc_256(<16 x i16> %a0) { ; AVX2: # %bb.0: ; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1 ; AVX2-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0 -; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm0[4,5,6,7] -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm0 = zero,zero,ymm0[0,1] -; AVX2-NEXT: vpacksswb %ymm1, %ymm0, %ymm0 -; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] +; AVX2-NEXT: vpacksswb %ymm0, %ymm1, %ymm0 +; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,0,2,3] ; AVX2-NEXT: ret{{[l|q]}} %1 = icmp eq <16 x i16> %a0, zeroinitializer %2 = sext <16 x i1> %1 to <16 x i16> diff --git a/llvm/test/CodeGen/X86/vector-compare-results.ll b/llvm/test/CodeGen/X86/vector-compare-results.ll index 80cfa365e0f1..568dff8e1358 100644 --- a/llvm/test/CodeGen/X86/vector-compare-results.ll +++ b/llvm/test/CodeGen/X86/vector-compare-results.ll @@ -1453,15 +1453,13 @@ define <64 x i1> @test_cmp_v64i16(<64 x i16> %a0, <64 x i16> %a1) nounwind { ; AVX2-NEXT: movq %rdi, %rax ; AVX2-NEXT: vpcmpgtw %ymm5, %ymm1, %ymm1 ; AVX2-NEXT: vpcmpgtw %ymm4, %ymm0, %ymm0 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm4 = ymm0[2,3],ymm1[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX2-NEXT: vpacksswb %ymm4, %ymm0, %ymm0 +; AVX2-NEXT: vpacksswb %ymm1, %ymm0, %ymm0 +; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] ; AVX2-NEXT: vpmovmskb %ymm0, %ecx ; AVX2-NEXT: 
vpcmpgtw %ymm7, %ymm3, %ymm0 ; AVX2-NEXT: vpcmpgtw %ymm6, %ymm2, %ymm1 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm1[2,3],ymm0[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0 -; AVX2-NEXT: vpacksswb %ymm2, %ymm0, %ymm0 +; AVX2-NEXT: vpacksswb %ymm0, %ymm1, %ymm0 +; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] ; AVX2-NEXT: vpmovmskb %ymm0, %edx ; AVX2-NEXT: shlq $32, %rdx ; AVX2-NEXT: orq %rcx, %rdx diff --git a/llvm/test/CodeGen/X86/vector-pack-256.ll b/llvm/test/CodeGen/X86/vector-pack-256.ll index fb0e5d063f61..6d1016ba1d73 100644 --- a/llvm/test/CodeGen/X86/vector-pack-256.ll +++ b/llvm/test/CodeGen/X86/vector-pack-256.ll @@ -24,10 +24,7 @@ define <16 x i16> @trunc_concat_packssdw_256(<8 x i32> %a0, <8 x i32> %a1) nounw ; AVX2: # %bb.0: ; AVX2-NEXT: vpsrad $17, %ymm0, %ymm0 ; AVX2-NEXT: vpsrad $23, %ymm1, %ymm1 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX2-NEXT: vpackssdw %ymm2, %ymm0, %ymm0 -; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] +; AVX2-NEXT: vpackssdw %ymm1, %ymm0, %ymm0 ; AVX2-NEXT: retq ; ; AVX512-LABEL: trunc_concat_packssdw_256: @@ -64,10 +61,7 @@ define <16 x i16> @trunc_concat_packusdw_256(<8 x i32> %a0, <8 x i32> %a1) nounw ; AVX2-NEXT: vpsrld $17, %ymm0, %ymm0 ; AVX2-NEXT: vpbroadcastd {{.*#+}} ymm2 = [15,15,15,15,15,15,15,15] ; AVX2-NEXT: vpand %ymm2, %ymm1, %ymm1 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX2-NEXT: vpackusdw %ymm2, %ymm0, %ymm0 -; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] +; AVX2-NEXT: vpackusdw %ymm1, %ymm0, %ymm0 ; AVX2-NEXT: retq ; ; AVX512-LABEL: trunc_concat_packusdw_256: @@ -103,10 +97,7 @@ define <32 x i8> @trunc_concat_packsswb_256(<16 x i16> %a0, <16 x i16> %a1) noun ; AVX2: # %bb.0: ; AVX2-NEXT: vpsraw $15, %ymm0, %ymm0 ; AVX2-NEXT: vpand {{.*}}(%rip), %ymm1, %ymm1 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] -; AVX2-NEXT: vinserti128 $1, 
%xmm1, %ymm0, %ymm0 -; AVX2-NEXT: vpacksswb %ymm2, %ymm0, %ymm0 -; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] +; AVX2-NEXT: vpacksswb %ymm1, %ymm0, %ymm0 ; AVX2-NEXT: retq ; ; AVX512F-LABEL: trunc_concat_packsswb_256: @@ -155,10 +146,7 @@ define <32 x i8> @trunc_concat_packuswb_256(<16 x i16> %a0, <16 x i16> %a1) noun ; AVX2: # %bb.0: ; AVX2-NEXT: vpsrlw $15, %ymm0, %ymm0 ; AVX2-NEXT: vpand {{.*}}(%rip), %ymm1, %ymm1 -; AVX2-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] -; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX2-NEXT: vpackuswb %ymm2, %ymm0, %ymm0 -; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3] +; AVX2-NEXT: vpackuswb %ymm1, %ymm0, %ymm0 ; AVX2-NEXT: retq ; ; AVX512F-LABEL: trunc_concat_packuswb_256: From llvm-commits at lists.llvm.org Fri Jul 10 01:36:41 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:36:41 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: SjoerdMeijer marked an inline comment as done. SjoerdMeijer added inline comments. ================ Comment at: llvm/docs/LangRef.rst:15574 +matrix, using a stride of %Stride between columns. For two consecutive columns +A and B, %Stride refers to the distance (the number of elements) between the +start of column A and the start of column B. The result matrix is linearized ---------------- SjoerdMeijer wrote: > fhahn wrote: > > SjoerdMeijer wrote: > > > I am actually now also interested in defining `%Stride` better. Using our new definition: > > > > > > > For a `R x C` matrix, element `i` of column `j` is at index `j * R + i` in its vector, with indices starting at 0. 
> > > > > > From the description of %Stride it follows that: > > > > > > %Stride = ( (j+1) * R + 0) - (j * R + 0) > > > => > > > %Stride = R > > > > > > So double checking: we can simply the description of %Stride just by saying it is equal to the number of rows, is that correct? > > Stride can be > the number of rows. > > > > For example, if you want to load a 2x2 sub-matrix from a 4x4 matrix, you would use `llvm.matrix.column.major.load(%start, 4, false, 2, 2), where %start points to the first element of the sub-matrix. > > > > The function to compute column addresses has an extensive comment about how things work: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp#L92 > > > > It boils down to something like: the start address of column I in memory is computed as ` getelementptr %Start, I * Stride`. > Ah yes, thanks, I see now. I will add this, and we have at least one more condition, Stride >= Rows, to add to the verifier. ignore: > and we have at least one more condition, Stride >= Rows, to add to the verifier. %Stride is not an immediate. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Fri Jul 10 01:41:42 2020 From: llvm-commits at lists.llvm.org (Brooks Moses via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:41:42 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: <8dd66ff925c04b9c30a98b317d5f7db2@localhost.localdomain> brooksmoses added a comment. 
I'd erroneously made this comment on the revision (where I think nobody will see it) rather than here, so copying it here: A question about this: We got a couple of dozen MSAN warnings in the Google codebase from the revision that removed these -- I'm guessing that perhaps what happened was that the undef in question was a use-of-uninitialized-value, and this optimization was hiding the use so the MSAN checks didn't trigger. Is re-enabling this going to make those MSAN warnings go away again, re-hiding this undefined behavior? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 From llvm-commits at lists.llvm.org Fri Jul 10 01:43:28 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:43:28 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <352704403bb6bde5f61ee3ef88c9b2dc@localhost.localdomain> dmgreen added a comment. Thanks. I will take a look. From this, that bug appears to be fixed: https://godbolt.org/z/6W8hc8. Probably by D82987, I would expect. I will check. Opt 10.0.0 does show the problem, whereas trunk seems to be fixed. Let me know about the runtime failure. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 01:46:48 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:46:48 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: fhahn added inline comments. ================ Comment at: llvm/docs/LangRef.rst:15574 +matrix, using a stride of %Stride between columns.
For two consecutive columns +A and B, %Stride refers to the distance (the number of elements) between the +start of column A and the start of column B. The result matrix is linearized ---------------- SjoerdMeijer wrote: > SjoerdMeijer wrote: > > fhahn wrote: > > > SjoerdMeijer wrote: > > > > I am actually now also interested in defining `%Stride` better. Using our new definition: > > > > > > > > > For a `R x C` matrix, element `i` of column `j` is at index `j * R + i` in its vector, with indices starting at 0. > > > > > > > > From the description of %Stride it follows that: > > > > > > > > %Stride = ( (j+1) * R + 0) - (j * R + 0) > > > > => > > > > %Stride = R > > > > > > > > So double checking: we can simplify the description of %Stride just by saying it is equal to the number of rows, is that correct? > > > Stride can be > the number of rows. > > > > > > For example, if you want to load a 2x2 sub-matrix from a 4x4 matrix, you would use `llvm.matrix.column.major.load(%start, 4, false, 2, 2)`, where %start points to the first element of the sub-matrix. > > > > > > The function to compute column addresses has an extensive comment about how things work: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp#L92 > > > > > > It boils down to something like: the start address of column I in memory is computed as `getelementptr %Start, I * Stride`. > > Ah yes, thanks, I see now. I will add this, and we have at least one more condition, Stride >= Rows, to add to the verifier. > ignore: > > > and we have at least one more condition, Stride >= Rows, to add to the verifier. > > %Stride is not an immediate. Yes, the stride can be an arbitrary value. In some (probably most) cases it will be a ConstantInt, so it might be worth just checking for ConstantInt.
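To make the role of the stride concrete, here is a minimal C++ sketch of the addressing described in this thread (column `J` starts at `Start + J * Stride`, and the result is linearized with element `I` of column `J` at index `J * Rows + I`). The helper name `loadColumnMajor` and the use of `double` elements are assumptions for illustration only; this is not the code in LowerMatrixIntrinsics.cpp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): read a Rows x Cols matrix out of a
// column-major buffer in which consecutive columns are Stride elements apart.
std::vector<double> loadColumnMajor(const double *Start, std::size_t Stride,
                                    std::size_t Rows, std::size_t Cols) {
  std::vector<double> Result;
  Result.reserve(Rows * Cols);
  for (std::size_t J = 0; J < Cols; ++J) {
    // Start of column J, mirroring "getelementptr %Start, J * Stride".
    const double *Column = Start + J * Stride;
    for (std::size_t I = 0; I < Rows; ++I)
      Result.push_back(Column[I]); // lands at index J * Rows + I
  }
  return Result;
}
```

With a 4x4 column-major matrix holding 0..15, passing `Data + 5`, a stride of 4, and 2 rows by 2 columns yields 5, 6, 9, 10, i.e. the 2x2 sub-matrix starting at row 1, column 1. A stride smaller than the number of rows would make the columns overlap, which is the case the Stride >= Rows condition is meant to rule out.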
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Fri Jul 10 01:47:56 2020 From: llvm-commits at lists.llvm.org (Nuno Lopes via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:47:56 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: <0c1ad9421d0b4635f6e64194f649fcde@localhost.localdomain> nlopes added a comment. In D83440#2143450 , @brooksmoses wrote: > A question about this: We got a couple of dozen MSAN warnings in the Google codebase from the revision that removed these -- I'm guessing that perhaps what happened was that the undef in question was a use-of-uninitialized-value, and this optimization was hiding the use so the MSAN checks didn't trigger. Is re-enabling this going to make those MSAN warnings go away again, re-hiding this undefined behavior? Yes, it may hide some warnings. But that's life. Optimizations do hide programming errors. This is not the only one for sure. But it's ok as you are checking the code that will run. If the compiler can "fix" bugs automatically, then developers don't need to worry about those. 
As long as you keep running these checks continuously to track changes in the compiler like this one :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 From llvm-commits at lists.llvm.org Fri Jul 10 01:48:06 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via llvm-commits) Date: Fri, 10 Jul 2020 01:48:06 -0700 (PDT) Subject: [compiler-rt] c06417b - Fix check-all with -DLLVM_USE_SANITIZER=Address Message-ID: <5f082b46.1c69fb81.b160d.dd85@mx.google.com> Author: Vitaly Buka Date: 2020-07-10T01:47:51-07:00 New Revision: c06417b24dfbfd62650ab05aff78e188b5d00681 URL: https://github.com/llvm/llvm-project/commit/c06417b24dfbfd62650ab05aff78e188b5d00681 DIFF: https://github.com/llvm/llvm-project/commit/c06417b24dfbfd62650ab05aff78e188b5d00681.diff LOG: Fix check-all with -DLLVM_USE_SANITIZER=Address Added: Modified: clang/test/SemaTemplate/stack-exhaustion.cpp compiler-rt/cmake/config-ix.cmake llvm/test/tools/gold/lit.local.cfg Removed: ################################################################################ diff --git a/clang/test/SemaTemplate/stack-exhaustion.cpp b/clang/test/SemaTemplate/stack-exhaustion.cpp index 1eb1b474ffa5..c7bfea4132d5 100644 --- a/clang/test/SemaTemplate/stack-exhaustion.cpp +++ b/clang/test/SemaTemplate/stack-exhaustion.cpp @@ -8,6 +8,9 @@ // implementation limits, just disable the test. // UNSUPPORTED: system-netbsd +// asan has own stack-overflow check. +// UNSUPPORTED: asan + // expected-warning@* 0-1{{stack nearly exhausted}} // expected-note@* 0+{{}} diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake index 1f3697ff6f65..2edc1dabd90d 100644 --- a/compiler-rt/cmake/config-ix.cmake +++ b/compiler-rt/cmake/config-ix.cmake @@ -650,7 +650,7 @@ endif() # TODO: Add builtins support. 
-if (CRT_SUPPORTED_ARCH AND OS_NAME MATCHES "Linux") +if (CRT_SUPPORTED_ARCH AND OS_NAME MATCHES "Linux" AND NOT LLVM_USE_SANITIZER) set(COMPILER_RT_HAS_CRT TRUE) else() set(COMPILER_RT_HAS_CRT FALSE) diff --git a/llvm/test/tools/gold/lit.local.cfg b/llvm/test/tools/gold/lit.local.cfg index a704bb548b9d..6d2835ac8be6 100644 --- a/llvm/test/tools/gold/lit.local.cfg +++ b/llvm/test/tools/gold/lit.local.cfg @@ -1,2 +1,7 @@ -if (not 'ld_plugin' in config.available_features): +if (not 'ld_plugin' in config.available_features): config.unsupported = True + +# gold can't load instrumented plugin. +for san in ['asan', 'msan', 'ubsan']: + if (san in config.available_features): + config.unsupported = True From llvm-commits at lists.llvm.org Fri Jul 10 01:48:16 2020 From: llvm-commits at lists.llvm.org (Kuan Hsu Chen (Zakk) via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:48:16 +0000 (UTC) Subject: [PATCH] D71124: [RISCV] support clang driver to select cpu In-Reply-To: References: Message-ID: <3f1781a12d833fcb238b155b76a99817@localhost.localdomain> khchen updated this revision to Diff 276951. khchen added a comment. Address asb's comment. [RISCV][Clang] Support -mcpu option. Summary: 1. gcc uses the `-march` and `-mtune` flags to choose the arch and pipeline model, but clang does not have a `-mtune` flag, so we use `-mcpu` to choose both. 2. Add the SiFive e31 and u54 CPUs, which have a default march and pipeline model. 3. Specifying `-mcpu` with rocket-rv[32|64] selects the pipeline model only, and uses the driver's arch-choosing logic to get the default arch.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71124/new/ https://reviews.llvm.org/D71124 Files: clang/lib/Basic/Targets/RISCV.cpp clang/lib/Basic/Targets/RISCV.h clang/lib/Driver/ToolChains/Arch/RISCV.cpp clang/lib/Driver/ToolChains/Arch/RISCV.h clang/lib/Driver/ToolChains/CommonArgs.cpp clang/test/Driver/riscv-cpus.c llvm/include/llvm/Support/RISCVTargetParser.def llvm/include/llvm/Support/TargetParser.h llvm/lib/Support/TargetParser.cpp llvm/lib/Target/RISCV/RISCV.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D71124.276951.patch Type: text/x-patch Size: 17919 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 01:50:55 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:50:55 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: hans accepted this revision. hans added a comment. Still lgtm. For what it's worth, I think you could have just re-committed with the fixes rather than uploading for review again. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Fri Jul 10 01:53:03 2020 From: llvm-commits at lists.llvm.org (Brooks Moses via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:53:03 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: <28540f085d550c218e59f5c4d787c39c@localhost.localdomain> brooksmoses added a comment. Replying to my own question, as I was able to test this sooner than I expected: Yes, it looks like the new MSAN warnings remain after this revision. Excellent! I think that proves that this was a useful fix. 
:) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 From llvm-commits at lists.llvm.org Fri Jul 10 01:53:13 2020 From: llvm-commits at lists.llvm.org (David Stuttard via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:53:13 +0000 (UTC) Subject: [PATCH] D83540: [NFC] Change isFPPredicate comparison to ignore lower bound Message-ID: dstuttard created this revision. Herald added subscribers: llvm-commits, kerbowa, hiraditya, nhaehnle, jvesely, arsenm. Herald added a project: LLVM. Since changing the Predicate to be an unsigned enum, the lower bound check for isFPPredicate no longer needs to check the lower bound, since it will always evaluate to true. Also fixed a similar issue in SIISelLowering.cpp by removing the need for comparing to FIRST and LAST predicates Added an assert to the isFPPredicate comparison to flag if the FIRST_FCMP_PREDICATE is ever changed to anything other than 0, in which case the logic will break. Without this change warnings are generated in VS. 
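For readers unfamiliar with the warning, the following is a small self-contained sketch of the situation, assuming a cut-down stand-in for the predicate enum (the enum below is illustrative and much smaller than the real `CmpInst::Predicate`). With an unsigned underlying type, `P >= 0` is always true, so only the upper bound carries information. The sketch uses a `static_assert` where the patch uses a runtime `assert`; that is a variation for illustration, not what the patch does.

```cpp
// Simplified stand-in for the predicate enum; only a few enumerators are
// listed and the layout is an assumption made for this sketch.
enum Predicate : unsigned {
  FCMP_FALSE = 0,
  FCMP_TRUE = 15,
  FIRST_FCMP_PREDICATE = FCMP_FALSE,
  LAST_FCMP_PREDICATE = FCMP_TRUE,
  ICMP_EQ = 32,
  ICMP_SLE = 41,
  FIRST_ICMP_PREDICATE = ICMP_EQ,
  LAST_ICMP_PREDICATE = ICMP_SLE
};

// Writing "P >= FIRST_FCMP_PREDICATE" here would compare an unsigned value
// against 0, which is always true and is what the compiler warns about.
constexpr bool isFPPredicate(Predicate P) {
  static_assert(FIRST_FCMP_PREDICATE == 0,
                "FIRST_FCMP_PREDICATE is required to be 0");
  return P <= LAST_FCMP_PREDICATE;
}

// The integer range does not start at 0, so both bounds are still needed.
constexpr bool isIntPredicate(Predicate P) {
  return P >= FIRST_ICMP_PREDICATE && P <= LAST_ICMP_PREDICATE;
}
```

Callers holding a raw `unsigned` condition code can then cast and query the helper instead of repeating the range comparison, which is the shape of the SIISelLowering.cpp change.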
Change-Id: I358f0daf28c0628c7bda8ad4cab4e1757b761bab Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83540 Files: llvm/include/llvm/IR/InstrTypes.h llvm/lib/Target/AMDGPU/SIISelLowering.cpp Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -4567,8 +4567,7 @@ EVT VT = N->getValueType(0); const auto *CD = cast(N->getOperand(3)); unsigned CondCode = CD->getZExtValue(); - if (CondCode < ICmpInst::Predicate::FIRST_ICMP_PREDICATE || - CondCode > ICmpInst::Predicate::LAST_ICMP_PREDICATE) + if (!ICmpInst::isIntPredicate(static_cast(CondCode))) return DAG.getUNDEF(VT); ICmpInst::Predicate IcInput = static_cast(CondCode); @@ -4604,10 +4603,8 @@ const auto *CD = cast(N->getOperand(3)); unsigned CondCode = CD->getZExtValue(); - if (CondCode < FCmpInst::Predicate::FIRST_FCMP_PREDICATE || - CondCode > FCmpInst::Predicate::LAST_FCMP_PREDICATE) { + if (!FCmpInst::isFPPredicate(static_cast(CondCode))) return DAG.getUNDEF(VT); - } SDValue Src0 = N->getOperand(1); SDValue Src1 = N->getOperand(2); Index: llvm/include/llvm/IR/InstrTypes.h =================================================================== --- llvm/include/llvm/IR/InstrTypes.h +++ llvm/include/llvm/IR/InstrTypes.h @@ -805,7 +805,9 @@ void setPredicate(Predicate P) { setSubclassData(P); } static bool isFPPredicate(Predicate P) { - return P >= FIRST_FCMP_PREDICATE && P <= LAST_FCMP_PREDICATE; + assert(FIRST_FCMP_PREDICATE == 0 && + "FIRST_FCMP_PREDICATE is required to be 0"); + return P <= LAST_FCMP_PREDICATE; } static bool isIntPredicate(Predicate P) { -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83540.276953.patch Type: text/x-patch Size: 1695 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 01:55:19 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:55:19 +0000 (UTC) Subject: [PATCH] D66004: [WIP][X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support In-Reply-To: References: Message-ID: <567c6225a1a0a898faeaad197a61654d@localhost.localdomain> RKSimon updated this revision to Diff 276954. RKSimon added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D66004/new/ https://reviews.llvm.org/D66004 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avg.ll llvm/test/CodeGen/X86/avx-trunc.ll llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll llvm/test/CodeGen/X86/bitcast-and-setcc-128.ll llvm/test/CodeGen/X86/bitcast-setcc-128.ll llvm/test/CodeGen/X86/buildvec-extract.ll llvm/test/CodeGen/X86/buildvec-insertvec.ll llvm/test/CodeGen/X86/combine-fcopysign.ll llvm/test/CodeGen/X86/combine-shl.ll llvm/test/CodeGen/X86/haddsub-shuf.ll llvm/test/CodeGen/X86/inline-asm-x-i128.ll llvm/test/CodeGen/X86/insert-into-constant-vector.ll llvm/test/CodeGen/X86/insertelement-shuffle.ll llvm/test/CodeGen/X86/known-signbits-vector.ll llvm/test/CodeGen/X86/load-partial.ll llvm/test/CodeGen/X86/load-slice.ll llvm/test/CodeGen/X86/masked_expandload.ll llvm/test/CodeGen/X86/masked_load.ll llvm/test/CodeGen/X86/masked_store_trunc.ll llvm/test/CodeGen/X86/oddshuffles.ll llvm/test/CodeGen/X86/oddsubvector.ll llvm/test/CodeGen/X86/pmul.ll llvm/test/CodeGen/X86/pmulh.ll llvm/test/CodeGen/X86/pr29112.ll llvm/test/CodeGen/X86/pr44976.ll llvm/test/CodeGen/X86/pr46585.ll llvm/test/CodeGen/X86/promote-cmp.ll llvm/test/CodeGen/X86/psubus.ll llvm/test/CodeGen/X86/shrink_vmul.ll llvm/test/CodeGen/X86/shuffle-of-insert.ll llvm/test/CodeGen/X86/shuffle-strided-with-offset-128.ll 
llvm/test/CodeGen/X86/shuffle-vs-trunc-256.ll llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll llvm/test/CodeGen/X86/sse41.ll llvm/test/CodeGen/X86/test-shrink-bug.ll llvm/test/CodeGen/X86/trunc-subvector.ll llvm/test/CodeGen/X86/udiv_fix.ll llvm/test/CodeGen/X86/udiv_fix_sat.ll llvm/test/CodeGen/X86/urem-seteq-vec-nonsplat.ll llvm/test/CodeGen/X86/urem-seteq-vec-nonzero.ll llvm/test/CodeGen/X86/vec_insert-2.ll llvm/test/CodeGen/X86/vec_insert-3.ll llvm/test/CodeGen/X86/vec_insert-5.ll llvm/test/CodeGen/X86/vec_int_to_fp.ll llvm/test/CodeGen/X86/vec_set-6.ll llvm/test/CodeGen/X86/vector-idiv-udiv-256.ll llvm/test/CodeGen/X86/vector-pack-256.ll llvm/test/CodeGen/X86/vector-reduce-and-bool.ll llvm/test/CodeGen/X86/vector-reduce-mul.ll llvm/test/CodeGen/X86/vector-reduce-or-bool.ll llvm/test/CodeGen/X86/vector-reduce-xor-bool.ll llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll llvm/test/CodeGen/X86/vector-shuffle-128-v8.ll llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll llvm/test/CodeGen/X86/vector-shuffle-combining.ll llvm/test/CodeGen/X86/vector-shuffle-variable-128.ll llvm/test/CodeGen/X86/vector-trunc-math.ll llvm/test/CodeGen/X86/vector-trunc.ll llvm/test/CodeGen/X86/vector-zext.ll llvm/test/CodeGen/X86/vselect.ll llvm/test/CodeGen/X86/vshift-4.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D66004.276954.patch Type: text/x-patch Size: 278290 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 01:56:58 2020 From: llvm-commits at lists.llvm.org (David Stuttard via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:56:58 +0000 (UTC) Subject: [PATCH] D83540: [NFC] Change isFPPredicate comparison to ignore lower bound In-Reply-To: References: Message-ID: dstuttard added reviewers: gchatelet, nhaehnle, serge-sans-paille. dstuttard added a comment. D81662 started causing warnings for VS. This fixes those. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83540/new/ https://reviews.llvm.org/D83540 From llvm-commits at lists.llvm.org Fri Jul 10 01:57:55 2020 From: llvm-commits at lists.llvm.org (Petar Avramovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 08:57:55 +0000 (UTC) Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for inputs In-Reply-To: References: Message-ID: Petar.Avramovic updated this revision to Diff 276955. Petar.Avramovic added a comment. Move isValid check on top. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83384/new/ https://reviews.llvm.org/D83384 Files: llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83384.276955.patch Type: text/x-patch Size: 4361 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 02:04:41 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:04:41 +0000 (UTC) Subject: [PATCH] D83540: [NFC] Change isFPPredicate comparison to ignore lower bound In-Reply-To: References: Message-ID: gchatelet accepted this revision. gchatelet added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83540/new/ https://reviews.llvm.org/D83540 From llvm-commits at lists.llvm.org Fri Jul 10 02:05:49 2020 From: llvm-commits at lists.llvm.org (Brooks Moses via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:05:49 +0000 (UTC) Subject: [PATCH] D83440: [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison In-Reply-To: References: Message-ID: brooksmoses added a comment. In D83440#2143461 , @nlopes wrote: > Yes, it may hide some warnings. > But that's life. Optimizations do hide programming errors. This is not the only one for sure. But it's ok as you are checking the code that will run. If the compiler can "fix" bugs automatically, then developers don't need to worry about those. As long as you keep running these checks continuously to track changes in the compiler like this one :) Fair point, and thanks for the perspective! :) Even though I seem to have gotten "lucky" (or unlucky?) with this case, I'm sure that point will be relevant sooner or later. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83440/new/ https://reviews.llvm.org/D83440 From llvm-commits at lists.llvm.org Fri Jul 10 02:12:28 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:12:28 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: <5fa8107c43593d409c155cd4b033a888@localhost.localdomain> grimar updated this revision to Diff 276957. grimar marked 7 inline comments as done. grimar edited the summary of this revision. grimar added a comment. - Addressed review comments. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 Files: lld/ELF/InputFiles.cpp lld/test/ELF/invalid/reloc-section-reordered.test lld/test/ELF/reloc-sec-before-relocated.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D83469.276957.patch Type: text/x-patch Size: 3814 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 02:17:49 2020 From: llvm-commits at lists.llvm.org (Dinar Temirbulatov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:17:49 +0000 (UTC) Subject: [PATCH] D57779: [SLP] Add support for throttling. In-Reply-To: References: Message-ID: <30539c82de7661489e4abe9be290c80e@localhost.localdomain> dtemirbulatov marked 5 inline comments as done. dtemirbulatov added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4108-4111 + int i = 0; + for (auto It = Vec.begin(), E = Vec.end(); It != E; ++It, i++) + if (i>MaxCostsRecalculations) + Vec.erase(It); ---------------- ABataev wrote: > Just `Vec.erase(Vec.rbegin(), Vec.rbegin() + (Vec.size() - MaxCostsRecalculations)`? No, we could not use "Vec.rbegin() + " with std::set. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7259 + Throttling = true; + Cost = V.getTreeCost() + ReductionCost; + } ---------------- ABataev wrote: > Looks like you missed comparing `Cost` with `-SLPCostThreshold` here. You vectorized the tree after throttling unconditionally. Plus, the `Cost` is calculated here, but not used later except for the debug prints. We don't need to compare here; this is done inside findSubTree().
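On the first inline comment above: erasing from a `std::set` inside a loop that then increments the erased iterator is undefined behaviour, because `erase` invalidates that iterator. A minimal sketch of one way to trim the container, assuming `Vec` really is a `std::set` and that keeping only the first N elements is the intent (`trimToFirst` is a hypothetical helper, not something in the patch):

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical helper: keep at most MaxKeep elements (the smallest ones in
// the set's ordering) and erase the rest with a single range erase.
template <typename T>
void trimToFirst(std::set<T> &S, std::size_t MaxKeep) {
  if (S.size() <= MaxKeep)
    return;
  // std::set iterators are bidirectional, so "begin() + N" does not compile;
  // std::next walks N steps instead (linear in N).
  auto FirstToDrop = std::next(S.begin(), static_cast<std::ptrdiff_t>(MaxKeep));
  S.erase(FirstToDrop, S.end());
}
```

The loop quoted in the review appears to keep `MaxCostsRecalculations + 1` elements (indices 0 through `MaxCostsRecalculations`), so the count passed to such a helper would need adjusting to match the intended limit.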
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57779/new/ https://reviews.llvm.org/D57779 From llvm-commits at lists.llvm.org Fri Jul 10 02:18:48 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:18:48 +0000 (UTC) Subject: [PATCH] D77341: [DomTree] Replace ChildrenGetter with GraphTraits over GraphDiff. In-Reply-To: References: Message-ID: nikic added a comment. Numbers for the new patch: https://llvm-compile-time-tracker.com/compare.php?from=c0308fd154f9a945608bd42630dc81dce5edfb40&to=e6e3534e77961cfa65d36828de5c75f36a25d009&stat=instructions The regression is definitely smaller now, but still fairly large. E.g. > 2% on mafft. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77341/new/ https://reviews.llvm.org/D77341 From llvm-commits at lists.llvm.org Fri Jul 10 02:19:20 2020 From: llvm-commits at lists.llvm.org (Dinar Temirbulatov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:19:20 +0000 (UTC) Subject: [PATCH] D57779: [SLP] Add support for throttling. In-Reply-To: References: Message-ID: dtemirbulatov updated this revision to Diff 276958. dtemirbulatov marked an inline comment as done. dtemirbulatov added a comment. Addressed remarks, rebased. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57779/new/ https://reviews.llvm.org/D57779 Files: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll llvm/test/Transforms/SLPVectorizer/X86/powof2div.ll llvm/test/Transforms/SLPVectorizer/X86/slp-throttle.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D57779.276958.patch Type: text/x-patch Size: 65164 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 02:25:09 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Fri, 10 Jul 2020 02:25:09 -0700 (PDT) Subject: [llvm] 9a3e8b1 - extractConstantWithoutWrapping - use const APInt& returned by SCEVConstant::getAPInt() Message-ID: <5f0833f5.1c69fb81.985cd.e0b1@mx.google.com> Author: Simon Pilgrim Date: 2020-07-10T10:24:29+01:00 New Revision: 9a3e8b11a8317b1a3d7440b0585b011cc9527494 URL: https://github.com/llvm/llvm-project/commit/9a3e8b11a8317b1a3d7440b0585b011cc9527494 DIFF: https://github.com/llvm/llvm-project/commit/9a3e8b11a8317b1a3d7440b0585b011cc9527494.diff LOG: extractConstantWithoutWrapping - use const APInt& returned by SCEVConstant::getAPInt() Avoids unnecessary APInt copies and silences clang tidy warning. Added: Modified: llvm/lib/Analysis/ScalarEvolution.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 75926aa3a960..48c686b73260 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -1353,7 +1353,7 @@ bool ScalarEvolution::proveNoWrapByVaryingStart(const SCEV *Start, static APInt extractConstantWithoutWrapping(ScalarEvolution &SE, const SCEVConstant *ConstantTerm, const SCEVAddExpr *WholeAddExpr) { - const APInt C = ConstantTerm->getAPInt(); + const APInt &C = ConstantTerm->getAPInt(); const unsigned BitWidth = C.getBitWidth(); // Find number of trailing zeros of (x + y + ...) 
w/o the C first: uint32_t TZ = BitWidth; From llvm-commits at lists.llvm.org Fri Jul 10 02:25:32 2020 From: llvm-commits at lists.llvm.org (Sanne Wouda via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:25:32 +0000 (UTC) Subject: [PATCH] D83122: Fix crash when getVFABIMappings is called with an indirect call instruction In-Reply-To: References: Message-ID: sanwou01 requested review of this revision. sanwou01 marked 2 inline comments as done. sanwou01 added inline comments. ================ Comment at: llvm/unittests/Analysis/VectorFunctionABITest.cpp:12 #include "llvm/IR/InstIterator.h" +#include "llvm/IRReader/IRReader.h" #include "gtest/gtest.h" ---------------- fpetrogalli wrote: > Is this needed? For the definition of parseAssemblyString, yes. ================ Comment at: llvm/unittests/Analysis/VectorFunctionABITest.cpp:634 + LLVMContext C; + std::unique_ptr M = parseIR(C, R"IR( +define void @call(void () * %f) { ---------------- fpetrogalli wrote: > Very elegant, but is this `unique_ptr` needed? "borrowed" from other unit tests, so I can only take credit for finding it. I do think the `unique_ptr` is needed: parseAssemblyString passes ownership of the Module to us, so it'd be rude to drop it on the floor. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83122/new/ https://reviews.llvm.org/D83122 From llvm-commits at lists.llvm.org Fri Jul 10 02:32:52 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:32:52 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: djtodoro marked 3 inline comments as done. djtodoro added a comment. @aprantl @vsk Thanks a lot for the feedback! 
================ Comment at: llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll:4 + +; RUN: opt < %s -deadargelim -check-debugify -S 2>&1 \ +; RUN: | FileCheck %s -check-prefix=DEBUG ---------------- vsk wrote: > It doesn't look like the -check-debugify output is important, so it shouldn't be necessary to run the pass. > > Also, since the dbg.values are also not important, please run -debugify with -debugify-level=locations to omit those intrinsics. I see..thanks :) ================ Comment at: llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll:5 +; RUN: opt < %s -deadargelim -check-debugify -S 2>&1 \ +; RUN: | FileCheck %s -check-prefix=DEBUG + ---------------- aprantl wrote: > Why use a custom prefix if there is only one FileCheck invocation? Oh, sure.. ================ Comment at: llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll:7 + +; DEBUG: %oldret = extractvalue { i16, i32 } %B, 1, !dbg ![[RET1:.*]] +; DEBUG: %oldret = extractvalue { i32, i16 } %B, 0, !dbg ![[RET2:.*]] ---------------- vsk wrote: > Please restructure these checks so they have a clear correspondence to a test function. The typical way to write this is: > > ``` > ; CHECK-LABEL: some_test1 > ; CHECK: ... > define void @some_test1 > > ; CHECK-LABEL: some_test2 > ; CHECK: ... > define void @some_test2 > ``` > etc. I was a bit lazy, sure, thanks! 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 From llvm-commits at lists.llvm.org Fri Jul 10 02:37:11 2020 From: llvm-commits at lists.llvm.org (Mirko Brkusanin via llvm-commits) Date: Fri, 10 Jul 2020 02:37:11 -0700 (PDT) Subject: [llvm] cf40db2 - [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping Message-ID: <5f0836c7.1c69fb81.9f63.c870@mx.google.com> Author: Mirko Brkusanin Date: 2020-07-10T11:32:32+02:00 New Revision: cf40db21af489e3e5bc1c39cea855cc068c4ce48 URL: https://github.com/llvm/llvm-project/commit/cf40db21af489e3e5bc1c39cea855cc068c4ce48 DIFF: https://github.com/llvm/llvm-project/commit/cf40db21af489e3e5bc1c39cea855cc068c4ce48.diff LOG: [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping Add missing mappings and tablegen definitions for TBUFFER_STORE_FORMAT. Differential Revision: https://reviews.llvm.org/D83240 Added: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.f16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.i8.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll Modified: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/BUFInstructions.td Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index 0d7819bc144d..56bc0c44779d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -3839,6 +3839,8 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const { case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16: case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT: case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16: + case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT: + case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: case AMDGPU::G_AMDGPU_BUFFER_STORE: case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE: case 
AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT: diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 4bc9fd04b3de..fa42ddc54b56 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1799,12 +1799,14 @@ defm : MTBUF_StoreIntrinsicPat; + defm : MTBUF_StoreIntrinsicPat; defm : MTBUF_StoreIntrinsicPat; defm : MTBUF_StoreIntrinsicPat; } // End HasUnpackedD16VMem. let SubtargetPredicate = HasPackedD16VMem in { defm : MTBUF_StoreIntrinsicPat; + defm : MTBUF_StoreIntrinsicPat; defm : MTBUF_StoreIntrinsicPat; defm : MTBUF_StoreIntrinsicPat; } // End HasPackedD16VMem. diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.f16.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.f16.ll new file mode 100644 index 000000000000..987c839bf91c --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.f16.ll @@ -0,0 +1,492 @@ +; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -stop-after=instruction-select -verify-machineinstrs -o - %s | FileCheck -check-prefix=UNPACKED %s +; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx810 -stop-after=instruction-select -verify-machineinstrs -o - %s | FileCheck -check-prefix=PACKED %s + +define amdgpu_ps void @raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset(half %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; UNPACKED: 
[[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_v2f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset(<2 x half> %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_v2f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, 
$vgpr0, $vgpr1 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; UNPACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 16 + ; UNPACKED: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]] + ; UNPACKED: [[V_LSHRREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e64 [[COPY7]], [[COPY]], implicit $exec + ; UNPACKED: [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[V_LSHRREV_B32_e64_]], %subreg.sub1 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_XY_gfx80_OFFEN_exact [[REG_SEQUENCE1]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_v2f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_XY_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 
0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.v2f16(<2 x half> %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; FIXME: Crashes +; define amdgpu_ps void @raw_tbuffer_store_v3f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset(<3 x half> %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { +; call void @llvm.amdgcn.raw.tbuffer.store.v3f16(<3 x half> %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) +; ret void +; } + +define amdgpu_ps void @raw_tbuffer_store_v4f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset(<4 x half> %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_v4f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1, $vgpr2 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; UNPACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; UNPACKED: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; UNPACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; UNPACKED: [[COPY7:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY2]], %subreg.sub0, [[COPY3]], %subreg.sub1, [[COPY4]], %subreg.sub2, [[COPY5]], %subreg.sub3 + ; UNPACKED: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 16 + ; UNPACKED: [[COPY8:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]] + ; UNPACKED: [[V_LSHRREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e64 [[COPY8]], [[COPY]], implicit $exec + ; UNPACKED: [[COPY9:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]] + ; UNPACKED: [[V_LSHRREV_B32_e64_1:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e64 [[COPY9]], [[COPY1]], implicit $exec + ; UNPACKED: 
[[REG_SEQUENCE1:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[V_LSHRREV_B32_e64_]], %subreg.sub1, [[COPY1]], %subreg.sub2, [[V_LSHRREV_B32_e64_1]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_XYZW_gfx80_OFFEN_exact [[REG_SEQUENCE1]], [[COPY6]], [[REG_SEQUENCE]], [[COPY7]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 8 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_v4f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1, $vgpr2 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; PACKED: [[COPY7:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1 + ; PACKED: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY2]], %subreg.sub0, [[COPY3]], %subreg.sub1, [[COPY4]], %subreg.sub2, [[COPY5]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_XYZW_OFFEN_exact [[REG_SEQUENCE]], [[COPY6]], [[REG_SEQUENCE1]], [[COPY7]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 8 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.v4f16(<4 x half> %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; Waterfall for rsrc +define amdgpu_ps void @raw_tbuffer_store_f16__vgpr_rsrc__vgpr_voffset__sgpr_soffset(half %val, <4 x i32> %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__vgpr_rsrc__vgpr_voffset__sgpr_soffset + ; UNPACKED: bb.1 
(%ir-block.0): + ; UNPACKED: successors: %bb.2(0x80000000) + ; UNPACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; UNPACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; UNPACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; UNPACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; UNPACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; UNPACKED: bb.2: + ; UNPACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; UNPACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; UNPACKED: 
[[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; UNPACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_]], implicit-def $exec, implicit-def $scc, implicit $exec + ; UNPACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; UNPACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; UNPACKED: bb.3: + ; UNPACKED: successors: %bb.4(0x80000000) + ; UNPACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; UNPACKED: bb.4: + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__vgpr_rsrc__vgpr_voffset__sgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: successors: %bb.2(0x80000000) + ; PACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; PACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; PACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; PACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; PACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = 
S_MOV_B64_term $exec + ; PACKED: bb.2: + ; PACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; PACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; PACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; PACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_]], implicit-def $exec, implicit-def $scc, implicit $exec + ; PACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; PACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; PACKED: bb.3: + ; PACKED: successors: %bb.4(0x80000000) 
+ ; PACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; PACKED: bb.4: + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; Waterfall for rsrc and soffset +define amdgpu_ps void @raw_tbuffer_store_f16__vgpr_rsrc__vgpr_voffset__vgpr_soffset(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__vgpr_rsrc__vgpr_voffset__vgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: successors: %bb.2(0x80000000) + ; UNPACKED: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; UNPACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; UNPACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; UNPACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; UNPACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; UNPACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; UNPACKED: bb.2: + ; UNPACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; UNPACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; 
UNPACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; UNPACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; UNPACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; UNPACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; UNPACKED: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; UNPACKED: [[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; UNPACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; UNPACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; UNPACKED: bb.3: + ; UNPACKED: successors: %bb.4(0x80000000) + ; UNPACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; UNPACKED: bb.4: + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__vgpr_rsrc__vgpr_voffset__vgpr_soffset 
+ ; PACKED: bb.1 (%ir-block.0): + ; PACKED: successors: %bb.2(0x80000000) + ; PACKED: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; PACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; PACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; PACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; PACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; PACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; PACKED: bb.2: + ; PACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; PACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; PACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec 
= S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; PACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; PACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; PACKED: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; PACKED: [[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; PACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; PACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; PACKED: bb.3: + ; PACKED: successors: %bb.4(0x80000000) + ; PACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; PACKED: bb.4: + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; Waterfall for rsrc and soffset, copy for voffset +define amdgpu_ps void @raw_tbuffer_store_f16__vgpr_rsrc__sgpr_voffset__vgpr_soffset(half %val, <4 x i32> %rsrc, i32 inreg %voffset, i32 %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__vgpr_rsrc__sgpr_voffset__vgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: successors: %bb.2(0x80000000) + ; UNPACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: 
[[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; UNPACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; UNPACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; UNPACKED: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY5]] + ; UNPACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; UNPACKED: [[COPY9:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; UNPACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; UNPACKED: bb.2: + ; UNPACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; UNPACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY8]], implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY9]], implicit $exec + ; UNPACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; UNPACKED: 
[[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; UNPACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; UNPACKED: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; UNPACKED: [[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY7]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; UNPACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; UNPACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; UNPACKED: bb.3: + ; UNPACKED: successors: %bb.4(0x80000000) + ; UNPACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; UNPACKED: bb.4: + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__vgpr_rsrc__sgpr_voffset__vgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: successors: %bb.2(0x80000000) + ; PACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; PACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; PACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; PACKED: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], 
%subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY5]] + ; PACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; PACKED: [[COPY9:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; PACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; PACKED: bb.2: + ; PACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; PACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY8]], implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY9]], implicit $exec + ; PACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; PACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; PACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; PACKED: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; PACKED: 
[[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY7]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; PACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; PACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; PACKED: bb.3: + ; PACKED: successors: %bb.4(0x80000000) + ; PACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; PACKED: bb.4: + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_glc(half %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_glc + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; UNPACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 
into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_glc + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 1) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc(half %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; UNPACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, 
[[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 1, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 1, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 2) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc_glc(half %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc_glc + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY 
$sgpr4 + ; UNPACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 1, 1, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc_glc + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 1, 1, 0, 0, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 3) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_dlc(half %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_dlc + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: liveins: $sgpr2, 
$sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; UNPACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_D16_X_gfx80_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 0, 0, 1, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_f16__sgpr_rsrc__vgpr_voffset__sgpr_soffset_dlc + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_D16_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 0, 0, 1, 0, implicit $exec :: (dereferenceable store 2 into custom "TargetCustom7", align 1, addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f16(half %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 4) + ret void +} + +declare void @llvm.amdgcn.raw.tbuffer.store.f16(half, <4 x i32>, 
i32, i32, i32 immarg, i32 immarg)
+declare void @llvm.amdgcn.raw.tbuffer.store.v2f16(<2 x half>, <4 x i32>, i32, i32, i32 immarg, i32 immarg)
+declare void @llvm.amdgcn.raw.tbuffer.store.v3f16(<3 x half>, <4 x i32>, i32, i32, i32 immarg, i32 immarg)
+declare void @llvm.amdgcn.raw.tbuffer.store.v4f16(<4 x half>, <4 x i32>, i32, i32, i32 immarg, i32 immarg)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.i8.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.i8.ll
new file mode 100644
index 000000000000..e3a09d5dd367
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.i8.ll
@@ -0,0 +1,284 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -stop-after=instruction-select -verify-machineinstrs -o - %s | FileCheck -check-prefix=UNPACKED %s
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx810 -stop-after=instruction-select -verify-machineinstrs -o - %s | FileCheck -check-prefix=PACKED %s
+
+define amdgpu_ps void @raw_tbuffer_store_i8__sgpr_rsrc__vgpr_voffset__sgpr_soffset(i8 %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) {
+ ; UNPACKED-LABEL: name: raw_tbuffer_store_i8__sgpr_rsrc__vgpr_voffset__sgpr_soffset
+ ; UNPACKED: bb.1 (%ir-block.0):
+ ; UNPACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1
+ ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; UNPACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2
+ ; UNPACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3
+ ; UNPACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4
+ ; UNPACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5
+ ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6
+ ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3
+ ; UNPACKED:
TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_i8__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; PACKED: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; PACKED: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.i8(i8 %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; Waterfall for rsrc +define amdgpu_ps void @raw_tbuffer_store_i8__vgpr_rsrc__vgpr_voffset__sgpr_soffset(i8 %val, <4 x i32> %rsrc, i32 %voffset, i32 inreg %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_i8__vgpr_rsrc__vgpr_voffset__sgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: successors: %bb.2(0x80000000) + ; UNPACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; UNPACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; UNPACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; UNPACKED: 
[[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; UNPACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; UNPACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; UNPACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; UNPACKED: bb.2: + ; UNPACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; UNPACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; UNPACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; UNPACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; UNPACKED: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], 
[[REG_SEQUENCE3]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; UNPACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_]], implicit-def $exec, implicit-def $scc, implicit $exec + ; UNPACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; UNPACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; UNPACKED: bb.3: + ; UNPACKED: successors: %bb.4(0x80000000) + ; UNPACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; UNPACKED: bb.4: + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_i8__vgpr_rsrc__vgpr_voffset__sgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: successors: %bb.2(0x80000000) + ; PACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; PACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; PACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; PACKED: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; PACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; PACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; PACKED: bb.2: + ; PACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; PACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, 
[[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; PACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; PACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; PACKED: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; PACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_]], implicit-def $exec, implicit-def $scc, implicit $exec + ; PACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; PACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; PACKED: bb.3: + ; PACKED: successors: %bb.4(0x80000000) + ; PACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; PACKED: bb.4: + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.i8(i8 %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; Waterfall for rsrc and soffset +define amdgpu_ps void @raw_tbuffer_store_i8__vgpr_rsrc__vgpr_voffset__vgpr_soffset(i8 %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset) { + ; UNPACKED-LABEL: name: 
raw_tbuffer_store_i8__vgpr_rsrc__vgpr_voffset__vgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: successors: %bb.2(0x80000000) + ; UNPACKED: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; UNPACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; UNPACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; UNPACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; UNPACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr6 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; UNPACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; UNPACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; UNPACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; UNPACKED: bb.2: + ; UNPACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; UNPACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = 
V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; UNPACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; UNPACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; UNPACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; UNPACKED: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; UNPACKED: [[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; UNPACKED: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; UNPACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; UNPACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; UNPACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; UNPACKED: bb.3: + ; UNPACKED: successors: %bb.4(0x80000000) + ; UNPACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; UNPACKED: bb.4: + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_i8__vgpr_rsrc__vgpr_voffset__vgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: successors: %bb.2(0x80000000) + ; PACKED: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; PACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; PACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; PACKED: [[COPY5:%[0-9]+]]:vgpr_32 = COPY 
$vgpr5 + ; PACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr6 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; PACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; PACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; PACKED: bb.2: + ; PACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; PACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; PACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; PACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; PACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; PACKED: 
[[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; PACKED: [[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; PACKED: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; PACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; PACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; PACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; PACKED: bb.3: + ; PACKED: successors: %bb.4(0x80000000) + ; PACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; PACKED: bb.4: + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.i8(i8 %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; Waterfall for rsrc and soffset, copy for voffset +define amdgpu_ps void @raw_tbuffer_store_i8__vgpr_rsrc__sgpr_voffset__vgpr_soffset(i8 %val, <4 x i32> %rsrc, i32 inreg %voffset, i32 %soffset) { + ; UNPACKED-LABEL: name: raw_tbuffer_store_i8__vgpr_rsrc__sgpr_voffset__vgpr_soffset + ; UNPACKED: bb.1 (%ir-block.0): + ; UNPACKED: successors: %bb.2(0x80000000) + ; UNPACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; UNPACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; UNPACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; UNPACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; UNPACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; UNPACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; UNPACKED: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; UNPACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; UNPACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, 
[[COPY4]], %subreg.sub3 + ; UNPACKED: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY5]] + ; UNPACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; UNPACKED: [[COPY9:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; UNPACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; UNPACKED: bb.2: + ; UNPACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; UNPACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY8]], implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub0, implicit $exec + ; UNPACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub1, implicit $exec + ; UNPACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; UNPACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY9]], implicit $exec + ; UNPACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; UNPACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; UNPACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; UNPACKED: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; UNPACKED: 
[[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; UNPACKED: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY7]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; UNPACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; UNPACKED: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; UNPACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; UNPACKED: bb.3: + ; UNPACKED: successors: %bb.4(0x80000000) + ; UNPACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; UNPACKED: bb.4: + ; UNPACKED: S_ENDPGM 0 + ; PACKED-LABEL: name: raw_tbuffer_store_i8__vgpr_rsrc__sgpr_voffset__vgpr_soffset + ; PACKED: bb.1 (%ir-block.0): + ; PACKED: successors: %bb.2(0x80000000) + ; PACKED: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; PACKED: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; PACKED: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; PACKED: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; PACKED: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; PACKED: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; PACKED: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; PACKED: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; PACKED: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; PACKED: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY5]] + ; PACKED: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; PACKED: [[COPY9:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; PACKED: [[S_MOV_B64_term:%[0-9]+]]:sreg_64_xexec = S_MOV_B64_term $exec + ; PACKED: bb.2: + ; PACKED: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; PACKED: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, 
implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY8]], implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub0, implicit $exec + ; PACKED: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub1, implicit $exec + ; PACKED: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; PACKED: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY9]], implicit $exec + ; PACKED: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; PACKED: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; PACKED: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; PACKED: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; PACKED: [[S_AND_B64_1:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U32_e64_]], [[S_AND_B64_]], implicit-def $scc + ; PACKED: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY7]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 1 into custom "TargetCustom7", addrspace 4) + ; PACKED: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; PACKED: $exec = S_XOR_B64_term 
$exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc + ; PACKED: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; PACKED: bb.3: + ; PACKED: successors: %bb.4(0x80000000) + ; PACKED: $exec = S_MOV_B64_term [[S_MOV_B64_term]] + ; PACKED: bb.4: + ; PACKED: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.i8(i8 %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +declare void @llvm.amdgcn.raw.tbuffer.store.i8(i8, <4 x i32>, i32, i32, i32 immarg, i32 immarg) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll new file mode 100644 index 000000000000..24047146ffb7 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll @@ -0,0 +1,651 @@ +; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -stop-after=instruction-select -verify-machineinstrs -o - %s | FileCheck %s + +; Natural mapping +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], 
[[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; Natural mapping +define amdgpu_ps void @raw_tbuffer_store_v2f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset(<2 x float> %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_v2f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1, $vgpr2 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY7:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1 + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY2]], %subreg.sub0, [[COPY3]], %subreg.sub1, [[COPY4]], %subreg.sub2, [[COPY5]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_XY_OFFEN_exact [[REG_SEQUENCE]], [[COPY6]], [[REG_SEQUENCE1]], [[COPY7]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 8 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.v2f32(<2 x float> %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; Natural mapping +define amdgpu_ps void @raw_tbuffer_store_v3f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset(<3 x float> %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: 
raw_tbuffer_store_v3f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1, $vgpr2, $vgpr3 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY7:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; CHECK: [[COPY8:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_96 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2 + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY3]], %subreg.sub0, [[COPY4]], %subreg.sub1, [[COPY5]], %subreg.sub2, [[COPY6]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_XYZ_OFFEN_exact [[REG_SEQUENCE]], [[COPY7]], [[REG_SEQUENCE1]], [[COPY8]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 12 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.v3f32(<3 x float> %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; Natural mapping +define amdgpu_ps void @raw_tbuffer_store_v4f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset(<4 x float> %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_v4f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; 
CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY7:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY8:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; CHECK: [[COPY9:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3 + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY4]], %subreg.sub0, [[COPY5]], %subreg.sub1, [[COPY6]], %subreg.sub2, [[COPY7]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_XYZW_OFFEN_exact [[REG_SEQUENCE]], [[COPY8]], [[REG_SEQUENCE1]], [[COPY9]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 16 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.v4f32(<4 x float> %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; Copies for VGPR arguments +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__sgpr_voffset__sgpr_soffset(float %val, <4 x i32> inreg %rsrc, i32 inreg %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__sgpr_voffset__sgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $vgpr0 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr7 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY5]] + ; 
CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY7]], [[REG_SEQUENCE]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; Waterfall for rsrc +define amdgpu_ps void @raw_tbuffer_store_f32__vgpr_rsrc__vgpr_voffset__sgpr_soffset(float %val, <4 x i32> %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__vgpr_rsrc__vgpr_voffset__sgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: successors: %bb.2(0x80000000) + ; CHECK: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; CHECK: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; CHECK: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; CHECK: [[S_MOV_B32_term:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32_term $exec_lo + ; CHECK: bb.2: + ; CHECK: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; CHECK: 
[[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; CHECK: [[S_AND_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_B32 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; CHECK: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[COPY6]], 0, 94, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[S_AND_B32_]], implicit-def $exec, implicit-def $scc, implicit $exec + ; CHECK: $exec_lo = S_XOR_B32_term $exec_lo, [[S_AND_SAVEEXEC_B32_]], implicit-def $scc + ; CHECK: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; CHECK: bb.3: + ; CHECK: successors: %bb.4(0x80000000) + ; CHECK: $exec_lo = S_MOV_B32_term [[S_MOV_B32_term]] + ; CHECK: bb.4: + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 1) + ret void +} + +; Waterfall for rsrc and soffset +define amdgpu_ps void @raw_tbuffer_store_f32__vgpr_rsrc__vgpr_voffset__vgpr_soffset(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset) { + ; CHECK-LABEL: name: 
raw_tbuffer_store_f32__vgpr_rsrc__vgpr_voffset__vgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: successors: %bb.2(0x80000000) + ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; CHECK: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; CHECK: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; CHECK: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; CHECK: [[S_MOV_B32_term:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32_term $exec_lo + ; CHECK: bb.2: + ; CHECK: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 
[[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; CHECK: [[S_AND_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_B32 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; CHECK: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; CHECK: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; CHECK: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; CHECK: [[S_AND_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_B32 [[V_CMP_EQ_U32_e64_]], [[S_AND_B32_]], implicit-def $scc + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[S_AND_B32_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; CHECK: $exec_lo = S_XOR_B32_term $exec_lo, [[S_AND_SAVEEXEC_B32_]], implicit-def $scc + ; CHECK: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; CHECK: bb.3: + ; CHECK: successors: %bb.4(0x80000000) + ; CHECK: $exec_lo = S_MOV_B32_term [[S_MOV_B32_term]] + ; CHECK: bb.4: + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; Waterfall for rsrc and soffset, copy for voffset +define amdgpu_ps void @raw_tbuffer_store_f32__vgpr_rsrc__sgpr_voffset__vgpr_soffset(float %val, <4 x i32> %rsrc, i32 inreg %voffset, i32 %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__vgpr_rsrc__sgpr_voffset__vgpr_soffset + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: successors: %bb.2(0x80000000) + ; CHECK: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, 
$vgpr5 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; CHECK: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY6:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY5]] + ; CHECK: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; CHECK: [[COPY9:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; CHECK: [[S_MOV_B32_term:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32_term $exec_lo + ; CHECK: bb.2: + ; CHECK: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY8]], implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY9]], implicit $exec + ; CHECK: [[S_AND_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_B32 [[V_CMP_EQ_U64_e64_1]], 
[[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; CHECK: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; CHECK: [[V_READFIRSTLANE_B32_4:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec + ; CHECK: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U32_e64 [[V_READFIRSTLANE_B32_4]], [[COPY6]], implicit $exec + ; CHECK: [[S_AND_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_B32 [[V_CMP_EQ_U32_e64_]], [[S_AND_B32_]], implicit-def $scc + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY7]], [[REG_SEQUENCE3]], [[V_READFIRSTLANE_B32_4]], 0, 78, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[S_AND_B32_1]], implicit-def $exec, implicit-def $scc, implicit $exec + ; CHECK: $exec_lo = S_XOR_B32_term $exec_lo, [[S_AND_SAVEEXEC_B32_]], implicit-def $scc + ; CHECK: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; CHECK: bb.3: + ; CHECK: successors: %bb.4(0x80000000) + ; CHECK: $exec_lo = S_MOV_B32_term [[S_MOV_B32_term]] + ; CHECK: bb.4: + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 0) + ret void +} + +; Natural mapping + glc +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_glc(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_glc + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY 
$sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 1, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 1) + ret void +} + +; Natural mapping + slc +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 1, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 
%soffset, i32 78, i32 2) + ret void +} + +; Natural mapping + glc + slc +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc_glc(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_slc_glc + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 1, 1, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 3) + ret void +} + +; Natural mapping + dlc +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_dlc(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_dlc + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: 
[[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 0, 78, 0, 0, 0, 1, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 78, i32 4) + ret void +} + + + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vdpr_voffset__sgpr_soffset__voffset0(float %val, <4 x i32> inreg %rsrc, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vdpr_voffset__sgpr_soffset__voffset0 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFSET_exact [[COPY]], [[REG_SEQUENCE]], [[COPY5]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 0, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset4095(float %val, <4 x i32> inreg 
%rsrc, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset4095 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFSET_exact [[COPY]], [[REG_SEQUENCE]], [[COPY5]], 4095, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7" + 4095, align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 4095, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset4096(float %val, <4 x i32> inreg %rsrc, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset4096 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 4096 + ; 
CHECK: [[COPY6:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]] + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY6]], [[REG_SEQUENCE]], [[COPY5]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7" + 4096, align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 4096, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_voffset_add16(float %val, <4 x i32> inreg %rsrc, i32 %voffset.base, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_voffset_add16 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 16, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7" + 16, align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + %voffset = add i32 %voffset.base, 16 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset_add4095(float %val, <4 x i32> inreg %rsrc, i32 %voffset.base, i32 inreg %soffset) { + ; CHECK-LABEL: name: 
raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset_add4095 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[COPY6]], 4095, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7" + 4095, align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + %voffset = add i32 %voffset.base, 4095 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset_add4096(float %val, <4 x i32> inreg %rsrc, i32 %voffset.base, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset__voffset_add4096 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], 
%subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 4096 + ; CHECK: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]] + ; CHECK: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY5]], [[COPY7]], 0, implicit $exec + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[V_ADD_U32_e64_]], [[REG_SEQUENCE]], [[COPY6]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7" + 4096, align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + %voffset = add i32 %voffset.base, 4096 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset4095(float %val, <4 x i32> inreg %rsrc, i32 %voffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset4095 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 4095 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 4095, i32 94, i32 0) + ret void 
+} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset4096(float %val, <4 x i32> inreg %rsrc, i32 %voffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset4096 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 4096 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 4096, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add16(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset.base) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add16 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; 
CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 16 + ; CHECK: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 [[COPY6]], [[S_MOV_B32_]], implicit-def $scc + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[S_ADD_I32_]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + %soffset = add i32 %soffset.base, 16 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add4095(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset.base) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add4095 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 4095 + ; CHECK: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 [[COPY6]], [[S_MOV_B32_]], implicit-def $scc + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[S_ADD_I32_]], 0, 94, 0, 0, 0, 0, 0, implicit 
$exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + %soffset = add i32 %soffset.base, 4095 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add4096(float %val, <4 x i32> inreg %rsrc, i32 %voffset, i32 inreg %soffset.base) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add4096 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; CHECK: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; CHECK: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 4096 + ; CHECK: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 [[COPY6]], [[S_MOV_B32_]], implicit-def $scc + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE]], [[S_ADD_I32_]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: S_ENDPGM 0 + %soffset = add i32 %soffset.base, 4096 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; An add of the offset is necessary, with a waterfall loop. Make sure the add is done outside of the waterfall loop. 
+define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add5000(float %val, <4 x i32> %rsrc, i32 %voffset, i32 inreg %soffset.base) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_soffset_add5000 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: successors: %bb.2(0x80000000) + ; CHECK: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; CHECK: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 5000 + ; CHECK: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 [[COPY6]], [[S_MOV_B32_]], implicit-def $scc + ; CHECK: [[COPY7:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; CHECK: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; CHECK: [[S_MOV_B32_term:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32_term $exec_lo + ; CHECK: bb.2: + ; CHECK: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY7]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY7]], implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 
[[COPY8]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY8]], implicit $exec + ; CHECK: [[S_AND_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_B32 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; CHECK: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[COPY5]], [[REG_SEQUENCE3]], [[S_ADD_I32_]], 0, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7", align 1, addrspace 4) + ; CHECK: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[S_AND_B32_]], implicit-def $exec, implicit-def $scc, implicit $exec + ; CHECK: $exec_lo = S_XOR_B32_term $exec_lo, [[S_AND_SAVEEXEC_B32_]], implicit-def $scc + ; CHECK: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; CHECK: bb.3: + ; CHECK: successors: %bb.4(0x80000000) + ; CHECK: $exec_lo = S_MOV_B32_term [[S_MOV_B32_term]] + ; CHECK: bb.4: + ; CHECK: S_ENDPGM 0 + %soffset = add i32 %soffset.base, 5000 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +; An add of the offset is necessary, with a waterfall loop. Make sure the add is done outside of the waterfall loop. 
+define amdgpu_ps void @raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_voffset_add5000(float %val, <4 x i32> %rsrc, i32 %voffset.base, i32 inreg %soffset) { + ; CHECK-LABEL: name: raw_tbuffer_store_f32__sgpr_rsrc__vgpr_voffset__sgpr_soffset_voffset_add5000 + ; CHECK: bb.1 (%ir-block.0): + ; CHECK: successors: %bb.2(0x80000000) + ; CHECK: liveins: $sgpr2, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 + ; CHECK: $vcc_hi = IMPLICIT_DEF + ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; CHECK: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; CHECK: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2 + ; CHECK: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3 + ; CHECK: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4 + ; CHECK: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr5 + ; CHECK: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; CHECK: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3 + ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 4096 + ; CHECK: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]] + ; CHECK: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY5]], [[COPY7]], 0, implicit $exec + ; CHECK: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1 + ; CHECK: [[COPY9:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3 + ; CHECK: [[S_MOV_B32_term:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32_term $exec_lo + ; CHECK: bb.2: + ; CHECK: successors: %bb.3(0x40000000), %bb.2(0x40000000) + ; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY8]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], [[COPY8]], implicit $exec + ; CHECK: 
[[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub0, implicit $exec + ; CHECK: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[COPY9]].sub1, implicit $exec + ; CHECK: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[V_READFIRSTLANE_B32_2]], %subreg.sub0, [[V_READFIRSTLANE_B32_3]], %subreg.sub1 + ; CHECK: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE2]], [[COPY9]], implicit $exec + ; CHECK: [[S_AND_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_B32 [[V_CMP_EQ_U64_e64_1]], [[V_CMP_EQ_U64_e64_]], implicit-def $scc + ; CHECK: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3 + ; CHECK: TBUFFER_STORE_FORMAT_X_OFFEN_exact [[COPY]], [[V_ADD_U32_e64_]], [[REG_SEQUENCE3]], [[COPY6]], 904, 94, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into custom "TargetCustom7" + 5000, align 1, addrspace 4) + ; CHECK: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[S_AND_B32_]], implicit-def $exec, implicit-def $scc, implicit $exec + ; CHECK: $exec_lo = S_XOR_B32_term $exec_lo, [[S_AND_SAVEEXEC_B32_]], implicit-def $scc + ; CHECK: S_CBRANCH_EXECNZ %bb.2, implicit $exec + ; CHECK: bb.3: + ; CHECK: successors: %bb.4(0x80000000) + ; CHECK: $exec_lo = S_MOV_B32_term [[S_MOV_B32_term]] + ; CHECK: bb.4: + ; CHECK: S_ENDPGM 0 + %voffset = add i32 %voffset.base, 5000 + call void @llvm.amdgcn.raw.tbuffer.store.f32(float %val, <4 x i32> %rsrc, i32 %voffset, i32 %soffset, i32 94, i32 0) + ret void +} + +declare void @llvm.amdgcn.raw.tbuffer.store.f32(float, <4 x i32>, i32, i32, i32 immarg, i32 immarg) +declare void @llvm.amdgcn.raw.tbuffer.store.v2f32(<2 x float>, <4 x i32>, i32, i32, i32 immarg, i32 immarg) +declare void @llvm.amdgcn.raw.tbuffer.store.v3f32(<3 x float>, <4 x i32>, i32, i32, i32 immarg, 
i32 immarg) +declare void @llvm.amdgcn.raw.tbuffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32 immarg, i32 immarg) From llvm-commits at lists.llvm.org Fri Jul 10 02:37:18 2020 From: llvm-commits at lists.llvm.org (Mirko Brkusanin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:37:18 +0000 (UTC) Subject: [PATCH] D83240: [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping In-Reply-To: References: Message-ID: <1d0999cca1ab80e7096679651d48bb67@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGcf40db21af48: [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping (authored by mbrkusanin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83240/new/ https://reviews.llvm.org/D83240 Files: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/BUFInstructions.td llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.f16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.i8.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.store.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83240.276965.patch Type: text/x-patch Size: 115493 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 02:37:39 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Fri, 10 Jul 2020 02:37:39 -0700 (PDT) Subject: [llvm] 264ab1e - [LV] Pick vector loop body as insert point for SCEV expansion. Message-ID: <5f0836e3.1c69fb81.2a2cd.c736@mx.google.com> Author: Florian Hahn Date: 2020-07-10T10:37:12+01:00 New Revision: 264ab1e2c815728ede5d1fce257abbd04044cc27 URL: https://github.com/llvm/llvm-project/commit/264ab1e2c815728ede5d1fce257abbd04044cc27 DIFF: https://github.com/llvm/llvm-project/commit/264ab1e2c815728ede5d1fce257abbd04044cc27.diff LOG: [LV] Pick vector loop body as insert point for SCEV expansion. 
Currently the DomTree is not kept up to date for additional blocks generated in the vector loop, for example when vectorizing with predication. SCEVExpander relies on dominance checks when looking for existing instructions to re-use and in some cases that can lead to the expander picking instructions that do not actually dominate their insert point (e.g. as in PR46525). Unfortunately keeping the DT up-to-date is a bit tricky, because the CFG is only patched up after generating code for a block. For now, we can just use the vector loop header, as this ensures the inserted instructions dominate all uses in the vector loop. There should be no noticeable impact on the generated code, as other passes should sink those instructions, if profitable. Fixes PR46525. Reviewers: Ayal, gilr, mkazantsev, dmgreen Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D83288 Added: llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 4998082f3868..10e690d56ffd 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -2877,6 +2877,18 @@ Value *InnerLoopVectorizer::emitTransformedIndex( return B.CreateMul(X, Y); }; + // Get a suitable insert point for SCEV expansion. For blocks in the vector + // loop, choose the end of the vector loop header (=LoopVectorBody), because + // the DomTree is not kept up-to-date for additional blocks generated in the + // vector loop. By using the header as insertion point, we guarantee that the + // expanded instructions dominate all their uses. 
+ auto GetInsertPoint = [this, &B]() { + BasicBlock *InsertBB = B.GetInsertPoint()->getParent(); + if (InsertBB != LoopVectorBody && + LI->getLoopFor(LoopVectorBody) == LI->getLoopFor(InsertBB)) + return LoopVectorBody->getTerminator(); + return &*B.GetInsertPoint(); + }; switch (ID.getKind()) { case InductionDescriptor::IK_IntInduction: { assert(Index->getType() == StartValue->getType() && @@ -2884,7 +2896,7 @@ Value *InnerLoopVectorizer::emitTransformedIndex( if (ID.getConstIntStepValue() && ID.getConstIntStepValue()->isMinusOne()) return B.CreateSub(StartValue, Index); auto *Offset = CreateMul( - Index, Exp.expandCodeFor(Step, Index->getType(), &*B.GetInsertPoint())); + Index, Exp.expandCodeFor(Step, Index->getType(), GetInsertPoint())); return CreateAdd(StartValue, Offset); } case InductionDescriptor::IK_PtrInduction: { @@ -2892,8 +2904,8 @@ Value *InnerLoopVectorizer::emitTransformedIndex( "Expected constant step for pointer induction"); return B.CreateGEP( StartValue->getType()->getPointerElementType(), StartValue, - CreateMul(Index, Exp.expandCodeFor(Step, Index->getType(), - &*B.GetInsertPoint()))); + CreateMul(Index, + Exp.expandCodeFor(Step, Index->getType(), GetInsertPoint()))); } case InductionDescriptor::IK_FpInduction: { assert(Step->getType()->isFloatingPointTy() && "Expected FP Step value"); diff --git a/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll b/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll new file mode 100644 index 000000000000..98f394e2dc3d --- /dev/null +++ b/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll @@ -0,0 +1,114 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -loop-vectorize -force-vector-width=2 -S -prefer-predicate-over-epilog %s | FileCheck %s + + +; Test case for PR46525. There are two candidates to pick for +; `udiv i64 %y, %add` when expanding SCEV expressions. Make sure we pick %div, +; which dominates the vector loop. 
+ +define void @test(i16 %x, i64 %y, i32* %ptr) { +; CHECK-LABEL: @test( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[CONV19:%.*]] = sext i16 [[X:%.*]] to i64 +; CHECK-NEXT: [[ADD:%.*]] = add i64 [[CONV19]], 492802768830814067 +; CHECK-NEXT: br label [[LOOP_PREHEADER:%.*]] +; CHECK: loop.preheader: +; CHECK-NEXT: [[DIV:%.*]] = udiv i64 [[Y:%.*]], [[ADD]] +; CHECK-NEXT: [[INC:%.*]] = add i64 [[DIV]], 1 +; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i64 [[DIV]], 4 +; CHECK-NEXT: [[TMP1:%.*]] = udiv i64 [[TMP0]], [[INC]] +; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1 +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[TMP2]], 1 +; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2 +; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] +; CHECK-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], [[INC]] +; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP2]], 1 +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> undef, i64 [[TRIP_COUNT_MINUS_1]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> undef, <2 x i32> zeroinitializer +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE6:%.*]] ] +; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], [[INC]] +; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> undef, i64 [[OFFSET_IDX]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> undef, <2 x i32> zeroinitializer +; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> undef, i64 [[INC]], i32 0 +; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i64> [[DOTSPLATINSERT]], <2 x i64> undef, <2 x i32> zeroinitializer +; CHECK-NEXT: [[TMP3:%.*]] = mul <2 x i64> , [[DOTSPLAT]] +; CHECK-NEXT: 
[[INDUCTION:%.*]] = add <2 x i64> [[BROADCAST_SPLAT2]], [[TMP3]] +; CHECK-NEXT: [[TMP4:%.*]] = mul i64 0, [[INC]] +; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], [[TMP4]] +; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <2 x i64> undef, i64 [[INDEX]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT3]], <2 x i64> undef, <2 x i32> zeroinitializer +; CHECK-NEXT: [[VEC_IV:%.*]] = add <2 x i64> [[BROADCAST_SPLAT4]], +; CHECK-NEXT: [[TMP6:%.*]] = icmp ule <2 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]] +; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i1> [[TMP6]], i32 0 +; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]] +; CHECK: pred.store.if: +; CHECK-NEXT: store i32 0, i32* [[PTR:%.*]], align 4 +; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]] +; CHECK: pred.store.continue: +; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP6]], i32 1 +; CHECK-NEXT: br i1 [[TMP8]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6]] +; CHECK: pred.store.if5: +; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4 +; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]] +; CHECK: pred.store.continue6: +; CHECK-NEXT: [[OFFSET_IDX7:%.*]] = mul i64 [[INDEX]], [[INC]] +; CHECK-NEXT: [[TMP9:%.*]] = trunc i64 [[OFFSET_IDX7]] to i8 +; CHECK-NEXT: [[TMP10:%.*]] = trunc i64 [[INC]] to i8 +; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <2 x i8> undef, i8 [[TMP9]], i32 0 +; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <2 x i8> [[BROADCAST_SPLATINSERT8]], <2 x i8> undef, <2 x i32> zeroinitializer +; CHECK-NEXT: [[DOTSPLATINSERT10:%.*]] = insertelement <2 x i8> undef, i8 [[TMP10]], i32 0 +; CHECK-NEXT: [[DOTSPLAT11:%.*]] = shufflevector <2 x i8> [[DOTSPLATINSERT10]], <2 x i8> undef, <2 x i32> zeroinitializer +; CHECK-NEXT: [[TMP11:%.*]] = mul <2 x i8> , [[DOTSPLAT11]] +; CHECK-NEXT: [[INDUCTION12:%.*]] = add <2 x i8> [[BROADCAST_SPLAT9]], [[TMP11]] +; CHECK-NEXT: 
[[TMP12:%.*]] = mul i8 0, [[TMP10]] +; CHECK-NEXT: [[TMP13:%.*]] = add i8 [[TMP9]], [[TMP12]] +; CHECK-NEXT: [[TMP14:%.*]] = add i8 [[TMP13]], 1 +; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2 +; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0 +; +entry: + %conv19 = sext i16 %x to i64 + %add = add i64 %conv19, 492802768830814067 + br label %loop.preheader + +loop.preheader: + %div = udiv i64 %y, %add + %inc = add i64 %div, 1 + br label %loop + +loop: + %iv = phi i64 [ %iv.next, %loop ], [ 0, %loop.preheader ] + store i32 0, i32* %ptr, align 4 + %v2 = trunc i64 %iv to i8 + %v3 = add i8 %v2, 1 + %cmp15 = icmp slt i8 %v3, 5 + %iv.next = add i64 %iv, %inc + br i1 %cmp15, label %loop, label %loop.exit + +loop.exit: + %div.1 = udiv i64 %y, %add + %v1 = add i64 %div.1, 1 + br label %loop.2 + +loop.2: + %iv.1 = phi i64 [ %iv.next.1, %loop.2 ], [ 0, %loop.exit ] + %iv.next.1 = add i64 %iv.1, %v1 + call void @use(i64 %iv.next.1) + %ec = icmp ult i64 %iv.next.1, 200 + br i1 %ec, label %loop.2, label %loop.2.exit + +loop.2.exit: + %c = call i1 @cond() + br i1 %c, label %loop.preheader, label %exit + +exit: + ret void +} + +declare void @use(i64) +declare i1 @cond() From llvm-commits at lists.llvm.org Fri Jul 10 02:37:49 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:37:49 +0000 (UTC) Subject: [PATCH] D83288: [LV] Pick vector loop body as insert point for SCEV expansion. In-Reply-To: References: Message-ID: <9fe6fead8ce76978fae6ed23a86cb295@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG264ab1e2c815: [LV] Pick vector loop body as insert point for SCEV expansion. (authored by fhahn). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83288/new/ https://reviews.llvm.org/D83288 Files: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83288.276967.patch Type: text/x-patch Size: 8223 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 02:38:31 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:38:31 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: djtodoro updated this revision to Diff 276966. djtodoro added a comment. - Addressing comments - Reduce the test CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 Files: llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81939.276966.patch Type: text/x-patch Size: 6031 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 02:46:19 2020 From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:46:19 +0000 (UTC) Subject: [PATCH] D81939: [deadargelim] Attach dbg info to the insert/extractvalue instructions In-Reply-To: References: Message-ID: djtodoro updated this revision to Diff 276969. djtodoro added a comment. - [test] remove unused debugify metadata CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81939/new/ https://reviews.llvm.org/D81939 Files: llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp llvm/test/DebugInfo/X86/dbgloc-insert-extract-val-instrs.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81939.276969.patch Type: text/x-patch Size: 5973 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 02:47:22 2020 From: llvm-commits at lists.llvm.org (Orlando Cazalet-Hyams via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 09:47:22 +0000 (UTC) Subject: [PATCH] D83495: [DebugInfo] Add DWARF emission for DBG_VALUE_LIST In-Reply-To: References: Message-ID: Orlando added a comment. Hey, just noticed a couple of comments to remove from the tests. ================ Comment at: llvm/test/DebugInfo/X86/dbg_value_list_clobbers.mir:41 + bb.0.entry: + ; XXX NOTE that I've added an explicit DW_OP_stack_value to the expressions. REMOVE THESE when the + ; implied operator is added to the expressions automatically. ---------------- You can remove this XXX note. ================ Comment at: llvm/test/DebugInfo/X86/dbg_value_list_clobbers.mir:67 + DBG_VALUE_LIST !12, !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus, DW_OP_stack_value), $eax, $ecx, debug-location !15 + ; XXX THIS CHECK FAILS + ; CHECK-NEXT: [{{.*}}): DW_OP_breg0 RAX+0, DW_OP_breg2 RCX+0, DW_OP_plus, DW_OP_stack_value ---------------- I assume this no longer fails? ================ Comment at: llvm/test/DebugInfo/X86/dbg_value_list_emission.mir:63 + ; (3) Check that multiple references to one reg arg works. + ; XXX What was the consensus on this - are we allowing it? + DBG_VALUE_LIST !25, !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 0, DW_OP_minus, DW_OP_stack_value), $eax, debug-location !15 ---------------- Can remove this XXX note. As you mentioned offline, having multiple references to the same arg (i.e. multiple `DW_OP_LLVM_arg, 0` in the expr) is never a problem. Though, slightly tangentially, I'm still a little unclear on what the final decision was on how to handle duplicate register arg operands. In D82363 you said 'always treat DBG_VALUE_LISTs as potentially having them'. Please could you explain a little further? (i.e. 
is it an error state, do we need to add extra checks when dealing with DBG_VALUE_LISTs etc). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83495/new/ https://reviews.llvm.org/D83495 From llvm-commits at lists.llvm.org Fri Jul 10 03:00:49 2020 From: llvm-commits at lists.llvm.org (Mikael Holmén via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:00:49 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <19b6ba31ca2002d8435b23444c86885d@localhost.localdomain> uabelho added a comment. In D82998#2143456, @dmgreen wrote: > Thanks. I will take a look. > > From this, that bug appears to be fixed: https://godbolt.org/z/6W8hc8. Probably from D82987 I would expect. I will check. Opt 10.0.0 does show the problem, where as trunk seems to be fixed. > > Let me know about the runtime failure. Ok thanks! We'll dig further into the new error. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 03:03:07 2020 From: llvm-commits at lists.llvm.org (Anna Welker via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:03:07 +0000 (UTC) Subject: [PATCH] D81267: [LV] Enable the LoopVectorizer to create pointer inductions In-Reply-To: References: Message-ID: anwel updated this revision to Diff 276970. anwel marked 6 inline comments as done. anwel added a comment. Revisited the new test to make it cleaner. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81267/new/ https://reviews.llvm.org/D81267 Files: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll llvm/test/Transforms/LoopVectorize/pointer-induction.ll -------------- next part -------------- A non-text attachment was scrubbed...
Name: D81267.276970.patch Type: text/x-patch Size: 50504 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 03:03:26 2020 From: llvm-commits at lists.llvm.org (Anna Welker via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:03:26 +0000 (UTC) Subject: [PATCH] D81267: [LV] Enable the LoopVectorizer to create pointer inductions In-Reply-To: References: Message-ID: anwel added inline comments. ================ Comment at: llvm/test/Transforms/LoopVectorize/pointer-induction.ll:4 +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-unknown-linux-gnu" + ---------------- fhahn wrote: > dmgreen wrote: > > I think if you use x86 as a target (and needs it for the costing), the test needs to go into test/Transforms/LoopVectorize/X86 in case the target is not compiled in. > It looks like the options above actually force vectorization with a certain factor. In that case, it Is probably best to remove the triple. > > I'd also consider just checking the loop-vectorize output (without -dce -instcombine), if it is not too messy, as it makes the test more prone to break when something changes in instcombine. Also, it might be possible to only specifically check the IR related to the generated induction, rather than autogenerating the checks, which include a lot of relatively irrelevant stuff. I thought it did need the target information to behave in the right way, but apparently I was mistaken - so no relocation necessary, I removed the target. 
================ Comment at: llvm/test/Transforms/LoopVectorize/pointer-induction.ll:4 +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-unknown-linux-gnu" + ---------------- anwel wrote: > fhahn wrote: > > dmgreen wrote: > > > I think if you use x86 as a target (and needs it for the costing), the test needs to go into test/Transforms/LoopVectorize/X86 in case the target is not compiled in. > > It looks like the options above actually force vectorization with a certain factor. In that case, it Is probably best to remove the triple. > > > > I'd also consider just checking the loop-vectorize output (without -dce -instcombine), if it is not too messy, as it makes the test more prone to break when something changes in instcombine. Also, it might be possible to only specifically check the IR related to the generated induction, rather than autogenerating the checks, which include a lot of relatively irrelevant stuff. > I thought it did need the target information to behave in the right way, but apparently I was mistaken - so no relocation necessary, I removed the target. Thanks for the feedback. I don't have much experience writing opt tests, so your advice is very welcome. I have removed the triple and the metadata, after checking that we don't need them, and reduced the checks to `vector.ph`, `vector.body` and the loop latch that changes the induction variable. ================ Comment at: llvm/test/Transforms/LoopVectorize/pointer-induction.ll:7 +; Function Attrs: nofree norecurse nounwind +define void @a(i8* readnone %b) local_unnamed_addr #0 { +; CHECK-LABEL: @a( ---------------- dmgreen wrote: > Also some of this might be able to be cleaned up, like the local_unnamed_addr, the metadata and all/most(?) of the attributes. Should be a lot cleaner now.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81267/new/ https://reviews.llvm.org/D81267 From llvm-commits at lists.llvm.org Fri Jul 10 03:23:44 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:23:44 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <1918e1487ed5cb758b646cfb8742ca21@localhost.localdomain> hans added a comment. We're seeing a runtime error in Chromium too, tracked at https://bugs.chromium.org/p/chromium/issues/detail?id=1103818 I'll try to provide more details shortly. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 03:27:04 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:27:04 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <46b8b7c6ffd319533db6dd1218b9722c@localhost.localdomain> dmgreen added a comment. OK Thanks. I'll revert the patch and we can see what's up in the reproducers. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 03:31:49 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:31:49 +0000 (UTC) Subject: [PATCH] D83542: [SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock Message-ID: david-arm created this revision. david-arm added reviewers: sdesmalen, ctetreau, c-rhodes. Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett. Herald added a reviewer: efriedma. Herald added a project: LLVM. 
In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83542 Files: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AArch64/scalable-vector.ll Index: llvm/test/Transforms/SLPVectorizer/AArch64/scalable-vector.ll =================================================================== --- llvm/test/Transforms/SLPVectorizer/AArch64/scalable-vector.ll +++ llvm/test/Transforms/SLPVectorizer/AArch64/scalable-vector.ll @@ -1,5 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt < %s -slp-vectorizer -S | FileCheck %s +; RUN: opt < %s -slp-vectorizer -S 2>%t | FileCheck %s +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-gnu" @@ -21,5 +24,28 @@ ret void } +define @scalable_phi( %a, i32 %b) { +; CHECK-LABEL: @scalable_phi( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0 +; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[END:%.*]] +; CHECK: if.then: +; CHECK-NEXT: br label [[END]] +; CHECK: end: +; CHECK-NEXT: [[RETVAL:%.*]] = phi [ [[A:%.*]], [[ENTRY:%.*]] ], [ zeroinitializer, [[IF_THEN]] ] +; CHECK-NEXT: ret [[RETVAL]] +; +entry: + %cmp = icmp eq i32 %b, 0 + br i1 %cmp, label %if.then, label %end + +if.then: + br label %end + +end: + %retval = phi [ %a, %entry ], [ zeroinitializer, %if.then ] + ret %retval +} + declare @llvm.masked.load.nxv16i8.p0nxv16i8(*, i32 immarg, , ) declare void @llvm.masked.store.nxv16i8.p0nxv16i8(, *, i32 immarg, ) Index: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp =================================================================== --- 
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -7389,8 +7389,19 @@ // Look for the next elements with the same type. SmallVector::iterator SameTypeIt = IncIt; Type *EltTy = (*IncIt)->getType(); - unsigned EltSize = EltTy->isSized() ? DL->getTypeSizeInBits(EltTy) - : MaxVecRegSize; + unsigned EltSize; + + if (EltTy->isSized()) { + TypeSize EltTS = DL->getTypeSizeInBits(EltTy); + if (EltTS.isScalable()) { + // For now, just ignore vectorizing scalable types. + ++IncIt; + continue; + } + EltSize = EltTS.getFixedSize(); + } else + EltSize = MaxVecRegSize; + unsigned MaxNumElts = MaxVecRegSize / EltSize; if (MaxNumElts < 2) { ++IncIt; -------------- next part -------------- A non-text attachment was scrubbed... Name: D83542.276972.patch Type: text/x-patch Size: 2720 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 03:38:39 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:38:39 +0000 (UTC) Subject: [PATCH] D83395: [SVE] Code generation for fixed length vector truncates. In-Reply-To: References: Message-ID: <9a5eb08213abd3e7d60dfd3ab37292c0@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf78e6a3095ca: [SVE] Code generation for fixed length vector truncates. (authored by paulwalker-arm). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83395/new/ https://reviews.llvm.org/D83395 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83395.276974.patch Type: text/x-patch Size: 20207 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 03:38:40 2020 From: llvm-commits at lists.llvm.org (Paul Walker via llvm-commits) Date: Fri, 10 Jul 2020 03:38:40 -0700 (PDT) Subject: [llvm] f78e6a3 - [SVE] Code generation for fixed length vector truncates. Message-ID: <5f084530.1c69fb81.79187.d24e@mx.google.com> Author: Paul Walker Date: 2020-07-10T10:37:19Z New Revision: f78e6a3095ca82d7621def46b2531b922a56e8f9 URL: https://github.com/llvm/llvm-project/commit/f78e6a3095ca82d7621def46b2531b922a56e8f9 DIFF: https://github.com/llvm/llvm-project/commit/f78e6a3095ca82d7621def46b2531b922a56e8f9.diff LOG: [SVE] Code generation for fixed length vector truncates. Lower fixed length vector truncates to a sequence of SVE UZP1 instructions. Differential Revision: https://reviews.llvm.org/D83395 Added: llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll Modified: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h Removed: ################################################################################ diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index 1a3bbaf1832d..65ccc18ed601 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -961,6 +961,14 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM, for (MVT VT : MVT::fp_fixedlen_vector_valuetypes()) if (useSVEForFixedLengthVectorVT(VT)) addTypeForFixedLengthSVE(VT); + + // 64bit results can mean a bigger than NEON input. + for (auto VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32}) + setOperationAction(ISD::TRUNCATE, VT, Custom); + + // 128bit results imply a bigger than NEON input. 
+ for (auto VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32}) + setOperationAction(ISD::TRUNCATE, VT, Custom); } } @@ -1061,6 +1069,7 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) { setOperationAction(ISD::FADD, VT, Custom); setOperationAction(ISD::LOAD, VT, Custom); setOperationAction(ISD::STORE, VT, Custom); + setOperationAction(ISD::TRUNCATE, VT, Custom); } void AArch64TargetLowering::addDRTypeForNEON(MVT VT) { @@ -8843,6 +8852,9 @@ SDValue AArch64TargetLowering::LowerTRUNCATE(SDValue Op, if (!VT.isVector() || VT.isScalableVector()) return Op; + if (useSVEForFixedLengthVectorVT(Op.getOperand(0).getValueType())) + return LowerFixedLengthVectorTruncateToSVE(Op, DAG); + // Since we are looking for a right shift by a constant value of 1 and we are // operating on types at least 16 bits in length (sign/zero extended OpA and // OpB, which are at least 8 bits), it follows that the truncate will always @@ -15059,6 +15071,42 @@ SDValue AArch64TargetLowering::LowerFixedLengthVectorStoreToSVE( Store->isTruncatingStore()); } +SDValue AArch64TargetLowering::LowerFixedLengthVectorTruncateToSVE( + SDValue Op, SelectionDAG &DAG) const { + EVT VT = Op.getValueType(); + assert(VT.isFixedLengthVector() && "Expected fixed length vector type!"); + + SDLoc DL(Op); + SDValue Val = Op.getOperand(0); + EVT ContainerVT = getContainerForFixedLengthVector(DAG, Val.getValueType()); + Val = convertToScalableVector(DAG, ContainerVT, Val); + + // Repeatedly truncate Val until the result is of the desired element type. 
+ switch (ContainerVT.getSimpleVT().SimpleTy) { + default: + llvm_unreachable("unimplemented container type"); + case MVT::nxv2i64: + Val = DAG.getNode(ISD::BITCAST, DL, MVT::nxv4i32, Val); + Val = DAG.getNode(AArch64ISD::UZP1, DL, MVT::nxv4i32, Val, Val); + if (VT.getVectorElementType() == MVT::i32) + break; + LLVM_FALLTHROUGH; + case MVT::nxv4i32: + Val = DAG.getNode(ISD::BITCAST, DL, MVT::nxv8i16, Val); + Val = DAG.getNode(AArch64ISD::UZP1, DL, MVT::nxv8i16, Val, Val); + if (VT.getVectorElementType() == MVT::i16) + break; + LLVM_FALLTHROUGH; + case MVT::nxv8i16: + Val = DAG.getNode(ISD::BITCAST, DL, MVT::nxv16i8, Val); + Val = DAG.getNode(AArch64ISD::UZP1, DL, MVT::nxv16i8, Val, Val); + assert(VT.getVectorElementType() == MVT::i8 && "Unexpected element type!"); + break; + } + + return convertFromScalableVector(DAG, VT, Val); +} + SDValue AArch64TargetLowering::LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG, unsigned NewOp) const { diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h index 1be44797aac7..4fe77481706b 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -892,6 +892,8 @@ class AArch64TargetLowering : public TargetLowering { SDValue LowerFixedLengthVectorLoadToSVE(SDValue Op, SelectionDAG &DAG) const; SDValue LowerFixedLengthVectorStoreToSVE(SDValue Op, SelectionDAG &DAG) const; + SDValue LowerFixedLengthVectorTruncateToSVE(SDValue Op, + SelectionDAG &DAG) const; SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG, SmallVectorImpl &Created) const override; diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll new file mode 100644 index 000000000000..f62abc094606 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll @@ -0,0 +1,369 @@ +; RUN: llc -aarch64-sve-vector-bits-min=128 -asm-verbose=0 < %s | FileCheck %s -check-prefix=NO_SVE 
+; RUN: llc -aarch64-sve-vector-bits-min=256 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK +; RUN: llc -aarch64-sve-vector-bits-min=384 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK +; RUN: llc -aarch64-sve-vector-bits-min=512 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=640 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=768 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=896 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512 +; RUN: llc -aarch64-sve-vector-bits-min=1024 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1152 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1280 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1408 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1536 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1664 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1792 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=1920 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 +; RUN: llc -aarch64-sve-vector-bits-min=2048 -asm-verbose=0 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048 + +target triple = "aarch64-unknown-linux-gnu" + +; Don't use SVE when its registers are no bigger than NEON. 
+; NO_SVE-NOT: z{0-9} + +; +; truncate i16 -> i8 +; + +define <16 x i8> @trunc_v16i16_v16i8(<16 x i16>* %in) #0 { +; CHECK-LABEL: trunc_v16i16_v16i8: +; CHECK: ptrue [[PG:p[0-9]+]].h, vl16 +; CHECK-NEXT: ld1h { [[A_HALFS:z[0-9]+]].h }, [[PG]]/z, [x0] +; CHECK-NEXT: uzp1 z0.b, [[A_HALFS]].b, [[A_HALFS]].b +; CHECK-NEXT: ret + %a = load <16 x i16>, <16 x i16>* %in + %b = trunc <16 x i16> %a to <16 x i8> + ret <16 x i8> %b +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. +define void @trunc_v32i16_v32i8(<32 x i16>* %in, <32 x i8>* %out) #0 { +; CHECK-LABEL: trunc_v32i16_v32i8: +; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl32 +; VBITS_GE_512: ld1h { [[A_HALFS:z[0-9]+]].h }, [[PG]]/z, [x0] +; VBITS_GE_512: uzp1 [[A_BYTES:z[0-9]+]].b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_512: add [[A_BYTES]].b, [[PG]]/m, [[A_BYTES]].b, [[A_BYTES]].b + %a = load <32 x i16>, <32 x i16>* %in + %b = trunc <32 x i16> %a to <32 x i8> + %c = add <32 x i8> %b, %b + store <32 x i8> %c, <32 x i8>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. +define void @trunc_v64i16_v64i8(<64 x i16>* %in, <64 x i8>* %out) #0 { +; CHECK-LABEL: trunc_v64i16_v64i8: +; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl64 +; VBITS_GE_1024: ld1h { [[A_HALFS:z[0-9]+]].h }, [[PG]]/z, [x0] +; VBITS_GE_1024: uzp1 [[A_BYTES:z[0-9]+]].b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_1024: add [[A_BYTES]].b, [[PG]]/m, [[A_BYTES]].b, [[A_BYTES]].b + %a = load <64 x i16>, <64 x i16>* %in + %b = trunc <64 x i16> %a to <64 x i8> + %c = add <64 x i8> %b, %b + store <64 x i8> %c, <64 x i8>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. 
+define void @trunc_v128i16_v128i8(<128 x i16>* %in, <128 x i8>* %out) #0 { +; CHECK-LABEL: trunc_v128i16_v128i8: +; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].h, vl128 +; VBITS_GE_2048: ld1h { [[A_HALFS:z[0-9]+]].h }, [[PG]]/z, [x0] +; VBITS_GE_2048: uzp1 [[A_BYTES:z[0-9]+]].b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_2048: add [[A_BYTES]].b, [[PG]]/m, [[A_BYTES]].b, [[A_BYTES]].b + %a = load <128 x i16>, <128 x i16>* %in + %b = trunc <128 x i16> %a to <128 x i8> + %c = add <128 x i8> %b, %b + store <128 x i8> %c, <128 x i8>* %out + ret void +} + +; +; truncate i32 -> i8 +; + +define <8 x i8> @trunc_v8i32_v8i8(<8 x i32>* %in) #0 { +; CHECK-LABEL: trunc_v8i32_v8i8: +; CHECK: ptrue [[PG:p[0-9]+]].s, vl8 +; CHECK-NEXT: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; CHECK-NEXT: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; CHECK-NEXT: uzp1 z0.b, [[A_HALFS]].b, [[A_HALFS]].b +; CHECK-NEXT: ret + %a = load <8 x i32>, <8 x i32>* %in + %b = trunc <8 x i32> %a to <8 x i8> + ret <8 x i8> %b +} + +define <16 x i8> @trunc_v16i32_v16i8(<16 x i32>* %in) #0 { +; CHECK-LABEL: trunc_v16i32_v16i8: +; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16 +; VBITS_GE_512-NEXT: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; VBITS_GE_512-NEXT: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_512-NEXT: uzp1 z0.b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_512-NEXT: ret + %a = load <16 x i32>, <16 x i32>* %in + %b = trunc <16 x i32> %a to <16 x i8> + ret <16 x i8> %b +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. 
+define void @trunc_v32i32_v32i8(<32 x i32>* %in, <32 x i8>* %out) #0 { +; CHECK-LABEL: trunc_v32i32_v32i8: +; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32 +; VBITS_GE_1024: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; VBITS_GE_1024: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_1024: uzp1 [[A_BYTES:z[0-9]+]].b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_1024: add [[A_BYTES]].b, [[PG]]/m, [[A_BYTES]].b, [[A_BYTES]].b + %a = load <32 x i32>, <32 x i32>* %in + %b = trunc <32 x i32> %a to <32 x i8> + %c = add <32 x i8> %b, %b + store <32 x i8> %c, <32 x i8>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. +define void @trunc_v64i32_v64i8(<64 x i32>* %in, <64 x i8>* %out) #0 { +; CHECK-LABEL: trunc_v64i32_v64i8: +; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl64 +; VBITS_GE_2048: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; VBITS_GE_2048: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_2048: uzp1 [[A_BYTES:z[0-9]+]].b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_2048: add [[A_BYTES]].b, [[PG]]/m, [[A_BYTES]].b, [[A_BYTES]].b + %a = load <64 x i32>, <64 x i32>* %in + %b = trunc <64 x i32> %a to <64 x i8> + %c = add <64 x i8> %b, %b + store <64 x i8> %c, <64 x i8>* %out + ret void +} + +; +; truncate i32 -> i16 +; + +define <8 x i16> @trunc_v8i32_v8i16(<8 x i32>* %in) #0 { +; CHECK-LABEL: trunc_v8i32_v8i16: +; CHECK: ptrue [[PG:p[0-9]+]].s, vl8 +; CHECK-NEXT: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; CHECK-NEXT: uzp1 z0.h, [[A_WORDS]].h, [[A_WORDS]].h +; CHECK-NEXT: ret + %a = load <8 x i32>, <8 x i32>* %in + %b = trunc <8 x i32> %a to <8 x i16> + ret <8 x i16> %b +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. 
+define void @trunc_v16i32_v16i16(<16 x i32>* %in, <16 x i16>* %out) #0 { +; CHECK-LABEL: trunc_v16i32_v16i16: +; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16 +; VBITS_GE_512: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; VBITS_GE_512: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_512: add [[A_HALFS]].h, [[PG]]/m, [[A_HALFS]].h, [[A_HALFS]].h + %a = load <16 x i32>, <16 x i32>* %in + %b = trunc <16 x i32> %a to <16 x i16> + %c = add <16 x i16> %b, %b + store <16 x i16> %c, <16 x i16>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. +define void @trunc_v32i32_v32i16(<32 x i32>* %in, <32 x i16>* %out) #0 { +; CHECK-LABEL: trunc_v32i32_v32i16: +; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32 +; VBITS_GE_1024: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; VBITS_GE_1024: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_1024: add [[A_HALFS]].h, [[PG]]/m, [[A_HALFS]].h, [[A_HALFS]].h + %a = load <32 x i32>, <32 x i32>* %in + %b = trunc <32 x i32> %a to <32 x i16> + %c = add <32 x i16> %b, %b + store <32 x i16> %c, <32 x i16>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. +define void @trunc_v64i32_v64i16(<64 x i32>* %in, <64 x i16>* %out) #0 { +; CHECK-LABEL: trunc_v64i32_v64i16: +; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl64 +; VBITS_GE_2048: ld1w { [[A_WORDS:z[0-9]+]].s }, [[PG]]/z, [x0] +; VBITS_GE_2048: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_2048: add [[A_HALFS]].h, [[PG]]/m, [[A_HALFS]].h, [[A_HALFS]].h + %a = load <64 x i32>, <64 x i32>* %in + %b = trunc <64 x i32> %a to <64 x i16> + %c = add <64 x i16> %b, %b + store <64 x i16> %c, <64 x i16>* %out + ret void +} + +; +; truncate i64 -> i8 +; + +; NOTE: v4i8 is not legal so result i8 elements are held within i16 containers. 
+define <4 x i8> @trunc_v4i64_v4i8(<4 x i64>* %in) #0 { +; CHECK-LABEL: trunc_v4i64_v4i8: +; VBITS_GE_256: ptrue [[PG:p[0-9]+]].d, vl4 +; VBITS_GE_256-NEXT: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_256-NEXT: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_256-NEXT: uzp1 z0.h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_256-NEXT: ret + %a = load <4 x i64>, <4 x i64>* %in + %b = trunc <4 x i64> %a to <4 x i8> + ret <4 x i8> %b +} + +define <8 x i8> @trunc_v8i64_v8i8(<8 x i64>* %in) #0 { +; CHECK-LABEL: trunc_v8i64_v8i8: +; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8 +; VBITS_GE_512-NEXT: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_512-NEXT: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_512-NEXT: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_512-NEXT: uzp1 z0.b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_512-NEXT: ret + %a = load <8 x i64>, <8 x i64>* %in + %b = trunc <8 x i64> %a to <8 x i8> + ret <8 x i8> %b +} + +define <16 x i8> @trunc_v16i64_v16i8(<16 x i64>* %in) #0 { +; CHECK-LABEL: trunc_v16i64_v16i8: +; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16 +; VBITS_GE_1024-NEXT: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_1024-NEXT: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_1024-NEXT: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_1024-NEXT: uzp1 z0.b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_1024-NEXT: ret + %a = load <16 x i64>, <16 x i64>* %in + %b = trunc <16 x i64> %a to <16 x i8> + ret <16 x i8> %b +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. 
+define void @trunc_v32i64_v32i8(<32 x i64>* %in, <32 x i8>* %out) #0 { +; CHECK-LABEL: trunc_v32i64_v32i8: +; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl32 +; VBITS_GE_2048: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_2048: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_2048: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_2048: uzp1 [[A_BYTES:z[0-9]+]].b, [[A_HALFS]].b, [[A_HALFS]].b +; VBITS_GE_2048: add [[A_BYTES]].b, [[PG]]/m, [[A_BYTES]].b, [[A_BYTES]].b + %a = load <32 x i64>, <32 x i64>* %in + %b = trunc <32 x i64> %a to <32 x i8> + %c = add <32 x i8> %b, %b + store <32 x i8> %c, <32 x i8>* %out + ret void +} + +; +; truncate i64 -> i16 +; + +define <4 x i16> @trunc_v4i64_v4i16(<4 x i64>* %in) #0 { +; CHECK-LABEL: trunc_v4i64_v4i16: +; CHECK: ptrue [[PG:p[0-9]+]].d, vl4 +; CHECK-NEXT: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; CHECK-NEXT: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; CHECK-NEXT: uzp1 z0.h, [[A_WORDS]].h, [[A_WORDS]].h +; CHECK-NEXT: ret + %a = load <4 x i64>, <4 x i64>* %in + %b = trunc <4 x i64> %a to <4 x i16> + ret <4 x i16> %b +} + +define <8 x i16> @trunc_v8i64_v8i16(<8 x i64>* %in) #0 { +; CHECK-LABEL: trunc_v8i64_v8i16: +; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8 +; VBITS_GE_512-NEXT: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_512-NEXT: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_512-NEXT: uzp1 z0.h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_512-NEXT: ret + %a = load <8 x i64>, <8 x i64>* %in + %b = trunc <8 x i64> %a to <8 x i16> + ret <8 x i16> %b +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. 
+define void @trunc_v16i64_v16i16(<16 x i64>* %in, <16 x i16>* %out) #0 { +; CHECK-LABEL: trunc_v16i64_v16i16: +; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16 +; VBITS_GE_1024: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_1024: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_1024: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_1024: add [[A_HALFS]].h, [[PG]]/m, [[A_HALFS]].h, [[A_HALFS]].h + %a = load <16 x i64>, <16 x i64>* %in + %b = trunc <16 x i64> %a to <16 x i16> + %c = add <16 x i16> %b, %b + store <16 x i16> %c, <16 x i16>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. +define void @trunc_v32i64_v32i16(<32 x i64>* %in, <32 x i16>* %out) #0 { +; CHECK-LABEL: trunc_v32i64_v32i16: +; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl32 +; VBITS_GE_2048: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_2048: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_2048: uzp1 [[A_HALFS:z[0-9]+]].h, [[A_WORDS]].h, [[A_WORDS]].h +; VBITS_GE_2048: add [[A_HALFS]].h, [[PG]]/m, [[A_HALFS]].h, [[A_HALFS]].h + %a = load <32 x i64>, <32 x i64>* %in + %b = trunc <32 x i64> %a to <32 x i16> + %c = add <32 x i16> %b, %b + store <32 x i16> %c, <32 x i16>* %out + ret void +} + +; +; truncate i64 -> i32 +; + +define <4 x i32> @trunc_v4i64_v4i32(<4 x i64>* %in) #0 { +; CHECK-LABEL: trunc_v4i64_v4i32: +; CHECK: ptrue [[PG:p[0-9]+]].d, vl4 +; CHECK-NEXT: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; CHECK-NEXT: uzp1 z0.s, [[A_DWORDS]].s, [[A_DWORDS]].s +; CHECK-NEXT: ret + %a = load <4 x i64>, <4 x i64>* %in + %b = trunc <4 x i64> %a to <4 x i32> + ret <4 x i32> %b +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. 
+define void @trunc_v8i64_v8i32(<8 x i64>* %in, <8 x i32>* %out) #0 { +; CHECK-LABEL: trunc_v8i64_v8i32: +; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8 +; VBITS_GE_512: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_512: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_512: add [[A_WORDS]].s, [[PG]]/m, [[A_WORDS]].s, [[A_WORDS]].s + %a = load <8 x i64>, <8 x i64>* %in + %b = trunc <8 x i64> %a to <8 x i32> + %c = add <8 x i32> %b, %b + store <8 x i32> %c, <8 x i32>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. +define void @trunc_v16i64_v16i32(<16 x i64>* %in, <16 x i32>* %out) #0 { +; CHECK-LABEL: trunc_v16i64_v16i32: +; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16 +; VBITS_GE_1024: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_1024: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_1024: add [[A_WORDS]].s, [[PG]]/m, [[A_WORDS]].s, [[A_WORDS]].s + %a = load <16 x i64>, <16 x i64>* %in + %b = trunc <16 x i64> %a to <16 x i32> + %c = add <16 x i32> %b, %b + store <16 x i32> %c, <16 x i32>* %out + ret void +} + +; NOTE: Extra 'add' is to prevent the truncate being combined with the store. 
+define void @trunc_v32i64_v32i32(<32 x i64>* %in, <32 x i32>* %out) #0 { +; CHECK-LABEL: trunc_v32i64_v32i32: +; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl32 +; VBITS_GE_2048: ld1d { [[A_DWORDS:z[0-9]+]].d }, [[PG]]/z, [x0] +; VBITS_GE_2048: uzp1 [[A_WORDS:z[0-9]+]].s, [[A_DWORDS]].s, [[A_DWORDS]].s +; VBITS_GE_2048: add [[A_WORDS]].s, [[PG]]/m, [[A_WORDS]].s, [[A_WORDS]].s + %a = load <32 x i64>, <32 x i64>* %in + %b = trunc <32 x i64> %a to <32 x i32> + %c = add <32 x i32> %b, %b + store <32 x i32> %c, <32 x i32>* %out + ret void +} + +attributes #0 = { nounwind "target-features"="+sve" } From llvm-commits at lists.llvm.org Fri Jul 10 03:40:01 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:40:01 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: <9492e4df6cf71f5c758a67ae5562fe44@localhost.localdomain> SjoerdMeijer updated this revision to Diff 276973. SjoerdMeijer added a comment. As discussed: - removed "linearization" and replaced it with an explanation of how matrices are laid out in vectors. - Similarly, spent some words on how Stride is used/calculated. - added a check for Stride >= Rows. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 Files: llvm/docs/LangRef.rst llvm/lib/IR/Verifier.cpp llvm/test/Verifier/matrix-intrinsics.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83477.276973.patch Type: text/x-patch Size: 17567 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 03:43:18 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:43:18 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: SjoerdMeijer updated this revision to Diff 276975.
SjoerdMeijer added a comment. Removed unnecessary extra newline. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 Files: llvm/docs/LangRef.rst llvm/lib/IR/Verifier.cpp llvm/test/Verifier/matrix-intrinsics.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83477.276975.patch Type: text/x-patch Size: 17565 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 03:53:51 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:53:51 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <7e1d1159a9804062c7c7461208c4e8f1@localhost.localdomain> LukeGeeson updated this revision to Diff 276977. LukeGeeson marked 3 inline comments as done. LukeGeeson added a comment. - Addresses dmgreen's comments - reordered CPUs in the right places - added code/tests in all files that exist in the a77 patch made minor adjustments including: - adding FeatureRCPC to the missing CPUs Please let me know if there is anything else. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp -------------- next part 
-------------- A non-text attachment was scrubbed... Name: D83206.276977.patch Type: text/x-patch Size: 21205 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 03:57:07 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:57:07 +0000 (UTC) Subject: [PATCH] D83543: [CodeMoverUtils] Add more data dependency related test case Message-ID: RithikSharma created this revision. RithikSharma added reviewers: Whitney, bmahjour, etiotto. RithikSharma added a project: LLVM. Herald added a subscriber: llvm-commits. This patch adds more test cases focusing on data dependencies. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83543 Files: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp Index: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp =================================================================== --- llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp +++ llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp @@ -652,3 +652,76 @@ EXPECT_FALSE(isSafeToMoveBefore(*SubInst, *AddInst, DT, &PDT, &DI)); }); } + +TEST(CodeMoverUtils, IsSafeToMoveTest5) { + LLVMContext C; + + std::unique_ptr<Module> M = + parseIR(C, R"(define void @dependence(i32* noalias %A, i32* noalias %B){ +entry: + store i32 0, i32* %A, align 4 ; storeA0 + store i32 2, i32* %A, align 4 ; storeA1 + %tmp0 = load i32, i32* %A, align 4 ; loadA0 + store i32 1, i32* %B, align 4 ; storeB0 + %tmp1 = load i32, i32* %A, align 4 ; loadA1 + store i32 2, i32* %A, align 4 ; storeA2 + store i32 4, i32* %B, align 4 ; StoreB1 + %tmp2 = load i32, i32* %A, align 4 ; loadA2 + %tmp3 = load i32, i32* %A, align 4 ; loadA3 + %tmp4 = load i32, i32* %B, align 4 ; loadB2 + %tmp5 = load i32, i32* %B, align 4 ; loadB3 + ret void +})"); + + run(*M, "dependence", + [&](Function &F, DominatorTree &DT, PostDominatorTree &PDT, + DependenceInfo &DI) { + Instruction *LoadA0 = getInstructionByName(F, "tmp0"); + Instruction *StoreA1 = 
LoadA0->getPrevNode(); + Instruction *StoreA0 = StoreA1->getPrevNode(); + // Output forward dependency + EXPECT_FALSE(isSafeToMoveBefore(*StoreA0, *LoadA0, DT, &PDT, &DI)); + // Output backward dependency + EXPECT_FALSE(isSafeToMoveBefore(*StoreA1, *StoreA0, DT, &PDT, &DI)); + + Instruction *LoadA2 = getInstructionByName(F, "tmp2"); + Instruction *StoreB1 = LoadA2->getPrevNode(); + Instruction *StoreA2 = StoreB1->getNextNode(); + // No Output forward dependency + EXPECT_TRUE(isSafeToMoveBefore(*StoreA2, *LoadA2, DT, &PDT, &DI)); + // No Output backward dependency + EXPECT_TRUE(isSafeToMoveBefore(*StoreB1, *StoreA2, DT, &PDT, &DI)); + + Instruction *StoreB0 = LoadA0->getNextNode(); + // Flow backward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadA0, *StoreA1, DT, &PDT, &DI)); + // Anti forward dependency + EXPECT_FALSE(isSafeToMoveBefore(*StoreA1, *StoreB0, DT, &PDT, &DI)); + + Instruction *LoadA1 = getInstructionByName(F, "tmp1"); + // Anti backward dependency + EXPECT_FALSE(isSafeToMoveBefore(*StoreA2, *LoadA1, DT, &PDT, &DI)); + // Flow forward dependency + EXPECT_FALSE(isSafeToMoveBefore(*LoadA1, *StoreB1, DT, &PDT, &DI)); + // No anti forward dependency + EXPECT_TRUE(isSafeToMoveBefore(*StoreB0, *StoreA2, DT, &PDT, &DI)); + // No flow backward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadA1, *StoreB0, DT, &PDT, &DI)); + // No anti backward dependency + EXPECT_TRUE(isSafeToMoveBefore(*StoreB0, *LoadA0, DT, &PDT, &DI)); + // No flow forward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadA0, *LoadA1, DT, &PDT, &DI)); + + Instruction *LoadA3 = getInstructionByName(F, "tmp3"); + Instruction *LoadB2 = getInstructionByName(F, "tmp4"); + Instruction *LoadB3 = getInstructionByName(F, "tmp5"); + // Input forward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadA2, *LoadB2, DT, &PDT, &DI)); + // Input backward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadA3, *LoadA2, DT, &PDT, &DI)); + // No input backward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadB2, 
*LoadA3, DT, &PDT, &DI)); + // No input forward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadA3, *LoadB3, DT, &PDT, &DI)); + }); +} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83543.276979.patch Type: text/x-patch Size: 3739 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 03:57:50 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 10:57:50 +0000 (UTC) Subject: [PATCH] D81695: [flang] Turn off FLANG_ENABLE_WERROR by default In-Reply-To: References: Message-ID: DavidTruby added a comment. Can we move ahead with this change before the llvm 11 branch? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81695/new/ https://reviews.llvm.org/D81695 From llvm-commits at lists.llvm.org Fri Jul 10 04:03:08 2020 From: llvm-commits at lists.llvm.org (Stefan Pintilie via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:03:08 +0000 (UTC) Subject: [PATCH] D83255: [PowerPC] Split s34imm into two types In-Reply-To: References: Message-ID: <61b4848fd05794ec7d620480ed11a84c@localhost.localdomain> stefanp reopened this revision. stefanp added a comment. This revision is now accepted and ready to land. Going to reopen this review. The initial commit was pulled due to asserts being hit on a release+asserts build. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83255/new/ https://reviews.llvm.org/D83255 From llvm-commits at lists.llvm.org Fri Jul 10 04:12:26 2020 From: llvm-commits at lists.llvm.org (David Stuttard via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:12:26 +0000 (UTC) Subject: [PATCH] D83540: [NFC] Change isFPPredicate comparison to ignore lower bound In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. 
Closed by commit rG69a89b54c626: [NFC] Change isFPPredicate comparison to ignore lower bound (authored by dstuttard). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83540/new/ https://reviews.llvm.org/D83540 Files: llvm/include/llvm/IR/InstrTypes.h llvm/lib/Target/AMDGPU/SIISelLowering.cpp Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -4567,8 +4567,7 @@ EVT VT = N->getValueType(0); const auto *CD = cast<ConstantSDNode>(N->getOperand(3)); unsigned CondCode = CD->getZExtValue(); - if (CondCode < ICmpInst::Predicate::FIRST_ICMP_PREDICATE || - CondCode > ICmpInst::Predicate::LAST_ICMP_PREDICATE) + if (!ICmpInst::isIntPredicate(static_cast<ICmpInst::Predicate>(CondCode))) return DAG.getUNDEF(VT); ICmpInst::Predicate IcInput = static_cast<ICmpInst::Predicate>(CondCode); @@ -4604,10 +4603,8 @@ const auto *CD = cast<ConstantSDNode>(N->getOperand(3)); unsigned CondCode = CD->getZExtValue(); - if (CondCode < FCmpInst::Predicate::FIRST_FCMP_PREDICATE || - CondCode > FCmpInst::Predicate::LAST_FCMP_PREDICATE) { + if (!FCmpInst::isFPPredicate(static_cast<FCmpInst::Predicate>(CondCode))) return DAG.getUNDEF(VT); - } SDValue Src0 = N->getOperand(1); SDValue Src1 = N->getOperand(2); Index: llvm/include/llvm/IR/InstrTypes.h =================================================================== --- llvm/include/llvm/IR/InstrTypes.h +++ llvm/include/llvm/IR/InstrTypes.h @@ -805,7 +805,9 @@ void setPredicate(Predicate P) { setSubclassData(P); } static bool isFPPredicate(Predicate P) { - return P >= FIRST_FCMP_PREDICATE && P <= LAST_FCMP_PREDICATE; + assert(FIRST_FCMP_PREDICATE == 0 && + "FIRST_FCMP_PREDICATE is required to be 0"); + return P <= LAST_FCMP_PREDICATE; } static bool isIntPredicate(Predicate P) { -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83540.276981.patch Type: text/x-patch Size: 1695 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 04:12:25 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Fri, 10 Jul 2020 04:12:25 -0700 (PDT) Subject: [llvm] 69a89b5 - [NFC] Change isFPPredicate comparison to ignore lower bound Message-ID: <5f084d19.1c69fb81.aa776.e1b5@mx.google.com> Author: dstuttar Date: 2020-07-10T11:57:20+01:00 New Revision: 69a89b54c62696d45731b48c26686cc4f9d652c6 URL: https://github.com/llvm/llvm-project/commit/69a89b54c62696d45731b48c26686cc4f9d652c6 DIFF: https://github.com/llvm/llvm-project/commit/69a89b54c62696d45731b48c26686cc4f9d652c6.diff LOG: [NFC] Change isFPPredicate comparison to ignore lower bound Summary: Since changing the Predicate to be an unsigned enum, the lower bound check for isFPPredicate no longer needs to check the lower bound, since it will always evaluate to true. Also fixed a similar issue in SIISelLowering.cpp by removing the need for comparing to FIRST and LAST predicates Added an assert to the isFPPredicate comparison to flag if the FIRST_FCMP_PREDICATE is ever changed to anything other than 0, in which case the logic will break. Without this change warnings are generated in VS. 
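The reasoning in the summary above can be sketched in a small, self-contained form. This is not the LLVM source: the `Predicate` enum below is a cut-down, hypothetical stand-in for `CmpInst::Predicate`, its enumerator values are assumptions chosen for illustration, and `isValidFCmpCode` is a made-up helper that only mimics the shape of the SIISelLowering.cpp call sites. It shows why the lower-bound test is redundant for an unsigned enum whose first floating-point predicate is 0, while the integer range still needs both bounds.

```cpp
#include <cassert>

// Hypothetical, cut-down stand-in for CmpInst::Predicate (values assumed).
enum Predicate : unsigned {
  FCMP_FALSE = 0,
  FCMP_OEQ = 1,
  FCMP_TRUE = 15,
  FIRST_FCMP_PREDICATE = FCMP_FALSE,
  LAST_FCMP_PREDICATE = FCMP_TRUE,
  ICMP_EQ = 32,
  ICMP_SLE = 41,
  FIRST_ICMP_PREDICATE = ICMP_EQ,
  LAST_ICMP_PREDICATE = ICMP_SLE
};

// With an unsigned underlying type and FIRST_FCMP_PREDICATE == 0, the test
// "P >= FIRST_FCMP_PREDICATE" is always true, so only the upper bound is
// checked. The assert flags any future change of the first value.
inline bool isFPPredicate(Predicate P) {
  assert(FIRST_FCMP_PREDICATE == 0 &&
         "FIRST_FCMP_PREDICATE is required to be 0");
  return P <= LAST_FCMP_PREDICATE;
}

// The integer range does not start at 0, so both bounds still matter.
inline bool isIntPredicate(Predicate P) {
  return P >= FIRST_ICMP_PREDICATE && P <= LAST_ICMP_PREDICATE;
}

// Hypothetical helper: validate a raw condition code by casting it once and
// asking the predicate helper, instead of comparing against FIRST/LAST.
inline bool isValidFCmpCode(unsigned CondCode) {
  return isFPPredicate(static_cast<Predicate>(CondCode));
}
```

Routing the raw code through one helper keeps the range knowledge in a single place, which appears to be the motivation for the SIISelLowering.cpp part of the change.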
Change-Id: I358f0daf28c0628c7bda8ad4cab4e1757b761bab Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83540 Added: Modified: llvm/include/llvm/IR/InstrTypes.h llvm/lib/Target/AMDGPU/SIISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h index 8408c8772b22..07af00ec9240 100644 --- a/llvm/include/llvm/IR/InstrTypes.h +++ b/llvm/include/llvm/IR/InstrTypes.h @@ -805,7 +805,9 @@ class CmpInst : public Instruction { void setPredicate(Predicate P) { setSubclassData(P); } static bool isFPPredicate(Predicate P) { - return P >= FIRST_FCMP_PREDICATE && P <= LAST_FCMP_PREDICATE; + assert(FIRST_FCMP_PREDICATE == 0 && + "FIRST_FCMP_PREDICATE is required to be 0"); + return P <= LAST_FCMP_PREDICATE; } static bool isIntPredicate(Predicate P) { diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 79204180540f..d035aa8f72bd 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -4567,8 +4567,7 @@ static SDValue lowerICMPIntrinsic(const SITargetLowering &TLI, EVT VT = N->getValueType(0); const auto *CD = cast<ConstantSDNode>(N->getOperand(3)); unsigned CondCode = CD->getZExtValue(); - if (CondCode < ICmpInst::Predicate::FIRST_ICMP_PREDICATE || - CondCode > ICmpInst::Predicate::LAST_ICMP_PREDICATE) + if (!ICmpInst::isIntPredicate(static_cast<ICmpInst::Predicate>(CondCode))) return DAG.getUNDEF(VT); ICmpInst::Predicate IcInput = static_cast<ICmpInst::Predicate>(CondCode); @@ -4604,10 +4603,8 @@ static SDValue lowerFCMPIntrinsic(const SITargetLowering &TLI, const auto *CD = cast<ConstantSDNode>(N->getOperand(3)); unsigned CondCode = CD->getZExtValue(); - if (CondCode < FCmpInst::Predicate::FIRST_FCMP_PREDICATE || - CondCode > FCmpInst::Predicate::LAST_FCMP_PREDICATE) { + if (!FCmpInst::isFPPredicate(static_cast<FCmpInst::Predicate>(CondCode))) return 
DAG.getUNDEF(VT); - } SDValue Src0 = N->getOperand(1); SDValue Src1 = N->getOperand(2); From llvm-commits at lists.llvm.org Fri Jul 10 04:13:54 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Fri, 10 Jul 2020 04:13:54 -0700 (PDT) Subject: [llvm] 4cc26a4 - [X86][SSE] Use shouldUseHorizontalOp helper to determine whether to use (F)HADD. NFCI. Message-ID: <5f084d72.1c69fb81.ba785.e6b3@mx.google.com> Author: Simon Pilgrim Date: 2020-07-10T12:13:34+01:00 New Revision: 4cc26a44ca8b29abf9e73a1048e8a36ac87b1fa1 URL: https://github.com/llvm/llvm-project/commit/4cc26a44ca8b29abf9e73a1048e8a36ac87b1fa1 DIFF: https://github.com/llvm/llvm-project/commit/4cc26a44ca8b29abf9e73a1048e8a36ac87b1fa1.diff LOG: [X86][SSE] Use shouldUseHorizontalOp helper to determine whether to use (F)HADD. NFCI. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 4d3b0eda58f2..695b6ef35f11 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -39301,8 +39301,7 @@ static SDValue combineReductionToHorizontal(SDNode *ExtElt, SelectionDAG &DAG, } // Only use (F)HADD opcodes if they aren't microcoded or minimizes codesize. - bool OptForSize = DAG.shouldOptForSize(); - if (!Subtarget.hasFastHorizontalOps() && !OptForSize) + if (!shouldUseHorizontalOp(true, DAG, Subtarget)) return SDValue(); unsigned HorizOpcode = Opc == ISD::ADD ? X86ISD::HADD : X86ISD::FHADD; From llvm-commits at lists.llvm.org Fri Jul 10 04:13:56 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Fri, 10 Jul 2020 04:13:56 -0700 (PDT) Subject: [llvm] 9ce9831 - StackSafetyAnalysis.cpp - pass ConstantRange arg as const reference. 
Message-ID: <5f084d74.1c69fb81.18106.e663@mx.google.com> Author: Simon Pilgrim Date: 2020-07-10T12:13:34+01:00 New Revision: 9ce98312896c5c67adb3a137506758d2cac8bb37 URL: https://github.com/llvm/llvm-project/commit/9ce98312896c5c67adb3a137506758d2cac8bb37 DIFF: https://github.com/llvm/llvm-project/commit/9ce98312896c5c67adb3a137506758d2cac8bb37.diff LOG: StackSafetyAnalysis.cpp - pass ConstantRange arg as const reference. Avoids unnecessary copies and silences clang tidy warning - we do this in most places, there are just a few that were missed. Added: Modified: llvm/lib/Analysis/StackSafetyAnalysis.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/StackSafetyAnalysis.cpp b/llvm/lib/Analysis/StackSafetyAnalysis.cpp index c737cf013608..bbfc303aefac 100644 --- a/llvm/lib/Analysis/StackSafetyAnalysis.cpp +++ b/llvm/lib/Analysis/StackSafetyAnalysis.cpp @@ -59,7 +59,7 @@ template <typename CalleeTy> struct CallInfo { // Range should never set to empty-set, that is an invalid access range // that can cause empty-set to be propagated with ConstantRange::add ConstantRange Offset; - CallInfo(const CalleeTy *Callee, size_t ParamNo, ConstantRange Offset) + CallInfo(const CalleeTy *Callee, size_t ParamNo, const ConstantRange &Offset) : Callee(Callee), ParamNo(ParamNo), Offset(Offset) {} }; @@ -202,7 +202,7 @@ class StackSafetyLocalAnalysis { ConstantRange offsetFrom(Value *Addr, Value *Base); ConstantRange getAccessRange(Value *Addr, Value *Base, - ConstantRange SizeRange); + const ConstantRange &SizeRange); ConstantRange getAccessRange(Value *Addr, Value *Base, TypeSize Size); ConstantRange getMemIntrinsicAccessRange(const MemIntrinsic *MI, const Use &U, Value *Base); @@ -237,7 +237,7 @@ ConstantRange StackSafetyLocalAnalysis::offsetFrom(Value *Addr, Value *Base) { ConstantRange StackSafetyLocalAnalysis::getAccessRange(Value *Addr, Value *Base, - ConstantRange SizeRange) { + const ConstantRange &SizeRange) { // Zero-size 
loads and stores do not access memory. if (SizeRange.isEmptySet()) return ConstantRange::getEmpty(PointerSize); From llvm-commits at lists.llvm.org Fri Jul 10 04:17:10 2020 From: llvm-commits at lists.llvm.org (Alok Kumar Sharma via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:17:10 +0000 (UTC) Subject: [PATCH] D83544: [DebugInfo] Support for DW_AT_associated and DW_AT_allocated. Message-ID: alok created this revision. alok added reviewers: aprantl, probinson, jmorse, dblaikie, vsk, jini.susan.george, SouraVX. alok added a project: debug-info. Herald added subscribers: llvm-commits, ormris, hiraditya. Herald added a reviewer: sscalpone. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. This support is needed for the Fortran array variables with pointer/allocatable attribute. This support enables debugger to identify the status of variable whether that is currently allocated/associated. for pointer array (before allocation/association) without DW_AT_associated ------------------------ (gdb) pt ptr type = integer (140737345375288:140737354129776) (gdb) p ptr value requires 35017956 bytes, which is more than max-value-size ---------------------------------------------------------------- with DW_AT_associated --------------------- (gdb) pt ptr type = integer (:) (gdb) p ptr $1 = --------------------- for allocatable array (before allocation) without DW_AT_allocated ----------------------- (gdb) pt arr type = integer (140737345375288:140737354129776) (gdb) p arr value requires 35017956 bytes, which is more than max-value-size ---------------------------------------------------------------- with DW_AT_allocated -------------------- (gdb) pt arr type = integer, allocatable (:) (gdb) p arr $1 = -------------------- Testing - unit test cases added - check-llvm - check-debuginfo Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83544 Files: llvm/docs/LangRef.rst llvm/include/llvm/IR/DebugInfoMetadata.h llvm/lib/AsmParser/LLParser.cpp 
llvm/lib/Bitcode/Reader/MetadataLoader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp llvm/lib/IR/AsmWriter.cpp llvm/lib/IR/DebugInfoMetadata.cpp llvm/lib/IR/LLVMContextImpl.h llvm/lib/IR/Verifier.cpp llvm/test/Bitcode/allocated.ll llvm/test/Bitcode/associated.ll llvm/test/DebugInfo/X86/dwarfdump-allocatedExp.ll llvm/test/DebugInfo/X86/dwarfdump-allocatedVar.ll llvm/test/DebugInfo/X86/dwarfdump-associatedExp.ll llvm/test/DebugInfo/X86/dwarfdump-associatedVar.ll llvm/test/Verifier/array_allocated.ll llvm/test/Verifier/array_associated.ll llvm/unittests/IR/DebugTypeODRUniquingTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83544.276980.patch Type: text/x-patch Size: 45752 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 04:18:10 2020 From: llvm-commits at lists.llvm.org (Diana Picus via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:18:10 +0000 (UTC) Subject: [PATCH] D83361: [LLVM] Add libatomic load/store functions to TargetLibraryInfo In-Reply-To: References: Message-ID: <2855420b1a23f4e2e7f2fb393e2ffe85@localhost.localdomain> rovka added a comment. Hi again, I'm sorry, but I don't know much about WebAssembly. I would suggest taking the opposite approach: mark as unavailable in general, and as available only on platforms where you know for sure they exist - based on the docs you linked, that's probably anything that has C++11 support. Maybe someone else with more experience in this area can chime in. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83361/new/ https://reviews.llvm.org/D83361 From llvm-commits at lists.llvm.org Fri Jul 10 04:24:56 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:24:56 +0000 (UTC) Subject: [PATCH] D83159: [RISCV][test] Add a new codegen test In-Reply-To: References: Message-ID: benshi001 updated this revision to Diff 276985. benshi001 retitled this revision from "[RISCV] Add a new codegen test" to "[RISCV][test] Add a new codegen test". benshi001 edited the summary of this revision. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83159/new/ https://reviews.llvm.org/D83159 Files: llvm/test/CodeGen/RISCV/addimm-mulimm.ll Index: llvm/test/CodeGen/RISCV/addimm-mulimm.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/RISCV/addimm-mulimm.ll @@ -0,0 +1,95 @@ +; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV32IM %s +; RUN: llc -mtriple=riscv64 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV64IM %s + +; Test whether (mul (add x, c1), c2) can be transformed to +; (add (mul x, c2), c1*c2) according to different c1/c2 pairs. 
+ +define signext i32 @add_mul_trans_accept_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 11 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: addi a0, a0, 407 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 11 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: addiw a0, a0, 407 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 37 + %tmp1 = mul i32 %tmp0, 11 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_accept_2(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_2 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 13 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 28 +; RV32IM-NEXT: addi a1, a1, 1701 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_2 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 13 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 28 +; RV64IM-NEXT: addiw a1, a1, 1701 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 8953 + %tmp1 = mul i32 %tmp0, 13 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_reject_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 19 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 9 +; RV32IM-NEXT: addi a1, a1, 585 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_reject_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 19 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 9 +; RV64IM-NEXT: addiw a1, a1, 585 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1971 + %tmp1 = mul i32 %tmp0, 19 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_2(i32 %x) { +; RV32IM: # %bb.0: +; RV32IM-NEXT: lui a1, 792 +; RV32IM-NEXT: addi a1, a1, -1709 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 1014660 +; RV32IM-NEXT: addi a1, a1, -1891 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; 
RV64IM: # %bb.0: +; RV64IM-NEXT: lui a1, 792 +; RV64IM-NEXT: addiw a1, a1, -1709 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 1014660 +; RV64IM-NEXT: addiw a1, a1, -1891 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1841231 + %tmp1 = mul i32 %tmp0, 3242323 + ret i32 %tmp1 +} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83159.276985.patch Type: text/x-patch Size: 3054 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 04:25:54 2020 From: llvm-commits at lists.llvm.org (ChenZheng via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:25:54 +0000 (UTC) Subject: [PATCH] D83365: [PowerPC] start and end parameters for fixupIsDeadOrKill may exist in different block before RA In-Reply-To: References: Message-ID: <5dd4493cd53b2f94766b08de441e4214@localhost.localdomain> shchenz updated this revision to Diff 276988. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83365/new/ https://reviews.llvm.org/D83365 Files: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/lib/Target/PowerPC/PPCInstrInfo.h llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir Index: llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir =================================================================== --- /dev/null +++ llvm/test/CodeGen/PowerPC/fixup-kill-dead-flag-crash.mir @@ -0,0 +1,21 @@ +# RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs -start-before ppc-mi-peepholes \ +# RUN: -stop-after ppc-mi-peepholes %s -o - | FileCheck %s + +--- +name: test +#CHECK : name : test +tracksRegLiveness: true +body: | + bb.0.entry: + liveins: $x3 + %0:g8rc = COPY $x3 + %1:gprc = COPY %0.sub_32:g8rc + %2:g8rc = LI8 63 + + bb.1: + %3:gprc = COPY %2.sub_32:g8rc + ; CHECK: %4:gprc = LI 0 + %4:gprc = XORI killed %3:gprc, 63 + STW killed %4:gprc, %4:gprc, 100 + BLR8 implicit $lr8, implicit $rm +... 
Index: llvm/lib/Target/PowerPC/PPCInstrInfo.h =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrInfo.h +++ llvm/lib/Target/PowerPC/PPCInstrInfo.h @@ -565,14 +565,18 @@ int64_t OffsetImm) const; /// Fixup killed/dead flag for register \p RegNo between instructions [\p - /// StartMI, \p EndMI]. Some PostRA transformations may violate register - /// killed/dead flags semantics, this function can be called to fix up. Before - /// calling this function, + /// StartMI, \p EndMI]. Some pre-RA or post-RA transformations may violate + /// register killed/dead flags semantics, this function can be called to fix + /// up. Before calling this function, /// 1. Ensure that \p RegNo liveness is killed after instruction \p EndMI. /// 2. Ensure that there is no new definition between (\p StartMI, \p EndMI) /// and possible definition for \p RegNo is \p StartMI or \p EndMI. - /// 3. Ensure that all instructions between [\p StartMI, \p EndMI] are in same - /// basic block. + /// 3. We can do accurate fixup for the case when all instructions between + /// [\p StartMI, \p EndMI] are in same basic block. + /// 4. For the case when \p StartMI and \p EndMI are not in same basic block, + /// we conservatively clear kill flag for all uses of \p RegNo for pre-RA + /// and for post-RA, we give an assertion as without reaching definition + /// analysis post-RA, \p StartMI and \p EndMI are hard to keep right. 
void fixupIsDeadOrKill(MachineInstr &StartMI, MachineInstr &EndMI, unsigned RegNo) const; void replaceInstrWithLI(MachineInstr &MI, const LoadImmediateInfo &LII) const; Index: llvm/lib/Target/PowerPC/PPCInstrInfo.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCInstrInfo.cpp +++ llvm/lib/Target/PowerPC/PPCInstrInfo.cpp @@ -2655,6 +2655,15 @@ void PPCInstrInfo::fixupIsDeadOrKill(MachineInstr &StartMI, MachineInstr &EndMI, unsigned RegNo) const { + // Conservatively clear kill flag for the register if the instructions are in + // different basic blocks and in SSA form, because the kill flag may no longer + // be right. There is no need to bother with dead flags since defs with no + // uses will be handled by DCE. + MachineRegisterInfo &MRI = StartMI.getParent()->getParent()->getRegInfo(); + if (MRI.isSSA() && (StartMI.getParent() != EndMI.getParent())) { + MRI.clearKillFlags(RegNo); + return; + } // Instructions between [StartMI, EndMI] should be in same basic block. assert((StartMI.getParent() == EndMI.getParent()) && -------------- next part -------------- A non-text attachment was scrubbed... Name: D83365.276988.patch Type: text/x-patch Size: 3490 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 04:28:55 2020 From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:28:55 +0000 (UTC) Subject: [PATCH] D83495: [DebugInfo] Add DWARF emission for DBG_VALUE_LIST In-Reply-To: References: Message-ID: <8f681ee7e62129498e0f387e9ad9464b@localhost.localdomain> StephenTozer marked an inline comment as done. StephenTozer added inline comments. ================ Comment at: llvm/test/DebugInfo/X86/dbg_value_list_emission.mir:63 + ; (3) Check that multiple references to one reg arg works. + ; XXX What was the consensus on this - are we allowing it? 
+ DBG_VALUE_LIST !25, !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 0, DW_OP_minus, DW_OP_stack_value), $eax, debug-location !15 ---------------- Orlando wrote: > Can remove this XXX note. As you mentioned offline, having multiple references to the same arg (i.e. multiple `DW_OP_LLVM_arg, 0` in the expr) is never a problem. > > Though, slightly tangentially, I'm still a little unclear on what the final decision was on how to handle duplicate register arg operands. In D82363 you said 'always treat DBG_VALUE_LISTs as potentially having them'. Please could you explain a little further? (i.e. is it an error state, do we need to add extra checks when dealing with DBG_VALUE_LISTs etc). It is not an error state, just a slightly more inconvenient form than one without duplicates. It requires some extra work in a few places (operating on a vector instead of a single pointer), but there is no reason for it to be invalid. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83495/new/ https://reviews.llvm.org/D83495 From llvm-commits at lists.llvm.org Fri Jul 10 04:30:43 2020 From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:30:43 +0000 (UTC) Subject: [PATCH] D83495: [DebugInfo] Add DWARF emission for DBG_VALUE_LIST In-Reply-To: References: Message-ID: <8827464ea11b0beb88a5bb1c1207fbfe@localhost.localdomain> StephenTozer updated this revision to Diff 276989. StephenTozer added a comment. Remove old comments from tests. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83495/new/ https://reviews.llvm.org/D83495 Files: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DbgEntityHistoryCalculator.cpp llvm/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp llvm/lib/CodeGen/AsmPrinter/DebugLocEntry.h llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h llvm/test/DebugInfo/X86/dbg_value_list_clobbers.mir llvm/test/DebugInfo/X86/dbg_value_list_emission.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D83495.276989.patch Type: text/x-patch Size: 40761 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 04:38:39 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:38:39 +0000 (UTC) Subject: [PATCH] D83526: [FileCheck] In input dump, elide only if ellipsis is shorter In-Reply-To: References: Message-ID: jdenny updated this revision to Diff 276990. jdenny added a comment. Apply reviewer suggestion. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83526/new/ https://reviews.llvm.org/D83526 Files: llvm/test/FileCheck/dump-input-context.txt llvm/utils/FileCheck/FileCheck.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83526.276990.patch Type: text/x-patch Size: 13471 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 04:39:08 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:39:08 +0000 (UTC) Subject: [PATCH] D82203: [FileCheck] Implement -dump-input-context In-Reply-To: References: Message-ID: <493733d9a2bd63ecd9b0633e5ed5125a@localhost.localdomain> jdenny added a comment. Thanks for the reviews! 
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82203/new/

https://reviews.llvm.org/D82203

From llvm-commits at lists.llvm.org Fri Jul 10 04:39:49 2020
From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 11:39:49 +0000 (UTC)
Subject: [PATCH] D83097: [FileCheck] Implement -dump-input-filter
In-Reply-To:
References:
Message-ID:

jdenny marked an inline comment as done.
jdenny added inline comments.

================
Comment at: llvm/utils/FileCheck/FileCheck.cpp:455
+  case DumpInputFilterAll:
+    llvm_unreachable("unexpected DumpInputFilterAll");
+    break;
----------------
mehdi_amini wrote:
> In a tool like FileCheck I rather err on the side of deterministically failing with a `report_fatal_error`

I don't object in principle, but I see no precedent for this in FileCheck. Are you ok with this landing as is?

If FileCheck should generally use `report_fatal_error` instead of `llvm_unreachable`, I feel like that should be discussed in a separate review for all occurrences.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83097/new/

https://reviews.llvm.org/D83097

From llvm-commits at lists.llvm.org Fri Jul 10 04:45:52 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 11:45:52 +0000 (UTC)
Subject: [PATCH] D83535: [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
In-Reply-To:
References:
Message-ID:

lebedev.ri accepted this revision.
lebedev.ri added a comment.
This revision is now accepted and ready to land.
LG, thank you Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83535/new/ https://reviews.llvm.org/D83535 From llvm-commits at lists.llvm.org Fri Jul 10 04:48:21 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via llvm-commits) Date: Fri, 10 Jul 2020 04:48:21 -0700 (PDT) Subject: [llvm] b69e0f6 - DomTreeUpdater::dump() - use const auto& iterator in for-range-loop. Message-ID: <5f085585.1c69fb81.4c7ab.f202@mx.google.com> Author: Simon Pilgrim Date: 2020-07-10T12:47:15+01:00 New Revision: b69e0f674fb5a05224fbe50cae9a9e4137a2c0e1 URL: https://github.com/llvm/llvm-project/commit/b69e0f674fb5a05224fbe50cae9a9e4137a2c0e1 DIFF: https://github.com/llvm/llvm-project/commit/b69e0f674fb5a05224fbe50cae9a9e4137a2c0e1.diff LOG: DomTreeUpdater::dump() - use const auto& iterator in for-range-loop. Avoids unnecessary copies and silences clang tidy warning. Added: Modified: llvm/lib/Analysis/DomTreeUpdater.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/DomTreeUpdater.cpp b/llvm/lib/Analysis/DomTreeUpdater.cpp index e0e17de20121..26e637bb6d99 100644 --- a/llvm/lib/Analysis/DomTreeUpdater.cpp +++ b/llvm/lib/Analysis/DomTreeUpdater.cpp @@ -507,7 +507,7 @@ LLVM_DUMP_METHOD void DomTreeUpdater::dump() const { OS << "Pending DeletedBBs:\n"; Index = 0; - for (auto BB : DeletedBBs) { + for (const auto &BB : DeletedBBs) { OS << " " << Index << " : "; ++Index; if (BB->hasName()) @@ -519,7 +519,7 @@ LLVM_DUMP_METHOD void DomTreeUpdater::dump() const { OS << "Pending Callbacks:\n"; Index = 0; - for (auto BB : Callbacks) { + for (const auto &BB : Callbacks) { OS << " " << Index << " : "; ++Index; if (BB->hasName()) From llvm-commits at lists.llvm.org Fri Jul 10 04:57:47 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:57:47 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi 
canonicalization: consider more blocks
In-Reply-To:
References:
Message-ID: <902ed4aebb22679b3cb3b20ed1519373@localhost.localdomain>

mkazantsev marked an inline comment as done.
mkazantsev added inline comments.

================
Comment at: llvm/test/Transforms/InstCombine/select.ll:2286
+; CHECK: exit:
+; CHECK-NEXT: ret i32 [[B:%.*]]
+;
----------------
nikic wrote:
> I don't understand why this returns `%B` (and what the difference to the previous test is, for that matter).

It actually returns `%A`; `%B` here is just a regex name. Seems that I miscopied it on rebase; it was supposed to exercise the scenario

```
%sel = select i1 %cond, i32 %A, i32 %phi
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83284/new/

https://reviews.llvm.org/D83284

From llvm-commits at lists.llvm.org Fri Jul 10 04:57:54 2020
From: llvm-commits at lists.llvm.org (Mirko Brkusanin via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 11:57:54 +0000 (UTC)
Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
In-Reply-To:
References:
Message-ID: <336d104c6c2d42216b78e12dab440f4a@localhost.localdomain>

mbrkusanin updated this revision to Diff 276992.
mbrkusanin marked 3 inline comments as done.
mbrkusanin set the repository for this revision to rG LLVM Github Monorepo.
mbrkusanin added a comment.

- Also renamed and updated SDag tests.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83214/new/

https://reviews.llvm.org/D83214

Files:
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83214.276992.patch
Type: text/x-patch
Size: 16660 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 04:58:12 2020
From: llvm-commits at lists.llvm.org (Mirko Brkusanin via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 11:58:12 +0000 (UTC)
Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
In-Reply-To:
References:
Message-ID:

mbrkusanin added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1053-1054
+
+  Optional Arg =
+      getConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI, true);
+
----------------
arsenm wrote:
> I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through

Unfortunately, the regular version fails to produce the value.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:11-12
+; CHECK: ; %bb.0:
+; CHECK-NEXT: s_mov_b32 s0, 0
+; CHECK-NEXT: s_mov_b32 s1, 0
+; CHECK-NEXT: ; return to shader part epilog
----------------
arsenm wrote:
> This can be one s_mov_b64

It can, but SIFoldOperands will not let that happen.

From:

  %10:sreg_64 = S_MOV_B64 0
  %3:sreg_32 = COPY %10.sub0:sreg_64
  %4:sreg_32 = COPY %10.sub1:sreg_64

plus some instructions that use %3, %4 but will eventually be removed, SIFoldOperands will produce:

  %10:sreg_64 = S_MOV_B64 0
  %3:sreg_32 = S_MOV_B32 0
  %4:sreg_32 = S_MOV_B32 0
  ...

which makes the first instruction dead, and in the end we're left with two S_MOV_B32.

For the example below with exec, AMDGPU::sub0_sub1 seems to do the trick, but I don't see anything similar for immediate operands.

Alternatively we can produce

  v_cmp_ne_u32_e64 s[0:1], 0, 0

if for whatever reason that is preferable to

  s_mov_b32 s0, 0
  s_mov_b32 s1, 0

Anyway, this is not an issue with selecting ballot.
Following example has the same issue: ``` define amdgpu_cs i64 @si_fold_constants_i64() { %x = add i64 0, 0 ret i64 %x } ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83214/new/ https://reviews.llvm.org/D83214 From llvm-commits at lists.llvm.org Fri Jul 10 05:03:05 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:03:05 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks In-Reply-To: References: Message-ID: mkazantsev updated this revision to Diff 276994. mkazantsev added a comment. Fixed var naming in test that went astray after rebase. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83284/new/ https://reviews.llvm.org/D83284 Files: llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp llvm/test/Transforms/InstCombine/select.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83284.276994.patch Type: text/x-patch Size: 7183 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 05:07:59 2020 From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:07:59 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks In-Reply-To: References: Message-ID: mkazantsev marked an inline comment as done. mkazantsev added inline comments. ================ Comment at: llvm/test/Transforms/InstCombine/select.ll:2286 +; CHECK: exit: +; CHECK-NEXT: ret i32 [[B:%.*]] +; ---------------- mkazantsev wrote: > nikic wrote: > > I don't understand why this returns `%B` (and what the difference to the previous test is, for that matter). > It actually returns `%A`, `%B` here is just a regex name. 
Seems that I miscopied it on rebase; it was supposed to exercise the scenario
> ```
> %sel = select i1 %cond, i32 %A, i32 %phi
> ```

Fixed the test `test_select_into_phi_not_idom_2` and var naming in it. It was supposed to show that we can replace a Phi that is the false value of a select (the previous test shows the case where it is the true value).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83284/new/

https://reviews.llvm.org/D83284

From llvm-commits at lists.llvm.org Fri Jul 10 05:08:48 2020
From: llvm-commits at lists.llvm.org (Djordje Todorovic via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 12:08:48 +0000 (UTC)
Subject: [PATCH] D82363: [DebugInfo] Add new instruction and expression operator for variadic debug values
In-Reply-To:
References:
Message-ID:

djtodoro added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/MachineInstr.h:503
+  }
+  ArrayRef getDebugOperandsForReg(Register Reg) {
+    assert(isDebugValue() && "Tried to get debug operands for non-debug_value");
----------------
StephenTozer wrote:
> djtodoro wrote:
> > Can we use templates to avoid duplicated code here for `getDebugOperandsForReg()`?
> We can, as long as we use a static function to hold the common code (if there's a way to do so without a static function then I'd be happy to go with that instead); the solution looks something like this:
>
> ```
> template
> static ArrayRef getDebugOperandsForReg(Instruction *MI, Register Reg) {
>   assert(MI->isDebugValue() && "Tried to get debug operands for non-debug_value");
>   SmallVector Ops;
>   for (Operand &Op : MI->debug_operands()) {
>     if (Op.isReg() && Op.getReg() == Reg)
>       Ops.push_back(&Op);
>   }
>   return Ops;
> }
> ArrayRef getDebugOperandsForReg(Register Reg) const {
>   return MachineInstr::getDebugOperandsForReg(this, Reg);
> }
> ArrayRef getDebugOperandsForReg(Register Reg) {
>   return MachineInstr::getDebugOperandsForReg(this, Reg);
> }
> ```
>
> Does this look good?
It removes the duplication, it's just a bit more verbose and leaves an otherwise useless static function hanging around, unless it's moved to a private block (which is also fine but reduces readability by moving it far away from the public functions). This looks good to me, thanks. ================ Comment at: llvm/include/llvm/CodeGen/MachineInstr.h:486 + /// register \p Reg. + const bool hasDebugOperandForReg(Register Reg) const { + return count_if(debug_operands(), [Reg](const MachineOperand &Op) { ---------------- `const` does not have any impact here, since it is a return type, so it should be removed. ================ Comment at: llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp:79 + const DIExpression *DIExpr = MI.getDebugExpression(); + if (MI.isNonVariadicDebugValue()) { + DIExpr = DIExpression::prepend(MI.getDebugExpression(), ---------------- Should be `isNonListDebugValue()` ? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82363/new/ https://reviews.llvm.org/D82363 From llvm-commits at lists.llvm.org Fri Jul 10 05:24:45 2020 From: llvm-commits at lists.llvm.org (Victor Huang via llvm-commits) Date: Fri, 10 Jul 2020 05:24:45 -0700 (PDT) Subject: [lld] 118366d - [PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC Message-ID: <5f085e0d.1c69fb81.2a2cd.d531@mx.google.com> Author: Victor Huang Date: 2020-07-10T07:23:32-05:00 New Revision: 118366dcb6c35239c1c816d109230d6f7f3660af URL: https://github.com/llvm/llvm-project/commit/118366dcb6c35239c1c816d109230d6f7f3660af DIFF: https://github.com/llvm/llvm-project/commit/118366dcb6c35239c1c816d109230d6f7f3660af.diff LOG: [PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC The PC Relative code allows for calls that are marked with the relocation R_PPC64_REL24_NOTOC. This indicates that the caller does not have a valid TOC pointer in R2 and does not require R2 to be restored after the call. 
This patch is added to support local calls to callees tha also do not have a TOC. Reviewed By: sfertile, MaskRay, stefanp Differential Revision: https://reviews.llvm.org/D82816 Added: lld/test/ELF/Inputs/ppc64-callee-global-hidden.s lld/test/ELF/ppc64-pcrel-call-to-pcrel.s Modified: lld/ELF/Arch/PPC64.cpp Removed: ################################################################################ diff --git a/lld/ELF/Arch/PPC64.cpp b/lld/ELF/Arch/PPC64.cpp index cf58b322bb3a..71c568088fb9 100644 --- a/lld/ELF/Arch/PPC64.cpp +++ b/lld/ELF/Arch/PPC64.cpp @@ -681,6 +681,8 @@ RelExpr PPC64::getRelExpr(RelType type, const Symbol &s, case R_PPC64_REL14: case R_PPC64_REL24: return R_PPC64_CALL_PLT; + case R_PPC64_REL24_NOTOC: + return R_PLT_PC; case R_PPC64_REL16_LO: case R_PPC64_REL16_HA: case R_PPC64_REL16_HI: @@ -993,7 +995,8 @@ void PPC64::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const { write32(loc, (read32(loc) & ~mask) | (val & mask)); break; } - case R_PPC64_REL24: { + case R_PPC64_REL24: + case R_PPC64_REL24_NOTOC: { uint32_t mask = 0x03FFFFFC; checkInt(loc, val, 26, rel); checkAlignment(loc, val, 4, rel); @@ -1032,16 +1035,28 @@ void PPC64::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const { bool PPC64::needsThunk(RelExpr expr, RelType type, const InputFile *file, uint64_t branchAddr, const Symbol &s, int64_t a) const { - if (type != R_PPC64_REL14 && type != R_PPC64_REL24) + if (type != R_PPC64_REL14 && type != R_PPC64_REL24 && + type != R_PPC64_REL24_NOTOC) return false; + // FIXME: Remove the fatal error once the call protocol is implemented. + if (type == R_PPC64_REL24_NOTOC && s.isInPlt()) + fatal("unimplemented feature: external function call with the reltype" + " R_PPC64_REL24_NOTOC"); + // If a function is in the Plt it needs to be called with a call-stub. if (s.isInPlt()) return true; - // This check looks at the st_other bits of the callee. 
If the value is 1 - // then the callee clobbers the TOC and we need an R2 save stub. - if ((s.stOther >> 5) == 1) + // FIXME: Remove the fatal error once the call protocol is implemented. + if (type == R_PPC64_REL24_NOTOC && (s.stOther >> 5) > 1) + fatal("unimplemented feature: local function call with the reltype" + " R_PPC64_REL24_NOTOC and the callee needs toc-pointer setup"); + + // This check looks at the st_other bits of the callee with relocation + // R_PPC64_REL14 or R_PPC64_REL24. If the value is 1, then the callee + // clobbers the TOC and we need an R2 save stub. + if (type != R_PPC64_REL24_NOTOC && (s.stOther >> 5) == 1) return true; // If a symbol is a weak undefined and we are compiling an executable @@ -1069,7 +1084,7 @@ bool PPC64::inBranchRange(RelType type, uint64_t src, uint64_t dst) const { int64_t offset = dst - src; if (type == R_PPC64_REL14) return isInt<16>(offset); - if (type == R_PPC64_REL24) + if (type == R_PPC64_REL24 || type == R_PPC64_REL24_NOTOC) return isInt<26>(offset); llvm_unreachable("unsupported relocation type used in branch"); } diff --git a/lld/test/ELF/Inputs/ppc64-callee-global-hidden.s b/lld/test/ELF/Inputs/ppc64-callee-global-hidden.s new file mode 100644 index 000000000000..e33d191f226d --- /dev/null +++ b/lld/test/ELF/Inputs/ppc64-callee-global-hidden.s @@ -0,0 +1,15 @@ +func_extern: + blr + +.hidden callee3_stother0_hidden +.globl callee3_stother0_hidden +callee3_stother0_hidden: + blr + +.hidden callee4_stother1_hidden +.globl callee4_stother1_hidden +callee4_stother1_hidden: + .localentry callee4_stother1_hidden, 1 + ## nop is not needed after bl for R_PPC64_REL24_NOTOC + bl func_extern at notoc + blr diff --git a/lld/test/ELF/ppc64-pcrel-call-to-pcrel.s b/lld/test/ELF/ppc64-pcrel-call-to-pcrel.s new file mode 100644 index 000000000000..3b0d9e31fb7b --- /dev/null +++ b/lld/test/ELF/ppc64-pcrel-call-to-pcrel.s @@ -0,0 +1,124 @@ +# REQUIRES: ppc +# RUN: echo 'SECTIONS { \ +# RUN: .text_default_stother0 0x10010000: { 
*(.text_default_stother0) } \ +# RUN: .text_default_stother1 0x10020000: { *(.text_default_stother1) } \ +# RUN: .text_hidden_stother0 0x10030000: { *(.text_hidden_stother0) } \ +# RUN: .text_hidden_stother1 0x10040000: { *(.text_hidden_stother1) } \ +# RUN: }' > %t.script + +# RUN: llvm-mc -filetype=obj -triple=powerpc64le -defsym HIDDEN=1 %s -o %t1.o +# RUN: llvm-mc -filetype=obj -triple=powerpc64le %p/Inputs/ppc64-callee-global-hidden.s -o %t2.o +# RUN: ld.lld -T %t.script -shared %t1.o %t2.o -o %t.so +# RUN: llvm-readelf -s %t.so | FileCheck %s --check-prefix=SYMBOL +# RUN: llvm-objdump -d --no-show-raw-insn --mcpu=pwr10 %t.so | FileCheck --check-prefix=CHECK --check-prefix=CHECK-HIDDEN %s + +# RUN: llvm-mc -filetype=obj -triple=powerpc64le -defsym GLOBAL=1 %s -o %t3.o +# RUN: ld.lld -T %t.script %t3.o -o %t +# RUN: llvm-readelf -s %t | FileCheck %s --check-prefix=SYMBOL-GLOBAL +# RUN: llvm-objdump -d --no-show-raw-insn --mcpu=pwr10 %t | FileCheck %s + +# RUN: llvm-mc -filetype=obj -triple=powerpc64 -defsym HIDDEN=1 %s -o %t1.o +# RUN: llvm-mc -filetype=obj -triple=powerpc64 %p/Inputs/ppc64-callee-global-hidden.s -o %t2.o +# RUN: ld.lld -T %t.script -shared %t1.o %t2.o -o %t.so +# RUN: llvm-readelf -s %t.so | FileCheck %s --check-prefix=SYMBOL +# RUN: llvm-objdump -d --no-show-raw-insn --mcpu=pwr10 %t.so | FileCheck --check-prefix=CHECK --check-prefix=CHECK-HIDDEN %s + +# RUN: llvm-mc -filetype=obj -triple=powerpc64 -defsym GLOBAL=1 %s -o %t3.o +# RUN: ld.lld -T %t.script %t3.o -o %t +# RUN: llvm-readelf -s %t | FileCheck %s --check-prefix=SYMBOL-GLOBAL +# RUN: llvm-objdump -d --no-show-raw-insn --mcpu=pwr10 %t | FileCheck %s + +# SYMBOL: 2: 0000000010010000 0 NOTYPE LOCAL DEFAULT 5 callee1_stother0_default +# SYMBOL-NEXT: 3: 0000000010020004 0 NOTYPE LOCAL DEFAULT [] 6 callee2_stother1_default +# SYMBOL-NEXT: 4: 0000000010010004 0 NOTYPE LOCAL DEFAULT [] 5 caller1 +# SYMBOL-NEXT: 5: 000000001002000c 0 NOTYPE LOCAL DEFAULT [] 6 caller2 +# SYMBOL-NEXT: 6: 
0000000010030000 0 NOTYPE LOCAL DEFAULT [] 7 caller3 +# SYMBOL-NEXT: 7: 0000000010040000 0 NOTYPE LOCAL DEFAULT [] 8 caller4 +# SYMBOL-NEXT: 8: 0000000010020000 0 NOTYPE LOCAL DEFAULT 6 func_local +# SYMBOL-NEXT: 9: 0000000010040008 0 NOTYPE LOCAL DEFAULT 9 func_extern +# SYMBOL-NEXT: 10: 000000001004000c 0 NOTYPE LOCAL HIDDEN 9 callee3_stother0_hidden +# SYMBOL-NEXT: 11: 0000000010040010 0 NOTYPE LOCAL HIDDEN [] 9 callee4_stother1_hidden + +# SYMBOL-GLOBAL: 2: 0000000010010004 0 NOTYPE LOCAL DEFAULT [] 1 caller1 +# SYMBOL-GLOBAL-NEXT: 3: 000000001002000c 0 NOTYPE LOCAL DEFAULT [] 2 caller2 +# SYMBOL-GLOBAL-NEXT: 4: 0000000010020000 0 NOTYPE LOCAL DEFAULT 2 func_local +# SYMBOL-GLOBAL-NEXT: 5: 0000000010010000 0 NOTYPE GLOBAL DEFAULT 1 callee1_stother0_default +# SYMBOL-GLOBAL-NEXT: 6: 0000000010020004 0 NOTYPE GLOBAL DEFAULT [] 2 callee2_stother1_default + +# CHECK-LABEL: : +# CHECK-NEXT: 10010000: blr + +# CHECK-LABEL: : +# CHECK: 10010004: bl 0x10010000 +# CHECK-NEXT: 10010008: b 0x10010000 +.section .text_default_stother0, "ax", %progbits +.ifdef GLOBAL +.globl callee1_stother0_default +.endif +callee1_stother0_default: + blr +caller1: + .localentry caller1, 1 + ## nop is not needed after bl for R_PPC64_REL24_NOTOC + bl callee1_stother0_default at notoc + b callee1_stother0_default at notoc + +# CHECK-LABEL: : +# CHECK-NEXT: 10020000: blr + +# CHECK-LABEL: : +# CHECK-NEXT: 10020004: bl 0x10020000 +# CHECK-NEXT: 10020008: blr + +# CHECK-LABEL: : +# CHECK: 1002000c: bl 0x10020004 +# CHECK-NEXT: 10020010: b 0x10020004 +.section .text_default_stother1, "ax", %progbits +func_local: + blr +.ifdef GLOBAL +.globl callee2_stother1_default +.endif +callee2_stother1_default: + .localentry callee2_stother1_default, 1 + ## nop is not needed after bl for R_PPC64_REL24_NOTOC + bl func_local at notoc + blr +caller2: + .localentry caller2, 1 + ## nop is not needed after bl for R_PPC64_REL24_NOTOC + bl callee2_stother1_default at notoc + b callee2_stother1_default at notoc + +# 
CHECK-HIDDEN-LABEL: : +# CHECK-HIDDEN-NEXT: 10030000: bl 0x1004000c +# CHECK-HIDDEN-NEXT: 10030004: b 0x1004000c + +# CHECK-HIDDEN-LABEL: : +# CHECK-HIDDEN-NEXT: 10040000: bl 0x10040010 +# CHECK-HIDDEN-NEXT: 10040004: b 0x10040010 + +# CHECK-HIDDEN-LABEL: : +# CHECK-HIDDEN-NEXT: 10040008: blr + +# CHECK-HIDDEN-LABEL: : +# CHECK-HIDDEN-NEXT: 1004000c: blr + +# CHECK-HIDDEN-LABEL: : +# CHECK-HIDDEN-NEXT: 10040010: bl 0x10040008 +# CHECK-HIDDEN-NEXT: 10040014: blr +.ifdef HIDDEN +.section .text_hidden_stother0, "ax", %progbits +caller3: + .localentry caller3, 1 + ## nop is not needed after bl for R_PPC64_REL24_NOTOC + bl callee3_stother0_hidden at notoc + b callee3_stother0_hidden at notoc + +.section .text_hidden_stother1, "ax", %progbits +caller4: + .localentry caller4, 1 + ## nop is not needed after bl for R_PPC64_REL24_NOTOC + bl callee4_stother1_hidden at notoc + b callee4_stother1_hidden at notoc +.endif From llvm-commits at lists.llvm.org Fri Jul 10 05:25:00 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:25:00 +0000 (UTC) Subject: [PATCH] D82816: [PowerPC] Implement R_PPC64_REL24_NOTOC local calls, callee also has no TOC In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. NeHuang marked 5 inline comments as done. Closed by commit rG118366dcb6c3: [PowerPC] Implement R_PPC64_REL24_NOTOC calls, callee also has no TOC (authored by NeHuang). Changed prior to commit: https://reviews.llvm.org/D82816?vs=276572&id=276999#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 Files: lld/ELF/Arch/PPC64.cpp lld/test/ELF/Inputs/ppc64-callee-global-hidden.s lld/test/ELF/ppc64-pcrel-call-to-pcrel.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82816.276999.patch Type: text/x-patch Size: 8548 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 05:25:12 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:25:12 +0000 (UTC) Subject: [PATCH] D82816: [PowerPC] Implement R_PPC64_REL24_NOTOC local calls, callee also has no TOC In-Reply-To: References: Message-ID: <9e2b60132862fbca220d69f466f2cb0e@localhost.localdomain> NeHuang added inline comments. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:13 +# RUN: ld.lld -T %t.script -shared %t1.o %t2.o -o %t.so +# RUN: ld.lld -T %t.script %t3.o -o %t +# RUN: llvm-readelf -s %t.so | FileCheck %s --check-prefix=SYMBOL ---------------- sfertile wrote: > Nit: I suggest having a separate set of run steps for testing the exec since the steps are completely disjoint from the shared object test. ie separate out > ``` > # RUN: llvm-mc -filetype=obj -triple=powerpc64[le] -defsym GLOBAL=1 %s -o %t3.o > # RUN: ld.lld -T %t.script %t3.o -o %t > # RUN: llvm-readelf -s %t | FileCheck %s --check-prefix=SYMBOL-GLOBAL > # RUN: llvm-objdump -d --no-show-raw-insn --mcpu=pwr10 %t | FileCheck %s > ``` > Thanks. Addressed the nit when committing the patch. ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-pcrel.s:60 + .localentry caller1, 1 + # nop is not needed after bl for R_PPC64_REL24_NOTOC + bl callee1_stother0_default at notoc ---------------- MaskRay wrote: > We use `## ` for comments. Thanks. Addressed the nit when committing the patch. 
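
For readers following this thread without the patch open, the order of the checks that the committed change adds to `PPC64::needsThunk` can be sketched roughly as below. This is a standalone illustration, not lld code: `Rel`, `CallSym`, `Outcome` and `classifyCall` are hypothetical names, the two `fatal(...)` calls from the diff are modelled as an `Unimplemented` result, and the later weak-undefined and branch-range checks are left out.

```cpp
#include <cstdint>

// Hypothetical stand-ins for lld's relocation types and Symbol (assumption).
enum class Rel { Rel14, Rel24, Rel24NoToc, Other };

struct CallSym {
  bool inPlt;      // models s.isInPlt()
  uint8_t stOther; // models s.stOther; the top three bits hold the local-entry value
};

enum class Outcome {
  NoThunk,       // relocation type is not a branch handled here
  Thunk,         // a call stub or R2 save stub is needed
  Unimplemented, // the diff reports a fatal "unimplemented feature" error here
  Undecided      // falls through to the remaining checks (not modelled)
};

Outcome classifyCall(Rel type, const CallSym &s) {
  if (type != Rel::Rel14 && type != Rel::Rel24 && type != Rel::Rel24NoToc)
    return Outcome::NoThunk;

  // External calls through the PLT are not yet supported for NOTOC.
  if (type == Rel::Rel24NoToc && s.inPlt)
    return Outcome::Unimplemented;

  // Any other PLT call goes through a call stub.
  if (s.inPlt)
    return Outcome::Thunk;

  // NOTOC caller, callee that needs a TOC pointer set up: not yet supported.
  if (type == Rel::Rel24NoToc && (s.stOther >> 5) > 1)
    return Outcome::Unimplemented;

  // REL14/REL24 caller, callee that clobbers the TOC: R2 save stub.
  if (type != Rel::Rel24NoToc && (s.stOther >> 5) == 1)
    return Outcome::Thunk;

  return Outcome::Undecided;
}
```

The point of the last two branches is that a callee marked with `.localentry sym, 1` only forces a stub when the caller itself relies on the TOC; a `R_PPC64_REL24_NOTOC` caller can branch to it directly, which is the case the new test exercises.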
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82816/new/ https://reviews.llvm.org/D82816 From llvm-commits at lists.llvm.org Fri Jul 10 05:25:25 2020 From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:25:25 +0000 (UTC) Subject: [PATCH] D82363: [DebugInfo] Add new instruction and expression operator for variadic debug values In-Reply-To: References: Message-ID: <73365151a1e687edd25a32c59be770f1@localhost.localdomain> StephenTozer updated this revision to Diff 276998. StephenTozer added a comment. Remove useless const, add rename that somehow escaped last diff. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82363/new/ https://reviews.llvm.org/D82363 Files: llvm/include/llvm/BinaryFormat/Dwarf.h llvm/include/llvm/CodeGen/MachineInstr.h llvm/include/llvm/CodeGen/MachineInstrBuilder.h llvm/include/llvm/IR/DebugInfoMetadata.h llvm/include/llvm/Support/TargetOpcodes.def llvm/include/llvm/Target/Target.td llvm/lib/BinaryFormat/Dwarf.cpp llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/InlineSpiller.cpp llvm/lib/CodeGen/LiveRangeShrink.cpp llvm/lib/CodeGen/MIRParser/MIParser.cpp llvm/lib/CodeGen/MachineInstr.cpp llvm/lib/CodeGen/MachineRegisterInfo.cpp llvm/lib/CodeGen/PrologEpilogInserter.cpp llvm/lib/CodeGen/RegAllocFast.cpp llvm/lib/IR/DebugInfoMetadata.cpp llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp llvm/lib/Target/X86/X86OptimizeLEAs.cpp llvm/test/CodeGen/MIR/Generic/dbg-value-list-spill.mir llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82363.276998.patch Type: text/x-patch Size: 49495 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 05:44:01 2020 From: llvm-commits at lists.llvm.org (Bjorn Pettersson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:44:01 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <3bfb692def4d3769dd65698869404216@localhost.localdomain> bjope added a comment. Here is a reproducer for the new problem that @uabelho noticed. Might be possible to reduce it further, but I think it at least show the problem: ; opt -basic-aa-recphi=0 -gvn -o - -S ; opt -basic-aa-recphi=1 -gvn -o - -S declare i16 @myprintf(i32) define i16 @main(i16 %argc.5.par, i16** nocapture readnone %argv.6.par) { %int_arr.10 = alloca [3 x i16], align 1 %_tmp1 = getelementptr inbounds [3 x i16], [3 x i16]* %int_arr.10, i16 0, i16 2 br label %bb1 bb1: ; preds = %bb1, %0 %i.7.0 = phi i16 [ 2, %0 ], [ %_tmp5, %bb1 ] %ls1.9.0 = phi i16* [ %_tmp1, %0 ], [ %_tmp7, %bb1 ] store i16 %i.7.0, i16* %ls1.9.0, align 1 %_tmp5 = add nsw i16 %i.7.0, -1 %_tmp7 = getelementptr i16, i16* %ls1.9.0, i16 -1 %_tmp9 = icmp sgt i16 %i.7.0, 0 br i1 %_tmp9, label %bb1, label %bb3 bb3: ; preds = %bb1 %_tmp11 = getelementptr inbounds [3 x i16], [3 x i16]* %int_arr.10, i16 0, i16 1 %_tmp12 = load i16, i16* %_tmp11, align 1 %_tmp13 = sext i16 %_tmp12 to i32 %_tmp16 = call i16 @myprintf(i32 %_tmp13) %_tmp18.not = icmp eq i16 %_tmp12, 1 br i1 %_tmp18.not, label %bb5, label %bb4 bb4: ; preds = %bb3 ret i16 1 bb5: ; preds = %bb3, %bb4 ret i16 0 } See https://godbolt.org/z/j59xxh for the diff. 
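
As a reading aid, the IR reproducer above corresponds roughly to the following C++. This is a hand translation and an assumption about intent, not something taken from the review: `reproducer` is a hypothetical name, the `@myprintf` call is dropped, and the pointer PHI `%ls1.9.0` is modelled as an array index so the sketch never forms a pointer before the start of the array.

```cpp
#include <cstdint>

// Hand-written approximation of @main from the reproducer (assumption).
int16_t reproducer() {
  int16_t int_arr[3];
  int16_t i = 2; // %i.7.0 starts at 2
  int idx = 2;   // stands in for %ls1.9.0, which starts at &int_arr[2]

  while (true) {
    int_arr[idx] = i;   // store i16 %i.7.0, i16* %ls1.9.0
    bool again = i > 0; // %_tmp9 = icmp sgt i16 %i.7.0, 0
    --i;                // %_tmp5 = add nsw i16 %i.7.0, -1
    --idx;              // %_tmp7 = getelementptr i16, i16* %ls1.9.0, i16 -1
    if (!again)
      break;
  }

  // %_tmp12 = load of int_arr[1]; the loop stored 1 there on its second trip.
  return int_arr[1] == 1 ? 0 : 1;
}
```

Evaluated as written, the loop stores 2, 1 and 0 into `int_arr[2]`, `int_arr[1]` and `int_arr[0]`, so the function returns 0. The store inside the loop therefore does alias the later load of `int_arr[1]`, which is the relationship the negative GEP step makes easy to get wrong.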
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 05:44:07 2020 From: llvm-commits at lists.llvm.org (David Green via llvm-commits) Date: Fri, 10 Jul 2020 05:44:07 -0700 (PDT) Subject: [llvm] e1135b4 - Revert "[BasicAA] Enable -basic-aa-recphi by default" Message-ID: <5f086297.1c69fb81.28ceb.eb66@mx.google.com> Author: David Green Date: 2020-07-10T13:43:54+01:00 New Revision: e1135b486aaed207f87a6d2e890678ef133b816e URL: https://github.com/llvm/llvm-project/commit/e1135b486aaed207f87a6d2e890678ef133b816e DIFF: https://github.com/llvm/llvm-project/commit/e1135b486aaed207f87a6d2e890678ef133b816e.diff LOG: Revert "[BasicAA] Enable -basic-aa-recphi by default" This reverts commit af839a96187e3538d63ad57571e4bdf01e2b15c5. Some issues appear to be being caused by this. Reverting whilst we investigate. Added: Modified: llvm/lib/Analysis/BasicAliasAnalysis.cpp llvm/test/Analysis/BasicAA/phi-loop.ll llvm/test/Analysis/BasicAA/recphi.ll Removed: ################################################################################ diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp index 5574d3f5db6a..74664098ce1d 100644 --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp @@ -66,7 +66,7 @@ using namespace llvm; /// Enable analysis of recursive PHI nodes. static cl::opt EnableRecPhiAnalysis("basic-aa-recphi", cl::Hidden, - cl::init(true)); + cl::init(false)); /// By default, even on 32-bit architectures we use 64-bit integers for /// calculations. 
This will allow us to more-aggressively decompose indexing diff --git a/llvm/test/Analysis/BasicAA/phi-loop.ll b/llvm/test/Analysis/BasicAA/phi-loop.ll index e54752a9223f..db3023c6560d 100644 --- a/llvm/test/Analysis/BasicAA/phi-loop.ll +++ b/llvm/test/Analysis/BasicAA/phi-loop.ll @@ -1,4 +1,4 @@ -; RUN: opt < %s -basic-aa -gvn -S | FileCheck %s +; RUN: opt < %s -basic-aa -basic-aa-recphi=1 -gvn -S | FileCheck %s ; ; Check that section->word_ofs doesn't get reloaded in every iteration of the ; for loop. diff --git a/llvm/test/Analysis/BasicAA/recphi.ll b/llvm/test/Analysis/BasicAA/recphi.ll index bdd85c8f0e6c..130058c74560 100644 --- a/llvm/test/Analysis/BasicAA/recphi.ll +++ b/llvm/test/Analysis/BasicAA/recphi.ll @@ -1,4 +1,4 @@ -; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s +; RUN: opt < %s -basic-aa -aa-eval -print-all-alias-modref-info -basic-aa-recphi -disable-output 2>&1 | FileCheck %s ; CHECK-LABEL: Function: simple: 5 pointers, 0 call sites ; CHECK: NoAlias: float* %src1, float* %src2 From llvm-commits at lists.llvm.org Fri Jul 10 05:53:37 2020 From: llvm-commits at lists.llvm.org (Ilya Leoshkevich via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:53:37 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <9f664aee55d71d97ceb9e85a13e13f2c@localhost.localdomain> iii updated this revision to Diff 277007. iii added a comment. I've added the warning. On my s390 it looks like this when building kselftests: CLANG kselftest/bpf/tools/build/bpftool/pid_iter.bpf.o Type `float` is not supported by BTF, emitting `unsigned char[4]` instead Type `double` is not supported by BTF, emitting `unsigned char[8]` instead I don't think existing progs are using any of these fields, but it might be beneficial to be able to at least memcpy their contents to e.g. a perf event buffer for later inspection. 
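
To make the memcpy remark concrete, here is a small host-side sketch of what describing a `float` field as `unsigned char[4]` still allows. It is only an illustration under stated assumptions: `Sample`, `SampleBytes`, `exportSample` and `decodeLoad` are hypothetical names, and the code is ordinary C++ rather than a BPF program.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical record as the program's source declares it.
struct Sample {
  int pid;
  float load;
};

// Hypothetical view of the same record when the float member is described
// as a 4-byte array, as in the warning text quoted above.
struct SampleBytes {
  int pid;
  unsigned char load[4];
};

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(sizeof(Sample) == sizeof(SampleBytes), "sizes must match");
static_assert(offsetof(Sample, load) == offsetof(SampleBytes, load),
              "field offsets must match");

// Copy the record out byte for byte, without interpreting the float.
SampleBytes exportSample(const Sample &s) {
  SampleBytes out;
  std::memcpy(&out, &s, sizeof out);
  return out;
}

// A consumer that knows the real type can recover the value afterwards.
float decodeLoad(const SampleBytes &b) {
  float f;
  std::memcpy(&f, b.load, sizeof f);
  return f;
}
```

The byte-array description keeps the size and offset of the field intact, which is all a plain copy needs; interpreting the value is left to whoever inspects the buffer later.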
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 Files: llvm/lib/Target/BPF/BTFDebug.cpp llvm/lib/Target/BPF/BTFDebug.h llvm/test/CodeGen/BPF/BTF/double.ll llvm/test/CodeGen/BPF/BTF/float.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83289.277007.patch Type: text/x-patch Size: 11842 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 05:54:07 2020 From: llvm-commits at lists.llvm.org (Igor Kudrin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:54:07 +0000 (UTC) Subject: [PATCH] D83549: [ELF] Do not force bringing out symbols passed by -init and -fini. Message-ID: ikudrin created this revision. ikudrin added reviewers: ruiu, MaskRay, grimar. ikudrin added projects: LLVM, lld. Herald added subscribers: arichardson, emaste. Herald added a reviewer: espindola. After D69985 , symbols for `-init` and `-fini` were unconditionally marked as used even if they were just lazy symbols seen when scanning archives. That resulted in exposing them in the symbol table of an output file, as Undefined, which added unwanted dependencies. The patch fixes the issue by checking the kind of the symbols before marking. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83549 Files: lld/ELF/Driver.cpp lld/test/ELF/archive-init-fini.s Index: lld/test/ELF/archive-init-fini.s =================================================================== --- /dev/null +++ lld/test/ELF/archive-init-fini.s @@ -0,0 +1,16 @@ +# REQUIRES: x86 + +## This checks that LLD does not add "_init" and "_fini" symbols into +## the symbol table of the output binary if the symbols are encountered in +## an archive but not in fact used in input files. 
+ +# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t +# RUN: rm -f %t.a +# RUN: llvm-ar rcs %t.a %t +# RUN: ld.lld -shared -m elf_x86_64 %t.a -o %t.out +# RUN: llvm-nm %t.out | \ +# RUN: FileCheck %s --implicit-check-not=_init --implicit-check-not=_fini + +.global _init, _fini +_init: +_fini: Index: lld/ELF/Driver.cpp =================================================================== --- lld/ELF/Driver.cpp +++ lld/ELF/Driver.cpp @@ -1944,9 +1944,9 @@ handleUndefinedGlob(pat); // Mark -init and -fini symbols so that the LTO doesn't eliminate them. - if (Symbol *sym = symtab->find(config->init)) + if (Symbol *sym = dyn_cast_or_null(symtab->find(config->init))) sym->isUsedInRegularObj = true; - if (Symbol *sym = symtab->find(config->fini)) + if (Symbol *sym = dyn_cast_or_null(symtab->find(config->fini))) sym->isUsedInRegularObj = true; // If any of our inputs are bitcode files, the LTO code generator may create -------------- next part -------------- A non-text attachment was scrubbed... Name: D83549.277006.patch Type: text/x-patch Size: 1369 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 05:54:47 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 12:54:47 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <661643f0fb360691a9653cee6387f76e@localhost.localdomain> dmgreen added a comment. Thanks. I was just trying to come up with a reproducer that used negative gep constants! But the example I was trying didn't seem to be showing the same problem. It does make sense that could cause problems though. I reverted the option as rGe1135b486aae and will try to put together a fix for the issue. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 06:15:57 2020 From: llvm-commits at lists.llvm.org (Stephen Tozer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:15:57 +0000 (UTC) Subject: [PATCH] D82363: [DebugInfo] Add new instruction and expression operator for variadic debug values In-Reply-To: References: Message-ID: <3dca7f18d393fa8e0a20464a206f6c70@localhost.localdomain> StephenTozer updated this revision to Diff 277015. StephenTozer added a comment. Rename appendOpsToLoc->appendOpsToArg to fit the rest of the names, update isUndefDebugValue check to check all operands (pulled ahead from a patch further up the stack). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82363/new/ https://reviews.llvm.org/D82363 Files: llvm/include/llvm/BinaryFormat/Dwarf.h llvm/include/llvm/CodeGen/MachineInstr.h llvm/include/llvm/CodeGen/MachineInstrBuilder.h llvm/include/llvm/IR/DebugInfoMetadata.h llvm/include/llvm/Support/TargetOpcodes.def llvm/include/llvm/Target/Target.td llvm/lib/BinaryFormat/Dwarf.cpp llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/InlineSpiller.cpp llvm/lib/CodeGen/LiveRangeShrink.cpp llvm/lib/CodeGen/MIRParser/MIParser.cpp llvm/lib/CodeGen/MachineInstr.cpp llvm/lib/CodeGen/MachineRegisterInfo.cpp llvm/lib/CodeGen/PrologEpilogInserter.cpp llvm/lib/CodeGen/RegAllocFast.cpp llvm/lib/IR/DebugInfoMetadata.cpp llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp llvm/lib/Target/X86/X86OptimizeLEAs.cpp llvm/test/CodeGen/MIR/Generic/dbg-value-list-spill.mir llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82363.277015.patch Type: text/x-patch Size: 50080 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 06:17:14 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:17:14 +0000 (UTC) Subject: [PATCH] D83246: [Attributor] use liveness information from AAIsDead in AAReachability and cache query results In-Reply-To: References: Message-ID: <30e9cf3a54ede6cb92eed211eebbb892@localhost.localdomain> okura added a comment. On second thoughts, I think so too about the licenses check. Could you tell me where you think we can use the liveness information in this comment? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83246/new/ https://reviews.llvm.org/D83246 From llvm-commits at lists.llvm.org Fri Jul 10 06:23:47 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:23:47 +0000 (UTC) Subject: [PATCH] D83554: [llvm-readobj] - Stop using unwrapOrError() for all program_headers() calls. Message-ID: grimar created this revision. grimar added reviewers: jhenderson, MaskRay. Herald added subscribers: rupprecht, emaste. Herald added a reviewer: espindola. Herald added a project: LLVM. program_headers() returns the list of program headers. This change allows dumping to continue when something is wrong with the program headers. https://reviews.llvm.org/D83554 Files: llvm/test/Object/invalid.test llvm/test/tools/llvm-readobj/ELF/dynamic-tags.test llvm/test/tools/llvm-readobj/ELF/gnu-notes.test llvm/test/tools/llvm-readobj/ELF/gnu-phdrs.test llvm/test/tools/llvm-readobj/ELF/gnu-section-mapping.test llvm/test/tools/llvm-readobj/ELF/program-headers.test llvm/tools/llvm-readobj/ELFDumper.cpp -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83554.277017.patch Type: text/x-patch Size: 23717 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 06:25:54 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:25:54 +0000 (UTC) Subject: [PATCH] D77152: [SelectionDAG] Better legalization for FSHL and FSHR In-Reply-To: References: Message-ID: <13bdc386a5b75ca91b06f690a2a8c2b0@localhost.localdomain> RKSimon added a comment. reverse ping - any luck on the amdgpu regressions? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77152/new/ https://reviews.llvm.org/D77152 From llvm-commits at lists.llvm.org Fri Jul 10 06:26:15 2020 From: llvm-commits at lists.llvm.org (Richard Barton via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:26:15 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran In-Reply-To: References: Message-ID: <98adb725beaf6b1606bce942b756373c@localhost.localdomain> richard.barton.arm added a comment. Thanks for picking up the work on this one @AlexisPerry! I have replied to the RFC (late - my apologies) with a slightly different proposal. I'd like to make a final decision at the developers call on Monday. I don't hold my position too strongly though and would be happy to go in this direction. I also think we can go ahead with this patch because it moves us to a better place than we are now. Even if we chose my proposal to have no default most of this patch would still be kept because having gfortran support is a useful feature anyway. On the patch itself, I have two requests: 1. This patch seems to be doing a couple of different things now that don't seem to be strictly linked: - Switching the default F18_FC to gfortran - Add support for a number of new gfortran options I think these would be better submitted as two patches. Do you agree? 2. 
The patch should add a regression test for these fallback and option translation behaviours in test/Driver. It is a shame there are none already. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 From llvm-commits at lists.llvm.org Fri Jul 10 06:36:37 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:36:37 +0000 (UTC) Subject: [PATCH] D79625: [PowerPC] Extend .reloc directive on PowerPC In-Reply-To: References: Message-ID: <3def96714120f32a5403b977e336b4d5@localhost.localdomain> nemanjai accepted this revision. nemanjai added a comment. This revision is now accepted and ready to land. My remaining comments are minor and can be addressed on the commit. AFAICT, Sean's comments have been addressed as well so I think this is good to go. @sfertile if you have any objections, feel free to override my approval, otherwise this is good to go. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp:111 +static bool getOffsetFromBinaryExpr(const MCBinaryExpr &BinExpr, + uint64_t &Offset, MCDataFragment **DF) { + const MCExpr *LHS = BinExpr.getLHS(); ---------------- Please favour reference-to-pointer rather than pointer-to-pointer as I believe the former is the convention. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp:183 + bool HaveOffset = getOffsetFromBinaryExpr(BinExpr, ComputedOffset, &DF); + assert(HaveOffset && "Unable to get the offset of the binary expression."); + assert(DF && "Expected a valid data fragment."); ---------------- I think you'll need `(void)HaveOffset;` to silence warnings on no asserts builds. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp:212 + + // If the symbol expression is not a binary expression we can let the base + // class handle the issue. ---------------- Is this comment actually true? 
It seems that if the expression is not a binary expression, we assert. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp:223 + bool HaveOffset = getOffsetFromBinaryExpr(BinExpr, ComputedOffset, &DF); + assert(HaveOffset && "Unable to get the offset of the binary expression."); + assert(DF && "Expected a valid data fragment."); ---------------- Similar to above wrt. unused variable without asserts. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79625/new/ https://reviews.llvm.org/D79625 From llvm-commits at lists.llvm.org Fri Jul 10 06:37:40 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:37:40 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: LukeGeeson updated this revision to Diff 277019. LukeGeeson added a comment. - Added FP16 to Cortex-X1 set of features CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83206.277019.patch Type: text/x-patch Size: 21296 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 06:45:08 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:45:08 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: clementval updated this revision to Diff 277021. clementval marked 6 inline comments as done. clementval added a comment. Address review comment about NoWait in target enter/exit data Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 Files: flang/lib/Semantics/check-omp-structure.cpp flang/lib/Semantics/check-omp-structure.h flang/test/Semantics/omp-clause-validity01.f90 llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83326.277021.patch Type: text/x-patch Size: 75242 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 06:45:34 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:45:34 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: clementval added a comment. Thanks for the review. I just updated the patch and answered your questions.
================ Comment at: flang/test/Semantics/omp-clause-validity01.f90:457 - !ERROR: REDUCTION clause is not allowed on the TASKLOOP SIMD directive !$omp taskloop simd reduction(+:a) ---------------- ichoyjx wrote: > clementval wrote: > > As a side note, This is supposed to be fine in Clang so I removed the check. I looked at the OpenMP 5.0 std and didn't see a restriction on `reduction` for `task loop simd`. > What's the current plan? Are we trying to cover OpenMP 5.0 Spec for semantics (it appears so)? Clang just moved to 5.0 as the default, and my guess is that we are targeting 5.0 as well since it is the current standard. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:229 VersionedClause, VersionedClause, VersionedClause ---------------- ichoyjx wrote: > Bear with me, what does 50 mean? The `VersionedClause` is defined as this: `VersionedClause` So here it means the clause is valid from version 5.0 and up. This is currently not used in Flang but in Clang it's used and I took the value from the old macros definition `OMPKinds.def`. So version 4.5 would be 45 and so on. ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:427 VersionedClause, - VersionedClause, - VersionedClause + VersionedClause + ]; ---------------- ichoyjx wrote: > For `target enter` and `target exit`, `nowait` is only allowed once. If it's allowed here, will this restriction be captured by the rules in `target` directive above? Good catch. I updated the patch and moved them into the correct set.
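The version encoding described above (5.0 stored as 50, 4.5 as 45) can be sketched in C++ as follows. This only illustrates the convention and is not the TableGen-generated code; `ClauseVersionRange`, `encodeVersion` and `isAllowedInVersion` are hypothetical names, and the inclusive minimum/maximum range is an assumption made for the example.

```cpp
#include <climits>

// Hypothetical model of a versioned clause. OpenMP versions are encoded as
// major * 10 + minor, so 4.5 becomes 45 and 5.0 becomes 50.
struct ClauseVersionRange {
  int MinVersion = 1;       // first version in which the clause is valid
  int MaxVersion = INT_MAX; // last version in which the clause is valid
};

// Encode a (major, minor) pair in the two-digit form used above.
int encodeVersion(int Major, int Minor) { return Major * 10 + Minor; }

// A clause is accepted when the active version lies inside its range.
bool isAllowedInVersion(const ClauseVersionRange &Range, int Version) {
  return Range.MinVersion <= Version && Version <= Range.MaxVersion;
}
```

With a range whose minimum is 50, a check against version 4.5 (encoded 45) is rejected while 5.0 and later are accepted.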
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 From llvm-commits at lists.llvm.org Fri Jul 10 06:47:04 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:47:04 +0000 (UTC) Subject: [PATCH] D83488: [flang] Change the default F18_FC to gfortran In-Reply-To: References: Message-ID: <94ffc588baf7294a00b08b145d4133a0@localhost.localdomain> clementval added a comment. +1 to split this in two patches Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83488/new/ https://reviews.llvm.org/D83488 From llvm-commits at lists.llvm.org Fri Jul 10 06:48:36 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 13:48:36 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <989c73eb2652020dd6202ea3b9a2dcee@localhost.localdomain> dmgreen added inline comments. ================ Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:135 + (AArch64::AEK_FP16 | AArch64::AEK_DOTPROD | AArch64::AEK_RCPC | + AArch64::AEK_SSBS | AArch64::AEK_RAS)) AARCH64_CPU_NAME("neoverse-e1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, ---------------- AEK_RAS will be included in ARMV8_2A, I believe. ================ Comment at: llvm/include/llvm/Support/ARMTargetParser.def:298 +ARM_CPU_NAME("cortex-a78",ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, + (ARM::AEK_RAS | ARM::AEK_DOTPROD)) +ARM_CPU_NAME("cortex-x1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, ---------------- This one too I think. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 From llvm-commits at lists.llvm.org Fri Jul 10 07:03:04 2020 From: llvm-commits at lists.llvm.org (Bardia Mahjour via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:03:04 +0000 (UTC) Subject: [PATCH] D83543: [CodeMoverUtils] Add more data dependency related test case In-Reply-To: References: Message-ID: bmahjour added inline comments. ================ Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:697 + // Flow backward dependency + EXPECT_TRUE(isSafeToMoveBefore(*LoadA0, *StoreA1, DT, &PDT, &DI)); + // Anti forward dependency ---------------- This should be EXPECT_FALSE! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83543/new/ https://reviews.llvm.org/D83543 From llvm-commits at lists.llvm.org Fri Jul 10 07:12:38 2020 From: llvm-commits at lists.llvm.org (Sidharth Baveja via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:12:38 +0000 (UTC) Subject: [PATCH] D80580: [NFC] Separate Peeling Properties into its own struct In-Reply-To: References: Message-ID: <0edd96d15a0dde71dec2b5a6020e5620@localhost.localdomain> sidbav updated this revision to Diff 277025. sidbav added reviewers: anhtuyen, nikic. sidbav added a comment. Applying the patch in its previous state resulted in build failures. Updated patch to resolve all build failures. Major change is in the file `llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp` in the `gatherPeelingPreferences` function. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80580/new/ https://reviews.llvm.org/D80580 Files: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D80580.277025.patch Type: text/x-patch Size: 31214 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:12:47 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:12:47 +0000 (UTC) Subject: [PATCH] D83504: [PowerPC] Implement R_PPC64_REL24_NOTOC local calls. callee has a TOC In-Reply-To: References: Message-ID: NeHuang updated this revision to Diff 277024. NeHuang marked 4 inline comments as done. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83504/new/ https://reviews.llvm.org/D83504 Files: lld/ELF/Arch/PPC64.cpp lld/ELF/Target.h lld/ELF/Thunks.cpp lld/test/ELF/ppc64-pcrel-call-to-toc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83504.277024.patch Type: text/x-patch Size: 7253 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:13:59 2020 From: llvm-commits at lists.llvm.org (Victor Huang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:13:59 +0000 (UTC) Subject: [PATCH] D83504: [PowerPC] Implement R_PPC64_REL24_NOTOC local calls. callee has a TOC In-Reply-To: References: Message-ID: <96f18110187b30e8941b989c8ce10cfd@localhost.localdomain> NeHuang added a comment. Addressed comments from MaskRay. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83504/new/ https://reviews.llvm.org/D83504 From llvm-commits at lists.llvm.org Fri Jul 10 07:16:58 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:16:58 +0000 (UTC) Subject: [PATCH] D83557: [DebugInfo] Simplify DwarfDebug::emitMacro Message-ID: dstenb created this revision. dstenb added reviewers: SouraVX, dblaikie, ikudrin. dstenb added a project: debug-info. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Broken out from a review comment on D82975. This is NFC except that the Macinfo macro string is now emitted using a single emitBytes() invocation, so it can be done using a single string directive.
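The single-string idea can be sketched outside of LLVM with plain `std::string`. This is an illustration under stated assumptions, not the code from the patch: `buildMacroString` is a hypothetical helper, and it stands in for computing the text once before handing it to whichever emission path is used.

```cpp
#include <string>

// Build the text for a macro entry: "NAME VALUE" for a define that has a
// value, or just "NAME" when there is no value (as for an undef entry).
std::string buildMacroString(const std::string &Name,
                             const std::string &Value) {
  return Value.empty() ? Name : Name + " " + Value;
}
```

Computing the string once means every caller emits the same bytes, rather than assembling name, space and value separately in each branch.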
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83557 Files: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp Index: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp =================================================================== --- llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp +++ llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp @@ -2996,6 +2996,10 @@ StringRef Name = M.getName(); StringRef Value = M.getValue(); + // There should be one space between the macro name and the macro value in + // define entries. In undef entries, only the macro name is emitted. + std::string Str = Value.empty() ? Name.str() : (Name + " " + Value).str(); + if (UseDebugMacroSection) { unsigned Type = M.getMacinfoType() == dwarf::DW_MACINFO_define ? dwarf::DW_MACRO_define_strx @@ -3005,29 +3009,15 @@ Asm->OutStreamer->AddComment("Line Number"); Asm->emitULEB128(M.getLine()); Asm->OutStreamer->AddComment("Macro String"); - if (!Value.empty()) - Asm->emitULEB128(this->InfoHolder.getStringPool() - .getIndexedEntry(*Asm, (Name + " " + Value).str()) - .getIndex()); - else - // DW_MACRO_undef_strx doesn't have a value, so just emit the macro - // string. - Asm->emitULEB128(this->InfoHolder.getStringPool() - .getIndexedEntry(*Asm, (Name).str()) - .getIndex()); + Asm->emitULEB128( + InfoHolder.getStringPool().getIndexedEntry(*Asm, Str).getIndex()); } else { Asm->OutStreamer->AddComment(dwarf::MacinfoString(M.getMacinfoType())); Asm->emitULEB128(M.getMacinfoType()); Asm->OutStreamer->AddComment("Line Number"); Asm->emitULEB128(M.getLine()); Asm->OutStreamer->AddComment("Macro String"); - Asm->OutStreamer->emitBytes(Name); - if (!Value.empty()) { - // There should be one space between macro name and macro value. - Asm->emitInt8(' '); - Asm->OutStreamer->AddComment("Macro Value="); - Asm->OutStreamer->emitBytes(Value); - } + Asm->OutStreamer->emitBytes(Str); Asm->emitInt8('\0'); } } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83557.277029.patch Type: text/x-patch Size: 2030 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:19:31 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:19:31 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: <6a5bd95f1e8d6869469eac2b277b3589@localhost.localdomain> dstenb updated this revision to Diff 277030. dstenb marked 2 inline comments as done. dstenb added a comment. Rebase and address review comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 Files: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/test/DebugInfo/X86/debug-macro-gnu.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82975.277030.patch Type: text/x-patch Size: 8112 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:20:20 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:20:20 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <2643c394554079cff7ff46527cbeed37@localhost.localdomain> LukeGeeson updated this revision to Diff 277032. LukeGeeson added a comment. 
- removed RAS as it's in 8.2a CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83206.277032.patch Type: text/x-patch Size: 21262 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:21:56 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:21:56 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: <223f6e5a5805b89663258f433f5421eb@localhost.localdomain> dstenb marked 2 inline comments as done. dstenb added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3036-3048 + if (!Value.empty()) + // FIXME: Add support for DWARF64. + Asm->OutStreamer->emitSymbolValue( + this->InfoHolder.getStringPool() + .getEntry(*Asm, (Name + " " + Value).str()) + .getSymbol(), + /*Size=*/4); ---------------- dblaikie wrote: > Might be nice to refactor this in both the original codepath and the new codepath you're adding (either before or after this commit) to compute the string once & share the rest of this expression.. > ``` > std::string Str = Value.empty() ? 
Name.str() : (Name + ' ' + Value).str(); > Asm->OutStreamer->emitSymbol(this->InfoHolder.getStringPool().getEntry(*Asm, Str).getSymbol(), 4); > ``` > Yes, good idea! I split that out to the preparatory patch D83557. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3088-3091 + dwarf::DW_MACRO_end_file, [&](unsigned Form) { + return (getDwarfVersion() >= 5) + ? dwarf::MacroString(Form) + : dwarf::GnuMacroString(Form); ---------------- dblaikie wrote: > Looks like maybe this could skip the std::function_ref, and do this: > ``` > emitMacroFileImpl(F, U, dwarf::DW_MACRO_start_file, > dwarf::DW_MACRO_end_file, > (getDwarfVersion() >= 5) > ? dwarf::MacroString > : dwarf::GnuMacroString); > ``` Oh, thanks, of course! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Fri Jul 10 07:22:16 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:22:16 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: LukeGeeson updated this revision to Diff 277033. LukeGeeson marked an inline comment as done. LukeGeeson added a comment. 
- removed missed RAS CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83206.277033.patch Type: text/x-patch Size: 21247 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:26:24 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:26:24 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <86e1244fe3d5a675aa5ab0e5f154c86c@localhost.localdomain> LukeGeeson updated this revision to Diff 277036. LukeGeeson added a comment. 
- Added FP16 to ARM a78 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83206.277036.patch Type: text/x-patch Size: 21278 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:26:40 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:26:40 +0000 (UTC) Subject: [PATCH] D83497: [PowerPC][Power10] Fix the VINSW instruction to have an i32 argument. In-Reply-To: References: Message-ID: <070b32b6e35d08365a71cc7ed6c3fda4@localhost.localdomain> amyk marked an inline comment as done. amyk added inline comments. ================ Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:523 Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_i64_ty, llvm_v4i32_ty], [IntrNoMem]>; ---------------- rzurob wrote: > The same problem also occurs in vins[bhw][lr] Ah, I missed that. Thank you for bringing that to my attention. I'll look into it. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83497/new/ https://reviews.llvm.org/D83497 From llvm-commits at lists.llvm.org Fri Jul 10 07:28:32 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:28:32 +0000 (UTC) Subject: [PATCH] D83246: [Attributor] use liveness information from AAIsDead in AAReachability and cache query results In-Reply-To: References: Message-ID: okura marked an inline comment as done. okura added inline comments. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:729 + return Result; + } + ---------------- jdoerfert wrote: > Please add documentation and consider taking the instructions as references. > > Nit: Move `F` after the first check to shorten the lifetime (and avoid confusion). > Do you expect that I change interfaces of AAReachability to take instructions as references too? Could you tell me the advantage of taking as references compared to taking as pointers? I don't mean I want to stick to pointers, just want to know about it. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83246/new/ https://reviews.llvm.org/D83246 From llvm-commits at lists.llvm.org Fri Jul 10 07:42:54 2020 From: llvm-commits at lists.llvm.org (Whitney Tsang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:42:54 +0000 (UTC) Subject: [PATCH] D83543: [CodeMoverUtils] Add more data dependency related test case In-Reply-To: References: Message-ID: Whitney added inline comments. ================ Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:698 + EXPECT_TRUE(isSafeToMoveBefore(*LoadA0, *StoreA1, DT, &PDT, &DI)); + // Anti forward dependency + EXPECT_FALSE(isSafeToMoveBefore(*StoreA1, *StoreB0, DT, &PDT, &DI)); ---------------- should this be flow dependency? as the dependency is still read after write. 
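For reference, the terminology in the comment above can be sketched as a small C++ classifier. This is a generic illustration of the usual definitions (flow = read after write, anti = write after read, output = write after write), not code from `CodeMoverUtils` or the dependence analysis; `AccessKind`, `DependenceKind` and `classify` are hypothetical names.

```cpp
enum class AccessKind { Read, Write };
enum class DependenceKind { None, Flow, Anti, Output };

// Classify the dependence between two accesses to the same location, where
// `First` executes before `Second` in program order.
DependenceKind classify(AccessKind First, AccessKind Second) {
  if (First == AccessKind::Write && Second == AccessKind::Read)
    return DependenceKind::Flow; // read after write
  if (First == AccessKind::Read && Second == AccessKind::Write)
    return DependenceKind::Anti; // write after read
  if (First == AccessKind::Write && Second == AccessKind::Write)
    return DependenceKind::Output; // write after write
  return DependenceKind::None; // two reads do not conflict
}
```

Under these definitions the kind is determined by which access is the write and which is the read in program order, which is the distinction the review comment is asking about.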
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83543/new/ https://reviews.llvm.org/D83543 From llvm-commits at lists.llvm.org Fri Jul 10 07:45:22 2020 From: llvm-commits at lists.llvm.org (Hans Wennborg via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:45:22 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: <6122552963973b3145977f48ea8daf55@localhost.localdomain> hans added a comment. Apologies for not having a better repro of the Chromium/libpng problem. Once there's a fix, I'd be happy to try it out in our build if you'd like. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 07:49:52 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Fri, 10 Jul 2020 07:49:52 -0700 (PDT) Subject: [llvm] ef0ecb7 - [NFCI][InstCombine] PR46661: multiple stores eligible for merging into successor - worklist issue Message-ID: <5f088010.1c69fb81.28025.e2a5@mx.google.com> Author: Roman Lebedev Date: 2020-07-10T17:49:16+03:00 New Revision: ef0ecb7b03332fd5076a7ea641eadf0d01cd32c0 URL: https://github.com/llvm/llvm-project/commit/ef0ecb7b03332fd5076a7ea641eadf0d01cd32c0 DIFF: https://github.com/llvm/llvm-project/commit/ef0ecb7b03332fd5076a7ea641eadf0d01cd32c0.diff LOG: [NFCI][InstCombine] PR46661: multiple stores eligible for merging into successor - worklist issue The testcase should pass with a single instcombine iteration. 
Added: llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll Modified: Removed: ################################################################################ diff --git a/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll new file mode 100644 index 000000000000..f04897b6fb66 --- /dev/null +++ b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll @@ -0,0 +1,74 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt %s -instcombine -instcombine-infinite-loop-threshold=4 -S | FileCheck %s + +@var_7 = external global i8, align 1 +@var_1 = external global i32, align 4 +@var_0 = external global i16, align 2 +@var_5 = external global i64, align 8 +@arr_2 = external global [0 x i32], align 4 +@arr_4 = external global [0 x i16], align 2 +@arr_3 = external global [8 x i32], align 16 + +define void @_Z4testv() { +; CHECK-LABEL: @_Z4testv( +; CHECK-NEXT: bb: +; CHECK-NEXT: [[I:%.*]] = load i8, i8* @var_7, align 1 +; CHECK-NEXT: [[I1:%.*]] = icmp eq i8 [[I]], -1 +; CHECK-NEXT: [[I4:%.*]] = load i16, i16* @var_0, align 2 +; CHECK-NEXT: [[I8:%.*]] = sext i16 [[I4]] to i32 +; CHECK-NEXT: br i1 [[I1]], label [[BB10:%.*]], label [[BB9:%.*]] +; CHECK: bb9: +; CHECK-NEXT: br label [[BB12:%.*]] +; CHECK: bb10: +; CHECK-NEXT: [[I2:%.*]] = load i32, i32* @var_1, align 4 +; CHECK-NEXT: [[I3:%.*]] = icmp eq i32 [[I2]], 0 +; CHECK-NEXT: [[I6:%.*]] = load i64, i64* @var_5, align 8 +; CHECK-NEXT: [[I5:%.*]] = sext i16 [[I4]] to i64 +; CHECK-NEXT: [[I7:%.*]] = select i1 [[I3]], i64 [[I6]], i64 [[I5]] +; CHECK-NEXT: [[I11:%.*]] = trunc i64 [[I7]] to i32 +; CHECK-NEXT: br label [[BB12]] +; CHECK: bb12: +; CHECK-NEXT: [[STOREMERGE1:%.*]] = phi i32 [ [[I11]], [[BB10]] ], [ 1, [[BB9]] ] +; CHECK-NEXT: [[STOREMERGE:%.*]] = phi i32 [ [[I11]], [[BB10]] ], [ 1, [[BB9]] ] +; CHECK-NEXT: store i32 
[[STOREMERGE1]], i32* getelementptr inbounds ([0 x i32], [0 x i32]* @arr_2, i64 0, i64 0), align 4 +; CHECK-NEXT: store i16 [[I4]], i16* getelementptr inbounds ([0 x i16], [0 x i16]* @arr_4, i64 0, i64 0), align 2 +; CHECK-NEXT: store i32 [[I8]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @arr_3, i64 0, i64 0), align 16 +; CHECK-NEXT: store i32 [[STOREMERGE]], i32* getelementptr inbounds ([0 x i32], [0 x i32]* @arr_2, i64 0, i64 1), align 4 +; CHECK-NEXT: store i16 [[I4]], i16* getelementptr inbounds ([0 x i16], [0 x i16]* @arr_4, i64 0, i64 1), align 2 +; CHECK-NEXT: store i32 [[I8]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @arr_3, i64 0, i64 1), align 4 +; CHECK-NEXT: ret void +; +bb: + %i = load i8, i8* @var_7, align 1 + %i1 = icmp eq i8 %i, -1 + %i2 = load i32, i32* @var_1, align 4 + %i3 = icmp eq i32 %i2, 0 + %i4 = load i16, i16* @var_0, align 2 + %i5 = sext i16 %i4 to i64 + %i6 = load i64, i64* @var_5, align 8 + %i7 = select i1 %i3, i64 %i6, i64 %i5 + %i8 = sext i16 %i4 to i32 + br i1 %i1, label %bb10, label %bb9 + +bb9: ; preds = %bb + store i32 1, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @arr_2, i64 0, i64 0), align 4 + store i16 %i4, i16* getelementptr inbounds ([0 x i16], [0 x i16]* @arr_4, i64 0, i64 0), align 2 + store i32 %i8, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @arr_3, i64 0, i64 0), align 4 + store i32 1, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @arr_2, i64 0, i64 1), align 4 + store i16 %i4, i16* getelementptr inbounds ([0 x i16], [0 x i16]* @arr_4, i64 0, i64 1), align 2 + store i32 %i8, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @arr_3, i64 0, i64 1), align 4 + br label %bb12 + +bb10: ; preds = %bb + %i11 = trunc i64 %i7 to i32 + store i32 %i11, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @arr_2, i64 0, i64 0), align 4 + store i16 %i4, i16* getelementptr inbounds ([0 x i16], [0 x i16]* @arr_4, i64 0, i64 0), align 2 + store i32 %i8, i32* getelementptr inbounds ([8 x i32], [8 x 
i32]* @arr_3, i64 0, i64 0), align 4 + store i32 %i11, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @arr_2, i64 0, i64 1), align 4 + store i16 %i4, i16* getelementptr inbounds ([0 x i16], [0 x i16]* @arr_4, i64 0, i64 1), align 2 + store i32 %i8, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @arr_3, i64 0, i64 1), align 4 + br label %bb12 + +bb12: ; preds = %bb10, %bb9 + ret void +} From llvm-commits at lists.llvm.org Fri Jul 10 07:49:54 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Fri, 10 Jul 2020 07:49:54 -0700 (PDT) Subject: [llvm] 2655a70 - [InstCombine] After merging store into successor, queue prev. store to be visited (PR46661) Message-ID: <5f088012.1c69fb81.b8be5.e857@mx.google.com> Author: Roman Lebedev Date: 2020-07-10T17:49:16+03:00 New Revision: 2655a70a046b67abe3bca01059d8767030f6e1c9 URL: https://github.com/llvm/llvm-project/commit/2655a70a046b67abe3bca01059d8767030f6e1c9 DIFF: https://github.com/llvm/llvm-project/commit/2655a70a046b67abe3bca01059d8767030f6e1c9.diff LOG: [InstCombine] After merging store into successor, queue prev. store to be visited (PR46661) We can end up with many stores eligible for the transform, but because of our visitation order (top to bottom), once we have processed the first eligible instruction we do not try to reprocess the previous instructions that are now also eligible. So after we've successfully merged a store that was the second-to-last instruction into the successor, if the now-second-to-last instruction is also such an eligible store, add it to the worklist to be revisited. 
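The visitation-order problem described in this log message can be reproduced with a toy model. The sketch below is not the InstCombine implementation: `Inst`, `Block`, `makeBlock` and `sinkStoresOneIteration` are hypothetical names, a block is just a list of opcode strings ending in an unconditional "br", and "sinking" a store is modelled as erasing it from the block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy instruction: a stable id plus an opcode name ("store", "bitcast", "dbg", "br").
struct Inst { int Id; std::string Op; };
using Block = std::vector<Inst>;

// Plays the role of the patch's IsNoopInstrForStoreMerging in this model.
static bool isNoop(const Inst &I) { return I.Op == "dbg" || I.Op == "bitcast"; }

// Builds a block with sequential ids from a list of opcode names.
Block makeBlock(const std::vector<std::string> &Ops) {
  Block BB;
  for (std::size_t I = 0; I < Ops.size(); ++I)
    BB.push_back({static_cast<int>(I), Ops[I]});
  return BB;
}

// True if only no-op instructions separate position Pos from a trailing "br".
static bool isSecondToLast(const Block &BB, std::size_t Pos) {
  if (BB.empty() || BB.back().Op != "br" || Pos + 1 >= BB.size())
    return false;
  for (std::size_t I = Pos + 1; I + 1 < BB.size(); ++I)
    if (!isNoop(BB[I]))
      return false;
  return true;
}

// Runs one top-to-bottom worklist iteration and returns how many stores were
// sunk. With RequeuePrevStore set, a successful sink re-adds the store that
// has just become second-to-last, as the patch does with Worklist.add().
std::size_t sinkStoresOneIteration(Block BB, bool RequeuePrevStore) {
  std::deque<int> Worklist;
  for (const Inst &I : BB)
    Worklist.push_back(I.Id);

  std::size_t Sunk = 0;
  while (!Worklist.empty()) {
    int Id = Worklist.front();
    Worklist.pop_front();
    auto It = std::find_if(BB.begin(), BB.end(),
                           [Id](const Inst &I) { return I.Id == Id; });
    if (It == BB.end() || It->Op != "store")
      continue;
    if (!isSecondToLast(BB, static_cast<std::size_t>(It - BB.begin())))
      continue;
    BB.erase(It); // model "merged into successor"
    ++Sunk;
    assert(!BB.empty() && "the terminator must still be in the block");
    if (!RequeuePrevStore)
      continue;
    // Walk back from the terminator over no-ops to the new second-to-last.
    std::size_t J = BB.size() - 1;
    while (J > 0 && isNoop(BB[J - 1]))
      --J;
    if (J > 0 && BB[J - 1].Op == "store")
      Worklist.push_back(BB[J - 1].Id);
  }
  return Sunk;
}
```

In this model, a block of the form store, store, bitcast, store, br loses only its last store in one iteration without requeueing, and all three stores with it, which mirrors why the test's infinite-loop threshold could be lowered from 4 to 2.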
Fixes https://bugs.llvm.org/show_bug.cgi?id=46661 Added: Modified: llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp index 17f33be374e1..7203850ad24d 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp @@ -1425,18 +1425,33 @@ Instruction *InstCombiner::visitStoreInst(StoreInst &SI) { if (isa<UndefValue>(Val)) return eraseInstFromFunction(SI); + auto IsNoopInstrForStoreMerging = [](BasicBlock::iterator BBI) { + return isa<DbgInfoIntrinsic>(BBI) || + (isa<BitCastInst>(BBI) && BBI->getType()->isPointerTy()); + }; + // If this store is the second-to-last instruction in the basic block // (excluding debug info and bitcasts of pointers) and if the block ends with // an unconditional branch, try to move the store to the successor block. BBI = SI.getIterator(); do { ++BBI; - } while (isa<DbgInfoIntrinsic>(BBI) || - (isa<BitCastInst>(BBI) && BBI->getType()->isPointerTy())); + } while (IsNoopInstrForStoreMerging(BBI)); if (BranchInst *BI = dyn_cast<BranchInst>(BBI)) if (BI->isUnconditional()) - mergeStoreIntoSuccessor(SI); + if (mergeStoreIntoSuccessor(SI)) { + // Okay, we've managed to do that. Now, let's see if now-second-to-last + // instruction is also a store that we can also sink. 
+ BasicBlock::iterator FirstInstr = BBI->getParent()->begin(); + do { + if (BBI != FirstInstr) + --BBI; + } while (BBI != FirstInstr && IsNoopInstrForStoreMerging(BBI)); + if (StoreInst *PrevStore = dyn_cast<StoreInst>(BBI)) + Worklist.add(PrevStore); + return nullptr; + } return nullptr; } diff --git a/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll index f04897b6fb66..5be341fa6228 100644 --- a/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll +++ b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll @@ -1,5 +1,5 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt %s -instcombine -instcombine-infinite-loop-threshold=4 -S | FileCheck %s +; RUN: opt %s -instcombine -instcombine-infinite-loop-threshold=2 -S | FileCheck %s @var_7 = external global i8, align 1 @var_1 = external global i32, align 4 From llvm-commits at lists.llvm.org Fri Jul 10 07:49:56 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Fri, 10 Jul 2020 07:49:56 -0700 (PDT) Subject: [llvm] 7103c87 - Reland "[InstCombine] Lower infinite combine loop detection thresholds"" Message-ID: <5f088014.1c69fb81.fdb7f.eee2@mx.google.com> Author: Roman Lebedev Date: 2020-07-10T17:49:16+03:00 New Revision: 7103c87596efccd532e9fe04a6ba6a200fed8481 URL: https://github.com/llvm/llvm-project/commit/7103c87596efccd532e9fe04a6ba6a200fed8481 DIFF: https://github.com/llvm/llvm-project/commit/7103c87596efccd532e9fe04a6ba6a200fed8481.diff LOG: Reland "[InstCombine] Lower infinite combine loop detection thresholds"" This relands commit cd7f8051ac7b6f08734102446482c1e5d951bfcc that was reverted since lower threshold have successfully found an issue. Now that the issue is fixed, let's wait until the next one is reported. This reverts commit caa423eef0d128f35ac11ddbce34964caafb61c1. 
Added: Modified: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp index d1c1e5418825..e810b3de25bc 100644 --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -123,8 +123,13 @@ STATISTIC(NumReassoc , "Number of reassociations"); DEBUG_COUNTER(VisitCounter, "instcombine-visit", "Controls which instructions are visited"); +// FIXME: these limits eventually should be as low as 2. static constexpr unsigned InstCombineDefaultMaxIterations = 1000; +#ifndef NDEBUG +static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 100; +#else static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 1000; +#endif static cl::opt<bool> EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"), From llvm-commits at lists.llvm.org Fri Jul 10 07:55:22 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:55:22 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: <55c3ae130263e0c77e9a7ecd3032dbe1@localhost.localdomain> dantrushin updated this revision to Diff 277039. dantrushin added a comment. Herald added a subscriber: MatzeB. 
Changed DerivedPtrMap and moved it to StatepointLowering.h, as it is only needed for local gc.relocate processing; rebased on tip and rewrote lowerStatepointMetaArgs as requested; deleted test mods, will add a separate test later. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 Files: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp llvm/lib/CodeGen/SelectionDAG/StatepointLowering.h llvm/lib/CodeGen/TargetLoweringBase.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D81647.277039.patch Type: text/x-patch Size: 15792 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:57:32 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:57:32 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: <21c1523fa76389d4d704505c02493694@localhost.localdomain> dantrushin updated this revision to Diff 277040. dantrushin added a comment. Changed DerivedPtrMap and moved it to StatepointLowering.h, as it is only needed for local gc.relocate processing; rebased on tip and rewrote lowerStatepointMetaArgs as requested; deleted test mods, will add a separate test later. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 Files: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp llvm/lib/CodeGen/SelectionDAG/StatepointLowering.h llvm/lib/CodeGen/TargetLoweringBase.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81648.277040.patch Type: text/x-patch Size: 15792 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 07:58:38 2020 From: llvm-commits at lists.llvm.org (Konstantin Zhuravlyov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:58:38 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: <39e64ccac7290d4b92824f5a67507193@localhost.localdomain> kzhuravl added inline comments. ================ Comment at: llvm/docs/AMDGPUUsage.rst:2321 - "ValueType" string Required Kernel argument value type. Only - present if "ValueKind" is ---------------- It looks like (according to phab diff) you removed space between "ValueType" and 'string', which will break the table. ================ Comment at: llvm/docs/AMDGPUUsage.rst:2322 + "ValueType" string Unused and deprecated. This should no longer + be emitted, but is accepted for compatability. + ---------------- compatibility ================ Comment at: llvm/docs/AMDGPUUsage.rst:2820-2842 - ".value_type" string Required Kernel argument value type. Only - present if ".value_kind" is - "by_value". For vector data - types, the value is for the - element type. Values include: - - - "struct" ---------------- Should same text apply as above? "Unused and deprecated. This should no longer..." ================ Comment at: llvm/include/llvm/Support/AMDGPUMetadata.h:82 -/// Value types. +/// Value types. This is deprecated and only remains for compatability parsing +/// of old metadata. ---------------- compatibility ================ Comment at: llvm/lib/Support/AMDGPUMetadata.cpp:115 + + // Removed. Accepted for parsing compatability, but not emitted. 
+ Optional Unused; ---------------- same as above CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 From llvm-commits at lists.llvm.org Fri Jul 10 07:58:43 2020 From: llvm-commits at lists.llvm.org (Alexandre Ganea via llvm-commits) Date: Fri, 10 Jul 2020 07:58:43 -0700 (PDT) Subject: [llvm] 23cd70d - [PDB] Fix out-of-bounds acces when sorting GSI buckets Message-ID: <5f088223.1c69fb81.2f6a9.ee59@mx.google.com> Author: Alexandre Ganea Date: 2020-07-10T10:55:27-04:00 New Revision: 23cd70d71c10dc0b31ac37a733349f9de2e9b84c URL: https://github.com/llvm/llvm-project/commit/23cd70d71c10dc0b31ac37a733349f9de2e9b84c DIFF: https://github.com/llvm/llvm-project/commit/23cd70d71c10dc0b31ac37a733349f9de2e9b84c.diff LOG: [PDB] Fix out-of-bounds acces when sorting GSI buckets When building in Debug on Windows-MSVC after b7402edce315, a lot of tests were failing because we were dereferencing an element past the end of HashRecords. This happened towards the end of the table, in unused slots. Added: Modified: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp Removed: ################################################################################ diff --git a/llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp b/llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp index ce248f34762d..4e58489f1401 100644 --- a/llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp +++ b/llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp @@ -232,8 +232,10 @@ void GSIHashStreamBuilder::finalizeBuckets( // The algorithm used here corresponds to the function // caseInsensitiveComparePchPchCchCch in the reference implementation. 
parallelForEachN(0, IPHR_HASH, [&](size_t I) { - auto B = &HashRecords[BucketStarts[I]]; - auto E = &HashRecords[BucketCursors[I]]; + auto B = HashRecords.begin() + BucketStarts[I]; + auto E = HashRecords.begin() + BucketCursors[I]; + if (B == E) + return; auto BucketCmp = [Records](const PSHashRecord &LHash, const PSHashRecord &RHash) { const BulkPublic &L = Records[uint32_t(LHash.Off)]; From llvm-commits at lists.llvm.org Fri Jul 10 07:59:12 2020 From: llvm-commits at lists.llvm.org (Alexandre Ganea via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 14:59:12 +0000 (UTC) Subject: [PATCH] D81296: [PDB] Defer public serialization until PDB writing In-Reply-To: References: Message-ID: <8a8ec9e4369c74abad63d487818b15a9@localhost.localdomain> aganea added a comment. @grimar Fixed in 23cd70d71c10dc0b31ac37a733349f9de2e9b84c Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81296/new/ https://reviews.llvm.org/D81296 From llvm-commits at lists.llvm.org Fri Jul 10 08:00:03 2020 From: llvm-commits at lists.llvm.org (Kuan Hsu Chen (Zakk) via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:00:03 +0000 (UTC) Subject: [PATCH] D71124: [RISCV] support clang driver to select cpu In-Reply-To: References: Message-ID: <8cb4ae811e002ac8ac4400abf52f785d@localhost.localdomain> khchen updated this revision to Diff 277042. khchen added a comment. fix typo Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71124/new/ https://reviews.llvm.org/D71124 Files: clang/lib/Basic/Targets/RISCV.cpp clang/lib/Basic/Targets/RISCV.h clang/lib/Driver/ToolChains/Arch/RISCV.cpp clang/lib/Driver/ToolChains/Arch/RISCV.h clang/lib/Driver/ToolChains/CommonArgs.cpp clang/test/Driver/riscv-cpus.c llvm/include/llvm/Support/RISCVTargetParser.def llvm/include/llvm/Support/TargetParser.h llvm/lib/Support/TargetParser.cpp llvm/lib/Target/RISCV/RISCV.td -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D71124.277042.patch Type: text/x-patch Size: 17920 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:04:27 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via llvm-commits) Date: Fri, 10 Jul 2020 08:04:27 -0700 (PDT) Subject: [llvm] bce8fce - [FileCheck] Implement -dump-input-context Message-ID: <5f08837b.1c69fb81.33d45.de99@mx.google.com> Author: Joel E. Denny Date: 2020-07-10T11:02:10-04:00 New Revision: bce8fced41b96260a42dfbb254240a49769fafa9 URL: https://github.com/llvm/llvm-project/commit/bce8fced41b96260a42dfbb254240a49769fafa9 DIFF: https://github.com/llvm/llvm-project/commit/bce8fced41b96260a42dfbb254240a49769fafa9.diff LOG: [FileCheck] Implement -dump-input-context This patch is motivated by discussions at each of: * * When input is dumped as specified by `-dump-input=fail`, this patch filters the dump to show only input lines that are the starting lines of error diagnostics plus the number of contextual lines specified by `-dump-input-context` (defaults to 5). When `-dump-input=always` is given, there might not be any errors, so all input lines are printed, as without this patch. Here's some sample output with `-dump-input-context=3 -vv`: ``` <<<<<< . . . 13: foo 14: foo 15: hello world check:1 ^~~~~~~~~~~ 16: foo check:2'0 X~~ error: no match found 17: foo check:2'0 ~~~ 18: foo check:2'0 ~~~ 19: foo check:2'0 ~~~ . . . 27: foo check:2'0 ~~~ 28: foo check:2'0 ~~~ 29: foo check:2'0 ~~~ 30: goodbye word check:2'0 ~~~~~~~~~~~~ check:2'1 ? possible intended match 31: foo check:2'0 ~~~ 32: foo check:2'0 ~~~ 33: foo check:2'0 ~~~ . . . 
>>>>>> ``` Reviewed By: mehdi_amini, arsenm, jhenderson, rsmith, SjoerdMeijer, Meinersbur, lattner Differential Revision: https://reviews.llvm.org/D82203 Added: llvm/test/FileCheck/dump-input-context.txt llvm/test/FileCheck/dump-input-filter.txt Modified: llvm/test/FileCheck/dump-input-annotations.txt llvm/test/FileCheck/dump-input-enable.txt llvm/utils/FileCheck/FileCheck.cpp Removed: ################################################################################ diff --git a/llvm/test/FileCheck/dump-input-annotations.txt b/llvm/test/FileCheck/dump-input-annotations.txt index f4f0d3ca6022..785ee23d03dd 100644 --- a/llvm/test/FileCheck/dump-input-annotations.txt +++ b/llvm/test/FileCheck/dump-input-annotations.txt @@ -24,7 +24,7 @@ ; ALIGN:{{.*}}error:{{.*}} ; ALIGN:{{.*}}possible intended match here{{.*}} -; ALIGN:Full input was: +; ALIGN:Input was: ; ALIGN-NEXT:<<<<<< ; ALIGN-NEXT: 1: hello world ; ALIGN-NEXT:check:1 ^~~~~ diff --git a/llvm/test/FileCheck/dump-input-context.txt b/llvm/test/FileCheck/dump-input-context.txt new file mode 100644 index 000000000000..6badaf88778d --- /dev/null +++ b/llvm/test/FileCheck/dump-input-context.txt @@ -0,0 +1,227 @@ +;-------------------------------------------------- +; Input file, check file, and directives for checking the size of the context. +; +; These are designed to be used with -dump-input=fail -vv. +; +; In the resulting input dump, there are three potential ellipses: +; +; - S: At the start of the input. +; - M: Between two input lines included by the filter. +; - E: At the end of the input. +; +; They are all present at -dump-input-context=6. One disappears each time +; -dump-input-context is incremented beyond that because there are no lines +; left to elide. 
+;-------------------------------------------------- + +; RUN: echo foo8 > %t.in +; RUN: echo foo7 >> %t.in +; RUN: echo foo6 >> %t.in +; RUN: echo foo5 >> %t.in +; RUN: echo foo4 >> %t.in +; RUN: echo foo3 >> %t.in +; RUN: echo foo2 >> %t.in +; RUN: echo foo1 >> %t.in +; RUN: echo lab1 hello >> %t.in +; RUN: echo foo1 >> %t.in +; RUN: echo foo2 >> %t.in +; RUN: echo foo3 >> %t.in +; RUN: echo foo4 >> %t.in +; RUN: echo foo5 >> %t.in +; RUN: echo foo6 >> %t.in +; RUN: echo foo7 >> %t.in +; RUN: echo foo7 >> %t.in +; RUN: echo foo6 >> %t.in +; RUN: echo foo5 >> %t.in +; RUN: echo foo4 >> %t.in +; RUN: echo foo3 >> %t.in +; RUN: echo foo2 >> %t.in +; RUN: echo foo1 >> %t.in +; RUN: echo lab2 world >> %t.in +; RUN: echo foo1 >> %t.in +; RUN: echo foo2 >> %t.in +; RUN: echo foo3 >> %t.in +; RUN: echo foo4 >> %t.in +; RUN: echo foo5 >> %t.in +; RUN: echo foo6 >> %t.in +; RUN: echo foo7 >> %t.in +; RUN: echo foo8 >> %t.in +; RUN: echo foo9 >> %t.in + +; RUN: echo 'CHECK-LABEL: lab1' > %t.chk +; RUN: echo ' CHECK-NEXT: hello' >> %t.chk +; RUN: echo 'CHECK-LABEL: lab2' >> %t.chk +; RUN: echo ' CHECK-NEXT: world' >> %t.chk + +; C0: <<<<<< +; CS-NEXT: . +; CS-NEXT: . +; CS-NEXT: . +; C8-NEXT: 1: foo8 +; C7-NEXT: 2: foo7 +; C6-NEXT: 3: foo6 +; C5-NEXT: 4: foo5 +; C4-NEXT: 5: foo4 +; C3-NEXT: 6: foo3 +; C2-NEXT: 7: foo2 +; C1-NEXT: 8: foo1 +; C0-NEXT: 9: lab1 hello +; C0-NEXT: label:1'0 ^~~~ +; C0-NEXT: label:1'1 ^~~~ +; C0-NEXT: next:2 !~~~~ error: match on wrong line +; C1-NEXT: 10: foo1 +; C2-NEXT: 11: foo2 +; C3-NEXT: 12: foo3 +; C4-NEXT: 13: foo4 +; C5-NEXT: 14: foo5 +; C6-NEXT: 15: foo6 +; C7-NEXT: 16: foo7 +; CM-NEXT: . +; CM-NEXT: . +; CM-NEXT: . 
+; C7-NEXT: 17: foo7 +; C6-NEXT: 18: foo6 +; C5-NEXT: 19: foo5 +; C4-NEXT: 20: foo4 +; C3-NEXT: 21: foo3 +; C2-NEXT: 22: foo2 +; C1-NEXT: 23: foo1 +; C0-NEXT: 24: lab2 world +; C0-NEXT: label:3 ^~~~ +; C0-NEXT: next:4 !~~~~ error: match on wrong line +; C1-NEXT: 25: foo1 +; C2-NEXT: 26: foo2 +; C3-NEXT: 27: foo3 +; C4-NEXT: 28: foo4 +; C5-NEXT: 29: foo5 +; C6-NEXT: 30: foo6 +; C7-NEXT: 31: foo7 +; C8-NEXT: 32: foo8 +; C9-NEXT: 33: foo9 +; CE-NEXT: . +; CE-NEXT: . +; CE-NEXT: . +; C0-NEXT: >>>>>> + +;-------------------------------------------------- +; Check -dump-input-context=. +;-------------------------------------------------- + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=-1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefix=BADVAL -DVAL=-1 + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=foobar \ +; RUN: | FileCheck %s -match-full-lines -check-prefix=BADVAL -DVAL=foobar + +BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input-context option: '[[VAL]]' value invalid for uint argument! + +;-------------------------------------------------- +; Check -dump-input-context explicit values. +;-------------------------------------------------- + +; 0 is an important boundary case. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=0 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,CS,CM,CE + +; 1 is an important boundary case. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE + +; 6 is the boundary case at which all ellipses are present in our test. 
+; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=6 \ +; RUN: | FileCheck %s -match-full-lines \ +; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,CS,CM,CE + +; 7 is the boundary case at which the middle ellipsis disappears. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=7 \ +; RUN: | FileCheck %s -match-full-lines \ +; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,CS,CE + +; 8 is the boundary case at which the start ellipsis disappears. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=8 \ +; RUN: | FileCheck %s -match-full-lines \ +; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,C8,CE + +; 9 is the boundary case at which the end ellipsis disappears. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=9 \ +; RUN: | FileCheck %s -match-full-lines \ +; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,C8,C9 + +; Make sure all is fine when -dump-input-context is far larger than the input. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=200 \ +; RUN: | FileCheck %s -match-full-lines \ +; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,C8,C9 + +;-------------------------------------------------- +; Check that -dump-input-context default is 5. +;-------------------------------------------------- + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: | FileCheck %s -match-full-lines \ +; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,CS,CM,CE + +;-------------------------------------------------- +; Check multiple -dump-input-context options. 
+; +; This might occur when a test author specifies -dump-input-context on a +; specific FileCheck call while a test runner specifies -dump-input-context in +; FILECHECK_OPTS, but check the behavior generally. +; +; The largest value wins because it provides the most information. +;-------------------------------------------------- + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; Check duplicate. +;- - - - - - - - - - - - - - - - - - - - - - - - - + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=1 -dump-input-context=1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; Check precedence. +;- - - - - - - - - - - - - - - - - - - - - - - - - + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=0 -dump-input-context=1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=1 -dump-input-context=0 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; Check that FILECHECK_OPTS isn't handled differently. 
+;- - - - - - - - - - - - - - - - - - - - - - - - - + +; RUN: %ProtectFileCheckOutput FILECHECK_OPTS=-dump-input-context=0 \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE + +; RUN: %ProtectFileCheckOutput FILECHECK_OPTS=-dump-input-context=1 \ +; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-context=0 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE diff --git a/llvm/test/FileCheck/dump-input-enable.txt b/llvm/test/FileCheck/dump-input-enable.txt index b0aacfb2fed2..932701b0019b 100644 --- a/llvm/test/FileCheck/dump-input-enable.txt +++ b/llvm/test/FileCheck/dump-input-enable.txt @@ -236,7 +236,7 @@ BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input option: Cannot find op ; NODUMP-NOT: <<<<<< -; DUMP-OK: Full input was: +; DUMP-OK: Input was: ; DUMP-OK-NEXT: <<<<<< ; DUMP-OK-NEXT: 1: hello ; DUMP-OK-NEXT: check:1 ^~~~~ @@ -244,7 +244,7 @@ BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input option: Cannot find op ; DUMP-OK-NEXT: next:2 ^~~~~ ; DUMP-OK-NEXT: >>>>>> -; DUMP-ERR: Full input was: +; DUMP-ERR: Input was: ; DUMP-ERR-NEXT: <<<<<< ; DUMP-ERR-NEXT: 1: hello ; DUMP-ERR-V-NEXT: check:1 ^~~~~ diff --git a/llvm/test/FileCheck/dump-input-filter.txt b/llvm/test/FileCheck/dump-input-filter.txt new file mode 100644 index 000000000000..f1b8cf543d05 --- /dev/null +++ b/llvm/test/FileCheck/dump-input-filter.txt @@ -0,0 +1,208 @@ +; To keep this test maintainable, avoid depending on -dump-input-context's +; default value, which is checked in dump-input-context.txt instead. + +;-------------------------------------------------- +; Create the input file and the check file. 
+;-------------------------------------------------- + +; line 1 +; RUN: echo start > %t.in +; RUN: echo foo0 >> %t.in +; RUN: echo foo1 >> %t.in +; RUN: echo foo2 >> %t.in +; RUN: echo foo3 >> %t.in +; RUN: echo foo4 >> %t.in +; RUN: echo foo5 >> %t.in +; RUN: echo foo6 >> %t.in +; RUN: echo foo7 >> %t.in +; RUN: echo foo8 >> %t.in +; RUN: echo foo9 >> %t.in +; line 12 +; RUN: echo hello >> %t.in +; RUN: echo foo0 >> %t.in +; RUN: echo foo1 >> %t.in +; RUN: echo foo2 >> %t.in +; RUN: echo foo3 >> %t.in +; RUN: echo foo4 >> %t.in +; RUN: echo foo5 >> %t.in +; RUN: echo foo6 >> %t.in +; RUN: echo foo7 >> %t.in +; RUN: echo foo8 >> %t.in +; RUN: echo foo9 >> %t.in +; line 23 +; RUN: echo word >> %t.in +; RUN: echo foo0 >> %t.in +; RUN: echo foo1 >> %t.in +; RUN: echo foo2 >> %t.in +; RUN: echo foo3 >> %t.in +; RUN: echo foo4 >> %t.in +; RUN: echo foo5 >> %t.in +; RUN: echo foo6 >> %t.in +; RUN: echo foo7 >> %t.in +; RUN: echo foo8 >> %t.in +; RUN: echo foo9 >> %t.in +; line 34 +; RUN: echo end >> %t.in + +; RUN: echo 'CHECK: start' > %t.chk +; RUN: echo 'CHECK: hello' >> %t.chk +; RUN: echo 'CHECK: world' >> %t.chk +; RUN: echo 'CHECK: end' >> %t.chk + +;-------------------------------------------------- +; Directives for checking the dump. 
+;-------------------------------------------------- + +; ALL: <<<<<< +; ALL-NEXT: 1: start +; ALL-NEXT: check:1 ^~~~~ +; ALL-NEXT: 2: foo0 +; ALL-NEXT: 3: foo1 +; ALL-NEXT: 4: foo2 +; ALL-NEXT: 5: foo3 +; ALL-NEXT: 6: foo4 +; ALL-NEXT: 7: foo5 +; ALL-NEXT: 8: foo6 +; ALL-NEXT: 9: foo7 +; ALL-NEXT: 10: foo8 +; ALL-NEXT: 11: foo9 +; ALL-NEXT: 12: hello +; ALL-NEXT: check:2 ^~~~~ +; ALL-NEXT: 13: foo0 +; ALL-NEXT: check:3'0 X~~~ error: no match found +; ALL-NEXT: 14: foo1 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 15: foo2 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 16: foo3 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 17: foo4 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 18: foo5 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 19: foo6 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 20: foo7 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 21: foo8 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 22: foo9 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 23: word +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: check:3'1 ? possible intended match +; ALL-NEXT: 24: foo0 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 25: foo1 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 26: foo2 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 27: foo3 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 28: foo4 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 29: foo5 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 30: foo6 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 31: foo7 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 32: foo8 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 33: foo9 +; ALL-NEXT: check:3'0 ~~~~ +; ALL-NEXT: 34: end +; ALL-NEXT: check:3'0 ~~~ +; ALL-NEXT: >>>>>> + +; ERROR: <<<<<< +; ERROR-NEXT: . +; ERROR-NEXT: . +; ERROR-NEXT: . +; ERROR-NEXT: 11: foo9 +; ERROR-NEXT: 12: hello +; ERROR-NEXT: check:2 ^~~~~ +; ERROR-NEXT: 13: foo0 +; ERROR-NEXT: check:3'0 X~~~ error: no match found +; ERROR-NEXT: 14: foo1 +; ERROR-NEXT: check:3'0 ~~~~ +; ERROR-NEXT: 15: foo2 +; ERROR-NEXT: check:3'0 ~~~~ +; ERROR-NEXT: . +; ERROR-NEXT: . +; ERROR-NEXT: . 
+; ERROR-NEXT: 21: foo8 +; ERROR-NEXT: check:3'0 ~~~~ +; ERROR-NEXT: 22: foo9 +; ERROR-NEXT: check:3'0 ~~~~ +; ERROR-NEXT: 23: word +; ERROR-NEXT: check:3'0 ~~~~ +; ERROR-NEXT: check:3'1 ? possible intended match +; ERROR-NEXT: 24: foo0 +; ERROR-NEXT: check:3'0 ~~~~ +; ERROR-NEXT: 25: foo1 +; ERROR-NEXT: check:3'0 ~~~~ +; ERROR-NEXT: . +; ERROR-NEXT: . +; ERROR-NEXT: . +; ERROR-NEXT: >>>>>> + +;-------------------------------------------------- +; Check how -dump-input affects filter. +;-------------------------------------------------- + +; no -dump-input => include errors. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ERROR + +; -dump-input=fail => include errors. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input=fail \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ERROR + +; -dump-input=always => include all. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input=always \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ALL + +;-------------------------------------------------- +; Check that other kinds of errors are included by -dump-input=fail. +; +; "error: no match found" and "possible intended match" are checked above. +;-------------------------------------------------- + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; error: no match expected. 
+;- - - - - - - - - - - - - - - - - - - - - - - - - + +; RUN: echo 'foo' > %t.not-err.in +; RUN: echo 'CHECK-NOT: foo' > %t.not-err.chk + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=0 -dump-input=fail \ +; RUN: %t.not-err.chk < %t.not-err.in 2>&1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=NOT-ERR + +; NOT-ERR: 1: foo +; NOT-ERR-NEXT: not:1 !~~ error: no match expected + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; error: match on wrong line. +;- - - - - - - - - - - - - - - - - - - - - - - - - + +; RUN: echo 'foo' > %t.next-err.in +; RUN: echo 'foo' >> %t.next-err.in +; RUN: echo 'bar' >> %t.next-err.in +; RUN: echo 'CHECK: foo' > %t.next-err.chk +; RUN: echo 'CHECK-NEXT: bar' >> %t.next-err.chk + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=0 -dump-input=fail \ +; RUN: %t.next-err.chk < %t.next-err.in 2>&1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=NEXT-ERR + +; NEXT-ERR: 3: bar +; NEXT-ERR-NEXT: next:2 !~~ error: match on wrong line diff --git a/llvm/utils/FileCheck/FileCheck.cpp b/llvm/utils/FileCheck/FileCheck.cpp index 659491c89636..0952d4bd24d0 100644 --- a/llvm/utils/FileCheck/FileCheck.cpp +++ b/llvm/utils/FileCheck/FileCheck.cpp @@ -128,6 +128,14 @@ static cl::list DumpInputs( clEnumValN(DumpInputFail, "fail", "Dump input on failure"), clEnumValN(DumpInputNever, "never", "Never dump input"))); +static cl::list DumpInputContexts( + "dump-input-context", cl::value_desc("N"), + cl::desc("In the dump requested by -dump-input=fail, print input\n" + "lines before and input lines after the starting line of\n" + "any error diagnostic. When there are multiple occurrences of\n" + "this option, the largest specified has precedence. The\n" + "default is 5.\n")); + typedef cl::list::const_iterator prefix_iterator; @@ -150,10 +158,16 @@ struct MarkerStyle { raw_ostream::Colors Color; /// A note to follow the marker, or empty string if none. 
std::string Note; + /// Does this marker indicate inclusion by the input filter implied by + /// -dump-input=fail? + bool FiltersAsError; MarkerStyle() {} MarkerStyle(char Lead, raw_ostream::Colors Color, - const std::string &Note = "") - : Lead(Lead), Color(Color), Note(Note) {} + const std::string &Note = "", bool FiltersAsError = false) + : Lead(Lead), Color(Color), Note(Note), FiltersAsError(FiltersAsError) { + assert((!FiltersAsError || !Note.empty()) && + "expected error diagnostic to have note"); + } }; static MarkerStyle GetMarker(FileCheckDiag::MatchType MatchTy) { @@ -161,18 +175,22 @@ static MarkerStyle GetMarker(FileCheckDiag::MatchType MatchTy) { case FileCheckDiag::MatchFoundAndExpected: return MarkerStyle('^', raw_ostream::GREEN); case FileCheckDiag::MatchFoundButExcluded: - return MarkerStyle('!', raw_ostream::RED, "error: no match expected"); + return MarkerStyle('!', raw_ostream::RED, "error: no match expected", + /*FiltersAsError=*/true); case FileCheckDiag::MatchFoundButWrongLine: - return MarkerStyle('!', raw_ostream::RED, "error: match on wrong line"); + return MarkerStyle('!', raw_ostream::RED, "error: match on wrong line", + /*FiltersAsError=*/true); case FileCheckDiag::MatchFoundButDiscarded: return MarkerStyle('!', raw_ostream::CYAN, "discard: overlaps earlier match"); case FileCheckDiag::MatchNoneAndExcluded: return MarkerStyle('X', raw_ostream::GREEN); case FileCheckDiag::MatchNoneButExpected: - return MarkerStyle('X', raw_ostream::RED, "error: no match found"); + return MarkerStyle('X', raw_ostream::RED, "error: no match found", + /*FiltersAsError=*/true); case FileCheckDiag::MatchFuzzy: - return MarkerStyle('?', raw_ostream::MAGENTA, "possible intended match"); + return MarkerStyle('?', raw_ostream::MAGENTA, "possible intended match", + /*FiltersAsError=*/true); } llvm_unreachable_internal("unexpected match type"); } @@ -183,6 +201,7 @@ static void DumpInputAnnotationHelp(raw_ostream &OS) { << "\n" << "Related command-line options:\n" 
<< " - -dump-input= enables or disables the input dump\n" + << " - -dump-input-context= adjusts the context of errors\n" << " - -v and -vv add more annotations\n" << " - -color forces colors to be enabled both in the dump and below\n" << " - -help documents the above options in more detail\n" @@ -228,6 +247,12 @@ static void DumpInputAnnotationHelp(raw_ostream &OS) { WithColor(OS, raw_ostream::SAVEDCOLOR, true) << "?"; OS << " marks fuzzy match when no match is found\n"; + // Elided lines. + OS << " - "; + WithColor(OS, raw_ostream::SAVEDCOLOR, true) << "..."; + OS << " indicates elided input lines and annotations, as specified by\n" + << " -dump-input=fail and -dump-input-context\n"; + // Colors. OS << " - colors "; WithColor(OS, raw_ostream::GREEN, true) << "success"; @@ -248,10 +273,12 @@ struct InputAnnotation { unsigned DiagIndex; /// The label for this annotation. std::string Label; + /// Is this the initial fragment of a diagnostic that has been broken across + /// multiple lines? + bool IsFirstLine; /// What input line (one-origin indexing) this annotation marks. This might - /// be different from the starting line of the original diagnostic if this is - /// a non-initial fragment of a diagnostic that has been broken across - /// multiple lines. + /// be different from the starting line of the original diagnostic if + /// !IsFirstLine. unsigned InputLine; /// The column range (one-origin indexing, open end) in which to mark the /// input line. If InputEndCol is UINT_MAX, treat it as the last column @@ -347,6 +374,7 @@ BuildInputAnnotations(const SourceMgr &SM, unsigned CheckFileBufferID, // Compute the mark location, and break annotation into multiple // annotations if it spans multiple lines. 
+ A.IsFirstLine = true; A.InputLine = DiagItr->InputStartLine; A.InputStartCol = DiagItr->InputStartCol; if (DiagItr->InputStartLine == DiagItr->InputEndLine) { @@ -370,6 +398,7 @@ BuildInputAnnotations(const SourceMgr &SM, unsigned CheckFileBufferID, InputAnnotation B; B.DiagIndex = A.DiagIndex; B.Label = A.Label; + B.IsFirstLine = false; B.InputLine = L; B.Marker = A.Marker; B.Marker.Lead = '~'; @@ -386,11 +415,27 @@ BuildInputAnnotations(const SourceMgr &SM, unsigned CheckFileBufferID, } } +static unsigned FindInputLineInFilter( + bool FilterOnError, unsigned CurInputLine, + const std::vector::iterator &AnnotationBeg, + const std::vector::iterator &AnnotationEnd) { + if (!FilterOnError) + return CurInputLine; + for (auto AnnotationItr = AnnotationBeg; AnnotationItr != AnnotationEnd; + ++AnnotationItr) { + if (AnnotationItr->IsFirstLine && AnnotationItr->Marker.FiltersAsError) + return AnnotationItr->InputLine; + } + return UINT_MAX; +} + static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, + bool DumpInputFilterOnError, + unsigned DumpInputContext, StringRef InputFileText, std::vector &Annotations, unsigned LabelWidth) { - OS << "Full input was:\n<<<<<<\n"; + OS << "Input was:\n<<<<<<\n"; // Sort annotations. std::sort(Annotations.begin(), Annotations.end(), @@ -460,12 +505,47 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, LabelWidth = std::max(LabelWidth, LineNoWidth) + 3; // Print annotated input lines. + unsigned PrevLineInFilter = 0; // 0 means none so far + unsigned NextLineInFilter = 0; // 0 means uncomputed, UINT_MAX means none + bool PrevLineElided = false; auto AnnotationItr = Annotations.begin(), AnnotationEnd = Annotations.end(); for (unsigned Line = 1; InputFilePtr != InputFileEnd || AnnotationItr != AnnotationEnd; ++Line) { const unsigned char *InputFileLine = InputFilePtr; + // Compute the previous and next line included by the filter. 
+ if (NextLineInFilter < Line) + NextLineInFilter = FindInputLineInFilter(DumpInputFilterOnError, Line, + AnnotationItr, AnnotationEnd); + assert(NextLineInFilter && "expected NextLineInFilter to be computed"); + if (NextLineInFilter == Line) + PrevLineInFilter = Line; + + // Elide this input line and its annotations if it's not within the + // context specified by -dump-input-context of an input line included by + // the dump filter. + if ((!PrevLineInFilter || PrevLineInFilter + DumpInputContext < Line) && + (NextLineInFilter == UINT_MAX || + Line + DumpInputContext < NextLineInFilter)) { + while (InputFilePtr != InputFileEnd && *InputFilePtr != '\n') + ++InputFilePtr; + if (InputFilePtr != InputFileEnd) + ++InputFilePtr; + while (AnnotationItr != AnnotationEnd && AnnotationItr->InputLine == Line) + ++AnnotationItr; + if (!PrevLineElided) { + for (unsigned i = 0; i < 3; ++i) { + WithColor(OS, raw_ostream::BLACK, /*Bold=*/true) + << right_justify(".", LabelWidth); + OS << '\n'; + } + PrevLineElided = true; + } + continue; + } + PrevLineElided = false; + // Print right-aligned line number. WithColor(OS, raw_ostream::BLACK, true) << format_decimal(Line, LabelWidth) << ": "; @@ -553,10 +633,21 @@ int main(int argc, char **argv) { InitLLVM X(argc, argv); cl::ParseCommandLineOptions(argc, argv, /*Overview*/ "", /*Errs*/ nullptr, "FILECHECK_OPTS"); + + // Select -dump-input* values. The -help documentation specifies the default + // value and which value to choose if an option is specified multiple times. + // In the latter case, the general rule of thumb is to choose the value that + // provides the most information. DumpInputValue DumpInput = DumpInputs.empty() ? DumpInputFail : *std::max_element(DumpInputs.begin(), DumpInputs.end()); + bool DumpInputFilterOnError = DumpInput == DumpInputFail; + unsigned DumpInputContext = DumpInputContexts.empty() + ? 
5 + : *std::max_element(DumpInputContexts.begin(), + DumpInputContexts.end()); + if (DumpInput == DumpInputHelp) { DumpInputAnnotationHelp(outs()); return 0; @@ -689,7 +780,8 @@ int main(int argc, char **argv) { unsigned LabelWidth; BuildInputAnnotations(SM, CheckFileBufferID, ImpPatBufferIDRange, Diags, Annotations, LabelWidth); - DumpAnnotatedInput(errs(), Req, InputFileText, Annotations, LabelWidth); + DumpAnnotatedInput(errs(), Req, DumpInputFilterOnError, DumpInputContext, + InputFileText, Annotations, LabelWidth); } return ExitCode; From llvm-commits at lists.llvm.org Fri Jul 10 08:04:29 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via llvm-commits) Date: Fri, 10 Jul 2020 08:04:29 -0700 (PDT) Subject: [llvm] 77b6ddf - [FileCheck] In input dump, elide only if ellipsis is shorter Message-ID: <5f08837d.1c69fb81.28025.e3b6@mx.google.com> Author: Joel E. Denny Date: 2020-07-10T11:02:11-04:00 New Revision: 77b6ddf1bd77da90407316345156415dc646e744 URL: https://github.com/llvm/llvm-project/commit/77b6ddf1bd77da90407316345156415dc646e744 DIFF: https://github.com/llvm/llvm-project/commit/77b6ddf1bd77da90407316345156415dc646e744.diff LOG: [FileCheck] In input dump, elide only if ellipsis is shorter For example, given `-dump-input-context=3 -vv`, the following now shows more leading context for the error than requested because a leading ellipsis would occupy the same number of lines as it would elide: ``` <<<<<< 1: foo6 2: foo5 3: foo4 4: foo3 5: foo2 6: foo1 7: hello world check:1 ^~~~~ check:2 X~~~~ error: no match found 8: foo1 check:2 ~~~~ 9: foo2 check:2 ~~~~ 10: foo3 check:2 ~~~~ . . . 
>>>>>> ``` Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D83526 Added: Modified: llvm/test/FileCheck/dump-input-context.txt llvm/utils/FileCheck/FileCheck.cpp Removed: ################################################################################ diff --git a/llvm/test/FileCheck/dump-input-context.txt b/llvm/test/FileCheck/dump-input-context.txt index 6badaf88778d..2e4382ec12ed 100644 --- a/llvm/test/FileCheck/dump-input-context.txt +++ b/llvm/test/FileCheck/dump-input-context.txt @@ -9,9 +9,9 @@ ; - M: Between two input lines included by the filter. ; - E: At the end of the input. ; -; They are all present at -dump-input-context=6. One disappears each time -; -dump-input-context is incremented beyond that because there are no lines -; left to elide. +; They are all present at -dump-input-context=4. One becomes useless each time +; -dump-input-context is incremented beyond that because then that ellipsis +; becomes equal to or larger than the input lines it elides. ;-------------------------------------------------- ; RUN: echo foo8 > %t.in @@ -47,6 +47,7 @@ ; RUN: echo foo7 >> %t.in ; RUN: echo foo8 >> %t.in ; RUN: echo foo9 >> %t.in +; RUN: echo foo0 >> %t.in ; RUN: echo 'CHECK-LABEL: lab1' > %t.chk ; RUN: echo ' CHECK-NEXT: hello' >> %t.chk @@ -57,9 +58,9 @@ ; CS-NEXT: . ; CS-NEXT: . ; CS-NEXT: . -; C8-NEXT: 1: foo8 -; C7-NEXT: 2: foo7 -; C6-NEXT: 3: foo6 +; C5-NEXT: 1: foo8 +; C5-NEXT: 2: foo7 +; C5-NEXT: 3: foo6 ; C5-NEXT: 4: foo5 ; C4-NEXT: 5: foo4 ; C3-NEXT: 6: foo3 @@ -75,11 +76,11 @@ ; C4-NEXT: 13: foo4 ; C5-NEXT: 14: foo5 ; C6-NEXT: 15: foo6 -; C7-NEXT: 16: foo7 +; C6-NEXT: 16: foo7 ; CM-NEXT: . ; CM-NEXT: . ; CM-NEXT: . -; C7-NEXT: 17: foo7 +; C6-NEXT: 17: foo7 ; C6-NEXT: 18: foo6 ; C5-NEXT: 19: foo5 ; C4-NEXT: 20: foo4 @@ -96,13 +97,65 @@ ; C5-NEXT: 29: foo5 ; C6-NEXT: 30: foo6 ; C7-NEXT: 31: foo7 -; C8-NEXT: 32: foo8 -; C9-NEXT: 33: foo9 +; C7-NEXT: 32: foo8 +; C7-NEXT: 33: foo9 +; C7-NEXT: 34: foo0 ; CE-NEXT: . 
; CE-NEXT: . ; CE-NEXT: . ; C0-NEXT: >>>>>> +; Now build an alternate set of checks where input lines that might be elided by +; ellipses have annotations. + +; RUN: cp %t.in %t.wide.in +; RUN: echo 'CHECK-LABEL: lab1' > %t.wide.chk +; RUN: echo ' CHECK: hello' >> %t.wide.chk +; RUN: echo ' CHECK: goodbye' >> %t.wide.chk +; RUN: echo 'CHECK-LABEL: lab2' >> %t.wide.chk +; RUN: echo ' CHECK-NEXT: world' >> %t.wide.chk + +; W5: <<<<<< +; W5: 9: lab1 hello +; W5-NEXT: label:1'0 ^~~~ +; W5-NEXT: label:1'1 ^~~~ +; W5-NEXT: check:2 ^~~~~ +; W5-NEXT: 10: foo1 +; W5-NEXT: check:3 X~~~ error: no match found +; W5-NEXT: 11: foo2 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 12: foo3 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 13: foo4 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 14: foo5 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 15: foo6 +; W5-NEXT: check:3 ~~~~ +; W6-NEXT: 16: foo7 +; W6-NEXT: check:3 ~~~~ +; WM-NEXT: . +; WM-NEXT: . +; WM-NEXT: . +; W6-NEXT: 17: foo7 +; W6-NEXT: check:3 ~~~~ +; W6-NEXT: 18: foo6 +; W6-NEXT: check:3 ~~~~ +; W5-NEXT: 19: foo5 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 20: foo4 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 21: foo3 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 22: foo2 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 23: foo1 +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: 24: lab2 world +; W5-NEXT: label:4 ^~~~ +; W5-NEXT: check:3 ~~~~ +; W5-NEXT: next:5 !~~~~ error: match on wrong line + ;-------------------------------------------------- ; Check -dump-input-context=. ;-------------------------------------------------- @@ -135,40 +188,35 @@ BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input-context option: '[[VAL ; RUN: -dump-input-context=1 \ ; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE -; 6 is the boundary case at which all ellipses are present in our test. +; 4 is the boundary case at which all ellipses are present in our test. 
; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ -; RUN: -dump-input-context=6 \ -; RUN: | FileCheck %s -match-full-lines \ -; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,CS,CM,CE +; RUN: -dump-input-context=4 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,C2,C3,C4,CS,CM,CE -; 7 is the boundary case at which the middle ellipsis disappears. +; 5 is the boundary case at which the start ellipsis is useless. ; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ -; RUN: -dump-input-context=7 \ -; RUN: | FileCheck %s -match-full-lines \ -; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,CS,CE +; RUN: -dump-input-context=5 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,C2,C3,C4,C5,CM,CE -; 8 is the boundary case at which the start ellipsis disappears. +; 6 is the boundary case at which the middle ellipsis is useless. ; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ -; RUN: -dump-input-context=8 \ -; RUN: | FileCheck %s -match-full-lines \ -; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,C8,CE +; RUN: -dump-input-context=6 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,C2,C3,C4,C5,C6,CE -; 9 is the boundary case at which the end ellipsis disappears. +; 7 is the boundary case at which the end ellipsis is useless. ; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ -; RUN: -dump-input-context=9 \ -; RUN: | FileCheck %s -match-full-lines \ -; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,C8,C9 +; RUN: -dump-input-context=7 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7 ; Make sure all is fine when -dump-input-context is far larger than the input. 
; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ ; RUN: -dump-input-context=200 \ -; RUN: | FileCheck %s -match-full-lines \ -; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7,C8,C9 +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,C2,C3,C4,C5,C6,C7 ;-------------------------------------------------- ; Check that -dump-input-context default is 5. @@ -176,8 +224,7 @@ BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input-context option: '[[VAL ; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ -; RUN: | FileCheck %s -match-full-lines \ -; RUN: -check-prefixes=C0,C1,C2,C3,C4,C5,CS,CM,CE +; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,C2,C3,C4,C5,CM,CE ;-------------------------------------------------- ; Check multiple -dump-input-context options. @@ -225,3 +272,22 @@ BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input-context option: '[[VAL ; RUN: not FileCheck -dump-input=fail -vv %t.chk < %t.in 2>&1 \ ; RUN: -dump-input-context=0 \ ; RUN: | FileCheck %s -match-full-lines -check-prefixes=C0,C1,CS,CM,CE + +;-------------------------------------------------- +; Check how annotations on input lines that might be elided by ellipses affect +; whether they are actually elided. +;-------------------------------------------------- + +; At -dump-input-context=5, the ellipsis is useful but only when annotations on +; elided input lines are considered. +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.wide.chk < %t.wide.in 2>&1 \ +; RUN: -dump-input-context=5 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=W5,WM + +; At -dump-input-context=6, the ellipsis is not useful even when annotations on +; elided input lines are considered. 
+; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input=fail -vv %t.wide.chk < %t.wide.in 2>&1 \ +; RUN: -dump-input-context=6 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=W5,W6 diff --git a/llvm/utils/FileCheck/FileCheck.cpp b/llvm/utils/FileCheck/FileCheck.cpp index 0952d4bd24d0..ec2556074aee 100644 --- a/llvm/utils/FileCheck/FileCheck.cpp +++ b/llvm/utils/FileCheck/FileCheck.cpp @@ -429,6 +429,25 @@ static unsigned FindInputLineInFilter( return UINT_MAX; } +/// To OS, print a vertical ellipsis (right-justified at LabelWidth) if it would +/// occupy less lines than ElidedLines, but print ElidedLines otherwise. Either +/// way, clear ElidedLines. Thus, if ElidedLines is empty, do nothing. +static void DumpEllipsisOrElidedLines(raw_ostream &OS, std::string &ElidedLines, + unsigned LabelWidth) { + if (ElidedLines.empty()) + return; + unsigned EllipsisLines = 3; + if (EllipsisLines < StringRef(ElidedLines).count('\n')) { + for (unsigned i = 0; i < EllipsisLines; ++i) { + WithColor(OS, raw_ostream::BLACK, /*Bold=*/true) + << right_justify(".", LabelWidth); + OS << '\n'; + } + } else + OS << ElidedLines; + ElidedLines.clear(); +} + static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, bool DumpInputFilterOnError, unsigned DumpInputContext, @@ -507,7 +526,12 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, // Print annotated input lines. unsigned PrevLineInFilter = 0; // 0 means none so far unsigned NextLineInFilter = 0; // 0 means uncomputed, UINT_MAX means none - bool PrevLineElided = false; + std::string ElidedLines; + raw_string_ostream ElidedLinesOS(ElidedLines); + ColorMode TheColorMode = + WithColor(OS).colorsEnabled() ? 
ColorMode::Enable : ColorMode::Disable; + if (TheColorMode == ColorMode::Enable) + ElidedLinesOS.enable_colors(true); auto AnnotationItr = Annotations.begin(), AnnotationEnd = Annotations.end(); for (unsigned Line = 1; InputFilePtr != InputFileEnd || AnnotationItr != AnnotationEnd; @@ -524,37 +548,29 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, // Elide this input line and its annotations if it's not within the // context specified by -dump-input-context of an input line included by - // the dump filter. + // the dump filter. However, in case the resulting ellipsis would occupy + // more lines than the input lines and annotations it elides, buffer the + // elided lines and annotations so we can print them instead. + raw_ostream *LineOS = &OS; if ((!PrevLineInFilter || PrevLineInFilter + DumpInputContext < Line) && (NextLineInFilter == UINT_MAX || - Line + DumpInputContext < NextLineInFilter)) { - while (InputFilePtr != InputFileEnd && *InputFilePtr != '\n') - ++InputFilePtr; - if (InputFilePtr != InputFileEnd) - ++InputFilePtr; - while (AnnotationItr != AnnotationEnd && AnnotationItr->InputLine == Line) - ++AnnotationItr; - if (!PrevLineElided) { - for (unsigned i = 0; i < 3; ++i) { - WithColor(OS, raw_ostream::BLACK, /*Bold=*/true) - << right_justify(".", LabelWidth); - OS << '\n'; - } - PrevLineElided = true; - } - continue; + Line + DumpInputContext < NextLineInFilter)) + LineOS = &ElidedLinesOS; + else { + LineOS = &OS; + DumpEllipsisOrElidedLines(OS, ElidedLinesOS.str(), LabelWidth); } - PrevLineElided = false; // Print right-aligned line number. - WithColor(OS, raw_ostream::BLACK, true) + WithColor(*LineOS, raw_ostream::BLACK, /*Bold=*/true, /*BF=*/false, + TheColorMode) << format_decimal(Line, LabelWidth) << ": "; // For the case where -v and colors are enabled, find the annotations for // good matches for expected patterns in order to highlight everything // else in the line. 
There are no such annotations if -v is disabled. std::vector FoundAndExpectedMatches; - if (Req.Verbose && WithColor(OS).colorsEnabled()) { + if (Req.Verbose && TheColorMode == ColorMode::Enable) { for (auto I = AnnotationItr; I != AnnotationEnd && I->InputLine == Line; ++I) { if (I->FoundAndExpectedMatch) @@ -566,7 +582,8 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, // expected patterns. bool Newline = false; { - WithColor COS(OS); + WithColor COS(*LineOS, raw_ostream::SAVEDCOLOR, /*Bold=*/false, + /*BG=*/false, TheColorMode); bool InMatch = false; if (Req.Verbose) COS.changeColor(raw_ostream::CYAN, true, true); @@ -590,13 +607,14 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, ++InputFilePtr; } } - OS << '\n'; + *LineOS << '\n'; unsigned InputLineWidth = InputFilePtr - InputFileLine - Newline; // Print any annotations. while (AnnotationItr != AnnotationEnd && AnnotationItr->InputLine == Line) { - WithColor COS(OS, AnnotationItr->Marker.Color, true); + WithColor COS(*LineOS, AnnotationItr->Marker.Color, /*Bold=*/true, + /*BG=*/false, TheColorMode); // The two spaces below are where the ": " appears on input lines. COS << left_justify(AnnotationItr->Label, LabelWidth) << " "; unsigned Col; @@ -621,6 +639,7 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, ++AnnotationItr; } } + DumpEllipsisOrElidedLines(OS, ElidedLinesOS.str(), LabelWidth); OS << ">>>>>>\n"; } From llvm-commits at lists.llvm.org Fri Jul 10 08:04:31 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via llvm-commits) Date: Fri, 10 Jul 2020 08:04:31 -0700 (PDT) Subject: [llvm] 9fd4b5f - [FileCheck] Implement -dump-input-filter Message-ID: <5f08837f.1c69fb81.20553.f98c@mx.google.com> Author: Joel E. 
Denny Date: 2020-07-10T11:02:11-04:00 New Revision: 9fd4b5faacbdfb887389c9ac246efa23be1cd334 URL: https://github.com/llvm/llvm-project/commit/9fd4b5faacbdfb887389c9ac246efa23be1cd334 DIFF: https://github.com/llvm/llvm-project/commit/9fd4b5faacbdfb887389c9ac246efa23be1cd334.diff LOG: [FileCheck] Implement -dump-input-filter This makes the input dump filtering implemented by D82203 more configurable. D82203 enables filtering out everything but the initial input lines of error diagnostics (plus some context). This patch enables including any line with any kind of annotation. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D83097 Added: Modified: llvm/test/FileCheck/dump-input-filter.txt llvm/utils/FileCheck/FileCheck.cpp Removed: ################################################################################ diff --git a/llvm/test/FileCheck/dump-input-filter.txt b/llvm/test/FileCheck/dump-input-filter.txt index f1b8cf543d05..29f43ff1ce80 100644 --- a/llvm/test/FileCheck/dump-input-filter.txt +++ b/llvm/test/FileCheck/dump-input-filter.txt @@ -115,6 +115,102 @@ ; ALL-NEXT: check:3'0 ~~~ ; ALL-NEXT: >>>>>> +; ANNOTATION-FULL: <<<<<< +; ANNOTATION-FULL-NEXT: 1: start +; ANNOTATION-FULL-NEXT: check:1 ^~~~~ +; ANNOTATION-FULL-NEXT: 2: foo0 +; ANNOTATION-FULL-NEXT: 3: foo1 +; ANNOTATION-FULL-NEXT: . +; ANNOTATION-FULL-NEXT: . +; ANNOTATION-FULL-NEXT: . 
+; ANNOTATION-FULL-NEXT: 10: foo8 +; ANNOTATION-FULL-NEXT: 11: foo9 +; ANNOTATION-FULL-NEXT: 12: hello +; ANNOTATION-FULL-NEXT: check:2 ^~~~~ +; ANNOTATION-FULL-NEXT: 13: foo0 +; ANNOTATION-FULL-NEXT: check:3'0 X~~~ error: no match found +; ANNOTATION-FULL-NEXT: 14: foo1 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 15: foo2 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 16: foo3 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 17: foo4 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 18: foo5 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 19: foo6 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 20: foo7 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 21: foo8 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 22: foo9 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 23: word +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: check:3'1 ? 
possible intended match +; ANNOTATION-FULL-NEXT: 24: foo0 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 25: foo1 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 26: foo2 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 27: foo3 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 28: foo4 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 29: foo5 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 30: foo6 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 31: foo7 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 32: foo8 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 33: foo9 +; ANNOTATION-FULL-NEXT: check:3'0 ~~~~ +; ANNOTATION-FULL-NEXT: 34: end +; ANNOTATION-FULL-NEXT: check:3'0 ~~~ +; ANNOTATION-FULL-NEXT: >>>>>> + +; ANNOTATION: <<<<<< +; ANNOTATION-NEXT: 1: start +; ANNOTATION-NEXT: check:1 ^~~~~ +; ANNOTATION-NEXT: 2: foo0 +; ANNOTATION-NEXT: 3: foo1 +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: 10: foo8 +; ANNOTATION-NEXT: 11: foo9 +; ANNOTATION-NEXT: 12: hello +; ANNOTATION-NEXT: check:2 ^~~~~ +; ANNOTATION-NEXT: 13: foo0 +; ANNOTATION-NEXT: check:3'0 X~~~ error: no match found +; ANNOTATION-NEXT: 14: foo1 +; ANNOTATION-NEXT: check:3'0 ~~~~ +; ANNOTATION-NEXT: 15: foo2 +; ANNOTATION-NEXT: check:3'0 ~~~~ +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: 21: foo8 +; ANNOTATION-NEXT: check:3'0 ~~~~ +; ANNOTATION-NEXT: 22: foo9 +; ANNOTATION-NEXT: check:3'0 ~~~~ +; ANNOTATION-NEXT: 23: word +; ANNOTATION-NEXT: check:3'0 ~~~~ +; ANNOTATION-NEXT: check:3'1 ? possible intended match +; ANNOTATION-NEXT: 24: foo0 +; ANNOTATION-NEXT: check:3'0 ~~~~ +; ANNOTATION-NEXT: 25: foo1 +; ANNOTATION-NEXT: check:3'0 ~~~~ +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: . +; ANNOTATION-NEXT: >>>>>> + ; ERROR: <<<<<< ; ERROR-NEXT: . ; ERROR-NEXT: . 
@@ -148,28 +244,149 @@ ; ERROR-NEXT: >>>>>> ;-------------------------------------------------- -; Check how -dump-input affects filter. +; Check -dump-input-filter=. +;-------------------------------------------------- + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=foobar \ +; RUN: | FileCheck %s -match-full-lines -check-prefix=BADVAL + +BADVAL: {{F|f}}ile{{C|c}}heck{{.*}}: for the --dump-input-filter option: Cannot find option named 'foobar'! + +;-------------------------------------------------- +; Check -dump-input-filter explicit values. +;-------------------------------------------------- + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=all \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ALL + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=annotation-full \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ANNOTATION-FULL + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=annotation \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ANNOTATION + +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=error \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ERROR + +;-------------------------------------------------- +; Check -dump-input-filter defaults. ;-------------------------------------------------- -; no -dump-input => include errors. +; no -dump-input => -dump-input-filter=error ; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ ; RUN: | FileCheck %s -match-full-lines -check-prefixes=ERROR -; -dump-input=fail => include errors. 
+; -dump-input=fail => -dump-input-filter=error ; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ ; RUN: -dump-input=fail \ ; RUN: | FileCheck %s -match-full-lines -check-prefixes=ERROR -; -dump-input=always => include all. +; -dump-input=always => -dump-input-filter=all ; RUN: %ProtectFileCheckOutput \ ; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ ; RUN: -dump-input=always \ ; RUN: | FileCheck %s -match-full-lines -check-prefixes=ALL ;-------------------------------------------------- -; Check that other kinds of errors are included by -dump-input=fail. +; Check multiple -dump-input-filter options. +; +; This might occur when a test author specifies -dump-input-filter on a specific +; FileCheck call while a test runner specifies -dump-input-filter in +; FILECHECK_OPTS, but check the behavior generally. +; +; The value providing the most information wins. +;-------------------------------------------------- + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; Check duplicate. +;- - - - - - - - - - - - - - - - - - - - - - - - - + +; all, all => all +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=all -dump-input-filter=all \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ALL + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; Check precedence. 
+;- - - - - - - - - - - - - - - - - - - - - - - - - + +; all, annotation-full => all +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=all -dump-input-filter=annotation-full \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ALL + +; annotation-full, annotation => annotation-full +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=annotation-full \ +; RUN: -dump-input-filter=annotation \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ANNOTATION-FULL + +; annotation, error => annotation +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=annotation -dump-input-filter=error \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ANNOTATION + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; Check that order doesn't matter. +;- - - - - - - - - - - - - - - - - - - - - - - - - + +; error, annotation => annotation +; RUN: %ProtectFileCheckOutput \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=error -dump-input-filter=annotation \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ANNOTATION + +;- - - - - - - - - - - - - - - - - - - - - - - - - +; Check that FILECHECK_OPTS isn't handled differently. 
+;- - - - - - - - - - - - - - - - - - - - - - - - - + +; annotation, error => annotation +; RUN: %ProtectFileCheckOutput FILECHECK_OPTS=-dump-input-filter=annotation \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=error \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ANNOTATION + +; error, annotation => annotation +; RUN: %ProtectFileCheckOutput FILECHECK_OPTS=-dump-input-filter=error \ +; RUN: not FileCheck -dump-input-context=2 -vv %t.chk < %t.in 2>&1 \ +; RUN: -dump-input-filter=annotation \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=ANNOTATION + +;-------------------------------------------------- +; Check the case where all input lines are filtered out. +;-------------------------------------------------- + +; RUN: echo 'CHECK: hello' > %t.good.chk + +; RUN: %ProtectFileCheckOutput \ +; RUN: FileCheck -dump-input=always -dump-input-filter=error -vv %t.good.chk \ +; RUN: < %t.in 2>&1 \ +; RUN: | FileCheck %s -match-full-lines -check-prefixes=EMPTY + +; EMPTY: <<<<<< +; EMPTY-NEXT: . +; EMPTY-NEXT: . +; EMPTY-NEXT: . +; EMPTY-NEXT: >>>>>> + +;-------------------------------------------------- +; Check that other kinds of errors are included by -dump-input-filter=error. ; ; "error: no match found" and "possible intended match" are checked above. 
;-------------------------------------------------- @@ -182,7 +399,7 @@ ; RUN: echo 'CHECK-NOT: foo' > %t.not-err.chk ; RUN: %ProtectFileCheckOutput \ -; RUN: not FileCheck -dump-input-context=0 -dump-input=fail \ +; RUN: not FileCheck -dump-input-context=0 -dump-input-filter=error \ ; RUN: %t.not-err.chk < %t.not-err.in 2>&1 \ ; RUN: | FileCheck %s -match-full-lines -check-prefixes=NOT-ERR @@ -200,9 +417,9 @@ ; RUN: echo 'CHECK-NEXT: bar' >> %t.next-err.chk ; RUN: %ProtectFileCheckOutput \ -; RUN: not FileCheck -dump-input-context=0 -dump-input=fail \ +; RUN: not FileCheck -dump-input-context=0 -dump-input-filter=error \ ; RUN: %t.next-err.chk < %t.next-err.in 2>&1 \ ; RUN: | FileCheck %s -match-full-lines -check-prefixes=NEXT-ERR ; NEXT-ERR: 3: bar -; NEXT-ERR-NEXT: next:2 !~~ error: match on wrong line +; NEXT-ERR-NEXT: next:2 !~~ error: match on wrong line \ No newline at end of file diff --git a/llvm/utils/FileCheck/FileCheck.cpp b/llvm/utils/FileCheck/FileCheck.cpp index ec2556074aee..8bf1dd2e9b49 100644 --- a/llvm/utils/FileCheck/FileCheck.cpp +++ b/llvm/utils/FileCheck/FileCheck.cpp @@ -128,11 +128,38 @@ static cl::list DumpInputs( clEnumValN(DumpInputFail, "fail", "Dump input on failure"), clEnumValN(DumpInputNever, "never", "Never dump input"))); +// The order of DumpInputFilterValue members affects their precedence, as +// documented for -dump-input-filter below. +enum DumpInputFilterValue { + DumpInputFilterError, + DumpInputFilterAnnotation, + DumpInputFilterAnnotationFull, + DumpInputFilterAll +}; + +static cl::list DumpInputFilters( + "dump-input-filter", + cl::desc("In the dump requested by -dump-input, print only input lines of\n" + "kind plus any context specified by -dump-input-context.\n" + "When there are multiple occurrences of this option, the \n" + "that appears earliest in the list below has precedence. 
The\n" + "default is 'error' when -dump-input=fail, and it's 'all' when\n" + "-dump-input=always.\n"), + cl::value_desc("kind"), + cl::values(clEnumValN(DumpInputFilterAll, "all", "All input lines"), + clEnumValN(DumpInputFilterAnnotationFull, "annotation-full", + "Input lines with annotations"), + clEnumValN(DumpInputFilterAnnotation, "annotation", + "Input lines with starting points of annotations"), + clEnumValN(DumpInputFilterError, "error", + "Input lines with starting points of error " + "annotations"))); + static cl::list DumpInputContexts( "dump-input-context", cl::value_desc("N"), - cl::desc("In the dump requested by -dump-input=fail, print input\n" - "lines before and input lines after the starting line of\n" - "any error diagnostic. When there are multiple occurrences of\n" + cl::desc("In the dump requested by -dump-input, print input lines\n" + "before and input lines after any lines specified by\n" + "-dump-input-filter. When there are multiple occurrences of\n" "this option, the largest specified has precedence. The\n" "default is 5.\n")); @@ -158,8 +185,7 @@ struct MarkerStyle { raw_ostream::Colors Color; /// A note to follow the marker, or empty string if none. std::string Note; - /// Does this marker indicate inclusion by the input filter implied by - /// -dump-input=fail? + /// Does this marker indicate inclusion by -dump-input-filter=error? 
bool FiltersAsError; MarkerStyle() {} MarkerStyle(char Lead, raw_ostream::Colors Color, @@ -201,7 +227,8 @@ static void DumpInputAnnotationHelp(raw_ostream &OS) { << "\n" << "Related command-line options:\n" << " - -dump-input= enables or disables the input dump\n" - << " - -dump-input-context= adjusts the context of errors\n" + << " - -dump-input-filter= filters the input lines\n" + << " - -dump-input-context= adjusts the context of filtered lines\n" << " - -v and -vv add more annotations\n" << " - -color forces colors to be enabled both in the dump and below\n" << " - -help documents the above options in more detail\n" @@ -251,7 +278,7 @@ static void DumpInputAnnotationHelp(raw_ostream &OS) { OS << " - "; WithColor(OS, raw_ostream::SAVEDCOLOR, true) << "..."; OS << " indicates elided input lines and annotations, as specified by\n" - << " -dump-input=fail and -dump-input-context\n"; + << " -dump-input-filter and -dump-input-context\n"; // Colors. OS << " - colors "; @@ -416,15 +443,28 @@ BuildInputAnnotations(const SourceMgr &SM, unsigned CheckFileBufferID, } static unsigned FindInputLineInFilter( - bool FilterOnError, unsigned CurInputLine, + DumpInputFilterValue DumpInputFilter, unsigned CurInputLine, const std::vector::iterator &AnnotationBeg, const std::vector::iterator &AnnotationEnd) { - if (!FilterOnError) + if (DumpInputFilter == DumpInputFilterAll) return CurInputLine; for (auto AnnotationItr = AnnotationBeg; AnnotationItr != AnnotationEnd; ++AnnotationItr) { - if (AnnotationItr->IsFirstLine && AnnotationItr->Marker.FiltersAsError) + switch (DumpInputFilter) { + case DumpInputFilterAll: + llvm_unreachable("unexpected DumpInputFilterAll"); + break; + case DumpInputFilterAnnotationFull: return AnnotationItr->InputLine; + case DumpInputFilterAnnotation: + if (AnnotationItr->IsFirstLine) + return AnnotationItr->InputLine; + break; + case DumpInputFilterError: + if (AnnotationItr->IsFirstLine && AnnotationItr->Marker.FiltersAsError) + return 
AnnotationItr->InputLine; + break; + } } return UINT_MAX; } @@ -449,7 +489,7 @@ static void DumpEllipsisOrElidedLines(raw_ostream &OS, std::string &ElidedLines, } static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, - bool DumpInputFilterOnError, + DumpInputFilterValue DumpInputFilter, unsigned DumpInputContext, StringRef InputFileText, std::vector &Annotations, @@ -540,7 +580,7 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, // Compute the previous and next line included by the filter. if (NextLineInFilter < Line) - NextLineInFilter = FindInputLineInFilter(DumpInputFilterOnError, Line, + NextLineInFilter = FindInputLineInFilter(DumpInputFilter, Line, AnnotationItr, AnnotationEnd); assert(NextLineInFilter && "expected NextLineInFilter to be computed"); if (NextLineInFilter == Line) @@ -548,7 +588,7 @@ static void DumpAnnotatedInput(raw_ostream &OS, const FileCheckRequest &Req, // Elide this input line and its annotations if it's not within the // context specified by -dump-input-context of an input line included by - // the dump filter. However, in case the resulting ellipsis would occupy + // -dump-input-filter. However, in case the resulting ellipsis would occupy // more lines than the input lines and annotations it elides, buffer the // elided lines and annotations so we can print them instead. raw_ostream *LineOS = &OS; @@ -661,7 +701,13 @@ int main(int argc, char **argv) { DumpInputs.empty() ? DumpInputFail : *std::max_element(DumpInputs.begin(), DumpInputs.end()); - bool DumpInputFilterOnError = DumpInput == DumpInputFail; + DumpInputFilterValue DumpInputFilter; + if (DumpInputFilters.empty()) + DumpInputFilter = DumpInput == DumpInputAlways ? DumpInputFilterAll + : DumpInputFilterError; + else + DumpInputFilter = + *std::max_element(DumpInputFilters.begin(), DumpInputFilters.end()); unsigned DumpInputContext = DumpInputContexts.empty() ? 
5 : *std::max_element(DumpInputContexts.begin(), @@ -799,7 +845,7 @@ int main(int argc, char **argv) { unsigned LabelWidth; BuildInputAnnotations(SM, CheckFileBufferID, ImpPatBufferIDRange, Diags, Annotations, LabelWidth); - DumpAnnotatedInput(errs(), Req, DumpInputFilterOnError, DumpInputContext, + DumpAnnotatedInput(errs(), Req, DumpInputFilter, DumpInputContext, InputFileText, Annotations, LabelWidth); } From llvm-commits at lists.llvm.org Fri Jul 10 08:04:33 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:04:33 +0000 (UTC) Subject: [PATCH] D82203: [FileCheck] Implement -dump-input-context In-Reply-To: References: Message-ID: <9bb92d995663233afaff55bccd0ddba5@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGbce8fced41b9: [FileCheck] Implement -dump-input-context (authored by jdenny). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82203/new/ https://reviews.llvm.org/D82203 Files: llvm/test/FileCheck/dump-input-annotations.txt llvm/test/FileCheck/dump-input-context.txt llvm/test/FileCheck/dump-input-enable.txt llvm/test/FileCheck/dump-input-filter.txt llvm/utils/FileCheck/FileCheck.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82203.277045.patch Type: text/x-patch Size: 27157 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:04:36 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:04:36 +0000 (UTC) Subject: [PATCH] D83526: [FileCheck] In input dump, elide only if ellipsis is shorter In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG77b6ddf1bd77: [FileCheck] In input dump, elide only if ellipsis is shorter (authored by jdenny). 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83526/new/

https://reviews.llvm.org/D83526

Files:
  llvm/test/FileCheck/dump-input-context.txt
  llvm/utils/FileCheck/FileCheck.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83526.277046.patch
Type: text/x-patch
Size: 13471 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 08:04:40 2020
From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 15:04:40 +0000 (UTC)
Subject: [PATCH] D83097: [FileCheck] Implement -dump-input-filter
In-Reply-To:
References:
Message-ID: <3824da190b9158c18401e16e89c5936e@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rG9fd4b5faacbd: [FileCheck] Implement -dump-input-filter (authored by jdenny).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83097/new/

https://reviews.llvm.org/D83097

Files:
  llvm/test/FileCheck/dump-input-filter.txt
  llvm/utils/FileCheck/FileCheck.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83097.277047.patch
Type: text/x-patch
Size: 19090 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 08:05:40 2020
From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 15:05:40 +0000 (UTC)
Subject: [PATCH] D83097: [FileCheck] Implement -dump-input-filter
In-Reply-To:
References:
Message-ID: <91ba5168ec9e5b082728228c96eef392@localhost.localdomain>

jdenny marked an inline comment as done.
jdenny added inline comments.

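For readers skimming the D83097 thread, the precedence rule the patch describes ("the value providing the most information wins", with the default depending on -dump-input) can be sketched outside of LLVM roughly as follows. This is only an illustration under stated assumptions, not the code from FileCheck.cpp: the names `FilterKind`, `Annotation`, `resolveFilter`, and `includedBy` are hypothetical, and the real tool works on `cl::list` options and its own annotation type.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical mirror of the patch's enum. Members are declared from least
// to most informative so that std::max_element picks the winning value.
enum FilterKind {
  FilterError,
  FilterAnnotation,
  FilterAnnotationFull,
  FilterAll
};

// Hypothetical stand-in for one annotation line in the input dump.
struct Annotation {
  unsigned InputLine; // input line the annotation is attached to
  bool IsFirstLine;   // true on the starting line of the annotation
  bool IsError;       // true if the marker counts as an error
};

// Resolve the effective filter from every occurrence of the option.
// With no occurrence, fall back to a default that depends on whether the
// dump was requested unconditionally (assumed to be passed in as a bool).
FilterKind resolveFilter(const std::vector<FilterKind> &Requested,
                         bool DumpAlways) {
  if (Requested.empty())
    return DumpAlways ? FilterAll : FilterError;
  return *std::max_element(Requested.begin(), Requested.end());
}

// Decide whether an annotation pulls its input line into the dump.
// Note: with FilterAll every input line is kept, annotated or not, so this
// predicate only matters for the three narrower kinds.
bool includedBy(FilterKind Filter, const Annotation &A) {
  switch (Filter) {
  case FilterAll:
  case FilterAnnotationFull:
    return true;
  case FilterAnnotation:
    return A.IsFirstLine;
  case FilterError:
    return A.IsFirstLine && A.IsError;
  }
  return false;
}
```

Because the result is a maximum over all occurrences, the order of the options does not matter, which is what the "error, annotation => annotation" and FILECHECK_OPTS test cases in the patch exercise.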
================
Comment at: llvm/utils/FileCheck/FileCheck.cpp:455
+ case DumpInputFilterAll:
+ llvm_unreachable("unexpected DumpInputFilterAll");
+ break;
----------------
jdenny wrote:
> mehdi_amini wrote:
> > In a tool like FileCheck I'd rather err on the side of deterministically failing with a `report_fatal_error`
> I don't object in principle, but I see no precedent for this in FileCheck.
>
> Are you ok with this landing as is? If FileCheck should generally use `report_fatal_error` instead of `llvm_unreachable`, I feel like that should be discussed in a separate review for all occurrences.

Given your accept and the tone of your comment, I decided it's safe to land this as is. I'm fine to revert or adjust if you feel this was the wrong decision. And again, I'm open to a larger discussion about making this change throughout FileCheck.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83097/new/

https://reviews.llvm.org/D83097

From llvm-commits at lists.llvm.org Thu Jul 9 01:11:29 2020
From: llvm-commits at lists.llvm.org (Oliver Stannard (Linaro) via Phabricator via llvm-commits)
Date: Thu, 09 Jul 2020 08:11:29 +0000 (UTC)
Subject: [PATCH] D70720: [llvm-objdump] Display locations of variables alongside disassembly
In-Reply-To:
References:
Message-ID: <866e6d39fc4f1ac9ffebc1507b79ea5c@localhost.localdomain>

ostannard added a comment.

This was blocked on D76291 for a while, but that is now committed, so I'll re-land this today.

Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70720/new/ https://reviews.llvm.org/D70720 From llvm-commits at lists.llvm.org Thu Jul 9 01:58:26 2020 From: llvm-commits at lists.llvm.org (Oliver Stannard (Linaro) via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 08:58:26 +0000 (UTC) Subject: [PATCH] D70720: [llvm-objdump] Display locations of variables alongside disassembly In-Reply-To: References: Message-ID: <83f35cbc0b5dd369c3e2fdf257f6c1ad@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGdc4a6f5db4f0: [llvm-objdump] Display locations of variables alongside disassembly (authored by ostannard). Changed prior to commit: https://reviews.llvm.org/D70720?vs=250510&id=276668#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70720/new/ https://reviews.llvm.org/D70720 Files: llvm/docs/CommandGuide/llvm-objdump.rst llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s llvm/tools/llvm-objdump/llvm-objdump.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D70720.276668.patch Type: text/x-patch Size: 94191 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Thu Jul 9 13:52:32 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:52:32 +0000 (UTC) Subject: [PATCH] D82202: [AMDGPU] Return restricted number of regs from TTI In-Reply-To: References: Message-ID: <6e4332f90e78c4f4cbae6a73419ed1cd@localhost.localdomain> rampitec added a comment. Ping. This is mostly NFC for now, but is there any reason to return false info from TTI anyway? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82202/new/ https://reviews.llvm.org/D82202 From llvm-commits at lists.llvm.org Thu Jul 9 13:53:41 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 20:53:41 +0000 (UTC) Subject: [PATCH] D82202: [AMDGPU] Return restricted number of regs from TTI In-Reply-To: References: Message-ID: <8ae336685806dc289488060b7f002894@localhost.localdomain> arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. This is terrible, but if it's already there... CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82202/new/ https://reviews.llvm.org/D82202 From llvm-commits at lists.llvm.org Thu Jul 9 14:32:00 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Thu, 09 Jul 2020 21:32:00 +0000 (UTC) Subject: [PATCH] D82202: [AMDGPU] Return restricted number of regs from TTI In-Reply-To: References: Message-ID: <04d157c802e2870c1aceea79476a4751@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG77f8f813a9ae: [AMDGPU] Return restricted number of regs from TTI (authored by rampitec). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82202/new/ https://reviews.llvm.org/D82202 Files: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h Index: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h +++ llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h @@ -74,6 +74,7 @@ AMDGPUTTIImpl CommonTTI; bool IsGraphicsShader; bool HasFP32Denormals; + unsigned MaxVGPRs; const FeatureBitset InlineFeatureIgnoreList = { // Codegen control options which don't matter. @@ -133,7 +134,11 @@ TLI(ST->getTargetLowering()), CommonTTI(TM, F), IsGraphicsShader(AMDGPU::isShader(F.getCallingConv())), - HasFP32Denormals(AMDGPU::SIModeRegisterDefaults(F).allFP32Denormals()) {} + HasFP32Denormals(AMDGPU::SIModeRegisterDefaults(F).allFP32Denormals()), + MaxVGPRs(ST->getMaxNumVGPRs( + std::max(ST->getWavesPerEU(F).first, + ST->getWavesPerEUForWorkGroup( + ST->getFlatWorkGroupSizes(F).second)))) {} bool hasBranchDivergence() { return true; } bool useGPUDivergenceAnalysis() const; @@ -148,6 +153,7 @@ unsigned getHardwareNumberOfRegisters(bool Vector) const; unsigned getNumberOfRegisters(bool Vector) const; + unsigned getNumberOfRegisters(unsigned RCID) const; unsigned getRegisterBitWidth(bool Vector) const; unsigned getMinVectorRegisterBitWidth() const; unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize, Index: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -239,7 +239,7 @@ unsigned GCNTTIImpl::getHardwareNumberOfRegisters(bool Vec) const { // The concept of vector registers doesn't really exist. Some packed vector // operations operate on the normal 32-bit registers. 
- return 256; + return MaxVGPRs; } unsigned GCNTTIImpl::getNumberOfRegisters(bool Vec) const { @@ -248,6 +248,13 @@ return getHardwareNumberOfRegisters(Vec) >> 3; } +unsigned GCNTTIImpl::getNumberOfRegisters(unsigned RCID) const { + const SIRegisterInfo *TRI = ST->getRegisterInfo(); + const TargetRegisterClass *RC = TRI->getRegClass(RCID); + unsigned NumVGPRs = (TRI->getRegSizeInBits(*RC) + 31) / 32; + return getHardwareNumberOfRegisters(false) / NumVGPRs; +} + unsigned GCNTTIImpl::getRegisterBitWidth(bool Vector) const { return 32; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D82202.276838.patch Type: text/x-patch Size: 2446 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 04:18:27 2020 From: llvm-commits at lists.llvm.org (Sebastian Neubauer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 11:18:27 +0000 (UTC) Subject: [PATCH] D81728: [InstCombine] Add target-specific inst combining In-Reply-To: References: Message-ID: <0de5388c34093c23e219822767215e74@localhost.localdomain> Flakebi updated this revision to Diff 276983. Flakebi added a comment. Rebased (no conflicts this time). Friendly ping for review. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81728/new/ https://reviews.llvm.org/D81728 Files: clang/test/CodeGen/thinlto-distributed-newpm.ll llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/IR/Function.h llvm/include/llvm/Transforms/InstCombine/InstCombiner.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/IR/Function.cpp llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/AMDGPU/CMakeLists.txt llvm/lib/Target/AMDGPU/InstCombineTables.td llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/X86/CMakeLists.txt llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp llvm/lib/Target/X86/X86TargetTransformInfo.h llvm/lib/Transforms/InstCombine/CMakeLists.txt llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp llvm/lib/Transforms/InstCombine/InstCombineInternal.h llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp llvm/lib/Transforms/InstCombine/InstCombineTables.td 
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp llvm/lib/Transforms/InstCombine/InstructionCombining.cpp llvm/test/CodeGen/Thumb2/mve-intrinsics/predicates.ll llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc-multiple.ll llvm/test/CodeGen/Thumb2/mve-vpt-from-intrinsics.ll llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll llvm/test/Transforms/InstCombine/AMDGPU/ldexp.ll llvm/test/Transforms/InstCombine/ARM/mve-v2i2v.ll llvm/test/Transforms/InstCombine/ARM/neon-intrinsics.ll llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll llvm/test/Transforms/InstCombine/X86/X86FsubCmpCombine.ll llvm/test/Transforms/InstCombine/X86/addcarry.ll llvm/test/Transforms/InstCombine/X86/clmulqdq.ll llvm/test/Transforms/InstCombine/X86/x86-avx2.ll llvm/test/Transforms/InstCombine/X86/x86-avx512.ll llvm/test/Transforms/InstCombine/X86/x86-bmi-tbm.ll llvm/test/Transforms/InstCombine/X86/x86-insertps.ll llvm/test/Transforms/InstCombine/X86/x86-masked-memops.ll llvm/test/Transforms/InstCombine/X86/x86-movmsk.ll llvm/test/Transforms/InstCombine/X86/x86-pack.ll llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll llvm/test/Transforms/InstCombine/X86/x86-sse.ll llvm/test/Transforms/InstCombine/X86/x86-sse2.ll llvm/test/Transforms/InstCombine/X86/x86-sse41.ll llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll llvm/test/Transforms/InstCombine/X86/x86-vec_demanded_elts.ll llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll llvm/test/Transforms/InstCombine/X86/x86-xop.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81728.276983.patch Type: text/x-patch Size: 484375 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:08:09 2020 From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:08:09 +0000 (UTC) Subject: [PATCH] D83482: [yaml2obj] - Add a syntax to override e_phoff, e_phentsize and e_phnum fields. In-Reply-To: References: Message-ID: <0d7a8436b5c3218bbdc2f86b0c1bc1d5@localhost.localdomain> grimar updated this revision to Diff 277048. grimar added a comment. - Fix the test, broken by mistake. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83482/new/ https://reviews.llvm.org/D83482 Files: llvm/include/llvm/ObjectYAML/ELFYAML.h llvm/lib/ObjectYAML/ELFEmitter.cpp llvm/lib/ObjectYAML/ELFYAML.cpp llvm/test/tools/yaml2obj/ELF/header-sh-fields.yaml llvm/tools/obj2yaml/elf2yaml.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83482.277048.patch Type: text/x-patch Size: 7949 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:12:13 2020 From: llvm-commits at lists.llvm.org (David Stenberg via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:12:13 +0000 (UTC) Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted In-Reply-To: References: Message-ID: <9c134ac420f62c2fcf07e0deffd15c43@localhost.localdomain> dstenb marked 3 inline comments as done. dstenb added inline comments. ================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1359-1368 + dwarf::Attribute MacrosAttr = getDwarfVersion() >= 5 + ? dwarf::DW_AT_macros + : dwarf::DW_AT_GNU_macros; if (useSplitDwarf()) TheCU.addSectionDelta( - TheCU.getUnitDie(), dwarf::DW_AT_macros, U.getMacroLabelBegin(), + TheCU.getUnitDie(), MacrosAttr, U.getMacroLabelBegin(), TLOF.getDwarfMacroDWOSection()->getBeginSymbol()); ---------------- dblaikie wrote: > Looks like this might be wrong for v4 + split DWARF + using macro? 
Or perhaps this code isn't reachable by that combination?
>
> Might be more clear, then, to sink the MacrosAttr choice down into the "else" clause here, and assert in the split DWARF case that the version >= 5? (possibly including a note about how the pre-v5, GCC debug_macro extension isn't supported with Split DWARF)

Sorry, in what way does this look wrong? If I am not overlooking something, this looks the same as what GCC emits for the attribute in the `-g3 -gdwarf-4 -gsplit-dwarf` case.

Regardless of the above, doing as you suggest and adding an assert seems like a good idea.

================
Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3032-3035
+ Asm->emitULEB128(Type);
+ Asm->OutStreamer->AddComment("Line Number");
+ Asm->emitULEB128(M.getLine());
+ Asm->OutStreamer->AddComment("Macro String");
----------------
dblaikie wrote:
> /might/ be worth pulling these 4 lines out as a lambda to use from the if/else branches, but probably not...
Although there is some code duplication, I think I prefer to keep it as is.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82975/new/

https://reviews.llvm.org/D82975

From llvm-commits at lists.llvm.org Fri Jul 10 08:15:37 2020
From: llvm-commits at lists.llvm.org (George Rimar via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 15:15:37 +0000 (UTC)
Subject: [PATCH] D83559: [test/Object][llvm-objdump] - llvm-objdump: don't abort() when the e_phoff field is invalid and refine testing.
Message-ID:

grimar created this revision.
grimar added reviewers: jhenderson, MaskRay.
Herald added subscribers: rupprecht, emaste.
Herald added a reviewer: espindola.
Herald added a project: LLVM.

llvm-objdump currently calls report_fatal_error() when the e_phoff field is invalid.
This is tested by elf-invalid-phdr.test which has the following issues:

1. It uses a precompiled object.
2. It could be a part of invalid.test.
3. It tests the Object lib, but we have no separate test for llvm-objdump.

This patch addresses issues mentioned. https://reviews.llvm.org/D83559 Files: llvm/test/Object/Inputs/invalid-phdr.elf llvm/test/Object/elf-invalid-phdr.test llvm/test/Object/invalid.test llvm/test/tools/llvm-objdump/ELF/invalid-phdr.test llvm/tools/llvm-objdump/ELFDump.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83559.277049.patch Type: text/x-patch Size: 4610 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:18:03 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:18:03 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <95d31d1de1713b2412577ca0d75c40ca@localhost.localdomain> ggeorgakoudis updated this revision to Diff 277051. ggeorgakoudis added a comment. Herald added a reviewer: sstefan1. Herald added subscribers: okura, bbn, kuter. Herald added a reviewer: baziotis. Update regression tests under Transforms/Attributor/IPConstantProp Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 Files: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp llvm/test/Analysis/CallGraph/ignore-callback-uses.ll llvm/test/Transforms/Attributor/IPConstantProp/arg-count-mismatch.ll llvm/test/Transforms/Attributor/IPConstantProp/arg-type-mismatch.ll llvm/test/Transforms/Attributor/IPConstantProp/dangling-block-address.ll llvm/test/Transforms/Attributor/IPConstantProp/pthreads.ll llvm/test/Transforms/Attributor/IPConstantProp/return-argument.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83370.277051.patch Type: text/x-patch Size: 16704 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:18:04 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:18:04 +0000 (UTC) Subject: [PATCH] D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) In-Reply-To: References: Message-ID: jdoerfert added a comment. In D83269#2137745 , @JonChesterfield wrote: > I think there's slightly more code here than is necessary. > > Specifically, I think identifyKernels should return SmallPtrSetImpl instead of populating a member variable which can later be accessed. With a rename, proposing: > `SmallPtrSetImpl getKernels(Module &M){/*roughly contents of current identifyKernels */}` > > The cache then stores the set by value instead of by reference. Less state lying around, can't accidentally add multiple copies of the name to a single set. Depending on the control flow we might look up the metadata more than once, but that seems fine given it usually goes in a cache. > > Thoughts? We will end up looking at it once per SCC in the program, per invocation of the pass. I would prefer to cache module wide information explicitly and this was the "smallest" solution for this for now. I can do recompute but the `nvvm.annotations` has ~100 (non-kernel) entries from the device runtime we'll have to go through every time. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83269/new/ https://reviews.llvm.org/D83269 From llvm-commits at lists.llvm.org Fri Jul 10 08:19:07 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:19:07 +0000 (UTC) Subject: [PATCH] D57779: [SLP] Add support for throttling. In-Reply-To: References: Message-ID: <8f2933c2bb42e3cc8bd2b9ad7bdf9a0d@localhost.localdomain> ABataev added inline comments. 
================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3286-3295 + for (std::unique_ptr &TEPtr : Tree->VectorizableTree) { + TreeEntry *Entry = TEPtr.get(); + if (Entry->State == TreeEntry::Vectorize) + VecNodes.push_back(Entry); + } + // Canceling unprofitable elements. + for (std::unique_ptr &TEPtr : Tree->VectorizableTree) { ---------------- These two loops can be merged, no? And use `switch` instead of `if`, if possible, after merging ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3289 + if (Entry->State == TreeEntry::Vectorize) + VecNodes.push_back(Entry); + } ---------------- You don't need to push the elements to a new vector here, instead, you can directly perform required actions. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3865-3866 // Gathering cost would be too much for tiny trees. - if (VectorizableTree[0]->State == TreeEntry::NeedToGather || - VectorizableTree[1]->State == TreeEntry::NeedToGather) + if (Tree->VectorizableTree[0]->State == TreeEntry::NeedToGather || + Tree->VectorizableTree[1]->State == TreeEntry::NeedToGather) return false; ---------------- Maybe, better to use `!= TreeEntry::Vectorize` to avoid trees with proposed gathering? ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4084-4088 + for (User *Op : Inst->users()) + if (Tree->ScalarToTreeEntry.find(Op) != Tree->ScalarToTreeEntry.end()) { + NeedGather = true; + break; + } ---------------- `llvm::any_of(Inst->users(), [Tree](User *Op){ return Tree->ScalarToTreeEntry.count(Op) > 0; }` ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4121 + // Avoid reducing the tree if there is no potential room to reduce. 
+ if ((Tree->TreeCost - UserCost - Sum) > -SLPCostThreshold) + return false; ---------------- `>=` ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6419 int Cost = R.getTreeCost(); + unsigned UserCost = 0; CandidateFound = true; ---------------- Do you really need this new var here? I don't see where it is used except as an argument of `R.findSubTree(UserCost)` call ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4108-4111 + int i = 0; + for (auto It = Vec.begin(), E = Vec.end(); It != E; ++It, i++) + if (i>MaxCostsRecalculations) + Vec.erase(It); ---------------- dtemirbulatov wrote: > ABataev wrote: > > Just `Vec.erase(Vec.rbegin(), Vec.rbegin() + (Vec.size() - MaxCostsRecalculations)`? > No, We could not use "Vec.rbegin() + " with std::set. Then just `Vec.erase(Vec.begin() + MaxCostsRecalculations, Vec.end());`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57779/new/ https://reviews.llvm.org/D57779 From llvm-commits at lists.llvm.org Fri Jul 10 08:19:38 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:19:38 +0000 (UTC) Subject: [PATCH] D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors. In-Reply-To: References: Message-ID: <1de193b4b017957139147afb170f6f1d@localhost.localdomain> ABataev updated this revision to Diff 277052. ABataev added a comment. 
Rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57059/new/ https://reviews.llvm.org/D57059 Files: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll llvm/test/Transforms/SLPVectorizer/X86/PR32086.ll llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll llvm/test/Transforms/SLPVectorizer/X86/cse.ll llvm/test/Transforms/SLPVectorizer/X86/extract.ll llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll llvm/test/Transforms/SLPVectorizer/X86/multi_user.ll llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll llvm/test/Transforms/SLPVectorizer/X86/partail.ll llvm/test/Transforms/SLPVectorizer/X86/phi.ll llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll llvm/test/Transforms/SLPVectorizer/X86/resched.ll llvm/test/Transforms/SLPVectorizer/X86/reuse-extracts-in-wider-vect.ll llvm/test/Transforms/SLPVectorizer/X86/rgb_phi.ll llvm/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D57059.277052.patch Type: text/x-patch Size: 215479 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:19:58 2020 From: llvm-commits at lists.llvm.org (Sourabh Singh Tomar via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:19:58 +0000 (UTC) Subject: [PATCH] D83560: [DebugInfo] Added support for DW_OP_implicit_value in llvm Message-ID: SouraVX created this revision. SouraVX added reviewers: aprantl, probinson, dblaikie, jini.susan.george, alok. SouraVX added projects: LLVM, debug-info. Herald added subscribers: llvm-commits, hiraditya. LLVM is missing support for the DW_OP_implicit_value operation. The DW_OP_implicit_value op is indispensable for cases such as optimized-out long double variables. For an introduction, refer to: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions Consider the following example: #include <stdio.h> int main() { long double ld = 3.14; printf("dummy\n"); ld *= ld; return 0; } When compiled with trunk `clang` as `clang test.c -g -O1`, this produces the following location description for variable `ld`: DW_AT_location (0x00000000: [0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value) DW_AT_name ("ld") Here one may notice that this representation is incorrect (the DWARF4 stack can only hold integers, and only up to the size of an address). Here the variable itself is `128` bits in size. GDB and LLDB confirm this: (gdb) p ld $1 = (lldb) frame variable ld (long double) ld = GCC uses DW_OP_implicit_value in these sorts of situations. Based on the discussion with Jakub Jelinek regarding GCC's motivation for using this, I concluded that DW_OP_implicit_value is most appropriate in this case.
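For readers unfamiliar with the operation, here is a minimal sketch of how a DW_OP_implicit_value expression is laid out as described in the DWARF v5 spec: the opcode, a ULEB128 byte count, then the raw bytes of the value. This is not the code from D83560. The helper names `appendULEB128`, `buildImplicitValue` and `longDoubleBytesFor3_14` are made up for illustration, and padding the x87 value to 16 bytes is an assumption of the sketch. The value bytes are taken from the two constants in the location description quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Opcode value as listed in the DWARF v5 specification.
constexpr std::uint8_t DW_OP_implicit_value = 0x9e;

// Append Value to Out as an unsigned LEB128 number.
void appendULEB128(std::vector<std::uint8_t> &Out, std::uint64_t Value) {
  do {
    std::uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // More bytes follow.
    Out.push_back(Byte);
  } while (Value != 0);
}

// Hypothetical helper (not the patch's code): builds
//   DW_OP_implicit_value <ULEB128 size> <size bytes of data>
std::vector<std::uint8_t>
buildImplicitValue(const std::vector<std::uint8_t> &Data) {
  std::vector<std::uint8_t> Expr;
  Expr.push_back(DW_OP_implicit_value);
  appendULEB128(Expr, Data.size());
  Expr.insert(Expr.end(), Data.begin(), Data.end());
  return Expr;
}

// Little-endian x87 image of 3.14, assembled from the constants
// 0xc8f5c28f5c28f800 (low 64 bits) and 0x4000 (high 16 bits) in the quoted
// location description. Zero-padding to 16 bytes is an assumption here.
std::vector<std::uint8_t> longDoubleBytesFor3_14() {
  std::vector<std::uint8_t> Bytes;
  const std::uint64_t Low = 0xc8f5c28f5c28f800ULL;
  for (int I = 0; I < 8; ++I)
    Bytes.push_back(static_cast<std::uint8_t>(Low >> (8 * I)));
  Bytes.push_back(0x00); // Low byte of 0x4000.
  Bytes.push_back(0x40); // High byte of 0x4000.
  Bytes.resize(16, 0);
  return Bytes;
}
```

Calling `buildImplicitValue(longDoubleBytesFor3_14())` yields a single 18-byte expression, which avoids splitting the value into stack-sized pieces the way the DW_OP_constu/DW_OP_piece sequence above does.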
Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html GDB seems happy after this patch:(LLDB doesn't have support for DW_OP_implicit_value) (gdb) p ld p ld $1 = 3.14000000000000012434 Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83560 Files: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h llvm/test/DebugInfo/X86/DW_OP_implicit_value.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83560.277053.patch Type: text/x-patch Size: 6143 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:28:05 2020 From: llvm-commits at lists.llvm.org (Jon Chesterfield via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:28:05 +0000 (UTC) Subject: [PATCH] D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) In-Reply-To: References: Message-ID: <885a144d338226d5e417d7a442a33098@localhost.localdomain> JonChesterfield accepted this revision. JonChesterfield added a comment. This revision is now accepted and ready to land. Fair enough, stateful it is then. 
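The trade-off settled in the D83269 thread (recompute the kernel set on every query, or scan the module-wide annotations once and keep the result) can be sketched roughly as below. This is only an illustration and not the OpenMPOpt code: `ModuleModel`, `getKernels` as written here, and `KernelCache` are hypothetical stand-ins, and the annotations are modelled as plain (function name, kind) string pairs.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of module-level annotations: (function name, kind).
struct ModuleModel {
  std::vector<std::pair<std::string, std::string>> Annotations;
};

// Stateless variant: walks every annotation on each call.
std::set<std::string> getKernels(const ModuleModel &M) {
  std::set<std::string> Kernels;
  for (const auto &A : M.Annotations)
    if (A.second == "kernel")
      Kernels.insert(A.first);
  return Kernels;
}

// Stateful variant: walks the annotations once, then reuses the result.
class KernelCache {
public:
  explicit KernelCache(const ModuleModel &M) : M(M) {}

  const std::set<std::string> &kernels() {
    if (!Computed) {
      Kernels = getKernels(M);
      Computed = true;
      ++NumScans;
    }
    return Kernels;
  }

  // Number of full annotation scans performed so far.
  unsigned numScans() const { return NumScans; }

private:
  const ModuleModel &M;
  std::set<std::string> Kernels;
  bool Computed = false;
  unsigned NumScans = 0;
};
```

With one query per SCC, the stateless variant rescans all annotations (including the non-kernel runtime entries) each time, while the cache scans them once per module.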
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83269/new/ https://reviews.llvm.org/D83269 From llvm-commits at lists.llvm.org Fri Jul 10 08:29:00 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Fri, 10 Jul 2020 08:29:00 -0700 (PDT) Subject: [llvm] 02fec9d - [DAGCombiner] move/rename variables for readability; NFC Message-ID: <5f08893c.1c69fb81.22853.e305@mx.google.com> Author: Sanjay Patel Date: 2020-07-10T11:28:51-04:00 New Revision: 02fec9d2a5f4a6f169bcf2e850eb244fb919309f URL: https://github.com/llvm/llvm-project/commit/02fec9d2a5f4a6f169bcf2e850eb244fb919309f DIFF: https://github.com/llvm/llvm-project/commit/02fec9d2a5f4a6f169bcf2e850eb244fb919309f.diff LOG: [DAGCombiner] move/rename variables for readability; NFC Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: ################################################################################ diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index effd5d6ab7d8..0d84cd89f5ae 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -16551,7 +16551,6 @@ bool DAGCombiner::tryStoreMergeOfLoads(SmallVectorImpl &StoreNodes, unsigned FirstStoreAS = FirstInChain->getAddressSpace(); unsigned FirstStoreAlign = FirstInChain->getAlignment(); LoadSDNode *FirstLoad = cast(LoadNodes[0].MemNode); - unsigned FirstLoadAlign = FirstLoad->getAlignment(); // Scan the memory operations on the chain and find the first // non-consecutive load memory address. 
These variables hold the index in @@ -16565,10 +16564,10 @@ bool DAGCombiner::tryStoreMergeOfLoads(SmallVectorImpl &StoreNodes, bool isDereferenceable = true; bool DoIntegerTruncate = false; StartAddress = LoadNodes[0].OffsetFromBase; - SDValue FirstChain = FirstLoad->getChain(); + SDValue LoadChain = FirstLoad->getChain(); for (unsigned i = 1; i < LoadNodes.size(); ++i) { // All loads must share the same chain. - if (LoadNodes[i].MemNode->getChain() != FirstChain) + if (LoadNodes[i].MemNode->getChain() != LoadChain) break; int64_t CurrAddress = LoadNodes[i].OffsetFromBase; @@ -16645,6 +16644,7 @@ bool DAGCombiner::tryStoreMergeOfLoads(SmallVectorImpl &StoreNodes, // the NumElem refers to array/index size. unsigned NumElem = std::min(NumConsecutiveStores, LastConsecutiveLoad + 1); NumElem = std::min(LastLegalType, NumElem); + unsigned FirstLoadAlign = FirstLoad->getAlignment(); if (NumElem < 2) { // We know that candidate stores are in order and of correct From llvm-commits at lists.llvm.org Fri Jul 10 08:29:03 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Fri, 10 Jul 2020 08:29:03 -0700 (PDT) Subject: [llvm] d84b4e1 - [AArch64][x86] add tests for rotated store merge; NFC Message-ID: <5f08893f.1c69fb81.de81.f5e1@mx.google.com> Author: Sanjay Patel Date: 2020-07-10T11:28:51-04:00 New Revision: d84b4e163da7fba26594f960fca10fa31f7c611a URL: https://github.com/llvm/llvm-project/commit/d84b4e163da7fba26594f960fca10fa31f7c611a DIFF: https://github.com/llvm/llvm-project/commit/d84b4e163da7fba26594f960fca10fa31f7c611a.diff LOG: [AArch64][x86] add tests for rotated store merge; NFC Added: Modified: llvm/test/CodeGen/AArch64/merge-store-dependency.ll llvm/test/CodeGen/X86/stores-merging.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/AArch64/merge-store-dependency.ll b/llvm/test/CodeGen/AArch64/merge-store-dependency.ll index 5613db1e5214..77b7012d2ed1 100644 --- 
a/llvm/test/CodeGen/AArch64/merge-store-dependency.ll +++ b/llvm/test/CodeGen/AArch64/merge-store-dependency.ll @@ -50,7 +50,6 @@ define void @test(%struct1* %fde, i32 %fd, void (i32, i32, i8*)* %func, i8* %arg ; A53-NEXT: // =>This Inner Loop Header: Depth=1 ; A53-NEXT: b .LBB0_4 entry: - %0 = bitcast %struct1* %fde to i8* tail call void @llvm.memset.p0i8.i64(i8* align 8 %0, i8 0, i64 40, i1 false) %state = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 4 @@ -96,6 +95,110 @@ exit: ret void } +define void @rotate16_in_place(i8* %p) { +; A53-LABEL: rotate16_in_place: +; A53: // %bb.0: +; A53-NEXT: ldrb w8, [x0, #1] +; A53-NEXT: ldrb w9, [x0] +; A53-NEXT: strb w8, [x0] +; A53-NEXT: strb w9, [x0, #1] +; A53-NEXT: ret + %p0 = getelementptr i8, i8* %p, i64 0 + %p1 = getelementptr i8, i8* %p, i64 1 + %i0 = load i8, i8* %p0, align 1 + %i1 = load i8, i8* %p1, align 1 + store i8 %i1, i8* %p0, align 1 + store i8 %i0, i8* %p1, align 1 + ret void +} + +define void @rotate16(i8* %p, i8* %q) { +; A53-LABEL: rotate16: +; A53: // %bb.0: +; A53-NEXT: ldrb w8, [x0, #1] +; A53-NEXT: ldrb w9, [x0] +; A53-NEXT: strb w8, [x1] +; A53-NEXT: strb w9, [x1, #1] +; A53-NEXT: ret + %p0 = getelementptr i8, i8* %p, i64 0 + %p1 = getelementptr i8, i8* %p, i64 1 + %q0 = getelementptr i8, i8* %q, i64 0 + %q1 = getelementptr i8, i8* %q, i64 1 + %i0 = load i8, i8* %p0, align 1 + %i1 = load i8, i8* %p1, align 1 + store i8 %i1, i8* %q0, align 1 + store i8 %i0, i8* %q1, align 1 + ret void +} + +define void @rotate32_in_place(i16* %p) { +; A53-LABEL: rotate32_in_place: +; A53: // %bb.0: +; A53-NEXT: ldrh w8, [x0, #2] +; A53-NEXT: ldrh w9, [x0] +; A53-NEXT: strh w8, [x0] +; A53-NEXT: strh w9, [x0, #2] +; A53-NEXT: ret + %p0 = getelementptr i16, i16* %p, i64 0 + %p1 = getelementptr i16, i16* %p, i64 1 + %i0 = load i16, i16* %p0, align 2 + %i1 = load i16, i16* %p1, align 2 + store i16 %i1, i16* %p0, align 2 + store i16 %i0, i16* %p1, align 2 + ret void +} + +define void @rotate32(i16* %p) { +; 
A53-LABEL: rotate32: +; A53: // %bb.0: +; A53-NEXT: ldrh w8, [x0, #2] +; A53-NEXT: ldrh w9, [x0] +; A53-NEXT: strh w8, [x0, #84] +; A53-NEXT: strh w9, [x0, #86] +; A53-NEXT: ret + %p0 = getelementptr i16, i16* %p, i64 0 + %p1 = getelementptr i16, i16* %p, i64 1 + %p42 = getelementptr i16, i16* %p, i64 42 + %p43 = getelementptr i16, i16* %p, i64 43 + %i0 = load i16, i16* %p0, align 2 + %i1 = load i16, i16* %p1, align 2 + store i16 %i1, i16* %p42, align 2 + store i16 %i0, i16* %p43, align 2 + ret void +} + +define void @rotate64_in_place(i32* %p) { +; A53-LABEL: rotate64_in_place: +; A53: // %bb.0: +; A53-NEXT: ldp w9, w8, [x0] +; A53-NEXT: stp w8, w9, [x0] +; A53-NEXT: ret + %p0 = getelementptr i32, i32* %p, i64 0 + %p1 = getelementptr i32, i32* %p, i64 1 + %i0 = load i32, i32* %p0, align 4 + %i1 = load i32, i32* %p1, align 4 + store i32 %i1, i32* %p0, align 4 + store i32 %i0, i32* %p1, align 4 + ret void +} + +define void @rotate64(i32* %p) { +; A53-LABEL: rotate64: +; A53: // %bb.0: +; A53-NEXT: ldp w9, w8, [x0] +; A53-NEXT: stp w8, w9, [x0, #8] +; A53-NEXT: ret + %p0 = getelementptr i32, i32* %p, i64 0 + %p1 = getelementptr i32, i32* %p, i64 1 + %p2 = getelementptr i32, i32* %p, i64 2 + %p3 = getelementptr i32, i32* %p, i64 3 + %i0 = load i32, i32* %p0, align 4 + %i1 = load i32, i32* %p1, align 4 + store i32 %i1, i32* %p2, align 4 + store i32 %i0, i32* %p3, align 4 + ret void +} + declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) declare i32 @fcntl(i32, i32, ...) 
declare noalias i8* @foo() diff --git a/llvm/test/CodeGen/X86/stores-merging.ll b/llvm/test/CodeGen/X86/stores-merging.ll index 6420ac7dc3ed..768684067f32 100644 --- a/llvm/test/CodeGen/X86/stores-merging.ll +++ b/llvm/test/CodeGen/X86/stores-merging.ll @@ -242,3 +242,200 @@ define void @pr43446_1(i8* %a) { store i1 true, i1* %b, align 1 ret void } + +define void @rotate16_in_place(i8* %p) { +; CHECK-LABEL: rotate16_in_place: +; CHECK: # %bb.0: +; CHECK-NEXT: movb (%rdi), %al +; CHECK-NEXT: movb 1(%rdi), %cl +; CHECK-NEXT: movb %cl, (%rdi) +; CHECK-NEXT: movb %al, 1(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i8, i8* %p, i64 0 + %p1 = getelementptr i8, i8* %p, i64 1 + %i0 = load i8, i8* %p0, align 1 + %i1 = load i8, i8* %p1, align 1 + store i8 %i1, i8* %p0, align 1 + store i8 %i0, i8* %p1, align 1 + ret void +} + +define void @rotate16(i8* %p, i8* %q) { +; CHECK-LABEL: rotate16: +; CHECK: # %bb.0: +; CHECK-NEXT: movb (%rdi), %al +; CHECK-NEXT: movb 1(%rdi), %cl +; CHECK-NEXT: movb %cl, (%rsi) +; CHECK-NEXT: movb %al, 1(%rsi) +; CHECK-NEXT: retq + %p0 = getelementptr i8, i8* %p, i64 0 + %p1 = getelementptr i8, i8* %p, i64 1 + %q0 = getelementptr i8, i8* %q, i64 0 + %q1 = getelementptr i8, i8* %q, i64 1 + %i0 = load i8, i8* %p0, align 1 + %i1 = load i8, i8* %p1, align 1 + store i8 %i1, i8* %q0, align 1 + store i8 %i0, i8* %q1, align 1 + ret void +} + +define void @rotate32_in_place(i16* %p) { +; CHECK-LABEL: rotate32_in_place: +; CHECK: # %bb.0: +; CHECK-NEXT: movzwl (%rdi), %eax +; CHECK-NEXT: movzwl 2(%rdi), %ecx +; CHECK-NEXT: movw %cx, (%rdi) +; CHECK-NEXT: movw %ax, 2(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i16, i16* %p, i64 0 + %p1 = getelementptr i16, i16* %p, i64 1 + %i0 = load i16, i16* %p0, align 2 + %i1 = load i16, i16* %p1, align 2 + store i16 %i1, i16* %p0, align 2 + store i16 %i0, i16* %p1, align 2 + ret void +} + +define void @rotate32(i16* %p) { +; CHECK-LABEL: rotate32: +; CHECK: # %bb.0: +; CHECK-NEXT: movzwl (%rdi), %eax +; CHECK-NEXT: 
movzwl 2(%rdi), %ecx +; CHECK-NEXT: movw %cx, 84(%rdi) +; CHECK-NEXT: movw %ax, 86(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i16, i16* %p, i64 0 + %p1 = getelementptr i16, i16* %p, i64 1 + %p42 = getelementptr i16, i16* %p, i64 42 + %p43 = getelementptr i16, i16* %p, i64 43 + %i0 = load i16, i16* %p0, align 2 + %i1 = load i16, i16* %p1, align 2 + store i16 %i1, i16* %p42, align 2 + store i16 %i0, i16* %p43, align 2 + ret void +} + +define void @rotate64_in_place(i32* %p) { +; CHECK-LABEL: rotate64_in_place: +; CHECK: # %bb.0: +; CHECK-NEXT: movl (%rdi), %eax +; CHECK-NEXT: movl 4(%rdi), %ecx +; CHECK-NEXT: movl %ecx, (%rdi) +; CHECK-NEXT: movl %eax, 4(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i32, i32* %p, i64 0 + %p1 = getelementptr i32, i32* %p, i64 1 + %i0 = load i32, i32* %p0, align 4 + %i1 = load i32, i32* %p1, align 4 + store i32 %i1, i32* %p0, align 4 + store i32 %i0, i32* %p1, align 4 + ret void +} + +define void @rotate64(i32* %p) { +; CHECK-LABEL: rotate64: +; CHECK: # %bb.0: +; CHECK-NEXT: movl (%rdi), %eax +; CHECK-NEXT: movl 4(%rdi), %ecx +; CHECK-NEXT: movl %ecx, 8(%rdi) +; CHECK-NEXT: movl %eax, 12(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i32, i32* %p, i64 0 + %p1 = getelementptr i32, i32* %p, i64 1 + %p2 = getelementptr i32, i32* %p, i64 2 + %p3 = getelementptr i32, i32* %p, i64 3 + %i0 = load i32, i32* %p0, align 4 + %i1 = load i32, i32* %p1, align 4 + store i32 %i1, i32* %p2, align 4 + store i32 %i0, i32* %p3, align 4 + ret void +} + +define void @rotate64_iterate(i16* %p) { +; CHECK-LABEL: rotate64_iterate: +; CHECK: # %bb.0: +; CHECK-NEXT: movl (%rdi), %eax +; CHECK-NEXT: movl 4(%rdi), %ecx +; CHECK-NEXT: movl %ecx, 84(%rdi) +; CHECK-NEXT: movl %eax, 88(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i16, i16* %p, i64 0 + %p1 = getelementptr i16, i16* %p, i64 1 + %p2 = getelementptr i16, i16* %p, i64 2 + %p3 = getelementptr i16, i16* %p, i64 3 + %p42 = getelementptr i16, i16* %p, i64 42 + %p43 = getelementptr i16, i16* %p, i64 
43 + %p44 = getelementptr i16, i16* %p, i64 44 + %p45 = getelementptr i16, i16* %p, i64 45 + %i0 = load i16, i16* %p0, align 2 + %i1 = load i16, i16* %p1, align 2 + %i2 = load i16, i16* %p2, align 2 + %i3 = load i16, i16* %p3, align 2 + store i16 %i2, i16* %p42, align 2 + store i16 %i3, i16* %p43, align 2 + store i16 %i0, i16* %p44, align 2 + store i16 %i1, i16* %p45, align 2 + ret void +} + +define void @rotate32_consecutive(i16* %p) { +; CHECK-LABEL: rotate32_consecutive: +; CHECK: # %bb.0: +; CHECK-NEXT: movzwl (%rdi), %eax +; CHECK-NEXT: movzwl 2(%rdi), %ecx +; CHECK-NEXT: movzwl 4(%rdi), %edx +; CHECK-NEXT: movzwl 6(%rdi), %esi +; CHECK-NEXT: movw %cx, 84(%rdi) +; CHECK-NEXT: movw %ax, 86(%rdi) +; CHECK-NEXT: movw %si, 88(%rdi) +; CHECK-NEXT: movw %dx, 90(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i16, i16* %p, i64 0 + %p1 = getelementptr i16, i16* %p, i64 1 + %p2 = getelementptr i16, i16* %p, i64 2 + %p3 = getelementptr i16, i16* %p, i64 3 + %p42 = getelementptr i16, i16* %p, i64 42 + %p43 = getelementptr i16, i16* %p, i64 43 + %p44 = getelementptr i16, i16* %p, i64 44 + %p45 = getelementptr i16, i16* %p, i64 45 + %i0 = load i16, i16* %p0, align 2 + %i1 = load i16, i16* %p1, align 2 + %i2 = load i16, i16* %p2, align 2 + %i3 = load i16, i16* %p3, align 2 + store i16 %i1, i16* %p42, align 2 + store i16 %i0, i16* %p43, align 2 + store i16 %i3, i16* %p44, align 2 + store i16 %i2, i16* %p45, align 2 + ret void +} + +define void @rotate32_twice(i16* %p) { +; CHECK-LABEL: rotate32_twice: +; CHECK: # %bb.0: +; CHECK-NEXT: movzwl (%rdi), %eax +; CHECK-NEXT: movzwl 2(%rdi), %ecx +; CHECK-NEXT: movzwl 4(%rdi), %edx +; CHECK-NEXT: movzwl 6(%rdi), %esi +; CHECK-NEXT: movw %cx, 84(%rdi) +; CHECK-NEXT: movw %ax, 86(%rdi) +; CHECK-NEXT: movw %si, 108(%rdi) +; CHECK-NEXT: movw %dx, 110(%rdi) +; CHECK-NEXT: retq + %p0 = getelementptr i16, i16* %p, i64 0 + %p1 = getelementptr i16, i16* %p, i64 1 + %p2 = getelementptr i16, i16* %p, i64 2 + %p3 = getelementptr i16, i16* %p, 
i64 3 + %p42 = getelementptr i16, i16* %p, i64 42 + %p43 = getelementptr i16, i16* %p, i64 43 + %p54 = getelementptr i16, i16* %p, i64 54 + %p55 = getelementptr i16, i16* %p, i64 55 + %i0 = load i16, i16* %p0, align 2 + %i1 = load i16, i16* %p1, align 2 + %i2 = load i16, i16* %p2, align 2 + %i3 = load i16, i16* %p3, align 2 + store i16 %i1, i16* %p42, align 2 + store i16 %i0, i16* %p43, align 2 + store i16 %i3, i16* %p54, align 2 + store i16 %i2, i16* %p55, align 2 + ret void +} From llvm-commits at lists.llvm.org Fri Jul 10 08:30:16 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:30:16 +0000 (UTC) Subject: [PATCH] D82499: [DAGCombiner] tighten constraints for fma fold In-Reply-To: References: Message-ID: <5af27c9d6f67c8bfa6f52fa313dc5f9b@localhost.localdomain> spatel marked an inline comment as done. spatel added a comment. Any other comments/feedback? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82499/new/ https://reviews.llvm.org/D82499 From llvm-commits at lists.llvm.org Fri Jul 10 08:32:20 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:32:20 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <6ea264801b06ac168aae935f83fffc63@localhost.localdomain> jdoerfert added inline comments. ================ Comment at: llvm/lib/IR/Function.cpp:1499 + continue; + const auto *Call = dyn_cast(FU); ---------------- You might need to inspect the ACS here. Check if it is really a callback callsite. ================ Comment at: llvm/test/Transforms/Attributor/IPConstantProp/pthreads.ll:104 -; IS__CGSCC_NPM-NEXT: entry: -; IS__CGSCC_NPM-NEXT: ret i8* bitcast (i8** @GlobalVPtr to i8*) ; ---------------- TBH, this looks somehow we didn't run the script after a recent update. I'll commit an update to the tests. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Fri Jul 10 08:33:18 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Fri, 10 Jul 2020 08:33:18 -0700 (PDT) Subject: [llvm] eb5c7f6 - [ARM] Add test with tcreturn and debug value. Message-ID: <5f088a3e.1c69fb81.e3a6b.e9a3@mx.google.com> Author: Florian Hahn Date: 2020-07-10T16:32:21+01:00 New Revision: eb5c7f6b8fe0e66bbbc2aa23cc899fa11b750030 URL: https://github.com/llvm/llvm-project/commit/eb5c7f6b8fe0e66bbbc2aa23cc899fa11b750030 DIFF: https://github.com/llvm/llvm-project/commit/eb5c7f6b8fe0e66bbbc2aa23cc899fa11b750030.diff LOG: [ARM] Add test with tcreturn and debug value. In the attached test case, a non-terminator instruction (DBG_VALUE) is inserted after a terminator, producing an invalid MBB. Added: llvm/test/CodeGen/ARM/dbg-tcreturn.ll Modified: Removed: ################################################################################ diff --git a/llvm/test/CodeGen/ARM/dbg-tcreturn.ll b/llvm/test/CodeGen/ARM/dbg-tcreturn.ll new file mode 100644 index 000000000000..9aed1bb14d58 --- /dev/null +++ b/llvm/test/CodeGen/ARM/dbg-tcreturn.ll @@ -0,0 +1,46 @@ +; RUN: llc %s -o - -stop-after=finalize-isel | FileCheck %s + +target datalayout = "e-m:o-p:32:32-Fi8-f64:32:64-v64:32:64-v128:32:128-a:0:32-n32-S32" +target triple = "thumbv7-apple-ios7.0.0" + +; CHECK-LABEL: name: test +; CHECK: body: +; CHECK-NEXT: bb.0.entry: +; CHECK-NEXT: liveins: $r0, $r1 +; CHECK: %1:gpr = COPY $r1 +; CHECK-NEXT: %0:gpr = COPY $r0 +; CHECK-NEXT: $r0 = COPY %0 +; CHECK-NEXT: $r1 = COPY %1 +; CHECK-NEXT: TCRETURNdi &__divsi3, implicit $sp, implicit $r0, implicit $r1 +; CHECK-NEXT: DBG_VALUE $noreg, $noreg, !13, !DIExpression(), debug-location !16 + +define i32 @test(i32 %a1, i32 %a2) !dbg !5 { +entry: + %res = sdiv i32 %a1, %a2 + call void @llvm.dbg.value(metadata i32 %res, metadata !13, metadata 
!DIExpression()), !dbg !16 + ret i32 %res +} + +; Function Attrs: nounwind readnone speculatable willreturn +declare void @llvm.dbg.value(metadata, metadata, metadata) + +!llvm.dbg.cu = !{!0} +!llvm.module.flags = !{!3, !4} + +!0 = distinct !DICompileUnit(language: DW_LANG_Swift, file: !1, producer: "Swift", isOptimized: true, runtimeVersion: 5, emissionKind: FullDebug) +!1 = !DIFile(filename: "foo.swift", directory: "/tmp") +!2 = !{} +!3 = !{i32 2, !"Debug Info Version", i32 3} +!4 = !{i32 1, !"Swift Minor Version", i8 3} +!5 = distinct !DISubprogram(name: "n0", linkageName: "n1", scope: !7, file: !6, line: 86, type: !8, scopeLine: 86, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0) +!6 = !DIFile(filename: "bar.swift", directory: "") +!7 = !DIModule(scope: null, name: "Swift") +!8 = !DISubroutineType(types: !9) +!9 = !{!10, !12} +!10 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "Int", scope: !7, file: !11, size: 32, elements: !2, runtimeLang: DW_LANG_Swift, identifier: "$i1") +!11 = !DIFile(filename: "f1.swift", directory: "") +!12 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "n2", scope: !7, file: !6, size: 32, elements: !2, runtimeLang: DW_LANG_Swift, identifier: "n3") +!13 = !DILocalVariable(name: "n4", scope: !14, file: !1, line: 89, type: !15) +!14 = distinct !DILexicalBlock(scope: !5, file: !6, line: 86, column: 34) +!15 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !10) +!16 = !DILocation(line: 89, column: 9, scope: !14) From llvm-commits at lists.llvm.org Fri Jul 10 08:34:08 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:34:08 +0000 (UTC) Subject: [PATCH] D83469: [LLD][ELF] - Allow relocation sections to appear before their target sections. In-Reply-To: References: Message-ID: <4387cdbcf01651f733fc1d620c1cafd6@localhost.localdomain> MaskRay accepted this revision. MaskRay added a comment. Thanks! 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83469/new/ https://reviews.llvm.org/D83469 From llvm-commits at lists.llvm.org Fri Jul 10 08:34:21 2020 From: llvm-commits at lists.llvm.org (Sanne Wouda via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:34:21 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: <9be481356c9b64c2d49d3fd4c9c902bb@localhost.localdomain> sanwou01 updated this revision to Diff 277055. sanwou01 marked 3 inline comments as done. sanwou01 added a comment. Updates to address feedback, in particular: - use `readonly` attribute to determine memory dependencies between call instructions, and update tests accordingly - add a standalone test that does not rely on -vector-library Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 Files: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll llvm/test/Transforms/SLPVectorizer/vectorizable-functions.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82550.277055.patch Type: text/x-patch Size: 33890 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:34:28 2020 From: llvm-commits at lists.llvm.org (Sanne Wouda via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:34:28 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: sanwou01 added inline comments. ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4545 + } else { + Module *M = F->getParent(); + Type *Tys[] = {FixedVectorType::get(CI->getType(), E->Scalars.size())}; ---------------- fpetrogalli wrote: > `M` is used only inside `getDeclaration`, no need to declare a variable for it. I'm just moving some pre-existing code into an else clause here. 
I can still tidy this up if you prefer, but perhaps in a follow-up NFC? ================ Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5110 + + return true; +} ---------------- fpetrogalli wrote: > I suspect there are many things that may fail here, other than not having a mapping in `VFDatabase` or having pointer arguments. I think it would be safer to reverse the logic, and have the function return false by default, and return true if VFDatabase is not empty and there is no pointer arguments. As discussed below, we can rely on the existing readonly attribute here, so this helper function is no longer needed, and removed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Fri Jul 10 08:38:20 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:38:20 +0000 (UTC) Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header In-Reply-To: References: Message-ID: <6efc51e0f4165adf124df937788b6c37@localhost.localdomain> DiggerLin updated this revision to Diff 277057. DiggerLin marked 4 inline comments as done. DiggerLin added a comment. 
address comment Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82549/new/ https://reviews.llvm.org/D82549 Files: llvm/docs/CommandGuide/llvm-readobj.rst llvm/include/llvm/Object/XCOFFObjectFile.h llvm/lib/Object/XCOFFObjectFile.cpp llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-32-xlc-exec llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-32-xlc-obj.o llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-64-xlc-exec llvm/test/tools/llvm-readobj/XCOFF/Inputs/xcoff-64-xlc-obj.o llvm/test/tools/llvm-readobj/XCOFF/xcoff-auxiliary-header.test llvm/tools/llvm-readobj/ObjDumper.h llvm/tools/llvm-readobj/XCOFFDumper.cpp llvm/tools/llvm-readobj/llvm-readobj.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82549.277057.patch Type: text/x-patch Size: 19422 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:38:26 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:38:26 +0000 (UTC) Subject: [PATCH] D83561: [ScheduleDAG] Move DBG_VALUEs after first term forward. Message-ID: fhahn created this revision. fhahn added reviewers: vsk, aprantl, jpaquette, efriedma. Herald added subscribers: hiraditya, kristof.beyls, MatzeB. Herald added a reviewer: paquette. Herald added a project: LLVM. MBBs are not allowed to have non-terminator instructions after the first terminator. Currently in some cases (see the modified test), EmitSchedule can add DBG_VALUEs after the last terminator, for example when referring to a debug value that gets folded into a TCRETURN instruction on ARM. This patch updates EmitSchedule to move inserted DBG_VALUEs just before the first terminator. I am not sure whether there are terminators that produce values that can in turn be used by a DBG_VALUE. In that case, moving the DBG_VALUE might result in referencing an undefined register. 
But in any case, it seems like currently there is no way to insert a proper DBG_VALUEs for such registers anyways. Alternatively it might make sense to just remove those extra DBG_VALUES. I am not too familiar with the details of debug info in the backend and would appreciate any suggestions on how to address the issue in the best possible way. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83561 Files: llvm/include/llvm/CodeGen/MachineInstr.h llvm/lib/CodeGen/MachineInstr.cpp llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp llvm/test/CodeGen/ARM/dbg-tcreturn.ll Index: llvm/test/CodeGen/ARM/dbg-tcreturn.ll =================================================================== --- llvm/test/CodeGen/ARM/dbg-tcreturn.ll +++ llvm/test/CodeGen/ARM/dbg-tcreturn.ll @@ -1,4 +1,4 @@ -; RUN: llc %s -o - -stop-after=finalize-isel | FileCheck %s +; RUN: llc %s -o - -stop-after=finalize-isel -verify-machineinstrs | FileCheck %s target datalayout = "e-m:o-p:32:32-Fi8-f64:32:64-v64:32:64-v128:32:128-a:0:32-n32-S32" target triple = "thumbv7-apple-ios7.0.0" @@ -11,8 +11,8 @@ ; CHECK-NEXT: %0:gpr = COPY $r0 ; CHECK-NEXT: $r0 = COPY %0 ; CHECK-NEXT: $r1 = COPY %1 -; CHECK-NEXT: TCRETURNdi &__divsi3, implicit $sp, implicit $r0, implicit $r1 ; CHECK-NEXT: DBG_VALUE $noreg, $noreg, !13, !DIExpression(), debug-location !16 +; CHECK-NEXT: TCRETURNdi &__divsi3, implicit $sp, implicit $r0, implicit $r1 define i32 @test(i32 %a1, i32 %a2) !dbg !5 { entry: Index: llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp +++ llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp @@ -1034,7 +1034,25 @@ } InsertPos = Emitter.getInsertPos(); - return Emitter.getBlock(); + // In some cases, DBG_VALUEs might be inserted after the first terminator, + // which results in an invalid MBB. If that happens, move the DBG_VALUEs + // before the first terminator. 
+ MachineBasicBlock *InsertBB = Emitter.getBlock(); + auto FirstTerm = InsertBB->getFirstTerminator(); + if (FirstTerm != InsertBB->end()) { + assert(!FirstTerm->isDebugValue() && + "first terminator cannot be a debug value"); + for (MachineInstr &MI : make_early_inc_range( + make_range(std::next(FirstTerm), InsertBB->end()))) { + if (!MI.isDebugValue()) + continue; + + if (&MI == InsertPos) + InsertPos = std::prev(InsertPos->getIterator()); + MI.moveBefore(&*FirstTerm); + } + } + return InsertBB; } /// Return the basic block label. Index: llvm/lib/CodeGen/MachineInstr.cpp =================================================================== --- llvm/lib/CodeGen/MachineInstr.cpp +++ llvm/lib/CodeGen/MachineInstr.cpp @@ -147,6 +147,10 @@ setFlags(MI.Flags); } +void MachineInstr::moveBefore(MachineInstr *MovePos) { + MovePos->getParent()->splice(MovePos, getParent(), getIterator()); +} + /// getRegInfo - If this instruction is embedded into a MachineFunction, /// return the MachineRegisterInfo object for the current function, otherwise /// return null. Index: llvm/include/llvm/CodeGen/MachineInstr.h =================================================================== --- llvm/include/llvm/CodeGen/MachineInstr.h +++ llvm/include/llvm/CodeGen/MachineInstr.h @@ -280,6 +280,9 @@ const MachineBasicBlock* getParent() const { return Parent; } MachineBasicBlock* getParent() { return Parent; } + // Move the instruction before \p MovePos. + void moveBefore(MachineInstr *MovePos); + /// Return the function that contains the basic block that this instruction /// belongs to. /// -------------- next part -------------- A non-text attachment was scrubbed... Name: D83561.277058.patch Type: text/x-patch Size: 3132 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:38:27 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:38:27 +0000 (UTC) Subject: [PATCH] D81647: MIR Statepoint refactoring. 
Part 3: Spill GC Ptr regs. In-Reply-To: References: Message-ID: <3f67aab1d42f1482f0d57ca4d163474b@localhost.localdomain> dantrushin updated this revision to Diff 277059. dantrushin added a comment. Restore accidentally destroyed review. Change handling of shared landing pads: we now reserve spill slots as we go, upon encountering the first statepoint jumping to that landing pad. Slightly modified tests to be less sensitive to unimportant changes (spill ordering). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81647/new/ https://reviews.llvm.org/D81647 Files: llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp llvm/test/CodeGen/X86/statepoint-fixup-call.mir llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir llvm/test/CodeGen/X86/statepoint-fixup-shared-ehpad.mir llvm/test/CodeGen/X86/statepoint-vreg.mir -------------- next part -------------- A non-text attachment was scrubbed... Name: D81647.277059.patch Type: text/x-patch Size: 39084 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:39:40 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:39:40 +0000 (UTC) Subject: [PATCH] D82502: [PowerPC][Power10] Implement Load VSX Vector and Sign Extend and Zero Extend In-Reply-To: References: Message-ID: <377084893c089599678ef3e6412bfbc7@localhost.localdomain> amyk added a comment. Please update this patch to remove the instruction defs and MC tests. Also, you can update the patch to put your backend llc tests in the file I've introduced in: https://reviews.llvm.org/D82467 ================ Comment at: clang/test/CodeGen/builtins-ppc-p10vector.c:14 -#include +#include "altivec.h" ---------------- unintended change? 
================ Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14165 + // The width of the narrow type becomes an operand of the LXVRZX node + SDValue Width = ; + SDValue LoadOps[] = {LD->getChain(), LD->getBasePtr(), DAG.getIntPtrConstant(MemoryType.getScalarSizeInBits(), dl)}; ---------------- You did not assign anything here? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82502/new/ https://reviews.llvm.org/D82502 From llvm-commits at lists.llvm.org Fri Jul 10 08:39:49 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:39:49 +0000 (UTC) Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header In-Reply-To: References: Message-ID: <803a1ce015120578364baa77e46aae0d@localhost.localdomain> DiggerLin added inline comments. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:486 +#define PrintAuxMember(H, S, T, X) \ + W.print##H(S, T); \ ---------------- hubert.reinterpretcast wrote: > This macro does not operate within the confines of what a function can do with respect to its caller (it can cause the caller to return early). I do not believe that using a function-like naming style is appropriate. I also do not believe that using such a macro for control flow is desirable. > > You can encode a table (yes, a macro is okay for that) with much the same information: > (format, description, pointer-to-member, offset in the table past-the-end of the member) > > and use that table in the place where this macro is being invoked. printNumber is an overloaded function. With a macro, the compiler determines at compile time which overload of printNumber is called. With a table, I think it may be complicated to make the code call the correct overload of printNumber based on the parameter type at run time. 
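A minimal sketch of the table idea raised in this thread, not the code from D82549. `AuxHeader`, `Writer`, `FieldEntry`, `printMember`, `AUX_FIELD` and `printFields` are all hypothetical names invented for illustration, and the struct is not the real XCOFF layout. The assumption is that a small function template per member keeps overload selection for `printNumber` at compile time, while each table row carries the past-the-end offset so that a length covering only part of a field is detected instead of printed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the auxiliary header; not the real XCOFF layout.
struct AuxHeader {
  uint16_t Magic;
  uint16_t Version;
  uint32_t TextSize;
};

// Hypothetical stand-in for a printer with an overloaded printNumber.
struct Writer {
  std::vector<std::string> Lines;
  void printNumber(const char *Label, uint16_t V) {
    Lines.push_back(std::string(Label) + ": u16 " + std::to_string(V));
  }
  void printNumber(const char *Label, uint32_t V) {
    Lines.push_back(std::string(Label) + ": u32 " + std::to_string(V));
  }
};

// The overload of printNumber is chosen here, at compile time, from T.
template <typename T, T AuxHeader::*Member>
void printMember(Writer &W, const char *Label, const AuxHeader &H) {
  W.printNumber(Label, H.*Member);
}

// One table row: description, printer, offset just past the end of the member.
struct FieldEntry {
  const char *Label;
  void (*Print)(Writer &, const char *, const AuxHeader &);
  size_t EndOffset;
};

// The macro only builds table rows; it performs no control flow.
#define AUX_FIELD(Type, Name)                                                  \
  {#Name, &printMember<Type, &AuxHeader::Name>,                                \
   offsetof(AuxHeader, Name) + sizeof(Type)}

static const FieldEntry AuxFields[] = {
    AUX_FIELD(uint16_t, Magic),
    AUX_FIELD(uint16_t, Version),
    AUX_FIELD(uint32_t, TextSize),
};

// Prints every field that fits entirely within AuxSize bytes. Returns true
// only if AuxSize ends exactly on the boundary of the last printed field.
bool printFields(Writer &W, const AuxHeader &H, size_t AuxSize) {
  size_t Covered = 0;
  for (const FieldEntry &F : AuxFields) {
    if (F.EndOffset > AuxSize)
      break;
    F.Print(W, F.Label, H);
    Covered = F.EndOffset;
  }
  return Covered == AuxSize;
}
```

In this sketch the early exit lives in an ordinary function rather than in a macro, so the caller decides what to do when `printFields` reports a partial field.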
================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:491 + W.print##H(S, T); \ + if ((X = X - sizeof(T)) == 0) \ + return ---------------- hubert.reinterpretcast wrote: > DiggerLin wrote: > > hubert.reinterpretcast wrote: > > > This strikes me as extremely hazardous. What if we get a length value that is reflective of a partial field? > > thanks > We still have to build with C++14 compilers for the time being. Assigning a large 64-bit value to a 32-bit signed type is verboten. In any case, checking the table size against the last field of the table I described above would avoid this issue. thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82549/new/ https://reviews.llvm.org/D82549 From llvm-commits at lists.llvm.org Fri Jul 10 08:40:34 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Fri, 10 Jul 2020 08:40:34 -0700 (PDT) Subject: [llvm] ec00aa9 - [DomTreeUpdater] Use const auto * when iterating over pointers (NFC). Message-ID: <5f088bf2.1c69fb81.7e178.fd99@mx.google.com> Author: Florian Hahn Date: 2020-07-10T16:39:15+01:00 New Revision: ec00aa99dd4cd46c328219cffefcfdf8a8bd53a0 URL: https://github.com/llvm/llvm-project/commit/ec00aa99dd4cd46c328219cffefcfdf8a8bd53a0 DIFF: https://github.com/llvm/llvm-project/commit/ec00aa99dd4cd46c328219cffefcfdf8a8bd53a0.diff LOG: [DomTreeUpdater] Use const auto * when iterating over pointers (NFC). This silences the warning below: llvm-project/llvm/lib/Analysis/DomTreeUpdater.cpp:510:20: warning: loop variable 'BB' is always a copy because the range of type 'const SmallPtrSet' does not return a reference [-Wrange-loop-analysis] for (const auto &BB : DeletedBBs) { ^ llvm-project/llvm/lib/Analysis/DomTreeUpdater.cpp:510:8: note: use non-reference type 'llvm::BasicBlock *' for (const auto &BB : DeletedBBs) { ^~~~~~~~~~~~~~~~ 1 warning generated. 
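A small standalone sketch of the idiom this commit adopts, under stated assumptions: `Block` and `dumpDeleted` are hypothetical, and `std::vector` stands in for `SmallPtrSet` so the example builds without LLVM headers. The warning quoted above depends on the container's iterator returning by value, so it would not necessarily fire for `std::vector`; the point is only that `const auto *` copies a pointer and states the element's pointer type explicitly.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block with an optional name.
struct Block {
  std::string Name;
};

// Mirrors the shape of the dump loop: the elements are pointers, so the loop
// variable is declared as a pointer to const rather than a reference.
std::string dumpDeleted(const std::vector<Block *> &DeletedBBs) {
  std::string OS;
  int Index = 0;
  for (const auto *BB : DeletedBBs) {
    OS += "  " + std::to_string(Index) + " : ";
    ++Index;
    if (!BB->Name.empty())
      OS += BB->Name;
    else
      OS += "(no name)";
    OS += "\n";
  }
  return OS;
}
```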
Added: Modified: llvm/lib/Analysis/DomTreeUpdater.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/DomTreeUpdater.cpp b/llvm/lib/Analysis/DomTreeUpdater.cpp index 26e637bb6d99..9594da0a4f91 100644 --- a/llvm/lib/Analysis/DomTreeUpdater.cpp +++ b/llvm/lib/Analysis/DomTreeUpdater.cpp @@ -507,7 +507,7 @@ LLVM_DUMP_METHOD void DomTreeUpdater::dump() const { OS << "Pending DeletedBBs:\n"; Index = 0; - for (const auto &BB : DeletedBBs) { + for (const auto *BB : DeletedBBs) { OS << " " << Index << " : "; ++Index; if (BB->hasName()) From llvm-commits at lists.llvm.org Fri Jul 10 08:41:12 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:41:12 +0000 (UTC) Subject: [PATCH] D83504: [PowerPC] Implement R_PPC64_REL24_NOTOC local calls. callee has a TOC In-Reply-To: References: Message-ID: MaskRay added inline comments. ================ Comment at: lld/ELF/Thunks.cpp:981 + if (type == R_PPC64_REL24_NOTOC && (s.stOther >> 5) > 1) + return make(s); ---------------- What if `(s.stOther >> 5) == 1`? ================ Comment at: lld/test/ELF/ppc64-pcrel-call-to-toc.s:18 + +## The test is created to check that when a function without TOC access a +## local function using TOC, a r12 setup stub is inserted. ---------------- `The test is created to check that` can be omitted. `local` is not accurate. ## When a function without TOC accesses a function using TOC, an r12 setup stub is inserted. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83504/new/ https://reviews.llvm.org/D83504 From llvm-commits at lists.llvm.org Fri Jul 10 08:41:20 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Fri, 10 Jul 2020 08:41:20 -0700 (PDT) Subject: [llvm] 43d8d59 - [Attributor][NFC] Update tests after recent changes Message-ID: <5f088c20.1c69fb81.101de.e3fe@mx.google.com> Author: Johannes Doerfert Date: 2020-07-10T10:39:32-05:00 New Revision: 43d8d59d6d2c03165a219022430e3b1cbf6351a2 URL: https://github.com/llvm/llvm-project/commit/43d8d59d6d2c03165a219022430e3b1cbf6351a2 DIFF: https://github.com/llvm/llvm-project/commit/43d8d59d6d2c03165a219022430e3b1cbf6351a2.diff LOG: [Attributor][NFC] Update tests after recent changes Attributor tests are mostly updated using the auto upgrade scripts but sometimes we forget. If we do it manually or continue using old check lines that still match we see unrelated changes down the line. This is just a cleanup. 
Added: Modified: llvm/test/Transforms/Attributor/ArgumentPromotion/2008-02-01-ReturnAttrs.ll llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll llvm/test/Transforms/Attributor/ArgumentPromotion/alignment.ll llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll llvm/test/Transforms/Attributor/ArgumentPromotion/byval.ll llvm/test/Transforms/Attributor/ArgumentPromotion/control-flow2.ll llvm/test/Transforms/Attributor/ArgumentPromotion/crash.ll llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll llvm/test/Transforms/Attributor/ArgumentPromotion/pr33641_remove_arg_dbgvalue.ll llvm/test/Transforms/Attributor/ArgumentPromotion/profile.ll llvm/test/Transforms/Attributor/ArgumentPromotion/sret.ll llvm/test/Transforms/Attributor/IPConstantProp/dangling-block-address.ll llvm/test/Transforms/Attributor/IPConstantProp/pthreads.ll llvm/test/Transforms/Attributor/IPConstantProp/return-argument.ll llvm/test/Transforms/Attributor/heap_to_stack.ll llvm/test/Transforms/Attributor/internal-noalias.ll llvm/test/Transforms/Attributor/liveness.ll llvm/test/Transforms/Attributor/memory_locations.ll llvm/test/Transforms/Attributor/misc_crash.ll llvm/test/Transforms/Attributor/readattrs.ll llvm/test/Transforms/Attributor/undefined_behavior.ll llvm/test/Transforms/Attributor/value-simplify.ll Removed: ################################################################################ diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/2008-02-01-ReturnAttrs.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/2008-02-01-ReturnAttrs.ll index a963c3a31c37..b943af621940 100644 --- a/llvm/test/Transforms/Attributor/ArgumentPromotion/2008-02-01-ReturnAttrs.ll +++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/2008-02-01-ReturnAttrs.ll @@ -14,8 +14,8 @@ define internal i32 
@deref(i32* %x) nounwind { ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@deref ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: entry: -; IS__TUNIT_NPM-NEXT: [[X_PRIV:%.*]] = alloca i32 -; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[X_PRIV]] +; IS__TUNIT_NPM-NEXT: [[X_PRIV:%.*]] = alloca i32, align 4 +; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[X_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[TMP2:%.*]] = load i32, i32* [[X_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: ret i32 [[TMP2]] ; @@ -34,7 +34,7 @@ define i32 @f(i32 %x) { ; NOT_TUNIT_NPM-LABEL: define {{[^@]+}}@f ; NOT_TUNIT_NPM-SAME: (i32 [[X:%.*]]) ; NOT_TUNIT_NPM-NEXT: entry: -; NOT_TUNIT_NPM-NEXT: [[X_ADDR:%.*]] = alloca i32 +; NOT_TUNIT_NPM-NEXT: [[X_ADDR:%.*]] = alloca i32, align 4 ; NOT_TUNIT_NPM-NEXT: store i32 [[X]], i32* [[X_ADDR]], align 4 ; NOT_TUNIT_NPM-NEXT: [[TMP1:%.*]] = call i32 @deref(i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[X_ADDR]]) ; NOT_TUNIT_NPM-NEXT: ret i32 [[TMP1]] @@ -42,7 +42,7 @@ define i32 @f(i32 %x) { ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@f ; IS__TUNIT_NPM-SAME: (i32 [[X:%.*]]) ; IS__TUNIT_NPM-NEXT: entry: -; IS__TUNIT_NPM-NEXT: [[X_ADDR:%.*]] = alloca i32 +; IS__TUNIT_NPM-NEXT: [[X_ADDR:%.*]] = alloca i32, align 4 ; IS__TUNIT_NPM-NEXT: store i32 [[X]], i32* [[X_ADDR]], align 4 ; IS__TUNIT_NPM-NEXT: [[TMP0:%.*]] = load i32, i32* [[X_ADDR]], align 4 ; IS__TUNIT_NPM-NEXT: [[TMP1:%.*]] = call i32 @deref(i32 [[TMP0]]) diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll index f0aeb8d15add..49342ae4f04c 100644 --- a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll +++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll @@ -100,8 +100,8 @@ define internal fastcc void @promote_avx2(<4 x i64>* %arg, <4 x i64>* readonly % ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@promote_avx2 ; IS__TUNIT_NPM-SAME: (<4 x i64>* noalias 
nocapture nofree nonnull writeonly align 32 dereferenceable(32) [[ARG:%.*]], <4 x i64> [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: bb: -; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <4 x i64> -; IS__TUNIT_NPM-NEXT: store <4 x i64> [[TMP0]], <4 x i64>* [[ARG1_PRIV]] +; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <4 x i64>, align 32 +; IS__TUNIT_NPM-NEXT: store <4 x i64> [[TMP0]], <4 x i64>* [[ARG1_PRIV]], align 32 ; IS__TUNIT_NPM-NEXT: [[TMP:%.*]] = load <4 x i64>, <4 x i64>* [[ARG1_PRIV]], align 32 ; IS__TUNIT_NPM-NEXT: store <4 x i64> [[TMP]], <4 x i64>* [[ARG]], align 32 ; IS__TUNIT_NPM-NEXT: ret void diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll index e3a33c6121e6..361dfa0cd592 100644 --- a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll +++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll @@ -21,8 +21,8 @@ define internal fastcc void @callee_avx512_legal512_prefer512_call_avx512_legal5 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer512_call_avx512_legal512_prefer512 ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: bb: -; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64> -; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]] +; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64>, align 64 +; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: [[TMP:%.*]] = load <8 x i64>, <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 ; IS__TUNIT_NPM-NEXT: ret void @@ -108,8 +108,8 @@ define internal fastcc void @callee_avx512_legal512_prefer256_call_avx512_legal5 ; IS__TUNIT_NPM-LABEL: define 
{{[^@]+}}@callee_avx512_legal512_prefer256_call_avx512_legal512_prefer256 ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: bb: -; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64> -; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]] +; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64>, align 64 +; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: [[TMP:%.*]] = load <8 x i64>, <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 ; IS__TUNIT_NPM-NEXT: ret void @@ -195,8 +195,8 @@ define internal fastcc void @callee_avx512_legal512_prefer512_call_avx512_legal5 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer512_call_avx512_legal512_prefer256 ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: bb: -; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64> -; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]] +; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64>, align 64 +; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: [[TMP:%.*]] = load <8 x i64>, <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 ; IS__TUNIT_NPM-NEXT: ret void @@ -282,8 +282,8 @@ define internal fastcc void @callee_avx512_legal512_prefer256_call_avx512_legal5 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer256_call_avx512_legal512_prefer512 ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: bb: -; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = 
alloca <8 x i64> -; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]] +; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64>, align 64 +; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: [[TMP:%.*]] = load <8 x i64>, <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 ; IS__TUNIT_NPM-NEXT: ret void @@ -537,8 +537,8 @@ define internal fastcc void @callee_avx2_legal256_prefer256_call_avx2_legal512_p ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee_avx2_legal256_prefer256_call_avx2_legal512_prefer256 ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: bb: -; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64> -; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]] +; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64>, align 64 +; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: [[TMP:%.*]] = load <8 x i64>, <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 ; IS__TUNIT_NPM-NEXT: ret void @@ -624,8 +624,8 @@ define internal fastcc void @callee_avx2_legal512_prefer256_call_avx2_legal256_p ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee_avx2_legal512_prefer256_call_avx2_legal256_prefer256 ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) ; IS__TUNIT_NPM-NEXT: bb: -; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64> -; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]] +; IS__TUNIT_NPM-NEXT: [[ARG1_PRIV:%.*]] = alloca <8 x i64>, align 64 +; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP0]], <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: [[TMP:%.*]] = load <8 x 
i64>, <8 x i64>* [[ARG1_PRIV]], align 64 ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 ; IS__TUNIT_NPM-NEXT: ret void diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/alignment.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/alignment.ll index c869ba50874b..0367a5e42ec1 100644 --- a/llvm/test/Transforms/Attributor/ArgumentPromotion/alignment.ll +++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/alignment.ll @@ -33,8 +33,8 @@ define internal void @g(i32* %a) { ; ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@g ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]]) -; IS__TUNIT_NPM-NEXT: [[A_PRIV:%.*]] = alloca i32 -; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[A_PRIV]] +; IS__TUNIT_NPM-NEXT: [[A_PRIV:%.*]] = alloca i32, align 4 +; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[A_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[AA:%.*]] = load i32, i32* [[A_PRIV]], align 1 ; IS__TUNIT_NPM-NEXT: call void @z(i32 [[AA]]) ; IS__TUNIT_NPM-NEXT: ret void @@ -70,10 +70,10 @@ define internal i32 @test(i32* %X, i64* %Y) { ; ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@test ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]], i64 [[TMP1:%.*]]) -; IS__TUNIT_NPM-NEXT: [[Y_PRIV:%.*]] = alloca i64 -; IS__TUNIT_NPM-NEXT: store i64 [[TMP1]], i64* [[Y_PRIV]] -; IS__TUNIT_NPM-NEXT: [[X_PRIV:%.*]] = alloca i32 -; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[X_PRIV]] +; IS__TUNIT_NPM-NEXT: [[Y_PRIV:%.*]] = alloca i64, align 8 +; IS__TUNIT_NPM-NEXT: store i64 [[TMP1]], i64* [[Y_PRIV]], align 4 +; IS__TUNIT_NPM-NEXT: [[X_PRIV:%.*]] = alloca i32, align 4 +; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[X_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[A:%.*]] = load i32, i32* [[X_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[B:%.*]] = load i64, i64* [[Y_PRIV]], align 8 ; IS__TUNIT_NPM-NEXT: [[C:%.*]] = add i32 [[A]], 1 @@ -113,16 +113,16 @@ Return2: define internal i32 @caller(i32* %A) { ; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@caller ; IS__TUNIT_OPM-SAME: (i32* noalias nocapture nofree 
nonnull readonly align 4 dereferenceable(4) [[A:%.*]]) -; IS__TUNIT_OPM-NEXT: [[B:%.*]] = alloca i64 +; IS__TUNIT_OPM-NEXT: [[B:%.*]] = alloca i64, align 8 ; IS__TUNIT_OPM-NEXT: store i64 1, i64* [[B]], align 8 ; IS__TUNIT_OPM-NEXT: [[C:%.*]] = call i32 @test(i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[A]], i64* noalias nocapture nofree nonnull readonly align 8 dereferenceable(8) [[B]]) ; IS__TUNIT_OPM-NEXT: ret i32 [[C]] ; ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@caller ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]]) -; IS__TUNIT_NPM-NEXT: [[A_PRIV:%.*]] = alloca i32 -; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[A_PRIV]] -; IS__TUNIT_NPM-NEXT: [[B:%.*]] = alloca i64 +; IS__TUNIT_NPM-NEXT: [[A_PRIV:%.*]] = alloca i32, align 4 +; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[A_PRIV]], align 4 +; IS__TUNIT_NPM-NEXT: [[B:%.*]] = alloca i64, align 8 ; IS__TUNIT_NPM-NEXT: store i64 1, i64* [[B]], align 8 ; IS__TUNIT_NPM-NEXT: [[TMP2:%.*]] = load i32, i32* [[A_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[TMP3:%.*]] = load i64, i64* [[B]], align 8 @@ -131,7 +131,7 @@ define internal i32 @caller(i32* %A) { ; ; IS__CGSCC____-LABEL: define {{[^@]+}}@caller ; IS__CGSCC____-SAME: (i32* nocapture nofree nonnull readonly align 4 dereferenceable(4) [[A:%.*]]) -; IS__CGSCC____-NEXT: [[B:%.*]] = alloca i64 +; IS__CGSCC____-NEXT: [[B:%.*]] = alloca i64, align 8 ; IS__CGSCC____-NEXT: store i64 1, i64* [[B]], align 8 ; IS__CGSCC____-NEXT: [[C:%.*]] = call i32 @test(i32* nocapture nofree nonnull readonly align 4 dereferenceable(4) [[A]], i64* noalias nocapture nofree nonnull readonly align 8 dereferenceable(8) [[B]]) ; IS__CGSCC____-NEXT: ret i32 [[C]] @@ -144,13 +144,13 @@ define internal i32 @caller(i32* %A) { define i32 @callercaller() { ; NOT_TUNIT_NPM-LABEL: define {{[^@]+}}@callercaller() -; NOT_TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32 +; NOT_TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32, align 4 ; NOT_TUNIT_NPM-NEXT: store i32 2, i32* [[B]], align 4 ; 
NOT_TUNIT_NPM-NEXT: [[X:%.*]] = call i32 @caller(i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[B]]) ; NOT_TUNIT_NPM-NEXT: ret i32 [[X]] ; ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callercaller() -; IS__TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32 +; IS__TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32, align 4 ; IS__TUNIT_NPM-NEXT: store i32 2, i32* [[B]], align 4 ; IS__TUNIT_NPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[B]], align 4 ; IS__TUNIT_NPM-NEXT: [[X:%.*]] = call i32 @caller(i32 [[TMP1]]) diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll index 3877097e127e..3519c96a731a 100644 --- a/llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll +++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll @@ -15,10 +15,10 @@ define internal i32 @test(i32* %X, i32* %Y) { ; ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@test ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]], i32 [[TMP1:%.*]]) -; IS__TUNIT_NPM-NEXT: [[Y_PRIV:%.*]] = alloca i32 -; IS__TUNIT_NPM-NEXT: store i32 [[TMP1]], i32* [[Y_PRIV]] -; IS__TUNIT_NPM-NEXT: [[X_PRIV:%.*]] = alloca i32 -; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[X_PRIV]] +; IS__TUNIT_NPM-NEXT: [[Y_PRIV:%.*]] = alloca i32, align 4 +; IS__TUNIT_NPM-NEXT: store i32 [[TMP1]], i32* [[Y_PRIV]], align 4 +; IS__TUNIT_NPM-NEXT: [[X_PRIV:%.*]] = alloca i32, align 4 +; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[X_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[A:%.*]] = load i32, i32* [[X_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[B:%.*]] = load i32, i32* [[Y_PRIV]], align 4 ; IS__TUNIT_NPM-NEXT: [[C:%.*]] = add i32 [[A]], [[B]] @@ -40,16 +40,16 @@ define internal i32 @test(i32* %X, i32* %Y) { define internal i32 @caller(i32* %B) { ; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@caller ; IS__TUNIT_OPM-SAME: (i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[B:%.*]]) -; IS__TUNIT_OPM-NEXT: [[A:%.*]] = alloca i32 +; IS__TUNIT_OPM-NEXT: 
[[A:%.*]] = alloca i32, align 4 ; IS__TUNIT_OPM-NEXT: store i32 1, i32* [[A]], align 4 ; IS__TUNIT_OPM-NEXT: [[C:%.*]] = call i32 @test(i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[A]], i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[B]]) ; IS__TUNIT_OPM-NEXT: ret i32 [[C]] ; ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@caller ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]]) -; IS__TUNIT_NPM-NEXT: [[B_PRIV:%.*]] = alloca i32 -; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[B_PRIV]] -; IS__TUNIT_NPM-NEXT: [[A:%.*]] = alloca i32 +; IS__TUNIT_NPM-NEXT: [[B_PRIV:%.*]] = alloca i32, align 4 +; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[B_PRIV]], align 4 +; IS__TUNIT_NPM-NEXT: [[A:%.*]] = alloca i32, align 4 ; IS__TUNIT_NPM-NEXT: store i32 1, i32* [[A]], align 4 ; IS__TUNIT_NPM-NEXT: [[TMP2:%.*]] = load i32, i32* [[A]], align 4 ; IS__TUNIT_NPM-NEXT: [[TMP3:%.*]] = load i32, i32* [[B_PRIV]], align 4 @@ -58,7 +58,7 @@ define internal i32 @caller(i32* %B) { ; ; IS__CGSCC____-LABEL: define {{[^@]+}}@caller ; IS__CGSCC____-SAME: (i32* nocapture nofree nonnull readonly align 4 dereferenceable(4) [[B:%.*]]) -; IS__CGSCC____-NEXT: [[A:%.*]] = alloca i32 +; IS__CGSCC____-NEXT: [[A:%.*]] = alloca i32, align 4 ; IS__CGSCC____-NEXT: store i32 1, i32* [[A]], align 4 ; IS__CGSCC____-NEXT: [[C:%.*]] = call i32 @test(i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[A]], i32* nocapture nofree nonnull readonly align 4 dereferenceable(4) [[B]]) ; IS__CGSCC____-NEXT: ret i32 [[C]] @@ -71,13 +71,13 @@ define internal i32 @caller(i32* %B) { define i32 @callercaller() { ; NOT_TUNIT_NPM-LABEL: define {{[^@]+}}@callercaller() -; NOT_TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32 +; NOT_TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32, align 4 ; NOT_TUNIT_NPM-NEXT: store i32 2, i32* [[B]], align 4 ; NOT_TUNIT_NPM-NEXT: [[X:%.*]] = call i32 @caller(i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[B]]) ; 
NOT_TUNIT_NPM-NEXT: ret i32 [[X]]
 ;
 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callercaller()
-; IS__TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32
+; IS__TUNIT_NPM-NEXT: [[B:%.*]] = alloca i32, align 4
 ; IS__TUNIT_NPM-NEXT: store i32 2, i32* [[B]], align 4
 ; IS__TUNIT_NPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[B]], align 4
 ; IS__TUNIT_NPM-NEXT: [[X:%.*]] = call i32 @caller(i32 [[TMP1]])
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/byval.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/byval.ll
index 194188cdf8ae..a74151b777a5 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/byval.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/byval.ll
@@ -94,15 +94,15 @@ define i32 @main() nounwind {
 ; IS__TUNIT_NPM-NEXT: store i32 1, i32* [[TMP1]], align 8
 ; IS__TUNIT_NPM-NEXT: [[TMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
 ; IS__TUNIT_NPM-NEXT: store i64 2, i64* [[TMP4]], align 4
-; IS__TUNIT_NPM-NEXT: [[S_CAST1:%.*]] = bitcast %struct.ss* [[S]] to i32*
-; IS__TUNIT_NPM-NEXT: [[TMP0:%.*]] = load i32, i32* [[S_CAST1]], align 8
-; IS__TUNIT_NPM-NEXT: [[S_0_12:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
-; IS__TUNIT_NPM-NEXT: [[TMP1:%.*]] = load i64, i64* [[S_0_12]], align 8
-; IS__TUNIT_NPM-NEXT: [[C0:%.*]] = call i32 @f(i32 [[TMP0]], i64 [[TMP1]])
 ; IS__TUNIT_NPM-NEXT: [[S_CAST:%.*]] = bitcast %struct.ss* [[S]] to i32*
-; IS__TUNIT_NPM-NEXT: [[TMP2:%.*]] = load i32, i32* [[S_CAST]], align 32
+; IS__TUNIT_NPM-NEXT: [[TMP0:%.*]] = load i32, i32* [[S_CAST]], align 8
 ; IS__TUNIT_NPM-NEXT: [[S_0_1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
-; IS__TUNIT_NPM-NEXT: [[TMP3:%.*]] = load i64, i64* [[S_0_1]], align 32
+; IS__TUNIT_NPM-NEXT: [[TMP1:%.*]] = load i64, i64* [[S_0_1]], align 8
+; IS__TUNIT_NPM-NEXT: [[C0:%.*]] = call i32 @f(i32 [[TMP0]], i64 [[TMP1]])
+; IS__TUNIT_NPM-NEXT: [[S_CAST1:%.*]] = bitcast %struct.ss* [[S]] to i32*
+; IS__TUNIT_NPM-NEXT: [[TMP2:%.*]] =
load i32, i32* [[S_CAST1]], align 32
+; IS__TUNIT_NPM-NEXT: [[S_0_12:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
+; IS__TUNIT_NPM-NEXT: [[TMP3:%.*]] = load i64, i64* [[S_0_12]], align 32
 ; IS__TUNIT_NPM-NEXT: [[C1:%.*]] = call i32 @g(i32 [[TMP2]], i64 [[TMP3]])
 ; IS__TUNIT_NPM-NEXT: [[A:%.*]] = add i32 [[C0]], [[C1]]
 ; IS__TUNIT_NPM-NEXT: ret i32 [[A]]
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/control-flow2.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/control-flow2.ll
index 61b1c19e49a0..eaeebf14ae0a 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/control-flow2.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/control-flow2.ll
@@ -18,8 +18,8 @@ define internal i32 @callee(i1 %C, i32* %P) {
 ;
 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee
 ; IS__TUNIT_NPM-SAME: (i1 [[C:%.*]], i32 [[TMP0:%.*]])
-; IS__TUNIT_NPM-NEXT: [[P_PRIV:%.*]] = alloca i32
-; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[P_PRIV]]
+; IS__TUNIT_NPM-NEXT: [[P_PRIV:%.*]] = alloca i32, align 4
+; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[P_PRIV]], align 4
 ; IS__TUNIT_NPM-NEXT: br label [[F:%.*]]
 ; IS__TUNIT_NPM: T:
 ; IS__TUNIT_NPM-NEXT: unreachable
@@ -48,20 +48,20 @@ F: ; preds = %0
 define i32 @foo() {
 ; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@foo()
-; IS__TUNIT_OPM-NEXT: [[A:%.*]] = alloca i32
+; IS__TUNIT_OPM-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__TUNIT_OPM-NEXT: store i32 17, i32* [[A]], align 4
 ; IS__TUNIT_OPM-NEXT: [[X:%.*]] = call i32 @callee(i1 false, i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[A]])
 ; IS__TUNIT_OPM-NEXT: ret i32 [[X]]
 ;
 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@foo()
-; IS__TUNIT_NPM-NEXT: [[A:%.*]] = alloca i32
+; IS__TUNIT_NPM-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__TUNIT_NPM-NEXT: store i32 17, i32* [[A]], align 4
 ; IS__TUNIT_NPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[A]], align 4
 ; IS__TUNIT_NPM-NEXT: [[X:%.*]] = call i32 @callee(i1 false, i32 [[TMP1]])
 ; IS__TUNIT_NPM-NEXT: ret i32 [[X]]
 ;
 ; IS__CGSCC____-LABEL: define {{[^@]+}}@foo()
-; IS__CGSCC____-NEXT: [[A:%.*]] = alloca i32
+; IS__CGSCC____-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__CGSCC____-NEXT: store i32 17, i32* [[A]], align 4
 ; IS__CGSCC____-NEXT: [[X:%.*]] = call i32 @callee(i32* noalias nocapture nofree nonnull readonly align 4 dereferenceable(4) [[A]])
 ; IS__CGSCC____-NEXT: ret i32 [[X]]
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/crash.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/crash.ll
index 66e3ad1cc6d3..5a03026c2dc2 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/crash.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/crash.ll
@@ -75,8 +75,8 @@ define i32 @test_inf_promote_caller(i32 %arg) {
 ; IS__CGSCC____-LABEL: define {{[^@]+}}@test_inf_promote_caller
 ; IS__CGSCC____-SAME: (i32 [[ARG:%.*]])
 ; IS__CGSCC____-NEXT: bb:
-; IS__CGSCC____-NEXT: [[TMP:%.*]] = alloca [[S:%.*]]
-; IS__CGSCC____-NEXT: [[TMP1:%.*]] = alloca [[S]]
+; IS__CGSCC____-NEXT: [[TMP:%.*]] = alloca [[S:%.*]], align 8
+; IS__CGSCC____-NEXT: [[TMP1:%.*]] = alloca [[S]], align 8
 ; IS__CGSCC____-NEXT: unreachable
 ;
 bb:
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
index 070df749a15d..4da401f1db29 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
@@ -41,13 +41,13 @@ dead:
 define internal i32 @caller(i32* %B) {
 ; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@caller()
-; IS__CGSCC_OPM-NEXT: [[A:%.*]] = alloca i32
+; IS__CGSCC_OPM-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__CGSCC_OPM-NEXT: store i32 1, i32* [[A]], align 4
 ; IS__CGSCC_OPM-NEXT: [[C:%.*]] = call i32 @test(i32* noalias nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[A]])
 ; IS__CGSCC_OPM-NEXT: ret i32 0
 ;
 ;
IS__CGSCC_NPM-LABEL: define {{[^@]+}}@caller()
-; IS__CGSCC_NPM-NEXT: [[A:%.*]] = alloca i32
+; IS__CGSCC_NPM-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__CGSCC_NPM-NEXT: store i32 1, i32* [[A]], align 4
 ; IS__CGSCC_NPM-NEXT: [[C:%.*]] = call i32 @test(i32* noalias nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[A]])
 ; IS__CGSCC_NPM-NEXT: ret i32 undef
@@ -60,7 +60,7 @@ define internal i32 @caller(i32* %B) {
 define i32 @callercaller() {
 ; CHECK-LABEL: define {{[^@]+}}@callercaller()
-; CHECK-NEXT: [[B:%.*]] = alloca i32
+; CHECK-NEXT: [[B:%.*]] = alloca i32, align 4
 ; CHECK-NEXT: store i32 2, i32* [[B]], align 4
 ; CHECK-NEXT: ret i32 0
 ;
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
index 4491ceb2d277..9d8d5e5decc3 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
@@ -51,21 +51,21 @@ dead:
 define internal i32 @caller(i32* %B) {
 ; IS__TUNIT____-LABEL: define {{[^@]+}}@caller
 ; IS__TUNIT____-SAME: (i32* noalias nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[B:%.*]])
-; IS__TUNIT____-NEXT: [[A:%.*]] = alloca i32
+; IS__TUNIT____-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__TUNIT____-NEXT: store i32 1, i32* [[A]], align 4
 ; IS__TUNIT____-NEXT: [[C:%.*]] = call i32 @test(i32* noalias nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[B]])
 ; IS__TUNIT____-NEXT: ret i32 0
 ;
 ; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@caller
 ; IS__CGSCC_OPM-SAME: (i32* nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[B:%.*]])
-; IS__CGSCC_OPM-NEXT: [[A:%.*]] = alloca i32
+; IS__CGSCC_OPM-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__CGSCC_OPM-NEXT: store i32 1, i32* [[A]], align 4
 ; IS__CGSCC_OPM-NEXT: [[C:%.*]] = call i32 @test(i32* nocapture nofree nonnull writeonly align 4 dereferenceable(4)
[[B]])
 ; IS__CGSCC_OPM-NEXT: ret i32 0
 ;
 ; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@caller
 ; IS__CGSCC_NPM-SAME: (i32* nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[B:%.*]])
-; IS__CGSCC_NPM-NEXT: [[A:%.*]] = alloca i32
+; IS__CGSCC_NPM-NEXT: [[A:%.*]] = alloca i32, align 4
 ; IS__CGSCC_NPM-NEXT: store i32 1, i32* [[A]], align 4
 ; IS__CGSCC_NPM-NEXT: [[C:%.*]] = call i32 @test(i32* nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[B]])
 ; IS__CGSCC_NPM-NEXT: ret i32 undef
@@ -78,7 +78,7 @@ define internal i32 @caller(i32* %B) {
 define i32 @callercaller() {
 ; CHECK-LABEL: define {{[^@]+}}@callercaller()
-; CHECK-NEXT: [[B:%.*]] = alloca i32
+; CHECK-NEXT: [[B:%.*]] = alloca i32, align 4
 ; CHECK-NEXT: store i32 2, i32* [[B]], align 4
 ; CHECK-NEXT: [[X:%.*]] = call i32 @caller(i32* noalias nocapture nofree nonnull writeonly align 4 dereferenceable(4) [[B]])
 ; CHECK-NEXT: ret i32 0
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/pr33641_remove_arg_dbgvalue.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/pr33641_remove_arg_dbgvalue.ll
index e33891d27c31..a372028311a6 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/pr33641_remove_arg_dbgvalue.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/pr33641_remove_arg_dbgvalue.ll
@@ -15,7 +15,7 @@
 define void @foo() {
 ; CHECK-LABEL: define {{[^@]+}}@foo()
-; CHECK-NEXT: [[TMP:%.*]] = alloca void (i16*)*
+; CHECK-NEXT: [[TMP:%.*]] = alloca void (i16*)*, align 8
 ; CHECK-NEXT: store void (i16*)* @bar, void (i16*)** [[TMP]], align 8
 ; CHECK-NEXT: ret void
 ;
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/profile.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/profile.ll
index 87bd530d9647..cacc6b95d263 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/profile.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/profile.ll
@@ -9,13 +9,13 @@ target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:1
 define void @caller() #0 {
 ; NOT_TUNIT_NPM-LABEL: define {{[^@]+}}@caller()
-; NOT_TUNIT_NPM-NEXT: [[X:%.*]] = alloca i32
+; NOT_TUNIT_NPM-NEXT: [[X:%.*]] = alloca i32, align 4
 ; NOT_TUNIT_NPM-NEXT: store i32 42, i32* [[X]], align 4
 ; NOT_TUNIT_NPM-NEXT: call void @promote_i32_ptr(i32* noalias nocapture nonnull readonly align 4 dereferenceable(4) [[X]]), !prof !0
 ; NOT_TUNIT_NPM-NEXT: ret void
 ;
 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@caller()
-; IS__TUNIT_NPM-NEXT: [[X:%.*]] = alloca i32
+; IS__TUNIT_NPM-NEXT: [[X:%.*]] = alloca i32, align 4
 ; IS__TUNIT_NPM-NEXT: store i32 42, i32* [[X]], align 4
 ; IS__TUNIT_NPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[X]], align 4
 ; IS__TUNIT_NPM-NEXT: call void @promote_i32_ptr(i32 [[TMP1]]), !prof !0
@@ -36,8 +36,8 @@ define internal void @promote_i32_ptr(i32* %xp) {
 ;
 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@promote_i32_ptr
 ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]])
-; IS__TUNIT_NPM-NEXT: [[XP_PRIV:%.*]] = alloca i32
-; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[XP_PRIV]]
+; IS__TUNIT_NPM-NEXT: [[XP_PRIV:%.*]] = alloca i32, align 4
+; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[XP_PRIV]], align 4
 ; IS__TUNIT_NPM-NEXT: [[X:%.*]] = load i32, i32* [[XP_PRIV]], align 4
 ; IS__TUNIT_NPM-NEXT: call void @use_i32(i32 [[X]])
 ; IS__TUNIT_NPM-NEXT: ret void
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/sret.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/sret.ll
index 12408c1a3792..a84ab1f4ef4b 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/sret.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/sret.ll
@@ -40,14 +40,14 @@ define internal void @add({i32, i32}* %this, i32* sret %r) {
 define void @f() {
 ; IS________OPM-LABEL: define {{[^@]+}}@f()
-; IS________OPM-NEXT: [[R:%.*]] = alloca i32
-; IS________OPM-NEXT: [[PAIR:%.*]] = alloca { i32, i32 }
+; IS________OPM-NEXT: [[R:%.*]] = alloca i32, align 4
+; IS________OPM-NEXT: [[PAIR:%.*]] = alloca { i32, i32 }, align 8
 ; IS________OPM-NEXT:
call void @add({ i32, i32 }* nocapture nofree nonnull readonly align 8 dereferenceable(8) [[PAIR]], i32* nocapture nofree nonnull sret writeonly align 4 dereferenceable(4) [[R]])
 ; IS________OPM-NEXT: ret void
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@f()
-; IS________NPM-NEXT: [[R:%.*]] = alloca i32
-; IS________NPM-NEXT: [[PAIR:%.*]] = alloca { i32, i32 }
+; IS________NPM-NEXT: [[R:%.*]] = alloca i32, align 4
+; IS________NPM-NEXT: [[PAIR:%.*]] = alloca { i32, i32 }, align 8
 ; IS________NPM-NEXT: call void @add({ i32, i32 }* noalias nocapture nofree nonnull readonly align 8 dereferenceable(8) [[PAIR]], i32* noalias nocapture nofree nonnull sret writeonly align 4 dereferenceable(4) [[R]])
 ; IS________NPM-NEXT: ret void
 ;
diff --git a/llvm/test/Transforms/Attributor/IPConstantProp/dangling-block-address.ll b/llvm/test/Transforms/Attributor/IPConstantProp/dangling-block-address.ll
index a285ffc03f27..aa1374a7c3c5 100644
--- a/llvm/test/Transforms/Attributor/IPConstantProp/dangling-block-address.ll
+++ b/llvm/test/Transforms/Attributor/IPConstantProp/dangling-block-address.ll
@@ -19,7 +19,7 @@ define internal void @foo(i32 %x) nounwind readnone {
 ; IS__CGSCC____-SAME: (i32 [[X:%.*]])
 ; IS__CGSCC____-NEXT: entry:
 ; IS__CGSCC____-NEXT: [[B:%.*]] = alloca i32, align 4
-; IS__CGSCC____-NEXT: store volatile i32 -1, i32* [[B]]
+; IS__CGSCC____-NEXT: store volatile i32 -1, i32* [[B]], align 4
 ; IS__CGSCC____-NEXT: ret void
 ;
 entry:
@@ -41,9 +41,9 @@ define internal void @bar(i32* nocapture %pc) nounwind readonly {
 ; IS__CGSCC_OPM: indirectgoto:
 ; IS__CGSCC_OPM-NEXT: [[INDVAR]] = phi i32 [ [[INDVAR_NEXT]], [[LAB0:%.*]] ], [ 0, [[ENTRY:%.*]] ]
 ; IS__CGSCC_OPM-NEXT: [[PC_ADDR_0:%.*]] = getelementptr i32, i32* [[PC]], i32 [[INDVAR]]
-; IS__CGSCC_OPM-NEXT: [[TMP1_PN:%.*]] = load i32, i32* [[PC_ADDR_0]]
+; IS__CGSCC_OPM-NEXT: [[TMP1_PN:%.*]] = load i32, i32* [[PC_ADDR_0]], align 4
 ; IS__CGSCC_OPM-NEXT: [[INDIRECT_GOTO_DEST_IN:%.*]] = getelementptr inbounds [2 x i8*], [2 x i8*]*
@bar.l, i32 0, i32 [[TMP1_PN]]
-; IS__CGSCC_OPM-NEXT: [[INDIRECT_GOTO_DEST:%.*]] = load i8*, i8** [[INDIRECT_GOTO_DEST_IN]]
+; IS__CGSCC_OPM-NEXT: [[INDIRECT_GOTO_DEST:%.*]] = load i8*, i8** [[INDIRECT_GOTO_DEST_IN]], align 8
 ; IS__CGSCC_OPM-NEXT: indirectbr i8* [[INDIRECT_GOTO_DEST]], [label [[LAB0]], label %end]
 ;
 entry:
diff --git a/llvm/test/Transforms/Attributor/IPConstantProp/pthreads.ll b/llvm/test/Transforms/Attributor/IPConstantProp/pthreads.ll
index 9fb1ca055826..b28b38f89687 100644
--- a/llvm/test/Transforms/Attributor/IPConstantProp/pthreads.ll
+++ b/llvm/test/Transforms/Attributor/IPConstantProp/pthreads.ll
@@ -42,27 +42,16 @@ define dso_local i32 @main() {
 ; IS__TUNIT____-NEXT: [[CALL3:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @buz, i8* noalias nofree nonnull readnone align 8 dereferenceable(1) "no-capture-maybe-returned" [[ALLOC2]])
 ; IS__TUNIT____-NEXT: ret i32 0
 ;
-; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@main()
-; IS__CGSCC_OPM-NEXT: entry:
-; IS__CGSCC_OPM-NEXT: [[ALLOC1:%.*]] = alloca i8, align 8
-; IS__CGSCC_OPM-NEXT: [[ALLOC2:%.*]] = alloca i8, align 8
-; IS__CGSCC_OPM-NEXT: [[THREAD:%.*]] = alloca i64, align 8
-; IS__CGSCC_OPM-NEXT: [[CALL:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @foo, i8* noalias nocapture nofree readnone align 536870912 null)
-; IS__CGSCC_OPM-NEXT: [[CALL1:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @bar, i8* noalias nofree nonnull readnone align 8 dereferenceable(8) bitcast (i8** @GlobalVPtr to i8*))
-; IS__CGSCC_OPM-NEXT: [[CALL2:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture
align 536870912 null, i8* (i8*)* nonnull @baz, i8* noalias nocapture nofree nonnull readnone align 8 dereferenceable(1) [[ALLOC1]])
-; IS__CGSCC_OPM-NEXT: [[CALL3:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @buz, i8* noalias nofree nonnull readnone align 8 dereferenceable(1) [[ALLOC2]])
-; IS__CGSCC_OPM-NEXT: ret i32 0
-;
-; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@main()
-; IS__CGSCC_NPM-NEXT: entry:
-; IS__CGSCC_NPM-NEXT: [[ALLOC1:%.*]] = alloca i8, align 8
-; IS__CGSCC_NPM-NEXT: [[ALLOC2:%.*]] = alloca i8, align 8
-; IS__CGSCC_NPM-NEXT: [[THREAD:%.*]] = alloca i64, align 8
-; IS__CGSCC_NPM-NEXT: [[CALL:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @foo, i8* noalias nocapture nofree readnone align 536870912 null)
-; IS__CGSCC_NPM-NEXT: [[CALL1:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @bar, i8* noalias nofree nonnull readnone align 8 dereferenceable(8) bitcast (i8** @GlobalVPtr to i8*))
-; IS__CGSCC_NPM-NEXT: [[CALL2:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @baz, i8* noalias nocapture nofree nonnull readnone align 8 dereferenceable(1) [[ALLOC1]])
-; IS__CGSCC_NPM-NEXT: [[CALL3:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @buz, i8* noalias nofree nonnull readnone align 8 dereferenceable(1) [[ALLOC2]])
-; IS__CGSCC_NPM-NEXT: ret i32 0
+; IS__CGSCC____-LABEL: define {{[^@]+}}@main()
+; IS__CGSCC____-NEXT: entry:
+; IS__CGSCC____-NEXT: [[ALLOC1:%.*]] = alloca i8,
align 8
+; IS__CGSCC____-NEXT: [[ALLOC2:%.*]] = alloca i8, align 8
+; IS__CGSCC____-NEXT: [[THREAD:%.*]] = alloca i64, align 8
+; IS__CGSCC____-NEXT: [[CALL:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @foo, i8* noalias nocapture nofree readnone align 536870912 null)
+; IS__CGSCC____-NEXT: [[CALL1:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @bar, i8* noalias nofree nonnull readnone align 8 dereferenceable(8) bitcast (i8** @GlobalVPtr to i8*))
+; IS__CGSCC____-NEXT: [[CALL2:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @baz, i8* noalias nocapture nofree nonnull readnone align 8 dereferenceable(1) [[ALLOC1]])
+; IS__CGSCC____-NEXT: [[CALL3:%.*]] = call i32 @pthread_create(i64* nonnull align 8 dereferenceable(8) [[THREAD]], %union.pthread_attr_t* noalias nocapture align 536870912 null, i8* (i8*)* nonnull @buz, i8* noalias nofree nonnull readnone align 8 dereferenceable(1) [[ALLOC2]])
+; IS__CGSCC____-NEXT: ret i32 0
 ;
 entry:
 %alloc1 = alloca i8, align 8
@@ -93,15 +82,10 @@ define internal i8* @bar(i8* %arg) {
 ; IS__TUNIT____-NEXT: entry:
 ; IS__TUNIT____-NEXT: ret i8* bitcast (i8** @GlobalVPtr to i8*)
 ;
-; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@bar
-; IS__CGSCC_OPM-SAME: (i8* nofree readnone returned "no-capture-maybe-returned" [[ARG:%.*]])
-; IS__CGSCC_OPM-NEXT: entry:
-; IS__CGSCC_OPM-NEXT: ret i8* bitcast (i8** @GlobalVPtr to i8*)
-;
-; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@bar
-; IS__CGSCC_NPM-SAME: (i8* nofree readnone returned "no-capture-maybe-returned" [[ARG:%.*]])
-; IS__CGSCC_NPM-NEXT: entry:
-; IS__CGSCC_NPM-NEXT: ret i8* bitcast (i8** @GlobalVPtr to i8*)
+; IS__CGSCC____-LABEL: define {{[^@]+}}@bar
+; IS__CGSCC____-SAME: (i8* nofree readnone returned "no-capture-maybe-returned" [[ARG:%.*]])
+; IS__CGSCC____-NEXT: entry:
+; IS__CGSCC____-NEXT: ret i8* bitcast (i8** @GlobalVPtr to i8*)
 ;
 entry:
 ret i8* %arg
diff --git a/llvm/test/Transforms/Attributor/IPConstantProp/return-argument.ll b/llvm/test/Transforms/Attributor/IPConstantProp/return-argument.ll
index e514f4396776..040be04ec3ee 100644
--- a/llvm/test/Transforms/Attributor/IPConstantProp/return-argument.ll
+++ b/llvm/test/Transforms/Attributor/IPConstantProp/return-argument.ll
@@ -65,7 +65,7 @@ define internal { i32, i32 } @foo(i32 %A, i32 %B) {
 define void @caller(i1 %C) personality i32 (...)* @__gxx_personality_v0 {
 ; IS__TUNIT____-LABEL: define {{[^@]+}}@caller
 ; IS__TUNIT____-SAME: (i1 [[C:%.*]]) #2 personality i32 (...)* @__gxx_personality_v0
-; IS__TUNIT____-NEXT: [[Q:%.*]] = alloca i32
+; IS__TUNIT____-NEXT: [[Q:%.*]] = alloca i32, align 4
 ; IS__TUNIT____-NEXT: [[W:%.*]] = call align 4 i32* @incdec(i1 [[C]], i32* noalias nofree nonnull align 4 dereferenceable(4) "no-capture-maybe-returned" [[Q]])
 ; IS__TUNIT____-NEXT: [[S1:%.*]] = call { i32, i32 } @foo(i32 1, i32 2)
 ; IS__TUNIT____-NEXT: [[X1:%.*]] = extractvalue { i32, i32 } [[S1]], 0
@@ -83,7 +83,7 @@ define void @caller(i1 %C) personality i32 (...)* @__gxx_personality_v0 {
 ;
 ; IS__CGSCC____-LABEL: define {{[^@]+}}@caller
 ; IS__CGSCC____-SAME: (i1 [[C:%.*]]) #1 personality i32 (...)* @__gxx_personality_v0
-; IS__CGSCC____-NEXT: [[Q:%.*]] = alloca i32
+; IS__CGSCC____-NEXT: [[Q:%.*]] = alloca i32, align 4
 ; IS__CGSCC____-NEXT: [[W:%.*]] = call align 4 i32* @incdec(i1 [[C]], i32* noalias nofree nonnull align 4 dereferenceable(4) [[Q]])
 ; IS__CGSCC____-NEXT: [[S1:%.*]] = call { i32, i32 } @foo(i32 1, i32 2)
 ; IS__CGSCC____-NEXT: [[X1:%.*]] = extractvalue { i32, i32 } [[S1]], 0
diff --git a/llvm/test/Transforms/Attributor/heap_to_stack.ll b/llvm/test/Transforms/Attributor/heap_to_stack.ll
index 140a5ebac019..909cda54fc86 100644
---
a/llvm/test/Transforms/Attributor/heap_to_stack.ll
+++ b/llvm/test/Transforms/Attributor/heap_to_stack.ll
@@ -82,7 +82,7 @@ define void @test3() {
 ; IS________OPM-NEXT: ret void
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test3()
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: tail call void @no_sync_func(i8* noalias nocapture nofree [[TMP1]])
 ; IS________NPM-NEXT: ret void
 ;
@@ -102,7 +102,7 @@ define void @test3a(i8* %p) {
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test3a
 ; IS________NPM-SAME: (i8* nocapture [[P:%.*]])
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: tail call void @nofree_arg_only(i8* noalias nocapture nofree [[TMP1]], i8* nocapture [[P]])
 ; IS________NPM-NEXT: ret void
 ;
@@ -157,7 +157,7 @@ define void @test0() {
 ; IS________OPM-NEXT: ret void
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test0()
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 8
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 8, align 1
 ; IS________NPM-NEXT: [[CALLOC_BC:%.*]] = bitcast i8* [[TMP1]] to i8*
 ; IS________NPM-NEXT: call void @llvm.memset.p0i8.i64(i8* [[CALLOC_BC]], i8 0, i64 8, i1 false)
 ; IS________NPM-NEXT: tail call void @no_sync_func(i8* noalias nocapture nofree [[TMP1]])
@@ -177,7 +177,7 @@ define void @test4() {
 ; IS________OPM-NEXT: ret void
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test4()
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: tail call void @nofree_func(i8* noalias nocapture nofree [[TMP1]])
 ; IS________NPM-NEXT: ret void
 ;
@@ -207,7 +207,7 @@ define void @test5(i32, i8* %p) {
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test5
 ; IS________NPM-SAME: (i32 [[TMP0:%.*]], i8* nocapture [[P:%.*]])
-; IS________NPM-NEXT: [[TMP2:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT:
[[TMP2:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
 ; IS________NPM-NEXT: br i1 [[TMP3]], label [[TMP5:%.*]], label [[TMP4:%.*]]
 ; IS________NPM: 4:
@@ -256,7 +256,7 @@ define void @test6(i32) {
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test6
 ; IS________NPM-SAME: (i32 [[TMP0:%.*]])
-; IS________NPM-NEXT: [[TMP2:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP2:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
 ; IS________NPM-NEXT: br i1 [[TMP3]], label [[TMP5:%.*]], label [[TMP4:%.*]]
 ; IS________NPM: 4:
@@ -293,7 +293,7 @@ define void @test7() {
 ; IS________OPM-NEXT: unreachable
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test7()
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: [[TMP2:%.*]] = tail call i32 @no_return_call()
 ; IS________NPM-NEXT: unreachable
 ;
@@ -359,7 +359,7 @@ define i32 @test10() {
 ; IS________OPM-NEXT: ret i32 [[TMP3]]
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test10()
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: tail call void @no_sync_func(i8* noalias nocapture nofree [[TMP1]])
 ; IS________NPM-NEXT: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i32*
 ; IS________NPM-NEXT: store i32 10, i32* [[TMP2]], align 4
@@ -387,7 +387,7 @@ define i32 @test_lifetime() {
 ; IS________OPM-NEXT: ret i32 [[TMP3]]
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test_lifetime()
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: tail call void @no_sync_func(i8* noalias nocapture nofree [[TMP1]])
 ; IS________NPM-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* noalias nocapture nonnull align 4 dereferenceable(4) [[TMP1]])
 ; IS________NPM-NEXT: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i32*
@@ -455,7 +455,7
@@ define i32 @irreducible_cfg(i32 %0) {
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@irreducible_cfg
 ; IS________NPM-SAME: (i32 [[TMP0:%.*]])
-; IS________NPM-NEXT: [[TMP2:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP2:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to i32*
 ; IS________NPM-NEXT: store i32 10, i32* [[TMP3]], align 4
 ; IS________NPM-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP0]], 1
@@ -555,7 +555,7 @@ define i32 @malloc_in_loop(i32 %0) {
 ; IS________NPM-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP6]], 0
 ; IS________NPM-NEXT: br i1 [[TMP7]], label [[TMP8:%.*]], label [[TMP11:%.*]]
 ; IS________NPM: 8:
-; IS________NPM-NEXT: [[TMP9:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP9:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: [[TMP10:%.*]] = bitcast i8* [[TMP9]] to i32*
 ; IS________NPM-NEXT: store i32 1, i32* [[TMP10]], align 8
 ; IS________NPM-NEXT: br label [[TMP4]]
@@ -680,7 +680,7 @@ define void @test16a(i8 %v, i8** %P) {
 ;
 ; IS________NPM-LABEL: define {{[^@]+}}@test16a
 ; IS________NPM-SAME: (i8 [[V:%.*]], i8** nocapture nofree readnone [[P:%.*]])
-; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4
+; IS________NPM-NEXT: [[TMP1:%.*]] = alloca i8, i64 4, align 1
 ; IS________NPM-NEXT: store i8 [[V]], i8* [[TMP1]], align 1
 ; IS________NPM-NEXT: tail call void @no_sync_func(i8* noalias nocapture nofree nonnull dereferenceable(1) [[TMP1]])
 ; IS________NPM-NEXT: ret void
diff --git a/llvm/test/Transforms/Attributor/internal-noalias.ll b/llvm/test/Transforms/Attributor/internal-noalias.ll
index 2605e46cb243..182cadf48a0b 100644
--- a/llvm/test/Transforms/Attributor/internal-noalias.ll
+++ b/llvm/test/Transforms/Attributor/internal-noalias.ll
@@ -123,10 +123,10 @@ define internal i32 @noalias_args_argmem_ro(i32* %A, i32* %B) #1 {
 ;
 ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@noalias_args_argmem_ro
 ; IS__TUNIT_NPM-SAME: (i32 [[TMP0:%.*]], i32 [[TMP1:%.*]])
-; IS__TUNIT_NPM-NEXT: [[B_PRIV:%.*]] = alloca
i32
-; IS__TUNIT_NPM-NEXT: store i32 [[TMP1]], i32* [[B_PRIV]]
-; IS__TUNIT_NPM-NEXT: [[A_PRIV:%.*]] = alloca i32
-; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[A_PRIV]]
+; IS__TUNIT_NPM-NEXT: [[B_PRIV:%.*]] = alloca i32, align 4
+; IS__TUNIT_NPM-NEXT: store i32 [[TMP1]], i32* [[B_PRIV]], align 4
+; IS__TUNIT_NPM-NEXT: [[A_PRIV:%.*]] = alloca i32, align 4
+; IS__TUNIT_NPM-NEXT: store i32 [[TMP0]], i32* [[A_PRIV]], align 4
 ; IS__TUNIT_NPM-NEXT: [[T0:%.*]] = load i32, i32* [[A_PRIV]], align 4
 ; IS__TUNIT_NPM-NEXT: [[T1:%.*]] = load i32, i32* [[B_PRIV]], align 4
 ; IS__TUNIT_NPM-NEXT: [[ADD:%.*]] = add nsw i32 [[T0]], [[T1]]
diff --git a/llvm/test/Transforms/Attributor/liveness.ll b/llvm/test/Transforms/Attributor/liveness.ll
index bab313465e51..1ce3ff007092 100644
--- a/llvm/test/Transforms/Attributor/liveness.ll
+++ b/llvm/test/Transforms/Attributor/liveness.ll
@@ -1778,7 +1778,7 @@ indirectgoto: ; preds = %lab0, %entry
 define i32 @main() {
 ; CHECK-LABEL: define {{[^@]+}}@main()
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[F:%.*]] = alloca i32
+; CHECK-NEXT: [[F:%.*]] = alloca i32, align 4
 ; CHECK-NEXT: br label [[FOR_COND_0:%.*]]
 ; CHECK: for.cond.0:
 ; CHECK-NEXT: [[G_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY_0:%.*]] ]
@@ -1856,8 +1856,8 @@ define i32 @h(i32 %i) {
 define void @bad_gep() {
 ; CHECK-LABEL: define {{[^@]+}}@bad_gep()
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[N:%.*]] = alloca i8
-; CHECK-NEXT: [[M:%.*]] = alloca i8
+; CHECK-NEXT: [[N:%.*]] = alloca i8, align 1
+; CHECK-NEXT: [[M:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 1, i8* noalias nocapture nonnull dereferenceable(1) [[N]])
 ; CHECK-NEXT: br label [[EXIT:%.*]]
 ; CHECK: while.body:
diff --git a/llvm/test/Transforms/Attributor/memory_locations.ll b/llvm/test/Transforms/Attributor/memory_locations.ll
index 3e6facb44577..42dc2e51be3a 100644
--- a/llvm/test/Transforms/Attributor/memory_locations.ll
+++
b/llvm/test/Transforms/Attributor/memory_locations.ll
@@ -325,7 +325,7 @@ define void @callerA2(i8* %arg) {
 ; CHECK: Function Attrs: readnone
 define void @callerB1() {
 ; CHECK-LABEL: define {{[^@]+}}@callerB1()
-; CHECK-NEXT: [[STACK:%.*]] = alloca i8
+; CHECK-NEXT: [[STACK:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: [[TMP1:%.*]] = call i8* @argmem_only(i8* nonnull dereferenceable(1) [[STACK]])
 ; CHECK-NEXT: ret void
 ;
@@ -336,7 +336,7 @@ define void @callerB1() {
 ; CHECK: Function Attrs: inaccessiblememonly
 define void @callerB2() {
 ; CHECK-LABEL: define {{[^@]+}}@callerB2()
-; CHECK-NEXT: [[STACK:%.*]] = alloca i8
+; CHECK-NEXT: [[STACK:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: [[TMP1:%.*]] = call i8* @inaccesible_argmem_only_decl(i8* nonnull dereferenceable(1) [[STACK]])
 ; CHECK-NEXT: ret void
 ;
@@ -476,7 +476,7 @@ define void @writeonly_global_via_arg_internal() {
 define i8 @recursive_not_readnone(i8* %ptr, i1 %c) {
 ; CHECK-LABEL: define {{[^@]+}}@recursive_not_readnone
 ; CHECK-SAME: (i8* nocapture nofree writeonly [[PTR:%.*]], i1 [[C:%.*]])
-; CHECK-NEXT: [[ALLOC:%.*]] = alloca i8
+; CHECK-NEXT: [[ALLOC:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
 ; CHECK: t:
 ; CHECK-NEXT: [[TMP1:%.*]] = call i8 @recursive_not_readnone(i8* noalias nocapture nofree nonnull writeonly dereferenceable(1) [[ALLOC]], i1 false)
@@ -504,7 +504,7 @@ f:
 define internal i8 @recursive_not_readnone_internal(i8* %ptr, i1 %c) {
 ; IS__TUNIT____-LABEL: define {{[^@]+}}@recursive_not_readnone_internal
 ; IS__TUNIT____-SAME: (i8* noalias nocapture nofree nonnull writeonly dereferenceable(1) [[PTR:%.*]], i1 [[C:%.*]])
-; IS__TUNIT____-NEXT: [[ALLOC:%.*]] = alloca i8
+; IS__TUNIT____-NEXT: [[ALLOC:%.*]] = alloca i8, align 1
 ; IS__TUNIT____-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
 ; IS__TUNIT____: t:
 ; IS__TUNIT____-NEXT: [[TMP1:%.*]] = call i8 @recursive_not_readnone_internal(i8* noalias nocapture nofree nonnull writeonly dereferenceable(1)
[[ALLOC]], i1 false)
@@ -516,7 +516,7 @@ define internal i8 @recursive_not_readnone_internal(i8* %ptr, i1 %c) {
 ;
 ; IS__CGSCC____-LABEL: define {{[^@]+}}@recursive_not_readnone_internal
 ; IS__CGSCC____-SAME: (i8* nocapture nofree nonnull writeonly dereferenceable(1) [[PTR:%.*]], i1 [[C:%.*]])
-; IS__CGSCC____-NEXT: [[ALLOC:%.*]] = alloca i8
+; IS__CGSCC____-NEXT: [[ALLOC:%.*]] = alloca i8, align 1
 ; IS__CGSCC____-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
 ; IS__CGSCC____: t:
 ; IS__CGSCC____-NEXT: [[TMP1:%.*]] = call i8 @recursive_not_readnone_internal(i8* noalias nocapture nofree nonnull writeonly dereferenceable(1) [[ALLOC]], i1 false)
@@ -542,7 +542,7 @@ f:
 define i8 @readnone_caller(i1 %c) {
 ; CHECK-LABEL: define {{[^@]+}}@readnone_caller
 ; CHECK-SAME: (i1 [[C:%.*]])
-; CHECK-NEXT: [[A:%.*]] = alloca i8
+; CHECK-NEXT: [[A:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: [[R:%.*]] = call i8 @recursive_not_readnone_internal(i8* noalias nocapture nofree nonnull writeonly dereferenceable(1) [[A]], i1 [[C]])
 ; CHECK-NEXT: ret i8 [[R]]
 ;
@@ -558,7 +558,7 @@ define i8 @readnone_caller(i1 %c) {
 define internal i8 @recursive_not_readnone_internal2(i8* %ptr, i1 %c) {
 ; IS__TUNIT____-LABEL: define {{[^@]+}}@recursive_not_readnone_internal2
 ; IS__TUNIT____-SAME: (i8* noalias nocapture nofree nonnull writeonly [[PTR:%.*]], i1 [[C:%.*]])
-; IS__TUNIT____-NEXT: [[ALLOC:%.*]] = alloca i8
+; IS__TUNIT____-NEXT: [[ALLOC:%.*]] = alloca i8, align 1
 ; IS__TUNIT____-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
 ; IS__TUNIT____: t:
 ; IS__TUNIT____-NEXT: [[TMP1:%.*]] = call i8 @recursive_not_readnone_internal2(i8* noalias nocapture nofree nonnull writeonly dereferenceable(1) [[ALLOC]], i1 false)
@@ -570,7 +570,7 @@ define internal i8 @recursive_not_readnone_internal2(i8* %ptr, i1 %c) {
 ;
 ; IS__CGSCC____-LABEL: define {{[^@]+}}@recursive_not_readnone_internal2
 ; IS__CGSCC____-SAME: (i8* nocapture nofree nonnull writeonly [[PTR:%.*]], i1 [[C:%.*]])
-; IS__CGSCC____-NEXT: [[ALLOC:%.*]] = alloca i8
+; IS__CGSCC____-NEXT:
[[ALLOC:%.*]] = alloca i8 +; IS__CGSCC____-NEXT: [[ALLOC:%.*]] = alloca i8, align 1 ; IS__CGSCC____-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]] ; IS__CGSCC____: t: ; IS__CGSCC____-NEXT: [[TMP1:%.*]] = call i8 @recursive_not_readnone_internal2(i8* noalias nocapture nofree nonnull writeonly dereferenceable(1) [[ALLOC]], i1 false) diff --git a/llvm/test/Transforms/Attributor/misc_crash.ll b/llvm/test/Transforms/Attributor/misc_crash.ll index d4cfe681c5d0..f8e10fa44545 100644 --- a/llvm/test/Transforms/Attributor/misc_crash.ll +++ b/llvm/test/Transforms/Attributor/misc_crash.ll @@ -87,7 +87,7 @@ define void @func4() { define internal void @func5(i32 %0) { ; CHECK-LABEL: define {{[^@]+}}@func5() -; CHECK-NEXT: [[TMP:%.*]] = alloca i8* +; CHECK-NEXT: [[TMP:%.*]] = alloca i8*, align 8 ; CHECK-NEXT: br label [[BLOCK:%.*]] ; CHECK: block: ; CHECK-NEXT: store i8* blockaddress(@func5, [[BLOCK]]), i8** [[TMP]], align 8 diff --git a/llvm/test/Transforms/Attributor/readattrs.ll b/llvm/test/Transforms/Attributor/readattrs.ll index 1a7301e3c1b6..a78ffb0653b6 100644 --- a/llvm/test/Transforms/Attributor/readattrs.ll +++ b/llvm/test/Transforms/Attributor/readattrs.ll @@ -195,7 +195,7 @@ declare void @escape_readonly_ptr(i8** %addr, i8* readonly %ptr) define void @unsound_readnone(i8* %ignored, i8* %escaped_then_written) { ; CHECK-LABEL: define {{[^@]+}}@unsound_readnone ; CHECK-SAME: (i8* nocapture nofree readnone [[IGNORED:%.*]], i8* [[ESCAPED_THEN_WRITTEN:%.*]]) -; CHECK-NEXT: [[ADDR:%.*]] = alloca i8* +; CHECK-NEXT: [[ADDR:%.*]] = alloca i8*, align 8 ; CHECK-NEXT: call void @escape_readnone_ptr(i8** nonnull align 8 dereferenceable(8) [[ADDR]], i8* noalias readnone [[ESCAPED_THEN_WRITTEN]]) ; CHECK-NEXT: [[ADDR_LD:%.*]] = load i8*, i8** [[ADDR]], align 8 ; CHECK-NEXT: store i8 0, i8* [[ADDR_LD]], align 1 @@ -211,7 +211,7 @@ define void @unsound_readnone(i8* %ignored, i8* %escaped_then_written) { define void @unsound_readonly(i8* %ignored, i8* %escaped_then_written) { ; 
CHECK-LABEL: define {{[^@]+}}@unsound_readonly ; CHECK-SAME: (i8* nocapture nofree readnone [[IGNORED:%.*]], i8* [[ESCAPED_THEN_WRITTEN:%.*]]) -; CHECK-NEXT: [[ADDR:%.*]] = alloca i8* +; CHECK-NEXT: [[ADDR:%.*]] = alloca i8*, align 8 ; CHECK-NEXT: call void @escape_readonly_ptr(i8** nonnull align 8 dereferenceable(8) [[ADDR]], i8* readonly [[ESCAPED_THEN_WRITTEN]]) ; CHECK-NEXT: [[ADDR_LD:%.*]] = load i8*, i8** [[ADDR]], align 8 ; CHECK-NEXT: store i8 0, i8* [[ADDR_LD]], align 1 diff --git a/llvm/test/Transforms/Attributor/undefined_behavior.ll b/llvm/test/Transforms/Attributor/undefined_behavior.ll index 8a2185f5adc7..b49c26e8a2f7 100644 --- a/llvm/test/Transforms/Attributor/undefined_behavior.ll +++ b/llvm/test/Transforms/Attributor/undefined_behavior.ll @@ -340,7 +340,7 @@ e: ; FIXME: Currently it doesn't propagate the undef. define i32 @cond_br_on_undef_uninit() { ; CHECK-LABEL: define {{[^@]+}}@cond_br_on_undef_uninit() -; CHECK-NEXT: [[ALLOC:%.*]] = alloca i1 +; CHECK-NEXT: [[ALLOC:%.*]] = alloca i1, align 1 ; CHECK-NEXT: [[COND:%.*]] = load i1, i1* [[ALLOC]], align 1 ; CHECK-NEXT: br i1 [[COND]], label [[T:%.*]], label [[E:%.*]] ; CHECK: t: diff --git a/llvm/test/Transforms/Attributor/value-simplify.ll b/llvm/test/Transforms/Attributor/value-simplify.ll index ac0fcaef1693..febfaba5b19a 100644 --- a/llvm/test/Transforms/Attributor/value-simplify.ll +++ b/llvm/test/Transforms/Attributor/value-simplify.ll @@ -290,10 +290,25 @@ define internal i32* @test_preallocated(i32* preallocated(i32) %a) { ret i32* %a } define i32* @complicated_args_preallocated() { -; CHECK-LABEL: define {{[^@]+}}@complicated_args_preallocated() -; CHECK-NEXT: [[C:%.*]] = call token @llvm.call.preallocated.setup(i32 1) -; CHECK-NEXT: [[CALL:%.*]] = call i32* @test_preallocated(i32* noalias nocapture nofree writeonly preallocated(i32) align 536870912 null) -; CHECK-NEXT: ret i32* [[CALL]] +; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@complicated_args_preallocated() +; IS__TUNIT_OPM-NEXT: 
[[C:%.*]] = call token @llvm.call.preallocated.setup(i32 1) +; IS__TUNIT_OPM-NEXT: [[CALL:%.*]] = call i32* @test_preallocated(i32* noalias nocapture nofree writeonly preallocated(i32) align 536870912 null) #5 [ "preallocated"(token [[C]]) ] +; IS__TUNIT_OPM-NEXT: ret i32* [[CALL]] +; +; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@complicated_args_preallocated() +; IS__TUNIT_NPM-NEXT: [[C:%.*]] = call token @llvm.call.preallocated.setup(i32 1) +; IS__TUNIT_NPM-NEXT: [[CALL:%.*]] = call i32* @test_preallocated(i32* noalias nocapture nofree writeonly preallocated(i32) align 536870912 null) #4 [ "preallocated"(token [[C]]) ] +; IS__TUNIT_NPM-NEXT: ret i32* [[CALL]] +; +; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@complicated_args_preallocated() +; IS__CGSCC_OPM-NEXT: [[C:%.*]] = call token @llvm.call.preallocated.setup(i32 1) +; IS__CGSCC_OPM-NEXT: [[CALL:%.*]] = call i32* @test_preallocated(i32* noalias nocapture nofree writeonly preallocated(i32) align 536870912 null) #6 [ "preallocated"(token [[C]]) ] +; IS__CGSCC_OPM-NEXT: ret i32* [[CALL]] +; +; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@complicated_args_preallocated() +; IS__CGSCC_NPM-NEXT: [[C:%.*]] = call token @llvm.call.preallocated.setup(i32 1) +; IS__CGSCC_NPM-NEXT: [[CALL:%.*]] = call i32* @test_preallocated(i32* noalias nocapture nofree writeonly preallocated(i32) align 536870912 null) #5 [ "preallocated"(token [[C]]) ] +; IS__CGSCC_NPM-NEXT: ret i32* [[CALL]] ; %c = call token @llvm.call.preallocated.setup(i32 1) %call = call i32* @test_preallocated(i32* preallocated(i32) null) ["preallocated"(token %c)] From llvm-commits at lists.llvm.org Fri Jul 10 08:44:02 2020 From: llvm-commits at lists.llvm.org (Valery Pykhtin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:44:02 +0000 (UTC) Subject: [PATCH] D82258: [RegisterCoalescer] Fix IMPLICIT_DEF init removal for a register on joining In-Reply-To: References: Message-ID: vpykhtin added a comment. 
ping Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82258/new/ https://reviews.llvm.org/D82258 From llvm-commits at lists.llvm.org Fri Jul 10 08:45:14 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via llvm-commits) Date: Fri, 10 Jul 2020 08:45:14 -0700 (PDT) Subject: [llvm] 864586d - [ARM] Pass -verify-machineinstr to test and XFAIL until fixed. Message-ID: <5f088d0a.1c69fb81.952b1.fc2a@mx.google.com> Author: Florian Hahn Date: 2020-07-10T16:44:52+01:00 New Revision: 864586d0fd7df8efb2ac1f85ad1122e9a8fae349 URL: https://github.com/llvm/llvm-project/commit/864586d0fd7df8efb2ac1f85ad1122e9a8fae349 DIFF: https://github.com/llvm/llvm-project/commit/864586d0fd7df8efb2ac1f85ad1122e9a8fae349.diff LOG: [ARM] Pass -verify-machineinstr to test and XFAIL until fixed. Some bots run with -verify-machineinstr enabled. Add it to the new test and XFAIL it until fixed. Added: Modified: llvm/test/CodeGen/ARM/dbg-tcreturn.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/ARM/dbg-tcreturn.ll b/llvm/test/CodeGen/ARM/dbg-tcreturn.ll index 9aed1bb14d58..37ec4e3d92ee 100644 --- a/llvm/test/CodeGen/ARM/dbg-tcreturn.ll +++ b/llvm/test/CodeGen/ARM/dbg-tcreturn.ll @@ -1,4 +1,5 @@ -; RUN: llc %s -o - -stop-after=finalize-isel | FileCheck %s +; XFAIL: * +; RUN: llc %s -o - -stop-after=finalize-isel -verify-machineinstr | FileCheck %s target datalayout = "e-m:o-p:32:32-Fi8-f64:32:64-v64:32:64-v128:32:128-a:0:32-n32-S32" target triple = "thumbv7-apple-ios7.0.0" From llvm-commits at lists.llvm.org Fri Jul 10 08:50:08 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:50:08 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <9edeffc76b2cb556a73af259529be062@localhost.localdomain> aykevl updated this revision to Diff 277061. 
aykevl added a comment. Thanks for the review! This should address all comments. `-Ttext=0` was indeed unnecessary (tests still pass) so I've removed it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 Files: lld/ELF/Arch/AVR.cpp lld/test/ELF/avr-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D78741.277061.patch Type: text/x-patch Size: 6055 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:51:11 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:51:11 +0000 (UTC) Subject: [PATCH] D83437: [PowerPC] Enable default support of quad precision operations In-Reply-To: References: Message-ID: <45db6963f1c335dc14daf82bd29f500d@localhost.localdomain> nemanjai added inline comments. ================ Comment at: llvm/test/CodeGen/PowerPC/float-load-store-pair.ll:71 ; CHECK-NEXT: ld 3, a14 at toc@l(3) -; CHECK-NEXT: stxvx 34, 1, 5 -; CHECK-NEXT: li 5, 152 ---------------- This missing store is a bit of a concern. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83437/new/ https://reviews.llvm.org/D83437 From llvm-commits at lists.llvm.org Fri Jul 10 08:51:33 2020 From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:51:33 +0000 (UTC) Subject: [PATCH] D83265: [MBP] Use profile count to compute tail dup cost if it is available In-Reply-To: References: Message-ID: <26b61854c72ba515044ee597aed636ae@localhost.localdomain> davidxl added a comment. A few things: 1. can this be extended to the TailDup pass too? 2. what is the impact on text size? 3. what is the performance with AFDO and XFDO? 4. Can the threshold be tuned higher (with more performance)? 
================ Comment at: llvm/lib/CodeGen/MachineBlockPlacement.cpp:420 + /// The return value is used to model tail duplication cost. + BlockFrequency getBlockCountOrFrequency(const MachineBasicBlock *BB) { + if (UseProfileCount) { ---------------- This should probably return Optional<..> ================ Comment at: llvm/lib/CodeGen/MachineBlockPlacement.cpp:426 + else + return 0; + } ---------------- Return None here. Also when None is returned, I think the caller needs to handle it conservatively -- perhaps resort to Freq based method ================ Comment at: llvm/lib/CodeGen/MachineBlockPlacement.cpp:3167 + SmallVector Succs; + for (MachineBasicBlock *Succ : BB->successors()) { + if (BlockFilter && !BlockFilter->count(Succ)) ---------------- Is this change related? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83265/new/ https://reviews.llvm.org/D83265 From llvm-commits at lists.llvm.org Fri Jul 10 08:52:07 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:52:07 +0000 (UTC) Subject: [PATCH] D83437: [PowerPC] Enable default support of quad precision operations In-Reply-To: References: Message-ID: <89a95a73567d6133349a6b8b3a36ab8e@localhost.localdomain> nemanjai accepted this revision. nemanjai added a comment. LGTM. Thank you. ================ Comment at: llvm/test/CodeGen/PowerPC/float-load-store-pair.ll:71 ; CHECK-NEXT: ld 3, a14 at toc@l(3) -; CHECK-NEXT: stxvx 34, 1, 5 -; CHECK-NEXT: li 5, 152 ---------------- nemanjai wrote: > This missing store is a bit of a concern. Oh NVM. That is a store to pass the f128 on the stack which we don't need to do according to the ABI. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83437/new/ https://reviews.llvm.org/D83437 From llvm-commits at lists.llvm.org Fri Jul 10 08:55:13 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:55:13 +0000 (UTC) Subject: [PATCH] D78663: [builtins] Add 32-bit shift builtins In-Reply-To: References: Message-ID: <2c067667d9855f0f4ae28bc08ad02dc0@localhost.localdomain> aykevl marked an inline comment as done. aykevl added inline comments. ================ Comment at: compiler-rt/lib/builtins/int_types.h:46-65 +#if _YUGA_LITTLE_ENDIAN + hu_int low; + hi_int high; +#else + hi_int high; + hu_int low; +#endif // _YUGA_LITTLE_ENDIAN ---------------- luismarques wrote: > Are these macro definitions processor-specific? If so, can't we use something more general? I'm not sure what `YUGA` means exactly. The constants are defined in int_endianness.h and are also used in various other places in int_types.h (this file). I'm just following the existing convention. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78663/new/ https://reviews.llvm.org/D78663 From llvm-commits at lists.llvm.org Fri Jul 10 08:55:46 2020 From: llvm-commits at lists.llvm.org (Michael Liao via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:55:46 +0000 (UTC) Subject: [PATCH] D83562: [fix-irreducible] Skip unreachable predecessors. Message-ID: hliao created this revision. hliao added a reviewer: sameerds. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. - Skip unreachable predecessors during header detection in SCC. Those unreachable blocks would be generated in the switch lowering pass in the corner cases or other frontends. Even though they could be removed through the CFG simplification, we should skip them during header detection. 
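The header-detection rule described in the D83562 summary above can be sketched on a toy CFG. This is only an illustration under stated assumptions, not the code in FixIrreducible.cpp: `Block`, `CFG`, `reachableFrom`, `findHeaders` and `exampleCFG` are hypothetical names, and the plain worklist walk stands in for the dominator tree's `isReachableFromEntry` query. The graph is modelled on the one in the new unreachable.ll test.

```cpp
#include <map>
#include <set>
#include <vector>

// Toy CFG: block id -> successor list. All names are hypothetical; this is
// a standalone sketch, not the FixIrreducible.cpp implementation.
using Block = int;
using CFG = std::map<Block, std::vector<Block>>;

// Collect every block reachable from Entry with a simple worklist walk.
// Assumption: this plays the role of DominatorTree::isReachableFromEntry.
std::set<Block> reachableFrom(const CFG &G, Block Entry) {
  std::set<Block> Seen;
  std::vector<Block> Work{Entry};
  while (!Work.empty()) {
    Block B = Work.back();
    Work.pop_back();
    if (!Seen.insert(B).second)
      continue; // already visited
    auto It = G.find(B);
    if (It == G.end())
      continue;
    for (Block S : It->second)
      Work.push_back(S);
  }
  return Seen;
}

// A block of the SCC counts as a header if it has a predecessor outside the
// SCC. With SkipUnreachable set, predecessors that cannot be reached from
// Entry are ignored, which is the behaviour the patch summary describes.
std::set<Block> findHeaders(const CFG &G, Block Entry,
                            const std::set<Block> &SCC, bool SkipUnreachable) {
  std::set<Block> Reachable = reachableFrom(G, Entry);
  std::set<Block> Headers;
  for (const auto &Node : G) {
    Block Pred = Node.first;
    if (SkipUnreachable && !Reachable.count(Pred))
      continue; // unreachable predecessor: does not make a header
    if (SCC.count(Pred))
      continue; // edge from inside the SCC: not an entry edge
    for (Block S : Node.second)
      if (SCC.count(S))
        Headers.insert(S);
  }
  return Headers;
}

// Graph modelled on unreachable.ll: 0=entry, 1=loop.body, 2=inner.block,
// 3=unreachable.block, 4=loop.latch, 5=loop.exit.
CFG exampleCFG() {
  return {{0, {1}}, {1, {2}}, {2, {5, 4}}, {3, {2}}, {4, {1}}, {5, {}}};
}
```

With the SCC {loop.body, inner.block, loop.latch}, calling `findHeaders(exampleCFG(), 0, {1, 2, 4}, false)` reports two headers, because unreachable.block also feeds inner.block, so the loop looks irreducible. Passing `true` for `SkipUnreachable` leaves loop.body as the only header, which is consistent with the test's `CHECK-NOT: irr.guard:` expectation.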
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83562 Files: llvm/lib/Transforms/Utils/FixIrreducible.cpp llvm/test/Transforms/FixIrreducible/unreachable.ll Index: llvm/test/Transforms/FixIrreducible/unreachable.ll =================================================================== --- /dev/null +++ llvm/test/Transforms/FixIrreducible/unreachable.ll @@ -0,0 +1,24 @@ +; RUN: opt %s -fix-irreducible -S -o - | FileCheck %s + +; CHECK-LABEL: @unreachable( +; CHECK: entry: +; CHECK-NOT: irr.guard: +define void @unreachable(i32 %n) { +entry: + br label %loop.body + +loop.body: + br label %inner.block + +unreachable.block: + br label %inner.block + +inner.block: + br i1 undef, label %loop.exit, label %loop.latch + +loop.latch: + br label %loop.body + +loop.exit: + ret void +} Index: llvm/lib/Transforms/Utils/FixIrreducible.cpp =================================================================== --- llvm/lib/Transforms/Utils/FixIrreducible.cpp +++ llvm/lib/Transforms/Utils/FixIrreducible.cpp @@ -281,6 +281,9 @@ LLVM_DEBUG(dbgs() << "Found headers:"); for (auto BB : reverse(Blocks)) { for (const auto P : predecessors(BB)) { + // Skip unreachable predecessors. + if (!DT.isReachableFromEntry(P)) + continue; if (!Blocks.count(P)) { LLVM_DEBUG(dbgs() << " " << BB->getName()); Headers.insert(BB); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83562.277062.patch Type: text/x-patch Size: 1223 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 08:59:15 2020 From: llvm-commits at lists.llvm.org (Xun Li via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 15:59:15 +0000 (UTC) Subject: [PATCH] D83563: [Coroutines] Fix a typo in documentation Message-ID: lxfind created this revision. lxfind added reviewers: GorNishanov, majnemer. Herald added subscribers: llvm-commits, modocache. Herald added a project: LLVM. In the example, the variable that's crossing suspend point was referred wrongly, fix it. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83563 Files: llvm/docs/Coroutines.rst Index: llvm/docs/Coroutines.rst =================================================================== --- llvm/docs/Coroutines.rst +++ llvm/docs/Coroutines.rst @@ -257,10 +257,10 @@ One of the steps of coroutine lowering is building the coroutine frame. The def-use chains are analyzed to determine which objects need be kept alive across suspend points. In the coroutine shown in the previous section, use of virtual register -`%n.val` is separated from the definition by a suspend point, therefore, it +`%inc` is separated from the definition by a suspend point, therefore, it cannot reside on the stack frame since the latter goes away once the coroutine is suspended and control is returned back to the caller. An i32 slot is -allocated in the coroutine frame and `%n.val` is spilled and reloaded from that +allocated in the coroutine frame and `%inc` is spilled and reloaded from that slot as needed. We also store addresses of the resume and destroy functions so that the -------------- next part -------------- A non-text attachment was scrubbed... Name: D83563.277063.patch Type: text/x-patch Size: 991 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:00:56 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:00:56 +0000 (UTC) Subject: [PATCH] D81788: [OpenMPOpt] ICV Tracking In-Reply-To: References: Message-ID: <820f5dc6b72b949f4d3e213c79fb93ec@localhost.localdomain> lebedev.ri added a comment. Herald added subscribers: okura, bbn. I've reverted this in rG1d542f0ca83fa1411d6501a8d088450d83abd5b8. There appears to be some kind of memory corruption/use-after-free/etc going on here. In particular, in `OpenMPOpt::deleteParallelRegions()`, in `DeleteCallCB()`, `CI` is garbage. 
Reduced reproducer: `./bin/opt -O3 /tmp/test.ll`: ; ModuleID = '/home/lebedevri/CREDUCE/input.c' source_filename = "/home/lebedevri/CREDUCE/input.c" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-linux-gnu" %struct.ident_t = type { i32, i32, i32, i32, i8* } @.str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1 @0 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8 ; Function Attrs: nounwind uwtable define dso_local i32 @b() #0 { %1 = alloca i32, align 4 %2 = call i32 @a() %3 = load i32, i32* %1, align 4 ret i32 %3 } ; Function Attrs: nounwind uwtable define internal i32 @a() #0 { %1 = alloca i32, align 4 %2 = call i32 @b() call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined. to void (i32*, i32*, ...)*)) %3 = load i32, i32* %1, align 4 ret i32 %3 } ; Function Attrs: norecurse nounwind uwtable define internal void @.omp_outlined.(i32* noalias %0, i32* noalias %1) #1 { %3 = alloca i32*, align 8 %4 = alloca i32*, align 8 store i32* %0, i32** %3, align 8, !tbaa !2 store i32* %1, i32** %4, align 8, !tbaa !2 ret void } ; Function Attrs: nounwind declare !callback !6 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) 
#2 attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #2 = { nounwind } !llvm.module.flags = !{!0} !llvm.ident = !{!1} !0 = !{i32 1, !"wchar_size", i32 4} !1 = !{!"Debian clang version 11.0.0-++20200709100646+c92a8c0a0f6-1~exp1~20200709201313.3348"} !2 = !{!3, !3, i64 0} !3 = !{!"any pointer", !4, i64 0} !4 = !{!"omnipotent char", !5, i64 0} !5 = !{!"Simple C/C++ TBAA"} !6 = !{!7} !7 = !{i64 2, i64 -1, i64 -1, i1 true} Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81788/new/ https://reviews.llvm.org/D81788 From llvm-commits at lists.llvm.org Fri Jul 10 09:00:56 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Fri, 10 Jul 2020 09:00:56 -0700 (PDT) Subject: [llvm] 1d542f0 - Revert "[OpenMPOpt] ICV Tracking" Message-ID: <5f0890b8.1c69fb81.436ee.ee50@mx.google.com> Author: Roman Lebedev Date: 2020-07-10T19:00:15+03:00 New Revision: 1d542f0ca83fa1411d6501a8d088450d83abd5b8 URL: https://github.com/llvm/llvm-project/commit/1d542f0ca83fa1411d6501a8d088450d83abd5b8 
DIFF: https://github.com/llvm/llvm-project/commit/1d542f0ca83fa1411d6501a8d088450d83abd5b8.diff LOG: Revert "[OpenMPOpt] ICV Tracking" There appears to be some kind of memory corruption/use-after-free/etc going on here. In particular, in `OpenMPOpt::deleteParallelRegions()`, in `DeleteCallCB()`, `CI` is garbage. WIll post reproducer in the original review. This reverts commit 6c4a5e9257bac022ffe60e466686ba7fc96ffd1a. Added: Modified: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/icv_tracking.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/Transforms/IPO/Attributor.h b/llvm/include/llvm/Transforms/IPO/Attributor.h index c6261845b765..93fc89278c79 100644 --- a/llvm/include/llvm/Transforms/IPO/Attributor.h +++ b/llvm/include/llvm/Transforms/IPO/Attributor.h @@ -1036,14 +1036,6 @@ struct Attributor { identifyDefaultAbstractAttributes(const_cast(F)); } - /// Helper function to remove callsite. - void removeCallSite(CallInst *CI) { - if (!CI) - return; - - CGUpdater.removeCallSite(*CI); - } - /// Record that \p U is to be replaces with \p NV after information was /// manifested. This also triggers deletion of trivially dead istructions. bool changeUseAfterManifest(Use &U, Value &NV) { diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index 8ad562f513e4..85d88ec3ca26 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -53,47 +53,8 @@ STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified, static constexpr auto TAG = "[" DEBUG_TYPE "]"; #endif -/// Helper struct to store tracked ICV values at specif instructions. 
-struct ICVValue { - Instruction *Inst; - Value *TrackedValue; - - ICVValue(Instruction *I, Value *Val) : Inst(I), TrackedValue(Val) {} -}; - -namespace llvm { - -// Provide DenseMapInfo for ICVValue -template <> struct DenseMapInfo { - using InstInfo = DenseMapInfo; - using ValueInfo = DenseMapInfo; - - static inline ICVValue getEmptyKey() { - return ICVValue(InstInfo::getEmptyKey(), ValueInfo::getEmptyKey()); - }; - - static inline ICVValue getTombstoneKey() { - return ICVValue(InstInfo::getTombstoneKey(), ValueInfo::getTombstoneKey()); - }; - - static unsigned getHashValue(const ICVValue &ICVVal) { - return detail::combineHashValue( - InstInfo::getHashValue(ICVVal.Inst), - ValueInfo::getHashValue(ICVVal.TrackedValue)); - } - - static bool isEqual(const ICVValue &LHS, const ICVValue &RHS) { - return InstInfo::isEqual(LHS.Inst, RHS.Inst) && - ValueInfo::isEqual(LHS.TrackedValue, RHS.TrackedValue); - } -}; - -} // end namespace llvm - namespace { -struct AAICVTracker; - /// OpenMP specific information. For now, stores RFIs and ICVs also needed for /// Attributor runs. struct OMPInformationCache : public InformationCache { @@ -160,9 +121,9 @@ struct OMPInformationCache : public InformationCache { /// Return the vector of uses in function \p F. UseVector &getOrCreateUseVector(Function *F) { - std::shared_ptr &UV = UsesMap[F]; + std::unique_ptr &UV = UsesMap[F]; if (!UV) - UV = std::make_shared(); + UV = std::make_unique(); return *UV; } @@ -218,7 +179,7 @@ struct OMPInformationCache : public InformationCache { private: /// Map from functions to all uses of this runtime function contained in /// them. - DenseMap> UsesMap; + DenseMap> UsesMap; }; /// The slice of the module we are allowed to look at. 
@@ -391,9 +352,9 @@ struct OpenMPOpt { OpenMPOpt(SmallVectorImpl &SCC, CallGraphUpdater &CGUpdater, OptimizationRemarkGetter OREGetter, - OMPInformationCache &OMPInfoCache, Attributor &A) + OMPInformationCache &OMPInfoCache) : M(*(*SCC.begin())->getParent()), SCC(SCC), CGUpdater(CGUpdater), - OREGetter(OREGetter), OMPInfoCache(OMPInfoCache), A(A) {} + OREGetter(OREGetter), OMPInfoCache(OMPInfoCache) {} /// Run all OpenMP optimizations on the underlying SCC/ModuleSlice. bool run() { @@ -424,7 +385,6 @@ struct OpenMPOpt { } } - Changed |= runAttributor(); Changed |= deduplicateRuntimeCalls(); Changed |= deleteParallelRegions(); @@ -786,206 +746,9 @@ struct OpenMPOpt { /// OpenMP-specific information cache. Also Used for Attributor runs. OMPInformationCache &OMPInfoCache; - - /// Attributor instance. - Attributor &A; - - /// Helper function to run Attributor on SCC. - bool runAttributor() { - if (SCC.empty()) - return false; - - registerAAs(); - - ChangeStatus Changed = A.run(); - - LLVM_DEBUG(dbgs() << "[Attributor] Done with " << SCC.size() - << " functions, result: " << Changed << ".\n"); - - return Changed == ChangeStatus::CHANGED; - } - - /// Populate the Attributor with abstract attribute opportunities in the - /// function. - void registerAAs() { - for (Function *F : SCC) { - if (F->isDeclaration()) - continue; - - A.getOrCreateAAFor(IRPosition::function(*F)); - } - } -}; - -/// Abstract Attribute for tracking ICV values. -struct AAICVTracker : public StateWrapper { - using Base = StateWrapper; - AAICVTracker(const IRPosition &IRP, Attributor &A) : Base(IRP) {} - - /// Returns true if value is assumed to be tracked. - bool isAssumedTracked() const { return getAssumed(); } - - /// Returns true if value is known to be tracked. - bool isKnownTracked() const { return getAssumed(); } - - /// Create an abstract attribute biew for the position \p IRP. 
- static AAICVTracker &createForPosition(const IRPosition &IRP, Attributor &A); - - /// Return the value with which \p I can be replaced for specific \p ICV. - virtual Value *getReplacementValue(InternalControlVar ICV, - const Instruction *I, Attributor &A) = 0; - - /// See AbstractAttribute::getName() - const std::string getName() const override { return "AAICVTracker"; } - - static const char ID; -}; - -struct AAICVTrackerFunction : public AAICVTracker { - AAICVTrackerFunction(const IRPosition &IRP, Attributor &A) - : AAICVTracker(IRP, A) {} - - // FIXME: come up with better string. - const std::string getAsStr() const override { return "ICVTracker"; } - - // FIXME: come up with some stats. - void trackStatistics() const override {} - - /// TODO: decide whether to deduplicate here, or use current - /// deduplicateRuntimeCalls function. - ChangeStatus manifest(Attributor &A) override { - ChangeStatus Changed = ChangeStatus::UNCHANGED; - - for (InternalControlVar &ICV : TrackableICVs) - if (deduplicateICVGetters(ICV, A)) - Changed = ChangeStatus::CHANGED; - - return Changed; - } - - bool deduplicateICVGetters(InternalControlVar &ICV, Attributor &A) { - auto &OMPInfoCache = static_cast(A.getInfoCache()); - auto &ICVInfo = OMPInfoCache.ICVs[ICV]; - auto &GetterRFI = OMPInfoCache.RFIs[ICVInfo.Getter]; - - bool Changed = false; - - auto ReplaceAndDeleteCB = [&](Use &U, Function &Caller) { - CallInst *CI = OpenMPOpt::getCallIfRegularCall(U, &GetterRFI); - Instruction *UserI = cast(U.getUser()); - Value *ReplVal = getReplacementValue(ICV, UserI, A); - - if (!ReplVal || !CI) - return false; - - A.removeCallSite(CI); - CI->replaceAllUsesWith(ReplVal); - CI->eraseFromParent(); - Changed = true; - return true; - }; - - GetterRFI.foreachUse(ReplaceAndDeleteCB); - return Changed; - } - - // Map of ICV to their values at specific program point. 
- EnumeratedArray, InternalControlVar, - InternalControlVar::ICV___last> - ICVValuesMap; - - // Currently only nthreads is being tracked. - // this array will only grow with time. - InternalControlVar TrackableICVs[1] = {ICV_nthreads}; - - ChangeStatus updateImpl(Attributor &A) override { - ChangeStatus HasChanged = ChangeStatus::UNCHANGED; - - Function *F = getAnchorScope(); - - auto &OMPInfoCache = static_cast(A.getInfoCache()); - - for (InternalControlVar ICV : TrackableICVs) { - auto &SetterRFI = OMPInfoCache.RFIs[OMPInfoCache.ICVs[ICV].Setter]; - - auto TrackValues = [&](Use &U, Function &) { - CallInst *CI = OpenMPOpt::getCallIfRegularCall(U); - if (!CI) - return false; - - // FIXME: handle setters with more that 1 arguments. - /// Track new value. - if (ICVValuesMap[ICV].insert(ICVValue(CI, CI->getArgOperand(0)))) - HasChanged = ChangeStatus::CHANGED; - - return false; - }; - - SetterRFI.foreachUse(TrackValues, F); - } - - return HasChanged; - } - - /// Return the value with which \p I can be replaced for specific \p ICV. - Value *getReplacementValue(InternalControlVar ICV, const Instruction *I, - Attributor &A) override { - const BasicBlock *CurrBB = I->getParent(); - - auto &ValuesSet = ICVValuesMap[ICV]; - auto &OMPInfoCache = static_cast(A.getInfoCache()); - auto &GetterRFI = OMPInfoCache.RFIs[OMPInfoCache.ICVs[ICV].Getter]; - - for (const auto &ICVVal : ValuesSet) { - if (CurrBB == ICVVal.Inst->getParent()) { - if (!ICVVal.Inst->comesBefore(I)) - continue; - - // both instructions are in the same BB and at \p I we know the ICV - // value. - while (I != ICVVal.Inst) { - // we don't yet know if a call might update an ICV. - // TODO: check callsite AA for value. - if (const auto *CB = dyn_cast(I)) - if (CB->getCalledFunction() != GetterRFI.Declaration) - return nullptr; - - I = I->getPrevNode(); - } - - // No call in between, return the value. - return ICVVal.TrackedValue; - } - } - - // No value was tracked. 
- return nullptr; - } }; } // namespace -const char AAICVTracker::ID = 0; - -AAICVTracker &AAICVTracker::createForPosition(const IRPosition &IRP, - Attributor &A) { - AAICVTracker *AA = nullptr; - switch (IRP.getPositionKind()) { - case IRPosition::IRP_INVALID: - case IRPosition::IRP_FLOAT: - case IRPosition::IRP_ARGUMENT: - case IRPosition::IRP_RETURNED: - case IRPosition::IRP_CALL_SITE_RETURNED: - case IRPosition::IRP_CALL_SITE_ARGUMENT: - case IRPosition::IRP_CALL_SITE: - llvm_unreachable("ICVTracker can only be created for function position!"); - case IRPosition::IRP_FUNCTION: - AA = new (A.Allocator) AAICVTrackerFunction(IRP, A); - break; - } - - return *AA; -} - PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM, LazyCallGraph &CG, CGSCCUpdateResult &UR) { @@ -1022,10 +785,8 @@ PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C, OMPInformationCache InfoCache(*(Functions.back()->getParent()), AG, Allocator, /*CGSCC*/ &Functions, ModuleSlice); - Attributor A(Functions, InfoCache, CGUpdater); - // TODO: Compute the module slice we are allowed to look at. - OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache, A); + OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache); bool Changed = OMPOpt.run(); (void)Changed; return PreservedAnalyses::all(); @@ -1089,10 +850,8 @@ struct OpenMPOptLegacyPass : public CallGraphSCCPass { Allocator, /*CGSCC*/ &Functions, ModuleSlice); - Attributor A(Functions, InfoCache, CGUpdater); - // TODO: Compute the module slice we are allowed to look at. 
- OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache, A); + OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache); return OMPOpt.run(); } diff --git a/llvm/test/Transforms/OpenMP/icv_tracking.ll b/llvm/test/Transforms/OpenMP/icv_tracking.ll index c2b5d40ce97a..e3704338a7a9 100644 --- a/llvm/test/Transforms/OpenMP/icv_tracking.ll +++ b/llvm/test/Transforms/OpenMP/icv_tracking.ll @@ -11,12 +11,16 @@ define dso_local i32 @foo(i32 %0, i32 %1) { ; CHECK-LABEL: define {{[^@]+}}@foo ; CHECK-SAME: (i32 [[TMP0:%.*]], i32 [[TMP1:%.*]]) ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 [[TMP0]]) +; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 [[TMP1]]) -; CHECK-NEXT: tail call void @use(i32 [[TMP1]]) -; CHECK-NEXT: tail call void @use(i32 [[TMP1]]) +; CHECK-NEXT: [[TMP4:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: tail call void @use(i32 [[TMP4]]) +; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) ; CHECK-NEXT: tail call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined. 
to void (i32*, i32*, ...)*)) -; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: tail call void @use(i32 [[TMP3]]) +; CHECK-NEXT: [[TMP7:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: tail call void @use(i32 [[TMP7]]) ; CHECK-NEXT: ret i32 0 ; tail call void @omp_set_num_threads(i32 %0) @@ -47,13 +51,15 @@ define internal void @.omp_outlined.(i32* %0, i32* %1) { ; CHECK-NEXT: [[TMP4:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @use(i32 [[TMP4]]) ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 10) -; CHECK-NEXT: tail call void @use(i32 10) +; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) ; CHECK-NEXT: ret void ; ; FIXME: this value should be tracked and the rest of the getters deduplicated and replaced with it. %3 = tail call i32 @omp_get_max_threads() %4 = tail call i32 @omp_get_max_threads() tail call void @use(i32 %4) +; FIXME: this value ( min(%3, 10) ) should be tracked and the rest of the getters deduplicated and replaced with it. tail call void @omp_set_num_threads(i32 10) %5 = tail call i32 @omp_get_max_threads() tail call void @use(i32 %5) @@ -68,9 +74,10 @@ define dso_local i32 @bar(i32 %0, i32 %1) { ; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 [[TMP0]], [[TMP1]] ; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 [[TMP0]], i32 [[TMP1]] ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 [[TMP4]]) -; CHECK-NEXT: tail call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined..1 to void (i32*, i32*, ...)*)) ; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) +; CHECK-NEXT: tail call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) 
@__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined..1 to void (i32*, i32*, ...)*)) +; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: tail call void @use(i32 [[TMP6]]) ; CHECK-NEXT: ret i32 0 ; %3 = icmp sgt i32 %0, %1 @@ -90,9 +97,10 @@ define internal void @.omp_outlined..1(i32* %0, i32* %1) { ; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @use(i32 [[TMP3]]) ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 10) -; CHECK-NEXT: tail call void @use(i32 10) ; CHECK-NEXT: [[TMP4:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @use(i32 [[TMP4]]) +; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) ; CHECK-NEXT: ret void ; %3 = tail call i32 @omp_get_max_threads() From llvm-commits at lists.llvm.org Fri Jul 10 09:01:46 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?Nicolai_H=C3=A4hnle_via_Phabricator?= via llvm-commits) Date: Fri, 10 Jul 2020 16:01:46 +0000 (UTC) Subject: [PATCH] D83088: Introduce CfgTraits abstraction In-Reply-To: References: Message-ID: <096321d6dc8c28fb1040aaa2fffd5e61@localhost.localdomain> nhaehnle marked an inline comment as done. nhaehnle added inline comments. ================ Comment at: llvm/include/llvm/CodeGen/MachineCfgTraits.h:136-138 + // Prefer to avoid support for bundled instructions as long as we + // don't really need it. + assert(!m_instr->isBundle()); ---------------- arsenm wrote: > nhaehnle wrote: > > arsenm wrote: > > > I've been thinking about more aggressively using bundles around call sites to handle waterfall looping around divergent calls with SGPR arguments > > Hmm, so what's the correct iteration behavior in the presence of bundles? Iterate over all instructions in the bundle (which is that MachineBasicBlock::instr_iterator does) and only iterate over explicit defs? 
I think that's what makes the most sense, and what I'm going with for now... > I don't think this actually needs to specially consider bundles. The BUNDLE itself is supposed to have the uses/defs that cover all the uses/defs inside the bundle. You shouldn't need to worry about the individual instructions This is what should be there with the last change :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83088/new/ https://reviews.llvm.org/D83088 From llvm-commits at lists.llvm.org Fri Jul 10 09:05:01 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via llvm-commits) Date: Fri, 10 Jul 2020 09:05:01 -0700 (PDT) Subject: [llvm] 1fbb719 - [LPM] Port CGProfilePass from NPM to LPM Message-ID: <5f0891ad.1c69fb81.7808c.0895@mx.google.com> Author: Zequan Wu Date: 2020-07-10T09:04:51-07:00 New Revision: 1fbb719470c6e0395abaab66c68fae3b8ae405d0 URL: https://github.com/llvm/llvm-project/commit/1fbb719470c6e0395abaab66c68fae3b8ae405d0 DIFF: https://github.com/llvm/llvm-project/commit/1fbb719470c6e0395abaab66c68fae3b8ae405d0.diff LOG: [LPM] Port CGProfilePass from NPM to LPM Reviewers: hans, chandlerc!, asbirlea, nikic Reviewed By: hans, nikic Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D83013 Added: Modified: clang/include/clang/Basic/CodeGenOptions.def clang/lib/CodeGen/BackendUtil.cpp clang/lib/Frontend/CompilerInvocation.cpp llvm/include/llvm/InitializePasses.h llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h llvm/include/llvm/Transforms/Instrumentation/CGProfile.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Transforms/IPO/PassManagerBuilder.cpp llvm/lib/Transforms/Instrumentation/CGProfile.cpp llvm/lib/Transforms/Instrumentation/Instrumentation.cpp llvm/test/CodeGen/AMDGPU/opt-pipeline.ll 
llvm/test/Instrumentation/cgprofile.ll llvm/test/Other/opt-O2-pipeline.ll llvm/test/Other/opt-O3-pipeline.ll llvm/test/Other/opt-Os-pipeline.ll Removed: llvm/test/Other/new-pm-cgprofile.ll ################################################################################ diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def index 67c0b4203420..c7e01eb12851 100644 --- a/clang/include/clang/Basic/CodeGenOptions.def +++ b/clang/include/clang/Basic/CodeGenOptions.def @@ -254,7 +254,6 @@ CODEGENOPT(UnwindTables , 1, 0) ///< Emit unwind tables. CODEGENOPT(VectorizeLoop , 1, 0) ///< Run loop vectorizer. CODEGENOPT(VectorizeSLP , 1, 0) ///< Run SLP vectorizer. CODEGENOPT(ProfileSampleAccurate, 1, 0) ///< Sample profile is accurate. -CODEGENOPT(CallGraphProfile , 1, 0) ///< Run call graph profile. /// Attempt to use register sized accesses to bit-fields in structures, when /// possible. diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index 9e6d5e4593d3..dce0940670a2 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -620,6 +620,9 @@ void EmitAssemblyHelper::CreatePasses(legacy::PassManager &MPM, PMBuilder.SizeLevel = CodeGenOpts.OptimizeSize; PMBuilder.SLPVectorize = CodeGenOpts.VectorizeSLP; PMBuilder.LoopVectorize = CodeGenOpts.VectorizeLoop; + // Only enable CGProfilePass when using integrated assembler, since + // non-integrated assemblers don't recognize .cgprofile section. 
+ PMBuilder.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; PMBuilder.DisableUnrollLoops = !CodeGenOpts.UnrollLoops; // Loop interleaving in the loop vectorizer has historically been set to be @@ -1144,7 +1147,9 @@ void EmitAssemblyHelper::EmitAssemblyWithNewPassManager( PTO.LoopInterleaving = CodeGenOpts.UnrollLoops; PTO.LoopVectorization = CodeGenOpts.VectorizeLoop; PTO.SLPVectorization = CodeGenOpts.VectorizeSLP; - PTO.CallGraphProfile = CodeGenOpts.CallGraphProfile; + // Only enable CGProfilePass when using integrated assembler, since + // non-integrated assemblers don't recognize .cgprofile section. + PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS; PTO.Coroutines = LangOpts.Coroutines; PassInstrumentationCallbacks PIC; @@ -1562,7 +1567,9 @@ static void runThinLTOBackend( Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops; Conf.PTO.LoopVectorization = CGOpts.VectorizeLoop; Conf.PTO.SLPVectorization = CGOpts.VectorizeSLP; - Conf.PTO.CallGraphProfile = CGOpts.CallGraphProfile; + // Only enable CGProfilePass when using integrated assembler, since + // non-integrated assemblers don't recognize .cgprofile section. + Conf.PTO.CallGraphProfile = !CGOpts.DisableIntegratedAS; // Context sensitive profile. 
if (CGOpts.hasProfileCSIRInstr()) { diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp index e24de29a309e..863c6b3ca4f3 100644 --- a/clang/lib/Frontend/CompilerInvocation.cpp +++ b/clang/lib/Frontend/CompilerInvocation.cpp @@ -860,7 +860,6 @@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK, Opts.RerollLoops = Args.hasArg(OPT_freroll_loops); Opts.DisableIntegratedAS = Args.hasArg(OPT_fno_integrated_as); - Opts.CallGraphProfile = !Opts.DisableIntegratedAS; Opts.Autolink = !Args.hasArg(OPT_fno_autolink); Opts.SampleProfileFile = std::string(Args.getLastArgValue(OPT_fprofile_sample_use_EQ)); diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index f0d5accf13c5..06e8507036ac 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -103,6 +103,7 @@ void initializeCFGViewerLegacyPassPass(PassRegistry&); void initializeCFIInstrInserterPass(PassRegistry&); void initializeCFLAndersAAWrapperPassPass(PassRegistry&); void initializeCFLSteensAAWrapperPassPass(PassRegistry&); +void initializeCGProfileLegacyPassPass(PassRegistry &); void initializeCallGraphDOTPrinterPass(PassRegistry&); void initializeCallGraphPrinterLegacyPassPass(PassRegistry&); void initializeCallGraphViewerPass(PassRegistry&); diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index 28e454d3b0fc..d1b9f269d5d4 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -282,6 +282,8 @@ ModulePass *createSampleProfileLoaderPass(StringRef Name); ModulePass *createWriteThinLTOBitcodePass(raw_ostream &Str, raw_ostream *ThinLinkOS = nullptr); +ModulePass *createCGProfileLegacyPass(); + } // End llvm namespace #endif diff --git a/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h b/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h index 8b03bcba10e4..a9928c3f5a40 100644 --- 
a/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h +++ b/llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h @@ -156,6 +156,7 @@ class PassManagerBuilder { bool DisableTailCalls; bool DisableUnrollLoops; + bool CallGraphProfile; bool SLPVectorize; bool LoopVectorize; bool LoopsInterleaved; diff --git a/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h b/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h index 28fd3804dec9..4cb45fd42f80 100644 --- a/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h +++ b/llvm/include/llvm/Transforms/Instrumentation/CGProfile.h @@ -19,11 +19,6 @@ namespace llvm { class CGProfilePass : public PassInfoMixin { public: PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM); - -private: - void addModuleFlags( - Module &M, - MapVector, uint64_t> &Counts) const; }; } // end namespace llvm diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 675511a542a1..771cdfd17aa5 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -248,10 +248,6 @@ static cl::opt EnableCHR("enable-chr-npm", cl::init(true), cl::Hidden, cl::desc("Enable control height reduction optimization (CHR)")); -static cl::opt EnableCallGraphProfile( - "enable-npm-call-graph-profile", cl::init(true), cl::Hidden, - cl::desc("Enable call graph profile pass for the new PM (default = on)")); - /// Flag to enable inline deferral during PGO. 
static cl::opt EnablePGOInlineDeferral("enable-npm-pgo-inline-deferral", cl::init(true), @@ -267,7 +263,7 @@ PipelineTuningOptions::PipelineTuningOptions() { Coroutines = false; LicmMssaOptCap = SetLicmMssaOptCap; LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap; - CallGraphProfile = EnableCallGraphProfile; + CallGraphProfile = true; } extern cl::opt EnableHotColdSplit; diff --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp index 9534fb874107..b65eb469a492 100644 --- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp +++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -195,6 +195,7 @@ PassManagerBuilder::PassManagerBuilder() { PrepareForThinLTO = EnablePrepareForThinLTO; PerformThinLTO = EnablePerformThinLTO; DivergentTarget = false; + CallGraphProfile = true; } PassManagerBuilder::~PassManagerBuilder() { @@ -834,6 +835,10 @@ void PassManagerBuilder::populateModulePassManager( if (MergeFunctions) MPM.add(createMergeFunctionsPass()); + // Add Module flag "CG Profile" based on Branch Frequency Information. + if (CallGraphProfile) + MPM.add(createCGProfileLegacyPass()); + // LoopSink pass sinks instructions hoisted by LICM, which serves as a // canonicalization pass that enables other optimizations. 
As a result, // LoopSink pass needs to be a very late IR pass to avoid undoing LICM diff --git a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp index 2d5bd9570940..05451733625e 100644 --- a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp +++ b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp @@ -10,22 +10,47 @@ #include "llvm/ADT/MapVector.h" #include "llvm/Analysis/BlockFrequencyInfo.h" +#include "llvm/Analysis/LazyBlockFrequencyInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/IR/Constants.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/MDBuilder.h" #include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" #include "llvm/ProfileData/InstrProf.h" +#include "llvm/Transforms/IPO.h" #include "llvm/Transforms/Instrumentation.h" #include using namespace llvm; -PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { +static bool +addModuleFlags(Module &M, + MapVector, uint64_t> &Counts) { + if (Counts.empty()) + return false; + + LLVMContext &Context = M.getContext(); + MDBuilder MDB(Context); + std::vector Nodes; + + for (auto E : Counts) { + Metadata *Vals[] = {ValueAsMetadata::get(E.first.first), + ValueAsMetadata::get(E.first.second), + MDB.createConstant(ConstantInt::get( + Type::getInt64Ty(Context), E.second))}; + Nodes.push_back(MDNode::get(Context, Vals)); + } + + M.addModuleFlag(Module::Append, "CG Profile", MDNode::get(Context, Nodes)); + return true; +} + +static bool runCGProfilePass( + Module &M, function_ref GetBFI, + function_ref GetTTI, bool LazyBFI) { MapVector, uint64_t> Counts; - FunctionAnalysisManager &FAM = - MAM.getResult(M).getManager(); InstrProfSymtab Symtab; auto UpdateCounts = [&](TargetTransformInfo &TTI, Function *F, Function *CalledF, uint64_t NewCount) { @@ -35,14 +60,18 @@ PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { Count = SaturatingAdd(Count, NewCount); }; // Ignore error here. 
Indirect calls are ignored if this fails. - (void)(bool)Symtab.create(M); + (void)(bool) Symtab.create(M); for (auto &F : M) { - if (F.isDeclaration()) + // Avoid extra cost of running passes for BFI when the function doesn't have + // entry count. Since LazyBlockFrequencyInfoPass only exists in LPM, check + // if using LazyBlockFrequencyInfoPass. + // TODO: Remove LazyBFI when LazyBlockFrequencyInfoPass is available in NPM. + if (F.isDeclaration() || (LazyBFI && !F.getEntryCount())) continue; - auto &BFI = FAM.getResult(F); + auto &BFI = GetBFI(F); if (BFI.getEntryFreq() == 0) continue; - TargetTransformInfo &TTI = FAM.getResult(F); + TargetTransformInfo &TTI = GetTTI(F); for (auto &BB : F) { Optional BBCount = BFI.getBlockProfileCount(&BB); if (!BBCount) @@ -69,28 +98,56 @@ PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { } } - addModuleFlags(M, Counts); - - return PreservedAnalyses::all(); + return addModuleFlags(M, Counts); } -void CGProfilePass::addModuleFlags( - Module &M, - MapVector, uint64_t> &Counts) const { - if (Counts.empty()) - return; +namespace { +struct CGProfileLegacyPass final : public ModulePass { + static char ID; + CGProfileLegacyPass() : ModulePass(ID) { + initializeCGProfileLegacyPassPass(*PassRegistry::getPassRegistry()); + } - LLVMContext &Context = M.getContext(); - MDBuilder MDB(Context); - std::vector Nodes; + void getAnalysisUsage(AnalysisUsage &AU) const override { + AU.setPreservesCFG(); + AU.addRequired(); + AU.addRequired(); + } - for (auto E : Counts) { - Metadata *Vals[] = {ValueAsMetadata::get(E.first.first), - ValueAsMetadata::get(E.first.second), - MDB.createConstant(ConstantInt::get( - Type::getInt64Ty(Context), E.second))}; - Nodes.push_back(MDNode::get(Context, Vals)); + bool runOnModule(Module &M) override { + auto GetBFI = [this](Function &F) -> BlockFrequencyInfo & { + return this->getAnalysis(F).getBFI(); + }; + auto GetTTI = [this](Function &F) -> TargetTransformInfo & { + return 
this->getAnalysis().getTTI(F); + }; + + return runCGProfilePass(M, GetBFI, GetTTI, true); } +}; - M.addModuleFlag(Module::Append, "CG Profile", MDNode::get(Context, Nodes)); +} // namespace + +char CGProfileLegacyPass::ID = 0; + +INITIALIZE_PASS(CGProfileLegacyPass, "cg-profile", "Call Graph Profile", false, + false) + +ModulePass *llvm::createCGProfileLegacyPass() { + return new CGProfileLegacyPass(); +} + +PreservedAnalyses CGProfilePass::run(Module &M, ModuleAnalysisManager &MAM) { + FunctionAnalysisManager &FAM = + MAM.getResult(M).getManager(); + auto GetBFI = [&FAM](Function &F) -> BlockFrequencyInfo & { + return FAM.getResult(F); + }; + auto GetTTI = [&FAM](Function &F) -> TargetTransformInfo & { + return FAM.getResult(F); + }; + + runCGProfilePass(M, GetBFI, GetTTI, false); + + return PreservedAnalyses::all(); } diff --git a/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp b/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp index 64626225f23f..ad238f1357c6 100644 --- a/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp +++ b/llvm/lib/Transforms/Instrumentation/Instrumentation.cpp @@ -112,6 +112,7 @@ void llvm::initializeInstrumentation(PassRegistry &Registry) { initializePGOInstrumentationUseLegacyPassPass(Registry); initializePGOIndirectCallPromotionLegacyPassPass(Registry); initializePGOMemOPSizeOptLegacyPassPass(Registry); + initializeCGProfileLegacyPassPass(Registry); initializeInstrOrderFileLegacyPassPass(Registry); initializeInstrProfilingLegacyPassPass(Registry); initializeMemorySanitizerLegacyPassPass(Registry); diff --git a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll index 32d36f4e7280..85f9d8c867bf 100644 --- a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll +++ b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll @@ -276,6 +276,12 @@ ; GCN-O1-NEXT: Warn about non-applied transformations ; GCN-O1-NEXT: Alignment from assumptions ; GCN-O1-NEXT: Strip Unused Function Prototypes +; GCN-O1-NEXT: Call Graph 
Profile +; GCN-O1-NEXT: FunctionPass Manager +; GCN-O1-NEXT: Dominator Tree Construction +; GCN-O1-NEXT: Natural Loop Information +; GCN-O1-NEXT: Lazy Branch Probability Analysis +; GCN-O1-NEXT: Lazy Block Frequency Analysis ; GCN-O1-NEXT: FunctionPass Manager ; GCN-O1-NEXT: Dominator Tree Construction ; GCN-O1-NEXT: Natural Loop Information @@ -623,6 +629,12 @@ ; GCN-O2-NEXT: Strip Unused Function Prototypes ; GCN-O2-NEXT: Dead Global Elimination ; GCN-O2-NEXT: Merge Duplicate Global Constants +; GCN-O2-NEXT: Call Graph Profile +; GCN-O2-NEXT: FunctionPass Manager +; GCN-O2-NEXT: Dominator Tree Construction +; GCN-O2-NEXT: Natural Loop Information +; GCN-O2-NEXT: Lazy Branch Probability Analysis +; GCN-O2-NEXT: Lazy Block Frequency Analysis ; GCN-O2-NEXT: FunctionPass Manager ; GCN-O2-NEXT: Dominator Tree Construction ; GCN-O2-NEXT: Natural Loop Information @@ -975,6 +987,12 @@ ; GCN-O3-NEXT: Strip Unused Function Prototypes ; GCN-O3-NEXT: Dead Global Elimination ; GCN-O3-NEXT: Merge Duplicate Global Constants +; GCN-O3-NEXT: Call Graph Profile +; GCN-O3-NEXT: FunctionPass Manager +; GCN-O3-NEXT: Dominator Tree Construction +; GCN-O3-NEXT: Natural Loop Information +; GCN-O3-NEXT: Lazy Branch Probability Analysis +; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: FunctionPass Manager ; GCN-O3-NEXT: Dominator Tree Construction ; GCN-O3-NEXT: Natural Loop Information diff --git a/llvm/test/Instrumentation/cgprofile.ll b/llvm/test/Instrumentation/cgprofile.ll index 1edf3b6ec518..70a1f81aa53e 100644 --- a/llvm/test/Instrumentation/cgprofile.ll +++ b/llvm/test/Instrumentation/cgprofile.ll @@ -1,4 +1,5 @@ ; RUN: opt < %s -passes cg-profile -S | FileCheck %s +; RUN: opt < %s -cg-profile -S | FileCheck %s declare void @b() diff --git a/llvm/test/Other/new-pm-cgprofile.ll b/llvm/test/Other/new-pm-cgprofile.ll deleted file mode 100644 index c7fe31ab570f..000000000000 --- a/llvm/test/Other/new-pm-cgprofile.ll +++ /dev/null @@ -1,11 +0,0 @@ -; RUN: opt 
-debug-pass-manager -passes='default' %s 2>&1 |FileCheck %s --check-prefixes=DEFAULT -; RUN: opt -debug-pass-manager -passes='default' -enable-npm-call-graph-profile=0 %s 2>&1 |FileCheck %s --check-prefixes=OFF -; RUN: opt -debug-pass-manager -passes='default' -enable-npm-call-graph-profile=1 %s 2>&1 |FileCheck %s --check-prefixes=ON -; -; DEFAULT: Running pass: CGProfilePass -; OFF-NOT: Running pass: CGProfilePass -; ON: Running pass: CGProfilePass - -define void @foo() { - ret void -} diff --git a/llvm/test/Other/opt-O2-pipeline.ll b/llvm/test/Other/opt-O2-pipeline.ll index ca72ec1f7567..56f85d0fb9a8 100644 --- a/llvm/test/Other/opt-O2-pipeline.ll +++ b/llvm/test/Other/opt-O2-pipeline.ll @@ -280,6 +280,12 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants +; CHECK-NEXT: Call Graph Profile +; CHECK-NEXT: FunctionPass Manager +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information diff --git a/llvm/test/Other/opt-O3-pipeline.ll b/llvm/test/Other/opt-O3-pipeline.ll index f629bfc3444b..942f7d9dfead 100644 --- a/llvm/test/Other/opt-O3-pipeline.ll +++ b/llvm/test/Other/opt-O3-pipeline.ll @@ -285,6 +285,12 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants +; CHECK-NEXT: Call Graph Profile +; CHECK-NEXT: FunctionPass Manager +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information diff --git a/llvm/test/Other/opt-Os-pipeline.ll 
b/llvm/test/Other/opt-Os-pipeline.ll index dde9fbeb9950..d975cc48b629 100644 --- a/llvm/test/Other/opt-Os-pipeline.ll +++ b/llvm/test/Other/opt-Os-pipeline.ll @@ -266,6 +266,12 @@ ; CHECK-NEXT: Strip Unused Function Prototypes ; CHECK-NEXT: Dead Global Elimination ; CHECK-NEXT: Merge Duplicate Global Constants +; CHECK-NEXT: Call Graph Profile +; CHECK-NEXT: FunctionPass Manager +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: FunctionPass Manager ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information From llvm-commits at lists.llvm.org Fri Jul 10 09:05:12 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:05:12 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: <65e2cb0c0ace9414211f2e1048827320@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG1fbb719470c6: [LPM] Port CGProfilePass from NPM to LPM (authored by zequanwu). 
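For readers skimming the patch above: the core of the refactoring is that both pass-manager wrappers hand a callback to one shared routine. A minimal standalone sketch of that shape follows. `Function`, `BlockFreq` and `runProfile` are simplified stand-ins invented for illustration (plain C++ with `std::function` rather than `llvm::function_ref`); this is not the code that landed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::Function and BlockFrequencyInfo.
struct Function {
  std::string Name;
  bool IsDeclaration;
  bool HasEntryCount; // true if profile data gave this function an entry count
};

struct BlockFreq {
  std::uint64_t EntryFreq;
};

// Shared implementation: each wrapper supplies its own way of obtaining the
// per-function analysis. Returns how many functions were actually analysed.
unsigned runProfile(std::vector<Function> &Module,
                    const std::function<BlockFreq &(Function &)> &GetBFI,
                    bool LazyBFI) {
  assert(GetBFI && "analysis callback must be set");
  unsigned Analysed = 0;
  for (Function &F : Module) {
    // When the analysis is computed lazily, skip functions without an entry
    // count so the analysis is never triggered for them.
    if (F.IsDeclaration || (LazyBFI && !F.HasEntryCount))
      continue;
    BlockFreq &BFI = GetBFI(F);
    if (BFI.EntryFreq == 0)
      continue;
    ++Analysed;
  }
  return Analysed;
}
```

In this sketch a legacy-style wrapper would pass `LazyBFI = true` and a new-PM-style wrapper would pass `false`, mirroring the two `runCGProfilePass` call sites in the diff.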
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83013/new/

https://reviews.llvm.org/D83013

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/InitializePasses.h
  llvm/include/llvm/Transforms/IPO.h
  llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h
  llvm/include/llvm/Transforms/Instrumentation/CGProfile.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/lib/Transforms/Instrumentation/CGProfile.cpp
  llvm/lib/Transforms/Instrumentation/Instrumentation.cpp
  llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
  llvm/test/Instrumentation/cgprofile.ll
  llvm/test/Other/new-pm-cgprofile.ll
  llvm/test/Other/opt-O2-pipeline.ll
  llvm/test/Other/opt-O3-pipeline.ll
  llvm/test/Other/opt-Os-pipeline.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83013.277065.patch
Type: text/x-patch
Size: 18322 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 09:05:40 2020
From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 16:05:40 +0000 (UTC)
Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner
In-Reply-To:
References:
Message-ID: <2ea39b007d2c201a07ab6aeb31a8465f@localhost.localdomain>

davidxl added inline comments.

================
Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:270
+  int32_t *V = static_cast<int32_t *>(TF_TensorData(Evaluator->getInput()[0]));
+  Features.fillTensor(V);
+  auto ER = Evaluator->evaluate();
----------------
Can this call be folded into getFunctionFeatures, or is this interface expected to be used in other places?
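To make the question concrete, here is a minimal sketch of the separation being discussed. It assumes a hypothetical `FunctionFeatures` holding three made-up values and is not the patch's actual class. Because `fillTensor` only needs a raw `int32_t` buffer, the same features could in principle be written into tensor storage or into an ordinary memory buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature container; the fields in the real patch differ.
struct FunctionFeatures {
  static constexpr std::size_t FeatureCount = 3;
  std::array<std::int32_t, FeatureCount> Values;

  // Copy the features into any caller-owned buffer that holds at least
  // FeatureCount elements. Nothing here depends on a tensor library.
  void fillTensor(std::int32_t *Out) const {
    assert(Out && "destination buffer must not be null");
    for (std::size_t I = 0; I < FeatureCount; ++I)
      Out[I] = Values[I];
  }
};
```

Folding the fill step into feature extraction would instead tie the extractor to whichever buffer type it writes to, which is the trade-off the question raises.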
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82817/new/

https://reviews.llvm.org/D82817

From llvm-commits at lists.llvm.org Fri Jul 10 09:06:32 2020
From: llvm-commits at lists.llvm.org (Kuan Hsu Chen (Zakk) via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 16:06:32 +0000 (UTC)
Subject: [PATCH] D71124: [RISCV] support clang driver to select cpu
In-Reply-To:
References:
Message-ID: <0c70e1950cdd53a5b1775a913b1c594c@localhost.localdomain>

khchen updated this revision to Diff 277067.
khchen added a comment.

Avoid checking the compiler version in the test case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71124/new/

https://reviews.llvm.org/D71124

Files:
  clang/lib/Basic/Targets/RISCV.cpp
  clang/lib/Basic/Targets/RISCV.h
  clang/lib/Driver/ToolChains/Arch/RISCV.cpp
  clang/lib/Driver/ToolChains/Arch/RISCV.h
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/test/Driver/riscv-cpus.c
  llvm/include/llvm/Support/RISCVTargetParser.def
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Support/TargetParser.cpp
  llvm/lib/Target/RISCV/RISCV.td

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D71124.277067.patch
Type: text/x-patch
Size: 17890 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 09:07:56 2020
From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 16:07:56 +0000 (UTC)
Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM
In-Reply-To:
References:
Message-ID: <0150e884cd5f11f8156b91469a751b3b@localhost.localdomain>

zequanwu added a comment.

In D83013#2143470, @hans wrote:

> Still lgtm. For what it's worth, I think you could have just re-committed with the fixes rather than uploading for review again.

Gotcha, thanks.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Fri Jul 10 09:09:33 2020 From: llvm-commits at lists.llvm.org (Kang Zhang via llvm-commits) Date: Fri, 10 Jul 2020 09:09:33 -0700 (PDT) Subject: [llvm] e5123ea - [NFC][PowerPC] Add a new MIR file to test mi-peephole pass Message-ID: <5f0892bd.1c69fb81.d4603.f729@mx.google.com> Author: Kang Zhang Date: 2020-07-10T16:08:07Z New Revision: e5123ea248eb460c6695dc28ed2f1cc53495356b URL: https://github.com/llvm/llvm-project/commit/e5123ea248eb460c6695dc28ed2f1cc53495356b DIFF: https://github.com/llvm/llvm-project/commit/e5123ea248eb460c6695dc28ed2f1cc53495356b.diff LOG: [NFC][PowerPC] Add a new MIR file to test mi-peephole pass Added: llvm/test/CodeGen/PowerPC/mi-peephole.mir Modified: Removed: ################################################################################ diff --git a/llvm/test/CodeGen/PowerPC/mi-peephole.mir b/llvm/test/CodeGen/PowerPC/mi-peephole.mir new file mode 100644 index 000000000000..8bf72461d545 --- /dev/null +++ b/llvm/test/CodeGen/PowerPC/mi-peephole.mir @@ -0,0 +1,37 @@ +# RUN: llc -mtriple=powerpc64le--linux-gnu -run-pass ppc-mi-peepholes %s -o - \ +# RUN: -verify-machineinstrs | FileCheck %s + +--- +name: testRLDIC +alignment: 16 +tracksRegLiveness: true +registers: + - { id: 0, class: g8rc } + - { id: 1, class: g8rc } + - { id: 2, class: g8rc } +liveins: + - { reg: '$x3', virtual-reg: '%0' } + - { reg: '$x4', virtual-reg: '%1' } +frameInfo: + maxAlignment: 1 +machineFunctionInfo: {} +body: | +body: | + bb.0.entry: + liveins: $x3, $x4 + + %1:g8rc = COPY $x4 + %0:g8rc = COPY $x3 + %2:g8rc = RLDICL killed %1, 0, 32 + %3:g8rc = RLDICR %2, 2, 61 + $x3 = COPY %3 + BLR8 implicit $lr8, implicit $rm, implicit $x3 + + ; CHECK-LABEL: testRLDIC + ; CHECK: bb.0.entry: + ; CHECK: %1:g8rc = COPY $x4 + ; CHECK: %0:g8rc = COPY $x3 + ; CHECK: %3:g8rc = RLDIC %1, 2, 30 + ; CHECK: $x3 = COPY 
%3 + ; CHECK: BLR8 implicit $lr8, implicit $rm, implicit $x3 +... From llvm-commits at lists.llvm.org Fri Jul 10 09:10:03 2020 From: llvm-commits at lists.llvm.org (Jinsong Ji via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:10:03 +0000 (UTC) Subject: [PATCH] D83565: [compiler-rt][CMake] Pass down LLVM_LIT_ARGS in runtime build Message-ID: jsji created this revision. jsji added reviewers: phosek, beanz. Herald added subscribers: llvm-commits, mgorny, dberris. Herald added a project: LLVM. We should also pass down the LLVM_LIT_ARGS in runtime build mode, so that the runtime tests can be well controlled as well. We actually passed this down in clang/runtime/CMakeLists.txt But not for calls from llvm/runtime/CMakeLists.txt. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83565 Files: llvm/cmake/modules/LLVMExternalProjectUtils.cmake Index: llvm/cmake/modules/LLVMExternalProjectUtils.cmake =================================================================== --- llvm/cmake/modules/LLVMExternalProjectUtils.cmake +++ llvm/cmake/modules/LLVMExternalProjectUtils.cmake @@ -250,6 +250,7 @@ -DLLVM_HAVE_LINK_VERSION_SCRIPT=${LLVM_HAVE_LINK_VERSION_SCRIPT} -DLLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO=${LLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO} -DLLVM_USE_RELATIVE_PATHS_IN_FILES=${LLVM_USE_RELATIVE_PATHS_IN_FILES} + -DLLVM_LIT_ARGS=${LLVM_LIT_ARGS} -DLLVM_SOURCE_PREFIX=${LLVM_SOURCE_PREFIX} -DPACKAGE_VERSION=${PACKAGE_VERSION} -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83565.277070.patch
Type: text/x-patch
Size: 734 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 09:10:57 2020
From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 16:10:57 +0000 (UTC)
Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner
In-Reply-To:
References:
Message-ID:

mtrofin marked 2 inline comments as done.
mtrofin added inline comments.

================
Comment at: llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp:270
+  int32_t *V = static_cast<int32_t *>(TF_TensorData(Evaluator->getInput()[0]));
+  Features.fillTensor(V);
+  auto ER = Evaluator->evaluate();
----------------
davidxl wrote:
> Can this call be folded into getFunctionFeatures, or is this interface expected to be used in other places?
When we use getFunctionFeatures for publishing training data, the vector won't come from a TF_Tensor anymore; it'll just come from a memory buffer. This is because just getting the data out doesn't need to depend on any TensorFlow code (at that point, I'll split this file in two).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82817/new/

https://reviews.llvm.org/D82817

From llvm-commits at lists.llvm.org Fri Jul 10 09:14:34 2020
From: llvm-commits at lists.llvm.org (Jeroen Dobbelaere via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 16:14:34 +0000 (UTC)
Subject: [PATCH] D68484: [PATCH 01/26] [noalias] LangRef: noalias intrinsics and ptr_provenance documentation.
In-Reply-To:
References:
Message-ID: <531143c31b4903a2686e03dc06172a5d@localhost.localdomain>

jeroen.dobbelaere added a comment.

In D68484#2116935, @jeroen.dobbelaere wrote:

> Notes:
>
> - in a future version 'llvm.noalias' and 'llvm.provenance.noalias' will be merged into a single intrinsic.
I was thinking of merging `llvm.noalias` and `llvm.provenance.noalias` but have now decided not to do it: - llvm.noalias is a convenience shortcut to llvm.provenance.noalias + llvm.noalias.arg.guard - keeping the convenience intrinsic reduces the amount of generated code and makes tracking tbaa on the intrinsics easier. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68484/new/ https://reviews.llvm.org/D68484 From llvm-commits at lists.llvm.org Fri Jul 10 09:17:29 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:17:29 +0000 (UTC) Subject: [PATCH] D83566: [ARM] CSEL generation Message-ID: dmgreen created this revision. dmgreen added reviewers: simon_tatham, efriedma, SjoerdMeijer, samparker, ostannard. Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Herald added a project: LLVM. This adds a peephole optimisation to turn a t2MOVccr that could not be folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr would usually be expanded into a conditional mov, that becomes an IT; MOV pair. We can instead generate a CSEL instruction, which can potentially be smaller and allows better register allocation freedom, which can help reduce codesize. Performance is more variable and may depend on the microarchitecture details, but initial results look good. If we need to control this per-cpu, we can add a subtarget feature as we need it. Original patch by David Penry. https://reviews.llvm.org/D83566 Files: llvm/lib/Target/ARM/ARMInstrThumb2.td llvm/lib/Target/ARM/Thumb2InstrInfo.cpp llvm/lib/Target/ARM/Thumb2InstrInfo.h llvm/test/CodeGen/Thumb2/csel.ll llvm/test/CodeGen/Thumb2/float-ops.ll llvm/test/CodeGen/Thumb2/mve-abs.ll llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll llvm/test/CodeGen/Thumb2/mve-vmaxv.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83566.276982.patch Type: text/x-patch Size: 20079 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:20:04 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:20:04 +0000 (UTC) Subject: [PATCH] D83567: [DAGCombiner] allow load/store merging if pairs can be rotated into place Message-ID: spatel created this revision. spatel added reviewers: efriedma, RKSimon, lebedev.ri, craig.topper. Herald added subscribers: ecnelises, hiraditya, kristof.beyls, mcrosier. Herald added a project: LLVM. This carves out an exception for a pair of consecutive loads that are reversed from the consecutive order of a pair of stores. All of the existing profitability/legality checks for the memops remain between the 2 altered hunks of code. This should give us the same x86 base-case asm that gcc gets in PR41098 and PR44895: https://bugs.llvm.org/show_bug.cgi?id=41098 https://bugs.llvm.org/show_bug.cgi?id=44895 I think we are missing a potential subsequent conversion to use "movbe" if the target supports that. That might be similar to what AArch64 would use to get "rev16". https://reviews.llvm.org/D83567 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/merge-store-dependency.ll llvm/test/CodeGen/X86/stores-merging.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83567.277068.patch Type: text/x-patch Size: 9052 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:21:50 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:21:50 +0000 (UTC) Subject: [PATCH] D83519: [NewPM] Support optnone under new pass manager In-Reply-To: References: Message-ID: <0ce86e5b5b1096eb46df3437dc060880@localhost.localdomain> ychen added a comment. 
High-level request: how about splitting this patch into two: the first for the `require` pass part, the second for the PassInstrument callback? Then we could discuss the choices between the first patch and D82344. ================ Comment at: llvm/include/llvm/IR/PassInstrumentation.h:150 for (auto &C : Callbacks->BeforePassCallbacks) - ShouldRun &= C(Pass.name(), llvm::Any(&IR)); + ShouldRun &= C(Pass.name(), Pass.isRequired(), llvm::Any(&IR)); return ShouldRun; ---------------- aeubanks wrote: > aeubanks wrote: > > ychen wrote: > > > Could we do this to not changing the callback API? > > > `ShouldRun &= C(Pass.name(), llvm::Any(&IR)) || Pass.isRequired();` > > Each pass instrumentation should decide whether or not to run the pass based on whether or not the pass is required or optional. An optional pass may still be run, (which should be the case for the vast majority of instances). > > > > For example, the optnone would only care if a pass is required or not if it sees that a function is marked optnone. > > Similarly, opt-bisect would only care if a pass is required if it's hit the bisect limit. > Sorry, now I understand what you mean, the ands and ors confused me. > > I don't want to rule out the possibility of some future pass instrumentation wanting to skip even a required pass. But I am open to discussion on this point. > I don't want to rule out the possibility of some future pass instrumentation wanting to skip even a required pass. But I am open to discussion on this point. That makes sense. However, this requires changing the callback API (return value or parameter), and there is no real use for it at the moment, so IMHO we should defer it until the use case comes up. If there were no change to the callback API, I wouldn't mind doing this now. The immediate motivation for the `require` is the same as D82344 (the approach is a little bit different). 
That is, we don't want to consider infrastructure passes (pass managers, adaptor passes that have a nested pass manager). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83519/new/ https://reviews.llvm.org/D83519 From llvm-commits at lists.llvm.org Fri Jul 10 09:21:55 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:21:55 +0000 (UTC) Subject: [PATCH] D83568: [SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. Message-ID: paulwalker-arm created this revision. Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett. Herald added a reviewer: efriedma. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83568 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/test/CodeGen/AArch64/sve-fixed-length-fp-converts.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83568.277072.patch Type: text/x-patch Size: 7475 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:22:39 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:22:39 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: <1864e44604b7688fd27eddc747fe5639@localhost.localdomain> fhahn accepted this revision. fhahn added a comment. This revision is now accepted and ready to land. LGTM, thanks! Some optional nits related to wording inline (I think it would be good to start the sentences for the arguments with a `The`). ================ Comment at: llvm/docs/LangRef.rst:15502 +matrixes are passed and returned as vectors. This means that for a ``R`` x +``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in its +vector, with indices starting at 0. Currently column-major layout is assumed. 
---------------- maybe something like `in the corresponding vector` instead of `in its vector`, where it might be a little unclear what `its` refers to. ================ Comment at: llvm/docs/LangRef.rst:15527 -The and arguments must be constant integers. The vector argument -%In and the returned vector must have * elements. +First argument %In is vector that corresponds to a x matrix. +Thus, arguments and correspond to the number of rows and columns, ---------------- `The first` ..? ================ Comment at: llvm/docs/LangRef.rst:15554 -The , and arguments must be constant -integers. The vector argument %A must have * elements, %B -must have * elements and the returned vector must have - * elements. +First vector argument %A corresponds to a matrix with * +elements, and second argument %B to a matrix with * ---------------- `The first`... `and the second` ...? ================ Comment at: llvm/docs/LangRef.rst:15558 +constant integers. The returned vector must have * +elements. Vectors %A, %B, and the returned vector all have the same float or +integer element type. ---------------- `must all have` ? ================ Comment at: llvm/docs/LangRef.rst:15588 -The , and arguments must be constant integers. The -returned vector must have * elements. %Stride must be >= . +First argument %Ptr is a pointer type to the returned vector type, and +correponds to the start address to load from. Second argument %Stride is a ---------------- `The first...`? ================ Comment at: llvm/docs/LangRef.rst:15589 +First argument %Ptr is a pointer type to the returned vector type, and +correponds to the start address to load from. Second argument %Stride is a +postive, constant integer with %Stride ``>=`` . %Stride is used to compute ---------------- `The second`? ================ Comment at: llvm/docs/LangRef.rst:15592 +the column memory addresses. I.e., for a column ``C``, its start memory +addresses is calculated with %Ptr + ``C`` * %Stride. 
Third Argument + is a boolean value. The fourth and fifth arguments, and ---------------- `The third` ================ Comment at: llvm/docs/LangRef.rst:15628 -The , , arguments must be constant integers. The -vector argument %In must have * elements. %Stride must be >= . +First argument %In is vector that corresponds to a x matrix to be +stored to memory. Second argument %Ptr is a pointer type to the vector type of ---------------- `The first argument %In is a vector`? ================ Comment at: llvm/docs/LangRef.rst:15629 +First argument %In is vector that corresponds to a x matrix to be +stored to memory. Second argument %Ptr is a pointer type to the vector type of +%In, and is the start address of the matrix in memory. Third argument %Stride ---------------- `The second argument %Ptr is a pointer to the`? ================ Comment at: llvm/docs/LangRef.rst:15630 +stored to memory. Second argument %Ptr is a pointer type to the vector type of +%In, and is the start address of the matrix in memory. Third argument %Stride +is a positive, constant integer with %Stride ``>=`` . %Stride is used to ---------------- `The third`? ================ Comment at: llvm/docs/LangRef.rst:15633 +compute the column memory addresses. I.e., for a column ``C``, its start memory +addresses is calculated with %Ptr + ``C`` * %Stride. Fourth argument + is a boolean value. Arguments and correspond to the ---------------- `The fourth`? ================ Comment at: llvm/docs/LangRef.rst:15634 +addresses is calculated with %Ptr + ``C`` * %Stride. Fourth argument + is a boolean value. Arguments and correspond to the +number of rows and columns, respectively, and must be positive, constant ---------------- `The arguments`? 
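For readers following the wording nits, the layout rules quoted from LangRef.rst above (element ``i`` of column ``j`` at index ``j * R + i``, and a column start address of %Ptr + ``C`` * %Stride) can be sketched as below. This is only an illustration, not code from the patch: the names `flatIndex`, `columnStart` and `loadColumnMajor` are hypothetical, and it assumes the stride is counted in elements rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of element (row, col) inside the flattened vector of a matrix with
// `rows` rows, assuming column-major layout: col * rows + row.
constexpr std::size_t flatIndex(std::size_t row, std::size_t col,
                                std::size_t rows) {
  return col * rows + row;
}

// Element offset of the start of column `col` when consecutive columns are
// `stride` elements apart in memory (assumption: stride is in elements).
constexpr std::size_t columnStart(std::size_t col, std::size_t stride) {
  return col * stride;
}

// Gather a rows x cols matrix from strided memory into a flattened
// column-major vector. Hypothetical helper, not an LLVM API.
std::vector<float> loadColumnMajor(const float *ptr, std::size_t stride,
                                   std::size_t rows, std::size_t cols) {
  assert(stride >= rows && "stride must be at least the number of rows");
  std::vector<float> out(rows * cols);
  for (std::size_t c = 0; c < cols; ++c)
    for (std::size_t r = 0; r < rows; ++r)
      out[flatIndex(r, c, rows)] = ptr[columnStart(c, stride) + r];
  return out;
}
```

With a stride larger than the row count, the elements between the end of one column and the start of the next are skipped, which is why the documentation requires %Stride to be at least the number of rows.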
================ Comment at: llvm/lib/IR/Verifier.cpp:5069 + Assert(ResultTy->getElementType() == Op1ElemTy, + "Type mismatch of the result and second operand vector!", IF); + ---------------- It would be good to be consistent in capitalization/punctuation with the existing message at 5073, or to update the message there. Also, it might be good to include `vector element type` in the message, as in the message for Op0. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Fri Jul 10 09:25:29 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:25:29 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: LukeGeeson updated this revision to Diff 277074. LukeGeeson added a comment. removed FP16FML feature CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83206.277074.patch Type: text/x-patch Size: 21179 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:26:03 2020 From: llvm-commits at lists.llvm.org (Sameer Sahasrabuddhe via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:26:03 +0000 (UTC) Subject: [PATCH] D83562: [fix-irreducible] Skip unreachable predecessors. In-Reply-To: References: Message-ID: sameerds added a subscriber: cdevadas. sameerds added a comment. @cdevadas reported that other parts of the AMDGPU backend are also affected by the unreachable blocks being produced in the switch lowering. Instead of fixing each such pass separately, it seems the best way forward is to put the block elimination earlier in the flow. @cdevadas is already working on such a change. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83562/new/ https://reviews.llvm.org/D83562 From llvm-commits at lists.llvm.org Fri Jul 10 09:26:40 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:26:40 +0000 (UTC) Subject: [PATCH] D83568: [SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. In-Reply-To: References: Message-ID: <4a11273662a0c6de135136e8f9cfbe50@localhost.localdomain> paulwalker-arm added reviewers: david-arm, kmclaughlin, sdesmalen. paulwalker-arm added a comment. I've no illusions about there being other omissions but this patch resolves the last of the functional issues for the test set I've been measuring against. I'll update and abandon the POC patches (D71760 & D71767) accordingly. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83568/new/ https://reviews.llvm.org/D83568 From llvm-commits at lists.llvm.org Fri Jul 10 09:28:29 2020 From: llvm-commits at lists.llvm.org (Zhang Kang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:28:29 +0000 (UTC) Subject: [PATCH] D83569: [PowerPC] Fix the killed flag in mi-peephole pass Message-ID: ZhangKang created this revision. ZhangKang added reviewers: PowerPC, nemanjai, jsji, steven.zhang. ZhangKang added a project: LLVM. Herald added subscribers: shchenz, wuzish, hiraditya. When doing the peephole 'RLDICL + RLDICR --> RLDIC', we forget to move the killed flag from srcMI to destMI. This patch is to fix this bug. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83569 Files: llvm/lib/Target/PowerPC/PPCMIPeephole.cpp llvm/test/CodeGen/PowerPC/mi-peephole.mir Index: llvm/test/CodeGen/PowerPC/mi-peephole.mir =================================================================== --- llvm/test/CodeGen/PowerPC/mi-peephole.mir +++ llvm/test/CodeGen/PowerPC/mi-peephole.mir @@ -31,7 +31,7 @@ ; CHECK: bb.0.entry: ; CHECK: %1:g8rc = COPY $x4 ; CHECK: %0:g8rc = COPY $x3 - ; CHECK: %3:g8rc = RLDIC %1, 2, 30 + ; CHECK: %3:g8rc = RLDIC killed %1, 2, 30 ; CHECK: $x3 = COPY %3 ; CHECK: BLR8 implicit $lr8, implicit $rm, implicit $x3 ... Index: llvm/lib/Target/PowerPC/PPCMIPeephole.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCMIPeephole.cpp +++ llvm/lib/Target/PowerPC/PPCMIPeephole.cpp @@ -1556,6 +1556,11 @@ MI.getOperand(2).setImm(NewSH); MI.getOperand(3).setImm(NewMB); + if (SrcMI->getOperand(1).isKill()) { + MI.getOperand(1).setIsKill(true); + SrcMI->getOperand(1).setIsKill(false); + } + LLVM_DEBUG(dbgs() << "To: "); LLVM_DEBUG(MI.dump()); NumRotatesCollapsed++; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83569.277073.patch Type: text/x-patch Size: 1017 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:29:04 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:29:04 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <4110487d1355de851f44aebb0694095c@localhost.localdomain> dmgreen accepted this revision. dmgreen added a comment. This revision is now accepted and ready to land. Thanks. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 From llvm-commits at lists.llvm.org Fri Jul 10 09:29:27 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:29:27 +0000 (UTC) Subject: [PATCH] D83568: [SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. In-Reply-To: References: Message-ID: <97b7dce1f10aafbc3bdd159b064063c0@localhost.localdomain> paulwalker-arm marked an inline comment as done. paulwalker-arm added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:966 // 64bit results can mean a bigger than NEON input. - for (auto VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32}) + for (auto VT : {MVT::v8i8, MVT::v4i16}) setOperationAction(ISD::TRUNCATE, VT, Custom); ---------------- When doing the FP_ROUND work I realised there are no truncates with a result type of v2i32 that can have a legal type that is bigger than 128bit. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83568/new/ https://reviews.llvm.org/D83568 From llvm-commits at lists.llvm.org Fri Jul 10 09:29:55 2020 From: llvm-commits at lists.llvm.org (Saleem Abdulrasool via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:29:55 +0000 (UTC) Subject: [PATCH] D83532: [lld-macho] Partial support for weak definitions In-Reply-To: References: Message-ID: <09755590357078c8d1aa9bb4052a7d05@localhost.localdomain> compnerd added inline comments. ================ Comment at: lld/MachO/ExportTrie.cpp:63 + uint8_t flags; + ExportInfo(const Symbol &sym) + : address(sym.getVA()), ---------------- Can you please mark this `explicit`? ================ Comment at: lld/MachO/ExportTrie.h:40 using TrieEntryCallback = - llvm::function_ref; + llvm::function_ref; ---------------- Are we sure we won't need any other flags? I wonder if it's better to just treat weakness as a flag. IIRC, there are `EXPORT_SYMBOL_FLAGS_REEXPORT` and `EXPORT_SYMBOL_FLAGS_KIND_THREAD_LOCAL` flags that would be fairly good to account for. ================ Comment at: lld/MachO/InputFiles.cpp:235 + return make(name, isec, value, sym.n_desc & N_WEAK_DEF); }; ---------------- It's okay if you want to use braces, but please use them on both sides. However, I think that this is better written as: ``` if (sym.n_type & N_EXT) return symtab->addDefined(name, isec, value, sym.n_desc & N_WEAK_DEF); return make(name, isec, value, sym.n_desc & N_WEAK_DEF); ``` ================ Comment at: lld/MachO/SymbolTable.cpp:52 + } + } ---------------- What do you think of doing an early return instead if the symbol was inserted? Can you explain why currently it is not an error if the symbol is not inserted and not defined? Seems like a comment for that would be good. 
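As an illustration of the first two inline comments above (marking the constructor `explicit`, and carrying weakness as one bit in a flags byte rather than as a separate bool), here is a minimal sketch. It is not the patch's code: `SymbolSketch` and `ExportInfoSketch` are hypothetical stand-ins for lld's types, and the constants are local copies of the `EXPORT_SYMBOL_FLAGS_*` values as I recall them from `<mach-o/loader.h>`, so they should be checked against that header.

```cpp
#include <cstdint>

// Local copies of the Mach-O export trie flag values (assumed to match
// EXPORT_SYMBOL_FLAGS_* in <mach-o/loader.h>; verify before relying on them).
constexpr uint8_t kExportFlagsKindThreadLocal = 0x01;
constexpr uint8_t kExportFlagsWeakDefinition = 0x04;
constexpr uint8_t kExportFlagsReexport = 0x08;

// Hypothetical stand-in for the linker's symbol type.
struct SymbolSketch {
  uint64_t va;
  bool weakDef;
  bool threadLocal;
};

// Hypothetical stand-in for ExportInfo. `explicit` prevents a SymbolSketch
// from being silently converted wherever an ExportInfoSketch is expected.
struct ExportInfoSketch {
  uint64_t address;
  uint8_t flags;

  explicit ExportInfoSketch(const SymbolSketch &sym)
      : address(sym.va), flags(0) {
    if (sym.weakDef)
      flags |= kExportFlagsWeakDefinition;
    if (sym.threadLocal)
      flags |= kExportFlagsKindThreadLocal;
  }

  bool isWeakDef() const { return (flags & kExportFlagsWeakDefinition) != 0; }
};
```

Keeping a single flags byte means a callback that receives it does not need a new parameter each time another export attribute (for example re-exports) is supported.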
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83532/new/ https://reviews.llvm.org/D83532 From llvm-commits at lists.llvm.org Fri Jul 10 09:30:03 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:30:03 +0000 (UTC) Subject: [PATCH] D83519: [NewPM] Support optnone under new pass manager In-Reply-To: References: Message-ID: <23ceb78c3c9757d97f79949c41c5ed2c@localhost.localdomain> aeubanks marked an inline comment as done. aeubanks added a comment. In D83519#2144403 , @ychen wrote: > High-level request: how about split this patch into two, the first for the `require` pass part; the second for the PassInstrument callback. Then we could discuss the choices of first patch and D82344 . Good idea, will split the patch and take a closer look at your patch. ================ Comment at: llvm/include/llvm/IR/PassInstrumentation.h:150 for (auto &C : Callbacks->BeforePassCallbacks) - ShouldRun &= C(Pass.name(), llvm::Any(&IR)); + ShouldRun &= C(Pass.name(), Pass.isRequired(), llvm::Any(&IR)); return ShouldRun; ---------------- ychen wrote: > aeubanks wrote: > > aeubanks wrote: > > > ychen wrote: > > > > Could we do this to not changing the callback API? > > > > `ShouldRun &= C(Pass.name(), llvm::Any(&IR)) || Pass.isRequired();` > > > Each pass instrumentation should decide whether or not to run the pass based on whether or not the pass is required or optional. An optional pass may still be run, (which should be the case for the vast majority of instances). > > > > > > For example, the optnone would only care if a pass is required or not if it sees that a function is marked optnone. > > > Similarly, opt-bisect would only care if a pass is required if it's hit the bisect limit. > > Sorry, now I understand what you mean, the ands and ors confused me. > > > > I don't want to rule out the possibility of some future pass instrumentation wanting to skip even a required pass. 
But I am open to discussion on this point. > > I don't want to rule out the possibility of some future pass instrumentation wanting to skip even a required pass. But I am open to discussion on this point. > > That makes sense. However, since this requires changing the callback API(return value or parameter), and there is no real use of it for the moment. IMHO we should defer it to the use case comes up. If there was no change to the callback API, I wouldn't mind doing this for now. > > The immediate motivation for the `require` is the same as D82344 (the approach is a little bit different). That's we don't want to consider infrastructure passes (pass managers, adaptor passes that have a nested pass manager) Sounds good, I'll go with your approach for keeping the current API. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83519/new/ https://reviews.llvm.org/D83519 From llvm-commits at lists.llvm.org Fri Jul 10 09:31:50 2020 From: llvm-commits at lists.llvm.org (Alexey Bataev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:31:50 +0000 (UTC) Subject: [PATCH] D82550: [SLPVectorizer] handle vectorized lib functions In-Reply-To: References: Message-ID: <1850cc37546abdbf254f41896c1eb712@localhost.localdomain> ABataev accepted this revision. ABataev added a comment. This revision is now accepted and ready to land. LG Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82550/new/ https://reviews.llvm.org/D82550 From llvm-commits at lists.llvm.org Fri Jul 10 09:32:15 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:32:15 +0000 (UTC) Subject: [PATCH] D83570: [Matrix] Lowering pass should also run at O0 Message-ID: SjoerdMeijer created this revision. SjoerdMeijer added reviewers: fhahn, anemet, Gerolf. Herald added subscribers: tschuett, hiraditya. Herald added a project: LLVM. 
As the Matrix intrinsic lowering pass is not running at -O0, code using the matrix extension results in a backend crash. https://reviews.llvm.org/D83570 Files: clang/test/CodeGen/matrix-lowering.c llvm/lib/Transforms/IPO/PassManagerBuilder.cpp Index: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp =================================================================== --- llvm/lib/Transforms/IPO/PassManagerBuilder.cpp +++ llvm/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -531,6 +531,9 @@ // new unnamed globals. MPM.add(createNameAnonGlobalPass()); } + // Matrix intrinsics need to lowered also at -O0, but don't run CSE as a + // clean-up after it, which we do with OptLevel > 0. + MPM.add(createLowerMatrixIntrinsicsPass()); return; } Index: clang/test/CodeGen/matrix-lowering.c =================================================================== --- /dev/null +++ clang/test/CodeGen/matrix-lowering.c @@ -0,0 +1,21 @@ +// RUN: %clang -O0 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -O1 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -O2 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -O3 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -Ofast -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -Os -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -Oz -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s + +// CHECK-NOT: @llvm.matrix.multiply + +typedef float m4x4_t __attribute__((matrix_type(4, 4))); + +m4x4_t f(m4x4_t a, m4x4_t b, m4x4_t c) { +// +// CHECK-LAVEL: f( +// CHECK-NOT: @llvm.matrix.multiply +// CHECK: } +// + return a + b * c; +} + -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83570.277075.patch Type: text/x-patch Size: 1682 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:37:15 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:37:15 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: <73604b75f84ef2a53fcfc98405d420f0@localhost.localdomain> SjoerdMeijer added a comment. Thanks for reviewing, and I will make those changes before committing. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 From llvm-commits at lists.llvm.org Fri Jul 10 09:38:14 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:38:14 +0000 (UTC) Subject: [PATCH] D83570: [Matrix] Lowering pass should also run at O0 In-Reply-To: References: Message-ID: <87adf5bf5aff2fd2217192ab2412cdcc@localhost.localdomain> SjoerdMeijer updated this revision to Diff 277077. SjoerdMeijer added a comment. Fix typo in comment. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83570/new/ https://reviews.llvm.org/D83570 Files: clang/test/CodeGen/matrix-lowering.c llvm/lib/Transforms/IPO/PassManagerBuilder.cpp Index: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp =================================================================== --- llvm/lib/Transforms/IPO/PassManagerBuilder.cpp +++ llvm/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -531,6 +531,9 @@ // new unnamed globals. MPM.add(createNameAnonGlobalPass()); } + // Matrix intrinsics need to be lowered also at -O0, but don't run CSE as a + // clean-up after it as we do with OptLevel > 0. 
+ MPM.add(createLowerMatrixIntrinsicsPass()); return; } Index: clang/test/CodeGen/matrix-lowering.c =================================================================== --- /dev/null +++ clang/test/CodeGen/matrix-lowering.c @@ -0,0 +1,21 @@ +// RUN: %clang -O0 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -O1 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -O2 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -O3 -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -Ofast -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -Os -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang -Oz -fenable-matrix -target aarch64-linux-eabi %s -S -emit-llvm -o - | FileCheck %s + +// CHECK-NOT: @llvm.matrix.multiply + +typedef float m4x4_t __attribute__((matrix_type(4, 4))); + +m4x4_t f(m4x4_t a, m4x4_t b, m4x4_t c) { +// +// CHECK-LAVEL: f( +// CHECK-NOT: @llvm.matrix.multiply +// CHECK: } +// + return a + b * c; +} + -------------- next part -------------- A non-text attachment was scrubbed... Name: D83570.277077.patch Type: text/x-patch Size: 1681 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:40:50 2020 From: llvm-commits at lists.llvm.org (Jessica Paquette via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:40:50 +0000 (UTC) Subject: [PATCH] D83384: [GlobalISel][InlineAsm] Fix buildCopy for inputs In-Reply-To: References: Message-ID: <800c5531ee49d9ef4aea8ad7f41459bd@localhost.localdomain> paquette accepted this revision. paquette added a comment. This revision is now accepted and ready to land. 
LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83384/new/ https://reviews.llvm.org/D83384 From llvm-commits at lists.llvm.org Fri Jul 10 09:42:12 2020 From: llvm-commits at lists.llvm.org (Michael Liao via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:42:12 +0000 (UTC) Subject: [PATCH] D83562: [fix-irreducible] Skip unreachable predecessors. In-Reply-To: References: Message-ID: <27d4f01d3def256ed6c46e0c9d51732e@localhost.localdomain> hliao added a comment. In D83562#2144432 , @sameerds wrote: > @cdevadas reported that other parts of the AMDGPU backend are also affected by the unreachable blocks being produced in the switch lowering. Instead of fixing each such pass separately, it seems the best way forward it to put the block elimination earlier in the flow. @cdevadas is already working on such a change. I understand the cleanup of switch-lower pass could avoid the issue. But, as a general pass, it should be prepared to handle the general cases similar to other general passes. Also, it's a blocking issue to our schedule. Could you review this trivial change? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83562/new/ https://reviews.llvm.org/D83562 From llvm-commits at lists.llvm.org Fri Jul 10 09:42:17 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:42:17 +0000 (UTC) Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores In-Reply-To: References: Message-ID: <7475127a5e864b6a11ae7ba6b504241a@localhost.localdomain> kmclaughlin added inline comments. ================ Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:1096 + // Extract lo/hi halves of legal predicate types. + def : Pat<(nxv2i1 (extract_subvector (nxv4i1 PPR:$Ps), (i64 0))), ---------------- efriedma wrote: > Do we need to support extracting, for example, an nxv2i1 from an nxv16i1? 
We may need to support extracting an nxv2i1 from an nxv16i1, etc. at some point, though I don't believe there are any code paths which would require this just now. At least, for the purposes of this patch I think we just need those patterns where the index is either 0 or half the number of elements. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83137/new/ https://reviews.llvm.org/D83137 From llvm-commits at lists.llvm.org Fri Jul 10 09:48:23 2020 From: llvm-commits at lists.llvm.org (Shawn Landden via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:48:23 +0000 (UTC) Subject: [PATCH] D83571: [RFC][WIP] New carry-less multiplication instruction llvm.experimental.clmul Message-ID: shawnl created this revision. shawnl added reviewers: lebedev.ri, jyknight, kparzysz, scanon. Herald added subscribers: llvm-commits, jdoerfert, hiraditya. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. [RFC][WIP] New carry-less multiplication instruction llvm.experimental.clmul The problem I am having in continuing on this is that SelectionDAG has this concept of illegal types, and most of these hardware instructions return v1i128 even if i128 is not legal. The way I would really like to handle this is to do the necessary lowering BEFORE translation. This would also mean that the lowering doesn't have to be written twice: once for the interpreter, and again for translation. However, I am wondering how my i128 CTPOP lowering worked... maybe I just need to step through the code. 
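For context on what a carry-less multiplication computes, a plain software reference is sketched below. This is my own illustration and not taken from the patch: the function name `clmul32` is hypothetical, and the 32 x 32 -> 64 bit shape is chosen only to keep the example within standard integer types (the RFC itself discusses results as wide as i128). Partial products are combined with XOR instead of addition, so no carries propagate between bit positions.

```cpp
#include <cstdint>

// Carry-less product of two 32-bit values, returned in full as 64 bits.
// Equivalent to multiplying two polynomials over GF(2) whose coefficients
// are the bits of `a` and `b`.
inline uint64_t clmul32(uint32_t a, uint32_t b) {
  uint64_t result = 0;
  for (unsigned i = 0; i < 32; ++i) {
    // If bit i of b is set, XOR in a copy of a shifted left by i.
    if ((b >> i) & 1u)
      result ^= static_cast<uint64_t>(a) << i;
  }
  return result;
}
```

For example, 0b11 times 0b11 is 0b1001 (9) with ordinary multiplication but 0b101 (5) carry-less, because the two middle partial-product bits cancel under XOR.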
https://reviews.llvm.org/D83571 Files: llvm/docs/LangRef.rst llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/IR/Intrinsics.td llvm/include/llvm/Support/TargetOpcodes.def llvm/include/llvm/Target/GenericOpcodes.td llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td llvm/include/llvm/Target/TargetSelectionDAG.td llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/IR/AutoUpgrade.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64InstrInfo.td llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir llvm/test/CodeGen/AArch64/clmul.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83571.277078.patch Type: text/x-patch Size: 11484 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:48:27 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:48:27 +0000 (UTC) Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization: consider more blocks In-Reply-To: References: Message-ID: <6c15e492597da45de177071bed2572f0@localhost.localdomain> nikic accepted this revision. nikic added a comment. This revision is now accepted and ready to land. LG CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83284/new/ https://reviews.llvm.org/D83284 From llvm-commits at lists.llvm.org Fri Jul 10 09:55:08 2020 From: llvm-commits at lists.llvm.org (David Sherwood via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:55:08 +0000 (UTC) Subject: [PATCH] D83572: [SVE][CodeGen] Fix implicit TypeSize->uint64_t conversion in TransformFPLoadStorePair Message-ID: david-arm created this revision. david-arm added reviewers: sdesmalen, efriedma, kmclaughlin. 
Herald added subscribers: llvm-commits, steven.zhang, psnobl, hiraditya, kristof.beyls, tschuett. Herald added a reviewer: rengolin. Herald added a project: LLVM. In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable property of TypeSize when trying to create an integer type of equivalent size. In fact, this optimisation makes no sense for scalable types since we don't know the size at compile time. I have changed the code to bail out when encountering scalable type sizes. I've added a test to llvm/test/CodeGen/AArch64/sve-fp.ll that exercises this code path. The test already emits an error if it encounters warnings due to implicit TypeSize->uint64_t conversions. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83572 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/sve-fp.ll
Index: llvm/test/CodeGen/AArch64/sve-fp.ll
===================================================================
--- llvm/test/CodeGen/AArch64/sve-fp.ll
+++ llvm/test/CodeGen/AArch64/sve-fp.ll
@@ -208,6 +208,18 @@
   ret void
 }
 
+define void @float_copy(<vscale x 4 x float>* %P1, <vscale x 4 x float>* %P2) {
+; CHECK-LABEL: float_copy:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
+; CHECK-NEXT: st1w { z0.s }, p0, [x1]
+; CHECK-NEXT: ret
+  %A = load <vscale x 4 x float>, <vscale x 4 x float>* %P1, align 16
+  store <vscale x 4 x float> %A, <vscale x 4 x float>* %P2, align 16
+  ret void
+}
+
 declare <vscale x 8 x half> @llvm.aarch64.sve.frecps.x.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>)
 declare <vscale x 4 x float> @llvm.aarch64.sve.frecps.x.nxv4f32(<vscale x 4 x float> , <vscale x 4 x float>)
 declare <vscale x 2 x double> @llvm.aarch64.sve.frecps.x.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>)
Index: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
===================================================================
--- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15759,7 +15759,14 @@
         ST->getPointerInfo().getAddrSpace() != 0)
       return SDValue();
 
-    EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits());
+    TypeSize VTSize = VT.getSizeInBits();
+
+    // We don't know the size of scalable types at compile time so we cannot
+    // create an integer of the equivalent size.
+    if (VTSize.isScalable())
+      return SDValue();
+
+    EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), VTSize.getFixedSize());
     if (!TLI.isOperationLegal(ISD::LOAD, IntVT) ||
         !TLI.isOperationLegal(ISD::STORE, IntVT) ||
         !TLI.isDesirableToTransformToIntegerOp(ISD::LOAD, VT) ||
-------------- next part -------------- A non-text attachment was scrubbed... Name: D83572.277080.patch Type: text/x-patch Size: 1866 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 09:55:40 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:55:40 +0000 (UTC) Subject: [PATCH] D83330: [PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. In-Reply-To: References: Message-ID: yamauchi added a comment. More comments? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83330/new/ https://reviews.llvm.org/D83330 From llvm-commits at lists.llvm.org Fri Jul 10 09:55:58 2020 From: llvm-commits at lists.llvm.org (Hiroshi Yamauchi via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:55:58 +0000 (UTC) Subject: [PATCH] D83331: [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. In-Reply-To: References: Message-ID: <4105ac24bb03041a41cb52daccd70e3d@localhost.localdomain> yamauchi added a comment. More comments? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83331/new/ https://reviews.llvm.org/D83331 From llvm-commits at lists.llvm.org Fri Jul 10 09:58:52 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:58:52 +0000 (UTC) Subject: [PATCH] D83361: [LLVM] Add libatomic load/store functions to TargetLibraryInfo In-Reply-To: References: Message-ID: jfb added a subscriber: dschuff. jfb added a comment. @dschuff can help answer the WebAssembly question.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83361/new/ https://reviews.llvm.org/D83361 From llvm-commits at lists.llvm.org Fri Jul 10 09:59:52 2020 From: llvm-commits at lists.llvm.org (serge via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 16:59:52 +0000 (UTC) Subject: [PATCH] D83460: Fix HexagonGenExtract return statu In-Reply-To: References: Message-ID: serge-sans-paille added a comment. ping @kparzysz for the final review then :-) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83460/new/ https://reviews.llvm.org/D83460 From llvm-commits at lists.llvm.org Fri Jul 10 10:02:09 2020 From: llvm-commits at lists.llvm.org (Sergej Jaskiewicz via llvm-commits) Date: Fri, 10 Jul 2020 10:02:09 -0700 (PDT) Subject: [compiler-rt] 979c502 - Revert "[compiler-rt] [test] Use the parent process env as base env in tests" Message-ID: <5f089f11.1c69fb81.b26bc.f323@mx.google.com> Author: Sergej Jaskiewicz Date: 2020-07-10T20:01:50+03:00 New Revision: 979c5023d3f0656cf51bd645936f52acd62b0333 URL: https://github.com/llvm/llvm-project/commit/979c5023d3f0656cf51bd645936f52acd62b0333 DIFF: https://github.com/llvm/llvm-project/commit/979c5023d3f0656cf51bd645936f52acd62b0333.diff LOG: Revert "[compiler-rt] [test] Use the parent process env as base env in tests" This reverts commit 5ab446cfe5503fd4431a94db4d741cf3b5fdcd15. That commit caused memory sanitizer test failures on PowerPC buildbots Added: Modified: compiler-rt/test/lit.common.cfg.py Removed: ################################################################################ diff --git a/compiler-rt/test/lit.common.cfg.py b/compiler-rt/test/lit.common.cfg.py index 32a602bfb318..9d0c214bd9a7 100644 --- a/compiler-rt/test/lit.common.cfg.py +++ b/compiler-rt/test/lit.common.cfg.py @@ -70,8 +70,6 @@ # to link. In r19 and later we just use the default which is libc++. 
config.cxx_mode_flags.append('-stdlib=libstdc++') -config.environment = dict(os.environ) - # Clear some environment variables that might affect Clang. possibly_dangerous_env_vars = ['ASAN_OPTIONS', 'DFSAN_OPTIONS', 'LSAN_OPTIONS', 'MSAN_OPTIONS', 'UBSAN_OPTIONS', From llvm-commits at lists.llvm.org Fri Jul 10 10:06:12 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:06:12 +0000 (UTC) Subject: [PATCH] D83518: IR: Define byref attribute In-Reply-To: References: Message-ID: <0b40d5582190847c73e124bd388f4880@localhost.localdomain> arsenm updated this revision to Diff 277082. arsenm edited the summary of this revision. arsenm added a comment. Preserve the type, which does matter for some pointer arguments to get the address space. The pointee alignment is also significant, which would need to be solved to only use byref arguments CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83518/new/ https://reviews.llvm.org/D83518 Files: llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/include/llvm/IR/Argument.h llvm/include/llvm/IR/Attributes.h llvm/include/llvm/IR/Attributes.td llvm/include/llvm/IR/Function.h llvm/lib/Analysis/MemoryBuiltins.cpp llvm/lib/AsmParser/LLLexer.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/AsmParser/LLParser.h llvm/lib/AsmParser/LLToken.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/IR/AsmWriter.cpp llvm/lib/IR/AttributeImpl.h llvm/lib/IR/Attributes.cpp llvm/lib/IR/Function.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Transforms/Utils/CodeExtractor.cpp llvm/test/Assembler/byref-parse-error-0.ll llvm/test/Assembler/byref-parse-error-1.ll llvm/test/Assembler/byref-parse-error-10.ll llvm/test/Assembler/byref-parse-error-2.ll llvm/test/Assembler/byref-parse-error-3.ll llvm/test/Assembler/byref-parse-error-4.ll llvm/test/Assembler/byref-parse-error-5.ll llvm/test/Assembler/byref-parse-error-6.ll llvm/test/Assembler/byref-parse-error-7.ll 
llvm/test/Assembler/byref-parse-error-8.ll llvm/test/Assembler/byref-parse-error-9.ll llvm/test/Bitcode/attributes.ll llvm/test/CodeGen/X86/byref.ll llvm/test/Instrumentation/AddressSanitizer/byref-args.ll llvm/test/Transforms/DeadArgElim/byref.ll llvm/test/Transforms/Inline/byref-align.ll llvm/test/Transforms/LowerConstantIntrinsics/objectsize_basic.ll llvm/test/Verifier/byref.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83518.277082.patch Type: text/x-patch Size: 38384 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:11:04 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:11:04 +0000 (UTC) Subject: [PATCH] D83331: [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. In-Reply-To: References: Message-ID: <7770d93c7af0bdec2e67a030e1ab29b9@localhost.localdomain> craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83331/new/ https://reviews.llvm.org/D83331 From llvm-commits at lists.llvm.org Fri Jul 10 10:12:59 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:12:59 +0000 (UTC) Subject: [PATCH] D83574: CodeGen: Add support for lowering byref attribute Message-ID: arsenm created this revision. Herald added subscribers: hiraditya, wdng. Herald added a project: LLVM. https://reviews.llvm.org/D83574 Files: llvm/include/llvm/CodeGen/TargetCallingConv.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83574.277085.patch Type: text/x-patch Size: 8627 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:14:19 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:14:19 +0000 (UTC) Subject: [PATCH] D79630: AMDGPU: Start interpreting byref on kernel arguments In-Reply-To: References: Message-ID: <418effd411d7bfa3fbd2beba2560bb20@localhost.localdomain> arsenm updated this revision to Diff 277086. arsenm retitled this revision from "AMDGPU: Start interpreting byval on kernel arguments" to "AMDGPU: Start interpreting byref on kernel arguments". arsenm edited the summary of this revision. arsenm added a comment. Switch to byref CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79630/new/ https://reviews.llvm.org/D79630 Files: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp llvm/lib/Target/AMDGPU/AMDGPUCallLowering.h llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll llvm/test/CodeGen/AMDGPU/kernel-args.ll llvm/test/CodeGen/AMDGPU/kernel-argument-dag-lowering.ll llvm/test/CodeGen/AMDGPU/lower-kernargs.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79630.277086.patch Type: text/x-patch Size: 123767 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:14:47 2020 From: llvm-commits at lists.llvm.org (Benjamin Kramer via llvm-commits) Date: Fri, 10 Jul 2020 10:14:47 -0700 (PDT) Subject: [llvm] b887da8 - [CGProfile] Fix layering, IPO depends in Instrumentation. 
Message-ID: <5f08a207.1c69fb81.3a60c.0541@mx.google.com> Author: Benjamin Kramer Date: 2020-07-10T19:13:47+02:00 New Revision: b887da81cc179575932302b87ae8dc7d3f5e3690 URL: https://github.com/llvm/llvm-project/commit/b887da81cc179575932302b87ae8dc7d3f5e3690 DIFF: https://github.com/llvm/llvm-project/commit/b887da81cc179575932302b87ae8dc7d3f5e3690.diff LOG: [CGProfile] Fix layering, IPO depends in Instrumentation. Added: Modified: llvm/include/llvm/Transforms/IPO.h llvm/include/llvm/Transforms/Instrumentation.h llvm/lib/Transforms/Instrumentation/CGProfile.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index d1b9f269d5d4..28e454d3b0fc 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -282,8 +282,6 @@ ModulePass *createSampleProfileLoaderPass(StringRef Name); ModulePass *createWriteThinLTOBitcodePass(raw_ostream &Str, raw_ostream *ThinLinkOS = nullptr); -ModulePass *createCGProfileLegacyPass(); - } // End llvm namespace #endif diff --git a/llvm/include/llvm/Transforms/Instrumentation.h b/llvm/include/llvm/Transforms/Instrumentation.h index 04f211da9819..d4373d7b39ea 100644 --- a/llvm/include/llvm/Transforms/Instrumentation.h +++ b/llvm/include/llvm/Transforms/Instrumentation.h @@ -88,6 +88,8 @@ ModulePass *createPGOIndirectCallPromotionLegacyPass(bool InLTO = false, bool SamplePGO = false); FunctionPass *createPGOMemOPSizeOptLegacyPass(); +ModulePass *createCGProfileLegacyPass(); + // The pgo-specific indirect call promotion function declared below is used by // the pgo-driven indirect call promotion and sample profile passes. It's a // wrapper around llvm::promoteCall, et al. 
that additionally computes !prof diff --git a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp index 05451733625e..0cc0d9b07387 100644 --- a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp +++ b/llvm/lib/Transforms/Instrumentation/CGProfile.cpp @@ -18,7 +18,6 @@ #include "llvm/IR/PassManager.h" #include "llvm/InitializePasses.h" #include "llvm/ProfileData/InstrProf.h" -#include "llvm/Transforms/IPO.h" #include "llvm/Transforms/Instrumentation.h" #include From llvm-commits at lists.llvm.org Fri Jul 10 10:16:15 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:16:15 +0000 (UTC) Subject: [PATCH] D83571: [RFC][WIP] New carry-less multiplication instruction llvm.experimental.clmul In-Reply-To: References: Message-ID: craig.topper added inline comments. ================ Comment at: llvm/docs/LangRef.rst:8417 +but with no carrying of overflow values. If the operands are polynomials in a +Glois field of 2 elements, then this is equilivent to multiplication. Thus, +this operation is also known as polynomial multiplication. ---------------- Glois - > Galois? equivilent -> equivalent CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83571/new/ https://reviews.llvm.org/D83571 From llvm-commits at lists.llvm.org Fri Jul 10 10:17:40 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:17:40 +0000 (UTC) Subject: [PATCH] D80633: DAG: Handle expanding strict_fsub into fneg and strict_fadd In-Reply-To: References: Message-ID: <68805cd2a836c9a0fd098c994415d5ca@localhost.localdomain> arsenm updated this revision to Diff 277087. arsenm edited the summary of this revision. arsenm added a comment. This revision is now accepted and ready to land. Herald added a subscriber: kristof.beyls. 
Fix ARM/AArch64 tests by duplicating the logic rather than adding all the logic to account for getStrictFPOperationAction CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80633/new/ https://reviews.llvm.org/D80633 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/Target/AMDGPU/VOP2Instructions.td llvm/test/CodeGen/AMDGPU/strict_fsub.f16.ll llvm/test/CodeGen/AMDGPU/strict_fsub.f32.ll llvm/test/CodeGen/AMDGPU/strict_fsub.f64.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D80633.277087.patch Type: text/x-patch Size: 24283 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:21:15 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:21:15 +0000 (UTC) Subject: [PATCH] D83575: [NewPM] Allow passes to never be skipped Message-ID: aeubanks created this revision. aeubanks added reviewers: ychen, asbirlea, hans. Herald added a reviewer: bollu. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. PassInfoMixin now declares that passes inheriting from it are by default optional. Using RequiredPassInfoMixin overrides the pass to be required. All adaptors/managers must be required, since the pass(es) they are wrapping may be required. In the future, optnone and opt-bisect will use this mechanism to determine which passes to skip. This is an alternative to https://reviews.llvm.org/D82344, will add tests if we decide to go with this approach. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83575 Files: llvm/include/llvm/Analysis/CGSCCPassManager.h llvm/include/llvm/IR/PassManager.h llvm/include/llvm/IR/PassManagerInternal.h llvm/include/llvm/Transforms/Scalar/LoopPassManager.h polly/include/polly/ScopPass.h -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83575.277088.patch Type: text/x-patch Size: 4871 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:22:43 2020 From: llvm-commits at lists.llvm.org (Mircea Trofin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:22:43 +0000 (UTC) Subject: [PATCH] D82817: [llvm] Native size estimator for training -Oz inliner In-Reply-To: References: Message-ID: <285d93e569accc16a428eff25d492f5c@localhost.localdomain> mtrofin updated this revision to Diff 277090. mtrofin marked an inline comment as done. mtrofin added a comment. Switched the saved_model protobuf to its textual representation. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82817/new/ https://reviews.llvm.org/D82817 Files: llvm/CMakeLists.txt llvm/include/llvm/Analysis/InlineSizeEstimatorAnalysis.h llvm/include/llvm/Analysis/Utils/TFUtils.h llvm/lib/Analysis/CMakeLists.txt llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp llvm/lib/Analysis/TFUtils.cpp llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassRegistry.def llvm/unittests/Analysis/CMakeLists.txt llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/saved_model.pbtxt llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.data-00000-of-00001 llvm/unittests/Analysis/Inputs/ir2native_x86_64_model/variables/variables.index llvm/unittests/Analysis/TFUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82817.277090.patch Type: text/x-patch Size: 293876 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:23:10 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:23:10 +0000 (UTC) Subject: [PATCH] D83519: [NewPM] Support optnone under new pass manager In-Reply-To: References: Message-ID: aeubanks added a comment. 
In D83519#2144463, @aeubanks wrote: > In D83519#2144403, @ychen wrote: > > > High-level request: how about splitting this patch into two, the first for the `require` pass part; the second for the PassInstrument callback. Then we could discuss the choices of first patch and D82344. > > > Good idea, will split the patch and take a closer look at your patch. I split the required passes part into https://reviews.llvm.org/D83575. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83519/new/ https://reviews.llvm.org/D83519 From llvm-commits at lists.llvm.org Fri Jul 10 10:24:43 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via llvm-commits) Date: Fri, 10 Jul 2020 10:24:43 -0700 (PDT) Subject: [llvm] 954db63 - [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM Message-ID: <5f08a45b.1c69fb81.3d0d9.f64a@mx.google.com> Author: Luke Geeson Date: 2020-07-10T18:24:11+01:00 New Revision: 954db63cd149df031d9b660bf68f0fe1de1defb9 URL: https://github.com/llvm/llvm-project/commit/954db63cd149df031d9b660bf68f0fe1de1defb9 DIFF: https://github.com/llvm/llvm-project/commit/954db63cd149df031d9b660bf68f0fe1de1defb9.diff LOG: [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM This patch upstreams support for the Arm-v8 Cortex-A78 and Cortex-X1 processors for AArch64 and ARM.
In detail: - Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang - Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm details of the CPU can be found here: https://www.arm.com/products/cortex-x https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78 The following people contributed to this patch: - Luke Geeson - Mikhail Maltsev Reviewers: t.p.northover, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, miyuki Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D83206 Added: Modified: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp Removed: ################################################################################ diff --git a/clang/test/Driver/aarch64-cpus.c b/clang/test/Driver/aarch64-cpus.c index 53b546265f6a..f39241bee8a6 100644 --- a/clang/test/Driver/aarch64-cpus.c +++ b/clang/test/Driver/aarch64-cpus.c @@ -173,6 +173,10 @@ // RUN: %clang -target aarch64 -mcpu=cortex-a77 -### -c %s 2>&1 | FileCheck -check-prefix=CORTEX-A77 %s // CORTEX-A77: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "cortex-a77" +// RUN: %clang -target aarch64 -mcpu=cortex-x1 -### -c %s 2>&1 | FileCheck -check-prefix=CORTEXX1 %s +// CORTEXX1: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" 
"cortex-x1" +// RUN: %clang -target aarch64 -mcpu=cortex-a78 -### -c %s 2>&1 | FileCheck -check-prefix=CORTEXA78 %s +// CORTEXA78: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "cortex-a78" // RUN: %clang -target aarch64_be -mcpu=exynos-m3 -### -c %s 2>&1 | FileCheck -check-prefix=M3 %s // RUN: %clang -target aarch64 -mbig-endian -mcpu=exynos-m3 -### -c %s 2>&1 | FileCheck -check-prefix=M3 %s diff --git a/clang/test/Driver/arm-cortex-cpus.c b/clang/test/Driver/arm-cortex-cpus.c index d99526abe446..6de1040e9420 100644 --- a/clang/test/Driver/arm-cortex-cpus.c +++ b/clang/test/Driver/arm-cortex-cpus.c @@ -840,6 +840,18 @@ // CHECK-CORTEX-A76AE-SOFT: "-target-feature" "+soft-float" // CHECK-CORTEX-A76AE-SOFT: "-target-feature" "+soft-float-abi" +// RUN: %clang -target armv8a-arm-none-eabi -mcpu=cortex-x1 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-CORTEX-X1 %s +// RUN: %clang -target armv8a-arm-none-eabi -mcpu=cortex-x1 -mfpu=crypto-neon-fp-armv8 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-CORTEX-X1-MFPU %s +// CHECK-CORTEX-X1: "-cc1"{{.*}} "-triple" "armv8.2a-{{.*}} "-target-cpu" "cortex-x1" +// CHECK-CORTEX-X1-MFPU: "-cc1"{{.*}} "-target-feature" "+fp-armv8" +// CHECK-CORTEX-X1-MFPU: "-target-feature" "+crypto" + +// RUN: %clang -target armv8a-arm-none-eabi -mcpu=cortex-a78 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-CORTEX-A78 %s +// RUN: %clang -target armv8a-arm-none-eabi -mcpu=cortex-a78 -mfpu=crypto-neon-fp-armv8 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-CORTEX-A78-MFPU %s +// CHECK-CORTEX-A78: "-cc1"{{.*}} "-triple" "armv8.2a-{{.*}} "-target-cpu" "cortex-a78" +// CHECK-CORTEX-A78-MFPU: "-cc1"{{.*}} "-target-feature" "+fp-armv8" +// CHECK-CORTEX-A78-MFPU: "-target-feature" "+crypto" + // RUN: %clang -target arm -mcpu=cortex-m23 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-CPUV8MBASE %s // CHECK-CPUV8MBASE: "-cc1"{{.*}} "-triple" "thumbv8m.base- diff --git a/llvm/include/llvm/Support/AArch64TargetParser.def 
b/llvm/include/llvm/Support/AArch64TargetParser.def index 66843c6e1941..13b7cfc4b5cd 100644 --- a/llvm/include/llvm/Support/AArch64TargetParser.def +++ b/llvm/include/llvm/Support/AArch64TargetParser.def @@ -127,6 +127,12 @@ AARCH64_CPU_NAME("cortex-a76ae", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, AARCH64_CPU_NAME("cortex-a77", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, (AArch64::AEK_FP16 | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD | AArch64::AEK_SSBS)) +AARCH64_CPU_NAME("cortex-a78", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, + (AArch64::AEK_FP16 | AArch64::AEK_DOTPROD | AArch64::AEK_RCPC | + AArch64::AEK_SSBS)) +AARCH64_CPU_NAME("cortex-x1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, + (AArch64::AEK_FP16 | AArch64::AEK_DOTPROD | AArch64::AEK_RCPC | + AArch64::AEK_SSBS)) AARCH64_CPU_NAME("neoverse-e1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, (AArch64::AEK_DOTPROD | AArch64::AEK_FP16 | AArch64::AEK_RAS | AArch64::AEK_RCPC | AArch64::AEK_SSBS)) diff --git a/llvm/include/llvm/Support/ARMTargetParser.def b/llvm/include/llvm/Support/ARMTargetParser.def index 7a81af72ad33..9f51c841e429 100644 --- a/llvm/include/llvm/Support/ARMTargetParser.def +++ b/llvm/include/llvm/Support/ARMTargetParser.def @@ -294,6 +294,10 @@ ARM_CPU_NAME("cortex-a76ae", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, (ARM::AEK_FP16 | ARM::AEK_DOTPROD)) ARM_CPU_NAME("cortex-a77", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, (ARM::AEK_FP16 | ARM::AEK_DOTPROD)) +ARM_CPU_NAME("cortex-a78",ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, + (ARM::AEK_FP16 | ARM::AEK_DOTPROD)) +ARM_CPU_NAME("cortex-x1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, + (ARM::AEK_FP16 | ARM::AEK_DOTPROD)) ARM_CPU_NAME("neoverse-n1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false, (ARM::AEK_FP16 | ARM::AEK_DOTPROD)) ARM_CPU_NAME("cyclone", ARMV8A, FK_CRYPTO_NEON_FP_ARMV8, false, ARM::AEK_CRC) diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index adfb599f55ff..8dc8c4e9775a 100644 --- a/llvm/lib/Support/Host.cpp +++ 
b/llvm/lib/Support/Host.cpp @@ -205,6 +205,8 @@ StringRef sys::detail::getHostCPUNameForARM(StringRef ProcCpuinfoContent) { .Case("0xd0a", "cortex-a75") .Case("0xd0b", "cortex-a76") .Case("0xd0d", "cortex-a77") + .Case("0xd41", "cortex-a78") + .Case("0xd44", "cortex-x1") .Case("0xd0c", "neoverse-n1") .Default("generic"); } diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td index da68e3ed17a2..534af9686af0 100644 --- a/llvm/lib/Target/AArch64/AArch64.td +++ b/llvm/lib/Target/AArch64/AArch64.td @@ -636,6 +636,36 @@ def ProcA77 : SubtargetFeature<"a77", "ARMProcFamily", "CortexA77", FeatureDotProd ]>; +def ProcA78 : SubtargetFeature<"cortex-a78", "ARMProcFamily", + "CortexA78", + "Cortex-A78 ARM processors", [ + HasV8_2aOps, + FeatureCrypto, + FeatureFPARMv8, + FeatureFuseAES, + FeatureNEON, + FeatureRCPC, + FeaturePerfMon, + FeaturePostRAScheduler, + FeatureSPE, + FeatureFullFP16, + FeatureSSBS, + FeatureDotProd]>; + +def ProcX1 : SubtargetFeature<"cortex-x1", "ARMProcFamily", "CortexX1", + "Cortex-X1 ARM processors", [ + HasV8_2aOps, + FeatureCrypto, + FeatureFPARMv8, + FeatureFuseAES, + FeatureNEON, + FeatureRCPC, + FeaturePerfMon, + FeaturePostRAScheduler, + FeatureSPE, + FeatureFullFP16, + FeatureDotProd]>; + def ProcA64FX : SubtargetFeature<"a64fx", "ARMProcFamily", "A64FX", "Fujitsu A64FX processors", [ HasV8_2aOps, @@ -978,6 +1008,8 @@ def : ProcessorModel<"cortex-a75", CortexA57Model, [ProcA75]>; def : ProcessorModel<"cortex-a76", CortexA57Model, [ProcA76]>; def : ProcessorModel<"cortex-a76ae", CortexA57Model, [ProcA76]>; def : ProcessorModel<"cortex-a77", CortexA57Model, [ProcA77]>; +def : ProcessorModel<"cortex-a78", CortexA57Model, [ProcA78]>; +def : ProcessorModel<"cortex-x1", CortexA57Model, [ProcX1]>; def : ProcessorModel<"neoverse-e1", CortexA53Model, [ProcNeoverseE1]>; def : ProcessorModel<"neoverse-n1", CortexA57Model, [ProcNeoverseN1]>; def : ProcessorModel<"exynos-m3", ExynosM3Model, [ProcExynosM3]>; diff --git 
a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp index 2f5abd76cc3a..029535cb98b5 100644 --- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp +++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp @@ -102,6 +102,8 @@ void AArch64Subtarget::initializeProperties() { case CortexA75: case CortexA76: case CortexA77: + case CortexA78: + case CortexX1: PrefFunctionLogAlignment = 4; break; case A64FX: diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h index b7dc05e27e6a..b111f0016948 100644 --- a/llvm/lib/Target/AArch64/AArch64Subtarget.h +++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h @@ -56,6 +56,8 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { CortexA75, CortexA76, CortexA77, + CortexA78, + CortexX1, ExynosM3, Falkor, Kryo, diff --git a/llvm/lib/Target/ARM/ARM.td b/llvm/lib/Target/ARM/ARM.td index 90b581447372..0468f7f1cf8e 100644 --- a/llvm/lib/Target/ARM/ARM.td +++ b/llvm/lib/Target/ARM/ARM.td @@ -596,6 +596,10 @@ def ProcA76 : SubtargetFeature<"a76", "ARMProcFamily", "CortexA76", "Cortex-A76 ARM processors", []>; def ProcA77 : SubtargetFeature<"a77", "ARMProcFamily", "CortexA77", "Cortex-A77 ARM processors", []>; +def ProcA78 : SubtargetFeature<"cortex-a78", "ARMProcFamily", "CortexA78", + "Cortex-A78 ARM processors", []>; +def ProcX1 : SubtargetFeature<"cortex-x1", "ARMProcFamily", "CortexX1", + "Cortex-X1 ARM processors", []>; def ProcKrait : SubtargetFeature<"krait", "ARMProcFamily", "Krait", "Qualcomm Krait processors", []>; @@ -1234,6 +1238,22 @@ def : ProcNoItin<"cortex-a77", [ARMv82a, ProcA77, FeatureFullFP16, FeatureDotProd]>; +def : ProcNoItin<"cortex-a78", [ARMv82a, ProcA78, + FeatureHWDivThumb, + FeatureHWDivARM, + FeatureCrypto, + FeatureCRC, + FeatureFullFP16, + FeatureDotProd]>; + +def : ProcNoItin<"cortex-x1", [ARMv82a, ProcX1, + FeatureHWDivThumb, + FeatureHWDivARM, + FeatureCrypto, + FeatureCRC, + FeatureFullFP16, + FeatureDotProd]>; + def : 
ProcNoItin<"neoverse-n1", [ARMv82a, FeatureHWDivThumb, FeatureHWDivARM, diff --git a/llvm/lib/Target/ARM/ARMSubtarget.cpp b/llvm/lib/Target/ARM/ARMSubtarget.cpp index 3d828e55c869..46802037c2aa 100644 --- a/llvm/lib/Target/ARM/ARMSubtarget.cpp +++ b/llvm/lib/Target/ARM/ARMSubtarget.cpp @@ -293,12 +293,14 @@ void ARMSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) { case CortexA75: case CortexA76: case CortexA77: + case CortexA78: case CortexR4: case CortexR4F: case CortexR5: case CortexR7: case CortexM3: case CortexR52: + case CortexX1: break; case Exynos: LdStMultipleTiming = SingleIssuePlusExtras; diff --git a/llvm/lib/Target/ARM/ARMSubtarget.h b/llvm/lib/Target/ARM/ARMSubtarget.h index c42356f28b65..2703e385dd81 100644 --- a/llvm/lib/Target/ARM/ARMSubtarget.h +++ b/llvm/lib/Target/ARM/ARMSubtarget.h @@ -62,6 +62,7 @@ class ARMSubtarget : public ARMGenSubtargetInfo { CortexA75, CortexA76, CortexA77, + CortexA78, CortexA8, CortexA9, CortexM3, @@ -70,6 +71,7 @@ class ARMSubtarget : public ARMGenSubtargetInfo { CortexR5, CortexR52, CortexR7, + CortexX1, Exynos, Krait, Kryo, diff --git a/llvm/test/CodeGen/AArch64/cpus.ll b/llvm/test/CodeGen/AArch64/cpus.ll index 107aca373923..3d4ad97b7fb2 100644 --- a/llvm/test/CodeGen/AArch64/cpus.ll +++ b/llvm/test/CodeGen/AArch64/cpus.ll @@ -16,6 +16,8 @@ ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=cortex-a76ae 2>&1 | FileCheck %s ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=cortex-a76 2>&1 | FileCheck %s ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=cortex-a77 2>&1 | FileCheck %s +; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=cortex-a78 2>&1 | FileCheck %s +; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=cortex-x1 2>&1 | FileCheck %s ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-e1 2>&1 | FileCheck %s ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-n1 2>&1 | FileCheck %s ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m3 2>&1 | FileCheck %s 
diff --git a/llvm/test/CodeGen/AArch64/remat.ll b/llvm/test/CodeGen/AArch64/remat.ll index 400be14b6e46..90ad2508fe17 100644 --- a/llvm/test/CodeGen/AArch64/remat.ll +++ b/llvm/test/CodeGen/AArch64/remat.ll @@ -9,6 +9,8 @@ ; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=cortex-a73 -o - %s | FileCheck %s ; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=cortex-a75 -o - %s | FileCheck %s ; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=cortex-a77 -o - %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=cortex-a78 -o - %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=cortex-x1 -o - %s | FileCheck %s ; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=neoverse-e1 -o - %s | FileCheck %s ; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=neoverse-n1 -o - %s | FileCheck %s ; RUN: llc -mtriple=aarch64-linux-gnuabi -mcpu=exynos-m3 -o - %s | FileCheck %s diff --git a/llvm/test/MC/AArch64/armv8.2a-dotprod.s b/llvm/test/MC/AArch64/armv8.2a-dotprod.s index 6315066efe58..3b9f416a63fb 100644 --- a/llvm/test/MC/AArch64/armv8.2a-dotprod.s +++ b/llvm/test/MC/AArch64/armv8.2a-dotprod.s @@ -5,6 +5,8 @@ // RUN: llvm-mc -triple aarch64 -mcpu=cortex-a75 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD // RUN: llvm-mc -triple aarch64 -mcpu=cortex-a76 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD // RUN: llvm-mc -triple aarch64 -mcpu=cortex-a77 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD +// RUN: llvm-mc -triple aarch64 -mcpu=cortex-a78 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD +// RUN: llvm-mc -triple aarch64 -mcpu=cortex-x1 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD // RUN: llvm-mc -triple aarch64 -mcpu=neoverse-e1 -show-encoding < %s| FileCheck %s --check-prefix=CHECK-DOTPROD // RUN: llvm-mc -triple aarch64 -mcpu=neoverse-n1 -show-encoding < %s| FileCheck %s --check-prefix=CHECK-DOTPROD // RUN: llvm-mc -triple aarch64 -mcpu=tsv110 -show-encoding < %s | FileCheck %s 
--check-prefix=CHECK-DOTPROD @@ -19,6 +21,10 @@ // RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s // RUN: not llvm-mc -triple aarch64 -mcpu=cortex-a77 -mattr=-dotprod -show-encoding < %s 2> %t // RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s +// RUN: not llvm-mc -triple aarch64 -mcpu=cortex-a78 -mattr=-dotprod -show-encoding < %s 2> %t +// RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s +// RUN: not llvm-mc -triple aarch64 -mcpu=cortex-x1 -mattr=-dotprod -show-encoding < %s 2> %t +// RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s // RUN: not llvm-mc -triple aarch64 -mcpu=neoverse-n1 -mattr=-dotprod -show-encoding < %s 2> %t // RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s diff --git a/llvm/test/MC/ARM/armv8.2a-dotprod-a32.s b/llvm/test/MC/ARM/armv8.2a-dotprod-a32.s index 748392630920..8ca2a97c602d 100644 --- a/llvm/test/MC/ARM/armv8.2a-dotprod-a32.s +++ b/llvm/test/MC/ARM/armv8.2a-dotprod-a32.s @@ -4,11 +4,17 @@ // RUN: llvm-mc -triple arm -mcpu=cortex-a76 -show-encoding < %s | FileCheck %s --check-prefix=CHECK // RUN: llvm-mc -triple arm -mcpu=neoverse-n1 -show-encoding < %s | FileCheck %s --check-prefix=CHECK // RUN: llvm-mc -triple arm -mcpu=cortex-a77 -show-encoding < %s | FileCheck %s --check-prefix=CHECK +// RUN: llvm-mc -triple arm -mcpu=cortex-a78 -show-encoding < %s | FileCheck %s --check-prefix=CHECK +// RUN: llvm-mc -triple arm -mcpu=cortex-x1 -show-encoding < %s | FileCheck %s --check-prefix=CHECK // RUN: not llvm-mc -triple arm -mattr=-dotprod -show-encoding < %s 2> %t // RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s // RUN: not llvm-mc -triple arm -mcpu=cortex-a77 -mattr=-dotprod -show-encoding < %s 2> %t // RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s +// RUN: not llvm-mc -triple arm -mcpu=cortex-a78 -mattr=-dotprod -show-encoding < %s 2> %t +// RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s +// RUN: not llvm-mc -triple arm -mcpu=cortex-x1 -mattr=-dotprod -show-encoding < %s 2> %t 
+// RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s // RUN: not llvm-mc -triple arm -show-encoding < %s 2> %t // RUN: FileCheck --check-prefix=CHECK-NO-DOTPROD < %t %s // RUN: not llvm-mc -triple arm -mattr=+v8.1a -show-encoding < %s 2> %t diff --git a/llvm/test/MC/ARM/armv8.2a-dotprod-t32.s b/llvm/test/MC/ARM/armv8.2a-dotprod-t32.s index 47837729eef6..8570a7be3150 100644 --- a/llvm/test/MC/ARM/armv8.2a-dotprod-t32.s +++ b/llvm/test/MC/ARM/armv8.2a-dotprod-t32.s @@ -3,6 +3,8 @@ // RUN: llvm-mc -triple thumb -mcpu=cortex-a75 -show-encoding < %s | FileCheck %s --check-prefix=CHECK // RUN: llvm-mc -triple thumb -mcpu=cortex-a76 -show-encoding < %s | FileCheck %s --check-prefix=CHECK // RUN: llvm-mc -triple thumb -mcpu=cortex-a77 -show-encoding < %s | FileCheck %s --check-prefix=CHECK +// RUN: llvm-mc -triple thumb -mcpu=cortex-a78 -show-encoding < %s | FileCheck %s --check-prefix=CHECK +// RUN: llvm-mc -triple thumb -mcpu=cortex-x1 -show-encoding < %s | FileCheck %s --check-prefix=CHECK // RUN: llvm-mc -triple thumb -mcpu=neoverse-n1 -show-encoding < %s | FileCheck %s --check-prefix=CHECK // RUN: not llvm-mc -triple thumb -mattr=-dotprod -show-encoding < %s 2> %t diff --git a/llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt b/llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt index 1c41882119b3..8b0aac526999 100644 --- a/llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt +++ b/llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt @@ -5,6 +5,8 @@ # RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=cortex-a65ae --disassemble < %s | FileCheck %s # RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=cortex-a75 --disassemble < %s | FileCheck %s # RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=cortex-a77 --disassemble < %s | FileCheck %s +# RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=cortex-a78 --disassemble < %s | FileCheck %s +# RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=cortex-x1 --disassemble < %s | FileCheck %s # RUN: llvm-mc -triple 
aarch64-none-linux-gnu -mcpu=neoverse-e1 --disassemble < %s | FileCheck %s # RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=neoverse-n1 --disassemble < %s | FileCheck %s diff --git a/llvm/unittests/Support/TargetParserTest.cpp b/llvm/unittests/Support/TargetParserTest.cpp index b9736481e0e3..0127cb6ae009 100644 --- a/llvm/unittests/Support/TargetParserTest.cpp +++ b/llvm/unittests/Support/TargetParserTest.cpp @@ -262,6 +262,18 @@ TEST(TargetParserTest, testARMCPU) { ARM::AEK_HWDIVTHUMB | ARM::AEK_DSP | ARM::AEK_FP16 | ARM::AEK_RAS | ARM::AEK_DOTPROD, "8.2-A")); + EXPECT_TRUE(testARMCPU("cortex-a78", "armv8.2-a", "crypto-neon-fp-armv8", + ARM::AEK_DOTPROD | ARM::AEK_FP16 | + ARM::AEK_SEC | ARM::AEK_MP | ARM::AEK_VIRT | + ARM::AEK_HWDIVARM | ARM::AEK_HWDIVTHUMB | + ARM::AEK_DSP | ARM::AEK_CRC | ARM::AEK_RAS, + "8.2-A")); + EXPECT_TRUE(testARMCPU("cortex-x1", "armv8.2-a", "crypto-neon-fp-armv8", + ARM::AEK_RAS | ARM::AEK_FP16 | ARM::AEK_DOTPROD | + ARM::AEK_SEC | ARM::AEK_MP | ARM::AEK_VIRT | + ARM::AEK_HWDIVARM | ARM::AEK_HWDIVTHUMB | + ARM::AEK_DSP | ARM::AEK_CRC | ARM::AEK_RAS, + "8.2-A")); EXPECT_TRUE(testARMCPU("neoverse-n1", "armv8.2-a", "crypto-neon-fp-armv8", ARM::AEK_CRC | ARM::AEK_SEC | ARM::AEK_MP | ARM::AEK_VIRT | ARM::AEK_HWDIVARM | @@ -310,7 +322,7 @@ TEST(TargetParserTest, testARMCPU) { "7-S")); } -static constexpr unsigned NumARMCPUArchs = 87; +static constexpr unsigned NumARMCPUArchs = 89; TEST(TargetParserTest, testARMCPUArchList) { SmallVector List; @@ -864,6 +876,20 @@ TEST(TargetParserTest, testAArch64CPU) { AArch64::AEK_RDM | AArch64::AEK_SIMD | AArch64::AEK_RAS | AArch64::AEK_LSE | AArch64::AEK_FP16 | AArch64::AEK_DOTPROD | AArch64::AEK_RCPC | AArch64::AEK_SSBS, "8.2-A")); + EXPECT_TRUE(testAArch64CPU( + "cortex-a78", "armv8.2-a", "crypto-neon-fp-armv8", + AArch64::AEK_CRC | AArch64::AEK_CRYPTO | AArch64::AEK_FP | + AArch64::AEK_RDM | AArch64::AEK_SIMD | AArch64::AEK_RAS | + AArch64::AEK_LSE | AArch64::AEK_FP16 | AArch64::AEK_DOTPROD | + 
AArch64::AEK_RCPC | AArch64::AEK_SSBS, + "8.2-A")); + EXPECT_TRUE(testAArch64CPU( + "cortex-x1", "armv8.2-a", "crypto-neon-fp-armv8", + AArch64::AEK_CRC | AArch64::AEK_CRYPTO | AArch64::AEK_FP | + AArch64::AEK_RDM | AArch64::AEK_SIMD | AArch64::AEK_RAS | + AArch64::AEK_LSE | AArch64::AEK_FP16 | AArch64::AEK_DOTPROD | + AArch64::AEK_RCPC | AArch64::AEK_SSBS, + "8.2-A")); EXPECT_TRUE(testAArch64CPU( "cyclone", "armv8-a", "crypto-neon-fp-armv8", AArch64::AEK_CRYPTO | AArch64::AEK_FP | AArch64::AEK_SIMD, "8-A")); @@ -1002,7 +1028,7 @@ TEST(TargetParserTest, testAArch64CPU) { "8.2-A")); } -static constexpr unsigned NumAArch64CPUArchs = 40; +static constexpr unsigned NumAArch64CPUArchs = 42; TEST(TargetParserTest, testAArch64CPUArchList) { SmallVector List; From llvm-commits at lists.llvm.org Fri Jul 10 10:24:52 2020 From: llvm-commits at lists.llvm.org (Luke Geeson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:24:52 +0000 (UTC) Subject: [PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM In-Reply-To: References: Message-ID: <4b26057b4954628a16e9bba2a7cb1c53@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG954db63cd149: [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM (authored by LukeGeeson). 
Changed prior to commit: https://reviews.llvm.org/D83206?vs=277074&id=277093#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83206/new/ https://reviews.llvm.org/D83206 Files: clang/test/Driver/aarch64-cpus.c clang/test/Driver/arm-cortex-cpus.c llvm/include/llvm/Support/AArch64TargetParser.def llvm/include/llvm/Support/ARMTargetParser.def llvm/lib/Support/Host.cpp llvm/lib/Target/AArch64/AArch64.td llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/ARM/ARM.td llvm/lib/Target/ARM/ARMSubtarget.cpp llvm/lib/Target/ARM/ARMSubtarget.h llvm/test/CodeGen/AArch64/cpus.ll llvm/test/CodeGen/AArch64/remat.ll llvm/test/MC/AArch64/armv8.2a-dotprod.s llvm/test/MC/ARM/armv8.2a-dotprod-a32.s llvm/test/MC/ARM/armv8.2a-dotprod-t32.s llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt llvm/unittests/Support/TargetParserTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83206.277093.patch Type: text/x-patch Size: 21213 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:25:11 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:25:11 +0000 (UTC) Subject: [PATCH] D80580: [NFC] Separate Peeling Properties into its own struct In-Reply-To: References: Message-ID: <9c0703947f5e978b1dc12ff6991d8846@localhost.localdomain> Meinersbur added a comment. Patch still look fine to me. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80580/new/ https://reviews.llvm.org/D80580 From llvm-commits at lists.llvm.org Fri Jul 10 10:26:42 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:26:42 +0000 (UTC) Subject: [PATCH] D83549: [ELF] Do not force bringing out symbols passed by -init and -fini. In-Reply-To: References: Message-ID: MaskRay added a subscriber: aaronpuchert. MaskRay added a comment. 
I think this is about different expectations of `_init` and `_fini` behaviors. @aaronpuchert Your comment https://bugzilla.opensuse.org/show_bug.cgi?id=1155108#c10 "However, it's actually a bug in all three linkers. It has been fixed in lld just today [1]" made me wonder whether https://bugs.llvm.org/show_bug.cgi?id=43927 was really a linker bug. I need an LLD reproduce file (obtained via `LLD_REPRODUCE=/tmp/rep.tar ...` or `-Wl,--reproduce=/tmp/rep.tar`) to understand the matter better. An alternative fix to @ikudrin's problem is to revert D69985. openSUSE's OpenMP problem probably should be fixed by adding `-u _init -u _fini` instead of relying on the linker retaining the bitcode-defined symbols. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83549/new/ https://reviews.llvm.org/D83549 From llvm-commits at lists.llvm.org Fri Jul 10 10:27:38 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:27:38 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: <1ee8490ea8ba275a69878e61dd246421@localhost.localdomain> arsenm updated this revision to Diff 277094. arsenm marked 2 inline comments as done. arsenm added a comment. 
Documentation fixes CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 Files: llvm/docs/AMDGPUUsage.rst llvm/include/llvm/Support/AMDGPUMetadata.h llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp llvm/lib/Support/AMDGPUMetadata.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D82818.277094.patch Type: text/x-patch Size: 191222 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:30:44 2020 From: llvm-commits at lists.llvm.org (Konstantin Zhuravlyov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:30:44 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: <3c39c58f9fad338a548a7b0e5e587b3e@localhost.localdomain> kzhuravl accepted this revision. kzhuravl added a comment. This revision is now accepted and ready to land. 
lgtm CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 From llvm-commits at lists.llvm.org Fri Jul 10 10:33:41 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:33:41 +0000 (UTC) Subject: [PATCH] D83576: [BasicAA] Fix -basicaa-recphi for geps with negative offsets Message-ID: dmgreen created this revision. dmgreen added reviewers: efriedma, hfinkel, fhahn, tobiasvk. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. As shown in D82998, the basic-aa-recphi option can cause miscompiles for GEPs with negative constants. The option checks for recursive phis that recurse through a constant GEP. If it finds one, it performs aliasing calculations using the other phi operands with an unknown size, to specify that an unknown number of elements after the initial value are potentially accessed. This works fine except when the constant is negative, as the size is still considered to be positive. So this patch checks to make sure that the constant is also non-negative. I will not attempt to turn the option back on until after the branch is made next week, to give us lots of time to catch anything else. But this should hopefully fix the issues. 
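To illustrate the reasoning in the description, here is a minimal toy sketch in plain C++. It is not LLVM code and does not use any BasicAA API: `RecursivePhi`, `canUseForwardRange` and `mayReach` are hypothetical names invented for this example, and pointers are modeled as element offsets into a single array. It only shows why treating the phi as "the start value plus an unknown number of elements after it" is safe for a non-negative constant step and unsafe for a negative one.

```cpp
#include <cstdint>

// Toy model (an assumption for illustration, not LLVM's data structures):
// a loop-carried pointer p = phi(Start, p + Step), expressed as element
// offsets from the beginning of one array.
struct RecursivePhi {
  int64_t Start; // offset of the initial (non-recursive) phi operand
  int64_t Step;  // constant GEP index applied on every iteration
};

// Mirrors the spirit of the added guard: the "unknown size after Start"
// shortcut is only valid when the constant index is non-negative.
bool canUseForwardRange(const RecursivePhi &Phi) { return Phi.Step >= 0; }

// Conservative query: may the phi ever point at offset Other?
// true means "may alias", false means "no alias".
bool mayReach(const RecursivePhi &Phi, int64_t Other) {
  if (!canUseForwardRange(Phi))
    return true; // negative step: give up and answer "may alias"
  // Non-negative step: only offsets at or after Start can be reached.
  return Other >= Phi.Start;
}
```

With a phi starting at offset 4, a step of 1 can never reach offset 2, so the toy query may answer "no alias". With a step of -1 the pointer walks 4, 3, 2, ..., so the forward-range assumption would be wrong and the query falls back to "may alias". That matches the direction of the `NoAlias` to `MayAlias` changes in the recphi.ll checks of this patch.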
https://reviews.llvm.org/D83576 Files: llvm/lib/Analysis/BasicAliasAnalysis.cpp llvm/test/Analysis/BasicAA/recphi.ll Index: llvm/test/Analysis/BasicAA/recphi.ll =================================================================== --- llvm/test/Analysis/BasicAA/recphi.ll +++ llvm/test/Analysis/BasicAA/recphi.ll @@ -92,8 +92,8 @@ ; CHECK: NoAlias: i32* %arrayidx1, i8* %0 ; CHECK: NoAlias: i32* %arrayidx, i32* %arrayidx1 ; CHECK: MayAlias: [10 x i32]* %tab, i32* %p.addr.05.i -; CHECK: NoAlias: i32* %p.addr.05.i, i8* %0 -; CHECK: NoAlias: i32* %arrayidx, i32* %p.addr.05.i +; CHECK: MayAlias: i32* %p.addr.05.i, i8* %0 +; CHECK: MayAlias: i32* %arrayidx, i32* %p.addr.05.i ; CHECK: MayAlias: i32* %arrayidx1, i32* %p.addr.05.i ; CHECK: MayAlias: [10 x i32]* %tab, i32* %incdec.ptr.i ; CHECK: MayAlias: i32* %incdec.ptr.i, i8* %0 @@ -141,17 +141,17 @@ ; CHECK: NoAlias: [3 x i16]* %int_arr.10, i16** %argv.6.par ; CHECK: NoAlias: i16* %_tmp1, i16** %argv.6.par ; CHECK: PartialAlias: [3 x i16]* %int_arr.10, i16* %_tmp1 -; CHECK: NoAlias: i16* %ls1.9.0, i16** %argv.6.par +; CHECK: MayAlias: i16* %ls1.9.0, i16** %argv.6.par ; CHECK: MayAlias: [3 x i16]* %int_arr.10, i16* %ls1.9.0 ; CHECK: MayAlias: i16* %_tmp1, i16* %ls1.9.0 -; CHECK: NoAlias: i16* %_tmp7, i16** %argv.6.par +; CHECK: MayAlias: i16* %_tmp7, i16** %argv.6.par ; CHECK: MayAlias: [3 x i16]* %int_arr.10, i16* %_tmp7 ; CHECK: MayAlias: i16* %_tmp1, i16* %_tmp7 ; CHECK: NoAlias: i16* %_tmp7, i16* %ls1.9.0 ; CHECK: NoAlias: i16* %_tmp11, i16** %argv.6.par ; CHECK: PartialAlias: [3 x i16]* %int_arr.10, i16* %_tmp11 ; CHECK: NoAlias: i16* %_tmp1, i16* %_tmp11 -; CHECK: NoAlias: i16* %_tmp11, i16* %ls1.9.0 +; CHECK: MayAlias: i16* %_tmp11, i16* %ls1.9.0 ; CHECK: MayAlias: i16* %_tmp11, i16* %_tmp7 ; CHECK: Both ModRef: Ptr: i16** %argv.6.par <-> %_tmp16 = call i16 @call(i32 %_tmp13) ; CHECK: NoModRef: Ptr: [3 x i16]* %int_arr.10 <-> %_tmp16 = call i16 @call(i32 %_tmp13) Index: llvm/lib/Analysis/BasicAliasAnalysis.cpp 
=================================================================== --- llvm/lib/Analysis/BasicAliasAnalysis.cpp +++ llvm/lib/Analysis/BasicAliasAnalysis.cpp @@ -1666,8 +1666,10 @@ // result of this PHI node (e.g. in a loop). If this is the case, we // would recurse and always get a MayAlias. Handle this case specially // below. - if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 && - isa(PV1GEP->idx_begin())) { + if (PV1GEP->getPointerOperand() == PN && + PV1GEP->getNumIndices() == 1 && + isa(PV1GEP->idx_begin()) && + cast(PV1GEP->idx_begin())->getSExtValue() >= 0) { isRecursive = true; continue; } @@ -1693,8 +1695,10 @@ // result of this PHI node (e.g. in a loop). If this is the case, we // would recurse and always get a MayAlias. Handle this case specially // below. - if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 && - isa(PV1GEP->idx_begin())) { + if (PV1GEP->getPointerOperand() == PN && + PV1GEP->getNumIndices() == 1 && + isa(PV1GEP->idx_begin()) && + cast(PV1GEP->idx_begin())->getSExtValue() >= 0) { isRecursive = true; continue; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83576.277081.patch Type: text/x-patch Size: 3716 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:34:41 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:34:41 +0000 (UTC) Subject: [PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot In-Reply-To: References: Message-ID: <6573da6ff26d763ee5fbcff8fc11b62d@localhost.localdomain> arsenm accepted this revision. arsenm added inline comments. This revision is now accepted and ready to land. 
================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1053-1054 + + Optional Arg = + getConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI, true); + ---------------- mbrkusanin wrote: > arsenm wrote: > > I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through > Unfortunately regular version fails to produce the value. > I think something probably went wrong here due to us not trying to do anything resembling optimization during/after RegBankSelect. When we do that, we can probably remove a lot of these ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:11-12 +; CHECK: ; %bb.0: +; CHECK-NEXT: s_mov_b32 s0, 0 +; CHECK-NEXT: s_mov_b32 s1, 0 +; CHECK-NEXT: ; return to shader part epilog ---------------- mbrkusanin wrote: > arsenm wrote: > > This can be one s_mov_b64 > It can, but SIFoldOperands will not let that happen. > > From: > %10:sreg_64 = S_MOV_B64 0 > %3:sreg_32 = COPY %10.sub0:sreg_64 > %4:sreg_32 = COPY %10.sub1:sreg_64 > plus some instructions that use %3, %4 but will eventually be removed. > > SIFoldOperands will produce: > %10:sreg_64 = S_MOV_B64 0 > %3:sreg_32 = S_MOV_B32 0 > %4:sreg_32 = S_MOV_B32 0 > ... > > which makes the first instruction dead and in the end we're left with two S_MOV_B32. > > For the example below with exec, AMDGPU::sub0_sub1 seems to do the trick but I don't see anything similar for immediate operands. > Alternatively we can produce > v_cmp_ne_u32_e64 s[0:1], 0, 0 > if for whatever reason that is preferable to > s_mov_b32 s0, 0 > s_mov_b32 s1, 0 > > Anyway, this is not an issue with selecting ballot. The following example has the same issue: > > ``` > define amdgpu_cs i64 @si_fold_constants_i64() { > %x = add i64 0, 0 > ret i64 %x > } > ``` I guess this is another bug to solve. Can you file that somewhere? 
We shouldn't be trying to workaround it in the selector Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83214/new/ https://reviews.llvm.org/D83214 From llvm-commits at lists.llvm.org Fri Jul 10 10:35:35 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:35:35 +0000 (UTC) Subject: [PATCH] D82998: [BasicAA] Enable -basic-aa-recphi by default In-Reply-To: References: Message-ID: dmgreen added a comment. Not a problem. These problems can get very complex in real code. I have put up D83576 to hopefully fix the issue, but won't attempt to turn the option back on until after the new branch gets created. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82998/new/ https://reviews.llvm.org/D82998 From llvm-commits at lists.llvm.org Fri Jul 10 10:36:50 2020 From: llvm-commits at lists.llvm.org (Vishal Chebrolu via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:36:50 +0000 (UTC) Subject: [PATCH] D82892: [NFC] Added comparision for all types in haveSameSpecialState() of Instruction.cpp In-Reply-To: References: Message-ID: vish99 updated this revision to Diff 277097. vish99 added a comment. Comment added Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82892/new/ https://reviews.llvm.org/D82892 Files: llvm/include/llvm/IR/InstrTypes.h llvm/include/llvm/IR/Instructions.h llvm/lib/IR/Instruction.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D82892.277097.patch Type: text/x-patch Size: 45465 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:42:42 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Fri, 10 Jul 2020 10:42:42 -0700 (PDT) Subject: [llvm] 1cf6f21 - [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. 
Message-ID: <5f08a892.1c69fb81.4b096.090a@mx.google.com> Author: Craig Topper Date: 2020-07-10T10:42:25-07:00 New Revision: 1cf6f210a2ed87dcda2183fffd6f9aa17b5c493c URL: https://github.com/llvm/llvm-project/commit/1cf6f210a2ed87dcda2183fffd6f9aa17b5c493c DIFF: https://github.com/llvm/llvm-project/commit/1cf6f210a2ed87dcda2183fffd6f9aa17b5c493c.diff LOG: [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. This matches the recent change to InstSimplify from D83440. Differential Revision: https://reviews.llvm.org/D83535 Added: Modified: llvm/lib/IR/ConstantFold.cpp llvm/test/Transforms/InferAddressSpaces/AMDGPU/select.ll llvm/test/Transforms/InstSimplify/select.ll Removed: ################################################################################ diff --git a/llvm/lib/IR/ConstantFold.cpp b/llvm/lib/IR/ConstantFold.cpp index f3c3e9ad9f69..f02246cda7fc 100644 --- a/llvm/lib/IR/ConstantFold.cpp +++ b/llvm/lib/IR/ConstantFold.cpp @@ -779,10 +779,30 @@ Constant *llvm::ConstantFoldSelectInstruction(Constant *Cond, if (isa(V1)) return V1; return V2; } - if (isa(V1)) return V2; - if (isa(V2)) return V1; + if (V1 == V2) return V1; + // If the true or false value is undef, we can fold to the other value as + // long as the other value isn't poison. + auto NotPoison = [](Constant *C) { + // TODO: We can analyze ConstExpr by opcode to determine if there is any + // possibility of poison. + if (isa(C)) + return false; + + if (isa(C) || isa(C) || isa(C) || + isa(C) || isa(C)) + return true; + + if (C->getType()->isVectorTy()) + return !C->containsUndefElement() && !C->containsConstantExpression(); + + // TODO: Recursively analyze aggregates or other constants. 
+ return false; + }; + if (isa(V1) && NotPoison(V2)) return V2; + if (isa(V2) && NotPoison(V1)) return V1; + if (ConstantExpr *TrueVal = dyn_cast(V1)) { if (TrueVal->getOpcode() == Instruction::Select) if (TrueVal->getOperand(0) == Cond) diff --git a/llvm/test/Transforms/InferAddressSpaces/AMDGPU/select.ll b/llvm/test/Transforms/InferAddressSpaces/AMDGPU/select.ll index 1fa4bdc1964e..3acd21c73958 100644 --- a/llvm/test/Transforms/InferAddressSpaces/AMDGPU/select.ll +++ b/llvm/test/Transforms/InferAddressSpaces/AMDGPU/select.ll @@ -221,7 +221,7 @@ define amdgpu_kernel void @store_select_group_global_mismatch_inttoptr_flat_null } ; CHECK-LABEL: @store_select_group_global_mismatch_undef_undef_constexpr( -; CHECK: store i32 7, i32 addrspace(3)* null +; CHECK: store i32 7, i32* select (i1 icmp eq (i32 ptrtoint (i32 addrspace(3)* @lds1 to i32), i32 4), i32* addrspacecast (i32 addrspace(3)* null to i32*), i32* undef), align 4 define amdgpu_kernel void @store_select_group_global_mismatch_undef_undef_constexpr() #0 { store i32 7, i32* select (i1 icmp eq (i32 ptrtoint (i32 addrspace(3)* @lds1 to i32), i32 4), i32* addrspacecast (i32 addrspace(3)* null to i32*), i32* addrspacecast (i32 addrspace(1)* undef to i32*)), align 4 ret void diff --git a/llvm/test/Transforms/InstSimplify/select.ll b/llvm/test/Transforms/InstSimplify/select.ll index 753d8fa64bdb..353f2e6a6753 100644 --- a/llvm/test/Transforms/InstSimplify/select.ll +++ b/llvm/test/Transforms/InstSimplify/select.ll @@ -858,3 +858,73 @@ define <2 x i32> @false_undef_true_constextpr_vec(i1 %cond) { %s = select i1 %cond, <2 x i32> , <2 x i32> ret <2 x i32> %s } + +define i32 @all_constant_true_undef() { +; CHECK-LABEL: @all_constant_true_undef( +; CHECK-NEXT: ret i32 1 +; + %s = select i1 ptrtoint (i32 ()* @all_constant_true_undef to i1), i32 undef, i32 1 + ret i32 %s +} + +define float @all_constant_false_undef() { +; CHECK-LABEL: @all_constant_false_undef( +; CHECK-NEXT: ret float 1.000000e+00 +; + %s = select i1 
ptrtoint (float ()* @all_constant_false_undef to i1), float undef, float 1.0 + ret float %s +} + +define <2 x i32> @all_constant_true_undef_vec() { +; CHECK-LABEL: @all_constant_true_undef_vec( +; CHECK-NEXT: ret <2 x i32> +; + %s = select i1 ptrtoint (<2 x i32> ()* @all_constant_true_undef_vec to i1), <2 x i32> undef, <2 x i32> + ret <2 x i32> %s +} + +define <2 x float> @all_constant_false_undef_vec() { +; CHECK-LABEL: @all_constant_false_undef_vec( +; CHECK-NEXT: ret <2 x float> +; + %s = select i1 ptrtoint (<2 x float> ()* @all_constant_false_undef_vec to i1), <2 x float> undef, <2 x float> + ret <2 x float> %s +} + +; Negative tests. Don't fold if the non-undef operand is a constexpr. +define i32 @all_constant_false_undef_true_constexpr() { +; CHECK-LABEL: @all_constant_false_undef_true_constexpr( +; CHECK-NEXT: [[S:%.*]] = select i1 ptrtoint (i32 ()* @all_constant_false_undef_true_constexpr to i1), i32 ptrtoint (i32 ()* @all_constant_false_undef_true_constexpr to i32), i32 undef +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 ptrtoint (i32 ()* @all_constant_false_undef_true_constexpr to i1), i32 ptrtoint (i32 ()* @all_constant_false_undef_true_constexpr to i32), i32 undef + ret i32 %s +} + +define i32 @all_constant_true_undef_false_constexpr() { +; CHECK-LABEL: @all_constant_true_undef_false_constexpr( +; CHECK-NEXT: [[S:%.*]] = select i1 ptrtoint (i32 ()* @all_constant_true_undef_false_constexpr to i1), i32 undef, i32 ptrtoint (i32 ()* @all_constant_true_undef_false_constexpr to i32) +; CHECK-NEXT: ret i32 [[S]] +; + %s = select i1 ptrtoint (i32 ()* @all_constant_true_undef_false_constexpr to i1), i32 undef, i32 ptrtoint (i32 ()* @all_constant_true_undef_false_constexpr to i32) + ret i32 %s +} + +; Negative tests. Don't fold if the non-undef operand is a vector containing a constexpr. 
+define <2 x i32> @all_constant_false_undef_true_constexpr_vec() { +; CHECK-LABEL: @all_constant_false_undef_true_constexpr_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 ptrtoint (<2 x i32> ()* @all_constant_false_undef_true_constexpr_vec to i1), <2 x i32> ()* @all_constant_false_undef_true_constexpr_vec to i32), i32 -1>, <2 x i32> undef +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 ptrtoint (<2 x i32> ()* @all_constant_false_undef_true_constexpr_vec to i1), <2 x i32> ()* @all_constant_false_undef_true_constexpr_vec to i32), i32 -1>, <2 x i32> undef + ret <2 x i32> %s +} + +define <2 x i32> @all_constant_true_undef_false_constexpr_vec() { +; CHECK-LABEL: @all_constant_true_undef_false_constexpr_vec( +; CHECK-NEXT: [[S:%.*]] = select i1 ptrtoint (<2 x i32> ()* @all_constant_true_undef_false_constexpr_vec to i1), <2 x i32> undef, <2 x i32> ()* @all_constant_true_undef_false_constexpr_vec to i32)> +; CHECK-NEXT: ret <2 x i32> [[S]] +; + %s = select i1 ptrtoint (<2 x i32> ()* @all_constant_true_undef_false_constexpr_vec to i1), <2 x i32> undef, <2 x i32> ()* @all_constant_true_undef_false_constexpr_vec to i32)> + ret <2 x i32> %s +} From llvm-commits at lists.llvm.org Fri Jul 10 10:42:53 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:42:53 +0000 (UTC) Subject: [PATCH] D83535: [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG1cf6f210a2ed: [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction… (authored by craig.topper). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83535/new/ https://reviews.llvm.org/D83535 Files: llvm/lib/IR/ConstantFold.cpp llvm/test/Transforms/InferAddressSpaces/AMDGPU/select.ll llvm/test/Transforms/InstSimplify/select.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83535.277098.patch Type: text/x-patch Size: 6217 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:45:40 2020 From: llvm-commits at lists.llvm.org (Jordan Rupprecht via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:45:40 +0000 (UTC) Subject: [PATCH] D83530: WIP: [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable In-Reply-To: References: Message-ID: <3b58b02898687df3e7cb4a99893b0224@localhost.localdomain> rupprecht added a reviewer: rupprecht. rupprecht added a comment. `-f` is a GNU option, so we should keep that for compatibility. Is there a reason it's being removed? ... actually, the patch says `-f` is dropped, but from the tablegen, it looks like it's still there? LGTM to the general idea of replacing cl::opt with tablegen options for simple command line tools. Will take a look when the grouped option issue is addressed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83530/new/ https://reviews.llvm.org/D83530 From llvm-commits at lists.llvm.org Fri Jul 10 10:47:27 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:47:27 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: tmsriram added a comment. Ping. Is this alright now?
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Fri Jul 10 10:48:33 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:48:33 +0000 (UTC) Subject: [PATCH] D83575: [NewPM] Allow passes to never be skipped In-Reply-To: References: Message-ID: <1c0f420784501afb0912ebdedc7311d7@localhost.localdomain> aeubanks updated this revision to Diff 277101. aeubanks added a comment. Actually use isRequired() in PassInstrumentation::runBeforePass() Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83575/new/ https://reviews.llvm.org/D83575 Files: llvm/include/llvm/Analysis/CGSCCPassManager.h llvm/include/llvm/IR/PassInstrumentation.h llvm/include/llvm/IR/PassManager.h llvm/include/llvm/IR/PassManagerInternal.h llvm/include/llvm/Transforms/Scalar/LoopPassManager.h polly/include/polly/ScopPass.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83575.277101.patch Type: text/x-patch Size: 5302 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:49:21 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:49:21 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <01022c7848b1b5c9c174789f92e19f52@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/test/ELF/avr-reloc.s:4 +# RUN: ld.lld %t.o --defsym=a=0x12345678 --defsym=b=30 -o %t +# RUN: llvm-objdump -d %t | FileCheck %s +# RUN: llvm-objdump -s %t | FileCheck --check-prefix=HEX %s ---------------- You may want `--print-imm-hex` to print immediates in hexadecimal. 
================ Comment at: lld/test/ELF/avr-reloc.s:17 +# CHECK-NEXT: ldi r20, 255 +ldi r20, lo8(a) +ldi r20, hi8(a) ---------------- Might be worth adding comments about the exact relocation type used, e.g. `ldi r20, lo8(a) # R_AVR_LO8_LDI...` ================ Comment at: lld/test/ELF/avr-reloc.s:80 +# HEX-LABEL: section .DATA: +# HEX-NEXT: 1e1e000f 00785634 12 +.byte b ---------------- `{{.*}} 1e1e000f 00785634 12` The address of .DATA is insignificant and should be omitted. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Fri Jul 10 10:52:14 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:52:14 +0000 (UTC) Subject: [PATCH] D83013: [LPM] Port CGProfilePass from NPM to LPM In-Reply-To: References: Message-ID: MaskRay added a comment. In D83013#2143470 , @hans wrote: > Still lgtm. For what it's worth, I think you could have just re-committed with the fixes rather than uploading for review again. This may be a difference of habits but I usually upload the last revision if it contains anything more than comment changes. The reviewed version might be read by posterity to get a quick overview about the patch. Browsing git log -p is not very convenient at times. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83013/new/ https://reviews.llvm.org/D83013 From llvm-commits at lists.llvm.org Fri Jul 10 10:54:27 2020 From: llvm-commits at lists.llvm.org (Kerry McLaughlin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:54:27 +0000 (UTC) Subject: [PATCH] D83577: [SVE][Codegen] Add a helper function for pointer increment logic Message-ID: kmclaughlin created this revision. kmclaughlin added reviewers: efriedma, sdesmalen, david-arm. Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett. Herald added a project: LLVM. 
Helper used when splitting load & store operations to calculate the pointer + offset for the high half of the split Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83577 Files: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Index: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -976,6 +976,25 @@ SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, + MachinePointerInfo &MPI, + SDValue &Ptr) { + SDLoc DL(N); + unsigned IncrementSize = MemVT.getSizeInBits().getKnownMinSize() / 8; + + if (MemVT.isScalableVector()) { + SDValue BytesIncrement = DAG.getVScale( + DL, Ptr.getValueType(), + APInt(Ptr.getValueSizeInBits().getFixedSize(), IncrementSize)); + MPI = MachinePointerInfo(N->getPointerInfo().getAddrSpace()); + Ptr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr, BytesIncrement); + } else { + MPI = N->getPointerInfo().getWithOffset(IncrementSize); + // Increment the pointer to the other half. 
+ Ptr = DAG.getObjectPtrOffset(DL, Ptr, IncrementSize); + } +} + void DAGTypeLegalizer::SplitVecRes_BinOp(SDNode *N, SDValue &Lo, SDValue &Hi) { SDValue LHSLo, LHSHi; @@ -1537,19 +1556,8 @@ LD->getPointerInfo(), LoMemVT, LD->getOriginalAlign(), MMOFlags, AAInfo); - unsigned IncrementSize = LoMemVT.getSizeInBits().getKnownMinSize() / 8; - MachinePointerInfo MPI; - if (LoVT.isScalableVector()) { - SDValue BytesIncrement = DAG.getVScale( - dl, Ptr.getValueType(), - APInt(Ptr.getValueSizeInBits().getFixedSize(), IncrementSize)); - MPI = MachinePointerInfo(LD->getPointerInfo().getAddrSpace()); - Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr, BytesIncrement); - } else { - MPI = LD->getPointerInfo().getWithOffset(IncrementSize); - Ptr = DAG.getObjectPtrOffset(dl, Ptr, IncrementSize); - } + IncrementPointer(LD, LoMemVT, MPI, Ptr); Hi = DAG.getLoad(ISD::UNINDEXED, ExtType, HiVT, dl, Ch, Ptr, Offset, MPI, HiMemVT, LD->getOriginalAlign(), MMOFlags, AAInfo); @@ -2489,8 +2497,6 @@ if (!LoMemVT.isByteSized() || !HiMemVT.isByteSized()) return TLI.scalarizeVectorStore(N, DAG); - unsigned IncrementSize = LoMemVT.getSizeInBits().getKnownMinSize() / 8; - if (isTruncating) Lo = DAG.getTruncStore(Ch, DL, Lo, Ptr, N->getPointerInfo(), LoMemVT, Alignment, MMOFlags, AAInfo); @@ -2499,17 +2505,7 @@ AAInfo); MachinePointerInfo MPI; - if (LoMemVT.isScalableVector()) { - SDValue BytesIncrement = DAG.getVScale( - DL, Ptr.getValueType(), - APInt(Ptr.getValueSizeInBits().getFixedSize(), IncrementSize)); - MPI = MachinePointerInfo(N->getPointerInfo().getAddrSpace()); - Ptr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr, BytesIncrement); - } else { - MPI = N->getPointerInfo().getWithOffset(IncrementSize); - // Increment the pointer to the other half. 
- Ptr = DAG.getObjectPtrOffset(DL, Ptr, IncrementSize); - } + IncrementPointer(N, LoMemVT, MPI, Ptr); if (isTruncating) Hi = DAG.getTruncStore(Ch, DL, Hi, Ptr, MPI, Index: llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h =================================================================== --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -776,6 +776,11 @@ void GetSplitVector(SDValue Op, SDValue &Lo, SDValue &Hi); void SetSplitVector(SDValue Op, SDValue Lo, SDValue Hi); + // Helper function for incrementing the pointer when splitting + // memory operations + void IncrementPointer(MemSDNode *N, EVT MemVT, + MachinePointerInfo &MPI, SDValue &Ptr); + // Vector Result Splitting: <128 x ty> -> 2 x <64 x ty>. void SplitVectorResult(SDNode *N, unsigned ResNo); void SplitVecRes_BinOp(SDNode *N, SDValue &Lo, SDValue &Hi); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83577.277102.patch Type: text/x-patch Size: 4007 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 10:59:08 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 17:59:08 +0000 (UTC) Subject: [PATCH] D83570: [Matrix] Lowering pass should also run at O0 In-Reply-To: References: Message-ID: <05c36d24d3db74398910a303a7d4f611@localhost.localdomain> fhahn added a comment. There is already a patch to run a simple version of the lowering as part of the target pipelines D76858 . I didn't land it yet, as I first wanted to come up with a lightweight system to figure out if the lowering pass actually needs to run on a function beforehand, to keep compile times low, if no matrix intrinsics are present. But I did not have time to wrap this up yet. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83570/new/ https://reviews.llvm.org/D83570 From llvm-commits at lists.llvm.org Fri Jul 10 10:59:43 2020 From: llvm-commits at lists.llvm.org (Alexandre Ganea via llvm-commits) Date: Fri, 10 Jul 2020 10:59:43 -0700 (PDT) Subject: [lld] add59ec - Re-land [CodeView] Add full repro to LF_BUILDINFO record Message-ID: <5f08ac8f.1c69fb81.392c6.038d@mx.google.com> Author: Alexandre Ganea Date: 2020-07-10T13:59:28-04:00 New Revision: add59ecb34e3003311b7e2318b16a0ef10c76d79 URL: https://github.com/llvm/llvm-project/commit/add59ecb34e3003311b7e2318b16a0ef10c76d79 DIFF: https://github.com/llvm/llvm-project/commit/add59ecb34e3003311b7e2318b16a0ef10c76d79.diff LOG: Re-land [CodeView] Add full repro to LF_BUILDINFO record This patch adds some missing information to the LF_BUILDINFO record, which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable). Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (it must not depend on any environment variable). In the same way, MSVC doesn't store the provided command line, but an expanded version (roughly their equivalent of CC1) which is also freestanding. For more information see PR36198 and D43002.
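As an illustration of the record contents described in the log above, here is a minimal sketch of how a tool might use the five LF_BUILDINFO arguments to replay a compilation. The `BuildInfoSketch` struct and the `rebuildCommand` helper are hypothetical names invented for this example and are not part of LLVM; the field order follows the order checked in the tests of this patch (current directory, build tool, source file, type server PDB, command line). Whether the stored command line already names the TU is an assumption here, not something the log states.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the five LF_BUILDINFO arguments; not an LLVM type.
struct BuildInfoSketch {
  std::string currentDirectory; // PWD, i.e. the CWD at compiler startup
  std::string buildTool;        // full path to the compiler executable
  std::string sourceFile;       // relative or absolute path to the TU
  std::string typeServerPDB;    // empty in the tests of this patch
  std::string commandLine;      // freestanding CC1 command line
};

// Hypothetical helper: build a shell-style command that replays the compile.
// Assumption: the stored command line does not already name the TU, so the
// source file is appended at the end.
std::string rebuildCommand(const BuildInfoSketch &bi) {
  return "cd \"" + bi.currentDirectory + "\" && \"" + bi.buildTool + "\" " +
         bi.commandLine + " \"" + bi.sourceFile + "\"";
}
```

Because the command line is freestanding, the sketch needs nothing beyond the record itself and the compiler executable, which is the property the log is after.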
Differential Revision: https://reviews.llvm.org/D80833 Added: clang/test/CodeGen/debug-info-codeview-buildinfo.c lld/test/COFF/pdb-relative-source-lines2.test Modified: lld/COFF/PDB.cpp lld/test/COFF/Inputs/pdb_lines_1_relative.yaml lld/test/COFF/Inputs/pdb_lines_2_relative.yaml lld/test/COFF/pdb-relative-source-lines.test llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp llvm/test/DebugInfo/COFF/build-info.ll llvm/test/DebugInfo/COFF/global-type-hashes.ll llvm/test/DebugInfo/COFF/types-basic.ll llvm/test/DebugInfo/COFF/types-data-members.ll Removed: ################################################################################ diff --git a/clang/test/CodeGen/debug-info-codeview-buildinfo.c b/clang/test/CodeGen/debug-info-codeview-buildinfo.c new file mode 100644 index 000000000000..3434f5f86579 --- /dev/null +++ b/clang/test/CodeGen/debug-info-codeview-buildinfo.c @@ -0,0 +1,26 @@ +// UNSUPPORTED: s390x +// RUN: %clang_cl /c /Z7 /Fo%t.obj -- %s +// RUN: llvm-pdbutil dump --types %t.obj | FileCheck %s +// RUN: %clang_cl /c /Z7 /Fo%t.obj -fdebug-compilation-dir . 
-- %s +// RUN: llvm-pdbutil dump --types %t.obj | FileCheck %s --check-prefix RELATIVE + +int main() { return 42; } + +// CHECK: Types (.debug$T) +// CHECK: ============================================================ +// CHECK: 0x[[PWD:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: [[PWDVAL:.+]] +// CHECK: 0x[[FILEPATH:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: [[FILEPATHVAL:.+[\\/]debug-info-codeview-buildinfo.c]] +// CHECK: 0x[[ZIPDB:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: +// CHECK: 0x[[TOOL:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: [[TOOLVAL:.+[\\/]clang.*]] +// CHECK: 0x[[CMDLINE:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: "-cc1 +// CHECK: 0x{{.+}} | LF_BUILDINFO [size = {{.+}}] +// CHECK: 0x[[PWD]]: `[[PWDVAL]]` +// CHECK: 0x[[TOOL]]: `[[TOOLVAL]]` +// CHECK: 0x[[FILEPATH]]: `[[FILEPATHVAL]]` +// CHECK: 0x[[ZIPDB]]: `` +// CHECK: 0x[[CMDLINE]]: `"-cc1 + +// RELATIVE: Types (.debug$T) +// RELATIVE: ============================================================ +// RELATIVE: 0x{{.+}} | LF_BUILDINFO [size = {{.+}}] +// RELATIVE: 0x{{.+}}: `.` diff --git a/lld/COFF/PDB.cpp b/lld/COFF/PDB.cpp index 49d04add5be0..5738eae7d6c4 100644 --- a/lld/COFF/PDB.cpp +++ b/lld/COFF/PDB.cpp @@ -250,6 +250,72 @@ static void addTypeInfo(pdb::TpiStreamBuilder &tpiBuilder, }); } +// LF_BUILDINFO records might contain relative paths, and we want to make them +// absolute. We do this remapping only after the type records were merged, +// because the full types graph isn't known during merging. In addition, we plan +// to multi-thread the type merging, and the change below needs to be done +// atomically, single-threaded. + +// A complication could arise when a LF_STRING_ID record already exists with the +// same content as the new absolutized path. In that case, we simply redirect +// LF_BUILDINFO's CurrentDirectory index to reference the existing LF_STRING_ID +// record. 
+ +static void remapBuildInfo(TypeCollection &idTable) { + SimpleTypeSerializer s; + idTable.ForEachRecord([&](TypeIndex ti, const CVType &type) { + if (type.kind() != LF_BUILDINFO) + return; + BuildInfoRecord bi; + cantFail(TypeDeserializer::deserializeAs(const_cast(type), bi)); + + auto makeAbsoluteRecord = + [&](BuildInfoRecord::BuildInfoArg recordType) -> Optional { + TypeIndex recordTi = bi.getArgs()[recordType]; + if (recordTi.isNoneType()) + return None; + CVType recordRef = idTable.getType(recordTi); + + StringIdRecord record; + cantFail(TypeDeserializer::deserializeAs(recordRef, record)); + + SmallString<128> abolutizedPath(record.getString()); + pdbMakeAbsolute(abolutizedPath); + + if (abolutizedPath == record.getString()) + return None; // The path is already absolute. + + record.String = abolutizedPath; + ArrayRef recordData = s.serialize(record); + + // Replace the previous LF_STRING_ID record + if (!idTable.replaceType(recordTi, CVType(recordData), + /*Stabilize=*/true)) + return recordTi; + return None; + }; + + Optional curDirTI = + makeAbsoluteRecord(BuildInfoRecord::CurrentDirectory); + Optional buildToolTI = + makeAbsoluteRecord(BuildInfoRecord::BuildTool); + + if (curDirTI || buildToolTI) { + // This new record is already there. We don't want duplicates, so + // re-serialize the BuildInfoRecord instead. + if (curDirTI) + bi.ArgIndices[BuildInfoRecord::CurrentDirectory] = *curDirTI; + if (buildToolTI) + bi.ArgIndices[BuildInfoRecord::BuildTool] = *buildToolTI; + + ArrayRef biData = s.serialize(bi); + bool r = idTable.replaceType(ti, CVType(biData), /*Stabilize=*/true); + assert(r && "Didn't expect two build records pointing to the same OBJ!"); + (void)r; + } + }); +} + static bool remapTypeIndex(TypeIndex &ti, ArrayRef typeIndexMap) { if (ti.isSimple()) return true; @@ -988,6 +1054,9 @@ void PDBLinker::addObjectsToPDB() { builder.getStringTableBuilder().setStrings(pdbStrTab); t1.stop(); + // Remap the contents of the LF_BUILDINFO record. 
+ remapBuildInfo(tMerger.getIDTable()); + // Construct TPI and IPI stream contents. ScopedTimer t2(tpiStreamLayoutTimer); addTypeInfo(builder.getTpiBuilder(), tMerger.getTypeTable()); diff --git a/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml b/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml index 947de419d6b8..9a6b192e1d0d 100644 --- a/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml +++ b/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml @@ -19,6 +19,7 @@ sections: Characteristics: [ IMAGE_SCN_CNT_UNINITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ] Alignment: 4 SectionData: '' + SizeOfRawData: 0 - Name: .xdata Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ ] Alignment: 4 @@ -38,7 +39,6 @@ sections: - Name: '.debug$S' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 - SectionData: 04000000F10000002F0000002D003C1100000000D0000700000000000000581B000000000000636C616E672076657273696F6E20372E302E30200000F1000000300000002A0047110000000000000000000000001B000000000000000000000002100000000000000000006D61696E0002004F11F20000003000000000000000000000001B00000000000000030000002400000000000000020000000C000000030000001100000004000000F400000030000000010000001001EA6429BCE282CCF3F0E3CD93B216EB410000110000001001061EB73ABB642532857A4F1D9CBAC3230000F30000001C000000002E5C7064625F6C696E65735F312E63002E5C666F6F2E6800000000 Subsections: - !Symbols Records: @@ -46,15 +46,15 @@ sections: Compile3Sym: Flags: [ ] Machine: X64 - FrontendMajor: 7 + FrontendMajor: 11 FrontendMinor: 0 FrontendBuild: 0 FrontendQFE: 0 - BackendMajor: 7000 + BackendMajor: 11000 BackendMinor: 0 BackendBuild: 0 BackendQFE: 0 - Version: 'clang version 7.0.0 ' + Version: 'clang version 11.0.0 (https://github.com/llvm/llvm-project.git 77dad72eae974338ddc13d74783c012ccbb8c5ac)' - !Symbols Records: - Kind: S_GPROC32_ID @@ -65,8 +65,17 @@ sections: FunctionType: 4098 Flags: [ ] DisplayName: main + - Kind: S_FRAMEPROC + FrameProcSym: + 
TotalFrameBytes: 40 + PaddingFrameBytes: 0 + OffsetToPadding: 0 + BytesOfCalleeSavedRegisters: 0 + OffsetOfExceptionHandler: 0 + SectionIdOfExceptionHandler: 0 + Flags: [ ] - Kind: S_PROC_ID_END - ScopeEndSym: + ScopeEndSym: {} - !Lines CodeSize: 27 Flags: [ ] @@ -87,15 +96,15 @@ sections: LineStart: 4 IsStatement: false EndDelta: 0 - Columns: + Columns: [] - !FileChecksums Checksums: - FileName: '.\pdb_lines_1.c' Kind: MD5 - Checksum: EA6429BCE282CCF3F0E3CD93B216EB41 + Checksum: 9A64DD4298487888B1D99F825D520C5E - FileName: '.\foo.h' Kind: MD5 - Checksum: 061EB73ABB642532857A4F1D9CBAC323 + Checksum: A9D05E6DC184DE20A57797E24F8B0E97 - !StringTable Strings: - '.\pdb_lines_1.c' @@ -103,23 +112,27 @@ sections: - '' - '' - '' + - !Symbols + Records: + - Kind: S_BUILDINFO + BuildInfoSym: + BuildId: 4105 Relocations: - - VirtualAddress: 100 + - VirtualAddress: 184 SymbolName: main Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 104 + - VirtualAddress: 188 SymbolName: main Type: IMAGE_REL_AMD64_SECTION - - VirtualAddress: 124 + - VirtualAddress: 240 SymbolName: main Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 128 + - VirtualAddress: 244 SymbolName: main Type: IMAGE_REL_AMD64_SECTION - Name: '.debug$T' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 - SectionData: 0400000006000112000000000E0008107400000000000000001000001200011600000000011000006D61696E00F3F2F10E0008100300000000000000001000000E0001160000000003100000666F6F00 Types: - Kind: LF_ARGLIST ArgList: @@ -148,6 +161,25 @@ sections: ParentScope: 0 FunctionType: 4099 Name: foo + - Kind: LF_STRING_ID + StringId: + Id: 0 + String: . 
+ - Kind: LF_STRING_ID + StringId: + Id: 0 + String: pdb_lines_1.c + - Kind: LF_STRING_ID + StringId: + Id: 0 + String: 'buildninjaRel\bin\clang-cl.exe' + - Kind: LF_STRING_ID + StringId: + Id: 0 + String: '"-cc1" "-triple" "x86_64-pc-windows-msvc19.26.28806" "-emit-obj" "-mrelax-all" "-mincremental-linker-compatible" "-disable-free" "-main-file-name" "pdb_lines_1.c" "-mrelocation-model" "pic" "-pic-level" "2" "-mthread-model" "posix" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gcodeview" "-debug-info-kind=limited" "-resource-dir" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0" "-internal-isystem" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\ATLMFC\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\include" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\ucrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\shared" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\winrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\cppwinrt" "-fdebug-compilation-dir" "." 
"-ferror-limit" "19" "-fmessage-length=146" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.26.28806" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-faddrsig" "-x" "c"' + - Kind: LF_BUILDINFO + BuildInfo: + ArgIndices: [ 4101, 4103, 4102, 0, 4104 ] - Name: .pdata Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ ] Alignment: 4 @@ -160,8 +192,12 @@ sections: SymbolName: main Type: IMAGE_REL_AMD64_ADDR32NB - VirtualAddress: 8 - SymbolName: .xdata + SymbolTableIndex: 6 Type: IMAGE_REL_AMD64_ADDR32NB + - Name: .llvm_addrsig + Characteristics: [ IMAGE_SCN_LNK_REMOVE ] + Alignment: 1 + SectionData: 0A1D - Name: .xdata Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_LNK_COMDAT, IMAGE_SCN_MEM_READ ] Alignment: 4 @@ -169,7 +205,6 @@ sections: - Name: '.debug$S' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_LNK_COMDAT, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 - SectionData: 04000000F10000002F000000290047110000000000000000000000000F00000000000000000000000410000000000000000000666F6F0002004F1100F20000003000000000000000000000000F000000180000000300000024000000000000000200000004000000030000000900000004000000 Subsections: - !Symbols Records: @@ -181,8 +216,17 @@ sections: FunctionType: 4100 Flags: [ ] DisplayName: foo + - Kind: S_FRAMEPROC + FrameProcSym: + TotalFrameBytes: 40 + PaddingFrameBytes: 0 + OffsetToPadding: 0 + BytesOfCalleeSavedRegisters: 0 + OffsetOfExceptionHandler: 0 + SectionIdOfExceptionHandler: 0 + Flags: [ ] - Kind: S_PROC_ID_END - ScopeEndSym: + ScopeEndSym: {} - !Lines CodeSize: 15 Flags: [ ] @@ -203,7 +247,7 @@ sections: LineStart: 4 IsStatement: false EndDelta: 0 - Columns: + Columns: [] Relocations: - VirtualAddress: 44 SymbolName: foo @@ -211,10 +255,10 @@ sections: - VirtualAddress: 48 SymbolName: foo Type: IMAGE_REL_AMD64_SECTION - - VirtualAddress: 68 + - VirtualAddress: 100 SymbolName: foo Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 
72 + - VirtualAddress: 104 SymbolName: foo Type: IMAGE_REL_AMD64_SECTION - Name: .pdata @@ -229,7 +273,7 @@ sections: SymbolName: foo Type: IMAGE_REL_AMD64_ADDR32NB - VirtualAddress: 8 - SymbolName: .xdata + SymbolTableIndex: 11 Type: IMAGE_REL_AMD64_ADDR32NB symbols: - Name: .text @@ -301,7 +345,7 @@ symbols: StorageClass: IMAGE_SYM_CLASS_EXTERNAL - Name: .xdata Value: 0 - SectionNumber: 10 + SectionNumber: 11 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC @@ -331,22 +375,22 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 264 + Length: 396 NumberOfRelocations: 4 NumberOfLinenumbers: 0 - CheckSum: 2204933783 + CheckSum: 3390249978 Number: 7 - Name: '.debug$S' Value: 0 - SectionNumber: 11 + SectionNumber: 12 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 116 + Length: 148 NumberOfRelocations: 4 NumberOfLinenumbers: 0 - CheckSum: 2691661839 + CheckSum: 1236081121 Number: 5 Selection: IMAGE_COMDAT_SELECT_ASSOCIATIVE - Name: '.debug$T' @@ -356,10 +400,10 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 80 + Length: 2028 NumberOfRelocations: 0 NumberOfLinenumbers: 0 - CheckSum: 3541780432 + CheckSum: 2043733667 Number: 8 - Name: .pdata Value: 0 @@ -375,7 +419,7 @@ symbols: Number: 9 - Name: .pdata Value: 0 - SectionNumber: 12 + SectionNumber: 13 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC @@ -386,6 +430,24 @@ symbols: CheckSum: 3642757804 Number: 5 Selection: IMAGE_COMDAT_SELECT_ASSOCIATIVE + - Name: .llvm_addrsig + Value: 0 + SectionNumber: 10 + SimpleType: IMAGE_SYM_TYPE_NULL + ComplexType: IMAGE_SYM_DTYPE_NULL + StorageClass: IMAGE_SYM_CLASS_STATIC + SectionDefinition: + Length: 2 + NumberOfRelocations: 0 + NumberOfLinenumbers: 0 + CheckSum: 
2582217811 + Number: 10 + - Name: '@feat.00' + Value: 0 + SectionNumber: -1 + SimpleType: IMAGE_SYM_TYPE_NULL + ComplexType: IMAGE_SYM_DTYPE_NULL + StorageClass: IMAGE_SYM_CLASS_STATIC - Name: main Value: 0 SectionNumber: 1 @@ -398,4 +460,11 @@ symbols: SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_EXTERNAL + - Name: .file + Value: 0 + SectionNumber: -2 + SimpleType: IMAGE_SYM_TYPE_NULL + ComplexType: IMAGE_SYM_DTYPE_NULL + StorageClass: IMAGE_SYM_CLASS_FILE + File: pdb_lines_1.c ... diff --git a/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml b/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml index 1b051d82d9a4..71ce5d63f508 100644 --- a/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml +++ b/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml @@ -15,6 +15,7 @@ sections: Characteristics: [ IMAGE_SCN_CNT_UNINITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ] Alignment: 4 SectionData: '' + SizeOfRawData: 0 - Name: .drectve Characteristics: [ IMAGE_SCN_LNK_INFO, IMAGE_SCN_LNK_REMOVE ] Alignment: 1 @@ -22,7 +23,6 @@ sections: - Name: '.debug$S' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 - SectionData: 04000000F10000002F0000002D003C1100000000D0000700000000000000581B000000000000636C616E672076657273696F6E20372E302E30200000F10000002F0000002900471100000000000000000000000001000000000000000000000002100000000000000000006261720002004F1100F2000000200000000000000000000000010000000000000001000000140000000000000002000000F400000018000000010000001001DF91CB3A2B8D917486574BB50CAC4CC70000F300000014000000002E5C7064625F6C696E65735F322E6300000000 Subsections: - !Symbols Records: @@ -30,15 +30,15 @@ sections: Compile3Sym: Flags: [ ] Machine: X64 - FrontendMajor: 7 + FrontendMajor: 11 FrontendMinor: 0 FrontendBuild: 0 FrontendQFE: 0 - BackendMajor: 7000 + BackendMajor: 11000 BackendMinor: 0 BackendBuild: 0 BackendQFE: 0 - Version: 'clang version 7.0.0 ' + Version: 'clang 
version 11.0.0 (https://github.com/llvm/llvm-project.git 77dad72eae974338ddc13d74783c012ccbb8c5ac)' - !Symbols Records: - Kind: S_GPROC32_ID @@ -49,8 +49,17 @@ sections: FunctionType: 4098 Flags: [ ] DisplayName: bar + - Kind: S_FRAMEPROC + FrameProcSym: + TotalFrameBytes: 0 + PaddingFrameBytes: 0 + OffsetToPadding: 0 + BytesOfCalleeSavedRegisters: 0 + OffsetOfExceptionHandler: 0 + SectionIdOfExceptionHandler: 0 + Flags: [ ] - Kind: S_PROC_ID_END - ScopeEndSym: + ScopeEndSym: {} - !Lines CodeSize: 1 Flags: [ ] @@ -63,35 +72,39 @@ sections: LineStart: 2 IsStatement: false EndDelta: 0 - Columns: + Columns: [] - !FileChecksums Checksums: - FileName: '.\pdb_lines_2.c' Kind: MD5 - Checksum: DF91CB3A2B8D917486574BB50CAC4CC7 + Checksum: 4CC58B73BFD5AB52F87CFB3C604BB288 - !StringTable Strings: - '.\pdb_lines_2.c' - '' - '' - '' + - !Symbols + Records: + - Kind: S_BUILDINFO + BuildInfoSym: + BuildId: 4103 Relocations: - - VirtualAddress: 100 + - VirtualAddress: 184 SymbolName: bar Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 104 + - VirtualAddress: 188 SymbolName: bar Type: IMAGE_REL_AMD64_SECTION - - VirtualAddress: 124 + - VirtualAddress: 240 SymbolName: bar Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 128 + - VirtualAddress: 244 SymbolName: bar Type: IMAGE_REL_AMD64_SECTION - Name: '.debug$T' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 - SectionData: 0400000006000112000000000E0008100300000000000000001000000E000116000000000110000062617200 Types: - Kind: LF_ARGLIST ArgList: @@ -108,6 +121,29 @@ sections: ParentScope: 0 FunctionType: 4097 Name: bar + - Kind: LF_STRING_ID + StringId: + Id: 0 + String: . 
+ - Kind: LF_STRING_ID + StringId: + Id: 0 + String: pdb_lines_2.c + - Kind: LF_STRING_ID + StringId: + Id: 0 + String: 'buildninjaRel\bin\clang-cl.exe' + - Kind: LF_STRING_ID + StringId: + Id: 0 + String: '"-cc1" "-triple" "x86_64-pc-windows-msvc19.26.28806" "-emit-obj" "-mrelax-all" "-mincremental-linker-compatible" "-disable-free" "-main-file-name" "pdb_lines_2.c" "-mrelocation-model" "pic" "-pic-level" "2" "-mthread-model" "posix" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gcodeview" "-debug-info-kind=limited" "-resource-dir" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0" "-internal-isystem" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\ATLMFC\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\include" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\ucrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\shared" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\winrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\cppwinrt" "-fdebug-compilation-dir" "." 
"-ferror-limit" "19" "-fmessage-length=146" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.26.28806" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-faddrsig" "-x" "c"' + - Kind: LF_BUILDINFO + BuildInfo: + ArgIndices: [ 4099, 4101, 4100, 0, 4102 ] + - Name: .llvm_addrsig + Characteristics: [ IMAGE_SCN_LNK_REMOVE ] + Alignment: 1 + SectionData: '' symbols: - Name: .text Value: 0 @@ -164,10 +200,10 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 216 + Length: 348 NumberOfRelocations: 4 NumberOfLinenumbers: 0 - CheckSum: 2383431754 + CheckSum: 2408981505 Number: 5 - Name: '.debug$T' Value: 0 @@ -176,15 +212,40 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 44 + Length: 1992 NumberOfRelocations: 0 NumberOfLinenumbers: 0 - CheckSum: 179171995 + CheckSum: 1158086003 Number: 6 + - Name: .llvm_addrsig + Value: 0 + SectionNumber: 7 + SimpleType: IMAGE_SYM_TYPE_NULL + ComplexType: IMAGE_SYM_DTYPE_NULL + StorageClass: IMAGE_SYM_CLASS_STATIC + SectionDefinition: + Length: 0 + NumberOfRelocations: 0 + NumberOfLinenumbers: 0 + CheckSum: 0 + Number: 7 + - Name: '@feat.00' + Value: 0 + SectionNumber: -1 + SimpleType: IMAGE_SYM_TYPE_NULL + ComplexType: IMAGE_SYM_DTYPE_NULL + StorageClass: IMAGE_SYM_CLASS_STATIC - Name: bar Value: 0 SectionNumber: 1 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_FUNCTION StorageClass: IMAGE_SYM_CLASS_EXTERNAL + - Name: .file + Value: 0 + SectionNumber: -2 + SimpleType: IMAGE_SYM_TYPE_NULL + ComplexType: IMAGE_SYM_DTYPE_NULL + StorageClass: IMAGE_SYM_CLASS_FILE + File: pdb_lines_2.c ... 
diff --git a/lld/test/COFF/pdb-relative-source-lines.test b/lld/test/COFF/pdb-relative-source-lines.test index 547056785962..632aa48cb6cd 100644 --- a/lld/test/COFF/pdb-relative-source-lines.test +++ b/lld/test/COFF/pdb-relative-source-lines.test @@ -15,7 +15,9 @@ int main(void) { void bar(void) { } -$ clang-cl -Xclang -fdebug-compilation-dir -Xclang . -c -Z7 pdb_lines*.c +$ clang-cl -fdebug-compilation-dir . -no-canonical-prefixes -c -Z7 pdb_lines*.c +$ obj2yaml pdb_lines_1.obj > pdb_lines_1_relative.yaml +$ obj2yaml pdb_lines_2.obj > pdb_lines_2_relative.yaml /pdbsourcepath: only sets the directory that relative paths are considered relative to, so this test needs to pass relative paths to lld-link for: @@ -33,9 +35,9 @@ RUN: cd %t RUN: yaml2obj %S/Inputs/pdb_lines_1_relative.yaml -o %t/pdb_lines_1_relative.obj RUN: yaml2obj %S/Inputs/pdb_lines_2_relative.yaml -o %t/pdb_lines_2_relative.obj RUN: ./lld-link -debug "-pdbsourcepath:c:\src" -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj -RUN: llvm-pdbutil pdb2yaml -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck %s +RUN: llvm-pdbutil pdb2yaml -ipi-stream -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck %s RUN: ./lld-link -debug "-pdbsourcepath:/usr/src" -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj -RUN: llvm-pdbutil pdb2yaml -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck --check-prefix=POSIX %s +RUN: llvm-pdbutil pdb2yaml -ipi-stream -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck --check-prefix=POSIX %s CHECK-LABEL: - Module: 'c:\src\pdb_lines_1_relative.obj' CHECK-NEXT: ObjFile: 'c:\src\pdb_lines_1_relative.obj' @@ -70,6 +72,20 @@ CHECK-NEXT: - 'c:\src\out.pdb' CHECK-NEXT: - cmd CHECK-NEXT: - '-debug -pdbsourcepath:c:\src -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb 
pdb_lines_1_relative.obj pdb_lines_2_relative.obj' +CHECK-LABEL: IpiStream: + +CHECK: - Kind: LF_STRING_ID +CHECK-NEXT: StringId: +CHECK-NEXT: Id: 0 +CHECK-NEXT: String: 'c:\src' +CHECK-NEXT: - Kind: LF_STRING_ID +CHECK-NEXT: StringId: +CHECK-NEXT: Id: 0 +CHECK-NEXT: String: pdb_lines_1.c +CHECK-NEXT: - Kind: LF_STRING_ID +CHECK-NEXT: StringId: +CHECK-NEXT: Id: 0 +CHECK-NEXT: String: 'c:\src\buildninjaRel\bin\clang-cl.exe' POSIX-LABEL: - Module: '/usr/src/pdb_lines_1_relative.obj' POSIX-NEXT: ObjFile: '/usr/src/pdb_lines_1_relative.obj' @@ -103,3 +119,17 @@ POSIX-NEXT: - pdb POSIX-NEXT: - '/usr/src/out.pdb' POSIX-NEXT: - cmd POSIX-NEXT: - '-debug -pdbsourcepath:/usr/src -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj' + +POSIX-LABEL: IpiStream: +POSIX: - Kind: LF_STRING_ID +POSIX-NEXT: StringId: +POSIX-NEXT: Id: 0 +POSIX-NEXT: String: '/usr/src' +POSIX-NEXT: - Kind: LF_STRING_ID +POSIX-NEXT: StringId: +POSIX-NEXT: Id: 0 +POSIX-NEXT: String: pdb_lines_1.c +POSIX-NEXT: - Kind: LF_STRING_ID +POSIX-NEXT: StringId: +POSIX-NEXT: Id: 0 +POSIX-NEXT: String: '/usr/src/buildninjaRel/bin/clang-cl.exe' diff --git a/lld/test/COFF/pdb-relative-source-lines2.test b/lld/test/COFF/pdb-relative-source-lines2.test new file mode 100644 index 000000000000..955f7bc1e453 --- /dev/null +++ b/lld/test/COFF/pdb-relative-source-lines2.test @@ -0,0 +1,66 @@ +REQUIRES: system-windows + +Test the linker line tables on roughly the following example: + +==> foo.h <== +void bar(void); +inline void foo(void) { + bar(); +} +==> pdb_lines_1.c <== +#include "foo.h" +int main(void) { + foo(); + return 42; +} +==> pdb_lines_2.c <== +void bar(void) { +} + +$ clang-cl -fdebug-compilation-dir . 
-no-canonical-prefixes -c -Z7 pdb_lines*.c +$ obj2yaml pdb_lines_1.obj > pdb_lines_1_relative.yaml +$ obj2yaml pdb_lines_2.obj > pdb_lines_2_relative.yaml + +/pdbsourcepath: only sets the directory that relative paths are considered +relative to, so this test needs to pass relative paths to lld-link for: +1. The input obj files +2. The /pdb: switch +3. The lld-link invocation itself +To achieve this, put all inputs of the lld-link invocation (including lld-link +itself) in a temp directory that's cwd and then make sure to only use relative +arguments when calling ./lld-link below. +RUN: rm -rf %t +RUN: mkdir %t +RUN: cp lld-link %t/lld-link +RUN: cd %t + +Test the convoluted case at the end of remapBuildInfo() in lld/COFF/PDB.cpp +The only drawback right now is that this edge case will create LF_BUILDINFO +records with front references in the IPI stream. However the Visual Studio +debugger takes the .PDB thusly created without any problems. +Tested on VS2015, 2017 and 2019. + +RUN: yaml2obj %S/Inputs/pdb_lines_1_relative.yaml -o %t/pdb_lines_1_relative.obj +RUN: sed -e "s|String: \.|String: "c:\\\src"|" < %S/Inputs/pdb_lines_2_relative.yaml > %t/pdb_lines_2_relative.yaml +RUN: yaml2obj pdb_lines_2_relative.yaml -o %t/pdb_lines_2_relative.obj +RUN: ./lld-link -debug "-pdbsourcepath:c:\src" -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj +RUN: llvm-pdbutil pdb2yaml -ipi-stream -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck --check-prefix=EXISTING %s + +EXISTING-LABEL: IpiStream: + +EXISTING: - Kind: LF_STRING_ID +EXISTING-NEXT: StringId: +EXISTING-NEXT: Id: 0 +EXISTING-NEXT: String: . 
+EXISTING-NEXT: - Kind: LF_STRING_ID +EXISTING-NEXT: StringId: +EXISTING-NEXT: Id: 0 +EXISTING-NEXT: String: pdb_lines_1.c +EXISTING: - Kind: LF_STRING_ID +EXISTING-NEXT: StringId: +EXISTING-NEXT: Id: 0 +EXISTING-LABEL: String: 'c:\src' +EXISTING-NEXT: - Kind: LF_STRING_ID +EXISTING-NEXT: StringId: +EXISTING-NEXT: Id: 0 +EXISTING-NEXT: String: pdb_lines_2.c diff --git a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp index f7041c0cc926..cf3c38c57f6d 100644 --- a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp @@ -77,6 +77,7 @@ #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/FormatVariadic.h" #include "llvm/Support/Path.h" +#include "llvm/Support/Program.h" #include "llvm/Support/SMLoc.h" #include "llvm/Support/ScopedPrinter.h" #include "llvm/Target/TargetLoweringObjectFile.h" @@ -831,6 +832,31 @@ static TypeIndex getStringIdTypeIdx(GlobalTypeTableBuilder &TypeTable, return TypeTable.writeLeafType(SIR); } +static std::string flattenCommandLine(ArrayRef Args, + StringRef MainFilename) { + std::string FlatCmdLine; + raw_string_ostream OS(FlatCmdLine); + StringRef LastArg; + for (StringRef Arg : Args) { + if (Arg.empty()) + continue; + // The command-line shall not contain the file to compile. + if (Arg == MainFilename && LastArg != "-main-file-name") + continue; + // Also remove the output file. + if (Arg == "-o" || LastArg == "-o") { + LastArg = Arg; + continue; + } + if (!LastArg.empty()) + OS << " "; + llvm::sys::printArg(OS, Arg, /*Quote=*/true); + LastArg = Arg; + } + OS.flush(); + return FlatCmdLine; +} + void CodeViewDebug::emitBuildInfo() { // First, make LF_BUILDINFO. It's a sequence of strings with various bits of // build info. 
The known prefix is: @@ -851,8 +877,16 @@ void CodeViewDebug::emitBuildInfo() { getStringIdTypeIdx(TypeTable, MainSourceFile->getDirectory()); BuildInfoArgs[BuildInfoRecord::SourceFile] = getStringIdTypeIdx(TypeTable, MainSourceFile->getFilename()); - // FIXME: Path to compiler and command line. PDB is intentionally blank unless - // we implement /Zi type servers. + // FIXME: PDB is intentionally blank unless we implement /Zi type servers. + BuildInfoArgs[BuildInfoRecord::TypeServerPDB] = + getStringIdTypeIdx(TypeTable, ""); + if (Asm->TM.Options.MCOptions.Argv0 != nullptr) { + BuildInfoArgs[BuildInfoRecord::BuildTool] = + getStringIdTypeIdx(TypeTable, Asm->TM.Options.MCOptions.Argv0); + BuildInfoArgs[BuildInfoRecord::CommandLine] = getStringIdTypeIdx( + TypeTable, flattenCommandLine(Asm->TM.Options.MCOptions.CommandLineArgs, + MainSourceFile->getFilename())); + } BuildInfoRecord BIR(BuildInfoArgs); TypeIndex BuildInfoIndex = TypeTable.writeLeafType(BIR); diff --git a/llvm/test/DebugInfo/COFF/build-info.ll b/llvm/test/DebugInfo/COFF/build-info.ll index 94f006c3b093..983aa22214bc 100644 --- a/llvm/test/DebugInfo/COFF/build-info.ll +++ b/llvm/test/DebugInfo/COFF/build-info.ll @@ -5,7 +5,7 @@ ; CHECK-NEXT: 0x{{.*}}: `D:\src\scopes\clang` ; CHECK-NEXT: : `` ; CHECK-NEXT: 0x{{.*}}: `D:\src\scopes\foo.cpp` -; CHECK-NEXT: : `` +; CHECK-NEXT: 0x{{.*}}: `` ; CHECK-NEXT: : `` ; CHECK: {{.*}} | S_BUILDINFO [size = 8] BuildId = `[[INFO_IDX]]` diff --git a/llvm/test/DebugInfo/COFF/global-type-hashes.ll b/llvm/test/DebugInfo/COFF/global-type-hashes.ll index 70f9df156a5b..3c6c27301b20 100644 --- a/llvm/test/DebugInfo/COFF/global-type-hashes.ll +++ b/llvm/test/DebugInfo/COFF/global-type-hashes.ll @@ -295,7 +295,8 @@ attributes #2 = { noinline nounwind optnone "correctly-rounded-divide-sqrt-fp-ma ; YAML: - 4470750F2E319329 ; YAML: - 0FB556FD1FAB66D7 ; YAML: - 5970EFB4874D0F3F -; YAML: - EDB1D74C120CF44A +; YAML: - D8EF11198C33843F +; YAML: - D81F744D7366282B ; ... 
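For readers skimming the CodeViewDebug.cpp hunk above: the following is a minimal standalone sketch of the filtering that flattenCommandLine() performs, not the code from the patch. It assumes plain std::string and std::vector in place of StringRef and ArrayRef. The helper printQuoted() is a hypothetical, simplified stand-in for llvm::sys::printArg(OS, Arg, /*Quote=*/true), and the name flattenCommandLineSketch is likewise made up for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::sys::printArg with Quote=true: wrap the
// argument in double quotes and backslash-escape '"', '\' and '$'.
static void printQuoted(std::ostream &OS, const std::string &Arg) {
  OS << '"';
  for (char C : Arg) {
    if (C == '"' || C == '\\' || C == '$')
      OS << '\\';
    OS << C;
  }
  OS << '"';
}

// Sketch of the filtering in the patch: drop empty arguments, drop the main
// source file (unless it is the operand of -main-file-name), and drop "-o"
// together with its operand. Everything else is quoted and space-separated.
static std::string
flattenCommandLineSketch(const std::vector<std::string> &Args,
                         const std::string &MainFilename) {
  std::ostringstream OS;
  std::string LastArg;
  for (const std::string &Arg : Args) {
    if (Arg.empty())
      continue;
    // The flattened command line shall not contain the file to compile.
    if (Arg == MainFilename && LastArg != "-main-file-name")
      continue;
    // Also remove the output file and the -o switch itself.
    if (Arg == "-o" || LastArg == "-o") {
      LastArg = Arg;
      continue;
    }
    if (!LastArg.empty())
      OS << " ";
    printQuoted(OS, Arg);
    LastArg = Arg;
  }
  return OS.str();
}
```

Under these assumptions, calling flattenCommandLineSketch on the arguments -cc1 -main-file-name a.c -o a.obj -x c a.c with MainFilename a.c keeps only "-cc1" "-main-file-name" "a.c" "-x" "c". This matches the shape of the quoted LF_STRING_ID command line in the pdb_lines_2 YAML input earlier in the patch.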
diff --git a/llvm/test/DebugInfo/COFF/types-basic.ll b/llvm/test/DebugInfo/COFF/types-basic.ll index 81e0c25d17cd..6455452d125a 100644 --- a/llvm/test/DebugInfo/COFF/types-basic.ll +++ b/llvm/test/DebugInfo/COFF/types-basic.ll @@ -511,14 +511,22 @@ ; ASM: .asciz "t.cpp" # StringData ; ASM: .byte 242 ; ASM: .byte 241 -; ASM: # BuildInfo (0x1015) +; ASM: # StringId (0x1015) +; ASM: .short 0xa # Record length +; ASM: .short 0x1605 # Record kind: LF_STRING_ID +; ASM: .long 0x0 # Id +; ASM: .byte 0 # StringData +; ASM: .byte 243 +; ASM: .byte 242 +; ASM: .byte 241 +; ASM: # BuildInfo (0x1016) ; ASM: .short 0x1a # Record length ; ASM: .short 0x1603 # Record kind: LF_BUILDINFO ; ASM: .short 0x5 # NumArgs ; ASM: .long 0x1013 # Argument: D:\src\llvm\build ; ASM: .long 0x0 # Argument ; ASM: .long 0x1014 # Argument: t.cpp -; ASM: .long 0x0 # Argument +; ASM: .long 0x1015 # Argument ; ASM: .long 0x0 # Argument ; ASM: .byte 242 ; ASM: .byte 241 diff --git a/llvm/test/DebugInfo/COFF/types-data-members.ll b/llvm/test/DebugInfo/COFF/types-data-members.ll index 87fde74b989c..1e699efdf8ed 100644 --- a/llvm/test/DebugInfo/COFF/types-data-members.ll +++ b/llvm/test/DebugInfo/COFF/types-data-members.ll @@ -727,14 +727,22 @@ ; ASM: .asciz "t.cpp" # StringData ; ASM: .byte 242 ; ASM: .byte 241 -; ASM: # BuildInfo (0x1022) +; ASM: # StringId (0x1022) +; ASM: .short 0xa # Record length +; ASM: .short 0x1605 # Record kind: LF_STRING_ID +; ASM: .long 0x0 # Id +; ASM: .byte 0 # StringData +; ASM: .byte 243 +; ASM: .byte 242 +; ASM: .byte 241 +; ASM: # BuildInfo (0x1023) ; ASM: .short 0x1a # Record length ; ASM: .short 0x1603 # Record kind: LF_BUILDINFO ; ASM: .short 0x5 # NumArgs ; ASM: .long 0x1020 # Argument: D:\src\llvm\build ; ASM: .long 0x0 # Argument ; ASM: .long 0x1021 # Argument: t.cpp -; ASM: .long 0x0 # Argument +; ASM: .long 0x1022 # Argument ; ASM: .long 0x0 # Argument ; ASM: .byte 242 ; ASM: .byte 241 From llvm-commits at lists.llvm.org Fri Jul 10 11:07:30 2020 From: llvm-commits 
at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:07:30 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: <3541068eef02a52f5fab91d9eb1d4ef8@localhost.localdomain> reames added a comment. On first skim, looks much much better. I'm going to do a detailed pass through, but thank you for making the major design change requested. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Fri Jul 10 11:08:07 2020 From: llvm-commits at lists.llvm.org (Hafiz Abid Qadeer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:08:07 +0000 (UTC) Subject: [PATCH] D83244: [lld] Don't error out on relocations in .gcc_except_table to discarded sections. In-Reply-To: References: Message-ID: <3462143602fc751a3803e6df31a58f72@localhost.localdomain> abidh updated this revision to Diff 277104. abidh added a comment. I have removed the .eh_frame. I have looked a bit more into it. It seems that the RISC-V target does create relocations in .gcc_except_table. The following commit gives some background on its encoding. https://reviews.llvm.org/rGab009a602e96b238000d9e20e5c54b078d08aad3 If I use -mno-relax during compilation then I don't see the relocations in .gcc_except_table and the problem goes away.
I was able to create a simpler testcase, which is attached to https://bugs.llvm.org/show_bug.cgi?id=46675

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83244/new/

https://reviews.llvm.org/D83244

Files:
  lld/ELF/Relocations.cpp
  lld/test/ELF/comdat-discarded-no-error.s

Index: lld/test/ELF/comdat-discarded-no-error.s
===================================================================
--- /dev/null
+++ lld/test/ELF/comdat-discarded-no-error.s
@@ -0,0 +1,18 @@
+# REQUIRES: x86
+# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+# RUN: echo '.section .text.foo,"axG",@progbits,foo,comdat; .globl foo; foo:' |\
+# RUN:   llvm-mc -filetype=obj -triple=x86_64 - -o %t2.o
+# RUN: echo '.section .text.foo,"axG",@progbits,foo,comdat; .globl bar; bar:' |\
+# RUN:   llvm-mc -filetype=obj -triple=x86_64 - -o %t3.o
+
+# RUN: ld.lld %t2.o %t3.o %t1.o -o /dev/null 2>&1
+
+.globl _start
+_start:
+  nop
+
+.section .text.foo,"axG",@progbits,foo,comdat
+  nop
+
+.section .gcc_except_table,"a"
+  .quad .text.foo
Index: lld/ELF/Relocations.cpp
===================================================================
--- lld/ELF/Relocations.cpp
+++ lld/ELF/Relocations.cpp
@@ -955,6 +955,12 @@
       (sec.name == ".got2" || sec.name == ".toc"))
     return false;
 
+  // The "gcc_except_table" can have relocations to discarded sections.
+  // Don't error out.
+  if (cast<Undefined>(sym).discardedSecIdx != 0 &&
+      sec.name == ".gcc_except_table")
+    return false;
+
   bool isWarning =
       (config->unresolvedSymbols == UnresolvedPolicy::Warn && canBeExternal) ||
       config->noinhibitExec;

-------------- next part -------------- A non-text attachment was scrubbed...
Name: D83244.277104.patch Type: text/x-patch Size: 1330 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 11:17:46 2020 From: llvm-commits at lists.llvm.org (Davide Italiano via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:17:46 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <5f7716e1c097e328a7085aeb1e37b6b3@localhost.localdomain> davide added a comment. This broke the modules build on macOS. warning: /Applications/Xcode5.app/Contents/Developer/Toolchains/OSX10.15.xctoolchain/usr/bin/libtool: warning for library: lib/libLLVMExtensions.a the table of contents is empty (no object file members in the library define global symbols) [320/3939] Building CXX object lib/Remarks/CMakeFiles/LLVMRemarks.dir/RemarkLinker.cpp.o FAILED: lib/Remarks/CMakeFiles/LLVMRemarks.dir/RemarkLinker.cpp.o /Applications/Xcode5.app/Contents/Developer/Toolchains/OSX10.15.xctoolchain/usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Remarks -I/Users/davide/work/llvm-project/llvm/lib/Remarks -I/Applications/Xcode5.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.Internal.sdk/usr/include/libxml2 -Iinclude -I/Users/davide/work/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/davide/work/build-modules/module.cache -fcxx-modules -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -O3 -isysroot /Applications/Xcode5.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.Internal.sdk -fno-exceptions -fno-rtti 
-UNDEBUG -std=c++14 -MD -MT lib/Remarks/CMakeFiles/LLVMRemarks.dir/RemarkLinker.cpp.o -MF lib/Remarks/CMakeFiles/LLVMRemarks.dir/RemarkLinker.cpp.o.d -o lib/Remarks/CMakeFiles/LLVMRemarks.dir/RemarkLinker.cpp.o -c /Users/davide/work/llvm-project/llvm/lib/Remarks/RemarkLinker.cpp While building module 'LLVM_Object' imported from /Users/davide/work/llvm-project/llvm/include/llvm/Remarks/RemarkLinker.h:16: While building module 'LLVM_IR' imported from /Users/davide/work/llvm-project/llvm/include/llvm/Object/IRSymtab.h:29: While building module 'LLVM_intrinsic_gen' imported from /Users/davide/work/llvm-project/llvm/include/llvm/IR/IRPrintingPasses.h:22: In file included from :1: In file included from /Users/davide/work/llvm-project/llvm/include/llvm/IR/Argument.h:18: /Users/davide/work/llvm-project/llvm/include/llvm/IR/Attributes.h:75:14: fatal error: 'llvm/IR/Attributes.inc' file not found #include "llvm/IR/Attributes.inc" ^~~~~~~~~~~~~~~~~~~~~~~~ While building module 'LLVM_Object' imported from /Users/davide/work/llvm-project/llvm/include/llvm/Remarks/RemarkLinker.h:16: While building module 'LLVM_IR' imported from /Users/davide/work/llvm-project/llvm/include/llvm/Object/IRSymtab.h:29: In file included from :4: /Users/davide/work/llvm-project/llvm/include/llvm/IR/IRPrintingPasses.h:22:10: fatal error: could not build module 'LLVM_intrinsic_gen' #include "llvm/IR/PassManager.h" ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~ While building module 'LLVM_Object' imported from /Users/davide/work/llvm-project/llvm/include/llvm/Remarks/RemarkLinker.h:16: In file included from :4: /Users/davide/work/llvm-project/llvm/include/llvm/Object/IRSymtab.h:29:10: fatal error: could not build module 'LLVM_IR' #include "llvm/IR/GlobalValue.h" ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~ In file included from /Users/davide/work/llvm-project/llvm/lib/Remarks/RemarkLinker.cpp:13: /Users/davide/work/llvm-project/llvm/include/llvm/Remarks/RemarkLinker.h:16:10: fatal error: could not build module 'LLVM_Object' 
#include "llvm/Object/ObjectFile.h" ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~ 4 errors generated. I'm going to revert, and I'm going to follow up with precise instructions on how to repro. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Fri Jul 10 11:18:54 2020 From: llvm-commits at lists.llvm.org (Davide Italiano via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:18:54 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <095123f3e7e19c38779520f8b0ea76fa@localhost.localdomain> davide added a comment. on any recent'ish macOS (although, I don't think the OS quite matters) % xcrun cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_MODULES=On -DLLVM_ENABLE_ASSERTIONS:BOOL=TRUE -DLLDB_ENABLE_PYTHON=On then % ninja check-lldb I feel the only relevant bit is `-DLLVM_ENABLE_MODULES=On `, YMMV. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Fri Jul 10 11:19:21 2020 From: llvm-commits at lists.llvm.org (Davide Italiano via llvm-commits) Date: Fri, 10 Jul 2020 11:19:21 -0700 (PDT) Subject: [llvm] fdb7856 - Revert "[NFC] Derive from PassInfoMixin for no-op/printing passes" Message-ID: <5f08b129.1c69fb81.16cfe.00a5@mx.google.com> Author: Davide Italiano Date: 2020-07-10T11:19:13-07:00 New Revision: fdb7856d54a1f81bab0ac0c8a4e984620589e699 URL: https://github.com/llvm/llvm-project/commit/fdb7856d54a1f81bab0ac0c8a4e984620589e699 DIFF: https://github.com/llvm/llvm-project/commit/fdb7856d54a1f81bab0ac0c8a4e984620589e699.diff LOG: Revert "[NFC] Derive from PassInfoMixin for no-op/printing passes" This reverts commit 8039d2c3bf14585ef37dc9343bf393ecad9aead9 as it breaks the modules build on macOS. 
Added: Modified: llvm/include/llvm/IR/IRPrintingPasses.h llvm/lib/IR/LegacyPassManager.cpp llvm/lib/Passes/PassBuilder.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/IRPrintingPasses.h b/llvm/include/llvm/IR/IRPrintingPasses.h index 3a1c489ee09f..230db988f737 100644 --- a/llvm/include/llvm/IR/IRPrintingPasses.h +++ b/llvm/include/llvm/IR/IRPrintingPasses.h @@ -19,10 +19,17 @@ #define LLVM_IR_IRPRINTINGPASSES_H #include "llvm/ADT/StringRef.h" -#include "llvm/IR/PassManager.h" #include namespace llvm { +class Pass; +class Function; +class FunctionPass; +class Module; +class ModulePass; +class PreservedAnalyses; +class raw_ostream; +template class AnalysisManager; /// Create and return a pass that writes the module to the specified /// \c raw_ostream. @@ -64,7 +71,7 @@ extern bool shouldPrintAfterPass(StringRef); /// /// Note: This pass is for use with the new pass manager. Use the create...Pass /// functions above to create passes for use with the legacy pass manager. -class PrintModulePass : public PassInfoMixin { +class PrintModulePass { raw_ostream &OS; std::string Banner; bool ShouldPreserveUseListOrder; @@ -75,13 +82,15 @@ class PrintModulePass : public PassInfoMixin { bool ShouldPreserveUseListOrder = false); PreservedAnalyses run(Module &M, AnalysisManager &); + + static StringRef name() { return "PrintModulePass"; } }; /// Pass for printing a Function as LLVM's text IR assembly. /// /// Note: This pass is for use with the new pass manager. Use the create...Pass /// functions above to create passes for use with the legacy pass manager. 
-class PrintFunctionPass : public PassInfoMixin { +class PrintFunctionPass { raw_ostream &OS; std::string Banner; @@ -90,6 +99,8 @@ class PrintFunctionPass : public PassInfoMixin { PrintFunctionPass(raw_ostream &OS, const std::string &Banner = ""); PreservedAnalyses run(Function &F, AnalysisManager &); + + static StringRef name() { return "PrintFunctionPass"; } }; } // End llvm namespace diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index 4189aea46294..1d9c44f385fb 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -33,6 +33,7 @@ #include #include using namespace llvm; +using namespace llvm::legacy; // See PassManagers.h for Pass Manager infrastructure overview. @@ -386,66 +387,6 @@ class FunctionPassManagerImpl : public Pass, void FunctionPassManagerImpl::anchor() {} char FunctionPassManagerImpl::ID = 0; - -//===----------------------------------------------------------------------===// -// FunctionPassManagerImpl implementation -// -bool FunctionPassManagerImpl::doInitialization(Module &M) { - bool Changed = false; - - dumpArguments(); - dumpPasses(); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doInitialization(M); - - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) - Changed |= getContainedManager(Index)->doInitialization(M); - - return Changed; -} - -bool FunctionPassManagerImpl::doFinalization(Module &M) { - bool Changed = false; - - for (int Index = getNumContainedManagers() - 1; Index >= 0; --Index) - Changed |= getContainedManager(Index)->doFinalization(M); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doFinalization(M); - - return Changed; -} - -void FunctionPassManagerImpl::releaseMemoryOnTheFly() { - if (!wasRun) - return; - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - FPPassManager *FPPM = getContainedManager(Index); - for (unsigned Index = 0; Index < 
FPPM->getNumContainedPasses(); ++Index) { - FPPM->getContainedPass(Index)->releaseMemory(); - } - } - wasRun = false; -} - -// Execute all the passes managed by this top level manager. -// Return true if any function is modified by a pass. -bool FunctionPassManagerImpl::run(Function &F) { - bool Changed = false; - - initializeAllAnalysisInfo(); - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - Changed |= getContainedManager(Index)->runOnFunction(F); - F.getContext().yield(); - } - - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) - getContainedManager(Index)->cleanup(); - - wasRun = true; - return Changed; -} } // namespace legacy } // namespace llvm @@ -465,7 +406,7 @@ class MPPassManager : public Pass, public PMDataManager { // Delete on the fly managers. ~MPPassManager() override { for (auto &OnTheFlyManager : OnTheFlyManagers) { - legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + FunctionPassManagerImpl *FPP = OnTheFlyManager.second; delete FPP; } } @@ -510,7 +451,7 @@ class MPPassManager : public Pass, public PMDataManager { for (unsigned Index = 0; Index < getNumContainedPasses(); ++Index) { ModulePass *MP = getContainedPass(Index); MP->dumpPassStructure(Offset + 1); - MapVector::const_iterator I = + MapVector::const_iterator I = OnTheFlyManagers.find(MP); if (I != OnTheFlyManagers.end()) I->second->dumpPassStructure(Offset + 2); @@ -530,7 +471,7 @@ class MPPassManager : public Pass, public PMDataManager { private: /// Collection of on the fly FPPassManagers. These managers manage /// function passes that are required by module passes. 
- MapVector OnTheFlyManagers; + MapVector OnTheFlyManagers; }; char MPPassManager::ID = 0; @@ -593,33 +534,6 @@ class PassManagerImpl : public Pass, void PassManagerImpl::anchor() {} char PassManagerImpl::ID = 0; - -//===----------------------------------------------------------------------===// -// PassManagerImpl implementation - -// -/// run - Execute all of the passes scheduled for execution. Keep track of -/// whether any of the passes modifies the module, and if so, return true. -bool PassManagerImpl::run(Module &M) { - bool Changed = false; - - dumpArguments(); - dumpPasses(); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doInitialization(M); - - initializeAllAnalysisInfo(); - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - Changed |= getContainedManager(Index)->runOnModule(M); - M.getContext().yield(); - } - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doFinalization(M); - - return Changed; -} } // namespace legacy } // namespace llvm @@ -1400,15 +1314,12 @@ AnalysisResolver::findImplPass(Pass *P, AnalysisID AnalysisPI, Function &F) { return PM.getOnTheFlyPass(P, AnalysisPI, F); } -namespace llvm { -namespace legacy { - //===----------------------------------------------------------------------===// // FunctionPassManager implementation /// Create new Function pass manager FunctionPassManager::FunctionPassManager(Module *m) : M(m) { - FPM = new legacy::FunctionPassManagerImpl(); + FPM = new FunctionPassManagerImpl(); // FPM is the top level manager. 
FPM->setTopLevelManager(FPM); @@ -1447,8 +1358,36 @@ bool FunctionPassManager::doInitialization() { bool FunctionPassManager::doFinalization() { return FPM->doFinalization(*M); } -} // namespace legacy -} // namespace llvm + +//===----------------------------------------------------------------------===// +// FunctionPassManagerImpl implementation +// +bool FunctionPassManagerImpl::doInitialization(Module &M) { + bool Changed = false; + + dumpArguments(); + dumpPasses(); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doInitialization(M); + + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) + Changed |= getContainedManager(Index)->doInitialization(M); + + return Changed; +} + +bool FunctionPassManagerImpl::doFinalization(Module &M) { + bool Changed = false; + + for (int Index = getNumContainedManagers() - 1; Index >= 0; --Index) + Changed |= getContainedManager(Index)->doFinalization(M); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doFinalization(M); + + return Changed; +} /// cleanup - After running all passes, clean up pass manager cache. void FPPassManager::cleanup() { @@ -1460,6 +1399,35 @@ void FPPassManager::cleanup() { } } +void FunctionPassManagerImpl::releaseMemoryOnTheFly() { + if (!wasRun) + return; + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + FPPassManager *FPPM = getContainedManager(Index); + for (unsigned Index = 0; Index < FPPM->getNumContainedPasses(); ++Index) { + FPPM->getContainedPass(Index)->releaseMemory(); + } + } + wasRun = false; +} + +// Execute all the passes managed by this top level manager. +// Return true if any function is modified by a pass. 
+bool FunctionPassManagerImpl::run(Function &F) { + bool Changed = false; + + initializeAllAnalysisInfo(); + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + Changed |= getContainedManager(Index)->runOnFunction(F); + F.getContext().yield(); + } + + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) + getContainedManager(Index)->cleanup(); + + wasRun = true; + return Changed; +} //===----------------------------------------------------------------------===// // FPPassManager implementation @@ -1586,7 +1554,7 @@ MPPassManager::runOnModule(Module &M) { // Initialize on-the-fly passes for (auto &OnTheFlyManager : OnTheFlyManagers) { - legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + FunctionPassManagerImpl *FPP = OnTheFlyManager.second; Changed |= FPP->doInitialization(M); } @@ -1647,7 +1615,7 @@ MPPassManager::runOnModule(Module &M) { // Finalize on-the-fly passes for (auto &OnTheFlyManager : OnTheFlyManagers) { - legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + FunctionPassManagerImpl *FPP = OnTheFlyManager.second; // We don't know when is the last time an on-the-fly pass is run, // so we need to releaseMemory / finalize here FPP->releaseMemoryOnTheFly(); @@ -1668,9 +1636,9 @@ void MPPassManager::addLowerLevelRequiredPass(Pass *P, Pass *RequiredPass) { RequiredPass->getPotentialPassManagerType()) && "Unable to handle Pass that requires lower level Analysis pass"); - legacy::FunctionPassManagerImpl *FPP = OnTheFlyManagers[P]; + FunctionPassManagerImpl *FPP = OnTheFlyManagers[P]; if (!FPP) { - FPP = new legacy::FunctionPassManagerImpl(); + FPP = new FunctionPassManagerImpl(); // FPP is the top level manager. FPP->setTopLevelManager(FPP); @@ -1701,7 +1669,7 @@ void MPPassManager::addLowerLevelRequiredPass(Pass *P, Pass *RequiredPass) { /// its runOnFunction() for function F. 
std::tuple MPPassManager::getOnTheFlyPass(Pass *MP, AnalysisID PI, Function &F) { - legacy::FunctionPassManagerImpl *FPP = OnTheFlyManagers[MP]; + FunctionPassManagerImpl *FPP = OnTheFlyManagers[MP]; assert(FPP && "Unable to find on the fly pass"); FPP->releaseMemoryOnTheFly(); @@ -1710,8 +1678,32 @@ std::tuple MPPassManager::getOnTheFlyPass(Pass *MP, AnalysisID PI, Changed); } -namespace llvm { -namespace legacy { +//===----------------------------------------------------------------------===// +// PassManagerImpl implementation + +// +/// run - Execute all of the passes scheduled for execution. Keep track of +/// whether any of the passes modifies the module, and if so, return true. +bool PassManagerImpl::run(Module &M) { + bool Changed = false; + + dumpArguments(); + dumpPasses(); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doInitialization(M); + + initializeAllAnalysisInfo(); + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + Changed |= getContainedManager(Index)->runOnModule(M); + M.getContext().yield(); + } + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doFinalization(M); + + return Changed; +} //===----------------------------------------------------------------------===// // PassManager implementation @@ -1736,8 +1728,6 @@ void PassManager::add(Pass *P) { bool PassManager::run(Module &M) { return PM->run(M); } -} // namespace legacy -} // namespace llvm //===----------------------------------------------------------------------===// // PMStack implementation @@ -1828,4 +1818,4 @@ void FunctionPass::assignPassManager(PMStack &PMS, PM->add(this); } -legacy::PassManagerBase::~PassManagerBase() {} +PassManagerBase::~PassManagerBase() {} diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 771cdfd17aa5..4d6c30b87a99 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -295,16 +295,11 @@ const 
PassBuilder::OptimizationLevel PassBuilder::OptimizationLevel::Oz = { namespace { -// The following passes/analyses have custom names, otherwise their name will -// include `(anonymous namespace)`. These are special since they are only for -// testing purposes and don't live in a header file. - /// No-op module pass which does nothing. -struct NoOpModulePass : PassInfoMixin { +struct NoOpModulePass { PreservedAnalyses run(Module &M, ModuleAnalysisManager &) { return PreservedAnalyses::all(); } - static StringRef name() { return "NoOpModulePass"; } }; @@ -320,7 +315,7 @@ class NoOpModuleAnalysis : public AnalysisInfoMixin { }; /// No-op CGSCC pass which does nothing. -struct NoOpCGSCCPass : PassInfoMixin { +struct NoOpCGSCCPass { PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &, LazyCallGraph &, CGSCCUpdateResult &UR) { return PreservedAnalyses::all(); @@ -342,7 +337,7 @@ class NoOpCGSCCAnalysis : public AnalysisInfoMixin { }; /// No-op function pass which does nothing. -struct NoOpFunctionPass : PassInfoMixin { +struct NoOpFunctionPass { PreservedAnalyses run(Function &F, FunctionAnalysisManager &) { return PreservedAnalyses::all(); } @@ -361,7 +356,7 @@ class NoOpFunctionAnalysis : public AnalysisInfoMixin { }; /// No-op loop pass which does nothing. -struct NoOpLoopPass : PassInfoMixin { +struct NoOpLoopPass { PreservedAnalyses run(Loop &L, LoopAnalysisManager &, LoopStandardAnalysisResults &, LPMUpdater &) { return PreservedAnalyses::all(); @@ -387,7 +382,7 @@ AnalysisKey NoOpCGSCCAnalysis::Key; AnalysisKey NoOpFunctionAnalysis::Key; AnalysisKey NoOpLoopAnalysis::Key; -} // namespace +} // End anonymous namespace. 
void PassBuilder::invokePeepholeEPCallbacks( FunctionPassManager &FPM, PassBuilder::OptimizationLevel Level) { From llvm-commits at lists.llvm.org Fri Jul 10 11:19:47 2020 From: llvm-commits at lists.llvm.org (Davide Italiano via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:19:47 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <842afa176a04095d4e0099d7d725311b@localhost.localdomain> davide added a comment. Reverted in: commit fdb7856d54a1f81bab0ac0c8a4e984620589e699 (HEAD -> master, origin/master, origin/HEAD) Author: Davide Italiano Date: Fri Jul 10 11:16:33 2020 -0700 Revert "[NFC] Derive from PassInfoMixin for no-op/printing passes" This reverts commit 8039d2c3bf14585ef37dc9343bf393ecad9aead9 as it breaks the modules build on macOS. Don't hesitate to ping me if you need any other info to reproduce. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Fri Jul 10 11:20:22 2020 From: llvm-commits at lists.llvm.org (Whitney Tsang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:20:22 +0000 (UTC) Subject: [PATCH] D83543: [CodeMoverUtils] Add more data dependency related test case In-Reply-To: References: Message-ID: Whitney added inline comments. 
================ Comment at: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp:679 + DependenceInfo &DI) { + Instruction *LoadA0 = getInstructionByName(F, "tmp0"); + Instruction *StoreA1 = LoadA0->getPrevNode(); ---------------- This may make it more clear: ``` Instruction *LoadA0 = getInstructionByName(F, "tmp0"); Instruction *LoadA1 = getInstructionByName(F, "tmp1"); Instruction *LoadA2 = getInstructionByName(F, "tmp2"); Instruction *LoadA3 = getInstructionByName(F, "tmp3"); Instruction *LoadB2 = getInstructionByName(F, "tmp4"); Instruction *LoadB3 = getInstructionByName(F, "tmp5"); Instruction *StoreA1 = LoadA0->getPrevNode(); Instruction *StoreA0 = StoreA1->getPrevNode(); Instruction *StoreB0 = LoadA0->getNextNode(); Instruction *StoreB1 = LoadA2->getPrevNode(); Instruction *StoreA2 = StoreB1->getPrevNode(); // Input forward dependency EXPECT_TRUE(isSafeToMoveBefore(*LoadA2, *LoadB2, DT, &PDT, &DI)); // Input backward dependency EXPECT_TRUE(isSafeToMoveBefore(*LoadA3, *LoadA2, DT, &PDT, &DI)); // Output forward dependency EXPECT_FALSE(isSafeToMoveBefore(*StoreA0, *LoadA0, DT, &PDT, &DI)); // Output backward dependency EXPECT_FALSE(isSafeToMoveBefore(*StoreA1, *StoreA0, DT, &PDT, &DI)); // Flow forward dependency EXPECT_FALSE(isSafeToMoveBefore(*StoreA1, *StoreB0, DT, &PDT, &DI)); // Flow backward dependency EXPECT_FALSE(isSafeToMoveBefore(*LoadA0, *StoreA1, DT, &PDT, &DI)); // Anti forward dependency EXPECT_FALSE(isSafeToMoveBefore(*LoadA1, *StoreB1, DT, &PDT, &DI)); // Anti backward dependency EXPECT_FALSE(isSafeToMoveBefore(*StoreA2, *LoadA1, DT, &PDT, &DI)); // No input backward dependency EXPECT_TRUE(isSafeToMoveBefore(*LoadB2, *LoadA3, DT, &PDT, &DI)); // No input forward dependency EXPECT_TRUE(isSafeToMoveBefore(*LoadA3, *LoadB3, DT, &PDT, &DI)); // No output forward dependency EXPECT_TRUE(isSafeToMoveBefore(*StoreA2, *LoadA2, DT, &PDT, &DI)); // No output backward dependency EXPECT_TRUE(isSafeToMoveBefore(*StoreB1, *StoreA2, DT, &PDT, &DI)); // No 
flow forward dependency EXPECT_TRUE(isSafeToMoveBefore(*StoreB0, *StoreA2, DT, &PDT, &DI)); // No flow backward dependency EXPECT_TRUE(isSafeToMoveBefore(*LoadA1, *StoreB0, DT, &PDT, &DI)); // No anti backward dependency EXPECT_TRUE(isSafeToMoveBefore(*StoreB0, *LoadA0, DT, &PDT, &DI)); // No anti forward dependency EXPECT_TRUE(isSafeToMoveBefore(*LoadA0, *LoadA1, DT, &PDT, &DI)); ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83543/new/ https://reviews.llvm.org/D83543 From llvm-commits at lists.llvm.org Fri Jul 10 11:21:51 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:21:51 +0000 (UTC) Subject: [PATCH] D83570: [Matrix] Lowering pass should also run at O0 In-Reply-To: References: Message-ID: SjoerdMeijer abandoned this revision. SjoerdMeijer added a comment. Ah okay, cheers, missed that. Time to abandon this. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83570/new/ https://reviews.llvm.org/D83570 From llvm-commits at lists.llvm.org Fri Jul 10 11:24:23 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:24:23 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: jfb updated this revision to Diff 277107. jfb marked 14 inline comments as done. jfb added a comment. Address more comments, add names. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 Files: llvm/docs/Contributing.rst llvm/docs/HowToSubmitABug.rst llvm/docs/Security.rst llvm/docs/index.rst Index: llvm/docs/index.rst =================================================================== --- llvm/docs/index.rst +++ llvm/docs/index.rst @@ -85,7 +85,7 @@ Reporting a security issue -* :ref:`How to report a security issue?` +* :ref:`report-security-issue` Indices and tables ================== Index: llvm/docs/Security.rst =================================================================== --- llvm/docs/Security.rst +++ llvm/docs/Security.rst @@ -19,12 +19,26 @@ Group Composition ================= -Initial group -------------- - -The initial security group will start small and grow following the process established below. The LLVM Board will pick 10 community members. These members shall represent a wide cross-section of the community, and meet the criteria for inclusion below. - -*FUTURE*: where we maintain a list of current Security Group members can be decided later. +Security Group Members +---------------------- + +The members of the group represent a wide cross-section of the community, and meet the criteria for inclusion below. + +* Akila Srinivasan (Apple) +* Dimitry Andric (invidual; FreeBSD) +* Ed Maste (individual; FreeBSD) +* JF Bastien (Apple) +* Josh Eads (Sony) +* Kristof Beyls (ARM) +* Matthew Riley (Google) +* Oliver Hunt (Apple) +* Paul Robinson (Sony) +* Peter Smith (ARM) +* Philip Reames (Azul Systems Inc) +* Pietro Albini (individual; Rust) +* Serge Guelton (RedHat) +* Shayne Hiet-Block (Microsoft) +* Steve Klabnik (Oxide Computer Company; Rust) Criteria -------- @@ -182,7 +196,14 @@ The security-sensitive parts of the LLVM Project currently are: * None (this process is new, the list hasn't been populated yet) +* *FUTURE*: this section will be expanded. 
+ +The parts of the LLVM Project which are currently treated as non-security sensitive are: + +* Language front-ends, such as clang, for which a malicious input file can cause undesirable behavior. For example, a maliciously-crafter C or Rust source file can cause arbitrary code to execute in LLVM. These parts of LLVM haven't been hardened, and compiling untrusted code usually also includes running utilities such as `make` which can more readily perform malicious things. +* *FUTURE*: this section will be expanded. +.. _report-security-issue: How to report a security issue? =============================== Index: llvm/docs/HowToSubmitABug.rst =================================================================== --- llvm/docs/HowToSubmitABug.rst +++ llvm/docs/HowToSubmitABug.rst @@ -10,7 +10,7 @@ about it. This document describes what you can do to increase the odds of getting it fixed quickly. -If you believe that the bug is security related, please follow :ref:`How to report a security issue?`. +🔒 If you believe that the bug is security related, please follow :ref:`report-security-issue`. 🔒 Basically you have to do two things at a minimum. First, decide whether the bug `crashes the compiler`_ (or an LLVM pass), or if the Index: llvm/docs/Contributing.rst =================================================================== --- llvm/docs/Contributing.rst +++ llvm/docs/Contributing.rst @@ -40,7 +40,7 @@ Reporting a Security Issue -------------------------- -There is a separate process to submit security-related bugs, see :ref:`How to report a security issue?`. +There is a separate process to submit security-related bugs, see :ref:`report-security-issue`. Bigger Pieces of Work --------------------- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D70326.277107.patch Type: text/x-patch Size: 3481 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 11:24:34 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:24:34 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <9010c38ba123c4889511a9f9799326d3@localhost.localdomain> jfb added inline comments. ================ Comment at: llvm/docs/Security.rst:25 + +The initial security group will start small and grow following the process established below. The LLVM Board will pick 10 community members. These members shall represent a wide cross-section of the community, and meet the criteria for inclusion below. + ---------------- rengolin wrote: > aadg wrote: > > I understand we have to solve a chicken & egg problem here to get the group started ; I think we should rather say that a call for application to the initial security group should be made, and the board will pick 10 candidates amongst the applications. The board can not possibly know everyone in the community, and to be effective, this group needs volunteers, not people who have been volunteered. > > > > 10 seems like a big number of people for an initial group --- given the number of people who expressed interest in the forming of this group, so what should we do if there are less than 10 volunteers ? > > > > The initial task for this group will probably be to finish fleshing up this proposal. > I agree on both points. We shouldn't burden the foundation with it nor we should restrict the number of members to a fixed size. The board's expressed preferred direction is to start with those who have self-identified in the RFC discussion. I'll go with that. 
================ Comment at: llvm/docs/Security.rst:46 + + - Vendor contacts: + ---------------- theraven wrote: > psmith wrote: > > There are many vendors that build products from LLVM, and would like to be informed about vulnerabilities, but they may not be able to provide a security expert for the group. We may be at risk of putting off smaller vendors from putting names forward that largely want to be informed but may not be able to contribute fixes. > > > > I don't think this needs changing in the text though. We'll have to see how it goes. > I think that's a great point. We may want to have a two-ring approach, with a group that will coordinate the response and patch, and a wider distribution group that has access to the embargoed patch so that they can do package builds and coordinated releases. My concern over this approach is that it's much lower overhead to be in the second group and so the incentive is to only be in the second group, where you benefit from the process but don't contribute. > > This also needs to be balanced with the fact that a leak of an embargoed patch is more likely the more people are exposed to it (for example, OpenBSD leaked the fix for the KRACK WPA2 attack before the embargo, which put everyone else at risk and got the project banned from access to embargoed fixes for a few things). > > From the project's perspective, what is the benefit of having these small vendors participating? For the vendor to benefit, they must have a process for handling embargoed fixes and doing coordinated releases. It seems quite unlikely that such a vendor would not have someone who can help out at least in coordinating the response, if not in assessing the security. Agreed that this is a valid concern. I would like to figure it out as we move forward, not in the first steps of setting up our process.
================ Comment at: llvm/docs/Security.rst:52 + + - If already in the LLVM Security Group, has actively participated in one (if any) security issue in the last year. + - If already in the LLVM Security Group, has actively participated in most membership discussions in the last year. ---------------- rengolin wrote: > Redundant wording. Perhaps sub-bullet points? Using sub-bullets makes it unclear what is "and" and what isn't. ================ Comment at: llvm/docs/Security.rst:112 + +Following the process below, the LLVM Security Group decides on embargo date for public disclosure for each Security issue. An embargo may be lifted before the agreed-upon date if all vendors planning to ship a fix have already done so, and if the reporter does not object. + ---------------- rengolin wrote: > What if the group doesn't have a member from an affected vendor? How do we handle external vendor/country embargo? It won't be perfect at the start, but it's already way more imperfect right now. We should do outreach (such as on this review and the RFC), and over the first few months I think we'll be in a position where what you ask isn't an issue anymore. ================ Comment at: llvm/docs/Security.rst:153 +* Within two business days, a member of the Security Group is put in charge of driving the issue to an acceptable resolution. This champion doesn’t need to be the same person for each issue. This person can self-nominate. +* Members of the Security Group discuss in which circumstances (if any) an issue is relevant to security, and determine if it is a security issue. +* Negotiate an embargo date for public disclosure, with a default minimum time limit of ninety days. ---------------- psmith wrote: > Is it worth documenting what happens when the decision that the issue is not security-related? For example update "What is a security issue?" if necessary. > > We have time limits and a place for communicating fixes. 
How and where do we communicate a non-security issue? For example is there a LLVM-DEV post? > > I'm sure that there will be some decisions that will need revisiting due to community feedback or further information. I don't think that there needs to be a formal appeals procedure, I think that if the arguments are persuasive the committee can change their mind. An issue that isn't deemed part of the security surface area is opened to the public as part of the embargo. In most cases, we'd decide that there's no embargo. That being said, we can keep an embargo if, for example, it's not part of LLVM's security surface area but is for some non-LLVM project. Say: we use a 3rd party library in a non-secure manner, we're told of an issue, we say "not in our threat model", but other open-source projects have it in their threat model. In such a case we don't want to lift embargo, because it would affect the non-LLVM project. Ultimately, we'll also do a periodic report where those issues show up. ================ Comment at: llvm/docs/Security.rst:180 +.. _CVE process: https://cve.mitre.org +.. _chromium issue tracker: https://crbug.com +.. _GitHub security: https://help.github.com/en/articles/about-maintainer-security-advisories ---------------- kcc wrote: > crbug.org has been working well for us e.g. for oss-fuzz or for one-off cases like > https://bugs.chromium.org/p/chromium/issues/detail?id=994957 > https://bugs.chromium.org/p/chromium/issues/detail?id=606626 > > GitHub's security advisories are very recent and unclear if the workflow is polished. > E.g. I can't seem to add comments to the advisory once it's public. > I didn't check if these advisories have an API (they should). > > Yet, I think we should consider GitHub as the primary candidate because this is where LLVM is and where the majority of OSS people are. > We may need to ask GitHub to implement missing features, if any.
That's been my thinking as well, and one of the first follow-ups I'd like to come after this initial commit. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 From llvm-commits at lists.llvm.org Fri Jul 10 11:25:00 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:25:00 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <9eb53010ab9120c2367e293a1c791316@localhost.localdomain> jfb added a comment. I believe this is now ready to go, with more to do afterwards. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 From llvm-commits at lists.llvm.org Fri Jul 10 11:26:03 2020 From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:26:03 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: reames added a comment. Detailed comments on the new implementation. These are on the whole minor. Remember to add your new test file. I need to look more closely at the code outside StatepointLowering, will do that in a separate comment shortly. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:560 + if (willLowerDirectly(PtrSD) || P->getType()->isVectorTy() || + Builder.DL->isNonIntegralPointerType(P->getType())) { + LLVM_DEBUG(dbgs() << "spill "; PtrSD.dump(&Builder.DAG)); ---------------- I don't think the isNonIntegralPointer check is needed here. Can you either remove or explain? ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:602 + continue; + } + Ptr2ResNo[SDV] = -1; ---------------- This map population isn't related to stack reservation. 
Please either separate it into its own loop, or merge it with the population of the LowerAsVReg set above. (If I understood this, it's simply the index into a flattened list of the values to be spilled in registers.) ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:603 + } + Ptr2ResNo[SDV] = -1; reservePreviousStackSlotForValue(SI.Bases[i], Builder); ---------------- I'd suggest simply not adding anything to the map for the spill case. Contains checks are more idiomatic for this purpose. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:640 // (base[0], ptr[0], base[1], ptr[1], ...) for (unsigned i = 0; i < SI.Bases.size(); ++i) { + SDValue Derived = Builder.getValue(SI.Ptrs[i]); ---------------- This block of code is functionally broken when base != derived. You have only added the vreg information for the derived, and would need to spill the base so that the GC can find it. The fix is trivial, pass "true" for the base case when Base != Derived. (Also, this is probably the profitable lowering, so don't be tempted to add the base to a reg.) ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:685 - if (Loc.getNode()) { + if (Ptr2ResNo[SDV] >= 0) { + SpillMap[V] = None; ---------------- See comment above about count checks. (Contains) ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:735 + + unsigned NumVRegs = + llvm::count_if(Ptr2ResNo, [](auto V) { return V.second >= 0; }); ---------------- Isn't this simply Ptr2ResNo.size()? (after the change suggested above) ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:854 + SDValue SD = getValue(Ptr); + if (Ptr2ResNo[SD] >= 0) { + NodeTys.push_back(SD.getValueType()); ---------------- Again, contains check, not default value.
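For illustration only, a minimal sketch of the contains-check idiom suggested in the comments above. It is not code from the patch: `std::unordered_map` with string keys is a hypothetical stand-in for `Ptr2ResNo` (which is keyed by `SDValue` in the review), and the helper names are made up. The idea is that spilled values get no entry at all, so membership replaces the `-1` sentinel and the vreg count is just the map size. Note that `operator[]` on a `std::unordered_map` inserts a value-initialized entry (0 for integers) when the key is missing, so a `>= 0` test through `operator[]` would accept keys that were never recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the pointer -> result-number map discussed above.
using ResNoMap = std::unordered_map<std::string, unsigned>;

// Record a value only when it is lowered to a vreg.
// Spilled values are simply never added, so no sentinel is needed.
inline void recordVReg(ResNoMap &Map, const std::string &Ptr, unsigned ResNo) {
  Map.emplace(Ptr, ResNo);
}

// Membership check. Unlike operator[], count() never inserts,
// so it also works on a const map.
inline bool isLoweredAsVReg(const ResNoMap &Map, const std::string &Ptr) {
  return Map.count(Ptr) != 0;
}

// With no sentinel entries, the number of vregs is the map size.
inline std::size_t numVRegs(const ResNoMap &Map) { return Map.size(); }
```

With this shape, a caller would write `if (isLoweredAsVReg(Map, Ptr))` where the patch tests `Ptr2ResNo[SD] >= 0`, and `numVRegs(Map)` where it uses `count_if`.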
================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:1122 + auto &DPtrMap = StatepointLowering.DerivedPtrMap; + auto It = DPtrMap.find(DerivedPtr); ---------------- Add a comment describing what this does. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Fri Jul 10 11:26:18 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:26:18 +0000 (UTC) Subject: [PATCH] D83578: [test] Replace a fragile lit feature (substitution in an argument place) with command -v Message-ID: MaskRay created this revision. MaskRay added a reviewer: sammccall. Herald added subscribers: llvm-commits, rupprecht, dexonsmith, steven_wu, hiraditya. Herald added a reviewer: alexshap. Herald added a reviewer: rupprecht. Herald added a reviewer: jhenderson. Herald added a project: LLVM. lit can do substitution in an argument place (like `llvm-ar` in `ln -s llvm-ar`) which is counterintuitive. Replace the symlink mechanism with the more intuitive `EXE=$(command -v)` (POSIX shell conformant). Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83578 Files: llvm/test/tools/llvm-ar/tool-name.test llvm/test/tools/llvm-dlltool/tool-name.test llvm/test/tools/llvm-lib/tool-name.test llvm/test/tools/llvm-objcopy/tool-name.test llvm/test/tools/llvm-ranlib/tool-name.test -------------- next part -------------- A non-text attachment was scrubbed...
Name: D83578.277108.patch Type: text/x-patch Size: 3747 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 11:27:55 2020 From: llvm-commits at lists.llvm.org (Lei Huang via llvm-commits) Date: Fri, 10 Jul 2020 11:27:55 -0700 (PDT) Subject: [llvm] 90b1a71 - [PowerPC] Enable default support of quad precision operations Message-ID: <5f08b32b.1c69fb81.403da.f806@mx.google.com> Author: Lei Huang Date: 2020-07-10T13:27:48-05:00 New Revision: 90b1a710aede2b276cda47538142fef6f5253361 URL: https://github.com/llvm/llvm-project/commit/90b1a710aede2b276cda47538142fef6f5253361 DIFF: https://github.com/llvm/llvm-project/commit/90b1a710aede2b276cda47538142fef6f5253361.diff LOG: [PowerPC] Enable default support of quad precision operations Summary: Remove option guarding support of quad precision operations. Reviewers: nemanjai, #powerpc, steven.zhang Reviewed By: nemanjai, #powerpc, steven.zhang Subscribers: qiucf, wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83437 Added: Modified: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll llvm/test/CodeGen/PowerPC/constant-pool.ll llvm/test/CodeGen/PowerPC/f128-aggregates.ll llvm/test/CodeGen/PowerPC/f128-arith.ll llvm/test/CodeGen/PowerPC/f128-bitcast.ll llvm/test/CodeGen/PowerPC/f128-compare.ll llvm/test/CodeGen/PowerPC/f128-conv.ll llvm/test/CodeGen/PowerPC/f128-fma.ll llvm/test/CodeGen/PowerPC/f128-passByValue.ll llvm/test/CodeGen/PowerPC/f128-rounding.ll llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll llvm/test/CodeGen/PowerPC/f128-vecExtractNconv.ll llvm/test/CodeGen/PowerPC/float-load-store-pair.ll llvm/test/CodeGen/PowerPC/fp-strict-f128.ll llvm/test/CodeGen/PowerPC/global-address-non-got-indirect-access.ll llvm/test/CodeGen/PowerPC/pcrel-got-indirect.ll llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll llvm/test/CodeGen/PowerPC/recipest.ll Removed: 
################################################################################ diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp index 229c5a76010c..49140bab5134 100644 --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp @@ -117,9 +117,6 @@ cl::desc("disable sibling call optimization on ppc"), cl::Hidden); static cl::opt DisableInnermostLoopAlign32("disable-ppc-innermost-loop-align32", cl::desc("don't always align innermost loop to 32 bytes on ppc"), cl::Hidden); -static cl::opt EnableQuadPrecision("enable-ppc-quad-precision", -cl::desc("enable quad precision float support on ppc"), cl::Hidden); - static cl::opt UseAbsoluteJumpTables("ppc-use-absolute-jumptables", cl::desc("use absolute jump tables on ppc"), cl::Hidden); @@ -1004,61 +1001,59 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM, setOperationAction(ISD::SRL, MVT::v1i128, Legal); setOperationAction(ISD::SRA, MVT::v1i128, Expand); - if (EnableQuadPrecision) { - addRegisterClass(MVT::f128, &PPC::VRRCRegClass); - setOperationAction(ISD::FADD, MVT::f128, Legal); - setOperationAction(ISD::FSUB, MVT::f128, Legal); - setOperationAction(ISD::FDIV, MVT::f128, Legal); - setOperationAction(ISD::FMUL, MVT::f128, Legal); - setOperationAction(ISD::FP_EXTEND, MVT::f128, Legal); - // No extending loads to f128 on PPC. 
- for (MVT FPT : MVT::fp_valuetypes()) - setLoadExtAction(ISD::EXTLOAD, MVT::f128, FPT, Expand); - setOperationAction(ISD::FMA, MVT::f128, Legal); - setCondCodeAction(ISD::SETULT, MVT::f128, Expand); - setCondCodeAction(ISD::SETUGT, MVT::f128, Expand); - setCondCodeAction(ISD::SETUEQ, MVT::f128, Expand); - setCondCodeAction(ISD::SETOGE, MVT::f128, Expand); - setCondCodeAction(ISD::SETOLE, MVT::f128, Expand); - setCondCodeAction(ISD::SETONE, MVT::f128, Expand); - - setOperationAction(ISD::FTRUNC, MVT::f128, Legal); - setOperationAction(ISD::FRINT, MVT::f128, Legal); - setOperationAction(ISD::FFLOOR, MVT::f128, Legal); - setOperationAction(ISD::FCEIL, MVT::f128, Legal); - setOperationAction(ISD::FNEARBYINT, MVT::f128, Legal); - setOperationAction(ISD::FROUND, MVT::f128, Legal); - - setOperationAction(ISD::SELECT, MVT::f128, Expand); - setOperationAction(ISD::FP_ROUND, MVT::f64, Legal); - setOperationAction(ISD::FP_ROUND, MVT::f32, Legal); - setTruncStoreAction(MVT::f128, MVT::f64, Expand); - setTruncStoreAction(MVT::f128, MVT::f32, Expand); - setOperationAction(ISD::BITCAST, MVT::i128, Custom); - // No implementation for these ops for PowerPC. 
- setOperationAction(ISD::FSIN , MVT::f128, Expand); - setOperationAction(ISD::FCOS , MVT::f128, Expand); - setOperationAction(ISD::FPOW, MVT::f128, Expand); - setOperationAction(ISD::FPOWI, MVT::f128, Expand); - setOperationAction(ISD::FREM, MVT::f128, Expand); - - // Handle constrained floating-point operations of fp128 - setOperationAction(ISD::STRICT_FADD, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FSUB, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FMUL, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FDIV, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FMA, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FSQRT, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FP_ROUND, MVT::f64, Legal); - setOperationAction(ISD::STRICT_FP_ROUND, MVT::f32, Legal); - setOperationAction(ISD::STRICT_FRINT, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FNEARBYINT, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FFLOOR, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FCEIL, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FTRUNC, MVT::f128, Legal); - setOperationAction(ISD::STRICT_FROUND, MVT::f128, Legal); - } + addRegisterClass(MVT::f128, &PPC::VRRCRegClass); + setOperationAction(ISD::FADD, MVT::f128, Legal); + setOperationAction(ISD::FSUB, MVT::f128, Legal); + setOperationAction(ISD::FDIV, MVT::f128, Legal); + setOperationAction(ISD::FMUL, MVT::f128, Legal); + setOperationAction(ISD::FP_EXTEND, MVT::f128, Legal); + // No extending loads to f128 on PPC. 
+ for (MVT FPT : MVT::fp_valuetypes()) + setLoadExtAction(ISD::EXTLOAD, MVT::f128, FPT, Expand); + setOperationAction(ISD::FMA, MVT::f128, Legal); + setCondCodeAction(ISD::SETULT, MVT::f128, Expand); + setCondCodeAction(ISD::SETUGT, MVT::f128, Expand); + setCondCodeAction(ISD::SETUEQ, MVT::f128, Expand); + setCondCodeAction(ISD::SETOGE, MVT::f128, Expand); + setCondCodeAction(ISD::SETOLE, MVT::f128, Expand); + setCondCodeAction(ISD::SETONE, MVT::f128, Expand); + + setOperationAction(ISD::FTRUNC, MVT::f128, Legal); + setOperationAction(ISD::FRINT, MVT::f128, Legal); + setOperationAction(ISD::FFLOOR, MVT::f128, Legal); + setOperationAction(ISD::FCEIL, MVT::f128, Legal); + setOperationAction(ISD::FNEARBYINT, MVT::f128, Legal); + setOperationAction(ISD::FROUND, MVT::f128, Legal); + + setOperationAction(ISD::SELECT, MVT::f128, Expand); + setOperationAction(ISD::FP_ROUND, MVT::f64, Legal); + setOperationAction(ISD::FP_ROUND, MVT::f32, Legal); + setTruncStoreAction(MVT::f128, MVT::f64, Expand); + setTruncStoreAction(MVT::f128, MVT::f32, Expand); + setOperationAction(ISD::BITCAST, MVT::i128, Custom); + // No implementation for these ops for PowerPC. 
+ setOperationAction(ISD::FSIN, MVT::f128, Expand); + setOperationAction(ISD::FCOS, MVT::f128, Expand); + setOperationAction(ISD::FPOW, MVT::f128, Expand); + setOperationAction(ISD::FPOWI, MVT::f128, Expand); + setOperationAction(ISD::FREM, MVT::f128, Expand); + + // Handle constrained floating-point operations of fp128 + setOperationAction(ISD::STRICT_FADD, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FSUB, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FMUL, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FDIV, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FMA, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FSQRT, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FP_ROUND, MVT::f64, Legal); + setOperationAction(ISD::STRICT_FP_ROUND, MVT::f32, Legal); + setOperationAction(ISD::STRICT_FRINT, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FNEARBYINT, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FFLOOR, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FCEIL, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FTRUNC, MVT::f128, Legal); + setOperationAction(ISD::STRICT_FROUND, MVT::f128, Legal); setOperationAction(ISD::FP_EXTEND, MVT::v2f32, Custom); setOperationAction(ISD::BSWAP, MVT::v8i16, Legal); setOperationAction(ISD::BSWAP, MVT::v4i32, Legal); @@ -1307,20 +1302,18 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM, setTargetDAGCombine(ISD::VSELECT); } - if (EnableQuadPrecision) { - setLibcallName(RTLIB::LOG_F128, "logf128"); - setLibcallName(RTLIB::LOG2_F128, "log2f128"); - setLibcallName(RTLIB::LOG10_F128, "log10f128"); - setLibcallName(RTLIB::EXP_F128, "expf128"); - setLibcallName(RTLIB::EXP2_F128, "exp2f128"); - setLibcallName(RTLIB::SIN_F128, "sinf128"); - setLibcallName(RTLIB::COS_F128, "cosf128"); - setLibcallName(RTLIB::POW_F128, "powf128"); - setLibcallName(RTLIB::FMIN_F128, "fminf128"); - setLibcallName(RTLIB::FMAX_F128, 
"fmaxf128"); - setLibcallName(RTLIB::POWI_F128, "__powikf2"); - setLibcallName(RTLIB::REM_F128, "fmodf128"); - } + setLibcallName(RTLIB::LOG_F128, "logf128"); + setLibcallName(RTLIB::LOG2_F128, "log2f128"); + setLibcallName(RTLIB::LOG10_F128, "log10f128"); + setLibcallName(RTLIB::EXP_F128, "expf128"); + setLibcallName(RTLIB::EXP2_F128, "exp2f128"); + setLibcallName(RTLIB::SIN_F128, "sinf128"); + setLibcallName(RTLIB::COS_F128, "cosf128"); + setLibcallName(RTLIB::POW_F128, "powf128"); + setLibcallName(RTLIB::FMIN_F128, "fminf128"); + setLibcallName(RTLIB::FMAX_F128, "fmaxf128"); + setLibcallName(RTLIB::POWI_F128, "__powikf2"); + setLibcallName(RTLIB::REM_F128, "fmodf128"); // With 32 condition bits, we don't need to sink (and duplicate) compares // aggressively in CodeGenPrep. @@ -8308,7 +8301,7 @@ SDValue PPCTargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG, const SDLoc &dl) const { // FP to INT conversions are legal for f128. - if (EnableQuadPrecision && (Op->getOperand(0).getValueType() == MVT::f128)) + if (Op->getOperand(0).getValueType() == MVT::f128) return Op; // Expand ppcf128 to i32 by hand for the benefit of llvm-gcc bootstrap on @@ -8576,7 +8569,7 @@ SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op, return LowerINT_TO_FPVector(Op, DAG, dl); // Conversions to f128 are legal. 
- if (EnableQuadPrecision && (Op.getValueType() == MVT::f128)) + if (Op.getValueType() == MVT::f128) return Op; if (Subtarget.hasQPX() && Op.getOperand(0).getValueType() == MVT::v4i1) { @@ -9104,10 +9097,9 @@ SDValue PPCTargetLowering::LowerBITCAST(SDValue Op, SelectionDAG &DAG) const { SDLoc dl(Op); SDValue Op0 = Op->getOperand(0); - if (!EnableQuadPrecision || - (Op.getValueType() != MVT::f128 ) || + if ((Op.getValueType() != MVT::f128) || (Op0.getOpcode() != ISD::BUILD_PAIR) || - (Op0.getOperand(0).getValueType() != MVT::i64) || + (Op0.getOperand(0).getValueType() != MVT::i64) || (Op0.getOperand(1).getValueType() != MVT::i64)) return SDValue(); @@ -16373,7 +16365,7 @@ bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(const Function &F, case Type::DoubleTyID: return true; case Type::FP128TyID: - return EnableQuadPrecision && Subtarget.hasP9Vector(); + return Subtarget.hasP9Vector(); default: return false; } diff --git a/llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll b/llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll index 366493ae76b2..bd8d6099c40f 100644 --- a/llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll +++ b/llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll @@ -1,6 +1,5 @@ -; RUN: llc -verify-machineinstrs -mcpu=pwr9 -enable-ppc-quad-precision \ -; RUN: -mtriple=powerpc64le-unknown-unknown -ppc-vsr-nums-as-vr \ -; RUN: -ppc-asm-full-reg-names < %s | FileCheck %s +; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ +; RUN: -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s | FileCheck %s @A = common global fp128 0xL00000000000000000000000000000000, align 16 @B = common global fp128 0xL00000000000000000000000000000000, align 16 diff --git a/llvm/test/CodeGen/PowerPC/constant-pool.ll b/llvm/test/CodeGen/PowerPC/constant-pool.ll index 797fc74672a2..4355cfa6ba21 100644 --- a/llvm/test/CodeGen/PowerPC/constant-pool.ll +++ b/llvm/test/CodeGen/PowerPC/constant-pool.ll @@ -1,6 +1,5 @@ ; RUN: llc -verify-machineinstrs 
-mtriple=powerpc64le-unknown-linux-gnu \ -; RUN: -mcpu=future -enable-ppc-quad-precision -ppc-asm-full-reg-names \ -; RUN: < %s | FileCheck %s +; RUN: -mcpu=future -ppc-asm-full-reg-names < %s | FileCheck %s define float @FloatConstantPool() { ; CHECK-LABEL: FloatConstantPool: diff --git a/llvm/test/CodeGen/PowerPC/f128-aggregates.ll b/llvm/test/CodeGen/PowerPC/f128-aggregates.ll index 006ad745f607..094d29e2f258 100644 --- a/llvm/test/CodeGen/PowerPC/f128-aggregates.ll +++ b/llvm/test/CodeGen/PowerPC/f128-aggregates.ll @@ -1,9 +1,8 @@ ; RUN: llc -relocation-model=pic -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -verify-machineinstrs \ -; RUN: -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s | FileCheck %s +; RUN: -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s \ +; RUN: | FileCheck %s ; RUN: llc -relocation-model=pic -mcpu=pwr9 -mtriple=powerpc64-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -verify-machineinstrs \ -; RUN: -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s \ +; RUN: -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s \ ; RUN: | FileCheck -check-prefix=CHECK-BE %s ; Testing homogeneous aggregates. 
diff --git a/llvm/test/CodeGen/PowerPC/f128-arith.ll b/llvm/test/CodeGen/PowerPC/f128-arith.ll index a957e0e6bdaa..40b123bb9276 100644 --- a/llvm/test/CodeGen/PowerPC/f128-arith.ll +++ b/llvm/test/CodeGen/PowerPC/f128-arith.ll @@ -1,5 +1,4 @@ -; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -verify-machineinstrs \ +; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown -verify-machineinstrs \ ; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | FileCheck %s ; Function Attrs: norecurse nounwind diff --git a/llvm/test/CodeGen/PowerPC/f128-bitcast.ll b/llvm/test/CodeGen/PowerPC/f128-bitcast.ll index 68069e542ffd..fca24f5fd541 100644 --- a/llvm/test/CodeGen/PowerPC/f128-bitcast.ll +++ b/llvm/test/CodeGen/PowerPC/f128-bitcast.ll @@ -1,10 +1,8 @@ -; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -verify-machineinstrs \ +; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown -verify-machineinstrs \ ; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | FileCheck %s -; RUN: llc -mcpu=pwr9 -mtriple=powerpc64-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -verify-machineinstrs \ -; RUN: -ppc-asm-full-reg-names \ -; RUN: -ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=CHECK-BE +; RUN: llc -mcpu=pwr9 -mtriple=powerpc64-unknown-unknown -verify-machineinstrs \ +; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \ +; RUN: FileCheck %s --check-prefix=CHECK-BE ; Function Attrs: norecurse nounwind readnone define i64 @getPart1(fp128 %in) local_unnamed_addr { diff --git a/llvm/test/CodeGen/PowerPC/f128-compare.ll b/llvm/test/CodeGen/PowerPC/f128-compare.ll index c876878f05fa..5376b3b3f1c5 100644 --- a/llvm/test/CodeGen/PowerPC/f128-compare.ll +++ b/llvm/test/CodeGen/PowerPC/f128-compare.ll @@ -1,5 +1,4 @@ -; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -verify-machineinstrs \ +; RUN: llc -mcpu=pwr9 
-mtriple=powerpc64le-unknown-unknown -verify-machineinstrs \ ; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | FileCheck %s @a_qp = common global fp128 0xL00000000000000000000000000000000, align 16 diff --git a/llvm/test/CodeGen/PowerPC/f128-conv.ll b/llvm/test/CodeGen/PowerPC/f128-conv.ll index 4c64341d6349..2cb317492545 100644 --- a/llvm/test/CodeGen/PowerPC/f128-conv.ll +++ b/llvm/test/CodeGen/PowerPC/f128-conv.ll @@ -1,6 +1,6 @@ ; RUN: llc -relocation-model=pic -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -ppc-vsr-nums-as-vr \ -; RUN: -verify-machineinstrs -ppc-asm-full-reg-names < %s | FileCheck %s +; RUN: -ppc-vsr-nums-as-vr -verify-machineinstrs -ppc-asm-full-reg-names < %s \ +; RUN: | FileCheck %s @mem = global [5 x i64] [i64 56, i64 63, i64 3, i64 5, i64 6], align 8 @umem = global [5 x i64] [i64 560, i64 100, i64 34, i64 2, i64 5], align 8 diff --git a/llvm/test/CodeGen/PowerPC/f128-fma.ll b/llvm/test/CodeGen/PowerPC/f128-fma.ll index 8f76520d32bd..f63ae04699f4 100644 --- a/llvm/test/CodeGen/PowerPC/f128-fma.ll +++ b/llvm/test/CodeGen/PowerPC/f128-fma.ll @@ -1,6 +1,5 @@ ; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -ppc-vsr-nums-as-vr \ -; RUN: -ppc-asm-full-reg-names < %s | FileCheck %s +; RUN: -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s | FileCheck %s define void @qpFmadd(fp128* nocapture readonly %a, fp128* nocapture %b, fp128* nocapture readonly %c, fp128* nocapture %res) { diff --git a/llvm/test/CodeGen/PowerPC/f128-passByValue.ll b/llvm/test/CodeGen/PowerPC/f128-passByValue.ll index cbccaea3bce1..8b2db6b03510 100644 --- a/llvm/test/CodeGen/PowerPC/f128-passByValue.ll +++ b/llvm/test/CodeGen/PowerPC/f128-passByValue.ll @@ -1,5 +1,4 @@ -; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -ppc-vsr-nums-as-vr \ +; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown -ppc-vsr-nums-as-vr \ ; RUN: 
-verify-machineinstrs -ppc-asm-full-reg-names < %s | FileCheck %s ; Function Attrs: norecurse nounwind readnone diff --git a/llvm/test/CodeGen/PowerPC/f128-rounding.ll b/llvm/test/CodeGen/PowerPC/f128-rounding.ll index 063eb1456fd8..56f63be5734e 100644 --- a/llvm/test/CodeGen/PowerPC/f128-rounding.ll +++ b/llvm/test/CodeGen/PowerPC/f128-rounding.ll @@ -1,5 +1,4 @@ -; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -verify-machineinstrs \ +; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown -verify-machineinstrs \ ; RUN: -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s | FileCheck %s diff --git a/llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll b/llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll index ebded9dcccbe..10d56fb2c47a 100644 --- a/llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll +++ b/llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll @@ -1,6 +1,6 @@ ; RUN: llc -relocation-model=pic -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -verify-machineinstrs -enable-ppc-quad-precision \ -; RUN: -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s | FileCheck %s +; RUN: -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names < %s \ +; RUN: | FileCheck %s @f128Array = global [4 x fp128] [fp128 0xL00000000000000004004C00000000000, fp128 0xLF000000000000000400808AB851EB851, diff --git a/llvm/test/CodeGen/PowerPC/f128-vecExtractNconv.ll b/llvm/test/CodeGen/PowerPC/f128-vecExtractNconv.ll index bae676cd09cd..8542b1072233 100644 --- a/llvm/test/CodeGen/PowerPC/f128-vecExtractNconv.ll +++ b/llvm/test/CodeGen/PowerPC/f128-vecExtractNconv.ll @@ -1,10 +1,10 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown -ppc-vsr-nums-as-vr \ ; RUN: -relocation-model=pic -ppc-asm-full-reg-names -verify-machineinstrs \ -; RUN: -enable-ppc-quad-precision < %s | FileCheck %s +; RUN: < %s | FileCheck %s ; RUN: llc -mcpu=pwr9 
-mtriple=powerpc64-unknown-unknown -ppc-vsr-nums-as-vr \ ; RUN: -ppc-asm-full-reg-names -verify-machineinstrs \ -; RUN: -enable-ppc-quad-precision < %s | FileCheck %s -check-prefix=CHECK-BE +; RUN: < %s | FileCheck %s -check-prefix=CHECK-BE ; Vector extract DWord and convert to quad precision. diff --git a/llvm/test/CodeGen/PowerPC/float-load-store-pair.ll b/llvm/test/CodeGen/PowerPC/float-load-store-pair.ll index a8ed39ee9ce0..caeff7155334 100644 --- a/llvm/test/CodeGen/PowerPC/float-load-store-pair.ll +++ b/llvm/test/CodeGen/PowerPC/float-load-store-pair.ll @@ -52,24 +52,22 @@ define signext i32 @test() nounwind { ; CHECK-NEXT: addis 3, 2, a10 at toc@ha ; CHECK-NEXT: lfd 10, a10 at toc@l(3) ; CHECK-NEXT: addis 3, 2, a11 at toc@ha -; CHECK-NEXT: addis 6, 2, a17 at toc@ha +; CHECK-NEXT: lfd 11, a11 at toc@l(3) +; CHECK-NEXT: addis 3, 2, a12 at toc@ha ; CHECK-NEXT: addis 5, 2, a16 at toc@ha +; CHECK-NEXT: addis 6, 2, a17 at toc@ha ; CHECK-NEXT: addi 6, 6, a17 at toc@l -; CHECK-NEXT: addi 5, 5, a16 at toc@l ; CHECK-NEXT: lxvx 34, 0, 6 +; CHECK-NEXT: lfd 12, a12 at toc@l(3) +; CHECK-NEXT: addis 3, 2, a13 at toc@ha +; CHECK-NEXT: addi 5, 5, a16 at toc@l ; CHECK-NEXT: addis 4, 2, a15 at toc@ha ; CHECK-NEXT: lxvx 0, 0, 5 ; CHECK-NEXT: ld 4, a15 at toc@l(4) -; CHECK-NEXT: li 5, 168 -; CHECK-NEXT: lfd 11, a11 at toc@l(3) -; CHECK-NEXT: addis 3, 2, a12 at toc@ha -; CHECK-NEXT: lfd 12, a12 at toc@l(3) -; CHECK-NEXT: addis 3, 2, a13 at toc@ha +; CHECK-NEXT: li 5, 152 ; CHECK-NEXT: lfd 13, a13 at toc@l(3) ; CHECK-NEXT: addis 3, 2, a14 at toc@ha ; CHECK-NEXT: ld 3, a14 at toc@l(3) -; CHECK-NEXT: stxvx 34, 1, 5 -; CHECK-NEXT: li 5, 152 ; CHECK-NEXT: stxvx 0, 1, 5 ; CHECK-NEXT: std 4, 144(1) ; CHECK-NEXT: std 3, 136(1) diff --git a/llvm/test/CodeGen/PowerPC/fp-strict-f128.ll b/llvm/test/CodeGen/PowerPC/fp-strict-f128.ll index 21ddb799141d..cab949a6ebd2 100644 --- a/llvm/test/CodeGen/PowerPC/fp-strict-f128.ll +++ b/llvm/test/CodeGen/PowerPC/fp-strict-f128.ll @@ -1,5 +1,6 @@ ; 
NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -verify-machineinstrs -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s -mtriple=powerpc64le-unknown-linux -mcpu=pwr9 -enable-ppc-quad-precision=true | FileCheck %s +; RUN: llc -verify-machineinstrs -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr \ +; RUN: < %s -mtriple=powerpc64le-unknown-linux -mcpu=pwr9 | FileCheck %s declare fp128 @llvm.experimental.constrained.fadd.f128(fp128, fp128, metadata, metadata) declare fp128 @llvm.experimental.constrained.fsub.f128(fp128, fp128, metadata, metadata) diff --git a/llvm/test/CodeGen/PowerPC/global-address-non-got-indirect-access.ll b/llvm/test/CodeGen/PowerPC/global-address-non-got-indirect-access.ll index f29d9bd251ca..5c49760a9f45 100644 --- a/llvm/test/CodeGen/PowerPC/global-address-non-got-indirect-access.ll +++ b/llvm/test/CodeGen/PowerPC/global-address-non-got-indirect-access.ll @@ -1,7 +1,7 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \ -; RUN: -mcpu=future -enable-ppc-quad-precision -ppc-asm-full-reg-names \ -; RUN: -ppc-vsr-nums-as-vr < %s | FileCheck %s +; RUN: -mcpu=future -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s \ +; RUN: | FileCheck %s @_ZL13StaticBoolVar = internal unnamed_addr global i8 0, align 1 @_ZL19StaticSignedCharVar = internal unnamed_addr global i8 0, align 1 diff --git a/llvm/test/CodeGen/PowerPC/pcrel-got-indirect.ll b/llvm/test/CodeGen/PowerPC/pcrel-got-indirect.ll index 7f7659b356ee..c838308f3aa6 100644 --- a/llvm/test/CodeGen/PowerPC/pcrel-got-indirect.ll +++ b/llvm/test/CodeGen/PowerPC/pcrel-got-indirect.ll @@ -1,7 +1,7 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \ -; RUN: -mcpu=future -enable-ppc-quad-precision -ppc-asm-full-reg-names \ -; RUN: -ppc-vsr-nums-as-vr < %s | FileCheck %s +; 
RUN: -mcpu=future -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s \ +; RUN: | FileCheck %s %struct.Struct = type { i8, i16, i32 } diff --git a/llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll b/llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll index feb20465c510..4762b10af7dd 100644 --- a/llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll +++ b/llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll @@ -1,5 +1,5 @@ ; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \ -; RUN: -enable-ppc-quad-precision -ppc-asm-full-reg-names < %s | FileCheck %s +; RUN: -ppc-asm-full-reg-names < %s | FileCheck %s ; RUN: llc -verify-machineinstrs -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown \ ; RUN: -ppc-asm-full-reg-names < %s | FileCheck %s -check-prefix=CHECK-PWR8 \ ; RUN: -implicit-check-not "\" diff --git a/llvm/test/CodeGen/PowerPC/recipest.ll b/llvm/test/CodeGen/PowerPC/recipest.ll index 4276d944af8f..7ceddd95e573 100644 --- a/llvm/test/CodeGen/PowerPC/recipest.ll +++ b/llvm/test/CodeGen/PowerPC/recipest.ll @@ -1176,14 +1176,7 @@ define fp128 @hoo5_fmf(fp128 %a) #1 { ; ; CHECK-P9-LABEL: hoo5_fmf: ; CHECK-P9: # %bb.0: -; CHECK-P9-NEXT: mflr 0 -; CHECK-P9-NEXT: std 0, 16(1) -; CHECK-P9-NEXT: stdu 1, -32(1) -; CHECK-P9-NEXT: bl sqrtl -; CHECK-P9-NEXT: nop -; CHECK-P9-NEXT: addi 1, 1, 32 -; CHECK-P9-NEXT: ld 0, 16(1) -; CHECK-P9-NEXT: mtlr 0 +; CHECK-P9-NEXT: xssqrtqp 2, 2 ; CHECK-P9-NEXT: blr %r = call reassoc ninf afn fp128 @llvm.sqrt.f128(fp128 %a) ret fp128 %r @@ -1216,14 +1209,7 @@ define fp128 @hoo5_safe(fp128 %a) #1 { ; ; CHECK-P9-LABEL: hoo5_safe: ; CHECK-P9: # %bb.0: -; CHECK-P9-NEXT: mflr 0 -; CHECK-P9-NEXT: std 0, 16(1) -; CHECK-P9-NEXT: stdu 1, -32(1) -; CHECK-P9-NEXT: bl sqrtl -; CHECK-P9-NEXT: nop -; CHECK-P9-NEXT: addi 1, 1, 32 -; CHECK-P9-NEXT: ld 0, 16(1) -; CHECK-P9-NEXT: mtlr 0 +; CHECK-P9-NEXT: xssqrtqp 2, 2 ; CHECK-P9-NEXT: blr %r = call fp128 @llvm.sqrt.f128(fp128 %a) ret fp128 %r From llvm-commits at lists.llvm.org Fri Jul 10 11:27:57 2020 From: llvm-commits 
at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:27:57 +0000 (UTC) Subject: [PATCH] D83437: [PowerPC] Enable default support of quad precision operations In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG90b1a710aede: [PowerPC] Enable default support of quad precision operations (authored by lei). Changed prior to commit: https://reviews.llvm.org/D83437?vs=276579&id=277110#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83437/new/ https://reviews.llvm.org/D83437 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/builtins-ppc-p9-f128.ll llvm/test/CodeGen/PowerPC/constant-pool.ll llvm/test/CodeGen/PowerPC/f128-aggregates.ll llvm/test/CodeGen/PowerPC/f128-arith.ll llvm/test/CodeGen/PowerPC/f128-bitcast.ll llvm/test/CodeGen/PowerPC/f128-compare.ll llvm/test/CodeGen/PowerPC/f128-conv.ll llvm/test/CodeGen/PowerPC/f128-fma.ll llvm/test/CodeGen/PowerPC/f128-passByValue.ll llvm/test/CodeGen/PowerPC/f128-rounding.ll llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll llvm/test/CodeGen/PowerPC/f128-vecExtractNconv.ll llvm/test/CodeGen/PowerPC/float-load-store-pair.ll llvm/test/CodeGen/PowerPC/fp-strict-f128.ll llvm/test/CodeGen/PowerPC/global-address-non-got-indirect-access.ll llvm/test/CodeGen/PowerPC/pcrel-got-indirect.ll llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll llvm/test/CodeGen/PowerPC/recipest.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83437.277110.patch Type: text/x-patch Size: 23758 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 11:28:36 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:28:36 +0000 (UTC) Subject: [PATCH] D83578: [test] Replace a fragile lit feature (substitution in an argument place) with command -v In-Reply-To: References: Message-ID: MaskRay updated this revision to Diff 277111. MaskRay added a comment. CMD -> EXE Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83578/new/ https://reviews.llvm.org/D83578 Files: llvm/test/tools/llvm-ar/tool-name.test llvm/test/tools/llvm-dlltool/tool-name.test llvm/test/tools/llvm-lib/tool-name.test llvm/test/tools/llvm-objcopy/tool-name.test llvm/test/tools/llvm-ranlib/tool-name.test -------------- next part -------------- A non-text attachment was scrubbed... Name: D83578.277111.patch Type: text/x-patch Size: 3747 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 11:28:42 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:28:42 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <8ec8df4a8906b84f091bf192a291ac39@localhost.localdomain> jfb marked an inline comment as done. jfb added subscribers: ojhunt, peter.smith, dim, reames, emaste, probinson. jfb added inline comments. ================ Comment at: llvm/docs/Security.rst:41 +* Shayne Hiet-Block (Microsoft) +* Steve Klabnik (Oxide Computer Company; Rust) ---------------- Tagging @dim @emaste @kristof.beyls @mattdr @ojhunt @probinson @peter.smith @reames @Shayne, they're the folks who have Phabricator accounts from this list. 
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70326/new/

https://reviews.llvm.org/D70326

From llvm-commits at lists.llvm.org Fri Jul 10 11:34:53 2020
From: llvm-commits at lists.llvm.org (Steven Wan via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 18:34:53 +0000 (UTC)
Subject: [PATCH] D83579: [llvm-ar][AIX] Unsupport test on AIX
Message-ID:

stevewan created this revision.
Herald added subscribers: llvm-commits, rupprecht, MaskRay.
Herald added a project: LLVM.

The test `error-opening-directory.test` fails on AIX because AIX allows open() and read() on a directory. This patch adds `system-aix` to the test's `# UNSUPPORTED:` list so that it does not run on AIX.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83579

Files:
  llvm/test/tools/llvm-ar/error-opening-directory.test

Index: llvm/test/tools/llvm-ar/error-opening-directory.test
===================================================================
--- llvm/test/tools/llvm-ar/error-opening-directory.test
+++ llvm/test/tools/llvm-ar/error-opening-directory.test
@@ -1,6 +1,6 @@
-## Unsupported on FreeBSD as FreeBSD 12 and earlier allow reading directories
-## by default.
-# UNSUPPORTED: system-freebsd
+## Unsupported on AIX and FreeBSD as AIX and FreeBSD 12 and earlier allow
+## reading directories by default.
+# UNSUPPORTED: system-freebsd, system-aix
 
 # RUN: rm -rf %t && mkdir -p %t

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83579.277113.patch
Type: text/x-patch
Size: 575 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 11:37:22 2020
From: llvm-commits at lists.llvm.org (Jan Vesely via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 18:37:22 +0000 (UTC)
Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition
In-Reply-To:
References:
Message-ID: <004957a540e3114d73d64031882f9b5c@localhost.localdomain>

jvesely added a comment.
In D83473#2143394, @bbrezillon wrote:

> In D83473#2143152, @jvesely wrote:
>
> > What is the problem this patch is trying to address?
>
>
> Well, the primary goal was to have consistent values in `clang/lib/Headers/opencl-c-base.h` and `libclc/generic/include/clc/float/definitions.h` to avoid the mess when one links against libclc but includes `opencl-c-base.h`.
> Not to mention that having 2 conflicting definitions in headers that both live in the same code base and are supposed to represent the same thing is confusing, to say the least.

That combination is not supported and should not be attempted. If you want to link with libclc, you should use the clc headers installed by libclc.
I agree that not having multiple conflicting headers is preferable, and I have no idea why clang added their own header instead of using libclc.
I'd say a better solution would be to either drop the libclc headers entirely and switch the build to use clang's CLC header, or drop clang's CLC header. Trying to sync two different locations is just asking for trouble.

>
>
>> The specs do not mandate these two values to be different.
>
> It might be me misunderstanding the spec here. I read
>
>> "The value of FP_ILOGB0 shall be either INT_MIN or -INT_MAX. The value of FP_ILOGBNAN shall be either INT_MAX or INT_MIN."
>
> as (`FP_ILOGB0=INT_MIN` and `FP_ILOGBNAN=INT_MAX`) or (`FP_ILOGB0=-INT_MAX` and `FP_ILOGBNAN=INT_MIN`).
> But you're probably right, there's nothing stating that `FP_ILOGB0` and `FP_ILOGBNAN` should map to different values, it's just the pattern I've seen in various libc implementations.
>
>> On the more practical side.
>> This patch only changes the fp32 implementation to return the new value, leaving the fp64 implementation to return `INT_MIN` in both cases.
>
> Oops. The fp64 version should definitely be patched accordingly.
>
>> The implementation now returns `FP_ILOGBNAN` even for `Inf` input, which is not correct.
>
> Hm, nope, it still returns `0x7fffffff`, which is `INT_MAX`.
> I think you're referring to my comment, where I'm floating the idea of merging the 2 tests into a single one since `FP_ILOGBNAN` is now also equal to `INT_MAX`, but as mentioned there, I think clarity prevails over optimization (especially since clang might optimize that for us anyway).

It doesn't matter whether the value is written in hex, in decimal, or as a define. The results of `ilogb(Inf)` and `ilogb(NaN)` are now indistinguishable to the caller.
The spec is rather clear that the value of `FP_ILOGBNAN` shall be returned only if the input is NaN. Since libclc already uses `INT_MAX` for `Inf`, `FP_ILOGBNAN` cannot use the same value.

>
>
>> CLC spec doesn't talk about `Inf` inputs, but the libm behaviour is to return `INT_MAX`, which might be useful.
>
> Yep, and I didn't change that part.
>
>> If `FP_ILOGBNAN` and `FP_ILOGB0` need to be different it'd be better to use `FP_ILOGBNAN == INT_MIN` and `FP_ILOGB0 == -INT_MAX`.
>
> Except you'd then have a mismatch between `clang/lib/Headers/opencl-c-base.h` and `libclc/generic/include/clc/float/definitions.h`.
> So maybe the answer is don't include `opencl-c-base.h` when you link against libclc, but as I mentioned above, the fact that both headers living in the same code base define 2 different values for the same thing is confusing.

There are potentially tons of other differences between the libclc and clang opencl headers; trying to sync them is futile.
libclc has historically supported multiple clang versions (the current head can be built using clang 3.9 -> clang 11). I'd strongly prefer to keep it that way until at least clc 1.2 is fully supported, but in the end, it's a question for @tstellar.
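
To make the point about `ilogb(Inf)` and `ilogb(NaN)` concrete, here is a minimal C++ sketch. It is not libclc's implementation: `sketch_ilogb`, its `nan_result` parameter (standing in for `FP_ILOGBNAN`), and the choice of `-INT_MAX` for a zero input are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical stand-in for an ilogb implementation. `nan_result` plays the
// role of FP_ILOGBNAN so that different candidate values can be compared.
int sketch_ilogb(float x, int nan_result) {
  if (std::isnan(x))
    return nan_result;   // value under discussion
  if (std::isinf(x))
    return INT_MAX;      // libm-style result for Inf, as described in the thread
  if (x == 0.0f)
    return -INT_MAX;     // assumed FP_ILOGB0 for this sketch only
  return std::ilogb(x);  // ordinary finite, non-zero input
}

// True when a caller can tell an Inf input from a NaN input by looking at the
// returned value alone.
bool inf_and_nan_distinguishable(int nan_result) {
  return sketch_ilogb(INFINITY, nan_result) != sketch_ilogb(NAN, nan_result);
}
```

Under these assumptions, passing `INT_MAX` as `nan_result` makes the two results collide, while `INT_MIN` keeps them apart.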
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83473/new/

https://reviews.llvm.org/D83473

From llvm-commits at lists.llvm.org Fri Jul 10 11:38:47 2020
From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 18:38:47 +0000 (UTC)
Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes.
In-Reply-To:
References:
Message-ID:

reames requested changes to this revision.
reames added a comment.
This revision now requires changes to proceed.

(Comments on code outside StatepointLowering. Minor +1 question)

Overall, if you address the style comments from this and the last comment, I think we're close to an LGTM.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:225
   unsigned NumVRegs = HasVRegVariadicDefs ? NumResults : II.getNumDefs();
+  if (Node->getMachineOpcode() == TargetOpcode::STATEPOINT)
+    NumVRegs = NumResults;
----------------
This line looks unnecessary provided the defs list is marked variadic. If you want to leave it for the moment, that's fine, but I think we can clean this up in a follow-up change. (To be specific, OutOperandList in Target.td for STATEPOINT can be variable_ops, and this line disappears.)

================
Comment at: llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp:128
   const MCInstrDesc &II = TII->get(Def->getMachineOpcode());
-  if (ResNo >= II.getNumDefs() &&
-      II.ImplicitDefs[ResNo - II.getNumDefs()] == Reg)
+  if (ResNo >= II.getNumDefs() && II.hasImplicitDefOfPhysReg(Reg))
     PhysReg = Reg;
----------------
I don't understand the implication of this change, can you explain? On the surface, this looks likely okay, but not understanding the reason for the change is bothering me.
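
For readers following the ScheduleDAGSDNodes.cpp question, the sketch below only shows how the two predicates in that hunk can disagree; it does not state the patch author's motivation. `InstrDescSketch` and its members are hypothetical simplifications, not the real `MCInstrDesc`: the old form compares the register against the implicit def at one fixed position, whereas the new form asks whether the register appears anywhere in the implicit-def list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified model of an instruction description.
struct InstrDescSketch {
  unsigned NumDefs;                    // number of explicit defs
  std::vector<unsigned> ImplicitDefs;  // implicitly defined physical registers

  // Positional check, mirroring the shape of the removed condition:
  // result number ResNo must map to exactly the implicit def at
  // index ResNo - NumDefs. A bounds check is added here for safety.
  bool implicitDefAt(unsigned ResNo, unsigned Reg) const {
    if (ResNo < NumDefs)
      return false;
    unsigned Idx = ResNo - NumDefs;
    return Idx < ImplicitDefs.size() && ImplicitDefs[Idx] == Reg;
  }

  // Membership check, in the spirit of the added condition:
  // Reg may sit at any position in the implicit-def list.
  bool hasImplicitDef(unsigned Reg) const {
    for (unsigned R : ImplicitDefs)
      if (R == Reg)
        return true;
    return false;
  }
};
```

With one explicit def and implicit defs `{10, 20}`, result number 1 matches register 10 under both checks, but register 20 only under the membership check. Whether that difference matters for STATEPOINT is exactly the open question in the review.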
================ Comment at: llvm/lib/CodeGen/TargetLoweringBase.cpp:1051 MIB.add(MO); + if (TiedTo < i) + MIB->tieOperands(TiedTo, MIB->getNumOperands() - 1); ---------------- Please add a comment which explains you're relying on all the definitions coming before any FI and thus the indices not changing. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Fri Jul 10 11:38:50 2020 From: llvm-commits at lists.llvm.org (Steven Wan via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:38:50 +0000 (UTC) Subject: [PATCH] D83579: [llvm-ar][AIX] Unsupport test on AIX In-Reply-To: References: Message-ID: <3b21d774c8f0709c53b6be210afb52b4@localhost.localdomain> stevewan added a comment. FYI @sameerarora101, we on AIX are experiencing the exact same failure as what has been fixed in https://reviews.llvm.org/D82786 for FreeBSD. This patch adds AIX to the unsupported list as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83579/new/ https://reviews.llvm.org/D83579 From llvm-commits at lists.llvm.org Fri Jul 10 11:39:11 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:39:11 +0000 (UTC) Subject: [PATCH] D83499: [MSAN runtime] Add poison_stack function that also updates origin In-Reply-To: References: Message-ID: eugenis requested changes to this revision. eugenis added a comment. This revision now requires changes to proceed. I don't think 1.5% is good enough to introduce new runtime functions for the off-by-default code path. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83499/new/ https://reviews.llvm.org/D83499 From llvm-commits at lists.llvm.org Fri Jul 10 11:39:53 2020 From: llvm-commits at lists.llvm.org (Anh Tuyen Tran via llvm-commits) Date: Fri, 10 Jul 2020 11:39:53 -0700 (PDT) Subject: [llvm] e541e1b - [NFC] Separate Peeling Properties into its own struct (re-land after minor fix) Message-ID: <5f08b5f9.1c69fb81.77d8f.029b@mx.google.com> Author: Sidharth Baveja Date: 2020-07-10T18:39:30Z New Revision: e541e1b757237172c247904b670c9894d6b3759d URL: https://github.com/llvm/llvm-project/commit/e541e1b757237172c247904b670c9894d6b3759d DIFF: https://github.com/llvm/llvm-project/commit/e541e1b757237172c247904b670c9894d6b3759d.diff LOG: [NFC] Separate Peeling Properties into its own struct (re-land after minor fix) Summary: This patch separates the peeling specific parameters from the UnrollingPreferences, and creates a new struct called PeelingPreferences. Functions which used the UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct. 
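
As a rough illustration of the split described in the summary, the following self-contained C++ sketch models the two preference structs with stand-in types. The `...Sketch` names, the default values, and the example hook are assumptions for illustration only; the real definitions are the ones in `TargetTransformInfo.h` shown in the diff.

```cpp
#include <cassert>

// Stand-in for the unrolling knobs that remain in UnrollingPreferences.
struct UnrollingPreferencesSketch {
  unsigned Count = 0;
  bool UnrollRemainder = false;
};

// Stand-in for the new PeelingPreferences; field names follow the diff,
// default values are assumed for this sketch.
struct PeelingPreferencesSketch {
  unsigned PeelCount = 0;
  bool AllowPeeling = true;
  bool AllowLoopNestsPeeling = false;
  bool PeelProfiledIterations = true;
};

// Hypothetical target hook: after the split, a target that wants to tune
// peeling only receives (and only touches) the peeling struct.
void getPeelingPreferencesSketch(PeelingPreferencesSketch &PP) {
  PP.AllowPeeling = false;
}
```

The design point is separation of concerns: code that tunes peeling no longer needs access to the unrolling parameters, and vice versa.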
Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580 Added: Modified: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/CodeGen/BasicTTIImpl.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index 695b7d6061c0..b6698eefdb01 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -450,11 +450,6 @@ class TargetTransformInfo { /// transformation will select an unrolling factor 
based on the current cost /// threshold and other factors. unsigned Count; - /// A forced peeling factor (the number of bodied of the original loop - /// that should be peeled off before the loop body). When set to 0, the - /// unrolling transformation will select a peeling factor based on profile - /// information and other factors. - unsigned PeelCount; /// Default unroll count for loops with run-time trip count. unsigned DefaultUnrollRuntimeCount; // Set the maximum unrolling factor. The unrolling factor may be selected @@ -488,19 +483,10 @@ class TargetTransformInfo { bool Force; /// Allow using trip count upper bound to unroll loops. bool UpperBound; - /// Allow peeling off loop iterations. - bool AllowPeeling; - /// Allow peeling off loop iterations for loop nests. - bool AllowLoopNestsPeeling; /// Allow unrolling of all the iterations of the runtime loop remainder. bool UnrollRemainder; /// Allow unroll and jam. Used to enable unroll and jam for the target. bool UnrollAndJam; - /// Allow peeling basing on profile. Uses to enable peeling off all - /// iterations basing on provided profile. - /// If the value is true the peeling cost model can decide to peel only - /// some iterations and in this case it will set this to false. - bool PeelProfiledIterations; /// Threshold for unroll and jam, for inner loop size. The 'Threshold' /// value above is used during unroll and jam for the outer loop size. /// This value is used in the same manner to limit the size of the inner @@ -534,6 +520,28 @@ class TargetTransformInfo { /// intrinsic is supported. bool emitGetActiveLaneMask() const; + // Parameters that control the loop peeling transformation + struct PeelingPreferences { + /// A forced peeling factor (the number of bodied of the original loop + /// that should be peeled off before the loop body). When set to 0, the + /// a peeling factor based on profile information and other factors. + unsigned PeelCount; + /// Allow peeling off loop iterations. 
+ bool AllowPeeling; + /// Allow peeling off loop iterations for loop nests. + bool AllowLoopNestsPeeling; + /// Allow peeling basing on profile. Uses to enable peeling off all + /// iterations basing on provided profile. + /// If the value is true the peeling cost model can decide to peel only + /// some iterations and in this case it will set this to false. + bool PeelProfiledIterations; + }; + + /// Get target-customized preferences for the generic loop peeling + /// transformation. The caller will initialize \p PP with the current + /// target-independent defaults with information from \p L and \p SE. + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) const; /// @} /// \name Scalar Target Information @@ -1282,6 +1290,8 @@ class TargetTransformInfo::Concept { virtual bool isLoweredToCall(const Function *F) = 0; virtual void getUnrollingPreferences(Loop *L, ScalarEvolution &, UnrollingPreferences &UP) = 0; + virtual void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) = 0; virtual bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, @@ -1560,6 +1570,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept { UnrollingPreferences &UP) override { return Impl.getUnrollingPreferences(L, SE, UP); } + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) override { + return Impl.getPeelingPreferences(L, SE, PP); + } bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) override { diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index ca7106ab98aa..0ce975d6d4b5 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -150,6 +150,9 @@ class TargetTransformInfoImplBase { void 
getUnrollingPreferences(Loop *, ScalarEvolution &, TTI::UnrollingPreferences &) {} + void getPeelingPreferences(Loop *, ScalarEvolution &, + TTI::PeelingPreferences &) {} + bool isLegalAddImmediate(int64_t Imm) { return false; } bool isLegalICmpImmediate(int64_t Imm) { return false; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index c6a9a65ae6c1..f9d32eadd23e 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -451,6 +451,14 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { UP.BEInsns = 2; } + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + PP.PeelCount = 0; + PP.AllowPeeling = true; + PP.AllowLoopNestsPeeling = false; + PP.PeelProfiledIterations = true; + } + bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE, AssumptionCache &AC, TargetLibraryInfo *LibInfo, diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h index 1970cefcefba..bb3d02b95956 100644 --- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h +++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h @@ -94,6 +94,7 @@ bool UnrollRuntimeLoopRemainder( void computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE); bool canPeel(Loop *L); @@ -119,6 +120,8 @@ bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, + bool &UseUpperBound); void simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI, @@ -133,9 +136,13 @@ TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional 
UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling, - Optional UserFullUnrollMaxCount); + Optional UserUpperBound, Optional UserFullUnrollMaxCount); + +TargetTransformInfo::PeelingPreferences +gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, + const TargetTransformInfo &TTI, + Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling); unsigned ApproximateLoopSize(const Loop *L, unsigned &NumCalls, bool &NotDuplicatable, bool &Convergent, diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 87c6f83938ed..2f051e53790b 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -327,6 +327,11 @@ void TargetTransformInfo::getUnrollingPreferences( return TTIImpl->getUnrollingPreferences(L, SE, UP); } +void TargetTransformInfo::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + PeelingPreferences &PP) const { + return TTIImpl->getPeelingPreferences(L, SE, PP); +} + bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const { return TTIImpl->isLegalAddImmediate(Imm); } diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index be0c51b83a25..cf6de797727b 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -859,6 +859,11 @@ void AArch64TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, getFalkorUnrollingPreferences(L, SE, UP); } +void AArch64TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} + Value *AArch64TTIImpl::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType) { switch (Inst->getIntrinsicID()) { diff --git 
a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h index 6b1e5d5083e2..1f029689a60e 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h @@ -153,6 +153,9 @@ class AArch64TTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, Type *ExpectedType); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp index 8783427b5002..542a5f006c0f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -236,6 +236,10 @@ void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, } } +void AMDGPUTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} unsigned GCNTTIImpl::getHardwareNumberOfRegisters(bool Vec) const { // The concept of vector registers doesn't really exist. Some packed vector // operations operate on the normal 32-bit registers. @@ -997,6 +1001,11 @@ void GCNTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, CommonTTI.getUnrollingPreferences(L, SE, UP); } +void GCNTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + CommonTTI.getPeelingPreferences(L, SE, PP); +} + unsigned R600TTIImpl::getHardwareNumberOfRegisters(bool Vec) const { return 4 * 128; // XXX - 4 channels. Should these count as vector instead? 
} @@ -1103,3 +1112,8 @@ void R600TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { CommonTTI.getUnrollingPreferences(L, SE, UP); } + +void R600TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + CommonTTI.getPeelingPreferences(L, SE, PP); +} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h index b8a027c79bfc..3364a9bcaccb 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h @@ -61,6 +61,9 @@ class AMDGPUTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); }; class GCNTTIImpl final : public BasicTTIImplBase { @@ -146,6 +149,9 @@ class GCNTTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); return TTI::PSK_FastHardware; @@ -264,6 +270,8 @@ class R600TTIImpl final : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); unsigned getHardwareNumberOfRegisters(bool Vec) const; unsigned getNumberOfRegisters(bool Vec) const; unsigned getRegisterBitWidth(bool Vector) const; diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index 44dfb9e8c129..74b1331216a0 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -1582,6 
+1582,11 @@ void ARMTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } +void ARMTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} + bool ARMTTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty, TTI::ReductionFlags Flags) const { return ST->hasMVEIntegerOps(); diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h index 5d914227c968..537a546361ee 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h @@ -251,6 +251,8 @@ class ARMTTIImpl : public BasicTTIImplBase { bool emitGetActiveLaneMask() const; + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); bool shouldBuildLookupTablesForConstant(Constant *C) const { // In the ROPI and RWPI relocation models we can't have pointers to global // variables or functions in constant data, so don't convert switches to diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp index 76df4e8e1931..80c8736cb74a 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp @@ -78,12 +78,17 @@ HexagonTTIImpl::getPopcntSupport(unsigned IntTyWidthInBit) const { void HexagonTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP) { UP.Runtime = UP.Partial = true; +} + +void HexagonTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); // Only try to peel innermost loops with small runtime trip counts. 
if (L && L->empty() && canPeel(L) && SE.getSmallConstantTripCount(L) == 0 && SE.getSmallConstantMaxTripCount(L) > 0 && SE.getSmallConstantMaxTripCount(L) <= 5) { - UP.PeelCount = 2; + PP.PeelCount = 2; } } diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h index 3365c5bf1cb1..5fe397486402 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h @@ -64,6 +64,9 @@ class HexagonTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + /// Bias LSR towards creating post-increment opportunities. bool shouldFavorPostInc() const; diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp index 5c14d0f1a24d..3873c73fb2e0 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp @@ -155,3 +155,8 @@ void NVPTXTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Partial = UP.Runtime = true; UP.PartialThreshold = UP.Threshold / 4; } + +void NVPTXTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h index 88156f687284..cb832031f1ad 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h @@ -95,6 +95,10 @@ class NVPTXTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) { // 
Volatile loads/stores are only supported for shared and global address // spaces, or for generic AS that maps to them. diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp index f2c746a14299..53556ffc267d 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp @@ -568,6 +568,10 @@ void PPCTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, BaseT::getUnrollingPreferences(L, SE, UP); } +void PPCTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} // This function returns true to allow using coldcc calling convention. // Returning true results in coldcc being used for functions which are cold at // all call sites when the callers of the functions are not calling any other diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h index b831789d3e6e..d998521084e1 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h @@ -66,6 +66,8 @@ class PPCTTIImpl : public BasicTTIImplBase { TargetLibraryInfo *LibInfo); void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp index 36141426e27d..864200e5f71c 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp @@ -294,6 +294,10 @@ void SystemZTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, UP.Force = true; } +void SystemZTTIImpl::getPeelingPreferences(Loop *L, 
ScalarEvolution &SE, + TTI::PeelingPreferences &PP) { + BaseT::getPeelingPreferences(L, SE, PP); +} bool SystemZTTIImpl::isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2) { diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h index d20541774da1..7f8f7f6f923f 100644 --- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h +++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h @@ -50,6 +50,9 @@ class SystemZTTIImpl : public BasicTTIImplBase { void getUnrollingPreferences(Loop *L, ScalarEvolution &SE, TTI::UnrollingPreferences &UP); + void getPeelingPreferences(Loop *L, ScalarEvolution &SE, + TTI::PeelingPreferences &PP); + bool isLSRCostLess(TargetTransformInfo::LSRCost &C1, TargetTransformInfo::LSRCost &C2); /// @} diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp index f0ece1faa5fd..285cba6ee205 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp @@ -158,7 +158,8 @@ static bool computeUnrollAndJamCount( const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned OuterTripCount, unsigned OuterTripMultiple, unsigned OuterLoopSize, unsigned InnerTripCount, - unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP) { + unsigned InnerLoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP) { // First up use computeUnrollCount from the loop unroller to get a count // for unrolling the outer loop, plus any loops requiring explicit // unrolling we leave to the unroller. 
This uses UP.Threshold / @@ -168,7 +169,8 @@ static bool computeUnrollAndJamCount( bool UseUpperBound = false; bool ExplicitUnroll = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, ORE, OuterTripCount, MaxTripCount, - /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, UseUpperBound); + /*MaxOrZero*/ false, OuterTripMultiple, OuterLoopSize, UP, PP, + UseUpperBound); if (ExplicitUnroll || UseUpperBound) { // If the user explicitly set the loop as unrolled, dont UnJ it. Leave it // for the unroller instead. @@ -282,7 +284,9 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, OptimizationRemarkEmitter &ORE, int OptLevel) { TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(L, SE, TTI, nullptr, nullptr, OptLevel, None, - None, None, None, None, None, None, None); + None, None, None, None, None); + TargetTransformInfo::PeelingPreferences PP = + gatherPeelingPreferences(L, SE, TTI, None, None); if (AllowUnrollAndJam.getNumOccurrences() > 0) UP.UnrollAndJam = AllowUnrollAndJam; if (UnrollAndJamThreshold.getNumOccurrences() > 0) @@ -367,7 +371,7 @@ tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, // Decide if, and by how much, to unroll bool IsCountSetExplicitly = computeUnrollAndJamCount( L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount, - OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP); + OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP, PP); if (UP.Count <= 1) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp index ec56610e41e5..87f40bb7ba85 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp @@ -193,9 +193,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, int OptLevel, Optional UserThreshold, Optional UserCount, Optional UserAllowPartial, Optional UserRuntime, - Optional UserUpperBound, Optional UserAllowPeeling, - Optional UserAllowProfileBasedPeeling, - Optional UserFullUnrollMaxCount) { + Optional UserUpperBound, Optional UserFullUnrollMaxCount) { TargetTransformInfo::UnrollingPreferences UP; // Set up the defaults @@ -206,7 +204,6 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.PartialThreshold = 150; UP.PartialOptSizeThreshold = 0; UP.Count = 0; - UP.PeelCount = 0; UP.DefaultUnrollRuntimeCount = 8; UP.MaxCount = std::numeric_limits::max(); UP.FullUnrollMaxCount = std::numeric_limits::max(); @@ -218,10 +215,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.AllowExpensiveTripCount = false; UP.Force = false; UP.UpperBound = false; - UP.AllowPeeling = true; - UP.AllowLoopNestsPeeling = false; UP.UnrollAndJam = false; - UP.PeelProfiledIterations = true; UP.UnrollAndJamInnerLoopThreshold = 60; UP.MaxIterationsCountToAnalyze = UnrollMaxIterationsCountToAnalyze; @@ -249,8 +243,6 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.MaxCount = UnrollMaxCount; if (UnrollFullMaxCount.getNumOccurrences() > 0) UP.FullUnrollMaxCount = UnrollFullMaxCount; - if (UnrollPeelCount.getNumOccurrences() > 0) - UP.PeelCount = UnrollPeelCount; if (UnrollAllowPartial.getNumOccurrences() > 0) UP.Partial = UnrollAllowPartial; if (UnrollAllowRemainder.getNumOccurrences() > 0) @@ -259,10 +251,6 @@ TargetTransformInfo::UnrollingPreferences 
llvm::gatherUnrollingPreferences( UP.Runtime = UnrollRuntime; if (UnrollMaxUpperBound == 0) UP.UpperBound = false; - if (UnrollAllowPeeling.getNumOccurrences() > 0) - UP.AllowPeeling = UnrollAllowPeeling; - if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) - UP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; if (UnrollUnrollRemainder.getNumOccurrences() > 0) UP.UnrollRemainder = UnrollUnrollRemainder; if (UnrollMaxIterationsCountToAnalyze.getNumOccurrences() > 0) @@ -281,16 +269,45 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences( UP.Runtime = *UserRuntime; if (UserUpperBound.hasValue()) UP.UpperBound = *UserUpperBound; - if (UserAllowPeeling.hasValue()) - UP.AllowPeeling = *UserAllowPeeling; - if (UserAllowProfileBasedPeeling.hasValue()) - UP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; if (UserFullUnrollMaxCount.hasValue()) UP.FullUnrollMaxCount = *UserFullUnrollMaxCount; return UP; } +TargetTransformInfo::PeelingPreferences +llvm::gatherPeelingPreferences(Loop *L, ScalarEvolution &SE, + const TargetTransformInfo &TTI, + Optional UserAllowPeeling, + Optional UserAllowProfileBasedPeeling) { + TargetTransformInfo::PeelingPreferences PP; + + // Default values + PP.PeelCount = 0; + PP.AllowPeeling = true; + PP.AllowLoopNestsPeeling = false; + PP.PeelProfiledIterations = true; + + // Get Target Specifc Values + TTI.getPeelingPreferences(L, SE, PP); + + // User Specified Values using cl::opt + if (UnrollPeelCount.getNumOccurrences() > 0) + PP.PeelCount = UnrollPeelCount; + if (UnrollAllowPeeling.getNumOccurrences() > 0) + PP.AllowPeeling = UnrollAllowPeeling; + if (UnrollAllowLoopNestsPeeling.getNumOccurrences() > 0) + PP.AllowLoopNestsPeeling = UnrollAllowLoopNestsPeeling; + + // User Specifed values provided by argument + if (UserAllowPeeling.hasValue()) + PP.AllowPeeling = *UserAllowPeeling; + if (UserAllowProfileBasedPeeling.hasValue()) + PP.PeelProfiledIterations = *UserAllowProfileBasedPeeling; + + return PP; +} 
+ namespace { /// A struct to densely store the state of an instruction after unrolling at @@ -761,7 +778,8 @@ bool llvm::computeUnrollCount( ScalarEvolution &SE, const SmallPtrSetImpl &EphValues, OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount, bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize, - TargetTransformInfo::UnrollingPreferences &UP, bool &UseUpperBound) { + TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, bool &UseUpperBound) { // Check for explicit Count. // 1st priority is unroll count set by "unroll-count" option. @@ -863,8 +881,8 @@ bool llvm::computeUnrollCount( } // 4th priority is loop peeling. - computePeelCount(L, LoopSize, UP, TripCount, SE); - if (UP.PeelCount) { + computePeelCount(L, LoopSize, UP, PP, TripCount, SE); + if (PP.PeelCount) { UP.Runtime = false; UP.Count = 1; return ExplicitUnroll; @@ -1067,8 +1085,9 @@ static LoopUnrollResult tryToUnrollLoop( TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences( L, SE, TTI, BFI, PSI, OptLevel, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial, ProvidedRuntime, ProvidedUpperBound, - ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling, ProvidedFullUnrollMaxCount); + TargetTransformInfo::PeelingPreferences PP = gatherPeelingPreferences( + L, SE, TTI, ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling); // Exit early if unrolling is disabled. For OptForSize, we pick the loop size // as threshold later on. @@ -1142,7 +1161,7 @@ static LoopUnrollResult tryToUnrollLoop( bool UseUpperBound = false; bool IsCountSetExplicitly = computeUnrollCount( L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero, - TripMultiple, LoopSize, UP, UseUpperBound); + TripMultiple, LoopSize, UP, PP, UseUpperBound); if (!UP.Count) return LoopUnrollResult::Unmodified; // Unroll factor (Count) must be less or equal to TripCount. 
@@ -1157,7 +1176,7 @@ static LoopUnrollResult tryToUnrollLoop( LoopUnrollResult UnrollResult = UnrollLoop( L, {UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount, - UseUpperBound, MaxOrZero, TripMultiple, UP.PeelCount, UP.UnrollRemainder, + UseUpperBound, MaxOrZero, TripMultiple, PP.PeelCount, UP.UnrollRemainder, ForgetAllSCEV}, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop); if (UnrollResult == LoopUnrollResult::Unmodified) @@ -1189,7 +1208,7 @@ static LoopUnrollResult tryToUnrollLoop( // If the loop was peeled, we already "used up" the profile information // we had, so we don't want to unroll or peel again. if (UnrollResult != LoopUnrollResult::FullyUnrolled && - (IsCountSetExplicitly || (UP.PeelProfiledIterations && UP.PeelCount))) + (IsCountSetExplicitly || (PP.PeelProfiledIterations && PP.PeelCount))) L->setLoopAlreadyUnrolled(); return UnrollResult; diff --git a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp index 43dfaf3e50dc..c653aacbee6c 100644 --- a/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp @@ -279,19 +279,20 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount, // Return the number of iterations we want to peel off. void llvm::computePeelCount(Loop *L, unsigned LoopSize, TargetTransformInfo::UnrollingPreferences &UP, + TargetTransformInfo::PeelingPreferences &PP, unsigned &TripCount, ScalarEvolution &SE) { assert(LoopSize > 0 && "Zero loop size is not allowed!"); - // Save the UP.PeelCount value set by the target in - // TTI.getUnrollingPreferences or by the flag -unroll-peel-count. - unsigned TargetPeelCount = UP.PeelCount; - UP.PeelCount = 0; + // Save the PP.PeelCount value set by the target in + // TTI.getPeelingPreferences or by the flag -unroll-peel-count. + unsigned TargetPeelCount = PP.PeelCount; + PP.PeelCount = 0; if (!canPeel(L)) return; // Only try to peel innermost loops by default. 
// The constraint can be relaxed by the target in TTI.getUnrollingPreferences // or by the flag -unroll-allow-loop-nests-peeling. - if (!UP.AllowLoopNestsPeeling && !L->empty()) + if (!PP.AllowLoopNestsPeeling && !L->empty()) return; // If the user provided a peel count, use that. @@ -299,13 +300,13 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, if (UserPeelCount) { LLVM_DEBUG(dbgs() << "Force-peeling first " << UnrollForcePeelCount << " iterations.\n"); - UP.PeelCount = UnrollForcePeelCount; - UP.PeelProfiledIterations = true; + PP.PeelCount = UnrollForcePeelCount; + PP.PeelProfiledIterations = true; return; } // Skip peeling if it's disabled. - if (!UP.AllowPeeling) + if (!PP.AllowPeeling) return; unsigned AlreadyPeeled = 0; @@ -354,8 +355,8 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount << " iteration(s) to turn" << " some Phis into invariants.\n"); - UP.PeelCount = DesiredPeelCount; - UP.PeelProfiledIterations = false; + PP.PeelCount = DesiredPeelCount; + PP.PeelProfiledIterations = false; return; } } @@ -367,7 +368,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, return; // Do not apply profile base peeling if it is disabled. 
- if (!UP.PeelProfiledIterations) + if (!PP.PeelProfiledIterations) return; // If we don't know the trip count, but have reason to believe the average // trip count is low, peeling should be beneficial, since we will usually @@ -387,7 +388,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize, (LoopSize * (*PeelCount + 1) <= UP.Threshold)) { LLVM_DEBUG(dbgs() << "Peeling first " << *PeelCount << " iterations.\n"); - UP.PeelCount = *PeelCount; + PP.PeelCount = *PeelCount; return; } LLVM_DEBUG(dbgs() << "Requested peel count: " << *PeelCount << "\n"); From llvm-commits at lists.llvm.org Fri Jul 10 11:41:45 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:41:45 +0000 (UTC) Subject: [PATCH] D83579: [llvm-ar][AIX] Unsupport test on AIX In-Reply-To: References: Message-ID: <0a7264a8a40fd20808104b2bc62cde22@localhost.localdomain> hubert.reinterpretcast accepted this revision. hubert.reinterpretcast added a comment. This revision is now accepted and ready to land. LGTM. Please give other reviewers some time to comment; thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83579/new/ https://reviews.llvm.org/D83579 From llvm-commits at lists.llvm.org Fri Jul 10 11:44:24 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:44:24 +0000 (UTC) Subject: [PATCH] D79625: [PowerPC] Extend .reloc directive on PowerPC In-Reply-To: References: Message-ID: <79d6b7cdf130b5240e4e878985f056ec@localhost.localdomain> nemanjai added inline comments. 
================
Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp:118
       { "fixup_ppc_pcrel34", 0, 34, MCFixupKindInfo::FKF_IsPCRel },
+      { "fixup_ppc_linker_opt", 0, 0, 0 },
       { "fixup_ppc_nofixup", 0, 0, 0 }
----------------
This is going to need `MCFixupKindInfo::FKF_IsPCRel` or else it needs to be moved to the non-PC-Rel section of the switch in `getRelocType()`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79625/new/

https://reviews.llvm.org/D79625

From llvm-commits at lists.llvm.org Fri Jul 10 11:44:35 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 18:44:35 +0000 (UTC)
Subject: [PATCH] D83579: [llvm-ar][test][AIX] Unsupport error-opening-directory.test on AIX
In-Reply-To:
References:
Message-ID: <27f1341cdff234ee553c23e84983f649@localhost.localdomain>

MaskRay accepted this revision.
MaskRay added a comment.

Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83579/new/

https://reviews.llvm.org/D83579

From llvm-commits at lists.llvm.org Fri Jul 10 11:48:43 2020
From: llvm-commits at lists.llvm.org (Sameer Arora via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 18:48:43 +0000 (UTC)
Subject: [PATCH] D83579: [llvm-ar][test][AIX] Unsupport error-opening-directory.test on AIX
In-Reply-To:
References:
Message-ID: <603351c361f47b78dcf20e2d5420e7ff@localhost.localdomain>

sameerarora101 accepted this revision.
sameerarora101 added a comment.

@stevewan Thanks for reporting and the fix. Sorry about the breakage!

Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83579/new/ https://reviews.llvm.org/D83579 From llvm-commits at lists.llvm.org Fri Jul 10 11:49:43 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:49:43 +0000 (UTC) Subject: [PATCH] D83578: [test] Replace a fragile lit feature (substitution in an argument place) with command -v In-Reply-To: References: Message-ID: <46effa1452c30a2358ea62a68ff69f85@localhost.localdomain> sammccall accepted this revision. sammccall added a comment. This revision is now accepted and ready to land. I believe if we do this without removing the substitutions, we're effectively running `EXE=$(command -v /full/path/to/llvm-ranlib)` which is a bit weird and IMO doesn't make sense as a final state. I haven't done much investigation into the history/motivation of these substitutions, but I imagine we can remove at least some of them, and I'm happy to help out. How do you think we should sequence this? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83578/new/ https://reviews.llvm.org/D83578 From llvm-commits at lists.llvm.org Fri Jul 10 11:55:25 2020 From: llvm-commits at lists.llvm.org (Boris Brezillon via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 18:55:25 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: <50362268ce7249c61ba81c9a59e7054d@localhost.localdomain> bbrezillon added a comment. >>> The implementation now returns `FP_ILOGBNAN` even for `Inf` input, which is not correct. >> >> Hm, nope, it still returns `0x7fffffff`, which is `INT_MAX`. I think you're referring to my comment, where I'm emitting the idea of merging the 2 tests into a single one since `FP_ILOGBNAN` is now also equal to `INT_MAX`, but as mentioned there, I think clarity prevails over optimization (especially since clang might optimize that for us anyway). 
> > It doesn't matter if the value is in hex or decimal or a define, The results of `ilogb(Inf)` and `ilogb(NaN)` is now indistinguishable to the caller. > The spec is rather clear that the value of `FP_ILOGBNAN` shall be returned only if the input is NaN. Hm, can you point me to the portion of the spec where this is stated? If there's such a rule stating that `FP_ILOGBNAN` should be unique, the current implementation is wrong since both `FP_ILOGBNAN` and `FP_ILOGB0` map to the same value. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Fri Jul 10 11:59:19 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Fri, 10 Jul 2020 11:59:19 -0700 (PDT) Subject: [llvm] a0b5496 - [PredicateInfo] Add test for multiple branches on same condition (NFC) Message-ID: <5f08ba87.1c69fb81.4a405.fece@mx.google.com> Author: Nikita Popov Date: 2020-07-10T20:59:03+02:00 New Revision: a0b549602612fa2577068bcdcae3bfbc6c9c3264 URL: https://github.com/llvm/llvm-project/commit/a0b549602612fa2577068bcdcae3bfbc6c9c3264 DIFF: https://github.com/llvm/llvm-project/commit/a0b549602612fa2577068bcdcae3bfbc6c9c3264.diff LOG: [PredicateInfo] Add test for multiple branches on same condition (NFC) This illustrates a case where RenamedOp does not correspond to the value used in the condition, which it ideally should. 
Added: llvm/test/Transforms/Util/PredicateInfo/branch-on-same-cond.ll Modified: Removed: ################################################################################ diff --git a/llvm/test/Transforms/Util/PredicateInfo/branch-on-same-cond.ll b/llvm/test/Transforms/Util/PredicateInfo/branch-on-same-cond.ll new file mode 100644 index 000000000000..7cf52d1bed3c --- /dev/null +++ b/llvm/test/Transforms/Util/PredicateInfo/branch-on-same-cond.ll @@ -0,0 +1,64 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -S -print-predicateinfo < %s 2>&1 >/dev/null | FileCheck %s + +; FIXME: RenamedOp should be %cmp or %x in all cases here, +; which is the value used in the condition. +define i32 @test(i32 %x) { +; CHECK-LABEL: @test( +; CHECK-NEXT: entry: +; CHECK-NEXT: br label [[BB1:%.*]] +; CHECK: bb1: +; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[X:%.*]], 0 +; CHECK: RenamedOp: [[CMP]] +; CHECK: [[CMP_0:%.*]] = call i1 @llvm.ssa.copy.{{.*}}(i1 [[CMP]]) +; CHECK: RenamedOp: [[X]] +; CHECK: [[X_0:%.*]] = call i32 @llvm.ssa.copy.{{.*}}(i32 [[X]]) +; CHECK-NEXT: br i1 [[CMP]], label [[BB2:%.*]], label [[EXIT1:%.*]] +; CHECK: bb2: +; CHECK: RenamedOp: [[CMP_0]] +; CHECK: [[CMP_0_1:%.*]] = call i1 @llvm.ssa.copy.{{.*}}(i1 [[CMP_0]]) +; CHECK: RenamedOp: [[X]] +; CHECK: [[X_0_1:%.*]] = call i32 @llvm.ssa.copy.{{.*}}(i32 [[X_0]]) +; CHECK: RenamedOp: [[X_0]] +; CHECK: [[X_0_4:%.*]] = call i32 @llvm.ssa.copy.{{.*}}(i32 [[X_0]]) +; CHECK-NEXT: br i1 [[CMP_0]], label [[BB3:%.*]], label [[EXIT2:%.*]] +; CHECK: bb3: +; CHECK: RenamedOp: [[X]] +; CHECK: [[X_0_1_2:%.*]] = call i32 @llvm.ssa.copy.{{.*}}(i32 [[X_0_1]]) +; CHECK: RenamedOp: [[X_0_1]] +; CHECK: [[X_0_1_3:%.*]] = call i32 @llvm.ssa.copy.{{.*}}(i32 [[X_0_1]]) +; CHECK-NEXT: br i1 [[CMP_0_1]], label [[EXIT3:%.*]], label [[EXIT4:%.*]] +; CHECK: exit1: +; CHECK-NEXT: ret i32 0 +; CHECK: exit2: +; CHECK-NEXT: ret i32 [[X_0_4]] +; CHECK: exit3: +; CHECK-NEXT: ret i32 [[X_0_1_2]] +; CHECK: exit4: 
+; CHECK-NEXT: ret i32 [[X_0_1_3]] +; +entry: + br label %bb1 + +bb1: + %cmp = icmp eq i32 %x, 0 + br i1 %cmp, label %bb2, label %exit1 + +bb2: + br i1 %cmp, label %bb3, label %exit2 + +bb3: + br i1 %cmp, label %exit3, label %exit4 + +exit1: + ret i32 0 + +exit2: + ret i32 %x + +exit3: + ret i32 %x + +exit4: + ret i32 %x +} From llvm-commits at lists.llvm.org Fri Jul 10 12:07:54 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:07:54 +0000 (UTC) Subject: [PATCH] D83575: [NewPM] Allow passes to never be skipped In-Reply-To: References: Message-ID: aeubanks updated this revision to Diff 277116. aeubanks added a comment. Add tests (taken from https://reviews.llvm.org/D82344, thanks ychen) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83575/new/ https://reviews.llvm.org/D83575 Files: llvm/include/llvm/Analysis/CGSCCPassManager.h llvm/include/llvm/IR/PassInstrumentation.h llvm/include/llvm/IR/PassManager.h llvm/include/llvm/IR/PassManagerInternal.h llvm/include/llvm/Transforms/Scalar/LoopPassManager.h llvm/unittests/IR/PassBuilderCallbacksTest.cpp polly/include/polly/ScopPass.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83575.277116.patch Type: text/x-patch Size: 9255 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 12:09:53 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:09:53 +0000 (UTC) Subject: [PATCH] D82344: [NewPM] Make PMs and adaptor passes for PMs unskippable In-Reply-To: References: Message-ID: <5d3d2cf80a11607bf5ee85c5d18c1e3b@localhost.localdomain> aeubanks added a comment. I think I prefer https://reviews.llvm.org/D83575 over this, this uses too much template metaprogramming for my liking. WDYT? 
================ Comment at: llvm/include/llvm/Analysis/CGSCCPassManager.h:359 + static bool isSkippable() { + return !std::is_base_of::value; + } ---------------- This is saying that only a ModuleToPostOrderCGSCCPassAdaptor around a CGSCCPassManager isn't skippable, all other ModuleToPostOrderCGSCCPassAdaptor are? What about a wrapper around a normal CGSCC pass that isn't skippable? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82344/new/ https://reviews.llvm.org/D82344 From llvm-commits at lists.llvm.org Fri Jul 10 12:12:02 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:12:02 +0000 (UTC) Subject: [PATCH] D83581: [WebAssembly] Prefer v128.const for constant splats Message-ID: tlively created this revision. tlively added a reviewer: aheejin. Herald added subscribers: llvm-commits, sunfish, hiraditya, jgravelle-google, sbc100, dschuff. Herald added a project: LLVM. In BUILD_VECTOR lowering, we used to generally prefer using splats over v128.const instructions because v128.const has a very large encoding. However, in d5b7a4e2e8 we switched to preferring consts because they are expected to be more efficient in engines. This patch updates the ISel patterns to match this current preference. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83581 Files: llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td llvm/test/CodeGen/WebAssembly/simd-arith.ll llvm/test/CodeGen/WebAssembly/simd.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83581.277117.patch Type: text/x-patch Size: 9915 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 12:12:33 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Fri, 10 Jul 2020 12:12:33 -0700 (PDT) Subject: [llvm] 9ff310d - AArch64: Fix unused variables Message-ID: <5f08bda1.1c69fb81.f3780.fd54@mx.google.com> Author: Matt Arsenault Date: 2020-07-10T15:12:25-04:00 New Revision: 9ff310d5bfa56bb5f29645e2d2fee12115c3ddb7 URL: https://github.com/llvm/llvm-project/commit/9ff310d5bfa56bb5f29645e2d2fee12115c3ddb7 DIFF: https://github.com/llvm/llvm-project/commit/9ff310d5bfa56bb5f29645e2d2fee12115c3ddb7.diff LOG: AArch64: Fix unused variables Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp index ec9683a560f8..11a8d5def429 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp @@ -885,7 +885,6 @@ bool AArch64CallLowering::lowerTailCall( const auto &Forwards = FuncInfo->getForwardedMustTailRegParms(); // Do the actual argument marshalling. - SmallVector PhysRegs; OutgoingArgHandler Handler(MIRBuilder, MRI, MIB, AssignFnFixed, AssignFnVarArg, true, FPDiff); if (!handleAssignments(MIRBuilder, OutArgs, Handler)) @@ -1003,7 +1002,6 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, TRI->emitReservedArgRegCallError(MF); // Do the actual argument marshalling. 
- SmallVector PhysRegs; OutgoingArgHandler Handler(MIRBuilder, MRI, MIB, AssignFnFixed, AssignFnVarArg, false); if (!handleAssignments(MIRBuilder, OutArgs, Handler)) From llvm-commits at lists.llvm.org Fri Jul 10 12:18:59 2020 From: llvm-commits at lists.llvm.org (Nemanja Ivanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:18:59 +0000 (UTC) Subject: [PATCH] D79864: [PowerPC] Add linker opt for PC Relative GOT indirect accesses In-Reply-To: References: Message-ID: nemanjai added inline comments. ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp:98 + + // User of the GOT-indirect address. + if (IsPartOfGOTToPCRelPair.hasValue() && !IsPartOfGOTToPCRelPair.getValue()) ---------------- ``` // For example, the load that will get the relocation as follows: // .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) // lwa 3, 4(3) ``` ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp:109 + + // Producer of the GOT-indirect address. + if (IsPartOfGOTToPCRelPair.hasValue() && IsPartOfGOTToPCRelPair.getValue()) ---------------- ``` // For example, the prefixed load from the got that will get the label as follows: // pld 3, vec at got@pcrel(0), 1 // .Lpcrel1: ``` ================ Comment at: llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp:110 + printInstruction(MI, Address, O); + O << "\n"; + O << SymbolName; ---------------- Is there not something like `printLabel()` and `printRelocDirective()` for this? It seems odd to be manually printing it like this. If there isn't that's fine, just seems odd. ================ Comment at: llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp:246 + Register DefReg; + Register UseInstDef; + bool StillValid; ---------------- I know this is based on my suggestion, but I think the name doesn't make sense if this is a store (since a store does not define anything). Perhaps just `UseReg` is fine. 
================ Comment at: llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp:255 + // collect potential pairs of GOT indirect access instructions. + for (auto BBI = MBB.instr_begin(); BBI != MBB.instr_end(); ++BBI) { + // Look for the initial GOT indirect load. ---------------- I think a simpler implementation for lines 255-311 would be something like this: ``` for (auto BBI = MBB.instr_begin(); BBI != MBB.instr_end(); ++BBI) { // Look for the initial GOT indirect load. if (isGOTPLDpc(*BBI)) { GOTDefUsePair CurrentPair{BBI, MachineBasicBlock::iterator(), BBI->getOperand(0).getReg(), PPC::NoRegister, true}; CandPairs.push_back(CurrentPair); continue; } // We haven't encountered any new PLD instructions, nothing to check. if (CandPairs.empty()) continue; // Run through the candidate pairs and see if any of the registers // defined in the PLD instructions are used by this instruction. // Note: the size of CandPairs can change in the loop. for (unsigned Idx = 0; Idx < CandPairs.size(); Idx++) { GOTDefUsePair &Pair = CandPairs[Idx]; // The instruction does not use or modify this PLD's def reg, ignore it. if (!BBI->readsRegister(Pair.DefReg, TRI) && !BBI->modifiesRegister(Pair.DefReg, TRI)) continue; // The use needs to be used in the address compuation and not // as the register being stored for a store. const MachineOperand *UseOp = hasPCRelativeForm(*BBI) ? &BBI->getOperand(2) : nullptr; // Check for a valid use. if (UseOp && UseOp->isReg() && UseOp->getReg() == Pair.DefReg && UseOp->isUse() && UseOp->isKill()) { Pair.UseInst = BBI; Pair.UseInstDef = BBI->getOperand(0).getReg(); ValidPairs.push_back(Pair); } CandPairs.erase(CandPairs.begin() + Idx); } } // Go through all of the pairs and check for any more valid uses. for (auto Pair = ValidPairs.begin(); Pair != ValidPairs.end(); Pair++) { // We shouldn't be here if we don't have a valid pair. 
assert(Pair->UseInst.isValid() && Pair->StillValid && "Kept an invalid def/use pair for GOT PCRel opt"); ``` The idea is that after the loop over the basic block, we just have a data structure containing pairs of (PLD, MemAccess) where the register defined by PLD is guaranteed to not have uses in between them. Then for each such pair, we check for defs/uses of the register defined by the MemAccess and we're done. ================ Comment at: llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp:268 + // If this instruction is not a PLDpc as above and the vector is still + // empty then there is no point in going futher. Find the next + // instruction. ---------------- s/futher/further ================ Comment at: llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll:159 +; CHECK-NEXT: mtfprwz f1, r4 +; CHECK-NEXT: lxvx vs0, 0, r3 +; CHECK-NEXT: pld r3, outputVi32 at got@pcrel(0), 1 ---------------- ``` ; FIXME: we should always convert X-Form instructions that use ; PPC::ZERO[8] to the corresponding D-Form so we can perform this opt. ``` ================ Comment at: llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll:390 + ret i32* @input32 +} ---------------- Please add `attributes #0 = { nounwind }` and decorate the functions with it so we don't get the `.cfi` directives since they just make the test case more busy. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79864/new/ https://reviews.llvm.org/D79864 From llvm-commits at lists.llvm.org Fri Jul 10 12:24:30 2020 From: llvm-commits at lists.llvm.org (Alexei Starovoitov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:24:30 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <0f060b6fb1828c36638e06db9d2c73fc@localhost.localdomain> ast added a comment. does pahole convert float/double to int ? Is it really the case? I think it's better to skip float/double when they are part of a struct and leave a hole in there. 
I worry that representing them as a 'char' array may cause backward compatibility issues later. If pahole is doing such a hack now, it probably should be fixed too. As for structs with >64k members, I think the hard error is better. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Fri Jul 10 12:28:08 2020 From: llvm-commits at lists.llvm.org (Jinsong Ji via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:28:08 +0000 (UTC) Subject: [PATCH] D83276: [PowerPC] Generate CFI directives when probing in prologue In-Reply-To: References: Message-ID: jsji accepted this revision as: jsji. jsji added a comment. This revision is now accepted and ready to land. LGTM. ================ Comment at: llvm/lib/Target/PowerPC/PPCFrameLowering.cpp:1375 + const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo(); + const bool needsCFI = MF.needsFrameMoves() && !Subtarget.isAIXABI(); auto StackAllocMIPos = llvm::find_if(PrologMBB, [](MachineInstr &MI) { ---------------- Can we add a comment here about excluding `AIX`? ================ Comment at: llvm/lib/Target/PowerPC/PPCFrameLowering.cpp:1451 + buildDefCFAReg(PrologMBB, {MI}, FPReg); + buildCFAOffset(PrologMBB, {MI}, 0); + } ---------------- We should merge these two CFIs into one. ================ Comment at: llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll:59 +; CHECK-LE-NEXT: .cfi_def_cfa_register r1 +; CHECK-LE-NEXT: .cfi_def_cfa_offset 4144 ; CHECK-LE-NEXT: li r3, 3 ---------------- These CFIs are not generated by this patch; it would be better to pre-commit the test case changes caused by removing the `nounwind` attribute first, so that we can see the real changes.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83276/new/ https://reviews.llvm.org/D83276 From llvm-commits at lists.llvm.org Fri Jul 10 12:30:13 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:30:13 +0000 (UTC) Subject: [PATCH] D83583: [openmp] Remove OMPConstants.cpp and replace it by OMP.cpp generated by tablegen Message-ID: clementval created this revision. clementval added reviewers: sstefan1, jdoerfert, jdenny. Herald added subscribers: llvm-commits, guansong, hiraditya, yaxunl, mgorny. Herald added a project: LLVM. Diff D83176 moved the last piece of code out of OMPConstants.cpp, so this file is now only useful for including the tablegen-generated file. This patch replaces OMPConstants.cpp with an OMP.cpp generated by tablegen. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83583 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/lib/Frontend/OpenMP/CMakeLists.txt llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83583.277124.patch Type: text/x-patch Size: 6028 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 12:35:33 2020 From: llvm-commits at lists.llvm.org (Christudasan Devadasan via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:35:33 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. Message-ID: cdevadas created this revision. cdevadas added reviewers: arsenm, sameerds. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM.
It is possible that LowerSwitch pass leaves certain blocks unreachable from the entry. If not removed, these dead blocks can trouble the subsequent passes, violating certain properties. In the AMDGPU target flow, the UnreachableBlockElim pass that removes the dead blocks is invoked just before preISel passes. The LowerSwitch pass is currently inserted during preISel. This patch inserts the Lowerswitch pass in an appropriate place to ensure any dead blocks resulting from the transformation will be removed. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83584 Files: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll Index: llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll @@ -0,0 +1,60 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs -stop-after=amdgpu-isel -o - %s | FileCheck -check-prefix=GCN %s +define void @test() #1 { + ; Clean up the unreachable blocks introduced with LowerSwitch pass. + ; This test ensures that, in the pass flow, UnreachableBlockElim pass + ; follows the LowerSwitch. Otherwise, this testcase will crash soonafter + ; the instruction selection due to the incomplete PHI node in an MBB whose + ; incoming values were never codegenerated. 
+ ; + ; GCN-LABEL: name: test + ; GCN: bb.{{[0-9]+}}.entry: + ; GCN: bb.{{[0-9]+}}.entry.true.blk: + ; GCN: bb.{{[0-9]+}}.entry.false.blk: + ; GCN: bb.{{[0-9]+}}.switch.blk: + + ; GCN-NOT: bb.{{[0-9]+}}.preheader.blk + ; GCN-NOT: bb.{{[0-9]+}}.pre.false.blk: + ; GCN-NOT: bb.{{[0-9]+}}.unreach.blk: + ; GCN-NOT: PHI + + ; GCN: bb.{{[0-9]+}}.exit: + entry: + %idx = tail call i32 @llvm.amdgcn.workitem.id.x() #0 + br i1 undef, label %entry.true.blk, label %entry.false.blk + + entry.true.blk: ; preds = %entry + %exit.cmp = icmp ult i32 %idx, 3 + br i1 %exit.cmp, label %switch.blk, label %exit + + entry.false.blk: ; preds = %entry + unreachable + + switch.blk: ; preds = %entry.true.blk + switch i32 %idx, label %preheader.blk [ + i32 0, label %exit + i32 1, label %exit + i32 2, label %exit + ] + + preheader.blk: ; preds = %switch.blk + %pre.exit = icmp ult i32 %idx, 5 + br i1 %pre.exit, label %unreach.blk, label %pre.false.blk + + pre.false.blk: ; preds = %preheader.blk + %call.pre.false = tail call i32 @func(i32 %idx) #0 + br label %unreach.blk + + unreach.blk: ; preds = %preheader.blk, %pre.false.blk + %phi.val = phi i32 [ %call.pre.false, %pre.false.blk ], [ undef, %preheader.blk ] + store i32 %phi.val, i32* undef + unreachable + + exit: ; preds = %switch.blk + ret void +} + +declare i32 @llvm.amdgcn.workitem.id.x() #0 +declare i32 @func(i32)#0 + +attributes #0 = { nounwind readnone } +attributes #1 = { nounwind } Index: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -787,10 +787,11 @@ if (EnableLoadStoreVectorizer) addPass(createLoadStoreVectorizerPass()); + + addPass(createLowerSwitchPass()); } bool AMDGPUPassConfig::addPreISel() { - addPass(createLowerSwitchPass()); addPass(createFlattenCFGPass()); return false; } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83584.277125.patch Type: text/x-patch Size: 3015 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 12:37:48 2020 From: llvm-commits at lists.llvm.org (Albion Fung via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:37:48 +0000 (UTC) Subject: [PATCH] D83516: [PowerPC][Power10] Vector shift Instruction definitions and MC Tests In-Reply-To: References: Message-ID: <528e306c06201000dceef7d02198a229@localhost.localdomain> Conanap updated this revision to Diff 277127. Conanap added a comment. Added a new line to the end of ppc64-encoding-ISA31.txt Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83516/new/ https://reviews.llvm.org/D83516 Files: llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83516.277127.patch Type: text/x-patch Size: 11966 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 12:43:22 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:43:22 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: dantrushin marked 11 inline comments as done. dantrushin added inline comments. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:225 unsigned NumVRegs = HasVRegVariadicDefs ? NumResults : II.getNumDefs(); + if (Node->getMachineOpcode() == TargetOpcode::STATEPOINT) + NumVRegs = NumResults; ---------------- reames wrote: > This line looks unnecessary provided the defs list is marked variadic. If you want to leave it for the moment, that's fine, but I think we can clean this up in a follow up change. > > (To be specific, OutOperandList in Target.td for STATEPOINT can be variable_ops, and this line disappears.) 
`II` here is `MCInstrDesc`. Statepoint has 0 defs in its `MCInstrDesc`. (It is `MachineInstr::getNumDefs` that works correctly for variadic outs.) So I really need it here. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp:128 const MCInstrDesc &II = TII->get(Def->getMachineOpcode()); - if (ResNo >= II.getNumDefs() && - II.ImplicitDefs[ResNo - II.getNumDefs()] == Reg) + if (ResNo >= II.getNumDefs() && II.hasImplicitDefOfPhysReg(Reg)) PhysReg = Reg; ---------------- reames wrote: > I don't understand the implication of this change, can you explain? On the surface, this looks likely okay, but not understanding the reason for the change is bothering me. Again, `MCInstrDesc::getNumDefs()` for statepoint always returns 0. This code assumes that all 'dynamic' defs are implicit, so it accesses `II.ImplicitDefs` without any checks. But statepoint has no implicit defs, so its `II.ImplicitDefs` is NULL. That function (`hasImplicitDefOfPhysReg`) does the same thing, but with proper checks. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp:640 // (base[0], ptr[0], base[1], ptr[1], ...) for (unsigned i = 0; i < SI.Bases.size(); ++i) { + SDValue Derived = Builder.getValue(SI.Ptrs[i]); ---------------- reames wrote: > This block of code is functionally broken when base != derived. You have only added the vreg information for the derived, and would need to spill the base so that the GC can find it. The fix is trivial, pass "true" for the base case when Base != Derived. > > (Also, this is the probably the profitable lowering, so don't be tempted to add the base to a reg.) Vreg information is only needed for derived pointers, to relocate them (base pointers are not relocated). So I could just assign all bases to VRegs (if at all possible). I just did not want to have the same value in both a stack slot and a vreg. So IMHO the better approach is to check if the base is found in `LowerAsVReg` (or is that what you meant?)
As for profitability, our goal is to shift as much spilling as possible to the register allocator, isn't it? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Fri Jul 10 12:43:23 2020 From: llvm-commits at lists.llvm.org (Alina Sbirlea via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:43:23 +0000 (UTC) Subject: [PATCH] D77341: [DomTree] Replace ChildrenGetter with GraphTraits over GraphDiff. In-Reply-To: References: Message-ID: asbirlea added a comment. Thank you for the testing. Could you help with instructions on how to run the tracker myself? My local testing showed a negligible regression for mafft and a negligible improvement on other benchmarks, so it looked like noise on average. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77341/new/ https://reviews.llvm.org/D77341 From llvm-commits at lists.llvm.org Fri Jul 10 12:43:32 2020 From: llvm-commits at lists.llvm.org (Jan Korous via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:43:32 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <7e605f9d1a7fb944b8aa9b997e592bc0@localhost.localdomain> jkorous added inline comments. ================ Comment at: llvm/docs/Security.rst:204 +* Language front-ends, such as clang, for which a malicious input file can cause undesirable behavior. For example, a maliciously-crafter C or Rust source file can cause arbitrary code to execute in LLVM. These parts of LLVM haven't been hardened, and compiling untrusted code usually also includes running utilities such as `make` which can more readily perform malicious things. +* *FUTURE*: this section will be expanded. ---------------- We should probably include tools that need to be run with elevated privileges of some sort. For example, lldb getting root.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 From llvm-commits at lists.llvm.org Fri Jul 10 12:44:59 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:44:59 +0000 (UTC) Subject: [PATCH] D82549: [AIX][XCOFF] parsing xcoff object file auxiliary header In-Reply-To: References: Message-ID: hubert.reinterpretcast marked an inline comment as done. hubert.reinterpretcast added inline comments. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:489 +#define PrintAuxMember32(H, S, T) \ + if (offsetof(XCOFFAuxiliaryHeader32, T) + sizeof(AuxHeader->T) <= AuxiSize) \ + W.print##H(S, AuxHeader->T) ---------------- You can use `XCOFFAuxiliaryHeader32::T` in the `sizeof`. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:499 + DictScope DS(W, "AuxiliaryHeader"); + PrintAuxMember32(Hex, "Magic", AuxMagic); + PrintAuxMember32(Hex, "Version", Version); ---------------- Since the macro refers to it, move the macro definition into the scope of `AuxiSize`. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:527 + PrintAuxMember32(Number, "Section number for .tdata", SecNumOfTData); + PrintAuxMember32(Number, "Section number for .tbss", SecNumOfTBSS); +} ---------------- For symmetry, move the `undef` into the function body. ================ Comment at: llvm/tools/llvm-readobj/XCOFFDumper.cpp:486 +#define PrintAuxMember(H, S, T, X) \ + W.print##H(S, T); \ ---------------- DiggerLin wrote: > hubert.reinterpretcast wrote: > > This macro does not operate within the confines of what a function can do with respect to its caller (it can cause the caller to return early). I do not believe that using a function-like naming style is appropriate. I also do not believe that using such a macro for control flow is desirable. 
> > > > You can encode a table (yes, a macro is okay for that) with much the same information: > > (format, description, pointer-to-member, offset in the table past-the-end of the member) > > > > and use that table in the place where this macro is being invoked. > for the function printNumber is a overload function. using a macro, the complie will determine which version of printNumber will be used when compile. if using a table, I think how to make the code call the correct overload version of printNumber based in the parameter type when running may be complicated. It can be solved using pointers-to-member in the table (I guess we don't really want to use those for selecting functions), but the macro is in better shape anyway now; thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82549/new/ https://reviews.llvm.org/D82549 From llvm-commits at lists.llvm.org Fri Jul 10 12:47:29 2020 From: llvm-commits at lists.llvm.org (Denis Antrushin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:47:29 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: dantrushin updated this revision to Diff 277132. dantrushin marked 2 inline comments as done. dantrushin added a comment. Addressed comments. Now it appears that `Ptr2ResNo` and `LowerAsVReg` somewhat duplicate each other. Perhaps I can rename `Ptr2ResNo` to `LowerAsVReg` and get rid of old `LowerAsVReg`? Or it will hurt readability? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 Files: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp llvm/lib/CodeGen/SelectionDAG/StatepointLowering.h llvm/lib/CodeGen/TargetLoweringBase.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81648.277132.patch Type: text/x-patch Size: 15792 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 12:48:24 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:48:24 +0000 (UTC) Subject: [PATCH] D83583: [openmp] Remove OMPConstants.cpp and replace it by OMP.cpp generated by tablegen In-Reply-To: References: Message-ID: <62c2f444d40370c4b0c0ad2b13b17bc8@localhost.localdomain> clementval marked an inline comment as done. clementval added inline comments. ================ Comment at: llvm/lib/Frontend/OpenMP/CMakeLists.txt:2 +set(LLVM_TARGET_DEFINITIONS ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend/OpenMP/OMP.td) +tablegen(LLVM OMP.cpp --gen-directive-impl) +add_public_tablegen_target(omp_cpp) ---------------- The reason for renaming `OMP.cpp.inc` to `OMP.cpp` is that it is now directly a source file of a library, and the name `OMP.cpp.inc` will be used for generated code pieces that have to be included in clang/flang. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83583/new/ https://reviews.llvm.org/D83583 From llvm-commits at lists.llvm.org Fri Jul 10 12:52:31 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via llvm-commits) Date: Fri, 10 Jul 2020 12:52:31 -0700 (PDT) Subject: [llvm] 21b4cc1 - Reland [NFC] Derive from PassInfoMixin for no-op/printing passes Message-ID: <5f08c6ff.1c69fb81.f991d.091e@mx.google.com> Author: Arthur Eubanks Date: 2020-07-10T12:51:28-07:00 New Revision: 21b4cc1db9f4eb6d6956802257e3a80f86045c67 URL: https://github.com/llvm/llvm-project/commit/21b4cc1db9f4eb6d6956802257e3a80f86045c67 DIFF: https://github.com/llvm/llvm-project/commit/21b4cc1db9f4eb6d6956802257e3a80f86045c67.diff LOG: Reland [NFC] Derive from PassInfoMixin for no-op/printing passes PassInfoMixin should be used for all NPM passes, rather than a custom `name()`.
This caused ambiguous references in LegacyPassManager.cpp, so had to remove "using namespace llvm::legacy" and move some things around. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D83498 Added: Modified: llvm/include/llvm/IR/IRPrintingPasses.h llvm/include/llvm/module.modulemap llvm/lib/IR/LegacyPassManager.cpp llvm/lib/Passes/PassBuilder.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/IR/IRPrintingPasses.h b/llvm/include/llvm/IR/IRPrintingPasses.h index 230db988f737..3a1c489ee09f 100644 --- a/llvm/include/llvm/IR/IRPrintingPasses.h +++ b/llvm/include/llvm/IR/IRPrintingPasses.h @@ -19,17 +19,10 @@ #define LLVM_IR_IRPRINTINGPASSES_H #include "llvm/ADT/StringRef.h" +#include "llvm/IR/PassManager.h" #include namespace llvm { -class Pass; -class Function; -class FunctionPass; -class Module; -class ModulePass; -class PreservedAnalyses; -class raw_ostream; -template class AnalysisManager; /// Create and return a pass that writes the module to the specified /// \c raw_ostream. @@ -71,7 +64,7 @@ extern bool shouldPrintAfterPass(StringRef); /// /// Note: This pass is for use with the new pass manager. Use the create...Pass /// functions above to create passes for use with the legacy pass manager. -class PrintModulePass { +class PrintModulePass : public PassInfoMixin<PrintModulePass> { raw_ostream &OS; std::string Banner; bool ShouldPreserveUseListOrder; @@ -82,15 +75,13 @@ class PrintModulePass { bool ShouldPreserveUseListOrder = false); PreservedAnalyses run(Module &M, AnalysisManager<Module> &); - - static StringRef name() { return "PrintModulePass"; } }; /// Pass for printing a Function as LLVM's text IR assembly. /// /// Note: This pass is for use with the new pass manager. Use the create...Pass /// functions above to create passes for use with the legacy pass manager.
-class PrintFunctionPass { +class PrintFunctionPass : public PassInfoMixin<PrintFunctionPass> { raw_ostream &OS; std::string Banner; @@ -99,8 +90,6 @@ class PrintFunctionPass { PrintFunctionPass(raw_ostream &OS, const std::string &Banner = ""); PreservedAnalyses run(Function &F, AnalysisManager<Function> &); - - static StringRef name() { return "PrintFunctionPass"; } }; } // End llvm namespace diff --git a/llvm/include/llvm/module.modulemap b/llvm/include/llvm/module.modulemap index a36b68491683..b262311a96a0 100644 --- a/llvm/include/llvm/module.modulemap +++ b/llvm/include/llvm/module.modulemap @@ -256,6 +256,7 @@ module LLVM_intrinsic_gen { module Analysis_PostDominators { header "Analysis/PostDominators.h" export * } module Analysis_DomTreeUpdater { header "Analysis/DomTreeUpdater.h" export * } module IR_IRBuilder { header "IR/IRBuilder.h" export * } + module IR_IRPrintingPasses { header "IR/IRPrintingPasses.h" export * } module IR_MatrixBuilder { header "IR/MatrixBuilder.h" export * } module IR_PassManager { header "IR/PassManager.h" export * } module IR_PassManagerImpl { header "IR/PassManagerImpl.h" export * } diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index 1d9c44f385fb..4189aea46294 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -33,7 +33,6 @@ #include #include using namespace llvm; -using namespace llvm::legacy; // See PassManagers.h for Pass Manager infrastructure overview.
@@ -387,6 +386,66 @@ class FunctionPassManagerImpl : public Pass, void FunctionPassManagerImpl::anchor() {} char FunctionPassManagerImpl::ID = 0; + +//===----------------------------------------------------------------------===// +// FunctionPassManagerImpl implementation +// +bool FunctionPassManagerImpl::doInitialization(Module &M) { + bool Changed = false; + + dumpArguments(); + dumpPasses(); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doInitialization(M); + + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) + Changed |= getContainedManager(Index)->doInitialization(M); + + return Changed; +} + +bool FunctionPassManagerImpl::doFinalization(Module &M) { + bool Changed = false; + + for (int Index = getNumContainedManagers() - 1; Index >= 0; --Index) + Changed |= getContainedManager(Index)->doFinalization(M); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doFinalization(M); + + return Changed; +} + +void FunctionPassManagerImpl::releaseMemoryOnTheFly() { + if (!wasRun) + return; + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + FPPassManager *FPPM = getContainedManager(Index); + for (unsigned Index = 0; Index < FPPM->getNumContainedPasses(); ++Index) { + FPPM->getContainedPass(Index)->releaseMemory(); + } + } + wasRun = false; +} + +// Execute all the passes managed by this top level manager. +// Return true if any function is modified by a pass. 
+bool FunctionPassManagerImpl::run(Function &F) { + bool Changed = false; + + initializeAllAnalysisInfo(); + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + Changed |= getContainedManager(Index)->runOnFunction(F); + F.getContext().yield(); + } + + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) + getContainedManager(Index)->cleanup(); + + wasRun = true; + return Changed; +} } // namespace legacy } // namespace llvm @@ -406,7 +465,7 @@ class MPPassManager : public Pass, public PMDataManager { // Delete on the fly managers. ~MPPassManager() override { for (auto &OnTheFlyManager : OnTheFlyManagers) { - FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; delete FPP; } } @@ -451,7 +510,7 @@ class MPPassManager : public Pass, public PMDataManager { for (unsigned Index = 0; Index < getNumContainedPasses(); ++Index) { ModulePass *MP = getContainedPass(Index); MP->dumpPassStructure(Offset + 1); - MapVector::const_iterator I = + MapVector::const_iterator I = OnTheFlyManagers.find(MP); if (I != OnTheFlyManagers.end()) I->second->dumpPassStructure(Offset + 2); @@ -471,7 +530,7 @@ class MPPassManager : public Pass, public PMDataManager { private: /// Collection of on the fly FPPassManagers. These managers manage /// function passes that are required by module passes. - MapVector OnTheFlyManagers; + MapVector OnTheFlyManagers; }; char MPPassManager::ID = 0; @@ -534,6 +593,33 @@ class PassManagerImpl : public Pass, void PassManagerImpl::anchor() {} char PassManagerImpl::ID = 0; + +//===----------------------------------------------------------------------===// +// PassManagerImpl implementation + +// +/// run - Execute all of the passes scheduled for execution. Keep track of +/// whether any of the passes modifies the module, and if so, return true. 
+bool PassManagerImpl::run(Module &M) { + bool Changed = false; + + dumpArguments(); + dumpPasses(); + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doInitialization(M); + + initializeAllAnalysisInfo(); + for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { + Changed |= getContainedManager(Index)->runOnModule(M); + M.getContext().yield(); + } + + for (ImmutablePass *ImPass : getImmutablePasses()) + Changed |= ImPass->doFinalization(M); + + return Changed; +} } // namespace legacy } // namespace llvm @@ -1314,12 +1400,15 @@ AnalysisResolver::findImplPass(Pass *P, AnalysisID AnalysisPI, Function &F) { return PM.getOnTheFlyPass(P, AnalysisPI, F); } +namespace llvm { +namespace legacy { + //===----------------------------------------------------------------------===// // FunctionPassManager implementation /// Create new Function pass manager FunctionPassManager::FunctionPassManager(Module *m) : M(m) { - FPM = new FunctionPassManagerImpl(); + FPM = new legacy::FunctionPassManagerImpl(); // FPM is the top level manager. 
FPM->setTopLevelManager(FPM); @@ -1358,36 +1447,8 @@ bool FunctionPassManager::doInitialization() { bool FunctionPassManager::doFinalization() { return FPM->doFinalization(*M); } - -//===----------------------------------------------------------------------===// -// FunctionPassManagerImpl implementation -// -bool FunctionPassManagerImpl::doInitialization(Module &M) { - bool Changed = false; - - dumpArguments(); - dumpPasses(); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doInitialization(M); - - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) - Changed |= getContainedManager(Index)->doInitialization(M); - - return Changed; -} - -bool FunctionPassManagerImpl::doFinalization(Module &M) { - bool Changed = false; - - for (int Index = getNumContainedManagers() - 1; Index >= 0; --Index) - Changed |= getContainedManager(Index)->doFinalization(M); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doFinalization(M); - - return Changed; -} +} // namespace legacy +} // namespace llvm /// cleanup - After running all passes, clean up pass manager cache. void FPPassManager::cleanup() { @@ -1399,35 +1460,6 @@ void FPPassManager::cleanup() { } } -void FunctionPassManagerImpl::releaseMemoryOnTheFly() { - if (!wasRun) - return; - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - FPPassManager *FPPM = getContainedManager(Index); - for (unsigned Index = 0; Index < FPPM->getNumContainedPasses(); ++Index) { - FPPM->getContainedPass(Index)->releaseMemory(); - } - } - wasRun = false; -} - -// Execute all the passes managed by this top level manager. -// Return true if any function is modified by a pass. 
-bool FunctionPassManagerImpl::run(Function &F) { - bool Changed = false; - - initializeAllAnalysisInfo(); - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - Changed |= getContainedManager(Index)->runOnFunction(F); - F.getContext().yield(); - } - - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) - getContainedManager(Index)->cleanup(); - - wasRun = true; - return Changed; -} //===----------------------------------------------------------------------===// // FPPassManager implementation @@ -1554,7 +1586,7 @@ MPPassManager::runOnModule(Module &M) { // Initialize on-the-fly passes for (auto &OnTheFlyManager : OnTheFlyManagers) { - FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; Changed |= FPP->doInitialization(M); } @@ -1615,7 +1647,7 @@ MPPassManager::runOnModule(Module &M) { // Finalize on-the-fly passes for (auto &OnTheFlyManager : OnTheFlyManagers) { - FunctionPassManagerImpl *FPP = OnTheFlyManager.second; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManager.second; // We don't know when is the last time an on-the-fly pass is run, // so we need to releaseMemory / finalize here FPP->releaseMemoryOnTheFly(); @@ -1636,9 +1668,9 @@ void MPPassManager::addLowerLevelRequiredPass(Pass *P, Pass *RequiredPass) { RequiredPass->getPotentialPassManagerType()) && "Unable to handle Pass that requires lower level Analysis pass"); - FunctionPassManagerImpl *FPP = OnTheFlyManagers[P]; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManagers[P]; if (!FPP) { - FPP = new FunctionPassManagerImpl(); + FPP = new legacy::FunctionPassManagerImpl(); // FPP is the top level manager. FPP->setTopLevelManager(FPP); @@ -1669,7 +1701,7 @@ void MPPassManager::addLowerLevelRequiredPass(Pass *P, Pass *RequiredPass) { /// its runOnFunction() for function F. 
std::tuple MPPassManager::getOnTheFlyPass(Pass *MP, AnalysisID PI, Function &F) { - FunctionPassManagerImpl *FPP = OnTheFlyManagers[MP]; + legacy::FunctionPassManagerImpl *FPP = OnTheFlyManagers[MP]; assert(FPP && "Unable to find on the fly pass"); FPP->releaseMemoryOnTheFly(); @@ -1678,32 +1710,8 @@ std::tuple MPPassManager::getOnTheFlyPass(Pass *MP, AnalysisID PI, Changed); } -//===----------------------------------------------------------------------===// -// PassManagerImpl implementation - -// -/// run - Execute all of the passes scheduled for execution. Keep track of -/// whether any of the passes modifies the module, and if so, return true. -bool PassManagerImpl::run(Module &M) { - bool Changed = false; - - dumpArguments(); - dumpPasses(); - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doInitialization(M); - - initializeAllAnalysisInfo(); - for (unsigned Index = 0; Index < getNumContainedManagers(); ++Index) { - Changed |= getContainedManager(Index)->runOnModule(M); - M.getContext().yield(); - } - - for (ImmutablePass *ImPass : getImmutablePasses()) - Changed |= ImPass->doFinalization(M); - - return Changed; -} +namespace llvm { +namespace legacy { //===----------------------------------------------------------------------===// // PassManager implementation @@ -1728,6 +1736,8 @@ void PassManager::add(Pass *P) { bool PassManager::run(Module &M) { return PM->run(M); } +} // namespace legacy +} // namespace llvm //===----------------------------------------------------------------------===// // PMStack implementation @@ -1818,4 +1828,4 @@ void FunctionPass::assignPassManager(PMStack &PMS, PM->add(this); } -PassManagerBase::~PassManagerBase() {} +legacy::PassManagerBase::~PassManagerBase() {} diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 4d6c30b87a99..771cdfd17aa5 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -295,11 +295,16 @@ const 
PassBuilder::OptimizationLevel PassBuilder::OptimizationLevel::Oz = { namespace { +// The following passes/analyses have custom names, otherwise their name will +// include `(anonymous namespace)`. These are special since they are only for +// testing purposes and don't live in a header file. + /// No-op module pass which does nothing. -struct NoOpModulePass { +struct NoOpModulePass : PassInfoMixin<NoOpModulePass> { PreservedAnalyses run(Module &M, ModuleAnalysisManager &) { return PreservedAnalyses::all(); } + static StringRef name() { return "NoOpModulePass"; } }; @@ -315,7 +320,7 @@ class NoOpModuleAnalysis : public AnalysisInfoMixin<NoOpModuleAnalysis> { }; /// No-op CGSCC pass which does nothing. -struct NoOpCGSCCPass { +struct NoOpCGSCCPass : PassInfoMixin<NoOpCGSCCPass> { PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &, LazyCallGraph &, CGSCCUpdateResult &UR) { return PreservedAnalyses::all(); @@ -337,7 +342,7 @@ class NoOpCGSCCAnalysis : public AnalysisInfoMixin<NoOpCGSCCAnalysis> { }; /// No-op function pass which does nothing. -struct NoOpFunctionPass { +struct NoOpFunctionPass : PassInfoMixin<NoOpFunctionPass> { PreservedAnalyses run(Function &F, FunctionAnalysisManager &) { return PreservedAnalyses::all(); } @@ -356,7 +361,7 @@ class NoOpFunctionAnalysis : public AnalysisInfoMixin<NoOpFunctionAnalysis> { }; /// No-op loop pass which does nothing. -struct NoOpLoopPass { +struct NoOpLoopPass : PassInfoMixin<NoOpLoopPass> { PreservedAnalyses run(Loop &L, LoopAnalysisManager &, LoopStandardAnalysisResults &, LPMUpdater &) { return PreservedAnalyses::all(); @@ -382,7 +387,7 @@ AnalysisKey NoOpCGSCCAnalysis::Key; AnalysisKey NoOpFunctionAnalysis::Key; AnalysisKey NoOpLoopAnalysis::Key; -} // End anonymous namespace.
+} // namespace void PassBuilder::invokePeepholeEPCallbacks( FunctionPassManager &FPM, PassBuilder::OptimizationLevel Level) { From llvm-commits at lists.llvm.org Fri Jul 10 12:53:18 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:53:18 +0000 (UTC) Subject: [PATCH] D83498: [NFC] Derive from PassInfoMixin for no-op/printing passes In-Reply-To: References: Message-ID: <78067e2f8e32599d722414c7a5e48eec@localhost.localdomain> aeubanks added a comment. In D83498#2144732 , @davide wrote: > Reverted in: > > commit fdb7856d54a1f81bab0ac0c8a4e984620589e699 (HEAD -> master, origin/master, origin/HEAD) > Author: Davide Italiano > Date: Fri Jul 10 11:16:33 2020 -0700 > > Revert "[NFC] Derive from PassInfoMixin for no-op/printing passes" > > This reverts commit 8039d2c3bf14585ef37dc9343bf393ecad9aead9 as > it breaks the modules build on macOS. > > > Don't hesitate to ping me if you need any other info to reproduce. Repro'ed, fixed (hopefully), and relanded in 21b4cc1db9f4eb6d6956802257e3a80f86045c67 . Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83498/new/ https://reviews.llvm.org/D83498 From llvm-commits at lists.llvm.org Fri Jul 10 12:55:37 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 19:55:37 +0000 (UTC) Subject: [PATCH] D83578: [test] Replace a fragile lit feature (substitution in an argument place) with command -v In-Reply-To: References: Message-ID: MaskRay added a comment. In D83578#2144828 , @sammccall wrote: > I believe if we do this without removing the substitutions, we're effectively running `EXE=$(command -v /full/path/to/llvm-ranlib)` which is a bit weird and IMO doesn't make sense as a final state. > > I haven't done much investigation into the history/motivation of these substitutions, but I imagine we can remove at least some of them, and I'm happy to help out. 
Agreed that the lit feature is weird and should be removed. AFAIK I don't think the feature is used in other places. We should be able to drop the feature. > How do you think we should sequence this? Apologies that as a non-native speaker I don't know what "sequence" means in the sentence. If you mean "dropping the feature from lit", are you signing off for the work? :) I'd be happy to review (I don't know enough about lit to locate the feature...) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83578/new/ https://reviews.llvm.org/D83578 From llvm-commits at lists.llvm.org Fri Jul 10 13:00:08 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:00:08 +0000 (UTC) Subject: [PATCH] D82344: [NewPM] Make PMs and adaptor passes for PMs unskippable In-Reply-To: References: Message-ID: <5d65ea49ff01cf61a991fec1243d62aa@localhost.localdomain> ychen marked an inline comment as done. ychen added a comment. In D82344#2144872, @aeubanks wrote: > I think I prefer https://reviews.llvm.org/D83575 over this, this uses too much template metaprogramming for my liking. WDYT? I'm happy if either this or D83575 is landed, and I'm slightly in favor of this. The difference between this and D83575 is whether we want the pass to implement an extra `require` method all the time. I chose not to because I think it is less pervasive and I imagine the use cases for `require` are limited. I kind of want to hide this detail from most pass developers so they don't need to worry about this in most cases. The template stuff is used so we don't mandate a `require` method all the time, and the templates could be simplified using `is_detected`. Other than the points I made, I think it is perfectly reasonable to pursue D83575. For one thing, I'd prefer `require` over `skippable`. And it requires less code diff FWIW. Anyway, we're the authors of the alternatives and I'm probably biased.
It would be a good thing to solicit other opinions. Thoughts? ================ Comment at: llvm/include/llvm/Analysis/CGSCCPassManager.h:359 + static bool isSkippable() { + return !std::is_base_of::value; + } ---------------- aeubanks wrote: > This is saying that only a ModuleToPostOrderCGSCCPassAdaptor around a CGSCCPassManager isn't skippable, all other ModuleToPostOrderCGSCCPassAdaptor are? What about a wrapper around a normal CGSCC pass that isn't skippable? > What about a wrapper around a normal CGSCC pass that isn't skippable? Good point! I was not considering the case of normal passes declaring `require`. I think it should return false unconditionally here to mean the adaptor itself is always `required`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82344/new/ https://reviews.llvm.org/D82344 From llvm-commits at lists.llvm.org Fri Jul 10 13:00:37 2020 From: llvm-commits at lists.llvm.org (Andrii Nakryiko via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:00:37 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <1a1717ce33c4b11cea8523202b9c8d88@localhost.localdomain> anakryiko added a comment. pahole does a sensible thing, it represents it as BTF_KIND_INT (which maps to DWARF's DW_TAG_base_type), but with no encoding (so it's effectively unsigned integer with proper size). I've been advocating for adding one extra encoding bit to BTF_KIND_INT's encoding field to represent floating-point numbers, in addition to existing CHAR, SIGNED, and BOOL. This would make sense (float is indivisible set of bytes with particular interpretation, which is what BTF_KIND_INT is), it's easy to support in kernel, it's easy to sanitize in libbpf. The only downside is a bit mismatched INT name (DWARF uses more generic BASE_TYPE name for this), which I think is very-very minor (and one can argue that all the primitive types are integers). 
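(Editorial sketch, not part of the original mail.) The layout being discussed can be illustrated with a small C++ model of the 32-bit data word that follows a BTF_KIND_INT record: encoding flags in bits 24-27, bit offset in bits 16-23, size in bits in the low byte. The SIGNED/CHAR/BOOL flag values below follow the kernel's BTF UAPI header as far as I know; the helper names are made up for this example, and `kBtfIntFloatHypothetical` is purely an assumption standing in for the extra bit proposed above. It is not a real BTF flag.

```cpp
#include <cassert>
#include <cstdint>

// Encoding flags as defined for BTF_KIND_INT (bits 24-27 of the data word).
const uint32_t kBtfIntSigned = 1u << 0;
const uint32_t kBtfIntChar = 1u << 1;
const uint32_t kBtfIntBool = 1u << 2;
// HYPOTHETICAL: models the proposed floating-point bit; not part of BTF.
const uint32_t kBtfIntFloatHypothetical = 1u << 3;

// Pack encoding, bit offset and bit size into the BTF_KIND_INT data word.
inline uint32_t packBtfInt(uint32_t Encoding, uint32_t Offset, uint32_t Bits) {
  assert(Encoding <= 0xfu && Offset <= 0xffu && Bits <= 0xffu &&
         "field does not fit in its bit range");
  return (Encoding << 24) | (Offset << 16) | Bits;
}

// Accessors mirroring the usual mask-and-shift decoding of the data word.
inline uint32_t btfIntEncoding(uint32_t V) { return (V & 0x0f000000u) >> 24; }
inline uint32_t btfIntOffset(uint32_t V) { return (V & 0x00ff0000u) >> 16; }
inline uint32_t btfIntBits(uint32_t V) { return V & 0x000000ffu; }
```

With this model, a 32-bit `float` emitted the way the mail describes pahole doing it would be `packBtfInt(0, 0, 32)`: correct size, no encoding bits set, so consumers see an unsigned integer of the right width.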
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Fri Jul 10 13:00:43 2020 From: llvm-commits at lists.llvm.org (Lei Huang via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:00:43 +0000 (UTC) Subject: [PATCH] D83516: [PowerPC][Power10] 128-bit Binary Integer Operation instruction definitions and MC Tests In-Reply-To: References: Message-ID: <85c4d655fb230e33118f58b148fc9a8f@localhost.localdomain> lei accepted this revision. lei added a comment. This revision is now accepted and ready to land. LGTM Please address the nits on commit. ================ Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:1022 + def XSCVUQQP : X_VT5_XO5_VB5<63, 3, 836, "xscvuqqp", []>; + def XSCVSQQP: X_VT5_XO5_VB5<63, 11, 836, "xscvsqqp", []>; } ---------------- nit: looks like there's a mix of diff spacings in the section above. Please keep it consistent. It should be `def NAME : DEF` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83516/new/ https://reviews.llvm.org/D83516 From llvm-commits at lists.llvm.org Fri Jul 10 13:02:01 2020 From: llvm-commits at lists.llvm.org (Stanislav Mekhanoshin via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:02:01 +0000 (UTC) Subject: [PATCH] D79630: AMDGPU: Start interpreting byref on kernel arguments In-Reply-To: References: Message-ID: rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM with a nit. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h:164 - void emitKernelArg(const DataLayout &DL, Type *Ty, ValueKind ValueKind, + void emitKernelArg(const DataLayout &DL, Type *Ty, Align Alignment, ValueKind ValueKind, MaybeAlign PointeeAlign = None, StringRef Name = "", ---------------- This is too long. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79630/new/ https://reviews.llvm.org/D79630 From llvm-commits at lists.llvm.org Fri Jul 10 13:03:17 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:03:17 +0000 (UTC) Subject: [PATCH] D77341: [DomTree] Replace ChildrenGetter with GraphTraits over GraphDiff. In-Reply-To: References: Message-ID: nikic added a comment. In D77341#2144974 , @asbirlea wrote: > Thank you for the testing. Could you help with with instructions on how to run the tracker myself? > My local testing showed a negligible regression for mafft and a negligible improvement on other benchmarks, so it looked like noise on average. The tracker just compiles test-suite under `perf stat` using the cached cmake configs. If I pick out the file with the largest regression (mafft `constants.c` in `ReleaseThinLTO` config with 18% regression) I can reproduce this locally as follows: perf stat /home/nikic/llvm-project/build/bin/clang -DNDEBUG -O3 -fomit-frame-pointer -flto=thin -DNDEBUG -w -Werror=date-time -DLLVM -MD -MT MultiSource/Benchmarks/mafft/CMakeFiles/pairlocalalign.dir/constants.c.o -MF MultiSource/Benchmarks/mafft/CMakeFiles/pairlocalalign.dir/constants.c.o.d -o MultiSource/Benchmarks/mafft/CMakeFiles/pairlocalalign.dir/constants.c.o -c ../MultiSource/Benchmarks/mafft/constants.c This gives me 3.5M instructions before and 4.2M instructions after. Those particular numbers are for an assertion-enabled build (the numbers on the website are without assertions.) 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77341/new/ https://reviews.llvm.org/D77341 From llvm-commits at lists.llvm.org Fri Jul 10 13:04:36 2020 From: llvm-commits at lists.llvm.org (Digger via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:04:36 +0000 (UTC) Subject: [PATCH] D81585: [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d In-Reply-To: References: Message-ID: <9e799a87120a7d15d3c210f621833e10@localhost.localdomain> DiggerLin updated this revision to Diff 277135. DiggerLin added a comment. add parsing the controlled storage info. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81585/new/ https://reviews.llvm.org/D81585 Files: llvm/include/llvm/BinaryFormat/XCOFF.h llvm/include/llvm/Object/XCOFFObjectFile.h llvm/lib/Object/XCOFFObjectFile.cpp llvm/unittests/Object/CMakeLists.txt llvm/unittests/Object/XCOFFObjectFileTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D81585.277135.patch Type: text/x-patch Size: 17537 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 13:06:20 2020 From: llvm-commits at lists.llvm.org (Hubert Tong via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:06:20 +0000 (UTC) Subject: [PATCH] D83578: [test] Replace a fragile lit feature (substitution in an argument place) with command -v In-Reply-To: References: Message-ID: <15b7a003d8d2e32b7894a34ea780f391@localhost.localdomain> hubert.reinterpretcast added a comment. In D83578#2145020 , @MaskRay wrote: > Agreed that the lit feature is weird and should be removed. AFAIK I don't think the feature is used in other places. We should be able to drop the feature. What you do mean by "in an argument place"? Does `llvm-ar` appear in an "argument place" in `not --crash llvm-ar` or `env -i LANG=C llvm-ar`? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83578/new/ https://reviews.llvm.org/D83578 From llvm-commits at lists.llvm.org Fri Jul 10 13:08:27 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:08:27 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: MaskRay added a comment. In D79978#2141992 , @wmi wrote: > In D79978#2141959 , @MaskRay wrote: > > > Ack. Then what instructions should be placed at the top of these basic blocks? Should `.cfi_def_cfa_register %rbp` be placed as well? If you move these basic blocks around, `.cfi_def_cfa_register %rbp` is currently not tracked. > > > That is because .cfi_def_cfa %rbp, 16 is identical to the following: > .cfi_def_cfa_register %rbp > .cfi_def_cfa_offset 16 Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me. I think it lacks a summary about what the patch intends to do. Is the intention: if the entry block stays in the front of the function, while other basic blocks can be randomly shuffled => the CFI states of all basic blocks are the same no matter how non-entry basic blocks are shuffled? Or The entry basic block can be shuffled as well? For either behavior, I am not sure the test covers the situations: I can only find `.cfi_def_cfa_register` in the entry block, not in others - so I am not confident that `.cfi_def_cfa_register` information is correctly retained. We need a stronger test. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Fri Jul 10 13:15:25 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:15:25 +0000 (UTC) Subject: [PATCH] D82698: [NewPM] make parsePassPipeline parse adaptor-wrapped user passes In-Reply-To: References: Message-ID: aeubanks added a comment. I think this makes sense as long as it's not abused and doesn't make things harder to reason about. Should be fine since most (all?) uses are through opt. ================ Comment at: llvm/test/Other/pass-pipeline-parsing.ll:182 +; CHECK-ADAPTORS: Running pass: ModuleToFunctionPassAdaptor<{{.*}}NoOpFunctionPass> +; CHECK-ADAPTORS: Running pass: ModuleToFunctionPassAdaptor<{{.*}}FunctionToLoopPassAdaptor<{{.*}}NoOpLoopPass>{{.*}}> +; CHECK-ADAPTORS: Running pass: ModuleToPostOrderCGSCCPassAdaptor<{{.*}}NoOpCGSCCPass> ---------------- what about a separate check for Running pass: FunctionToLoopPassAdaptor<{{.*}}NoOpLoopPass> and NoOpLoopPass itself? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82698/new/ https://reviews.llvm.org/D82698 From llvm-commits at lists.llvm.org Fri Jul 10 13:21:22 2020 From: llvm-commits at lists.llvm.org (Jesse Natalie via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:21:22 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: jenatali added a comment. @jvesely, I think libclc needs to change its definition here, as it's the only one out of 3 OpenCL standard lib headers that's different. Since OpenCL allows apps to provide precompiled intermediate representations to the API, in the form of SPIR or SPIR-V, that means that the app could have embedded references to `FP_ILOGBNAN`, which are just a constant value since it's a `#define`, in their own code. 
They could write a kernel which calls `ilogb` and compares the result to `FP_ILOGBNAN`, and compile that with either clang trunk (which uses opencl-c.h automatically), or with Khronos's offline compiler (https://github.com/KhronosGroup/SPIR) using Khronos's standard library headers (https://github.com/KhronosGroup/libclcxx). Both of these standard library implementation headers define `FP_ILOGBNAN` to be `INT_MAX`: - https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/opencl-c-base.h#L165 - https://github.com/KhronosGroup/libclcxx/blob/96459f111c3e3a4709f7e09bf5fb73dea81a475a/include/opencl_math_constants#L85 Nobody is going to offline compile OpenCL code using the libclc headers. That means that if our implementation wants to leverage libclc behind the scenes, then libclc should use the same definition of this value as the other standard libraries. Yeah, it's different from CPU land where you don't go around compiling against standard library headers from one library, and then linking against another -- except that you can kind of do that too between e.g. glibc and musl (which have matching definitions of this define for what it's worth). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Fri Jul 10 13:22:29 2020 From: llvm-commits at lists.llvm.org (Aaron Puchert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:22:29 +0000 (UTC) Subject: [PATCH] D83549: [ELF] Do not force bringing out symbols passed by -init and -fini. In-Reply-To: References: Message-ID: <686120d39b7e13c0a4abb1e923619ba8@localhost.localdomain> aaronpuchert added a comment. In D83549#2144625 , @MaskRay wrote: > @aaronpuchert Your comment https://bugzilla.opensuse.org/show_bug.cgi?id=1155108#c10 "However, it's actually a bug in all three linkers. 
It has been fixed in lld just today [1]" made me wonder whether https://bugs.llvm.org/show_bug.cgi?id=43927 was really a linker bug. The linkers were behaving differently though: lld and bfd dropped the FINI entirely whereas gold had FINI=0, which led to a crash. Which of these behaviors would be expected? > I need an LLD reproduce file (obtained via `LLD_REPRODUCE=/tmp/rep.tar ...` or `-Wl,--reproduce=/tmp/rep.tar`) to understand the matter better. It should suffice to have a source file with a hidden visibility function that's used in a `-init` or `-fini` linker flag with LTO. If that doesn't help, I can also try putting some files together. > openSUSE's OpenMP problem probably should be fixed by adding `-u __kmp_internal_end_fini` along with `-Wl,-fini=__kmp_internal_end_fini` instead of relying on the linker retaining the bitcode defined symbols. I think the problem here is that `-Wl,-fini` without `-u` works when you don't use LTO, but when you enable it the function is removed and with it the FINI entry in `.dynamic`. As a layperson I might expect (as the OpenMP writers did) that an explicit `-fini` flag on the command line is sufficient, and that the linker makes sure LTO doesn't drop the function since it is arguably used. I can't really comment on this patch though, because I don't know under which circumstances one might use inits or finis that are undefined, and then expect them to be dropped. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83549/new/ https://reviews.llvm.org/D83549 From llvm-commits at lists.llvm.org Fri Jul 10 13:25:13 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:25:13 +0000 (UTC) Subject: [PATCH] D82698: [NewPM] make parsePassPipeline parse adaptor-wrapped user passes In-Reply-To: References: Message-ID: <743455c1e7345f571c46a3a2427b7e64@localhost.localdomain> aeubanks added a comment.
Not sure exactly how to write a test for this, but it'd be nice to test a case where we have a module pass and a function pass both named the same. e.g. in PassRegistry.def there are multiple passes called "print". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82698/new/ https://reviews.llvm.org/D82698 From llvm-commits at lists.llvm.org Fri Jul 10 13:27:05 2020 From: llvm-commits at lists.llvm.org (Jez Ng via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:27:05 +0000 (UTC) Subject: [PATCH] D83532: [lld-macho] Partial support for weak definitions In-Reply-To: References: Message-ID: <6348d33c5e61087314c9276de1915a79@localhost.localdomain> int3 updated this revision to Diff 277137. int3 marked 6 inline comments as done. int3 added a comment. address comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83532/new/ https://reviews.llvm.org/D83532 Files: lld/MachO/Arch/X86_64.cpp lld/MachO/ExportTrie.cpp lld/MachO/ExportTrie.h lld/MachO/InputFiles.cpp lld/MachO/SymbolTable.cpp lld/MachO/SymbolTable.h lld/MachO/Symbols.h lld/MachO/SyntheticSections.cpp lld/test/MachO/weak-definition-direct-fetch.s lld/test/MachO/weak-definition-indirect-fetch.s lld/test/MachO/weak-definition-order.s lld/test/MachO/weak-definition-over-dysym.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83532.277137.patch Type: text/x-patch Size: 23615 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 13:27:34 2020 From: llvm-commits at lists.llvm.org (Jez Ng via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:27:34 +0000 (UTC) Subject: [PATCH] D83532: [lld-macho] Partial support for weak definitions In-Reply-To: References: Message-ID: <020d187a577a8badcdff9c2c2d420e8a@localhost.localdomain> int3 added inline comments. 
================ Comment at: lld/MachO/ExportTrie.h:40 using TrieEntryCallback = - llvm::function_ref; + llvm::function_ref; ---------------- compnerd wrote: > Are we sure we wont need any other flags? I wonder if it's better to just treat weakness as a flag. IIRC, there is a `EXPORT_SYMBOL_FLAGS_REEXPORT` and `EXPORT_SYMBOL_FLAGS_KIND_THREAD_LOCAL` that would be fairly good to account for. Yeah I wasn't super sure about how to organize this. My reason for making it a bool is so that the trie-decoding-specific logic can all live in ExportTrie.cpp. But having a whole bunch of boolean parameters in a callback isn't the prettiest either. But I figure we can switch it back later if necessary as we support more flags (or think of a different, cleaner API) ================ Comment at: lld/MachO/InputFiles.cpp:235 + return make(name, isec, value, sym.n_desc & N_WEAK_DEF); }; ---------------- compnerd wrote: > It's okay if you want to use braces, but please use them on both sides. However, I think that this is better written as: > > ``` > if (sym.n_type & N_EXT) > return symtab->addDefined(name, isec, value, sym.n_desc & N_WEAK_DEF); > return make(name, isec, value, sym.n_desc & N_WEAK_DEF); > ``` oh yeah the braces were accidentally left over after some editing ================ Comment at: lld/MachO/SymbolTable.cpp:52 + } + } ---------------- compnerd wrote: > What do you think of doing an early return instead if the symbol was inserted? > > Can you explain why currently it is not an error if the symbol is not inserted and not defined? Seems like a comment for that would be good. 
not inserted and not defined => we will fall through to replaceSymbol below, which will replace any Undefined/Lazy/Dylib symbols with the Defined one Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83532/new/ https://reviews.llvm.org/D83532 From llvm-commits at lists.llvm.org Fri Jul 10 13:50:42 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:50:42 +0000 (UTC) Subject: [PATCH] D78133: [PredicateInfo] Add additional RenamedOp field to PB. In-Reply-To: References: Message-ID: <57eefab308f78a3d9c3947c8ca578a5d@localhost.localdomain> nikic added a comment. I've added a test case in https://github.com/llvm/llvm-project/commit/a0b549602612fa2577068bcdcae3bfbc6c9c3264, which shows a case where the RenamedOp doesn't work quite right. In this case RenamedOp should always refer back to `%cmp` or `%x`, but ends up referring to `%cmp.0`, `%x.0` and `%x.0.1`. I don't think this matters in a practical way though, because we don't get any additional information from the dominated predicate infos. It still seems like we either shouldn't place those predicate infos (because they're redundant), or provide them with a correct RenamedOp though. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78133/new/ https://reviews.llvm.org/D78133 From llvm-commits at lists.llvm.org Fri Jul 10 13:52:09 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:52:09 +0000 (UTC) Subject: [PATCH] D83587: [X86] Consistently use 128 as the PSHUFB/VPPERM index for zero Message-ID: craig.topper created this revision. craig.topper added reviewers: spatel, RKSimon. Herald added subscribers: arphaman, hiraditya. Herald added a project: LLVM. Bit 7 of the index controls zeroing; the other bits are ignored when bit 7 is set. Shuffle lowering was using 128 and shuffle combining was using 255.
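As a rough illustration of the PSHUFB rule described above, the following C++ sketch models a 128-bit PSHUFB in scalar code. The names `emulatePshufb`, `splat`, and `checkZeroIndexEquivalence` are made up for this example and do not appear in the patch. The model covers PSHUFB only and makes no claim about how VPPERM decodes its upper control bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Scalar model of 128-bit PSHUFB: a result byte is zero when bit 7 of its
// control byte is set; otherwise the low four control bits pick a source byte.
Vec16 emulatePshufb(const Vec16 &Src, const Vec16 &Ctrl) {
  Vec16 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = (Ctrl[I] & 0x80) ? std::uint8_t{0} : Src[Ctrl[I] & 0x0F];
  return Out;
}

// Hypothetical helper: a vector with the same byte in every lane.
Vec16 splat(std::uint8_t B) {
  Vec16 V{};
  V.fill(B);
  return V;
}

// Illustration only: control bytes 0x80 (128) and 0xFF (255) both zero every
// lane, while a control byte with bit 7 clear selects a source byte.
void checkZeroIndexEquivalence() {
  Vec16 Src{};
  for (std::size_t I = 0; I < Src.size(); ++I)
    Src[I] = static_cast<std::uint8_t>(0x10 + I);
  assert(emulatePshufb(Src, splat(0x80)) == splat(0x00));
  assert(emulatePshufb(Src, splat(0xFF)) == splat(0x00));
  assert(emulatePshufb(Src, splat(0x03)) == splat(0x13));
}
```

Under this model the two constants are interchangeable as PSHUFB zeroing indices, so choosing one of them everywhere only affects the constant that ends up in the shuffle mask.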
Seems like we should be consistent. This patch changes shuffle combining to use 128 to match lowering. https://reviews.llvm.org/D83587 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-trunc.ll Index: llvm/test/CodeGen/X86/vector-trunc.ll =================================================================== --- llvm/test/CodeGen/X86/vector-trunc.ll +++ llvm/test/CodeGen/X86/vector-trunc.ll @@ -456,7 +456,7 @@ ; ; SSSE3-LABEL: trunc8i32_8i16_lshr: ; SSSE3: # %bb.0: # %entry -; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,10,11,14,15,14,15,255,255] +; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,10,11,14,15,14,15,128,128] ; SSSE3-NEXT: pshufb %xmm2, %xmm1 ; SSSE3-NEXT: pshufb %xmm2, %xmm0 ; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35040,7 +35040,7 @@ continue; } if (M == SM_SentinelZero) { - PSHUFBMask.push_back(DAG.getConstant(255, DL, MVT::i8)); + PSHUFBMask.push_back(DAG.getConstant(0x80, DL, MVT::i8)); continue; } M = Ratio * M + i % Ratio; @@ -35071,7 +35071,7 @@ continue; } if (M == SM_SentinelZero) { - VPPERMMask.push_back(DAG.getConstant(128, DL, MVT::i8)); + VPPERMMask.push_back(DAG.getConstant(0x80, DL, MVT::i8)); continue; } M = Ratio * M + i % Ratio; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83587.277139.patch Type: text/x-patch Size: 1373 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 13:56:25 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:56:25 +0000 (UTC) Subject: [PATCH] D83572: [SVE][CodeGen] Fix implicit TypeSize->uint64_t conversion in TransformFPLoadStorePair In-Reply-To: References: Message-ID: <4170342164bb1a97cfe80d8a3a15ef57@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM ================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15751 if (ISD::isNormalStore(ST) && ISD::isNormalLoad(Value.getNode()) && Value.hasOneUse()) { LoadSDNode *LD = cast(Value); ---------------- Mixing early return and non-early-return like this is confusing. But I guess also orthogonal. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15754 EVT VT = LD->getMemoryVT(); if (!VT.isFloatingPoint() || VT != ST->getMemoryVT() || ---------------- The isFloatingPoint check here seems weird; it doesn't really make sense to handle float vectors, but not int vectors. But I guess that's orthogonal. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83572/new/ https://reviews.llvm.org/D83572 From llvm-commits at lists.llvm.org Fri Jul 10 13:57:49 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 20:57:49 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <5a8bf5dd23bc5302a8b52e6626481f85@localhost.localdomain> jfb marked 2 inline comments as done. jfb added inline comments. ================ Comment at: llvm/docs/Security.rst:204 +* Language front-ends, such as clang, for which a malicious input file can cause undesirable behavior. 
For example, a maliciously-crafted C or Rust source file can cause arbitrary code to execute in LLVM. These parts of LLVM haven't been hardened, and compiling untrusted code usually also includes running utilities such as `make` which can more readily perform malicious things. +* *FUTURE*: this section will be expanded. ---------------- jkorous wrote: > We should probably include tools that need to be run with elevated privileges of some sort. For example lldb getting root. We'd need LLDB maintainers signing up to do this maintenance. Not that we can't / shouldn't, but that we ought to consider these one at a time, with proper support. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 From llvm-commits at lists.llvm.org Fri Jul 10 14:00:26 2020 From: llvm-commits at lists.llvm.org (Hal Finkel via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:00:26 +0000 (UTC) Subject: [PATCH] D83576: [BasicAA] Fix -basicaa-recphi for geps with negative offsets In-Reply-To: References: Message-ID: hfinkel added inline comments. ================ Comment at: llvm/lib/Analysis/BasicAliasAnalysis.cpp:1668 // would recurse and always get a MayAlias. Handle this case specially // below. + if (PV1GEP->getPointerOperand() == PN && ---------------- I would like to see the "The option checks for recursive phis that recurse through a constant gep. If it finds one, it performs aliasing calculations using the other phi operands with an unknown size, to specify that an unknown number of elements after the initial value are potentially accessed. This works fine except where the constant is negative, as the size is still considered to be positive. " part from the patch description -- that information -- in the comment here. Also, can you make a lambda function to avoid the duplication of the comment and the condition in two places?
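The lambda suggested above could look roughly like the following sketch. It is not the code from D83576: `FakeGEP`, `isForwardRecursivePhi`, and `demoAcceptsStep` are hypothetical stand-ins so that the example compiles without LLVM headers, and the two-argument shape is only an assumption about what the two duplicated checks look like.

```cpp
#include <cstdint>

// Hypothetical stand-in for a GEP instruction; not an LLVM class.
struct FakeGEP {
  const void *PointerOperand; // base pointer the GEP indexes from
  unsigned NumIndices;        // number of index operands
  bool HasConstIndex;         // whether the first index is a constant
  std::int64_t ConstIndex;    // value of that constant, if any
};

// Returns true if either GEP steps forward from the phi by a constant amount.
bool isForwardRecursivePhi(const void *Phi, const FakeGEP &A,
                           const FakeGEP &B) {
  // The phi recurses through a GEP with a single constant index. Negative
  // steps are rejected because the unknown-size access is assumed to cover
  // only elements after the initial value. Keeping the condition and this
  // comment in one lambda avoids repeating both at each use.
  auto StepsForwardFromPhi = [Phi](const FakeGEP &G) {
    return G.PointerOperand == Phi && G.NumIndices == 1 && G.HasConstIndex &&
           G.ConstIndex >= 0;
  };
  return StepsForwardFromPhi(A) || StepsForwardFromPhi(B);
}

// Demo: one value is "phi + Step", the other indexes an unrelated pointer.
bool demoAcceptsStep(std::int64_t Step) {
  int PhiTag = 0, OtherTag = 0; // only their addresses matter
  FakeGEP Recurrence{&PhiTag, 1, true, Step};
  FakeGEP Unrelated{&OtherTag, 1, true, 8};
  return isForwardRecursivePhi(&PhiTag, Unrelated, Recurrence);
}
```

With this shape, a step of 4 or 0 is accepted and a step of -4 is rejected, and the explanatory comment lives in exactly one place.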
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83576/new/ https://reviews.llvm.org/D83576 From llvm-commits at lists.llvm.org Fri Jul 10 14:00:51 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:00:51 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <0105a537f9e446745febfe186212b130@localhost.localdomain> jdoerfert added inline comments. ================ Comment at: llvm/test/Transforms/Attributor/IPConstantProp/arg-count-mismatch.ll:47 +; IS__CGSCC_OPM-NEXT: [[CALL:%.*]] = call i16 bitcast (i16 (i16, i16)* @bar to i16 (i16)*)(i16 [[A]]) +; IS__CGSCC_OPM-NEXT: ret i16 [[CALL]] ; ---------------- The problem here is the missing "reference" edge in the old call graph. Before the patch `bar` is externally callable and `foo` calls an external function. With the patch, neither is the case but `foo` is also not calling `bar` (I think). The solution is to verify we are looking at a callback call and not any abstract call site. Callback calls cause such "reference" edges since D82572. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Fri Jul 10 14:01:36 2020 From: llvm-commits at lists.llvm.org (Jon Roelofs via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:01:36 +0000 (UTC) Subject: [PATCH] D83588: [TableGen][CGS] Print better errors on overlapping InstRW Message-ID: jroelofs created this revision. jroelofs added reviewers: thakis, evandro, dsanders. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83588 Files: llvm/utils/TableGen/CodeGenSchedule.cpp Index: llvm/utils/TableGen/CodeGenSchedule.cpp =================================================================== --- llvm/utils/TableGen/CodeGenSchedule.cpp +++ llvm/utils/TableGen/CodeGenSchedule.cpp @@ -1083,13 +1083,14 @@ if (RWD->getValueAsDef("SchedModel") == RWModelDef && RWModelDef->getValueAsBit("FullInstRWOverlapCheck")) { assert(!InstDefs.empty()); // Checked at function start. - PrintFatalError + PrintError (InstRWDef->getLoc(), "Overlapping InstRW definition for \"" + InstDefs.front()->getName() + "\" also matches previous \"" + RWD->getValue("Instrs")->getValue()->getAsString() + "\"."); + PrintFatalError(RWD->getLoc(), "Previous match was here."); } } LLVM_DEBUG(dbgs() << "InstRW: Reuse SC " << OldSCIdx << ":" @@ -1118,13 +1119,14 @@ for (Record *OldRWDef : SchedClasses[OldSCIdx].InstRWs) { if (OldRWDef->getValueAsDef("SchedModel") == RWModelDef) { assert(!InstDefs.empty()); // Checked at function start. - PrintFatalError + PrintError (InstRWDef->getLoc(), "Overlapping InstRW definition for \"" + InstDefs.front()->getName() + "\" also matches previous \"" + OldRWDef->getValue("Instrs")->getValue()->getAsString() + "\"."); + PrintFatalError(OldRWDef->getLoc(), "Previous match was here."); } assert(OldRWDef != InstRWDef && "SchedClass has duplicate InstRW def"); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83588.277141.patch Type: text/x-patch Size: 1718 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 14:03:57 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:03:57 +0000 (UTC) Subject: [PATCH] D78133: [PredicateInfo] Add additional RenamedOp field to PB. In-Reply-To: References: Message-ID: nikic added a comment. Okay, after thinking about this a bit more, this does impact analysis quality. 
Here is a better test case:

  define i32 @test2(i32 %x) {
  entry:
    %cmp1 = icmp sgt i32 %x, 0
    %cmp2 = icmp sgt i32 %x, 10
    br i1 %cmp1, label %bb2, label %exit1
  bb2:
    br i1 %cmp2, label %exit2, label %exit3
  exit1:
    ret i32 0
  exit2:
    ret i32 %x
  exit3:
    ret i32 %x
  }

Gives us:

  define i32 @test2(i32 %x) {
  entry:
    %cmp1 = icmp sgt i32 %x, 0
    %cmp2 = icmp sgt i32 %x, 10
  ; Has predicate info
  ; branch predicate info { TrueEdge: 1 Comparison: %cmp1 = icmp sgt i32 %x, 0 Edge: [label %entry,label %bb2], RenamedOp: %x }
    %x.0 = call i32 @llvm.ssa.copy.94226533345936(i32 %x)
    br i1 %cmp1, label %bb2, label %exit1
  bb2: ; preds = %entry
  ; Has predicate info
  ; branch predicate info { TrueEdge: 1 Comparison: %cmp2 = icmp sgt i32 %x, 10 Edge: [label %bb2,label %exit2], RenamedOp: %x }
    %x.0.1 = call i32 @llvm.ssa.copy.94226533345936(i32 %x.0)
  ; Has predicate info
  ; branch predicate info { TrueEdge: 0 Comparison: %cmp2 = icmp sgt i32 %x, 10 Edge: [label %bb2,label %exit3], RenamedOp: %x.0 }
    %x.0.2 = call i32 @llvm.ssa.copy.94226533345936(i32 %x.0)
    br i1 %cmp2, label %exit2, label %exit3
  exit1: ; preds = %entry
    ret i32 0
  exit2: ; preds = %bb2
    ret i32 %x.0.1
  exit3: ; preds = %bb2
    ret i32 %x.0.2
  }

Note that the false edge has `RenamedOp: %x.0`, but the comparison uses `%x`. That means we're going to lose the additional bound. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78133/new/ https://reviews.llvm.org/D78133 From llvm-commits at lists.llvm.org Fri Jul 10 14:07:20 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:07:20 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <309e9343b7d949774b02f81b7d8e6123@localhost.localdomain> tmsriram added a comment. In D79978#2145063 , @MaskRay wrote: > In D79978#2141992 , @wmi wrote: > > > In D79978#2141959 , @MaskRay wrote: > > > > > Ack.
Then what instructions should be placed at the top of these basic blocks? Should `.cfi_def_cfa_register %rbp` be placed as well? If you move these basic blocks around, `.cfi_def_cfa_register %rbp` is currently not tracked. > > > > > > That is because .cfi_def_cfa %rbp, 16 is identical to the following: > > .cfi_def_cfa_register %rbp > > .cfi_def_cfa_offset 16 > > > Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me. I am not a CFI expert either and that is not a problem. > I think it lacks a summary about what the patch intends to do. Ok, I can write a more detailed summary here. > Is the intention: if the entry block stays in the front of the function, while other basic blocks can be randomly shuffled => the CFI states of all basic blocks are the same no matter how non-entry basic blocks are shuffled? @amharc If a basic block is placed in a unique section then it can potentially be moved away from the original function. CIEs do not allow ranges unlike debug info. So, when you are the PC of this basic block how does the unwinder know where the CFA is? In order to do that, we have to replicate the cfi directives to say how to find this. This test you pointed out shows that we do this with cfi directives. Please note that the control-flow of the program never changes even though the blocks are shuffled randomly. > Or > > The entry basic block can be shuffled as well? The entry basic block is fine as it has the symbol of the original function and the default CFI generated is correct. Its section is the same as the function's section. > For either behavior, I am not sure the test covers the situations: I can only find `.cfi_def_cfa_register` in the entry block, not in others - so I am not confident that `.cfi_def_cfa_register` information is correctly retained. We need a stronger test. 
Sure, could you please tell us what we should be testing. We have a test for callee saved register cfi directives being generated too. > For convenience of other reviewers, the diff when -basicblock-sections=all is turned on:
>
> - je .LBB0_2
> -# %bb.1: # %if.then
> + je _Z2f3b.2
> + jmp _Z2f3b.1
> + .cfi_endproc
> + .section .text,"ax",@progbits,unique,1
> +_Z2f3b.1: # %if.then
> + .cfi_startproc
> + .cfi_def_cfa %rbp, 16
> + .cfi_offset %rbp, -16
>   callq _Z2f1v
> -.LBB0_2: # %if.end
> + jmp _Z2f3b.2
> +.Ltmp0:
> + .size _Z2f3b.1, .Ltmp0-_Z2f3b.1
> + .cfi_endproc
> + .section .text,"ax",@progbits,unique,2
> +_Z2f3b.2: # %if.end
> + .cfi_startproc
> + .cfi_def_cfa %rbp, 16
> + .cfi_offset %rbp, -16
>   addq $16, %rsp
>   popq %rbp
>   .cfi_def_cfa %rsp, 8
>   retq
>
> Note also that only `rbp` is described. I think we need another register to demonstrate the effect.
The other test does check for callee saved registers. This test is not only to check %rbp but also to make sure cfi_startproc and cfi_endproc are generated as expected. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Fri Jul 10 14:11:14 2020 From: llvm-commits at lists.llvm.org (Sam McCall via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:11:14 +0000 (UTC) Subject: [PATCH] D83578: [test] Replace a fragile lit feature (substitution in an argument place) with command -v In-Reply-To: References: Message-ID: <8e2f2cfd798b1369ab7c6589b84950e3@localhost.localdomain> sammccall added a comment. In D83578#2145020 , @MaskRay wrote: > > How do you think we should sequence this? > > Apologies that as a non-native speaker I don't know what "sequence" means in the sentence. If you mean "dropping the feature from lit", are you signing off for the work? :) I'd be happy to review (I don't know enough about lit to locate the feature...) Sorry to be unclear, I was on my phone and being lazy!
I mean do we want to e.g.: - land patches removing particular substitutions and their uses, and eventually remove the feature if they can all be eliminated - determine whether removing the feature is feasible, then land patches like this one, then remove the feature - land patches like this one, then determine whether removing the feature is feasible, then remove the feature (I'm afraid this way will get stuck in a bad state) But to back up a bit, I'm a bit worried about a chesterton's fence situation - why are these substitutions there. Unfortunately r315085 deleted the explanation for these substitutions, which read: # For each occurrence of an llvm tool name as its own word, replace it # with the full path to the build directory holding that tool. This # ensures that we are testing the tools just built and not some random # tools that might happen to be in the user's PATH. Thus this list # includes every tool placed in $(LLVM_OBJ_ROOT)/$(BuildMode)/bin # (llvm_tools_dir in lit parlance). This was originally added to resolve https://bugs.llvm.org/show_bug.cgi?id=8199 in https://github.com/llvm/llvm-project/commit/dc276c315cec9e33df8e0e171b686f79143f651d and is still described in the testing guide. I'm not sure the causes of that bug are gone, so we might just be reintroducing it. It's easy for me to say "it's better if the path is set correctly, and possibly hermetically" but I'm not confident I have a good enough handle on the various platforms and build modes to know how to get there... 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83578/new/ https://reviews.llvm.org/D83578 From llvm-commits at lists.llvm.org Fri Jul 10 14:12:59 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:12:59 +0000 (UTC) Subject: [PATCH] D83576: [BasicAA] Fix -basicaa-recphi for geps with negative offsets In-Reply-To: References: Message-ID: <17a83d84c07f0463509a6477b0e7ab50@localhost.localdomain> efriedma added inline comments. ================ Comment at: llvm/lib/Analysis/BasicAliasAnalysis.cpp:1672 + isa(PV1GEP->idx_begin()) && + cast(PV1GEP->idx_begin())->getSExtValue() >= 0) { isRecursive = true; ---------------- Please avoid getSExtValue() when you can, in favor of APInt methods. Do we need to check the GEP is inbounds? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83576/new/ https://reviews.llvm.org/D83576 From llvm-commits at lists.llvm.org Fri Jul 10 14:14:11 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:14:11 +0000 (UTC) Subject: [PATCH] D83577: [SVE][Codegen] Add a helper function for pointer increment logic In-Reply-To: References: Message-ID: <076283bc159ef4ba0136db295fd92da3@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM, thanks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83577/new/ https://reviews.llvm.org/D83577 From llvm-commits at lists.llvm.org Fri Jul 10 14:17:12 2020 From: llvm-commits at lists.llvm.org (Sidharth Baveja via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:17:12 +0000 (UTC) Subject: [PATCH] D83056: [NFC] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities In-Reply-To: References: Message-ID: <7e398b8062bf1760856cae4d286a1709@localhost.localdomain> sidbav updated this revision to Diff 277145. 
sidbav marked an inline comment as not done. sidbav added a comment. Minor update after making some modifications to D80580. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83056/new/ https://reviews.llvm.org/D83056 Files: llvm/include/llvm/Transforms/Utils/LoopPeel.h llvm/include/llvm/Transforms/Utils/UnrollLoop.h llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp llvm/lib/Transforms/Scalar/LoopFuse.cpp llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp llvm/lib/Transforms/Utils/CMakeLists.txt llvm/lib/Transforms/Utils/LoopPeel.cpp llvm/lib/Transforms/Utils/LoopUnroll.cpp llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83056.277145.patch Type: text/x-patch Size: 18731 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 14:21:03 2020 From: llvm-commits at lists.llvm.org (Jinsong Ji via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:21:03 +0000 (UTC) Subject: [PATCH] D83590: [PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel Message-ID: jsji created this revision. jsji added reviewers: PowerPC, hfinkel, shchenz, steven.zhang. Herald added subscribers: llvm-commits, kbarton, hiraditya, nemanjai. Herald added a project: LLVM. P9 is currently the only CPU with an InstrSchedModel, but we may have more in the future, so we should not hardcode the check to P9; check hasInstrSchedModel instead.
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83590 Files: llvm/lib/Target/PowerPC/PPCSubtarget.cpp llvm/test/CodeGen/PowerPC/sms-remark.ll Index: llvm/test/CodeGen/PowerPC/sms-remark.ll =================================================================== --- llvm/test/CodeGen/PowerPC/sms-remark.ll +++ llvm/test/CodeGen/PowerPC/sms-remark.ll @@ -1,14 +1,19 @@ ; RUN: llc < %s -ppc-vsr-nums-as-vr -mtriple=powerpc64-unknown-linux-gnu \ ; RUN: -verify-machineinstrs -ppc-asm-full-reg-names -mcpu=pwr9 --ppc-enable-pipeliner \ ; RUN: -pass-remarks-analysis=pipeliner -pass-remarks=pipeliner -o /dev/null 2>&1 \ -; RUN: | FileCheck %s +; RUN: | FileCheck %s --check-prefix=ENABLED +; RUN: llc < %s -ppc-vsr-nums-as-vr -mtriple=powerpc64-unknown-linux-gnu \ +; RUN: -verify-machineinstrs -ppc-asm-full-reg-names -mcpu=pwr8 --ppc-enable-pipeliner \ +; RUN: -pass-remarks-analysis=pipeliner -pass-remarks=pipeliner -o /dev/null 2>&1 \ +; RUN: | FileCheck %s --allow-empty --check-prefix=DISABLED @x = dso_local local_unnamed_addr global <{ i32, i32, i32, i32, [1020 x i32] }> <{ i32 1, i32 2, i32 3, i32 4, [1020 x i32] zeroinitializer }>, align 4 @y = dso_local global [1024 x i32] zeroinitializer, align 4 define dso_local i32* @foo() local_unnamed_addr { -;CHECK: Schedule found with Initiation Interval -;CHECK: Pipelined succesfully! +;ENABLED: Schedule found with Initiation Interval +;ENABLED: Pipelined succesfully! 
+;DISABLED-NOT: remark entry: %.pre = load i32, i32* getelementptr inbounds ([1024 x i32], [1024 x i32]* @y, i64 0, i64 0), align 4 br label %for.body Index: llvm/lib/Target/PowerPC/PPCSubtarget.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCSubtarget.cpp +++ llvm/lib/Target/PowerPC/PPCSubtarget.cpp @@ -180,7 +180,7 @@ bool PPCSubtarget::enableMachineScheduler() const { return true; } bool PPCSubtarget::enableMachinePipeliner() const { - return (CPUDirective == PPC::DIR_PWR9) && EnableMachinePipeliner; + return getSchedModel().hasInstrSchedModel() && EnableMachinePipeliner; } bool PPCSubtarget::useDFAforSMS() const { return false; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83590.277146.patch Type: text/x-patch Size: 2044 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 14:21:41 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via llvm-commits) Date: Fri, 10 Jul 2020 14:21:41 -0700 (PDT) Subject: [llvm] 6dda6ff - [FileCheck] Fix up -dump-input* docs Message-ID: <5f08dbe5.1c69fb81.20bd.09e9@mx.google.com> Author: Joel E. Denny Date: 2020-07-10T17:21:01-04:00 New Revision: 6dda6ff0e094b667311dbd7a46d4e36aa787e033 URL: https://github.com/llvm/llvm-project/commit/6dda6ff0e094b667311dbd7a46d4e36aa787e033 DIFF: https://github.com/llvm/llvm-project/commit/6dda6ff0e094b667311dbd7a46d4e36aa787e033.diff LOG: [FileCheck] Fix up -dump-input* docs In FileCheck.rst, add `-dump-input-context` and `-dump-input-filter`, and fix some `-dump-input` documentation. In `FileCheck -help`, `cl::value_desc("kind")` is being ignored for `-dump-input-filter`, so just drop it. Extend `-dump-input=help` to mention FILECHECK_OPTS. 
Added: Modified: llvm/docs/CommandGuide/FileCheck.rst llvm/utils/FileCheck/FileCheck.cpp Removed: ################################################################################ diff --git a/llvm/docs/CommandGuide/FileCheck.rst b/llvm/docs/CommandGuide/FileCheck.rst index cb5db00c7b12..0a0c2c5dd25d 100644 --- a/llvm/docs/CommandGuide/FileCheck.rst +++ b/llvm/docs/CommandGuide/FileCheck.rst @@ -103,11 +103,37 @@ and from the command line. -verify``. With this option FileCheck will verify that input does not contain warnings not covered by any ``CHECK:`` patterns. -.. option:: --dump-input +.. option:: --dump-input Dump input to stderr, adding annotations representing currently enabled - diagnostics. Do this either 'always', on 'fail' (default), or 'never'. - Specify 'help' to explain the dump format and quit. + diagnostics. When there are multiple occurrences of this option, the + ```` that appears earliest in the list below has precedence. The + default is ``fail``. + + * ``help`` - Explain input dump and quit + * ``always`` - Always dump input + * ``fail`` - Dump input on failure + * ``never`` - Never dump input + +.. option:: --dump-input-context + + In the dump requested by ``--dump-input``, print ```` input lines before + and ```` input lines after any lines specified by ``--dump-input-filter``. + When there are multiple occurrences of this option, the largest specified + ```` has precedence. The default is 5. + +.. option:: --dump-input-filter + + In the dump requested by ``--dump-input``, print only input lines of kind + ```` plus any context specified by ``--dump-input-context``. When + there are multiple occurrences of this option, the ```` that appears + earliest in the list below has precedence. The default is ``error`` when + ``--dump-input=fail``, and it's ``all`` when ``--dump-input=always``. 
+ + * ``all`` - All input lines + * ``annotation-full`` - Input lines with annotations + * ``annotation`` - Input lines with starting points of annotations + * ``error`` - Input lines with starting points of error annotations .. option:: --enable-var-scope @@ -137,15 +163,15 @@ and from the command line. .. option:: -v - Print good directive pattern matches. However, if ``-input-dump=fail`` or - ``-input-dump=always``, add those matches as input annotations instead. + Print good directive pattern matches. However, if ``-dump-input=fail`` or + ``-dump-input=always``, add those matches as input annotations instead. .. option:: -vv Print information helpful in diagnosing internal FileCheck issues, such as discarded overlapping ``CHECK-DAG:`` matches, implicit EOF pattern matches, and ``CHECK-NOT:`` patterns that do not have matches. Implies ``-v``. - However, if ``-input-dump=fail`` or ``-input-dump=always``, just add that + However, if ``-dump-input=fail`` or ``-dump-input=always``, just add that information as input annotations instead. .. option:: --allow-deprecated-dag-overlap diff --git a/llvm/utils/FileCheck/FileCheck.cpp b/llvm/utils/FileCheck/FileCheck.cpp index 8bf1dd2e9b49..fa79c5e89489 100644 --- a/llvm/utils/FileCheck/FileCheck.cpp +++ b/llvm/utils/FileCheck/FileCheck.cpp @@ -140,12 +140,11 @@ enum DumpInputFilterValue { static cl::list DumpInputFilters( "dump-input-filter", cl::desc("In the dump requested by -dump-input, print only input lines of\n" - "kind plus any context specified by -dump-input-context.\n" - "When there are multiple occurrences of this option, the \n" + "kind plus any context specified by -dump-input-context.\n" + "When there are multiple occurrences of this option, the \n" "that appears earliest in the list below has precedence. 
The\n" "default is 'error' when -dump-input=fail, and it's 'all' when\n" "-dump-input=always.\n"), - cl::value_desc("kind"), cl::values(clEnumValN(DumpInputFilterAll, "all", "All input lines"), clEnumValN(DumpInputFilterAnnotationFull, "annotation-full", "Input lines with annotations"), @@ -226,14 +225,21 @@ static void DumpInputAnnotationHelp(raw_ostream &OS) { << "explain the input dump printed by FileCheck.\n" << "\n" << "Related command-line options:\n" + << "\n" << " - -dump-input= enables or disables the input dump\n" - << " - -dump-input-filter= filters the input lines\n" + << " - -dump-input-filter= filters the input lines\n" << " - -dump-input-context= adjusts the context of filtered lines\n" << " - -v and -vv add more annotations\n" << " - -color forces colors to be enabled both in the dump and below\n" << " - -help documents the above options in more detail\n" << "\n" - << "Input dump annotation format:\n"; + << "These options can also be set via FILECHECK_OPTS. For example, for\n" + << "maximum debugging output on failures:\n" + << "\n" + << " $ FILECHECK_OPTS='-dump-input-filter=all -vv -color' ninja check\n" + << "\n" + << "Input dump annotation format:\n" + << "\n"; // Labels for input lines. OS << " - "; From llvm-commits at lists.llvm.org Fri Jul 10 14:23:10 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:23:10 +0000 (UTC) Subject: [PATCH] D83137: [SVE][CodeGen] Legalisation of masked loads and stores In-Reply-To: References: Message-ID: <13f42214fd81e21bdbfd94ddb117c144@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. LGTM ================ Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:1096 + // Extract lo/hi halves of legal predicate types. 
+ def : Pat<(nxv2i1 (extract_subvector (nxv4i1 PPR:$Ps), (i64 0))), ---------------- kmclaughlin wrote: > efriedma wrote: > > Do we need to support extracting, for example, an nxv2i1 from an nxv16i1? > We may need to support extracting a nxv2i1 from an nxv16i1, etc at some point, though I don't believe there are any code paths which would require this just now? At least, for the purposes of this patch I think we just need those patterns where the index is either 0 or half the number of elements. We do have a DAGCombine for EXTRACT_SUBVECTOR of an EXTRACT_SUBVECTOR; it isn't triggering here for some reason? I guess that's okay. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83137/new/ https://reviews.llvm.org/D83137 From llvm-commits at lists.llvm.org Fri Jul 10 14:25:29 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:25:29 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <23b19ad0cd7551bfa810f928a9ca4185@localhost.localdomain> yonghong-song added a comment. For structs with >64k members, the type is currently ignored (emitted with type id 0) without any compilation error. That is why I recommend issuing a warning. I worry there may be code out there with such a huge struct/enum (esp. enum) that is not used by the bpf program directly but is present in the bpf code. The fatal error may break it. But I agree such cases should be really rare, so maybe a fatal error is okay. Using a hole instead of representing float as another type may make it complicated to handle cases like const float ... float * ... float a[100]; some may be struct members, some others may be pointees, some may be function arguments, etc. Or do we only care about float members in a structure, which is the motivation for this patch, so that all other potential uses of float are ignored (as they are today)?
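On the >64k-member limit mentioned at the start of this comment: as far as I understand, it comes from the BTF type record keeping its member count (`vlen`) in the low 16 bits of the `info` word, with `BTF_MAX_VLEN` defined as `0xffff` in `<linux/btf.h>`. A minimal sketch of why an emitter has to check the count explicitly; `infoVlen` and `fitsInVlen` are hypothetical helper names, not functions from LLVM or the kernel:

```cpp
#include <cassert>
#include <cstdint>

// Same value as BTF_MAX_VLEN in the Linux UAPI header <linux/btf.h>.
constexpr std::uint32_t BTF_MAX_VLEN = 0xffff;

// Hypothetical helper: extract the member count from a BTF `info` word.
// Only the low 16 bits carry vlen.
constexpr std::uint32_t infoVlen(std::uint32_t Info) { return Info & 0xffffu; }

// Hypothetical helper: can a struct/enum with this many members be
// described by a single BTF type record?
constexpr bool fitsInVlen(std::uint64_t NumMembers) {
  return NumMembers <= BTF_MAX_VLEN;
}

// 65535 members still fit; 65536 do not.
static_assert(fitsInVlen(65535), "largest representable member count");
static_assert(!fitsInVlen(65536), "one past the limit");

// Storing an oversized count without a check would silently wrap to 0,
// which is why the emitter has to skip, warn, or fail instead.
static_assert(infoVlen(65536u) == 0, "count wraps when masked to 16 bits");
```

Whether the right reaction to `!fitsInVlen(...)` is a warning or a fatal error is exactly the open question in this thread; the sketch only shows that some explicit check is needed.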
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Fri Jul 10 14:26:36 2020 From: llvm-commits at lists.llvm.org (Evandro Menezes via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:26:36 +0000 (UTC) Subject: [PATCH] D83588: [TableGen][CGS] Print better errors on overlapping InstRW In-Reply-To: References: Message-ID: evandro added inline comments. ================ Comment at: llvm/utils/TableGen/CodeGenSchedule.cpp:1093 "\"."); + PrintFatalError(RWD->getLoc(), "Previous match was here."); } ---------------- Or rather, continuing from the previous line: ``` "\"" + " at " + RWD->getLoc()); ``` ================ Comment at: llvm/utils/TableGen/CodeGenSchedule.cpp:1129 "\"."); + PrintFatalError(OldRWDef->getLoc(), "Previous match was here."); } ---------------- Ditto. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83588/new/ https://reviews.llvm.org/D83588 From llvm-commits at lists.llvm.org Fri Jul 10 14:26:39 2020 From: llvm-commits at lists.llvm.org (Eli Friedman via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:26:39 +0000 (UTC) Subject: [PATCH] D83568: [SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. In-Reply-To: References: Message-ID: <439f1198b0020f3f62008bc3e20cead5@localhost.localdomain> efriedma accepted this revision. efriedma added a comment. This revision is now accepted and ready to land. LGTM with one minor comment ================ Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-fp-converts.ll:33 +define <8 x half> @fptrunc_v8f32_v8f16(<8 x float>* %in) #0 { +; CHECK-LABEL: fptrunc_v8f32_v8f16: + %a = load <8 x float>, <8 x float>* %in ---------------- Please stick a check in here for the scalar fptrunc, so we're at least testing something. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83568/new/ https://reviews.llvm.org/D83568 From llvm-commits at lists.llvm.org Fri Jul 10 14:27:20 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:27:20 +0000 (UTC) Subject: [PATCH] D81775: [COFF] Add cg_profile directive and .llvm.call-graph-profile section In-Reply-To: References: Message-ID: <2e04eba0d78e7a77732c77f1316408ea@localhost.localdomain> zequanwu added a comment. Friendly ping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81775/new/ https://reviews.llvm.org/D81775 From llvm-commits at lists.llvm.org Fri Jul 10 14:27:43 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:27:43 +0000 (UTC) Subject: [PATCH] D81429: [COFF] Port CallGraphSort to COFF from ELF In-Reply-To: References: Message-ID: zequanwu added a comment. Herald added a subscriber: dang. Friendly ping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81429/new/ https://reviews.llvm.org/D81429 From llvm-commits at lists.llvm.org Fri Jul 10 14:28:52 2020 From: llvm-commits at lists.llvm.org (Tom Stellard via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:28:52 +0000 (UTC) Subject: [PATCH] D82694: [clang-shlib] Don't link with static clang libraries In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG1d68a780b34e: [clang-shlib] Don't link with static clang libraries (authored by tstellar). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82694/new/ https://reviews.llvm.org/D82694 Files: clang/tools/clang-shlib/CMakeLists.txt Index: clang/tools/clang-shlib/CMakeLists.txt =================================================================== --- clang/tools/clang-shlib/CMakeLists.txt +++ clang/tools/clang-shlib/CMakeLists.txt @@ -13,7 +13,12 @@ else() list(APPEND _OBJECTS $) endif() - list(APPEND _DEPS $) + if (BUILD_SHARED_LIBS) + # If we are building static libraries, then we don't need to add the static + # libraries as a depedency, because we are already linking against the + # individual object files. + list(APPEND _DEPS $) + endif() # clang libraries are redundant since we are linking all the individual # object files into libclang-cpp.so, so filter them out from _DEPS. -------------- next part -------------- A non-text attachment was scrubbed... Name: D82694.277147.patch Type: text/x-patch Size: 818 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 14:33:47 2020 From: llvm-commits at lists.llvm.org (Bill Wendling via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:33:47 +0000 (UTC) Subject: [PATCH] D83523: MachineSink: permit sinking into INLINEASM_BR indirect targets In-Reply-To: References: Message-ID: <9976f4a6850a45088850ce9dece860ee@localhost.localdomain> void added a comment. Without this change, the "ADD64ri8" instruction is before the INLINEASM_BR before machine sinking. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83523/new/ https://reviews.llvm.org/D83523 From llvm-commits at lists.llvm.org Fri Jul 10 14:37:38 2020 From: llvm-commits at lists.llvm.org (Florian Hahn via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:37:38 +0000 (UTC) Subject: [PATCH] D83335: [ScheduleDAGRRList] Use std::*_heap() to keep candidate queue a heap. In-Reply-To: References: Message-ID: fhahn added a comment. 
In D83335#2140138 , @efriedma wrote: > > Are you referring to using the heap only once the queue grows larger than a threshold or deciding what scheduling heuristics to enable based on the size? > > The scheduling heuristics. > > > The selection should be deterministic across different compilers/C++ STLs because the comparator enforces a total order. > > It's undefined behavior to call std::push_heap/std::pop_heap on an array that isn't a heap. If the total order changes, that can break the heap property. Not sure what the practical consequence would be on common STL implementations, but that seems scary enough that we want to ensure that can't happen. Yeah we should avoid that. I'll take another look at the source order comparator, but I don't think we can rule out changing costs as of right now. Potentially changing the comparator for backends using the MachineScheduler is a bit bigger task. In the meantime I think I'll put up a patch that limits the number of candidates to scan linearly, to avoid a nasty quadratic compile-time case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83335/new/ https://reviews.llvm.org/D83335 From llvm-commits at lists.llvm.org Fri Jul 10 14:42:44 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via llvm-commits) Date: Fri, 10 Jul 2020 14:42:44 -0700 (PDT) Subject: [polly] 32bf468 - [Polly] Fix -polly-opt-isl -analyze Message-ID: <5f08e0d4.1c69fb81.132a7.013b@mx.google.com> Author: Michael Kruse Date: 2020-07-10T16:42:03-05:00 New Revision: 32bf46842025b740de870dbcd446e4b0f203c9aa URL: https://github.com/llvm/llvm-project/commit/32bf46842025b740de870dbcd446e4b0f203c9aa DIFF: https://github.com/llvm/llvm-project/commit/32bf46842025b740de870dbcd446e4b0f203c9aa.diff LOG: [Polly] Fix -polly-opt-isl -analyze The member LastSchedule was never set, such that printScop would always print "n/a" instead of the last schedule. 
To ensure that the isl_ctx lives at least as long as the stored schedule, also store a shared_ptr. Also set the schedule tree output style to ISL_YAML_STYLE_BLOCK to avoid printing everything on a single line. `opt -polly-opt-isl -analyze` will be used in the next commit. Added: Modified: polly/lib/Transform/ScheduleOptimizer.cpp polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll polly/test/ScheduleOptimizer/tile_after_fusion.ll Removed: ################################################################################ diff --git a/polly/lib/Transform/ScheduleOptimizer.cpp b/polly/lib/Transform/ScheduleOptimizer.cpp index d9dffb8f1f11..ebb596222a17 100644 --- a/polly/lib/Transform/ScheduleOptimizer.cpp +++ b/polly/lib/Transform/ScheduleOptimizer.cpp @@ -1389,7 +1389,7 @@ class IslScheduleOptimizer : public ScopPass { explicit IslScheduleOptimizer() : ScopPass(ID) {} - ~IslScheduleOptimizer() override { isl_schedule_free(LastSchedule); } + ~IslScheduleOptimizer() override { releaseMemory(); } /// Optimize the schedule of the SCoP @p S.
bool runOnScop(Scop &S) override; @@ -1404,9 +1404,11 @@ class IslScheduleOptimizer : public ScopPass { void releaseMemory() override { isl_schedule_free(LastSchedule); LastSchedule = nullptr; + IslCtx.reset(); } private: + std::shared_ptr IslCtx; isl_schedule *LastSchedule = nullptr; }; } // namespace @@ -1630,6 +1632,8 @@ bool IslScheduleOptimizer::runOnScop(Scop &S) { ScopsOptimized++; NumAffineLoopsOptimized += ScopStats.NumAffineLoops; NumBoxedLoopsOptimized += ScopStats.NumBoxedLoops; + LastSchedule = NewSchedule.copy(); + IslCtx = S.getSharedIslCtx(); S.setScheduleTree(NewSchedule); S.markAsOptimized(); @@ -1652,6 +1656,7 @@ void IslScheduleOptimizer::printScop(raw_ostream &OS, Scop &) const { } p = isl_printer_to_str(isl_schedule_get_ctx(LastSchedule)); + p = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_BLOCK); p = isl_printer_print_schedule(p, LastSchedule); ScheduleStr = isl_printer_get_str(p); isl_printer_free(p); diff --git a/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll b/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll index 7a645364c4bf..ef6a9f42adab 100644 --- a/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll +++ b/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll @@ -18,6 +18,7 @@ ; C[i][j] += alpha * A[i][k] * B[k][j]; ; } ; +; CHECK-LABEL: Printing analysis 'Polly - Generate an AST from the SCoP (isl)' for region: 'bb8 => bb32' in function 'kernel_gemm': ; CHECK: { ; CHECK-NEXT: // 1st level tiling - Tiles ; CHECK-NEXT: for (int c0 = 0; c0 <= 32; c0 += 1) diff --git a/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll b/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll index 76341cb6afe0..b73083d196c8 100644 --- a/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll +++ b/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll @@ -24,6 +24,7 @@ ; C[i][j] += alpha * A[i][k] * B[k][j]; ; } ; +; CHECK-LABEL: Printing analysis 'Polly - Generate an AST from 
the SCoP (isl)' for region: 'bb8 => bb32' in function 'kernel_gemm': ; CHECK: { ; CHECK-NEXT: // 1st level tiling - Tiles ; CHECK-NEXT: for (int c0 = 0; c0 <= 32; c0 += 1) @@ -76,6 +77,7 @@ ; CHECK-NEXT: } ; CHECK-NEXT: } ; +; EXTRACTION-OF-MACRO-KERNEL-LABEL: Printing analysis 'Polly - Generate an AST from the SCoP (isl)' for region: 'bb8 => bb32' in function 'kernel_gemm': ; EXTRACTION-OF-MACRO-KERNEL: { ; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Tiles ; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c0 = 0; c0 <= 32; c0 += 1) diff --git a/polly/test/ScheduleOptimizer/tile_after_fusion.ll b/polly/test/ScheduleOptimizer/tile_after_fusion.ll index 42833ef97999..35f20ce5723d 100644 --- a/polly/test/ScheduleOptimizer/tile_after_fusion.ll +++ b/polly/test/ScheduleOptimizer/tile_after_fusion.ll @@ -17,6 +17,7 @@ ; checks whether they are tiled after being fused when polly-opt-fusion equals ; "max". ; +; CHECK-LABEL: Printing analysis 'Polly - Generate an AST from the SCoP (isl)' for region: 'for.cond => for.end56' in function 'tf': ; CHECK: 1st level tiling - Tiles ; CHECK-NEXT: for (int c0 = 0; c0 <= 7; c0 += 1) ; CHECK-NEXT: for (int c1 = 0; c1 <= 7; c1 += 1) From llvm-commits at lists.llvm.org Fri Jul 10 14:42:46 2020 From: llvm-commits at lists.llvm.org (Michael Kruse via llvm-commits) Date: Fri, 10 Jul 2020 14:42:46 -0700 (PDT) Subject: [polly] c0bc995 - [Polly] Fix prevectorization of fused loops. Message-ID: <5f08e0d6.1c69fb81.5078a.14c1@mx.google.com> Author: Michael Kruse Date: 2020-07-10T16:42:03-05:00 New Revision: c0bc995429c417c1e206841d6b9727218fab3f73 URL: https://github.com/llvm/llvm-project/commit/c0bc995429c417c1e206841d6b9727218fab3f73 DIFF: https://github.com/llvm/llvm-project/commit/c0bc995429c417c1e206841d6b9727218fab3f73.diff LOG: [Polly] Fix prevectorization of fused loops. The schedule of a fused loop has one isl_space per statement, such that a conversion to a isl_map fails. 
However, the prevectorization is interested in the schedule space only: Converting to the non-union representation only after extracting the schedule range fixes the problem. This fixes llvm.org/PR46578 Added: polly/test/ScheduleOptimizer/focaltech_test_detail_threshold-7bc17e.ll Modified: polly/lib/Transform/ScheduleOptimizer.cpp Removed: ################################################################################ diff --git a/polly/lib/Transform/ScheduleOptimizer.cpp b/polly/lib/Transform/ScheduleOptimizer.cpp index ebb596222a17..3806707259fd 100644 --- a/polly/lib/Transform/ScheduleOptimizer.cpp +++ b/polly/lib/Transform/ScheduleOptimizer.cpp @@ -385,8 +385,8 @@ ScheduleTreeOptimizer::isolateFullPartialTiles(isl::schedule_node Node, assert(isl_schedule_node_get_type(Node.get()) == isl_schedule_node_band); Node = Node.child(0).child(0); isl::union_map SchedRelUMap = Node.get_prefix_schedule_relation(); - isl::map ScheduleRelation = isl::map::from_union_map(SchedRelUMap); - isl::set ScheduleRange = ScheduleRelation.range(); + isl::union_set ScheduleRangeUSet = SchedRelUMap.range(); + isl::set ScheduleRange{ScheduleRangeUSet}; isl::set IsolateDomain = getPartialTilePrefixes(ScheduleRange, VectorWidth); auto AtomicOption = getDimOptions(IsolateDomain.get_ctx(), "atomic"); isl::union_set IsolateOption = getIsolateOptions(IsolateDomain, 1); diff --git a/polly/test/ScheduleOptimizer/focaltech_test_detail_threshold-7bc17e.ll b/polly/test/ScheduleOptimizer/focaltech_test_detail_threshold-7bc17e.ll new file mode 100644 index 000000000000..2668386c542d --- /dev/null +++ b/polly/test/ScheduleOptimizer/focaltech_test_detail_threshold-7bc17e.ll @@ -0,0 +1,94 @@ +; RUN: opt %loadPolly -polly-opt-isl -polly-opt-fusion=max -polly-vectorizer=stripmine -polly-invariant-load-hoisting -polly-optimized-scops -analyze < %s | FileCheck %s +; +; llvm.org/PR46578 +; +target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" + 
+%struct.stCfg_Incell_DetailThreshold.2.30.42.62.74.94.122.126.134.166.194.242.338.342.346.350.354.358.496.0.2.9.16.28.36.37.38.39.40.75 = type { [60 x i8]*, [60 x i32]*, [60 x i32]*, [60 x i32]*, [60 x i32]*, [60 x i32]*, [60 x i32]* } + at ft8006m_g_stCfg_Incell_DetailThreshold = external dso_local local_unnamed_addr global %struct.stCfg_Incell_DetailThreshold.2.30.42.62.74.94.122.126.134.166.194.242.338.342.346.350.354.358.496.0.2.9.16.28.36.37.38.39.40.75, align 8 +declare dso_local i32 @ft8006m_atoi() local_unnamed_addr #0 + +define void @func() { +entry: + switch i32 undef, label %cleanup [ + i32 10, label %if.end + i32 14, label %if.end + i32 16, label %if.end + ] + +if.end: ; preds = %entry, %entry, %entry + %call15 = call i32 @ft8006m_atoi() #1 + %0 = zext i32 %call15 to i64 + br label %for.cond + +for.cond: ; preds = %for.inc39, %if.end + %indvars.iv302 = phi i64 [ %indvars.iv.next303, %for.inc39 ], [ 0, %if.end ] + %exitcond304 = icmp eq i64 %indvars.iv302, 60 + br i1 %exitcond304, label %cleanup, label %for.cond21 + +for.cond21: ; preds = %for.body23, %for.cond + %indvars.iv296 = phi i64 [ %indvars.iv.next297, %for.body23 ], [ 0, %for.cond ] + %exitcond298 = icmp eq i64 %indvars.iv296, 60 + br i1 %exitcond298, label %for.cond28, label %for.body23 + +for.body23: ; preds = %for.cond21 + %1 = load [60 x i32]*, [60 x i32]** getelementptr inbounds (%struct.stCfg_Incell_DetailThreshold.2.30.42.62.74.94.122.126.134.166.194.242.338.342.346.350.354.358.496.0.2.9.16.28.36.37.38.39.40.75, %struct.stCfg_Incell_DetailThreshold.2.30.42.62.74.94.122.126.134.166.194.242.338.342.346.350.354.358.496.0.2.9.16.28.36.37.38.39.40.75* @ft8006m_g_stCfg_Incell_DetailThreshold, i64 0, i32 2), align 8 + %arrayidx25 = getelementptr [60 x i32], [60 x i32]* %1, i64 %indvars.iv302, i64 %indvars.iv296 + store i32 undef, i32* %arrayidx25, align 4 + %indvars.iv.next297 = add nuw nsw i64 %indvars.iv296, 1 + br label %for.cond21 + +for.cond28: ; preds = %for.body30, %for.cond21 + 
%indvars.iv299 = phi i64 [ %indvars.iv.next300, %for.body30 ], [ 0, %for.cond21 ] + %exitcond301 = icmp eq i64 %indvars.iv299, 60 + br i1 %exitcond301, label %for.inc39, label %for.body30 + +for.body30: ; preds = %for.cond28 + %2 = load [60 x i32]*, [60 x i32]** getelementptr inbounds (%struct.stCfg_Incell_DetailThreshold.2.30.42.62.74.94.122.126.134.166.194.242.338.342.346.350.354.358.496.0.2.9.16.28.36.37.38.39.40.75, %struct.stCfg_Incell_DetailThreshold.2.30.42.62.74.94.122.126.134.166.194.242.338.342.346.350.354.358.496.0.2.9.16.28.36.37.38.39.40.75* @ft8006m_g_stCfg_Incell_DetailThreshold, i64 0, i32 2), align 8 + %arrayidx34 = getelementptr [60 x i32], [60 x i32]* %2, i64 %0, i64 %indvars.iv299 + store i32 undef, i32* %arrayidx34, align 4 + %indvars.iv.next300 = add nuw nsw i64 %indvars.iv299, 1 + br label %for.cond28 + +for.inc39: ; preds = %for.cond28 + %indvars.iv.next303 = add nuw nsw i64 %indvars.iv302, 1 + br label %for.cond + +cleanup: ; preds = %for.cond, %entry + ret void +} + + +; CHECK-LABEL: Printing analysis 'Polly - Optimize schedule of SCoP' for region: 'for.cond => cleanup' in function 'func': +; CHECK: Calculated schedule: +; CHECK: domain: "[call15] -> { Stmt_for_body23[i0, i1] : 0 <= i0 <= 59 and 0 <= i1 <= 59; Stmt_for_body30[i0, i1] : 0 <= i0 <= 59 and 0 <= i1 <= 59 }" +; CHECK: child: +; CHECK: mark: "1st level tiling - Tiles" +; CHECK: child: +; CHECK: schedule: "[call15] -> [{ Stmt_for_body23[i0, i1] -> [(floor((i0 + i1)/32))]; Stmt_for_body30[i0, i1] -> [(floor((call15 + i1)/32))] }, { Stmt_for_body23[i0, i1] -> [(floor((i0)/32))]; Stmt_for_body30[i0, i1] -> [(floor((i0)/32))] }]" +; CHECK: permutable: 1 +; CHECK: coincident: [ 1, 0 ] +; CHECK: child: +; CHECK: mark: "1st level tiling - Points" +; CHECK: child: +; CHECK: schedule: "[call15] -> [{ Stmt_for_body23[i0, i1] -> [(floor((i0 + i1)/4) - 8*floor((i0 + i1)/32))]; Stmt_for_body30[i0, i1] -> [(floor((call15 + i1)/4) - 8*floor((call15 + i1)/32))] }]" +; CHECK: permutable: 1 +; 
CHECK: coincident: [ 1 ] +; CHECK: options: "[call15] -> { atomic[0]; isolate{{\[\[}}i0, i1] -> [i2]] : 0 <= i1 <= 1 and 0 <= i2 <= 7 and call15 - 32i0 <= 4i2 <= 56 + call15 - 32i0 and (call15 >= 120 or i2 < -8i0 + 8i1); isolate{{\[\[}}i0, 0] -> [i2]] : 0 <= i2 <= 7 and ((call15 >= 120 and -8i0 <= i2 <= 21 - 8i0) or (call15 >= 92 and i2 <= 28 - 8i0 and 4i2 >= call15 - 32i0) or (call15 <= 91 and -8i0 <= i2 <= 21 - 8i0) or (92 <= call15 <= 119 and -8i0 <= i2 <= 21 - 8i0) or (call15 <= 91 and 22 - 8i0 <= i2 <= 28 - 8i0 and 4i2 <= 56 + call15 - 32i0) or (call15 <= 119 and i2 >= 29 - 8i0 and call15 - 32i0 <= 4i2 <= 56 + call15 - 32i0)); isolate{{\[\[}}i0, 1] -> [i2]] : i2 >= 0 and 8 - 8i0 <= i2 <= 7 and ((call15 >= 120 and i2 <= 28 - 8i0) or (call15 <= 119 and i2 >= 29 - 8i0 and 4i2 <= 56 + call15 - 32i0) or (call15 <= 119 and i2 <= 28 - 8i0)) }" +; CHECK: child: +; CHECK: schedule: "[call15] -> [{ Stmt_for_body23[i0, i1] -> [((i0) mod 32)]; Stmt_for_body30[i0, i1] -> [((i0) mod 32)] }]" +; CHECK: permutable: 1 +; CHECK: child: +; CHECK: mark: "SIMD" +; CHECK: child: +; CHECK: sequence: +; CHECK: - filter: "[call15] -> { Stmt_for_body23[i0, i1] }" +; CHECK: child: +; CHECK: schedule: "[call15] -> [{ Stmt_for_body23[i0, i1] -> [((i0 + i1) mod 4)]; Stmt_for_body30[i0, i1] -> [((call15 + i1) mod 4)] }]" +; CHECK: permutable: 1 +; CHECK: coincident: [ 1 ] +; CHECK: - filter: "[call15] -> { Stmt_for_body30[i0, i1] }" +; CHECK: child: +; CHECK: schedule: "[call15] -> [{ Stmt_for_body23[i0, i1] -> [((i0 + i1) mod 4)]; Stmt_for_body30[i0, i1] -> [((call15 + i1) mod 4)] }]" +; CHECK: permutable: 1 +; CHECK: coincident: [ 1 ] From llvm-commits at lists.llvm.org Fri Jul 10 14:44:15 2020 From: llvm-commits at lists.llvm.org (Krzysztof Pszeniczny via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:44:15 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: amharc added a comment. 
In D79978#2145063 , @MaskRay wrote: > In D79978#2141992 , @wmi wrote: > > > In D79978#2141959 , @MaskRay wrote: > > > > > Ack. Then what instructions should be placed at the top of these basic blocks? Should `.cfi_def_cfa_register %rbp` be placed as well? If you move these basic blocks around, `.cfi_def_cfa_register %rbp` is currently not tracked. > > > > > > That is because .cfi_def_cfa %rbp, 16 is identical to the following: > > .cfi_def_cfa_register %rbp > > .cfi_def_cfa_offset 16 > > > Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me. > I think it lacks a summary about what the patch intends to do. > > Is the intention: if the entry block stays in the front of the function, while other basic blocks can be randomly shuffled => the CFI states of all basic blocks are the same no matter how non-entry basic blocks are shuffled? > > Or > > The entry basic block can be shuffled as well? We explicitly want to support the case where all BBs can be shuffled around arbitrarily. > For either behavior, I am not sure the test covers the situations: I can only find `.cfi_def_cfa_register` in the entry block, not in others - so I am not confident that `.cfi_def_cfa_register` information is correctly retained. We need a stronger test. It is retained, because we always emit a full `.cfi_def_cfa` in all non-entry basic blocks - which, as @wmi has mentioned before, is equivalent to a `.cfi_def_cfa_register` and a `.cfi_def_cfa_offset`. For non-entry basic blocks, there is no previous state (neither the cfa register nor the cfa offset), so we cannot emit just `.cfi_def_cfa_register` or just `.cfi_def_cfa_offset`, because both of them preserve one part of the cfa definition (the offset for `.cfi_def_cfa_register` or the register for `.cfi_def_cfa_offset`). > Note also that only `rbp` is described.
I think we need another register to demonstrate the effect. `rbp` is the usual frame pointer register for the x86 architecture and I'm not really sure we can easily force the compiler to choose a different register to hold the frame pointer. If you know how to force a different register to be the frame pointer, please let us know - we will add a corresponding test. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Fri Jul 10 14:46:02 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:46:02 +0000 (UTC) Subject: [PATCH] D78283: [MustExecute] Use MustBeExecutedInterval to eliminate work duplication In-Reply-To: References: Message-ID: <4e1e5ca3a117100f76851f968e39d921@localhost.localdomain> lebedev.ri added a comment. Herald added a reviewer: baziotis. Herald added subscribers: okura, bbn. > Early test results show significant improvements wrt. compile time and no changes to the context. But the tests are changed. Can this be split up into NFC refactoring + some other patches that change tests? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78283/new/ https://reviews.llvm.org/D78283 From llvm-commits at lists.llvm.org Fri Jul 10 14:55:07 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:55:07 +0000 (UTC) Subject: [PATCH] D81993: [AArch64][GlobalISel] Add post-legalize combine for sext(trunc(sextload)) -> trunc/copy In-Reply-To: References: Message-ID: aemerson added a comment. 
Ping Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81993/new/ https://reviews.llvm.org/D81993 From llvm-commits at lists.llvm.org Fri Jul 10 14:59:16 2020 From: llvm-commits at lists.llvm.org (Andrii Nakryiko via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 21:59:16 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: anakryiko added a comment.
I concur with @yonghong-song, making float disappear is a much bigger pain. But I also think we should go with encoding and also fix how the kernel interprets the encoding. Right now it allows only one of SIGNED, CHAR, or BOOL to be specified, which makes it impossible to correctly represent the `signed char` type, as you can see from this:
  $ cat test.c
  struct s { signed char wat; };
  int main() { static struct s s = { .wat = -2 }; return s.wat; }
  $ cc -g test.c -o test
  $ readelf -wi test | rg 'signed char'
  <48> DW_AT_encoding : 6 (signed char)
  <49> DW_AT_name : (indirect string, offset: 0x50): signed char
  $ pahole -JV test | rg 'signed char'
  [2] INT signed char size=1 bit_offset=0 nr_bits=8 encoding=(none)
  $ clang -g -target bpf -c test.c -o test.bpf.o
  $ bpftool btf dump file test.bpf.o | rg 'signed char'
  [5] INT 'signed char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
DWARF gets it right. pahole does the same thing as for float: it just emits no encoding. clang emits it as signed, but not as a char. BTW, I couldn't get either pahole or clang to emit the CHAR encoding for anything (char, unsigned char, signed char); what was it originally supposed to be used for?
So I'd say let's add BTF_INT_FLOATING and fix the SIGNED CHAR and CHAR problem altogether? Libbpf can trivially sanitize BTF for older kernels. Thoughts?
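To make the encoding point concrete, here is a rough sketch of the `BTF_KIND_INT` data word as I understand it from `<linux/btf.h>`: encoding in bits 24-27, bit offset in bits 16-23, bit width in bits 0-7, with `BTF_INT_SIGNED`, `BTF_INT_CHAR` and `BTF_INT_BOOL` as individual flag bits. The helper names (`packInt`, `isSingleFlagOrNone`, `describeEncoding`) are hypothetical and only model the "one flag at most" restriction described above; this is not kernel or LLVM code, and the proposed `BTF_INT_FLOATING` is deliberately left out because it does not exist yet.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Flag values as defined in the Linux UAPI header <linux/btf.h>.
constexpr std::uint32_t BTF_INT_SIGNED = 1u << 0;
constexpr std::uint32_t BTF_INT_CHAR = 1u << 1;
constexpr std::uint32_t BTF_INT_BOOL = 1u << 2;

// Field extraction for the 32-bit word that follows a BTF_KIND_INT record.
constexpr std::uint32_t intEncoding(std::uint32_t W) { return (W & 0x0f000000u) >> 24; }
constexpr std::uint32_t intOffset(std::uint32_t W) { return (W & 0x00ff0000u) >> 16; }
constexpr std::uint32_t intBits(std::uint32_t W) { return W & 0x000000ffu; }

// Hypothetical helper: pack encoding, bit offset and bit width into one word.
constexpr std::uint32_t packInt(std::uint32_t Enc, std::uint32_t Off,
                                std::uint32_t Bits) {
  return (Enc << 24) | (Off << 16) | Bits;
}

// Hypothetical model of the restriction discussed above: no flag, or
// exactly one of SIGNED, CHAR, BOOL.
constexpr bool isSingleFlagOrNone(std::uint32_t Enc) {
  return Enc == 0 || Enc == BTF_INT_SIGNED || Enc == BTF_INT_CHAR ||
         Enc == BTF_INT_BOOL;
}

// Hypothetical helper: render the flag set the way a dump tool might.
std::string describeEncoding(std::uint32_t Enc) {
  assert(Enc <= 0xfu && "encoding is a 4-bit field");
  std::string Out;
  auto Add = [&Out](const char *Name) {
    if (!Out.empty())
      Out += '|';
    Out += Name;
  };
  if (Enc & BTF_INT_SIGNED)
    Add("SIGNED");
  if (Enc & BTF_INT_CHAR)
    Add("CHAR");
  if (Enc & BTF_INT_BOOL)
    Add("BOOL");
  return Out.empty() ? "(none)" : Out;
}

// What clang emits today for `signed char`: SIGNED only, 8 bits wide.
constexpr std::uint32_t ClangSignedChar = packInt(BTF_INT_SIGNED, 0, 8);
static_assert(intEncoding(ClangSignedChar) == BTF_INT_SIGNED, "SIGNED only");
static_assert(intOffset(ClangSignedChar) == 0, "no bit offset");
static_assert(intBits(ClangSignedChar) == 8, "8-bit integer");

// The combination that would describe `signed char` precisely does not
// pass the single-flag restriction.
static_assert(isSingleFlagOrNone(BTF_INT_SIGNED), "one flag is accepted");
static_assert(!isSingleFlagOrNone(BTF_INT_SIGNED | BTF_INT_CHAR),
              "two flags are rejected");
```

With these helpers, `describeEncoding(BTF_INT_SIGNED | BTF_INT_CHAR)` yields `SIGNED|CHAR`, which is the combination the bit layout can hold but the single-flag rule rejects.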
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Fri Jul 10 15:01:41 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:01:41 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. In-Reply-To: References: Message-ID: <24c884e3d2e972e8f8d109aec9511cf9@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:790 addPass(createLoadStoreVectorizerPass()); + + addPass(createLowerSwitchPass()); ---------------- Can you add a comment for why this is here ================ Comment at: llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll:6 + ; follows the LowerSwitch. Otherwise, this testcase will crash soonafter + ; the instruction selection due to the incomplete PHI node in an MBB whose + ; incoming values were never codegenerated. ---------------- typo soonafter Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83584/new/ https://reviews.llvm.org/D83584 From llvm-commits at lists.llvm.org Fri Jul 10 15:07:54 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:07:54 +0000 (UTC) Subject: [PATCH] D70376: [LVI] Restructure caching In-Reply-To: References: Message-ID: <7933e274a74d413fbf287b2ed3ce1b74@localhost.localdomain> tejohnson added a comment. Hi @nikic, I just tracked down a big compile time increase to this patch. 
The issue cropped up for a very large function, where a cycle profile showed hotspots in: llvm::DenseMapBase, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo >, llvm::detail::DenseMapPair, llvm::ValueLatticeElement> >, llvm::AssertingVH, llvm::ValueLatticeElement, llvm::DenseMapInfo >, llvm::detail::DenseMapPair, llvm::ValueLatticeElement> >::erase(llvm::AssertingVH const&) and (anonymous namespace)::LVIValueHandle::deleted() The problem is related to this patch's restructuring of the LazyValueInfoCache so that instead of a 2-level map from Value* -> BB -> ValueLatticeElement, it now has a 2-level map from BB -> Value -> ValueLatticeElement. The problem is that LVIValueHandle::deleted invokes LazyValueInfoCache::eraseValue on a Value*, which now needs to walk through every entry in the outer map to remove the Value* from every block containing it in its lattice. Before, it could simply do a single lookup on the outer map to remove the Value. When I revert this patch the compile time goes down ~35%. I noticed in your description this comment: "A possible alternative would be to always cache by value first and have per-BB maps/sets in the each cache entry. In that case we could use a ValueMap and would avoid the separate value handle set. I went with the BB indexing at the top level to make it easier to integrate D69914 , but possibly that's not the right choice." It sounds like your proposed alternative would address this issue. Would you be able to do that? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70376/new/ https://reviews.llvm.org/D70376 From llvm-commits at lists.llvm.org Fri Jul 10 15:09:20 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:09:20 +0000 (UTC) Subject: [PATCH] D78283: [MustExecute] Use MustBeExecutedInterval to eliminate work duplication In-Reply-To: References: Message-ID: lebedev.ri accepted this revision. 
lebedev.ri added a comment. This revision is now accepted and ready to land. Alright, this has been up for review long enough :) Large diffs are indeed scary, and impossible to review, and there is not really a way for people to mark parts as reviewed so if i'm not the first one reviewing, i guess no one has spotted anything bad yet. This is pretty self-contained, in a pretty new (attributor) area. Cursory examination suggests that this looks about right. I suspect the perf story is indeed better than what there currently is :S So i feel like i'm okay with rubber-stamping it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78283/new/ https://reviews.llvm.org/D78283 From llvm-commits at lists.llvm.org Fri Jul 10 15:10:00 2020 From: llvm-commits at lists.llvm.org (Amy Kwan via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:10:00 +0000 (UTC) Subject: [PATCH] D83497: [PowerPC][Power10] Fix VINS* (vector insert byte/half/word) instructions to have i32 arguments. In-Reply-To: References: Message-ID: <1796f90eb3df1bead3e449390749bc22@localhost.localdomain> amyk updated this revision to Diff 277153. amyk retitled this revision from "[PowerPC][Power10] Fix the VINSW instruction to have an i32 argument." to "[PowerPC][Power10] Fix VINS* (vector insert byte/half/word) instructions to have i32 arguments.". amyk edited the summary of this revision. amyk added a comment. Herald added a project: clang. Herald added a subscriber: cfe-commits. Updated revision to fix vector insert byte/half/word versions to have an i32 argument. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83497/new/ https://reviews.llvm.org/D83497 Files: clang/include/clang/Basic/BuiltinsPPC.def clang/test/CodeGen/builtins-ppc-p10vector.c llvm/include/llvm/IR/IntrinsicsPowerPC.td llvm/lib/Target/PowerPC/PPCInstrPrefix.td llvm/test/CodeGen/PowerPC/builtins-ppc-p10permute.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83497.277153.patch Type: text/x-patch Size: 18081 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 15:10:43 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:10:43 +0000 (UTC) Subject: [PATCH] D81993: [AArch64][GlobalISel] Add post-legalize combine for sext(trunc(sextload)) -> trunc/copy In-Reply-To: References: Message-ID: <1ff8fb1800de093681b0b5449ddcefcc@localhost.localdomain> arsenm added inline comments. ================ Comment at: llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp:447 + Register Dst = MI.getOperand(0).getReg(); + unsigned DstSize = MRI.getType(Dst).getSizeInBits(); + if (MI.hasOneMemOperand()) ---------------- I think this needs to be careful about a vector sextload Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81993/new/ https://reviews.llvm.org/D81993 From llvm-commits at lists.llvm.org Fri Jul 10 15:15:17 2020 From: llvm-commits at lists.llvm.org (James Y Knight via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:15:17 +0000 (UTC) Subject: [PATCH] D83523: MachineSink: permit sinking into INLINEASM_BR indirect targets In-Reply-To: References: Message-ID: <64a283912c34fe1564125274277272cb@localhost.localdomain> jyknight added a comment. It looks like the issue shown in this test-case appears in Two-Address instruction pass, not Machine Sink. 
We go from: bb.0 (%ir-block.1): successors: %bb.2(0x80000000), %bb.1(0x00000000); %bb.2(100.00%), %bb.1(0.00%) liveins: $rdi %1:gr64 = COPY killed $rdi %0:gr64 = nuw ADD64ri8 %1:gr64(tied-def 0), 24, implicit-def dead $eflags INLINEASM_BR &"# $0 $1 $2" [sideeffect] [mayload] [maystore] [attdialect], $0:[mem:m], killed %1:gr64, 1, $noreg, 24, $noreg, $1:[imm], 1, $2:[imm], blockaddress(@klist_dec_and_del, %ir-block.4), $ 3:[clobber], implicit-def dead early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def dead early-clobber $eflags JMP_1 %bb.2 bb.1 (%ir-block.4, address-taken): ; predecessors: %bb.0 successors: %bb.2(0x80000000); %bb.2(100.00%) MOV64mi32 killed %0:gr64, 1, $noreg, -24, $noreg, 0 :: (store 8 into %ir.6) bb.2 (%ir-block.7): ; predecessors: %bb.0, %bb.1 RET 0, undef $eax then replace the ADD64ri8 with a LEA64r -- but place it _after_ the INLINEASM_BR, bb.0 (%ir-block.1): successors: %bb.2(0x80000000), %bb.1(0x00000000); %bb.2(100.00%), %bb.1(0.00%) liveins: $rdi %1:gr64 = COPY killed $rdi - %0:gr64 = nuw ADD64ri8 %1:gr64(tied-def 0), 24, implicit-def dead $eflags - INLINEASM_BR &"# $0 $1 $2" [sideeffect] [mayload] [maystore] [attdialect], $0:[mem:m], killed %1:gr64, 1, $noreg, 24, $noreg, $1:[imm], 1, $2:[imm], blockaddress(@klist_dec_and_del, %ir-block.4), $3:[clobber], implicit-def dead early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def dead early-clobber $eflags + INLINEASM_BR &"# $0 $1 $2" [sideeffect] [mayload] [maystore] [attdialect], $0:[mem:m], %1:gr64, 1, $noreg, 24, $noreg, $1:[imm], 1, $2:[imm], blockaddress(@klist_dec_and_del, %ir-block.4), $3:[clobber], implicit-def dead early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def dead early-clobber $eflags + %0:gr64 = LEA64r killed %1:gr64, 1, $noreg, 24, $noreg JMP_1 %bb.2 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83523/new/ 
https://reviews.llvm.org/D83523 From llvm-commits at lists.llvm.org Fri Jul 10 15:15:34 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Fri, 10 Jul 2020 15:15:34 -0700 (PDT) Subject: [llvm] 122a45f - [X86] Add isel patterns for matching broadcast vpternlog if the ternlog and the broadcast have different types. Message-ID: <5f08e886.1c69fb81.f85a1.0f9b@mx.google.com> Author: Craig Topper Date: 2020-07-10T15:15:02-07:00 New Revision: 122a45fbac059be0fb88b2b909191d7a93ce9c09 URL: https://github.com/llvm/llvm-project/commit/122a45fbac059be0fb88b2b909191d7a93ce9c09 DIFF: https://github.com/llvm/llvm-project/commit/122a45fbac059be0fb88b2b909191d7a93ce9c09.diff LOG: [X86] Add isel patterns for matching broadcast vpternlog if the ternlog and the broadcast have different types. Added: Modified: llvm/lib/Target/X86/X86InstrAVX512.td llvm/test/CodeGen/X86/vector-fshl-128.ll llvm/test/CodeGen/X86/vector-fshl-256.ll llvm/test/CodeGen/X86/vector-fshl-512.ll llvm/test/CodeGen/X86/vector-fshr-128.ll llvm/test/CodeGen/X86/vector-fshr-256.ll llvm/test/CodeGen/X86/vector-fshr-512.ll llvm/test/CodeGen/X86/vector-shuffle-avx512.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td b/llvm/lib/Target/X86/X86InstrAVX512.td index 0921a0e51668..a3ad0b1c8dd6 100644 --- a/llvm/lib/Target/X86/X86InstrAVX512.td +++ b/llvm/lib/Target/X86/X86InstrAVX512.td @@ -11365,6 +11365,36 @@ let Predicates = [HasVLX] in { (VPTERNLOGQZ128rmi VR128X:$src1, VR128X:$src2, addr:$src3, (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v16i8 (X86vpternlog VR128X:$src1, VR128X:$src2, + (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v16i8 (X86vpternlog (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + VR128X:$src2, VR128X:$src1, (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi 
VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v16i8 (X86vpternlog VR128X:$src1, + (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + VR128X:$src2, (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v16i8 (X86vpternlog VR128X:$src1, VR128X:$src2, + (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v16i8 (X86vpternlog (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + VR128X:$src2, VR128X:$src1, (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v16i8 (X86vpternlog VR128X:$src1, + (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + VR128X:$src2, (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v8i16 (X86vpternlog VR128X:$src1, VR128X:$src2, VR128X:$src3, (i8 timm:$src4))), (VPTERNLOGQZ128rri VR128X:$src1, VR128X:$src2, VR128X:$src3, @@ -11382,6 +11412,66 @@ let Predicates = [HasVLX] in { (VPTERNLOGQZ128rmi VR128X:$src1, VR128X:$src2, addr:$src3, (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v8i16 (X86vpternlog VR128X:$src1, VR128X:$src2, + (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v8i16 (X86vpternlog (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + VR128X:$src2, VR128X:$src1, (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v8i16 (X86vpternlog VR128X:$src1, + (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + VR128X:$src2, (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : 
Pat<(v8i16 (X86vpternlog VR128X:$src1, VR128X:$src2, + (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v8i16 (X86vpternlog (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + VR128X:$src2, VR128X:$src1, (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v8i16 (X86vpternlog VR128X:$src1, + (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + VR128X:$src2, (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v4i32 (X86vpternlog VR128X:$src1, VR128X:$src2, + (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v4i32 (X86vpternlog (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + VR128X:$src2, VR128X:$src1, (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v4i32 (X86vpternlog VR128X:$src1, + (bitconvert (v2i64 (X86VBroadcastld64 addr:$src3))), + VR128X:$src2, (i8 timm:$src4))), + (VPTERNLOGQZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v2i64 (X86vpternlog VR128X:$src1, VR128X:$src2, + (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v2i64 (X86vpternlog (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + VR128X:$src2, VR128X:$src1, (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v2i64 (X86vpternlog VR128X:$src1, + (bitconvert (v4i32 (X86VBroadcastld32 addr:$src3))), + VR128X:$src2, (i8 timm:$src4))), + (VPTERNLOGDZ128rmbi VR128X:$src1, VR128X:$src2, 
addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v32i8 (X86vpternlog VR256X:$src1, VR256X:$src2, VR256X:$src3, (i8 timm:$src4))), (VPTERNLOGQZ256rri VR256X:$src1, VR256X:$src2, VR256X:$src3, @@ -11399,6 +11489,36 @@ let Predicates = [HasVLX] in { (VPTERNLOGQZ256rmi VR256X:$src1, VR256X:$src2, addr:$src3, (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v32i8 (X86vpternlog VR256X:$src1, VR256X:$src2, + (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v32i8 (X86vpternlog (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + VR256X:$src2, VR256X:$src1, (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v32i8 (X86vpternlog VR256X:$src1, + (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + VR256X:$src2, (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v32i8 (X86vpternlog VR256X:$src1, VR256X:$src2, + (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v32i8 (X86vpternlog (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + VR256X:$src2, VR256X:$src1, (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v32i8 (X86vpternlog VR256X:$src1, + (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + VR256X:$src2, (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v16i16 (X86vpternlog VR256X:$src1, VR256X:$src2, VR256X:$src3, (i8 timm:$src4))), (VPTERNLOGQZ256rri VR256X:$src1, VR256X:$src2, VR256X:$src3, @@ -11415,6 +11535,66 @@ let Predicates = [HasVLX] in { VR256X:$src2, (i8 timm:$src4))), (VPTERNLOGQZ256rmi VR256X:$src1, 
VR256X:$src2, addr:$src3, (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v16i16 (X86vpternlog VR256X:$src1, VR256X:$src2, + (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v16i16 (X86vpternlog (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + VR256X:$src2, VR256X:$src1, (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v16i16 (X86vpternlog VR256X:$src1, + (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + VR256X:$src2, (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v16i16 (X86vpternlog VR256X:$src1, VR256X:$src2, + (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v16i16 (X86vpternlog (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + VR256X:$src2, VR256X:$src1, (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v16i16 (X86vpternlog VR256X:$src1, + (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + VR256X:$src2, (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v8i32 (X86vpternlog VR256X:$src1, VR256X:$src2, + (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v8i32 (X86vpternlog (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + VR256X:$src2, VR256X:$src1, (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v8i32 (X86vpternlog VR256X:$src1, + (bitconvert (v4i64 (X86VBroadcastld64 addr:$src3))), + 
VR256X:$src2, (i8 timm:$src4))), + (VPTERNLOGQZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v4i64 (X86vpternlog VR256X:$src1, VR256X:$src2, + (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v4i64 (X86vpternlog (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + VR256X:$src2, VR256X:$src1, (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v4i64 (X86vpternlog VR256X:$src1, + (bitconvert (v8i32 (X86VBroadcastld32 addr:$src3))), + VR256X:$src2, (i8 timm:$src4))), + (VPTERNLOGDZ256rmbi VR256X:$src1, VR256X:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; } let Predicates = [HasAVX512] in { @@ -11435,6 +11615,36 @@ let Predicates = [HasAVX512] in { (VPTERNLOGQZrmi VR512:$src1, VR512:$src2, addr:$src3, (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v64i8 (X86vpternlog VR512:$src1, VR512:$src2, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v64i8 (X86vpternlog (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, VR512:$src1, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v64i8 (X86vpternlog VR512:$src1, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v64i8 (X86vpternlog VR512:$src1, VR512:$src2, + (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v64i8 (X86vpternlog (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + VR512:$src2, VR512:$src1, (i8 
timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v64i8 (X86vpternlog VR512:$src1, + (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + VR512:$src2, (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + def : Pat<(v32i16 (X86vpternlog VR512:$src1, VR512:$src2, VR512:$src3, (i8 timm:$src4))), (VPTERNLOGQZrri VR512:$src1, VR512:$src2, VR512:$src3, @@ -11448,9 +11658,84 @@ let Predicates = [HasAVX512] in { (VPTERNLOGQZrmi VR512:$src1, VR512:$src2, addr:$src3, (VPTERNLOG321_imm8 timm:$src4))>; def : Pat<(v32i16 (X86vpternlog VR512:$src1, (loadv32i16 addr:$src3), - VR512:$src2, (i8 timm:$src4))), + VR512:$src2, (i8 timm:$src4))), (VPTERNLOGQZrmi VR512:$src1, VR512:$src2, addr:$src3, (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v32i16 (X86vpternlog VR512:$src1, VR512:$src2, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v32i16 (X86vpternlog (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, VR512:$src1, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v32i16 (X86vpternlog VR512:$src1, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v32i16 (X86vpternlog VR512:$src1, VR512:$src2, + (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v32i16 (X86vpternlog (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + VR512:$src2, VR512:$src1, (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v32i16 (X86vpternlog 
VR512:$src1, + (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + VR512:$src2, (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v32i16 (X86vpternlog VR512:$src1, VR512:$src2, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v32i16 (X86vpternlog (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, VR512:$src1, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v32i16 (X86vpternlog VR512:$src1, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v16i32 (X86vpternlog VR512:$src1, VR512:$src2, + (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v16i32 (X86vpternlog (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + VR512:$src2, VR512:$src1, (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v16i32 (X86vpternlog VR512:$src1, + (bitconvert (v8i64 (X86VBroadcastld64 addr:$src3))), + VR512:$src2, (i8 timm:$src4))), + (VPTERNLOGQZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; + + def : Pat<(v8i64 (X86vpternlog VR512:$src1, VR512:$src2, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + timm:$src4)>; + def : Pat<(v8i64 (X86vpternlog (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, VR512:$src1, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG321_imm8 timm:$src4))>; + def : Pat<(v8i64 
(X86vpternlog VR512:$src1, + (bitconvert (v16i32 (X86VBroadcastld32 addr:$src3))), + VR512:$src2, (i8 timm:$src4))), + (VPTERNLOGDZrmbi VR512:$src1, VR512:$src2, addr:$src3, + (VPTERNLOG132_imm8 timm:$src4))>; } // Patterns to implement vnot using vpternlog instead of creating all ones diff --git a/llvm/test/CodeGen/X86/vector-fshl-128.ll b/llvm/test/CodeGen/X86/vector-fshl-128.ll index b2ad1b33384e..d8442048f65e 100644 --- a/llvm/test/CodeGen/X86/vector-fshl-128.ll +++ b/llvm/test/CodeGen/X86/vector-fshl-128.ll @@ -2905,8 +2905,7 @@ define <16 x i8> @constant_funnnel_v16i8(<16 x i8> %x, <16 x i8> %y) nounwind { ; AVX512VL-NEXT: vpsllvd {{.*}}(%rip), %zmm2, %zmm2 ; AVX512VL-NEXT: vpord %zmm1, %zmm2, %zmm1 ; AVX512VL-NEXT: vpmovdb %zmm1, %xmm1 -; AVX512VL-NEXT: vpbroadcastq {{.*#+}} xmm2 = [18446744073709551360,18446744073709551360] -; AVX512VL-NEXT: vpternlogq $216, %xmm2, %xmm1, %xmm0 +; AVX512VL-NEXT: vpternlogq $216, {{.*}}(%rip){1to2}, %xmm1, %xmm0 ; AVX512VL-NEXT: vzeroupper ; AVX512VL-NEXT: retq ; diff --git a/llvm/test/CodeGen/X86/vector-fshl-256.ll b/llvm/test/CodeGen/X86/vector-fshl-256.ll index 674b064100c4..12feea765898 100644 --- a/llvm/test/CodeGen/X86/vector-fshl-256.ll +++ b/llvm/test/CodeGen/X86/vector-fshl-256.ll @@ -2376,8 +2376,7 @@ define <32 x i8> @constant_funnnel_v32i8(<32 x i8> %x, <32 x i8> %y) nounwind { ; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1 ; AVX512VL-NEXT: vpackuswb %ymm4, %ymm1, %ymm1 ; AVX512VL-NEXT: vpor %ymm1, %ymm2, %ymm1 -; AVX512VL-NEXT: vpbroadcastq {{.*#+}} ymm2 = [18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360] -; AVX512VL-NEXT: vpternlogq $216, %ymm2, %ymm1, %ymm0 +; AVX512VL-NEXT: vpternlogq $216, {{.*}}(%rip){1to4}, %ymm1, %ymm0 ; AVX512VL-NEXT: retq ; ; AVX512BW-LABEL: constant_funnnel_v32i8: diff --git a/llvm/test/CodeGen/X86/vector-fshl-512.ll b/llvm/test/CodeGen/X86/vector-fshl-512.ll index 09a29fdbaad4..6e0cb76398df 100644 --- a/llvm/test/CodeGen/X86/vector-fshl-512.ll +++ 
b/llvm/test/CodeGen/X86/vector-fshl-512.ll @@ -1184,8 +1184,7 @@ define <64 x i8> @constant_funnnel_v64i8(<64 x i8> %x, <64 x i8> %y) nounwind { ; AVX512F-NEXT: vpackuswb %ymm5, %ymm1, %ymm1 ; AVX512F-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1 ; AVX512F-NEXT: vporq %zmm1, %zmm2, %zmm1 -; AVX512F-NEXT: vpbroadcastq {{.*#+}} zmm2 = [18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360] -; AVX512F-NEXT: vpternlogq $216, %zmm2, %zmm1, %zmm0 +; AVX512F-NEXT: vpternlogq $216, {{.*}}(%rip){1to8}, %zmm1, %zmm0 ; AVX512F-NEXT: retq ; ; AVX512VL-LABEL: constant_funnnel_v64i8: @@ -1236,8 +1235,7 @@ define <64 x i8> @constant_funnnel_v64i8(<64 x i8> %x, <64 x i8> %y) nounwind { ; AVX512VL-NEXT: vpackuswb %ymm5, %ymm1, %ymm1 ; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1 ; AVX512VL-NEXT: vporq %zmm1, %zmm2, %zmm1 -; AVX512VL-NEXT: vpbroadcastq {{.*#+}} zmm2 = [18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360] -; AVX512VL-NEXT: vpternlogq $216, %zmm2, %zmm1, %zmm0 +; AVX512VL-NEXT: vpternlogq $216, {{.*}}(%rip){1to8}, %zmm1, %zmm0 ; AVX512VL-NEXT: retq ; ; AVX512BW-LABEL: constant_funnnel_v64i8: diff --git a/llvm/test/CodeGen/X86/vector-fshr-128.ll b/llvm/test/CodeGen/X86/vector-fshr-128.ll index 23fbc5e70707..b7cc39a32d71 100644 --- a/llvm/test/CodeGen/X86/vector-fshr-128.ll +++ b/llvm/test/CodeGen/X86/vector-fshr-128.ll @@ -2651,9 +2651,8 @@ define <16 x i8> @constant_funnnel_v16i8(<16 x i8> %x, <16 x i8> %y) nounwind { ; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = 
xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero ; AVX512VL-NEXT: vpsllvd {{.*}}(%rip), %zmm0, %zmm0 ; AVX512VL-NEXT: vpord %zmm2, %zmm0, %zmm0 -; AVX512VL-NEXT: vpmovdb %zmm0, %xmm2 -; AVX512VL-NEXT: vpbroadcastq {{.*#+}} xmm0 = [18446744073709551360,18446744073709551360] -; AVX512VL-NEXT: vpternlogq $202, %xmm1, %xmm2, %xmm0 +; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0 +; AVX512VL-NEXT: vpternlogq $228, {{.*}}(%rip){1to2}, %xmm1, %xmm0 ; AVX512VL-NEXT: vzeroupper ; AVX512VL-NEXT: retq ; diff --git a/llvm/test/CodeGen/X86/vector-fshr-256.ll b/llvm/test/CodeGen/X86/vector-fshr-256.ll index bd5698bc63be..bbeaed5cc725 100644 --- a/llvm/test/CodeGen/X86/vector-fshr-256.ll +++ b/llvm/test/CodeGen/X86/vector-fshr-256.ll @@ -2083,9 +2083,8 @@ define <32 x i8> @constant_funnnel_v32i8(<32 x i8> %x, <32 x i8> %y) nounwind { ; AVX512VL-NEXT: vpmullw {{.*}}(%rip), %ymm2, %ymm2 ; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2 ; AVX512VL-NEXT: vpackuswb %ymm3, %ymm2, %ymm2 -; AVX512VL-NEXT: vpor %ymm2, %ymm0, %ymm2 -; AVX512VL-NEXT: vpbroadcastq {{.*#+}} ymm0 = [18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360] -; AVX512VL-NEXT: vpternlogq $202, %ymm1, %ymm2, %ymm0 +; AVX512VL-NEXT: vpor %ymm2, %ymm0, %ymm0 +; AVX512VL-NEXT: vpternlogq $228, {{.*}}(%rip){1to4}, %ymm1, %ymm0 ; AVX512VL-NEXT: retq ; ; AVX512BW-LABEL: constant_funnnel_v32i8: diff --git a/llvm/test/CodeGen/X86/vector-fshr-512.ll b/llvm/test/CodeGen/X86/vector-fshr-512.ll index 3337ebe22fed..c89782bc359c 100644 --- a/llvm/test/CodeGen/X86/vector-fshr-512.ll +++ b/llvm/test/CodeGen/X86/vector-fshr-512.ll @@ -1171,9 +1171,8 @@ define <64 x i8> @constant_funnnel_v64i8(<64 x i8> 
%x, <64 x i8> %y) nounwind { ; AVX512F-NEXT: vpsrlw $8, %ymm3, %ymm3 ; AVX512F-NEXT: vpackuswb %ymm4, %ymm3, %ymm3 ; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm2 -; AVX512F-NEXT: vporq %zmm2, %zmm0, %zmm2 -; AVX512F-NEXT: vpbroadcastq {{.*#+}} zmm0 = [18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360] -; AVX512F-NEXT: vpternlogq $202, %zmm1, %zmm2, %zmm0 +; AVX512F-NEXT: vporq %zmm2, %zmm0, %zmm0 +; AVX512F-NEXT: vpternlogq $228, {{.*}}(%rip){1to8}, %zmm1, %zmm0 ; AVX512F-NEXT: retq ; ; AVX512VL-LABEL: constant_funnnel_v64i8: @@ -1223,9 +1222,8 @@ define <64 x i8> @constant_funnnel_v64i8(<64 x i8> %x, <64 x i8> %y) nounwind { ; AVX512VL-NEXT: vpsrlw $8, %ymm3, %ymm3 ; AVX512VL-NEXT: vpackuswb %ymm4, %ymm3, %ymm3 ; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm2 -; AVX512VL-NEXT: vporq %zmm2, %zmm0, %zmm2 -; AVX512VL-NEXT: vpbroadcastq {{.*#+}} zmm0 = [18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360,18446744073709551360] -; AVX512VL-NEXT: vpternlogq $202, %zmm1, %zmm2, %zmm0 +; AVX512VL-NEXT: vporq %zmm2, %zmm0, %zmm0 +; AVX512VL-NEXT: vpternlogq $228, {{.*}}(%rip){1to8}, %zmm1, %zmm0 ; AVX512VL-NEXT: retq ; ; AVX512BW-LABEL: constant_funnnel_v64i8: diff --git a/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll b/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll index 1ab6f2cc45fc..cb2dd3ef7e86 100644 --- a/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll +++ b/llvm/test/CodeGen/X86/vector-shuffle-avx512.ll @@ -337,11 +337,15 @@ define <32 x i16> @test_mm512_mask_blend_epi16(<32 x i16> %A, <32 x i16> %W){ ; SKX-NEXT: vpblendmw %zmm0, %zmm1, %zmm0 {%k1} ; SKX-NEXT: ret{{[l|q]}} ; -; KNL-LABEL: test_mm512_mask_blend_epi16: -; KNL: # %bb.0: # %entry -; KNL-NEXT: vpbroadcastd {{.*#+}} zmm2 = 
[65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535] -; KNL-NEXT: vpternlogq $216, %zmm2, %zmm1, %zmm0 -; KNL-NEXT: ret{{[l|q]}} +; KNL64-LABEL: test_mm512_mask_blend_epi16: +; KNL64: # %bb.0: # %entry +; KNL64-NEXT: vpternlogd $216, {{.*}}(%rip){1to16}, %zmm1, %zmm0 +; KNL64-NEXT: retq +; +; KNL32-LABEL: test_mm512_mask_blend_epi16: +; KNL32: # %bb.0: # %entry +; KNL32-NEXT: vpternlogd $216, {{\.LCPI.*}}{1to16}, %zmm1, %zmm0 +; KNL32-NEXT: retl entry: %0 = shufflevector <32 x i16> %A, <32 x i16> %W, <32 x i32> ret <32 x i16> %0 From llvm-commits at lists.llvm.org Fri Jul 10 15:16:39 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via llvm-commits) Date: Fri, 10 Jul 2020 15:16:39 -0700 (PDT) Subject: [llvm] 31f4e43 - AMDGPU: Remove .value_type from kernel metadata Message-ID: <5f08e8c7.1c69fb81.e3e56.ffa0@mx.google.com> Author: Matt Arsenault Date: 2020-07-10T18:16:31-04:00 New Revision: 31f4e43f3f391e5c5034580f972e0acc78f99b63 URL: https://github.com/llvm/llvm-project/commit/31f4e43f3f391e5c5034580f972e0acc78f99b63 DIFF: https://github.com/llvm/llvm-project/commit/31f4e43f3f391e5c5034580f972e0acc78f99b63.diff LOG: AMDGPU: Remove .value_type from kernel metadata This doesn't appear used for anything, and is emitted incorrectly based on the description. This also depends on the IR type, and pointee element type. 
Added: Modified: llvm/docs/AMDGPUUsage.rst llvm/include/llvm/Support/AMDGPUMetadata.h llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp llvm/lib/Support/AMDGPUMetadata.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s Removed: ################################################################################ diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index 36930fc253ec..af764c085600 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -2318,29 +2318,10 @@ non-AMD key names should be prefixed by "*vendor-name*.". multi-grid synchronization is passed in the kernarg. - "ValueType" string Required Kernel argument value type. Only - present if "ValueKind" is - "ByValue". For vector data - types, the value is for the - element type. Values include: - - - "Struct" - - "I8" - - "U8" - - "I16" - - "U16" - - "F16" - - "I32" - - "U32" - - "F32" - - "I64" - - "U64" - - "F64" + "ValueType" string Unused and deprecated. This should no longer + be emitted, but is accepted for compatibility. + - .. TODO:: - How can it be determined if a - vector type, and what size - vector? 
"PointeeAlign" integer Alignment in bytes of pointee type for pointer type kernel argument. Must be a power @@ -2817,29 +2798,9 @@ same *vendor-name*. multi-grid synchronization is passed in the kernarg. - ".value_type" string Required Kernel argument value type. Only - present if ".value_kind" is - "by_value". For vector data - types, the value is for the - element type. Values include: - - - "struct" - - "i8" - - "u8" - - "i16" - - "u16" - - "f16" - - "i32" - - "u32" - - "f32" - - "i64" - - "u64" - - "f64" + ".value_type" string Unused and deprecated. This should no longer + be emitted, but is accepted for compatibility. - .. TODO:: - How can it be determined if a - vector type, and what size - vector? ".pointee_align" integer Alignment in bytes of pointee type for pointer type kernel argument. Must be a power diff --git a/llvm/include/llvm/Support/AMDGPUMetadata.h b/llvm/include/llvm/Support/AMDGPUMetadata.h index eeef4e699c3e..920c97f7e112 100644 --- a/llvm/include/llvm/Support/AMDGPUMetadata.h +++ b/llvm/include/llvm/Support/AMDGPUMetadata.h @@ -79,7 +79,8 @@ enum class ValueKind : uint8_t { Unknown = 0xff }; -/// Value types. +/// Value types. This is deprecated and only remains for compatibility parsing +/// of old metadata. enum class ValueType : uint8_t { Struct = 0, I8 = 1, @@ -164,7 +165,7 @@ constexpr char Offset[] = "Offset"; constexpr char Align[] = "Align"; /// Key for Kernel::Arg::Metadata::mValueKind. constexpr char ValueKind[] = "ValueKind"; -/// Key for Kernel::Arg::Metadata::mValueType. +/// Key for Kernel::Arg::Metadata::mValueType. (deprecated) constexpr char ValueType[] = "ValueType"; /// Key for Kernel::Arg::Metadata::mPointeeAlign. constexpr char PointeeAlign[] = "PointeeAlign"; @@ -198,8 +199,6 @@ struct Metadata final { uint32_t mAlign = 0; /// Value kind. Required. ValueKind mValueKind = ValueKind::Unknown; - /// Value type. Required. - ValueType mValueType = ValueType::Unknown; /// Pointee alignment in bytes. Optional. 
uint32_t mPointeeAlign = 0; /// Address space qualifier. Optional. diff --git a/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp b/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp index e8b9e12ce4c8..cd1d872cc219 100644 --- a/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp +++ b/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp @@ -127,25 +127,6 @@ bool MetadataVerifier::verifyKernelArgs(msgpack::DocNode &Node) { .Default(false); })) return false; - if (!verifyScalarEntry(ArgsMap, ".value_type", true, - msgpack::Type::String, - [](msgpack::DocNode &SNode) { - return StringSwitch<bool>(SNode.getString()) - .Case("struct", true) - .Case("i8", true) - .Case("u8", true) - .Case("i16", true) - .Case("u16", true) - .Case("f16", true) - .Case("i32", true) - .Case("u32", true) - .Case("f32", true) - .Case("i64", true) - .Case("u64", true) - .Case("f64", true) - .Default(false); - })) - return false; if (!verifyIntegerEntry(ArgsMap, ".pointee_align", false)) return false; if (!verifyScalarEntry(ArgsMap, ".address_space", false, diff --git a/llvm/lib/Support/AMDGPUMetadata.cpp b/llvm/lib/Support/AMDGPUMetadata.cpp index 4ea197a97389..bfa1fe86cd3e 100644 --- a/llvm/lib/Support/AMDGPUMetadata.cpp +++ b/llvm/lib/Support/AMDGPUMetadata.cpp @@ -111,7 +111,11 @@ struct MappingTraits<Kernel::Arg::Metadata> { YIO.mapRequired(Kernel::Arg::Key::Size, MD.mSize); YIO.mapRequired(Kernel::Arg::Key::Align, MD.mAlign); YIO.mapRequired(Kernel::Arg::Key::ValueKind, MD.mValueKind); - YIO.mapRequired(Kernel::Arg::Key::ValueType, MD.mValueType); + + // Removed. Accepted for parsing compatibility, but not emitted.
+ Optional<ValueType> Unused; + YIO.mapOptional(Kernel::Arg::Key::ValueType, Unused); + YIO.mapOptional(Kernel::Arg::Key::PointeeAlign, MD.mPointeeAlign, uint32_t(0)); YIO.mapOptional(Kernel::Arg::Key::AddrSpaceQual, MD.mAddrSpaceQual, diff --git a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp index 5dc387127964..c6f6a3b84e36 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp @@ -127,38 +127,6 @@ ValueKind MetadataStreamerV2::getValueKind(Type *Ty, StringRef TypeQual, ValueKind::ByValue); } -ValueType MetadataStreamerV2::getValueType(Type *Ty, StringRef TypeName) const { - switch (Ty->getTypeID()) { - case Type::IntegerTyID: { - auto Signed = !TypeName.startswith("u"); - switch (Ty->getIntegerBitWidth()) { - case 8: - return Signed ? ValueType::I8 : ValueType::U8; - case 16: - return Signed ? ValueType::I16 : ValueType::U16; - case 32: - return Signed ? ValueType::I32 : ValueType::U32; - case 64: - return Signed ? ValueType::I64 : ValueType::U64; - default: - return ValueType::Struct; - } - } - case Type::HalfTyID: - return ValueType::F16; - case Type::FloatTyID: - return ValueType::F32; - case Type::DoubleTyID: - return ValueType::F64; - case Type::PointerTyID: - return getValueType(Ty->getPointerElementType(), TypeName); - case Type::FixedVectorTyID: - return getValueType(cast(Ty)->getElementType(), TypeName); - default: - return ValueType::Struct; - } -} - std::string MetadataStreamerV2::getTypeName(Type *Ty, bool Signed) const { switch (Ty->getTypeID()) { case Type::IntegerTyID: { @@ -372,7 +340,6 @@ void MetadataStreamerV2::emitKernelArg(const DataLayout &DL, Type *Ty, Arg.mSize = DL.getTypeAllocSize(Ty); Arg.mAlign = DL.getABITypeAlign(Ty).value(); Arg.mValueKind = ValueKind; - Arg.mValueType = getValueType(Ty, BaseTypeName); Arg.mPointeeAlign = PointeeAlign ?
PointeeAlign->value() : 0; if (auto PtrTy = dyn_cast<PointerType>(Ty)) @@ -573,38 +540,6 @@ StringRef MetadataStreamerV3::getValueKind(Type *Ty, StringRef TypeQual, : "by_value"); } -StringRef MetadataStreamerV3::getValueType(Type *Ty, StringRef TypeName) const { - switch (Ty->getTypeID()) { - case Type::IntegerTyID: { - auto Signed = !TypeName.startswith("u"); - switch (Ty->getIntegerBitWidth()) { - case 8: - return Signed ? "i8" : "u8"; - case 16: - return Signed ? "i16" : "u16"; - case 32: - return Signed ? "i32" : "u32"; - case 64: - return Signed ? "i64" : "u64"; - default: - return "struct"; - } - } - case Type::HalfTyID: - return "f16"; - case Type::FloatTyID: - return "f32"; - case Type::DoubleTyID: - return "f64"; - case Type::PointerTyID: - return getValueType(Ty->getPointerElementType(), TypeName); - case Type::FixedVectorTyID: - return getValueType(cast(Ty)->getElementType(), TypeName); - default: - return "struct"; - } -} - std::string MetadataStreamerV3::getTypeName(Type *Ty, bool Signed) const { switch (Ty->getTypeID()) { case Type::IntegerTyID: { @@ -801,8 +736,6 @@ void MetadataStreamerV3::emitKernelArg(const DataLayout &DL, Type *Ty, Arg[".offset"] = Arg.getDocument()->getNode(Offset); Offset += Size; Arg[".value_kind"] = Arg.getDocument()->getNode(ValueKind, /*Copy=*/true); - Arg[".value_type"] = - Arg.getDocument()->getNode(getValueType(Ty, BaseTypeName), /*Copy=*/true); if (PointeeAlign) Arg[".pointee_align"] = Arg.getDocument()->getNode(PointeeAlign->value()); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h index 3cbc8bb5e2b9..9534fffd228d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h @@ -67,8 +67,6 @@ class MetadataStreamerV3 final : public MetadataStreamer { StringRef getValueKind(Type *Ty, StringRef TypeQual, StringRef BaseTypeName) const; - StringRef getValueType(Type *Ty, StringRef TypeName) const; -
std::string getTypeName(Type *Ty, bool Signed) const; msgpack::ArrayDocNode getWorkGroupDimensions(MDNode *Node) const; @@ -135,8 +133,6 @@ class MetadataStreamerV2 final : public MetadataStreamer { ValueKind getValueKind(Type *Ty, StringRef TypeQual, StringRef BaseTypeName) const; - ValueType getValueType(Type *Ty, StringRef TypeName) const; - std::string getTypeName(Type *Ty, bool Signed) const; std::vector<uint32_t> getWorkGroupDimensions(MDNode *Node) const; diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll index 5ed07dc6e3cc..4afa9f35499f 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll @@ -1,6 +1,6 @@ ; RUN: llc -mattr=+code-object-v3 -mtriple=amdgcn-amd-amdhsa -filetype=obj -o - < %s | llvm-readelf --notes | FileCheck %s -; CHECK: - .args: +; CHECK: - .args: ; CHECK-NEXT: - .access: read_only ; CHECK-NEXT: .address_space: global ; CHECK-NEXT: .is_const: true @@ -10,14 +10,12 @@ ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'float*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f32 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: out ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'float*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f32 ; CHECK: .name: test_ro_arg ; CHECK: .symbol: test_ro_arg.kd diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll index 053c21b8eb79..6e22f85f453f 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll @@ -8,7 +8,6 @@ ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F32 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: ReadOnly ; CHECK-NEXT: IsConst: true @@ -18,7 +17,6 @@ ;
CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F32 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll index 850c6f1a6f63..5f77fef31bfa 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel-v3.ll @@ -9,19 +9,15 @@ ; CHECK-NEXT: .size: 1 ; CHECK-NEXT: .type_name: char ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NOT: .value_kind: hidden_default_queue ; CHECK-NOT: .value_kind: hidden_completion_action ; CHECK: .language: OpenCL C @@ -42,34 +38,27 @@ define amdgpu_kernel void @test_non_enqueue_kernel_caller(i8 %a) #0 ; CHECK-NEXT: .size: 1 ; CHECK-NEXT: .type_name: char ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; 
CHECK-NEXT: .value_kind: hidden_default_queue -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_completion_action -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll index 9879a741ad43..086c810c9ded 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-enqueue-kernel.ll @@ -16,20 +16,16 @@ ; CHECK-NEXT: Size: 1 ; CHECK-NEXT: Align: 1 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NOT: ValueKind: HiddenDefaultQueue ; CHECK-NOT: ValueKind: HiddenCompletionAction define amdgpu_kernel void @test_non_enqueue_kernel_caller(i8 %a) #0 @@ -48,34 +44,27 @@ define amdgpu_kernel void @test_non_enqueue_kernel_caller(i8 %a) #0 ; CHECK-NEXT: Size: 1 ; CHECK-NEXT: Align: 1 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: 
ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenDefaultQueue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenCompletionAction -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_enqueue_kernel_caller(i8 %a) #1 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll index 74e80ec43208..ef441b9c3cc2 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll @@ -24,19 +24,15 @@ ; CHECK-NEXT: .size: 1 ; CHECK-NEXT: .type_name: char ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 @@ -44,9 +40,7 @@ ; CHECK-NOT: .value_kind: hidden_completion_action ; CHECK-NOT: .value_kind: hidden_hostcall_buffer ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -65,39 +59,31 @@ define amdgpu_kernel void @test_char(i8 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: ushort2 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: u16 ; 
CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -116,39 +102,31 @@ define amdgpu_kernel void @test_ushort2(<2 x i16> %a) #0 ; CHECK-NEXT: .size: 16 ; CHECK-NEXT: .type_name: int3 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; 
CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -167,39 +145,31 @@ define amdgpu_kernel void @test_int3(<3 x i32> %a) #0 ; CHECK-NEXT: .size: 32 ; CHECK-NEXT: .type_name: ulong4 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: u64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 80 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -218,39 +188,31 @@ define amdgpu_kernel void @test_ulong4(<4 x i64> %a) #0 ; CHECK-NEXT: .size: 16 ; CHECK-NEXT: .type_name: half8 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: 
.value_type: f16 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -269,39 +231,31 @@ define amdgpu_kernel void @test_half8(<8 x half> %a) #0 ; CHECK-NEXT: .size: 64 ; CHECK-NEXT: .type_name: float16 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: f32 ; CHECK-NEXT: - .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 80 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 88 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 96 ; CHECK-NEXT: 
.size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 104 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 112 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -320,39 +274,31 @@ define amdgpu_kernel void @test_float16(<16 x float> %a) #0 ; CHECK-NEXT: .size: 128 ; CHECK-NEXT: .type_name: double16 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: f64 ; CHECK-NEXT: - .offset: 128 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 136 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 144 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 152 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 160 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 168 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 176 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -372,39 +318,31 @@ define amdgpu_kernel void @test_double16(<16 x double> %a) #0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'int addrspace(5)*' ; 
CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -424,39 +362,31 @@ define amdgpu_kernel void @test_pointer(i32 addrspace(1)* %a) #0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: image2d_t ; CHECK-NEXT: .value_kind: image -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - 
.address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -475,39 +405,31 @@ define amdgpu_kernel void @test_image(%opencl.image2d_t addrspace(1)* %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: sampler_t ; CHECK-NEXT: .value_kind: sampler -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -527,39 +449,31 @@ define amdgpu_kernel void @test_sampler(i32 %a) #0 ; CHECK-NEXT: 
.size: 8 ; CHECK-NEXT: .type_name: queue_t ; CHECK-NEXT: .value_kind: queue -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -578,39 +492,31 @@ define amdgpu_kernel void @test_queue(%opencl.queue_t addrspace(1)* %a) #0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: struct A ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: 
hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -629,39 +535,31 @@ define amdgpu_kernel void @test_struct(%struct.A %a) #0 ; CHECK-NEXT: .size: 32 ; CHECK-NEXT: .type_name: struct A ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 80 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -680,39 +578,31 @@ define 
amdgpu_kernel void @test_array([32 x i8] %a) #0 ; CHECK-NEXT: .size: 16 ; CHECK-NEXT: .type_name: i128 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -731,51 +621,41 @@ define amdgpu_kernel void @test_i128(i128 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .name: b ; CHECK-NEXT: .offset: 4 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: short2 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i16 ; CHECK-NEXT: - .name: c ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: char3 ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - 
.offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -795,14 +675,12 @@ define amdgpu_kernel void @test_multi_arg(i32 %a, <2 x i16> %b, <3 x i8> %c) #0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'int addrspace(5)*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .address_space: constant ; CHECK-NEXT: .name: c ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'int addrspace(5)*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: l ; CHECK-NEXT: .offset: 16 @@ -810,39 +688,31 @@ define amdgpu_kernel void @test_multi_arg(i32 %a, <2 x i16> %b, <3 x i8> %c) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'int addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: 
.value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -865,7 +735,6 @@ define amdgpu_kernel void @test_addr_space(i32 addrspace(1)* %g, ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'int addrspace(5)*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .is_const: true ; CHECK-NEXT: .is_restrict: true @@ -874,7 +743,6 @@ define amdgpu_kernel void @test_addr_space(i32 addrspace(1)* %g, ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'int addrspace(5)*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .is_pipe: true ; CHECK-NEXT: .name: c @@ -882,39 +750,31 @@ define amdgpu_kernel void @test_addr_space(i32 addrspace(1)* %g, ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'int addrspace(5)*' ; CHECK-NEXT: .value_kind: pipe -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; 
CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -937,7 +797,6 @@ define amdgpu_kernel void @test_type_qual(i32 addrspace(1)* %a, ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: image1d_t ; CHECK-NEXT: .value_kind: image -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .access: write_only ; CHECK-NEXT: .address_space: global ; CHECK-NEXT: .name: wo @@ -945,7 +804,6 @@ define amdgpu_kernel void @test_type_qual(i32 addrspace(1)* %a, ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: image2d_t ; CHECK-NEXT: .value_kind: image -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .access: read_write ; CHECK-NEXT: .address_space: global ; CHECK-NEXT: .name: rw @@ -953,39 +811,31 @@ define amdgpu_kernel void @test_type_qual(i32 addrspace(1)* %a, ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: image3d_t ; CHECK-NEXT: .value_kind: image -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: 
hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1006,39 +856,31 @@ define amdgpu_kernel void @test_access_qual(%opencl.image1d_t addrspace(1)* %ro, ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none 
-; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1058,39 +900,31 @@ define amdgpu_kernel void @test_vec_type_hint_half(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1110,39 +944,31 @@ define amdgpu_kernel void @test_vec_type_hint_float(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; 
CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1162,39 +988,31 @@ define amdgpu_kernel void @test_vec_type_hint_double(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; 
CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1214,39 +1032,31 @@ define amdgpu_kernel void @test_vec_type_hint_char(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1266,39 +1076,31 @@ define amdgpu_kernel void @test_vec_type_hint_short(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 
; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1318,39 +1120,31 @@ define amdgpu_kernel void @test_vec_type_hint_long(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; 
CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1370,39 +1164,31 @@ define amdgpu_kernel void @test_vec_type_hint_unknown(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1427,39 +1213,31 @@ define amdgpu_kernel void @test_reqd_wgs_vec_type_hint(i32 %a) #0 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: int ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; 
CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1485,39 +1263,31 @@ define amdgpu_kernel void @test_wgs_hint_vec_type_hint(i32 %a) #0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'int addrspace(5)* addrspace(5)*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 
; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1536,39 +1306,31 @@ define amdgpu_kernel void @test_arg_ptr_to_ptr(i32 addrspace(5)* addrspace(1)* % ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: struct B ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1587,39 +1349,31 @@ define amdgpu_kernel void @test_arg_struct_contains_ptr(%struct.B %a) #0 ; CHECK-NEXT: .size: 16 ; CHECK-NEXT: .type_name: 'global int addrspace(5)* __attribute__((ext_vector_type(2)))' ; 
CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i32 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1639,39 +1393,31 @@ define amdgpu_kernel void @test_arg_vector_of_ptr(<2 x i32 addrspace(1)*> %a) #0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: clk_event_t ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: 
.value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1692,7 +1438,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'long addrspace(5)*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 8 @@ -1700,7 +1445,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: c ; CHECK-NEXT: .offset: 12 @@ -1708,7 +1452,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char2 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: d ; CHECK-NEXT: .offset: 16 @@ -1716,7 +1459,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char3 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: e ; CHECK-NEXT: .offset: 20 @@ -1724,7 +1466,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char4 addrspace(5)*' 
; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: f ; CHECK-NEXT: .offset: 24 @@ -1732,7 +1473,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char8 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: g ; CHECK-NEXT: .offset: 28 @@ -1740,46 +1480,37 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char16 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: h ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .pointee_align: 1 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 80 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 88 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: 
i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1806,7 +1537,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .type_name: 'long addrspace(5)*' ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 8 @@ -1814,7 +1544,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: c ; CHECK-NEXT: .offset: 12 @@ -1822,7 +1551,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char2 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: d ; CHECK-NEXT: .offset: 16 @@ -1830,7 +1558,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char3 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: e ; CHECK-NEXT: .offset: 20 @@ -1838,7 +1565,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char4 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: f ; CHECK-NEXT: .offset: 24 @@ -1846,7 +1572,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char8 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; 
CHECK-NEXT: .name: g ; CHECK-NEXT: .offset: 28 @@ -1854,46 +1579,37 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .type_name: 'char16 addrspace(5)*' ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: local ; CHECK-NEXT: .name: h ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .pointee_align: 16 ; CHECK-NEXT: .size: 4 ; CHECK-NEXT: .value_kind: dynamic_shared_pointer -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 80 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 88 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 @@ -1918,39 +1634,31 @@ define amdgpu_kernel void @test_pointee_align_attribute(i64 addrspace(1)* align ; CHECK-NEXT: .size: 25 ; CHECK-NEXT: .type_name: __block_literal ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: struct ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; 
CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 80 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .device_enqueue_symbol: __test_block_invoke_kernel_runtime_handle ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: @@ -1971,39 +1679,31 @@ define amdgpu_kernel void @__test_block_invoke_kernel( ; CHECK-NEXT: .size: 1 ; CHECK-NEXT: .type_name: char ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_printf_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_default_queue -; 
CHECK-NEXT: .value_type: i8
 ; CHECK-NEXT: - .address_space: global
 ; CHECK-NEXT: .offset: 48
 ; CHECK-NEXT: .size: 8
 ; CHECK-NEXT: .value_kind: hidden_completion_action
-; CHECK-NEXT: .value_type: i8
 ; CHECK-NEXT: - .address_space: global
 ; CHECK-NEXT: .offset: 56
 ; CHECK-NEXT: .size: 8
 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg
-; CHECK-NEXT: .value_type: i8
 ; CHECK: .language: OpenCL C
 ; CHECK-NEXT: .language_version:
 ; CHECK-NEXT: - 2
@@ -2021,7 +1721,6 @@ define amdgpu_kernel void @test_enqueue_kernel_caller(i8 %a) #2
 ; CHECK-NEXT: .offset: 0
 ; CHECK-NEXT: .size: 8
 ; CHECK-NEXT: .value_kind: global_buffer
-; CHECK-NEXT: .value_type: i32
 ; CHECK: .name: unknown_addrspace_kernarg
 ; CHECK: .symbol: unknown_addrspace_kernarg.kd
 define amdgpu_kernel void @unknown_addrspace_kernarg(i32 addrspace(12345)* %ptr) #0 {
diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
index d5a5fb630d71..8969c7b45bbb 100644
--- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
+++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
@@ -33,30 +33,24 @@
 ; CHECK-NEXT: Size: 1
 ; CHECK-NEXT: Align: 1
 ; CHECK-NEXT: ValueKind: ByValue
-; CHECK-NEXT: ValueType: I8
 ; CHECK-NEXT: AccQual: Default
 ; CHECK-NEXT: - Size: 8
 ; CHECK-NEXT: Align: 8
 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX
-; CHECK-NEXT: ValueType: I64
 ; CHECK-NEXT: - Size: 8
 ; CHECK-NEXT: Align: 8
 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY
-; CHECK-NEXT: ValueType: I64
 ; CHECK-NEXT: - Size: 8
 ; CHECK-NEXT: Align: 8
 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ
-; CHECK-NEXT: ValueType: I64
 ; CHECK-NEXT: - Size: 8
 ; CHECK-NEXT: Align: 8
 ; CHECK-NOT: ValueKind: HiddenHostcallBuffer
 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer
-; CHECK-NEXT: ValueType: I8
 ; CHECK-NEXT: AddrSpaceQual: Global
 ; CHECK-NOT: ValueKind: HiddenDefaultQueue
 ; CHECK-NOT: ValueKind: HiddenCompletionAction
 ; CHECK: ValueKind: HiddenMultiGridSyncArg
-;
CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_char(i8 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !9 @@ -74,39 +68,31 @@ define amdgpu_kernel void @test_char(i8 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: U16 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_ushort2(<2 x i16> %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !10 @@ -124,39 +110,31 @@ define amdgpu_kernel void @test_ushort2(<2 x i16> %a) #0 ; CHECK-NEXT: Size: 16 ; CHECK-NEXT: Align: 16 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: 
Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_int3(<3 x i32> %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !11 @@ -174,39 +152,31 @@ define amdgpu_kernel void @test_int3(<3 x i32> %a) #0 ; CHECK-NEXT: Size: 32 ; CHECK-NEXT: Align: 32 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: U64 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: 
AddrSpaceQual: Global define amdgpu_kernel void @test_ulong4(<4 x i64> %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !12 @@ -224,39 +194,31 @@ define amdgpu_kernel void @test_ulong4(<4 x i64> %a) #0 ; CHECK-NEXT: Size: 16 ; CHECK-NEXT: Align: 16 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_half8(<8 x half> %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !13 @@ -274,39 +236,31 @@ define amdgpu_kernel void @test_half8(<8 x half> %a) #0 ; CHECK-NEXT: Size: 64 ; CHECK-NEXT: Align: 64 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: F32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; 
CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_float16(<16 x float> %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !14 @@ -324,39 +278,31 @@ define amdgpu_kernel void @test_float16(<16 x float> %a) #0 ; CHECK-NEXT: Size: 128 ; CHECK-NEXT: Align: 128 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: F64 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: 
AddrSpaceQual: Global define amdgpu_kernel void @test_double16(<16 x double> %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !15 @@ -374,40 +320,32 @@ define amdgpu_kernel void @test_double16(<16 x double> %a) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_pointer(i32 addrspace(1)* %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !16 @@ -425,40 +363,32 @@ define amdgpu_kernel void @test_pointer(i32 addrspace(1)* %a) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: Image -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: 
HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_image(%opencl.image2d_t addrspace(1)* %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !17 @@ -476,39 +406,31 @@ define amdgpu_kernel void @test_image(%opencl.image2d_t addrspace(1)* %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: Sampler -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; 
CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_sampler(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !18 @@ -526,40 +448,32 @@ define amdgpu_kernel void @test_sampler(i32 %a) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: Queue -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_queue(%opencl.queue_t addrspace(1)* %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !19 @@ -577,39 +491,31 @@ define amdgpu_kernel void @test_queue(%opencl.queue_t addrspace(1)* %a) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; 
CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_struct(%struct.A %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !20 @@ -627,39 +533,31 @@ define amdgpu_kernel void @test_struct(%struct.A %a) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 1 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; 
CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_array([8 x i8] %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !20 @@ -677,39 +575,31 @@ define amdgpu_kernel void @test_array([8 x i8] %a) #0 ; CHECK-NEXT: Size: 16 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_i128(i128 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !21 @@ -727,53 +617,43 @@ define amdgpu_kernel void @test_i128(i128 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Name: b ; CHECK-NEXT: TypeName: short2 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: 
ValueKind: ByValue -; CHECK-NEXT: ValueType: I16 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Name: c ; CHECK-NEXT: TypeName: char3 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_multi_arg(i32 %a, <2 x i16> %b, <3 x i8> %c) #0 !kernel_arg_addr_space !22 !kernel_arg_access_qual !23 !kernel_arg_type !24 @@ -791,7 +671,6 @@ define amdgpu_kernel void @test_multi_arg(i32 %a, <2 x i16> %b, <3 x i8> %c) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Name: c @@ -799,7 +678,6 @@ define amdgpu_kernel void @test_multi_arg(i32 %a, <2 x i16> %b, <3 x i8> %c) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AddrSpaceQual: Constant ; CHECK-NEXT: AccQual: Default ; 
CHECK-NEXT: - Name: l @@ -807,41 +685,33 @@ define amdgpu_kernel void @test_multi_arg(i32 %a, <2 x i16> %b, <3 x i8> %c) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: PointeeAlign: 4 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_addr_space(i32 addrspace(1)* %g, i32 addrspace(4)* %c, @@ -861,7 +731,6 @@ define amdgpu_kernel void @test_addr_space(i32 addrspace(1)* %g, ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: IsVolatile: true @@ -870,7 +739,6 @@ define amdgpu_kernel void @test_addr_space(i32 addrspace(1)* %g, ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: IsConst: true @@ 
-880,41 +748,33 @@ define amdgpu_kernel void @test_addr_space(i32 addrspace(1)* %g, ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: Pipe -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: IsPipe: true ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_type_qual(i32 addrspace(1)* %a, i32 addrspace(1)* %b, @@ -934,7 +794,6 @@ define amdgpu_kernel void @test_type_qual(i32 addrspace(1)* %a, ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: Image -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: ReadOnly ; CHECK-NEXT: - Name: wo @@ -942,7 +801,6 @@ define amdgpu_kernel void @test_type_qual(i32 addrspace(1)* %a, ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: Image -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: WriteOnly ; CHECK-NEXT: - Name: rw @@ -950,40 +808,32 @@ define amdgpu_kernel void @test_type_qual(i32 
addrspace(1)* %a, ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: Image -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: ReadWrite ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_access_qual(%opencl.image1d_t addrspace(1)* %ro, %opencl.image2d_t addrspace(1)* %wo, @@ -1005,39 +855,31 @@ define amdgpu_kernel void @test_access_qual(%opencl.image1d_t addrspace(1)* %ro, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; 
CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_vec_type_hint_half(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1057,39 +899,31 @@ define amdgpu_kernel void @test_vec_type_hint_half(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_vec_type_hint_float(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ 
-1109,39 +943,31 @@ define amdgpu_kernel void @test_vec_type_hint_float(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_vec_type_hint_double(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1161,39 +987,31 @@ define amdgpu_kernel void @test_vec_type_hint_double(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: 
ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_vec_type_hint_char(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1213,39 +1031,31 @@ define amdgpu_kernel void @test_vec_type_hint_char(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_vec_type_hint_short(i32 %a) #0 !kernel_arg_addr_space !1 
!kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1265,39 +1075,31 @@ define amdgpu_kernel void @test_vec_type_hint_short(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_vec_type_hint_long(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1317,39 +1119,31 @@ define amdgpu_kernel void @test_vec_type_hint_long(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 
8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_vec_type_hint_unknown(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1370,39 +1164,31 @@ define amdgpu_kernel void @test_vec_type_hint_unknown(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_reqd_wgs_vec_type_hint(i32 %a) #0 
!kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1424,39 +1210,31 @@ define amdgpu_kernel void @test_reqd_wgs_vec_type_hint(i32 %a) #0 ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_wgs_hint_vec_type_hint(i32 %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !3 @@ -1475,40 +1253,32 @@ define amdgpu_kernel void @test_wgs_hint_vec_type_hint(i32 %a) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: 
ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_arg_ptr_to_ptr(i32 addrspace(5)* addrspace(1)* %a) #0 !kernel_arg_addr_space !81 !kernel_arg_access_qual !2 !kernel_arg_type !80 @@ -1526,24 +1296,19 @@ define amdgpu_kernel void @test_arg_ptr_to_ptr(i32 addrspace(5)* addrspace(1)* % ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_arg_struct_contains_ptr(%struct.B %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !82 @@ -1561,39 +1326,31 @@ define amdgpu_kernel void @test_arg_struct_contains_ptr(%struct.B %a) #0 ; CHECK-NEXT: Size: 16 ; CHECK-NEXT: Align: 16 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I32 ; CHECK-NEXT: AccQual: Default ; 
CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_arg_vector_of_ptr(<2 x i32 addrspace(1)*> %a) #0 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !83 @@ -1611,40 +1368,32 @@ define amdgpu_kernel void @test_arg_vector_of_ptr(<2 x i32 addrspace(1)*> %a) #0 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; 
CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_arg_unknown_builtin_type( %opencl.clk_event_t addrspace(1)* %a) #0 @@ -1663,7 +1412,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Name: b @@ -1671,7 +1419,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 1 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1680,7 +1427,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 2 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1689,7 +1435,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 4 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1698,7 +1443,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 4 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1707,7 +1451,6 @@ 
define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 8 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1716,7 +1459,6 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 16 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1724,40 +1466,32 @@ define amdgpu_kernel void @test_arg_unknown_builtin_type( ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: PointeeAlign: 1 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, i8 addrspace(3)* %b, @@ -1782,7 +1516,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, 
; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Name: b @@ -1790,7 +1523,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 8 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1799,7 +1531,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 32 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1808,7 +1539,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 64 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1817,7 +1547,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 256 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1826,7 +1555,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 128 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1835,7 +1563,6 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: PointeeAlign: 1024 ; CHECK-NEXT: 
AddrSpaceQual: Local ; CHECK-NEXT: AccQual: Default @@ -1843,40 +1570,32 @@ define amdgpu_kernel void @test_pointee_align(i64 addrspace(1)* %a, ; CHECK-NEXT: Size: 4 ; CHECK-NEXT: Align: 4 ; CHECK-NEXT: ValueKind: DynamicSharedPointer -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: PointeeAlign: 16 ; CHECK-NEXT: AddrSpaceQual: Local ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @test_pointee_align_attribute(i64 addrspace(1)* align 16 %a, i8 addrspace(3)* align 8 %b, @@ -1904,39 +1623,31 @@ define amdgpu_kernel void @test_pointee_align_attribute(i64 addrspace(1)* align ; CHECK-NEXT: Size: 25 ; CHECK-NEXT: Align: 1 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: Struct ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: 
HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define amdgpu_kernel void @__test_block_invoke_kernel( <{ i32, i32, i8*, i8 addrspace(1)*, i8 }> %arg) #1 @@ -1955,39 +1666,31 @@ define amdgpu_kernel void @__test_block_invoke_kernel( ; CHECK-NEXT: Size: 1 ; CHECK-NEXT: Align: 1 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenPrintfBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenDefaultQueue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenCompletionAction -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global define 
amdgpu_kernel void @test_enqueue_kernel_caller(i8 %a) #2 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !9 @@ -2001,7 +1704,6 @@ define amdgpu_kernel void @test_enqueue_kernel_caller(i8 %a) #2 ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: I32 define amdgpu_kernel void @unknown_addrspace_kernarg(i32 addrspace(12345)* %ptr) #0 { ret void } diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll index 20368f6643ad..67e2bfce5035 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v3.ll @@ -11,19 +11,16 @@ ; CHECK-NEXT: .offset: 0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: a ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK: .name: test0 ; CHECK: .symbol: test0.kd define amdgpu_kernel void @test0( @@ -44,23 +41,19 @@ entry: ; CHECK-NEXT: .offset: 0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: a ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK: .name: test8 ; CHECK: .symbol: 
test8.kd define amdgpu_kernel void @test8( @@ -81,27 +74,22 @@ entry: ; CHECK-NEXT: .offset: 0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: a ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK: .name: test16 ; CHECK: .symbol: test16.kd define amdgpu_kernel void @test16( @@ -122,31 +110,25 @@ entry: ; CHECK-NEXT: .offset: 0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: a ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK: .name: test24 ; CHECK: .symbol: test24.kd define amdgpu_kernel void @test24( @@ -167,36 +149,29 @@ entry: ; CHECK-NEXT: .offset: 0 ; CHECK-NEXT: .size: 8 ; 
CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: a ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK: .name: test32 ; CHECK: .symbol: test32.kd define amdgpu_kernel void @test32( @@ -217,46 +192,37 @@ entry: ; CHECK-NEXT: .offset: 0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: a ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; 
CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK: .name: test48 ; CHECK: .symbol: test48.kd define amdgpu_kernel void @test48( @@ -277,51 +243,41 @@ entry: ; CHECK-NEXT: .offset: 0 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: a ; CHECK-NEXT: .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .name: b ; CHECK-NEXT: .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: global_buffer -; CHECK-NEXT: .value_type: f16 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 40 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 48 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 56 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 64 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_none -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .address_space: global ; 
CHECK-NEXT: .offset: 72 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_multigrid_sync_arg -; CHECK-NEXT: .value_type: i8 ; CHECK: .name: test56 ; CHECK: .symbol: test56.kd define amdgpu_kernel void @test56( diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll index 8b59cdde8458..91e382dcce5f 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args.ll @@ -13,19 +13,16 @@ ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: a ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: b ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: CodeProps: define amdgpu_kernel void @test0( @@ -47,24 +44,20 @@ entry: ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: a ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: b ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: CodeProps: define amdgpu_kernel void @test8( half addrspace(1)* %r, @@ -85,28 +78,23 @@ entry: ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: a ; 
CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: b ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: CodeProps: define amdgpu_kernel void @test16( half addrspace(1)* %r, @@ -127,32 +115,26 @@ entry: ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: a ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: b ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: CodeProps: define amdgpu_kernel void @test24( half addrspace(1)* %r, @@ -173,36 +155,29 @@ entry: ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: a ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: b ; 
CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: CodeProps: define amdgpu_kernel void @test32( @@ -224,46 +199,37 @@ entry: ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: a ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: b ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; 
CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: CodeProps: define amdgpu_kernel void @test48( @@ -285,51 +251,41 @@ entry: ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: a ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Name: b ; CHECK-NEXT: Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: GlobalBuffer -; CHECK-NEXT: ValueType: F16 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenNone -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NEXT: CodeProps: define amdgpu_kernel void @test56( diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll index 8741bfbc1bb6..c52e09158eeb 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent-v3.ll @@ -9,19 +9,15 
@@ ; CHECK-NEXT: .size: 1 ; CHECK-NEXT: .type_name: char ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .offset: 8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NOT: .value_kind: hidden_hostcall_buffer diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll index 5f1cda0fd216..fb5874acbe7d 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-absent.ll @@ -15,20 +15,16 @@ ; CHECK-NEXT: Size: 1 ; CHECK-NEXT: Align: 1 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NOT: ValueKind: HiddenHostcallBuffer ; CHECK-NOT: ValueKind: HiddenDefaultQueue ; CHECK-NOT: ValueKind: HiddenCompletionAction diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll index 1a75f3661bd4..8fd62c89a603 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll @@ -9,24 +9,19 @@ ; CHECK-NEXT: .size: 1 ; CHECK-NEXT: .type_name: char ; CHECK-NEXT: .value_kind: by_value -; CHECK-NEXT: .value_type: i8 ; CHECK-NEXT: - .offset: 
8 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_x -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 16 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_y -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .offset: 24 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_global_offset_z -; CHECK-NEXT: .value_type: i64 ; CHECK-NEXT: - .address_space: global ; CHECK-NEXT: .offset: 32 ; CHECK-NEXT: .size: 8 ; CHECK-NEXT: .value_kind: hidden_hostcall_buffer -; CHECK-NEXT: .value_type: i8 ; CHECK: .language: OpenCL C ; CHECK-NEXT: .language_version: ; CHECK-NEXT: - 2 diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll index b0428638e254..e8db4d2a866e 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present.ll @@ -15,24 +15,19 @@ ; CHECK-NEXT: Size: 1 ; CHECK-NEXT: Align: 1 ; CHECK-NEXT: ValueKind: ByValue -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AccQual: Default ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetX -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetY -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenGlobalOffsetZ -; CHECK-NEXT: ValueType: I64 ; CHECK-NEXT: - Size: 8 ; CHECK-NEXT: Align: 8 ; CHECK-NEXT: ValueKind: HiddenHostcallBuffer -; CHECK-NEXT: ValueType: I8 ; CHECK-NEXT: AddrSpaceQual: Global ; CHECK-NOT: ValueKind: HiddenDefaultQueue ; CHECK-NOT: ValueKind: HiddenCompletionAction diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll index ec048c2d02db..5ee17a8c4855 100644 --- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll +++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-images-v3.ll @@ -16,92 +16,80 @@ 
%opencl.image3d_t = type opaque ; CHECK: --- -; CHECK: amdhsa.kernels: -; CHECK: - .args: +; CHECK: amdhsa.kernels: +; CHECK: - .args: ; CHECK: - .address_space: global ; CHECK: .name: a ; CHECK: .offset: 0 ; CHECK: .size: 8 ; CHECK: .type_name: image1d_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: b ; CHECK: .offset: 8 ; CHECK: .size: 8 ; CHECK: .type_name: image1d_array_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: c ; CHECK: .offset: 16 ; CHECK: .size: 8 ; CHECK: .type_name: image1d_buffer_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: d ; CHECK: .offset: 24 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: e ; CHECK: .offset: 32 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_array_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: f ; CHECK: .offset: 40 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_array_depth_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: g ; CHECK: .offset: 48 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_array_msaa_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: h ; CHECK: .offset: 56 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_array_msaa_depth_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: i ; CHECK: .offset: 64 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_depth_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: j ; CHECK: .offset: 72 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_msaa_t ; CHECK: .value_kind: image -; CHECK: .value_type: 
struct ; CHECK: - .address_space: global ; CHECK: .name: k ; CHECK: .offset: 80 ; CHECK: .size: 8 ; CHECK: .type_name: image2d_msaa_depth_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct ; CHECK: - .address_space: global ; CHECK: .name: l ; CHECK: .offset: 88 ; CHECK: .size: 8 ; CHECK: .type_name: image3d_t ; CHECK: .value_kind: image -; CHECK: .value_type: struct define amdgpu_kernel void @test(%opencl.image1d_t addrspace(1)* %a, %opencl.image1d_array_t addrspace(1)* %b, %opencl.image1d_buffer_t addrspace(1)* %c, diff --git a/llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s b/llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s index f0c290822440..390eba266d89 100644 --- a/llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s +++ b/llvm/test/MC/AMDGPU/hsa-metadata-kernel-args.s @@ -2,6 +2,9 @@ // RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx800 -mattr=-code-object-v3 -show-encoding %s | FileCheck --check-prefix=CHECK --check-prefix=GFX800 %s // RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -show-encoding %s | FileCheck --check-prefix=CHECK --check-prefix=GFX900 %s +// The legacy ValueType field should be parsed without error, but not +// re-emitted. 
+ // CHECK: .amd_amdgpu_hsa_metadata // CHECK: Version: [ 1, 0 ] // CHECK: Printf: @@ -17,24 +20,19 @@ // CHECK: Size: 1 // CHECK: Align: 1 // CHECK: ValueKind: ByValue -// CHECK: ValueType: I8 // CHECK: AccQual: Default // CHECK: - Size: 8 // CHECK: Align: 8 // CHECK: ValueKind: HiddenGlobalOffsetX -// CHECK: ValueType: I64 // CHECK: - Size: 8 // CHECK: Align: 8 // CHECK: ValueKind: HiddenGlobalOffsetY -// CHECK: ValueType: I64 // CHECK: - Size: 8 // CHECK: Align: 8 // CHECK: ValueKind: HiddenGlobalOffsetZ -// CHECK: ValueType: I64 // CHECK: - Size: 8 // CHECK: Align: 8 // CHECK: ValueKind: HiddenPrintfBuffer -// CHECK: ValueType: I8 // CHECK: AddrSpaceQual: Global // CHECK: .end_amd_amdgpu_hsa_metadata .amd_amdgpu_hsa_metadata From llvm-commits at lists.llvm.org Fri Jul 10 15:16:54 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:16:54 +0000 (UTC) Subject: [PATCH] D82818: AMDGPU: Remove .value_type from kernel metadata In-Reply-To: References: Message-ID: arsenm closed this revision. arsenm added a comment. 
31f4e43f3f391e5c5034580f972e0acc78f99b63 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82818/new/ https://reviews.llvm.org/D82818 From llvm-commits at lists.llvm.org Fri Jul 10 15:21:10 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Fri, 10 Jul 2020 15:21:10 -0700 (PDT) Subject: [llvm] cc28058 - Temporarily revert "[NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)" Message-ID: <5f08e9d6.1c69fb81.f3780.0800@mx.google.com> Author: Eric Christopher Date: 2020-07-10T15:21:00-07:00 New Revision: cc28058c13e89ecc85dac7e1bd5d13a2ce1bb620 URL: https://github.com/llvm/llvm-project/commit/cc28058c13e89ecc85dac7e1bd5d13a2ce1bb620 DIFF: https://github.com/llvm/llvm-project/commit/cc28058c13e89ecc85dac7e1bd5d13a2ce1bb620.diff LOG: Temporarily revert "[NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)" as it wasn't NFC and is causing issues with thinlto bitcode reading. I've followed up offline with reproduction instructions and testcases. This reverts commit 30582457b47004dec8a78144abc919a13ccbd08c. Added: Modified: llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/lib/Bitcode/Reader/BitcodeReader.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index a0c22a7d0905..de4fe6630324 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -536,9 +536,8 @@ enum FunctionCodes { FUNC_CODE_DEBUG_LOC = 35, // DEBUG_LOC: [Line,Col,ScopeVal, IAVal] FUNC_CODE_INST_FENCE = 36, // FENCE: [ordering, synchscope] - FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty, ptr, cmp, new, vol, - // success_ordering, ssid, - // failure_ordering?, weak?] 
+ FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty,ptr,cmp,new, align, vol, + // ordering, synchscope] FUNC_CODE_INST_ATOMICRMW = 38, // ATOMICRMW: [ptrty,ptr,val, operation, // align, vol, // ordering, synchscope] @@ -552,9 +551,8 @@ enum FunctionCodes { FUNC_CODE_INST_GEP = 43, // GEP: [inbounds, n x operands] FUNC_CODE_INST_STORE = 44, // STORE: [ptrty,ptr,valty,val, align, vol] FUNC_CODE_INST_STOREATOMIC = 45, // STORE: [ptrty,ptr,val, align, vol - FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty, ptr, cmp, newval, vol, - // success_ordering, ssid, - // failure_ordering, weak] + FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty,ptr,valty,cmp,new, align, + // vol,ordering,synchscope] FUNC_CODE_INST_LANDINGPAD = 47, // LANDINGPAD: [ty,val,num,id0,val0...] FUNC_CODE_INST_CLEANUPRET = 48, // CLEANUPRET: [val] or [val,bb#] FUNC_CODE_INST_CATCHRET = 49, // CATCHRET: [val,bb#] diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index ad1e97540298..659e26c2bd25 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -4982,120 +4982,63 @@ Error BitcodeReader::parseFunctionBody(Function *F) { InstructionList.push_back(I); break; } - case bitc::FUNC_CODE_INST_CMPXCHG_OLD: { - // CMPXCHG:[ptrty, ptr, cmp, new, vol, success_ordering, ssid, + case bitc::FUNC_CODE_INST_CMPXCHG_OLD: + case bitc::FUNC_CODE_INST_CMPXCHG: { + // CMPXCHG:[ptrty, ptr, cmp, new, vol, successordering, ssid, // failureordering?, isweak?] 
-      const size_t RecordCount = Record.size();
-      unsigned Slot = 0;
-      Value *Ptr = nullptr;
-      if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy))
+      unsigned OpNum = 0;
+      Value *Ptr, *Cmp, *New;
+      if (getValueTypePair(Record, OpNum, NextValueNo, Ptr, &FullTy))
         return error("Invalid record");
 
       if (!isa<PointerType>(Ptr->getType()))
         return error("Cmpxchg operand is not a pointer type");
 
-      Value *Cmp = nullptr;
-      if (popValue(Record, Slot, NextValueNo, getPointerElementFlatType(FullTy),
-                   Cmp))
-        return error("Invalid record");
-
-      if (!(RecordCount == 6 || RecordCount == 7 || RecordCount == 8))
+      if (BitCode == bitc::FUNC_CODE_INST_CMPXCHG) {
+        if (getValueTypePair(Record, OpNum, NextValueNo, Cmp, &FullTy))
+          return error("Invalid record");
+      } else if (popValue(Record, OpNum, NextValueNo,
+                          getPointerElementFlatType(FullTy), Cmp))
         return error("Invalid record");
+      else
+        FullTy = cast<PointerType>(FullTy)->getElementType();
 
-      Value *New = nullptr;
-      if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New))
+      if (popValue(Record, OpNum, NextValueNo, Cmp->getType(), New) ||
+          Record.size() < OpNum + 3 || Record.size() > OpNum + 5)
         return error("Invalid record");
 
-      if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), Ptr->getType()))
-        return Err;
-
-      const bool IsVol = Record[3];
-
-      const AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[4]);
+      AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[OpNum + 1]);
       if (SuccessOrdering == AtomicOrdering::NotAtomic ||
           SuccessOrdering == AtomicOrdering::Unordered)
         return error("Invalid record");
+      SyncScope::ID SSID = getDecodedSyncScopeID(Record[OpNum + 2]);
 
-      const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]);
-
+      if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), Ptr->getType()))
+        return Err;
       AtomicOrdering FailureOrdering;
-      if (RecordCount > 6)
-        FailureOrdering = getDecodedOrdering(Record[6]);
-      else
+      if (Record.size() < 7)
         FailureOrdering =
             AtomicCmpXchgInst::getStrongestFailureOrdering(SuccessOrdering);
+      else
+        FailureOrdering = getDecodedOrdering(Record[OpNum + 3]);
 
-      const Align Alignment(
+      Align Alignment(
           TheModule->getDataLayout().getTypeStoreSize(Cmp->getType()));
-
-      FullTy = cast<PointerType>(FullTy)->getElementType();
-      FullTy = StructType::get(Context, {FullTy, Type::getInt1Ty(Context)});
       I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, SuccessOrdering,
                                 FailureOrdering, SSID);
+      FullTy = StructType::get(Context, {FullTy, Type::getInt1Ty(Context)});
+      cast<AtomicCmpXchgInst>(I)->setVolatile(Record[OpNum]);
 
-      cast<AtomicCmpXchgInst>(I)->setVolatile(IsVol);
-
-      if (RecordCount > 7) {
-        cast<AtomicCmpXchgInst>(I)->setWeak(Record[7]);
-      } else {
+      if (Record.size() < 8) {
         // Before weak cmpxchgs existed, the instruction simply returned the
         // value loaded from memory, so bitcode files from that era will be
         // expecting the first component of a modern cmpxchg.
         CurBB->getInstList().push_back(I);
         I = ExtractValueInst::Create(I, 0);
         FullTy = cast<StructType>(FullTy)->getElementType(0);
+      } else {
+        cast<AtomicCmpXchgInst>(I)->setWeak(Record[OpNum+4]);
       }
-      InstructionList.push_back(I);
-      break;
-    }
-    case bitc::FUNC_CODE_INST_CMPXCHG: {
-      // CMPXCHG: [ptrty, ptr, cmp, newval, vol, success_ordering, ssid,
-      //           failure_ordering, weak]
-      const size_t RecordCount = Record.size();
-      unsigned Slot = 0;
-      Value *Ptr = nullptr;
-      if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy))
-        return error("Invalid record");
-
-      if (!isa<PointerType>(Ptr->getType()))
-        return error("Cmpxchg operand is not a pointer type");
-
-      Value *Cmp = nullptr;
-      if (getValueTypePair(Record, Slot, NextValueNo, Cmp, &FullTy))
-        return error("Invalid record");
-
-      if (RecordCount != 8)
-        return error("Invalid record");
-
-      Value *New = nullptr;
-      if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New))
-        return error("Invalid record");
-
-      const bool IsVol = Record[3];
-
-      const AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[4]);
-      if (SuccessOrdering == AtomicOrdering::NotAtomic ||
-          SuccessOrdering == AtomicOrdering::Unordered)
-        return error("Invalid record");
-
-      const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]);
-
-      if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), Ptr->getType()))
-        return Err;
-
-      const AtomicOrdering FailureOrdering = getDecodedOrdering(Record[6]);
-
-      const bool IsWeak = Record[7];
-
-      const Align Alignment(
-          TheModule->getDataLayout().getTypeStoreSize(Cmp->getType()));
-
-      FullTy = StructType::get(Context, {FullTy, Type::getInt1Ty(Context)});
-      I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, SuccessOrdering,
-                                FailureOrdering, SSID);
-
-      cast<AtomicCmpXchgInst>(I)->setVolatile(IsVol);
-      cast<AtomicCmpXchgInst>(I)->setWeak(IsWeak);
 
       InstructionList.push_back(I);
       break;

From llvm-commits at lists.llvm.org Fri Jul 10 15:23:24 2020
From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits)
Date: Fri, 10 Jul 2020 15:23:24 -0700
Subject: [llvm] 3058245 - [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)
In-Reply-To: <5f07ee46.1c69fb81.3eb4d.b0f5@mx.google.com>
References: <5f07ee46.1c69fb81.3eb4d.b0f5@mx.google.com>
Message-ID:

Hi Guillaume,

I've temporarily reverted this in cc28058c13e89ecc85dac7e1bd5d13a2ce1bb620 as it was causing compilation errors for thinlto. I'll send you an email with some reproduction instructions offline.

Sorry for any inconvenience :(

-eric

On Thu, Jul 9, 2020 at 9:27 PM Guillaume Chatelet via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

>
> Author: Guillaume Chatelet
> Date: 2020-07-10T04:27:39Z
> New Revision: 30582457b47004dec8a78144abc919a13ccbd08c
>
> URL:
> https://github.com/llvm/llvm-project/commit/30582457b47004dec8a78144abc919a13ccbd08c
> DIFF:
> https://github.com/llvm/llvm-project/commit/30582457b47004dec8a78144abc919a13ccbd08c.diff
>
> LOG: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)
>
> This is preparatory work to enable storing alignment for AtomicCmpXchgInst.
> See D83136 for context and bug: > https://bugs.llvm.org/show_bug.cgi?id=27168 > > Differential Revision: https://reviews.llvm.org/D83375 > > Added: > > > Modified: > llvm/include/llvm/Bitcode/LLVMBitCodes.h > llvm/lib/Bitcode/Reader/BitcodeReader.cpp > > Removed: > > > > > ################################################################################ > diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h > b/llvm/include/llvm/Bitcode/LLVMBitCodes.h > index de4fe6630324..a0c22a7d0905 100644 > --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h > +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h > @@ -536,8 +536,9 @@ enum FunctionCodes { > > FUNC_CODE_DEBUG_LOC = 35, // DEBUG_LOC: [Line,Col,ScopeVal, > IAVal] > FUNC_CODE_INST_FENCE = 36, // FENCE: [ordering, synchscope] > - FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty,ptr,cmp,new, align, > vol, > - // ordering, synchscope] > + FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty, ptr, cmp, new, vol, > + // success_ordering, ssid, > + // failure_ordering?, weak?] > FUNC_CODE_INST_ATOMICRMW = 38, // ATOMICRMW: [ptrty,ptr,val, > operation, > // align, vol, > // ordering, synchscope] > @@ -551,8 +552,9 @@ enum FunctionCodes { > FUNC_CODE_INST_GEP = 43, // GEP: [inbounds, n x operands] > FUNC_CODE_INST_STORE = 44, // STORE: [ptrty,ptr,valty,val, align, > vol] > FUNC_CODE_INST_STOREATOMIC = 45, // STORE: [ptrty,ptr,val, align, vol > - FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty,ptr,valty,cmp,new, > align, > - // vol,ordering,synchscope] > + FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty, ptr, cmp, newval, > vol, > + // success_ordering, ssid, > + // failure_ordering, weak] > FUNC_CODE_INST_LANDINGPAD = 47, // LANDINGPAD: [ty,val,num,id0,val0...] 
> FUNC_CODE_INST_CLEANUPRET = 48, // CLEANUPRET: [val] or [val,bb#] > FUNC_CODE_INST_CATCHRET = 49, // CATCHRET: [val,bb#] > > diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp > b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp > index 659e26c2bd25..ad1e97540298 100644 > --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp > +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp > @@ -4982,63 +4982,120 @@ Error BitcodeReader::parseFunctionBody(Function > *F) { > InstructionList.push_back(I); > break; > } > - case bitc::FUNC_CODE_INST_CMPXCHG_OLD: > - case bitc::FUNC_CODE_INST_CMPXCHG: { > - // CMPXCHG:[ptrty, ptr, cmp, new, vol, successordering, ssid, > + case bitc::FUNC_CODE_INST_CMPXCHG_OLD: { > + // CMPXCHG:[ptrty, ptr, cmp, new, vol, success_ordering, ssid, > // failureordering?, isweak?] > - unsigned OpNum = 0; > - Value *Ptr, *Cmp, *New; > - if (getValueTypePair(Record, OpNum, NextValueNo, Ptr, &FullTy)) > + const size_t RecordCount = Record.size(); > + unsigned Slot = 0; > + Value *Ptr = nullptr; > + if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy)) > return error("Invalid record"); > > if (!isa(Ptr->getType())) > return error("Cmpxchg operand is not a pointer type"); > > - if (BitCode == bitc::FUNC_CODE_INST_CMPXCHG) { > - if (getValueTypePair(Record, OpNum, NextValueNo, Cmp, &FullTy)) > - return error("Invalid record"); > - } else if (popValue(Record, OpNum, NextValueNo, > - getPointerElementFlatType(FullTy), Cmp)) > + Value *Cmp = nullptr; > + if (popValue(Record, Slot, NextValueNo, > getPointerElementFlatType(FullTy), > + Cmp)) > return error("Invalid record"); > - else > - FullTy = cast(FullTy)->getElementType(); > > - if (popValue(Record, OpNum, NextValueNo, Cmp->getType(), New) || > - Record.size() < OpNum + 3 || Record.size() > OpNum + 5) > + if (!(RecordCount == 6 || RecordCount == 7 || RecordCount == 8)) > return error("Invalid record"); > > - AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[OpNum + > 1]); > - if (SuccessOrdering == 
AtomicOrdering::NotAtomic || > - SuccessOrdering == AtomicOrdering::Unordered) > + Value *New = nullptr; > + if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New)) > return error("Invalid record"); > - SyncScope::ID SSID = getDecodedSyncScopeID(Record[OpNum + 2]); > > if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), > Ptr->getType())) > return Err; > + > + const bool IsVol = Record[3]; > + > + const AtomicOrdering SuccessOrdering = > getDecodedOrdering(Record[4]); > + if (SuccessOrdering == AtomicOrdering::NotAtomic || > + SuccessOrdering == AtomicOrdering::Unordered) > + return error("Invalid record"); > + > + const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]); > + > AtomicOrdering FailureOrdering; > - if (Record.size() < 7) > + if (RecordCount > 6) > + FailureOrdering = getDecodedOrdering(Record[6]); > + else > FailureOrdering = > > AtomicCmpXchgInst::getStrongestFailureOrdering(SuccessOrdering); > - else > - FailureOrdering = getDecodedOrdering(Record[OpNum + 3]); > > - Align Alignment( > + const Align Alignment( > TheModule->getDataLayout().getTypeStoreSize(Cmp->getType())); > + > + FullTy = cast(FullTy)->getElementType(); > + FullTy = StructType::get(Context, {FullTy, > Type::getInt1Ty(Context)}); > I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, SuccessOrdering, > FailureOrdering, SSID); > - FullTy = StructType::get(Context, {FullTy, > Type::getInt1Ty(Context)}); > - cast(I)->setVolatile(Record[OpNum]); > > - if (Record.size() < 8) { > + cast(I)->setVolatile(IsVol); > + > + if (RecordCount > 7) { > + cast(I)->setWeak(Record[7]); > + } else { > // Before weak cmpxchgs existed, the instruction simply returned > the > // value loaded from memory, so bitcode files from that era will > be > // expecting the first component of a modern cmpxchg. 
> CurBB->getInstList().push_back(I); > I = ExtractValueInst::Create(I, 0); > FullTy = cast(FullTy)->getElementType(0); > - } else { > - cast(I)->setWeak(Record[OpNum+4]); > } > + InstructionList.push_back(I); > + break; > + } > + case bitc::FUNC_CODE_INST_CMPXCHG: { > + // CMPXCHG: [ptrty, ptr, cmp, newval, vol, success_ordering, ssid, > + // failure_ordering, weak] > + const size_t RecordCount = Record.size(); > + unsigned Slot = 0; > + Value *Ptr = nullptr; > + if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy)) > + return error("Invalid record"); > + > + if (!isa(Ptr->getType())) > + return error("Cmpxchg operand is not a pointer type"); > + > + Value *Cmp = nullptr; > + if (getValueTypePair(Record, Slot, NextValueNo, Cmp, &FullTy)) > + return error("Invalid record"); > + > + if (RecordCount != 8) > + return error("Invalid record"); > + > + Value *New = nullptr; > + if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New)) > + return error("Invalid record"); > + > + const bool IsVol = Record[3]; > + > + const AtomicOrdering SuccessOrdering = > getDecodedOrdering(Record[4]); > + if (SuccessOrdering == AtomicOrdering::NotAtomic || > + SuccessOrdering == AtomicOrdering::Unordered) > + return error("Invalid record"); > + > + const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]); > + > + if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), > Ptr->getType())) > + return Err; > + > + const AtomicOrdering FailureOrdering = > getDecodedOrdering(Record[6]); > + > + const bool IsWeak = Record[7]; > + > + const Align Alignment( > + TheModule->getDataLayout().getTypeStoreSize(Cmp->getType())); > + > + FullTy = StructType::get(Context, {FullTy, > Type::getInt1Ty(Context)}); > + I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, SuccessOrdering, > + FailureOrdering, SSID); > + > + cast(I)->setVolatile(IsVol); > + cast(I)->setWeak(IsWeak); > > InstructionList.push_back(I); > break; > > > > _______________________________________________ > llvm-commits 
mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits >

From llvm-commits at lists.llvm.org Fri Jul 10 15:26:26 2020
From: llvm-commits at lists.llvm.org (Alexei Starovoitov via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 22:26:26 +0000 (UTC)
Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays
In-Reply-To:
References:
Message-ID: <97ffdccaf011743a63570b12b9d7773b@localhost.localdomain>

ast added a comment.

'float *var;' in the struct won't be trivial to handle with a skip indeed.
My concern with the float->char[] substitution is that the kernel decides on the calling convention based on these types. If there is a function fn(int a, float b); and pahole emits BTF for it as fn(int a, char b[4]), the kernel will let bpf progs attach to it with wrong register passing. Currently an array is not allowed in btf_distill_func_proto(), so it's a theoretical issue, but still dangerous long term.
I think pahole/clang should either skip generating BTF for anything with float, or BTF should be extended to encode it. I think extending BTF would be easier.
I don't like KIND_INT_FLOATING though. KIND_FLOAT is better. Just like the single KIND_INT that represents char/int/long, KIND_FLOAT should be able to represent float/double/long double.
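A minimal C++ sketch of the signature mismatch raised in this comment, assuming two hypothetical functions, `fn_float` and `fn_bytes`, that stand in for the real prototype and the byte-array substitute. It only shows the language-level view: an array parameter is adjusted to a pointer, so the two prototypes are different function types. How the kernel maps BTF types to argument registers is not modelled here.

```cpp
#include <type_traits>

// Hypothetical stand-ins (not from the patch) for the two prototypes
// discussed in the comment above.
void fn_float(int a, float b) { (void)a; (void)b; }
void fn_bytes(int a, char b[4]) { (void)a; (void)b; }

// An array parameter is adjusted to a pointer, so the second argument of
// fn_bytes is really `char *`, not four bytes passed by value.
constexpr bool bytes_param_is_pointer() {
  return std::is_same<decltype(&fn_bytes), void (*)(int, char *)>::value;
}

// Consequently the two prototypes do not describe the same function type.
constexpr bool same_prototype() {
  return std::is_same<decltype(&fn_float), decltype(&fn_bytes)>::value;
}

static_assert(bytes_param_is_pointer(), "char b[4] decays to char *");
static_assert(!same_prototype(), "float and char[4] prototypes differ");
```

Both checks run at compile time, so a type description that swaps `float` for `char[4]` describes a different function as far as the language is concerned.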
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Fri Jul 10 15:28:30 2020 From: llvm-commits at lists.llvm.org (JF Bastien via llvm-commits) Date: Fri, 10 Jul 2020 15:28:30 -0700 (PDT) Subject: [llvm] 7bf73bc - [docs] LLVM Security Group and Process Message-ID: <5f08eb8e.1c69fb81.8c769.037c@mx.google.com> Author: JF Bastien Date: 2020-07-10T15:24:02-07:00 New Revision: 7bf73bcf6d9335938bd072b11809d305173c7c1e URL: https://github.com/llvm/llvm-project/commit/7bf73bcf6d9335938bd072b11809d305173c7c1e DIFF: https://github.com/llvm/llvm-project/commit/7bf73bcf6d9335938bd072b11809d305173c7c1e.diff LOG: [docs] LLVM Security Group and Process Summary: See the corresponding RFC on llvm-dev for a discussion of this proposal. http://lists.llvm.org/pipermail/llvm-dev/2019-November/136839.html Subscribers: jkorous, dexonsmith, arphaman, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70326 Added: llvm/docs/Security.rst Modified: llvm/docs/Contributing.rst llvm/docs/HowToSubmitABug.rst llvm/docs/Reference.rst llvm/docs/index.rst Removed: ################################################################################ diff --git a/llvm/docs/Contributing.rst b/llvm/docs/Contributing.rst index 9185bfdf4917..45d4d7a27fa7 100644 --- a/llvm/docs/Contributing.rst +++ b/llvm/docs/Contributing.rst @@ -37,6 +37,11 @@ and use the built binaries to reproduce the failure described in the bug. Use a debug build (`-DCMAKE_BUILD_TYPE=Debug`) or a build with assertions (`-DLLVM_ENABLE_ASSERTIONS=On`, enabled for Debug builds). +Reporting a Security Issue +-------------------------- + +There is a separate process to submit security-related bugs, see :ref:`report-security-issue`. 
+ Bigger Pieces of Work --------------------- In case you are interested in taking on a bigger piece of work, a list of diff --git a/llvm/docs/HowToSubmitABug.rst b/llvm/docs/HowToSubmitABug.rst index ac28f290bbdd..58c020fbaedb 100644 --- a/llvm/docs/HowToSubmitABug.rst +++ b/llvm/docs/HowToSubmitABug.rst @@ -10,6 +10,8 @@ If you're working with LLVM and run into a bug, we definitely want to know about it. This document describes what you can do to increase the odds of getting it fixed quickly. +🔒 If you believe that the bug is security related, please follow :ref:`report-security-issue`. 🔒 + Basically you have to do two things at a minimum. First, decide whether the bug `crashes the compiler`_ (or an LLVM pass), or if the compiler is `miscompiling`_ the program (i.e., the diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst index d116edafb9bf..3911094fbfd6 100644 --- a/llvm/docs/Reference.rst +++ b/llvm/docs/Reference.rst @@ -37,6 +37,7 @@ LLVM and API reference documentation. PDB/index ScudoHardenedAllocator MemTagSanitizer + Security SegmentedStacks StackMaps SpeculativeLoadHardening diff --git a/llvm/docs/Security.rst b/llvm/docs/Security.rst new file mode 100644 index 000000000000..62085cc1c202 --- /dev/null +++ b/llvm/docs/Security.rst @@ -0,0 +1,220 @@ +=================== +LLVM Security Group +=================== + +The LLVM Security Group has the following goals: + +1. Allow LLVM contributors and security researchers to disclose security-related issues affecting the LLVM project to members of the LLVM community. +2. Organize fixes, code reviews, and release management for said issues. +3. Allow distributors time to investigate and deploy fixes before wide dissemination of vulnerabilities or mitigation shortcomings. +4. Ensure timely notification and release to vendors who package and distribute LLVM-based toolchains and projects. +5. 
Ensure timely notification to users of LLVM-based toolchains whose compiled code is security-sensitive, through the `CVE process`_. +6. Strive to improve security over time, for example by adding additional testing, fuzzing, and hardening after fixing issues. + +*Note*: these goals ensure timely action, provide disclosure timing when issues are reported, and respect vendors' / packagers' / users' constraints. + +The LLVM Security Group is private. It is composed of trusted LLVM contributors. Its discussions remain within the Security Group (plus issue reporter and key experts) while an issue is being investigated. After an issue becomes public, the entirety of the group’s discussions pertaining to that issue also become public. + + +Group Composition +================= + +Security Group Members +---------------------- + +The members of the group represent a wide cross-section of the community, and meet the criteria for inclusion below. + +* Akila Srinivasan (Apple) +* Dimitry Andric (invidual; FreeBSD) +* Ed Maste (individual; FreeBSD) +* JF Bastien (Apple) +* Josh Eads (Sony) +* Kristof Beyls (ARM) +* Matthew Riley (Google) +* Oliver Hunt (Apple) +* Paul Robinson (Sony) +* Peter Smith (ARM) +* Philip Reames (Azul Systems Inc) +* Pietro Albini (individual; Rust) +* Serge Guelton (RedHat) +* Shayne Hiet-Block (Microsoft) +* Steve Klabnik (Oxide Computer Company; Rust) + +Criteria +-------- + +* Nominees for LLVM Security Group membership should fall in one of these groups: + + - Individual contributors: + + + Specializes in fixing compiler-based security related issues or often participates in their exploration and resolution. + + Has a track record of finding security vulnerabilities and responsible disclosure of those vulnerabilities. + + Is a compiler expert who has specific interests in knowing about, resolving, and preventing future security vulnerabilities. + + Has actively contributed non-trivial code to the LLVM project in the last year. 
+ + - Researchers: + + + Has a track record of finding security vulnerabilities and responsible disclosure of those vulnerabilities. + + Is a compiler expert who has specific interests in knowing about, resolving, and preventing future security vulnerabilities. + + - Vendor contacts: + + + Represents an organization or company which ships products that include their own copy of LLVM. Due to their position in the organization, the nominee has a reasonable need to know about security issues and disclosure embargoes. + +* Additionally, the following are necessary but not sufficient criteria for membership in the LLVM Security Group: + + - If already in the LLVM Security Group, has actively participated in one (if any) security issue in the last year. + - If already in the LLVM Security Group, has actively participated in most membership discussions in the last year. + - If already in the LLVM Security Group, has actively participated in writing or reviewing a transparency report in the last year. + - When employed by a company or other entity, the parent entity has no more than three members already in the LLVM Security Group. + - When nominated as a vendor contact, their position with that vendor remains the same as when originally nominated. + - Nominees are trusted by existing Security Group members to keep communications embargoed while still active. + +Nomination process +------------------ + +Anyone who feels they meet these criteria can nominate themselves, or may be nominated by a third party such as an existing LLVM Security Group member. The nomination should state whether the nominee is nominated as an individual, researcher, or as a vendor contact. It should clearly describe the grounds for nomination. + +*FUTURE*: where nomination occurs (mailing list, GitHub, etc), can be decided later. See `Discussion Medium`_ below. 
+ + +Choosing new members +-------------------- + +If a nomination for LLVM Security Group membership is supported by a majority of existing LLVM Security Group members, then it carries within five business days unless an existing member of the Security Group objects. If an objection is raised, the LLVM Security Group members should discuss the matter and try to come to consensus; failing this, the nomination will succeed only by a two-thirds supermajority vote of the LLVM Security Group. + +Accepting membership +-------------------- + +Before new LLVM Security Group membership is finalized, the successful nominee should accept membership and agree to abide by this security policy, particularly `Privileges and Responsibilities of LLVM Security Group Members`_ below. + +Keeping Membership Current +-------------------------- + +* At least every six months, the LLVM Security Group applies the above criteria. The membership list is pruned accordingly. +* Any Security Group member can ask that the criteria be applied within the next five business days. +* If a member of the LLVM Security Group does not act in accordance with the letter and spirit of this policy, then their LLVM Security Group membership can be revoked by a majority vote of the members, not including the person under consideration for revocation. After a member calls for a revocation vote, voting will be open for five business days. +* Emergency suspension: an LLVM Security Group member who blatantly disregards the LLVM Security Policy may have their membership temporarily suspended on the request of any two members. In such a case, the requesting members should notify the Security Group with a description of the offense. At this point, membership will be temporarily suspended for five business days, pending outcome of the vote for permanent revocation. +* The LLVM Board may remove any member from the LLVM Security Group. 
+ +Transparency Report +------------------- + +Every year, the LLVM Security Group must publish a transparency report. The intent of this report is to keep the community informed by summarizing the disclosures that have been made public in the last year. It shall contain a list of all public disclosures, as well as statistics on time to fix issues, length of embargo periods, and so on. + + +Privileges and Responsibilities of LLVM Security Group Members +============================================================== + +Access +------ + +LLVM Security Group members will be subscribed to a private `Discussion Medium`_ (*FUTURE*: see section below). It will be used for technical discussions of security issues, as well as process discussions about matters such as disclosure timelines and group membership. Members have access to all security issues. + +Confidentiality +--------------- + +Members of the LLVM Security Group will be expected to treat LLVM security issue information shared with the group as confidential until publicly disclosed: + +* Members should not disclose security issue information to non-members unless both members are employed by the same vendor of a LLVM based product, in which case information can be shared within that organization on a need-to-know basis and handled as confidential information normally is within that organization. +* If the LLVM Security Group agrees, designated members may share issues with vendors of non-LLVM based products if their product suffers from the same issue. The non-LLVM vendor should be asked to respect the issue’s embargo date, and to not share the information beyond the need-to-know people within their organization. +* If the LLVM Security Group agrees, key experts can be brought in to help address particular issues. The key expert should be asked to respect the issue’s embargo date, and to not share the information. 
+ +Disclosure +---------- + +Following the process below, the LLVM Security Group decides on embargo date for public disclosure for each Security issue. An embargo may be lifted before the agreed-upon date if all vendors planning to ship a fix have already done so, and if the reporter does not object. + +Collaboration +------------- + +Members of the LLVM Security Group are expected to: + +* Promptly share any LLVM vulnerabilities they become aware of. +* Volunteer to drive issues forward. +* Help evaluate the severity of incoming issues. +* Help write and review patches to address security issues. +* Participate in the member nomination and removal processes. + + +Discussion Medium +================= + +*FUTURE*: this section needs more work! Where discussions occur is influenced by other factors that are still open in this document. We can figure it out later. +See other existing systems: `chromium issue tracker`_, tentative `GitHub security`_. It seems like bugzilla and email don’t meet security requirements. + +The medium used to host LLVM Security Group discussions is security-sensitive. It should therefore run on infrastructure which can meet our security expectations. + +This is where all security discussions occur: + +* File security issues. +* Nominate new members. +* Propose member removal. +* Suggest policy changes. +* Discuss security improvements to LLVM. + + +When a new issue is filed, a template is provided to help issue reporters provide all relevant information. + + +Process +======= + +The following process occurs on the discussion medium for each reported issue: + +* A security issue reporter (not necessarily an LLVM contributor) reports an issue. +* Within two business days, a member of the Security Group is put in charge of driving the issue to an acceptable resolution. This champion doesn’t need to be the same person for each issue. This person can self-nominate. 
+* Members of the Security Group discuss in which circumstances (if any) an issue is relevant to security, and determine if it is a security issue. +* Negotiate an embargo date for public disclosure, with a default minimum time limit of ninety days. +* Security Group members can recommend that key experts be pulled in to specific issue discussions. The key expert can be pulled in unless there are objections from other Security Group members. +* Patches are written and reviewed. +* Backporting security patches from recent versions to old versions cannot always work. It is up to the Security Group to decide if such backporting should be done, and how far back. +* The Security Group figures out how the LLVM project’s own releases, as well as individual vendors’ releases, can be timed to patch the issue simultaneously. +* Embargo date can be delayed or pulled forward at the Security Group’s discretion. +* The issue champion obtains a CVE entry from MITRE_. +* Once the embargo expires, the patch is posted publicly according to LLVM’s usual code review process. +* All security issues (as well as nomination / removal discussions) become public within approximately fourteen weeks of the fix landing in the LLVM repository. Precautions should be taken to avoid disclosing particularly sensitive data included in the report (e.g. username and password pairs). + + +Changes to the Policy +===================== + +The LLVM Security Policy may be changed by majority vote of the LLVM Security Group. Such changes also need to be approved by the LLVM Board. + + +What is considered a security issue? +==================================== + +*FUTURE*: this section will be expanded once the Security Group is formed, and it agrees on an initial security surface area. + +The LLVM Project has a significant amount of code, and not all of it is considered security-sensitive. 
This is particularly true because LLVM is used in a wide variety of circumstances: there are different threat models, untrusted inputs differ, and the environment LLVM runs in is varied. Therefore, what the LLVM Project considers a security issue is what its members have signed up to maintain securely. + +As this security process matures, members of the LLVM community can propose that a part of the codebase be designated as security-sensitive (or no longer security-sensitive). This requires a rationale, and buy-in from the LLVM community as for any RFC. In some cases, parts of the codebase could be handled as security-sensitive but need significant work to get to the stage where that's manageable. The LLVM community will need to decide whether it wants to invest in making these parts of the code secure-able, and maintain these security properties over time. In all cases the LLVM Security Group should be consulted, since they'll be responding to security issues filed against these parts of the codebase. + +If you're not sure whether an issue is in-scope for this security process or not, err towards assuming that it is. The Security Group might agree or disagree and will explain its rationale in the report, as well as update this document through the above process. + +The security-sensitive parts of the LLVM Project currently are: + +* None (this process is new, the list hasn't been populated yet) +* *FUTURE*: this section will be expanded. + +The parts of the LLVM Project which are currently treated as non-security sensitive are: + +* Language front-ends, such as clang, for which a malicious input file can cause undesirable behavior. For example, a maliciously-crafter C or Rust source file can cause arbitrary code to execute in LLVM. These parts of LLVM haven't been hardened, and compiling untrusted code usually also includes running utilities such as `make` which can more readily perform malicious things. +* *FUTURE*: this section will be expanded. + +.. 
_report-security-issue: + +How to report a security issue? +=============================== + +*FUTURE*: this section will be expanded once we’ve figured out other details above. + +Not everyone who wants to report a security issue will be familiar with LLVM, its community, and processes. Therefore, this needs to be easy to find on the LLVM website, and set clear expectations to issue reporters. + + + +.. _CVE process: https://cve.mitre.org +.. _chromium issue tracker: https://crbug.com +.. _GitHub security: https://help.github.com/en/articles/about-maintainer-security-advisories +.. _MITRE: https://cve.mitre.org diff --git a/llvm/docs/index.rst b/llvm/docs/index.rst index 7315d7278f8b..7e0bc8c4552b 100644 --- a/llvm/docs/index.rst +++ b/llvm/docs/index.rst @@ -83,6 +83,10 @@ LLVM welcomes contributions of all kinds. To learn more, see the following artic * :ref:`meetups-social-events` * :ref:`community-proposals` + Reporting a security issue + +* :ref:`report-security-issue` + Indices and tables ================== From llvm-commits at lists.llvm.org Fri Jul 10 15:28:41 2020 From: llvm-commits at lists.llvm.org (JF Bastien via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:28:41 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <48ec381f544800873312523d869a16e2@localhost.localdomain> This revision was automatically updated to reflect the committed changes. jfb marked an inline comment as done. Closed by commit rG7bf73bcf6d93: [docs] LLVM Security Group and Process (authored by jfb). 
Changed prior to commit: https://reviews.llvm.org/D70326?vs=277107&id=277155#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 Files: llvm/docs/Contributing.rst llvm/docs/HowToSubmitABug.rst llvm/docs/Reference.rst llvm/docs/Security.rst llvm/docs/index.rst -------------- next part -------------- A non-text attachment was scrubbed... Name: D70326.277155.patch Type: text/x-patch Size: 16700 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 15:31:33 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:31:33 +0000 (UTC) Subject: [PATCH] D81429: [COFF] Port CallGraphSort to COFF from ELF In-Reply-To: References: Message-ID: <1d0db2bf7f3948e9e9725fb48f15edc3@localhost.localdomain> MaskRay added inline comments. ================ Comment at: lld/test/COFF/cgprofile-txt.s:21 +# CHECK: 140001000 T B +# CHECK: 140001001 T C +# CHECK: 140001002 T D ---------------- Add `-NEXT:` whenever applicable ================ Comment at: lld/test/COFF/cgprofile-txt.s:44 + nop + ---------------- Delete trailing empty lines. Please fix other files as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81429/new/ https://reviews.llvm.org/D81429 From llvm-commits at lists.llvm.org Fri Jul 10 15:34:09 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:34:09 +0000 (UTC) Subject: [PATCH] D81775: [COFF] Add cg_profile directive and .llvm.call-graph-profile section In-Reply-To: References: Message-ID: <1fe1c13b37db45d04603d1be9a1aed9d@localhost.localdomain> MaskRay accepted this revision. MaskRay added a comment. This revision is now accepted and ready to land. LGTM with one nit. 
================ Comment at: llvm/test/MC/COFF/cgprofile.s:120 +# CHECK-NEXT: ] \ No newline at end of file ---------------- No newline Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81775/new/ https://reviews.llvm.org/D81775 From llvm-commits at lists.llvm.org Fri Jul 10 15:36:59 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:36:59 +0000 (UTC) Subject: [PATCH] D83270: [OpenMP] Compute a proper module slice for the CGSCCC pass In-Reply-To: References: Message-ID: sstefan1 accepted this revision. sstefan1 added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83270/new/ https://reviews.llvm.org/D83270 From llvm-commits at lists.llvm.org Fri Jul 10 15:39:13 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:39:13 +0000 (UTC) Subject: [PATCH] D83583: [openmp] Remove OMPConstants.cpp and replace it by OMP.cpp generated by tablegen In-Reply-To: References: Message-ID: <025a79ea768a145e3513c421ebc96174@localhost.localdomain> sstefan1 accepted this revision. sstefan1 added a comment. This revision is now accepted and ready to land. Thanks for figuring this out! LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83583/new/ https://reviews.llvm.org/D83583 From llvm-commits at lists.llvm.org Fri Jul 10 15:53:46 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:53:46 +0000 (UTC) Subject: [PATCH] D83246: [Attributor] use liveness information from AAIsDead in AAReachability and cache query results In-Reply-To: References: Message-ID: <13dd09fd36e9bcc5b776ab47a37ce77c@localhost.localdomain> jdoerfert added a comment. 
In D83246#2143962 , @okura wrote: > On second thoughts, I think so too about the licenses check. > Could you tell me where you think we can use the liveness information in this comment ? Right. What I meant is we should use liveness when computing the entire reachability result, not only the endpoints. We can (reasonably) expect users to do the latter (for now). If we compute reachability ourselves it is a different story as we can use the liveness of edges (eventually) to improve the result. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:729 + return Result; + } + ---------------- okura wrote: > jdoerfert wrote: > > Please add documentation and consider taking the instructions as references. > > > > Nit: Move `F` after the first check to shorten the lifetime (and avoid confusion). > > > Do you expect that I change interfaces of AAReachability to take instructions as references too? > Could you tell me the advantage of taking as references compared to taking as pointers? I don't mean I want to stick to pointers, just want to know about it. Yes, feel free to change the other interface as well. There is no(t much) "functional benefit". In the Attributor we usually use references whenever we cannot pass `null` and pointers when we might (or have to use pointers). At the end of the day it is just a hint for the user that nullptr is not acceptable. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83246/new/ https://reviews.llvm.org/D83246 From llvm-commits at lists.llvm.org Fri Jul 10 15:58:03 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:58:03 +0000 (UTC) Subject: [PATCH] D83031: AMDGPU/GlobalISel: Select G_FREEZE In-Reply-To: References: Message-ID: <46dd3f2d588d7f6778e60e35e51f0f0a@localhost.localdomain> arsenm added inline comments. 
================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-freeze.mir:39-41 + %0:sgpr(s32) = COPY $sgpr0 + %1:sgpr(s1) = G_TRUNC %0 + %2:vcc(s1) = G_FREEZE %1 ---------------- A test with all preassigned banks isn't particularly interesting ================ Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-freeze.mir:81 + %1:_(s1) = G_TRUNC %0 + %2:vcc(s1) = G_FREEZE %1 + S_ENDPGM 0, implicit %2 ---------------- Shouldn't have a pre-set bank here CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83031/new/ https://reviews.llvm.org/D83031 From llvm-commits at lists.llvm.org Fri Jul 10 15:58:45 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 22:58:45 +0000 (UTC) Subject: [PATCH] D83289: [BPF] Emit unknown types as byte arrays In-Reply-To: References: Message-ID: <3749e2750ac278042911757c7c11c763@localhost.localdomain> yonghong-song added a comment. I will experiment with proper float type support soon. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83289/new/ https://reviews.llvm.org/D83289 From llvm-commits at lists.llvm.org Fri Jul 10 16:00:59 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:00:59 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: jdoerfert added inline comments. ================ Comment at: flang/lib/Semantics/check-omp-structure.cpp:67 - llvm::omp::Clause::OMPC_reduction}; - std::string OmpStructureChecker::ContextDirectiveAsFortran() { ---------------- I love these patches :) And we clean up Clang after! 
================ Comment at: flang/test/Semantics/omp-clause-validity01.f90:457 - !ERROR: REDUCTION clause is not allowed on the TASKLOOP SIMD directive !$omp taskloop simd reduction(+:a) ---------------- clementval wrote: > ichoyjx wrote: > > clementval wrote: > > > As a side note, This is supposed to be fine in Clang so I removed the check. I looked at the OpenMP 5.0 std and didn't see a restriction on `reduction` for `task loop simd`. > > What's the current plan? Are we trying to cover OpenMP 5.0 Spec for semantics (it appears so)? > Clang just moved to 5.0 as default and my guess is that we are targeting 5.0 as well since it is the current standard. If Flang has the equivalent to `-fopenmp-version=XX` or you just hardcode a version to be used, it should be possible to retain this error for the 4.5 case, right? ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:216 + VersionedClause, + ]; } ---------------- Yay! :) ================ Comment at: llvm/include/llvm/Frontend/OpenMP/OMP.td:229 VersionedClause, VersionedClause, VersionedClause ---------------- clementval wrote: > ichoyjx wrote: > > Bear with me, what does 50 mean? > The `VersionedClause` is defined as this: `VersionedClause` > > So here it means the clause is valid from version 5.0 and up. This is currently not used in Flang but in Clang it's used and I took the value from the old macros definition `OMPKinds.def`. So version 4.5 would be 45 and so on. FWIW, that is how we specify the version on the command line (for clang) as well. ================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:305 - const auto &Directives = Records.getAllDerivedDefinitions("Directive"); - const auto &Clauses = Records.getAllDerivedDefinitions("Clause"); + IfDefScope Scope("GEN_FLANG_DIRECTIVE_CLAUSE_SETS", OS); + ---------------- Any reason this is flang specific? I guess we want to use the information in Clang too (soonish). 
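The version encoding discussed above (50 for OpenMP 5.0, 45 for 4.5) can be sketched in plain C++. This is a minimal illustration, not the TableGen-generated code: `VersionedClauseInfo`, `kNoUpperBound`, `encodeVersion`, and `isAllowed` are hypothetical names, and the inclusive minimum/maximum bounds are an assumption about how such a check might look.

```cpp
#include <cassert>

// Hypothetical stand-in for one versioned clause entry.
struct VersionedClauseInfo {
  int minVersion; // first version in which the clause is valid, e.g. 50
  int maxVersion; // last version in which the clause is valid (inclusive)
};

// Assumed sentinel meaning "valid in all later versions".
constexpr int kNoUpperBound = 0x7FFFFFFF;

// Encode major.minor the way the thread describes: 4.5 -> 45, 5.0 -> 50.
constexpr int encodeVersion(int major, int minor) {
  return major * 10 + minor;
}

// A clause is accepted when the selected version lies within its bounds.
bool isAllowed(const VersionedClauseInfo &clause, int version) {
  assert(clause.minVersion <= clause.maxVersion && "bounds must be ordered");
  return version >= clause.minVersion && version <= clause.maxVersion;
}
```

With a check of this shape, a clause entry with a minimum of 50 is rejected for a 4.5 compile and accepted for 5.0, which is how an error such as the REDUCTION-on-TASKLOOP-SIMD diagnostic could be retained for the 4.5 case once a version can be selected.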
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 From llvm-commits at lists.llvm.org Fri Jul 10 16:03:51 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Fri, 10 Jul 2020 16:03:51 -0700 Subject: [PATCH] D80833: [CodeView] Add full repro to LF_BUILDINFO record In-Reply-To: <8c477d792c405aae7777933ed204393d@localhost.localdomain> References: <8c477d792c405aae7777933ed204393d@localhost.localdomain> Message-ID: I'm seeing tests fail with a crash. Can we revert the patch and attempted fixes and start working from there? Stacktrace for the curious :) @ 0x56420187cbbe llvm::MCStreamer::emitIntValue() @ 0x5641fec38899 llvm::MCStreamer::emitInt16() @ 0x5641ff73b337 llvm::CodeViewDebug::emitCompilerInformation() @ 0x5641ff73ac73 llvm::CodeViewDebug::endModule() @ 0x5641ff718e83 llvm::AsmPrinter::doFinalization() @ 0x5642016fd9ca llvm::FPPassManager::doFinalization() @ 0x5642016f954e (anonymous namespace)::MPPassManager::runOnModule() -eric On Tue, Jun 30, 2020 at 11:56 AM Alexandre Ganea via Phabricator via cfe-commits wrote: > aganea added a comment. > > In D80833#2109172 , @uweigand > wrote: > > > Hmm, with clang-cl it seems the driver is trying to use this: > > Target: s390x-pc-windows-msvc > > which of course doesn't exist. Not sure what is supposed to be > happening here, but it seems that it's falling back on s390x-linux since on > s390x, Linux is currently the only supported OS. > > > I'm seeing some of the tests are setting the target explicitly `%clang_cl > --target=x86_64-windows-msvc`. Would that work on your machine? Or should I > do `UNSUPPORTED: s390x` ? 
> > > Repository: > rG LLVM Github Monorepo > > CHANGES SINCE LAST ACTION > https://reviews.llvm.org/D80833/new/ > > https://reviews.llvm.org/D80833 > > > > _______________________________________________ > cfe-commits mailing list > cfe-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits > From llvm-commits at lists.llvm.org Fri Jul 10 16:04:47 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:04:47 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: clementval marked 7 inline comments as done. clementval added inline comments. ================ Comment at: flang/test/Semantics/omp-clause-validity01.f90:457 - !ERROR: REDUCTION clause is not allowed on the TASKLOOP SIMD directive !$omp taskloop simd reduction(+:a) ---------------- jdoerfert wrote: > clementval wrote: > > ichoyjx wrote: > > > clementval wrote: > > > > As a side note, This is supposed to be fine in Clang so I removed the check. I looked at the OpenMP 5.0 std and didn't see a restriction on `reduction` for `task loop simd`. > > > What's the current plan? Are we trying to cover OpenMP 5.0 Spec for semantics (it appears so)? > > Clang just moved to 5.0 as default and my guess is that we are targeting 5.0 as well since it is the current standard. > If Flang has the equivalent to `-fopenmp-version=XX` or you just hardcode a version to be used, it should be possible to retain this error for the 4.5 case, right? Flang does not support passing a version at the moment. So we have to assume it's 5.0. I guess there will be work on this later. 
================ Comment at: llvm/utils/TableGen/DirectiveEmitter.cpp:305 - const auto &Directives = Records.getAllDerivedDefinitions("Directive"); - const auto &Clauses = Records.getAllDerivedDefinitions("Clause"); + IfDefScope Scope("GEN_FLANG_DIRECTIVE_CLAUSE_SETS", OS); + ---------------- jdoerfert wrote: > Any reason this is flang specific? I guess we want to use the information in Clang too (soonish). No specific reason. I'm happy to remove the `FLANG_` here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 From llvm-commits at lists.llvm.org Fri Jul 10 16:07:11 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:07:11 +0000 (UTC) Subject: [PATCH] D83507: [AssumeBundles] Fix Bug in Assume Queries In-Reply-To: References: Message-ID: <483bd02f130995d3055617c62083064e@localhost.localdomain> lebedev.ri added a comment. In D83507#2142683 , @Tyker wrote: > In D83507#2142627 , @lebedev.ri wrote: > > > Test? > > > I would also like to add a test for it, but the smallest reproduction example is still very big (30k+ lines of IR) > and depends on what is present in the AssumptionCache, so I could only reproduce it under an -O3 run. > It isn't minimized at all, but minimizing it in a way that still exhibits the bug is quite hard. Can you at least post the reproducer+steps as-is? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83507/new/ https://reviews.llvm.org/D83507 From llvm-commits at lists.llvm.org Fri Jul 10 16:07:09 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Fri, 10 Jul 2020 16:07:09 -0700 Subject: [PATCH] D80833: [CodeView] Add full repro to LF_BUILDINFO record In-Reply-To: References: <8c477d792c405aae7777933ed204393d@localhost.localdomain> Message-ID: You'll probably want the assert as well: assert.h assertion failed at llvm-project/llvm/lib/MC/MCStreamer.cpp:134 in virtual void llvm::MCStreamer::emitIntValue(uint64_t, unsigned int): (isUIntN(8 * Size, Value) || isIntN(8 * Size, Value)) && "Invalid size" On Fri, Jul 10, 2020 at 4:03 PM Eric Christopher wrote: > I'm seeing tests fail with a crash. Can we revert the patch and attempted > fixes and start working from there? > > Stacktrace for the curious :) > > @ 0x56420187cbbe llvm::MCStreamer::emitIntValue() > @ 0x5641fec38899 llvm::MCStreamer::emitInt16() > @ 0x5641ff73b337 llvm::CodeViewDebug::emitCompilerInformation() > @ 0x5641ff73ac73 llvm::CodeViewDebug::endModule() > @ 0x5641ff718e83 llvm::AsmPrinter::doFinalization() > @ 0x5642016fd9ca llvm::FPPassManager::doFinalization() > @ 0x5642016f954e (anonymous > namespace)::MPPassManager::runOnModule() > > -eric > > On Tue, Jun 30, 2020 at 11:56 AM Alexandre Ganea via Phabricator via > cfe-commits wrote: > >> aganea added a comment. >> >> In D80833#2109172 , @uweigand >> wrote: >> >> > Hmm, with clang-cl it seems the driver is trying to use this: >> > Target: s390x-pc-windows-msvc >> > which of course doesn't exist. Not sure what is supposed to be >> happening here, but it seems that it's falling back on s390x-linux since on >> s390x, Linux is currently the only supported OS. >> >> >> I'm seeing some of the tests are setting the target explicitly `%clang_cl >> --target=x86_64-windows-msvc`. Would that work on your machine? 
Or should I >> do `UNSUPPORTED: s390x` ? >> >> >> Repository: >> rG LLVM Github Monorepo >> >> CHANGES SINCE LAST ACTION >> https://reviews.llvm.org/D80833/new/ >> >> https://reviews.llvm.org/D80833 >> >> >> >> _______________________________________________ >> cfe-commits mailing list >> cfe-commits at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:10:29 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:10:29 +0000 (UTC) Subject: [PATCH] D83595: [Draft][MSAN] Optimize away poisoning allocas that are always written before load Message-ID: guiand created this revision. guiand added reviewers: eugenis, vitalybuka. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. If we know that every path from an alloca leads to a store, we can optimize away poisoning its shadow (it'll be overwritten anyway). I'm wondering if there's a better approach for finding all these def-uses. I investigated the `DominatorTree`, but I'm not sure how I would use that to check *all* the uses without blowing up the runtime for instrumenting an alloca to O(N^2) (N=# uses) or so. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83595 Files: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp llvm/test/Instrumentation/MemorySanitizer/alloca.ll llvm/test/Instrumentation/MemorySanitizer/msan_x86_bts_asm.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83595.277159.patch Type: text/x-patch Size: 4308 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:11:19 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:11:19 +0000 (UTC) Subject: [PATCH] D83506: [NFC] Add debug and stat counters to assume queries and assume builder In-Reply-To: References: Message-ID: jdoerfert accepted this revision. jdoerfert added a comment. This revision is now accepted and ready to land. Nice. LGTM, some nits to be fixed below. ================ Comment at: llvm/lib/Analysis/AssumeBundleQueries.cpp:9 +#define DEBUG_TYPE "assume-queries" + ---------------- Style: Is usually below the includes. ================ Comment at: llvm/lib/Analysis/AssumeBundleQueries.cpp:173 Filter) { + NumAssumeQueries++; + if (!DebugCounter::shouldExecute(AssumeQueryCounter)) ---------------- Style: Use pre-increments everywhere. ================ Comment at: llvm/lib/Analysis/AssumeBundleQueries.cpp:185 + Filter(RK, II, &II->bundle_op_info_begin()[Elem.Index])) { + NumUsefullAssumeQueries++; return RK; ---------------- Same ================ Comment at: llvm/lib/Analysis/AssumeBundleQueries.cpp:199 + Filter(RK, cast(U.getUser()), Bundle)) { + NumUsefullAssumeQueries++; return RK; ---------------- Same ================ Comment at: llvm/lib/Transforms/Utils/AssumeBundleBuilder.cpp:9 +#define DEBUG_TYPE "assume-builder" + ---------------- Same. ================ Comment at: llvm/lib/Transforms/Utils/AssumeBundleBuilder.cpp:241 } + NumAssumeBuilt++; return cast(CallInst::Create( ---------------- Same. ================ Comment at: llvm/lib/Transforms/Utils/AssumeBundleBuilder.cpp:352 + else + NumAssumesRemoved++; Assume->eraseFromParent(); ---------------- Same. 
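A minimal sketch of the style nit above, assuming a plain `std::atomic` counter as a stand-in for the statistic counters in the patch. The names `NumQueries`, `recordQuery` and `queryCount` are hypothetical and do not come from the review. Pre-increment bumps the counter without asking for the previous value, which is all a statistics counter needs:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a statistics counter; this is NOT LLVM's
// Statistic class, only an illustration of the increment style.
static std::atomic<std::uint64_t> NumQueries{0};

// Preferred style: pre-increment, because the old value is never used.
inline void recordQuery() { ++NumQueries; }

// Returns the current count so callers (or tests) can inspect it.
inline std::uint64_t queryCount() { return NumQueries.load(); }
```

For a simple counter the two forms behave the same; the pre-increment form is just the spelling the reviewer asks for consistently.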
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83506/new/ https://reviews.llvm.org/D83506 From llvm-commits at lists.llvm.org Fri Jul 10 16:11:59 2020 From: llvm-commits at lists.llvm.org (Alexandre Ganea via llvm-commits) Date: Fri, 10 Jul 2020 23:11:59 +0000 Subject: [PATCH] D80833: [CodeView] Add full repro to LF_BUILDINFO record In-Reply-To: References: <8c477d792c405aae7777933ed204393d@localhost.localdomain> Message-ID: Thanks for letting me know Eric. What test fails exactly? What config? De : Eric Christopher Envoyé : July 10, 2020 7:04 PM À : reviews+D80833+public+da87cf0eabdca5a2 at reviews.llvm.org; Alexandre Ganea via Phabricator Cc : Alexandre Ganea ; Hans Wennborg ; Adrian McCarthy ; Martin Storsjo ; Amy Huang ; dmajor at mozilla.com; john.reagan at vmssoftware.com; zturner at roblox.com; 88888yl at gmail.com; llvm-commits ; stefan.reinalter at molecular-matters.com; Ulrich Weigand ; mlekena at skidmore.edu; Clang Commits ; Han Shen Objet : Re: [PATCH] D80833: [CodeView] Add full repro to LF_BUILDINFO record I'm seeing tests fail with a crash. Can we revert the patch and attempted fixes and start working from there? Stacktrace for the curious :) @ 0x56420187cbbe llvm::MCStreamer::emitIntValue() @ 0x5641fec38899 llvm::MCStreamer::emitInt16() @ 0x5641ff73b337 llvm::CodeViewDebug::emitCompilerInformation() @ 0x5641ff73ac73 llvm::CodeViewDebug::endModule() @ 0x5641ff718e83 llvm::AsmPrinter::doFinalization() @ 0x5642016fd9ca llvm::FPPassManager::doFinalization() @ 0x5642016f954e (anonymous namespace)::MPPassManager::runOnModule() -eric On Tue, Jun 30, 2020 at 11:56 AM Alexandre Ganea via Phabricator via cfe-commits > wrote: aganea added a comment. In D80833#2109172 , @uweigand wrote: > Hmm, with clang-cl it seems the driver is trying to use this: > Target: s390x-pc-windows-msvc > which of course doesn't exist. 
Not sure what is supposed to be happening here, but it seems that it's falling back on s390x-linux since on s390x, Linux is currently the only supported OS. I'm seeing some of the tests are setting the target explicitly `%clang_cl --target=x86_64-windows-msvc`. Would that work on your machine? Or should I do `UNSUPPORTED: s390x` ? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80833/new/ https://reviews.llvm.org/D80833 _______________________________________________ cfe-commits mailing list cfe-commits at lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:12:45 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:12:45 +0000 (UTC) Subject: [PATCH] D83595: [Draft][MSAN] Optimize away poisoning allocas that are always written before load In-Reply-To: References: Message-ID: <7ad52250c5258b2dc3724aa72a65877c@localhost.localdomain> lebedev.ri added subscribers: fhahn, lebedev.ri. lebedev.ri added a comment. Perhaps MemSSA-based DCE can be taught about it? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83595/new/ https://reviews.llvm.org/D83595 From llvm-commits at lists.llvm.org Fri Jul 10 16:14:54 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:14:54 +0000 (UTC) Subject: [PATCH] D83595: [Draft][MSAN] Optimize away poisoning allocas that are always written before load In-Reply-To: References: Message-ID: <6da5c1656cd461ef8d591e25f6800794@localhost.localdomain> guiand marked an inline comment as done. guiand added inline comments. ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3633 + // overwritten anyway. 
+ bool firstUsesAreConstStore(SmallPtrSet &TraversedSet, + const SmallPtrSet &Users, ---------------- TODO: get rid of `Const` in this function name Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83595/new/ https://reviews.llvm.org/D83595 From llvm-commits at lists.llvm.org Fri Jul 10 16:15:33 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:15:33 +0000 (UTC) Subject: [PATCH] D83507: [AssumeBundles] Fix Bug in Assume Queries In-Reply-To: References: Message-ID: lebedev.ri requested changes to this revision. lebedev.ri added a comment. This revision now requires changes to proceed. Also, this either needs a rebase, or this isn't against master but the stack doesn't say so. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83507/new/ https://reviews.llvm.org/D83507 From llvm-commits at lists.llvm.org Fri Jul 10 16:17:30 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Fri, 10 Jul 2020 16:17:30 -0700 Subject: [PATCH] D80833: [CodeView] Add full repro to LF_BUILDINFO record In-Reply-To: References: <8c477d792c405aae7777933ed204393d@localhost.localdomain> Message-ID: Release+Asserts on x86_64-linux and the debug-info-codeview-buildinfo.c test. Sorry if the latter wasn't clear :) -eric On Fri, Jul 10, 2020 at 4:12 PM Alexandre Ganea wrote: > Thanks for letting me know Eric. What test fails exactly? What config? 
> > > > *De :* Eric Christopher > *Envoyé :* July 10, 2020 7:04 PM > *À :* reviews+D80833+public+da87cf0eabdca5a2 at reviews.llvm.org; Alexandre > Ganea via Phabricator > *Cc :* Alexandre Ganea ; Hans Wennborg < > hans at chromium.org>; Adrian McCarthy ; Martin Storsjo > ; Amy Huang ; dmajor at mozilla.com; > john.reagan at vmssoftware.com; zturner at roblox.com; 88888yl at gmail.com; > llvm-commits ; > stefan.reinalter at molecular-matters.com; Ulrich Weigand < > ulrich.weigand at de.ibm.com>; mlekena at skidmore.edu; Clang Commits < > cfe-commits at lists.llvm.org>; Han Shen > *Objet :* Re: [PATCH] D80833: [CodeView] Add full repro to LF_BUILDINFO > record > > > > I'm seeing tests fail with a crash. Can we revert the patch and attempted > fixes and start working from there? > > > > Stacktrace for the curious :) > > > > @ 0x56420187cbbe llvm::MCStreamer::emitIntValue() > @ 0x5641fec38899 llvm::MCStreamer::emitInt16() > @ 0x5641ff73b337 llvm::CodeViewDebug::emitCompilerInformation() > @ 0x5641ff73ac73 llvm::CodeViewDebug::endModule() > @ 0x5641ff718e83 llvm::AsmPrinter::doFinalization() > @ 0x5642016fd9ca llvm::FPPassManager::doFinalization() > @ 0x5642016f954e (anonymous > namespace)::MPPassManager::runOnModule() > > > > -eric > > > > On Tue, Jun 30, 2020 at 11:56 AM Alexandre Ganea via Phabricator via > cfe-commits wrote: > > aganea added a comment. > > In D80833#2109172 , @uweigand > wrote: > > > Hmm, with clang-cl it seems the driver is trying to use this: > > Target: s390x-pc-windows-msvc > > which of course doesn't exist. Not sure what is supposed to be > happening here, but it seems that it's falling back on s390x-linux since on > s390x, Linux is currently the only supported OS. > > > I'm seeing some of the tests are setting the target explicitly `%clang_cl > --target=x86_64-windows-msvc`. Would that work on your machine? Or should I > do `UNSUPPORTED: s390x` ? 
> > > Repository: > rG LLVM Github Monorepo > > CHANGES SINCE LAST ACTION > https://reviews.llvm.org/D80833/new/ > > https://reviews.llvm.org/D80833 > > > > _______________________________________________ > cfe-commits mailing list > cfe-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:18:15 2020 From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:18:15 +0000 (UTC) Subject: [PATCH] D83595: [Draft][MSAN] Optimize away poisoning allocas that are always written before load In-Reply-To: References: Message-ID: <5c8c2d64d6cee2d1ace6289b2c510e54@localhost.localdomain> vitalybuka added inline comments. ================ Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3656 + // pointer somewhere. But we don't want that. + return Store->getPointerOperand() == &Alloca; + } else { ---------------- Store may write only part of alloca. How useful (binary size) the patch as is? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83595/new/ https://reviews.llvm.org/D83595 From llvm-commits at lists.llvm.org Fri Jul 10 16:18:23 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:18:23 +0000 (UTC) Subject: [PATCH] D83568: [SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. In-Reply-To: References: Message-ID: <44eb7b1be4605c1caef06742ffc323e4@localhost.localdomain> paulwalker-arm updated this revision to Diff 277161. paulwalker-arm added a comment. Updated tests to ensure the correct number of fcvt instructions are emitted. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83568/new/ https://reviews.llvm.org/D83568 Files: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/test/CodeGen/AArch64/sve-fixed-length-fp-converts.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83568.277161.patch Type: text/x-patch Size: 8430 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:18:48 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:18:48 +0000 (UTC) Subject: [PATCH] D83417: GlobalISel: Restructure argument lowering loop in handleAssignments In-Reply-To: References: Message-ID: <748d520e9389a3d1622ee5f25b74bdbb@localhost.localdomain> aemerson added a comment. I don't think we should be changing the extending behavior for memlocs. Just for the varargs case on Darwin, there's lots of code out there which incorrectly tries to interpret a sub-64-bit incoming varargs parameter as a 64-bit value. Although it should be technically correct to emit a smaller store, what happens in practice is that this code breaks for very hard-to-detect reasons (i.e. you no longer get a free zeroing of the upper bits of the stack slot). On arm64 we could force this to always explicitly zero-extend to 64 bits, but that incurs a penalty at the call site. It's unfortunate, but copying the DAG behavior here is likely to cause less pain. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83417/new/ https://reviews.llvm.org/D83417 From llvm-commits at lists.llvm.org Fri Jul 10 16:22:29 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:22:29 +0000 (UTC) Subject: [PATCH] D83549: [ELF] Do not force bringing out symbols passed by -init and -fini. In-Reply-To: References: Message-ID: MaskRay added a comment. OK, I think D69885 (lto/init-fini.ll) still makes sense.
A bitcode symbol can behave like a regular symbol or an archive symbol. For the `-init` case, a bitcode definition should behave more like a regular definition and `-init` can cause the definition to be emitted. This patch makes LLD similar to GNU ld: there is no symbol table entry for `-init`: cat > a.s < References: Message-ID: MaskRay added inline comments. ================ Comment at: lld/test/ELF/archive-init-fini.s:7 + +# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t +# RUN: rm -f %t.a ---------------- `%t.o` ================ Comment at: lld/test/ELF/archive-init-fini.s:10 +# RUN: llvm-ar rcs %t.a %t +# RUN: ld.lld -shared -m elf_x86_64 %t.a -o %t.out +# RUN: llvm-nm %t.out | \ ---------------- `%t.out` -> `%t` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83549/new/ https://reviews.llvm.org/D83549 From llvm-commits at lists.llvm.org Fri Jul 10 16:23:46 2020 From: llvm-commits at lists.llvm.org (David Blaikie via llvm-commits) Date: Fri, 10 Jul 2020 16:23:46 -0700 (PDT) Subject: [llvm] 854e8f8 - Remove unnecessary/erroneous "static" from function templates in headers Message-ID: <5f08f882.1c69fb81.98f5a.1ec3@mx.google.com> Author: David Blaikie Date: 2020-07-10T16:23:33-07:00 New Revision: 854e8f88e96bd6a75844d2af061cacc61fb0defe URL: https://github.com/llvm/llvm-project/commit/854e8f88e96bd6a75844d2af061cacc61fb0defe DIFF: https://github.com/llvm/llvm-project/commit/854e8f88e96bd6a75844d2af061cacc61fb0defe.diff LOG: Remove unnecessary/erroneous "static" from function templates in headers This risks ODR violations in inline functions that call these functions (if they remain static) & otherwise just causes some object size increase, potentially, by these functions not being deduplicated by the linker. 
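A minimal sketch of the situation the log message describes, assuming a hypothetical header with a namespace `detail_sketch` and an inline caller `second_or_zero` (neither is from the commit). Without `static`, the function template has external linkage, so an inline function that calls it has the same definition in every translation unit:

```cpp
#include <iterator>
#include <vector>

namespace detail_sketch {
// No `static`: identical instantiations in different translation units
// refer to the same entity and can be merged by the linker.
template <typename Iter>
Iter next_or_end(const Iter &I, const Iter &End) {
  if (I == End)
    return End;
  return std::next(I);
}
} // namespace detail_sketch

// Hypothetical inline function that would live in the same header. If the
// template above were `static`, each translation unit would call its own
// internal-linkage copy, which is the ODR hazard the log message mentions.
inline int second_or_zero(const std::vector<int> &V) {
  auto It = detail_sketch::next_or_end(V.begin(), V.end());
  return It == V.end() ? 0 : *It;
}
```

The sketch only mirrors the shape of `next_or_end`; the actual change is the diff that follows.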
Added: Modified: llvm/include/llvm/ADT/STLExtras.h Removed: ################################################################################ diff --git a/llvm/include/llvm/ADT/STLExtras.h b/llvm/include/llvm/ADT/STLExtras.h index b2e709f7272f..50b688b36648 100644 --- a/llvm/include/llvm/ADT/STLExtras.h +++ b/llvm/include/llvm/ADT/STLExtras.h @@ -745,14 +745,14 @@ detail::zippy zip_first(T &&t, U &&u, namespace detail { template -static Iter next_or_end(const Iter &I, const Iter &End) { +Iter next_or_end(const Iter &I, const Iter &End) { if (I == End) return End; return std::next(I); } template -static auto deref_or_none(const Iter &I, const Iter &End) -> llvm::Optional< +auto deref_or_none(const Iter &I, const Iter &End) -> llvm::Optional< std::remove_const_t>> { if (I == End) return None; From llvm-commits at lists.llvm.org Fri Jul 10 16:23:55 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:23:55 +0000 (UTC) Subject: [PATCH] D81788: [OpenMPOpt] ICV Tracking In-Reply-To: References: Message-ID: sstefan1 updated this revision to Diff 277162. sstefan1 added a comment. fixing the issue with dead uses. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81788/new/ https://reviews.llvm.org/D81788 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/icv_tracking.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81788.277162.patch Type: text/x-patch Size: 18577 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:25:59 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:25:59 +0000 (UTC) Subject: [PATCH] D83596: [BPI] Compile time improvement when erasing blocks (NFC) Message-ID: tejohnson created this revision. tejohnson added reviewers: davidxl, xur. 
Herald added a subscriber: hiraditya. Herald added a project: LLVM. eraseBlock is trying to erase all probability info for the given BB. This info is stored in a DenseMap organized like so: using Edge = std::pair<const BasicBlock *, unsigned>; DenseMap<Edge, BranchProbability> Probs; where the unsigned in the Edge key is the successor id. It was walking through every single map entry, checking if the BB in the key's pair matched the given BB. Much more efficient is to do what another method (getEdgeProbability) was already doing, which is to walk the successors of the BB, and simply do a map lookup on the key formed from each pair. Doing this dropped the overall compile time for a file containing a very large function by around 32%. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83596 Files: llvm/lib/Analysis/BranchProbabilityInfo.cpp Index: llvm/lib/Analysis/BranchProbabilityInfo.cpp =================================================================== --- llvm/lib/Analysis/BranchProbabilityInfo.cpp +++ llvm/lib/Analysis/BranchProbabilityInfo.cpp @@ -1056,10 +1056,10 @@ } void BranchProbabilityInfo::eraseBlock(const BasicBlock *BB) { - for (auto I = Probs.begin(), E = Probs.end(); I != E; ++I) { - auto Key = I->first; - if (Key.first == BB) - Probs.erase(Key); + for (const_succ_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) { + auto MapI = Probs.find(std::make_pair(BB, I.getSuccessorIndex())); + if (MapI != Probs.end()) + Probs.erase(MapI); } } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83596.277164.patch Type: text/x-patch Size: 668 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:31:55 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:31:55 +0000 (UTC) Subject: [PATCH] D81788: [OpenMPOpt] ICV Tracking In-Reply-To: References: Message-ID: jdoerfert accepted this revision. jdoerfert added a comment. Add the reproducer as test please.
With that one and the change below, LGTM. ================ Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:342 + } + } + ---------------- `-Size` `+RFIs.size()` just add the function to the EnumeratedArray please. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81788/new/ https://reviews.llvm.org/D81788 From llvm-commits at lists.llvm.org Fri Jul 10 16:35:22 2020 From: llvm-commits at lists.llvm.org (Shoaib Meenai via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:35:22 +0000 (UTC) Subject: [PATCH] D83579: [llvm-ar][test][AIX] Unsupport error-opening-directory.test on AIX In-Reply-To: References: Message-ID: <553f07bda4c0e71a55a7f0a2b33d4f6d@localhost.localdomain> smeenai accepted this revision. smeenai added a comment. Thanks and sorry for the breakage! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83579/new/ https://reviews.llvm.org/D83579 From llvm-commits at lists.llvm.org Fri Jul 10 16:35:26 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:35:26 +0000 (UTC) Subject: [PATCH] D83595: [Draft][MSAN] Optimize away poisoning allocas that are always written before load In-Reply-To: References: Message-ID: <5c52ff65075cd745433ac6c3931d9c1f@localhost.localdomain> eugenis added a comment. Allocas are often used through bitcast or GEP, we should handle them as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83595/new/ https://reviews.llvm.org/D83595 From llvm-commits at lists.llvm.org Fri Jul 10 16:36:33 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:36:33 +0000 (UTC) Subject: [PATCH] D83559: [test/Object][llvm-objdump] - llvm-objdump: don't abort() when the e_phoff field is invalid and refine testing. 
In-Reply-To: References: Message-ID: <52adf297c40d1cad2a7b9e76575e5e57@localhost.localdomain> MaskRay accepted this revision. MaskRay added a comment. This revision is now accepted and ready to land. Thanks! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83559/new/ https://reviews.llvm.org/D83559 From llvm-commits at lists.llvm.org Fri Jul 10 16:39:07 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:39:07 +0000 (UTC) Subject: [PATCH] D83597: [COFF] Add cg_profile directive and .llvm.call-graph-profile section Message-ID: zequanwu created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83597 Files: llvm/include/llvm/MC/MCParser/MCAsmParserExtension.h llvm/include/llvm/MC/MCWinCOFFStreamer.h llvm/lib/MC/MCParser/COFFAsmParser.cpp llvm/lib/MC/MCParser/ELFAsmParser.cpp llvm/lib/MC/MCParser/MCAsmParserExtension.cpp llvm/lib/MC/MCWinCOFFStreamer.cpp llvm/lib/MC/WinCOFFObjectWriter.cpp llvm/test/MC/AsmParser/directive_cgprofile.s llvm/test/MC/COFF/cgprofile.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D83597.277166.patch Type: text/x-patch Size: 13211 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:40:32 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:40:32 +0000 (UTC) Subject: [PATCH] D81775: [COFF] Add cg_profile directive and .llvm.call-graph-profile section In-Reply-To: References: Message-ID: zequanwu updated this revision to Diff 277167. zequanwu added a comment. rebase and add newline at end of file. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81775/new/ https://reviews.llvm.org/D81775 Files: llvm/include/llvm/MC/MCParser/MCAsmParserExtension.h llvm/include/llvm/MC/MCWinCOFFStreamer.h llvm/lib/MC/MCParser/COFFAsmParser.cpp llvm/lib/MC/MCParser/ELFAsmParser.cpp llvm/lib/MC/MCParser/MCAsmParserExtension.cpp llvm/lib/MC/MCWinCOFFStreamer.cpp llvm/lib/MC/WinCOFFObjectWriter.cpp llvm/test/MC/AsmParser/directive_cgprofile.s llvm/test/MC/COFF/cgprofile.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D81775.277167.patch Type: text/x-patch Size: 13211 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:46:27 2020 From: llvm-commits at lists.llvm.org (Alexandre Ganea via llvm-commits) Date: Fri, 10 Jul 2020 16:46:27 -0700 (PDT) Subject: [llvm] b71499a - Revert "Re-land [CodeView] Add full repro to LF_BUILDINFO record" Message-ID: <5f08fdd3.1c69fb81.9040d.00d7@mx.google.com> Author: Alexandre Ganea Date: 2020-07-10T19:46:16-04:00 New Revision: b71499ac9eebbddd3a45ac15f161f89eb3378918 URL: https://github.com/llvm/llvm-project/commit/b71499ac9eebbddd3a45ac15f161f89eb3378918 DIFF: https://github.com/llvm/llvm-project/commit/b71499ac9eebbddd3a45ac15f161f89eb3378918.diff LOG: Revert "Re-land [CodeView] Add full repro to LF_BUILDINFO record" This reverts commit add59ecb34e3003311b7e2318b16a0ef10c76d79 and 41d2813a5faea1c18b7d329109e0287c5cd9ffea. 
Added: Modified: clang/cmake/caches/BaremetalARM.cmake clang/cmake/caches/CrossWinToARMLinux.cmake clang/cmake/caches/Fuchsia-stage2.cmake clang/test/CMakeLists.txt lld/COFF/PDB.cpp lld/test/COFF/Inputs/pdb_lines_1_relative.yaml lld/test/COFF/Inputs/pdb_lines_2_relative.yaml lld/test/COFF/pdb-relative-source-lines.test llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp llvm/test/DebugInfo/COFF/build-info.ll llvm/test/DebugInfo/COFF/global-type-hashes.ll llvm/test/DebugInfo/COFF/types-basic.ll llvm/test/DebugInfo/COFF/types-data-members.ll Removed: clang/test/CodeGen/debug-info-codeview-buildinfo.c lld/test/COFF/pdb-relative-source-lines2.test ################################################################################ diff --git a/clang/cmake/caches/BaremetalARM.cmake b/clang/cmake/caches/BaremetalARM.cmake index e44355cfcbd7..85295d9db392 100644 --- a/clang/cmake/caches/BaremetalARM.cmake +++ b/clang/cmake/caches/BaremetalARM.cmake @@ -31,7 +31,6 @@ set(LLVM_TOOLCHAIN_TOOLS llvm-dwarfdump llvm-nm llvm-objdump - llvm-pdbutil llvm-ranlib llvm-readobj llvm-size diff --git a/clang/cmake/caches/CrossWinToARMLinux.cmake b/clang/cmake/caches/CrossWinToARMLinux.cmake index ccfccce3cb89..9aa0efa8049f 100644 --- a/clang/cmake/caches/CrossWinToARMLinux.cmake +++ b/clang/cmake/caches/CrossWinToARMLinux.cmake @@ -137,7 +137,6 @@ set(LLVM_TOOLCHAIN_TOOLS llvm-lib llvm-nm llvm-objdump - llvm-pdbutil llvm-profdata llvm-ranlib llvm-readobj diff --git a/clang/cmake/caches/Fuchsia-stage2.cmake b/clang/cmake/caches/Fuchsia-stage2.cmake index 8b5e9d0c4181..259684ff2b0d 100644 --- a/clang/cmake/caches/Fuchsia-stage2.cmake +++ b/clang/cmake/caches/Fuchsia-stage2.cmake @@ -240,7 +240,6 @@ set(LLVM_TOOLCHAIN_TOOLS llvm-nm llvm-objcopy llvm-objdump - llvm-pdbutil llvm-profdata llvm-ranlib llvm-readelf diff --git a/clang/test/CMakeLists.txt b/clang/test/CMakeLists.txt index b2777fded0ae..38bbc5be90d5 100644 --- a/clang/test/CMakeLists.txt +++ b/clang/test/CMakeLists.txt @@ -126,7 +126,6 @@ 
if( NOT CLANG_BUILT_STANDALONE ) llvm-nm llvm-objcopy llvm-objdump - llvm-pdbutil llvm-profdata llvm-readelf llvm-readobj diff --git a/clang/test/CodeGen/debug-info-codeview-buildinfo.c b/clang/test/CodeGen/debug-info-codeview-buildinfo.c deleted file mode 100644 index e1082d2532b2..000000000000 --- a/clang/test/CodeGen/debug-info-codeview-buildinfo.c +++ /dev/null @@ -1,25 +0,0 @@ -// RUN: %clang_cl --target=i686-windows-msvc /c /Z7 /Fo%t.obj -- %s -// RUN: llvm-pdbutil dump --types %t.obj | FileCheck %s -// RUN: %clang_cl --target=i686-windows-msvc /c /Z7 /Fo%t.obj -fdebug-compilation-dir=. -- %s -// RUN: llvm-pdbutil dump --types %t.obj | FileCheck %s --check-prefix RELATIVE - -int main() { return 42; } - -// CHECK: Types (.debug$T) -// CHECK: ============================================================ -// CHECK: 0x[[PWD:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: [[PWDVAL:.+]] -// CHECK: 0x[[FILEPATH:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: [[FILEPATHVAL:.+[\\/]debug-info-codeview-buildinfo.c]] -// CHECK: 0x[[ZIPDB:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: -// CHECK: 0x[[TOOL:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: [[TOOLVAL:.+[\\/]clang.*]] -// CHECK: 0x[[CMDLINE:.+]] | LF_STRING_ID [size = {{.+}}] ID: , String: "-cc1 -// CHECK: 0x{{.+}} | LF_BUILDINFO [size = {{.+}}] -// CHECK: 0x[[PWD]]: `[[PWDVAL]]` -// CHECK: 0x[[TOOL]]: `[[TOOLVAL]]` -// CHECK: 0x[[FILEPATH]]: `[[FILEPATHVAL]]` -// CHECK: 0x[[ZIPDB]]: `` -// CHECK: 0x[[CMDLINE]]: `"-cc1 - -// RELATIVE: Types (.debug$T) -// RELATIVE: ============================================================ -// RELATIVE: 0x{{.+}} | LF_BUILDINFO [size = {{.+}}] -// RELATIVE: 0x{{.+}}: `.` diff --git a/lld/COFF/PDB.cpp b/lld/COFF/PDB.cpp index 5738eae7d6c4..49d04add5be0 100644 --- a/lld/COFF/PDB.cpp +++ b/lld/COFF/PDB.cpp @@ -250,72 +250,6 @@ static void addTypeInfo(pdb::TpiStreamBuilder &tpiBuilder, }); } -// LF_BUILDINFO records might contain relative paths, and we want to make them -// 
absolute. We do this remapping only after the type records were merged, -// because the full types graph isn't known during merging. In addition, we plan -// to multi-thread the type merging, and the change below needs to be done -// atomically, single-threaded. - -// A complication could arise when a LF_STRING_ID record already exists with the -// same content as the new absolutized path. In that case, we simply redirect -// LF_BUILDINFO's CurrentDirectory index to reference the existing LF_STRING_ID -// record. - -static void remapBuildInfo(TypeCollection &idTable) { - SimpleTypeSerializer s; - idTable.ForEachRecord([&](TypeIndex ti, const CVType &type) { - if (type.kind() != LF_BUILDINFO) - return; - BuildInfoRecord bi; - cantFail(TypeDeserializer::deserializeAs(const_cast(type), bi)); - - auto makeAbsoluteRecord = - [&](BuildInfoRecord::BuildInfoArg recordType) -> Optional { - TypeIndex recordTi = bi.getArgs()[recordType]; - if (recordTi.isNoneType()) - return None; - CVType recordRef = idTable.getType(recordTi); - - StringIdRecord record; - cantFail(TypeDeserializer::deserializeAs(recordRef, record)); - - SmallString<128> abolutizedPath(record.getString()); - pdbMakeAbsolute(abolutizedPath); - - if (abolutizedPath == record.getString()) - return None; // The path is already absolute. - - record.String = abolutizedPath; - ArrayRef recordData = s.serialize(record); - - // Replace the previous LF_STRING_ID record - if (!idTable.replaceType(recordTi, CVType(recordData), - /*Stabilize=*/true)) - return recordTi; - return None; - }; - - Optional curDirTI = - makeAbsoluteRecord(BuildInfoRecord::CurrentDirectory); - Optional buildToolTI = - makeAbsoluteRecord(BuildInfoRecord::BuildTool); - - if (curDirTI || buildToolTI) { - // This new record is already there. We don't want duplicates, so - // re-serialize the BuildInfoRecord instead. 
- if (curDirTI) - bi.ArgIndices[BuildInfoRecord::CurrentDirectory] = *curDirTI; - if (buildToolTI) - bi.ArgIndices[BuildInfoRecord::BuildTool] = *buildToolTI; - - ArrayRef biData = s.serialize(bi); - bool r = idTable.replaceType(ti, CVType(biData), /*Stabilize=*/true); - assert(r && "Didn't expect two build records pointing to the same OBJ!"); - (void)r; - } - }); -} - static bool remapTypeIndex(TypeIndex &ti, ArrayRef typeIndexMap) { if (ti.isSimple()) return true; @@ -1054,9 +988,6 @@ void PDBLinker::addObjectsToPDB() { builder.getStringTableBuilder().setStrings(pdbStrTab); t1.stop(); - // Remap the contents of the LF_BUILDINFO record. - remapBuildInfo(tMerger.getIDTable()); - // Construct TPI and IPI stream contents. ScopedTimer t2(tpiStreamLayoutTimer); addTypeInfo(builder.getTpiBuilder(), tMerger.getTypeTable()); diff --git a/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml b/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml index 9a6b192e1d0d..947de419d6b8 100644 --- a/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml +++ b/lld/test/COFF/Inputs/pdb_lines_1_relative.yaml @@ -19,7 +19,6 @@ sections: Characteristics: [ IMAGE_SCN_CNT_UNINITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ] Alignment: 4 SectionData: '' - SizeOfRawData: 0 - Name: .xdata Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ ] Alignment: 4 @@ -39,6 +38,7 @@ sections: - Name: '.debug$S' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 + SectionData: 
04000000F10000002F0000002D003C1100000000D0000700000000000000581B000000000000636C616E672076657273696F6E20372E302E30200000F1000000300000002A0047110000000000000000000000001B000000000000000000000002100000000000000000006D61696E0002004F11F20000003000000000000000000000001B00000000000000030000002400000000000000020000000C000000030000001100000004000000F400000030000000010000001001EA6429BCE282CCF3F0E3CD93B216EB410000110000001001061EB73ABB642532857A4F1D9CBAC3230000F30000001C000000002E5C7064625F6C696E65735F312E63002E5C666F6F2E6800000000 Subsections: - !Symbols Records: @@ -46,15 +46,15 @@ sections: Compile3Sym: Flags: [ ] Machine: X64 - FrontendMajor: 11 + FrontendMajor: 7 FrontendMinor: 0 FrontendBuild: 0 FrontendQFE: 0 - BackendMajor: 11000 + BackendMajor: 7000 BackendMinor: 0 BackendBuild: 0 BackendQFE: 0 - Version: 'clang version 11.0.0 (https://github.com/llvm/llvm-project.git 77dad72eae974338ddc13d74783c012ccbb8c5ac)' + Version: 'clang version 7.0.0 ' - !Symbols Records: - Kind: S_GPROC32_ID @@ -65,17 +65,8 @@ sections: FunctionType: 4098 Flags: [ ] DisplayName: main - - Kind: S_FRAMEPROC - FrameProcSym: - TotalFrameBytes: 40 - PaddingFrameBytes: 0 - OffsetToPadding: 0 - BytesOfCalleeSavedRegisters: 0 - OffsetOfExceptionHandler: 0 - SectionIdOfExceptionHandler: 0 - Flags: [ ] - Kind: S_PROC_ID_END - ScopeEndSym: {} + ScopeEndSym: - !Lines CodeSize: 27 Flags: [ ] @@ -96,15 +87,15 @@ sections: LineStart: 4 IsStatement: false EndDelta: 0 - Columns: [] + Columns: - !FileChecksums Checksums: - FileName: '.\pdb_lines_1.c' Kind: MD5 - Checksum: 9A64DD4298487888B1D99F825D520C5E + Checksum: EA6429BCE282CCF3F0E3CD93B216EB41 - FileName: '.\foo.h' Kind: MD5 - Checksum: A9D05E6DC184DE20A57797E24F8B0E97 + Checksum: 061EB73ABB642532857A4F1D9CBAC323 - !StringTable Strings: - '.\pdb_lines_1.c' @@ -112,27 +103,23 @@ sections: - '' - '' - '' - - !Symbols - Records: - - Kind: S_BUILDINFO - BuildInfoSym: - BuildId: 4105 Relocations: - - VirtualAddress: 184 + - VirtualAddress: 100 SymbolName: 
main Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 188 + - VirtualAddress: 104 SymbolName: main Type: IMAGE_REL_AMD64_SECTION - - VirtualAddress: 240 + - VirtualAddress: 124 SymbolName: main Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 244 + - VirtualAddress: 128 SymbolName: main Type: IMAGE_REL_AMD64_SECTION - Name: '.debug$T' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 + SectionData: 0400000006000112000000000E0008107400000000000000001000001200011600000000011000006D61696E00F3F2F10E0008100300000000000000001000000E0001160000000003100000666F6F00 Types: - Kind: LF_ARGLIST ArgList: @@ -161,25 +148,6 @@ sections: ParentScope: 0 FunctionType: 4099 Name: foo - - Kind: LF_STRING_ID - StringId: - Id: 0 - String: . - - Kind: LF_STRING_ID - StringId: - Id: 0 - String: pdb_lines_1.c - - Kind: LF_STRING_ID - StringId: - Id: 0 - String: 'buildninjaRel\bin\clang-cl.exe' - - Kind: LF_STRING_ID - StringId: - Id: 0 - String: '"-cc1" "-triple" "x86_64-pc-windows-msvc19.26.28806" "-emit-obj" "-mrelax-all" "-mincremental-linker-compatible" "-disable-free" "-main-file-name" "pdb_lines_1.c" "-mrelocation-model" "pic" "-pic-level" "2" "-mthread-model" "posix" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gcodeview" "-debug-info-kind=limited" "-resource-dir" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0" "-internal-isystem" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\ATLMFC\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual 
Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\include" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\ucrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\shared" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\winrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\cppwinrt" "-fdebug-compilation-dir" "." "-ferror-limit" "19" "-fmessage-length=146" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.26.28806" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-faddrsig" "-x" "c"' - - Kind: LF_BUILDINFO - BuildInfo: - ArgIndices: [ 4101, 4103, 4102, 0, 4104 ] - Name: .pdata Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ ] Alignment: 4 @@ -192,12 +160,8 @@ sections: SymbolName: main Type: IMAGE_REL_AMD64_ADDR32NB - VirtualAddress: 8 - SymbolTableIndex: 6 + SymbolName: .xdata Type: IMAGE_REL_AMD64_ADDR32NB - - Name: .llvm_addrsig - Characteristics: [ IMAGE_SCN_LNK_REMOVE ] - Alignment: 1 - SectionData: 0A1D - Name: .xdata Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_LNK_COMDAT, IMAGE_SCN_MEM_READ ] Alignment: 4 @@ -205,6 +169,7 @@ sections: - Name: '.debug$S' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_LNK_COMDAT, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 + SectionData: 04000000F10000002F000000290047110000000000000000000000000F00000000000000000000000410000000000000000000666F6F0002004F1100F20000003000000000000000000000000F000000180000000300000024000000000000000200000004000000030000000900000004000000 Subsections: - !Symbols Records: @@ -216,17 +181,8 @@ sections: FunctionType: 4100 Flags: [ ] DisplayName: foo - - 
Kind: S_FRAMEPROC - FrameProcSym: - TotalFrameBytes: 40 - PaddingFrameBytes: 0 - OffsetToPadding: 0 - BytesOfCalleeSavedRegisters: 0 - OffsetOfExceptionHandler: 0 - SectionIdOfExceptionHandler: 0 - Flags: [ ] - Kind: S_PROC_ID_END - ScopeEndSym: {} + ScopeEndSym: - !Lines CodeSize: 15 Flags: [ ] @@ -247,7 +203,7 @@ sections: LineStart: 4 IsStatement: false EndDelta: 0 - Columns: [] + Columns: Relocations: - VirtualAddress: 44 SymbolName: foo @@ -255,10 +211,10 @@ sections: - VirtualAddress: 48 SymbolName: foo Type: IMAGE_REL_AMD64_SECTION - - VirtualAddress: 100 + - VirtualAddress: 68 SymbolName: foo Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 104 + - VirtualAddress: 72 SymbolName: foo Type: IMAGE_REL_AMD64_SECTION - Name: .pdata @@ -273,7 +229,7 @@ sections: SymbolName: foo Type: IMAGE_REL_AMD64_ADDR32NB - VirtualAddress: 8 - SymbolTableIndex: 11 + SymbolName: .xdata Type: IMAGE_REL_AMD64_ADDR32NB symbols: - Name: .text @@ -345,7 +301,7 @@ symbols: StorageClass: IMAGE_SYM_CLASS_EXTERNAL - Name: .xdata Value: 0 - SectionNumber: 11 + SectionNumber: 10 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC @@ -375,22 +331,22 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 396 + Length: 264 NumberOfRelocations: 4 NumberOfLinenumbers: 0 - CheckSum: 3390249978 + CheckSum: 2204933783 Number: 7 - Name: '.debug$S' Value: 0 - SectionNumber: 12 + SectionNumber: 11 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 148 + Length: 116 NumberOfRelocations: 4 NumberOfLinenumbers: 0 - CheckSum: 1236081121 + CheckSum: 2691661839 Number: 5 Selection: IMAGE_COMDAT_SELECT_ASSOCIATIVE - Name: '.debug$T' @@ -400,10 +356,10 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 2028 + Length: 80 NumberOfRelocations: 0 
NumberOfLinenumbers: 0 - CheckSum: 2043733667 + CheckSum: 3541780432 Number: 8 - Name: .pdata Value: 0 @@ -419,7 +375,7 @@ symbols: Number: 9 - Name: .pdata Value: 0 - SectionNumber: 13 + SectionNumber: 12 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC @@ -430,24 +386,6 @@ symbols: CheckSum: 3642757804 Number: 5 Selection: IMAGE_COMDAT_SELECT_ASSOCIATIVE - - Name: .llvm_addrsig - Value: 0 - SectionNumber: 10 - SimpleType: IMAGE_SYM_TYPE_NULL - ComplexType: IMAGE_SYM_DTYPE_NULL - StorageClass: IMAGE_SYM_CLASS_STATIC - SectionDefinition: - Length: 2 - NumberOfRelocations: 0 - NumberOfLinenumbers: 0 - CheckSum: 2582217811 - Number: 10 - - Name: '@feat.00' - Value: 0 - SectionNumber: -1 - SimpleType: IMAGE_SYM_TYPE_NULL - ComplexType: IMAGE_SYM_DTYPE_NULL - StorageClass: IMAGE_SYM_CLASS_STATIC - Name: main Value: 0 SectionNumber: 1 @@ -460,11 +398,4 @@ symbols: SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_EXTERNAL - - Name: .file - Value: 0 - SectionNumber: -2 - SimpleType: IMAGE_SYM_TYPE_NULL - ComplexType: IMAGE_SYM_DTYPE_NULL - StorageClass: IMAGE_SYM_CLASS_FILE - File: pdb_lines_1.c ... 
diff --git a/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml b/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml index 71ce5d63f508..1b051d82d9a4 100644 --- a/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml +++ b/lld/test/COFF/Inputs/pdb_lines_2_relative.yaml @@ -15,7 +15,6 @@ sections: Characteristics: [ IMAGE_SCN_CNT_UNINITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ] Alignment: 4 SectionData: '' - SizeOfRawData: 0 - Name: .drectve Characteristics: [ IMAGE_SCN_LNK_INFO, IMAGE_SCN_LNK_REMOVE ] Alignment: 1 @@ -23,6 +22,7 @@ sections: - Name: '.debug$S' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 + SectionData: 04000000F10000002F0000002D003C1100000000D0000700000000000000581B000000000000636C616E672076657273696F6E20372E302E30200000F10000002F0000002900471100000000000000000000000001000000000000000000000002100000000000000000006261720002004F1100F2000000200000000000000000000000010000000000000001000000140000000000000002000000F400000018000000010000001001DF91CB3A2B8D917486574BB50CAC4CC70000F300000014000000002E5C7064625F6C696E65735F322E6300000000 Subsections: - !Symbols Records: @@ -30,15 +30,15 @@ sections: Compile3Sym: Flags: [ ] Machine: X64 - FrontendMajor: 11 + FrontendMajor: 7 FrontendMinor: 0 FrontendBuild: 0 FrontendQFE: 0 - BackendMajor: 11000 + BackendMajor: 7000 BackendMinor: 0 BackendBuild: 0 BackendQFE: 0 - Version: 'clang version 11.0.0 (https://github.com/llvm/llvm-project.git 77dad72eae974338ddc13d74783c012ccbb8c5ac)' + Version: 'clang version 7.0.0 ' - !Symbols Records: - Kind: S_GPROC32_ID @@ -49,17 +49,8 @@ sections: FunctionType: 4098 Flags: [ ] DisplayName: bar - - Kind: S_FRAMEPROC - FrameProcSym: - TotalFrameBytes: 0 - PaddingFrameBytes: 0 - OffsetToPadding: 0 - BytesOfCalleeSavedRegisters: 0 - OffsetOfExceptionHandler: 0 - SectionIdOfExceptionHandler: 0 - Flags: [ ] - Kind: S_PROC_ID_END - ScopeEndSym: {} + ScopeEndSym: - !Lines CodeSize: 1 Flags: [ ] @@ -72,39 +63,35 @@ sections: 
LineStart: 2 IsStatement: false EndDelta: 0 - Columns: [] + Columns: - !FileChecksums Checksums: - FileName: '.\pdb_lines_2.c' Kind: MD5 - Checksum: 4CC58B73BFD5AB52F87CFB3C604BB288 + Checksum: DF91CB3A2B8D917486574BB50CAC4CC7 - !StringTable Strings: - '.\pdb_lines_2.c' - '' - '' - '' - - !Symbols - Records: - - Kind: S_BUILDINFO - BuildInfoSym: - BuildId: 4103 Relocations: - - VirtualAddress: 184 + - VirtualAddress: 100 SymbolName: bar Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 188 + - VirtualAddress: 104 SymbolName: bar Type: IMAGE_REL_AMD64_SECTION - - VirtualAddress: 240 + - VirtualAddress: 124 SymbolName: bar Type: IMAGE_REL_AMD64_SECREL - - VirtualAddress: 244 + - VirtualAddress: 128 SymbolName: bar Type: IMAGE_REL_AMD64_SECTION - Name: '.debug$T' Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ] Alignment: 4 + SectionData: 0400000006000112000000000E0008100300000000000000001000000E000116000000000110000062617200 Types: - Kind: LF_ARGLIST ArgList: @@ -121,29 +108,6 @@ sections: ParentScope: 0 FunctionType: 4097 Name: bar - - Kind: LF_STRING_ID - StringId: - Id: 0 - String: . 
- - Kind: LF_STRING_ID - StringId: - Id: 0 - String: pdb_lines_2.c - - Kind: LF_STRING_ID - StringId: - Id: 0 - String: 'buildninjaRel\bin\clang-cl.exe' - - Kind: LF_STRING_ID - StringId: - Id: 0 - String: '"-cc1" "-triple" "x86_64-pc-windows-msvc19.26.28806" "-emit-obj" "-mrelax-all" "-mincremental-linker-compatible" "-disable-free" "-main-file-name" "pdb_lines_2.c" "-mrelocation-model" "pic" "-pic-level" "2" "-mthread-model" "posix" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gcodeview" "-debug-info-kind=limited" "-resource-dir" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0" "-internal-isystem" "D:\\llvm-project\\buildninjaRel\\lib\\clang\\11.0.0\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\ATLMFC\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\include" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\ucrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\shared" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\winrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\cppwinrt" "-fdebug-compilation-dir" "." 
"-ferror-limit" "19" "-fmessage-length=146" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.26.28806" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-faddrsig" "-x" "c"' - - Kind: LF_BUILDINFO - BuildInfo: - ArgIndices: [ 4099, 4101, 4100, 0, 4102 ] - - Name: .llvm_addrsig - Characteristics: [ IMAGE_SCN_LNK_REMOVE ] - Alignment: 1 - SectionData: '' symbols: - Name: .text Value: 0 @@ -200,10 +164,10 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 348 + Length: 216 NumberOfRelocations: 4 NumberOfLinenumbers: 0 - CheckSum: 2408981505 + CheckSum: 2383431754 Number: 5 - Name: '.debug$T' Value: 0 @@ -212,40 +176,15 @@ symbols: ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_STATIC SectionDefinition: - Length: 1992 + Length: 44 NumberOfRelocations: 0 NumberOfLinenumbers: 0 - CheckSum: 1158086003 + CheckSum: 179171995 Number: 6 - - Name: .llvm_addrsig - Value: 0 - SectionNumber: 7 - SimpleType: IMAGE_SYM_TYPE_NULL - ComplexType: IMAGE_SYM_DTYPE_NULL - StorageClass: IMAGE_SYM_CLASS_STATIC - SectionDefinition: - Length: 0 - NumberOfRelocations: 0 - NumberOfLinenumbers: 0 - CheckSum: 0 - Number: 7 - - Name: '@feat.00' - Value: 0 - SectionNumber: -1 - SimpleType: IMAGE_SYM_TYPE_NULL - ComplexType: IMAGE_SYM_DTYPE_NULL - StorageClass: IMAGE_SYM_CLASS_STATIC - Name: bar Value: 0 SectionNumber: 1 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_FUNCTION StorageClass: IMAGE_SYM_CLASS_EXTERNAL - - Name: .file - Value: 0 - SectionNumber: -2 - SimpleType: IMAGE_SYM_TYPE_NULL - ComplexType: IMAGE_SYM_DTYPE_NULL - StorageClass: IMAGE_SYM_CLASS_FILE - File: pdb_lines_2.c ... 
diff --git a/lld/test/COFF/pdb-relative-source-lines.test b/lld/test/COFF/pdb-relative-source-lines.test index 632aa48cb6cd..547056785962 100644 --- a/lld/test/COFF/pdb-relative-source-lines.test +++ b/lld/test/COFF/pdb-relative-source-lines.test @@ -15,9 +15,7 @@ int main(void) { void bar(void) { } -$ clang-cl -fdebug-compilation-dir . -no-canonical-prefixes -c -Z7 pdb_lines*.c -$ obj2yaml pdb_lines_1.obj > pdb_lines_1_relative.yaml -$ obj2yaml pdb_lines_2.obj > pdb_lines_2_relative.yaml +$ clang-cl -Xclang -fdebug-compilation-dir -Xclang . -c -Z7 pdb_lines*.c /pdbsourcepath: only sets the directory that relative paths are considered relative to, so this test needs to pass relative paths to lld-link for: @@ -35,9 +33,9 @@ RUN: cd %t RUN: yaml2obj %S/Inputs/pdb_lines_1_relative.yaml -o %t/pdb_lines_1_relative.obj RUN: yaml2obj %S/Inputs/pdb_lines_2_relative.yaml -o %t/pdb_lines_2_relative.obj RUN: ./lld-link -debug "-pdbsourcepath:c:\src" -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj -RUN: llvm-pdbutil pdb2yaml -ipi-stream -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck %s +RUN: llvm-pdbutil pdb2yaml -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck %s RUN: ./lld-link -debug "-pdbsourcepath:/usr/src" -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj -RUN: llvm-pdbutil pdb2yaml -ipi-stream -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck --check-prefix=POSIX %s +RUN: llvm-pdbutil pdb2yaml -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck --check-prefix=POSIX %s CHECK-LABEL: - Module: 'c:\src\pdb_lines_1_relative.obj' CHECK-NEXT: ObjFile: 'c:\src\pdb_lines_1_relative.obj' @@ -72,20 +70,6 @@ CHECK-NEXT: - 'c:\src\out.pdb' CHECK-NEXT: - cmd CHECK-NEXT: - '-debug -pdbsourcepath:c:\src -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb 
pdb_lines_1_relative.obj pdb_lines_2_relative.obj' -CHECK-LABEL: IpiStream: - -CHECK: - Kind: LF_STRING_ID -CHECK-NEXT: StringId: -CHECK-NEXT: Id: 0 -CHECK-NEXT: String: 'c:\src' -CHECK-NEXT: - Kind: LF_STRING_ID -CHECK-NEXT: StringId: -CHECK-NEXT: Id: 0 -CHECK-NEXT: String: pdb_lines_1.c -CHECK-NEXT: - Kind: LF_STRING_ID -CHECK-NEXT: StringId: -CHECK-NEXT: Id: 0 -CHECK-NEXT: String: 'c:\src\buildninjaRel\bin\clang-cl.exe' POSIX-LABEL: - Module: '/usr/src/pdb_lines_1_relative.obj' POSIX-NEXT: ObjFile: '/usr/src/pdb_lines_1_relative.obj' @@ -119,17 +103,3 @@ POSIX-NEXT: - pdb POSIX-NEXT: - '/usr/src/out.pdb' POSIX-NEXT: - cmd POSIX-NEXT: - '-debug -pdbsourcepath:/usr/src -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj' - -POSIX-LABEL: IpiStream: -POSIX: - Kind: LF_STRING_ID -POSIX-NEXT: StringId: -POSIX-NEXT: Id: 0 -POSIX-NEXT: String: '/usr/src' -POSIX-NEXT: - Kind: LF_STRING_ID -POSIX-NEXT: StringId: -POSIX-NEXT: Id: 0 -POSIX-NEXT: String: pdb_lines_1.c -POSIX-NEXT: - Kind: LF_STRING_ID -POSIX-NEXT: StringId: -POSIX-NEXT: Id: 0 -POSIX-NEXT: String: '/usr/src/buildninjaRel/bin/clang-cl.exe' diff --git a/lld/test/COFF/pdb-relative-source-lines2.test b/lld/test/COFF/pdb-relative-source-lines2.test deleted file mode 100644 index 955f7bc1e453..000000000000 --- a/lld/test/COFF/pdb-relative-source-lines2.test +++ /dev/null @@ -1,66 +0,0 @@ -REQUIRES: system-windows - -Test the linker line tables on roughly the following example: - -==> foo.h <== -void bar(void); -inline void foo(void) { - bar(); -} -==> pdb_lines_1.c <== -#include "foo.h" -int main(void) { - foo(); - return 42; -} -==> pdb_lines_2.c <== -void bar(void) { -} - -$ clang-cl -fdebug-compilation-dir . 
-no-canonical-prefixes -c -Z7 pdb_lines*.c -$ obj2yaml pdb_lines_1.obj > pdb_lines_1_relative.yaml -$ obj2yaml pdb_lines_2.obj > pdb_lines_2_relative.yaml - -/pdbsourcepath: only sets the directory that relative paths are considered -relative to, so this test needs to pass relative paths to lld-link for: -1. The input obj files -2. The /pdb: switch -3. The lld-link invocation itself -To achieve this, put all inputs of the lld-link invocation (including lld-link -itself) in a temp directory that's cwd and then make sure to only use relative -arguments when calling ./lld-link below. -RUN: rm -rf %t -RUN: mkdir %t -RUN: cp lld-link %t/lld-link -RUN: cd %t - -Test the convoluted case at the end of remapBuildInfo() in lld/COFF/PDB.cpp -The only drawback right now is that this edge case will create LF_BUILDINFO -records with front references in the IPI stream. However the Visual Studio -debugger takes the .PDB thusly created without any problems. -Tested on VS2015, 2017 and 2019. - -RUN: yaml2obj %S/Inputs/pdb_lines_1_relative.yaml -o %t/pdb_lines_1_relative.obj -RUN: sed -e "s|String: \.|String: "c:\\\src"|" < %S/Inputs/pdb_lines_2_relative.yaml > %t/pdb_lines_2_relative.yaml -RUN: yaml2obj pdb_lines_2_relative.yaml -o %t/pdb_lines_2_relative.obj -RUN: ./lld-link -debug "-pdbsourcepath:c:\src" -entry:main -nodefaultlib -out:out.exe -pdb:out.pdb pdb_lines_1_relative.obj pdb_lines_2_relative.obj -RUN: llvm-pdbutil pdb2yaml -ipi-stream -modules -module-files -module-syms -subsections=lines,fc %t/out.pdb | FileCheck --check-prefix=EXISTING %s - -EXISTING-LABEL: IpiStream: - -EXISTING: - Kind: LF_STRING_ID -EXISTING-NEXT: StringId: -EXISTING-NEXT: Id: 0 -EXISTING-NEXT: String: . 
-EXISTING-NEXT: - Kind: LF_STRING_ID -EXISTING-NEXT: StringId: -EXISTING-NEXT: Id: 0 -EXISTING-NEXT: String: pdb_lines_1.c -EXISTING: - Kind: LF_STRING_ID -EXISTING-NEXT: StringId: -EXISTING-NEXT: Id: 0 -EXISTING-LABEL: String: 'c:\src' -EXISTING-NEXT: - Kind: LF_STRING_ID -EXISTING-NEXT: StringId: -EXISTING-NEXT: Id: 0 -EXISTING-NEXT: String: pdb_lines_2.c diff --git a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp index cf3c38c57f6d..f7041c0cc926 100644 --- a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp @@ -77,7 +77,6 @@ #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/FormatVariadic.h" #include "llvm/Support/Path.h" -#include "llvm/Support/Program.h" #include "llvm/Support/SMLoc.h" #include "llvm/Support/ScopedPrinter.h" #include "llvm/Target/TargetLoweringObjectFile.h" @@ -832,31 +831,6 @@ static TypeIndex getStringIdTypeIdx(GlobalTypeTableBuilder &TypeTable, return TypeTable.writeLeafType(SIR); } -static std::string flattenCommandLine(ArrayRef Args, - StringRef MainFilename) { - std::string FlatCmdLine; - raw_string_ostream OS(FlatCmdLine); - StringRef LastArg; - for (StringRef Arg : Args) { - if (Arg.empty()) - continue; - // The command-line shall not contain the file to compile. - if (Arg == MainFilename && LastArg != "-main-file-name") - continue; - // Also remove the output file. - if (Arg == "-o" || LastArg == "-o") { - LastArg = Arg; - continue; - } - if (!LastArg.empty()) - OS << " "; - llvm::sys::printArg(OS, Arg, /*Quote=*/true); - LastArg = Arg; - } - OS.flush(); - return FlatCmdLine; -} - void CodeViewDebug::emitBuildInfo() { // First, make LF_BUILDINFO. It's a sequence of strings with various bits of // build info. 
The known prefix is:
@@ -877,16 +851,8 @@ void CodeViewDebug::emitBuildInfo() {
       getStringIdTypeIdx(TypeTable, MainSourceFile->getDirectory());
   BuildInfoArgs[BuildInfoRecord::SourceFile] =
       getStringIdTypeIdx(TypeTable, MainSourceFile->getFilename());
-  // FIXME: PDB is intentionally blank unless we implement /Zi type servers.
-  BuildInfoArgs[BuildInfoRecord::TypeServerPDB] =
-      getStringIdTypeIdx(TypeTable, "");
-  if (Asm->TM.Options.MCOptions.Argv0 != nullptr) {
-    BuildInfoArgs[BuildInfoRecord::BuildTool] =
-        getStringIdTypeIdx(TypeTable, Asm->TM.Options.MCOptions.Argv0);
-    BuildInfoArgs[BuildInfoRecord::CommandLine] = getStringIdTypeIdx(
-        TypeTable, flattenCommandLine(Asm->TM.Options.MCOptions.CommandLineArgs,
-                                      MainSourceFile->getFilename()));
-  }
+  // FIXME: Path to compiler and command line. PDB is intentionally blank unless
+  // we implement /Zi type servers.
   BuildInfoRecord BIR(BuildInfoArgs);
   TypeIndex BuildInfoIndex = TypeTable.writeLeafType(BIR);
diff --git a/llvm/test/DebugInfo/COFF/build-info.ll b/llvm/test/DebugInfo/COFF/build-info.ll
index 983aa22214bc..94f006c3b093 100644
--- a/llvm/test/DebugInfo/COFF/build-info.ll
+++ b/llvm/test/DebugInfo/COFF/build-info.ll
@@ -5,7 +5,7 @@
 ; CHECK-NEXT: 0x{{.*}}: `D:\src\scopes\clang`
 ; CHECK-NEXT: : ``
 ; CHECK-NEXT: 0x{{.*}}: `D:\src\scopes\foo.cpp`
-; CHECK-NEXT: 0x{{.*}}: ``
+; CHECK-NEXT: : ``
 ; CHECK-NEXT: : ``
 ; CHECK: {{.*}} | S_BUILDINFO [size = 8] BuildId = `[[INFO_IDX]]`
diff --git a/llvm/test/DebugInfo/COFF/global-type-hashes.ll b/llvm/test/DebugInfo/COFF/global-type-hashes.ll
index 3c6c27301b20..70f9df156a5b 100644
--- a/llvm/test/DebugInfo/COFF/global-type-hashes.ll
+++ b/llvm/test/DebugInfo/COFF/global-type-hashes.ll
@@ -295,8 +295,7 @@ attributes #2 = { noinline nounwind optnone "correctly-rounded-divide-sqrt-fp-ma
 ; YAML: - 4470750F2E319329
 ; YAML: - 0FB556FD1FAB66D7
 ; YAML: - 5970EFB4874D0F3F
-; YAML: - D8EF11198C33843F
-; YAML: - D81F744D7366282B
+; YAML: - EDB1D74C120CF44A
 ; ...
diff --git a/llvm/test/DebugInfo/COFF/types-basic.ll b/llvm/test/DebugInfo/COFF/types-basic.ll
index 6455452d125a..81e0c25d17cd 100644
--- a/llvm/test/DebugInfo/COFF/types-basic.ll
+++ b/llvm/test/DebugInfo/COFF/types-basic.ll
@@ -511,22 +511,14 @@
 ; ASM: .asciz "t.cpp" # StringData
 ; ASM: .byte 242
 ; ASM: .byte 241
-; ASM: # StringId (0x1015)
-; ASM: .short 0xa # Record length
-; ASM: .short 0x1605 # Record kind: LF_STRING_ID
-; ASM: .long 0x0 # Id
-; ASM: .byte 0 # StringData
-; ASM: .byte 243
-; ASM: .byte 242
-; ASM: .byte 241
-; ASM: # BuildInfo (0x1016)
+; ASM: # BuildInfo (0x1015)
 ; ASM: .short 0x1a # Record length
 ; ASM: .short 0x1603 # Record kind: LF_BUILDINFO
 ; ASM: .short 0x5 # NumArgs
 ; ASM: .long 0x1013 # Argument: D:\src\llvm\build
 ; ASM: .long 0x0 # Argument
 ; ASM: .long 0x1014 # Argument: t.cpp
-; ASM: .long 0x1015 # Argument
+; ASM: .long 0x0 # Argument
 ; ASM: .long 0x0 # Argument
 ; ASM: .byte 242
 ; ASM: .byte 241
diff --git a/llvm/test/DebugInfo/COFF/types-data-members.ll b/llvm/test/DebugInfo/COFF/types-data-members.ll
index 1e699efdf8ed..87fde74b989c 100644
--- a/llvm/test/DebugInfo/COFF/types-data-members.ll
+++ b/llvm/test/DebugInfo/COFF/types-data-members.ll
@@ -727,22 +727,14 @@
 ; ASM: .asciz "t.cpp" # StringData
 ; ASM: .byte 242
 ; ASM: .byte 241
-; ASM: # StringId (0x1022)
-; ASM: .short 0xa # Record length
-; ASM: .short 0x1605 # Record kind: LF_STRING_ID
-; ASM: .long 0x0 # Id
-; ASM: .byte 0 # StringData
-; ASM: .byte 243
-; ASM: .byte 242
-; ASM: .byte 241
-; ASM: # BuildInfo (0x1023)
+; ASM: # BuildInfo (0x1022)
 ; ASM: .short 0x1a # Record length
 ; ASM: .short 0x1603 # Record kind: LF_BUILDINFO
 ; ASM: .short 0x5 # NumArgs
 ; ASM: .long 0x1020 # Argument: D:\src\llvm\build
 ; ASM: .long 0x0 # Argument
 ; ASM: .long 0x1021 # Argument: t.cpp
-; ASM: .long 0x1022 # Argument
+; ASM: .long 0x0 # Argument
 ; ASM: .long 0x0 # Argument
 ; ASM: .byte 242
 ; ASM: .byte 241

From llvm-commits at lists.llvm.org Fri Jul 10 16:48:55 2020
From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 23:48:55 +0000 (UTC)
Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections
In-Reply-To:
References:
Message-ID: <14793627be1fb1e52e9dbb502b0c4044@localhost.localdomain>

tmsriram added a comment.

> Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me.
> I think it lacks a summary about what the patch intends to do.

Here is an attempt to explain what needs to be addressed to get CFI right with basic block sections:

One of the main goals of CFI directives is to allow the unwinder to retrieve the Canonical Frame Address (CFA) from any PC.

CFA - Canonical Frame Address: the value of the stack pointer just before the call instruction is executed.
FDE - Frame Descriptor Entry: unique to each section; holds the CFI information in the eh_frame section.
CIE - Common Information Entry: holds information that is common to multiple FDEs in the eh_frame section.

Let’s first cover some basics of CFI which will help us understand what needs to change to support basic block sections. The CFI directives for a simple program without any basic block sections:

void f1();
void f3(bool b) {
  if (b)
    f1();
}

With the frame pointer, using -fno-omit-frame-pointer:

        .type _Z2f3b,@function
_Z2f3b:                                 # @_Z2f3b
        .cfi_startproc
# %bb.0:                                # %entry
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register %rbp
        …
        popq %rbp
        .cfi_def_cfa %rsp, 8
        retq
.Lfunc_end0:
        .size _Z2f3b, .Lfunc_end0-_Z2f3b

- All CFI directives need to be placed between a cfi_startproc and a cfi_endproc directive.
- A section/function is the smallest granularity for CFI directives. All CFI directives within a section can share the same cfi_startproc and cfi_endproc.
However, each function must have its own cfi_startproc and cfi_endproc directives, even without function sections.
- Looking through the directives in sequence, “cfi_def_cfa_offset 16” tells us that the CFA (Canonical Frame Address) is now 16 bytes above the current stack pointer, because the return address and %rbp have been pushed onto the stack. “cfi_offset %rbp, -16” says that the old value of %rbp is now available at CFA - 16, because of the push.
- “.cfi_def_cfa_register %rbp” says that the CFA can now be computed from %rbp, as we just copied the stack pointer into it; the existing offset of 16 is kept and applied to the value of %rbp. This is efficient: we can now change %rsp as much as we want without having to generate directives to update the CFA.
- So, just before returning from the function we pop %rbp and update the rule so that the CFA is again computed from %rsp (“.cfi_def_cfa %rsp, 8”).
- CFI directives are emitted in FDEs in the object file with a low_pc, high_pc specification. So, a single FDE must cover a contiguous code region, unlike debug info, which has support for ranges. This is what complicates CFI for basic block sections.

Now, what happens when we start placing individual basic blocks in unique sections:

- Basic block sections allow the linker to reorder basic blocks arbitrarily in the address space, so that a given basic block can become non-contiguous with the original function.
- The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit its own pair.
- Each (cfi_startproc, cfi_endproc) pair will result in a new FDE that caters to that basic block section.
- Now, this basic block section needs to duplicate the information from the entry block to compute the CFA, as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate everything that is needed to compute the CFA on its own.
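
The duplication described in the bullet above can be sketched with a small C++ model of the CFA rule. This is only an illustration, not code from the patch: `CfiState`, `Directive`, `replay`, `entryBlockState` and `splitBlockState` are hypothetical names, the initial rule (CFA = %rsp + 8) is the usual x86-64 state at function entry, and the two directives assumed for the split block (`.cfi_def_cfa %rbp, 16` and `.cfi_offset %rbp, -16`) are one plausible way to restate the entry block's state in a fresh FDE.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the unwind rule that CFI directives describe.
struct CfiState {
  std::string cfaReg = "rsp";         // register the CFA is computed from
  int cfaOffset = 8;                  // CFA = cfaReg + cfaOffset (x86-64 at entry)
  std::map<std::string, int> savedAt; // register -> offset from the CFA of its saved value
};

bool operator==(const CfiState &a, const CfiState &b) {
  return a.cfaReg == b.cfaReg && a.cfaOffset == b.cfaOffset &&
         a.savedAt == b.savedAt;
}

enum class Kind { DefCfaOffset, Offset, DefCfaRegister, DefCfa };

struct Directive {
  Kind kind;
  std::string reg; // unused for DefCfaOffset
  int value;       // unused for DefCfaRegister
};

// Every FDE starts from the same initial rule, so replaying a directive list
// from a default CfiState models what the unwinder sees for one FDE.
CfiState replay(const std::vector<Directive> &directives) {
  CfiState s;
  for (const Directive &d : directives) {
    switch (d.kind) {
    case Kind::DefCfaOffset:   s.cfaOffset = d.value; break;
    case Kind::Offset:         s.savedAt[d.reg] = d.value; break;
    case Kind::DefCfaRegister: s.cfaReg = d.reg; break;
    case Kind::DefCfa:         s.cfaReg = d.reg; s.cfaOffset = d.value; break;
    }
  }
  return s;
}

// State after the prologue of the entry block in the listing above.
CfiState entryBlockState() {
  return replay({{Kind::DefCfaOffset, "", 16},
                 {Kind::Offset, "rbp", -16},
                 {Kind::DefCfaRegister, "rbp", 0}});
}

// State a separately placed basic block has to restate in its own FDE
// (assumed directives: .cfi_def_cfa %rbp, 16 and .cfi_offset %rbp, -16).
CfiState splitBlockState() {
  return replay({{Kind::DefCfa, "rbp", 16}, {Kind::Offset, "rbp", -16}});
}
```

In this model `entryBlockState() == splitBlockState()` holds, while an FDE with no directives would leave the rule at %rsp + 8, which no longer describes the frame once %rsp has moved.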
- We are working on a de-duplication patch that can share information common to several FDEs in a CIE (Common Information Entry), and we will present this as a follow-up patch. This can significantly reduce the duplication overhead and is particularly useful when several basic block sections are created.
- CFI directives are emitted similarly for registers that are pushed onto the stack, such as callee-saved registers in the prologue. These directives describe how to retrieve the value the register had when the push happened. This has to be duplicated too in a basic block that is floated as a separate section.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978

From llvm-commits at lists.llvm.org Fri Jul 10 16:50:29 2020
From: llvm-commits at lists.llvm.org (David Li via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 23:50:29 +0000 (UTC)
Subject: [PATCH] D83596: [BPI] Compile time improvement when erasing blocks (NFC)
In-Reply-To:
References:
Message-ID: <59ad61feb174da725e5eefcb54dec58a@localhost.localdomain>

davidxl accepted this revision.
davidxl added a comment.
This revision is now accepted and ready to land.
lgtm Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83596/new/ https://reviews.llvm.org/D83596 From llvm-commits at lists.llvm.org Fri Jul 10 16:50:55 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via llvm-commits) Date: Fri, 10 Jul 2020 16:50:55 -0700 (PDT) Subject: [compiler-rt] e54b228 - [Sanitizers] Change protoent test to check for IPv6 instead of RDP Message-ID: <5f08fedf.1c69fb81.b4f05.29aa@mx.google.com> Author: Gui Andrade Date: 2020-07-10T23:50:39Z New Revision: e54b228408852e385d89b5202573bdd8d8da58cf URL: https://github.com/llvm/llvm-project/commit/e54b228408852e385d89b5202573bdd8d8da58cf DIFF: https://github.com/llvm/llvm-project/commit/e54b228408852e385d89b5202573bdd8d8da58cf.diff LOG: [Sanitizers] Change protoent test to check for IPv6 instead of RDP Looks like RDP isn't present on some of LLVM's buildbot machines Added: Modified: compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp Removed: ################################################################################ diff --git a/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp b/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp index defc38ae2b1e..a1a93badf6b8 100644 --- a/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp +++ b/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp @@ -46,18 +46,24 @@ void print_protoent_by_num(int num) { } int main() { + // CHECK: All protoent // CHECK: ip (0) // CHECK-NEXT: alias IP // CHECK: ipv6 (41) // CHECK-NEXT: alias IPv6 + fprintf(stderr, "All protoent\n"); print_all_protoent(); - // CHECK: rdp (27) - // CHECK-NEXT: alias RDP - print_protoent_by_name("rdp"); + // CHECK: Protoent by name + // CHECK-NEXT: ipv6 (41) + // CHECK-NEXT: alias IPv6 + fprintf(stderr, "Protoent by name\n"); + print_protoent_by_name("ipv6"); - // CHECK: udp (17) + // CHECK: Protoent by num + // CHECK-NEXT: udp (17) // CHECK-NEXT: alias UDP + fprintf(stderr, "Protoent by 
num\n"); print_protoent_by_num(17); return 0; } From llvm-commits at lists.llvm.org Fri Jul 10 16:52:12 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:52:12 +0000 (UTC) Subject: [PATCH] D83598: [X86] Fix two places that appear to misuse peekThroughOneUseBitcasts Message-ID: craig.topper created this revision. craig.topper added reviewers: llvm-commits, spatel. Herald added a subscriber: hiraditya. Herald added a project: LLVM. peekThroughOneUseBitcasts checks the use count of the operand of the bitcast, not the bitcast itself. So I think that means we need to do any outside hasOneUse checks before calling the function, not after. I was working on another patch where I misused the function and did a very quick audit to see if there were other similar mistakes. https://reviews.llvm.org/D83598 Files: llvm/lib/Target/X86/X86ISelLowering.cpp Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36461,9 +36461,9 @@ (V.getOpcode() == X86ISD::PSHUFLW || V.getOpcode() == X86ISD::PSHUFHW) && V.getOpcode() != N.getOpcode() && - V.hasOneUse()) { + V.hasOneUse() && V.getOperand(0).hasOneUse()) { SDValue D = peekThroughOneUseBitcasts(V.getOperand(0)); - if (D.getOpcode() == X86ISD::PSHUFD && D.hasOneUse()) { + if (D.getOpcode() == X86ISD::PSHUFD) { SmallVector VMask = getPSHUFShuffleMask(V); SmallVector DMask = getPSHUFShuffleMask(D); int NOffset = N.getOpcode() == X86ISD::PSHUFLW ? 0 : 4; @@ -36900,10 +36900,11 @@ // insert into a zero vector. This helps get VZEXT_MOVL closer to // scalar_to_vectors where 256/512 are canonicalized to an insert and a // 128-bit scalar_to_vector. This reduces the number of isel patterns.
- if (N->getOpcode() == X86ISD::VZEXT_MOVL && !DCI.isBeforeLegalizeOps()) { + if (N->getOpcode() == X86ISD::VZEXT_MOVL && !DCI.isBeforeLegalizeOps() && + N->getOperand(0).hasOneUse()) { SDValue V = peekThroughOneUseBitcasts(N->getOperand(0)); - if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.hasOneUse() && + if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.getOperand(0).isUndef() && isNullConstant(V.getOperand(2))) { SDValue In = V.getOperand(1); MVT SubVT = -------------- next part -------------- A non-text attachment was scrubbed... Name: D83598.277168.patch Type: text/x-patch Size: 1554 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:56:10 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via llvm-commits) Date: Fri, 10 Jul 2020 16:56:10 -0700 (PDT) Subject: [llvm] 3e5173d - [BPI] Compile time improvement when erasing blocks (NFC) Message-ID: <5f09001a.1c69fb81.5812c.1452@mx.google.com> Author: Teresa Johnson Date: 2020-07-10T16:55:54-07:00 New Revision: 3e5173dbc352317712ca35788333c6e118cbf79c URL: https://github.com/llvm/llvm-project/commit/3e5173dbc352317712ca35788333c6e118cbf79c DIFF: https://github.com/llvm/llvm-project/commit/3e5173dbc352317712ca35788333c6e118cbf79c.diff LOG: [BPI] Compile time improvement when erasing blocks (NFC) Summary: eraseBlock is trying to erase all probability info for the given BB. This info is stored in a DenseMap organized like so: using Edge = std::pair; DenseMap Probs; where the unsigned in the Edge key is the successor id. It was walking through every single map entry, checking if the BB in the key's pair matched the given BB. Much more efficient is to do what another method (getEdgeProbability) was already doing, which is to walk the successors of the BB, and simply do a map lookup on the key formed from each pair. Doing this dropped the overall compile time for a file containing a very large function by around 32%. 
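As a rough illustration of the two strategies described in the summary above, here is a minimal, self-contained sketch. It is not the LLVM code: std::map stands in for DenseMap, plain int ids stand in for BasicBlock pointers, and the names eraseByFullScan, eraseBySuccessorLookup and strategiesAgree are hypothetical. The successor count is passed in explicitly because the sketch has no CFG to query.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-ins: a block is an int id, and an edge key is
// (block id, successor index), mirroring the Edge key in the summary.
using Edge = std::pair<int, unsigned>;
using ProbMap = std::map<Edge, double>;

// Old strategy: visit every entry in the map and erase the ones whose
// source block matches. Cost grows with the size of the whole map.
std::size_t eraseByFullScan(ProbMap &probs, int bb) {
  std::size_t erased = 0;
  for (auto it = probs.begin(); it != probs.end();) {
    if (it->first.first == bb) {
      it = probs.erase(it); // erase returns the next valid iterator
      ++erased;
    } else {
      ++it;
    }
  }
  return erased;
}

// New strategy: form the key for each successor index of the block and
// look it up directly. Cost grows with the block's successor count.
std::size_t eraseBySuccessorLookup(ProbMap &probs, int bb,
                                   unsigned numSuccessors) {
  std::size_t erased = 0;
  for (unsigned i = 0; i < numSuccessors; ++i) {
    auto it = probs.find(Edge(bb, i));
    if (it != probs.end()) {
      probs.erase(it);
      ++erased;
    }
  }
  return erased;
}

// Both strategies should remove the same entries for block 1 and leave
// the single entry for block 2 behind.
bool strategiesAgree() {
  ProbMap a = {{{1, 0}, 0.5}, {{1, 1}, 0.5}, {{2, 0}, 1.0}};
  ProbMap b = a;
  std::size_t erasedA = eraseByFullScan(a, 1);
  std::size_t erasedB = eraseBySuccessorLookup(b, 1, 2);
  assert(erasedA == erasedB);
  return a == b && a.size() == 1;
}
```

The point of the second form is that the work is bounded by the number of successors of the erased block rather than by the number of edges recorded for the whole function, which is presumably where the saving on a very large function comes from.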
Reviewers: davidxl, xur Subscribers: llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D83596 Added: Modified: llvm/lib/Analysis/BranchProbabilityInfo.cpp Removed: ################################################################################ diff --git a/llvm/lib/Analysis/BranchProbabilityInfo.cpp b/llvm/lib/Analysis/BranchProbabilityInfo.cpp index da711c4acaf6..a396b5ad21c6 100644 --- a/llvm/lib/Analysis/BranchProbabilityInfo.cpp +++ b/llvm/lib/Analysis/BranchProbabilityInfo.cpp @@ -1056,10 +1056,10 @@ BranchProbabilityInfo::printEdgeProbability(raw_ostream &OS, } void BranchProbabilityInfo::eraseBlock(const BasicBlock *BB) { - for (auto I = Probs.begin(), E = Probs.end(); I != E; ++I) { - auto Key = I->first; - if (Key.first == BB) - Probs.erase(Key); + for (const_succ_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) { + auto MapI = Probs.find(std::make_pair(BB, I.getSuccessorIndex())); + if (MapI != Probs.end()) + Probs.erase(MapI); } } From llvm-commits at lists.llvm.org Fri Jul 10 16:56:11 2020 From: llvm-commits at lists.llvm.org (Teresa Johnson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:56:11 +0000 (UTC) Subject: [PATCH] D83596: [BPI] Compile time improvement when erasing blocks (NFC) In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG3e5173dbc352: [BPI] Compile time improvement when erasing blocks (NFC) (authored by tejohnson). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83596/new/ https://reviews.llvm.org/D83596 Files: llvm/lib/Analysis/BranchProbabilityInfo.cpp Index: llvm/lib/Analysis/BranchProbabilityInfo.cpp =================================================================== --- llvm/lib/Analysis/BranchProbabilityInfo.cpp +++ llvm/lib/Analysis/BranchProbabilityInfo.cpp @@ -1056,10 +1056,10 @@ } void BranchProbabilityInfo::eraseBlock(const BasicBlock *BB) { - for (auto I = Probs.begin(), E = Probs.end(); I != E; ++I) { - auto Key = I->first; - if (Key.first == BB) - Probs.erase(Key); + for (const_succ_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) { + auto MapI = Probs.find(std::make_pair(BB, I.getSuccessorIndex())); + if (MapI != Probs.end()) + Probs.erase(MapI); } } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83596.277172.patch Type: text/x-patch Size: 668 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 16:58:26 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Fri, 10 Jul 2020 23:58:26 +0000 (UTC) Subject: [PATCH] D82615: [HWASan] [GlobalISel] Add +tagged-globals backend feature for GlobalISel In-Reply-To: References: Message-ID: <639d8ed018efdd49e3f903dacd19aa99@localhost.localdomain> aemerson accepted this revision. aemerson added a comment. jq3ysyuS Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82615/new/ https://reviews.llvm.org/D82615 From llvm-commits at lists.llvm.org Fri Jul 10 17:01:25 2020 From: llvm-commits at lists.llvm.org (Amara Emerson via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:01:25 +0000 (UTC) Subject: [PATCH] D82615: [HWASan] [GlobalISel] Add +tagged-globals backend feature for GlobalISel In-Reply-To: References: Message-ID: <0d41d44d61b2ef4cc255efb885e55843@localhost.localdomain> aemerson added a comment. 
In D82615#2145444 , @aemerson wrote: > jq3ysyuS My 9 month old did this. LGTM anyway. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82615/new/ https://reviews.llvm.org/D82615 From llvm-commits at lists.llvm.org Fri Jul 10 17:04:26 2020 From: llvm-commits at lists.llvm.org (Gui Andrade via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:04:26 +0000 (UTC) Subject: [PATCH] D83595: [Draft][MSAN] Optimize away poisoning allocas that are always written before load In-Reply-To: References: Message-ID: <878443d8749210bff7f0d0e70a9fefd8@localhost.localdomain> guiand added a comment. I figured if it's using GEP then it's likely not going to be storing to the entire shadow. Bitcast is overwhelmingly common though (especially bitcast->lifetime.start, which I realize I need to handle). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83595/new/ https://reviews.llvm.org/D83595 From llvm-commits at lists.llvm.org Fri Jul 10 17:07:41 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via llvm-commits) Date: Fri, 10 Jul 2020 17:07:41 -0700 (PDT) Subject: [llvm] 0f0c5af - [COFF] Add cg_profile directive and .llvm.call-graph-profile section Message-ID: <5f0902cd.1c69fb81.887d0.1ae2@mx.google.com> Author: Zequan Wu Date: 2020-07-10T17:07:30-07:00 New Revision: 0f0c5af3db9b0159d9b1a89faff3bd047510b628 URL: https://github.com/llvm/llvm-project/commit/0f0c5af3db9b0159d9b1a89faff3bd047510b628 DIFF: https://github.com/llvm/llvm-project/commit/0f0c5af3db9b0159d9b1a89faff3bd047510b628.diff LOG: [COFF] Add cg_profile directive and .llvm.call-graph-profile section Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83597 Added: llvm/test/MC/COFF/cgprofile.s Modified: llvm/include/llvm/MC/MCParser/MCAsmParserExtension.h llvm/include/llvm/MC/MCWinCOFFStreamer.h llvm/lib/MC/MCParser/COFFAsmParser.cpp llvm/lib/MC/MCParser/ELFAsmParser.cpp 
llvm/lib/MC/MCParser/MCAsmParserExtension.cpp llvm/lib/MC/MCWinCOFFStreamer.cpp llvm/lib/MC/WinCOFFObjectWriter.cpp llvm/test/MC/AsmParser/directive_cgprofile.s Removed: ################################################################################ diff --git a/llvm/include/llvm/MC/MCParser/MCAsmParserExtension.h b/llvm/include/llvm/MC/MCParser/MCAsmParserExtension.h index 5d2afe81a54b..c37889cfc509 100644 --- a/llvm/include/llvm/MC/MCParser/MCAsmParserExtension.h +++ b/llvm/include/llvm/MC/MCParser/MCAsmParserExtension.h @@ -98,6 +98,8 @@ class MCAsmParserExtension { return getParser().parseOptionalToken(T); } + bool ParseDirectiveCGProfile(StringRef, SMLoc); + bool check(bool P, const Twine &Msg) { return getParser().check(P, Msg); } diff --git a/llvm/include/llvm/MC/MCWinCOFFStreamer.h b/llvm/include/llvm/MC/MCWinCOFFStreamer.h index b5f570ec335c..1236304b9e5d 100644 --- a/llvm/include/llvm/MC/MCWinCOFFStreamer.h +++ b/llvm/include/llvm/MC/MCWinCOFFStreamer.h @@ -64,6 +64,8 @@ class MCWinCOFFStreamer : public MCObjectStreamer { unsigned ByteAlignment) override; void emitIdent(StringRef IdentString) override; void EmitWinEHHandlerData(SMLoc Loc) override; + void emitCGProfileEntry(const MCSymbolRefExpr *From, + const MCSymbolRefExpr *To, uint64_t Count) override; void finishImpl() override; /// \} @@ -73,6 +75,9 @@ class MCWinCOFFStreamer : public MCObjectStreamer { void emitInstToData(const MCInst &Inst, const MCSubtargetInfo &STI) override; + void finalizeCGProfileEntry(const MCSymbolRefExpr *&S); + void finalizeCGProfile(); + private: void Error(const Twine &Msg) const; }; diff --git a/llvm/lib/MC/MCParser/COFFAsmParser.cpp b/llvm/lib/MC/MCParser/COFFAsmParser.cpp index dec004eb6f95..2104fb83b309 100644 --- a/llvm/lib/MC/MCParser/COFFAsmParser.cpp +++ b/llvm/lib/MC/MCParser/COFFAsmParser.cpp @@ -70,6 +70,7 @@ class COFFAsmParser : public MCAsmParserExtension { addDirectiveHandler<&COFFAsmParser::ParseDirectiveLinkOnce>(".linkonce"); 
addDirectiveHandler<&COFFAsmParser::ParseDirectiveRVA>(".rva"); addDirectiveHandler<&COFFAsmParser::ParseDirectiveSymbolAttribute>(".weak"); + addDirectiveHandler<&COFFAsmParser::ParseDirectiveCGProfile>(".cg_profile"); // Win64 EH directives. addDirectiveHandler<&COFFAsmParser::ParseSEHDirectiveStartProc>( @@ -125,6 +126,7 @@ class COFFAsmParser : public MCAsmParserExtension { bool parseCOMDATType(COFF::COMDATType &Type); bool ParseDirectiveLinkOnce(StringRef, SMLoc); bool ParseDirectiveRVA(StringRef, SMLoc); + bool ParseDirectiveCGProfile(StringRef, SMLoc); // Win64 EH directives. bool ParseSEHDirectiveStartProc(StringRef, SMLoc); @@ -299,6 +301,10 @@ bool COFFAsmParser::ParseDirectiveSymbolAttribute(StringRef Directive, SMLoc) { return false; } +bool COFFAsmParser::ParseDirectiveCGProfile(StringRef S, SMLoc Loc) { + return MCAsmParserExtension::ParseDirectiveCGProfile(S, Loc); +} + bool COFFAsmParser::ParseSectionSwitch(StringRef Section, unsigned Characteristics, SectionKind Kind) { diff --git a/llvm/lib/MC/MCParser/ELFAsmParser.cpp b/llvm/lib/MC/MCParser/ELFAsmParser.cpp index a80e8a5832ef..e5ab13bc719d 100644 --- a/llvm/lib/MC/MCParser/ELFAsmParser.cpp +++ b/llvm/lib/MC/MCParser/ELFAsmParser.cpp @@ -862,45 +862,8 @@ bool ELFAsmParser::ParseDirectiveSubsection(StringRef, SMLoc) { return false; } -/// ParseDirectiveCGProfile -/// ::= .cg_profile identifier, identifier, -bool ELFAsmParser::ParseDirectiveCGProfile(StringRef, SMLoc) { - StringRef From; - SMLoc FromLoc = getLexer().getLoc(); - if (getParser().parseIdentifier(From)) - return TokError("expected identifier in directive"); - - if (getLexer().isNot(AsmToken::Comma)) - return TokError("expected a comma"); - Lex(); - - StringRef To; - SMLoc ToLoc = getLexer().getLoc(); - if (getParser().parseIdentifier(To)) - return TokError("expected identifier in directive"); - - if (getLexer().isNot(AsmToken::Comma)) - return TokError("expected a comma"); - Lex(); - - int64_t Count; - if (getParser().parseIntToken( - 
Count, "expected integer count in '.cg_profile' directive")) - return true; - - if (getLexer().isNot(AsmToken::EndOfStatement)) - return TokError("unexpected token in directive"); - - MCSymbol *FromSym = getContext().getOrCreateSymbol(From); - MCSymbol *ToSym = getContext().getOrCreateSymbol(To); - - getStreamer().emitCGProfileEntry( - MCSymbolRefExpr::create(FromSym, MCSymbolRefExpr::VK_None, getContext(), - FromLoc), - MCSymbolRefExpr::create(ToSym, MCSymbolRefExpr::VK_None, getContext(), - ToLoc), - Count); - return false; +bool ELFAsmParser::ParseDirectiveCGProfile(StringRef S, SMLoc Loc) { + return MCAsmParserExtension::ParseDirectiveCGProfile(S, Loc); } namespace llvm { diff --git a/llvm/lib/MC/MCParser/MCAsmParserExtension.cpp b/llvm/lib/MC/MCParser/MCAsmParserExtension.cpp index 18d18f0cf6ed..0b5046cd8fad 100644 --- a/llvm/lib/MC/MCParser/MCAsmParserExtension.cpp +++ b/llvm/lib/MC/MCParser/MCAsmParserExtension.cpp @@ -7,6 +7,8 @@ //===----------------------------------------------------------------------===// #include "llvm/MC/MCParser/MCAsmParserExtension.h" +#include "llvm/MC/MCContext.h" +#include "llvm/MC/MCStreamer.h" using namespace llvm; @@ -17,3 +19,44 @@ MCAsmParserExtension::~MCAsmParserExtension() = default; void MCAsmParserExtension::Initialize(MCAsmParser &Parser) { this->Parser = &Parser; } + +/// ParseDirectiveCGProfile +/// ::= .cg_profile identifier, identifier, +bool MCAsmParserExtension::ParseDirectiveCGProfile(StringRef, SMLoc) { + StringRef From; + SMLoc FromLoc = getLexer().getLoc(); + if (getParser().parseIdentifier(From)) + return TokError("expected identifier in directive"); + + if (getLexer().isNot(AsmToken::Comma)) + return TokError("expected a comma"); + Lex(); + + StringRef To; + SMLoc ToLoc = getLexer().getLoc(); + if (getParser().parseIdentifier(To)) + return TokError("expected identifier in directive"); + + if (getLexer().isNot(AsmToken::Comma)) + return TokError("expected a comma"); + Lex(); + + int64_t Count; + if 
(getParser().parseIntToken( + Count, "expected integer count in '.cg_profile' directive")) + return true; + + if (getLexer().isNot(AsmToken::EndOfStatement)) + return TokError("unexpected token in directive"); + + MCSymbol *FromSym = getContext().getOrCreateSymbol(From); + MCSymbol *ToSym = getContext().getOrCreateSymbol(To); + + getStreamer().emitCGProfileEntry( + MCSymbolRefExpr::create(FromSym, MCSymbolRefExpr::VK_None, getContext(), + FromLoc), + MCSymbolRefExpr::create(ToSym, MCSymbolRefExpr::VK_None, getContext(), + ToLoc), + Count); + return false; +} diff --git a/llvm/lib/MC/MCWinCOFFStreamer.cpp b/llvm/lib/MC/MCWinCOFFStreamer.cpp index 7f0f7fccd542..d8fde4004d44 100644 --- a/llvm/lib/MC/MCWinCOFFStreamer.cpp +++ b/llvm/lib/MC/MCWinCOFFStreamer.cpp @@ -328,7 +328,34 @@ void MCWinCOFFStreamer::EmitWinEHHandlerData(SMLoc Loc) { llvm_unreachable("not implemented"); } +void MCWinCOFFStreamer::emitCGProfileEntry(const MCSymbolRefExpr *From, + const MCSymbolRefExpr *To, + uint64_t Count) { + // Ignore temporary symbols for now. 
+ if (!From->getSymbol().isTemporary() && !To->getSymbol().isTemporary()) + getAssembler().CGProfile.push_back({From, To, Count}); +} + +void MCWinCOFFStreamer::finalizeCGProfileEntry(const MCSymbolRefExpr *&SRE) { + const MCSymbol *S = &SRE->getSymbol(); + bool Created; + getAssembler().registerSymbol(*S, &Created); + if (Created) { + cast(S)->setIsWeakExternal(); + cast(S)->setExternal(true); + } +} + +void MCWinCOFFStreamer::finalizeCGProfile() { + for (MCAssembler::CGProfileEntry &E : getAssembler().CGProfile) { + finalizeCGProfileEntry(E.From); + finalizeCGProfileEntry(E.To); + } +} + void MCWinCOFFStreamer::finishImpl() { + finalizeCGProfile(); + MCObjectStreamer::finishImpl(); } diff --git a/llvm/lib/MC/WinCOFFObjectWriter.cpp b/llvm/lib/MC/WinCOFFObjectWriter.cpp index c6829f5e107a..94a8d56c55fc 100644 --- a/llvm/lib/MC/WinCOFFObjectWriter.cpp +++ b/llvm/lib/MC/WinCOFFObjectWriter.cpp @@ -154,6 +154,8 @@ class WinCOFFObjectWriter : public MCObjectWriter { MCSectionCOFF *AddrsigSection; std::vector AddrsigSyms; + MCSectionCOFF *CGProfileSection = nullptr; + WinCOFFObjectWriter(std::unique_ptr MOTW, raw_pwrite_stream &OS); @@ -674,6 +676,13 @@ void WinCOFFObjectWriter::executePostLayoutBinding(MCAssembler &Asm, Asm.registerSection(*AddrsigSection); } + if (!Asm.CGProfile.empty()) { + CGProfileSection = Asm.getContext().getCOFFSection( + ".llvm.call-graph-profile", COFF::IMAGE_SCN_LNK_REMOVE, + SectionKind::getMetadata()); + Asm.registerSection(*CGProfileSection); + } + // "Define" each section & symbol. This creates section & symbol // entries in the staging area. for (const auto &Section : Asm) @@ -1099,6 +1108,20 @@ uint64_t WinCOFFObjectWriter::writeObject(MCAssembler &Asm, } } + // Create the contents of the .llvm.call-graph-profile section. 
+ if (CGProfileSection) { + auto *Frag = new MCDataFragment(CGProfileSection); + Frag->setLayoutOrder(0); + raw_svector_ostream OS(Frag->getContents()); + for (const MCAssembler::CGProfileEntry &CGPE : Asm.CGProfile) { + uint32_t FromIndex = CGPE.From->getSymbol().getIndex(); + uint32_t ToIndex = CGPE.To->getSymbol().getIndex(); + OS.write((const char *)&FromIndex, sizeof(uint32_t)); + OS.write((const char *)&ToIndex, sizeof(uint32_t)); + OS.write((const char *)&CGPE.Count, sizeof(uint64_t)); + } + } + assignFileOffsets(Asm, Layout); // MS LINK expects to be able to use this timestamp to implement their diff --git a/llvm/test/MC/AsmParser/directive_cgprofile.s b/llvm/test/MC/AsmParser/directive_cgprofile.s index 1db93dcbb033..b7bb82ed270d 100644 --- a/llvm/test/MC/AsmParser/directive_cgprofile.s +++ b/llvm/test/MC/AsmParser/directive_cgprofile.s @@ -1,5 +1,5 @@ # RUN: llvm-mc -triple i386-unknown-unknown %s | FileCheck %s - +# RUN: llvm-mc -triple x86_64-pc-win32 %s | FileCheck %s .cg_profile a, b, 32 .cg_profile freq, a, 11 .cg_profile freq, b, 20 diff --git a/llvm/test/MC/COFF/cgprofile.s b/llvm/test/MC/COFF/cgprofile.s new file mode 100644 index 000000000000..a0c47a69c069 --- /dev/null +++ b/llvm/test/MC/COFF/cgprofile.s @@ -0,0 +1,119 @@ +# RUN: llvm-mc -filetype=obj -triple x86_64-pc-win32 %s -o %t +# RUN: llvm-readobj -S --symbols --sd --cg-profile %t | FileCheck %s + + .section .test,"w" +a: + + .cg_profile a, b, 32 + .cg_profile freq, a, 11 + .cg_profile late, late2, 20 + .cg_profile .L.local, b, 42 + + .globl late +late: +late2: .word 0 +late3: +.L.local: + +# CHECK: Name: .llvm.call-graph-profile +# CHECK-NEXT: VirtualSize: +# CHECK-NEXT: VirtualAddress: +# CHECK-NEXT: RawDataSize: 48 +# CHECK-NEXT: PointerToRawData: +# CHECK-NEXT: PointerToRelocations: +# CHECK-NEXT: PointerToLineNumbers: +# CHECK-NEXT: RelocationCount: +# CHECK-NEXT: LineNumberCount: +# CHECK-NEXT: Characteristics [ (0x100800) +# CHECK-NEXT: IMAGE_SCN_ALIGN_1BYTES (0x100000) +# 
CHECK-NEXT: IMAGE_SCN_LNK_REMOVE (0x800) +# CHECK-NEXT: ] +# CHECK-NEXT: SectionData ( +# CHECK-NEXT: 0000: 0A000000 0E000000 20000000 00000000 +# CHECK-NEXT: 0010: 11000000 0A000000 0B000000 00000000 +# CHECK-NEXT: 0020: 0B000000 0C000000 14000000 00000000 +# CHECK-NEXT: ) + +# CHECK: Symbols [ +# CHECK: Name: a +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: .test +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: Static +# CHECK-NEXT: AuxSymbolCount: +# CHECK: Name: late +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: .test +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: External +# CHECK-NEXT: AuxSymbolCount: +# CHECK: Name: late2 +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: .test +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: Static +# CHECK-NEXT: AuxSymbolCount: +# CHECK: Name: late3 +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: .test +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: Static +# CHECK-NEXT: AuxSymbolCount: +# CHECK: Name: b +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: IMAGE_SYM_UNDEFINED +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: WeakExternal +# CHECK-NEXT: AuxSymbolCount: 1 +# CHECK-NEXT: AuxWeakExternal { +# CHECK-NEXT: Linked: .weak.b.default.late +# CHECK-NEXT: Search: Alias +# CHECK-NEXT: } +# CHECK: Name: .weak.b.default.late +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: IMAGE_SYM_ABSOLUTE +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: External +# CHECK-NEXT: AuxSymbolCount: 0 +# CHECK: Name: freq +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: IMAGE_SYM_UNDEFINED +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: WeakExternal +# CHECK-NEXT: AuxSymbolCount: 1 +# CHECK-NEXT: AuxWeakExternal { +# CHECK-NEXT: Linked: .weak.freq.default.late +# CHECK-NEXT: Search: Alias +# CHECK-NEXT: } +# CHECK: Name: 
.weak.freq.default.late +# CHECK-NEXT: Value: +# CHECK-NEXT: Section: IMAGE_SYM_ABSOLUTE +# CHECK-NEXT: BaseType: +# CHECK-NEXT: ComplexType: +# CHECK-NEXT: StorageClass: External +# CHECK-NEXT: AuxSymbolCount: 0 + +# CHECK: CGProfile [ +# CHECK-NEXT: CGProfileEntry { +# CHECK-NEXT: From: a +# CHECK-NEXT: To: b +# CHECK-NEXT: Weight: 32 +# CHECK-NEXT: } +# CHECK-NEXT: CGProfileEntry { +# CHECK-NEXT: From: freq +# CHECK-NEXT: To: a +# CHECK-NEXT: Weight: 11 +# CHECK-NEXT: } +# CHECK-NEXT: CGProfileEntry { +# CHECK-NEXT: From: late +# CHECK-NEXT: To: late2 +# CHECK-NEXT: Weight: 20 +# CHECK-NEXT: } +# CHECK-NEXT: ] From llvm-commits at lists.llvm.org Fri Jul 10 17:10:48 2020 From: llvm-commits at lists.llvm.org (Brian Yang via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:10:48 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: <3a4ecef969aa4c7d61365eccbd5bc0ca@localhost.localdomain> ichoyjx accepted this revision. ichoyjx added a comment. This revision is now accepted and ready to land. Thanks for the `Nowait` changes. Very nice work! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 From llvm-commits at lists.llvm.org Fri Jul 10 17:12:07 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Fri, 10 Jul 2020 17:12:07 -0700 (PDT) Subject: [llvm] 943660f - [openmp] Remove OMPConstants.cpp and replace it by OMP.cpp generated by tablegen Message-ID: <5f0903d7.1c69fb81.5e829.041c@mx.google.com> Author: Valentin Clement Date: 2020-07-10T20:11:57-04:00 New Revision: 943660fd15f193dc6961597c25541fee2e01ebbb URL: https://github.com/llvm/llvm-project/commit/943660fd15f193dc6961597c25541fee2e01ebbb DIFF: https://github.com/llvm/llvm-project/commit/943660fd15f193dc6961597c25541fee2e01ebbb.diff LOG: [openmp] Remove OMPConstants.cpp and replace it by OMP.cpp generated by tablegen Summary: Diff D83176 moved the last piece of code from OMPConstants.cpp, so this file was only useful for including the tablegen-generated file. This patch replaces OMPConstants.cpp with OMP.cpp generated by tablegen.
Reviewers: sstefan1, jdoerfert, jdenny Reviewed By: sstefan1 Subscribers: mgorny, yaxunl, hiraditya, guansong, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83583 Added: Modified: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/lib/Frontend/OpenMP/CMakeLists.txt llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp Removed: llvm/lib/Frontend/OpenMP/OMPConstants.cpp ################################################################################ diff --git a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td index 87fb88c31ed0..785a520613b9 100644 --- a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td +++ b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td @@ -39,6 +39,10 @@ class DirectiveLanguage { // Generate include and macro to enable LLVM BitmaskEnum. bit enableBitmaskEnumInNamespace = 0; + + // Header file included in the implementation code generated. Ususally the + // output file of the declaration code generation. Can be left blank. + string includeHeader = ""; } // Information about a specific clause. 
diff --git a/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt b/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt index e93fa38becfc..69f503675940 100644 --- a/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt +++ b/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt @@ -1,4 +1,3 @@ set(LLVM_TARGET_DEFINITIONS OMP.td) tablegen(LLVM OMP.h.inc --gen-directive-decl) -tablegen(LLVM OMP.cpp.inc --gen-directive-impl) add_public_tablegen_target(omp_gen) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td index 692bd2fb3210..bd81eeb01127 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMP.td +++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td @@ -23,6 +23,7 @@ def OpenMP : DirectiveLanguage { let clausePrefix = "OMPC_"; let makeEnumAvailableInNamespace = 1; let enableBitmaskEnumInNamespace = 1; + let includeHeader = "llvm/Frontend/OpenMP/OMP.h.inc"; } //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Frontend/OpenMP/CMakeLists.txt b/llvm/lib/Frontend/OpenMP/CMakeLists.txt index d137304ddc32..f88e3ed98662 100644 --- a/llvm/lib/Frontend/OpenMP/CMakeLists.txt +++ b/llvm/lib/Frontend/OpenMP/CMakeLists.txt @@ -1,5 +1,9 @@ +set(LLVM_TARGET_DEFINITIONS ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend/OpenMP/OMP.td) +tablegen(LLVM OMP.cpp --gen-directive-impl) +add_public_tablegen_target(omp_cpp) + add_llvm_component_library(LLVMFrontendOpenMP - OMPConstants.cpp + OMP.cpp # Generated by tablegen above OMPContext.cpp OMPIRBuilder.cpp @@ -10,4 +14,5 @@ add_llvm_component_library(LLVMFrontendOpenMP DEPENDS intrinsics_gen omp_gen - ) + omp_cpp + ) \ No newline at end of file diff --git a/llvm/lib/Frontend/OpenMP/OMPConstants.cpp b/llvm/lib/Frontend/OpenMP/OMPConstants.cpp deleted file mode 100644 index fdee3c5ef658..000000000000 --- a/llvm/lib/Frontend/OpenMP/OMPConstants.cpp +++ /dev/null @@ -1,21 +0,0 @@ -//===- OMPConstants.cpp - Helpers related to OpenMP code generation ---===// -// -// Part of the LLVM 
Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// -// -//===----------------------------------------------------------------------===// - -#include "llvm/Frontend/OpenMP/OMPConstants.h" - -#include "llvm/ADT/StringRef.h" -#include "llvm/ADT/StringSwitch.h" -#include "llvm/IR/Module.h" -#include "llvm/IR/Type.h" - -using namespace llvm; -using namespace omp; - -#include "llvm/Frontend/OpenMP/OMP.cpp.inc" diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index 43b7ec399b99..8b3cc8702bd4 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -72,7 +72,13 @@ def TDL_DirA : Directive<"dira"> { // CHECK-NEXT: #endif // LLVM_Tdl_INC -// IMPL: Directive llvm::tdl::getTdlDirectiveKind(llvm::StringRef Str) { +// IMPL: #include "llvm/ADT/StringRef.h" +// IMPL-NEXT: #include "llvm/ADT/StringSwitch.h" +// IMPL-EMPTY: +// IMPL-NEXT: using namespace llvm; +// IMPL-NEXT: using namespace tdl; +// IMPL-EMPTY: +// IMPL-NEXT: Directive llvm::tdl::getTdlDirectiveKind(llvm::StringRef Str) { // IMPL-NEXT: return llvm::StringSwitch(Str) // IMPL-NEXT: .Case("dira",TDLD_dira) // IMPL-NEXT: .Default(TDLD_dira); diff --git a/llvm/test/TableGen/directive2.td b/llvm/test/TableGen/directive2.td index 10f48c2a3ceb..06c7aabcf3ad 100644 --- a/llvm/test/TableGen/directive2.td +++ b/llvm/test/TableGen/directive2.td @@ -9,6 +9,7 @@ def TestDirectiveLanguage : DirectiveLanguage { let cppNamespace = "tdl"; let directivePrefix = "TDLD_"; let clausePrefix = "TDLC_"; + let includeHeader = "tdl.h.inc"; } def TDLC_ClauseA : Clause<"clausea"> { @@ -62,7 +63,14 @@ def TDL_DirA : Directive<"dira"> { // CHECK-NEXT: } // namespace llvm // CHECK-NEXT: #endif // LLVM_Tdl_INC - +// IMPL: #include "tdl.h.inc" +// IMPL-EMPTY: +// IMPL-NEXT: 
#include "llvm/ADT/StringRef.h" +// IMPL-NEXT: #include "llvm/ADT/StringSwitch.h" +// IMPL-EMPTY: +// IMPL-NEXT: using namespace llvm; +// IMPL-NEXT: using namespace tdl; +// IMPL-EMPTY: // IMPL: Directive llvm::tdl::getTdlDirectiveKind(llvm::StringRef Str) { // IMPL-NEXT: return llvm::StringSwitch(Str) // IMPL-NEXT: .Case("dira",TDLD_dira) diff --git a/llvm/utils/TableGen/DirectiveEmitter.cpp b/llvm/utils/TableGen/DirectiveEmitter.cpp index d4d2b7965420..37f1677a7a84 100644 --- a/llvm/utils/TableGen/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/DirectiveEmitter.cpp @@ -284,10 +284,24 @@ void EmitDirectivesImpl(RecordKeeper &Records, raw_ostream &OS) { StringRef LanguageName = DirectiveLanguage->getValueAsString("name"); StringRef ClausePrefix = DirectiveLanguage->getValueAsString("clausePrefix"); StringRef CppNamespace = DirectiveLanguage->getValueAsString("cppNamespace"); + StringRef IncludeHeader = + DirectiveLanguage->getValueAsString("includeHeader"); const auto &Directives = Records.getAllDerivedDefinitions("Directive"); const auto &Clauses = Records.getAllDerivedDefinitions("Clause"); + if (!IncludeHeader.empty()) + OS << "#include \"" << IncludeHeader << "\"\n\n"; + + OS << "#include \"llvm/ADT/StringRef.h\"\n"; + OS << "#include \"llvm/ADT/StringSwitch.h\"\n"; + OS << "\n"; + OS << "using namespace llvm;\n"; + llvm::SmallVector Namespaces; + llvm::SplitString(CppNamespace, Namespaces, "::"); + for (auto Ns : Namespaces) + OS << "using namespace " << Ns << ";\n"; + // getDirectiveKind(StringRef Str) GenerateGetKind(Directives, OS, "Directive", DirectivePrefix, LanguageName, CppNamespace, /*ImplicitAsUnknown=*/false); From llvm-commits at lists.llvm.org Fri Jul 10 17:12:09 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:12:09 +0000 (UTC) Subject: [PATCH] D83583: [openmp] Remove OMPConstants.cpp and replace it by OMP.cpp generated by tablegen In-Reply-To: References: Message-ID: 
<9bc446fc727b3d042a11e01a6f2b3e6b@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG943660fd15f1: [openmp] Remove OMPConstants.cpp and replace it by OMP.cpp generated by tablegen (authored by clementval). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83583/new/ https://reviews.llvm.org/D83583 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/lib/Frontend/OpenMP/CMakeLists.txt llvm/lib/Frontend/OpenMP/OMPConstants.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83583.277177.patch Type: text/x-patch Size: 6028 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 17:16:05 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:16:05 +0000 (UTC) Subject: [PATCH] D81429: [COFF] Port CallGraphSort to COFF from ELF In-Reply-To: References: Message-ID: <8041459dc93757ec3b6fa5b021925b08@localhost.localdomain> zequanwu updated this revision to Diff 277180. zequanwu marked 2 inline comments as done. zequanwu added a comment. Remove trailing empty lines. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81429/new/ https://reviews.llvm.org/D81429 Files: lld/COFF/CMakeLists.txt lld/COFF/CallGraphSort.cpp lld/COFF/CallGraphSort.h lld/COFF/Config.h lld/COFF/Driver.cpp lld/COFF/Options.td lld/COFF/Writer.cpp lld/ELF/CallGraphSort.cpp lld/test/COFF/cgprofile-bad-clusters.s lld/test/COFF/cgprofile-icf.s lld/test/COFF/cgprofile-print.s lld/test/COFF/cgprofile-txt.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D81429.277180.patch Type: text/x-patch Size: 20325 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 17:16:12 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:16:12 +0000 (UTC) Subject: [PATCH] D81429: [COFF] Port CallGraphSort to COFF from ELF In-Reply-To: References: Message-ID: zequanwu added inline comments. ================ Comment at: lld/test/COFF/cgprofile-txt.s:21 +# CHECK: 140001000 T B +# CHECK: 140001001 T C +# CHECK: 140001002 T D ---------------- MaskRay wrote: > Add `-NEXT:` whenever applicable The complete output is like the following, so there is a gap. ``` 140001000 t .text 140001000 T B 140001001 t .text 140001001 T C 140001002 t .text 140001002 T D 140001003 t .text 140001003 T A 140001004 t .text ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81429/new/ https://reviews.llvm.org/D81429 From llvm-commits at lists.llvm.org Fri Jul 10 17:20:51 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Fri, 10 Jul 2020 17:20:51 -0700 (PDT) Subject: [llvm] 351f2b3 - [InstSimplify] add tests for maxnum (PR46627); NFC Message-ID: <5f0905e3.1c69fb81.fdb7f.141c@mx.google.com> Author: Sanjay Patel Date: 2020-07-10T20:20:38-04:00 New Revision: 351f2b3c0ab357d42b3bda319979fa537f1565c3 URL: https://github.com/llvm/llvm-project/commit/351f2b3c0ab357d42b3bda319979fa537f1565c3 DIFF: https://github.com/llvm/llvm-project/commit/351f2b3c0ab357d42b3bda319979fa537f1565c3.diff LOG: [InstSimplify] add tests for maxnum (PR46627); NFC Added: Modified: llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll Removed: ################################################################################ diff --git a/llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll b/llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll index ee097ffab094..653b730fac36 100644 --- 
a/llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll +++ b/llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll @@ -1333,3 +1333,53 @@ define float @fsub_fadd_common_op_wrong_commute_commute(float %x, float %y) { %r = fadd reassoc nsz float %s, %y ret float %r } + +define float @maxnum_with_poszero_op(float %a) { +; CHECK-LABEL: @maxnum_with_poszero_op( +; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float [[A:%.*]], float 0.000000e+00) +; CHECK-NEXT: ret float [[MAX]] +; + %max = call float @llvm.maxnum.f32(float %a, float 0.0) + %fabs = call float @llvm.fabs.f32(float %max) + ret float %fabs +} + +define float @maxnum_with_poszero_op_commute(float %a) { +; CHECK-LABEL: @maxnum_with_poszero_op_commute( +; CHECK-NEXT: [[SQRT:%.*]] = call float @llvm.sqrt.f32(float [[A:%.*]]) +; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0.000000e+00, float [[SQRT]]) +; CHECK-NEXT: ret float [[MAX]] +; + %sqrt = call float @llvm.sqrt.f32(float %a) + %max = call float @llvm.maxnum.f32(float 0.0, float %sqrt) + %fabs = call float @llvm.fabs.f32(float %max) + ret float %fabs +} + +define float @maxnum_with_negzero_op(float %a) { +; CHECK-LABEL: @maxnum_with_negzero_op( +; CHECK-NEXT: [[NNAN:%.*]] = call nnan float @llvm.sqrt.f32(float [[A:%.*]]) +; CHECK-NEXT: [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN]]) +; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float -0.000000e+00, float [[FABSA]]) +; CHECK-NEXT: ret float [[MAX]] +; + %nnan = call nnan float @llvm.sqrt.f32(float %a) + %fabsa = call float @llvm.fabs.f32(float %nnan) + %max = call float @llvm.maxnum.f32(float -0.0, float %fabsa) + %fabs = call float @llvm.fabs.f32(float %max) + ret float %fabs +} + +define float @maxnum_with_negzero_op_commute(float %a) { +; CHECK-LABEL: @maxnum_with_negzero_op_commute( +; CHECK-NEXT: [[NNAN:%.*]] = call nnan float @llvm.sqrt.f32(float [[A:%.*]]) +; CHECK-NEXT: [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN]]) +; 
CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float [[FABSA]], float -0.000000e+00) +; CHECK-NEXT: ret float [[MAX]] +; + %nnan = call nnan float @llvm.sqrt.f32(float %a) + %fabsa = call float @llvm.fabs.f32(float %nnan) + %max = call float @llvm.maxnum.f32(float %fabsa, float -0.0) + %fabs = call float @llvm.fabs.f32(float %max) + ret float %fabs +} From llvm-commits at lists.llvm.org Fri Jul 10 17:27:32 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Fri, 10 Jul 2020 17:27:32 -0700 (PDT) Subject: [llvm] b8235d2 - Reland "[OpenMPOpt] ICV Tracking" Message-ID: <5f090774.1c69fb81.afaa7.0798@mx.google.com> Author: sstefan1 Date: 2020-07-11T02:25:57+02:00 New Revision: b8235d2bd87158c280dded0a40c0f9ddc9cb519b URL: https://github.com/llvm/llvm-project/commit/b8235d2bd87158c280dded0a40c0f9ddc9cb519b DIFF: https://github.com/llvm/llvm-project/commit/b8235d2bd87158c280dded0a40c0f9ddc9cb519b.diff LOG: Reland "[OpenMPOpt] ICV Tracking" This reverts commit 1d542f0ca83fa1411d6501a8d088450d83abd5b8. `recollectUses()` is added to prevent looking at dead uses after Attributor run. This is the first and most basic ICV Tracking implementation. For this first version, we only support deduplication within the same BB. 
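The same-basic-block restriction can be pictured with a toy model. The sketch below is not the code from this commit: `Inst`, `Kind`, and `findTrackedValue` are hypothetical names, and a basic block is modeled as a plain vector. It assumes the rule described above, namely that a getter may be replaced by the argument of the closest preceding setter in the same block, unless some other call sits in between (that call might change the ICV).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for a toy basic block (not LLVM types).
enum class Kind { Setter, Getter, OtherCall, NonCall };

struct Inst {
  Kind K;
  int Arg; // only meaningful for Kind::Setter
};

// Walk backwards from the getter at GetterIdx. On success, store the argument
// of the nearest preceding setter in Out and return true. Return false if an
// unknown call is seen first, or if the block has no earlier setter.
inline bool findTrackedValue(const std::vector<Inst> &BB,
                             std::size_t GetterIdx, int &Out) {
  assert(GetterIdx < BB.size() && "getter index out of range");
  for (std::size_t I = GetterIdx; I-- > 0;) {
    switch (BB[I].K) {
    case Kind::Setter:
      Out = BB[I].Arg;
      return true;
    case Kind::OtherCall:
      return false; // the call may have updated the ICV
    case Kind::Getter:
    case Kind::NonCall:
      break; // harmless, keep scanning
    }
  }
  return false; // no setter in this block
}
```

In this model a getter that follows an unrelated call is left alone, which matches the updated icv_tracking.ll checks where the getter after `__kmpc_fork_call` is kept.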
Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku, baziotis, lebedev.ri Differential Revision: https://reviews.llvm.org/D81788 Added: llvm/test/Transforms/OpenMP/dead_use.ll Modified: llvm/include/llvm/ADT/EnumeratedArray.h llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/icv_tracking.ll Removed: ################################################################################ diff --git a/llvm/include/llvm/ADT/EnumeratedArray.h b/llvm/include/llvm/ADT/EnumeratedArray.h index a9528115618c..a66ec9d08c37 100644 --- a/llvm/include/llvm/ADT/EnumeratedArray.h +++ b/llvm/include/llvm/ADT/EnumeratedArray.h @@ -38,6 +38,7 @@ class EnumeratedArray { static_cast &>(*this)[Index]); } + inline IndexType size() { return Size; } private: ValueType Underlying[Size]; diff --git a/llvm/include/llvm/Transforms/IPO/Attributor.h b/llvm/include/llvm/Transforms/IPO/Attributor.h index 93fc89278c79..c6261845b765 100644 --- a/llvm/include/llvm/Transforms/IPO/Attributor.h +++ b/llvm/include/llvm/Transforms/IPO/Attributor.h @@ -1036,6 +1036,14 @@ struct Attributor { identifyDefaultAbstractAttributes(const_cast(F)); } + /// Helper function to remove callsite. + void removeCallSite(CallInst *CI) { + if (!CI) + return; + + CGUpdater.removeCallSite(*CI); + } + /// Record that \p U is to be replaces with \p NV after information was /// manifested. This also triggers deletion of trivially dead istructions. bool changeUseAfterManifest(Use &U, Value &NV) { diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index 85d88ec3ca26..0b2e4f24bd17 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -53,8 +53,47 @@ STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified, static constexpr auto TAG = "[" DEBUG_TYPE "]"; #endif +/// Helper struct to store tracked ICV values at specif instructions. 
+struct ICVValue { + Instruction *Inst; + Value *TrackedValue; + + ICVValue(Instruction *I, Value *Val) : Inst(I), TrackedValue(Val) {} +}; + +namespace llvm { + +// Provide DenseMapInfo for ICVValue +template <> struct DenseMapInfo { + using InstInfo = DenseMapInfo; + using ValueInfo = DenseMapInfo; + + static inline ICVValue getEmptyKey() { + return ICVValue(InstInfo::getEmptyKey(), ValueInfo::getEmptyKey()); + }; + + static inline ICVValue getTombstoneKey() { + return ICVValue(InstInfo::getTombstoneKey(), ValueInfo::getTombstoneKey()); + }; + + static unsigned getHashValue(const ICVValue &ICVVal) { + return detail::combineHashValue( + InstInfo::getHashValue(ICVVal.Inst), + ValueInfo::getHashValue(ICVVal.TrackedValue)); + } + + static bool isEqual(const ICVValue &LHS, const ICVValue &RHS) { + return InstInfo::isEqual(LHS.Inst, RHS.Inst) && + ValueInfo::isEqual(LHS.TrackedValue, RHS.TrackedValue); + } +}; + +} // end namespace llvm + namespace { +struct AAICVTracker; + /// OpenMP specific information. For now, stores RFIs and ICVs also needed for /// Attributor runs. struct OMPInformationCache : public InformationCache { @@ -119,11 +158,14 @@ struct OMPInformationCache : public InformationCache { /// Uses of this runtime function per function containing the use. using UseVector = SmallVector; + /// Clear UsesMap for runtime function. + void clearUsesMap() { UsesMap.clear(); } + /// Return the vector of uses in function \p F. UseVector &getOrCreateUseVector(Function *F) { - std::unique_ptr &UV = UsesMap[F]; + std::shared_ptr &UV = UsesMap[F]; if (!UV) - UV = std::make_unique(); + UV = std::make_shared(); return *UV; } @@ -179,7 +221,7 @@ struct OMPInformationCache : public InformationCache { private: /// Map from functions to all uses of this runtime function contained in /// them. - DenseMap> UsesMap; + DenseMap> UsesMap; }; /// The slice of the module we are allowed to look at. 
@@ -262,34 +304,45 @@ struct OMPInformationCache : public InformationCache { return true; } - /// Helper to initialize all runtime function information for those defined - /// in OpenMPKinds.def. - void initializeRuntimeFunctions() { - // Helper to collect all uses of the decleration in the UsesMap. - auto CollectUses = [&](RuntimeFunctionInfo &RFI) { - unsigned NumUses = 0; - if (!RFI.Declaration) - return NumUses; - OMPBuilder.addAttributes(RFI.Kind, *RFI.Declaration); + // Helper to collect all uses of the decleration in the UsesMap. + unsigned collectUses(RuntimeFunctionInfo &RFI, bool CollectStats = true) { + unsigned NumUses = 0; + if (!RFI.Declaration) + return NumUses; + OMPBuilder.addAttributes(RFI.Kind, *RFI.Declaration); + if (CollectStats) { NumOpenMPRuntimeFunctionsIdentified += 1; NumOpenMPRuntimeFunctionUsesIdentified += RFI.Declaration->getNumUses(); + } - // TODO: We directly convert uses into proper calls and unknown uses. - for (Use &U : RFI.Declaration->uses()) { - if (Instruction *UserI = dyn_cast(U.getUser())) { - if (ModuleSlice.count(UserI->getFunction())) { - RFI.getOrCreateUseVector(UserI->getFunction()).push_back(&U); - ++NumUses; - } - } else { - RFI.getOrCreateUseVector(nullptr).push_back(&U); + // TODO: We directly convert uses into proper calls and unknown uses. + for (Use &U : RFI.Declaration->uses()) { + if (Instruction *UserI = dyn_cast(U.getUser())) { + if (ModuleSlice.count(UserI->getFunction())) { + RFI.getOrCreateUseVector(UserI->getFunction()).push_back(&U); ++NumUses; } + } else { + RFI.getOrCreateUseVector(nullptr).push_back(&U); + ++NumUses; } - return NumUses; - }; + } + return NumUses; + } + // Helper function to recollect uses of all runtime functions. 
+ void recollectUses() { + for (int Idx = 0; Idx < RFIs.size(); ++Idx) { + auto &RFI = RFIs[static_cast(Idx)]; + RFI.clearUsesMap(); + collectUses(RFI, /*CollectStats*/ false); + } + } + + /// Helper to initialize all runtime function information for those defined + /// in OpenMPKinds.def. + void initializeRuntimeFunctions() { Module &M = *((*ModuleSlice.begin())->getParent()); // Helper macros for handling __VA_ARGS__ in OMP_RTL @@ -327,7 +380,7 @@ struct OMPInformationCache : public InformationCache { RFI.ReturnType = OMPBuilder._ReturnType; \ RFI.ArgumentTypes = std::move(ArgsTypes); \ RFI.Declaration = F; \ - unsigned NumUses = CollectUses(RFI); \ + unsigned NumUses = collectUses(RFI); \ (void)NumUses; \ LLVM_DEBUG({ \ dbgs() << TAG << RFI.Name << (RFI.Declaration ? "" : " not") \ @@ -352,9 +405,9 @@ struct OpenMPOpt { OpenMPOpt(SmallVectorImpl &SCC, CallGraphUpdater &CGUpdater, OptimizationRemarkGetter OREGetter, - OMPInformationCache &OMPInfoCache) + OMPInformationCache &OMPInfoCache, Attributor &A) : M(*(*SCC.begin())->getParent()), SCC(SCC), CGUpdater(CGUpdater), - OREGetter(OREGetter), OMPInfoCache(OMPInfoCache) {} + OREGetter(OREGetter), OMPInfoCache(OMPInfoCache), A(A) {} /// Run all OpenMP optimizations on the underlying SCC/ModuleSlice. bool run() { @@ -385,6 +438,11 @@ struct OpenMPOpt { } } + Changed |= runAttributor(); + + // Recollect uses, in case Attributor deleted any. + OMPInfoCache.recollectUses(); + Changed |= deduplicateRuntimeCalls(); Changed |= deleteParallelRegions(); @@ -746,9 +804,206 @@ struct OpenMPOpt { /// OpenMP-specific information cache. Also Used for Attributor runs. OMPInformationCache &OMPInfoCache; + + /// Attributor instance. + Attributor &A; + + /// Helper function to run Attributor on SCC. 
+ bool runAttributor() { + if (SCC.empty()) + return false; + + registerAAs(); + + ChangeStatus Changed = A.run(); + + LLVM_DEBUG(dbgs() << "[Attributor] Done with " << SCC.size() + << " functions, result: " << Changed << ".\n"); + + return Changed == ChangeStatus::CHANGED; + } + + /// Populate the Attributor with abstract attribute opportunities in the + /// function. + void registerAAs() { + for (Function *F : SCC) { + if (F->isDeclaration()) + continue; + + A.getOrCreateAAFor(IRPosition::function(*F)); + } + } +}; + +/// Abstract Attribute for tracking ICV values. +struct AAICVTracker : public StateWrapper { + using Base = StateWrapper; + AAICVTracker(const IRPosition &IRP, Attributor &A) : Base(IRP) {} + + /// Returns true if value is assumed to be tracked. + bool isAssumedTracked() const { return getAssumed(); } + + /// Returns true if value is known to be tracked. + bool isKnownTracked() const { return getAssumed(); } + + /// Create an abstract attribute biew for the position \p IRP. + static AAICVTracker &createForPosition(const IRPosition &IRP, Attributor &A); + + /// Return the value with which \p I can be replaced for specific \p ICV. + virtual Value *getReplacementValue(InternalControlVar ICV, + const Instruction *I, Attributor &A) = 0; + + /// See AbstractAttribute::getName() + const std::string getName() const override { return "AAICVTracker"; } + + static const char ID; +}; + +struct AAICVTrackerFunction : public AAICVTracker { + AAICVTrackerFunction(const IRPosition &IRP, Attributor &A) + : AAICVTracker(IRP, A) {} + + // FIXME: come up with better string. + const std::string getAsStr() const override { return "ICVTracker"; } + + // FIXME: come up with some stats. + void trackStatistics() const override {} + + /// TODO: decide whether to deduplicate here, or use current + /// deduplicateRuntimeCalls function. 
+ ChangeStatus manifest(Attributor &A) override { + ChangeStatus Changed = ChangeStatus::UNCHANGED; + + for (InternalControlVar &ICV : TrackableICVs) + if (deduplicateICVGetters(ICV, A)) + Changed = ChangeStatus::CHANGED; + + return Changed; + } + + bool deduplicateICVGetters(InternalControlVar &ICV, Attributor &A) { + auto &OMPInfoCache = static_cast(A.getInfoCache()); + auto &ICVInfo = OMPInfoCache.ICVs[ICV]; + auto &GetterRFI = OMPInfoCache.RFIs[ICVInfo.Getter]; + + bool Changed = false; + + auto ReplaceAndDeleteCB = [&](Use &U, Function &Caller) { + CallInst *CI = OpenMPOpt::getCallIfRegularCall(U, &GetterRFI); + Instruction *UserI = cast(U.getUser()); + Value *ReplVal = getReplacementValue(ICV, UserI, A); + + if (!ReplVal || !CI) + return false; + + A.removeCallSite(CI); + CI->replaceAllUsesWith(ReplVal); + CI->eraseFromParent(); + Changed = true; + return true; + }; + + GetterRFI.foreachUse(ReplaceAndDeleteCB); + return Changed; + } + + // Map of ICV to their values at specific program point. + EnumeratedArray, InternalControlVar, + InternalControlVar::ICV___last> + ICVValuesMap; + + // Currently only nthreads is being tracked. + // this array will only grow with time. + InternalControlVar TrackableICVs[1] = {ICV_nthreads}; + + ChangeStatus updateImpl(Attributor &A) override { + ChangeStatus HasChanged = ChangeStatus::UNCHANGED; + + Function *F = getAnchorScope(); + + auto &OMPInfoCache = static_cast(A.getInfoCache()); + + for (InternalControlVar ICV : TrackableICVs) { + auto &SetterRFI = OMPInfoCache.RFIs[OMPInfoCache.ICVs[ICV].Setter]; + + auto TrackValues = [&](Use &U, Function &) { + CallInst *CI = OpenMPOpt::getCallIfRegularCall(U); + if (!CI) + return false; + + // FIXME: handle setters with more that 1 arguments. + /// Track new value. 
+ if (ICVValuesMap[ICV].insert(ICVValue(CI, CI->getArgOperand(0)))) + HasChanged = ChangeStatus::CHANGED; + + return false; + }; + + SetterRFI.foreachUse(TrackValues, F); + } + + return HasChanged; + } + + /// Return the value with which \p I can be replaced for specific \p ICV. + Value *getReplacementValue(InternalControlVar ICV, const Instruction *I, + Attributor &A) override { + const BasicBlock *CurrBB = I->getParent(); + + auto &ValuesSet = ICVValuesMap[ICV]; + auto &OMPInfoCache = static_cast(A.getInfoCache()); + auto &GetterRFI = OMPInfoCache.RFIs[OMPInfoCache.ICVs[ICV].Getter]; + + for (const auto &ICVVal : ValuesSet) { + if (CurrBB == ICVVal.Inst->getParent()) { + if (!ICVVal.Inst->comesBefore(I)) + continue; + + // both instructions are in the same BB and at \p I we know the ICV + // value. + while (I != ICVVal.Inst) { + // we don't yet know if a call might update an ICV. + // TODO: check callsite AA for value. + if (const auto *CB = dyn_cast(I)) + if (CB->getCalledFunction() != GetterRFI.Declaration) + return nullptr; + + I = I->getPrevNode(); + } + + // No call in between, return the value. + return ICVVal.TrackedValue; + } + } + + // No value was tracked. 
+ return nullptr; + } }; } // namespace +const char AAICVTracker::ID = 0; + +AAICVTracker &AAICVTracker::createForPosition(const IRPosition &IRP, + Attributor &A) { + AAICVTracker *AA = nullptr; + switch (IRP.getPositionKind()) { + case IRPosition::IRP_INVALID: + case IRPosition::IRP_FLOAT: + case IRPosition::IRP_ARGUMENT: + case IRPosition::IRP_RETURNED: + case IRPosition::IRP_CALL_SITE_RETURNED: + case IRPosition::IRP_CALL_SITE_ARGUMENT: + case IRPosition::IRP_CALL_SITE: + llvm_unreachable("ICVTracker can only be created for function position!"); + case IRPosition::IRP_FUNCTION: + AA = new (A.Allocator) AAICVTrackerFunction(IRP, A); + break; + } + + return *AA; +} + PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM, LazyCallGraph &CG, CGSCCUpdateResult &UR) { @@ -785,8 +1040,10 @@ PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C, OMPInformationCache InfoCache(*(Functions.back()->getParent()), AG, Allocator, /*CGSCC*/ &Functions, ModuleSlice); + Attributor A(Functions, InfoCache, CGUpdater); + // TODO: Compute the module slice we are allowed to look at. - OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache); + OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache, A); bool Changed = OMPOpt.run(); (void)Changed; return PreservedAnalyses::all(); @@ -850,8 +1107,10 @@ struct OpenMPOptLegacyPass : public CallGraphSCCPass { Allocator, /*CGSCC*/ &Functions, ModuleSlice); + Attributor A(Functions, InfoCache, CGUpdater); + // TODO: Compute the module slice we are allowed to look at. 
- OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache); + OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache, A); return OMPOpt.run(); } diff --git a/llvm/test/Transforms/OpenMP/dead_use.ll b/llvm/test/Transforms/OpenMP/dead_use.ll new file mode 100644 index 000000000000..4aca7935b10c --- /dev/null +++ b/llvm/test/Transforms/OpenMP/dead_use.ll @@ -0,0 +1,73 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature +; RUN: opt -S -openmpopt < %s | FileCheck %s +; RUN: opt -S -passes=openmpopt < %s | FileCheck %s +%struct.ident_t = type { i32, i32, i32, i32, i8* } + + at .str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1 + at 0 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8 + +; Function Attrs: nounwind uwtable +define dso_local i32 @b() #0 { +; CHECK-LABEL: define {{[^@]+}}@b() #0 +; CHECK-NEXT: [[TMP1:%.*]] = alloca i32, align 4 +; CHECK-NEXT: [[TMP2:%.*]] = call i32 @a() +; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4 +; CHECK-NEXT: ret i32 [[TMP3]] +; + %1 = alloca i32, align 4 + %2 = call i32 @a() + %3 = load i32, i32* %1, align 4 + ret i32 %3 +} + +; Function Attrs: nounwind uwtable +define internal i32 @a() #0 { +; CHECK-LABEL: define {{[^@]+}}@a() #0 +; CHECK-NEXT: [[TMP1:%.*]] = alloca i32, align 4 +; CHECK-NEXT: [[TMP2:%.*]] = call i32 @b() +; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined. to void (i32*, i32*, ...)*)) +; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4 +; CHECK-NEXT: ret i32 [[TMP3]] +; + %1 = alloca i32, align 4 + %2 = call i32 @b() + call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) 
@__kmpc_fork_call(%struct.ident_t* @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined. to void (i32*, i32*, ...)*)) + %3 = load i32, i32* %1, align 4 + ret i32 %3 +} + +; Function Attrs: norecurse nounwind uwtable +define internal void @.omp_outlined.(i32* noalias %0, i32* noalias %1) #1 { +; CHECK-LABEL: define {{[^@]+}}@.omp_outlined. +; CHECK-SAME: (i32* noalias [[TMP0:%.*]], i32* noalias [[TMP1:%.*]]) #1 +; CHECK-NEXT: [[TMP3:%.*]] = alloca i32*, align 8 +; CHECK-NEXT: [[TMP4:%.*]] = alloca i32*, align 8 +; CHECK-NEXT: store i32* [[TMP0]], i32** [[TMP3]], align 8, !tbaa !2 +; CHECK-NEXT: store i32* [[TMP1]], i32** [[TMP4]], align 8, !tbaa !2 +; CHECK-NEXT: ret void +; + %3 = alloca i32*, align 8 + %4 = alloca i32*, align 8 + store i32* %0, i32** %3, align 8, !tbaa !2 + store i32* %1, i32** %4, align 8, !tbaa !2 + ret void +} + +; Function Attrs: nounwind +declare !callback !6 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) #2 + +attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } +attributes #1 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } 
+attributes #2 = { nounwind } + +!llvm.module.flags = !{!0} +!llvm.ident = !{!1} + +!0 = !{i32 1, !"wchar_size", i32 4} +!1 = !{!"Debian clang version 11.0.0-++20200709100646+c92a8c0a0f6-1~exp1~20200709201313.3348"} +!2 = !{!3, !3, i64 0} +!3 = !{!"any pointer", !4, i64 0} +!4 = !{!"omnipotent char", !5, i64 0} +!5 = !{!"Simple C/C++ TBAA"} +!6 = !{!7} +!7 = !{i64 2, i64 -1, i64 -1, i1 true} diff --git a/llvm/test/Transforms/OpenMP/icv_tracking.ll b/llvm/test/Transforms/OpenMP/icv_tracking.ll index e3704338a7a9..c2b5d40ce97a 100644 --- a/llvm/test/Transforms/OpenMP/icv_tracking.ll +++ b/llvm/test/Transforms/OpenMP/icv_tracking.ll @@ -11,16 +11,12 @@ define dso_local i32 @foo(i32 %0, i32 %1) { ; CHECK-LABEL: define {{[^@]+}}@foo ; CHECK-SAME: (i32 [[TMP0:%.*]], i32 [[TMP1:%.*]]) ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 [[TMP0]]) -; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 [[TMP1]]) -; CHECK-NEXT: [[TMP4:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: tail call void @use(i32 [[TMP4]]) -; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) +; CHECK-NEXT: tail call void @use(i32 [[TMP1]]) +; CHECK-NEXT: tail call void @use(i32 [[TMP1]]) ; CHECK-NEXT: tail call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined. 
to void (i32*, i32*, ...)*)) -; CHECK-NEXT: [[TMP7:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: tail call void @use(i32 [[TMP7]]) +; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: tail call void @use(i32 [[TMP3]]) ; CHECK-NEXT: ret i32 0 ; tail call void @omp_set_num_threads(i32 %0) @@ -51,15 +47,13 @@ define internal void @.omp_outlined.(i32* %0, i32* %1) { ; CHECK-NEXT: [[TMP4:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @use(i32 [[TMP4]]) ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 10) -; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) +; CHECK-NEXT: tail call void @use(i32 10) ; CHECK-NEXT: ret void ; ; FIXME: this value should be tracked and the rest of the getters deduplicated and replaced with it. %3 = tail call i32 @omp_get_max_threads() %4 = tail call i32 @omp_get_max_threads() tail call void @use(i32 %4) -; FIXME: this value ( min(%3, 10) ) should be tracked and the rest of the getters deduplicated and replaced with it. tail call void @omp_set_num_threads(i32 10) %5 = tail call i32 @omp_get_max_threads() tail call void @use(i32 %5) @@ -74,10 +68,9 @@ define dso_local i32 @bar(i32 %0, i32 %1) { ; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 [[TMP0]], [[TMP1]] ; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 [[TMP0]], i32 [[TMP1]] ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 [[TMP4]]) -; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) 
@__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined..1 to void (i32*, i32*, ...)*)) -; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: tail call void @use(i32 [[TMP6]]) +; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() +; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) ; CHECK-NEXT: ret i32 0 ; %3 = icmp sgt i32 %0, %1 @@ -97,10 +90,9 @@ define internal void @.omp_outlined..1(i32* %0, i32* %1) { ; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @use(i32 [[TMP3]]) ; CHECK-NEXT: tail call void @omp_set_num_threads(i32 10) +; CHECK-NEXT: tail call void @use(i32 10) ; CHECK-NEXT: [[TMP4:%.*]] = tail call i32 @omp_get_max_threads() ; CHECK-NEXT: tail call void @use(i32 [[TMP4]]) -; CHECK-NEXT: [[TMP5:%.*]] = tail call i32 @omp_get_max_threads() -; CHECK-NEXT: tail call void @use(i32 [[TMP5]]) ; CHECK-NEXT: ret void ; %3 = tail call i32 @omp_get_max_threads() From llvm-commits at lists.llvm.org Fri Jul 10 17:27:36 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:27:36 +0000 (UTC) Subject: [PATCH] D81788: [OpenMPOpt] ICV Tracking In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGb8235d2bd871: Reland "[OpenMPOpt] ICV Tracking" (authored by sstefan1). Herald added a subscriber: dexonsmith. 
Changed prior to commit: https://reviews.llvm.org/D81788?vs=277162&id=277182#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81788/new/ https://reviews.llvm.org/D81788 Files: llvm/include/llvm/ADT/EnumeratedArray.h llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/dead_use.ll llvm/test/Transforms/OpenMP/icv_tracking.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D81788.277182.patch Type: text/x-patch Size: 23165 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 17:37:33 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:37:33 +0000 (UTC) Subject: [PATCH] D81775: [COFF] Add cg_profile directive and .llvm.call-graph-profile section In-Reply-To: References: Message-ID: zequanwu closed this revision. zequanwu added a comment. This has landed, but I accidentally wrote the wrong diff ID (D83597) in the commit message: https://github.com/llvm/llvm-project/commit/0f0c5af3db9b0159d9b1a89faff3bd047510b628 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81775/new/ https://reviews.llvm.org/D81775 From llvm-commits at lists.llvm.org Fri Jul 10 17:38:39 2020 From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:38:39 +0000 (UTC) Subject: [PATCH] D83159: [RISCV][test] Add a new codegen test of add-mul transform In-Reply-To: References: Message-ID: <89e91150a8c343e152fb0b14df671541@localhost.localdomain> benshi001 updated this revision to Diff 277184. benshi001 retitled this revision from "[RISCV][test] Add a new codegen test" to "[RISCV][test] Add a new codegen test of add-mul transform". 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83159/new/ https://reviews.llvm.org/D83159 Files: llvm/test/CodeGen/RISCV/addimm-mulimm.ll Index: llvm/test/CodeGen/RISCV/addimm-mulimm.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/RISCV/addimm-mulimm.ll @@ -0,0 +1,95 @@ +; Test whether (mul (add x, c1), c2) can be transformed to +; (add (mul x, c2), c1*c2) according to different c1/c2 pairs. + +; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV32IM %s +; RUN: llc -mtriple=riscv64 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV64IM %s + +define signext i32 @add_mul_trans_accept_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 11 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: addi a0, a0, 407 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 11 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: addiw a0, a0, 407 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 37 + %tmp1 = mul i32 %tmp0, 11 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_accept_2(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_2 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 13 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 28 +; RV32IM-NEXT: addi a1, a1, 1701 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_2 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 13 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 28 +; RV64IM-NEXT: addiw a1, a1, 1701 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 8953 + %tmp1 = mul i32 %tmp0, 13 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_reject_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 19 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 9 +; 
RV32IM-NEXT: addi a1, a1, 585 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_reject_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 19 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 9 +; RV64IM-NEXT: addiw a1, a1, 585 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1971 + %tmp1 = mul i32 %tmp0, 19 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_2(i32 %x) { +; RV32IM: # %bb.0: +; RV32IM-NEXT: lui a1, 792 +; RV32IM-NEXT: addi a1, a1, -1709 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 1014660 +; RV32IM-NEXT: addi a1, a1, -1891 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM: # %bb.0: +; RV64IM-NEXT: lui a1, 792 +; RV64IM-NEXT: addiw a1, a1, -1709 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 1014660 +; RV64IM-NEXT: addiw a1, a1, -1891 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1841231 + %tmp1 = mul i32 %tmp0, 3242323 + ret i32 %tmp1 +} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83159.277184.patch Type: text/x-patch Size: 3054 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 17:39:43 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:39:43 +0000 (UTC) Subject: [PATCH] D83601: [ValueTracking] fix bug in maxnum case of cannotBeOrderedLessThanZeroImpl (PR46627) Message-ID: spatel created this revision. spatel added reviewers: efriedma, cameron.mcinally, arsenm. Herald added subscribers: hiraditya, wdng, mcrosier. Herald added a project: LLVM. A miscompile with -0.0 is shown in: https://bugs.llvm.org/show_bug.cgi?id=46627 This is because maxnum(-0.0, +0.0) does not specify a fixed result: http://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic So we need to tighten the constraints for when it is ok to say the result of maxnum is positive (including +0.0). 
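The signed-zero problem can be sketched outside of LLVM. The following is a minimal model, not the ValueTracking code: `sign_bit`, `maxnum_candidates` and `fabs_is_removable` are hypothetical names, NaN inputs are assumed away, and "unspecified" is modelled by listing every result an implementation may return when the operands compare equal.

```python
import math


def sign_bit(x):
    """True when the IEEE-754 sign bit is set; this tells -0.0 apart from +0.0."""
    return math.copysign(1.0, x) < 0.0


def maxnum_candidates(a, b):
    """Results a maxnum-like operation may return (assumption: neither input is NaN)."""
    if a > b:
        return [a]
    if b > a:
        return [b]
    # Equal operands, which includes -0.0 versus +0.0: either one may be returned.
    return [a, b]


def fabs_is_removable(a, b):
    """fabs(maxnum(a, b)) may be dropped only if no candidate has its sign bit set."""
    return all(not sign_bit(r) for r in maxnum_candidates(a, b))
```

With these definitions `fabs_is_removable(-0.0, 0.0)` is false, because -0.0 is a permitted result and `fabs` would change it, while `fabs_is_removable(-3.0, 0.0)` is true. Knowing that one operand is +0.0 is therefore not enough unless the other operand cannot be -0.0.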
https://reviews.llvm.org/D83601 Files: llvm/lib/Analysis/ValueTracking.cpp llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll Index: llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll =================================================================== --- llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll +++ llvm/test/Transforms/InstSimplify/floating-point-arithmetic.ll @@ -1334,10 +1334,13 @@ ret float %r } +; PR46627 - https://bugs.llvm.org/show_bug.cgi?id=46627 + define float @maxnum_with_poszero_op(float %a) { ; CHECK-LABEL: @maxnum_with_poszero_op( ; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float [[A:%.*]], float 0.000000e+00) -; CHECK-NEXT: ret float [[MAX]] +; CHECK-NEXT: [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) +; CHECK-NEXT: ret float [[FABS]] ; %max = call float @llvm.maxnum.f32(float %a, float 0.0) %fabs = call float @llvm.fabs.f32(float %max) @@ -1348,7 +1351,8 @@ ; CHECK-LABEL: @maxnum_with_poszero_op_commute( ; CHECK-NEXT: [[SQRT:%.*]] = call float @llvm.sqrt.f32(float [[A:%.*]]) ; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0.000000e+00, float [[SQRT]]) -; CHECK-NEXT: ret float [[MAX]] +; CHECK-NEXT: [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) +; CHECK-NEXT: ret float [[FABS]] ; %sqrt = call float @llvm.sqrt.f32(float %a) %max = call float @llvm.maxnum.f32(float 0.0, float %sqrt) @@ -1361,7 +1365,8 @@ ; CHECK-NEXT: [[NNAN:%.*]] = call nnan float @llvm.sqrt.f32(float [[A:%.*]]) ; CHECK-NEXT: [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN]]) ; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float -0.000000e+00, float [[FABSA]]) -; CHECK-NEXT: ret float [[MAX]] +; CHECK-NEXT: [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) +; CHECK-NEXT: ret float [[FABS]] ; %nnan = call nnan float @llvm.sqrt.f32(float %a) %fabsa = call float @llvm.fabs.f32(float %nnan) @@ -1375,7 +1380,8 @@ ; CHECK-NEXT: [[NNAN:%.*]] = call nnan float @llvm.sqrt.f32(float 
[[A:%.*]]) ; CHECK-NEXT: [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN]]) ; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float [[FABSA]], float -0.000000e+00) -; CHECK-NEXT: ret float [[MAX]] +; CHECK-NEXT: [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) +; CHECK-NEXT: ret float [[FABS]] ; %nnan = call nnan float @llvm.sqrt.f32(float %a) %fabsa = call float @llvm.fabs.f32(float %nnan) Index: llvm/lib/Analysis/ValueTracking.cpp =================================================================== --- llvm/lib/Analysis/ValueTracking.cpp +++ llvm/lib/Analysis/ValueTracking.cpp @@ -3368,13 +3368,20 @@ switch (IID) { default: break; - case Intrinsic::maxnum: - return (isKnownNeverNaN(I->getOperand(0), TLI) && - cannotBeOrderedLessThanZeroImpl(I->getOperand(0), TLI, - SignBitOnly, Depth + 1)) || - (isKnownNeverNaN(I->getOperand(1), TLI) && - cannotBeOrderedLessThanZeroImpl(I->getOperand(1), TLI, - SignBitOnly, Depth + 1)); + case Intrinsic::maxnum: { + auto isPositiveNum = [&](Value *V) { + return isKnownNeverNaN(V, TLI) && + cannotBeOrderedLessThanZeroImpl(V, TLI, SignBitOnly, Depth + 1); + }; + + // This is tricky because the result of maxnum(+0.0, -0.0) is unspecified. + // TODO: This could be improved. If we had a "CannotBePositiveZero", then + // that plus isPositiveNum is enough to say maxnum returns a + // positive value. + Value *V0 = I->getOperand(0), *V1 = I->getOperand(1); + return (isPositiveNum(V0) && CannotBeNegativeZero(V1, TLI, Depth + 1)) || + (isPositiveNum(V1) && CannotBeNegativeZero(V0, TLI, Depth + 1)); + } case Intrinsic::maximum: return cannotBeOrderedLessThanZeroImpl(I->getOperand(0), TLI, SignBitOnly, -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83601.277183.patch Type: text/x-patch Size: 3899 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 17:58:23 2020 From: llvm-commits at lists.llvm.org (Justin Hibbits via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 00:58:23 +0000 (UTC) Subject: [PATCH] D82747: [PowerPC] Support constrained int/fp conversion in SPE targets In-Reply-To: References: Message-ID: jhibbits accepted this revision. jhibbits added a comment. This revision is now accepted and ready to land. Looks fine to me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82747/new/ https://reviews.llvm.org/D82747 From llvm-commits at lists.llvm.org Fri Jul 10 18:00:46 2020 From: llvm-commits at lists.llvm.org (Justin Hibbits via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:00:46 +0000 (UTC) Subject: [PATCH] D78670: PowerPC: Fix SPE extloadf32 handling. In-Reply-To: References: Message-ID: <07dee21ad1ad2a5925c5d82b7a1601f3@localhost.localdomain> jhibbits added a comment. Ping again? Since it's a trivial change, and isolated at that, I'm not opposed to post-commit review instead. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78670/new/ https://reviews.llvm.org/D78670 From llvm-commits at lists.llvm.org Fri Jul 10 18:03:21 2020 From: llvm-commits at lists.llvm.org (Justin Hibbits via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:03:21 +0000 (UTC) Subject: [PATCH] D78669: PowerPC: Add emergency stack spill slots for SPE In-Reply-To: References: Message-ID: <2c1f56ad878d7a989c1facb0ee019758@localhost.localdomain> jhibbits added a comment. Funny thing, I'm able to reproduce it *only* while running natively on a powerpcspe based device, not on any other device I've tested. I'll try to reduce the testcase even further, since it really is unwieldy. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78669/new/ https://reviews.llvm.org/D78669 From llvm-commits at lists.llvm.org Fri Jul 10 18:13:29 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via llvm-commits) Date: Fri, 10 Jul 2020 18:13:29 -0700 (PDT) Subject: [llvm] 7b67bc1 - [openmp] Fix warning in generated OMP.cpp Message-ID: <5f091239.1c69fb81.e0b24.2024@mx.google.com> Author: Valentin Clement Date: 2020-07-10T21:13:12-04:00 New Revision: 7b67bc16ef1b954eea5f99e478ccd4840ebe7a06 URL: https://github.com/llvm/llvm-project/commit/7b67bc16ef1b954eea5f99e478ccd4840ebe7a06 DIFF: https://github.com/llvm/llvm-project/commit/7b67bc16ef1b954eea5f99e478ccd4840ebe7a06.diff LOG: [openmp] Fix warning in generated OMP.cpp Added: Modified: llvm/utils/TableGen/DirectiveEmitter.cpp Removed: ################################################################################ diff --git a/llvm/utils/TableGen/DirectiveEmitter.cpp b/llvm/utils/TableGen/DirectiveEmitter.cpp index 37f1677a7a84..f51f98872bb5 100644 --- a/llvm/utils/TableGen/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/DirectiveEmitter.cpp @@ -236,27 +236,32 @@ void GenerateIsAllowedClause(const std::vector &Directives, for (const auto &D : Directives) { const auto DirectiveName = D->getValueAsString("name"); + const auto &AllowedClauses = D->getValueAsListOfDefs("allowedClauses"); + const auto &AllowedOnceClauses = + D->getValueAsListOfDefs("allowedOnceClauses"); + const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); OS << " case " << DirectivePrefix << getFormattedName(DirectiveName) << ":\n"; - OS << " switch (C) {\n"; + if (AllowedClauses.size() == 0 && AllowedOnceClauses.size() == 0 && + AllowedOnceClauses.size() == 0) { + OS << " return false;\n"; + } else { + OS << " switch (C) {\n"; - const auto &AllowedClauses = D->getValueAsListOfDefs("allowedClauses"); - GenerateCaseForVersionedClauses(AllowedClauses, OS, DirectiveName, - 
DirectivePrefix, ClausePrefix); + GenerateCaseForVersionedClauses(AllowedClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); - const auto &AllowedOnceClauses = - D->getValueAsListOfDefs("allowedOnceClauses"); - GenerateCaseForVersionedClauses(AllowedOnceClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + GenerateCaseForVersionedClauses(AllowedOnceClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); - const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); - GenerateCaseForVersionedClauses(RequiredClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + GenerateCaseForVersionedClauses(RequiredClauses, OS, DirectiveName, + DirectivePrefix, ClausePrefix); - OS << " default:\n"; - OS << " return false;\n"; - OS << " }\n"; // End of clauses switch + OS << " default:\n"; + OS << " return false;\n"; + OS << " }\n"; // End of clauses switch + } OS << " break;\n"; } From llvm-commits at lists.llvm.org Fri Jul 10 18:16:25 2020 From: llvm-commits at lists.llvm.org (Evgenii Stepanov via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:16:25 +0000 (UTC) Subject: [PATCH] D83595: [Draft][MSAN] Optimize away poisoning allocas that are always written before load In-Reply-To: References: Message-ID: <6c7dca529687a9df84899ec6125812dd@localhost.localdomain> eugenis added a comment. In D83595#2145447, @guiand wrote: > I figured if it's using GEP then it's likely not going to be storing to the entire shadow. Bitcast is overwhelmingly common though (especially bitcast->lifetime.start, which I realize I need to handle). Right. From what I've seen, small aggregates on the stack are quite common as well, and they are always initialized piecemeal. This can be seen as an improvement on this change, but it would be a pretty big change to the algorithm, so consider including that from the start. In D83595#2145347, @lebedev.ri wrote: > Perhaps MemSSA-based DCE can be taught about it?
That's an option, but it could be harder to pull off when poisoning is outlined. The code will look something like this: %p = alloca i32 call __msan_poison_and_set_origin(%p, 4) ... %s_p = inttoptr(xor(ptrtoint(%p), 0x50..00))) ; shadow address for %p store i32 zeroinitializer, %s_p store i32 , %p To eliminate the dead call to __msan_poison, DCE would need to know the shadow mapping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83595/new/ https://reviews.llvm.org/D83595 From llvm-commits at lists.llvm.org Fri Jul 10 18:16:36 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:16:36 +0000 (UTC) Subject: [PATCH] D83602: [DAGCombiner] Scalarize splats with just one demanded lane Message-ID: tlively created this revision. tlively added reviewers: aheejin, dschuff, arsenm, spatel. Herald added subscribers: llvm-commits, ecnelises, hiraditya, jgravelle-google, sbc100, wdng. Herald added a project: LLVM. This patch implements a combine to scalarize subtrees of the selection DAG that produce splat values for which only a single lane is demanded. The scalarization only happens when the target supports scalar versions of each operation in the subtree to avoid introducing any new transitions between vector and scalar registers and to avoid potentially-expensive expansions of scalarized operations. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83602 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/arm64-nvcast.ll llvm/test/CodeGen/SystemZ/vec-trunc-to-i1.ll llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll llvm/test/CodeGen/X86/avx512-calling-conv.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83602.277186.patch Type: text/x-patch Size: 16306 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 18:24:33 2020 From: llvm-commits at lists.llvm.org (Hal Finkel via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:24:33 +0000 (UTC) Subject: [PATCH] D83590: [PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel In-Reply-To: References: Message-ID: <9462101a2a65e273dd3de117ba6d19b0@localhost.localdomain> hfinkel accepted this revision as: hfinkel. hfinkel added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83590/new/ https://reviews.llvm.org/D83590 From llvm-commits at lists.llvm.org Fri Jul 10 18:26:53 2020 From: llvm-commits at lists.llvm.org (Jez Ng via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:26:53 +0000 (UTC) Subject: [PATCH] D83603: [lld-macho] Support __dso_handle for C++ Message-ID: int3 created this revision. int3 added a reviewer: lld-macho. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. The C++ ABI requires dylibs to pass a pointer to __cxa_atexit which does e.g. cleanup of static global variables. The C++ spec says that the pointer can point to any address in one of the dylib's segments, but in practice ld64 seems to set it to point to the header, so that's what's implemented here. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83603 Files: lld/MachO/Driver.cpp lld/MachO/SymbolTable.cpp lld/MachO/SymbolTable.h lld/MachO/Symbols.cpp lld/MachO/Symbols.h lld/MachO/SyntheticSections.h lld/MachO/Writer.cpp lld/test/MachO/dso-handle-no-override.s lld/test/MachO/dso-handle.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83603.277187.patch Type: text/x-patch Size: 9521 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 18:28:01 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:28:01 +0000 (UTC) Subject: [PATCH] D83581: [WebAssembly] Prefer v128.const for constant splats In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGb59c6fcaf3fc: [WebAssembly] Prefer v128.const for constant splats (authored by tlively). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83581/new/ https://reviews.llvm.org/D83581 Files: llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td llvm/test/CodeGen/WebAssembly/simd-arith.ll llvm/test/CodeGen/WebAssembly/simd.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83581.277188.patch Type: text/x-patch Size: 9915 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 18:28:00 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via llvm-commits) Date: Fri, 10 Jul 2020 18:28:00 -0700 (PDT) Subject: [llvm] b59c6fc - [WebAssembly] Prefer v128.const for constant splats Message-ID: <5f0915a0.1c69fb81.12ed.1a40@mx.google.com> Author: Thomas Lively Date: 2020-07-10T18:27:52-07:00 New Revision: b59c6fcaf3fc8fd4c42daeecf0545e47b37b1aa7 URL: https://github.com/llvm/llvm-project/commit/b59c6fcaf3fc8fd4c42daeecf0545e47b37b1aa7 DIFF: https://github.com/llvm/llvm-project/commit/b59c6fcaf3fc8fd4c42daeecf0545e47b37b1aa7.diff LOG: [WebAssembly] Prefer v128.const for constant splats In BUILD_VECTOR lowering, we used to generally prefer using splats over v128.const instructions because v128.const has a very large encoding. However, in d5b7a4e2e8 we switched to preferring consts because they are expected to be more efficient in engines. This patch updates the ISel patterns to match this current preference. 
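The size trade-off mentioned in this log can be estimated with a small sketch. The byte counts are assumptions taken from the WebAssembly binary format rather than from this patch: scalar constants use a one-byte opcode plus their immediate, SIMD instructions use the `0xfd` prefix plus a one-byte opcode, and `v128.const` carries a 16-byte immediate. The function names are hypothetical.

```python
def sleb128_len(value):
    """Number of bytes in the signed LEB128 encoding of value."""
    count = 0
    while True:
        byte = value & 0x7F
        value >>= 7
        count += 1
        sign_set = (byte & 0x40) != 0
        if (value == 0 and not sign_set) or (value == -1 and sign_set):
            return count


def scalar_const_size(kind, value=0):
    """Encoded size of i32/i64/f32/f64.const: opcode byte plus immediate."""
    if kind in ("i32", "i64"):
        return 1 + sleb128_len(value)
    return 1 + {"f32": 4, "f64": 8}[kind]


def splat_size(kind, value=0):
    """Scalar constant followed by a splat (prefix byte plus opcode byte)."""
    return scalar_const_size(kind, value) + 2


def v128_const_size():
    """Prefix byte, opcode byte and a 16-byte immediate."""
    return 2 + 16
```

Under these assumptions a splat of the integer 42 takes 4 bytes and an f32 splat takes 7, against 18 bytes for `v128.const`, which is the size cost the commit accepts in exchange for the expected speed in engines.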
Differential Revision: https://reviews.llvm.org/D83581 Added: Modified: llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td llvm/test/CodeGen/WebAssembly/simd-arith.ll llvm/test/CodeGen/WebAssembly/simd.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td b/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td index 814bb80fb693..4f3da2f35c61 100644 --- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td +++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td @@ -328,8 +328,6 @@ def splat16 : PatFrag<(ops node:$x), (build_vector multiclass Splat simdop> { - // Prefer splats over v128.const for const splats (65 is lowest that works) - let AddedComplexity = 65 in defm SPLAT_#vec_t : SIMD_I<(outs V128:$dst), (ins reg_t:$x), (outs), (ins), [(set (vec_t V128:$dst), (splat_pat reg_t:$x))], vec#".splat\t$dst, $x", vec#".splat", simdop>; diff --git a/llvm/test/CodeGen/WebAssembly/simd-arith.ll b/llvm/test/CodeGen/WebAssembly/simd-arith.ll index c56566991a8c..fca4710b582f 100644 --- a/llvm/test/CodeGen/WebAssembly/simd-arith.ll +++ b/llvm/test/CodeGen/WebAssembly/simd-arith.ll @@ -1278,9 +1278,8 @@ define <4 x float> @abs_v4f32(<4 x float> %x) { ; CHECK-LABEL: min_unordered_v4f32: ; NO-SIMD128-NOT: f32x4 ; SIMD128-NEXT: .functype min_unordered_v4f32 (v128) -> (v128){{$}} -; SIMD128-NEXT: f32.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f32x4.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; SIMD128-NEXT: f32x4.min $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2, 0x1.4p2, 0x1.4p2{{$}} +; SIMD128-NEXT: f32x4.min $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <4 x float> @min_unordered_v4f32(<4 x float> %x) { %cmps = fcmp ule <4 x float> %x, @@ -1292,9 +1291,8 @@ define <4 x float> @min_unordered_v4f32(<4 x float> %x) { ; CHECK-LABEL: max_unordered_v4f32: ; NO-SIMD128-NOT: f32x4 
; SIMD128-NEXT: .functype max_unordered_v4f32 (v128) -> (v128){{$}} -; SIMD128-NEXT: f32.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f32x4.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; SIMD128-NEXT: f32x4.max $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2, 0x1.4p2, 0x1.4p2 +; SIMD128-NEXT: f32x4.max $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <4 x float> @max_unordered_v4f32(<4 x float> %x) { %cmps = fcmp uge <4 x float> %x, @@ -1306,9 +1304,8 @@ define <4 x float> @max_unordered_v4f32(<4 x float> %x) { ; CHECK-LABEL: min_ordered_v4f32: ; NO-SIMD128-NOT: f32x4 ; SIMD128-NEXT: .functype min_ordered_v4f32 (v128) -> (v128){{$}} -; SIMD128-NEXT: f32.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f32x4.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; SIMD128-NEXT: f32x4.min $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2, 0x1.4p2, 0x1.4p2{{$}} +; SIMD128-NEXT: f32x4.min $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <4 x float> @min_ordered_v4f32(<4 x float> %x) { %cmps = fcmp ole <4 x float> , %x @@ -1320,9 +1317,8 @@ define <4 x float> @min_ordered_v4f32(<4 x float> %x) { ; CHECK-LABEL: max_ordered_v4f32: ; NO-SIMD128-NOT: f32x4 ; SIMD128-NEXT: .functype max_ordered_v4f32 (v128) -> (v128){{$}} -; SIMD128-NEXT: f32.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f32x4.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; SIMD128-NEXT: f32x4.max $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2, 0x1.4p2, 0x1.4p2{{$}} +; SIMD128-NEXT: f32x4.max $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <4 x float> @max_ordered_v4f32(<4 x float> %x) { %cmps = fcmp oge <4 x float> , %x @@ -1378,8 +1374,7 @@ define <4 x float> @maxnum_intrinsic_v4f32(<4 x float> %x, <4 x float> %y) { ; CHECK-LABEL: 
min_const_intrinsic_v4f32: ; NO-SIMD128-NOT: f32x4 ; SIMD128-NEXT: .functype min_const_intrinsic_v4f32 () -> (v128){{$}} -; SIMD128-NEXT: f32.const $push[[L:[0-9]+]]=, 0x1.4p2{{$}} -; SIMD128-NEXT: f32x4.splat $push[[R:[0-9]+]]=, $pop[[L]]{{$}} +; SIMD128-NEXT: v128.const $push[[R:[0-9]+]]=, 0x1.4p2, 0x1.4p2, 0x1.4p2, 0x1.4p2{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <4 x float> @min_const_intrinsic_v4f32() { %a = call <4 x float> @llvm.minimum.v4f32( @@ -1392,8 +1387,7 @@ define <4 x float> @min_const_intrinsic_v4f32() { ; CHECK-LABEL: max_const_intrinsic_v4f32: ; NO-SIMD128-NOT: f32x4 ; SIMD128-NEXT: .functype max_const_intrinsic_v4f32 () -> (v128){{$}} -; SIMD128-NEXT: f32.const $push[[L:[0-9]+]]=, 0x1.5p5{{$}} -; SIMD128-NEXT: f32x4.splat $push[[R:[0-9]+]]=, $pop[[L]]{{$}} +; SIMD128-NEXT: v128.const $push[[R:[0-9]+]]=, 0x1.5p5, 0x1.5p5, 0x1.5p5, 0x1.5p5{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <4 x float> @max_const_intrinsic_v4f32() { %a = call <4 x float> @llvm.maximum.v4f32( @@ -1482,9 +1476,8 @@ define <2 x double> @abs_v2f64(<2 x double> %x) { ; CHECK-LABEL: min_unordered_v2f64: ; NO-SIMD128-NOT: f64x2 ; SIMD128-NEXT: .functype min_unordered_v2f64 (v128) -> (v128){{$}} -; SIMD128-NEXT: f64.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f64x2.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; SIMD128-NEXT: f64x2.min $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2{{$}} +; SIMD128-NEXT: f64x2.min $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <2 x double> @min_unordered_v2f64(<2 x double> %x) { %cmps = fcmp ule <2 x double> %x, @@ -1496,9 +1489,8 @@ define <2 x double> @min_unordered_v2f64(<2 x double> %x) { ; CHECK-LABEL: max_unordered_v2f64: ; NO-SIMD128-NOT: f64x2 ; SIMD128-NEXT: .functype max_unordered_v2f64 (v128) -> (v128){{$}} -; SIMD128-NEXT: f64.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f64x2.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; 
SIMD128-NEXT: f64x2.max $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2{{$}} +; SIMD128-NEXT: f64x2.max $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <2 x double> @max_unordered_v2f64(<2 x double> %x) { %cmps = fcmp uge <2 x double> %x, @@ -1510,9 +1502,8 @@ define <2 x double> @max_unordered_v2f64(<2 x double> %x) { ; CHECK-LABEL: min_ordered_v2f64: ; NO-SIMD128-NOT: f64x2 ; SIMD128-NEXT: .functype min_ordered_v2f64 (v128) -> (v128){{$}} -; SIMD128-NEXT: f64.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f64x2.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; SIMD128-NEXT: f64x2.min $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2{{$}} +; SIMD128-NEXT: f64x2.min $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <2 x double> @min_ordered_v2f64(<2 x double> %x) { %cmps = fcmp ole <2 x double> , %x @@ -1524,9 +1515,8 @@ define <2 x double> @min_ordered_v2f64(<2 x double> %x) { ; CHECK-LABEL: max_ordered_v2f64: ; NO-SIMD128-NOT: f64x2 ; SIMD128-NEXT: .functype max_ordered_v2f64 (v128) -> (v128){{$}} -; SIMD128-NEXT: f64.const $push[[L0:[0-9]+]]=, 0x1.4p2 -; SIMD128-NEXT: f64x2.splat $push[[L1:[0-9]+]]=, $pop[[L0]] -; SIMD128-NEXT: f64x2.max $push[[R:[0-9]+]]=, $0, $pop[[L1]]{{$}} +; SIMD128-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1.4p2, 0x1.4p2{{$}} +; SIMD128-NEXT: f64x2.max $push[[R:[0-9]+]]=, $0, $pop[[L0]]{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <2 x double> @max_ordered_v2f64(<2 x double> %x) { %cmps = fcmp oge <2 x double> , %x @@ -1560,8 +1550,7 @@ define <2 x double> @max_intrinsic_v2f64(<2 x double> %x, <2 x double> %y) { ; CHECK-LABEL: min_const_intrinsic_v2f64: ; NO-SIMD128-NOT: f64x2 ; SIMD128-NEXT: .functype min_const_intrinsic_v2f64 () -> (v128){{$}} -; SIMD128-NEXT: f64.const $push[[L:[0-9]+]]=, 0x1.4p2{{$}} -; SIMD128-NEXT: f64x2.splat $push[[R:[0-9]+]]=, 
$pop[[L]]{{$}} +; SIMD128-NEXT: v128.const $push[[R:[0-9]+]]=, 0x1.4p2, 0x1.4p2{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <2 x double> @min_const_intrinsic_v2f64() { %a = call <2 x double> @llvm.minimum.v2f64( @@ -1574,8 +1563,7 @@ define <2 x double> @min_const_intrinsic_v2f64() { ; CHECK-LABEL: max_const_intrinsic_v2f64: ; NO-SIMD128-NOT: f64x2 ; SIMD128-NEXT: .functype max_const_intrinsic_v2f64 () -> (v128){{$}} -; SIMD128-NEXT: f64.const $push[[L:[0-9]+]]=, 0x1.5p5{{$}} -; SIMD128-NEXT: f64x2.splat $push[[R:[0-9]+]]=, $pop[[L]]{{$}} +; SIMD128-NEXT: v128.const $push[[R:[0-9]+]]=, 0x1.5p5, 0x1.5p5{{$}} ; SIMD128-NEXT: return $pop[[R]]{{$}} define <2 x double> @max_const_intrinsic_v2f64() { %a = call <2 x double> @llvm.maximum.v2f64( diff --git a/llvm/test/CodeGen/WebAssembly/simd.ll b/llvm/test/CodeGen/WebAssembly/simd.ll index 2934d2c9beac..25e647f07230 100644 --- a/llvm/test/CodeGen/WebAssembly/simd.ll +++ b/llvm/test/CodeGen/WebAssembly/simd.ll @@ -36,7 +36,7 @@ define <16 x i8> @splat_v16i8(i8 %x) { } ; CHECK-LABEL: const_splat_v16i8: -; SIMD128: i8x16.splat +; SIMD128: v128.const $push0=, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42{{$}} define <16 x i8> @const_splat_v16i8() { ret <16 x i8> @@ -299,7 +299,7 @@ define <8 x i16> @splat_v8i16(i16 %x) { } ; CHECK-LABEL: const_splat_v8i16: -; SIMD128: i16x8.splat +; SIMD128: v128.const $push0=, 42, 42, 42, 42, 42, 42, 42, 42{{$}} define <8 x i16> @const_splat_v8i16() { ret <8 x i16> } @@ -547,7 +547,7 @@ define <4 x i32> @splat_v4i32(i32 %x) { } ; CHECK-LABEL: const_splat_v4i32: -; SIMD128: i32x4.splat +; SIMD128: v128.const $push0=, 42, 42, 42, 42{{$}} define <4 x i32> @const_splat_v4i32() { ret <4 x i32> } @@ -698,7 +698,7 @@ define <2 x i64> @splat_v2i64(i64 %x) { } ; CHECK-LABEL: const_splat_v2i64: -; SIMD128: i64x2.splat +; SIMD128: v128.const $push0=, 42, 42{{$}} define <2 x i64> @const_splat_v2i64() { ret <2 x i64> } @@ -847,7 +847,7 @@ define <4 x float> @splat_v4f32(float %x) { 
} ; CHECK-LABEL: const_splat_v4f32 -; SIMD128: f32x4.splat +; SIMD128: v128.const $push0=, 0x1.5p5, 0x1.5p5, 0x1.5p5, 0x1.5p5{{$}} define <4 x float> @const_splat_v4f32() { ret <4 x float> } @@ -998,7 +998,7 @@ define <2 x double> @splat_v2f64(double %x) { } ; CHECK-LABEL: const_splat_v2f64: -; SIMD128: f64x2.splat +; SIMD128: v128.const $push0=, 0x1.5p5, 0x1.5p5{{$}} define <2 x double> @const_splat_v2f64() { ret <2 x double> } From llvm-commits at lists.llvm.org Fri Jul 10 18:29:11 2020 From: llvm-commits at lists.llvm.org (David Truby via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:29:11 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: <01864cd7ffb74199107cfd11bf3ffd24@localhost.localdomain> DavidTruby accepted this revision. DavidTruby added a comment. This is great! Much neater than what I wrote for the combined constructs before and sharable with the rest of llvm :) Thanks! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 From llvm-commits at lists.llvm.org Fri Jul 10 18:33:19 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Fri, 10 Jul 2020 18:33:19 -0700 (PDT) Subject: [llvm] 28acaf8 - [RISCV][test] Add a test for (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2) transformation Message-ID: <5f0916df.1c69fb81.8d274.1c38@mx.google.com> Author: Ben Shi Date: 2020-07-10T18:33:12-07:00 New Revision: 28acaf84230fd246311f1cf18bad34b4bc3e5c0c URL: https://github.com/llvm/llvm-project/commit/28acaf84230fd246311f1cf18bad34b4bc3e5c0c DIFF: https://github.com/llvm/llvm-project/commit/28acaf84230fd246311f1cf18bad34b4bc3e5c0c.diff LOG: [RISCV][test] Add a test for (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2) transformation Reviewed By: lenary, MaskRay Differential Revision: https://reviews.llvm.org/D83159 Added: llvm/test/CodeGen/RISCV/addimm-mulimm.ll Modified: Removed: ################################################################################ diff --git a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll new file mode 100644 index 000000000000..6a2b3c3f0e3e --- /dev/null +++ b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll @@ -0,0 +1,95 @@ +;; Test that (mul (add x, c1), c2) can be transformed to +;; (add (mul x, c2), c1*c2) if profitable. 
+ +; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV32IM %s +; RUN: llc -mtriple=riscv64 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV64IM %s + +define signext i32 @add_mul_trans_accept_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 11 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: addi a0, a0, 407 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 11 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: addiw a0, a0, 407 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 37 + %tmp1 = mul i32 %tmp0, 11 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_accept_2(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_2 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 13 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 28 +; RV32IM-NEXT: addi a1, a1, 1701 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_2 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 13 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 28 +; RV64IM-NEXT: addiw a1, a1, 1701 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 8953 + %tmp1 = mul i32 %tmp0, 13 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_reject_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 19 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 9 +; RV32IM-NEXT: addi a1, a1, 585 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_reject_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 19 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 9 +; RV64IM-NEXT: addiw a1, a1, 585 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1971 + %tmp1 = mul i32 %tmp0, 19 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_2(i32 %x) { +; RV32IM: # 
%bb.0: +; RV32IM-NEXT: lui a1, 792 +; RV32IM-NEXT: addi a1, a1, -1709 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 1014660 +; RV32IM-NEXT: addi a1, a1, -1891 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM: # %bb.0: +; RV64IM-NEXT: lui a1, 792 +; RV64IM-NEXT: addiw a1, a1, -1709 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 1014660 +; RV64IM-NEXT: addiw a1, a1, -1891 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1841231 + %tmp1 = mul i32 %tmp0, 3242323 + ret i32 %tmp1 +} From llvm-commits at lists.llvm.org Fri Jul 10 18:33:35 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:33:35 +0000 (UTC) Subject: [PATCH] D83159: [RISCV][test] Add a new codegen test of add-mul transform In-Reply-To: References: Message-ID: <1697413b1d44c9fbca912230973a3124@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG28acaf84230f: [RISCV][test] Add a test for (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2)… (authored by benshi001, committed by MaskRay). Changed prior to commit: https://reviews.llvm.org/D83159?vs=277184&id=277190#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83159/new/ https://reviews.llvm.org/D83159 Files: llvm/test/CodeGen/RISCV/addimm-mulimm.ll Index: llvm/test/CodeGen/RISCV/addimm-mulimm.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/RISCV/addimm-mulimm.ll @@ -0,0 +1,95 @@ +;; Test that (mul (add x, c1), c2) can be transformed to +;; (add (mul x, c2), c1*c2) if profitable. 
+ +; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV32IM %s +; RUN: llc -mtriple=riscv64 -mattr=+m -verify-machineinstrs < %s \ +; RUN: | FileCheck -check-prefix=RV64IM %s + +define signext i32 @add_mul_trans_accept_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 11 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: addi a0, a0, 407 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 11 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: addiw a0, a0, 407 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 37 + %tmp1 = mul i32 %tmp0, 11 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_accept_2(i32 %x) { +; RV32IM-LABEL: add_mul_trans_accept_2 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 13 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 28 +; RV32IM-NEXT: addi a1, a1, 1701 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_accept_2 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 13 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 28 +; RV64IM-NEXT: addiw a1, a1, 1701 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 8953 + %tmp1 = mul i32 %tmp0, 13 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_1(i32 %x) { +; RV32IM-LABEL: add_mul_trans_reject_1 +; RV32IM: # %bb.0: +; RV32IM-NEXT: addi a1, zero, 19 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 9 +; RV32IM-NEXT: addi a1, a1, 585 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM-LABEL: add_mul_trans_reject_1 +; RV64IM: # %bb.0: +; RV64IM-NEXT: addi a1, zero, 19 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 9 +; RV64IM-NEXT: addiw a1, a1, 585 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1971 + %tmp1 = mul i32 %tmp0, 19 + ret i32 %tmp1 +} + +define signext i32 @add_mul_trans_reject_2(i32 %x) { +; RV32IM: # 
%bb.0: +; RV32IM-NEXT: lui a1, 792 +; RV32IM-NEXT: addi a1, a1, -1709 +; RV32IM-NEXT: mul a0, a0, a1 +; RV32IM-NEXT: lui a1, 1014660 +; RV32IM-NEXT: addi a1, a1, -1891 +; RV32IM-NEXT: add a0, a0, a1 +; RV32IM-NEXT: ret +; +; RV64IM: # %bb.0: +; RV64IM-NEXT: lui a1, 792 +; RV64IM-NEXT: addiw a1, a1, -1709 +; RV64IM-NEXT: mul a0, a0, a1 +; RV64IM-NEXT: lui a1, 1014660 +; RV64IM-NEXT: addiw a1, a1, -1891 +; RV64IM-NEXT: addw a0, a0, a1 +; RV64IM-NEXT: ret + %tmp0 = add i32 %x, 1841231 + %tmp1 = mul i32 %tmp0, 3242323 + ret i32 %tmp1 +} -------------- next part -------------- A non-text attachment was scrubbed... Name: D83159.277190.patch Type: text/x-patch Size: 3032 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 18:41:09 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:41:09 +0000 (UTC) Subject: [PATCH] D83597: [COFF] Add cg_profile directive and .llvm.call-graph-profile section In-Reply-To: References: Message-ID: MaskRay added a comment. Hi, your git commit contains extra Phabricator tags. You can drop `Reviewers:` `Subscribers:` `Tags:` and the text `Summary:` from the git commit with the following script:

  arcfilter () {
    arc amend
    git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:$/ {sub(/^Summary: /,"");print}' | git commit --amend --date=now -F -
  }

`Reviewed By: ` is considered important by some people. Please keep the tag. (`--date=now` is my personal preference (author dates are usually not useful. Using committer dates can make log almost monotonic in time)) `llvm/utils/git/pre-push.py` can validate the message does not include unneeded tags.
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83597/new/ https://reviews.llvm.org/D83597 From llvm-commits at lists.llvm.org Fri Jul 10 18:43:37 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:43:37 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <7f4bd37d44a5a5d1652cbc5ce0686f3a@localhost.localdomain> MaskRay added a comment. Hi, your git commit contains extra Phabricator tags. You can drop `Reviewers:` `Subscribers:` `Tags:` and the text `Summary:` from the git commit with the following script:

  arcfilter () {
    arc amend
    git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:$/ {sub(/^Summary: /,"");print}' | git commit --amend --date=now -F -
  }

`Reviewed By: ` is considered important by some people. Please keep the tag. (`--date=now` is my personal preference (author dates are usually not useful. Using committer dates can make log almost monotonic in time)) `llvm/utils/git/pre-push.py` can validate the message does not include unneeded tags. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 From llvm-commits at lists.llvm.org Fri Jul 10 18:46:24 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:46:24 +0000 (UTC) Subject: [PATCH] D83605: [SelectionDAG][WebAssembly] Recognize splat value ABS operations Message-ID: tlively created this revision. tlively added reviewers: aheejin, dschuff, RKSimon, david-arm. Herald added subscribers: llvm-commits, sunfish, hiraditya, jgravelle-google, sbc100. Herald added a project: LLVM.
This patch gives SelectionDAG::isSplatValue the ability to look through ABS nodes, which allows the combine introduced in D83602 to scalarize ABS operations when only a single lane is needed. WebAssembly does not support ABS natively as a scalar operation, but we still want it to be scalarized and expanded, so this patch also adds a custom combine in the WebAssembly backend that scalarizes ABS specifically when it is a splat and only one lane is needed. This is useful when scalarizing WebAssembly shift values, which often contain ABS operations in Halide output. Depends on D83602 . Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83605 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83605.277192.patch Type: text/x-patch Size: 7042 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 18:48:22 2020 From: llvm-commits at lists.llvm.org (Hal Finkel via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 01:48:22 +0000 (UTC) Subject: [PATCH] D70326: [docs] LLVM Security Group and Process In-Reply-To: References: Message-ID: <4259d9f1e4d1e8cd1429a685efcb1c51@localhost.localdomain> hfinkel added inline comments. ================ Comment at: llvm/docs/Security.rst:177 +* All security issues (as well as nomination / removal discussions) become public within approximately fourteen weeks of the fix landing in the LLVM repository. Precautions should be taken to avoid disclosing particularly sensitive data included in the report (e.g. username and password pairs). + + ---------------- I recommend that part of this process, presumably at the end, be directed at fulfilling goal #6 above ("Strive to improve security over time, for example by adding additional testing, fuzzing, and hardening after fixing issues."). 
Maybe something along the lines of: LLVM bug reports will be filed against fuzz testers and/or other components to detail gaps in testing coverage that seem likely to prevent similar cases from arising in the future. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70326/new/ https://reviews.llvm.org/D70326 From llvm-commits at lists.llvm.org Fri Jul 10 19:10:30 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:10:30 +0000 (UTC) Subject: [PATCH] D83606: [DAGCombiner][WebAssembly] Combine shuffles of more splat vals Message-ID: tlively created this revision. tlively added reviewers: aheejin, dschuff, spatel. Herald added subscribers: llvm-commits, ecnelises, sunfish, hiraditya, jgravelle-google, sbc100. Herald added a project: LLVM. combineShuffleOfSplatVal previously only combined shuffles of splat values that were themselves shuffles. This patch generalizes the combine to also combine away shuffles of arbitrary splat values recognized by SelectionDAG::isSplatValue, as long as doing so does not create any new undefined lanes. On the WebAssembly side, this patch also introduces a new custom combine to remove undefined lanes from splatting build_vectors. Without this extra combine, the new generic shuffle combine would be inhibited on interesting cases such as the shl_abs_add function in simd-shift-complex-splats.ll because it would expose the undefined lanes. Depends on D83605 . Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83606 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-build-vector.ll llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll llvm/test/CodeGen/X86/vector-fshr-rot-256.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83606.277194.patch Type: text/x-patch Size: 9569 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:10:42 2020 From: llvm-commits at lists.llvm.org (Christudasan Devadasan via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:10:42 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. In-Reply-To: References: Message-ID: <11916716f3fb64dc782315c7b8f08b17@localhost.localdomain> cdevadas updated this revision to Diff 277193. cdevadas added a comment. Incorporated the suggestions. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83584/new/ https://reviews.llvm.org/D83584 Files: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll Index: llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll @@ -0,0 +1,60 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs -stop-after=amdgpu-isel -o - %s | FileCheck -check-prefix=GCN %s +define void @test() #1 { + ; Clean up the unreachable blocks introduced with LowerSwitch pass. + ; This test ensures that, in the pass flow, UnreachableBlockElim pass + ; follows the LowerSwitch. Otherwise, this testcase will crash + ; immediately after the instruction selection due to the incomplete + ; PHI node in an MBB whose incoming values were never codegenerated. 
+ ; + ; GCN-LABEL: name: test + ; GCN: bb.{{[0-9]+}}.entry: + ; GCN: bb.{{[0-9]+}}.entry.true.blk: + ; GCN: bb.{{[0-9]+}}.entry.false.blk: + ; GCN: bb.{{[0-9]+}}.switch.blk: + + ; GCN-NOT: bb.{{[0-9]+}}.preheader.blk + ; GCN-NOT: bb.{{[0-9]+}}.pre.false.blk: + ; GCN-NOT: bb.{{[0-9]+}}.unreach.blk: + ; GCN-NOT: PHI + + ; GCN: bb.{{[0-9]+}}.exit: + entry: + %idx = tail call i32 @llvm.amdgcn.workitem.id.x() #0 + br i1 undef, label %entry.true.blk, label %entry.false.blk + + entry.true.blk: ; preds = %entry + %exit.cmp = icmp ult i32 %idx, 3 + br i1 %exit.cmp, label %switch.blk, label %exit + + entry.false.blk: ; preds = %entry + unreachable + + switch.blk: ; preds = %entry.true.blk + switch i32 %idx, label %preheader.blk [ + i32 0, label %exit + i32 1, label %exit + i32 2, label %exit + ] + + preheader.blk: ; preds = %switch.blk + %pre.exit = icmp ult i32 %idx, 5 + br i1 %pre.exit, label %unreach.blk, label %pre.false.blk + + pre.false.blk: ; preds = %preheader.blk + %call.pre.false = tail call i32 @func(i32 %idx) #0 + br label %unreach.blk + + unreach.blk: ; preds = %preheader.blk, %pre.false.blk + %phi.val = phi i32 [ %call.pre.false, %pre.false.blk ], [ undef, %preheader.blk ] + store i32 %phi.val, i32* undef + unreachable + + exit: ; preds = %switch.blk + ret void +} + +declare i32 @llvm.amdgcn.workitem.id.x() #0 +declare i32 @func(i32)#0 + +attributes #0 = { nounwind readnone } +attributes #1 = { nounwind } Index: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -787,10 +787,14 @@ if (EnableLoadStoreVectorizer) addPass(createLoadStoreVectorizerPass()); + + // Moved it from PreISel so that any unreachable blocks + // introduced during the transformation will be removed by + // the UnreachableBlockElim pass. 
+ addPass(createLowerSwitchPass()); } bool AMDGPUPassConfig::addPreISel() { - addPass(createLowerSwitchPass()); addPass(createFlattenCFGPass()); return false; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83584.277193.patch Type: text/x-patch Size: 3181 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:12:32 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:12:32 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: <2162ba2c38c35955eb0c8f9951018f00@localhost.localdomain> clementval updated this revision to Diff 277195. clementval marked 2 inline comments as done. clementval added a comment. Herald added a subscriber: mgorny. Rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 Files: flang/lib/Semantics/check-omp-structure.cpp flang/lib/Semantics/check-omp-structure.h flang/test/Semantics/omp-clause-validity01.f90 llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp llvm/utils/TableGen/TableGen.cpp llvm/utils/TableGen/TableGenBackends.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83326.277195.patch Type: text/x-patch Size: 79481 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:12:58 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:12:58 +0000 (UTC) Subject: [PATCH] D67687: [CodeGen] Define an interface for the new pass manager. 
(new) In-Reply-To: References: Message-ID: <60862ee14df8c65d75be5025669abd89@localhost.localdomain> ychen updated this revision to Diff 277196. ychen added a comment. - Update Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D67687/new/ https://reviews.llvm.org/D67687 Files: llvm/include/llvm/CodeGen/MachinePassManager.h llvm/include/llvm/IR/PassManager.h llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/LLVMBuild.txt llvm/lib/CodeGen/MachinePassManager.cpp llvm/unittests/CodeGen/CMakeLists.txt llvm/unittests/CodeGen/PassManagerTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D67687.277196.patch Type: text/x-patch Size: 22803 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:13:53 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:13:53 +0000 (UTC) Subject: [PATCH] D83606: [DAGCombiner][WebAssembly] Combine shuffles of more splat vals In-Reply-To: References: Message-ID: tlively updated this revision to Diff 277197. tlively added a comment. - Remove TODO Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83606/new/ https://reviews.llvm.org/D83606 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-build-vector.ll llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll llvm/test/CodeGen/X86/vector-fshr-rot-256.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83606.277197.patch Type: text/x-patch Size: 9663 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:17:11 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:17:11 +0000 (UTC) Subject: [PATCH] D83607: [NewPM][CodeGen] Port MIRPrinter to NewPM Message-ID: ychen created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83607 Files: llvm/include/llvm/CodeGen/MIRPrinter.h llvm/lib/CodeGen/MIRPrintingPass.cpp Index: llvm/lib/CodeGen/MIRPrintingPass.cpp =================================================================== --- llvm/lib/CodeGen/MIRPrintingPass.cpp +++ llvm/lib/CodeGen/MIRPrintingPass.cpp @@ -16,10 +16,29 @@ #include "llvm/CodeGen/Passes.h" #include "llvm/InitializePasses.h" #include "llvm/Support/Debug.h" +#include "llvm/Support/Error.h" #include "llvm/Support/raw_ostream.h" using namespace llvm; +PreservedAnalyses PrintMIRPass::run(MachineFunction &MF, + MachineFunctionAnalysisManager &) { + std::string Str; + raw_string_ostream StrOS(Str); + printMIR(StrOS, MF); + MachineFunctions.append(StrOS.str()); + return PreservedAnalyses::all(); +} + +Error PrintMIRPass::doFinalization(Module &M, + MachineFunctionAnalysisManager &) { + printMIR(OS, M); + OS << MachineFunctions; + return Error::success(); +} + +AnalysisKey PrintMIRPass::Key; + namespace { /// This pass prints out the LLVM IR to an output stream using the MIR Index: llvm/include/llvm/CodeGen/MIRPrinter.h =================================================================== --- llvm/include/llvm/CodeGen/MIRPrinter.h +++ llvm/include/llvm/CodeGen/MIRPrinter.h @@ -14,6 +14,8 @@ #ifndef LLVM_LIB_CODEGEN_MIRPRINTER_H #define LLVM_LIB_CODEGEN_MIRPRINTER_H +#include "llvm/CodeGen/MachinePassManager.h" + namespace llvm { class MachineBasicBlock; @@ -22,6 +24,20 @@ class raw_ostream; template class 
SmallVectorImpl; +class PrintMIRPass : public PassInfoMixin { + raw_ostream &OS; + std::string MachineFunctions; + +public: + PrintMIRPass() : OS(dbgs()) {} + PrintMIRPass(raw_ostream &OS) : OS(OS) {} + + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + Error doFinalization(Module &M, MachineFunctionAnalysisManager &); + + static AnalysisKey Key; +}; + /// Print LLVM IR using the MIR serialization format to the given output stream. void printMIR(raw_ostream &OS, const Module &M); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83607.277198.patch Type: text/x-patch Size: 2030 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:18:32 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:18:32 +0000 (UTC) Subject: [PATCH] D83608: [NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline Message-ID: ychen created this revision. Herald added subscribers: llvm-commits, jfb, aheejin, hiraditya, mgorny. Herald added a project: LLVM. It is the counterpart of `TargetPassConfig` for legacy PM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83608 Files: llvm/include/llvm/CodeGen/CGPassBuilderOption.h llvm/include/llvm/CodeGen/CodeGenPassBuilder.h llvm/include/llvm/CodeGen/MachinePassRegistry.def llvm/include/llvm/Passes/StandardInstrumentations.h llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/CodeGenPassBuilder.cpp llvm/lib/CodeGen/TargetPassConfig.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83608.277199.patch Type: text/x-patch Size: 75111 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:20:09 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:20:09 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: wmi marked 5 inline comments as done. wmi added inline comments. ================ Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:293 +static void writeInstrProfile(StringRef OutputFilename, + ProfileFormat OutputFormat, ---------------- davidxl wrote: > this refactoring can also be committed independently Done in https://reviews.llvm.org/D83521 ================ Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:553 + std::unique_ptr WC; + // Make sure Inputs[i] is sample profile and Inputs[i - 1] is + // instrumentation profile. ---------------- davidxl wrote: > make sample file path as the part of the option, so there is no need to handle the ordering. Indeed that will save the ordering-handling logic, but I want to use weighted_input to scale the count in sample profile to be roughly the same as the count in instr profile. To support -supplement-instr-with-sample=, will be a little weird and increase complexity. ================ Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:568 + adjustInstrProfiles(WC, Reader, + Inputs[i].Weight / (double)Inputs[1 - i].Weight, + EarlyInlineSizeThreshold, BaseScaleFunction); ---------------- davidxl wrote: > Are these two weights comparable? Yes, given "-weighted-input=2, instr_profile -weighted-input=3, sample_profile", that means we want to scale the count in sample profile by 3/2 before updating the entry in instr profile.
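To make the 3/2 scaling in the reply above concrete, here is a minimal sketch. It is not the code from the patch: `scaleSampleCount` and its parameter names are hypothetical, and truncating the scaled value toward zero is an assumption made only for illustration. The only part taken from the quoted diff is the ratio itself, i.e. the sample profile's weight divided by the instrumentation profile's weight.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of llvm-profdata): scale one sample-profile
// count by the ratio of the two -weighted-input weights, mirroring the
// expression Inputs[i].Weight / (double)Inputs[1 - i].Weight from the diff.
uint64_t scaleSampleCount(uint64_t SampleCount, uint64_t SampleWeight,
                          uint64_t InstrWeight) {
  assert(InstrWeight != 0 && "instrumentation profile weight must be nonzero");
  // Convert to double before dividing so that 3 / 2 yields 1.5, not 1.
  double Ratio = SampleWeight / static_cast<double>(InstrWeight);
  // Assumption: truncate toward zero when converting back to an integer count.
  return static_cast<uint64_t>(SampleCount * Ratio);
}
```

With weights 3 (sample profile) and 2 (instr profile), a sample count of 1000 would be treated as 1500 under this sketch.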
================ Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:872 + cl::opt BaseScaleFunction( + "base-scale-function", cl::init(""), cl::Hidden, + cl::desc("When supplementing an instrumentation profile with sample " ---------------- davidxl wrote: > Is this flag tested? Good point. Added tests for this flag and for the early-inline-size-threshold flag. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81981/new/ https://reviews.llvm.org/D81981 From llvm-commits at lists.llvm.org Fri Jul 10 19:20:50 2020 From: llvm-commits at lists.llvm.org (Wei Mi via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:20:50 +0000 (UTC) Subject: [PATCH] D81981: [PGO] Supplement PGO profile with Sample profile In-Reply-To: References: Message-ID: wmi updated this revision to Diff 277201. wmi marked an inline comment as done. wmi added a comment. Address David's comments. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81981/new/ https://reviews.llvm.org/D81981 Files: llvm/docs/CommandGuide/llvm-profdata.rst llvm/include/llvm/ProfileData/InstrProf.h llvm/include/llvm/ProfileData/InstrProfWriter.h llvm/lib/ProfileData/InstrProf.cpp llvm/lib/ProfileData/InstrProfWriter.cpp llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test llvm/tools/llvm-profdata/llvm-profdata.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D81981.277201.patch Type: text/x-patch Size: 19429 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:22:09 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:22:09 +0000 (UTC) Subject: [PATCH] D83609: [NewPM][CodeGen] Support printing machine functions from StandardInstrumentation Message-ID: ychen created this revision. Herald added subscribers: llvm-commits, hiraditya.
Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83609 Files: llvm/lib/Passes/StandardInstrumentations.cpp Index: llvm/lib/Passes/StandardInstrumentations.cpp =================================================================== --- llvm/lib/Passes/StandardInstrumentations.cpp +++ llvm/lib/Passes/StandardInstrumentations.cpp @@ -17,6 +17,8 @@ #include "llvm/Analysis/CallGraphSCCPass.h" #include "llvm/Analysis/LazyCallGraph.h" #include "llvm/Analysis/LoopInfo.h" +#include "llvm/CodeGen/MIRPrinter.h" +#include "llvm/CodeGen/MachineFunction.h" #include "llvm/IR/Function.h" #include "llvm/IR/IRPrintingPasses.h" #include "llvm/IR/Module.h" @@ -43,6 +45,15 @@ return std::make_pair(M, formatv(" (function: {0})", F->getName()).str()); } + if (any_isa(IR)) { + const MachineFunction *MF = any_cast(IR); + if (!llvm::isFunctionInPrintList(MF->getName())) + return None; + const Module *M = MF->getFunction().getParent(); + return std::make_pair( + M, formatv(" (machine function: {0})", MF->getName()).str()); + } + if (any_isa(IR)) { const LazyCallGraph::SCC *C = any_cast(IR); for (const LazyCallGraph::Node &N : *C) { @@ -109,6 +120,13 @@ llvm::printLoop(const_cast(*L), dbgs(), std::string(Banner)); } +void printIR(const MachineFunction *MF, StringRef Banner, + StringRef Extra = StringRef()) { + if (!llvm::isFunctionInPrintList(MF->getName())) + return; + dbgs() << Banner << Extra << "\n"; + printMIR(dbgs(), *MF); +} /// Generic IR-printing helper that unpacks a pointer to IRUnit wrapped into /// llvm::Any and does actual print job. 
void unwrapAndPrint(Any IR, StringRef Banner, bool ForceModule = false) { @@ -146,6 +164,14 @@ printIR(L, Banner); return; } + + if (any_isa(IR)) { + const MachineFunction *MF = any_cast(IR); + assert(MF && "machine function should be valid for printing"); + printIR(MF, Banner); + return; + } + llvm_unreachable("Unknown wrapped IR type"); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83609.277202.patch Type: text/x-patch Size: 2057 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:23:38 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:23:38 +0000 (UTC) Subject: [PATCH] D83610: [NewPM][CodeGen] Add TargetMachine polymorphic API to build codegen pipeline for NPM Message-ID: ychen created this revision. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83610 Files: llvm/include/llvm/Target/TargetMachine.h llvm/lib/CodeGen/LLVMTargetMachine.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83610.277203.patch Type: text/x-patch Size: 4790 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:25:46 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:25:46 +0000 (UTC) Subject: [PATCH] D83612: [NewPM][CodeGen] Add NPM support to llc Message-ID: ychen created this revision. Herald added subscribers: llvm-commits, hiraditya, mgorny. Herald added a project: LLVM. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83612 Files: llvm/include/llvm/CodeGen/TargetPassConfig.h llvm/include/llvm/Target/TargetMachine.h llvm/lib/CodeGen/TargetPassConfig.cpp llvm/tools/llc/CMakeLists.txt llvm/tools/llc/NewPMDriver.cpp llvm/tools/llc/NewPMDriver.h llvm/tools/llc/llc.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83612.277204.patch Type: text/x-patch Size: 17904 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:26:51 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:26:51 +0000 (UTC) Subject: [PATCH] D83613: [NewPM][CodeGen][X86] Add NPM pipeline builder Message-ID: ychen created this revision. Herald added subscribers: llvm-commits, nikic, jfb, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83613 Files: llvm/lib/Target/X86/LLVMBuild.txt llvm/lib/Target/X86/X86PassRegistry.def llvm/lib/Target/X86/X86TargetMachine.cpp llvm/lib/Target/X86/X86TargetMachine.h llvm/test/CodeGen/Generic/llc-start-stop-instance-errors.ll llvm/test/CodeGen/Generic/new-pm/llc-start-stop.ll llvm/test/CodeGen/X86/new-pm/O0-pipeline.ll llvm/test/CodeGen/X86/new-pm/llc-start-stop-instance.ll llvm/test/CodeGen/X86/new-pm/opt-pipeline.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83613.277205.patch Type: text/x-patch Size: 33617 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:28:34 2020 From: llvm-commits at lists.llvm.org (Yuanfang Chen via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:28:34 +0000 (UTC) Subject: [PATCH] D83614: [NewPM] Add back some codegen IR pass test removed in ebc88811b5c9ed Message-ID: ychen created this revision. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83614 Files: llvm/test/CodeGen/X86/unreachableblockelim.ll llvm/test/Transforms/PreISelIntrinsicLowering/load-relative.ll llvm/test/Transforms/PreISelIntrinsicLowering/objc-arc.ll Index: llvm/test/Transforms/PreISelIntrinsicLowering/objc-arc.ll =================================================================== --- llvm/test/Transforms/PreISelIntrinsicLowering/objc-arc.ll +++ llvm/test/Transforms/PreISelIntrinsicLowering/objc-arc.ll @@ -1,4 +1,5 @@ ; RUN: opt -pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s +; RUN: llc -enable-new-pm -passes='pre-isel-intrinsic-lowering' < %s | FileCheck %s ; Make sure calls to the objc intrinsics are translated to calls in to the ; runtime Index: llvm/test/Transforms/PreISelIntrinsicLowering/load-relative.ll =================================================================== --- llvm/test/Transforms/PreISelIntrinsicLowering/load-relative.ll +++ llvm/test/Transforms/PreISelIntrinsicLowering/load-relative.ll @@ -1,4 +1,5 @@ ; RUN: opt -pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s +; RUN: llc -enable-new-pm -passes='pre-isel-intrinsic-lowering' < %s | FileCheck %s ; CHECK: define i8* @foo32(i8* [[P:%.*]], i32 [[O:%.*]]) define i8* @foo32(i8* %p, i32 %o) { Index: llvm/test/CodeGen/X86/unreachableblockelim.ll =================================================================== --- llvm/test/CodeGen/X86/unreachableblockelim.ll +++ llvm/test/CodeGen/X86/unreachableblockelim.ll @@ -1,4 +1,5 @@ ; RUN: opt -S < %s -unreachableblockelim | FileCheck %s +; RUN: llc -enable-new-pm -passes='unreachableblockelim' < %s | FileCheck %s target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83614.277206.patch Type: text/x-patch Size: 1530 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:31:01 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:31:01 +0000 (UTC) Subject: [PATCH] D83606: [DAGCombiner][WebAssembly] Combine shuffles of more splat vals In-Reply-To: References: Message-ID: <8583754af56eb1de6cdcdba258513b09@localhost.localdomain> tlively updated this revision to Diff 277207. tlively added a comment. - Update another x86 test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83606/new/ https://reviews.llvm.org/D83606 Files: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-build-vector.ll llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll llvm/test/CodeGen/X86/vector-fshl-rot-256.ll llvm/test/CodeGen/X86/vector-fshr-rot-256.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83606.277207.patch Type: text/x-patch Size: 11508 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:31:33 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:31:33 +0000 (UTC) Subject: [PATCH] D83597: [COFF] Add cg_profile directive and .llvm.call-graph-profile section In-Reply-To: References: Message-ID: <62e3587dec50af949dd071674c1e44f3@localhost.localdomain> zequanwu added a comment. This is the same as D81775 , but I put the wrong diff ID in the commit message. Sorry about the confusion. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83597/new/ https://reviews.llvm.org/D83597 From llvm-commits at lists.llvm.org Fri Jul 10 19:34:48 2020 From: llvm-commits at lists.llvm.org (Jinsong Ji via llvm-commits) Date: Fri, 10 Jul 2020 19:34:48 -0700 (PDT) Subject: [llvm] 3e3acc1 - [PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel Message-ID: <5f092548.1c69fb81.c68f9.0e72@mx.google.com> Author: Jinsong Ji Date: 2020-07-11T02:24:12Z New Revision: 3e3acc1cc773f61470d815165704288118cb27e1 URL: https://github.com/llvm/llvm-project/commit/3e3acc1cc773f61470d815165704288118cb27e1 DIFF: https://github.com/llvm/llvm-project/commit/3e3acc1cc773f61470d815165704288118cb27e1.diff LOG: [PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel P9 is the only one with InstrSchedModel, but we may have more in the future, we should not hardcoded it to P9, check hasInstrSchedModel instead. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83590 Added: Modified: llvm/lib/Target/PowerPC/PPCSubtarget.cpp llvm/test/CodeGen/PowerPC/sms-remark.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/PowerPC/PPCSubtarget.cpp b/llvm/lib/Target/PowerPC/PPCSubtarget.cpp index 2acff5d11ce0..3836cc960394 100644 --- a/llvm/lib/Target/PowerPC/PPCSubtarget.cpp +++ b/llvm/lib/Target/PowerPC/PPCSubtarget.cpp @@ -180,7 +180,7 @@ void PPCSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) { bool PPCSubtarget::enableMachineScheduler() const { return true; } bool PPCSubtarget::enableMachinePipeliner() const { - return (CPUDirective == PPC::DIR_PWR9) && EnableMachinePipeliner; + return getSchedModel().hasInstrSchedModel() && EnableMachinePipeliner; } bool PPCSubtarget::useDFAforSMS() const { return false; } diff --git a/llvm/test/CodeGen/PowerPC/sms-remark.ll b/llvm/test/CodeGen/PowerPC/sms-remark.ll index 647b56fa7fcd..e68eb5631df4 100644 
--- a/llvm/test/CodeGen/PowerPC/sms-remark.ll +++ b/llvm/test/CodeGen/PowerPC/sms-remark.ll @@ -1,14 +1,19 @@ ; RUN: llc < %s -ppc-vsr-nums-as-vr -mtriple=powerpc64-unknown-linux-gnu \ ; RUN: -verify-machineinstrs -ppc-asm-full-reg-names -mcpu=pwr9 --ppc-enable-pipeliner \ ; RUN: -pass-remarks-analysis=pipeliner -pass-remarks=pipeliner -o /dev/null 2>&1 \ -; RUN: | FileCheck %s +; RUN: | FileCheck %s --check-prefix=ENABLED +; RUN: llc < %s -ppc-vsr-nums-as-vr -mtriple=powerpc64-unknown-linux-gnu \ +; RUN: -verify-machineinstrs -ppc-asm-full-reg-names -mcpu=pwr8 --ppc-enable-pipeliner \ +; RUN: -pass-remarks-analysis=pipeliner -pass-remarks=pipeliner -o /dev/null 2>&1 \ +; RUN: | FileCheck %s --allow-empty --check-prefix=DISABLED @x = dso_local local_unnamed_addr global <{ i32, i32, i32, i32, [1020 x i32] }> <{ i32 1, i32 2, i32 3, i32 4, [1020 x i32] zeroinitializer }>, align 4 @y = dso_local global [1024 x i32] zeroinitializer, align 4 define dso_local i32* @foo() local_unnamed_addr { -;CHECK: Schedule found with Initiation Interval -;CHECK: Pipelined succesfully! +;ENABLED: Schedule found with Initiation Interval +;ENABLED: Pipelined succesfully! +;DISABLED-NOT: remark entry: %.pre = load i32, i32* getelementptr inbounds ([1024 x i32], [1024 x i32]* @y, i64 0, i64 0), align 4 br label %for.body From llvm-commits at lists.llvm.org Fri Jul 10 19:34:55 2020 From: llvm-commits at lists.llvm.org (Jinsong Ji via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:34:55 +0000 (UTC) Subject: [PATCH] D83590: [PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel In-Reply-To: References: Message-ID: <5da65fb802b61ace99c4071f8e877d96@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG3e3acc1cc773: [PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel (authored by jsji). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83590/new/ https://reviews.llvm.org/D83590 Files: llvm/lib/Target/PowerPC/PPCSubtarget.cpp llvm/test/CodeGen/PowerPC/sms-remark.ll Index: llvm/test/CodeGen/PowerPC/sms-remark.ll =================================================================== --- llvm/test/CodeGen/PowerPC/sms-remark.ll +++ llvm/test/CodeGen/PowerPC/sms-remark.ll @@ -1,14 +1,19 @@ ; RUN: llc < %s -ppc-vsr-nums-as-vr -mtriple=powerpc64-unknown-linux-gnu \ ; RUN: -verify-machineinstrs -ppc-asm-full-reg-names -mcpu=pwr9 --ppc-enable-pipeliner \ ; RUN: -pass-remarks-analysis=pipeliner -pass-remarks=pipeliner -o /dev/null 2>&1 \ -; RUN: | FileCheck %s +; RUN: | FileCheck %s --check-prefix=ENABLED +; RUN: llc < %s -ppc-vsr-nums-as-vr -mtriple=powerpc64-unknown-linux-gnu \ +; RUN: -verify-machineinstrs -ppc-asm-full-reg-names -mcpu=pwr8 --ppc-enable-pipeliner \ +; RUN: -pass-remarks-analysis=pipeliner -pass-remarks=pipeliner -o /dev/null 2>&1 \ +; RUN: | FileCheck %s --allow-empty --check-prefix=DISABLED @x = dso_local local_unnamed_addr global <{ i32, i32, i32, i32, [1020 x i32] }> <{ i32 1, i32 2, i32 3, i32 4, [1020 x i32] zeroinitializer }>, align 4 @y = dso_local global [1024 x i32] zeroinitializer, align 4 define dso_local i32* @foo() local_unnamed_addr { -;CHECK: Schedule found with Initiation Interval -;CHECK: Pipelined succesfully! +;ENABLED: Schedule found with Initiation Interval +;ENABLED: Pipelined succesfully! 
+;DISABLED-NOT: remark entry: %.pre = load i32, i32* getelementptr inbounds ([1024 x i32], [1024 x i32]* @y, i64 0, i64 0), align 4 br label %for.body Index: llvm/lib/Target/PowerPC/PPCSubtarget.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCSubtarget.cpp +++ llvm/lib/Target/PowerPC/PPCSubtarget.cpp @@ -180,7 +180,7 @@ bool PPCSubtarget::enableMachineScheduler() const { return true; } bool PPCSubtarget::enableMachinePipeliner() const { - return (CPUDirective == PPC::DIR_PWR9) && EnableMachinePipeliner; + return getSchedModel().hasInstrSchedModel() && EnableMachinePipeliner; } bool PPCSubtarget::useDFAforSMS() const { return false; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83590.277208.patch Type: text/x-patch Size: 2044 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 19:49:21 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 02:49:21 +0000 (UTC) Subject: [PATCH] D83615: [WebAssembly] Custom combine splat build_vectors into swizzles Message-ID: tlively created this revision. tlively added reviewers: aheejin, dschuff. Herald added subscribers: llvm-commits, sunfish, hiraditya, jgravelle-google, sbc100. Herald added a project: LLVM. Some splat build_vectors can be lowered to swizzles, but only if all of their lanes but one are undefined. This patch adds a custom combine to turn these build_vectors into swizzles before their undefined lanes are combined away by the combine introduced in D83606 . Depends on D83606 . 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83615 Files: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-build-vector.ll Index: llvm/test/CodeGen/WebAssembly/simd-build-vector.ll =================================================================== --- llvm/test/CodeGen/WebAssembly/simd-build-vector.ll +++ llvm/test/CodeGen/WebAssembly/simd-build-vector.ll @@ -93,14 +93,10 @@ ret <8 x i16> %v7 } -;; TODO: This should be a swizzle, but will need a custom combine to -;; preempt the combine that removes undef lanes from splat -;; build_vectors, since swizzle lowering depends on those lanes being -;; undef. - ; CHECK-LABEL: swizzle_one_i8x16: ; CHECK-NEXT: .functype swizzle_one_i8x16 (v128, v128) -> (v128) -; CHECK-NOT: v8x16.swizzle +; CHECK-NEXT: v8x16.swizzle $push[[L0:[0-9]+]]=, $0, $1 +; CHECK-NEXT: return $pop[[L0]] define <16 x i8> @swizzle_one_i8x16(<16 x i8> %src, <16 x i8> %mask) { %m0 = extractelement <16 x i8> %mask, i32 0 %s0 = extractelement <16 x i8> %src, i8 %m0 Index: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp =================================================================== --- llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp +++ llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp @@ -1736,6 +1736,48 @@ return DAG.getBitcast(DstType, NewShuffle); } +static SDValue combineSwizzleSplat(BuildVectorSDNode *N, SelectionDAG &DAG, + const BitVector &UndefElts) { + // This splat can only be lowered to a swizzle if all lanes but one are undef. 
+ // + // We are looking for the following pattern: + // (insert undef, + // (extract Vec1, + // (sext (extract Vec2, Index)) + // ), + // Index + // ) + // + // To combine to: (swizzle vec1, vec2) + if (N->getValueType(0) != MVT::v16i8) + return SDValue(); + if (UndefElts.count() != 15) + return SDValue(); + unsigned Index1 = UndefElts.find_first_unset(); + auto Extract1 = N->getOperand(Index1); + if (Extract1.getOpcode() != ISD::EXTRACT_VECTOR_ELT) + return SDValue(); + auto Vec1 = Extract1.getOperand(0); + if (Vec1.getValueType() != MVT::v16i8) + return SDValue(); + auto SExt = Extract1.getOperand(1); + if (SExt.getOpcode() != ISD::SIGN_EXTEND) + return SDValue(); + auto Extract2 = SExt.getOperand(0); + if (Extract2.getOpcode() != ISD::EXTRACT_VECTOR_ELT) + return SDValue(); + auto Vec2 = Extract2.getOperand(0); + if (Vec2.getValueType() != MVT::v16i8) + return SDValue(); + if (Extract2.getOperand(1).getOpcode() != ISD::Constant) + return SDValue(); + unsigned Index2 = Extract2.getConstantOperandVal(1); + if (Index1 != Index2) + return SDValue(); + + return DAG.getNode(WebAssemblyISD::SWIZZLE, SDLoc(N), MVT::v16i8, Vec1, Vec2); +} + static SDValue performBUILD_VECTORCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { auto &DAG = DCI.DAG; @@ -1744,9 +1786,16 @@ // Remove undef lanes from splats. They don't allow us to make any extra // optimizations and they can inhibit splat scalarization combines. BitVector UndefElts; - if (SDValue SplatVal = Build->getSplatValue(&UndefElts)) - if (UndefElts.any()) + if (SDValue SplatVal = Build->getSplatValue(&UndefElts)) { + if (UndefElts.any()) { + // If this BUILD_VECTOR can be implemented as a swizzle, perform that + // transformation now because once we remove the undef lanes we will not + // be able to recover the swizzle. 
+ if (auto Swizzle = combineSwizzleSplat(Build, DAG, UndefElts)) + return Swizzle; return DAG.getSplatBuildVector(N->getValueType(0), SDLoc(N), SplatVal); + } + } return SDValue(); } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83615.277210.patch Type: text/x-patch Size: 3583 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 20:17:04 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Fri, 10 Jul 2020 20:17:04 -0700 (PDT) Subject: [llvm] e628092 - [X86][MMX] Optimize MMX shift intrinsics. Message-ID: <5f092f30.1c69fb81.5812c.234a@mx.google.com> Author: Wang, Pengfei Date: 2020-07-11T11:16:23+08:00 New Revision: e6280925249c87c11568a305a074581cc073bd45 URL: https://github.com/llvm/llvm-project/commit/e6280925249c87c11568a305a074581cc073bd45 DIFF: https://github.com/llvm/llvm-project/commit/e6280925249c87c11568a305a074581cc073bd45.diff LOG: [X86][MMX] Optimize MMX shift intrinsics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83534 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/mmx-intrinsics.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 695b6ef35f11..721b262aa433 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -25236,6 +25236,9 @@ SDValue X86TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op, // Clamp out of bounds shift amounts since they will otherwise be masked // to 8-bits which may make it no longer out of bounds. 
unsigned ShiftAmount = C->getAPIntValue().getLimitedValue(255); + if (ShiftAmount == 0) + return Op.getOperand(1); + return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Op.getValueType(), Op.getOperand(0), Op.getOperand(1), DAG.getTargetConstant(ShiftAmount, DL, MVT::i32)); diff --git a/llvm/test/CodeGen/X86/mmx-intrinsics.ll b/llvm/test/CodeGen/X86/mmx-intrinsics.ll index 48d4ad0490f7..abc8fea58014 100644 --- a/llvm/test/CodeGen/X86/mmx-intrinsics.ll +++ b/llvm/test/CodeGen/X86/mmx-intrinsics.ll @@ -311,6 +311,19 @@ entry: ret i64 %4 } +define i64 @test72_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test72_2 +; ALL-NOT: psraw +entry: + %0 = bitcast <1 x i64> %a to <4 x i16> + %mmx_var.i = bitcast <4 x i16> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.psrai.w(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <4 x i16> + %3 = bitcast <4 x i16> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psrli.q(x86_mmx, i32) nounwind readnone define i64 @test71(<1 x i64> %a) nounwind readnone optsize ssp { @@ -339,6 +352,19 @@ entry: ret i64 %4 } +define i64 @test70_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test70_2 +; ALL-NOT: psrld +entry: + %0 = bitcast <1 x i64> %a to <2 x i32> + %mmx_var.i = bitcast <2 x i32> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.psrli.d(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <2 x i32> + %3 = bitcast <2 x i32> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psrli.w(x86_mmx, i32) nounwind readnone define i64 @test69(<1 x i64> %a) nounwind readnone optsize ssp { @@ -397,6 +423,19 @@ entry: ret i64 %4 } +define i64 @test66_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test66_2 +; ALL-NOT: psllw +entry: + %0 = bitcast <1 x i64> %a to <4 x i16> + %mmx_var.i = bitcast <4 x i16> %0 to x86_mmx + %1 = tail call x86_mmx 
@llvm.x86.mmx.pslli.w(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <4 x i16> + %3 = bitcast <4 x i16> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psra.d(x86_mmx, x86_mmx) nounwind readnone define i64 @test65(<1 x i64> %a, <1 x i64> %b) nounwind readnone optsize ssp { From llvm-commits at lists.llvm.org Fri Jul 10 20:17:07 2020 From: llvm-commits at lists.llvm.org (Pengfei Wang via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 03:17:07 +0000 (UTC) Subject: [PATCH] D83534: [X86][MMX] Optimize MMX shift intrinsics. In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rGe6280925249c: [X86][MMX] Optimize MMX shift intrinsics. (authored by pengfei). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83534/new/ https://reviews.llvm.org/D83534 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/mmx-intrinsics.ll Index: llvm/test/CodeGen/X86/mmx-intrinsics.ll =================================================================== --- llvm/test/CodeGen/X86/mmx-intrinsics.ll +++ llvm/test/CodeGen/X86/mmx-intrinsics.ll @@ -311,6 +311,19 @@ ret i64 %4 } +define i64 @test72_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test72_2 +; ALL-NOT: psraw +entry: + %0 = bitcast <1 x i64> %a to <4 x i16> + %mmx_var.i = bitcast <4 x i16> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.psrai.w(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <4 x i16> + %3 = bitcast <4 x i16> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psrli.q(x86_mmx, i32) nounwind readnone define i64 @test71(<1 x i64> %a) nounwind readnone optsize ssp { @@ -339,6 +352,19 @@ ret i64 %4 } +define i64 @test70_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test70_2 +; ALL-NOT: psrld +entry: + %0 = 
bitcast <1 x i64> %a to <2 x i32> + %mmx_var.i = bitcast <2 x i32> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.psrli.d(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <2 x i32> + %3 = bitcast <2 x i32> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psrli.w(x86_mmx, i32) nounwind readnone define i64 @test69(<1 x i64> %a) nounwind readnone optsize ssp { @@ -397,6 +423,19 @@ ret i64 %4 } +define i64 @test66_2(<1 x i64> %a) nounwind readnone optsize ssp { +; ALL-LABEL: @test66_2 +; ALL-NOT: psllw +entry: + %0 = bitcast <1 x i64> %a to <4 x i16> + %mmx_var.i = bitcast <4 x i16> %0 to x86_mmx + %1 = tail call x86_mmx @llvm.x86.mmx.pslli.w(x86_mmx %mmx_var.i, i32 0) nounwind + %2 = bitcast x86_mmx %1 to <4 x i16> + %3 = bitcast <4 x i16> %2 to <1 x i64> + %4 = extractelement <1 x i64> %3, i32 0 + ret i64 %4 +} + declare x86_mmx @llvm.x86.mmx.psra.d(x86_mmx, x86_mmx) nounwind readnone define i64 @test65(<1 x i64> %a, <1 x i64> %b) nounwind readnone optsize ssp { Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -25236,6 +25236,9 @@ // Clamp out of bounds shift amounts since they will otherwise be masked // to 8-bits which may make it no longer out of bounds. unsigned ShiftAmount = C->getAPIntValue().getLimitedValue(255); + if (ShiftAmount == 0) + return Op.getOperand(1); + return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Op.getValueType(), Op.getOperand(0), Op.getOperand(1), DAG.getTargetConstant(ShiftAmount, DL, MVT::i32)); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83534.277212.patch Type: text/x-patch Size: 2784 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 20:22:18 2020 From: llvm-commits at lists.llvm.org (rithik sharma via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 03:22:18 +0000 (UTC) Subject: [PATCH] D83543: [CodeMoverUtils] Add more data dependency related test case In-Reply-To: References: Message-ID: <5ea5aaa9661e76385f2b46b6251c1379@localhost.localdomain> RithikSharma updated this revision to Diff 277213. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83543/new/ https://reviews.llvm.org/D83543 Files: llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83543.277213.patch Type: text/x-patch Size: 8903 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 20:30:34 2020 From: llvm-commits at lists.llvm.org (Jan Vesely via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 03:30:34 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: <46503ec22abf610ea49cc76eb38631c8@localhost.localdomain> jvesely added a comment. tldr; I don't mind changing this to be in sync with clang, just get the patch right. However, it's not the right way to guarantee IR level compatibility between clang-cl-headers and libclc. details below In D83473#2145073 , @jenatali wrote: > @jvesely, I think libclc needs to change its definition here, as it's the only one out of 3 OpenCL standard lib headers that's different. There are two separate issues a) whether to change libclc b) doing the change in the correct way > Since OpenCL allows apps to provide precompiled intermediate representations to the API, in the form of SPIR or SPIR-V, that means that the app could have embedded references to `FP_ILOGBNAN`, which are just a constant value since it's a `#define`, in their own code. 
They could write a kernel which calls `ilogb` and compares the result to `FP_ILOGBNAN`, and compile that with either clang trunk (which uses opencl-c.h automatically), or with Khronos's offline compiler (https://github.com/KhronosGroup/SPIR) using Khronos's standard library headers (https://github.com/KhronosGroup/libclcxx). Both of these standard library implementation headers define `FP_ILOGBNAN` to be `INT_MAX`:
>
> - https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/opencl-c-base.h#L165
> - https://github.com/KhronosGroup/libclcxx/blob/96459f111c3e3a4709f7e09bf5fb73dea81a475a/include/opencl_math_constants#L85

I have no problem with the compatibility argument, but using `INT_MAX` for both `NaN` and `Inf` is, in my understanding, wrong. If the intention is to use `INT_MAX` for `FP_ILOGBNAN`, the implementation needs to be changed to return a different value for `Inf` (presumably 127 for fp32 and 1023 for fp64, since those can reuse the default codepath). Note that LLVM internally uses `INT_MIN + 1 (-INT_MAX)` for `ilogb(0)`, `INT_MIN` for `ilogb(NaN)` and `INT_MAX` for `ilogb(Inf)` [0]. This means that any optimization handling these constants and the ilogb operation would be incorrect in LLVM. IMO, the correct action on the LLVM side would be to submit a bug to Khronos to either fix their headers or provide clarification wrt `ilogb(Inf)` behaviour.

> Nobody is going to offline compile OpenCL code using the libclc headers. That means that if our implementation wants to leverage libclc behind the scenes, then libclc should use the same definition of this value as the other standard libraries.

Why not? It's fairly easy and straightforward. What are the requirements for precompiled portability? The specs allow implementations to use different values for `FP_ILOGBNAN` and `FP_ILOGB0`. AFAIU, offline compiled kernels are not expected to work between different implementations. I haven't found a mention of ilogb values in the SPIR-V spec.
In this regard, clang's OpenCL headers and libclc are different implementations. Again, I agree that syncing them would be desirable, but keeping two different sets of headers is not the right way to do it.

> Yeah, it's different from CPU land where you don't go around compiling against standard library headers from one library, and then linking against another -- except that you can kind of do that too between e.g. glibc and musl (which have matching definitions of this define for what it's worth).

Switching libraries at link time requires ABI compatibility; this is no different from CPU libraries. The CLC spec only details the API.

FWIW: the glibc definitions of `FP_ILOGB0` and `FP_ILOGBNAN` vary depending on system configuration [1]. On x86 it's

  # define FP_ILOGB0 (-2147483647 - 1)
  # define FP_ILOGBNAN (-2147483647 - 1)

[2], while for m68k or ia64 it's

  # define FP_ILOGB0 (-2147483647 - 1)
  # define FP_ILOGBNAN 2147483647

[3, 4]. On other architectures it's

  # define FP_ILOGB0 (-2147483647)
  # define FP_ILOGBNAN 2147483647

[5]. So copying IR representations of C programs between those platforms won't work for the same reasons.
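The platform dependence described above can be observed with a small host-side check. This is only a sketch under stated assumptions: the names `check_ilogb_special_cases`, `nan_and_inf_share_ilogb_value`, `IlogbInput` and `classify_ilogb_input` are hypothetical and come from neither libclc nor glibc; only the standard `<cmath>` facilities `std::ilogb`, `FP_ILOGB0` and `FP_ILOGBNAN` are relied on. The C standard specifies that `ilogb` yields `FP_ILOGB0` for zero, `FP_ILOGBNAN` for NaN and `INT_MAX` for infinities, so wherever `FP_ILOGBNAN` equals `INT_MAX` the NaN and Inf results cannot be told apart by value alone.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <limits>

// Hypothetical helper: verify the special-case results that the C standard
// requires from ilogb() on the host platform.
void check_ilogb_special_cases() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(std::ilogb(0.0) == FP_ILOGB0);   // zero     -> FP_ILOGB0
  assert(std::ilogb(nan) == FP_ILOGBNAN); // NaN      -> FP_ILOGBNAN
  assert(std::ilogb(inf) == INT_MAX);     // infinity -> INT_MAX
}

// Hypothetical helper: true when this platform's FP_ILOGBNAN collides with
// the INT_MAX that ilogb() returns for infinities.
bool nan_and_inf_share_ilogb_value() {
  return FP_ILOGBNAN == INT_MAX;
}

// Hypothetical classification that does not depend on the platform's
// constants: inspect the input instead of comparing ilogb()'s result.
enum class IlogbInput { Zero, NaN, Infinite, Finite };

IlogbInput classify_ilogb_input(double x) {
  if (std::isnan(x))
    return IlogbInput::NaN;
  if (std::isinf(x))
    return IlogbInput::Infinite;
  if (x == 0.0)
    return IlogbInput::Zero;
  return IlogbInput::Finite;
}
```

Code that must behave identically across the header variants listed above would likely classify the input first, as in `classify_ilogb_input`, rather than compare the result of `ilogb` against `FP_ILOGBNAN`.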
[0] https://llvm.org/doxygen/APFloat_8h_source.html#l00229 [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=math/math.h;h=e48860e3915b0ec5c0ae0d594d84432c0568ddc6;hb=HEAD#l190 [2] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/bits/fp-logb.h;h=f180a90754a223547f0d0b965b6fbe6d0132190b;hb=HEAD [3] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/m68k/m680x0/bits/fp-logb.h;h=bb31eb8d6210033e4be3e8e05bae9a6e09d09e86;hb=HEAD [4] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ia64/bits/fp-logb.h;h=45d1bd5e01bb0ecfb9ae7d366f18e315b637c8ca;hb=HEAD [5] https://sourceware.org/git/?p=glibc.git;a=blob;f=bits/fp-logb.h;h=30effcd52196e7e6a53bde38c66dd6011bbbc3e1;hb=HEAD Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Fri Jul 10 21:07:00 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 04:07:00 +0000 (UTC) Subject: [PATCH] D83097: [FileCheck] Implement -dump-input-filter In-Reply-To: References: Message-ID: <8449eb30fba70b1f0e4756fbeb85b52e@localhost.localdomain> mehdi_amini added inline comments. ================ Comment at: llvm/utils/FileCheck/FileCheck.cpp:455 + case DumpInputFilterAll: + llvm_unreachable("unexpected DumpInputFilterAll"); + break; ---------------- jdenny wrote: > jdenny wrote: > > mehdi_amini wrote: > > > In a tool like FileCheck I rather err on the side of deterministically failing with a `report_fatal_error` > > I don't object in principal, but I see no precedent for this in FileCheck. > > > > Are you ok with this landing as is? If FileCheck should generally use `report_fatal_error` instead of `llvm_unreachable`, I feel like that should be discussed in a separate review for all occurrences. > Given your accept and the tone of your comment, I decided it's safe to land this as is. 
I'm fine to revert or adjust if you feel this was the wrong decision. And again, I'm open to a larger discussion about making this change throughout FileCheck. LG I don't think it has bitten anyone, so likely not worth the effort right now. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83097/new/ https://reviews.llvm.org/D83097 From llvm-commits at lists.llvm.org Fri Jul 10 21:07:51 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 04:07:51 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <18c99f7326e1e55bb0051f3320d0c3b8@localhost.localdomain> tmsriram updated this revision to Diff 277216. tmsriram added a comment. Enhance test cfi-basic-block-sections-1.ll to also check for frame-pointer=none. There is exactly 2 ways in which CFA can be computed: 1. With frame pointer via %ebp 2. Without, where %esp offset is used when frame pointer is omitted The first test now explicitly checks both to clear any ambiguity here. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 Files: llvm/include/llvm/CodeGen/TargetFrameLowering.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCFIException.cpp llvm/lib/CodeGen/AsmPrinter/DwarfException.h llvm/lib/CodeGen/CFIInstrInserter.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/X86/X86FrameLowering.cpp llvm/lib/Target/X86/X86FrameLowering.h llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D79978.277216.patch Type: text/x-patch Size: 16005 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 21:33:17 2020 From: llvm-commits at lists.llvm.org (Sameer Sahasrabuddhe via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 04:33:17 +0000 (UTC) Subject: [PATCH] D83562: [fix-irreducible] Skip unreachable predecessors. In-Reply-To: References: Message-ID: sameerds accepted this revision. sameerds added a comment. This revision is now accepted and ready to land. LGTM! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83562/new/ https://reviews.llvm.org/D83562 From llvm-commits at lists.llvm.org Fri Jul 10 21:42:00 2020 From: llvm-commits at lists.llvm.org (Sameer Sahasrabuddhe via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 04:42:00 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. In-Reply-To: References: Message-ID: <2cc427ab9ce72a0918fc5e42da8f5f93@localhost.localdomain> sameerds added inline comments. ================ Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:790 addPass(createLoadStoreVectorizerPass()); + + addPass(createLowerSwitchPass()); ---------------- arsenm wrote: > Can you add a comment for why this is here The comment shouldn't record where the pass used to be. That is all in git history. But it should explain why this point is the correct place in the flow. Consider it as hint you would want to leave a future programmer if they discovered a reason to move the pass yet again. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83584/new/ https://reviews.llvm.org/D83584 From llvm-commits at lists.llvm.org Fri Jul 10 21:44:13 2020 From: llvm-commits at lists.llvm.org (Sameer Sahasrabuddhe via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 04:44:13 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. 
In-Reply-To:
References:
Message-ID: <8d5cfe81689f47c011b60793a57f5f1b@localhost.localdomain>

sameerds added a comment.

From the description: "This patch inserts the LowerSwitch pass in an appropriate place to ensure any dead blocks resulting from the transformation will be removed"

This change is not really about one transformation (LowerSwitch). It's about generally protecting the backend from unreachable blocks, no matter where they originate. So a better explanation of why this is the "appropriate place" will be useful.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83584/new/

https://reviews.llvm.org/D83584

From llvm-commits at lists.llvm.org Fri Jul 10 21:50:21 2020
From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits)
Date: Fri, 10 Jul 2020 21:50:21 -0700
Subject: [PATCH] D83462: [DWARF] Avoid entry_values production for SCE
In-Reply-To:
References:
Message-ID:

On Thu, Jul 9, 2020 at 2:01 PM Paul Robinson via Phabricator <reviews at reviews.llvm.org> wrote:

> probinson added a comment.
>
> In D83462#2142036 , @echristo wrote:
>
> > So the tuning here for SCE is also a "does not support" or something else?
>
> I'm told our debugger currently does not support the entry-value opcode. Locations with that opcode would be dropped/ignored (presumably just the individual location-list elements, although I haven't verified).
>
> I am nudging our folks in the direction of supporting it, but need to demonstrate value first; that the additional location descriptions increase availability, and that the expressions can be evaluated usefully. I have to say a few preliminary experiments don't make me feel too positive about that second part, as the entry-value expressions tend to rely on registers that aren't saved by the ABI, so unwinding won't recover them.
>
> In the meantime I'd prefer that they weren't emitted, for the usual size reasons.
>

Totally.
That sounds more like the "preference" rather than "we can't support this part of dwarf5" that makes the command line difference for me if that makes sense :)

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 22:12:55 2020
From: llvm-commits at lists.llvm.org (Vitaly Buka via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 05:12:55 +0000 (UTC)
Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X
In-Reply-To:
References:
Message-ID: <0d6efa2b4762dc40499b0666a1c74446@localhost.localdomain>

vitalybuka added subscribers: eugenis, guiand, vitalybuka.
vitalybuka added a comment.

After this patch we have false msan reports on code like this:

  bool iv_compare2(const int *op1, const int *op2) {
    if (op1[1] != op2[1])
      return op1[1] < op2[1];
    for (int i = 1; i >= 0; i--) {
      if (op1[i] != op2[i])
        return op1[i] < op2[i];
    }
    return false;
  }

  void foo() {
    int a[2] = {};
    int b[2] = {};
    auto UNINITIALIZED = iv_compare2(a, b);
  }

Here it looks fine and the same as before the patch. It returns "and undef, false" which should be false.
*** IR Dump After Simplify the CFG *** ; Function Attrs: norecurse nounwind readonly sanitize_memory uwtable define zeroext i1 @_Z11iv_compare2PKiS0_(i32* nocapture readonly %op1, i32* nocapture readonly %op2) local_unnamed_addr #0 !dbg !8 { entry: %arrayidx = getelementptr inbounds i32, i32* %op1, i64 1, !dbg !10 %0 = load i32, i32* %arrayidx, align 4, !dbg !10 %arrayidx1 = getelementptr inbounds i32, i32* %op2, i64 1, !dbg !11 %1 = load i32, i32* %arrayidx1, align 4, !dbg !11 %cmp.not = icmp eq i32 %0, %1, !dbg !12 br i1 %cmp.not, label %for.cond, label %if.then, !dbg !10 if.then: ; preds = %entry %cmp4 = icmp slt i32 %0, %1, !dbg !13 ret i1 %cmp4, !dbg !14 for.cond: ; preds = %entry %2 = load i32, i32* %op1, align 4, !dbg !15 %3 = load i32, i32* %op2, align 4, !dbg !16 %cmp9.not.1 = icmp eq i32 %2, %3, !dbg !17 %cmp15 = icmp slt i32 %2, %3 %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15, !dbg !15 %spec.select40 = select i1 %cmp9.not.1, i1 false, i1 true, !dbg !15 %spec.select = and i1 %spec.select39, %spec.select40 ret i1 %spec.select } However with this patch after the next transformation it breaks the code: Now it returns undef instead of false if %2 == %3 *** IR Dump After Combine redundant instructions *** ; Function Attrs: norecurse nounwind readonly sanitize_memory uwtable define zeroext i1 @_Z11iv_compare2PKiS0_(i32* nocapture readonly %op1, i32* nocapture readonly %op2) local_unnamed_addr #0 !dbg !8 { entry: %arrayidx = getelementptr inbounds i32, i32* %op1, i64 1, !dbg !10 %0 = load i32, i32* %arrayidx, align 4, !dbg !10 %arrayidx1 = getelementptr inbounds i32, i32* %op2, i64 1, !dbg !11 %1 = load i32, i32* %arrayidx1, align 4, !dbg !11 %cmp.not = icmp eq i32 %0, %1, !dbg !12 br i1 %cmp.not, label %for.cond, label %if.then, !dbg !10 if.then: ; preds = %entry %cmp4 = icmp slt i32 %0, %1, !dbg !13 ret i1 %cmp4, !dbg !14 for.cond: ; preds = %entry %2 = load i32, i32* %op1, align 4, !dbg !15 %3 = load i32, i32* %op2, align 4, !dbg !16 
%cmp9.not.1 = icmp eq i32 %2, %3, !dbg !17 %cmp15 = icmp slt i32 %2, %3 %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15, !dbg !15 ret i1 %spec.select39 } The msan reasonably reports a bug. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Fri Jul 10 22:53:04 2020 From: llvm-commits at lists.llvm.org (Eric Christopher via llvm-commits) Date: Fri, 10 Jul 2020 22:53:04 -0700 (PDT) Subject: [lld] 256e4d4 - Fix signed vs unsigned comparison warnings a different way. Message-ID: <5f0953c0.1c69fb81.b25d5.3079@mx.google.com> Author: Eric Christopher Date: 2020-07-10T22:52:50-07:00 New Revision: 256e4d46a67517056d1e45d71c02424db01eff44 URL: https://github.com/llvm/llvm-project/commit/256e4d46a67517056d1e45d71c02424db01eff44 DIFF: https://github.com/llvm/llvm-project/commit/256e4d46a67517056d1e45d71c02424db01eff44.diff LOG: Fix signed vs unsigned comparison warnings a different way. 
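A signed vs unsigned comparison warning is the kind a compiler may emit when an int is compared with an unsigned value inside a templated helper, as assertion macros typically do. Below is a minimal sketch of casting both operands to the same type, assuming the flag constant has an unsigned type (the added cast suggests so, but its declaration is not shown in this mail). kSubsectionsViaSymbols, its value, sameValue, FileHeader and flagsMatch are hypothetical stand-ins, not lld or GoogleTest names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an unsigned flag constant (illustrative value).
constexpr std::uint32_t kSubsectionsViaSymbols = 0x2000u;

// Hypothetical stand-in for a templated equality helper. With A = int and
// B = unsigned, the comparison below mixes signedness and may trigger a
// sign-compare warning on some compilers.
template <typename A, typename B>
bool sameValue(const A &lhs, const B &rhs) {
  return lhs == rhs;
}

struct FileHeader {
  std::uint32_t flags;
};

// Casting both operands to int makes the helper compare int with int,
// so there is no mixed-signedness comparison left to warn about.
bool flagsMatch(const FileHeader &f) {
  return sameValue((int)f.flags, (int)kSubsectionsViaSymbols);
}
```

The cast is only safe here because the values involved fit in an int; for flags that use the top bit, comparing as an unsigned type on both sides would be the more cautious choice.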
Added: Modified: lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp Removed: ################################################################################ diff --git a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp index aad5f8afcfdc..fbf18a8d9e00 100644 --- a/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp +++ b/lld/unittests/MachOTests/MachONormalizedFileBinaryReaderTests.cpp @@ -75,7 +75,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64) { fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -106,7 +106,7 @@ TEST(BinaryReaderTest, empty_obj_x86) { fromBinary(fileBytes, sizeof(fileBytes), "i386"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -137,7 +137,7 @@ TEST(BinaryReaderTest, empty_obj_ppc) { fromBinary(fileBytes, sizeof(fileBytes), "ppc"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -168,7 +168,7 @@ TEST(BinaryReaderTest, empty_obj_armv7) { fromBinary(fileBytes, 
sizeof(fileBytes), "armv7"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -182,7 +182,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { fromBinary(fileBytes, sizeof(fileBytes), "x86_64"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); EXPECT_TRUE(f->undefinedSymbols.empty()); @@ -191,7 +191,7 @@ TEST(BinaryReaderTest, empty_obj_x86_64_arm7) { fromBinary(fileBytes, sizeof(fileBytes), "armv7"); EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f2->fileType), MH_OBJECT); - EXPECT_EQ((int)(f2->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f2->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f2->localSymbols.empty()); EXPECT_TRUE(f2->globalSymbols.empty()); EXPECT_TRUE(f2->undefinedSymbols.empty()); @@ -268,7 +268,7 @@ TEST(BinaryReaderTest, hello_obj_x86_64) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -393,7 +393,7 @@ TEST(BinaryReaderTest, hello_obj_x86) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); 
EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -525,7 +525,7 @@ TEST(BinaryReaderTest, hello_obj_armv7) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); @@ -669,7 +669,7 @@ TEST(BinaryReaderTest, hello_obj_ppc) { EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ((int)(f->fileType), MH_OBJECT); - EXPECT_EQ((int)(f->flags), MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& text = f->sections[0]; EXPECT_TRUE(text.segmentName.equals("__TEXT")); diff --git a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp index 6ceb197b4b84..dbfe3a051811 100644 --- a/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp +++ b/lld/unittests/MachOTests/MachONormalizedFileYAMLTests.cpp @@ -50,7 +50,7 @@ TEST(ObjectFileYAML, empty_ppc) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_ppc); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)(int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -66,7 +66,7 @@ TEST(ObjectFileYAML, empty_x86_64) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)(int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); 
EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -82,7 +82,7 @@ TEST(ObjectFileYAML, empty_x86) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -98,7 +98,7 @@ TEST(ObjectFileYAML, empty_armv6) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -114,7 +114,7 @@ TEST(ObjectFileYAML, empty_armv7) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -130,7 +130,7 @@ TEST(ObjectFileYAML, empty_armv7s) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7s); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f->sections.empty()); EXPECT_TRUE(f->localSymbols.empty()); EXPECT_TRUE(f->globalSymbols.empty()); @@ -143,7 +143,7 @@ TEST(ObjectFileYAML, roundTrip) { NormalizedFile f; f.arch = lld::MachOLinkingContext::arch_x86_64; f.fileType = llvm::MachO::MH_OBJECT; - f.flags = 
llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS; + f.flags = (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS; f.os = lld::MachOLinkingContext::OS::macOSX; toYAML(f, intermediate); } @@ -151,7 +151,7 @@ TEST(ObjectFileYAML, roundTrip) { std::unique_ptr f2 = fromYAML(intermediate); EXPECT_EQ(f2->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ((int)(f2->fileType), llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f2->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f2->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_TRUE(f2->sections.empty()); EXPECT_TRUE(f2->localSymbols.empty()); EXPECT_TRUE(f2->globalSymbols.empty()); @@ -275,7 +275,7 @@ TEST(ObjectFileYAML, hello_x86_64) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86_64); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -405,7 +405,7 @@ TEST(ObjectFileYAML, hello_x86) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_x86); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -533,7 +533,7 @@ TEST(ObjectFileYAML, hello_armv6) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv6); EXPECT_EQ(f->fileType, llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; @@ -673,7 +673,7 @@ TEST(ObjectFileYAML, hello_armv7) { "...\n"); EXPECT_EQ(f->arch, lld::MachOLinkingContext::arch_armv7); EXPECT_EQ(f->fileType, 
llvm::MachO::MH_OBJECT); - EXPECT_EQ((int)(f->flags), llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); + EXPECT_EQ((int)(f->flags), (int)llvm::MachO::MH_SUBSECTIONS_VIA_SYMBOLS); EXPECT_EQ(f->sections.size(), 2UL); const Section& sect1 = f->sections[0]; From llvm-commits at lists.llvm.org Fri Jul 10 22:53:53 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Fri, 10 Jul 2020 22:53:53 -0700 (PDT) Subject: [llvm] c986995 - [OpenMP][NFC] Remove unused (always fixed) arguments Message-ID: <5f0953f1.1c69fb81.4f425.1ea3@mx.google.com> Author: Johannes Doerfert Date: 2020-07-11T00:51:51-05:00 New Revision: c98699582a6333bbe76ff7853b4cd6beb45754cf URL: https://github.com/llvm/llvm-project/commit/c98699582a6333bbe76ff7853b4cd6beb45754cf DIFF: https://github.com/llvm/llvm-project/commit/c98699582a6333bbe76ff7853b4cd6beb45754cf.diff LOG: [OpenMP][NFC] Remove unused (always fixed) arguments There are various runtime calls in the device runtime with unused, or always fixed, arguments. This is bad for all sorts of reasons. Clean up two before as we match them in OpenMPOpt now. 
Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83268 Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp clang/test/OpenMP/nvptx_data_sharing.cpp clang/test/OpenMP/nvptx_parallel_codegen.cpp clang/test/OpenMP/nvptx_target_codegen.cpp clang/test/OpenMP/nvptx_target_teams_codegen.cpp clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPKinds.def openmp/libomptarget/deviceRTLs/common/src/parallel.cu openmp/libomptarget/deviceRTLs/interface.h Removed: ################################################################################ diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp index cabd06bd76e8..cbd443134e7a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp @@ -38,11 +38,9 @@ enum OpenMPRTLFunctionNVPTX { /// Call to void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime); OMPRTL_NVPTX__kmpc_spmd_kernel_deinit_v2, /// Call to void __kmpc_kernel_prepare_parallel(void - /// *outlined_function, int16_t - /// IsOMPRuntimeInitialized); + /// *outlined_function); OMPRTL_NVPTX__kmpc_kernel_prepare_parallel, - /// Call to bool __kmpc_kernel_parallel(void **outlined_function, - /// int16_t IsOMPRuntimeInitialized); + /// Call to bool __kmpc_kernel_parallel(void **outlined_function); OMPRTL_NVPTX__kmpc_kernel_parallel, /// Call to void __kmpc_kernel_end_parallel(); OMPRTL_NVPTX__kmpc_kernel_end_parallel, @@ -1466,8 +1464,7 @@ void CGOpenMPRuntimeNVPTX::emitWorkerLoop(CodeGenFunction &CGF, CGF.InitTempAlloca(WorkFn, llvm::Constant::getNullValue(CGF.Int8PtrTy)); // TODO: Optimize runtime initialization and pass in correct value. 
- llvm::Value *Args[] = {WorkFn.getPointer(), - /*RequiresOMPRuntime=*/Bld.getInt16(1)}; + llvm::Value *Args[] = {WorkFn.getPointer()}; llvm::Value *Ret = CGF.EmitRuntimeCall( createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_parallel), Args); Bld.CreateStore(Bld.CreateZExt(Ret, CGF.Int8Ty), ExecStatus); @@ -1595,17 +1592,16 @@ CGOpenMPRuntimeNVPTX::createNVPTXRuntimeFunction(unsigned Function) { } case OMPRTL_NVPTX__kmpc_kernel_prepare_parallel: { /// Build void __kmpc_kernel_prepare_parallel( - /// void *outlined_function, int16_t IsOMPRuntimeInitialized); - llvm::Type *TypeParams[] = {CGM.Int8PtrTy, CGM.Int16Ty}; + /// void *outlined_function); + llvm::Type *TypeParams[] = {CGM.Int8PtrTy}; auto *FnTy = llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg*/ false); RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_kernel_prepare_parallel"); break; } case OMPRTL_NVPTX__kmpc_kernel_parallel: { - /// Build bool __kmpc_kernel_parallel(void **outlined_function, - /// int16_t IsOMPRuntimeInitialized); - llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy, CGM.Int16Ty}; + /// Build bool __kmpc_kernel_parallel(void **outlined_function); + llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy}; llvm::Type *RetTy = CGM.getTypes().ConvertType(CGM.getContext().BoolTy); auto *FnTy = llvm::FunctionType::get(RetTy, TypeParams, /*isVarArg*/ false); @@ -2569,7 +2565,7 @@ void CGOpenMPRuntimeNVPTX::emitNonSPMDParallelCall( llvm::Value *ID = Bld.CreateBitOrPointerCast(WFn, CGM.Int8PtrTy); // Prepare for parallel region. Indicate the outlined function. 
- llvm::Value *Args[] = {ID, /*RequiresOMPRuntime=*/Bld.getInt16(1)}; + llvm::Value *Args[] = {ID}; CGF.EmitRuntimeCall( createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_prepare_parallel), Args); diff --git a/clang/test/OpenMP/nvptx_data_sharing.cpp b/clang/test/OpenMP/nvptx_data_sharing.cpp index 2ee6bd2b4701..1372246c7fc8 100644 --- a/clang/test/OpenMP/nvptx_data_sharing.cpp +++ b/clang/test/OpenMP/nvptx_data_sharing.cpp @@ -55,7 +55,7 @@ void test_ds(){ // CK1: [[A:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 0 // CK1: [[B:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 1 // CK1: store i32 10, i32* [[A]] -// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}, i16 1) +// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}) // CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS1]], i64 1) // CK1: [[SHARGSTMP1:%.+]] = load i8**, i8*** [[SHAREDARGS1]] // CK1: [[SHARGSTMP2:%.+]] = getelementptr inbounds i8*, i8** [[SHARGSTMP1]], i64 0 @@ -65,7 +65,7 @@ void test_ds(){ // CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) // CK1: call void @__kmpc_end_sharing_variables() // CK1: store i32 100, i32* [[B]] -// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}, i16 1) +// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}) // CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS2]], i64 2) // CK1: [[SHARGSTMP3:%.+]] = load i8**, i8*** [[SHAREDARGS2]] // CK1: [[SHARGSTMP4:%.+]] = getelementptr inbounds i8*, i8** [[SHARGSTMP3]], i64 0 diff --git a/clang/test/OpenMP/nvptx_parallel_codegen.cpp b/clang/test/OpenMP/nvptx_parallel_codegen.cpp index c8b15c8f6e3b..ad25e0d775d1 100644 --- a/clang/test/OpenMP/nvptx_parallel_codegen.cpp +++ b/clang/test/OpenMP/nvptx_parallel_codegen.cpp @@ -92,7 +92,7 @@ int bar(int n){ // // CHECK: [[AWAIT_WORK]] // CHECK: call void 
@__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) #[[#CONVERGENT:]] -// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]] +// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]]) // CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8 // store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1 // CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]], @@ -166,13 +166,13 @@ int bar(int n){ // CHECK-DAG: [[MWS:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() // CHECK: [[MTMP1:%.+]] = sub nuw i32 [[MNTH]], [[MWS]] // CHECK: call void @__kmpc_kernel_init(i32 [[MTMP1]] -// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN1]]_wrapper to i8*), +// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN1]]_wrapper to i8*)) // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) // CHECK: call void @__kmpc_serialized_parallel( // CHECK: {{call|invoke}} void [[PARALLEL_FN3:@.+]]( // CHECK: call void @__kmpc_end_serialized_parallel( -// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN2]]_wrapper to i8*), +// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN2]]_wrapper to i8*)) // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) // CHECK-64-DAG: load i32, i32* [[REF_A]] @@ -211,7 +211,7 @@ int bar(int n){ // // CHECK: [[AWAIT_WORK]] // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) -// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]], +// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]]) // CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8 // store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1 // 
CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]], @@ -291,7 +291,7 @@ int bar(int n){ // CHECK: br i1 [[CMP]], label {{%?}}[[IF_THEN:.+]], label {{%?}}[[IF_ELSE:.+]] // // CHECK: [[IF_THEN]] -// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN4]]_wrapper to i8*), +// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN4]]_wrapper to i8*)) // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) // CHECK: br label {{%?}}[[IF_END:.+]] diff --git a/clang/test/OpenMP/nvptx_target_codegen.cpp b/clang/test/OpenMP/nvptx_target_codegen.cpp index 91f31185d8c1..56f04cb01f0a 100644 --- a/clang/test/OpenMP/nvptx_target_codegen.cpp +++ b/clang/test/OpenMP/nvptx_target_codegen.cpp @@ -612,7 +612,7 @@ int baz(int f, double &a) { // CHECK: call void @__kmpc_end_serialized_parallel(%struct.ident_t* [[UNKNOWN]], i32 [[GTID]]) // CHECK: br label -// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*), i16 1) +// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*)) // CHECK: call void @__kmpc_begin_sharing_variables(i8*** [[SHARED_PTR:%.+]], i{{64|32}} 2) // CHECK: [[SHARED:%.+]] = load i8**, i8*** [[SHARED_PTR]], // CHECK: [[REF:%.+]] = getelementptr inbounds i8*, i8** [[SHARED]], i{{64|32}} 0 diff --git a/clang/test/OpenMP/nvptx_target_teams_codegen.cpp b/clang/test/OpenMP/nvptx_target_teams_codegen.cpp index 3ab955fa8508..8ff393f074e4 100644 --- a/clang/test/OpenMP/nvptx_target_teams_codegen.cpp +++ b/clang/test/OpenMP/nvptx_target_teams_codegen.cpp @@ -68,7 +68,7 @@ int bar(int n){ // // CHECK: [[AWAIT_WORK]] // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) - // CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]], i16 1) + // CHECK: [[KPR:%.+]] = call i1 
@__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]]) // CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8 // store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1 // CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]], @@ -154,7 +154,7 @@ int bar(int n){ // // CHECK: [[AWAIT_WORK]] // CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) - // CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]], i16 1) + // CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]]) // CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8 // store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1 // CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]], diff --git a/clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp b/clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp index fe294bbddf2b..4f23f18730cc 100644 --- a/clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp +++ b/clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp @@ -88,7 +88,7 @@ int bar(int n){ // CHECK: [[I_ADDR:%.+]] = getelementptr inbounds [[GLOB_TY]], [[GLOB_TY]]* [[RD]], i32 0, i32 0 // // CHECK: call void @__kmpc_for_static_init_4( - // CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*), i16 1) + // CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*)) // CHECK: call void @__kmpc_begin_sharing_variables(i8*** [[SHARED_VARS_PTR:%.+]], i{{64|32}} 1) // CHECK: [[SHARED_VARS_BUF:%.+]] = load i8**, i8*** [[SHARED_VARS_PTR]], // CHECK: [[VARS_BUF:%.+]] = getelementptr inbounds i8*, i8** [[SHARED_VARS_BUF]], i{{64|32}} 0 diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index f286403e657c..bf799a781ae1 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -584,6 +584,11 @@ __OMP_RTL(__tgt_push_mapper_component, false, Void, VoidPtr, VoidPtr, VoidPtr, 
__OMP_RTL(__kmpc_task_allow_completion_event, false, VoidPtr, IdentPtr, /* Int */ Int32, /* kmp_task_t */ VoidPtr) +/// Note that device runtime functions (in the following) do not necessarily +/// need attributes as we expect to see the definitions. +__OMP_RTL(__kmpc_kernel_parallel, false, Int1, VoidPtrPtr) +__OMP_RTL(__kmpc_kernel_prepare_parallel, false, Void, VoidPtr) + __OMP_RTL(__last, false, Void, ) #undef __OMP_RTL diff --git a/openmp/libomptarget/deviceRTLs/common/src/parallel.cu b/openmp/libomptarget/deviceRTLs/common/src/parallel.cu index 4f3c3ac0c08a..20b03e9bab1b 100644 --- a/openmp/libomptarget/deviceRTLs/common/src/parallel.cu +++ b/openmp/libomptarget/deviceRTLs/common/src/parallel.cu @@ -72,10 +72,8 @@ INLINE static uint16_t determineNumberOfThreads(uint16_t NumThreadsClause, } // This routine is always called by the team master.. -EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn, - int16_t IsOMPRuntimeInitialized) { +EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn) { PRINT0(LD_IO, "call to __kmpc_kernel_prepare_parallel\n"); - ASSERT0(LT_FUSSY, IsOMPRuntimeInitialized, "Expected initialized runtime."); omptarget_nvptx_workFn = WorkFn; @@ -120,12 +118,9 @@ EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn, // returns True if this thread is active, else False. // // Only the worker threads call this routine. -EXTERN bool __kmpc_kernel_parallel(void **WorkFn, - int16_t IsOMPRuntimeInitialized) { +EXTERN bool __kmpc_kernel_parallel(void **WorkFn) { PRINT0(LD_IO | LD_PAR, "call to __kmpc_kernel_parallel\n"); - ASSERT0(LT_FUSSY, IsOMPRuntimeInitialized, "Expected initialized runtime."); - // Work function and arguments for L1 parallel region. 
*WorkFn = omptarget_nvptx_workFn; diff --git a/openmp/libomptarget/deviceRTLs/interface.h b/openmp/libomptarget/deviceRTLs/interface.h index 39ce73cba957..4d352bc648fa 100644 --- a/openmp/libomptarget/deviceRTLs/interface.h +++ b/openmp/libomptarget/deviceRTLs/interface.h @@ -424,10 +424,8 @@ EXTERN void __kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized); EXTERN void __kmpc_spmd_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime, int16_t RequiresDataSharing); EXTERN void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime); -EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn, - int16_t IsOMPRuntimeInitialized); -EXTERN bool __kmpc_kernel_parallel(void **WorkFn, - int16_t IsOMPRuntimeInitialized); +EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn); +EXTERN bool __kmpc_kernel_parallel(void **WorkFn); EXTERN void __kmpc_kernel_end_parallel(); EXTERN void __kmpc_data_sharing_init_stack(); From llvm-commits at lists.llvm.org Fri Jul 10 22:53:55 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Fri, 10 Jul 2020 22:53:55 -0700 (PDT) Subject: [llvm] b726c55 - [OpenMP][NFC] Fix some typos Message-ID: <5f0953f3.1c69fb81.9770a.4445@mx.google.com> Author: Johannes Doerfert Date: 2020-07-11T00:51:51-05:00 New Revision: b726c55709a0a5e31a26c8e381544348c5dcd402 URL: https://github.com/llvm/llvm-project/commit/b726c55709a0a5e31a26c8e381544348c5dcd402 DIFF: https://github.com/llvm/llvm-project/commit/b726c55709a0a5e31a26c8e381544348c5dcd402.diff LOG: [OpenMP][NFC] Fix some typos Added: Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index 0b2e4f24bd17..d7572bf7dc53 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -210,7 +210,7 @@ struct OMPInformationCache : public InformationCache { } // Remove 
the to-be-deleted indices in reverse order as prior - // modifcations will not modify the smaller indices. + // modifications will not modify the smaller indices. while (!ToBeDeleted.empty()) { unsigned Idx = ToBeDeleted.pop_back_val(); UV[Idx] = UV.back(); @@ -304,7 +304,7 @@ struct OMPInformationCache : public InformationCache { return true; } - // Helper to collect all uses of the decleration in the UsesMap. + // Helper to collect all uses of the declaration in the UsesMap. unsigned collectUses(RuntimeFunctionInfo &RFI, bool CollectStats = true) { unsigned NumUses = 0; if (!RFI.Declaration) @@ -519,7 +519,7 @@ struct OpenMPOpt { return Changed; } - /// Try to eliminiate runtime calls by reusing existing ones. + /// Try to eliminate runtime calls by reusing existing ones. bool deduplicateRuntimeCalls() { bool Changed = false; @@ -615,7 +615,7 @@ struct OpenMPOpt { return Ident; } - /// Try to eliminiate calls of \p RFI in \p F by reusing an existing one or + /// Try to eliminate calls of \p RFI in \p F by reusing an existing one or /// \p ReplVal if given. bool deduplicateRuntimeCalls(Function &F, OMPInformationCache::RuntimeFunctionInfo &RFI, @@ -789,7 +789,7 @@ struct OpenMPOpt { }); } - /// The underyling module. + /// The underlying module. Module &M; /// The SCC we are operating on. 
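The comment touched in the first hunk describes removing elements by index in reverse order, overwriting each with the last element. A minimal standalone sketch of that idea follows, using std::vector rather than LLVM containers. It assumes the index list is sorted ascending without duplicates and that the final shrink is a pop_back (the diff context ends before that line); eraseIndicesUnordered is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Removes the elements of `values` at the positions in `toBeDeleted`.
// Assumption: `toBeDeleted` is sorted ascending and has no duplicates.
// The relative order of the surviving elements is not preserved.
void eraseIndicesUnordered(std::vector<int> &values,
                           std::vector<unsigned> toBeDeleted) {
  // Walk from the largest index down: every position above the current
  // index already holds a survivor, so the last element is safe to move,
  // and smaller indices are not disturbed by earlier removals.
  while (!toBeDeleted.empty()) {
    unsigned idx = toBeDeleted.back();
    toBeDeleted.pop_back();
    values[idx] = values.back(); // overwrite with the last element
    values.pop_back();           // drop the now-duplicated tail
  }
}
```

Each removal is constant time, at the cost of reordering the survivors, which is acceptable when the container is treated as an unordered collection.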
From llvm-commits at lists.llvm.org Fri Jul 10 22:53:57 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Fri, 10 Jul 2020 22:53:57 -0700 (PDT) Subject: [llvm] 54bd375 - [OpenMP][NFC] Add convenient helper and early exit check Message-ID: <5f0953f5.1c69fb81.c2ead.1a63@mx.google.com> Author: Johannes Doerfert Date: 2020-07-11T00:51:51-05:00 New Revision: 54bd3751ceebe6eb67804a1ed8be72943817852f URL: https://github.com/llvm/llvm-project/commit/54bd3751ceebe6eb67804a1ed8be72943817852f DIFF: https://github.com/llvm/llvm-project/commit/54bd3751ceebe6eb67804a1ed8be72943817852f.diff LOG: [OpenMP][NFC] Add convenient helper and early exit check Added: Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index d7572bf7dc53..b2e30a4d2b79 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -161,6 +161,9 @@ struct OMPInformationCache : public InformationCache { /// Clear UsesMap for runtime function. void clearUsesMap() { UsesMap.clear(); } + /// Boolean conversion that is true if the runtime function was found. + operator bool() const { return Declaration; } + /// Return the vector of uses in function \p F. UseVector &getOrCreateUseVector(Function *F) { std::shared_ptr &UV = UsesMap[F]; @@ -411,6 +414,9 @@ struct OpenMPOpt { /// Run all OpenMP optimizations on the underlying SCC/ModuleSlice. 
bool run() { + if (SCC.empty()) + return false; + bool Changed = false; LLVM_DEBUG(dbgs() << TAG << "Run on SCC with " << SCC.size() From llvm-commits at lists.llvm.org Fri Jul 10 22:54:01 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 05:54:01 +0000 (UTC) Subject: [PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments In-Reply-To: References: Message-ID: <00d71dbb572f607f88cdb815390a45c4@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGc98699582a63: [OpenMP][NFC] Remove unused (always fixed) arguments (authored by jdoerfert). Changed prior to commit: https://reviews.llvm.org/D83268?vs=275871&id=277218#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83268/new/ https://reviews.llvm.org/D83268 Files: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp clang/test/OpenMP/nvptx_data_sharing.cpp clang/test/OpenMP/nvptx_parallel_codegen.cpp clang/test/OpenMP/nvptx_target_codegen.cpp clang/test/OpenMP/nvptx_target_teams_codegen.cpp clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPKinds.def openmp/libomptarget/deviceRTLs/common/src/parallel.cu openmp/libomptarget/deviceRTLs/interface.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83268.277218.patch Type: text/x-patch Size: 12882 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 23:22:36 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 06:22:36 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <4dae06e6cb5b795743ef522713c1ecb7@localhost.localdomain> dblaikie added inline comments. 
================
Comment at: llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll:21-36
+; Exhaust caller-saved parameter registers and force callee saved registers to
+; be used. This tests that CFI directives for callee saved registers are
+; generated with basic block sections.
+; extern void f1(int, int, int);
+;
+; void foo(bool k, int p1, int p2, int p3, int p4, int p5, int p6) {
+; // Using a conditional forces a basic block section.
----------------
Thanks for this - totally makes sense to me now!

(Totally optional: I'd probably just write this with one parameter? And/or perhaps with constants for the first f1 call? - I've needed to do things like this when testing the DWARF OP_entry_values:

```
void f1(int);
extern bool b;
void f3(int i) {
  if (b) { // adds a basic block
    f1(1); // clobbers 'i', causing it to be spilled
    f1(i); // keeps 'i' alive
  }
}
```
Not to nitpick - just trying to help ensure tests are very specifically targeted - at least makes them easier for me to read, not sure about other folks? (if there's something interesting about testing more than one of these - some comments describing that'd be handy too)

What you've got is fine though, if you prefer it/find it valuable

Thanks for the general write-up of CFI and this patch - helped me understand a bit of what this patch in general, and this test in particular are all about!)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978

From llvm-commits at lists.llvm.org Fri Jul 10 23:41:45 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 06:41:45 +0000 (UTC)
Subject: [PATCH] D83617: [PowerPC] Fix combineVectorShuffle regression after D77448
Message-ID:

MaskRay created this revision.
MaskRay added reviewers: PowerPC, hfinkel, nemanjai, RolandF.
Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya.
Herald added a project: LLVM.
Commit 1fed131660b2 assumed that NewShuffle (shuffle vector canonicalization result) will always be ShuffleVectorSDNode, which may be false.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83617

Files:
  llvm/lib/Target/PowerPC/PPCISelLowering.cpp
  llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83617.277219.patch
Type: text/x-patch
Size: 4922 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 23:50:11 2020
From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 06:50:11 +0000 (UTC)
Subject: [PATCH] D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions)
In-Reply-To:
References:
Message-ID: <7c3c9f095c7550465341c43a18373f48@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rGe8039ad4def0: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) (authored by jdoerfert).

Changed prior to commit:
  https://reviews.llvm.org/D83269?vs=275873&id=277220#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83269/new/

https://reviews.llvm.org/D83269

Files:
  llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83269.277220.patch
Type: text/x-patch
Size: 10016 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Fri Jul 10 23:50:11 2020
From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits)
Date: Fri, 10 Jul 2020 23:50:11 -0700 (PDT)
Subject: [llvm] e8039ad - [OpenMP] Identify GPU kernels (aka.
OpenMP target regions) Message-ID: <5f096123.1c69fb81.8aee9.3c82@mx.google.com> Author: Johannes Doerfert Date: 2020-07-11T01:44:00-05:00 New Revision: e8039ad4def0c4a2499cfbaba38bcc8ef48dee92 URL: https://github.com/llvm/llvm-project/commit/e8039ad4def0c4a2499cfbaba38bcc8ef48dee92 DIFF: https://github.com/llvm/llvm-project/commit/e8039ad4def0c4a2499cfbaba38bcc8ef48dee92.diff LOG: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) We now identify GPU kernels, that is entry points into the GPU code. These kernels (can) correspond to OpenMP target regions. With this patch we identify and on request print them via remarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83269 Added: llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll Modified: llvm/include/llvm/Transforms/IPO/OpenMPOpt.h llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h index 0bd81ea8f543..d96187b73f9b 100644 --- a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h +++ b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h @@ -17,6 +17,9 @@ namespace llvm { namespace omp { +/// Summary of a kernel (=entry point for target offloading). +using Kernel = Function *; + /// Helper to remember if the module contains OpenMP (runtime calls), to be used /// foremost with containsOpenMP. struct OpenMPInModule { @@ -30,8 +33,17 @@ struct OpenMPInModule { bool isKnown() { return Value != OpenMP::UNKNOWN; } operator bool() { return Value != OpenMP::NOT_FOUND; } + /// Return the known kernels (=GPU entry points) in the module. + SmallPtrSetImpl &getKernels() { return Kernels; } + + /// Identify kernels in the module and populate the Kernels set. 
+ void identifyKernels(Module &M); + private: enum class OpenMP { FOUND, NOT_FOUND, UNKNOWN } Value = OpenMP::UNKNOWN; + + /// Collection of known kernels (=GPU entry points) in the module. + SmallPtrSet Kernels; }; /// Helper to determine if \p M contains OpenMP (runtime calls). diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index b2e30a4d2b79..f0fc8a6c8c4a 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -39,6 +39,8 @@ static cl::opt DisableOpenMPOptimizations( static cl::opt PrintICVValues("openmp-print-icv-values", cl::init(false), cl::Hidden); +static cl::opt PrintOpenMPKernels("openmp-print-gpu-kernels", + cl::init(false), cl::Hidden); STATISTIC(NumOpenMPRuntimeCallsDeduplicated, "Number of OpenMP runtime calls deduplicated"); @@ -48,6 +50,8 @@ STATISTIC(NumOpenMPRuntimeFunctionsIdentified, "Number of OpenMP runtime functions identified"); STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified, "Number of OpenMP runtime function uses identified"); +STATISTIC(NumOpenMPTargetRegionKernels, + "Number of OpenMP target region entry points (=kernels) identified"); #if !defined(NDEBUG) static constexpr auto TAG = "[" DEBUG_TYPE "]"; @@ -99,9 +103,10 @@ struct AAICVTracker; struct OMPInformationCache : public InformationCache { OMPInformationCache(Module &M, AnalysisGetter &AG, BumpPtrAllocator &Allocator, SetVector *CGSCC, - SmallPtrSetImpl &ModuleSlice) + SmallPtrSetImpl &ModuleSlice, + SmallPtrSetImpl &Kernels) : InformationCache(M, AG, Allocator, CGSCC), ModuleSlice(ModuleSlice), - OMPBuilder(M) { + OMPBuilder(M), Kernels(Kernels) { OMPBuilder.initialize(); initializeRuntimeFunctions(); initializeInternalControlVars(); @@ -399,6 +404,9 @@ struct OMPInformationCache : public InformationCache { // TODO: We should attach the attributes defined in OMPKinds.def. } + + /// Collection of known kernels (\see Kernel) in the module. 
+ SmallPtrSetImpl &Kernels; }; struct OpenMPOpt { @@ -423,26 +431,10 @@ struct OpenMPOpt { << " functions in a slice with " << OMPInfoCache.ModuleSlice.size() << " functions\n"); - /// Print initial ICV values for testing. - /// FIXME: This should be done from the Attributor once it is added. - if (PrintICVValues) { - InternalControlVar ICVs[] = {ICV_nthreads, ICV_active_levels, ICV_cancel}; - - for (Function *F : OMPInfoCache.ModuleSlice) { - for (auto ICV : ICVs) { - auto ICVInfo = OMPInfoCache.ICVs[ICV]; - auto Remark = [&](OptimizationRemark OR) { - return OR << "OpenMP ICV " << ore::NV("OpenMPICV", ICVInfo.Name) - << " Value: " - << (ICVInfo.InitValue - ? ICVInfo.InitValue->getValue().toString(10, true) - : "IMPLEMENTATION_DEFINED"); - }; - - emitRemarkOnFunction(F, "OpenMPICVTracker", Remark); - } - } - } + if (PrintICVValues) + printICVs(); + if (PrintOpenMPKernels) + printKernels(); Changed |= runAttributor(); @@ -455,6 +447,42 @@ struct OpenMPOpt { return Changed; } + /// Print initial ICV values for testing. + /// FIXME: This should be done from the Attributor once it is added. + void printICVs() const { + InternalControlVar ICVs[] = {ICV_nthreads, ICV_active_levels, ICV_cancel}; + + for (Function *F : OMPInfoCache.ModuleSlice) { + for (auto ICV : ICVs) { + auto ICVInfo = OMPInfoCache.ICVs[ICV]; + auto Remark = [&](OptimizationRemark OR) { + return OR << "OpenMP ICV " << ore::NV("OpenMPICV", ICVInfo.Name) + << " Value: " + << (ICVInfo.InitValue + ? ICVInfo.InitValue->getValue().toString(10, true) + : "IMPLEMENTATION_DEFINED"); + }; + + emitRemarkOnFunction(F, "OpenMPICVTracker", Remark); + } + } + } + + /// Print OpenMP GPU kernels for testing. 
+ void printKernels() const { + for (Function *F : SCC) { + if (!OMPInfoCache.Kernels.count(F)) + continue; + + auto Remark = [&](OptimizationRemark OR) { + return OR << "OpenMP GPU kernel " + << ore::NV("OpenMPGPUKernel", F->getName()) << "\n"; + }; + + emitRemarkOnFunction(F, "OpenMPGPU", Remark); + } + } + /// Return the call if \p U is a callee use in a regular call. If \p RFI is /// given it has to be the callee or a nullptr is returned. static CallInst *getCallIfRegularCall( @@ -775,7 +803,7 @@ struct OpenMPOpt { template > void emitRemark(Instruction *Inst, StringRef RemarkName, - RemarkCallBack &&RemarkCB) { + RemarkCallBack &&RemarkCB) const { Function *F = Inst->getParent()->getParent(); auto &ORE = OREGetter(F); @@ -785,9 +813,10 @@ struct OpenMPOpt { /// Emit a remark on a function. Since only OptimizationRemark is supporting /// this, it can't be made generic. - void emitRemarkOnFunction( - Function *F, StringRef RemarkName, - function_ref &&RemarkCB) { + void + emitRemarkOnFunction(Function *F, StringRef RemarkName, + function_ref + &&RemarkCB) const { auto &ORE = OREGetter(F); ORE.emit([&]() { @@ -1044,7 +1073,8 @@ PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C, SetVector Functions(SCC.begin(), SCC.end()); BumpPtrAllocator Allocator; OMPInformationCache InfoCache(*(Functions.back()->getParent()), AG, Allocator, - /*CGSCC*/ &Functions, ModuleSlice); + /*CGSCC*/ &Functions, ModuleSlice, + OMPInModule.getKernels()); Attributor A(Functions, InfoCache, CGUpdater); @@ -1109,9 +1139,9 @@ struct OpenMPOptLegacyPass : public CallGraphSCCPass { AnalysisGetter AG; SetVector Functions(SCC.begin(), SCC.end()); BumpPtrAllocator Allocator; - OMPInformationCache InfoCache(*(Functions.back()->getParent()), AG, - Allocator, - /*CGSCC*/ &Functions, ModuleSlice); + OMPInformationCache InfoCache( + *(Functions.back()->getParent()), AG, Allocator, + /*CGSCC*/ &Functions, ModuleSlice, OMPInModule.getKernels()); Attributor A(Functions, InfoCache, CGUpdater); @@ 
-1125,14 +1155,45 @@ struct OpenMPOptLegacyPass : public CallGraphSCCPass { } // end anonymous namespace +void OpenMPInModule::identifyKernels(Module &M) { + + NamedMDNode *MD = M.getOrInsertNamedMetadata("nvvm.annotations"); + if (!MD) + return; + + for (auto *Op : MD->operands()) { + if (Op->getNumOperands() < 2) + continue; + MDString *KindID = dyn_cast(Op->getOperand(1)); + if (!KindID || KindID->getString() != "kernel") + continue; + + Function *KernelFn = + mdconst::dyn_extract_or_null(Op->getOperand(0)); + if (!KernelFn) + continue; + + ++NumOpenMPTargetRegionKernels; + + Kernels.insert(KernelFn); + } +} + bool llvm::omp::containsOpenMP(Module &M, OpenMPInModule &OMPInModule) { if (OMPInModule.isKnown()) return OMPInModule; - #define OMP_RTL(_Enum, _Name, ...) \ - if (M.getFunction(_Name)) \ - return OMPInModule = true; + else if (M.getFunction(_Name)) OMPInModule = true; #include "llvm/Frontend/OpenMP/OMPKinds.def" + + // Identify kernels once. TODO: We should split the OMPInformationCache into a + // module and an SCC part. The kernel information, among other things, could + // go into the module part. 
+ if (OMPInModule.isKnown() && OMPInModule) { + OMPInModule.identifyKernels(M); + return true; + } + return OMPInModule = false; } diff --git a/llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll b/llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll new file mode 100644 index 000000000000..ccdf0b981dc2 --- /dev/null +++ b/llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll @@ -0,0 +1,27 @@ +; RUN: opt -passes=openmpopt -pass-remarks=openmp-opt -openmp-print-gpu-kernels -disable-output < %s 2>&1 | FileCheck %s --implicit-check-not=non_kernel +; RUN: opt -openmpopt -pass-remarks=openmp-opt -openmp-print-gpu-kernels -disable-output < %s 2>&1 | FileCheck %s --implicit-check-not=non_kernel + +; CHECK-DAG: remark: :0:0: OpenMP GPU kernel kernel1 +; CHECK-DAG: remark: :0:0: OpenMP GPU kernel kernel2 + +define void @kernel1() { + ret void +} + +define void @kernel2() { + ret void +} + +define void @non_kernel() { + ret void +} + +; Needed to trigger the openmp-opt pass +declare dso_local void @__kmpc_kernel_prepare_parallel(i8*) + +!nvvm.annotations = !{!2, !0, !1, !3, !1, !2} + +!0 = !{void ()* @kernel1, !"kernel", i32 1} +!1 = !{void ()* @non_kernel, !"non_kernel", i32 1} +!2 = !{null, !"align", i32 1} +!3 = !{void ()* @kernel2, !"kernel", i32 1} From llvm-commits at lists.llvm.org Fri Jul 10 23:50:13 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Fri, 10 Jul 2020 23:50:13 -0700 (PDT) Subject: [llvm] 624d34a - [OpenMP] Compute a proper module slice for the CGSCCC pass Message-ID: <5f096125.1c69fb81.e3e56.268f@mx.google.com> Author: Johannes Doerfert Date: 2020-07-11T01:44:00-05:00 New Revision: 624d34afff5de099a6f84e678c81055556c3d42d URL: https://github.com/llvm/llvm-project/commit/624d34afff5de099a6f84e678c81055556c3d42d DIFF: https://github.com/llvm/llvm-project/commit/624d34afff5de099a6f84e678c81055556c3d42d.diff LOG: [OpenMP] Compute a proper module slice for the CGSCCC pass The module slice describes 
which functions we can analyze and transform while working on an SCC as part of the CGSCC OpenMPOpt pass. So far, we simply restricted it to the SCC. In a follow up we will need to have a bigger scope which is why this patch introduces a proper identification of the module slice. In short, everything that has a transitive reference to a function in the SCC or is transitively referenced by one is fair game. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D83270 Added: Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index f0fc8a6c8c4a..38647b5eae68 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -57,6 +57,28 @@ STATISTIC(NumOpenMPTargetRegionKernels, static constexpr auto TAG = "[" DEBUG_TYPE "]"; #endif +/// Apply \p CB to all uses of \p F. If \p LookThroughConstantExprUses is +/// true, constant expression users are not given to \p CB but their uses are +/// traversed transitively. +template +static void foreachUse(Function &F, CBTy CB, + bool LookThroughConstantExprUses = true) { + SmallVector Worklist(make_pointer_range(F.uses())); + + for (unsigned idx = 0; idx < Worklist.size(); ++idx) { + Use &U = *Worklist[idx]; + + // Allow use in constant bitcasts and simply look through them. + if (LookThroughConstantExprUses && isa(U.getUser())) { + for (Use &CEU : cast(U.getUser())->uses()) + Worklist.push_back(&CEU); + continue; + } + + CB(U); + } +} + /// Helper struct to store tracked ICV values at specif instructions. struct ICVValue { Instruction *Inst; @@ -102,11 +124,12 @@ struct AAICVTracker; /// Attributor runs. 
struct OMPInformationCache : public InformationCache { OMPInformationCache(Module &M, AnalysisGetter &AG, - BumpPtrAllocator &Allocator, SetVector *CGSCC, - SmallPtrSetImpl &ModuleSlice, + BumpPtrAllocator &Allocator, SetVector &CGSCC, SmallPtrSetImpl &Kernels) - : InformationCache(M, AG, Allocator, CGSCC), ModuleSlice(ModuleSlice), - OMPBuilder(M), Kernels(Kernels) { + : InformationCache(M, AG, Allocator, &CGSCC), OMPBuilder(M), + Kernels(Kernels) { + initializeModuleSlice(CGSCC); + OMPBuilder.initialize(); initializeRuntimeFunctions(); initializeInternalControlVars(); @@ -196,20 +219,20 @@ struct OMPInformationCache : public InformationCache { /// Run the callback \p CB on each use and forget the use if the result is /// true. The callback will be fed the function in which the use was /// encountered as second argument. - void foreachUse(function_ref CB) { - for (auto &It : UsesMap) - foreachUse(CB, It.first, It.second.get()); + void foreachUse(SmallVectorImpl &SCC, + function_ref CB) { + for (Function *F : SCC) + foreachUse(CB, F); } /// Run the callback \p CB on each use within the function \p F and forget /// the use if the result is true. - void foreachUse(function_ref CB, Function *F, - UseVector *Uses = nullptr) { + void foreachUse(function_ref CB, Function *F) { SmallVector ToBeDeleted; ToBeDeleted.clear(); unsigned Idx = 0; - UseVector &UV = Uses ? *Uses : getOrCreateUseVector(F); + UseVector &UV = getOrCreateUseVector(F); for (Use *U : UV) { if (CB(*U, *F)) @@ -232,8 +255,45 @@ struct OMPInformationCache : public InformationCache { DenseMap> UsesMap; }; + /// Initialize the ModuleSlice member based on \p SCC. ModuleSlices contains + /// (a subset of) all functions that we can look at during this SCC traversal. + /// This includes functions (transitively) called from the SCC and the + /// (transitive) callers of SCC functions. 
We also can look at a function if + /// there is a "reference edge", i.a., if the function somehow uses (!=calls) + /// a function in the SCC or a caller of a function in the SCC. + void initializeModuleSlice(SetVector &SCC) { + ModuleSlice.insert(SCC.begin(), SCC.end()); + + SmallPtrSet Seen; + SmallVector Worklist(SCC.begin(), SCC.end()); + while (!Worklist.empty()) { + Function *F = Worklist.pop_back_val(); + ModuleSlice.insert(F); + + for (Instruction &I : instructions(*F)) + if (auto *CB = dyn_cast(&I)) + if (Function *Callee = CB->getCalledFunction()) + if (Seen.insert(Callee).second) + Worklist.push_back(Callee); + } + + Seen.clear(); + Worklist.append(SCC.begin(), SCC.end()); + while (!Worklist.empty()) { + Function *F = Worklist.pop_back_val(); + ModuleSlice.insert(F); + + // Traverse all transitive uses. + foreachUse(*F, [&](Use &U) { + if (auto *UsrI = dyn_cast(U.getUser())) + if (Seen.insert(UsrI->getFunction()).second) + Worklist.push_back(UsrI->getFunction()); + }); + } + } + /// The slice of the module we are allowed to look at. 
- SmallPtrSetImpl &ModuleSlice; + SmallPtrSet ModuleSlice; /// An OpenMP-IR-Builder instance OpenMPIRBuilder OMPBuilder; @@ -548,7 +608,7 @@ struct OpenMPOpt { return true; }; - RFI.foreachUse(DeleteCallCB); + RFI.foreachUse(SCC, DeleteCallCB); return Changed; } @@ -633,7 +693,7 @@ struct OpenMPOpt { /* GlobalOnly */ true, SingleChoice); return false; }; - RFI.foreachUse(CombineIdentStruct); + RFI.foreachUse(SCC, CombineIdentStruct); if (!Ident || !SingleChoice) { // The IRBuilder uses the insertion block to get to the module, this is @@ -733,7 +793,7 @@ struct OpenMPOpt { Changed = true; return true; }; - RFI.foreachUse(ReplaceAndDeleteCB); + RFI.foreachUse(SCC, ReplaceAndDeleteCB); return Changed; } @@ -776,7 +836,7 @@ struct OpenMPOpt { OMPInformationCache::RuntimeFunctionInfo &GlobThreadNumRFI = OMPInfoCache.RFIs[OMPRTL___kmpc_global_thread_num]; - GlobThreadNumRFI.foreachUse([&](Use &U, Function &F) { + GlobThreadNumRFI.foreachUse(SCC, [&](Use &U, Function &F) { if (CallInst *CI = getCallIfRegularCall(U, &GlobThreadNumRFI)) AddUserArgs(*CI); return false; @@ -938,7 +998,7 @@ struct AAICVTrackerFunction : public AAICVTracker { return true; }; - GetterRFI.foreachUse(ReplaceAndDeleteCB); + GetterRFI.foreachUse(ReplaceAndDeleteCB, getAnchorScope()); return Changed; } @@ -1048,12 +1108,9 @@ PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C, if (DisableOpenMPOptimizations) return PreservedAnalyses::all(); - SmallPtrSet ModuleSlice; SmallVector SCC; - for (LazyCallGraph::Node &N : C) { + for (LazyCallGraph::Node &N : C) SCC.push_back(&N.getFunction()); - ModuleSlice.insert(SCC.back()); - } if (SCC.empty()) return PreservedAnalyses::all(); @@ -1073,8 +1130,7 @@ PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C, SetVector Functions(SCC.begin(), SCC.end()); BumpPtrAllocator Allocator; OMPInformationCache InfoCache(*(Functions.back()->getParent()), AG, Allocator, - /*CGSCC*/ &Functions, ModuleSlice, - OMPInModule.getKernels()); + /*CGSCC*/ Functions, 
OMPInModule.getKernels()); Attributor A(Functions, InfoCache, CGUpdater); @@ -1112,14 +1168,11 @@ struct OpenMPOptLegacyPass : public CallGraphSCCPass { if (DisableOpenMPOptimizations || skipSCC(CGSCC)) return false; - SmallPtrSet ModuleSlice; SmallVector SCC; for (CallGraphNode *CGN : CGSCC) if (Function *Fn = CGN->getFunction()) - if (!Fn->isDeclaration()) { + if (!Fn->isDeclaration()) SCC.push_back(Fn); - ModuleSlice.insert(Fn); - } if (SCC.empty()) return false; @@ -1141,7 +1194,7 @@ struct OpenMPOptLegacyPass : public CallGraphSCCPass { BumpPtrAllocator Allocator; OMPInformationCache InfoCache( *(Functions.back()->getParent()), AG, Allocator, - /*CGSCC*/ &Functions, ModuleSlice, OMPInModule.getKernels()); + /*CGSCC*/ Functions, OMPInModule.getKernels()); Attributor A(Functions, InfoCache, CGUpdater); From llvm-commits at lists.llvm.org Fri Jul 10 23:50:16 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 06:50:16 +0000 (UTC) Subject: [PATCH] D83270: [OpenMP] Compute a proper module slice for the CGSCCC pass In-Reply-To: References: Message-ID: This revision was automatically updated to reflect the committed changes. Closed by commit rG624d34afff5d: [OpenMP] Compute a proper module slice for the CGSCCC pass (authored by jdoerfert). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83270/new/ https://reviews.llvm.org/D83270 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83270.277221.patch Type: text/x-patch Size: 7789 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 23:50:15 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Fri, 10 Jul 2020 23:50:15 -0700 (PDT) Subject: [llvm] 5b0581a - [OpenMP] Replace function pointer uses in GPU state machine Message-ID: <5f096127.1c69fb81.ea2e9.a83e@mx.google.com> Author: Johannes Doerfert Date: 2020-07-11T01:44:00-05:00 New Revision: 5b0581aedc2252481462970503d1085dc27e65eb URL: https://github.com/llvm/llvm-project/commit/5b0581aedc2252481462970503d1085dc27e65eb DIFF: https://github.com/llvm/llvm-project/commit/5b0581aedc2252481462970503d1085dc27e65eb.diff LOG: [OpenMP] Replace function pointer uses in GPU state machine In non-SPMD mode we create a state machine like code to identify the parallel region the GPU worker threads should execute next. The identification uses the parallel region function pointer as that allows it to work even if the kernel (=target region) and the parallel region are in separate TUs. However, taking the address of a function comes with various downsides. With this patch we will identify the most common situation and replace the function pointer use with a dummy global symbol (for identification purposes only). That means, if the parallel region is only called from a single target region (or kernel), we do not use the function pointer of the parallel region to identify it but a new global symbol. Fixes PR46450. 
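For readers skimming the log above, here is a rough standalone C++ sketch of the idea. It is not code from the patch: every name in it (outlinedRegion1, OutlinedRegion1ID, prepareParallel, workerStep, dispatch) is hypothetical, and the real transformation operates on LLVM IR and the device runtime rather than on plain C++. It only shows that a worker can pick the next parallel region by comparing against the address of a dummy per-region symbol, so the region's own address is never taken and only direct calls remain.

```cpp
#include <cassert>

// Hypothetical outlined parallel regions; LastExecuted records which one ran.
int LastExecuted = 0;
void outlinedRegion1() { LastExecuted = 1; }
void outlinedRegion3() { LastExecuted = 3; }

// One dummy byte per region. Only its address is used, as an identifier,
// loosely mirroring the private "<name>.ID" globals described in the log.
const char OutlinedRegion1ID = 0;
const char OutlinedRegion3ID = 0;

// Hypothetical slot that the main thread writes and the workers read.
const void *WorkFn = nullptr;

// Main-thread side: publish the identifier of the region to run next.
void prepareParallel(const void *ID) { WorkFn = ID; }

// Worker side: compare the published identifier against the known IDs and
// call the matching region directly. No function address is ever taken.
bool workerStep() {
  if (WorkFn == &OutlinedRegion1ID) {
    outlinedRegion1();
    return true;
  }
  if (WorkFn == &OutlinedRegion3ID) {
    outlinedRegion3();
    return true;
  }
  return false; // Unknown identifier: nothing to execute in this sketch.
}

// Helper for experimenting: publish an ID, run one worker step, and report
// which region (if any) was executed.
int dispatch(const void *ID) {
  LastExecuted = 0;
  prepareParallel(ID);
  workerStep();
  return LastExecuted;
}
```

Calling dispatch(&OutlinedRegion3ID) runs outlinedRegion3 through a direct call. The sketch leaves out the fallback for regions that are not known to belong to a single kernel; per the log, those keep using the function pointer.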
Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83271 Added: llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index 38647b5eae68..4df65f81912b 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -52,6 +52,9 @@ STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified, "Number of OpenMP runtime function uses identified"); STATISTIC(NumOpenMPTargetRegionKernels, "Number of OpenMP target region entry points (=kernels) identified"); +STATISTIC( + NumOpenMPParallelRegionsReplacedInGPUStateMachine, + "Number of OpenMP parallel regions replaced with ID in GPU state machines"); #if !defined(NDEBUG) static constexpr auto TAG = "[" DEBUG_TYPE "]"; @@ -496,6 +499,8 @@ struct OpenMPOpt { if (PrintOpenMPKernels) printKernels(); + Changed |= rewriteDeviceCodeStateMachine(); + Changed |= runAttributor(); // Recollect uses, in case Attributor deleted any. @@ -849,6 +854,31 @@ struct OpenMPOpt { AddUserArgs(*GTIdArgs[u]); } + /// Kernel (=GPU) optimizations and utility functions + /// + ///{{ + + /// Check if \p F is a kernel, hence entry point for target offloading. + bool isKernel(Function &F) { return OMPInfoCache.Kernels.count(&F); } + + /// Cache to remember the unique kernel for a function. + DenseMap> UniqueKernelMap; + + /// Find the unique kernel that will execute \p F, if any. + Kernel getUniqueKernelFor(Function &F); + + /// Find the unique kernel that will execute \p I, if any. + Kernel getUniqueKernelFor(Instruction &I) { + return getUniqueKernelFor(*I.getFunction()); + } + + /// Rewrite the device (=GPU) code state machine create in non-SPMD mode in + /// the cases we can avoid taking the address of a function. 
+ bool rewriteDeviceCodeStateMachine(); + + /// + ///}} + /// Emit a remark generically /// /// This template function can be used to generically emit a remark. The @@ -930,6 +960,140 @@ struct OpenMPOpt { } }; +Kernel OpenMPOpt::getUniqueKernelFor(Function &F) { + if (!OMPInfoCache.ModuleSlice.count(&F)) + return nullptr; + + // Use a scope to keep the lifetime of the CachedKernel short. + { + Optional &CachedKernel = UniqueKernelMap[&F]; + if (CachedKernel) + return *CachedKernel; + + // TODO: We should use an AA to create an (optimistic and callback + // call-aware) call graph. For now we stick to simple patterns that + // are less powerful, basically the worst fixpoint. + if (isKernel(F)) { + CachedKernel = Kernel(&F); + return *CachedKernel; + } + + CachedKernel = nullptr; + if (!F.hasLocalLinkage()) + return nullptr; + } + + auto GetUniqueKernelForUse = [&](const Use &U) -> Kernel { + if (auto *Cmp = dyn_cast(U.getUser())) { + // Allow use in equality comparisons. + if (Cmp->isEquality()) + return getUniqueKernelFor(*Cmp); + return nullptr; + } + if (auto *CB = dyn_cast(U.getUser())) { + // Allow direct calls. + if (CB->isCallee(&U)) + return getUniqueKernelFor(*CB); + // Allow the use in __kmpc_kernel_prepare_parallel calls. + if (Function *Callee = CB->getCalledFunction()) + if (Callee->getName() == "__kmpc_kernel_prepare_parallel") + return getUniqueKernelFor(*CB); + return nullptr; + } + // Disallow every other use. + return nullptr; + }; + + // TODO: In the future we want to track more than just a unique kernel. + SmallPtrSet PotentialKernels; + foreachUse(F, [&](const Use &U) { + PotentialKernels.insert(GetUniqueKernelForUse(U)); + }); + + Kernel K = nullptr; + if (PotentialKernels.size() == 1) + K = *PotentialKernels.begin(); + + // Cache the result. 
+ UniqueKernelMap[&F] = K; + + return K; +} + +bool OpenMPOpt::rewriteDeviceCodeStateMachine() { + constexpr unsigned KMPC_KERNEL_PARALLEL_WORK_FN_PTR_ARG_NO = 0; + + OMPInformationCache::RuntimeFunctionInfo &KernelPrepareParallelRFI = + OMPInfoCache.RFIs[OMPRTL___kmpc_kernel_prepare_parallel]; + + bool Changed = false; + if (!KernelPrepareParallelRFI) + return Changed; + + for (Function *F : SCC) { + + // Check if the function is uses in a __kmpc_kernel_prepare_parallel call at + // all. + bool UnknownUse = false; + unsigned NumDirectCalls = 0; + + SmallVector ToBeReplacedStateMachineUses; + foreachUse(*F, [&](Use &U) { + if (auto *CB = dyn_cast(U.getUser())) + if (CB->isCallee(&U)) { + ++NumDirectCalls; + return; + } + + if (auto *Cmp = dyn_cast(U.getUser())) { + ToBeReplacedStateMachineUses.push_back(&U); + return; + } + if (CallInst *CI = OpenMPOpt::getCallIfRegularCall( + *U.getUser(), &KernelPrepareParallelRFI)) { + ToBeReplacedStateMachineUses.push_back(&U); + return; + } + UnknownUse = true; + }); + + // If this ever hits, we should investigate. + if (UnknownUse || NumDirectCalls != 1) + continue; + + // TODO: This is not a necessary restriction and should be lifted. + if (ToBeReplacedStateMachineUses.size() != 2) + continue; + + // Even if we have __kmpc_kernel_prepare_parallel calls, we (for now) give + // up if the function is not called from a unique kernel. + Kernel K = getUniqueKernelFor(*F); + if (!K) + continue; + + // We now know F is a parallel body function called only from the kernel K. + // We also identified the state machine uses in which we replace the + // function pointer by a new global symbol for identification purposes. This + // ensures only direct calls to the function are left. 
+ + Module &M = *F->getParent(); + Type *Int8Ty = Type::getInt8Ty(M.getContext()); + + auto *ID = new GlobalVariable( + M, Int8Ty, /* isConstant */ true, GlobalValue::PrivateLinkage, + UndefValue::get(Int8Ty), F->getName() + ".ID"); + + for (Use *U : ToBeReplacedStateMachineUses) + U->set(ConstantExpr::getBitCast(ID, U->get()->getType())); + + ++NumOpenMPParallelRegionsReplacedInGPUStateMachine; + + Changed = true; + } + + return Changed; +} + /// Abstract Attribute for tracking ICV values. struct AAICVTracker : public StateWrapper { using Base = StateWrapper; diff --git a/llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll b/llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll new file mode 100644 index 000000000000..0a8d7a9d231a --- /dev/null +++ b/llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll @@ -0,0 +1,153 @@ +; RUN: opt -S -passes=openmpopt -pass-remarks=openmp-opt -openmp-print-gpu-kernels < %s | FileCheck %s +; RUN: opt -S -openmpopt -pass-remarks=openmp-opt -openmp-print-gpu-kernels < %s | FileCheck %s + +; C input used for this test: + +; void bar(void) { +; #pragma omp parallel +; { } +; } +; void foo(void) { +; #pragma omp target teams +; { +; #pragma omp parallel +; {} +; bar(); +; #pragma omp parallel +; {} +; } +; } + +; Verify we replace the function pointer uses for the first and last outlined +; region (1 and 3) but not for the middle one (2) because it could be called from +; another kernel. 
+ +; CHECK-DAG: @__omp_outlined__1_wrapper.ID = private constant i8 undef +; CHECK-DAG: @__omp_outlined__3_wrapper.ID = private constant i8 undef + +; CHECK-DAG: icmp eq i8* %5, @__omp_outlined__1_wrapper.ID +; CHECK-DAG: icmp eq i8* %7, @__omp_outlined__3_wrapper.ID + +; CHECK-DAG: call void @__kmpc_kernel_prepare_parallel(i8* @__omp_outlined__1_wrapper.ID) +; CHECK-DAG: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void ()* @__omp_outlined__2_wrapper to i8*)) +; CHECK-DAG: call void @__kmpc_kernel_prepare_parallel(i8* @__omp_outlined__3_wrapper.ID) + + +%struct.ident_t = type { i32, i32, i32, i32, i8* } + +define internal void @__omp_offloading_35_a1e179_foo_l7_worker() { +entry: + %work_fn = alloca i8*, align 8 + %exec_status = alloca i8, align 1 + store i8* null, i8** %work_fn, align 8 + store i8 0, i8* %exec_status, align 1 + br label %.await.work + +.await.work: ; preds = %.barrier.parallel, %entry + call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) + %0 = call i1 @__kmpc_kernel_parallel(i8** %work_fn) + %1 = zext i1 %0 to i8 + store i8 %1, i8* %exec_status, align 1 + %2 = load i8*, i8** %work_fn, align 8 + %should_terminate = icmp eq i8* %2, null + br i1 %should_terminate, label %.exit, label %.select.workers + +.select.workers: ; preds = %.await.work + %3 = load i8, i8* %exec_status, align 1 + %is_active = icmp ne i8 %3, 0 + br i1 %is_active, label %.execute.parallel, label %.barrier.parallel + +.execute.parallel: ; preds = %.select.workers + %4 = call i32 @__kmpc_global_thread_num(%struct.ident_t* null) + %5 = load i8*, i8** %work_fn, align 8 + %work_match = icmp eq i8* %5, bitcast (void ()* @__omp_outlined__1_wrapper to i8*) + br i1 %work_match, label %.execute.fn, label %.check.next + +.execute.fn: ; preds = %.execute.parallel + call void @__omp_outlined__1_wrapper() + br label %.terminate.parallel + +.check.next: ; preds = %.execute.parallel + %6 = load i8*, i8** %work_fn, align 8 + %work_match1 = icmp eq i8* %6, bitcast (void 
()* @__omp_outlined__2_wrapper to i8*) + br i1 %work_match1, label %.execute.fn2, label %.check.next3 + +.execute.fn2: ; preds = %.check.next + call void @__omp_outlined__2_wrapper() + br label %.terminate.parallel + +.check.next3: ; preds = %.check.next + %7 = load i8*, i8** %work_fn, align 8 + %work_match4 = icmp eq i8* %7, bitcast (void ()* @__omp_outlined__3_wrapper to i8*) + br i1 %work_match4, label %.execute.fn5, label %.check.next6 + +.execute.fn5: ; preds = %.check.next3 + call void @__omp_outlined__3_wrapper() + br label %.terminate.parallel + +.check.next6: ; preds = %.check.next3 + %8 = bitcast i8* %2 to void ()* + call void %8() + br label %.terminate.parallel + +.terminate.parallel: ; preds = %.check.next6, %.execute.fn5, %.execute.fn2, %.execute.fn + call void @__kmpc_kernel_end_parallel() + br label %.barrier.parallel + +.barrier.parallel: ; preds = %.terminate.parallel, %.select.workers + call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) + br label %.await.work + +.exit: ; preds = %.await.work + ret void +} + +define weak void @__omp_offloading_35_a1e179_foo_l7() { + call void @__omp_offloading_35_a1e179_foo_l7_worker() + call void @__omp_outlined__() + ret void +} + +define internal void @__omp_outlined__() { + call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void ()* @__omp_outlined__1_wrapper to i8*)) + call void @bar() + call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void ()* @__omp_outlined__3_wrapper to i8*)) + ret void +} + +define internal void @__omp_outlined__1() { + ret void +} + +define internal void @__omp_outlined__1_wrapper() { + call void @__omp_outlined__1() + ret void +} + +define hidden void @bar() { + call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void ()* @__omp_outlined__2_wrapper to i8*)) + ret void +} + +define internal void @__omp_outlined__2_wrapper() { + ret void +} + +define internal void @__omp_outlined__3_wrapper() { + ret void +} + +declare void 
@__kmpc_kernel_prepare_parallel(i8* %WorkFn) + +declare zeroext i1 @__kmpc_kernel_parallel(i8** nocapture %WorkFn) + +declare void @__kmpc_kernel_end_parallel() + +declare void @__kmpc_barrier_simple_spmd(%struct.ident_t* nocapture readnone %loc_ref, i32 %tid) + +declare i32 @__kmpc_global_thread_num(%struct.ident_t* nocapture readnone) + + +!nvvm.annotations = !{!0} + +!0 = !{void ()* @__omp_offloading_35_a1e179_foo_l7, !"kernel", i32 1} From llvm-commits at lists.llvm.org Fri Jul 10 23:50:20 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 06:50:20 +0000 (UTC) Subject: [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine In-Reply-To: References: Message-ID: <0417fc03f7ed856ec40ad0a1b8f16a2b@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG5b0581aedc22: [OpenMP] Replace function pointer uses in GPU state machine (authored by jdoerfert). Changed prior to commit: https://reviews.llvm.org/D83271?vs=275927&id=277222#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83271/new/ https://reviews.llvm.org/D83271 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83271.277222.patch Type: text/x-patch Size: 12103 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Fri Jul 10 23:50:51 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 06:50:51 +0000 (UTC) Subject: [PATCH] D83560: [DebugInfo] Added support for DW_OP_implicit_value in llvm In-Reply-To: References: Message-ID: <1d69276d8a64b461453d863aaf0c7a37@localhost.localdomain> dblaikie added a comment. 
Not sure it's worth committing such a narrow implementation - might be worth a bit of generalization even in this first patch? Or do you have specific/near-term plans to generalize this further (both the addImplicitValue itself, which looks to be very specific to long double right now (but without any assertions, API features, or comments to enforce that restriction) - and in the DwarfDebug.cpp caller, which could presumably be used for all constant values, maybe (not sure if that would be a good thing or not - haven't looked at the alternatives, etc))

@aprantl @probinson will probably want to weigh in on getting support from their debuggers before this is committed, or having this under a flag, etc.

================
Comment at: llvm/test/DebugInfo/X86/DW_OP_implicit_value.ll:13-18
+;;int main() {
+;; long double ld = 3.14;
+;; printf("dummy\n");
+;; ld *= ld;
+;; return 0;
+;;}
----------------
I'd probably write this as:
```
long double src();
void f1() {
  long double ld = 3.14;
  ld = src();
}
```
That should be enough to use the implicit_value to represent the value of 'ld' during the call to src, and to use a location list to do it (since the assignment to 'ld' is shortening the lifetime - using function calls at least I find are clearer opaque clobbers/sinks - rather than the complexity of printf, or wondering whether the use of multiplication is significant, whether this has to be main, has to have an integer return value for some reason, etc.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83560/new/

https://reviews.llvm.org/D83560

From llvm-commits at lists.llvm.org Fri Jul 10 23:55:33 2020
From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 06:55:33 +0000 (UTC)
Subject: [PATCH] D82975: [DebugInfo] Allow GNU macro extension to be emitted
In-Reply-To:
References:
Message-ID: <65b263d7ac8f971c4a5c38486fe6019c@localhost.localdomain>

dblaikie added inline comments.
================ Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1359-1368 + dwarf::Attribute MacrosAttr = getDwarfVersion() >= 5 + ? dwarf::DW_AT_macros + : dwarf::DW_AT_GNU_macros; if (useSplitDwarf()) TheCU.addSectionDelta( - TheCU.getUnitDie(), dwarf::DW_AT_macros, U.getMacroLabelBegin(), + TheCU.getUnitDie(), MacrosAttr, U.getMacroLabelBegin(), TLOF.getDwarfMacroDWOSection()->getBeginSymbol()); ---------------- dstenb wrote: > dblaikie wrote: > > Looks like this might be wrong for v4 + split DWARF + using macro? Or perhaps this code isn't reachable by that combination? > > > > Might be more clear, then, to sink the MacrosAttr choice down into the "else" clause here, and assert in the split DWARF case that the version >= 5? (possibly including a note about how the pre-v5, GCC debug_macro extension isn't supported with Split DWARF) > Sorry, in what way does this look wrong? If I am not overlooking something, this look the same as what GCC emits for the attribute in the `-g3 -gdwarf-4 -gsplit-dwarf` case. > > Regardless of the above, doing like you suggest and adding an assert seems like a good idea. > Sorry, in what way does this look wrong? If I am not overlooking something, this look the same as what GCC emits for the attribute in the -g3 -gdwarf-4 -gsplit-dwarf case. I think that's what we were discussing at length previously in this review, that GNU debug_macro + v4 + split DWARF seems a bit ill specified, and it's probably best to have -fdebug-macro + v4 + -gsplit-dwarf continue to use debug_macinfo rather than the ill-specified and not-implemented-by-any-consumer v4 Split DWARF .debug_macro? 
https://reviews.llvm.org/D82975#2142264 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82975/new/ https://reviews.llvm.org/D82975 From llvm-commits at lists.llvm.org Fri Jul 10 23:57:14 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 06:57:14 +0000 (UTC) Subject: [PATCH] D83557: [DebugInfo] Simplify DwarfDebug::emitMacro In-Reply-To: References: Message-ID: <1a5074e7bee761d65994a97c5498f305@localhost.localdomain> dblaikie accepted this revision. dblaikie added a comment. This revision is now accepted and ready to land. Looks good, thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83557/new/ https://reviews.llvm.org/D83557 From llvm-commits at lists.llvm.org Fri Jul 10 23:59:32 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 06:59:32 +0000 (UTC) Subject: [PATCH] D82886: [DebugInfo] Fix a possible crash when reading a malformed .debug_*lists section. In-Reply-To: References: Message-ID: <3e4cf3a386278f6a5ad3401bc57ecceb@localhost.localdomain> dblaikie added a comment. In D82886#2140817 , @ikudrin wrote: > In D82886#2139657 , @dblaikie wrote: > > > I'm not suggesting it needs to be fixed - but that that codepath (the one that returns zero) is untested - so when it was committed, it was committed without test coverage. It'd be good to add test coverage where it is missing like this. > > > Isn't adding that test coverage orthogonal to this particular patch? Oh yeah, for sure! Didn't mean to suggest it should go here! >> Right - but what I mean is if there's only 10 bytes, as in your example - it reads the 4 bytes of DWARF64 mark, then 6 bytes out of the desired 8 - if the length was then reported as 10 (with an error saying the length was garbled/the contents terminated earlier than expected), would that be adequate to no longer need the zero length special case? 
> > I am OK with the current convention that if it is not possible to read the length field the code returns zero as the total length. It might be better to make the result `Optional` and return `None` in that case, but I really doubt it is worth investing time in that. Reporting something that was not read from the section (10 in your example) smells not good for me, Not sure I follow - 10 bytes of the length (as many bytes as were available to read) were read from the section, right? Ah, you mean the length bytes did not describe a length of 10, but 10 bytes were read. *nod* I can see how that's a bit ambiguous, indeed. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82886/new/ https://reviews.llvm.org/D82886 From llvm-commits at lists.llvm.org Sat Jul 11 00:18:15 2020 From: llvm-commits at lists.llvm.org (Mehdi Amini via llvm-commits) Date: Sat, 11 Jul 2020 00:18:15 -0700 (PDT) Subject: [llvm] c44702b - Remove unused variable `KMPC_KERNEL_PARALLEL_WORK_FN_PTR_ARG_NO` (NFC) Message-ID: <5f0967b7.1c69fb81.e0b24.3f3a@mx.google.com> Author: Mehdi Amini Date: 2020-07-11T07:17:28Z New Revision: c44702bcdf8aa829e28399d0d4ac4bfc5ac4fff1 URL: https://github.com/llvm/llvm-project/commit/c44702bcdf8aa829e28399d0d4ac4bfc5ac4fff1 DIFF: https://github.com/llvm/llvm-project/commit/c44702bcdf8aa829e28399d0d4ac4bfc5ac4fff1.diff LOG: Remove unused variable `KMPC_KERNEL_PARALLEL_WORK_FN_PTR_ARG_NO` (NFC) This fixes a compiler warning. 
Added: Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index 4df65f81912b..7d93e78357b3 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -1021,8 +1021,6 @@ Kernel OpenMPOpt::getUniqueKernelFor(Function &F) { } bool OpenMPOpt::rewriteDeviceCodeStateMachine() { - constexpr unsigned KMPC_KERNEL_PARALLEL_WORK_FN_PTR_ARG_NO = 0; - OMPInformationCache::RuntimeFunctionInfo &KernelPrepareParallelRFI = OMPInfoCache.RFIs[OMPRTL___kmpc_kernel_prepare_parallel]; From llvm-commits at lists.llvm.org Sat Jul 11 00:26:05 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 07:26:05 +0000 (UTC) Subject: [PATCH] D83617: [PowerPC] Fix combineVectorShuffle regression after D77448 In-Reply-To: References: Message-ID: MaskRay updated this revision to Diff 277223. MaskRay edited the summary of this revision. MaskRay added a comment. Herald added a subscriber: wuzish. Improve test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83617/new/ https://reviews.llvm.org/D83617 Files: llvm/lib/Target/PowerPC/PPCISelLowering.cpp llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83617.277223.patch Type: text/x-patch Size: 4910 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 00:29:34 2020 From: llvm-commits at lists.llvm.org (Zhang Kang via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 07:29:34 +0000 (UTC) Subject: [PATCH] D83569: [PowerPC] Fix the killed flag in mi-peephole pass In-Reply-To: References: Message-ID: <265b186137957eaa63753cc5e2e7be09@localhost.localdomain> ZhangKang updated this revision to Diff 277224. ZhangKang added a comment. 
Fix the error case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83569/new/ https://reviews.llvm.org/D83569 Files: llvm/lib/Target/PowerPC/PPCMIPeephole.cpp llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir llvm/test/CodeGen/PowerPC/mi-peephole.mir Index: llvm/test/CodeGen/PowerPC/mi-peephole.mir =================================================================== --- llvm/test/CodeGen/PowerPC/mi-peephole.mir +++ llvm/test/CodeGen/PowerPC/mi-peephole.mir @@ -31,7 +31,7 @@ ; CHECK: bb.0.entry: ; CHECK: %1:g8rc = COPY $x4 ; CHECK: %0:g8rc = COPY $x3 - ; CHECK: %3:g8rc = RLDIC %1, 2, 30 + ; CHECK: %3:g8rc = RLDIC killed %1, 2, 30 ; CHECK: $x3 = COPY %3 ; CHECK: BLR8 implicit $lr8, implicit $rm, implicit $x3 ... Index: llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir =================================================================== --- llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir +++ llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir @@ -51,4 +51,4 @@ # # CHECK-PASS-NOT: %2:g8rc = RLDICL killed %1, 0, 32 # CHECK-PASS-NOT: %3:g8rc = RLDICR %2, 2, 61 -# CHECK-PASS: %3:g8rc = RLDIC %1, 2, 30 +# CHECK-PASS: %3:g8rc = RLDIC killed %1, 2, 30 Index: llvm/lib/Target/PowerPC/PPCMIPeephole.cpp =================================================================== --- llvm/lib/Target/PowerPC/PPCMIPeephole.cpp +++ llvm/lib/Target/PowerPC/PPCMIPeephole.cpp @@ -1556,6 +1556,11 @@ MI.getOperand(2).setImm(NewSH); MI.getOperand(3).setImm(NewMB); + if (SrcMI->getOperand(1).isKill()) { + MI.getOperand(1).setIsKill(true); + SrcMI->getOperand(1).setIsKill(false); + } + LLVM_DEBUG(dbgs() << "To: "); LLVM_DEBUG(MI.dump()); NumRotatesCollapsed++; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83569.277224.patch Type: text/x-patch Size: 1538 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 00:40:00 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Sat, 11 Jul 2020 00:40:00 -0700 (PDT) Subject: [llvm] dce6bc1 - [OpenMP][FIX] remove unused variable and long if-else chain Message-ID: <5f096cd0.1c69fb81.dc73a.3516@mx.google.com> Author: Johannes Doerfert Date: 2020-07-11T02:37:57-05:00 New Revision: dce6bc18c4e1d086182f9faa3f984912566a3c20 URL: https://github.com/llvm/llvm-project/commit/dce6bc18c4e1d086182f9faa3f984912566a3c20 DIFF: https://github.com/llvm/llvm-project/commit/dce6bc18c4e1d086182f9faa3f984912566a3c20.diff LOG: [OpenMP][FIX] remove unused variable and long if-else chain MSVC throws an error if you use "too many" if-else in a row: `Frontend/OpenMP/OMPKinds.def(570): fatal error C1061: compiler limit: blocks nested too deeply` We work around it now... Added: Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index 7d93e78357b3..f25e95466407 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -1397,9 +1397,17 @@ void OpenMPInModule::identifyKernels(Module &M) { bool llvm::omp::containsOpenMP(Module &M, OpenMPInModule &OMPInModule) { if (OMPInModule.isKnown()) return OMPInModule; + + // MSVC doesn't like long if-else chains for some reason and instead just + // issues an error. Work around it.. + do { #define OMP_RTL(_Enum, _Name, ...) \ - else if (M.getFunction(_Name)) OMPInModule = true; + if (M.getFunction(_Name)) { \ + OMPInModule = true; \ + break; \ + } #include "llvm/Frontend/OpenMP/OMPKinds.def" + } while (false); // Identify kernels once. TODO: We should split the OMPInformationCache into a // module and an SCC part. 
The kernel information, among other things, could

From llvm-commits at lists.llvm.org Sat Jul 11 01:16:39 2020
From: llvm-commits at lists.llvm.org (Roman Lebedev via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 08:16:39 +0000 (UTC)
Subject: [PATCH] D83602: [DAGCombiner] Scalarize splats with just one demanded lane
In-Reply-To:
References:
Message-ID: <6f2cd38024141c17f5df7716aac2a083@localhost.localdomain>

lebedev.ri added a comment.

Is this supposed to fix some lowering-produced code? If not, shouldn't this be best done in the middle-end?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83602/new/

https://reviews.llvm.org/D83602

From llvm-commits at lists.llvm.org Sat Jul 11 01:31:52 2020
From: llvm-commits at lists.llvm.org (Juneyoung Lee via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 08:31:52 +0000 (UTC)
Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X
In-Reply-To:
References:
Message-ID: <81d79bd7eaa401c8b41e786a71226387@localhost.localdomain>

aqjune added a comment.

Seems like a bug in instsimplify:

  define i1 @f(i32 %x, i32 %y) {
    %cmp9.not.1 = icmp eq i32 %x, %y
    %cmp15 = icmp slt i32 %x, %y
    %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15
    %spec.select40 = xor i1 %cmp9.not.1, 1
    %spec.select = and i1 %spec.select39, %spec.select40
    ret i1 %spec.select
  }
  =>
  define i1 @f(i32 %x, i32 %y) {
    %cmp9.not.1 = icmp eq i32 %x, %y
    %cmp15 = icmp slt i32 %x, %y
    %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15
    ret i1 %spec.select39
  }

https://godbolt.org/z/a8f7hT

Alive2 says it's incorrect: https://alive2.llvm.org/ce/z/-8Q4HL

This seems to be related to ValueTracking's isImpliedCondition, since the optimization happens only when the operands of the two icmps are the same.
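The counterexample above can be replayed with a small brute-force model. This is only a sketch under simplifying assumptions and not how Alive2 works: `undef` is modelled as a free choice between `false` and `true`, and the names `Outcomes`, `sourceOutcomes`, `targetOutcomes`, and `isRefinement` are made up for illustration.

```cpp
#include <cassert>
#include <set>

// Set of i1 results a function may produce for one concrete input.
using Outcomes = std::set<bool>;

// Model of the original function: and (select eq, undef, lt), (xor eq, 1).
Outcomes sourceOutcomes(int X, int Y) {
  bool Eq = (X == Y);
  bool Lt = (X < Y);
  Outcomes Result;
  for (bool UndefChoice : {false, true}) { // assumption: undef = any i1 value
    bool Select = Eq ? UndefChoice : Lt;
    bool NotEq = !Eq;
    Result.insert(Select && NotEq);
  }
  return Result;
}

// Model of the simplified function: it returns the select directly.
Outcomes targetOutcomes(int X, int Y) {
  bool Eq = (X == Y);
  bool Lt = (X < Y);
  Outcomes Result;
  for (bool UndefChoice : {false, true})
    Result.insert(Eq ? UndefChoice : Lt);
  return Result;
}

// The rewrite is acceptable for this input only if every result the target
// can produce was already a possible result of the source.
bool isRefinement(int X, int Y) {
  Outcomes Src = sourceOutcomes(X, Y);
  for (bool B : targetOutcomes(X, Y))
    if (Src.count(B) == 0)
      return false;
  return true;
}
```

In this model, equal inputs are the failing case: the `and` with the negated comparison masks the `undef` in the source, so the source can only yield `false`, while the folded form may also yield `true`.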
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Sat Jul 11 01:42:35 2020 From: llvm-commits at lists.llvm.org (Boris Brezillon via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 08:42:35 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: bbrezillon added a comment. >> Since OpenCL allows apps to provide precompiled intermediate representations to the API, in the form of SPIR or SPIR-V, that means that the app could have embedded references to `FP_ILOGBNAN`, which are just a constant value since it's a `#define`, in their own code. They could write a kernel which calls `ilogb` and compares the result to `FP_ILOGBNAN`, and compile that with either clang trunk (which uses opencl-c.h automatically), or with Khronos's offline compiler (https://github.com/KhronosGroup/SPIR) using Khronos's standard library headers (https://github.com/KhronosGroup/libclcxx). Both of these standard library implementation headers define `FP_ILOGBNAN` to be `INT_MAX`: >> >> - https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/opencl-c-base.h#L165 >> - https://github.com/KhronosGroup/libclcxx/blob/96459f111c3e3a4709f7e09bf5fb73dea81a475a/include/opencl_math_constants#L85 > > I've no problem with the compatibility argument. but using `INT_MAX` for both `NaN` and `Inf` is in my understanding wrong. Again, I didn't see this clearly stated in the spec. Can you quote the spec or point us to the relevant section? > If the intention is to use `INT_MAX` for `FP_ILOGBNAN`, the implementation needs to be changed to return a different value for `Inf` (presumably 127 for fp32 and 1023 for fp64, since those can reuse the default codepath). I see several non-CL implementations using the same value for `NaN` and `Inf` [1][2][3]. 
Again, I'm not saying this applies to CL, but I couldn't find anything in the spec forbidding this collision. > Note that LLVM internally uses `INT_MIN + 1 (-INT_MAX)` for `ilogb(0)`, `INT_MIN` for `ilogb(NaN)` and `INT_MAX` for `ilogb(Inf)` [0]. This means that any optimization handling these constants and ilogb operation would be incorrect in LLVM. Hm, don't you have the same problem when such optimizations happen and the binary is linked with a libc that has different definitions for those values? [1]https://pubs.opengroup.org/onlinepubs/007908799/xsh/ilogb.html [2]https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/ilogb-ilogbf-ilogbl2?view=vs-2019 [3]http://redhat.polarhome.com/service/man/?qf=ilogb&tf=2&of=OpenDarwin&sf=3 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Sat Jul 11 02:16:05 2020 From: llvm-commits at lists.llvm.org (Boris Brezillon via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 09:16:05 +0000 (UTC) Subject: [PATCH] D83473: libclc: Fix FP_ILOGBNAN definition In-Reply-To: References: Message-ID: bbrezillon added a comment. >> Note that LLVM internally uses `INT_MIN + 1 (-INT_MAX)` for `ilogb(0)`, `INT_MIN` for `ilogb(NaN)` and `INT_MAX` for `ilogb(Inf)` [0]. This means that any optimization handling these constants and ilogb operation would be incorrect in LLVM. > > Hm, don't you have the same problem when such optimizations happen and the binary is linked with a libc that has different definitions for those values? I had a quick look, and it seems that `ilogb()` results are never returned directly. All callers seem to compare the result to the `IEK_xxx` defs and handle the `Inf`, `NaN` and `Zero` properly or check the arg value before passing it to `ilogb()`, so I don't think this mismatch is an issue. 
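The caller-side pattern described above (classify the argument before trusting the `ilogb()` result) can be sketched in standard C++. This is an illustration only, neither LLVM's internal code nor libclc's implementation; `ExpClass`, `ExpInfo`, and `classifyExponent` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical classification, so callers never compare against the
// platform's FP_ILOGBNAN / FP_ILOGB0 / INT_MAX sentinel values.
enum class ExpClass { Zero, Inf, NaN, Finite };

struct ExpInfo {
  ExpClass Kind;
  int Exponent; // only meaningful when Kind == ExpClass::Finite
};

ExpInfo classifyExponent(double V) {
  // Inspect the argument first; the sentinel returned by ilogb() for
  // these inputs is then never observed.
  if (std::isnan(V))
    return {ExpClass::NaN, 0};
  if (std::isinf(V))
    return {ExpClass::Inf, 0};
  if (V == 0.0)
    return {ExpClass::Zero, 0};
  return {ExpClass::Finite, std::ilogb(V)};
}
```

With this shape, it does not matter whether a given library uses the same sentinel for `NaN` and `Inf`, because the special cases are decided before `ilogb()` is called.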
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83473/new/ https://reviews.llvm.org/D83473 From llvm-commits at lists.llvm.org Sat Jul 11 02:43:13 2020 From: llvm-commits at lists.llvm.org (Nathan James via llvm-commits) Date: Sat, 11 Jul 2020 02:43:13 -0700 (PDT) Subject: [llvm] 4abdcdb - Fix gn builds after 943660fd1 Message-ID: <5f0989b1.1c69fb81.a50cb.5312@mx.google.com> Author: Nathan James Date: 2020-07-11T10:42:57+01:00 New Revision: 4abdcdb45ee22d77dd64a71cb41e967d35361280 URL: https://github.com/llvm/llvm-project/commit/4abdcdb45ee22d77dd64a71cb41e967d35361280 DIFF: https://github.com/llvm/llvm-project/commit/4abdcdb45ee22d77dd64a71cb41e967d35361280.diff LOG: Fix gn builds after 943660fd1 Added: Modified: llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn index 3bf40626fc80..bfc2c7ae5110 100644 --- a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn @@ -7,7 +7,7 @@ static_library("OpenMP") { ] public_deps = [ "//llvm/include/llvm/Frontend/OpenMP:public_tablegen" ] sources = [ - "OMPConstants.cpp", + "OMP.cpp", "OMPContext.cpp", "OMPIRBuilder.cpp", ] From llvm-commits at lists.llvm.org Sat Jul 11 02:45:31 2020 From: llvm-commits at lists.llvm.org (Nathan James via llvm-commits) Date: Sat, 11 Jul 2020 02:45:31 -0700 (PDT) Subject: [llvm] 8fb91df - Revert "Fix gn builds after 943660fd1" Message-ID: <5f098a3b.1c69fb81.1b0ba.55a2@mx.google.com> Author: Nathan James Date: 2020-07-11T10:45:17+01:00 New Revision: 8fb91dfeed1bd1ffdfd31a345e1bf7cf0b7c86e2 URL: https://github.com/llvm/llvm-project/commit/8fb91dfeed1bd1ffdfd31a345e1bf7cf0b7c86e2 DIFF: 
https://github.com/llvm/llvm-project/commit/8fb91dfeed1bd1ffdfd31a345e1bf7cf0b7c86e2.diff LOG: Revert "Fix gn builds after 943660fd1" This reverts commit 4abdcdb45ee22d77dd64a71cb41e967d35361280. Added: Modified: llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn index bfc2c7ae5110..3bf40626fc80 100644 --- a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn @@ -7,7 +7,7 @@ static_library("OpenMP") { ] public_deps = [ "//llvm/include/llvm/Frontend/OpenMP:public_tablegen" ] sources = [ - "OMP.cpp", + "OMPConstants.cpp", "OMPContext.cpp", "OMPIRBuilder.cpp", ] From llvm-commits at lists.llvm.org Sat Jul 11 02:53:13 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 09:53:13 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <8ee6c2fdcd142aa7287a1384500e30ec@localhost.localdomain> ggeorgakoudis updated this revision to Diff 277228. ggeorgakoudis added a comment. Revert changes to regression tests Check AbstractCallSite is indeed a callback Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 Files: llvm/include/llvm/IR/Function.h llvm/lib/Analysis/CallGraph.cpp llvm/lib/IR/Function.cpp llvm/test/Analysis/CallGraph/ignore-callback-uses.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83370.277228.patch Type: text/x-patch Size: 5096 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 03:25:19 2020 From: llvm-commits at lists.llvm.org (Christudasan Devadasan via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 10:25:19 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. In-Reply-To: References: Message-ID: cdevadas updated this revision to Diff 277230. cdevadas edited the summary of this revision. cdevadas added a comment. Expanded the comment. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83584/new/ https://reviews.llvm.org/D83584 Files: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll Index: llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll @@ -0,0 +1,60 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs -stop-after=amdgpu-isel -o - %s | FileCheck -check-prefix=GCN %s +define void @test() #1 { + ; Clean up the unreachable blocks introduced with LowerSwitch pass. + ; This test ensures that, in the pass flow, UnreachableBlockElim pass + ; follows the LowerSwitch. Otherwise, this testcase will crash + ; immediately after the instruction selection due to the incomplete + ; PHI node in an MBB whose incoming values were never codegenerated. 
+ ; + ; GCN-LABEL: name: test + ; GCN: bb.{{[0-9]+}}.entry: + ; GCN: bb.{{[0-9]+}}.entry.true.blk: + ; GCN: bb.{{[0-9]+}}.entry.false.blk: + ; GCN: bb.{{[0-9]+}}.switch.blk: + + ; GCN-NOT: bb.{{[0-9]+}}.preheader.blk + ; GCN-NOT: bb.{{[0-9]+}}.pre.false.blk: + ; GCN-NOT: bb.{{[0-9]+}}.unreach.blk: + ; GCN-NOT: PHI + + ; GCN: bb.{{[0-9]+}}.exit: + entry: + %idx = tail call i32 @llvm.amdgcn.workitem.id.x() #0 + br i1 undef, label %entry.true.blk, label %entry.false.blk + + entry.true.blk: ; preds = %entry + %exit.cmp = icmp ult i32 %idx, 3 + br i1 %exit.cmp, label %switch.blk, label %exit + + entry.false.blk: ; preds = %entry + unreachable + + switch.blk: ; preds = %entry.true.blk + switch i32 %idx, label %preheader.blk [ + i32 0, label %exit + i32 1, label %exit + i32 2, label %exit + ] + + preheader.blk: ; preds = %switch.blk + %pre.exit = icmp ult i32 %idx, 5 + br i1 %pre.exit, label %unreach.blk, label %pre.false.blk + + pre.false.blk: ; preds = %preheader.blk + %call.pre.false = tail call i32 @func(i32 %idx) #0 + br label %unreach.blk + + unreach.blk: ; preds = %preheader.blk, %pre.false.blk + %phi.val = phi i32 [ %call.pre.false, %pre.false.blk ], [ undef, %preheader.blk ] + store i32 %phi.val, i32* undef + unreachable + + exit: ; preds = %switch.blk + ret void +} + +declare i32 @llvm.amdgcn.workitem.id.x() #0 +declare i32 @func(i32)#0 + +attributes #0 = { nounwind readnone } +attributes #1 = { nounwind } Index: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -787,10 +787,15 @@ if (EnableLoadStoreVectorizer) addPass(createLoadStoreVectorizerPass()); + + // LowerSwitch pass may introduce unreachable blocks that can + // cause unexpected behavior for subsequent passes. 
Placing it + // here seems better that these blocks would get cleaned up by + // UnreachableBlockElim inserted next in the pass flow. + addPass(createLowerSwitchPass()); } bool AMDGPUPassConfig::addPreISel() { - addPass(createLowerSwitchPass()); addPass(createFlattenCFGPass()); return false; } -------------- next part -------------- A non-text attachment was scrubbed... Name: D83584.277230.patch Type: text/x-patch Size: 3279 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 03:42:17 2020 From: llvm-commits at lists.llvm.org (Nathan James via llvm-commits) Date: Sat, 11 Jul 2020 03:42:17 -0700 (PDT) Subject: [llvm] 35af6f1 - Reland Fix gn build after 943660f Message-ID: <5f099789.1c69fb81.ac554.51d9@mx.google.com> Author: Nathan James Date: 2020-07-11T11:42:05+01:00 New Revision: 35af6f11e04b777b73035f59bfabb68a08ca4ad9 URL: https://github.com/llvm/llvm-project/commit/35af6f11e04b777b73035f59bfabb68a08ca4ad9 DIFF: https://github.com/llvm/llvm-project/commit/35af6f11e04b777b73035f59bfabb68a08ca4ad9.diff LOG: Reland Fix gn build after 943660f Added: Modified: llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn Removed: ################################################################################ diff --git a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn index 3bf40626fc80..07b265bcb288 100644 --- a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn @@ -7,7 +7,7 @@ static_library("OpenMP") { ] public_deps = [ "//llvm/include/llvm/Frontend/OpenMP:public_tablegen" ] sources = [ - "OMPConstants.cpp", + "OMP.cpp.inc", "OMPContext.cpp", "OMPIRBuilder.cpp", ] From llvm-commits at lists.llvm.org Sat Jul 11 03:44:45 2020 From: llvm-commits at lists.llvm.org (Nico Weber via llvm-commits) Date: Sat, 11 Jul 2020 03:44:45 -0700 (PDT) Subject: [llvm] 09a95f5 - [gn build] (manually) merge 943660fd15f193 
Message-ID: <5f09981d.1c69fb81.5e829.3298@mx.google.com> Author: Nico Weber Date: 2020-07-11T06:44:28-04:00 New Revision: 09a95f51fb1fb86442418d891f67a43e2a3ca698 URL: https://github.com/llvm/llvm-project/commit/09a95f51fb1fb86442418d891f67a43e2a3ca698 DIFF: https://github.com/llvm/llvm-project/commit/09a95f51fb1fb86442418d891f67a43e2a3ca698.diff LOG: [gn build] (manually) merge 943660fd15f193 Added: Modified: llvm/lib/Frontend/OpenMP/CMakeLists.txt llvm/utils/gn/secondary/llvm/include/llvm/Frontend/OpenMP/BUILD.gn llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn Removed: ################################################################################ diff --git a/llvm/lib/Frontend/OpenMP/CMakeLists.txt b/llvm/lib/Frontend/OpenMP/CMakeLists.txt index f88e3ed98662..068283fd82e0 100644 --- a/llvm/lib/Frontend/OpenMP/CMakeLists.txt +++ b/llvm/lib/Frontend/OpenMP/CMakeLists.txt @@ -15,4 +15,4 @@ add_llvm_component_library(LLVMFrontendOpenMP intrinsics_gen omp_gen omp_cpp - ) \ No newline at end of file + ) diff --git a/llvm/utils/gn/secondary/llvm/include/llvm/Frontend/OpenMP/BUILD.gn b/llvm/utils/gn/secondary/llvm/include/llvm/Frontend/OpenMP/BUILD.gn index 9942a3647b58..a18f8db5f5eb 100644 --- a/llvm/utils/gn/secondary/llvm/include/llvm/Frontend/OpenMP/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/include/llvm/Frontend/OpenMP/BUILD.gn @@ -6,13 +6,6 @@ tablegen("OMP") { output_name = "OMP.h.inc" } -tablegen("OMPImpl") { - visibility = [ ":public_tablegen" ] - args = [ "-gen-directive-impl" ] - td_file = "OMP.td" - output_name = "OMP.cpp.inc" -} - # Groups all tablegen() calls that create .inc files that are included in # Frontent/OpenMP's public headers (just one so far). # //llvm/lib/Frontend/OpenMP has this as a public_dep, so targets depending on @@ -21,6 +14,5 @@ group("public_tablegen") { public_deps = [ # Frontend/OpenMP's public headers include OMP.h.inc. 
":OMP", - ":OMPImpl", ] } diff --git a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn index 07b265bcb288..688a25e3c1df 100644 --- a/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn +++ b/llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn @@ -1,14 +1,23 @@ +import("//llvm/utils/TableGen/tablegen.gni") + +tablegen("OMPImpl") { + visibility = [ ":OpenMP" ] + args = [ "-gen-directive-impl" ] + td_file = "//llvm/include/llvm/Frontend/OpenMP/OMP.td" + output_name = "OMP.cpp" +} + static_library("OpenMP") { output_name = "LLVMFrontendOpenMP" deps = [ + ":OMPImpl", "//llvm/lib/IR", "//llvm/lib/Support", "//llvm/lib/Transforms/Utils", ] public_deps = [ "//llvm/include/llvm/Frontend/OpenMP:public_tablegen" ] sources = [ - "OMP.cpp.inc", "OMPContext.cpp", "OMPIRBuilder.cpp", - ] + ] + get_target_outputs(":OMPImpl") } From llvm-commits at lists.llvm.org Sat Jul 11 03:45:36 2020 From: llvm-commits at lists.llvm.org (Sameer Sahasrabuddhe via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 10:45:36 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. In-Reply-To: References: Message-ID: <7d9c1778976fbb52fff8d6bd88226ee7@localhost.localdomain> sameerds accepted this revision. sameerds added a comment. This revision is now accepted and ready to land. LGTM! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83584/new/ https://reviews.llvm.org/D83584 From llvm-commits at lists.llvm.org Sat Jul 11 03:53:55 2020 From: llvm-commits at lists.llvm.org (Roman Lebedev via llvm-commits) Date: Sat, 11 Jul 2020 03:53:55 -0700 (PDT) Subject: [llvm] 4500db8 - Revert "Reland "[InstCombine] Lower infinite combine loop detection thresholds""" Message-ID: <5f099a43.1c69fb81.3d0d9.43e0@mx.google.com> Author: Roman Lebedev Date: 2020-07-11T13:53:24+03:00 New Revision: 4500db8c59621a31c622862a2946457fdee481ce URL: https://github.com/llvm/llvm-project/commit/4500db8c59621a31c622862a2946457fdee481ce DIFF: https://github.com/llvm/llvm-project/commit/4500db8c59621a31c622862a2946457fdee481ce.diff LOG: Revert "Reland "[InstCombine] Lower infinite combine loop detection thresholds""" And there's a new hit: https://bugs.llvm.org/show_bug.cgi?id=46680 This reverts commit 7103c87596efccd532e9fe04a6ba6a200fed8481. Added: Modified: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp index e810b3de25bc..d1c1e5418825 100644 --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -123,13 +123,8 @@ STATISTIC(NumReassoc , "Number of reassociations"); DEBUG_COUNTER(VisitCounter, "instcombine-visit", "Controls which instructions are visited"); -// FIXME: these limits eventually should be as low as 2. 
static constexpr unsigned InstCombineDefaultMaxIterations = 1000; -#ifndef NDEBUG -static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 100; -#else static constexpr unsigned InstCombineDefaultInfiniteLoopThreshold = 1000; -#endif static cl::opt EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"), From llvm-commits at lists.llvm.org Sat Jul 11 04:02:57 2020 From: llvm-commits at lists.llvm.org (Alexey Lapshin via llvm-commits) Date: Sat, 11 Jul 2020 04:02:57 -0700 (PDT) Subject: [llvm] f7907e9 - [TRE] allow TRE for non-capturing calls. Message-ID: <5f099c61.1c69fb81.a7a54.5528@mx.google.com> Author: Alexey Lapshin Date: 2020-07-11T14:01:48+03:00 New Revision: f7907e9d223d8484f9afd457ba614c2db2ae4743 URL: https://github.com/llvm/llvm-project/commit/f7907e9d223d8484f9afd457ba614c2db2ae4743 DIFF: https://github.com/llvm/llvm-project/commit/f7907e9d223d8484f9afd457ba614c2db2ae4743.diff LOG: [TRE] allow TRE for non-capturing calls. The current implementation of Tail Recursion Elimination has a very restricted pre-requisite: AllCallsAreTailCalls. i.e. it requires that no function call receives a pointer to local stack. Generally, function calls that receive a pointer to local stack but do not capture it - should not break TRE. This fix allows us to do TRE if it is proved that no pointer to the local stack is escaped. 
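For illustration only, a minimal C++ sketch of the source-level pattern this log message describes, modeled on the C++ source quoted in tre-noncapturing-alloca-calls.ll further down in this message. The loop version is a hand-written approximation of what tail recursion elimination conceptually produces, not compiler output; `counter`, `testLoop`, `runRecursive` and `runLoop` are names assumed here and do not appear in the commit.

```cpp
// Global accumulator (called 'count' in the test's C++ source; renamed here).
static int counter = 0;

// Reads through the pointer but never stores it anywhere, so the pointer to
// the caller's local does not escape (it is not captured).
void globalIncrement(const int *param) { counter += *param; }

// Recursive form: the address of the local 'temp' is passed to a call, and
// the self-recursive call is the last thing the function does.
void test(int recurseCount) {
  if (recurseCount == 0)
    return;
  int temp = 10;
  globalIncrement(&temp);
  test(recurseCount - 1);
}

// Hand-written loop form (assumption: roughly what TRE turns 'test' into).
// The single stack slot for 'temp' is reused on every iteration.
void testLoop(int recurseCount) {
  while (recurseCount != 0) {
    int temp = 10;
    globalIncrement(&temp);
    recurseCount = recurseCount - 1;
  }
}

// Helpers that reset the accumulator so both forms can be compared.
int runRecursive(int n) {
  counter = 0;
  test(n);
  return counter;
}

int runLoop(int n) {
  counter = 0;
  testLoop(n);
  return counter;
}
```

Both helpers should return the same value for any non-negative argument. That equivalence holds only because `globalIncrement` does not keep the pointer; if it did, reusing one stack slot across iterations could be observable.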
Reviewed by: efriedma Differential Revision: https://reviews.llvm.org/D82085 Added: llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll Modified: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp llvm/test/Transforms/TailCallElim/basic.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp b/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp index 5bb1d54d7d12..bfd312a52ea5 100644 --- a/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp +++ b/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp @@ -81,6 +81,7 @@ #include "llvm/Support/raw_ostream.h" #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" +#include "llvm/Transforms/Utils/Local.h" using namespace llvm; #define DEBUG_TYPE "tailcallelim" @@ -92,7 +93,10 @@ STATISTIC(NumAccumAdded, "Number of accumulators introduced"); /// Scan the specified function for alloca instructions. /// If it contains any dynamic allocas, returns false. static bool canTRE(Function &F) { - // Because of PR962, we don't TRE dynamic allocas. + // TODO: We don't do TRE if dynamic allocas are used. + // Dynamic allocas allocate stack space which should be + // deallocated before new iteration started. That is + // currently not implemented. return llvm::all_of(instructions(F), [](Instruction &I) { auto *AI = dyn_cast(&I); return !AI || AI->isStaticAlloca(); @@ -185,11 +189,9 @@ struct AllocaDerivedValueTracker { }; } -static bool markTails(Function &F, bool &AllCallsAreTailCalls, - OptimizationRemarkEmitter *ORE) { +static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) { if (F.callsFunctionThatReturnsTwice()) return false; - AllCallsAreTailCalls = true; // The local stack holds all alloca instructions and all byval arguments. 
AllocaDerivedValueTracker Tracker; @@ -272,11 +274,8 @@ static bool markTails(Function &F, bool &AllCallsAreTailCalls, } } - if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI)) { + if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI)) DeferredTails.push_back(CI); - } else { - AllCallsAreTailCalls = false; - } } for (auto *SuccBB : make_range(succ_begin(BB), succ_end(BB))) { @@ -313,8 +312,6 @@ static bool markTails(Function &F, bool &AllCallsAreTailCalls, LLVM_DEBUG(dbgs() << "Marked as tail call candidate: " << *CI << "\n"); CI->setTailCall(); Modified = true; - } else { - AllCallsAreTailCalls = false; } } @@ -325,7 +322,16 @@ static bool markTails(Function &F, bool &AllCallsAreTailCalls, /// instruction from after the call to before the call, assuming that all /// instructions between the call and this instruction are movable. /// -static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA) { +static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA, + DenseMap &AllocaForValue) { + if (isa(I)) + return true; + + if (const IntrinsicInst *II = dyn_cast(I)) + if (II->getIntrinsicID() == Intrinsic::lifetime_end && + llvm::findAllocaForValue(II->getArgOperand(1), AllocaForValue)) + return true; + // FIXME: We can move load/store/call/free instructions above the call if the // call does not mod/ref the memory location being processed. if (I->mayHaveSideEffects()) // This also handles volatile loads. @@ -392,7 +398,6 @@ class TailRecursionEliminator { // createTailRecurseLoopHeader the first time we find a call we can eliminate. BasicBlock *HeaderBB = nullptr; SmallVector ArgumentPHIs; - bool RemovableCallsMustBeMarkedTail = false; // PHI node to store our return value. PHINode *RetPN = nullptr; @@ -414,13 +419,15 @@ class TailRecursionEliminator { // The instruction doing the accumulating. Instruction *AccumulatorRecursionInstr = nullptr; + // The cache for pairs. 
+ DenseMap AllocaForValue; + TailRecursionEliminator(Function &F, const TargetTransformInfo *TTI, AliasAnalysis *AA, OptimizationRemarkEmitter *ORE, DomTreeUpdater &DTU) : F(F), TTI(TTI), AA(AA), ORE(ORE), DTU(DTU) {} - CallInst *findTRECandidate(Instruction *TI, - bool CannotTailCallElimCallsMarkedTail); + CallInst *findTRECandidate(Instruction *TI); void createTailRecurseLoopHeader(CallInst *CI); @@ -428,11 +435,9 @@ class TailRecursionEliminator { bool eliminateCall(CallInst *CI); - bool foldReturnAndProcessPred(ReturnInst *Ret, - bool CannotTailCallElimCallsMarkedTail); + bool foldReturnAndProcessPred(ReturnInst *Ret); - bool processReturningBlock(ReturnInst *Ret, - bool CannotTailCallElimCallsMarkedTail); + bool processReturningBlock(ReturnInst *Ret); void cleanupAndFinalize(); @@ -443,8 +448,7 @@ class TailRecursionEliminator { }; } // namespace -CallInst *TailRecursionEliminator::findTRECandidate( - Instruction *TI, bool CannotTailCallElimCallsMarkedTail) { +CallInst *TailRecursionEliminator::findTRECandidate(Instruction *TI) { BasicBlock *BB = TI->getParent(); if (&BB->front() == TI) // Make sure there is something before the terminator. @@ -464,9 +468,9 @@ CallInst *TailRecursionEliminator::findTRECandidate( --BBI; } - // If this call is marked as a tail call, and if there are dynamic allocas in - // the function, we cannot perform this optimization. - if (CI->isTailCall() && CannotTailCallElimCallsMarkedTail) + assert((!CI->isTailCall() || !CI->isNoTailCall()) && + "Incompatible call site attributes(Tail,NoTail)"); + if (!CI->isTailCall()) return nullptr; // As a special case, detect code like this: @@ -498,26 +502,13 @@ void TailRecursionEliminator::createTailRecurseLoopHeader(CallInst *CI) { BranchInst *BI = BranchInst::Create(HeaderBB, NewEntry); BI->setDebugLoc(CI->getDebugLoc()); - // If this function has self recursive calls in the tail position where some - // are marked tail and some are not, only transform one flavor or another. 
- // We have to choose whether we move allocas in the entry block to the new - // entry block or not, so we can't make a good choice for both. We make this - // decision here based on whether the first call we found to remove is - // marked tail. - // NOTE: We could do slightly better here in the case that the function has - // no entry block allocas. - RemovableCallsMustBeMarkedTail = CI->isTailCall(); - - // If this tail call is marked 'tail' and if there are any allocas in the - // entry block, move them up to the new entry block. - if (RemovableCallsMustBeMarkedTail) - // Move all fixed sized allocas from HeaderBB to NewEntry. - for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(), - NEBI = NewEntry->begin(); - OEBI != E;) - if (AllocaInst *AI = dyn_cast(OEBI++)) - if (isa(AI->getArraySize())) - AI->moveBefore(&*NEBI); + // Move all fixed sized allocas from HeaderBB to NewEntry. + for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(), + NEBI = NewEntry->begin(); + OEBI != E;) + if (AllocaInst *AI = dyn_cast(OEBI++)) + if (isa(AI->getArraySize())) + AI->moveBefore(&*NEBI); // Now that we have created a new block, which jumps to the entry // block, insert a PHI node for each argument of the function. 
@@ -592,7 +583,7 @@ bool TailRecursionEliminator::eliminateCall(CallInst *CI) { Instruction *AccRecInstr = nullptr; BasicBlock::iterator BBI(CI); for (++BBI; &*BBI != Ret; ++BBI) { - if (canMoveAboveCall(&*BBI, CI, AA)) + if (canMoveAboveCall(&*BBI, CI, AA, AllocaForValue)) continue; // If we can't move the instruction above the call, it might be because it @@ -620,9 +611,6 @@ bool TailRecursionEliminator::eliminateCall(CallInst *CI) { if (!HeaderBB) createTailRecurseLoopHeader(CI); - if (RemovableCallsMustBeMarkedTail && !CI->isTailCall()) - return false; - // Ok, now that we know we have a pseudo-entry block WITH all of the // required PHI nodes, add entries into the PHI node for the actual // parameters passed into the tail-recursive call. @@ -672,8 +660,7 @@ bool TailRecursionEliminator::eliminateCall(CallInst *CI) { return true; } -bool TailRecursionEliminator::foldReturnAndProcessPred( - ReturnInst *Ret, bool CannotTailCallElimCallsMarkedTail) { +bool TailRecursionEliminator::foldReturnAndProcessPred(ReturnInst *Ret) { BasicBlock *BB = Ret->getParent(); bool Change = false; @@ -698,8 +685,7 @@ bool TailRecursionEliminator::foldReturnAndProcessPred( while (!UncondBranchPreds.empty()) { BranchInst *BI = UncondBranchPreds.pop_back_val(); BasicBlock *Pred = BI->getParent(); - if (CallInst *CI = - findTRECandidate(BI, CannotTailCallElimCallsMarkedTail)) { + if (CallInst *CI = findTRECandidate(BI)) { LLVM_DEBUG(dbgs() << "FOLDING: " << *BB << "INTO UNCOND BRANCH PRED: " << *Pred); FoldReturnIntoUncondBranch(Ret, BB, Pred, &DTU); @@ -720,9 +706,8 @@ bool TailRecursionEliminator::foldReturnAndProcessPred( return Change; } -bool TailRecursionEliminator::processReturningBlock( - ReturnInst *Ret, bool CannotTailCallElimCallsMarkedTail) { - CallInst *CI = findTRECandidate(Ret, CannotTailCallElimCallsMarkedTail); +bool TailRecursionEliminator::processReturningBlock(ReturnInst *Ret) { + CallInst *CI = findTRECandidate(Ret); if (!CI) return false; @@ -810,35 +795,25 @@ bool 
TailRecursionEliminator::eliminate(Function &F, return false; bool MadeChange = false; - bool AllCallsAreTailCalls = false; - MadeChange |= markTails(F, AllCallsAreTailCalls, ORE); - if (!AllCallsAreTailCalls) - return MadeChange; + MadeChange |= markTails(F, ORE); // If this function is a varargs function, we won't be able to PHI the args // right, so don't even try to convert it... if (F.getFunctionType()->isVarArg()) return MadeChange; - // If false, we cannot perform TRE on tail calls marked with the 'tail' - // attribute, because doing so would cause the stack size to increase (real - // TRE would deallocate variable sized allocas, TRE doesn't). - bool CanTRETailMarkedCall = canTRE(F); + if (!canTRE(F)) + return MadeChange; TailRecursionEliminator TRE(F, TTI, AA, ORE, DTU); // Change any tail recursive calls to loops. - // - // FIXME: The code generator produces really bad code when an 'escaping - // alloca' is changed from being a static alloca to being a dynamic alloca. - // Until this is resolved, disable this transformation if that would ever - // happen. This bug is PR962. for (Function::iterator BBI = F.begin(), E = F.end(); BBI != E; /*in loop*/) { BasicBlock *BB = &*BBI++; // foldReturnAndProcessPred may delete BB. if (ReturnInst *Ret = dyn_cast(BB->getTerminator())) { - bool Change = TRE.processReturningBlock(Ret, !CanTRETailMarkedCall); + bool Change = TRE.processReturningBlock(Ret); if (!Change && BB->getFirstNonPHIOrDbg() == Ret) - Change = TRE.foldReturnAndProcessPred(Ret, !CanTRETailMarkedCall); + Change = TRE.foldReturnAndProcessPred(Ret); MadeChange |= Change; } } diff --git a/llvm/test/Transforms/TailCallElim/basic.ll b/llvm/test/Transforms/TailCallElim/basic.ll index 6116014a024b..669210da6314 100644 --- a/llvm/test/Transforms/TailCallElim/basic.ll +++ b/llvm/test/Transforms/TailCallElim/basic.ll @@ -12,15 +12,16 @@ define void @test0() { ret void } -; PR615. Make sure that we do not move the alloca so that it interferes with the tail call. 
+; Make sure that we do not do TRE if pointer to local stack +; escapes through function call. define i32 @test1() { ; CHECK: i32 @test1() ; CHECK-NEXT: alloca %A = alloca i32 ; [#uses=2] store i32 5, i32* %A call void @use(i32* %A) -; CHECK: tail call i32 @test1 - %X = tail call i32 @test1() ; [#uses=1] +; CHECK: call i32 @test1 + %X = call i32 @test1() ; [#uses=1] ret i32 %X } diff --git a/llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll b/llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll new file mode 100644 index 000000000000..8f69087dd879 --- /dev/null +++ b/llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll @@ -0,0 +1,125 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s + +; This test checks that TRE would be done for only one recursive call. +; The test_multiple_exits function has three recursive calls. +; First recursive call could not be eliminated because there is +; escaped pointer to local variable. Second recursive call could +; be eliminated. Thrid recursive call could not be eliminated since +; this is not last call. Thus, test checks that TRE would be done +; for only second recursive call. + +; IR for that test was generated from the following C++ source: +; +; void capture_arg (int*); +; void test_multiple_exits (int param); +; if (param >= 0 && param < 10) { +; int temp; +; capture_arg(&temp); +; // TRE could not be done because pointer to local +; // variable "temp" is escaped. +; test_multiple_exits(param + 1); +; } else if (param >=10 && param < 20) { +; // TRE should be done. +; test_multiple_exits(param + 1); +; } else if (param >= 20 && param < 22) { +; // TRE could not be done since recursive +; // call is not last call. 
+; test_multiple_exits(param + 1); +; func(); +; } +; +; return; +; } + +; Function Attrs: noinline optnone uwtable +declare void @_Z11capture_argPi(i32* %param) #0 + +; Function Attrs: noinline optnone uwtable +declare void @_Z4funcv() #0 + +; Function Attrs: noinline nounwind uwtable +define dso_local void @_Z19test_multiple_exitsi(i32 %param) local_unnamed_addr #2 { +; CHECK-LABEL: @_Z19test_multiple_exitsi( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TEMP:%.*]] = alloca i32, align 4 +; CHECK-NEXT: br label [[TAILRECURSE:%.*]] +; CHECK: tailrecurse: +; CHECK-NEXT: [[PARAM_TR:%.*]] = phi i32 [ [[PARAM:%.*]], [[ENTRY:%.*]] ], [ [[ADD6:%.*]], [[IF_THEN5:%.*]] ] +; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i32 [[PARAM_TR]], 10 +; CHECK-NEXT: br i1 [[TMP0]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]] +; CHECK: if.then: +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[TEMP]] to i8* +; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP1]]) #1 +; CHECK-NEXT: call void @_Z11capture_argPi(i32* nonnull [[TEMP]]) +; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[PARAM_TR]], 1 +; CHECK-NEXT: call void @_Z19test_multiple_exitsi(i32 [[ADD]]) +; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP1]]) #1 +; CHECK-NEXT: br label [[IF_END14:%.*]] +; CHECK: if.else: +; CHECK-NEXT: [[PARAM_OFF:%.*]] = add i32 [[PARAM_TR]], -10 +; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[PARAM_OFF]], 10 +; CHECK-NEXT: br i1 [[TMP2]], label [[IF_THEN5]], label [[IF_ELSE7:%.*]] +; CHECK: if.then5: +; CHECK-NEXT: [[ADD6]] = add nuw nsw i32 [[PARAM_TR]], 1 +; CHECK-NEXT: br label [[TAILRECURSE]] +; CHECK: if.else7: +; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[PARAM_TR]], -2 +; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP3]], 20 +; CHECK-NEXT: br i1 [[TMP4]], label [[IF_THEN11:%.*]], label [[IF_END14]] +; CHECK: if.then11: +; CHECK-NEXT: [[ADD12:%.*]] = add nsw i32 [[PARAM_TR]], 1 +; CHECK-NEXT: tail call void @_Z19test_multiple_exitsi(i32 [[ADD12]]) +; CHECK-NEXT: tail call void 
@_Z4funcv() +; CHECK-NEXT: ret void +; CHECK: if.end14: +; CHECK-NEXT: ret void +; +entry: + %temp = alloca i32, align 4 + %0 = icmp ult i32 %param, 10 + br i1 %0, label %if.then, label %if.else + +if.then: ; preds = %entry + %1 = bitcast i32* %temp to i8* + call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %1) #2 + call void @_Z11capture_argPi(i32* nonnull %temp) + %add = add nuw nsw i32 %param, 1 + call void @_Z19test_multiple_exitsi(i32 %add) + call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %1) #2 + br label %if.end14 + +if.else: ; preds = %entry + %param.off = add i32 %param, -10 + %2 = icmp ult i32 %param.off, 10 + br i1 %2, label %if.then5, label %if.else7 + +if.then5: ; preds = %if.else + %add6 = add nuw nsw i32 %param, 1 + call void @_Z19test_multiple_exitsi(i32 %add6) + br label %if.end14 + +if.else7: ; preds = %if.else + %3 = and i32 %param, -2 + %4 = icmp eq i32 %3, 20 + br i1 %4, label %if.then11, label %if.end14 + +if.then11: ; preds = %if.else7 + %add12 = add nsw i32 %param, 1 + call void @_Z19test_multiple_exitsi(i32 %add12) + call void @_Z4funcv() + br label %if.end14 + +if.end14: ; preds = %if.then5, %if.then11, %if.else7, %if.then + ret void +} + +; Function Attrs: argmemonly nounwind willreturn +declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2 + +; Function Attrs: argmemonly nounwind willreturn +declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2 + +attributes #0 = { nofree noinline norecurse nounwind uwtable } +attributes #1 = { nounwind uwtable } +attributes #2 = { argmemonly nounwind willreturn } diff --git a/llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll b/llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll new file mode 100644 index 000000000000..2168437fc570 --- /dev/null +++ b/llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll @@ -0,0 +1,74 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s 
-tailcallelim -verify-dom-info -S | FileCheck %s + +; IR for that test was generated from the following C++ source: +; +;int count; +;__attribute__((noinline)) void globalIncrement(const int* param) { count += *param; } +; +;void test(int recurseCount) +;{ +; if (recurseCount == 0) return; +; int temp = 10; +; globalIncrement(&temp); +; test(recurseCount - 1); +;} +; + + at count = dso_local local_unnamed_addr global i32 0, align 4 + +; Function Attrs: nofree noinline norecurse nounwind uwtable +declare void @_Z15globalIncrementPKi(i32* nocapture readonly %param) #0 + +; Test that TRE could be done for recursive tail routine containing +; call to function receiving a pointer to local stack. + +; Function Attrs: nounwind uwtable +define dso_local void @_Z4testi(i32 %recurseCount) local_unnamed_addr #1 { +; CHECK-LABEL: @_Z4testi( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TEMP:%.*]] = alloca i32, align 4 +; CHECK-NEXT: br label [[TAILRECURSE:%.*]] +; CHECK: tailrecurse: +; CHECK-NEXT: [[RECURSECOUNT_TR:%.*]] = phi i32 [ [[RECURSECOUNT:%.*]], [[ENTRY:%.*]] ], [ [[SUB:%.*]], [[IF_END:%.*]] ] +; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[RECURSECOUNT_TR]], 0 +; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END]] +; CHECK: if.end: +; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[TEMP]] to i8* +; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP0]]) +; CHECK-NEXT: store i32 10, i32* [[TEMP]], align 4 +; CHECK-NEXT: call void @_Z15globalIncrementPKi(i32* nonnull [[TEMP]]) +; CHECK-NEXT: [[SUB]] = add nsw i32 [[RECURSECOUNT_TR]], -1 +; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP0]]) +; CHECK-NEXT: br label [[TAILRECURSE]] +; CHECK: return: +; CHECK-NEXT: ret void +; +entry: + %temp = alloca i32, align 4 + %cmp = icmp eq i32 %recurseCount, 0 + br i1 %cmp, label %return, label %if.end + +if.end: ; preds = %entry + %0 = bitcast i32* %temp to i8* + call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #6 + store i32 10, 
i32* %temp, align 4 + call void @_Z15globalIncrementPKi(i32* nonnull %temp) + %sub = add nsw i32 %recurseCount, -1 + call void @_Z4testi(i32 %sub) + call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #6 + br label %return + +return: ; preds = %entry, %if.end + ret void +} + +; Function Attrs: argmemonly nounwind willreturn +declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2 + +; Function Attrs: argmemonly nounwind willreturn +declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2 + +attributes #0 = { nofree noinline norecurse nounwind uwtable } +attributes #1 = { nounwind uwtable } +attributes #2 = { argmemonly nounwind willreturn } From llvm-commits at lists.llvm.org Sat Jul 11 04:03:03 2020 From: llvm-commits at lists.llvm.org (Alexey Lapshin via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 11:03:03 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: <8b57a192e135eaac29ff80fd1d3aa6f3@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf7907e9d223d: [TRE] allow TRE for non-capturing calls. (authored by avl). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 Files: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp llvm/test/Transforms/TailCallElim/basic.ll llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll
From llvm-commits at lists.llvm.org Sat Jul 11 04:03:07 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 11:03:07 +0000 (UTC) Subject: [PATCH] D70376: [LVI] Restructure caching In-Reply-To: References: Message-ID: nikic added a comment. In D70376#2145225 , @tejohnson wrote: > Hi @nikic, I just tracked down a big compile time increase to this patch. The issue cropped up for a very large function, where a cycle profile showed hotspots in: > > llvm::DenseMapBase, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo >, llvm::detail::DenseMapPair, llvm::ValueLatticeElement> >, llvm::AssertingVH, llvm::ValueLatticeElement, llvm::DenseMapInfo >, llvm::detail::DenseMapPair, llvm::ValueLatticeElement> >::erase(llvm::AssertingVH const&) > > and > > (anonymous namespace)::LVIValueHandle::deleted() > > The problem is related to this patch's restructuring of the LazyValueInfoCache so that instead of a 2-level map from Value* -> BB -> ValueLatticeElement, it now has a 2-level map from BB -> Value -> ValueLatticeElement. The problem is that LVIValueHandle::deleted invokes LazyValueInfoCache::eraseValue on a Value*, which now needs to walk through every entry in the outer map to remove the Value* from every block containing it in its lattice. Before, it could simply do a single lookup on the outer map to remove the Value. > > When I revert this patch the compile time goes down ~35%. > > I noticed in your description this comment: > "A possible alternative would be to always cache by value first and have per-BB maps/sets in the each cache entry. In that case we could use a ValueMap and would avoid the separate value handle set. I went with the BB indexing at the top level to make it easier to integrate D69914 , but possibly that's not the right choice." > > It sounds like your proposed alternative would address this issue.
Would you be able to do that? Unfortunately the alternative approach has its own issues. It would fix this performance problem, but I think it would also raise memory usage significantly. D81811 might be a good way to go about it, because it removes the need for separate tracking of overdefined values, which would fix the non-determinism problem here as a side-effect. Is it possible to share the problematic test case, so I can evaluate different approaches? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70376/new/ https://reviews.llvm.org/D70376 From llvm-commits at lists.llvm.org Sat Jul 11 04:22:20 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Sat, 11 Jul 2020 04:22:20 -0700 (PDT) Subject: [llvm] d7a0569 - [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. Message-ID: <5f09a0ec.1c69fb81.95a14.33b0@mx.google.com> Author: Christudasan Devadasan Date: 2020-07-11T16:33:38+05:30 New Revision: d7a05698efcfa6c596bcaadd8d5154612990f8f3 URL: https://github.com/llvm/llvm-project/commit/d7a05698efcfa6c596bcaadd8d5154612990f8f3 DIFF: https://github.com/llvm/llvm-project/commit/d7a05698efcfa6c596bcaadd8d5154612990f8f3.diff LOG: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. It is possible that LowerSwitch pass leaves certain blocks unreachable from the entry. If not removed, these dead blocks can cause undefined behavior in the subsequent passes. It caused a crash in the AMDGPU backend after the instruction selection when a PHI node has its incoming values coming from these unreachable blocks. In the AMDGPU pass flow, the last invocation of UnreachableBlockElim precedes where LowerSwitch is currently placed and eventually missed out on the opportunity to get these blocks eliminated. This patch ensures that LowerSwitch pass get inserted earlier to make use of the existing unreachable block elimination pass. 
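As a rough illustration of the ordering argument in the log message above, the following self-contained C++ sketch models a control-flow graph as successor lists and computes which blocks are reachable from the entry. It is a toy model under stated assumptions, not LLVM's UnreachableBlockElim or LowerSwitch implementation: `reachableFromEntry`, `cfgAfterLowering` and the block numbering are hypothetical, and the graph assumes that lowering removed the switch's edge to its default block.

```cpp
#include <cstddef>
#include <vector>

// Returns one flag per block: true if the block is reachable from block 0,
// which is treated as the entry block.
std::vector<bool>
reachableFromEntry(const std::vector<std::vector<int>> &succs) {
  std::vector<bool> seen(succs.size(), false);
  if (succs.empty())
    return seen;
  std::vector<int> worklist{0};
  seen[0] = true;
  while (!worklist.empty()) {
    int bb = worklist.back();
    worklist.pop_back();
    for (int succ : succs[static_cast<std::size_t>(bb)]) {
      if (!seen[static_cast<std::size_t>(succ)]) {
        seen[static_cast<std::size_t>(succ)] = true;
        worklist.push_back(succ);
      }
    }
  }
  return seen;
}

// Hypothetical numbering of the blocks named in the commit's test:
// 0 entry, 1 entry.true.blk, 2 entry.false.blk, 3 switch.blk,
// 4 preheader.blk, 5 pre.false.blk, 6 unreach.blk, 7 exit.
std::vector<std::vector<int>> cfgAfterLowering() {
  return {
      {1, 2}, // entry
      {3, 7}, // entry.true.blk
      {},     // entry.false.blk (ends in unreachable)
      {7},    // switch.blk: edge to preheader.blk assumed removed by lowering
      {6, 5}, // preheader.blk (now without predecessors)
      {6},    // pre.false.blk
      {},     // unreach.blk (holds the PHI mentioned in the log)
      {},     // exit
  };
}
```

In this model blocks 4, 5 and 6 come out unreachable, which matches the blocks the new test expects to be absent (`GCN-NOT` lines). A cleanup step that runs before the lowering would never see them in that state, which is the motivation the log gives for moving LowerSwitch earlier.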
Reviewed By: sameerds, arsenm Differential Revision: https://reviews.llvm.org/D83584 Added: llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll Modified: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 8604f5005eb2..b4b10835837c 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -787,10 +787,15 @@ void AMDGPUPassConfig::addCodeGenPrepare() { if (EnableLoadStoreVectorizer) addPass(createLoadStoreVectorizerPass()); + + // LowerSwitch pass may introduce unreachable blocks that can + // cause unexpected behavior for subsequent passes. Placing it + // here seems better that these blocks would get cleaned up by + // UnreachableBlockElim inserted next in the pass flow. + addPass(createLowerSwitchPass()); } bool AMDGPUPassConfig::addPreISel() { - addPass(createLowerSwitchPass()); addPass(createFlattenCFGPass()); return false; } diff --git a/llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll b/llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll new file mode 100644 index 000000000000..13c4dc80be15 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll @@ -0,0 +1,60 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs -stop-after=amdgpu-isel -o - %s | FileCheck -check-prefix=GCN %s +define void @test() #1 { + ; Clean up the unreachable blocks introduced with LowerSwitch pass. + ; This test ensures that, in the pass flow, UnreachableBlockElim pass + ; follows the LowerSwitch. Otherwise, this testcase will crash + ; immediately after the instruction selection due to the incomplete + ; PHI node in an MBB whose incoming values were never codegenerated. 
+ ; + ; GCN-LABEL: name: test + ; GCN: bb.{{[0-9]+}}.entry: + ; GCN: bb.{{[0-9]+}}.entry.true.blk: + ; GCN: bb.{{[0-9]+}}.entry.false.blk: + ; GCN: bb.{{[0-9]+}}.switch.blk: + + ; GCN-NOT: bb.{{[0-9]+}}.preheader.blk + ; GCN-NOT: bb.{{[0-9]+}}.pre.false.blk: + ; GCN-NOT: bb.{{[0-9]+}}.unreach.blk: + ; GCN-NOT: PHI + + ; GCN: bb.{{[0-9]+}}.exit: + entry: + %idx = tail call i32 @llvm.amdgcn.workitem.id.x() #0 + br i1 undef, label %entry.true.blk, label %entry.false.blk + + entry.true.blk: ; preds = %entry + %exit.cmp = icmp ult i32 %idx, 3 + br i1 %exit.cmp, label %switch.blk, label %exit + + entry.false.blk: ; preds = %entry + unreachable + + switch.blk: ; preds = %entry.true.blk + switch i32 %idx, label %preheader.blk [ + i32 0, label %exit + i32 1, label %exit + i32 2, label %exit + ] + + preheader.blk: ; preds = %switch.blk + %pre.exit = icmp ult i32 %idx, 5 + br i1 %pre.exit, label %unreach.blk, label %pre.false.blk + + pre.false.blk: ; preds = %preheader.blk + %call.pre.false = tail call i32 @func(i32 %idx) #0 + br label %unreach.blk + + unreach.blk: ; preds = %preheader.blk, %pre.false.blk + %phi.val = phi i32 [ %call.pre.false, %pre.false.blk ], [ undef, %preheader.blk ] + store i32 %phi.val, i32* undef + unreachable + + exit: ; preds = %switch.blk + ret void +} + +declare i32 @llvm.amdgcn.workitem.id.x() #0 +declare i32 @func(i32)#0 + +attributes #0 = { nounwind readnone } +attributes #1 = { nounwind } From llvm-commits at lists.llvm.org Sat Jul 11 04:22:29 2020 From: llvm-commits at lists.llvm.org (Christudasan Devadasan via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 11:22:29 +0000 (UTC) Subject: [PATCH] D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. In-Reply-To: References: Message-ID: <8194017ce7b4bda39a5fe5dbf8b92c66@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGd7a05698efcf: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare. (authored by cdevadas). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83584/new/ https://reviews.llvm.org/D83584 Files: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll
From llvm-commits at lists.llvm.org Sat Jul 11 05:36:28 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Sat, 11 Jul 2020 05:36:28 -0700 (PDT) Subject: [llvm] 850b150 - [Attributor][NFC] Add more debug output for deleted functions Message-ID: <5f09b24c.1c69fb81.b3ce.5b1a@mx.google.com> Author: sstefan1 Date: 2020-07-11T14:26:08+02:00 New Revision: 850b150cff3dfb5f2113d9c3c483e2d22b318ced URL: https://github.com/llvm/llvm-project/commit/850b150cff3dfb5f2113d9c3c483e2d22b318ced DIFF: https://github.com/llvm/llvm-project/commit/850b150cff3dfb5f2113d9c3c483e2d22b318ced.diff LOG: [Attributor][NFC] Add more debug output for deleted functions Added: Modified: llvm/lib/Transforms/IPO/Attributor.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/Attributor.cpp b/llvm/lib/Transforms/IPO/Attributor.cpp index 6d7f08bfbe07..7f252079e053 100644 --- a/llvm/lib/Transforms/IPO/Attributor.cpp +++ b/llvm/lib/Transforms/IPO/Attributor.cpp @@ -1180,6 +1180,9 @@ ChangeStatus Attributor::cleanupIR() { } } + LLVM_DEBUG(dbgs() << "[Attributor] DeadInsts size: " << DeadInsts.size() + << "\n"); + RecursivelyDeleteTriviallyDeadInstructions(DeadInsts); if (unsigned NumDeadBlocks = ToBeDeletedBlocks.size()) { @@ -1238,6 +1241,9 @@ ChangeStatus Attributor::cleanupIR() { NumFnDeleted += ToBeDeletedFunctions.size(); + LLVM_DEBUG(dbgs() << "[Attributor] Deleted " << NumFnDeleted + << " functions after manifest.\n"); + #ifdef EXPENSIVE_CHECKS for (Function *F : Functions) { if (ToBeDeletedFunctions.count(F)) From llvm-commits at lists.llvm.org Sat Jul 11 07:09:56 2020 From: llvm-commits at lists.llvm.org (Michael Liao via llvm-commits) Date: Sat, 11 Jul 2020 07:09:56 -0700 (PDT) Subject: [llvm] 0b4cf80 - [fix-irreducible] Skip unreachable predecessors.
Message-ID: <5f09c834.1c69fb81.a57d0.474d@mx.google.com> Author: Michael Liao Date: 2020-07-11T10:08:44-04:00 New Revision: 0b4cf802fad4f504aefbeb70c061e60cff10d153 URL: https://github.com/llvm/llvm-project/commit/0b4cf802fad4f504aefbeb70c061e60cff10d153 DIFF: https://github.com/llvm/llvm-project/commit/0b4cf802fad4f504aefbeb70c061e60cff10d153.diff LOG: [fix-irreducible] Skip unreachable predecessors. Summary: - Skip unreachable predecessors during header detection in SCC. Those unreachable blocks would be generated in the switch lowering pass in the corner cases or other frontends. Even though they could be removed through the CFG simplification, we should skip them during header detection. Reviewers: sameerds Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83562 Added: llvm/test/Transforms/FixIrreducible/unreachable.ll Modified: llvm/lib/Transforms/Utils/FixIrreducible.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Utils/FixIrreducible.cpp b/llvm/lib/Transforms/Utils/FixIrreducible.cpp index 510c033f6474..452463c9b627 100644 --- a/llvm/lib/Transforms/Utils/FixIrreducible.cpp +++ b/llvm/lib/Transforms/Utils/FixIrreducible.cpp @@ -281,6 +281,9 @@ static bool makeReducible(LoopInfo &LI, DominatorTree &DT, Graph &&G) { LLVM_DEBUG(dbgs() << "Found headers:"); for (auto BB : reverse(Blocks)) { for (const auto P : predecessors(BB)) { + // Skip unreachable predecessors. 
+ if (!DT.isReachableFromEntry(P)) + continue; if (!Blocks.count(P)) { LLVM_DEBUG(dbgs() << " " << BB->getName()); Headers.insert(BB); diff --git a/llvm/test/Transforms/FixIrreducible/unreachable.ll b/llvm/test/Transforms/FixIrreducible/unreachable.ll new file mode 100644 index 000000000000..71cd81e01953 --- /dev/null +++ b/llvm/test/Transforms/FixIrreducible/unreachable.ll @@ -0,0 +1,24 @@ +; RUN: opt %s -fix-irreducible -S -o - | FileCheck %s + +; CHECK-LABEL: @unreachable( +; CHECK: entry: +; CHECK-NOT: irr.guard: +define void @unreachable(i32 %n) { +entry: + br label %loop.body + +loop.body: + br label %inner.block + +unreachable.block: + br label %inner.block + +inner.block: + br i1 undef, label %loop.exit, label %loop.latch + +loop.latch: + br label %loop.body + +loop.exit: + ret void +} From llvm-commits at lists.llvm.org Sat Jul 11 07:09:58 2020 From: llvm-commits at lists.llvm.org (Michael Liao via llvm-commits) Date: Sat, 11 Jul 2020 07:09:58 -0700 (PDT) Subject: [llvm] 81db614 - Fix `-Wunused-variable` warnings. NFC. Message-ID: <5f09c836.1c69fb81.95d9.5eaf@mx.google.com> Author: Michael Liao Date: 2020-07-11T10:09:44-04:00 New Revision: 81db614411bdc8f95e5b7e2acaf551507eb7201b URL: https://github.com/llvm/llvm-project/commit/81db614411bdc8f95e5b7e2acaf551507eb7201b DIFF: https://github.com/llvm/llvm-project/commit/81db614411bdc8f95e5b7e2acaf551507eb7201b.diff LOG: Fix `-Wunused-variable` warnings. NFC. 
Added: Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index f25e95466407..2a3b2abf6176 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -1043,12 +1043,12 @@ bool OpenMPOpt::rewriteDeviceCodeStateMachine() { return; } - if (auto *Cmp = dyn_cast(U.getUser())) { + if (isa(U.getUser())) { ToBeReplacedStateMachineUses.push_back(&U); return; } - if (CallInst *CI = OpenMPOpt::getCallIfRegularCall( - *U.getUser(), &KernelPrepareParallelRFI)) { + if (OpenMPOpt::getCallIfRegularCall(*U.getUser(), + &KernelPrepareParallelRFI)) { ToBeReplacedStateMachineUses.push_back(&U); return; } From llvm-commits at lists.llvm.org Sat Jul 11 07:10:12 2020 From: llvm-commits at lists.llvm.org (Michael Liao via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 14:10:12 +0000 (UTC) Subject: [PATCH] D83562: [fix-irreducible] Skip unreachable predecessors. In-Reply-To: References: Message-ID: <28967b37092e990e1257fc5deaf3fc74@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG0b4cf802fad4: [fix-irreducible] Skip unreachable predecessors. (authored by hliao). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83562/new/ https://reviews.llvm.org/D83562 Files: llvm/lib/Transforms/Utils/FixIrreducible.cpp llvm/test/Transforms/FixIrreducible/unreachable.ll Index: llvm/test/Transforms/FixIrreducible/unreachable.ll =================================================================== --- /dev/null +++ llvm/test/Transforms/FixIrreducible/unreachable.ll @@ -0,0 +1,24 @@ +; RUN: opt %s -fix-irreducible -S -o - | FileCheck %s + +; CHECK-LABEL: @unreachable( +; CHECK: entry: +; CHECK-NOT: irr.guard: +define void @unreachable(i32 %n) { +entry: + br label %loop.body + +loop.body: + br label %inner.block + +unreachable.block: + br label %inner.block + +inner.block: + br i1 undef, label %loop.exit, label %loop.latch + +loop.latch: + br label %loop.body + +loop.exit: + ret void +} Index: llvm/lib/Transforms/Utils/FixIrreducible.cpp =================================================================== --- llvm/lib/Transforms/Utils/FixIrreducible.cpp +++ llvm/lib/Transforms/Utils/FixIrreducible.cpp @@ -281,6 +281,9 @@ LLVM_DEBUG(dbgs() << "Found headers:"); for (auto BB : reverse(Blocks)) { for (const auto P : predecessors(BB)) { + // Skip unreachable predecessors. + if (!DT.isReachableFromEntry(P)) + continue; if (!Blocks.count(P)) { LLVM_DEBUG(dbgs() << " " << BB->getName()); Headers.insert(BB); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83562.277238.patch Type: text/x-patch Size: 1223 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 07:30:25 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 14:30:25 +0000 (UTC) Subject: [PATCH] D83576: [BasicAA] Fix -basicaa-recphi for geps with negative offsets In-Reply-To: References: Message-ID: <20d4135e6444a0efe5ae168ccd8cbb89@localhost.localdomain> dmgreen updated this revision to Diff 277241. dmgreen added a comment. Thanks. 
This creates a lambda, adds inbounds and isNegative, and updates some comments.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83576/new/

https://reviews.llvm.org/D83576

Files:
  llvm/lib/Analysis/BasicAliasAnalysis.cpp
  llvm/test/Analysis/BasicAA/recphi.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83576.277241.patch
Type: text/x-patch
Size: 5559 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Sat Jul 11 07:33:09 2020
From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 14:33:09 +0000 (UTC)
Subject: [PATCH] D83624: [DWARFYAML] Implement the .debug_rnglists section.
Message-ID:

Higuoxing created this revision.
Higuoxing added reviewers: jhenderson, grimar, MaskRay, aprantl.
Herald added subscribers: llvm-commits, hiraditya, emaste.
Herald added a reviewer: espindola.
Herald added a project: LLVM.

This patch implements the .debug_rnglists section. We are able to produce the .debug_rnglists section by the following syntax.

  debug_rnglists:
    - Format:              DWARF32 ## Optional
      Length:              0x1234  ## Optional
      Version:             5       ## Required
      AddressSize:         0x08    ## Optional
      SegmentSelectorSize: 0x00    ## Optional
      OffsetEntryCount:    2       ## Optional
      Offsets:             [1, 2]  ## Optional
      Lists:
        - Entries:
            - Operator: DW_RLE_base_address
              Values:   [ 0x1234 ]

The generated .debug_rnglists is verified by llvm-dwarfdump, except for the operator DW_RLE_startx_endx, since llvm-dwarfdump doesn't support it.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83624

Files:
  llvm/include/llvm/ObjectYAML/DWARFEmitter.h
  llvm/include/llvm/ObjectYAML/DWARFYAML.h
  llvm/lib/ObjectYAML/DWARFEmitter.cpp
  llvm/lib/ObjectYAML/DWARFYAML.cpp
  llvm/lib/ObjectYAML/ELFEmitter.cpp
  llvm/test/tools/yaml2obj/ELF/DWARF/debug-rnglists.yaml

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83624.277243.patch
Type: text/x-patch
Size: 39541 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Sat Jul 11 08:10:26 2020
From: llvm-commits at lists.llvm.org (Joachim Protze via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 15:10:26 +0000 (UTC)
Subject: [PATCH] D83625: [TSan] Optimize handling of racy address
Message-ID:

protze.joachim created this revision.
protze.joachim added reviewers: kcc, dvyukov.
protze.joachim added a project: Sanitizers.
Herald added a reviewer: jdoerfert.
Herald added subscribers: Sanitizers, sstefan1.

This patch splits the handling of racy address and racy stack into separate functions. If a race was already reported for the address, we can avoid the cost of collecting the involved stacks.

This patch also removes the race condition in storing the racy address / racy stack. This race allows all threads to report the race. Because all threads get the read lock first, it is quite probable that they all finish the lookup before one thread gets the chance to acquire the write lock.

For certain data race patterns in OpenMP programs, this patch significantly reduces the execution time. As an example, the execution times for the code below:

  master (report_bugs=1): real 0m24s
  master (report_bugs=0): real 0m0.2s
  patch (report_bugs=1): real 0m0.5s
  patch (report_bugs=0): real 0m0.2s

  #include <stdio.h>
  int main(void)
  {
    long sum=0;
  #pragma omp parallel num_threads(4) //reduction(+:sum)
    for(int i=0; i<1000000; i++)
    {
      sum++;
    }
    printf("Sum: %ld\n",sum);
  }

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83625

Files:
  compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83625.277242.patch
Type: text/x-patch
Size: 4173 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Sat Jul 11 08:49:18 2020
From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 15:49:18 +0000 (UTC)
Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable
In-Reply-To:
References:
Message-ID: <89f92c9e6a3123b3edecba822c966af4@localhost.localdomain>

benshi001 updated this revision to Diff 277244.
benshi001 edited the summary of this revision.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83153/new/

https://reviews.llvm.org/D83153

Files:
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
  llvm/test/CodeGen/RISCV/addimm-mulimm.ll
  llvm/test/CodeGen/X86/urem-seteq-nonzero.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83153.277244.patch
Type: text/x-patch
Size: 11010 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Sat Jul 11 08:56:04 2020
From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 15:56:04 +0000 (UTC)
Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable
In-Reply-To:
References:
Message-ID:

benshi001 added a comment.

Changes made in response to all your comments:

1. Separate the test cases to show the improvement in another patch. Done: https://reviews.llvm.org/D83159, which has landed.

2. Make sure c1 and c2 do not exceed int64, to avoid an assertion failure. Done: one more if-statement is added to check that.

3. Check whether c1*c2 overflows. If we stop the transform when c1*c2 overflows, x86 will be impacted a lot, and I am afraid of introducing more regressions.

4. Make an inverse transform if "opt -instcombine" has been performed. Shall we separate this inverse transform into another patch?
At least this patch improves the test cases urem-seteq-nonzero.ll and llvm/test/CodeGen/RISCV/addimm-mulimm.ll.

5. Some other small fixes.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83153/new/

https://reviews.llvm.org/D83153

From llvm-commits at lists.llvm.org Sat Jul 11 08:59:33 2020
From: llvm-commits at lists.llvm.org (Ben Shi via Phabricator via llvm-commits)
Date: Sat, 11 Jul 2020 15:59:33 +0000 (UTC)
Subject: [PATCH] D83153: [DAGCombiner] Prevent regression in isMulAddWithConstProfitable
In-Reply-To:
References:
Message-ID:

benshi001 added a comment.

Conclusion:

1. RISCV is improved.
2. X86 is slightly improved.
3. For aarch64's test urem-seteq-nonzero.ll:
   3.1. 2 cases have one more instruction emitted,
   3.2. 2 other cases have one less instruction emitted,
   3.3. 9 other cases have no change in instruction count, but have madd replaced by mul.

Since madd has larger latency than mul, I think my change also improves aarch64 overall.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83153/new/

https://reviews.llvm.org/D83153

From llvm-commits at lists.llvm.org Sat Jul 11 09:45:23 2020
From: llvm-commits at lists.llvm.org (via llvm-commits)
Date: Sat, 11 Jul 2020 09:45:23 -0700 (PDT)
Subject: [llvm] 6e42a41 - [flang][openmp] Check clauses allowed semantic with tablegen generated map
Message-ID: <5f09eca3.1c69fb81.486ff.5652@mx.google.com>

Author: Valentin Clement
Date: 2020-07-11T12:45:12-04:00
New Revision: 6e42a417bacbfd5a1f58b0ccb7c9b34ff9e54523

URL: https://github.com/llvm/llvm-project/commit/6e42a417bacbfd5a1f58b0ccb7c9b34ff9e54523
DIFF: https://github.com/llvm/llvm-project/commit/6e42a417bacbfd5a1f58b0ccb7c9b34ff9e54523.diff

LOG: [flang][openmp] Check clauses allowed semantic with tablegen generated map

Summary:
This patch enables the generation of clause enum sets for semantic checks in Flang through tablegen.
The enum sets and the directive-to-sets map are generated by the new tablegen infrastructure for OpenMP and other directive languages. The semantic checks for OpenMP are modified to use this newly generated map.

Reviewers: DavidTruby, sscalpone, kiranchandramohan, ichoyjx, jdoerfert

Reviewed By: DavidTruby, ichoyjx

Subscribers: mgorny, yaxunl, hiraditya, guansong, sstefan1, aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83326

Added:

Modified:
    flang/lib/Semantics/check-omp-structure.cpp
    flang/lib/Semantics/check-omp-structure.h
    flang/test/Semantics/omp-clause-validity01.f90
    llvm/include/llvm/Frontend/Directive/DirectiveBase.td
    llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt
    llvm/include/llvm/Frontend/OpenMP/OMP.td
    llvm/test/TableGen/directive1.td
    llvm/test/TableGen/directive2.td
    llvm/utils/TableGen/DirectiveEmitter.cpp
    llvm/utils/TableGen/TableGen.cpp
    llvm/utils/TableGen/TableGenBackends.h

Removed:

################################################################################
diff --git a/flang/lib/Semantics/check-omp-structure.cpp b/flang/lib/Semantics/check-omp-structure.cpp index b4e86faffe19..a5f65bcbc804 100644 --- a/flang/lib/Semantics/check-omp-structure.cpp +++ b/flang/lib/Semantics/check-omp-structure.cpp @@ -13,58 +13,6 @@ namespace Fortran::semantics { -static OmpClauseSet doAllowedClauses{llvm::omp::Clause::OMPC_private, - llvm::omp::Clause::OMPC_firstprivate, llvm::omp::Clause::OMPC_lastprivate, - llvm::omp::Clause::OMPC_linear, llvm::omp::Clause::OMPC_reduction}; -static OmpClauseSet doAllowedOnceClauses{llvm::omp::Clause::OMPC_schedule, - llvm::omp::Clause::OMPC_collapse, llvm::omp::Clause::OMPC_ordered}; - -static OmpClauseSet simdAllowedClauses{llvm::omp::Clause::OMPC_linear, - llvm::omp::Clause::OMPC_aligned, llvm::omp::Clause::OMPC_private, - llvm::omp::Clause::OMPC_lastprivate, llvm::omp::Clause::OMPC_reduction}; -static OmpClauseSet simdAllowedOnceClauses{llvm::omp::Clause::OMPC_collapse, -
llvm::omp::Clause::OMPC_safelen, llvm::omp::Clause::OMPC_simdlen}; - -static OmpClauseSet parallelAllowedClauses{llvm::omp::Clause::OMPC_default, - llvm::omp::Clause::OMPC_private, llvm::omp::Clause::OMPC_firstprivate, - llvm::omp::Clause::OMPC_shared, llvm::omp::Clause::OMPC_copyin, - llvm::omp::Clause::OMPC_reduction}; -static OmpClauseSet parallelAllowedOnceClauses{llvm::omp::Clause::OMPC_if, - llvm::omp::Clause::OMPC_num_threads, llvm::omp::Clause::OMPC_proc_bind}; - -static OmpClauseSet taskloopAllowedClauses{llvm::omp::Clause::OMPC_shared, - llvm::omp::Clause::OMPC_private, llvm::omp::Clause::OMPC_firstprivate, - llvm::omp::Clause::OMPC_lastprivate, llvm::omp::Clause::OMPC_default, - llvm::omp::Clause::OMPC_untied, llvm::omp::Clause::OMPC_mergeable, - llvm::omp::Clause::OMPC_nogroup}; -static OmpClauseSet taskloopAllowedOnceClauses{llvm::omp::Clause::OMPC_collapse, - llvm::omp::Clause::OMPC_if, llvm::omp::Clause::OMPC_final, - llvm::omp::Clause::OMPC_priority}; -static OmpClauseSet taskloopAllowedExclusiveClauses{ - llvm::omp::Clause::OMPC_grainsize, llvm::omp::Clause::OMPC_num_tasks}; - -static OmpClauseSet distributeAllowedClauses{llvm::omp::Clause::OMPC_private, - llvm::omp::Clause::OMPC_firstprivate, llvm::omp::Clause::OMPC_lastprivate}; -static OmpClauseSet distributeAllowedOnceClauses{ - llvm::omp::Clause::OMPC_collapse, llvm::omp::Clause::OMPC_dist_schedule}; - -static OmpClauseSet targetAllowedClauses{llvm::omp::Clause::OMPC_if, - llvm::omp::Clause::OMPC_private, llvm::omp::Clause::OMPC_firstprivate, - llvm::omp::Clause::OMPC_map, llvm::omp::Clause::OMPC_is_device_ptr, - llvm::omp::Clause::OMPC_depend}; -static OmpClauseSet targetAllowedOnceClauses{llvm::omp::Clause::OMPC_device, - llvm::omp::Clause::OMPC_defaultmap, llvm::omp::Clause::OMPC_nowait}; - -static OmpClauseSet teamsAllowedClauses{llvm::omp::Clause::OMPC_private, - llvm::omp::Clause::OMPC_firstprivate, llvm::omp::Clause::OMPC_shared, - llvm::omp::Clause::OMPC_reduction}; -static 
OmpClauseSet teamsAllowedOnceClauses{llvm::omp::Clause::OMPC_num_teams, - llvm::omp::Clause::OMPC_thread_limit, llvm::omp::Clause::OMPC_default}; - -static OmpClauseSet sectionsAllowedClauses{llvm::omp::Clause::OMPC_private, - llvm::omp::Clause::OMPC_firstprivate, llvm::omp::Clause::OMPC_lastprivate, - llvm::omp::Clause::OMPC_reduction}; - std::string OmpStructureChecker::ContextDirectiveAsFortran() { auto dir = llvm::omp::getOpenMPDirectiveName(GetContext().directive).str(); std::transform(dir.begin(), dir.end(), dir.begin(), @@ -186,19 +134,18 @@ void OmpStructureChecker::Enter(const parser::OpenMPLoopConstruct &x) { CheckMatching(beginLoopDir, *endLoopDir); } - if (beginDir.v != llvm::omp::Directive::OMPD_do) - PushContext(beginDir.source, beginDir.v); + if (beginDir.v != llvm::omp::Directive::OMPD_do) { + PushContextAndClauseSets(beginDir.source, beginDir.v); + } else { + // 2.7.1 do-clause -> private-clause | + // firstprivate-clause | + // lastprivate-clause | + // linear-clause | + // reduction-clause | + // schedule-clause | + // collapse-clause | + // ordered-clause - switch (beginDir.v) { - // 2.7.1 do-clause -> private-clause | - // firstprivate-clause | - // lastprivate-clause | - // linear-clause | - // reduction-clause | - // schedule-clause | - // collapse-clause | - // ordered-clause - case llvm::omp::Directive::OMPD_do: { // nesting check HasInvalidWorksharingNesting(beginDir.source, {llvm::omp::Directive::OMPD_do, llvm::omp::Directive::OMPD_sections, @@ -210,218 +157,7 @@ void OmpStructureChecker::Enter(const parser::OpenMPLoopConstruct &x) { llvm::omp::Directive::OMPD_ordered, llvm::omp::Directive::OMPD_atomic, llvm::omp::Directive::OMPD_master}); - PushContext(beginDir.source, llvm::omp::Directive::OMPD_do); - SetContextAllowed(doAllowedClauses); - SetContextAllowedOnce(doAllowedOnceClauses); - } break; - - // 2.11.1 parallel-do-clause -> parallel-clause | - // do-clause - case llvm::omp::Directive::OMPD_parallel_do: { - 
SetContextAllowed(parallelAllowedClauses | doAllowedClauses); - SetContextAllowedOnce(parallelAllowedOnceClauses | doAllowedOnceClauses); - } break; - - // 2.8.1 simd-clause -> safelen-clause | - // simdlen-clause | - // linear-clause | - // aligned-clause | - // private-clause | - // lastprivate-clause | - // reduction-clause | - // collapse-clause - case llvm::omp::Directive::OMPD_simd: { - SetContextAllowed(simdAllowedClauses); - SetContextAllowedOnce(simdAllowedOnceClauses); - } break; - - // 2.8.3 do-simd-clause -> do-clause | - // simd-clause - case llvm::omp::Directive::OMPD_do_simd: { - SetContextAllowed(doAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce(doAllowedOnceClauses | simdAllowedOnceClauses); - } break; - - // 2.11.4 parallel-do-simd-clause -> parallel-clause | - // do-simd-clause - case llvm::omp::Directive::OMPD_parallel_do_simd: { - SetContextAllowed( - parallelAllowedClauses | doAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce(parallelAllowedOnceClauses | doAllowedOnceClauses | - simdAllowedOnceClauses); - } break; - - // 2.9.2 taskloop-clause -> if-clause | - // shared-clause | - // private-clause | - // firstprivate-clause | - // lastprivate-clause | - // default-clause | - // grainsize-clause | - // num-tasks-clause | - // collapse-clause | - // final-clause | - // priority-clause | - // untied-clause | - // mergeable-clause | - // nogroup-clause - case llvm::omp::Directive::OMPD_taskloop: { - SetContextAllowed(taskloopAllowedClauses); - SetContextAllowedOnce(taskloopAllowedOnceClauses); - SetContextAllowedExclusive(taskloopAllowedExclusiveClauses); - } break; - - // 2.9.3 taskloop-simd-clause -> taskloop-clause | - // simd-clause - case llvm::omp::Directive::OMPD_taskloop_simd: { - SetContextAllowed((taskloopAllowedClauses | simdAllowedClauses) - - llvm::omp::Clause::OMPC_reduction); - SetContextAllowedOnce(taskloopAllowedOnceClauses | simdAllowedOnceClauses); - 
SetContextAllowedExclusive(taskloopAllowedExclusiveClauses); - } break; - - // 2.10.8 distribute-clause -> private-clause | - // firstprivate-clause | - // lastprivate-clause | - // collapse-clause | - // dist-schedule-clause - case llvm::omp::Directive::OMPD_distribute: { - SetContextAllowed(distributeAllowedClauses); - SetContextAllowedOnce(distributeAllowedOnceClauses); - } break; - - // 2.10.9 distribute-simd-clause -> distribute-clause | - // simd-clause - case llvm::omp::Directive::OMPD_distribute_simd: { - SetContextAllowed(distributeAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce( - distributeAllowedOnceClauses | simdAllowedOnceClauses); - } break; - - // 2.10.10 distribute-parallel-do-clause -> distribute-clause | - // parallel-do-clause - case llvm::omp::Directive::OMPD_distribute_parallel_do: { - SetContextAllowed( - distributeAllowedClauses | parallelAllowedClauses | doAllowedClauses); - SetContextAllowedOnce(distributeAllowedOnceClauses | - parallelAllowedOnceClauses | doAllowedOnceClauses); - } break; - - // 2.10.11 distribute-parallel-do-simd-clause -> distribute-clause | - // parallel-do-simd-clause - case llvm::omp::Directive::OMPD_distribute_parallel_do_simd: { - SetContextAllowed(distributeAllowedClauses | parallelAllowedClauses | - doAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce(distributeAllowedOnceClauses | - parallelAllowedOnceClauses | doAllowedOnceClauses | simdAllowedClauses); - } break; - - // 2.11.6 target-parallel-do-clause -> target-clause | - // parallel-do-clause - case llvm::omp::Directive::OMPD_target_parallel_do: { - SetContextAllowed( - targetAllowedClauses | parallelAllowedClauses | doAllowedClauses); - SetContextAllowedOnce( - (targetAllowedOnceClauses | parallelAllowedOnceClauses | - doAllowedOnceClauses) - - llvm::omp::Clause::OMPC_nowait); - } break; - - // 2.11.7 target-parallel-do-simd-clause -> target-clause | - // parallel-do-simd-clause - case 
llvm::omp::Directive::OMPD_target_parallel_do_simd: { - SetContextAllowed(targetAllowedClauses | parallelAllowedClauses | - doAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce( - (targetAllowedOnceClauses | parallelAllowedOnceClauses | - doAllowedOnceClauses | simdAllowedOnceClauses) - - llvm::omp::Clause::OMPC_nowait); - } break; - - // 2.11.8 target-simd-clause -> target-clause | - // simd-clause - case llvm::omp::Directive::OMPD_target_simd: { - SetContextAllowed(targetAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce(targetAllowedOnceClauses | simdAllowedOnceClauses); - } break; - - // 2.11.10 teams-distribute-clause -> teams-clause | - // distribute-clause - case llvm::omp::Directive::OMPD_teams_distribute: { - SetContextAllowed(teamsAllowedClauses | distributeAllowedClauses); - SetContextAllowedOnce( - teamsAllowedOnceClauses | distributeAllowedOnceClauses); - } break; - - // 2.11.11 teams-distribute-simd-clause -> teams-clause | - // distribute-simd-clause - case llvm::omp::Directive::OMPD_teams_distribute_simd: { - SetContextAllowed( - teamsAllowedClauses | distributeAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce(teamsAllowedOnceClauses | - distributeAllowedOnceClauses | simdAllowedOnceClauses); - } break; - - // 2.11.12 target-teams-distribute-clause -> target-clause | - // teams-distribute-clause - case llvm::omp::Directive::OMPD_target_teams_distribute: { - SetContextAllowed( - targetAllowedClauses | teamsAllowedClauses | distributeAllowedClauses); - SetContextAllowedOnce(targetAllowedOnceClauses | teamsAllowedOnceClauses | - distributeAllowedOnceClauses); - } break; - - // 2.11.13 target-teams-distribute-simd-clause -> target-clause | - // teams-distribute-simd-clause - case llvm::omp::Directive::OMPD_target_teams_distribute_simd: { - SetContextAllowed(targetAllowedClauses | teamsAllowedClauses | - distributeAllowedClauses | simdAllowedClauses); - SetContextAllowed(targetAllowedOnceClauses | teamsAllowedOnceClauses 
| - distributeAllowedOnceClauses | simdAllowedOnceClauses); - } break; - - // 2.11.14 teams-distribute-parallel-do-clause -> teams-clause | - // distribute-parallel-do-clause - case llvm::omp::Directive::OMPD_teams_distribute_parallel_do: { - SetContextAllowed(teamsAllowedClauses | distributeAllowedClauses | - parallelAllowedClauses | doAllowedClauses); - SetContextAllowedOnce(teamsAllowedOnceClauses | - distributeAllowedOnceClauses | parallelAllowedOnceClauses | - doAllowedOnceClauses); - } break; - - // 2.11.15 target-teams-distribute-parallel-do-clause -> target-clause | - // teams-distribute-parallel-do-clause - case llvm::omp::Directive::OMPD_target_teams_distribute_parallel_do: { - SetContextAllowed(targetAllowedClauses | teamsAllowedClauses | - distributeAllowedClauses | parallelAllowedClauses | doAllowedClauses); - SetContextAllowedOnce(targetAllowedOnceClauses | teamsAllowedOnceClauses | - distributeAllowedOnceClauses | parallelAllowedOnceClauses | - doAllowedOnceClauses); - } break; - - // 2.11.16 teams-distribute-parallel-do-clause -> teams-clause | - // distribute-parallel-do-simd-clause - case llvm::omp::Directive::OMPD_teams_distribute_parallel_do_simd: { - SetContextAllowed(teamsAllowedClauses | distributeAllowedClauses | - parallelAllowedClauses | doAllowedClauses | simdAllowedClauses); - SetContextAllowedOnce(teamsAllowedOnceClauses | - distributeAllowedOnceClauses | parallelAllowedOnceClauses | - doAllowedOnceClauses | simdAllowedOnceClauses); - } break; - - case llvm::omp::Directive::OMPD_target_teams_distribute_parallel_do_simd: { - SetContextAllowed(targetAllowedClauses | teamsAllowedClauses | - distributeAllowedClauses | parallelAllowedClauses | doAllowedClauses | - simdAllowedClauses); - SetContextAllowedOnce(targetAllowedOnceClauses | teamsAllowedOnceClauses | - distributeAllowedOnceClauses | parallelAllowedOnceClauses | - doAllowedOnceClauses | simdAllowedOnceClauses); - } break; - - default: - // TODO others - break; + 
PushContextAndClauseSets(beginDir.source, llvm::omp::Directive::OMPD_do); } } @@ -436,12 +172,8 @@ void OmpStructureChecker::Enter(const parser::OmpEndLoopDirective &x) { // 2.7.1 end-do -> END DO [nowait-clause] // 2.8.3 end-do-simd -> END DO SIMD [nowait-clause] case llvm::omp::Directive::OMPD_do: - SetContextDirectiveEnum(llvm::omp::Directive::OMPD_end_do); - SetContextAllowed(OmpClauseSet{llvm::omp::Clause::OMPC_nowait}); - break; case llvm::omp::Directive::OMPD_do_simd: - SetContextDirectiveEnum(llvm::omp::Directive::OMPD_end_do_simd); - SetContextAllowed(OmpClauseSet{llvm::omp::Clause::OMPC_nowait}); + SetClauseSets(dir.v); break; default: // no clauses are allowed @@ -455,112 +187,7 @@ void OmpStructureChecker::Enter(const parser::OpenMPBlockConstruct &x) { const auto &beginDir{ CheckMatching(beginBlockDir, endBlockDir)}; - PushContext(beginDir.source, beginDir.v); - switch (beginDir.v) { - // 2.5 parallel-clause -> if-clause | - // num-threads-clause | - // default-clause | - // private-clause | - // firstprivate-clause | - // shared-clause | - // copyin-clause | - // reduction-clause | - // proc-bind-clause - case llvm::omp::Directive::OMPD_parallel: { - // reserve for nesting check - SetContextAllowed(parallelAllowedClauses); - SetContextAllowedOnce(parallelAllowedOnceClauses); - } break; - // 2.7.3 single-clause -> private-clause | - // firstprivate-clause - case llvm::omp::Directive::OMPD_single: - SetContextAllowed({llvm::omp::Clause::OMPC_private, - llvm::omp::Clause::OMPC_firstprivate}); - break; - // 2.7.4 workshare (no clauses are allowed) - case llvm::omp::Directive::OMPD_workshare: - break; - // 2.11.3 parallel-workshare-clause -> parallel-clause - case llvm::omp::Directive::OMPD_parallel_workshare: { - SetContextAllowed(parallelAllowedClauses); - SetContextAllowedOnce(parallelAllowedOnceClauses); - } break; - // 2.9.1 task-clause -> if-clause | - // final-clause | - // untied-clause | - // default-clause | - // mergeable-clause | - // 
private-clause | - // firstprivate-clause | - // shared-clause | - // depend-clause | - // priority-clause - case llvm::omp::Directive::OMPD_task: { - OmpClauseSet allowed{llvm::omp::Clause::OMPC_untied, - llvm::omp::Clause::OMPC_default, llvm::omp::Clause::OMPC_mergeable, - llvm::omp::Clause::OMPC_private, llvm::omp::Clause::OMPC_firstprivate, - llvm::omp::Clause::OMPC_shared, llvm::omp::Clause::OMPC_depend}; - SetContextAllowed(allowed); - OmpClauseSet allowedOnce{llvm::omp::Clause::OMPC_if, - llvm::omp::Clause::OMPC_final, llvm::omp::Clause::OMPC_priority}; - SetContextAllowedOnce(allowedOnce); - } break; - // 2.10.4 target-clause -> if-clause | - // device-clause | - // private-clause | - // firstprivate-clause | - // map-clause | - // is-device-ptr-clause | - // defaultmap-clause | - // nowait-clause | - // depend-clause - case llvm::omp::Directive::OMPD_target: { - SetContextAllowed(targetAllowedClauses); - SetContextAllowedOnce(targetAllowedOnceClauses); - } break; - // 2.10.7 teams-clause -> num-teams-clause | - // thread-limit-clause | - // default-clause | - // private-clause | - // firstprivate-clause | - // shared-clause | - // reduction-clause - case llvm::omp::Directive::OMPD_teams: { - SetContextAllowed(teamsAllowedClauses); - SetContextAllowedOnce(teamsAllowedOnceClauses); - } break; - // 2.11.9 target-teams -> target-clause | - // teams-clause - case llvm::omp::Directive::OMPD_target_teams: { - SetContextAllowed(targetAllowedClauses | teamsAllowedClauses); - SetContextAllowedOnce(targetAllowedOnceClauses | teamsAllowedOnceClauses); - } break; - // 2.10.1 target-data-clause -> if-clause | - // device-clause | - // map-clause | - // use-device-ptr-clause - case llvm::omp::Directive::OMPD_target_data: { - OmpClauseSet allowed{llvm::omp::Clause::OMPC_if, - llvm::omp::Clause::OMPC_map, llvm::omp::Clause::OMPC_use_device_ptr}; - SetContextAllowed(allowed); - SetContextAllowedOnce({llvm::omp::Clause::OMPC_device}); - 
SetContextRequired({llvm::omp::Clause::OMPC_map}); - } break; - // 2.13.1 master (no clauses are allowed) - case llvm::omp::Directive::OMPD_master: - break; - // 2.11.5 target-parallel-clause -> target-clause | - // parallel-clause - case llvm::omp::Directive::OMPD_target_parallel: { - SetContextAllowed((targetAllowedClauses | parallelAllowedClauses) - - llvm::omp::Clause::OMPC_copyin); - SetContextAllowedOnce( - targetAllowedOnceClauses | parallelAllowedOnceClauses); - } break; - default: - // TODO others - break; - } + PushContextAndClauseSets(beginDir.source, beginDir.v); } void OmpStructureChecker::Leave(const parser::OpenMPBlockConstruct &) { @@ -574,25 +201,7 @@ void OmpStructureChecker::Enter(const parser::OpenMPSectionsConstruct &x) { const auto &beginDir{CheckMatching( beginSectionsDir, endSectionsDir)}; - PushContext(beginDir.source, beginDir.v); - switch (beginDir.v) { - // 2.7.2 sections-clause -> private-clause | - // firstprivate-clause | - // lastprivate-clause | - // reduction-clause - case llvm::omp::Directive::OMPD_sections: { - SetContextAllowed(sectionsAllowedClauses); - } break; - // 2.11.2 -> parallel-sections-clause -> parallel-clause | - // sections-clause - case llvm::omp::Directive::OMPD_parallel_sections: { - SetContextAllowed(parallelAllowedClauses | sectionsAllowedClauses); - SetContextAllowedOnce(parallelAllowedOnceClauses); - } break; - default: - // TODO others - break; - } + PushContextAndClauseSets(beginDir.source, beginDir.v); } void OmpStructureChecker::Leave(const parser::OpenMPSectionsConstruct &) { @@ -616,19 +225,7 @@ void OmpStructureChecker::Enter(const parser::OmpEndSectionsDirective &x) { void OmpStructureChecker::Enter(const parser::OpenMPDeclareSimdConstruct &x) { const auto &dir{std::get(x.t)}; - PushContext(dir.source, llvm::omp::Directive::OMPD_declare_simd); - // 2.8.2 declare-simd-clause -> simdlen-clause | - // linear-clause | - // aligned-clause | - // uniform-clause | - // inbranch-clause | - // 
notinbranch-clause - OmpClauseSet allowed{llvm::omp::Clause::OMPC_linear, - llvm::omp::Clause::OMPC_aligned, llvm::omp::Clause::OMPC_uniform}; - SetContextAllowed(allowed); - SetContextAllowedOnce({llvm::omp::Clause::OMPC_simdlen}); - SetContextAllowedExclusive( - {llvm::omp::Clause::OMPC_inbranch, llvm::omp::Clause::OMPC_notinbranch}); + PushContextAndClauseSets(dir.source, llvm::omp::Directive::OMPD_declare_simd); } void OmpStructureChecker::Leave(const parser::OpenMPDeclareSimdConstruct &) { @@ -652,57 +249,7 @@ void OmpStructureChecker::Leave(const parser::OpenMPDeclareTargetConstruct &) { void OmpStructureChecker::Enter( const parser::OpenMPSimpleStandaloneConstruct &x) { const auto &dir{std::get(x.t)}; - PushContext(dir.source, dir.v); - switch (dir.v) { - case llvm::omp::Directive::OMPD_barrier: { - // 2.13.3 barrier - } break; - case llvm::omp::Directive::OMPD_taskwait: { - // 2.13.4 taskwait - } break; - case llvm::omp::Directive::OMPD_taskyield: { - // 2.9.4 taskyield - } break; - case llvm::omp::Directive::OMPD_target_enter_data: { - // 2.10.2 target-enter-data-clause -> if-clause | - // device-clause | - // map-clause | - // depend-clause | - // nowait-clause - OmpClauseSet allowed{llvm::omp::Clause::OMPC_map, - llvm::omp::Clause::OMPC_depend, llvm::omp::Clause::OMPC_nowait}; - SetContextAllowed(allowed); - OmpClauseSet allowedOnce{ - llvm::omp::Clause::OMPC_device, llvm::omp::Clause::OMPC_if}; - SetContextAllowedOnce(allowedOnce); - SetContextRequired({llvm::omp::Clause::OMPC_map}); - } break; - case llvm::omp::Directive::OMPD_target_exit_data: { - // 2.10.3 target-enter-data-clause -> if-clause | - // device-clause | - // map-clause | - // depend-clause | - // nowait-clause - OmpClauseSet allowed{llvm::omp::Clause::OMPC_map, - llvm::omp::Clause::OMPC_depend, llvm::omp::Clause::OMPC_nowait}; - SetContextAllowed(allowed); - OmpClauseSet allowedOnce{ - llvm::omp::Clause::OMPC_device, llvm::omp::Clause::OMPC_if}; - SetContextAllowedOnce(allowedOnce); - 
SetContextRequired({llvm::omp::Clause::OMPC_map}); - } break; - case llvm::omp::Directive::OMPD_target_update: { - // 2.10.5 target-update - } break; - case llvm::omp::Directive::OMPD_ordered: { - // 2.13.8 ordered-construct-clause -> depend-clause - OmpClauseSet allowed{llvm::omp::Clause::OMPC_depend}; - SetContextAllowed(allowed); - } break; - default: - // TODO others - break; - } + PushContextAndClauseSets(dir.source, dir.v); } void OmpStructureChecker::Leave( @@ -712,7 +259,7 @@ void OmpStructureChecker::Leave( void OmpStructureChecker::Enter(const parser::OpenMPFlushConstruct &x) { const auto &dir{std::get(x.t)}; - PushContext(dir.source, llvm::omp::Directive::OMPD_flush); + PushContextAndClauseSets(dir.source, llvm::omp::Directive::OMPD_flush); } void OmpStructureChecker::Leave(const parser::OpenMPFlushConstruct &) { @@ -721,7 +268,7 @@ void OmpStructureChecker::Leave(const parser::OpenMPFlushConstruct &) { void OmpStructureChecker::Enter(const parser::OpenMPCancelConstruct &x) { const auto &dir{std::get(x.t)}; - PushContext(dir.source, llvm::omp::Directive::OMPD_cancel); + PushContextAndClauseSets(dir.source, llvm::omp::Directive::OMPD_cancel); } void OmpStructureChecker::Leave(const parser::OpenMPCancelConstruct &) { @@ -731,7 +278,8 @@ void OmpStructureChecker::Leave(const parser::OpenMPCancelConstruct &) { void OmpStructureChecker::Enter( const parser::OpenMPCancellationPointConstruct &x) { const auto &dir{std::get(x.t)}; - PushContext(dir.source, llvm::omp::Directive::OMPD_cancellation_point); + PushContextAndClauseSets( + dir.source, llvm::omp::Directive::OMPD_cancellation_point); } void OmpStructureChecker::Leave( diff --git a/flang/lib/Semantics/check-omp-structure.h b/flang/lib/Semantics/check-omp-structure.h index 1585b0c861ad..eff0eb4aa76b 100644 --- a/flang/lib/Semantics/check-omp-structure.h +++ b/flang/lib/Semantics/check-omp-structure.h @@ -25,6 +25,9 @@ using OmpDirectiveSet = Fortran::common::EnumSet; +#define GEN_FLANG_DIRECTIVE_CLAUSE_SETS 
+#include "llvm/Frontend/OpenMP/OMP.cpp.inc" + namespace llvm { namespace omp { static OmpDirectiveSet parallelSet{Directive::OMPD_distribute_parallel_do, @@ -151,6 +154,9 @@ class OmpStructureChecker : public virtual BaseChecker { void Enter(const parser::OmpScheduleClause &); private: +#define GEN_FLANG_DIRECTIVE_CLAUSE_MAP +#include "llvm/Frontend/OpenMP/OMP.cpp.inc" + struct OmpContext { OmpContext(parser::CharBlock source, llvm::omp::Directive d) : directiveSource{source}, directive{d} {} @@ -216,7 +222,20 @@ class OmpStructureChecker : public virtual BaseChecker { void PushContext(const parser::CharBlock &source, llvm::omp::Directive dir) { ompContext_.emplace_back(source, dir); } - + void SetClauseSets(llvm::omp::Directive dir) { + ompContext_.back().allowedClauses = directiveClausesTable[dir].allowed; + ompContext_.back().allowedOnceClauses = + directiveClausesTable[dir].allowedOnce; + ompContext_.back().allowedExclusiveClauses = + directiveClausesTable[dir].allowedExclusive; + ompContext_.back().requiredClauses = + directiveClausesTable[dir].requiredOneOf; + } + void PushContextAndClauseSets( + const parser::CharBlock &source, llvm::omp::Directive dir) { + PushContext(source, dir); + SetClauseSets(dir); + } void RequiresConstantPositiveParameter( const llvm::omp::Clause &clause, const parser::ScalarIntConstantExpr &i); void RequiresPositiveParameter( diff --git a/flang/test/Semantics/omp-clause-validity01.f90 b/flang/test/Semantics/omp-clause-validity01.f90 index e3f43dc5445e..77e40e323e5f 100644 --- a/flang/test/Semantics/omp-clause-validity01.f90 +++ b/flang/test/Semantics/omp-clause-validity01.f90 @@ -458,7 +458,6 @@ enddo !$omp end taskloop simd - !ERROR: REDUCTION clause is not allowed on the TASKLOOP SIMD directive !$omp taskloop simd reduction(+:a) do i = 1, N a = a + 3.14 diff --git a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td index 785a520613b9..3c295a1d7c5f 100644 --- 
a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td +++ b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td @@ -43,6 +43,9 @@ class DirectiveLanguage { // Header file included in the implementation code generated. Usually the // output file of the declaration code generation. Can be left blank. string includeHeader = ""; + + // EnumSet class name used for clauses to generate the allowed clauses map. + string clauseEnumSetClass = ""; } // Information about a specific clause. @@ -92,6 +95,9 @@ class Directive { // List of clauses that are allowed to appear only once. list allowedOnceClauses = []; + // List of clauses that are allowed but mutually exclusive. + list allowedExclusiveClauses = []; + // List of clauses that are required. list requiredClauses = []; diff --git a/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt b/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt index 69f503675940..3ff89888bfd6 100644 --- a/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt +++ b/llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt @@ -1,3 +1,4 @@ set(LLVM_TARGET_DEFINITIONS OMP.td) tablegen(LLVM OMP.h.inc --gen-directive-decl) +tablegen(LLVM OMP.cpp.inc --gen-directive-gen) add_public_tablegen_target(omp_gen) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td index bd81eeb01127..a565bdf90b3f 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMP.td +++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td @@ -24,6 +24,7 @@ def OpenMP : DirectiveLanguage { let makeEnumAvailableInNamespace = 1; let enableBitmaskEnumInNamespace = 1; let includeHeader = "llvm/Frontend/OpenMP/OMP.h.inc"; + let clauseEnumSetClass = "OmpClauseSet"; } //===----------------------------------------------------------------------===// @@ -201,10 +202,7 @@ def OMPC_Notinbranch : Clause<"notinbranch"> {} def OMP_ThreadPrivate : Directive<"threadprivate"> {} def OMP_Parallel : Directive<"parallel"> { let allowedClauses = [ - VersionedClause, - VersionedClause, 
VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, @@ -212,11 +210,14 @@ def OMP_Parallel : Directive<"parallel"> { VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + ]; } def OMP_Task : Directive<"task"> { let allowedClauses = [ - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, @@ -224,12 +225,16 @@ def OMP_Task : Directive<"task"> { VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Simd : Directive<"simd"> { let allowedClauses = [ @@ -237,15 +242,17 @@ def OMP_Simd : Directive<"simd"> { VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + ]; } def OMP_For : Directive<"for"> { let allowedClauses = [ @@ -273,7 +280,8 @@ def OMP_Do : Directive<"do"> { let allowedOnceClauses = [ VersionedClause, VersionedClause, - VersionedClause + VersionedClause, + VersionedClause ]; } def OMP_Sections : Directive<"sections"> { @@ -345,30 +353,34 @@ def OMP_Atomic : Directive<"atomic"> { def OMP_Target : Directive<"target"> { let allowedClauses = [ VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Teams : Directive<"teams"> { let allowedClauses = [ - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - 
VersionedClause, - VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_Cancel : Directive<"cancel"> { let allowedClauses = [ @@ -386,50 +398,64 @@ def OMP_Requires : Directive<"requires"> { } def OMP_TargetData : Directive<"target data"> { let allowedClauses = [ - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause + ]; + let requiredClauses = [ + VersionedClause + ]; } def OMP_TargetEnterData : Directive<"target enter data"> { let allowedClauses = [ + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause + VersionedClause ]; } def OMP_TargetExitData : Directive<"target exit data"> { let allowedClauses = [ - VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause + VersionedClause, + VersionedClause + ]; + let requiredClauses = [ + VersionedClause ]; } def OMP_TargetParallel : Directive<"target parallel"> { let allowedClauses = [ - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TargetParallelFor : Directive<"target parallel for"> { let allowedClauses = [ @@ -459,27 +485,31 @@ def OMP_TargetParallelFor : Directive<"target parallel for"> { } def OMP_TargetParallelDo : Directive<"target parallel do"> { let allowedClauses = [ - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, 
VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause ]; } def OMP_TargetUpdate : Directive<"target update"> { @@ -558,27 +588,29 @@ def OMP_ParallelForSimd : Directive<"parallel for simd"> { } def OMP_ParallelDoSimd : Directive<"parallel do simd"> { let allowedClauses = [ - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_ParallelMaster : Directive<"parallel master"> { let allowedClauses = [ @@ -597,7 +629,6 @@ def OMP_ParallelMaster : Directive<"parallel master"> { def OMP_ParallelSections : Directive<"parallel sections"> { let allowedClauses = [ VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, @@ -608,6 +639,9 @@ def OMP_ParallelSections : Directive<"parallel sections"> { VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause + ]; } def OMP_ForSimd : Directive<"for simd"> { let allowedClauses = [ @@ -643,7 +677,8 @@ def OMP_DoSimd : Directive<"do simd"> { VersionedClause, VersionedClause, VersionedClause, - 
VersionedClause + VersionedClause, + VersionedClause ]; } def OMP_CancellationPoint : Directive<"cancellation point"> {} @@ -653,53 +688,74 @@ def OMP_DeclareMapper : Directive<"declare mapper"> { VersionedClause ]; } -def OMP_DeclareSimd : Directive<"declare simd"> {} +def OMP_DeclareSimd : Directive<"declare simd"> { + let allowedClauses = [ + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause + ]; + let allowedExclusiveClauses = [ + VersionedClause, + VersionedClause + ]; +} def OMP_TaskLoop : Directive<"taskloop"> { let allowedClauses = [ - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + ]; + let allowedExclusiveClauses = [ + VersionedClause, + VersionedClause + ]; } def OMP_TaskLoopSimd : Directive<"taskloop simd"> { let allowedClauses = [ - VersionedClause, - VersionedClause, - VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, + VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, + VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedExclusiveClauses = [ + VersionedClause, + VersionedClause ]; } def 
OMP_Distribute : Directive<"distribute"> { @@ -707,10 +763,12 @@ def OMP_Distribute : Directive<"distribute"> { VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause + ]; } def OMP_DeclareTarget : Directive<"declare target"> {} def OMP_EndDeclareTarget : Directive<"end declare target"> {} @@ -735,21 +793,25 @@ def OMP_DistributeParallelFor : Directive<"distribute parallel for"> { } def OMP_DistributeParallelDo : Directive<"distribute parallel do"> { let allowedClauses = [ + VersionedClause, VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause + VersionedClause ]; } def OMP_DistributeParallelForSimd : Directive<"distribute parallel for simd"> { @@ -802,22 +864,31 @@ def OMP_DistributeParallelDoSimd : Directive<"distribute parallel do simd"> { } def OMP_DistributeSimd : Directive<"distribute simd"> { let allowedClauses = [ - VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause ]; } + def OMP_TargetParallelForSimd : Directive<"target parallel for simd"> { let 
allowedClauses = [ VersionedClause, @@ -880,27 +951,33 @@ def OMP_TargetParallelDoSimd : Directive<"target parallel do simd"> { } def OMP_TargetSimd : Directive<"target simd"> { let allowedClauses = [ - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, + VersionedClause, VersionedClause, + VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; } def OMP_TeamsDistribute : Directive<"teams distribute"> { let allowedClauses = [ @@ -919,26 +996,29 @@ def OMP_TeamsDistribute : Directive<"teams distribute"> { } def OMP_TeamsDistributeSimd : Directive<"teams distribute simd"> { let allowedClauses = [ - VersionedClause, - VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ VersionedClause, + VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause + VersionedClause ]; } + def OMP_TeamsDistributeParallelForSimd : Directive<"teams distribute parallel for simd"> { let allowedClauses = [ @@ -968,27 +1048,29 @@ def OMP_TeamsDistributeParallelForSimd : def OMP_TeamsDistributeParallelDoSimd : Directive<"teams 
distribute parallel do simd"> { let allowedClauses = [ + VersionedClause, VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause + VersionedClause, ]; } def OMP_TeamsDistributeParallelFor : @@ -1016,68 +1098,78 @@ def OMP_TeamsDistributeParallelFor : def OMP_TeamsDistributeParallelDo : Directive<"teams distribute parallel do"> { let allowedClauses = [ + VersionedClause, VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; +let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause + VersionedClause ]; } def OMP_TargetTeams : Directive<"target teams"> { let allowedClauses = [ VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause + VersionedClause, + VersionedClause + ]; + + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause ]; } def OMP_TargetTeamsDistribute : Directive<"target teams distribute"> { let allowedClauses = [ VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause + VersionedClause ]; } + def OMP_TargetTeamsDistributeParallelFor : Directive<"target teams distribute parallel for"> { let allowedClauses = [ @@ -1110,28 +1202,33 @@ def OMP_TargetTeamsDistributeParallelDo : Directive<"target teams distribute parallel do"> { let allowedClauses = [ VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause ]; } def OMP_TargetTeamsDistributeParallelForSimd : @@ -1170,63 +1267,69 @@ def OMP_TargetTeamsDistributeParallelForSimd : def OMP_TargetTeamsDistributeParallelDoSimd : Directive<"target teams distribute parallel do simd"> { let allowedClauses = [ - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, 
VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, - VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause + VersionedClause ]; } def OMP_TargetTeamsDistributeSimd : Directive<"target teams distribute simd"> { let allowedClauses = [ - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, - VersionedClause, VersionedClause, - VersionedClause, + VersionedClause, VersionedClause, - VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause, VersionedClause, VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, VersionedClause, VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause, - VersionedClause ]; } def OMP_Allocate : Directive<"allocate"> { @@ -1359,7 +1462,22 @@ def OMP_Scan : Directive<"scan"> { } def OMP_BeginDeclareVariant : Directive<"begin declare variant"> {} def OMP_EndDeclareVariant : Directive<"end declare variant"> {} -def OMP_ParallelWorkshare : Directive<"parallel workshare"> {} +def OMP_ParallelWorkshare : Directive<"parallel workshare"> { + let allowedClauses = [ + VersionedClause, + VersionedClause, + 
VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause, + VersionedClause + ]; + let allowedOnceClauses = [ + VersionedClause, + VersionedClause, + VersionedClause + ]; +} def OMP_Workshare : Directive<"workshare"> {} def OMP_EndDo : Directive<"end do"> {} def OMP_EndDoSimd : Directive<"end do simd"> {} diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index 8b3cc8702bd4..b293196d4d55 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -1,5 +1,6 @@ // RUN: llvm-tblgen -gen-directive-decl -I %p/../../include %s | FileCheck -match-full-lines %s // RUN: llvm-tblgen -gen-directive-impl -I %p/../../include %s | FileCheck -match-full-lines %s -check-prefix=IMPL +// RUN: llvm-tblgen -gen-directive-gen -I %p/../../include %s | FileCheck -match-full-lines %s -check-prefix=GEN include "llvm/Frontend/Directive/DirectiveBase.td" @@ -126,3 +127,57 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: } // IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } +// IMPL-EMPTY: + + + +// GEN: #ifdef GEN_FLANG_DIRECTIVE_CLAUSE_SETS +// GEN-NEXT: #undef GEN_FLANG_DIRECTIVE_CLAUSE_SETS +// GEN-EMPTY: +// GEN-NEXT: namespace llvm { +// GEN-NEXT: namespace tdl { +// GEN-EMPTY: +// GEN-NEXT: // Sets for dira +// GEN-EMPTY: +// GEN-NEXT: static allowedClauses_TDLD_dira { +// GEN-NEXT: llvm::tdl::Clause::TDLC_clausea, +// GEN-NEXT: llvm::tdl::Clause::TDLC_clauseb, +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: static allowedOnceClauses_TDLD_dira { +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: static allowedExclusiveClauses_TDLD_dira { +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: static requiredClauses_TDLD_dira { +// GEN-NEXT: }; +// GEN-NEXT: } // namespace tdl +// GEN-NEXT: } // namespace llvm +// GEN-EMPTY: +// GEN-NEXT: #endif // GEN_FLANG_DIRECTIVE_CLAUSE_SETS +// GEN-EMPTY: +// GEN-NEXT: #ifdef GEN_FLANG_DIRECTIVE_CLAUSE_MAP +// GEN-NEXT: #undef GEN_FLANG_DIRECTIVE_CLAUSE_MAP +// 
GEN-EMPTY: +// GEN-NEXT: struct TdlDirectiveClauses { +// GEN-NEXT: const allowed; +// GEN-NEXT: const allowedOnce; +// GEN-NEXT: const allowedExclusive; +// GEN-NEXT: const requiredOneOf; +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: std::unordered_map +// GEN-NEXT: directiveClausesTable = { +// GEN-NEXT: {llvm::tdl::Directive::TDLD_dira, +// GEN-NEXT: { +// GEN-NEXT: llvm::tdl::allowedClauses_TDLD_dira, +// GEN-NEXT: llvm::tdl::allowedOnceClauses_TDLD_dira, +// GEN-NEXT: llvm::tdl::allowedExclusiveClauses_TDLD_dira, +// GEN-NEXT: llvm::tdl::requiredClauses_TDLD_dira, +// GEN-NEXT: } +// GEN-NEXT: }, +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: #endif // GEN_FLANG_DIRECTIVE_CLAUSE_MAP + diff --git a/llvm/test/TableGen/directive2.td b/llvm/test/TableGen/directive2.td index 06c7aabcf3ad..517c79d45798 100644 --- a/llvm/test/TableGen/directive2.td +++ b/llvm/test/TableGen/directive2.td @@ -1,5 +1,6 @@ // RUN: llvm-tblgen -gen-directive-decl -I %p/../../include %s | FileCheck -match-full-lines %s // RUN: llvm-tblgen -gen-directive-impl -I %p/../../include %s | FileCheck -match-full-lines %s -check-prefix=IMPL +// RUN: llvm-tblgen -gen-directive-gen -I %p/../../include %s | FileCheck -match-full-lines %s -check-prefix=GEN include "llvm/Frontend/Directive/DirectiveBase.td" @@ -71,7 +72,7 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: using namespace llvm; // IMPL-NEXT: using namespace tdl; // IMPL-EMPTY: -// IMPL: Directive llvm::tdl::getTdlDirectiveKind(llvm::StringRef Str) { +// IMPL-NEXT: Directive llvm::tdl::getTdlDirectiveKind(llvm::StringRef Str) { // IMPL-NEXT: return llvm::StringSwitch(Str) // IMPL-NEXT: .Case("dira",TDLD_dira) // IMPL-NEXT: .Default(TDLD_dira); @@ -119,3 +120,54 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: } // IMPL-NEXT: llvm_unreachable("Invalid Tdl Directive kind"); // IMPL-NEXT: } + + +// GEN: #ifdef GEN_FLANG_DIRECTIVE_CLAUSE_SETS +// GEN-NEXT: #undef GEN_FLANG_DIRECTIVE_CLAUSE_SETS +// GEN-EMPTY: +// GEN-NEXT: namespace llvm { 
+// GEN-NEXT: namespace tdl { +// GEN-EMPTY: +// GEN-NEXT: // Sets for dira +// GEN-EMPTY: +// GEN-NEXT: static allowedClauses_TDLD_dira { +// GEN-NEXT: llvm::tdl::Clause::TDLC_clausea, +// GEN-NEXT: llvm::tdl::Clause::TDLC_clauseb, +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: static allowedOnceClauses_TDLD_dira { +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: static allowedExclusiveClauses_TDLD_dira { +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: static requiredClauses_TDLD_dira { +// GEN-NEXT: }; +// GEN-NEXT: } // namespace tdl +// GEN-NEXT: } // namespace llvm +// GEN-EMPTY: +// GEN-NEXT: #endif // GEN_FLANG_DIRECTIVE_CLAUSE_SETS +// GEN-EMPTY: +// GEN-NEXT: #ifdef GEN_FLANG_DIRECTIVE_CLAUSE_MAP +// GEN-NEXT: #undef GEN_FLANG_DIRECTIVE_CLAUSE_MAP +// GEN-EMPTY: +// GEN-NEXT: struct TdlDirectiveClauses { +// GEN-NEXT: const allowed; +// GEN-NEXT: const allowedOnce; +// GEN-NEXT: const allowedExclusive; +// GEN-NEXT: const requiredOneOf; +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: std::unordered_map +// GEN-NEXT: directiveClausesTable = { +// GEN-NEXT: {llvm::tdl::Directive::TDLD_dira, +// GEN-NEXT: { +// GEN-NEXT: llvm::tdl::allowedClauses_TDLD_dira, +// GEN-NEXT: llvm::tdl::allowedOnceClauses_TDLD_dira, +// GEN-NEXT: llvm::tdl::allowedExclusiveClauses_TDLD_dira, +// GEN-NEXT: llvm::tdl::requiredClauses_TDLD_dira, +// GEN-NEXT: } +// GEN-NEXT: }, +// GEN-NEXT: }; +// GEN-EMPTY: +// GEN-NEXT: #endif // GEN_FLANG_DIRECTIVE_CLAUSE_MAP diff --git a/llvm/utils/TableGen/DirectiveEmitter.cpp b/llvm/utils/TableGen/DirectiveEmitter.cpp index f51f98872bb5..fc4a6757f808 100644 --- a/llvm/utils/TableGen/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/DirectiveEmitter.cpp @@ -14,12 +14,30 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" +#include "llvm/ADT/StringSet.h" #include "llvm/TableGen/Error.h" #include "llvm/TableGen/Record.h" #include "llvm/TableGen/TableGenBackend.h" using namespace llvm; +namespace { +// 
Simple RAII helper for defining ifdef-undef-endif scopes. +class IfDefScope { +public: + IfDefScope(StringRef Name, raw_ostream &OS) : Name(Name), OS(OS) { + OS << "#ifdef " << Name << "\n" + << "#undef " << Name << "\n"; + } + + ~IfDefScope() { OS << "\n#endif // " << Name << "\n\n"; } + +private: + StringRef Name; + raw_ostream &OS; +}; +} // end anonymous namespace + namespace llvm { // Get Directive or Clause name formatted by replacing whitespaces with @@ -205,16 +223,21 @@ void GenerateGetKind(const std::vector &Records, raw_ostream &OS, void GenerateCaseForVersionedClauses(const std::vector &Clauses, raw_ostream &OS, StringRef DirectiveName, StringRef DirectivePrefix, - StringRef ClausePrefix) { + StringRef ClausePrefix, + llvm::StringSet<> &Cases) { for (const auto &C : Clauses) { const auto MinVersion = C->getValueAsInt("minVersion"); const auto MaxVersion = C->getValueAsInt("maxVersion"); const auto SpecificClause = C->getValueAsDef("clause"); - const auto ClauseName = SpecificClause->getValueAsString("name"); - OS << " case " << ClausePrefix << getFormattedName(ClauseName) - << ":\n"; - OS << " return " << MinVersion << " <= Version && " << MaxVersion - << " >= Version;\n"; + const auto ClauseName = + getFormattedName(SpecificClause->getValueAsString("name")); + + if (Cases.find(ClauseName) == Cases.end()) { + Cases.insert(ClauseName); + OS << " case " << ClausePrefix << ClauseName << ":\n"; + OS << " return " << MinVersion << " <= Version && " << MaxVersion + << " >= Version;\n"; + } } } @@ -239,24 +262,32 @@ void GenerateIsAllowedClause(const std::vector &Directives, const auto &AllowedClauses = D->getValueAsListOfDefs("allowedClauses"); const auto &AllowedOnceClauses = D->getValueAsListOfDefs("allowedOnceClauses"); + const auto &AllowedExclusiveClauses = + D->getValueAsListOfDefs("allowedExclusiveClauses"); const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); OS << " case " << DirectivePrefix << getFormattedName(DirectiveName) << 
":\n"; - if (AllowedClauses.size() == 0 && AllowedOnceClauses.size() == 0 && - AllowedOnceClauses.size() == 0) { + if (AllowedClauses.size() == 0 && AllowedOnceClauses.size() == 0 && + AllowedExclusiveClauses.size() == 0 && RequiredClauses.size() == 0) { OS << " return false;\n"; } else { OS << " switch (C) {\n"; + llvm::StringSet<> Cases; + GenerateCaseForVersionedClauses(AllowedClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + DirectivePrefix, ClausePrefix, Cases); GenerateCaseForVersionedClauses(AllowedOnceClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + DirectivePrefix, ClausePrefix, Cases); + + GenerateCaseForVersionedClauses(AllowedExclusiveClauses, OS, + DirectiveName, DirectivePrefix, + ClausePrefix, Cases); GenerateCaseForVersionedClauses(RequiredClauses, OS, DirectiveName, - DirectivePrefix, ClausePrefix); + DirectivePrefix, ClausePrefix, Cases); OS << " default:\n"; OS << " return false;\n"; @@ -271,9 +302,143 @@ void GenerateIsAllowedClause(const std::vector &Directives, OS << "}\n"; // End of function isAllowedClauseForDirective } +// Generate a simple enum set with the give clauses. +void GenerateClauseSet(const std::vector &Clauses, raw_ostream &OS, + StringRef ClauseEnumSetClass, StringRef ClauseSetPrefix, + StringRef DirectiveName, StringRef DirectivePrefix, + StringRef ClausePrefix, StringRef CppNamespace) { + + OS << "\n"; + OS << " static " << ClauseEnumSetClass << " " << ClauseSetPrefix + << DirectivePrefix << getFormattedName(DirectiveName) << " {\n"; + + for (const auto &C : Clauses) { + const auto SpecificClause = C->getValueAsDef("clause"); + const auto ClauseName = SpecificClause->getValueAsString("name"); + OS << " llvm::" << CppNamespace << "::Clause::" << ClausePrefix + << getFormattedName(ClauseName) << ",\n"; + } + OS << " };\n"; +} + +// Generate an enum set for the 4 kinds of clauses linked to a directive. 
+void GenerateDirectiveClauseSets(const std::vector &Directives, + raw_ostream &OS, StringRef LanguageName, + StringRef ClauseEnumSetClass, + StringRef DirectivePrefix, + StringRef ClausePrefix, + StringRef CppNamespace) { + + IfDefScope Scope("GEN_FLANG_DIRECTIVE_CLAUSE_SETS", OS); + + OS << "\n"; + OS << "namespace llvm {\n"; + + // Open namespaces defined in the directive language. + llvm::SmallVector Namespaces; + llvm::SplitString(CppNamespace, Namespaces, "::"); + for (auto Ns : Namespaces) + OS << "namespace " << Ns << " {\n"; + + for (const auto &D : Directives) { + const auto DirectiveName = D->getValueAsString("name"); + + const auto &AllowedClauses = D->getValueAsListOfDefs("allowedClauses"); + const auto &AllowedOnceClauses = + D->getValueAsListOfDefs("allowedOnceClauses"); + const auto &AllowedExclusiveClauses = + D->getValueAsListOfDefs("allowedExclusiveClauses"); + const auto &RequiredClauses = D->getValueAsListOfDefs("requiredClauses"); + + OS << "\n"; + OS << " // Sets for " << DirectiveName << "\n"; + + GenerateClauseSet(AllowedClauses, OS, ClauseEnumSetClass, "allowedClauses_", + DirectiveName, DirectivePrefix, ClausePrefix, + CppNamespace); + GenerateClauseSet(AllowedOnceClauses, OS, ClauseEnumSetClass, + "allowedOnceClauses_", DirectiveName, DirectivePrefix, + ClausePrefix, CppNamespace); + GenerateClauseSet(AllowedExclusiveClauses, OS, ClauseEnumSetClass, + "allowedExclusiveClauses_", DirectiveName, + DirectivePrefix, ClausePrefix, CppNamespace); + GenerateClauseSet(RequiredClauses, OS, ClauseEnumSetClass, + "requiredClauses_", DirectiveName, DirectivePrefix, + ClausePrefix, CppNamespace); + } + + // Closing namespaces + for (auto Ns : llvm::reverse(Namespaces)) + OS << "} // namespace " << Ns << "\n"; + + OS << "} // namespace llvm\n"; +} + +// Generate a map of directive (key) with DirectiveClauses struct as values. 
+// The struct holds the 4 sets of enumeration for the 4 kinds of clauses +// allowances (allowed, allowed once, allowed exclusive and required). +void GenerateDirectiveClauseMap(const std::vector &Directives, + raw_ostream &OS, StringRef LanguageName, + StringRef ClauseEnumSetClass, + StringRef DirectivePrefix, + StringRef ClausePrefix, + StringRef CppNamespace) { + + IfDefScope Scope("GEN_FLANG_DIRECTIVE_CLAUSE_MAP", OS); + + OS << "\n"; + OS << "struct " << LanguageName << "DirectiveClauses {\n"; + OS << " const " << ClauseEnumSetClass << " allowed;\n"; + OS << " const " << ClauseEnumSetClass << " allowedOnce;\n"; + OS << " const " << ClauseEnumSetClass << " allowedExclusive;\n"; + OS << " const " << ClauseEnumSetClass << " requiredOneOf;\n"; + OS << "};\n"; + + OS << "\n"; + + OS << "std::unordered_map\n"; + OS << " directiveClausesTable = {\n"; + + for (const auto &D : Directives) { + const auto FormattedDirectiveName = + getFormattedName(D->getValueAsString("name")); + OS << " {llvm::" << CppNamespace << "::Directive::" << DirectivePrefix + << FormattedDirectiveName << ",\n"; + OS << " {\n"; + OS << " llvm::" << CppNamespace << "::allowedClauses_" + << DirectivePrefix << FormattedDirectiveName << ",\n"; + OS << " llvm::" << CppNamespace << "::allowedOnceClauses_" + << DirectivePrefix << FormattedDirectiveName << ",\n"; + OS << " llvm::" << CppNamespace << "::allowedExclusiveClauses_" + << DirectivePrefix << FormattedDirectiveName << ",\n"; + OS << " llvm::" << CppNamespace << "::requiredClauses_" + << DirectivePrefix << FormattedDirectiveName << ",\n"; + OS << " }\n"; + OS << " },\n"; + } + + OS << "};\n"; +} + // Generate the implemenation section for the enumeration in the directive // language -void EmitDirectivesImpl(RecordKeeper &Records, raw_ostream &OS) { +void EmitDirectivesFlangImpl(const std::vector &Directives, + raw_ostream &OS, StringRef LanguageName, + StringRef ClauseEnumSetClass, + StringRef DirectivePrefix, StringRef ClausePrefix, + StringRef 
CppNamespace) { + + GenerateDirectiveClauseSets(Directives, OS, LanguageName, ClauseEnumSetClass, + DirectivePrefix, ClausePrefix, CppNamespace); + + GenerateDirectiveClauseMap(Directives, OS, LanguageName, ClauseEnumSetClass, + DirectivePrefix, ClausePrefix, CppNamespace); +} + +// Generate the implemenation section for the enumeration in the directive +// language. +void EmitDirectivesGen(RecordKeeper &Records, raw_ostream &OS) { const auto &DirectiveLanguages = Records.getAllDerivedDefinitions("DirectiveLanguage"); @@ -289,12 +454,40 @@ void EmitDirectivesImpl(RecordKeeper &Records, raw_ostream &OS) { StringRef LanguageName = DirectiveLanguage->getValueAsString("name"); StringRef ClausePrefix = DirectiveLanguage->getValueAsString("clausePrefix"); StringRef CppNamespace = DirectiveLanguage->getValueAsString("cppNamespace"); - StringRef IncludeHeader = - DirectiveLanguage->getValueAsString("includeHeader"); + StringRef ClauseEnumSetClass = + DirectiveLanguage->getValueAsString("clauseEnumSetClass"); const auto &Directives = Records.getAllDerivedDefinitions("Directive"); const auto &Clauses = Records.getAllDerivedDefinitions("Clause"); + EmitDirectivesFlangImpl(Directives, OS, LanguageName, ClauseEnumSetClass, + DirectivePrefix, ClausePrefix, CppNamespace); +} + +// Generate the implemenation for the enumeration in the directive +// language. This code can be included in library. 
+void EmitDirectivesImpl(RecordKeeper &Records, raw_ostream &OS) { + + const auto &DirectiveLanguages = + Records.getAllDerivedDefinitions("DirectiveLanguage"); + + if (DirectiveLanguages.size() != 1) { + PrintError("A single definition of DirectiveLanguage is needed."); + return; + } + + const auto &DirectiveLanguage = DirectiveLanguages[0]; + StringRef DirectivePrefix = + DirectiveLanguage->getValueAsString("directivePrefix"); + StringRef LanguageName = DirectiveLanguage->getValueAsString("name"); + StringRef ClausePrefix = DirectiveLanguage->getValueAsString("clausePrefix"); + StringRef CppNamespace = DirectiveLanguage->getValueAsString("cppNamespace"); + const auto &Directives = Records.getAllDerivedDefinitions("Directive"); + const auto &Clauses = Records.getAllDerivedDefinitions("Clause"); + + StringRef IncludeHeader = + DirectiveLanguage->getValueAsString("includeHeader"); + if (!IncludeHeader.empty()) OS << "#include \"" << IncludeHeader << "\"\n\n"; @@ -323,6 +516,7 @@ void EmitDirectivesImpl(RecordKeeper &Records, raw_ostream &OS) { GenerateGetName(Clauses, OS, "Clause", ClausePrefix, LanguageName, CppNamespace); + // isAllowedClauseForDirective(Directive D, Clause C, unsigned Version) GenerateIsAllowedClause(Directives, OS, LanguageName, DirectivePrefix, ClausePrefix, CppNamespace); } diff --git a/llvm/utils/TableGen/TableGen.cpp b/llvm/utils/TableGen/TableGen.cpp index 7438749a1243..8015a58471ca 100644 --- a/llvm/utils/TableGen/TableGen.cpp +++ b/llvm/utils/TableGen/TableGen.cpp @@ -56,6 +56,7 @@ enum ActionType { GenAutomata, GenDirectivesEnumDecl, GenDirectivesEnumImpl, + GenDirectivesEnumGen, }; namespace llvm { @@ -132,9 +133,11 @@ cl::opt Action( "Generate llvm-exegesis tables"), clEnumValN(GenAutomata, "gen-automata", "Generate generic automata"), clEnumValN(GenDirectivesEnumDecl, "gen-directive-decl", - "Generate directive related declaration code"), + "Generate directive related declaration code (header file)"), clEnumValN(GenDirectivesEnumImpl, 
"gen-directive-impl", - "Generate directive related implementation code"))); + "Generate directive related implementation code"), + clEnumValN(GenDirectivesEnumGen, "gen-directive-gen", + "Generate directive related implementation code part"))); cl::OptionCategory PrintEnumsCat("Options for -print-enums"); cl::opt Class("class", cl::desc("Print Enum list for this class"), @@ -265,6 +268,9 @@ bool LLVMTableGenMain(raw_ostream &OS, RecordKeeper &Records) { case GenDirectivesEnumImpl: EmitDirectivesImpl(Records, OS); break; + case GenDirectivesEnumGen: + EmitDirectivesGen(Records, OS); + break; } return false; diff --git a/llvm/utils/TableGen/TableGenBackends.h b/llvm/utils/TableGen/TableGenBackends.h index 9e6171abcabf..92204f39f8fa 100644 --- a/llvm/utils/TableGen/TableGenBackends.h +++ b/llvm/utils/TableGen/TableGenBackends.h @@ -92,6 +92,7 @@ void EmitExegesis(RecordKeeper &RK, raw_ostream &OS); void EmitAutomata(RecordKeeper &RK, raw_ostream &OS); void EmitDirectivesDecl(RecordKeeper &RK, raw_ostream &OS); void EmitDirectivesImpl(RecordKeeper &RK, raw_ostream &OS); +void EmitDirectivesGen(RecordKeeper &RK, raw_ostream &OS); } // End llvm namespace From llvm-commits at lists.llvm.org Sat Jul 11 09:45:26 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 16:45:26 +0000 (UTC) Subject: [PATCH] D83326: [flang][openmp] Check clauses allowed semantic with tablegen generated map In-Reply-To: References: Message-ID: <1a6b759aae3ff4576fd46f4bb7a35294@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG6e42a417bacb: [flang][openmp] Check clauses allowed semantic with tablegen generated map (authored by clementval). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83326/new/ https://reviews.llvm.org/D83326 Files: flang/lib/Semantics/check-omp-structure.cpp flang/lib/Semantics/check-omp-structure.h flang/test/Semantics/omp-clause-validity01.f90 llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp llvm/utils/TableGen/TableGen.cpp llvm/utils/TableGen/TableGenBackends.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83326.277248.patch Type: text/x-patch Size: 79481 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 09:55:00 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 16:55:00 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: <1faee537d27b459ccc8e679d1b6ba3ad@localhost.localdomain> kuter updated this revision to Diff 277247. kuter added a comment. 1. Better comments. 2. Remove unneeded lines from the test. I investigated how the command-line option introduced by this patch behaves. The LLVM CommandLine library already rejects an empty list: when you pass the option without a value, as in `opt --attribute-seed-allow-list`, CommandLine automatically displays an error and terminates execution. But when other arguments follow the option, the next argument becomes a member of the list. It would have been great if we had a way of telling whether a name is valid or not. Perhaps we could have a list of valid attribute names. 
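As a minimal sketch of the "list of valid attribute names" idea floated above: the helper below is hypothetical and not part of the patch. It assumes the known names are available as plain strings and simply reports any requested seed name that is not among them, which would also catch a stray argument swallowed by the list option.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of D83185: returns every requested seed
// name that does not appear in the list of known attribute names.
std::vector<std::string>
findUnknownSeedNames(const std::vector<std::string> &KnownNames,
                     const std::vector<std::string> &Requested) {
  std::vector<std::string> Unknown;
  for (const std::string &Name : Requested) {
    // Linear search is fine for a short, fixed list of names.
    if (std::find(KnownNames.begin(), KnownNames.end(), Name) ==
        KnownNames.end())
      Unknown.push_back(Name);
  }
  return Unknown;
}
```

A caller could print the returned names as a warning and continue, or treat a non-empty result as an error; which of the two is preferable is left open here.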
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/Attributor.cpp llvm/test/Transforms/Attributor/allow_list.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83185.277247.patch Type: text/x-patch Size: 4691 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 09:56:05 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 16:56:05 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <60400c4f5d0e56606dbe7455c96d8537@localhost.localdomain> aykevl marked 3 inline comments as done. aykevl added inline comments. ================ Comment at: lld/test/ELF/avr-reloc.s:4 +# RUN: ld.lld %t.o --defsym=a=0x12345678 --defsym=b=30 -o %t +# RUN: llvm-objdump -d %t | FileCheck %s +# RUN: llvm-objdump -s %t | FileCheck --check-prefix=HEX %s ---------------- MaskRay wrote: > You may want `--print-imm-hex` to print immediates in hexadecimal. I tried, but it doesn't seem to make a difference in the output. However I agree the assembly would be easier to read if it were hexadecimal (because of the hexadecimal input). ================ Comment at: lld/test/ELF/avr-reloc.s:17 +# CHECK-NEXT: ldi r20, 255 +ldi r20, lo8(a) +ldi r20, hi8(a) ---------------- MaskRay wrote: > Might be worth adding comments about the exact relocation type used, e.g. > > `ldi r20, lo8(a) # R_AVR_LO8_LDI...` Good idea, I'll change this. ================ Comment at: lld/test/ELF/avr-reloc.s:80 +# HEX-LABEL: section .DATA: +# HEX-NEXT: 1e1e000f 00785634 12 +.byte b ---------------- MaskRay wrote: > `{{.*}} 1e1e000f 00785634 12` > > The address of .DATA is insignificant and should be omitted. What do you mean? The address is not included in the test. 
This is the full output: ``` Contents of section .DATA: 110e4 1e1e000f 00785634 12 .....xV4. ``` I believe `110e4` is the address (not included in the test) while `1e1e000f 00785634 12` is the contents of the section (in hex). Or do you mean I should check for the presence of an address using `{{.*}}`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Sat Jul 11 10:00:13 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Sat, 11 Jul 2020 10:00:13 -0700 (PDT) Subject: [llvm] 8f183d9 - [openmp] Remove unused variable in DirectiveEmitter Message-ID: <5f09f01d.1c69fb81.979dc.794b@mx.google.com> Author: clementval Date: 2020-07-11T12:59:52-04:00 New Revision: 8f183d9f3d13d66a679bd449b1f5d34942560028 URL: https://github.com/llvm/llvm-project/commit/8f183d9f3d13d66a679bd449b1f5d34942560028 DIFF: https://github.com/llvm/llvm-project/commit/8f183d9f3d13d66a679bd449b1f5d34942560028.diff LOG: [openmp] Remove unused variable in DirectiveEmitter Added: Modified: llvm/utils/TableGen/DirectiveEmitter.cpp Removed: ################################################################################ diff --git a/llvm/utils/TableGen/DirectiveEmitter.cpp b/llvm/utils/TableGen/DirectiveEmitter.cpp index fc4a6757f808..ebcd6873205e 100644 --- a/llvm/utils/TableGen/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/DirectiveEmitter.cpp @@ -458,7 +458,6 @@ void EmitDirectivesGen(RecordKeeper &Records, raw_ostream &OS) { DirectiveLanguage->getValueAsString("clauseEnumSetClass"); const auto &Directives = Records.getAllDerivedDefinitions("Directive"); - const auto &Clauses = Records.getAllDerivedDefinitions("Clause"); EmitDirectivesFlangImpl(Directives, OS, LanguageName, ClauseEnumSetClass, DirectivePrefix, ClausePrefix, CppNamespace); From llvm-commits at lists.llvm.org Sat Jul 11 10:04:53 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via 
llvm-commits) Date: Sat, 11 Jul 2020 17:04:53 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: aykevl updated this revision to Diff 277249. aykevl added a comment. - add relocation types in comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 Files: lld/ELF/Arch/AVR.cpp lld/test/ELF/avr-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D78741.277249.patch Type: text/x-patch Size: 6572 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 10:33:05 2020 From: llvm-commits at lists.llvm.org (Sourabh Singh Tomar via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 17:33:05 +0000 (UTC) Subject: [PATCH] D83560: [DebugInfo] Added support for DW_OP_implicit_value in llvm In-Reply-To: References: Message-ID: <61265bdd207508885dec7e63029d7633@localhost.localdomain> SouraVX marked an inline comment as done. SouraVX added a comment. In D83560#2145768 , @dblaikie wrote: > Thanks @dblaikie for your feedback! > Not sure it's worth committing such a narrow implementation - might be worth a bit of generalization even in this first patch? This operation (as you might have noticed) has limited usage, since `DW_OP_stack_value` and friends fit other cases very well. It is specifically meant to cater to cases (such as this one) where the variable is larger than the size of an address. This will cater to the needs of math (HPC) applications, where `long double` is ubiquitous. 
> looks to be very specific to long double right now (but without any assertions, API features, or comments to enforce that restriction) - and in the DwarfDebug.cpp caller, which could presumably be used for all constant values, maybe (not sure if that would be a good thing or not - haven't looked at the alternatives, etc)) Yes, it is very specific, and you'll notice that it is guarded by strict checks: `isConstrantFP` followed by `if (AP.getDwarfVersion() >= 4 && RawBytes.getBitWidth() == 80)` (making sure we don't mess up existing infrastructure). As for the documentation/comments, admittedly I didn't put any behavior/requirement specifics in the declaration; however, the definition is fully documented and self-explanatory (IMO). Should we put some brief comments on the declaration too? > @aprantl @probinson will probably want to weigh in on getting support from their debuggers before this is committed, or having this under a flag, etc. As of now `GDB` has fairly good support for this. `LLDB` doesn't have it; I'm not sure how much effort is needed to add it. As a side note on opportunistic use of `DW_OP_implicit_value`, I noted the following about `GCC`'s usage: `GCC` uses `DW_OP_implicit_value` for `long double` as well as `float` and `double` (maybe more; these are the ones I've verified). `clang` uses `DW_OP_constu` followed by `DW_OP_stack_value` for `float` and `double` (correctly) and for `long double` (incorrectly, as depicted). I assume that since both `float` and `double` fit within 64 bits it's okay to represent them using `DW_OP_constu` and `DW_OP_stack_value`, but (as you may notice) the spec doesn't say whether this should also be used for floating-point types. Spec: Pg. 27: DW_OP_constu The single operand of the DW_OP_constu operation provides an unsigned LEB128 integer constant. 
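The byte counts quoted in this comment for `float` and `double` can be reproduced with a small sketch. It assumes IEEE-754 `float`/`double`, that the `DW_OP_constu` operand is the raw bit pattern of the constant, and that `DW_OP_implicit_value` is laid out as opcode, ULEB128 length, then the raw bytes. The function names are illustrative, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Number of bytes needed to encode Value as unsigned LEB128
// (7 payload bits per byte, at least one byte).
std::size_t uleb128Size(std::uint64_t Value) {
  std::size_t Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

// DW_OP_constu <ULEB128 operand>, DW_OP_stack_value:
// two one-byte opcodes plus the ULEB128-encoded bit pattern.
std::size_t constuStackValueSize(std::uint64_t Bits) {
  return 1 + uleb128Size(Bits) + 1;
}

// DW_OP_implicit_value <ULEB128 length> <length bytes>:
// one opcode, the encoded length, then the raw bytes.
std::size_t implicitValueSize(std::size_t NumBytes) {
  return 1 + uleb128Size(NumBytes) + NumBytes;
}

// Reinterpret a floating-point value as an unsigned integer of equal size.
template <typename UInt, typename Float> std::uint64_t bitsOf(Float F) {
  static_assert(sizeof(UInt) == sizeof(Float), "size mismatch");
  UInt U;
  std::memcpy(&U, &F, sizeof(U));
  return U;
}
```

Under these assumptions, `constuStackValueSize(bitsOf<std::uint32_t>(3.14f))` is 7 against `implicitValueSize(4)` at 6, and the `double` case is 11 against 10, because the ULEB128 encoding spends 5 and 9 bytes on bit patterns whose highest set bit is bit 30 and bit 62 respectively.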
One more advantage (I would say) of using `DW_OP_implicit_value` over a `DW_OP_stack_value`-based expression for `float` and `double` is that it consumes less space (to be exact, 1 byte less for both `float` and `double`). (GCC more or less uses an algorithm to choose between these, based on which representation takes less space.) Consider a simple case: float f=3.14f; CLANG based - DW_OP_constu + DW_OP_stack_value expression = 7 bytes GCC based - DW_OP_implicit_value = 6 bytes double d = 3.14; CLANG based - DW_OP_constu + DW_OP_stack_value expression = 11 bytes GCC based - DW_OP_implicit_value = 10 bytes These findings also strengthen the case for `DW_OP_implicit_value` support on the `LLDB` side; otherwise `LLDB` won't be able to show locations based on these expressions. ================ Comment at: llvm/test/DebugInfo/X86/DW_OP_implicit_value.ll:13-18 +;;int main() { +;; long double ld = 3.14; +;; printf("dummy\n"); +;; ld *= ld; +;; return 0; +;;} ---------------- dblaikie wrote: > I'd probably write this as: > ``` > long double src(); > void f1() { > long double ld = 3.14; > ld = src(); > } > ``` > That should be enough to use the implicit_value to represent the value of 'ld' during the call to src, and to use a location list to do it (since the assignment to 'ld' is shortening the lifetime - using function calls at least I find are clearer opaque clobbers/sinks - rather than the complexity of printf, or wondering whether the use of multiplication is significant, whether this has to be main, has to have an integer return value for some reason, etc.) Typically, the workflow I follow while working on these test cases is not to rely solely on the DWARF but also to have an executable ready to be loaded into a debugger. Otherwise (as you may notice) we could never have discovered this: there was an expression, but only through `GDB` did I come to know that it's incorrect. That's why I wrote the test case as it is. 
If you have concerns about this, I'm okay with changing it as well (though I didn't see a benefit or drawback in either approach) :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83560/new/ https://reviews.llvm.org/D83560 From llvm-commits at lists.llvm.org Sat Jul 11 10:36:21 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 17:36:21 +0000 (UTC) Subject: [PATCH] D83185: [Attributor] Introduce Attribute seed allow list. In-Reply-To: References: Message-ID: <2e2181897116cc7339ed8a95586a48a2@localhost.localdomain> jdoerfert accepted this revision. jdoerfert added a comment. This revision is now accepted and ready to land. > It would have been great if we had a way of telling whether a name is valid or not. Perhaps we could have a list of valid attribute names. Let's not worry about that too much now. It seems an empty list doesn't make sense and when people pass garbage they'll get something. One nit below and one idea but otherwise LGTM. We could set the seeding to false if the list is empty to avoid looking up if it is empty all the time. Feel free to ignore. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:1432 + /// Wheather attributes are being `seeded`, always false after ::run function + /// gets called see getOrCreateAAFor. + bool SeedingPeriod = true; ---------------- Nit: `called. \see ...` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83185/new/ https://reviews.llvm.org/D83185 From llvm-commits at lists.llvm.org Sat Jul 11 11:09:27 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 18:09:27 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <331a10f917174ff0cc2df5b23799b038@localhost.localdomain> MaskRay added inline comments. 
================ Comment at: lld/test/ELF/avr-reloc.s:17 +# CHECK-NEXT: ldi r20, 255 +ldi r20, lo8(a) +ldi r20, hi8(a) ---------------- aykevl wrote: > MaskRay wrote: > > Might be worth adding comments about the exact relocation type used, e.g. > > > > `ldi r20, lo8(a) # R_AVR_LO8_LDI...` > Good idea, I'll change this. Doesn't `# ` work as well? ================ Comment at: lld/test/ELF/avr-reloc.s:80 +# HEX-LABEL: section .DATA: +# HEX-NEXT: 1e1e000f 00785634 12 +.byte b ---------------- aykevl wrote: > MaskRay wrote: > > `{{.*}} 1e1e000f 00785634 12` > > > > The address of .DATA is insignificant and should be omitted. > What do you mean? The address is not included in the test. > > This is the full output: > > ``` > Contents of section .DATA: > 110e4 1e1e000f 00785634 12 .....xV4. > ``` > > I believe `110e4` is the address (not included in the test) while `1e1e000f 00785634 12` is the contents of the section (in hex). > > Or do you mean I should check for the presence of an address using `{{.*}}`? 110e4 as the address is insignificant. If the content is not dependent on the address, omitting 110e4 has the benefit that the test does not need an update if the assigned addresses change. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Sat Jul 11 11:15:29 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 18:15:29 +0000 (UTC) Subject: [PATCH] D83560: [DebugInfo] Added support for DW_OP_implicit_value in llvm In-Reply-To: References: Message-ID: <095a7ca7814b546c7ec02ebe9a2346bf@localhost.localdomain> dblaikie added a comment. In D83560#2145933 , @SouraVX wrote: > In D83560#2145768 , @dblaikie wrote: > > > > > > Thanks @dblaikie for your feedback! > > > Not sure it's worth committing such a narrow implementation - might be worth a bit of generalization even in this first patch? 
> > This operation(as you might have noticed) has limited usage, since `DW_OP_stack_value` and friends fits in other cases very well. This is specifically made to cater needs(such as in this case) where the variable size is bigger than (size of address). This will cater all the needs in math(HPC) based applications where usage of `long double` is ubiquitous. > > > looks to be very specific to long double right now (but without any assertions, API features, or comments to enforce that restriction) - and in the DwarfDebug.cpp caller, which could presumably be used for all constant values, maybe (not sure if that would be a good thing or not - haven't looked at the alternatives, etc)) > > Yes, that's very specific and you'll notice that it has very strict enforcement requirements `isConstrantFP` followed by `if (AP.getDwarfVersion() >= 4 && RawBytes.getBitWidth() == 80)`(making sure we don't mess up existing infra). The call site criticism I had was: "in the DwarfDebug.cpp caller, which could presumably be used for all constant values, maybe (not sure if that would be a good thing or not - haven't looked at the alternatives, etc))" > For the documentation/comments part, admittedly I didn't put any behavior/requirement specifics in the declaration, However the definition part is fully documented and self-explanatory(IMO). Should we put some brief comments there(declaration part) too ? The rest of the criticism " looks to be very specific to long double right now (but without any assertions, API features, or comments to enforce that restriction) " applied to the "addImplicitValue" function: Its name is very general, its signature is very general, but the implementation is very specific about a certain width field, etc. Which doesn't seem ideal/quite right. Comments inside the implementation without assertions - with a very general function name seems like it'd be pretty easy to misuse and get unexpected behavior. 
If the code is going to be so specific then the function name should probably describe that specific behavior and if possible assertions to ensure that's done - perhaps even have it take APFloat. (& I'm not sure whether it's worth choosing between implicit_value and uconst - and just always do implicit_value) >> @aprantl @probinson will probably want to weigh in on getting support from their debuggers before this is committed, or having this under a flag, etc. > > As of now `GDB ` has fairly good support for this. `LLDB` doesn't have it, I'm not sure how much effort is needed to put this in ? Neither am I - and, yeah, if we /just/ did it for 128 bit types, which you've demonstrated are already unusable by lldb - then there would be no harm in making this change sooner. But if we're going to generalize it (& I think the code merits generalization if it weren't for any other constraints) then we'd need to consider how it'd break existing consumers, etc. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83560/new/ https://reviews.llvm.org/D83560 From llvm-commits at lists.llvm.org Sat Jul 11 11:16:17 2020 From: llvm-commits at lists.llvm.org (Mahesha S via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 18:16:17 +0000 (UTC) Subject: [PATCH] D83626: [AMDGPU/MemOpsCluster] Guard new mem ops clustering heuristic logic by a flag Message-ID: hsmhsm created this revision. hsmhsm added reviewers: foad, rampitec, arsenm, cfang. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl. Herald added a project: LLVM. For the mem ops clustering logic, keep both old and new logic, guard new logic by a flag. By default, the flag is disabled and the old logic will be in place. When the flag is enabled, the new logic is triggered. The flag to enable new logic is - `--amdgpu-enable-new-mem-ops-cluster-heuristic=true`. 
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83626 Files: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll llvm/test/CodeGen/AMDGPU/amdhsa-trap-num-sgprs.ll llvm/test/CodeGen/AMDGPU/kernel-args.ll llvm/test/CodeGen/AMDGPU/memory_clause.ll llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll llvm/test/CodeGen/AMDGPU/salu-to-valu.ll llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll llvm/test/CodeGen/AMDGPU/shift-i128.ll llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll llvm/test/CodeGen/AMDGPU/udivrem.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83626.277253.patch Type: text/x-patch Size: 39890 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 11:18:30 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 18:18:30 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <26dcc9850795fb4943aa9eb16ba016e1@localhost.localdomain> dblaikie added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- jhenderson wrote: > dblaikie wrote: > > jhenderson wrote: > > > dblaikie wrote: > > > > Higuoxing wrote: > > > > > jhenderson wrote: > > > > > > dblaikie wrote: > > > > > > > Higuoxing wrote: > > > > > > > > dblaikie wrote: > > > > > > > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) 
> > > > > > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > > > > > > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. > > > > > > > > > > > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > > > > > > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the exisiting llvm-dwarfdump tests require for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > > > > > > > > > > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). 
> > > > > > > > > > > > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. > > > > > Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! > > > > > Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > > > Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me. > > > > > > > > Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure. > > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > > > What do you mean by "auto-generation aspects"? > > > > But, yeah, I'm not holding this patch up over this direction that's already got precedent, etc - but raising the question at least for consideration/thinking about over time. > At the moment, to use yaml2obj to generate DWARF, you have to specify pretty much every detail of the DWARF, including the details of the abbrev table and the string table for example. 
Ideally, we should be able to describe the DWARF in a higher level manner (e.g. by just specifying the attributes and values in the .debug_info description, letting yaml2obj do all the leg work of selecting a form, populating the abbrev and string tables etc). You'll see details of this in @Higuoxing's mailing list posts about his GSOC project. > > We can use the basic-level testing for "bootstrapping". yaml2obj can generate valid raw sections, tested via hex -> allows testing of llvm-dwarfdump section dumping -> allows testing of yaml2obj higher-level functionality (because we know that llvm-dwarfdump section dumping now works). That seems like it's going to be fairly subtle/hard to maintain the separation here - if some yaml2obj tests use hex dumping but others can use llvm-dwarfdump - if/when/that's happening, might be worth separate directories for the two kinds of tests and some fairly specific documentation about how to determine which tests go where. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Sat Jul 11 11:53:56 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 18:53:56 +0000 (UTC) Subject: [PATCH] D81631: Fix undefined behavior in Dwarf. In-Reply-To: References: Message-ID: <2aa3f1c54bb4906d94196a1f1dbda797@localhost.localdomain> dblaikie accepted this revision. dblaikie added a comment. In D81631#2141400 , @linzj wrote: > > - bool SameAsPrevCU = this == DD->getPrevCU(); + const DwarfCompileUnit *PrevCU = DD->getPrevCU(); > > Err, once you invoke getPrevCU, then a load-to-undefined instruction will be emitted. So valgrind will still report this problem. valgrind doesn't like undefined value's load. Right you are! Don't mind me then :) thanks for walking me through it all! 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81631/new/ https://reviews.llvm.org/D81631 From llvm-commits at lists.llvm.org Sat Jul 11 12:18:17 2020 From: llvm-commits at lists.llvm.org (Sourabh Singh Tomar via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 19:18:17 +0000 (UTC) Subject: [PATCH] D83560: [DebugInfo] Added support for DW_OP_implicit_value in llvm In-Reply-To: References: Message-ID: <53a0de06751e1d0afded76b0e58ffd7b@localhost.localdomain> SouraVX added a comment. Sure, I understand the need for a specific name (and my mistake in choosing a generic name, `addImplicitValue`, for the purpose of this patch). I've noted your inputs/concerns; I'll work on them and revise once @aprantl and @probinson have also had a look. Thanks again for your inputs :) Regarding that strange check (80-bit width): at that time I didn't find a way to distinguish between `float` and `long double`, so I did it in a brute-force fashion (using the IEEE representation to distinguish them). I'll see if I can do it better. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83560/new/ https://reviews.llvm.org/D83560 From llvm-commits at lists.llvm.org Sat Jul 11 12:20:28 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 19:20:28 +0000 (UTC) Subject: [PATCH] D70376: [LVI] Restructure caching In-Reply-To: References: Message-ID: <0031ec5a4257905b5d60751d73f4885a@localhost.localdomain> nikic added a comment. Unfortunately there is an additional complication here.
Apparently the separate per-block storage of overdefined values is not just there to reduce memory usage, it is also necessary for the implementation of threadEdgeImpl: https://github.com/llvm/llvm-project/blob/8f183d9f3d13d66a679bd449b1f5d34942560028/llvm/lib/Analysis/LazyValueInfo.cpp#L264 This code needs to know which values are cached as overdefined for a given block, which would be highly inefficient (scan over all values) if we don't key by block first. I'm not sure whether the threadEdgeImpl code can be dropped -- no tests fail if I do, but it's not like the commit that introduced this (https://github.com/llvm/llvm-project/commit/aa7f66ba6798ea946baa622b55679597dab60742) had any tests either... Ideally we'd switch LVI to be based on PredicateInfo and thus avoid the need to track data per-block in the first place. Of course, this is not simple to do. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70376/new/ https://reviews.llvm.org/D70376 From llvm-commits at lists.llvm.org Sat Jul 11 12:31:49 2020 From: llvm-commits at lists.llvm.org (Hal Finkel via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 19:31:49 +0000 (UTC) Subject: [PATCH] D83576: [BasicAA] Fix -basicaa-recphi for geps with negative offsets In-Reply-To: References: Message-ID: <81261ddbca329494c442a559c8a0df77@localhost.localdomain> hfinkel accepted this revision. hfinkel added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83576/new/ https://reviews.llvm.org/D83576 From llvm-commits at lists.llvm.org Sat Jul 11 12:38:28 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 19:38:28 +0000 (UTC) Subject: [PATCH] D83628: [examples] fix ExceptionDemo Message-ID: stephenneuendorffer created this revision. Herald added a project: LLVM. Herald added a subscriber: llvm-commits. Code didn't compile in a release build. 
Guard debug output with ifndef NDEBUG. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83628 Files: llvm/examples/ExceptionDemo/ExceptionDemo.cpp Index: llvm/examples/ExceptionDemo/ExceptionDemo.cpp =================================================================== --- llvm/examples/ExceptionDemo/ExceptionDemo.cpp +++ llvm/examples/ExceptionDemo/ExceptionDemo.cpp @@ -792,7 +792,7 @@ } #endif - const uint8_t *lsda = _Unwind_GetLanguageSpecificData(context); + const uint8_t *lsda = (const uint8_t *)_Unwind_GetLanguageSpecificData(context); #ifdef DEBUG fprintf(stderr, @@ -1959,11 +1959,13 @@ executionEngine->finalizeObject(); +#ifndef NDEBUG fprintf(stderr, "\nBegin module dump:\n\n"); module->dump(); fprintf(stderr, "\nEnd module dump:\n"); +#endif fprintf(stderr, "\n\nBegin Test:\n"); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83628.277255.patch Type: text/x-patch Size: 702 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 12:39:08 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 19:39:08 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <81e5c799a865e4c34787907e3a124dd4@localhost.localdomain> MaskRay added a comment. In D79978#2145413 , @tmsriram wrote: > ... > Now, what happens when we start placing individual basic blocks in unique sections: > > - Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function. > - The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently. > - Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section. 
> - Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own. ...

Thanks for the detailed write-up! I've learned a bit from it. I think at least the quoted items should be placed in the description.

In D79978#2145189, @amharc wrote:

> > Note also that only `rbp` is described. I think we need another register to demonstrate the effect.
>
> `rbp` is the usual frame pointer register for the x86 architecture and I'm not really sure we can easily force the compiler to choose a different register to hold the frame pointer. If you know how to force a different register to be the frame pointer, please let us know - we will add a corresponding test.

I cannot find a test with `.cfa_offset` or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers?

  int f3(int i, int j, int k) {
    if (i == 0) { // adds a basic block
      asm("nop" : : : "rdi","rsi","rdx","rbp","r12","r13","r14","r15","memory");
      // there is a .cfi_offset for each of rbp,r12,r13,r14,r15
      return j;
    }
    if (j == 0) {
      asm("xchg %%ax,%%ax" : : : "rdi","rsi","rdx","rbp","r14","r15","memory");
      // r12 and r13 are not clobbered but the current implementation adds .cfi_offset for both r12 and r13
      return k;
    }
    return i;
  }

I get a lot of `.cfi_offset` with `clang -S -emit-llvm -O1 a.c; llc -O0 -basicblock-sections=all < a.ll`:

  ...
  .section .text,"ax",@progbits,unique,1
  f3.1: # %if.then
  .cfi_startproc
  .cfi_def_cfa %rsp, 48
  .cfi_offset %r12, -48
  .cfi_offset %r13, -40
  .cfi_offset %r14, -32
  .cfi_offset %r15, -24
  .cfi_offset %rbp, -16
  #APP
  nop
  #NO_APP
  movl -8(%rsp), %eax # 4-byte Reload
  movl %eax, -16(%rsp) # 4-byte Spill
  ...

================
Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:484
+/// frame pointer.
+void X86FrameLowering::emitCalleeSavedFrameMoves( + MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const { ---------------- If the function is only used by basic block sections, please mention the fact. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sat Jul 11 12:39:37 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via llvm-commits) Date: Sat, 11 Jul 2020 12:39:37 -0700 (PDT) Subject: [llvm] d8c3503 - [examples] fix ExceptionDemo Message-ID: <5f0a1579.1c69fb81.33616.5929@mx.google.com> Author: Stephen Neuendorffer Date: 2020-07-11T12:38:27-07:00 New Revision: d8c35031a39e7b1bf9524ddd325c7a91dbb05f1d URL: https://github.com/llvm/llvm-project/commit/d8c35031a39e7b1bf9524ddd325c7a91dbb05f1d DIFF: https://github.com/llvm/llvm-project/commit/d8c35031a39e7b1bf9524ddd325c7a91dbb05f1d.diff LOG: [examples] fix ExceptionDemo Code didn't compile in a release build. Guard debug output with ifndef NDEBUG. 
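A minimal sketch of the guard described in the log message, assuming the release-build failure came from a `dump()` method that is only available in debug builds (the commit message does not say this explicitly). `ModuleLike` and `dumpIfAvailable` are hypothetical names used for illustration and do not appear in ExceptionDemo.cpp.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a type whose dump() is compiled only when
// NDEBUG is not defined (i.e. in debug/assertion-enabled builds).
struct ModuleLike {
  std::string Name;
#ifndef NDEBUG
  void dump() const { std::fprintf(stderr, "module: %s\n", Name.c_str()); }
#endif
};

// Emits the debug dump when it is available; returns whether it ran.
bool dumpIfAvailable(const ModuleLike &M) {
#ifndef NDEBUG
  std::fprintf(stderr, "\nBegin module dump:\n\n");
  M.dump(); // only referenced in builds where dump() exists
  std::fprintf(stderr, "\nEnd module dump:\n");
  return true;
#else
  (void)M; // avoid an unused-parameter warning in release builds
  return false;
#endif
}
```

Because the call site and the method are behind the same condition, the translation unit builds either way, whether or not `-DNDEBUG` is passed.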
Differential Revision: https://reviews.llvm.org/D83628 Added: Modified: llvm/examples/ExceptionDemo/ExceptionDemo.cpp Removed: ################################################################################ diff --git a/llvm/examples/ExceptionDemo/ExceptionDemo.cpp b/llvm/examples/ExceptionDemo/ExceptionDemo.cpp index 0ecb527f4ec0..1b3ec7c91dde 100644 --- a/llvm/examples/ExceptionDemo/ExceptionDemo.cpp +++ b/llvm/examples/ExceptionDemo/ExceptionDemo.cpp @@ -792,7 +792,7 @@ _Unwind_Reason_Code ourPersonality(int version, _Unwind_Action actions, } #endif - const uint8_t *lsda = _Unwind_GetLanguageSpecificData(context); + const uint8_t *lsda = (const uint8_t *)_Unwind_GetLanguageSpecificData(context); #ifdef DEBUG fprintf(stderr, @@ -1959,11 +1959,13 @@ int main(int argc, char *argv[]) { executionEngine->finalizeObject(); +#ifndef NDEBUG fprintf(stderr, "\nBegin module dump:\n\n"); module->dump(); fprintf(stderr, "\nEnd module dump:\n"); +#endif fprintf(stderr, "\n\nBegin Test:\n"); From llvm-commits at lists.llvm.org Sat Jul 11 12:39:51 2020 From: llvm-commits at lists.llvm.org (Stephen Neuendorffer via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 19:39:51 +0000 (UTC) Subject: [PATCH] D83628: [examples] fix ExceptionDemo In-Reply-To: References: Message-ID: <2300e47ac3cd03565a34459d168d30c5@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGd8c35031a39e: [examples] fix ExceptionDemo (authored by stephenneuendorffer). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83628/new/ https://reviews.llvm.org/D83628 Files: llvm/examples/ExceptionDemo/ExceptionDemo.cpp Index: llvm/examples/ExceptionDemo/ExceptionDemo.cpp =================================================================== --- llvm/examples/ExceptionDemo/ExceptionDemo.cpp +++ llvm/examples/ExceptionDemo/ExceptionDemo.cpp @@ -792,7 +792,7 @@ } #endif - const uint8_t *lsda = _Unwind_GetLanguageSpecificData(context); + const uint8_t *lsda = (const uint8_t *)_Unwind_GetLanguageSpecificData(context); #ifdef DEBUG fprintf(stderr, @@ -1959,11 +1959,13 @@ executionEngine->finalizeObject(); +#ifndef NDEBUG fprintf(stderr, "\nBegin module dump:\n\n"); module->dump(); fprintf(stderr, "\nEnd module dump:\n"); +#endif fprintf(stderr, "\n\nBegin Test:\n"); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83628.277256.patch Type: text/x-patch Size: 702 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 12:55:06 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Sat, 11 Jul 2020 12:55:06 -0700 (PDT) Subject: [llvm] 47872ad - [X86] Add test cases for missed opportunities to use vpternlog due to a bitcast between the logic ops. Message-ID: <5f0a191a.1c69fb81.5c7ac.540e@mx.google.com> Author: Craig Topper Date: 2020-07-11T12:54:52-07:00 New Revision: 47872adf6ae236c798d05b7229e00f363ab2fe0f URL: https://github.com/llvm/llvm-project/commit/47872adf6ae236c798d05b7229e00f363ab2fe0f DIFF: https://github.com/llvm/llvm-project/commit/47872adf6ae236c798d05b7229e00f363ab2fe0f.diff LOG: [X86] Add test cases for missed opportunities to use vpternlog due to a bitcast between the logic ops. These test cases fail to use vpternlog because the AND was converted to a blend shuffle and then converted back to AND during shuffle lowering. This results in the AND having a different type than it started with. 
This prevents our custom matching logic from seeing the two logic ops. Added: Modified: llvm/test/CodeGen/X86/avx512-logic.ll llvm/test/CodeGen/X86/avx512vl-logic.ll Removed: ################################################################################ diff --git a/llvm/test/CodeGen/X86/avx512-logic.ll b/llvm/test/CodeGen/X86/avx512-logic.ll index c2a4da1ba562..88a3b5aea9bd 100644 --- a/llvm/test/CodeGen/X86/avx512-logic.ll +++ b/llvm/test/CodeGen/X86/avx512-logic.ll @@ -885,3 +885,37 @@ define <16 x i32> @ternlog_xor_andn(<16 x i32> %x, <16 x i32> %y, <16 x i32> %z) %c = xor <16 x i32> %b, %z ret <16 x i32> %c } + +define <16 x i32> @ternlog_or_and_mask(<16 x i32> %x, <16 x i32> %y) { +; KNL-LABEL: ternlog_or_and_mask: +; KNL: ## %bb.0: +; KNL-NEXT: vpandq {{.*}}(%rip), %zmm0, %zmm0 +; KNL-NEXT: vpord %zmm1, %zmm0, %zmm0 +; KNL-NEXT: retq +; +; SKX-LABEL: ternlog_or_and_mask: +; SKX: ## %bb.0: +; SKX-NEXT: vandps {{.*}}(%rip), %zmm0, %zmm0 +; SKX-NEXT: vorps %zmm1, %zmm0, %zmm0 +; SKX-NEXT: retq + %a = and <16 x i32> %x, + %b = or <16 x i32> %a, %y + ret <16 x i32> %b +} + +define <8 x i64> @ternlog_xor_and_mask(<8 x i64> %x, <8 x i64> %y) { +; KNL-LABEL: ternlog_xor_and_mask: +; KNL: ## %bb.0: +; KNL-NEXT: vpandd {{.*}}(%rip), %zmm0, %zmm0 +; KNL-NEXT: vpxorq %zmm1, %zmm0, %zmm0 +; KNL-NEXT: retq +; +; SKX-LABEL: ternlog_xor_and_mask: +; SKX: ## %bb.0: +; SKX-NEXT: vandps {{.*}}(%rip), %zmm0, %zmm0 +; SKX-NEXT: vxorps %zmm1, %zmm0, %zmm0 +; SKX-NEXT: retq + %a = and <8 x i64> %x, + %b = xor <8 x i64> %a, %y + ret <8 x i64> %b +} diff --git a/llvm/test/CodeGen/X86/avx512vl-logic.ll b/llvm/test/CodeGen/X86/avx512vl-logic.ll index 0647f4e33bf2..26d905ebeae7 100644 --- a/llvm/test/CodeGen/X86/avx512vl-logic.ll +++ b/llvm/test/CodeGen/X86/avx512vl-logic.ll @@ -987,3 +987,47 @@ define <4 x i32> @ternlog_xor_andn(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) { %c = xor <4 x i32> %b, %z ret <4 x i32> %c } + +define <4 x i32> @ternlog_or_and_mask(<4 x i32> %x, <4 x i32> 
%y) { +; CHECK-LABEL: ternlog_or_and_mask: +; CHECK: ## %bb.0: +; CHECK-NEXT: vandps {{.*}}(%rip), %xmm0, %xmm0 +; CHECK-NEXT: vorps %xmm1, %xmm0, %xmm0 +; CHECK-NEXT: retq + %a = and <4 x i32> %x, + %b = or <4 x i32> %a, %y + ret <4 x i32> %b +} + +define <8 x i32> @ternlog_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y) { +; CHECK-LABEL: ternlog_or_and_mask_ymm: +; CHECK: ## %bb.0: +; CHECK-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; CHECK-NEXT: vorps %ymm1, %ymm0, %ymm0 +; CHECK-NEXT: retq + %a = and <8 x i32> %x, + %b = or <8 x i32> %a, %y + ret <8 x i32> %b +} + +define <2 x i64> @ternlog_xor_and_mask(<2 x i64> %x, <2 x i64> %y) { +; CHECK-LABEL: ternlog_xor_and_mask: +; CHECK: ## %bb.0: +; CHECK-NEXT: vandps {{.*}}(%rip), %xmm0, %xmm0 +; CHECK-NEXT: vxorps %xmm1, %xmm0, %xmm0 +; CHECK-NEXT: retq + %a = and <2 x i64> %x, + %b = xor <2 x i64> %a, %y + ret <2 x i64> %b +} + +define <4 x i64> @ternlog_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y) { +; CHECK-LABEL: ternlog_xor_and_mask_ymm: +; CHECK: ## %bb.0: +; CHECK-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0 +; CHECK-NEXT: vxorps %ymm1, %ymm0, %ymm0 +; CHECK-NEXT: retq + %a = and <4 x i64> %x, + %b = xor <4 x i64> %a, %y + ret <4 x i64> %b +} From llvm-commits at lists.llvm.org Sat Jul 11 12:58:35 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 19:58:35 +0000 (UTC) Subject: [PATCH] D83629: [Utils] Check function attributes in update_test_checks Message-ID: sstefan1 created this revision. sstefan1 added a reviewer: jdoerfert. Herald added subscribers: llvm-commits, arichardson. Herald added a project: LLVM. This introduces a new flag to update_test_checks that allows function attributes to be checked in a check-line. If the flag is not set, the behavior should remain the same.
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83629 Files: llvm/utils/UpdateTestChecks/common.py llvm/utils/update_analyze_test_checks.py llvm/utils/update_test_checks.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D83629.277257.patch Type: text/x-patch Size: 8250 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 13:03:39 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 20:03:39 +0000 (UTC) Subject: [PATCH] D83630: [X86] Turn X86DAGToDAGISel::tryVPTERNLOG into a fully custom instruction selector that can handle bitcasts between logic ops Message-ID: craig.topper created this revision. craig.topper added reviewers: RKSimon, spatel. Herald added a subscriber: hiraditya. Herald added a project: LLVM. Previously we just matched the logic ops and replaced with an X86ISD::VPTERNLOG node that we would send through the normal pattern match. But that approach couldn't handle a bitcast between the logic ops. Extending that approach would require us to peek through the bitcasts and emit new bitcasts to match the types. Those new bitcasts would then have to be properly topologically sorted. This patch instead switches to directly emitting the MachineSDNode and skips the normal tablegen pattern matching. We do have to handle load folding and broadcast load folding ourselves now. Which also means commuting the immediate control. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83630 Files: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp llvm/test/CodeGen/X86/avx512-logic.ll llvm/test/CodeGen/X86/avx512vl-logic.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83630.277258.patch Type: text/x-patch Size: 9995 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 13:07:28 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 20:07:28 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: aykevl marked 2 inline comments as done. aykevl added inline comments. ================ Comment at: lld/test/ELF/avr-reloc.s:17 +# CHECK-NEXT: ldi r20, 255 +ldi r20, lo8(a) +ldi r20, hi8(a) ---------------- MaskRay wrote: > aykevl wrote: > > MaskRay wrote: > > > Might be worth adding comments about the exact relocation type used, e.g. > > > > > > `ldi r20, lo8(a) # R_AVR_LO8_LDI...` > > Good idea, I'll change this. > Doesn't `# ` work as well? Apparently not. I tried, but the assembler chokes on it. It appears that `;` is the comment char in AVR assembly. ================ Comment at: lld/test/ELF/avr-reloc.s:80 +# HEX-LABEL: section .DATA: +# HEX-NEXT: 1e1e000f 00785634 12 +.byte b ---------------- MaskRay wrote: > aykevl wrote: > > MaskRay wrote: > > > `{{.*}} 1e1e000f 00785634 12` > > > > > > The address of .DATA is insignificant and should be omitted. > > What do you mean? The address is not included in the test. > > > > This is the full output: > > > > ``` > > Contents of section .DATA: > > 110e4 1e1e000f 00785634 12 .....xV4. > > ``` > > > > I believe `110e4` is the address (not included in the test) while `1e1e000f 00785634 12` is the contents of the section (in hex). > > > > Or do you mean I should check for the presence of an address using `{{.*}}`? > 110e4 as the address is insignificant. If the content is not dependent on the address, omitting 110e4 has the benefit that the test does not need an update if the assigned addresses change. > Sorry, I still don't understand. There is no address `110e4` in the test code that could be removed. 
The test is independent of the address of the `.DATA` section. What change do you suggest? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Sat Jul 11 13:22:04 2020 From: llvm-commits at lists.llvm.org (Krzysztof Pszeniczny via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 20:22:04 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <89cb03c50096cdc613c674fecea76082@localhost.localdomain> amharc added a comment. In D79978#2145979 , @MaskRay wrote: > I cannot find a test with `.cfa_offset` or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers? I'm not exactly sure which CFI instruction this comment refers to (there is no `.cfa_offset` directive, see e.g. https://sourceware.org/binutils/docs/as/CFI-directives.html). If to `.cfi_def_cfa_offset` - the question was answered before: we emit a full CFA definition (`.cfi_def_cfa` - setting both the offset and the register) which is equivalent to a `.cfi_def_cfa_offset` (only setting the offset) plus a `.cfi_def_cfa_register` (only setting the register). Moreover, for code following the SysV ABI the register used in the `.cfi_def_cfa_register`/`.cfi_def_cfa` instructions is conventionally always set to either `rbp` (if the frame pointer is kept explicitly) or in `rsp` (when it's omitted). Both cases are covered by the existing tests. We are not aware of any easy way to force a different register to be the frame pointer. 
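The equivalence stated above (a full `.cfi_def_cfa` versus a `.cfi_def_cfa_register` plus a `.cfi_def_cfa_offset`) can be illustrated with a toy model. This is only a sketch of the CFA rule "CFA = register + offset"; `CfaRule` and the helper functions are hypothetical names, not LLVM or unwinder APIs. The register number 7 used in the usage note is the DWARF number for `rsp` on x86-64.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the rule an unwinder tracks: CFA = value(Register) + Offset.
struct CfaRule {
  unsigned Register = 0;   // DWARF register number
  std::int64_t Offset = 0; // byte offset added to the register value
};

inline bool operator==(const CfaRule &A, const CfaRule &B) {
  return A.Register == B.Register && A.Offset == B.Offset;
}

// Models `.cfi_def_cfa reg, off`: sets both parts of the rule.
inline void defCfa(CfaRule &R, unsigned Reg, std::int64_t Off) {
  R.Register = Reg;
  R.Offset = Off;
}

// Models `.cfi_def_cfa_register reg`: changes only the register.
inline void defCfaRegister(CfaRule &R, unsigned Reg) { R.Register = Reg; }

// Models `.cfi_def_cfa_offset off`: changes only the offset.
inline void defCfaOffset(CfaRule &R, std::int64_t Off) { R.Offset = Off; }

// True when the single full definition and the two partial ones agree.
bool fullDefinitionMatchesSplit(unsigned Reg, std::int64_t Off) {
  CfaRule Full, Split;
  defCfa(Full, Reg, Off);
  defCfaRegister(Split, Reg);
  defCfaOffset(Split, Off);
  return Full == Split;
}
```

For instance, `fullDefinitionMatchesSplit(7, 48)` models the `.cfi_def_cfa %rsp, 48` line from the assembly quoted earlier in the thread.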
If the review comment above refers to `.cfi_offset` instead, please note that the `cfi-basic-block-sections-1.ll` test checks that `.cfi_offset` is emitted properly in the generated assembly and `cfi-inserter-basic-block-sections-callee-save-registers.ll` checks that the CFI Inserter Pass actually inserts the corresponding `CFI_INSTRUCTION offset` instructions for clobbered callee-saved registers. I'm not exactly sure what would be gained by using inline assembly instead. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sat Jul 11 13:41:45 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 20:41:45 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: MaskRay added a comment. In D79978#2146009 , @amharc wrote: > In D79978#2145979 , @MaskRay wrote: > > > I cannot find a test with `.cfa_offset` or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers? > > > I'm not exactly sure which CFI instruction this comment refers to (there is no `.cfa_offset` directive, see e.g. https://sourceware.org/binutils/docs/as/CFI-directives.html). > > If to `.cfi_def_cfa_offset` - the question was answered before: we emit a full CFA definition (`.cfi_def_cfa` - setting both the offset and the register) which is equivalent to a `.cfi_def_cfa_offset` (only setting the offset) plus a `.cfi_def_cfa_register` (only setting the register). Moreover, for code following the SysV ABI the register used in the `.cfi_def_cfa_register`/`.cfi_def_cfa` instructions is conventionally always set to either `rbp` (if the frame pointer is kept explicitly) or in `rsp` (when it's omitted). Both cases are covered by the existing tests. 
We are not aware of any easy way to force a different register to be the frame pointer. > > If the review comment above refers to `.cfi_offset` instead, please note that the `cfi-basic-block-sections-1.ll` test checks that `.cfi_offset` is emitted properly in the generated assembly and `cfi-inserter-basic-block-sections-callee-save-registers.ll` checks that the CFI Inserter Pass actually inserts the corresponding `CFI_INSTRUCTION offset` instructions for clobbered callee-saved registers. I'm not exactly sure what would be gained by using inline assembly instead. I ran llc for both tests. Neither has a section with CFI describing a non-rbp register at the top, thus I say I am not sure the implementation is properly tested. My proposed inline assembly makes sure more than one registers (not just the current CFA register) are recorded. Please inspect the output. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sat Jul 11 13:46:17 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 20:46:17 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: MaskRay added inline comments. ================ Comment at: lld/test/ELF/avr-reloc.s:80 +# HEX-LABEL: section .DATA: +# HEX-NEXT: 1e1e000f 00785634 12 +.byte b ---------------- aykevl wrote: > MaskRay wrote: > > aykevl wrote: > > > MaskRay wrote: > > > > `{{.*}} 1e1e000f 00785634 12` > > > > > > > > The address of .DATA is insignificant and should be omitted. > > > What do you mean? The address is not included in the test. > > > > > > This is the full output: > > > > > > ``` > > > Contents of section .DATA: > > > 110e4 1e1e000f 00785634 12 .....xV4. > > > ``` > > > > > > I believe `110e4` is the address (not included in the test) while `1e1e000f 00785634 12` is the contents of the section (in hex). 
> > > > > > Or do you mean I should check for the presence of an address using `{{.*}}`? > > 110e4 as the address is insignificant. If the content is not dependent on the address, omitting 110e4 has the benefit that the test does not need an update if the assigned addresses change. > > > Sorry, I still don't understand. There is no address `110e4` in the test code that could be removed. The test is independent of the address of the `.DATA` section. > > What change do you suggest? I have mentioned `{{.*}} 1e1e000f 00785634 12`. See the first comment. It matches ` 110e4 1e1e000f 00785634 12` and ` 210e4 1e1e000f 00785634 12` and other addresses. For a contributor updating LLD's address assignment algorithm, it is very annoying to update every test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Sat Jul 11 14:02:59 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 21:02:59 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <40a6460b5cc6c9d245ebddd9b256d486@localhost.localdomain> aykevl marked an inline comment as done. aykevl added inline comments. ================ Comment at: lld/test/ELF/avr-reloc.s:80 +# HEX-LABEL: section .DATA: +# HEX-NEXT: 1e1e000f 00785634 12 +.byte b ---------------- MaskRay wrote: > aykevl wrote: > > MaskRay wrote: > > > aykevl wrote: > > > > MaskRay wrote: > > > > > `{{.*}} 1e1e000f 00785634 12` > > > > > > > > > > The address of .DATA is insignificant and should be omitted. > > > > What do you mean? The address is not included in the test. > > > > > > > > This is the full output: > > > > > > > > ``` > > > > Contents of section .DATA: > > > > 110e4 1e1e000f 00785634 12 .....xV4. 
> > > > ``` > > > > > > > > I believe `110e4` is the address (not included in the test) while `1e1e000f 00785634 12` is the contents of the section (in hex). > > > > > > > > Or do you mean I should check for the presence of an address using `{{.*}}`? > > > 110e4 as the address is insignificant. If the content is not dependent on the address, omitting 110e4 has the benefit that the test does not need an update if the assigned addresses change. > > > > > Sorry, I still don't understand. There is no address `110e4` in the test code that could be removed. The test is independent of the address of the `.DATA` section. > > > > What change do you suggest? > I have mentioned `{{.*}} 1e1e000f 00785634 12`. See the first comment. > > It matches ` 110e4 1e1e000f 00785634 12` and ` 210e4 1e1e000f 00785634 12` and other addresses. For a contributor updating LLD's address assignment algorithm, it is very annoying to update every test. > > Oh okay, didn't interpret that as a code change suggestion. Will update the patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Sat Jul 11 14:07:00 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 21:07:00 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <7e0ee5848af4058e5adb91540a05514a@localhost.localdomain> aykevl updated this revision to Diff 277261. aykevl added a comment. - Updated `.DATA` test. I think this addresses all review comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 Files: lld/ELF/Arch/AVR.cpp lld/test/ELF/avr-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D78741.277261.patch Type: text/x-patch Size: 6579 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 14:32:01 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Sat, 11 Jul 2020 14:32:01 -0700 (PDT) Subject: [llvm] 6792069 - [NewGVN] Regenerate test checks (NFC) Message-ID: <5f0a2fd1.1c69fb81.6774.7e26@mx.google.com> Author: Nikita Popov Date: 2020-07-11T22:51:49+02:00 New Revision: 6792069a3fdb412d06dd3cc42a6181c6fb7db860 URL: https://github.com/llvm/llvm-project/commit/6792069a3fdb412d06dd3cc42a6181c6fb7db860 DIFF: https://github.com/llvm/llvm-project/commit/6792069a3fdb412d06dd3cc42a6181c6fb7db860.diff LOG: [NewGVN] Regenerate test checks (NFC) Added: Modified: llvm/test/Transforms/NewGVN/assumes.ll Removed: ################################################################################ diff --git a/llvm/test/Transforms/NewGVN/assumes.ll b/llvm/test/Transforms/NewGVN/assumes.ll index 065cc0fb62e0..ea20b38bff6a 100644 --- a/llvm/test/Transforms/NewGVN/assumes.ll +++ b/llvm/test/Transforms/NewGVN/assumes.ll @@ -1,16 +1,28 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; RUN: opt < %s -newgvn -S | FileCheck %s -; CHECK-LABEL: @test1 -; CHECK: ret i32 %arg define i32 @test1(i32 %arg) { +; CHECK-LABEL: @test1( +; CHECK-NEXT: [[CMP:%.*]] = icmp sge i32 [[ARG:%.*]], 5 +; CHECK-NEXT: call void @llvm.assume(i1 true) +; CHECK-NEXT: ret i32 [[ARG]] +; %cmp = icmp sge i32 %arg, 5 call void @llvm.assume(i1 %cmp) ret i32 %arg } -; CHECK-LABEL: @test2 -; CHECK: ret i32 %arg define i32 @test2(i32 %arg, i1 %b) { +; CHECK-LABEL: @test2( +; CHECK-NEXT: br label [[BB:%.*]] +; CHECK: bb: +; CHECK-NEXT: [[A:%.*]] = phi i32 [ 1, [[TMP0:%.*]] ], [ 2, [[BB]] ] +; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[ARG:%.*]], [[A]] +; CHECK-NEXT: call void @llvm.assume(i1 true) +; CHECK-NEXT: br i1 [[B:%.*]], label [[BB]], label [[END:%.*]] +; CHECK: end: +; CHECK-NEXT: ret i32 [[ARG]] +; br label %bb bb: 
From llvm-commits at lists.llvm.org Sat Jul 11 14:42:23 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 21:42:23 +0000 (UTC) Subject: [PATCH] D83631: [PredicateInfo] Place predicate info after assume Message-ID: nikic created this revision. nikic added a reviewer: fhahn. Herald added subscribers: llvm-commits, hiraditya, Prazek. Herald added a project: LLVM. Place the ssa.copy instructions for assumes after the assume, instead of before it. Both options are valid, but placing them afterwards prevents assumes from being replaced with assume(true). This fixes https://bugs.llvm.org/show_bug.cgi?id=37541 in NewGVN and will avoid a similar issue in SCCP when we handle more predicate infos. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83631 Files: llvm/lib/Transforms/Utils/PredicateInfo.cpp llvm/test/Transforms/NewGVN/assumes.ll llvm/test/Transforms/Util/PredicateInfo/testandor.ll Index: llvm/test/Transforms/Util/PredicateInfo/testandor.ll =================================================================== --- llvm/test/Transforms/Util/PredicateInfo/testandor.ll +++ llvm/test/Transforms/Util/PredicateInfo/testandor.ll @@ -1,5 +1,5 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt -print-predicateinfo < %s 2>&1 | FileCheck %s +; RUN: opt -print-predicateinfo < %s 2>&1 >/dev/null | FileCheck %s declare void @foo(i1) declare void @bar(i32) @@ -136,18 +136,18 @@ ; CHECK-NEXT: [[XZ:%.*]] = icmp eq i32 [[X:%.*]], 0 ; CHECK-NEXT: [[YZ:%.*]] = icmp eq i32 [[Y:%.*]], 0 ; CHECK-NEXT: [[Z:%.*]] = and i1 [[XZ]], [[YZ]] -; CHECK: [[TMP1:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[XZ]]) -; CHECK: [[TMP2:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[X]]) +; CHECK-NEXT: call void @llvm.assume(i1 [[Z]]) +; CHECK: [[TMP1:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[Z]]) +; CHECK: [[TMP2:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[Y]]) ; CHECK: [[TMP3:%.*]] = call i1 
@llvm.ssa.copy.{{.+}}(i1 [[YZ]]) -; CHECK: [[TMP4:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[Y]]) -; CHECK: [[TMP5:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[Z]]) -; CHECK-NEXT: call void @llvm.assume(i1 [[TMP5]]) -; CHECK: [[DOT0:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[TMP1]]) -; CHECK: [[DOT01:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[TMP2]]) +; CHECK: [[TMP4:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[X]]) +; CHECK: [[TMP5:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[XZ]]) +; CHECK: [[DOT0:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[TMP5]]) +; CHECK: [[DOT01:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[TMP4]]) ; CHECK: [[DOT02:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[TMP3]]) -; CHECK: [[DOT03:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[TMP4]]) -; CHECK: [[DOT04:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[TMP5]]) -; CHECK-NEXT: br i1 [[TMP5]], label [[BOTH:%.*]], label [[NOPE:%.*]] +; CHECK: [[DOT03:%.*]] = call i32 @llvm.ssa.copy.{{.+}}(i32 [[TMP2]]) +; CHECK: [[DOT04:%.*]] = call i1 @llvm.ssa.copy.{{.+}}(i1 [[TMP1]]) +; CHECK-NEXT: br i1 [[TMP1]], label [[BOTH:%.*]], label [[NOPE:%.*]] ; CHECK: both: ; CHECK-NEXT: call void @foo(i1 [[DOT0]]) ; CHECK-NEXT: call void @foo(i1 [[DOT02]]) Index: llvm/test/Transforms/NewGVN/assumes.ll =================================================================== --- llvm/test/Transforms/NewGVN/assumes.ll +++ llvm/test/Transforms/NewGVN/assumes.ll @@ -4,7 +4,7 @@ define i32 @test1(i32 %arg) { ; CHECK-LABEL: @test1( ; CHECK-NEXT: [[CMP:%.*]] = icmp sge i32 [[ARG:%.*]], 5 -; CHECK-NEXT: call void @llvm.assume(i1 true) +; CHECK-NEXT: call void @llvm.assume(i1 [[CMP]]) ; CHECK-NEXT: ret i32 [[ARG]] ; %cmp = icmp sge i32 %arg, 5 @@ -18,7 +18,7 @@ ; CHECK: bb: ; CHECK-NEXT: [[A:%.*]] = phi i32 [ 1, [[TMP0:%.*]] ], [ 2, [[BB]] ] ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[ARG:%.*]], [[A]] -; CHECK-NEXT: call void @llvm.assume(i1 true) +; CHECK-NEXT: call void @llvm.assume(i1 [[CMP]]) ; CHECK-NEXT: br i1 [[B:%.*]], label [[BB]], 
label [[END:%.*]] ; CHECK: end: ; CHECK-NEXT: ret i32 [[ARG]] Index: llvm/lib/Transforms/Utils/PredicateInfo.cpp =================================================================== --- llvm/lib/Transforms/Utils/PredicateInfo.cpp +++ llvm/lib/Transforms/Utils/PredicateInfo.cpp @@ -205,14 +205,14 @@ // numbering will say the placed predicaeinfos should go first (IE // LN_beginning), so we won't be in this function. For assumes, we will end // up here, beause we need to order the def we will place relative to the - // assume. So for the purpose of ordering, we pretend the def is the assume - // because that is where we will insert the info. + // assume. So for the purpose of ordering, we pretend the def is right + // after the assume, because that is where we will insert the info. if (!VD.U) { assert(VD.PInfo && "No def, no use, and no predicateinfo should not occur"); assert(isa(VD.PInfo) && "Middle of block should only occur for assumes"); - return cast(VD.PInfo)->AssumeInst; + return cast(VD.PInfo)->AssumeInst->getNextNode(); } return nullptr; } @@ -621,7 +621,9 @@ auto *PAssume = dyn_cast(ValInfo); assert(PAssume && "Should not have gotten here without it being an assume"); - IRBuilder<> B(PAssume->AssumeInst); + // Insert the predicate directly after the assume. While it also holds + // directly before it, assume(i1 true) is not a useful fact. + IRBuilder<> B(PAssume->AssumeInst->getNextNode()); Function *IF = getCopyDeclaration(F.getParent(), Op->getType()); if (IF->users().empty()) PI.CreatedDeclarations.insert(IF); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83631.277262.patch Type: text/x-patch Size: 5051 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 14:51:07 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 21:51:07 +0000 (UTC) Subject: [PATCH] D83629: [Utils] Check function attributes in update_test_checks In-Reply-To: References: Message-ID: sstefan1 updated this revision to Diff 277263. sstefan1 added a comment. remove unnecessary flag, fix update_cc_test_checks Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83629/new/ https://reviews.llvm.org/D83629 Files: llvm/utils/UpdateTestChecks/common.py llvm/utils/update_analyze_test_checks.py llvm/utils/update_cc_test_checks.py llvm/utils/update_test_checks.py -------------- next part -------------- A non-text attachment was scrubbed... Name: D83629.277263.patch Type: text/x-patch Size: 6925 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 14:56:44 2020 From: llvm-commits at lists.llvm.org (Matt Arsenault via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 21:56:44 +0000 (UTC) Subject: [PATCH] D83417: GlobalISel: Restructure argument lowering loop in handleAssignments In-Reply-To: References: Message-ID: arsenm updated this revision to Diff 277264. arsenm added a comment. Herald added subscribers: kerbowa, nhaehnle, jvesely. Use LocVT, which this code should have been using in the first place which avoids shrinking the store. 
I also found the 2 forms of assignValueToAddress confusing, and the full form interface can't handle the case where a single stack slot covers multiple registers (although it seems unlikely this would ever be needed). CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83417/new/ https://reviews.llvm.org/D83417 Files: llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h llvm/lib/CodeGen/GlobalISel/CallLowering.cpp llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83417.277264.patch Type: text/x-patch Size: 12001 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 15:26:49 2020 From: llvm-commits at lists.llvm.org (Philip Reames via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 22:26:49 +0000 (UTC) Subject: [PATCH] D81648: MIR Statepoint refactoring. Part 4: ISEL changes. In-Reply-To: References: Message-ID: reames requested changes to this revision. reames added a comment. This revision now requires changes to proceed. I took some time today to apply this patch with the intention of addressing some of the unaddressed comments and landing it to unblock progress here. However, I hit two problems. First, a trivial test case crashes with this flag enabled. define void @in_register(i8 addrspace(1)* %a) #1 gc "statepoint-example" { %sp = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* undef, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i8 addrspace(1)* %a)] %a1 = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %sp, i32 0, i32 0) call void @use(i8 addrspace(1)* %a1) ret void } Second, there appears to be a semantic problem around the handling of base vs derived slots unless we *always* spill the base. 
We can't tie both uses to a single def. This may warrant some offline discussion. Denis, please address all of the previously raised comments, including writing targeted tests. ================ Comment at: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:225 unsigned NumVRegs = HasVRegVariadicDefs ? NumResults : II.getNumDefs(); + if (Node->getMachineOpcode() == TargetOpcode::STATEPOINT) + NumVRegs = NumResults; ---------------- dantrushin wrote: > reames wrote: > > This line looks unnecessary provided the defs list is marked variadic. If you want to leave it for the moment, that's fine, but I think we can clean this up in a follow up change. > > > > (To be specific, OutOperandList in Target.td for STATEPOINT can be variable_ops, and this line disappears.) > `II` here is `MCInstrDesc`. > Statepoint has 0 defs in its MCInstrDesc. (It is `MachineInstr::getNumDefs` which works correctly for variadic outs). > So I really need it here > Please read the line above your change with the explicit handling for variadic defs. (Not a blocking item. I'll fix after submit if needed) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81648/new/ https://reviews.llvm.org/D81648 From llvm-commits at lists.llvm.org Sat Jul 11 15:52:42 2020 From: llvm-commits at lists.llvm.org (Petr Hosek via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 22:52:42 +0000 (UTC) Subject: [PATCH] D83565: [compiler-rt][CMake] Pass down LLVM_LIT_ARGS in runtime build In-Reply-To: References: Message-ID: <05c47bbeeabdd9f90131c51fc9c90974@localhost.localdomain> phosek accepted this revision. phosek added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83565/new/ https://reviews.llvm.org/D83565 From llvm-commits at lists.llvm.org Sat Jul 11 16:41:37 2020 From: llvm-commits at lists.llvm.org (Arthur Eubanks via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 23:41:37 +0000 (UTC) Subject: [PATCH] D83633: [NewPM][opt] Translate -foo-analysis to require Message-ID: aeubanks created this revision. aeubanks added reviewers: ychen, asbirlea, hans. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Fixes 53 check-llvm tests under NPM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83633 Files: llvm/include/llvm/Passes/PassBuilder.h llvm/lib/Passes/PassBuilder.cpp llvm/tools/opt/NewPMDriver.cpp Index: llvm/tools/opt/NewPMDriver.cpp =================================================================== --- llvm/tools/opt/NewPMDriver.cpp +++ llvm/tools/opt/NewPMDriver.cpp @@ -358,8 +358,11 @@ } } for (auto PassName : NonAAPasses) { - if (auto Err = - PB.parsePassPipeline(MPM, PassName, VerifyEachPass, DebugPM)) { + std::string ModifiedPassName(PassName.begin(), PassName.end()); + if (PB.isAnalysisPassName(PassName)) + ModifiedPassName = "require<" + ModifiedPassName + ">"; + if (auto Err = PB.parsePassPipeline(MPM, ModifiedPassName, VerifyEachPass, + DebugPM)) { errs() << Arg0 << ": " << toString(std::move(Err)) << "\n"; return false; } Index: llvm/lib/Passes/PassBuilder.cpp =================================================================== --- llvm/lib/Passes/PassBuilder.cpp +++ llvm/lib/Passes/PassBuilder.cpp @@ -2665,3 +2665,20 @@ #include "PassRegistry.def" return false; } + +bool PassBuilder::isAnalysisPassName(StringRef PassName) { +#define MODULE_ANALYSIS(NAME, CREATE_PASS) \ + if (PassName == NAME) \ + return true; +#define FUNCTION_ANALYSIS(NAME, CREATE_PASS) \ + if (PassName == NAME) \ + return true; +#define LOOP_ANALYSIS(NAME, CREATE_PASS) \ + if (PassName == NAME) \ + 
return true; +#define CGSSC_ANALYSIS(NAME, CREATE_PASS) \ + if (PassName == NAME) \ + return true; +#include "PassRegistry.def" + return false; +} Index: llvm/include/llvm/Passes/PassBuilder.h =================================================================== --- llvm/include/llvm/Passes/PassBuilder.h +++ llvm/include/llvm/Passes/PassBuilder.h @@ -518,6 +518,9 @@ /// Returns true if the pass name is the name of an alias analysis pass. bool isAAPassName(StringRef PassName); + /// Returns true if the pass name is the name of a (non-alias) analysis pass. + bool isAnalysisPassName(StringRef PassName); + /// Register a callback for a default optimizer pipeline extension /// point /// -------------- next part -------------- A non-text attachment was scrubbed... Name: D83633.277266.patch Type: text/x-patch Size: 2398 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 16:42:20 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Sat, 11 Jul 2020 16:42:20 -0700 (PDT) Subject: [llvm] 4dbe82e - [Attributor] Introudce attribute seed allow list. Message-ID: <5f0a4e5c.1c69fb81.aa776.8254@mx.google.com> Author: kuter Date: 2020-07-12T02:25:33+03:00 New Revision: 4dbe82eef34e5ab8a9b0dabdbca194ff6858fc7f URL: https://github.com/llvm/llvm-project/commit/4dbe82eef34e5ab8a9b0dabdbca194ff6858fc7f DIFF: https://github.com/llvm/llvm-project/commit/4dbe82eef34e5ab8a9b0dabdbca194ff6858fc7f.diff LOG: [Attributor] Introudce attribute seed allow list. 
Added: llvm/test/Transforms/Attributor/allow_list.ll Modified: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/Attributor.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Transforms/IPO/Attributor.h b/llvm/include/llvm/Transforms/IPO/Attributor.h index c6261845b765..d2666d4b8682 100644 --- a/llvm/include/llvm/Transforms/IPO/Attributor.h +++ b/llvm/include/llvm/Transforms/IPO/Attributor.h @@ -891,6 +891,13 @@ struct Attributor { // No matching attribute found, create one. // Use the static create method. auto &AA = AAType::createForPosition(IRP, *this); + + // If we are currenty seeding attributes, enforce seeding rules. + if (SeedingPeriod && !shouldSeedAttribute(AA)) { + AA.getState().indicatePessimisticFixpoint(); + return AA; + } + registerAA(AA); // For now we ignore naked and optnone functions. @@ -918,8 +925,15 @@ struct Attributor { return AA; } + // Allow seeded attributes to declare dependencies. + // Remember the seeding state. + bool OldSeedingPeriod = SeedingPeriod; + SeedingPeriod = false; + updateAA(AA); + SeedingPeriod = OldSeedingPeriod; + if (TrackDependence && AA.getState().isValidState()) recordDependence(AA, const_cast(*QueryingAA), DepClass); @@ -1345,6 +1359,10 @@ struct Attributor { ChangeStatus rewriteFunctionSignatures(SmallPtrSetImpl &ModifiedFns); + /// Check if the Attribute \p AA should be seeded. + /// See getOrCreateAAFor. + bool shouldSeedAttribute(AbstractAttribute &AA); + /// The set of all abstract attributes. ///{ using AAVector = SmallVector; @@ -1410,6 +1428,10 @@ struct Attributor { /// Invoke instructions with at least a single dead successor block. SmallVector InvokeWithDeadSuccessor; + /// Wheather attributes are being `seeded`, always false after ::run function + /// gets called \see getOrCreateAAFor. + bool SeedingPeriod = true; + /// Functions, blocks, and instructions we delete after manifest is done. 
/// ///{ diff --git a/llvm/lib/Transforms/IPO/Attributor.cpp b/llvm/lib/Transforms/IPO/Attributor.cpp index 7f252079e053..6e5625d26c38 100644 --- a/llvm/lib/Transforms/IPO/Attributor.cpp +++ b/llvm/lib/Transforms/IPO/Attributor.cpp @@ -78,6 +78,12 @@ static cl::opt "wrappers for non-exact definitions."), cl::init(false)); +static cl::list + SeedAllowList("attributor-seed-allow-list", cl::Hidden, + cl::desc("Comma seperated list of attrbute names that are " + "allowed to be seeded."), + cl::ZeroOrMore, cl::CommaSeparated); + /// Logic operators for the change status enum class. /// ///{ @@ -1256,6 +1262,7 @@ ChangeStatus Attributor::cleanupIR() { } ChangeStatus Attributor::run() { + SeedingPeriod = false; runTillFixpoint(); ChangeStatus ManifestChange = manifestAttributes(); ChangeStatus CleanupChange = cleanupIR(); @@ -1452,6 +1459,12 @@ bool Attributor::registerFunctionSignatureRewrite( return true; } +bool Attributor::shouldSeedAttribute(AbstractAttribute &AA) { + if (SeedAllowList.size() == 0) + return true; + return std::count(SeedAllowList.begin(), SeedAllowList.end(), AA.getName()); +} + ChangeStatus Attributor::rewriteFunctionSignatures( SmallPtrSetImpl &ModifiedFns) { ChangeStatus Changed = ChangeStatus::UNCHANGED; diff --git a/llvm/test/Transforms/Attributor/allow_list.ll b/llvm/test/Transforms/Attributor/allow_list.ll new file mode 100644 index 000000000000..7670090cb03b --- /dev/null +++ b/llvm/test/Transforms/Attributor/allow_list.ll @@ -0,0 +1,33 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes +; RUN: opt -S -passes=attributor --attributor-seed-allow-list asd < %s | FileCheck %s --check-prefixes=CHECK_DISABLED +; RUN: opt -S -passes=attributor --attributor-seed-allow-list AAValueSimplify < %s | FileCheck %s --check-prefixes=CHECK_ENABLED + +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" + +; Function Attrs: nounwind uwtable +define internal i32 @range_test(i32 
%a) #0 { +; CHECK_DISABLED-LABEL: define {{[^@]+}}@range_test +; CHECK_DISABLED-SAME: (i32 [[A:%.*]]) +; CHECK_DISABLED-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[A]], 100 +; CHECK_DISABLED-NEXT: [[TMP2:%.*]] = zext i1 [[TMP1]] to i32 +; CHECK_DISABLED-NEXT: ret i32 [[TMP2]] +; + %1 = icmp sgt i32 %a, 100 + %2 = zext i1 %1 to i32 + ret i32 %2 +} + +; Function Attrs: nounwind uwtable +define i32 @range_use() #0 { +; CHECK_DISABLED-LABEL: define {{[^@]+}}@range_use() +; CHECK_DISABLED-NEXT: [[TMP1:%.*]] = call i32 @range_test(i32 123) +; CHECK_DISABLED-NEXT: ret i32 [[TMP1]] +; +; CHECK_ENABLED-LABEL: define {{[^@]+}}@range_use() +; CHECK_ENABLED-NEXT: ret i32 1 +; + %1 = call i32 @range_test(i32 123) + ret i32 %1 +} + +attributes #0 = { nounwind uwtable noinline } \ No newline at end of file From llvm-commits at lists.llvm.org Sat Jul 11 16:45:15 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Sat, 11 Jul 2020 16:45:15 -0700 (PDT) Subject: [llvm] d1bcddb - [llvm-objdump][test] Move tests after dc4a6f5db4f0178bae43ef615cc8902c759d6195 Message-ID: <5f0a4f0b.1c69fb81.37098.8c26@mx.google.com> Author: Fangrui Song Date: 2020-07-11T16:45:05-07:00 New Revision: d1bcddb5c1fe7135e712b0e08874ed64c70f3e49 URL: https://github.com/llvm/llvm-project/commit/d1bcddb5c1fe7135e712b0e08874ed64c70f3e49 DIFF: https://github.com/llvm/llvm-project/commit/d1bcddb5c1fe7135e712b0e08874ed64c70f3e49.diff LOG: [llvm-objdump][test] Move tests after dc4a6f5db4f0178bae43ef615cc8902c759d6195 Move RISCV/ to ELF/RISCV/ as well. 
Added: llvm/test/tools/llvm-objdump/ELF/ARM/Inputs/debug.c llvm/test/tools/llvm-objdump/ELF/ARM/Inputs/wide-char.c llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf4-sections.s llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf4.s llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf5-sections.s llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf5.s llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-wide-chars.s llvm/test/tools/llvm-objdump/ELF/PowerPC/debug-vars.s llvm/test/tools/llvm-objdump/ELF/RISCV/lit.local.cfg llvm/test/tools/llvm-objdump/ELF/RISCV/unknown-arch-attr.test Modified: Removed: llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s llvm/test/tools/llvm-objdump/ARM/lit.local.cfg llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s llvm/test/tools/llvm-objdump/PowerPC/lit.local.cfg llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg llvm/test/tools/llvm-objdump/RISCV/unknown-arch-attr.test ################################################################################ diff --git a/llvm/test/tools/llvm-objdump/ARM/lit.local.cfg b/llvm/test/tools/llvm-objdump/ARM/lit.local.cfg deleted file mode 100644 index 236e1d344166..000000000000 --- a/llvm/test/tools/llvm-objdump/ARM/lit.local.cfg +++ /dev/null @@ -1,2 +0,0 @@ -if not 'ARM' in config.root.targets: - config.unsupported = True diff --git a/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c b/llvm/test/tools/llvm-objdump/ELF/ARM/Inputs/debug.c similarity index 100% rename from llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c rename to llvm/test/tools/llvm-objdump/ELF/ARM/Inputs/debug.c diff --git a/llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c 
b/llvm/test/tools/llvm-objdump/ELF/ARM/Inputs/wide-char.c similarity index 100% rename from llvm/test/tools/llvm-objdump/ARM/Inputs/wide-char.c rename to llvm/test/tools/llvm-objdump/ELF/ARM/Inputs/wide-char.c diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s b/llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf4-sections.s similarity index 100% rename from llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s rename to llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf4-sections.s diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s b/llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf4.s similarity index 100% rename from llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s rename to llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf4.s diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s b/llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf5-sections.s similarity index 100% rename from llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5-sections.s rename to llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf5-sections.s diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s b/llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf5.s similarity index 100% rename from llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf5.s rename to llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf5.s diff --git a/llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s b/llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-wide-chars.s similarity index 100% rename from llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s rename to llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-wide-chars.s diff --git a/llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s b/llvm/test/tools/llvm-objdump/ELF/PowerPC/debug-vars.s similarity index 100% rename from llvm/test/tools/llvm-objdump/PowerPC/debug-vars.s rename to llvm/test/tools/llvm-objdump/ELF/PowerPC/debug-vars.s diff --git 
a/llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg b/llvm/test/tools/llvm-objdump/ELF/RISCV/lit.local.cfg similarity index 100% rename from llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg rename to llvm/test/tools/llvm-objdump/ELF/RISCV/lit.local.cfg diff --git a/llvm/test/tools/llvm-objdump/RISCV/unknown-arch-attr.test b/llvm/test/tools/llvm-objdump/ELF/RISCV/unknown-arch-attr.test similarity index 100% rename from llvm/test/tools/llvm-objdump/RISCV/unknown-arch-attr.test rename to llvm/test/tools/llvm-objdump/ELF/RISCV/unknown-arch-attr.test diff --git a/llvm/test/tools/llvm-objdump/PowerPC/lit.local.cfg b/llvm/test/tools/llvm-objdump/PowerPC/lit.local.cfg deleted file mode 100644 index 091332439b18..000000000000 --- a/llvm/test/tools/llvm-objdump/PowerPC/lit.local.cfg +++ /dev/null @@ -1,2 +0,0 @@ -if not 'PowerPC' in config.root.targets: - config.unsupported = True From llvm-commits at lists.llvm.org Sat Jul 11 16:46:59 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sat, 11 Jul 2020 23:46:59 +0000 (UTC) Subject: [PATCH] D83351: [llvm-reduce] Reducing attributes In-Reply-To: References: Message-ID: dblaikie added a comment. > In D83351#2139879 , @nickdesaulniers wrote: > >> In D83351#2139818 , @lebedev.ri wrote: >> >> > so i'm still waiting on the link/patch. >> >> >> Treat your fellow contributors with more respect, please.  I know style disagreements aren't exciting, but we're all on the same team. > > > Please remember that any [self-respecting] community is diverse, and almost by definition > that includes people with different 'base' languages. American speak of sugar-coating > very message is neither universally-used nor is required, > and not always following it does not mean ill-intent. Sure enough - though the response did feel rather dismissive to me. 
I was explaining why I felt that the stylistic guidance was merited, but also wasn't what I think is best for the style guide (I think there's a lot of things that would be considered less readable that we can't encode in the style guide due to there being too many of them - that matter doesn't seem to have been answered/engaged with) and your response seemed to ignore the merits of that and abruptly repeat your point. It's important for diversity to also create a welcoming community - that's not necessarily sugar coating, but engaging with kindness and understanding to other folks & hopefully be able to engage with the discussion in productive ways. None of us get this perfect all the time, to be sure. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83351/new/ https://reviews.llvm.org/D83351 From llvm-commits at lists.llvm.org Sat Jul 11 17:07:51 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 00:07:51 +0000 (UTC) Subject: [PATCH] D83634: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex Message-ID: MaskRay created this revision. MaskRay added reviewers: aykevl, dylanmckay. Herald added subscribers: llvm-commits, Jim, hiraditya. Herald added a project: LLVM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83634 Files: llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp Index: llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp =================================================================== --- llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp +++ llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp @@ -131,7 +131,7 @@ O << getPrettyRegisterName(Op.getReg(), MRI); } } else if (Op.isImm()) { - O << Op.getImm(); + O << formatImm(Op.getImm()); } else { assert(Op.isExpr() && "Unknown operand kind in printOperand"); O << *Op.getExpr(); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83634.277267.patch Type: text/x-patch Size: 509 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 17:08:12 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 00:08:12 +0000 (UTC) Subject: [PATCH] D83634: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex In-Reply-To: References: Message-ID: <78c0b6c5809475330e8366757440c8fb@localhost.localdomain> MaskRay updated this revision to Diff 277268. MaskRay added a comment. Forgot to add test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83634/new/ https://reviews.llvm.org/D83634 Files: llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp llvm/test/MC/AVR/hex-immediates.s Index: llvm/test/MC/AVR/hex-immediates.s =================================================================== --- /dev/null +++ llvm/test/MC/AVR/hex-immediates.s @@ -0,0 +1,7 @@ +; RUN: llvm-mc -filetype=obj -triple=avr %s -o %t +; RUN: llvm-objdump -d %t | FileCheck %s --check-prefix=DEC +; RUN: llvm-objdump -d --print-imm-hex %t | FileCheck %s --check-prefix=HEX + +; DEC: ldi r24, 66 +; HEX: ldi r24, 0x42 + ldi r24, 0x42 Index: llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp =================================================================== --- llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp +++ llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp @@ -131,7 +131,7 @@ O << getPrettyRegisterName(Op.getReg(), MRI); } } else if (Op.isImm()) { - O << Op.getImm(); + O << formatImm(Op.getImm()); } else { assert(Op.isExpr() && "Unknown operand kind in printOperand"); O << *Op.getExpr(); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83634.277268.patch Type: text/x-patch Size: 936 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 17:10:31 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 00:10:31 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <2550802d64d7b66e11c145b5d9b674b5@localhost.localdomain> MaskRay added a comment. Please check D83634 . We can then enable --print-imm-hex If the canonical comment marker of AVR is `;`, I think we should just use `; ` everywhere... ================ Comment at: lld/ELF/Arch/AVR.cpp:161 + checkAlignment(loc, val, 2, rel); + const uint16_t target = ((val - 2) >> 1); + write16le(loc, (read16le(loc) & 0xfc07) | ((target & 0x7f) << 3)); ---------------- Delete redundant parentheses: `((val - 2) >> 1)` -> `(val - 2) >> 1` ================ Comment at: lld/ELF/Arch/AVR.cpp:167 + checkAlignment(loc, val, 2, rel); + const uint16_t target = ((val - 2) >> 1); + write16le(loc, (read16le(loc) & 0xf000) | (target & 0xfff)); ---------------- Delete redundant parentheses Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Sat Jul 11 17:29:43 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 00:29:43 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <64af5bd9d74437bd3b87a8ef89605964@localhost.localdomain> tmsriram added a comment. In D79978#2145979 , @MaskRay wrote: > In D79978#2145413 , @tmsriram wrote: > > > ... 
> > Now, what happens when we start placing individual basic blocks in unique sections: > > > > - Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function. > > - The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently. > > - Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section. > > - Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own. ... > > > Thanks for the detailed write-up! I I've learned a bit from it. I think at least the quoted items should be placed in the description. > > In D79978#2145189 , @amharc wrote: > > > > Note also that only `rbp` is described. I think we need another register to demonstrate the effect. > > > > `rbp` is the usual frame pointer register for the x86 architecture and I'm not really sure we can easily force the compiler to choose a different register to hold the frame pointer. If you know how to force a different register to be the frame pointer, please let us know - we will add a corresponding test. > > > I cannot find a test with `.cfa_offset` or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers? Please feel free to double-check with any body familiar with X86 , X86 has exactly one register for the frame pointer and that is %rbp, there is *no other register*. The only other option is to omit the frame pointer and *we have checked both now explicitly*. We are already checking for callee-saved registers. 
That is the whole point of test 2; there is no need to force inline assembly. @dblaikie could you please help explain here? If you want us to check the final assembly too we can, but I don't see why you are asking for inline assembly when a program could do this.

>
>
>   int f3(int i, int j, int k) {
>     if (i == 0) { // adds a basic block
>       asm("nop" : : : "rdi","rsi","rdx","rbp","r12","r13","r14","r15","memory"); // there is a .cfi_offset for each of rbp,r12,r13,r14,r15
>       return j;
>     }
>     if (j == 0) { // adds a basic block
>       asm("xchg %%ax,%%ax" : : : "rdi","rsi","rdx","rbp","r14","r15","memory"); // r12 and r13 are not clobbered but the current implementation adds .cfi_offset for both r12 and r13
>       return k;
>     }
>     return i;
>   }
>
>
> I get a lot of `.cfi_offset` with `clang -S -emit-llvm -O1 a.c; llc -O0 -basicblock-sections=all < a.ll`:
>
>   ...
>   .section .text,"ax",@progbits,unique,1
>   f3.1: # %if.then
>   .cfi_startproc
>   .cfi_def_cfa %rsp, 48
>   .cfi_offset %r12, -48
>   .cfi_offset %r13, -40
>   .cfi_offset %r14, -32
>   .cfi_offset %r15, -24
>   .cfi_offset %rbp, -16

It is the same thing; it is exactly what test2 does. Do you want us to test the assembly too? We can do it without inline assembly. I personally hate unnecessary uses of inline assembly.

>   #APP
>   nop
>   #NO_APP
>   movl -8(%rsp), %eax # 4-byte Reload
>   movl %eax, -16(%rsp) # 4-byte Spill
>
>   ...
>
>

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978

From llvm-commits at lists.llvm.org Sat Jul 11 17:49:03 2020
From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits)
Date: Sun, 12 Jul 2020 00:49:03 +0000 (UTC)
Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections
In-Reply-To:
References:
Message-ID: <2f4bbdb0ce00c1a750119fdd224dc3ca@localhost.localdomain>

tmsriram added a comment.
In D79978#2146011 , @MaskRay wrote: > In D79978#2146009 , @amharc wrote: > > > In D79978#2145979 , @MaskRay wrote: > > > > > I cannot find a test with `.cfa_offset` or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers? > > > > > > I'm not exactly sure which CFI instruction this comment refers to (there is no `.cfa_offset` directive, see e.g. https://sourceware.org/binutils/docs/as/CFI-directives.html). > > > > If to `.cfi_def_cfa_offset` - the question was answered before: we emit a full CFA definition (`.cfi_def_cfa` - setting both the offset and the register) which is equivalent to a `.cfi_def_cfa_offset` (only setting the offset) plus a `.cfi_def_cfa_register` (only setting the register). Moreover, for code following the SysV ABI the register used in the `.cfi_def_cfa_register`/`.cfi_def_cfa` instructions is conventionally always set to either `rbp` (if the frame pointer is kept explicitly) or in `rsp` (when it's omitted). Both cases are covered by the existing tests. We are not aware of any easy way to force a different register to be the frame pointer. > > > > If the review comment above refers to `.cfi_offset` instead, please note that the `cfi-basic-block-sections-1.ll` test checks that `.cfi_offset` is emitted properly in the generated assembly and `cfi-inserter-basic-block-sections-callee-save-registers.ll` checks that the CFI Inserter Pass actually inserts the corresponding `CFI_INSTRUCTION offset` instructions for clobbered callee-saved registers. I'm not exactly sure what would be gained by using inline assembly instead. > > > I ran llc for both tests. Neither has a section with CFI describing a non-rbp register at the top, thus I say I am not sure the implementation is properly tested. What is the problem here, I dont follow. Is this with or without frame pointer? If you ran with O2 frame pointer is omitted so you must explicitly say --frame-pointer=all. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sat Jul 11 17:58:16 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 00:58:16 +0000 (UTC) Subject: [PATCH] D83635: [OpenMPOpt][WIP] Merge parallel regions Message-ID: ggeorgakoudis created this revision. Herald added subscribers: llvm-commits, bbn, aaron.ballman, sstefan1, jfb, guansong, hiraditya, yaxunl. Herald added a reviewer: jdoerfert. Herald added a reviewer: jdoerfert. Herald added a reviewer: sstefan1. Herald added a reviewer: baziotis. Herald added a project: LLVM. There are cases that generated OpenMP code consists of multiple, consecutive OpenMP parallel regions, either due to high-level programming models, such as RAJA, Kokkos, lowering to OpenMP code, or simply because the programmer parallelized code this way. This optimization merges consecutive parallel OpenMP regions to: (1) reduce the runtime overhead of re-activating a team of threads; (2) enlarge the scope for other OpenMP optimizations, e.g., runtime call deduplication and synchronization elimination. This implementation defensively merges parallel regions, only when they are within the same BB and any in-between instructions are safe to execute in parallel. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83635 Files: llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/parallel_deletion.ll llvm/test/Transforms/OpenMP/parallel_region_merging.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83635.277269.patch Type: text/x-patch Size: 38775 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 18:00:12 2020 From: llvm-commits at lists.llvm.org (Giorgis Georgakoudis via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 01:00:12 +0000 (UTC) Subject: [PATCH] D83635: [OpenMPOpt][WIP] Merge parallel regions In-Reply-To: References: Message-ID: <698c20dc999675476f3d69eeb8388876@localhost.localdomain> ggeorgakoudis added a comment. Note I also update the regression test Transforms/OpenMP/parallel_deletion.ll since the merging optimization, which applies after deletion, changes the output Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83635/new/ https://reviews.llvm.org/D83635 From llvm-commits at lists.llvm.org Sat Jul 11 19:16:46 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:16:46 +0000 (UTC) Subject: [PATCH] D83636: omp: Make OMP tablegen more like all other tablegens. Message-ID: thakis created this revision. thakis added a reviewer: clementval. Herald added subscribers: sstefan1, hiraditya, mgorny. Herald added a reviewer: jdoerfert. Herald added a project: LLVM. Generate a .inc file that's included by .h and .cpp files, instead of generating all C++ code in tablegen. The surrounding C++ is easier to write in a C++ file, and it's consistent with how the rest of llvm/ looks. No intended behavior change. 
https://reviews.llvm.org/D83636 Files: llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenMP/CMakeLists.txt llvm/include/llvm/Frontend/OpenMP/OMP.h llvm/include/llvm/Frontend/OpenMP/OMP.td llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/lib/Frontend/OpenMP/CMakeLists.txt llvm/lib/Frontend/OpenMP/OMP.cpp llvm/test/TableGen/directive1.td llvm/test/TableGen/directive2.td llvm/utils/TableGen/DirectiveEmitter.cpp llvm/utils/TableGen/TableGen.cpp llvm/utils/TableGen/TableGenBackends.h llvm/utils/gn/secondary/llvm/include/llvm/Frontend/OpenMP/BUILD.gn llvm/utils/gn/secondary/llvm/lib/Frontend/OpenMP/BUILD.gn -------------- next part -------------- A non-text attachment was scrubbed... Name: D83636.277271.patch Type: text/x-patch Size: 32952 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 19:17:10 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:17:10 +0000 (UTC) Subject: [PATCH] D83636: omp: Make OMP tablegen more like all other tablegens. In-Reply-To: References: Message-ID: thakis added a comment. As discussed a while ago in D81736 . It's about half as much code. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83636/new/ https://reviews.llvm.org/D83636 From llvm-commits at lists.llvm.org Sat Jul 11 19:23:46 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:23:46 +0000 (UTC) Subject: [PATCH] D83345: [ms] [llvm-ml] Fix MASM support for nested unnamed STRUCTs and UNIONs In-Reply-To: References: Message-ID: thakis accepted this revision. thakis added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/test/tools/llvm-ml/struct.test:158 + y BYTE ? + ENDS + ENDS ---------------- Can you add another field after the unnamed struct, just to check that that does the right thing? 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83345/new/ https://reviews.llvm.org/D83345 From llvm-commits at lists.llvm.org Sat Jul 11 19:24:49 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:24:49 +0000 (UTC) Subject: [PATCH] D83346: [ms] [llvm-ml] Add support for MASM STRUCT casting field accessors: ( PTR ). In-Reply-To: References: Message-ID: <17256ece3b8ba5383c6b11607e585cb6@localhost.localdomain> thakis accepted this revision. thakis added inline comments. This revision is now accepted and ready to land. ================ Comment at: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp:1660 + SM.onCast(Identifier); + // eat type and ptr + consumeToken(); ---------------- sentence case Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83346/new/ https://reviews.llvm.org/D83346 From llvm-commits at lists.llvm.org Sat Jul 11 19:25:09 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:25:09 +0000 (UTC) Subject: [PATCH] D83347: [ms] [llvm-ml] Add support for line continuations in MASM In-Reply-To: References: Message-ID: <3d05a4338beb8436ff79de7fac84c529@localhost.localdomain> thakis added a comment. test? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83347/new/ https://reviews.llvm.org/D83347 From llvm-commits at lists.llvm.org Sat Jul 11 19:27:47 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:27:47 +0000 (UTC) Subject: [PATCH] D83032: [utils] New script `check_ninja_deps.py` In-Reply-To: References: Message-ID: <0b77e8cccbd31cd61e56ffb9c0a742d4@localhost.localdomain> thakis accepted this revision. thakis added inline comments. This revision is now accepted and ready to land. 
================ Comment at: llvm/utils/check_ninja_deps.py:178 + # stale data from targets that existed only in past builds in the + # same directory. + if (dep in targets and currtarget in deps and ---------------- This also means you have to make sure that latter deps for a file overwrite earlier deps for a file, iirc (the old entry might be stale) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83032/new/ https://reviews.llvm.org/D83032 From llvm-commits at lists.llvm.org Sat Jul 11 19:29:24 2020 From: llvm-commits at lists.llvm.org (Nico Weber via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:29:24 +0000 (UTC) Subject: [PATCH] D83069: [lit] warn if explicitly specified test won't be run indirectly In-Reply-To: References: Message-ID: thakis added inline comments. ================ Comment at: llvm/utils/lit/lit/discovery.py:162 + for res in lc.test_format.getTestsInDirectory(ts, test_dir_in_suite, + litConfig, lc): + if test.getFullName() == res.getFullName(): ---------------- This looks potentially slow. Did you do any perf measurements for this patch? What's lit test discovery time for some target (say, check-llvm) for this test vs without? (min-of-5, or ministat) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83069/new/ https://reviews.llvm.org/D83069 From llvm-commits at lists.llvm.org Sat Jul 11 19:32:21 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 02:32:21 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <643c0e433e9e11f9dcbf838771fa10a0@localhost.localdomain> tmsriram added a comment. Let me try a slightly different approach here. It is not clear to us what more is needed to land the patch. In the interests of resolving conflict : 1. 
I will also explicitly test the assembly for callee-saved registers with bb sections when they are being pushed and popped.
2. Tracking the CFA with and without the frame pointer is explicitly tested; is that fine?
3. I will fix all the minor nits mentioned.

Is there anything else? Thanks.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978

From llvm-commits at lists.llvm.org Sat Jul 11 19:41:33 2020
From: llvm-commits at lists.llvm.org (Kuan Hsu Chen (Zakk) via Phabricator via llvm-commits)
Date: Sun, 12 Jul 2020 02:41:33 +0000 (UTC)
Subject: [PATCH] D71124: [RISCV] support clang driver to select cpu
In-Reply-To:
References:
Message-ID:

khchen added a comment.

@asb @lenary I thought this patch was ready to land?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71124/new/

https://reviews.llvm.org/D71124

From llvm-commits at lists.llvm.org Sat Jul 11 20:41:41 2020
From: llvm-commits at lists.llvm.org (Carl Ritson via Phabricator via llvm-commits)
Date: Sun, 12 Jul 2020 03:41:41 +0000 (UTC)
Subject: [PATCH] D83637: [AMDGPU] Propagate dead flag during pre-RA exec mask optimizations
Message-ID:

critson created this revision.
critson added reviewers: rampitec, arsenm, nhaehnle.
Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl.
Herald added a project: LLVM.

Preserve SCC dead flags in SIOptimizeExecMaskingPreRA.
This helps with removing redundant s_andn2 instructions later.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83637

Files:
  llvm/lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
  llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir
  llvm/test/CodeGen/AMDGPU/optimize-negated-cond-exec-masking-wave32.mir
  llvm/test/CodeGen/AMDGPU/optimize-negated-cond-exec-masking.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83637.277273.patch Type: text/x-patch Size: 8670 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sat Jul 11 20:49:50 2020 From: llvm-commits at lists.llvm.org (Zequan Wu via llvm-commits) Date: Sat, 11 Jul 2020 20:49:50 -0700 (PDT) Subject: [llvm] 77272d1 - [COFF] Fix endianness of .llvm.call-graph-profile section data Message-ID: <5f0a885e.1c69fb81.3763f.6a0a@mx.google.com> Author: Zequan Wu Date: 2020-07-11T20:49:26-07:00 New Revision: 77272d177a2d7128cf09dc2d27b353cc3e1ecae0 URL: https://github.com/llvm/llvm-project/commit/77272d177a2d7128cf09dc2d27b353cc3e1ecae0 DIFF: https://github.com/llvm/llvm-project/commit/77272d177a2d7128cf09dc2d27b353cc3e1ecae0.diff LOG: [COFF] Fix endianness of .llvm.call-graph-profile section data Added: Modified: llvm/lib/MC/WinCOFFObjectWriter.cpp Removed: ################################################################################ diff --git a/llvm/lib/MC/WinCOFFObjectWriter.cpp b/llvm/lib/MC/WinCOFFObjectWriter.cpp index 94a8d56c55fc..4796ef531054 100644 --- a/llvm/lib/MC/WinCOFFObjectWriter.cpp +++ b/llvm/lib/MC/WinCOFFObjectWriter.cpp @@ -1116,9 +1116,9 @@ uint64_t WinCOFFObjectWriter::writeObject(MCAssembler &Asm, for (const MCAssembler::CGProfileEntry &CGPE : Asm.CGProfile) { uint32_t FromIndex = CGPE.From->getSymbol().getIndex(); uint32_t ToIndex = CGPE.To->getSymbol().getIndex(); - OS.write((const char *)&FromIndex, sizeof(uint32_t)); - OS.write((const char *)&ToIndex, sizeof(uint32_t)); - OS.write((const char *)&CGPE.Count, sizeof(uint64_t)); + support::endian::write(OS, FromIndex, W.Endian); + support::endian::write(OS, ToIndex, W.Endian); + support::endian::write(OS, CGPE.Count, W.Endian); } } From llvm-commits at lists.llvm.org Sat Jul 11 21:11:03 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 04:11:03 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: 
Message-ID: <18fe51ec028697cbfb18ff39779d434c@localhost.localdomain>

MaskRay added a comment.

My apologies that I did not try cfi-inserter-basic-block-sections-callee-save-registers.ll
The generated assembly does have .cfi_offset
(I cannot apply the patch with `arc patch` (https://reviews.llvm.org/D79978?id=277216#2138208 ) so I used curl | patch and somehow ignored the locally modified .ll)

However, I do think inline asm with clobbered register list () has some advantage: the .cfi_offset registers can be precisely controlled.

  call void asm sideeffect "nop", "~{rbp},~{r12},~{r13},~{r14},~{r15}"()

The CHECK lines can thus be strengthened:

  ; CFI_INSTR:      CFI_INSTRUCTION def_cfa_offset 48
  ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r12, -48
  ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r13, -40
  ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r14, -32
  ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r15, -24
  ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $rbp, -16

While the current approach cannot control the used registers.

----

There is another unaddressed comment. Adding part of your write-up to the description will be helpful https://reviews.llvm.org/D79978?id=277216#2145979

================
Comment at: llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll:3
+; RUN: llc -O0 %s --basicblock-sections=all -mtriple=x86_64-unknown-linux-gnu -filetype=asm --frame-pointer=none -o - | FileCheck --check-prefix=SECTIONS_NOFP_CFI %s
+; RUN: llc -O0 %s --basicblock-sections=all -mtriple=x86_64-unknown-linux-gnu -filetype=obj --frame-pointer=all -o - | llvm-dwarfdump --debug-frame - | FileCheck --check-prefix=DEBUG_FRAME %s
+
----------------
While `--eh-frame` is an alias for `--debug-frame`, I think using `--eh-frame` here is more appropriate. This tests .eh_frame, not .debug_frame.
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sat Jul 11 22:11:09 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 05:11:09 +0000 (UTC) Subject: [PATCH] D83602: [DAGCombiner] Scalarize splats with just one demanded lane In-Reply-To: References: Message-ID: <50e11e579055b7cabc8ce333d7082de8@localhost.localdomain> tlively added a comment. In D83602#2145785 , @lebedev.ri wrote: > Is this supposed to fix some lowering-produced code? > If not, shouldn't this be best done in the middle-end? Yes, this fixes lowering-produced code. In particular, WebAssembly's vector shift instructions take a scalar shift amount, but in LLVM IR vector shifts take vector shift amounts. WebAssembly's lowering then needs to scalarize the shift entirely except when the shift amount is a splat value, in which case it can just take one lane as the scalar shift amount. This sequence of patches improves codegen in that case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83602/new/ https://reviews.llvm.org/D83602 From llvm-commits at lists.llvm.org Sat Jul 11 23:02:50 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 06:02:50 +0000 (UTC) Subject: [PATCH] D83638: BPF: permit .maps section variables with typedef type Message-ID: yonghong-song created this revision. yonghong-song added reviewers: ast, anakryiko. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Currently, llvm when see a global variable in .maps section, it ensures its type must be a struct type. Then pointee will be further evaluated for the structure members. In normal cases, the pointee type will be skipped. Although this is what current all bpf programs are doing, but it is a little bit restrictive. 
For example, it is legitimate for users to have:

  typedef struct {
    int key_size;
    int value_size;
  } __map_t;
  __map_t map __attribute__((section(".maps")));

This patch lifts the restriction so that a typedef of a struct type is also
allowed for .maps section variables. To avoid creating unnecessary
fixup entries when traversal starts with the typedef/struct type,
the new implementation first traverses all map struct members
and then traverses the typedef/struct type. This way,
no fixup entries are generated in the internal BTFDebug implementation.

Two new unit tests are added for typedef and const
struct in the .maps section. Also tested with kernel bpf selftests.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83638

Files:
  llvm/lib/Target/BPF/BTFDebug.cpp
  llvm/test/CodeGen/BPF/BTF/map-def-2.ll
  llvm/test/CodeGen/BPF/BTF/map-def-3.ll
  llvm/test/CodeGen/BPF/BTF/map-def.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83638.277274.patch
Type: text/x-patch
Size: 15935 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org Sat Jul 11 23:17:51 2020
From: llvm-commits at lists.llvm.org (Andrii Nakryiko via Phabricator via llvm-commits)
Date: Sun, 12 Jul 2020 06:17:51 +0000 (UTC)
Subject: [PATCH] D83638: BPF: permit .maps section variables with typedef type
In-Reply-To:
References:
Message-ID: <84737e325b6db0e72ce74935163e255a@localhost.localdomain>

anakryiko added a comment.

Maybe let's just remove the special-casing of the .maps section? It's just like any other global variable, so it should be handled in exactly the same way, no?
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83638/new/ https://reviews.llvm.org/D83638 From llvm-commits at lists.llvm.org Sat Jul 11 23:50:54 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 06:50:54 +0000 (UTC) Subject: [PATCH] D83638: BPF: permit .maps section variables with typedef type In-Reply-To: References: Message-ID: <9eecbf505db53c4eb9ca0da8d4c194c2@localhost.localdomain> yonghong-song added a comment. In D83638#2146167 , @anakryiko wrote: > maybe let's just remove special-casing of .maps section? It's just like any other global variable, should be handled in exactly the same way, no? for .maps, we need to trace pointee types. For general globals, functional arguments, etc., we do not trace pointee types like in `struct t { struct task_struct *p; }` the `task_struct` won't be traced. That is why we need to handle .maps specially. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83638/new/ https://reviews.llvm.org/D83638 From llvm-commits at lists.llvm.org Sun Jul 12 00:21:42 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 07:21:42 +0000 (UTC) Subject: [PATCH] D83639: [OptTable] Support grouped short options Message-ID: MaskRay created this revision. MaskRay added reviewers: grimar, jhenderson, ikudrin, rupprecht. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. POSIX.1-2017 12.2 Utility Syntax Guidelines, Guideline 5 says: > One or more options without option-arguments, followed by at most one option that takes an option-argument, should be accepted when grouped behind one '-' delimiter. i.e. -abc represents -a -b -c. The grouped short options are very common. Many utilities extend the syntax by allowing (an option with an argument) following a sequence of short options. 
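As a rough illustration of the guideline quoted above (this is not the OptTable code from D83639), a minimal sketch of expanding one grouped token might look like the following. `expandGroup`, `Flags` and `WithArg` are hypothetical names introduced only for this example, and the rule that an option taking an argument ends the group is the common extension mentioned above, assumed here rather than taken from the patch.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of llvm::opt. Expands a grouped short-option
// token such as "-abc" into {"-a", "-b", "-c"}.
//   Flags:   short options that take no argument.
//   WithArg: short options that take one argument.
// Returns false (and leaves Out empty) if Tok is not a valid group.
bool expandGroup(const std::string &Tok, const std::set<char> &Flags,
                 const std::set<char> &WithArg,
                 std::vector<std::string> &Out) {
  Out.clear();
  // Only "-x..." tokens are candidates; "-" alone and "--long" are not.
  if (Tok.size() < 2 || Tok[0] != '-' || Tok[1] == '-')
    return false;

  for (std::string::size_type I = 1; I < Tok.size(); ++I) {
    char C = Tok[I];
    if (Flags.count(C)) {
      Out.push_back(std::string("-") + C);
    } else if (WithArg.count(C)) {
      // An option with an argument ends the group. Whatever follows it in
      // the same token is treated as its attached value.
      Out.push_back(std::string("-") + C);
      if (I + 1 < Tok.size())
        Out.push_back(Tok.substr(I + 1));
      return true;
    } else {
      // Unknown letter: refuse to treat the token as a group.
      Out.clear();
      return false;
    }
  }
  return true;
}
```

With `Flags = {a, b}` and `WithArg = {o}`, the token `-abofile` would expand to `-a -b -o file`, while `--abc` is rejected and left for long-option handling.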
This patch adds the support to OptTable, similar to cl::Group for CommandLine (D58711 ). llvm-symbolizer will use the feature (D83530 ). CommandLine is exotic in some aspects. OptTable is preferred if the user wants to get rid of the behaviors. - `cl::opt i(...)` can be disabled via -i=false or -i=0, which is different from conventional --no-i. - Handling --foo & --no-foo requires a comparison of argument positions, which is a bit clumsy in user code. OptTable::parseOneArg (non-const reference InputArgList) is added along with ParseOneArg (const ArgList &). The duplicate does not look great at first glance. However, The implementation can be simpler if ArgList is mutable. (ParseOneArg is used by clang-cl (FlagsToInclude/FlagsToExclude) and lld COFF (case-insensitive). Adding grouped short options can make the function even more complex.) The implementation allows a long option following a group of short options. We probably should refine the code to disallow this in the future. Allowing this seems benign for now. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83639 Files: llvm/include/llvm/Option/ArgList.h llvm/include/llvm/Option/OptTable.h llvm/include/llvm/Option/Option.h llvm/lib/Option/OptTable.cpp llvm/lib/Option/Option.cpp llvm/unittests/Option/OptionParsingTest.cpp llvm/unittests/Option/Opts.td -------------- next part -------------- A non-text attachment was scrubbed... Name: D83639.277275.patch Type: text/x-patch Size: 8653 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 00:25:47 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 07:25:47 +0000 (UTC) Subject: [PATCH] D83530: [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable In-Reply-To: References: Message-ID: <2ed821a26bbe43a9467c06c429d5c292@localhost.localdomain> MaskRay updated this revision to Diff 277276. 
MaskRay retitled this revision from "WIP: [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable" to "[llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable". MaskRay edited the summary of this revision. MaskRay added a comment. Ready for review. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83530/new/ https://reviews.llvm.org/D83530 Files: llvm/test/DebugInfo/debuglineinfo-path.ll llvm/test/tools/llvm-symbolizer/basic.s llvm/test/tools/llvm-symbolizer/help.test llvm/test/tools/llvm-symbolizer/output-style-inlined.test llvm/test/tools/llvm-symbolizer/split-dwarf.test llvm/test/tools/llvm-symbolizer/unknown-argument.test llvm/test/tools/llvm-symbolizer/untag-addresses.test llvm/tools/llvm-symbolizer/CMakeLists.txt llvm/tools/llvm-symbolizer/Opts.td llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83530.277276.patch Type: text/x-patch Size: 32311 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 00:27:25 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 07:27:25 +0000 (UTC) Subject: [PATCH] D83530: [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable In-Reply-To: References: Message-ID: <53ae7271211288b530fd40f13d759436@localhost.localdomain> MaskRay added a comment. In D83530#2144663 , @rupprecht wrote: > `-f` is a GNU option, so we should keep that for compatibility. Is there a reason it's being removed? > ... actually, the patch says `-f` is dropped, but from the tablegen, it looks like it's still there? Fixed. A subsequent patch can drop `-f=` and `-e=` which are not conventional. This patch tries hard to stick with the current behavior to prevent surprise. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83530/new/ https://reviews.llvm.org/D83530 From llvm-commits at lists.llvm.org Sun Jul 12 00:31:13 2020 From: llvm-commits at lists.llvm.org (Andrii Nakryiko via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 07:31:13 +0000 (UTC) Subject: [PATCH] D83638: BPF: permit .maps section variables with typedef type In-Reply-To: References: Message-ID: <1df80ed31b18bb68eb8f9c1088876e7b@localhost.localdomain> anakryiko accepted this revision. anakryiko added a comment. This revision is now accepted and ready to land. ah, makes sense. LGTM then :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83638/new/ https://reviews.llvm.org/D83638 From llvm-commits at lists.llvm.org Sun Jul 12 00:55:39 2020 From: llvm-commits at lists.llvm.org (Guillaume Chatelet via llvm-commits) Date: Sun, 12 Jul 2020 09:55:39 +0200 Subject: [llvm] 3058245 - [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) In-Reply-To: References: <5f07ee46.1c69fb81.3eb4d.b0f5@mx.google.com> Message-ID: Hi Eric, Thx a lot for the heads up and for the repro instructions! I'll have a look soon. Cheers, Guillaume On Sat, Jul 11, 2020 at 12:23 AM Eric Christopher wrote: > Hi Guillaume, > > I've temporarily reverted this in cc28058c13e89ecc85dac7e1bd5d13a2ce1bb620 > as it was causing compilation errors for thinlto. I'll send you an email > with some reproduction instructions offline. 
> > Sorry for any inconvenience :( > > -eric > > On Thu, Jul 9, 2020 at 9:27 PM Guillaume Chatelet via llvm-commits < > llvm-commits at lists.llvm.org> wrote: > >> >> Author: Guillaume Chatelet >> Date: 2020-07-10T04:27:39Z >> New Revision: 30582457b47004dec8a78144abc919a13ccbd08c >> >> URL: >> https://github.com/llvm/llvm-project/commit/30582457b47004dec8a78144abc919a13ccbd08c >> DIFF: >> https://github.com/llvm/llvm-project/commit/30582457b47004dec8a78144abc919a13ccbd08c.diff >> >> LOG: [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) >> >> This is preparatory work to unable storing alignment for >> AtomicCmpXchgInst. >> See D83136 for context and bug: >> https://bugs.llvm.org/show_bug.cgi?id=27168 >> >> Differential Revision: https://reviews.llvm.org/D83375 >> >> Added: >> >> >> Modified: >> llvm/include/llvm/Bitcode/LLVMBitCodes.h >> llvm/lib/Bitcode/Reader/BitcodeReader.cpp >> >> Removed: >> >> >> >> >> ################################################################################ >> diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h >> b/llvm/include/llvm/Bitcode/LLVMBitCodes.h >> index de4fe6630324..a0c22a7d0905 100644 >> --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h >> +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h >> @@ -536,8 +536,9 @@ enum FunctionCodes { >> >> FUNC_CODE_DEBUG_LOC = 35, // DEBUG_LOC: [Line,Col,ScopeVal, >> IAVal] >> FUNC_CODE_INST_FENCE = 36, // FENCE: [ordering, synchscope] >> - FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty,ptr,cmp,new, >> align, vol, >> - // ordering, synchscope] >> + FUNC_CODE_INST_CMPXCHG_OLD = 37, // CMPXCHG: [ptrty, ptr, cmp, new, >> vol, >> + // success_ordering, ssid, >> + // failure_ordering?, weak?] 
>> FUNC_CODE_INST_ATOMICRMW = 38, // ATOMICRMW: [ptrty,ptr,val, >> operation, >> // align, vol, >> // ordering, synchscope] >> @@ -551,8 +552,9 @@ enum FunctionCodes { >> FUNC_CODE_INST_GEP = 43, // GEP: [inbounds, n x operands] >> FUNC_CODE_INST_STORE = 44, // STORE: [ptrty,ptr,valty,val, >> align, vol] >> FUNC_CODE_INST_STOREATOMIC = 45, // STORE: [ptrty,ptr,val, align, vol >> - FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty,ptr,valty,cmp,new, >> align, >> - // vol,ordering,synchscope] >> + FUNC_CODE_INST_CMPXCHG = 46, // CMPXCHG: [ptrty, ptr, cmp, newval, >> vol, >> + // success_ordering, ssid, >> + // failure_ordering, weak] >> FUNC_CODE_INST_LANDINGPAD = 47, // LANDINGPAD: >> [ty,val,num,id0,val0...] >> FUNC_CODE_INST_CLEANUPRET = 48, // CLEANUPRET: [val] or [val,bb#] >> FUNC_CODE_INST_CATCHRET = 49, // CATCHRET: [val,bb#] >> >> diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp >> b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp >> index 659e26c2bd25..ad1e97540298 100644 >> --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp >> +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp >> @@ -4982,63 +4982,120 @@ Error BitcodeReader::parseFunctionBody(Function >> *F) { >> InstructionList.push_back(I); >> break; >> } >> - case bitc::FUNC_CODE_INST_CMPXCHG_OLD: >> - case bitc::FUNC_CODE_INST_CMPXCHG: { >> - // CMPXCHG:[ptrty, ptr, cmp, new, vol, successordering, ssid, >> + case bitc::FUNC_CODE_INST_CMPXCHG_OLD: { >> + // CMPXCHG:[ptrty, ptr, cmp, new, vol, success_ordering, ssid, >> // failureordering?, isweak?] 
>> - unsigned OpNum = 0; >> - Value *Ptr, *Cmp, *New; >> - if (getValueTypePair(Record, OpNum, NextValueNo, Ptr, &FullTy)) >> + const size_t RecordCount = Record.size(); >> + unsigned Slot = 0; >> + Value *Ptr = nullptr; >> + if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy)) >> return error("Invalid record"); >> >> if (!isa(Ptr->getType())) >> return error("Cmpxchg operand is not a pointer type"); >> >> - if (BitCode == bitc::FUNC_CODE_INST_CMPXCHG) { >> - if (getValueTypePair(Record, OpNum, NextValueNo, Cmp, &FullTy)) >> - return error("Invalid record"); >> - } else if (popValue(Record, OpNum, NextValueNo, >> - getPointerElementFlatType(FullTy), Cmp)) >> + Value *Cmp = nullptr; >> + if (popValue(Record, Slot, NextValueNo, >> getPointerElementFlatType(FullTy), >> + Cmp)) >> return error("Invalid record"); >> - else >> - FullTy = cast(FullTy)->getElementType(); >> >> - if (popValue(Record, OpNum, NextValueNo, Cmp->getType(), New) || >> - Record.size() < OpNum + 3 || Record.size() > OpNum + 5) >> + if (!(RecordCount == 6 || RecordCount == 7 || RecordCount == 8)) >> return error("Invalid record"); >> >> - AtomicOrdering SuccessOrdering = getDecodedOrdering(Record[OpNum + >> 1]); >> - if (SuccessOrdering == AtomicOrdering::NotAtomic || >> - SuccessOrdering == AtomicOrdering::Unordered) >> + Value *New = nullptr; >> + if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New)) >> return error("Invalid record"); >> - SyncScope::ID SSID = getDecodedSyncScopeID(Record[OpNum + 2]); >> >> if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), >> Ptr->getType())) >> return Err; >> + >> + const bool IsVol = Record[3]; >> + >> + const AtomicOrdering SuccessOrdering = >> getDecodedOrdering(Record[4]); >> + if (SuccessOrdering == AtomicOrdering::NotAtomic || >> + SuccessOrdering == AtomicOrdering::Unordered) >> + return error("Invalid record"); >> + >> + const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]); >> + >> AtomicOrdering FailureOrdering; >> - if 
(Record.size() < 7) >> + if (RecordCount > 6) >> + FailureOrdering = getDecodedOrdering(Record[6]); >> + else >> FailureOrdering = >> >> AtomicCmpXchgInst::getStrongestFailureOrdering(SuccessOrdering); >> - else >> - FailureOrdering = getDecodedOrdering(Record[OpNum + 3]); >> >> - Align Alignment( >> + const Align Alignment( >> TheModule->getDataLayout().getTypeStoreSize(Cmp->getType())); >> + >> + FullTy = cast(FullTy)->getElementType(); >> + FullTy = StructType::get(Context, {FullTy, >> Type::getInt1Ty(Context)}); >> I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, >> SuccessOrdering, >> FailureOrdering, SSID); >> - FullTy = StructType::get(Context, {FullTy, >> Type::getInt1Ty(Context)}); >> - cast(I)->setVolatile(Record[OpNum]); >> >> - if (Record.size() < 8) { >> + cast(I)->setVolatile(IsVol); >> + >> + if (RecordCount > 7) { >> + cast(I)->setWeak(Record[7]); >> + } else { >> // Before weak cmpxchgs existed, the instruction simply returned >> the >> // value loaded from memory, so bitcode files from that era will >> be >> // expecting the first component of a modern cmpxchg. 
>> CurBB->getInstList().push_back(I); >> I = ExtractValueInst::Create(I, 0); >> FullTy = cast(FullTy)->getElementType(0); >> - } else { >> - cast(I)->setWeak(Record[OpNum+4]); >> } >> + InstructionList.push_back(I); >> + break; >> + } >> + case bitc::FUNC_CODE_INST_CMPXCHG: { >> + // CMPXCHG: [ptrty, ptr, cmp, newval, vol, success_ordering, ssid, >> + // failure_ordering, weak] >> + const size_t RecordCount = Record.size(); >> + unsigned Slot = 0; >> + Value *Ptr = nullptr; >> + if (getValueTypePair(Record, Slot, NextValueNo, Ptr, &FullTy)) >> + return error("Invalid record"); >> + >> + if (!isa(Ptr->getType())) >> + return error("Cmpxchg operand is not a pointer type"); >> + >> + Value *Cmp = nullptr; >> + if (getValueTypePair(Record, Slot, NextValueNo, Cmp, &FullTy)) >> + return error("Invalid record"); >> + >> + if (RecordCount != 8) >> + return error("Invalid record"); >> + >> + Value *New = nullptr; >> + if (popValue(Record, Slot, NextValueNo, Cmp->getType(), New)) >> + return error("Invalid record"); >> + >> + const bool IsVol = Record[3]; >> + >> + const AtomicOrdering SuccessOrdering = >> getDecodedOrdering(Record[4]); >> + if (SuccessOrdering == AtomicOrdering::NotAtomic || >> + SuccessOrdering == AtomicOrdering::Unordered) >> + return error("Invalid record"); >> + >> + const SyncScope::ID SSID = getDecodedSyncScopeID(Record[5]); >> + >> + if (Error Err = typeCheckLoadStoreInst(Cmp->getType(), >> Ptr->getType())) >> + return Err; >> + >> + const AtomicOrdering FailureOrdering = >> getDecodedOrdering(Record[6]); >> + >> + const bool IsWeak = Record[7]; >> + >> + const Align Alignment( >> + TheModule->getDataLayout().getTypeStoreSize(Cmp->getType())); >> + >> + FullTy = StructType::get(Context, {FullTy, >> Type::getInt1Ty(Context)}); >> + I = new AtomicCmpXchgInst(Ptr, Cmp, New, Alignment, >> SuccessOrdering, >> + FailureOrdering, SSID); >> + >> + cast(I)->setVolatile(IsVol); >> + cast(I)->setWeak(IsWeak); >> >> InstructionList.push_back(I); >> break; >> >> 
>> >> _______________________________________________ >> llvm-commits mailing list >> llvm-commits at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From llvm-commits at lists.llvm.org Sun Jul 12 01:13:23 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Sun, 12 Jul 2020 01:13:23 -0700 (PDT) Subject: [llvm] 6634aef - [SCCP] Add test for predicate info condition handling (NFC) Message-ID: <5f0ac623.1c69fb81.f868c.7e23@mx.google.com> Author: Nikita Popov Date: 2020-07-12T10:13:10+02:00 New Revision: 6634aef71f3b5e9820d2955bd6b39d2744de06eb URL: https://github.com/llvm/llvm-project/commit/6634aef71f3b5e9820d2955bd6b39d2744de06eb DIFF: https://github.com/llvm/llvm-project/commit/6634aef71f3b5e9820d2955bd6b39d2744de06eb.diff LOG: [SCCP] Add test for predicate info condition handling (NFC) Added: llvm/test/Transforms/SCCP/predicateinfo-cond.ll Modified: Removed: ################################################################################ diff --git a/llvm/test/Transforms/SCCP/predicateinfo-cond.ll b/llvm/test/Transforms/SCCP/predicateinfo-cond.ll new file mode 100644 index 000000000000..d8528918babe --- /dev/null +++ b/llvm/test/Transforms/SCCP/predicateinfo-cond.ll @@ -0,0 +1,110 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -S -ipsccp < %s | FileCheck %s + +; Test that information about the true/false value of conditions themselves +; is also used, not information implied by comparisions. 
+ +define i32 @switch(i32 %x) { +; CHECK-LABEL: @switch( +; CHECK-NEXT: switch i32 [[X:%.*]], label [[CASE_DEFAULT:%.*]] [ +; CHECK-NEXT: i32 0, label [[CASE_0:%.*]] +; CHECK-NEXT: i32 2, label [[CASE_2:%.*]] +; CHECK-NEXT: ] +; CHECK: case.0: +; CHECK-NEXT: [[ADD:%.*]] = add i32 [[X]], 1 +; CHECK-NEXT: br label [[END:%.*]] +; CHECK: case.2: +; CHECK-NEXT: [[SUB:%.*]] = sub i32 [[X]], 1 +; CHECK-NEXT: br label [[END]] +; CHECK: case.default: +; CHECK-NEXT: br label [[END]] +; CHECK: end: +; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[ADD]], [[CASE_0]] ], [ [[SUB]], [[CASE_2]] ], [ 1, [[CASE_DEFAULT]] ] +; CHECK-NEXT: ret i32 [[PHI]] +; + switch i32 %x, label %case.default [ + i32 0, label %case.0 + i32 2, label %case.2 + ] + +case.0: + %add = add i32 %x, 1 + br label %end + +case.2: + %sub = sub i32 %x, 1 + br label %end + +case.default: + br label %end + +end: + %phi = phi i32 [ %add, %case.0 ], [ %sub, %case.2 ], [ 1, %case.default] + ret i32 %phi +} + +define i1 @assume(i32 %x) { +; CHECK-LABEL: @assume( +; CHECK-NEXT: [[CMP:%.*]] = icmp sge i32 [[X:%.*]], 0 +; CHECK-NEXT: call void @llvm.assume(i1 [[CMP]]) +; CHECK-NEXT: ret i1 [[CMP]] +; + %cmp = icmp sge i32 %x, 0 + call void @llvm.assume(i1 %cmp) + ret i1 %cmp +} + +define i32 @branch(i32 %x) { +; CHECK-LABEL: @branch( +; CHECK-NEXT: [[CMP:%.*]] = icmp sge i32 [[X:%.*]], 0 +; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN1:%.*]], label [[IF_THEN2:%.*]] +; CHECK: if.then1: +; CHECK-NEXT: br i1 [[CMP]], label [[IF2_THEN1:%.*]], label [[IF2_THEN2:%.*]] +; CHECK: if2.then1: +; CHECK-NEXT: br label [[IF2_END:%.*]] +; CHECK: if2.then2: +; CHECK-NEXT: br label [[IF2_END]] +; CHECK: if2.end: +; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, [[IF2_THEN1]] ], [ 1, [[IF2_THEN2]] ] +; CHECK-NEXT: ret i32 [[PHI]] +; CHECK: if.then2: +; CHECK-NEXT: br i1 [[CMP]], label [[IF3_THEN1:%.*]], label [[IF3_THEN2:%.*]] +; CHECK: if3.then1: +; CHECK-NEXT: br label [[IF3_END:%.*]] +; CHECK: if3.then2: +; CHECK-NEXT: br label [[IF3_END]] +; CHECK: 
if3.end: +; CHECK-NEXT: [[PHI2:%.*]] = phi i32 [ 0, [[IF3_THEN1]] ], [ 1, [[IF3_THEN2]] ] +; CHECK-NEXT: ret i32 [[PHI2]] +; + %cmp = icmp sge i32 %x, 0 + br i1 %cmp, label %if.then1, label %if.then2 + +if.then1: + br i1 %cmp, label %if2.then1, label %if2.then2 + +if2.then1: + br label %if2.end + +if2.then2: + br label %if2.end + +if2.end: + %phi = phi i32 [ 0, %if2.then1 ], [ 1, %if2.then2 ] + ret i32 %phi + +if.then2: + br i1 %cmp, label %if3.then1, label %if3.then2 + +if3.then1: + br label %if3.end + +if3.then2: + br label %if3.end + +if3.end: + %phi2 = phi i32 [ 0, %if3.then1 ], [ 1, %if3.then2 ] + ret i32 %phi2 +} + +declare void @llvm.assume(i1) From llvm-commits at lists.llvm.org Sun Jul 12 01:25:45 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 08:25:45 +0000 (UTC) Subject: [PATCH] D83640: [PredicateInfo] Add a common method to interpret predicate as cmp constraint Message-ID: nikic created this revision. nikic added a reviewer: fhahn. Herald added subscribers: llvm-commits, hiraditya, Prazek. Herald added a project: LLVM. Both users of PredicateInfo (NewGVN and SCCP) are interested in getting a cmp constraint on the predicated value. They currently implement separate logic for this. This patch adds a common method for this in PredicateWithCondition (it would be nice to drop the PredicateBase/PredicateWithCondition split ... I saw you had a patch for that). This enables a missing bit of PredicateInfo handling in SCCP: Now the predicate on the condition itself is also used. For switches it means we know that the switched-on value is the same as the case value. For assumes/branches we know that the condition is true or false.
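To illustrate the last paragraph, here is a toy model of what a predicate on the condition itself implies. It is only a sketch, not the code in this patch: every name below (ToyKind, ToyPredicate, ToyConstraint, constraintOnCondition) is hypothetical and none of them comes from PredicateInfo.h. It also assumes that the default destination of a switch yields no single-value constraint.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the three predicate kinds mentioned above.
enum class ToyKind { Assume, Branch, Switch };

struct ToyPredicate {
  ToyKind Kind;
  bool TrueEdge;      // Branch only: is this the edge taken when the condition is true?
  bool IsDefaultCase; // Switch only: is this the default destination?
  int64_t CaseValue;  // Switch only: value of the case label on this edge
};

// "The condition (or switched-on value) is known to equal Value."
struct ToyConstraint {
  int64_t Value;
};

// Returns true and fills Out when the predicate pins the condition to a
// single value; returns false and leaves Out untouched otherwise.
bool constraintOnCondition(const ToyPredicate &P, ToyConstraint &Out) {
  assert((P.Kind == ToyKind::Switch || !P.IsDefaultCase) &&
         "IsDefaultCase only makes sense for switches");
  switch (P.Kind) {
  case ToyKind::Assume:
    Out.Value = 1; // an assumed condition is true
    return true;
  case ToyKind::Branch:
    Out.Value = P.TrueEdge ? 1 : 0; // true on one edge, false on the other
    return true;
  case ToyKind::Switch:
    if (P.IsDefaultCase)
      return false; // assumption of this sketch: nothing known on the default edge
    Out.Value = P.CaseValue; // switched-on value equals the case value
    return true;
  }
  return false;
}
```

In the @branch test above, this corresponds to the nested `br i1 %cmp` seeing a condition known to be 1 under if.then1 and 0 under if.then2.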
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83640 Files: llvm/include/llvm/Transforms/Utils/PredicateInfo.h llvm/lib/Transforms/Scalar/NewGVN.cpp llvm/lib/Transforms/Scalar/SCCP.cpp llvm/lib/Transforms/Utils/PredicateInfo.cpp llvm/test/Transforms/SCCP/predicateinfo-cond.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D83640.277277.patch Type: text/x-patch Size: 13138 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 01:30:32 2020 From: llvm-commits at lists.llvm.org (Carl Ritson via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 08:30:32 +0000 (UTC) Subject: [PATCH] D83641: [AMDGPU] Apply pre-emit s_cbranch_vcc optimation to more patterns Message-ID: critson created this revision. critson added reviewers: cdevadas, rampitec, nhaehnle. Herald added subscribers: llvm-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl, arsenm. Herald added a project: LLVM. Depends on D83637 for test correctness, but not operation. Add handling of s_andn2 and mask of 0. This eliminates code from uniform control flows. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83641 Files: llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp llvm/test/CodeGen/AMDGPU/branch-relaxation.ll llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll llvm/test/CodeGen/AMDGPU/infinite-loop.ll llvm/test/CodeGen/AMDGPU/insert-skip-from-vcc.mir llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll llvm/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll llvm/test/CodeGen/AMDGPU/wave32.ll llvm/test/CodeGen/AMDGPU/wqm.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83641.277278.patch Type: text/x-patch Size: 13882 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 01:40:34 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 08:40:34 +0000 (UTC) Subject: [PATCH] D83587: [X86] Consistently use 128 as the PSHUFB/VPPERM index for zero In-Reply-To: References: Message-ID: RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83587/new/ https://reviews.llvm.org/D83587 From llvm-commits at lists.llvm.org Sun Jul 12 01:45:20 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 08:45:20 +0000 (UTC) Subject: [PATCH] D83605: [SelectionDAG][WebAssembly] Recognize splat value ABS operations In-Reply-To: References: Message-ID: <59b07c45917767d6f1a2a282e20ea230@localhost.localdomain> RKSimon added a comment. one minor in SelectionDAG - but a wasm dev needs to look at the target code ================ Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2317 + return true; + } + break; ---------------- Unnecessary braces around the if(). And if you move the getOperand(0) inside the isSplatValue() call then the other braces can go as well. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83605/new/ https://reviews.llvm.org/D83605 From llvm-commits at lists.llvm.org Sun Jul 12 01:49:24 2020 From: llvm-commits at lists.llvm.org (Simon Pilgrim via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 08:49:24 +0000 (UTC) Subject: [PATCH] D83598: [X86] Fix two places that appear to misuse peekThroughOneUseBitcasts In-Reply-To: References: Message-ID: <41d022a1f01127ad4959cd1f17dae02f@localhost.localdomain> RKSimon accepted this revision. RKSimon added a comment. This revision is now accepted and ready to land.
LGTM - cheers CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83598/new/ https://reviews.llvm.org/D83598 From llvm-commits at lists.llvm.org Sun Jul 12 01:56:02 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 08:56:02 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <49ad0b8248efa57d32386b448c9e402f@localhost.localdomain> Higuoxing marked an inline comment as done. Higuoxing added inline comments. ================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- dblaikie wrote: > jhenderson wrote: > > dblaikie wrote: > > > jhenderson wrote: > > > > dblaikie wrote: > > > > > Higuoxing wrote: > > > > > > jhenderson wrote: > > > > > > > dblaikie wrote: > > > > > > > > Higuoxing wrote: > > > > > > > > > dblaikie wrote: > > > > > > > > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > > > > > > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > > > > > > > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. 
> > > > > > > > > > > > > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > > > > > > > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the exisiting llvm-dwarfdump tests require for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > > > > > > > > > > > > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > > > > > > > > > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. > > > > > > Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! > > > > > > Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > > > > > Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? 
Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me. > > > > > > > > > > Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure. > > > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > > > > > What do you mean by "auto-generation aspects"? > > > > > > But, yeah, I'm not holding this patch up over this direction that's already got precedent, etc - but raising the question at least for consideration/thinking about over time. > > At the moment, to use yaml2obj to generate DWARF, you have to specify pretty much every detail of the DWARF, including the details of the abbrev table and the string table for example. Ideally, we should be able to describe the DWARF in a higher level manner (e.g. by just specifying the attributes and values in the .debug_info description, letting yaml2obj do all the leg work of selecting a form, populating the abbrev and string tables etc). You'll see details of this in @Higuoxing's mailing list posts about his GSOC project. > > > > We can use the basic-level testing for "bootstrapping". yaml2obj can generate valid raw sections, tested via hex -> allows testing of llvm-dwarfdump section dumping -> allows testing of yaml2obj higher-level functionality (because we know that llvm-dwarfdump section dumping now works). 
> That seems like it's going to be fairly subtle/hard to maintain the separation here - if some yaml2obj tests use hex dumping but others can use llvm-dwarfdump - if/when/that's happening, might be worth separate directories for the two kinds of tests and some fairly specific documentation about how to determine which tests go where. What do you think of making elf2yaml support dumping DWARF sections? In the future, we can use raw assembly to test elf2yaml and use elf2yaml to test yaml2elf. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82367/new/ https://reviews.llvm.org/D82367 From llvm-commits at lists.llvm.org Sun Jul 12 02:43:59 2020 From: llvm-commits at lists.llvm.org (David Zarzycki via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 09:43:59 +0000 (UTC) Subject: [PATCH] D82085: [TRE] allow TRE for non-capturing calls. In-Reply-To: References: Message-ID: <294696123008e665e0b813c136c34d09@localhost.localdomain> davezarzycki added a comment. Hello. I have an auto-bisecting multi-stage bot that is failing on two after this change. Can we please revert this or commit a quick fix? FAIL: Clang :: CXX/class/class.compare/class.spaceship/p1.cpp (6232 of 64222) ******************** TEST 'Clang :: CXX/class/class.compare/class.spaceship/p1.cpp' FAILED ******************** Script: -- : 'RUN: at line 1'; /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions -- Exit Code: 134 Command Output (stderr): -- clang: /home/dave/s/lp/clang/lib/Basic/SourceManager.cpp:917: clang::FileID clang::SourceManager::getFileIDLoaded(unsigned int) const: Assertion `0 && "Invalid SLocOffset or bad function choice"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. 
Program arguments: /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions 1. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:127:38: current parser token ',' 2. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:39:1: parsing namespace 'Deletedness' 3. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:123:12: parsing function body 'Deletedness::g' 4. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:123:12: in compound statement ('{}') #0 0x000000000359273f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/tmp/_update_lc/t/bin/clang+0x359273f) #1 0x0000000003590912 llvm::sys::RunSignalHandlers() (/tmp/_update_lc/t/bin/clang+0x3590912) #2 0x0000000003592bb5 SignalHandler(int) (/tmp/_update_lc/t/bin/clang+0x3592bb5) #3 0x00007ffff7fa6a90 __restore_rt (/lib64/libpthread.so.0+0x14a90) #4 0x00007ffff7b3da25 raise (/lib64/libc.so.6+0x3ca25) #5 0x00007ffff7b26895 abort (/lib64/libc.so.6+0x25895) #6 0x00007ffff7b26769 _nl_load_domain.cold (/lib64/libc.so.6+0x25769) #7 0x00007ffff7b35e86 (/lib64/libc.so.6+0x34e86) #8 0x000000000375636c clang::SourceManager::getFileIDLoaded(unsigned int) const (/tmp/_update_lc/t/bin/clang+0x375636c) #9 0x0000000003ee0bbb clang::VerifyDiagnosticConsumer::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/tmp/_update_lc/t/bin/clang+0x3ee0bbb) #10 0x00000000037501ab clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (/tmp/_update_lc/t/bin/clang+0x37501ab) #11 0x0000000003749fca clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool) (/tmp/_update_lc/t/bin/clang+0x3749fca) #12 0x0000000004df0c60 clang::Sema::EmitCurrentDiagnostic(unsigned int) (/tmp/_update_lc/t/bin/clang+0x4df0c60) #13 0x0000000005092783 (anonymous 
namespace)::DefaultedComparisonAnalyzer::visitBinaryOperator(clang::OverloadedOperatorKind, llvm::ArrayRef, (anonymous namespace)::DefaultedComparisonSubobject, clang::OverloadCandidateSet*) (/tmp/_update_lc/t/bin/clang+0x5092783) #14 0x0000000005091dba (anonymous namespace)::DefaultedComparisonAnalyzer::visitExpandedSubobject(clang::QualType, (anonymous namespace)::DefaultedComparisonSubobject) (/tmp/_update_lc/t/bin/clang+0x5091dba) #15 0x0000000005091b86 (anonymous namespace)::DefaultedComparisonVisitor<(anonymous namespace)::DefaultedComparisonAnalyzer, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonSubobject>::visitSubobjects((anonymous namespace)::DefaultedComparisonInfo&, clang::CXXRecordDecl*, clang::Qualifiers) (/tmp/_update_lc/t/bin/clang+0x5091b86) #16 0x0000000005058c8c (anonymous namespace)::DefaultedComparisonAnalyzer::visit() (/tmp/_update_lc/t/bin/clang+0x5058c8c) #17 0x000000000505ab22 clang::Sema::DiagnoseDeletedDefaultedFunction(clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x505ab22) #18 0x00000000053e60ed clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::UnresolvedSetImpl const&, clang::Expr*, clang::Expr*, bool, bool, clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x53e60ed) #19 0x000000000514270a BuildOverloadedBinOp(clang::Sema&, clang::Scope*, clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x514270a) #20 0x00000000050fbf49 clang::Sema::ActOnBinOp(clang::Scope*, clang::SourceLocation, clang::tok::TokenKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x50fbf49) #21 0x0000000004d52ccc clang::Parser::ParseRHSOfBinaryExpression(clang::ActionResult, clang::prec::Level) (/tmp/_update_lc/t/bin/clang+0x4d52ccc) #22 0x0000000004d51be9 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) 
(/tmp/_update_lc/t/bin/clang+0x4d51be9) #23 0x0000000004d60dba clang::Parser::ParseExpressionList(llvm::SmallVectorImpl&, llvm::SmallVectorImpl&, llvm::function_ref) (/tmp/_update_lc/t/bin/clang+0x4d60dba) #24 0x0000000004d542d9 clang::Parser::ParsePostfixExpressionSuffix(clang::ActionResult) (/tmp/_update_lc/t/bin/clang+0x4d542d9) #25 0x0000000004d55b95 clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) (/tmp/_update_lc/t/bin/clang+0x4d55b95) #26 0x0000000004d51b89 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51b89) #27 0x0000000004d51ac9 clang::Parser::ParseExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51ac9) #28 0x0000000004d78368 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) (/tmp/_update_lc/t/bin/clang+0x4d78368) #29 0x0000000004d76ba0 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) (/tmp/_update_lc/t/bin/clang+0x4d76ba0) #30 0x0000000004d76614 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d76614) #31 0x0000000004d7ecd2 clang::Parser::ParseCompoundStatementBody(bool) (/tmp/_update_lc/t/bin/clang+0x4d7ecd2) #32 0x0000000004d7fcd0 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/tmp/_update_lc/t/bin/clang+0x4d7fcd0) #33 0x0000000004cfacc0 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/tmp/_update_lc/t/bin/clang+0x4cfacc0) #34 0x0000000004d28f2d clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/tmp/_update_lc/t/bin/clang+0x4d28f2d) #35 
0x0000000004cf9f32 clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9f32) #36 0x0000000004cf9938 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9938) #37 0x0000000004cf86fc clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf86fc) #38 0x0000000004d02c15 clang::Parser::ParseInnerNamespace(llvm::SmallVector const&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) (/tmp/_update_lc/t/bin/clang+0x4d02c15) #39 0x0000000004d0251a clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) (/tmp/_update_lc/t/bin/clang+0x4d0251a) #40 0x0000000004d22f0a clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d22f0a) #41 0x0000000004cf7e39 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf7e39) #42 0x0000000004cf6858 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr&, bool) (/tmp/_update_lc/t/bin/clang+0x4cf6858) #43 0x0000000004cf16ed clang::ParseAST(clang::Sema&, bool, bool) (/tmp/_update_lc/t/bin/clang+0x4cf16ed) #44 0x0000000003e3eb21 clang::FrontendAction::Execute() (/tmp/_update_lc/t/bin/clang+0x3e3eb21) #45 0x0000000003dba0e3 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/tmp/_update_lc/t/bin/clang+0x3dba0e3) #46 0x0000000003ee796b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/tmp/_update_lc/t/bin/clang+0x3ee796b) #47 0x0000000002244636 cc1_main(llvm::ArrayRef, char const*, void*) 
(/tmp/_update_lc/t/bin/clang+0x2244636) #48 0x000000000224297d ExecuteCC1Tool(llvm::SmallVectorImpl&) (/tmp/_update_lc/t/bin/clang+0x224297d) #49 0x0000000002242619 main (/tmp/_update_lc/t/bin/clang+0x2242619) #50 0x00007ffff7b28042 __libc_start_main (/lib64/libc.so.6+0x27042) #51 0x000000000223f8ce _start (/tmp/_update_lc/t/bin/clang+0x223f8ce) /tmp/_update_lc/t/tools/clang/test/CXX/class/class.compare/class.spaceship/Output/p1.cpp.script: line 1: 4146089 Aborted /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions -- ******************** Testing: 0.. FAIL: Clang :: CXX/class/class.compare/class.eq/p2.cpp (6242 of 64222) ******************** TEST 'Clang :: CXX/class/class.compare/class.eq/p2.cpp' FAILED ******************** Script: -- : 'RUN: at line 1'; /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp -- Exit Code: 134 Command Output (stderr): -- clang: /home/dave/s/lp/clang/lib/Basic/SourceManager.cpp:917: clang::FileID clang::SourceManager::getFileIDLoaded(unsigned int) const: Assertion `0 && "Invalid SLocOffset or bad function choice"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp 1. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:47:30: current parser token ')' 2. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:30:13: parsing function body 'test' 3. 
/home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:30:13: in compound statement ('{}') #0 0x000000000359273f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/tmp/_update_lc/t/bin/clang+0x359273f) #1 0x0000000003590912 llvm::sys::RunSignalHandlers() (/tmp/_update_lc/t/bin/clang+0x3590912) #2 0x0000000003592bb5 SignalHandler(int) (/tmp/_update_lc/t/bin/clang+0x3592bb5) #3 0x00007ffff7fa6a90 __restore_rt (/lib64/libpthread.so.0+0x14a90) #4 0x00007ffff7b3da25 raise (/lib64/libc.so.6+0x3ca25) #5 0x00007ffff7b26895 abort (/lib64/libc.so.6+0x25895) #6 0x00007ffff7b26769 _nl_load_domain.cold (/lib64/libc.so.6+0x25769) #7 0x00007ffff7b35e86 (/lib64/libc.so.6+0x34e86) #8 0x000000000375636c clang::SourceManager::getFileIDLoaded(unsigned int) const (/tmp/_update_lc/t/bin/clang+0x375636c) #9 0x0000000003ee0bbb clang::VerifyDiagnosticConsumer::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/tmp/_update_lc/t/bin/clang+0x3ee0bbb) #10 0x00000000037501ab clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (/tmp/_update_lc/t/bin/clang+0x37501ab) #11 0x0000000003749fca clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool) (/tmp/_update_lc/t/bin/clang+0x3749fca) #12 0x0000000004df0c60 clang::Sema::EmitCurrentDiagnostic(unsigned int) (/tmp/_update_lc/t/bin/clang+0x4df0c60) #13 0x00000000050928b7 (anonymous namespace)::DefaultedComparisonAnalyzer::visitBinaryOperator(clang::OverloadedOperatorKind, llvm::ArrayRef, (anonymous namespace)::DefaultedComparisonSubobject, clang::OverloadCandidateSet*) (/tmp/_update_lc/t/bin/clang+0x50928b7) #14 0x0000000005091dba (anonymous namespace)::DefaultedComparisonAnalyzer::visitExpandedSubobject(clang::QualType, (anonymous namespace)::DefaultedComparisonSubobject) (/tmp/_update_lc/t/bin/clang+0x5091dba) #15 0x0000000005091b86 (anonymous namespace)::DefaultedComparisonVisitor<(anonymous namespace)::DefaultedComparisonAnalyzer, (anonymous namespace)::DefaultedComparisonInfo, (anonymous 
namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonSubobject>::visitSubobjects((anonymous namespace)::DefaultedComparisonInfo&, clang::CXXRecordDecl*, clang::Qualifiers) (/tmp/_update_lc/t/bin/clang+0x5091b86) #16 0x0000000005058c8c (anonymous namespace)::DefaultedComparisonAnalyzer::visit() (/tmp/_update_lc/t/bin/clang+0x5058c8c) #17 0x000000000505ab22 clang::Sema::DiagnoseDeletedDefaultedFunction(clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x505ab22) #18 0x00000000053e60ed clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::UnresolvedSetImpl const&, clang::Expr*, clang::Expr*, bool, bool, clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x53e60ed) #19 0x000000000514270a BuildOverloadedBinOp(clang::Sema&, clang::Scope*, clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x514270a) #20 0x00000000050fbf49 clang::Sema::ActOnBinOp(clang::Scope*, clang::SourceLocation, clang::tok::TokenKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x50fbf49) #21 0x0000000004d52ccc clang::Parser::ParseRHSOfBinaryExpression(clang::ActionResult, clang::prec::Level) (/tmp/_update_lc/t/bin/clang+0x4d52ccc) #22 0x0000000004d51be9 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51be9) #23 0x0000000004d60dba clang::Parser::ParseExpressionList(llvm::SmallVectorImpl&, llvm::SmallVectorImpl&, llvm::function_ref) (/tmp/_update_lc/t/bin/clang+0x4d60dba) #24 0x0000000004d4b29c clang::Parser::ParseCXXTypeConstructExpression(clang::DeclSpec const&) (/tmp/_update_lc/t/bin/clang+0x4d4b29c) #25 0x0000000004d57617 clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) (/tmp/_update_lc/t/bin/clang+0x4d57617) #26 0x0000000004d51b89 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51b89) 
#27 0x0000000004d51ac9 clang::Parser::ParseExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51ac9) #28 0x0000000004d78368 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) (/tmp/_update_lc/t/bin/clang+0x4d78368) #29 0x0000000004d76ba0 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) (/tmp/_update_lc/t/bin/clang+0x4d76ba0) #30 0x0000000004d76614 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d76614) #31 0x0000000004d7ecd2 clang::Parser::ParseCompoundStatementBody(bool) (/tmp/_update_lc/t/bin/clang+0x4d7ecd2) #32 0x0000000004d7fcd0 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/tmp/_update_lc/t/bin/clang+0x4d7fcd0) #33 0x0000000004cfacc0 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/tmp/_update_lc/t/bin/clang+0x4cfacc0) #34 0x0000000004d28f2d clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/tmp/_update_lc/t/bin/clang+0x4d28f2d) #35 0x0000000004cf9f32 clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9f32) #36 0x0000000004cf9938 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9938) #37 0x0000000004cf86fc clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf86fc) #38 0x0000000004cf6858 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr&, 
bool) (/tmp/_update_lc/t/bin/clang+0x4cf6858) #39 0x0000000004cf16ed clang::ParseAST(clang::Sema&, bool, bool) (/tmp/_update_lc/t/bin/clang+0x4cf16ed) #40 0x0000000003e3eb21 clang::FrontendAction::Execute() (/tmp/_update_lc/t/bin/clang+0x3e3eb21) #41 0x0000000003dba0e3 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/tmp/_update_lc/t/bin/clang+0x3dba0e3) #42 0x0000000003ee796b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/tmp/_update_lc/t/bin/clang+0x3ee796b) #43 0x0000000002244636 cc1_main(llvm::ArrayRef, char const*, void*) (/tmp/_update_lc/t/bin/clang+0x2244636) #44 0x000000000224297d ExecuteCC1Tool(llvm::SmallVectorImpl&) (/tmp/_update_lc/t/bin/clang+0x224297d) #45 0x0000000002242619 main (/tmp/_update_lc/t/bin/clang+0x2242619) #46 0x00007ffff7b28042 __libc_start_main (/lib64/libc.so.6+0x27042) #47 0x000000000223f8ce _start (/tmp/_update_lc/t/bin/clang+0x223f8ce) /tmp/_update_lc/t/tools/clang/test/CXX/class/class.compare/class.eq/Output/p2.cpp.script: line 1: 4146047 Aborted /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp -- ******************** Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. ******************** Failed Tests (2): Clang :: CXX/class/class.compare/class.eq/p2.cpp Clang :: CXX/class/class.compare/class.spaceship/p1.cpp Testing Time: 117.51s Unsupported : 12906 Passed : 51214 Expectedly Failed: 100 Failed : 2 FAILED: CMakeFiles/check-all cd /tmp/_update_lc/t && /usr/bin/python3.8 /tmp/_update_lc/t/./bin/llvm-lit -sv --param USE_Z3_SOLVER=0 /tmp/_update_lc/t/tools/clang/test /tmp/_update_lc/t/tools/lld/test /tmp/_update_lc/t/tools/lldb/test /tmp/_update_lc/t/utils/lit /tmp/_update_lc/t/test ninja: build stopped: subcommand failed. 
+ do_error 'FAILURE -- STAGE TWO BUILD of LLVM' 12 + echo FAILURE -- STAGE TWO BUILD of LLVM FAILURE -- STAGE TWO BUILD of LLVM + exit 12 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 From llvm-commits at lists.llvm.org Sun Jul 12 02:47:58 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 09:47:58 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: <05796b81f505bd6f8bdadef062f40a9c@localhost.localdomain> okura updated this revision to Diff 277281. okura added a comment. fix test CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 Files: llvm/lib/Transforms/IPO/AttributorAttributes.cpp llvm/test/Transforms/Attributor/range.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82861.277281.patch Type: text/x-patch Size: 4272 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 03:04:36 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 10:04:36 +0000 (UTC) Subject: [PATCH] D83634: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex In-Reply-To: References: Message-ID: <20fd8c6fd989be27d37304c0ca100744@localhost.localdomain> aykevl added a comment. Looks good to me. I've confirmed this works with D78741 and will update the patch soon to make use of it. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83634/new/ https://reviews.llvm.org/D83634 From llvm-commits at lists.llvm.org Sun Jul 12 03:17:11 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 10:17:11 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <8722d13b4ab3abb282c8aa8b38e86893@localhost.localdomain> aykevl updated this revision to Diff 277282. aykevl edited the summary of this revision. aykevl added a comment. - added `--print-imm-hex` (so this patch depends on D83634 ) - deleted some redundant parentheses - changed all comment markers to `;` I've confirmed that `#` is not a valid comment char in the original AVR assembler. See: http://ww1.microchip.com/downloads/en/devicedoc/40001917a.pdf#page=12 (section 4.3). I don't know why `#` at the start of a line works as a comment, perhaps these lines are removed by llvm-lit? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 Files: lld/ELF/Arch/AVR.cpp lld/test/ELF/avr-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D78741.277282.patch Type: text/x-patch Size: 6618 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 03:39:40 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 10:39:40 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: okura added a comment. In D82861#2137685 , @jdoerfert wrote: > Can you merge this? Do I have a right to merge this by myself? I did `arc patch` and tried to `git push https://github.com/llvm/llvm-project.git HEAD:master` according to the document , but I failed to do that. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 From llvm-commits at lists.llvm.org Sun Jul 12 03:50:32 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 10:50:32 +0000 (UTC) Subject: [PATCH] D83283: [Attributor] AAPotentialValues Interface In-Reply-To: References: Message-ID: okura marked an inline comment as done. okura added inline comments. ================ Comment at: llvm/include/llvm/Transforms/IPO/Attributor.h:3056 + static inline APInt getEmptyKey() { + APInt V(nullptr, 0); + V.U.VAL = 0; ---------------- baziotis wrote: > Is there an `APInt` constructor that takes a pointer and an `int` ? I think the only such constructor is a private one. Sorry for the delayed response. This struct is a friend struct of `APInt`. So Member functions can access the private functions of `APInt`. (c.f. https://llvm.org/doxygen/APInt_8h_source.html#l00099) On the other hand, the same struct is defined in "/lib/IR/LLVMContextImpl.h" and I want to include it but I failed to do that. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83283/new/ https://reviews.llvm.org/D83283 From llvm-commits at lists.llvm.org Sun Jul 12 04:06:19 2020 From: llvm-commits at lists.llvm.org (Stefan Stipanovic via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 11:06:19 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: <3d7af3808abbb6012569549596812225@localhost.localdomain> sstefan1 added a comment. In D82861#2146242 , @okura wrote: > In D82861#2137685 , @jdoerfert wrote: > > > Can you merge this? > > > Do I have a right to merge this by myself? I did `arc patch` and tried to `git push https://github.com/llvm/llvm-project.git HEAD:master` according to the document , but I failed to do that. Did you get the commit access? If so, what problems did you have with `git push`? 
`git push origin master` should be enough. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 From llvm-commits at lists.llvm.org Sun Jul 12 04:41:30 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 11:41:30 +0000 (UTC) Subject: [PATCH] D83624: [DWARFYAML] Implement the .debug_rnglists section. In-Reply-To: References: Message-ID: Higuoxing updated this revision to Diff 277283. Higuoxing added a comment. Simplify codes for emitting offsets. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83624/new/ https://reviews.llvm.org/D83624 Files: llvm/include/llvm/ObjectYAML/DWARFEmitter.h llvm/include/llvm/ObjectYAML/DWARFYAML.h llvm/lib/ObjectYAML/DWARFEmitter.cpp llvm/lib/ObjectYAML/DWARFYAML.cpp llvm/lib/ObjectYAML/ELFEmitter.cpp llvm/test/tools/yaml2obj/ELF/DWARF/debug-rnglists.yaml -------------- next part -------------- A non-text attachment was scrubbed... Name: D83624.277283.patch Type: text/x-patch Size: 39652 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 04:52:54 2020 From: llvm-commits at lists.llvm.org (Xing GUO via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 11:52:54 +0000 (UTC) Subject: [PATCH] D83624: [DWARFYAML] Implement the .debug_rnglists section. In-Reply-To: References: Message-ID: <9a1347bdc670358f410ff001467a679d@localhost.localdomain> Higuoxing updated this revision to Diff 277284. Higuoxing added a comment. Use reference in EmitOffsets. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83624/new/ https://reviews.llvm.org/D83624 Files: llvm/include/llvm/ObjectYAML/DWARFEmitter.h llvm/include/llvm/ObjectYAML/DWARFYAML.h llvm/lib/ObjectYAML/DWARFEmitter.cpp llvm/lib/ObjectYAML/DWARFYAML.cpp llvm/lib/ObjectYAML/ELFEmitter.cpp llvm/test/tools/yaml2obj/ELF/DWARF/debug-rnglists.yaml -------------- next part -------------- A non-text attachment was scrubbed... Name: D83624.277284.patch Type: text/x-patch Size: 39653 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 04:53:24 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 11:53:24 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: <5f91f6012877a2c6b66eef097e738933@localhost.localdomain> okura added a comment. In D82861#2146246 , @sstefan1 wrote: > In D82861#2146242 , @okura wrote: > > > In D82861#2137685 , @jdoerfert wrote: > > > > > Can you merge this? > > > > > > Do I have a right to merge this by myself? I did `arc patch` and tried to `git push https://github.com/llvm/llvm-project.git HEAD:master` according to the document , but I failed to do that. > > > Did you get the commit access? If so, what problems did you have with `git push`? I got the following message with `git push` remote: Permission to llvm/llvm-project.git denied to okuraofvegetable. fatal: unable to access 'https://github.com/llvm/llvm-project.git/': The requested URL returned error: 403 From this message, it seems to me that I don't have commit access. 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 From llvm-commits at lists.llvm.org Sun Jul 12 04:58:50 2020 From: llvm-commits at lists.llvm.org (Kuter Dinel via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 11:58:50 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: <0019081be4421997a583290622ac8979@localhost.localdomain> kuter added a comment. In D82861#2146253 , @okura wrote: > In D82861#2146246 , @sstefan1 wrote: > > > In D82861#2146242 , @okura wrote: > > > > > In D82861#2137685 , @jdoerfert wrote: > > > > > > > Can you merge this? > > > > > > > > > Do I have a right to merge this by myself? I did `arc patch` and tried to `git push https://github.com/llvm/llvm-project.git HEAD:master` according to the document , but I failed to do that. > > > > > > Did you get the commit access? If so, what problems did you have with `git push`? > > > I got the following message with `git push` > > remote: Permission to llvm/llvm-project.git denied to okuraofvegetable. > fatal: unable to access 'https://github.com/llvm/llvm-project.git/': The requested URL returned error: 403 > > > From this message, it seems to me that I don't have commit access. You don't seem to have commit access: you are not a member of the llvm org on GitHub. If you asked for commit access, you should have received an invite. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 From llvm-commits at lists.llvm.org Sun Jul 12 05:00:23 2020 From: llvm-commits at lists.llvm.org (Luofan Chen via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 12:00:23 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: <867dd5bbe008c85f01eada2778565d37@localhost.localdomain> bbn added a comment. 
Take a look at this page: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access , you need to send an email to Chris to ask for commit access. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 From llvm-commits at lists.llvm.org Sun Jul 12 05:22:32 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 12:22:32 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: okura added a comment. Thank you, everyone. I sent a request e-mail. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 From llvm-commits at lists.llvm.org Sun Jul 12 05:53:14 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Sun, 12 Jul 2020 05:53:14 -0700 (PDT) Subject: [llvm] 39009a8 - [DAGCombiner] tighten fast-math constraints for fma fold Message-ID: <5f0b07ba.1c69fb81.aeec7.bffb@mx.google.com> Author: Sanjay Patel Date: 2020-07-12T08:51:49-04:00 New Revision: 39009a8245dae78250081b16fc679ce338af405a URL: https://github.com/llvm/llvm-project/commit/39009a8245dae78250081b16fc679ce338af405a DIFF: https://github.com/llvm/llvm-project/commit/39009a8245dae78250081b16fc679ce338af405a.diff LOG: [DAGCombiner] tighten fast-math constraints for fma fold fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E) This is only allowed when "reassoc" is present on the fadd. As discussed in D80801, this transform goes beyond what is allowed by "contract" FMF (-ffp-contract=fast). That is because we are fusing the trailing add of 'E' with a multiply, but without "reassoc", the code mandates that the products A*B and C*D are added together before adding in 'E'. I've added this example to the LangRef to try to clarify the meaning of "contract". 
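To make the reassociation point concrete, here is a small numeric sketch in C++. It is not part of the patch: the function names `contract_only`, `with_reassoc` and `check_example`, and the input values, are illustrative assumptions. It uses `std::fma` to mimic the two fused forms of (a*b) + (c*d) + e and shows that they can round differently.

```cpp
#include <cassert>
#include <cmath>

// Fusion permitted by "contract" alone: the products a*b and c*d are
// still summed first, and e is added last, as the source order requires.
double contract_only(double a, double b, double c, double d, double e) {
  return std::fma(a, b, c * d) + e;
}

// Fusion that additionally needs "reassoc": e is folded into the inner
// fma, so c*d + e is evaluated before a*b is added.
double with_reassoc(double a, double b, double c, double d, double e) {
  return std::fma(a, b, std::fma(c, d, e));
}

// Worked example (hypothetical inputs): a*b = 2^53, c*d = 1, e = -2^53.
// 2^53 + 1 is not representable in IEEE-754 double, so the first form
// rounds it to 2^53 before e cancels it; the second form computes
// 1 - 2^53 exactly first and therefore keeps the 1.
inline void check_example() {
  const double big = 9007199254740992.0; // 2^53
  assert(contract_only(1.0, big, 1.0, 1.0, -big) == 0.0);
  assert(with_reassoc(1.0, big, 1.0, 1.0, -big) == 1.0);
}
```

Under these assumptions (IEEE-754 double, round-to-nearest-even) the two forms return 0 and 1 for the same inputs, which is why the second form cannot be justified by "contract" alone. This is the same distinction the LangRef wording draws.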
If that seems reasonable, we should probably do something similar for the clang docs because there does not appear to be any formal spec for the behavior of -ffp-contract=fast. Differential Revision: https://reviews.llvm.org/D82499 Added: Modified: llvm/docs/LangRef.rst llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/fadd-combines.ll llvm/test/CodeGen/X86/fma_patterns.ll Removed: ################################################################################ diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index c2d6200e67fa..86d315be74bc 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -2778,7 +2778,9 @@ floating-point transformations. ``contract`` Allow floating-point contraction (e.g. fusing a multiply followed by an - addition into a fused multiply-and-add). + addition into a fused multiply-and-add). This does not enable reassociating + to form arbitrary contractions. For example, ``(a*b) + (c*d) + e`` can not + be transformed into ``(a*b) + ((c*d) + e)`` to create two fma operations. ``afn`` Approximate functions - Allow substitution of approximate calculations for diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 0d84cd89f5ae..42e6e12f3f02 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -11986,6 +11986,8 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) { SDNodeFlags Flags = N->getFlags(); bool CanFuse = Options.UnsafeFPMath || isContractable(N); + bool CanReassociate = + Options.UnsafeFPMath || N->getFlags().hasAllowReassociation(); bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast || CanFuse || HasFMAD); // If the addition is not contractable, do not combine. 
@@ -12028,13 +12030,14 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) { // fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E) // fadd E, (fma A, B, (fmul C, D)) --> fma A, B, (fma C, D, E) + // This requires reassociation because it changes the order of operations. SDValue FMA, E; - if (CanFuse && N0.getOpcode() == PreferredFusedOpcode && + if (CanReassociate && N0.getOpcode() == PreferredFusedOpcode && N0.getOperand(2).getOpcode() == ISD::FMUL && N0.hasOneUse() && N0.getOperand(2).hasOneUse()) { FMA = N0; E = N1; - } else if (CanFuse && N1.getOpcode() == PreferredFusedOpcode && + } else if (CanReassociate && N1.getOpcode() == PreferredFusedOpcode && N1.getOperand(2).getOpcode() == ISD::FMUL && N1.hasOneUse() && N1.getOperand(2).hasOneUse()) { FMA = N1; diff --git a/llvm/test/CodeGen/AArch64/fadd-combines.ll b/llvm/test/CodeGen/AArch64/fadd-combines.ll index 0e4f2c02c311..2ff485830780 100644 --- a/llvm/test/CodeGen/AArch64/fadd-combines.ll +++ b/llvm/test/CodeGen/AArch64/fadd-combines.ll @@ -207,6 +207,10 @@ define double @fadd_fma_fmul_1(double %a, double %b, double %c, double %d, doubl ret double %a2 } +; Minimum FMF - the 1st fadd is contracted because that combines +; fmul+fadd as specified by the order of operations; the 2nd fadd +; requires reassociation to fuse with c*d. + define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n0) nounwind { ; CHECK-LABEL: fadd_fma_fmul_fmf: ; CHECK: // %bb.0: @@ -220,13 +224,14 @@ define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n ret float %a2 } -; Minimum FMF, commute final add operands, change type. +; Not minimum FMF. 
define float @fadd_fma_fmul_2(float %a, float %b, float %c, float %d, float %n0) nounwind { ; CHECK-LABEL: fadd_fma_fmul_2: ; CHECK: // %bb.0: -; CHECK-NEXT: fmadd s2, s2, s3, s4 +; CHECK-NEXT: fmul s2, s2, s3 ; CHECK-NEXT: fmadd s0, s0, s1, s2 +; CHECK-NEXT: fadd s0, s4, s0 ; CHECK-NEXT: ret %m1 = fmul float %a, %b %m2 = fmul float %c, %d diff --git a/llvm/test/CodeGen/X86/fma_patterns.ll b/llvm/test/CodeGen/X86/fma_patterns.ll index 3049365b6f32..43b1f4a79aff 100644 --- a/llvm/test/CodeGen/X86/fma_patterns.ll +++ b/llvm/test/CodeGen/X86/fma_patterns.ll @@ -1821,6 +1821,10 @@ define double @fadd_fma_fmul_1(double %a, double %b, double %c, double %d, doubl ret double %a2 } +; Minimum FMF - the 1st fadd is contracted because that combines +; fmul+fadd as specified by the order of operations; the 2nd fadd +; requires reassociation to fuse with c*d. + define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n0) nounwind { ; FMA-LABEL: fadd_fma_fmul_fmf: ; FMA: # %bb.0: @@ -1846,25 +1850,28 @@ define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n ret float %a2 } -; Minimum FMF, commute final add operands, change type. +; Not minimum FMF. 
define float @fadd_fma_fmul_2(float %a, float %b, float %c, float %d, float %n0) nounwind { ; FMA-LABEL: fadd_fma_fmul_2: ; FMA: # %bb.0: -; FMA-NEXT: vfmadd213ss {{.*#+}} xmm2 = (xmm3 * xmm2) + xmm4 -; FMA-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2 +; FMA-NEXT: vmulss %xmm3, %xmm2, %xmm2 +; FMA-NEXT: vfmadd231ss {{.*#+}} xmm2 = (xmm1 * xmm0) + xmm2 +; FMA-NEXT: vaddss %xmm2, %xmm4, %xmm0 ; FMA-NEXT: retq ; ; FMA4-LABEL: fadd_fma_fmul_2: ; FMA4: # %bb.0: -; FMA4-NEXT: vfmaddss {{.*#+}} xmm2 = (xmm2 * xmm3) + xmm4 +; FMA4-NEXT: vmulss %xmm3, %xmm2, %xmm2 ; FMA4-NEXT: vfmaddss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm2 +; FMA4-NEXT: vaddss %xmm0, %xmm4, %xmm0 ; FMA4-NEXT: retq ; ; AVX512-LABEL: fadd_fma_fmul_2: ; AVX512: # %bb.0: -; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm2 = (xmm3 * xmm2) + xmm4 -; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2 +; AVX512-NEXT: vmulss %xmm3, %xmm2, %xmm2 +; AVX512-NEXT: vfmadd231ss {{.*#+}} xmm2 = (xmm1 * xmm0) + xmm2 +; AVX512-NEXT: vaddss %xmm2, %xmm4, %xmm0 ; AVX512-NEXT: retq %m1 = fmul float %a, %b %m2 = fmul float %c, %d From llvm-commits at lists.llvm.org Sun Jul 12 05:53:23 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 12:53:23 +0000 (UTC) Subject: [PATCH] D82499: [DAGCombiner] tighten constraints for fma fold In-Reply-To: References: Message-ID: <4ca53e44381ebdbf53ce7c5aafcdcfe4@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG39009a8245da: [DAGCombiner] tighten fast-math constraints for fma fold (authored by spatel). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82499/new/ https://reviews.llvm.org/D82499 Files: llvm/docs/LangRef.rst llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AArch64/fadd-combines.ll llvm/test/CodeGen/X86/fma_patterns.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D82499.277285.patch Type: text/x-patch Size: 5032 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 06:01:46 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 13:01:46 +0000 (UTC) Subject: [PATCH] D83636: omp: Make OMP tablegen more like all other tablegens. In-Reply-To: References: Message-ID: <8d31af13b1a749e59c8be934e06d8c9a@localhost.localdomain> clementval added a comment. While I agree that this reduces the code size, it also splits the code generation in two. We are also generating new code for Flang, and it will probably be used by clang when we move the rest of OMPKinds.def to TableGen (see https://reviews.llvm.org/D83326). I don't have a strong opinion on this, but it looks like going back to the OMPKinds.def that we are trying to remove. Maybe @jdoerfert can review/comment on this better. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83636/new/ https://reviews.llvm.org/D83636 From llvm-commits at lists.llvm.org Sun Jul 12 07:35:14 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 14:35:14 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: <37e1ddddf090a723dafd86d5833849d8@localhost.localdomain> Ayal accepted this revision. Ayal added a comment. This revision is now accepted and ready to land. This looks good to me, thanks! with last couple of nits. ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6595 + // finding the chain of operations that leads from the loop exit value back + // to the phi. + SmallVector ReductionOperations = ---------------- nit: "... that leads from the loop exit value back.." - chain is now found top-down. 
================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7358 + } + RecipeBuilder.recordRecipeOf(Phi); + } ---------------- nit: can record the recipe of Phi first, just to follow chain order. ================ Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7583 + VPRecipeBase *CompareRecipe = + RecipeBuilder.getRecipe(cast(R->getOperand(0))); + CompareRecipe->removeFromParent(); ---------------- nit: can assert CompareRecipe->getVPRecipeID() CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Sun Jul 12 07:53:47 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 14:53:47 +0000 (UTC) Subject: [PATCH] D83642: [SelectionDAG] Prevent warnings when extracting fixed length vector from scalable. Message-ID: paulwalker-arm created this revision. Herald added subscribers: llvm-commits, jfb, hiraditya. Herald added a project: LLVM. ComputeNumSignBits and computeKnownBits both trigger "Scalable flag may be dropped" warnings when a fixed length vector is extracted from a scalable vector. This patch assumes nothing about the demanded elements thus matching the behaviour when extracting a scalable vector from a scalable vector. 
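As a rough illustration of the bail-out described above (not the actual SelectionDAG code), the following is a minimal standalone sketch. The names VecType and demandedSrcElts are hypothetical, and a plain 64-bit mask stands in for APInt. It only shows the idea: when the source vector is scalable, its element count is unknown at compile time, so nothing is assumed about the demanded elements.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a vector type: a minimum element count plus a
// flag saying whether the real count is a runtime multiple of that minimum.
struct VecType {
  unsigned MinNumElts;
  bool Scalable;
};

// Map the demanded elements of an extracted subvector onto its source vector.
// Returns std::nullopt when the source is scalable, i.e. "assume nothing".
std::optional<uint64_t> demandedSrcElts(VecType Src, uint64_t DemandedElts,
                                        unsigned Idx) {
  if (Src.Scalable)
    return std::nullopt; // Element count unknown at compile time: bail out.
  assert(Src.MinNumElts <= 64 && Idx < 64 && "toy mask is only 64 bits wide");
  // Mask with one bit per source element.
  uint64_t SrcMask =
      Src.MinNumElts == 64 ? ~0ULL : ((1ULL << Src.MinNumElts) - 1);
  // Offset the demanded elements by the subvector index.
  return (DemandedElts << Idx) & SrcMask;
}
```

With a fixed 8-element source, demanding the low two elements of a subvector extracted at index 4 maps to source elements 4 and 5; with a scalable source the function simply reports that no mask can be formed.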
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83642 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll +++ llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll @@ -13,7 +13,10 @@ ; RUN: llc -aarch64-sve-vector-bits-min=1664 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 ; RUN: llc -aarch64-sve-vector-bits-min=1792 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 ; RUN: llc -aarch64-sve-vector-bits-min=1920 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 -; RUN: llc -aarch64-sve-vector-bits-min=2048 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048 +; RUN: llc -aarch64-sve-vector-bits-min=2048 -aarch64-enable-atomic-cfg-tidy=false < %s 2>%t | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048 +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; Test we can code generater patterns of the form: ; fixed_length_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0 @@ -85,4 +88,19 @@ ret void } +; +define <4 x i1> @no_warn_dropped_salable(<4 x i64>* %in) #0 { +; CHECK-LABEL: no_warn_dropped_salable: +; VBITS_GE_256: ptrue [[PG:p[0-9]+]].s, vl4 +; VBITS_GE_256: ld1w { z{{[0-9]+}}.s }, [[PG]]/z, [x0] +; VBITS_GE_256-COUNT-4: cmp x{{[0-9]+}}, #0 +; CHECK: ret + %a = load <4 x i64>, <4 x i64>* %in + br label %bb1 + +bb1: + %cond = icmp sgt <4 x i64> %a, zeroinitializer + ret <4 x i1> %cond +} + attributes #0 = { "target-features"="+sve" } Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
=================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -2718,6 +2718,9 @@ case ISD::EXTRACT_SUBVECTOR: { // Offset the demanded elts by the subvector index. SDValue Src = Op.getOperand(0); + // Bail until we can represent demanded elements for scalable vectors. + if (Src.getValueType().isScalableVector()) + break; uint64_t Idx = Op.getConstantOperandVal(1); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); APInt DemandedSrcElts = DemandedElts.zextOrSelf(NumSrcElts).shl(Idx); @@ -3973,6 +3976,9 @@ case ISD::EXTRACT_SUBVECTOR: { // Offset the demanded elts by the subvector index. SDValue Src = Op.getOperand(0); + // Bail until we can represent demanded elements for scalable vectors. + if (Src.getValueType().isScalableVector()) + break; uint64_t Idx = Op.getConstantOperandVal(1); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); APInt DemandedSrcElts = DemandedElts.zextOrSelf(NumSrcElts).shl(Idx); -------------- next part -------------- A non-text attachment was scrubbed... Name: D83642.277287.patch Type: text/x-patch Size: 2951 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 07:58:32 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 14:58:32 +0000 (UTC) Subject: [PATCH] D83288: [LV] Pick vector loop body as insert point for SCEV expansion. In-Reply-To: References: Message-ID: <4bc3f15230b7213f196347d964c987bf@localhost.localdomain> Ayal added a comment. Good catch! The specific culprit for the pr is the call to SE.DT.dominates(EntInst, InsertPt) by SCEVExpander::FindValueInExprValueMap(). Would probably be better to keep DT up to date as we go along, due to SE.DT's dependence on it, instead of fixing it after code-gen via updateDominatorTree(); but there was some reason for doing it this way(?). 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83288/new/ https://reviews.llvm.org/D83288 From llvm-commits at lists.llvm.org Sun Jul 12 08:07:08 2020 From: llvm-commits at lists.llvm.org (Paul Walker via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 15:07:08 +0000 (UTC) Subject: [PATCH] D83642: [SelectionDAG] Prevent warnings when extracting fixed length vector from scalable. In-Reply-To: References: Message-ID: paulwalker-arm updated this revision to Diff 277288. paulwalker-arm added a comment. Made new test consistent with the others in the file. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83642/new/ https://reviews.llvm.org/D83642 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll Index: llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll =================================================================== --- llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll +++ llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll @@ -13,7 +13,10 @@ ; RUN: llc -aarch64-sve-vector-bits-min=1664 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 ; RUN: llc -aarch64-sve-vector-bits-min=1792 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 ; RUN: llc -aarch64-sve-vector-bits-min=1920 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024 -; RUN: llc -aarch64-sve-vector-bits-min=2048 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048 +; RUN: llc -aarch64-sve-vector-bits-min=2048 -aarch64-enable-atomic-cfg-tidy=false < %s 2>%t | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048 +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; WARN-NOT: warning ; Test 
we can code generater patterns of the form: ; fixed_length_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0 @@ -85,4 +88,19 @@ ret void } +; +define <8 x i1> @no_warn_dropped_scalable(<8 x i32>* %in) #0 { +; CHECK-LABEL: no_warn_dropped_scalable: +; CHECK: ptrue [[PG:p[0-9]+]].s, vl8 +; CHECK: ld1w { z{{[0-9]+}}.s }, [[PG]]/z, [x0] +; CHECK-COUNT-8: cmp w{{[0-9]+}}, #0 +; CHECK: ret + %a = load <8 x i32>, <8 x i32>* %in + br label %bb1 + +bb1: + %cond = icmp sgt <8 x i32> %a, zeroinitializer + ret <8 x i1> %cond +} + attributes #0 = { "target-features"="+sve" } Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -2718,6 +2718,9 @@ case ISD::EXTRACT_SUBVECTOR: { // Offset the demanded elts by the subvector index. SDValue Src = Op.getOperand(0); + // Bail until we can represent demanded elements for scalable vectors. + if (Src.getValueType().isScalableVector()) + break; uint64_t Idx = Op.getConstantOperandVal(1); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); APInt DemandedSrcElts = DemandedElts.zextOrSelf(NumSrcElts).shl(Idx); @@ -3973,6 +3976,9 @@ case ISD::EXTRACT_SUBVECTOR: { // Offset the demanded elts by the subvector index. SDValue Src = Op.getOperand(0); + // Bail until we can represent demanded elements for scalable vectors. + if (Src.getValueType().isScalableVector()) + break; uint64_t Idx = Op.getConstantOperandVal(1); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); APInt DemandedSrcElts = DemandedElts.zextOrSelf(NumSrcElts).shl(Idx); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83642.277288.patch Type: text/x-patch Size: 2932 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 08:11:25 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 15:11:25 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: MaskRay added a comment. Thanks for the update. All look good to me now. In D78741#2146238 , @aykevl wrote: > - added `--print-imm-hex` (so this patch depends on D83634 ) > - deleted some redundant parentheses > - changed all comment markers to `;` > > I've confirmed that `#` is not a valid comment char in the original AVR assembler. See: http://ww1.microchip.com/downloads/en/devicedoc/40001917a.pdf#page=12 (section 4.3). I don't know why `#` at the start of a line works as a comment, perhaps these lines are removed by llvm-lit? MCParser supports `#` in the line beginning. 6.12 of the document you linked says "Unsurprisingly, this directive does exactly nothing. The only reason it exists is that it is required by the ANSI C standard" but I think that is not true. GCC/clang will report `error: invalid preprocessing directive` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 From llvm-commits at lists.llvm.org Sun Jul 12 08:14:47 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 15:14:47 +0000 (UTC) Subject: [PATCH] D83634: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex In-Reply-To: References: Message-ID: <6e1b606accd22ac79f643d4510336c32@localhost.localdomain> MaskRay added a comment. In D83634#2146236 , @aykevl wrote: > Looks good to me. I've confirmed this works with D78741 and I have updated the patch to make use of it. Thanks. I'll push this to unblock D78741 . 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83634/new/ https://reviews.llvm.org/D83634 From llvm-commits at lists.llvm.org Sun Jul 12 08:15:09 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits) Date: Sun, 12 Jul 2020 08:15:09 -0700 (PDT) Subject: [llvm] be9f363 - [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex Message-ID: <5f0b28fd.1c69fb81.2a2cd.a64e@mx.google.com> Author: Fangrui Song Date: 2020-07-12T08:14:52-07:00 New Revision: be9f363704a802b10b30d853f1bb6571e5ebed94 URL: https://github.com/llvm/llvm-project/commit/be9f363704a802b10b30d853f1bb6571e5ebed94 DIFF: https://github.com/llvm/llvm-project/commit/be9f363704a802b10b30d853f1bb6571e5ebed94.diff LOG: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex Differential Revision: https://reviews.llvm.org/D83634 Added: llvm/test/MC/AVR/hex-immediates.s Modified: llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp b/llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp index 815a309a8cae..42fac5e2e000 100644 --- a/llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp +++ b/llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp @@ -131,7 +131,7 @@ void AVRInstPrinter::printOperand(const MCInst *MI, unsigned OpNo, O << getPrettyRegisterName(Op.getReg(), MRI); } } else if (Op.isImm()) { - O << Op.getImm(); + O << formatImm(Op.getImm()); } else { assert(Op.isExpr() && "Unknown operand kind in printOperand"); O << *Op.getExpr(); diff --git a/llvm/test/MC/AVR/hex-immediates.s b/llvm/test/MC/AVR/hex-immediates.s new file mode 100644 index 000000000000..ca4c8b9f3355 --- /dev/null +++ b/llvm/test/MC/AVR/hex-immediates.s @@ -0,0 +1,7 @@ +; RUN: llvm-mc -filetype=obj -triple=avr %s -o %t +; RUN: llvm-objdump -d %t | FileCheck %s --check-prefix=DEC +; RUN: llvm-objdump -d 
--print-imm-hex %t | FileCheck %s --check-prefix=HEX + +; DEC: ldi r24, 66 +; HEX: ldi r24, 0x42 + ldi r24, 0x42 From llvm-commits at lists.llvm.org Sun Jul 12 08:15:21 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 15:15:21 +0000 (UTC) Subject: [PATCH] D83634: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex In-Reply-To: References: Message-ID: This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rGbe9f363704a8: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex (authored by MaskRay). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83634/new/ https://reviews.llvm.org/D83634 Files: llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp llvm/test/MC/AVR/hex-immediates.s Index: llvm/test/MC/AVR/hex-immediates.s =================================================================== --- /dev/null +++ llvm/test/MC/AVR/hex-immediates.s @@ -0,0 +1,7 @@ +; RUN: llvm-mc -filetype=obj -triple=avr %s -o %t +; RUN: llvm-objdump -d %t | FileCheck %s --check-prefix=DEC +; RUN: llvm-objdump -d --print-imm-hex %t | FileCheck %s --check-prefix=HEX + +; DEC: ldi r24, 66 +; HEX: ldi r24, 0x42 + ldi r24, 0x42 Index: llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp =================================================================== --- llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp +++ llvm/lib/Target/AVR/MCTargetDesc/AVRInstPrinter.cpp @@ -131,7 +131,7 @@ O << getPrettyRegisterName(Op.getReg(), MRI); } } else if (Op.isImm()) { - O << Op.getImm(); + O << formatImm(Op.getImm()); } else { assert(Op.isExpr() && "Unknown operand kind in printOperand"); O << *Op.getExpr(); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83634.277289.patch Type: text/x-patch Size: 936 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 08:38:43 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 15:38:43 +0000 (UTC) Subject: [PATCH] D83640: [PredicateInfo] Add a common method to interpret predicate as cmp constraint In-Reply-To: References: Message-ID: <30b348784be6d0661682f493e9578ed1@localhost.localdomain> nikic updated this revision to Diff 277290. nikic added a comment. Relax assertion. RenamedOp may not be accurate due to https://reviews.llvm.org/D78133#2145094, so don't assert this for now. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83640/new/ https://reviews.llvm.org/D83640 Files: llvm/include/llvm/Transforms/Utils/PredicateInfo.h llvm/lib/Transforms/Scalar/NewGVN.cpp llvm/lib/Transforms/Scalar/SCCP.cpp llvm/lib/Transforms/Utils/PredicateInfo.cpp llvm/test/Transforms/SCCP/predicateinfo-cond.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83640.277290.patch Type: text/x-patch Size: 13202 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 08:48:44 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via llvm-commits) Date: Sun, 12 Jul 2020 08:48:44 -0700 (PDT) Subject: [llvm] d589372 - [SCCP] Extend nonnull metadata test (NFC) Message-ID: <5f0b30dc.1c69fb81.1d11.d120@mx.google.com> Author: Nikita Popov Date: 2020-07-12T17:48:32+02:00 New Revision: d589372704fc7da0c143cbfe27f930a9d7dd333b URL: https://github.com/llvm/llvm-project/commit/d589372704fc7da0c143cbfe27f930a9d7dd333b DIFF: https://github.com/llvm/llvm-project/commit/d589372704fc7da0c143cbfe27f930a9d7dd333b.diff LOG: [SCCP] Extend nonnull metadata test (NFC) Added: Modified: llvm/test/Transforms/SCCP/metadata.ll Removed: ################################################################################ diff --git a/llvm/test/Transforms/SCCP/metadata.ll b/llvm/test/Transforms/SCCP/metadata.ll index 43e4c59571e9..844e2103ae31 100644 --- a/llvm/test/Transforms/SCCP/metadata.ll +++ b/llvm/test/Transforms/SCCP/metadata.ll @@ -44,16 +44,39 @@ define i32 @load_range_single_volatile(i32* %p) { ret i32 %v } -define void @load_nonnull(i32** %p) { +define void @load_nonnull(i32** %p, i32** %p2) { ; CHECK-LABEL: @load_nonnull( ; CHECK-NEXT: [[V:%.*]] = load i32*, i32** [[P:%.*]], align 8, !nonnull !2 +; CHECK-NEXT: [[V2:%.*]] = load i32*, i32** [[P2:%.*]], align 8, !nonnull !2 ; CHECK-NEXT: [[C1:%.*]] = icmp ne i32* [[V]], null ; CHECK-NEXT: call void @use(i1 [[C1]]) +; CHECK-NEXT: [[C2:%.*]] = icmp eq i32* [[V]], null +; CHECK-NEXT: call void @use(i1 [[C2]]) +; CHECK-NEXT: [[C3:%.*]] = icmp ne i32* null, [[V]] +; CHECK-NEXT: call void @use(i1 [[C3]]) +; CHECK-NEXT: [[C4:%.*]] = icmp eq i32* null, [[V]] +; CHECK-NEXT: call void @use(i1 [[C4]]) +; CHECK-NEXT: [[C5:%.*]] = icmp eq i32* [[V]], [[V2]] +; CHECK-NEXT: call void @use(i1 [[C5]]) +; CHECK-NEXT: [[C6:%.*]] = icmp ne i32* [[V]], [[V2]] +; CHECK-NEXT: call void 
@use(i1 [[C6]]) ; CHECK-NEXT: ret void ; %v = load i32*, i32** %p, !nonnull !{} + %v2 = load i32*, i32** %p2, !nonnull !{} %c1 = icmp ne i32* %v, null call void @use(i1 %c1) + %c2 = icmp eq i32* %v, null + call void @use(i1 %c2) + %c3 = icmp ne i32* null, %v + call void @use(i1 %c3) + %c4 = icmp eq i32* null, %v + call void @use(i1 %c4) + ; There is no particular relationship between two nonnull values. + %c5 = icmp eq i32* %v, %v2 + call void @use(i1 %c5) + %c6 = icmp ne i32* %v, %v2 + call void @use(i1 %c6) ret void } From llvm-commits at lists.llvm.org Sun Jul 12 08:53:35 2020 From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 15:53:35 +0000 (UTC) Subject: [PATCH] D83643: [SCCP] Propagate inequalities Message-ID: nikic created this revision. nikic added reviewers: fhahn, efriedma. Herald added subscribers: llvm-commits, hiraditya. Herald added a project: LLVM. Teach SCCP to create notconstant lattice values from inequality comparisons and nonnull metadata, and update getConstant() to make use of them. Additionally isOverdefined() needs to be changed to consider notconstant an overdefined value. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83643 Files: llvm/include/llvm/Analysis/ValueLattice.h llvm/lib/Transforms/Scalar/SCCP.cpp llvm/test/Transforms/SCCP/conditions-ranges.ll llvm/test/Transforms/SCCP/metadata.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83643.277291.patch Type: text/x-patch Size: 5010 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 09:20:48 2020 From: llvm-commits at lists.llvm.org (Ayke van Laethem via llvm-commits) Date: Sun, 12 Jul 2020 09:20:48 -0700 (PDT) Subject: [lld] 69e60c9 - [LLD][ELF][AVR] Implement the missing relocation types Message-ID: <5f0b3860.1c69fb81.b1088.38fe@mx.google.com> Author: Ayke van Laethem Date: 2020-07-12T18:18:54+02:00 New Revision: 69e60c9dc76653c10c4e8f7af1743307532102eb URL: https://github.com/llvm/llvm-project/commit/69e60c9dc76653c10c4e8f7af1743307532102eb DIFF: https://github.com/llvm/llvm-project/commit/69e60c9dc76653c10c4e8f7af1743307532102eb.diff LOG: [LLD][ELF][AVR] Implement the missing relocation types Implements the missing relocation types for AVR target. The results have been cross-checked with binutils. Original patch by LemonBoy. Some changes by me. Differential Revision: https://reviews.llvm.org/D78741 Added: lld/test/ELF/avr-reloc.s Modified: lld/ELF/Arch/AVR.cpp Removed: ################################################################################ diff --git a/lld/ELF/Arch/AVR.cpp b/lld/ELF/Arch/AVR.cpp index 9b733837dd5d..4513a970b32d 100644 --- a/lld/ELF/Arch/AVR.cpp +++ b/lld/ELF/Arch/AVR.cpp @@ -54,11 +54,131 @@ AVR::AVR() { noneRel = R_AVR_NONE; } RelExpr AVR::getRelExpr(RelType type, const Symbol &s, const uint8_t *loc) const { - return R_ABS; + switch (type) { + case R_AVR_7_PCREL: + case R_AVR_13_PCREL: + return R_PC; + default: + return R_ABS; + } +} + +static void writeLDI(uint8_t *loc, uint64_t val) { + write16le(loc, (read16le(loc) & 0xf0f0) | (val & 0xf0) << 4 | (val & 0x0f)); } void AVR::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const { switch (rel.type) { + case R_AVR_8: + checkUInt(loc, val, 8, rel); + *loc = val; + break; + case R_AVR_16: + // Note: this relocation is often used between code and data space, which + // are 0x800000 apart in the output ELF file. 
The bitmask cuts off the high + // bit. + write16le(loc, val & 0xffff); + break; + case R_AVR_16_PM: + checkAlignment(loc, val, 2, rel); + checkUInt(loc, val >> 1, 16, rel); + write16le(loc, val >> 1); + break; + case R_AVR_32: + checkUInt(loc, val, 32, rel); + write32le(loc, val); + break; + + case R_AVR_LDI: + checkUInt(loc, val, 8, rel); + writeLDI(loc, val & 0xff); + break; + + case R_AVR_LO8_LDI_NEG: + writeLDI(loc, -val & 0xff); + break; + case R_AVR_LO8_LDI: + writeLDI(loc, val & 0xff); + break; + case R_AVR_HI8_LDI_NEG: + writeLDI(loc, (-val >> 8) & 0xff); + break; + case R_AVR_HI8_LDI: + writeLDI(loc, (val >> 8) & 0xff); + break; + case R_AVR_HH8_LDI_NEG: + writeLDI(loc, (-val >> 16) & 0xff); + break; + case R_AVR_HH8_LDI: + writeLDI(loc, (val >> 16) & 0xff); + break; + case R_AVR_MS8_LDI_NEG: + writeLDI(loc, (-val >> 24) & 0xff); + break; + case R_AVR_MS8_LDI: + writeLDI(loc, (val >> 24) & 0xff); + break; + + case R_AVR_LO8_LDI_PM: + checkAlignment(loc, val, 2, rel); + writeLDI(loc, (val >> 1) & 0xff); + break; + case R_AVR_HI8_LDI_PM: + checkAlignment(loc, val, 2, rel); + writeLDI(loc, (val >> 9) & 0xff); + break; + case R_AVR_HH8_LDI_PM: + checkAlignment(loc, val, 2, rel); + writeLDI(loc, (val >> 17) & 0xff); + break; + + case R_AVR_LO8_LDI_PM_NEG: + checkAlignment(loc, val, 2, rel); + writeLDI(loc, (-val >> 1) & 0xff); + break; + case R_AVR_HI8_LDI_PM_NEG: + checkAlignment(loc, val, 2, rel); + writeLDI(loc, (-val >> 9) & 0xff); + break; + case R_AVR_HH8_LDI_PM_NEG: + checkAlignment(loc, val, 2, rel); + writeLDI(loc, (-val >> 17) & 0xff); + break; + + case R_AVR_PORT5: + checkUInt(loc, val, 5, rel); + write16le(loc, (read16le(loc) & 0xff07) | (val << 3)); + break; + case R_AVR_PORT6: + checkUInt(loc, val, 6, rel); + write16le(loc, (read16le(loc) & 0xf9f0) | (val & 0x30) << 5 | (val & 0x0f)); + break; + + // Since every jump destination is word aligned we gain an extra bit + case R_AVR_7_PCREL: { + checkInt(loc, val, 7, rel); + checkAlignment(loc, val, 
2, rel); + const uint16_t target = (val - 2) >> 1; + write16le(loc, (read16le(loc) & 0xfc07) | ((target & 0x7f) << 3)); + break; + } + case R_AVR_13_PCREL: { + checkAlignment(loc, val, 2, rel); + const uint16_t target = (val - 2) >> 1; + write16le(loc, (read16le(loc) & 0xf000) | (target & 0xfff)); + break; + } + + case R_AVR_6: + checkInt(loc, val, 6, rel); + write16le(loc, (read16le(loc) & 0xd3f8) | (val & 0x20) << 8 | + (val & 0x18) << 7 | (val & 0x07)); + break; + case R_AVR_6_ADIW: + checkInt(loc, val, 6, rel); + write16le(loc, (read16le(loc) & 0xff30) | (val & 0x30) << 2 | (val & 0x0F)); + break; + case R_AVR_CALL: { uint16_t hi = val >> 17; uint16_t lo = val >> 1; diff --git a/lld/test/ELF/avr-reloc.s b/lld/test/ELF/avr-reloc.s new file mode 100644 index 000000000000..49f78044068b --- /dev/null +++ b/lld/test/ELF/avr-reloc.s @@ -0,0 +1,84 @@ +; REQUIRES: avr +; RUN: llvm-mc -filetype=obj -triple=avr -mcpu=atmega328p %s -o %t.o +; RUN: ld.lld %t.o --defsym=a=0x12345678 --defsym=b=30 -o %t +; RUN: llvm-objdump -d --print-imm-hex %t | FileCheck %s +; RUN: llvm-objdump -s %t | FileCheck --check-prefix=HEX %s + +.section .LDI,"ax", at progbits +; CHECK-LABEL: section .LDI: +; CHECK: ldi r20, 0x78 +; CHECK-NEXT: ldi r20, 0x56 +; CHECK-NEXT: ldi r20, 0x34 +; CHECK-NEXT: ldi r20, 0x12 +; CHECK-NEXT: ldi r20, 0x3c +; CHECK-NEXT: ldi r20, 0x2b +; CHECK-NEXT: ldi r20, 0x1a +; CHECK-NEXT: ldi r20, 0xff +ldi r20, lo8(a) ; R_AVR_LO8_LDI +ldi r20, hi8(a) ; R_AVR_HI8_LDI +ldi r20, hh8(a) ; R_AVR_HH8_LDI +ldi r20, hhi8(a) ; R_AVR_MS8_LDI + +ldi r20, pm_lo8(a) ; R_AVR_LO8_LDI_PM +ldi r20, pm_hi8(a) ; R_AVR_HI8_LDI_PM +ldi r20, pm_hh8(a) ; R_AVR_HH8_LDI_PM + +ldi r20, b+225 + +.section .LDI_NEG,"ax", at progbits +; CHECK-LABEL: section .LDI_NEG: +; CHECK: ldi r20, 0x88 +; CHECK-NEXT: ldi r20, 0xa9 +; CHECK-NEXT: ldi r20, 0xcb +; CHECK-NEXT: ldi r20, 0xed +; CHECK-NEXT: ldi r20, 0xc4 +; CHECK-NEXT: ldi r20, 0xd4 +; CHECK-NEXT: ldi r20, 0xe5 +ldi r20, lo8(-(a)) ; 
R_AVR_LO8_LDI_NEG +ldi r20, hi8(-(a)) ; R_AVR_HI8_LDI_NEG +ldi r20, hh8(-(a)) ; R_AVR_HH8_LDI_NEG +ldi r20, hhi8(-(a)) ; R_AVR_MS8_LDI_NEG + +ldi r20, pm_lo8(-(a)) ; R_AVR_LO8_LDI_PM_NEG +ldi r20, pm_hi8(-(a)) ; R_AVR_HI8_LDI_PM_NEG +ldi r20, pm_hh8(-(a)) ; R_AVR_HH8_LDI_PM_NEG + +;; The disassembler is not yet able to decode those opcodes +;; 9e 8e std Y+30, r9 +;; 9e 8c ldd r9, Y+30 +;; 4e 96 adiw r24, 0x1e +.section .SIX,"ax", at progbits +; HEX-LABEL: section .SIX: +; HEX-NEXT: 9e8e9e8c 4e96 +std Y+b, r9 ; R_AVR_6 +ldd r9, Y+b ; R_AVR_6 +adiw r24, b ; R_AVR_6_ADIW + +.section .PORT,"ax", at progbits +; CHECK-LABEL: section .PORT: +; CHECK: in r20, 0x1e +; CHECK-NEXT: sbic 0x1e, 0x1 +in r20, b ; R_AVR_PORT6 +sbic b, 1 ; R_AVR_PORT5 + +;; The disassembler is not yet able to decode those opcodes +;; 0f c0 rjmp .+30 +;; ee cf rjmp .-36 +;; 69 f0 breq .+26 +;; 61 f3 breq .-40 +.section .PCREL,"ax", at progbits +; HEX-LABEL: section .PCREL: +; HEX-NEXT: 0fc0eecf 69f061f3 +foo: +rjmp foo + 32 ; R_AVR_13_PCREL +rjmp foo - 32 ; R_AVR_13_PCREL +breq foo + 32 ; R_AVR_7_PCREL +breq foo - 32 ; R_AVR_7_PCREL + +.section .DATA,"ax", at progbits +; HEX-LABEL: section .DATA: +; HEX-NEXT: {{.*}} 1e1e000f 00785634 12 +.byte b ; R_AVR_8 +.short b ; R_AVR_16 +.short gs(b) ; R_AVR_16_PM +.long a ; R_AVR_32 From llvm-commits at lists.llvm.org Sun Jul 12 09:20:58 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 16:20:58 +0000 (UTC) Subject: [PATCH] D78741: [LLD][ELF][AVR] Implement the missing relocation types In-Reply-To: References: Message-ID: <7c2558d3735703560b5340107ba19957@localhost.localdomain> This revision was not accepted when it landed; it landed in state "Needs Review". This revision was automatically updated to reflect the committed changes. Closed by commit rG69e60c9dc766: [LLD][ELF][AVR] Implement the missing relocation types (authored by aykevl). 
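For readers following the LDI relocations in the commit above, here is a small standalone sketch of the bit manipulation that writeLDI performs, without the lld endian helpers. The names patchLDI and byteAt are hypothetical and used only for illustration. The starting opcode 0xE080 in the usage note is assumed to be `ldi r24, 0`.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the expression in writeLDI: keep the opcode
// and register bits (mask 0xf0f0), then place the high nibble of the
// immediate at bits 11..8 and the low nibble at bits 3..0.
uint16_t patchLDI(uint16_t Insn, uint8_t Val) {
  return static_cast<uint16_t>((Insn & 0xf0f0) | ((Val & 0xf0) << 4) |
                               (Val & 0x0f));
}

// Hypothetical helper: select the 8 bits of Val that start at bit Shift.
// Shifts of 0/8/16/24 correspond to lo8/hi8/hh8/hhi8; shifts of 1/9/17
// correspond to the word-addressed pm_lo8/pm_hi8/pm_hh8 forms.
uint8_t byteAt(uint32_t Val, unsigned Shift) {
  return static_cast<uint8_t>((Val >> Shift) & 0xff);
}
```

Under that assumption, patchLDI(0xE080, 0x42) gives 0xE482, and for a = 0x12345678 byteAt(a, 8) is 0x56 and byteAt(a, 1) is 0x3c, which agrees with the hi8(a) and pm_lo8(a) values checked in avr-reloc.s.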
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78741/new/ https://reviews.llvm.org/D78741 Files: lld/ELF/Arch/AVR.cpp lld/test/ELF/avr-reloc.s -------------- next part -------------- A non-text attachment was scrubbed... Name: D78741.277292.patch Type: text/x-patch Size: 6618 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 09:25:04 2020 From: llvm-commits at lists.llvm.org (Ayke via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 16:25:04 +0000 (UTC) Subject: [PATCH] D83634: [AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex In-Reply-To: References: Message-ID: <11a696de6d93bf8d688a0b4efabaf895@localhost.localdomain> aykevl added a comment. > Thanks. I'll push this to unblock D78741 . Awesome, thank you! I merged that change as well so LLVM 11 should get AVR lld support. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83634/new/ https://reviews.llvm.org/D83634 From llvm-commits at lists.llvm.org Sun Jul 12 09:35:14 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 16:35:14 +0000 (UTC) Subject: [PATCH] D82367: [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections. In-Reply-To: References: Message-ID: <09b32a2b109743dc02375750cf5b9571@localhost.localdomain> dblaikie added inline comments. 
================ Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml:8-9 +# RUN: yaml2obj --docnum=1 -DENDIAN=ELFDATA2LSB %s -o %t1.le.o +# RUN: llvm-readobj --sections --section-data %t1.le.o | \ +# RUN: FileCheck -DSIZE=32 -DADDRALIGN=1 %s --check-prefixes=SHDR,DWARF32-LE + ---------------- Higuoxing wrote: > dblaikie wrote: > > jhenderson wrote: > > > dblaikie wrote: > > > > jhenderson wrote: > > > > > dblaikie wrote: > > > > > > Higuoxing wrote: > > > > > > > jhenderson wrote: > > > > > > > > dblaikie wrote: > > > > > > > > > Higuoxing wrote: > > > > > > > > > > dblaikie wrote: > > > > > > > > > > > Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?) > > > > > > > > > > Because some tests in llvm-dwarfdump are using yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc. We don't want to create a circular dependency. Does it make sense? > > > > > > > > > Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc. > > > > > > > > > > > > > > > > > > (so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml) > > > > > > > > (just in case you missed it, this is a yaml2obj test). The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the exisiting llvm-dwarfdump tests require for example). 
Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing. > > > > > > > > > > > > > > > > The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > > > > > > > > > > > Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project. > > > > > > > Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me! > > > > > > > Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great). > > > > > > > > > > > > Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me. > > > > > > > > > > > > Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure. 
> > > > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > > > > By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too. > > > > > > > > What do you mean by "auto-generation aspects"? > > > > > > > > But, yeah, I'm not holding this patch up over this direction that's already got precedent, etc - but raising the question at least for consideration/thinking about over time. > > > At the moment, to use yaml2obj to generate DWARF, you have to specify pretty much every detail of the DWARF, including the details of the abbrev table and the string table for example. Ideally, we should be able to describe the DWARF in a higher level manner (e.g. by just specifying the attributes and values in the .debug_info description, letting yaml2obj do all the leg work of selecting a form, populating the abbrev and string tables etc). You'll see details of this in @Higuoxing's mailing list posts about his GSOC project. > > > > > > We can use the basic-level testing for "bootstrapping". yaml2obj can generate valid raw sections, tested via hex -> allows testing of llvm-dwarfdump section dumping -> allows testing of yaml2obj higher-level functionality (because we know that llvm-dwarfdump section dumping now works). > > That seems like it's going to be fairly subtle/hard to maintain the separation here - if some yaml2obj tests use hex dumping but others can use llvm-dwarfdump - if/when/that's happening, might be worth separate directories for the two kinds of tests and some fairly specific documentation about how to determine which tests go where. > What do you think of making elf2yaml support dumping DWARF sections? In the future, we can use raw assembly to test elf2yaml and use elf2yaml to test yaml2elf. 
Probably useful that elf2yaml and yaml2elf roundtrip/support the same features (would make it easier to create yaml files to work with/pare down, etc). But as for testing - not sure - seems like it adds another layer of indirection (then we'd use raw assembly+llvm-mc to test elf2yaml, to test yaml2elf, to test llvm-dwarfdump - when we could've been using raw assembly to test llvm-dwarfdump) & not sure how much it improves/streamlines the testing matrix.

All that said, we used to test llvm-dwarfdump with checked-in object files - then we accepted that assembly + llvm-mc didn't especially reduce the test quality despite increasing the surface area of the test by using llvm-mc. Though I think the more DWARF-specific the functionality gets, the less that sort of line of reasoning applies (ie: once we're generating all of DWARF, we're reaching the same complexity as the parsing logic and have now written a whole other DWARF representation with all the risk of bugs, etc).

But really - I don't have any particular action/takeaway from these thoughts right now, but I think they're worth keeping in mind/thinking about as this work continues.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82367/new/

https://reviews.llvm.org/D82367

From llvm-commits at lists.llvm.org  Sun Jul 12 09:37:00 2020
From: llvm-commits at lists.llvm.org (weiwei via Phabricator via llvm-commits)
Date: Sun, 12 Jul 2020 16:37:00 +0000 (UTC)
Subject: [PATCH] D83644: [AArch64][ELF] Support FDE references more than +/-2GB range for AArch64 large code model
Message-ID:

wwei created this revision.
wwei added reviewers: psmith, MaskRay, keith.walker.arm, jmolloy, leonardchan, mcgrathr, pcc.
wwei added a project: LLVM.
Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls, emaste.
Herald added a reviewer: espindola.

For JIT applications, the code and .eh_frame may be placed very far apart.
The large code model for AArch64 should handle FDE references more than +/-2GB range. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83644 Files: llvm/lib/MC/MCObjectFileInfo.cpp llvm/test/MC/ELF/cfi-large-model.s Index: llvm/test/MC/ELF/cfi-large-model.s =================================================================== --- llvm/test/MC/ELF/cfi-large-model.s +++ llvm/test/MC/ELF/cfi-large-model.s @@ -1,10 +1,12 @@ +// REQUIRES: aarch64-registered-target // REQUIRES: powerpc-registered-target // REQUIRES: x86-registered-target // RUN: llvm-mc -filetype=obj -triple x86_64-pc-linux-gnu -large-code-model %s \ // RUN: -o - | llvm-readobj -S --sd | FileCheck --check-prefix=CHECK-X86 %s // RUN: llvm-mc -filetype=obj -triple powerpc64le-linux-gnu -large-code-model %s \ // RUN: -o - | llvm-readobj -S --sd | FileCheck --check-prefix=CHECK-PPC %s - +// RUN: llvm-mc -filetype=obj -triple aarch64-linux-gnu -large-code-model %s \ +// RUN: -o - | llvm-readobj -S --sd | FileCheck --check-prefix=CHECK-ARM64 %s // CHECK-X86: Section { // CHECK-X86: Index: @@ -48,6 +50,27 @@ // CHECK-PPC-NEXT: ) // CHECK-PPC-NEXT: } +// CHECK-ARM64: Section { +// CHECK-ARM64: Index: +// CHECK-ARM64: Name: .eh_frame +// CHECK-ARM64-NEXT: Type: SHT_PROGBITS +// CHECK-ARM64-NEXT: Flags [ +// CHECK-ARM64-NEXT: SHF_ALLOC +// CHECK-ARM64-NEXT: ] +// CHECK-ARM64-NEXT: Address: 0x0 +// CHECK-ARM64-NEXT: Offset: 0x40 +// CHECK-ARM64-NEXT: Size: 48 +// CHECK-ARM64-NEXT: Link: 0 +// CHECK-ARM64-NEXT: Info: 0 +// CHECK-ARM64-NEXT: AddressAlignment: 8 +// CHECK-ARM64-NEXT: EntrySize: 0 +// CHECK-ARM64-NEXT: SectionData ( +// CHECK-ARM64-NEXT: 0000: 10000000 00000000 017A5200 017C1E01 |.........zR..|..| +// CHECK-ARM64-NEXT: 0010: 1C0C1F00 18000000 18000000 00000000 |................| +// CHECK-ARM64-NEXT: 0020: 00000000 00000000 00000000 00000000 |................| +// CHECK-ARM64-NEXT: ) +// CHECK-ARM64-NEXT: } + f: .cfi_startproc .cfi_endproc Index: llvm/lib/MC/MCObjectFileInfo.cpp 
=================================================================== --- llvm/lib/MC/MCObjectFileInfo.cpp +++ llvm/lib/MC/MCObjectFileInfo.cpp @@ -315,6 +315,8 @@ ? dwarf::DW_EH_PE_sdata4 : dwarf::DW_EH_PE_sdata8; break; + case Triple::aarch64: + case Triple::aarch64_be: case Triple::ppc64: case Triple::ppc64le: case Triple::x86_64: -------------- next part -------------- A non-text attachment was scrubbed... Name: D83644.277293.patch Type: text/x-patch Size: 2254 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 09:46:57 2020 From: llvm-commits at lists.llvm.org (Yonghong Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 16:46:57 +0000 (UTC) Subject: [PATCH] D83638: BPF: permit .maps section variables with typedef type In-Reply-To: References: Message-ID: <8745896ed9eda42f2780b61fb60111af@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG152a9fef1b3b: BPF: permit .maps section variables with typedef type (authored by yonghong-song). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83638/new/ https://reviews.llvm.org/D83638 Files: llvm/lib/Target/BPF/BTFDebug.cpp llvm/test/CodeGen/BPF/BTF/map-def-2.ll llvm/test/CodeGen/BPF/BTF/map-def-3.ll llvm/test/CodeGen/BPF/BTF/map-def.ll -------------- next part -------------- A non-text attachment was scrubbed... 
Name: D83638.277294.patch
Type: text/x-patch
Size: 15935 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org  Sun Jul 12 09:46:58 2020
From: llvm-commits at lists.llvm.org (Yonghong Song via llvm-commits)
Date: Sun, 12 Jul 2020 09:46:58 -0700 (PDT)
Subject: [llvm] 152a9fe - BPF: permit .maps section variables with typedef type
Message-ID: <5f0b3e82.1c69fb81.ef764.c063@mx.google.com>

Author: Yonghong Song
Date: 2020-07-12T09:42:25-07:00
New Revision: 152a9fef1b3b44f2c224cb8096b3d649279f2578

URL: https://github.com/llvm/llvm-project/commit/152a9fef1b3b44f2c224cb8096b3d649279f2578
DIFF: https://github.com/llvm/llvm-project/commit/152a9fef1b3b44f2c224cb8096b3d649279f2578.diff

LOG: BPF: permit .maps section variables with typedef type

Currently, when llvm sees a global variable in the .maps section, it requires the variable's type to be a struct type. The pointee types of the structure members are then evaluated further, whereas in normal cases a pointee type is skipped. Although this is what all current bpf programs do, it is a little restrictive. For example, it is legitimate for users to have:

  typedef struct { int key_size; int value_size; } __map_t;
  __map_t map __attribute__((section(".maps")));

This patch lifts the restriction, so a typedef of a struct type is also allowed for .maps section variables. To avoid creating unnecessary fixup entries when traversal starts with the typedef/struct type, the new implementation first traverses all map struct members and then traverses the typedef/struct type. This way, no fixup entries are generated in the internal BTFDebug implementation.

Two new unit tests are added for typedef and const struct in the .maps section. Also tested with the kernel bpf selftests.
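
The peeling step described in the log can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the BTFDebug code: ToyType, Tag, stripMapDefWrappers and isMapDefStruct are hypothetical names standing in for the real debug-info classes. It shows typedef/const/volatile/restrict wrappers being stripped until a struct (or anything else) is reached, while a pointer is deliberately left alone.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for debug-info type nodes. These are hypothetical and are
// NOT the LLVM DIType classes.
enum class Tag { Typedef, Const, Volatile, Restrict, Pointer, Struct, Int };

struct ToyType {
  Tag tag;
  std::string name;
  const ToyType *base; // wrapped type for derived nodes, nullptr otherwise
};

// Peel typedef/const/volatile/restrict wrappers and stop at anything else.
// A pointer is not peeled: a pointer to a struct is not a map definition.
const ToyType *stripMapDefWrappers(const ToyType *ty) {
  while (ty && ty->base &&
         (ty->tag == Tag::Typedef || ty->tag == Tag::Const ||
          ty->tag == Tag::Volatile || ty->tag == Tag::Restrict))
    ty = ty->base;
  return ty;
}

// True if the declared type resolves to a struct once wrappers are peeled.
bool isMapDefStruct(const ToyType *ty) {
  const ToyType *inner = stripMapDefWrappers(ty);
  return inner && inner->tag == Tag::Struct;
}
```

With this model, a chain such as __map_type -> _map_type -> struct map_type resolves to the struct, and so does a const-qualified struct, which corresponds to the two cases the new tests are described as covering.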
Differential Revision: https://reviews.llvm.org/D83638 Added: llvm/test/CodeGen/BPF/BTF/map-def-2.ll llvm/test/CodeGen/BPF/BTF/map-def-3.ll Modified: llvm/lib/Target/BPF/BTFDebug.cpp llvm/test/CodeGen/BPF/BTF/map-def.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/BPF/BTFDebug.cpp b/llvm/lib/Target/BPF/BTFDebug.cpp index 6ada75adba96..4510e9357489 100644 --- a/llvm/lib/Target/BPF/BTFDebug.cpp +++ b/llvm/lib/Target/BPF/BTFDebug.cpp @@ -664,7 +664,17 @@ void BTFDebug::visitMapDefType(const DIType *Ty, uint32_t &TypeId) { return; } - // MapDef type is a struct type + // MapDef type may be a struct type or a non-pointer derived type + const DIType *OrigTy = Ty; + while (auto *DTy = dyn_cast(Ty)) { + auto Tag = DTy->getTag(); + if (Tag != dwarf::DW_TAG_typedef && Tag != dwarf::DW_TAG_const_type && + Tag != dwarf::DW_TAG_volatile_type && + Tag != dwarf::DW_TAG_restrict_type) + break; + Ty = DTy->getBaseType(); + } + const auto *CTy = dyn_cast(Ty); if (!CTy) return; @@ -673,27 +683,15 @@ void BTFDebug::visitMapDefType(const DIType *Ty, uint32_t &TypeId) { if (Tag != dwarf::DW_TAG_structure_type || CTy->isForwardDecl()) return; - // Record this type + // Visit all struct members to ensure pointee type is visited const DINodeArray Elements = CTy->getElements(); - bool HasBitField = false; - for (const auto *Element : Elements) { - auto E = cast(Element); - if (E->isBitField()) { - HasBitField = true; - break; - } - } - - auto TypeEntry = - std::make_unique(CTy, true, HasBitField, Elements.size()); - StructTypes.push_back(TypeEntry.get()); - TypeId = addType(std::move(TypeEntry), CTy); - - // Visit all struct members for (const auto *Element : Elements) { const auto *MemberType = cast(Element); visitTypeEntry(MemberType->getBaseType()); } + + // Visit this type, struct or a const/typedef/volatile/restrict type + visitTypeEntry(OrigTy, TypeId, false, false); } /// Read file contents from the actual file 
or from the source diff --git a/llvm/test/CodeGen/BPF/BTF/map-def-2.ll b/llvm/test/CodeGen/BPF/BTF/map-def-2.ll new file mode 100644 index 000000000000..bf3c4a7961fb --- /dev/null +++ b/llvm/test/CodeGen/BPF/BTF/map-def-2.ll @@ -0,0 +1,90 @@ +; RUN: llc -march=bpfel -filetype=asm -o - %s | FileCheck -check-prefixes=CHECK %s +; RUN: llc -march=bpfeb -filetype=asm -o - %s | FileCheck -check-prefixes=CHECK %s +; +; Source code: +; struct key_type { +; int a1; +; }; +; typedef struct map_type { +; struct key_type *key; +; } _map_type; +; typedef _map_type __map_type; +; __map_type __attribute__((section(".maps"))) hash_map; +; Compilation flag: +; clang -target bpf -O2 -g -S -emit-llvm t2.c + +%struct.map_type = type { %struct.key_type* } +%struct.key_type = type { i32 } + + at hash_map = dso_local local_unnamed_addr global %struct.map_type zeroinitializer, section ".maps", align 8, !dbg !0 + +; CHECK: .long 0 # BTF_KIND_PTR(id = 1) +; CHECK-NEXT: .long 33554432 # 0x2000000 +; CHECK-NEXT: .long 2 +; CHECK-NEXT: .long 1 # BTF_KIND_STRUCT(id = 2) +; CHECK-NEXT: .long 67108865 # 0x4000001 +; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 10 +; CHECK-NEXT: .long 3 +; CHECK-NEXT: .long 0 # 0x0 +; CHECK-NEXT: .long 13 # BTF_KIND_INT(id = 3) +; CHECK-NEXT: .long 16777216 # 0x1000000 +; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 16777248 # 0x1000020 +; CHECK-NEXT: .long 17 # BTF_KIND_TYPEDEF(id = 4) +; CHECK-NEXT: .long 134217728 # 0x8000000 +; CHECK-NEXT: .long 5 +; CHECK-NEXT: .long 28 # BTF_KIND_TYPEDEF(id = 5) +; CHECK-NEXT: .long 134217728 # 0x8000000 +; CHECK-NEXT: .long 6 +; CHECK-NEXT: .long 38 # BTF_KIND_STRUCT(id = 6) +; CHECK-NEXT: .long 67108865 # 0x4000001 +; CHECK-NEXT: .long 8 +; CHECK-NEXT: .long 47 +; CHECK-NEXT: .long 1 +; CHECK-NEXT: .long 0 # 0x0 +; CHECK-NEXT: .long 51 # BTF_KIND_VAR(id = 7) +; CHECK-NEXT: .long 234881024 # 0xe000000 +; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 1 +; CHECK-NEXT: .long 60 # BTF_KIND_DATASEC(id = 8) +; CHECK-NEXT: .long 251658241 # 
0xf000001 +; CHECK-NEXT: .long 0 +; CHECK-NEXT: .long 7 +; CHECK-NEXT: .long hash_map +; CHECK-NEXT: .long 8 + +; CHECK: .ascii "key_type" # string offset=1 +; CHECK: .ascii "a1" # string offset=10 +; CHECK: .ascii "int" # string offset=13 +; CHECK: .ascii "__map_type" # string offset=17 +; CHECK: .ascii "_map_type" # string offset=28 +; CHECK: .ascii "map_type" # string offset=38 +; CHECK: .ascii "key" # string offset=47 +; CHECK: .ascii "hash_map" # string offset=51 +; CHECK: .ascii ".maps" # string offset=60 + +!llvm.dbg.cu = !{!2} +!llvm.module.flags = !{!16, !17, !18} +!llvm.ident = !{!19} + +!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression()) +!1 = distinct !DIGlobalVariable(name: "hash_map", scope: !2, file: !3, line: 8, type: !6, isLocal: false, isDefinition: true) +!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "clang version 11.0.0 (https://github.com/llvm/llvm-project.git b8409c03ed90807f3d49c7d98dceea98cf461f7a)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5, splitDebugInlining: false, nameTableKind: None) +!3 = !DIFile(filename: "t2.c", directory: "/tmp/home/yhs/tmp1") +!4 = !{} +!5 = !{!0} +!6 = !DIDerivedType(tag: DW_TAG_typedef, name: "__map_type", file: !3, line: 7, baseType: !7) +!7 = !DIDerivedType(tag: DW_TAG_typedef, name: "_map_type", file: !3, line: 6, baseType: !8) +!8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "map_type", file: !3, line: 4, size: 64, elements: !9) +!9 = !{!10} +!10 = !DIDerivedType(tag: DW_TAG_member, name: "key", scope: !8, file: !3, line: 5, baseType: !11, size: 64) +!11 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !12, size: 64) +!12 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "key_type", file: !3, line: 1, size: 32, elements: !13) +!13 = !{!14} +!14 = !DIDerivedType(tag: DW_TAG_member, name: "a1", scope: !12, file: !3, line: 2, baseType: !15, size: 32) +!15 = !DIBasicType(name: "int", size: 32, 
encoding: DW_ATE_signed) +!16 = !{i32 7, !"Dwarf Version", i32 4} +!17 = !{i32 2, !"Debug Info Version", i32 3} +!18 = !{i32 1, !"wchar_size", i32 4} +!19 = !{!"clang version 11.0.0 (https://github.com/llvm/llvm-project.git b8409c03ed90807f3d49c7d98dceea98cf461f7a)"} diff --git a/llvm/test/CodeGen/BPF/BTF/map-def-3.ll b/llvm/test/CodeGen/BPF/BTF/map-def-3.ll new file mode 100644 index 000000000000..e05470782ec2 --- /dev/null +++ b/llvm/test/CodeGen/BPF/BTF/map-def-3.ll @@ -0,0 +1,65 @@ +; RUN: llc -march=bpfel -filetype=asm -o - %s | FileCheck -check-prefixes=CHECK %s +; RUN: llc -march=bpfeb -filetype=asm -o - %s | FileCheck -check-prefixes=CHECK %s +; +; Source code: +; struct key_type { +; int a1; +; }; +; const struct key_type __attribute__((section(".maps"))) hash_map; +; Compilation flag: +; clang -target bpf -O2 -g -S -emit-llvm t3.c + +%struct.key_type = type { i32 } + + at hash_map = dso_local local_unnamed_addr constant %struct.key_type zeroinitializer, section ".maps", align 4, !dbg !0 + +; CHECK: .long 1 # BTF_KIND_INT(id = 1) +; CHECK-NEXT: .long 16777216 # 0x1000000 +; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 16777248 # 0x1000020 +; CHECK-NEXT: .long 0 # BTF_KIND_CONST(id = 2) +; CHECK-NEXT: .long 167772160 # 0xa000000 +; CHECK-NEXT: .long 3 +; CHECK-NEXT: .long 5 # BTF_KIND_STRUCT(id = 3) +; CHECK-NEXT: .long 67108865 # 0x4000001 +; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 14 +; CHECK-NEXT: .long 1 +; CHECK-NEXT: .long 0 # 0x0 +; CHECK-NEXT: .long 17 # BTF_KIND_VAR(id = 4) +; CHECK-NEXT: .long 234881024 # 0xe000000 +; CHECK-NEXT: .long 2 +; CHECK-NEXT: .long 1 +; CHECK-NEXT: .long 26 # BTF_KIND_DATASEC(id = 5) +; CHECK-NEXT: .long 251658241 # 0xf000001 +; CHECK-NEXT: .long 0 +; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long hash_map +; CHECK-NEXT: .long 4 + +; CHECK: .ascii "int" # string offset=1 +; CHECK: .ascii "key_type" # string offset=5 +; CHECK: .ascii "a1" # string offset=14 +; CHECK: .ascii "hash_map" # string offset=17 +; CHECK: .ascii ".maps" # 
string offset=26 + + +!llvm.dbg.cu = !{!2} +!llvm.module.flags = !{!11, !12, !13} +!llvm.ident = !{!14} + +!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression()) +!1 = distinct !DIGlobalVariable(name: "hash_map", scope: !2, file: !3, line: 4, type: !6, isLocal: false, isDefinition: true) +!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "clang version 11.0.0 (https://github.com/llvm/llvm-project.git 5bd074629f00d4798674b411cf00216f38016483)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5, splitDebugInlining: false, nameTableKind: None) +!3 = !DIFile(filename: "t3.c", directory: "/tmp/home/yhs/tmp1") +!4 = !{} +!5 = !{!0} +!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7) +!7 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "key_type", file: !3, line: 1, size: 32, elements: !8) +!8 = !{!9} +!9 = !DIDerivedType(tag: DW_TAG_member, name: "a1", scope: !7, file: !3, line: 2, baseType: !10, size: 32) +!10 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) +!11 = !{i32 7, !"Dwarf Version", i32 4} +!12 = !{i32 2, !"Debug Info Version", i32 3} +!13 = !{i32 1, !"wchar_size", i32 4} +!14 = !{!"clang version 11.0.0 (https://github.com/llvm/llvm-project.git 5bd074629f00d4798674b411cf00216f38016483)"} diff --git a/llvm/test/CodeGen/BPF/BTF/map-def.ll b/llvm/test/CodeGen/BPF/BTF/map-def.ll index cf777880efa1..e12cde3ef98a 100644 --- a/llvm/test/CodeGen/BPF/BTF/map-def.ll +++ b/llvm/test/CodeGen/BPF/BTF/map-def.ll @@ -28,41 +28,41 @@ ; CHECK-NEXT: .long 168 ; CHECK-NEXT: .long 168 ; CHECK-NEXT: .long 65 -; CHECK-NEXT: .long 1 # BTF_KIND_STRUCT(id = 1) -; CHECK-NEXT: .long 67108866 # 0x4000002 -; CHECK-NEXT: .long 16 -; CHECK-NEXT: .long 10 -; CHECK-NEXT: .long 2 -; CHECK-NEXT: .long 0 # 0x0 -; CHECK-NEXT: .long 14 -; CHECK-NEXT: .long 5 -; CHECK-NEXT: .long 64 # 0x40 -; CHECK-NEXT: .long 0 # BTF_KIND_PTR(id = 2) +; CHECK-NEXT: .long 0 # BTF_KIND_PTR(id = 1) ; CHECK-NEXT: .long 
33554432 # 0x2000000 -; CHECK-NEXT: .long 3 -; CHECK-NEXT: .long 20 # BTF_KIND_STRUCT(id = 3) +; CHECK-NEXT: .long 2 +; CHECK-NEXT: .long 1 # BTF_KIND_STRUCT(id = 2) ; CHECK-NEXT: .long 67108866 # 0x4000002 ; CHECK-NEXT: .long 8 -; CHECK-NEXT: .long 29 -; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 10 +; CHECK-NEXT: .long 3 ; CHECK-NEXT: .long 0 # 0x0 -; CHECK-NEXT: .long 31 -; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 12 +; CHECK-NEXT: .long 3 ; CHECK-NEXT: .long 32 # 0x20 -; CHECK-NEXT: .long 33 # BTF_KIND_INT(id = 4) +; CHECK-NEXT: .long 14 # BTF_KIND_INT(id = 3) ; CHECK-NEXT: .long 16777216 # 0x1000000 ; CHECK-NEXT: .long 4 ; CHECK-NEXT: .long 16777248 # 0x1000020 -; CHECK-NEXT: .long 0 # BTF_KIND_PTR(id = 5) +; CHECK-NEXT: .long 0 # BTF_KIND_PTR(id = 4) ; CHECK-NEXT: .long 33554432 # 0x2000000 -; CHECK-NEXT: .long 6 -; CHECK-NEXT: .long 37 # BTF_KIND_INT(id = 6) +; CHECK-NEXT: .long 5 +; CHECK-NEXT: .long 18 # BTF_KIND_INT(id = 5) ; CHECK-NEXT: .long 16777216 # 0x1000000 ; CHECK-NEXT: .long 4 ; CHECK-NEXT: .long 32 # 0x20 +; CHECK-NEXT: .long 31 # BTF_KIND_STRUCT(id = 6) +; CHECK-NEXT: .long 67108866 # 0x4000002 +; CHECK-NEXT: .long 16 +; CHECK-NEXT: .long 40 +; CHECK-NEXT: .long 1 +; CHECK-NEXT: .long 0 # 0x0 +; CHECK-NEXT: .long 44 +; CHECK-NEXT: .long 4 +; CHECK-NEXT: .long 64 # 0x40 ; CHECK-NEXT: .long 50 # BTF_KIND_VAR(id = 7) ; CHECK-NEXT: .long 234881024 # 0xe000000 -; CHECK-NEXT: .long 1 +; CHECK-NEXT: .long 6 ; CHECK-NEXT: .long 1 ; CHECK-NEXT: .long 59 # BTF_KIND_DATASEC(id = 8) ; CHECK-NEXT: .long 251658241 # 0xf000001 @@ -71,21 +71,21 @@ ; CHECK-NEXT: .long hash_map ; CHECK-NEXT: .long 16 ; CHECK-NEXT: .byte 0 # string offset=0 -; CHECK-NEXT: .ascii "map_type" # string offset=1 +; CHECK-NEXT: .ascii "key_type" # string offset=1 ; CHECK-NEXT: .byte 0 -; CHECK-NEXT: .ascii "key" # string offset=10 +; CHECK-NEXT: .byte 97 # string offset=10 ; CHECK-NEXT: .byte 0 -; CHECK-NEXT: .ascii "value" # string offset=14 +; CHECK-NEXT: .byte 98 # string offset=12 ; 
CHECK-NEXT: .byte 0 -; CHECK-NEXT: .ascii "key_type" # string offset=20 +; CHECK-NEXT: .ascii "int" # string offset=14 ; CHECK-NEXT: .byte 0 -; CHECK-NEXT: .byte 97 # string offset=29 +; CHECK-NEXT: .ascii "unsigned int" # string offset=18 ; CHECK-NEXT: .byte 0 -; CHECK-NEXT: .byte 98 # string offset=31 +; CHECK-NEXT: .ascii "map_type" # string offset=31 ; CHECK-NEXT: .byte 0 -; CHECK-NEXT: .ascii "int" # string offset=33 +; CHECK-NEXT: .ascii "key" # string offset=40 ; CHECK-NEXT: .byte 0 -; CHECK-NEXT: .ascii "unsigned int" # string offset=37 +; CHECK-NEXT: .ascii "value" # string offset=44 ; CHECK-NEXT: .byte 0 ; CHECK-NEXT: .ascii "hash_map" # string offset=50 ; CHECK-NEXT: .byte 0 From llvm-commits at lists.llvm.org Sun Jul 12 09:59:03 2020 From: llvm-commits at lists.llvm.org (=?utf-8?q?D=C3=A1vid_Bolvansk=C3=BD_via_Phabricator?= via llvm-commits) Date: Sun, 12 Jul 2020 16:59:03 +0000 (UTC) Subject: [PATCH] D83567: [DAGCombiner] allow load/store merging if pairs can be rotated into place In-Reply-To: References: Message-ID: <28e8c9f0f496c7f346c67b334d90e2e0@localhost.localdomain> xbolva00 added a comment. Can you create new PR for aarch64’s missed opportunity to use rev16? 
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83567/new/ https://reviews.llvm.org/D83567 From llvm-commits at lists.llvm.org Sun Jul 12 10:42:15 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via llvm-commits) Date: Sun, 12 Jul 2020 10:42:15 -0700 (PDT) Subject: [llvm] 82a5157 - [LV] Fixing versioning-for-unit-stide of loops with small trip count Message-ID: <5f0b4b77.1c69fb81.78ec3.c8b1@mx.google.com> Author: Ayal Zaks Date: 2020-07-12T19:51:47+03:00 New Revision: 82a5157ff1650e3366f7a9c619269766ad1d5e93 URL: https://github.com/llvm/llvm-project/commit/82a5157ff1650e3366f7a9c619269766ad1d5e93 DIFF: https://github.com/llvm/llvm-project/commit/82a5157ff1650e3366f7a9c619269766ad1d5e93.diff LOG: [LV] Fixing versioning-for-unit-stide of loops with small trip count This patch fixes D81345 and PR46652. If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis still generates runtime checks for unit strides that will version the loop. In such cases, the loop vectorizer should either re-run the analysis or bail-out from vectorizing the loop, as done prior to D81345. The latter is applied for now as the former requires refactoring. Differential Revision: https://reviews.llvm.org/D83470 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/optsize.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 10e690d56ffd..35af8e425778 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -4949,8 +4949,14 @@ bool LoopVectorizationCostModel::runtimeChecksRequired() { return true; } - assert(Legal->getLAI()->getSymbolicStrides().empty() && - "Specializing for stride == 1 under -Os/-Oz"); + // FIXME: Avoid specializing for stride==1 instead of bailing out. 
+ if (!Legal->getLAI()->getSymbolicStrides().empty()) { + reportVectorizationFailure("Runtime stride check for small trip count", + "runtime stride == 1 checks needed. Enable vectorization of " + "this loop without such check by compiling with -Os/-Oz", + "CantVersionLoopWithOptForSize", ORE, TheLoop); + return true; + } return false; } diff --git a/llvm/test/Transforms/LoopVectorize/optsize.ll b/llvm/test/Transforms/LoopVectorize/optsize.ll index 8def1ab0a0e8..0e88f362746f 100644 --- a/llvm/test/Transforms/LoopVectorize/optsize.ll +++ b/llvm/test/Transforms/LoopVectorize/optsize.ll @@ -221,6 +221,32 @@ for.end: ret void } +; PR46652: Check that the need for stride==1 check prevents vectorizing a loop +; having tiny trip count, when compiling w/o -Os/-Oz. +; CHECK-LABEL: @pr46652 +; CHECK-NOT: vector.scevcheck +; CHECK-NOT: vector.body +; CHECK-LABEL: for.body + + at g = external global [1 x i16], align 1 + +define void @pr46652(i16 %stride) { +entry: + br label %for.body + +for.body: ; preds = %for.body, %entry + %l1.02 = phi i16 [ 1, %entry ], [ %inc9, %for.body ] + %mul = mul nsw i16 %l1.02, %stride + %arrayidx6 = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 %mul + %0 = load i16, i16* %arrayidx6, align 1 + %inc9 = add nuw nsw i16 %l1.02, 1 + %exitcond.not = icmp eq i16 %inc9, 16 + br i1 %exitcond.not, label %for.end, label %for.body + +for.end: ; preds = %for.body + ret void +} + !llvm.module.flags = !{!0} !0 = !{i32 1, !"ProfileSummary", !1} !1 = !{!2, !3, !4, !5, !6, !7, !8, !9} From llvm-commits at lists.llvm.org Sun Jul 12 10:42:27 2020 From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 17:42:27 +0000 (UTC) Subject: [PATCH] D83470: [LV] Fix versioning-for-unit-stide of loops with small trip count In-Reply-To: References: Message-ID: <2cd2efc6c7f22c114ce2e78895f8caff@localhost.localdomain> This revision was automatically updated to reflect the committed changes. 
Closed by commit rG82a5157ff165: [LV] Fixing versioning-for-unit-stide of loops with small trip count (authored by Ayal). Changed prior to commit: https://reviews.llvm.org/D83470?vs=276687&id=277295#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83470/new/ https://reviews.llvm.org/D83470 Files: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/optsize.ll Index: llvm/test/Transforms/LoopVectorize/optsize.ll =================================================================== --- llvm/test/Transforms/LoopVectorize/optsize.ll +++ llvm/test/Transforms/LoopVectorize/optsize.ll @@ -221,6 +221,32 @@ ret void } +; PR46652: Check that the need for stride==1 check prevents vectorizing a loop +; having tiny trip count, when compiling w/o -Os/-Oz. +; CHECK-LABEL: @pr46652 +; CHECK-NOT: vector.scevcheck +; CHECK-NOT: vector.body +; CHECK-LABEL: for.body + + at g = external global [1 x i16], align 1 + +define void @pr46652(i16 %stride) { +entry: + br label %for.body + +for.body: ; preds = %for.body, %entry + %l1.02 = phi i16 [ 1, %entry ], [ %inc9, %for.body ] + %mul = mul nsw i16 %l1.02, %stride + %arrayidx6 = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 %mul + %0 = load i16, i16* %arrayidx6, align 1 + %inc9 = add nuw nsw i16 %l1.02, 1 + %exitcond.not = icmp eq i16 %inc9, 16 + br i1 %exitcond.not, label %for.end, label %for.body + +for.end: ; preds = %for.body + ret void +} + !llvm.module.flags = !{!0} !0 = !{i32 1, !"ProfileSummary", !1} !1 = !{!2, !3, !4, !5, !6, !7, !8, !9} Index: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp =================================================================== --- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -4949,8 +4949,14 @@ return true; } - assert(Legal->getLAI()->getSymbolicStrides().empty() && - "Specializing for stride == 1 under -Os/-Oz"); + // FIXME: Avoid specializing for stride==1 
instead of bailing out.
+  if (!Legal->getLAI()->getSymbolicStrides().empty()) {
+    reportVectorizationFailure("Runtime stride check for small trip count",
+        "runtime stride == 1 checks needed. Enable vectorization of "
+        "this loop without such check by compiling with -Os/-Oz",
+        "CantVersionLoopWithOptForSize", ORE, TheLoop);
+    return true;
+  }

   return false;
 }

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83470.277295.patch
Type: text/x-patch
Size: 2074 bytes
Desc: not available
URL:

From llvm-commits at lists.llvm.org  Sun Jul 12 10:48:32 2020
From: llvm-commits at lists.llvm.org (Ayal Zaks via Phabricator via llvm-commits)
Date: Sun, 12 Jul 2020 17:48:32 +0000 (UTC)
Subject: [PATCH] D83470: [LV] Fix versioning-for-unit-stide of loops with small trip count
In-Reply-To:
References:
Message-ID:

Ayal marked an inline comment as done.
Ayal added a comment.

In D83470#2142057, @fhahn wrote:

> LGTM, thanks!
>
> > In such cases, the loop vectorizer should either re-run the analysis or bail-out from vectorizing the loop, as done prior to D81345. The latter is chosen for now as the former requires refactoring.
>
> As already discussed in D81345, ideally LV would have more flexibility to drive LAA, but this requires non-trivial refactoring. Which we should do, but until then the patch looks like a reasonable fix to the crash.

Indeed, seems like LV would need to re-run analyzeLoop(), which would affect other users.

================
Comment at: llvm/test/Transforms/LoopVectorize/optsize.ll:239
+  %l1.02 = phi i16 [ 1, %entry ], [ %inc9, %for.body ]
+  %mul = mul nsw i16 %l1.02, undef
+  %arrayidx6 = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 %mul
----------------
fhahn wrote:
> Better to use a non-undef constant/value?

Sure, done, thanks.
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83470/new/

https://reviews.llvm.org/D83470

From llvm-commits at lists.llvm.org  Sun Jul 12 10:53:32 2020
From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits)
Date: Sun, 12 Jul 2020 10:53:32 -0700 (PDT)
Subject: [llvm] 04013a0 - [X86] Fix two places that appear to misuse peekThroughOneUseBitcasts
Message-ID: <5f0b4e1c.1c69fb81.bda90.ae77@mx.google.com>

Author: Craig Topper
Date: 2020-07-12T10:52:43-07:00
New Revision: 04013a07ac3b67eb176ddfd1ddaeda41415c038f

URL: https://github.com/llvm/llvm-project/commit/04013a07ac3b67eb176ddfd1ddaeda41415c038f
DIFF: https://github.com/llvm/llvm-project/commit/04013a07ac3b67eb176ddfd1ddaeda41415c038f.diff

LOG: [X86] Fix two places that appear to misuse peekThroughOneUseBitcasts

peekThroughOneUseBitcasts checks the use count of the operand of the bitcast, not of the bitcast itself. So I think that means we need to do any outside hasOneUse checks before calling the function, not after.

I was working on another patch where I misused the function and did a very quick audit to see if there were other similar mistakes.
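
The ordering problem in the log can be reproduced with a toy model. This is a minimal sketch, not the SelectionDAG code: ToyNode, peekThroughOneUse and peekIfSingleUse are hypothetical stand-ins, assuming only the behaviour stated above (the helper steps through a bitcast when the bitcast's operand has exactly one use).

```cpp
#include <cassert>

// Toy DAG node; a hypothetical stand-in, not llvm::SDValue.
struct ToyNode {
  bool isBitcast;
  int numUses;
  const ToyNode *operand; // only meaningful when isBitcast is true
};

// Steps through a bitcast only if the bitcast's OPERAND has exactly one use.
// The use count of the bitcast itself is never inspected here.
const ToyNode *peekThroughOneUse(const ToyNode *n) {
  while (n->isBitcast && n->operand->numUses == 1)
    n = n->operand;
  return n;
}

// Checks the starting node's own use count BEFORE peeking. Returns nullptr
// when the starting node is shared, otherwise the peeled node.
const ToyNode *peekIfSingleUse(const ToyNode *start) {
  if (start->numUses != 1)
    return nullptr;
  return peekThroughOneUse(start);
}
```

If a bitcast with two users wraps a shuffle whose only user is that bitcast, peekThroughOneUse returns the shuffle and a use-count check on the result passes, even though the bitcast is shared. Checking the starting node first, as peekIfSingleUse does, rejects that case.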
Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83598 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 721b262aa433..7657125e1e5a 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36464,9 +36464,9 @@ static SDValue combineTargetShuffle(SDValue N, SelectionDAG &DAG, (V.getOpcode() == X86ISD::PSHUFLW || V.getOpcode() == X86ISD::PSHUFHW) && V.getOpcode() != N.getOpcode() && - V.hasOneUse()) { + V.hasOneUse() && V.getOperand(0).hasOneUse()) { SDValue D = peekThroughOneUseBitcasts(V.getOperand(0)); - if (D.getOpcode() == X86ISD::PSHUFD && D.hasOneUse()) { + if (D.getOpcode() == X86ISD::PSHUFD) { SmallVector VMask = getPSHUFShuffleMask(V); SmallVector DMask = getPSHUFShuffleMask(D); int NOffset = N.getOpcode() == X86ISD::PSHUFLW ? 0 : 4; @@ -36903,10 +36903,11 @@ static SDValue combineShuffle(SDNode *N, SelectionDAG &DAG, // insert into a zero vector. This helps get VZEXT_MOVL closer to // scalar_to_vectors where 256/512 are canonicalized to an insert and a // 128-bit scalar_to_vector. This reduces the number of isel patterns. 
- if (N->getOpcode() == X86ISD::VZEXT_MOVL && !DCI.isBeforeLegalizeOps()) { + if (N->getOpcode() == X86ISD::VZEXT_MOVL && !DCI.isBeforeLegalizeOps() && + N->getOperand(0).hasOneUse()) { SDValue V = peekThroughOneUseBitcasts(N->getOperand(0)); - if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.hasOneUse() && + if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.getOperand(0).isUndef() && isNullConstant(V.getOperand(2))) { SDValue In = V.getOperand(1); MVT SubVT = From llvm-commits at lists.llvm.org Sun Jul 12 10:53:34 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Sun, 12 Jul 2020 10:53:34 -0700 (PDT) Subject: [llvm] f8f007e - [X86] Consistently use 128 as the PSHUFB/VPPERM index for zero Message-ID: <5f0b4e1e.1c69fb81.692b4.a65f@mx.google.com> Author: Craig Topper Date: 2020-07-12T10:52:43-07:00 New Revision: f8f007e378e1ed84fadf281f05166a4463a79316 URL: https://github.com/llvm/llvm-project/commit/f8f007e378e1ed84fadf281f05166a4463a79316 DIFF: https://github.com/llvm/llvm-project/commit/f8f007e378e1ed84fadf281f05166a4463a79316.diff LOG: [X86] Consistently use 128 as the PSHUFB/VPPERM index for zero Bit 7 of the index controls zeroing, the other bits are ignored when bit 7 is set. Shuffle lowering was using 128 and shuffle combining was using 255. Seems like we should be consistent. This patch changes shuffle combining to use 128 to match lowering. 
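As an illustration of why 255 and 128 behave the same as a PSHUFB zeroing index, here is a small scalar model of the 128-bit byte shuffle. emulatePshufb and Bytes16 are hypothetical helper names made up for this sketch, and the model covers PSHUFB only (VPPERM gives the upper control bits additional meaning, so it is not modelled here). The two masks are the ones from the vector-trunc.ll check line in this patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of 128-bit PSHUFB: if bit 7 of a control byte is set the
// output byte is zero, otherwise the low 4 bits select a source byte.
Bytes16 emulatePshufb(const Bytes16 &Src, const Bytes16 &Ctl) {
  Bytes16 Out{};
  for (std::size_t I = 0; I < 16; ++I)
    Out[I] = (Ctl[I] & 0x80) ? std::uint8_t(0) : Src[Ctl[I] & 0x0F];
  return Out;
}

// Both masks zero the last two bytes, so the shuffles agree.
void demoZeroIndexEquivalence() {
  Bytes16 Src;
  for (std::size_t I = 0; I < 16; ++I)
    Src[I] = std::uint8_t(0xA0 + I);
  const Bytes16 Old = {2, 3, 6, 7, 10, 11, 14, 15,
                       10, 11, 14, 15, 14, 15, 255, 255};
  const Bytes16 New = {2, 3, 6, 7, 10, 11, 14, 15,
                       10, 11, 14, 15, 14, 15, 128, 128};
  assert(emulatePshufb(Src, Old) == emulatePshufb(Src, New));
}
```

Under this model the change affects only the constant that is emitted, not the shuffle result, which is consistent with the test update touching just the mask values.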
Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83587 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-trunc.ll Removed: ################################################################################ diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 7657125e1e5a..450927aaf5cc 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35043,7 +35043,7 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, continue; } if (M == SM_SentinelZero) { - PSHUFBMask.push_back(DAG.getConstant(255, DL, MVT::i8)); + PSHUFBMask.push_back(DAG.getConstant(0x80, DL, MVT::i8)); continue; } M = Ratio * M + i % Ratio; @@ -35074,7 +35074,7 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, continue; } if (M == SM_SentinelZero) { - VPPERMMask.push_back(DAG.getConstant(128, DL, MVT::i8)); + VPPERMMask.push_back(DAG.getConstant(0x80, DL, MVT::i8)); continue; } M = Ratio * M + i % Ratio; diff --git a/llvm/test/CodeGen/X86/vector-trunc.ll b/llvm/test/CodeGen/X86/vector-trunc.ll index a5f6be558e8c..1d596f5db3ae 100644 --- a/llvm/test/CodeGen/X86/vector-trunc.ll +++ b/llvm/test/CodeGen/X86/vector-trunc.ll @@ -456,7 +456,7 @@ define <8 x i16> @trunc8i32_8i16_lshr(<8 x i32> %a) { ; ; SSSE3-LABEL: trunc8i32_8i16_lshr: ; SSSE3: # %bb.0: # %entry -; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,10,11,14,15,14,15,255,255] +; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,10,11,14,15,14,15,128,128] ; SSSE3-NEXT: pshufb %xmm2, %xmm1 ; SSSE3-NEXT: pshufb %xmm2, %xmm0 ; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] From llvm-commits at lists.llvm.org Sun Jul 12 10:53:39 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 17:53:39 +0000 (UTC) Subject: [PATCH] D83598: [X86] Fix two places that appear to misuse 
peekThroughOneUseBitcasts In-Reply-To: References: Message-ID: <56efeec3d0c6ce51091150100b4cf758@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rG04013a07ac3b: [X86] Fix two places that appear to misuse peekThroughOneUseBitcasts (authored by craig.topper). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83598/new/ https://reviews.llvm.org/D83598 Files: llvm/lib/Target/X86/X86ISelLowering.cpp Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36464,9 +36464,9 @@ (V.getOpcode() == X86ISD::PSHUFLW || V.getOpcode() == X86ISD::PSHUFHW) && V.getOpcode() != N.getOpcode() && - V.hasOneUse()) { + V.hasOneUse() && V.getOperand(0).hasOneUse()) { SDValue D = peekThroughOneUseBitcasts(V.getOperand(0)); - if (D.getOpcode() == X86ISD::PSHUFD && D.hasOneUse()) { + if (D.getOpcode() == X86ISD::PSHUFD) { SmallVector VMask = getPSHUFShuffleMask(V); SmallVector DMask = getPSHUFShuffleMask(D); int NOffset = N.getOpcode() == X86ISD::PSHUFLW ? 0 : 4; @@ -36903,10 +36903,11 @@ // insert into a zero vector. This helps get VZEXT_MOVL closer to // scalar_to_vectors where 256/512 are canonicalized to an insert and a // 128-bit scalar_to_vector. This reduces the number of isel patterns. - if (N->getOpcode() == X86ISD::VZEXT_MOVL && !DCI.isBeforeLegalizeOps()) { + if (N->getOpcode() == X86ISD::VZEXT_MOVL && !DCI.isBeforeLegalizeOps() && + N->getOperand(0).hasOneUse()) { SDValue V = peekThroughOneUseBitcasts(N->getOperand(0)); - if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.hasOneUse() && + if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.getOperand(0).isUndef() && isNullConstant(V.getOperand(2))) { SDValue In = V.getOperand(1); MVT SubVT =
From llvm-commits at lists.llvm.org Sun Jul 12 10:53:43 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 17:53:43 +0000 (UTC) Subject: [PATCH] D83587: [X86] Consistently use 128 as the PSHUFB/VPPERM index for zero In-Reply-To: References: Message-ID: <3204f98fac025dd245310e6c34f76400@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf8f007e378e1: [X86] Consistently use 128 as the PSHUFB/VPPERM index for zero (authored by craig.topper). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83587/new/ https://reviews.llvm.org/D83587 Files: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-trunc.ll Index: llvm/test/CodeGen/X86/vector-trunc.ll =================================================================== --- llvm/test/CodeGen/X86/vector-trunc.ll +++ llvm/test/CodeGen/X86/vector-trunc.ll @@ -456,7 +456,7 @@ ; ; SSSE3-LABEL: trunc8i32_8i16_lshr: ; SSSE3: # %bb.0: # %entry -; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,10,11,14,15,14,15,255,255] +; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,10,11,14,15,14,15,128,128] ; SSSE3-NEXT: pshufb %xmm2, %xmm1 ; SSSE3-NEXT: pshufb %xmm2, %xmm0 ; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] Index: llvm/lib/Target/X86/X86ISelLowering.cpp =================================================================== --- llvm/lib/Target/X86/X86ISelLowering.cpp +++ llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35043,7 +35043,7 @@ continue; } if (M == SM_SentinelZero) { - PSHUFBMask.push_back(DAG.getConstant(255, DL, MVT::i8)); + PSHUFBMask.push_back(DAG.getConstant(0x80, DL, MVT::i8)); continue; } M = Ratio * M + i % Ratio; @@ -35074,7 +35074,7 @@ continue; } if (M == SM_SentinelZero) { - VPPERMMask.push_back(DAG.getConstant(128, DL, MVT::i8));
+ VPPERMMask.push_back(DAG.getConstant(0x80, DL, MVT::i8)); continue; } M = Ratio * M + i % Ratio; From llvm-commits at lists.llvm.org Sun Jul 12 10:59:12 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Sun, 12 Jul 2020 10:59:12 -0700 (PDT) Subject: [polly] 7a1bcf9 - [polly] NFC clang-format change following D83564 Message-ID: <5f0b4f70.1c69fb81.6f166.d58e@mx.google.com> Author: mydeveloperday Date: 2020-07-12T18:58:53+01:00 New Revision: 7a1bcf9f9a95fca9dcf8e42f8eb845db3643fffb URL: https://github.com/llvm/llvm-project/commit/7a1bcf9f9a95fca9dcf8e42f8eb845db3643fffb DIFF: https://github.com/llvm/llvm-project/commit/7a1bcf9f9a95fca9dcf8e42f8eb845db3643fffb.diff LOG: [polly] NFC clang-format change following D83564 Added: Modified: polly/lib/Analysis/ScopDetection.cpp Removed: ################################################################################ diff --git a/polly/lib/Analysis/ScopDetection.cpp b/polly/lib/Analysis/ScopDetection.cpp index abe189f3e890..53d0b705c055 100644 --- a/polly/lib/Analysis/ScopDetection.cpp +++ b/polly/lib/Analysis/ScopDetection.cpp @@ -383,7 +383,7 @@ ScopDetection::ScopDetection(Function &F, const DominatorTree &DT, template inline bool ScopDetection::invalid(DetectionContext &Context, bool Assert, - Args &&... Arguments) const { + Args &&...Arguments) const { if (!Context.Verifying) { RejectLog &Log = Context.Log; std::shared_ptr RejectReason = std::make_shared(Arguments...); From llvm-commits at lists.llvm.org Sun Jul 12 11:07:43 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via llvm-commits) Date: Sun, 12 Jul 2020 11:07:43 -0700 (PDT) Subject: [llvm] f4d29d6 - [Matrix] Tighten LangRef definitions and Verifier checks.
Message-ID: <5f0b516f.1c69fb81.37038.d430@mx.google.com> Author: Sjoerd Meijer Date: 2020-07-12T19:07:22+01:00 New Revision: f4d29d6e8c43cfd924d9d7cc1ac0c269b2788e75 URL: https://github.com/llvm/llvm-project/commit/f4d29d6e8c43cfd924d9d7cc1ac0c269b2788e75 DIFF: https://github.com/llvm/llvm-project/commit/f4d29d6e8c43cfd924d9d7cc1ac0c269b2788e75.diff LOG: [Matrix] Tighten LangRef definitions and Verifier checks. This tightens the matrix intrinsic definitions in LLVM LangRef and adds corresponding checks to the IR Verifier. Differential Revision: https://reviews.llvm.org/D83477 Added: Modified: llvm/docs/LangRef.rst llvm/lib/IR/Verifier.cpp llvm/test/Verifier/matrix-intrinsics.ll Removed: ################################################################################ diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 86d315be74bc..02c92f1a4daa 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -15524,6 +15524,7 @@ The argument to this intrinsic must be a vector of floating-point values. Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15548,17 +15549,20 @@ Matrix Intrinsics ----------------- Operations on matrixes requiring shape information (like number of rows/columns -or the memory layout) can be expressed using the matrix intrinsics. Matrixes are -embedded in a flat vector and the intrinsics take the dimensions as arguments. -Currently column-major layout is assumed. The intrinsics support both integer -and floating point matrixes. +or the memory layout) can be expressed using the matrix intrinsics. These +intrinsics require matrix dimensions to be passed as immediate arguments, and +matrixes are passed and returned as vectors. This means that for a ``R`` x +``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the +corresponding vector, with indices starting at 0. Currently column-major layout +is assumed. The intrinsics support both integer and floating point matrixes.
'``llvm.matrix.transpose.*``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15567,21 +15571,24 @@ Syntax: Overview: """"""""" -The '``llvm.matrix.transpose.*``' intrinsic treats %In as containing a matrix -with rows and columns and returns the transposed matrix embedded in -the result vector. +The '``llvm.matrix.transpose.*``' intrinsics treat %In as a x matrix +and return the transposed matrix in the result vector. Arguments: """""""""" -The and arguments must be constant integers. The vector argument -%In and the returned vector must have * elements. +First argument %In is vector that corresponds to a x matrix. +Thus, arguments and correspond to the number of rows and columns, +respectively, and must be positive, constant integers. The returned vector must +have * elements, and have the same float or integer element type +as %In. '``llvm.matrix.multiply.*``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15590,18 +15597,19 @@ Syntax: Overview: """"""""" -The '``llvm.matrix.multiply.*``' intrinsic treats %A as a matrix with -rows and columns, %B as a matrix with rows and -columns and multiplies them. The result matrix is returned embedded in the -result vector. +The '``llvm.matrix.multiply.*``' intrinsics treat %A as a x +matrix, %B as a x matrix, and multiplies them. The result +matrix is returned in the result vector. Arguments: """""""""" -The , and arguments must be constant -integers. The vector argument %A must have * elements, %B -must have * elements and the returned vector must have - * elements. +The first vector argument %A corresponds to a matrix with * +elements, and the second argument %B to a matrix with * +elements. Arguments , and must be positive, +constant integers. The returned vector must have * +elements. 
Vectors %A, %B, and the returned vector all have the same float or +integer element type. '``llvm.matrix.column.major.load.*``' Intrinsic @@ -15609,6 +15617,7 @@ must have * elements and the returned vector must have Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15618,22 +15627,26 @@ Syntax: Overview: """"""""" -The '``llvm.matrix.column.major.load.*``' intrinsic loads a matrix with -rows and columns, using a stride of %Stride between columns. For two -consecutive columns A and B, %Stride refers to the distance (the number of -elements) between the start of column A and the start of column B. The result -matrix is returned embedded in the result vector. This allows for convenient -loading of sub matrixes. If is true, the intrinsic is considered -a :ref:`volatile memory access `. - -If the %Ptr argument is known to be aligned to some boundary, this can be -specified as an attribute on the argument. +The '``llvm.matrix.column.major.load.*``' intrinsics load a x +matrix using a stride of %Stride to compute the start address of the different +columns. This allows for convenient loading of sub matrixes. If +is true, the intrinsic is considered a :ref:`volatile memory access +`. The result matrix is returned in the result vector. If the %Ptr +argument is known to be aligned to some boundary, this can be specified as an +attribute on the argument. Arguments: """""""""" -The , and arguments must be constant integers. The -returned vector must have * elements. %Stride must be >= . +The first argument %Ptr is a pointer type to the returned vector type, and +correponds to the start address to load from. The second argument %Stride is a +postive, constant integer with %Stride ``>=`` . %Stride is used to compute +the column memory addresses. I.e., for a column ``C``, its start memory +addresses is calculated with %Ptr + ``C`` * %Stride. The third Argument + is a boolean value.
The fourth and fifth arguments, and +, correspond to the number of rows and columns, respectively, and must be +positive, constant integers. The returned vector must have * +elements. The :ref:`align ` parameter attribute can be provided for the %Ptr arguments. @@ -15653,12 +15666,10 @@ Syntax: Overview: """"""""" -The '``llvm.matrix.column.major.store.*``' intrinsic stores the matrix with - rows and columns embedded in %In, using a stride of %Stride -between columns. For two consecutive columns A and B, %Stride refers to the -distance (the number of elements) between the start of column A and the start -of column B. If is true, the intrinsic is considered a -:ref:`volatile memory access `. +The '``llvm.matrix.column.major.store.*``' intrinsics store the x +matrix in %In to memory using a stride of %Stride between columns. If + is true, the intrinsic is considered a :ref:`volatile memory +access `. If the %Ptr argument is known to be aligned to some boundary, this can be specified as an attribute on the argument. @@ -15666,8 +15677,15 @@ specified as an attribute on the argument. Arguments: """""""""" -The , , arguments must be constant integers. The -vector argument %In must have * elements. %Stride must be >= . +The first argument %In is a vector that corresponds to a x matrix +to be stored to memory. The second argument %Ptr is a pointer to the vector +type of %In, and is the start address of the matrix in memory. The third +argument %Stride is a positive, constant integer with %Stride ``>=`` . +%Stride is used to compute the column memory addresses. I.e., for a column +``C``, its start memory addresses is calculated with %Ptr + ``C`` * %Stride. +The fourth argument is a boolean value. The arguments and + correspond to the number of rows and columns, respectively, and must be +positive, constant integers. The :ref:`align ` parameter attribute can be provided for the %Ptr arguments. 
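The layout and stride rules in the LangRef text above can be sketched in plain C++ as follows. flatIndex and loadColumnMajor are hypothetical helpers written for this illustration only; they model the documented index formula (j * R + i) and column start address (%Ptr + C * %Stride) and are not how the intrinsic is lowered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of element (row I, column J) of a matrix with Rows rows that is
// embedded column-major in a flat vector: J * Rows + I.
std::size_t flatIndex(std::size_t I, std::size_t J, std::size_t Rows) {
  return J * Rows + I;
}

// Toy model of a column-major strided load: column C starts at
// Ptr + C * Stride, and Stride must be at least Rows.
std::vector<float> loadColumnMajor(const float *Ptr, std::size_t Stride,
                                   std::size_t Rows, std::size_t Cols) {
  assert(Stride >= Rows && "stride must be >= number of rows");
  std::vector<float> Result(Rows * Cols);
  for (std::size_t J = 0; J < Cols; ++J)
    for (std::size_t I = 0; I < Rows; ++I)
      Result[flatIndex(I, J, Rows)] = Ptr[J * Stride + I];
  return Result;
}
```

For example, if memory holds a 3 x 3 column-major matrix with values 0 through 8, calling loadColumnMajor with Stride 3, Rows 2 and Cols 2 picks out the top-left 2 x 2 sub-matrix as the flat vector {0, 1, 3, 4}. A stride smaller than the row count would make consecutive columns overlap, which is presumably why the Verifier now rejects it.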
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 8fa87b748901..994082fbdb7c 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -5006,36 +5006,77 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) { case Intrinsic::matrix_transpose: case Intrinsic::matrix_column_major_load: case Intrinsic::matrix_column_major_store: { + Function *IF = Call.getCalledFunction(); + ConstantInt *Stride = nullptr; ConstantInt *NumRows; ConstantInt *NumColumns; - VectorType *TypeToCheck; + VectorType *ResultTy; + Type *Op0ElemTy = nullptr; + Type *Op1ElemTy = nullptr; switch (ID) { case Intrinsic::matrix_multiply: NumRows = cast(Call.getArgOperand(2)); NumColumns = cast(Call.getArgOperand(4)); - TypeToCheck = cast(Call.getType()); + ResultTy = cast(Call.getType()); + Op0ElemTy = + cast(Call.getArgOperand(0)->getType())->getElementType(); + Op1ElemTy = + cast(Call.getArgOperand(1)->getType())->getElementType(); break; case Intrinsic::matrix_transpose: NumRows = cast(Call.getArgOperand(1)); NumColumns = cast(Call.getArgOperand(2)); - TypeToCheck = cast(Call.getType()); + ResultTy = cast(Call.getType()); + Op0ElemTy = + cast(Call.getArgOperand(0)->getType())->getElementType(); break; - case Intrinsic::matrix_column_major_load: + case Intrinsic::matrix_column_major_load: { + Stride = dyn_cast(Call.getArgOperand(1)); NumRows = cast(Call.getArgOperand(3)); NumColumns = cast(Call.getArgOperand(4)); - TypeToCheck = cast(Call.getType()); + ResultTy = cast(Call.getType()); + auto *VecTy = cast( + cast(Call.getArgOperand(0)->getType())->getElementType()); + Op0ElemTy = VecTy->getElementType(); + } break; - case Intrinsic::matrix_column_major_store: + case Intrinsic::matrix_column_major_store: { + Stride = dyn_cast(Call.getArgOperand(2)); NumRows = cast(Call.getArgOperand(4)); NumColumns = cast(Call.getArgOperand(5)); - TypeToCheck = cast(Call.getArgOperand(0)->getType()); + ResultTy = cast(Call.getArgOperand(0)->getType()); + Op0ElemTy = 
+ cast(Call.getArgOperand(0)->getType())->getElementType(); + auto *VecTy = cast( + cast(Call.getArgOperand(1)->getType())->getElementType()); + Op1ElemTy = VecTy->getElementType(); + } break; default: llvm_unreachable("unexpected intrinsic"); } - Assert(TypeToCheck->getNumElements() == + + Assert(ResultTy->getElementType()->isIntegerTy() || + ResultTy->getElementType()->isFloatingPointTy(), + "Result type must be an integer or floating-point type!", IF); + + Assert(ResultTy->getElementType() == Op0ElemTy, + "Vector element type mismatch of the result and first operand " + "vector!", IF); + + if (Op1ElemTy) + Assert(ResultTy->getElementType() == Op1ElemTy, + "Vector element type mismatch of the result and second operand " + "vector!", IF); + + Assert(ResultTy->getNumElements() == NumRows->getZExtValue() * NumColumns->getZExtValue(), - "result of a matrix operation does not fit in the returned vector"); + "Result of a matrix operation does not fit in the returned vector!"); + + if (Stride) + Assert(Stride->getZExtValue() >= NumRows->getZExtValue(), + "Stride must be greater or equal than the number of rows!", IF); + break; } }; diff --git a/llvm/test/Verifier/matrix-intrinsics.ll b/llvm/test/Verifier/matrix-intrinsics.ll index 6b2a4c501c66..5afab26a48c5 100644 --- a/llvm/test/Verifier/matrix-intrinsics.ll +++ b/llvm/test/Verifier/matrix-intrinsics.ll @@ -3,9 +3,9 @@ declare <4 x float> @llvm.matrix.transpose.v4f32(<4 x float>, i32, i32) define <4 x float> @transpose(<4 x float> %m, i32 %arg) { ; CHECK: assembly parsed, but does not verify as correct! -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! 
+; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! ; CHECK-NEXT: immarg operand has non-immediate parameter ; CHECK-NEXT: i32 %arg ; CHECK-NEXT: %result.3 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.2, i32 %arg, i32 2) @@ -22,9 +22,9 @@ define <4 x float> @transpose(<4 x float> %m, i32 %arg) { declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32) define <4 x float> @multiply(<4 x float> %m, i32 %arg) { -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! ; CHECK-NEXT: immarg operand has non-immediate parameter ; CHECK-NEXT: i32 %arg ; CHECK-NEXT: %result.3 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %result.2, <4 x float> %m, i32 %arg, i32 2, i32 1) @@ -38,9 +38,9 @@ define <4 x float> @multiply(<4 x float> %m, i32 %arg) { declare <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>*, i64, i1, i32, i32) declare <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>*, i64, i1, i32, i32) define <4 x float> @column.major_load(<4 x float>* %m, <6 x float>* %n, i32 %arg) { -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! 
+; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! ; CHECK-NEXT: immarg operand has non-immediate parameter ; CHECK-NEXT: i32 %arg ; CHECK-NEXT: %result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>* %n, i64 2, i1 true, i32 3, i32 %arg) @@ -54,13 +54,110 @@ define <4 x float> @column.major_load(<4 x float>* %m, <6 x float>* %n, i32 %arg declare void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float>, <4 x float>*, i64, i1, i32, i32) declare void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float>, <6 x float>*, i64, i1, i32, i32) define void @column.major_store(<4 x float>* %m, <6 x float>* %n, i64 %arg) { -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector -; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! 
call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 0, i1 false, i32 0, i32 0) call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 2, i1 false, i32 1, i32 2) call void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float> zeroinitializer, <6 x float>* %n, i64 2, i1 false, i32 3, i32 3) call void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float> zeroinitializer, <6 x float>* %n, i64 %arg, i1 false, i32 3, i32 3) ret void } + +declare <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32>, i32, i32) +declare <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float>, i32, i32) + +define <4 x float> @transpose_mixed_types(<4 x float> %fvec, <4 x i32> %ivec, i32 %arg) { +; +; CHECK-NEXT: Intrinsic has incorrect argument type! +; CHECK-NEXT: <4 x float> (<4 x i32>, i32, i32)* @llvm.matrix.transpose.v4f32.v4i32 +; CHECK-NEXT: Intrinsic has incorrect argument type! +; CHECK-NEXT: <4 x i32> (<4 x float>, i32, i32)* @llvm.matrix.transpose.v4i32.v4f32 +; + %result.0 = call <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32> %ivec, i32 0, i32 0) + %result.1 = call <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float> %result.0, i32 3, i32 2) + ret <4 x float> %result.0 +} + +declare <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32) +declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32>, <4 x float>, i32, i32, i32) +declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float>, <4 x i32>, i32, i32, i32) +declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32) + +define <4 x float> @multiply_mixed_types(<4 x i32> %ivec, <4 x float> %fvec, i32 %arg) { +; +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! 
+; CHECK-NEXT: <4 x i32> (<4 x float>, <4 x float>, i32, i32, i32)* @llvm.matrix.multiply.v4i32.v4f32.v4f32 +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! +; CHECK-NEXT: <4 x float> (<4 x i32>, <4 x float>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4i32.v4f32 +; CHECK-NEXT: Vector element type mismatch of the result and second operand vector! +; CHECK-NEXT: <4 x float> (<4 x float>, <4 x i32>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4f32.v4i32 +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! +; CHECK-NEXT: <4 x float> (<4 x i32>, <4 x i32>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4i32.v4i32 +; + %result.0 = call <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float> %fvec, <4 x float> %fvec, i32 2, i32 2, i32 2) + %result.1 = call <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32> %result.0, <4 x float> %fvec, i32 2, i32 2, i32 2) + %result.2 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float> %fvec, <4 x i32> %ivec, i32 2, i32 2, i32 2) + %result.3 = call <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32> %ivec, <4 x i32> %ivec, i32 2, i32 2, i32 2) + ret <4 x float> %result.3 +} + +declare <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4i32(<4 x i32>*, i64, i1, i32, i32) +declare <4 x i32> @llvm.matrix.column.major.load.v4i32.p0v4f32(<4 x float>*, i64, i1, i32, i32) + +define <4 x float> @column.major_load_mixed_types(<4 x i32>* %m, <4 x float>* %n, i32 %arg) { +; +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! +; CHECK-NEXT: <4 x float> (<4 x i32>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.p0v4i32 +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! 
+; CHECK-NEXT: <4 x i32> (<4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4i32.p0v4f32 +; + %result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4i32(<4 x i32>* %m, i64 2, i1 false, i32 2, i32 2) + %result.1 = call <4 x i32> @llvm.matrix.column.major.load.v4i32.p0v4f32(<4 x float>* %n, i64 2, i1 false, i32 2, i32 2) + ret <4 x float> %result.0 +} + +declare void @llvm.matrix.column.major.store.v4i32.p0v4f32(<4 x i32>, <4 x float>*, i64, i1, i32, i32) +declare void @llvm.matrix.column.major.store.v4f32.p0v4i32(<4 x float>, <4 x i32>*, i64, i1, i32, i32) + +define void @column.major_store_mixed_types(<4 x float>* %m, <4 x i32>* %n, i64 %arg) { +; +; CHECK-NEXT: Vector element type mismatch of the result and second operand vector! +; CHECK-NEXT: void (<4 x i32>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4i32.p0v4f32 +; CHECK-NEXT: Vector element type mismatch of the result and second operand vector! +; CHECK-NEXT: void (<4 x float>, <4 x i32>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.p0v4i32 +; + call void @llvm.matrix.column.major.store.v4i32.p0v4f32(<4 x i32> zeroinitializer, <4 x float>* %m, i64 2, i1 false, i32 2, i32 2) + call void @llvm.matrix.column.major.store.v4f32.p0v4i32(<4 x float> zeroinitializer, <4 x i32>* %n, i64 2, i1 false, i32 2, i32 2) + ret void +} + +declare void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float*>, <4 x float>*, i64, i1, i32, i32) + +define void @column.major_store_non_int_float_type(<4 x float>* %m, <4 x float>* %n, i64 %arg) { +; +; CHECK-NEXT: Result type must be an integer or floating-point type! 
+; CHECK-NEXT: void (<4 x float*>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4p0f32.p0v4f32 +; + call void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float*> zeroinitializer, <4 x float>* %n, i64 2, i1 false, i32 2, i32 2) + ret void +} + +define <4 x float> @column.major_load_stride_too_small(<4 x float>* %m, i32 %arg) { +; +; CHECK-NEXT: Stride must be greater or equal than the number of rows! +; CHECK-NEXT: <4 x float> (<4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.p0v4f32 +; + %result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>* %m, i64 1, i1 false, i32 2, i32 2) + ret <4 x float> %result.1 +} + +define void @column.major_store_stride_too_small(<4 x float>* %m, i64 %arg) { +; +; CHECK-NEXT: Stride must be greater or equal than the number of rows! +; CHECK-NEXT: void (<4 x float>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.p0v4f32 +; + call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 1, i1 false, i32 2, i32 2) + ret void +} From llvm-commits at lists.llvm.org Sun Jul 12 11:07:58 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 18:07:58 +0000 (UTC) Subject: [PATCH] D83477: [Matrix] Tighten LangRef definitions and Verifier checks. In-Reply-To: References: Message-ID: <87ac297d3e047dd070178c5943f853aa@localhost.localdomain> This revision was automatically updated to reflect the committed changes. Closed by commit rGf4d29d6e8c43: [Matrix] Tighten LangRef definitions and Verifier checks. (authored by SjoerdMeijer). 
Changed prior to commit: https://reviews.llvm.org/D83477?vs=276975&id=277299#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83477/new/ https://reviews.llvm.org/D83477 Files: llvm/docs/LangRef.rst llvm/lib/IR/Verifier.cpp llvm/test/Verifier/matrix-intrinsics.ll From llvm-commits at lists.llvm.org Sun Jul 12 11:12:52 2020 From: llvm-commits at lists.llvm.org (Thomas Lively via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 18:12:52 +0000 (UTC) Subject: [PATCH] D83605: [SelectionDAG][WebAssembly] Recognize splat value ABS operations In-Reply-To: References: Message-ID: <5fd0cea73a0e64a203eed6b32ed1c73f@localhost.localdomain> tlively updated this revision to Diff 277300. tlively added a comment. - Remove superfluous parentheses Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83605/new/ https://reviews.llvm.org/D83605 Files: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp llvm/test/CodeGen/WebAssembly/simd-shift-complex-splats.ll From llvm-commits at lists.llvm.org Sun Jul 12 11:20:23 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 18:20:23 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: dblaikie added a comment.
In D79978#2146150, @MaskRay wrote:

> My apologies that I did not try cfi-inserter-basic-block-sections-callee-save-registers.ll
> The generated assembly does have .cfi_offset
> (I cannot apply the patch with `arc patch` (https://reviews.llvm.org/D79978?id=277216#2138208)
> so I used curl | patch and somehow ignored the locally modified .ll)
>
> However, I do think inline asm with clobbered register list () has some advantage: the .cfi_offset registers
> can be precisely controlled.
>
>   call void asm sideeffect "nop", "~{rbp},~{r12},~{r13},~{r14},~{r15}"()
>
>
> The CHECK lines can thus be strengthened:
>
>   ; CFI_INSTR: CFI_INSTRUCTION def_cfa_offset 48
>   ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r12, -48
>   ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r13, -40
>   ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r14, -32
>   ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r15, -24
>   ; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $rbp, -16
>
>
> While the current approach cannot control the used registers.

Yeah, I have mixed feelings about this - not sure it's necessary to test all of these, or test them as strongly as you've suggested (overly constraining tests can make them brittle - prone to breakage for other reasons (eg: it might still be worth leaving off the specific stack offset, so that if stack layout changes these tests don't fail - other, pre-existing, tests specifically for stack layout should catch that)). And usually I'd advocate for not using inline assembly (as in the other test). But in this case, given how indirect the code is when trying to clobber the callee saved register - yeah, I think inline asm might've been more clear to me in that test.

Though I think my non-asm suggestion (https://reviews.llvm.org/D79978#2145753) did manage to reduce the test down to exactly one register in a fairly understandable way...
hmm, could be a bit simpler now that I understand better what's going on:

  void clobber();
  void sink(int);
  void test(bool b, int i) {
    if (b) { // adds a basic block
      clobber(); // encourage 'i' to be in a callee-save register by clobbering caller-save registers
      sink(i); // keeps 'i' alive
    }
  }

So that tests one callee save register, assuming LLVM doesn't decide to (gratuitously inefficiently - well, I guess it might not be that much less efficient) put 'i' on the stack or somewhere else instead.

Whereas @maskray's suggestion of using explicit register use/clobber does seem a bit more explicit/intentional, rather than having to coax a certain representation out of the compiler backend, I guess something like:

  void f3(bool b) {
    if (b) // adds a basic block
      // clobber some example callee-save registers to force them to be
      // callee-saved and to be described by cfi_offset directives
      asm("nop" : : : "r12","r13","r14","r15");
  }

(at least I'm not sure much else is required - @MaskRay - was the second inline asm needed? or the return values/parameters? Also perhaps just testing one register would be sufficient?
(less ordering issues, chance for unrelated backend changes to disturb this test)) ================ Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:3087-3091 + if (MBB.isEndSection()) { + for (const HandlerInfo &HI : Handlers) { + HI.Handler->endBasicBlock(MBB); + } + } ---------------- probably drop the braces here too (as above) ================ Comment at: llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll:3 +; RUN: llc -O0 %s --basicblock-sections=all -mtriple=x86_64-unknown-linux-gnu -filetype=asm --frame-pointer=none -o - | FileCheck --check-prefix=SECTIONS_NOFP_CFI %s +; RUN: llc -O0 %s --basicblock-sections=all -mtriple=x86_64-unknown-linux-gnu -filetype=obj --frame-pointer=all -o - | llvm-dwarfdump --debug-frame - | FileCheck --check-prefix=DEBUG_FRAME %s + ---------------- MaskRay wrote: > While `--eh-frame` is an alias for `--debug-frame`, I think using `--eh-frame` here is more appropriate. This tests .eh_frame, not .debug_frame. > Agreed - the check on line 51 should be updated too. llvm-dwarfdump's output is actually rendering an empty debug_frame and then after that it's rendering the eh_frame - the test is currently a bit misleading down there too. (& maybe llvm-dwarfdump's output is prone to such misleading testing, unfortunately - it prints the name of every section explicitly requested (& considers eh_frame and debug_frame explicitly requested when using --eh-frame or --debug-frame) even if they're empty, so you can easily get this sort of flow-on effect) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sun Jul 12 11:21:51 2020 From: llvm-commits at lists.llvm.org (Sjoerd Meijer via llvm-commits) Date: Sun, 12 Jul 2020 11:21:51 -0700 (PDT) Subject: [llvm] 4ff7ed3 - Revert "[Matrix] Tighten LangRef definitions and Verifier checks." 
Message-ID: <5f0b54bf.1c69fb81.a9bc0.d40e@mx.google.com> Author: Sjoerd Meijer Date: 2020-07-12T19:19:25+01:00 New Revision: 4ff7ed33108d9039fd960a4979b2e1503888582c URL: https://github.com/llvm/llvm-project/commit/4ff7ed33108d9039fd960a4979b2e1503888582c DIFF: https://github.com/llvm/llvm-project/commit/4ff7ed33108d9039fd960a4979b2e1503888582c.diff LOG: Revert "[Matrix] Tighten LangRef definitions and Verifier checks." This reverts commit f4d29d6e8c43cfd924d9d7cc1ac0c269b2788e75. Hm, some build bot failures, reverting it while I investigate that. Added: Modified: llvm/docs/LangRef.rst llvm/lib/IR/Verifier.cpp llvm/test/Verifier/matrix-intrinsics.ll Removed: ################################################################################ diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 02c92f1a4daa..86d315be74bc 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -15524,7 +15524,6 @@ The argument to this intrinsic must be a vector of floating-point values. Syntax: """"""" -This is an overloaded intrinsic. :: @@ -15549,20 +15548,17 @@ Matrix Intrinsics ----------------- Operations on matrixes requiring shape information (like number of rows/columns -or the memory layout) can be expressed using the matrix intrinsics. These -intrinsics require matrix dimensions to be passed as immediate arguments, and -matrixes are passed and returned as vectors. This means that for a ``R`` x -``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the -corresponding vector, with indices starting at 0. Currently column-major layout -is assumed. The intrinsics support both integer and floating point matrixes. +or the memory layout) can be expressed using the matrix intrinsics. Matrixes are +embedded in a flat vector and the intrinsics take the dimensions as arguments. +Currently column-major layout is assumed. The intrinsics support both integer +and floating point matrixes. 
'``llvm.matrix.transpose.*``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" -This is an overloaded intrinsic. :: @@ -15571,24 +15567,21 @@ This is an overloaded intrinsic. Overview: """"""""" -The '``llvm.matrix.transpose.*``' intrinsics treat %In as a x matrix -and return the transposed matrix in the result vector. +The '``llvm.matrix.transpose.*``' intrinsic treats %In as containing a matrix +with rows and columns and returns the transposed matrix embedded in +the result vector. Arguments: """""""""" -First argument %In is vector that corresponds to a x matrix. -Thus, arguments and correspond to the number of rows and columns, -respectively, and must be positive, constant integers. The returned vector must -have * elements, and have the same float or integer element type -as %In. +The and arguments must be constant integers. The vector argument +%In and the returned vector must have * elements. '``llvm.matrix.multiply.*``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" -This is an overloaded intrinsic. :: @@ -15597,19 +15590,18 @@ This is an overloaded intrinsic. Overview: """"""""" -The '``llvm.matrix.multiply.*``' intrinsics treat %A as a x -matrix, %B as a x matrix, and multiplies them. The result -matrix is returned in the result vector. +The '``llvm.matrix.multiply.*``' intrinsic treats %A as a matrix with +rows and columns, %B as a matrix with rows and +columns and multiplies them. The result matrix is returned embedded in the +result vector. Arguments: """""""""" -The first vector argument %A corresponds to a matrix with * -elements, and the second argument %B to a matrix with * -elements. Arguments , and must be positive, -constant integers. The returned vector must have * -elements. Vectors %A, %B, and the returned vector all have the same float or -integer element type. +The , and arguments must be constant +integers. 
The vector argument %A must have * elements, %B +must have * elements and the returned vector must have + * elements. '``llvm.matrix.column.major.load.*``' Intrinsic @@ -15617,7 +15609,6 @@ integer element type. Syntax: """"""" -This is an overloaded intrinsic. :: @@ -15627,26 +15618,22 @@ This is an overloaded intrinsic. Overview: """"""""" -The '``llvm.matrix.column.major.load.*``' intrinsics load a x -matrix using a stride of %Stride to compute the start address of the diff erent -columns. This allows for convenient loading of sub matrixes. If -is true, the intrinsic is considered a :ref:`volatile memory access -`. The result matrix is returned in the result vector. If the %Ptr -argument is known to be aligned to some boundary, this can be specified as an -attribute on the argument. +The '``llvm.matrix.column.major.load.*``' intrinsic loads a matrix with +rows and columns, using a stride of %Stride between columns. For two +consecutive columns A and B, %Stride refers to the distance (the number of +elements) between the start of column A and the start of column B. The result +matrix is returned embedded in the result vector. This allows for convenient +loading of sub matrixes. If is true, the intrinsic is considered +a :ref:`volatile memory access `. + +If the %Ptr argument is known to be aligned to some boundary, this can be +specified as an attribute on the argument. Arguments: """""""""" -The first argument %Ptr is a pointer type to the returned vector type, and -correponds to the start address to load from. The second argument %Stride is a -postive, constant integer with %Stride ``>=`` . %Stride is used to compute -the column memory addresses. I.e., for a column ``C``, its start memory -addresses is calculated with %Ptr + ``C`` * %Stride. The third Argument - is a boolean value. The fourth and fifth arguments, and -, correspond to the number of rows and columns, respectively, and must be -positive, constant integers. 
The returned vector must have * -elements. +The , and arguments must be constant integers. The +returned vector must have * elements. %Stride must be >= . The :ref:`align ` parameter attribute can be provided for the %Ptr arguments. @@ -15666,10 +15653,12 @@ Syntax: Overview: """"""""" -The '``llvm.matrix.column.major.store.*``' intrinsics store the x -matrix in %In to memory using a stride of %Stride between columns. If - is true, the intrinsic is considered a :ref:`volatile memory -access `. +The '``llvm.matrix.column.major.store.*``' intrinsic stores the matrix with + rows and columns embedded in %In, using a stride of %Stride +between columns. For two consecutive columns A and B, %Stride refers to the +distance (the number of elements) between the start of column A and the start +of column B. If is true, the intrinsic is considered a +:ref:`volatile memory access `. If the %Ptr argument is known to be aligned to some boundary, this can be specified as an attribute on the argument. @@ -15677,15 +15666,8 @@ specified as an attribute on the argument. Arguments: """""""""" -The first argument %In is a vector that corresponds to a x matrix -to be stored to memory. The second argument %Ptr is a pointer to the vector -type of %In, and is the start address of the matrix in memory. The third -argument %Stride is a positive, constant integer with %Stride ``>=`` . -%Stride is used to compute the column memory addresses. I.e., for a column -``C``, its start memory addresses is calculated with %Ptr + ``C`` * %Stride. -The fourth argument is a boolean value. The arguments and - correspond to the number of rows and columns, respectively, and must be -positive, constant integers. +The , , arguments must be constant integers. The +vector argument %In must have * elements. %Stride must be >= . The :ref:`align ` parameter attribute can be provided for the %Ptr arguments. 
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 994082fbdb7c..8fa87b748901 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -5006,77 +5006,36 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) { case Intrinsic::matrix_transpose: case Intrinsic::matrix_column_major_load: case Intrinsic::matrix_column_major_store: { - Function *IF = Call.getCalledFunction(); - ConstantInt *Stride = nullptr; ConstantInt *NumRows; ConstantInt *NumColumns; - VectorType *ResultTy; - Type *Op0ElemTy = nullptr; - Type *Op1ElemTy = nullptr; + VectorType *TypeToCheck; switch (ID) { case Intrinsic::matrix_multiply: NumRows = cast(Call.getArgOperand(2)); NumColumns = cast(Call.getArgOperand(4)); - ResultTy = cast(Call.getType()); - Op0ElemTy = - cast(Call.getArgOperand(0)->getType())->getElementType(); - Op1ElemTy = - cast(Call.getArgOperand(1)->getType())->getElementType(); + TypeToCheck = cast(Call.getType()); break; case Intrinsic::matrix_transpose: NumRows = cast(Call.getArgOperand(1)); NumColumns = cast(Call.getArgOperand(2)); - ResultTy = cast(Call.getType()); - Op0ElemTy = - cast(Call.getArgOperand(0)->getType())->getElementType(); + TypeToCheck = cast(Call.getType()); break; - case Intrinsic::matrix_column_major_load: { - Stride = dyn_cast(Call.getArgOperand(1)); + case Intrinsic::matrix_column_major_load: NumRows = cast(Call.getArgOperand(3)); NumColumns = cast(Call.getArgOperand(4)); - ResultTy = cast(Call.getType()); - auto *VecTy = cast( - cast(Call.getArgOperand(0)->getType())->getElementType()); - Op0ElemTy = VecTy->getElementType(); - } + TypeToCheck = cast(Call.getType()); break; - case Intrinsic::matrix_column_major_store: { - Stride = dyn_cast(Call.getArgOperand(2)); + case Intrinsic::matrix_column_major_store: NumRows = cast(Call.getArgOperand(4)); NumColumns = cast(Call.getArgOperand(5)); - ResultTy = cast(Call.getArgOperand(0)->getType()); - Op0ElemTy = - 
cast(Call.getArgOperand(0)->getType())->getElementType(); - auto *VecTy = cast( - cast(Call.getArgOperand(1)->getType())->getElementType()); - Op1ElemTy = VecTy->getElementType(); - } + TypeToCheck = cast(Call.getArgOperand(0)->getType()); break; default: llvm_unreachable("unexpected intrinsic"); } - - Assert(ResultTy->getElementType()->isIntegerTy() || - ResultTy->getElementType()->isFloatingPointTy(), - "Result type must be an integer or floating-point type!", IF); - - Assert(ResultTy->getElementType() == Op0ElemTy, - "Vector element type mismatch of the result and first operand " - "vector!", IF); - - if (Op1ElemTy) - Assert(ResultTy->getElementType() == Op1ElemTy, - "Vector element type mismatch of the result and second operand " - "vector!", IF); - - Assert(ResultTy->getNumElements() == + Assert(TypeToCheck->getNumElements() == NumRows->getZExtValue() * NumColumns->getZExtValue(), - "Result of a matrix operation does not fit in the returned vector!"); - - if (Stride) - Assert(Stride->getZExtValue() >= NumRows->getZExtValue(), - "Stride must be greater or equal than the number of rows!", IF); - + "result of a matrix operation does not fit in the returned vector"); break; } }; diff --git a/llvm/test/Verifier/matrix-intrinsics.ll b/llvm/test/Verifier/matrix-intrinsics.ll index 5afab26a48c5..6b2a4c501c66 100644 --- a/llvm/test/Verifier/matrix-intrinsics.ll +++ b/llvm/test/Verifier/matrix-intrinsics.ll @@ -3,9 +3,9 @@ declare <4 x float> @llvm.matrix.transpose.v4f32(<4 x float>, i32, i32) define <4 x float> @transpose(<4 x float> %m, i32 %arg) { ; CHECK: assembly parsed, but does not verify as correct! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! 
+; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: immarg operand has non-immediate parameter ; CHECK-NEXT: i32 %arg ; CHECK-NEXT: %result.3 = call <4 x float> @llvm.matrix.transpose.v4f32(<4 x float> %result.2, i32 %arg, i32 2) @@ -22,9 +22,9 @@ define <4 x float> @transpose(<4 x float> %m, i32 %arg) { declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32) define <4 x float> @multiply(<4 x float> %m, i32 %arg) { -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: immarg operand has non-immediate parameter ; CHECK-NEXT: i32 %arg ; CHECK-NEXT: %result.3 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %result.2, <4 x float> %m, i32 %arg, i32 2, i32 1) @@ -38,9 +38,9 @@ define <4 x float> @multiply(<4 x float> %m, i32 %arg) { declare <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>*, i64, i1, i32, i32) declare <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>*, i64, i1, i32, i32) define <4 x float> @column.major_load(<4 x float>* %m, <6 x float>* %n, i32 %arg) { -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! 
+; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector ; CHECK-NEXT: immarg operand has non-immediate parameter ; CHECK-NEXT: i32 %arg ; CHECK-NEXT: %result.3 = call <6 x float> @llvm.matrix.column.major.load.v6f32.p0v6f32(<6 x float>* %n, i64 2, i1 true, i32 3, i32 %arg) @@ -54,110 +54,13 @@ define <4 x float> @column.major_load(<4 x float>* %m, <6 x float>* %n, i32 %arg declare void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float>, <4 x float>*, i64, i1, i32, i32) declare void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float>, <6 x float>*, i64, i1, i32, i32) define void @column.major_store(<4 x float>* %m, <6 x float>* %n, i64 %arg) { -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! -; CHECK-NEXT: Result of a matrix operation does not fit in the returned vector! 
+; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector +; CHECK-NEXT: result of a matrix operation does not fit in the returned vector call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 0, i1 false, i32 0, i32 0) call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 2, i1 false, i32 1, i32 2) call void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float> zeroinitializer, <6 x float>* %n, i64 2, i1 false, i32 3, i32 3) call void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float> zeroinitializer, <6 x float>* %n, i64 %arg, i1 false, i32 3, i32 3) ret void } - -declare <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32>, i32, i32) -declare <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float>, i32, i32) - -define <4 x float> @transpose_mixed_types(<4 x float> %fvec, <4 x i32> %ivec, i32 %arg) { -; -; CHECK-NEXT: Intrinsic has incorrect argument type! -; CHECK-NEXT: <4 x float> (<4 x i32>, i32, i32)* @llvm.matrix.transpose.v4f32.v4i32 -; CHECK-NEXT: Intrinsic has incorrect argument type! 
-; CHECK-NEXT: <4 x i32> (<4 x float>, i32, i32)* @llvm.matrix.transpose.v4i32.v4f32 -; - %result.0 = call <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32> %ivec, i32 0, i32 0) - %result.1 = call <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float> %result.0, i32 3, i32 2) - ret <4 x float> %result.0 -} - -declare <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32) -declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32>, <4 x float>, i32, i32, i32) -declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float>, <4 x i32>, i32, i32, i32) -declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32) - -define <4 x float> @multiply_mixed_types(<4 x i32> %ivec, <4 x float> %fvec, i32 %arg) { -; -; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! -; CHECK-NEXT: <4 x i32> (<4 x float>, <4 x float>, i32, i32, i32)* @llvm.matrix.multiply.v4i32.v4f32.v4f32 -; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! -; CHECK-NEXT: <4 x float> (<4 x i32>, <4 x float>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4i32.v4f32 -; CHECK-NEXT: Vector element type mismatch of the result and second operand vector! -; CHECK-NEXT: <4 x float> (<4 x float>, <4 x i32>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4f32.v4i32 -; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! 
-; CHECK-NEXT: <4 x float> (<4 x i32>, <4 x i32>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4i32.v4i32 -; - %result.0 = call <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float> %fvec, <4 x float> %fvec, i32 2, i32 2, i32 2) - %result.1 = call <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32> %result.0, <4 x float> %fvec, i32 2, i32 2, i32 2) - %result.2 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float> %fvec, <4 x i32> %ivec, i32 2, i32 2, i32 2) - %result.3 = call <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32> %ivec, <4 x i32> %ivec, i32 2, i32 2, i32 2) - ret <4 x float> %result.3 -} - -declare <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4i32(<4 x i32>*, i64, i1, i32, i32) -declare <4 x i32> @llvm.matrix.column.major.load.v4i32.p0v4f32(<4 x float>*, i64, i1, i32, i32) - -define <4 x float> @column.major_load_mixed_types(<4 x i32>* %m, <4 x float>* %n, i32 %arg) { -; -; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! -; CHECK-NEXT: <4 x float> (<4 x i32>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.p0v4i32 -; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! -; CHECK-NEXT: <4 x i32> (<4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4i32.p0v4f32 -; - %result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4i32(<4 x i32>* %m, i64 2, i1 false, i32 2, i32 2) - %result.1 = call <4 x i32> @llvm.matrix.column.major.load.v4i32.p0v4f32(<4 x float>* %n, i64 2, i1 false, i32 2, i32 2) - ret <4 x float> %result.0 -} - -declare void @llvm.matrix.column.major.store.v4i32.p0v4f32(<4 x i32>, <4 x float>*, i64, i1, i32, i32) -declare void @llvm.matrix.column.major.store.v4f32.p0v4i32(<4 x float>, <4 x i32>*, i64, i1, i32, i32) - -define void @column.major_store_mixed_types(<4 x float>* %m, <4 x i32>* %n, i64 %arg) { -; -; CHECK-NEXT: Vector element type mismatch of the result and second operand vector! 
-; CHECK-NEXT: void (<4 x i32>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4i32.p0v4f32 -; CHECK-NEXT: Vector element type mismatch of the result and second operand vector! -; CHECK-NEXT: void (<4 x float>, <4 x i32>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.p0v4i32 -; - call void @llvm.matrix.column.major.store.v4i32.p0v4f32(<4 x i32> zeroinitializer, <4 x float>* %m, i64 2, i1 false, i32 2, i32 2) - call void @llvm.matrix.column.major.store.v4f32.p0v4i32(<4 x float> zeroinitializer, <4 x i32>* %n, i64 2, i1 false, i32 2, i32 2) - ret void -} - -declare void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float*>, <4 x float>*, i64, i1, i32, i32) - -define void @column.major_store_non_int_float_type(<4 x float>* %m, <4 x float>* %n, i64 %arg) { -; -; CHECK-NEXT: Result type must be an integer or floating-point type! -; CHECK-NEXT: void (<4 x float*>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4p0f32.p0v4f32 -; - call void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float*> zeroinitializer, <4 x float>* %n, i64 2, i1 false, i32 2, i32 2) - ret void -} - -define <4 x float> @column.major_load_stride_too_small(<4 x float>* %m, i32 %arg) { -; -; CHECK-NEXT: Stride must be greater or equal than the number of rows! -; CHECK-NEXT: <4 x float> (<4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.p0v4f32 -; - %result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>* %m, i64 1, i1 false, i32 2, i32 2) - ret <4 x float> %result.1 -} - -define void @column.major_store_stride_too_small(<4 x float>* %m, i64 %arg) { -; -; CHECK-NEXT: Stride must be greater or equal than the number of rows! 
-; CHECK-NEXT: void (<4 x float>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.p0v4f32 -; - call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 1, i1 false, i32 2, i32 2) - ret void -} From llvm-commits at lists.llvm.org Sun Jul 12 11:23:21 2020 From: llvm-commits at lists.llvm.org (David Blaikie via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 18:23:21 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <3b34b938f3e9bfa9f3a4c6cfc14475ed@localhost.localdomain> dblaikie added a comment. In D79978#2146131 , @tmsriram wrote: > Let me try a slightly different approach here. It is not clear to us what more is needed to land the patch. In the interests of resolving conflict : > > 1. I will also explicitly test assembly too for callee saved registers with bb sections when they are being pushed and popped. Just on this point - it's not clear to me the motivation for the difference in testing between the two tests (one testing assembly, the other testing LLVM's MIR) - was there some particular difference/detail that these different strategies enabled testing? Otherwise, yeah, I'd probably just go with only testing assembly in both cases, for consistency/simplicity? (any variation in tests always makes me wonder "what's the reason for this difference? 
Must be something important or they'd be the same and so I'm not understanding some important difference")

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79978/new/

https://reviews.llvm.org/D79978

From llvm-commits at lists.llvm.org Sun Jul 12 11:43:08 2020
From: llvm-commits at lists.llvm.org (Hal Finkel via Phabricator via llvm-commits)
Date: Sun, 12 Jul 2020 18:43:08 +0000 (UTC)
Subject: [PATCH] D83635: [OpenMPOpt][WIP] Merge parallel regions
In-Reply-To: 
References: 
Message-ID: <4f25c253505e98cb436044c55e6c7acd@localhost.localdomain>

hfinkel added a comment.

It's great that you're working on this. It's very important that we allow people to write code, structured and decomposed in a way that makes sense from an engineering and maintenance perspective, and have the compiler combine things later to avoid unnecessary overhead. This is just as much true for expressions of parallelism as it is for other aspects of the code.

================
Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:744
+ // Merge maximal number of succesive CIs when in-between
+ // instructions are safe to execute in parallel
+ for (Instruction &I : *BB) {
----------------
Comments in LLVM should be complete sentences and end with appropriate punctuation (here and a few other places).

================
Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:751
+
+ if (isSafeToSpeculativelyExecute(&I, &I, DT)) {
+ continue;
----------------
Unneeded {} (here and a few other places).

================
Comment at: llvm/test/Transforms/OpenMP/parallel_region_merging.ll:13
+; void merge_all() {
+; #pragma omp parallel
+; {
----------------
Do these need to say 'nowait' in order to actually match the code?
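
A minimal sketch of the two style points above (comments written as complete sentences with punctuation, and no braces around a single-statement body), assuming nothing about the patch itself: this is not code from D83635, and `isSafeToSkip` and `countUnsafe` are hypothetical names used only for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for a safety query; it is not an LLVM API.
static bool isSafeToSkip(int Value) { return Value % 2 == 0; }

// Counts the values that are not safe to skip.
static int countUnsafe(const std::vector<int> &Values) {
  int NumUnsafe = 0;
  // Skip every value that is safe, and count the remaining ones.
  for (int Value : Values) {
    // A single-statement body is written without braces.
    if (isSafeToSkip(Value))
      continue;
    ++NumUnsafe;
  }
  return NumUnsafe;
}
```

The loop keeps its braces because its body holds more than one statement; only the single-statement `if` drops them.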
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83635/new/ https://reviews.llvm.org/D83635 From llvm-commits at lists.llvm.org Sun Jul 12 11:56:30 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 18:56:30 +0000 (UTC) Subject: [PATCH] D83244: [lld] Don't error out on relocations in .gcc_except_table to discarded sections. In-Reply-To: References: Message-ID: MaskRay requested changes to this revision. MaskRay added a comment. This revision now requires changes to proceed. I think this is invalid. See more on https://bugs.llvm.org/show_bug.cgi?id=46675#c1 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83244/new/ https://reviews.llvm.org/D83244 From llvm-commits at lists.llvm.org Sun Jul 12 11:58:02 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 18:58:02 +0000 (UTC) Subject: [PATCH] D83567: [DAGCombiner] allow load/store merging if pairs can be rotated into place In-Reply-To: References: Message-ID: <440adb169fb3fc29b9d8ed1553784135@localhost.localdomain> spatel added a comment. In D83567#2146341 , @xbolva00 wrote: > Can you create new PR for aarch64’s missed opportunity to use rev16? https://bugs.llvm.org/show_bug.cgi?id=46694 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83567/new/ https://reviews.llvm.org/D83567 From llvm-commits at lists.llvm.org Sun Jul 12 12:30:55 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 19:30:55 +0000 (UTC) Subject: [PATCH] D83644: [AArch64][ELF] Support FDE references more than +/-2GB range for AArch64 large code model In-Reply-To: References: Message-ID: <943ac742a1ce531367a0c161247fe23d@localhost.localdomain> MaskRay added inline comments. 
================
Comment at: llvm/test/MC/ELF/cfi-large-model.s:8
 // RUN: -o - | llvm-readobj -S --sd | FileCheck --check-prefix=CHECK-PPC %s
-
+// RUN: llvm-mc -filetype=obj -triple aarch64-linux-gnu -large-code-model %s \
+// RUN: -o - | llvm-readobj -S --sd | FileCheck --check-prefix=CHECK-ARM64 %s
----------------
`-triple aarch64` is sufficient. This tests generic ELF behavior.

Add -triple aarch64_be.

You can move aarch64 before x86_64 to be closer (oh, yes, x86_64->powerpc64le is an inversion pair) to an alphabetical order.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83644/new/

https://reviews.llvm.org/D83644

From llvm-commits at lists.llvm.org Sun Jul 12 12:41:41 2020
From: llvm-commits at lists.llvm.org (Alexey Lapshin via llvm-commits)
Date: Sun, 12 Jul 2020 22:41:41 +0300
Subject: Re: [PATCH] D82085: [TRE] allow TRE for non-capturing calls.
In-Reply-To: <294696123008e665e0b813c136c34d09@localhost.localdomain>
References: <294696123008e665e0b813c136c34d09@localhost.localdomain>
Message-ID: <1594582901.988822064@f313.i.mail.ru>

Hi David,

Thank you for the information. I will revert that commit and work on the fix.

Thank you, Alexey.

>Sunday, 12 July 2020, 12:44 +03:00 from David Zarzycki via Phabricator:
>
>davezarzycki added a comment.
>
>Hello. I have an auto-bisecting multi-stage bot that is failing on two after this change. Can we please revert this or commit a quick fix?
> >  FAIL: Clang :: CXX/class/class.compare/class.spaceship/p1.cpp (6232 of 64222) >  ******************** TEST 'Clang :: CXX/class/class.compare/class.spaceship/p1.cpp' FAILED ******************** >  Script: >  -- >  : 'RUN: at line 1'; /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions >  -- >  Exit Code: 134 >   >  Command Output (stderr): >  -- >  clang: /home/dave/s/lp/clang/lib/Basic/SourceManager.cpp:917: clang::FileID clang::SourceManager::getFileIDLoaded(unsigned int) const: Assertion `0 && "Invalid SLocOffset or bad function choice"' failed. >  PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. >  Stack dump: >  0. Program arguments: /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions >  1. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:127:38: current parser token ',' >  2. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:39:1: parsing namespace 'Deletedness' >  3. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:123:12: parsing function body 'Deletedness::g' >  4. 
/home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:123:12: in compound statement ('{}') >   #0 0x000000000359273f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/tmp/_update_lc/t/bin/clang+0x359273f) >   #1 0x0000000003590912 llvm::sys::RunSignalHandlers() (/tmp/_update_lc/t/bin/clang+0x3590912) >   #2 0x0000000003592bb5 SignalHandler(int) (/tmp/_update_lc/t/bin/clang+0x3592bb5) >   #3 0x00007ffff7fa6a90 __restore_rt (/lib64/libpthread.so.0+0x14a90) >   #4 0x00007ffff7b3da25 raise (/lib64/libc.so.6+0x3ca25) >   #5 0x00007ffff7b26895 abort (/lib64/libc.so.6+0x25895) >   #6 0x00007ffff7b26769 _nl_load_domain.cold (/lib64/libc.so.6+0x25769) >   #7 0x00007ffff7b35e86 (/lib64/libc.so.6+0x34e86) >   #8 0x000000000375636c clang::SourceManager::getFileIDLoaded(unsigned int) const (/tmp/_update_lc/t/bin/clang+0x375636c) >   #9 0x0000000003ee0bbb clang::VerifyDiagnosticConsumer::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/tmp/_update_lc/t/bin/clang+0x3ee0bbb) >  #10 0x00000000037501ab clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (/tmp/_update_lc/t/bin/clang+0x37501ab) >  #11 0x0000000003749fca clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool) (/tmp/_update_lc/t/bin/clang+0x3749fca) >  #12 0x0000000004df0c60 clang::Sema::EmitCurrentDiagnostic(unsigned int) (/tmp/_update_lc/t/bin/clang+0x4df0c60) >  #13 0x0000000005092783 (anonymous namespace)::DefaultedComparisonAnalyzer::visitBinaryOperator(clang::OverloadedOperatorKind, llvm::ArrayRef, (anonymous namespace)::DefaultedComparisonSubobject, clang::OverloadCandidateSet*) (/tmp/_update_lc/t/bin/clang+0x5092783) >  #14 0x0000000005091dba (anonymous namespace)::DefaultedComparisonAnalyzer::visitExpandedSubobject(clang::QualType, (anonymous namespace)::DefaultedComparisonSubobject) (/tmp/_update_lc/t/bin/clang+0x5091dba) >  #15 0x0000000005091b86 (anonymous namespace)::DefaultedComparisonVisitor<(anonymous namespace)::DefaultedComparisonAnalyzer, 
(anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonSubobject>::visitSubobjects((anonymous namespace)::DefaultedComparisonInfo&, clang::CXXRecordDecl*, clang::Qualifiers) (/tmp/_update_lc/t/bin/clang+0x5091b86) >  #16 0x0000000005058c8c (anonymous namespace)::DefaultedComparisonAnalyzer::visit() (/tmp/_update_lc/t/bin/clang+0x5058c8c) >  #17 0x000000000505ab22 clang::Sema::DiagnoseDeletedDefaultedFunction(clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x505ab22) >  #18 0x00000000053e60ed clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::UnresolvedSetImpl const&, clang::Expr*, clang::Expr*, bool, bool, clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x53e60ed) >  #19 0x000000000514270a BuildOverloadedBinOp(clang::Sema&, clang::Scope*, clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x514270a) >  #20 0x00000000050fbf49 clang::Sema::ActOnBinOp(clang::Scope*, clang::SourceLocation, clang::tok::TokenKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x50fbf49) >  #21 0x0000000004d52ccc clang::Parser::ParseRHSOfBinaryExpression(clang::ActionResult, clang::prec::Level) (/tmp/_update_lc/t/bin/clang+0x4d52ccc) >  #22 0x0000000004d51be9 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51be9) >  #23 0x0000000004d60dba clang::Parser::ParseExpressionList(llvm::SmallVectorImpl&, llvm::SmallVectorImpl&, llvm::function_ref) (/tmp/_update_lc/t/bin/clang+0x4d60dba) >  #24 0x0000000004d542d9 clang::Parser::ParsePostfixExpressionSuffix(clang::ActionResult) (/tmp/_update_lc/t/bin/clang+0x4d542d9) >  #25 0x0000000004d55b95 clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) (/tmp/_update_lc/t/bin/clang+0x4d55b95) >  #26 0x0000000004d51b89 
clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51b89) >  #27 0x0000000004d51ac9 clang::Parser::ParseExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51ac9) >  #28 0x0000000004d78368 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) (/tmp/_update_lc/t/bin/clang+0x4d78368) >  #29 0x0000000004d76ba0 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) (/tmp/_update_lc/t/bin/clang+0x4d76ba0) >  #30 0x0000000004d76614 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d76614) >  #31 0x0000000004d7ecd2 clang::Parser::ParseCompoundStatementBody(bool) (/tmp/_update_lc/t/bin/clang+0x4d7ecd2) >  #32 0x0000000004d7fcd0 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/tmp/_update_lc/t/bin/clang+0x4d7fcd0) >  #33 0x0000000004cfacc0 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/tmp/_update_lc/t/bin/clang+0x4cfacc0) >  #34 0x0000000004d28f2d clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/tmp/_update_lc/t/bin/clang+0x4d28f2d) >  #35 0x0000000004cf9f32 clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9f32) >  #36 0x0000000004cf9938 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9938) >  #37 0x0000000004cf86fc 
clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf86fc) >  #38 0x0000000004d02c15 clang::Parser::ParseInnerNamespace(llvm::SmallVector const&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) (/tmp/_update_lc/t/bin/clang+0x4d02c15) >  #39 0x0000000004d0251a clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) (/tmp/_update_lc/t/bin/clang+0x4d0251a) >  #40 0x0000000004d22f0a clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d22f0a) >  #41 0x0000000004cf7e39 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf7e39) >  #42 0x0000000004cf6858 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr&, bool) (/tmp/_update_lc/t/bin/clang+0x4cf6858) >  #43 0x0000000004cf16ed clang::ParseAST(clang::Sema&, bool, bool) (/tmp/_update_lc/t/bin/clang+0x4cf16ed) >  #44 0x0000000003e3eb21 clang::FrontendAction::Execute() (/tmp/_update_lc/t/bin/clang+0x3e3eb21) >  #45 0x0000000003dba0e3 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/tmp/_update_lc/t/bin/clang+0x3dba0e3) >  #46 0x0000000003ee796b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/tmp/_update_lc/t/bin/clang+0x3ee796b) >  #47 0x0000000002244636 cc1_main(llvm::ArrayRef, char const*, void*) (/tmp/_update_lc/t/bin/clang+0x2244636) >  #48 0x000000000224297d ExecuteCC1Tool(llvm::SmallVectorImpl&) (/tmp/_update_lc/t/bin/clang+0x224297d) >  #49 0x0000000002242619 main (/tmp/_update_lc/t/bin/clang+0x2242619) >  #50 0x00007ffff7b28042 __libc_start_main (/lib64/libc.so.6+0x27042) >  #51 0x000000000223f8ce _start (/tmp/_update_lc/t/bin/clang+0x223f8ce) >  
/tmp/_update_lc/t/tools/clang/test/CXX/class/class.compare/class.spaceship/Output/p1.cpp.script: line 1: 4146089 Aborted /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions >   >  -- >   >  ******************** >  Testing: 0.. >  FAIL: Clang :: CXX/class/class.compare/class.eq/p2.cpp (6242 of 64222) >  ******************** TEST 'Clang :: CXX/class/class.compare/class.eq/p2.cpp' FAILED ******************** >  Script: >  -- >  : 'RUN: at line 1'; /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp >  -- >  Exit Code: 134 >   >  Command Output (stderr): >  -- >  clang: /home/dave/s/lp/clang/lib/Basic/SourceManager.cpp:917: clang::FileID clang::SourceManager::getFileIDLoaded(unsigned int) const: Assertion `0 && "Invalid SLocOffset or bad function choice"' failed. >  PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. >  Stack dump: >  0. Program arguments: /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp >  1. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:47:30: current parser token ')' >  2. /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:30:13: parsing function body 'test' >  3. 
/home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:30:13: in compound statement ('{}') >   #0 0x000000000359273f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/tmp/_update_lc/t/bin/clang+0x359273f) >   #1 0x0000000003590912 llvm::sys::RunSignalHandlers() (/tmp/_update_lc/t/bin/clang+0x3590912) >   #2 0x0000000003592bb5 SignalHandler(int) (/tmp/_update_lc/t/bin/clang+0x3592bb5) >   #3 0x00007ffff7fa6a90 __restore_rt (/lib64/libpthread.so.0+0x14a90) >   #4 0x00007ffff7b3da25 raise (/lib64/libc.so.6+0x3ca25) >   #5 0x00007ffff7b26895 abort (/lib64/libc.so.6+0x25895) >   #6 0x00007ffff7b26769 _nl_load_domain.cold (/lib64/libc.so.6+0x25769) >   #7 0x00007ffff7b35e86 (/lib64/libc.so.6+0x34e86) >   #8 0x000000000375636c clang::SourceManager::getFileIDLoaded(unsigned int) const (/tmp/_update_lc/t/bin/clang+0x375636c) >   #9 0x0000000003ee0bbb clang::VerifyDiagnosticConsumer::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/tmp/_update_lc/t/bin/clang+0x3ee0bbb) >  #10 0x00000000037501ab clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (/tmp/_update_lc/t/bin/clang+0x37501ab) >  #11 0x0000000003749fca clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool) (/tmp/_update_lc/t/bin/clang+0x3749fca) >  #12 0x0000000004df0c60 clang::Sema::EmitCurrentDiagnostic(unsigned int) (/tmp/_update_lc/t/bin/clang+0x4df0c60) >  #13 0x00000000050928b7 (anonymous namespace)::DefaultedComparisonAnalyzer::visitBinaryOperator(clang::OverloadedOperatorKind, llvm::ArrayRef, (anonymous namespace)::DefaultedComparisonSubobject, clang::OverloadCandidateSet*) (/tmp/_update_lc/t/bin/clang+0x50928b7) >  #14 0x0000000005091dba (anonymous namespace)::DefaultedComparisonAnalyzer::visitExpandedSubobject(clang::QualType, (anonymous namespace)::DefaultedComparisonSubobject) (/tmp/_update_lc/t/bin/clang+0x5091dba) >  #15 0x0000000005091b86 (anonymous namespace)::DefaultedComparisonVisitor<(anonymous namespace)::DefaultedComparisonAnalyzer, (anonymous 
namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonSubobject>::visitSubobjects((anonymous namespace)::DefaultedComparisonInfo&, clang::CXXRecordDecl*, clang::Qualifiers) (/tmp/_update_lc/t/bin/clang+0x5091b86) >  #16 0x0000000005058c8c (anonymous namespace)::DefaultedComparisonAnalyzer::visit() (/tmp/_update_lc/t/bin/clang+0x5058c8c) >  #17 0x000000000505ab22 clang::Sema::DiagnoseDeletedDefaultedFunction(clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x505ab22) >  #18 0x00000000053e60ed clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::UnresolvedSetImpl const&, clang::Expr*, clang::Expr*, bool, bool, clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x53e60ed) >  #19 0x000000000514270a BuildOverloadedBinOp(clang::Sema&, clang::Scope*, clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x514270a) >  #20 0x00000000050fbf49 clang::Sema::ActOnBinOp(clang::Scope*, clang::SourceLocation, clang::tok::TokenKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x50fbf49) >  #21 0x0000000004d52ccc clang::Parser::ParseRHSOfBinaryExpression(clang::ActionResult, clang::prec::Level) (/tmp/_update_lc/t/bin/clang+0x4d52ccc) >  #22 0x0000000004d51be9 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51be9) >  #23 0x0000000004d60dba clang::Parser::ParseExpressionList(llvm::SmallVectorImpl&, llvm::SmallVectorImpl&, llvm::function_ref) (/tmp/_update_lc/t/bin/clang+0x4d60dba) >  #24 0x0000000004d4b29c clang::Parser::ParseCXXTypeConstructExpression(clang::DeclSpec const&) (/tmp/_update_lc/t/bin/clang+0x4d4b29c) >  #25 0x0000000004d57617 clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) (/tmp/_update_lc/t/bin/clang+0x4d57617) >  #26 0x0000000004d51b89 
clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51b89) >  #27 0x0000000004d51ac9 clang::Parser::ParseExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51ac9) >  #28 0x0000000004d78368 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) (/tmp/_update_lc/t/bin/clang+0x4d78368) >  #29 0x0000000004d76ba0 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) (/tmp/_update_lc/t/bin/clang+0x4d76ba0) >  #30 0x0000000004d76614 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d76614) >  #31 0x0000000004d7ecd2 clang::Parser::ParseCompoundStatementBody(bool) (/tmp/_update_lc/t/bin/clang+0x4d7ecd2) >  #32 0x0000000004d7fcd0 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/tmp/_update_lc/t/bin/clang+0x4d7fcd0) >  #33 0x0000000004cfacc0 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/tmp/_update_lc/t/bin/clang+0x4cfacc0) >  #34 0x0000000004d28f2d clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/tmp/_update_lc/t/bin/clang+0x4d28f2d) >  #35 0x0000000004cf9f32 clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9f32) >  #36 0x0000000004cf9938 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9938) >  #37 0x0000000004cf86fc 
clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf86fc) >  #38 0x0000000004cf6858 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr&, bool) (/tmp/_update_lc/t/bin/clang+0x4cf6858) >  #39 0x0000000004cf16ed clang::ParseAST(clang::Sema&, bool, bool) (/tmp/_update_lc/t/bin/clang+0x4cf16ed) >  #40 0x0000000003e3eb21 clang::FrontendAction::Execute() (/tmp/_update_lc/t/bin/clang+0x3e3eb21) >  #41 0x0000000003dba0e3 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/tmp/_update_lc/t/bin/clang+0x3dba0e3) >  #42 0x0000000003ee796b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/tmp/_update_lc/t/bin/clang+0x3ee796b) >  #43 0x0000000002244636 cc1_main(llvm::ArrayRef, char const*, void*) (/tmp/_update_lc/t/bin/clang+0x2244636) >  #44 0x000000000224297d ExecuteCC1Tool(llvm::SmallVectorImpl&) (/tmp/_update_lc/t/bin/clang+0x224297d) >  #45 0x0000000002242619 main (/tmp/_update_lc/t/bin/clang+0x2242619) >  #46 0x00007ffff7b28042 __libc_start_main (/lib64/libc.so.6+0x27042) >  #47 0x000000000223f8ce _start (/tmp/_update_lc/t/bin/clang+0x223f8ce) >  /tmp/_update_lc/t/tools/clang/test/CXX/class/class.compare/class.eq/Output/p2.cpp.script: line 1: 4146047 Aborted /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp >   >  -- >   >  ******************** >  Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
>  ******************** >  Failed Tests (2): >    Clang :: CXX/class/class.compare/class.eq/p2.cpp >    Clang :: CXX/class/class.compare/class.spaceship/p1.cpp >  >  >  Testing Time: 117.51s >    Unsupported : 12906 >    Passed : 51214 >    Expectedly Failed: 100 >    Failed : 2 >  FAILED: CMakeFiles/check-all >  cd /tmp/_update_lc/t && /usr/bin/python3.8 /tmp/_update_lc/t/./bin/llvm-lit -sv --param USE_Z3_SOLVER=0 /tmp/_update_lc/t/tools/clang/test /tmp/_update_lc/t/tools/lld/test /tmp/_update_lc/t/tools/lldb/test /tmp/_update_lc/t/utils/lit /tmp/_update_lc/t/test >  ninja: build stopped: subcommand failed. >  + do_error 'FAILURE -- STAGE TWO BUILD of LLVM' 12 >  + echo FAILURE -- STAGE TWO BUILD of LLVM >  FAILURE -- STAGE TWO BUILD of LLVM >  + exit 12 > > >Repository: >  rG LLVM Github Monorepo > >CHANGES SINCE LAST ACTION >  https://reviews.llvm.org/D82085/new/ > >https://reviews.llvm.org/D82085 > > > -- Alexey Lapshin From llvm-commits at lists.llvm.org Sun Jul 12 12:46:12 2020 From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 19:46:12 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <70407cfe0c61d1be45de169ddc97ac76@localhost.localdomain> MaskRay added a comment. 
In D79978#2146362 , @dblaikie wrote: > Whereas @maskray's suggestion of using explicit register use/clobber does seem a bit more explicit/intentional, rather than having to coax a certain representation out of the compiler backend, I guess something like: > > void f3(bool b) { > if (b) // adds a basic block > // clobber some example callee-save registers to force them to be callee-saved and to be described by cfi_offset directives > asm("nop" : : : "r12","r13","r14","r15"); > } > > > (at least I'm not sure much else is required - @MaskRay - was the second inline asm needed? or the return values/parameters? Also perhaps just testing one register would be sufficient? (less ordering issues, chance for unrelated backend changes to disturb this test)) The second inline asm in the example (in a previous comment of mine; pasted below for your convenience) is not needed. When I wrote it, I was thinking whether we can exercise more code in CFIInstrInserter, i.e. not just that (all non-entry basic blocks can inherit CFI from the entry), but also that: if a non-entry basic block has updated its CFI information, subsequent basic blocks can pick up that delta part (relative to the entry basic block) int f3(int i, int j, int k) { if (i == 0) { // adds a basic block asm("nop" : : : "rdi","rsi","rdx","rbp","r12","r13","r14","r15","memory"); // there is a .cfi_offset for each of rbp,r12,r13,r14,r15 return j; } if (j == 0) { // adds a basic block // Ideally there is some code construct here which can reliably alter CFI, so that we can test that CFIInstrInserter can handle the delta part. 
// Unfortunately this cannot be inline asm `subq 100, %rsp` as that does not generate CFI_INSTRUCTION which can be tracked by CFIInstrInserter asm("xchg %%ax,%%ax" : : : "rdi","rsi","rdx","rbp","r14","r15","memory"); // r12 and r13 are not clobbered but the current implementation adds .cfi_offset for both r12 and r13 return k; } return i; } If we can find such a code construct, it'd give me more confidence that we are updating CFIInstrInserter correctly, and it'd be more difficult to break the basic block sections code. And yes, I used registers because I hope the order of these CFI_INSTRUCTION registers is relatively stable so `CHECK-NEXT: CFI_INSTRUCTION .....` can be a stronger test. As to offsets, I don't know codegen well enough to confidently say that they are stable, but I hope simple code like this (with very specific spill slots) will not cause offsets to be updated abruptly. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sun Jul 12 12:58:18 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via llvm-commits) Date: Sun, 12 Jul 2020 12:58:18 -0700 (PDT) Subject: [llvm] 4458973 - [InstCombine] fold mul of zext/sext bools to 'and' Message-ID: <5f0b6b5a.1c69fb81.92b11.eed7@mx.google.com> Author: Sanjay Patel Date: 2020-07-12T15:56:26-04:00 New Revision: 445897334741c53e98f8044f5f33ab1e888b3818 URL: https://github.com/llvm/llvm-project/commit/445897334741c53e98f8044f5f33ab1e888b3818 DIFF: https://github.com/llvm/llvm-project/commit/445897334741c53e98f8044f5f33ab1e888b3818.diff LOG: [InstCombine] fold mul of zext/sext bools to 'and' Similar to rG40fcc42: The base case only worked because we were relying on a poison-unsafe select transform; if that is fixed, we would regress on patterns like this. The extra use tests show that the select transform can't be applied consistently. 
So it may be a regression to have an extra instruction on 1 test, but that result was not created safely and does not happen reliably. Added: Modified: llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp llvm/test/Transforms/InstCombine/mul.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp index 2965103d4029..c6233a68847d 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp @@ -376,6 +376,16 @@ Instruction *InstCombiner::visitMul(BinaryOperator &I) { Value *And = Builder.CreateAnd(X, Y, "mulbool"); return CastInst::Create(Instruction::ZExt, And, I.getType()); } + // (sext bool X) * (zext bool Y) --> sext (and X, Y) + // (zext bool X) * (sext bool Y) --> sext (and X, Y) + // Note: -1 * 1 == 1 * -1 == -1 + if (((match(Op0, m_SExt(m_Value(X))) && match(Op1, m_ZExt(m_Value(Y)))) || + (match(Op0, m_ZExt(m_Value(X))) && match(Op1, m_SExt(m_Value(Y))))) && + X->getType()->isIntOrIntVectorTy(1) && X->getType() == Y->getType() && + (Op0->hasOneUse() || Op1->hasOneUse())) { + Value *And = Builder.CreateAnd(X, Y, "mulbool"); + return CastInst::Create(Instruction::SExt, And, I.getType()); + } // (bool X) * Y --> X ? Y : 0 // Y * (bool X) --> X ? 
Y : 0 diff --git a/llvm/test/Transforms/InstCombine/mul.ll b/llvm/test/Transforms/InstCombine/mul.ll index 9d1b8ad457e4..059b18d30b90 100644 --- a/llvm/test/Transforms/InstCombine/mul.ll +++ b/llvm/test/Transforms/InstCombine/mul.ll @@ -247,8 +247,8 @@ define i32 @mul_bools_sext_use3(i1 %x, i1 %y) { define <3 x i32> @mul_bools_mixed_ext(<3 x i1> %x, <3 x i1> %y) { ; CHECK-LABEL: @mul_bools_mixed_ext( -; CHECK-NEXT: [[NARROW:%.*]] = and <3 x i1> [[X:%.*]], [[Y:%.*]] -; CHECK-NEXT: [[R:%.*]] = sext <3 x i1> [[NARROW]] to <3 x i32> +; CHECK-NEXT: [[MULBOOL:%.*]] = and <3 x i1> [[X:%.*]], [[Y:%.*]] +; CHECK-NEXT: [[R:%.*]] = sext <3 x i1> [[MULBOOL]] to <3 x i32> ; CHECK-NEXT: ret <3 x i32> [[R]] ; %zx = zext <3 x i1> %x to <3 x i32> @@ -261,8 +261,8 @@ define i32 @mul_bools_mixed_ext_use1(i1 %x, i1 %y) { ; CHECK-LABEL: @mul_bools_mixed_ext_use1( ; CHECK-NEXT: [[ZY:%.*]] = zext i1 [[Y:%.*]] to i32 ; CHECK-NEXT: call void @use32(i32 [[ZY]]) -; CHECK-NEXT: [[NARROW:%.*]] = and i1 [[Y]], [[X:%.*]] -; CHECK-NEXT: [[R:%.*]] = sext i1 [[NARROW]] to i32 +; CHECK-NEXT: [[MULBOOL:%.*]] = and i1 [[X:%.*]], [[Y]] +; CHECK-NEXT: [[R:%.*]] = sext i1 [[MULBOOL]] to i32 ; CHECK-NEXT: ret i32 [[R]] ; %sx = sext i1 %x to i32 @@ -276,7 +276,8 @@ define i32 @mul_bools_mixed_ext_use2(i1 %x, i1 %y) { ; CHECK-LABEL: @mul_bools_mixed_ext_use2( ; CHECK-NEXT: [[SY:%.*]] = sext i1 [[Y:%.*]] to i32 ; CHECK-NEXT: call void @use32(i32 [[SY]]) -; CHECK-NEXT: [[R:%.*]] = select i1 [[X:%.*]], i32 [[SY]], i32 0 +; CHECK-NEXT: [[MULBOOL:%.*]] = and i1 [[Y]], [[X:%.*]] +; CHECK-NEXT: [[R:%.*]] = sext i1 [[MULBOOL]] to i32 ; CHECK-NEXT: ret i32 [[R]] ; %zx = zext i1 %x to i32 From llvm-commits at lists.llvm.org Sun Jul 12 12:59:36 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Sun, 12 Jul 2020 12:59:36 -0700 (PDT) Subject: [llvm] ea84dc9 - [X86] Add CPU string output to getIntelProcessorTypeAndSubtype/getAMDProcessorTypeAndSubtype in Host.cpp Message-ID: 
<5f0b6ba8.1c69fb81.1108.9cc1@mx.google.com> Author: Craig Topper Date: 2020-07-12T12:59:25-07:00 New Revision: ea84dc9500df383b4fe07199134033f358411e59 URL: https://github.com/llvm/llvm-project/commit/ea84dc9500df383b4fe07199134033f358411e59 DIFF: https://github.com/llvm/llvm-project/commit/ea84dc9500df383b4fe07199134033f358411e59.diff LOG: [X86] Add CPU string output to getIntelProcessorTypeAndSubtype/getAMDProcessorTypeAndSubtype in Host.cpp Rather than converting type/subtype into strings, just directly select the string as part of family/model decoding. This avoids the need for creating fake Type/SubTypes for CPUs not supported by compiler-rt. I've left the Type/SubType in place where it matches compiler-rt so that the code can be diffed, but the Type/SubType is no longer used by Host.cpp. compiler-rt was already updated to select strings that aren't used so the code will look similar. Added: Modified: llvm/include/llvm/Support/X86TargetParser.def llvm/lib/Support/Host.cpp Removed: ################################################################################ diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index 4b96c66b0e29..9e9f0985d15e 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -48,25 +48,6 @@ X86_CPU_TYPE_COMPAT("knm", INTEL_KNM, "knm") X86_CPU_TYPE_COMPAT("goldmont", INTEL_GOLDMONT, "goldmont") X86_CPU_TYPE_COMPAT("goldmont-plus", INTEL_GOLDMONT_PLUS, "goldmont-plus") X86_CPU_TYPE_COMPAT("tremont", INTEL_TREMONT, "tremont") -// Entries below this are not in libgcc/compiler-rt. 
-X86_CPU_TYPE ("i386", INTEL_i386) -X86_CPU_TYPE ("i486", INTEL_i486) -X86_CPU_TYPE ("pentium", INTEL_PENTIUM) -X86_CPU_TYPE ("pentium-mmx", INTEL_PENTIUM_MMX) -X86_CPU_TYPE ("pentiumpro", INTEL_PENTIUM_PRO) -X86_CPU_TYPE ("pentium2", INTEL_PENTIUM_II) -X86_CPU_TYPE ("pentium3", INTEL_PENTIUM_III) -X86_CPU_TYPE ("pentium4", INTEL_PENTIUM_IV) -X86_CPU_TYPE ("pentium-m", INTEL_PENTIUM_M) -X86_CPU_TYPE ("yonah", INTEL_CORE_DUO) -X86_CPU_TYPE ("nocona", INTEL_NOCONA) -X86_CPU_TYPE ("prescott", INTEL_PRESCOTT) -X86_CPU_TYPE ("i486", AMD_i486) -X86_CPU_TYPE ("pentium", AMDPENTIUM) -X86_CPU_TYPE ("athlon", AMD_ATHLON) -X86_CPU_TYPE ("athlon-xp", AMD_ATHLON_XP) -X86_CPU_TYPE ("k8", AMD_K8) -X86_CPU_TYPE ("k8-sse3", AMD_K8SSE3) // Alternate names supported by __builtin_cpu_is and target multiversioning. X86_CPU_TYPE_COMPAT_ALIAS(INTEL_BONNELL, "atom") @@ -112,13 +93,6 @@ X86_CPU_SUBTYPE_COMPAT("znver2", AMDFAM17H_ZNVER2, "znver2") X86_CPU_SUBTYPE_COMPAT("cascadelake", INTEL_COREI7_CASCADELAKE, "cascadelake") X86_CPU_SUBTYPE_COMPAT("tigerlake", INTEL_COREI7_TIGERLAKE, "tigerlake") X86_CPU_SUBTYPE_COMPAT("cooperlake", INTEL_COREI7_COOPERLAKE, "cooperlake") -// Entries below this are not in libgcc/compiler-rt. 
-X86_CPU_SUBTYPE ("core2", INTEL_CORE2_65) -X86_CPU_SUBTYPE ("penryn", INTEL_CORE2_45) -X86_CPU_SUBTYPE ("k6", AMDPENTIUM_K6) -X86_CPU_SUBTYPE ("k6-2", AMDPENTIUM_K62) -X86_CPU_SUBTYPE ("k6-3", AMDPENTIUM_K63) -X86_CPU_SUBTYPE ("geode", AMDPENTIUM_GEODE) #undef X86_CPU_SUBTYPE_COMPAT #undef X86_CPU_SUBTYPE diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index 8dc8c4e9775a..362b5850b394 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -583,7 +583,7 @@ static void detectX86FamilyModel(unsigned EAX, unsigned *Family, } } -static void +static StringRef getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, const unsigned *Features, unsigned *Type, unsigned *Subtype) { @@ -591,31 +591,33 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, return (Features[F / 32] & (1U << (F % 32))) != 0; }; + StringRef CPU; + switch (Family) { case 3: - *Type = X86::INTEL_i386; + CPU = "i386"; break; case 4: - *Type = X86::INTEL_i486; + CPU = "i486"; break; case 5: if (testFeature(X86::FEATURE_MMX)) { - *Type = X86::INTEL_PENTIUM_MMX; + CPU = "pentium-mmx"; break; } - *Type = X86::INTEL_PENTIUM; + CPU = "pentium"; break; case 6: switch (Model) { case 0x01: // Pentium Pro processor - *Type = X86::INTEL_PENTIUM_PRO; + CPU = "pentiumpro"; break; case 0x03: // Intel Pentium II OverDrive processor, Pentium II processor, // model 03 case 0x05: // Pentium II processor, model 05, Pentium II Xeon processor, // model 05, and Intel Celeron processor, model 05 case 0x06: // Celeron processor, model 06 - *Type = X86::INTEL_PENTIUM_II; + CPU = "pentium2"; break; case 0x07: // Pentium III processor, model 07, and Pentium III Xeon // processor, model 07 @@ -623,19 +625,19 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, // model 08, and Celeron processor, model 08 case 0x0a: // Pentium III Xeon processor, model 0Ah case 0x0b: // Pentium III processor, model 0Bh - *Type = X86::INTEL_PENTIUM_III; + CPU = "pentium3"; 
break; case 0x09: // Intel Pentium M processor, Intel Celeron M processor model 09. case 0x0d: // Intel Pentium M processor, Intel Celeron M processor, model // 0Dh. All processors are manufactured using the 90 nm process. case 0x15: // Intel EP80579 Integrated Processor and Intel EP80579 // Integrated Processor with Intel QuickAssist Technology - *Type = X86::INTEL_PENTIUM_M; + CPU = "pentium-m"; break; case 0x0e: // Intel Core Duo processor, Intel Core Solo processor, model // 0Eh. All processors are manufactured using the 65 nm process. - *Type = X86::INTEL_CORE_DUO; - break; // yonah + CPU = "yonah"; + break; case 0x0f: // Intel Core 2 Duo processor, Intel Core 2 Duo mobile // processor, Intel Core 2 Quad processor, Intel Core 2 Quad // mobile processor, Intel Core 2 Extreme processor, Intel @@ -643,8 +645,8 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, // 0Fh. All processors are manufactured using the 65 nm process. case 0x16: // Intel Celeron processor model 16h. All processors are // manufactured using the 65 nm process - *Type = X86::INTEL_CORE2; // "core2" - *Subtype = X86::INTEL_CORE2_65; + CPU = "core2"; + *Type = X86::INTEL_CORE2; break; case 0x17: // Intel Core 2 Extreme processor, Intel Xeon processor, model // 17h. All processors are manufactured using the 45 nm process. @@ -652,34 +654,38 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, // 45nm: Penryn , Wolfdale, Yorkfield (XE) case 0x1d: // Intel Xeon processor MP. All processors are manufactured using // the 45 nm process. - *Type = X86::INTEL_CORE2; // "penryn" - *Subtype = X86::INTEL_CORE2_45; + CPU = "penryn"; + *Type = X86::INTEL_CORE2; break; case 0x1a: // Intel Core i7 processor and Intel Xeon processor. All // processors are manufactured using the 45 nm process. case 0x1e: // Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz. // As found in a Summer 2010 model iMac. 
case 0x1f: - case 0x2e: // Nehalem EX - *Type = X86::INTEL_COREI7; // "nehalem" + case 0x2e: // Nehalem EX + CPU = "nehalem"; + *Type = X86::INTEL_COREI7; *Subtype = X86::INTEL_COREI7_NEHALEM; break; case 0x25: // Intel Core i7, laptop version. case 0x2c: // Intel Core i7 processor and Intel Xeon processor. All // processors are manufactured using the 32 nm process. case 0x2f: // Westmere EX - *Type = X86::INTEL_COREI7; // "westmere" + CPU = "westmere"; + *Type = X86::INTEL_COREI7; *Subtype = X86::INTEL_COREI7_WESTMERE; break; case 0x2a: // Intel Core i7 processor. All processors are manufactured // using the 32 nm process. case 0x2d: - *Type = X86::INTEL_COREI7; //"sandybridge" + CPU = "sandybridge"; + *Type = X86::INTEL_COREI7; *Subtype = X86::INTEL_COREI7_SANDYBRIDGE; break; case 0x3a: - case 0x3e: // Ivy Bridge EP - *Type = X86::INTEL_COREI7; // "ivybridge" + case 0x3e: // Ivy Bridge EP + CPU = "ivybridge"; + *Type = X86::INTEL_COREI7; *Subtype = X86::INTEL_COREI7_IVYBRIDGE; break; @@ -688,7 +694,8 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x3f: case 0x45: case 0x46: - *Type = X86::INTEL_COREI7; // "haswell" + CPU = "haswell"; + *Type = X86::INTEL_COREI7; *Subtype = X86::INTEL_COREI7_HASWELL; break; @@ -697,7 +704,8 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x47: case 0x4f: case 0x56: - *Type = X86::INTEL_COREI7; // "broadwell" + CPU = "broadwell"; + *Type = X86::INTEL_COREI7; *Subtype = X86::INTEL_COREI7_BROADWELL; break; @@ -708,39 +716,47 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x9e: // Kaby Lake desktop case 0xa5: // Comet Lake-H/S case 0xa6: // Comet Lake-U - *Type = X86::INTEL_COREI7; // "skylake" + CPU = "skylake"; + *Type = X86::INTEL_COREI7; *Subtype = X86::INTEL_COREI7_SKYLAKE; break; // Skylake Xeon: case 0x55: *Type = X86::INTEL_COREI7; - if (testFeature(X86::FEATURE_AVX512BF16)) - *Subtype = X86::INTEL_COREI7_COOPERLAKE; // "cooperlake" - else if 
(testFeature(X86::FEATURE_AVX512VNNI)) - *Subtype = X86::INTEL_COREI7_CASCADELAKE; // "cascadelake" - else - *Subtype = X86::INTEL_COREI7_SKYLAKE_AVX512; // "skylake-avx512" + if (testFeature(X86::FEATURE_AVX512BF16)) { + CPU = "cooperlake"; + *Subtype = X86::INTEL_COREI7_COOPERLAKE; + } else if (testFeature(X86::FEATURE_AVX512VNNI)) { + CPU = "cascadelake"; + *Subtype = X86::INTEL_COREI7_CASCADELAKE; + } else { + CPU = "skylake-avx512"; + *Subtype = X86::INTEL_COREI7_SKYLAKE_AVX512; + } break; // Cannonlake: case 0x66: + CPU = "cannonlake"; *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_CANNONLAKE; // "cannonlake" + *Subtype = X86::INTEL_COREI7_CANNONLAKE; break; // Icelake: case 0x7d: case 0x7e: + CPU = "icelake-client"; *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_ICELAKE_CLIENT; // "icelake-client" + *Subtype = X86::INTEL_COREI7_ICELAKE_CLIENT; break; // Icelake Xeon: case 0x6a: case 0x6c: + CPU = "icelake-server"; *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_ICELAKE_SERVER; // "icelake-server" + *Subtype = X86::INTEL_COREI7_ICELAKE_SERVER; break; case 0x1c: // Most 45 nm Intel Atom processors @@ -748,8 +764,9 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x27: // 32 nm Atom Medfield case 0x35: // 32 nm Atom Midview case 0x36: // 32 nm Atom Midview + CPU = "bonnell"; *Type = X86::INTEL_BONNELL; - break; // "bonnell" + break; // Atom Silvermont codes from the Intel software optimization guide. 
case 0x37: @@ -758,14 +775,17 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x5a: case 0x5d: case 0x4c: // really airmont + CPU = "silvermont"; *Type = X86::INTEL_SILVERMONT; - break; // "silvermont" + break; // Goldmont: case 0x5c: // Apollo Lake case 0x5f: // Denverton + CPU = "goldmont"; *Type = X86::INTEL_GOLDMONT; - break; // "goldmont" + break; case 0x7a: + CPU = "goldmont-plus"; *Type = X86::INTEL_GOLDMONT_PLUS; break; case 0x86: @@ -773,193 +793,140 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, break; case 0x57: - *Type = X86::INTEL_KNL; // knl + CPU = "tremont"; + *Type = X86::INTEL_KNL; break; case 0x85: - *Type = X86::INTEL_KNM; // knm + CPU = "knm"; + *Type = X86::INTEL_KNM; break; default: // Unknown family 6 CPU, try to guess. + // Don't both with Type/Subtype here, they aren't used by the caller. + // They're used above to keep the code in sync with compiler-rt. // TODO detect tigerlake host from model if (testFeature(X86::FEATURE_AVX512VP2INTERSECT)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_TIGERLAKE; - break; - } - - if (testFeature(X86::FEATURE_AVX512VBMI2)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_ICELAKE_CLIENT; - break; - } - - if (testFeature(X86::FEATURE_AVX512VBMI)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_CANNONLAKE; - break; - } - - if (testFeature(X86::FEATURE_AVX512BF16)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_COOPERLAKE; - break; - } - - if (testFeature(X86::FEATURE_AVX512VNNI)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_CASCADELAKE; - break; - } - - if (testFeature(X86::FEATURE_AVX512VL)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_SKYLAKE_AVX512; - break; - } - - if (testFeature(X86::FEATURE_AVX512ER)) { - *Type = X86::INTEL_KNL; // knl - break; - } - - if (testFeature(X86::FEATURE_CLFLUSHOPT)) { - if (testFeature(X86::FEATURE_SHA)) { - *Type = X86::INTEL_GOLDMONT; 
- } else { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_SKYLAKE; - } - break; - } - if (testFeature(X86::FEATURE_ADX)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_BROADWELL; - break; - } - if (testFeature(X86::FEATURE_AVX2)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_HASWELL; - break; - } - if (testFeature(X86::FEATURE_AVX)) { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_SANDYBRIDGE; - break; - } - if (testFeature(X86::FEATURE_SSE4_2)) { - if (testFeature(X86::FEATURE_MOVBE)) { - *Type = X86::INTEL_SILVERMONT; - } else { - *Type = X86::INTEL_COREI7; - *Subtype = X86::INTEL_COREI7_NEHALEM; - } - break; - } - if (testFeature(X86::FEATURE_SSE4_1)) { - *Type = X86::INTEL_CORE2; // "penryn" - *Subtype = X86::INTEL_CORE2_45; - break; - } - if (testFeature(X86::FEATURE_SSSE3)) { - if (testFeature(X86::FEATURE_MOVBE)) { - *Type = X86::INTEL_BONNELL; // "bonnell" - } else { - *Type = X86::INTEL_CORE2; // "core2" - *Subtype = X86::INTEL_CORE2_65; - } - break; - } - if (testFeature(X86::FEATURE_64BIT)) { - *Type = X86::INTEL_CORE2; // "core2" - *Subtype = X86::INTEL_CORE2_65; - break; - } - if (testFeature(X86::FEATURE_SSE3)) { - *Type = X86::INTEL_CORE_DUO; - break; + CPU = "tigerlake"; + } else if (testFeature(X86::FEATURE_AVX512VBMI2)) { + CPU = "icelake-client"; + } else if (testFeature(X86::FEATURE_AVX512VBMI)) { + CPU = "cannonlake"; + } else if (testFeature(X86::FEATURE_AVX512BF16)) { + CPU = "cooperlake"; + } else if (testFeature(X86::FEATURE_AVX512VNNI)) { + CPU = "cascadelake"; + } else if (testFeature(X86::FEATURE_AVX512VL)) { + CPU = "skylake-avx512"; + } else if (testFeature(X86::FEATURE_AVX512ER)) { + CPU = "knl"; + } else if (testFeature(X86::FEATURE_CLFLUSHOPT)) { + if (testFeature(X86::FEATURE_SHA)) + CPU = "goldmont"; + else + CPU = "skylake"; + } else if (testFeature(X86::FEATURE_ADX)) { + CPU = "broadwell"; + } else if (testFeature(X86::FEATURE_AVX2)) { + CPU = "haswell"; + } else if 
(testFeature(X86::FEATURE_AVX)) { + CPU = "sandybridge"; + } else if (testFeature(X86::FEATURE_SSE4_2)) { + if (testFeature(X86::FEATURE_MOVBE)) + CPU = "silvermont"; + else + CPU = "nehalem"; + } else if (testFeature(X86::FEATURE_SSE4_1)) { + CPU = "penryn"; + } else if (testFeature(X86::FEATURE_SSSE3)) { + if (testFeature(X86::FEATURE_MOVBE)) + CPU = "bonnell"; + else + CPU = "core2"; + } else if (testFeature(X86::FEATURE_64BIT)) { + CPU = "core2"; + } else if (testFeature(X86::FEATURE_SSE3)) { + CPU = "yonah"; + } else if (testFeature(X86::FEATURE_SSE2)) { + CPU = "pentium-m"; + } else if (testFeature(X86::FEATURE_SSE)) { + CPU = "pentium3"; + } else if (testFeature(X86::FEATURE_MMX)) { + CPU = "pentium2"; + } else { + CPU = "pentiumpro"; } - if (testFeature(X86::FEATURE_SSE2)) { - *Type = X86::INTEL_PENTIUM_M; - break; - } - if (testFeature(X86::FEATURE_SSE)) { - *Type = X86::INTEL_PENTIUM_III; - break; - } - if (testFeature(X86::FEATURE_MMX)) { - *Type = X86::INTEL_PENTIUM_II; - break; - } - *Type = X86::INTEL_PENTIUM_PRO; break; } break; case 15: { if (testFeature(X86::FEATURE_64BIT)) { - *Type = X86::INTEL_NOCONA; + CPU = "nocona"; break; } if (testFeature(X86::FEATURE_SSE3)) { - *Type = X86::INTEL_PRESCOTT; + CPU = "prescott"; break; } - *Type = X86::INTEL_PENTIUM_IV; + CPU = "pentium4"; break; } default: - break; /*"generic"*/ + break; // Unknown. } + + return CPU; } -static void getAMDProcessorTypeAndSubtype(unsigned Family, unsigned Model, - const unsigned *Features, - unsigned *Type, unsigned *Subtype) { +static StringRef +getAMDProcessorTypeAndSubtype(unsigned Family, unsigned Model, + const unsigned *Features, + unsigned *Type, unsigned *Subtype) { auto testFeature = [&](unsigned F) { return (Features[F / 32] & (1U << (F % 32))) != 0; }; - // FIXME: this poorly matches the generated SubtargetFeatureKV table. There - // appears to be no way to generate the wide variety of AMD-specific targets - // from the information returned from CPUID. 
+ StringRef CPU; + switch (Family) { case 4: - *Type = X86::AMD_i486; + CPU = "i486"; break; case 5: - *Type = X86::AMDPENTIUM; + CPU = "pentium"; switch (Model) { case 6: case 7: - *Subtype = X86::AMDPENTIUM_K6; - break; // "k6" + CPU = "k6"; + break; case 8: - *Subtype = X86::AMDPENTIUM_K62; - break; // "k6-2" + CPU = "k6-2"; + break; case 9: case 13: - *Subtype = X86::AMDPENTIUM_K63; - break; // "k6-3" + CPU = "k6-3"; + break; case 10: - *Subtype = X86::AMDPENTIUM_GEODE; - break; // "geode" + CPU = "geode"; + break; } break; case 6: if (testFeature(X86::FEATURE_SSE)) { - *Type = X86::AMD_ATHLON_XP; - break; // "athlon-xp" + CPU = "athlon-xp"; + break; } - *Type = X86::AMD_ATHLON; - break; // "athlon" + CPU = "athlon"; + break; case 15: if (testFeature(X86::FEATURE_SSE3)) { - *Type = X86::AMD_K8SSE3; - break; // "k8-sse3" + CPU = "k8-sse3"; + break; } - *Type = X86::AMD_K8; - break; // "k8" + CPU = "k8"; + break; case 16: + CPU = "amdfam10"; *Type = X86::AMDFAM10H; // "amdfam10" switch (Model) { case 2: @@ -974,44 +941,54 @@ static void getAMDProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; case 20: + CPU = "btver1"; *Type = X86::AMD_BTVER1; - break; // "btver1"; + break; case 21: + CPU = "bdver1"; *Type = X86::AMDFAM15H; if (Model >= 0x60 && Model <= 0x7f) { + CPU = "bdver4"; *Subtype = X86::AMDFAM15H_BDVER4; - break; // "bdver4"; 60h-7Fh: Excavator + break; // 60h-7Fh: Excavator } if (Model >= 0x30 && Model <= 0x3f) { + CPU = "bdver3"; *Subtype = X86::AMDFAM15H_BDVER3; - break; // "bdver3"; 30h-3Fh: Steamroller + break; // 30h-3Fh: Steamroller } if ((Model >= 0x10 && Model <= 0x1f) || Model == 0x02) { + CPU = "bdver2"; *Subtype = X86::AMDFAM15H_BDVER2; - break; // "bdver2"; 02h, 10h-1Fh: Piledriver + break; // 02h, 10h-1Fh: Piledriver } if (Model <= 0x0f) { *Subtype = X86::AMDFAM15H_BDVER1; - break; // "bdver1"; 00h-0Fh: Bulldozer + break; // 00h-0Fh: Bulldozer } break; case 22: + CPU = "btver2"; *Type = X86::AMD_BTVER2; - break; // "btver2" + 
break; case 23: + CPU = "znver1"; *Type = X86::AMDFAM17H; if ((Model >= 0x30 && Model <= 0x3f) || Model == 0x71) { + CPU = "znver2"; *Subtype = X86::AMDFAM17H_ZNVER2; - break; // "znver2"; 30h-3fh, 71h: Zen2 + break; // 30h-3fh, 71h: Zen2 } if (Model <= 0x0f) { *Subtype = X86::AMDFAM17H_ZNVER1; - break; // "znver1"; 00h-0Fh: Zen1 + break; // 00h-0Fh: Zen1 } break; default: - break; // "generic" + break; // Unknown AMD CPU. } + + return CPU; } static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, @@ -1161,26 +1138,23 @@ StringRef sys::getHostCPUName() { detectX86FamilyModel(EAX, &Family, &Model); getAvailableFeatures(ECX, EDX, MaxLeaf, Features); + // These aren't consumed in this file, but we try to keep some source code the + // same or similar to compiler-rt. unsigned Type = 0; unsigned Subtype = 0; + StringRef CPU; + if (Vendor == SIG_INTEL) { - getIntelProcessorTypeAndSubtype(Family, Model, Features, &Type, &Subtype); + CPU = getIntelProcessorTypeAndSubtype(Family, Model, Features, &Type, + &Subtype); } else if (Vendor == SIG_AMD) { - getAMDProcessorTypeAndSubtype(Family, Model, Features, &Type, &Subtype); + CPU = getAMDProcessorTypeAndSubtype(Family, Model, Features, &Type, + &Subtype); } - // Check subtypes first since those are more specific. -#define X86_CPU_SUBTYPE(ARCHNAME, ENUM) \ - if (Subtype == X86::ENUM) \ - return ARCHNAME; -#include "llvm/Support/X86TargetParser.def" - - // Now check types. -#define X86_CPU_TYPE(ARCHNAME, ENUM) \ - if (Type == X86::ENUM) \ - return ARCHNAME; -#include "llvm/Support/X86TargetParser.def" + if (!CPU.empty()) + return CPU; return "generic"; } From llvm-commits at lists.llvm.org Sun Jul 12 12:59:34 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Sun, 12 Jul 2020 12:59:34 -0700 (PDT) Subject: [compiler-rt] b92c2bb - [X86] Add CPU name strings to getIntelProcessorTypeAndSubtype and getAMDProcessorTypeAndSubtype in compiler-rt. 
Message-ID: <5f0b6ba6.1c69fb81.37f67.b43b@mx.google.com> Author: Craig Topper Date: 2020-07-12T12:59:25-07:00 New Revision: b92c2bb6a2058611d727c4e2ce3a928f0a3e647d URL: https://github.com/llvm/llvm-project/commit/b92c2bb6a2058611d727c4e2ce3a928f0a3e647d DIFF: https://github.com/llvm/llvm-project/commit/b92c2bb6a2058611d727c4e2ce3a928f0a3e647d.diff LOG: [X86] Add CPU name strings to getIntelProcessorTypeAndSubtype and getAMDProcessorTypeAndSubtype in compiler-rt. These aren't used in compiler-rt, but I plan to make a similar change to the equivalent code in Host.cpp where the mapping from type/subtype is an unnecessary complication. Having the CPU strings here will help keep the code somewhat synchronized. Added: Modified: compiler-rt/lib/builtins/cpu_model.c Removed: ################################################################################ diff --git a/compiler-rt/lib/builtins/cpu_model.c b/compiler-rt/lib/builtins/cpu_model.c index 042657232d8e..8346bb62dcfb 100644 --- a/compiler-rt/lib/builtins/cpu_model.c +++ b/compiler-rt/lib/builtins/cpu_model.c @@ -272,12 +272,17 @@ static void detectX86FamilyModel(unsigned EAX, unsigned *Family, } } -static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, - const unsigned *Features, - unsigned *Type, unsigned *Subtype) { +static const char * +getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, + const unsigned *Features, + unsigned *Type, unsigned *Subtype) { #define testFeature(F) \ (Features[F / 32] & (F % 32)) != 0 + // We select CPU strings to match the code in Host.cpp, but we don't use them + // in compiler-rt. + const char *CPU = 0; + switch (Family) { case 6: switch (Model) { @@ -288,13 +293,17 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, // 0Fh. All processors are manufactured using the 65 nm process. case 0x16: // Intel Celeron processor model 16h. 
All processors are // manufactured using the 65 nm process + CPU = "core2"; + *Type = INTEL_CORE2; + break; case 0x17: // Intel Core 2 Extreme processor, Intel Xeon processor, model // 17h. All processors are manufactured using the 45 nm process. // // 45nm: Penryn , Wolfdale, Yorkfield (XE) case 0x1d: // Intel Xeon processor MP. All processors are manufactured using // the 45 nm process. - *Type = INTEL_CORE2; // "penryn" + CPU = "penryn"; + *Type = INTEL_CORE2; break; case 0x1a: // Intel Core i7 processor and Intel Xeon processor. All // processors are manufactured using the 45 nm process. @@ -302,25 +311,29 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, // As found in a Summer 2010 model iMac. case 0x1f: case 0x2e: // Nehalem EX - *Type = INTEL_COREI7; // "nehalem" + CPU = "nehalem"; + *Type = INTEL_COREI7; *Subtype = INTEL_COREI7_NEHALEM; break; case 0x25: // Intel Core i7, laptop version. case 0x2c: // Intel Core i7 processor and Intel Xeon processor. All // processors are manufactured using the 32 nm process. case 0x2f: // Westmere EX - *Type = INTEL_COREI7; // "westmere" + CPU = "westmere"; + *Type = INTEL_COREI7; *Subtype = INTEL_COREI7_WESTMERE; break; case 0x2a: // Intel Core i7 processor. All processors are manufactured // using the 32 nm process. 
case 0x2d: - *Type = INTEL_COREI7; //"sandybridge" + CPU = "sandybridge"; + *Type = INTEL_COREI7; *Subtype = INTEL_COREI7_SANDYBRIDGE; break; case 0x3a: case 0x3e: // Ivy Bridge EP - *Type = INTEL_COREI7; // "ivybridge" + CPU = "ivybridge"; + *Type = INTEL_COREI7; *Subtype = INTEL_COREI7_IVYBRIDGE; break; @@ -329,7 +342,8 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x3f: case 0x45: case 0x46: - *Type = INTEL_COREI7; // "haswell" + CPU = "haswell"; + *Type = INTEL_COREI7; *Subtype = INTEL_COREI7_HASWELL; break; @@ -338,7 +352,8 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x47: case 0x4f: case 0x56: - *Type = INTEL_COREI7; // "broadwell" + CPU = "broadwell"; + *Type = INTEL_COREI7; *Subtype = INTEL_COREI7_BROADWELL; break; @@ -349,39 +364,47 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x9e: // Kaby Lake desktop case 0xa5: // Comet Lake-H/S case 0xa6: // Comet Lake-U - *Type = INTEL_COREI7; // "skylake" + CPU = "skylake"; + *Type = INTEL_COREI7; *Subtype = INTEL_COREI7_SKYLAKE; break; // Skylake Xeon: case 0x55: *Type = INTEL_COREI7; - if (testFeature(FEATURE_AVX512BF16)) - *Subtype = INTEL_COREI7_COOPERLAKE; // "cooperlake" - else if (testFeature(FEATURE_AVX512VNNI)) - *Subtype = INTEL_COREI7_CASCADELAKE; // "cascadelake" - else - *Subtype = INTEL_COREI7_SKYLAKE_AVX512; // "skylake-avx512" + if (testFeature(FEATURE_AVX512BF16)) { + CPU = "cooperlake"; + *Subtype = INTEL_COREI7_COOPERLAKE; + } else if (testFeature(FEATURE_AVX512VNNI)) { + CPU = "cascadelake"; + *Subtype = INTEL_COREI7_CASCADELAKE; + } else { + CPU = "skylake-avx512"; + *Subtype = INTEL_COREI7_SKYLAKE_AVX512; + } break; // Cannonlake: case 0x66: + CPU = "cannonlake"; *Type = INTEL_COREI7; - *Subtype = INTEL_COREI7_CANNONLAKE; // "cannonlake" + *Subtype = INTEL_COREI7_CANNONLAKE; break; // Icelake: case 0x7d: case 0x7e: + CPU = "icelake-client"; *Type = INTEL_COREI7; - *Subtype = 
INTEL_COREI7_ICELAKE_CLIENT; // "icelake-client" + *Subtype = INTEL_COREI7_ICELAKE_CLIENT; break; // Icelake Xeon: case 0x6a: case 0x6c: + CPU = "icelake-server"; *Type = INTEL_COREI7; - *Subtype = INTEL_COREI7_ICELAKE_SERVER; // "icelake-server" + *Subtype = INTEL_COREI7_ICELAKE_SERVER; break; case 0x1c: // Most 45 nm Intel Atom processors @@ -389,8 +412,9 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x27: // 32 nm Atom Medfield case 0x35: // 32 nm Atom Midview case 0x36: // 32 nm Atom Midview + CPU = "bonnell"; *Type = INTEL_BONNELL; - break; // "bonnell" + break; // Atom Silvermont codes from the Intel software optimization guide. case 0x37: @@ -399,26 +423,32 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, case 0x5a: case 0x5d: case 0x4c: // really airmont + CPU = "silvermont"; *Type = INTEL_SILVERMONT; - break; // "silvermont" + break; // Goldmont: case 0x5c: // Apollo Lake case 0x5f: // Denverton + CPU = "goldmont"; *Type = INTEL_GOLDMONT; break; // "goldmont" case 0x7a: + CPU = "goldmont-plus"; *Type = INTEL_GOLDMONT_PLUS; break; case 0x86: + CPU = "tremont"; *Type = INTEL_TREMONT; break; case 0x57: - *Type = INTEL_KNL; // knl + CPU = "knl"; + *Type = INTEL_KNL; break; case 0x85: - *Type = INTEL_KNM; // knm + CPU = "knm"; + *Type = INTEL_KNM; break; default: // Unknown family 6 CPU. @@ -428,17 +458,22 @@ static void getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, default: break; // Unknown. } + + return CPU; } -static void getAMDProcessorTypeAndSubtype(unsigned Family, unsigned Model, - const unsigned *Features, - unsigned *Type, unsigned *Subtype) { - // FIXME: this poorly matches the generated SubtargetFeatureKV table. There - // appears to be no way to generate the wide variety of AMD-specific targets - // from the information returned from CPUID. 
+static const char * +getAMDProcessorTypeAndSubtype(unsigned Family, unsigned Model, + const unsigned *Features, + unsigned *Type, unsigned *Subtype) { + // We select CPU strings to match the code in Host.cpp, but we don't use them + // in compiler-rt. + const char *CPU = 0; + switch (Family) { case 16: - *Type = AMDFAM10H; // "amdfam10" + CPU = "amdfam10"; + *Type = AMDFAM10H; switch (Model) { case 2: *Subtype = AMDFAM10H_BARCELONA; @@ -452,44 +487,54 @@ static void getAMDProcessorTypeAndSubtype(unsigned Family, unsigned Model, } break; case 20: + CPU = "btver1"; *Type = AMD_BTVER1; - break; // "btver1"; + break; case 21: + CPU = "bdver1"; *Type = AMDFAM15H; if (Model >= 0x60 && Model <= 0x7f) { + CPU = "bdver4"; *Subtype = AMDFAM15H_BDVER4; - break; // "bdver4"; 60h-7Fh: Excavator + break; // 60h-7Fh: Excavator } if (Model >= 0x30 && Model <= 0x3f) { + CPU = "bdver3"; *Subtype = AMDFAM15H_BDVER3; - break; // "bdver3"; 30h-3Fh: Steamroller + break; // 30h-3Fh: Steamroller } if ((Model >= 0x10 && Model <= 0x1f) || Model == 0x02) { + CPU = "bdver2"; *Subtype = AMDFAM15H_BDVER2; - break; // "bdver2"; 02h, 10h-1Fh: Piledriver + break; // 02h, 10h-1Fh: Piledriver } if (Model <= 0x0f) { *Subtype = AMDFAM15H_BDVER1; - break; // "bdver1"; 00h-0Fh: Bulldozer + break; // 00h-0Fh: Bulldozer } break; case 22: + CPU = "btver2"; *Type = AMD_BTVER2; - break; // "btver2" + break; case 23: + CPU = "znver1"; *Type = AMDFAM17H; if ((Model >= 0x30 && Model <= 0x3f) || Model == 0x71) { + CPU = "znver2"; *Subtype = AMDFAM17H_ZNVER2; - break; // "znver2"; 30h-3fh, 71h: Zen2 + break; // 30h-3fh, 71h: Zen2 } if (Model <= 0x0f) { *Subtype = AMDFAM17H_ZNVER1; - break; // "znver1"; 00h-0Fh: Zen1 + break; // 00h-0Fh: Zen1 } break; default: - break; // "generic" + break; // Unknown AMD CPU. 
} + + return CPU; } static void getAvailableFeatures(unsigned ECX, unsigned EDX, unsigned MaxLeaf, From llvm-commits at lists.llvm.org Sun Jul 12 12:59:38 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Sun, 12 Jul 2020 12:59:38 -0700 (PDT) Subject: [llvm] 90c577a - [X86] Remove model number based detection for 'pentiumpro', 'pentium2', 'pentium3', 'pentium-m', and 'yonah' from getHostCPUName. Message-ID: <5f0b6baa.1c69fb81.85c9e.ce3a@mx.google.com> Author: Craig Topper Date: 2020-07-12T12:59:25-07:00 New Revision: 90c577a113e97212e02d5956d6db45e701e3552f URL: https://github.com/llvm/llvm-project/commit/90c577a113e97212e02d5956d6db45e701e3552f DIFF: https://github.com/llvm/llvm-project/commit/90c577a113e97212e02d5956d6db45e701e3552f.diff LOG: [X86] Remove model number based detection for 'pentiumpro', 'pentium2', 'pentium3', 'pentium-m', and 'yonah' from getHostCPUName. For model 6 CPUs, we have a fallback detection method based on available features. That mechanism should be enough to detect these early family 6 CPUs as they only differ in the features used by the detection anyway. 
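As a rough illustration of the feature-based fallback the log above refers to, the early family 6 parts can be told apart by the most capable SIMD feature they report. The sketch below is a standalone approximation under stated assumptions, not the Host.cpp code: the `Feature` enum, its bit numbering, and the `guessEarlyFamily6` helper are hypothetical names introduced for illustration. Only the order of the checks (SSE3, SSE2, SSE, MMX) and the returned CPU strings follow the fallback chain shown in the Host.cpp patch, and the checks for newer features that precede them there are omitted.

```cpp
#include <string>

// Hypothetical feature indices for this sketch only; the real
// X86::FEATURE_* values used by LLVM are different.
enum Feature : unsigned {
  FEATURE_MMX = 0,
  FEATURE_SSE = 1,
  FEATURE_SSE2 = 2,
  FEATURE_SSE3 = 3
};

// Test bit F in a packed array of 32-bit feature words.
inline bool testFeature(const unsigned *Features, unsigned F) {
  return (Features[F / 32] & (1U << (F % 32))) != 0;
}

// Guess the name of an early family 6 Intel CPU from its features,
// checking the most capable feature first.
inline std::string guessEarlyFamily6(const unsigned *Features) {
  if (testFeature(Features, FEATURE_SSE3))
    return "yonah";
  if (testFeature(Features, FEATURE_SSE2))
    return "pentium-m";
  if (testFeature(Features, FEATURE_SSE))
    return "pentium3";
  if (testFeature(Features, FEATURE_MMX))
    return "pentium2";
  return "pentiumpro";
}
```

Because each of these CPU generations added exactly one of the tested features over its predecessor, checking from newest to oldest gives the same answer a model-number table would for these parts.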
Added: Modified: llvm/lib/Support/Host.cpp Removed: ################################################################################ diff --git a/llvm/lib/Support/Host.cpp b/llvm/lib/Support/Host.cpp index 362b5850b394..658c1ee74cfe 100644 --- a/llvm/lib/Support/Host.cpp +++ b/llvm/lib/Support/Host.cpp @@ -609,35 +609,6 @@ getIntelProcessorTypeAndSubtype(unsigned Family, unsigned Model, break; case 6: switch (Model) { - case 0x01: // Pentium Pro processor - CPU = "pentiumpro"; - break; - case 0x03: // Intel Pentium II OverDrive processor, Pentium II processor, - // model 03 - case 0x05: // Pentium II processor, model 05, Pentium II Xeon processor, - // model 05, and Intel Celeron processor, model 05 - case 0x06: // Celeron processor, model 06 - CPU = "pentium2"; - break; - case 0x07: // Pentium III processor, model 07, and Pentium III Xeon - // processor, model 07 - case 0x08: // Pentium III processor, model 08, Pentium III Xeon processor, - // model 08, and Celeron processor, model 08 - case 0x0a: // Pentium III Xeon processor, model 0Ah - case 0x0b: // Pentium III processor, model 0Bh - CPU = "pentium3"; - break; - case 0x09: // Intel Pentium M processor, Intel Celeron M processor model 09. - case 0x0d: // Intel Pentium M processor, Intel Celeron M processor, model - // 0Dh. All processors are manufactured using the 90 nm process. - case 0x15: // Intel EP80579 Integrated Processor and Intel EP80579 - // Integrated Processor with Intel QuickAssist Technology - CPU = "pentium-m"; - break; - case 0x0e: // Intel Core Duo processor, Intel Core Solo processor, model - // 0Eh. All processors are manufactured using the 65 nm process. 
- CPU = "yonah"; - break; case 0x0f: // Intel Core 2 Duo processor, Intel Core 2 Duo mobile // processor, Intel Core 2 Quad processor, Intel Core 2 Quad // mobile processor, Intel Core 2 Extreme processor, Intel From llvm-commits at lists.llvm.org Sun Jul 12 13:00:15 2020 From: llvm-commits at lists.llvm.org (Sanjay Patel via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 20:00:15 +0000 (UTC) Subject: [PATCH] D83005: [NFC] Combine cstfp_pred_ty and cst_pred_ty In-Reply-To: References: Message-ID: <834a5aa77b7cae8633768cf47b799b15@localhost.localdomain> spatel accepted this revision. spatel added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83005/new/ https://reviews.llvm.org/D83005 From llvm-commits at lists.llvm.org Sun Jul 12 13:39:09 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 20:39:09 +0000 (UTC) Subject: [PATCH] D83646: [LV][LoopUtils] Add UseReductionIntrinsic to createTargetReduction Message-ID: dmgreen created this revision. dmgreen added reviewers: gilr, Ayal, fhahn, SjoerdMeijer. Herald added subscribers: vkmr, rogfer01, hiraditya. Herald added a project: LLVM. This changes the interface to createTargetReduction and createSimpleTargetReduction to take a UseReductionIntrinsic bool, instead of asking TTI. This allows us to calculate UseReductionIntrinsic earlier in the vectorizer and reduce the need to propagate TTI into ReductionRecipes. In the process I added a getReductionFlags method for conveniently creating the TargetTransformInfo::ReductionFlags flags used by several methods. 
https://reviews.llvm.org/D83646 Files: llvm/include/llvm/Analysis/IVDescriptors.h llvm/include/llvm/Transforms/Utils/LoopUtils.h llvm/lib/Analysis/IVDescriptors.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/lib/Transforms/Vectorize/VPlan.h -------------- next part -------------- A non-text attachment was scrubbed... Name: D83646.277302.patch Type: text/x-patch Size: 14603 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 13:40:07 2020 From: llvm-commits at lists.llvm.org (Dave Green via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 20:40:07 +0000 (UTC) Subject: [PATCH] D75069: [LoopVectorizer] Inloop vector reductions In-Reply-To: References: Message-ID: dmgreen added a comment. Thanks. I will update the patch, but I will wait until at least after the branch before I commit it. D83646 is an attempt at changing the interface to createTargetReduction, so that we don't need to store TTI in the reduction recipe. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75069/new/ https://reviews.llvm.org/D75069 From llvm-commits at lists.llvm.org Sun Jul 12 14:41:17 2020 From: llvm-commits at lists.llvm.org (Alexey Lapshin via llvm-commits) Date: Sun, 12 Jul 2020 14:41:17 -0700 (PDT) Subject: [llvm] 0a01fc9 - Revert "[TRE] allow TRE for non-capturing calls." Message-ID: <5f0b837d.1c69fb81.21f21.d0dd@mx.google.com> Author: Alexey Lapshin Date: 2020-07-13T00:39:48+03:00 New Revision: 0a01fc96e24b7c7de2141a2ea07593500ea34732 URL: https://github.com/llvm/llvm-project/commit/0a01fc96e24b7c7de2141a2ea07593500ea34732 DIFF: https://github.com/llvm/llvm-project/commit/0a01fc96e24b7c7de2141a2ea07593500ea34732.diff LOG: Revert "[TRE] allow TRE for non-capturing calls." This reverts commit f7907e9d223d8484f9afd457ba614c2db2ae4743. That commit caused error on multi-stage build. 
Added: Modified: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp llvm/test/Transforms/TailCallElim/basic.ll Removed: llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll ################################################################################ diff --git a/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp b/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp index bfd312a52ea5..5bb1d54d7d12 100644 --- a/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp +++ b/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp @@ -81,7 +81,6 @@ #include "llvm/Support/raw_ostream.h" #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" -#include "llvm/Transforms/Utils/Local.h" using namespace llvm; #define DEBUG_TYPE "tailcallelim" @@ -93,10 +92,7 @@ STATISTIC(NumAccumAdded, "Number of accumulators introduced"); /// Scan the specified function for alloca instructions. /// If it contains any dynamic allocas, returns false. static bool canTRE(Function &F) { - // TODO: We don't do TRE if dynamic allocas are used. - // Dynamic allocas allocate stack space which should be - // deallocated before new iteration started. That is - // currently not implemented. + // Because of PR962, we don't TRE dynamic allocas. return llvm::all_of(instructions(F), [](Instruction &I) { auto *AI = dyn_cast<AllocaInst>(&I); return !AI || AI->isStaticAlloca(); @@ -189,9 +185,11 @@ struct AllocaDerivedValueTracker { }; } -static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) { +static bool markTails(Function &F, bool &AllCallsAreTailCalls, + OptimizationRemarkEmitter *ORE) { if (F.callsFunctionThatReturnsTwice()) return false; + AllCallsAreTailCalls = true; // The local stack holds all alloca instructions and all byval arguments. 
AllocaDerivedValueTracker Tracker; @@ -274,8 +272,11 @@ static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) { } } - if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI)) + if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI)) { DeferredTails.push_back(CI); + } else { + AllCallsAreTailCalls = false; + } } for (auto *SuccBB : make_range(succ_begin(BB), succ_end(BB))) { @@ -312,6 +313,8 @@ static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) { LLVM_DEBUG(dbgs() << "Marked as tail call candidate: " << *CI << "\n"); CI->setTailCall(); Modified = true; + } else { + AllCallsAreTailCalls = false; } } @@ -322,16 +325,7 @@ static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) { /// instruction from after the call to before the call, assuming that all /// instructions between the call and this instruction are movable. /// -static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA, - DenseMap<Value *, AllocaInst *> &AllocaForValue) { - if (isa<DbgInfoIntrinsic>(I)) - return true; - - if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) - if (II->getIntrinsicID() == Intrinsic::lifetime_end && - llvm::findAllocaForValue(II->getArgOperand(1), AllocaForValue)) - return true; - +static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA) { // FIXME: We can move load/store/call/free instructions above the call if the // call does not mod/ref the memory location being processed. if (I->mayHaveSideEffects()) // This also handles volatile loads. @@ -398,6 +392,7 @@ class TailRecursionEliminator { // createTailRecurseLoopHeader the first time we find a call we can eliminate. BasicBlock *HeaderBB = nullptr; SmallVector<PHINode *, 8> ArgumentPHIs; + bool RemovableCallsMustBeMarkedTail = false; // PHI node to store our return value. PHINode *RetPN = nullptr; @@ -419,15 +414,13 @@ class TailRecursionEliminator { // The instruction doing the accumulating. Instruction *AccumulatorRecursionInstr = nullptr; - // The cache for <value, alloca instruction> pairs. 
- DenseMap AllocaForValue; - TailRecursionEliminator(Function &F, const TargetTransformInfo *TTI, AliasAnalysis *AA, OptimizationRemarkEmitter *ORE, DomTreeUpdater &DTU) : F(F), TTI(TTI), AA(AA), ORE(ORE), DTU(DTU) {} - CallInst *findTRECandidate(Instruction *TI); + CallInst *findTRECandidate(Instruction *TI, + bool CannotTailCallElimCallsMarkedTail); void createTailRecurseLoopHeader(CallInst *CI); @@ -435,9 +428,11 @@ class TailRecursionEliminator { bool eliminateCall(CallInst *CI); - bool foldReturnAndProcessPred(ReturnInst *Ret); + bool foldReturnAndProcessPred(ReturnInst *Ret, + bool CannotTailCallElimCallsMarkedTail); - bool processReturningBlock(ReturnInst *Ret); + bool processReturningBlock(ReturnInst *Ret, + bool CannotTailCallElimCallsMarkedTail); void cleanupAndFinalize(); @@ -448,7 +443,8 @@ class TailRecursionEliminator { }; } // namespace -CallInst *TailRecursionEliminator::findTRECandidate(Instruction *TI) { +CallInst *TailRecursionEliminator::findTRECandidate( + Instruction *TI, bool CannotTailCallElimCallsMarkedTail) { BasicBlock *BB = TI->getParent(); if (&BB->front() == TI) // Make sure there is something before the terminator. @@ -468,9 +464,9 @@ CallInst *TailRecursionEliminator::findTRECandidate(Instruction *TI) { --BBI; } - assert((!CI->isTailCall() || !CI->isNoTailCall()) && - "Incompatible call site attributes(Tail,NoTail)"); - if (!CI->isTailCall()) + // If this call is marked as a tail call, and if there are dynamic allocas in + // the function, we cannot perform this optimization. + if (CI->isTailCall() && CannotTailCallElimCallsMarkedTail) return nullptr; // As a special case, detect code like this: @@ -502,13 +498,26 @@ void TailRecursionEliminator::createTailRecurseLoopHeader(CallInst *CI) { BranchInst *BI = BranchInst::Create(HeaderBB, NewEntry); BI->setDebugLoc(CI->getDebugLoc()); - // Move all fixed sized allocas from HeaderBB to NewEntry. 
- for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(), - NEBI = NewEntry->begin(); - OEBI != E;) - if (AllocaInst *AI = dyn_cast(OEBI++)) - if (isa(AI->getArraySize())) - AI->moveBefore(&*NEBI); + // If this function has self recursive calls in the tail position where some + // are marked tail and some are not, only transform one flavor or another. + // We have to choose whether we move allocas in the entry block to the new + // entry block or not, so we can't make a good choice for both. We make this + // decision here based on whether the first call we found to remove is + // marked tail. + // NOTE: We could do slightly better here in the case that the function has + // no entry block allocas. + RemovableCallsMustBeMarkedTail = CI->isTailCall(); + + // If this tail call is marked 'tail' and if there are any allocas in the + // entry block, move them up to the new entry block. + if (RemovableCallsMustBeMarkedTail) + // Move all fixed sized allocas from HeaderBB to NewEntry. + for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(), + NEBI = NewEntry->begin(); + OEBI != E;) + if (AllocaInst *AI = dyn_cast(OEBI++)) + if (isa(AI->getArraySize())) + AI->moveBefore(&*NEBI); // Now that we have created a new block, which jumps to the entry // block, insert a PHI node for each argument of the function. 
@@ -583,7 +592,7 @@ bool TailRecursionEliminator::eliminateCall(CallInst *CI) { Instruction *AccRecInstr = nullptr; BasicBlock::iterator BBI(CI); for (++BBI; &*BBI != Ret; ++BBI) { - if (canMoveAboveCall(&*BBI, CI, AA, AllocaForValue)) + if (canMoveAboveCall(&*BBI, CI, AA)) continue; // If we can't move the instruction above the call, it might be because it @@ -611,6 +620,9 @@ bool TailRecursionEliminator::eliminateCall(CallInst *CI) { if (!HeaderBB) createTailRecurseLoopHeader(CI); + if (RemovableCallsMustBeMarkedTail && !CI->isTailCall()) + return false; + // Ok, now that we know we have a pseudo-entry block WITH all of the // required PHI nodes, add entries into the PHI node for the actual // parameters passed into the tail-recursive call. @@ -660,7 +672,8 @@ bool TailRecursionEliminator::eliminateCall(CallInst *CI) { return true; } -bool TailRecursionEliminator::foldReturnAndProcessPred(ReturnInst *Ret) { +bool TailRecursionEliminator::foldReturnAndProcessPred( + ReturnInst *Ret, bool CannotTailCallElimCallsMarkedTail) { BasicBlock *BB = Ret->getParent(); bool Change = false; @@ -685,7 +698,8 @@ bool TailRecursionEliminator::foldReturnAndProcessPred(ReturnInst *Ret) { while (!UncondBranchPreds.empty()) { BranchInst *BI = UncondBranchPreds.pop_back_val(); BasicBlock *Pred = BI->getParent(); - if (CallInst *CI = findTRECandidate(BI)) { + if (CallInst *CI = + findTRECandidate(BI, CannotTailCallElimCallsMarkedTail)) { LLVM_DEBUG(dbgs() << "FOLDING: " << *BB << "INTO UNCOND BRANCH PRED: " << *Pred); FoldReturnIntoUncondBranch(Ret, BB, Pred, &DTU); @@ -706,8 +720,9 @@ bool TailRecursionEliminator::foldReturnAndProcessPred(ReturnInst *Ret) { return Change; } -bool TailRecursionEliminator::processReturningBlock(ReturnInst *Ret) { - CallInst *CI = findTRECandidate(Ret); +bool TailRecursionEliminator::processReturningBlock( + ReturnInst *Ret, bool CannotTailCallElimCallsMarkedTail) { + CallInst *CI = findTRECandidate(Ret, CannotTailCallElimCallsMarkedTail); if (!CI) 
return false; @@ -795,25 +810,35 @@ bool TailRecursionEliminator::eliminate(Function &F, return false; bool MadeChange = false; - MadeChange |= markTails(F, ORE); + bool AllCallsAreTailCalls = false; + MadeChange |= markTails(F, AllCallsAreTailCalls, ORE); + if (!AllCallsAreTailCalls) + return MadeChange; // If this function is a varargs function, we won't be able to PHI the args // right, so don't even try to convert it... if (F.getFunctionType()->isVarArg()) return MadeChange; - if (!canTRE(F)) - return MadeChange; + // If false, we cannot perform TRE on tail calls marked with the 'tail' + // attribute, because doing so would cause the stack size to increase (real + // TRE would deallocate variable sized allocas, TRE doesn't). + bool CanTRETailMarkedCall = canTRE(F); TailRecursionEliminator TRE(F, TTI, AA, ORE, DTU); // Change any tail recursive calls to loops. + // + // FIXME: The code generator produces really bad code when an 'escaping + // alloca' is changed from being a static alloca to being a dynamic alloca. + // Until this is resolved, disable this transformation if that would ever + // happen. This bug is PR962. for (Function::iterator BBI = F.begin(), E = F.end(); BBI != E; /*in loop*/) { BasicBlock *BB = &*BBI++; // foldReturnAndProcessPred may delete BB. 
if (ReturnInst *Ret = dyn_cast(BB->getTerminator())) { - bool Change = TRE.processReturningBlock(Ret); + bool Change = TRE.processReturningBlock(Ret, !CanTRETailMarkedCall); if (!Change && BB->getFirstNonPHIOrDbg() == Ret) - Change = TRE.foldReturnAndProcessPred(Ret); + Change = TRE.foldReturnAndProcessPred(Ret, !CanTRETailMarkedCall); MadeChange |= Change; } } diff --git a/llvm/test/Transforms/TailCallElim/basic.ll b/llvm/test/Transforms/TailCallElim/basic.ll index 669210da6314..6116014a024b 100644 --- a/llvm/test/Transforms/TailCallElim/basic.ll +++ b/llvm/test/Transforms/TailCallElim/basic.ll @@ -12,16 +12,15 @@ define void @test0() { ret void } -; Make sure that we do not do TRE if pointer to local stack -; escapes through function call. +; PR615. Make sure that we do not move the alloca so that it interferes with the tail call. define i32 @test1() { ; CHECK: i32 @test1() ; CHECK-NEXT: alloca %A = alloca i32 ; [#uses=2] store i32 5, i32* %A call void @use(i32* %A) -; CHECK: call i32 @test1 - %X = call i32 @test1() ; [#uses=1] +; CHECK: tail call i32 @test1 + %X = tail call i32 @test1() ; [#uses=1] ret i32 %X } diff --git a/llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll b/llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll deleted file mode 100644 index 8f69087dd879..000000000000 --- a/llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll +++ /dev/null @@ -1,125 +0,0 @@ -; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s - -; This test checks that TRE would be done for only one recursive call. -; The test_multiple_exits function has three recursive calls. -; First recursive call could not be eliminated because there is -; escaped pointer to local variable. Second recursive call could -; be eliminated. Thrid recursive call could not be eliminated since -; this is not last call. Thus, test checks that TRE would be done -; for only second recursive call. 
- -; IR for that test was generated from the following C++ source: -; -; void capture_arg (int*); -; void test_multiple_exits (int param); -; if (param >= 0 && param < 10) { -; int temp; -; capture_arg(&temp); -; // TRE could not be done because pointer to local -; // variable "temp" is escaped. -; test_multiple_exits(param + 1); -; } else if (param >=10 && param < 20) { -; // TRE should be done. -; test_multiple_exits(param + 1); -; } else if (param >= 20 && param < 22) { -; // TRE could not be done since recursive -; // call is not last call. -; test_multiple_exits(param + 1); -; func(); -; } -; -; return; -; } - -; Function Attrs: noinline optnone uwtable -declare void @_Z11capture_argPi(i32* %param) #0 - -; Function Attrs: noinline optnone uwtable -declare void @_Z4funcv() #0 - -; Function Attrs: noinline nounwind uwtable -define dso_local void @_Z19test_multiple_exitsi(i32 %param) local_unnamed_addr #2 { -; CHECK-LABEL: @_Z19test_multiple_exitsi( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[TEMP:%.*]] = alloca i32, align 4 -; CHECK-NEXT: br label [[TAILRECURSE:%.*]] -; CHECK: tailrecurse: -; CHECK-NEXT: [[PARAM_TR:%.*]] = phi i32 [ [[PARAM:%.*]], [[ENTRY:%.*]] ], [ [[ADD6:%.*]], [[IF_THEN5:%.*]] ] -; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i32 [[PARAM_TR]], 10 -; CHECK-NEXT: br i1 [[TMP0]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]] -; CHECK: if.then: -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[TEMP]] to i8* -; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP1]]) #1 -; CHECK-NEXT: call void @_Z11capture_argPi(i32* nonnull [[TEMP]]) -; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[PARAM_TR]], 1 -; CHECK-NEXT: call void @_Z19test_multiple_exitsi(i32 [[ADD]]) -; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP1]]) #1 -; CHECK-NEXT: br label [[IF_END14:%.*]] -; CHECK: if.else: -; CHECK-NEXT: [[PARAM_OFF:%.*]] = add i32 [[PARAM_TR]], -10 -; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[PARAM_OFF]], 10 -; CHECK-NEXT: br i1 [[TMP2]], 
label [[IF_THEN5]], label [[IF_ELSE7:%.*]] -; CHECK: if.then5: -; CHECK-NEXT: [[ADD6]] = add nuw nsw i32 [[PARAM_TR]], 1 -; CHECK-NEXT: br label [[TAILRECURSE]] -; CHECK: if.else7: -; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[PARAM_TR]], -2 -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP3]], 20 -; CHECK-NEXT: br i1 [[TMP4]], label [[IF_THEN11:%.*]], label [[IF_END14]] -; CHECK: if.then11: -; CHECK-NEXT: [[ADD12:%.*]] = add nsw i32 [[PARAM_TR]], 1 -; CHECK-NEXT: tail call void @_Z19test_multiple_exitsi(i32 [[ADD12]]) -; CHECK-NEXT: tail call void @_Z4funcv() -; CHECK-NEXT: ret void -; CHECK: if.end14: -; CHECK-NEXT: ret void -; -entry: - %temp = alloca i32, align 4 - %0 = icmp ult i32 %param, 10 - br i1 %0, label %if.then, label %if.else - -if.then: ; preds = %entry - %1 = bitcast i32* %temp to i8* - call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %1) #2 - call void @_Z11capture_argPi(i32* nonnull %temp) - %add = add nuw nsw i32 %param, 1 - call void @_Z19test_multiple_exitsi(i32 %add) - call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %1) #2 - br label %if.end14 - -if.else: ; preds = %entry - %param.off = add i32 %param, -10 - %2 = icmp ult i32 %param.off, 10 - br i1 %2, label %if.then5, label %if.else7 - -if.then5: ; preds = %if.else - %add6 = add nuw nsw i32 %param, 1 - call void @_Z19test_multiple_exitsi(i32 %add6) - br label %if.end14 - -if.else7: ; preds = %if.else - %3 = and i32 %param, -2 - %4 = icmp eq i32 %3, 20 - br i1 %4, label %if.then11, label %if.end14 - -if.then11: ; preds = %if.else7 - %add12 = add nsw i32 %param, 1 - call void @_Z19test_multiple_exitsi(i32 %add12) - call void @_Z4funcv() - br label %if.end14 - -if.end14: ; preds = %if.then5, %if.then11, %if.else7, %if.then - ret void -} - -; Function Attrs: argmemonly nounwind willreturn -declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2 - -; Function Attrs: argmemonly nounwind willreturn -declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2 - -attributes #0 = 
{ nofree noinline norecurse nounwind uwtable } -attributes #1 = { nounwind uwtable } -attributes #2 = { argmemonly nounwind willreturn } diff --git a/llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll b/llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll deleted file mode 100644 index 2168437fc570..000000000000 --- a/llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll +++ /dev/null @@ -1,74 +0,0 @@ -; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s - -; IR for that test was generated from the following C++ source: -; -;int count; -;__attribute__((noinline)) void globalIncrement(const int* param) { count += *param; } -; -;void test(int recurseCount) -;{ -; if (recurseCount == 0) return; -; int temp = 10; -; globalIncrement(&temp); -; test(recurseCount - 1); -;} -; - - at count = dso_local local_unnamed_addr global i32 0, align 4 - -; Function Attrs: nofree noinline norecurse nounwind uwtable -declare void @_Z15globalIncrementPKi(i32* nocapture readonly %param) #0 - -; Test that TRE could be done for recursive tail routine containing -; call to function receiving a pointer to local stack. 
- -; Function Attrs: nounwind uwtable -define dso_local void @_Z4testi(i32 %recurseCount) local_unnamed_addr #1 { -; CHECK-LABEL: @_Z4testi( -; CHECK-NEXT: entry: -; CHECK-NEXT: [[TEMP:%.*]] = alloca i32, align 4 -; CHECK-NEXT: br label [[TAILRECURSE:%.*]] -; CHECK: tailrecurse: -; CHECK-NEXT: [[RECURSECOUNT_TR:%.*]] = phi i32 [ [[RECURSECOUNT:%.*]], [[ENTRY:%.*]] ], [ [[SUB:%.*]], [[IF_END:%.*]] ] -; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[RECURSECOUNT_TR]], 0 -; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END]] -; CHECK: if.end: -; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[TEMP]] to i8* -; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP0]]) -; CHECK-NEXT: store i32 10, i32* [[TEMP]], align 4 -; CHECK-NEXT: call void @_Z15globalIncrementPKi(i32* nonnull [[TEMP]]) -; CHECK-NEXT: [[SUB]] = add nsw i32 [[RECURSECOUNT_TR]], -1 -; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP0]]) -; CHECK-NEXT: br label [[TAILRECURSE]] -; CHECK: return: -; CHECK-NEXT: ret void -; -entry: - %temp = alloca i32, align 4 - %cmp = icmp eq i32 %recurseCount, 0 - br i1 %cmp, label %return, label %if.end - -if.end: ; preds = %entry - %0 = bitcast i32* %temp to i8* - call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #6 - store i32 10, i32* %temp, align 4 - call void @_Z15globalIncrementPKi(i32* nonnull %temp) - %sub = add nsw i32 %recurseCount, -1 - call void @_Z4testi(i32 %sub) - call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #6 - br label %return - -return: ; preds = %entry, %if.end - ret void -} - -; Function Attrs: argmemonly nounwind willreturn -declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2 - -; Function Attrs: argmemonly nounwind willreturn -declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2 - -attributes #0 = { nofree noinline norecurse nounwind uwtable } -attributes #1 = { nounwind uwtable } -attributes #2 = { argmemonly nounwind willreturn } From llvm-commits at 
lists.llvm.org Sun Jul 12 15:01:56 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 22:01:56 +0000 (UTC) Subject: [PATCH] D83370: [CallGraph] Ignore callback uses In-Reply-To: References: Message-ID: <5ad0fb971f21e5283c34ef5dd46a01e1@localhost.localdomain> jdoerfert accepted this revision. jdoerfert added inline comments. ================ Comment at: llvm/lib/IR/Function.cpp:1499 + continue; + const auto *Call = dyn_cast(FU); ---------------- jdoerfert wrote: > You might need to inspect the ACS here. Check if it is really a callback callsite. Nit: Put the ACS stuff into a conditional guarded by the ignore. Trying to create an ACS is not completely free. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83370/new/ https://reviews.llvm.org/D83370 From llvm-commits at lists.llvm.org Sun Jul 12 15:03:49 2020 From: llvm-commits at lists.llvm.org (Alexey Lapshin via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 22:03:49 +0000 (UTC) Subject: [PATCH] D82269: [TRE][NFC] Refactor Basic Block Processing In-Reply-To: References: Message-ID: <0b83fabdbdad3f518e22c08289124a13@localhost.localdomain> avl added a comment. > I'll wait and rebase after D82085 . It looks like D82085 needs more work (it did not pass the two-stage build). I suggest not waiting for D82085 and going ahead with D82269 . Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82269/new/ https://reviews.llvm.org/D82269 From llvm-commits at lists.llvm.org Sun Jul 12 15:04:06 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 22:04:06 +0000 (UTC) Subject: [PATCH] D83010: [flang] Add inliner pass. In-Reply-To: References: Message-ID: <65c40fcc9e075c1995a528cde3f25678@localhost.localdomain> mehdi_amini added inline comments.
================ Comment at: flang/include/flang/Optimizer/Transforms/Passes.h:43 +/// nodes as block arguments. +std::unique_ptr createMemToRegPass(); + ---------------- None of the above seems related to inlining to me? ================ Comment at: flang/include/flang/Optimizer/Transforms/Passes.td:51 + +#endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES ---------------- Here as well, I don't understand how any of these are about inlining. ================ Comment at: flang/lib/Optimizer/Transforms/Inliner.cpp:18 + llvm::cl::desc("aggressively inline everything"), + llvm::cl::init(false)); + ---------------- Ideally, please favor "Pass options" over globals as much as possible (also keep in mind that global cl::opt options are really for debugging). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83010/new/ https://reviews.llvm.org/D83010 From llvm-commits at lists.llvm.org Sun Jul 12 15:04:50 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 22:04:50 +0000 (UTC) Subject: [PATCH] D83010: [flang] Add inliner pass. In-Reply-To: References: Message-ID: mehdi_amini added a comment. (It seems like you didn't preserve the URL in the commit message, which prevents Phabricator from linking to the revision) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83010/new/ https://reviews.llvm.org/D83010 From llvm-commits at lists.llvm.org Sun Jul 12 15:29:48 2020 From: llvm-commits at lists.llvm.org (Ulrich Weigand via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 22:29:48 +0000 (UTC) Subject: [PATCH] D80833: [CodeView] Add full repro to LF_BUILDINFO record In-Reply-To: References: Message-ID: uweigand added a comment. In D80833#2123508 , @aganea wrote: > In D80833#2109172 , @uweigand wrote: > > > Hmm, with clang-cl it seems the driver is trying to use this: > > Target: s390x-pc-windows-msvc > > which of course doesn't exist.
Not sure what is supposed to be happening here, but it seems that it's falling back on s390x-linux since on s390x, Linux is currently the only supported OS. > > > I'm seeing some of the tests are setting the target explicitly `%clang_cl --target=x86_64-windows-msvc`. Would that work on your machine? Or should I do `UNSUPPORTED: s390x` ? Sorry, looks like I missed this. I think using an explicit target should work. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80833/new/ https://reviews.llvm.org/D80833 From llvm-commits at lists.llvm.org Sun Jul 12 15:56:24 2020 From: llvm-commits at lists.llvm.org (via llvm-commits) Date: Sun, 12 Jul 2020 15:56:24 -0700 (PDT) Subject: [llvm] c73f425 - [Attributor] Add AAValueSimplifyCallSiteArgument::manifest Message-ID: <5f0b9518.1c69fb81.1ec84.edc8@mx.google.com> Author: Shinji Okumura Date: 2020-07-13T07:01:50+09:00 New Revision: c73f425f84ad18e4b610dff7d21a5844fb0da5d7 URL: https://github.com/llvm/llvm-project/commit/c73f425f84ad18e4b610dff7d21a5844fb0da5d7 DIFF: https://github.com/llvm/llvm-project/commit/c73f425f84ad18e4b610dff7d21a5844fb0da5d7.diff LOG: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D82861 Added: Modified: llvm/lib/Transforms/IPO/AttributorAttributes.cpp llvm/test/Transforms/Attributor/range.ll Removed: ################################################################################ diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp index dbc1541b9950..7e9fd61eeb41 100644 --- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp +++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp @@ -4670,6 +4670,30 @@ struct AAValueSimplifyCallSiteArgument : AAValueSimplifyFloating { AAValueSimplifyCallSiteArgument(const IRPosition &IRP, Attributor &A) : AAValueSimplifyFloating(IRP, A) {} + /// See AbstractAttribute::manifest(...). 
+ ChangeStatus manifest(Attributor &A) override { + ChangeStatus Changed = ChangeStatus::UNCHANGED; + + if (SimplifiedAssociatedValue.hasValue() && + !SimplifiedAssociatedValue.getValue()) + return Changed; + + Value &V = getAssociatedValue(); + auto *C = SimplifiedAssociatedValue.hasValue() + ? dyn_cast(SimplifiedAssociatedValue.getValue()) + : UndefValue::get(V.getType()); + if (C) { + Use &U = cast(&getAnchorValue())->getArgOperandUse(getArgNo()); + // We can replace the AssociatedValue with the constant. + if (&V != C && V.getType() == C->getType()) { + if (A.changeUseAfterManifest(U, *C)) + Changed = ChangeStatus::CHANGED; + } + } + + return Changed | AAValueSimplify::manifest(A); + } + void trackStatistics() const override { STATS_DECLTRACK_CSARG_ATTR(value_simplify) } diff --git a/llvm/test/Transforms/Attributor/range.ll b/llvm/test/Transforms/Attributor/range.ll index 03338b4ce499..f105bb3fad0e 100644 --- a/llvm/test/Transforms/Attributor/range.ll +++ b/llvm/test/Transforms/Attributor/range.ll @@ -1063,6 +1063,71 @@ end: } +define i32 @func(i1 %c) { +; CHECK-LABEL: define {{[^@]+}}@func +; CHECK-SAME: (i1 [[C:%.*]]) +; CHECK-NEXT: [[RET:%.*]] = select i1 [[C]], i32 0, i32 1 +; CHECK-NEXT: ret i32 [[RET]] +; + %ret = select i1 %c, i32 0, i32 1 + ret i32 %ret +} + +define i32 @simplify_callsite_argument(i1 %d) { +; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@simplify_callsite_argument +; IS__TUNIT_OPM-SAME: (i1 [[D:%.*]]) +; IS__TUNIT_OPM-NEXT: [[C:%.*]] = select i1 [[D]], i1 true, i1 false +; IS__TUNIT_OPM-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]] +; IS__TUNIT_OPM: t: +; IS__TUNIT_OPM-NEXT: [[RET1:%.*]] = call i32 @func(i1 [[C]]) #2, !range !3 +; IS__TUNIT_OPM-NEXT: ret i32 [[RET1]] +; IS__TUNIT_OPM: f: +; IS__TUNIT_OPM-NEXT: [[RET2:%.*]] = call i32 @func(i1 false) #2, !range !3 +; IS__TUNIT_OPM-NEXT: ret i32 [[RET2]] +; +; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@simplify_callsite_argument +; IS__TUNIT_NPM-SAME: (i1 [[D:%.*]]) +; IS__TUNIT_NPM-NEXT: 
[[C:%.*]] = select i1 [[D]], i1 true, i1 false +; IS__TUNIT_NPM-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]] +; IS__TUNIT_NPM: t: +; IS__TUNIT_NPM-NEXT: [[RET1:%.*]] = call i32 @func(i1 true) #1, !range !4 +; IS__TUNIT_NPM-NEXT: ret i32 [[RET1]] +; IS__TUNIT_NPM: f: +; IS__TUNIT_NPM-NEXT: [[RET2:%.*]] = call i32 @func(i1 false) #1, !range !4 +; IS__TUNIT_NPM-NEXT: ret i32 [[RET2]] +; +; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@simplify_callsite_argument +; IS__CGSCC_OPM-SAME: (i1 [[D:%.*]]) +; IS__CGSCC_OPM-NEXT: [[C:%.*]] = select i1 [[D]], i1 true, i1 false +; IS__CGSCC_OPM-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]] +; IS__CGSCC_OPM: t: +; IS__CGSCC_OPM-NEXT: [[RET1:%.*]] = call i32 @func(i1 [[C]]) +; IS__CGSCC_OPM-NEXT: ret i32 [[RET1]] +; IS__CGSCC_OPM: f: +; IS__CGSCC_OPM-NEXT: [[RET2:%.*]] = call i32 @func(i1 false) +; IS__CGSCC_OPM-NEXT: ret i32 [[RET2]] +; +; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@simplify_callsite_argument +; IS__CGSCC_NPM-SAME: (i1 [[D:%.*]]) +; IS__CGSCC_NPM-NEXT: [[C:%.*]] = select i1 [[D]], i1 true, i1 false +; IS__CGSCC_NPM-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]] +; IS__CGSCC_NPM: t: +; IS__CGSCC_NPM-NEXT: [[RET1:%.*]] = call i32 @func(i1 true) +; IS__CGSCC_NPM-NEXT: ret i32 [[RET1]] +; IS__CGSCC_NPM: f: +; IS__CGSCC_NPM-NEXT: [[RET2:%.*]] = call i32 @func(i1 false) +; IS__CGSCC_NPM-NEXT: ret i32 [[RET2]] +; + %c = select i1 %d, i1 true, i1 false + br i1 %c, label %t, label %f +t: + %ret1 = call i32 @func(i1 %c) + ret i32 %ret1 +f: + %ret2 = call i32 @func(i1 false) + ret i32 %ret2 +} + !0 = !{i32 0, i32 10} !1 = !{i32 10, i32 100} From llvm-commits at lists.llvm.org Sun Jul 12 15:56:35 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 22:56:35 +0000 (UTC) Subject: [PATCH] D82861: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest In-Reply-To: References: Message-ID: <8cb73dfd1a58c667f38a82d2a0f4284f@localhost.localdomain> This revision 
was automatically updated to reflect the committed changes. Closed by commit rGc73f425f84ad: [Attributor] Add AAValueSimplifyCallSiteArgument::manifest (authored by okura). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82861/new/ https://reviews.llvm.org/D82861 Files: llvm/lib/Transforms/IPO/AttributorAttributes.cpp llvm/test/Transforms/Attributor/range.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D82861.277307.patch Type: text/x-patch Size: 4272 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 16:00:19 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 23:00:19 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <6903a06a4e5c20414ca4be02bf8da688@localhost.localdomain> tmsriram updated this revision to Diff 277306. tmsriram marked 8 inline comments as done. tmsriram edited the summary of this revision. tmsriram added a comment. - Change the callee saved register test to use a simple clobber on all callee saved registers and test cfi_offset for all the callee-save registers including rbp. - The tests now cover the 2 issues that need special handling with sections: CFA tracking with and without frame pointer and cfi directives for callee saved registers. - Other reviewer mentioned changes. 
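For readers who want to picture the kind of source such a callee-saved-register test is derived from, here is a minimal sketch. It is not the test from D79978: the names `f1`, `foo`, and `calls` are hypothetical, and the clobber list is only an assumption about what "a simple clobber on all callee saved registers" could look like on x86-64 (rbp is deliberately left out, since some compilers reject it as a clobber when a frame pointer is in use).

```cpp
// Hypothetical stand-in for an external callee; not taken from the patch.
static int calls = 0;
void f1(int v) { calls += v; }

// The conditional gives the function more than one basic block, so that
// per-block sections have something to split.
int foo(bool k, int i) {
  if (k) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    // Assumed clobber list: naming callee-saved registers here makes the
    // compiler save and restore them in the prologue and epilogue.
    asm volatile("" ::: "rbx", "r12", "r13", "r14", "r15");
#endif
    f1(i);
  }
  return i + 1;
}
```

Lowered to IR and run through llc with `--basicblock-sections=all`, as in the RUN lines of the tests under review, a function of this shape would be expected to exercise both points listed above: CFA tracking and the CFI directives for callee-saved registers.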
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 Files: llvm/include/llvm/CodeGen/TargetFrameLowering.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCFIException.cpp llvm/lib/CodeGen/AsmPrinter/DwarfException.h llvm/lib/CodeGen/CFIInstrInserter.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/X86/X86FrameLowering.cpp llvm/lib/Target/X86/X86FrameLowering.h llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll -------------- next part -------------- A non-text attachment was scrubbed... Name: D79978.277306.patch Type: text/x-patch Size: 15918 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 16:01:59 2020 From: llvm-commits at lists.llvm.org (Sriraman Tallam via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 23:01:59 +0000 (UTC) Subject: [PATCH] D79978: Call Frame Information (CFI) Handling for Basic Block Sections In-Reply-To: References: Message-ID: <7f070a6f114af9afdd35c38e959544ef@localhost.localdomain> tmsriram added a comment. In D79978#2146366 , @dblaikie wrote: > In D79978#2146131 , @tmsriram wrote: > > > Let me try a slightly different approach here. It is not clear to us what more is needed to land the patch. In the interests of resolving conflict : > > > > 1. I will also explicitly test assembly too for callee saved registers with bb sections when they are being pushed and popped. > > > Just on this point - it's not clear to me the motivation for the difference in testing between the two tests (one testing assembly, the other testing LLVM's MIR) - was there some particular difference/detail that these different strategies enabled testing? Otherwise, yeah, I'd probably just go with only testing assembly in both cases, for consistency/simplicity? (any variation in tests always makes me wonder "what's the reason for this difference? 
Must be something important or they'd be the same and so I'm not understanding some important difference") You are right. For reasons that are long forgotten since this was written a while ago, one of us thought it would be good to directly check CFI IR instructions too. As you note, we could have just tested assembly; changed now. ================ Comment at: llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll:3 +; RUN: llc -O0 %s --basicblock-sections=all -mtriple=x86_64-unknown-linux-gnu -filetype=asm --frame-pointer=none -o - | FileCheck --check-prefix=SECTIONS_NOFP_CFI %s +; RUN: llc -O0 %s --basicblock-sections=all -mtriple=x86_64-unknown-linux-gnu -filetype=obj --frame-pointer=all -o - | llvm-dwarfdump --debug-frame - | FileCheck --check-prefix=DEBUG_FRAME %s + ---------------- dblaikie wrote: > MaskRay wrote: > > While `--eh-frame` is an alias for `--debug-frame`, I think using `--eh-frame` here is more appropriate. This tests .eh_frame, not .debug_frame. > > > Agreed - the check on line 51 should be updated too. llvm-dwarfdump's output is actually rendering an empty debug_frame and then after that it's rendering the eh_frame - the test is currently a bit misleading down there too. (& maybe llvm-dwarfdump's output is prone to such misleading testing, unfortunately - it prints the name of every section explicitly requested (& considers eh_frame and debug_frame explicitly requested when using --eh-frame or --debug-frame) even if they're empty, so you can easily get this sort of flow-on effect) Fixed this; it stems from my misunderstanding of the various tools' output. ================ Comment at: llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll:21-36 +; Exhaust caller-saved parameter registers and force callee saved registers to +; be used. This tests that CFI directives for callee saved registers are +; generated with basic block sections.
+; extern void f1(int, int, int); +; +; void foo(bool k, int p1, int p2, int p3, int p4, int p5, int p6) { +; // Using a conditional forces a basic block section. ---------------- dblaikie wrote: > Thanks for this - totally makes sense to me now! > (Totally optional: I'd probably just write this with one parameter? And/or perhaps with constants for the first f1 call? - I've needed to do things like this when testing the DWARF OP_entry_values: > > ``` > void f1(int); > extern bool b; > void f3(int i) { > if (b) { // adds a basic block > f1(1); // clobbers 'i', causing it to be spilled > f1(i); // keeps 'i' alive > } > } > ``` > > Not to nitpick - just trying to help ensure tests are very specifically targeted - at least makes them easier for me to read, not sure about other folks? (if there's something interesting about testing more than one of these - some comments describing that'd be handy too) > > What you've got is fine though, if you prefer it/find it valuable > > Thanks for the general write-up of CFI and this patch - helped me understand a bit of what this patch in general, and this test in particular are all about!) Marking it done since you wanted to go with your other suggestion with inline asm. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79978/new/ https://reviews.llvm.org/D79978 From llvm-commits at lists.llvm.org Sun Jul 12 16:37:29 2020 From: llvm-commits at lists.llvm.org (Valentin Clement via Phabricator via llvm-commits) Date: Sun, 12 Jul 2020 23:37:29 +0000 (UTC) Subject: [PATCH] D83649: [flang][openacc] OpenACC 3.0 parser Message-ID: clementval created this revision. clementval added reviewers: nvdatian, sscalpone, tskeith, klausler, ichoyjx. Herald added subscribers: llvm-commits, sstefan1, jfb, hiraditya, mgorny. Herald added a reviewer: jdoerfert. Herald added a reviewer: DavidTruby. Herald added a project: LLVM. This patch introduces the parser for OpenACC 3.0 in Flang. It uses the same TableGen mechanism as OpenMP.
Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83649 Files: flang/include/flang/Common/Fortran-features.h flang/include/flang/Parser/dump-parse-tree.h flang/include/flang/Parser/parse-tree.h flang/lib/Parser/CMakeLists.txt flang/lib/Parser/executable-parsers.cpp flang/lib/Parser/openacc-parsers.cpp flang/lib/Parser/parsing.cpp flang/lib/Parser/program-parsers.cpp flang/lib/Parser/stmt-parser.h flang/lib/Parser/type-parsers.h flang/lib/Parser/unparse.cpp flang/lib/Semantics/resolve-names.cpp flang/test/Semantics/acc-validity.f90 flang/tools/f18-parse-demo/CMakeLists.txt flang/tools/f18/CMakeLists.txt flang/tools/f18/f18.cpp llvm/include/llvm/CMakeLists.txt llvm/include/llvm/Frontend/CMakeLists.txt llvm/include/llvm/Frontend/Directive/DirectiveBase.td llvm/include/llvm/Frontend/OpenACC/ACC.td llvm/include/llvm/Frontend/OpenACC/CMakeLists.txt llvm/lib/Frontend/CMakeLists.txt llvm/lib/Frontend/OpenACC/CMakeLists.txt -------------- next part -------------- A non-text attachment was scrubbed... Name: D83649.277310.patch Type: text/x-patch Size: 72731 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 17:12:55 2020 From: llvm-commits at lists.llvm.org (Craig Topper via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 00:12:55 +0000 (UTC) Subject: [PATCH] D83330: [PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. In-Reply-To: References: Message-ID: <71e554b985b14797f3765d67fc038547@localhost.localdomain> craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. 
LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83330/new/ https://reviews.llvm.org/D83330 From llvm-commits at lists.llvm.org Sun Jul 12 17:20:07 2020 From: llvm-commits at lists.llvm.org (Craig Topper via llvm-commits) Date: Sun, 12 Jul 2020 17:20:07 -0700 (PDT) Subject: [llvm] b4dbb37 - [X86] Rename X86_CPU_TYPE_COMPAT_ALIAS/X86_CPU_TYPE_COMPAT/X86_CPU_SUBTYPE_COMPAT macros. NFC Message-ID: <5f0ba8b7.1c69fb81.564d.d8a8@mx.google.com> Author: Craig Topper Date: 2020-07-12T17:00:24-07:00 New Revision: b4dbb37f32e554e4d6f118d9ddd87717721ea664 URL: https://github.com/llvm/llvm-project/commit/b4dbb37f32e554e4d6f118d9ddd87717721ea664 DIFF: https://github.com/llvm/llvm-project/commit/b4dbb37f32e554e4d6f118d9ddd87717721ea664.diff LOG: [X86] Rename X86_CPU_TYPE_COMPAT_ALIAS/X86_CPU_TYPE_COMPAT/X86_CPU_SUBTYPE_COMPAT macros. NFC Remove _COMPAT. Drop the ARCHNAME. Remove the non-COMPAT versions that are no longer needed. We now only use these macros in places where we need compatibility with libgcc/compiler-rt. So we don't need to call out _COMPAT specifically. 
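(Editorial sketch, not part of the commit.) The log above renames macros that follow the usual "X-macro" pattern: one list of (ENUM, STR) entries is expanded several times, once to build an enum and once to build a string check. The snippet below is a minimal illustration of that pattern only. It assumes a single in-file list macro, DEMO_CPU_TYPES, in place of the repeated #include of X86TargetParser.def, and the names DEMO_CPU_TYPES, DemoProcessorTypes and demoValidateCpuIs are hypothetical. Only three of the CPU entries are reproduced.

```cpp
#include <string_view>

// Hypothetical stand-in for the .def file: one list, expanded several times.
// The real code re-includes X86TargetParser.def instead of using a list macro.
#define DEMO_CPU_TYPES(X)                                                      \
  X(INTEL_BONNELL, "bonnell")                                                  \
  X(INTEL_CORE2, "core2")                                                      \
  X(AMDFAM10H, "amdfam10h")

// First expansion: keep only the ENUM column to build the enumeration.
enum DemoProcessorTypes : unsigned {
  DEMO_CPU_TYPE_DUMMY,
#define X(ENUM, STR) ENUM,
  DEMO_CPU_TYPES(X)
#undef X
  DEMO_CPU_TYPE_MAX
};

// Second expansion: keep only the STR column to accept the listed names.
constexpr bool demoValidateCpuIs(std::string_view Name) {
#define X(ENUM, STR)                                                           \
  if (Name == STR)                                                             \
    return true;
  DEMO_CPU_TYPES(X)
#undef X
  return false;
}
```

Because every expansion reads the same list, adding an entry keeps the enum and the name check in step, which is presumably why a two-argument (ENUM, STR) form is enough once the ARCHNAME column is dropped.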
Added: Modified: clang/lib/Basic/Targets/X86.cpp clang/lib/CodeGen/CGBuiltin.cpp llvm/include/llvm/Support/X86TargetParser.def llvm/include/llvm/Support/X86TargetParser.h Removed: ################################################################################ diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp index e280a7216645..543f232d2459 100644 --- a/clang/lib/Basic/Targets/X86.cpp +++ b/clang/lib/Basic/Targets/X86.cpp @@ -1062,9 +1062,9 @@ void X86TargetInfo::getCPUSpecificCPUDispatchFeatures( bool X86TargetInfo::validateCpuIs(StringRef FeatureStr) const { return llvm::StringSwitch(FeatureStr) #define X86_VENDOR(ENUM, STRING) .Case(STRING, true) -#define X86_CPU_TYPE_COMPAT_ALIAS(ENUM, ALIAS) .Case(ALIAS, true) -#define X86_CPU_TYPE_COMPAT(ARCHNAME, ENUM, STR) .Case(STR, true) -#define X86_CPU_SUBTYPE_COMPAT(ARCHNAME, ENUM, STR) .Case(STR, true) +#define X86_CPU_TYPE_ALIAS(ENUM, ALIAS) .Case(ALIAS, true) +#define X86_CPU_TYPE(ENUM, STR) .Case(STR, true) +#define X86_CPU_SUBTYPE(ENUM, STR) .Case(STR, true) #include "llvm/Support/X86TargetParser.def" .Default(false); } diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 1d81ede5dc31..35a93a7889f4 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -11655,11 +11655,11 @@ Value *CodeGenFunction::EmitX86CpuIs(StringRef CPUStr) { std::tie(Index, Value) = StringSwitch>(CPUStr) #define X86_VENDOR(ENUM, STRING) \ .Case(STRING, {0u, static_cast(llvm::X86::ENUM)}) -#define X86_CPU_TYPE_COMPAT_ALIAS(ENUM, ALIAS) \ +#define X86_CPU_TYPE_ALIAS(ENUM, ALIAS) \ .Case(ALIAS, {1u, static_cast(llvm::X86::ENUM)}) -#define X86_CPU_TYPE_COMPAT(ARCHNAME, ENUM, STR) \ +#define X86_CPU_TYPE(ENUM, STR) \ .Case(STR, {1u, static_cast(llvm::X86::ENUM)}) -#define X86_CPU_SUBTYPE_COMPAT(ARCHNAME, ENUM, STR) \ +#define X86_CPU_SUBTYPE(ENUM, STR) \ .Case(STR, {2u, static_cast(llvm::X86::ENUM)}) #include "llvm/Support/X86TargetParser.def" 
.Default({0, 0}); diff --git a/llvm/include/llvm/Support/X86TargetParser.def b/llvm/include/llvm/Support/X86TargetParser.def index 9e9f0985d15e..697f8c70f962 100644 --- a/llvm/include/llvm/Support/X86TargetParser.def +++ b/llvm/include/llvm/Support/X86TargetParser.def @@ -20,80 +20,70 @@ X86_VENDOR(VENDOR_AMD, "amd") #undef X86_VENDOR // This macro is used for cpu types present in compiler-rt/libgcc. -#ifndef X86_CPU_TYPE_COMPAT -#define X86_CPU_TYPE_COMPAT(ARCHNAME, ENUM, STR) X86_CPU_TYPE(ARCHNAME, ENUM) -#endif - #ifndef X86_CPU_TYPE -#define X86_CPU_TYPE(ARCHNAME, ENUM) +#define X86_CPU_TYPE(ENUM, STR) #endif -#ifndef X86_CPU_TYPE_COMPAT_ALIAS -#define X86_CPU_TYPE_COMPAT_ALIAS(ENUM, STR) +#ifndef X86_CPU_TYPE_ALIAS +#define X86_CPU_TYPE_ALIAS(ENUM, STR) #endif -// The first part of this list must match what is implemented in libgcc and -// compilert-rt. Clang uses this to know how to implement __builtin_cpu_is. -X86_CPU_TYPE_COMPAT("bonnell", INTEL_BONNELL, "bonnell") -X86_CPU_TYPE_COMPAT("core2", INTEL_CORE2, "core2") -X86_CPU_TYPE_COMPAT("nehalem", INTEL_COREI7, "corei7") -X86_CPU_TYPE_COMPAT("amdfam10", AMDFAM10H, "amdfam10h") -X86_CPU_TYPE_COMPAT("bdver1", AMDFAM15H, "amdfam15h") -X86_CPU_TYPE_COMPAT("silvermont", INTEL_SILVERMONT, "silvermont") -X86_CPU_TYPE_COMPAT("knl", INTEL_KNL, "knl") -X86_CPU_TYPE_COMPAT("btver1", AMD_BTVER1, "btver1") -X86_CPU_TYPE_COMPAT("btver2", AMD_BTVER2, "btver2") -X86_CPU_TYPE_COMPAT("znver1", AMDFAM17H, "amdfam17h") -X86_CPU_TYPE_COMPAT("knm", INTEL_KNM, "knm") -X86_CPU_TYPE_COMPAT("goldmont", INTEL_GOLDMONT, "goldmont") -X86_CPU_TYPE_COMPAT("goldmont-plus", INTEL_GOLDMONT_PLUS, "goldmont-plus") -X86_CPU_TYPE_COMPAT("tremont", INTEL_TREMONT, "tremont") +// This list must match what is implemented in libgcc and compilert-rt. Clang +// uses this to know how to implement __builtin_cpu_is. 
+X86_CPU_TYPE(INTEL_BONNELL, "bonnell") +X86_CPU_TYPE(INTEL_CORE2, "core2") +X86_CPU_TYPE(INTEL_COREI7, "corei7") +X86_CPU_TYPE(AMDFAM10H, "amdfam10h") +X86_CPU_TYPE(AMDFAM15H, "amdfam15h") +X86_CPU_TYPE(INTEL_SILVERMONT, "silvermont") +X86_CPU_TYPE(INTEL_KNL, "knl") +X86_CPU_TYPE(AMD_BTVER1, "btver1") +X86_CPU_TYPE(AMD_BTVER2, "btver2") +X86_CPU_TYPE(AMDFAM17H, "amdfam17h") +X86_CPU_TYPE(INTEL_KNM, "knm") +X86_CPU_TYPE(INTEL_GOLDMONT, "goldmont") +X86_CPU_TYPE(INTEL_GOLDMONT_PLUS, "goldmont-plus") +X86_CPU_TYPE(INTEL_TREMONT, "tremont") // Alternate names supported by __builtin_cpu_is and target multiversioning. -X86_CPU_TYPE_COMPAT_ALIAS(INTEL_BONNELL, "atom") -X86_CPU_TYPE_COMPAT_ALIAS(AMDFAM10H, "amdfam10") -X86_CPU_TYPE_COMPAT_ALIAS(AMDFAM15H, "amdfam15") -X86_CPU_TYPE_COMPAT_ALIAS(INTEL_SILVERMONT, "slm") +X86_CPU_TYPE_ALIAS(INTEL_BONNELL, "atom") +X86_CPU_TYPE_ALIAS(AMDFAM10H, "amdfam10") +X86_CPU_TYPE_ALIAS(AMDFAM15H, "amdfam15") +X86_CPU_TYPE_ALIAS(INTEL_SILVERMONT, "slm") -#undef X86_CPU_TYPE_COMPAT_ALIAS -#undef X86_CPU_TYPE_COMPAT +#undef X86_CPU_TYPE_ALIAS #undef X86_CPU_TYPE // This macro is used for cpu subtypes present in compiler-rt/libgcc. -#ifndef X86_CPU_SUBTYPE_COMPAT -#define X86_CPU_SUBTYPE_COMPAT(ARCHNAME, ENUM, STR) X86_CPU_SUBTYPE(ARCHNAME, ENUM) -#endif - #ifndef X86_CPU_SUBTYPE -#define X86_CPU_SUBTYPE(ARCHNAME, ENUM) +#define X86_CPU_SUBTYPE(ENUM, STR) #endif -// The first part of this list must match what is implemented in libgcc and -// compilert-rt. Clang uses this to know how to implement __builtin_cpu_is. 
-X86_CPU_SUBTYPE_COMPAT("nehalem", INTEL_COREI7_NEHALEM, "nehalem") -X86_CPU_SUBTYPE_COMPAT("westmere", INTEL_COREI7_WESTMERE, "westmere") -X86_CPU_SUBTYPE_COMPAT("sandybridge", INTEL_COREI7_SANDYBRIDGE, "sandybridge") -X86_CPU_SUBTYPE_COMPAT("amdfam10", AMDFAM10H_BARCELONA, "barcelona") -X86_CPU_SUBTYPE_COMPAT("amdfam10", AMDFAM10H_SHANGHAI, "shanghai") -X86_CPU_SUBTYPE_COMPAT("amdfam10", AMDFAM10H_ISTANBUL, "istanbul") -X86_CPU_SUBTYPE_COMPAT("bdver1", AMDFAM15H_BDVER1, "bdver1") -X86_CPU_SUBTYPE_COMPAT("bdver2", AMDFAM15H_BDVER2, "bdver2") -X86_CPU_SUBTYPE_COMPAT("bdver3", AMDFAM15H_BDVER3, "bdver3") -X86_CPU_SUBTYPE_COMPAT("bdver4", AMDFAM15H_BDVER4, "bdver4") -X86_CPU_SUBTYPE_COMPAT("znver1", AMDFAM17H_ZNVER1, "znver1") -X86_CPU_SUBTYPE_COMPAT("ivybridge", INTEL_COREI7_IVYBRIDGE, "ivybridge") -X86_CPU_SUBTYPE_COMPAT("haswell", INTEL_COREI7_HASWELL, "haswell") -X86_CPU_SUBTYPE_COMPAT("broadwell", INTEL_COREI7_BROADWELL, "broadwell") -X86_CPU_SUBTYPE_COMPAT("skylake", INTEL_COREI7_SKYLAKE, "skylake") -X86_CPU_SUBTYPE_COMPAT("skylake-avx512", INTEL_COREI7_SKYLAKE_AVX512, "skylake-avx512") -X86_CPU_SUBTYPE_COMPAT("cannonlake", INTEL_COREI7_CANNONLAKE, "cannonlake") -X86_CPU_SUBTYPE_COMPAT("icelake-client", INTEL_COREI7_ICELAKE_CLIENT, "icelake-client") -X86_CPU_SUBTYPE_COMPAT("icelake-server", INTEL_COREI7_ICELAKE_SERVER, "icelake-server") -X86_CPU_SUBTYPE_COMPAT("znver2", AMDFAM17H_ZNVER2, "znver2") -X86_CPU_SUBTYPE_COMPAT("cascadelake", INTEL_COREI7_CASCADELAKE, "cascadelake") -X86_CPU_SUBTYPE_COMPAT("tigerlake", INTEL_COREI7_TIGERLAKE, "tigerlake") -X86_CPU_SUBTYPE_COMPAT("cooperlake", INTEL_COREI7_COOPERLAKE, "cooperlake") -#undef X86_CPU_SUBTYPE_COMPAT +// This list must match what is implemented in libgcc and compilert-rt. Clang +// uses this to know how to implement __builtin_cpu_is. 
+X86_CPU_SUBTYPE(INTEL_COREI7_NEHALEM, "nehalem") +X86_CPU_SUBTYPE(INTEL_COREI7_WESTMERE, "westmere") +X86_CPU_SUBTYPE(INTEL_COREI7_SANDYBRIDGE, "sandybridge") +X86_CPU_SUBTYPE(AMDFAM10H_BARCELONA, "barcelona") +X86_CPU_SUBTYPE(AMDFAM10H_SHANGHAI, "shanghai") +X86_CPU_SUBTYPE(AMDFAM10H_ISTANBUL, "istanbul") +X86_CPU_SUBTYPE(AMDFAM15H_BDVER1, "bdver1") +X86_CPU_SUBTYPE(AMDFAM15H_BDVER2, "bdver2") +X86_CPU_SUBTYPE(AMDFAM15H_BDVER3, "bdver3") +X86_CPU_SUBTYPE(AMDFAM15H_BDVER4, "bdver4") +X86_CPU_SUBTYPE(AMDFAM17H_ZNVER1, "znver1") +X86_CPU_SUBTYPE(INTEL_COREI7_IVYBRIDGE, "ivybridge") +X86_CPU_SUBTYPE(INTEL_COREI7_HASWELL, "haswell") +X86_CPU_SUBTYPE(INTEL_COREI7_BROADWELL, "broadwell") +X86_CPU_SUBTYPE(INTEL_COREI7_SKYLAKE, "skylake") +X86_CPU_SUBTYPE(INTEL_COREI7_SKYLAKE_AVX512, "skylake-avx512") +X86_CPU_SUBTYPE(INTEL_COREI7_CANNONLAKE, "cannonlake") +X86_CPU_SUBTYPE(INTEL_COREI7_ICELAKE_CLIENT, "icelake-client") +X86_CPU_SUBTYPE(INTEL_COREI7_ICELAKE_SERVER, "icelake-server") +X86_CPU_SUBTYPE(AMDFAM17H_ZNVER2, "znver2") +X86_CPU_SUBTYPE(INTEL_COREI7_CASCADELAKE, "cascadelake") +X86_CPU_SUBTYPE(INTEL_COREI7_TIGERLAKE, "tigerlake") +X86_CPU_SUBTYPE(INTEL_COREI7_COOPERLAKE, "cooperlake") #undef X86_CPU_SUBTYPE diff --git a/llvm/include/llvm/Support/X86TargetParser.h b/llvm/include/llvm/Support/X86TargetParser.h index 4a4fb8ccc4cc..66c474b5c275 100644 --- a/llvm/include/llvm/Support/X86TargetParser.h +++ b/llvm/include/llvm/Support/X86TargetParser.h @@ -34,7 +34,7 @@ enum ProcessorVendors : unsigned { // as a proxy for what's in libgcc/compiler-rt. enum ProcessorTypes : unsigned { CPU_TYPE_DUMMY, -#define X86_CPU_TYPE(ARCHNAME, ENUM) \ +#define X86_CPU_TYPE(ENUM, STRING) \ ENUM, #include "llvm/Support/X86TargetParser.def" CPU_TYPE_MAX @@ -44,7 +44,7 @@ enum ProcessorTypes : unsigned { // as a proxy for what's in libgcc/compiler-rt. 
enum ProcessorSubtypes : unsigned { CPU_SUBTYPE_DUMMY, -#define X86_CPU_SUBTYPE(ARCHNAME, ENUM) \ +#define X86_CPU_SUBTYPE(ENUM, STRING) \ ENUM, #include "llvm/Support/X86TargetParser.def" CPU_SUBTYPE_MAX From llvm-commits at lists.llvm.org Sun Jul 12 17:20:41 2020 From: llvm-commits at lists.llvm.org (Ruiling, Song via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 00:20:41 +0000 (UTC) Subject: [PATCH] D83020: [AMDGPU] Avoid using s_cmpk when src0 is not register In-Reply-To: References: Message-ID: ruiling added a comment. ping. can you help push the patch? @arsenm @nhaehnle Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83020/new/ https://reviews.llvm.org/D83020 From llvm-commits at lists.llvm.org Sun Jul 12 18:45:04 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 01:45:04 +0000 (UTC) Subject: [PATCH] D83650: [FileCheck] Extend -dump-input with substitutions Message-ID: jdenny created this revision. jdenny added reviewers: probinson, thopre, jhenderson, mehdi_amini. Herald added subscribers: llvm-commits, hiraditya, arichardson. Herald added a project: LLVM. Substitutions are already reported in the diagnostics appearing before the input dump in the case of failed directives, and they're reported in traces (produced by `-vv -dump-input=never`) in the case of successful directives. However, those reports are not always convenient to view while investigating the input dump, so this patch adds the substitution report to the input dump too. 
For example: $ cat check CHECK: hello [[WHAT:[a-z]+]] CHECK: [[VERB]] [[WHAT]] $ FileCheck -vv -DVERB=goodbye check < input |& tail -8 <<<<<< 1: hello world check:1 ^~~~~~~~~~~ 2: goodbye word check:2'0 X~~~~~~~~~~~ error: no match found check:2'1 with "VERB" equal to "goodbye" check:2'2 with "WHAT" equal to "world" >>>>>> Without this patch, the location reported for a substitution for a directive match is the directive's full match range. This location is misleading as it implies the substitution itself matches that range. This patch changes the reported location to just the match range start to suggest the substitution is known at the start of the match. (As in the above example, input dumps don't mark any range for substitutions. The location info in that case simply identifies the right line for the annotation.) Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83650 Files: llvm/include/llvm/Support/FileCheck.h llvm/lib/Support/FileCheck.cpp llvm/lib/Support/FileCheckImpl.h llvm/test/FileCheck/dump-input-annotations.txt llvm/test/FileCheck/verbose.txt llvm/utils/FileCheck/FileCheck.cpp -------------- next part -------------- A non-text attachment was scrubbed... Name: D83650.277311.patch Type: text/x-patch Size: 16465 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 18:48:11 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 01:48:11 +0000 (UTC) Subject: [PATCH] D83651: [FileCheck] Report captured variables Message-ID: jdenny created this revision. jdenny added reviewers: probinson, thopre, jhenderson, mehdi_amini. Herald added subscribers: llvm-commits, mgrang, hiraditya, arichardson. Herald added a project: LLVM. Report captured variables in input dumps and traces. 
For example: $ cat check CHECK: hello [[WHAT:[a-z]+]] CHECK: goodbye [[WHAT]] $ FileCheck -dump-input=always -vv check < input |& tail -8 <<<<<< 1: hello world check:1'0 ^~~~~~~~~~~ check:1'1 ^~~~~ captured var "WHAT" 2: goodbye world check:2'0 ^~~~~~~~~~~~~ check:2'1 with "WHAT" equal to "world" >>>>>> $ FileCheck -dump-input=never -vv check < input check2:1:8: remark: CHECK: expected string found in input CHECK: hello [[WHAT:[a-z]+]] ^ :1:1: note: found here hello world ^~~~~~~~~~~ :1:7: note: captured var "WHAT" hello world ^~~~~ check2:2:8: remark: CHECK: expected string found in input CHECK: goodbye [[WHAT]] ^ :2:1: note: found here goodbye world ^~~~~~~~~~~~~ :2:1: note: with "WHAT" equal to "world" goodbye world ^ Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D83651 Files: llvm/lib/Support/FileCheck.cpp llvm/lib/Support/FileCheckImpl.h llvm/test/FileCheck/dump-input-annotations.txt llvm/test/FileCheck/verbose.txt -------------- next part -------------- A non-text attachment was scrubbed... Name: D83651.277312.patch Type: text/x-patch Size: 11788 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 19:32:25 2020 From: llvm-commits at lists.llvm.org (Shinji Okumura via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 02:32:25 +0000 (UTC) Subject: [PATCH] D83246: [Attributor] Cache query results for isPotentiallyReachable in AAReachability In-Reply-To: References: Message-ID: <8ab5122296c8ea5815616ef3607ca7b6@localhost.localdomain> okura updated this revision to Diff 277314. okura retitled this revision from "[Attributor] use liveness information from AAIsDead in AAReachability and cache query results" to "[Attributor] Cache query results for isPotentiallyReachable in AAReachability". okura edited the summary of this revision. okura added a comment. fix according to review comments change the purpose of this patch to caching query results only. 
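(Editorial sketch, not part of the patch.) The retitled revision is about caching the results of a pairwise query. A minimal way to picture that is a map keyed on the (From, To) pair that is consulted before the expensive query runs. The class below is an assumption-laden illustration: DemoReachabilityCache and its members are hypothetical names, plain ints stand in for instruction pointers, and std::map is used where the patch itself uses a DenseMap.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a cache in front of an expensive pairwise query
// such as isPotentiallyReachable(); nothing here is LLVM API.
class DemoReachabilityCache {
public:
  using Query = std::function<bool(int, int)>;

  explicit DemoReachabilityCache(Query Q) : Compute(std::move(Q)) {}

  // Return the cached answer for (From, To) if the same query was already
  // answered; otherwise run the query once and remember its result.
  bool get(int From, int To) {
    auto Key = std::make_pair(From, To);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    bool Result = Compute(From, To);
    Cache.emplace(Key, Result);
    return Result;
  }

  // Number of distinct queries answered so far.
  std::size_t size() const { return Cache.size(); }

private:
  Query Compute;
  std::map<std::pair<int, int>, bool> Cache;
};
```

One design point worth noting: a cache like this is only sound while the underlying answer cannot change, so anything that alters the control flow would have to invalidate it.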
CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83246/new/ https://reviews.llvm.org/D83246 Files: llvm/include/llvm/Transforms/IPO/Attributor.h llvm/lib/Transforms/IPO/AttributorAttributes.cpp Index: llvm/lib/Transforms/IPO/AttributorAttributes.cpp =================================================================== --- llvm/lib/Transforms/IPO/AttributorAttributes.cpp +++ llvm/lib/Transforms/IPO/AttributorAttributes.cpp @@ -2469,7 +2469,7 @@ const auto &ReachabilityAA = A.getAAFor(*this, IRPosition::function(*ScopeFn)); - if (!ReachabilityAA.isAssumedReachable(UserI, getCtxI())) + if (!ReachabilityAA.isAssumedReachable(A, *UserI, *getCtxI())) return true; if (auto *CB = dyn_cast(UserI)) { Index: llvm/include/llvm/Transforms/IPO/Attributor.h =================================================================== --- llvm/include/llvm/Transforms/IPO/Attributor.h +++ llvm/include/llvm/Transforms/IPO/Attributor.h @@ -715,6 +715,21 @@ /// Return the map conaining all the knowledge we have from `llvm.assume`s. 
const RetainedKnowledgeMap &getKnowledgeMap() const { return KnowledgeMap; } + /// Return if \p To is potentially reachable form \p From or not + /// If the same query was answered, return cached result + bool getPotentiallyReachable(const Instruction &From, const Instruction &To) { + auto KeyPair = std::make_pair(&From, &To); + auto Iter = PotentiallyReachableMap.find(KeyPair); + if (Iter != PotentiallyReachableMap.end()) + return Iter->second; + const Function &F = *From.getFunction(); + bool Result = isPotentiallyReachable( + &From, &To, nullptr, AG.getAnalysis(F), + AG.getAnalysis(F)); + PotentiallyReachableMap.insert(std::make_pair(KeyPair, Result)); + return Result; + } + private: struct FunctionInfo { ~FunctionInfo(); @@ -774,6 +789,10 @@ /// Set of inlineable functions SmallPtrSet InlineableFunctions; + /// A map for caching results of queries for isPotentiallyReachable + DenseMap, bool> + PotentiallyReachableMap; + /// Give the Attributor access to the members so /// Attributor::identifyDefaultAbstractAttributes(...) can initialize them. friend struct Attributor; @@ -2313,16 +2332,17 @@ /// Returns true if 'From' instruction is assumed to reach, 'To' instruction. /// Users should provide two positions they are interested in, and the class /// determines (and caches) reachability. - bool isAssumedReachable(const Instruction *From, - const Instruction *To) const { - return isPotentiallyReachable(From, To); + bool isAssumedReachable(Attributor &A, const Instruction &From, + const Instruction &To) const { + return A.getInfoCache().getPotentiallyReachable(From, To); } /// Returns true if 'From' instruction is known to reach, 'To' instruction. /// Users should provide two positions they are interested in, and the class /// determines (and caches) reachability. 
- bool isKnownReachable(const Instruction *From, const Instruction *To) const { - return isPotentiallyReachable(From, To); + bool isKnownReachable(Attributor &A, const Instruction &From, + const Instruction &To) const { + return A.getInfoCache().getPotentiallyReachable(From, To); } /// Create an abstract attribute view for the position \p IRP. -------------- next part -------------- A non-text attachment was scrubbed... Name: D83246.277314.patch Type: text/x-patch Size: 3363 bytes Desc: not available URL: From llvm-commits at lists.llvm.org Sun Jul 12 19:54:35 2020 From: llvm-commits at lists.llvm.org (Bill Wendling via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 02:54:35 +0000 (UTC) Subject: [PATCH] D83523: MachineSink: permit sinking into INLINEASM_BR indirect targets In-Reply-To: References: Message-ID: <35f75801e59dbdae3a7de560cdbb8155@localhost.localdomain> void added a comment. I think this might be a better fix: diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp index de336abe607..d21407a60eb 100644 --- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp +++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp @@ -888,6 +888,7 @@ rescheduleMIBelowKill(MachineBasicBlock::iterator &mi, return false; if (KillMI->hasUnmodeledSideEffects() || KillMI->isCall() || + KillMI->getOpcode() == TargetOpcode::INLINEASM_BR || KillMI->isBranch() || KillMI->isTerminator()) // Don't move pass calls, etc. return false; @@ -948,6 +949,7 @@ rescheduleMIBelowKill(MachineBasicBlock::iterator &mi, return false; ++NumVisited; if (OtherMI.hasUnmodeledSideEffects() || OtherMI.isCall() || + OtherMI.getOpcode() == TargetOpcode::INLINEASM_BR || OtherMI.isBranch() || OtherMI.isTerminator()) // Don't move pass calls, etc. 
return false; @@ -1122,6 +1124,7 @@ rescheduleKillAboveMI(MachineBasicBlock::iterator &mi, return false; ++NumVisited; if (OtherMI.hasUnmodeledSideEffects() || OtherMI.isCall() || + OtherMI.getOpcode() == TargetOpcode::INLINEASM_BR || OtherMI.isBranch() || OtherMI.isTerminator()) // Don't move pass calls, etc. return false; Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83523/new/ https://reviews.llvm.org/D83523 From llvm-commits at lists.llvm.org Sun Jul 12 20:03:10 2020 From: llvm-commits at lists.llvm.org (Bill Wendling via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 03:03:10 +0000 (UTC) Subject: [PATCH] D83523: MachineSink: permit sinking into INLINEASM_BR indirect targets In-Reply-To: References: Message-ID: void added a comment. Here are a few other places to contemplate similar changes: diff --git a/llvm/lib/CodeGen/MachineCSE.cpp b/llvm/lib/CodeGen/MachineCSE.cpp index 09531276bc1..a6873c7acba 100644 --- a/llvm/lib/CodeGen/MachineCSE.cpp +++ b/llvm/lib/CodeGen/MachineCSE.cpp @@ -404,6 +404,7 @@ bool MachineCSE::isCSECandidate(MachineInstr *MI) { // Ignore stuff that we obviously can't move. 
if (MI->mayStore() || MI->isCall() || MI->isTerminator() || + MI->getOpcode() == TargetOpcode::INLINEASM_BR || MI->mayRaiseFPException() || MI->hasUnmodeledSideEffects()) return false; diff --git a/llvm/lib/CodeGen/ReachingDefAnalysis.cpp b/llvm/lib/CodeGen/ReachingDefAnalysis.cpp index 5bd8b4b8e27..90e829c925e 100644 --- a/llvm/lib/CodeGen/ReachingDefAnalysis.cpp +++ b/llvm/lib/CodeGen/ReachingDefAnalysis.cpp @@ -519,6 +519,7 @@ MachineInstr* ReachingDefAnalysis::getLocalLiveOutMIDef(MachineBasicBlock *MBB, static bool mayHaveSideEffects(MachineInstr &MI) { return MI.mayLoadOrStore() || MI.mayRaiseFPException() || MI.hasUnmodeledSideEffects() || MI.isTerminator() || + MI.getOpcode() == TargetOpcode::INLINEASM_BR || MI.isCall() || MI.isBarrier() || MI.isBranch() || MI.isReturn(); } Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83523/new/ https://reviews.llvm.org/D83523 From llvm-commits at lists.llvm.org Sun Jul 12 20:13:14 2020 From: llvm-commits at lists.llvm.org (Mehdi AMINI via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 03:13:14 +0000 (UTC) Subject: [PATCH] D83650: [FileCheck] Extend -dump-input with substitutions In-Reply-To: References: Message-ID: mehdi_amini added a comment. I agree that I'd like to have more info inline and this is going in the right direction, but what seems missing is still the check line itself. Have you looked into localizing it as well? 
<<<<<< 1: hello world check:1 ^~~~~~~~~~~ // CHECK: hello [[WHAT:[a-z]+]] 2: goodbye word check:2'0 X~~~~~~~~~~~ error: no match found check:2'0 CHECK: [[VERB]] [[WHAT]] check:2'1 with "VERB" equal to "goodbye" check:2'2 with "WHAT" equal to "world" >>>>>> Ultimately with some scheme like this the following header can be entirely omitted: error: CHECK: expected string not found in input CHECK: [[VERB]] [[WHAT]] ^ :2:1: note: scanning from here goodbye word ^ :2:1: note: with "VERB" equal to "goodbye" goodbye word ^ :2:1: note: with "WHAT" equal to "world" goodbye word ^ Input file: Check file: check -dump-input=help explains the following input dump. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83650/new/ https://reviews.llvm.org/D83650 From llvm-commits at lists.llvm.org Sun Jul 12 20:23:55 2020 From: llvm-commits at lists.llvm.org (Juneyoung Lee via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 03:23:55 +0000 (UTC) Subject: [PATCH] D83360: [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X In-Reply-To: References: Message-ID: <89a710a364b23dc6357a58991f5dbfcf@localhost.localdomain> aqjune added a comment. (renaming variables for readability) %a = select i1 %s, i1 undef, i1 %t %b = xor i1 %s, 1 %c = and i1 %a, %b This series of reasoning happened from a single SimplifyAndInst call: c = a & (s ^ 1) = (a & s) ^ (a & 1) ; ExpandBinOp = ((select s, undef, t) & s) ^ a = (select s, (undef & s), (t & s)) ^ a ; ThreadBinOpOverSelect = (select s, (undef & s), false) ^ a ; since s = (x == y), t = (x < y) = (select s, false, false) ^ a ; choose undef to be false = a = select s, undef, t In general, distributing `a` into operands of xor (second line) isn't sound because it increases the number of uses of `a`. We don't want to totally disable the simplification, however. If InstSimplify never increases the number of uses in the end, we have an alternative solution: tracking to which value undef is folded. 
Whenever an undef value is chosen to be a concrete value, the decision should be remembered, so the copied undefs won't be folded into different values. In case of InstSimplify, we can identify individual undefs by Use, since InstSimplify won't do any transformation inside. This means SimplifyXXX needs to return two things: the simplified value & the undef cache. Since InstSimplify isn't designed to do transformation directly, other optimizations like InstCombine should perform the final change. Does this solution make sense? Then, I can prepare a patch for this. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83360/new/ https://reviews.llvm.org/D83360 From llvm-commits at lists.llvm.org Sun Jul 12 20:41:02 2020 From: llvm-commits at lists.llvm.org (Joel E. Denny via Phabricator via llvm-commits) Date: Mon, 13 Jul 2020 03:41:02 +0000 (UTC) Subject: [PATCH] D83650: [FileCheck] Extend -dump-input with substitutions In-Reply-To: References: Message-ID: jdenny added a comment. In D83650#2146534 , @mehdi_amini wrote: > I agree that I'd like to have more info inline and this is going in the right direction, but what seems missing is still the check line itself. > > Have you looked into localizing it as well? I too have been thinking that would be a good idea. However, it's a lower priority because, at least for tests I work with, `check:1` makes it easy to find that info in a file I have to open anyway. > Ultimately with some scheme like this the following header can be entirely omitted: Probably so. 
Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D83650/new/ https://reviews.llvm.org/D83650 From llvm-commits at lists.llvm.org Sun Jul 12 21:03:54 2020 From: llvm-commits at lists.llvm.org (Johannes Doerfert via llvm-commits) Date: Sun, 12 Jul 2020 21:03:54 -0700 (PDT) Subject: [llvm] 7844366 - [OpenMP] Add firstprivate as a default data-sharing attribute to clang Message-ID: <5f0bdd2a.1c69fb81.8ce8c.0436@mx.google.com> Author: Atmn Patel Date: 2020-07-12T23:01:40-05:00 New Revision: 78443666bc18a6957d279a0f58319c8a3e57771a URL: https://github.com/llvm/llvm-project/commit/78443666bc18a6957d279a0f58319c8a3e57771a DIFF: https://github.com/llvm/llvm-project/commit/78443666bc18a6957d279a0f58319c8a3e57771a.diff LOG: [OpenMP] Add firstprivate as a default data-sharing attribute to clang This implements the default(firstprivate) clause as defined in OpenMP Technical Report 8 (2.22.4). Reviewed By: jdoerfert, ABataev Differential Revision: https://reviews.llvm.org/D75591 Added: Modified: clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp clang/docs/LibASTMatchersReference.html clang/include/clang/ASTMatchers/ASTMatchers.h clang/include/clang/Basic/DiagnosticParseKinds.td clang/lib/ASTMatchers/Dynamic/Registry.cpp clang/lib/Parse/ParseOpenMP.cpp clang/lib/Sema/SemaOpenMP.cpp clang/test/OpenMP/distribute_parallel_for_default_messages.cpp clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp clang/test/OpenMP/driver.c clang/test/OpenMP/parallel_default_messages.cpp clang/test/OpenMP/parallel_for_default_messages.cpp clang/test/OpenMP/parallel_for_simd_default_messages.cpp clang/test/OpenMP/parallel_master_codegen.cpp clang/test/OpenMP/parallel_master_default_messages.cpp clang/test/OpenMP/parallel_sections_default_messages.cpp clang/test/OpenMP/target_parallel_default_messages.cpp 
clang/test/OpenMP/target_parallel_for_default_messages.cpp
clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp
clang/test/OpenMP/target_teams_default_messages.cpp
clang/test/OpenMP/target_teams_distribute_default_messages.cpp
clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp
clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp
clang/test/OpenMP/task_default_messages.cpp
clang/test/OpenMP/task_messages.cpp
clang/test/OpenMP/teams_default_messages.cpp
clang/test/OpenMP/teams_distribute_default_messages.cpp
clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp
clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp
clang/test/OpenMP/teams_distribute_simd_default_messages.cpp
clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp
clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
clang/unittests/ASTMatchers/ASTMatchersTest.h
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def

Removed:

################################################################################
diff  --git a/clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst b/clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst
index 4223a10bd6e9..77114100ba1c 100644
--- a/clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst
+++ b/clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst
@@ -51,3 +51,12 @@ Example
     // WARNING: OpenMP directive ``parallel`` specifies ``default(shared)``
     // clause. Consider using ``default(none)`` clause instead.
   }
+
+  // ``parallel`` directive can have ``default`` clause, and said clause is
+  // specified, but with ``firstprivate`` kind, which is not ``none``, diagnose.
+  void p0_3() {
+    #pragma omp parallel default(firstprivate)
+      ;
+    // WARNING: OpenMP directive ``parallel`` specifies ``default(firstprivate)``
+    // clause. Consider using ``default(none)`` clause instead.
+  }

diff  --git a/clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp b/clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp
index 35d2d17b1e0e..d1d3b0e441f3 100644
--- a/clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp
+++ b/clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp
@@ -1,5 +1,5 @@
-// RUN: %check_clang_tidy %s openmp-use-default-none %t -- -- -fopenmp=libomp -fopenmp-version=40
-// RUN: %check_clang_tidy -std=c11 %s openmp-use-default-none %t -- -- -x c -fopenmp=libomp -fopenmp-version=40
+// RUN: %check_clang_tidy %s openmp-use-default-none %t -- -- -fopenmp=libomp -fopenmp-version=51
+// RUN: %check_clang_tidy -std=c11 %s openmp-use-default-none %t -- -- -x c -fopenmp=libomp -fopenmp-version=51
 
 //----------------------------------------------------------------------------//
 // Null cases.
@@ -42,6 +42,15 @@ void p0_2() {
   // CHECK-NOTES: :[[@LINE-3]]:22: note: existing 'default' clause specified here
 }
 
+// 'parallel' directive can have 'default' clause, and said clause specified,
+// but with 'firstprivate' kind, which is not 'none', diagnose.
+void p0_3() {
+#pragma omp parallel default(firstprivate)
+  ;
+  // CHECK-NOTES: :[[@LINE-2]]:1: warning: OpenMP directive 'parallel' specifies 'default(firstprivate)' clause, consider using 'default(none)' clause instead
+  // CHECK-NOTES: :[[@LINE-3]]:22: note: existing 'default' clause specified here
+}
+
 // 'task' directive.
 
 // 'task' directive can have 'default' clause, but said clause is not
@@ -68,6 +77,15 @@ void p1_2() {
   // CHECK-NOTES: :[[@LINE-3]]:18: note: existing 'default' clause specified here
 }
 
+// 'task' directive can have 'default' clause, and said clause specified,
+// but with 'firstprivate' kind, which is not 'none', diagnose.
+void p1_3() {
+#pragma omp task default(firstprivate)
+  ;
+  // CHECK-NOTES: :[[@LINE-2]]:1: warning: OpenMP directive 'task' specifies 'default(firstprivate)' clause, consider using 'default(none)' clause instead
+  // CHECK-NOTES: :[[@LINE-3]]:18: note: existing 'default' clause specified here
+}
+
 // 'teams' directive. (has to be inside of 'target' directive)
 
 // 'teams' directive can have 'default' clause, but said clause is not
@@ -97,6 +115,16 @@ void p2_2() {
   // CHECK-NOTES: :[[@LINE-3]]:19: note: existing 'default' clause specified here
 }
 
+// 'teams' directive can have 'default' clause, and said clause specified,
+// but with 'firstprivate' kind, which is not 'none', diagnose.
+void p2_3() {
+#pragma omp target
+#pragma omp teams default(firstprivate)
+  ;
+  // CHECK-NOTES: :[[@LINE-2]]:1: warning: OpenMP directive 'teams' specifies 'default(firstprivate)' clause, consider using 'default(none)' clause instead
+  // CHECK-NOTES: :[[@LINE-3]]:19: note: existing 'default' clause specified here
+}
+
 // 'taskloop' directive.
 
 // 'taskloop' directive can have 'default' clause, but said clause is not
@@ -126,6 +154,16 @@ void p3_2(const int a) {
   // CHECK-NOTES: :[[@LINE-4]]:22: note: existing 'default' clause specified here
 }
 
+// 'taskloop' directive can have 'default' clause, and said clause specified,
+// but with 'firstprivate' kind, which is not 'none', diagnose.
+void p3_3(const int a) {
+#pragma omp taskloop default(firstprivate)
+  for (int b = 0; b < a; b++)
+    ;
+  // CHECK-NOTES: :[[@LINE-3]]:1: warning: OpenMP directive 'taskloop' specifies 'default(firstprivate)' clause, consider using 'default(none)' clause instead
+  // CHECK-NOTES: :[[@LINE-4]]:22: note: existing 'default' clause specified here
+}
+
 //----------------------------------------------------------------------------//
 // Combined directives.
 // Let's not test every single possible permutation/combination of directives,
@@ -158,3 +196,13 @@ void p4_2(const int a) {
   // CHECK-NOTES: :[[@LINE-3]]:1: warning: OpenMP directive 'parallel for' specifies 'default(shared)' clause, consider using 'default(none)' clause instead
   // CHECK-NOTES: :[[@LINE-4]]:26: note: existing 'default' clause specified here
 }
+
+// 'parallel' directive can have 'default' clause, and said clause specified,
+// but with 'firstprivate' kind, which is not 'none', diagnose.
+void p4_3(const int a) {
+#pragma omp parallel for default(firstprivate)
+  for (int b = 0; b < a; b++)
+    ;
+  // CHECK-NOTES: :[[@LINE-3]]:1: warning: OpenMP directive 'parallel for' specifies 'default(firstprivate)' clause, consider using 'default(none)' clause instead
+  // CHECK-NOTES: :[[@LINE-4]]:26: note: existing 'default' clause specified here
+}

diff  --git a/clang/docs/LibASTMatchersReference.html b/clang/docs/LibASTMatchersReference.html
index 2256cbf71869..60ff6ffe6056 100644
--- a/clang/docs/LibASTMatchersReference.html
+++ b/clang/docs/LibASTMatchersReference.html
@@ -676,9 +676,10 @@ Node Matchers
   #pragma omp parallel default(none)
   #pragma omp parallel default(shared)
+  #pragma omp parallel default(firstprivate)
   #pragma omp parallel
 
-``ompDefaultClause()`` matches ``default(none)`` and ``default(shared)``.
+``ompDefaultClause()`` matches ``default(none)``, ``default(shared)``, and ``default(firstprivate)``.
 
@@ -3783,6 +3784,7 @@ Narrowing Matchers
   #pragma omp parallel
   #pragma omp parallel default(none)
   #pragma omp parallel default(shared)
+  #pragma omp parallel default(firstprivate)
 
 ``ompDefaultClause(isNoneKind())`` matches only ``default(none)``.
 
@@ -3796,11 +3798,26 @@ Narrowing Matchers
   #pragma omp parallel
   #pragma omp parallel default(none)
   #pragma omp parallel default(shared)
+  #pragma omp parallel default(firstprivate)
 
 ``ompDefaultClause(isSharedKind())`` matches only ``default(shared)``.
 
+Matcher<OMPDefaultClause> isFirstPrivateKind
+Matches if the OpenMP ``default`` clause has ``firstprivate`` kind specified.
+
+Given
+
+  #pragma omp parallel
+  #pragma omp parallel default(none)
+  #pragma omp parallel default(shared)
+  #pragma omp parallel default(firstprivate)
+
+``ompDefaultClause(isFirstPrivateKind())`` matches only ``default(firstprivate)``.
+
+
+
 Matcher<OMPExecutableDirective> isAllowedToContainClauseKind OpenMPClauseKind CKind
 Matches if the OpenMP directive is allowed to contain the specified OpenMP
 clause kind.

diff  --git a/clang/include/clang/ASTMatchers/ASTMatchers.h b/clang/include/clang/ASTMatchers/ASTMatchers.h
index f16fb876cdd3..643419743a11 100644
--- a/clang/include/clang/ASTMatchers/ASTMatchers.h
+++ b/clang/include/clang/ASTMatchers/ASTMatchers.h
@@ -7190,10 +7190,12 @@ AST_MATCHER_P(OMPExecutableDirective, hasAnyClause,
 /// \code
 ///   #pragma omp parallel default(none)
 ///   #pragma omp parallel default(shared)
+///   #pragma omp parallel default(firstprivate)
 ///   #pragma omp parallel
 /// \endcode
 ///
-/// ``ompDefaultClause()`` matches ``default(none)`` and ``default(shared)``.
+/// ``ompDefaultClause()`` matches ``default(none)``, ``default(shared)``, and
+/// ``default(firstprivate)``.
 extern const internal::VariadicDynCastAllOfMatcher
     ompDefaultClause;
 
@@ -7205,6 +7207,7 @@ extern const internal::VariadicDynCastAllOfMatcher
 ///   #pragma omp parallel
 ///   #pragma omp parallel default(none)
 ///   #pragma omp parallel default(shared)
+///   #pragma omp parallel default(firstprivate)
 /// \endcode
 ///
 /// ``ompDefaultClause(isNoneKind())`` matches only ``default(none)``.
@@ -7220,6 +7223,7 @@ AST_MATCHER(OMPDefaultClause, isNoneKind) {
 ///   #pragma omp parallel
 ///   #pragma omp parallel default(none)
 ///   #pragma omp parallel default(shared)
+///   #pragma omp parallel default(firstprivate)
 /// \endcode
 ///
 /// ``ompDefaultClause(isSharedKind())`` matches only ``default(shared)``.
@@ -7227,6 +7231,24 @@ AST_MATCHER(OMPDefaultClause, isSharedKind) {
   return Node.getDefaultKind() == llvm::omp::OMP_DEFAULT_shared;
 }
 
+/// Matches if the OpenMP ``default`` clause has ``firstprivate`` kind
+/// specified.
+///
+/// Given
+///
+/// \code
+///   #pragma omp parallel
+///   #pragma omp parallel default(none)
+///   #pragma omp parallel default(shared)
+///   #pragma omp parallel default(firstprivate)
+/// \endcode
+///
+/// ``ompDefaultClause(isFirstPrivateKind())`` matches only
+/// ``default(firstprivate)``.
+AST_MATCHER(OMPDefaultClause, isFirstPrivateKind) {
+  return Node.getDefaultKind() == llvm::omp::OMP_DEFAULT_firstprivate;
+}
+
 /// Matches if the OpenMP directive is allowed to contain the specified OpenMP
 /// clause kind.
 ///

diff  --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index f5b32a6ba5fa..1038a4119d4c 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1334,6 +1334,8 @@ def warn_omp_more_one_device_type_clause
       InGroup;
 def err_omp_variant_ctx_second_match_extension : Error<
   "only a single match extension allowed per OpenMP context selector">;
+def err_omp_invalid_dsa: Error<
+  "data-sharing attribute '%0' in '%1' clause requires OpenMP version %2 or above">;
 
 // Pragma loop support.
 def err_pragma_loop_missing_argument : Error<

diff  --git a/clang/lib/ASTMatchers/Dynamic/Registry.cpp b/clang/lib/ASTMatchers/Dynamic/Registry.cpp
index a0a65092a92b..ec2215804c09 100644
--- a/clang/lib/ASTMatchers/Dynamic/Registry.cpp
+++ b/clang/lib/ASTMatchers/Dynamic/Registry.cpp
@@ -389,6 +389,7 @@ RegistryMaps::RegistryMaps() {
   REGISTER_MATCHER(isExpr);
   REGISTER_MATCHER(isExternC);
   REGISTER_MATCHER(isFinal);
+  REGISTER_MATCHER(isFirstPrivateKind);
   REGISTER_MATCHER(isImplicit);
   REGISTER_MATCHER(isInStdNamespace);
   REGISTER_MATCHER(isInTemplateInstantiation);

diff  --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index afcef3043843..5223755c8fdf 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -1441,7 +1441,7 @@ bool Parser::parseOMPDeclareVariantMatchClause(SourceLocation Loc,
 /// Parsing of simple OpenMP clauses like 'default' or 'proc_bind'.
 ///
 ///    default-clause:
-///         'default' '(' 'none' | 'shared' ')
+///         'default' '(' 'none' | 'shared' | 'firstprivate' ')'
 ///
 ///    proc_bind-clause:
 ///         'proc_bind' '(' 'master' | 'close' | 'spread' ')
@@ -2772,7 +2772,7 @@ OMPClause *Parser::ParseOpenMPSingleExprClause(OpenMPClauseKind Kind,
 /// Parsing of simple OpenMP clauses like 'default' or 'proc_bind'.
 ///
 ///    default-clause:
-///         'default' '(' 'none' | 'shared' ')'
+///         'default' '(' 'none' | 'shared' | 'firstprivate' ')'
 ///
 ///    proc_bind-clause:
 ///         'proc_bind' '(' 'master' | 'close' | 'spread' ')'
@@ -2785,6 +2785,14 @@ OMPClause *Parser::ParseOpenMPSimpleClause(OpenMPClauseKind Kind,
   llvm::Optional Val = parseOpenMPSimpleClause(*this, Kind);
   if (!Val || ParseOnly)
     return nullptr;
+  if (getLangOpts().OpenMP < 51 && Kind == OMPC_default &&
+      static_cast<DefaultKind>(Val.getValue().Type) ==
+          OMP_DEFAULT_firstprivate) {
+    Diag(Val.getValue().LOpen, diag::err_omp_invalid_dsa)
+        << getOpenMPClauseName(OMPC_firstprivate)
+        << getOpenMPClauseName(OMPC_default) << "5.1";
+    return nullptr;
+  }
   return Actions.ActOnOpenMPSimpleClause(
       Kind, Val.getValue().Type, Val.getValue().TypeLoc, Val.getValue().LOpen,
       Val.getValue().Loc, Val.getValue().RLoc);
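
The parser hunk above gates default(firstprivate) on the OpenMP language
version and emits err_omp_invalid_dsa otherwise. A minimal standalone sketch
of that gate follows. It is not Clang code: the DefaultKind enum and the
checkDefaultClause helper are hypothetical stand-ins, and only the message
text and the "< 51" threshold are taken from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the parsed kind of a 'default' clause.
enum class DefaultKind { None, Shared, FirstPrivate };

// Returns the diagnostic text when the clause has to be rejected, or an empty
// string when it is accepted. OpenMPVersion uses the -fopenmp-version
// encoding (45, 50, 51, ...). This mirrors the condition added to
// ParseOpenMPSimpleClause; it is a model, not the parser itself.
inline std::string checkDefaultClause(int OpenMPVersion, DefaultKind Kind) {
  if (OpenMPVersion < 51 && Kind == DefaultKind::FirstPrivate)
    return "data-sharing attribute 'firstprivate' in 'default' clause "
           "requires OpenMP version 5.1 or above";
  return std::string();
}

// Small self-check so the sketch can be exercised without a test harness.
inline void selfCheckDefaultClauseGate() {
  assert(!checkDefaultClause(50, DefaultKind::FirstPrivate).empty());
  assert(checkDefaultClause(51, DefaultKind::FirstPrivate).empty());
}
```

In this model, 'none' and 'shared' are accepted at every version, and only
'firstprivate' depends on the version check.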

diff  --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index b27abb54c170..920463da4027 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -53,9 +53,10 @@ static const Expr *checkMapClauseExpressionBase(
 namespace {
 /// Default data sharing attributes, which can be applied to directive.
 enum DefaultDataSharingAttributes {
-  DSA_unspecified = 0, /// Data sharing attribute not specified.
-  DSA_none = 1 << 0,   /// Default data sharing attribute 'none'.
-  DSA_shared = 1 << 1, /// Default data sharing attribute 'shared'.
+  DSA_unspecified = 0,       /// Data sharing attribute not specified.
+  DSA_none = 1 << 0,         /// Default data sharing attribute 'none'.
+  DSA_shared = 1 << 1,       /// Default data sharing attribute 'shared'.
+  DSA_firstprivate = 1 << 2, /// Default data sharing attribute 'firstprivate'.
 };
 
 /// Stack for tracking declarations used in OpenMP directives and
@@ -684,6 +685,11 @@ class DSAStackTy {
     getTopOfStack().DefaultAttr = DSA_shared;
     getTopOfStack().DefaultAttrLoc = Loc;
   }
+  /// Set default data sharing attribute to firstprivate.
+  void setDefaultDSAFirstPrivate(SourceLocation Loc) {
+    getTopOfStack().DefaultAttr = DSA_firstprivate;
+    getTopOfStack().DefaultAttrLoc = Loc;
+  }
   /// Set default data mapping attribute to Modifier:Kind
   void setDefaultDMAAttr(OpenMPDefaultmapClauseModifier M,
                          OpenMPDefaultmapClauseKind Kind,
@@ -1183,6 +1189,15 @@ DSAStackTy::DSAVarData DSAStackTy::getDSA(const_iterator &Iter,
     return DVar;
   case DSA_none:
     return DVar;
+  case DSA_firstprivate:
+    if (VD->getStorageDuration() == SD_Static &&
+        VD->getDeclContext()->isFileContext()) {
+      DVar.CKind = OMPC_unknown;
+    } else {
+      DVar.CKind = OMPC_firstprivate;
+    }
+    DVar.ImplicitDSALoc = Iter->DefaultAttrLoc;
+    return DVar;
   case DSA_unspecified:
     // OpenMP [2.9.1.1, Data-sharing Attribute Rules for Variables Referenced
     // in a Construct, implicitly determined, p.2]
@@ -2058,7 +2073,13 @@ bool Sema::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
         // If the variable is artificial and must be captured by value - try to
         // capture by value.
         !(isa(D) && !D->hasAttr() &&
-          !cast(D)->getInit()->isGLValue());
+          !cast(D)->getInit()->isGLValue()) &&
+        // If the variable is implicitly firstprivate and scalar - capture by
+        // copy
+        !(DSAStack->getDefaultDSA() == DSA_firstprivate &&
+          !DSAStack->hasExplicitDSA(
+              D, [](OpenMPClauseKind K) { return K != OMPC_unknown; }, Level) &&
+          !DSAStack->isLoopControlVariable(D, Level).first);
   }
 
   // When passing data by copy, we need to make sure it fits the uintptr size
@@ -2185,10 +2206,13 @@ VarDecl *Sema::isOpenMPCapturedDecl(ValueDecl *D, bool CheckScopeInfo,
         DSAStack->isClauseParsingMode());
     // Global shared must not be captured.
     if (VD && !VD->hasLocalStorage() && DVarPrivate.CKind == OMPC_unknown &&
-        (DSAStack->getDefaultDSA() != DSA_none || DVarTop.CKind == OMPC_shared))
+        ((DSAStack->getDefaultDSA() != DSA_none &&
+          DSAStack->getDefaultDSA() != DSA_firstprivate) ||
+         DVarTop.CKind == OMPC_shared))
       return nullptr;
     if (DVarPrivate.CKind != OMPC_unknown ||
-        (VD && DSAStack->getDefaultDSA() == DSA_none))
+        (VD && (DSAStack->getDefaultDSA() == DSA_none ||
+                DSAStack->getDefaultDSA() == DSA_firstprivate)))
       return VD ? VD : cast<VarDecl>(DVarPrivate.PrivateCopy->getDecl());
   }
   return nullptr;
@@ -3333,10 +3357,19 @@ class DSAAttrChecker final : public StmtVisitor {
       // in the construct, and does not have a predetermined data-sharing
       // attribute, must have its data-sharing attribute explicitly determined
       // by being listed in a data-sharing attribute clause.
-      if (DVar.CKind == OMPC_unknown && Stack->getDefaultDSA() == DSA_none &&
+      if (DVar.CKind == OMPC_unknown &&
+          (Stack->getDefaultDSA() == DSA_none ||
+           Stack->getDefaultDSA() == DSA_firstprivate) &&
           isImplicitOrExplicitTaskingRegion(DKind) &&
           VarsWithInheritedDSA.count(VD) == 0) {
-        VarsWithInheritedDSA[VD] = E;
+        bool InheritedDSA = Stack->getDefaultDSA() == DSA_none;
+        if (!InheritedDSA && Stack->getDefaultDSA() == DSA_firstprivate) {
+          DSAStackTy::DSAVarData DVar =
+              Stack->getImplicitDSA(VD, /*FromParent=*/false);
+          InheritedDSA = DVar.CKind == OMPC_unknown;
+        }
+        if (InheritedDSA)
+          VarsWithInheritedDSA[VD] = E;
         return;
       }
 
@@ -3438,7 +3471,9 @@ class DSAAttrChecker final : public StmtVisitor {
 
       // Define implicit data-sharing attributes for task.
       DVar = Stack->getImplicitDSA(VD, /*FromParent=*/false);
-      if (isOpenMPTaskingDirective(DKind) && DVar.CKind != OMPC_shared &&
+      if (((isOpenMPTaskingDirective(DKind) && DVar.CKind != OMPC_shared) ||
+           (Stack->getDefaultDSA() == DSA_firstprivate &&
+            DVar.CKind == OMPC_firstprivate && !DVar.RefExpr)) &&
           !Stack->isLoopControlVariable(VD).first) {
         ImplicitFirstprivate.push_back(E);
         return;
@@ -5342,8 +5377,10 @@ StmtResult Sema::ActOnOpenMPExecutableDirective(
 
   ErrorFound = Res.isInvalid() || ErrorFound;
 
-  // Check variables in the clauses if default(none) was specified.
-  if (DSAStack->getDefaultDSA() == DSA_none) {
+  // Check variables in the clauses if default(none) or
+  // default(firstprivate) was specified.
+  if (DSAStack->getDefaultDSA() == DSA_none ||
+      DSAStack->getDefaultDSA() == DSA_firstprivate) {
     DSAAttrChecker DSAChecker(DSAStack, *this, nullptr);
     for (OMPClause *C : Clauses) {
       switch (C->getClauseKind()) {
@@ -5454,7 +5491,8 @@ StmtResult Sema::ActOnOpenMPExecutableDirective(
     if (P.getFirst()->isImplicit() || isa(P.getFirst()))
       continue;
     ErrorFound = true;
-    if (DSAStack->getDefaultDSA() == DSA_none) {
+    if (DSAStack->getDefaultDSA() == DSA_none ||
+        DSAStack->getDefaultDSA() == DSA_firstprivate) {
       Diag(P.second->getExprLoc(), diag::err_omp_no_dsa_for_variable)
           << P.first << P.second->getSourceRange();
       Diag(DSAStack->getDefaultDSALocation(), diag::note_omp_default_dsa_none);
@@ -12932,10 +12970,20 @@ OMPClause *Sema::ActOnOpenMPDefaultClause(DefaultKind Kind,
         << getOpenMPClauseName(OMPC_default);
     return nullptr;
   }
-  if (Kind == OMP_DEFAULT_none)
+
+  switch (Kind) {
+  case OMP_DEFAULT_none:
     DSAStack->setDefaultDSANone(KindKwLoc);
-  else if (Kind == OMP_DEFAULT_shared)
+    break;
+  case OMP_DEFAULT_shared:
     DSAStack->setDefaultDSAShared(KindKwLoc);
+    break;
+  case OMP_DEFAULT_firstprivate:
+    DSAStack->setDefaultDSAFirstPrivate(KindKwLoc);
+    break;
+  default:
+    llvm_unreachable("DSA unexpected in OpenMP default clause");
+  }
 
   return new (Context)
       OMPDefaultClause(Kind, KindKwLoc, StartLoc, LParenLoc, EndLoc);
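
The DSA_firstprivate case added to DSAStackTy::getDSA above can be read as a
two-way rule: a variable with static storage duration declared in a file
context gets no implicit attribute (so it must be listed explicitly), and any
other variable becomes implicitly firstprivate. A minimal sketch of that rule
follows. VarInfo and ImplicitDSA are hypothetical stand-ins for the VarDecl
queries and OpenMPClauseKind values used in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the clause kinds the patch assigns to DVar.CKind.
enum class ImplicitDSA { Unknown, FirstPrivate };

// Hypothetical summary of the two VarDecl queries used in the patch.
struct VarInfo {
  bool HasStaticStorage;   // models VD->getStorageDuration() == SD_Static
  bool DeclaredAtFileScope; // models VD->getDeclContext()->isFileContext()
};

// Models the DSA_firstprivate case: file-scope statics stay 'unknown' and
// therefore need an explicit data-sharing clause; everything else is
// implicitly firstprivate.
inline ImplicitDSA implicitDSAUnderDefaultFirstprivate(VarInfo V) {
  if (V.HasStaticStorage && V.DeclaredAtFileScope)
    return ImplicitDSA::Unknown;
  return ImplicitDSA::FirstPrivate;
}
```

This matches what the updated *_default_messages.cpp tests expect: the
file-scope statics x and y are diagnosed with "must have explicitly specified
data sharing attributes" under default(firstprivate), while locals such as i
and argc are not.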

diff  --git a/clang/test/OpenMP/distribute_parallel_for_default_messages.cpp b/clang/test/OpenMP/distribute_parallel_for_default_messages.cpp
index 0629ba096d0c..67e4615ae8c0 100644
--- a/clang/test/OpenMP/distribute_parallel_for_default_messages.cpp
+++ b/clang/test/OpenMP/distribute_parallel_for_default_messages.cpp
@@ -2,8 +2,17 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 template <class T>
 T tmain(T argc) {
   int i;
@@ -14,12 +23,12 @@ T tmain(T argc) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp distribute parallel for default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -34,7 +43,7 @@ T tmain(T argc) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -62,12 +71,12 @@ int main(int argc, char **argv) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp distribute parallel for default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -82,7 +91,7 @@ int main(int argc, char **argv) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -98,5 +107,15 @@ int main(int argc, char **argv) {
   for (i = 0; i < argc; ++i) // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
     foo();
 
+#ifdef OMP51
+#pragma omp target
+#pragma omp teams
+#pragma omp distribute parallel for default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  for (i = 0; i < argc; ++i) {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return (tmain(argc) + tmain(argv[0][0])); // expected-note {{in instantiation of function template specialization 'tmain' requested here}} expected-note {{in instantiation of function template specialization 'tmain' requested here}}
 }

diff  --git a/clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp b/clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp
index b9c5546ec5d9..9aab00f16c48 100644
--- a/clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp
+++ b/clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp
@@ -2,8 +2,17 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp -ferror-limit 100 -o - %s -Wuninitialized -DOMP51 -fopenmp-version=51
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized -DOMP51 -fopenmp-version=51
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 template <class T>
 T tmain(T argc) {
   int i;
@@ -14,12 +23,12 @@ T tmain(T argc) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for simd default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp distribute parallel for simd default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for simd default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for simd default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -34,7 +43,7 @@ T tmain(T argc) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for simd default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for simd default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -62,12 +71,12 @@ int main(int argc, char **argv) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for simd default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp distribute parallel for simd default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for simd default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for simd default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -82,7 +91,7 @@ int main(int argc, char **argv) {
     foo();
 #pragma omp target
 #pragma omp teams
-#pragma omp distribute parallel for simd default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp distribute parallel for simd default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target
@@ -90,6 +99,15 @@ int main(int argc, char **argv) {
 #pragma omp distribute parallel for simd default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (i = 0; i < argc; ++i)  // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
     foo();
+#ifdef OMP51
+#pragma omp target
+#pragma omp teams
+#pragma omp distribute parallel for simd default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  for (i = 0; i < argc; ++i) {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
 
 #pragma omp parallel default(none) // expected-note 2 {{explicit data sharing attribute requested here}}
 #pragma omp target

diff  --git a/clang/test/OpenMP/driver.c b/clang/test/OpenMP/driver.c
index fa5bd1a8b5f8..047478256f9f 100644
--- a/clang/test/OpenMP/driver.c
+++ b/clang/test/OpenMP/driver.c
@@ -47,6 +47,7 @@
 // RUN: %clang %s -c -E -dM -fopenmp-simd -fopenmp-version=31 | FileCheck --check-prefix=CHECK-VERSION %s
 // RUN: %clang %s -c -E -dM -fopenmp-simd -fopenmp-version=40 | FileCheck --check-prefix=CHECK-VERSION %s
 // RUN: %clang %s -c -E -dM -fopenmp-simd -fopenmp-version=45 | FileCheck --check-prefix=CHECK-VERSION %s
+// RUN: %clang %s -c -E -dM -fopenmp-simd -fopenmp-version=51 | FileCheck --check-prefix=CHECK-VERSION %s
 
 // CHECK-VERSION-NOT: #define _OPENMP
 

diff  --git a/clang/test/OpenMP/parallel_default_messages.cpp b/clang/test/OpenMP/parallel_default_messages.cpp
index 6b8ad6705185..b098c43852a8 100644
--- a/clang/test/OpenMP/parallel_default_messages.cpp
+++ b/clang/test/OpenMP/parallel_default_messages.cpp
@@ -4,18 +4,25 @@
 // RUN: %clang_cc1 -verify=expected,ge40 -fopenmp-version=40 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
 // RUN: %clang_cc1 -verify -fopenmp-version=31 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
 // RUN: %clang_cc1 -verify -fopenmp-version=30 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,ge40 -fopenmp-version=51 -fopenmp -DOMP51 -ferror-limit 100 -o - %s -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,ge40 -fopenmp-version=51 -fopenmp-simd -DOMP51 -ferror-limit 100 -o - %s -Wuninitialized
 
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   const int c = 0;
 
   #pragma omp parallel default // expected-error {{expected '(' after 'default'}}
-  #pragma omp parallel default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
-  #pragma omp parallel default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
-  #pragma omp parallel default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
-  #pragma omp parallel default (shared), default(shared) // expected-error {{directive '#pragma omp parallel' cannot contain more than one 'default' clause}}
-  #pragma omp parallel default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel default(  // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp parallel default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
+#pragma omp parallel default(none                     // expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp parallel default(shared), default(shared) // expected-error {{directive '#pragma omp parallel' cannot contain more than one 'default' clause}}
+#pragma omp parallel default(x)                       // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
 
   #pragma omp parallel default(none) // expected-note {{explicit data sharing attribute requested here}}
@@ -27,5 +34,14 @@ int main(int argc, char **argv) {
 
   #pragma omp parallel default(none) // ge40-note {{explicit data sharing attribute requested here}}
   (void)c; // ge40-error {{variable 'c' must have explicitly specified data sharing attributes}}
+
+#ifdef OMP51
+#pragma omp parallel default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/parallel_for_default_messages.cpp b/clang/test/OpenMP/parallel_for_default_messages.cpp
index b02fa8803a3b..c64b76948c01 100644
--- a/clang/test/OpenMP/parallel_for_default_messages.cpp
+++ b/clang/test/OpenMP/parallel_for_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=51 -DOMP51 -ferror-limit 100 -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=51 -DOMP51 -ferror-limit 100 -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   int i;
 #pragma omp parallel for default // expected-error {{expected '(' after 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp parallel for default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp parallel for default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp parallel for default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel for default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp parallel for default(none // expected-error {{expected ')'}} expected-note {{to match this '('}} expected-note {{explicit data sharing attribute requested here}}
@@ -21,7 +30,7 @@ int main(int argc, char **argv) {
 #pragma omp parallel for default(shared), default(shared) // expected-error {{directive '#pragma omp parallel for' cannot contain more than one 'default' clause}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp parallel for default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel for default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 
@@ -34,5 +43,13 @@ int main(int argc, char **argv) {
   for (i = 0; i < argc; ++i) // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
     foo();
 
+#ifdef OMP51
+#pragma omp parallel for default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  for (i = 0; i < argc; ++i) {
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/parallel_for_simd_default_messages.cpp b/clang/test/OpenMP/parallel_for_simd_default_messages.cpp
index 570ee14bbc84..6368d280de5d 100644
--- a/clang/test/OpenMP/parallel_for_simd_default_messages.cpp
+++ b/clang/test/OpenMP/parallel_for_simd_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   int i;
 #pragma omp parallel for simd default // expected-error {{expected '(' after 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp parallel for simd default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp parallel for simd default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp parallel for simd default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel for simd default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp parallel for simd default(none // expected-error {{expected ')'}} expected-note {{to match this '('}} expected-note {{explicit data sharing attribute requested here}}
@@ -21,7 +30,7 @@ int main(int argc, char **argv) {
 #pragma omp parallel for simd default(shared), default(shared) // expected-error {{directive '#pragma omp parallel for simd' cannot contain more than one 'default' clause}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp parallel for simd default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel for simd default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 
@@ -34,5 +43,13 @@ int main(int argc, char **argv) {
   for (i = 0; i < argc; ++i) // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}} expected-error {{variable 'i' must have explicitly specified data sharing attributes}}
     foo();
 
+#ifdef OMP51
+#pragma omp parallel for default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  for (i = 0; i < argc; ++i) {
+    x++; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    y++; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/parallel_master_codegen.cpp b/clang/test/OpenMP/parallel_master_codegen.cpp
index 9ffa941314b9..82e18c80f103 100644
--- a/clang/test/OpenMP/parallel_master_codegen.cpp
+++ b/clang/test/OpenMP/parallel_master_codegen.cpp
@@ -118,6 +118,162 @@ void parallel_master_private() {
 
 #endif
 
+#ifdef CK31
+///==========================================================================///
+// RUN: %clang_cc1 -DCK31 -fopenmp-version=51 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix CK31
+// RUN: %clang_cc1 -DCK31 -fopenmp-version=51 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -DCK31 -fopenmp-version=51 -fopenmp -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix CK31
+
+// RUN: %clang_cc1 -DCK31 -fopenmp-version=51 -verify -fopenmp-simd -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+// RUN: %clang_cc1 -DCK31 -fopenmp-version=51 -fopenmp-simd -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -DCK31 -fopenmp-version=51 -fopenmp-simd -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+// SIMD-ONLY0-NOT: {{__kmpc|__tgt}}
+
+// CK31-DAG:   %struct.ident_t = type { i32, i32, i32, i32, i8* }
+// CK31-DAG:   [[STR:@.+]] = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00"
+
+void parallel_master_default_firstprivate() {
+  int a;
+#pragma omp parallel master default(firstprivate)
+  a++;
+}
+
+// CK31-LABEL: define void @{{.+}}parallel_master{{.+}}
+// CK31:       [[A_VAL:%.+]] = alloca i32{{.+}}
+// CK31:       [[A_CASTED:%.+]] = alloca i64
+// CK31:       [[ZERO_VAL:%.+]] = load i32, i32* [[A_VAL]]
+// CK31:       [[CONV:%.+]] = bitcast i64* [[A_CASTED]] to i32*
+// CK31:       store i32 [[ZERO_VAL]], i32* [[CONV]]
+// CK31:       [[ONE_VAL:%.+]] = load i64, i64* [[A_CASTED]]
+// CK31:       call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @0, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i64)* @.omp_outlined. to void (i32*, i32*, ...)*), i64 [[ONE_VAL]])
+// CK31:       ret void
+
+// CK31:       [[GLOBAL_TID_ADDR:%.+]] = alloca i32*
+// CK31:       [[BOUND_TID_ADDR:%.+]] = alloca i32*
+// CK31:       [[A_ADDR:%.+]] = alloca i64{{.+}}
+// CK31:       store i32* [[GLOBAL_TID:%.+]], i32** [[GLOBAL_TID_ADDR]]{{.+}}
+// CK31:       store i32* [[BOUND_TID:%.+]], i32** [[BOUND_TID_ADDR]]
+// CK31:       store i64 [[A_VAL]], i64* [[A_ADDR]]
+// CK31:       [[CONV]] = bitcast i64* [[A_ADDR]]
+// CK31:       [[ZERO_VAL]] = load i32*, i32** [[GLOBAL_TID_ADDR]]
+// CK31:       [[ONE_VAL]] = load i32, i32* [[ZERO_VAL]]
+// CK31:       [[TWO_VAL:%.+]] = call i32 @__kmpc_master(%struct.ident_t* @0, i32 [[ONE_VAL]])
+// CK31:       [[THREE:%.+]] = icmp ne i32 [[TWO_VAL]], 0
+// CK31:       br i1 [[THREE]], label [[OMP_IF_THEN:%.+]], label [[OMP_IF_END:%.+]]
+
+// CK31:       [[FOUR:%.+]] = load i32, i32* [[CONV:%.+]]
+// CK31:       [[INC:%.+]] = add nsw i32 [[FOUR]]
+// CK31:       store i32 [[INC]], i32* [[CONV]]
+// CK31:       call void @__kmpc_end_master(%struct.ident_t* @0, i32 [[ONE_VAL]])
+// CK31:       br label [[OMP_IF_END]]
+
+// CK31:       ret void
+
+#endif
+
+#ifdef CK32
+///==========================================================================///
+// RUN: %clang_cc1 -DCK32 -fopenmp-version=51 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix CK32
+// RUN: %clang_cc1 -DCK32 -fopenmp-version=51 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -DCK32 -fopenmp-version=51 -fopenmp -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix CK32
+
+// RUN: %clang_cc1 -DCK32 -fopenmp-version=51 -verify -fopenmp-simd -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+// RUN: %clang_cc1 -DCK32 -fopenmp-version=51 -fopenmp-simd -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -DCK32 -fopenmp-version=51 -fopenmp-simd -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+// SIMD-ONLY0-NOT: {{__kmpc|__tgt}}
+
+// CK32-DAG:   %struct.ident_t = type { i32, i32, i32, i32, i8* }
+// CK32-DAG:   [[STR:@.+]] = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00"
+
+struct St {
+  int a, b;
+  static int y;
+  St() : a(0), b(0) {}
+  ~St() {}
+};
+int St::y = 0;
+
+void parallel_master_default_firstprivate() {
+  St a = St();
+  static int y = 0;
+#pragma omp parallel master default(firstprivate)
+  {
+    a.a += 1;
+    a.b += 1;
+    y++;
+    a.y++;
+  }
+}
+
+// CK32-LABEL: define {{.+}} @{{.+}}parallel_master_default_firstprivate{{.+}}
+// CK32: [[A_VAL:%.+]] = alloca %struct.St{{.+}}
+// CK32: [[Y_CASTED:%.+]] = alloca i64
+// CK32: call void @[[CTOR:.+]](%struct.St* [[A_VAL]])
+// CK32: [[ZERO:%.+]] = load i32, i32* @{{.+}}parallel_master_default_firstprivate{{.+}}
+// CK32: [[CONV:%.+]] = bitcast i64* [[Y_CASTED]] to i32*
+// CK32: store i32 [[ZERO]], i32* [[CONV]]
+// CK32: [[ONE:%.+]] = load i64, i64* [[Y_CASTED]]
+// CK32: call void {{.+}}@{{.+}} %struct.St* [[A_VAL]], i64 [[ONE]])
+// CK32: call void [[DTOR:@.+]](%struct.St* [[A_VAL]])
+
+// CK32: [[THIS_ADDR:%.+]] = alloca %struct.St*
+// CK32: store %struct.St* [[THIS:%.+]], %struct.St** [[THIS_ADDR]]
+// CK32: [[THIS_ONE:%.+]] = load %struct.St*, %struct.St** [[THIS_ADDR]]
+// CK32: call void [[CTOR_2:.+]](%struct.St* [[THIS_ONE]])
+// CK32: ret void
+
+// CK32: [[GLOBAL_TID_ADDR:%.+]] = alloca i32*
+// CK32: [[BOUND_TID_ADDR:%.+]] = alloca i32*
+// CK32: [[A_ADDR:%.+]] = alloca %struct.St
+// CK32: [[Y_ADDR:%.+]] = alloca i64
+// CK32: store i32* [[GLOBAL_TID:%.+]], i32** [[GLOBAL_TID_ADDR]]
+// CK32: store i32* [[BOUND_TID:%.+]], i32** [[BOUND_TID_ADDR]]
+// CK32: store %struct.St* [[A_VAL]], %struct.St** [[A_ADDR]]{{.+}}
+// CK32: store i64 [[Y:%.+]], i64* [[Y_ADDR]]
+// CK32: [[ONE:%.+]] = load i32*, i32** [[GLOBAL_TID_ADDR]]
+// CK32: [[TWO:%.+]] = load i32, i32* [[ONE]]
+// CK32: [[THREE:%.+]] = call i32 @{{.+}} i32 [[TWO]])
+// CK32: [[FOUR:%.+]] = icmp ne i32 [[THREE]], 0
+// CK32: br i1 [[FOUR]], label [[IF_THEN:%.+]], label [[IF_END:%.+]]
+
+// CK32: [[A_1:%.+]] = getelementptr inbounds %struct.St, %struct.St* [[ZERO]], i32 0, i32 0
+// CK32: [[FIVE:%.+]] = load i32, i32* [[A_1]]
+// CK32: [[ADD:%.+]] = add nsw i32 [[FIVE]], 1
+// CK32: store i32 [[ADD]], i32* [[A_1]]
+// CK32: [[B:%.+]] = getelementptr inbounds %struct.St, %struct.St* [[ZERO]], i32 0, i32 1
+// CK32: [[SIX:%.+]] = load i32, i32* [[B]]
+// CK32: [[ADD_2:%.+]] = add nsw i32 [[SIX]], 1
+// CK32: store i32 [[ADD_2]], i32* [[B]]
+// CK32: [[SEVEN:%.+]] = load i32, i32* [[CONV]]
+// CK32: [[INC:%.+]] = add nsw i32 [[SEVEN]], 1
+// CK32: store i32 [[INC]], i32* [[CONV]]
+// CK32: [[EIGHT:%.+]] = load i32, i32* [[FUNC:@.+]]
+// CK32: [[INC_3:%.+]] = add nsw i32 [[EIGHT]], 1
+// CK32: store i32 [[INC_3]], i32* @{{.+}}
+// CK32: call void @{{.+}} i32 [[TWO]])
+// CK32: br label [[IF_END]]
+
+// CK32: [[DTOR]](%struct.St* [[THIS]])
+// CK32: [[THIS_ADDR]] = alloca %struct.St*
+// CK32: store %struct.St* [[THIS]], %struct.St** [[THIS_ADDR]]
+// CK32: [[THIS_ONE]] = load %struct.St*, %struct.St** [[THIS_ADDR]]
+// CK32: call void @_ZN2StD2Ev(%struct.St* [[THIS_ONE]])
+
+// CK32: [[THIS_ADDR]] = alloca %struct.St*
+// CK32: store %struct.St* [[THIS]], %struct.St** [[THIS_ADDR]]
+// CK32: [[THIS_ONE]] = load %struct.St*, %struct.St** [[THIS_ADDR]]
+// CK32: [[A_VAL]] = getelementptr inbounds %struct.St, %struct.St* [[THIS_ONE]], i32 0, i32 0
+// CK32: store i32 0, i32* [[A_VAL]]
+// CK32: [[B_VAL:%.+]] = getelementptr inbounds %struct.St, %struct.St* [[THIS_ONE]], i32 0, i32 1
+// CK32: store i32 0, i32* [[B_VAL]]
+// CK32: ret void
+
+// CK32: [[THIS_ADDR:%.+]] = alloca %struct.St*
+// CK32: store %struct.St* %this, %struct.St** [[THIS_ADDR]]
+// CK32: [[THIS_ONE]] = load %struct.St*, %struct.St** [[THIS_ADDR]]
+
+#endif
+
 #ifdef CK4
 ///==========================================================================///
 // RUN: %clang_cc1 -DCK4 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix CK4

diff  --git a/clang/test/OpenMP/parallel_master_default_messages.cpp b/clang/test/OpenMP/parallel_master_default_messages.cpp
index 557cba5aa322..39f78ea53ae1 100644
--- a/clang/test/OpenMP/parallel_master_default_messages.cpp
+++ b/clang/test/OpenMP/parallel_master_default_messages.cpp
@@ -2,20 +2,29 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
 #pragma omp parallel master default // expected-error {{expected '(' after 'default'}}
   {
-#pragma omp parallel master default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp parallel master default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
     {
-#pragma omp parallel master default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel master default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
       {
 #pragma omp parallel master default(none // expected-error {{expected ')'}} expected-note {{to match this '('}}
         {
 #pragma omp parallel master default(shared), default(shared) // expected-error {{directive '#pragma omp parallel master' cannot contain more than one 'default' clause}}
           {
-#pragma omp parallel master default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel master default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
             {
               foo();
             }
@@ -37,5 +46,14 @@ int main(int argc, char **argv) {
       ++argc;  // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
     }
   }
+
+#ifdef OMP51
+#pragma omp parallel master default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/parallel_sections_default_messages.cpp b/clang/test/OpenMP/parallel_sections_default_messages.cpp
index d6a10fe56b34..cfa95445fb53 100644
--- a/clang/test/OpenMP/parallel_sections_default_messages.cpp
+++ b/clang/test/OpenMP/parallel_sections_default_messages.cpp
@@ -7,15 +7,15 @@ void foo();
 int main(int argc, char **argv) {
 #pragma omp parallel sections default // expected-error {{expected '(' after 'default'}}
   {
-#pragma omp parallel sections default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp parallel sections default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
     {
-#pragma omp parallel sections default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel sections default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
       {
 #pragma omp parallel sections default(none // expected-error {{expected ')'}} expected-note {{to match this '('}}
         {
 #pragma omp parallel sections default(shared), default(shared) // expected-error {{directive '#pragma omp parallel sections' cannot contain more than one 'default' clause}}
           {
-#pragma omp parallel sections default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp parallel sections default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
             {
               foo();
             }

diff  --git a/clang/test/OpenMP/target_parallel_default_messages.cpp b/clang/test/OpenMP/target_parallel_default_messages.cpp
index 0691cdf37e4e..c8f68659438f 100644
--- a/clang/test/OpenMP/target_parallel_default_messages.cpp
+++ b/clang/test/OpenMP/target_parallel_default_messages.cpp
@@ -2,20 +2,29 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   #pragma omp target parallel default // expected-error {{expected '(' after 'default'}}
   foo();
-  #pragma omp target parallel default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp target parallel default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   foo();
-  #pragma omp target parallel default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target parallel default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
   #pragma omp target parallel default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
   foo();
   #pragma omp target parallel default (shared), default(shared) // expected-error {{directive '#pragma omp target parallel' cannot contain more than one 'default' clause}}
   foo();
-  #pragma omp target parallel default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target parallel default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
 
   #pragma omp target parallel default(none) // expected-note {{explicit data sharing attribute requested here}}
@@ -28,5 +37,14 @@ int main(int argc, char **argv) {
   #pragma omp target parallel default(none) // expected-note {{explicit data sharing attribute requested here}}
   #pragma omp parallel default(shared)
   ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
+
+#ifndef OMP51
+#pragma omp target parallel default(firstprivate) // expected-error {{data-sharing attribute 'firstprivate' in 'default' clause requires OpenMP version 5.1 or above}}
+  {
+    ++x;
+    ++y;
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/target_parallel_for_default_messages.cpp b/clang/test/OpenMP/target_parallel_for_default_messages.cpp
index fc6ba43138d7..4a3aae68e086 100644
--- a/clang/test/OpenMP/target_parallel_for_default_messages.cpp
+++ b/clang/test/OpenMP/target_parallel_for_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=51 -DOMP51 -ferror-limit 100 -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=51 -DOMP51 -ferror-limit 100 -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   int i;
 #pragma omp target parallel for default // expected-error {{expected '(' after 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp target parallel for default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp target parallel for default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp target parallel for default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target parallel for default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target parallel for default(none // expected-error {{expected ')'}} expected-note {{to match this '('}} expected-note {{explicit data sharing attribute requested here}}
@@ -21,7 +30,7 @@ int main(int argc, char **argv) {
 #pragma omp target parallel for default(shared), default(shared) // expected-error {{directive '#pragma omp target parallel for' cannot contain more than one 'default' clause}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp target parallel for default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target parallel for default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 
@@ -34,5 +43,13 @@ int main(int argc, char **argv) {
   for (i = 0; i < argc; ++i) // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
     foo();
 
+#ifndef OMP51
+#pragma omp target parallel for default(firstprivate) // expected-error {{data-sharing attribute 'firstprivate' in 'default' clause requires OpenMP version 5.1 or above}}
+  for (i = 0; i < argc; ++i) {
+    ++x;
+    ++y;
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp b/clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp
index daa93b9c9050..48489309ef03 100644
--- a/clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp
+++ b/clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   int i;
 #pragma omp target parallel for simd default // expected-error {{expected '(' after 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp target parallel for simd default( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp target parallel for simd default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp target parallel for simd default() // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target parallel for simd default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 #pragma omp target parallel for simd default(none // expected-error {{expected ')'}} expected-note {{to match this '('}} expected-note {{explicit data sharing attribute requested here}}
@@ -21,7 +30,7 @@ int main(int argc, char **argv) {
 #pragma omp target parallel for simd default(shared), default(shared) // expected-error {{directive '#pragma omp target parallel for simd' cannot contain more than one 'default' clause}}
   for (i = 0; i < argc; ++i)
     foo();
-#pragma omp target parallel for simd default(x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target parallel for simd default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (i = 0; i < argc; ++i)
     foo();
 
@@ -34,5 +43,13 @@ int main(int argc, char **argv) {
   for (i = 0; i < argc; ++i) // expected-error {{variable 'i' must have explicitly specified data sharing attributes}} expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
     foo();
 
+#ifndef OMP51
+#pragma omp target parallel for simd default(firstprivate) // expected-error {{data-sharing attribute 'firstprivate' in 'default' clause requires OpenMP version 5.1 or above}}
+  for (int i = 0; i < argc; i++) {
+    ++x;
+    ++y;
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/target_teams_default_messages.cpp b/clang/test/OpenMP/target_teams_default_messages.cpp
index 21fa8270ef6a..85c417f8f985 100644
--- a/clang/test/OpenMP/target_teams_default_messages.cpp
+++ b/clang/test/OpenMP/target_teams_default_messages.cpp
@@ -2,20 +2,29 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
 #pragma omp target teams default // expected-error {{expected '(' after 'default'}}
   foo();
-#pragma omp target teams default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp target teams default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   foo();
-#pragma omp target teams default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
 #pragma omp target teams default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
   foo();
 #pragma omp target teams default (shared), default(shared) // expected-error {{directive '#pragma omp target teams' cannot contain more than one 'default' clause}}
   foo();
-#pragma omp target teams default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
 
 #pragma omp target teams default(none) // expected-note {{explicit data sharing attribute requested here}}
@@ -24,5 +33,14 @@ int main(int argc, char **argv) {
 #pragma omp target teams default(none) // expected-note {{explicit data sharing attribute requested here}}
 #pragma omp parallel default(shared)
   ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
+
+#ifndef OMP51
+#pragma omp target teams default(firstprivate) // expected-error {{data-sharing attribute 'firstprivate' in 'default' clause requires OpenMP version 5.1 or above}}
+  {
+    ++x;
+    ++y;
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/target_teams_distribute_default_messages.cpp b/clang/test/OpenMP/target_teams_distribute_default_messages.cpp
index fd834e7cba32..a490ad61385f 100644
--- a/clang/test/OpenMP/target_teams_distribute_default_messages.cpp
+++ b/clang/test/OpenMP/target_teams_distribute_default_messages.cpp
@@ -2,24 +2,41 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=51 -DOMP51 %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=51 -DOMP51 %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   #pragma omp target teams distribute default // expected-error {{expected '(' after 'default'}}
   for (int i=0; i<200; i++) foo();
-  #pragma omp target teams distribute default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp target teams distribute default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
-  #pragma omp target teams distribute default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams distribute default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target teams distribute default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target teams distribute default (shared), default(shared) // expected-error {{directive '#pragma omp target teams distribute' cannot contain more than one 'default' clause}}
   for (int i=0; i<200; i++) foo();
-  #pragma omp target teams distribute default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams distribute default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
   #pragma omp target teams distribute default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (int i=0; i<200; i++) ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
 
+#ifndef OMP51
+#pragma omp target teams distribute default(firstprivate) // expected-error {{data-sharing attribute 'firstprivate' in 'default' clause requires OpenMP version 5.1 or above}}
+  for (int i = 0; i < 200; i++) {
+    ++x;
+    ++y;
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp b/clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp
index 00e0704a6cca..2fe793136961 100644
--- a/clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp
+++ b/clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp
@@ -2,24 +2,41 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
 #pragma omp target teams distribute parallel for default // expected-error {{expected '(' after 'default'}}
   for (int i=0; i<200; i++) foo();
- #pragma omp target teams distribute parallel for default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp target teams distribute parallel for default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
-#pragma omp target teams distribute parallel for default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams distribute parallel for default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 #pragma omp target teams distribute parallel for default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
 #pragma omp target teams distribute parallel for default (shared), default(shared) // expected-error {{directive '#pragma omp target teams distribute parallel for' cannot contain more than one 'default' clause}}
   for (int i=0; i<200; i++) foo();
-#pragma omp target teams distribute parallel for default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams distribute parallel for default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
 #pragma omp target teams distribute parallel for default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (int i=0; i<200; i++) ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
 
+#ifndef OMP51
+#pragma omp target teams distribute parallel for default(firstprivate) // expected-error {{data-sharing attribute 'firstprivate' in 'default' clause requires OpenMP version 5.1 or above}}
+  for (int i = 0; i < 200; i++) {
+    ++x;
+    ++y;
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp b/clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp
index 7c46c964d2ec..e5ff85622250 100644
--- a/clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp
+++ b/clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp
@@ -2,16 +2,25 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
 #pragma omp target teams distribute parallel for simd default // expected-error {{expected '(' after 'default'}}
   for (int i=0; i<200; i++) foo();
 
-#pragma omp target teams distribute parallel for simd default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp target teams distribute parallel for simd default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
 
-#pragma omp target teams distribute parallel for simd default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams distribute parallel for simd default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
 #pragma omp target teams distribute parallel for simd default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
@@ -20,11 +29,19 @@ int main(int argc, char **argv) {
 #pragma omp target teams distribute parallel for simd default (shared), default(shared) // expected-error {{directive '#pragma omp target teams distribute parallel for simd' cannot contain more than one 'default' clause}}
   for (int i=0; i<200; i++) foo();
 
-#pragma omp target teams distribute parallel for simd default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp target teams distribute parallel for simd default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
 #pragma omp target teams distribute parallel for simd default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (int i=0; i<200; i++) ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
 
+#ifndef OMP51
+#pragma omp target teams distribute parallel for simd default(firstprivate) // expected-error {{data-sharing attribute 'firstprivate' in 'default' clause requires OpenMP version 5.1 or above}}
+  for (int i = 0; i < argc; ++i) {
+    ++x;
+    ++y;
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/task_default_messages.cpp b/clang/test/OpenMP/task_default_messages.cpp
index 4826c253aa04..8b6809ee05d5 100644
--- a/clang/test/OpenMP/task_default_messages.cpp
+++ b/clang/test/OpenMP/task_default_messages.cpp
@@ -2,15 +2,24 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -ferror-limit 100 -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -ferror-limit 100 -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
 #pragma omp task default                          // expected-error {{expected '(' after 'default'}}
-#pragma omp task default(                         // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
-#pragma omp task default()                        // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp task default(                         // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp task default()                        // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
 #pragma omp task default(none                     // expected-error {{expected ')'}} expected-note {{to match this '('}}
 #pragma omp task default(shared), default(shared) // expected-error {{directive '#pragma omp task' cannot contain more than one 'default' clause}}
-#pragma omp task default(x)                       // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp task default(x)                       // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
 
 #pragma omp task default(none) // expected-note {{explicit data sharing attribute requested here}}
@@ -19,5 +28,13 @@ int main(int argc, char **argv) {
 #pragma omp task default(none) // expected-note {{explicit data sharing attribute requested here}}
 #pragma omp task default(shared)
   ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
+
+#ifdef OMP51
+#pragma omp task default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
   return 0;
 }

diff  --git a/clang/test/OpenMP/task_messages.cpp b/clang/test/OpenMP/task_messages.cpp
index 8b3183e0bd93..13cbfb6c4569 100644
--- a/clang/test/OpenMP/task_messages.cpp
+++ b/clang/test/OpenMP/task_messages.cpp
@@ -4,6 +4,9 @@
 // RUN: %clang_cc1 -verify=expected,omp45 -fopenmp-version=45 -fopenmp-simd -ferror-limit 200 -std=c++11 -o - %s -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,omp50 -fopenmp-version=50 -fopenmp-simd -ferror-limit 200 -std=c++11 -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify=expected,omp50 -fopenmp-version=51 -DOMP51 -fopenmp -ferror-limit 100 -std=c++11 -o - %s -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,omp50 -fopenmp-version=51 -DOMP51 -fopenmp-simd -ferror-limit 100 -std=c++11 -o - %s -Wuninitialized
+
 void xxx(int argc) {
   int x; // expected-note {{initialize the variable 'x' to silence this warning}}
 #pragma omp task
@@ -16,6 +19,10 @@ void foo() {
 }
 
 typedef unsigned long omp_event_handle_t;
+namespace {
+static int y = 0;
+}
+static int x = 0;
 
 #pragma omp task // expected-error {{unexpected OpenMP directive '#pragma omp task'}}
 
@@ -52,6 +59,15 @@ int foo() {
 #pragma omp task default(none) // expected-note 2 {{explicit data sharing attribute requested here}}
 #pragma omp task default(shared)
   ++a; // expected-error 2 {{variable 'a' must have explicitly specified data sharing attributes}}
+#ifdef OMP51
+#pragma omp task default(firstprivate) // expected-note 4 {{explicit data sharing attribute requested here}}
+#pragma omp task
+  {
+    ++x; // expected-error 2 {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error 2 {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
 #pragma omp task default(none) // expected-note 2 {{explicit data sharing attribute requested here}}
 #pragma omp task
   // expected-error@+1 {{calling a private constructor of class 'S'}}

diff  --git a/clang/test/OpenMP/teams_default_messages.cpp b/clang/test/OpenMP/teams_default_messages.cpp
index a02505040600..b117ef4948a0 100644
--- a/clang/test/OpenMP/teams_default_messages.cpp
+++ b/clang/test/OpenMP/teams_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -o - %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp -o - %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd -o - %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   #pragma omp target
   #pragma omp teams default // expected-error {{expected '(' after 'default'}}
   foo();
   #pragma omp target
-  #pragma omp teams default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp teams default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   foo();
   #pragma omp target
-  #pragma omp teams default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
   #pragma omp target
   #pragma omp teams default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
@@ -21,7 +30,7 @@ int main(int argc, char **argv) {
   #pragma omp teams default (shared), default(shared) // expected-error {{directive '#pragma omp teams' cannot contain more than one 'default' clause}}
   foo();
   #pragma omp target
-  #pragma omp teams default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   foo();
 
   #pragma omp target
@@ -32,5 +41,14 @@ int main(int argc, char **argv) {
   #pragma omp teams default(none) // expected-note {{explicit data sharing attribute requested here}}
   #pragma omp parallel default(shared)
   ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
+
+#ifdef OMP51
+#pragma omp target
+#pragma omp teams default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
   return 0;
 }

diff  --git a/clang/test/OpenMP/teams_distribute_default_messages.cpp b/clang/test/OpenMP/teams_distribute_default_messages.cpp
index 7f000208303b..1d5fd40c53a6 100644
--- a/clang/test/OpenMP/teams_distribute_default_messages.cpp
+++ b/clang/test/OpenMP/teams_distribute_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   #pragma omp target
   #pragma omp teams distribute default // expected-error {{expected '(' after 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp teams distribute default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
   #pragma omp teams distribute default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
@@ -21,12 +30,21 @@ int main(int argc, char **argv) {
   #pragma omp teams distribute default (shared), default(shared) // expected-error {{directive '#pragma omp teams distribute' cannot contain more than one 'default' clause}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
   #pragma omp target
   #pragma omp teams distribute default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (int i=0; i<200; i++) ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
 
+#ifdef OMP51
+#pragma omp target
+#pragma omp teams distribute default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  for (int i = 0; i < 200; i++) {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp b/clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp
index 2c4662398507..3a414543be80 100644
--- a/clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp
+++ b/clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp %s -Wuninitialized
+
+// RUN: %clang_cc1 -verify -fopenmp-version=51 -DOMP51 -fopenmp-simd %s -Wuninitialized
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   #pragma omp target
   #pragma omp teams distribute parallel for default // expected-error {{expected '(' after 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute parallel for default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp teams distribute parallel for default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute parallel for default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute parallel for default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
   #pragma omp teams distribute parallel for default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
@@ -21,12 +30,21 @@ int main(int argc, char **argv) {
   #pragma omp teams distribute parallel for default (shared), default(shared) // expected-error {{directive '#pragma omp teams distribute parallel for' cannot contain more than one 'default' clause}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute parallel for default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute parallel for default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
   #pragma omp target
   #pragma omp teams distribute parallel for default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (int i=0; i<200; i++) ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
 
+#ifdef OMP51
+#pragma omp target
+#pragma omp teams distribute parallel for default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  for (int i = 0; i < 200; i++) {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp b/clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp
index 93017a8233ff..ce7f35b47959 100644
--- a/clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp
+++ b/clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp
@@ -2,17 +2,26 @@
 
 // RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized
 
+// RUN: %clang_cc1 -verify -fopenmp %s -Wuninitialized -fopenmp-version=51 -DOMP51
+
+// RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized -fopenmp-version=51 -DOMP51
+
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   #pragma omp target
   #pragma omp teams distribute parallel for simd default // expected-error {{expected '(' after 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute parallel for simd default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp teams distribute parallel for simd default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute parallel for simd default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute parallel for simd default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
   #pragma omp teams distribute parallel for simd default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
@@ -21,12 +30,20 @@ int main(int argc, char **argv) {
   #pragma omp teams distribute parallel for simd default (shared), default(shared) // expected-error {{directive '#pragma omp teams distribute parallel for simd' cannot contain more than one 'default' clause}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute parallel for simd default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute parallel for simd default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
   #pragma omp target
   #pragma omp teams distribute parallel for simd default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (int i=0; i<200; i++) ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
 
+#ifdef OpenMP51
+#pragma omp teams distribute parallel for default(firstprivate) // expected-note 2 {{explicit data sharing attribute requested here}}
+  for (int i = 0; i < 200; i++) {
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+  }
+#endif
+
   return 0;
 }

diff  --git a/clang/test/OpenMP/teams_distribute_simd_default_messages.cpp b/clang/test/OpenMP/teams_distribute_simd_default_messages.cpp
index 2775210ae048..11f5d1cd1fc8 100644
--- a/clang/test/OpenMP/teams_distribute_simd_default_messages.cpp
+++ b/clang/test/OpenMP/teams_distribute_simd_default_messages.cpp
@@ -1,18 +1,23 @@
-// RUN: %clang_cc1 -verify -fopenmp %s -Wuninitialized
+// RUN: %clang_cc1 -verify -fopenmp %s -Wuninitialized -fopenmp-version=51
 
-// RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized
+// RUN: %clang_cc1 -verify -fopenmp-simd %s -Wuninitialized -fopenmp-version=51
 
 void foo();
 
+namespace {
+static int y = 0;
+}
+static int x = 0;
+
 int main(int argc, char **argv) {
   #pragma omp target
   #pragma omp teams distribute simd default // expected-error {{expected '(' after 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute simd default ( // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
+#pragma omp teams distribute simd default( // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}} expected-error {{expected ')'}} expected-note {{to match this '('}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute simd default () // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute simd default() // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
   #pragma omp teams distribute simd default (none // expected-error {{expected ')'}} expected-note {{to match this '('}}
@@ -21,12 +26,22 @@ int main(int argc, char **argv) {
   #pragma omp teams distribute simd default (shared), default(shared) // expected-error {{directive '#pragma omp teams distribute simd' cannot contain more than one 'default' clause}}
   for (int i=0; i<200; i++) foo();
   #pragma omp target
-  #pragma omp teams distribute simd default (x) // expected-error {{expected 'none' or 'shared' in OpenMP clause 'default'}}
+#pragma omp teams distribute simd default(x) // expected-error {{expected 'none', 'shared' or 'firstprivate' in OpenMP clause 'default'}}
   for (int i=0; i<200; i++) foo();
 
   #pragma omp target
   #pragma omp teams distribute simd default(none) // expected-note {{explicit data sharing attribute requested here}}
   for (int i=0; i<200; i++) ++argc; // expected-error {{variable 'argc' must have explicitly specified data sharing attributes}}
 
+#pragma omp target
+#pragma omp teams distribute simd default(firstprivate) // expected-note {{explicit data sharing attribute requested here}}
+  for (int i = 0; i < 200; i++)
+    ++x; // expected-error {{variable 'x' must have explicitly specified data sharing attributes}}
+
+#pragma omp target
+#pragma omp teams distribute simd default(firstprivate) // expected-note {{explicit data sharing attribute requested here}}
+  for (int i = 0; i < 200; i++)
+    ++y; // expected-error {{variable 'y' must have explicitly specified data sharing attributes}}
+
   return 0;
 }

diff  --git a/clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp b/clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp
index aeb4fd098d22..687908043a8d 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp
+++ b/clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp
@@ -103,9 +103,9 @@ TEST(IsExpandedFromMacro, ShouldMatchFromCommandLine) {
   StringRef input = R"cc(
     void Test() { FOUR_PLUS_FOUR; }
   )cc";
-  EXPECT_TRUE(matchesConditionally(input,
-                                   binaryOperator(isExpandedFromMacro("FOUR_PLUS_FOUR")),
-                                   true, {"-std=c++11", "-DFOUR_PLUS_FOUR=4+4"}));
+  EXPECT_TRUE(matchesConditionally(
+      input, binaryOperator(isExpandedFromMacro("FOUR_PLUS_FOUR")), true,
+      {"-std=c++11", "-DFOUR_PLUS_FOUR=4+4"}));
 }
 
 TEST(IsExpandedFromMacro, ShouldNotMatchBeginOnly) {
@@ -143,31 +143,31 @@ TEST(IsExpandedFromMacro, ShouldNotMatchDifferentInstances) {
 }
 
 TEST(AllOf, AllOverloadsWork) {
-  const char Program[] =
-      "struct T { };"
-      "int f(int, T*, int, int);"
-      "void g(int x) { T t; f(x, &t, 3, 4); }";
-  EXPECT_TRUE(matches(Program,
-      callExpr(allOf(callee(functionDecl(hasName("f"))),
-                     hasArgument(0, declRefExpr(to(varDecl())))))));
-  EXPECT_TRUE(matches(Program,
-      callExpr(allOf(callee(functionDecl(hasName("f"))),
-                     hasArgument(0, declRefExpr(to(varDecl()))),
-                     hasArgument(1, hasType(pointsTo(
-                                        recordDecl(hasName("T")))))))));
-  EXPECT_TRUE(matches(Program,
-      callExpr(allOf(callee(functionDecl(hasName("f"))),
-                     hasArgument(0, declRefExpr(to(varDecl()))),
-                     hasArgument(1, hasType(pointsTo(
-                                        recordDecl(hasName("T"))))),
-                     hasArgument(2, integerLiteral(equals(3)))))));
-  EXPECT_TRUE(matches(Program,
-      callExpr(allOf(callee(functionDecl(hasName("f"))),
-                     hasArgument(0, declRefExpr(to(varDecl()))),
-                     hasArgument(1, hasType(pointsTo(
-                                        recordDecl(hasName("T"))))),
-                     hasArgument(2, integerLiteral(equals(3))),
-                     hasArgument(3, integerLiteral(equals(4)))))));
+  const char Program[] = "struct T { };"
+                         "int f(int, T*, int, int);"
+                         "void g(int x) { T t; f(x, &t, 3, 4); }";
+  EXPECT_TRUE(matches(
+      Program, callExpr(allOf(callee(functionDecl(hasName("f"))),
+                              hasArgument(0, declRefExpr(to(varDecl())))))));
+  EXPECT_TRUE(matches(
+      Program,
+      callExpr(
+          allOf(callee(functionDecl(hasName("f"))),
+                hasArgument(0, declRefExpr(to(varDecl()))),
+                hasArgument(1, hasType(pointsTo(recordDecl(hasName("T")))))))));
+  EXPECT_TRUE(matches(
+      Program, callExpr(allOf(
+                   callee(functionDecl(hasName("f"))),
+                   hasArgument(0, declRefExpr(to(varDecl()))),
+                   hasArgument(1, hasType(pointsTo(recordDecl(hasName("T"))))),
+                   hasArgument(2, integerLiteral(equals(3)))))));
+  EXPECT_TRUE(matches(
+      Program, callExpr(allOf(
+                   callee(functionDecl(hasName("f"))),
+                   hasArgument(0, declRefExpr(to(varDecl()))),
+                   hasArgument(1, hasType(pointsTo(recordDecl(hasName("T"))))),
+                   hasArgument(2, integerLiteral(equals(3))),
+                   hasArgument(3, integerLiteral(equals(4)))))));
 }
 
 TEST(DeclarationMatcher, MatchHas) {
@@ -176,127 +176,103 @@ TEST(DeclarationMatcher, MatchHas) {
   EXPECT_TRUE(matches("class X {};", HasClassX));
 
   DeclarationMatcher YHasClassX =
-    recordDecl(hasName("Y"), has(recordDecl(hasName("X"))));
+      recordDecl(hasName("Y"), has(recordDecl(hasName("X"))));
   EXPECT_TRUE(matches("class Y { class X {}; };", YHasClassX));
   EXPECT_TRUE(notMatches("class X {};", YHasClassX));
-  EXPECT_TRUE(
-    notMatches("class Y { class Z { class X {}; }; };", YHasClassX));
+  EXPECT_TRUE(notMatches("class Y { class Z { class X {}; }; };", YHasClassX));
 }
 
 TEST(DeclarationMatcher, MatchHasRecursiveAllOf) {
   DeclarationMatcher Recursive =
-    recordDecl(
-      has(recordDecl(
-        has(recordDecl(hasName("X"))),
-        has(recordDecl(hasName("Y"))),
-        hasName("Z"))),
-      has(recordDecl(
-        has(recordDecl(hasName("A"))),
-        has(recordDecl(hasName("B"))),
-        hasName("C"))),
-      hasName("F"));
-
-  EXPECT_TRUE(matches(
-    "class F {"
-      "  class Z {"
-      "    class X {};"
-      "    class Y {};"
-      "  };"
-      "  class C {"
-      "    class A {};"
-      "    class B {};"
-      "  };"
-      "};", Recursive));
-
-  EXPECT_TRUE(matches(
-    "class F {"
-      "  class Z {"
-      "    class A {};"
-      "    class X {};"
-      "    class Y {};"
-      "  };"
-      "  class C {"
-      "    class X {};"
-      "    class A {};"
-      "    class B {};"
-      "  };"
-      "};", Recursive));
-
-  EXPECT_TRUE(matches(
-    "class O1 {"
-      "  class O2 {"
-      "    class F {"
-      "      class Z {"
-      "        class A {};"
-      "        class X {};"
-      "        class Y {};"
-      "      };"
-      "      class C {"
-      "        class X {};"
-      "        class A {};"
-      "        class B {};"
-      "      };"
-      "    };"
-      "  };"
-      "};", Recursive));
+      recordDecl(has(recordDecl(has(recordDecl(hasName("X"))),
+                                has(recordDecl(hasName("Y"))), hasName("Z"))),
+                 has(recordDecl(has(recordDecl(hasName("A"))),
+                                has(recordDecl(hasName("B"))), hasName("C"))),
+                 hasName("F"));
+
+  EXPECT_TRUE(matches("class F {"
+                      "  class Z {"
+                      "    class X {};"
+                      "    class Y {};"
+                      "  };"
+                      "  class C {"
+                      "    class A {};"
+                      "    class B {};"
+                      "  };"
+                      "};",
+                      Recursive));
+
+  EXPECT_TRUE(matches("class F {"
+                      "  class Z {"
+                      "    class A {};"
+                      "    class X {};"
+                      "    class Y {};"
+                      "  };"
+                      "  class C {"
+                      "    class X {};"
+                      "    class A {};"
+                      "    class B {};"
+                      "  };"
+                      "};",
+                      Recursive));
+
+  EXPECT_TRUE(matches("class O1 {"
+                      "  class O2 {"
+                      "    class F {"
+                      "      class Z {"
+                      "        class A {};"
+                      "        class X {};"
+                      "        class Y {};"
+                      "      };"
+                      "      class C {"
+                      "        class X {};"
+                      "        class A {};"
+                      "        class B {};"
+                      "      };"
+                      "    };"
+                      "  };"
+                      "};",
+                      Recursive));
 }
 
 TEST(DeclarationMatcher, MatchHasRecursiveAnyOf) {
-  DeclarationMatcher Recursive =
-    recordDecl(
-      anyOf(
-        has(recordDecl(
-          anyOf(
-            has(recordDecl(
-              hasName("X"))),
-            has(recordDecl(
-              hasName("Y"))),
-            hasName("Z")))),
-        has(recordDecl(
-          anyOf(
-            hasName("C"),
-            has(recordDecl(
-              hasName("A"))),
-            has(recordDecl(
-              hasName("B")))))),
-        hasName("F")));
+  DeclarationMatcher Recursive = recordDecl(
+      anyOf(has(recordDecl(anyOf(has(recordDecl(hasName("X"))),
+                                 has(recordDecl(hasName("Y"))), hasName("Z")))),
+            has(recordDecl(anyOf(hasName("C"), has(recordDecl(hasName("A"))),
+                                 has(recordDecl(hasName("B")))))),
+            hasName("F")));
 
   EXPECT_TRUE(matches("class F {};", Recursive));
   EXPECT_TRUE(matches("class Z {};", Recursive));
   EXPECT_TRUE(matches("class C {};", Recursive));
   EXPECT_TRUE(matches("class M { class N { class X {}; }; };", Recursive));
   EXPECT_TRUE(matches("class M { class N { class B {}; }; };", Recursive));
-  EXPECT_TRUE(
-    matches("class O1 { class O2 {"
-              "  class M { class N { class B {}; }; }; "
-              "}; };", Recursive));
+  EXPECT_TRUE(matches("class O1 { class O2 {"
+                      "  class M { class N { class B {}; }; }; "
+                      "}; };",
+                      Recursive));
 }
 
 TEST(DeclarationMatcher, MatchNot) {
   DeclarationMatcher NotClassX =
-    cxxRecordDecl(
-      isDerivedFrom("Y"),
-      unless(hasName("X")));
+      cxxRecordDecl(isDerivedFrom("Y"), unless(hasName("X")));
   EXPECT_TRUE(notMatches("", NotClassX));
   EXPECT_TRUE(notMatches("class Y {};", NotClassX));
   EXPECT_TRUE(matches("class Y {}; class Z : public Y {};", NotClassX));
   EXPECT_TRUE(notMatches("class Y {}; class X : public Y {};", NotClassX));
   EXPECT_TRUE(
-    notMatches("class Y {}; class Z {}; class X : public Y {};",
-               NotClassX));
+      notMatches("class Y {}; class Z {}; class X : public Y {};", NotClassX));
 
   DeclarationMatcher ClassXHasNotClassY =
-    recordDecl(
-      hasName("X"),
-      has(recordDecl(hasName("Z"))),
-      unless(
-        has(recordDecl(hasName("Y")))));
+      recordDecl(hasName("X"), has(recordDecl(hasName("Z"))),
+                 unless(has(recordDecl(hasName("Y")))));
   EXPECT_TRUE(matches("class X { class Z {}; };", ClassXHasNotClassY));
-  EXPECT_TRUE(notMatches("class X { class Y {}; class Z {}; };",
-                         ClassXHasNotClassY));
+  EXPECT_TRUE(
+      notMatches("class X { class Y {}; class Z {}; };", ClassXHasNotClassY));
 
   DeclarationMatcher NamedNotRecord =
-    namedDecl(hasName("Foo"), unless(recordDecl()));
+      namedDecl(hasName("Foo"), unless(recordDecl()));
   EXPECT_TRUE(matches("void Foo(){}", NamedNotRecord));
   EXPECT_TRUE(notMatches("struct Foo {};", NamedNotRecord));
 }
@@ -318,67 +294,61 @@ TEST(CastExpression, HasCastKind) {
 
 TEST(DeclarationMatcher, HasDescendant) {
   DeclarationMatcher ZDescendantClassX =
-    recordDecl(
-      hasDescendant(recordDecl(hasName("X"))),
-      hasName("Z"));
+      recordDecl(hasDescendant(recordDecl(hasName("X"))), hasName("Z"));
   EXPECT_TRUE(matches("class Z { class X {}; };", ZDescendantClassX));
   EXPECT_TRUE(
-    matches("class Z { class Y { class X {}; }; };", ZDescendantClassX));
+      matches("class Z { class Y { class X {}; }; };", ZDescendantClassX));
+  EXPECT_TRUE(matches("class Z { class A { class Y { class X {}; }; }; };",
+                      ZDescendantClassX));
   EXPECT_TRUE(
-    matches("class Z { class A { class Y { class X {}; }; }; };",
-            ZDescendantClassX));
-  EXPECT_TRUE(
-    matches("class Z { class A { class B { class Y { class X {}; }; }; }; };",
-            ZDescendantClassX));
+      matches("class Z { class A { class B { class Y { class X {}; }; }; }; };",
+              ZDescendantClassX));
   EXPECT_TRUE(notMatches("class Z {};", ZDescendantClassX));
 
-  DeclarationMatcher ZDescendantClassXHasClassY =
-    recordDecl(
-      hasDescendant(recordDecl(has(recordDecl(hasName("Y"))),
-                               hasName("X"))),
+  DeclarationMatcher ZDescendantClassXHasClassY = recordDecl(
+      hasDescendant(recordDecl(has(recordDecl(hasName("Y"))), hasName("X"))),
       hasName("Z"));
   EXPECT_TRUE(matches("class Z { class X { class Y {}; }; };",
                       ZDescendantClassXHasClassY));
   EXPECT_TRUE(
-    matches("class Z { class A { class B { class X { class Y {}; }; }; }; };",
-            ZDescendantClassXHasClassY));
-  EXPECT_TRUE(notMatches(
-    "class Z {"
-      "  class A {"
-      "    class B {"
-      "      class X {"
-      "        class C {"
-      "          class Y {};"
-      "        };"
-      "      };"
-      "    }; "
-      "  };"
-      "};", ZDescendantClassXHasClassY));
+      matches("class Z { class A { class B { class X { class Y {}; }; }; }; };",
+              ZDescendantClassXHasClassY));
+  EXPECT_TRUE(notMatches("class Z {"
+                         "  class A {"
+                         "    class B {"
+                         "      class X {"
+                         "        class C {"
+                         "          class Y {};"
+                         "        };"
+                         "      };"
+                         "    }; "
+                         "  };"
+                         "};",
+                         ZDescendantClassXHasClassY));
 
   DeclarationMatcher ZDescendantClassXDescendantClassY =
-    recordDecl(
-      hasDescendant(recordDecl(hasDescendant(recordDecl(hasName("Y"))),
-                               hasName("X"))),
-      hasName("Z"));
-  EXPECT_TRUE(
-    matches("class Z { class A { class X { class B { class Y {}; }; }; }; };",
-            ZDescendantClassXDescendantClassY));
-  EXPECT_TRUE(matches(
-    "class Z {"
-      "  class A {"
-      "    class X {"
-      "      class B {"
-      "        class Y {};"
-      "      };"
-      "      class Y {};"
-      "    };"
-      "  };"
-      "};", ZDescendantClassXDescendantClassY));
+      recordDecl(hasDescendant(recordDecl(
+                     hasDescendant(recordDecl(hasName("Y"))), hasName("X"))),
+                 hasName("Z"));
+  EXPECT_TRUE(
+      matches("class Z { class A { class X { class B { class Y {}; }; }; }; };",
+              ZDescendantClassXDescendantClassY));
+  EXPECT_TRUE(matches("class Z {"
+                      "  class A {"
+                      "    class X {"
+                      "      class B {"
+                      "        class Y {};"
+                      "      };"
+                      "      class Y {};"
+                      "    };"
+                      "  };"
+                      "};",
+                      ZDescendantClassXDescendantClassY));
 }
 
 TEST(DeclarationMatcher, HasDescendantMemoization) {
   DeclarationMatcher CannotMemoize =
-    decl(hasDescendant(typeLoc().bind("x")), has(decl()));
+      decl(hasDescendant(typeLoc().bind("x")), has(decl()));
   EXPECT_TRUE(matches("void f() { int i; }", CannotMemoize));
 }
 
@@ -401,39 +371,36 @@ TEST(DeclarationMatcher, HasAncestorMemoization) {
   // That node can't be memoized so we have to check for it before trying to put
   // it on the cache.
   DeclarationMatcher CannotMemoize = classTemplateSpecializationDecl(
-    hasAnyTemplateArgument(templateArgument().bind("targ")),
-    forEach(fieldDecl(hasAncestor(forStmt()))));
+      hasAnyTemplateArgument(templateArgument().bind("targ")),
+      forEach(fieldDecl(hasAncestor(forStmt()))));
 
   EXPECT_TRUE(notMatches("template <typename T> struct S;"
-                           "template <> struct S<int>{ int i; int j; };",
+                         "template <> struct S<int>{ int i; int j; };",
                          CannotMemoize));
 }
 
 TEST(DeclarationMatcher, HasAttr) {
   EXPECT_TRUE(matches("struct __attribute__((warn_unused)) X {};",
                       decl(hasAttr(clang::attr::WarnUnused))));
-  EXPECT_FALSE(matches("struct X {};",
-                       decl(hasAttr(clang::attr::WarnUnused))));
+  EXPECT_FALSE(matches("struct X {};", decl(hasAttr(clang::attr::WarnUnused))));
 }
 
-
 TEST(DeclarationMatcher, MatchAnyOf) {
   DeclarationMatcher YOrZDerivedFromX = cxxRecordDecl(
-    anyOf(hasName("Y"), allOf(isDerivedFrom("X"), hasName("Z"))));
+      anyOf(hasName("Y"), allOf(isDerivedFrom("X"), hasName("Z"))));
   EXPECT_TRUE(matches("class X {}; class Z : public X {};", YOrZDerivedFromX));
   EXPECT_TRUE(matches("class Y {};", YOrZDerivedFromX));
   EXPECT_TRUE(
-    notMatches("class X {}; class W : public X {};", YOrZDerivedFromX));
+      notMatches("class X {}; class W : public X {};", YOrZDerivedFromX));
   EXPECT_TRUE(notMatches("class Z {};", YOrZDerivedFromX));
 
   DeclarationMatcher XOrYOrZOrU =
-    recordDecl(anyOf(hasName("X"), hasName("Y"), hasName("Z"), hasName("U")));
+      recordDecl(anyOf(hasName("X"), hasName("Y"), hasName("Z"), hasName("U")));
   EXPECT_TRUE(matches("class X {};", XOrYOrZOrU));
   EXPECT_TRUE(notMatches("class V {};", XOrYOrZOrU));
 
-  DeclarationMatcher XOrYOrZOrUOrV =
-    recordDecl(anyOf(hasName("X"), hasName("Y"), hasName("Z"), hasName("U"),
-                     hasName("V")));
+  DeclarationMatcher XOrYOrZOrUOrV = recordDecl(anyOf(
+      hasName("X"), hasName("Y"), hasName("Z"), hasName("U"), hasName("V")));
   EXPECT_TRUE(matches("class X {};", XOrYOrZOrUOrV));
   EXPECT_TRUE(matches("class Y {};", XOrYOrZOrUOrV));
   EXPECT_TRUE(matches("class Z {};", XOrYOrZOrUOrV));
@@ -447,8 +414,8 @@ TEST(DeclarationMatcher, MatchAnyOf) {
   EXPECT_TRUE(notMatches("int F() { return 1; }", MixedTypes));
 
   EXPECT_TRUE(
-    matches("void f() try { } catch (int) { } catch (...) { }",
-            cxxCatchStmt(anyOf(hasDescendant(varDecl()), isCatchAll()))));
+      matches("void f() try { } catch (int) { } catch (...) { }",
+              cxxCatchStmt(anyOf(hasDescendant(varDecl()), isCatchAll()))));
 }
 
 TEST(DeclarationMatcher, ClassIsDerived) {
@@ -460,19 +427,17 @@ TEST(DeclarationMatcher, ClassIsDerived) {
   EXPECT_TRUE(notMatches("class Y;", IsDerivedFromX));
   EXPECT_TRUE(notMatches("", IsDerivedFromX));
   EXPECT_TRUE(matches("class X {}; template class Y : Y, X {};",
-    IsDerivedFromX));
+                      IsDerivedFromX));
   EXPECT_TRUE(matches("class X {}; template class Y : X, Y {};",
-    IsDerivedFromX));
+                      IsDerivedFromX));
 
-  DeclarationMatcher IsZDerivedFromX = cxxRecordDecl(hasName("Z"),
-    isDerivedFrom("X"));
-  EXPECT_TRUE(
-    matches(
-      "class X {};"
-      "template class Y : Y {};"
-      "template<> class Y<0> : X {};"
-      "class Z : Y<1> {};",
-      IsZDerivedFromX));
+  DeclarationMatcher IsZDerivedFromX =
+      cxxRecordDecl(hasName("Z"), isDerivedFrom("X"));
+  EXPECT_TRUE(matches("class X {};"
+                      "template class Y : Y {};"
+                      "template<> class Y<0> : X {};"
+                      "class Z : Y<1> {};",
+                      IsZDerivedFromX));
 
   DeclarationMatcher IsDirectlyDerivedFromX =
       cxxRecordDecl(isDirectlyDerivedFrom("X"));
@@ -493,145 +458,138 @@ TEST(DeclarationMatcher, ClassIsDerived) {
   EXPECT_TRUE(notMatches("", IsAX));
 
   DeclarationMatcher ZIsDerivedFromX =
-    cxxRecordDecl(hasName("Z"), isDerivedFrom("X"));
+      cxxRecordDecl(hasName("Z"), isDerivedFrom("X"));
   DeclarationMatcher ZIsDirectlyDerivedFromX =
       cxxRecordDecl(hasName("Z"), isDirectlyDerivedFrom("X"));
   EXPECT_TRUE(
-    matches("class X {}; class Y : public X {}; class Z : public Y {};",
-            ZIsDerivedFromX));
+      matches("class X {}; class Y : public X {}; class Z : public Y {};",
+              ZIsDerivedFromX));
   EXPECT_TRUE(
       notMatches("class X {}; class Y : public X {}; class Z : public Y {};",
                  ZIsDirectlyDerivedFromX));
-  EXPECT_TRUE(
-    matches("class X {};"
-              "template class Y : public X {};"
-              "class Z : public Y {};", ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class X {};"
+                      "template class Y : public X {};"
+                      "class Z : public Y {};",
+                      ZIsDerivedFromX));
   EXPECT_TRUE(notMatches("class X {};"
                          "template class Y : public X {};"
                          "class Z : public Y {};",
                          ZIsDirectlyDerivedFromX));
   EXPECT_TRUE(matches("class X {}; template class Z : public X {};",
                       ZIsDerivedFromX));
+  EXPECT_TRUE(matches("template class X {}; "
+                      "template class Z : public X {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(matches("template class X {}; "
+                      "template class Z : public X {};",
+                      ZIsDerivedFromX));
   EXPECT_TRUE(
-    matches("template class X {}; "
-              "template class Z : public X {};",
-            ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("template class X {}; "
-              "template class Z : public X {};",
-            ZIsDerivedFromX));
-  EXPECT_TRUE(
-    notMatches("template class A { class Z : public X {}; };",
-               ZIsDerivedFromX));
+      notMatches("template class A { class Z : public X {}; };",
+                 ZIsDerivedFromX));
   EXPECT_TRUE(
-    matches("template class A { public: class Z : public X {}; }; "
-              "class X{}; void y() { A::Z z; }", ZIsDerivedFromX));
+      matches("template class A { public: class Z : public X {}; }; "
+              "class X{}; void y() { A::Z z; }",
+              ZIsDerivedFromX));
   EXPECT_TRUE(
-    matches("template  class X {}; "
+      matches("template  class X {}; "
               "template class A { class Z : public X {}; };",
-            ZIsDerivedFromX));
-  EXPECT_TRUE(
-    notMatches("template class X> class A { "
-                 "  class Z : public X {}; };", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("template class X> class A { "
-              "  public: class Z : public X {}; }; "
-              "template class X {}; void y() { A::Z z; }",
-            ZIsDerivedFromX));
-  EXPECT_TRUE(
-    notMatches("template class A { class Z : public X::D {}; };",
-               ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("template class A { public: "
-              "  class Z : public X::D {}; }; "
-              "class Y { public: class X {}; typedef X D; }; "
-              "void y() { A::Z z; }", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class X {}; typedef X Y; class Z : public Y {};",
-            ZIsDerivedFromX));
+              ZIsDerivedFromX));
+  EXPECT_TRUE(notMatches("template class X> class A { "
+                         "  class Z : public X {}; };",
+                         ZIsDerivedFromX));
+  EXPECT_TRUE(matches("template class X> class A { "
+                      "  public: class Z : public X {}; }; "
+                      "template class X {}; void y() { A::Z z; }",
+                      ZIsDerivedFromX));
   EXPECT_TRUE(
-    matches("template class Y { typedef typename T::U X; "
-              "  class Z : public X {}; };", ZIsDerivedFromX));
-  EXPECT_TRUE(matches("class X {}; class Z : public ::X {};",
+      notMatches("template class A { class Z : public X::D {}; };",
+                 ZIsDerivedFromX));
+  EXPECT_TRUE(matches("template class A { public: "
+                      "  class Z : public X::D {}; }; "
+                      "class Y { public: class X {}; typedef X D; }; "
+                      "void y() { A::Z z; }",
                       ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class X {}; typedef X Y; class Z : public Y {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(matches("template class Y { typedef typename T::U X; "
+                      "  class Z : public X {}; };",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class X {}; class Z : public ::X {};", ZIsDerivedFromX));
   EXPECT_TRUE(
-    notMatches("template class X {}; "
+      notMatches("template class X {}; "
                  "template class A { class Z : public X::D {}; };",
-               ZIsDerivedFromX));
+                 ZIsDerivedFromX));
   EXPECT_TRUE(
-    matches("template class X { public: typedef X D; }; "
+      matches("template class X { public: typedef X D; }; "
               "template class A { public: "
               "  class Z : public X::D {}; }; void y() { A::Z z; }",
-            ZIsDerivedFromX));
+              ZIsDerivedFromX));
   EXPECT_TRUE(
-    notMatches("template class A { class Z : public X::D::E {}; };",
-               ZIsDerivedFromX));
+      notMatches("template class A { class Z : public X::D::E {}; };",
+                 ZIsDerivedFromX));
   EXPECT_TRUE(
-    matches("class X {}; typedef X V; typedef V W; class Z : public W {};",
-            ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class X {}; class Y : public X {}; "
-              "typedef Y V; typedef V W; class Z : public W {};",
-            ZIsDerivedFromX));
+      matches("class X {}; typedef X V; typedef V W; class Z : public W {};",
+              ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class X {}; class Y : public X {}; "
+                      "typedef Y V; typedef V W; class Z : public W {};",
+                      ZIsDerivedFromX));
   EXPECT_TRUE(notMatches("class X {}; class Y : public X {}; "
                          "typedef Y V; typedef V W; class Z : public W {};",
                          ZIsDirectlyDerivedFromX));
   EXPECT_TRUE(
-    matches("template class X {}; "
+      matches("template class X {}; "
               "template class A { class Z : public X {}; };",
-            ZIsDerivedFromX));
+              ZIsDerivedFromX));
   EXPECT_TRUE(
-    notMatches("template class D { typedef X A; typedef A B; "
+      notMatches("template class D { typedef X A; typedef A B; "
                  "  typedef B C; class Z : public C {}; };",
-               ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class X {}; typedef X A; typedef A B; "
-              "class Z : public B {};", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class X {}; typedef X A; typedef A B; typedef B C; "
-              "class Z : public C {};", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class U {}; typedef U X; typedef X V; "
-              "class Z : public V {};", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class Base {}; typedef Base X; "
-              "class Z : public Base {};", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class Base {}; typedef Base Base2; typedef Base2 X; "
-              "class Z : public Base {};", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    notMatches("class Base {}; class Base2 {}; typedef Base2 X; "
-                 "class Z : public Base {};", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    matches("class A {}; typedef A X; typedef A Y; "
-              "class Z : public Y {};", ZIsDerivedFromX));
-  EXPECT_TRUE(
-    notMatches("template  class Z;"
-                 "template <> class Z {};"
-                 "template  class Z : public Z {};",
-               IsDerivedFromX));
-  EXPECT_TRUE(
-    matches("template  class X;"
-              "template <> class X {};"
-              "template  class X : public X {};",
-            IsDerivedFromX));
-  EXPECT_TRUE(matches(
-    "class X {};"
-      "template  class Z;"
-      "template <> class Z {};"
-      "template  class Z : public Z, public X {};",
-    ZIsDerivedFromX));
-  EXPECT_TRUE(
-    notMatches("template struct X;"
+                 ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class X {}; typedef X A; typedef A B; "
+                      "class Z : public B {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class X {}; typedef X A; typedef A B; typedef B C; "
+                      "class Z : public C {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class U {}; typedef U X; typedef X V; "
+                      "class Z : public V {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class Base {}; typedef Base X; "
+                      "class Z : public Base {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class Base {}; typedef Base Base2; typedef Base2 X; "
+                      "class Z : public Base {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(notMatches("class Base {}; class Base2 {}; typedef Base2 X; "
+                         "class Z : public Base {};",
+                         ZIsDerivedFromX));
+  EXPECT_TRUE(matches("class A {}; typedef A X; typedef A Y; "
+                      "class Z : public Y {};",
+                      ZIsDerivedFromX));
+  EXPECT_TRUE(notMatches("template  class Z;"
+                         "template <> class Z {};"
+                         "template  class Z : public Z {};",
+                         IsDerivedFromX));
+  EXPECT_TRUE(matches("template  class X;"
+                      "template <> class X {};"
+                      "template  class X : public X {};",
+                      IsDerivedFromX));
+  EXPECT_TRUE(
+      matches("class X {};"
+              "template  class Z;"
+              "template <> class Z {};"
+              "template  class Z : public Z, public X {};",
+              ZIsDerivedFromX));
+  EXPECT_TRUE(
+      notMatches("template struct X;"
                  "template struct X : public X {};",
-               cxxRecordDecl(isDerivedFrom(recordDecl(hasName("Some"))))));
+                 cxxRecordDecl(isDerivedFrom(recordDecl(hasName("Some"))))));
   EXPECT_TRUE(matches(
-    "struct A {};"
+      "struct A {};"
       "template struct X;"
       "template struct X : public X {};"
       "template<> struct X<0> : public A {};"
       "struct B : public X<42> {};",
-    cxxRecordDecl(hasName("B"), isDerivedFrom(recordDecl(hasName("A"))))));
+      cxxRecordDecl(hasName("B"), isDerivedFrom(recordDecl(hasName("A"))))));
   EXPECT_TRUE(notMatches(
       "struct A {};"
       "template struct X;"
@@ -645,7 +603,7 @@ TEST(DeclarationMatcher, ClassIsDerived) {
   // get rid of the Variable(...) matching and match the right template
   // declarations directly.
   const char *RecursiveTemplateOneParameter =
-    "class Base1 {}; class Base2 {};"
+      "class Base1 {}; class Base2 {};"
       "template  class Z;"
       "template <> class Z : public Base1 {};"
       "template <> class Z : public Base2 {};"
@@ -654,21 +612,21 @@ TEST(DeclarationMatcher, ClassIsDerived) {
       "template  class Z : public Z, public Z {};"
       "void f() { Z z_float; Z z_double; Z z_char; }";
   EXPECT_TRUE(matches(
-    RecursiveTemplateOneParameter,
-    varDecl(hasName("z_float"),
-            hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base1")))))));
+      RecursiveTemplateOneParameter,
+      varDecl(hasName("z_float"),
+              hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base1")))))));
   EXPECT_TRUE(notMatches(
-    RecursiveTemplateOneParameter,
-    varDecl(hasName("z_float"),
-            hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base2")))))));
-  EXPECT_TRUE(matches(
-    RecursiveTemplateOneParameter,
-    varDecl(hasName("z_char"),
-            hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base1"),
-                                                 isDerivedFrom("Base2")))))));
+      RecursiveTemplateOneParameter,
+      varDecl(hasName("z_float"),
+              hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base2")))))));
+  EXPECT_TRUE(
+      matches(RecursiveTemplateOneParameter,
+              varDecl(hasName("z_char"),
+                      hasInitializer(hasType(cxxRecordDecl(
+                          isDerivedFrom("Base1"), isDerivedFrom("Base2")))))));
 
   const char *RecursiveTemplateTwoParameters =
-    "class Base1 {}; class Base2 {};"
+      "class Base1 {}; class Base2 {};"
       "template  class Z;"
       "template  class Z : public Base1 {};"
       "template  class Z : public Base2 {};"
@@ -679,34 +637,31 @@ TEST(DeclarationMatcher, ClassIsDerived) {
       "void f() { Z z_float; Z z_double; "
       "           Z z_char; }";
   EXPECT_TRUE(matches(
-    RecursiveTemplateTwoParameters,
-    varDecl(hasName("z_float"),
-            hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base1")))))));
-  EXPECT_TRUE(notMatches(
-    RecursiveTemplateTwoParameters,
-    varDecl(hasName("z_float"),
-            hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base2")))))));
-  EXPECT_TRUE(matches(
-    RecursiveTemplateTwoParameters,
-    varDecl(hasName("z_char"),
-            hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base1"),
-                                                 isDerivedFrom("Base2")))))));
-  EXPECT_TRUE(matches(
-    "namespace ns { class X {}; class Y : public X {}; }",
-    cxxRecordDecl(isDerivedFrom("::ns::X"))));
+      RecursiveTemplateTwoParameters,
+      varDecl(hasName("z_float"),
+              hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base1")))))));
   EXPECT_TRUE(notMatches(
-    "class X {}; class Y : public X {};",
-    cxxRecordDecl(isDerivedFrom("::ns::X"))));
+      RecursiveTemplateTwoParameters,
+      varDecl(hasName("z_float"),
+              hasInitializer(hasType(cxxRecordDecl(isDerivedFrom("Base2")))))));
+  EXPECT_TRUE(
+      matches(RecursiveTemplateTwoParameters,
+              varDecl(hasName("z_char"),
+                      hasInitializer(hasType(cxxRecordDecl(
+                          isDerivedFrom("Base1"), isDerivedFrom("Base2")))))));
+  EXPECT_TRUE(matches("namespace ns { class X {}; class Y : public X {}; }",
+                      cxxRecordDecl(isDerivedFrom("::ns::X"))));
+  EXPECT_TRUE(notMatches("class X {}; class Y : public X {};",
+                         cxxRecordDecl(isDerivedFrom("::ns::X"))));
 
   EXPECT_TRUE(matches(
-    "class X {}; class Y : public X {};",
-    cxxRecordDecl(isDerivedFrom(recordDecl(hasName("X")).bind("test")))));
+      "class X {}; class Y : public X {};",
+      cxxRecordDecl(isDerivedFrom(recordDecl(hasName("X")).bind("test")))));
 
-  EXPECT_TRUE(matches(
-    "template class X {};"
-      "template using Z = X;"
-      "template  class Y : Z {};",
-    cxxRecordDecl(isDerivedFrom(namedDecl(hasName("X"))))));
+  EXPECT_TRUE(matches("template class X {};"
+                      "template using Z = X;"
+                      "template  class Y : Z {};",
+                      cxxRecordDecl(isDerivedFrom(namedDecl(hasName("X"))))));
 }
 
 TEST(DeclarationMatcher, IsDerivedFromEmptyName) {
@@ -737,24 +692,24 @@ TEST(DeclarationMatcher, ObjCClassIsDerived) {
 
   DeclarationMatcher IsDirectlyDerivedFromX =
       objcInterfaceDecl(isDirectlyDerivedFrom("X"));
-  EXPECT_TRUE(
-      matchesObjC("@interface X @end @interface Y : X @end", IsDirectlyDerivedFromX));
+  EXPECT_TRUE(matchesObjC("@interface X @end @interface Y : X @end",
+                          IsDirectlyDerivedFromX));
   EXPECT_TRUE(matchesObjC(
       "@interface X @end @interface Y<__covariant ObjectType> : X @end",
       IsDirectlyDerivedFromX));
   EXPECT_TRUE(matchesObjC(
       "@interface X @end @compatibility_alias Y X; @interface Z : Y @end",
       IsDirectlyDerivedFromX));
-  EXPECT_TRUE(matchesObjC(
-      "@interface X @end typedef X Y; @interface Z : Y @end",
-      IsDirectlyDerivedFromX));
+  EXPECT_TRUE(
+      matchesObjC("@interface X @end typedef X Y; @interface Z : Y @end",
+                  IsDirectlyDerivedFromX));
   EXPECT_TRUE(notMatchesObjC("@interface X @end", IsDirectlyDerivedFromX));
   EXPECT_TRUE(notMatchesObjC("@class X;", IsDirectlyDerivedFromX));
   EXPECT_TRUE(notMatchesObjC("@class Y;", IsDirectlyDerivedFromX));
   EXPECT_TRUE(notMatchesObjC("@interface X @end @compatibility_alias Y X;",
                              IsDirectlyDerivedFromX));
-  EXPECT_TRUE(notMatchesObjC("@interface X @end typedef X Y;",
-                             IsDirectlyDerivedFromX));
+  EXPECT_TRUE(
+      notMatchesObjC("@interface X @end typedef X Y;", IsDirectlyDerivedFromX));
 
   DeclarationMatcher IsAX = objcInterfaceDecl(isSameOrDerivedFrom("X"));
   EXPECT_TRUE(matchesObjC("@interface X @end @interface Y : X @end", IsAX));
@@ -775,9 +730,9 @@ TEST(DeclarationMatcher, ObjCClassIsDerived) {
                           ZIsDerivedFromX));
   EXPECT_TRUE(matchesObjC(
       "@interface X @end typedef X Y; @interface Z : Y @end", ZIsDerivedFromX));
-  EXPECT_TRUE(matchesObjC(
-      "@interface X @end typedef X Y; @interface Z : Y @end",
-      ZIsDirectlyDerivedFromX));
+  EXPECT_TRUE(
+      matchesObjC("@interface X @end typedef X Y; @interface Z : Y @end",
+                  ZIsDirectlyDerivedFromX));
   EXPECT_TRUE(matchesObjC(
       "@interface A @end typedef A X; typedef A Y; @interface Z : Y @end",
       ZIsDerivedFromX));
@@ -798,27 +753,33 @@ TEST(DeclarationMatcher, ObjCClassIsDerived) {
       ZIsDirectlyDerivedFromX));
   EXPECT_TRUE(matchesObjC(
       "@interface A @end @compatibility_alias X A; @compatibility_alias Y A;"
-      "@interface Z : Y @end", ZIsDerivedFromX));
+      "@interface Z : Y @end",
+      ZIsDerivedFromX));
   EXPECT_TRUE(matchesObjC(
       "@interface A @end @compatibility_alias X A; @compatibility_alias Y A;"
-      "@interface Z : Y @end", ZIsDirectlyDerivedFromX));
-  EXPECT_TRUE(matchesObjC(
-      "@interface Y @end typedef Y X; @interface Z : X @end", ZIsDerivedFromX));
-  EXPECT_TRUE(matchesObjC(
-      "@interface Y @end typedef Y X; @interface Z : X @end",
+      "@interface Z : Y @end",
       ZIsDirectlyDerivedFromX));
   EXPECT_TRUE(matchesObjC(
-      "@interface A @end @compatibility_alias Y A; typedef Y X;"
-      "@interface Z : A @end", ZIsDerivedFromX));
-  EXPECT_TRUE(matchesObjC(
-      "@interface A @end @compatibility_alias Y A; typedef Y X;"
-      "@interface Z : A @end", ZIsDirectlyDerivedFromX));
-  EXPECT_TRUE(matchesObjC(
-      "@interface A @end typedef A Y; @compatibility_alias X Y;"
-      "@interface Z : A @end", ZIsDerivedFromX));
-  EXPECT_TRUE(matchesObjC(
-      "@interface A @end typedef A Y; @compatibility_alias X Y;"
-      "@interface Z : A @end", ZIsDirectlyDerivedFromX));
+      "@interface Y @end typedef Y X; @interface Z : X @end", ZIsDerivedFromX));
+  EXPECT_TRUE(
+      matchesObjC("@interface Y @end typedef Y X; @interface Z : X @end",
+                  ZIsDirectlyDerivedFromX));
+  EXPECT_TRUE(
+      matchesObjC("@interface A @end @compatibility_alias Y A; typedef Y X;"
+                  "@interface Z : A @end",
+                  ZIsDerivedFromX));
+  EXPECT_TRUE(
+      matchesObjC("@interface A @end @compatibility_alias Y A; typedef Y X;"
+                  "@interface Z : A @end",
+                  ZIsDirectlyDerivedFromX));
+  EXPECT_TRUE(
+      matchesObjC("@interface A @end typedef A Y; @compatibility_alias X Y;"
+                  "@interface Z : A @end",
+                  ZIsDerivedFromX));
+  EXPECT_TRUE(
+      matchesObjC("@interface A @end typedef A Y; @compatibility_alias X Y;"
+                  "@interface Z : A @end",
+                  ZIsDirectlyDerivedFromX));
 }
 
 TEST(DeclarationMatcher, IsLambda) {
@@ -830,42 +791,41 @@ TEST(DeclarationMatcher, IsLambda) {
 TEST(Matcher, BindMatchedNodes) {
   DeclarationMatcher ClassX = has(recordDecl(hasName("::X")).bind("x"));
 
-  EXPECT_TRUE(matchAndVerifyResultTrue("class X {};",
-                                       ClassX, std::make_unique<VerifyIdIsBoundTo<CXXRecordDecl>>("x")));
+  EXPECT_TRUE(matchAndVerifyResultTrue(
+      "class X {};", ClassX,
+      std::make_unique<VerifyIdIsBoundTo<CXXRecordDecl>>("x")));
 
-  EXPECT_TRUE(matchAndVerifyResultFalse("class X {};",
-                                        ClassX, std::make_unique<VerifyIdIsBoundTo<CXXRecordDecl>>("other-id")));
+  EXPECT_TRUE(matchAndVerifyResultFalse(
+      "class X {};", ClassX,
+      std::make_unique<VerifyIdIsBoundTo<CXXRecordDecl>>("other-id")));
 
   TypeMatcher TypeAHasClassB = hasDeclaration(
-    recordDecl(hasName("A"), has(recordDecl(hasName("B")).bind("b"))));
+      recordDecl(hasName("A"), has(recordDecl(hasName("B")).bind("b"))));
 
-  EXPECT_TRUE(matchAndVerifyResultTrue("class A { public: A *a; class B {}; };",
-                                       TypeAHasClassB,
-                                       std::make_unique<VerifyIdIsBoundTo<Decl>>("b")));
+  EXPECT_TRUE(matchAndVerifyResultTrue(
+      "class A { public: A *a; class B {}; };", TypeAHasClassB,
+      std::make_unique<VerifyIdIsBoundTo<Decl>>("b")));
 
   StatementMatcher MethodX =
-    callExpr(callee(cxxMethodDecl(hasName("x")))).bind("x");
+      callExpr(callee(cxxMethodDecl(hasName("x")))).bind("x");
 
-  EXPECT_TRUE(matchAndVerifyResultTrue("class A { void x() { x(); } };",
-                                       MethodX,
-                                       std::make_unique<VerifyIdIsBoundTo<CXXMemberCallExpr>>("x")));
+  EXPECT_TRUE(matchAndVerifyResultTrue(
+      "class A { void x() { x(); } };", MethodX,
+      std::make_unique<VerifyIdIsBoundTo<CXXMemberCallExpr>>("x")));
 }
 
 TEST(Matcher, BindTheSameNameInAlternatives) {
   StatementMatcher matcher = anyOf(
-    binaryOperator(hasOperatorName("+"),
-                   hasLHS(expr().bind("x")),
-                   hasRHS(integerLiteral(equals(0)))),
-    binaryOperator(hasOperatorName("+"),
-                   hasLHS(integerLiteral(equals(0))),
-                   hasRHS(expr().bind("x"))));
+      binaryOperator(hasOperatorName("+"), hasLHS(expr().bind("x")),
+                     hasRHS(integerLiteral(equals(0)))),
+      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))),
+                     hasRHS(expr().bind("x"))));
 
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    // The first branch of the matcher binds x to 0 but then fails.
-    // The second branch binds x to f() and succeeds.
-    "int f() { return 0 + f(); }",
-    matcher,
-    std::make_unique<VerifyIdIsBoundTo<CallExpr>>("x")));
+      // The first branch of the matcher binds x to 0 but then fails.
+      // The second branch binds x to f() and succeeds.
+      "int f() { return 0 + f(); }", matcher,
+      std::make_unique<VerifyIdIsBoundTo<CallExpr>>("x")));
 }
 
 TEST(Matcher, BindsIDForMemoizedResults) {
@@ -873,48 +833,48 @@ TEST(Matcher, BindsIDForMemoizedResults) {
   // kick in.
   DeclarationMatcher ClassX = recordDecl(hasName("X")).bind("x");
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "class A { class B { class X {}; }; };",
-    DeclarationMatcher(anyOf(
-      recordDecl(hasName("A"), hasDescendant(ClassX)),
-      recordDecl(hasName("B"), hasDescendant(ClassX)))),
-    std::make_unique<VerifyIdIsBoundTo<Decl>>("x", 2)));
+      "class A { class B { class X {}; }; };",
+      DeclarationMatcher(
+          anyOf(recordDecl(hasName("A"), hasDescendant(ClassX)),
+                recordDecl(hasName("B"), hasDescendant(ClassX)))),
+      std::make_unique<VerifyIdIsBoundTo<Decl>>("x", 2)));
 }
 
 TEST(HasType, MatchesAsString) {
   EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z() {Y* y; y->x(); }",
-            cxxMemberCallExpr(on(hasType(asString("class Y *"))))));
+      matches("class Y { public: void x(); }; void z() {Y* y; y->x(); }",
+              cxxMemberCallExpr(on(hasType(asString("class Y *"))))));
   EXPECT_TRUE(
-    matches("class X { void x(int x) {} };",
-            cxxMethodDecl(hasParameter(0, hasType(asString("int"))))));
+      matches("class X { void x(int x) {} };",
+              cxxMethodDecl(hasParameter(0, hasType(asString("int"))))));
   EXPECT_TRUE(matches("namespace ns { struct A {}; }  struct B { ns::A a; };",
                       fieldDecl(hasType(asString("ns::A")))));
-  EXPECT_TRUE(matches("namespace { struct A {}; }  struct B { A a; };",
-                      fieldDecl(hasType(asString("struct (anonymous namespace)::A")))));
+  EXPECT_TRUE(
+      matches("namespace { struct A {}; }  struct B { A a; };",
+              fieldDecl(hasType(asString("struct (anonymous namespace)::A")))));
 }
 
 TEST(Matcher, HasOperatorNameForOverloadedOperatorCall) {
   StatementMatcher OpCallAndAnd =
-    cxxOperatorCallExpr(hasOverloadedOperatorName("&&"));
+      cxxOperatorCallExpr(hasOverloadedOperatorName("&&"));
   EXPECT_TRUE(matches("class Y { }; "
-                        "bool operator&&(Y x, Y y) { return true; }; "
-                        "Y a; Y b; bool c = a && b;", OpCallAndAnd));
+                      "bool operator&&(Y x, Y y) { return true; }; "
+                      "Y a; Y b; bool c = a && b;",
+                      OpCallAndAnd));
   StatementMatcher OpCallLessLess =
-    cxxOperatorCallExpr(hasOverloadedOperatorName("<<"));
+      cxxOperatorCallExpr(hasOverloadedOperatorName("<<"));
   EXPECT_TRUE(notMatches("class Y { }; "
-                           "bool operator&&(Y x, Y y) { return true; }; "
-                           "Y a; Y b; bool c = a && b;",
+                         "bool operator&&(Y x, Y y) { return true; }; "
+                         "Y a; Y b; bool c = a && b;",
                          OpCallLessLess));
   StatementMatcher OpStarCall =
-    cxxOperatorCallExpr(hasOverloadedOperatorName("*"));
-  EXPECT_TRUE(matches("class Y; int operator*(Y &); void f(Y &y) { *y; }",
-                      OpStarCall));
+      cxxOperatorCallExpr(hasOverloadedOperatorName("*"));
+  EXPECT_TRUE(
+      matches("class Y; int operator*(Y &); void f(Y &y) { *y; }", OpStarCall));
   DeclarationMatcher ClassWithOpStar =
-    cxxRecordDecl(hasMethod(hasOverloadedOperatorName("*")));
-  EXPECT_TRUE(matches("class Y { int operator*(); };",
-                      ClassWithOpStar));
-  EXPECT_TRUE(notMatches("class Y { void myOperator(); };",
-                         ClassWithOpStar)) ;
+      cxxRecordDecl(hasMethod(hasOverloadedOperatorName("*")));
+  EXPECT_TRUE(matches("class Y { int operator*(); };", ClassWithOpStar));
+  EXPECT_TRUE(notMatches("class Y { void myOperator(); };", ClassWithOpStar));
   DeclarationMatcher AnyOpStar = functionDecl(hasOverloadedOperatorName("*"));
   EXPECT_TRUE(matches("class Y; int operator*(Y &);", AnyOpStar));
   EXPECT_TRUE(matches("class Y { int operator*(); };", AnyOpStar));
@@ -926,23 +886,22 @@ TEST(Matcher, HasOperatorNameForOverloadedOperatorCall) {
   EXPECT_TRUE(matches("class Y { Y operator&&(Y &); };", AnyAndOp));
 }
 
-
 TEST(Matcher, NestedOverloadedOperatorCalls) {
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "class Y { }; "
+      "class Y { }; "
       "Y& operator&&(Y& x, Y& y) { return x; }; "
       "Y a; Y b; Y c; Y d = a && b && c;",
-    cxxOperatorCallExpr(hasOverloadedOperatorName("&&")).bind("x"),
-    std::make_unique<VerifyIdIsBoundTo<CXXOperatorCallExpr>>("x", 2)));
+      cxxOperatorCallExpr(hasOverloadedOperatorName("&&")).bind("x"),
+      std::make_unique<VerifyIdIsBoundTo<CXXOperatorCallExpr>>("x", 2)));
   EXPECT_TRUE(matches("class Y { }; "
-                        "Y& operator&&(Y& x, Y& y) { return x; }; "
-                        "Y a; Y b; Y c; Y d = a && b && c;",
+                      "Y& operator&&(Y& x, Y& y) { return x; }; "
+                      "Y a; Y b; Y c; Y d = a && b && c;",
                       cxxOperatorCallExpr(hasParent(cxxOperatorCallExpr()))));
   EXPECT_TRUE(
-    matches("class Y { }; "
+      matches("class Y { }; "
               "Y& operator&&(Y& x, Y& y) { return x; }; "
               "Y a; Y b; Y c; Y d = a && b && c;",
-            cxxOperatorCallExpr(hasDescendant(cxxOperatorCallExpr()))));
+              cxxOperatorCallExpr(hasDescendant(cxxOperatorCallExpr()))));
 }
 
 TEST(Matcher, VarDecl_Storage) {
@@ -971,9 +930,9 @@ TEST(Matcher, VarDecl_StorageDuration) {
 
   EXPECT_TRUE(matches(T, varDecl(hasName("x"), hasAutomaticStorageDuration())));
   EXPECT_TRUE(
-    notMatches(T, varDecl(hasName("y"), hasAutomaticStorageDuration())));
+      notMatches(T, varDecl(hasName("y"), hasAutomaticStorageDuration())));
   EXPECT_TRUE(
-    notMatches(T, varDecl(hasName("a"), hasAutomaticStorageDuration())));
+      notMatches(T, varDecl(hasName("a"), hasAutomaticStorageDuration())));
 
   EXPECT_TRUE(matches(T, varDecl(hasName("y"), hasStaticStorageDuration())));
   EXPECT_TRUE(matches(T, varDecl(hasName("a"), hasStaticStorageDuration())));
@@ -991,48 +950,48 @@ TEST(Matcher, VarDecl_StorageDuration) {
 }
 
 TEST(Matcher, FindsVarDeclInFunctionParameter) {
-  EXPECT_TRUE(matches(
-    "void f(int i) {}",
-    varDecl(hasName("i"))));
+  EXPECT_TRUE(matches("void f(int i) {}", varDecl(hasName("i"))));
 }
 
 TEST(UnaryExpressionOrTypeTraitExpression, MatchesCorrectType) {
-  EXPECT_TRUE(matches("void x() { int a = sizeof(a); }", sizeOfExpr(
-    hasArgumentOfType(asString("int")))));
-  EXPECT_TRUE(notMatches("void x() { int a = sizeof(a); }", sizeOfExpr(
-    hasArgumentOfType(asString("float")))));
+  EXPECT_TRUE(matches("void x() { int a = sizeof(a); }",
+                      sizeOfExpr(hasArgumentOfType(asString("int")))));
+  EXPECT_TRUE(notMatches("void x() { int a = sizeof(a); }",
+                         sizeOfExpr(hasArgumentOfType(asString("float")))));
   EXPECT_TRUE(matches(
-    "struct A {}; void x() { A a; int b = sizeof(a); }",
-    sizeOfExpr(hasArgumentOfType(hasDeclaration(recordDecl(hasName("A")))))));
-  EXPECT_TRUE(notMatches("void x() { int a = sizeof(a); }", sizeOfExpr(
-    hasArgumentOfType(hasDeclaration(recordDecl(hasName("string")))))));
+      "struct A {}; void x() { A a; int b = sizeof(a); }",
+      sizeOfExpr(hasArgumentOfType(hasDeclaration(recordDecl(hasName("A")))))));
+  EXPECT_TRUE(notMatches("void x() { int a = sizeof(a); }",
+                         sizeOfExpr(hasArgumentOfType(
+                             hasDeclaration(recordDecl(hasName("string")))))));
 }
 
 TEST(IsInteger, MatchesIntegers) {
   EXPECT_TRUE(matches("int i = 0;", varDecl(hasType(isInteger()))));
-  EXPECT_TRUE(matches(
-    "long long i = 0; void f(long long) { }; void g() {f(i);}",
-    callExpr(hasArgument(0, declRefExpr(
-      to(varDecl(hasType(isInteger()))))))));
+  EXPECT_TRUE(
+      matches("long long i = 0; void f(long long) { }; void g() {f(i);}",
+              callExpr(hasArgument(
+                  0, declRefExpr(to(varDecl(hasType(isInteger()))))))));
 }
 
 TEST(IsInteger, ReportsNoFalsePositives) {
   EXPECT_TRUE(notMatches("int *i;", varDecl(hasType(isInteger()))));
-  EXPECT_TRUE(notMatches("struct T {}; T t; void f(T *) { }; void g() {f(&t);}",
-                         callExpr(hasArgument(0, declRefExpr(
-                           to(varDecl(hasType(isInteger()))))))));
+  EXPECT_TRUE(
+      notMatches("struct T {}; T t; void f(T *) { }; void g() {f(&t);}",
+                 callExpr(hasArgument(
+                     0, declRefExpr(to(varDecl(hasType(isInteger()))))))));
 }
 
 TEST(IsSignedInteger, MatchesSignedIntegers) {
   EXPECT_TRUE(matches("int i = 0;", varDecl(hasType(isSignedInteger()))));
-  EXPECT_TRUE(notMatches("unsigned i = 0;",
-                         varDecl(hasType(isSignedInteger()))));
+  EXPECT_TRUE(
+      notMatches("unsigned i = 0;", varDecl(hasType(isSignedInteger()))));
 }
 
 TEST(IsUnsignedInteger, MatchesUnsignedIntegers) {
   EXPECT_TRUE(notMatches("int i = 0;", varDecl(hasType(isUnsignedInteger()))));
-  EXPECT_TRUE(matches("unsigned i = 0;",
-                      varDecl(hasType(isUnsignedInteger()))));
+  EXPECT_TRUE(
+      matches("unsigned i = 0;", varDecl(hasType(isUnsignedInteger()))));
 }
 
 TEST(IsAnyPointer, MatchesPointers) {
@@ -1059,8 +1018,8 @@ TEST(IsAnyCharacter, ReportsNoFalsePositives) {
 TEST(IsArrow, MatchesMemberVariablesViaArrow) {
   EXPECT_TRUE(matches("class Y { void x() { this->y; } int y; };",
                       memberExpr(isArrow())));
-  EXPECT_TRUE(matches("class Y { void x() { y; } int y; };",
-                      memberExpr(isArrow())));
+  EXPECT_TRUE(
+      matches("class Y { void x() { y; } int y; };", memberExpr(isArrow())));
   EXPECT_TRUE(notMatches("class Y { void x() { (*this).y; } int y; };",
                          memberExpr(isArrow())));
   EXPECT_TRUE(matches("template <class T> class Y { void x() { this->m; } };",
@@ -1080,10 +1039,9 @@ TEST(IsArrow, MatchesStaticMemberVariablesViaArrow) {
 }
 
 TEST(IsArrow, MatchesMemberCallsViaArrow) {
-  EXPECT_TRUE(matches("class Y { void x() { this->x(); } };",
-                      memberExpr(isArrow())));
-  EXPECT_TRUE(matches("class Y { void x() { x(); } };",
-                      memberExpr(isArrow())));
+  EXPECT_TRUE(
+      matches("class Y { void x() { this->x(); } };", memberExpr(isArrow())));
+  EXPECT_TRUE(matches("class Y { void x() { x(); } };", memberExpr(isArrow())));
   EXPECT_TRUE(notMatches("class Y { void x() { Y y; y.x(); } };",
                          memberExpr(isArrow())));
   EXPECT_TRUE(
@@ -1128,20 +1086,18 @@ TEST(Matcher, ParameterCount) {
 }
 
 TEST(Matcher, References) {
-  DeclarationMatcher ReferenceClassX = varDecl(
-    hasType(references(recordDecl(hasName("X")))));
-  EXPECT_TRUE(matches("class X {}; void y(X y) { X &x = y; }",
-                      ReferenceClassX));
+  DeclarationMatcher ReferenceClassX =
+      varDecl(hasType(references(recordDecl(hasName("X")))));
   EXPECT_TRUE(
-    matches("class X {}; void y(X y) { const X &x = y; }", ReferenceClassX));
+      matches("class X {}; void y(X y) { X &x = y; }", ReferenceClassX));
+  EXPECT_TRUE(
+      matches("class X {}; void y(X y) { const X &x = y; }", ReferenceClassX));
   // The match here is on the implicit copy constructor code for
   // class X, not on code 'X x = y'.
+  EXPECT_TRUE(matches("class X {}; void y(X y) { X x = y; }", ReferenceClassX));
+  EXPECT_TRUE(notMatches("class X {}; extern X x;", ReferenceClassX));
   EXPECT_TRUE(
-    matches("class X {}; void y(X y) { X x = y; }", ReferenceClassX));
-  EXPECT_TRUE(
-    notMatches("class X {}; extern X x;", ReferenceClassX));
-  EXPECT_TRUE(
-    notMatches("class X {}; void y(X *y) { X *&x = y; }", ReferenceClassX));
+      notMatches("class X {}; void y(X *y) { X *&x = y; }", ReferenceClassX));
 }
 
 TEST(QualType, hasLocalQualifiers) {
@@ -1149,16 +1105,15 @@ TEST(QualType, hasLocalQualifiers) {
                          varDecl(hasType(hasLocalQualifiers()))));
   EXPECT_TRUE(matches("int *const j = nullptr;",
                       varDecl(hasType(hasLocalQualifiers()))));
-  EXPECT_TRUE(matches("int *volatile k;",
-                      varDecl(hasType(hasLocalQualifiers()))));
-  EXPECT_TRUE(notMatches("int m;",
-                         varDecl(hasType(hasLocalQualifiers()))));
+  EXPECT_TRUE(
+      matches("int *volatile k;", varDecl(hasType(hasLocalQualifiers()))));
+  EXPECT_TRUE(notMatches("int m;", varDecl(hasType(hasLocalQualifiers()))));
 }
 
 TEST(IsExternC, MatchesExternCFunctionDeclarations) {
   EXPECT_TRUE(matches("extern \"C\" void f() {}", functionDecl(isExternC())));
-  EXPECT_TRUE(matches("extern \"C\" { void f() {} }",
-                      functionDecl(isExternC())));
+  EXPECT_TRUE(
+      matches("extern \"C\" { void f() {} }", functionDecl(isExternC())));
   EXPECT_TRUE(notMatches("void f() {}", functionDecl(isExternC())));
 }
 
@@ -1186,7 +1141,7 @@ TEST(IsDefaulted, MatchesDefaultedFunctionDeclarations) {
 
 TEST(IsDeleted, MatchesDeletedFunctionDeclarations) {
   EXPECT_TRUE(
-    notMatches("void Func();", functionDecl(hasName("Func"), isDeleted())));
+      notMatches("void Func();", functionDecl(hasName("Func"), isDeleted())));
   EXPECT_TRUE(matches("void Func() = delete;",
                       functionDecl(hasName("Func"), isDeleted())));
 }
@@ -1195,14 +1150,15 @@ TEST(IsNoThrow, MatchesNoThrowFunctionDeclarations) {
   EXPECT_TRUE(notMatches("void f();", functionDecl(isNoThrow())));
   EXPECT_TRUE(notMatches("void f() throw(int);", functionDecl(isNoThrow())));
   EXPECT_TRUE(
-    notMatches("void f() noexcept(false);", functionDecl(isNoThrow())));
+      notMatches("void f() noexcept(false);", functionDecl(isNoThrow())));
   EXPECT_TRUE(matches("void f() throw();", functionDecl(isNoThrow())));
   EXPECT_TRUE(matches("void f() noexcept;", functionDecl(isNoThrow())));
 
   EXPECT_TRUE(notMatches("void f();", functionProtoType(isNoThrow())));
-  EXPECT_TRUE(notMatches("void f() throw(int);", functionProtoType(isNoThrow())));
   EXPECT_TRUE(
-    notMatches("void f() noexcept(false);", functionProtoType(isNoThrow())));
+      notMatches("void f() throw(int);", functionProtoType(isNoThrow())));
+  EXPECT_TRUE(
+      notMatches("void f() noexcept(false);", functionProtoType(isNoThrow())));
   EXPECT_TRUE(matches("void f() throw();", functionProtoType(isNoThrow())));
   EXPECT_TRUE(matches("void f() noexcept;", functionProtoType(isNoThrow())));
 }
@@ -1249,41 +1205,41 @@ TEST(hasInitStatement, MatchesRangeForInitializers) {
 
 TEST(TemplateArgumentCountIs, Matches) {
   EXPECT_TRUE(
-    matches("template<typename T> struct C {}; C<int> c;",
-            classTemplateSpecializationDecl(templateArgumentCountIs(1))));
+      matches("template<typename T> struct C {}; C<int> c;",
+              classTemplateSpecializationDecl(templateArgumentCountIs(1))));
   EXPECT_TRUE(
-    notMatches("template<typename T> struct C {}; C<int> c;",
-               classTemplateSpecializationDecl(templateArgumentCountIs(2))));
+      notMatches("template<typename T> struct C {}; C<int> c;",
+                 classTemplateSpecializationDecl(templateArgumentCountIs(2))));
+                 classTemplateSpecializationDecl(templateArgumentCountIs(2))));
 
   EXPECT_TRUE(matches("template<typename T> struct C {}; C<int> c;",
                       templateSpecializationType(templateArgumentCountIs(1))));
   EXPECT_TRUE(
-    notMatches("template<typename T> struct C {}; C<int> c;",
-               templateSpecializationType(templateArgumentCountIs(2))));
+      notMatches("template<typename T> struct C {}; C<int> c;",
+                 templateSpecializationType(templateArgumentCountIs(2))));
 }
 
 TEST(IsIntegral, Matches) {
-  EXPECT_TRUE(matches("template<int T> struct C {}; C<42> c;",
-                      classTemplateSpecializationDecl(
-                        hasAnyTemplateArgument(isIntegral()))));
+  EXPECT_TRUE(matches(
+      "template<int T> struct C {}; C<42> c;",
+      classTemplateSpecializationDecl(hasAnyTemplateArgument(isIntegral()))));
   EXPECT_TRUE(notMatches("template<typename T> struct C {}; C<int> c;",
                          classTemplateSpecializationDecl(hasAnyTemplateArgument(
-                           templateArgument(isIntegral())))));
+                             templateArgument(isIntegral())))));
 }
 
 TEST(EqualsIntegralValue, Matches) {
   EXPECT_TRUE(matches("template<int T> struct C {}; C<42> c;",
                       classTemplateSpecializationDecl(
-                        hasAnyTemplateArgument(equalsIntegralValue("42")))));
+                          hasAnyTemplateArgument(equalsIntegralValue("42")))));
   EXPECT_TRUE(matches("template<int T> struct C {}; C<-42> c;",
                       classTemplateSpecializationDecl(
-                        hasAnyTemplateArgument(equalsIntegralValue("-42")))));
+                          hasAnyTemplateArgument(equalsIntegralValue("-42")))));
   EXPECT_TRUE(matches("template<int T> struct C {}; C<-0042> c;",
                       classTemplateSpecializationDecl(
-                        hasAnyTemplateArgument(equalsIntegralValue("-34")))));
+                          hasAnyTemplateArgument(equalsIntegralValue("-34")))));
   EXPECT_TRUE(notMatches("template<int T> struct C {}; C<42> c;",
                          classTemplateSpecializationDecl(hasAnyTemplateArgument(
-                           equalsIntegralValue("0042")))));
+                             equalsIntegralValue("0042")))));
 }
 
 TEST(Matcher, MatchesAccessSpecDecls) {
@@ -1304,7 +1260,7 @@ TEST(Matcher, MatchesFinal) {
                       cxxMethodDecl(isFinal())));
   EXPECT_TRUE(notMatches("class X {};", cxxRecordDecl(isFinal())));
   EXPECT_TRUE(
-    notMatches("class X { virtual void f(); };", cxxMethodDecl(isFinal())));
+      notMatches("class X { virtual void f(); };", cxxMethodDecl(isFinal())));
 }
 
 TEST(Matcher, MatchesVirtualMethod) {
@@ -1315,12 +1271,12 @@ TEST(Matcher, MatchesVirtualMethod) {
 
 TEST(Matcher, MatchesVirtualAsWrittenMethod) {
   EXPECT_TRUE(matches("class A { virtual int f(); };"
-                        "class B : public A { int f(); };",
+                      "class B : public A { int f(); };",
                       cxxMethodDecl(isVirtualAsWritten(), hasName("::A::f"))));
   EXPECT_TRUE(
-    notMatches("class A { virtual int f(); };"
+      notMatches("class A { virtual int f(); };"
                  "class B : public A { int f(); };",
-               cxxMethodDecl(isVirtualAsWritten(), hasName("::B::f"))));
+                 cxxMethodDecl(isVirtualAsWritten(), hasName("::B::f"))));
 }
 
 TEST(Matcher, MatchesPureMethod) {
@@ -1358,26 +1314,26 @@ TEST(Matcher, MatchesMoveAssignmentOperator) {
 
 TEST(Matcher, MatchesConstMethod) {
   EXPECT_TRUE(
-    matches("struct A { void foo() const; };", cxxMethodDecl(isConst())));
+      matches("struct A { void foo() const; };", cxxMethodDecl(isConst())));
   EXPECT_TRUE(
-    notMatches("struct A { void foo(); };", cxxMethodDecl(isConst())));
+      notMatches("struct A { void foo(); };", cxxMethodDecl(isConst())));
 }
 
 TEST(Matcher, MatchesOverridingMethod) {
   EXPECT_TRUE(matches("class X { virtual int f(); }; "
-                        "class Y : public X { int f(); };",
+                      "class Y : public X { int f(); };",
                       cxxMethodDecl(isOverride(), hasName("::Y::f"))));
   EXPECT_TRUE(notMatches("class X { virtual int f(); }; "
-                           "class Y : public X { int f(); };",
+                         "class Y : public X { int f(); };",
                          cxxMethodDecl(isOverride(), hasName("::X::f"))));
   EXPECT_TRUE(notMatches("class X { int f(); }; "
-                           "class Y : public X { int f(); };",
+                         "class Y : public X { int f(); };",
                          cxxMethodDecl(isOverride())));
   EXPECT_TRUE(notMatches("class X { int f(); int f(int); }; ",
                          cxxMethodDecl(isOverride())));
   EXPECT_TRUE(
-    matches("template <typename Base> struct Y : Base { void f() override;};",
-            cxxMethodDecl(isOverride(), hasName("::Y::f"))));
+      matches("template <typename Base> struct Y : Base { void f() override;};",
+              cxxMethodDecl(isOverride(), hasName("::Y::f"))));
 }
 
 TEST(Matcher, ConstructorArgument) {
@@ -1385,44 +1341,38 @@ TEST(Matcher, ConstructorArgument) {
       ast_type_traits::TK_AsIs,
       cxxConstructExpr(hasArgument(0, declRefExpr(to(varDecl(hasName("y")))))));
 
+  EXPECT_TRUE(matches(
+      "class X { public: X(int); }; void x() { int y; X x(y); }", Constructor));
   EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { int y; X x(y); }",
-            Constructor));
-  EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { int y; X x = X(y); }",
-            Constructor));
-  EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { int y; X x = y; }",
-            Constructor));
+      matches("class X { public: X(int); }; void x() { int y; X x = X(y); }",
+              Constructor));
   EXPECT_TRUE(
-    notMatches("class X { public: X(int); }; void x() { int z; X x(z); }",
-               Constructor));
+      matches("class X { public: X(int); }; void x() { int y; X x = y; }",
+              Constructor));
+  EXPECT_TRUE(notMatches(
+      "class X { public: X(int); }; void x() { int z; X x(z); }", Constructor));
 
   StatementMatcher WrongIndex =
       traverse(ast_type_traits::TK_AsIs,
                cxxConstructExpr(
                    hasArgument(42, declRefExpr(to(varDecl(hasName("y")))))));
-  EXPECT_TRUE(
-    notMatches("class X { public: X(int); }; void x() { int y; X x(y); }",
-               WrongIndex));
+  EXPECT_TRUE(notMatches(
+      "class X { public: X(int); }; void x() { int y; X x(y); }", WrongIndex));
 }
 
 TEST(Matcher, ConstructorArgumentCount) {
   auto Constructor1Arg =
       traverse(ast_type_traits::TK_AsIs, cxxConstructExpr(argumentCountIs(1)));
 
+  EXPECT_TRUE(matches("class X { public: X(int); }; void x() { X x(0); }",
+                      Constructor1Arg));
+  EXPECT_TRUE(matches("class X { public: X(int); }; void x() { X x = X(0); }",
+                      Constructor1Arg));
+  EXPECT_TRUE(matches("class X { public: X(int); }; void x() { X x = 0; }",
+                      Constructor1Arg));
   EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { X x(0); }",
-            Constructor1Arg));
-  EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { X x = X(0); }",
-            Constructor1Arg));
-  EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { X x = 0; }",
-            Constructor1Arg));
-  EXPECT_TRUE(
-    notMatches("class X { public: X(int, int); }; void x() { X x(0, 0); }",
-               Constructor1Arg));
+      notMatches("class X { public: X(int, int); }; void x() { X x(0, 0); }",
+                 Constructor1Arg));
 }
 
 TEST(Matcher, ConstructorListInitialization) {
@@ -1430,19 +1380,16 @@ TEST(Matcher, ConstructorListInitialization) {
       traverse(ast_type_traits::TK_AsIs,
                varDecl(has(cxxConstructExpr(isListInitialization()))));
 
-  EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { X x{0}; }",
-            ConstructorListInit));
-  EXPECT_FALSE(
-    matches("class X { public: X(int); }; void x() { X x(0); }",
-            ConstructorListInit));
+  EXPECT_TRUE(matches("class X { public: X(int); }; void x() { X x{0}; }",
+                      ConstructorListInit));
+  EXPECT_FALSE(matches("class X { public: X(int); }; void x() { X x(0); }",
+                       ConstructorListInit));
 }
 
 TEST(ConstructorDeclaration, IsImplicit) {
   // This one doesn't match because the constructor is not added by the
   // compiler (it is not needed).
-  EXPECT_TRUE(notMatches("class Foo { };",
-                         cxxConstructorDecl(isImplicit())));
+  EXPECT_TRUE(notMatches("class Foo { };", cxxConstructorDecl(isImplicit())));
   // The compiler added the implicit default constructor.
   EXPECT_TRUE(matches("class Foo { }; Foo* f = new Foo();",
                       cxxConstructorDecl(isImplicit())));
@@ -1456,8 +1403,8 @@ TEST(ConstructorDeclaration, IsImplicit) {
 TEST(ConstructorDeclaration, IsExplicit) {
   EXPECT_TRUE(matches("struct S { explicit S(int); };",
                       cxxConstructorDecl(isExplicit())));
-  EXPECT_TRUE(notMatches("struct S { S(int); };",
-                         cxxConstructorDecl(isExplicit())));
+  EXPECT_TRUE(
+      notMatches("struct S { S(int); };", cxxConstructorDecl(isExplicit())));
   EXPECT_TRUE(notMatches("template<bool b> struct S { explicit(b) S(int);};",
                          cxxConstructorDecl(isExplicit()), langCxx20OrLater()));
   EXPECT_TRUE(matches("struct S { explicit(true) S(int);};",
@@ -1488,9 +1435,9 @@ TEST(DeductionGuideDeclaration, IsExplicit) {
 }
 
 TEST(ConstructorDeclaration, Kinds) {
-  EXPECT_TRUE(matches(
-      "struct S { S(); };",
-      cxxConstructorDecl(isDefaultConstructor(), unless(isImplicit()))));
+  EXPECT_TRUE(
+      matches("struct S { S(); };", cxxConstructorDecl(isDefaultConstructor(),
+                                                       unless(isImplicit()))));
   EXPECT_TRUE(notMatches(
       "struct S { S(); };",
       cxxConstructorDecl(isCopyConstructor(), unless(isImplicit()))));
@@ -1501,9 +1448,9 @@ TEST(ConstructorDeclaration, Kinds) {
   EXPECT_TRUE(notMatches(
       "struct S { S(const S&); };",
       cxxConstructorDecl(isDefaultConstructor(), unless(isImplicit()))));
-  EXPECT_TRUE(matches(
-      "struct S { S(const S&); };",
-      cxxConstructorDecl(isCopyConstructor(), unless(isImplicit()))));
+  EXPECT_TRUE(
+      matches("struct S { S(const S&); };",
+              cxxConstructorDecl(isCopyConstructor(), unless(isImplicit()))));
   EXPECT_TRUE(notMatches(
       "struct S { S(const S&); };",
       cxxConstructorDecl(isMoveConstructor(), unless(isImplicit()))));
@@ -1514,9 +1461,9 @@ TEST(ConstructorDeclaration, Kinds) {
   EXPECT_TRUE(notMatches(
       "struct S { S(S&&); };",
       cxxConstructorDecl(isCopyConstructor(), unless(isImplicit()))));
-  EXPECT_TRUE(matches(
-      "struct S { S(S&&); };",
-      cxxConstructorDecl(isMoveConstructor(), unless(isImplicit()))));
+  EXPECT_TRUE(
+      matches("struct S { S(S&&); };",
+              cxxConstructorDecl(isMoveConstructor(), unless(isImplicit()))));
 }
 
 TEST(ConstructorDeclaration, IsUserProvided) {
@@ -1527,7 +1474,7 @@ TEST(ConstructorDeclaration, IsUserProvided) {
   EXPECT_TRUE(notMatches("struct S { S() = delete; };",
                          cxxConstructorDecl(isUserProvided())));
   EXPECT_TRUE(
-    matches("struct S { S(); };", cxxConstructorDecl(isUserProvided())));
+      matches("struct S { S(); };", cxxConstructorDecl(isUserProvided())));
   EXPECT_TRUE(matches("struct S { S(); }; S::S(){}",
                       cxxConstructorDecl(isUserProvided())));
 }
@@ -1538,11 +1485,11 @@ TEST(ConstructorDeclaration, IsDelegatingConstructor) {
   EXPECT_TRUE(notMatches("struct S { S(){} S(int X) : X(X) {} int X; };",
                          cxxConstructorDecl(isDelegatingConstructor())));
   EXPECT_TRUE(matches(
-    "struct S { S() : S(0) {} S(int X) : X(X) {} int X; };",
-    cxxConstructorDecl(isDelegatingConstructor(), parameterCountIs(0))));
+      "struct S { S() : S(0) {} S(int X) : X(X) {} int X; };",
+      cxxConstructorDecl(isDelegatingConstructor(), parameterCountIs(0))));
   EXPECT_TRUE(matches(
-    "struct S { S(); S(int X); int X; }; S::S(int X) : S() {}",
-    cxxConstructorDecl(isDelegatingConstructor(), parameterCountIs(1))));
+      "struct S { S(); S(int X); int X; }; S::S(int X) : S() {}",
+      cxxConstructorDecl(isDelegatingConstructor(), parameterCountIs(1))));
 }
 
 TEST(StringLiteral, HasSize) {
@@ -1584,38 +1531,28 @@ TEST(Matcher, HasNameSupportsNamespaces) {
 }
 
 TEST(Matcher, HasNameSupportsOuterClasses) {
-  EXPECT_TRUE(
-    matches("class A { class B { class C; }; };",
-            recordDecl(hasName("A::B::C"))));
-  EXPECT_TRUE(
-    matches("class A { class B { class C; }; };",
-            recordDecl(hasName("::A::B::C"))));
-  EXPECT_TRUE(
-    matches("class A { class B { class C; }; };",
-            recordDecl(hasName("B::C"))));
-  EXPECT_TRUE(
-    matches("class A { class B { class C; }; };",
-            recordDecl(hasName("C"))));
-  EXPECT_TRUE(
-    notMatches("class A { class B { class C; }; };",
-               recordDecl(hasName("c::B::C"))));
-  EXPECT_TRUE(
-    notMatches("class A { class B { class C; }; };",
-               recordDecl(hasName("A::c::C"))));
-  EXPECT_TRUE(
-    notMatches("class A { class B { class C; }; };",
-               recordDecl(hasName("A::B::A"))));
-  EXPECT_TRUE(
-    notMatches("class A { class B { class C; }; };",
-               recordDecl(hasName("::C"))));
-  EXPECT_TRUE(
-    notMatches("class A { class B { class C; }; };",
-               recordDecl(hasName("::B::C"))));
+  EXPECT_TRUE(matches("class A { class B { class C; }; };",
+                      recordDecl(hasName("A::B::C"))));
+  EXPECT_TRUE(matches("class A { class B { class C; }; };",
+                      recordDecl(hasName("::A::B::C"))));
+  EXPECT_TRUE(matches("class A { class B { class C; }; };",
+                      recordDecl(hasName("B::C"))));
+  EXPECT_TRUE(
+      matches("class A { class B { class C; }; };", recordDecl(hasName("C"))));
+  EXPECT_TRUE(notMatches("class A { class B { class C; }; };",
+                         recordDecl(hasName("c::B::C"))));
+  EXPECT_TRUE(notMatches("class A { class B { class C; }; };",
+                         recordDecl(hasName("A::c::C"))));
+  EXPECT_TRUE(notMatches("class A { class B { class C; }; };",
+                         recordDecl(hasName("A::B::A"))));
+  EXPECT_TRUE(notMatches("class A { class B { class C; }; };",
+                         recordDecl(hasName("::C"))));
+  EXPECT_TRUE(notMatches("class A { class B { class C; }; };",
+                         recordDecl(hasName("::B::C"))));
   EXPECT_TRUE(notMatches("class A { class B { class C; }; };",
                          recordDecl(hasName("z::A::B::C"))));
-  EXPECT_TRUE(
-    notMatches("class A { class B { class C; }; };",
-               recordDecl(hasName("A+B::C"))));
+  EXPECT_TRUE(notMatches("class A { class B { class C; }; };",
+                         recordDecl(hasName("A+B::C"))));
 }
 
 TEST(Matcher, HasNameSupportsInlinedNamespaces) {
@@ -1629,10 +1566,10 @@ TEST(Matcher, HasNameSupportsInlinedNamespaces) {
 TEST(Matcher, HasNameSupportsAnonymousNamespaces) {
   StringRef code = "namespace a { namespace { class C; } }";
   EXPECT_TRUE(
-    matches(code, recordDecl(hasName("a::(anonymous namespace)::C"))));
+      matches(code, recordDecl(hasName("a::(anonymous namespace)::C"))));
   EXPECT_TRUE(matches(code, recordDecl(hasName("a::C"))));
   EXPECT_TRUE(
-    matches(code, recordDecl(hasName("::a::(anonymous namespace)::C"))));
+      matches(code, recordDecl(hasName("::a::(anonymous namespace)::C"))));
   EXPECT_TRUE(matches(code, recordDecl(hasName("::a::C"))));
 }
 
@@ -1689,7 +1626,7 @@ TEST(Matcher, HasAnyName) {
 
   EXPECT_TRUE(notMatches(Code, recordDecl(hasAnyName("::C", "::b::C"))));
   EXPECT_TRUE(
-    matches(Code, recordDecl(hasAnyName("::C", "::b::C", "::a::b::C"))));
+      matches(Code, recordDecl(hasAnyName("::C", "::b::C", "::a::b::C"))));
 
   std::vector<StringRef> Names = {"::C", "::b::C", "::a::b::C"};
   EXPECT_TRUE(matches(Code, recordDecl(hasAnyName(Names))));
@@ -1697,27 +1634,27 @@ TEST(Matcher, HasAnyName) {
 
 TEST(Matcher, IsDefinition) {
   DeclarationMatcher DefinitionOfClassA =
-    recordDecl(hasName("A"), isDefinition());
+      recordDecl(hasName("A"), isDefinition());
   EXPECT_TRUE(matches("class A {};", DefinitionOfClassA));
   EXPECT_TRUE(notMatches("class A;", DefinitionOfClassA));
 
   DeclarationMatcher DefinitionOfVariableA =
-    varDecl(hasName("a"), isDefinition());
+      varDecl(hasName("a"), isDefinition());
   EXPECT_TRUE(matches("int a;", DefinitionOfVariableA));
   EXPECT_TRUE(notMatches("extern int a;", DefinitionOfVariableA));
 
   DeclarationMatcher DefinitionOfMethodA =
-    cxxMethodDecl(hasName("a"), isDefinition());
+      cxxMethodDecl(hasName("a"), isDefinition());
   EXPECT_TRUE(matches("class A { void a() {} };", DefinitionOfMethodA));
   EXPECT_TRUE(notMatches("class A { void a(); };", DefinitionOfMethodA));
 
   DeclarationMatcher DefinitionOfObjCMethodA =
-    objcMethodDecl(hasName("a"), isDefinition());
+      objcMethodDecl(hasName("a"), isDefinition());
   EXPECT_TRUE(matchesObjC("@interface A @end "
                           "@implementation A; -(void)a {} @end",
                           DefinitionOfObjCMethodA));
-  EXPECT_TRUE(notMatchesObjC("@interface A; - (void)a; @end",
-                             DefinitionOfObjCMethodA));
+  EXPECT_TRUE(
+      notMatchesObjC("@interface A; - (void)a; @end", DefinitionOfObjCMethodA));
 }
 
 TEST(Matcher, HandlesNullQualTypes) {
@@ -1728,7 +1665,7 @@ TEST(Matcher, HandlesNullQualTypes) {
   // We don't really care whether this matcher succeeds; we're testing that
   // it completes without crashing.
   EXPECT_TRUE(matches(
-    "struct A { };"
+      "struct A { };"
       "template <typename T>"
       "void f(T t) {"
       "  T local_t(t /* this becomes a null QualType in the AST */);"
@@ -1736,13 +1673,10 @@ TEST(Matcher, HandlesNullQualTypes) {
       "void g() {"
       "  f(0);"
       "}",
-    expr(hasType(TypeMatcher(
-      anyOf(
-        TypeMatcher(hasDeclaration(anything())),
-        pointsTo(AnyType),
-        references(AnyType)
-        // Other QualType matchers should go here.
-      ))))));
+      expr(hasType(TypeMatcher(anyOf(TypeMatcher(hasDeclaration(anything())),
+                                     pointsTo(AnyType), references(AnyType)
+                                     // Other QualType matchers should go here.
+                                     ))))));
 }
 
 TEST(ObjCIvarRefExprMatcher, IvarExpr) {
@@ -1750,10 +1684,10 @@ TEST(ObjCIvarRefExprMatcher, IvarExpr) {
       "@interface A @end "
       "@implementation A { A *x; } - (void) func { x = 0; } @end";
   EXPECT_TRUE(matchesObjC(ObjCString, objcIvarRefExpr()));
-  EXPECT_TRUE(matchesObjC(ObjCString, objcIvarRefExpr(
-        hasDeclaration(namedDecl(hasName("x"))))));
-  EXPECT_FALSE(matchesObjC(ObjCString, objcIvarRefExpr(
-        hasDeclaration(namedDecl(hasName("y"))))));
+  EXPECT_TRUE(matchesObjC(
+      ObjCString, objcIvarRefExpr(hasDeclaration(namedDecl(hasName("x"))))));
+  EXPECT_FALSE(matchesObjC(
+      ObjCString, objcIvarRefExpr(hasDeclaration(namedDecl(hasName("y"))))));
 }
 
 TEST(BlockExprMatcher, BlockExpr) {
@@ -1761,24 +1695,19 @@ TEST(BlockExprMatcher, BlockExpr) {
 }
 
 TEST(StatementCountIs, FindsNoStatementsInAnEmptyCompoundStatement) {
-  EXPECT_TRUE(matches("void f() { }",
-                      compoundStmt(statementCountIs(0))));
-  EXPECT_TRUE(notMatches("void f() {}",
-                         compoundStmt(statementCountIs(1))));
+  EXPECT_TRUE(matches("void f() { }", compoundStmt(statementCountIs(0))));
+  EXPECT_TRUE(notMatches("void f() {}", compoundStmt(statementCountIs(1))));
 }
 
 TEST(StatementCountIs, AppearsToMatchOnlyOneCount) {
-  EXPECT_TRUE(matches("void f() { 1; }",
-                      compoundStmt(statementCountIs(1))));
-  EXPECT_TRUE(notMatches("void f() { 1; }",
-                         compoundStmt(statementCountIs(0))));
-  EXPECT_TRUE(notMatches("void f() { 1; }",
-                         compoundStmt(statementCountIs(2))));
+  EXPECT_TRUE(matches("void f() { 1; }", compoundStmt(statementCountIs(1))));
+  EXPECT_TRUE(notMatches("void f() { 1; }", compoundStmt(statementCountIs(0))));
+  EXPECT_TRUE(notMatches("void f() { 1; }", compoundStmt(statementCountIs(2))));
 }
 
 TEST(StatementCountIs, WorksWithMultipleStatements) {
-  EXPECT_TRUE(matches("void f() { 1; 2; 3; }",
-                      compoundStmt(statementCountIs(3))));
+  EXPECT_TRUE(
+      matches("void f() { 1; 2; 3; }", compoundStmt(statementCountIs(3))));
 }
 
 TEST(StatementCountIs, WorksWithNestedCompoundStatements) {
@@ -1806,19 +1735,19 @@ TEST(Member, DoesNotMatchTheBaseExpression) {
 
 TEST(Member, MatchesInMemberFunctionCall) {
   EXPECT_TRUE(matches("void f() {"
-                        "  struct { void first() {}; } s;"
-                        "  s.first();"
-                        "};",
+                      "  struct { void first() {}; } s;"
+                      "  s.first();"
+                      "};",
                       memberExpr(member(hasName("first")))));
 }
 
 TEST(Member, MatchesMember) {
-  EXPECT_TRUE(matches(
-    "struct A { int i; }; void f() { A a; a.i = 2; }",
-    memberExpr(hasDeclaration(fieldDecl(hasType(isInteger()))))));
-  EXPECT_TRUE(notMatches(
-    "struct A { float f; }; void f() { A a; a.f = 2.0f; }",
-    memberExpr(hasDeclaration(fieldDecl(hasType(isInteger()))))));
+  EXPECT_TRUE(
+      matches("struct A { int i; }; void f() { A a; a.i = 2; }",
+              memberExpr(hasDeclaration(fieldDecl(hasType(isInteger()))))));
+  EXPECT_TRUE(
+      notMatches("struct A { float f; }; void f() { A a; a.f = 2.0f; }",
+                 memberExpr(hasDeclaration(fieldDecl(hasType(isInteger()))))));
 }
 
 TEST(Member, BitFields) {
@@ -1841,26 +1770,26 @@ TEST(Member, InClassInitializer) {
 }
 
 TEST(Member, UnderstandsAccess) {
-  EXPECT_TRUE(matches(
-    "struct A { int i; };", fieldDecl(isPublic(), hasName("i"))));
-  EXPECT_TRUE(notMatches(
-    "struct A { int i; };", fieldDecl(isProtected(), hasName("i"))));
-  EXPECT_TRUE(notMatches(
-    "struct A { int i; };", fieldDecl(isPrivate(), hasName("i"))));
+  EXPECT_TRUE(
+      matches("struct A { int i; };", fieldDecl(isPublic(), hasName("i"))));
+  EXPECT_TRUE(notMatches("struct A { int i; };",
+                         fieldDecl(isProtected(), hasName("i"))));
+  EXPECT_TRUE(
+      notMatches("struct A { int i; };", fieldDecl(isPrivate(), hasName("i"))));
 
-  EXPECT_TRUE(notMatches(
-    "class A { int i; };", fieldDecl(isPublic(), hasName("i"))));
-  EXPECT_TRUE(notMatches(
-    "class A { int i; };", fieldDecl(isProtected(), hasName("i"))));
-  EXPECT_TRUE(matches(
-    "class A { int i; };", fieldDecl(isPrivate(), hasName("i"))));
+  EXPECT_TRUE(
+      notMatches("class A { int i; };", fieldDecl(isPublic(), hasName("i"))));
+  EXPECT_TRUE(notMatches("class A { int i; };",
+                         fieldDecl(isProtected(), hasName("i"))));
+  EXPECT_TRUE(
+      matches("class A { int i; };", fieldDecl(isPrivate(), hasName("i"))));
 
-  EXPECT_TRUE(notMatches(
-    "class A { protected: int i; };", fieldDecl(isPublic(), hasName("i"))));
+  EXPECT_TRUE(notMatches("class A { protected: int i; };",
+                         fieldDecl(isPublic(), hasName("i"))));
   EXPECT_TRUE(matches("class A { protected: int i; };",
                       fieldDecl(isProtected(), hasName("i"))));
-  EXPECT_TRUE(notMatches(
-    "class A { protected: int i; };", fieldDecl(isPrivate(), hasName("i"))));
+  EXPECT_TRUE(notMatches("class A { protected: int i; };",
+                         fieldDecl(isPrivate(), hasName("i"))));
 
   // Non-member decls have the AccessSpecifier AS_none and thus aren't matched.
   EXPECT_TRUE(notMatches("int i;", varDecl(isPublic(), hasName("i"))));
@@ -1883,35 +1812,35 @@ TEST(hasDynamicExceptionSpec, MatchesDynamicExceptionSpecifications) {
   EXPECT_TRUE(
       matches("void l() throw(...);", functionDecl(hasDynamicExceptionSpec())));
 
-  EXPECT_TRUE(notMatches("void f();", functionProtoType(hasDynamicExceptionSpec())));
+  EXPECT_TRUE(
+      notMatches("void f();", functionProtoType(hasDynamicExceptionSpec())));
   EXPECT_TRUE(notMatches("void g() noexcept;",
                          functionProtoType(hasDynamicExceptionSpec())));
   EXPECT_TRUE(notMatches("void h() noexcept(true);",
                          functionProtoType(hasDynamicExceptionSpec())));
   EXPECT_TRUE(notMatches("void i() noexcept(false);",
                          functionProtoType(hasDynamicExceptionSpec())));
-  EXPECT_TRUE(
-      matches("void j() throw();", functionProtoType(hasDynamicExceptionSpec())));
-  EXPECT_TRUE(
-      matches("void k() throw(int);", functionProtoType(hasDynamicExceptionSpec())));
-  EXPECT_TRUE(
-      matches("void l() throw(...);", functionProtoType(hasDynamicExceptionSpec())));
+  EXPECT_TRUE(matches("void j() throw();",
+                      functionProtoType(hasDynamicExceptionSpec())));
+  EXPECT_TRUE(matches("void k() throw(int);",
+                      functionProtoType(hasDynamicExceptionSpec())));
+  EXPECT_TRUE(matches("void l() throw(...);",
+                      functionProtoType(hasDynamicExceptionSpec())));
 }
 
 TEST(HasObjectExpression, DoesNotMatchMember) {
   EXPECT_TRUE(notMatches(
-    "class X {}; struct Z { X m; }; void f(Z z) { z.m; }",
-    memberExpr(hasObjectExpression(hasType(recordDecl(hasName("X")))))));
+      "class X {}; struct Z { X m; }; void f(Z z) { z.m; }",
+      memberExpr(hasObjectExpression(hasType(recordDecl(hasName("X")))))));
 }
 
 TEST(HasObjectExpression, MatchesBaseOfVariable) {
   EXPECT_TRUE(matches(
-    "struct X { int m; }; void f(X x) { x.m; }",
-    memberExpr(hasObjectExpression(hasType(recordDecl(hasName("X")))))));
-  EXPECT_TRUE(matches(
-    "struct X { int m; }; void f(X* x) { x->m; }",
-    memberExpr(hasObjectExpression(
-      hasType(pointsTo(recordDecl(hasName("X"))))))));
+      "struct X { int m; }; void f(X x) { x.m; }",
+      memberExpr(hasObjectExpression(hasType(recordDecl(hasName("X")))))));
+  EXPECT_TRUE(matches("struct X { int m; }; void f(X* x) { x->m; }",
+                      memberExpr(hasObjectExpression(
+                          hasType(pointsTo(recordDecl(hasName("X"))))))));
   EXPECT_TRUE(matches("template <class T> struct X { void f() { T t; t.m; } };",
                       cxxDependentScopeMemberExpr(hasObjectExpression(
                           declRefExpr(to(namedDecl(hasName("t"))))))));
@@ -1936,14 +1865,12 @@ TEST(HasObjectExpression, MatchesBaseOfMemberFunc) {
 
 TEST(HasObjectExpression,
      MatchesObjectExpressionOfImplicitlyFormedMemberExpression) {
-  EXPECT_TRUE(matches(
-    "class X {}; struct S { X m; void f() { this->m; } };",
-    memberExpr(hasObjectExpression(
-      hasType(pointsTo(recordDecl(hasName("S"))))))));
-  EXPECT_TRUE(matches(
-    "class X {}; struct S { X m; void f() { m; } };",
-    memberExpr(hasObjectExpression(
-      hasType(pointsTo(recordDecl(hasName("S"))))))));
+  EXPECT_TRUE(matches("class X {}; struct S { X m; void f() { this->m; } };",
+                      memberExpr(hasObjectExpression(
+                          hasType(pointsTo(recordDecl(hasName("S"))))))));
+  EXPECT_TRUE(matches("class X {}; struct S { X m; void f() { m; } };",
+                      memberExpr(hasObjectExpression(
+                          hasType(pointsTo(recordDecl(hasName("S"))))))));
 }
 
 TEST(Field, DoesNotMatchNonFieldMembers) {
@@ -1958,17 +1885,17 @@ TEST(Field, MatchesField) {
 }
 
 TEST(IsVolatileQualified, QualifiersMatch) {
-  EXPECT_TRUE(matches("volatile int i = 42;",
-                      varDecl(hasType(isVolatileQualified()))));
-  EXPECT_TRUE(notMatches("volatile int *i;",
-                         varDecl(hasType(isVolatileQualified()))));
+  EXPECT_TRUE(
+      matches("volatile int i = 42;", varDecl(hasType(isVolatileQualified()))));
+  EXPECT_TRUE(
+      notMatches("volatile int *i;", varDecl(hasType(isVolatileQualified()))));
   EXPECT_TRUE(matches("typedef volatile int v_int; v_int i = 42;",
                       varDecl(hasType(isVolatileQualified()))));
 }
 
 TEST(IsConstQualified, MatchesConstInt) {
-  EXPECT_TRUE(matches("const int i = 42;",
-                      varDecl(hasType(isConstQualified()))));
+  EXPECT_TRUE(
+      matches("const int i = 42;", varDecl(hasType(isConstQualified()))));
 }
 
 TEST(IsConstQualified, MatchesConstPointer) {
@@ -1986,43 +1913,41 @@ TEST(IsConstQualified, MatchesThroughTypedef) {
 TEST(IsConstQualified, DoesNotMatchInappropriately) {
   EXPECT_TRUE(notMatches("typedef int nonconst_int; nonconst_int i = 42;",
                          varDecl(hasType(isConstQualified()))));
-  EXPECT_TRUE(notMatches("int const* p;",
-                         varDecl(hasType(isConstQualified()))));
+  EXPECT_TRUE(
+      notMatches("int const* p;", varDecl(hasType(isConstQualified()))));
 }
 
 TEST(DeclCount, DeclCountIsCorrect) {
-  EXPECT_TRUE(matches("void f() {int i,j;}",
-                      declStmt(declCountIs(2))));
-  EXPECT_TRUE(notMatches("void f() {int i,j; int k;}",
-                         declStmt(declCountIs(3))));
-  EXPECT_TRUE(notMatches("void f() {int i,j, k, l;}",
-                         declStmt(declCountIs(3))));
+  EXPECT_TRUE(matches("void f() {int i,j;}", declStmt(declCountIs(2))));
+  EXPECT_TRUE(
+      notMatches("void f() {int i,j; int k;}", declStmt(declCountIs(3))));
+  EXPECT_TRUE(
+      notMatches("void f() {int i,j, k, l;}", declStmt(declCountIs(3))));
 }
 
-
 TEST(EachOf, TriggersForEachMatch) {
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "class A { int a; int b; };",
-    recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
-                      has(fieldDecl(hasName("b")).bind("v")))),
-    std::make_unique<VerifyIdIsBoundTo<FieldDecl>>("v", 2)));
+      "class A { int a; int b; };",
+      recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
+                        has(fieldDecl(hasName("b")).bind("v")))),
+      std::make_unique<VerifyIdIsBoundTo<FieldDecl>>("v", 2)));
 }
 
 TEST(EachOf, BehavesLikeAnyOfUnlessBothMatch) {
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "class A { int a; int c; };",
-    recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
-                      has(fieldDecl(hasName("b")).bind("v")))),
-    std::make_unique<VerifyIdIsBoundTo<FieldDecl>>("v", 1)));
+      "class A { int a; int c; };",
+      recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
+                        has(fieldDecl(hasName("b")).bind("v")))),
+      std::make_unique<VerifyIdIsBoundTo<FieldDecl>>("v", 1)));
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "class A { int c; int b; };",
-    recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
-                      has(fieldDecl(hasName("b")).bind("v")))),
-    std::make_unique<VerifyIdIsBoundTo<FieldDecl>>("v", 1)));
-  EXPECT_TRUE(notMatches(
-    "class A { int c; int d; };",
-    recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
-                      has(fieldDecl(hasName("b")).bind("v"))))));
+      "class A { int c; int b; };",
+      recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
+                        has(fieldDecl(hasName("b")).bind("v")))),
+      std::make_unique<VerifyIdIsBoundTo<FieldDecl>>("v", 1)));
+  EXPECT_TRUE(
+      notMatches("class A { int c; int d; };",
+                 recordDecl(eachOf(has(fieldDecl(hasName("a")).bind("v")),
+                                   has(fieldDecl(hasName("b")).bind("v"))))));
 }
 
 TEST(Optionally, SubmatchersDoNotMatch) {
@@ -2056,29 +1981,30 @@ TEST(IsTemplateInstantiation, MatchesImplicitClassTemplateInstantiation) {
   // Make sure that we can both match the class by name (::X) and by the type
   // the template was instantiated with (via a field).
 
-  EXPECT_TRUE(matches(
-    "template <typename T> class X {}; class A {}; X<A> x;",
-    cxxRecordDecl(hasName("::X"), isTemplateInstantiation())));
+  EXPECT_TRUE(
+      matches("template <typename T> class X {}; class A {}; X<A> x;",
+              cxxRecordDecl(hasName("::X"), isTemplateInstantiation())));
 
   EXPECT_TRUE(matches(
-    "template <typename T> class X { T t; }; class A {}; X<A> x;",
-    cxxRecordDecl(isTemplateInstantiation(), hasDescendant(
-      fieldDecl(hasType(recordDecl(hasName("A"))))))));
+      "template <typename T> class X { T t; }; class A {}; X<A> x;",
+      cxxRecordDecl(
+          isTemplateInstantiation(),
+          hasDescendant(fieldDecl(hasType(recordDecl(hasName("A"))))))));
 }
 
 TEST(IsTemplateInstantiation, MatchesImplicitFunctionTemplateInstantiation) {
   EXPECT_TRUE(matches(
-    "template <typename T> void f(T t) {} class A {}; void g() { f(A()); }",
-    functionDecl(hasParameter(0, hasType(recordDecl(hasName("A")))),
-                 isTemplateInstantiation())));
+      "template <typename T> void f(T t) {} class A {}; void g() { f(A()); }",
+      functionDecl(hasParameter(0, hasType(recordDecl(hasName("A")))),
+                   isTemplateInstantiation())));
 }
 
 TEST(IsTemplateInstantiation, MatchesExplicitClassTemplateInstantiation) {
-  EXPECT_TRUE(matches(
-    "template <typename T> class X { T t; }; class A {};"
-      "template class X<A>;",
-    cxxRecordDecl(isTemplateInstantiation(), hasDescendant(
-      fieldDecl(hasType(recordDecl(hasName("A"))))))));
+  EXPECT_TRUE(matches("template <typename T> class X { T t; }; class A {};"
+                      "template class X<A>;",
+                      cxxRecordDecl(isTemplateInstantiation(),
+                                    hasDescendant(fieldDecl(
+                                        hasType(recordDecl(hasName("A"))))))));
 
   // Make sure that we match the instantiation instead of the template
   // definition by checking whether the member function is present.
@@ -2091,21 +2017,21 @@ TEST(IsTemplateInstantiation, MatchesExplicitClassTemplateInstantiation) {
 
 TEST(IsTemplateInstantiation,
      MatchesInstantiationOfPartiallySpecializedClassTemplate) {
-  EXPECT_TRUE(matches(
-    "template <typename T> class X {};"
-      "template <typename T> class X<T*> {}; class A {}; X<A*> x;",
-    cxxRecordDecl(hasName("::X"), isTemplateInstantiation())));
+  EXPECT_TRUE(
+      matches("template <typename T> class X {};"
+              "template <typename T> class X<T*> {}; class A {}; X<A*> x;",
+              cxxRecordDecl(hasName("::X"), isTemplateInstantiation())));
 }
 
 TEST(IsTemplateInstantiation,
      MatchesInstantiationOfClassTemplateNestedInNonTemplate) {
-  EXPECT_TRUE(matches(
-    "class A {};"
-      "class X {"
-      "  template <typename U> class Y { U u; };"
-      "  Y<A> y;"
-      "};",
-    cxxRecordDecl(hasName("::X::Y"), isTemplateInstantiation())));
+  EXPECT_TRUE(
+      matches("class A {};"
+              "class X {"
+              "  template <typename U> class Y { U u; };"
+              "  Y<A> y;"
+              "};",
+              cxxRecordDecl(hasName("::X::Y"), isTemplateInstantiation())));
 }
 
 TEST(IsTemplateInstantiation, DoesNotMatchInstantiationsInsideOfInstantiation) {
@@ -2113,31 +2039,30 @@ TEST(IsTemplateInstantiation, DoesNotMatchInstantiationsInsideOfInstantiation) {
   // normal use case as long as the uppermost instantiation always is marked
   // as template instantiation, but it might be confusing as a predicate.
   EXPECT_TRUE(matches(
-    "class A {};"
+      "class A {};"
       "template <typename T> class X {"
       "  template <typename U> class Y { U u; };"
       "  Y<T> y;"
       "}; X<A> x;",
-    cxxRecordDecl(hasName("::X::Y"), unless(isTemplateInstantiation()))));
+      cxxRecordDecl(hasName("::X::Y"), unless(isTemplateInstantiation()))));
 }
 
 TEST(IsTemplateInstantiation, DoesNotMatchExplicitClassTemplateSpecialization) {
-  EXPECT_TRUE(notMatches(
-    "template <typename T> class X {}; class A {};"
-      "template <> class X<A> {}; X<A> x;",
-    cxxRecordDecl(hasName("::X"), isTemplateInstantiation())));
+  EXPECT_TRUE(
+      notMatches("template <typename T> class X {}; class A {};"
+                 "template <> class X<A> {}; X<A> x;",
+                 cxxRecordDecl(hasName("::X"), isTemplateInstantiation())));
 }
 
 TEST(IsTemplateInstantiation, DoesNotMatchNonTemplate) {
-  EXPECT_TRUE(notMatches(
-    "class A {}; class Y { A a; };",
-    cxxRecordDecl(isTemplateInstantiation())));
+  EXPECT_TRUE(notMatches("class A {}; class Y { A a; };",
+                         cxxRecordDecl(isTemplateInstantiation())));
 }
 
 TEST(IsInstantiated, MatchesInstantiation) {
   EXPECT_TRUE(
-    matches("template<typename T> class A { T i; }; class Y { A<int> a; };",
-            cxxRecordDecl(isInstantiated())));
+      matches("template<typename T> class A { T i; }; class Y { A<int> a; };",
+              cxxRecordDecl(isInstantiated())));
 }
 
 TEST(IsInstantiated, NotMatchesDefinition) {
@@ -2147,7 +2072,7 @@ TEST(IsInstantiated, NotMatchesDefinition) {
 
 TEST(IsInTemplateInstantiation, MatchesInstantiationStmt) {
   EXPECT_TRUE(matches("template<typename T> struct A { A() { T i; } };"
-                        "class Y { A<int> a; }; Y y;",
+                      "class Y { A<int> a; }; Y y;",
                       declStmt(isInTemplateInstantiation())));
 }
 
@@ -2158,8 +2083,8 @@ TEST(IsInTemplateInstantiation, NotMatchesDefinitionStmt) {
 
 TEST(IsInstantiated, MatchesFunctionInstantiation) {
   EXPECT_TRUE(
-    matches("template<typename T> void A(T t) { T i; } void x() { A(0); }",
-            functionDecl(isInstantiated())));
+      matches("template<typename T> void A(T t) { T i; } void x() { A(0); }",
+              functionDecl(isInstantiated())));
 }
 
 TEST(IsInstantiated, NotMatchesFunctionDefinition) {
@@ -2169,8 +2094,8 @@ TEST(IsInstantiated, NotMatchesFunctionDefinition) {
 
 TEST(IsInTemplateInstantiation, MatchesFunctionInstantiationStmt) {
   EXPECT_TRUE(
-    matches("template<typename T> void A(T t) { T i; } void x() { A(0); }",
-            declStmt(isInTemplateInstantiation())));
+      matches("template<typename T> void A(T t) { T i; } void x() { A(0); }",
+              declStmt(isInTemplateInstantiation())));
 }
 
 TEST(IsInTemplateInstantiation, NotMatchesFunctionDefinitionStmt) {
@@ -2183,11 +2108,11 @@ TEST(IsInTemplateInstantiation, Sharing) {
   // FIXME: Node sharing is an implementation detail, exposing it is ugly
   // and makes the matcher behave in non-obvious ways.
   EXPECT_TRUE(notMatches(
-    "int j; template<typename T> void A(T t) { j += 42; } void x() { A(0); }",
-    Matcher));
+      "int j; template<typename T> void A(T t) { j += 42; } void x() { A(0); }",
+      Matcher));
   EXPECT_TRUE(matches(
-    "int j; template<typename T> void A(T t) { j += t; } void x() { A(0); }",
-    Matcher));
+      "int j; template<typename T> void A(T t) { j += t; } void x() { A(0); }",
+      Matcher));
 }
 
 TEST(IsInstantiationDependent, MatchesNonValueTypeDependent) {
@@ -2232,48 +2157,41 @@ TEST(IsValueDependent, MatchesInstantiationDependent) {
       expr(isValueDependent())));
 }
 
-TEST(IsExplicitTemplateSpecialization,
-     DoesNotMatchPrimaryTemplate) {
-  EXPECT_TRUE(notMatches(
-    "template <typename T> class X {};",
-    cxxRecordDecl(isExplicitTemplateSpecialization())));
-  EXPECT_TRUE(notMatches(
-    "template <typename T> void f(T t);",
-    functionDecl(isExplicitTemplateSpecialization())));
+TEST(IsExplicitTemplateSpecialization, DoesNotMatchPrimaryTemplate) {
+  EXPECT_TRUE(notMatches("template <typename T> class X {};",
+                         cxxRecordDecl(isExplicitTemplateSpecialization())));
+  EXPECT_TRUE(notMatches("template <typename T> void f(T t);",
+                         functionDecl(isExplicitTemplateSpecialization())));
 }
 
 TEST(IsExplicitTemplateSpecialization,
      DoesNotMatchExplicitTemplateInstantiations) {
-  EXPECT_TRUE(notMatches(
-    "template <typename T> class X {};"
-      "template class X<int>; extern template class X<long>;",
-    cxxRecordDecl(isExplicitTemplateSpecialization())));
-  EXPECT_TRUE(notMatches(
-    "template <typename T> void f(T t) {}"
-      "template void f(int t); extern template void f(long t);",
-    functionDecl(isExplicitTemplateSpecialization())));
+  EXPECT_TRUE(
+      notMatches("template <typename T> class X {};"
+                 "template class X<int>; extern template class X<long>;",
+                 cxxRecordDecl(isExplicitTemplateSpecialization())));
+  EXPECT_TRUE(
+      notMatches("template <typename T> void f(T t) {}"
+                 "template void f(int t); extern template void f(long t);",
+                 functionDecl(isExplicitTemplateSpecialization())));
 }
 
 TEST(IsExplicitTemplateSpecialization,
      DoesNotMatchImplicitTemplateInstantiations) {
-  EXPECT_TRUE(notMatches(
-    "template <typename T> class X {}; X<int> x;",
-    cxxRecordDecl(isExplicitTemplateSpecialization())));
-  EXPECT_TRUE(notMatches(
-    "template <typename T> void f(T t); void g() { f(10); }",
-    functionDecl(isExplicitTemplateSpecialization())));
+  EXPECT_TRUE(notMatches("template <typename T> class X {}; X<int> x;",
+                         cxxRecordDecl(isExplicitTemplateSpecialization())));
+  EXPECT_TRUE(
+      notMatches("template <typename T> void f(T t); void g() { f(10); }",
+                 functionDecl(isExplicitTemplateSpecialization())));
 }
 
-TEST(IsExplicitTemplateSpecialization,
-     MatchesExplicitTemplateSpecializations) {
-  EXPECT_TRUE(matches(
-    "template <typename T> class X {};"
-      "template<> class X<int> {};",
-    cxxRecordDecl(isExplicitTemplateSpecialization())));
-  EXPECT_TRUE(matches(
-    "template <typename T> void f(T t) {}"
-      "template<> void f(int t) {}",
-    functionDecl(isExplicitTemplateSpecialization())));
+TEST(IsExplicitTemplateSpecialization, MatchesExplicitTemplateSpecializations) {
+  EXPECT_TRUE(matches("template <typename T> class X {};"
+                      "template<> class X<int> {};",
+                      cxxRecordDecl(isExplicitTemplateSpecialization())));
+  EXPECT_TRUE(matches("template <typename T> void f(T t) {}"
+                      "template<> void f(int t) {}",
+                      functionDecl(isExplicitTemplateSpecialization())));
 }
 
 TEST(TypeMatching, MatchesNoReturn) {
@@ -2314,8 +2232,8 @@ TEST(TypeMatching, MatchesNoReturn) {
 
   EXPECT_TRUE(
       matches("struct S { [[noreturn]] S(); };", functionDecl(isNoReturn())));
-  EXPECT_TRUE(matches("struct S { [[noreturn]] S() {} };",
-                      functionDecl(isNoReturn())));
+  EXPECT_TRUE(
+      matches("struct S { [[noreturn]] S() {} };", functionDecl(isNoReturn())));
 
   // ---
 
@@ -2344,14 +2262,12 @@ TEST(TypeMatching, MatchesNoReturn) {
   // ---
 
   EXPECT_TRUE(matchesC("__attribute__((noreturn)) void func();",
-                      functionDecl(isNoReturn())));
+                       functionDecl(isNoReturn())));
   EXPECT_TRUE(matchesC("__attribute__((noreturn)) void func() {}",
-                      functionDecl(isNoReturn())));
+                       functionDecl(isNoReturn())));
 
-  EXPECT_TRUE(matchesC("_Noreturn void func();",
-                      functionDecl(isNoReturn())));
-  EXPECT_TRUE(matchesC("_Noreturn void func() {}",
-                      functionDecl(isNoReturn())));
+  EXPECT_TRUE(matchesC("_Noreturn void func();", functionDecl(isNoReturn())));
+  EXPECT_TRUE(matchesC("_Noreturn void func() {}", functionDecl(isNoReturn())));
 }
 
 TEST(TypeMatching, MatchesBool) {
@@ -2383,45 +2299,42 @@ TEST(TypeMatching, MatchesArrayTypes) {
   EXPECT_TRUE(notMatches("struct A {}; A a[7];",
                          arrayType(hasElementType(builtinType()))));
 
+  EXPECT_TRUE(matches("int const a[] = { 2, 3 };",
+                      qualType(arrayType(hasElementType(builtinType())))));
   EXPECT_TRUE(matches(
-    "int const a[] = { 2, 3 };",
-    qualType(arrayType(hasElementType(builtinType())))));
-  EXPECT_TRUE(matches(
-    "int const a[] = { 2, 3 };",
-    qualType(isConstQualified(), arrayType(hasElementType(builtinType())))));
-  EXPECT_TRUE(matches(
-    "typedef const int T; T x[] = { 1, 2 };",
-    qualType(isConstQualified(), arrayType())));
+      "int const a[] = { 2, 3 };",
+      qualType(isConstQualified(), arrayType(hasElementType(builtinType())))));
+  EXPECT_TRUE(matches("typedef const int T; T x[] = { 1, 2 };",
+                      qualType(isConstQualified(), arrayType())));
 
   EXPECT_TRUE(notMatches(
-    "int a[] = { 2, 3 };",
-    qualType(isConstQualified(), arrayType(hasElementType(builtinType())))));
-  EXPECT_TRUE(notMatches(
-    "int a[] = { 2, 3 };",
-    qualType(arrayType(hasElementType(isConstQualified(), builtinType())))));
+      "int a[] = { 2, 3 };",
+      qualType(isConstQualified(), arrayType(hasElementType(builtinType())))));
   EXPECT_TRUE(notMatches(
-    "int const a[] = { 2, 3 };",
-    qualType(arrayType(hasElementType(builtinType())),
-             unless(isConstQualified()))));
+      "int a[] = { 2, 3 };",
+      qualType(arrayType(hasElementType(isConstQualified(), builtinType())))));
+  EXPECT_TRUE(notMatches("int const a[] = { 2, 3 };",
+                         qualType(arrayType(hasElementType(builtinType())),
+                                  unless(isConstQualified()))));
 
-  EXPECT_TRUE(matches("int a[2];",
-                      constantArrayType(hasElementType(builtinType()))));
+  EXPECT_TRUE(
+      matches("int a[2];", constantArrayType(hasElementType(builtinType()))));
   EXPECT_TRUE(matches("const int a = 0;", qualType(isInteger())));
 }
 
 TEST(TypeMatching, DecayedType) {
-  EXPECT_TRUE(matches("void f(int i[]);", valueDecl(hasType(decayedType(hasDecayedType(pointerType()))))));
+  EXPECT_TRUE(
+      matches("void f(int i[]);",
+              valueDecl(hasType(decayedType(hasDecayedType(pointerType()))))));
   EXPECT_TRUE(notMatches("int i[7];", decayedType()));
 }
 
 TEST(TypeMatching, MatchesComplexTypes) {
   EXPECT_TRUE(matches("_Complex float f;", complexType()));
-  EXPECT_TRUE(matches(
-    "_Complex float f;",
-    complexType(hasElementType(builtinType()))));
-  EXPECT_TRUE(notMatches(
-    "_Complex float f;",
-    complexType(hasElementType(isInteger()))));
+  EXPECT_TRUE(
+      matches("_Complex float f;", complexType(hasElementType(builtinType()))));
+  EXPECT_TRUE(notMatches("_Complex float f;",
+                         complexType(hasElementType(isInteger()))));
 }
 
 TEST(NS, Anonymous) {
@@ -2482,38 +2395,38 @@ TEST(DeclarationMatcher, InStdNamespace) {
 
 TEST(EqualsBoundNodeMatcher, QualType) {
   EXPECT_TRUE(matches(
-    "int i = 1;", varDecl(hasType(qualType().bind("type")),
-                          hasInitializer(ignoringParenImpCasts(
-                            hasType(qualType(equalsBoundNode("type"))))))));
+      "int i = 1;", varDecl(hasType(qualType().bind("type")),
+                            hasInitializer(ignoringParenImpCasts(
+                                hasType(qualType(equalsBoundNode("type"))))))));
   EXPECT_TRUE(notMatches("int i = 1.f;",
                          varDecl(hasType(qualType().bind("type")),
                                  hasInitializer(ignoringParenImpCasts(hasType(
-                                   qualType(equalsBoundNode("type"))))))));
+                                     qualType(equalsBoundNode("type"))))))));
 }
 
 TEST(EqualsBoundNodeMatcher, NonMatchingTypes) {
   EXPECT_TRUE(notMatches(
-    "int i = 1;", varDecl(namedDecl(hasName("i")).bind("name"),
-                          hasInitializer(ignoringParenImpCasts(
-                            hasType(qualType(equalsBoundNode("type"))))))));
+      "int i = 1;", varDecl(namedDecl(hasName("i")).bind("name"),
+                            hasInitializer(ignoringParenImpCasts(
+                                hasType(qualType(equalsBoundNode("type"))))))));
 }
 
 TEST(EqualsBoundNodeMatcher, Stmt) {
   EXPECT_TRUE(
-    matches("void f() { if(true) {} }",
-            stmt(allOf(ifStmt().bind("if"),
-                       hasParent(stmt(has(stmt(equalsBoundNode("if")))))))));
+      matches("void f() { if(true) {} }",
+              stmt(allOf(ifStmt().bind("if"),
+                         hasParent(stmt(has(stmt(equalsBoundNode("if")))))))));
 
   EXPECT_TRUE(notMatches(
-    "void f() { if(true) { if (true) {} } }",
-    stmt(allOf(ifStmt().bind("if"), has(stmt(equalsBoundNode("if")))))));
+      "void f() { if(true) { if (true) {} } }",
+      stmt(allOf(ifStmt().bind("if"), has(stmt(equalsBoundNode("if")))))));
 }
 
 TEST(EqualsBoundNodeMatcher, Decl) {
   EXPECT_TRUE(matches(
-    "class X { class Y {}; };",
-    decl(allOf(recordDecl(hasName("::X::Y")).bind("record"),
-               hasParent(decl(has(decl(equalsBoundNode("record")))))))));
+      "class X { class Y {}; };",
+      decl(allOf(recordDecl(hasName("::X::Y")).bind("record"),
+                 hasParent(decl(has(decl(equalsBoundNode("record")))))))));
 
   EXPECT_TRUE(notMatches("class X { class Y {}; };",
                          decl(allOf(recordDecl(hasName("::X")).bind("record"),
@@ -2522,21 +2435,21 @@ TEST(EqualsBoundNodeMatcher, Decl) {
 
 TEST(EqualsBoundNodeMatcher, Type) {
   EXPECT_TRUE(matches(
-    "class X { int a; int b; };",
-    recordDecl(
-      has(fieldDecl(hasName("a"), hasType(type().bind("t")))),
-      has(fieldDecl(hasName("b"), hasType(type(equalsBoundNode("t"))))))));
+      "class X { int a; int b; };",
+      recordDecl(
+          has(fieldDecl(hasName("a"), hasType(type().bind("t")))),
+          has(fieldDecl(hasName("b"), hasType(type(equalsBoundNode("t"))))))));
 
   EXPECT_TRUE(notMatches(
-    "class X { int a; double b; };",
-    recordDecl(
-      has(fieldDecl(hasName("a"), hasType(type().bind("t")))),
-      has(fieldDecl(hasName("b"), hasType(type(equalsBoundNode("t"))))))));
+      "class X { int a; double b; };",
+      recordDecl(
+          has(fieldDecl(hasName("a"), hasType(type().bind("t")))),
+          has(fieldDecl(hasName("b"), hasType(type(equalsBoundNode("t"))))))));
 }
 
 TEST(EqualsBoundNodeMatcher, UsingForEachDescendant) {
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "int f() {"
+      "int f() {"
       "  if (1) {"
       "    int i = 9;"
       "  }"
@@ -2546,63 +2459,65 @@ TEST(EqualsBoundNodeMatcher, UsingForEachDescendant) {
       "  }"
       "  return 0;"
       "}",
-    // Look for variable declarations within functions whose type is the same
-    // as the function return type.
-    functionDecl(returns(qualType().bind("type")),
-                 forEachDescendant(varDecl(hasType(
-                   qualType(equalsBoundNode("type")))).bind("decl"))),
-    // Only i and j should match, not k.
-    std::make_unique<VerifyIdIsBoundTo<VarDecl>>("decl", 2)));
+      // Look for variable declarations within functions whose type is the same
+      // as the function return type.
+      functionDecl(
+          returns(qualType().bind("type")),
+          forEachDescendant(varDecl(hasType(qualType(equalsBoundNode("type"))))
+                                .bind("decl"))),
+      // Only i and j should match, not k.
+      std::make_unique<VerifyIdIsBoundTo<VarDecl>>("decl", 2)));
 }
 
 TEST(EqualsBoundNodeMatcher, FiltersMatchedCombinations) {
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "void f() {"
+      "void f() {"
       "  int x;"
       "  double d;"
       "  x = d + x - d + x;"
       "}",
-    functionDecl(
-      hasName("f"), forEachDescendant(varDecl().bind("d")),
-      forEachDescendant(declRefExpr(to(decl(equalsBoundNode("d")))))),
-    std::make_unique<VerifyIdIsBoundTo<VarDecl>>("d", 5)));
+      functionDecl(
+          hasName("f"), forEachDescendant(varDecl().bind("d")),
+          forEachDescendant(declRefExpr(to(decl(equalsBoundNode("d")))))),
+      std::make_unique<VerifyIdIsBoundTo<VarDecl>>("d", 5)));
 }
 
 TEST(EqualsBoundNodeMatcher, UnlessDescendantsOfAncestorsMatch) {
   EXPECT_TRUE(matchAndVerifyResultTrue(
-    "struct StringRef { int size() const; const char* data() const; };"
+      "struct StringRef { int size() const; const char* data() const; };"
       "void f(StringRef v) {"
       "  v.data();"
       "}",
-    cxxMemberCallExpr(
-      callee(cxxMethodDecl(hasName("data"))),
-      on(declRefExpr(to(
-        varDecl(hasType(recordDecl(hasName("StringRef")))).bind("var")))),
-      unless(hasAncestor(stmt(hasDescendant(cxxMemberCallExpr(
-        callee(cxxMethodDecl(anyOf(hasName("size"), hasName("length")))),
-        on(declRefExpr(to(varDecl(equalsBoundNode("var")))))))))))
-      .bind("data"),
-    std::make_unique<VerifyIdIsBoundTo<Expr>>("data", 1)));
+      cxxMemberCallExpr(
+          callee(cxxMethodDecl(hasName("data"))),
+          on(declRefExpr(to(
+              varDecl(hasType(recordDecl(hasName("StringRef")))).bind("var")))),
+          unless(hasAncestor(stmt(hasDescendant(cxxMemberCallExpr(
+              callee(cxxMethodDecl(anyOf(hasName("size"), hasName("length")))),
+              on(declRefExpr(to(varDecl(equalsBoundNode("var")))))))))))
+          .bind("data"),
+      std::make_unique<VerifyIdIsBoundTo<Expr>>("data", 1)));
 
   EXPECT_FALSE(matches(
-    "struct StringRef { int size() const; const char* data() const; };"
+      "struct StringRef { int size() const; const char* data() const; };"
       "void f(StringRef v) {"
       "  v.data();"
       "  v.size();"
       "}",
-    cxxMemberCallExpr(
-      callee(cxxMethodDecl(hasName("data"))),
-      on(declRefExpr(to(
-        varDecl(hasType(recordDecl(hasName("StringRef")))).bind("var")))),
-      unless(hasAncestor(stmt(hasDescendant(cxxMemberCallExpr(
-        callee(cxxMethodDecl(anyOf(hasName("size"), hasName("length")))),
-        on(declRefExpr(to(varDecl(equalsBoundNode("var")))))))))))
-      .bind("data")));
+      cxxMemberCallExpr(
+          callee(cxxMethodDecl(hasName("data"))),
+          on(declRefExpr(to(
+              varDecl(hasType(recordDecl(hasName("StringRef")))).bind("var")))),
+          unless(hasAncestor(stmt(hasDescendant(cxxMemberCallExpr(
+              callee(cxxMethodDecl(anyOf(hasName("size"), hasName("length")))),
+              on(declRefExpr(to(varDecl(equalsBoundNode("var")))))))))))
+          .bind("data")));
 }
 
 TEST(NullPointerConstants, Basic) {
   EXPECT_TRUE(matches("#define NULL ((void *)0)\n"
-                        "void *v1 = NULL;", expr(nullPointerConstant())));
+                      "void *v1 = NULL;",
+                      expr(nullPointerConstant())));
   EXPECT_TRUE(matches("void *v2 = nullptr;", expr(nullPointerConstant())));
   EXPECT_TRUE(matches("void *v3 = __null;", expr(nullPointerConstant())));
   EXPECT_TRUE(matches("char *cp = (char *)0;", expr(nullPointerConstant())));
@@ -2635,10 +2550,10 @@ TEST(HasExternalFormalLinkage, Basic) {
 }
 
 TEST(HasDefaultArgument, Basic) {
-  EXPECT_TRUE(matches("void x(int val = 0) {}",
-                      parmVarDecl(hasDefaultArgument())));
-  EXPECT_TRUE(notMatches("void x(int val) {}",
-                      parmVarDecl(hasDefaultArgument())));
+  EXPECT_TRUE(
+      matches("void x(int val = 0) {}", parmVarDecl(hasDefaultArgument())));
+  EXPECT_TRUE(
+      notMatches("void x(int val) {}", parmVarDecl(hasDefaultArgument())));
 }
 
 TEST(IsAtPosition, Basic) {
@@ -2691,24 +2606,18 @@ TEST(HasArraySize, Basic) {
 }
 
 TEST(HasDefinition, MatchesStructDefinition) {
-  EXPECT_TRUE(matches("struct x {};",
-                      cxxRecordDecl(hasDefinition())));
-  EXPECT_TRUE(notMatches("struct x;",
-                      cxxRecordDecl(hasDefinition())));
+  EXPECT_TRUE(matches("struct x {};", cxxRecordDecl(hasDefinition())));
+  EXPECT_TRUE(notMatches("struct x;", cxxRecordDecl(hasDefinition())));
 }
 
 TEST(HasDefinition, MatchesClassDefinition) {
-  EXPECT_TRUE(matches("class x {};",
-                      cxxRecordDecl(hasDefinition())));
-  EXPECT_TRUE(notMatches("class x;",
-                      cxxRecordDecl(hasDefinition())));
+  EXPECT_TRUE(matches("class x {};", cxxRecordDecl(hasDefinition())));
+  EXPECT_TRUE(notMatches("class x;", cxxRecordDecl(hasDefinition())));
 }
 
 TEST(HasDefinition, MatchesUnionDefinition) {
-  EXPECT_TRUE(matches("union x {};",
-                      cxxRecordDecl(hasDefinition())));
-  EXPECT_TRUE(notMatches("union x;",
-                      cxxRecordDecl(hasDefinition())));
+  EXPECT_TRUE(matches("union x {};", cxxRecordDecl(hasDefinition())));
+  EXPECT_TRUE(notMatches("union x;", cxxRecordDecl(hasDefinition())));
 }
 
 TEST(IsScopedEnum, MatchesScopedEnum) {
@@ -2727,19 +2636,19 @@ TEST(HasTrailingReturn, MatchesTrailingReturn) {
   EXPECT_TRUE(matches("auto Y() -> int { return 0; }",
                       functionDecl(hasTrailingReturn())));
   EXPECT_TRUE(matches("auto X() -> int;", functionDecl(hasTrailingReturn())));
-  EXPECT_TRUE(notMatches("int X() { return 0; }",
-                      functionDecl(hasTrailingReturn())));
+  EXPECT_TRUE(
+      notMatches("int X() { return 0; }", functionDecl(hasTrailingReturn())));
   EXPECT_TRUE(notMatches("int X();", functionDecl(hasTrailingReturn())));
   EXPECT_TRUE(notMatchesC("void X();", functionDecl(hasTrailingReturn())));
 }
 
 TEST(HasTrailingReturn, MatchesLambdaTrailingReturn) {
   EXPECT_TRUE(matches(
-          "auto lambda2 = [](double x, double y) -> double {return x + y;};",
-          functionDecl(hasTrailingReturn())));
-  EXPECT_TRUE(notMatches(
-          "auto lambda2 = [](double x, double y) {return x + y;};",
-          functionDecl(hasTrailingReturn())));
+      "auto lambda2 = [](double x, double y) -> double {return x + y;};",
+      functionDecl(hasTrailingReturn())));
+  EXPECT_TRUE(
+      notMatches("auto lambda2 = [](double x, double y) {return x + y;};",
+                 functionDecl(hasTrailingReturn())));
 }
 
 TEST(IsAssignmentOperator, Basic) {
@@ -2772,23 +2681,15 @@ TEST(IsComparisonOperator, Basic) {
 }
 
 TEST(HasInit, Basic) {
-  EXPECT_TRUE(
-    matches("int x{0};",
-            initListExpr(hasInit(0, expr()))));
-  EXPECT_FALSE(
-    matches("int x{0};",
-            initListExpr(hasInit(1, expr()))));
-  EXPECT_FALSE(
-    matches("int x;",
-            initListExpr(hasInit(0, expr()))));
+  EXPECT_TRUE(matches("int x{0};", initListExpr(hasInit(0, expr()))));
+  EXPECT_FALSE(matches("int x{0};", initListExpr(hasInit(1, expr()))));
+  EXPECT_FALSE(matches("int x;", initListExpr(hasInit(0, expr()))));
 }
 
 TEST(Matcher, isMain) {
-  EXPECT_TRUE(
-    matches("int main() {}", functionDecl(isMain())));
+  EXPECT_TRUE(matches("int main() {}", functionDecl(isMain())));
 
-  EXPECT_TRUE(
-    notMatches("int main2() {}", functionDecl(isMain())));
+  EXPECT_TRUE(notMatches("int main2() {}", functionDecl(isMain())));
 }
 
 TEST(OMPExecutableDirective, isStandaloneDirective) {
@@ -2867,11 +2768,18 @@ void x() {
   EXPECT_TRUE(matchesWithOpenMP(Source3, Matcher));
 
   StringRef Source4 = R"(
+void x() {
+#pragma omp parallel default(firstprivate)
+;
+})";
+  EXPECT_TRUE(matchesWithOpenMP51(Source4, Matcher));
+
+  StringRef Source5 = R"(
 void x(int x) {
 #pragma omp parallel num_threads(x)
 ;
 })";
-  EXPECT_TRUE(matchesWithOpenMP(Source4, Matcher));
+  EXPECT_TRUE(matchesWithOpenMP(Source5, Matcher));
 }
 
 TEST(OMPDefaultClause, isNoneKind) {
@@ -2907,10 +2815,17 @@ void x() {
 
   StringRef Source4 = R"(
 void x(int x) {
+#pragma omp parallel default(firstprivate)
+;
+})";
+  EXPECT_TRUE(notMatchesWithOpenMP51(Source4, Matcher));
+
+  const std::string Source5 = R"(
+void x(int x) {
 #pragma omp parallel num_threads(x)
 ;
 })";
-  EXPECT_TRUE(notMatchesWithOpenMP(Source4, Matcher));
+  EXPECT_TRUE(notMatchesWithOpenMP(Source5, Matcher));
 }
 
 TEST(OMPDefaultClause, isSharedKind) {
@@ -2946,10 +2861,63 @@ void x() {
 
   StringRef Source4 = R"(
 void x(int x) {
+#pragma omp parallel default(firstprivate)
+;
+})";
+  EXPECT_TRUE(notMatchesWithOpenMP51(Source4, Matcher));
+
+  const std::string Source5 = R"(
+void x(int x) {
 #pragma omp parallel num_threads(x)
 ;
 })";
-  EXPECT_TRUE(notMatchesWithOpenMP(Source4, Matcher));
+  EXPECT_TRUE(notMatchesWithOpenMP(Source5, Matcher));
+}
+
+TEST(OMPDefaultClause, isFirstPrivateKind) {
+  auto Matcher = ompExecutableDirective(
+      hasAnyClause(ompDefaultClause(isFirstPrivateKind())));
+
+  const std::string Source0 = R"(
+void x() {
+;
+})";
+  EXPECT_TRUE(notMatchesWithOpenMP(Source0, Matcher));
+
+  const std::string Source1 = R"(
+void x() {
+#pragma omp parallel
+;
+})";
+  EXPECT_TRUE(notMatchesWithOpenMP(Source1, Matcher));
+
+  const std::string Source2 = R"(
+void x() {
+#pragma omp parallel default(shared)
+;
+})";
+  EXPECT_TRUE(notMatchesWithOpenMP(Source2, Matcher));
+
+  const std::string Source3 = R"(
+void x() {
+#pragma omp parallel default(none)
+;
+})";
+  EXPECT_TRUE(notMatchesWithOpenMP(Source3, Matcher));
+
+  const std::string Source4 = R"(
+void x(int x) {
+#pragma omp parallel default(firstprivate)
+;
+})";
+  EXPECT_TRUE(matchesWithOpenMP51(Source4, Matcher));
+
+  const std::string Source5 = R"(
+void x(int x) {
+#pragma omp parallel num_threads(x)
+;
+})";
+  EXPECT_TRUE(notMatchesWithOpenMP(Source5, Matcher));
 }
 
 TEST(OMPExecutableDirective, isAllowedToContainClauseKind) {
@@ -2984,24 +2952,31 @@ void x() {
   EXPECT_TRUE(matchesWithOpenMP(Source3, Matcher));
 
   StringRef Source4 = R"(
+void x() {
+#pragma omp parallel default(firstprivate)
+;
+})";
+  EXPECT_TRUE(matchesWithOpenMP51(Source4, Matcher));
+
+  StringRef Source5 = R"(
 void x(int x) {
 #pragma omp parallel num_threads(x)
 ;
 })";
-  EXPECT_TRUE(matchesWithOpenMP(Source4, Matcher));
+  EXPECT_TRUE(matchesWithOpenMP(Source5, Matcher));
 
-  StringRef Source5 = R"(
+  StringRef Source6 = R"(
 void x() {
 #pragma omp taskyield
 })";
-  EXPECT_TRUE(notMatchesWithOpenMP(Source5, Matcher));
+  EXPECT_TRUE(notMatchesWithOpenMP(Source6, Matcher));
 
-  StringRef Source6 = R"(
+  StringRef Source7 = R"(
 void x() {
 #pragma omp task
 ;
 })";
-  EXPECT_TRUE(matchesWithOpenMP(Source6, Matcher));
+  EXPECT_TRUE(matchesWithOpenMP(Source7, Matcher));
 }
 
 TEST(HasAnyBase, DirectBase) {
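
Reviewer note on the OMPDefaultClause tests above: the new isFirstPrivateKind
test follows the same shape as isNoneKind and isSharedKind. Only a directive
carrying default(firstprivate) should match, and the test uses
matchesWithOpenMP51 for that one case. The sketch below is a toy model of
that predicate, not Clang code: DefaultKind, Clause, Directive and both helper
functions are hypothetical stand-ins invented for illustration, and nothing
here is parsed from real pragmas.

```cpp
// Toy model only: these types are hypothetical stand-ins, not Clang's AST
// classes or its ASTMatchers API.
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

enum class DefaultKind { None, Shared, FirstPrivate };

// A clause is either default(<kind>) or some other clause (e.g. num_threads).
struct Clause {
  std::optional<DefaultKind> Default; // empty for non-default clauses
};

// An executable directive such as "#pragma omp parallel ...".
struct Directive {
  std::vector<Clause> Clauses;
};

// Rough analogue of ompDefaultClause(isFirstPrivateKind()).
inline bool isFirstPrivateDefault(const Clause &C) {
  return C.Default.has_value() && *C.Default == DefaultKind::FirstPrivate;
}

// Rough analogue of ompExecutableDirective(hasAnyClause(...)).
inline bool hasFirstPrivateDefault(const Directive &D) {
  return std::any_of(D.Clauses.begin(), D.Clauses.end(),
                     isFirstPrivateDefault);
}

// Mirrors the Source4 / Source5 cases of the isFirstPrivateKind test.
inline void checkExamples() {
  Directive WithFirstPrivate;
  WithFirstPrivate.Clauses.push_back(Clause{DefaultKind::FirstPrivate});
  assert(hasFirstPrivateDefault(WithFirstPrivate));

  Directive WithNumThreads;
  WithNumThreads.Clauses.push_back(Clause{}); // stands in for num_threads(x)
  assert(!hasFirstPrivateDefault(WithNumThreads));
}
```

The model keeps "no default clause" and "a default clause of another kind" as
separate cases, matching how the test checks both separately (Source1 against
Source2 and Source3).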

diff  --git a/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp b/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
index 59e0f74b3910..895c8ae48adc 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
+++ b/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
@@ -118,13 +118,13 @@ TEST_P(ASTMatchersTest, TranslationUnitDecl) {
                    "int MyVar2;\n"
                    "}  // namespace NameSpace\n";
   EXPECT_TRUE(matches(
-    Code, varDecl(hasName("MyVar1"), hasDeclContext(translationUnitDecl()))));
+      Code, varDecl(hasName("MyVar1"), hasDeclContext(translationUnitDecl()))));
   EXPECT_FALSE(matches(
-    Code, varDecl(hasName("MyVar2"), hasDeclContext(translationUnitDecl()))));
+      Code, varDecl(hasName("MyVar2"), hasDeclContext(translationUnitDecl()))));
   EXPECT_TRUE(matches(
-    Code,
-    varDecl(hasName("MyVar2"),
-            hasDeclContext(decl(hasDeclContext(translationUnitDecl()))))));
+      Code,
+      varDecl(hasName("MyVar2"),
+              hasDeclContext(decl(hasDeclContext(translationUnitDecl()))))));
 }
 
 TEST_P(ASTMatchersTest, LinkageSpecDecl) {
@@ -158,10 +158,10 @@ TEST_P(ASTMatchersTest,
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(notMatches("template<typename T> class X { };"
-                           "template<> class X<int> { int a; };",
-                         classTemplateDecl(hasName("X"),
-                                           hasDescendant(fieldDecl(hasName("a"))))));
+  EXPECT_TRUE(notMatches(
+      "template<typename T> class X { };"
+      "template<> class X<int> { int a; };",
+      classTemplateDecl(hasName("X"), hasDescendant(fieldDecl(hasName("a"))))));
 }
 
 TEST_P(ASTMatchersTest,
@@ -169,18 +169,17 @@ TEST_P(ASTMatchersTest,
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(notMatches("template<typename T1, typename T2> class X { };"
-                           "template<typename T> class X<T, int> { int a; };",
-                         classTemplateDecl(hasName("X"),
-                                           hasDescendant(fieldDecl(hasName("a"))))));
+  EXPECT_TRUE(notMatches(
+      "template<typename T1, typename T2> class X { };"
+      "template<typename T> class X<T, int> { int a; };",
+      classTemplateDecl(hasName("X"), hasDescendant(fieldDecl(hasName("a"))))));
 }
 
 TEST(ASTMatchersTestCUDA, CUDAKernelCallExpr) {
   EXPECT_TRUE(matchesWithCuda("__global__ void f() { }"
-                                "void g() { f<<<1, 2>>>(); }",
+                              "void g() { f<<<1, 2>>>(); }",
                               cudaKernelCallExpr()));
-  EXPECT_TRUE(notMatchesWithCuda("void f() {}",
-                                 cudaKernelCallExpr()));
+  EXPECT_TRUE(notMatchesWithCuda("void f() {}", cudaKernelCallExpr()));
 }
 
 TEST(ASTMatchersTestCUDA, HasAttrCUDA) {
@@ -316,56 +315,50 @@ TEST_P(ASTMatchersTest, CallExpr_CXX) {
   // FIXME: Do we want to overload Call() to directly take
   // Matcher<Decl>, too?
   StatementMatcher MethodX =
-    callExpr(hasDeclaration(cxxMethodDecl(hasName("x"))));
+      callExpr(hasDeclaration(cxxMethodDecl(hasName("x"))));
 
   EXPECT_TRUE(matches("class Y { void x() { x(); } };", MethodX));
   EXPECT_TRUE(notMatches("class Y { void x() {} };", MethodX));
 
   StatementMatcher MethodOnY =
-    cxxMemberCallExpr(on(hasType(recordDecl(hasName("Y")))));
+      cxxMemberCallExpr(on(hasType(recordDecl(hasName("Y")))));
 
-  EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z() { Y y; y.x(); }",
-            MethodOnY));
-  EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z(Y &y) { y.x(); }",
-            MethodOnY));
-  EXPECT_TRUE(
-    notMatches("class Y { public: void x(); }; void z(Y *&y) { y->x(); }",
-               MethodOnY));
-  EXPECT_TRUE(
-    notMatches("class Y { public: void x(); }; void z(Y y[]) { y->x(); }",
-               MethodOnY));
-  EXPECT_TRUE(
-    notMatches("class Y { public: void x(); }; void z() { Y *y; y->x(); }",
-               MethodOnY));
+  EXPECT_TRUE(matches("class Y { public: void x(); }; void z() { Y y; y.x(); }",
+                      MethodOnY));
+  EXPECT_TRUE(matches("class Y { public: void x(); }; void z(Y &y) { y.x(); }",
+                      MethodOnY));
+  EXPECT_TRUE(notMatches(
+      "class Y { public: void x(); }; void z(Y *&y) { y->x(); }", MethodOnY));
+  EXPECT_TRUE(notMatches(
+      "class Y { public: void x(); }; void z(Y y[]) { y->x(); }", MethodOnY));
+  EXPECT_TRUE(notMatches(
+      "class Y { public: void x(); }; void z() { Y *y; y->x(); }", MethodOnY));
 
   StatementMatcher MethodOnYPointer =
-    cxxMemberCallExpr(on(hasType(pointsTo(recordDecl(hasName("Y"))))));
+      cxxMemberCallExpr(on(hasType(pointsTo(recordDecl(hasName("Y"))))));
 
   EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z() { Y *y; y->x(); }",
-            MethodOnYPointer));
+      matches("class Y { public: void x(); }; void z() { Y *y; y->x(); }",
+              MethodOnYPointer));
   EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z(Y *&y) { y->x(); }",
-            MethodOnYPointer));
+      matches("class Y { public: void x(); }; void z(Y *&y) { y->x(); }",
+              MethodOnYPointer));
   EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z(Y y[]) { y->x(); }",
-            MethodOnYPointer));
+      matches("class Y { public: void x(); }; void z(Y y[]) { y->x(); }",
+              MethodOnYPointer));
   EXPECT_TRUE(
-    notMatches("class Y { public: void x(); }; void z() { Y y; y.x(); }",
-               MethodOnYPointer));
+      notMatches("class Y { public: void x(); }; void z() { Y y; y.x(); }",
+                 MethodOnYPointer));
   EXPECT_TRUE(
-    notMatches("class Y { public: void x(); }; void z(Y &y) { y.x(); }",
-               MethodOnYPointer));
+      notMatches("class Y { public: void x(); }; void z(Y &y) { y.x(); }",
+                 MethodOnYPointer));
 }
 
 TEST_P(ASTMatchersTest, LambdaExpr) {
   if (!GetParam().isCXX11OrLater()) {
     return;
   }
-  EXPECT_TRUE(matches("auto f = [] (int i) { return i; };",
-                      lambdaExpr()));
+  EXPECT_TRUE(matches("auto f = [] (int i) { return i; };", lambdaExpr()));
 }
 
 TEST_P(ASTMatchersTest, CXXForRangeStmt) {
@@ -378,7 +371,7 @@ TEST_P(ASTMatchersTest, CXXForRangeStmt_CXX11) {
     return;
   }
   EXPECT_TRUE(matches("int as[] = { 1, 2, 3 };"
-                        "void f() { for (auto &a : as); }",
+                      "void f() { for (auto &a : as); }",
                       cxxForRangeStmt()));
 }
 
@@ -387,15 +380,13 @@ TEST_P(ASTMatchersTest, SubstNonTypeTemplateParmExpr) {
     return;
   }
   EXPECT_FALSE(matches("template<int N>\n"
-                         "struct A {  static const int n = 0; };\n"
-                         "struct B : public A<42> {};",
-                         traverse(TK_AsIs,
-                       substNonTypeTemplateParmExpr())));
+                       "struct A {  static const int n = 0; };\n"
+                       "struct B : public A<42> {};",
+                       traverse(TK_AsIs, substNonTypeTemplateParmExpr())));
   EXPECT_TRUE(matches("template<int N>\n"
-                        "struct A {  static const int n = N; };\n"
-                        "struct B : public A<42> {};",
-                         traverse(TK_AsIs,
-                      substNonTypeTemplateParmExpr())));
+                      "struct A {  static const int n = N; };\n"
+                      "struct B : public A<42> {};",
+                      traverse(TK_AsIs, substNonTypeTemplateParmExpr())));
 }
 
 TEST_P(ASTMatchersTest, NonTypeTemplateParmDecl) {
@@ -405,7 +396,7 @@ TEST_P(ASTMatchersTest, NonTypeTemplateParmDecl) {
   EXPECT_TRUE(matches("template <int N> void f();",
                       nonTypeTemplateParmDecl(hasName("N"))));
   EXPECT_TRUE(
-    notMatches("template <typename T> void f();", nonTypeTemplateParmDecl()));
+      notMatches("template <typename T> void f();", nonTypeTemplateParmDecl()));
 }
 
 TEST_P(ASTMatchersTest, TemplateTypeParmDecl) {
@@ -414,8 +405,7 @@ TEST_P(ASTMatchersTest, TemplateTypeParmDecl) {
   }
   EXPECT_TRUE(matches("template <typename T> void f();",
                       templateTypeParmDecl(hasName("T"))));
-  EXPECT_TRUE(
-    notMatches("template <int N> void f();", templateTypeParmDecl()));
+  EXPECT_TRUE(notMatches("template <int N> void f();", templateTypeParmDecl()));
 }
 
 TEST_P(ASTMatchersTest, UserDefinedLiteral) {
@@ -423,9 +413,9 @@ TEST_P(ASTMatchersTest, UserDefinedLiteral) {
     return;
   }
   EXPECT_TRUE(matches("constexpr char operator \"\" _inc (const char i) {"
-                        "  return i + 1;"
-                        "}"
-                        "char c = 'a'_inc;",
+                      "  return i + 1;"
+                      "}"
+                      "char c = 'a'_inc;",
                       userDefinedLiteral()));
 }
 
@@ -434,9 +424,7 @@ TEST_P(ASTMatchersTest, FlowControl) {
   EXPECT_TRUE(matches("void f() { while(1) { continue; } }", continueStmt()));
   EXPECT_TRUE(matches("void f() { goto FOO; FOO: ;}", gotoStmt()));
   EXPECT_TRUE(matches("void f() { goto FOO; FOO: ;}",
-                      labelStmt(
-                        hasDeclaration(
-                          labelDecl(hasName("FOO"))))));
+                      labelStmt(hasDeclaration(labelDecl(hasName("FOO"))))));
   EXPECT_TRUE(matches("void f() { FOO: ; void *ptr = &&FOO; goto *ptr; }",
                       addrLabelExpr()));
   EXPECT_TRUE(matches("void f() { return; }", returnStmt()));
@@ -450,8 +438,9 @@ TEST_P(ASTMatchersTest, CXXOperatorCallExpr) {
   StatementMatcher OpCall = cxxOperatorCallExpr();
   // Unary operator
   EXPECT_TRUE(matches("class Y { }; "
-                        "bool operator!(Y x) { return false; }; "
-                        "Y y; bool c = !y;", OpCall));
+                      "bool operator!(Y x) { return false; }; "
+                      "Y y; bool c = !y;",
+                      OpCall));
   // No match -- special operators like "new", "delete"
   // FIXME: operator new takes size_t, for which we need stddef.h, for which
   // we need to figure out include paths in the test.
@@ -460,12 +449,13 @@ TEST_P(ASTMatchersTest, CXXOperatorCallExpr) {
   //             "void *operator new(size_t size) { return 0; } "
   //             "Y *y = new Y;", OpCall));
   EXPECT_TRUE(notMatches("class Y { }; "
-                           "void operator delete(void *p) { } "
-                           "void a() {Y *y = new Y; delete y;}", OpCall));
+                         "void operator delete(void *p) { } "
+                         "void a() {Y *y = new Y; delete y;}",
+                         OpCall));
   // Binary operator
   EXPECT_TRUE(matches("class Y { }; "
-                        "bool operator&&(Y x, Y y) { return true; }; "
-                        "Y a; Y b; bool c = a && b;",
+                      "bool operator&&(Y x, Y y) { return true; }; "
+                      "Y a; Y b; bool c = a && b;",
                       OpCall));
   // No match -- normal operator, not an overloaded one.
   EXPECT_TRUE(notMatches("bool x = true, y = true; bool t = x && y;", OpCall));
@@ -481,30 +471,25 @@ TEST_P(ASTMatchersTest, ThisPointerType) {
       traverse(ast_type_traits::TK_AsIs,
                cxxMemberCallExpr(thisPointerType(recordDecl(hasName("Y")))));
 
-  EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z() { Y y; y.x(); }",
-            MethodOnY));
-  EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z(Y &y) { y.x(); }",
-            MethodOnY));
-  EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z(Y *&y) { y->x(); }",
-            MethodOnY));
-  EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z(Y y[]) { y->x(); }",
-            MethodOnY));
-  EXPECT_TRUE(
-    matches("class Y { public: void x(); }; void z() { Y *y; y->x(); }",
-            MethodOnY));
-
+  EXPECT_TRUE(matches("class Y { public: void x(); }; void z() { Y y; y.x(); }",
+                      MethodOnY));
+  EXPECT_TRUE(matches("class Y { public: void x(); }; void z(Y &y) { y.x(); }",
+                      MethodOnY));
   EXPECT_TRUE(matches(
-    "class Y {"
-      "  public: virtual void x();"
-      "};"
-      "class X : public Y {"
-      "  public: virtual void x();"
-      "};"
-      "void z() { X *x; x->Y::x(); }", MethodOnY));
+      "class Y { public: void x(); }; void z(Y *&y) { y->x(); }", MethodOnY));
+  EXPECT_TRUE(matches(
+      "class Y { public: void x(); }; void z(Y y[]) { y->x(); }", MethodOnY));
+  EXPECT_TRUE(matches(
+      "class Y { public: void x(); }; void z() { Y *y; y->x(); }", MethodOnY));
+
+  EXPECT_TRUE(matches("class Y {"
+                      "  public: virtual void x();"
+                      "};"
+                      "class X : public Y {"
+                      "  public: virtual void x();"
+                      "};"
+                      "void z() { X *x; x->Y::x(); }",
+                      MethodOnY));
 }
 
 TEST_P(ASTMatchersTest, DeclRefExpr) {
@@ -512,29 +497,27 @@ TEST_P(ASTMatchersTest, DeclRefExpr) {
     // FIXME: Add a test for `declRefExpr()` that does not depend on C++.
     return;
   }
-  StatementMatcher Reference =
-    declRefExpr(to(
-      varDecl(hasInitializer(
-        cxxMemberCallExpr(thisPointerType(recordDecl(hasName("Y"))))))));
+  StatementMatcher Reference = declRefExpr(to(varDecl(hasInitializer(
+      cxxMemberCallExpr(thisPointerType(recordDecl(hasName("Y"))))))));
 
-  EXPECT_TRUE(matches(
-    "class Y {"
-      " public:"
-      "  bool x() const;"
-      "};"
-      "void z(const Y &y) {"
-      "  bool b = y.x();"
-      "  if (b) {}"
-      "}", Reference));
+  EXPECT_TRUE(matches("class Y {"
+                      " public:"
+                      "  bool x() const;"
+                      "};"
+                      "void z(const Y &y) {"
+                      "  bool b = y.x();"
+                      "  if (b) {}"
+                      "}",
+                      Reference));
 
-  EXPECT_TRUE(notMatches(
-    "class Y {"
-      " public:"
-      "  bool x() const;"
-      "};"
-      "void z(const Y &y) {"
-      "  bool b = y.x();"
-      "}", Reference));
+  EXPECT_TRUE(notMatches("class Y {"
+                         " public:"
+                         "  bool x() const;"
+                         "};"
+                         "void z(const Y &y) {"
+                         "  bool b = y.x();"
+                         "}",
+                         Reference));
 }
 
 TEST_P(ASTMatchersTest, CXXMemberCallExpr) {
@@ -542,32 +525,32 @@ TEST_P(ASTMatchersTest, CXXMemberCallExpr) {
     return;
   }
   StatementMatcher CallOnVariableY =
-    cxxMemberCallExpr(on(declRefExpr(to(varDecl(hasName("y"))))));
-
-  EXPECT_TRUE(matches(
-    "class Y { public: void x() { Y y; y.x(); } };", CallOnVariableY));
-  EXPECT_TRUE(matches(
-    "class Y { public: void x() const { Y y; y.x(); } };", CallOnVariableY));
-  EXPECT_TRUE(matches(
-    "class Y { public: void x(); };"
-      "class X : public Y { void z() { X y; y.x(); } };", CallOnVariableY));
-  EXPECT_TRUE(matches(
-    "class Y { public: void x(); };"
-      "class X : public Y { void z() { X *y; y->x(); } };", CallOnVariableY));
+      cxxMemberCallExpr(on(declRefExpr(to(varDecl(hasName("y"))))));
+
+  EXPECT_TRUE(matches("class Y { public: void x() { Y y; y.x(); } };",
+                      CallOnVariableY));
+  EXPECT_TRUE(matches("class Y { public: void x() const { Y y; y.x(); } };",
+                      CallOnVariableY));
+  EXPECT_TRUE(matches("class Y { public: void x(); };"
+                      "class X : public Y { void z() { X y; y.x(); } };",
+                      CallOnVariableY));
+  EXPECT_TRUE(matches("class Y { public: void x(); };"
+                      "class X : public Y { void z() { X *y; y->x(); } };",
+                      CallOnVariableY));
   EXPECT_TRUE(notMatches(
-    "class Y { public: void x(); };"
+      "class Y { public: void x(); };"
       "class X : public Y { void z() { unsigned long y; ((X*)y)->x(); } };",
-    CallOnVariableY));
+      CallOnVariableY));
 }
 
 TEST_P(ASTMatchersTest, UnaryExprOrTypeTraitExpr) {
-  EXPECT_TRUE(matches("void x() { int a = sizeof(a); }",
-                      unaryExprOrTypeTraitExpr()));
+  EXPECT_TRUE(
+      matches("void x() { int a = sizeof(a); }", unaryExprOrTypeTraitExpr()));
 }
 
 TEST_P(ASTMatchersTest, AlignOfExpr) {
-  EXPECT_TRUE(notMatches("void x() { int a = sizeof(a); }",
-                         alignOfExpr(anything())));
+  EXPECT_TRUE(
+      notMatches("void x() { int a = sizeof(a); }", alignOfExpr(anything())));
   // FIXME: Uncomment once alignof is enabled.
   // EXPECT_TRUE(matches("void x() { int a = alignof(a); }",
   //                     unaryExprOrTypeTraitExpr()));
@@ -603,11 +586,10 @@ TEST_P(ASTMatchersTest, MemberExpr_MatchesVariable) {
     return;
   }
   EXPECT_TRUE(
-    matches("class Y { void x() { this->y; } int y; };", memberExpr()));
-  EXPECT_TRUE(
-    matches("class Y { void x() { y; } int y; };", memberExpr()));
+      matches("class Y { void x() { this->y; } int y; };", memberExpr()));
+  EXPECT_TRUE(matches("class Y { void x() { y; } int y; };", memberExpr()));
   EXPECT_TRUE(
-    matches("class Y { void x() { Y y; y.y; } int y; };", memberExpr()));
+      matches("class Y { void x() { Y y; y.y; } int y; };", memberExpr()));
   EXPECT_TRUE(matches("template "
                       "class X : T { void f() { this->T::v; } };",
                       cxxDependentScopeMemberExpr()));
@@ -623,8 +605,8 @@ TEST_P(ASTMatchersTest, MemberExpr_MatchesStaticVariable) {
   }
   EXPECT_TRUE(matches("class Y { void x() { this->y; } static int y; };",
                       memberExpr()));
-  EXPECT_TRUE(notMatches("class Y { void x() { y; } static int y; };",
-                         memberExpr()));
+  EXPECT_TRUE(
+      notMatches("class Y { void x() { y; } static int y; };", memberExpr()));
   EXPECT_TRUE(notMatches("class Y { void x() { Y::y; } static int y; };",
                          memberExpr()));
 }
@@ -658,21 +640,21 @@ TEST_P(ASTMatchersTest, FunctionDecl_CXX) {
   if (!GetParam().hasDelayedTemplateParsing()) {
     // FIXME: Fix this test to work with delayed template parsing.
     // Dependent contexts, but a non-dependent call.
-    EXPECT_TRUE(matches("void f(); template  void g() { f(); }",
-                        CallFunctionF));
     EXPECT_TRUE(
-      matches("void f(); template  struct S { void g() { f(); } };",
-              CallFunctionF));
+        matches("void f(); template  void g() { f(); }", CallFunctionF));
+    EXPECT_TRUE(
+        matches("void f(); template  struct S { void g() { f(); } };",
+                CallFunctionF));
   }
 
   // Depedent calls don't match.
   EXPECT_TRUE(
-    notMatches("void f(int); template  void g(T t) { f(t); }",
-               CallFunctionF));
+      notMatches("void f(int); template  void g(T t) { f(t); }",
+                 CallFunctionF));
   EXPECT_TRUE(
-    notMatches("void f(int);"
+      notMatches("void f(int);"
                  "template  struct S { void g(T t) { f(t); } };",
-               CallFunctionF));
+                 CallFunctionF));
 
   EXPECT_TRUE(matches("void f(...);", functionDecl(isVariadic())));
   EXPECT_TRUE(matches("void f(...);", functionDecl(parameterCountIs(0))));
@@ -692,9 +674,8 @@ TEST_P(ASTMatchersTest,
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(
-    matches("template  void f(T t) {}",
-            functionTemplateDecl(hasName("f"))));
+  EXPECT_TRUE(matches("template  void f(T t) {}",
+                      functionTemplateDecl(hasName("f"))));
 }
 
 TEST_P(ASTMatchersTest, FunctionTemplate_DoesNotMatchFunctionDeclarations) {
@@ -709,12 +690,11 @@ TEST_P(ASTMatchersTest,
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(
-    notMatches("void g(); template  void f(T t) {}"
-                 "template <> void f(int t) { g(); }",
-               functionTemplateDecl(hasName("f"),
-                                    hasDescendant(declRefExpr(to(
-                                      functionDecl(hasName("g"))))))));
+  EXPECT_TRUE(notMatches(
+      "void g(); template  void f(T t) {}"
+      "template <> void f(int t) { g(); }",
+      functionTemplateDecl(hasName("f"), hasDescendant(declRefExpr(to(
+                                             functionDecl(hasName("g"))))))));
 }
 
 TEST_P(ASTMatchersTest, ClassTemplateSpecializationDecl) {
@@ -722,7 +702,7 @@ TEST_P(ASTMatchersTest, ClassTemplateSpecializationDecl) {
     return;
   }
   EXPECT_TRUE(matches("template struct A {};"
-                        "template<> struct A {};",
+                      "template<> struct A {};",
                       classTemplateSpecializationDecl()));
   EXPECT_TRUE(matches("template struct A {}; A a;",
                       classTemplateSpecializationDecl()));
@@ -756,13 +736,11 @@ TEST_P(ASTMatchersTest, Matcher_ConstructorCall) {
       traverse(ast_type_traits::TK_AsIs, cxxConstructExpr());
 
   EXPECT_TRUE(
-    matches("class X { public: X(); }; void x() { X x; }", Constructor));
-  EXPECT_TRUE(
-    matches("class X { public: X(); }; void x() { X x = X(); }",
-            Constructor));
-  EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { X x = 0; }",
-            Constructor));
+      matches("class X { public: X(); }; void x() { X x; }", Constructor));
+  EXPECT_TRUE(matches("class X { public: X(); }; void x() { X x = X(); }",
+                      Constructor));
+  EXPECT_TRUE(matches("class X { public: X(int); }; void x() { X x = 0; }",
+                      Constructor));
   EXPECT_TRUE(matches("class X {}; void x(int) { X x; }", Constructor));
 }
 
@@ -779,9 +757,9 @@ TEST_P(ASTMatchersTest, Matcher_ThisExpr) {
     return;
   }
   EXPECT_TRUE(
-    matches("struct X { int a; int f () { return a; } };", cxxThisExpr()));
+      matches("struct X { int a; int f () { return a; } };", cxxThisExpr()));
   EXPECT_TRUE(
-    notMatches("struct X { int f () { int a; return a; } };", cxxThisExpr()));
+      notMatches("struct X { int f () { int a; return a; } };", cxxThisExpr()));
 }
 
 TEST_P(ASTMatchersTest, Matcher_BindTemporaryExpression) {
@@ -794,30 +772,27 @@ TEST_P(ASTMatchersTest, Matcher_BindTemporaryExpression) {
 
   StringRef ClassString = "class string { public: string(); ~string(); }; ";
 
-  EXPECT_TRUE(
-    matches(ClassString +
-              "string GetStringByValue();"
-                "void FunctionTakesString(string s);"
-                "void run() { FunctionTakesString(GetStringByValue()); }",
-            TempExpression));
+  EXPECT_TRUE(matches(
+      ClassString + "string GetStringByValue();"
+                    "void FunctionTakesString(string s);"
+                    "void run() { FunctionTakesString(GetStringByValue()); }",
+      TempExpression));
 
-  EXPECT_TRUE(
-    notMatches(ClassString +
-                 "string* GetStringPointer(); "
-                   "void FunctionTakesStringPtr(string* s);"
-                   "void run() {"
-                   "  string* s = GetStringPointer();"
-                   "  FunctionTakesStringPtr(GetStringPointer());"
-                   "  FunctionTakesStringPtr(s);"
-                   "}",
-               TempExpression));
+  EXPECT_TRUE(notMatches(ClassString +
+                             "string* GetStringPointer(); "
+                             "void FunctionTakesStringPtr(string* s);"
+                             "void run() {"
+                             "  string* s = GetStringPointer();"
+                             "  FunctionTakesStringPtr(GetStringPointer());"
+                             "  FunctionTakesStringPtr(s);"
+                             "}",
+                         TempExpression));
 
-  EXPECT_TRUE(
-    notMatches("class no_dtor {};"
-                 "no_dtor GetObjByValue();"
-                 "void ConsumeObj(no_dtor param);"
-                 "void run() { ConsumeObj(GetObjByValue()); }",
-               TempExpression));
+  EXPECT_TRUE(notMatches("class no_dtor {};"
+                         "no_dtor GetObjByValue();"
+                         "void ConsumeObj(no_dtor param);"
+                         "void run() { ConsumeObj(GetObjByValue()); }",
+                         TempExpression));
 }
 
 TEST_P(ASTMatchersTest, MaterializeTemporaryExpr_MatchesTemporaryCXX11CXX14) {
@@ -872,10 +847,9 @@ TEST_P(ASTMatchersTest, Matcher_NewExpression) {
   StatementMatcher New = cxxNewExpr();
 
   EXPECT_TRUE(matches("class X { public: X(); }; void x() { new X; }", New));
+  EXPECT_TRUE(matches("class X { public: X(); }; void x() { new X(); }", New));
   EXPECT_TRUE(
-    matches("class X { public: X(); }; void x() { new X(); }", New));
-  EXPECT_TRUE(
-    matches("class X { public: X(int); }; void x() { new X(0); }", New));
+      matches("class X { public: X(int); }; void x() { new X(0); }", New));
   EXPECT_TRUE(matches("class X {}; void x(int) { new X; }", New));
 }
 
@@ -883,8 +857,8 @@ TEST_P(ASTMatchersTest, Matcher_DeleteExpression) {
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(matches("struct A {}; void f(A* a) { delete a; }",
-                      cxxDeleteExpr()));
+  EXPECT_TRUE(
+      matches("struct A {}; void f(A* a) { delete a; }", cxxDeleteExpr()));
 }
 
 TEST_P(ASTMatchersTest, Matcher_NoexceptExpression) {
@@ -907,7 +881,7 @@ TEST_P(ASTMatchersTest, Matcher_DefaultArgument) {
   StatementMatcher Arg = cxxDefaultArgExpr();
   EXPECT_TRUE(matches("void x(int, int = 0) { int y; x(y); }", Arg));
   EXPECT_TRUE(
-    matches("class X { void x(int, int = 0) { int y; x(y); } };", Arg));
+      matches("class X { void x(int, int = 0) { int y; x(y); } };", Arg));
   EXPECT_TRUE(notMatches("void x(int, int = 0) { int y; x(y, 0); }", Arg));
 }
 
@@ -951,7 +925,7 @@ TEST_P(ASTMatchersTest, IntegerLiteral) {
 
   // Non-matching cases (character literals, float and double)
   EXPECT_TRUE(notMatches("int i = L'a';",
-                         HasIntLiteral));  // this is actually a character
+                         HasIntLiteral)); // this is actually a character
   // literal cast to int
   EXPECT_TRUE(notMatches("int i = 'a';", HasIntLiteral));
   EXPECT_TRUE(notMatches("int i = 1e10;", HasIntLiteral));
@@ -974,13 +948,13 @@ TEST_P(ASTMatchersTest, FloatLiteral) {
   EXPECT_TRUE(matches("double i = 5.0;", floatLiteral(equals(5.0))));
   EXPECT_TRUE(matches("double i = 5.0;", floatLiteral(equals(5.0f))));
   EXPECT_TRUE(
-    matches("double i = 5.0;", floatLiteral(equals(llvm::APFloat(5.0)))));
+      matches("double i = 5.0;", floatLiteral(equals(llvm::APFloat(5.0)))));
 
   EXPECT_TRUE(notMatches("float i = 10;", HasFloatLiteral));
   EXPECT_TRUE(notMatches("double i = 5.0;", floatLiteral(equals(6.0))));
   EXPECT_TRUE(notMatches("double i = 5.0;", floatLiteral(equals(6.0f))));
   EXPECT_TRUE(
-    notMatches("double i = 5.0;", floatLiteral(equals(llvm::APFloat(6.0)))));
+      notMatches("double i = 5.0;", floatLiteral(equals(llvm::APFloat(6.0)))));
 }
 
 TEST_P(ASTMatchersTest, CXXNullPtrLiteralExpr) {
@@ -1051,9 +1025,9 @@ TEST_P(ASTMatchersTest, ParenListExpr) {
     return;
   }
   EXPECT_TRUE(
-    matches("template class foo { void bar() { foo X(*this); } };"
+      matches("template class foo { void bar() { foo X(*this); } };"
               "template class foo;",
-            varDecl(hasInitializer(parenListExpr(has(unaryOperator()))))));
+              varDecl(hasInitializer(parenListExpr(has(unaryOperator()))))));
 }
 
 TEST_P(ASTMatchersTest, StmtExpr) {
@@ -1064,9 +1038,8 @@ TEST_P(ASTMatchersTest, StmtExpr) {
 TEST_P(ASTMatchersTest, PredefinedExpr) {
   // __func__ expands as StringLiteral("foo")
   EXPECT_TRUE(matches("void foo() { __func__; }",
-                      predefinedExpr(
-                        hasType(asString("const char [4]")),
-                        has(stringLiteral()))));
+                      predefinedExpr(hasType(asString("const char [4]")),
+                                     has(stringLiteral()))));
 }
 
 TEST_P(ASTMatchersTest, AsmStatement) {
@@ -1080,7 +1053,7 @@ TEST_P(ASTMatchersTest, HasCondition) {
   }
 
   StatementMatcher Condition =
-    ifStmt(hasCondition(cxxBoolLiteral(equals(true))));
+      ifStmt(hasCondition(cxxBoolLiteral(equals(true))));
 
   EXPECT_TRUE(matches("void x() { if (true) {} }", Condition));
   EXPECT_TRUE(notMatches("void x() { if (false) {} }", Condition));
@@ -1096,24 +1069,24 @@ TEST_P(ASTMatchersTest, ConditionalOperator) {
     return;
   }
 
-  StatementMatcher Conditional = conditionalOperator(
-    hasCondition(cxxBoolLiteral(equals(true))),
-    hasTrueExpression(cxxBoolLiteral(equals(false))));
+  StatementMatcher Conditional =
+      conditionalOperator(hasCondition(cxxBoolLiteral(equals(true))),
+                          hasTrueExpression(cxxBoolLiteral(equals(false))));
 
   EXPECT_TRUE(matches("void x() { true ? false : true; }", Conditional));
   EXPECT_TRUE(notMatches("void x() { false ? false : true; }", Conditional));
   EXPECT_TRUE(notMatches("void x() { true ? true : false; }", Conditional));
 
-  StatementMatcher ConditionalFalse = conditionalOperator(
-    hasFalseExpression(cxxBoolLiteral(equals(false))));
+  StatementMatcher ConditionalFalse =
+      conditionalOperator(hasFalseExpression(cxxBoolLiteral(equals(false))));
 
   EXPECT_TRUE(matches("void x() { true ? true : false; }", ConditionalFalse));
   EXPECT_TRUE(
-    notMatches("void x() { true ? false : true; }", ConditionalFalse));
+      notMatches("void x() { true ? false : true; }", ConditionalFalse));
 
   EXPECT_TRUE(matches("void x() { true ? true : false; }", ConditionalFalse));
   EXPECT_TRUE(
-    notMatches("void x() { true ? false : true; }", ConditionalFalse));
+      notMatches("void x() { true ? false : true; }", ConditionalFalse));
 }
 
 TEST_P(ASTMatchersTest, BinaryConditionalOperator) {
@@ -1132,18 +1105,17 @@ TEST_P(ASTMatchersTest, BinaryConditionalOperator) {
   EXPECT_TRUE(matches("void x() { 1 ?: 0; }", AlwaysOne));
 
   StatementMatcher FourNotFive = binaryConditionalOperator(
-    hasTrueExpression(opaqueValueExpr(
-      hasSourceExpression((integerLiteral(equals(4)))))),
-    hasFalseExpression(integerLiteral(equals(5))));
+      hasTrueExpression(
+          opaqueValueExpr(hasSourceExpression((integerLiteral(equals(4)))))),
+      hasFalseExpression(integerLiteral(equals(5))));
 
   EXPECT_TRUE(matches("void x() { 4 ?: 5; }", FourNotFive));
 }
 
 TEST_P(ASTMatchersTest, ArraySubscriptExpr) {
-  EXPECT_TRUE(matches("int i[2]; void f() { i[1] = 1; }",
-                      arraySubscriptExpr()));
-  EXPECT_TRUE(notMatches("int i; void f() { i = 1; }",
-                         arraySubscriptExpr()));
+  EXPECT_TRUE(
+      matches("int i[2]; void f() { i[1] = 1; }", arraySubscriptExpr()));
+  EXPECT_TRUE(notMatches("int i; void f() { i = 1; }", arraySubscriptExpr()));
 }
 
 TEST_P(ASTMatchersTest, ForStmt) {
@@ -1178,10 +1150,9 @@ TEST_P(ASTMatchersTest, CompoundStatement_DoesNotMatchEmptyStruct) {
   }
   // It's not a compound statement just because there's "{}" in the source
   // text. This is an AST search, not grep.
-  EXPECT_TRUE(notMatches("namespace n { struct S {}; }",
-                         compoundStmt()));
-  EXPECT_TRUE(matches("namespace n { struct S { void f() {{}} }; }",
-                      compoundStmt()));
+  EXPECT_TRUE(notMatches("namespace n { struct S {}; }", compoundStmt()));
+  EXPECT_TRUE(
+      matches("namespace n { struct S { void f() {{}} }; }", compoundStmt()));
 }
 
 TEST_P(ASTMatchersTest, CastExpr_MatchesExplicitCasts) {
@@ -1242,8 +1213,8 @@ TEST_P(ASTMatchersTest, CXXReinterpretCastExpr_DoesNotMatchOtherCasts) {
   EXPECT_TRUE(notMatches("void* p = static_cast(&p);",
                          cxxReinterpretCastExpr()));
   EXPECT_TRUE(notMatches("struct B { virtual ~B() {} }; struct D : B {};"
-                           "B b;"
-                           "D* p = dynamic_cast<D*>(&b);",
+                         "B b;"
+                         "D* p = dynamic_cast<D*>(&b);",
                          cxxReinterpretCastExpr()));
 }
 
@@ -1262,11 +1233,10 @@ TEST_P(ASTMatchersTest, CXXFunctionalCastExpr_DoesNotMatchOtherCasts) {
   }
   StringRef FooClass = "class Foo { public: Foo(const char*); };";
   EXPECT_TRUE(
-    notMatches(FooClass + "void r() { Foo f = (Foo) \"hello world\"; }",
-               cxxFunctionalCastExpr()));
-  EXPECT_TRUE(
-    notMatches(FooClass + "void r() { Foo f = \"hello world\"; }",
-               cxxFunctionalCastExpr()));
+      notMatches(FooClass + "void r() { Foo f = (Foo) \"hello world\"; }",
+                 cxxFunctionalCastExpr()));
+  EXPECT_TRUE(notMatches(FooClass + "void r() { Foo f = \"hello world\"; }",
+                         cxxFunctionalCastExpr()));
 }
 
 TEST_P(ASTMatchersTest, CXXDynamicCastExpr) {
@@ -1274,8 +1244,8 @@ TEST_P(ASTMatchersTest, CXXDynamicCastExpr) {
     return;
   }
   EXPECT_TRUE(matches("struct B { virtual ~B() {} }; struct D : B {};"
-                        "B b;"
-                        "D* p = dynamic_cast<D*>(&b);",
+                      "B b;"
+                      "D* p = dynamic_cast<D*>(&b);",
                       cxxDynamicCastExpr()));
 }
 
@@ -1283,8 +1253,7 @@ TEST_P(ASTMatchersTest, CXXStaticCastExpr_MatchesSimpleCase) {
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(matches("void* p(static_cast(&p));",
-                      cxxStaticCastExpr()));
+  EXPECT_TRUE(matches("void* p(static_cast(&p));", cxxStaticCastExpr()));
 }
 
 TEST_P(ASTMatchersTest, CXXStaticCastExpr_DoesNotMatchOtherCasts) {
@@ -1292,13 +1261,13 @@ TEST_P(ASTMatchersTest, CXXStaticCastExpr_DoesNotMatchOtherCasts) {
     return;
   }
   EXPECT_TRUE(notMatches("char* p = (char*)(&p);", cxxStaticCastExpr()));
-  EXPECT_TRUE(notMatches("char q, *p = const_cast<char*>(&q);",
-                         cxxStaticCastExpr()));
+  EXPECT_TRUE(
+      notMatches("char q, *p = const_cast<char*>(&q);", cxxStaticCastExpr()));
   EXPECT_TRUE(notMatches("void* p = reinterpret_cast(&p);",
                          cxxStaticCastExpr()));
   EXPECT_TRUE(notMatches("struct B { virtual ~B() {} }; struct D : B {};"
-                           "B b;"
-                           "D* p = dynamic_cast<D*>(&b);",
+                         "B b;"
+                         "D* p = dynamic_cast<D*>(&b);",
                          cxxStaticCastExpr()));
 }
 
@@ -1311,11 +1280,11 @@ TEST_P(ASTMatchersTest, CStyleCastExpr_DoesNotMatchOtherCasts) {
     return;
   }
   EXPECT_TRUE(notMatches("char* p = static_cast<char*>(0);"
-                           "char q, *r = const_cast<char*>(&q);"
-                           "void* s = reinterpret_cast(&s);"
-                           "struct B { virtual ~B() {} }; struct D : B {};"
-                           "B b;"
-                           "D* t = dynamic_cast<D*>(&b);",
+                         "char q, *r = const_cast<char*>(&q);"
+                         "void* s = reinterpret_cast(&s);"
+                         "struct B { virtual ~B() {} }; struct D : B {};"
+                         "B b;"
+                         "D* t = dynamic_cast<D*>(&b);",
                          cStyleCastExpr()));
 }
 
@@ -1335,12 +1304,12 @@ TEST_P(ASTMatchersTest, ImplicitCastExpr_MatchesSimpleCase) {
 }
 
 TEST_P(ASTMatchersTest, ImplicitCastExpr_DoesNotMatchIncorrectly) {
-  // This test verifies that implicitCastExpr() matches exactly when implicit casts
-  // are present, and that it ignores explicit and paren casts.
+  // This test verifies that implicitCastExpr() matches exactly when implicit
+  // casts are present, and that it ignores explicit and paren casts.
 
   // These two test cases have no casts.
-  EXPECT_TRUE(notMatches("int x = 0;",
-                         varDecl(hasInitializer(implicitCastExpr()))));
+  EXPECT_TRUE(
+      notMatches("int x = 0;", varDecl(hasInitializer(implicitCastExpr()))));
   EXPECT_TRUE(
       notMatches("int x = (0);", varDecl(hasInitializer(implicitCastExpr()))));
   EXPECT_TRUE(notMatches("void f() { int x = 0; double d = (double) x; }",
@@ -1393,7 +1362,7 @@ TEST_P(ASTMatchersTest, InitListExpr) {
   EXPECT_TRUE(matches("struct B { int x, y; }; struct B b = { 5, 6 };",
                       initListExpr(hasType(recordDecl(hasName("B"))))));
   EXPECT_TRUE(
-    matches("int i[1] = {42, [0] = 43};", integerLiteral(equals(42))));
+      matches("int i[1] = {42, [0] = 43};", integerLiteral(equals(42))));
 }
 
 TEST_P(ASTMatchersTest, InitListExpr_CXX) {
@@ -1441,8 +1410,7 @@ TEST_P(ASTMatchersTest, UsingDecl_MatchesUsingDeclarations) {
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(matches("namespace X { int x; } using X::x;",
-                      usingDecl()));
+  EXPECT_TRUE(matches("namespace X { int x; } using X::x;", usingDecl()));
 }
 
 TEST_P(ASTMatchersTest, UsingDecl_MatchesShadowUsingDelcarations) {
@@ -1460,7 +1428,7 @@ TEST_P(ASTMatchersTest, UsingDirectiveDecl_MatchesUsingNamespace) {
   EXPECT_TRUE(matches("namespace X { int x; } using namespace X;",
                       usingDirectiveDecl()));
   EXPECT_FALSE(
-    matches("namespace X { int x; } using X::x;", usingDirectiveDecl()));
+      matches("namespace X { int x; } using X::x;", usingDirectiveDecl()));
 }
 
 TEST_P(ASTMatchersTest, WhileStmt) {
@@ -1499,11 +1467,11 @@ TEST_P(ASTMatchersTest, CxxExceptionHandling_SimpleCases) {
   EXPECT_TRUE(matches("void foo() try { } catch(int X) { }", cxxCatchStmt()));
   EXPECT_TRUE(matches("void foo() try { } catch(int X) { }", cxxTryStmt()));
   EXPECT_TRUE(
-    notMatches("void foo() try { } catch(int X) { }", cxxThrowExpr()));
-  EXPECT_TRUE(matches("void foo() try { throw; } catch(int X) { }",
-                      cxxThrowExpr()));
-  EXPECT_TRUE(matches("void foo() try { throw 5;} catch(int X) { }",
-                      cxxThrowExpr()));
+      notMatches("void foo() try { } catch(int X) { }", cxxThrowExpr()));
+  EXPECT_TRUE(
+      matches("void foo() try { throw; } catch(int X) { }", cxxThrowExpr()));
+  EXPECT_TRUE(
+      matches("void foo() try { throw 5;} catch(int X) { }", cxxThrowExpr()));
   EXPECT_TRUE(matches("void foo() try { throw; } catch(...) { }",
                       cxxCatchStmt(isCatchAll())));
   EXPECT_TRUE(notMatches("void foo() try { throw; } catch(int) { }",
@@ -1542,9 +1510,8 @@ TEST_P(ASTMatchersTest, QualType) {
 
 TEST_P(ASTMatchersTest, ConstantArrayType) {
   EXPECT_TRUE(matches("int a[2];", constantArrayType()));
-  EXPECT_TRUE(notMatches(
-    "void f() { int a[] = { 2, 3 }; int b[a[0]]; }",
-    constantArrayType(hasElementType(builtinType()))));
+  EXPECT_TRUE(notMatches("void f() { int a[] = { 2, 3 }; int b[a[0]]; }",
+                         constantArrayType(hasElementType(builtinType()))));
 
   EXPECT_TRUE(matches("int a[42];", constantArrayType(hasSize(42))));
   EXPECT_TRUE(matches("int b[2*21];", constantArrayType(hasSize(42))));
@@ -1555,12 +1522,12 @@ TEST_P(ASTMatchersTest, DependentSizedArrayType) {
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(matches(
-    "template  class array { T data[Size]; };",
-    dependentSizedArrayType()));
-  EXPECT_TRUE(notMatches(
-    "int a[42]; int b[] = { 2, 3 }; void f() { int c[b[0]]; }",
-    dependentSizedArrayType()));
+  EXPECT_TRUE(
+      matches("template  class array { T data[Size]; };",
+              dependentSizedArrayType()));
+  EXPECT_TRUE(
+      notMatches("int a[42]; int b[] = { 2, 3 }; void f() { int c[b[0]]; }",
+                 dependentSizedArrayType()));
 }
 
 TEST_P(ASTMatchersTest, IncompleteArrayType) {
@@ -1575,22 +1542,21 @@ TEST_P(ASTMatchersTest, VariableArrayType) {
   EXPECT_TRUE(matches("void f(int b) { int a[b]; }", variableArrayType()));
   EXPECT_TRUE(notMatches("int a[] = {2, 3}; int b[42];", variableArrayType()));
 
-  EXPECT_TRUE(matches(
-    "void f(int b) { int a[b]; }",
-    variableArrayType(hasSizeExpr(ignoringImpCasts(declRefExpr(to(
-      varDecl(hasName("b")))))))));
+  EXPECT_TRUE(matches("void f(int b) { int a[b]; }",
+                      variableArrayType(hasSizeExpr(ignoringImpCasts(
+                          declRefExpr(to(varDecl(hasName("b")))))))));
 }
 
 TEST_P(ASTMatchersTest, AtomicType) {
   if (llvm::Triple(llvm::sys::getDefaultTargetTriple()).getOS() !=
-    llvm::Triple::Win32) {
+      llvm::Triple::Win32) {
     // FIXME: Make this work for MSVC.
     EXPECT_TRUE(matches("_Atomic(int) i;", atomicType()));
 
-    EXPECT_TRUE(matches("_Atomic(int) i;",
-                        atomicType(hasValueType(isInteger()))));
-    EXPECT_TRUE(notMatches("_Atomic(float) f;",
-                           atomicType(hasValueType(isInteger()))));
+    EXPECT_TRUE(
+        matches("_Atomic(int) i;", atomicType(hasValueType(isInteger()))));
+    EXPECT_TRUE(
+        notMatches("_Atomic(float) f;", atomicType(hasValueType(isInteger()))));
   }
 }
 
@@ -1608,9 +1574,9 @@ TEST_P(ASTMatchersTest, AutoType) {
 
   // FIXME: Matching against the type-as-written can't work here, because the
   //        type as written was not deduced.
-  //EXPECT_TRUE(matches("auto a = 1;",
+  // EXPECT_TRUE(matches("auto a = 1;",
   //                    autoType(hasDeducedType(isInteger()))));
-  //EXPECT_TRUE(notMatches("auto b = 2.0;",
+  // EXPECT_TRUE(notMatches("auto b = 2.0;",
   //                       autoType(hasDeducedType(isInteger()))));
 }
 
@@ -1657,48 +1623,43 @@ TEST_P(ASTMatchersTest, FunctionProtoType_CXX) {
 
 TEST_P(ASTMatchersTest, ParenType) {
   EXPECT_TRUE(
-    matches("int (*array)[4];", varDecl(hasType(pointsTo(parenType())))));
+      matches("int (*array)[4];", varDecl(hasType(pointsTo(parenType())))));
   EXPECT_TRUE(notMatches("int *array[4];", varDecl(hasType(parenType()))));
 
   EXPECT_TRUE(matches(
-    "int (*ptr_to_func)(int);",
-    varDecl(hasType(pointsTo(parenType(innerType(functionType())))))));
+      "int (*ptr_to_func)(int);",
+      varDecl(hasType(pointsTo(parenType(innerType(functionType())))))));
   EXPECT_TRUE(notMatches(
-    "int (*ptr_to_array)[4];",
-    varDecl(hasType(pointsTo(parenType(innerType(functionType())))))));
+      "int (*ptr_to_array)[4];",
+      varDecl(hasType(pointsTo(parenType(innerType(functionType())))))));
 }
 
 TEST_P(ASTMatchersTest, PointerType) {
   // FIXME: Reactive when these tests can be more specific (not matching
   // implicit code on certain platforms), likely when we have hasDescendant for
   // Types/TypeLocs.
-  //EXPECT_TRUE(matchAndVerifyResultTrue(
+  // EXPECT_TRUE(matchAndVerifyResultTrue(
   //    "int* a;",
   //    pointerTypeLoc(pointeeLoc(typeLoc().bind("loc"))),
   //    std::make_unique>("loc", 1)));
-  //EXPECT_TRUE(matchAndVerifyResultTrue(
+  // EXPECT_TRUE(matchAndVerifyResultTrue(
   //    "int* a;",
   //    pointerTypeLoc().bind("loc"),
   //    std::make_unique>("loc", 1)));
-  EXPECT_TRUE(matches(
-    "int** a;",
-    loc(pointerType(pointee(qualType())))));
-  EXPECT_TRUE(matches(
-    "int** a;",
-    loc(pointerType(pointee(pointerType())))));
-  EXPECT_TRUE(matches(
-    "int* b; int* * const a = &b;",
-    loc(qualType(isConstQualified(), pointerType()))));
+  EXPECT_TRUE(matches("int** a;", loc(pointerType(pointee(qualType())))));
+  EXPECT_TRUE(matches("int** a;", loc(pointerType(pointee(pointerType())))));
+  EXPECT_TRUE(matches("int* b; int* * const a = &b;",
+                      loc(qualType(isConstQualified(), pointerType()))));
 
   StringRef Fragment = "int *ptr;";
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("ptr"),
-                                           hasType(blockPointerType()))));
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("ptr"),
-                                           hasType(memberPointerType()))));
-  EXPECT_TRUE(matches(Fragment, varDecl(hasName("ptr"),
-                                        hasType(pointerType()))));
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("ptr"),
-                                           hasType(referenceType()))));
+  EXPECT_TRUE(notMatches(Fragment,
+                         varDecl(hasName("ptr"), hasType(blockPointerType()))));
+  EXPECT_TRUE(notMatches(
+      Fragment, varDecl(hasName("ptr"), hasType(memberPointerType()))));
+  EXPECT_TRUE(
+      matches(Fragment, varDecl(hasName("ptr"), hasType(pointerType()))));
+  EXPECT_TRUE(
+      notMatches(Fragment, varDecl(hasName("ptr"), hasType(referenceType()))));
 }
 
 TEST_P(ASTMatchersTest, PointerType_CXX) {
@@ -1763,28 +1724,28 @@ TEST_P(ASTMatchersTest, AutoRefTypes) {
                        "auto &c = a;"
                        "auto &&d = c;"
                        "auto &&e = 2;";
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("a"),
-                                           hasType(referenceType()))));
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("b"),
-                                           hasType(referenceType()))));
-  EXPECT_TRUE(matches(Fragment, varDecl(hasName("c"),
-                                        hasType(referenceType()))));
-  EXPECT_TRUE(matches(Fragment, varDecl(hasName("c"),
-                                        hasType(lValueReferenceType()))));
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("c"),
-                                           hasType(rValueReferenceType()))));
-  EXPECT_TRUE(matches(Fragment, varDecl(hasName("d"),
-                                        hasType(referenceType()))));
-  EXPECT_TRUE(matches(Fragment, varDecl(hasName("d"),
-                                        hasType(lValueReferenceType()))));
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("d"),
-                                           hasType(rValueReferenceType()))));
-  EXPECT_TRUE(matches(Fragment, varDecl(hasName("e"),
-                                        hasType(referenceType()))));
-  EXPECT_TRUE(notMatches(Fragment, varDecl(hasName("e"),
-                                           hasType(lValueReferenceType()))));
-  EXPECT_TRUE(matches(Fragment, varDecl(hasName("e"),
-                                        hasType(rValueReferenceType()))));
+  EXPECT_TRUE(
+      notMatches(Fragment, varDecl(hasName("a"), hasType(referenceType()))));
+  EXPECT_TRUE(
+      notMatches(Fragment, varDecl(hasName("b"), hasType(referenceType()))));
+  EXPECT_TRUE(
+      matches(Fragment, varDecl(hasName("c"), hasType(referenceType()))));
+  EXPECT_TRUE(
+      matches(Fragment, varDecl(hasName("c"), hasType(lValueReferenceType()))));
+  EXPECT_TRUE(notMatches(
+      Fragment, varDecl(hasName("c"), hasType(rValueReferenceType()))));
+  EXPECT_TRUE(
+      matches(Fragment, varDecl(hasName("d"), hasType(referenceType()))));
+  EXPECT_TRUE(
+      matches(Fragment, varDecl(hasName("d"), hasType(lValueReferenceType()))));
+  EXPECT_TRUE(notMatches(
+      Fragment, varDecl(hasName("d"), hasType(rValueReferenceType()))));
+  EXPECT_TRUE(
+      matches(Fragment, varDecl(hasName("e"), hasType(referenceType()))));
+  EXPECT_TRUE(notMatches(
+      Fragment, varDecl(hasName("e"), hasType(lValueReferenceType()))));
+  EXPECT_TRUE(
+      matches(Fragment, varDecl(hasName("e"), hasType(rValueReferenceType()))));
 }
 
 TEST_P(ASTMatchersTest, EnumType) {
@@ -1796,34 +1757,29 @@ TEST_P(ASTMatchersTest, EnumType_CXX) {
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(matches("enum Color { Green }; Color color;",
-                      loc(enumType())));
+  EXPECT_TRUE(matches("enum Color { Green }; Color color;", loc(enumType())));
 }
 
 TEST_P(ASTMatchersTest, EnumType_CXX11) {
   if (!GetParam().isCXX11OrLater()) {
     return;
   }
-  EXPECT_TRUE(matches("enum class Color { Green }; Color color;",
-                      loc(enumType())));
+  EXPECT_TRUE(
+      matches("enum class Color { Green }; Color color;", loc(enumType())));
 }
 
 TEST_P(ASTMatchersTest, PointerType_MatchesPointersToConstTypes) {
-  EXPECT_TRUE(matches("int b; int * const a = &b;",
-                      loc(pointerType())));
-  EXPECT_TRUE(matches("int b; int * const a = &b;",
-                      loc(pointerType())));
-  EXPECT_TRUE(matches(
-    "int b; const int * a = &b;",
-    loc(pointerType(pointee(builtinType())))));
-  EXPECT_TRUE(matches(
-    "int b; const int * a = &b;",
-    pointerType(pointee(builtinType()))));
+  EXPECT_TRUE(matches("int b; int * const a = &b;", loc(pointerType())));
+  EXPECT_TRUE(matches("int b; int * const a = &b;", loc(pointerType())));
+  EXPECT_TRUE(matches("int b; const int * a = &b;",
+                      loc(pointerType(pointee(builtinType())))));
+  EXPECT_TRUE(matches("int b; const int * a = &b;",
+                      pointerType(pointee(builtinType()))));
 }
 
 TEST_P(ASTMatchersTest, TypedefType) {
-  EXPECT_TRUE(matches("typedef int X; X a;", varDecl(hasName("a"),
-                                                     hasType(typedefType()))));
+  EXPECT_TRUE(matches("typedef int X; X a;",
+                      varDecl(hasName("a"), hasType(typedefType()))));
 }
 
 TEST_P(ASTMatchersTest, TemplateSpecializationType) {
@@ -1864,13 +1820,13 @@ TEST_P(ASTMatchersTest, ElaboratedType) {
     // FIXME: Add a test for `elaboratedType()` that does not depend on C++.
     return;
   }
-  EXPECT_TRUE(matches(
-    "namespace N {"
-      "  namespace M {"
-      "    class D {};"
-      "  }"
-      "}"
-      "N::M::D d;", elaboratedType()));
+  EXPECT_TRUE(matches("namespace N {"
+                      "  namespace M {"
+                      "    class D {};"
+                      "  }"
+                      "}"
+                      "N::M::D d;",
+                      elaboratedType()));
   EXPECT_TRUE(matches("class C {} c;", elaboratedType()));
   EXPECT_TRUE(notMatches("class C {}; C c;", elaboratedType()));
 }
@@ -1885,30 +1841,29 @@ TEST_P(ASTMatchersTest, SubstTemplateTypeParmType) {
                    "}"
                    "int i = F();";
   EXPECT_FALSE(matches(code, binaryOperator(hasLHS(
-    expr(hasType(substTemplateTypeParmType()))))));
+                                 expr(hasType(substTemplateTypeParmType()))))));
   EXPECT_TRUE(matches(code, binaryOperator(hasRHS(
-    expr(hasType(substTemplateTypeParmType()))))));
+                                expr(hasType(substTemplateTypeParmType()))))));
 }
 
 TEST_P(ASTMatchersTest, NestedNameSpecifier) {
   if (!GetParam().isCXX()) {
     return;
   }
-  EXPECT_TRUE(matches("namespace ns { struct A {}; } ns::A a;",
-                      nestedNameSpecifier()));
+  EXPECT_TRUE(
+      matches("namespace ns { struct A {}; } ns::A a;", nestedNameSpecifier()));
   EXPECT_TRUE(matches("template <typename T> class A { typename T::B b; };",
                       nestedNameSpecifier()));
-  EXPECT_TRUE(matches("struct A { void f(); }; void A::f() {}",
-                      nestedNameSpecifier()));
+  EXPECT_TRUE(
+      matches("struct A { void f(); }; void A::f() {}", nestedNameSpecifier()));
   EXPECT_TRUE(matches("namespace a { namespace b {} } namespace ab = a::b;",
                       nestedNameSpecifier()));
 
-  EXPECT_TRUE(matches(
-    "struct A { static void f() {} }; void g() { A::f(); }",
-    nestedNameSpecifier()));
-  EXPECT_TRUE(notMatches(
-    "struct A { static void f() {} }; void g(A* a) { a->f(); }",
-    nestedNameSpecifier()));
+  EXPECT_TRUE(matches("struct A { static void f() {} }; void g() { A::f(); }",
+                      nestedNameSpecifier()));
+  EXPECT_TRUE(
+      notMatches("struct A { static void f() {} }; void g(A* a) { a->f(); }",
+                 nestedNameSpecifier()));
 }
 
 TEST_P(ASTMatchersTest, NullStmt) {
@@ -1929,10 +1884,10 @@ TEST_P(ASTMatchersTest, NestedNameSpecifier_MatchesTypes) {
     return;
   }
   NestedNameSpecifierMatcher Matcher = nestedNameSpecifier(
-    specifiesType(hasDeclaration(recordDecl(hasName("A")))));
+      specifiesType(hasDeclaration(recordDecl(hasName("A")))));
   EXPECT_TRUE(matches("struct A { struct B {}; }; A::B b;", Matcher));
-  EXPECT_TRUE(matches("struct A { struct B { struct C {}; }; }; A::B::C c;",
-                      Matcher));
+  EXPECT_TRUE(
+      matches("struct A { struct B { struct C {}; }; }; A::B::C c;", Matcher));
   EXPECT_TRUE(notMatches("namespace A { struct B {}; } A::B b;", Matcher));
 }
 
@@ -1940,8 +1895,8 @@ TEST_P(ASTMatchersTest, NestedNameSpecifier_MatchesNamespaceDecls) {
   if (!GetParam().isCXX()) {
     return;
   }
-  NestedNameSpecifierMatcher Matcher = nestedNameSpecifier(
-    specifiesNamespace(hasName("ns")));
+  NestedNameSpecifierMatcher Matcher =
+      nestedNameSpecifier(specifiesNamespace(hasName("ns")));
   EXPECT_TRUE(matches("namespace ns { struct A {}; } ns::A a;", Matcher));
   EXPECT_TRUE(notMatches("namespace xx { struct A {}; } xx::A a;", Matcher));
   EXPECT_TRUE(notMatches("struct ns { struct A {}; }; ns::A a;", Matcher));
@@ -1953,16 +1908,15 @@ TEST_P(ASTMatchersTest,
     return;
   }
   EXPECT_TRUE(matches(
-    "struct A { struct B { struct C {}; }; }; A::B::C c;",
-    nestedNameSpecifier(hasPrefix(specifiesType(asString("struct A"))))));
-  EXPECT_TRUE(matches(
-    "struct A { struct B { struct C {}; }; }; A::B::C c;",
-    nestedNameSpecifierLoc(hasPrefix(
-      specifiesTypeLoc(loc(qualType(asString("struct A"))))))));
+      "struct A { struct B { struct C {}; }; }; A::B::C c;",
+      nestedNameSpecifier(hasPrefix(specifiesType(asString("struct A"))))));
+  EXPECT_TRUE(matches("struct A { struct B { struct C {}; }; }; A::B::C c;",
+                      nestedNameSpecifierLoc(hasPrefix(specifiesTypeLoc(
+                          loc(qualType(asString("struct A"))))))));
   EXPECT_TRUE(matches(
-    "namespace N { struct A { struct B { struct C {}; }; }; } N::A::B::C c;",
-    nestedNameSpecifierLoc(hasPrefix(
-      specifiesTypeLoc(loc(qualType(asString("struct N::A"))))))));
+      "namespace N { struct A { struct B { struct C {}; }; }; } N::A::B::C c;",
+      nestedNameSpecifierLoc(hasPrefix(
+          specifiesTypeLoc(loc(qualType(asString("struct N::A"))))))));
 }
 
 template <typename T>
@@ -1980,18 +1934,18 @@ class VerifyAncestorHasChildIsEqual : public BoundNodesCallback {
     // to equalsNode.
     const T *TypedNode = cast<T>(Node);
     return selectFirst(
-      "", match(stmt(hasParent(
-        stmt(has(stmt(equalsNode(TypedNode)))).bind(""))),
-                *Node, Context)) != nullptr;
+               "", match(stmt(hasParent(
+                             stmt(has(stmt(equalsNode(TypedNode)))).bind(""))),
+                         *Node, Context)) != nullptr;
   }
   bool verify(const BoundNodes &Nodes, ASTContext &Context, const Decl *Node) {
     // Use the original typed pointer to verify we can pass pointers to subtypes
     // to equalsNode.
     const T *TypedNode = cast<T>(Node);
     return selectFirst(
-      "", match(decl(hasParent(
-        decl(has(decl(equalsNode(TypedNode)))).bind(""))),
-                *Node, Context)) != nullptr;
+               "", match(decl(hasParent(
+                             decl(has(decl(equalsNode(TypedNode)))).bind(""))),
+                         *Node, Context)) != nullptr;
   }
   bool verify(const BoundNodes &Nodes, ASTContext &Context, const Type *Node) {
     // Use the original typed pointer to verify we can pass pointers to subtypes
@@ -1999,9 +1953,9 @@ class VerifyAncestorHasChildIsEqual : public BoundNodesCallback {
     const T *TypedNode = cast<T>(Node);
     const auto *Dec = Nodes.getNodeAs("decl");
     return selectFirst(
-      "", match(fieldDecl(hasParent(decl(has(fieldDecl(
-        hasType(type(equalsNode(TypedNode)).bind(""))))))),
-                *Dec, Context)) != nullptr;
+               "", match(fieldDecl(hasParent(decl(has(fieldDecl(
+                             hasType(type(equalsNode(TypedNode)).bind(""))))))),
+                         *Dec, Context)) != nullptr;
   }
 };
 
@@ -2100,43 +2054,31 @@ TEST(ASTMatchersTestObjC, ObjCMessageExpr) {
                           "  Str *up = [text uppercaseString];"
                           "} "
                           "@end ";
-  EXPECT_TRUE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(anything())));
+  EXPECT_TRUE(matchesObjC(Objc1String, objcMessageExpr(anything())));
   EXPECT_TRUE(matchesObjC(Objc1String,
-                          objcMessageExpr(hasAnySelector({
-                                          "contents", "meth:"}))
+                          objcMessageExpr(hasAnySelector({"contents", "meth:"}))
 
-                         ));
-  EXPECT_TRUE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(hasSelector("contents"))));
-  EXPECT_TRUE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(hasAnySelector("contents", "contentsA"))));
-  EXPECT_FALSE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(hasAnySelector("contentsB", "contentsC"))));
+                              ));
+  EXPECT_TRUE(
+      matchesObjC(Objc1String, objcMessageExpr(hasSelector("contents"))));
   EXPECT_TRUE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(matchesSelector("cont*"))));
+      Objc1String, objcMessageExpr(hasAnySelector("contents", "contentsA"))));
   EXPECT_FALSE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(matchesSelector("?cont*"))));
-  EXPECT_TRUE(notMatchesObjC(
-    Objc1String,
-    objcMessageExpr(hasSelector("contents"), hasNullSelector())));
-  EXPECT_TRUE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(hasSelector("contents"), hasUnarySelector())));
-  EXPECT_TRUE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(hasSelector("contents"), numSelectorArgs(0))));
-  EXPECT_TRUE(matchesObjC(
-    Objc1String,
-    objcMessageExpr(matchesSelector("uppercase*"),
-                    argumentCountIs(0)
-    )));
+      Objc1String, objcMessageExpr(hasAnySelector("contentsB", "contentsC"))));
+  EXPECT_TRUE(
+      matchesObjC(Objc1String, objcMessageExpr(matchesSelector("cont*"))));
+  EXPECT_FALSE(
+      matchesObjC(Objc1String, objcMessageExpr(matchesSelector("?cont*"))));
+  EXPECT_TRUE(
+      notMatchesObjC(Objc1String, objcMessageExpr(hasSelector("contents"),
+                                                  hasNullSelector())));
+  EXPECT_TRUE(matchesObjC(Objc1String, objcMessageExpr(hasSelector("contents"),
+                                                       hasUnarySelector())));
+  EXPECT_TRUE(matchesObjC(Objc1String, objcMessageExpr(hasSelector("contents"),
+                                                       numSelectorArgs(0))));
+  EXPECT_TRUE(
+      matchesObjC(Objc1String, objcMessageExpr(matchesSelector("uppercase*"),
+                                               argumentCountIs(0))));
 }
 
 TEST(ASTMatchersTestObjC, ObjCDecls) {
@@ -2157,33 +2099,17 @@ TEST(ASTMatchersTestObjC, ObjCDecls) {
                          "- (void)abc_doThing {} "
                          "@end ";
 
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcProtocolDecl(hasName("Proto"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcImplementationDecl(hasName("Thing"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcCategoryDecl(hasName("ABC"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcCategoryImplDecl(hasName("ABC"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcMethodDecl(hasName("protoDidThing"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcMethodDecl(hasName("abc_doThing"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcMethodDecl(hasName("anything"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcIvarDecl(hasName("_ivar"))));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcPropertyDecl(hasName("enabled"))));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcProtocolDecl(hasName("Proto"))));
+  EXPECT_TRUE(
+      matchesObjC(ObjCString, objcImplementationDecl(hasName("Thing"))));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcCategoryDecl(hasName("ABC"))));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcCategoryImplDecl(hasName("ABC"))));
+  EXPECT_TRUE(
+      matchesObjC(ObjCString, objcMethodDecl(hasName("protoDidThing"))));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcMethodDecl(hasName("abc_doThing"))));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcMethodDecl(hasName("anything"))));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcIvarDecl(hasName("_ivar"))));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcPropertyDecl(hasName("enabled"))));
 }
 
 TEST(ASTMatchersTestObjC, ObjCExceptionStmts) {
@@ -2194,18 +2120,10 @@ TEST(ASTMatchersTestObjC, ObjCExceptionStmts) {
                          "  } @finally {}"
                          "}";
 
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcTryStmt()));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcThrowStmt()));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcCatchStmt()));
-  EXPECT_TRUE(matchesObjC(
-    ObjCString,
-    objcFinallyStmt()));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcTryStmt()));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcThrowStmt()));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcCatchStmt()));
+  EXPECT_TRUE(matchesObjC(ObjCString, objcFinallyStmt()));
 }
 
 TEST(ASTMatchersTestObjC, ObjCAutoreleasePoolStmt) {
@@ -2274,11 +2192,18 @@ void x() {
   EXPECT_TRUE(matchesWithOpenMP(Source3, Matcher));
 
   StringRef Source4 = R"(
+void x() {
+#pragma omp parallel default(firstprivate)
+;
+})";
+  EXPECT_TRUE(matchesWithOpenMP51(Source4, Matcher));
+
+  StringRef Source5 = R"(
 void x(int x) {
 #pragma omp parallel num_threads(x)
 ;
 })";
-  EXPECT_TRUE(notMatchesWithOpenMP(Source4, Matcher));
+  EXPECT_TRUE(notMatchesWithOpenMP(Source5, Matcher));
 }
 
 TEST(ASTMatchersTest, Finder_DynamicOnlyAcceptsSomeMatchers) {

diff  --git a/clang/unittests/ASTMatchers/ASTMatchersTest.h b/clang/unittests/ASTMatchers/ASTMatchersTest.h
index 8669ebd552c8..bde6297f82dd 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersTest.h
+++ b/clang/unittests/ASTMatchers/ASTMatchersTest.h
@@ -20,10 +20,10 @@ namespace clang {
 namespace ast_matchers {
 
 using clang::tooling::buildASTFromCodeWithArgs;
+using clang::tooling::FileContentMappings;
+using clang::tooling::FrontendActionFactory;
 using clang::tooling::newFrontendActionFactory;
 using clang::tooling::runToolOnCodeWithArgs;
-using clang::tooling::FrontendActionFactory;
-using clang::tooling::FileContentMappings;
 
 class BoundNodesCallback {
 public:
@@ -38,7 +38,8 @@ class BoundNodesCallback {
 // If 'FindResultVerifier' is NULL, sets *Verified to true when Run is called.
 class VerifyMatch : public MatchFinder::MatchCallback {
 public:
-  VerifyMatch(std::unique_ptr<BoundNodesCallback> FindResultVerifier, bool *Verified)
+  VerifyMatch(std::unique_ptr<BoundNodesCallback> FindResultVerifier,
+              bool *Verified)
       : Verified(Verified), FindResultReviewer(std::move(FindResultVerifier)) {}
 
   void run(const MatchFinder::MatchResult &Result) override {
@@ -124,17 +125,16 @@ testing::AssertionResult matchesConditionally(
     return testing::AssertionFailure() << "Parsing error in \"" << Code << "\"";
   }
   if (Found != DynamicFound) {
-    return testing::AssertionFailure() << "Dynamic match result ("
-                                       << DynamicFound
-                                       << ") does not match static result ("
-                                       << Found << ")";
+    return testing::AssertionFailure()
+           << "Dynamic match result (" << DynamicFound
+           << ") does not match static result (" << Found << ")";
   }
   if (!Found && ExpectMatch) {
     return testing::AssertionFailure()
-      << "Could not find match in \"" << Code << "\"";
+           << "Could not find match in \"" << Code << "\"";
   } else if (Found && !ExpectMatch) {
     return testing::AssertionFailure()
-      << "Found unexpected match in \"" << Code << "\"";
+           << "Found unexpected match in \"" << Code << "\"";
   }
   return testing::AssertionSuccess();
 }
@@ -216,7 +216,8 @@ matchesConditionallyWithCuda(const Twine &Code, const T &AMatcher,
       "                      size_t sharedSize = 0,"
       "                      cudaStream_t stream = 0);"
       "extern \"C\" unsigned __cudaPushCallConfiguration("
-      "    dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, void *stream = 0);";
+      "    dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, void *stream = "
+      "0);";
 
   bool Found = false, DynamicFound = false;
   MatchFinder Finder;
@@ -233,22 +234,20 @@ matchesConditionallyWithCuda(const Twine &Code, const T &AMatcher,
   std::vector<std::string> Args = {
       "-xcuda",  "-fno-ms-extensions",     "--cuda-host-only",     "-nocudainc",
       "-target", "x86_64-unknown-unknown", std::string(CompileArg)};
-  if (!runToolOnCodeWithArgs(Factory->create(),
-                             CudaHeader + Code, Args)) {
+  if (!runToolOnCodeWithArgs(Factory->create(), CudaHeader + Code, Args)) {
     return testing::AssertionFailure() << "Parsing error in \"" << Code << "\"";
   }
   if (Found != DynamicFound) {
-    return testing::AssertionFailure() << "Dynamic match result ("
-                                       << DynamicFound
-                                       << ") does not match static result ("
-                                       << Found << ")";
+    return testing::AssertionFailure()
+           << "Dynamic match result (" << DynamicFound
+           << ") does not match static result (" << Found << ")";
   }
   if (!Found && ExpectMatch) {
     return testing::AssertionFailure()
-      << "Could not find match in \"" << Code << "\"";
+           << "Could not find match in \"" << Code << "\"";
   } else if (Found && !ExpectMatch) {
     return testing::AssertionFailure()
-      << "Found unexpected match in \"" << Code << "\"";
+           << "Found unexpected match in \"" << Code << "\"";
   }
   return testing::AssertionSuccess();
 }
@@ -276,13 +275,28 @@ testing::AssertionResult notMatchesWithOpenMP(const Twine &Code,
   return matchesConditionally(Code, AMatcher, false, {"-fopenmp=libomp"});
 }
 
+template <typename T>
+testing::AssertionResult matchesWithOpenMP51(const Twine &Code,
+                                             const T &AMatcher) {
+  return matchesConditionally(Code, AMatcher, true,
+                              {"-fopenmp=libomp", "-fopenmp-version=51"});
+}
+
+template <typename T>
+testing::AssertionResult notMatchesWithOpenMP51(const Twine &Code,
+                                                const T &AMatcher) {
+  return matchesConditionally(Code, AMatcher, false,
+                              {"-fopenmp=libomp", "-fopenmp-version=51"});
+}
+
 template <typename T>
 testing::AssertionResult matchAndVerifyResultConditionally(
     const Twine &Code, const T &AMatcher,
     std::unique_ptr<BoundNodesCallback> FindResultVerifier, bool ExpectResult) {
   bool VerifiedResult = false;
   MatchFinder Finder;
-  VerifyMatch VerifyVerifiedResult(std::move(FindResultVerifier), &VerifiedResult);
+  VerifyMatch VerifyVerifiedResult(std::move(FindResultVerifier),
+                                   &VerifiedResult);
   Finder.addMatcher(AMatcher, &VerifyVerifiedResult);
   std::unique_ptr<FrontendActionFactory> Factory(
       newFrontendActionFactory(&Finder));
@@ -296,10 +310,10 @@ testing::AssertionResult matchAndVerifyResultConditionally(
   }
   if (!VerifiedResult && ExpectResult) {
     return testing::AssertionFailure()
-      << "Could not verify result in \"" << Code << "\"";
+           << "Could not verify result in \"" << Code << "\"";
   } else if (VerifiedResult && !ExpectResult) {
     return testing::AssertionFailure()
-      << "Verified unexpected result in \"" << Code << "\"";
+           << "Verified unexpected result in \"" << Code << "\"";
   }
 
   VerifiedResult = false;
@@ -307,15 +321,15 @@ testing::AssertionResult matchAndVerifyResultConditionally(
   std::unique_ptr<ASTUnit> AST(
       buildASTFromCodeWithArgs(Code.toStringRef(Buffer), Args));
   if (!AST.get())
-    return testing::AssertionFailure() << "Parsing error in \"" << Code
-                                       << "\" while building AST";
+    return testing::AssertionFailure()
+           << "Parsing error in \"" << Code << "\" while building AST";
   Finder.matchAST(AST->getASTContext());
   if (!VerifiedResult && ExpectResult) {
     return testing::AssertionFailure()
-      << "Could not verify result in \"" << Code << "\" with AST";
+           << "Could not verify result in \"" << Code << "\" with AST";
   } else if (VerifiedResult && !ExpectResult) {
     return testing::AssertionFailure()
-      << "Verified unexpected result in \"" << Code << "\" with AST";
+           << "Verified unexpected result in \"" << Code << "\" with AST";
   }
 
   return testing::AssertionSuccess();
@@ -327,8 +341,8 @@ template <typename T>
 testing::AssertionResult matchAndVerifyResultTrue(
     const Twine &Code, const T &AMatcher,
     std::unique_ptr<BoundNodesCallback> FindResultVerifier) {
-  return matchAndVerifyResultConditionally(
-      Code, AMatcher, std::move(FindResultVerifier), true);
+  return matchAndVerifyResultConditionally(Code, AMatcher,
+                                           std::move(FindResultVerifier), true);
 }
 
 template <typename T>
@@ -342,8 +356,7 @@ testing::AssertionResult matchAndVerifyResultFalse(
 // Implements a run method that returns whether BoundNodes contains a
 // Decl bound to Id that can be dynamically cast to T.
 // Optionally checks that the check succeeded a specific number of times.
-template <typename T>
-class VerifyIdIsBoundTo : public BoundNodesCallback {
+template <typename T> class VerifyIdIsBoundTo : public BoundNodesCallback {
 public:
   // Create an object that checks that a node of type \c T was bound to \c Id.
   // Does not check for a certain number of matches.
@@ -386,7 +399,7 @@ class VerifyIdIsBoundTo : public BoundNodesCallback {
       if (const NamedDecl *Named = Nodes->getNodeAs<NamedDecl>(Id)) {
         Name = Named->getNameAsString();
       } else if (const NestedNameSpecifier *NNS =
-        Nodes->getNodeAs<NestedNameSpecifier>(Id)) {
+                     Nodes->getNodeAs<NestedNameSpecifier>(Id)) {
         llvm::raw_string_ostream OS(Name);
         NNS->print(OS, PrintingPolicy(LangOptions()));
       }
@@ -398,7 +411,7 @@ class VerifyIdIsBoundTo : public BoundNodesCallback {
       return true;
     }
     EXPECT_TRUE(M.count(Id) == 0 ||
-      M.find(Id)->second.template get<T>() == nullptr);
+                M.find(Id)->second.template get<T>() == nullptr);
     return false;
   }
 
@@ -437,4 +450,4 @@ class ASTMatchersTest : public ::testing::Test,
 } // namespace ast_matchers
 } // namespace clang
 
-#endif  // LLVM_CLANG_UNITTESTS_AST_MATCHERS_AST_MATCHERS_TEST_H
+#endif // LLVM_CLANG_UNITTESTS_AST_MATCHERS_AST_MATCHERS_TEST_H

diff  --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index bf799a781ae1..93ea63c1c2e6 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -982,6 +982,7 @@ __OMP_CANCEL_KIND(taskgroup, 4)
 
 __OMP_DEFAULT_KIND(none)
 __OMP_DEFAULT_KIND(shared)
+__OMP_DEFAULT_KIND(firstprivate)
 __OMP_DEFAULT_KIND(unknown)
 
 #undef __OMP_DEFAULT_KIND
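
The one-line addition to OMPKinds.def works because the file is consumed as an X-macro list: each consumer defines the entry macro, expands the list, and undefines it again. The following is a minimal sketch of that pattern, not the real OMPKinds.def machinery. The names SKETCH_DEFAULT_KINDS, DefaultKind, defaultKindName and parseDefaultKind are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the .def file: one X(...) entry per kind.
// Adding a kind here is the only change needed to update every expansion.
#define SKETCH_DEFAULT_KINDS(X) \
  X(none) \
  X(shared) \
  X(firstprivate) \
  X(unknown)

// Expansion 1: generate the enumerators.
enum class DefaultKind {
#define SKETCH_ENUM(Name) Name,
  SKETCH_DEFAULT_KINDS(SKETCH_ENUM)
#undef SKETCH_ENUM
};

// Expansion 2: map each enumerator back to its spelling.
inline const char *defaultKindName(DefaultKind K) {
  switch (K) {
#define SKETCH_CASE(Name) \
  case DefaultKind::Name: \
    return #Name;
    SKETCH_DEFAULT_KINDS(SKETCH_CASE)
#undef SKETCH_CASE
  }
  return "invalid";
}

// Expansion 3: parse a spelling; unrecognised input maps to 'unknown'.
inline DefaultKind parseDefaultKind(const char *Str) {
  assert(Str && "expected a non-null spelling");
#define SKETCH_PARSE(Name) \
  if (std::strcmp(Str, #Name) == 0) \
    return DefaultKind::Name;
  SKETCH_DEFAULT_KINDS(SKETCH_PARSE)
#undef SKETCH_PARSE
  return DefaultKind::unknown;
}
```

In this sketch, parseDefaultKind("firstprivate") yields DefaultKind::firstprivate only because the entry is in the list; removing it would make the same call fall through to DefaultKind::unknown.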


        

From llvm-commits at lists.llvm.org  Sun Jul 12 21:04:06 2020
From: llvm-commits at lists.llvm.org (Johannes Doerfert via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 04:04:06 +0000 (UTC)
Subject: [PATCH] D75591: [OpenMP] Add firstprivate as a default data-sharing
 attribute to clang
In-Reply-To: 
References: 
Message-ID: 

This revision was automatically updated to reflect the committed changes.
Closed by commit rG78443666bc18: [OpenMP] Add firstprivate as a default data-sharing attribute to clang (authored by atmnpatel, committed by jdoerfert).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75591/new/

https://reviews.llvm.org/D75591

Files:
  clang-tools-extra/docs/clang-tidy/checks/openmp-use-default-none.rst
  clang-tools-extra/test/clang-tidy/checkers/openmp-use-default-none.cpp
  clang/docs/LibASTMatchersReference.html
  clang/include/clang/ASTMatchers/ASTMatchers.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/ASTMatchers/Dynamic/Registry.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/distribute_parallel_for_default_messages.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/driver.c
  clang/test/OpenMP/parallel_default_messages.cpp
  clang/test/OpenMP/parallel_for_default_messages.cpp
  clang/test/OpenMP/parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/parallel_master_codegen.cpp
  clang/test/OpenMP/parallel_master_default_messages.cpp
  clang/test/OpenMP/parallel_sections_default_messages.cpp
  clang/test/OpenMP/target_parallel_default_messages.cpp
  clang/test/OpenMP/target_parallel_for_default_messages.cpp
  clang/test/OpenMP/target_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/target_teams_default_messages.cpp
  clang/test/OpenMP/target_teams_distribute_default_messages.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_default_messages.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/task_default_messages.cpp
  clang/test/OpenMP/task_messages.cpp
  clang/test/OpenMP/teams_default_messages.cpp
  clang/test/OpenMP/teams_distribute_default_messages.cpp
  clang/test/OpenMP/teams_distribute_parallel_for_default_messages.cpp
  clang/test/OpenMP/teams_distribute_parallel_for_simd_default_messages.cpp
  clang/test/OpenMP/teams_distribute_simd_default_messages.cpp
  clang/unittests/ASTMatchers/ASTMatchersNarrowingTest.cpp
  clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
  clang/unittests/ASTMatchers/ASTMatchersTest.h
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D75591.277321.patch
Type: text/x-patch
Size: 274251 bytes

From llvm-commits at lists.llvm.org  Sun Jul 12 21:05:10 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via llvm-commits)
Date: Sun, 12 Jul 2020 21:05:10 -0700 (PDT)
Subject: [llvm] 4d5fd0e - [MC][RISCV] Set UseIntegratedAssembler to true
Message-ID: <5f0bdd76.1c69fb81.b1088.6f4c@mx.google.com>


Author: Fangrui Song
Date: 2020-07-12T21:04:48-07:00
New Revision: 4d5fd0ee5ebda8979a448f5de397e3f1321b1ca8

URL: https://github.com/llvm/llvm-project/commit/4d5fd0ee5ebda8979a448f5de397e3f1321b1ca8
DIFF: https://github.com/llvm/llvm-project/commit/4d5fd0ee5ebda8979a448f5de397e3f1321b1ca8.diff

LOG: [MC][RISCV] Set UseIntegratedAssembler to true

to align with most other targets. Also, -fintegrated-as is the default
for clang -target riscv*.
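
The change amounts to deleting an override so that the target inherits the base-class default. Below is a minimal sketch of that shape, assuming the base class defaults the flag to true, as the commit title implies. AsmInfoSketch, RiscvAsmInfoBefore and RiscvAsmInfoAfter are simplified hypothetical stand-ins, not LLVM's real MCAsmInfo classes.

```cpp
#include <cassert>

// Hypothetical stand-in for a generic asm-info base class.
class AsmInfoSketch {
protected:
  // Assumed base default: the integrated assembler is enabled.
  bool UseIntegratedAssembler = true;

public:
  virtual ~AsmInfoSketch() = default;
  bool useIntegratedAssembler() const { return UseIntegratedAssembler; }
};

// Before the commit: the target constructor switches the flag off.
class RiscvAsmInfoBefore : public AsmInfoSketch {
public:
  RiscvAsmInfoBefore() { UseIntegratedAssembler = false; }
};

// After the commit: the assignment is gone, so the base default applies.
class RiscvAsmInfoAfter : public AsmInfoSketch {};

// Sanity check of the sketch: only the "before" variant opts out.
inline void checkSketchDefaults() {
  assert(!RiscvAsmInfoBefore().useIntegratedAssembler());
  assert(RiscvAsmInfoAfter().useIntegratedAssembler());
}
```

This would also explain why the inline-asm tests in the diff gain -no-integrated-as on their RUN lines: they appear to opt out explicitly now that the flag defaults to on.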

Added: 
    

Modified: 
    llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCAsmInfo.cpp
    llvm/test/CodeGen/RISCV/branch-relaxation.ll
    llvm/test/CodeGen/RISCV/inline-asm-abi-names.ll
    llvm/test/CodeGen/RISCV/inline-asm.ll
    llvm/test/CodeGen/RISCV/large-stack.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCAsmInfo.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCAsmInfo.cpp
index 8db1738566ac..089a2def4c21 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCAsmInfo.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCAsmInfo.cpp
@@ -27,7 +27,6 @@ RISCVMCAsmInfo::RISCVMCAsmInfo(const Triple &TT) {
   ExceptionsType = ExceptionHandling::DwarfCFI;
   Data16bitsDirective = "\t.half\t";
   Data32bitsDirective = "\t.word\t";
-  UseIntegratedAssembler = false;
 }
 
 const MCExpr *RISCVMCAsmInfo::getExprForFDESymbol(const MCSymbol *Sym,

diff  --git a/llvm/test/CodeGen/RISCV/branch-relaxation.ll b/llvm/test/CodeGen/RISCV/branch-relaxation.ll
index 56f0f27a0648..3d617bf0b26b 100644
--- a/llvm/test/CodeGen/RISCV/branch-relaxation.ll
+++ b/llvm/test/CodeGen/RISCV/branch-relaxation.ll
@@ -11,7 +11,7 @@ define void @relax_bcc(i1 %a) nounwind {
 ; CHECK-NEXT:    j .LBB0_2
 ; CHECK-NEXT:  .LBB0_1: # %iftrue
 ; CHECK-NEXT:    #APP
-; CHECK-NEXT:    .space 4096
+; CHECK-NEXT:    .zero 4096
 ; CHECK-NEXT:    #NO_APP
 ; CHECK-NEXT:  .LBB0_2: # %tail
 ; CHECK-NEXT:    ret
@@ -38,7 +38,7 @@ define i32 @relax_jal(i1 %a) nounwind {
 ; CHECK-NEXT:    #APP
 ; CHECK-NEXT:    #NO_APP
 ; CHECK-NEXT:    #APP
-; CHECK-NEXT:    .space 1048576
+; CHECK-NEXT:    .zero 1048576
 ; CHECK-NEXT:    #NO_APP
 ; CHECK-NEXT:    addi a0, zero, 1
 ; CHECK-NEXT:    ret

diff  --git a/llvm/test/CodeGen/RISCV/inline-asm-abi-names.ll b/llvm/test/CodeGen/RISCV/inline-asm-abi-names.ll
index 4d85e3ea006b..f9ed4aed6ca3 100644
--- a/llvm/test/CodeGen/RISCV/inline-asm-abi-names.ll
+++ b/llvm/test/CodeGen/RISCV/inline-asm-abi-names.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv32 -verify-machineinstrs -no-integrated-as < %s \
 ; RUN:   | FileCheck -check-prefix=RV32I %s
-; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv64 -verify-machineinstrs -no-integrated-as < %s \
 ; RUN:   | FileCheck -check-prefix=RV64I %s
 
 ; These test that we can use both the architectural names (x*) and the ABI names

diff  --git a/llvm/test/CodeGen/RISCV/inline-asm.ll b/llvm/test/CodeGen/RISCV/inline-asm.ll
index 43f951e352a6..de5d9a5f22a8 100644
--- a/llvm/test/CodeGen/RISCV/inline-asm.ll
+++ b/llvm/test/CodeGen/RISCV/inline-asm.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv32 -verify-machineinstrs -no-integrated-as < %s \
 ; RUN:   | FileCheck -check-prefix=RV32I %s
-; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv64 -verify-machineinstrs -no-integrated-as < %s \
 ; RUN:   | FileCheck -check-prefix=RV64I %s
 
 @gi = external global i32

diff  --git a/llvm/test/CodeGen/RISCV/large-stack.ll b/llvm/test/CodeGen/RISCV/large-stack.ll
index 7acf0f4076e8..7cc6e83d7d85 100644
--- a/llvm/test/CodeGen/RISCV/large-stack.ll
+++ b/llvm/test/CodeGen/RISCV/large-stack.ll
@@ -64,10 +64,12 @@ define void @test_emergency_spill_slot(i32 %a) {
 ; RV32I-FPELIM-NEXT:    add a1, a2, a1
 ; RV32I-FPELIM-NEXT:    #APP
 ; RV32I-FPELIM-NEXT:    nop
+; RV32I-FPELIM-EMPTY:
 ; RV32I-FPELIM-NEXT:    #NO_APP
 ; RV32I-FPELIM-NEXT:    sw a0, 0(a1)
 ; RV32I-FPELIM-NEXT:    #APP
 ; RV32I-FPELIM-NEXT:    nop
+; RV32I-FPELIM-EMPTY:
 ; RV32I-FPELIM-NEXT:    #NO_APP
 ; RV32I-FPELIM-NEXT:    lui a0, 97
 ; RV32I-FPELIM-NEXT:    addi a0, a0, 672
@@ -103,10 +105,12 @@ define void @test_emergency_spill_slot(i32 %a) {
 ; RV32I-WITHFP-NEXT:    add a1, a2, a1
 ; RV32I-WITHFP-NEXT:    #APP
 ; RV32I-WITHFP-NEXT:    nop
+; RV32I-WITHFP-EMPTY:
 ; RV32I-WITHFP-NEXT:    #NO_APP
 ; RV32I-WITHFP-NEXT:    sw a0, 0(a1)
 ; RV32I-WITHFP-NEXT:    #APP
 ; RV32I-WITHFP-NEXT:    nop
+; RV32I-WITHFP-EMPTY:
 ; RV32I-WITHFP-NEXT:    #NO_APP
 ; RV32I-WITHFP-NEXT:    lui a0, 97
 ; RV32I-WITHFP-NEXT:    addi a0, a0, 688


        

From llvm-commits at lists.llvm.org  Sun Jul 12 21:34:16 2020
From: llvm-commits at lists.llvm.org (Qiu Chaofan via llvm-commits)
Date: Sun, 12 Jul 2020 21:34:16 -0700 (PDT)
Subject: [llvm] b6912c8 - [PowerPC] Support constrained conversion in SPE
 target
Message-ID: <5f0be448.1c69fb81.1b195.006b@mx.google.com>


Author: Qiu Chaofan
Date: 2020-07-13T12:18:36+08:00
New Revision: b6912c879ed848fd59c108e8b90fe0180893ee56

URL: https://github.com/llvm/llvm-project/commit/b6912c879ed848fd59c108e8b90fe0180893ee56
DIFF: https://github.com/llvm/llvm-project/commit/b6912c879ed848fd59c108e8b90fe0180893ee56.diff

LOG: [PowerPC] Support constrained conversion in SPE target

This patch adds support for constrained int/fp conversion between
signed/unsigned i32 and f32/f64.
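
For readers unfamiliar with the term: a "constrained" (strict) conversion is one whose floating-point side effects, such as raised exception flags, must stay observable, so the compiler may not fold or reorder it freely. The C++ sketch below only illustrates the source-level behaviour that such conversions are meant to preserve. It is not taken from the patch, the helper name convertObservingFlags is hypothetical, and whether the flag is reported reliably depends on the compiler honouring floating-point environment access.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

struct ConversionResult {
  std::int32_t Value; // truncated integer result
  bool RaisedInexact; // whether FE_INEXACT was observed after converting
};

// Convert a double to a signed 32-bit integer and report whether the
// conversion left the "inexact" flag set in the floating-point environment.
inline ConversionResult convertObservingFlags(double Input) {
  // Out-of-range (or NaN) input is undefined behaviour for the cast below.
  assert(Input > -2147483649.0 && Input < 2147483648.0 &&
         "input must be representable as int32 after truncation");
  // volatile keeps the conversion from being constant-folded away.
  volatile double Source = Input;
  std::feclearexcept(FE_ALL_EXCEPT);
  std::int32_t Value = static_cast<std::int32_t>(Source);
  bool RaisedInexact = std::fetestexcept(FE_INEXACT) != 0;
  return {Value, RaisedInexact};
}
```

A call such as convertObservingFlags(1.5) truncates toward zero. The flag result is deliberately left unspecified here, since a compiler without strict floating-point support may move the conversion relative to the flag queries, which is the kind of reordering constrained conversions are meant to rule out.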

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D82747

Added: 
    llvm/test/CodeGen/PowerPC/fp-strict-conv.ll

Modified: 
    llvm/lib/Target/PowerPC/PPCISelLowering.cpp
    llvm/lib/Target/PowerPC/PPCInstrSPE.td

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 49140bab5134..575ad68fecd9 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -423,6 +423,9 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
 
   if (Subtarget.hasSPE()) {
     // SPE has built-in conversions
+    setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::i32, Legal);
+    setOperationAction(ISD::STRICT_SINT_TO_FP, MVT::i32, Legal);
+    setOperationAction(ISD::STRICT_UINT_TO_FP, MVT::i32, Legal);
     setOperationAction(ISD::FP_TO_SINT, MVT::i32, Legal);
     setOperationAction(ISD::SINT_TO_FP, MVT::i32, Legal);
     setOperationAction(ISD::UINT_TO_FP, MVT::i32, Legal);
@@ -572,9 +575,10 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
       setOperationAction(ISD::SINT_TO_FP, MVT::i32, Custom);
   } else {
     // PowerPC does not have FP_TO_UINT on 32-bit implementations.
-    if (Subtarget.hasSPE())
+    if (Subtarget.hasSPE()) {
+      setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::i32, Legal);
       setOperationAction(ISD::FP_TO_UINT, MVT::i32, Legal);
-    else
+    } else
       setOperationAction(ISD::FP_TO_UINT, MVT::i32, Expand);
   }
 

diff  --git a/llvm/lib/Target/PowerPC/PPCInstrSPE.td b/llvm/lib/Target/PowerPC/PPCInstrSPE.td
index 935c3044ae47..858eb0c9fe50 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrSPE.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrSPE.td
@@ -158,7 +158,7 @@ def EFDCFSF        : EFXForm_2a<755, (outs sperc:$RT), (ins spe4rc:$RB),
 
 def EFDCFSI        : EFXForm_2a<753, (outs sperc:$RT), (ins gprc:$RB),
                                 "efdcfsi $RT, $RB", IIC_FPDGeneral,
-                                [(set f64:$RT, (sint_to_fp i32:$RB))]>;
+                                [(set f64:$RT, (any_sint_to_fp i32:$RB))]>;
 
 def EFDCFSID       : EFXForm_2a<739, (outs sperc:$RT), (ins gprc:$RB),
                                 "efdcfsid $RT, $RB", IIC_FPDGeneral,
@@ -169,7 +169,7 @@ def EFDCFUF        : EFXForm_2a<754, (outs sperc:$RT), (ins spe4rc:$RB),
 
 def EFDCFUI        : EFXForm_2a<752, (outs sperc:$RT), (ins gprc:$RB),
                                 "efdcfui $RT, $RB", IIC_FPDGeneral,
-                                [(set f64:$RT, (uint_to_fp i32:$RB))]>;
+                                [(set f64:$RT, (any_uint_to_fp i32:$RB))]>;
 
 def EFDCFUID       : EFXForm_2a<738, (outs sperc:$RT), (ins gprc:$RB),
                                 "efdcfuid $RT, $RB", IIC_FPDGeneral,
@@ -197,7 +197,7 @@ def EFDCTSIDZ      : EFXForm_2a<747, (outs gprc:$RT), (ins sperc:$RB),
 
 def EFDCTSIZ       : EFXForm_2a<762, (outs gprc:$RT), (ins sperc:$RB),
                                 "efdctsiz $RT, $RB", IIC_FPDGeneral,
-                                [(set i32:$RT, (fp_to_sint f64:$RB))]>;
+                                [(set i32:$RT, (any_fp_to_sint f64:$RB))]>;
 
 def EFDCTUF        : EFXForm_2a<758, (outs sperc:$RT), (ins spe4rc:$RB),
                                 "efdctuf $RT, $RB", IIC_FPDGeneral, []>;
@@ -212,7 +212,7 @@ def EFDCTUIDZ      : EFXForm_2a<746, (outs gprc:$RT), (ins sperc:$RB),
 
 def EFDCTUIZ       : EFXForm_2a<760, (outs gprc:$RT), (ins sperc:$RB),
                                 "efdctuiz $RT, $RB", IIC_FPDGeneral,
-                                [(set i32:$RT, (fp_to_uint f64:$RB))]>;
+                                [(set i32:$RT, (any_fp_to_uint f64:$RB))]>;
 
 def EFDDIV         : EFXForm_1<745, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),
                                "efddiv $RT, $RA, $RB", IIC_FPDivD,
@@ -261,14 +261,14 @@ def EFSCFSF        : EFXForm_2a<723, (outs spe4rc:$RT), (ins spe4rc:$RB),
 
 def EFSCFSI        : EFXForm_2a<721, (outs spe4rc:$RT), (ins gprc:$RB),
                                 "efscfsi $RT, $RB", IIC_FPSGeneral,
-                                [(set f32:$RT, (sint_to_fp i32:$RB))]>;
+                                [(set f32:$RT, (any_sint_to_fp i32:$RB))]>;
 
 def EFSCFUF        : EFXForm_2a<722, (outs spe4rc:$RT), (ins spe4rc:$RB),
                                 "efscfuf $RT, $RB", IIC_FPSGeneral, []>;
 
 def EFSCFUI        : EFXForm_2a<720, (outs spe4rc:$RT), (ins gprc:$RB),
                                 "efscfui $RT, $RB", IIC_FPSGeneral,
-                                [(set f32:$RT, (uint_to_fp i32:$RB))]>;
+                                [(set f32:$RT, (any_uint_to_fp i32:$RB))]>;
 
 let isCompare = 1 in {
 def EFSCMPEQ       : EFXForm_3<718, (outs crrc:$crD), (ins spe4rc:$RA, spe4rc:$RB),
@@ -288,7 +288,7 @@ def EFSCTSI        : EFXForm_2a<725, (outs gprc:$RT), (ins spe4rc:$RB),
 
 def EFSCTSIZ       : EFXForm_2a<730, (outs gprc:$RT), (ins spe4rc:$RB),
                                 "efsctsiz $RT, $RB", IIC_FPSGeneral,
-                                [(set i32:$RT, (fp_to_sint f32:$RB))]>;
+                                [(set i32:$RT, (any_fp_to_sint f32:$RB))]>;
 
 def EFSCTUF        : EFXForm_2a<726, (outs sperc:$RT), (ins spe4rc:$RB),
                                 "efsctuf $RT, $RB", IIC_FPSGeneral, []>;
@@ -299,7 +299,7 @@ def EFSCTUI        : EFXForm_2a<724, (outs gprc:$RT), (ins spe4rc:$RB),
 
 def EFSCTUIZ       : EFXForm_2a<728, (outs gprc:$RT), (ins spe4rc:$RB),
                                 "efsctuiz $RT, $RB", IIC_FPSGeneral,
-                                [(set i32:$RT, (fp_to_uint f32:$RB))]>;
+                                [(set i32:$RT, (any_fp_to_uint f32:$RB))]>;
 
 def EFSDIV         : EFXForm_1<713, (outs spe4rc:$RT), (ins spe4rc:$RA, spe4rc:$RB),
                                "efsdiv $RT, $RA, $RB", IIC_FPDivD,

diff  --git a/llvm/test/CodeGen/PowerPC/fp-strict-conv.ll b/llvm/test/CodeGen/PowerPC/fp-strict-conv.ll
new file mode 100644
index 000000000000..ab806a19c158
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/fp-strict-conv.ll
@@ -0,0 +1,274 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -verify-machineinstrs -ppc-asm-full-reg-names < %s -mcpu=e500 \
+; RUN:   -mtriple=powerpc-unknown-linux-gnu -mattr=spe | FileCheck %s \
+; RUN:   -check-prefix=SPE
+
+declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)
+declare i64 @llvm.experimental.constrained.fptosi.i64.f64(double, metadata)
+declare i64 @llvm.experimental.constrained.fptoui.i64.f64(double, metadata)
+declare i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata)
+
+declare i32 @llvm.experimental.constrained.fptosi.i32.f32(float, metadata)
+declare i64 @llvm.experimental.constrained.fptosi.i64.f32(float, metadata)
+declare i64 @llvm.experimental.constrained.fptoui.i64.f32(float, metadata)
+declare i32 @llvm.experimental.constrained.fptoui.i32.f32(float, metadata)
+
+declare double @llvm.experimental.constrained.sitofp.f64.i32(i32, metadata, metadata)
+declare double @llvm.experimental.constrained.sitofp.f64.i64(i64, metadata, metadata)
+declare double @llvm.experimental.constrained.uitofp.f64.i32(i32, metadata, metadata)
+declare double @llvm.experimental.constrained.uitofp.f64.i64(i64, metadata, metadata)
+
+declare float @llvm.experimental.constrained.sitofp.f32.i64(i64, metadata, metadata)
+declare float @llvm.experimental.constrained.sitofp.f32.i32(i32, metadata, metadata)
+declare float @llvm.experimental.constrained.uitofp.f32.i32(i32, metadata, metadata)
+declare float @llvm.experimental.constrained.uitofp.f32.i64(i64, metadata, metadata)
+
+define i32 @d_to_i32(double %m) #0 {
+; SPE-LABEL: d_to_i32:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    evmergelo r3, r3, r4
+; SPE-NEXT:    efdctsiz r3, r3
+; SPE-NEXT:    blr
+entry:
+  %conv = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %m, metadata !"fpexcept.strict") #0
+  ret i32 %conv
+}
+
+define i64 @d_to_i64(double %m) #0 {
+; SPE-LABEL: d_to_i64:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    evmergelo r4, r3, r4
+; SPE-NEXT:    evmergehi r3, r4, r4
+; SPE-NEXT:    # kill: def $r4 killed $r4 killed $s4
+; SPE-NEXT:    # kill: def $r3 killed $r3 killed $s3
+; SPE-NEXT:    bl __fixdfdi
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = call i64 @llvm.experimental.constrained.fptosi.i64.f64(double %m, metadata !"fpexcept.strict") #0
+  ret i64 %conv
+}
+
+define i64 @d_to_u64(double %m) #0 {
+; SPE-LABEL: d_to_u64:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    evmergelo r4, r3, r4
+; SPE-NEXT:    evmergehi r3, r4, r4
+; SPE-NEXT:    # kill: def $r4 killed $r4 killed $s4
+; SPE-NEXT:    # kill: def $r3 killed $r3 killed $s3
+; SPE-NEXT:    bl __fixunsdfdi
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = call i64 @llvm.experimental.constrained.fptoui.i64.f64(double %m, metadata !"fpexcept.strict") #0
+  ret i64 %conv
+}
+
+define zeroext i32 @d_to_u32(double %m) #0 {
+; SPE-LABEL: d_to_u32:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    evmergelo r3, r3, r4
+; SPE-NEXT:    efdctuiz r3, r3
+; SPE-NEXT:    blr
+entry:
+  %conv = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %m, metadata !"fpexcept.strict") #0
+  ret i32 %conv
+}
+
+define signext i32 @f_to_i32(float %m) #0 {
+; SPE-LABEL: f_to_i32:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    efsctsiz r3, r3
+; SPE-NEXT:    blr
+entry:
+  %conv = call i32 @llvm.experimental.constrained.fptosi.i32.f32(float %m, metadata !"fpexcept.strict") #0
+  ret i32 %conv
+}
+
+define i64 @f_to_i64(float %m) #0 {
+; SPE-LABEL: f_to_i64:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    bl __fixsfdi
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = call i64 @llvm.experimental.constrained.fptosi.i64.f32(float %m, metadata !"fpexcept.strict") #0
+  ret i64 %conv
+}
+
+define i64 @f_to_u64(float %m) #0 {
+; SPE-LABEL: f_to_u64:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    bl __fixunssfdi
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = call i64 @llvm.experimental.constrained.fptoui.i64.f32(float %m, metadata !"fpexcept.strict") #0
+  ret i64 %conv
+}
+
+define zeroext i32 @f_to_u32(float %m) #0 {
+; SPE-LABEL: f_to_u32:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    efsctuiz r3, r3
+; SPE-NEXT:    blr
+entry:
+  %conv = call i32 @llvm.experimental.constrained.fptoui.i32.f32(float %m, metadata !"fpexcept.strict") #0
+  ret i32 %conv
+}
+
+define double @i32_to_d(i32 signext %m) #0 {
+; SPE-LABEL: i32_to_d:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    efdcfsi r4, r3
+; SPE-NEXT:    evmergehi r3, r4, r4
+; SPE-NEXT:    # kill: def $r4 killed $r4 killed $s4
+; SPE-NEXT:    # kill: def $r3 killed $r3 killed $s3
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret double %conv
+}
+
+define double @i64_to_d(i64 %m) #0 {
+; SPE-LABEL: i64_to_d:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    bl __floatdidf
+; SPE-NEXT:    evmergelo r4, r3, r4
+; SPE-NEXT:    evmergehi r3, r4, r4
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    # kill: def $r3 killed $r3 killed $s3
+; SPE-NEXT:    # kill: def $r4 killed $r4 killed $s4
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call double @llvm.experimental.constrained.sitofp.f64.i64(i64 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret double %conv
+}
+
+define double @u32_to_d(i32 zeroext %m) #0 {
+; SPE-LABEL: u32_to_d:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    efdcfui r4, r3
+; SPE-NEXT:    evmergehi r3, r4, r4
+; SPE-NEXT:    # kill: def $r4 killed $r4 killed $s4
+; SPE-NEXT:    # kill: def $r3 killed $r3 killed $s3
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call double @llvm.experimental.constrained.uitofp.f64.i32(i32 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret double %conv
+}
+
+define double @u64_to_d(i64 %m) #0 {
+; SPE-LABEL: u64_to_d:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    bl __floatundidf
+; SPE-NEXT:    evmergelo r4, r3, r4
+; SPE-NEXT:    evmergehi r3, r4, r4
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    # kill: def $r3 killed $r3 killed $s3
+; SPE-NEXT:    # kill: def $r4 killed $r4 killed $s4
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call double @llvm.experimental.constrained.uitofp.f64.i64(i64 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret double %conv
+}
+
+define float @i32_to_f(i32 signext %m) #0 {
+; SPE-LABEL: i32_to_f:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    efscfsi r3, r3
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %conv
+}
+
+define float @i64_to_f(i64 %m) #0 {
+; SPE-LABEL: i64_to_f:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    bl __floatdisf
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call float @llvm.experimental.constrained.sitofp.f32.i64(i64 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %conv
+}
+
+define float @u32_to_f(i32 zeroext %m) #0 {
+; SPE-LABEL: u32_to_f:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    efscfui r3, r3
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %conv
+}
+
+define float @u64_to_f(i64 %m) #0 {
+; SPE-LABEL: u64_to_f:
+; SPE:       # %bb.0: # %entry
+; SPE-NEXT:    mflr r0
+; SPE-NEXT:    stw r0, 4(r1)
+; SPE-NEXT:    stwu r1, -16(r1)
+; SPE-NEXT:    .cfi_def_cfa_offset 16
+; SPE-NEXT:    .cfi_offset lr, 4
+; SPE-NEXT:    bl __floatundisf
+; SPE-NEXT:    lwz r0, 20(r1)
+; SPE-NEXT:    addi r1, r1, 16
+; SPE-NEXT:    mtlr r0
+; SPE-NEXT:    blr
+entry:
+  %conv = tail call float @llvm.experimental.constrained.uitofp.f32.i64(i64 %m, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %conv
+}
+
+attributes #0 = { strictfp }


From llvm-commits at lists.llvm.org  Sun Jul 12 21:34:28 2020
From: llvm-commits at lists.llvm.org (Qiu Chaofan via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 04:34:28 +0000 (UTC)
Subject: [PATCH] D82747: [PowerPC] Support constrained int/fp conversion in SPE targets
In-Reply-To: 
References: 
Message-ID: <3f0d8ddd5b42c28e1e7bcd6f4754acd5@localhost.localdomain>

This revision was automatically updated to reflect the committed changes.
Closed by commit rGb6912c879ed8: [PowerPC] Support constrained conversion in SPE target (authored by qiucf).

Changed prior to commit:
  https://reviews.llvm.org/D82747?vs=274021&id=277322#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82747/new/

https://reviews.llvm.org/D82747

Files:
  llvm/lib/Target/PowerPC/PPCISelLowering.cpp
  llvm/lib/Target/PowerPC/PPCInstrSPE.td
  llvm/test/CodeGen/PowerPC/fp-strict-conv.ll


From llvm-commits at lists.llvm.org  Sun Jul 12 21:37:26 2020
From: llvm-commits at lists.llvm.org (Kai Luo via llvm-commits)
Date: Sun, 12 Jul 2020 21:37:26 -0700 (PDT)
Subject: [llvm] ac8dc52 - [PowerPC] Enhance tests for D83276. NFC.
Message-ID: <5f0be506.1c69fb81.9bc78.d220@mx.google.com>


Author: Kai Luo
Date: 2020-07-13T04:37:09Z
New Revision: ac8dc526c4717907bed11b2fc7ab0db5a0f466ba

URL: https://github.com/llvm/llvm-project/commit/ac8dc526c4717907bed11b2fc7ab0db5a0f466ba
DIFF: https://github.com/llvm/llvm-project/commit/ac8dc526c4717907bed11b2fc7ab0db5a0f466ba.diff

LOG: [PowerPC] Enhance tests for D83276. NFC.

Added: 
    llvm/test/CodeGen/PowerPC/stack-clash-prologue-nounwind.ll

Modified: 
    llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll

Removed: 
    


################################################################################
diff  --git a/llvm/test/CodeGen/PowerPC/stack-clash-prologue-nounwind.ll b/llvm/test/CodeGen/PowerPC/stack-clash-prologue-nounwind.ll
new file mode 100644
index 000000000000..e595d8a732a5
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/stack-clash-prologue-nounwind.ll
@@ -0,0 +1,474 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -ppc-asm-full-reg-names -verify-machineinstrs \
+; RUN:   -mtriple=powerpc64le-linux-gnu < %s | FileCheck \
+; RUN:   -check-prefix=CHECK-LE %s
+; RUN: llc -ppc-asm-full-reg-names -verify-machineinstrs \
+; RUN:   -mtriple=powerpc64-linux-gnu < %s | FileCheck \
+; RUN:   -check-prefix=CHECK-BE %s
+; RUN: llc -ppc-asm-full-reg-names -verify-machineinstrs \
+; RUN:   -mtriple=powerpc-linux-gnu < %s | FileCheck \
+; RUN:   -check-prefix=CHECK-32 %s
+
+; Free probe
+define i8 @f0() #0 nounwind {
+; CHECK-LE-LABEL: f0:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, -64(r1)
+; CHECK-LE-NEXT:    lbz r3, -64(r1)
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f0:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, -64(r1)
+; CHECK-BE-NEXT:    lbz r3, -64(r1)
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f0:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    stwu r1, -80(r1)
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    stb r3, 16(r1)
+; CHECK-32-NEXT:    lbz r3, 16(r1)
+; CHECK-32-NEXT:    addi r1, r1, 80
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 64
+  %b = getelementptr inbounds i8, i8* %a, i64 63
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+define i8 @f1() #0 "stack-probe-size"="0" nounwind {
+; CHECK-LE-LABEL: f1:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    mr r12, r1
+; CHECK-LE-NEXT:    li r0, 259
+; CHECK-LE-NEXT:    mtctr r0
+; CHECK-LE-NEXT:  .LBB1_1: # %entry
+; CHECK-LE-NEXT:    #
+; CHECK-LE-NEXT:    stdu r12, -16(r1)
+; CHECK-LE-NEXT:    bdnz .LBB1_1
+; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, 48(r1)
+; CHECK-LE-NEXT:    lbz r3, 48(r1)
+; CHECK-LE-NEXT:    addi r1, r1, 4144
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f1:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    mr r12, r1
+; CHECK-BE-NEXT:    li r0, 260
+; CHECK-BE-NEXT:    mtctr r0
+; CHECK-BE-NEXT:  .LBB1_1: # %entry
+; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    stdu r12, -16(r1)
+; CHECK-BE-NEXT:    bdnz .LBB1_1
+; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, 64(r1)
+; CHECK-BE-NEXT:    lbz r3, 64(r1)
+; CHECK-BE-NEXT:    addi r1, r1, 4160
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f1:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    mr r12, r1
+; CHECK-32-NEXT:    li r0, 257
+; CHECK-32-NEXT:    mtctr r0
+; CHECK-32-NEXT:  .LBB1_1: # %entry
+; CHECK-32-NEXT:    #
+; CHECK-32-NEXT:    stwu r12, -16(r1)
+; CHECK-32-NEXT:    bdnz .LBB1_1
+; CHECK-32-NEXT:  # %bb.2: # %entry
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    sub r0, r1, r12
+; CHECK-32-NEXT:    stb r3, 16(r1)
+; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    lbz r3, 16(r1)
+; CHECK-32-NEXT:    addi r1, r1, 4112
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 4096
+  %b = getelementptr inbounds i8, i8* %a, i64 63
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+define i8 @f2() #0 nounwind {
+; CHECK-LE-LABEL: f2:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    mr r12, r1
+; CHECK-LE-NEXT:    stdu r12, -48(r1)
+; CHECK-LE-NEXT:    li r0, 16
+; CHECK-LE-NEXT:    mtctr r0
+; CHECK-LE-NEXT:  .LBB2_1: # %entry
+; CHECK-LE-NEXT:    #
+; CHECK-LE-NEXT:    stdu r12, -4096(r1)
+; CHECK-LE-NEXT:    bdnz .LBB2_1
+; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, 48(r1)
+; CHECK-LE-NEXT:    lbz r3, 48(r1)
+; CHECK-LE-NEXT:    ld r1, 0(r1)
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f2:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    mr r12, r1
+; CHECK-BE-NEXT:    stdu r12, -64(r1)
+; CHECK-BE-NEXT:    li r0, 16
+; CHECK-BE-NEXT:    mtctr r0
+; CHECK-BE-NEXT:  .LBB2_1: # %entry
+; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    stdu r12, -4096(r1)
+; CHECK-BE-NEXT:    bdnz .LBB2_1
+; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, 64(r1)
+; CHECK-BE-NEXT:    lbz r3, 64(r1)
+; CHECK-BE-NEXT:    ld r1, 0(r1)
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f2:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    mr r12, r1
+; CHECK-32-NEXT:    stwu r12, -16(r1)
+; CHECK-32-NEXT:    li r0, 16
+; CHECK-32-NEXT:    mtctr r0
+; CHECK-32-NEXT:  .LBB2_1: # %entry
+; CHECK-32-NEXT:    #
+; CHECK-32-NEXT:    stwu r12, -4096(r1)
+; CHECK-32-NEXT:    bdnz .LBB2_1
+; CHECK-32-NEXT:  # %bb.2: # %entry
+; CHECK-32-NEXT:    sub r0, r1, r12
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    stb r3, 16(r1)
+; CHECK-32-NEXT:    mr r0, r31
+; CHECK-32-NEXT:    lbz r3, 16(r1)
+; CHECK-32-NEXT:    lwz r31, 0(r1)
+; CHECK-32-NEXT:    mr r1, r31
+; CHECK-32-NEXT:    mr r31, r0
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 65536
+  %b = getelementptr inbounds i8, i8* %a, i64 63
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+define i8 @f3() #0 "stack-probe-size"="32768" nounwind {
+; CHECK-LE-LABEL: f3:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    mr r12, r1
+; CHECK-LE-NEXT:    stdu r12, -48(r1)
+; CHECK-LE-NEXT:    stdu r12, -32768(r1)
+; CHECK-LE-NEXT:    stdu r12, -32768(r1)
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, 48(r1)
+; CHECK-LE-NEXT:    lbz r3, 48(r1)
+; CHECK-LE-NEXT:    ld r1, 0(r1)
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f3:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    mr r12, r1
+; CHECK-BE-NEXT:    stdu r12, -64(r1)
+; CHECK-BE-NEXT:    stdu r12, -32768(r1)
+; CHECK-BE-NEXT:    stdu r12, -32768(r1)
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, 64(r1)
+; CHECK-BE-NEXT:    lbz r3, 64(r1)
+; CHECK-BE-NEXT:    ld r1, 0(r1)
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f3:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    mr r12, r1
+; CHECK-32-NEXT:    stwu r12, -16(r1)
+; CHECK-32-NEXT:    stwu r12, -32768(r1)
+; CHECK-32-NEXT:    stwu r12, -32768(r1)
+; CHECK-32-NEXT:    sub r0, r1, r12
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    stb r3, 16(r1)
+; CHECK-32-NEXT:    mr r0, r31
+; CHECK-32-NEXT:    lbz r3, 16(r1)
+; CHECK-32-NEXT:    lwz r31, 0(r1)
+; CHECK-32-NEXT:    mr r1, r31
+; CHECK-32-NEXT:    mr r31, r0
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 65536
+  %b = getelementptr inbounds i8, i8* %a, i64 63
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+; Same as f2, but without protection.
+define i8 @f4() nounwind {
+; CHECK-LE-LABEL: f4:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    lis r0, -2
+; CHECK-LE-NEXT:    ori r0, r0, 65488
+; CHECK-LE-NEXT:    stdux r1, r1, r0
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, 48(r1)
+; CHECK-LE-NEXT:    lbz r3, 48(r1)
+; CHECK-LE-NEXT:    ld r1, 0(r1)
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f4:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    lis r0, -2
+; CHECK-BE-NEXT:    ori r0, r0, 65472
+; CHECK-BE-NEXT:    stdux r1, r1, r0
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, 64(r1)
+; CHECK-BE-NEXT:    lbz r3, 64(r1)
+; CHECK-BE-NEXT:    ld r1, 0(r1)
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f4:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    lis r0, -2
+; CHECK-32-NEXT:    ori r0, r0, 65520
+; CHECK-32-NEXT:    stwux r1, r1, r0
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    stb r3, 16(r1)
+; CHECK-32-NEXT:    mr r0, r31
+; CHECK-32-NEXT:    lbz r3, 16(r1)
+; CHECK-32-NEXT:    lwz r31, 0(r1)
+; CHECK-32-NEXT:    mr r1, r31
+; CHECK-32-NEXT:    mr r31, r0
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 65536
+  %b = getelementptr inbounds i8, i8* %a, i64 63
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+define i8 @f5() #0 "stack-probe-size"="65536" nounwind {
+; CHECK-LE-LABEL: f5:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    mr r12, r1
+; CHECK-LE-NEXT:    stdu r12, -48(r1)
+; CHECK-LE-NEXT:    li r0, 16
+; CHECK-LE-NEXT:    mtctr r0
+; CHECK-LE-NEXT:    lis r0, -1
+; CHECK-LE-NEXT:    nop
+; CHECK-LE-NEXT:  .LBB5_1: # %entry
+; CHECK-LE-NEXT:    #
+; CHECK-LE-NEXT:    stdux r12, r1, r0
+; CHECK-LE-NEXT:    bdnz .LBB5_1
+; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, 48(r1)
+; CHECK-LE-NEXT:    lbz r3, 48(r1)
+; CHECK-LE-NEXT:    ld r1, 0(r1)
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f5:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    mr r12, r1
+; CHECK-BE-NEXT:    stdu r12, -64(r1)
+; CHECK-BE-NEXT:    li r0, 16
+; CHECK-BE-NEXT:    mtctr r0
+; CHECK-BE-NEXT:    lis r0, -1
+; CHECK-BE-NEXT:    nop
+; CHECK-BE-NEXT:  .LBB5_1: # %entry
+; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    stdux r12, r1, r0
+; CHECK-BE-NEXT:    bdnz .LBB5_1
+; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, 64(r1)
+; CHECK-BE-NEXT:    lbz r3, 64(r1)
+; CHECK-BE-NEXT:    ld r1, 0(r1)
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f5:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    mr r12, r1
+; CHECK-32-NEXT:    stwu r12, -16(r1)
+; CHECK-32-NEXT:    li r0, 16
+; CHECK-32-NEXT:    mtctr r0
+; CHECK-32-NEXT:    lis r0, -1
+; CHECK-32-NEXT:    nop
+; CHECK-32-NEXT:  .LBB5_1: # %entry
+; CHECK-32-NEXT:    #
+; CHECK-32-NEXT:    stwux r12, r1, r0
+; CHECK-32-NEXT:    bdnz .LBB5_1
+; CHECK-32-NEXT:  # %bb.2: # %entry
+; CHECK-32-NEXT:    sub r0, r1, r12
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    stb r3, 16(r1)
+; CHECK-32-NEXT:    mr r0, r31
+; CHECK-32-NEXT:    lbz r3, 16(r1)
+; CHECK-32-NEXT:    lwz r31, 0(r1)
+; CHECK-32-NEXT:    mr r1, r31
+; CHECK-32-NEXT:    mr r31, r0
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 1048576
+  %b = getelementptr inbounds i8, i8* %a, i64 63
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+define i8 @f6() #0 nounwind {
+; CHECK-LE-LABEL: f6:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    mr r12, r1
+; CHECK-LE-NEXT:    stdu r12, -48(r1)
+; CHECK-LE-NEXT:    lis r0, 4
+; CHECK-LE-NEXT:    nop
+; CHECK-LE-NEXT:    mtctr r0
+; CHECK-LE-NEXT:  .LBB6_1: # %entry
+; CHECK-LE-NEXT:    #
+; CHECK-LE-NEXT:    stdu r12, -4096(r1)
+; CHECK-LE-NEXT:    bdnz .LBB6_1
+; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, 48(r1)
+; CHECK-LE-NEXT:    lbz r3, 48(r1)
+; CHECK-LE-NEXT:    ld r1, 0(r1)
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f6:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    mr r12, r1
+; CHECK-BE-NEXT:    stdu r12, -64(r1)
+; CHECK-BE-NEXT:    lis r0, 4
+; CHECK-BE-NEXT:    nop
+; CHECK-BE-NEXT:    mtctr r0
+; CHECK-BE-NEXT:  .LBB6_1: # %entry
+; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    stdu r12, -4096(r1)
+; CHECK-BE-NEXT:    bdnz .LBB6_1
+; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, 64(r1)
+; CHECK-BE-NEXT:    lbz r3, 64(r1)
+; CHECK-BE-NEXT:    ld r1, 0(r1)
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f6:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    mr r12, r1
+; CHECK-32-NEXT:    stwu r12, -16(r1)
+; CHECK-32-NEXT:    lis r0, 4
+; CHECK-32-NEXT:    nop
+; CHECK-32-NEXT:    mtctr r0
+; CHECK-32-NEXT:  .LBB6_1: # %entry
+; CHECK-32-NEXT:    #
+; CHECK-32-NEXT:    stwu r12, -4096(r1)
+; CHECK-32-NEXT:    bdnz .LBB6_1
+; CHECK-32-NEXT:  # %bb.2: # %entry
+; CHECK-32-NEXT:    sub r0, r1, r12
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    stb r3, 16(r1)
+; CHECK-32-NEXT:    mr r0, r31
+; CHECK-32-NEXT:    lbz r3, 16(r1)
+; CHECK-32-NEXT:    lwz r31, 0(r1)
+; CHECK-32-NEXT:    mr r1, r31
+; CHECK-32-NEXT:    mr r31, r0
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 1073741824
+  %b = getelementptr inbounds i8, i8* %a, i64 63
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+define i8 @f7() #0 "stack-probe-size"="65536" nounwind {
+; CHECK-LE-LABEL: f7:
+; CHECK-LE:       # %bb.0: # %entry
+; CHECK-LE-NEXT:    lis r0, -1
+; CHECK-LE-NEXT:    mr r12, r1
+; CHECK-LE-NEXT:    ori r0, r0, 13776
+; CHECK-LE-NEXT:    stdux r12, r1, r0
+; CHECK-LE-NEXT:    li r0, 15258
+; CHECK-LE-NEXT:    mtctr r0
+; CHECK-LE-NEXT:    lis r0, -1
+; CHECK-LE-NEXT:    nop
+; CHECK-LE-NEXT:  .LBB7_1: # %entry
+; CHECK-LE-NEXT:    #
+; CHECK-LE-NEXT:    stdux r12, r1, r0
+; CHECK-LE-NEXT:    bdnz .LBB7_1
+; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    li r3, 3
+; CHECK-LE-NEXT:    stb r3, 41(r1)
+; CHECK-LE-NEXT:    lbz r3, 41(r1)
+; CHECK-LE-NEXT:    ld r1, 0(r1)
+; CHECK-LE-NEXT:    blr
+;
+; CHECK-BE-LABEL: f7:
+; CHECK-BE:       # %bb.0: # %entry
+; CHECK-BE-NEXT:    lis r0, -1
+; CHECK-BE-NEXT:    mr r12, r1
+; CHECK-BE-NEXT:    ori r0, r0, 13760
+; CHECK-BE-NEXT:    stdux r12, r1, r0
+; CHECK-BE-NEXT:    li r0, 15258
+; CHECK-BE-NEXT:    mtctr r0
+; CHECK-BE-NEXT:    lis r0, -1
+; CHECK-BE-NEXT:    nop
+; CHECK-BE-NEXT:  .LBB7_1: # %entry
+; CHECK-BE-NEXT:    #
+; CHECK-BE-NEXT:    stdux r12, r1, r0
+; CHECK-BE-NEXT:    bdnz .LBB7_1
+; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    li r3, 3
+; CHECK-BE-NEXT:    stb r3, 57(r1)
+; CHECK-BE-NEXT:    lbz r3, 57(r1)
+; CHECK-BE-NEXT:    ld r1, 0(r1)
+; CHECK-BE-NEXT:    blr
+;
+; CHECK-32-LABEL: f7:
+; CHECK-32:       # %bb.0: # %entry
+; CHECK-32-NEXT:    lis r0, -1
+; CHECK-32-NEXT:    mr r12, r1
+; CHECK-32-NEXT:    ori r0, r0, 13808
+; CHECK-32-NEXT:    stwux r12, r1, r0
+; CHECK-32-NEXT:    li r0, 15258
+; CHECK-32-NEXT:    mtctr r0
+; CHECK-32-NEXT:    lis r0, -1
+; CHECK-32-NEXT:    nop
+; CHECK-32-NEXT:  .LBB7_1: # %entry
+; CHECK-32-NEXT:    #
+; CHECK-32-NEXT:    stwux r12, r1, r0
+; CHECK-32-NEXT:    bdnz .LBB7_1
+; CHECK-32-NEXT:  # %bb.2: # %entry
+; CHECK-32-NEXT:    sub r0, r1, r12
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    stb r3, 9(r1)
+; CHECK-32-NEXT:    mr r0, r31
+; CHECK-32-NEXT:    lbz r3, 9(r1)
+; CHECK-32-NEXT:    lwz r31, 0(r1)
+; CHECK-32-NEXT:    mr r1, r31
+; CHECK-32-NEXT:    mr r31, r0
+; CHECK-32-NEXT:    blr
+entry:
+  %a = alloca i8, i64 1000000007
+  %b = getelementptr inbounds i8, i8* %a, i64 101
+  store volatile i8 3, i8* %a
+  %c = load volatile i8, i8* %a
+  ret i8 %c
+}
+
+attributes #0 = { "probe-stack"="inline-asm" }

diff  --git a/llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll b/llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll
index e595d8a732a5..eb8e05eef519 100644
--- a/llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll
+++ b/llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll
@@ -41,7 +41,7 @@ entry:
   ret i8 %c
 }
 
-define i8 @f1() #0 "stack-probe-size"="0" nounwind {
+define i8 @f1() #0 "stack-probe-size"="0" {
 ; CHECK-LE-LABEL: f1:
 ; CHECK-LE:       # %bb.0: # %entry
 ; CHECK-LE-NEXT:    mr r12, r1
@@ -52,6 +52,7 @@ define i8 @f1() #0 "stack-probe-size"="0" nounwind {
 ; CHECK-LE-NEXT:    stdu r12, -16(r1)
 ; CHECK-LE-NEXT:    bdnz .LBB1_1
 ; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    .cfi_def_cfa_offset 4144
 ; CHECK-LE-NEXT:    li r3, 3
 ; CHECK-LE-NEXT:    stb r3, 48(r1)
 ; CHECK-LE-NEXT:    lbz r3, 48(r1)
@@ -68,6 +69,7 @@ define i8 @f1() #0 "stack-probe-size"="0" nounwind {
 ; CHECK-BE-NEXT:    stdu r12, -16(r1)
 ; CHECK-BE-NEXT:    bdnz .LBB1_1
 ; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    .cfi_def_cfa_offset 4160
 ; CHECK-BE-NEXT:    li r3, 3
 ; CHECK-BE-NEXT:    stb r3, 64(r1)
 ; CHECK-BE-NEXT:    lbz r3, 64(r1)
@@ -84,10 +86,11 @@ define i8 @f1() #0 "stack-probe-size"="0" nounwind {
 ; CHECK-32-NEXT:    stwu r12, -16(r1)
 ; CHECK-32-NEXT:    bdnz .LBB1_1
 ; CHECK-32-NEXT:  # %bb.2: # %entry
-; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    sub r0, r1, r12
-; CHECK-32-NEXT:    stb r3, 16(r1)
 ; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    .cfi_def_cfa_offset 4112
+; CHECK-32-NEXT:    li r3, 3
+; CHECK-32-NEXT:    stb r3, 16(r1)
 ; CHECK-32-NEXT:    lbz r3, 16(r1)
 ; CHECK-32-NEXT:    addi r1, r1, 4112
 ; CHECK-32-NEXT:    blr
@@ -99,7 +102,7 @@ entry:
   ret i8 %c
 }
 
-define i8 @f2() #0 nounwind {
+define i8 @f2() #0 {
 ; CHECK-LE-LABEL: f2:
 ; CHECK-LE:       # %bb.0: # %entry
 ; CHECK-LE-NEXT:    mr r12, r1
@@ -111,6 +114,7 @@ define i8 @f2() #0 nounwind {
 ; CHECK-LE-NEXT:    stdu r12, -4096(r1)
 ; CHECK-LE-NEXT:    bdnz .LBB2_1
 ; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    .cfi_def_cfa_offset 65584
 ; CHECK-LE-NEXT:    li r3, 3
 ; CHECK-LE-NEXT:    stb r3, 48(r1)
 ; CHECK-LE-NEXT:    lbz r3, 48(r1)
@@ -128,6 +132,7 @@ define i8 @f2() #0 nounwind {
 ; CHECK-BE-NEXT:    stdu r12, -4096(r1)
 ; CHECK-BE-NEXT:    bdnz .LBB2_1
 ; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    .cfi_def_cfa_offset 65600
 ; CHECK-BE-NEXT:    li r3, 3
 ; CHECK-BE-NEXT:    stb r3, 64(r1)
 ; CHECK-BE-NEXT:    lbz r3, 64(r1)
@@ -146,8 +151,9 @@ define i8 @f2() #0 nounwind {
 ; CHECK-32-NEXT:    bdnz .LBB2_1
 ; CHECK-32-NEXT:  # %bb.2: # %entry
 ; CHECK-32-NEXT:    sub r0, r1, r12
-; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    .cfi_def_cfa_offset 65552
+; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    stb r3, 16(r1)
 ; CHECK-32-NEXT:    mr r0, r31
 ; CHECK-32-NEXT:    lbz r3, 16(r1)
@@ -163,13 +169,14 @@ entry:
   ret i8 %c
 }
 
-define i8 @f3() #0 "stack-probe-size"="32768" nounwind {
+define i8 @f3() #0 "stack-probe-size"="32768" {
 ; CHECK-LE-LABEL: f3:
 ; CHECK-LE:       # %bb.0: # %entry
 ; CHECK-LE-NEXT:    mr r12, r1
 ; CHECK-LE-NEXT:    stdu r12, -48(r1)
 ; CHECK-LE-NEXT:    stdu r12, -32768(r1)
 ; CHECK-LE-NEXT:    stdu r12, -32768(r1)
+; CHECK-LE-NEXT:    .cfi_def_cfa_offset 65584
 ; CHECK-LE-NEXT:    li r3, 3
 ; CHECK-LE-NEXT:    stb r3, 48(r1)
 ; CHECK-LE-NEXT:    lbz r3, 48(r1)
@@ -182,6 +189,7 @@ define i8 @f3() #0 "stack-probe-size"="32768" nounwind {
 ; CHECK-BE-NEXT:    stdu r12, -64(r1)
 ; CHECK-BE-NEXT:    stdu r12, -32768(r1)
 ; CHECK-BE-NEXT:    stdu r12, -32768(r1)
+; CHECK-BE-NEXT:    .cfi_def_cfa_offset 65600
 ; CHECK-BE-NEXT:    li r3, 3
 ; CHECK-BE-NEXT:    stb r3, 64(r1)
 ; CHECK-BE-NEXT:    lbz r3, 64(r1)
@@ -195,8 +203,9 @@ define i8 @f3() #0 "stack-probe-size"="32768" nounwind {
 ; CHECK-32-NEXT:    stwu r12, -32768(r1)
 ; CHECK-32-NEXT:    stwu r12, -32768(r1)
 ; CHECK-32-NEXT:    sub r0, r1, r12
-; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    .cfi_def_cfa_offset 65552
+; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    stb r3, 16(r1)
 ; CHECK-32-NEXT:    mr r0, r31
 ; CHECK-32-NEXT:    lbz r3, 16(r1)
@@ -213,12 +222,13 @@ entry:
 }
 
 ; Same as f2, but without protection.
-define i8 @f4() nounwind {
+define i8 @f4() {
 ; CHECK-LE-LABEL: f4:
 ; CHECK-LE:       # %bb.0: # %entry
 ; CHECK-LE-NEXT:    lis r0, -2
 ; CHECK-LE-NEXT:    ori r0, r0, 65488
 ; CHECK-LE-NEXT:    stdux r1, r1, r0
+; CHECK-LE-NEXT:    .cfi_def_cfa_offset 65584
 ; CHECK-LE-NEXT:    li r3, 3
 ; CHECK-LE-NEXT:    stb r3, 48(r1)
 ; CHECK-LE-NEXT:    lbz r3, 48(r1)
@@ -230,6 +240,7 @@ define i8 @f4() nounwind {
 ; CHECK-BE-NEXT:    lis r0, -2
 ; CHECK-BE-NEXT:    ori r0, r0, 65472
 ; CHECK-BE-NEXT:    stdux r1, r1, r0
+; CHECK-BE-NEXT:    .cfi_def_cfa_offset 65600
 ; CHECK-BE-NEXT:    li r3, 3
 ; CHECK-BE-NEXT:    stb r3, 64(r1)
 ; CHECK-BE-NEXT:    lbz r3, 64(r1)
@@ -241,8 +252,9 @@ define i8 @f4() nounwind {
 ; CHECK-32-NEXT:    lis r0, -2
 ; CHECK-32-NEXT:    ori r0, r0, 65520
 ; CHECK-32-NEXT:    stwux r1, r1, r0
-; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    .cfi_def_cfa_offset 65552
+; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    stb r3, 16(r1)
 ; CHECK-32-NEXT:    mr r0, r31
 ; CHECK-32-NEXT:    lbz r3, 16(r1)
@@ -258,7 +270,7 @@ entry:
   ret i8 %c
 }
 
-define i8 @f5() #0 "stack-probe-size"="65536" nounwind {
+define i8 @f5() #0 "stack-probe-size"="65536" {
 ; CHECK-LE-LABEL: f5:
 ; CHECK-LE:       # %bb.0: # %entry
 ; CHECK-LE-NEXT:    mr r12, r1
@@ -272,6 +284,7 @@ define i8 @f5() #0 "stack-probe-size"="65536" nounwind {
 ; CHECK-LE-NEXT:    stdux r12, r1, r0
 ; CHECK-LE-NEXT:    bdnz .LBB5_1
 ; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    .cfi_def_cfa_offset 1048624
 ; CHECK-LE-NEXT:    li r3, 3
 ; CHECK-LE-NEXT:    stb r3, 48(r1)
 ; CHECK-LE-NEXT:    lbz r3, 48(r1)
@@ -291,6 +304,7 @@ define i8 @f5() #0 "stack-probe-size"="65536" nounwind {
 ; CHECK-BE-NEXT:    stdux r12, r1, r0
 ; CHECK-BE-NEXT:    bdnz .LBB5_1
 ; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    .cfi_def_cfa_offset 1048640
 ; CHECK-BE-NEXT:    li r3, 3
 ; CHECK-BE-NEXT:    stb r3, 64(r1)
 ; CHECK-BE-NEXT:    lbz r3, 64(r1)
@@ -311,8 +325,9 @@ define i8 @f5() #0 "stack-probe-size"="65536" nounwind {
 ; CHECK-32-NEXT:    bdnz .LBB5_1
 ; CHECK-32-NEXT:  # %bb.2: # %entry
 ; CHECK-32-NEXT:    sub r0, r1, r12
-; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    .cfi_def_cfa_offset 1048592
+; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    stb r3, 16(r1)
 ; CHECK-32-NEXT:    mr r0, r31
 ; CHECK-32-NEXT:    lbz r3, 16(r1)
@@ -328,7 +343,7 @@ entry:
   ret i8 %c
 }
 
-define i8 @f6() #0 nounwind {
+define i8 @f6() #0 {
 ; CHECK-LE-LABEL: f6:
 ; CHECK-LE:       # %bb.0: # %entry
 ; CHECK-LE-NEXT:    mr r12, r1
@@ -341,6 +356,7 @@ define i8 @f6() #0 nounwind {
 ; CHECK-LE-NEXT:    stdu r12, -4096(r1)
 ; CHECK-LE-NEXT:    bdnz .LBB6_1
 ; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    .cfi_def_cfa_offset 1073741872
 ; CHECK-LE-NEXT:    li r3, 3
 ; CHECK-LE-NEXT:    stb r3, 48(r1)
 ; CHECK-LE-NEXT:    lbz r3, 48(r1)
@@ -359,6 +375,7 @@ define i8 @f6() #0 nounwind {
 ; CHECK-BE-NEXT:    stdu r12, -4096(r1)
 ; CHECK-BE-NEXT:    bdnz .LBB6_1
 ; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    .cfi_def_cfa_offset 1073741888
 ; CHECK-BE-NEXT:    li r3, 3
 ; CHECK-BE-NEXT:    stb r3, 64(r1)
 ; CHECK-BE-NEXT:    lbz r3, 64(r1)
@@ -378,8 +395,9 @@ define i8 @f6() #0 nounwind {
 ; CHECK-32-NEXT:    bdnz .LBB6_1
 ; CHECK-32-NEXT:  # %bb.2: # %entry
 ; CHECK-32-NEXT:    sub r0, r1, r12
-; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    .cfi_def_cfa_offset 1073741840
+; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    stb r3, 16(r1)
 ; CHECK-32-NEXT:    mr r0, r31
 ; CHECK-32-NEXT:    lbz r3, 16(r1)
@@ -395,7 +413,7 @@ entry:
   ret i8 %c
 }
 
-define i8 @f7() #0 "stack-probe-size"="65536" nounwind {
+define i8 @f7() #0 "stack-probe-size"="65536" {
 ; CHECK-LE-LABEL: f7:
 ; CHECK-LE:       # %bb.0: # %entry
 ; CHECK-LE-NEXT:    lis r0, -1
@@ -411,6 +429,7 @@ define i8 @f7() #0 "stack-probe-size"="65536" nounwind {
 ; CHECK-LE-NEXT:    stdux r12, r1, r0
 ; CHECK-LE-NEXT:    bdnz .LBB7_1
 ; CHECK-LE-NEXT:  # %bb.2: # %entry
+; CHECK-LE-NEXT:    .cfi_def_cfa_offset 1000000048
 ; CHECK-LE-NEXT:    li r3, 3
 ; CHECK-LE-NEXT:    stb r3, 41(r1)
 ; CHECK-LE-NEXT:    lbz r3, 41(r1)
@@ -432,6 +451,7 @@ define i8 @f7() #0 "stack-probe-size"="65536" nounwind {
 ; CHECK-BE-NEXT:    stdux r12, r1, r0
 ; CHECK-BE-NEXT:    bdnz .LBB7_1
 ; CHECK-BE-NEXT:  # %bb.2: # %entry
+; CHECK-BE-NEXT:    .cfi_def_cfa_offset 1000000064
 ; CHECK-BE-NEXT:    li r3, 3
 ; CHECK-BE-NEXT:    stb r3, 57(r1)
 ; CHECK-BE-NEXT:    lbz r3, 57(r1)
@@ -454,8 +474,9 @@ define i8 @f7() #0 "stack-probe-size"="65536" nounwind {
 ; CHECK-32-NEXT:    bdnz .LBB7_1
 ; CHECK-32-NEXT:  # %bb.2: # %entry
 ; CHECK-32-NEXT:    sub r0, r1, r12
-; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    sub r0, r1, r0
+; CHECK-32-NEXT:    .cfi_def_cfa_offset 1000000016
+; CHECK-32-NEXT:    li r3, 3
 ; CHECK-32-NEXT:    stb r3, 9(r1)
 ; CHECK-32-NEXT:    mr r0, r31
 ; CHECK-32-NEXT:    lbz r3, 9(r1)


        

From llvm-commits at lists.llvm.org  Sun Jul 12 21:40:53 2020
From: llvm-commits at lists.llvm.org (Max Kazantsev via llvm-commits)
Date: Sun, 12 Jul 2020 21:40:53 -0700 (PDT)
Subject: [llvm] e808cab - [InstCombine] Improve select -> phi
 canonicalization: consider more blocks
Message-ID: <5f0be5d5.1c69fb81.7a62.dcec@mx.google.com>


Author: Max Kazantsev
Date: 2020-07-13T11:40:32+07:00
New Revision: e808cab824488af137b62902e65dec3827b83b46

URL: https://github.com/llvm/llvm-project/commit/e808cab824488af137b62902e65dec3827b83b46
DIFF: https://github.com/llvm/llvm-project/commit/e808cab824488af137b62902e65dec3827b83b46.diff

LOG: [InstCombine] Improve select -> phi canonicalization: consider more blocks

We can try to replace select with a Phi not in its parent block alone,
but also in blocks of its arguments. We benefit from it when select's
argument is a Phi.
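
The candidate-block collection described above can be sketched in isolation. This is a toy model only: `Block`, `Operand` and `collectCandidateBlocks` are hypothetical stand-ins using the standard library, not the LLVM classes or the code in this commit. The sketch assumes the set must stay duplicate-free and keep first-seen order, so that the select's own block is tried first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a basic block; not llvm::BasicBlock.
struct Block {
  std::string Name;
};

// Toy stand-in for a select operand (hypothetical type).
struct Operand {
  // Defining block if the operand is an instruction; nullptr for
  // function arguments and constants, which have no parent block.
  const Block *Parent;
};

// Collect the blocks in which a select -> phi fold could be attempted:
// the select's own block first, then the block of every instruction
// operand, without duplicates and in first-seen order.
std::vector<const Block *>
collectCandidateBlocks(const Block *SelectParent,
                       const std::vector<Operand> &Operands) {
  assert(SelectParent && "a select always lives in some block");
  std::vector<const Block *> Candidates{SelectParent};
  for (const Operand &Op : Operands) {
    if (!Op.Parent)
      continue; // not an instruction, so it contributes no block
    if (std::find(Candidates.begin(), Candidates.end(), Op.Parent) ==
        Candidates.end())
      Candidates.push_back(Op.Parent);
  }
  return Candidates;
}
```

For a select in `exit` whose true value is a Phi defined in `merge`, as in the `test_select_into_phi_not_idom` test, the sketch yields `exit` followed by `merge`; the fold is then tried on each block in that order.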

Differential Revision: https://reviews.llvm.org/D83284
Reviewed By: nikic

Added: 
    

Modified: 
    llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
    llvm/test/Transforms/InstCombine/select.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 233fb3878ba7..17124f717af7 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -2443,11 +2443,11 @@ Instruction *InstCombiner::foldVectorSelect(SelectInst &Sel) {
   return nullptr;
 }
 
-static Instruction *foldSelectToPhi(SelectInst &Sel, const DominatorTree &DT,
-                                    InstCombiner::BuilderTy &Builder) {
+static Instruction *foldSelectToPhiImpl(SelectInst &Sel, BasicBlock *BB,
+                                        const DominatorTree &DT,
+                                        InstCombiner::BuilderTy &Builder) {
   // Find the block's immediate dominator that ends with a conditional branch
   // that matches select's condition (maybe inverted).
-  BasicBlock *BB = Sel.getParent();
   auto *IDomNode = DT[BB]->getIDom();
   if (!IDomNode)
     return nullptr;
@@ -2500,6 +2500,21 @@ static Instruction *foldSelectToPhi(SelectInst &Sel, const DominatorTree &DT,
   return PN;
 }
 
+static Instruction *foldSelectToPhi(SelectInst &Sel, const DominatorTree &DT,
+                                    InstCombiner::BuilderTy &Builder) {
+  // Try to replace this select with Phi in one of these blocks.
+  SmallSetVector<BasicBlock *, 4> CandidateBlocks;
+  CandidateBlocks.insert(Sel.getParent());
+  for (Value *V : Sel.operands())
+    if (auto *I = dyn_cast<Instruction>(V))
+      CandidateBlocks.insert(I->getParent());
+
+  for (BasicBlock *BB : CandidateBlocks)
+    if (auto *PN = foldSelectToPhiImpl(Sel, BB, DT, Builder))
+      return PN;
+  return nullptr;
+}
+
 Instruction *InstCombiner::visitSelectInst(SelectInst &SI) {
   Value *CondVal = SI.getCondition();
   Value *TrueVal = SI.getTrueValue();

diff  --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll
index f990a58f984c..08e547a6ea0a 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -2250,11 +2250,40 @@ define i32 @test_select_into_phi_not_idom(i1 %cond, i32 %A, i32 %B)  {
 ; CHECK:       if.false:
 ; CHECK-NEXT:    br label [[MERGE]]
 ; CHECK:       merge:
-; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ [[A:%.*]], [[IF_TRUE]] ], [ [[B:%.*]], [[IF_FALSE]] ]
 ; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       exit:
-; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[COND]], i32 [[PHI]], i32 [[A]]
-; CHECK-NEXT:    ret i32 [[SEL]]
+; CHECK-NEXT:    ret i32 [[A:%.*]]
+;
+entry:
+  br i1 %cond, label %if.true, label %if.false
+
+if.true:
+  br label %merge
+
+if.false:
+  br label %merge
+
+merge:
+  %phi = phi i32 [%A, %if.true], [%B, %if.false]
+  br label %exit
+
+exit:
+  %sel = select i1 %cond, i32 %phi, i32 %A
+  ret i32 %sel
+}
+
+define i32 @test_select_into_phi_not_idom_2(i1 %cond, i32 %A, i32 %B)  {
+; CHECK-LABEL: @test_select_into_phi_not_idom_2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]]
+; CHECK:       if.true:
+; CHECK-NEXT:    br label [[MERGE:%.*]]
+; CHECK:       if.false:
+; CHECK-NEXT:    br label [[MERGE]]
+; CHECK:       merge:
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i32 [[B:%.*]]
 ;
 entry:
   br i1 %cond, label %if.true, label %if.false
@@ -2269,11 +2298,145 @@ merge:
   %phi = phi i32 [%A, %if.true], [%B, %if.false]
   br label %exit
 
+exit:
+  %sel = select i1 %cond, i32 %B, i32 %phi
+  ret i32 %sel
+}
+
+define i32 @test_select_into_phi_not_idom_inverted(i1 %cond, i32 %A, i32 %B)  {
+; CHECK-LABEL: @test_select_into_phi_not_idom_inverted(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_FALSE:%.*]], label [[IF_TRUE:%.*]]
+; CHECK:       if.true:
+; CHECK-NEXT:    br label [[MERGE:%.*]]
+; CHECK:       if.false:
+; CHECK-NEXT:    br label [[MERGE]]
+; CHECK:       merge:
+; CHECK-NEXT:    [[SEL:%.*]] = phi i32 [ [[B:%.*]], [[IF_FALSE]] ], [ [[A:%.*]], [[IF_TRUE]] ]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i32 [[SEL]]
+;
+entry:
+  %inverted = xor i1 %cond, 1
+  br i1 %inverted, label %if.true, label %if.false
+
+if.true:
+  br label %merge
+
+if.false:
+  br label %merge
+
+merge:
+  %phi = phi i32 [%A, %if.true], [%B, %if.false]
+  br label %exit
+
+exit:
+  %sel = select i1 %cond, i32 %phi, i32 %A
+  ret i32 %sel
+}
+
+define i32 @test_select_into_phi_not_idom_inverted_2(i1 %cond, i32 %A, i32 %B)  {
+; CHECK-LABEL: @test_select_into_phi_not_idom_inverted_2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_FALSE:%.*]], label [[IF_TRUE:%.*]]
+; CHECK:       if.true:
+; CHECK-NEXT:    br label [[MERGE:%.*]]
+; CHECK:       if.false:
+; CHECK-NEXT:    br label [[MERGE]]
+; CHECK:       merge:
+; CHECK-NEXT:    [[SEL:%.*]] = phi i32 [ [[B:%.*]], [[IF_FALSE]] ], [ [[A:%.*]], [[IF_TRUE]] ]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i32 [[SEL]]
+;
+entry:
+  %inverted = xor i1 %cond, 1
+  br i1 %inverted, label %if.true, label %if.false
+
+if.true:
+  br label %merge
+
+if.false:
+  br label %merge
+
+merge:
+  %phi = phi i32 [%A, %if.true], [%B, %if.false]
+  br label %exit
+
+exit:
+  %sel = select i1 %cond, i32 %B, i32 %phi
+  ret i32 %sel
+}
+
+define i32 @test_select_into_phi_not_idom_no_dom_input_1(i1 %cond, i32 %A, i32 %B, i32 *%p)  {
+; CHECK-LABEL: @test_select_into_phi_not_idom_no_dom_input_1(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]]
+; CHECK:       if.true:
+; CHECK-NEXT:    [[C:%.*]] = load i32, i32* [[P:%.*]], align 4
+; CHECK-NEXT:    br label [[MERGE:%.*]]
+; CHECK:       if.false:
+; CHECK-NEXT:    br label [[MERGE]]
+; CHECK:       merge:
+; CHECK-NEXT:    [[SEL:%.*]] = phi i32 [ [[A:%.*]], [[IF_FALSE]] ], [ [[C]], [[IF_TRUE]] ]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i32 [[SEL]]
+;
+entry:
+  br i1 %cond, label %if.true, label %if.false
+
+if.true:
+  %C = load i32, i32* %p
+  br label %merge
+
+if.false:
+  br label %merge
+
+merge:
+  %phi = phi i32 [%C, %if.true], [%B, %if.false]
+  br label %exit
+
 exit:
   %sel = select i1 %cond, i32 %phi, i32 %A
   ret i32 %sel
 }
 
+define i32 @test_select_into_phi_not_idom_no_dom_input_2(i1 %cond, i32 %A, i32 %B, i32 *%p)  {
+; CHECK-LABEL: @test_select_into_phi_not_idom_no_dom_input_2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]]
+; CHECK:       if.true:
+; CHECK-NEXT:    br label [[MERGE:%.*]]
+; CHECK:       if.false:
+; CHECK-NEXT:    [[C:%.*]] = load i32, i32* [[P:%.*]], align 4
+; CHECK-NEXT:    br label [[MERGE]]
+; CHECK:       merge:
+; CHECK-NEXT:    [[SEL:%.*]] = phi i32 [ [[C]], [[IF_FALSE]] ], [ [[B:%.*]], [[IF_TRUE]] ]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i32 [[SEL]]
+;
+entry:
+  br i1 %cond, label %if.true, label %if.false
+
+if.true:
+  br label %merge
+
+if.false:
+  %C = load i32, i32* %p
+  br label %merge
+
+merge:
+  %phi = phi i32 [%A, %if.true], [%C, %if.false]
+  br label %exit
+
+exit:
+  %sel = select i1 %cond, i32 %B, i32 %phi
+  ret i32 %sel
+}
+
 ; Negative tests to ensure we don't remove selects with undef true/false values.
 ; See https://bugs.llvm.org/show_bug.cgi?id=31633
 ; https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html


        

From llvm-commits at lists.llvm.org  Sun Jul 12 21:41:07 2020
From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 04:41:07 +0000 (UTC)
Subject: [PATCH] D83284: [InstCombine] Improve select -> phi canonicalization:
 consider more blocks
In-Reply-To: 
References: 
Message-ID: 

This revision was automatically updated to reflect the committed changes.
Closed by commit rGe808cab82448: [InstCombine] Improve select -> phi canonicalization: consider more blocks (authored by mkazantsev).

Changed prior to commit:
  https://reviews.llvm.org/D83284?vs=276994&id=277323#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83284/new/

https://reviews.llvm.org/D83284

Files:
  llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
  llvm/test/Transforms/InstCombine/select.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83284.277323.patch
Type: text/x-patch
Size: 7233 bytes
Desc: not available
URL: 

From llvm-commits at lists.llvm.org  Sun Jul 12 21:56:17 2020
From: llvm-commits at lists.llvm.org (Qiu Chaofan via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 04:56:17 +0000 (UTC)
Subject: [PATCH] D83654: [PowerPC] Support constrained vector fp/int conversion
Message-ID: 

qiucf created this revision.
qiucf added reviewers: nemanjai, jsji, uweigand, steven.zhang, kpn, PowerPC, kbarton.
Herald added subscribers: llvm-commits, shchenz, hiraditya.
Herald added a project: LLVM.

This patch makes these operations legal and adds the necessary codegen patterns.

There are still some issues, similar to D77033, with conversions from/to `v1i128`, but the normal type tests synced from X86/SystemZ's `vector-constrained-fp-intrinsics.ll` all pass.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83654

Files:
  llvm/lib/Target/PowerPC/PPCISelLowering.cpp
  llvm/lib/Target/PowerPC/PPCInstrVSX.td
  llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83654.277324.patch
Type: text/x-patch
Size: 153855 bytes
Desc: not available
URL: 

From llvm-commits at lists.llvm.org  Sun Jul 12 21:59:13 2020
From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 04:59:13 +0000 (UTC)
Subject: [PATCH] D83276: [PowerPC] Generate CFI directives when probing in
 prologue
In-Reply-To: 
References: 
Message-ID: <6e070eef2267b1c4342106ad85e63bad@localhost.localdomain>

lkail updated this revision to Diff 277325.
lkail added a comment.

Address @jsji's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83276/new/

https://reviews.llvm.org/D83276

Files:
  llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
  llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83276.277325.patch
Type: text/x-patch
Size: 13312 bytes
Desc: not available
URL: 

From llvm-commits at lists.llvm.org  Sun Jul 12 22:03:00 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 05:03:00 +0000 (UTC)
Subject: [PATCH] D83655: [AsmPrinter] Split up .gcc_except_table
Message-ID: 

MaskRay created this revision.
MaskRay added reviewers: abidh, grimar, jhenderson, jrtc27, psmith.
Herald added subscribers: llvm-commits, luismarques, s.egerton, lenary, PkmX, simoncook, hiraditya, ki.stfu.
Herald added a project: LLVM.

MC currently produces a monolithic .gcc_except_table section. GCC can split up .gcc_except_table:

- if comdat: `.section .gcc_except_table._Z6comdatv,"aG",@progbits,_Z6comdatv,comdat`
- otherwise, if -ffunction-sections: `.section .gcc_except_table._Z3fooi,"a",@progbits`

This ensures that (a) non-prevailing copies are discarded and (b) a
.gcc_except_table associated with a discarded text section can be discarded by a
.gcc_except_table-aware linker (GNU ld, but not gold or LLD).

This patch matches the behavior and strengthens it by leveraging
SHF_LINK_ORDER (if integrated assembler) to make (b) work with
.gcc_except_table-unaware linkers (LLD). SHF_LINK_ORDER is a binutils
2.35 (not released yet) feature, so we don't use it for clang -fno-integrated-as.
GCC appends a suffix to ".gcc_except_table", but this just wastes .strtab space,
so we simply use the same string (i.e. as if -fno-unique-section-names
is specified).
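
The section choice described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the code in the patch: `Function`, `LSDASection` and `pickLSDASection` are hypothetical names, and the sketch assumes the table stays monolithic when the function is neither in a comdat nor compiled with -ffunction-sections. The flag values are the standard ELF ones.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// ELF section flag values from the generic ABI.
constexpr std::uint64_t ShfAlloc = 0x2;
constexpr std::uint64_t ShfLinkOrder = 0x80;
constexpr std::uint64_t ShfGroup = 0x200;

// Hypothetical description of the function owning the LSDA.
struct Function {
  std::string Symbol; // mangled name, e.g. "_Z6comdatv"
  bool InComdat;
};

// Hypothetical description of the chosen exception-table section.
struct LSDASection {
  std::string Name;
  std::uint64_t Flags;
  std::string Group;    // comdat group signature, empty if none
  std::string LinkedTo; // symbol of the associated text section, empty if none
};

LSDASection pickLSDASection(const Function &F, bool FunctionSections,
                            bool IntegratedAs) {
  assert(!F.Symbol.empty() && "the function needs a symbol to link against");
  // The same name is always used; no per-function suffix is appended.
  LSDASection S{".gcc_except_table", ShfAlloc, "", ""};
  if (!F.InComdat && !FunctionSections)
    return S; // assumed: keep the monolithic section
  if (F.InComdat) {
    // Lets the linker drop non-prevailing copies together with the group.
    S.Flags |= ShfGroup;
    S.Group = F.Symbol;
  }
  if (IntegratedAs) {
    // Lets a .gcc_except_table-unaware linker discard the table when the
    // associated text section is discarded.
    S.Flags |= ShfLinkOrder;
    S.LinkedTo = F.Symbol;
  }
  return S;
}
```

With the integrated assembler, a comdat function gets both the group and the link-order flag; with `-fno-integrated-as` only the group flag is set, matching the restriction on SHF_LINK_ORDER described above.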

For RISC-V -mrelax, this patch additionally fixes an assembler-linker
interaction problem: because a section is shrinkable, the length of a call-site
code range is not a constant. Relocations referencing the associated text
section (STT_SECTION) are needed. However, a STB_LOCAL relocation referencing a
discarded section group member from outside the group is disallowed by the ELF
specification (PR46675):

  // a.cc
  inline int comdat() { try { throw 1; } catch (int) { return 1; } return 0; }
  int main() { return comdat(); }
  
  // b.cc
  inline int comdat() { try { throw 1; } catch (int) { return 1; } return 0; }
  int foo() { return comdat(); }
  
  clang++ -target riscv64-linux -c a.cc b.cc -fPIC
  ld.lld -shared a.o b.o => ld.lld: error: relocation refers to a symbol in a discarded section:


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83655

Files:
  llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
  llvm/test/CodeGen/X86/gcc_except_table-multi.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83655.277326.patch
Type: text/x-patch
Size: 4281 bytes
Desc: not available
URL: 

From llvm-commits at lists.llvm.org  Sun Jul 12 22:12:44 2020
From: llvm-commits at lists.llvm.org (Fangrui Song via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 05:12:44 +0000 (UTC)
Subject: [PATCH] D83655: [AsmPrinter] Split up .gcc_except_table
In-Reply-To: 
References: 
Message-ID: 

MaskRay marked an inline comment as done.
MaskRay added inline comments.


================
Comment at: llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp:440
+        Flags |= ELF::SHF_GROUP;
+      }
+      if (Asm->MAI->useIntegratedAssembler()) {
----------------
Forgot to add comments. I'd write

`// GNU as<2.35 does not support SHF_LINK_ORDER. If we are using the integrated assembler, use SHF_LINK_ORDER so that the linker can discard .gcc_except_table associated to a discarded text section.`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83655/new/

https://reviews.llvm.org/D83655




From llvm-commits at lists.llvm.org  Sun Jul 12 22:20:37 2020
From: llvm-commits at lists.llvm.org (Kai Luo via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 05:20:37 +0000 (UTC)
Subject: [PATCH] D83276: [PowerPC] Generate CFI directives when probing in
 prologue
In-Reply-To: 
References: 
Message-ID: <63f473befbccd1f77b4d08e1753eac8b@localhost.localdomain>

lkail updated this revision to Diff 277328.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83276/new/

https://reviews.llvm.org/D83276

Files:
  llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
  llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83276.277328.patch
Type: text/x-patch
Size: 13366 bytes
Desc: not available
URL: 

From llvm-commits at lists.llvm.org  Sun Jul 12 22:39:33 2020
From: llvm-commits at lists.llvm.org (Max Kazantsev via Phabricator via llvm-commits)
Date: Mon, 13 Jul 2020 05:39:33 +0000 (UTC)
Subject: [PATCH] D81375: [InstCombine] Simplify boolean Phis with const inputs
 using CFG
In-Reply-To: 
References: 
Message-ID: 

mkazantsev updated this revision to Diff 277331.
mkazantsev added a comment.

At last!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81375/new/

https://reviews.llvm.org/D81375

Files:
  llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
  llvm/test/Transforms/CallSiteSplitting/callsite-split.ll
  llvm/test/Transforms/InstCombine/branch.ll
  llvm/test/Transforms/InstCombine/icmp-constant-phi.ll
  llvm/test/Transforms/InstCombine/phi.ll
  llvm/test/Transforms/InstCombine/select.ll
  llvm/test/Transforms/InstCombine/simple_phi_condition.ll
  llvm/test/Transforms/PhaseOrdering/simplifycfg-options.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D81375.277331.patch
Type: text/x-patch
Size: 14028 bytes
Desc: not available
URL: 

From llvm-commits at lists.llvm.org  Fri Jul 10 09:30:39 2020
From: llvm-commits at lists.llvm.org (Zhang Kang via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 16:30:39 +0000 (UTC)
Subject: [PATCH] D80274: [MachineVerifier] Handle the PHI node for
 verifyLiveVariables()
In-Reply-To: 
References: 
Message-ID: <28847a6c7cdf8a975fe8eb607c7e577b@localhost.localdomain>

ZhangKang marked an inline comment as done.
ZhangKang added a comment.

ping...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80274/new/

https://reviews.llvm.org/D80274




From llvm-commits at lists.llvm.org  Fri Jul 10 09:57:42 2020
From: llvm-commits at lists.llvm.org (Nikita Popov via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 16:57:42 +0000 (UTC)
Subject: [PATCH] D81728: [InstCombine] Add target-specific inst combining
In-Reply-To: 
References: 
Message-ID: 

nikic added inline comments.


================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:540
+  bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
+                            Instruction **ResultI) const;
+  bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
----------------
For all three functions, the calling convention seems rather non-idiomatic for InstCombine. Rather than having an `Instruction **` argument and bool result, is there any reason not to have an `Instruction *` return value, with nullptr indicating that the intrinsic couldn't be simplified?


================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:542
+  bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
+                                        APInt DemandedMask, KnownBits &Known,
+                                        bool &KnownBitsComputed,
----------------
`const APInt &DemandedMask`?


================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:546
+  bool simplifyDemandedVectorEltsIntrinsic(
+      InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
+      APInt &UndefElts2, APInt &UndefElts3,
----------------
`const APInt &DemandedElts`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81728/new/

https://reviews.llvm.org/D81728




From llvm-commits at lists.llvm.org  Fri Jul 10 12:22:53 2020
From: llvm-commits at lists.llvm.org (Sebastian Neubauer via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 19:22:53 +0000 (UTC)
Subject: [PATCH] D81728: [InstCombine] Add target-specific inst combining
In-Reply-To: 
References: 
Message-ID: <6e8b02f127ad4a766c5a81ce99096b00@localhost.localdomain>

Flakebi marked an inline comment as done.
Flakebi added inline comments.


================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:540
+  bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
+                            Instruction **ResultI) const;
+  bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
----------------
nikic wrote:
> For all three functions, the calling convention seems rather non-idiomatic for InstCombine. Rather than having an `Instruction **` argument and bool result, is there any reason not to have an `Instruction *` return value, with nullptr indicating that the intrinsic couldn't be simplified?
Yes, the function must have the option to return a nullptr and prevent `visitCallBase` from being called, or any other code from being executed, after `instCombineIntrinsic`.
So the caller must somehow be able to tell the difference between 'do nothing, just continue execution' and 'return this `Instruction*`', where the `Instruction*` can also be a nullptr.
The return type could be an `optional`.
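
A minimal sketch of the optional-based convention, to make the three outcomes concrete. Everything here is hypothetical: `Instruction` is a toy struct, `std::optional` (C++17) stands in for whatever optional type the real interface would use, and `visitIntrinsic` only models the caller's decision, not InstCombine itself.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in; not llvm::Instruction.
struct Instruction {
  int Id;
};

// Hypothetical hook result:
//   std::nullopt      -> the target did nothing, continue with generic code
//   engaged, non-null -> return this instruction to the caller
//   engaged, nullptr  -> return nullptr and skip the generic code
using HookResult = std::optional<Instruction *>;

// What the caller ended up doing.
struct Visit {
  bool RanGenericCode;
  Instruction *Result;
};

// Model of the caller: the generic path runs only if the hook declined.
Visit visitIntrinsic(const HookResult &Hook, Instruction *GenericResult) {
  if (Hook.has_value())
    return {false, *Hook}; // the target decided; the value may be nullptr
  return {true, GenericResult};
}
```

A plain `Instruction *` return cannot express the third case, because nullptr would have to mean both "not simplified" and "stop here and return nullptr".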

I’ll take a look at your other comments on Monday.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81728/new/

https://reviews.llvm.org/D81728




From llvm-commits at lists.llvm.org  Fri Jul 10 15:33:50 2020
From: llvm-commits at lists.llvm.org (Chris Lattner via Phabricator via llvm-commits)
Date: Fri, 10 Jul 2020 22:33:50 +0000 (UTC)
Subject: [PATCH] D81728: [InstCombine] Add target-specific inst combining
In-Reply-To: 
References: 
Message-ID: <30c1e42316f35db021f77f92f69bc156@localhost.localdomain>

lattner resigned from this revision.
lattner added a comment.

Please don't consider me a blocker on this patch, thank you for pushing on it!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81728/new/

https://reviews.llvm.org/D81728